Article

Extreme Gradient Boosting Model for Day-Ahead STLF in National Level Power System: Estonia Case Study

Electrical Engineering and Information College, Northeast Agricultural University, Harbin 150030, China
* Author to whom correspondence should be addressed.
Energies 2023, 16(24), 7962; https://doi.org/10.3390/en16247962
Submission received: 26 October 2023 / Revised: 26 November 2023 / Accepted: 28 November 2023 / Published: 8 December 2023
(This article belongs to the Special Issue Forecasting Techniques for Power Systems with Machine Learning)

Abstract:
Short-term power load forecasting refers to the use of load and weather information to forecast the day-ahead load, which is very important for power dispatch and the establishment of the power spot market. In this manuscript, a comprehensive study on the framing of input data for electricity load forecasting is conducted based on the extreme gradient boosting algorithm. First, the periodicity of the historical load data was analyzed using the discrete Fourier transform, the autocorrelation function, and the partial autocorrelation function to determine the key width of the sliding window used to build optimized load features. With a 7-day window width, the mean absolute error (MAE) of the boosting model reached 52.04 on the validation dataset. Second, the fusion of datetime variables and meteorological factors was discussed in detail to determine how best to improve performance. The datetime variables were encoded as a combination of integer, sine–cosine pair, and Boolean features, and the meteorological information was represented by 540 features from 15 sampled sites, which further decreased the MAE to 44.32 on the validation dataset. Last, a training method for day-ahead forecasting was proposed that uses the Minkowski distance to determine the historical span. Under this framework, performance was significantly improved without any tuning of the boosting algorithm, and the proposed method further decreased the MAE to 37.84. Finally, the effectiveness of the proposed method was evaluated on a 200-day load dataset from the Estonian grid. The achieved MAE of 41.69 outperforms other baseline models, whose MAEs range from 65.03 to 104.05, and represents a significant improvement of 35.89% over the method currently employed by the European Network of Transmission System Operators for Electricity (ENTSO-E). The robustness of the proposed method is also demonstrated by its excellent performance in extreme weather and on special days.

1. Introduction

Load forecasting, also called power energy demand forecasting, is a crucial task for both commerce and engineering. Forecasting provides a guarantee of stability and a decision reference for power supply planning, transmission and distribution system planning, power system operations and maintenance, financial planning, and rate design. Technically, load forecasting is classified into short-term load forecasting (STLF), medium-term load forecasting, and long-term load forecasting based on the forecasting horizon. Medium-term and long-term load forecasting are usually performed by a grid service or utility company for planning and maintenance purposes such as transmission and distribution planning, integrated resource planning, and energy efficiency planning. STLF plays a more important role in the modern energy market environment, with the development of independent system operators (ISOs) and connections to international or regional grids [1,2].
Day-ahead load forecasting is an STLF that, in the morning, needs to provide the power energy demand for the entire following day. Day-ahead forecasting is a daily necessary demand for the Day-Ahead Energy Market, wherein market participants purchase and sell electric energy at financially binding day-ahead prices for the following day, and then the day-ahead market allows energy providers and utility companies to hedge against price volatility in the Real-Time Energy Market by locking in energy prices before the operating day [3]. Further, day-ahead energy forecasting, as a part of an STLF, has fundamental transparency information to submit to some international grid organizations or energy markets [4]. It is critical to create a level playing field between market participants and avoid a scope that is conducive to the abuse of market power for the creation of efficient liquid and competitive transaction environments, making accurate short-term load forecasts even more critical.
In recent years, with the development of artificial intelligence, the combination of electric load forecasting and machine learning has become a popular academic topic, and studies continue to explore the intersection between the two. Most of the innovative research combining machine learning with other fields is at the algorithm level, including various neural network structures and the application of new heuristic algorithms in hyperparameter optimization (HPO). However, machine learning methods are data-driven algorithms: the quality of the data directly determines the forecasting or prediction performance, and there is no unified frame for determining the most suitable form of input data. (1) For historical load features, the width of the sliding window used in modeling is mostly determined by experience. Serhani et al. conducted neural network-based modeling to forecast the power load in the Île-de-France region and evaluated the future load using a rolling forecasting method whose inputs consisted of only 30 half-hour load features, yet with sufficient performance [5]. Tian et al. designed a recurrent neural network (RNN) model using a wider set of 21 × 24 load features for next-day forecasting [6]. (2) For external features, data-driven algorithms have more flexible data acceptance capabilities than traditional time series models and can more conveniently incorporate datetime and meteorological variables, which are considered essential in model building. However, there is no consensus on which meteorological variables have a significantly positive impact on the modeling process or on how to consider meteorological information over a large area, such as electric load demand at a national level. Morais et al.
designed a data-driven model including a gated recurrent unit (GRU) and long short-term memory (LSTM) for the short-term electric load forecasting task in four major regions of Brazil, the inputs of which are historical load features over the past 7 days, integer-coded datetime features, and three forms of regional temperature plus a specially designed temperature encoding as meteorological features [7]. Abu-Rub et al. designed a stacked ensemble of boosting models and multilayer perceptrons to achieve point-to-point electric load forecasting using only temperature data; moreover, they verified the effectiveness of the method on two datasets from the United States and Malaysia [8]. Tahir et al. proposed a forecasting scheme combining Prophet, LSTM, and a back-propagation neural network model that uses 48 historical load and temperature series as well as the total daily load within 7 days, and incorporates datetime features in the form of one-hot encoding [9]. (3) Last, the question of how to apply machine learning algorithms as a generative model on electric load sequences, so as to meet the time logic and modeling logic required in engineering, also needs a more detailed discussion.
As shown above, current research has focused on powerful algorithms rather than on the quality of the dataset or load series [10]. In fact, however, research on data quality and training methods is more valuable than research on the algorithms themselves. Therefore, in this research, an algorithm was adopted without any hyperparameter tuning from start to finish, with the aim of conducting a summary study of machine learning methods for electric load forecasting from the perspective of the input data.
In this research, first, the window width of the load series was determined based on the cyclicality and periodicity of the power load using the discrete Fourier transform (DFT), the autocorrelation function (ACF), and the partial autocorrelation function (PACF). Second, the best combination of datetime features and composite meteorological variables from multiple sources was determined for a regional grid level. Third, a better machine learning training method was designed based on the Minkowski distance for improved performance. Finally, this research adopted a 7-day historical load window as the optimal width, the sf3 + if4 + bfs datetime feature combination, and 15 sets of meteorological information as inputs to the boosting model. The MAE reached 41.69 on the 200-day test dataset without any HPO process, which is 35.89% and 40.56% lower than the current model used by the European Network of Transmission System Operators for Electricity (ENTSO-E) and the best comparison model, respectively. This is a persuasive result in the field of load forecasting, and the method also shows a certain robustness in extreme weather and during holidays. We believe that this work has the potential to be used in real-world power load demand forecasting applications, and we look forward to further research on improving the model's performance and generalizability.
The rest of this paper is structured as follows.
Section 2 provides a review of the methods and materials used in this research. Section 2.1 describes the core principles of the boosting algorithm; Section 2.2 introduces applicable methods for feature engineering regarding load and external features; Section 2.3 shows the data-driven modeling training methods in this paper; Section 2.4 describes the case study dataset, including the sources of load data and meteorological variables; Section 2.5 introduces the baseline models for comparison and the metrics for evaluation used in this manuscript.
Section 3 provides the results of this paper. Section 3.1 presents the results of the optimal window exploration based on methods in Section 2.1; Section 3.2 shows the results after adding the external variable features; Section 3.3 presents a comparison of the training methods proposed in Section 2.3; Section 3.4 presents the forecasting results of the final framework adopted in this paper, including the comparison with the baseline model and the forecasting results for extreme weather and special dates.
Section 4 concludes and highlights potential avenues for future research and development.

2. Materials and Methods

2.1. Extreme Gradient Boosting Algorithms in Load Forecasting

The essence of electricity load forecasting using machine learning lies in the multi-regression task within supervised learning. The basic flow is shown in Figure 1.
Firstly, a certain length of historical load series is required for the training set. Then, the series is restructured into a time series dataset of supervised learning samples using the sliding-window method with a fixed width. Next, reasonable and effective variables are added to the samples as external features. Finally, the model is trained or fitted in a specified way with the objective function of the designed algorithm to build the forecasting model. In the application process, the model predicts the expected sequence data to complete the load forecast.
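As an illustrative sketch of this restructuring step (the function name, window width, and toy series are assumptions for demonstration, not the paper's code), the sliding-window conversion of a load series into supervised samples can be written as:

```python
import numpy as np

def make_supervised(load, width, horizon=24):
    """Restructure an hourly load series into (X, y) samples with a
    sliding window: each sample uses `width` past points as features
    and the next `horizon` points as the multi-output target."""
    X, y = [], []
    for start in range(len(load) - width - horizon + 1):
        X.append(load[start:start + width])
        y.append(load[start + width:start + width + horizon])
    return np.array(X), np.array(y)

series = np.arange(24 * 10, dtype=float)      # ten days of hourly toy data
X, y = make_supervised(series, width=24 * 7)  # 7-day window, day-ahead target
print(X.shape, y.shape)  # (49, 168) (49, 24)
```

Each row of X is one window of past load, and the matching row of y is the 24-point day-ahead target the model learns to emit.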
The model building here is a complex engineering system with three key steps. First, for load features, the sliding-window width should be set to the correct size, both preserving the information carried by the periodicity and avoiding chaotic noise [11]. Second, external variables need to be determined to assist the historical load features for better forecasting performance [12]. Last, for the model-building method, the algorithm should follow a purposeful design with a reasonable training method and a dataset split point that meets the time logic of forecasting in application.
In this manuscript, the core algorithm applied is the extreme gradient boosting algorithm, a gradient boosting method developed by Tianqi Chen in 2017 [13]. The algorithm ensembles decision tree models with a greedy search strategy. As an additive ensemble model, it considers the first- and second-order derivatives of the loss function in a Taylor series and constructs a model that is probably approximately correct (PAC). The objective function is as follows [13]:
$$obj^{(t)} = \sum_{i=1}^{n} \left[ loss\left(y_i,\ \hat{y}_i^{[t-1]} + \hat{y}_i^{[t]}\right) \right] + \Omega\left(tr^{[t]}\right) \quad (1)$$
where n is the count of trained samples, t is the number of meta-estimator iterations, loss is the loss function used to measure the distance between predicted and actual values, tr is the classification and regression tree (CART) meta model with $\hat{y}_i^{[k]} = tr^{[k]}(x_i)$, and Ω is the regularization term used to control model complexity and robustness.
After performing the Taylor expansion, retaining the first- and second-order partial derivative terms, and ignoring the Peano remainder, the loss term in Equation (1) is approximately equivalent to the following function:
$$\sum_{i=1}^{n} \left[ loss\left(y_i,\ \hat{y}_i^{[t-1]} + \hat{y}_i^{[t]}\right) \right] \approx \sum_{i=1}^{n} \left[ loss\left(y_i,\ \hat{y}_i^{[t-1]}\right) + g_i\, tr^{[t]}(x_i) + \frac{1}{2} h_i \left(tr^{[t]}(x_i)\right)^2 \right] \quad (2)$$
where $g_i = \frac{\partial\, loss(y_i,\ \hat{y}_i^{[t-1]})}{\partial \hat{y}_i^{[t-1]}}$ and $h_i = \frac{\partial^2 loss(y_i,\ \hat{y}_i^{[t-1]})}{\partial (\hat{y}_i^{[t-1]})^2}$. The regularization term on the right side of Equation (1) is configured based on the CART tree structure. It can be expanded into a term represented by the number of leaf nodes T and the corresponding leaf weights ω, as follows:
$$\Omega\left(tr^{[t]}\right) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2 \quad (3)$$
where γ is the complexity cost incurred by introducing a new node and λ is the strength of the L2 regularization term.
Combining (2) and (3) into (1) yields the following form of the objective function:
$$obj^{(t)} = \sum_{i=1}^{n} \left[ loss\left(y_i,\ \hat{y}_i^{[t-1]}\right) + g_i\, tr^{[t]}(x_i) + \frac{1}{2} h_i \left(tr^{[t]}(x_i)\right)^2 \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2 \quad (4)$$
Sequence number i relates to the sample, and sequence number j relates to the tree node in the formula. After mapping the two into j uniformly, the following form of the objective function is obtained:
$$obj^{(t)} = \sum_{j=1}^{T} \left[ G_j \omega_j + \frac{1}{2} \left(H_j + \lambda\right) \omega_j^2 \right] + \gamma T \quad (5)$$
where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$.
Then, solving $\frac{\partial\, obj^{(t)}}{\partial \omega_j} = 0$ yields the best-expected objective function of the algorithm, with the optimal leaf weight $\omega_j^* = -\frac{G_j}{H_j + \lambda}$. Substituting $\omega_j^*$ back into (5), the final result of the objective function is as follows:
$$obj^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \quad (6)$$
where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$; then, applying Equation (6) to a single CART tree split yields the following node-splitting measure:
$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \quad (7)$$
Equation (7) can be decomposed into the score of the new left leaf, the score of the new right leaf, the score of the original leaf, and the regularization cost of the additional leaf. After all boosting iterations, the feature importance values (FIVs) for the features can be calculated from the CART models as follows [11,14,15]:
$$FIV(m) = \frac{\sum_{i \in m} Gain(i)}{\sum_{j=1}^{T} Gain(j)} \quad (8)$$
For the ensemble model F, the global FIV of a feature m is calculated as the ratio of the gain values attributed to m across all N CART trees to the total gain of the ensemble learning model:
$$FIVs(m) = \frac{\sum_{n=1}^{N} \sum_{i \in m} Gain(i)}{\sum_{n=1}^{N} \sum_{j=1}^{T} Gain(j)} \quad (9)$$
The FIV or FIVs are used to measure the impact of each feature on the final decision of the fitted model [15].
In general, the extreme gradient boosting algorithm completes the ensemble in the dataset according to the objective function of Equation (6). In addition, the contribution degree of each feature to the final prediction can be reflected by the obtained FIVs of the model with good interpretability.
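To make the split-scoring logic concrete, the following toy function sketches the node-splitting measure of Equation (7) (the function and the squared-error example are illustrative assumptions, not code from the xgboost library). With squared-error loss, each sample contributes gradient g_i = prediction − y_i and Hessian h_i = 1, so separating two distinct residual groups yields a clearly positive gain:

```python
def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Node-splitting score of Equation (7): the structure-score
    improvement from splitting a leaf into left/right children,
    minus the complexity cost gamma of the extra leaf."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

# Toy check: residual groups {-1, -1} vs {-10, -10} under squared-error
# loss (g_i = prediction - y_i, h_i = 1) give a clearly positive gain,
# so separating the two load regimes is worth an extra leaf.
print(split_gain(G_L=-2.0, H_L=2.0, G_R=-20.0, H_R=2.0))  # ≈ 18.93
```

A positive value means the split improves the objective by more than the complexity cost γ of the extra leaf; the greedy tree builder picks the split with the largest such gain.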

2.2. Feature Engineering in Boosting Model

Technically, load forecasting with machine learning is a regression task predicting the 24 points of the day ahead. The process requires a dataset providing the algorithm with power load information and external reference information for better performance. Eventually, effective samples with executable temporal logic are formed to train the model, and these must also be feasible as inputs in the forecasting process. The process, shown as a flow chart in Figure 2, includes (1) determining the load features built through the sliding window and (2) designing external features to assist forecasting, including the datetime variables and meteorological factors.

2.2.1. Load Features from Sliding Window in Width from Periodicity

Load features comprise the historical actual power load before the forecasting time, which is the main part used to extract time series patterns to train the model, which then forecasts future values from the learned patterns. In other words, the past time series is fitted to infer the possible form of future data, and the scope of the historical data used in the model training process is obviously critical. Too much historical data will introduce trends that have already changed, or random chaos with no forecastable pattern, into the current forecast; meanwhile, too little historical data cannot contain the entire trend of the sequence, resulting in inaccurate trends. Therefore, the optimal width of the sliding window needs to be determined.
Electric power load is a special type of time series that differs from financial and weather sequences. It does not come from a chaotic system but rather exhibits stable periodicity and cyclicity originating from human activity patterns. Further, it has a determinable upper capacity limit from the constraints of the power system. Therefore, if there is fixed periodicity or cyclicity in the power load, the appropriate window width is closely related to it, as expected in this research.
(1) Fourier transform
Fourier transform is a method of converting time-domain data to frequency-domain data. Any waveform in the time domain can be represented as a superposition of sinusoidal waves with different amplitudes and phases in the frequency domain. If the synthetic signal f(t) satisfies the Dirichlet conditions in the range of (−∞, +∞), it can be converted to frequency-domain component signals g(freq) as follows:
$$g(freq) = \int_{-\infty}^{+\infty} f(t) \cdot e^{-2\pi i \cdot freq \cdot t}\, dt \quad (10)$$
The power load time series in this paper are sampled discretely with limited length, so the fast discrete Fourier transform method proposed by Bluestein [16] is used instead:
$$g(freq) = \sum_{t=1}^{N} f(t) \cdot e^{-\frac{2\pi i \cdot freq \cdot t}{N}} \quad (11)$$
where N is the length of the series to be analyzed, and the freq series contains the frequency bin centers in cycles per unit of the sample spacing, starting at zero. The second half of the freq series is the conjugate of the first half, and only the positive half is kept.
The DFT can convert time series to frequency-domain data and, generally, if there is a significant frequency component [11,17,18,19], the period corresponding to that frequency is the period of the time series data.
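As a minimal sketch of this frequency-domain analysis (the helper and the toy series are illustrative assumptions), the dominant period of a load-like series can be recovered from the strongest FFT component:

```python
import numpy as np

def dominant_period(series, dt=1.0):
    """Find the period (in sample units) of the strongest nonzero
    frequency component of a real series via the FFT."""
    spectrum = np.abs(np.fft.rfft(series - np.mean(series)))
    freqs = np.fft.rfftfreq(len(series), d=dt)
    k = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[k]

t = np.arange(24 * 28)  # four weeks of hourly samples
toy = np.sin(2 * np.pi * t / 24) + 0.3 * np.sin(2 * np.pi * t / (24 * 7))
print(dominant_period(toy))  # ≈ 24: the daily cycle dominates the weekly one
```

Applied to an actual load series, the tallest spectral peaks indicate the candidate periods (e.g., daily and weekly cycles) from which the window width can be chosen.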
The cyclicity and periodicity of electric load are further explored using information in the time domain, including the analysis of the autocorrelation function and partial autocorrelation function.
(2) Autocorrelation function
The autocorrelation function (ACF) is a method used to describe the influence of historical data on the present, providing the degree of correlation of the data at different lags [20]. The ACF estimates the autocorrelation coefficient by calculating the correlation between the sequence and a lagged copy of itself. The autocorrelation coefficient at lag k is calculated using the following formula:
$$ACF(l, k) = \frac{\gamma_k}{\gamma_0} \quad (12)$$
where $\gamma_k$ is the autocovariance at lag k, calculated as follows:
$$\gamma_k = \frac{1}{N-k} \sum_{t=k+1}^{N} \left(l(t) - l_{mean}\right)\left(l(t-k) - l_{mean}\right) \quad (13)$$
In this equation, l ( t ) is the electricity load sequence, which contains N elements, with t as the time index, and l m e a n is the mean of the load series.
The ACF value reflects the similarity between two observations: if there is a cyclical correlation between the actual load at two sampling points, the ACF will show peaks at the corresponding positions in the cycle. However, the evaluation of periodicity and cyclicity using the ACF is seriously affected by multicollinearity: the ACF will also show peaks at relatively low lags and decay slowly toward the next cycle.
(3) Partial autocorrelation function
The appearance of multiple peaks helps us to assess the cycle; however, it affects the exploration of correlation. Therefore, the partial autocorrelation function (PACF) must be used to explore the correlation degree after excluding other lag terms [20].
PACF is a measure of the correlation between paired observations in time series data, excluding the influence of other lag values, and can better reflect the simple correlation of the data. The calculation of the PACF requires solving the Yule–Walker equations [21], as follows:
$$\begin{bmatrix} \rho_0 & \rho_1 & \rho_2 & \cdots & \rho_{k-1} \\ \rho_1 & \rho_0 & \rho_1 & \cdots & \rho_{k-2} \\ \rho_2 & \rho_1 & \rho_0 & \cdots & \rho_{k-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & \rho_0 \end{bmatrix} \begin{bmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \\ \vdots \\ \phi_k \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \\ \vdots \\ \rho_k \end{bmatrix} \quad (14)$$
where $\rho_k = ACF(l, k)$, and the PACF at lag k is the element $\phi_k$ of the solution vector.
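The ACF and Yule–Walker PACF described above can be sketched directly in NumPy (the function names and the toy series are illustrative assumptions):

```python
import numpy as np

def acf(l, k):
    """Autocorrelation at lag k: lag-k autocovariance over the variance."""
    m = l.mean()
    gamma_k = np.sum((l[k:] - m) * (l[:len(l) - k] - m)) / (len(l) - k)
    gamma_0 = np.sum((l - m) ** 2) / len(l)
    return gamma_k / gamma_0

def pacf(l, k):
    """Partial autocorrelation at lag k by solving the Yule-Walker
    system; phi_k is the last element of the solution vector."""
    rho = np.array([acf(l, i) for i in range(k + 1)])
    R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
    phi = np.linalg.solve(R, rho[1:k + 1])
    return phi[-1]

t = np.arange(24 * 14)
daily = np.sin(2 * np.pi * t / 24)  # toy series with a pure 24 h cycle
print(round(acf(daily, 24), 3))     # near 1: strong lag-24 correlation
```

On a real load series, peaks of the ACF at multiples of 24 and 168 h, together with the PACF cutoff after removing intermediate lags, point to the window width worth keeping.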

2.2.2. External Features from Datetime and Meteorological Information Variables

(1) Datetime variables
The addition of external variables is an important method used to improve the prediction performance of time series forecasting. Although load features provide a continuation of the pattern, the load series also have periodic changes from human activity patterns. For instance, there is a clear difference in actual loads between workdays and weekends, as well as between the influence of predictable social activities such as holidays. This explicitly known information, which is marked by time, is difficult for algorithms to accurately capture in the samples obtained after the window sliding process through only single load features. However, flexible modeling methods and the data-driven characteristics of machine learning allow one to customize the introduction of this time axis information as datetime factors or datetime features.
The time-related datetime factors used to introduce the features are shown in Table 1.
First, four types of datetime variables based on the date of the forecast day, f1–f4, were added to describe the chronological position of the forecast within the year, month, and week. Second, the four variables are considered in two forms: integers and sine–cosine pairs (SCPs) [22,23], denoted ifs and sfs. The former, ifs, are introduced directly into the dataset as integer data, while the latter, sfs, account for the periodicity of the variables by converting each into a pair of features of the form [sin(x/T), cos(x/T)], as shown in Table 1. Last, two Boolean variables, bf1 and bf2, were introduced to mark weekends and holidays, with the value True on a holiday or weekend.
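A minimal sketch of these three encoding forms (the exact feature names, cycle lengths, and holiday set are illustrative assumptions, not the paper's definitive feature set) could look like:

```python
import datetime as dt
import numpy as np

def datetime_features(day, holidays=()):
    """Encode a forecast date as integer (if), sine-cosine pair (sf),
    and Boolean (bf) features, following the three forms of Table 1."""
    doy, month, dow = day.timetuple().tm_yday, day.month, day.weekday()
    return {
        "if_doy": doy, "if_month": month, "if_dow": dow,  # integer form
        "sf_doy_sin": np.sin(2 * np.pi * doy / 365),      # SCP form keeps
        "sf_doy_cos": np.cos(2 * np.pi * doy / 365),      # cyclic adjacency
        "sf_dow_sin": np.sin(2 * np.pi * dow / 7),        # (day 365 is next
        "sf_dow_cos": np.cos(2 * np.pi * dow / 7),        #  to day 1)
        "bf_weekend": dow >= 5,                           # Boolean form
        "bf_holiday": day in holidays,
    }

f = datetime_features(dt.date(2023, 1, 1), holidays={dt.date(2023, 1, 1)})
print(f["bf_weekend"], f["bf_holiday"])  # True True (a Sunday holiday)
```

The SCP form is what lets the model see that 31 December and 1 January are neighbors, which the plain integer encoding cannot express.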
(2) Meteorological variables
The meteorological factors are regarded as another major category of features that correlate strongly with electricity demand. Seasonal climate change and human life rhythms are coupled in time, and the cyclicality of meteorology is related to agriculture and industrial production as well. Moreover, meteorological factors such as temperature, humidity, and precipitation, which come from chaotic systems [24,25,26], directly affect the rhythm of residential electricity consumption. In cases of extreme heat and extreme cold, people use more electrical equipment to control the ambient temperature. Extreme weather such as heavy precipitation and blizzards also changes daily electricity habits, because people's daily lives are disturbed by multiple factors [27].
The following 14 meteorological data are initially considered as external features in this research, as shown in Table 2. The Description columns of the table describe in detail the meaning of each variable. Additionally, according to their physical, geophysical, and meteorological meaning, the variables were divided into five categories, indexed from A to E.
However, as mentioned, most meteorological information comes from complex chaotic systems. Based on the time logic of day-ahead electricity load forecasting, these factors occur simultaneously with the forecast target, meaning that simultaneous, accurate small-scale meteorological information is not feasible in engineering applications. Therefore, the meteorological features adopted in this paper need to be resampled into larger-scale features with higher forecasting feasibility. All the raw data in the table are real-time data with an hourly time resolution. Considering the bias of forecasted information in the actual application process, that is, that future predicted information will not be completely accurate, this research resampled the data into three types at daily intervals: the mean, maximum, and minimum of each factor. In the implementation process, the meteorological factors were taken as one of the feature dimensions, and a model was then built to study them based on the historical load and datetime features.
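The daily resampling of hourly meteorological series into mean, maximum, and minimum features can be sketched as follows (the helper and the toy temperature series are illustrative assumptions):

```python
import numpy as np

def daily_stats(hourly):
    """Resample an hourly meteorological series (length divisible by 24)
    into daily mean/max/min features, as done for the raw weather data."""
    days = np.asarray(hourly, dtype=float).reshape(-1, 24)
    return days.mean(axis=1), days.max(axis=1), days.min(axis=1)

# Three identical toy days with a sinusoidal temperature profile.
temp = np.tile(10 + 5 * np.sin(2 * np.pi * np.arange(24) / 24), 3)
mean, mx, mn = daily_stats(temp)
print(mean, mx, mn)  # daily means 10, maxima 15, minima 5
```

Daily aggregates are far easier for a weather service to forecast a day ahead than hourly values, which is why this coarser representation is feasible in application.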

2.3. Training Methods for Data-Driven Models

Traditional model-driven time series algorithms applied to load forecasting, including ARIMA, ETS, and Prophet, first decompose series into components such as the trend term and the cycle term under configured parameters. Then, the future energy trend is expanded according to the existing pattern, as shown in Figure 3a, where the T.D. is the load of today and the D.H. is the day-ahead load to forecast. These methods can follow the human logical model to extrapolate the electric load sequence according to a historical pattern. The new load value points are extended on the original trend; however, the potential long-term pattern requires a longer series. An over-sized series will also amplify random chaotic errors, making it difficult for the model to identify key factors that are significant in the long-term history.
For general machine learning methods, there is a fixed forecaster built from the historical dataset, which could be applied in all future loads. As Figure 3b shows, whatever the distance between the series to be forecasted and the fitted series, they share the same single forecaster without the updated dataset. The trends and periodicity can be extracted from the historical dataset; however, the method cannot perceive the changes at times close to the forecasting day.
To incorporate nearby changes into the boosting models, this manuscript designed three updating training methods that can build more forecasters on different days.
The grow method rebuilds a new forecaster for each power load forecast, with each forecaster fitted on a dataset extended to the newest samples, as shown in Figure 3c. For example, the (k + 1)th forecast comes from a model trained on samples from the first day of the historical load to day k − 1. The fixed method instead rebuilds the new forecaster from the newest samples within a fixed length, rather than from the entire load series.
The dis method (short for distance) constructs a similar subset for each forecaster; the subset contains only series similar to the load yet to be forecasted. The similarity here is defined by the distance between the newest daily load features of the day to predict and the samples in the historical dataset, as shown in Figure 3d. The distance in this manuscript is the Minkowski distance over the multifeature characters of the electrical loads, calculated as follows [8]:
$$d\left(l_{frcs-1},\ l_{his}\right) = \sqrt[p]{\sum_{h=1}^{H} \left| l_{frcs-1}(t=h) - l_{his}(t=h) \right|^p} \quad (15)$$
where $l_{frcs-1}$ is the load series of the day before the day to be forecasted; $l_{his}$ is a historical daily load in the dataset; H = 24 is the number of daily samples in each series pair; and p denotes the degree parameter of the Minkowski distance, set to 2 in this manuscript.
After the distances are calculated, a fixed number of the closest samples is taken as the subset for fitting the forecaster. In this manuscript, the sample count was set to 365 to allow comparison with the vanilla methods.
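A minimal sketch of this distance-based subset selection (the function and the toy history are illustrative assumptions; the real method pairs this selection with the boosting forecaster) could be:

```python
import numpy as np

def similar_subset(day_before_forecast, hist_days, n_keep, p=2):
    """Rank historical days by the Minkowski distance between their
    24-point daily profile and the profile of the day preceding the
    forecast day; keep the n_keep closest days as the training subset."""
    diffs = np.abs(hist_days - day_before_forecast)        # (days, 24)
    dist = np.power(np.sum(diffs ** p, axis=1), 1.0 / p)   # per-day distance
    return np.argsort(dist)[:n_keep]

rng = np.random.default_rng(0)
hist = rng.normal(1000.0, 50.0, size=(400, 24))    # toy history of profiles
query = hist[123] + rng.normal(0.0, 1.0, size=24)  # near-copy of day 123
idx = similar_subset(query, hist, n_keep=28)
print(int(idx[0]))  # 123: the near-duplicate day ranks closest
```

With p = 2 this reduces to the Euclidean distance between daily profiles, so the training subset concentrates on historical days whose shape resembles the day just before the forecast.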
In addition, this research also designed a method with a smaller subset to observe the differences: the in28 method uses a training dataset of 28-day (four-week) samples, with each training set comprising the data of the four weeks before the date to be predicted.

2.4. Dataset, Estonian Power Load, and Meteorological Information

Estonia has a population of about 1.3 million with a clear urban–rural divide. A total of 78% of the population lives in urban areas, with the four major cities of Tallinn, Tartu, Narva, and Pärnu accounting for more than 45% of the population. Rural populations live in the southern and eastern regions of Estonia, where they work in agriculture, forestry, petrochemicals, and mining. The industrial distribution of Estonia is characterized by a clear regional imbalance, with most industries concentrated in the Harju and Ida-Viru counties. Harju, which includes the capital city, Tallinn, is the economic center of Estonia and its most developed industrial region, with a focus on information technology, manufacturing, and services. Ida-Viru is Estonia's industrial heartland and one of the largest petrochemical bases in Europe.
The dataset used in this study is the real-time power load of Estonia from Elering AS [28] and ENTSO-E [29]. The data, as shown in Figure 4, are directly collected from the SCADA system with a sampling time resolution of 1 h. Due to the adoption of daylight-saving time in Estonia and Europe, defects may occur at the specific transition time points. The time was converted to standard UTC+2 to ensure that no samples are factually missing.
To minimize the impact of the COVID-19 pandemic on the study, the time span was chosen to range from 1 December 2020 to 1 July 2023. The experimental data and the insights drawn in the study were obtained from 1 December 2020 to 31 December 2022, and the final evaluation of the conclusions was performed on the first 200 days of 2023.
Figure 5a shows the electricity consumption proportion by industry [30]. The electricity demand from industries dominated by manufacturing is the largest part, accounting for 24.7% of the total; this electricity is used for production and processing. Household demand is the second largest, accounting for about 23.1%, used for lighting, heating, cooking, and washing. The service sector accounts for about 21.6%, mainly for commercial, office, medical, and educational purposes. Heat supply is also an important part of electricity demand, accounting for 15.2%, and varies greatly between seasons due to Estonia's cold, long winters and short, warm summers. The transmission of electricity through the grid inevitably results in losses, which Estonia covers by buying electricity from the Nordic electricity exchange. In 2021, losses accounted for about 5% of the total electricity supplied to the main Estonian grid [28].
The regional electricity demand encompasses a broad area between latitudes 57.3° and 59.5° north and longitudes 21.5° and 28.1° east. Therefore, we incorporated nationwide sampling information from the 15 counties (maakonnad) of Estonia into the model, as shown in Figure 5b. The meteorological information used in the study was obtained from the CERRA, ERA5, and ERA5-Land systems of the European Centre for Medium-Range Weather Forecasts (ECMWF) [31]. As shown in Figure 5b, the large-scale meteorological variables can, basically, represent the meteorological conditions in the inland, coastal, mountainous, and island areas of Estonia. All the raw data in the table are real-time data with an hourly time resolution. They have been resampled into three types of daily-interval data as the mean, maximum, and minimum of each feature.
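For illustration, the hourly-to-daily resampling described above can be sketched in a few lines (assuming a pandas time series; the values and the column name here are synthetic, not the actual ECMWF data):

```python
import numpy as np
import pandas as pd

# Illustrative hourly meteorological series (48 h of synthetic values).
idx = pd.date_range("2021-01-01", periods=48, freq="h")
hourly = pd.Series(np.arange(48, dtype=float), index=idx, name="t2m")

# Resample into daily intervals as the mean, maximum, and minimum.
daily = hourly.resample("D").agg(["mean", "max", "min"])
print(daily)
```

The same call applies unchanged to each of the raw meteorological columns.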

2.5. Baseline Models and Evaluation Metrics

2.5.1. Baseline Models

Six different comparative models are used in this manuscript to forecast the load demand in the Estonian grid, including four model-driven models and two RNN models:
(1) TRAPUNTA model (EU)
TRAPUNTA [32], short for Temperature Regression and Load Projection with Uncertainty Analysis, is a model-driven method developed by Milano Multiphysics for ENTSO-E. The model is based on a complex methodology for electric load projection analysis, which contains regression, model order reduction, and uncertainty propagation.
TRAPUNTA is currently used by ENTSO-E, Med-TSO, and various TSOs across Europe and North Africa to conduct adequacy studies and scenario building for European and North African countries. The forecasted values in this manuscript refer to the results released on the ENTSO-E transparency platform.
Three additional model-driven forecasting methods (SARIMA, exponential smoothing, and Prophet) were considered as baseline models. These algorithms require longer series as input data for better performance; therefore, for each day-ahead forecast, the models were built on the data from up to four weeks before the sample and predict the next 48 points, of which the last 24 points comprise the day-ahead values.
(2) Seasonal autoregressive integrated moving average models (SARIMA)
Seasonal autoregressive integrated moving average (SARIMA) models are classical statistical models that are widely used for time series forecasting. They are composed of autoregressive, moving average, differencing, and seasonal components. There are six parameters that need to be considered: p, d, and q, and P, D, and Q for the seasonal part.
This paper implemented cross-validation to realize the SARIMA model. This method estimates the performance of the model using the Akaike information criterion (AIC) and selects the optimal parameter configuration using a stepwise procedure. For the specific parameters, a search range from 1 to 5 was set for all orders, including the seasonal P and Q and the non-seasonal p and q, and the order of the differencing transformation applied before the SARIMA model was limited to 10 in the search.
(3) Exponential smoothing models (ETS)
Exponential smoothing models are efficient time series forecasting models that use exponential smoothing to estimate model parameters. They are suitable for a variety of time series data, including power load data forecasting, which describes the series with trend, seasonal, and residual terms and then makes the forecasts.
This paper used an automated time series method for ETS that also uses the AIC to configure parameters. The error, trend, seasonal, and damped-trend parameters were configured for the model, with 1000 as the maximum number of iterations in the automated search.
(4) Prophet models (PRF)
Prophet, also referred to as Facebook Prophet or Meta Prophet, is an open-source time series forecasting model developed by Meta (Facebook). It is a non-parametric model that can be used to forecast a variety of time series data, including electric load data. The basic principle of the Prophet model is to decompose time series data into three components (trend, seasonality, and noise) and then use these components to generate forecasts [33,34].
In this paper, the Prophet model configuration considered weekly seasonality and set the holiday impact range to 10. The changepoints used a prior scale of 0.05 and a changepoint range of 0.9. A larger number of Monte Carlo samples was also used to improve the accuracy of the model as much as possible.
Additionally, this research adopted recurrent neural networks (RNNs) as the baseline models among the data-driven machine learning algorithms. RNNs are well established for forecasting time series and can capture long-term dependencies in time series data. Electric load data exhibit clear seasonality and trends, making RNNs well suited to capture these relationships. LSTM and GRU were adopted as the baseline algorithms.
(5) Long short-term memory model (LSTM)
LSTMs are a type of RNN with a gating mechanism. They consist of three gates (the forget, input, and output gates) and a memory cell. The gates control the flow of information in the LSTM model, mitigating the gradient vanishing and gradient explosion problems.
(6) Gated Recurrent Unit (GRU)
GRUs are a simplified version of LSTMs with fewer parameters and a faster training speed. They consist of two gates: the reset gate and the update gate. The reset gate controls whether the information from the previous state needs to be retained, and the update gate controls how to combine the new input with the information from the previous state.
The research follows the design patterns in [35,36] to build two RNN models with high recognition performance. Both models are implemented using TensorFlow and contain two RNN layers (LSTM or GRU) with 64 hidden units and a dropout rate of 0.2 before the dense layer that outputs the final predicted values. The loss function of the models is the MSE, and the optimizer is Adam. The training is completed in 100 epochs with a batch size of 128. In addition, 5% of the data in the 2021 training dataset are configured as a validation set to evaluate the performance of the models. The trained RNN models are then applied to the remaining data to produce the forecasted values.

2.5.2. Evaluation Metrics

(1) Basic metrics
Initially, MAE and MAPE were used as the basic evaluation metrics for the forecasting performance of the models [37].
Mean absolute error (MAE) measures the error between the forecasted values and the corresponding actual load values as follows:
MAE = \frac{\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|}{n}
where $y_i$ is the actual load value at time $i$; $\hat{y}_i$ is the value forecasted for the same time as $y_i$; and $n$ is the count of samples in the dataset.
The MAE can be used to directly show the difference between the forecasted and actual loads. However, since the magnitude of the daily values differs, the MAE within a single day is useful for comparing models; meanwhile, the difference between days needs to be measured using the mean absolute percentage error (MAPE), which has a unified scale. The MAPE was calculated as follows:
MAPE = \frac{1}{n} \times \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
where the difference between the actual and forecasted values is divided by the actual value. The absolute value of this ratio is summed over every forecasted point in time and divided by the number of fitted points $n$.
In this manuscript, the metric 1-MAPE is applied for better readability, acting as an accuracy metric that ranges from 0 to 100%: the larger the number, the better the forecasting performance. MAEs of different scales were obtained by adjusting the denominator $n$ according to the described scenario, such as the error in a single day or over the whole validation dataset.
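The two basic metrics can be implemented directly; the following minimal sketch (with illustrative load values) mirrors the definitions above:

```python
def mae(y_true, y_pred):
    """Mean absolute error between actual and forecasted loads."""
    return sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no zero actual values."""
    return sum(abs((a - f) / a) for a, f in zip(y_true, y_pred)) / len(y_true)

actual = [900.0, 1000.0, 1100.0]      # illustrative hourly loads (MW)
forecast = [880.0, 1050.0, 1100.0]
print(mae(actual, forecast))          # mean of |20|, |50|, |0|
print(1 - mape(actual, forecast))     # the "1-MAPE" accuracy-style score
```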
(2) Metrics for peak time
For the peak time of daily load, three additional derivative metrics were designed for a comprehensive evaluation.
The peak time of the load in a single day is defined as $T_{PL}$, and the peak load time forecasted by the model is $T_{HPL}$. When the absolute difference between the two, $\Delta$, is less than or equal to $\kappa$ ($\Delta \le \kappa$), the model is considered to have successfully captured the peak time on that day within a tolerance of $\kappa$; then, the accuracy of the peak time can be calculated as follows:
ACPT_{\kappa} = \frac{1}{N} \times \sum_{i=1}^{d} I\left( \left| T_{HPL} - T_{PL} \right| \le \kappa \right)
where $d$ is the count of the days in the dataset, which sums to $N$ daily samples, and $I(\cdot)$ is the indicator function. $ACPT_{\kappa}$ is the average metric used to measure the ability of the models to capture the peak load time.
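A minimal sketch of the $ACPT_{\kappa}$ computation (with hypothetical daily peak hours) could look as follows:

```python
def acpt(peak_actual, peak_forecast, kappa=1):
    """Share of days on which the forecasted peak hour lies within
    kappa hours of the actual peak hour (ACPT_kappa)."""
    hits = sum(1 for t_pl, t_hpl in zip(peak_actual, peak_forecast)
               if abs(t_hpl - t_pl) <= kappa)
    return hits / len(peak_actual)

# Actual vs. forecasted peak hours for five illustrative days.
t_pl = [18, 19, 18, 8, 20]
t_hpl = [18, 18, 20, 8, 17]
print(acpt(t_pl, t_hpl, kappa=0))  # exact "on time" capture rate
print(acpt(t_pl, t_hpl, kappa=1))  # rate within a 1 h tolerance
```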
MAETPL is defined, using the MAE form, as the difference between the actual peak load value and the maximum of the forecasted series in a single day, as follows; moreover, MAEMAX is defined as the absolute difference between the actual and forecasted load values at the actual peak time, as follows:
MAE_{TPL} = \frac{\sum_{d=1}^{N} \left| l_{actu}(t = T_{pl}(d)) - \max\left( l_{frcs}(t_d) \right) \right|}{N}
MAE_{MAX} = \frac{\sum_{d=1}^{N} \left| l_{actu}(t = T_{pl}(d)) - l_{frcs}(t = T_{pl}(d)) \right|}{N}
where $l_{actu}(t = T_{pl}(d))$ is the actual load value at the actual peak time (i.e., the daily maximum of the actual series), $l_{frcs}(t = T_{pl}(d))$ is the forecasted value at that time, and $l_{frcs}(t_d)$ is the forecasted load series with 24 outputs in a single day.
$MAE_{TPL}$ can measure the predictive ability of the model for the expected maximum capacity in the future grid system. $MAE_{MAX}$ represents the model’s ability to capture the true actual peak demand when applied.
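Both peak-value metrics can be sketched together (the daily series are shortened to 4 points here purely for illustration; the real data have 24):

```python
def peak_metrics(actual_days, forecast_days):
    """Return (MAE_TPL, MAE_MAX) over lists of daily load series.

    MAE_TPL: |actual peak value - forecasted daily maximum|, averaged.
    MAE_MAX: |actual - forecast| at the actual peak hour, averaged.
    """
    n = len(actual_days)
    mae_tpl = mae_max = 0.0
    for actu, frcs in zip(actual_days, forecast_days):
        t_pl = max(range(len(actu)), key=lambda t: actu[t])  # actual peak hour
        mae_tpl += abs(actu[t_pl] - max(frcs))
        mae_max += abs(actu[t_pl] - frcs[t_pl])
    return mae_tpl / n, mae_max / n

actual = [[1.0, 3.0, 2.0, 1.0], [2.0, 2.0, 4.0, 1.0]]
forecast = [[1.0, 2.5, 2.8, 1.0], [2.0, 3.5, 3.0, 1.0]]
print(peak_metrics(actual, forecast))
```

The two metrics differ only in whether the forecast is evaluated at its own maximum or at the actual peak hour.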

3. Results

3.1. Historical Load Width from the Periodicity

3.1.1. Periodicity of Power Load Series

Figure 6 shows the results of the DFT for the actual load in the Estonian electricity system, with sampling periods of 365, 730, 180, and 90 days in the training or train-validation dataset. The four figures only show the top 10 components of each sampling period, ranked in descending order. The x-axis is the period in hours, and the negative frequencies were ignored; the y-axis is the magnitude of the strength in the frequency domain. The gray bars are the periods based on the total sampled length, the blue bars are the periods based on a single day, and the red bars are the periods that need attention.
In addition to the grey multiple harmonics of the sampling period, such as 4380 (1/2) and 2190 (1/4) for 8760 in Figure 6a, all four figures show clear human-related frequencies. The blue 24 h bar and its corresponding half-harmonic of 12 h are the strongest single-day components among the figures. The periods in the range from 166 h to 168 h contain the second largest components, which are colored red and correspond to a period of 7 days. The FFT decomposition confirms that the electricity load in the Estonian grid exhibits multiple nested periods when the sampling period is long enough. Apart from the frequency octave components of the sampling range generated by the FFT itself, the useful periods are closely related to periods of both 1 day and 7 days.
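This period screening can be reproduced with NumPy's FFT on a synthetic load carrying the same 24 h and 168 h cycles (a sketch; the amplitudes and length are arbitrary, and 52 full weeks are used so both periods fall on exact frequency bins):

```python
import numpy as np

# Synthetic hourly "load" with daily (24 h) and weekly (168 h) cycles.
n = 24 * 7 * 52                      # 52 full weeks of hourly samples
t = np.arange(n)
load = 800 + 120 * np.sin(2 * np.pi * t / 24) + 60 * np.sin(2 * np.pi * t / 168)

# Magnitude spectrum of the de-meaned series.
spec = np.abs(np.fft.rfft(load - load.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)    # in cycles per hour

top = np.argsort(spec)[::-1][:2]     # two strongest components
periods = sorted(1.0 / freqs[top])   # convert frequencies to periods (h)
print(periods)                        # -> [24.0, 168.0]
```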
Figure 7 is the ACF and PACF values of the electricity load sequence from 1 January 2021 to the end of the year, with 24 × 21 lags. Figure 7a,c shows the ACF and PACF values sorted by lag index. Figure 7b,d shows the top 20 ACF/PACF values in descending order.
The results of the ACF clearly show the correlation between a single time point and the t-k time points. Figure 7a shows multiple peaks spaced 24 time points apart, indicating the high autocorrelation of any time point with the past 24 × k points. As shown in Figure 7b, apart from the strong intraday autocorrelation (t-1 to t-24), the correlation is also significant from the t-165 to t-170 sampling points, as shown by the red bars. The PACF shows the correlation after removing the influence of the intervening points. In Figure 7c, there is a clear peak just before t-168. After this, the PACF values start to decrease, indicating that the influence of the sequence has significantly decreased. Figure 7d ranks the PACF values and shows that, apart from the nearby intraday points, only lags 168 and 169 are significant among the top 20 PACF values.
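The lag structure can be checked with a hand-rolled ACF (a sketch on synthetic data; statsmodels provides equivalent acf/pacf routines):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 1..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)])

# Synthetic hourly load with a dominant 24 h cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(24 * 60)
load = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

r = acf(load, max_lag=48)
print(int(np.argmax(r)) + 1)          # strongest lag, expected at 24
```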
Furthermore, this study considered the correlation of the total electricity demand within a single day in the analysis. Figure 8 shows the ACF (Figure 8a) and PACF (Figure 8b) results of the series obtained using the resampled summing of the load demand of a single day.
The ACF value of d-7 in Figure 8a is the second highest after d-1 (the previous day), which indicates that the autocorrelation is still strong between the total load demand of any single day and that of the seventh day before it. However, the PACF values in Figure 8b reflect strong effects on the two days (d-6 and d-8) surrounding d-7. Compared to the finer time mesh analysis shown in Figure 7, the analysis of the total electricity demand within a day also reveals a strong 7-day cycle in the load itself.
The CART (the meta model of the boosting algorithm) and boosting algorithms, as cited in Section 2.1, were applied to build models using the actual load data in the Estonian system. Figure 9 shows the feature importance values (FIVs) of the modeling, and Figure 9b,d shows the historical load features with the top 10 feature importance rankings for the two models. The models were validated on the validation dataset for the entire year of 2022, where the MAEs of CART and XGBoost are 71.00 and 53.12, respectively. The boosting algorithm has better performance under the condition that the training error is almost at its lower limit.
The CART model’s important features ranged from d-6 to d-7, as shown in Figure 9a,b. This is especially so for t-152, with over 60% of the FIV across all features; as shown in Figure 9b, it almost dominates the overall trend of the CART model for load forecasting. The CART tree takes t-25, the feature closest in time to the load to be forecasted in the day-ahead style, as the second highest gain feature. The extreme gradient boosting algorithm has similar results to the CART model. The boosting model also focuses on the features around t-152; however, it is not the absolutely dominant feature in Figure 9c,d. Indeed, t-152, t-151, t-153, t-156, t-162, and t-161, ranging from d-6 to d-7, occupy six positions in the top 10 FIVs; moreover, the remaining features basically fall within the time range from d-1 to d-2, including t-25, t-26, and t-27, which are close in time to the series to be forecasted.
Based on the FFT, ACF, PACF, and FIV analyses, the optimal feature length was determined to be around d-7 relative to the day whose load is to be forecasted.

3.1.2. Window Sliding Width of Load Variable

The effectiveness of modeling using the electricity load demand of Estonia was explored with a 1 d basic cycle. The window widths range from 1 × 24 (d-2) to 30 × 24 (d-31), and the training data cover the entire year of 2021, including some data from the end of 2020, to meet the requirements of the sliding window. The boosting model was fitted without parameter tuning and was validated using the 2022 full-year dataset.
Figure 10 shows the validation results. The x-axis of the figure is the window width accumulated in units of 24 h. The y-axis is the MAE calculated from the model obtained by using the boosting algorithm to forecast each corresponding single-day load of 2022.
From the results of the MAE in Figure 10, it is clear that there is a local optimum at d-7. In the range from d-2 to d-7, the MAE decreases rapidly, indicating that the difference between the forecasted values and the actual data gradually decreases; it then reaches the lowest value at d-7, i.e., the optimal width. Subsequently, in the range from d-7 to d-31, the MAE increases slightly and stabilizes within a certain range. The MAE value there does not change significantly and has reached a de facto stable state.
The decrease in MAE in the early stage of window widening can be explained by the fact that, as more historical load data are introduced, the algorithm can continuously incorporate more effective information to improve the forecasting ability of the model. However, such improvement is not endless, and, at d-7, the historical information is already sufficient due to the highly cyclical and periodic nature of electrical demand. A wider window actually introduces redundant information that is irrelevant to the forecasts, which effectively creates a multicollinearity effect [38,39,40].
Based on the results of the validation set, and combined with the conclusions on the cyclical and periodic nature of the load in Section 3.1.1, a time width of 7 d can be chosen for the historical load series, which provides sufficient information while avoiding irrelevant interference. This width can fully integrate the historical trends without including redundant tail information, reaching MAE = 52.04 in the validation dataset. Therefore, the authors of this study chose lag = 7 d (24 × 7 h-points) as the modeling length of the historical load features.
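The chosen 7-day window translates into a supervised-learning matrix as follows (a sketch with dummy data; the actual pipeline additionally appends the datetime and meteorological columns discussed later):

```python
import numpy as np

def make_lag_features(load, width_days=7, horizon=24):
    """Build (X, y) for day-ahead forecasting: each row of X holds the
    previous width_days * 24 hourly loads; each row of y the next 24."""
    w = width_days * 24
    X, y = [], []
    for start in range(0, len(load) - w - horizon + 1, horizon):
        X.append(load[start:start + w])
        y.append(load[start + w:start + w + horizon])
    return np.array(X), np.array(y)

load = np.arange(24 * 30, dtype=float)   # 30 days of dummy hourly load
X, y = make_lag_features(load)
print(X.shape, y.shape)                   # (23, 168) (23, 24)
```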

3.2. External Information Features

3.2.1. Datetime Features

The research in this study uses the extreme gradient boosting algorithm with the same hyperparameters and then fits and validates the resulting models on the same training and validation datasets. The results for the ifs, sfs, and bfs are shown in Figure 11a, where the vanilla model, V1, is the pure historical load fitting-prediction model described in Section 3.1, without any external variables added.
Based on the experimental results in Figure 11a, firstly, most datetime variables improve the model after being introduced. The best positive effect comes from the addition of if3, which reduces the MAE from 52.04 to 50.06. The second-best improvement comes from bf1, which essentially carries the same weekly information as if3, in strong agreement with the 7-day periodicity conclusion in Section 3.1. Secondly, the comparison between the ifs and sfs shows different effects for different types of datetime variables: the SCP results for f1 and f3 are worse than the integer ones, while f2 and f4 show the opposite, though with only a slight improvement. Moreover, f2 and f4 mark the position of the datetime point in the year to emphasize a larger-scale periodicity. Last, the addition of bf2 slightly improved the model’s performance even though it has no overlap with the other datetime features.
The results of adding all variables of each type are shown in Figure 11b. The ifs, sfs, and bfs in the figure represent the integer, SCP, and Boolean variables, respectively. First, after adding datetime features, all combinations show a significantly better effect than the V1 model in terms of MAE. The best combination is integer plus Boolean (ifs + bfs), which reduces the MAE to 50.18 compared to the vanilla value of 52.04 but is still not as good as if3 alone at 50.06, as shown in Figure 11a. This is related to the mutual interaction of the added variables in the data. Second, the sine–cosine pair combinations are generally not as good as the integer ones. When Boolean features are not added, the two differ by almost 1.00 in MAE (50.60 vs. 51.57); moreover, after Boolean features are added, the gap narrows to 0.6 (50.18 vs. 50.72). Last, the research shows that all results improved somewhat after the Boolean features were added.
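The three encoding families can be sketched for a single date as follows (the if/sf/bf names mirror the paper's grouping, but this helper and its holiday set are purely illustrative):

```python
import math
from datetime import date

def datetime_features(d, holidays=frozenset()):
    """Integer, sine-cosine pair (SCP), and Boolean encodings of one date."""
    dow = d.isoweekday()                        # integer day of week, 1..7
    doy = d.timetuple().tm_yday                 # integer day of year
    sf3 = (math.sin(2 * math.pi * dow / 7),
           math.cos(2 * math.pi * dow / 7))     # SCP form of the weekly cycle
    bf1 = dow >= 6                              # Boolean weekend flag
    bf2 = d in holidays                         # Boolean statutory holiday flag
    return {"if3": dow, "if4": doy, "sf3": sf3, "bf1": bf1, "bf2": bf2}

feats = datetime_features(date(2022, 12, 24), holidays={date(2022, 12, 24)})
print(feats["if3"], feats["bf1"], feats["bf2"])   # 6 True True
```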
These results show the special mutual influence of datetime features: even when features with the same essence but different types are added, the validation results remain complex. The research therefore validated all 324 (3 × 3 × 3 × 3 × 2 × 2) combinations and obtained the top five results, shown as the red bars in Figure 12.
Firstly, the difference in MAE among the five optimal combinations is not large; however, compared to V1 (blue bar), the simple combination ifs + bfs (purple), and the single added feature if3 (orange), the best combinations achieve a significant or nearly significant improvement of 0.4–2.5. Secondly, all the optimal combinations rely on the concept of a week in some way, whether through bf1, if3, or sf3, which can be considered the key effective information and is fully consistent with the 7-day periodicity results in Section 3.1. Last, all the results contain the bf2 feature, which is related to the introduction of key statutory holidays that are unrelated to the other information.
The conclusion shows that the combination of sf3 + if4 + bfs is the best, with MAE = 49.6038, which is 4.69% lower than the vanilla. Further conclusions will be based on this as vanilla2, or V2.

3.2.2. Meteorological Information Combination Features

The same boosting method was configured to fit the five groups of meteorological variables separately based on V2. The results of the validation dataset are shown in Figure 13, where all is the result of adding all the data in Table 2.
First, it is obvious that introducing all external features can significantly improve the forecasting performance of the model. Compared with V2, after adding the five groups of features from the 15 sampled sites, the MAE is further reduced from 49.60 to 44.84, a substantial improvement. Second, not all individual features have a positive effect on the model. The Bs and Es groups have a negative impact, while the As, Cs, and Ds groups can each effectively improve the performance individually. Among them, the As group shows the most obvious improvement, with an MAE of 45.20, which is almost the same as that when all features are introduced.
This study conducted a comprehensive combined exploration according to the information provided in Table 2. A five-bit binary code was used to represent each combination. For example, 10001 represents the combination of As and Es, while 00000 represents V2, the model without any external meteorological variables. The combination diagram in Figure 14 shows the results of introducing all 32 (2^5) combinations of external variables. The model itself remained the same to ensure controlled variables.
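Enumerating the 32 masks is straightforward (a sketch; refitting and scoring the boosting model for each mask is omitted):

```python
from itertools import product

# Five meteorological groups: temperature, pressure, precipitation,
# radiation, and wind, in the bit order used by the five-bit code.
GROUPS = ["As", "Bs", "Cs", "Ds", "Es"]

def all_combinations():
    """Yield (code, selected groups) for every five-bit mask, e.g. '11110'."""
    for bits in product("01", repeat=5):
        code = "".join(bits)
        yield code, [g for g, b in zip(GROUPS, bits) if b == "1"]

combos = dict(all_combinations())
print(len(combos))        # 32 combinations in total
print(combos["10001"])    # ['As', 'Es']
print(combos["00000"])    # [] -> V2, no meteorological features
```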
According to the MAE results in Figure 14, the optimal external variable combination is 11110, with MAE = 44.33, obtained by adding four categories of external meteorological information, namely, temperatures (As), pressure (Bs), precipitation (Cs), and radiation (Ds), as external variables based on V2. Compared with V2, the MAE is reduced by 10.63%; compared with V1, which uses only load features, the MAE is reduced by 14.82%. Regarding the omitted variable, Es (winds), the results show that the introduction of the Es information has a negative impact on the forecasting in 13 of the 16 pairs of results, as marked with arrows in Figure 14. Therefore, including the variable Es leads to a worse MAE.
In Figure 15, Tartu, Põlva, Haapsalu, and Rapla are clearly high-weight regions in this model, with FIVs of 15.64%, 14.42%, 10.81%, and 6.72%, respectively. Among all meteorological features, these four contribute nearly 58% of the FIVs. In fact, Tartu and Põlva are cities in the southeastern inland of Estonia, Rapla is an inland city in the middle part, and Haapsalu is a coastal city in northwestern Estonia, as shown in Figure 15, where the intensity of the red color represents the FIV value and the stars mark the sampled positions. The meteorological information of these four locations represents the typical climates of Estonia [41].
Regarding the specific meteorological features in Figure 16a, temperature information (a1, a3, and a4) is the largest contributor, of which soil temperature (a4) accounts for almost 66.8% of all external features and clearly dominates the modeling process; meanwhile, air temperature (a1) and apparent temperature (a3) also have a relatively large advantage, and the three together contribute more than 90% of the total FIVs of the external variables. Figure 16b presents the other external features, and it can be observed that dew point temperature (a2) and several solar radiation variables (d1–d4) have a significant impact, while precipitation information, including snowfall, and pressure information actually have a limited impact.
From the analysis results in Section 3.2.1 of the datetime features and in Section 3.2.2 of the meteorological features, the exploratory experimental results show the following:
(1) The datetime features in the modeling can effectively improve the prediction performance of the model under the premise of width = 7; however, the added feature categories and feature types affect the prediction performance differently. In the end, adding a sine–cosine pair form of the day of the week as sf3, an integer form of the day of the year as if4, and Boolean forms of the weekend and holiday information as bfs can effectively improve the forecasting performance of the model based on d = 7;
(2) Overall, the external features can significantly improve the load forecasting performance by using multiregional sampling information over a wide area. Temperature information, especially soil temperature information, plays a key role in the modeling process. Moreover, more accurate modeling requires solar radiation, precipitation, and air pressure information in addition to the temperature information.
In summary, the authors of this study chose d = 7 as the historical width for load features, adding the sf3 + if4 + bfs variables as datetime features and 12 meteorological information variables of four categories in 15 regions as meteorological variables. Further conclusions were based on this as a V3 model.

3.3. Results of Forecasting Performance in Training Methods

The evaluation metrics between the forecasted and actual load in the validation dataset are shown in Figure 17.
Figure 17 shows a violin plot of the boosting model with the same hyperparameters under the six training methods mentioned in Section 2.3, where V3 is the fit-and-forecast training model described in Section 3.2. The extreme forecasting values are marked, and the dotted lines are the Q1 and Q3 of the error distributions. The middle square point is the MAE of each model, as shown in the legend.
As shown by the differences in the 2022 validation dataset in Figure 17, the training methods have a definite impact on the forecasting performance of models with the same hyperparameters. First, larger training datasets yield better improvement. The grow, fixed, and edis methods have significant improvement effects, with MAEs of 44.32 (V3), 37.01 (grow), 38.55 (fixed), and 37.84 (edis), respectively. And in28, which only uses monthly training data, has worse performance, but its MAE of 47.97 is still close to that of V3. Second, from the violin parts of the figure, all methods have similar error distributions, and the shapes of the violin plots are symmetric around y = 0, indicating that the models have no clear tendency to over-forecast or under-forecast. The larger-dataset models, such as grow, fixed, and edis, have better upper bounds for positive errors than V3; however, this does not hold for all the lower bounds for negative errors. Specifically, the fixed model has a smaller upper error bound (281.26 vs. 355.03 for V3), and edis has a smaller lower bound (−268.30 vs. −287.59). The edis model has softer upper and lower boundaries and, therefore, better robustness.
Figure 18 presents the MAE values statistically by month. The three training methods with large training datasets, namely, grow, fixed, and edis, perform better than V3 in the validation dataset. They perform well in every month except August and October, as reflected in the average MAE in Figure 17. Although in28 has overall worse performance, it is not entirely bad and performs better than the vanilla model in February, July, and September.
In conclusion, the authors chose the edis method as the final model. First, it has excellent MAE values and little difference when compared to the optimal grow scheme. Second, it has small upper and lower error margins and a well-distributed error in Q quantiles, which indicates robustness. Third, it has a large dataset, which provides comprehensive information and can be pre-screened by distance to further improve modeling accuracy.
This scheme will be referred to as ours in the remainder of this manuscript. Specifically, based on Section 3.1, the method uses a width of d-7 as the historical load information. Based on Section 3.2.1, the authors chose sf3 + if4 + bfs as the datetime features to help the model locate itself on the historical timeline. Based on Section 3.2.2, it uses the 12 meteorological features in groups As–Ds as auxiliary information. Based on the conclusions of Section 3.3, the authors used the second-order Minkowski distance to pre-screen the evaluation data and then performed tracking modeling.
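A sketch of the distance-based pre-screening step (the feature construction, cutoff k, and random data are illustrative; with p = 2 the Minkowski distance reduces to the Euclidean distance):

```python
import numpy as np

def prescreen_days(history, query, k=60, p=2):
    """Rank historical days by order-p Minkowski distance to the query
    day's feature vector and keep the indices of the k nearest days."""
    dists = np.sum(np.abs(history - query) ** p, axis=1) ** (1.0 / p)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(42)
history = rng.normal(size=(365, 12))   # one feature row per historical day
query = history[100] + 0.01            # a day very similar to day index 100
nearest = prescreen_days(history, query, k=5)
print(nearest[0])                       # 100: the look-alike day ranks first
```

The returned indices then select the subset of historical days used to train the boosting model for that forecast.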

3.4. Comparison with Baseline Models

Based on the results in Section 3.2 and Section 3.3, this study evaluates the performance of the final model and the aforementioned baseline algorithms on a new 2023 dataset. The study uses data from 200 days in 2023 for testing. For more information on the forecasted results for power loads, please refer to Appendix A.

3.4.1. Error between Forecasted and Actual Load

The final method from Section 3.3 is shown in Figure 19, marked as ours. Figure 19a shows the overall mean absolute error of the forecasted day-ahead values in the test dataset of 200 × 24 sample points; Figure 19b shows the percentage of points for each model that are better than the MAE in Figure 19a; Figure 19c is a violin box plot of the 200 × 24 prediction points, whose components have the same meaning as in Figure 17.
Figure 19a shows the overall performance of each model, marked with MAE (1-MAPE%). The final proposal, ours (red bar), has the lowest MAE among all models at 41.69. All four boosting models, V1~V3 and ours, have smaller or similar MAEs, ranging from 41.69 to 66.20, compared with the EU model (purple bar) at 65.03, which is the algorithm applied by ENTSO-E. The effectiveness of the research is demonstrated here.
Figure 19b shows the global robustness of the models as the percentage of forecasted points that perform better than the global 1-MAPE. About 65.48% of the 200 × 24 points were forecasted within the precision of the proposed method, which exhibits more stability than ENTSO-E’s approximately 64.62%. Additionally, LSTM and GRU also perform well on this metric; however, the two RNN models have worse MAEs in Figure 19a, so their robustness does not translate into better power demand forecasting performance.
Figure 19c shows the distribution of the forecasting errors of the models. The proposed model has almost the smallest upper and lower error bounds and quartiles, which indicates that the ours model does not make extreme forecasts. These results corroborate the stability and robustness of the model.
Based on the results of Figure 19, the proposed method has more accurate performance and stable robustness in power demand forecasting along with a significant advantage in overall point-to-point, day-ahead load forecasting.

3.4.2. Peak Load Capture and Maximum Capacity Forecasting

Table 3 shows the forecasting performance of the models in relation to the occurrence time and value of the daily peak load demand. The ACPT κ column represents the accuracy in capturing the peak load time (TPL), where the "on time" column is the proportion of days on which the peak time forecasted by the model is exactly the same as the actual peak load time.
The proposed method has the most obvious advantages in all six metrics in the table. First, the accuracy of capturing the peak load on time reaches 34.5%, which is higher than the 30.0% of the current ENTSO-E algorithm and a significant improvement over the other algorithms. Second, in the forecasting of the peak load measured using MAETPL, the proposed model reached 41.61, which is better than the other models, valued from 80.62 to 148.92, indicating that the model has a strong ability to predict future peak capacity. Last, regarding MAEMAX, all ten models have larger values than the overall MAE in Figure 19a; however, this method still has the lowest value (49.67) compared to the others, which range from 78.83 to 164.56.
Note that the V1~V3 methods in this study are not good at forecasting peak load times, with MAETPL values from 57.21 to 67.97 and MAEMAX values from 61.59 to 76.88. In other words, boosting-based models can predict the maximum capacity, but they cannot capture the peak load time with sufficient accuracy. The proposed method improves on this weakness, as the comparison between V1~V3 and Ours shows.
Based on these results, the designed model has an excellent peak load-handling ability. It can forecast the time of peak occurrence with high accuracy and predict future capacity from the future maximum demand. In addition, even when the predicted peak time is not completely accurate, the algorithm can still meet the demand at the future peak time with a low MAE. This is an excellent result for day-ahead STLF.

3.4.3. Robustness of Models

Figure 20 shows the distribution of single-day forecasting accuracy measured using 1-MAPE. The red squares represent days with 1-MAPE below 90%, which can be considered failed forecasts; blue and green represent accuracies above 90% and 95%, respectively. The right side of the figure shows the proportion of days with 1-MAPE ≥ 90% over the 200-day test dataset.
In Figure 20, the proposed method has the highest forecasting accuracy, with only 10 days falling below 90% 1-MAPE in day-ahead forecasting. Moreover, on 6 of these 10 days, the model still performed better than all other models. The 10 worst daily MAE results are shown in Figure 21.
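The day-level classification behind Figure 20 can be sketched as follows; this is our illustrative reading (day-wise 1-MAPE over 24-hour profiles), not the authors' code.

```python
import numpy as np

def daily_one_minus_mape(actual, forecast):
    """Day-level 1-MAPE for 24-h profiles and the share of non-failed
    days (1-MAPE >= 90%), mirroring the day-wise view of Figure 20."""
    actual = np.asarray(actual, float).reshape(-1, 24)
    forecast = np.asarray(forecast, float).reshape(-1, 24)
    daily_acc = 1.0 - (np.abs(actual - forecast) / actual).mean(axis=1)
    share_ok = (daily_acc >= 0.90).mean()   # proportion of non-red days
    return daily_acc, share_ok
```

A day with a perfect forecast scores 1.0 (green), while a uniform 20% error scores 0.8 and would be marked red as a failed forecast.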
Even on these days with generally inaccurate forecasts, the proposed model maintains a certain usability, showing MAE values close to the best among the other models. It still has the best MAE on 6 of the days and only falls behind on 2 April (EU is the best), 7 April (EU is the best), 4 June (RNN(LSTM) is the best), and 7 June (RNN(GRU) is the best). This fully demonstrates both the accuracy and robustness of the model.

3.4.4. Performance in Extreme Weather and Special Days

Figure 22 shows the day-ahead forecasting for three single days: 2 February, the Ours model's best performance, and 2 April and 7 June, two of the Ours model's worst performances, as shown in Figure 21. In Figure 22a, the forecasted values from the Ours model almost overlap with the actual load, with an outstanding MAE of 12.82. In Figure 22b, there is a sharp fall in the actual load at 12:00, which indicates a sudden event that no model could have foreseen.
This study also analyzed three days with extreme weather that is difficult to forecast. Figure 23 shows the three extreme climate events in Estonia within the first 200 days of 2023 (the range of our test set): 20 January (Figure 23a) was an extremely cold day, 18 July (Figure 23b) was an extremely hot day, and 12 July (Figure 23c) was the day with the highest annual rainfall. The proposed method still achieved better performance on these days, with MAE values of 19.27, 24.99, and 30.38, numerically better than EU and all the other models. The trends in the figure also show that the proposed method can accurately fit the load demand with good adaptability.
Figure 24 shows the days significantly impacted by foreseeable special social activities: Figure 24a is New Year's Day, Figure 24b is Easter Sunday (in Estonian, ülestõusmispühade 1. püha), and Figure 24c is Midsummer Night (jaanipäev). The Ours model has sufficient forecasting ability on all three days. On 1 January, New Year's Day, Ours has almost the same MAE as the best model, EU; on 9 April, only the Ours model forecasted the additional trough between 4 am and 8 pm, indicating that the model retains a certain degree of precise control under sudden changes in electricity demand. Jaanipäev, on 24 June (Figure 24c), was the model's worst-performing day by single-day 1-MAPE. However, even though the result is not good, the MAE is still maintained at 117.04, which is significantly better than the other methods, whose MAE values range from 124.79 to 174.40.
Based on these results, the designed model still performs excellently even on especially difficult forecasting days. First, it forecasts the actual demand with a 1-MAPE above 90%, and often above 95%, on most days in the test dataset. Second, it maintains stable performance on the days with lower single-day accuracy. Finally, compared with the other models, the designed model still has a lower MAE on days with naturally extreme weather conditions such as extreme cold, heat, and heavy rain, as well as on days with predictable social events such as holidays.

4. Conclusions

In this paper, a comprehensive boosting model was proposed for day-ahead load demand forecasting. The model achieved state-of-the-art results against benchmark models when applied to the Estonian grid. We believe that our work makes significant contributions to the field of power load demand forecasting by providing solutions to several key details.
First, DFT, ACF, PACF, and FIVs were used to explore the periodicity and cyclicality of the load series and determine the best modeling length. A fairly large-scale exploratory experiment verified the effectiveness of this approach, and the research initially built the vanilla model with an MAE of 52.04 and a 7-day load width in the validation dataset.
Second, a comprehensive exploration of the complementary external information of the load series was conducted, including datetime and meteorological variables. The datetime variables were determined as the form sf3 + if4 + bfs, with an MAE of 49.60 among 324 combinations, and the complex combination of 540 meteorological features from 15 sampled sites further decreased the MAE to 44.32 in the validation dataset.
Third, on the basis of the first two steps, a comparative study of the five training methods proposed in Section 2.3 was conducted. Based on the evaluation of the upper and lower bounds and the MAE of the forecasted values, the complete form of the model was determined: using the input forms above, a large-scale dataset search based on the Minkowski distance was used to construct the historical training data. The day-ahead forecast was then completed with an MAE of 37.84 in the validation dataset, which was taken as the final result for day-ahead electrical load forecasting of the Estonian power system based on boosting models.
The final forecasting model was compared with six other widely used models on the same dataset. The results demonstrate the applicability of this model and show that it has high forecasting performance and good robustness. In addition, the model has significant advantages in forecasting future peak load times and peak load capacities, and it maintains high availability even in extreme weather and during predictable social events.
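The Minkowski-distance search over historical data can be sketched as a nearest-neighbor selection of similar days. The representation of a day as a feature vector, and the choices of p and k below, are illustrative assumptions; the paper's exact search settings are not given in this excerpt.

```python
import numpy as np

def similar_days(history, query, p=2, k=30):
    """Indices of the k historical day-vectors nearest to the query day
    under the Minkowski distance of order p (p and k are illustrative).
    history: (n_days, n_features); query: (n_features,)."""
    history = np.asarray(history, float)
    query = np.asarray(query, float)
    # Minkowski distance: (sum |x_i - y_i|^p)^(1/p)
    dist = (np.abs(history - query) ** p).sum(axis=1) ** (1.0 / p)
    return np.argsort(dist)[:k]
```

The selected indices would then define the historical span used to train the day-ahead forecaster for that target day.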
We believe that our work has the potential to be used in real-world power load demand forecasting applications, and we look forward to furthering research on improving the model’s performance and generalizability.

5. Discussion

Data-driven machine learning methods are at the forefront of time series forecasting. In this process, data quality is essential, including the quality of the original data and the way it is input. This paper explores load data and external data in detail to obtain conclusions that are applicable to the case study, namely, using a lag width of 7 days and a specific combination of date–time and meteorological features. However, as is the case with machine learning and data mining, our findings are based on our research on the Estonian grid. These findings may not be applicable to other regions. Therefore, it is not appropriate to directly cite our conclusions when conducting related work based on this paper. A more appropriate approach is to follow the research path of this paper, i.e., to calculate the lag width using DFT, ACF, and PACF, and then to find the correct combination of date–time and meteorological features on the validation dataset. Only in this way can quality data be ensured.
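The recommended research path (calculate the lag width using DFT and ACF/PACF before feature selection) can be sketched numpy-only as follows; in practice, statsmodels' acf/pacf would produce the full diagnostic plots, and the synthetic series below is our own example.

```python
import numpy as np

def dominant_period(series):
    """Length (in samples) of the strongest cycle in the DFT magnitude
    spectrum, skipping the DC bin."""
    x = np.asarray(series, float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)   # cycles per sample
    k = spectrum[1:].argmax() + 1            # strongest non-DC component
    return 1.0 / freqs[k]

def sample_acf(series, lag):
    """Plain sample autocorrelation at a single lag."""
    x = np.asarray(series, float)
    x = x - x.mean()
    return float((x[:-lag] * x[lag:]).sum() / (x * x).sum())

# a synthetic hourly load with a clean daily cycle recovers a 24-h period
hours = np.arange(24 * 60)
load = 900 + 80 * np.sin(2 * np.pi * hours / 24)
```

On a real load series, the dominant DFT periods and the significant ACF/PACF lags together suggest the sliding-window width (7 days in the Estonian case study).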
Additionally, this paper emphasizes that electric load forecasting is both a machine learning and a time series forecasting problem: the forecasted load is a time series, and the input features are also time-related. The large amount of input content is highly correlated, which raises a new question: is cross-validation suitable for electric load forecasting? In real-world machine learning applications, there are generally four stages: Train, Validation, Test, and Application. The first three, including hyperparameter optimization, feature engineering, and data exploration, can be completed using cross-validation. However, with limited datasets, cross-validation on a time series leads to incompleteness at the sample level and thus insufficient training. For example, if four-fold cross-validation is used on one year of data, the training data for each season is (approximately) incomplete, and the load behaves differently in spring, summer, autumn, and winter. Because the electric load is highly self-similar, however, this incomplete training does not seem to produce excessively poor results. Facing this contradiction, this paper strictly divides the research data into three datasets. Whether cross-validation is suitable for electric load forecasting remains to be studied further.
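A chronological alternative to k-fold that avoids the leakage concern raised above is an expanding-window split, sketched here (the fold arithmetic is our illustration; scikit-learn's TimeSeriesSplit implements the same idea):

```python
import numpy as np

def expanding_window_splits(n_samples, n_splits=4):
    """Chronological alternative to k-fold for load series: each fold
    trains only on data that precedes its validation block, so no
    future information leaks into training."""
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        yield np.arange(0, i * fold), np.arange(i * fold, (i + 1) * fold)

# one year of hourly load split into 4 chronological folds
splits = list(expanding_window_splits(365 * 24, n_splits=4))
```

Note that this scheme does not solve the seasonal-incompleteness issue discussed above: early folds still train on only part of the year, which is exactly the trade-off that motivated the strict three-dataset split.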
Finally, the choice of algorithm for load forecasting is also a common problem in the field of machine learning. In the exploration stage of this work, we spent a long time testing candidate models; however, most models performed well enough. Moreover, based on the findings of this paper, even a boosting model that has not undergone careful hyperparameter optimization still shows excellent potential. This does not deny the significance of algorithm development and research, and we hope to further explore the potential of algorithms from the perspective of data.

Author Contributions

Conceptualization, Q.Z. and J.F.; methodology, Q.Z.; software, Q.Z.; validation, Q.Z. and X.L.; data curation, X.L.; writing—original draft preparation, Q.Z.; writing—review and editing, Q.Z., J.F. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2016YFD0300610.

Data Availability Statement

The dataset and fitted models can be downloaded at GitHub at: https://github.com/gniqeh/neau006uusmaailm (accessed on 1 December 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ACF: autocorrelation function
ACPT: accuracy in peak time
CART: classification and regression tree
D.H.: day-ahead load
DFT: discrete Fourier transform
ENTSO-E: European Network of Transmission System Operators for Electricity
FIV: feature importance value
GRU: gated recurrent unit
HPO: hyperparameter optimization
LSTM: long short-term memory
MAE: mean absolute error
MAPE: mean absolute percentage error
PAC: probably approximately correct
PACF: partial autocorrelation function
RNN: recurrent neural network
STLF: short-term load forecasting
T.D.: today's load
TPL: time point of peak load

Appendix A

The day-ahead forecasting results from 1 January 2023 for 200 days.
Figure A1. The day-ahead forecasting results by method in this paper.

Figure 1. Supervised learning in load forecasting.
Figure 2. The flow of feature engineering in this manuscript.
Figure 3. Training methods for forecasters in STLF.
Figure 4. Estonian actual electrical load from 2021 to 2023 (partly).
Figure 5. Consumption proportion of the Republic of Estonia and sampled locations.
Figure 6. Top 10 periodic components from fast Fourier transform.
Figure 7. Hours' ACF and PACF of 24 × 21 lag results from load series in 2021.
Figure 8. Resampled days' ACF and PACF of 21 d lag results from load series in 2021.
Figure 9. FIV results from models built using the load series in 2021.
Figure 10. FIV results from models built using load series in 2021.
Figure 11. Results after adding the datetime features.
Figure 12. Results of top complex combinations.
Figure 13. Results of performance after adding meteorological features.
Figure 14. Contrasts of performance for Es, the winds variables.
Figure 15. Sampled regions' weights for meteorological information.
Figure 16. Weights of meteorological categories in boosting model.
Figure 17. Results of performance of training methods.
Figure 18. Monthly performance of models measured using MAE.
Figure 19. Results of performance of models.
Figure 20. Day-ahead forecasting robustness measured using daily metrics.
Figure 21. Ten worst day-ahead forecasting loads among models.
Figure 22. Worst forecasting days.
Figure 23. Extreme weather forecasting.
Figure 24. Holiday forecasting.
Table 1. Datetime variables in the research.

Description | Feature Name | Type | Value Example ¹
Which day in the month | if1 | Integer | 15
 | sf1 | Sine–cosine pair | [sin(15/30 × 2π), cos(15/30 × 2π)]
Which month in the year | if2 | Integer | 6
 | sf2 | Sine–cosine pair | [sin(6/12 × 2π), cos(6/12 × 2π)]
Which day in the week | if3 | Integer | 4
 | sf3 | Sine–cosine pair | [sin(4/7 × 2π), cos(4/7 × 2π)]
Which day in the year | if4 | Integer | 160
 | sf4 | Sine–cosine pair | [sin(160/365 × 2π), cos(160/365 × 2π)]
Is it a weekend? | bf1 | Boolean | False
Is it a holiday? | bf2 | Boolean | False
¹ Example date: 15 June 2023 (Thursday).
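Table 1's integer, sine–cosine, and Boolean encodings can be generated as in the sketch below. The divisors follow the table's examples (30, 12, 7, 365); the holiday flag is stubbed, since a real calendar (for example, the `holidays` package) would be needed in practice.

```python
import math
from datetime import date

def datetime_features(d):
    """Integer, sine-cosine, and Boolean datetime features in the spirit
    of Table 1 (bf2, the holiday flag, is a stub here)."""
    ints = {"if1": d.day, "if2": d.month,
            "if3": d.isoweekday(), "if4": d.timetuple().tm_yday}
    periods = {"if1": 30, "if2": 12, "if3": 7, "if4": 365}
    feats = dict(ints)
    for name, value in ints.items():
        angle = value / periods[name] * 2 * math.pi
        feats["s" + name[1:]] = (math.sin(angle), math.cos(angle))
    feats["bf1"] = d.isoweekday() >= 6   # weekend flag
    feats["bf2"] = False                 # holiday flag: stub
    return feats

feats = datetime_features(date(2023, 6, 15))   # the table's example date
```

The sine–cosine pairs make cyclic features continuous at the wrap-around point (e.g., December to January), which integer encodings cannot express.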
Table 2. Meteorological factors in the research.

Class | Idx | Feature Name | Unit | Description
Temperature | a1 | Temperature | °C | Air temperature at 2 m above ground
 | a2 | Dewpoint | °C | Dew point temperature at 2 m above ground
 | a3 | Apparent temperature | °C | The perceived feels-like temperature
 | a4 | Soil temperature | °C | Average soil temperature at 0 to 7 cm depth
Pressure | b1 | MSL pressure | hPa | Atmospheric air pressure reduced to mean sea level
 | b2 | Surface pressure | hPa | Atmospheric air pressure at the surface
Precipitation | c1 | Rain | mm | Only liquid precipitation of the preceding hour
 | c2 | Snowfall | cm | Snowfall amount of the preceding hour
Radiation | d1 | Shortwave radiation | W/m² | Shortwave solar radiation as average of the preceding hour
 | d2 | Direct radiation | W/m² | Direct solar radiation as average of the preceding hour on the horizontal plane
 | d3 | Direct normal irradiance | W/m² | Direct solar radiation as average of the preceding hour on the normal plane
 | d4 | Diffuse radiation | W/m² | Diffuse solar radiation as average of the preceding hour
Wind | e1 | Wind speed | km/h | Wind speed at 10 m above ground
 | e2 | Wind gusts | km/h | Gusts at 10 m above ground of the indicated hour
Table 3. Performance of peak load capture and capacity forecasting. (The first four value columns report ACPTκ.)

Model | On Time | κ = 1 | κ = 2 | κ = 4 | MAETPL | MAEMAX
V1 | 0.145 | 0.365 | 0.540 | 0.745 | 67.97114 | 76.88116
V2 | 0.190 | 0.445 | 0.595 | 0.750 | 64.19887 | 69.47483
V3 | 0.200 | 0.435 | 0.585 | 0.710 | 57.21661 | 61.5919
Ours | 0.345 | 0.630 | 0.775 | 0.845 | 41.61119 | 49.67199
EU | 0.300 | 0.615 | 0.735 | 0.810 | 71.71 | 78.83
LSTM | 0.135 | 0.365 | 0.475 | 0.590 | 84.44812 | 81.06971
GRU | 0.145 | 0.365 | 0.465 | 0.570 | 79.67624 | 78.72932
ETS | 0.210 | 0.410 | 0.455 | 0.490 | 92.51735 | 112.701
PRF | 0.210 | 0.490 | 0.555 | 0.610 | 80.61517 | 94.25627
ARM | 0.055 | 0.205 | 0.375 | 0.515 | 148.9198 | 164.5627

Share and Cite

Zhao, Q.; Liu, X.; Fang, J. Extreme Gradient Boosting Model for Day-Ahead STLF in National Level Power System: Estonia Case Study. Energies 2023, 16, 7962. https://doi.org/10.3390/en16247962