3.2.1. Metrics and Naive Estimators
Our procedure has two steps: first, the curves are predicted, and then the matching price is obtained by intersecting the requirement with the predicted curve. We therefore need two metrics and two naive estimators, one for each prediction step. First, to measure the accuracy of our supply curve forecasts, we use the mean absolute error (MAE) and the root mean squared error (RMSE) as metrics, and the previous-day curve as a naive estimator. For the matching-price predictions, we use the MAE as the metric and the previous day's matching price as the naive estimator. In both cases, we also tested other naive estimators, such as the previous week's curve and price, but they gave worse results.
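As an illustration of these metrics and of the previous-day naive estimator, the following Python sketch computes the MAE and RMSE of the naive forecast; the array curves is a random placeholder for the chronologically ordered hourly supply curves on the 50-price grid, not our actual data.

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Placeholder data: hourly supply curves approximated on the 50-price grid,
# ordered chronologically (random values for illustration only).
curves = np.random.default_rng(0).normal(size=(30 * 24, 50))

# Previous-day naive estimator: the forecast for hour t is the curve 24 hours earlier.
naive_forecast, target = curves[:-24], curves[24:]
print(mae(target, naive_forecast), rmse(target, naive_forecast))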
3.2.2. Preprocessing of Data
After the approximation of the supply curves to a fixed grid of prices, we have a table in which each row holds the 50 values of one hourly curve (one column per grid price). To predict future supply curves, we add lags and reduce the dimensionality of this table through Principal Component Analysis (PCA). We also incorporate exogenous variables, some related to calendar effects, and finally perform feature engineering to create new variables. The steps of the preprocessing procedure are as follows:
Lag-24: In each row, we add 50 new columns with the values of the previous day's curve, on which we later apply dimensionality reduction.
Calendar variables: We include the following dummy variables using one-hot encoding: hour of the day, day of the week, month, quarter, and a binary variable indicating whether the day is a national holiday (see the encoding sketch after this list).
Exogenous variables: We include indicators of the wind speed and diffuse solar radiation in each provincial capital of Spain, obtained from https://open-meteo.com/ (104 variables), on which we later apply dimensionality reduction. We also include a column with the Dutch TTF gas price, the reference price in the European market, from www.investing.com. Lastly, we include a column with the matching price reached in the daily market and another with the amount of MWh assigned.
Train and test split and dimensionality reduction: Before reducing the dimensionality of the input variables, we perform a train/test split, reserving the years 2014 to 2018 for training and 2019, 2020 and 2021 for testing. Next, we perform a principal component analysis on the 50 Lag-24 columns of each curve. As expected, the 50 columns are highly correlated, and by selecting the first five principal components, we obtain an explained variance of 98.51%. Something similar happens with the wind and radiation variables in each province: from the 52 solar radiation variables, selecting just the first two components retains 89.66% of the explained variance, while for the wind, 10 principal components explain 79.82% of the variance. This process greatly reduces the number of input variables and therefore the processing time. Each PCA is fitted on the training set and applied with the same parameters to the test set (see the PCA sketch after the next paragraph).
Customized features: The lagged curves alone may not provide enough information about the evolution of the offers over the last days and weeks. For this reason, we add further variables that enable the algorithm to detect these dynamics. We add information in two ways. First, to describe the evolution of the supply for the same hour over the last few weeks, we take the first principal component of each supply curve for the same hour in the last 12 weeks (84 columns) and, through PCA, reduce this information to 15 components. Second, we add information on all the hours of the previous four days, taking the first principal component of each curve and then performing PCA to select the first eight principal components that explain this trajectory.
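The calendar encoding can be sketched as follows; this assumes an hourly timestamp index and omits the national-holiday flag, which requires an external holiday calendar.

import pandas as pd

# Placeholder hourly index covering the study period.
idx = pd.date_range("2014-01-01", "2021-12-31 23:00", freq="h")
cal = pd.DataFrame({"hour": idx.hour, "weekday": idx.dayofweek,
                    "month": idx.month, "quarter": idx.quarter})

# One-hot encoding: 24 + 7 + 12 + 4 = 47 dummy columns; the holiday flag
# would be appended as a single additional binary column.
calendar_dummies = pd.get_dummies(cal, columns=["hour", "weekday", "month", "quarter"])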
Before performing each PCA, we scale the data. Subsequently, for the training of some methods, such as neural networks, we also scale and normalize the input variables. The parameters mentioned above, 12 previous weeks of information for the same hour and four previous days of information for all hours, are the values found through hyper-parameter optimization (HPO) with the method that performed best on the training data.
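The fit/transform scheme for the Lag-24 block can be sketched as follows; lag24_train and lag24_test are random placeholders for the 50 lagged columns split chronologically, and the scaler and PCA are fitted on the training years only and then reused on the test years.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
lag24_train = rng.normal(size=(5 * 365 * 24, 50))   # placeholder for 2014-2018
lag24_test = rng.normal(size=(3 * 365 * 24, 50))    # placeholder for 2019-2021

# Scale, then keep the first five principal components.
reducer = make_pipeline(StandardScaler(), PCA(n_components=5))
pcs_train = reducer.fit_transform(lag24_train)       # fitted on the training years only
pcs_test = reducer.transform(lag24_test)             # same parameters reused on the test years

# Fraction of variance retained by the five components (98.51% on the real data).
print(reducer.named_steps["pca"].explained_variance_ratio_.sum())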
Finally, we have the following 91 input variables:
Lag-24 (5 columns): The first five principal components of the supply curve for the same hour of the previous day.
Hour (24 columns).
Weekday (7 columns).
Month (12 columns).
Quarter (4 columns).
Holiday (1 column).
Solar radiation (2 columns): The first two principal components of the diffuse solar radiation in the Spanish provincial capitals.
Wind speed (10 columns): The first ten principal components of the wind speed in the Spanish provincial capitals.
Gas price (1 column): Price of Dutch TTF gas.
Daily market (2 columns): The matching price and quantity assigned in the daily market of the same day.
Same-hour evolution (15 columns): Information on the evolution of the first principal component of each curve for the same hour over the last 12 weeks (see the sketch after this list).
All-hours evolution (8 columns): Information on the evolution of the first principal component of each curve for all hours in the last four days.
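As a sketch of how the same-hour evolution block could be built: pc1 is a random placeholder for the first principal component of every hourly curve in chronological order; each target hour is paired with the same hour of the previous 84 days, and the 84 resulting lags are compressed to 15 components.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
pc1 = rng.normal(size=365 * 24)             # placeholder: first PC of each hourly curve

n_lags = 84                                 # same hour over the previous 12 weeks
start = n_lags * 24                         # first target index with a full history
history = np.stack(
    [pc1[start - 24 * k : len(pc1) - 24 * k] for k in range(1, n_lags + 1)], axis=1
)                                           # shape: (n_targets, 84)
same_hour_block = make_pipeline(StandardScaler(), PCA(n_components=15)).fit_transform(history)

The all-hours evolution block can be built analogously from the hourly lags of the previous four days, reduced to eight components.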
3.2.4. Machine Learning Models
To predict the 50 series, different machine learning algorithms were tested, including Random Forest (RF) [20], Histogram-based (HB) Gradient Boosting [21], Dense Neural Networks (DNN) [22] and Long Short-Term Memory (LSTM) [23]. In all these models, we used a multi-output approach; that is, the 50 time series that arise from the approximation of the curves were predicted simultaneously.
Four different alternatives were tested with Random Forest: (1) a single RF model considering only the time series, i.e., with the first five PCs of the previous-day curve as input variables; (2) a single RF model with all the input variables, endogenous and exogenous; (3) a total of 24 RF models, one per hour, considering only the time series (first five PCs of the previous-day curve); and (4) a total of 24 RF models, one per hour, considering all the input variables. It should be noted that the single models were trained using data from all hours of the training set, while each of the 24 hourly models was trained only with the data of its corresponding hour; its training set was therefore 24 times smaller than in the single-model case.
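As a sketch of the multi-output setup, assuming a scikit-learn implementation: X_train and Y_train are random placeholders for the 91 input variables and the 50 series. Random Forest handles multi-output targets natively, whereas the histogram-based booster predicts a single target, so a wrapper fits one booster per series.

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X_train, Y_train = rng.normal(size=(1000, 91)), rng.normal(size=(1000, 50))  # placeholders
X_test = rng.normal(size=(24, 91))                                           # one test day

# Alternative (2): a single Random Forest with all input variables (multi-output is native).
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, Y_train)

# HB Gradient Boosting fits one booster per each of the 50 series via the wrapper.
hgb = MultiOutputRegressor(HistGradientBoostingRegressor()).fit(X_train, Y_train)
Y_pred = hgb.predict(X_test)   # shape (24, 50): one predicted curve per hour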
We tested these models on the years 2019–2021, during which the electricity market showed irregular behavior due to the pandemic. For this reason, in Table 5, in addition to the results for that period, we also present the error metrics for 2019 alone. This allows us to observe how the models predict the curves in a standard year.
As observed in Table 5, a single model works better than 24 separate models. This is probably because dividing the data into 24 groups does not leave enough observations for an efficient fit of each model.
Histogram-based Gradient Boosting works better than Random Forest, and it likewise produces better results with a single model than with 24 separate ones, probably for the same reason. To mitigate the problem of having few observations in each group, we also tried clustering the hours into four groups and therefore fitting 4 models instead of 24. The results, which also appear in the table, are no better than those of a single HB Gradient Boosting model, so we chose the latter as our winning model. It should be noted that the improvement of the single HB Gradient Boosting model over the four-model variant was small in 2019 but larger over the entire 2019–2021 period.
A DNN model was also tested, performing HPO on the number of layers, the size of each layer, the dropout rate, etc. The best DNN configuration consisted of 24 different models, each a neural network with a single hidden layer of 1443 neurons. However, the results did not improve on those of HB Gradient Boosting. An LSTM model was also considered, but it obtained very poor results.
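As an illustration, a minimal Keras sketch of one of the 24 per-hour networks follows; the activation, optimizer and loss are assumptions, since only the single hidden layer of 1443 neurons is fixed above.

import tensorflow as tf

# One per-hour network: 91 inputs -> 1443 hidden units -> 50 outputs.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(91,)),
    tf.keras.layers.Dense(1443, activation="relu"),   # assumed activation
    tf.keras.layers.Dense(50),                        # one output per grid price
])
model.compile(optimizer="adam", loss="mse")           # assumed optimizer and loss
# model.fit(X_hour_train, Y_hour_train, epochs=100, validation_split=0.1)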
3.2.5. Monotonicity Restoration
Each model predicts the future values of 50 time series; however, it must be borne in mind that these 50 series make up supply curves, which must always be non-decreasing functions. In practice, we observe that the models do not exactly respect this property, producing small mismatches that break the monotonicity among the 50 values. In Figure 6, we can observe the frequency and size of the monotonicity breaks in the outputs of HB Gradient Boosting.
To solve this problem, we propose a method to restore monotonicity. It proceeds as follows: if a point in the curve is followed by more than one point of lesser value before the curve rises again, we consider this local maximum an error and decrease it; if, instead, only a single point of lesser value occurs before the curve rises again, we increase that low point. Iterating this rule several times restores the monotonicity of the curves, and the corrected curves are closer to the original ones (a sketch of one possible implementation is given at the end of this section). In Table 6 and Table 7, it is possible to see how this process reduces the errors of the predictions with respect to the real curves.
We can see that the improvement is small, indicating that monotonicity problems are not large in the prediction results of the chosen model. In any case, it is preferable to apply this correction so that the predictions satisfy the non-decreasing monotonicity constraint.
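A minimal sketch of the restoration heuristic follows; lowering the spurious peak to the value of the next point is one plausible implementation choice, not prescribed above.

import numpy as np

def restore_monotonicity(curve, max_iter=1000):
    """Enforce a non-decreasing curve with the rule described above."""
    y = np.asarray(curve, dtype=float).copy()
    for _ in range(max_iter):
        breaks = np.flatnonzero(np.diff(y) < 0)   # indices i with y[i] > y[i + 1]
        if breaks.size == 0:
            break                                  # curve is already non-decreasing
        i = breaks[0]                              # fix the first violation, then rescan
        j = i + 1
        while j < len(y) and y[j] < y[i]:          # count points staying below the peak
            j += 1
        if j - (i + 1) > 1:
            y[i] = y[i + 1]                        # several lower points: lower the peak
        else:
            y[i + 1] = y[i]                        # a single lower point: raise it
    return y

# Example: a predicted curve with two small monotonicity breaks.
print(restore_monotonicity([1.0, 2.0, 1.8, 3.0, 2.9, 2.95, 3.5]))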