Prediction of Matching Prices in Electricity Markets through Curve Representation

Foronda-Pascual, Daniel; Alonso, Andrés M.

doi:10.3390/en16237812


Daniel Foronda-Pascual ¹ and Andrés M. Alonso ²,*

¹ Research Service, Universidad Carlos III de Madrid, 28911 Leganés, Spain
² Department of Statistics, Institute Flores de Lemus, Universidad Carlos III de Madrid, 28903 Getafe, Spain
* Author to whom correspondence should be addressed.

Energies 2023, 16(23), 7812; https://doi.org/10.3390/en16237812

Submission received: 3 November 2023 / Revised: 21 November 2023 / Accepted: 23 November 2023 / Published: 27 November 2023

(This article belongs to the Section C: Energy Economics and Policy)


Abstract

In the Spanish electricity market, after the daily market, in which prices are set for the next day, the secondary and tertiary markets take place; these allow companies to adjust more accurately the electricity they are able to offer. The objective of this paper is to predict the final price reached in these markets by predicting in advance the supply curve, which aggregates what companies offer. First, we study a procedure to represent the supply curves, and then we consider different machine learning approaches to obtain the day-ahead supply curves for the secondary market. Finally, the predicted supply curves are crossed with the system requirements to obtain the price predictions. Histogram-Based Gradient Boosting is the best-performing algorithm for predicting supply curves. The most relevant variables for the prediction are the lagged values, the daily market price, the price of gas, and the wind values recorded in the Spanish provinces.

Keywords:

electricity secondary market; electricity supply curves; multivariate time series; forecasting

1. Introduction

Electricity trade is currently liberalized in most countries in our social and economic environment. Predicting prices in this market is complex due to the particular characteristics of the supply and demand of electricity and the difficulty of its storage. Despite the difficulties, forecasts are necessary for several reasons. First, it is a strategic sector of the economy; second, financial implications arising from the trading of forward contracts and options are important; third, forecasts help to optimize and plan consumption and production (see, for instance, [1]).

Electricity supply prediction is also a critical task in the energy sector, as it assists in efficient resource planning, grid management, and the integration of renewable energy sources. Supply curves represent the relationship between the amount of energy that electricity suppliers are willing to offer and the price associated with each supply level. Accurate prediction of electricity supply curves is a crucial aspect of energy markets, as it provides valuable information about energy prices and availability in the near future (see, for instance, [2]).

Most works on electricity price prediction focus on the daily (spot) market, treating prices as a univariate time series and applying time series analysis, machine learning, and deep learning techniques to predict them. The recent reviews [3,4] summarize the main characteristics of these models and the results achieved. Since the price of electricity results from the intersection of the supply and demand curves, it is worth exploring an approach based on predicting these curves to obtain the electricity price prediction, as proposed in [5,6]. Note that this approach addresses the prediction of two different objects: first, the curve predictions are obtained, and second, the price predictions. Both predictions are of interest to market agents.

In this article, we use this two-step approach based on a parsimonious and uniform representation of the supply curves. Once we have a representation of the curves, we address their prediction using machine learning techniques, although we also consider time series and deep learning models. We illustrate the procedure with data from the Spanish secondary market, where the energy requirements are known before the producing units submit their offers.

The rest of the paper is organized as follows. Section 1.1 and Section 1.2 describe the Spanish electricity market and its structure, respectively. Section 2 reviews the state of the art in electricity supply and demand curve prediction. In Section 3.1, we present a procedure to approximate the curves. In Section 3.2, we present the variables considered and the models used to forecast the curves. In Section 4, we obtain the price predictions using the model that performs best in predicting the curves; the results are discussed in relation to the year being predicted, and the interpretability of the model is analyzed. Section 5 presents the conclusions.

1.1. Spanish Electricity Market

Prior to 1997, the Spanish electricity sector was dominated by a state-owned utility, which held a monopoly over electricity production and distribution. However, in 1997, the Spanish Electricity Act was passed, initiating a process of liberalization and deregulation aimed at introducing competition and promoting a more efficient and dynamic electricity market. In the early 2000s, Spain experienced a boom in renewable energy, particularly wind power. Government incentives and favorable policies attracted significant investments, making Spain one of the global leaders in wind energy capacity. In 2009, the Spanish government introduced a feed-in tariff system to promote renewable energy that guaranteed fixed prices for electricity generated from renewable sources, attracting further investments. However, due to the rapid growth and higher costs, the government later reduced these incentives to mitigate the impact on consumers' electricity bills (see [7,8]).

In recent years, Spain has continued to prioritize the expansion of renewable energy and the transition towards a more sustainable and decarbonized power sector. It has set ambitious targets for renewable energy penetration, aiming to reach 100% renewable electricity by 2050.

The main participants in the Spanish electricity market (see Figure 1) are the following:

Generation Companies: They are responsible for producing electricity. They own and operate power plants, including conventional thermal power plants, nuclear power plants, and renewable energy installations such as wind farms and solar power plants.
Transmission System Operator: Known as Red Eléctrica de España (REE), it manages and controls the high-voltage transmission grid, ensuring the reliable and secure transmission of electricity throughout the country and maintaining the balance between supply and demand.
Distribution Companies: They operate the local distribution networks and deliver electricity to end consumers; they maintain and manage the distribution infrastructure, including power lines, transformers, and substations.
Retail Suppliers: They purchase electricity from the wholesale market and sell it to end consumers; they offer various pricing plans, manage customer relationships, and handle billing and customer services.
Market Operator: Known as Operador del Mercado Ibérico de Energía (OMIE), it oversees the operation of the wholesale electricity market and facilitates the trading of electricity between generation companies and retail suppliers, ensuring fair and transparent market conditions.
Regulator: The National Commission of Markets and Competition (CNMC) ensures compliance with market rules, promotes competition, and regulates tariffs and prices to protect consumer interests.
Consumers: Residential, commercial, and industrial users who purchase electricity for their own consumption. Consumers can choose their preferred retail supplier and participate in demand response programs to optimize their energy usage.

1.2. Daily, Intraday, Secondary, and Tertiary Markets

The Spanish electricity market operates through a sequence of markets: the daily and intraday markets, managed by OMIE, and the secondary and tertiary markets, managed by REE. Each market serves a specific purpose and contributes to the determination of the final electricity price. Their operation is explained below (see, for instance, [9]):

Daily Market: The daily market, also known as the spot market or the day-ahead market, is where electricity is traded for delivery on the following day. In this market, generation companies submit their offers to supply electricity based on their production costs and availability. At the same time, retailers and large consumers submit their bids for purchasing electricity. The market operator, Operador del Mercado Ibérico de Energía (OMIE), matches the offers and bids to determine the market matching price, also known as the marginal price. The market matching price is the price at which the demand for electricity matches the available supply. This price is used to settle the transactions in the daily market.
Intraday Markets: The intraday markets, also known as real-time markets, are an additional segment where electricity is traded and balanced in real time. They allow market participants, such as producers and consumers, to adjust their energy schedules and commit to last-minute trades, ensuring the efficient utilization of electricity resources. The market operator, OMIE, is also responsible for these markets.

The following two markets are managed by Red Eléctrica de España (REE) which is responsible for the maintenance and extension of the transmission network and the energy demand management:

Secondary Market: The secondary market takes place after the Daily Market, enabling market participants to adjust their positions and make corrections to balance their portfolios. It provides flexibility for market participants to manage unexpected changes in supply or demand. There are two modalities: Upward, which corresponds to the increase in the electricity supply of a generation company, and Downward, which corresponds to its decrease.
Tertiary Market: The tertiary market, also referred to as the imbalance settlement market, addresses any imbalances between the contracted and actual consumption or generation of electricity. Market participants who deviate from their contracted positions during the delivery period can buy or sell imbalances in the tertiary market. The prices in the tertiary market are set based on the costs associated with balancing the system, including penalties for imbalances.

Figure 2, based on the information in [11], shows the time at which each market is held (in green) and the period to which it applies (in blue). This market structure and operation is common in other European electricity markets (see, for instance, [10]).

In this paper, we study prediction methods for the day-ahead matching prices in the Spanish secondary electricity market: we first predict the supply curves and then match them with the requirements to obtain the price. As Figure 2 shows, this market takes place before 16:00 on day $D-1$ and applies to the 24 h of day $D$. That is, before 16:00 on the current day, producers must submit their offers for each of the 24 h of the following day.

In Figure 3, we present an example of several supply curves for the secondary market (upward) for a given day (there is one curve for every hour).

Before the secondary market takes place, Red Eléctrica de España publishes the requirement for each hour. Intersecting the supply curve with the requirement yields the matching price. The requirements are published an hour and a quarter before the companies send their offers (see Figure A1 in Appendix A, taken from the Official State Gazette [12]): the resolution establishes that the requirements for the secondary market are published at 14:45 and that companies must send their offers by 16:00. Therefore, companies know the requirements before sending their offers.
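A minimal sketch of the intersection step, assuming the hourly supply curve is built by sorting the bids by price and accumulating their quantities, and that the matching price is the price of the first bid at which the accumulated quantity covers the requirement. The function name `matching_price` and the NaN convention for curves that never reach the requirement are illustrative choices of ours, not part of the original procedure or of REE's settlement rules.

```python
import numpy as np


def matching_price(bid_prices, bid_quantities, requirement):
    """Lowest bid price at which the aggregated supply covers `requirement` (MWh).

    The supply curve is the cumulative offered quantity as a function of price.
    Returns NaN if the curve never reaches the requirement (assumed convention).
    """
    prices = np.asarray(bid_prices, dtype=float)
    quantities = np.asarray(bid_quantities, dtype=float)
    order = np.argsort(prices, kind="stable")     # build the step curve in price order
    prices = prices[order]
    cum_quantity = np.cumsum(quantities[order])   # non-decreasing supply curve
    # First index at which the cumulative quantity meets the requirement.
    k = np.searchsorted(cum_quantity, requirement, side="left")
    return float(prices[k]) if k < len(prices) else float("nan")


# Bids (EUR/MWh, MWh) for one hour and a published requirement of 301 MWh.
print(matching_price([20, 5, 35, 12], [150, 100, 300, 200], requirement=301))  # 20.0
```

Sorting first means the function accepts bids in any order; the cumulative sum is non-decreasing because quantities are non-negative, which is what makes the binary search valid.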

2. State of the Art

The objectives of this work are to predict the supply curve and the matching price in the secondary market. The second objective could be addressed by treating the prices as a time series and building a model based on their past values and/or exogenous variables, such as days of the week, holidays, and weather variables like temperature, wind speed, or solar irradiance (see the review in [4]). The first objective, predicting curves, is more complex because the object to be predicted is a function rather than a point value such as a price. In this section, we focus on the papers that address the prediction of supply (and demand) curves.

In [13], functional nonparametric techniques are used to model hourly electricity residual demand curves. The authors assume that the curves are observed on a common interval and that the functions are sufficiently smooth, with up to two derivatives. This assumption might be acceptable in markets with a large number of small offer/demand bids, but it is not acceptable in markets whose curves are clearly stepped, as in Figure 3. In [6], the difference between demand and supply, $\eta(p) = d(p) - s(p)$, is also modeled to obtain the price as the solution of the equation $\eta(p) = 0$. The authors use a sixth-order polynomial model to approximate this curve around the market clearing price. Clearly, this is a very local model, and it also requires smoothness conditions on the curves.

In [5], the authors predict both the demand and supply curves, taking into account that the matching price is the intersection point of these curves: they first predict the curves and then derive the matching price from them. In this way, they obtain more accurate predictions than those of previously used techniques based on the matching price series. Exploiting the origin of the matching price as the intersection of two curves seems to enable more accurate predictions, which also have potential applications in the bidding strategies of electricity companies. The approach of [5] is based on aggregating bids into pre-selected price blocks and modeling those aggregate bid values as multivariate time series. The blocks are chosen on the basis of the average curve. The method uses a small number of blocks, which makes its approximation of the curves imprecise. In [14], a modification of the approach of [5] is proposed that uses transformed versions of the curves with perfectly inelastic demand. As the central objective of [5,14] is price prediction, the authors do not study the behavior of the procedure as a curve predictor.

One possible approach to predicting these supply and demand curves is to perform dimensionality reduction, at the cost of losing some information. To avoid this loss, the authors of [15] use functional regression for the same purpose through a double-seasonal functional SARMAHX model capable of capturing the daily and weekly seasonality of the time series and of including exogenous variables. In [16], the authors use both parametric and nonparametric functional autoregressive models, showing that nonparametric models lead to a statistically significant improvement in forecasting accuracy compared to previous studies. A nonparametric functional autoregressive model is a flexible approach to analyzing functional data and making predictions without assuming a specific functional form. It captures the dependence over time by relating a functional response variable to its past values, which allows it to capture complex patterns and dynamic characteristics of functional data. However, these works assume smoothness conditions on the curves that do not hold for step functions.

The approach that we propose in the following section attempts to address some of the shortcomings of the previous works discussed above. In particular, no curve smoothness assumptions are needed, and we conduct an exhaustive study to obtain a good approximation of the curves. Note that the requirements in the Spanish secondary market are given quantities, whereas in [5] the demands are also curves. Furthermore, the requirements are published before the supply curves are obtained.

3. Procedures for Approximation and Forecasting of Supply Curves

3.1. Approximations of Supply Curves

Supply curves are non-decreasing step functions. The first problem that we encounter when trying to apply time series prediction methods to them is that the steps are located at different abscissas, as seen in Figure 3. To solve this problem, we first establish a fixed grid on the x-axis (price) and approximate each supply curve by another non-decreasing step function whose steps lie on that grid. In this section, we follow the procedure described in [17].

3.1.1. Choosing the Grid for Prices

To accurately reflect the steps that occur most frequently in the supply functions, we calculate the empirical cumulative distribution function (ecdf) of the step prices and use evenly spaced percentiles of it. However, since there is a large concentration of different steps close to zero, we apply a filter that keeps only the prices whose supplied quantity ($q$, measured in MWh) is at least a threshold $q_{0}$. That is, we use $n$ evenly spaced percentiles of the conditional ecdf

\hat{F} (p \mid q \geq q_{0}) = \frac{\sum_{j = 1}^{N} I (p_{j} \leq p, \; q_{j} \geq q_{0})}{\sum_{j = 1}^{N} I (q_{j} \geq q_{0})},

where the pairs $(p_{j}, q_{j})$, $j = 1, \ldots, N$, are the observed bids in the curves. Then, given $n$ and $q_{0}$, the prices in the grid are obtained by

p_{n, q_{0}}^{i} = \hat{F}^{- 1} \left( \frac{i}{n} \;\middle|\; q \geq q_{0} \right), \quad i = 1, \ldots, n. \tag{1}
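Equation (1) amounts to taking generalized-inverse quantiles of the filtered bid prices. The sketch below assumes the bids of all training curves are pooled into two flat arrays; the function name `price_grid` is hypothetical, and `method="inverted_cdf"` requires NumPy 1.22 or later. With many bids at identical prices, some grid points may coincide, which a real implementation would need to handle.

```python
import numpy as np


def price_grid(bid_prices, bid_quantities, n=50, q0=337.0):
    """Grid prices p^i = F^{-1}(i/n | q >= q0), i = 1..n, from the conditional ecdf."""
    prices = np.asarray(bid_prices, dtype=float)
    quantities = np.asarray(bid_quantities, dtype=float)
    kept = prices[quantities >= q0]          # discard bids below the quantity threshold
    probs = np.arange(1, n + 1) / n          # i/n for i = 1, ..., n
    # "inverted_cdf" is the generalized inverse of the ecdf: inf{p : F(p) >= u}.
    return np.quantile(kept, probs, method="inverted_cdf")


prices = np.arange(1, 101)                         # toy bid prices, EUR/MWh
quantities = np.where(prices <= 50, 10.0, 400.0)   # low-price bids fall below q0
grid = price_grid(prices, quantities, n=3)         # grid prices 67, 84 and 100
```

Only the 50 bids priced above 50 survive the filter, so the three grid points are the 17th, 34th and 50th of those prices.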

Therefore, two parameters determine the grid and influence the precision of the approximations: the size of the grid ($n$) and the minimum quantity ($q_{0}$) required for a price to be taken into account in the ecdf. To tune these parameters and determine which values offer the best results, we first need to define a curve approximation method to measure the error generated by each grid.

3.1.2. Approximations Using $L_{2}$ Loss Function

To assess the goodness of the approximations, we need a loss function. We use the following one with $r = 2$, as proposed in [17], in which case the minimum can be obtained analytically:

L_{r} = \lVert C_{t} - \hat{C}_{t, n} \rVert_{r}^{r} = \int_{0}^{+\infty} \left| C_{t}(p) - \sum_{i = 1}^{n} c_{t, i, n} \, \phi_{i, n}(p) \right|^{r} W(p) \, dp,

where $C_{t}$ is the supply curve at time $t$, $c_{t, i, n}$ are the coefficients of the approximation,

\phi_{i, n}(p) = \begin{cases} 0 & \text{if } p < p_{i}, \\ 1 & \text{if } p \geq p_{i}, \end{cases}

with $p_{i} = p_{n, q_{0}}^{i}$ the $i$-th price of the grid in (1), and $W(p)$ is a non-negative weight function such that $\lim_{p \to +\infty} W(p) = 0$, in order to ensure the convergence of the above integral.
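For $r = 2$, one way to see why the minimum has a closed form (our own derivation, offered as a sketch) is to note that the approximation is constant between consecutive grid prices $p_{1} < \dots < p_{n}$. Writing $s_{k} = \sum_{i \leq k} c_{t, i, n}$ for its level on $[p_{k}, p_{k+1})$, with $p_{n+1} = +\infty$, the loss separates by interval and each level can be optimized independently:

```latex
L_{2} = \int_{0}^{p_{1}} C_{t}(p)^{2}\, W(p)\, dp
      + \sum_{k=1}^{n} \int_{p_{k}}^{p_{k+1}} \bigl( C_{t}(p) - s_{k} \bigr)^{2} W(p)\, dp ,
\qquad
s_{k}^{*} = \frac{\int_{p_{k}}^{p_{k+1}} C_{t}(p)\, W(p)\, dp}{\int_{p_{k}}^{p_{k+1}} W(p)\, dp},
\qquad
c_{t,k,n}^{*} = s_{k}^{*} - s_{k-1}^{*}, \quad s_{0}^{*} = 0 .
```

Each optimal level is the $W$-weighted mean of the curve on its interval (assuming $W > 0$ there). Because $C_{t}$ is non-decreasing, these means are non-decreasing in $k$, so the coefficients are non-negative and the approximation is itself a non-decreasing step function.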

Since we want the approximations to be more accurate in the areas where the match with the requirement usually occurs, we considered different candidates for the W function:

The fit of the final prices in the training set with exponential (FinalPrices_Exp), log-normal (FinalPrices_logNormal), Cauchy (FinalPrices_Cauchy), and normal (FinalPrices_Normal) distributions.
The fit of all prices in the training set with an exponential distribution (AllPrices_Exp).
The fit of all unique prices in the training set with an exponential distribution (UniquePrices_Exp).
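The following Python sketch computes the analytical minimizer for $r = 2$: on each interval between consecutive grid prices, the optimal level is the $W$-weighted mean of the curve, and the coefficients are the successive differences of those levels. Two assumptions of ours are built in: the UniquePrices_Exp weight is taken to be the exponential density fitted by maximum likelihood (rate equal to 1/mean) to the unique training prices, and the integrals are approximated by a midpoint rule on a fine grid truncated at a hypothetical upper price `p_max`. All function names are illustrative.

```python
import numpy as np


def fit_exponential_weight(train_prices):
    """Exponential density fitted by MLE to the *unique* training prices."""
    unique_prices = np.unique(np.asarray(train_prices, dtype=float))
    rate = 1.0 / unique_prices.mean()
    return lambda p: rate * np.exp(-rate * np.asarray(p, dtype=float))


def step_curve(prices, cum_quantities):
    """Supply curve C(p): cumulative quantity offered at prices <= p (0 below the first bid)."""
    prices = np.asarray(prices, dtype=float)
    cum_quantities = np.asarray(cum_quantities, dtype=float)

    def curve(p):
        k = np.searchsorted(prices, p, side="right") - 1
        return np.where(k >= 0, cum_quantities[np.clip(k, 0, None)], 0.0)

    return curve


def approximate_on_grid(curve, grid, weight, p_max, n_fine=20_000):
    """Weighted L2 projection of `curve` onto step functions that jump at `grid`.

    Returns the levels s_k on [p_k, p_{k+1}) and the coefficients c_k = s_k - s_{k-1}.
    The last interval is truncated at p_max; integrals use a midpoint rule.
    """
    grid = np.asarray(grid, dtype=float)
    edges = np.linspace(grid[0], p_max, n_fine + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    interval = np.searchsorted(grid, mids, side="right") - 1  # interval index of each midpoint
    c_vals, w_vals = curve(mids), weight(mids)
    num = np.bincount(interval, weights=c_vals * w_vals, minlength=len(grid))
    den = np.bincount(interval, weights=w_vals, minlength=len(grid))
    # Weighted mean per interval; fall back to C(p_k) if an interval receives no midpoints.
    levels = np.where(den > 0, num / np.where(den > 0, den, 1.0), curve(grid))
    return levels, np.diff(levels, prepend=0.0)


W = fit_exponential_weight([10, 20, 30, 20])   # unique prices 10, 20, 30 -> rate 1/20
C = step_curve([10, 20], [100, 200])           # toy curve with steps at 10 and 20 EUR/MWh
levels, coefs = approximate_on_grid(C, grid=[10, 30], weight=W, p_max=50)
```

Because the exponential weight favors low prices, the first level lands between 100 and the unweighted average of 150; interval $[0, p_{1})$ is skipped because it does not affect the coefficients.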

First, we find the weight function $W$ that offers the best results in terms of minimizing the difference between the matching price obtained with the original curve and that obtained with the approximate curve. We provisionally adopt a grid with $n = 45$ and $q_{0} = 337$ (the first quartile of the quantities), and we take a sample of 1000 curves.

After this exercise (see Table 1), we select the exponential fit of the unique prices, UniquePrices_Exp, as the weight function, since it yields an error of less than EUR 1/MWh in 95% of the curves. This weight function is represented in Figure 4.

After choosing $W$, we can refine the selection of $n$ and $q_{0}$. Naturally, a larger $n$ increases the accuracy of the approximations, but it also slows down the prediction methods and could introduce unnecessary redundancy. For $q_{0}$, we consider two possibilities: zero or the first quartile of the quantities (337 MWh). Table 2 and Table 3 show the approximation errors for the final price depending on these two parameters for a sample of 1000 curves.

It is worth mentioning that some errors can exceed EUR 100/MWh. This happens because the supply curve sometimes does not reach the required quantity, so the curve and the requirement do not intersect even though a final price is assigned. In such cases, the difference between that price and its approximation is very large, which makes the median a more informative measure than the mean.
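A small illustration, with made-up numbers, of why the median is preferred here: a single curve that does not reach the requirement inflates the mean error while barely moving the median. The share of errors below EUR 1/MWh mirrors the 95% criterion used above to pick the weight function; the function name is hypothetical.

```python
import numpy as np


def summarize_price_errors(true_prices, approx_prices, tol=1.0):
    """Mean, median and share of absolute matching-price errors below `tol` (EUR/MWh)."""
    errors = np.abs(np.asarray(true_prices, float) - np.asarray(approx_prices, float))
    return {
        "mean": float(errors.mean()),
        "median": float(np.median(errors)),
        "share_below_tol": float(np.mean(errors < tol)),
    }


# The last curve never reaches the requirement, producing a huge error.
stats = summarize_price_errors([40.0, 55.0, 61.0, 300.0], [40.2, 54.5, 61.1, 150.0])
# mean is about 37.7 (driven by the outlier), median about 0.35
```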

In conclusion, we choose $n = 50$ and $q_{0} = 337$ as the parameters that provide a good balance between precision and simplicity. Clearly, $q_{0} = 337$ works better than $q_{0} = 0$, and $n = 50$ somewhat improves the median of the errors, whereas increasing it to $n = 55$ would hardly improve it further.

To check that the approximations are good enough, we can compare them with two naive estimators. The first one consists of approximating each curve with the previous day’s curve and using it to calculate the matching price. The second naive estimator simply consists of estimating the final price of a day as the same as the previous day. Table 4 shows that both methods are much worse than the approximations we obtained.

Finally, Figure 5 shows an example of an original supply curve (in blue) together with its approximation (in red), the requirement (horizontal black line), the true matching price, and the matching price obtained through the approximation.

3.2. Models

3.2.1. Metrics and Naive Estimators

Our procedure has two steps: first, the curves are predicted, and then the matching price is predicted by intersecting the requirement with the predicted curve. We therefore need an error metric and a naive benchmark for each step. To measure the accuracy of the supply curve forecasts, we use the mean absolute error (MAE) and the root mean squared error (RMSE) as metrics, and the previous-day curve as the naive estimator. For the matching price predictions, we use the MAE as the metric and the matching price of the previous day as the naive estimator. In both cases, we also tried other naive estimators, such as the previous week's curves and prices, but they gave worse results.
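The two evaluation ingredients can be sketched as follows, assuming the approximated curves are stored as an hourly array with one row per delivery hour and one column per grid price (50 in this paper), and prices as one value per hour. Averaging over all grid points is our assumption about how the curve MAE and RMSE are aggregated; the function names are hypothetical.

```python
import numpy as np


def mae_rmse(actual, predicted):
    """MAE and RMSE over all entries (hours x grid points for curves, hours for prices)."""
    diff = np.asarray(actual, float) - np.asarray(predicted, float)
    return float(np.mean(np.abs(diff))), float(np.sqrt(np.mean(diff ** 2)))


def previous_day_naive(hourly_values, lag=24):
    """Pair each hourly row t >= lag (actual) with row t - lag (naive forecast)."""
    values = np.asarray(hourly_values, float)
    return values[lag:], values[:-lag]


# Toy data: 2 days x 24 h; curves on a 3-point grid and one price per hour.
curves = np.repeat(np.arange(48.0)[:, None], 3, axis=1)
prices = np.arange(48.0)
print(mae_rmse(*previous_day_naive(curves)))  # (24.0, 24.0)
print(mae_rmse(*previous_day_naive(prices)))  # (24.0, 24.0)
```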

3.2.2. Preprocessing of Data

After approximating the supply curves on a fixed grid of prices, we have a table of dimensions $(n_{\text{data}}, \text{grid}_{\text{size}})$, where $\text{grid}_{\text{size}} = 50$. To predict future supply curves, we add lags and reduce the dimensionality of this table through Principal Component Analysis (PCA). We also incorporate exogenous variables, some of them related to calendar effects, and finally perform feature engineering to create new variables. The steps of the preprocessing procedure are as follows:

Lag 24: In each row, we add 50 new columns with the values of the curve for the same hour on the previous day, to which we later apply dimensionality reduction.
Calendar variables: Using one-hot encoding, we include dummy variables for the hour of the day, the day of the week, the month, and the quarter, as well as a binary variable indicating whether the day is a national holiday.
Exogenous variables: We include the wind speed and the diffuse solar radiation recorded in each Spanish provincial capital, obtained from https://open-meteo.com/ (104 variables), to which we later apply dimensionality reduction. We also include a column with the Dutch TTF gas price, the reference price in the European market, obtained from www.investing.com. Lastly, we include a column with the matching price reached in the daily market and another with the amount of MWh assigned.
Train and test split and dimensionality reduction: Before reducing the dimensions of the input variables, we split the data into a training set (the years 2014 to 2018) and a test set (2019, 2020, and 2021). Next, we perform a principal component analysis on the 50 lag-24 columns of each curve. As expected, the 50 columns are highly correlated, and the first five principal components explain 98.51% of the variance. Something similar happens with the wind and radiation variables in each province: of the 52 solar radiation variables, the first two principal components explain 89.66% of the variance, while for the wind, 10 principal components explain 79.82% of the variance. This process greatly reduces the number of input variables and, therefore, the processing time. The entire process is fitted on the training set and reproduced with the same parameters on the test set.
Customized features: The lagged curves may not provide enough information about the evolution of the offers over the last days or weeks. For this reason, we add variables that enable the algorithm to detect these dynamics, in two ways. First, to describe the evolution of the supply for the same hour over the last few weeks, we take the first principal component of each supply curve for the same hour in the last 12 weeks (84 columns) and reduce this information to 15 components through PCA. Second, we add information on all the hours of the previous four days by taking the first principal component of each curve and then performing PCA to select the first eight principal components that summarize this trajectory.
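A sketch of the leakage-free lag-24 reduction with scikit-learn, assuming the approximated curves are stored as hourly rows in chronological order. Toy random curves stand in for the real data, and the function name is hypothetical; the same fit-on-train, transform-on-test pattern would apply to the 52 radiation and 52 wind columns with 2 and 10 components, respectively.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def lag24_components(curves, n_train, n_components=5, lag=24):
    """Scale + PCA on the lag-24 curves, fitted on the training rows only.

    curves: (n_hours, grid_size) array of approximated curves in hourly order.
    n_train: number of hourly rows that belong to the training period.
    Returns PC features and aligned targets for the training and test rows.
    """
    X_lag, Y = curves[:-lag], curves[lag:]   # row k of X_lag is the curve 24 h before row k of Y
    split = n_train - lag                    # targets with original index < n_train are training rows
    reducer = make_pipeline(StandardScaler(), PCA(n_components=n_components))
    Z_train = reducer.fit_transform(X_lag[:split])   # parameters estimated on training data only
    Z_test = reducer.transform(X_lag[split:])        # same parameters reused on the test data
    return Z_train, Z_test, Y[:split], Y[split:], reducer


rng = np.random.default_rng(0)
curves = np.cumsum(rng.gamma(2.0, 50.0, size=(24 * 30, 50)), axis=1)  # 30 toy days, 50 grid points
Z_tr, Z_te, Y_tr, Y_te, reducer = lag24_components(curves, n_train=24 * 20)
explained = reducer.named_steps["pca"].explained_variance_ratio_.sum()
print(Z_tr.shape, Z_te.shape)  # (456, 5) (240, 5)
```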

Before performing each PCA, we scale the data. In addition, to train some methods, such as neural networks, we also scale and normalize the input variables. The settings mentioned above, namely 12 previous weeks of information on the same hour and four previous days of information on all hours, are the values found through hyperparameter optimization (HPO) with the method that performed best on the training data.

Finally, we have the following 91 input variables:

Lag–24 (5 variables): The first five principal components of the supply curve for the same hour on the previous day.
Hour (24 columns).
Weekday (7 columns).
Month (12 columns).
Quarter (4 columns).
Holiday (1 column).
Solar radiation (2 columns): The first two principal components of the diffuse solar radiation in Spanish capitals of provinces.
Wind speed (10 columns): The first ten principal components of the wind speed in Spanish capitals of provinces.
Gas price (1 column): Price of Dutch TTF gas.
Daily market (2 columns): The matching price and quantity assigned in the daily market of the same day.
Same-hour evolution (15 columns): Information on the evolution of the first principal component of each curve for the same hour in the last 12 weeks.
All-hours evolution (8 columns): Information on the evolution of the first principal component of each curve for all hours in the last four days.
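The 48 calendar columns in the list above (24 + 7 + 12 + 4 + 1) can be produced with plain NumPy, as sketched below. The holiday calendar is supplied by the caller (national holidays are not computed here), the timestamps are assumed to be naive local times, and the function name is hypothetical.

```python
import numpy as np


def calendar_features(timestamps, holidays=()):
    """One-hot hour (24), weekday (7), month (12), quarter (4) plus a holiday flag (1): 48 columns."""
    ts = np.asarray(timestamps, dtype="datetime64[h]")
    hour = ts.astype("int64") % 24
    days = ts.astype("datetime64[D]")
    weekday = (days.astype("int64") + 3) % 7                  # 1970-01-01 was a Thursday; Monday = 0
    month = ts.astype("datetime64[M]").astype("int64") % 12   # 0 = January
    quarter = month // 3
    holiday = np.isin(days, np.asarray(holidays, dtype="datetime64[D]")).astype(float)
    return np.hstack([
        np.eye(24)[hour], np.eye(7)[weekday], np.eye(12)[month], np.eye(4)[quarter],
        holiday[:, None],
    ])


# A Monday in January at 05:00 and Christmas Day (a Wednesday) at 23:00.
X_cal = calendar_features(["2019-01-07T05", "2019-12-25T23"], holidays=["2019-12-25"])
print(X_cal.shape)  # (2, 48)
```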

3.2.3. ARIMA Model

The first model that we consider is a Seasonal AutoRegressive Integrated Moving Average (SARIMA) model (see, for instance, [18]), whose results serve as an additional benchmark. To fit SARIMA models, we first reduce the dimension of the series by means of PCA, keeping only the first two principal components, which explain 95.39% of the variance. A SARIMA model is then selected and fitted to each time series formed by the values of a principal component. To predict a 24 h horizon, we use the auto_arima procedure from statsmodels [19] with training data from the previous month. As Table 5 shows, the results are only slightly better than those of the naive estimator.

3.2.4. Machine Learning Models

To predict the 50 series, different machine learning algorithms were tested, including Random Forest (RF) [20], Histogram-based (HB) Gradient Boosting [21], Dense Neural Networks (DNN) [22] and Long Short-Term Memory (LSTM) [23]. In all these models, we used a multi-output approach, that is, the 50 time series that arise from the approximation of the curves were predicted simultaneously.

Four different alternatives were tested with Random Forest: (1) a single RF model considering only the time series, that is, with the first five PCs of the previous-day curve as input variables; (2) a single RF model with all the input variables, endogenous and exogenous; (3) 24 RF models, one for each hour, considering only the time series (first five PCs of the previous-day curve); and (4) 24 RF models, one for each hour, considering all the input variables. Note that a single model is trained with data from all hours of the training set, whereas each of the 24 hourly models is trained only with the data of its corresponding hour; therefore, its training set is 24 times smaller than that of a single model.
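To make the single-model versus per-hour comparison concrete, here is a scikit-learn sketch on synthetic data (all function names hypothetical). `RandomForestRegressor` handles the 50 outputs natively; passing only the lag-24 PC columns as `X` corresponds to alternatives (1) and (3), and passing all input variables corresponds to (2) and (4). Each hourly forest sees only 1/24 of the rows.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_single_and_hourly(X, Y, hours, **rf_kwargs):
    """One multi-output forest on all hours, plus one forest per hour of the day."""
    single = RandomForestRegressor(**rf_kwargs).fit(X, Y)
    hourly = {h: RandomForestRegressor(**rf_kwargs).fit(X[hours == h], Y[hours == h])
              for h in np.unique(hours)}
    return single, hourly


def predict_hourly(models, X, hours, n_outputs):
    """Route each row to the model of its hour; rows of unseen hours stay NaN."""
    preds = np.full((len(X), n_outputs), np.nan)
    for h, model in models.items():
        mask = hours == h
        if mask.any():
            preds[mask] = model.predict(X[mask])
    return preds


rng = np.random.default_rng(2)
X = rng.normal(size=(24 * 20, 5))                              # e.g. five PCs of the previous-day curve
Y = np.cumsum(np.abs(rng.normal(size=(24 * 20, 50))), axis=1)  # 50 grid values per curve
hours = np.tile(np.arange(24), 20)
single, hourly = fit_single_and_hourly(X, Y, hours, n_estimators=20, random_state=0)
```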

We tested these models on the years 2019–2021, during which the electricity market behaved irregularly due to the pandemic. For this reason, in addition to the results for that period, Table 5 also presents the error metrics for 2019 alone. This allows us to observe how the model predicts the curves in a standard year.

As observed in Table 5, a single model works better than 24 models. This is probably because, when the data are divided into 24 groups, each model does not have enough observations for an efficient fit.

Histogram-Based Gradient Boosting works better than Random Forest, and it also produces better results with a single model than with 24 different ones, probably for the same reason. To mitigate the problem of having few observations in each group, we tried clustering the hours into four groups and therefore fitting four models instead of 24. The results, which also appear in Table 5, are no better than those of a single HB Gradient Boosting model, so we chose the latter as our final model. Note that the improvement of the single HB Gradient Boosting model over the four-model version was small in 2019 but larger over the entire period 2019–2021.

A DNN model was also tested, performing HPO on the number of layers, the size of each layer, the dropout rate, etc. The best DNN option turned out to be 24 different models, each a neural network with one hidden layer of 1443 neurons. However, the results did not improve on those of HB Gradient Boosting. An LSTM model was also considered, but it obtained very poor results.

3.2.5. Monotonicity Restoration

Each model predicts the future values of 50 time series; however, it must be taken into consideration that these 50 series make up different supply curves that must always be non-decreasing functions. In practice, we observe that the models do not exactly respect this characteristic, producing small mismatches that break the monotonicity between the 50 values. In Figure 6, we can observe the frequency and size of the monotonicity breaks in the outputs from HB Gradient Boosting.

To solve this problem, we propose a method to restore monotonicity. It works as follows: if a point in a curve is followed by more than one point of lower value before the curve rises again, we consider this local maximum to be an error and decrease it. If, instead, only a single point has a lower value before the curve rises again, we increase that low point. Iterating this procedure several times restores monotonicity in the curves, and the corrected curves are closer to the actual ones. Table 6 and Table 7 show how this process reduces the errors of the predictions with respect to the real curves.
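A direct reading of this rule is sketched below. The text does not state by how much a point is decreased or increased, so two assumptions are made here: a local maximum is lowered to the value of the point that follows it, and an isolated dip is raised to the value of the point that precedes it. The final cumulative maximum is a safeguard we added against non-termination and is not part of the described method.

```python
import numpy as np


def restore_monotonicity(values, max_iter=10_000):
    """Iteratively repair drops in a predicted curve so that it becomes non-decreasing."""
    y = np.asarray(values, dtype=float).copy()
    for _ in range(max_iter):
        drops = np.flatnonzero(np.diff(y) < 0)
        if drops.size == 0:
            return y
        i = drops[0]                        # first monotonicity break: y[i] > y[i + 1]
        j = i + 1
        while j < len(y) and y[j] < y[i]:   # points below y[i] before the curve rises again
            j += 1
        if j - (i + 1) > 1:
            y[i] = y[i + 1]                 # several low points: treat y[i] as a spurious peak
        else:
            y[i + 1] = y[i]                 # a single low point: raise the dip
    return np.maximum.accumulate(y)         # safeguard (assumption), rarely reached


print(restore_monotonicity([1, 2, 5, 3, 4, 6]))  # peak at 5 lowered: [1. 2. 3. 3. 4. 6.]
print(restore_monotonicity([1, 3, 2, 4]))        # dip at 2 raised:   [1. 3. 3. 4.]
```

Lowering a peak can expose a new break just to its left, so the loop always restarts from the first remaining break; raising a single dip never creates a new one.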

The improvement is small (at most 0.41% in MAE and 0.36% in RMSE), which indicates that monotonicity breaks are not a major issue in the predictions of the chosen model. Even so, applying the correction is preferable, because it guarantees that the predicted curves satisfy the non-decreasing constraint.

4. Discussion on Curve and Price Prediction Results

4.1. Final Model for Curve Prediction

As mentioned in the previous section, the supply curves are predicted with an HB Gradient Boosting model. To obtain the best possible results, the model was retrained monthly: to predict each new month, it was trained with all the data from the first observation to the last day of the previous month. We also tested a fixed training window (for example, the last two years), but the results were better when all the data prior to the forecast month were used. Therefore, for each month, all the data were preprocessed again, the model was retrained, and the curves for that month were forecast. Figure 7 shows the resulting monthly errors in the prediction of the curves.
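
The monthly retraining scheme is an expanding-window backtest. A minimal sketch of the split logic, assuming each observation carries its delivery date; the helper name is hypothetical, and the preprocessing and model fitting are left out.

```python
from datetime import date
from typing import Iterator, List, Sequence, Tuple

YearMonth = Tuple[int, int]

def monthly_expanding_splits(
    days: Sequence[date], first_test_month: YearMonth
) -> Iterator[Tuple[YearMonth, List[int], List[int]]]:
    """Yield (month, train_idx, test_idx) for an expanding-window monthly backtest.

    The training set holds every observation dated before the first day of the
    test month (from the very first observation onward); the test set holds the
    observations that fall inside that month.
    """
    key = [(d.year, d.month) for d in days]
    test_months = sorted({k for k in key if k >= first_test_month})
    for month in test_months:
        train_idx = [i for i, k in enumerate(key) if k < month]
        test_idx = [i for i, k in enumerate(key) if k == month]
        yield month, train_idx, test_idx
```

In each iteration, one would re-run the preprocessing and refit the model on `train_idx` before forecasting `test_idx`. The fixed-window alternative, which the authors found to perform worse, would correspond to dropping the oldest indices from `train_idx`.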

In the test set, the errors of the final model are lower than those of the naive estimator in every month except January 2021. Figure 8 shows two examples of predicted supply curves. The accuracy of the prediction near the intersection with the requirement (black horizontal line) is particularly important, because this intersection determines the matching price.

Examples of predictions such as these make clear how important the previous-day curve is for the new prediction, as the interpretability analysis below confirms. It is therefore worth studying its influence in greater detail. Figure 9 illustrates this by relating, for each predicted curve, its distance to the actual curve (horizontal axis) and its distance to the previous-day curve (vertical axis).

As can be seen in Figure 9, the distance from the predicted curve to the previous-day curve follows a right-skewed distribution. Interestingly, the 25% best-predicted curves are closer to the previous-day curve than the other groups. Since these predictions are close both to the actual curve and to the previous-day curve, in many of these cases the actual curve itself is close to lag 24, so these curves may not be very difficult to predict. Something similar, however, happens at the opposite end: the 25% worst predictions are also closer to the previous-day curve than the predictions in the second and third quartiles. In many of these cases, the algorithm may be predicting a curve very similar to that of the previous day while the actual curve is quite different. Perhaps the algorithm is not correctly using some information that would move the prediction away from lag 24, or perhaps the input data do not contain the key information that makes the supply curve on those days differ from that of the previous day.
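
The quartile comparison behind Figure 9 can be sketched as follows. The paper does not state which distance between curves is used, so the sketch assumes the mean absolute difference over the grid values; the function names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence

def mean_abs_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two curves on the same grid (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def lag24_distance_by_error_quartile(
    predicted: Sequence[Sequence[float]],
    actual: Sequence[Sequence[float]],
    previous_day: Sequence[Sequence[float]],
) -> Dict[int, float]:
    """Median distance(predicted, previous-day curve) within each error quartile.

    Quartile 1 holds the 25% best-predicted curves (smallest distance to the
    actual curve) and quartile 4 the 25% worst.
    """
    err = [mean_abs_distance(p, a) for p, a in zip(predicted, actual)]
    lag = [mean_abs_distance(p, prev) for p, prev in zip(predicted, previous_day)]
    order = sorted(range(len(err)), key=err.__getitem__)
    n = len(order)
    groups: Dict[int, List[float]] = {1: [], 2: [], 3: [], 4: []}
    for rank, idx in enumerate(order):
        groups[1 + (4 * rank) // n].append(lag[idx])
    return {q: median(v) for q, v in groups.items() if v}
```

A U-shaped result, with small values in quartiles 1 and 4 and larger ones in quartiles 2 and 3, would reproduce the pattern described above.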

Interpretability

Histogram-Based Gradient Boosting is an optimized variant of the Gradient Boosting algorithm [24] that is especially useful for large or high-dimensional data sets, as in our case. Unfortunately, it does not provide a feature importance measure that would reveal which input variables are most decisive in the predictions. Instead, we can use the Shap package, which relies on Shapley values (see [25]). Shapley values come from cooperative game theory, where they are used to fairly allocate the overall outcome of a game among the players according to their contributions. When applied to the interpretation of machine learning models, the central idea is to evaluate how the prediction changes when a specific feature is included or excluded, averaged over all possible combinations of the other features. Figure 10 shows the resulting Shap values when the model is used to predict the time series Q30.
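
To make the idea of "all possible feature combinations" concrete, the sketch below computes exact Shapley values by brute-force enumeration of coalitions. It is an illustration under simplifying assumptions, not what the Shap package does internally: a feature excluded from a coalition is simply replaced by a single baseline value, and the cost grows as 2^n, whereas Shap relies on background data and on faster, model-specific algorithms for tree ensembles.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x by enumerating every feature coalition.

    Features outside a coalition take their baseline value (a simplifying assumption).
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model f(z) = 2 z0 + 3 z1 - z2 with a zero baseline, the values at x = (1, 2, 3) are (2, 6, -3), and they add up to f(x) - f(baseline) = 5. For the interaction f(z) = z0 z1 at x = (1, 1), the contribution is split evenly, giving (0.5, 0.5).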

In Figure 10, there is one point for each predicted hour. The Shap value of a feature measures how much that feature pushes the prediction of Q30 above or below the model's average prediction. If most points lie far from zero, the variable is important for the predictions. Figure 11 summarizes this information.

The most influential variable in the prediction of a curve is the previous-day curve, represented by its first principal component (lag24_pc1). Its influence is roughly constant across the different Q series, although a slight increasing trend can be observed up to Q39 (see Figure 12).

The second most important variable for predicting the supply curve is the matching price of the daily market (mDaily_price) for the same day. The third is the evolution of the first principal component of the curve for the same hour over the last 15 weeks (sameHourEvol_pc1), although it carries less weight when predicting the values at the end of each curve. These are followed by the price of TTF gas (gas_price), which mainly affects series Q12 to Q16 and Q39 to Q44 approximately, and by the first principal component of wind speed (w1). Note that Shap values are not normalized importances and do not add up to one; for each prediction, they add up to the difference between that prediction and the model's average prediction. Finally, the importances for series Q1 to Q5 are not very informative, because the values of these series are very similar in all cases.

4.2. Matching Prices Prediction

Once the supply curves have been predicted, we only need to compute their intersection with the requirement for the corresponding hour and day to obtain a prediction of the matching price. We then compare these predictions with the actual final prices (Table 8).
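
A minimal sketch of the intersection step, assuming each curve is represented by the offered quantities Q1 to Q50 at an increasing grid of prices and that, after monotonicity restoration, these quantities are non-decreasing. Linear interpolation between grid points is our assumption; for a purely stepwise curve, one would return `prices[k]` instead. The function name is hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence

def matching_price(
    prices: Sequence[float], quantities: Sequence[float], requirement: float
) -> Optional[float]:
    """Price at which a non-decreasing supply curve first reaches the requirement.

    prices: increasing price grid; quantities: offered quantity at each price.
    Returns None when the curve never reaches the requirement.
    """
    k = bisect_left(quantities, requirement)  # first index with Q >= requirement
    if k == len(quantities):
        return None
    if k == 0:
        return prices[0]
    # Here quantities[k - 1] < requirement <= quantities[k], so q1 > q0.
    q0, q1 = quantities[k - 1], quantities[k]
    p0, p1 = prices[k - 1], prices[k]
    return p0 + (requirement - q0) * (p1 - p0) / (q1 - q0)
```

For example, with prices `[10, 20, 30, 40]`, quantities `[100, 200, 300, 400]`, and a requirement of 250, the requirement is met between the second and third grid points, and the interpolated matching price is 25.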

Table 8 compares the prediction errors of HB Gradient Boosting (HGB) with those of two naive methods: (1) the previous-day matching price (naive price) and (2) the intersection of the previous-day supply curve with the actual requirement (naive curve). HB Gradient Boosting outperforms both naive approaches: its mean absolute error is 4.97 EUR/MWh, compared with 6.36 and 7.08 EUR/MWh, and it is also lower at every percentile from P25 to P99.
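
The naive-price benchmark can be sketched as a 24-hour lag of the hourly matching-price series, assuming "previous-day matching price" means the price for the same hour of the previous day. The naive-curve benchmark is not shown; it would apply the same curve-intersection step to the previous-day curve instead of the predicted one. Function names are hypothetical.

```python
from typing import List, Optional, Sequence

def naive_price_forecast(prices: Sequence[float], lag: int = 24) -> List[Optional[float]]:
    """Naive-price benchmark: the forecast for hour t is the matching price at t - lag."""
    if lag <= 0:
        raise ValueError("lag must be positive")
    # The first `lag` hours have no previous-day price, so they get None.
    return [None] * min(lag, len(prices)) + list(prices[:-lag])

def mean_absolute_error(
    actual: Sequence[float], forecast: Sequence[Optional[float]]
) -> float:
    """MAE over the hours that have a forecast (None entries are skipped)."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if f is not None]
    if not pairs:
        raise ValueError("no overlapping forecasts")
    return sum(abs(a - f) for a, f in pairs) / len(pairs)
```

With one day at 50 EUR/MWh followed by one day at 60 EUR/MWh, the naive price forecasts 50 for every hour of the second day, giving an MAE of 10 EUR/MWh over that day.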

Figure 13 shows the monthly mean error and the interval between the 5th and 95th percentiles of the prediction errors. The errors in 2021, the last year, are notably higher than those in 2019 and 2020 (see also Table 9): the mean absolute error is 9.52 EUR/MWh in 2021, compared with 2.65 and 2.74 EUR/MWh in the two previous years.
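
A minimal sketch of the monthly aggregation shown in Figure 13, assuming the absolute price errors are available together with the date of each forecast. The linear-interpolation ("inclusive") percentile convention is our choice, since the paper does not state which one was used; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, quantiles
from typing import Dict, List, Sequence, Tuple

def monthly_error_bands(
    days: Sequence[date], abs_errors: Sequence[float]
) -> Dict[Tuple[int, int], Tuple[float, float, float]]:
    """Return {(year, month): (mean, P5, P95)} of the absolute errors."""
    by_month: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for d, e in zip(days, abs_errors):
        by_month[(d.year, d.month)].append(e)
    out: Dict[Tuple[int, int], Tuple[float, float, float]] = {}
    for ym, errs in sorted(by_month.items()):
        if len(errs) < 2:
            continue  # quantiles() needs at least two values
        # n=20 yields 19 cut points: the 5th, 10th, ..., 95th percentiles.
        cuts = quantiles(errs, n=20, method="inclusive")
        out[ym] = (mean(errs), cuts[0], cuts[-1])
    return out
```

With the 21 errors 0, 1, ..., 20 in a single month, this convention gives a mean of 10 and a 5th to 95th percentile interval of [1, 19].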

The predictions for 2021 are thus much worse than those of previous years; their mean error (9.52 EUR/MWh) even exceeds the overall mean errors of the naive estimators in Table 8. That comparison is misleading, however, because the naive estimators also deteriorate in 2021. Restricted to 2021, their mean errors are 11.68 EUR/MWh (naive price) and 13.02 EUR/MWh (naive curve), so they remain worse than HB Gradient Boosting (see Table 10).

One possible explanation for this irregularity is the set of measures implemented in response to the COVID-19 pandemic in 2021. It is worth noting, however, that strong measures were also implemented in 2020, yet the errors do not show such a pronounced effect. Figure 14 compares the monthly error with the number of detected COVID-19 cases; case counts for the early months of the pandemic should be interpreted with caution, given the limited resources available for accurate case detection at that time.

The pandemic could be a contributing factor to the greater difficulty of predicting electricity prices in the secondary market. In addition, 2021 saw an important change in electricity market legislation [26] that modified the minimum and maximum prices, as well as various support measures to address energy poverty, such as the expansion of the Electricity Social Bonus and the strengthening of the Supply Guarantee Fund. These factors may explain, at least in part, the irregularity of 2021 and therefore the lower accuracy of the predictions.

5. Conclusions

The conclusions that we can draw fall into two groups: those related to the prediction of supply curves and those related to the prediction of matching prices.

First, despite the complexity of the problem, we were able to predict the supply curves with clear improvements over both the naive estimator and the SARIMA model. The best results were obtained with a Histogram-Based Gradient Boosting model. The previous-day curve and the daily market matching price for the same day were the most influential factors, although the remaining factors also carried notable weight in the prediction. However, as Figure 7 shows, the improvement of these predictions over the naive estimator was larger in 2019 and 2020 than in 2021. In general, as Figure 9 shows, when the prediction of a curve is poor, the predicted curve tends to stay close to the previous-day curve. This could indicate that some factor driving the differences between the actual curve and the previous-day curve is missing from the input features, and that including it could improve the performance of the algorithm. Plausible inputs that we would explore in future work are (1) accumulated rainfall and the availability of water reserves for hydroelectric production, and (2) changes and/or errors in the demand prediction of the system operator.

The abnormality of 2021 is amplified when estimating the matching prices. For 2019 and 2020, the absolute error of the predicted price was below 10 EUR/MWh in 95% of the cases (P95 of 9.70 and 8.58 EUR/MWh, respectively; see Table 9). In contrast, 2021 behaved much more irregularly, or at least did not follow the same patterns, and the results, although better than those of the naive estimators, were not as good as in previous years. The spread of the pandemic and the measures implemented to contain it could be related to this, making 2021 an especially difficult year to predict.

Another direction for future work is dimension reduction through canonical correlation analysis, which would allow us to consider linear combinations of the input variables that are highly correlated with the variables to be predicted.

Author Contributions

Conceptualization, D.F.-P. and A.M.A.; methodology, D.F.-P. and A.M.A.; software, D.F.-P.; writing—original draft preparation, D.F.-P.; writing—review and editing, A.M.A.; visualization, D.F.-P.; supervision, A.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

The second author acknowledges the partial funding of Ministerio de Ciencia e Innovación: PID2019-108311GB-I00/MCIN/AEI/10.13039/501100011033.

Data Availability Statement

The data considered in this paper is available at https://www.esios.ree.es/es/descargas (accessed on 15 February 2023) and https://www.esios.ree.es/es/curvas-de-ofertas (accessed on 15 February 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CNMC	Comisión Nacional de los Mercados y la Competencia
DNN	Dense Neural Network
HB	Histogram-based
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MSE	Mean Square Error
OMIE	Operador del Mercado Ibérico de Energía
PC	Principal Components
PCA	Principal Components Analysis
RF	Random Forest
REE	Red Eléctrica de España
SARIMA	Seasonal AutoRegressive Integrated Moving Average

Appendix A

Figure A1. Publications and submission of requirements and offers for secondary and tertiary markets. Extract from the resolution published in the Official State Gazette (in Spanish).

References

1. Hong, T.; Pinson, P.; Wang, Y.; Weron, R.; Yang, D.; Zareipour, H. Energy Forecasting: A Review and Outlook. IEEE Open Access J. Power Energy 2020, 7, 376–388.
2. Mestre, G.; Sánchez-Úbeda, E.F.; Muñoz San Roque, A.; Alonso, E. The arithmetic of stepwise offer curves. Energy 2022, 239, 122444.
3. Nowotarski, J.; Weron, R. Recent advances in electricity price forecasting: A review of probabilistic forecasting. Renew. Sustain. Energy Rev. 2018, 81, 1548–1568.
4. Lago, J.; Marcjasz, G.; De Schutter, B.; Weron, R. Forecasting day-ahead electricity prices: A review of state-of-the-art algorithms, best practices and an open-access benchmark. Appl. Energy 2021, 293, 116983.
5. Ziel, F.; Steinert, R. Electricity price forecasting using sale and purchase curves: The X-model. Energy Econ. 2016, 59, 435–454.
6. Pinhão, M.; Fonseca, M.; Covas, R. Electricity Spot Price Forecast by Modelling Supply and Demand Curve. Mathematics 2022, 10, 2012.
7. Energia y Sociedad. History of Electricity in Spain. 2023. Available online: https://www.energiaysociedad.es/manual-de-la-energia/1-2-historia-de-la-electricidad-en-espana/ (accessed on 3 May 2023). (In Spanish).
8. Agosti, L.; Padilla, A.J.; Requejo, A. El mercado de generación eléctrica en España: Estructura, funcionamiento y resultados. Econ. Ind. 2007, 364, 21–37.
9. Endesa. How Electricity Market Works in Spain. 2022. Available online: https://www.endesa.com/en/the-e-face/energy-sector/how-electricity-market-works-in-spain (accessed on 10 April 2023).
10. Liu, J.; Wang, J.; Cardinal, J. Evolution and reform of UK electricity market. Renew. Sustain. Energy Rev. 2022, 161, 112317.
11. Energia y Sociedad. Demand and Production Adjustment Mechanisms. 2023. Available online: https://www.energiaysociedad.es/manual-de-la-energia/6-5-mecanismos-de-ajuste-de-demanda-y-produccion (accessed on 10 April 2023). (In Spanish).
12. BOE. No. 335 del Jueves 24 de Diciembre de 2020, Sec. III. Pág. 120122–120317, Resolución de 10 de Diciembre de 2020, de la Comisión Nacional de los Mercados y la Competencia, por la Que se Aprueba la Adaptación de los Procedimientos de Operación del Sistema a las Condiciones Relativas al Balance Aprobadas por Resolución de 11 de Diciembre de 2019; Gobierno de España: Madrid, Spain, 2020.
13. Aneiros, G.; Vilar, J.M.; Cao, R.; Muñoz San Roque, A. Functional Prediction for the Residual Demand in Electricity Spot Markets. IEEE Trans. Power Syst. 2013, 28, 4201–4208.
14. Kulakov, S. X-Model: Further Development and Possible Modifications. Forecasting 2020, 2, 20–35.
15. Mestre, G.; Portela, J.; Muñoz San Roque, A.; Alonso, E. Forecasting hourly supply curves in the Italian day-ahead electricity market with a double-seasonal SARMAHX model. Int. J. Electr. Power Energy Syst. 2020, 121, 106083.
16. Shah, I.; Lisi, F. Forecasting of electricity price through a functional prediction of sale and purchase curves. J. Forecast. 2020, 39, 242–259.
17. Alonso, A.M.; Li, Z. Approximation of supply curves. arXiv 2023, arXiv:2311.10738.
18. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015.
19. Seabold, S.; Perktold, J. Statsmodels: Econometric and statistical modeling with python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010.
20. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
21. Guryanov, A. Histogram-Based Algorithm for Building Gradient Boosting Ensembles of Piecewise Linear Decision Trees. In Analysis of Images, Social Networks and Texts, Proceedings of the 8th International Conference, AIST 2019, Kazan, Russia, 17–19 July 2019; van der Aalst, W.M.P., Batagelj, V., Ignatov, D.I., Khachay, M., Kuskova, V., Kutuzov, A., Kuznetsov, S.O., Lomazova, I.A., Loukachevitch, N., Napoli, A., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 39–50.
22. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
23. Lindemann, B.; Müller, T.; Vietz, H.; Jazdi, N.; Weyrich, M. A survey on long short-term memory networks for time series prediction. Procedia CIRP 2021, 99, 650–655.
24. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
25. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.
26. BOE. No. 120, del Jueves 20 de Mayo de 2021. Sec. I, Pág. 61443–61605. Resolución de 6 de Mayo de 2021, de la Comisión Nacional de los Mercados y la Competencia, por la Que se Aprueban las Reglas de Funcionamiento de los Mercados Diario e Intradiario de Energía Eléctrica para su Adaptación de los Límites de Oferta a los Límites de Casación Europeos; Gobierno de España: Madrid, Spain, 2021.

Figure 1. Scheme of the participants in the electricity market.

Figure 2. Spanish electricity market scheme.

Figure 3. Examples of the 24 supply curves of a day.

Figure 4. Weight function UniquePrices_exp used to calculate the approximation error.

Figure 5. Example of curve approximation.

Figure 6. Histogram of monotonicity breaks before correction.

Figure 7. Monthly MAE and RMSE in curve forecasting.

Figure 8. Examples of curve forecasts.

Figure 9. Influence of lag 24 on predicted curves.

Figure 10. Shap values of features for Q30 series (w1 represents the first principal component of the wind speed).

Figure 11. Mean of the shap values of features for Q30 series (w1 represents the first principal component of the wind speed).

Figure 12. Shap values of the most relevant features for all series (w1 represents the first principal component of the wind speed).

Figure 13. Prediction errors of the matching price by month using HGB.

Figure 14. Error in matching price prediction and COVID-19.

Table 1. Summary statistics of the approximation errors for the price depending on the weight function, W.

Weight Function	Mean	Median	SD	Q1	Q3	P90	P95	P99
UniquePrices_Exp	0.37	0.23	1.92	0.10	0.41	0.66	0.95	2.56
FinalPrices_logNormal	0.37	0.23	1.76	0.10	0.41	0.67	0.98	2.64
AllPrices_Exp	0.37	0.23	1.69	0.10	0.41	0.67	1.00	2.40
FinalPrices_Cauchy	0.39	0.23	2.19	0.10	0.41	0.70	1.00	2.77
FinalPrices_Exp	0.39	0.23	2.35	0.10	0.41	0.69	1.00	2.75
FinalPrices_Normal	0.40	0.22	2.29	0.10	0.41	0.68	1.00	2.61

Table 2. Summary statistics of the approximation errors for the price depending on n for $q_{0} = 0$.

n	Mean	Median	SD	Q1	Q3	P90	P95	P99
35	0.55	0.30	1.68	0.12	0.60	1.00	1.48	4.22
40	0.47	0.27	1.08	0.10	0.52	0.90	1.29	3.78
45	0.44	0.24	2.33	0.10	0.47	0.77	1.12	3.18
50	0.40	0.23	2.00	0.10	0.42	0.75	1.06	2.99
55	0.34	0.20	0.78	0.08	0.40	0.67	0.95	2.44

Table 3. Summary statistics of the approximation errors for the price depending on n for $q_{0} = 337$.

n	Mean	Median	SD	Q1	Q3	P90	P95	P99
35	0.49	0.32	2.27	0.13	0.53	0.83	1.20	3.06
40	0.46	0.27	2.77	0.12	0.47	0.77	1.12	2.96
45	0.38	0.23	2.20	0.10	0.42	0.67	0.99	2.57
50	0.40	0.20	3.53	0.09	0.39	0.60	0.91	2.51
55	0.34	0.20	2.85	0.09	0.36	0.57	0.80	2.11

Table 4. Summary statistics of the prediction errors for the price using naive estimators.

Estimator	Mean	Median	SD	Q1	Q3	P90	P95	P99
Previous Day Curve	7.03	3.60	12.19	1.38	8.35	16.41	24.33	50.56
Previous Day Price	6.20	3.36	8.70	1.30	7.68	14.65	21.22	43.10

Table 5. Error metrics in curve prediction, 2019 and 2019–2021.

Model	MAE (2019–2021)	RMSE (2019–2021)	MAE (2019)	RMSE (2019)
Naive	172	229	181	240
SARIMA	167	220	174	229
Random Forest without exogenous vars	151	195	153	196
Random Forest with exogenous vars	157	194	141	180
Random Forest 24 models without exogenous vars	160	204	162	206
Random Forest 24 models with exogenous vars	168	216	150	192
HB Gradient Boosting	151	195	137	175
HB Gradient Boosting 24 models	159	204	143	183
HB Gradient Boosting 4 models	155	200	137	176
DNN 24 models	169	233	138	178

Table 6. MAE before and after monotonicity restoration.

MAE	2019 I	2019 II	2019 III	2020 I	2020 II	2020 III	2021 I	2021 II	2021 III
Before correction	131.27	123.85	153.29	163.23	136.05	136.54	136.85	147.38	146.01
After correction	131.19	123.57	152.86	162.74	135.49	136.20	136.89	147.32	145.58
Difference (%)	-0.06	-0.23	-0.28	-0.30	-0.41	-0.25	0.03	-0.03	-0.29

Table 7. RMSE before and after monotonicity restoration.

RMSE	2019 I	2019 II	2019 III	2020 I	2020 II	2020 III	2021 I	2021 II	2021 III
Before correction	167.20	158.05	195.89	209.50	171.04	174.37	176.54	188.90	189.13
After correction	167.04	157.69	195.33	208.90	170.43	173.89	176.58	188.89	188.65
Difference (%)	-0.10	-0.23	-0.29	-0.29	-0.36	-0.27	0.03	-0.00	-0.25

Table 8. Summary statistics of the absolute value of the prediction errors for matching prices (EUR/MWh).

	HGB	Naive Price	Naive Curve
Mean	4.97	6.36	7.08
SD	8.48	10.34	21.00
Min	0.00	0.00	0.00
P25%	0.96	1.03	1.00
P50%	2.28	2.83	2.80
P75%	5.38	7.38	7.40
P90%	11.63	15.20	15.75
P95%	18.33	23.71	24.79
P99%	43.49	52.87	58.00
Max	208.20	209.62	592.87

Table 9. Summary statistics of absolute errors in matching price predictions by year (EUR/MWh).

Year	Mean	Q1	Median	Q3	P90	P95	P99	Max
2019	2.65	0.70	1.45	3.10	6.55	9.70	16.33	59.00
2020	2.74	0.85	1.90	3.65	6.26	8.58	13.37	43.21
2021	9.52	2.12	5.06	11.61	23.03	35.06	66.71	208.20

Table 10. Summary statistics of the absolute prediction errors for the matching prices in 2021 (EUR/MWh).

	HGB	Naive Price	Naive Curve
Mean	9.52	11.68	13.02
SD	12.82	15.17	28.78
Min	0.00	0.00	0.00
P25%	2.12	2.59	2.53
P50%	5.06	6.51	6.40
P75%	11.61	14.36	14.64
P90%	23.03	28.27	29.76
P95%	35.06	43.30	45.00
P99%	66.71	74.98	82.84
Max	208.20	209.62	585.39

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Foronda-Pascual, D.; Alonso, A.M. Prediction of Matching Prices in Electricity Markets through Curve Representation. Energies 2023, 16, 7812. https://doi.org/10.3390/en16237812

AMA Style

Foronda-Pascual D, Alonso AM. Prediction of Matching Prices in Electricity Markets through Curve Representation. Energies. 2023; 16(23):7812. https://doi.org/10.3390/en16237812

Chicago/Turabian Style

Foronda-Pascual, Daniel, and Andrés M. Alonso. 2023. "Prediction of Matching Prices in Electricity Markets through Curve Representation" Energies 16, no. 23: 7812. https://doi.org/10.3390/en16237812

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
