1. Introduction
Governments usually allocate carbon emission quotas to enterprises to reduce carbon emissions. Carbon emission allowances are the total allowable amount of carbon dioxide and other greenhouse gases that an enterprise can emit during a period. The government distributes free quotas to enterprises based on industry benchmarks. If an enterprise’s emissions exceed its free quota, it has to buy carbon emission allowances from the market. The European carbon market (EU ETS, European Union Emissions Trading System) is the largest in the world and covers many industries, such as power, steel, and petrochemicals. Enterprises with low carbon emissions can sell their surplus allowances to enterprises with high emission demands.
Carbon emission allowance trading includes spot trading and derivatives trading. Spot trading involves buying and selling carbon emission allowances directly. Derivatives trading includes carbon emission allowance futures and options. Carbon emission allowance futures are contracts in which the buyer and seller agree to trade carbon emission allowances at a specific price on a future date; the allowances are delivered when the contract expires. Options are derivatives that give holders the right to buy or sell an underlying asset at a specific price (the strike price) on the expiration date. This paper considers carbon emission allowance options whose underlying asset is EUA (EU allowance) futures.
We study the carbon options trading problem for three reasons. First, options can lock in the price. The option holder has the right to buy or sell the underlying asset at the strike price on the expiration date. If the underlying asset price increases considerably, the call option holder can still buy the underlying asset at the strike price; if it falls sharply, the put option holder can still sell at the strike price. Second, options are more flexible than spot positions, because they give holders the right rather than the obligation to trade the underlying asset. If the underlying asset price is lower than the strike price on the expiration day, a call option holder will not exercise the option; if it is higher than the strike price, a put option holder will not exercise the option. Third, options allow investors to control a larger value of assets at a lower cost, the option premium. Because the premium is usually much less than the underlying price, options have a leverage effect.
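The exercise rule described above can be illustrated with a minimal Python sketch. The function names and the numerical values are illustrative assumptions, not taken from the paper's data; `exposure_ratio` is only a rough proxy for the leverage effect.

```python
def call_payoff(spot: float, strike: float) -> float:
    """Payoff of one call at expiry: exercised only if spot > strike."""
    return max(spot - strike, 0.0)


def put_payoff(spot: float, strike: float) -> float:
    """Payoff of one put at expiry: exercised only if spot < strike."""
    return max(strike - spot, 0.0)


def exposure_ratio(underlying_price: float, premium: float) -> float:
    """Value of underlying controlled per unit of premium paid (hypothetical leverage proxy)."""
    return underlying_price / premium


# Hypothetical numbers: strike 80, three possible prices at expiry.
strike = 80.0
payoffs = [(s, call_payoff(s, strike), put_payoff(s, strike)) for s in (60.0, 80.0, 100.0)]
```

When the price ends at 100, the call holder still buys at 80; when it ends at 60, the call expires unexercised and the put holder still sells at 80.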
Carbon options have attracted researchers’ attention in recent years. Some of the literature focuses on the economic effects of carbon options. Ref. [1] found that financial options contribute to the stability of the spot market and stimulate investment in carbon emission abatement technologies. Ref. [2] found that a portfolio contract containing a wholesale price contract and carbon option contract can bring more profits to the company in the context of demand uncertainty. Ref. [3] pointed out that estimating and analyzing the values of carbon options can assist in strategy design for carbon asset management and help carbon-consuming enterprises achieve effective risk control. Some researchers focused on carbon option pricing problems. Ref. [4] designed a model that combines the GARCH model and fractional Brownian motion (FBM) for predicting carbon option prices. Ref. [5] simulated the price change of carbon emission rights with the Caputo–Hadamard uncertain fractional differential equation (UFDE) and proved its effectiveness in the Chinese market. Under uncertainty theory, ref. [6] modeled carbon futures prices with uncertain differential equations and derived the pricing formulas for American carbon call and put options, which overcame the limitations of the pricing model based on probability theory.
Some researchers focused on option trading strategies. Ref. [7] utilized the ARIMA model to forecast the S&P 500 index, guiding the formulation of trading strategies, and achieved superior performance in call option trading compared to the GARCH model. Ref. [8] used companies from the top six sector indices of the National Stock Exchange as a sample and examined the risk–return tradeoff of various trading strategies, including straddles and butterflies, under different market conditions. Ref. [9] proposed an LSTM framework designed to predict the profit probability of trading strategies and integrated the results into the Kelly criterion to calculate the optimal position size, thereby transforming futures trading strategies into options trading strategies and effectively enhancing overall returns. Ref. [10] constructed an options trading strategy based on the SMA (Simple Moving Average) and Bollinger Bands, validating the flexibility and effectiveness of options trading. They particularly highlighted the significant role of leverage in enhancing returns during low-volatility periods.
However, the above literature overlooks the influence of trading itself on option prices and does not account for price impact. The concept of price impact originates from stock trading, where investors trading large volumes of shares within a short timeframe can cause prices to move in a direction unfavorable to them. Unlike the frictionless market described in the classic Black–Scholes model, real markets are more complex due to the existence of execution cost and price impact. Therefore, ref. [11] introduced stochastic optimal control under these conditions to propose an option pricing and hedging model. Based on the trading data of KOSPI 200 options, ref. [12] validated the presence of price impact in the options market and pointed out that the temporary price impact is a concave function proportional to the square root of the trading volume. Inspired by the aforementioned literature, this paper is a pioneering work on carbon options trading strategies that considers the price impact. It studies how a trader who wants to sell their entire carbon options inventory within a day should allocate orders under a given price impact.
Research on optimal execution remains predominantly focused on the stock market. Using the stochastic dynamic programming method, ref. [13] derived closed-form trading strategies for the optimal execution problem for stocks. Ref. [14] derived an analytical solution for the execution problem with a linear price impact function. By incorporating stochastic resilience into the continuous optimal execution problem, ref. [15] derived backward stochastic differential equations (BSDEs) for the system. Ref. [16] solved the optimal execution problem of nonlinear transient impact with a homotopy analysis approach. Ref. [17] solved the optimal execution problem in which the price impact is a function of trading speed. However, the analytically derived methods rely on a thorough understanding of, and strict restrictions on, the market model, which are difficult to achieve in reality. As a result, model-free algorithms, represented by reinforcement learning and deep learning, have begun to be applied to optimal execution research. Ref. [18] pioneered the application of Q-learning, a kind of reinforcement learning algorithm, to optimal execution. Similarly, ref. [19] utilized Q-learning to refine the Almgren–Chriss model strategy based on prevailing spread and volume dynamics. Refs. [20, 21] developed a deep deterministic policy gradient approach for optimal trade execution problems. Ref. [22] proposed a deep reinforcement learning method for futures contracts trading, which learned the risk preferences and the optimal trading speed. Ref. [23] solved the high-frequency optimal trading problem with neural networks. Ref. [24] extended the optimal execution problem to continuous time and solved it using an actor–critic algorithm.
However, trading options is more complex than trading stocks, because the trader also needs to trade the underlying asset to hedge the risk. A commonly used strategy is to conduct delta hedging when trading options. To the best of our knowledge, there is currently no research that combines options with optimal execution. Benefiting from the flexibility of neural networks, we propose a deep learning strategy for the intraday optimal execution problem of carbon options. Assuming that a trader aims to sell all of his carbon option inventory within a day, he decides the amount of options to sell at each time grid. Selling a certain amount of options has a price impact, which influences the trader’s profit and loss (PnL). The objective is to maximize the trader’s PnL. We consider four types of price impact functions: linear, logarithmic, power law, and time-varying. We apply the deep learning method to these four cases and compare our neural network strategy with two other strategies. One is the naive strategy, which sells all the option inventory at the beginning. The other is the TWAP (time-weighted average price) strategy, which sells the total inventory in equal amounts over the day. We find that our deep learning strategy outperforms the other two strategies. The discrepancy increases as the market becomes more illiquid, and the neural network strategy has a larger advantage in the time-varying price impact case. Our method can also be extended to stock options trading problems.
The rest of the paper is organized as follows. Section 2 presents the options optimal execution model. Section 3 develops a deep learning algorithm for it. Section 4 evaluates the performance of our neural network strategies and compares them with other strategies. The last section concludes the paper.
2. Model
Suppose a trader initially holds Q carbon call option contracts and wants to sell all of them within a day. For example, if the underlying price has a decreasing trend and the trader expects the trend to continue, then he would like to sell all the call options as soon as possible.
However, the carbon options market is not very liquid. Selling a certain amount of options will have a non-negligible price impact; thus, the execution price (or trading price) will be less than the current price. In this paper, we consider the optimal execution problem of selling Q carbon call option contracts within a day.
Consider that the trader sells the carbon call option and trades its underlying asset over the time horizon
, with discrete time grids
. At each time grid
, the trader decides the number of carbon options
to sell within the time interval
. The underlying price, option mid-price, and option delta (the derivative of the option price with respect to the underlying price) at
are denoted by
,
, and
, respectively. Since the carbon option market is illiquid, the average trading price will be
rather than the mid-price
. We illustrate
in more detail in
Section 2.2. The option inventory at
is denoted by
; then,
Assume that the trader adopts delta hedging on each time grid to hedge the risk. The delta value of an option is defined as the rate of the option price’s change with respect to the underlying price’s change,
. According to the Black–Scholes model, holding one option contract and a
amount of underlying asset constitute a riskless portfolio [
25]. Then, the underlying inventory
is
Since the underlying asset is liquid, we assume that it can be traded at price
. Then, the underlying inventory change is
Denote
as the cash the investor holds at
immediately after delta hedging is applied. Then, the cash change between
to
is
Then, the initial wealth and the terminal wealth can be written as
As the investor sells out all the option inventory
by the terminal time
, then
and
. The profit and loss (PnL) is
2.1. Price Dynamics
Suppose that the underlying price is a geometric Brownian motion, which is defined as
Then, the carbon option mid-price
can be calculated by the Black–Scholes formula as follows:
where
is the option’s expiration time, and
is the strike price.
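The price dynamics of this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the paper's implementation: the risk-free rate `r` (default 0), the function names, and all numerical values are hypothetical, and the normal CDF is computed from `math.erf`.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, tau: float, sigma: float, r: float = 0.0):
    """Black-Scholes call price and delta; tau is the time to expiry in years."""
    d1 = (math.log(spot / strike) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = spot * norm_cdf(d1) - strike * math.exp(-r * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)  # the delta of a call is N(d1)


def simulate_gbm(s0: float, mu: float, sigma: float, horizon: float, n_steps: int, rng: random.Random):
    """One geometric Brownian motion path with n_steps + 1 points (exact log-normal steps)."""
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z))
    return path


# Hypothetical parameters: one trading day (1/252 year) split into 10 grids.
rng = random.Random(0)
spots = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, horizon=1.0 / 252.0, n_steps=10, rng=rng)
mids_and_deltas = [bs_call(s, strike=100.0, tau=1.0, sigma=0.2) for s in spots]
```

For simplicity the sketch keeps the time to expiry fixed at one year along the intraday path; a fuller version would reduce `tau` at each grid.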
2.2. Price Impact Function
Carbon options are not liquid, so trading
amount of options will have an impact on its execution price
. Suppose the price impact function is denoted by
; then, the average execution price for selling
amount of options can be written as
where
is the option’s mid-price.
We consider four types of price impact functions and provide the supporting literature for them. Each of the following functions contains a scaling constant, which controls the price impact level of a given quantity. The price impact functions are described as follows:
- (1)
Linear price impact function:
Researchers usually set the price impact as a linear function to simplify the model and ease derivation [14, 26, 27]. Moreover, based on data from the NYSE, ref. [28] empirically found a linear relationship between price changes and order flow imbalance, thereby providing evidence for this assumption.
- (2)
Logarithmic price impact function:
- (3)
Power law price impact function:
Some scholars have discovered that nonlinear functions offer a superior fit and are theoretically more aligned with reality. It is widely posited that price impact should resemble an “S”-shaped function, where the price change diminishes as the scale of the transaction increases [29, 30, 31]. However, there is no consensus on the specific functional form. Refs. [32, 33, 34] characterized the price impact as a power law function, while refs. [35, 36] argued that a logarithmic function is more appropriate. Consequently, we take both of the aforementioned price impacts into account.
- (4)
Time-varying price impact function:
A number of scholars have turned their attention to the temporal characteristics of liquidity and initiated research in this field. Ref. [37] pointed out that, in the context of optimal execution, resilience, defined as one dimension of liquidity, is a critical factor that influences trading strategies. Ref. [38] utilized the short-time Fourier transform to estimate intraday stock market resilience and discovered a variety of resilience patterns, including “U”-shaped and “W”-shaped forms. Under the Fourier transform, any segment of a signal, regardless of its periodicity, can be decomposed into a combination of trigonometric functions. Therefore, we take the sine function as an example to represent such liquidity variations.
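The four families of price impact can be sketched in Python as below. The exact functional forms and parameter values are assumptions made for illustration, since only the family names are fixed by the text: the exponent `alpha`, the `amplitude` of the sine modulation, the use of `log(1 + q)`, and the name `lam` for the scaling constant are all hypothetical. The execution price is taken as the mid-price minus the impact, in line with the statement that a sale executes below the mid-price.

```python
import math


def linear_impact(q: float, lam: float) -> float:
    """Impact grows proportionally with the quantity sold."""
    return lam * q


def log_impact(q: float, lam: float) -> float:
    """Concave impact; log(1 + q) is an assumed form that vanishes at q = 0."""
    return lam * math.log1p(q)


def power_impact(q: float, lam: float, alpha: float = 0.5) -> float:
    """Power-law impact with an assumed exponent alpha (0.5 gives a square-root law)."""
    return lam * q ** alpha


def time_varying_coeff(lam: float, k: int, n_steps: int, amplitude: float = 0.5) -> float:
    """Assumed sine-modulated impact coefficient at time grid k (amplitude < 1 keeps it positive)."""
    return lam * (1.0 + amplitude * math.sin(2.0 * math.pi * k / n_steps))


def execution_price(mid: float, q: float, impact) -> float:
    """Average price received when selling q options at mid-price `mid`."""
    return mid - impact(q)


# Hypothetical usage: selling 10 options at a mid-price of 5 under linear impact.
price = execution_price(5.0, 10.0, lambda q: linear_impact(q, 0.01))
```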
2.3. Objective Function
The objective function of the trader is to maximize the expected PnL (profit and loss):
By choosing the amount of options to sell at each time step, the trader aims to maximize the PnL or, equivalently, to reduce the cost due to trading as much as possible. To solve the problem with neural networks, we transform the original objective function with a penalty on the constraints. Then, (14) becomes
where
is a constant penalty parameter.
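A minimal Python sketch of the bookkeeping behind the penalized objective is given below. It rests on several assumptions that are not confirmed by the text: the hedge at each grid is rebalanced right after the sale at the current underlying price, the penalty is quadratic in the difference between the total amount sold and the initial inventory, and all names (`pnl_one_path`, `penalized_loss`, `penalty`) are hypothetical.

```python
def pnl_one_path(sells, spots, mids, deltas, impact, q0):
    """PnL of selling sells[k] options at grid k while delta hedging.

    sells has K entries; spots, mids and deltas have K + 1 entries.
    """
    inventory = q0
    hedge = -deltas[0] * q0                 # underlying held against the options
    cash = 0.0
    wealth_start = cash + inventory * mids[0] + hedge * spots[0]
    for k, q in enumerate(sells):
        cash += q * (mids[k] - impact(q))   # proceeds at the impacted price
        inventory -= q
        new_hedge = -deltas[k] * inventory  # delta-neutral position for what is left
        cash -= (new_hedge - hedge) * spots[k]
        hedge = new_hedge
    wealth_end = cash + inventory * mids[-1] + hedge * spots[-1]
    return wealth_end - wealth_start


def penalized_loss(pnls, sells_per_path, q0, penalty):
    """Negative mean PnL plus an assumed quadratic penalty on not selling exactly q0."""
    n = len(pnls)
    mean_pnl = sum(pnls) / n
    mean_violation = sum((sum(s) - q0) ** 2 for s in sells_per_path) / n
    return -mean_pnl + penalty * mean_violation


# Hypothetical flat-price path: the PnL then equals minus the impact cost.
spots, mids, deltas = [80.0] * 5, [10.0] * 5, [0.5] * 5
twap_pnl = pnl_one_path([25.0] * 4, spots, mids, deltas, lambda q: 0.01 * q, 100.0)
```

Minimizing such a loss over the outputs of a network trades expected PnL against the constraint that the whole inventory is sold; a larger `penalty` weights the constraint more heavily.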
4. Results
We collect EUA (EU allowance) futures data for 2022 and 2023 from investing.com, including daily prices and volumes. We estimate the data’s annual mean
and annual volatility
in (
8). We consider the intraday optimal execution problem with the number of time grids set to
. The initial underlying price is set to be
, and the expiration date of its options is 1 year later. We calculate the options prices of different strikes:
. We simulate 1,000,000 intraday price paths of the underlying asset and each option as the training dataset. Then, we simulate 100,000 paths as the test dataset. The batch size for training the neural network is set to 10,000. The epoch
. The learning rate is
for epoch 1, and
for epoch 2, which is
of the learning rate of epoch 1. The penalty parameter is set to
. We set the initial underlying inventory as
, the initial option inventory as
, and the initial cash as
= 10,000. We consider four levels of price impact coefficient:
. Low
represents a low price impact level, and high
stands for a high price impact level. We provide the results of different price impact functions in the following subsections.
4.1. Linear Price Impact
The price impact function is linear as defined in (
10). The scaling parameter
is set to be
. When
, the price impact of selling a certain amount
is small. When
, the price impact of selling a certain amount
is relatively large.
Table 1 shows the results. The elements in
Table 1 are the average PnL (profit and loss), which is the average
of the test dataset. The first column of
Table 1 shows the strike price of the call option. The second to the fourth columns represent three strategies. The NN strategy is the neural network strategy proposed in
Section 3. The naive strategy is to sell all the option inventory at the beginning
. The TWAP (time-weighted average price) strategy is to sell equal quantities of
at each time interval. Since selling a large amount of option contracts will have a significant price impact, the trading price (the cash received by the trader) will certainly be much lower than the market’s mid-price. Thus, the naive strategy, which sells all the inventory at the beginning, yields the lowest PnL (or equivalently, the largest loss). Note that the loss of the naive strategy is the price impact
times the option inventory,
.
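The gap between the two benchmark strategies can be checked with a short calculation. The sketch below assumes a linear impact proportional to the quantity sold with a hypothetical coefficient `lam`, and it ignores price movements and hedging, so it only isolates the impact cost. Under these assumptions the naive cost is `lam * Q**2`, while splitting the order into K equal pieces costs `lam * Q**2 / K`.

```python
def naive_schedule(q0: float, n_steps: int):
    """Sell the whole inventory at the first time grid."""
    return [q0] + [0.0] * (n_steps - 1)


def twap_schedule(q0: float, n_steps: int):
    """Sell equal quantities at every time grid."""
    return [q0 / n_steps] * n_steps


def impact_cost(schedule, lam: float) -> float:
    """Total cash lost to impact: each sale of q loses (lam * q) per option, i.e. lam * q**2."""
    return sum(lam * q * q for q in schedule)


# Hypothetical values: inventory 100, 10 time grids, lam = 0.01.
naive_cost = impact_cost(naive_schedule(100.0, 10), 0.01)
twap_cost = impact_cost(twap_schedule(100.0, 10), 0.01)
```

With these numbers the naive cost is ten times the TWAP cost, which illustrates why the naive strategy yields the lowest PnL.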
Table 1 indicates that the NN strategy outperforms the others as
increases. The PnL difference between the NN strategy and the TWAP strategy expands when the market becomes illiquid. When the market is liquid and
is small, the NN and the TWAP strategies perform similarly. But when the market is illiquid and
is large, the NN strategy has an evidently lower loss than the TWAP strategy. We plot the average PnL of each strategy with respect to
in
Figure 2a. The naive strategy’s PnL is obviously lower than those of the other two strategies. In addition, the discrepancy between the neural network and the TWAP strategy increases with
. Because the magnitude of the naive strategy’s PnL is much larger than the others, the difference between the neural network strategy and the TWAP strategy is not very clear in the figure, but it can be seen in
Table 1.
Figure 3a exhibits a neural network action path of the option with strike
and
.
varies with time slightly.
Figure 4 plots the loss function (
15) of the neural network. The top to bottom rows represent different option strikes
. The left to right columns represent different levels of price impact coefficient
. All the loss functions converge as the training iterations increase.
4.2. Logarithmic Price Impact
The price impact function is logarithmic as defined in (
11). The coefficient parameter
is set to be
, representing low to high levels of price impact. We compare the neural network strategy with the naive strategy and the TWAP strategy. The results are presented in
Table 2. The neural network strategy is better than the naive strategy for all values of
. The advantages of the neural network strategy increase with the parameter
. The PnLs of the neural network strategy are close to the TWAP strategy when
and
. But when
and
, the market price impact coefficient is high, and the neural network strategy outperforms the TWAP strategy.
Figure 5 plots the loss functions for different strikes and
. The loss functions all converge as the training iterations increase.
Figure 2b plots the average PnL of each strategy with respect to
for this case. The naive strategy’s PnL is again clearly lower than those of the other two strategies. The discrepancy between the neural network and the TWAP strategy increases with
.
Figure 3b plots an action path of the neural network when the price impact is logarithmic. The strike price
, and
. The trader sells more options at the beginning, and the action then stabilizes.
4.3. Power Law Price Impact
The price impact has a power law form as defined in (
12). The coefficient parameter
is set to be
, which measures the scale of price impact. We also compare the neural network strategy with the naive strategy and the TWAP strategy. The results are shown in
Table 3. The NN strategy outperforms the others as
increases. The neural network (NN) strategy performance is close to the TWAP strategy when
in this case, while the NN performance is better than the TWAP for
and
. Both of them are much better than the naive strategy, which sells all the option inventory at the beginning. So if the market price impact function is
, the trader can adopt either the neural network strategy or the TWAP strategy when
is small and adopt the neural network strategy when
is large.
Figure 6 shows that the loss functions of the neural network converge for all strikes and
.
Figure 2c plots the average PnL of each strategy with respect to
for the power law price impact case. The naive strategy’s PnL is still the lowest among the three strategies. The discrepancy between the neural network and the TWAP strategy increases with
.
Figure 3c shows an action path of the neural network strategy for the power law price impact case. The trader sells slightly more options at the beginning and the end.
4.4. Time-Varying Price Impact
The price impact function is as defined in (
13). The price impact coefficient
varies with time, which reflects the time-varying intraday price impact level. When the market is liquid,
is small, and the price impact for selling
is small. When the market is illiquid,
is large, and the price impact for selling
is large. So a better strategy is to sell more options when the price impact coefficient
is small and to sell less when the price impact coefficient
is large.
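The intuition of selling more when the coefficient is small can be made concrete with a simplified static calculation. The sketch below is not the neural network strategy: it assumes an impact that is linear in the quantity with a known coefficient at each grid, and it ignores price movements and hedging. Minimizing the total impact cost, the sum of `lam_k * q_k**2`, subject to the quantities summing to Q gives quantities proportional to `1 / lam_k` (by a Lagrange multiplier argument). The function names and numbers are hypothetical.

```python
def inverse_coeff_schedule(q0: float, lams):
    """Quantities proportional to 1 / lam_k, which minimize sum(lam_k * q_k**2) given sum(q_k) = q0."""
    weights = [1.0 / lam for lam in lams]
    total = sum(weights)
    return [q0 * w / total for w in weights]


def time_varying_cost(schedule, lams) -> float:
    """Total impact cost when selling q_k at grid k with coefficient lam_k."""
    return sum(lam * q * q for q, lam in zip(schedule, lams))


# Hypothetical coefficients: liquidity deteriorates over three grids.
lams = [1.0, 2.0, 4.0]
best = inverse_coeff_schedule(7.0, lams)
twap = [7.0 / 3.0] * 3
best_cost = time_varying_cost(best, lams)
twap_cost_tv = time_varying_cost(twap, lams)
```

In this toy case the schedule sells most at the first grid, where the coefficient is smallest, and its cost is below that of the equal split.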
Table 4 shows the average PnL for this case. The neural network strategy performs much better than the naive strategy and the TWAP strategy for all
, and its advantages increase with
. When
, the average PnL of the neural network strategy ranges from
to
, the average PnL of the TWAP strategy is around
, and the average PnL of the naive strategy is
. When
, the average PnL of the neural network strategy ranges from
to
, the average PnL of the TWAP strategy is around
, and the average PnL of the naive strategy is
. When
, the average PnL of the neural network strategy ranges from
to
, the average PnL of the TWAP strategy is around
, and the average PnL of the naive strategy is
. When
, the average PnL of the neural network strategy ranges from
to
. The average PnL of the TWAP strategy is around
, and the average PnL of the naive strategy is
. Thus, the trader should adopt the neural network strategy for selling
Q options when the intraday market liquidity varies with time. The loss functions in
Figure 7 also converge in this case.
Figure 2d plots the average PnL of each strategy with respect to
for the time-varying price impact case. The naive strategy’s PnL is still the lowest. The discrepancy between the neural network and the TWAP strategy is more obvious than in the other three price impact cases, and it increases with
.
Figure 3d shows an action path of the neural network strategy for the time-varying case. The trader sells a large quantity of options at the fourth step (
in the figure). This is because the price impact function coefficient
at that time.
4.5. Sensitivity Analysis
We analyze the neural network performance with respect to hyperparameters in this section. We plot the loss function of the test dataset for different hidden sizes and different initial learning rates. We take the option with strike price
and the price impact level parameter
as an example.
Figure 8 plots the test loss with respect to the log scale of the initial learning rate,
. Different colors represent different hidden sizes.
Figure 8a–d show the linear, logarithmic, power law, and time-varying price impact functions, respectively. We experiment with hidden size
and initial learning rates of
. The results show that when the learning rate equals
or
and the hidden size equals 256 or 512, the test loss is stable and lowest.
We also test the role of the penalty parameter
in (
15).
controls the importance of the constraint in the objective function. Increasing the value of
will tend to reduce the execution error
, but the PnL (profit and loss) may decrease. We experiment on the option with strike
and price impact level
. The hidden size is set to 256, and the initial learning rate is set to
. We train the neural network with four levels of
and plot the corresponding test PnL and the exceed rate in
Figure 9. The blue line shows the test PnL. The green line shows the exceed rate,
. The X axis is
.
Figure 9a–d show the results of the four types of price impact functions. As
increases, both the PnL and the exceed rate decrease in general. When
, the exceed rate is already close to zero. Note that the exceed rates of the time-varying price impact case are very close to zero for all values of
. So, the trader should choose the
based on his acceptance of the exceed rate and the PnL.
4.6. Discussion
We have shown that our deep learning method can solve the carbon options intraday optimal execution problem, and its performance is better than the TWAP and the naive strategy. Its advantages increase with the price impact level. The neural network method is more flexible and does not have strict restrictions on the underlying dynamics or price impact functions compared to the existing methods in the literature. In addition, since an agent in reinforcement learning interacts with the environment to update its strategy, the reinforcement learning approach usually takes much longer to train than the deep learning method. Note that our deep learning method only takes 85 s to obtain an optimal strategy for each option.
To practically implement our method in real-world financial markets, one needs to handle the following issues. First, collect the underlying price data and model their dynamics. The geometric Brownian motion and the stochastic volatility model are commonly used. Second, construct a price impact model for the options. This usually needs the limit order book data. Note that our deep learning method does not have restrictions on the underlying dynamics or the price impact forms.
Options exchanges often impose position limits on option contracts. This limit restricts the maximum number of option contracts a person can hold. So the trader’s option inventory Q should not exceed the position limit of the exchange. The trading costs of options depend on the exchange’s regulations. They typically consist of a fixed fee and a per-contract fee. For a trader who wants to sell all the inventory Q within K steps, the total transaction cost is K times the fixed fee plus Q times the per-contract fee. The total transaction cost is deterministic and does not depend on the trader’s strategy.
We assume that the underlying asset is very liquid so that it can be traded at the mid-price. This is not the case for the options market. Selling a large options order will influence its trading price; thus, the trading price is lower than the mid-price. The price impact function depends on the options order size
and the price impact level parameter
. A larger value of
indicates a more illiquid market, and then the price impact of trading
is larger. Many studies have found that the price impact is a logarithmic or power law function of the order size, and the intraday market liquidity varies with time. Therefore, we test our neural network strategy on four types of price impact functions shown in Equations (
10)–(
13). In practice, a trader needs to choose an appropriate model for the price impact function
based on the market data and then implement our method.
5. Conclusions
This paper proposes a neural network approach for the carbon options optimal execution problem, and it can also be applied to stock options. As the carbon options trading market is usually not liquid, a trader who wants to clear all his carbon option inventory must consider the trading cost. We construct a model for optimal execution with four types of price impact functions: linear, logarithmic, power law, and time-varying. Then, we develop a neural network approach for solving this problem. We compare the neural network strategy with two other strategies for different price impact levels. The first is the TWAP strategy, which sells the inventory Q equally at each time grid. The second is the naive strategy, which sells all the inventory Q at the beginning. Our neural network strategy performs better than the other two strategies for all four types of price impact functions, especially for the time-varying price impact case. The advantages of our neural network strategy increase when the market is more illiquid.
The neural network strategy can be extended to other assets, such as stock options and credit derivatives. The underlying asset dynamics can also be extended to more complicated models, like the stochastic volatility model. For further research, macroeconomic variables or financial market variables can also be included in the trading model. Reinforcement learning techniques are also worth trying in this framework.