Deep Learning Strategies for Intraday Optimal Carbon Options Trading with Price Impact Considerations

Lai, Qianhui; Yang, Qiang

doi:10.3390/math13071035


Qianhui Lai * and Qiang Yang

School of Economics, Qingdao University, Qingdao 266071, China

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(7), 1035; https://doi.org/10.3390/math13071035

Submission received: 21 February 2025 / Revised: 17 March 2025 / Accepted: 20 March 2025 / Published: 22 March 2025

(This article belongs to the Section E5: Financial Mathematics)


Abstract

This paper solves the optimal trading problem of carbon options with a deep learning approach. In this setting, a trader wants to sell out the option inventory within a day. Since trading a large-size order in the market will influence the price, the trader needs to design a trading strategy to maximize the profit and loss (PnL). We propose a deep learning strategy for carbon options optimal trading, which can also be extended to stock options. Using the data from the European carbon market, we apply our deep learning strategy to four types of price impact functions: linear, logarithmic, power law, and time-varying. We show that our deep learning strategy performs much better than the naive strategy and the TWAP (time-weighted average price) strategy, which are widely used in the industry, especially when the price impact function is time-varying. Our neural network strategy’s advantage becomes larger when the market is more illiquid.

Keywords:

algorithmic trading; deep learning; carbon options

MSC:

91G20; 91G15

1. Introduction

To reduce carbon emissions, governments typically allocate carbon emission quotas to enterprises. Carbon emission allowances are the total amount of carbon dioxide and other greenhouse gases that an enterprise is permitted to emit during a given period. The government distributes free quotas to enterprises based on industry benchmarks. If an enterprise's emissions exceed its free quota, it has to buy additional allowances from the market. The European carbon market (EU ETS, the European Union Emissions Trading System) is the largest in the world and covers many industries, such as power, steel, and petrochemicals. Enterprises with low carbon emissions can sell their surplus allowances to enterprises with high emission demands.

Carbon emission allowance trading includes spot trading and derivatives trading. Spot trading involves buying and selling carbon emission allowances directly. Derivatives trading includes carbon emission allowance futures and options. Carbon emission allowance futures are contracts in which the buyer and seller agree to trade carbon emission allowances at a specified price on a future date; the allowances are delivered when the contract expires. Options are derivatives that give holders the right to buy or sell an underlying asset at a specified price (the strike price) on the expiration date. This paper considers carbon emission allowance options whose underlying asset is EUA (EU allowance) futures.

We study the carbon options trading problem for three reasons. First, options can lock in a price. The option holder has the right to buy or sell the underlying asset at the strike price on the expiration date, so if the underlying price rises considerably, a call option holder can still buy the underlying asset at the strike price; conversely, if the underlying price falls sharply, a put option holder can still sell the underlying asset at the strike price. Second, options are more flexible than spot contracts because they give holders the right, rather than the obligation, to trade the underlying asset. A call option holder has the right to buy the underlying asset at the strike price on the expiration date; if the underlying price is below the strike price on that day, the holder will not exercise the call. A put option holder has the right to sell the underlying asset at the strike price on the expiration date; if the underlying price is above the strike price on that day, the holder will not exercise the put. Third, options allow investors to control a larger value of assets at a lower cost (the option premium). The option premium is usually much smaller than the underlying price, so options have a leverage effect.

Carbon options have attracted increasing research attention in recent years. Some studies focus on the economic effects of carbon options. Ref. [1] found that financial options contribute to the stability of the spot market and stimulate investment in carbon emission abatement technologies. Ref. [2] found that a portfolio contract combining a wholesale price contract and a carbon option contract can bring more profits to the company under demand uncertainty. Ref. [3] pointed out that estimating and analyzing the values of carbon options can assist in designing carbon asset management strategies and help carbon-consuming enterprises achieve effective risk control. Other researchers have focused on carbon option pricing. Ref. [4] designed a model that combines the GARCH model and fractional Brownian motion (FBM) to predict carbon option prices. Ref. [5] simulated price changes of carbon emission rights with the Caputo–Hadamard uncertain fractional differential equation (UFDE) and demonstrated its effectiveness in the Chinese market. Under uncertainty theory, ref. [6] modeled carbon futures prices with uncertain differential equations and derived pricing formulas for American carbon call and put options, overcoming the limitations of pricing models based on probability theory.

Some researchers have focused on option trading strategies. Ref. [7] used the ARIMA model to forecast the S&P 500 index to guide the formulation of trading strategies and achieved better performance in call option trading than with the GARCH model. Ref. [8] used companies from the top six sector indices of the National Stock Exchange as their sample and examined the risk–return tradeoff of various trading strategies, including straddles and butterflies, under different market conditions. Ref. [9] proposed an LSTM framework to predict the profit probability of trading strategies and fed the results into the Kelly criterion to calculate the optimal position size, thereby transforming futures trading strategies into options trading strategies and effectively enhancing overall returns. Ref. [10] constructed an options trading strategy based on the SMA (Simple Moving Average) and Bollinger Bands, validating the flexibility and effectiveness of options trading and highlighting the significant role of leverage in enhancing returns during low-volatility periods.

However, the above literature overlooks the influence of trading behavior itself on option prices and does not take price impact into account. The concept of price impact originates from stock trading, where investors trading large volumes of shares within a short timeframe can move prices in a direction unfavorable to them. Unlike the frictionless market of the classic Black–Scholes model, real markets are more complex because of execution costs and price impact. Accordingly, ref. [11] used stochastic optimal control to propose an option pricing and hedging model under these conditions. Based on KOSPI 200 options trading data, ref. [12] confirmed the presence of price impact in the options market and pointed out that the temporary price impact is a concave function proportional to the square root of the trading volume. Inspired by this literature, this paper is, to our knowledge, the first study of carbon options trading strategies that accounts for price impact. Specifically, we study how a trader who wants to sell the entire carbon options inventory within one day should allocate orders over the day under several price impact specifications.

Research on optimal execution remains predominantly focused on the stock market. Using stochastic dynamic programming, ref. [13] derived closed-form trading strategies for the optimal execution of stocks. Ref. [14] derived an analytical solution to the execution problem with a linear price impact function. By incorporating stochastic resilience into the continuous optimal execution problem, ref. [15] derived backward stochastic differential equations (BSDEs) for the system. Ref. [16] solved the optimal execution problem with nonlinear transient impact using a homotopy analysis approach. Ref. [17] solved the optimal execution problem with a price impact function that depends on trading speed. However, these analytical methods require detailed knowledge of the market model and rely on strict assumptions, which are difficult to satisfy in practice. As a result, model-free algorithms, represented by reinforcement learning and deep learning, have begun to be applied to optimal execution research. Ref. [18] pioneered the application of Q-learning, a reinforcement learning algorithm, to optimal execution. Similarly, ref. [19] used Q-learning to refine the Almgren–Chriss strategy based on prevailing spread and volume dynamics. Refs. [20,21] developed a deep deterministic policy gradient approach for optimal trade execution. Ref. [22] proposed a deep reinforcement learning method for futures trading that learns risk preferences and the optimal trading speed. Ref. [23] solved the high-frequency optimal trading problem with neural networks. Ref. [24] extended the optimal execution problem to continuous time and solved it using an actor–critic algorithm.

However, trading options is more complex than trading stocks because the trader also needs to trade the underlying asset to hedge risk. A common approach is to delta hedge the option position. To the best of our knowledge, no existing research combines options trading with optimal execution. Exploiting the flexibility of neural networks, we propose a deep learning strategy for the intraday optimal execution problem of carbon options. Suppose a trader aims to sell all of his carbon option inventory within a day; he decides the number of options to sell at each time grid. Selling options has a price impact, which affects the trader's profit and loss (PnL). The objective is to maximize the trader's PnL. We consider four types of price impact functions: linear, logarithmic, power law, and time-varying. We apply the deep learning method to these four cases and compare our neural network strategy with two other strategies. The first is the naive strategy, which sells the entire option inventory at the beginning. The second is the TWAP (time-weighted average price) strategy, which sells the total inventory in equal parts over the day. We find that our deep learning strategy outperforms both. The performance gap increases as the market becomes more illiquid, and the neural network strategy's advantage is largest in the time-varying price impact case. Our method can also be extended to stock options trading problems.

The rest of the paper is organized as follows. Section 2 presents the options optimal execution model. Section 3 develops a deep learning algorithm for it. Section 4 evaluates the performance of our neural network strategies and compares them with other strategies. The last section concludes the paper.

2. Model

Suppose a trader holds Q carbon call option contracts at the beginning of a trading day and wants to sell all of them within the day. For example, if the underlying price has a downward trend and the trader expects the trend to continue, he would like to sell all the call options as soon as possible.

However, the carbon options market is relatively illiquid. Selling a sizable amount of options has a non-negligible price impact, so the execution (trading) price will be lower than the current mid-price. In this paper, we consider the optimal execution problem of selling Q carbon call option contracts within a day.

Consider a trader who sells the carbon call options and trades their underlying asset over the time horizon $[0, T]$, with discrete time grids $0 = t_0 < t_1 < \dots < t_K = T$. At each time grid $t_k$, the trader decides the number of carbon options $a_k$ to sell within the time interval $[t_k, t_{k+1})$. The underlying price, the option mid-price, and the option delta (the derivative of the option price with respect to the underlying price) at $t_k$ are denoted by $S_k$, $C_k$, and $\Delta_k$, respectively. Since the carbon options market is illiquid, the average trading price will be $\tilde{C}_k$ rather than the mid-price $C_k$; we describe $\tilde{C}_k$ in more detail in Section 2.2. Denoting the option inventory at $t_k$ by $q_k^o$, we have

$$q_{k+1}^{o} = q_{k}^{o} - a_{k}. \tag{1}$$

Assume that the trader delta hedges at each time grid. The delta of an option is the rate of change of the option price with respect to the underlying price, $\Delta = \partial C / \partial S$. In the Black–Scholes model, a portfolio consisting of one option contract and $-\Delta$ units of the underlying asset is instantaneously riskless [25]. The underlying inventory $q_k^s$ is therefore

$$q_{k}^{s} = -q_{k}^{o} \Delta_{k}, \quad k = 1, \dots, K. \tag{2}$$

Since the underlying asset is liquid, we assume that it can be traded at the price $S_k$. The change in the underlying inventory is then

$$\delta q_{k+1}^{s} = q_{k+1}^{s} - q_{k}^{s}. \tag{3}$$

Denote by $x_k$ the cash the investor holds at $t_k$ immediately after delta hedging is applied. The cash change from $t_k$ to $t_{k+1}$ is

$$x_{k+1} - x_{k} = a_{k} \tilde{C}_{k} - \delta q_{k+1}^{s} S_{k+1}, \quad k = 0, \dots, K-1. \tag{4}$$

The initial wealth and the terminal wealth can then be written as

$$W_{0} = x_{0} + q_{0}^{o} C_{0} + q_{0}^{s} S_{0}, \tag{5}$$

$$W_{K} = x_{K} + q_{K}^{o} C_{K} + q_{K}^{s} S_{K}. \tag{6}$$

Because the investor sells the entire initial option inventory $q_0^o = Q$ by the terminal time $t_K = T$, we have $q_K^o = 0$ and, by (2), $q_K^s = 0$. The profit and loss (PnL) is

$$\mathrm{PnL} = W_{K} - W_{0}. \tag{7}$$
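The bookkeeping in Equations (1)–(7) can be replayed for a single path as a minimal sketch. The function name `execution_pnl` and its argument layout are our own illustrative choices, not code from this paper; the price impact is passed in as a callable `impact(a, k)`, and the initial underlying position `q0_s` is left free, as in (5).

```python
def execution_pnl(S, C, delta, a, impact, x0, q0_o, q0_s):
    """Hypothetical helper replaying Equations (1)-(7) along one price path.

    S, C, delta: lists of length K+1 holding S_k, C_k, Delta_k for k = 0..K.
    a: list of length K with the options sold in [t_k, t_{k+1}).
    impact: callable f(a_k, k) giving the price impact in Equation (9).
    """
    K = len(a)
    x, q_o, q_s = x0, q0_o, q0_s
    W0 = x0 + q0_o * C[0] + q0_s * S[0]          # initial wealth, Equation (5)
    for k in range(K):
        C_exec = C[k] - impact(a[k], k)           # execution price, Equation (9)
        q_o_next = q_o - a[k]                     # option inventory, Equation (1)
        q_s_next = -q_o_next * delta[k + 1]       # delta-hedged position, Equation (2)
        dq_s = q_s_next - q_s                     # hedge trade, Equation (3)
        x = x + a[k] * C_exec - dq_s * S[k + 1]   # cash update, Equation (4)
        q_o, q_s = q_o_next, q_s_next
    WK = x + q_o * C[K] + q_s * S[K]              # terminal wealth, Equation (6)
    return WK - W0                                # PnL, Equation (7)

# Constant prices and deltas isolate the impact cost: selling everything at t_0
# under the linear impact f(a) = 0.05 a loses gamma * Q**2 = 500.
S = [80.0] * 11
C = [5.0] * 11
delta = [0.5] * 11
naive_pnl = execution_pnl(S, C, delta, [100.0] + [0.0] * 9,
                          impact=lambda a, k: 0.05 * a,
                          x0=10_000.0, q0_o=100.0, q0_s=-50.0)
```

The flat price path and the hedged starting position `q0_s = -50` are chosen only to make the arithmetic transparent; in the experiments of Section 4 the prices come from simulated paths.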

2.1. Price Dynamics

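Suppose that the underlying price follows a geometric Brownian motion,

$$dS_{t} = \mu S_{t}\, dt + \sigma S_{t}\, dW_{t}, \tag{8}$$

where $W_t$ is a standard Brownian motion (not to be confused with the wealth $W_k$ in (5) and (6)).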
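Intraday paths of (8) can be generated exactly on the time grid using the closed-form solution of geometric Brownian motion. The sketch below is illustrative: the function name `simulate_gbm_path` is hypothetical, and converting one trading day to $T = 1/252$ years is our assumption, since the text does not state how the intraday horizon is annualized.

```python
import math
import random

def simulate_gbm_path(S0, mu, sigma, T, K, rng):
    """Simulate S_0, ..., S_K on an equally spaced grid over [0, T] under (8).

    Uses the exact solution of geometric Brownian motion:
    S_{k+1} = S_k * exp((mu - sigma^2 / 2) dt + sigma * sqrt(dt) * Z_k), Z_k ~ N(0, 1).
    """
    dt = T / K
    path = [S0]
    for _ in range(K):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt
                                         + sigma * math.sqrt(dt) * z))
    return path

# One trading day (assumed to be 1/252 of a year) split into K = 10 steps,
# using the drift and volatility estimated in Section 4.
rng = random.Random(0)
path = simulate_gbm_path(S0=80.0, mu=-0.0367, sigma=0.43, T=1 / 252, K=10, rng=rng)
```

Sampling the log-increments exactly avoids the discretization bias of an Euler scheme, so the grid spacing only affects which times are observed.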

The carbon option mid-price $C_k = C(S_{t_k}, t_k)$ can then be calculated with the Black–Scholes formula:

$$\begin{aligned} C(S_{t}, t) &= N(d_{+}) S_{t} - N(d_{-})\, \kappa e^{-r(\bar{T} - t)}, \\ d_{+} &= \frac{1}{\sigma \sqrt{\bar{T} - t}} \left[ \ln\left(\frac{S_{t}}{\kappa}\right) + \left(r + \frac{\sigma^{2}}{2}\right)(\bar{T} - t) \right], \\ d_{-} &= d_{+} - \sigma \sqrt{\bar{T} - t}, \end{aligned}$$

where $\bar{T}$ is the option's expiration time, $\kappa$ is the strike price, $r$ is the risk-free interest rate, and $N(\cdot)$ is the standard normal cumulative distribution function.
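The Black–Scholes formula above, together with the call delta $\Delta = N(d_+)$ used for hedging in (2), can be evaluated with the standard library alone. This is a minimal sketch; the helper names `norm_cdf` and `bs_call` are our own, and because the text does not report $r$, the usage line uses generic textbook inputs rather than the paper's parameters.

```python
import math

def norm_cdf(x):
    """Standard normal CDF N(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, kappa, r, sigma, tau):
    """Black-Scholes call price and delta; tau = T_bar - t is the time to expiry."""
    sqrt_tau = math.sqrt(tau)
    d_plus = (math.log(S / kappa) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt_tau)
    d_minus = d_plus - sigma * sqrt_tau
    price = norm_cdf(d_plus) * S - norm_cdf(d_minus) * kappa * math.exp(-r * tau)
    delta = norm_cdf(d_plus)  # Delta = dC/dS = N(d_+) for a call
    return price, delta

# At-the-money example: S = kappa = 100, r = 5%, sigma = 20%, one year to expiry.
price, delta = bs_call(S=100.0, kappa=100.0, r=0.05, sigma=0.2, tau=1.0)
# price is about 10.4506 and delta is about 0.6368
```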

2.2. Price Impact Function

Carbon options are illiquid, so trading $a_k$ options affects the execution price $\tilde{C}_k$. Denoting the price impact function by $f(a)$, the average execution price for selling $a_k$ options can be written as

$$\tilde{C}_{k} = C_{k} - f(a_{k}), \tag{9}$$

where $C_k$ is the option's mid-price.

We consider four types of price impact functions and cite supporting literature for each. In all of them, $\gamma$ is a scaling constant that controls the magnitude of the price impact for a given quantity. The price impact functions are as follows:

(1): Linear price impact function:

$$f(a) = \gamma a. \tag{10}$$

Researchers usually model price impact as a linear function to simplify the model and make derivations tractable [14,26,27]. Moreover, using NYSE data, ref. [28] empirically found a linear relationship between price changes and order flow imbalance, which provides evidence for this assumption.

(2): Logarithmic price impact function:

$$f(a) = \gamma \ln(a + 1). \tag{11}$$

(3): Power law price impact function:

$$f(a) = \gamma \sqrt{a}. \tag{12}$$

Some scholars have found that nonlinear functions offer a better fit and are theoretically more consistent with reality. It is widely posited that price impact follows an “S”-shaped function, in which the marginal price change diminishes as the transaction size increases [29,30,31]. However, there is no consensus on the specific functional form. Refs. [32,33,34] characterized the price impact as a power law function, while refs. [35,36] argued that a logarithmic function is more appropriate. We therefore consider both forms.

(4): Time-varying price impact function:

$$\begin{aligned} f(a, t) &= h(t)\, a, \\ h(t) &= \gamma \sin\left(\frac{\pi}{2} t\right) + \gamma. \end{aligned} \tag{13}$$

A number of scholars have studied the temporal characteristics of liquidity. Ref. [37] pointed out that, in the context of optimal execution, resilience, one dimension of liquidity, is a critical factor that influences trading strategies. Ref. [38] used the short-time Fourier transform to estimate intraday stock market resilience and found a variety of resilience patterns, including “U”-shaped and “W”-shaped forms. Since the Fourier transform decomposes any segment of a signal, periodic or not, into a combination of trigonometric functions, we use a sine function as an example to represent such liquidity variations.
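The four impact specifications (10)–(13) and the execution price (9) translate directly into small functions. In this sketch the function names are our own; for (13) we pass the time-grid index as `t`, which is how Section 4.4 reads the formula (there, $h$ vanishes at $t = 3$).

```python
import math

def linear_impact(a, gamma):
    return gamma * a                                  # Equation (10)

def log_impact(a, gamma):
    return gamma * math.log(a + 1.0)                  # Equation (11)

def power_impact(a, gamma):
    return gamma * math.sqrt(a)                       # Equation (12)

def time_varying_impact(a, t, gamma):
    h = gamma * math.sin(math.pi / 2.0 * t) + gamma   # h(t) in Equation (13)
    return h * a

def execution_price(C_mid, impact_value):
    """Average execution price of a sell order, Equation (9)."""
    return C_mid - impact_value

# Selling a = 10 options with gamma = 0.1:
lin = linear_impact(10, 0.1)           # 1.0
lg = log_impact(10, 0.1)               # 0.1 * ln(11), about 0.240
pw = power_impact(10, 0.1)             # 0.1 * sqrt(10), about 0.316
tv = time_varying_impact(10, 3, 0.1)   # h(3) = 0, so about 0
```

For the same $\gamma$ and order size, the two concave forms penalize a large order much less than the linear form, which is why the experiments compare strategies separately for each specification.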

2.3. Objective Function

The trader's objective is to maximize the expected PnL (profit and loss):

$$\max_{a_{k},\, k = 0, 1, \dots, K-1} \ \mathbb{E}[W_{K} - W_{0}] \quad \text{s.t.} \quad \sum_{k=0}^{K-1} a_{k} = Q. \tag{14}$$

By choosing the number of options to sell at each time step, the trader aims to maximize the PnL or, equivalently, to minimize the trading cost. To solve the problem with neural networks, we replace the equality constraint with a quadratic penalty term, so that (14) becomes

$$\min_{a_{k},\, k = 0, 1, \dots, K-1} \ -\mathbb{E}[W_{K} - W_{0}] + \eta \left( \sum_{k=0}^{K-1} a_{k} - Q \right)^{2}, \tag{15}$$

where $\eta$ is a constant penalty parameter.
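On a batch of simulated paths, objective (15) can be estimated by Monte Carlo. In the sketch below, the function name `penalized_loss` is hypothetical, and averaging the squared constraint violation over paths is our assumption: when $a_k$ depends on the random state, (15) as written leaves open how the penalty is aggregated.

```python
def penalized_loss(pnls, actions, Q, eta):
    """Sample estimate of objective (15) for a batch of simulated paths.

    pnls: list of W_K - W_0 values, one per path.
    actions: list of per-path action lists [a_0, ..., a_{K-1}].
    """
    n = len(pnls)
    mean_pnl = sum(pnls) / n
    # Assumption: the squared violation of sum_k a_k = Q is averaged over paths.
    penalty = sum((sum(a) - Q) ** 2 for a in actions) / n
    return -mean_pnl + eta * penalty

# Two paths: the first sells exactly Q = 100, the second leaves 10 options unsold.
loss = penalized_loss([-10.0, -20.0], [[50.0, 50.0], [40.0, 50.0]], Q=100.0, eta=1.0)
# -(-15) + 1 * (0 + 100) / 2 = 65
```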

3. Method

Inspired by [39], we adopt a deep learning approach to solve the stochastic control problem (15). For each time grid $k = 0, \dots, K-1$, we approximate the strategy $a_k$ (the number of options to sell) with a feedforward neural network. Figure 1 shows the neural network structure.

At each time grid $t_k$, the inputs are the underlying price $S_k$, the option mid-price $C_k$, the cash $x_k$, the option inventory $q_k^o$, and the underlying inventory $q_k^s$. The output is the percentage of the option inventory to sell. Each neural network cell has two hidden layers, both with a hidden size of 256. The activation function is ReLU for the two hidden layers and sigmoid for the output layer. We train the network with the Adam optimizer.
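The forward pass of one network cell can be sketched in pure Python: five state inputs, two ReLU layers, and a sigmoid output that we interpret as the fraction of the current inventory $q_k^o$ to sell, so that $a_k = p_k q_k^o$. That interpretation, the He-style random initialization, and all names are our assumptions; training with Adam would need an autodiff framework and is not shown, and in practice the inputs would likely be normalized first.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Dense layer with He-style random weights and zero biases (illustrative only)."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    """Affine map W x + b."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, z) for z in v]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_cell(hidden=256, n_inputs=5, seed=0):
    """One network cell: 5 inputs -> hidden -> hidden -> 1 output."""
    rng = random.Random(seed)
    return [init_layer(n_inputs, hidden, rng),
            init_layer(hidden, hidden, rng),
            init_layer(hidden, 1, rng)]

def cell_action(cell, S_k, C_k, x_k, q_o_k, q_s_k):
    """Return a_k = (sigmoid output) * q_k^o for the state at t_k."""
    state = [S_k, C_k, x_k, q_o_k, q_s_k]
    h1 = relu(dense(state, cell[0]))
    h2 = relu(dense(h1, cell[1]))
    fraction = sigmoid(dense(h2, cell[2])[0])  # fraction of inventory in [0, 1]
    return fraction * q_o_k

# One cell per time grid k = 0, ..., K-1 (small hidden size keeps pure Python fast).
K = 10
cells = [make_cell(hidden=16, seed=k) for k in range(K)]
a_0 = cell_action(cells[0], S_k=80.0, C_k=15.0, x_k=10_000.0, q_o_k=100.0, q_s_k=100.0)
```

Under this parameterization the network can never sell more than the remaining inventory, so the penalty in (15) only has to discourage leaving options unsold. The state values in the usage line are illustrative.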

4. Results

We collect daily price and volume data for EUA (EU allowance) futures in 2022 and 2023 from investing.com. From these data, we estimate the annual mean $\mu = -0.0367$ and the annual volatility $\sigma = 0.43$ in (8). We consider the intraday optimal execution problem with $K = 10$ time grids. The initial underlying price is set to $S_0 = 80$, and the options expire 1 year later. We calculate the prices of options with strikes $\kappa \in \{70, 75, 80, 85, 90\}$. We simulate 1,000,000 intraday price paths of the underlying asset and each option as the training dataset, and another 100,000 paths as the test dataset. The batch size for training the neural network is 10,000, and the number of epochs is 2. The learning rate is $LR = 0.0001$ for epoch 1 and $LR = 0.00002$ (1/5 of the epoch-1 rate) for epoch 2. The penalty parameter is set to $\eta = 1$. We set the initial underlying inventory to $q_0^s = 100$, the initial option inventory to $q_0^o = 100$, and the initial cash to $x_0 = 10{,}000$. We consider four levels of the price impact coefficient, $\gamma = 0.05, 0.1, 0.2, 0.5$: a low $\gamma$ represents a low price impact level, and a high $\gamma$ a high price impact level. The results for the different price impact functions are presented in the following subsections.
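The experimental settings above can be collected in one configuration object, which also makes the derived quantities explicit: 1,000,000 training paths in batches of 10,000 give 100 parameter updates per epoch and 200 in total. The class and field names below are our own; only the values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    """Settings reported in Section 4 (field names are illustrative)."""
    mu: float = -0.0367
    sigma: float = 0.43
    K: int = 10
    S0: float = 80.0
    strikes: tuple = (70, 75, 80, 85, 90)
    n_train: int = 1_000_000
    n_test: int = 100_000
    batch_size: int = 10_000
    epochs: int = 2
    lr_epoch1: float = 1e-4
    eta: float = 1.0
    q0_s: float = 100.0
    q0_o: float = 100.0
    x0: float = 10_000.0
    gammas: tuple = (0.05, 0.1, 0.2, 0.5)

    def learning_rate(self, epoch):
        """Epoch 1 uses lr_epoch1; epoch 2 uses one fifth of it."""
        return self.lr_epoch1 if epoch == 1 else self.lr_epoch1 / 5

    def updates_per_epoch(self):
        """Number of mini-batch gradient steps in one pass over the training paths."""
        return self.n_train // self.batch_size

cfg = ExperimentConfig()
```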

4.1. Linear Price Impact

The price impact function is linear, as defined in (10), with the scaling parameter $\gamma$ set to 0.05, 0.1, 0.2, and 0.5. When $\gamma = 0.05$, the price impact of selling a given amount $a_k$ is small; when $\gamma = 0.5$, it is relatively large. Table 1 shows the results. The entries in Table 1 are the average PnL (profit and loss), that is, the average of $W_K - W_0$ over the test dataset. The first column of Table 1 shows the strike price of the call option, and the second to fourth columns correspond to three strategies. The NN strategy is the neural network strategy proposed in Section 3. The naive strategy sells all the option inventory at the beginning, $t_0$. The TWAP (time-weighted average price) strategy sells an equal quantity $Q/K$ in each time interval. Since selling a large number of option contracts at once has a significant price impact, the trading price (the cash received by the trader) is much lower than the market mid-price. Thus, the naive strategy, which sells all the inventory at the beginning, yields the lowest PnL (equivalently, the largest loss). Note that the loss of the naive strategy is the price impact $f(Q)$ times the option inventory, $f(Q) \times Q = \gamma Q^2$. Table 1 indicates that the NN strategy's advantage over the other strategies grows as $\gamma$ increases. The PnL difference between the NN strategy and the TWAP strategy widens as the market becomes more illiquid: when the market is liquid ($\gamma$ small), the NN and TWAP strategies differ little, but when the market is illiquid ($\gamma$ large), the NN strategy has a noticeably lower loss than the TWAP strategy. Figure 2a plots the average PnL of each strategy against $\gamma$. The naive strategy's PnL is clearly lower than those of the other two strategies, and the gap between the neural network and TWAP strategies increases with $\gamma$. Because the naive strategy's PnL is much larger in magnitude than the others, the difference between the neural network and TWAP strategies is hard to see in the figure, but it is visible in Table 1. Figure 3a shows a neural network action path for the option with strike $\kappa = 80$ and $\gamma = 0.2$; $a_k$ varies only slightly over time.
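To make the $\gamma Q^2$ expression concrete, the impact-only cost $\sum_k a_k f(a_k)$ of the naive and TWAP schedules can be computed directly. This toy calculation ignores the hedging and mid-price terms of the PnL; with $Q = 100$ and $K = 10$, the naive cost is $\gamma Q^2$ and the TWAP cost is $K \cdot (Q/K) \cdot \gamma (Q/K) = \gamma Q^2 / K$, e.g. 500 versus 50 at $\gamma = 0.05$. The helper name `impact_cost` is our own.

```python
def impact_cost(schedule, f):
    """Impact-only cost sum_k a_k * f(a_k) of a selling schedule."""
    return sum(a * f(a) for a in schedule)

Q, K = 100, 10
naive_schedule = [Q] + [0] * (K - 1)
twap_schedule = [Q / K] * K

costs = {}
for gamma in (0.05, 0.1, 0.2, 0.5):
    f = lambda a, g=gamma: g * a                     # linear impact, Equation (10)
    costs[gamma] = (impact_cost(naive_schedule, f),  # gamma * Q**2
                    impact_cost(twap_schedule, f))   # gamma * Q**2 / K
```

The default argument `g=gamma` binds the current loop value, so each lambda uses its own $\gamma$.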

Figure 4 plots the loss function (15) of the neural network. The rows, from top to bottom, correspond to option strikes $\kappa = 70, 75, 80, 85, 90$, and the columns, from left to right, to price impact coefficients $\gamma = 0.05, 0.1, 0.2, 0.5$. All the loss functions converge as the number of training iterations increases.

4.2. Logarithmic Price Impact

The price impact function is logarithmic, as defined in (11). The coefficient $\gamma$ is set to 0.05, 0.1, 0.2, and 0.5, representing low to high levels of price impact. We compare the neural network strategy with the naive and TWAP strategies; the results are presented in Table 2. The neural network strategy is better than the naive strategy for all values of $\gamma$, and its advantage increases with $\gamma$. The PnL of the neural network strategy is close to that of the TWAP strategy when $\gamma = 0.05$ and $0.1$, but when $\gamma = 0.2$ and $0.5$, that is, when the price impact coefficient is high, the neural network strategy outperforms the TWAP strategy. Figure 5 plots the loss functions for the different strikes and values of $\gamma$; all of them converge as the number of training iterations increases. Figure 2b plots the average PnL of each strategy against $\gamma$ for this case. The naive strategy's PnL is again clearly lower than those of the other two strategies, and the gap between the neural network and TWAP strategies increases with $\gamma$. Figure 3b plots an action path of the neural network under logarithmic price impact, with strike price $\kappa = 80$ and $\gamma = 0.2$. The trader sells more options at the beginning, after which the action stabilizes.

4.3. Power Law Price Impact

The price impact follows the power law defined in (12). The coefficient $\gamma$, which measures the scale of the price impact, is set to 0.05, 0.1, 0.2, and 0.5. We again compare the neural network strategy with the naive and TWAP strategies; the results are shown in Table 3. The NN strategy's advantage over the others grows as $\gamma$ increases. In this case, the neural network (NN) strategy performs similarly to the TWAP strategy when $\gamma = 0.05$ and $0.1$, and better than TWAP when $\gamma = 0.2$ and $0.5$. Both are much better than the naive strategy, which sells all the option inventory at the beginning. So if the market price impact function is $f(a) = \gamma \sqrt{a}$, the trader can adopt either the neural network strategy or the TWAP strategy when $\gamma$ is small, and should adopt the neural network strategy when $\gamma$ is large. Figure 6 shows that the loss functions of the neural network converge for all strikes and values of $\gamma$. Figure 2c plots the average PnL of each strategy against $\gamma$ for the power law case. The naive strategy's PnL is still the lowest of the three, and the gap between the neural network and TWAP strategies increases with $\gamma$. Figure 3c shows an action path of the neural network strategy for the power law case: the trader sells slightly more options at the beginning and at the end.
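The same impact-only calculation shows how much smaller the naive-versus-TWAP gap is under the concave impacts (11) and (12). At $\gamma = 0.2$, $Q = 100$, and $K = 10$, splitting the order cuts the impact cost by a factor of 10 under linear impact, by about $\sqrt{10} \approx 3.2$ under the power law, and by about 1.9 under the logarithmic form. This sketch again ignores hedging and mid-price terms, and `impact_cost` is our own helper name.

```python
import math

def impact_cost(schedule, f):
    """Impact-only cost sum_k a_k * f(a_k) of a selling schedule."""
    return sum(a * f(a) for a in schedule)

Q, K, gamma = 100, 10, 0.2
naive_schedule = [Q] + [0] * (K - 1)
twap_schedule = [Q / K] * K

impacts = {
    "linear": lambda a: gamma * a,                       # Equation (10)
    "logarithmic": lambda a: gamma * math.log(a + 1.0),  # Equation (11)
    "power law": lambda a: gamma * math.sqrt(a),         # Equation (12)
}
costs = {name: (impact_cost(naive_schedule, f), impact_cost(twap_schedule, f))
         for name, f in impacts.items()}
# linear: (2000, 200); logarithmic: about (92.3, 48.0); power law: (200, about 63.2)
```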

4.4. Time-Varying Price Impact

The price impact function is defined in (13). The price impact coefficient $h(t) = \gamma \sin\left(\frac{\pi}{2} t\right) + \gamma$ varies with time, reflecting a time-varying intraday price impact level. When the market is liquid, $h(t)$ is small, and so is the price impact of selling $a_k$; when the market is illiquid, $h(t)$ is large, and so is the price impact of selling $a_k$. A better strategy is therefore to sell more options when $h(t)$ is small and fewer when $h(t)$ is large.

Table 4 shows the average PnL for this case. The neural network strategy performs much better than the naive and TWAP strategies for all $\gamma$, and its advantage increases with $\gamma$. When $\gamma = 0.05$, the average PnL of the neural network strategy ranges from $-4.44$ to $-1.47$, that of the TWAP strategy is around $-55.4$, and that of the naive strategy is $-500$. When $\gamma = 0.1$, the average PnL of the neural network strategy ranges from $-9.76$ to $-3.95$, that of the TWAP strategy is around $-110.4$, and that of the naive strategy is $-1000$. When $\gamma = 0.2$, the average PnL of the neural network strategy ranges from $-18.68$ to $-3.61$, that of the TWAP strategy is around $-220.4$, and that of the naive strategy is $-2000$. When $\gamma = 0.5$, the average PnL of the neural network strategy ranges from $-27.68$ to $-9.75$, that of the TWAP strategy is around $-550.4$, and that of the naive strategy is $-5000$. Thus, the trader should adopt the neural network strategy for selling Q options when intraday market liquidity varies with time. The loss functions in Figure 7 also converge in this case. Figure 2d plots the average PnL of each strategy against $\gamma$ for the time-varying case. The naive strategy's PnL is still the lowest. The gap between the neural network and TWAP strategies is more pronounced than in the other three price impact cases, and it increases with $\gamma$. Figure 3d shows an action path of the neural network strategy for the time-varying case. The trader sells a large quantity of options at the fourth step ($t = 3$ in the figure), because the price impact coefficient is zero at that time: $h(3) = \gamma \sin(3\pi/2) + \gamma = 0$.
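Evaluating $h(t)$ on the grid explains the orders of magnitude in Table 4. Treating $t$ as the grid index $k = 0, \dots, 9$ (as in the $t = 3$ remark above), $h(k)/\gamma$ cycles through 1, 2, 1, 0, so the TWAP impact cost is $100\gamma \cdot 11$, i.e. 55 at $\gamma = 0.05$, close to the reported TWAP PnL of about $-55.4$, while the naive cost is $h(0) Q^2 = \gamma Q^2$. This impact-only sketch ignores hedging and mid-price terms; the function names and the zero-impact schedule are hypothetical illustrations, not the learned NN path.

```python
import math

def h(t, gamma):
    """Time-varying impact coefficient h(t) = gamma * sin(pi t / 2) + gamma, Equation (13)."""
    return gamma * math.sin(math.pi / 2.0 * t) + gamma

def impact_cost(schedule, gamma):
    """Impact-only cost sum_k h(k) * a_k**2, i.e. a_k * f(a_k, k) with f from (13)."""
    return sum(h(k, gamma) * a * a for k, a in enumerate(schedule))

Q, K, gamma = 100, 10, 0.05
# h(k) / gamma over k = 0..9 is approximately 1, 2, 1, 0, 1, 2, 1, 0, 1, 2 (sum 11).
ratios = [h(k, gamma) / gamma for k in range(K)]
zero_steps = [k for k in range(K) if abs(h(k, gamma)) < 1e-12]   # [3, 7]

naive = impact_cost([Q] + [0] * (K - 1), gamma)   # gamma * Q**2 = 500
twap = impact_cost([Q / K] * K, gamma)            # 100 * gamma * 11 = 55
# Hypothetical schedule: half the inventory at t = 3 and half at t = 7.
zero_impact = impact_cost([0, 0, 0, Q / 2, 0, 0, 0, Q / 2, 0, 0], gamma)  # about 0
```

The zero-impact steps at $t = 3$ and $t = 7$ suggest why a learned strategy can keep its losses far below TWAP in this case.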

4.5. Sensitivity Analysis

In this section, we analyze the sensitivity of the neural network's performance to its hyperparameters by plotting the test-set loss for different hidden sizes and initial learning rates. We take the option with strike price $\kappa = 80$ and price impact level $\gamma = 0.2$ as an example. Figure 8 plots the test loss against the base-10 logarithm of the initial learning rate, $\log_{10}(lr)$, with different colors representing different hidden sizes. Figure 8a–d show the linear, logarithmic, power law, and time-varying price impact functions, respectively. We experiment with hidden sizes of $\{32, 64, 128, 256, 512\}$ and initial learning rates of $lr \in \{10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}\}$. The results show that when the learning rate is $10^{-3}$ or $10^{-4}$ and the hidden size is 256 or 512, the test loss is stable and lowest.

We also test the role of the penalty parameter $\eta$ in (15), which controls the weight of the constraint in the objective function. Increasing $\eta$ tends to reduce the execution error $\left(\sum_{k=0}^{K-1} a_k - Q\right)^2$, but the PnL (profit and loss) may decrease. We experiment on the option with strike $\kappa = 80$ and price impact level $\gamma = 0.2$, with a hidden size of 256 and an initial learning rate of $0.0001$. We train the neural network with four levels of the penalty, $\eta \in \{0.1, 1, 10, 100\}$, and plot the corresponding test PnL and exceed rate in Figure 9. The blue line shows the test PnL, and the green line shows the exceed rate, $\frac{Q - \sum_{k=0}^{K-1} a_k}{Q}$, which is positive when part of the inventory is left unsold. The x-axis is $\log_{10}(\eta)$. Figure 9a–d show the results for the four types of price impact functions. As $\eta$ increases, both the PnL and the exceed rate generally decrease. When $\eta = 1$, the exceed rate is already close to zero. Note that the exceed rates in the time-varying price impact case are very close to zero for all values of $\eta$. The trader should therefore choose $\eta$ based on his tolerance for the exceed rate and the PnL.
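The two quantities traded off by $\eta$, the exceed rate shown in Figure 9 and the squared execution error penalized in (15), are straightforward to compute from a schedule. The function names are our own, and averaging the exceed rate over test paths is our assumption about how a single curve value would be obtained.

```python
def exceed_rate(actions, Q):
    """Exceed rate (Q - sum_k a_k) / Q; positive when options are left unsold."""
    return (Q - sum(actions)) / Q

def execution_error(actions, Q):
    """Squared constraint violation (sum_k a_k - Q)^2 weighted by eta in (15)."""
    return (sum(actions) - Q) ** 2

def batch_exceed_rate(batch_actions, Q):
    """Average exceed rate over a batch of simulated paths (aggregation is assumed)."""
    return sum(exceed_rate(a, Q) for a in batch_actions) / len(batch_actions)

# A schedule that leaves 2 of Q = 100 options unsold, and one that sells them all.
short = [10.0] * 9 + [8.0]
full = [10.0] * 10
rate, err = exceed_rate(short, 100.0), execution_error(short, 100.0)   # 0.02, 4.0
avg_rate = batch_exceed_rate([short, full], 100.0)                     # 0.01
```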

4.6. Discussion

We have shown that our deep learning method can solve the carbon options intraday optimal execution problem and performs better than the TWAP and naive strategies, with its advantage increasing in the price impact level parameter $\gamma$. Compared with existing methods in the literature, the neural network method is more flexible and imposes no strict restrictions on the underlying dynamics or the price impact function. In addition, because a reinforcement learning agent updates its strategy by interacting with the environment, reinforcement learning usually takes much longer to train than our deep learning method, which takes only 85 s to obtain an optimal strategy for each option.

To implement our method in real-world financial markets, one needs to address the following issues. First, collect underlying price data and model its dynamics; geometric Brownian motion and stochastic volatility models are commonly used. Second, construct a price impact model for the options, which usually requires limit order book data. Note that our deep learning method places no restrictions on the underlying dynamics or the form of the price impact.

Options exchanges often impose position limits, which cap the number of option contracts a single trader may hold, so the trader's option inventory Q should not exceed the exchange's position limit. Trading costs for options depend on the exchange's rules and typically consist of a fixed fee and a per-contract fee. If a trader sells the entire inventory Q over K steps, placing an order at each step, the total transaction cost is $K \times ff + Q \times cf$, where $ff$ is the fixed fee charged for each order and $cf$ is the per-contract fee. This total is deterministic and does not depend on the trader's strategy $\{a_{k}\}_{k = 0, 1, \dots, K - 1}$.
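The arithmetic behind the formula $K \times ff + Q \times cf$ can be sketched as follows. The sketch assumes, as the formula does, that a fixed fee is charged at every one of the K steps; the fee levels and the two schedules are made-up numbers chosen only to show that the total does not depend on how Q is split across steps.

```python
def total_transaction_cost(actions, fixed_fee, per_contract_fee):
    """Total fee K * ff + Q * cf for a schedule of K orders selling Q contracts.

    Assumes one order (and hence one fixed fee) at each of the K steps;
    the per-contract fee applies to all Q contracts sold.
    """
    K = len(actions)
    Q = sum(actions)
    return K * fixed_fee + Q * per_contract_fee


# Two hypothetical schedules that both sell Q = 100 contracts in K = 10 steps.
twap = [10] * 10
front_loaded = [30, 20, 10, 10, 10, 5, 5, 5, 3, 2]
ff, cf = 2.0, 0.5  # made-up fee levels
print(total_transaction_cost(twap, ff, cf), total_transaction_cost(front_loaded, ff, cf))
```

Both schedules pay the same total, which is why the fee term can be left out of the optimization.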

We assume that the underlying asset is liquid enough to be traded at the mid-price. The options market is not: selling a large option order moves its price, so the execution price falls below the mid-price. The price impact function depends on the option order size $a_{k}$ and the price impact level parameter $γ$. A larger $γ$ indicates a more illiquid market, in which trading $a_{k}$ contracts has a larger price impact. Many studies have found that price impact is a logarithmic or power law function of order size and that intraday market liquidity varies over time. We therefore test our neural network strategy on the four types of price impact functions in Equations (10)–(13). In practice, a trader needs to choose an appropriate model for the price impact function $f(a)$ from market data before implementing our method.
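To make the role of $f(a)$ concrete, the sketch below evaluates the price impact cost of a sell schedule under four illustrative impact shapes. These forms (linear $γa$, logarithmic $γ\log(1 + a)$, power law $γa^{β}$ with a hypothetical $β = 0.5$, and a linear impact scaled by a hypothetical intraday profile) are assumptions for illustration, not a reproduction of Equations (10)–(13). The sketch also assumes each order is executed at the mid-price minus $f(a_{k})$ per contract, so its cost relative to the mid-price is $a_{k} f(a_{k})$.

```python
import numpy as np


def impact_cost(actions, impact_fn):
    """Price impact cost sum_k a_k * f_k(a_k) of a sell schedule.

    Assumes order a_k is executed at the mid-price minus impact_fn(a_k, k),
    so its cost relative to the mid-price is a_k * impact_fn(a_k, k).
    """
    return float(sum(a * impact_fn(a, k) for k, a in enumerate(actions)))


Q, K = 100.0, 10
gamma = 0.1                         # price impact level parameter
beta = 0.5                          # hypothetical power law exponent
profile = np.linspace(2.0, 0.5, K)  # hypothetical: impact weakens during the day

# Illustrative impact functions f(a); NOT the paper's Equations (10)-(13).
impact_models = {
    "linear": lambda a, k: gamma * a,
    "logarithmic": lambda a, k: gamma * np.log1p(a),
    "power_law": lambda a, k: gamma * a**beta,
    "time_varying": lambda a, k: gamma * profile[k] * a,
}

naive = [Q] + [0.0] * (K - 1)  # sell everything at the first step
twap = [Q / K] * K             # sell Q / K at every step

costs = {
    name: (impact_cost(naive, f), impact_cost(twap, f))
    for name, f in impact_models.items()
}
for name, (c_naive, c_twap) in costs.items():
    print(f"{name:>12}: naive {c_naive:8.2f}  TWAP {c_twap:8.2f}")
```

Under each of these concave or linear shapes, splitting the order reduces the impact cost; a learned strategy can additionally exploit a time-varying profile by trading more when impact is low.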

5. Conclusions

This paper proposes a neural network approach to the optimal execution problem for carbon options, which can also be applied to stock options. Because the carbon options market is usually illiquid, a trader who wants to clear his entire carbon option inventory must take trading costs into account. We construct an optimal execution model with four types of price impact functions (linear, logarithmic, power law, and time-varying) and develop a neural network approach to solve it. We compare the neural network strategy with two benchmark strategies at different price impact levels: the TWAP strategy, which sells the inventory Q in equal amounts at each time grid point, and the naive strategy, which sells the entire inventory Q at the beginning. Our neural network strategy generally outperforms both benchmarks under all four types of price impact functions, most markedly in the time-varying case, and its advantage grows as the market becomes more illiquid.
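The claim that the advantage grows with illiquidity can be checked against the reported numbers. The sketch below averages, for each $γ$, the gap between the NN and TWAP PnL over the five strikes of Table 4 (time-varying price impact). The values are transcribed from that table; the cross-strike average is an illustrative summary, not a statistic reported by the authors.

```python
# PnL values transcribed from Table 4 (time-varying price impact),
# strikes 70, 75, 80, 85, 90 in that order.
table4 = {
    0.05: {"NN": [-1.4672, -1.5485, -2.7397, -4.4350, -3.3802],
           "TWAP": [-55.4542, -55.4425, -55.4307, -55.4212, -55.4110]},
    0.1: {"NN": [-4.8403, -3.9479, -4.8679, -9.7642, -9.7306],
          "TWAP": [-110.4542, -110.4424, -110.4307, -110.4212, -110.4110]},
    0.2: {"NN": [-3.6092, -8.3599, -6.7813, -18.6771, -13.2148],
          "TWAP": [-220.4542, -220.4424, -220.4307, -220.4212, -220.411]},
    0.5: {"NN": [-9.7451, -21.8538, -18.2810, -24.5852, -27.6799],
          "TWAP": [-550.4542, -550.4424, -550.4307, -550.4213, -550.4110]},
}


def mean_advantage(nn, twap):
    """Average PnL gain of the NN strategy over TWAP across strikes."""
    return sum(a - b for a, b in zip(nn, twap)) / len(nn)


advantage = {g: mean_advantage(v["NN"], v["TWAP"]) for g, v in table4.items()}
for g, adv in advantage.items():
    print(f"gamma = {g}: mean NN - TWAP PnL = {adv:.4f}")
```

The mean gap rises from about 52.7 at $γ = 0.05$ to about 530.0 at $γ = 0.5$.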

The neural network strategy can be extended to other assets, such as stock options and credit derivatives, and the underlying asset dynamics can be extended to more complex models, such as stochastic volatility models. Future research could also include macroeconomic or financial market variables in the trading model, and reinforcement learning techniques are worth exploring within this framework.

Author Contributions

Conceptualization, Q.L.; Methodology, Q.L.; Software, Q.L. and Q.Y.; Validation, Q.L.; Formal analysis, Q.L.; Data curation, Q.Y.; Writing—original draft, Q.L. and Q.Y.; Writing—review & editing, Q.L.; Supervision, Q.L.; Funding acquisition, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Shandong Province, Grant No. ZR2023QG173.

Data Availability Statement

The data will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Neural network structure.

Figure 2. Average PnL with respect to $γ$.


Figure 3. Action path of the neural network strategy with option strike price $κ = 80$ and price impact level parameter $γ = 0.2$.


Figure 4. Loss functions in the linear price impact case. Rows (top to bottom): strike price $κ = 70, 75, 80, 85, 90$. Columns (left to right): $γ = 0.05, 0.1, 0.2, 0.5$.


Figure 5. Loss functions in the logarithmic price impact case. Rows (top to bottom): strike price $κ = 70, 75, 80, 85, 90$. Columns (left to right): $γ = 0.05, 0.1, 0.2, 0.5$.


Figure 6. Loss functions in the power law price impact case. Rows (top to bottom): strike price $κ = 70, 75, 80, 85, 90$. Columns (left to right): $γ = 0.05, 0.1, 0.2, 0.5$.


Figure 7. Loss functions in the time-varying price impact case. Rows (top to bottom): strike price $κ = 70, 75, 80, 85, 90$. Columns (left to right): $γ = 0.05, 0.1, 0.2, 0.5$.


Figure 8. Sensitivity analysis of different hyperparameters.

Figure 9. PnL and exceed rate for different values of $η$.


Table 1. Linear price impact PnL of the neural network (NN), naive, and TWAP strategies.

(a) $γ = 0.05$
Strike	NN	Naive	TWAP
70	−49.6303	−500	−50.1141
75	−49.7530	−500	−50.1135
80	−50.1891	−500	−50.1128
85	−49.6368	−500	−50.1138
90	−50.6781	−500	−50.1136
(b) $γ = 0.1$
Strike	NN	Naive	TWAP
70	−98.1512	−1000	−100.1141
75	−98.1553	−1000	−100.1135
80	−98.8641	−1000	−100.1128
85	−98.2245	−1000	−100.1138
90	−98.7457	−1000	−100.1136
(c) $γ = 0.2$
Strike	NN	Naive	TWAP
70	−192.368	−2000	−200.1141
75	−196.7205	−2000	−200.1135
80	−193.2954	−2000	−200.1128
85	−192.6193	−2000	−200.1138
90	−192.8411	−2000	−200.1136
(d) $γ = 0.5$
Strike	NN	Naive	TWAP
70	−459.143	−5000	−500.1141
75	−453.7183	−5000	−500.1135
80	−454.58	−5000	−500.1128
85	−455.7442	−5000	−500.1138
90	−459.2235	−5000	−500.1135

Table 2. Logarithmic price impact PnL of the neural network (NN), naive, and TWAP strategies.

(a) $γ = 0.05$
Strike	NN	Naive	TWAP
70	−12.7752	−23.0756	−12.8015
75	−12.7618	−23.0756	−12.7816
80	−12.7393	−23.0756	−12.7618
85	−12.7282	−23.0756	−12.7444
90	−12.7058	−23.0756	−12.7266
(b) $γ = 0.1$
Strike	NN	Naive	TWAP
70	−24.7403	−46.1512	−24.7904
75	−24.7205	−46.1512	−24.7714
80	−24.6986	−46.1512	−24.7516
85	−24.6433	−46.1512	−24.7332
90	−24.6530	−46.1512	−24.7154
(c) $γ = 0.2$
Strike	NN	Naive	TWAP
70	−48.5561	−92.3024	−48.7699
75	−48.4813	−92.3024	−48.7499
80	−48.5015	−92.3024	−48.7300
85	−48.4959	−92.3024	−48.7126
90	−48.4517	−92.3024	−48.6948
(d) $γ = 0.5$
Strike	NN	Naive	TWAP
70	−119.3734	−230.7560	−120.7064
75	−119.3508	−230.7560	−120.6874
80	−119.3245	−230.7560	−120.6676
85	−119.3043	−230.7560	−120.6492
90	−119.2799	−230.7560	−120.6314

Table 3. Power law price impact PnL of the neural network (NN), naive, and TWAP strategies.

(a) $γ = 0.05$
Strike	NN	Naive	TWAP
70	−15.7263	−50	−15.7547
75	−15.7356	−50	−15.7595
80	−15.7336	−50	−15.7622
85	−15.7277	−50	−15.7676
90	−15.7364	−50	−15.7716
(b) $γ = 0.1$
Strike	NN	Naive	TWAP
70	−31.4622	−100	−31.566
75	−31.4701	−100	−31.5709
80	−31.4489	−100	−31.5745
85	−31.4726	−100	−31.5790
90	−31.4333	−100	−31.5829
(c) $γ = 0.2$
Strike	NN	Naive	TWAP
70	−62.7527	−200	−63.1887
75	−62.7544	−200	−63.1935
80	−62.8651	−200	−63.1972
85	−62.7555	−200	−63.2016
90	−62.7773	−200	−63.2055
(d) $γ = 0.5$
Strike	NN	Naive	TWAP
70	−155.2964	−500	−158.0574
75	−155.3309	−500	−158.0613
80	−155.2946	−500	−158.0650
85	−155.9841	−500	−158.0703
90	−155.3230	−500	−158.0743

Table 4. Time-varying price impact PnL of the neural network (NN), naive, and TWAP strategies.

(a) $γ = 0.05$
Strike	NN	Naive	TWAP
70	−1.4672	−500	−55.4542
75	−1.5485	−500	−55.4425
80	−2.7397	−500	−55.4307
85	−4.4350	−500	−55.4212
90	−3.3802	−500	−55.4110
(b) $γ = 0.1$
Strike	NN	Naive	TWAP
70	−4.8403	−1000	−110.4542
75	−3.9479	−1000	−110.4424
80	−4.8679	−1000	−110.4307
85	−9.7642	−1000	−110.4212
90	−9.7306	−1000	−110.4110
(c) $γ = 0.2$
Strike	NN	Naive	TWAP
70	−3.6092	−2000	−220.4542
75	−8.3599	−2000	−220.4424
80	−6.7813	−2000	−220.4307
85	−18.6771	−2000	−220.4212
90	−13.2148	−2000	−220.411
(d) $γ = 0.5$
Strike	NN	Naive	TWAP
70	−9.7451	−5000	−550.4542
75	−21.8538	−5000	−550.4424
80	−18.2810	−5000	−550.4307
85	−24.5852	−5000	−550.4213
90	−27.6799	−5000	−550.4110

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lai, Q.; Yang, Q. Deep Learning Strategies for Intraday Optimal Carbon Options Trading with Price Impact Considerations. Mathematics 2025, 13, 1035. https://doi.org/10.3390/math13071035

AMA Style

Lai Q, Yang Q. Deep Learning Strategies for Intraday Optimal Carbon Options Trading with Price Impact Considerations. Mathematics. 2025; 13(7):1035. https://doi.org/10.3390/math13071035

Chicago/Turabian Style

Lai, Qianhui, and Qiang Yang. 2025. "Deep Learning Strategies for Intraday Optimal Carbon Options Trading with Price Impact Considerations" Mathematics 13, no. 7: 1035. https://doi.org/10.3390/math13071035

APA Style

Lai, Q., & Yang, Q. (2025). Deep Learning Strategies for Intraday Optimal Carbon Options Trading with Price Impact Considerations. Mathematics, 13(7), 1035. https://doi.org/10.3390/math13071035

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
