**1. Introduction**

In commodity markets, typical products include directional trades such as futures and forwards, which establish an obligation to purchase or sell an underlying commodity in the future (Clark 2014). As essential tools for managing risks from these contracts, which may consist of multiple underlying assets, there are various options contracts that provide a right to trade the underlying commodity under a specified condition. In this study, we focus on early-exercisable options on multiple underlying assets in commodity markets, i.e., multivariate Bermudan commodity options.

Solving Bermudan commodity option pricing problems with multiple underlying assets and factors is challenging because computational efforts grow exponentially in tandem with the problem dimension in general, which is determined by the number of assets and factors. However, the improvement of algorithms and the rapid growth of computational power have led to a remarkable surge of interest in computational science in recent years. Currently, a wide variety of machine learning algorithms, such as deep learning and neural networks, are successfully employed for classification, regression, clustering, or dimensionality reduction tasks and are applied for large-sized and high-dimensional data in various areas. In this study, we develop a new Bermudan commodity option algorithm

**Citation:** Hoshisashi, Kentaro, and Yuji Yamada. 2023. Pricing Multi-Asset Bermudan Commodity Options with Stochastic Volatility Using Neural Networks. *Journal of Risk and Financial Management* 16: 192. https://doi.org/10.3390/ jrfm16030192

Academic Editors: Kentaro Iwatsubo and Thanasis Stengos

Received: 28 December 2022 Revised: 5 March 2023 Accepted: 10 March 2023 Published: 12 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

via multi-layered neural networks and show its efficiency and effectiveness based on the multi-asset commodity market model with stochastic volatility, wherein significant changes in volatility may be observed according to the demand–supply relationship and geopolitical conditions in commodity markets.

An important feature of Bermudan options is that they can be exercised early, with their value being determined by whether or not they are exercised before maturity. In other words, the option holder must decide whether to continue holding the option or immediately exercise it at a prespecified period. In this situation, it is crucial to determine the continuation value, i.e., the value of holding the option until the next exercise window. Such a continuation value may be given as the discounted conditional expectation of the remaining option value on one step ahead under a risk-neutral probability measure, which generally has no explicit boundary conditions. Moreover, the conditional expectation providing the continuation value is an unknown (possibly nonlinear and complex) multivariate function whose dimensions depend on the number of underlying factors; hence, an exact (yet still approximate) computation involves high-dimensional discrete grids concerning state variables and is quite difficult to solve.

To price early-exercisable options with estimations of continuation value functions, Longstaff and Schwartz (2001) proposed a simple yet powerful numerical method involving a regression-based functional estimation using simulated sample paths known as the least-squares Monte Carlo (LSMC) method. Since then, several studies have examined the application of neural networks (or machine learning methods) for estimating continuation value functions in option pricing based on the LSMC method. For American options' pricing, Haugh and Kogan (2004) applied a neural network with one hidden layer for valuation, whereas Kohler et al. (2010) proved price consistency and convergence with multiple payoff types. Lapeyre and Lelong (2021) gave several numerical examples of Bermudan options and proved convergence. There are additional examples, e.g., the 5000 assets rainbow option (Becker et al. 2021) and expected exposures (Andersson and Oosterlee 2021). Furthermore, other machine learning algorithms have been used for early-exercisable options, e.g., radial basis functions (Ballestra and Pacelli 2013), nearest-neighbor (Feng et al. 2013), deep learning (Becker et al. 2020; Liang et al. 2021), unsupervised learning (Salvador et al. 2020), and reinforcement learning (Li 2022), as well as the support vector machine (Lin and Almeida 2021). Moreover, numerical approaches have also been used, including stochastic kriging metamodels (Ludkovski 2018), high-dimensional partial differential equations (Sirignano and Spiliopoulos 2018), and backward stochastic differential equations (Chen and Wan 2021). Furthermore, there are other applications of neural networks in the finance field, e.g., extending the feature set (Montesdeoca and Niranjan 2016), the calculation of implied volatilities (Liu et al. 2019), and decision-making (Puka et al. 2021). A comprehensive review of these methods was conducted by Ruf and Wang (2020).

Although this study shares the same ideas as the aforementioned studies—in particular, as in Lapeyre and Lelong (2021), given that a multi-layer perceptron (MLP) is applied in the neural network—it is worthwhile to mention that our study may be considered novel in several aspects: We illustrate an algorithm for estimating the continuation values of multi-asset Bermudan commodity options with stochastic volatility features, whereby a smooth activation function, such as the sigmoid function, is applied in the MLP to reflect the smoothness of conditional expectations regarding state variables. The smoothness of functions to represent conditional expectations is key in this study. In the Markovian setting, the conditional probability density functions are usually smooth given state variables; thus, conditional expectations are smooth functions. Therefore, the target continuation value function is smooth, and we can expect a better fit to the target function by using a smooth activation function in the MLP. This is in contrast with more commonly used piecewise linear functions such as the leaky ReLU function applied in the numerical experiments by Lapeyre and Lelong (2021), wherein the fitting function may not be smooth but only piecewise smooth. (Note that, in a similar context with the smoothness of estimated functions, Yamada (2012, 2017) applied the generalized additive model to calculate smooth

functions for conditional expectations in multivariate hedging problems with European and Bermudan options.) Additionally, we applied more sophisticated techniques such as the resampling procedure and early stopping to improve the computational efficiency and avoid possible biases in pricing or overfitting for optimal learning (see Section 3.2 and the numerical experiments).

While several neural network/machine learning models for option pricing exist, we believe that, for this type of research, the current methodologies, along with developed computational algorithms, need to be combined with existing techniques using the currently available computational environment. In this context, the choice of problem and methodology choice is important, as is the approach to the problem and how to perform the numerical experiments. The combination of multi-asset commodity options with stochastic volatility and recently developed neural network techniques (including the computational environment and software) is meaningful since commodity markets are largely volatile, and this volatility may change over time. Moreover, the multi-asset Bermudan commodity options with stochastic volatility, to the best of our knowledge, have not been previously considered despite the problem's importance. It should become more challenging in numerical calculations to configure multiple underlying assets that recognize mean-reverting dynamics and solve boundary conditions with stochastic factors (see, e.g., Hahn and Dyer 2008 and Ball and Roma 1994).

The present study implements a multi-layered neural network and examines its efficiency and effectiveness for multi-asset Bermudan commodity option pricing problems with stochastic volatility. First, we formulate the multi-asset commodity market with stochastic volatility, wherein individual asset price dynamics are expressed as a two-factor model by combining a well-known commodity model by Schwartz (1997) with Heston's stochastic volatility model (Heston 1993). Next, we apply MLPs in the neural network to represent the continuation value functions in Bermudan option pricing, whereby the entire neural network is trained using LSMC simulations. We perform numerical experiments to compare the continuation value function accuracy in response to changing the exercise dates, the number of layers/neurons, and the dimension of the problem. We also compare the relationship between the continuation values and network configurations.

The outline of this article is as follows. Section 2 gives an introduction to the commodity option structure adopted in this study and the formulation of the multi-dimensional Bermudan option problem. Section 3 describes the configuration of neural networks, a multi-dimensional asset model with stochastic volatility, and a Bermudan options pricing procedure for learning and valuation via Monte Carlo sample paths. Section 4 presents the numerical results of the Bermudan option prices and compares the accuracy of the continuation value surfaces. Section 5 summarizes the analysis results and discussions. Lastly, Section 6 concludes this study.

#### **2. Pricing Multi-Asset Bermudan Commodity Options with Stochastic Volatility**

In this section, we introduce early-exercisable commodity options and formulate the problem of pricing multi-asset Bermudan commodity options with stochastic volatility.

#### *2.1. Early-Exercisable Commodity Options*

As stated earlier, in commodity markets, typical products include plain directional trades such as future and forward contracts, which establish an obligation to buy or sell a particular commodity asset at a specified price in the future (Clark 2014). Depending on the terminal values of commodity assets, holding these contracts may lead to a loss or profit for the contractor, while a large loss is particularly undesirable for the holder; furthermore, the possibility of a large profit may be pursued. Such opportunities are realized using options contracts, giving a right to purchase or sell an underlying commodity asset with a prespecified strike price in the future.

Among the many types of options used as hedging tools in commodity markets, early-exercisable options provide additional flexibility regarding exercise timing and are

considered useful for hedgers, in practice. Traditionally, such options are characterized as American options; however, in the context of exotic options, Bermudan options have similar flexibility. These options allow holders to exercise them early, although only on specific dates before maturity; thus, the option holder must decide whether to continue holding the option or immediately exercise it during the exercisable period. However, such continuation value is usually unknown because it depends on future option values on specific exercisable dates. Therefore, it is paramount to determine the continuation values for Bermudan options. The objective of this study is to evaluate computational performance (including the accuracy of continuation value estimation) for pricing multi-asset Bermudan commodity options via multi-layered neural networks.

#### *2.2. Multi-Dimensional Bermudan Option Pricing Problem*

In this subsection, we describe the multi-dimensional Bermudan option pricing problem, following Lapeyre and Lelong (2021). Given a complete filtered probability space (Ω, <sup>F</sup>, (F*t*)0≤*t*≤*<sup>T</sup>* , P) with a finite time horizon *T* > 0, we assume that a set of underlying assets is modeled via a multifactored process (*Xt*)0≤*t*≤*<sup>T</sup>* adapted to the filtration, (F*t*)0≤*t*≤*<sup>T</sup>* , and that P is an associated risk-neutral measure. We consider a Bermudan option with exercise dates 0 = *T*<sup>0</sup> ≤ *T*<sup>1</sup> < *T*<sup>2</sup> < . . . < *T<sup>N</sup>* = *T* and a discrete-time payoff process *PT<sup>n</sup>* if exercised at times (*Tn*)0≤*n*≤*<sup>N</sup>* , where *PT<sup>n</sup>* is specified as a function of *XT<sup>n</sup>* . Then, Bermudan option prices *ZT<sup>n</sup>* are computed using the following recursive equation:

$$\begin{cases} Z\_{T\_N} = P\_{T\_N} \\ Z\_{T\_n} = \max \left( P\_{T\_n}, e^{-r\delta\_{T\_n}} \mathbb{E} \left[ Z\_{T\_{n+1}} | \mathcal{F}\_{T\_n} \right] \right), \ 0 \le n \le N - 1 \end{cases} \tag{1}$$

where E denotes the expectation under the risk-neutral probability measure P with the risk-free interest rate *r* and the interval between *Tn*−<sup>1</sup> and *T<sup>n</sup>* as δ*T<sup>n</sup>* . Furthermore, assuming that (*Xt*)0≤*t*≤*<sup>T</sup>* is a multi-dimensional Markov process, there exists a measurable function Φ*<sup>n</sup>* : <sup>R</sup>*d<sup>x</sup>* <sup>→</sup> <sup>R</sup>, such that:

$$e^{-r\delta\_{T\_n}}\mathbb{E}\left[Z\_{T\_{n+1}}|\mathcal{F}\_{T\_n}\right] = e^{-r\delta\_{T\_n}}\mathbb{E}\left[Z\_{T\_{n+1}}|X\_{T\_n}\right] = \Phi\_n(X\_{T\_n}) \; \; \; 0 \le n \le N-1. \tag{2}$$

Herein, we refer to Φ*<sup>n</sup>* as a continuation value function in this paper.

Note that finding the exact Φ*<sup>n</sup>* is difficult; alternatively, one may identify a function *f<sup>n</sup>* to minimize the following quantity,

$$\mathbb{E}\left[\left|e^{-r\mathcal{S}\_{T\_n}}Z\_{T\_{n+1}} - f\_n(X\_{T\_n})\right|^2\right],\tag{3}$$

over a parametrized set of functions B. If all (real-valued) square-integrable measurable functions are searched to minimize Equation (3), it turns out that the function Φ*<sup>n</sup>* providing the conditional expectation in Equation (2) is achieved via an optimizer. However, there is a trade-off between the generality of a set of functions and the efficiency of computation. Additionally, computational tractability depends on the methodology to solve the optimization problem.

#### *2.3. Multi-Asset Commodity Market Model with Stochastic Volatility*

This study employs a multivariate commodity market model consisting of multiple underlying assets with stochastic volatility for the Bermudan option problem. To this end, we adopt a stochastic volatility model for the mean-reverting commodity dynamics (Schwartz 1997) and expand it to the multi-asset case.

Consider the Bermudan option problem with *n* underlying assets at time *t*, *Si*,*<sup>t</sup>* , *i* = 1, . . . , *l*, the *i*-th price dynamics of which are governed by the following two-dimensional stochastic differential equations (SDEs):

$$\begin{cases} d\mathbf{S}\_{i,t} = \kappa\_{\mathbf{S}\_i} (\mu\_i - \ln \mathbf{S}\_{i,t}) \mathbf{S}\_{i,t} dt + \sqrt{\upsilon\_{i,t}} \mathbf{S}\_{i,t} \, d\mathcal{W}\_{\mathbf{S}\_i,t}, \\ d\upsilon\_{i,t} = \kappa\_{\upsilon\_i} (\theta\_i - \upsilon\_{i,t}) dt + \mathbf{f}\_i \sqrt{\upsilon\_{i,t}} dW\_{\upsilon\_i,t}. \end{cases} \tag{4}$$

Herein, *WS<sup>i</sup>* ,*<sup>t</sup>* and *Wv<sup>i</sup>* ,*<sup>t</sup>* are correlated Brownian motions with appropriate correlation parameters, while the magnitude of the speed coefficient *κS<sup>i</sup>* measures the degree of mean reversion to the long-run mean *µ<sup>i</sup>* , including the market price of risk in the underlying asset price processes. The second term characterizes the *i*-th volatility process, *σi*,*<sup>t</sup>* ≡ √*vi*,*<sup>t</sup>* , where *κv<sup>i</sup>* indicates a degree of mean reversion toward long-term volatility *θ<sup>i</sup>* , and *ξ<sup>i</sup>* is the volatility of volatility.

Since each of the underlying asset price dynamics in (4) follows a two-dimensional Markov process, the state variables at time *t*, denoted by *X<sup>t</sup>* , corresponding to the input features *X* of the MLP in the previous subsection, may be described as

$$X\_t := \begin{bmatrix} \mathbb{S}\_{1,t}, \sigma\_{1,t} \; \mathbb{S}\_{2,t}, \sigma\_{2,t} \; \dots \; \mathbb{S}\_{l,t}, \sigma\_{l,t} \end{bmatrix}^\mid \in \mathbb{R}^{d\_l}. \tag{5}$$

The dimension *d<sup>l</sup>* in (5) depends on the number of state variables and is given by *dl* := 2*l*. Note that the SDEs of the underlying assets are used to generate sample paths of the LSMC method in the Bermudan option pricing problem.

#### **3. Application of Neural Networks with MLP**

When pricing Bermudan commodity options using a model with multi-dimensional factors—including the multi-asset stochastic volatility model introduced in the previous section—it is crucial to determine continuation values at each exercisable date. In order to identify a continuation value function in the multi-asset Bermudan option pricing problem with stochastic volatility, this study takes a neural network approach with MLP, similar to Lapeyre and Lelong (2021). First, we introduce the neural network architecture considered in this study, which generates a continuation value in Bermudan commodity options pricing. Second, we present the underlying assets model with multi-dimensional factors, which has multi-asset and stochastic volatility. Finally, we provide algorithms for learning the entire network and option pricing procedure.

### *3.1. Continuation Value Functions via MLPs*

First, we explain the configuration of an MLP to express a general multi-dimensional function and to approximate the continuation value function in the multi-dimensional Bermudan option problem.

The basic configuration of the MLP is shown in Figure 1, where *<sup>X</sup>* <sup>∈</sup> <sup>R</sup>*<sup>d</sup>* is an input vector and *Z* ∈ R is an output of the entire neural network. Each neuron is called a "perceptron" that defines a mapping of input/output signals with appropriate dimensions, being dependent on the number of neurons at input/output layers. For example, if *<sup>x</sup>* <sup>∈</sup> <sup>R</sup>*d<sup>x</sup>* denotes an input signal of a perceptron with a weight matrix *W* ∈ R *<sup>d</sup>x*×*d<sup>y</sup>* and a bias vector *b* ∈ R *dy* , then, an output signal *y* ∈ R *dy* from the perceptron is given by

$$\mathbf{y} = h \left( \mathbf{W}^T \mathbf{x} + \mathbf{b} \right), \tag{6}$$

where *h* : R *<sup>d</sup><sup>y</sup>* <sup>→</sup> <sup>R</sup> *dy* is a component-wise activation function. Typical choices of activation functions in neurons are as follows:

$$\begin{array}{l}\text{Sigmoid}: \ x \mapsto \frac{1}{1 + \epsilon^{-x}}\\\text{ReLU}: \ x \mapsto \max(x, 0) \end{array} . \tag{7}$$

In the case where the MLP is applied for a regression, all the weight matrices and bias vectors in the MLP are computed to minimize the sum of squared errors between the actual dependent variable, denoted by *<sup>Z</sup>* <sup>∈</sup> <sup>R</sup>, and the predicted dependent variable *<sup>Z</sup>*<sup>ˆ</sup> <sup>∈</sup> <sup>R</sup> given the training datasets of *<sup>X</sup>* <sup>∈</sup> <sup>R</sup>*d<sup>x</sup>* and *<sup>Z</sup>* <sup>∈</sup> <sup>R</sup> (expressed using the MLP in Figure 1).

Note that the MLP can express a continuous and complex nonlinear surface in entire networks by sequentially performing a linear and nonlinear transformation on inputs *X* ∈ R*d<sup>x</sup>* to the compiled layer output *<sup>Z</sup>*<sup>ˆ</sup> <sup>∈</sup> <sup>R</sup>. The properties of the MLP function derive from the universal approximation theorem proposed by Cybenko (1989) and the Kolmogorov–

Arnold representation theorem put forward by Kolmogorov (1957) and Arnold (2009), in which any function can be approximated if the input size and network are infinite. In this sense, functions expressed by the MLP are generally considered suitable for a problem with complicated interactions because of the adjustable basis functions (see Choon et al. 2008). *J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 6 of 23

**Figure 1.** Multi-layered structure with perceptrons (circles) in a multi-layer perceptron. **Figure 1.** Multi-layered structure with perceptrons (circles) in a multi-layer perceptron.

Note that the MLP can express a continuous and complex nonlinear surface in entire networks by sequentially performing a linear and nonlinear transformation on inputs ∈ ℝௗೣ to the compiled layer output መ ∈ ℝ. The properties of the MLP function derive from the universal approximation theorem proposed by Cybenko (1989) and the Kolmogorov– Arnold representation theorem put forward by Kolmogorov (1957) and Arnold (2009), in which any function can be approximated if the input size and network are infinite. In this sense, functions expressed by the MLP are generally considered suitable for a problem with complicated interactions because of the adjustable basis functions (see Choon et al. 2008). For the randomly generated sample paths of (௧)ஸ௧ஸ் , we can apply the LSMC method (see Appendixes A and B) combined with the MLP, whereby the continuation value function is modeled at each step using a function given by the MLP. Let (ଵ) (ଶ) (ெ) For the randomly generated sample paths of (*Xt*)0≤*t*≤*<sup>T</sup>* , we can apply the LSMC method (see Appendices A and B) combined with the MLP, whereby the continuation value function is modeled at each step using a function given by the MLP. Let *X* (1) *t* , *X* (2) *t* , . . . , *X* (*M*) *t* and <sup>0</sup> <sup>≤</sup> *<sup>t</sup>* <sup>≤</sup> *<sup>T</sup>* be the simulated *<sup>M</sup>* sample paths of (*Xt*)0≤*t*≤*<sup>T</sup>* . Since the discrete-time payoff process *PT<sup>n</sup>* (if exercised at times (*Tn*)0≤*n*≤*<sup>N</sup>* ) is specified as a function of *XT<sup>n</sup>* for a Bermudan option and *ZT<sup>N</sup>* = *PT<sup>N</sup>* , the training data of the output variable, *Z* ≡ *ZT<sup>N</sup>* ∈ R, in the first step of the LSMC method, are computed as *P* (1) *T<sup>N</sup>* , *P* (2) *T<sup>N</sup>* , . . . , *P* (*M*) *T<sup>N</sup>* , corresponding to the payoffs of the Bermudan option at maturity along the sample path of *XT<sup>N</sup>* . The MLP in the first step is constructed for the training data of *<sup>X</sup>* <sup>≡</sup> *<sup>X</sup>TN*−<sup>1</sup> <sup>∈</sup> <sup>R</sup>*d<sup>x</sup>* , given as *X* (1) *TN*−<sup>1</sup> , *X* (2) *TN*−<sup>1</sup> , . . . , *X* (*M*) *TN*−<sup>1</sup> , together with those of *Z* ≡ *ZT<sup>N</sup>* ∈ R. Then, we obtain an approximation of the continuation value function, denoted by Φ<sup>ˆ</sup> *<sup>N</sup>*−1, and the continuation values along the sample path, Φ<sup>ˆ</sup> *<sup>N</sup>*−<sup>1</sup> *X* (*m*) *TN*−<sup>1</sup> , *m* = 1, . . . , *M*.

௧ , ௧ ,…,௧ and 0≤≤ be the simulated sample paths of (௧)ஸ௧ஸ். Since the discrete-time payoff process ் (if exercised at times ()ஸஸே) is specified as a function of ் for a Bermudan option and ்ಿ = ்ಿ, the training data of the output variable, In the second step, the training data of the output variable, *<sup>Z</sup>* ≡ *<sup>Z</sup>TN*−<sup>1</sup> ∈ R, are computed using (1), as

$$Z\_{T\_{N-1}}^{(m)} = \max \left( P\_{T\_{N-1}}^{(m)}, \,\,\middle|\,\Phi\_{N-1} \left( X\_{T\_{N-1}}^{(m)} \right) \right), \, m = 1, \ldots, M. \tag{8}$$

்ಿ. The MLP in the first step is constructed for the training data of ≡்ಿషభ ∈ ℝௗೣ, given as ்ಿషభ (ଵ) , ்ಿషభ (ଶ) ,…,்ಿషభ (ெ) , together with those of ≡்ಿ ∈ ℝ. Then, we obtain an approximation of the continuation value function, denoted by ேିଵ, and the continuation values along the sample path, ேିଵ൫்ಿషభ () ൯, = 1, … , . In the second step, the training data of the output variable, ≡்ಿషభ ∈ ℝ, are comas well as the training data of *<sup>X</sup>* <sup>≡</sup> *<sup>X</sup>TN*−<sup>2</sup> <sup>∈</sup> <sup>R</sup>*d<sup>x</sup>* , given as *X* (1) *TN*−<sup>2</sup> , *X* (2) *TN*−<sup>2</sup> , . . . , *X* (*M*) *TN*−<sup>2</sup> . Using these training datasets, the MLP is constructed to find an approximation of the continuation value function, denoted by Φ<sup>ˆ</sup> *<sup>N</sup>*−2, and the continuation values along the sample paths, Φ<sup>ˆ</sup> *<sup>N</sup>*−<sup>2</sup> *X* (*m*) *TN*−<sup>2</sup> . We then repeat the same procedure until *T*0.

#### puted using (1), as *3.2. Learning Networks and Option Pricing*

்ಿషభ () = ቀ்ಿషభ () , ேିଵ൫்ಿషభ () ൯ቁ , = 1, … , *,* (8) as well as the training data of ≡்ಿషమ ∈ ℝௗೣ , given as ்ಿషమ (ଵ) , ்ಿషమ (ଶ) ,…,்ಿషమ (ெ) . Using these training datasets, the MLP is constructed to find an approximation of the continuation value function, denoted by ேିଶ, and the continuation values along the sample paths, ேିଶ൫்ಿషమ () ൯. We then repeat the same procedure until . *3.2. Learning Networks and Option Pricing*  For learning neural networks, we generate Monte Carlo sample paths using the SDEs For learning neural networks, we generate Monte Carlo sample paths using the SDEs in Section 3.2 based on a similar idea to that of the ordinary LSMC method introduced by Longstaff and Schwartz (2001). Herein, we apply nonlinear functions of the MLPs instead of polynomial functions for the basis of the continuation value functions. Additionally, we introduce techniques such as early stopping to improve the fitted continuation functions and avoid possible overfitting or biases in learning and pricing. We also introduce the resampling procedure to avoid a possible bias caused by using the same random samples between learning and valuation, and we regenerate Monte Carlo sample paths for the valuation of Bermudan option prices (see Appendix A for pricing details).

in Section 3.2 based on a similar idea to that of the ordinary LSMC method introduced by Longstaff and Schwartz (2001). Herein, we apply nonlinear functions of the MLPs instead

we introduce techniques such as early stopping to improve the fitted continuation functions and avoid possible overfitting or biases in learning and pricing. We also introduce

Herein, we summarize a learning procedure, as described in Algorithm 1 below, where the underlying price and the volatility vector are denoted by *S<sup>t</sup>* := [*S*1,*<sup>t</sup>* , *S*2,*<sup>t</sup>* , . . . , *Sl*,*<sup>t</sup>* ] > and *σt* := [*σ*1,*<sup>t</sup>* , *σ*2,*<sup>t</sup>* , . . . , *σl*,*<sup>t</sup>* ] > , while *g* denotes a payoff function of *S<sup>t</sup>* . Given simulation the sample paths generated by the multi-asset stochastic volatility models in (4), we provide a Bermudan option pricing procedure involving the algorithm for estimating the continuation values using neural networks. This algorithm operates to find MLP Φ as a continuation value function satisfying Equation (8) in the previous section and gives the Bermudan option price.


**Require:** Initiate paths *S t* , σ *t* , *t* = *T*0, *T*<sup>1</sup> , · · · , *TN*, *j* = 1, 2, · · · , *M* 1:Let *p* be the patience and *Maxiter* be the maximum number of epochs 2:Put *V* (*j*) <sup>←</sup> *<sup>g</sup> S* (*j*) *T<sup>N</sup>* for all *j* 3:**for** *t* from *TN*−<sup>1</sup> to *T*<sup>1</sup> **do** 4: Let *X* (*j*) <sup>←</sup> *<sup>S</sup>* (*j*) *t* , σ (*j*) *t* and *V* (*j*) <sup>←</sup> *<sup>e</sup>* −*r*δ*t* · *V* (*j*) for all *j* 5: **if** *t* on exercisable periods **then** 6: Perform learning on *X* to obtain network Φ*<sup>t</sup>* with *Z* to be *V* 7: *i* ← 0 8: *k* ← 0 9: **while** *i* < *Maxiter* **do** 10: Train Φ*<sup>t</sup>* on *X* and *V* 11: **if** improved **then** 12: *k* ← 0 13: **else** 14: *k* ← *k* + 1 15: **end if** 16: **if** *k* == *p* **then** 17: Break 18: **end if** 19: *i* ← *i* + 1 20: **end while** 21: Calculate the continuation value Φ*t X* (*j*) for all *j* 22: **for** *j* from 1 to *M* **do** 23: **if** *g S* (*j*) *t* > Φ*t X* (*j*) **then** 24: *V* (*j*) <sup>←</sup> *<sup>g</sup> S* (*j*) *t* 25: **end if** 26: **end for** 27: **end if** 28:**end for** 29:**return** mean of *e* −*r*δ*t* · *V*

Note This study does not use the selection technique, which performs regression using only the in-the-money paths proposed by Longstaff and Schwartz (2001), for the purpose of constructing a versatile algorithm.

It is noted that one cycle of training with the complete training data is known as an epoch and is repeated for learning purposes for each continuation value function in Algorithm 1. In general, the larger the number of epochs, the better learning of the training data. However, a large number of epochs usually requires a long computational time, even with large computer resources, and sometimes leads to overfitting of the training data. To prevent such situations, we introduce an early stopping rule for the learning procedure given a specified integer *p* in Algorithm 1. Under the early stopping rule, the objective function (3) is monitored for improvement, and the number of iterations (i.e., the number of epochs) without improvement (compared with the previous epoch), denoted by *k*, are counted. If this number reaches *p*, the iteration stops and the learning procedure of the continuation value function terminates; otherwise, the iteration continues as long as the

iteration index *i* is less than *Maxiter*, where *Maxiter* is the maximum number of epochs specified at the beginning of Algorithm 1. Note that the introduction of the early stopping rule not only decreases the computational time but also prevents overfitting/underfitting for the MLP. epochs specified at the beginning of Algorithm 1. Note that the introduction of the early stopping rule not only decreases the computational time but also prevents overfitting/underfitting for the MLP. Based on the network configuration of the neural networks, the computational complexity of Algorithm 1 is given by the number of iterations for parameter estimation of the

number of epochs) without improvement (compared with the previous epoch), denoted

as the iteration index is less than ௧, where ௧ is the maximum number of

*J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 8 of 23

Based on the network configuration of the neural networks, the computational complexity of Algorithm 1 is given by the number of iterations for parameter estimation of the MLP. This number of iterations depends on the maximum number of epochs and the number of exercisable dates, *Maxiter* and *N* − 1. Once these values are specified, the maximum number of iterations is *Maxiter* × (*N* − 1), which is the total number of epochs applied in Algorithm 1. In addition, the computational complexity of each epoch in the MLP depends on the network configuration (see Serpen and Gao 2014). MLP. This number of iterations depends on the maximum number of epochs and the number of exercisable dates, ௧ and −1. Once these values are specified, the maximum number of iterations is ௧ × (−1), which is the total number of epochs applied in Algorithm 1. In addition, the computational complexity of each epoch in the MLP depends on the network configuration (see Serpen and Gao 2014). To price the Bermudan commodity options using the continuation functions esti-

To price the Bermudan commodity options using the continuation functions estimated in Algorithm 1, we regenerate different sample paths from those used in the learning procedure for computing the continuation values and Bermudan commodity option prices, given the neural networks in Algorithm 1, i.e., we separate the learning and the valuation procedures, and Algorithm 1 may be applied without learning (i.e., given the estimated neural networks) for the valuation procedure. The merit of this resampling is that it avoids a price bias, which results from overfitting using the same sample paths. Accordingly, this study adopts the following procedure: mated in Algorithm 1, we regenerate different sample paths from those used in the learning procedure for computing the continuation values and Bermudan commodity option prices, given the neural networks in Algorithm 1, i.e., we separate the learning and the valuation procedures, and Algorithm 1 may be applied without learning (i.e., given the estimated neural networks) for the valuation procedure. The merit of this resampling is that it avoids a price bias, which results from overfitting using the same sample paths. Accordingly, this study adopts the following procedure: 1. Generate the sample paths of the underlying assets for the MLP learning of Algo-


In the above, it is key that learning and pricing (i.e., valuation) utilize the different simulation sample paths set in Steps 1 and 3. Figure 2 shows a flowchart of the entire procedure for learning and valuation. In the above, it is key that learning and pricing (i.e., valuation) utilize the different simulation sample paths set in Steps 1 and 3. Figure 2 shows a flowchart of the entire procedure for learning and valuation.

**Figure 2. Figure 2.** Flowchart of the entire procedure for learning Flowchart of the entire procedure for learning and valuation of Bermudan option pricing. and valuation of Bermudan option pricing.

#### **4. Numerical Experiments**

The objective of this section is to execute numerical experiments based on the learning and valuation procedure explained in the previous section and make comparisons of the Bermudan option pricing between the MLP and the benchmark polynomial regression (i.e., the standard (naïve) LSMC method by Longstaff and Schwartz 2001).

#### *4.1. Problem Setting and Preliminary Experiment 4.1. Problem Setting and Preliminary Experiment*

**4. Numerical Experiments** 

*J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 9 of 23

(i.e., the standard (naïve) LSMC method by Longstaff and Schwartz 2001).

Herein, we consider Bermudan commodity options with early-exercisable dates in discretized periods until maturity *T*, i.e., 0 = *T*<sup>0</sup> < *T*<sup>1</sup> < . . . < *T<sup>N</sup>* = *T*, the payoffs of which are given by *g*(*St*) when exercised. We define several settings for different dimensions of Bermudan options, *d<sup>l</sup>* (i.e., the number of state variables in (5)), exercisable dates, and payoff functions. We also introduce a constant volatility model as a one-dimensional problem to perform a preliminary experiment. Herein, we consider Bermudan commodity options with early-exercisable dates in discretized periods until maturity , i.e., 0= < ଵ <⋯<ே = , the payoffs of which are given by (௧) when exercised. We define several settings for different dimensions of Bermudan options, (i.e., the number of state variables in (5)), exercisable dates, and payoff functions. We also introduce a constant volatility model as a one-dimensional problem to perform a preliminary experiment.

The objective of this section is to execute numerical experiments based on the learning and valuation procedure explained in the previous section and make comparisons of the Bermudan option pricing between the MLP and the benchmark polynomial regression

For the exercisable dates of the Bermudan commodity options, we consider two cases as depicted in Figure 3. One is a two-period problem, wherein the Bermudan commodity option is issued at time *T*<sup>0</sup> and can be exercised at *T*<sup>1</sup> and maturity *T*2. The other is a case with multiple exercisable dates, wherein we choose ten exercisable timings before maturity. In both cases, the options can be exercised after a half-year period to compare the continuation value surfaces at time *T*<sup>1</sup> between different methodologies, and the values of the options are evaluated at the initial time period, *T*0. For the exercisable dates of the Bermudan commodity options, we consider two cases as depicted in Figure 3. One is a two-period problem, wherein the Bermudan commodity option is issued at time and can be exercised at ଵ and maturity ଶ. The other is a case with multiple exercisable dates, wherein we choose ten exercisable timings before maturity. In both cases, the options can be exercised after a half-year period to compare the continuation value surfaces at time ଵ between different methodologies, and the values of the options are evaluated at the initial time period, .

**Figure 3.** Bermudan option schedules with exercisable dates of 2 and 11 times. **Figure 3.** Bermudan option schedules with exercisable dates of 2 and 11 times.

Moreover, the payoff functions for Bermudan basket put options are given as Moreover, the payoff functions for Bermudan basket put options are given as

$$\log(S\_{l}) = \max\left(K - \frac{1}{l} \sum\_{i} S\_{i,t}, 0\right). \tag{9}$$

Then, an upper limit price is introduced to the payoff function as Then, an upper limit price *D* is introduced to the payoff function as

$$\log(\mathcal{S}\_l) = \min\left(\max\left(\mathcal{K} - \frac{1}{l}\sum\_{i} \mathcal{S}\_{i,l'} \mathbf{0}\right), D\right) \tag{10}$$

for Bermudan capped put options. Note that the upper limit, , provides an additional complexity of the payoff functions. The parameters of the underlying assets and neural networks are set as shown in for Bermudan capped put options. Note that the upper limit, *D*, provides an additional complexity of the payoff functions.

Table 1 and Table 2 below. The parameters of the underlying assets and neural networks are set as shown in Tables 1 and 2 below.

#### **Table 1.** Parameters of the underlying asset model with constant and stochastic volatility. **Parameter Constant Vol. (** = **) Stochastic Vol. (** ≥ **) 1** *4.2. Low-Dimensional Case: Single-Asset Bermudan Options with Stochastic Volatility and Constant Volatility*

Spot rate () 100.0 Strike rate () 105.0 Capped rate () 10.0 Time to maturity () [years] 1.0 Risk-free interest rate () [%] 6.0 Initial volatility ( ,) [%] 30.0 Long-run mean () 4.8 We begin with the simplest valuation based on Schwartz's (1997) single-asset and constant volatility model in, compared with the standard LSMC method using polynomial regression and the finite difference method (FDM) detailed by Tavella and Randall (2000). In the MLP, we applied Algorithm 1 for learning neural networks with Monte Carlo simulations and then repeated the valuation procedure (using Algorithm 1 without learning) 100 times. The MLP in this experiment contains two hidden layers with sixty-four neurons of sigmoid activate functions. In the standard LSMC method, we use a quintic function for one-dimensional problems (i.e., *d<sup>l</sup>* = 1) and a multi-dimensional quadratic function for two or more higher-dimensional problems (i.e., *d<sup>l</sup>* ≥ 2) and apply the same procedure (i.e., the learning and valuation procedure). The FDM is based on the Crank–Nicolson scheme with discretized 2000/200/50 grids in the time/asset/volatility directions.


**Table 1.** Parameters of the underlying asset model with constant and stochastic volatility.

<sup>1</sup> We applied Euler's method as a discretized method.

**Table 2.** Neural networks' learning parameters.


<sup>1</sup> Learning optimizer Adam (Kingma and Ba 2014) hyperparameters are set to 0.01 for the learning rate, 0.9 for beta1, 0.999 for beta2, and 1 <sup>×</sup> <sup>10</sup>−<sup>7</sup> for epsilon; training is completed when the loss does not improve even after 20 epochs, as early stopping. Randomized 20% of input paths are used in evaluations to avoid over-learning.

Table 3 compares the means and standard deviations of the option prices obtained with the MLP and the standard LSMC methods. Considering the option price of the FDM as a proxy for the value of the Bermudan commodity option price, we see that both the MLP and the standard LSMC method almost achieve the Bermudan option price value, i.e., the gap between the three prices is sufficiently small for this one-dimensional problem. We implemented Algorithm 1 using the MLP and the standard LSMC on Python, using a machine learning package based on TensorFlow (Abadi et al. 2015) and the PolynomialFeatures toolbox of the scikit-learn library (Pedregosa et al. 2011). All our numerical experiments were run using Google Colaboratory (Google 2022) with 36 GB of RAM and a dual-core CPU of 2.3 GHz.

**Table 3.** Bermudan (capped) put option prices in single-asset constant volatility (*d<sup>l</sup>* = 1).


note FDM = finite difference method; LSMC = least-squares Monte Carlo; MLP = multi-layer perceptron.

tron.

When considering a single-asset stochastic model, the corresponding Bermudan commodity option problem becomes two-dimensional, i.e., *d<sup>l</sup>* = 2, we can observe a slight difference between the MLP and the LSMC methods, compared with the approximate solution of the FDM, as shown in Table 4. First, we see that there is no significant difference between the 3 cases for the Bermudan put option with exercisable dates *N* = 2. However, the gap between the LSMC and FDM prices becomes slightly wider, compared with that between the MLP and FDM prices with exercisable dates *N* = 11. For the Bermudan capped put option, there is a slightly larger difference vis-à-vis the FDM price for both the MLP and LSMC prices, whereas a slight improvement was achieved by using the MLP in terms of the gap from the FDM price, as illustrated in the box plots in Figure 4. When considering a single-asset stochastic model, the corresponding Bermudan commodity option problem becomes two-dimensional, i.e., = 2, we can observe a slight difference between the MLP and the LSMC methods, compared with the approximate solution of the FDM, as shown in Table 4. First, we see that there is no significant difference between the 3 cases for the Bermudan put option with exercisable dates =2. However, the gap between the LSMC and FDM prices becomes slightly wider, compared with that between the MLP and FDM prices with exercisable dates = 11. For the Bermudan capped put option, there is a slightly larger difference vis-à-vis the FDM price for both the MLP and LSMC prices, whereas a slight improvement was achieved by using the MLP in terms of the gap from the FDM price, as illustrated in the box plots in Figure 4.

**Bermudan Put Option ( = 1) Bermudan Capped Put Option ( = 1) # of Ex. N = 2 N = 11 # of Ex. N = 2 N = 11** 

 (St. dev.) (0.071) (0.069) (St. dev.) (0.027) (0.029) MLP price Mean 11.471 11.812 MLP price Mean 5.729 6.320 (St. dev.) (0.069) (0.061) (St. dev.) (0.027) (0.031) FDM price 11.415 11.808 FDM price 5.752 6.350 note FDM = finite difference method; LSMC = least-squares Monte Carlo; MLP = multi-layer percep-

LSMC price Mean 11.474 11.786 LSMC price Mean 5.731 6.244

**Table 4.** Bermudan (capped) put option prices in single-asset stochastic volatility (*d<sup>l</sup>* = 2). **Table 4.** Bermudan (capped) put option prices in single-asset stochastic volatility ( = 2).


*J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 11 of 23

**Figure 4.** The box plot of option prices with a single asset with stochastic volatility. It has two-dimensional state variables. The conditions were put and capped put payoff, as well as changing exercisable dates via the method. In the box plot, the center of the box in the first and third quartiles is a median (line), while the beards are the maximum and the minimum values. **Figure 4.** The box plot of option prices with a single asset with stochastic volatility. It has twodimensional state variables. The conditions were put and capped put payoff, as well as changing exercisable dates *N* via the method. In the box plot, the center of the box in the first and third quartiles is a median (line), while the beards are the maximum and the minimum values.

#### *4.3. Higher-Dimensional Case: Multi-Asset Bermudan Options with Stochastic Volatility 4.3. Higher-Dimensional Case: Multi-Asset Bermudan Options with Stochastic Volatility*

In the case of higher-dimensional Bermudan commodity options with multi-asset stochastic volatility (e.g., two asset problems with = 4), it is difficult (or unrealistic) to obtain an approximate Bermudan option price with high accuracy using the FDM. Thus, we compare option prices obtained with the MLP with the benchmark polynomials (i.e., the standard LSMC method) only, where we set = 10, i.e., five-asset stochastic volatility, in the numerical experiments. Then, we will discuss the source of the difference in view of the continuation value functions for both methods in the next section. In the case of higher-dimensional Bermudan commodity options with multi-asset stochastic volatility (e.g., two asset problems with *d<sup>l</sup>* = 4), it is difficult (or unrealistic) to obtain an approximate Bermudan option price with high accuracy using the FDM. Thus, we compare option prices obtained with the MLP with the benchmark polynomials (i.e., the standard LSMC method) only, where we set *d<sup>l</sup>* = 10, i.e., five-asset stochastic volatility, in the numerical experiments. Then, we will discuss the source of the difference in view of the continuation value functions for both methods in the next section. Additionally, we will compare the accuracy of estimated continuation values by considering a two-asset problem with *d<sup>l</sup>* = 4 and two exercisable dates *N* = 2.

Table 5 shows our numerical results, which compare the mean and standard deviation between the LSMC and the MLP prices for *N* = 2 and *N* = 11, obtained via the learning and valuation procedure described in Section 3. In the case of the Bermudan put option for *N* = 2, we see that there is no significant difference between the two methods, as with the case with a low-dimensional problem, *d<sup>l</sup>* = 2. However, the gap between the two increased for the Bermudan capped put options and the case with *N* = 11, as shown in the box plots in Figure 5. In other words, we see that the differences between the MLP and the LSMC are emphasized by introducing additional complexity to the payoff function or by increasing exercisable dates.


tion or by increasing exercisable dates.

*J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 12 of 23

**Table 5.** Bermudan (capped) put option prices in the five-asset (*d<sup>l</sup>* = 10) stochastic volatility model. **Table 5.** Bermudan (capped) put option prices in the five-asset ( = 10) stochastic volatility model.

Additionally, we will compare the accuracy of estimated continuation values by consid-

Table 5 shows our numerical results, which compare the mean and standard deviation between the LSMC and the MLP prices for =2 and = 11, obtained via the learning and valuation procedure described in Section 3. In the case of the Bermudan put option for =2, we see that there is no significant difference between the two methods, as with the case with a low-dimensional problem, = 2. However, the gap between the two increased for the Bermudan capped put options and the case with = 11, as shown in the box plots in Figure 5. In other words, we see that the differences between the MLP and the LSMC are emphasized by introducing additional complexity to the payoff func-

ering a two-asset problem with = 4 and two exercisable dates =2.

**Figure 5.** Box plot of option prices' distribution with five-asset stochastic volatility. It has ten-dimensional state variables. The conditions were put and capped-put payoff, as well as changing exercisable dates via the method. In the box plot, the circles are outliers. **Figure 5.** Box plot of option prices' distribution with five-asset stochastic volatility. It has tendimensional state variables. The conditions were put and capped-put payoff, as well as changing exercisable dates *N* via the method. In the box plot, the circles are outliers.

Additionally, we demonstrated the same numerical experiments above but increased the number of assets in the stochastic volatility model to 10 and 20 assets, respectively (i.e., = 20 and = 40) and compared the estimated Bermudan commodity option prices between the MLP and the LSMC. To observe the effects of increasing the number of exercisable dates more clearly, we added =6 between =2 and = 11 to obtain the estimation values shown in Tables 6 and 7 (corresponding to the cases of = 20 and = 40, respectively). Table 6 shows the means, the standard deviations, and the gaps between the estimated prices for the Bermudan put options and Bermudan capped put options with = 20, and Table 7 shows those with = 40. Additionally, we demonstrated the same numerical experiments above but increased the number of assets in the stochastic volatility model to 10 and 20 assets, respectively (i.e., *d<sup>l</sup>* = 20 and *d<sup>l</sup>* = 40) and compared the estimated Bermudan commodity option prices between the MLP and the LSMC. To observe the effects of increasing the number of exercisable dates more clearly, we added *N* = 6 between *N* = 2 and *N* = 11 to obtain the estimation values shown in Tables 6 and 7 (corresponding to the cases of *d<sup>l</sup>* = 20 and *d<sup>l</sup>* = 40, respectively). Table 6 shows the means, the standard deviations, and the gaps between the estimated prices for the Bermudan put options and Bermudan capped put options with *d<sup>l</sup>* = 20, and Table 7 shows those with *d<sup>l</sup>* = 40.

**Table 6.** Bermudan (capped) put option prices in the 10-asset ( = 20) stochastic volatility model. **Table 6.** Bermudan (capped) put option prices in the 10-asset (*d<sup>l</sup>* = 20) stochastic volatility model.


**Table 7.** Bermudan (capped) put option prices in the 20-asset (*d<sup>l</sup>* = 40) stochastic volatility model.


(MLP— LSMC)

Difference (MLP—

> Similar to the previous cases, the gap between the MLP and LMSC increases for the Bermudan put prices given a larger number of exercisable dates but decreases for the Bermudan capped put options when *N* = 11 for both *d<sup>l</sup>* = 20 and *d<sup>l</sup>* = 40. It is possible that the choice of regression function has a weaker effect for higher dimensional Bermudan capped put options with a larger number of exercisable dates, i.e., the continuation value functions of the capped put options may become flatter or smoother when the number of exercisable dates increases and can be fitted with less sophisticated functions. This phenomenon should be investigated in more detail in a future study. Bermudan put prices given a larger number of exercisable dates but decreases for the Bermudan capped put options when = 11 for both = 20 and = 40. It is possible that the choice of regression function has a weaker effect for higher dimensional Bermudan capped put options with a larger number of exercisable dates, i.e., the continuation value functions of the capped put options may become flatter or smoother when the number of exercisable dates increases and can be fitted with less sophisticated functions. This phenomenon should be investigated in more detail in a future study. *4.4. Comparison of Continuation Value Surfaces*

Similar to the previous cases, the gap between the MLP and LMSC increases for the

Mean 0.020 0.022 0.005

#### *4.4. Comparison of Continuation Value Surfaces* In the previous subsection, we observed that there are some differences between the

Mean 0.031 0.111 0.147

*J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 13 of 23

(MLP— LSMC)

**Bermudan Put Option (** = **) Bermudan Capped Put Option (** = **)** # of Ex. N = 2 N = 6 N = 11 # of Ex. N = 2 N = 6 N = 11

(St. dev.) (0.112) (0.111) (0.111) (St. dev.) (0.048) (0.042) (0.037)

(St. dev.) (0.111) (0.105) (0.104) (St. dev.) (0.048) (0.040) (0.037)

Difference (MLP— LSMC)

LSMC price Mean 9.338 9.4855 9.453 LSMC price Mean 5.362 5.728 5.828

MLP price Mean 9.369 9.596 9.600 MLP price Mean 5.382 5.750 5.833

**Table 7.** Bermudan (capped) put option prices in the 20-asset ( = 40) stochastic volatility model.

In the previous subsection, we observed that there are some differences between the MLP and the LSMC regarding the estimated prices and that these differences were more notable for Bermudan capped put options and/or increased exercisable dates. Herein, we discuss the possible reason for this price difference by visualizing the continuation value surfaces for both the MLP and the LSMC methods. For visualization purposes, we consider single-asset Bermudan capped put options with stochastic volatility and constant volatility, i.e., the low-dimensional cases with *d<sup>l</sup>* = 1 and *d<sup>l</sup>* = 2 introduced in Section 4.2. MLP and the LSMC regarding the estimated prices and that these differences were more notable for Bermudan capped put options and/or increased exercisable dates. Herein, we discuss the possible reason for this price difference by visualizing the continuation value surfaces for both the MLP and the LSMC methods. For visualization purposes, we consider single-asset Bermudan capped put options with stochastic volatility and constant volatility, i.e., the low-dimensional cases with = 1 and = 2 introduced in Section 4.2.

The left-hand plot of Figure 6 illustrates the continuation value function estimated at *T*<sup>1</sup> in the one-dimensional Bermudan capped-put option problem (corresponding to the single-asset constant volatility model) using the LSMC, whereas the right-hand plot shows the problem using the MLP. We first observe that the continuation value function of the MLP monotonically decreases with the underlying price, whereas the one obtained from the LSMC method is a nonmonotonic function. Since the payoff function of the Bermudan capped put option is piecewise linear and it monotonically decreases with the underlying price, the monotonicity of the continuation value function is more consistent with the payoff structure of the Bermudan capped-put option. In this sense, we see that the continuation value function of the MLP reflects the monotonic property more appropriately than that of the LSMC method in the one-dimensional single-asset problem. The left-hand plot of Figure 6 illustrates the continuation value function estimated at ଵ in the one-dimensional Bermudan capped-put option problem (corresponding to the single-asset constant volatility model) using the LSMC, whereas the right-hand plot shows the problem using the MLP. We first observe that the continuation value function of the MLP monotonically decreases with the underlying price, whereas the one obtained from the LSMC method is a nonmonotonic function. Since the payoff function of the Bermudan capped put option is piecewise linear and it monotonically decreases with the underlying price, the monotonicity of the continuation value function is more consistent with the payoff structure of the Bermudan capped-put option. In this sense, we see that the continuation value function of the MLP reflects the monotonic property more appropriately than that of the LSMC method in the one-dimensional single-asset problem.

**Figure 6.** Continuation value functions (red lines) for the one-dimensional problem (with constant volatility) in the Bermudan capped option with exercisable dates *N* = 2. The blue dots represent the generated sample points of Bermudan option payoffs at expiration *T*2.

For the two-dimensional case with single-asset stochastic volatility for Bermudan capped put options, the continuation value functions have three-dimensional surfaces, as shown in Figure 7, wherein the left-hand and right-hand plots depict the continuation values with respect to volatility and underlying asset price directions for the LSMC method and the MLP, respectively. As in the one-dimensional problem, the payoff function is piecewise linear with respect to the underlying asset price direction and is flat when the underlying asset price exceeds the strike price *K* = 105 or is less than a certain value related to the capped rate *D* = 10. Since continuation value functions are supposed to approximate the payoff function at the maturity *T*2, given the information up to time *T*1, their surfaces are expected to have similar shapes, i.e., continuation values are approximately zero or close to the capped rate for larger or smaller values of the underlyings, respectively. In view of this payoff structure for the Bermudan capped put options, the continuation value

*J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 14 of 23

the generated sample points of Bermudan option payoffs at expiration ଶ.

surface of the MLP seems to approximate the payoff function more accurately than that of the LSMC. continuation value surface of the MLP seems to approximate the payoff function more accurately than that of the LSMC.

**Figure 6.** Continuation value functions (red lines) for the one-dimensional problem (with constant volatility) in the Bermudan capped option with exercisable dates =2. The blue dots represent

For the two-dimensional case with single-asset stochastic volatility for Bermudan capped put options, the continuation value functions have three-dimensional surfaces, as shown in Figure 7, wherein the left-hand and right-hand plots depict the continuation values with respect to volatility and underlying asset price directions for the LSMC method and the MLP, respectively. As in the one-dimensional problem, the payoff function is piecewise linear with respect to the underlying asset price direction and is flat when the underlying asset price exceeds the strike price = 105 or is less than a certain value related to the capped rate = 10. Since continuation value functions are supposed to approximate the payoff function at the maturity ଶ, given the information up to time ଵ, their surfaces are expected to have similar shapes, i.e., continuation values are approximately zero or close to the capped rate for larger or smaller values of the underlyings, respectively. In view of this payoff structure for the Bermudan capped put options, the

**Figure 7.** Continuation value surfaces at ଵ of the two-dimensional problem (single-asset stochastic volatility) for Bermudan capped put options. **Figure 7.** Continuation value surfaces at *T*<sup>1</sup> of the two-dimensional problem (single-asset stochastic volatility) for Bermudan capped put options.

**Remark 1.** *In general, the visualization of nonparametric methods provides an intuitive interpretation of the estimated functions. We have observed that "the continuation value function of the MLP monotonically decreases with the underlying price, whereas the one obtained from the LSMC method is a non-monotonic function," and that "since the payoff function of the Bermudan cappedput option is piecewise linear and it monotonically decreases with the underlying price, the monotonicity of the continuation value function is more consistent with the payoff structure of the Bermudan capped-put option," as stated earlier in this section. Such a visualization helps in understanding the valuation structure for the applied method in the middle of the process for Bermudan option pricing, but the effect of the approximation error may be weakened in the total procedure. However, we should be able to understand the approximation errors of estimated continuation value functions intuitively in the middle of the process from such a visualization.*  **Remark 1.** *In general, the visualization of nonparametric methods provides an intuitive interpretation of the estimated functions. We have observed that "the continuation value function of the MLP monotonically decreases with the underlying price, whereas the one obtained from the LSMC method is a non-monotonic function," and that "since the payoff function of the Bermudan capped-put option is piecewise linear and it monotonically decreases with the underlying price, the monotonicity of the continuation value function is more consistent with the payoff structure of the Bermudan capped-put option," as stated earlier in this section. Such a visualization helps in understanding the valuation structure for the applied method in the middle of the process for Bermudan option pricing, but the effect of the approximation error may be weakened in the total procedure. However, we should be able to understand the approximation errors of estimated continuation value functions intuitively in the middle of the process from such a visualization.*

#### *4.5. Comparison of Accuracy in Continuation Values 4.5. Comparison of Accuracy in Continuation Values*

In the previous subsection, we observed that the continuation value surfaces of the MLP may approximate the payoff functions more accurately than those of the LSMC by visualizing the continuation value surfaces. To further investigate the estimated continuation value surfaces in higher-dimension problems, we next measure the accuracy of the continuation values using a four-dimensional problem of the Bermudan capped put basket option with two exercisable dates (i.e., = 4 and =2) for both the MLP and the In the previous subsection, we observed that the continuation value surfaces of the MLP may approximate the payoff functions more accurately than those of the LSMC by visualizing the continuation value surfaces. To further investigate the estimated continuation value surfaces in higher-dimension problems, we next measure the accuracy of the continuation values using a four-dimensional problem of the Bermudan capped put basket option with two exercisable dates (i.e., *d<sup>l</sup>* = 4 and *N* = 2) for both the MLP and the LSMC method. *J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 15 of 23 Consider a problem of estimating the continuation values at ଵ, as shown in Figure

LSMC method. Consider a problem of estimating the continuation values at *T*1, as shown in Figure 8. Since Bermudan options with exercisable dates *N* = 2 become simple European options if not exercised at *T*1, the continuation values at *T*<sup>1</sup> may be estimated via European option prices expiring at *T*2, given the state variables at *T*1, i.e., *XT*<sup>1</sup> = - *S*1,*T*<sup>1</sup> , *σ*1,*T*<sup>1</sup> , *S*2,*T*<sup>1</sup> , *σ*2,*T*<sup>2</sup> > . Therefore, we can calculate the accuracy of the estimated continuation values at *T*<sup>1</sup> by measuring the differences between the estimated continuation values and the European option prices at *T*<sup>1</sup> by specifying different state variables as input values of the European options. 8. Since Bermudan options with exercisable dates =2 become simple European options if not exercised at ଵ, the continuation values at ଵ may be estimated via European option prices expiring at ଶ , given the state variables at ଵ , i.e., ்భ = ൣଵ,்భ, ଵ,்భ, ଶ,்భ, ଶ,்మ൧ ୃ . Therefore, we can calculate the accuracy of the estimated continuation values at ଵ by measuring the differences between the estimated continuation values and the European option prices at ଵ by specifying different state variables as input values of the European options.

We first solved the Bermudan option problem with = 4 and =2 using the same parameter settings as those in the previous subsections by applying the MLP and

state variables specified in Table 8. Note that Table 8 defines the discretized domain of the state variables, wherein each state variable is discretized in the interval between the minimum and maximum values so that the number of grid points in 1 dimension becomes 2ସ + 1. Then, the total number of grid points is given by 83,521 (= 17ସ). Similarly, we applied the standard Monte Carlo simulation to compute the European option price on each grid and repeated this procedure 83,521 times to estimate the surface of the European option prices. This surface of the European option prices may be considered to provide an approximation of the theoretical continuation value (see notes in Table 8), and we can measure the accuracy of the continuation values using the difference between the estimated continuation values with the MLP and the LSMC methods and the simulation-

**Table 8.** State variables at ଵ. Each state variable is equally discretized into the number of points

**Variable Minimum Maximum # of Points 1 Interval**  ଵ,்భ 25.0 175.0 2ସ + 1 9.375 ଵ,்భ 0.05 0.55 2ସ + 1 0.03125 ଶ,்భ 25.0 175.0 2ସ + 1 9.375 ଶ,்భ 0.05 0.55 2ସ + 1 0.03125 1 The number of grid points with state variables at ଵ are 83,521 (= 17ସ) in total. The number of sample paths generated for evaluating each European option in the standard Monte Carlo simulation is 10,000. The average values of estimated European prices and standard errors are 4.997 and 0.027, respectively, indicating that the 95% confidence interval is given by 4.997 േ 0.053 on aver-

Additionally, we can change the number of hidden layers/neurons and the type of activation function in the MLP to verify their effects on its accuracy. In this study, we evaluate the size of accuracy in terms of the following normalized root-mean-squared er-

> ̂ ௫ − ̂

ඥ∑ ( − ෝప) ூ <sup>ଶ</sup> ୀଵ

, (11)

1 √

=

**Figure 8.** Continuation value at ଵ with two exercisable dates. **Figure 8.** Continuation value at *T*<sup>1</sup> with two exercisable dates.

based (theoretical) surface.

from minimum up to maximum value.

age for the Monte Carlo simulations.

ror (NRMSE) for each methodology:

We first solved the Bermudan option problem with *d<sup>l</sup>* = 4 and *N* = 2 using the same parameter settings as those in the previous subsections by applying the MLP and the LSMC methods; subsequently, we calculated the continuation values at *T*1, given the state variables specified in Table 8. Note that Table 8 defines the discretized domain of the state variables, wherein each state variable is discretized in the interval between the minimum and maximum values so that the number of grid points in 1 dimension becomes 2 <sup>4</sup> + 1. Then, the total number of grid points is given by 83,521 (= 17<sup>4</sup> ). Similarly, we applied the standard Monte Carlo simulation to compute the European option price on each grid and repeated this procedure 83,521 times to estimate the surface of the European option prices. This surface of the European option prices may be considered to provide an approximation of the theoretical continuation value (see notes in Table 8), and we can measure the accuracy of the continuation values using the difference between the estimated continuation values with the MLP and the LSMC methods and the simulation-based (theoretical) surface.

**Table 8.** State variables at *T*<sup>1</sup> . Each state variable is equally discretized into the number of points from minimum up to maximum value.


<sup>1</sup> The number of grid points with state variables at *T*<sup>1</sup> are 83,521 (= 17<sup>4</sup> ) in total. The number of sample paths generated for evaluating each European option in the standard Monte Carlo simulation is 10,000. The average values of estimated European prices and standard errors are 4.997 and 0.027, respectively, indicating that the 95% confidence interval is given by 4.997 ± 0.053 on average for the Monte Carlo simulations.

Additionally, we can change the number of hidden layers/neurons and the type of activation function in the MLP to verify their effects on its accuracy. In this study, we evaluate the size of accuracy in terms of the following normalized root-mean-squared error (NRMSE) for each methodology:

$$NRMSE = \frac{1}{\sqrt{I}} \frac{\sqrt{\sum\_{i=1}^{I} (p\_i - \mathfrak{p}\_i)^2}}{\mathfrak{p}\_{\max} - \mathfrak{p}\_{\min}},\tag{11}$$

where *I* is the total number of grid points (i.e., *I* = 83, 521), and *p<sup>i</sup>* and *p*ˆ*<sup>i</sup>* are the *i th* continuation value and the corresponding European option price on the same grid point. In Equation (11), the root-mean-squared error is normalized by the difference between *p*ˆ*min* and *p*ˆ*max*, which are the minimum and maximum values of European option prices over the entire grid.

We computed the NRMSEs for different settings of neural networks in the case of the MLP, as shown in Table 9, wherein we changed the number of hidden layers/neurons and applied two types of activation functions, i.e., the ReLU and sigmoid functions. Note that the NRMSE with the LSMC method is also computed, as shown in the bottom row of the table, while Figure 9 compares the same NRMSE with respect to a different number of neurons for 16, 32, and 128 using bar graphs. In Table 9, we first observe that the MLP almost always provided better accuracy in terms of NRMSEs compared with the LSMC. Second, when comparing the types of activation functions, the MLP with the sigmoid function was always better than the MLP with the ReLU function for estimating continuation value surfaces when the number of hidden layers/neurons was fixed. This may be explained by the smoothness of the sigmoid function; the continuation value functions are expected to be smooth with respect to the state variables and can be fitted via smooth functions (e.g., the sigmoid function) better than non-smooth functions, such as the ReLU function. Third, an increase in the number of hidden layers is effective for a few hidden layers but does not necessarily improve the NRMSE when the number of hidden ̂

and ̂

over the entire grid.

layers is three or larger for both MLPs with the ReLU and sigmoid functions. However, in any case, we obtained better NRMSEs by using the MLP with the sigmoid function. layers but does not necessarily improve the NRMSE when the number of hidden layers is three or larger for both MLPs with the ReLU and sigmoid functions. However, in any case, we obtained better NRMSEs by using the MLP with the sigmoid function.

where is the total number of grid points (i.e., = 83,521), and and ෝప are the

continuation value and the corresponding European option price on the same grid point. In Equation (11), the root-mean-squared error is normalized by the difference between

௫, which are the minimum and maximum values of European option prices

We computed the NRMSEs for different settings of neural networks in the case of the MLP, as shown in Table 9, wherein we changed the number of hidden layers/neurons and applied two types of activation functions, i.e., the ReLU and sigmoid functions. Note that the NRMSE with the LSMC method is also computed, as shown in the bottom row of the table, while Figure 9 compares the same NRMSE with respect to a different number of neurons for 16, 32, and 128 using bar graphs. In Table 9, we first observe that the MLP almost always provided better accuracy in terms of NRMSEs compared with the LSMC. Second, when comparing the types of activation functions, the MLP with the sigmoid function was always better than the MLP with the ReLU function for estimating continuation value surfaces when the number of hidden layers/neurons was fixed. This may be explained by the smoothness of the sigmoid function; the continuation value functions are expected to be smooth with respect to the state variables and can be fitted via smooth functions (e.g., the sigmoid function) better than non-smooth functions, such as the ReLU function. Third, an increase in the number of hidden layers is effective for a few hidden

௧-

**Table 9.** NRMSE comparisons with continuation values at *T*<sup>1</sup> by network settings of the MLP. It differs by activation functions and the number of hidden layers/neurons. **Table 9.** NRMSE comparisons with continuation values at ଵ by network settings of the MLP. It differs by activation functions and the number of hidden layers/neurons.


*J. Risk Financial Manag.* **2023**, *16*, x FOR PEER REVIEW 16 of 23

**Figure 9.** NRMSE comparison with continuation values at *T*1. MLP differences in a number of hidden layers, neurons, and selected activation functions. **Figure 9.** NRMSE comparison with continuation values at *T*<sup>1</sup> . MLP differences in a number of hidden layers, neurons, and selected activation functions.

#### **5. Discussion of Robustness and Computational Costs 5. Discussion of Robustness and Computational Costs**

In this section, we discuss the robustness and computational costs of our experiment for the pricing algorithm of multi-asset Bermudan commodity options using the MLP. In this section, we discuss the robustness and computational costs of our experiment for the pricing algorithm of multi-asset Bermudan commodity options using the MLP.

First, we discuss the learning adaptability of the MLP applied to our multi-asset Bermudan commodity option problems with stochastic volatility. Figure 10 depicts changes in the mean and standard deviation of learning rates with respect to the number of epochs in the MLP. In this figure, we see that training and validation losses remain close, which indicates no overfitting. Furthermore, the learning rate decreases rapidly until the number of epochs is 10 and stays at sufficiently good levels thereafter.

Next, we estimated the computational costs of the learning and valuation procedure when the problem dimension is increased. Table 10 compares the computational costs with respect to the dimensions of *d<sup>l</sup>* = 2, 4, 8, 16, 32, 64, wherein the same numerical experiment as that of the previous section was repeated 100 times and computed the mean and standard deviation of the computational time for each algorithm. Furthermore, the average numbers of epochs in learning are also computed in the MLP. Although the LSMC generally performs much better in terms of computational costs when the dimension is particularly low, its computational time grows exponentially in tandem with the size of the dimension. This is because when using a polynomial regression in the LSMC, the number of terms in the polynomial function increases combinatorially with the number of variables, even though its maximum order is fixed.

of epochs is 10 and stays at sufficiently good levels thereafter.

**Figure 10.** Learning rate by epochs. The learning of the MLP was repeated 100 times and computed the mean (line) and standard deviation (bar) of learning losses in the Bermudan capped put option with the exercise dates = 2 and two underlying assets with stochastic volatility (i.e., = 4). **Figure 10.** Learning rate by epochs. The learning of the MLP was repeated 100 times and computed the mean (line) and standard deviation (bar) of learning losses in the Bermudan capped put option with the exercise dates *N* = 2 and two underlying assets with stochastic volatility (i.e., *d<sup>n</sup>* = 4).

First, we discuss the learning adaptability of the MLP applied to our multi-asset Bermudan commodity option problems with stochastic volatility. Figure 10 depicts changes in the mean and standard deviation of learning rates with respect to the number of epochs in the MLP. In this figure, we see that training and validation losses remain close, which indicates no overfitting. Furthermore, the learning rate decreases rapidly until the number

Next, we estimated the computational costs of the learning and valuation procedure when the problem dimension is increased. Table 10 compares the computational costs with respect to the dimensions of = 2, 4, 8, 16, 32,6 4, wherein the same numerical experiment as that of the previous section was repeated 100 times and computed the mean **Table 10.** Computational cost comparisons. Each algorithm's computational time periods (seconds) and their statistics by dimensions were calculated 100 times in the same option condition, as shown in Figure 10. In the MLP, the average numbers of epochs in learning are also listed.


in learning. The computational cost of learning depends on how often the networks are updated during training, but the computation cost per one cycle of training data (i.e., epoch) remains the same when the size of the network is fixed. In the numerical experi-<sup>1</sup> LSMC with a multi-dimensional quadratic function. <sup>2</sup> MLP with 2 hidden layers with 64 neurons of sigmoid activation functions.

ments, we applied two hidden layers with sixty-four neurons using sigmoid activation functions, whereby the computational cost per epoch remained almost the same regardless of dimensions; the computational time in learning is determined by the total number of epochs. Although the computational cost per epoch slightly increases as the number of features in the input layer increases by dimension, the average computational time decreases even for large dimensions with the reduction in the average number of epochs due to the early stopping rule. This is the benefit of introducing the early stopping rule in Algorithm 1, which is particularly effective for higher-dimensional problems to avoid unnecessarily increasing training iterations (and overfitting). Note that the average computational time for both learning and valuation became smaller for the MLP than the LSMC when = 64. In contrast with the LSMC, the computational cost in the MLP is mostly unaffected by the size of the dimension but is directly proportional to the average number of epochs in learning. The computational cost of learning depends on how often the networks are updated during training, but the computation cost per one cycle of training data (i.e., epoch) remains the same when the size of the network is fixed. In the numerical experiments, we applied two hidden layers with sixty-four neurons using sigmoid activation functions, whereby the computational cost per epoch remained almost the same regardless of dimensions; the computational time in learning is determined by the total number of epochs. Although the computational cost per epoch slightly increases as the number of features in the input layer increases by dimension, the average computational time decreases even for large dimensions with the reduction in the average number of epochs due to the early stopping rule. This is the benefit of introducing the early stopping rule in Algorithm 1, which is particularly effective for higher-dimensional problems to avoid unnecessarily increasing training iterations (and overfitting). Note that the average computational time for both learning and valuation became smaller for the MLP than the LSMC when *d<sup>l</sup>* = 64.

### **6. Conclusions**

In this study, we detailed the use of a neural network for pricing multi-asset Bermudan option problems with stochastic volatility in commodity markets and illustrated its effectiveness. First, we employed the MLP to estimate continuation values in the multi-dimensional Bermudan commodity option problem, whereby we formulated the multi-asset stochastic

volatility model and generated Monte Carlo simulation sample paths for learning continuation value functions using the MLP. Then, in the applied algorithm, we introduced early stopping into the learning of the MLP to avoid unnecessarily increasing training iterations and overfitting. The early stopping rule was activated by counting the number of epochs without improvement compared with the previous epoch. We also introduced a resampling process, and a valuation procedure was applied for the estimated neural networks by generating a different set of simulation sample paths. We executed numerical experiments to evaluate the accuracy of the continuation values and the initial price of the option using different settings of networks, problem dimension, and exercisable dates, whereby two types of payoff functions for Bermudan commodity options were considered, namely Bermudan put options and Bermudan capped put options. From our numerical analysis, we clarified the following observations:


Essentially, the use of neural networks for option pricing has the advantage of recognizing sizable input features and generating flexible output features in a unified framework. Nevertheless, there are some drawbacks: high computational effort and resources are required for learning networks, especially for exotic options, including the Bermudan commodity options considered in this study. Additionally, it is necessary to frequently re-learn the networks in response to market conditions. However, we observed that the neural network approach using the MLP reached an appropriate level of learning rates at around 10 epochs, even in high-dimensional cases, as illustrated in our numerical experiments. Therefore, this approach is expected to reduce learning costs if network configurations are developed appropriately.

Although this study chose a relatively simple structure of multi-layered networks, there are other types of network structures such as a recursive structure and unsupervised learning, as discussed in various fields, including pattern recognition and time-series prediction. In finance, although some examples of recursive neural networks for time-series analyses exist, to the best of our knowledge, their use for option pricing has not been considered sufficiently. Moreover, it is important to use empirical data and demonstrate the practicability and applications for risk management in actual commodity market businesses. Such further investigation is interesting and could be considered potential topics for future study.

**Author Contributions:** Conceptualization, K.H. and Y.Y.; methodology, K.H. and Y.Y.; software, K.H.; validation, K.H. and Y.Y.; formal analysis, K.H.; investigation, K.H.; resources, K.H. and Y.Y.; data curation, K.H.; writing—original draft preparation, K.H.; writing—review and editing, Y.Y.; visualization, K.H.; supervision, Y.Y.; project administration, Y.Y.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** Grant-in-Aid for Scientific Research (A) 20H00285 and Grant-in-Aid for Challenging Research (Exploratory) 19K22024 from the Japan Society for the Promotion of Science (JSPS).

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This work was supported by the Grant-in-Aid for Scientific Research (A) 20H00285 and Grant-in-Aid for Challenging Research (Exploratory) 19K22024 from the Japan Society for the Promotion of Science (JSPS).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Recursive Formulation for a Bermudan Option Pricing Problem**

Given a complete filtered probability space (Ω, <sup>F</sup>, (F*t*)0≤*t*≤*<sup>T</sup>* , P) with a finite time horizon *T* > 0, we assume that a set of underlying assets is modeled using a multifactored process (*Xt*)0≤*t*≤*<sup>T</sup>* adapted to the filtration, (F*t*)0≤*t*≤*<sup>T</sup>* , and that P is an associated riskneutral measure. We consider a Bermudan option with exercise dates 0 = *T*<sup>0</sup> ≤ *T*<sup>1</sup> < *T*<sup>2</sup> < . . . < *T<sup>N</sup>* = *T* and discrete-time payoff process *PT<sup>n</sup>* if exercised at times (*Tn*)0≤*n*≤*<sup>N</sup>* , where *PT<sup>n</sup>* is specified as a function of *XT<sup>n</sup>* .

In the Bermudan option, the continuation and exercise values are compared at each exercisable period, while the option is exercised if the exercise value is higher. Therefore, Bermudan option value *VT<sup>n</sup>* is computed using the following recursive equation:

$$\begin{cases} V\_{T\_N} = P\_{T\_N} \\ V\_{T\_{n-1}} = \max \left( P\_{T\_{n-1}} e^{-r\delta\_{T\_n}} \mathbb{E} \left[ V\_{T\_n} \middle| \mathcal{F}\_{T\_{n-1}} \right] \right), 1 \le n \le N \end{cases} \tag{A1}$$

In the risk-neutral measure, the conditional expected value of the risk-neutral probability measure *P*e with the risk-free interest rate *r* and the interval between *Tn*−<sup>1</sup> and *T<sup>n</sup>* as δ*T<sup>n</sup>* , based on the filtration <sup>F</sup>*Tn*−<sup>1</sup> up to time *<sup>T</sup>n*−1, indicates the continuation value *<sup>U</sup>Tn*−<sup>1</sup> as

$$\mathcal{U}\_{T\_{n-1}} = e^{-r\delta\_{T\_n}} \widetilde{E}\left[V\_{T\_n} \middle| \mathfrak{F}\_{T\_{n-1}}\right]. \tag{A2}$$

Bermudan option value *<sup>V</sup>Tn*−<sup>1</sup> at time *Tn*−<sup>1</sup> is sequentially calculated backward when the continuation value *<sup>U</sup>Tn*−<sup>1</sup> is identified. The key is estimating the continuation value as a function that consists of underlying multivariate risk factors. Since underlying assets consist of a multi-dimensional Markov process, the continuation value function can be expressed as the multi-dimensional nonlinear function with the Markov process state variables *<sup>X</sup>Tn*−<sup>1</sup> ,

$$
\widetilde{E}\left[V\_{T\_n}\middle|\mathfrak{F}\_{T\_{n-1}}\right] = \widetilde{E}\left[V\_{T\_n}\middle|X\_{T\_{n-1}}\right].\tag{A3}
$$

Furthermore, from the definition of the conditional expectation, there is a measurable function *<sup>h</sup>Tn*−<sup>1</sup> that satisfies the following equation:

$$h\_{T\_{n-1}}(X\_{T\_{n-1}}) = e^{-r\delta\_{T\_n}}\tilde{E}\left[V\_{T\_n} \middle| X\_{T\_{n-1}}\right].\tag{A4}$$

For the approximation of a function *<sup>h</sup>Tn*−<sup>1</sup> *<sup>X</sup>Tn*−<sup>1</sup> , we can consider <sup>Φ</sup>*Tn*−<sup>1</sup> *<sup>x</sup>Tn*−<sup>1</sup> as the approximation function at time *Tn*−1,

$$h\_{T\_{n-1}}(X\_{T\_{n-1}}) \approx \Phi\_{T\_{n-1}}(X\_{T\_{n-1}}).\tag{A5}$$

After that, the price at time *t* = 0 is calculated by following recursive backward procedures using the relationship *VT<sup>n</sup>* and *<sup>V</sup>Tn*−<sup>1</sup> . At maturity *TN*(= *T*), the continuation value of the Bermudan option is *U<sup>T</sup>* ≡ 0. Therefore, the Bermudan option's value at maturity *T<sup>N</sup>* is:

$$V\_{T\_N} = P\_{T\_N}.\tag{A6}$$

By (A4), the continuation value *<sup>U</sup>TN*−<sup>1</sup> at time *TN*−<sup>1</sup> is expressed as

$$\mathcal{U}\_{T\_{N-1}} = h\_{T\_{N-1}}(\mathcal{X}\_{T\_{N-1}}) \approx \Phi\_{T\_{N-1}}(\mathcal{X}\_{T\_{N-1}}).\tag{A7}$$

The Bermudan option value at time *TN*−<sup>1</sup> is expressed as

$$V\_{T\_{N-1}} := \max\big(P\_{T\_{N-1}\prime}, \Phi\_{T\_{N-1}}(X\_{T\_{N-1}})\big). \tag{A8}$$

We can obtain Bermudan option value *VT*<sup>0</sup> by adapting (A7) and (A8) backward, recursively, each time step to *n* = 1. In multi-asset Bermudan option pricing, Monte Carlo simulations are generally used because other numerical methods become exponentially more difficult in higher-dimensional cases. By simulating a large number of paths, we can use the average of the prices obtained from each path as an estimator of the price as

$$\overline{V\_{T\_0}} = \frac{1}{M} \sum\_{j=1}^{M} V\_{T\_0}^{(j)} \, \prime \tag{A9}$$

where *M* is the number of simulated paths.

From the above, the prices of the Bermudan options can be obtained by finding the approximate functions of the continuation value functions at each exercisable period.

#### **Appendix B. Least-Squares Monte Carlo Method**

The least-squares Monte Carlo (LSMC) method, proposed by Longstaff and Schwartz (2001), is a method of early-exercisable option pricing in which regression calculation uses simulation sample paths. In the LSMC method, a polynomial function of the Markov process state variables is applied to identify the continuation values. The following is an algorithm for Bermudan option pricing using the LSMC method.

Step 0. Generate Monte Carlo sample paths of the underlying asset prices and state vari-(*j*)

ables. We denote the underlying asset prices at time *t* in the *j*-th sample path as *S t* and the Markov process state variables as *x* (*j*) *t* . Subsequently, we obtain the series of paths as

Step 1. Calculate the series of Bermudan option values at maturity *TN*(= *T*), as follows:

$$V\_{T\_N} := \left[ \mathcal{g} \left( \mathcal{S}\_{T\_N}^{(1)} \right), \mathcal{g} \left( \mathcal{S}\_{T\_N}^{(2)} \right), \dots, \mathcal{g} \left( \mathcal{S}\_{T\_N}^{(M)} \right) \right]^\top,\tag{A11}$$

where *g* denotes a payoff function of *S<sup>t</sup>* .

Step 2. Find a polynomial function that approximates the continuation values. Herein, we denote a polynomial function as <sup>ˆ</sup>*hTN*−<sup>1</sup> and a measurable function in (A4) as *<sup>h</sup>TN*−<sup>1</sup> . Then,

$$h\_{T\_{N-1}}(\mathbf{x}\_{T\_{N-1}}) \approx \hat{h}\_{T\_{N-1}}(\mathbf{x}\_{T\_{N-1}}).\tag{A12}$$

Additionally, <sup>ˆ</sup>*hT*−<sup>1</sup> is sought to minimize the following equation:

$$\frac{1}{M} \sum\_{i=1}^{M} \left( \hat{h}\_{T\_{N-1}}(\mathbf{x}\_{T\_{N-1}}) - e^{-r\delta\_{T\_N}} V\_{T\_N} \right)^2. \tag{A13}$$

Step 3. Calculate the approximated continuation values. Let *<sup>h</sup>TN*−<sup>1</sup> <sup>≡</sup> <sup>ˆ</sup>*hTN*−<sup>1</sup> and set the series of approximated continuation values at *TN*−<sup>1</sup> as

$$\left[\hat{h}\_{T\_{N-1}}\left(\mathbf{x}\_{T\_{N-1}}^{(1)}\right), \hat{h}\_{T\_{N-1}}\left(\mathbf{x}\_{T\_{N-1}}^{(2)}\right), \dots, \dots, \hat{h}\_{T\_{N-1}}\left(\mathbf{x}\_{T\_{N-1}}^{(M)}\right)\right]^\top. \tag{A14}$$

Step 4. Calculate the exercised values and Bermudan option values. The series of exercise values at *TN*−<sup>1</sup> using underlying asset prices (A10) is

$$\left[\mathcal{g}\left(\mathcal{S}\_{T\_{N-1}}^{(1)}\right), \mathcal{g}\left(\mathcal{S}\_{T\_{N-1}}^{(2)}\right), \dots, \mathcal{g}\left(\mathcal{S}\_{T\_{N-1}}^{(M)}\right)\right]^\top. \tag{A15}$$

Then, the series of Bermudan option values *<sup>V</sup>TN*−<sup>1</sup> at time *TN*−<sup>1</sup> is obtained as

$$\left[V\_{T\_{N-1}}\left(\mathbf{S}\_{T\_{N-1}}^{(1)}\mathbf{x}\_{T\_{N-1}}^{(1)}\right), V\_{T\_{N-1}}\left(\mathbf{S}\_{T\_{N-1}}^{(2)}\mathbf{x}\_{T\_{N-1}}^{(2)}\right), \dots, \dots, V\_{T\_{N-1}}\left(\mathbf{S}\_{T\_{N-1}}^{(M)}\mathbf{x}\_{T\_{N-1}}^{(M)}\right)\right]^{\top},\tag{A16}$$

where

$$V\_{T\_{N-1}}\left(\mathbf{S}\_{T\_{N-1}}^{(j)},\mathbf{x}\_{T\_{N-1}}^{(j)}\right) := \max\left(\mathbf{g}\left(\mathbf{S}\_{T\_{N-1}}^{(j)}\right), h\_{T\_{N-1}}\left(\mathbf{x}\_{T\_{N-1}}^{(j)}\right)\right), j = 1, \dots, M,\tag{A17}$$


Using this approach, we calculated the Bermudan option price using the continuation values from the polynomial function at each exercisable time. The main point of the pricing procedure in the LSMC method is that the polynomial function is defined to approximate the continuation values at each exercisable time point.

#### **References**


Serpen, Gursel, and Zhenning Gao. 2014. Complexity analysis of multilayer perceptron neural network embedded into a wireless sensor network. *Procedia Computer Science* 36: 192–97. [CrossRef]

Sirignano, Justin, and Konstantinos Spiliopoulos. 2018. Dgm: A Deep Learning Algorithm for Solving Partial Differential Equations. *Journal of Computational Physics* 375: 1339–64. [CrossRef]

Tavella, Domingo, and Curt Randall. 2000. *Pricing Financial Instruments: The Finite Difference Method*. New York: John Wiley & Sons.


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI St. Alban-Anlage 66 4052 Basel Switzerland www.mdpi.com

*Journal of Risk and Financial Management* Editorial Office E-mail: jrfm@mdpi.com www.mdpi.com/journal/jrfm

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Academic Open Access Publishing

mdpi.com ISBN 978-3-0365-9028-8