**2. Framework**

The first step in implementing a numerical algorithm to price early exercise options is to assume that time can be discretized. We specify *J* exercise points as $t\_0 = 0 < t\_1 \le t\_2 \le \ldots \le t\_J = T$, with $t\_0$ and *T* denoting the current time and the maturity of the option, respectively. Thus, we are essentially approximating the American option by the so-called Bermudan option. The American option price is obtained in the limit by increasing the number of exercise points, *J*; see also Bouchard and Warin (2012) for a formal justification of this approach.<sup>3</sup> We assume a complete probability space $(\Omega, \mathcal{F}, \mathbb{Q})$ equipped with a discrete filtration $\{\mathcal{F}(t\_j)\}\_{j=0}^{J}$ and a unique pricing measure corresponding to the probability measure $\mathbb{Q}$. The derivative's value depends on one or more underlying assets modeled using a Markovian process, with state variables $\{X(t\_j)\}\_{j=0}^{J}$ adapted to the filtration. We denote by $\{Z(t\_j)\}\_{j=0}^{J}$ an adapted discounted payoff process for the derivative satisfying $Z(t\_j) = \pi(X(t\_j), t\_j)$ for a suitable

<sup>3</sup> We do not stress this difference any further, as the literature on pricing early exercise options using simulation generally refers to these as American-style options; see, e.g., Longstaff and Schwartz (2001).

function $\pi(\cdot, \cdot)$ assumed to be square integrable. This notation is sufficiently general to allow for non-constant interest rates through the appropriate definition of the state variables *X* and the payoff function $\pi$ (see, e.g., Glasserman 2004). Following, e.g., Karatzas (1988) and Duffie (1996), in the absence of arbitrage, we can specify the American option price as:
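
For concreteness, the following minimal sketch (in Python) simulates the state variables under a geometric Brownian motion and evaluates the discounted payoff process $Z(t\_j) = \pi(X(t\_j), t\_j)$ for a Bermudan put. The model, the payoff, and the function names (`simulate_paths`, `discounted_put_payoff`) are illustrative assumptions; the framework itself only requires a Markovian state process and a square-integrable payoff function.

```python
import numpy as np

def simulate_paths(S0, r, sigma, T, J, N, seed=0):
    """Simulate N paths of a geometric Brownian motion X(t_j) on J exercise dates.

    Returns an array of shape (N, J + 1), including the value at t_0 = 0.
    (Hypothetical example; the framework only requires a Markovian process.)
    """
    rng = np.random.default_rng(seed)
    dt = T / J
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal((N, J))
    return S0 * np.hstack([np.ones((N, 1)), np.exp(np.cumsum(increments, axis=1))])

def discounted_put_payoff(X, K, r, T):
    """Discounted payoff process Z(t_j) = pi(X(t_j), t_j) = exp(-r t_j) max(K - X(t_j), 0)."""
    t = np.linspace(0.0, T, X.shape[1])
    return np.exp(-r * t) * np.maximum(K - X, 0.0)
```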

$$P\left(X\left(0\right)=\mathbf{x}\right) = \max\_{\tau\left(t\_1\right)\in\mathcal{T}\left(t\_1\right)} \mathbb{E}\left[\left.Z\left(\tau\left(t\_1\right)\right)\right|X\left(0\right)=\mathbf{x}\right],\tag{1}$$

where $\mathcal{T}\left(t\_j\right)$ denotes the set of all stopping times with values in $\{t\_j, \ldots, t\_J\}$. Thus, we explicitly assume that the option cannot be exercised at time *t* = 0.

The problem of calculating the option price in (1) with *J* > 1 is referred to as a discrete-time optimal stopping problem and is typically solved using the dynamic programming principle. Intuitively, this procedure can be motivated by considering the choice faced by the option holder at time $t\_j$. The optimal choice is to exercise immediately if the immediate exercise value is positive and larger than the expected payoff from holding the option until the next period and behaving optimally onwards. Let $V\left(X\left(t\_j\right)\right)$ denote the value of the option for state variables *X* at a time $t\_j$ prior to expiration and define $F\left(X\left(t\_j\right)\right) \equiv \mathbb{E}\left[\left.Z\left(\tau\left(t\_{j+1}\right)\right)\right|X\left(t\_j\right)\right]$ as the conditional expected payoff from continuing, where $\tau\left(t\_{j+1}\right)$ is the optimal stopping time. It then follows that:

$$V\left(X\left(t\_j\right)\right) = \max\left(Z\left(t\_j\right), F\left(X\left(t\_j\right)\right)\right),\tag{2}$$

and the optimal stopping time can be derived iteratively as:

$$\begin{cases} \tau\left(t\_J\right) = T\\ \tau\left(t\_j\right) = t\_j\mathbf{1}\_{\left\{Z\left(t\_j\right) \ge F\left(X\left(t\_j\right)\right)\right\}} + \tau\left(t\_{j+1}\right)\mathbf{1}\_{\left\{Z\left(t\_j\right) < F\left(X\left(t\_j\right)\right)\right\}}, \quad 1 \le j \le J-1. \end{cases}\tag{3}$$
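
For instance, with only *J* = 2 exercise dates, the recursion in (3) reduces to a single comparison at $t\_1$:

$$\tau\left(t\_1\right) = \begin{cases} t\_1, & \text{if } Z\left(t\_1\right) \ge F\left(X\left(t\_1\right)\right),\\ t\_2 = T, & \text{otherwise,} \end{cases}$$

so the holder exercises at $t\_1$ only if the immediate discounted payoff is at least as large as the conditional expectation of the payoff from exercising at maturity.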

Based on this stopping time, the value of the option in (1) can be calculated as:

$$P\left(X\left(0\right)=\mathbf{x}\right) = \mathbb{E}\left[\left.Z\left(\tau\left(t\_1\right)\right)\right|X\left(0\right)=\mathbf{x}\right].\tag{4}$$

The backward induction theorem of Chow et al. (1971) (Theorem 3.2) provides the theoretical foundation for the algorithm in (3) and establishes the optimality of the derived stopping time and the resulting price estimate in (4).
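
To make the backward induction in (2)–(4) concrete, the sketch below runs the recursion path by path, assuming access to a hypothetical oracle `cond_expectation` that returns $F\left(X\left(t\_j\right)\right)$. No such oracle is available in practice, which is precisely the difficulty addressed by the simulation and regression methods discussed next.

```python
import numpy as np

def price_by_backward_induction(X, Z, cond_expectation):
    """Exact dynamic programming price, eqs. (2)-(4), given an oracle for F.

    X, Z : arrays of shape (N, J + 1) with the simulated state variables and
           discounted payoffs at t_0, ..., t_J.
    cond_expectation : callable (x, j) -> F(X(t_j) = x); a hypothetical oracle,
           since in practice this conditional expectation is unknown.
    """
    N, J = Z.shape[0], Z.shape[1] - 1
    tau = np.full(N, J, dtype=int)                 # tau(t_J) = T on every path
    for j in range(J - 1, 0, -1):                  # j = J-1, ..., 1
        F_j = cond_expectation(X[:, j], j)         # continuation value F(X(t_j))
        exercise = Z[:, j] >= F_j                  # indicator in eq. (3)
        tau = np.where(exercise, j, tau)
    # Approximate the conditional expectation in (4) by a path average.
    return Z[np.arange(N), tau].mean()
```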

### *2.1. Simulation and Regression Methods*

The idea behind using simulation for option pricing is quite simple and involves estimating expected values, and therefore option prices, by an average over a number of random draws. However, when the option is American, one needs to simultaneously determine the optimal early exercise strategy, and this complicates matters. In particular, it is generally not possible to implement the exact algorithm in (3) because the conditional expectations are unknown, and therefore the price estimate in (4) is infeasible. Instead, an approximate algorithm is needed. Because conditional expectations can be represented as a countable linear combination of basis functions, we may write $F\left(X\left(t\_j\right)\right) = \sum\_{m=0}^{\infty} \phi\_m\left(X\left(t\_j\right)\right) c\_m\left(t\_j\right)$, where $\{\phi\_m\left(\cdot\right)\}\_{m=0}^{\infty}$ form a basis.<sup>4</sup> To make this operational we further assume that the conditional expectation function can be well approximated with the first

<sup>4</sup> This is justified when approximating elements of the *L*<sup>2</sup> space of square integrable functions relative to some measure. Since *L*<sup>2</sup> is a Hilbert space, it has a countable orthonormal basis (see, e.g., Royden 1988).

*M* + 1 terms, such that $F\left(X\left(t\_j\right)\right) \approx F\_M\left(X\left(t\_j\right)\right) = \sum\_{m=0}^{M} \phi\_m\left(X\left(t\_j\right)\right) c\_m\left(t\_j\right)$, and that we can obtain an estimate of this function by:

$$\hat{F}\_M^N\left(X\left(t\_j\right)\right) = \sum\_{m=0}^{M} \phi\_m\left(X\left(t\_j\right)\right) \hat{c}\_m^N\left(t\_j\right),\tag{5}$$

where the coefficients $\hat{c}\_m^N\left(t\_j\right)$ are approximated or estimated using *N* ≥ *M* simulated paths. For example, in the Least-Squares Monte Carlo (LSM) method of Longstaff and Schwartz (2001), these are determined from a cross-sectional regression of the discounted future path-wise payoff on transformations of the state variables.
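
As an illustration of how the coefficients in (5) can be obtained, the sketch below runs the cross-sectional least-squares regression at a single exercise date, using simple monomial basis functions on all paths. The basis choice and the helper names (`regression_coefficients`, `fitted_continuation_value`) are assumptions of the example; in the original LSM implementation of Longstaff and Schwartz (2001) the regression is typically restricted to in-the-money paths, a refinement omitted here for brevity.

```python
import numpy as np

def regression_coefficients(x_j, y_next, M):
    """Estimate c_hat_m^N(t_j), m = 0, ..., M, by cross-sectional least squares.

    x_j    : state variables X(t_j) on the N simulated paths, shape (N,).
    y_next : discounted path-wise payoffs from continuing, Z(tau(t_{j+1})), shape (N,).
    Monomial basis functions phi_m(x) = x**m are an illustrative choice.
    """
    Phi = np.vander(x_j, M + 1, increasing=True)          # N x (M + 1) design matrix
    coef, *_ = np.linalg.lstsq(Phi, y_next, rcond=None)   # least-squares fit
    return coef

def fitted_continuation_value(x_j, coef):
    """Evaluate F_hat_M^N(X(t_j)) = sum_m phi_m(X(t_j)) c_hat_m^N(t_j), eq. (5)."""
    return np.vander(x_j, len(coef), increasing=True) @ coef
```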

Based on the estimate in (5), we can derive an estimate of the optimal stopping time as:

$$\begin{cases} \hat{\tau}\_M^N\left(t\_J\right) = T\\ \hat{\tau}\_M^N\left(t\_j\right) = t\_j\mathbf{1}\_{\left\{Z\left(t\_j\right) \ge \hat{F}\_M^N\left(X\left(t\_j\right)\right)\right\}} + \hat{\tau}\_M^N\left(t\_{j+1}\right)\mathbf{1}\_{\left\{Z\left(t\_j\right) < \hat{F}\_M^N\left(X\left(t\_j\right)\right)\right\}}, \quad 1 \le j \le J-1. \end{cases}\tag{6}$$
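
A minimal sketch of how the recursion in (6) can be implemented on simulated paths is given below, reusing the hypothetical helpers sketched above. Its structure mirrors the exact recursion in (3); the only change is that the unknown conditional expectation is replaced by the regression estimate from (5). The resulting path-wise stopping times feed directly into the price estimates in (7) and (8) below.

```python
import numpy as np

def lsm_stopping_times(X, Z, M):
    """Backward induction for the estimated stopping times in eq. (6).

    Identical in structure to the exact recursion in (3), except that the
    unknown F is replaced by the regression estimate F_hat_M^N from eq. (5)
    (using the hypothetical helpers regression_coefficients and
    fitted_continuation_value sketched above).
    """
    N, J = Z.shape[0], Z.shape[1] - 1
    tau = np.full(N, J, dtype=int)                     # tau_hat(t_J) = T on every path
    for j in range(J - 1, 0, -1):                      # backward through t_{J-1}, ..., t_1
        y_next = Z[np.arange(N), tau]                  # Z(tau_hat(t_{j+1})) path by path
        coef = regression_coefficients(X[:, j], y_next, M)
        F_hat = fitted_continuation_value(X[:, j], coef)
        exercise = Z[:, j] >= F_hat                    # indicator in eq. (6)
        tau = np.where(exercise, j, tau)
    return tau, Z[np.arange(N), tau]                   # exercise index and payoff per path
```

With all *N* paths started at the current values of the state variables, averaging the returned payoffs yields the sample-average estimate discussed in (7) and (8) below.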

From the algorithm in (6), a natural estimate of the option value in (4) is given by:

$$P\_M^N\left(X\left(0\right)=\mathbf{x}\right) = \mathbb{E}\left[\left.Z\left(\hat{\tau}\_M^N\left(t\_1\right)\right)\right|X\left(0\right)=\mathbf{x}\right].\tag{7}$$

In the special case when all the paths are started at the current values of the state variables, i.e., $X\left(0\right) = \mathbf{x}$, the conditional expectation in (7) can be estimated by the sample average given by:

$$P\_M^N\left(X\left(0\right)=\mathbf{x}\right) = \frac{1}{N} \sum\_{n=1}^{N} Z\left(n, \hat{\tau}\_M^N\left(t\_1, n\right)\right),\tag{8}$$

where $Z\left(n, \hat{\tau}\_M^N\left(t\_1, n\right)\right)$ is the payoff from exercising the option at the estimated optimal stopping time $\hat{\tau}\_M^N\left(t\_1, n\right)$ determined for path *n* according to (6). Convergence of this type of estimate has been analyzed in detail in the existing literature. The first step in doing so is to establish the convergence of the estimated approximate conditional expectation function, which is done in, e.g., the following lemma.

**Lemma 1** (Adapted from Theorem 2 of Stentoft 2004b)**.** *Under some regularity and integrability assumptions on the conditional expectation function F (see Stentoft (2004b) for details), if $M = M(N)$ is increasing in N such that $M \to \infty$ and $M^3/N \to 0$, then $\hat{F}\_M^N\left(X\left(t\_j\right)\right)$ converges to $F\left(X\left(t\_j\right)\right)$ in probability for $j = 1, \ldots, J$.*
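
For instance, the rate condition in Lemma 1 is satisfied by letting the number of basis functions grow as a small power of the number of paths, e.g.:

$$M(N) = \left\lfloor N^{1/4} \right\rfloor \quad\Longrightarrow\quad \frac{M(N)^3}{N} \le \frac{N^{3/4}}{N} = N^{-1/4} \to 0 \quad \text{as } N \to \infty,$$

while $M(N) \to \infty$, so both requirements of the lemma hold.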
