1. Introduction
Optimally allocating a collection of financial investments such as stocks, bonds and commodities has been a topic of concern to financial institutions and investors at least since Markowitz's pioneering mean-variance portfolio theory of 1952. His work demonstrated the potential of diversification and laid the foundations for the development of portfolio analysis in both academia and industry. These initial results were in discrete time, but it was not long before continuous-time portfolio decisions were produced in the alternative paradigm of expected utility theory, as can be seen in Merton (1969). The author assumed that the investor is able to continuously adjust their position and that the stock price follows a geometric Brownian motion (GBM). The optimal trading strategy and consumption policy that maximize the investor's expected utility were obtained in closed form by solving a Hamilton–Jacobi–Bellman equation.
The beauty and practicality of this continuous-time solution have led many researchers down this path, producing optimal closed-form strategies for a wide range of models. For example, Kraft (2005) considered the stochastic volatility (SV) Heston model of Heston (1993). Flor and Larsen (2014) constructed a portfolio of stocks and fixed-income products to hedge interest rate risk. Explicit solutions in the presence of regime switching, stochastic interest rates and stochastic volatility were presented in Escobar et al. (2017), and the positive performance of their portfolio was confirmed by an empirical study. For the commodities asset class, Chiu and Wong (2013) modelled a mean-reverting risky asset by an exponential Ornstein–Uhlenbeck (OU) process and solved the investment problem for an insurer subject to random insurance claim payments.
These models are particular cases of the quadratic-affine family (see Liu (2006)), one of the broadest families of models solvable in closed form. The value function for a model in this family is the product of a function of wealth and an exponential quadratic function of the state variables. Nonetheless, the complexity of financial markets has kept increasing every decade, with researchers detecting new stylized facts and proposing new models outside the quadratic-affine family. Needless to say, investors must rely on these advanced models for better financial decisions; however, closed-form solutions are no longer guaranteed. One example of these advanced models is the GBM 4/2 model, introduced in Grasselli (2017). The model improves on the Heston model by better fitting implied volatility surfaces and historical volatility patterns. The optimal portfolio problem under the GBM 4/2 model is solvable for certain types of market price of risk (MPR, see Cheng and Escobar-Anel (2021)), while the optimal trading strategy has not yet been found for an MPR proportional to the instantaneous volatility. More recently, an OU 4/2 model, which unifies a mean-reverting drift and stochastic volatility in a single model, was presented in Escobar-Anel and Gong (2020). The model targets two asset classes: commodities and volatility indexes. The optimal portfolio under the OU 4/2 model is not available in closed form. This motivates approximation methods for dynamic portfolio choice.
Most approximation methods follow the ideas of either the martingale method (see Karatzas et al. (1987)) or the dynamic programming technique (Brandt et al. (2005)). Cvitanić et al. (2003) proposed a simulation-based method seeking the financial replication of the optimal terminal wealth given by the martingale method. Detemple et al. (2003) developed a comprehensive approach for the same investment problems, whose accuracy is enhanced by the application of Malliavin calculus. The work in Brandt et al. (2005) led to the BGSS method, which was inspired by the popular least-squares Monte Carlo method of Longstaff and Schwartz (2001). BGSS pioneered the recursive approximation method for dynamic portfolio choice. Cong and Oosterlee (2017) enhanced BGSS with the stochastic grid bundling method (SGBM) for conditional expectation estimation introduced in Jain and Oosterlee (2015). More recently, a polynomial affine method for constant relative risk aversion utility (PAMC) was developed in Zhu et al. (2020). The method takes advantage of the quadratic-affine structure, leading to superior accuracy and efficiency in the approximation of the optimal strategy and value function. In this paper, we extend the PAMC methodology using neural networks.
The history of artificial neural networks goes back to McCulloch and Pitts (1943), where the authors created the so-called "threshold logic" on the basis of the neural networks of the human brain in order to mimic human thought. Deep learning has since steadily evolved. Almost three decades later, backpropagation, a widely used algorithm for fitting neural network parameters in supervised learning, was introduced; see Linnainmaa (1970). The importance of backpropagation was only fully recognized when Rumelhart et al. (1986) showed that it can produce useful internal distributed representations. The universal approximation theorem (see Cybenko (1989)) established that every bounded continuous function can be approximated by a network with an arbitrarily small error, which further supports the effectiveness of neural networks. Neural networks have recently attracted much attention from applied scientists and have been successful in fields such as image recognition and natural language processing, because they are particularly good at function approximation when the form of the target function is unknown. In the realm of dynamic portfolio analysis, Lin et al. (2006) first predicted the portfolio covariance matrix with an Elman network and achieved a good estimation of the optimal mean-variance portfolio. More recently, Li and Forsyth (2019) proposed a neural network, representing the portfolio strategy at each rebalancing time, for a constrained defined contribution (DC) allocation problem. Chen and Ge (2021) introduced a differential equation-based method, where the value function under the Heston model is estimated by a deep neural network.
In this paper, motivated by the lack of knowledge of the correct expression for the portfolio value function under unsolvable models, we approximated the optimal portfolio strategy for a given stochastic process model with a neural network fitted to the value function. Successful fitting relies on a suitable network architecture that captures the connection between input and output variables, as well as reasonable activation functions. We designed two architectures enriching an embedded quadratic-affine structure, and we considered three types of activation functions.
Given the lack of closed-form solutions for SV 4/2 models, we used them as our working examples in the implementations. In particular, we first implemented our methodology in the solvable case (i.e., GBM 4/2 with solvable MPR), so that accuracy and efficiency were demonstrated before applying it to the unsolvable cases: the GBM 4/2 model with stochastic jumps, the GBM 4/2 model with an MPR proportional to the instantaneous volatility, and the OU 4/2 model. Furthermore, we numerically show which network architecture is preferable in each case.
The paper is organized as follows. Section 2 introduces the dynamic portfolio choice problem and presents the neural network architectures, activation functions and parameter training details. The step-by-step algorithm of our methodology is provided in Section 3. Section 4 and Section 5 apply the methodology to the GBM 4/2 and the OU 4/2 models, respectively. Section 6 concludes.
2. Problem Setting and Architectures of the Deep Learning Model
We considered a frictionless market consisting of a money market account (cash, $M$) and one stock ($S$). We assume the stock price follows a generalized diffusion process incorporating a one-dimensional state variable $X$. All the processes are defined on a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a right-continuous filtration $\{\mathcal{F}_t\}_{t \geq 0}$, summarized by the stochastic differential equations (SDE):

$$\frac{dM_t}{M_t} = r\,dt, \qquad \frac{dS_t}{S_t} = \mu(X_t)\,dt + \sigma(X_t)\,dW_t^S + \beta\,dN_t, \qquad dX_t = a(X_t)\,dt + b(X_t)\,dW_t^X. \qquad (1)$$

Here, $W^S$ and $W^X$ are Brownian motions with correlation $\rho$; $r$ is the interest rate; $\mu(\cdot)$ and $\sigma(\cdot)$ are the drift and diffusion coefficients for the stock price; $a(\cdot)$ and $b(\cdot)$ are measurable functions of the state variable $X$; $N$ is a pure-jump process, independent of $W^S$ and $W^X$, with stochastic intensity $\lambda(X_t)$, and $\beta$ denotes the jump size.
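To make these dynamics concrete, the following minimal Python sketch simulates one path of (1) by an Euler–Maruyama scheme. The coefficient callables mu, sigma, a, b and lam are placeholders for the generic model above, and all numerical values are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def simulate_path(s0, x0, T, n_steps, mu, sigma, a, b, lam, beta, rho, rng):
    """Euler-Maruyama sketch of the generic dynamics (1): correlated
    Brownian drivers for S and X, plus a Poisson jump of size beta in S."""
    dt = T / n_steps
    s, x = s0, x0
    for _ in range(n_steps):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal()
        dn = rng.poisson(lam(x) * dt)              # jump count over [t, t + dt]
        s *= 1.0 + mu(x) * dt + sigma(x) * np.sqrt(dt) * z1 + beta * dn
        x += a(x) * dt + b(x) * np.sqrt(dt) * z2
    return s, x

# illustrative placeholder coefficients (not the paper's parameters)
rng = np.random.default_rng(42)
sT, xT = simulate_path(s0=100.0, x0=0.04, T=1.0, n_steps=252,
                       mu=lambda x: 0.03 + 0.5 * x, sigma=lambda x: np.sqrt(x),
                       a=lambda x: 2.0 * (0.04 - x), b=lambda x: 0.3 * np.sqrt(x),
                       lam=lambda x: 1.0, beta=-0.1, rho=-0.5, rng=rng)
```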
We consider an investor with risk preference represented by a constant relative risk aversion (CRRA) utility:

$$U(w) = \frac{w^{1-\gamma}}{1-\gamma}, \qquad \gamma > 0, \ \gamma \neq 1. \qquad (2)$$

Investors can adjust their allocation at a predetermined set of rebalancing times $t_0 = 0 < t_1 < \dots < t_N = T$. The investors wish to derive a portfolio strategy $\pi_t$ (percentage of wealth allocated to the stock) that maximizes their expected utility of terminal wealth, in other words, $\sup_{\pi} E[U(W_T)]$. The value function, representing the investor's conditional expected utility, has the following representation:

$$V(t, w, x) = \sup_{\pi} E\left[U(W_T) \mid W_t = w, X_t = x\right] = \frac{w^{1-\gamma}}{1-\gamma}\, f(t, x). \qquad (3)$$

The value function is separated into a wealth factor and a state variable function f. The NNMC estimates the state variable function f with a neural network model and computes the optimal strategy with the Bellman principle.
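For completeness, the dynamic programming recursion behind this separation (a standard step, stated here in the notation above rather than reproduced from the paper's omitted equations) reads:

$$V(t_k, w, x) = \sup_{\pi} E\left[\, V\!\left(t_{k+1}, W_{t_{k+1}}, X_{t_{k+1}}\right) \,\middle|\, W_{t_k} = w,\ X_{t_k} = x \right],$$

and substituting (3) and dividing by $w^{1-\gamma} > 0$ gives a recursion in $f$ alone:

$$\frac{f(t_k, x)}{1-\gamma} = \sup_{\pi} E\left[ \left(\frac{W_{t_{k+1}}}{w}\right)^{\!1-\gamma} \frac{f\!\left(t_{k+1}, X_{t_{k+1}}\right)}{1-\gamma} \,\middle|\, X_{t_k} = x \right].$$

Each Bellman step therefore only requires the fitted f at the next rebalancing time, which is exactly what the networks below provide.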
2.1. Architectures of the Deep Learning Model
In this section, we present two neural network architectures to fit the value function. According to the separable property of the value function shown in (3), the only unknown component is the state variable function f, which is therefore the target function for the neural network. The architectures of the networks are built around exponential polynomial functions, which are the most common form of solvable investors' value functions and are used in the PAMC method (see Zhu et al. (2020)). This property of the proposed networks ensures that the new method generalizes PAMC.
The neural network is expected to achieve a better fit than a polynomial regression if the true state variable function differs significantly from an exponential polynomial function. Furthermore, we designed an initialization method for the networks, which is better than a random initialization in terms of portfolio value function fitting.
2.1.1. Sum of Exponential Network
We first introduce the sum of exponential polynomials network (SEN), as illustrated in Figure 1. The number of inputs depends on the number of state variables; for simplicity, we take two inputs as an example. The first hidden layer computes the monomials of the inputs. The second hidden layer obtains linear combinations of the neurons in the first layer, where the weights are fitted in NNMC, and an exponential activation function is applied to this layer. The final output is a linear combination of exponential polynomials, so the exponential polynomial is a specific case of this neural network. The next proposition states the estimation of the corresponding optimal allocation under the SEN approximation of the value function.
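A minimal PyTorch sketch of the SEN may help fix ideas. The class below follows the description above (monomial layer, exponential layer, linear output); the helper monomial_powers and all default sizes are our own illustrative choices, not taken from the paper.

```python
from itertools import product
import torch
import torch.nn as nn

def monomial_powers(n_inputs, degree):
    # all exponent tuples (k_1, ..., k_n) with 1 <= k_1 + ... + k_n <= degree
    return [p for p in product(range(degree + 1), repeat=n_inputs)
            if 1 <= sum(p) <= degree]

class SEN(nn.Module):
    """Sum of exponential polynomials: f(x) = sum_k c_k * exp(P_k(x))."""
    def __init__(self, n_inputs=2, degree=2, n_exp=2):
        super().__init__()
        self.powers = monomial_powers(n_inputs, degree)
        self.exponents = nn.Linear(len(self.powers), n_exp)  # P_k coefficients
        self.combine = nn.Linear(n_exp, 1, bias=False)       # weights c_k

    def monomials(self, x):
        # first hidden layer: each monomial x_1^{k_1} * ... * x_n^{k_n}
        return torch.stack([(x ** torch.tensor(p, dtype=x.dtype)).prod(dim=1)
                            for p in self.powers], dim=1)

    def forward(self, x):
        return self.combine(torch.exp(self.exponents(self.monomials(x))))
```

Setting the combination weights to a single nonzero entry recovers an exponential polynomial, which is the sense in which the SEN nests the PAMC representation.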
Proposition 1. Given the SEN approximation of the value function at the next rebalancing time, the optimal strategy at time t is given in Equation (5) as the solution of a first-order condition. Notably, the condition simplifies when S follows a pure-diffusion process (β = 0) and takes an extended form when S follows a jump process (β ≠ 0).

Proof. We substitute the value function with its SEN approximation and expand the right-hand side of the Bellman equation with respect to W, S and X; the value function is then written as a function of the strategy π. Equation (5) is obtained from the first-order condition. □
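Although the closed-form first-order condition of Proposition 1 is specific to the SEN, the underlying Bellman step can be sketched generically. The function below is a Monte Carlo grid search we wrote purely for illustration (not the paper's method): it picks the allocation maximizing the conditional expected utility one period ahead; f_next, mu, sigma, a and b are placeholder callables in the notation of (1).

```python
import numpy as np

def bellman_step(f_next, x0, r, mu, sigma, a, b, rho, dt, gamma,
                 pi_grid=np.linspace(-2.0, 2.0, 81), n_mc=20_000, seed=0):
    """One rebalancing step (diffusion case, beta = 0): maximize
    E[(W_{k+1}/W_k)^{1-gamma} f(t_{k+1}, X_{k+1})] / (1 - gamma) over pi."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_mc)
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_mc)
    x1 = x0 + a(x0) * dt + b(x0) * np.sqrt(dt) * z2      # state one step ahead
    f1 = f_next(x1)
    best_pi, best_val = None, -np.inf
    for pi in pi_grid:
        # Euler wealth growth under allocation pi (self-financing portfolio)
        g = 1.0 + (r + pi * (mu(x0) - r)) * dt + pi * sigma(x0) * np.sqrt(dt) * z1
        g = np.maximum(g, 1e-12)                          # guard against ruin
        val = np.mean(g ** (1.0 - gamma) * f1) / (1.0 - gamma)
        if val > best_val:
            best_pi, best_val = pi, val
    return best_pi
```

Dividing by (1 − gamma) keeps the criterion a maximization for both γ < 1 and γ > 1, provided f stays positive.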
2.1.2. Improving Exponential Network
The architecture of the improving exponential network (IEN) is exhibited in Figure 2.
The target function of the IEN is the log of the state variable function f (i.e., log f). The neural network consists of three parts. Node 1 is a polynomial of the inputs. Node 2 is an artificial neural network with an arbitrary number of hidden layers and neurons. Node 3 is a single-layer network with a sigmoid activation that computes a proportion p. The final output is the weighted average of the first two nodes, with weights p and 1 − p. The second node is the complement to the exponential polynomial function. Moreover, the similarity between the true value function and the exponential polynomial function is measured by p, which is fitted within the NNMC methodology; the network therefore automatically adjusts the weights on the exponential polynomial function and its complement according to the generated data. Finally, the state variable function f is computed by exponentiating the output, which makes f the geometric weighted average of the exponentials of nodes 1 and 2. The estimation of the optimal strategy under the IEN is given in the next proposition.
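In the same illustrative PyTorch style as the SEN sketch (reusing monomial_powers and the imports from that block; the layer sizes and defaults are again our own assumptions), the IEN can be written as:

```python
class IEN(nn.Module):
    """Improving exponential network: fits g(x) = log f(x) as
    g(x) = p(x) * node1(x) + (1 - p(x)) * node2(x), so that f = exp(g)
    is a geometric weighted average of exp(node1) and exp(node2)."""
    def __init__(self, n_inputs=2, degree=2, n_hidden=10):
        super().__init__()
        self.powers = monomial_powers(n_inputs, degree)
        self.node1 = nn.Linear(len(self.powers), 1)           # polynomial node
        self.node2 = nn.Sequential(nn.Linear(n_inputs, n_hidden),  # generic node
                                   nn.ELU(),
                                   nn.Linear(n_hidden, 1))
        self.node3 = nn.Linear(n_inputs, 1)                   # sigmoid gate -> p

    def forward(self, x):
        mono = torch.stack([(x ** torch.tensor(p, dtype=x.dtype)).prod(dim=1)
                            for p in self.powers], dim=1)
        p = torch.sigmoid(self.node3(x))
        return p * self.node1(mono) + (1.0 - p) * self.node2(x)  # log f
```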
Proposition 2. Given the IEN approximation of the log value function at the next rebalancing time, the optimal strategy at time t is again given as the solution of a first-order condition. As in Proposition 1, the condition simplifies when S follows a pure-diffusion process (β = 0) and takes an extended form when S follows a jump process (β ≠ 0).

Proof. The proof follows similarly to Proposition 1. □
2.2. Initialization, Stopping Criterion and Activation Function
In this section, we disclose more details on training the neural networks. The initialization of the weights is the first step of network training and may significantly impact the goodness of fit. A good initialization prevents the network's weights from converging to a poor local minimum and avoids slow convergence. Random initialization is the most common choice, since the interpretability of a generic network is usually weak. In contrast, both the SEN and the IEN are extensions of an exponential polynomial function, so we suggest taking advantage of the results from the polynomial regression. The neural network then searches for a minimum near the exponential polynomial function used in PAMC, ensuring consistency. This polynomial regression initialization achieves superior results to random initialization.
The coefficients of the exponential polynomial are first obtained with a regression model. Since the output of the SEN is a linear combination of exponential polynomial functions, we substitute the coefficients from the polynomial regression into one exponential polynomial and initialize the output weights so that the network reproduces the regression fit. For the initialization of the IEN, we substitute the coefficients into the first node and artificially push the weight p towards 1.
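A sketch of this warm start for the SEN (our own code, under the assumption that the regression is run on the log of positive state-variable-function samples y):

```python
import numpy as np
import torch

def init_sen_from_regression(sen, x, y):
    """Warm-start the SEN at the PAMC solution: regress log y on the
    monomials, load the coefficients into the first exponent unit, and
    zero the second, so the initial network output equals exp(poly fit)."""
    mono = np.stack([np.prod(x ** np.array(p), axis=1) for p in sen.powers], axis=1)
    design = np.column_stack([np.ones(len(x)), mono])
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)  # y must be > 0
    with torch.no_grad():
        sen.exponents.weight.zero_(); sen.exponents.bias.zero_()
        sen.exponents.weight[0] = torch.as_tensor(coef[1:], dtype=torch.float32)
        sen.exponents.bias[0] = float(coef[0])
        sen.combine.weight.zero_()
        sen.combine.weight[0, 0] = 1.0       # output = exp(fitted polynomial)
```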
The training process minimizes the mean squared error (MSE) between the network's output and the simulated expected utility, and the sample data are split into a training set and a test set to reduce overfitting. Adam is a backpropagation-based algorithm that combines the best properties of the AdaGrad and RMSProp algorithms to handle sparse gradients on noisy problems, and it provides excellent convergence speed. We applied Adam to the training set to update the network's weights, and the test set MSE was computed and recorded at each step. The test set MSE is expected to converge, so training finishes when the difference between the moving average of the most recent 100 test set MSEs and the most recent test set MSE is less than a predetermined threshold fixed in the implementation.
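The stopping rule can be sketched as follows; the learning rate, tolerance and step cap are illustrative placeholders, not the paper's settings.

```python
from collections import deque
import torch

def train(net, x_train, y_train, x_test, y_test,
          lr=1e-3, tol=1e-6, window=100, max_steps=50_000):
    """Fit by MSE with Adam; stop when the latest test MSE is within `tol`
    of the moving average of the last `window` test MSEs."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    history = deque(maxlen=window)
    for _ in range(max_steps):
        opt.zero_grad()
        loss = loss_fn(net(x_train), y_train)
        loss.backward()
        opt.step()
        with torch.no_grad():
            test_mse = loss_fn(net(x_test), y_test).item()
        history.append(test_mse)
        if len(history) == window and abs(sum(history) / window - test_mse) < tol:
            break
    return net
```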
The number of exponential polynomials is a hyperparameter of the SEN. We let the SEN be a sum of two exponential polynomial functions for simplicity. Node 2 in the IEN is an artificial neural network, which complements node 1 when the value function deviates significantly from an exponential polynomial function. The number of hidden layers and neurons, as well as the activation function of node 2, are freely determined before fitting the value function. We assume node 2 is a single-layer network with 10 neurons, and we implement several activation functions for comparison purposes: the logistic (sigmoid) function,
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}};$$
the rectified linear unit (ReLU),
$$\mathrm{ReLU}(x) = \max(x, 0);$$
and the exponential linear unit (ELU),
$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0, \\ \alpha\,(e^{x} - 1), & x \leq 0, \end{cases}$$
for a constant $\alpha > 0$.
5. Application to the OU 4/2 Model
Motivated by the 4/2 stochastic volatility factor and the mean-reverting price patterns common among various asset classes (e.g., commodities, exchange rates, volatility indexes), Escobar-Anel and Gong (2020) defined an Ornstein–Uhlenbeck 4/2 (OU 4/2) stochastic volatility model for volatility index option and commodity option valuation. Equation (22) presents the dynamics of the OU 4/2 model, which is a specific case of (1) under suitable choices of the drift and diffusion coefficients. The parameters used in this section are reported in Table 6; they were estimated in Escobar-Anel and Gong (2020) from data on a gold exchange-traded fund (ETF) and the volatility index of the gold ETF. There are two state variables in the OU 4/2 model; hence, the number of inputs to both the SEN and the IEN is 2. Furthermore, the degree of the polynomial in PAMC and NNMC is 2.
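In the notation of the sketches from Section 2.1, this configuration would correspond to (illustrative code, not the paper's implementation):

```python
# two state variables, polynomial degree 2, two exponential terms in the SEN
sen = SEN(n_inputs=2, degree=2, n_exp=2)
ien = IEN(n_inputs=2, degree=2, n_hidden=10)
```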
The SEN performs worse than the IEN when fitting the value function under the OU 4/2 model. At times, the SEN deviates significantly from the true value function, which results in poor portfolio performance and the occurrence of negative terminal wealth. Therefore, we excluded part of the NNMC-SEN results in this section.
Table 7 compares the optimal allocation, expected utility and certainty equivalent rate (CER) obtained for the OU 4/2 model. PAMC and NNMC-IEN produce similar optimal allocations, both outperforming NNMC-SEN. Furthermore, we also estimated the standard deviation of the expected utility and CER, which shows that NNMC leads to a less volatile estimation of the expected utility and CER than PAMC in most cases. In contrast to the results for the 4/2 model, the IEN is more efficient than the SEN; we conclude that the IEN is better suited to models with a complex structure and multiple state variables. The expected utility and CER as functions of the maturity T are plotted in Figure 6. Both the expected utility and CER increase with T. The expected utility and CER obtained from PAMC and NNMC-IEN visually overlap and are slightly higher than those of NNMC-SEN. Moreover, the choice of activation function in the IEN makes little difference.
6. Conclusions
This paper investigated the fitting of the value function in expected utility, dynamic portfolio choice problems using a deep learning model. We proposed two architectures for the neural network, which extend the broadest solvable family of value functions (i.e., exponential polynomial functions). We measured the accuracy and efficiency of several variants of the NNMC method on the 4/2 model and the OU 4/2 model. The differences in optimal allocation, expected utility and CER are insignificant when the stock price follows the 4/2 model; there, the embedded PAMC is superior to NNMC due to its smaller parametric space, hence its efficiency. Furthermore, when considering the OU 4/2 model, NNMC-SEN is inferior to a polynomial regression (PAMC) and to NNMC-IEN in terms of expected utility and CER.
In summary, NNMC benefits from the popular exponential polynomial representation (the embedded PAMC method) to propose a network architecture flexible enough to reach beyond affine models. Although the best setting, NNMC-IEN (ELU), is not as efficient as PAMC, neural networks demonstrate a way to tackle more advanced models along the lines of Markov switching, Lévy processes and fractional Brownian motion.