
Deep Arbitrage-Free Learning in a Generalized HJM Framework via Arbitrage-Regularization

1 Department of Mathematics, ETH Zürich, 8092 Zürich, Switzerland
2 Department of Mathematics and Statistics, Concordia University, 1455 De Maisonneuve Blvd. W., Montréal, QC H3G 1M8, Canada
* Author to whom correspondence should be addressed.
Risks 2020, 8(2), 40; https://doi.org/10.3390/risks8020040
Submission received: 28 February 2020 / Revised: 9 April 2020 / Accepted: 17 April 2020 / Published: 23 April 2020
(This article belongs to the Special Issue Machine Learning in Finance, Insurance and Risk Management)

Abstract

A regularization approach to model selection, within a generalized HJM framework, is introduced, which learns the closest arbitrage-free model to a prespecified factor model. This optimization problem is represented as the limit of a one-parameter family of computationally tractable penalized model selection tasks. General theoretical results are derived and then specialized to affine term-structure models, where new types of arbitrage-free machine learning models for the forward-rate curve are estimated numerically and compared to classical short-rate and dynamic Nelson-Siegel factor models.

1. Introduction

The compatibility of penalized regularization with machine learning approaches allows for the successful treatment of various challenges in learning theory such as variable selection (see Tibshirani (1996)) and dimension reduction (see Zou et al. (2006)). The objective of many machine learning models used in mathematical finance is to predict asset prices by learning functions depending on stochastic inputs. In general, there is no guarantee that these stochastic factor models are consistent with no-arbitrage conditions. This paper introduces a novel penalized regularization approach to address this modelling difficulty in a manner consistent with financial theory. The incorporation of an arbitrage-penalty term allows various machine learning methods to be directly and coherently integrated into mathematical finance applications. We focus on regression-type model selection tasks in this article. However, the arbitrage-penalty can also be applied to other types of machine learning algorithms with financial applications.
To motivate our approach we first consider informally, similarly to (Björk 2009, Chapter 10), the following simple situation, which will later be made more precise. Let $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\geq 0},\mathbb{P})$ be a filtered probability space satisfying the usual conditions. Let $(X_t)_{t\geq 0}$ be an $\mathcal{F}_t$-adapted real-valued stochastic process with continuous paths representing the price of a financial asset. Let $r$ be the constant risk-free interest rate and assume a fixed time interval $[0,T]$. Existence of a martingale measure $\mathbb{Q}$ equivalent to the underlying real-world measure $\mathbb{P}$ implies absence of arbitrage. The price at time $t\in[0,T]$ of a derivative security with integrable payoff $f(X_T)$ at time $T$ is given by the risk-neutral pricing formula
$$\mathbb{E}_{\mathbb{Q}}\!\left[\, e^{-r(T-t)}\, f(X_T)\,\middle|\,\mathcal{F}_t\right].$$
With $L_t = \left.\frac{d\mathbb{Q}}{d\mathbb{P}}\right|_{\mathcal{F}_t}$ we may express Equation (1) under the real-world probability measure $\mathbb{P}$ as
$$\mathbb{E}_{\mathbb{P}}\!\left[\, e^{-r(T-t)}\,\frac{L_T}{L_t}\, f(X_T)\,\middle|\,\mathcal{F}_t\right].$$
Equivalently, the price given by Equation (2) can be expressed under $\mathbb{P}$ by defining the state-price density process $Z_t = e^{-rt}L_t$. If $\mathbb{Q}$ is the minimal martingale measure of Schweizer (1995) then the transformation
$$f(X_t)\mapsto Z_t\, f(X_t)$$
can be interpreted as finding the process closest to $(f(X_t))_{t\geq 0}$ which is a (local) martingale under $\mathbb{P}$. The purpose of this paper is to find an analogue of the transformation (3) in this setting when $X_t$ is described by a stochastic factor model, as is the case with most machine learning approaches to mathematical finance. For example, $X_t$ may be described by a deep neural network with stochastic inputs.
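For completeness, the passage from (1) to (2) is the standard abstract Bayes formula for conditional expectations under a change of measure; the following short derivation is supplied here for the reader and is not part of the original text:
$$\mathbb{E}_{\mathbb{Q}}\!\left[e^{-r(T-t)} f(X_T)\,\middle|\,\mathcal{F}_t\right] = \frac{\mathbb{E}_{\mathbb{P}}\!\left[L_T\, e^{-r(T-t)} f(X_T)\,\middle|\,\mathcal{F}_t\right]}{\mathbb{E}_{\mathbb{P}}\!\left[L_T\,\middle|\,\mathcal{F}_t\right]} = \mathbb{E}_{\mathbb{P}}\!\left[e^{-r(T-t)}\,\frac{L_T}{L_t}\, f(X_T)\,\middle|\,\mathcal{F}_t\right],$$
since $(L_t)_{t\geq 0}$ is a $\mathbb{P}$-martingale, so that $\mathbb{E}_{\mathbb{P}}[L_T\mid\mathcal{F}_t]=L_t$; combining the discount factor with $L$ yields the state-price density $Z_t=e^{-rt}L_t$.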
The discussion above ignored well-known results regarding the uniqueness of $\mathbb{Q}$ (see Schweizer (1995)) and other important generalizations of the martingale approach to arbitrage theory. In particular, the more general setting of the fundamental theorem of asset pricing of Delbaen and Schachermayer (1998) implies that if "arbitrage", in the sense of no free lunch with vanishing risk, exists, then the transformation (3) is undefined. However, many machine learning approaches to mathematical finance may admit arbitrage, so it is necessary to consider the general case. The arbitrage-regularization framework introduced in this paper integrates machine learning methodologies with the general martingale approach to arbitrage theory.
We consider a general framework for learning the arbitrage-free factor model that is most similar to a given factor model within a prespecified class of alternative factor models. This search is carried out by minimizing a loss function measuring the distance of the alternative model from the original factor model, subject to the constraint that the market described by the alternative model consists of local martingales under a reference probability measure.
The main theoretical results rely on asymptotics for the arbitrage-regularization penalty for selecting the optimal arbitrage-free model from a class of stochastic factor models. Relaxations of the asymptotic results, necessary for practical implementation, are also presented. Throughout this paper, the bond market will serve as the primary example of our methods since no-arbitrage conditions for factor models are well understood; see Filipović (2001) and the references therein. Numerical results applying the arbitrage-regularization methodology are implemented using real data.
The remainder of this paper is organized as follows. Section 2 states the arbitrage-regularization problem and presents an overview of relevant background on bond markets. Section 3 develops the arbitrage-penalty and establishes the main asymptotic optimality results. Non-asymptotic relaxations of these results are also considered and linked with transaction costs. Section 4 specializes the general results to bond markets, where a simplified expression for the arbitrage-penalty is obtained. Numerical implementations of the results are considered; the arbitrage-regularization methodology is used to generate new machine learning-based models consistent with no free lunch with vanishing risk (NFLVR), and the results are compared with classical term-structure models as benchmarks. Section 5 concludes, and Appendix A contains supplementary material primarily required for the proofs, such as functional Itô calculus and Γ-convergence results. Proofs of the main theorems of the paper are included in Appendix B.

2. The Arbitrage-Regularization Problem

For the remainder of the paper, all stochastic processes are defined on a common stochastic basis $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\geq 0},\mathbb{P})$. Let $\mathbb{P}^{\star}$ be a probability measure equivalent to the reference probability measure $\mathbb{P}$ and let $(r_t)_{t\geq 0}$ denote the risk-free rate in effect at time $t\geq 0$. Assume that there exists an asset whose price process, denoted by $(N_t)_{t\geq 0}$, is a strictly positive $\mathbb{P}^{\star}$-martingale and which serves as numéraire. Unless otherwise specified, all processes in this paper will be described under the martingale measure for $(N_t)_{t\geq 0}$, denoted $\mathbb{P}^N$ and defined by
$$\frac{d\mathbb{P}^N}{d\mathbb{P}^{\star}} = \exp\!\left(-\int_0^T r_s\, ds\right)\frac{N_T}{N_0}.$$
The choice of numéraire can be used to encode or remove any trend from the price processes being modelled. Price processes which are local martingales under $\mathbb{P}^N$ or $\mathbb{P}^{\star}$ are usually only semi-martingales under the objective measure $\mathbb{P}$. Further details on numéraires can be found in Shreve (2004).
We consider a large financial market $\{(X_t(u))_{t\geq 0}\}_{u\in U}$, indexed by a non-empty Borel subset $U\subseteq\mathbb{R}^D$, where $D$ is a positive integer. For example, $\{(X_t(u))_{t\geq 0}\}_{u\in U}$ may be used to represent a bond market where, using the parameterization of Musiela and Rutkowski (1997), $U = [0,\infty)$ represents the collection of all possible maturities and $X_t(u)$ represents the time-$t$ price of a zero-coupon bond with maturity $u$.
For each $u\in U$, the process $(X_t(u))_{t\geq 0}$ will be driven by a latent, possibly infinite-dimensional, factor process. In the case of the bond market, this latent process will be the forward-rate curve. Write
$$X_t(u) \triangleq S_t\!\left(\phi_t^u,\, [[\phi^u]]_t;\, u\right),$$
where $\phi_t^u \triangleq \phi(t,\beta_t,u)$, $\{S_t(\cdot,\cdot;u)\}_{u\in U}$ is a family of path-dependent functionals encoding the latent process into the asset price $X_t(u)$, $\phi$ is the factor model for the latent process, and $\beta_t$ are the $\mathbb{R}^d$-valued stochastic factors driving the latent process. Following Fournie (2010), $S_t$ will be allowed to depend on the local quadratic variation of the factor process $\phi(t,\beta_t,u)$, denoted by $[[\phi^u]]_t$ and defined by
$$[\phi(\cdot,\beta_{\cdot},u)]_t = \int_0^t [[\phi^u]]_s\, ds,$$
where $[\phi(\cdot,\beta_{\cdot},u)]_t$ denotes the usual quadratic variation of the factor process. The local quadratic variation $[[\phi^u]]_t$ is well-defined due to Assumption 1, imposed below.
In the case of the bond market, $S_t$ will be the map taking a forward-rate curve, such as $(\phi(t,\beta_t,u))_{u\in U}$, to the time-$t$ price of a zero-coupon bond with maturity $u$, defined by
$$S_t\!\left(\phi_t^u,[[\phi^u]]_t;u\right) \triangleq \exp\!\left(-\int_t^u\phi(t,\beta_t,v)\, dv\right).$$
It will often be convenient to use the reparameterization of Musiela and Rutkowski (1997) and rewrite (6) as
$$S_t\!\left(\phi_t^u,[[\phi^u]]_t;u\right) \triangleq \exp\!\left(-\int_0^{u-t}\phi(t,\beta_t,\tau)\, d\tau\right),$$
where $\tau \triangleq u - s$, for $0\leq t\leq s\leq u$, represents the time to maturity of the bond.
In general, $S_t$ will be allowed to depend on the path of $\phi(t,\beta_t,u)$. Thus, $S_t$ will be a path-dependent functional of regularity $C_b^{1,2}$ in the sense of Fournie (2010), as discussed in Appendix A.2. However, as in the bond market, if $S_t$ depends only on the current value of $\phi(t,\beta_t,u)$, then the requirement that $S_t$ be of class $C_b^{1,2}$, in the sense of Fournie (2010), is equivalent to it being of regularity $C^{1,2}(I\times\mathbb{R}^d)$ in the classical sense, where $I\triangleq[0,\infty)$. Therefore, the classical Itô calculus would apply to $S_t$.
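To make the map $S_t$ concrete in the bond-market case, the following minimal Python sketch (an illustration, not the authors' code) evaluates the zero-coupon bond price by numerically integrating a forward-rate curve; the trapezoidal quadrature and the Nelson-Siegel-type example curve and its parameters are assumptions made here for illustration.

```python
import numpy as np

def bond_price(forward_curve, t, u, n_grid=200):
    """Time-t price of a zero-coupon bond maturing at u,
    P(t, u) = exp(-integral_t^u f(t, v) dv), via trapezoidal quadrature."""
    v = np.linspace(t, u, n_grid)
    return np.exp(-np.trapz(forward_curve(t, v), v))

# Example: a static Nelson-Siegel-type forward curve (illustrative parameters).
def ns_forward(t, v, beta=(0.03, -0.02, 0.01), tau=1.5):
    x = v - t                      # time to maturity
    return (beta[0]
            + beta[1] * np.exp(-x / tau)
            + beta[2] * (x / tau) * np.exp(-x / tau))

print(bond_price(ns_forward, t=0.0, u=5.0))   # roughly 0.87 for these parameters
```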
Analogously to Björk and Christensen (1999), the factor model $\varphi$ for the latent process will always be assumed to be suitably integrable and suitably differentiable. Specifically, $\varphi$ will belong to a Banach subspace $\mathcal{X}$ of $L^p_{\nu\otimes\mu}\!\left(I\times\mathbb{R}^d\times U\right)$ which can be continuously embedded within the Fréchet space $C^{1,2,2}(I\times\mathbb{R}^d\times U)$, where $\nu$ is a Borel probability measure supported on $I$, $\mu$ is a Borel probability measure supported on $\mathbb{R}^d\times U$, and both $\nu$ and $\mu$ are equivalent to the corresponding Lebesgue measures restricted to their supports. Here, $1\leq p<\infty$ is kept fixed.
An example from the bond modelling literature is the Nelson-Siegel model (see Nelson and Siegel (1987) and Diebold and Rudebusch (2013)), which expresses the forward-rate curve as a function of its level, slope, and curvature through a factor model. The Nelson-Siegel family is part of a larger class of affine term-structure models, in which, at any given time, the forward-rate curve is described in terms of a set of market factors as
$$\varphi(t,\beta,u) \triangleq \varphi_0(u-t) + \sum_{i=1}^d\beta_i\,\varphi_i(u-t),$$
where $d$ is a positive integer, $\varphi_i\in C^2(U)$, and $\varphi_0$ is a forward-rate curve typically calibrated to the data available at time $t=0$. Note that the forward-rate curves in (7) are parameterized according to the change of variables in (6); however, since $U$ represents all times to maturity, these are indeed traded assets. As shown in Filipović (2001), the Nelson-Siegel model is typically not arbitrage-free; we would therefore like to learn the closest arbitrage-free factor model driven by the same stochastic factors. To this end, given a non-empty and unbounded hypothesis class $\mathcal{H}\subseteq\mathcal{X}$ of plausible alternative models, we optimize
$$\operatorname*{argmin}_{\phi\in\mathcal{H}}\ \ell(\varphi-\phi) \qquad \text{subject to:}\ S_t(\phi_t^u,[[\phi^u]]_t;u)\ \text{is a}\ \mathbb{P}^N\text{-local martingale for all}\ u\in U;$$
where $\mathcal{H}$ is required to contain the (naive) factor model $\varphi$ and $\ell:\mathcal{X}\to[0,\infty)$ is a continuous and coercive loss function. For example, $\ell$ may be taken to be the norm on $\mathcal{X}$. Geometrically, (8) describes a projection of $\varphi$ onto the (possibly non-convex) subset of $\mathcal{H}$ of factor models making each $S_t(\phi_t^u,[[\phi^u]]_t;u)$ into a $\mathbb{P}^N$-local martingale for every $u\in U$. The requirement that $\mathcal{H}$ contains the (naive) factor model $\varphi$ is for consistency: it ensures that, for any arbitrage-free factor model $\varphi$, the solution to problem (8) is $\varphi$ itself.
In general, the problem described by (8) may be challenging to implement, as projections onto non-convex sets are intractable. In analogy with the regularization literature, such as Hastie et al. (2015), we instead consider the following relaxation of problem (8), which is more amenable to numerical implementation:
$$\operatorname*{argmin}_{\phi\in\mathcal{H}}\ \ell(\varphi-\phi) + \operatorname{AF}_{\lambda}(\phi);$$
where $\{\operatorname{AF}_\lambda\}_{2\leq\lambda<\infty}$ is a family of functions from $\mathcal{H}$ to $[0,\infty]$ taking the value 0 if each $S_t(\phi_t^u,[[\phi^u]]_t;u)$ is a $\mathbb{P}^N$-local martingale simultaneously for every value of $u$, and $\lambda$ is a meta-parameter determining the emphasis placed on penalizing factor models which fail to meet this requirement. Problem (9) is called the arbitrage-regularization problem.
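Schematically, problem (9) trades a distance-to-the-reference-model term off against an arbitrage penalty. The following minimal Python sketch is an illustration only; `phi_ref`, `phi_model`, and `arb_penalty` are hypothetical placeholders for the concrete objects derived later in the paper, and the discretized squared-error distance is an assumption made here.

```python
import numpy as np
from scipy.optimize import minimize

def arbitrage_regularized_objective(theta, phi_ref, phi_model, arb_penalty, lam, grid):
    """Structure of the penalized objective in problem (9): distance of the candidate
    model phi_model(theta, .) to the reference model phi_ref(.), plus an arbitrage
    penalty weighted through the meta-parameter lam."""
    # Discretized L^p-type distance between reference and candidate factor models (p = 2 here).
    distance = np.mean((phi_ref(grid) - phi_model(theta, grid)) ** 2)
    # arb_penalty(theta, lam) >= 0 and vanishes when the candidate model is arbitrage-free.
    return distance + arb_penalty(theta, lam)

# Usage sketch: theta parameterizes the hypothesis class H (e.g. neural-network weights);
# result = minimize(arbitrage_regularized_objective, theta0,
#                   args=(phi_ref, phi_model, arb_penalty, lam, grid))
```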
At the moment, there are only two available lines of research comparable to the arbitrage-regularization problem. Results of the first kind, such as the arbitrage-free Nelson-Siegel model of Christensen et al. (2011a), provide closed-form, case-by-case arbitrage-free variants of specific models only if they coincide with specific arbitrage-free HJM-type factor models, such as those studied by Björk and Christensen (1999). However, the reliance on analytic methods typically limits this type of approach to simple or specific models and does not allow for a general or computationally viable solution to the problem. Moreover, arbitrage-free corrections derived in this way are not guaranteed to be optimal in the sense of (8), or approximately optimal in the sense of (9). This will be examined further in the numerics section of this paper.
The use of a penalty to capture no-arbitrage conditions has, to the best of the authors' knowledge, thus far only been explored numerically by Chen et al. (2019) within the discrete-time portfolio-optimization setting. A similar problem has been treated in Chen et al. (2006) for learning the equivalent martingale measure in the multinomial-tree setting for stock prices. Our paper provides the first theoretical result in this direction, as well as the first such framework that applies to large financial markets, such as bond markets, and to the continuous-time setting.
Before presenting the main results we first state necessary assumptions.
Assumption 1.
The following assumptions will be maintained throughout this paper.
(i) 
$\beta_t$ is an $\mathbb{R}^d$-valued diffusion process which is the unique strong solution to
$$\beta_t = \beta_0 + \int_0^t\alpha(s,\beta_s)\, ds + \int_0^t\sigma(s,\beta_s)\, dW_s,$$
where $\beta_0\in\mathbb{R}^d$, $W_t$ is an $\mathbb{R}^d$-valued Brownian motion, the components $\alpha_i : \mathbb{R}^{1+d}\to\mathbb{R}$ are continuous, and the components $\sigma_{i,j}$ of $\sigma : \mathbb{R}^{1+d}\to\mathbb{R}^{d\times d}$, $i,j=1,\dots,d$, are measurable and such that the diffusion matrix
$$\sigma(s,\beta)\,\sigma(s,\beta)^{\top}$$
is a continuous function of $\beta$ for any fixed $s\geq 0$.
(ii) 
The stochastic differential equation (10) has a unique $\mathbb{R}^d$-valued solution for each $\beta_0\in\mathbb{R}^d$.
(iii) 
For every $u\in U$, $\{S_t(\cdot,\cdot;u)\}_{t\in[0,\infty)}$ is a non-anticipative functional in $C_b^{1,2}$ verifying the following "predictable-dependence" condition of Fournie (2010):
$$S_t(x_t,v_t;u) = S_t(x_t,v_{t-};u),$$
for all $t\in[0,\infty)$ and all $(x,v)\in D([0,t];\mathbb{R}^d)\times D([0,t];S_d^+)$, where $S_d^+$ is the set of $d\times d$ positive semi-definite matrices with real coefficients.
The central problem of the paper will be addressed in full generality in the next section, before turning to applications to term-structure models.

3. Main Results

In this section, we show the asymptotic equivalence of problems (8) and (9) for general asset classes. This requires the construction of the penalty term $\operatorname{AF}_\lambda$ measuring how far a given factor model is from making the associated price processes $\mathbb{P}^N$-local martingales. The construction of $\operatorname{AF}_\lambda$ is made in two steps. First, a drift condition which guarantees that each $\{S_t(\phi_t^u,[[\phi^u]]_t;u)\}_{u\in U}$ is simultaneously a $\mathbb{P}^N$-local martingale is obtained. This condition generalizes the drift condition of Heath et al. (1992) and provides an analogue to the consistency condition of Filipović and Teichmann (2004). Second, the drift condition is used to produce the penalty term in (9). Subsequently, the optimizers of (9) will be used to asymptotically solve problem (8).
Proposition 1
(Drift Condition). The processes $S_t(\phi_t^u,[[\phi^u]]_t;u)$ are $\mathbb{P}^N$-local martingales, for each $u\in U$ simultaneously, if and only if
$$\mathcal{D}S_s\!\left(\phi_s^u,[[\phi^u]]_s;u\right) = -\nabla S_s\!\left(\phi_s^u,[[\phi^u]]_s;u\right)\left[\frac{\partial\phi}{\partial t}(s,\beta_s,u) + \sum_{i=1}^d\frac{\partial\phi}{\partial\beta_i}(s,\beta_s,u)\,\alpha_i(s,\beta_s) + \frac{1}{2}\sum_{i,j=1}^d\frac{\partial^2\phi}{\partial\beta_i\partial\beta_j}(s,\beta_s,u)\,\sigma_i(s,\beta_s)\,\sigma_j(s,\beta_s)\right] - \frac{1}{2}\left[\nabla^2 S_s\!\left(\phi_s^u,[[\phi^u]]_s;u\right)\right]\left\|\sum_{i=1}^d\frac{\partial\phi}{\partial\beta_i}(s,\beta_s,u)\,\sigma_i(s,\beta_s)\right\|^2$$
is satisfied $\mathbb{P}^N$-a.s. for every $s\in[0,\infty)$ and every $u\in U$, where $\mathcal{D}$ and $\nabla$ respectively denote the horizontal and vertical derivatives of Fournie (2010) (see Appendix A.2, Equation (A3)).
The drift condition in Proposition 1 implies that if $\phi$ is such that the difference of the left- and right-hand sides of (11) is equal to 0, $\mathbb{P}^N$-a.s. for all $u\in U$, then $S_t(\phi_t^u,[[\phi^u]]_t;u)$ is a $\mathbb{P}^N$-local martingale simultaneously for all $u\in U$. Thus, $S_t(\phi_t^u,[[\phi^u]]_t;u)$ is simultaneously a $\mathbb{P}^N$-local martingale for all $u\in U$ if, for every $u\in U$, the $[0,\infty)$-valued process $\underline{\Lambda}_t^u(\phi)$ is equal to 0 $\mathbb{P}^N$-a.s., where $\underline{\Lambda}_t^u(\phi)$ is defined using (11) by
$$\underline{\Lambda}_t^u(\phi) \triangleq \left|\,\mathcal{D}S_t\!\left(\phi_t^u,[[\phi^u]]_t;u\right) + \nabla S_t\!\left(\phi_t^u,[[\phi^u]]_t;u\right)\left[\frac{\partial\phi}{\partial t}(t,\beta_t,u) + \sum_{i=1}^d\frac{\partial\phi}{\partial\beta_i}(t,\beta_t,u)\,\alpha_i(t,\beta_t) + \frac{1}{2}\sum_{i,j=1}^d\frac{\partial^2\phi}{\partial\beta_i\partial\beta_j}(t,\beta_t,u)\,\sigma_i(t,\beta_t)\,\sigma_j(t,\beta_t)\right] + \frac{1}{2}\left[\nabla^2 S_t\!\left(\phi_t^u,[[\phi^u]]_t;u\right)\right]\left\|\sum_{i=1}^d\frac{\partial\phi}{\partial\beta_i}(t,\beta_t,u)\,\sigma_i(t,\beta_t)\right\|^2\,\right|,$$
where $\phi_t^u = \phi(t,\beta_t,u)$. The arbitrage-penalty is defined as follows.
Definition 1
(Arbitrage-Penalty). Let $\{(\Lambda_t^u(\phi))_{t\geq 0}\}_{\phi\in\mathcal{H};\, u\in U}$ be a family of $\mathcal{F}_t$-adapted, $[0,\infty)$-valued stochastic processes for which
$$\Lambda_t^u(\phi)(\omega) = 0 \iff \underline{\Lambda}_t^u(\phi)(\omega) = 0$$
holds for all $\phi\in\mathcal{H}$, $t\in I$, $u\in U$, and $\mathbb{P}^N$-almost every $\omega\in\Omega$. Then the family $\{\operatorname{AF}_\lambda\}_{\lambda\geq 0}$ of functions
$$\operatorname{AF}_\lambda : \mathcal{X}\to[0,\infty], \qquad \phi\mapsto \lambda\left(\int_{(t,u)\in I\times U}\left|\Lambda_t^u(\phi)\right|^{\lambda}\, d\mu(t,u)\right)^{\frac{1}{\lambda}}$$
is said to define an arbitrage-penalty.
Remark 1.
Whenever $|\Lambda_t^u(\phi)|^{\lambda}$ fails to be integrable, we adopt the convention that $\left(\int_{(t,u)\in I\times U}|\Lambda_t^u(\phi)|^{\lambda}\,d\mu(t,u)\right)^{\frac{1}{\lambda}} = \infty$.
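As an illustration (not the authors' implementation), $\operatorname{AF}_\lambda$ can be approximated by replacing the integral with a weighted sum over a finite $(t,u)$-grid; the grid size and quadrature weights below are assumptions.

```python
import numpy as np

def arbitrage_penalty(lambda_violation, lam, weights):
    """Approximate AF_lambda(phi) = lam * ( integral |Lambda_t^u(phi)|^lam dmu )^(1/lam)
    by a weighted sum over a (t, u)-grid.

    lambda_violation : array of |Lambda_t^u(phi)| values on the grid
    weights          : quadrature weights approximating the measure mu on the grid
    """
    integral = np.sum(weights * np.abs(lambda_violation) ** lam)
    return lam * integral ** (1.0 / lam)

# Usage sketch: for a model that is exactly arbitrage-free the violations vanish,
# and so does the penalty.
print(arbitrage_penalty(np.zeros(100), lam=2.0, weights=np.full(100, 0.01)))  # 0.0
```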
The convergence of the optimizers of (9) to the optimizers of (8) is demonstrated in the next theorem. The proof relies on the theory of $\Gamma$-convergence, which is useful for interchanging limit and arginf operations.
Assumption 2.
Assume that
(i) 
For every $(t,u)\in I\times U$ and $\mathbb{P}^N$-a.e. $\omega\in\Omega$, the function $\phi\mapsto\Lambda_t^u(\phi)(\omega)$ is continuous on $\mathcal{H}$,
(ii) 
$\left\{\phi\in\mathcal{H} : (\forall u\in U)\ S_t(\phi_t^u,[[\phi^u]]_t;u)\ \text{is a}\ \mathbb{P}^N\text{-local martingale}\right\}\subseteq\mathcal{H}$ is closed and non-empty.
Note that both statements (i) and (ii) are with respect to the relative topology on $\mathcal{H}$.
Theorem 1.
Under Assumption 2 the following hold:
(i) 
Problem (8) admits a minimizer on $\mathcal{H}$,
(ii) 
$$\lim_{\lambda\to\infty;\,\lambda\geq 2}\ \inf_{\phi\in\mathcal{H}}\ \ell(\varphi-\phi)+\operatorname{AF}_\lambda(\phi) \;=\; \min_{\phi\in\mathcal{H}}\ \ell(\varphi-\phi)+\iota_{\mathcal{H}}(\phi),$$
(iii) 
If, for every $\lambda\geq 2$, $\operatorname{AF}_\lambda$ is lower semi-continuous on $\mathcal{H}$, then
$$\lim_{\lambda\to\infty;\,\lambda\geq 2}\ \operatorname*{argmin}_{\phi\in\mathcal{H}}\ \ell(\varphi-\phi)+\operatorname{AF}_\lambda(\phi) \;\subseteq\; \operatorname*{argmin}_{\phi\in\mathcal{H}}\ \ell(\varphi-\phi)+\iota_{\mathcal{H}}(\phi),$$
where $\iota_{\mathcal{H}}$ is defined on $\mathcal{H}$ as
$$\iota_{\mathcal{H}}(\phi) \triangleq \begin{cases} 0 & \text{if } (\forall u\in U)\ S_t(\phi_t^u,[[\phi^u]]_t;u)\ \text{is a } \mathbb{P}^N\text{-local martingale},\\ \infty & \text{otherwise.}\end{cases}$$
Theorem 1 provides a theoretical means of asymptotically computing the optimizer $\hat\phi$ of problem (8). In practice, this limit cannot always be computed and only very large values of $\lambda$ can be used. However, in reality trading does not occur in a frictionless market; rather, every transaction placed at time $t$ incurs a cost $k_t > 0$. Moreover, only a finite number of assets are traded.
Consider a market with frictions in which only finitely many assets are traded. In this setting, an admissible strategy is an adapted, left-continuous, finite-variation process $\theta_t\in\mathbb{R}^n$ whose corresponding wealth process is $\mathbb{P}$-a.s. bounded below. In the context of this paper, the sub-market $\left\{S_t(\phi_t^{\lambda,u_i},[[\phi^{\lambda,u_i}]]_t;u_i)\right\}_{i=1}^n$ with proportional transaction cost $k_t>0$ is precisely such a market. Any such admissible strategy on this finite sub-market defines an admissible portfolio whose liquidation value, as defined by (Guasoni 2006, Equation 2.2) and (Guasoni 2006, Remark 2.4), is given by
$$V(\theta_t) = \sum_{i=1}^n\left(\int_0^t \theta_s^i\, dS_s\!\left(\phi_s^{\lambda,u_i},[[\phi^{\lambda,u_i}]]_s;u_i\right) - \int_0^t k_s\, S_s\!\left(\phi_s^{\lambda,u_i},[[\phi^{\lambda,u_i}]]_s;u_i\right)\, d|D\theta^i|_s - k_t\, S_t\!\left(\phi_t^{\lambda,u_i},[[\phi^{\lambda,u_i}]]_t;u_i\right)|\theta_t^i|\right),$$
where $\phi^{\lambda}$ denotes the optimizer of (9) for a fixed value of $2\leq\lambda<\infty$, $\phi_t^{\lambda,u}\triangleq\phi^{\lambda}(t,\beta_t,u)$, $D\theta^i$ denotes the weak derivative of $\theta^i$ in the sense of measures, and $|D\theta^i|$ denotes its variation. The first term on the right-hand side of (17) represents the capital gains from trading, the second represents the cost incurred from transaction costs, and the last term represents the cost of instantaneous liquidation at time $t$. Although more general transaction costs may be considered, the proportional transaction costs presented here are sufficient for the formulation of the next result.
The next result guarantees that the market model $\phi^{\lambda}$ is arbitrage-free, provided that $k_t$ is large enough to cover the spread between $S_t(\phi_t^{\lambda,u_i},[[\phi^{\lambda,u_i}]]_t;u_i)$ and $S_t(\hat\phi_t^{u_i},[[\hat\phi^{u_i}]]_t;u_i)$. The following assumption quantifies the requirement that $\lambda$ be taken sufficiently large.
Assumption 3.
There exist some $0 < m < M$ and some $2 < \lambda$ such that, for every $0\leq t$, every positive integer $n$, and every $u_1,\dots,u_n\in U$, the following hold:
(i) 
$$\sup_{0\leq t}\ \max_{i=1,\dots,n}\ \operatorname{ess\,sup}\ \left| S_t\!\left(\hat\phi_t^{u_i},[[\hat\phi^{u_i}]]_t;u_i\right) - S_t\!\left(\phi_t^{\lambda,u_i},[[\phi^{\lambda,u_i}]]_t;u_i\right)\right| < M,$$
(ii) 
$$m < \inf_{0\leq t}\ \inf_{i=1,\dots,n}\ \operatorname{ess\,inf}\ S_t\!\left(\phi_t^{\lambda,u_i},[[\phi^{\lambda,u_i}]]_t;u_i\right).$$
Proposition 2.
If $m\,k_t \geq M$ for all times $0\leq t$, then for any admissible strategy $\theta$ trading $S_t(\phi_t^{\lambda,u_1},[[\phi^{\lambda,u_1}]]_t;u_1),\dots,S_t(\phi_t^{\lambda,u_n},[[\phi^{\lambda,u_n}]]_t;u_n)$, $\mathbb{P}\!\left(0\leq V_T(\theta)\right)=1$ implies that $\mathbb{P}\!\left(V_T(\theta)=0\right)=1$.
In the next section, we apply Theorem 1 and the arbitrage-regularization (9) to the bond market.

4. Arbitrage-Regularization for Bond Pricing

As discussed in Diebold and Rudebusch (2013), affine term-structure models are commonly used in forward-rate curve modelling due to their tractability and interpretability. In the formulation of Björk and Christensen (1999), as further developed in Filipović (2000); Filipović et al. (2010), affine term-structure models are characterized by (7) together with the additional requirement that the stochastic factor process $\beta_t$ follows an affine diffusion. By Cuchiero (2011), this means that the dynamics of $\beta_t$ are given by
$$\alpha(t,\beta) \triangleq \gamma + \sum_{k=1}^d\beta_k\,\gamma_k, \qquad \sigma(t,\beta)\,\sigma(t,\beta)^{\top} \triangleq \zeta + \sum_{k=1}^d\beta_k\,\zeta_k,$$
for some $d\times d$ matrices $\zeta$ and $\{\zeta_k\}_{k=1}^d$, and some vectors $\gamma$ and $\{\gamma_k\}_{k=1}^d$ in $\mathbb{R}^d$, such that there exists a solution $(L,R)$ to the following Riccati system
$$2\,\partial_t L(t,x) = R(t,x)^{\top}\gamma + R(t,x)^{\top}\zeta\, R(t,x), \qquad L(0,x) = 0,$$
$$2\,\partial_t R_k(t,x) = R(t,x)^{\top}\gamma_k + R(t,x)^{\top}\zeta_k\, R(t,x), \qquad R(0,x) = x, \qquad 1\leq k\leq d,$$
such that $L(t,x) + R(t,x)^{\top}\beta$ has negative real part for all $t\geq 0$, $x\in\mathbb{R}^d$, $\beta\in\mathbb{R}^d$, where $R(t,x) = (R_1(t,x),\dots,R_d(t,x))$.
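As a purely numerical illustration (not part of the paper), a Riccati system of this type can be integrated with a standard ODE solver. The right-hand side below follows the coefficient convention of the reconstruction above, and all parameter values and dimensions are placeholders.

```python
import numpy as np
from scipy.integrate import solve_ivp

d = 2
gamma  = np.array([0.02, 0.01])                     # drift vectors (placeholders)
gammas = [np.zeros(d), np.zeros(d)]
zeta   = 0.04 * np.eye(d)                           # diffusion matrices (placeholders)
zetas  = [np.zeros((d, d)), np.zeros((d, d))]

def riccati_rhs(t, y):
    """y = (L, R_1, ..., R_d); integrates the assumed convention
    2 dL/dt   = R' gamma   + R' zeta   R,
    2 dR_k/dt = R' gamma_k + R' zeta_k R."""
    R = y[1:]
    dL = 0.5 * (R @ gamma + R @ zeta @ R)
    dR = [0.5 * (R @ gammas[k] + R @ zetas[k] @ R) for k in range(d)]
    return [dL, *dR]

x0 = np.array([1.0, 0.5])                           # initial condition R(0, x) = x
sol = solve_ivp(riccati_rhs, (0.0, 10.0), [0.0, *x0], dense_output=True)
print(sol.y[:, -1])                                 # (L, R) at t = 10
```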
Fix meta-parameters $p,\kappa\geq 1$ and $U=[0,\infty)$. For the next result, all factor models will be taken to belong to the weighted Sobolev space $W_w^{p,k}(I\times\mathbb{R}^d\times U)$ with weight function
$$w(t,\beta,u) \triangleq C\, e^{-|t|-\|\beta\|^{\kappa}-|u|^{\kappa}},$$
where $C$ is the unique constant ensuring that $1\in W_w^{p,k}$ and its weighted integral is equal to 1. Fix the measures $\nu(dt) = e^{-t}\,dt$ on $[0,\infty)$ and $\mu(d\beta\, du) = c\, e^{-|u|^{\kappa}-\|\beta\|^{\kappa}}\,d\beta\,du$ on $\mathbb{R}^d\times U$. The space $W_w^{p,k}(I\times\mathbb{R}^d\times U)$ is defined as the space of all $\nu\otimes\mu$-locally integrable, $k$-times weakly differentiable functions $f: I\times\mathbb{R}^d\times U\to\mathbb{R}$ equipped with the norm
$$\|f\| \triangleq \int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_{\beta\in\mathbb{R}^d} e^{-|t|-|u|^{\kappa}-\|\beta\|^{\kappa}}\,|f(t,u,\beta)|^p\, dt\, d\beta\, du \;+\; \sum_{1\leq|\eta|\leq k}\ \int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_{\beta\in\mathbb{R}^d} e^{-|t|-|u|^{\kappa}-\|\beta\|^{\kappa}}\,|D^{\eta}f(t,u,\beta)|^p\, dt\, d\beta\, du,$$
where $\eta\triangleq(\eta_1,\dots,\eta_d)$ is a multi-index, $\eta_i\in\{0,1,\dots\}$, $|\eta| = \sum_{i=1}^d\eta_i$, and $D^{\eta}f$ is the weak derivative of $f$ of order $\eta$, defined by
$$\int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_{\beta\in\mathbb{R}^d} f\,(D^{\eta}g)\, d\beta\, dt\, du = (-1)^{|\eta|}\int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_{\beta\in\mathbb{R}^d} (D^{\eta}f)\, g\, d\beta\, dt\, du \qquad \forall\, g\in C_0^{\infty}(I\times\mathbb{R}^d\times U).$$
Here, $C_0^{\infty}(I\times\mathbb{R}^d\times U)$ is the space of all compactly supported functions with infinitely many derivatives. Furthermore, $k$ is required to satisfy
$$k \geq \frac{1+d+D}{p} + 2.$$
Remark 2.
In the case where $p=2$ and $k\geq 1$, the Sobolev space $W_w^{p,k}$ is a reproducing kernel Hilbert space (see Nelson and Siegel (1987)); therefore point evaluation is a continuous linear functional and, by the (weighted) Morrey-Sobolev theorem of Brown and Opic (1992), it can be embedded within a space of continuous functions. Therefore, given any $\phi\in W_w^{p,k}$ and any $\mathbb{R}^d$-valued stochastic process $\beta_t$, the process $(\phi(t,\beta_t,\cdot))_{t\geq 0}$ is a well-defined process in the following space of forward-rate curves of Filipović (2001): $\left\{h\in L^1_{loc}([0,\infty)) : h'\in L^1_{loc}([0,\infty))\ \text{and}\ |h(0)|^2 + \int_0^{\infty}|h'(u)|^2\, e^{|u|^{\kappa}}\, du < \infty\right\}$.
Analytic tractability is ensured by requiring that the factor models considered for the arbitrage-regularization (9) belong to the class $\mathcal{H}$ defined by
$$\mathcal{H} \triangleq \left\{\phi\in W_w^{p,k} : \left(\exists\,\phi_0,\dots,\phi_d\in W_{\tilde w}^{p}(U)\right)\ \phi(t,\beta,u) = \phi_0(u-t) + \sum_{i=1}^d\beta_i\,\phi_i(u-t)\right\},$$
where $\tilde w(u) = e^{-|u|^{\kappa}}$. This class of functions generalizes the Nelson-Siegel family (7) discussed in the introduction.
Under these conditions, the following theorem characterizes the asymptotic behavior of (9) in $\lambda$ as solving problem (8), for fixed meta-parameters $p,\kappa\geq 1$. Following Filipović (2001), it will be convenient to denote
$$\Phi_i(u) \triangleq \int_0^u \phi_i(s)\, ds.$$
Theorem 2.
Let $\varphi$ be given by (7), $\beta_t$ be as in (18), and fix $p,\kappa\geq 1$. Then:
(i) 
For every $\lambda\in[2,\infty)$ there exists an element $\phi^{\lambda}\in\mathcal{H}$ minimizing
$$\int_0^{\infty}\!\!\int_0^{u}\!\!\int_{\beta\in\mathbb{R}^d} e^{-|t|-|u|^{\kappa}-\|\beta\|^{\kappa}}\,\big|\varphi(u-t,\beta)-\phi(u-t,\beta)\big|^p\, d\beta\, dt\, du \;+\; \lambda\,\Gamma\!\left(1+\tfrac{1}{\kappa}\right)^{-\frac{1}{\lambda}}\left(\int_0^{\infty} e^{-|u|^{\kappa}}\,\Lambda^u(\phi)^{\lambda}\, du\right)^{\frac{1}{\lambda}},$$
where $\Lambda^u(\phi)$ is defined by
$$\Lambda^u(\phi) \triangleq \left|\phi_0(0) - \partial_u\Phi_0(u) + \sum_{i=1}^d\gamma_i\,\Phi_i(u) - \frac{1}{2}\sum_{i,j=1}^d\zeta_{i,j}\,\Phi_i(u)\,\Phi_j(u)\right|^p + \sum_{k=1}^d\left|\phi_k(0) - \partial_u\Phi_k(u) + \sum_{i=1}^d\gamma_{k;i}\,\Phi_i(u) - \frac{1}{2}\sum_{i,j=1}^d\zeta_{k;i,j}\,\Phi_i(u)\,\Phi_j(u)\right|^p,$$
where $(\zeta_{i,j})_{i,j=1}^d = \zeta$, $(\zeta_{k;i,j})_{i,j=1}^d = \zeta_k$, $(\gamma_i)_{i=1}^d = \gamma$, and $(\gamma_{k;i})_{i=1}^d = \gamma_k$, for $k=1,\dots,d$.
(ii) 
The following inclusion holds:
$$\lim_{\lambda\to\infty;\,\lambda\geq 2}\ \phi^{\lambda} \;\in\; \operatorname*{argmin}_{\phi\in\mathcal{H}}\ \int_0^{\infty}\!\!\int_0^{u}\!\!\int_{\beta\in\mathbb{R}^d} e^{-|t|-|u|^{\kappa}-\|\beta\|^{\kappa}}\,\big|\varphi(u,\beta)-\phi(u,\beta)\big|^p\, d\beta\, dt\, du + \iota_{\mathcal{H}}(\phi),$$
where $\iota_{\mathcal{H}}$ is as in (16).
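For illustration (not the authors' code), the maturity-wise penalty term $\Lambda^u$ of Theorem 2 can be evaluated numerically once factor loadings and affine coefficients are supplied; the callables, coefficient containers, and quadrature choices below are assumptions made here.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def affine_arbitrage_violation(u_grid, phis, gamma, gammas, zeta, zetas, p=2):
    """Maturity-wise violation Lambda^u of the affine drift condition in Theorem 2.

    phis   : list of d+1 callables [phi_0, ..., phi_d] (factor loadings)
    gamma  : (d,) vector, gammas[k] : (d,) vectors
    zeta   : (d, d) matrix, zetas[k] : (d, d) matrices
    """
    d = len(phis) - 1
    # Antiderivatives Phi_i(u) = int_0^u phi_i(s) ds; d/du Phi_i(u) = phi_i(u).
    Phi = [cumulative_trapezoid(phi(u_grid), u_grid, initial=0.0) for phi in phis]
    dPhi = [phi(u_grid) for phi in phis]

    def block(phi_at_0, dPhi_i, gam, zet):
        quad = sum(zet[i, j] * Phi[1 + i] * Phi[1 + j] for i in range(d) for j in range(d))
        lin = sum(gam[i] * Phi[1 + i] for i in range(d))
        return np.abs(phi_at_0 - dPhi_i + lin - 0.5 * quad) ** p

    lam = block(phis[0](0.0), dPhi[0], gamma, zeta)
    for k in range(d):
        lam += block(phis[1 + k](0.0), dPhi[1 + k], gammas[k], zetas[k])
    return lam   # array of Lambda^u values over u_grid

# Usage sketch: lam = affine_arbitrage_violation(np.linspace(0, 30, 301), phis, ...)
```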
It is convenient to understand $\phi^{\lambda}$ as a function of $\lambda$ when interpreting approximations of the limit (26). The following result removes the challenges posed by the unbounded interval $[2,\infty)$, in which $\lambda$ lies, by reparameterizing problem (23) with a bounded meta-parameter $\tilde\lambda$.
Corollary 1.
Let $\varphi$ be given by (7), $\beta_t$ be as in (18), $\phi$ be in $\mathcal{H}$, and fix $p,\kappa\geq 1$. For every $\lambda\in[2,\infty)$, define $\tilde\lambda \triangleq \frac{\lambda}{1+\lambda}\in\left[\frac{2}{3},1\right)$. Then $\phi^{\lambda}\in\mathcal{H}$ minimizes (23) if and only if it minimizes
$$(1-\tilde\lambda)\int_0^{\infty}\!\!\int_0^{u}\!\!\int_{\beta\in\mathbb{R}^d} e^{-|u|^{\kappa}-\|\beta\|^{\kappa}}\,\big|\varphi(u,\beta)-\phi(u,\beta)\big|^p\, d\beta\, dt\, du + \tilde\lambda\,\Gamma\!\left(1+\tfrac{1}{\kappa}\right)^{-\frac{1-\tilde\lambda}{\tilde\lambda}}\left(\int_0^{\infty} e^{-|u|^{\kappa}}\,\Lambda^u(\phi)^{\frac{\tilde\lambda}{1-\tilde\lambda}}\, du\right)^{\frac{1-\tilde\lambda}{\tilde\lambda}},$$
where $\Lambda^u(\phi)$ is as in (24). In particular, the following inclusion holds:
$$\lim_{\tilde\lambda\uparrow 1;\,\tilde\lambda\geq\frac{2}{3}}\ \phi^{\frac{\tilde\lambda}{1-\tilde\lambda}} \;\in\; \operatorname*{argmin}_{\phi\in\mathcal{H}}\ \int_0^{\infty}\!\!\int_0^{u}\!\!\int_{\beta\in\mathbb{R}^d} e^{-|u|^{\kappa}-\|\beta\|^{\kappa}}\,\big|\varphi(u,\beta)-\phi(u,\beta)\big|^p\, d\beta\, dt\, du + \iota_{\mathcal{H}}(\phi),$$
where $\iota_{\mathcal{H}}$ is as in (16).
Next, the arbitrage-regularization of forward-rate curves will be considered using deep learning methods.

4.1. A Deep Learning Approach to Arbitrage-Regularization

The flexibility of feed-forward neural networks (ffNNs), as described in the universal approximation theorems of Hornik (1991); Kratsios (2019b), makes the collection of ffNNs a well-suited class of alternative models for the arbitrage-regularization problem. In the context of this paper, an ffNN is any function from $\mathbb{R}^d$ to $\mathbb{R}^{d+1}$ of the form
$$W_{N+1}\circ\rho\bullet W_N\circ\cdots\circ\rho\bullet W_1,$$
where each $W_i(x) = A_i x + b_i$ for some $d_{i+1}\times d_i$-dimensional matrix $A_i$ and some $b_i\in\mathbb{R}^{d_{i+1}}$, with $d_1 = d$ and $d_{N+1} = d+1$, $\rho$ is a smooth activation function, and $\bullet$ denotes component-wise composition. Fix integers $N>1$, $h>1$, and $d>0$. The set of all feed-forward neural networks with $d_i = h$ for $1<i\leq N$, $d_{N+1} = d+1$, and fixed activation function $\rho$ will be denoted by $\mathcal{NN}_{N,h,d+1}^{\rho}$.
To maintain analytic tractability, it will be required that our hypothesis class $\mathcal{H}$ consists of all $\phi\in W_w^{p,k}(I\times\mathbb{R}^d\times U)$ of the form
$$\phi(t,\beta,u) = \underline{\beta}^{\top}\,\rho\bullet W_N\circ\cdots\circ\rho\bullet W_1(u-t),$$
where $\underline{\beta}_1 = 1$, $\underline{\beta}_{i+1} = \beta_i$ for all $i\geq 1$, and $\beta^{\top}$ denotes the transpose of $\beta$. The process $\beta_t$ will be assumed to be a $d$-dimensional Ornstein-Uhlenbeck process and, in particular, will be of the form (18). Therefore, the special class of models we consider here is of the form (7).
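A minimal sketch (illustrative only; layer sizes, activation, and the random weights are assumptions) of a neural factor model of the form (29), in which the curve shapes are produced by a small feed-forward network $f$ and combined linearly with the stochastic factors:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_ffnn(widths, rho=np.tanh, rng=rng):
    """Feed-forward network x -> W_{N+1}(rho(W_N(...rho(W_1(x))...))) with random weights."""
    layers = [(rng.normal(size=(m, n)) / np.sqrt(n), rng.normal(size=m))
              for n, m in zip(widths[:-1], widths[1:])]

    def f(x):
        h = np.atleast_1d(np.asarray(x, dtype=float))
        for i, (A, b) in enumerate(layers):
            h = A @ h + b
            if i < len(layers) - 1:        # activation on all but the output layer
                h = rho(h)
        return h
    return f

d = 3
f = make_ffnn([1, 32, 32, d + 1])          # f : R -> R^{d+1}

def phi(t, beta, u):
    """Neural factor model phi(t, beta, u) = beta_underline^T f(u - t), beta_underline = (1, beta)."""
    beta_u = np.concatenate(([1.0], beta))
    return beta_u @ f(u - t)

print(phi(0.0, np.array([0.01, -0.02, 0.005]), 5.0))
```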
It has been shown in Rahimi and Recht (2008), among others, that if a network is appropriately designed, then training only the final layer, with the matrices $W_N,\dots,W_1$ suitably initialized, performs comparably to networks with all layers trained. More recently, the approximation capabilities of neural networks with randomized first layers have been established in Gonon (2020). This phenomenon has been observed in numerous numerical studies, such as Jaeger and Haas (2004), where the entries of the matrices $W_N,\dots,W_1$ are chosen entirely randomly. This practice has also become fundamental to feasible implementations of recurrent neural network (RNN) theory and reservoir computing, as studied in Gelenbe (1989), where training speed becomes a key factor in determining the feasibility of the RNN and reservoir-computing paradigms.
The hypothesis class of alternative factor models to be considered in the arbitrage-regularization problem effectively reduces from (28) to
$$\phi(t,\beta,u) = \underline{\beta}^{\top} f(u-t),$$
where $\underline{\beta}\in\mathbb{R}^{d+1}$ and $f\in\mathcal{NN}_{N,h,d+1}^{\rho}$ is initialized through
$$(\hat\beta,\hat f) \in \operatorname*{argmin}_{\beta\in\mathbb{R}^{d+1},\, f\in\mathcal{NN}_{N,h,d+1}^{\rho}}\ \sum_{j=1}^{J}\left|\beta^{\top}f(u_j) - \phi_i(u_j)\right|^p e^{-|u_j|^{\kappa}},$$
where $\phi$ is a given factor model of the form (21), and $\{u_j\}_{j=1}^{J}$ is a uniform random sample on a non-empty compact subset of $U$, with $J>0$. Thus, the optimization problem (30) is random since it relies on the randomly generated data points $\{u_j\}_{j=1}^{J}$. However, instead of initializing $\hat f$ in an ad hoc random manner, the initialization (30) guarantees that the shapes generated by (29) are close to those produced by the naive factor model (7). In this case, a brief computation shows that $\Lambda_t^u(\phi)$ simplifies to
$$\Lambda^u(\beta) \triangleq \left|\varphi_0(0) - \beta^0 f(u) + \sum_{i=1}^d\gamma_i\,\beta^i F(u) - \frac{1}{2}\sum_{i,j=1}^d\zeta_{i,j}\,F(u)\,\beta^i\,\beta^j F(u)\right|^p + \sum_{k=1}^d\left|\varphi_k(0) - \beta^k f(u) + \sum_{i=1}^d\gamma_{k;i}\,\beta^i F(u) - \frac{1}{2}\sum_{i,j=1}^d\zeta_{k;i,j}\,F(u)\,\beta^i\,\beta^j F(u)\right|^p,$$
where $F(u) \triangleq \int_0^u f(s)\, ds$, with the integration defined component-wise, and $\beta^i$ denotes the $i$-th entry of the vector $\beta$.
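As a simplified illustration of the shape-matching initialization (30), the sketch below trains only the final linear readout over randomly initialized hidden layers (in the spirit of the random-feature results cited above) by weighted least squares; this is not the authors' implementation, and the reference Nelson-Siegel-type loadings, weight function, sample sizes, and layer sizes are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Reference loadings (placeholders) whose shapes the network is matched to.
tau = 1.5
ref_loadings = [lambda x: np.ones_like(x),
                lambda x: np.exp(-x / tau),
                lambda x: (x / tau) * np.exp(-x / tau)]

# Random sample of times-to-maturity on a compact set, weighted by e^{-|u|^kappa}.
J, kappa = 200, 1.0
u = rng.uniform(0.0, 30.0, size=J)
w = np.exp(-np.abs(u) ** kappa)

# Random (untrained) hidden layers producing features, then a trained linear readout.
h = 64
A1, b1 = rng.normal(size=(h, 1)), rng.normal(size=h)
A2, b2 = rng.normal(size=(h, h)) / np.sqrt(h), rng.normal(size=h)
features = np.tanh(A2 @ np.tanh(A1 @ u[None, :] + b1[:, None]) + b2[:, None])   # (h, J)

# Weighted least squares: readout W so that W^T @ features matches the reference shapes.
targets = np.stack([g(u) for g in ref_loadings])                                 # (3, J)
sw = np.sqrt(w)
W, *_ = np.linalg.lstsq((features * sw).T, (targets * sw).T, rcond=None)
f = lambda x: W.T @ np.tanh(A2 @ np.tanh(A1 @ np.atleast_1d(x)[None, :] + b1[:, None]) + b2[:, None])
print(f(5.0).ravel())   # network shape functions evaluated at u - t = 5
```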

4.2. Numerical Implementations

The data set for this implementation consists of German bond data for 31 maturities, with observations obtained on 1273 trading days from 4 January 2010 to 30 December 2014. Following common practice in machine learning, further details of our code and implementation can be found in Kratsios (2019a). The code is relatively flexible and could be adapted to other bond data sets.
The arbitrage-regularization methodology will now be applied to two factor models of affine type and its performance evaluated numerically. The first factor model is the commonly used dynamic Nelson-Siegel model of Diebold and Rudebusch (2013) and the second is a machine learning extension of the classical PCA approach to term-structure modelling. The performance of the arbitrage-regularization for each model will be benchmarked against both the original factor models and against the HJM extension of the Vasiček model. The Vasiček model is a natural benchmark since, as shown in Björk and Christensen (1999), it is consistent with a low-dimensional factor model. Therefore, each of the factor models contains roughly the same number of driving factors, which ensures that the comparisons are fair. Moreover, the numéraire process $N_t$ will be taken to be the money-market account and we take $\mathbb{P}^{\star} = \mathbb{P}$. The meta-parameter $\lambda$ is taken to be $1 - 10^{-4}$, so that it is approximately 1.
As described in (29)–(31), the solution to the arbitrage-regularization (9) will be numerically approximated using randomly initialized deep feed-forward neural networks. The initialization network $f$ of (29) is selected to have fixed depth $N = 5$ and fixed height $d_i = 10^2$, and its weights are learned using the ADAM algorithm. The meta-parameters $p = 2$ and $\kappa = 1$ are chosen empirically, and the parameters of the Ornstein-Uhlenbeck process are estimated using maximum likelihood. Once the model parameters and the factor model optimizing (9) have been learned, day-ahead predictions of the stochastic factors are obtained through Kalman-filter estimates of the hidden factors $\beta_t$ for each of the factor models. In the case of the Vasiček model, the unobservable short-rate parameter is also estimated using the Kalman filter (see Bain and Crisan (2009)). These day-ahead predictions are then fed into the factor model and used to compute the next-day bond prices. These predictions are then compared to the realized next-day bond prices.
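As an illustration of the day-ahead prediction step (not the authors' code), the following sketch runs a textbook Kalman filter for a discretized Ornstein-Uhlenbeck factor process; the linear observation model and all matrices are placeholder assumptions.

```python
import numpy as np

def kalman_day_ahead(y, H, A, c, Q, R, x0, P0):
    """One-step-ahead factor predictions for a linear Gaussian state space
        x_{t+1} = c + A x_t + w_t,  w_t ~ N(0, Q)   (discretized OU factors)
        y_t     = H x_t + v_t,      v_t ~ N(0, R)   (observations, assumed linear in factors)
    Returns the sequence of predicted factor means E[x_{t+1} | y_1..y_t]."""
    x, P = x0.copy(), P0.copy()
    preds = []
    for yt in y:
        # Update with today's observation.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.solve(S, np.eye(S.shape[0]))
        x = x + K @ (yt - H @ x)
        P = P - K @ H @ P
        # Predict tomorrow's factors.
        x = c + A @ x
        P = A @ P @ A.T + Q
        preds.append(x.copy())
    return np.array(preds)

# Usage sketch (placeholder dimensions): 3 factors observed through 31 maturities.
# preds = kalman_day_ahead(yields, H, A, c, Q, R, x0=np.zeros(3), P0=np.eye(3))
```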

4.2.1. Model 1: The Dynamic Nelson-Siegel Model (Practitioner Model)

The Nelson-Siegel family is a low-dimensional family of forward-rate curve models used by various central banks to produce forward-rate or yield curves. As discussed in Carmona (2014), Finland, Italy, and Spain are such examples with other countries such as Canada, Belgium, and France relying on a slight extension of this model. The Nelson-Siegel model’s popularity is largely due to its interpretable factors and satisfactory empirical performance. It is defined by
$$\varphi(t,\beta,u) \triangleq \beta_1 + \beta_2\, e^{-\frac{u-t}{\tau}} + \beta_3\,(u-t)\, e^{-\frac{u-t}{\tau}},$$
where, as discussed in Diebold and Rudebusch (2013), the first factor represents the long-term level of the forward-rate curve, the second represents its slope, the third represents its curvature, and $\tau$ is a shape parameter, typically kept fixed.
Since market conditions are continually changing, the Nelson-Siegel model is typically extended from a static model to a dynamic model by replacing the static choice of $\beta$ with a three-dimensional Ornstein-Uhlenbeck process and fixing the shape parameter $\tau>0$, as in Diebold and Rudebusch (2013). However, as demonstrated in Filipović (2001), the dynamic Nelson-Siegel model does not admit a measure equivalent to $\mathbb{P}^N$ which makes the entire bond market simultaneously into local martingales. It was then shown in Christensen et al. (2011a) that a specific additive perturbation of the Nelson-Siegel family circumvents this problem, but empirically this is observed to come at the cost of reduced predictive accuracy. In our implementation, the parameters of the Ornstein-Uhlenbeck process driving $\beta_t^i$ will be estimated using the maximum-likelihood method described in Meucci (2005).
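For concreteness (illustrative only; all parameter values are placeholders and the Euler discretization is an assumption made here), the dynamic extension replaces the static $\beta$ with a simulated three-dimensional Ornstein-Uhlenbeck path:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_dns(n_days, dt=1/252,
                 theta=np.diag([0.5, 0.8, 1.0]),            # mean-reversion speeds (placeholders)
                 mu=np.array([0.03, -0.02, 0.01]),          # long-run factor means (placeholders)
                 sigma=np.diag([0.005, 0.008, 0.010])):     # factor volatilities (placeholders)
    """Euler-discretized 3D Ornstein-Uhlenbeck factors driving a dynamic Nelson-Siegel curve."""
    beta = np.empty((n_days, 3))
    beta[0] = mu
    for t in range(1, n_days):
        dW = rng.normal(scale=np.sqrt(dt), size=3)
        beta[t] = beta[t-1] + theta @ (mu - beta[t-1]) * dt + sigma @ dW
    return beta

def dns_forward(beta, x, tau=1.5):
    """Forward rate at time-to-maturity x for factors beta = (level, slope, curvature)."""
    return beta[0] + beta[1] * np.exp(-x / tau) + beta[2] * (x / tau) * np.exp(-x / tau)

betas = simulate_dns(250)
x = np.linspace(0.0, 30.0, 31)
print(dns_forward(betas[-1], x)[:5])       # last simulated day's forward curve, first 5 maturities
```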

4.2.2. Model 2: dPCA (Machine-Learning Model)

The dynamic Nelson-Siegel model’s shape has been developed through practitioner experience. The second factor model considered here will be of a different type, with its factors learned algorithmically. As with (32), consider a static three-factor model for the forward-rate curve of the form
$$\varphi(t,\beta,u) \triangleq \sum_{i=1}^3\beta_i\,\phi_i(u-t),$$
where $\phi_1,\dots,\phi_3$ are the first three principal components of the forward-rate curve, calibrated on the first 100 days of data.
Subsequently, a time series for the $\beta_i$ parameters is generated, using the first 100 days of data, where on each day the $\beta_i$ are optimized according to the Elastic-Net (ENET) regression problem of Hastie et al. (2015) defined by
$$\beta_t^{ENET} = \operatorname*{argmin}_{\beta\in\mathbb{R}^3}\ \sum_{t=t_j-100}^{t_j}\ \sum_{k=0}^{K_t}\left(f_t(u_{k,t}) - \sum_{i=1}^3\beta_i\,\phi_i(u_{k,t})\right)^2 + \theta_1\sum_{i=1}^3|\beta_i| + \theta_2\sum_{i=1}^3|\beta_i|^2,$$
on rolling windows consisting of 100 data points, where $\{u_{k,t}\}_{k=0}^{K_t}$ are the available data points on the forward-rate curve at time $t$ and $f_t(u_{k,t})$ denotes the corresponding observed forward rate. The meta-parameters $\theta_1$ and $\theta_2$ are chosen by cross-validation on the first 100 training days and then fixed.
The ENET regression is used due to its factor-selection abilities and computational efficiency. Next, analogously to the dynamic Nelson-Siegel model, an $\mathbb{R}^3$-valued Ornstein-Uhlenbeck process $\hat\beta_t$ is calibrated to the time series $\beta_t^{ENET}$, using the maximum-likelihood methodology outlined in Meucci (2005). These provide the hidden stochastic factors in the dynamic PCA model (33). Thus, the dPCA model is the factor model with stochastic inputs defined by
$$\sum_{i=1}^3\hat\beta_t^i\,\phi_i(u-t).$$
The resulting model differs from the dynamic Nelson-Siegel model in that its factors and dynamics are not chosen by practitioner experience but learned through the data and implicitly encode some path-dependence. However, as with the dynamic Nelson-Siegel model it falls within the scope of Theorem 2.
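The dPCA construction can be sketched in a few lines of Python (illustrative only, not the authors' code): PCA loadings are extracted from an initial training panel and per-day factors are obtained by Elastic-Net regression. Note that sklearn's (alpha, l1_ratio) parameterization differs from the (theta_1, theta_2) penalties above, the per-day fit replaces the rolling-window objective for brevity, and `forward_curve_panel` is a hypothetical days-by-maturities array.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet

def fit_dpca_factors(curves, n_train=100, n_comp=3, alpha=1e-3, l1_ratio=0.5):
    """dPCA sketch: PCA loadings from the first n_train days of forward-rate curves
    (days x maturities), then Elastic-Net factors for each day."""
    pca = PCA(n_components=n_comp).fit(curves[:n_train])
    loadings = pca.components_                     # (n_comp, n_maturities)
    enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
    betas = np.array([enet.fit(loadings.T, day_curve).coef_ for day_curve in curves])
    return loadings, betas                         # betas feed the OU calibration / Kalman filter

# Usage sketch: loadings, betas = fit_dpca_factors(forward_curve_panel)
```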

5. Discussion

The predictive performance of the Vasiček model (Vasiček), dPCA, A-Reg(dPCA), the dynamic Nelson-Siegel model (dNS), the arbitrage-free Nelson-Siegel model of Christensen et al. (2011a) (AFNS), and the arbitrage-regularization of the dynamic Nelson-Siegel model (A-Reg(dNS)) is reported in the following tables. The predictive quality is quantified by the estimated mean-squared errors when making day-ahead predictions of the bond price for each maturity, for all but the initial days in our data set. The lowest estimated mean-squared errors are highlighted in bold font and the second-lowest estimated mean-squared errors for each maturity are emphasized in italics.
Table 1 evaluates the performance of the considered models on the short-to-mid end of the curve. Overall, the performance of all the models is generally comparable at the very short end, but soon afterwards the dPCA model begins to outperform the rest. The accuracy of the Vasiček model for small maturities is likely due to it being a short-rate model.
In Table 2 the dPCA model outperforms the rest by progressively larger margins. Most notably, in Table 3 and Table 4, which summarize the performance of the models for very long bond maturities, the A-Reg(dPCA) model shows very low predictive error for a low number of factors while simultaneously being consistent with no-arbitrage conditions.
Even though arbitrage-regularization does slightly reduce its accuracy, which is natural since it adds a constraint to an otherwise purely predictive procedure, the arbitrage-regularized dPCA model is still much more accurate than the rest.
An advantage of the A-Reg(dPCA) model is that it can accurately model the long-end of the forward-rate curve in an arbitrage-free manner. This fact is due to the dynamic factor selection properties of the dPCA model which otherwise could not have been used in a consistent manner if it were not for Theorem 2.
The numerical implementation highlights a few key facts about the arbitrage-regularization methodology. First, for nearly every maturity, the empirical performance of the arbitrage-regularization of a factor model is comparable to the original factor model. An analogous phenomenon was observed in Devin et al. (2010) when projecting infinite-dimensional arbitrage-free HJM models onto the finite-dimensional manifold of Nelson-Siegel curves. Therefore, correcting for arbitrage does not come at a significant predictive cost. However, it does come with the benefit of making the model theoretically sound and compatible with the techniques of arbitrage-pricing theory.
Second, since (9) incorporates an additional constraint into the modelling procedure, the arbitrage-regularization of a factor model shows a reduction in performance compared to the initial factor model. This phenomenon has also been observed empirically in Christensen et al. (2011a) for the arbitrage-free Nelson-Siegel correction of the dynamic Nelson-Siegel model. Therefore, one should not expect to improve on the predictive performance of the initial factor model by correcting for the existence of arbitrage.
Third, the empirical performance of A-Reg(dPCA) was significantly better than that of the other arbitrage-free models, namely AFNS, A-Reg(dNS), and the Vasiček model, across nearly all maturities. This was especially true for mid- and long-maturity zero-coupon bonds. Moreover, the performance of A-Reg(dPCA) and dPCA were comparable. Similarly, for most maturities, the empirical performance of the AFNS, dNS, and A-Reg(dNS) models was similar and notably lower than the performance of the A-Reg(dPCA), dPCA, and Vasiček models. This emphasizes the fact that the arbitrage-regularization methodology produces performant models only if the original model itself produces accurate predictions. Therefore, it is up to the practitioner to make an appropriate choice of model. However, the methodology used to develop dPCA and A-Reg(dPCA) could be used as a generic starting point.
Since the arbitrage-regularization methodology applies to nearly any factor model, one may use any methodology to produce an accurate reference factor model and then apply arbitrage-regularization to make it theoretically consistent at a small cost in performance. This opens the possibility to applying machine learning models, such as dPCA, to finance without the worry that they are not arbitrage-free since their asymptotic arbitrage-regularization is well-defined. Furthermore, the flexibility of deep feed-forward neural networks allows for the efficient implementation of (9).
The AFNS model proposes an arbitrage-free correction of the dynamic Nelson-Siegel model. However, there is no guarantee that AFNS corrects dNS optimally, and the predictive gap between these two models is documented in Christensen et al. (2011a). This is echoed in both Table 2 and Table 3. Furthermore, it is also reflected in Theorem 2, which guarantees asymptotic optimality of the A-Reg(dNS) model.
Unlike most regularization problems, where there is a trade-off between the regularization term and the (un-regularized) objective function, arbitrage-regularization requires $\lambda$ to be taken as close to 1 as possible. Since the limit (26) can only be approximated numerically, the limiting value of $\lambda$ cannot be used; however, $\lambda$ can be taken arbitrarily close to, but less than, 1. This choice is justified by Figure 1 and Figure 2, which illustrate that for values of $\lambda$ near 1 there is little change in the model's predictive performance.
Figure 1 and Figure 2 plot the change in the shape of the day-ahead predicted forward-rate curve and the change in the MSE of the day-ahead predicted bond prices as a function of $\lambda$. In those figures, pink curves correspond to low values of $\lambda$ and curves progressing towards blue correspond to high values of $\lambda$. Note that in these plots the reparameterization of Corollary 1 is used and an abuse of notation is made by using $\lambda$ to denote $\tilde\lambda$.
In the case of the dNS model, an interesting property is that long-maturity bond prices do not change much, whereas short-maturity bond prices exhibit more dramatic changes. This property suggests that the dNS model is closer to being arbitrage-free on the long end of the curve than on the short end.

This paper introduced a novel model-selection problem and provided an asymptotic solution in the form of the penalized optimization given by problem (9). The problem was posed and solved in a generalized HJM-type setting in Theorem 1, and specialized to the term-structure of interest rates setting in Theorem 2, where simple expressions for the penalty term were derived.
The key innovation of the paper was the construction of the penalty term AF λ defining the arbitrage-regularization problem (9). The construction of this term in Proposition 1 relied on the structure of the generalized HJM-type setting proposed in Heath et al. (1992) and generalized in (4) which allowed one to encode the dynamics of a large class of factor models with stochastic inputs into the specific structure of any asset class.
The numerical feasibility of the proposed method was made possible by the flexibility of feed-forward neural networks, as demonstrated in Hornik (1991); Kratsios (2019b), which allowed the optimizer of the arbitrage-regularization problem (9) to be approximated to arbitrary precision. In the numerics section of this paper, it was found that the arbitrage-regularization of a factor model does not heavily impact its predictive performance but does make it approximately consistent with no-arbitrage requirements.
In particular, the compatibility of the proposed approach with generic factor models with stochastic inputs allowed for the consistent use of factor models generated from machine learning methods. The A-Reg(dPCA) model is a novel example of such an approximately arbitrage-free model where the dynamics and factors were generated algorithmically instead of through practitioner experience.
The precise quantification of the approximate arbitrage-free property was made in Proposition 2. Thus, factor models that are only approximately arbitrage-free under the stylized assumption of no transaction costs are indeed arbitrage-free once proportional transaction costs, a more realistic assumption, are in place.
Finally, the arbitrage-regularization approach introduced in this paper opens the door to the compatible use of predictive machine-learning factor models with the no-free lunch with vanishing risk condition. The general treatment in Theorem 1 can be transferred to other asset classes and models generated from other learning algorithms. This approach can be an important new avenue of research lying at the junction of predictive machine learning and mathematical finance.

Author Contributions

The conceptualization, development of the methodology, review and editing was undertaken by both authors. The formal analysis, investigation, data curation, coding, and draft preparation was done by A.K. The project’s supervision was undertaken by C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the ETH Zürich Foundation and by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Acknowledgments

The authors thank Alina Stancu for many helpful discussions, Josef Teichmann for his support, and Christoph Czichowsky for his suggestion on transaction costs for finite values of λ .

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Abbreviation: Meaning
A-Reg: Arbitrage-Regularization
AFNS: Arbitrage-Free Nelson-Siegel Model
A-Reg(dPCA): Arbitrage-Regularized Dynamic Principal Component Analysis Model
A-Reg(dNS): Arbitrage-Regularized Nelson-Siegel Model
dPCA: Dynamic Principal Component Analysis Model
NFLVR: No Free Lunch with Vanishing Risk
dNS: Dynamic Nelson-Siegel Model
The following symbols are used in this manuscript:
Symbol: Description
$[Y]_t$: Quadratic variation of the process $Y_t$
$[[Y]]_t$: Local quadratic variation of the process $Y_t$
$\operatorname{AF}$: Arbitrage-penalty
$\beta_t$: Stochastic factor process
$\beta^{ENET}$: Solution to the Elastic-Net regularized regression problem of Hastie et al. (2015)
$C_b^{1,2}$: Twice continuously differentiable, boundedness-preserving path functionals
$\mathcal{D}$: Horizontal derivative of Fournie (2010)
$D\theta^i$: Weak derivative, in the sense of measures, of $\theta^i$
$|D\theta^i|$: Total variation of $D\theta^i$
$\mathbb{E}_{\mathbb{P}}[\,\cdot\,]$: Expectation under the probability measure $\mathbb{P}$
$\nabla$: Vertical derivative of Fournie (2010)
$\varphi$: Reference factor model for the latent process
$\phi_t^u$: Abbreviation for $\phi(t,\beta_t,u)$
$\mathcal{H}$: Hypothesis class of functions for the factor model
$\ell$: Loss function
$L^p_{\nu\otimes\mu}(I\times\mathbb{R}^d\times U)$: Space of (equivalence classes of) $p$-integrable functions with respect to $\nu\otimes\mu$
$\lambda$: Meta-parameter for the arbitrage-penalty problem
$\Lambda$: Process given by (13)
$\mu$: Borel probability measure on $\mathbb{R}^d\times U$
$\mathcal{NN}_{N,h,n+1}^{\rho}$: Feed-forward neural networks with activation function $\rho$, depth $N+1$, and height $h$
$\nu$: Borel probability measure on $I$
$\nu\otimes\mu$: Product (probability) measure of $\nu$ and $\mu$
$\mathbb{P}$: Real-world probability measure
$\mathbb{P}^{\star}$: Equivalent probability measure to $\mathbb{P}$
$\mathbb{P}^N$: Martingale measure for the numéraire
$\mathbb{Q}$: Equivalent martingale probability measure to $\mathbb{P}$
$\mathbb{R}^d$: $d$-dimensional Euclidean space
$r_t$: Short rate
$S_t$: Non-anticipative path-dependent functional in $C_b^{1,2}$ encoding the latent factor process into $\{X_t(u)\}_{u\in U}$
$S_d^+$: $d\times d$ symmetric matrices
$U$: Borel subset of $\mathbb{R}^D$ which indexes the large financial market
$W_i$: Affine function between Euclidean spaces
$\mathcal{X}$: Banach subspace of $L^p_{\nu\otimes\mu}(I\times\mathbb{R}^d\times U)$ admitting a continuous embedding into $C^{1,2,2}(I\times\mathbb{R}^d\times U)$
$\{X_t(u)\}_{u\in U}$: Large financial market

Appendix A. Background

Some relevant technical background is briefly discussed within this appendix. These topics include related aspects of the functional Itô calculus introduced in Dupire (2009) and developed in Fournie (2010), as well as pertinent stochastic differential-geometric considerations, as developed in Elworthy (1982). Some elements of arbitrage theory are also discussed concisely.
Next, some background on arbitrage theory in large financial markets is discussed.

Appendix A.1. Arbitrage-Theory

The efficient market hypothesis, introduced in Bachelier (1900), states that the typical market participant cannot earn a riskless profit. The efficient market hypothesis has found several mathematical formulations, as summarized in Fontana (2014). The most commonly used form is No Free Lunch with Vanishing Risk (NFLVR), as formulated in the sequence of papers Delbaen and Schachermayer (1998), which builds on the ideas of Harrison and Kreps (1979). Essentially, in the case of locally bounded processes, NFLVR expresses the non-existence of arbitrage strategies as the existence of an equivalent local martingale measure (ELMM); that is, a probability measure which is equivalent to the reference probability measure and which simultaneously makes the price process of each market asset a local martingale.
However, mathematically, bond markets are unlike traditional financial markets in that they comprise an uncountable number of assets, one for each potential maturity; thus, the results of Delbaen and Schachermayer (1998) no longer apply, since their formulation requires that only a finite number of assets be tradeable. Instead, in the setting of such a large financial market, a satisfactory and economically meaningful no-arbitrage condition is obtained in Cuchiero et al. (2016) by considering strategies which can be described as limits of classical strategies written on a finite number of market assets. It is shown in Cuchiero et al. (2016) that when each asset in the market is locally bounded, as in Delbaen and Schachermayer (1998), the no-arbitrage condition derived in Cuchiero et al. (2016) reduces to the existence of an equivalent local martingale measure. However, if the local-boundedness assumption is dropped, then the existence of an equivalent local martingale measure remains sufficient for precluding arbitrage but is no longer necessary.

Appendix A.2. Functional Itô Calculus

In what follows, the set of $d\times d$ symmetric positive definite matrices will be denoted by $S_d^+$. Moreover, the Skorohod space of càdlàg paths in $\mathbb{R}^d$ and in $S_d^+$ will be respectively denoted by $D([0,T];\mathbb{R}^d)$ and $D([0,T];S_d^+)$. Moreover, to any $\mathbb{R}^d$-valued semi-martingale $X_t$ one associates an $S_d^+$-valued process $[[X]]_t$ defined by
$$[X]_t = \int_0^t [[X]]_u\, du.$$
Here $[[X]]_t$ is interpreted as the local quadratic variation of $X_t$.
The functional Itô calculus of Dupire (2009) and Fournie (2010) has found many applications in mathematical finance, ranging from computational methods for the Greeks of path-dependent options in Jazaerli and Saporito (2017) to portfolio theory in Pang and Hussain (2015). For fixed $T>0$, the basic concept of the functional Itô calculus relies on non-anticipative, path-dependent extensions of the time and spatial derivative operators. Both these extensions are defined on any càdlàg path in
$$\Lambda_d \triangleq \bigcup_{t\in[0,T]} D([0,t];\mathbb{R}^d)\times D([0,t];S_d^+),$$
by artificially extending its endpoint either vertically or horizontally. For any fixed $0\leq t\leq s\leq T$, the horizontal extension of a path $x_t\in\Lambda_d$ is defined by
$$x_{t,s-t}(u) \triangleq \begin{cases} x_u & \text{if } 0\leq u\leq t,\\ x_t & \text{if } t\leq u\leq s,\end{cases}$$
and its vertical extension of height $h$ is defined by
$$x_{t,h}(u) \triangleq \begin{cases} x_u & \text{if } 0\leq u < t,\\ x_t + h & \text{if } u = t.\end{cases}$$
For a functional from $\Lambda_d$ to $\mathbb{R}$, its vertical and horizontal derivatives at the path $x_t\in\Lambda_d$ are defined by infinitesimally extending the path either vertically or horizontally using (A1) and (A2). However, since the calculus should not look into the future, only non-anticipative functionals are considered.
Definition A1
(Non-Anticipative Functional; (Fournie 2010, Definition 2.1)). A non-anticipative functional is a family of functionals $S \triangleq \{S_t\}_{t\in[0,T]}$, where
$$S_t : D([0,t];\mathbb{R}^d)\times D([0,t];S_d^+) \to \mathbb{R}, \qquad (x_t,v_t)\mapsto S_t(x_t,v_t)$$
is measurable with respect to the Borel $\sigma$-algebra on $D([0,t];\mathbb{R}^d)\times D([0,t];S_d^+)$.
Analogously to classical calculus, the limiting ratios between the difference of $S(x_t,v_t)$ and its extensions define its horizontal and vertical derivatives at any given time $0\leq t$, respectively, by
$$\mathcal{D}S(x_t,v_t) = \lim_{\Delta\downarrow 0}\frac{S(x_{t,\Delta},v_{t,\Delta}) - S(x_t,v_t)}{\Delta}, \qquad \nabla S_t(x_t,v_t) = \lim_{h\to 0}\frac{S(x_{t,h},v_t) - S(x_t,v_t)}{h};$$
where the limits defined in (A3) are taken in $\Lambda_d$ with respect to the metric
$$d_{\infty}(x_t,y_s) \triangleq \sup_{u\in[0,s]}\left\|x_{t,s-t}(u) - y_u\right\| + |s-t|,$$
and, for any $x\in\mathbb{R}^d$ and $v\in S_d^+$, one has $\|(x,v)\|^2 = \|x\|^2 + \|v\|_F^2$, where $\|x\|$ is the usual Euclidean norm and $\|v\|_F$ is the Frobenius norm. As will be seen shortly, the horizontal and vertical derivatives extend the time and spatial derivatives from ordinary calculus. However, some technical remarks must first be addressed.
In general, these path derivatives are not defined for every non-anticipative functional $S:\Lambda_d\to\mathbb{R}$; moreover, even when they are, analogously to the classical calculus, there is no guarantee that the vertical (resp. horizontal) derivative is continuous with respect to $d_{\infty}$. Analogously to the traditional Itô calculus, the collection of functionals for which one can derive a useful Itô formula are those which admit one continuous horizontal derivative $\mathcal{D}S(x_t,v_t)$ and two continuous vertical derivatives, i.e., for which $\nabla S(x_t,v_t)$ and $\nabla^2 S(x_t,v_t)$ are both continuous.
However, for a tractable extension of the Itô formula with respect to $S$ to be possible, it is additionally required that $S$ be boundedness-preserving. Here, a functional $S:\Lambda_d\to\mathbb{R}$ is said to be boundedness-preserving if, for every non-empty compact subset $K\subseteq\mathbb{R}^d$, there exists some $C_K>0$ such that $|S(z_t)|\leq C_K$ whenever $z_t\in\Lambda_d$ satisfies
$$\left\{y\in\mathbb{R}^d : z_t(s) = y\ \text{for some}\ 0\leq s\leq t\right\} \subseteq K.$$
The collection of all functionals $S:\Lambda_d\to\mathbb{R}$ which are boundedness-preserving and have continuous, boundedness-preserving derivatives $\nabla S(x_t,v_t)$, $\mathcal{D}S(x_t,v_t)$, and $\nabla^2 S(x_t,v_t)$ at every path $x_t\in\Lambda_d$ is denoted by $C_b^{1,2}$.
In addition, a functional $S\in C_b^{1,2}$ may satisfy the following predictable-dependence condition:
$$(\forall t\in[0,T])\ \left(\forall (x,v)\in D([0,t];\mathbb{R}^d)\times D([0,t];S_d^+)\right)\qquad S_t(x_t,v_t) = S_t(x_t,v_{t-}).$$
Theorem A1
(Functional Itô Formula (Fournie 2010, Theorem 4.1)). For any non-anticipative functional $S\in C_b^{1,2}$ satisfying (A5) and any $\mathbb{R}^d$-valued semi-martingale $X_t$, the following holds:
$$S_t(X_t,[[X]]_t) - S_0(X_0,[[X]]_0) = \int_0^t \mathcal{D}S(X_u,[[X]]_u)\, du + \int_0^t \nabla S_u(X_u,[[X]]_u)\, dX_u + \frac{1}{2}\int_0^t \operatorname{tr}\!\left(\nabla^2 S_u(X_u,[[X]]_u)\, d[X]_u\right).$$
This section closes by noting that Theorem A1 is a strict generalization of the classical Itô formula. This is because the vertical and horizontal derivatives reduce to the familiar spatial and time derivatives when $S_t$ does not depend on any path data, as formalized by the following result.
Proposition A1
((Fournie 2010, Example 1)). If S(x_t, v_t) = f(t, x_t(t)) for some f ∈ C^{1,2}([0,t] × R^d; R), then
D S(x_t, v_t) = ∂_t f(t, x_t)   and   ∇_i S(x_t, v_t) = ∂_{x_i} f(t, x_t) ,
for i = 1, …, d.
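To fix ideas, a standard illustrative example from the functional Itô calculus literature (it is not one of the results used above) is the running-integral functional S_t(x_t, v_t) = ∫_0^t x(u) du, which is genuinely path-dependent. A horizontal extension of length h changes its value by h · x(t), while a vertical bump of the endpoint leaves the integral unchanged, so that
D S(x_t, v_t) = x(t) ,    ∇ S(x_t, v_t) = ∇² S(x_t, v_t) = 0 .
This illustrates how the horizontal derivative captures the explicit time-evolution of a functional, while the vertical derivatives capture its sensitivity to an instantaneous perturbation of the current value.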

Appendix A.3. Background on Γ-Convergence

Pioneered in De Giorgi (1975), the theory of Γ-convergence describes the precise conditions required for the optimizers of a sequence of loss-functions {ℓ_n}_n to converge to the optimizer of a limiting loss-function ℓ. The entire theory of Γ-convergence can be seen as a sequential generalization of Weierstrass' theorem, a fundamental existence result from non-convex optimization theory. Geometrically, Weierstrass' theorem states that if one can descend continuously along the epigraph of ℓ (lower semi-continuity) and the set of all small values of ℓ is compact (coercivity), then ℓ can be minimized, granted that ℓ does not only take infinite values (properness).
Theorem A2
(Weierstrass’ Theorem; (Focardi 2012, Theorem 2.2)). Let (X, d) be a metric space and ℓ : X → R be a lower semi-continuous function which is mildly coercive, that is there exists a sequentially compact subset K of X such that
inf_{x ∈ K} ℓ(x) = inf_{x ∈ X} ℓ(x) .
Then ℓ admits a minimizer on X if in addition inf_{x ∈ X} ℓ(x) is finite.
Γ-limits provide precise conditions ensuring that any sequence of optimizers of {ℓ_n}_n converges to an optimizer of ℓ whenever ℓ is the Γ-limit of {ℓ_n}_n, written Γ−lim_{n→∞} ℓ_n = ℓ. Before providing a precise definition of Γ-limits, a few of their properties are discussed.
Theorem A3
(Properties of Γ-convergence; (Focardi 2012, Theorem 2.8)). Let (X, d) be a metric space and {ℓ_n}_n be a sequence of functions from (X, d) to R ∪ {∞}. If Γ−lim_{n→∞} ℓ_n exists, then
(i) 
(Lower Semicontinuity): Γ−lim_{n→∞} ℓ_n is lower semicontinuous on X,
(ii) 
(Stability Under Continuous Perturbation): If g : X → R is continuous, then
Γ−lim_{n→∞} (ℓ_n + g) = (Γ−lim_{n→∞} ℓ_n) + g ,
(iii) 
(Stability Under Relaxation): For every n, let {ℓ̃_n}_n be a sequence of functions from X to R ∪ {∞} satisfying ℓ_n^{lsc} ≤ ℓ̃_n ≤ ℓ_n. Then
Γ−lim_{n→∞} ℓ̃_n = Γ−lim_{n→∞} ℓ_n ,
where ℓ^{lsc} is the largest lower semi-continuous function dominated by ℓ, point-wise.
The first of the two critical ingredients in Theorem A2 was the lower semi-continuity of ℓ; the second was its coerciveness. Analogously to the definition of equi-continuity, when working with a sequence of functions {ℓ_n}_n, applying the analogous machinery to Theorem A2 requires that there exists a non-empty compact subset K of X satisfying
inf_{x ∈ K} ℓ_n(x) = inf_{x ∈ X} ℓ_n(x) ,    for every n .
The property described by (A8) is called mild equi-coerciveness. A stronger condition, that we will make use of is equi-coerciveness, which states that for every t > 0 , there exists a compact subset K t of ( X , d ) satisfying
⋃_n { x ∈ X : ℓ_n(x) ≤ t } ⊆ K_t .
The central result in the theory of Γ-convergence is the following extension of Theorem A2.
Theorem A4
(The Fundamental Theorem of Γ-Convergence; (Braides 2002, Theorem 2.10), (Focardi 2012, Theorem 2.1)). If {ℓ_n}_n is a mildly equi-coercive sequence of functions from X to R ∪ {∞} for which the Γ-limit exists in X, then
lim_{n→∞} inf_{x ∈ X} ℓ_n(x) = inf_{x ∈ X} ( Γ−lim_{n→∞} ℓ_n )(x) .
If, moreover, {ℓ_n}_n is equicoercive, then lim_{n→∞} arginf_{x ∈ X} ℓ_n(x) exists in X and
lim_{n→∞} arginf_{x ∈ X} ℓ_n(x) ∈ arginf_{x ∈ X} ( Γ−lim_{n→∞} ℓ_n )(x) .
This section closes with the precise definition of convergence in the Γ -sense.
Definition A2
((Dal Maso 1993, Chapter 4)). Let {ℓ_n}_n be a sequence of R ∪ {∞}-valued functions on a metric space (X, d). A function ℓ is the Γ-limit of {ℓ_n}_n if and only if both
(i) 
ℓ^{lsc}(x) ≤ lim inf_{n→∞} ℓ_n(x_n) for every net {x_n}_n converging to x in (X, d),
(ii) 
ℓ^{lsc}(x) ≥ lim sup_{n→∞} ℓ_n(y_n) for some net {y_n}_n converging to x in (X, d),
where ℓ^{lsc} is the largest lower semi-continuous function dominated by ℓ, point-wise.
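As a simple illustration of the mechanism exploited later in this paper (a hypothetical toy setting rather than one of the results above), let f : X → R be continuous and coercive, let g : X → [0, ∞) be continuous, and set ℓ_n(x) ≜ f(x) + n g(x)². One checks directly from Definition A2 that
Γ−lim_{n→∞} ℓ_n = f + ι_{{g = 0}} ,    where ι_{{g = 0}}(x) = 0 if g(x) = 0 and ∞ otherwise.
Moreover ℓ_n ≥ f for every n, so the family is equi-coercive, and Theorem A4 guarantees that minimizers of the increasingly penalized problems converge (along subsequences) to minimizers of the constrained problem min{ f(x) : g(x) = 0 }; this is precisely the role played by the arbitrage-penalty in the body of the paper.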

Appendix B. Proofs

Proof of Proposition 1.
For legibility, for each u U , we represent the process ϕ ( t , β t , u ) by
ϕ t u = ϕ 0 u + 0 t α u ( s , ϕ s u ) d s + 0 t γ u ( s , ϕ s u ) d W s .
By the Functional Itô Formula (Cont and Fournié 2013, Theorem 4.1), it follows that, for every u ∈ U,
S t ( ϕ t u , [ [ ϕ u ] ] t ; u ) = S 0 ( 0 , ϕ 0 u , [ [ ϕ u ] ] 0 ; u ) + 0 t D S s ( ϕ s u , [ [ ϕ u ] ] s ; u ) + S s ( ϕ s u , [ [ ϕ u ] ] s ; u ) α u ( s , ϕ s u ) + 1 2 [ 2 S s ( ϕ s u , [ [ ϕ u ] ] s ; u ) ] ( γ u ( s , ϕ s u ) ) d s + 0 t S s ( ϕ s u , [ [ ϕ u ] ] s ; u ) γ u ( s , ϕ s u ) d W s .
From (A9) it follows that, for each u ∈ U, the price processes X_t(u) are P N -local-martingales if and only if
D S_s(ϕ_s^u, [[ϕ^u]]_s; u) = − ∇ S_s(ϕ_s^u, [[ϕ^u]]_s; u) α_s^u − (1/2) [ ∇² S_s(ϕ_s^u, [[ϕ^u]]_s; u) ] ( γ_s^u ) .
Next, the quantities α^u and γ^u are described. By the usual Itô formula, it follows that, for each u ∈ U,
ϕ(t, β_t, u) = ϕ(0, β_0, u) + ∫_0^t ∂ϕ/∂t (s, β_s, u) ds + ∫_0^t ∑_{i=1}^d ∂ϕ/∂β_i (s, β_s, u) dβ_s^i + (1/2) ∫_0^t ∑_{i,j=1}^d ∂²ϕ/∂β_i∂β_j (s, β_s, u) d[β]_s^{i,j}
= ϕ(0, β_0, u) + ∫_0^t ( ∂ϕ/∂t (s, β_s, u) + ∑_{i=1}^d ∂ϕ/∂β_i (s, β_s, u) α_i(s, β_s) + (1/2) ∑_{i,j=1}^d ∂²ϕ/∂β_i∂β_j (s, β_s, u) σ_i(s, β_s) σ_j(s, β_s) ) ds + ∫_0^t ∑_{i=1}^d ∂ϕ/∂β_i (s, β_s, u) σ_i(s, β_s) dW_s^i .
Therefore, (A11) implies that
α s u = ϕ t ( s , β s , u ) + i = 1 d ϕ β i ( s , β s , u ) α i ( s , β s ) + 1 2 i , j = 1 d 2 ϕ β i β j ( s , β s , u ) σ i ( s , β s ) σ j ( s , β s ) γ s u = i = 1 d ϕ β i ( s , β s , u ) σ i ( s , β s ) .
Incorporating (A12) into (A10) yields (11). Therefore, S_t(ϕ_t^u, [[ϕ^u]]_t; u) is a P N -local-martingale, simultaneously for every u ∈ U, if and only if (11) holds P N -a.s., simultaneously for every u ∈ U. □
Proof of Theorem 1.
We begin by showing (ii) and (iii), simultaneously. Since ( I × U , B I × U , μ ) is a finite measure space then
lim_{2 ≤ λ → ∞} ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^λ dμ(t,u) )^{1/λ} = esssup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| ;
By Assumption (2) (i), the map (t, u) ↦ Λ_t^u(ϕ) is continuous, for each ϕ ∈ H, therefore
esssup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| = sup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| .
Since the product of limits is the limit of the product, then (A13) yields
lim_{2 ≤ λ → ∞} λ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^λ dμ(t,u) )^{1/λ} = lim_{2 ≤ λ → ∞} λ esssup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| = lim_{2 ≤ λ → ∞} λ sup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| = { 0 if sup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| = 0 ;  ∞ otherwise } .
Since μ is a probability measure on I × U, it follows that, for every 1 ≤ λ_1 ≤ λ_2 < ∞ and every ϕ ∈ H,
( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^{λ_1} dμ(t,u) )^{1/λ_1} ≤ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^{λ_2} dμ(t,u) )^{1/λ_2} ≤ esssup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| .
Thus, for every ϕ ∈ H, the convergence described by (A13) is monotone (increasing) and non-negative; therefore, by the monotone convergence theorem, it follows that
lim_{2 ≤ λ → ∞} λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^λ dμ(t,u) )^{1/λ} ] = { 0 if sup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| = 0, P N -a.s. ;  ∞ otherwise } .
Applying Proposition 1 to the right-hand side of (A17), it follows that
ι_H(ϕ) = { 0 if sup_{(t,u) ∈ I × U} |Λ_t^u(ϕ)| = 0, P N -a.s. ;  ∞ otherwise } ,
where ι H is defined as in (16). Therefore, the following limit holds
lim_{2 ≤ λ → ∞} λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^λ dμ(t,u) )^{1/λ} ] = ι_H(ϕ) ,    for every ϕ ∈ H .
Thus, (A19) establishes the pointwise convergence of the penalty functions AF_λ to ι_H on H. Next, their Γ-convergence is established and used to deduce the Γ-convergence of the objective functions in (9) to the objective function of problem (8).
Applying (A16) and the monotonicity of integration, it follows that, for every ϕ ∈ H and every 2 ≤ λ_1 ≤ λ_2,
λ_1 E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^{λ_1} dμ(t,u) )^{1/λ_1} ] ≤ λ_2 E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^{λ_2} dμ(t,u) )^{1/λ_2} ] ≤ ι_H(ϕ) .
Thus, (A20) together with (Dal Maso 1993, Proposition 5.4) and (Braides 2002, Remark 1.40 (ii)) imply that (on H )
Γ−lim_{2 ≤ λ → ∞} λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^λ dμ(t,u) )^{1/λ} ] = ι_H^{lsc}(ϕ) ,
where ι_H^{lsc} is the lower-semi-continuous relaxation of ι_H on H; that is, the largest lower-semi-continuous function on H dominated by ι_H (a precise description can be found in (Focardi 2012, p. 11)). However, Assumption (2) (ii) implies that ι_H is indeed lower-semi-continuous; thus ι_H^{lsc} = ι_H on H. Therefore, (A21) simplifies (on H) to
Γ−lim_{2 ≤ λ → ∞} λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^λ dμ(t,u) )^{1/λ} ] = ι_H(ϕ) .
Since Γ-limits are invariant under continuous perturbation, see (Focardi 2012, Theorem 2.8), (A22) and the continuity of ℓ(φ‖·) on H imply that (on H)
Γ−lim_{2 ≤ λ → ∞} { ℓ(φ‖·) + λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(·)|^λ dμ(t,u) )^{1/λ} ] } = ℓ(φ‖·) + ι_H(·) .
To apply the Fundamental Theorem of Γ-convergence, the family of functions on the left-hand side of (A23) must be equicoercive. Since λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(·)|^λ dμ(t,u) )^{1/λ} ] is non-negative and since H is unbounded, the coercivity of ℓ(φ‖·) on H means that
lim_{‖ϕ‖_H → ∞} ℓ(φ‖ϕ) = ∞ ;
thus, it follows that
ℓ(φ‖ϕ) ≤ ℓ(φ‖ϕ) + λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^λ dμ(t,u) )^{1/λ} ] ,    λ ≥ 2 ;
whence, by (Dal Maso 1993, Proposition 7.7) together with (A24), { ℓ(φ‖·) + λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(·)|^λ dμ(t,u) )^{1/λ} ] }_{λ ≥ 2} forms an equicoercive family on H.
Thus, { ℓ(φ‖·) + λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(·)|^λ dμ(t,u) )^{1/λ} ] }_{λ ≥ 2} defines an equicoercive family which Γ-converges to ℓ(φ‖·) + ι_H(·) on H. Therefore, (A23) and (A25) together imply that the Fundamental Theorem of Γ-convergence, (Dal Maso 1993, Theorem 7.8), applies. Hence,
lim_{2 ≤ λ → ∞} inf_{ϕ ∈ H} { ℓ(φ‖ϕ) + λ E_P[ ( ∫_{(t,u) ∈ I × U} |Λ_t^u(ϕ)|^λ dμ(t,u) )^{1/λ} ] } = min_{ϕ ∈ H} { ℓ(φ‖ϕ) + ι_H(ϕ) } .
This shows both (ii) and (iii).
Lastly, for (i), (Dal Maso 1993, Theorem 7.8) also implies that ℓ(φ‖·) + ι_H(·) is coercive on H. Hence, ℓ(φ‖·) + ι_H(·) is coercive, lower-semi-continuous, and bounded below by 0. Therefore, by Weierstrass’ Theorem (Focardi 2012, Theorem 2.2), it follows that ℓ(φ‖·) + ι_H(·) admits a minimizer on H. □
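The limiting behaviour established in (A13)–(A19) can also be seen numerically. The following sketch is illustrative only: the discrete measure, the toy field Λ, and all variable names are hypothetical and are not taken from the paper or its accompanying code. It approximates λ(∫ |Λ|^λ dμ)^{1/λ} on a finite probability space and exhibits the dichotomy that produces ι_H in the limit: the quantity diverges whenever sup |Λ| > 0 and vanishes identically when Λ ≡ 0.

import numpy as np

rng = np.random.default_rng(0)
weights = rng.random(1000)
weights /= weights.sum()                                      # a discrete probability measure mu
Lambda_nonzero = np.abs(np.sin(np.linspace(0.0, 3.0, 1000)))  # a toy |Lambda| with positive sup
Lambda_zero = np.zeros(1000)                                  # the arbitrage-free case, Lambda = 0

def penalty(Lambda_vals, lam):
    # lam * ( integral of |Lambda|^lam d mu )^(1/lam) on the discrete space
    return lam * np.sum(weights * Lambda_vals ** lam) ** (1.0 / lam)

for lam in (2, 10, 50, 200):
    print(f"lambda = {lam:3d}:  nonzero Lambda -> {penalty(Lambda_nonzero, lam):8.3f},  "
          f"zero Lambda -> {penalty(Lambda_zero, lam):.1f}")
# The first column grows without bound as lambda increases, while the second stays at 0,
# mirroring the 0/infinity dichotomy defining iota_H.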
Proof of Proposition 2.
For every 0 ≤ t, Assumption 3 implies that
| S_t(ϕ̂_t^{u_i}, [[ϕ̂^{u_i}]]_t; u_i) − S_t(ϕ_t^{u_i}, [[ϕ^{u_i}]]_t; u_i) | ≤ M = (M/m) m < (M/m) S_t(ϕ_t^{u_i}, [[ϕ^{u_i}]]_t; u_i) < k_t S_t(ϕ_t^{u_i}, [[ϕ^{u_i}]]_t; u_i) .
Therefore, (Guasoni 2006, Lemma 2.1) and (Guasoni 2006, Remark 2.5) imply that, for any admissible strategy θ,
V t ( θ ) 0 t S s ( ϕ ^ s u i , [ [ ϕ ^ u i ] ] s ; u i ) d s .
By Theorem A2, since S_s(ϕ̂_s^{u_i}, [[ϕ̂^{u_i}]]_s; u_i) is a P N -local martingale and, by construction, P N ∼ P, the Fundamental Theorem of Asset Pricing (Delbaen and Schachermayer 1994) implies that
P 0 0 T S s ( ϕ ^ s u i , [ [ ϕ ^ u i ] ] s ; u i ) P 0 = 0 T S s ( ϕ ^ s u i , [ [ ϕ ^ u i ] ] s ; u i ) .
Combining (A28) and (A29) yields
P[ 0 ≤ V_T(θ) ] = 1  ⟹  P[ 0 = V_T(θ) ] = 1 ,
for every 0 ≤ T and every finite number of u_1, …, u_n ∈ U. □
Proof of Theorem 2.
By definition of (R^d, g_t), β_t, and H, Assumptions 1 (i) and (iv) hold; thus only Assumptions 1 (ii) and (iii) must be verified in order to ensure that the stated problem falls within the scope of this paper. Let μ = μ̃ ⊗ υ, where dμ̃/dm(u) = e^{−|u|^κ} / Γ(1 + 1/κ) 1_{[0,∞)}(u), υ is the unique probability measure with Lebesgue density proportional to e^{−‖β‖^κ} on R^d, and dν/dm(t) = e^{−|t|} 1_{[0,∞)}(t); here 1_{[0,∞)} is the (probabilistic, not convex-analytic) indicator function of the interval [0,∞) and m is the Lebesgue measure on R. Therefore, the elements of W_w^{p,k}(I × R^d × U) are elements of L^p_{ν⊗μ}(I × R^d × U). Since [0,∞) × [0,∞) × R^d has a smooth boundary and since k was assumed to satisfy (20), the (weighted) Morrey–Sobolev theorem of Brown and Opic (1992) applies. Therefore, W_w^{p,k}(I × R^d × U) can be continuously embedded within C^{2,2,2}(I × R^d × U), and therefore Assumption 1 (ii) holds. Furthermore, by (6) together with (Cont and Fournié 2013, Example 1), each S_t(·,·;u) satisfies Assumption 1 (iii); thus Assumption 1 is satisfied.
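As a quick numerical sanity check of the normalization used above (a sketch under arbitrarily chosen values of κ; it is not part of the proof), the identity ∫_0^∞ e^{−u^κ} du = Γ(1 + 1/κ) can be verified directly, confirming that μ̃ is indeed a probability measure on [0, ∞):

import math
from scipy.integrate import quad

for kappa in (0.5, 1.0, 2.0, 3.0):
    total, _ = quad(lambda u, k=kappa: math.exp(-u ** k) / math.gamma(1.0 + 1.0 / k), 0.0, math.inf)
    print(f"kappa = {kappa}: total mass of d(mu-tilde)/dm on [0, inf) = {total:.6f}")  # approximately 1.0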
Next, (26) is reformulated in terms of Theorem 1 and Assumption 2 is verified. Subsequently, the optimizers of the objective function under the limit on the left-hand side of (26) are shown to exist for each λ ≥ 2.
It will be convenient to work under the parameterization of Musiela and Rutkowski (1997), where
X_t(u) = exp( − ∫_t^u ϕ(t, β_t, s) ds ) = exp( − ∫_0^τ ϕ(t, β_t, v) dv ) ,
where τ ≜ u − t and v ≜ u − s for t ≤ s ≤ u. Under this reparameterization, in the case where each S_t(·,·;u) is as in (6), it is shown in (Filipović 2009, Proposition 9.3) that, for each u ∈ U and 0 ≤ t, the bond prices S_t(ϕ_t^u, [[ϕ^u]]_t; u) are each P N -local martingales if and only if, for every u ∈ U, every 0 ≤ t ≤ u, and P N -a.e. ω ∈ Ω, the following holds
0 = c 0 ϕ 0 ( u ) + i = 1 d γ i Φ i ( u ) 1 2 i , j = 1 d ζ i , j Φ i ( u ) Φ j ( u ) + k = 1 d β t k c k ϕ k ( u ) + i = 1 d γ k ; i Φ i ( u ) 1 2 i , j = 1 d ζ k ; i , j Φ i ( u ) Φ j ( u ) ,
where the constants c_i are defined by c_i ≜ ϕ_i(0) for i = 0, …, d. Equation (A30) is satisfied for P N -a.e. ω ∈ Ω, for every 0 ≤ t ≤ u, and for every u ∈ U if and only if each Λ_t^u(ϕ) = 0 P N -a.s.; where Λ_t^u(ϕ) is defined by
Λ t u ( ϕ ) c 0 Φ 0 u ( u ) + i = 1 d γ i Φ i ( u ) 1 2 i , j = 1 d ζ i , j Φ i ( u ) Φ j ( u ) p + k = 1 d c k Φ k u ( u ) + i = 1 d γ k ; i Φ i ( u ) 1 2 i , j = 1 d ζ k ; i , j Φ i ( u ) Φ j ( u ) p .
Therefore, since Λ t u ( ϕ ) satisfies (13) then the family { A F λ } λ > 0 of functions defined by
A F λ ( ϕ ) λ 0 β R d 0 u 1 Γ ( 1 + 1 κ ) e | u | κ Λ s u ( ϕ ) λ d s d β d u λ ,
define an arbitrage-penalty in the sense of (14). Moreover, by (A31) Equation (A32) further simplifies to
A F λ ( ϕ ) = λ Γ ( 1 + 1 κ ) 1 λ 0 e | u | κ Λ t u ( ϕ ) λ d u λ
Since W w k , p ( I × R d × U ) is continuously embedded in C 2 ( I × R d × U ) , then each equivalence class ϕ W w k , p ( I × R d × U ) can be identified with a continuous function from I × R d × U ; therefore each of the functions
u c 0 Φ 0 u ( u ) + i = 1 d γ i Φ i ( u ) 1 2 i , j = 1 d ζ i , j Φ i ( u ) Φ j ( u ) p u c k Φ k u ( u ) + i = 1 d γ k ; i Φ i ( u ) 1 2 i , j = 1 d ζ k ; i , j Φ i ( u ) Φ j ( u ) p ,
are continuous in u; moreover, they are continuous in t since they are constant in t. Therefore, ( t , u ) Λ t u ( ϕ ) is continuous for every ϕ H ; whence Assumption 2 (i) holds.
Next, Assumption 2 (ii) will be verified. Given the dynamics of (18), (Filipović 2009, Proposition 9.3) characterizes all ϕ 0 , , ϕ d for which the forward-rate curve (21) corresponds to a bond market, through (6), in which each bond price is a P N -local-martingale; all such ϕ 0 , , ϕ d are solutions to the differential Riccati system
Φ 0 u ( u ) = c 0 + i = 1 d γ i Φ i ( u ) 1 2 i , j = 1 d ζ i , j Φ i ( u ) Φ j ( u ) Φ 0 ( 0 ) = 0 Φ k u ( u ) = c k + i = 1 d γ k ; i Φ i ( u ) 1 2 i , j = 1 d ζ k ; i , j Φ i ( u ) Φ j ( u ) Φ k ( 0 ) = 0 ;
where c_0, …, c_d are arbitrary elements of R. Thus,
ϕ H : ( u U ) S t ( ϕ t u , [ [ ϕ u ] ] t ; u ) is   a P N local martingale = ϕ H : ( c 0 , , c d R ) { Φ i } i = 0 d solves ( A35 ) ,
where, as before, {Φ_i}_{i=0}^d and ϕ are related through (21) and (22). Differentiating the Riccati system (A35) with respect to u yields an equivalent differential system of the form
2 Φ 0 u 2 ( u ) = c 0 + i = 1 d γ i Φ i u 1 2 i , j = 1 d ζ i , j Φ i u Φ j ( u ) + Φ i ( u ) Φ j u , 2 Φ k u 2 ( u ) = c k + i = 1 d γ k ; i Φ i u 1 2 i , j = 1 d ζ k ; i , j Φ i u Φ j ( u ) + Φ i ( u ) Φ j u , Φ 0 ( 0 ) = 0 , Φ k ( 0 ) = 0 , Φ 0 u ( 0 ) = c 0 Φ k u ( 0 ) = c k ;
Therefore, (A36) can be rewritten as
{ ϕ ∈ H : (∀ u ∈ U) S_t(ϕ_t^u, [[ϕ^u]]_t; u) is a P N -local martingale } = { ϕ ∈ H : {Φ_i}_{i=0}^d solves (A37) for some c_0, …, c_d ∈ R } .
Since C^2(R; R^{d+1}) is equipped with the topology of uniform convergence on compact sets of functions and of their first two derivatives, the right-hand side of (A38) is closed in C^2(R; R^{d+1}); whence it is closed in the relative topology on H ∩ C^2(R; R^{d+1}). Thus, Assumption 2 (ii) holds.
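For intuition, Riccati systems of the type appearing in (A35)–(A37) are readily integrated with standard ODE solvers. The sketch below is illustrative only: it uses the classical one-factor Vasiček bond-pricing system A′(τ) = ½σ²B(τ)² − a·b·B(τ), B′(τ) = 1 − a·B(τ), with A(0) = B(0) = 0, hypothetical parameters a, b, σ, and a hypothetical short rate r0, rather than the general system of the proof.

import numpy as np
from scipy.integrate import solve_ivp

a, b, sigma = 0.5, 0.03, 0.01       # illustrative mean-reversion speed, level, and volatility

def vasicek_riccati(tau, y):
    A, B = y
    dA = 0.5 * sigma ** 2 * B ** 2 - a * b * B
    dB = 1.0 - a * B
    return [dA, dB]

taus = np.linspace(0.0, 30.0, 61)    # maturities (in years), matching the range in the tables below
sol = solve_ivp(vasicek_riccati, (0.0, 30.0), [0.0, 0.0], t_eval=taus, rtol=1e-8)
A_tau, B_tau = sol.y
r0 = 0.02                            # hypothetical current short rate
bond_prices = np.exp(A_tau - B_tau * r0)
print(bond_prices[::10])             # model bond prices at maturities 0, 5, ..., 30 years

The resulting prices are arbitrage-free by construction, which is exactly the property that membership in the solution set of (A35) encodes for the general factor models considered here.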
Lastly, since the loss function ℓ, defined by
ℓ(φ‖ϕ) ≜ ∫_0^∞ ∫_0^u ∫_{β ∈ R^d} e^{−|t| − |u|^κ − ‖β‖^κ} | φ(u−t, β) − ϕ(u−t, β) |^p dβ dt du ,
is continuous on H, the conditions for Theorem 1 are all met. Therefore, (26) holds.
Since second-order differential operators are continuous from C^2(I × R^d × U) to C^0(I × R^d × U), where the latter is equipped with the topology of uniform convergence on compact sets, the functions
Φ c 0 Φ 0 u ( u ) + i = 1 d γ i Φ i ( u ) 1 2 i , j = 1 d ζ i , j Φ i ( u ) Φ j ( u ) p Φ c k Φ k u ( u ) + i = 1 d γ k ; i Φ i ( u ) 1 2 i , j = 1 d ζ k ; i , j Φ i ( u ) Φ j ( u ) p ,
are continuous from C^2(I × R^d × U) to [0, ∞). Furthermore, since W_w^{p,k}(I × R^d × U) is continuously embedded within C^2(I × R^d × U), the functions of (A40) are continuous from W_w^{p,k}(I × R^d × U) to [0, ∞). The definition of the weight function in (19) implies that, for every ϕ ∈ H and every λ ≥ 2, the integral (A39) is finite. Thus, for every λ ≥ 2, the map ϕ ↦ AF_λ(ϕ) is continuous from W_w^{p,k}(I × R^d × U) to [0, ∞). Furthermore, since the loss function ℓ is continuous, for every λ ≥ 2 the function ϕ ↦ ℓ(φ‖ϕ) + AF_λ(ϕ) is continuous. Furthermore, since both ℓ and AF_λ are bounded below by 0 and finite-valued, then so is ℓ(φ‖·) + AF_λ(·). Lastly, since ℓ is coercive, by definition, for every r ≥ 0 there exists a compact subset K_r of H satisfying
{ ϕ ∈ H : ℓ(φ‖ϕ) ≤ r } ⊆ K_r .
Therefore, the non-negativity of each AF_λ implies that, for every λ ≥ 2 and every r ≥ 0,
{ ϕ ∈ H : ℓ(φ‖ϕ) + AF_λ(ϕ) ≤ r } ⊆ { ϕ ∈ H : ℓ(φ‖ϕ) ≤ r } ⊆ K_r ;
thus (A42) implies that ϕ ↦ ℓ(φ‖ϕ) + AF_λ(ϕ) is coercive. Thus, for every λ ≥ 2, the function ϕ ↦ ℓ(φ‖ϕ) + AF_λ(ϕ) is lower semi-continuous, bounded below, proper, and coercive on H; thus by Weierstrass’ Theorem (Focardi 2012, Theorem 2.2), it admits a minimizer on H. □
Proof of Corollary 1.
Since the argmin is invariant under multiplication by a positive constant, for every λ ≥ 2,
argmin ϕ H 0 0 u β R d e | u | κ β κ φ ( u t , β ) ϕ ( u t , β ) p d β d t d u + λ Γ ( 1 + 1 κ ) 1 λ 0 e | u | κ Λ u ( ϕ ) λ d u λ = argmin ϕ H 1 1 + λ 0 0 u β R d e | u | κ β κ φ ( u t , β ) ϕ ( u t , β ) p d β d t d u + λ 1 + λ Γ ( 1 + 1 κ ) 1 λ 0 e | u | κ Λ u ( ϕ ) λ d u λ = argmin ϕ H ( 1 λ ˜ ) 0 0 u β R d e | u | κ β κ φ ( u t , β ) ϕ ( u t , β ) p d β d t d u + λ ˜ Γ ( 1 + 1 κ ) 1 λ 0 e | u | κ Λ u ( ϕ ) λ d u λ ,
where λ̃ ≜ λ/(1 + λ). Note that λ ∈ [2, ∞) if and only if λ̃ ∈ [2/3, 1). This reparameterization yields the conclusion. □
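As a quick check of this reparameterization, λ̃ = λ/(1 + λ) = 1 − 1/(1 + λ) is strictly increasing in λ, equals 2/3 at λ = 2, and tends to 1 as λ → ∞; hence λ ∈ [2, ∞) corresponds exactly to λ̃ ∈ [2/3, 1), and λ is recovered from λ̃ via λ = λ̃/(1 − λ̃). This is the convex-combination weighting, with weight (1 − λ̃) on the fitting term and λ̃ on the arbitrage-penalty, under which the numerical results are reported as functions of λ̃.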

References

1. Bachelier, Louis. 1900. Théorie de la spéculation. Annales Scientifiques de l’É.N.S. 17: 21–86.
2. Bain, Alan, and Dan Crisan. 2009. Fundamentals of stochastic filtering. In Stochastic Modelling and Applied Probability. New York: Springer, vol. 60.
3. Björk, Tomas. 2009. Arbitrage Theory in Continuous Time, 3rd ed. Oxford: Oxford University Press.
4. Björk, Tomas, and Bent Jesper Christensen. 1999. Interest rate dynamics and consistent forward rate curves. Mathematical Finance 9: 323–48.
5. Braides, Andrea. 2002. Γ-convergence for beginners. In Oxford Lecture Series in Mathematics and Its Applications. Oxford: Oxford University Press, vol. 22.
6. Brown, R. C., and Bohumir Opic. 1992. Embeddings of weighted Sobolev spaces into spaces of continuous functions. Proceedings of the Royal Society of London. Series A 439: 279–96.
7. Carmona, René. 2014. Statistical Analysis of Financial Data in R, 2nd ed. Springer Texts in Statistics. New York: Springer.
8. Chen, Hung-Ching Justin, and Malik Magdon-Ismail. NN-OPT: Neural Network for Option Pricing Using Multinomial Tree. In International Conference on Neural Information Processing. New York: Springer, pp. 360–369.
9. Chen, Luyang, Markus Pelger, and Jason Zhu. 2019. Deep learning in asset pricing. Available online: https://ssrn.com/abstract=3350138 (accessed on 15 April 2020).
10. Christensen, Jens H. E., Francis X. Diebold, and Glenn D. Rudebusch. 2011a. The affine arbitrage-free class of Nelson-Siegel term structure models. Journal of Econometrics 164: 4–20.
11. Cont, Rama, and David-Antoine Fournié. 2013. Functional Itô calculus and stochastic integral representation of martingales. Annals of Probability 41: 109–33.
12. Cuchiero, Christa. 2011. Affine and Polynomial Processes. Ph.D. thesis, ETH Zurich, Zürich, Switzerland.
13. Cuchiero, Christa, Irene Klein, and Josef Teichmann. 2016. A new perspective on the fundamental theorem of asset pricing for large financial markets. Theory of Probability and Its Applications 60: 561–79.
14. Dal Maso, Gianni. 1993. An introduction to Γ-convergence. In Progress in Nonlinear Differential Equations and Their Applications. Boston: Birkhäuser Boston, Inc., vol. 8.
15. De Giorgi, Ennio. 1975. Sulla convergenza di alcune successioni d’integrali del tipo dell’area. Rend. Mat. 8: 277–94.
16. Delbaen, Freddy, and Walter Schachermayer. 1994. A general version of the fundamental theorem of asset pricing. Mathematische Annalen 300: 463–520.
17. Delbaen, Freddy, and Walter Schachermayer. 1998. The fundamental theorem of asset pricing for unbounded stochastic processes. Mathematische Annalen 312: 215–50.
18. Devin, Siobhán, Bernard Hanzon, and Thomas Ribarits. 2010. A Finite-Dimensional HJM Model: How Important is Arbitrage-Free Evolution? International Journal of Theoretical and Applied Finance 13: 1241–63.
19. Diebold, Francis X., and Glenn D. Rudebusch. 2013. Yield Curve Modeling and Forecasting. Princeton: Princeton University Press.
20. Dupire, Bruno. 2009. Functional Itô Calculus. Bloomberg Portfolio Research Paper No. 2009-04-FRONTIERS. New York: Bloomberg L.P.
21. Elworthy, K. David. 1982. Stochastic Differential Equations on Manifolds. London Mathematical Society Lecture Note Series. Cambridge: Cambridge University Press, vol. 70.
22. Filipović, Damir. 2000. Exponential-polynomial families and the term structure of interest rates. Bernoulli 6: 1081–107.
23. Filipović, Damir. 2001. Consistency Problems for Heath-Jarrow-Morton Interest Rate Models. Lecture Notes in Mathematics. Berlin: Springer, vol. 1760.
24. Filipović, Damir. 2009. Term-Structure Models. A Graduate Course. Springer Finance. Berlin: Springer.
25. Filipović, Damir, Stefan Tappe, and Josef Teichmann. 2010. Term structure models driven by Wiener processes and Poisson measures: Existence and positivity. SIAM Journal on Financial Mathematics 1: 523–54.
26. Filipović, Damir, and Josef Teichmann. 2004. On the geometry of the term structure of interest rates. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 460: 129–67.
27. Focardi, Matteo. 2012. Γ-convergence: A tool to investigate physical phenomena across scales. Mathematical Methods in the Applied Sciences 35: 1613–58.
28. Fontana, Claudio. 2014. A note on arbitrage, approximate arbitrage and the fundamental theorem of asset pricing. Stochastics An International Journal of Probability and Stochastic Processes 86: 922–31.
29. Fournie, David-Antoine. 2010. Functional Ito Calculus and Applications. Ph.D. thesis, Columbia University, New York, NY, USA.
30. Gelenbe, Erol. 1989. Random neural networks with negative and positive signals and product form solution. Neural Computation 1: 502–10.
31. Gonon, Lukas, Lyudmila Grigoryeva, and Juan-Pablo Ortega. 2020. Approximation bounds for random neural networks and reservoir systems. arXiv arXiv:2002.05933.
32. Guasoni, Paolo. 2006. No arbitrage under transaction costs, with fractional Brownian motion and beyond. Mathematical Finance 16: 569–82.
33. Harrison, J. Michael, and David M. Kreps. 1979. Martingales and arbitrage in multiperiod securities markets. Journal of Economic Theory 20: 381–408.
34. Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. 2015. Statistical Learning with Sparsity: The LASSO and Generalizations. Boca Raton: CRC Press.
35. Heath, David, Robert Jarrow, and Andrew Morton. 1992. Bond Pricing and the Term Structure of Interest Rates: A New Methodology for Contingent Claims Valuation. Econometrica 60: 77–105.
36. Hornik, Kurt. 1991. Approximation capabilities of multilayer feedforward networks. Neural Networks 4: 251–57.
37. Jaeger, Herbert, and Harald Haas. 2004. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304: 78–80.
38. Jazaerli, Samy, and Yuri F. Saporito. 2017. Functional Itô calculus, Path-Dependence and the Computation of Greeks. Stochastic Processes and their Applications 127: 3997–4028.
39. Kratsios, Anastasis. 2019a. Deep Arbitrage-Free Regularization. Available online: https://github.com/AnastasisKratsios/Deep_Arbitrage_Free_Regularization/ (accessed on 8 April 2020).
40. Kratsios, Anastasis. 2020. The Universal Approximation Property: Characterizations, Existence, and a Canonical Topology for Deep-Learning. Machine Learning arXiv:1910.03344.
41. Meucci, Attilio. 2005. Risk and Asset Allocation. Springer Finance. Berlin: Springer.
42. Musiela, Marek, and Marek Rutkowski. 1997. Martingale Methods in Financial Modelling. In Applications of Mathematics. Berlin: Springer, xii + 512 pp.
43. Nelson, Charles R., and Andrew F. Siegel. 1987. Parsimonious modeling of yield curves. Journal of Business 60: 473–89.
44. Pang, Tao, and Azmat Hussain. 2015. An application of functional Ito’s formula to stochastic portfolio optimization with bounded memory. Paper presented at the 2015 Conference on Control and Its Applications, Paris, France, July 8–10; pp. 159–166.
45. Rahimi, Ali, and Benjamin Recht. 2008. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems. Vancouver: NIPS, pp. 1177–1184.
46. Schweizer, Martin. 1995. On the minimal martingale measure and the Föllmer-Schweizer decomposition. Stochastic Analysis and Applications 13: 573–99.
47. Shreve, Steven E. 2004. Stochastic Calculus for Finance II: Continuous-Time Models. Berlin: Springer Science & Business Media, vol. 11.
48. Tibshirani, Robert. 1996. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society. Series B (Methodological) 58: 267–88.
49. Zou, Hui, Trevor Hastie, and Robert Tibshirani. 2006. Sparse principal component analysis. Journal of Computational and Graphical Statistics 15: 265–86.
Sample Availability: Code is available at: https://github.com/AnastasisKratsios.
Figure 1. Day-ahead predictions as a function of λ ˜ across given maturities for the A−Reg(dNS) model. (a) Average day-ahead predicted yield curves. (b) Estimated MSE of day-ahead bond price predictions.
Figure 2. Day-ahead predictions as a function of λ ˜ across given maturities for the A−Reg(dPCA) model. (a) Average day-ahead predicted yield curves. (b) Estimated MSE of day-ahead bond price predictions.
Table 1. (Short): MSE Comparisons for 1-day ahead bond-price predictions.
Model \ Maturity    0.5             1               2               3               4
Vasiček             3.155 × 10^-1   4.323 × 10^-1   3.622 × 10^-1   1.950 × 10^-1   5.730 × 10^-2
dPCA                2.526 × 10^-1   4.349 × 10^-1   4.176 × 10^-1   2.526 × 10^-1   9.261 × 10^-2
A-Reg(dPCA)         8.066 × 10^-1   6.943 × 10^-1   5.110 × 10^-1   2.755 × 10^-1   9.588 × 10^-2
dNS                 4.513 × 10^-2   1.479 × 10^-1   2.134 × 10^-1   1.477 × 10^-1   5.968 × 10^-2
AFNS                3.729 × 10^-1   5.315 × 10^-1   4.414 × 10^-1   2.436 × 10^-1   8.301 × 10^-2
A-Reg(dNS)          2.903 × 10^-2   9.514 × 10^-2   1.601 × 10^-1   1.235 × 10^-1   6.482 × 10^-2

Model \ Maturity    5               6               7               8               9
Vasiček             7.735 × 10^-3   1.996 × 10^-4   1.024 × 10^-3   1.480 × 10^-3   1.348 × 10^-3
dPCA                2.193 × 10^-2   3.326 × 10^-3   3.119 × 10^-4   1.897 × 10^-5   8.097 × 10^-7
A-Reg(dPCA)         2.221 × 10^-2   3.340 × 10^-3   3.123 × 10^-4   1.898 × 10^-5   8.099 × 10^-7
dNS                 1.972 × 10^-2   8.313 × 10^-3   5.323 × 10^-3   3.925 × 10^-3   2.998 × 10^-3
AFNS                1.840 × 10^-2   3.225 × 10^-3   1.633 × 10^-3   2.084 × 10^-3   2.708 × 10^-3
A-Reg(dNS)          3.579 × 10^-2   2.236 × 10^-2   1.523 × 10^-2   1.050 × 10^-2   7.308 × 10^-3
Table 2. (Mid): MSE Comparisons for 1-day ahead bond-price predictions.
Model \ Maturity    10              11              12              13              14
Vasiček             1.108 × 10^-3   9.002 × 10^-4   7.382 × 10^-4   6.125 × 10^-4   5.135 × 10^-4
dPCA                2.578 × 10^-8   6.328 × 10^-10  1.433 × 10^-11  2.607 × 10^-13  4.179 × 10^-15
A-Reg(dPCA)         2.579 × 10^-8   6.328 × 10^-10  1.433 × 10^-11  2.607 × 10^-13  4.179 × 10^-15
dNS                 2.381 × 10^-3   1.969 × 10^-3   1.686 × 10^-3   1.484 × 10^-3   1.337 × 10^-3
AFNS                3.407 × 10^-3   4.163 × 10^-3   4.918 × 10^-3   5.603 × 10^-3   6.164 × 10^-3
A-Reg(dNS)          5.215 × 10^-3   3.827 × 10^-3   2.885 × 10^-3   2.229 × 10^-3   1.761 × 10^-3

Model \ Maturity    15              16              17              18              19
Vasiček             4.342 × 10^-4   3.698 × 10^-4   3.169 × 10^-4   2.729 × 10^-4   2.360 × 10^-4
dPCA                6.714 × 10^-17  9.566 × 10^-19  1.426 × 10^-20  1.819 × 10^-22  2.749 × 10^-24
A-Reg(dPCA)         6.714 × 10^-17  9.566 × 10^-19  1.426 × 10^-20  1.818 × 10^-22  2.746 × 10^-24
dNS                 1.225 × 10^-3   1.138 × 10^-3   1.069 × 10^-3   1.012 × 10^-3   9.639 × 10^-4
AFNS                6.577 × 10^-3   6.867 × 10^-3   7.115 × 10^-3   7.453 × 10^-3   8.052 × 10^-3
A-Reg(dNS)          1.422 × 10^-3   1.171 × 10^-3   9.831 × 10^-4   8.406 × 10^-4   7.316 × 10^-4
Table 3. (Long): MSE Comparisons for 1-day ahead bond-price predictions.
Model \ Maturity    20              21              22              23              24
Vasiček             2.049 × 10^-4   1.784 × 10^-4   1.558 × 10^-4   1.364 × 10^-4   1.196 × 10^-4
dPCA                3.816 × 10^-26  5.254 × 10^-28  8.047 × 10^-30  9.958 × 10^-32  1.336 × 10^-33
A-Reg(dPCA)         3.781 × 10^-26  4.847 × 10^-28  3.015 × 10^-30  2.684 × 10^-30  1.452 × 10^-29
dNS                 9.228 × 10^-4   8.866 × 10^-4   8.542 × 10^-4   8.247 × 10^-4   7.976 × 10^-4
AFNS                9.097 × 10^-3   1.075 × 10^-2   1.310 × 10^-2   1.611 × 10^-2   1.960 × 10^-2
A-Reg(dNS)          6.480 × 10^-4   5.838 × 10^-4   5.349 × 10^-4   4.984 × 10^-4   4.814 × 10^-4

Model \ Maturity    25              26              27              28              29
Vasiček             1.051 × 10^-4   9.254 × 10^-5   8.160 × 10^-5   7.205 × 10^-5   6.371 × 10^-5
dPCA                2.067 × 10^-35  2.814 × 10^-37  3.639 × 10^-39  5.371 × 10^-41  7.459 × 10^-43
A-Reg(dPCA)         9.846 × 10^-29  1.102 × 10^-27  2.108 × 10^-26  6.986 × 10^-25  3.979 × 10^-23
dNS                 7.722 × 10^-4   7.484 × 10^-4   7.257 × 10^-4   7.041 × 10^-4   6.835 × 10^-4
AFNS                2.323 × 10^-2   2.660 × 10^-2   2.929 × 10^-2   3.097 × 10^-2   3.147 × 10^-2
A-Reg(dNS)          4.911 × 10^-4   5.288 × 10^-4   6.011 × 10^-4   7.214 × 10^-4   9.138 × 10^-4
Table 4. (30 Year): MSE Comparisons for 1-day ahead bond-price predictions.
Model \ Maturity    30
Vasiček             6.371 × 10^-5
dPCA                7.459 × 10^-43
A-Reg(dPCA)         3.979 × 10^-23
dNS                 6.835 × 10^-4
AFNS                3.147 × 10^-2
A-Reg(dNS)          9.138 × 10^-4
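For completeness, the MSE figures reported in Tables 1–4 are standard one-day-ahead mean squared prediction errors; a minimal sketch of how a single entry could be computed (with hypothetical predicted and realized bond-price arrays standing in for the paper's data) is:

import numpy as np

predicted = np.array([0.9812, 0.9795, 0.9820, 0.9808])   # hypothetical day-ahead predicted bond prices
realized = np.array([0.9801, 0.9790, 0.9833, 0.9815])    # hypothetical realized bond prices
mse = np.mean((predicted - realized) ** 2)               # mean squared one-day-ahead prediction error
print(f"{mse:.3e}")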
