In this section, we present a GP-based methodology to build an arbitrage-free European payer (receiver) swaption cube. For an input domain $\Omega = [T_{\min}, T_{\max}] \times [\tau_{\min}, \tau_{\max}] \times [K_{\min}, K_{\max}]$ of maturities, tenors and strikes, we build, at a given quotation date, a payer (receiver) swaption price cube $(T, \tau, K) \mapsto S(T, \tau, K)$ satisfying the no-arbitrage conditions given in Proposition 1, from $n$ noisy observations $y = (y_1, \dots, y_n)^\top$ of the function $S$ at input points $x_1, \dots, x_n$, where $x_i = (T_i, \tau_i, K_i) \in \Omega$, $i = 1, \dots, n$. We assume that the price function $S$ is represented as a Gaussian process. The market fit condition is then written as
$$y = S(\mathbf{x}) + \epsilon, \qquad (1)$$
where $S(\mathbf{x}) = (S(x_1), \dots, S(x_n))^\top$ is the vector of the values of the GP at the input points $x_1, \dots, x_n$. The additive noise term $\epsilon$ is assumed to be a zero-mean Gaussian vector, independent of $S$, and with a homoscedastic covariance matrix given as $\sigma_\epsilon^2 I_n$, where $I_n$ is the identity matrix of size $n$.
3.1. Classical 3-Dimensional Gaussian Regression
We consider a zero-mean Gaussian process prior on the mapping $(T, \tau, K) \mapsto S(T, \tau, K)$ with covariance function $c$. Then, the output vector $S(\mathbf{x})$ has a normal distribution with zero mean and covariance matrix $C$ with components $C_{ij} = c(x_i, x_j)$. We consider a 3-dimensional anisotropic, stationary covariance kernel given as, for $x = (T, \tau, K)$ and $x' = (T', \tau', K')$,
$$c(x, x') = \sigma^2\, c_T\!\left(\frac{T - T'}{\theta_T}\right) c_\tau\!\left(\frac{\tau - \tau'}{\theta_\tau}\right) c_K\!\left(\frac{K - K'}{\theta_K}\right),$$
where $\theta_T$, $\theta_\tau$, $\theta_K$ are the length scale parameters associated with the directions $T$, $\tau$, $K$, and $\sigma^2$ is the marginal variance of the GP prior. The functions $c_T$, $c_\tau$ and $c_K$ are kernel correlation functions. As explained in Rasmussen and Williams (2005), the posterior process $S \mid y$ is again a GP, with mean function $\eta$ and covariance function $\tilde{c}$ given respectively by
$$\eta(x) = \mathbf{c}(x)^\top \left( C + \sigma_\epsilon^2 I_n \right)^{-1} y \qquad (2)$$
and
$$\tilde{c}(x, x') = c(x, x') - \mathbf{c}(x)^\top \left( C + \sigma_\epsilon^2 I_n \right)^{-1} \mathbf{c}(x'), \qquad (3)$$
where $\mathbf{c}(x) = (c(x, x_1), \dots, c(x, x_n))^\top$.
Prediction and uncertainty quantification are made using the conditional distribution $S \mid y$. The best linear unbiased estimator of $S$ is given by the conditional mean function (2). The conditional covariance function (3) can be used to obtain confidence bounds around the predicted price surface. The hyperparameters of the kernel function $c$, as well as the variance of the noise, can be estimated using a maximum likelihood approach or a cross-validation approach (see Rasmussen and Williams (2005)). The model described in this subsection corresponds to unconstrained Gaussian process regression and therefore does not take into account the arbitrage-free conditions of Proposition 1.
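As an illustration, here is a minimal NumPy sketch of this unconstrained regression, assuming squared-exponential correlation functions for $c_T$, $c_\tau$ and $c_K$; all names (`kernel`, `gp_posterior`, the array layouts) are ours and not part of the paper:

```python
import numpy as np

def kernel(X, Xp, sigma2, theta):
    """Anisotropic product kernel c(x, x') = sigma^2 * prod_d c_d((x_d - x'_d)/theta_d),
    with a squared-exponential correlation in each direction (one possible choice)."""
    D = (X[:, None, :] - Xp[None, :, :]) / theta   # pairwise scaled differences
    return sigma2 * np.exp(-0.5 * np.sum(D ** 2, axis=-1))

def gp_posterior(X, y, Xstar, sigma2, theta, noise2):
    """Conditional mean (2) and covariance (3) of the GP at the points Xstar."""
    C = kernel(X, X, sigma2, theta) + noise2 * np.eye(len(X))
    cs = kernel(Xstar, X, sigma2, theta)           # cross-covariance vectors c(x*)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = cs @ alpha
    V = np.linalg.solve(L, cs.T)
    cov = kernel(Xstar, Xstar, sigma2, theta) - V.T @ V
    return mean, cov

# Example inputs: X is an n x 3 array of observed (maturity, tenor, strike) triples,
# y the n observed swaption prices, theta = [theta_T, theta_tau, theta_K].
```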
3.2. Imposing No-Arbitrage Constraints on GP Regression
Conditional on the inequality constraints given in Proposition 1, the process $S$ is no longer Gaussian, which leads to two difficulties. The first is that we depart from the classical framework of GP regression, where the posterior distribution remains Gaussian. The second is that testing the inequality constraints on the entire input domain would require an infinite number of checks. We will use the solution proposed by Cousin et al. (2016), which consists of constructing a finite-dimensional approximation $S^N$ of the Gaussian prior $S$, designed in such a way that the constraints can be imposed on the entire domain $\Omega$ with a finite number of checks. We then recover the constrained posterior distribution by sampling a truncated Gaussian vector. Let us describe the methodology in more detail.
We first rescale the input domain $\Omega$ to the unit cube $[0, 1]^3$. Without loss of generality, we then consider a discretized version of this rescaled domain as a 3-dimensional regular grid with nodes $(T_i, \tau_j, K_l)$, $i = 1, \dots, N_T$, $j = 1, \dots, N_\tau$, $l = 1, \dots, N_K$, where $N_T$, $N_\tau$ and $N_K$ are, respectively, the number of maturities, tenors and strikes we chose for our grid, and $\Delta_T = \frac{1}{N_T - 1}$, $\Delta_\tau = \frac{1}{N_\tau - 1}$ and $\Delta_K = \frac{1}{N_K - 1}$ the corresponding constant steps. Then, $N = N_T N_\tau N_K$ is the total number of nodes of the 3-dimensional grid. Now, each node $(T_i, \tau_j, K_l)$ of the grid is associated with a hat basis function defined as:
$$\phi_{i,j,l}(T, \tau, K) = \max\!\left(1 - \frac{|T - T_i|}{\Delta_T},\, 0\right) \max\!\left(1 - \frac{|\tau - \tau_j|}{\Delta_\tau},\, 0\right) \max\!\left(1 - \frac{|K - K_l|}{\Delta_K},\, 0\right).$$
Then, the GP prior $S$ is approximated on $\Omega$ by the following finite-dimensional process:
$$S^N(T, \tau, K) = \sum_{i=1}^{N_T} \sum_{j=1}^{N_\tau} \sum_{l=1}^{N_K} S(T_i, \tau_j, K_l)\, \phi_{i,j,l}(T, \tau, K). \qquad (4)$$
Note that the process $S^N$ corresponds to a piecewise linear interpolation of $S$ at the nodes $(T_i, \tau_j, K_l)$, for $i = 1, \dots, N_T$, $j = 1, \dots, N_\tau$ and $l = 1, \dots, N_K$.
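The hat functions and the evaluation of $S^N$ are straightforward to code. The sketch below (names and array layout are ours) evaluates the $N$-dimensional vector of basis functions at an arbitrary point; $S^N(x)$ is then the dot product of this vector with the node values:

```python
import numpy as np

def hat(u, nodes, delta):
    """All 1-D hat basis functions max(1 - |u - u_i|/delta, 0) evaluated at u."""
    return np.maximum(1.0 - np.abs(u - nodes) / delta, 0.0)

def basis_vector(x, T_nodes, tau_nodes, K_nodes, dT, dtau, dK):
    """The N-dimensional vector Phi(x) of the 3-D hat functions
    phi_{i,j,l}(x) = phi_i(T) phi_j(tau) phi_l(K), flattened in (i, j, l)
    order (strike index l varies fastest)."""
    T, tau, K = x
    phi = np.einsum('i,j,l->ijl',
                    hat(T, T_nodes, dT), hat(tau, tau_nodes, dtau), hat(K, K_nodes, dK))
    return phi.ravel()

# S^N(x) = basis_vector(x, ...) @ xi, where xi stacks the node values
# S(T_i, tau_j, K_l) in the same (i, j, l) order.
```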
If we denote $S(T_i, \tau_j, K_l)$ as $\xi_{i,j,l}$, then the vector $\xi = (\xi_{i,j,l})_{i,j,l}$ is a zero-mean Gaussian vector with $N \times N$ covariance matrix $\Gamma^N$ with components $\Gamma^N_{(i,j,l),(i',j',l')} = c\big((T_i, \tau_j, K_l), (T_{i'}, \tau_{j'}, K_{l'})\big)$ for $(T_i, \tau_j, K_l)$ and $(T_{i'}, \tau_{j'}, K_{l'})$ two nodes of the grid. Let us define $\Phi(x)$ as the $N$-dimensional vector given as:
$$\Phi(x) = \big(\phi_{1,1,1}(x), \dots, \phi_{N_T, N_\tau, N_K}(x)\big)^\top.$$
Then, the finite-dimensional GP prior defined in (4) can be restated in matrix form as
$$S^N(x) = \Phi(x)^\top \xi.$$
Then, if we define $\mathbf{\Phi}$ as the $n \times N$ matrix of basis functions such that each row $i$ corresponds to the vector $\Phi(x_i)^\top$, we have that $y = \mathbf{\Phi}\xi + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2 I_n)$.
From now on, we will refer to $\Gamma^N$ simply as $\Gamma$.
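Continuing the sketch above, the matrix $\mathbf{\Phi}$ and the covariance matrix $\Gamma$ can be assembled as follows (reusing the hypothetical `basis_vector` and `kernel` helpers defined earlier):

```python
# Matrix form of the market-fit condition: y = Phi @ xi + eps, where row i of Phi
# is the basis vector Phi(x_i) of observation point x_i, and Gamma is the kernel
# evaluated at the N grid nodes (listed in the same flattened order as xi).
def basis_matrix(X_obs, *grid):
    return np.vstack([basis_vector(x, *grid) for x in X_obs])   # n x N

def grid_covariance(nodes, sigma2, theta):
    # nodes: N x 3 array listing the grid nodes (T_i, tau_j, K_l) in flattened order
    return kernel(nodes, nodes, sigma2, theta)                  # N x N matrix Gamma
```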
Proposition 2. The following statements hold for European payer swaptions.
- (i) The finite-dimensional process $S^N$ converges uniformly to $S$ on $\Omega$ as $N_T \to \infty$, $N_\tau \to \infty$ and $N_K \to \infty$, almost surely;
- (ii) $S^N$ is a decreasing function of $K$ on $\Omega$ if and only if $\xi_{i,j,l+1} \le \xi_{i,j,l}$ for all $i$, $j$, $l$;
- (iii) $S^N$ is a convex function of $K$ on $\Omega$ if and only if $\xi_{i,j,l+2} - \xi_{i,j,l+1} \ge \xi_{i,j,l+1} - \xi_{i,j,l}$ for all $i$, $j$, $l$.
These properties can be proved using the same methodology as in Maatouk and Bay (2017) for proving monotonicity and convexity. Note that these inequality constraints are linear in the components of $\xi$.
Remark 1. For a European receiver swaption, properties (i) and (iii) above are unchanged. However, (ii) has to be replaced by: $S^N$ is an increasing function of $K$ if and only if $\xi_{i,j,l+1} \ge \xi_{i,j,l}$ for all $i$, $j$, $l$.
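Since the constraints in (ii), (iii) and Remark 1 are linear in $\xi$, they can be encoded as sparse matrices $G$ with $G\xi \ge 0$. A possible construction, with our own flattening convention (strike index $l$ varying fastest), is sketched below:

```python
import numpy as np
from scipy.sparse import lil_matrix

def strike_constraints(NT, Ntau, NK, payer=True):
    """Sparse matrices G such that G @ xi >= 0 encodes (ii) and (iii) of
    Proposition 2 (Remark 1 flips the sign of the monotonicity rows)."""
    N = NT * Ntau * NK
    idx = lambda i, j, l: (i * Ntau + j) * NK + l   # (i, j, l) -> flat index, l fastest
    mono = lil_matrix((NT * Ntau * (NK - 1), N))
    conv = lil_matrix((NT * Ntau * (NK - 2), N))
    r = s = 0
    for i in range(NT):
        for j in range(Ntau):
            for l in range(NK - 1):                 # payer: xi_{l} - xi_{l+1} >= 0
                mono[r, idx(i, j, l)] = 1.0 if payer else -1.0
                mono[r, idx(i, j, l + 1)] = -1.0 if payer else 1.0
                r += 1
            for l in range(NK - 2):                 # convexity: second difference >= 0
                conv[s, idx(i, j, l)] = 1.0
                conv[s, idx(i, j, l + 1)] = -2.0
                conv[s, idx(i, j, l + 2)] = 1.0
                s += 1
    return mono.tocsr(), conv.tocsr()
```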
However, it is not possible to impose the in-plane triangular condition everywhere on the domain $\Omega$ with a finite number of checks. In the empirical Section 4, we consider a weaker version of this constraint by only imposing its validity for time steps of size 1 year. This constraint is set using a "fictitious grid" of maturities
$$\tilde{T}_p = \tilde{T}_0 + p\,\delta, \quad p = 0, \dots, P, \qquad (5)$$
where the step is $\delta = \frac{1}{T_{\max} - T_{\min}}$; it is therefore a rescaled 1-year step. Formally, we consider the following weaker version of the in-plane triangular condition: for every fictitious maturity $\tilde{T}_p$, tenor $\tau_j$ and strike $K_l$,
$$S^N\big(\tilde{T}_p, \tau_j + \delta, K_l\big) \le S^N\big(\tilde{T}_p, \delta, K_l\big) + S^N\big(\tilde{T}_p + \delta, \tau_j, K_l\big). \qquad (6)$$
Then, (6) is linear in $\xi$ and can be added to the constraints defined in Proposition 2.
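Because $S^N(x) = \Phi(x)^\top \xi$, each instance of (6) is likewise one linear inequality on $\xi$. A sketch of how such a row can be assembled from the basis vectors (reusing the hypothetical `basis_vector` helper; the point naming is ours):

```python
def triangular_row(x_short, x_shifted, x_long, *grid):
    """Row g with g @ xi >= 0 encoding one instance of (6):
    S^N(x_long) <= S^N(x_short) + S^N(x_shifted), each point a (T, tau, K) triple."""
    return (basis_vector(x_short, *grid) + basis_vector(x_shifted, *grid)
            - basis_vector(x_long, *grid))

# e.g., for a fictitious maturity Tp, tenor tau and strike K, with delta the rescaled
# 1-year step: x_long = (Tp, tau + delta, K), x_short = (Tp, delta, K),
# x_shifted = (Tp + delta, tau, K).
```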
Given property (i) of Proposition 2, if we define $\mathcal{C}$ as the set of 3-dimensional continuous functions which are monotonic and convex with respect to $K$, and which respect the in-plane triangular inequality (for steps of size 1 year, on the fictitious grid (5)), then our construction problem consists in finding the conditional distribution
$$S^N \;\Big|\; \big\{ y = \mathbf{\Phi}\xi + \epsilon,\; S^N \in \mathcal{C} \big\}.$$
Given properties (ii) and (iii) of Proposition 2, our process $S^N$ satisfies the no-arbitrage conditions in the strike direction on the entire domain $\Omega$ as soon as these constraints are satisfied at the nodes. The problem above can then be restated as follows:
$$\xi \;\Big|\; \big\{ y = \mathbf{\Phi}\xi + \epsilon,\; \xi \in \mathcal{C}_\xi \big\}.$$
Indeed, $S^N \in \mathcal{C}$ if and only if $\xi \in \mathcal{C}_\xi$, where $\mathcal{C}_\xi$ is the set of linear inequality constraints on $\xi$ defined by (ii) and (iii) of Proposition 2 and by the relaxed in-plane triangular inequality constraint (6).
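For intuition, the truncated posterior of $\xi$ can in principle be sampled by naive rejection from its unconstrained Gaussian posterior, as in the sketch below; this is only workable for small grids, and dedicated truncated-Gaussian samplers are used in practice:

```python
import numpy as np

def sample_constrained_posterior(Phi, Gamma, y, noise2, G, n_samples, seed=0):
    """Naive rejection sampling of xi | {y = Phi xi + eps, G xi >= 0}: draw from
    the unconstrained Gaussian posterior of xi and keep the draws satisfying
    the linear constraints stacked in G."""
    rng = np.random.default_rng(seed)
    A = Phi @ Gamma @ Phi.T + noise2 * np.eye(len(y))
    K = Gamma @ Phi.T @ np.linalg.inv(A)          # Kalman-type gain
    mean = K @ y                                   # unconstrained posterior mean of xi
    cov = Gamma - K @ Phi @ Gamma                  # unconstrained posterior covariance
    draws = []
    while len(draws) < n_samples:
        xi = rng.multivariate_normal(mean, cov)
        if (G @ xi >= -1e-9).all():               # small tolerance at the boundary
            draws.append(xi)
    return np.asarray(draws)
```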
3.3. Maximum Likelihood Estimation
The parameters of the kernel function $c$ and the noise parameter $\sigma_\epsilon^2$ can be given as an input or estimated. Let $\theta = (\sigma^2, \theta_T, \theta_\tau, \theta_K, \sigma_\epsilon^2)$. In our approach, we estimate $\theta$ as the hyperparameter vector that maximizes the marginal log likelihood of the observations under the finite-dimensional process $S^N$.
The unconstrained Gaussian likelihood is written as $L(\theta) = p(y \mid \theta)$. Under the finite-dimensional approximation, $y = \mathbf{\Phi}\xi + \epsilon$ is a zero-mean Gaussian vector with covariance matrix $A = \mathbf{\Phi} \Gamma \mathbf{\Phi}^\top + \sigma_\epsilon^2 I_n$, so that the marginal log likelihood can be expressed as:
$$\log p(y \mid \theta) = -\frac{1}{2} y^\top A^{-1} y - \frac{1}{2} \log |A| - \frac{n}{2} \log 2\pi.$$
This marginal log likelihood also has a closed-form derivative with respect to each parameter $\theta_i$ (see Rasmussen and Williams (2005)), which is given as:
$$\frac{\partial}{\partial \theta_i} \log p(y \mid \theta) = \frac{1}{2} y^\top A^{-1} \frac{\partial A}{\partial \theta_i} A^{-1} y - \frac{1}{2} \operatorname{tr}\!\left( A^{-1} \frac{\partial A}{\partial \theta_i} \right).$$
Here, $A = \mathbf{\Phi} \Gamma \mathbf{\Phi}^\top + \sigma_\epsilon^2 I_n$ and $\frac{\partial A}{\partial \theta_i}$ is the element-wise derivative of $A$ with respect to $\theta_i$.
This closed form can be used to shorten the computation time of the MLE in practice. This matters because the computation of the MLE is longer for the swaption cube than for an equity surface, since the size of the product $\mathbf{\Phi} \Gamma \mathbf{\Phi}^\top$ grows with the dimension of the problem.
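A possible implementation of the marginal log likelihood and its closed-form gradient, again for the squared-exponential choice of correlation functions and with hyperparameters optimized in log scale (our conventions, not the paper's):

```python
import numpy as np
from scipy.optimize import minimize

def mll_and_grad(log_params, Phi, nodes, y):
    """Marginal log likelihood of y ~ N(0, A), A = Phi Gamma Phi^T + noise2 I_n,
    and its closed-form gradient. Parameters are passed in log scale so that
    positivity is automatic."""
    sigma2, thT, thtau, thK, noise2 = np.exp(log_params)
    theta = np.array([thT, thtau, thK])
    D = nodes[:, None, :] - nodes[None, :, :]          # pairwise node differences
    Gamma = sigma2 * np.exp(-0.5 * np.sum((D / theta) ** 2, axis=-1))
    n = len(y)
    A = Phi @ Gamma @ Phi.T + noise2 * np.eye(n)
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    ll = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)
    W = np.outer(alpha, alpha) - np.linalg.inv(A)      # d ll = 0.5 tr(W dA/dtheta_i)
    # dGamma in log scale: d/dlog(sigma2) -> Gamma; d/dlog(theta_k) -> Gamma * d_k^2/theta_k^2
    dGammas = [Gamma] + [Gamma * D[:, :, k] ** 2 / theta[k] ** 2 for k in range(3)]
    grad = [0.5 * np.trace(W @ (Phi @ dG @ Phi.T)) for dG in dGammas]
    grad.append(0.5 * noise2 * np.trace(W))            # d/dlog(noise2): dA = noise2 * I
    return ll, np.array(grad)

def fit_hyperparameters(Phi, nodes, y, x0=np.zeros(5)):
    nll = lambda p: tuple(-v for v in mll_and_grad(p, Phi, nodes, y))
    return minimize(nll, x0, jac=True, method="L-BFGS-B")
```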
Remark 2. The reason for maximizing the unconstrained likelihood instead of the constrained likelihood is discussed in Bachoc et al. (2019), where it is explained that constraining the GP increases the computational burden of the maximization while having a negligible impact on the resulting MLE.
3.4. The Most Probable Response Cube and Measurement Noises
In the constrained GP regression case, we consider the mode of the truncated Gaussian process, rather than its mean, as an estimator for the cube: unlike the mean, which must be approximated by sampling, the mode is easier to compute. In the unconstrained GP regression case, mean and mode coincide because of the Gaussian profile of the conditional distribution.
The maximum a posteriori (MAP) estimate of $S^N$ given the constraints satisfies the constraints on the entire domain. In the sense of Bayesian statistics, it coincides with the mode of the truncated Gaussian process. Its expression is given in Cousin et al. (2016) as
$$M^N(x) = \Phi(x)^\top \hat{\xi},$$
where $\hat{\xi}$ is the MAP of the Gaussian coefficients $\xi$ which satisfies the inequality constraints. In order to locate the largest noise terms, we proceed as in Chataigner et al. (2021). We compute the joint MAP $(\hat{\xi}, \hat{\epsilon})$ of the truncated Gaussian vector $(\xi, \epsilon)$ given $\{y = \mathbf{\Phi}\xi + \epsilon,\ \xi \in \mathcal{C}_\xi\}$. This joint MAP $(\hat{\xi}, \hat{\epsilon})$ is the solution of:
$$(\hat{\xi}, \hat{\epsilon}) = \operatorname*{arg\,max}_{\substack{\mathbf{\Phi}\xi + \epsilon = y \\ \xi \in \mathcal{C}_\xi}} p(\xi, \epsilon).$$
The fact that $(\xi, \epsilon)$ is a zero-mean Gaussian vector with block-diagonal covariance matrix $\operatorname{diag}(\Gamma, \sigma_\epsilon^2 I_n)$ implies that $(\hat{\xi}, \hat{\epsilon})$ is a solution to the quadratic problem:
$$(\hat{\xi}, \hat{\epsilon}) = \operatorname*{arg\,min}_{\substack{\mathbf{\Phi}\xi + \epsilon = y \\ \xi \in \mathcal{C}_\xi}} \left( \xi^\top \Gamma^{-1} \xi + \frac{1}{\sigma_\epsilon^2} \|\epsilon\|^2 \right).$$
The most probable response cube is then $M^N(x) = \Phi(x)^\top \hat{\xi}$, for $x \in \Omega$, and the most probable measurement noise is $\hat{\epsilon} = y - \mathbf{\Phi}\hat{\xi}$.
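Substituting $\epsilon = y - \mathbf{\Phi}\xi$ turns the quadratic problem above into a quadratic program in $\xi$ alone, which a generic constrained solver can handle. A sketch with SciPy (the jitter term and solver choice are our assumptions):

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

def most_probable_cube(Phi, Gamma, y, noise2, G):
    """Minimize xi' Gamma^{-1} xi + ||y - Phi xi||^2 / noise2 subject to G xi >= 0,
    where G stacks the strike-direction and triangular constraint rows.
    Returns the MAP coefficients hat-xi and the most probable noises hat-eps."""
    Ginv = np.linalg.inv(Gamma + 1e-10 * np.eye(len(Gamma)))   # jitter for stability
    def obj(xi):
        eps = y - Phi @ xi
        val = xi @ Ginv @ xi + eps @ eps / noise2
        grad = 2.0 * (Ginv @ xi - Phi.T @ eps / noise2)
        return val, grad
    res = minimize(obj, np.zeros(Phi.shape[1]), jac=True, method="trust-constr",
                   constraints=[LinearConstraint(G, 0.0, np.inf)])
    xi_hat = res.x
    return xi_hat, y - Phi @ xi_hat

# The cube itself is then M(x) = basis_vector(x, ...) @ xi_hat for any x in the domain.
```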