Grouped Normal Variance Mixtures

Hintz, Erik; Hofert, Marius; Lemieux, Christiane

doi:10.3390/risks8040103


by Erik Hintz *, Marius Hofert and Christiane Lemieux

Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada

* Author to whom correspondence should be addressed.

Risks 2020, 8(4), 103; https://doi.org/10.3390/risks8040103

Submission received: 29 July 2020 / Revised: 8 September 2020 / Accepted: 30 September 2020 / Published: 7 October 2020

(This article belongs to the Special Issue Computational Risk Management)


Abstract

Grouped normal variance mixtures are a class of multivariate distributions that generalize classical normal variance mixtures such as the multivariate t distribution, by allowing different groups to have different (comonotone) mixing distributions. This allows one to better model risk factors where components within a group are of similar type, but where different groups have components of quite different type. This paper provides an encompassing body of algorithms to address the computational challenges when working with this class of distributions. In particular, the distribution function and copula are estimated efficiently using randomized quasi-Monte Carlo (RQMC) algorithms. We propose to estimate the log-density function, which is in general not available in closed form, using an adaptive RQMC scheme. This, in turn, gives rise to a likelihood-based fitting procedure to jointly estimate the parameters of a grouped normal mixture copula. We also provide mathematical expressions and methods to compute Kendall’s tau, Spearman’s rho and the tail dependence coefficient $\lambda$. All algorithms presented are available in the R package nvmix (version ≥ 0.0.5).

Keywords: grouped normal variance mixtures; distribution functions; densities; copulas; grouped t copula; risk measures; quasi-random number sequences

1. Introduction

It is well known that for the purpose of modeling dependence in a risk management setting, the multivariate normal distribution is not flexible enough, and therefore its use can lead to a misleading assessment of risk(s). Indeed, the multivariate normal has light tails and its copula is tail-independent such that inference based on this model heavily underestimates joint extreme events. An important class of distributions that generalizes this simple model is that of normal variance mixtures. A random vector $X = (X_1, \dots, X_d)$ follows a normal variance mixture, denoted by $X \sim \operatorname{NVM}_d(\mu, \Sigma, F_W)$, if, in distribution,

$$X = \mu + \sqrt{W} A Z, \tag{1}$$

where $\mu \in \mathbb{R}^d$ is the location (vector), $\Sigma = A A^\top$ for $A \in \mathbb{R}^{d \times k}$ denotes the symmetric, positive semidefinite scale (matrix) and $W \sim F_W$ is a non-negative random variable independent of the random vector $Z \sim N_k(0, I_k)$ (where $I_k \in \mathbb{R}^{k \times k}$ denotes the identity matrix); see, for example, (McNeil et al. 2015, Section 6.2) or (Hintz et al. 2020). Here, the random variable $W$ can be thought of as a shock mixing the normal $Z$, thus allowing $X$ to have different tail behavior and dependence structure than the special case of a multivariate normal.

The multivariate t distribution with $\nu > 0$ degrees of freedom (dof) is also a special case of (1), for $W \sim \operatorname{IG}(\nu/2, \nu/2)$; a random variable (rv) $W$ is said to follow an inverse-gamma distribution with shape $\alpha > 0$ and rate $\beta > 0$, notation $W \sim \operatorname{IG}(\alpha, \beta)$, if $W$ has density $f_W(w) = \beta^{\alpha} w^{-\alpha - 1} \exp(-\beta / w) / \Gamma(\alpha)$ for $w > 0$ (here, $\Gamma(x) = \int_0^\infty t^{x - 1} \exp(-t)\, dt$ denotes the gamma function). If $W \sim \operatorname{IG}(\nu/2, \nu/2)$, then $X_j \sim t_\nu(\mu_j, \Sigma_{jj})$, $j = 1, \dots, d$, so that all margins are univariate t with the same dof $\nu$. The t copula, which is the implicitly derived copula from $X \sim t_d(\nu, 0, P)$ for a correlation matrix $P$ via Sklar’s theorem, is a widely used copula in risk management; see, e.g., Demarta and McNeil (2005). It allows one to model pairwise dependencies, including tail dependence, flexibly via the correlation matrix $P$. When $P = I_d$, however, all k-dimensional margins of $X$ are identically distributed. To overcome this limitation, one can allow different margins to have different dof. On a copula level, this leads to the notion of grouped t copulas of Daul et al. (2003) and generalized t copulas of Luo and Shevchenko (2010).

In this paper, we, more generally, define grouped normal variance mixtures via the stochastic representation

$$X = \mu + \operatorname{Diag}(\sqrt{W}) A Z, \tag{2}$$

where $W = (W_1, \dots, W_d)$ is a d-dimensional non-negative and comonotone random vector with $W_j \sim F_{W_j}$ that is independent of $Z$. Denote by $F_W^{\leftarrow}(u) = \inf\{w \geq 0 : F_W(w) \geq u\}$ the quantile function of a random variable $W$. Comonotonicity of the $W_j$ implies the stochastic representation

$$W = (W_1, \dots, W_d) = (F_{W_1}^{\leftarrow}(U), \dots, F_{W_d}^{\leftarrow}(U)), \quad U \sim \operatorname{U}(0, 1). \tag{3}$$

If a d-dimensional random vector $X$ satisfies (2) with $W$ given as in (3), we use the notation $X \sim \operatorname{gNVM}(\mu, \Sigma, F_W)$ where $F_W(w) = \mathbb{P}(W \leq w)$ for $w \in \mathbb{R}^d$ and the inequality is understood component-wise.
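The stochastic representation (2) with comonotone mixing variables as in (3) translates directly into a sampling routine. The following Python sketch is only an illustration and is not taken from the nvmix package: the function names `pareto_quantile` and `rgnvm` are hypothetical, and the Pareto-type quantile functions are an assumption chosen because they have a closed form in the standard library (the inverse-gamma case discussed in the text would need a numerical quantile function instead).

```python
import math
import random

def pareto_quantile(theta):
    """Quantile function u -> (1 - u)^(-1/theta) of a Pareto(theta) law on [1, inf)."""
    return lambda u: (1.0 - u) ** (-1.0 / theta)

def rgnvm(n, mu, A, quantiles, rng):
    """Draw n samples of X = mu + Diag(sqrt(W)) A Z with comonotone W as in (2)-(3)."""
    d, k = len(A), len(A[0])
    samples = []
    for _ in range(n):
        u = rng.random()                              # one uniform drives all W_j
        w = [q(u) for q in quantiles]                 # W_j = F_{W_j}^{<-}(U)
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]   # Z ~ N_k(0, I_k)
        az = [sum(A[i][j] * z[j] for j in range(k)) for i in range(d)]
        samples.append([mu[i] + math.sqrt(w[i]) * az[i] for i in range(d)])
    return samples

rng = random.Random(1)
A = [[1.0, 0.0], [0.5, math.sqrt(0.75)]]              # Cholesky factor of a correlation matrix
quantiles = [pareto_quantile(3.0), pareto_quantile(6.0)]
X = rgnvm(5, [0.0, 0.0], A, quantiles, rng)
```

Because a single uniform `u` is passed to every quantile function, the components of `w` are perfectly dependent, which is the comonotonicity assumed in (3).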

As mentioned above, in the case of an (ungrouped) normal variance mixture distribution from (1), the scalar random variable (rv) $W$ can be regarded as a shock affecting all components of $X$. In the more general setting considered in this paper where $W$ is a vector of comonotone mixing rvs, different, perfectly dependent random variables affect different margins of $X$. By moving from a scalar mixing rv to a comonotone random vector, one obtains non-elliptical distributions well beyond the classical multivariate t case, giving rise to flexible modeling of joint and marginal body and tail behaviors. The price to pay for this generalization is significant computational challenges: not even the density of a grouped t distribution is available in closed form.

At first glance, the definition given in (2) does not indicate any “grouping” yet. However, Equation (3) allows one to group components of the random vector $X$ such that all components within a group have the same mixing distribution. More precisely, let $W$ be split into $S$ sub-vectors, i.e., $W = (W_1, \dots, W_S)$ where $W_k$ has dimension $d_k$ for $k = 1, \dots, S$ and $\sum_{k=1}^{S} d_k = d$. Now let each $W_k$ have stochastic representation $W_k = (F_{W_k}^{\leftarrow}(U), \dots, F_{W_k}^{\leftarrow}(U))$. Hence, all univariate margins of the subvector $W_k$ are identically distributed. This implies that all margins of the corresponding subvector $X_k$ are of the same type.

An example is the copula derived from $X$ in (2) when $F_{W_k} = \operatorname{IG}(\nu_k/2, \nu_k/2)$ for $k = 1, \dots, S$; this is the aforementioned grouped t copula. Here, different margins of the copula follow (potentially) different t copulas with different dof, allowing for more flexibility in modeling pairwise dependencies. A grouped t copula with $S = d$, that is, when each component has its own mixing distribution, was proposed in Venter et al. (2007) (therein called “individuated t copula”) and studied in more detail in Luo and Shevchenko (2010) (therein called “t copula with multiple dof”). If $S = 1$, the classical t copula with exactly one dof parameter is recovered.

For notational convenience, derivations in this paper are often done for the case $S = d$, so that the $F_{W_j}$ are all different; the case $S < d$, that is when grouping is present, is merely a special case where some of the $F_{W_j}$ are identical. That being said, we chose to keep the name “grouped” to refer to this class of models so as to reflect the original motivation for this type of model, e.g., as in Daul et al. (2003), where it is used to model the components of a portfolio in which there are subgroups representing different business sectors.

Previous work on grouped t copulas and their corresponding distributions includes some algorithms for the tasks needed to handle these models, but was mostly focused on demonstrating the superiority of this class of models over special cases such as the multivariate normal or t distribution. More precisely, in Daul et al. (2003), the grouped t copula was introduced and applied to model an internationally diversified credit portfolio of 92 risk factors split into 8 subgroups. It was demonstrated that the grouped t copula is superior to both the Gaussian and t copula with regard to modeling the tail dependence present in the data. Luo and Shevchenko (2010) also study the grouped t copula and, unlike in Daul et al. (2003), allow group sizes of 1 (corresponding to $S = d$ in our definition). They provide calibration methods to fit the copula to data and furthermore study bivariate characteristics of the grouped t copula, including symmetry properties and tail dependence.

However, to the best of our knowledge, there currently does not exist an encompassing body of work providing all algorithms and formulas required to handle these copulas and their corresponding distributions, both in terms of evaluating distributional quantities and in terms of general fitting algorithms. In particular, not even the problem of computing the distribution and density function of a grouped t copula has been addressed. Our paper fills this gap by providing a complete set of algorithms for performing the main computational tasks associated with these distributions and their associated copulas, and does so in as automated a way as possible. This is done not only for grouped t copulas, but (in many cases) for the more general grouped normal variance mixture distributions/copulas, which allow for even further flexibility in modeling the shock variables $W$. Furthermore, we assume that the only available information about the distribution of the $W_j$ consists of the marginal quantile functions in the form of a “black box”, meaning that we can only evaluate these quantile functions but have no mathematical expression for them (so that neither the density nor the distribution function of $W_j$ is available in closed form).

Our work includes the following contributions: (i) we develop an algorithm to evaluate the distribution function of a grouped NVM model. Our method only requires the user to provide a function that evaluates the quantile function of the $W_j$ through a black box. As such, different mixing distributions can be studied by merely providing a quantile function without having to implement an integration routine for the model at hand; (ii) as mentioned above, the density function for a grouped t distribution does not exist in closed form, neither does it for the more general grouped NVM case. We provide an adaptive algorithm to estimate this density function in a very general setting. The adaptive mechanism we propose ensures the estimation procedure is precise even for points that are far from the mean; (iii) to estimate Kendall’s tau and Spearman’s rho for a two-dimensional grouped NVM copula, we provide a representation as an expectation, which in turn leads to an easy-to-approximate two- or three-dimensional integral; (iv) we provide an algorithm to estimate the copula and its density associated with the grouped t copula, and fitting algorithms to estimate the parameters of a grouped NVM copula based on a dataset. While the problem of parameter estimation was already studied in Daul et al. (2003) and Luo and Shevchenko (2010), the computation of the copula density which is required for the joint estimation of all dof parameters has not been investigated in full generality for arbitrary dimensions yet, which is a gap we fill in this paper.

The four items from the list of contributions described in the previous paragraph correspond to Section 3, Section 4, Section 5 and Section 6 of the paper. Section 2 includes a brief presentation of the notation used, basic properties of grouped NVM distributions and a description of randomized quasi-Monte Carlo methods that are used throughout the paper since most quantities of interest require the approximation of integrals. Section 7 provides a discussion. The proofs are given in Section 8.

All our methods are implemented in the R package nvmix (Hofert et al. (2020)) and all numerical results are reproducible with the demo grouped_mixtures.

2. Notation, Basic Properties and Tools

This section provides the necessary background, notations and tools needed throughout the remainder of this paper. The first part addresses some properties of grouped normal variance mixtures, such as mean, covariance and the relationship with elliptical distributions. The second part of this section describes randomized quasi-Monte Carlo methods, which is the type of integration method we apply to approximate quantities that are expressed as expectations, such as the distribution function of grouped normal variance mixtures.

2.1. Grouped Normal Variance Mixtures

To simplify the notation throughout the remainder of this paper, we use the shorthand notations $F_W^{\leftarrow}(U) = (F_{W_1}^{\leftarrow}(U), \dots, F_{W_d}^{\leftarrow}(U))$, $W^D = \operatorname{Diag}(W)$, $W^D(U) = \operatorname{Diag}(F_W^{\leftarrow}(U))$; $\sqrt{W}^D$, $(1/W)^D$ as well as $(1/\sqrt{W})^D$ are defined similarly.

Many properties of grouped normal variance mixtures are derived by conditioning on the d-dimensional random vector $W$, or equivalently by conditioning on the underlying univariate uniform rv $U$. Indeed,

$$X \mid W \sim N_d(\mu, \sqrt{W}^D \Sigma \sqrt{W}^D) \quad \text{or equivalently} \quad X \mid U \sim N_d(\mu, \sqrt{W}^D(U)\, \Sigma\, \sqrt{W}^D(U)),$$

where $N_d(\mu, \Sigma)$ denotes the d-dimensional multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. One can see that $W$ “mixes” the covariance matrix of a multivariate normal and can be regarded as a shock affecting all components of $X$.

2.1.1. Mean and Covariance

If $\mathbb{E}(\sqrt{W})$ exists, then $\mathbb{E}(X) = \mu$, and if $\mathbb{E}(W) < \infty$, then $\operatorname{cov}(X) = \mathbb{E}(\sqrt{W}^D \Sigma \sqrt{W}^D)$. Furthermore, $\operatorname{corr}(X) = P$, where $P$ denotes the correlation matrix corresponding to $\Sigma$. If $A = I_d$, the (uncorrelated) components of $X$ are independent if and only if all components of $W$ are constant with probability 1 and thus if $X$ is multivariate normal; see (McNeil et al. 2015, Lemma 6.5). Assuming $k = d$, the matrix $A$ is typically the Cholesky factor computed from a given $\Sigma$. Other decompositions of $\Sigma$ into $A A^\top$ for some $A \in \mathbb{R}^{d \times d}$ can be obtained from the eigendecomposition or singular-value decomposition.
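As a small illustration of how a factor $A$ with $A A^\top = \Sigma$ can be obtained, the following Python sketch computes the lower-triangular Cholesky factor of a positive definite matrix. The function name `cholesky` and the example matrix are assumptions made for this illustration; in practice a linear algebra library routine would normally be used.

```python
import math

def cholesky(sigma):
    """Return lower-triangular C with C C^T = sigma (sigma symmetric positive definite)."""
    d = len(sigma)
    C = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                C[i][i] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]
    return C

Sigma = [[4.0, 2.0, 0.4], [2.0, 2.0, 0.5], [0.4, 0.5, 1.0]]
A = cholesky(Sigma)
```

The strictly positive diagonal of the factor is what the later transformation of the normal probability relies on when $\Sigma$ has full rank.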

2.1.2. Relationship with Elliptical Distributions

It is well known that normal variance mixtures, such as the multivariate normal and t distributions, are elliptical. A d-dimensional random vector $Y$ is said to have an elliptical distribution, denoted by $Y \sim \operatorname{ELL}_d(\mu, \Sigma, F_R)$, if

$$Y \overset{d}{=} \mu + R A S, \tag{4}$$

where $\mu \in \mathbb{R}^d$, $A A^\top = \Sigma$ and $S \sim \operatorname{U}(S^{d-1})$ is independent of a non-negative rv $R \sim F_R$. Here, $S^{d-1} = \{x \in \mathbb{R}^d : x^\top x = 1\}$ denotes the unit sphere in $\mathbb{R}^d$.

Let $Y \sim \operatorname{NVM}_d(\mu, \Sigma, F_W)$. Using the fact that $Z / \sqrt{Z^\top Z} \sim \operatorname{U}(S^{d-1})$ is independent of $Z^\top Z \sim \chi_d^2$ for $Z \sim N_d(0, I_d)$ (see, e.g., (Devroye 1986, Chapter 5)), it is easy to see from (4) that $Y \sim \operatorname{ELL}_d(\mu, \Sigma, F_R)$ for $R = \sqrt{W} \tilde{X}$, where $W \sim F_W$ is independent of $\tilde{X}^2 \sim \chi_d^2$.

If $X \sim \operatorname{gNVM}_d(0, I_d, F_W)$, then $X$ is in general not elliptical, unless $F_{W_1} = \dots = F_{W_d}$. This can be seen from (4), since the scalar radial rv $R$ cannot be used to model comonotone shocks. Applying the same principle used to define grouped normal variance mixtures in (2), one can define grouped elliptical distributions via the stochastic representation

$$X \overset{d}{=} \mu + \operatorname{Diag}(R) A S, \tag{5}$$

where $R = (R_1, \dots, R_d)$ satisfies that $R_j \overset{d}{=} F_{R_j}^{\leftarrow}(U) \times F_W(\tilde{U})$ for $U, \tilde{U} \overset{\text{ind.}}{\sim} \operatorname{U}(0, 1)$ and $R_j \geq 0$ a.s.

If $X \sim \operatorname{gNVM}_d(\mu, \Sigma, F_W)$, we can set $F_{R_j} = F_{W_j}$ for $j = 1, \dots, d$ and $F_W$ to be the distribution function of a $\chi_d$ random variable (the square root of a $\chi_d^2$ random variable). This shows that $X$ is a grouped elliptical distribution in the sense of (5). In this work we focus on the more tractable class of grouped NVM distributions and do not further detail grouped elliptical distributions.

2.2. Randomized quasi-Monte Carlo Methods

Many quantities of interest in this paper, such as the distribution function of a gNVM distribution, can, after a suitable transformation, be expressed as

$$\mu = \int_{(0, 1)^d} g(u)\, du, \tag{6}$$

where $d \in \mathbb{N}$ can be large (e.g., $\mu = F(a, b)$ for large d; see (15) in the next section) or small (e.g., $d = 3$ in (22)) and the integral cannot be computed explicitly. As a method that works flexibly for small and large dimensions, one might consider Monte Carlo (MC) estimation, that is, estimate $\mu$ by

$$\hat{\mu}_n^{\text{MC}} = \frac{1}{n} \sum_{i=1}^{n} g(U_i), \quad U_1, \dots, U_n \overset{\text{ind.}}{\sim} \operatorname{U}(0, 1)^d,$$

whose asymptotic $(1 - \alpha)$-confidence interval (CI) can be approximated for sufficiently large n by

$$\left[\hat{\mu}_n^{\text{MC}} - z_{1 - \alpha/2}\, \hat{\sigma}_g / \sqrt{n},\ \hat{\mu}_n^{\text{MC}} + z_{1 - \alpha/2}\, \hat{\sigma}_g / \sqrt{n}\right],$$

where $z_\alpha = \Phi^{-1}(\alpha)$ and $\hat{\sigma}_g^2 = \widehat{\operatorname{var}}(g(U)) = \sum_{i=1}^{n} (g(U_i) - \hat{\mu}_n^{\text{MC}})^2 / (n - 1)$. An often superior alternative to MC estimation is given by quasi-Monte Carlo (QMC) methods. Instead of averaging function values of a random sample $U_1, \dots, U_n$, a low discrepancy point-set $P_n = \{v_1, \dots, v_n\} \subset [0, 1)^d$ is employed, which aims at filling the unit hypercube in a more homogeneous way. To make error estimation easily possible, one can randomize the point-set $P_n$ in a way such that the points in the resulting point set, say $\tilde{P}_n$, are uniformly distributed over $[0, 1]^d$ while keeping the low discrepancy of the point set overall, giving rise to randomized QMC (RQMC) methods. In our algorithms, we use a digitally-shifted Sobol’ sequence as implemented in the function sobol(, randomize = "digital.shift") of the R package qrng; see Hofert and Lemieux (2019). We remark that generating $\tilde{P}_n$ is even slightly faster than the Mersenne Twister, which is R’s default (pseudo-)random number generator.
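The crude MC estimator and its asymptotic confidence interval described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `mc_estimate` is hypothetical, and the test integrand (a product of scaled sine functions whose integral over the unit cube equals 1) is chosen only because its true value is known.

```python
import math
import random
from statistics import NormalDist

def mc_estimate(g, d, n, rng, alpha=0.05):
    """Crude MC estimate of the integral of g over (0,1)^d with an asymptotic CI."""
    vals = [g([rng.random() for _ in range(d)]) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)   # sample variance of g(U)
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(var / n)
    return mean, (mean - half, mean + half)

# Hypothetical test integrand with known integral 1: g(u) = prod_j (pi/2) sin(pi u_j)
def g(u):
    return math.prod(0.5 * math.pi * math.sin(math.pi * x) for x in u)

est, ci = mc_estimate(g, d=3, n=20000, rng=random.Random(42))
```

The half-width of the interval shrinks like $1/\sqrt{n}$, which is the convergence rate that QMC and RQMC methods aim to improve upon.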

Given $B$ independently randomized copies of $P_n$, say $\tilde{P}_{n,b} = \{u_{1,b}, \dots, u_{n,b}\}$ for $b = 1, \dots, B$, we construct $B$ independent RQMC estimators of the form

$$\hat{\mu}_{b,n}^{\text{RQMC}} = \frac{1}{n} \sum_{i=1}^{n} g(u_{i,b}), \quad b = 1, \dots, B, \tag{7}$$

which are combined into the RQMC estimator

$$\hat{\mu}_n^{\text{RQMC}} = \frac{1}{B} \sum_{b=1}^{B} \hat{\mu}_{b,n}^{\text{RQMC}} \tag{8}$$

of $\mu$. An approximate $(1 - \alpha)$-CI for $\mu$ can be estimated as

$$\left[\hat{\mu}_n^{\text{RQMC}} - z_{1 - \alpha/2}\, \hat{\sigma}_{\hat{\mu}^{\text{RQMC}}} / \sqrt{B},\ \hat{\mu}_n^{\text{RQMC}} + z_{1 - \alpha/2}\, \hat{\sigma}_{\hat{\mu}^{\text{RQMC}}} / \sqrt{B}\right], \tag{9}$$

where $\hat{\sigma}_{\hat{\mu}^{\text{RQMC}}}^2 = \sum_{b=1}^{B} (\hat{\mu}_{b,n}^{\text{RQMC}} - \hat{\mu}_n^{\text{RQMC}})^2 / (B - 1)$ is an unbiased estimator of $\operatorname{var}(\hat{\mu}_{b,n}^{\text{RQMC}})$. One can compute $\hat{\mu}_n^{\text{RQMC}}$ from (8) for some initial sample size n (e.g., $n = 2^7$) and iteratively increase the sample size of each $\hat{\mu}_{b,n}^{\text{RQMC}}$ in (7) until the length of the CI in (9) satisfies a pre-specified error tolerance. In our implementations, we use $B = 15$, an absolute default error tolerance $\varepsilon = 0.001$ (which can be changed by the user) and $z_{1 - \alpha/2} = 3.5$ (so $\alpha \approx 0.00047$). By using $\hat{\mu}_n^{\text{RQMC}}$ as approximation for the true value of $\mu$, one can also consider relative instead of absolute errors.
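The construction in (7) to (9), together with the idea of increasing n until a tolerance is met, can be sketched as follows. This Python illustration deliberately swaps in a different point set: it uses a randomly shifted Halton sequence (a Cranley-Patterson rotation) instead of the digitally shifted Sobol’ sequence used by the authors, because it can be written with the standard library alone. The names `van_der_corput` and `rqmc_estimate`, the test integrand, the use of the CI half-width as stopping criterion, and the recomputation from scratch at each doubling are all assumptions of this sketch.

```python
import math
import random

PRIMES = [2, 3, 5, 7, 11, 13]

def van_der_corput(i, base):
    """i-th element (i >= 1) of the van der Corput sequence in the given base."""
    x, f = 0.0, 1.0 / base
    while i > 0:
        i, r = divmod(i, base)
        x += r * f
        f /= base
    return x

def rqmc_estimate(g, d, n, B=15, z=3.5, rng=None):
    """Average B randomly shifted Halton estimators; return (estimate, CI half-width)."""
    if d > len(PRIMES):
        raise ValueError("this sketch supports at most 6 dimensions")
    rng = rng or random.Random()
    points = [[van_der_corput(i, p) for p in PRIMES[:d]] for i in range(1, n + 1)]
    ests = []
    for _ in range(B):
        shift = [rng.random() for _ in range(d)]      # one uniform shift per randomization
        ests.append(sum(g([(x + s) % 1.0 for x, s in zip(pt, shift)]) for pt in points) / n)
    mean = sum(ests) / B
    var = sum((e - mean) ** 2 for e in ests) / (B - 1)
    return mean, z * math.sqrt(var / B)

def g(u):  # hypothetical integrand with integral 1
    return math.prod(0.5 * math.pi * math.sin(math.pi * x) for x in u)

rng = random.Random(7)
n, tol = 2 ** 7, 1e-3
est, half = rqmc_estimate(g, 3, n, rng=rng)
while half > tol and n < 2 ** 16:
    n *= 2                                            # double n until the tolerance is met
    est, half = rqmc_estimate(g, 3, n, rng=rng)
```

A production implementation would reuse the points already evaluated when n is increased rather than starting over, and would cap the total number of function evaluations.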

Sometimes it is necessary to estimate $\log(\mu)$ rather than $\mu$; in particular, when $\mu$ is small. For instance, if $\mu = f(x)$ where $f(x)$ is the density of some random vector evaluated at $x \in \mathbb{R}^d$, interest may lie in $\log(f(x))$ as this quantity is needed to compute the log-likelihood of a random sample (which then may be optimized over some parameter space). When $\mu$ is small, it is typically not a good idea to use $\log(\mu) \approx \log(\hat{\mu}_n^{\text{RQMC}})$ directly; it is better to compute a numerically more robust estimator of $\log(\mu)$ that works on the logarithmic scale throughout. Define the function LSE (for Logarithmic Sum of Exponentials) as

$$\operatorname{LSE}(c_1, \dots, c_n) = \log\left(\sum_{i=1}^{n} \exp(c_i)\right) = c_{\max} + \log\left(\sum_{i=1}^{n} \exp(c_i - c_{\max})\right),$$

where $c_1, \dots, c_n \in \mathbb{R}$ and $c_{\max} = \max\{c_1, \dots, c_n\}$.

The sum inside the logarithm on the right side of this equation is bounded between 1 and n so that the right side of this equation is numerically more robust than the left side.
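The numerical benefit of the right-hand side of the LSE identity is easy to demonstrate. The following Python sketch (the function name `lse` and the example values are chosen for illustration) compares the naive sum of exponentials, which underflows to zero, with the stabilized version.

```python
import math

def lse(c):
    """log(sum(exp(c_i))) computed by factoring out the maximum, as in the LSE identity."""
    c_max = max(c)
    if c_max == -math.inf:          # all terms are exp(-inf) = 0
        return -math.inf
    return c_max + math.log(sum(math.exp(ci - c_max) for ci in c))

c = [-1000.0, -1001.0, -1002.0]
naive_sum = sum(math.exp(ci) for ci in c)   # underflows to 0.0, so its log is undefined
robust = lse(c)
```

Here `naive_sum` is exactly `0.0` in double precision, whereas `robust` is a finite number slightly above -1000.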

Let $c_{i,b} = \log(g(u_{i,b}))$ for $i = 1, \dots, n$ and $b = 1, \dots, B$. An estimator numerically superior to $\log(\hat{\mu}_n^{\text{RQMC}})$ (but mathematically equivalent) is given by

$$\hat{\mu}_{n,\log}^{\text{RQMC}} = -\log(B) + \operatorname{LSE}(\hat{\mu}_{1,n,\log}^{\text{RQMC}}, \dots, \hat{\mu}_{B,n,\log}^{\text{RQMC}}), \tag{10}$$

where

$$\hat{\mu}_{b,n,\log}^{\text{RQMC}} = -\log(n) + \operatorname{LSE}(c_{1,b}, \dots, c_{n,b}), \quad b = 1, \dots, B.$$

Using

$$\hat{\sigma}_{\hat{\mu}^{\text{RQMC}},\log} = \sqrt{\frac{1}{B - 1} \sum_{b=1}^{B} \left(\hat{\mu}_{b,n,\log}^{\text{RQMC}} - \hat{\mu}_{n,\log}^{\text{RQMC}}\right)^2}$$

the integration error can be estimated from the length of the CI in (9) as before.
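A minimal Python sketch of the log-scale estimator (10) and the accompanying standard deviation is given below. The function name `log_rqmc` and the toy input values are assumptions for illustration; in an actual application the entries `log_g[b][i]` would be the values $c_{i,b}$ obtained from the randomized point sets.

```python
import math

def lse(c):
    """Numerically stable log(sum(exp(c_i)))."""
    c_max = max(c)
    return c_max + math.log(sum(math.exp(ci - c_max) for ci in c))

def log_rqmc(log_g):
    """log_g[b][i] = c_{i,b} = log g(u_{i,b}); returns (estimate of log(mu), log-scale std. dev.)."""
    B = len(log_g)
    per_rand = [-math.log(len(c_b)) + lse(c_b) for c_b in log_g]   # one value per randomization
    est = -math.log(B) + lse(per_rand)                              # estimator (10)
    sd = math.sqrt(sum((m - est) ** 2 for m in per_rand) / (B - 1))
    return est, sd

# Toy values so small that exp(c) underflows to 0 in double precision
log_g = [[-800.0, -801.0], [-800.5, -800.5], [-799.0, -802.0]]
est, sd = log_rqmc(log_g)
```

All intermediate quantities stay on the logarithmic scale, so the estimator remains finite even though every individual function value would underflow.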

For more details about (randomized) quasi-Monte Carlo methods and their applications in the financial literature, see, e.g., Niederreiter (1992), Lemieux (2009), Glasserman (2013).

3. Distribution Function

Let $-\infty \leq a < b \leq \infty$ componentwise (entries $\pm\infty$ to be interpreted as the corresponding limits). Then $F(a, b) = \mathbb{P}(a < X \leq b)$ is the probability that the random vector $X$ falls in the hyper-rectangle spanned by the lower-left and upper-right endpoints $a$ and $b$, respectively. If $a = (-\infty, \dots, -\infty)$, we recover $F(a, x) = F(x) = \mathbb{P}(X_1 \leq x_1, \dots, X_d \leq x_d)$ which is the (cumulative) distribution function of $X$.

Assume wlog that $\mu = 0$, otherwise adjust $a$, $b$ accordingly. Then

$$\begin{aligned} F(a, b) &= \mathbb{P}(a < \sqrt{W}^D A Z \leq b) = \mathbb{E}\left[\mathbb{P}\left((1/\sqrt{W})^D(U)\, a < A Z \leq (1/\sqrt{W})^D(U)\, b \mid U\right)\right] \\ &= \mathbb{E}\left[\Phi_\Sigma\left((1/\sqrt{W})^D(U)\, a,\ (1/\sqrt{W})^D(U)\, b\right)\right] = \int_0^1 \Phi_\Sigma\left((1/\sqrt{W})^D(u)\, a,\ (1/\sqrt{W})^D(u)\, b\right) du, \end{aligned} \tag{11}$$

where $\Phi_\Sigma(a, b) = \mathbb{P}(a < Y \leq b)$ for $Y \sim N_d(0, \Sigma)$. Note that the function $\Phi_\Sigma(a, b)$ itself is a d-dimensional integral for which no closed formula exists and is typically approximated via numerical methods; see, e.g., Genz (1992).

Comonotonicity of the $W_j$ allowed us to write $F(a, b)$ as a $(d + 1)$-dimensional integral; had the $W_j$ a different dependence structure, this convenience would be lost and the resulting integral in (11) could be up to $2d$-dimensional (e.g., when all $W_j$ are independent).

3.1. Estimation

As shown in (11), we need to approximate

$$F(a, b) = \int_0^1 \Phi_\Sigma\left((1/\sqrt{W})^D(u)\, a,\ (1/\sqrt{W})^D(u)\, b\right) du.$$

In Hintz et al. (2020), randomized quasi-Monte Carlo methods have been derived to approximate the distribution function of a normal variance mixture $X \sim \operatorname{NVM}_d(\mu, \Sigma, F_W)$ from (1). Grouped normal variance mixtures can be dealt with similarly, thanks to the comonotonicity of the mixing random variables in $W$.

In order to apply RQMC to the problem of estimating $F(a, b)$, we need to transform $F(a, b)$ to an integral over the unit hypercube. To this end, we first address $\Phi_\Sigma$. Let $C = (C_{ij})_{i,j=1}^{d}$ be the Cholesky factor of $\Sigma$ (a lower triangular matrix such that $C C^\top = \Sigma$). We assume that $\Sigma$ has full rank which implies $C_{jj} > 0$ for $j = 1, \dots, d$. Genz (1992) (see also Genz and Bretz (1999, 2002, 2009)) uses a series of transformations, relying on $C$ being a lower triangular matrix, to write

$$\begin{aligned} \Phi_\Sigma(a, b) &= \int_{a_1}^{b_1} \dots \int_{a_d}^{b_d} \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left(-\frac{x^\top \Sigma^{-1} x}{2}\right) dx \\ &= (\hat{e}_1 - \hat{d}_1) \int_0^1 (\hat{e}_2 - \hat{d}_2) \dots \int_0^1 (\hat{e}_d - \hat{d}_d)\, du_{d-1} \dots du_1, \end{aligned} \tag{12}$$

where the $\hat{d}_i$ and $\hat{e}_i$ are recursively defined via

$$\hat{e}_1 = \Phi\left(\frac{b_1}{C_{11}}\right), \quad \hat{e}_i = \hat{e}_i(u_1, \dots, u_{i-1}) = \Phi\left(\frac{b_i - \sum_{j=1}^{i-1} C_{ij} \Phi^{-1}\left(\hat{d}_j + u_j(\hat{e}_j - \hat{d}_j)\right)}{C_{ii}}\right), \tag{13}$$

and $\hat{d}_i$ is $\hat{e}_i$ with $b_i$ replaced by $a_i$ for $i = 1, \dots, d$. Note that the final integral in (12) is $(d - 1)$-dimensional.

Combining the representation (12) of $\Phi_\Sigma$ with Equation (11) yields

$$F(a, b) = \int_{(0, 1)^d} g(u)\, du = \int_0^1 g_1(u_0) \int_0^1 g_2(u_0, u_1) \dots \int_0^1 g_d(u_0, \dots, u_{d-1})\, du_{d-1} \dots du_0, \tag{14}$$

where

$$g(u) = \prod_{i=1}^{d} g_i(u_0, \dots, u_{i-1}), \quad g_i(u_0, \dots, u_{i-1}) = e_i - d_i, \quad i = 1, \dots, d, \tag{15}$$

for $u = (u_0, u_1, \dots, u_{d-1}) \in (0, 1)^d$. The $e_i$ are recursively defined by

$$\begin{aligned} e_1 &= e_1(u_0) = \Phi\left(\frac{b_1}{C_{11} \sqrt{F_{W_1}^{\leftarrow}(u_0)}}\right), \\ e_i &= e_i(u_0, \dots, u_{i-1}) = \Phi\left(\frac{1}{C_{ii}} \left(\frac{b_i}{\sqrt{F_{W_i}^{\leftarrow}(u_0)}} - \sum_{j=1}^{i-1} C_{ij} \Phi^{-1}\left(d_j + u_j(e_j - d_j)\right)\right)\right), \end{aligned} \tag{16}$$

for $i = 2, \dots, d$ and the $d_i$ are $e_i$ with $b_i$ replaced by $a_i$ for $i = 1, \dots, d$.
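To make the recursion (15) and (16) concrete, the following Python sketch evaluates the integrand $g$ at a single point $u = (u_0, \dots, u_{d-1})$. It is an illustration under stated assumptions and not the implementation behind pgnvmix(): the function name `g_integrand` is hypothetical, the quantile functions are passed as plain Python callables (the "black box" of the text), and the argument of $\Phi^{-1}$ is clamped away from 0 and 1 as a simple numerical safeguard that the paper does not mention.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def g_integrand(u, a, b, C, quantiles):
    """Evaluate g(u) from (15) for u = (u_0, ..., u_{d-1}), Cholesky factor C of Sigma
    and mixing quantile functions quantiles[i] for component i + 1."""
    d = len(a)
    sqrt_w = [math.sqrt(q(u[0])) for q in quantiles]   # sqrt(F_{W_i}^{<-}(u_0))
    y = []                                              # y_j = Phi^{-1}(d_j + u_j (e_j - d_j))
    prod = 1.0
    for i in range(d):
        shift = sum(C[i][j] * y[j] for j in range(i))
        e_i = PHI.cdf((b[i] / sqrt_w[i] - shift) / C[i][i])
        d_i = PHI.cdf((a[i] / sqrt_w[i] - shift) / C[i][i])
        prod *= e_i - d_i
        if i < d - 1:
            p = d_i + u[i + 1] * (e_i - d_i)
            p = min(max(p, 1e-15), 1.0 - 1e-15)        # keep Phi^{-1} finite
            y.append(PHI.inv_cdf(p))
    return prod

# Sanity check setting: constant mixing variable W = 1 and Sigma = I_2,
# where g does not depend on u and equals a product of univariate normal probabilities.
one = lambda u: 1.0
a, b = [-math.inf, -1.0], [0.5, 1.0]
val = g_integrand([0.3, 0.8], a, b, [[1.0, 0.0], [0.0, 1.0]], [one, one])
```

Averaging `g_integrand` over MC or RQMC points in $(0, 1)^d$ then gives an estimate of $F(a, b)$ as in (14).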

Summarizing, we were able to write $F(a, b)$ as an integral over the d-dimensional unit hypercube. Our algorithm to approximate $F(a, b)$ consists of two steps:

First, a greedy re-ordering algorithm is applied to the inputs $a$, $b$, $\Sigma$. It re-orders the components $1, \dots, d$ of $a$ and $b$ as well as the corresponding rows and columns in $\Sigma$ in a way that the expected ranges of $g_i$ in (15) are increasing with the index i for $i = 1, \dots, d$. Observe that the integration variable $u_i$ is present in all remaining $d - i$ integrals in (14) whose ranges are determined by the ranges of $g_1, \dots, g_i$; reordering the variables according to expected ranges therefore (in the vast majority of cases) reduces the overall variability of $g$ (namely, $\operatorname{var}(g(U))$ for $U \sim \operatorname{U}(0, 1)^d$). Reordering also makes the first variables “more important” than the last ones, thereby reducing the effective dimension of the integrand. This is particularly beneficial for quasi-Monte Carlo methods, as these methods are known to perform well in high-dimensional problems with low effective dimension; see, e.g., Caflisch et al. (1997), Wang and Sloan (2005). For a detailed description of the method, see (Hintz et al. 2020, Algorithm 3.2) (with $a_j / \mu_{\sqrt{W}}$ replaced by $a_j / \mu_{\sqrt{W}_j}$ and similarly for $b_j$ for $j = 1, \dots, d$ to account for the generalization); similar reordering strategies have been proposed in Gibson et al. (1994) for calculating multivariate normal and in Genz and Bretz (2002) for multivariate t probabilities.

Second, an RQMC algorithm as described in Section 2.2 is applied to approximate the integral in (14) with re-ordered $a$, $b$, $\Sigma$ and $F_W$. Instead of integrating $g$ from (15) directly, antithetic variates are employed so that effectively, the function $\tilde{g}(u) = (g(u) + g(1 - u)) / 2$ is integrated.
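The antithetic construction used in the second step is generic and can be written as a small wrapper around any integrand on the unit cube. The Python sketch below uses a hypothetical helper name `antithetic` and a toy linear integrand, chosen only because its antithetic average is constant.

```python
def antithetic(g):
    """Return g_tilde with g_tilde(u) = (g(u) + g(1 - u)) / 2."""
    def g_tilde(u):
        return 0.5 * (g(u) + g([1.0 - x for x in u]))
    return g_tilde

# Hypothetical integrand: for a function that is linear in u, the antithetic
# average is constant, so its variance under U(0,1)^d is zero.
g = lambda u: 2.0 * u[0] + u[1]
g_tilde = antithetic(g)
```

For integrands that are not linear, the variance is reduced rather than eliminated, at the cost of two evaluations of $g$ per point.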

The algorithm to estimate $F(a, b)$ just described is implemented in the function pgnvmix() of the R package nvmix.

3.2. Numerical Results

In order to assess the performance of our algorithm described in Section 3.1, we estimate the error as a function of the number of function evaluations. Three estimators are considered. First, the “Crude MC” estimator is constructed by sampling $X_1, \dots, X_n \overset{\text{ind.}}{\sim} \operatorname{gNVM}(\mu, \Sigma, F_W)$ and estimating $\mathbb{P}(X \leq x)$ by $\hat{\mu}_n^{\text{MC}} = (1/n) \sum_{i=1}^{n} \mathbb{1}_{\{X_i \leq x\}}$. The second and third estimator are based on the integrand $g$ from (15), which is integrated once using MC (“g (MC)”) and once using a randomized Sobol’ sequence (“g (sobol)”). In either case, variable reordering is applied first.
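The “Crude MC” estimator can be sketched as follows in Python. This is an illustration under stated assumptions, not the code used for the experiments: the function name `crude_mc_cdf` is hypothetical, the location is taken to be zero, and Pareto-type quantile functions stand in for the inverse-gamma quantiles because they are available in closed form.

```python
import math
import random

def crude_mc_cdf(x, A, quantiles, n, rng):
    """Estimate P(X <= x) for X = Diag(sqrt(W)) A Z by the fraction of samples below x."""
    d, k = len(A), len(A[0])
    hits = 0
    for _ in range(n):
        u = rng.random()                             # common uniform for all W_j
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        ok = True
        for i in range(d):
            xi = math.sqrt(quantiles[i](u)) * sum(A[i][j] * z[j] for j in range(k))
            if xi > x[i]:
                ok = False
                break
        hits += ok
    p = hits / n
    return p, math.sqrt(p * (1.0 - p) / n)           # estimate and its standard error

# Hypothetical Pareto-type mixing quantiles (not the inverse-gamma case of the text)
quantiles = [lambda u: (1.0 - u) ** (-1.0 / 3.0), lambda u: (1.0 - u) ** (-1.0 / 6.0)]
A = [[1.0, 0.0], [0.5, math.sqrt(0.75)]]
p_hat, se = crude_mc_cdf([0.0, 0.0], A, quantiles, 20000, random.Random(3))
```

At $x = (0, 0)$ the mixing variables only rescale each component by a positive factor, so the target probability coincides with the bivariate normal orthant probability for correlation 0.5, which equals 1/3; this gives a convenient check of the sketch.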

We perform our experiments for an inverse-gamma mixture. As motivated in the introduction, an important special case of (grouped) normal variance mixtures is obtained when the mixing distribution is inverse-gamma. In the ungrouped case when $X \sim \operatorname{NVM}_d(\mu, \Sigma, F_W)$ with $W \sim \operatorname{IG}(\nu/2, \nu/2)$, the distribution of $X$ is multivariate t (notation $X \sim t_d(\nu, \mu, \Sigma)$) with density

$$f_{\nu, \mu, \Sigma}^{t}(x) = \frac{\Gamma((\nu + d)/2)}{\Gamma(\nu/2) \sqrt{(\nu\pi)^d |\Sigma|}} \left(1 + \frac{D^2(x; \mu, \Sigma)}{\nu}\right)^{-\frac{\nu + d}{2}}, \quad x \in \mathbb{R}^d. \tag{17}$$

The distribution function of $X \sim t_d(\nu, \mu, \Sigma)$ does not admit a closed form; estimation of the latter was discussed for instance in Genz and Bretz (2009), Hintz et al. (2020), Cao et al. (2020). The same holds for a grouped inverse-gamma mixture model. If $W_j \sim \operatorname{IG}(\nu_j/2, \nu_j/2)$ for $j = 1, \dots, d$, the random vector $X$ follows a grouped t distribution, denoted by $X \sim \operatorname{gt}_d(\nu_1, \dots, \nu_d; \mu, \Sigma)$ or by $X \sim \operatorname{gt}_d(\nu, \mu, \Sigma)$ for $\nu = (\nu_1, \dots, \nu_d)$. If $1 < S < d$, denote by $d_1, \dots, d_S$ the group sizes. In this case, we use the notation $X \sim \operatorname{gt}_d(\nu_1, \dots, \nu_S; d_1, \dots, d_S; \mu, \Sigma)$ or $X \sim \operatorname{gt}_d(\nu, d, \mu, \Sigma)$ for $d = (d_1, \dots, d_S)$. If $S = 1$, it follows that $X \sim t_d(\nu_1, \mu, \Sigma)$.
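For reference, the closed-form density (17) of the ungrouped multivariate t distribution is straightforward to evaluate on the log scale. The following Python sketch uses a hypothetical function name `log_dmvt` and takes the lower-triangular Cholesky factor of $\Sigma$ as input, which is an assumption of this illustration made so that both the Mahalanobis distance and the determinant can be computed without a matrix inverse.

```python
import math

def log_dmvt(x, nu, mu, C):
    """Log of the multivariate t density (17); C is the lower-triangular Cholesky factor of Sigma."""
    d = len(x)
    z = []                                   # solve C z = x - mu by forward substitution
    for i in range(d):
        z.append((x[i] - mu[i] - sum(C[i][j] * z[j] for j in range(i))) / C[i][i])
    maha2 = sum(v * v for v in z)            # D^2(x; mu, Sigma)
    log_det = 2.0 * sum(math.log(C[i][i]) for i in range(d))
    return (math.lgamma((nu + d) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * (d * math.log(nu * math.pi) + log_det)
            - 0.5 * (nu + d) * math.log1p(maha2 / nu))

val = log_dmvt([0.0], 1.0, [0.0], [[1.0]])   # standard Cauchy density at 0
```

In the grouped case no such closed form is available, which is why the density has to be estimated numerically.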

For our numerical examples to test the performance of our procedure for estimating $F(a, b)$, assume $X \sim \operatorname{gt}_d(\nu, 0, P)$ for a correlation matrix $P$. We perform the experiment in $d = 5$ with $\nu = (1.5, 2.5, \dots, 5.5)$ and in $d = 20$ with $\nu = (1, 1.25, \dots, 5.5, 5.75)$. The following is repeated 15 times: Sample an upper limit $b \sim \operatorname{U}(0, 3\sqrt{d})^d$ and a correlation matrix $P$ (sampled based on a random Wishart matrix via the function rWishart() in R). Then estimate $\mathbb{P}(X \leq b)$ using the three aforementioned methods using various sample sizes and estimate the error for the MC estimators based on a CLT argument and for the RQMC estimator as described in Section 2.2. Figure 1 reports the average absolute errors for each sample size over the 15 runs.

The convergence speeds, as measured by the regression coefficient

α

of

log (\hat{ε}) = α log (n) + c

, where

\hat{ε}

is the estimated error, are displayed in the legend. As expected, the MC estimators have an overall convergence speed of

1 / \sqrt{n}

; however, the crude estimator has a much larger variance than the MC estimator based on the function g. The RQMC estimator (“g (sobol)”) not only converges much faster than its MC counterparts, but also has a smaller variance.
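The regression coefficient α can be obtained by ordinary least squares on the log-log scale. The following Python sketch illustrates this; the function name `convergence_rate` and the synthetic error sequences are assumptions made for illustration and are not results from the experiment above.

```python
import math


def convergence_rate(ns, errors):
    """Least-squares slope alpha in log(error) = alpha * log(n) + c."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Synthetic (hypothetical) error sequences: an MC-like error c / sqrt(n)
# and a faster decaying error c / n.
ns = [2 ** k for k in range(7, 15)]
alpha_mc = convergence_rate(ns, [3.0 / math.sqrt(n) for n in ns])
alpha_fast = convergence_rate(ns, [5.0 / n for n in ns])
```

An error that decays exactly like 1 / \sqrt{n} yields a slope of −0.5; slopes closer to −1 indicate the faster convergence typically associated with RQMC.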

4. Density Function

Let us now focus on the density of

X \sim {gNVM}_{d} (μ, Σ, F_{W})

, where we assume that

Σ

has full rank in order for the density to exist. As mentioned in the introduction, the density of

X

is typically not available in closed form, not even in the case of a grouped t distribution. The same conditioning argument used to derive (11) yields that the density of

X \sim {gNVM}_{d} (μ, Σ, F_{W})

evaluated at

x \in R^{d}

can be written as

\begin{matrix} f_{X} (x) & = E (\frac{1}{\sqrt{{(2 π)}^{d} | {\sqrt{W}}^{D} (U) Σ {\sqrt{W}}^{D} (U) |}} exp (- \frac{D^{2} (x; μ, {\sqrt{W}}^{D} (U) Σ {\sqrt{W}}^{D} (U))}{2})) \\ & = \int_{0}^{1} \frac{1}{\sqrt{{(2 π)}^{d} | Σ | \prod_{i = 1}^{d} F_{W_{i}}^{\leftarrow} (u)}} exp (- \frac{D^{2} (x; μ, {\sqrt{W}}^{D} (u) Σ {\sqrt{W}}^{D} (u))}{2}) d u = \int_{0}^{1} h (u) d u, \end{matrix}

(18)

where

D^{2} (x; μ, Σ) = {(x - μ)}^{⊤} Σ^{- 1} (x - μ)

denotes the (squared) Mahalanobis distance of

x \in R^{d}

from

μ

with respect to

Σ

and the integrand

h (u)

is defined in an obvious manner. Except for some special cases (e.g., when all

W_{j}

are inverse-gamma with the same parameters), this integral cannot be computed explicitly, so that we rely on numerical approximation thereof.
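To make (18) concrete, the following Python sketch evaluates h(u) for μ = 0 and Σ = I_d and integrates it with a plain midpoint rule. This is only an illustration under stated assumptions, not the estimator proposed in this paper: the midpoint rule stands in for RQMC, and the two mixing quantile functions (`q_ig11` for IG(1, 1), which corresponds to ν = 2, and `q_pareto` for a Pareto distribution) were chosen because they have closed forms. All function names are hypothetical.

```python
import math


def q_ig11(u):
    """Quantile function of IG(1, 1), the mixing variable of a t distribution with nu = 2."""
    return -1.0 / math.log(u)


def q_pareto(u, alpha=2.5):
    """Quantile function of a Pareto(alpha) distribution on [1, inf); illustrative choice."""
    return (1.0 - u) ** (-1.0 / alpha)


def h(u, x, quantiles):
    """Integrand h(u) from (18), assuming mu = 0 and Sigma = I_d."""
    w = [q(u) for q in quantiles]
    log_det = sum(math.log(wj) for wj in w)              # log |Sigma| + sum_j log W_j with |Sigma| = 1
    maha = sum(xj * xj / wj for xj, wj in zip(x, w))      # Mahalanobis distance for Sigma = I_d
    return math.exp(-0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + maha))


def density_midpoint(x, quantiles, n=100_000):
    """Midpoint-rule approximation to the integral of h over (0, 1)."""
    return sum(h((i + 0.5) / n, x, quantiles) for i in range(n)) / n


def t2_density_exact(x):
    """Density (17) for nu = 2, mu = 0, Sigma = I_2."""
    d2 = sum(xj * xj for xj in x)
    return (1.0 + d2 / 2.0) ** -2.0 / (2.0 * math.pi)


x = (1.0, 1.0)
ungrouped = density_midpoint(x, [q_ig11, q_ig11])    # special case with known density
grouped = density_midpoint(x, [q_ig11, q_pareto])    # no closed form available
```

When both components share the IG(1, 1) mixing variable, the integral reduces to the bivariate t density (17) with ν = 2, which provides a check of the integrand.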

4.1. Estimation

From (18), we find that computing the density

f (x)

of

X \sim {gNVM}_{d} (μ, Σ, F_{W})

evaluated at

x \in R^{d}

requires the estimation of a univariate integral. As interest often lies in the logarithmic density (or log-density) rather than the actual density (e.g., likelihood-based methods where the log-likelihood function of a random sample is optimized over some parameter space), we directly consider the problem of estimating

log (μ)

for

μ = \int_{0}^{1} h (u) d u

with h given in (18).

Since

μ

is expressed as an integral over

(0, 1)

, RQMC methods to estimate

log (μ)

from Section 2.2 can be applied directly to the problem in this form. If the log-density needs to be evaluated at several

x_{1}, \dots, x_{N}

, one can use the same point-sets

{\tilde{P}}_{n, b}

and therefore the same realizations of the mixing random vector

W

for all inputs. This avoids costly evaluations of the quantile functions

F_{W_{j}}^{\leftarrow}

.

Estimating

log (f (x))

via RQMC as just described works well for input

x

of moderate size, but deteriorates if

x

is far away from the mean. To illustrate this, Figure 2 shows the integrand h for three different inputs

x

and three different settings for

F_{W}

. If

x

is “large”, most of the mass is contained in a small subdomain of

(0, 1)

containing the abscissa of the maximum of h. If an integration routine is not able to detect this peak, the density is substantially underestimated. A further complication arises because we estimate the log-density rather than the density: the unboundedness of the natural logarithm at 0 makes estimation of

log (μ)

for small

μ

challenging, both from a theoretical and a computational point of view due to finite machine precision.

In (Hintz et al. 2020, Section 4), an adaptive RQMC algorithm is proposed to efficiently estimate the log-density of

X \sim {NVM}_{d} (μ, Σ, F_{W})

. We generalize this method to the grouped case. The grouped case is more complicated because the distribution is not elliptical, hence the density does not depend on

x

only through

D^{2} (x; μ, Σ)

. Furthermore, the height of the (unique) maximum of h in the ungrouped case can be easily computed without simulation, which helps the adaptive procedure find the relevant region; in the grouped case, the value of the maximum is usually not available. Lastly, S (as opposed to 1) quantile evaluations are needed to obtain one function value

h (u)

; from a run time perspective, evaluating these quantile functions is the most expensive part.

If

x

is “large”, the idea is to apply RQMC only in a relevant region

(u_{l}, u_{r})

with

{argmax}_{u} h (u) = : u^{*} \in (u_{l}, u_{r})

. More precisely, given a threshold

ε_{th}

with

0 < ε_{th} < h_{max} = {max}_{u \in (0, 1)} h (u)

, choose

u_{l}, u_{r}

(l for “left” and r for “right”) with

0 \leq u_{l} \leq u^{*} \leq u_{r} \leq 1

so that

h (u) > ε_{th}

if and only if

u \in (u_{l}, u_{r})

. For instance, take

\begin{matrix} ε_{th} = 10^{log (h_{max}) / log (10) - k_{th}} \end{matrix}

(19)

with

k_{th} = 10

so that

ε_{th}

is 10 orders of magnitude smaller than

h_{max}

.

One can then apply RQMC (with a proper logarithm) in the region

(u_{l}, u_{r})

(by replacing every

u_{i, b} \in (0, 1)

by

u_{i, b}^{'} = u_{l} + (u_{r} - u_{l}) u_{i, b} \in (u_{l}, u_{r})

), producing an estimate for

log \int_{u_{l}}^{u_{r}} h (u) d u

. By construction, the remaining regions do not contribute significantly to the overall integral anyway, so that a rather quick integration routine suffices here. Note that neither

h_{max}

, nor

u_{l}, u_{r}

are known explicitly. However,

h_{max}

can be estimated from pilot-runs and

u_{l}, u_{r}

can be approximated using bisections.
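The search for the relevant region can be sketched as follows. The Python code below is a simplified illustration, not the nvmix implementation: it assumes μ = 0, Σ = I_d and a common IG(1, 1) mixing variable (so that the quantile function is available in closed form), uses a regular grid instead of RQMC pilot points, works with log h to avoid underflow, and assumes that h is unimodal. All names are hypothetical.

```python
import math


def log_h(u, x):
    """log h(u) from (18) for mu = 0, Sigma = I_d and W_j ~ IG(1, 1) for all j."""
    w = -1.0 / math.log(u)                       # common quantile F_W^{<-}(u)
    maha = sum(xj * xj for xj in x) / w
    d = len(x)
    return -0.5 * (d * math.log(2.0 * math.pi) + d * math.log(w) + maha)


def relevant_region(x, n0=2 ** 10, k_th=10):
    """Return (u_star, u_l, u_r) with log h above the threshold on (u_l, u_r)."""
    grid = [(i + 0.5) / n0 for i in range(n0)]   # stand-in for the pilot run
    vals = [log_h(u, x) for u in grid]
    i_max = max(range(n0), key=vals.__getitem__)
    log_thr = vals[i_max] - k_th * math.log(10.0)    # log of eps_th in (19)
    tiny, almost_one = 1e-300, 1.0 - 2.0 ** -53

    def crossing(below, above):
        # Bisection between a point at or below the threshold and one above it.
        for _ in range(200):
            mid = 0.5 * (below + above)
            if log_h(mid, x) > log_thr:
                above = mid
            else:
                below = mid
        return above

    i = i_max
    while i >= 0 and vals[i] > log_thr:          # walk left from the maximum
        i -= 1
    if i >= 0:
        u_l = crossing(grid[i], grid[i + 1])
    elif log_h(tiny, x) > log_thr:
        u_l = 0.0
    else:
        u_l = crossing(tiny, grid[0])

    i = i_max
    while i < n0 and vals[i] > log_thr:          # walk right from the maximum
        i += 1
    if i < n0:
        u_r = crossing(grid[i], grid[i - 1])
    elif log_h(almost_one, x) > log_thr:
        u_r = 1.0
    else:
        u_r = crossing(almost_one, grid[n0 - 1])
    return grid[i_max], u_l, u_r


u_star, u_l, u_r = relevant_region((25.0, 25.0))
```

For x = (25, 25), the region found this way covers only a small part of (0, 1) near 1, which is where an integration routine needs to concentrate its function evaluations.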

Summarizing, we propose the method given in Algorithm 1 to estimate

log (f (x_{i}))

,

i = 1, \dots, N

, for given inputs

x_{1}, \dots, x_{N}

and error tolerance

ε

.

This algorithm is implemented in the function dgnvmix(, log = TRUE) in the R package nvmix, which by default uses a relative error tolerance.

The advantage of the proposed algorithm is that only little run time is spent on estimating “easy” integrals, thanks to the pilot run in Step 1. If

n_{0} = 2^{10}

and

B = 15

(the current default in the nvmix package), this step gives 15 360 pairs

(u, F_{W}^{\leftarrow} (u))

. These pairs give good starting values for the bisections to find

u_{l}, u_{r}

. Note that no additional quantile evaluations are needed to estimate the less important regions

(0, u_{l})

and

(u_{r}, 1)

.

4.2. Numerical Results

Luo and Shevchenko (2010) are faced with almost the same integration problem when estimating the density of a bivariate grouped t copula. They use a globally adaptive integration scheme from Piessens et al. (2012) to integrate h. While this procedure works well for a range of inputs, it deteriorates for input

x

with large components.

Consider first

X \sim t_{d} (ν, 0, I_{d})

and recall that the density of

X

is known and given by (17); this is useful to test our estimation procedure. As such, let

X \sim t_{2} (ν = 6, 0, I_{2})

and consider the problem of evaluating the density of

X

at

x \in \{(0, 0), (5, 5), (25, 25), (50, 50)\}

. Some values of the corresponding integrands are shown in Figure 2. In Table 1, true and estimated (log-)density values are reported; once estimated using the R function integrate(), which is based on the QUADPACK package of Piessens et al. (2012) and once using dgnvmix(), which is based on Algorithm 1. Clearly, the integrate() integration routine is not capable of detecting the peak when input

x

is large, yielding substantially flawed estimates. The estimates obtained from dgnvmix(), however, are quite close to the true values even far out in the tail.

Algorithm 1: Adaptive RQMC Algorithm to Estimate

log (f (x_{1})), \dots, log (f (x_{N}))

.

Given

x_{1}, \dots, x_{N}

,

Σ

,

ε

,

ε_{th}

,

n_{0}

, estimate

log (f (x_{l}))

,

l = 1, \dots, N

, via:

1. Compute ${\hat{μ}}_{log f (x_{i}), n_{0}}^{RQMC}$ with sample size $n_{0}$ using the same random numbers for all inputs $x_{i}$ , $i = 1, \dots, N$ . Store all uniforms with corresponding quantile evaluations $F_{W}^{\leftarrow}$ in a list $L$ .
2. If all estimates ${\hat{μ}}_{log f (x_{i}), n_{0}}^{RQMC}$ , $i = 1, \dots, N$ , meet the error tolerance $ε$ , go to Step 4. Otherwise let $x_{s}$ , $s = 1, \dots, N^{'}$ with $1 \leq N^{'} \leq N$ be the inputs whose error estimates exceed the error tolerance.
3. For each remaining input $x_{s}$ , $s = 1, \dots, N^{'}$ , do:
(a) Use all pairs $(u, F_{W}^{\leftarrow} (u))$ in $L$ to compute values of $h (u)$ and set ${\hat{h}}_{max} = {max}_{u \in L} h (u)$ . If the largest value of h is obtained for the largest (smallest) u in the list $L$ , set $u^{*} = 1$ ( $u^{*} = 0$ ).
(b) If $u^{*} = 1$ , set $u_{r} = 1$ and if $u^{*} = 0$ , set $u_{l} = 0$ . Unless already specified, use bisections to find $u_{l}$ and $u_{r}$ such that $u_{l} < u^{*} < u_{r}$ and $u_{l}$ ( $u_{r}$ ) is the smallest (largest) u such that $h (u) > ε_{th}$ from (19) with $h_{max}$ replaced by ${\hat{h}}_{max}$ . Starting intervals for the bisections can be found from the values in $L$ .
(c) If $u_{l} > 0$ , approximate $log (\int_{0}^{u_{l}} h (u) d u)$ using a trapezoidal rule with proper logarithm and knots $u_{1}^{'}, \dots, u_{m}^{'}$ where $u_{i}^{'}$ are those u’s in $L$ satisfying $u \leq u_{l}$ . Call the approximation ${\hat{μ}}_{(0, u_{l})} (x_{s})$ . If $u_{l} = 0$ , set ${\hat{μ}}_{(0, u_{l})} (x_{s}) = - \infty$ .
(d) If $u_{r} < 1$ , approximate $log (\int_{u_{r}}^{1} h (u) d u)$ using a trapezoidal rule with proper logarithm and knots $u_{1}^{″}, \dots, u_{p}^{″}$ where $u_{i}^{″}$ are those u’s in $L$ satisfying $u \geq u_{r}$ . Call the approximation ${\hat{μ}}_{(u_{r}, 1)} (x_{s})$ . If $u_{r} = 1$ , set ${\hat{μ}}_{(u_{r}, 1)} (x_{s}) = - \infty$ .
(e) Estimate $log (\int_{u_{l}}^{u_{r}} h (u) d u)$ via RQMC. That is, compute ${\hat{μ}}_{log f, n}^{RQMC}$ from (10) where every $u_{i, b} \in (0, 1)$ is replaced by $u_{i, b}^{'} = u_{l} + (u_{r} - u_{l}) u_{i, b} \in (u_{l}, u_{r})$ . Increase n until the error tolerance $ε$ is met. Then set ${\hat{μ}}_{(u_{l}, u_{r})} (x_{s}) = log (u_{r} - u_{l}) + {\hat{μ}}_{log f, n}^{RQMC}$ , which estimates $log (\int_{u_{l}}^{u_{r}} h (u) d u)$ .
(f) Combine

$\begin{matrix} {\hat{μ}}_{log f (x_{s})}^{RQMC} = LSE ({\hat{μ}}_{(0, u_{l})} (x_{s}), {\hat{μ}}_{(u_{l}, u_{r})} (x_{s}), {\hat{μ}}_{(u_{r}, 1)} (x_{s})) \end{matrix}$
4. Return ${\hat{μ}}_{log f (x_{l})}^{RQMC}$ , $l = 1, \dots, N$ .
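The two numerical building blocks used in Algorithm 1, the log-sum-exp operation LSE and the trapezoidal rule "with proper logarithm", can be sketched in Python as follows. The function names `lse` and `log_trapezoid` are hypothetical, and the code is a minimal illustration rather than the nvmix implementation.

```python
import math


def lse(log_values):
    """Stable log(sum(exp(v))); entries equal to -inf (empty regions) are ignored."""
    finite = [v for v in log_values if v != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def log_trapezoid(knots, log_h_values):
    """log of the trapezoidal approximation to the integral of h, given log h at increasing knots."""
    terms = []
    for i in range(len(knots) - 1):
        width = knots[i + 1] - knots[i]
        # log((width / 2) * (h_i + h_{i+1})) without leaving the log scale
        terms.append(math.log(0.5 * width) + lse([log_h_values[i], log_h_values[i + 1]]))
    return lse(terms)


# Combining three regional estimates as in the last part of Step 3; the numbers are made up.
combined = lse([-math.inf, -1000.0, -1002.0])
```

Subtracting the maximum before exponentiating is what keeps the combination finite when all regional estimates are far below the smallest positive floating-point number on the original scale.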

The preceding discussion focused on the classical multivariate t setting, as the density is known in this case. Next, consider a grouped inverse-gamma mixture model and let

X \sim {gt}_{d} (ν, μ, Σ)

. The density

f_{ν, μ, Σ}^{gt}

of

X \sim {gt}_{d} (ν, μ, Σ)

is not available in closed form, so that here we indeed need to rely on estimation of the latter. The following experiment is performed for

X \sim {gt}_{2} (ν, 0, I_{2})

with

ν = (3, 6)

and for

X \sim {gt}_{10} (ν, 0, I_{10})

where

ν = (3, \dots, 3, 6, \dots, 6)

(corresponding to two groups of size 5 each). First, a sample of size 2500 is drawn from a more heavy-tailed grouped t distribution (with degrees of freedom

ν^{'} = (1, 2)

and

ν^{'} = (1, \dots, 1, 2, \dots, 2)

, respectively) and then the log-density function of

X \sim {gt}_{d} (ν, 0, I_{d})

is evaluated at the sample. The results are shown in Figure 3.

It is clear from the plots that integrate() again gives wrong approximations to

f (x)

for input

x

far out in the tail; for small input

x

, the results from integrate() and from dgnvmix() coincide. Furthermore, it can be seen that the density function is not monotonic in the Mahalanobis distance (as grouped normal mixtures are not elliptical anymore). The plot also includes the log-density functions of an ungrouped d-dimensional t distribution with degrees of freedom 3 and 6, respectively. The log-density function of the grouped mixture with

ν = (3, 6)

is not bounded by either; in fact, the grouped mixture shows heavier tails than both the t distribution with 3 and with 6 dof.

5. Kendall’s Tau and Spearman’s Rho

Two widely used measures of association are the rank correlation coefficients Spearman’s rho

ρ_{S}

and Kendall’s tau

ρ_{τ}

. For elliptical models, one can easily compute Kendall’s tau as a function of the copula parameter

ρ

which can be useful in estimating the matrix P non-parametrically. For grouped mixtures, however, this is not easily possible. In this section, integral representations for Spearman’s rho and Kendall’s tau in the general grouped NVM case are derived.

If

X = (X_{1}, X_{2}) \sim F

is a random vector with continuous margins

F_{1}, F_{2}

, then

ρ_{S} (X_{1}, X_{2}) = ρ (F_{1} (X_{1}), F_{2} (X_{2}))

and

ρ_{τ} (X_{1}, X_{2}) = ℙ ((X_{1} - Y_{1}) (X_{2} - Y_{2}) > 0) - ℙ ((X_{1} - Y_{1}) (X_{2} - Y_{2}) < 0)

, where

(Y_{1}, Y_{2}) \sim F

independent of

(X_{1}, X_{2})

and

ρ (X, Y) = cov (X, Y) / \sqrt{var (X) var (Y)}

is the linear correlation between X and Y. Both

ρ_{S}

and

ρ_{τ}

depend only on the copula of F.

If

X \sim {ELL}_{2} (μ, Σ, F_{R})

is elliptical and

ρ = Σ_{12} / \sqrt{Σ_{11} Σ_{22}}

, then

\begin{matrix} ρ_{τ} (X_{1}, X_{2}) = \frac{2}{π} arcsin (ρ); \end{matrix}

(20)

see (Lindskog et al. 2003, Theorem 2). This formula holds only approximately for grouped normal variance mixtures. In Daul et al. (2003), an expression was derived for Kendall’s tau of bivariate, grouped t copulas. Their result is easily extended to the more general grouped normal variance mixture case; see Section 8 for the proof.

Proposition 1.

Let

X \sim {gNVM}_{2} (μ, Σ, F_{W})

and

ρ = Σ_{12} / \sqrt{Σ_{11} Σ_{22}}

. Then

\begin{matrix} ρ_{τ} (X_{1}, X_{2}) = \frac{2}{π} E (arcsin (ρ \frac{F_{W_{1}}^{\leftarrow} (U) F_{W_{2}}^{\leftarrow} (U) + F_{W_{1}}^{\leftarrow} (\tilde{U}) F_{W_{2}}^{\leftarrow} (\tilde{U})}{\sqrt{(F_{W_{1}}^{\leftarrow} {(U)}^{2} + F_{W_{1}}^{\leftarrow} {(\tilde{U})}^{2}) (F_{W_{2}}^{\leftarrow} {(U)}^{2} + F_{W_{2}}^{\leftarrow} {(\tilde{U})}^{2})}})), \end{matrix}

(21)

where

U, \tilde{U} \underset{}{\overset{ind .}{\sim}} U (0, 1)

.

Next, we address Spearman’s rho

ρ_{S}

. For computing

ρ_{S}

, it is useful to study

ℙ (X_{1} > 0, X_{2} > 0)

. If

X \sim {ELL}_{2} (0, P, F_{R})

where P is a correlation matrix with

P_{12} = ρ

and

ℙ (X = 0) = 0

, then

\begin{matrix} ℙ (X_{1} > 0, X_{2} > 0) = \frac{1}{4} + \frac{arcsin (ρ)}{2 π}, \end{matrix}

see, e.g., (McNeil et al. 2015, Proposition 7.41). Using the same technique, we can show that this result also holds for grouped normal variance mixtures; see Section 8 for the proof.

Proposition 2.

Let

X \sim {gNVM}_{2} (0, Σ, F_{W})

and

ρ = Σ_{12} / \sqrt{Σ_{11} Σ_{22}}

. Then

\begin{matrix} ℙ (X_{1} > 0, X_{2} > 0) = \frac{1}{4} + \frac{arcsin (ρ)}{2 π} . \end{matrix}

Remark 1.

If

Y

follows a grouped elliptical distribution in the sense of (5), a very similar idea can be used to show that

ℙ (Y_{1} > 0, Y_{2} > 0) = 1 / 4 + arcsin (ρ) / (2 π)

.
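The orthant probability in Proposition 2 can be checked by simulation. The following Python sketch draws from a bivariate grouped normal variance mixture with location 0 by applying two different mixing quantile functions to the same uniform U (comonotone mixing). The mixing distributions (IG(1, 1) and a Pareto distribution), the sample size and all function names are assumptions made for illustration.

```python
import math
import random


def q_ig11(u):
    """Quantile function of IG(1, 1) (t margin with nu = 2)."""
    return -1.0 / math.log(u)


def q_pareto(u, alpha=2.5):
    """Quantile function of a Pareto(alpha) distribution on [1, inf); illustrative choice."""
    return (1.0 - u) ** (-1.0 / alpha)


def sample_gnvm2(n, rho, q1, q2, seed=1):
    """Draw n realizations of X = (sqrt(W_1) Z_1, sqrt(W_2) Z_2) with W_j = q_j(U)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:                      # keep u in the open interval (0, 1)
            u = rng.random()
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append((math.sqrt(q1(u)) * z1, math.sqrt(q2(u)) * z2))
    return out


rho = 0.5
sample = sample_gnvm2(100_000, rho, q_ig11, q_pareto)
orthant_freq = sum(1 for x1, x2 in sample if x1 > 0 and x2 > 0) / len(sample)
orthant_exact = 0.25 + math.asin(rho) / (2.0 * math.pi)
```

Since the mixing variables are positive, the signs of the components of X coincide with those of the underlying normal vector, which is why the empirical frequency should be close to 1/4 + arcsin(ρ)/(2π) regardless of the mixing distributions chosen.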

Next, we derive a new expression for Spearman’s rho

ρ_{S}

for bivariate grouped normal variance mixture distributions; see Section 8 for the proof.

Proposition 3.

Let

X \sim {gNVM}_{2} (0, P, F_{W})

and

ρ = P_{12}

. Then

\begin{matrix} ρ_{S} (X_{1}, X_{2}) = \frac{6}{π} E (arcsin (ρ \sqrt{\frac{F_{W_{1}}^{\leftarrow} (U) F_{W_{2}}^{\leftarrow} (U)}{(F_{W_{1}}^{\leftarrow} (U) + F_{W_{1}}^{\leftarrow} (\tilde{U})) (F_{W_{2}}^{\leftarrow} (U) + F_{W_{2}}^{\leftarrow} (\bar{U}))}})), \end{matrix}

(22)

where

U, \tilde{U}, \bar{U} \underset{}{\overset{ind .}{\sim}} U (0, 1)

.

Numerical Results

Let

X \sim {gNVM}_{2} (0, P, F_{W})

. It follows from Proposition 1 that

\begin{matrix} ρ_{τ} (X_{1}, X_{2}) = \int_{{(0, 1)}^{2}} g_{τ} (u) d u, g_{τ} (u) = \frac{2}{π} arcsin (ρ \frac{F_{W_{1}}^{\leftarrow} (u_{1}) F_{W_{2}}^{\leftarrow} (u_{1}) + F_{W_{1}}^{\leftarrow} (u_{2}) F_{W_{2}}^{\leftarrow} (u_{2})}{\sqrt{(F_{W_{1}}^{\leftarrow} {(u_{1})}^{2} + F_{W_{1}}^{\leftarrow} {(u_{2})}^{2}) (F_{W_{2}}^{\leftarrow} {(u_{1})}^{2} + F_{W_{2}}^{\leftarrow} {(u_{2})}^{2})}}) . \end{matrix}

Similarly, Proposition 3 implies that

\begin{matrix} ρ_{S} (X_{1}, X_{2}) = \int_{{(0, 1)}^{3}} g_{ρ} (u) d u, g_{ρ} (u) = \frac{6}{π} arcsin (ρ \sqrt{\frac{F_{W_{1}}^{\leftarrow} (u_{1}) F_{W_{2}}^{\leftarrow} (u_{1})}{(F_{W_{1}}^{\leftarrow} (u_{1}) + F_{W_{1}}^{\leftarrow} (u_{2})) (F_{W_{2}}^{\leftarrow} (u_{1}) + F_{W_{2}}^{\leftarrow} (u_{3}))}}) . \end{matrix}

Hence, both

ρ_{τ} (X_{1}, X_{2})

and

ρ_{S} (X_{1}, X_{2})

can be expressed as integrals over the d-dimensional unit hypercube with

d \in \{2, 3\}

so that RQMC methods as described in Section 2.2 can be applied directly to the problem in this form to estimate

ρ_{τ} (X_{1}, X_{2})

and

ρ_{S} (X_{1}, X_{2})

, respectively. This is implemented in the function corgnvmix() (with method = "kendall" or method = "spearman") of the R package nvmix.
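As an illustration of the three-dimensional integral for Spearman’s rho, the following Python sketch estimates the expectation in (22) by plain Monte Carlo. This is a simplification: the standard library offers no quasi-random sequences, so pseudo-random numbers replace the RQMC point sets, and the code is not the implementation in corgnvmix(). The function names and the choice of mixing quantile functions are assumptions.

```python
import math
import random


def q_ig11(u):
    """Quantile function of IG(1, 1) (t margin with nu = 2)."""
    return -1.0 / math.log(u)


def q_pareto(u, alpha=2.5):
    """Quantile function of a Pareto(alpha) distribution on [1, inf); illustrative choice."""
    return (1.0 - u) ** (-1.0 / alpha)


def q_const(u):
    """Degenerate mixing variable W = 1, which gives the normal distribution."""
    return 1.0


def spearman_rho_mc(rho, q1, q2, n=50_000, seed=1):
    """Plain Monte Carlo estimate of the expectation in (22)."""
    rng = random.Random(seed)

    def unif():
        u = rng.random()
        while u == 0.0:                      # keep u in the open interval (0, 1)
            u = rng.random()
        return u

    total = 0.0
    for _ in range(n):
        u, u_tilde, u_bar = unif(), unif(), unif()
        num = q1(u) * q2(u)
        den = (q1(u) + q1(u_tilde)) * (q2(u) + q2(u_bar))
        total += math.asin(rho * math.sqrt(num / den))
    return 6.0 / math.pi * total / n


rho_s_normal = spearman_rho_mc(0.5, q_const, q_const)
rho_s_grouped = spearman_rho_mc(0.5, q_ig11, q_pareto)
```

With the degenerate mixing variable W = 1, the integrand is constant and the estimate equals 6 arcsin(ρ/2)/π, the known value of Spearman’s rho for the bivariate normal distribution; this serves as a sanity check of the integrand.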

As an example, we consider six different bivariate grouped t distributions with

ν \in \{(1, 2), (4, 8), (1, 5), (4, 20), (1, \infty), (4, \infty)\}

and plot estimated

ρ_{τ}

as a function of

ρ

in Figure 4. The elliptical case (corresponding to equal dof) is included for comparison. When the pairwise dof are close and

ρ

is not too close to 1, the elliptical approximation is quite satisfactory. However, when the dof are further apart there is a significant difference between the estimated

ρ_{τ}

and the elliptical approximation. This is highlighted in the plot on the right side, which displays the relative difference

(ρ_{τ}^{ell} - ρ_{τ}) / ρ_{τ}^{ell}

. Intuitively, it makes sense that the approximation deteriorates when the dof are further apart, as the closer the dof, the “closer” is the model to being elliptical.

6. Copula Setting

So far, the focus of this paper has been on grouped normal variance mixtures. This section addresses grouped normal variance mixture copulas, i.e., the copulas derived from

X \sim {gNVM}_{d} (μ, Σ, F_{W})

via Sklar’s theorem. The first part addresses grouped NVM copulas in full generality and provides formulas for the copula, its density and the tail dependence coefficients. The second part details the important special case of inverse-gamma mixture copulas, that is, copulas derived from a grouped t distribution,

X \sim {gt}_{d} (ν, μ, Σ)

. The third part discusses estimation of the copula and its density whereas the fourth part answers the question of how copula parameters can be fitted to a dataset. The last part of this section includes numerical examples.

6.1. Grouped Normal Variance Mixture Copulas

Copulas provide a flexible tool for modeling dependent risks, as they allow one to model the margins separately from the dependence between the margins. Let

X \sim F

be a d-dimensional random vector with continuous margins

F_{1}, \dots, F_{d}

. Consider the random vector

U

given by

U = (U_{1}, \dots, U_{d}) = (F_{1} (X_{1}), \dots, F_{d} (X_{d}))

; note that

U_{j} \sim U (0, 1)

for

j = 1, \dots, d

. The copula C of F (or

X

) is the distribution function of the margin-free

U

, i.e.,

\begin{matrix} C (u) = ℙ (F_{1} (X_{1}) \leq u_{1}, \dots, F_{d} (X_{d}) \leq u_{d}) = F (F_{1}^{\leftarrow} (u_{1}), \dots, F_{d}^{\leftarrow} (u_{d})), u = (u_{1}, \dots, u_{d}) \in {[0, 1]}^{d} . \end{matrix}

If F is absolutely continuous and the margins

F_{1}, \dots, F_{d}

are strictly increasing and continuous, the copula density is given by

\begin{matrix} c (u) = \frac{\partial^{d}}{\partial u_{1} \dots \partial u_{d}} C (u_{1}, \dots, u_{d}) = \frac{f (F_{1}^{\leftarrow} (u_{1}), \dots, F_{d}^{\leftarrow} (u_{d}))}{\prod_{j = 1}^{d} f_{j} (F_{j}^{\leftarrow} (u_{j}))}, u = (u_{1}, \dots, u_{d}) \in {(0, 1)}^{d}, \end{matrix}

(23)

where f denotes the (joint) density of F and

f_{j}

is the marginal density of

F_{j}

. For more about copulas and their applications to risk management, see, e.g., Embrechts et al. (2001); Nelsen (2007).
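Formula (23) can be illustrated with a case in which every ingredient is available in closed form: the bivariate t copula with ν = 2, for which the marginal quantile function, the marginal density and the joint density (17) are explicit. The following Python sketch is only an example under this assumption; the function names are hypothetical.

```python
import math


def t2_quantile(p):
    """Quantile function of the univariate t distribution with nu = 2."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))


def t2_density(x):
    """Density of the univariate t distribution with nu = 2."""
    return (2.0 + x * x) ** -1.5


def t2_joint_density(x1, x2, rho):
    """Density (17) for d = 2, nu = 2, mu = 0 and a correlation matrix with entry rho."""
    maha = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)
    return (1.0 + maha / 2.0) ** -2.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))


def t2_copula_density(u1, u2, rho):
    """Copula density via (23): joint density at the quantiles over the product of marginal densities."""
    x1, x2 = t2_quantile(u1), t2_quantile(u2)
    return t2_joint_density(x1, x2, rho) / (t2_density(x1) * t2_density(x2))


c_value = t2_copula_density(0.2, 0.7, 0.5)
c_reflected = t2_copula_density(0.8, 0.3, 0.5)   # evaluation at 1 - u
```

In the grouped case the joint density in the numerator has no closed form, so it has to be replaced by an estimate, while the structure of the ratio stays the same.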

Since copulas are invariant with respect to strictly increasing marginal transformations, we may wlog assume that

μ = 0

,

Σ = P

is a correlation matrix and we may consider

X \sim {gNVM}_{d} (0, P, F_{W})

. We find using (11) that the grouped normal variance mixture copula is given by

\begin{matrix} C_{P, F_{W}}^{gNVM} (u) & = F (F_{1}^{\leftarrow} (u_{1}), \dots, F_{d}^{\leftarrow} (u_{d})) = \int_{0}^{1} Φ_{P} ({(1 / \sqrt{W})}^{D} (u) x) d u, x = (F_{1}^{\leftarrow} (u_{1}), \dots, F_{d}^{\leftarrow} (u_{d})), \end{matrix}

(24)

and its density can be computed using (18) as

\begin{matrix} c_{P, F_{W}}^{gNVM} (u) & = \frac{f (F_{1}^{\leftarrow} (u_{1}), \dots, F_{d}^{\leftarrow} (u_{d}))}{\prod_{i = 1}^{d} f_{i} (F_{i}^{\leftarrow} (u_{i}))} \\ = \frac{\int_{0}^{1} {\sqrt{{(2 π)}^{d} | Σ | \prod_{j = 1}^{d} F_{W_{j}}^{\leftarrow} (u)}}^{- 1} exp (- \frac{D^{2} (x; μ, {\sqrt{W}}^{D} (u) Σ {\sqrt{W}}^{D} (u))}{2}) d u}{\prod_{j = 1}^{d} f_{j} (F_{j}^{\leftarrow} (u_{j}))}, \end{matrix}

(25)

where

F_{j}

and

f_{j}

denote the distribution function and density function of

X_{j} \sim {NVM}_{1} (0, 1, F_{W_{j}})

for

j = 1, \dots, d

; directly considering

log (c_{P, F_{W}}^{gNVM} (u))

also makes (25) more robust to compute.

In the remainder of this subsection, some useful properties of gNVM copulas are derived. In particular, we study symmetry properties, rank correlation and tail dependence coefficients.

6.1.1. Radial Symmetry and Exchangeability

A d-dimensional random vector

X

is radially symmetric about

μ \in R^{d}

if

X - μ \underset{}{\overset{d}{=}} μ - X

. It is evident from (2) that

X \sim {gNVM}_{d} (μ, Σ, F_{W})

is radially symmetric about its location vector

μ

. In layman’s terms this implies that jointly large values of

X

are as likely as jointly small values of

X

. Radial symmetry also implies that

c_{P, F_{W}}^{gNVM} (u) = c_{P, F_{W}}^{gNVM} (1 - u)

.

If

(X_{Π (1)}, \dots, X_{Π (d)}) \underset{}{\overset{d}{=}} (X_{1}, \dots, X_{d})

for all permutations

Π

of

\{1, \dots, d\}

, the random vector

X

is called exchangeable. The same definition applies to copulas. If

X \sim {gNVM}_{d} (0, I_{d}, F_{W})

, then

X

is in general not exchangeable unless

F_{W_{1}} = \dots = F_{W_{d}}

in which case

X \sim {NVM}_{d} (0, I_{d}, F_{W_{1}})

. The lack of exchangeability implies that

c_{I_{d}, F_{W}}^{gNVM} (u_{1}, \dots, u_{d}) \neq c_{I_{d}, F_{W}}^{gNVM} (u_{Π (1)}, \dots, u_{Π (d)})

, in general.

6.1.2. Tail Dependence Coefficients

Consider a bivariate

C_{P, F_{W}}^{gNVM}

copula. Such a copula is radially symmetric, hence the lower and upper tail dependence coefficients are equal, i.e.,

λ_{l} = λ_{u} = : λ \in [0, 1]

, where

\begin{matrix} λ_{l} & = lim_{q \to 0^{+}} ℙ (U_{2} \leq q ∣ U_{1} \leq q) = lim_{q \to 0^{+}} \frac{C (q, q)}{q}, \end{matrix}

for

(U_{1}, U_{2}) \sim C_{P, F_{W}}^{gNVM}

. In the case where only the quantile functions

F_{W_{j}}^{\leftarrow}

are available, no simple expression for

λ

is available. In Luo and Shevchenko (2010),

λ

is derived for grouped t copulas, as will be discussed in Section 6.2. Following the arguments used in their proof, the following proposition provides a new expression for

λ

in the more general grouped normal variance mixture case.

Proposition 4.

The tail dependence coefficient λ for a bivariate

C_{P, F_{W}}^{gNVM}

with

ρ = P_{12}

satisfies

\begin{matrix} λ = lim_{q \to 0^{+}} (I (q, 1, 2) + I (q, 2, 1)), \end{matrix}

where for

i, j \in \{1, 2\}

,

\begin{matrix} I (q, i, j) = \int_{0}^{1} \frac{ϕ (F_{i}^{\leftarrow} (q) / \sqrt{F_{W_{i}}^{\leftarrow} (u)})}{\sqrt{F_{W_{i}}^{\leftarrow} (u)} f_{i} (F_{i}^{\leftarrow} (q))} Φ (\frac{F_{j}^{\leftarrow} (q) / \sqrt{F_{W_{j}}^{\leftarrow} (u)} - ρ F_{i}^{\leftarrow} (q) / \sqrt{F_{W_{i}}^{\leftarrow} (u)}}{\sqrt{1 - ρ^{2}}}) d u . \end{matrix}
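The integral I(q, i, j) can be evaluated numerically once the marginal quantile and density are available. The following Python sketch does so in the ungrouped special case W_1 = W_2 ∼ IG(1, 1) (a t copula with ν = 2), where the margins are explicit and the limit is known in closed form, so that the result can be compared with the tail dependence coefficient of the t copula. The midpoint rule, the value q = 10^{-3} and all function names are assumptions made for illustration; the closed-form distribution function of the t distribution with 3 degrees of freedom is used for the reference value.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def q_ig11(u):
    """Quantile function of IG(1, 1) (t margin with nu = 2)."""
    return -1.0 / math.log(u)


def t2_quantile(p):
    """Quantile function of the univariate t distribution with nu = 2."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))


def t2_density(x):
    """Density of the univariate t distribution with nu = 2."""
    return (2.0 + x * x) ** -1.5


def t3_cdf(t):
    """Distribution function of the univariate t distribution with nu = 3."""
    z = t / math.sqrt(3.0)
    return 0.5 + (z / (1.0 + z * z) + math.atan(z)) / math.pi


def I_term(q, rho, qw_i, qw_j, Fi_inv, Fj_inv, f_i, n=50_000):
    """Midpoint-rule approximation to I(q, i, j) from Proposition 4."""
    x_i, x_j = Fi_inv(q), Fj_inv(q)
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        s_i, s_j = math.sqrt(qw_i(u)), math.sqrt(qw_j(u))
        arg = (x_j / s_j - rho * x_i / s_i) / math.sqrt(1.0 - rho * rho)
        total += N01.pdf(x_i / s_i) / s_i * N01.cdf(arg)
    return total / (n * f_i(x_i))


rho, q = 0.5, 1e-3
# In the ungrouped case I(q, 1, 2) = I(q, 2, 1), so one evaluation suffices.
lam_approx = 2.0 * I_term(q, rho, q_ig11, q_ig11, t2_quantile, t2_quantile, t2_density)
lam_exact = 2.0 * t3_cdf(-math.sqrt(3.0 * (1.0 - rho) / (1.0 + rho)))
```

For small q the integrand concentrates near u = 1, so a finer grid (or a restriction to the relevant region) is needed as q decreases.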

6.2. Inverse-Gamma Mixtures

If

X \sim t_{d} (ν, 0, P)

for a positive definite correlation matrix P, the copula of

X

extracted via Sklar’s theorem is the well known t copula, denoted by

C_{ν, P}^{t}

. This copula is given by

\begin{matrix} C_{ν, P}^{t} (u) = \int_{- \infty}^{t_{ν}^{- 1} (u_{1})} \dots \int_{- \infty}^{t_{ν}^{- 1} (u_{d})} \frac{Γ ((ν + d) / 2)}{Γ (ν / 2) \sqrt{{(ν π)}^{d} | P |}} {(1 + \frac{x^{⊤} P^{- 1} x}{ν})}^{- \frac{ν + d}{2}} d x, u = (u_{1}, \dots, u_{d}) \in {[0, 1]}^{d}, \end{matrix}

(26)

where

t_{ν}

and

t_{ν}^{- 1}

denote the distribution function and quantile function of a univariate standard t distribution. Note that (26) is merely the distribution function of

X \sim t_{d} (ν, 0, P)

evaluated at the quantiles

t_{ν}^{- 1} (u_{1}), \dots, t_{ν}^{- 1} (u_{d})

. The copula density

c_{ν, P}^{t} (u)

is

\begin{matrix} c_{ν, P}^{t} (u) = f_{ν, 0, P}^{t} (t_{ν}^{- 1} (u_{1}), \dots, t_{ν}^{- 1} (u_{d})) {(\prod_{j = 1}^{d} f_{t_{ν}} (t_{ν}^{- 1} (u_{j})))}^{- 1}, u \in {[0, 1]}^{d} . \end{matrix}

The (upper and lower) tail dependence coefficient

λ

of the bivariate

C_{ν, P}^{t}

with

ρ = P_{12}

is well known to be

\begin{matrix} λ = 2 t_{ν + 1} (- \sqrt{(ν + 1) (1 - ρ) / (1 + ρ)}); \end{matrix}

see (Demarta and McNeil 2005, Proposition 1). The multivariate t distribution being elliptical implies the formula

ρ_{τ} = 2 arcsin (ρ) / π

for Kendall’s tau.

A closed formula for Spearman’s rho is not available, but our Proposition 3 implies that

\begin{matrix} ρ_{S} = \frac{6}{π} E (arcsin (ρ W {\sqrt{(W + \tilde{W}) (W + \bar{W})}}^{- 1})), W, \tilde{W}, \bar{W} \underset{}{\overset{ind .}{\sim}} IG (ν / 2, ν / 2) . \end{matrix}

The same formula was given in (McNeil et al. 2015, Proposition 7.44).

Next, consider a grouped inverse-gamma mixture model. If

X \sim {gt}_{d} (ν, 0, P)

, the copula of

X

is the grouped t copula, denoted by

C_{ν, P}^{gt}

. From (24),

\begin{matrix} C_{ν, P}^{gt} (u) = \int_{0}^{1} Φ_{P} ({(1 / \sqrt{W})}^{D} (u) x) d u, x = (t_{ν_{1}}^{- 1} (u_{1}), \dots, t_{ν_{d}}^{- 1} (u_{d})), \end{matrix}

and the copula density follows from (25) as

\begin{matrix} c_{ν, P}^{gt} (u) = f_{ν, 0, P}^{gt} (t_{ν_{1}}^{- 1} (u_{1}), \dots, t_{ν_{d}}^{- 1} (u_{d})) {(\prod_{j = 1}^{d} f_{t_{ν_{j}}} (t_{ν_{j}}^{- 1} (u_{j})))}^{- 1} . \end{matrix}

The (lower and upper) tail dependence coefficient

λ

of

C_{ν_{1}, ν_{2}, P}^{gt}

is given by

\begin{matrix} λ & = Ω (ρ, ν_{1}, ν_{2}) + Ω (ρ, ν_{2}, ν_{1}), \\ Ω (ρ, ν_{1}, ν_{2}) & = \int_{0}^{\infty} f_{χ_{ν_{1} + 1}^{2}} (t) Φ (- (B_{ν_{1}, ν_{2}} t^{ν_{1} / (2 ν_{2})} - ρ \sqrt{t}) {(1 - ρ^{2})}^{- 1 / 2}) d t, \\ B_{ν_{1}, ν_{2}} & = {(\frac{2^{ν_{2} / 2} Γ ((1 + ν_{2}) / 2)}{2^{ν_{1} / 2} Γ ((1 + ν_{1}) / 2)})}^{1 / ν_{2}}; \end{matrix}

(27)

see (Luo and Shevchenko 2010, Equation (26)). Here,

f_{χ_{ν}^{2}}

denotes the density of a

χ_{ν}^{2}

distribution.

Finally, consider rank correlation coefficients for grouped t copulas. No closed formula for either Kendall’s tau or Spearman’s rho exists in the grouped t case. An exact integral representation of

ρ_{τ}

for

C_{ν_{1}, ν_{2}, P}^{gt}

follows from Proposition 1. No substantial simplification of (21) therein can be achieved by considering the special case when

W_{j} \sim IG (ν_{j} / 2, ν_{j} / 2)

. In order to compute

ρ_{τ}

, one can either numerically integrate (21) (as will be discussed in the next subsection) or use the approximation

ρ_{τ} \approx \frac{2}{π} arcsin (ρ)

which was shown to be a “very accurate” approximation in Daul et al. (2003).

For Spearman’s rho, no closed formula can be derived either, not even in the ungrouped t copula case, so that the integral in (22) in Proposition 3 needs to be computed numerically, as will be discussed in the next subsection.

The discussion in this section highlights that moving from a scalar mixing rv W (as in the classical t case) to comonotone mixing rvs

W_{1}, \dots, W_{S}

(as in the grouped t case) introduces challenges from a computational point of view. While in the classical t setting, the density, Kendall’s tau and the tail dependence coefficient are available in closed form, all of these quantities need to be estimated in the more general grouped setting. Efficient estimation of these important quantities is discussed in the next subsection.

6.3. Estimation of the Copula and Its Density

Consider a d-dimensional grouped normal variance mixture copula

C_{P, F_{W}}^{gNVM}

. From (24), it follows that

\begin{matrix} C_{P, F_{W}}^{gNVM} (u) = F_{X} (F_{1}^{\leftarrow} (u_{1}), \dots, F_{d}^{\leftarrow} (u_{d})), u = (u_{1}, \dots, u_{d}) \in {[0, 1]}^{d}, \end{matrix}

where

F_{X}

is the distribution function of

X \sim {gNVM}_{d} (0, P, F_{W})

and

F_{j}

is the distribution function of

{NVM}_{1} (0, 1, F_{W_{j}})

for

j = 1, \dots, d

. If the margins are known (as in the case of an inverse-gamma mixture), evaluating the copula is no harder than evaluating the distribution function of

X

so that the methods described in Section 3.1 can be applied.

When the mixing rvs

W_{j}

are only known through their quantile functions in the form of a “black box”, one needs to estimate the marginal quantile functions

F_{j}^{\leftarrow}

of F first. Note that

\begin{matrix} F_{j} (x) = ℙ (X_{j} \leq x) = \int_{(0, 1)} Φ (x / \sqrt{F_{W_{j}}^{\leftarrow} (u)}) d u, x \in R, \end{matrix}

(28)

which can be estimated using RQMC. The quantile

F_{j}^{\leftarrow} (u_{j})

can then be estimated by numerically solving

F_{j} (x) = u_{j}

for x, for instance using a bisection algorithm or Newton’s method.
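The two steps just described, estimating F_j via (28) and inverting it numerically, can be sketched in Python as follows. The sketch assumes an IG(1, 1) mixing variable so that the result can be compared with the explicit distribution and quantile functions of the t distribution with ν = 2; a midpoint rule replaces RQMC, and the function names, bracket and tolerance are hypothetical choices.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def q_ig11(u):
    """Quantile function of IG(1, 1) (t margin with nu = 2)."""
    return -1.0 / math.log(u)


def marginal_cdf(x, qw, n=10_000):
    """Midpoint-rule approximation to (28) for a given mixing quantile function qw."""
    return sum(N01.cdf(x / math.sqrt(qw((k + 0.5) / n))) for k in range(n)) / n


def marginal_quantile(p, qw, lo=-100.0, hi=100.0, tol=1e-6):
    """Solve F_j(x) = p for x by bisection on the bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_cdf(mid, qw) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


cdf_at_1 = marginal_cdf(1.0, q_ig11)
quantile_at_09 = marginal_quantile(0.9, q_ig11)
```

Each bisection step requires a full evaluation of the integral in (28), which is why reusing the quantile evaluations of the mixing variable across steps pays off in practice.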

The general form of gNVM copula densities was given in (25). Again, if the margins are known, the only unknown quantity is the joint density

f_{X}

which can be estimated using the adaptive RQMC procedure proposed in Section 4.1. If the margins are not available,

F_{j}^{\leftarrow}

can be estimated as discussed above. The marginal densities

f_{j}

can be estimated using an adaptive RQMC algorithm similar to the one developed in Section 4.1; see also (Hintz et al. 2020, Section 4).

Remark 2.

Estimating the copula density is the most challenging problem discussed in this paper if we assume that

F_{W}

is only known via its marginal quantile functions. Evaluating the copula density

c_{P, F_{W}}^{gNVM}

at one

u \in {[0, 1]}^{d}

requires estimation of:

the marginal quantiles $F_{j}^{\leftarrow} (u_{j})$ , which involves estimation of $F_{j}$ and then numerical root finding, for each $j = 1, \dots, d$ ,
the marginal densities evaluated at the quantiles $f_{j} (F_{j}^{\leftarrow} (u_{j}))$ for $j = 1, \dots, d$ . This involves estimation of the density of a univariate normal variance mixture,
the joint density evaluated at the quantiles $f (F_{1}^{\leftarrow} (u_{1}), \dots, F_{d}^{\leftarrow} (u_{d}))$ , which is another one-dimensional integration problem.

It follows from Remark 2 that, while estimation of

c_{P, F_{W}}^{gNVM}

is theoretically possible with the methods proposed in this paper, the problem becomes computationally intractable for large dimensions d. If the margins are known, however, our proposed methods are efficient and accurate, as demonstrated in the next subsection, where we focus on the important case of a grouped t model. Our methods to estimate the copula and the density of

C_{ν, k, P}^{gt}

are implemented in the functions pgStudentcopula() and dgStudentcopula() in the R package nvmix.

6.4. Fitting Copula Parameters to a Dataset

In this subsection, we discuss estimation methods for grouped normal variance mixture copulas. Let

X_{1}, \dots, X_{n}

be independent and distributed according to some distribution with

C_{P, F_{W}}^{gNVM}

as underlying copula, with

X_{i} = (X_{i, 1}, \dots, X_{i, d})

and group sizes

d_{1}, \dots, d_{S}

with

\sum_{j = 1}^{S} d_{j} = d

. Furthermore, let

ν_{k}

be (a vector of) parameters of the kth mixing distribution for

k = 1, \dots, S

; for instance, in the grouped t case,

ν_{k}

is the scalar degrees of freedom of group k. Finally, denote by

ν = (ν_{1}, \dots, ν_{S})

the vector consisting of all mixing parameters. Note that we assume that the group structure is given. We are interested in estimating the parameter vector

ν

and the matrix P of the underlying copula

C_{P, F_{W}}^{gNVM}

.

In Daul et al. (2003), this problem was discussed for the grouped t copula where

d_{k} \geq 2

for

k = 1, \dots, S

. In this case, all subgroups are t copulas and Daul et al. (2003) suggest estimating the dof

ν_{1}, \dots, ν_{S}

separately in each subgroup. Computationally, this is rather simple as the density of the ungrouped t copula is known analytically. Luo and Shevchenko (2010) consider the grouped t copula with

S = d

, so

d_{k} = 1

for

k = 1, \dots, d

. Since any univariate margin of a copula is uniformly distributed, separate estimation is not feasible. As such, Luo and Shevchenko (2010) suggest estimating

ν_{1}, \dots, ν_{S}

jointly by maximizing the copula-likelihood of the grouped mixture. In both references, the matrix P is estimated by estimating pairwise Kendall’s tau and using the approximate identity

ρ_{τ} (X_{i}, X_{j}) \approx 2 arcsin (ρ_{i, j}) / π

for

i \neq j

. Although we have shown in Section 5 that in some cases, this approximation could be too crude, our assessment is that in the context of the fitting examples considered in the present section, this approximation is sufficiently accurate. Luo and Shevchenko (2010) also consider joint estimation of

(P, ν)

by maximizing the corresponding copula likelihood simultaneously over all

d + d (d - 1) / 2

parameters. Their numerical results in

d = 2

suggest that this does not lead to a significant improvement. In large dimensions

d > 2

, the optimization problem becomes intractable, however, so that the first non-parametric approach for estimating P is likely to be preferred.

We combine the two estimation methods, applied to the general case of a grouped normal variance mixture, in Algorithm 2.

Algorithm 2: Estimation of the Copula Parameters

ν

and

P

of

C_{P, F_{W}}^{gNVM}

.

Given iid

X_{1}, \dots, X_{n}

, estimate

ν

and P of the underlying

C_{P, F_{W}}^{gNVM}

as follows:

1. Estimation of P. Estimate Kendall’s tau $ρ_{τ} (X_{i}, X_{j})$ for each pair $1 \leq i < j \leq d$ . Use the approximate identity $ρ_{τ} (X_{i}, X_{j}) \approx 2 arcsin (ρ_{i, j}) / π$ to find the estimates $ρ_{i, j}$ . Then combine the estimates $ρ_{i, j}$ into a correlation matrix $\hat{P}$ , which may have to be modified to ensure positive definiteness.
2. Transformation to pseudo-observations. If necessary, transform the data $X_{1}, \dots, X_{n}$ to pseudo-observations $U_{1}, \dots, U_{n}$ from the underlying copula, for instance, by setting $U_{i, j} = R_{i, j} / (n + 1)$ where $R_{i, j}$ is the rank of $X_{i, j}$ among $X_{1, j}, \dots, X_{n, j}$ .
3. Initial parameters. Maximize the copula log-likelihood for each subgroup k with $d_{k} \geq 2$ over their respective parameters separately. That is, if $U_{i}^{(k)} = (U_{i, d_{k - 1} + 1}, \dots, U_{i, d_{k - 1} + d_{k}})$ (where $d_{0} = 0$ ) denotes the sub-vector of $U_{i}$ belonging to group k, and if ${\hat{P}}^{(k)}$ is defined accordingly, solve the following optimization problems:

$\begin{matrix} {\hat{ν}}_{0}^{(k)} = argmax_{ν^{(k)}} l (ν^{(k)}; U_{1}^{(k)}, \dots, U_{n}^{(k)}), \\ l (ν^{(k)}; U_{1}^{(k)}, \dots, U_{n}^{(k)}) = \sum_{i = 1}^{n} log c_{{\hat{P}}^{(k)}, F_{W}}^{gNVM} (U_{i}^{(k)}), \forall k : d_{k} \geq 2 . \end{matrix}$

(29)

For “groups” with $d_{k} = 1$ , choose the initial estimate ${\hat{ν}}_{0}^{(k)}$ from prior/expert experience or as a hard-coded value.
4. Joint estimation. With initial estimates ${\hat{ν}}_{0}^{(k)}$ , $k = 1, \dots, S$ at hand, optimize the full copula likelihood to estimate $ν$ ; that is,

$\begin{matrix} \hat{ν} = argmax_{ν} l (ν; U_{1}, \dots, U_{n}), \\ l (ν; U_{1}, \dots, U_{n}) = \sum_{i = 1}^{n} log c_{\hat{P}, F_{W}}^{gNVM} (U_{i}) . \end{matrix}$

(30)
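The first two steps of Algorithm 2 can be sketched in a few lines. The following is a minimal illustration in plain Python, not the nvmix implementation: the function names are hypothetical, ties in the data are assumed away, and the resulting matrix is not adjusted for positive definiteness (which Step 1 notes may be necessary).

```python
import math

def kendall_tau(x, y):
    """Sample Kendall's tau via the O(n^2) pair count; assumes no ties."""
    n = len(x)
    score = 0  # number of concordant pairs minus number of discordant pairs
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                score += 1
            elif s < 0:
                score -= 1
    return score / (n * (n - 1) / 2)

def tau_to_rho(tau):
    """Invert tau = 2 arcsin(rho) / pi, that is, rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)

def estimate_P(columns):
    """Matrix of pairwise inverted Kendall's tau estimates.

    columns[j] holds the n observations of component j. The result is
    symmetric with unit diagonal but is NOT forced to be positive definite.
    """
    d = len(columns)
    P = [[1.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            P[i][j] = P[j][i] = tau_to_rho(kendall_tau(columns[i], columns[j]))
    return P

def pseudo_observations(column):
    """Map one data column to ranks / (n + 1); assumes no ties."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (n + 1) for r in ranks]
```

Dividing the ranks by n + 1 rather than n keeps all pseudo-observations strictly inside (0, 1), which matters because copula densities are typically evaluated through quantile functions that diverge at 0 and 1.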

The method proposed in Daul et al. (2003) returns the initial estimates obtained in Step 3. A potential drawback of this approach is that it fails to consider the dependence between the groups correctly. Indeed, the dependence between a component in group

k_{1}

and a component in group

k_{2}

(e.g., measured by Kendall’s tau or by the tail-dependence coefficient) is determined by both

ν^{(k_{1})}

and

ν^{(k_{2})}

. As such, these parameters should be estimated jointly.

Note that the copula density is not available in closed form, not even in the grouped t case, so that each call of the likelihood function in (30) requires the approximation of n integrals. This poses numerical challenges, as the estimated likelihood function is typically “bumpy”, having many local maxima due to estimation errors.

If

F_{W}

is only known via its marginal quantile functions, as is the general theme of this paper, the optimization problems in (29) and (30) become intractable (unless d and n are small) due to the numerical challenges involved in the estimation of the copula density; see also Remark 2. We leave the problem of fitting grouped normal variance mixture copulas in full generality (where the distribution of the mixing random variables

W_{j}

is only specified via marginal quantile functions in the form of a “black box”) for future research. Instead, we focus on the important case of a grouped t copula. Here, the quantile functions

F_{j}^{\leftarrow}

(of

X_{j}

) and the densities

f_{j}

are known for

j = 1, \dots, d

, since the margins are all t distributed. This substantially simplifies the underlying numerical procedure. Our method is implemented in the function fitgStudentcopula() of the R package nvmix. The numerical optimizations in Steps 3 and 4 are passed to the R optimizer optim() and the copula density is estimated as in Section 6.3.

Example 1.

Consider a 6-dimensional grouped t copula, with three groups of size 2 each and degrees of freedom

1, 4

and 7, respectively. We perform the following experiment: We sample a correlation matrix P using the R function rWishart(). Then, for each sample size

n \in \{250, 500, \dots, 1750, 2000\}

, we repeat sampling

X_{1}, \dots, X_{n}

15 times, and in each case, estimate the degrees of freedom once using the method in Daul et al. (2003) (i.e., by estimating the dof in each group separately) and once using our method from the previous section. The true matrix P is used in the fitting, so that the focus is really on estimating the dof. The results are displayed in Figure 5. The estimates on the left are obtained for each group separately; on the right, the dof were estimated jointly by maximizing the full copula likelihood (with initial estimates obtained as in the left figure). Clearly, the jointly estimated parameters are much closer to their true values (which are known in this simulation study and indicated by horizontal lines), and it can be confirmed that the variance decreases with increasing sample size n.

Example 2.

Let us now consider the negative logarithmic returns of the constituents of the Dow Jones 30 index from 1 January 2014 to 31 December 2015 (

n = 503

data points obtained from the R package qrmdata of Hofert and Hornik (2016)) and, after deGARCHing, fit a grouped t copula to the standardized residuals. We choose the natural groupings induced by the industry sectors of the 30 constituents and merge groups of size 1 so that 9 groups are left. Figure 6 displays the estimates obtained for various specifications of maxit, the maximum number of iterations for the underlying optimizer (note that the current default of optim() is as low as maxit = 500). The points for maxit = 0 correspond to the initial estimates found from separately fitting t copulas to the groups. The initial estimates differ significantly from the maximum likelihood estimates (MLEs) obtained from the joint estimation of the dof. Note also that the MLEs change with increasing maxit argument, even though they do not change drastically anymore if 1500 or more iterations are used. Note that the initial parameters result in a much more heavy tailed model than the MLEs. Figure 6 also displays the estimated log-likelihood of the parameters found by the fitting procedure. The six lines correspond to the estimated log-likelihood using six different seeds. It can be seen that estimating the dof jointly (as opposed to group-wise) yields a substantially larger log-likelihood, whereas increasing the parameter maxit (beyond a necessary minimum) only gives a minor improvement.

In order to examine the impact of the different estimates on the underlying copula in terms of its tail behavior, Figure 7 displays the probability

C (u, \dots, u)

estimated using methods from Section 6.3 as a function of u; in a risk management context,

C (u, \dots, u)

is the probability of a jointly large loss, hence a rare event. An absolute error tolerance of

10^{- 7}

was used to estimate the copula. The figure also includes the corresponding probability for the ungrouped t copula, for which the dof were estimated to be 6.3. Figure 7 indicates that the initial estimates yield the most heavy tailed model. This seems reasonable since all initial estimates for the dof range between 0.9 and 5.3 (with average 2.8). The models obtained from the MLEs exhibit the smallest tail probability, indicating that these are the least heavy tailed models considered here. This is in line with Figure 6, which shows that the dof are substantially larger than the initial estimates. The ungrouped t copula is more heavy tailed than the fitted grouped one (with MLEs) but less heavy tailed than the fitted grouped one with initial estimates.

This example demonstrates that it is generally advisable to estimate the dof jointly when grouped modeling is of interest, rather than group-wise as suggested in Daul et al. (2003). Indeed, in this particular example, the initial estimates give a model that substantially overestimates the risk of jointly large losses. As can be seen from Figure 6, optimizing an estimated log-likelihood function is not at all trivial, in particular when many parameters are involved. Indeed, the underlying optimizer never detected convergence, which is why the user needs to carefully assess which specification of maxit to use. We plan on exploring more elaborate optimization procedures which perform better in large dimensions for this problem in the future.

Example 3.

In this example, we consider the problem of mean-variance (MV) portfolio optimization in the classical Markowitz (1952) setting. Consider d assets, and denote by

μ_{t}

and

Σ_{t}

the expected return vector on the risky assets in excess of the risk free rate and the variance-covariance (VCV) matrix of asset returns in the portfolio at time t, respectively. We assume that an investor chooses the weights

x_{t}

of the portfolio to maximize the quadratic utility function

U (x_{t}) = x_{t}^{⊤} μ_{t} - \frac{γ}{2} x_{t}^{⊤} Σ_{t} x_{t}

, where in what follows we assume the risk-aversion parameter

γ = 1

. When there are no shortselling (or other) constraints, one finds the optimal

x_{t}

as

x_{t} = Σ_{t}^{- 1} μ_{t}

. As in Low et al. (2016), we consider relative portfolio weights, which are thus given by

w_{t} = \frac{Σ_{t}^{- 1} μ_{t}}{| 1^{⊤} Σ_{t}^{- 1} μ_{t} |} .

As such, the investor needs to estimate

μ_{t}

and

Σ_{t}

. If we assume no shortselling, i.e.,

x_{t, j} \geq 0

for

j = 1, \dots, d

, the optimization problem can be solved numerically, for instance using the R package quadprog of Turlach et al. (2019).
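As an illustration of the unconstrained case (shortselling allowed), the relative weights can be computed by solving one linear system. This is a minimal sketch in plain Python with hypothetical function names; it takes the risk-aversion parameter as 1, as in the text, and does not cover the no-shortselling case, which requires a quadratic programming solver.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def relative_mv_weights(mu, Sigma):
    """Relative weights Sigma^{-1} mu / |1' Sigma^{-1} mu|."""
    x = solve(Sigma, mu)  # optimal x_t = Sigma^{-1} mu for gamma = 1
    scale = abs(sum(x))
    return [xi / scale for xi in x]

# Two uncorrelated assets with excess returns 2% and 1%
# and variances 0.04 and 0.01 (made-up numbers for illustration).
w = relative_mv_weights([0.02, 0.01], [[0.04, 0.0], [0.0, 0.01]])
```

Because of the absolute value in the denominator, the weights sum to 1 when the unnormalized positions sum to a positive number and to -1 otherwise, so the sign of the overall position is preserved.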

Assume we have return data for the d assets stored in vectors

y_{t}

,

t = 1, \dots, T

, and a sampling window

0 < M < T

. We perform an experiment similar to Low et al. (2016) and compare a historical approach with a model-based approach to estimate

μ_{t}

and

Σ_{t}

. The main steps are as follows:

1. In each period $t = M + 1, \dots, T$ , estimate $μ_{t}$ and $Σ_{t}$ using the M previous return data $y_{i}$ , $i = t - M, \dots, t - 1$ .
2. Compute the optimal portfolio weights $w_{t}$ and the out-of-sample return $r_{t} = w_{t}^{⊤} y_{t}$ .
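The two steps above amount to a rolling-window loop. The sketch below, in plain Python, shows only the bookkeeping; the name `weight_rule` is a hypothetical callback that receives the M previous return vectors and returns portfolio weights, and the equal-weight rule is a placeholder used purely for the demonstration, not one of the rules compared in this example.

```python
def rolling_out_of_sample(returns, M, weight_rule):
    """Out-of-sample returns r_t = w_t' y_t for t = M + 1, ..., T.

    returns: list of T return vectors (returns[0] is y_1).
    weight_rule: maps the window y_{t-M}, ..., y_{t-1} to weights w_t.
    """
    out = []
    for t in range(M, len(returns)):      # 0-based index of y_t
        window = returns[t - M:t]         # the M previous return vectors
        w = weight_rule(window)
        out.append(sum(wj * yj for wj, yj in zip(w, returns[t])))
    return out

def equal_weights(window):
    """Placeholder rule: ignore the window and weight all assets equally."""
    d = len(window[0])
    return [1.0 / d] * d

# Made-up returns for two assets over T = 4 periods, window M = 2.
y = [[0.01, 0.03], [0.02, 0.00], [-0.01, 0.01], [0.04, 0.02]]
r = rolling_out_of_sample(y, 2, equal_weights)   # T - M = 2 returns
```

Passing the estimation and weighting logic as a callback keeps the loop identical for the historical and the model-based approach; only the rule that maps a window to weights changes.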

In the historical approach,

μ_{t}

and

Σ_{t}

in the first step are merely computed as the sample mean vector and sample VCV matrix of the past return data. Our model-based approach is a simplification of the approach used in Low et al. (2016). In particular, to estimate

μ_{t}

and

Σ_{t}

in the first step, the following is done in each time period:

1a. Fit marginal $ARMA (1, 1) - GARCH (1, 1)$ models with standardized t innovations to $y_{i}$ , $i = t - M, \dots, t - 1$ .
1b. Extract the standardized residuals and fit a grouped t copula to the pseudo-observations thereof.
1c. Sample n vectors from the fitted copula and transform the margins by applying the quantile function of the respective standardized t distribution. Based on these n d-dimensional residuals, simulate from the fitted $ARMA (1, 1) - GARCH (1, 1)$ models, giving a total of n simulated return vectors, say $y_{i}^{'}$ , $i = 1, \dots, n$ .
1d. Estimate $μ_{t}$ and $Σ_{t}$ from $y_{i}^{'}$ , $i = 1, \dots, n$ .

The historical and model-based approaches each produce

T - M

out-of-sample returns from which we can estimate the certainty-equivalent return (CER) and the Sharpe-ratio (SR) as

\hat{CER} = {\hat{μ}}_{r} - \frac{1}{2} {\hat{σ}}_{r}^{2} and \hat{SR} = \frac{{\hat{μ}}_{r}}{{\hat{σ}}_{r}},

where

{\hat{μ}}_{r}

and

{\hat{σ}}_{r}

denote the sample mean and sample standard deviation of the

T - M

out-of-sample returns; see also Tu and Zhou (2011). Note that larger, positive values of the SR and CER indicate better portfolio performance.
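The two performance measures are simple functions of the out-of-sample returns. A minimal sketch in plain Python follows; the function name is hypothetical, and the sample standard deviation is taken with the usual n - 1 denominator, which is an assumption since the text does not specify it.

```python
import statistics

def cer_and_sr(returns):
    """Certainty-equivalent return and Sharpe ratio of a return series."""
    mu = statistics.mean(returns)
    sigma = statistics.stdev(returns)   # sample standard deviation (n - 1)
    cer = mu - 0.5 * sigma ** 2
    sr = mu / sigma
    return cer, sr

# Made-up out-of-sample returns for illustration.
cer, sr = cer_and_sr([0.01, 0.03, 0.02])
```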

We consider logarithmic returns of the constituents of the Dow Jones 30 index from 1 January 2013 to 31 December 2014 (

n = 503

data points obtained from the R package qrmdata of Hofert and Hornik (2016)), a sampling window of

M = 250

days,

n = 10^{4}

samples to estimate

μ_{t}

and

Σ_{t}

in the model-based approach, a risk-free interest rate of zero and no transaction costs. We report (in percent) the point estimates

{\hat{μ}}_{r}

,

\hat{CER}

and

\hat{SR}

for the historical approach and for the model-based approach based on an ungrouped and grouped t copula in Table 2 assuming no shortselling. To limit the run time for this illustrative example, the degrees of freedom for the grouped and ungrouped t copula are estimated once and held fixed throughout all time periods

t = M + 1, \dots, T

. We see that the point estimates for the grouped model exceed the point estimates for the ungrouped model.

7. Discussion and Conclusions

We introduced the class of grouped normal variance mixtures and provided efficient algorithms to work with this class of distributions: estimating the distribution function and log-density function, estimating the copula and its density, estimating Spearman’s rho and Kendall’s tau, and estimating the parameters of a grouped NVM copula given a dataset. Most algorithms (and functions in the package nvmix) merely require one to provide the quantile function(s) of the mixing distributions. Since grouped normal variance mixtures are important in practice, the algorithms presented in this paper (and their implementation in the R package nvmix) are widely applicable.

We saw that the distribution function (and hence, the copula) of grouped NVM distributions can be efficiently estimated even in high dimensions using RQMC algorithms. The density function of grouped NVM distributions is in general not available in closed form, not even for the grouped t distribution, so one relies on its estimation. Our proposed adaptive algorithm is capable of estimating the log-density even in high dimensions accurately and efficiently. Fitting grouped normal variance mixture copulas, such as the grouped t copula, to data is an important yet challenging task due to lack of a tractable density function. Thanks to our adaptive procedure for estimating the density, the parameters can be estimated jointly in the special case of a grouped t copula. As was demonstrated in the previous section, it is indeed advisable to estimate the dof jointly, as otherwise one might severely over- or underestimate the joint tails.

A computational challenge that we plan to further investigate is the optimization of the estimated log-likelihood function, which is currently slow and lacks a reliable convergence criterion that can be used for automation. Another avenue for future research is to study how one can, for a given multivariate dataset, assign the components to homogeneous groups.

8. Proofs

Proof of Proposition 1.

This is an immediate application of a proposition proven in (Daul et al. 2003, p. 6). □

Proof of Proposition 2.

Assume that

Σ = P

is a correlation matrix with

P_{12} = ρ

(otherwise standardize the margins). We can write

\begin{matrix} X_{1} = \sqrt{W_{1}} Z_{1}, X_{2} = \sqrt{W_{2}} (ρ Z_{1} + \sqrt{1 - ρ^{2}} Z_{2}), \end{matrix}

where

Z = (Z_{1}, Z_{2}) \sim N_{2} (0, I_{2})

and

W_{j} = F_{W_{j}}^{\leftarrow} (U)

for

j = 1, 2

and

U \sim U (0, 1)

. Let

S = (S_{1}, S_{2}) \sim U (S^{2 - 1})

be uniformly distributed on the unit circle. Such a random vector

S

can be expressed via the stochastic representation

\begin{matrix} S_{1} = cos (Θ), S_{2} = sin (Θ), Θ \sim U ([- π, π)) . \end{matrix}

Note that

Z / \sqrt{Z^{⊤} Z} \sim U (S^{2 - 1})

is independent of

Z^{⊤} Z \sim χ_{2}^{2}

. Let

W_{3}^{2} \sim χ_{2}^{2}

be independent of

S

,

W_{1}

,

W_{2}

and set

W_{j}^{'} = \sqrt{W_{j}} W_{3}

for

j = 1, 2

. Then

\begin{matrix} X_{1} = W_{1}^{'} S_{1} = W_{1}^{'} cos (Θ), X_{2} = W_{2}^{'} (ρ S_{1} + \sqrt{1 - ρ^{2}} S_{2}) = W_{2}^{'} (ρ cos (Θ) + \sqrt{1 - ρ^{2}} sin (Θ)) . \end{matrix}

Finally, setting

ϕ = arcsin ρ

and noting that

ℙ (W_{j}^{'} > 0) = 1

for

j = 1, 2

allows us to compute

ℙ (X_{1} > 0, X_{2} > 0)

via

\begin{matrix} ℙ (X_{1} > 0, X_{2} > 0) = ℙ (W_{1}^{'} cos (Θ) > 0, W_{2}^{'} (ρ cos (Θ) + \sqrt{1 - ρ^{2}} sin (Θ)) > 0) \\ = ℙ (cos (Θ) > 0, sin (ϕ) cos (Θ) + cos (ϕ) sin (Θ) > 0) = ℙ (cos (Θ) > 0, sin (Θ + ϕ) > 0) \\ = ℙ (Θ \in (- π / 2, π / 2) \cap (- ϕ, π - ϕ)) = \frac{π / 2 + ϕ}{2 π} . \end{matrix}

Substituting

ϕ = arcsin (ρ)

gives the result. □
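The identity just derived, $ℙ (X_{1} > 0, X_{2} > 0) = 1 / 4 + arcsin (ρ) / (2 π)$, can be checked by simulation. The sketch below uses plain Monte Carlo in Python (not the RQMC methods of this paper) and one arbitrary choice of comonotone mixing variables made only for illustration: $W_{1}$ is inverse-gamma with shape and rate 1 (a t margin with 2 degrees of freedom, whose quantile function is $- 1 / log (u)$) and $W_{2} = 1$ (a normal margin). All function names are hypothetical.

```python
import math
import random

def orthant_probability_exact(rho):
    """Closed form 1/4 + arcsin(rho) / (2 pi)."""
    return 0.25 + math.asin(rho) / (2 * math.pi)

def orthant_probability_mc(rho, n=200_000, seed=1):
    """Plain Monte Carlo estimate of P(X_1 > 0, X_2 > 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Shared uniform driving both mixing variables, kept inside (0, 1).
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)
        w1 = -1.0 / math.log(u)   # quantile of inverse-gamma(1, 1)
        w2 = 1.0                  # constant mixing variable (normal margin)
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = math.sqrt(w1) * z1
        x2 = math.sqrt(w2) * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        if x1 > 0 and x2 > 0:
            hits += 1
    return hits / n
```

Since the mixing variables are positive, they do not affect the signs of the components, which is why the orthant probability coincides with the one of the bivariate normal distribution regardless of the mixing distributions chosen.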

Proof of Proposition 3.

We follow (McNeil et al. 2015, Proposition 7.44), where a similar result was shown for

X \sim {NVM}_{2} (μ, Σ, F_{W})

. Since

X

has continuous margins, we can write

\begin{matrix} ρ_{S} (X_{1}, X_{2}) = 6 ℙ ((X_{1} - {\tilde{X}}_{1}) (X_{2} - {\bar{X}}_{2}) > 0) - 3, \end{matrix}

(31)

where

{\tilde{X}}_{1}

and

{\bar{X}}_{2}

are random variables with

{\tilde{X}}_{1} \underset{}{\overset{d}{=}} X_{1}

and

{\bar{X}}_{2} \underset{}{\overset{d}{=}} X_{2}

and where

(X_{1}, X_{2})

,

{\tilde{X}}_{1}

and

{\bar{X}}_{2}

are all independent; see (McNeil et al. 2015, Proposition 7.34).

Let

W_{j} = F_{W_{j}}^{\leftarrow} (U)

for

j = 1, 2

,

{\tilde{W}}_{1} = F_{W_{1}}^{\leftarrow} (\tilde{U})

and

{\bar{W}}_{2} = F_{W_{2}}^{\leftarrow} (\bar{U})

for

U, \tilde{U}, \bar{U} \underset{}{\overset{ind .}{\sim}} U (0, 1)

. Furthermore, let

\tilde{Z}, \bar{Z} \underset{}{\overset{ind .}{\sim}} N (0, 1)

, independent of

Z \sim N_{2} (0, P)

. Finally, set

\begin{matrix} X_{1} = \sqrt{W_{1}} Z_{1}, X_{2} = \sqrt{W_{2}} Z_{2}, {\tilde{X}}_{1} = \sqrt{{\tilde{W}}_{1}} \tilde{Z}, {\bar{X}}_{2} = \sqrt{{\bar{W}}_{2}} \bar{Z}, \end{matrix}

and define the random vector

Y = (Y_{1}, Y_{2})

via

\begin{matrix} Y_{1} = X_{1} - {\tilde{X}}_{1} = \sqrt{W_{1}} Z_{1} - \sqrt{{\tilde{W}}_{1}} \tilde{Z}, Y_{2} = X_{2} - {\bar{X}}_{2} = \sqrt{W_{2}} Z_{2} - \sqrt{{\bar{W}}_{2}} \bar{Z} . \end{matrix}

From (31) it follows that we need to compute

ℙ (Y_{1} Y_{2} > 0)

. Conditional on

U, \tilde{U}, \bar{U}

, we find

\begin{matrix} Y ∣ U, \tilde{U}, \bar{U} \sim N_{2} (0, (\begin{matrix} W_{1} + {\tilde{W}}_{1} & \sqrt{W_{1} W_{2}} ρ \\ \sqrt{W_{1} W_{2}} ρ & W_{2} + {\bar{W}}_{2} \end{matrix})) . \end{matrix}

Since the multivariate normal distribution is a special case of a gNVM distribution (obtained with

W_{1} = \dots = W_{d} = c

for some

c > 0

), Proposition 2 can be applied to compute

\begin{matrix} ρ_{S} (X_{1}, X_{2}) & = 6 ℙ ((X_{1} - {\tilde{X}}_{1}) (X_{2} - {\bar{X}}_{2}) > 0) - 3 = 3 (E (2 ℙ (Y_{1} Y_{2} > 0 ∣ U, \tilde{U}, \bar{U})) - 1) \\ & = 3 (E (4 ℙ (Y_{1} > 0, Y_{2} > 0 ∣ U, \tilde{U}, \bar{U})) - 1) \\ & = \frac{6}{π} E (arcsin (ρ \sqrt{\frac{F_{W_{1}}^{\leftarrow} (U) F_{W_{2}}^{\leftarrow} (U)}{(F_{W_{1}}^{\leftarrow} (U) + F_{W_{1}}^{\leftarrow} (\tilde{U})) (F_{W_{2}}^{\leftarrow} (U) + F_{W_{2}}^{\leftarrow} (\bar{U}))}})) . \end{matrix}

□
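The final expression is a three-dimensional expectation over independent uniforms and is therefore easy to approximate numerically. The following sketch in plain Python uses crude Monte Carlo rather than the RQMC approach of this paper; the function name and arguments are hypothetical, with `q1` and `q2` standing for the quantile functions of $W_{1}$ and $W_{2}$.

```python
import math
import random

def spearman_rho_mc(rho, q1, q2, n=100_000, seed=1):
    """Monte Carlo estimate of
    6/pi * E[arcsin(rho * sqrt(W1 W2 / ((W1 + W1~)(W2 + W2-))))]."""
    rng = random.Random(seed)

    def open_uniform():
        # Uniform draw kept strictly inside (0, 1) for safe quantile calls.
        return min(max(rng.random(), 1e-12), 1 - 1e-12)

    total = 0.0
    for _ in range(n):
        u, u_tilde, u_bar = open_uniform(), open_uniform(), open_uniform()
        w1, w2 = q1(u), q2(u)            # comonotone: same uniform U
        w1_tilde = q1(u_tilde)           # independent copy of W_1
        w2_bar = q2(u_bar)               # independent copy of W_2
        ratio = (w1 * w2) / ((w1 + w1_tilde) * (w2 + w2_bar))
        total += math.asin(rho * math.sqrt(ratio))
    return 6.0 / math.pi * total / n

# Sanity check: constant mixing variables give the bivariate normal case,
# for which Spearman's rho is known to equal 6/pi * arcsin(rho / 2).
normal_case = spearman_rho_mc(0.5, lambda u: 1.0, lambda u: 1.0, n=1000)
```

With constant mixing variables the ratio under the square root equals 1/4 for every draw, so the estimate reproduces the classical normal formula up to rounding error; this gives a quick check that the expression has been coded correctly before using other quantile functions.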

Proof of Proposition 4.

For

u, q \in (0, 1)

, let

x_{j} (u, q) = F_{j}^{\leftarrow} (q) / \sqrt{F_{W_{j}}^{\leftarrow} (u)}

for

j = 1, 2

. Writing C instead of

C_{P, F_{W}}^{gNVM}

and

Φ_{ρ}

for the distribution function of

Z \sim N_{2} (0, P)

, we get

C (q, q) = \int_{0}^{1} Φ_{ρ} (x_{1} (u, q), x_{2} (u, q)) d u,

which, by l'Hôpital's rule, implies that

\begin{matrix} λ = lim_{q \to 0^{+}} \frac{1}{q} C (q, q) = lim_{q \to 0^{+}} \frac{\partial}{\partial q} \int_{0}^{1} Φ_{ρ} (x_{1} (u, q), x_{2} (u, q)) d u . \end{matrix}

Taking the derivative with respect to q gives

\begin{matrix} \frac{\partial}{\partial q} \int_{0}^{1} Φ_{ρ} (x_{1} (u, q), x_{2} (u, q)) d u = \int_{0}^{1} \frac{\partial Φ_{ρ}}{\partial x_{1}} \frac{\partial x_{1}}{\partial q} d u + \int_{0}^{1} \frac{\partial Φ_{ρ}}{\partial x_{2}} \frac{\partial x_{2}}{\partial q} d u . \end{matrix}

(32)

We find

\frac{\partial x_{i}}{\partial q} = {(\sqrt{F_{W_{i}}^{\leftarrow} (u)} f_{i} (F_{i}^{\leftarrow} (q)))}^{- 1}

for

i = 1, 2

. Furthermore,

\begin{matrix} \frac{\partial}{\partial x_{1}} Φ_{ρ} (x_{1}, x_{2}) = \frac{\partial}{\partial x_{1}} ℙ (Z_{1} \leq x_{1}, Z_{2} \leq x_{2}) = ϕ (x_{1}) ℙ (Z_{2} \leq x_{2} ∣ Z_{1} = x_{1}) = ϕ (x_{1}) Φ (\frac{x_{2} - ρ x_{1}}{\sqrt{1 - ρ^{2}}}); \end{matrix}

the derivative

\frac{\partial}{\partial x_{2}} Φ_{ρ} (x_{1}, x_{2})

can be found analogously by swapping the roles of

Z_{1}

and

Z_{2}

. Plugging the derivatives into (32) and taking the limit gives the result in the statement. □
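For readers who want the intermediate step written out, substituting the displayed derivatives into (32) gives the expression below. It only combines formulas already stated in this proof, with $x_{j} = x_{j} (u, q)$ and with $ϕ$ and $Φ$ denoting the standard normal density and distribution function; the limit $q \to 0^{+}$ still has to be taken to arrive at the statement of the proposition.

```latex
\frac{\partial}{\partial q} C(q, q)
= \int_0^1 \frac{\phi(x_1)\,
    \Phi\!\left(\frac{x_2 - \rho x_1}{\sqrt{1 - \rho^2}}\right)}
  {\sqrt{F_{W_1}^{\leftarrow}(u)}\; f_1\!\left(F_1^{\leftarrow}(q)\right)}\, du
+ \int_0^1 \frac{\phi(x_2)\,
    \Phi\!\left(\frac{x_1 - \rho x_2}{\sqrt{1 - \rho^2}}\right)}
  {\sqrt{F_{W_2}^{\leftarrow}(u)}\; f_2\!\left(F_2^{\leftarrow}(q)\right)}\, du .
```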

Author Contributions

Methodology, E.H., M.H., C.L.; software, E.H., M.H., C.L.; validation, E.H., M.H., C.L.; formal analysis, E.H., M.H., C.L.; data curation, E.H., M.H.; writing–original draft preparation, E.H.; writing–review and editing, E.H., M.H., C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSERC Discovery Grant RGPIN-2020-04897 and NSERC Discovery Grant RGPIN-238959.

Acknowledgments

We thank five anonymous reviewers for their comments and suggestions, which helped us improve this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

Caflisch, Russel, William Morokoff, and Art Owen. 1997. Valuation of Mortgage Backed Securities Using Brownian Bridges to Reduce Effective Dimension. Los Angeles: Department of Mathematics, University of California.
Cao, Jian, Marc Genton, David Keyes, and George Turkiyyah. 2020. Exploiting low rank covariance structures for computing high-dimensional normal and student-t probabilities. arXiv:2003.11183.
Daul, Stéphane, Enrico De Giorgi, Filip Lindskog, and Alexander McNeil. 2003. The Grouped t-Copula with an Application to Credit Risk. SSRN 1358956. Available online: https://ssrn.com/abstract=1358956 (accessed on 21 August 2020).
Demarta, Stefano, and Alexander McNeil. 2005. The t copula and related copulas. International Statistical Review 73: 111–29.
Devroye, Luc. 1986. Non-Uniform Random Variate Generation. New York: Springer.
Embrechts, Paul, Filip Lindskog, and Alexander McNeil. 2001. Modelling Dependence with Copulas. Rapport Technique. Zurich: Département de Mathématiques, Institut Fédéral de Technologie de Zurich.
Genz, Alan. 1992. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics 1: 141–49.
Genz, A., and Frank Bretz. 1999. Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts. Journal of Statistical Computation and Simulation 63: 103–17.
Genz, Alan, and Frank Bretz. 2002. Comparison of methods for the computation of multivariate t probabilities. Journal of Computational and Graphical Statistics 11: 950–71.
Genz, Alan, and Frank Bretz. 2009. Computation of Multivariate Normal and t Probabilities. Berlin: Springer, vol. 195.
Gibson, Garvin Jarvis, Chris Glasbey, and David Elston. 1994. Monte Carlo evaluation of multivariate normal integrals and sensitivity to variate ordering. In Advances in Numerical Methods and Applications. River Edge: World Scientific Publishing, pp. 120–26.
Glasserman, Paul. 2013. Monte Carlo Methods in Financial Engineering. Berlin: Springer, vol. 53.
Hintz, Erik, Marius Hofert, and Christiane Lemieux. 2020. Normal Variance Mixtures: Distribution, Density and Parameter Estimation. arXiv:1911.03017.
Hofert, Marius, Erik Hintz, and Christiane Lemieux. 2020. nvmix: Multivariate Normal Variance Mixtures. R Package Version 0.0.5. Available online: https://cran.r-project.org/package=nvmix (accessed on 21 August 2020).
Hofert, Marius, Kurt Hornik, and Alexander McNeil. 2016. qrmdata: Data Sets for Quantitative Risk Management Practice. R Package Version 2016-01-03-1. Available online: https://cran.r-project.org/package=qrmdata (accessed on 21 August 2020).
Hofert, Marius, and Christiane Lemieux. 2019. qrng: (Randomized) Quasi-Random Number Generators. R Package Version 0.0.7. Available online: https://cran.r-project.org/package=qrng (accessed on 21 August 2020).
Lemieux, Christiane. 2009. Monte Carlo and Quasi-Monte Carlo Sampling. Berlin: Springer.
Lindskog, Filip, Alexander McNeil, and Uwe Schmock. 2003. Kendall’s tau for elliptical distributions. In Credit Risk. Berlin: Springer, pp. 149–56.
Low, Rand, Robert Faff, and Kjersti Aas. 2016. Enhancing mean–variance portfolio selection by modeling distributional asymmetries. Journal of Economics and Business 85: 49–72.
Luo, Xiaolin, and Pavel Shevchenko. 2010. The t copula with multiple parameters of degrees of freedom: Bivariate characteristics and application to risk management. Quantitative Finance 10: 1039–54.
Markowitz, Harry. 1952. Portfolio selection. Journal of Finance 7: 77–91.
McNeil, Alexander, Rüdiger Frey, and Paul Embrechts. 2015. Quantitative Risk Management: Concepts, Techniques and Tools. Princeton: Princeton University Press.
Nelsen, Roger. 2007. An Introduction to Copulas. Berlin: Springer.
Niederreiter, Harald. 1992. Random Number Generation and quasi-Monte Carlo Methods. Philadelphia: SIAM, vol. 63.
Piessens, Robert, Elise de Doncker-Kapenga, Christoph Überhuber, and David Kahaner. 2012. Quadpack: A Subroutine Package for Automatic Integration. Berlin: Springer, vol. 1.
Tu, Jun, and Guofu Zhou. 2011. Markowitz meets talmud: A combination of sophisticated and naive diversification strategies. Journal of Financial Economics 99: 204–15.
Turlach, Berwin, Andreas Weingessel, and Cleve Moler. 2019. quadprog: Functions to Solve Quadratic Programming Problems. R Package Version 1.5.8. Available online: https://cran.r-project.org/package=quadprog (accessed on 21 August 2020).
Venter, Gary, Jack Barnett, Rodney Kreps, and John Major. 2007. Multivariate copulas for financial modeling. Variance 1: 103–19.
Wang, Xiaoqun, and Ian Sloan. 2005. Why are high-dimensional finance problems often of low effective dimension? SIAM Journal on Scientific Computing 27: 159–83.

Figure 1. Estimated average errors over 15 replications. In each run and for each sample size,

ℙ (X \leq b)

is estimated where

X \sim {gNVM}_{d} (0, P, F_{W})

for a random correlation matrix P and a random upper limit

b

. For

d = 5

(left),

X_{j} \sim t_{0.5 + j}

and for

d = 20

(right),

X_{j} \sim t_{0.75 + 0.25 j}

for

j = 1, \dots, d

.

Figure 2. Integrand values h for a 2-dimensional distribution with t margins and

x = (0, 0)

(left),

x = (5, 5)

(middle) and

x = (25, 25)

(right).

Figure 3. Estimated log-density of a grouped t distribution with

ν = (3, 6)

in

d = 2

(left) and

ν = (3, \dots, 3, 6, \dots, 6)

in

d = 10

(right). Estimation with dgnvmix() was carried out using a relative error tolerance of 0.01. The plot also shows the log-density function of

t_{d} (3, 0, I_{d})

and

t_{d} (6, 0, I_{d})

for comparison.

Figure 4. Kendall’s tau for a bivariate grouped t distribution for various

ρ

estimated via corgnvmix() (left); relative difference of

ρ_{τ}

wrt the elliptical case (right).

Figure 5. Estimated dof for a 6-dimensional grouped t copula with 3 groups of size 2 each. (left) Estimates are obtained by fitting the t copulas of each group separately; (right) the joint copula-likelihood of the grouped t copula is maximized.

Figure 6. Estimated dof for a grouped t copula fitted to the standardized residuals obtained from the Dow Jones 30 dataset from 1 January 2014 to 31 December 2015 after deGARCHing (left); estimated log-likelihood of the estimated dof on the middle and right plot.

Figure 7. Estimated shortfall probability

C (u, \dots, u)

for grouped and ungrouped t copulas fitted to the standardized residuals obtained from the Dow Jones 30 dataset from 1 January 2014 to 31 December 2015 after deGARCHing.

Table 1. Estimated and true density values of a bivariate t distribution with 6 degrees of freedom. Logarithmic values are in brackets.

x	True Density	Estimated Density (`integrate()`)	Estimated Density (`dgnvmix()`)
$(0, 0)$	$1.59 \times 10^{- 1}$ $(- 1.84)$	$1.59 \times 10^{- 1}$ $(- 1.84)$	$1.59 \times 10^{- 1}$ $(- 1.84)$
$(5, 5)$	$2.10 \times 10^{- 5}$ $(- 10.77)$	$2.11 \times 10^{- 5}$ $(- 10.77)$	$2.16 \times 10^{- 5}$ $(- 10.74)$
$(10, 10)$	$1.15 \times 10^{- 7}$ $(- 15.98)$	$4.47 \times 10^{- 8}$ $(- 16.92)$	$1.16 \times 10^{- 7}$ $(- 15.97)$
$(25, 25)$	$8.29 \times 10^{- 11}$ $(- 23.21)$	$5.50 \times 10^{- 23}$ $(- 51.25)$	$8.00 \times 10^{- 11}$ $(- 23.25)$
$(50, 50)$	$3.28 \times 10^{- 13}$ $(- 28.74)$	$3.26 \times 10^{- 76}$ $(- 173.82)$	$3.13 \times 10^{- 13}$ $(- 28.79)$

Table 2. Estimated returns, certainty-equivalent returns (CERs) and Sharpe-ratios (SRs) (all in percentage points) of a mean-variance (MV) investor under three investment rules assuming no short-sales: Model-based with grouped and ungrouped t copula and historical.

	${\hat{μ}}_{r}$	$\hat{CER}$	$\hat{SR}$
historical	0.045	0.041	5.356
ungrouped t	0.075	0.071	8.678
grouped t	0.083	0.079	9.474

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hintz, E.; Hofert, M.; Lemieux, C. Grouped Normal Variance Mixtures. Risks 2020, 8, 103. https://doi.org/10.3390/risks8040103

AMA Style

Hintz E, Hofert M, Lemieux C. Grouped Normal Variance Mixtures. Risks. 2020; 8(4):103. https://doi.org/10.3390/risks8040103

Chicago/Turabian Style

Hintz, Erik, Marius Hofert, and Christiane Lemieux. 2020. "Grouped Normal Variance Mixtures" Risks 8, no. 4: 103. https://doi.org/10.3390/risks8040103
