**1. Introduction**

In this paper, we show that a class of uniformity-preserving transformations for uniform random variables can facilitate the application of copula modelling to time series exhibiting the serial dependence characteristics that are typical of volatile financial return data. Our main aims are twofold: first, to establish the fundamental properties of v-transforms and show that they are a natural fit to the volatility modelling problem; second, to develop a class of processes using the implied copula process of a Gaussian ARMA model that can serve as an archetype for copula models using v-transforms. Although the existing literature on volatility modelling in econometrics is vast, the models we propose have some attractive features. In particular, as copula-based models, they allow the separation of marginal and serial dependence behaviour in the construction and estimation of models.

A distinction is commonly made between genuine stochastic volatility models, as investigated by Taylor (1994) and Andersen (1994), and GARCH-type models as developed in a long series of papers by Engle (1982), Bollerslev (1986), Ding et al. (1993), Glosten et al. (1993) and Bollerslev et al. (1994), among others. In the former an unobservable process describes the volatility at any time point while in the latter volatility is modelled as a function of observable information describing the past behaviour of the process; see also the review articles by Shephard (1996) and Andersen and Benzoni (2009). The generalized autoregressive score (GAS) models of Creal et al. (2013) generalize the observation-driven approach of GARCH models by using the score function of the conditional density to model time variation in key parameters of the time series model. The models of this paper have more in common with the observation-driven approach of GARCH and GAS but have some important differences.

In GARCH-type models, the marginal distribution of a stationary process is inextricably linked to the dynamics of the process as well as the conditional or innovation distribution; in most cases, it has no simple closed form. For example, the standard GARCH mechanism serves to create

power-law behaviour in the marginal distribution, even when the innovations come from a lighter-tailed distribution such as Gaussian (Mikosch and Stărică 2000). While such models work well for many return series, they may not be sufficiently flexible to describe all possible combinations of marginal and serial dependence behaviour encountered in applications. In the empirical example of this paper, which relates to log-returns on the Bitcoin price series, the data appear to favour a marginal distribution with sub-exponential tails that are lighter than power tails and this cannot be well captured by standard GARCH models. Moreover, in contrast to much of the GARCH literature, the models we propose make no assumptions about the existence of second-order moments and could also be applied to very heavy-tailed situations where variance-based methods fail.

Let *X*1, ... , *Xn* be a time series of financial returns sampled at (say) daily frequency and assume that these are modelled by a strictly stationary stochastic process (*Xt*) with marginal distribution function (cdf) *FX*. To match the stylized facts of financial return data described, for example, by Campbell et al. (1997) and Cont (2001), it is generally agreed that (*Xt*) should have limited serial correlation, but the squared or absolute processes $(X_t^2)$ and $(|X_t|)$ should have significant and persistent positive serial correlation to describe the effects of volatility clustering.

In this paper, we refer to transformed series like $(|X_t|)$, in which volatility is revealed through serial correlation, as *volatility proxy series*. More generally, a volatility proxy series (*T*(*Xt*)) is obtained by applying a transformation $T: \mathbb{R} \to \mathbb{R}$ which (i) depends on a change point $\mu_T$ that may be zero, (ii) is increasing in $X_t - \mu_T$ for $X_t \ge \mu_T$ and (iii) is increasing in $\mu_T - X_t$ for $X_t \le \mu_T$.

Our approach in this paper is to model the probability-integral transform (PIT) series (*Vt*) of a volatility proxy series. This is defined by $V_t = F_{T(X)}(T(X_t))$ for all *t*, where $F_{T(X)}$ denotes the cdf of $T(X_t)$. If (*Ut*) is the PIT series of the original process (*Xt*), defined by $U_t = F_X(X_t)$ for all *t*, then a *v-transform* is a function describing the relationship between the terms of (*Vt*) and the terms of (*Ut*). Equivalently, a v-transform describes the relationship between quantiles of the distribution of *Xt* and the distribution of the volatility proxy $T(X_t)$. Alternatively, it characterizes the dependence structure or copula of the pair of variables $(X_t, T(X_t))$. In this paper, we show how to derive flexible, parametric families of v-transforms for practical modelling purposes.

To gain insight into the typical form of a v-transform, let *x*1, ... , *xn* represent the realized data values and let *u*1, ... , *un* and *v*1, ... , *vn* be the samples obtained by applying the transformations $v_t = F_n^{(|X|)}(|x_t|)$ and $u_t = F_n^{(X)}(x_t)$, where $F_n^{(X)}(x) = \frac{1}{n+1}\sum_{t=1}^{n} I_{\{x_t \le x\}}$ and $F_n^{(|X|)}(x) = \frac{1}{n+1}\sum_{t=1}^{n} I_{\{|x_t| \le x\}}$ denote scaled versions of the empirical distribution functions of the *xt* and |*xt*| samples, respectively. The graph of (*ut*, *vt*) gives an empirical estimate of the v-transform for the random variables $(X_t, |X_t|)$. In the left-hand plot of Figure 1 we show the relationship for a sample of *n* = 1043 daily log-returns of the Bitcoin price series for the years 2016–2019. Note how the empirical v-transform takes the form of a slightly asymmetric 'V'.
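As a sketch of this empirical construction, the scaled empirical distribution functions can be evaluated at the data points via ranks. The code below is illustrative: the simulated Student t sample is a stand-in for the Bitcoin returns, and the function name is an assumption, not from the paper.

```python
# Empirical v-transform estimate via scaled empirical cdfs, as described above.
import numpy as np

def scaled_ecdf_at_data(sample):
    """Evaluate F_n(y) = #{t : y_t <= y}/(n+1) at each point of the sample."""
    ranks = np.argsort(np.argsort(sample)) + 1   # ranks 1..n (continuous data, no ties)
    return ranks / (len(sample) + 1)

rng = np.random.default_rng(1)
x = rng.standard_t(df=4, size=1000)              # stand-in for daily log-returns
u = scaled_ecdf_at_data(x)                       # u_t = F_n^(X)(x_t)
v = scaled_ecdf_at_data(np.abs(x))               # v_t = F_n^(|X|)(|x_t|)
# A scatterplot of v against u gives the 'V' shape seen in Figure 1 (left).
```

Dividing the ranks by *n* + 1 rather than *n* keeps the transformed values strictly inside the unit interval.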

**Figure 1.** Scatterplot of *vt* against *ut* (left), sample acf of raw data *xt* (centre) and sample acf of $z_t = \Phi^{-1}(v_t)$ (right). The transformed data are defined by $v_t = F_n^{(|X|)}(|x_t|)$ and $u_t = F_n^{(X)}(x_t)$ where $F_n^{(X)}$ and $F_n^{(|X|)}$ denote scaled versions of the empirical distribution function of the *xt* and |*xt*| values, respectively. The sample size is *n* = 1043 and the data are daily log-returns of the Bitcoin price for the years 2016–2019.

The right-hand plot of Figure 1 shows the sample autocorrelation function (acf) of the data given by $z_t = \Phi^{-1}(v_t)$ where Φ is the standard normal cdf. This reveals a persistent pattern of positive serial correlation which can be modelled by the implied ARMA copula. This pattern is not evident in the acf of the raw *xt* data in the centre plot.

To construct a volatility model for (*Xt*) using v-transforms, we need to specify a process for (*Vt*). In principle, any model for a series of serially dependent uniform variables can be applied to (*Vt*). In this paper, we illustrate concepts using the Gaussian copula model implied by the standard ARMA dependence structure. This model is particularly tractable and allows us to derive model properties and fit models to data relatively easily.

There is a large literature on copula models for time series; see, for example, the review papers by Patton (2012) and Fan and Patton (2014). While the main focus of this literature has been on cross-sectional dependencies between series, there is a growing literature on models of serial dependence. First-order Markov copula models have been investigated by Chen and Fan (2006), Chen et al. (2009) and Domma et al. (2009) while higher-order Markov copula models using D-vines are applied by Smith et al. (2010). These models are based on the pair-copula approach developed in Joe (1996), Bedford and Cooke (2001, 2002) and Aas et al. (2009). However, the standard bivariate copulas that enter these models are not generally effective at describing the typical serial dependencies created by stochastic volatility, as observed by Loaiza-Maya et al. (2018).

The paper is structured as follows. In Section 2, we provide motivation for the paper by constructing a symmetric model using the simplest example of a v-transform. The general theory of v-transforms is developed in Section 3 and is used to construct the class of VT-ARMA processes and analyse their properties in Section 4. Section 5 treats estimation and statistical inference for VT-ARMA processes and provides an example of their application to the Bitcoin return data; Section 6 presents the conclusions. Proofs may be found in the Appendix A.

#### **2. A Motivating Model**

Given a probability space $(\Omega, \mathcal{F}, P)$, we construct a symmetric, strictly stationary process $(X_t)_{t \in \mathbb{N} \setminus \{0\}}$ such that, under the even transformation $T(x) = |x|$, the serial dependence in the volatility proxy series (*T*(*Xt*)) is of ARMA type. We assume that the marginal cdf *FX* of (*Xt*) is absolutely continuous and the density *fX* satisfies $f_X(x) = f_X(-x)$ for all $x > 0$. Since *FX* and $F_{|X|}$ are both continuous, the properties of the probability-integral (PIT) transform imply that the series (*Ut*) and (*Vt*) given by $U_t = F_X(X_t)$ and $V_t = F_{|X|}(|X_t|)$ both have standard uniform marginal distributions. Henceforth, we refer to (*Vt*) as the *volatility PIT process* and (*Ut*) as the *series PIT process*.

Any other volatility proxy series that can be obtained by a continuous and strictly increasing transformation of the terms of $(|X_t|)$, such as $(X_t^2)$, yields exactly the same volatility PIT process. For example, if $\tilde{V}_t = F_{X^2}(X_t^2)$, then it follows from the fact that $F_{X^2}(x) = F_{|X|}(\sqrt{x})$ for $x \ge 0$ that $\tilde{V}_t = F_{X^2}(X_t^2) = F_{|X|}(|X_t|) = V_t$. In this sense, we can think of classes of equivalent volatility proxies, such as $(|X_t|)$, $(X_t^2)$, $(\exp|X_t|)$ and $(\ln(1+|X_t|))$. In fact, (*Vt*) is itself an equivalent volatility proxy to $(|X_t|)$ since $F_{|X|}$ is a continuous and strictly increasing transformation.
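The equivalence can be illustrated numerically: strictly increasing transformations of |x| preserve ranks, so every proxy in the class induces the same empirical volatility PIT values. A minimal sketch with simulated data (all choices illustrative):

```python
# Strictly increasing transformations of |x| preserve ranks, so all proxies in
# the equivalence class yield identical (scaled) empirical volatility PIT values.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500)                         # illustrative data
r_abs = np.argsort(np.argsort(np.abs(x)))            # ranks of |x_t|
r_sq = np.argsort(np.argsort(x ** 2))                # ranks of x_t^2
r_log = np.argsort(np.argsort(np.log1p(np.abs(x))))  # ranks of ln(1 + |x_t|)
```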

The symmetry of *fX* implies that $F_{|X|}(x) = 2F_X(x) - 1 = 1 - 2F_X(-x)$ for $x \ge 0$. Hence, we find that

$$V_t = F_{|X|}(|X_t|) = \begin{cases} F_{|X|}(-X_t) = 1 - 2F_X(X_t) = 1 - 2U_t & \text{if } X_t < 0 \\ F_{|X|}(X_t) = 2F_X(X_t) - 1 = 2U_t - 1 & \text{if } X_t \ge 0 \end{cases}$$

which implies that the relationship between the volatility PIT process (*Vt*) and the series PIT process (*Ut*) is given by

$$V_t = \mathcal{V}(U_t) = |2U_t - 1| \tag{1}$$

where $\mathcal{V}(u) = |2u - 1|$ is a perfectly symmetric v-shaped function that maps values of *Ut* close to 0 or 1 to values of *Vt* close to 1, and values close to 0.5 to values close to 0. $\mathcal{V}$ is the canonical example of a v-transform. It is related to the so-called tent-map transformation $\mathcal{T}(u) = 2\min(u, 1-u)$ by $\mathcal{V}(u) = 1 - \mathcal{T}(u)$.

Given (*Vt*), let the process (*Zt*) be defined by setting $Z_t = \Phi^{-1}(V_t)$ so that we have the following chain of transformations:

$$X_t \overset{F_X}{\longrightarrow} U_t \overset{\mathcal{V}}{\longrightarrow} V_t \overset{\Phi^{-1}}{\longrightarrow} Z_t. \tag{2}$$

We refer to (*Zt*) as a *normalized volatility proxy series*. Our aim is to construct a process (*Xt*) such that, under the chain of transformations in (2), we obtain a Gaussian ARMA process (*Zt*) with mean zero and variance one. We do this by working back through the chain.

The transformation $\mathcal{V}$ is not an injection and, for any $V_t > 0$, there are two possible inverse values, $\frac{1}{2}(1 - V_t)$ and $\frac{1}{2}(1 + V_t)$. However, by randomly choosing between these values, we can 'stochastically invert' $\mathcal{V}$ to construct a random variable *Ut* such that $\mathcal{V}(U_t) = V_t$. This is summarized in Lemma 1, which is a special case of a more general result in Proposition 4.

**Lemma 1.** *Let V be a standard uniform variable. If $V = 0$, set $U = \frac{1}{2}$. Otherwise, let $U = \frac{1}{2}(1 - V)$ with probability 0.5 and $U = \frac{1}{2}(1 + V)$ with probability 0.5. Then, U is uniformly distributed and $\mathcal{V}(U) = V$.*
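Lemma 1 is easy to verify by simulation. The sketch below draws V uniform, applies the stochastic inversion with a fair coin flip, and confirms that applying the v-transform to U recovers V exactly; the variable names are illustrative.

```python
# Numerical check of Lemma 1 for the symmetric v-transform V(u) = |2u - 1|.
import numpy as np

rng = np.random.default_rng(7)
V = rng.uniform(size=100_000)      # uniform volatility PIT values
W = rng.uniform(size=100_000)      # coin flips choosing the inverse branch
U = np.where(W <= 0.5, 0.5 * (1 - V), 0.5 * (1 + V))
# On both branches |2U - 1| = V, and U is again uniform on (0, 1).
```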

This simple result suggests Algorithm 1 for constructing a process (*Xt*) with symmetric marginal density *fX* such that the corresponding normalized volatility proxy process (*Zt*) under the absolute value transformation (or continuous and strictly increasing functions thereof) is an ARMA process. We describe the resulting model as a VT-ARMA process.

It is important to state that the use of the Gaussian process (*Zt*) as the fundamental building block of the VT-ARMA process in Algorithm 1 has no effect on the marginal distribution of (*Xt*), which is *FX* as specified in the final step of the algorithm. The process (*Zt*) is exploited *only for its serial dependence structure*, which is described by a family of finite-dimensional Gaussian copulas; this dependence structure is applied to the volatility proxy process.

**Algorithm 1:**

1. Generate a Gaussian ARMA(p,q) process (*Zt*) with mean zero and variance one.
2. Set $V_t = \Phi(Z_t)$ for all *t*.
3. Generate iid standard uniform variables *Wt*, independent of (*Zt*), and set $U_t = \frac{1}{2}(1 - V_t)$ if $W_t \le 0.5$ and $U_t = \frac{1}{2}(1 + V_t)$ otherwise, as in Lemma 1.
4. Set $X_t = F_X^{-1}(U_t)$, where *FX* is the desired symmetric, absolutely continuous marginal cdf.

Figure 2 shows a symmetric VT-ARMA(1,1) process with ARMA parameters α1 = 0.95 and β1 = −0.85; such a model often works well for financial return data. Some intuition for this observation can be gained from the fact that the popular GARCH(1,1) model is known to have the structure of an ARMA(1,1) model for the squared data process; see, for example, McNeil et al. (2015) (Section 4.2) for more details.
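A realization in the spirit of Figure 2 can be generated with the sketch below, assuming a Gaussian ARMA(1,1) for (*Zt*) scaled to unit variance, the stochastic inversion of Lemma 1, and a standard Cauchy margin (a Student t with ν = 1, chosen here purely because its quantile function has a simple closed form); the zero initial state and all names are illustrative choices, not the paper's code.

```python
# Symmetric VT-ARMA(1,1) simulation: Z_t (Gaussian ARMA) -> V_t = Phi(Z_t)
# -> U_t (Lemma 1) -> X_t = F_X^{-1}(U_t).
import numpy as np
from math import erf, sqrt

def phi_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def simulate_sym_vt_arma11(n, alpha, beta, quantile, seed=0):
    rng = np.random.default_rng(seed)
    # innovation variance chosen so that var(Z_t) = 1 for the ARMA(1,1)
    sigma2 = (1 - alpha ** 2) / (1 + 2 * alpha * beta + beta ** 2)
    eps = rng.normal(scale=sqrt(sigma2), size=n + 1)
    z = np.zeros(n + 1)                          # illustrative zero initial state
    for t in range(1, n + 1):
        z[t] = alpha * z[t - 1] + eps[t] + beta * eps[t - 1]
    v = np.array([phi_cdf(zt) for zt in z[1:]])  # volatility PIT series
    w = rng.uniform(size=n)                      # coin flips (Lemma 1)
    u = np.where(w <= 0.5, 0.5 * (1 - v), 0.5 * (1 + v))
    return quantile(u)                           # X_t = F_X^{-1}(U_t)

# Example: standard Cauchy margin via its closed-form quantile function.
x = simulate_sym_vt_arma11(500, alpha=0.95, beta=-0.85,
                           quantile=lambda u: np.tan(np.pi * (u - 0.5)))
```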

**Figure 2.** Realizations of length *n* = 500 of (*Xt*) and (*Zt*) for a VT-ARMA(1,1) process with a marginal Student t distribution with ν = 3 degrees of freedom and ARMA parameters α = 0.95 and β = −0.85. ACF plots for (*Xt*) and (|*Xt*|) are also shown.

## **3. V-Transforms**

To generalize the class of v-transforms, we admit two forms of asymmetry in the construction described in Section 2: we allow the density *fX* to be skewed; we introduce an asymmetric volatility proxy.

**Definition 1** (Volatility proxy transformation and profile)**.** *Let $T_1$ and $T_2$ be strictly increasing, continuous and differentiable functions on $\mathbb{R}^+ = [0, \infty)$ such that $T_1(0) = T_2(0)$. Let $\mu_T \in \mathbb{R}$. Any transformation $T: \mathbb{R} \to \mathbb{R}$ of the form*

$$T(x) = \begin{cases} T_1(\mu_T - x) & x \le \mu_T \\ T_2(x - \mu_T) & x > \mu_T \end{cases} \tag{3}$$

*is a volatility proxy transformation. The parameter $\mu_T$ is the change point of T and the associated function $g_T: \mathbb{R}^+ \to \mathbb{R}^+$, $g_T(x) = T_2^{-1} \circ T_1(x)$, is the profile function of T.*

By introducing $\mu_T$, we allow for the possibility that the natural change point may not be identical to zero. By introducing different functions $T_1$ and $T_2$ for returns on either side of the change point, we allow the possibility that one or the other may contribute more to the volatility proxy. This has a similar economic motivation to the *leverage* effects in GARCH models (Ding et al. 1993); falls in equity prices increase a firm's leverage and increase the volatility of the share price.

Clearly, the profile function of a volatility proxy transformation is a strictly increasing, continuous and differentiable function on $\mathbb{R}^+$ such that $g_T(0) = 0$. In conjunction with $\mu_T$, the profile contains all the information about *T* that is relevant for constructing v-transforms. In the case of a volatility proxy transformation that is symmetric about $\mu_T$, the profile satisfies $g_T(x) = x$.

The following result shows how v-transforms $V = \mathcal{V}(U)$ can be obtained by considering different continuous distributions *FX* and different volatility proxy transformations *T* of type (3).

**Proposition 1.** *Let X be a random variable with absolutely continuous and strictly increasing cdf $F_X$ on $\mathbb{R}$ and let T be a volatility proxy transformation. Let $U = F_X(X)$ and $V = F_{T(X)}(T(X))$. Then, $V = \mathcal{V}(U)$ where*

$$\mathcal{V}(u) = \begin{cases} F_X\big(\mu_T + g_T(\mu_T - F_X^{-1}(u))\big) - u & u \le F_X(\mu_T) \\ u - F_X\big(\mu_T - g_T^{-1}(F_X^{-1}(u) - \mu_T)\big) & u > F_X(\mu_T) \end{cases} \tag{4}$$

The result implies that any two volatility proxy transformations $T$ and $\tilde{T}$ which have the same change point $\mu_T$ and profile function $g_T$ belong to an equivalence class with respect to the resulting v-transform. This generalizes the idea that $T(x) = |x|$ and $\tilde{T}(x) = x^2$ give the same v-transform in the symmetric case of Section 2. Note also that the volatility proxy transformations $T^{(V)}$ and $T^{(Z)}$ defined by

$$T^{(V)}(x) = F_{T(X)}(T(x)) = \mathcal{V}(F_X(x)), \qquad T^{(Z)}(x) = \Phi^{-1}(T^{(V)}(x)) = \Phi^{-1}(\mathcal{V}(F_X(x))) \tag{5}$$

are in the same equivalence class as *T* since they share the same change point and profile function.

**Definition 2** (v-transform and fulcrum)**.** *Any transformation* V *that can be obtained from Equation (4) by choosing an absolutely continuous and strictly increasing cdf FX on R and a volatility proxy transformation T is a v-transform. The value* δ = *FX*(μ*T*) *is the fulcrum of the v-transform.*

#### *3.1. A Flexible Parametric Family*

In this section, we derive a family of v-transforms using construction (4) by taking a tractable asymmetric model for *FX* based on the construction proposed by Fernández and Steel (1998) and by setting $\mu_T = 0$ and $g_T(x) = kx^{\xi}$ for $k > 0$ and $\xi > 0$. This profile function contains the identity profile $g_T(x) = x$ (corresponding to the symmetric volatility proxy transformation) as a special case, but allows cases where negative or positive returns contribute more to the volatility proxy. The choices we make may at first sight seem rather arbitrary, but the resulting family can in fact assume many of the shapes that are permissible for v-transforms, as we will argue.

Let $f_0$ be a density that is symmetric about the origin and let γ > 0 be a scalar parameter. Fernández and Steel suggested the model

$$f_X(x; \gamma) = \begin{cases} \frac{2\gamma}{1+\gamma^2}\, f_0(\gamma x) & x \le 0 \\ \frac{2\gamma}{1+\gamma^2}\, f_0\!\left(\frac{x}{\gamma}\right) & x > 0 \end{cases} \tag{6}$$

This model is often used to obtain skewed normal and skewed Student distributions for use as innovation distributions in econometric models. A model with γ > 1 is skewed to the right while a model with γ < 1 is skewed to the left, as might be expected for asset returns. We consider the particular case of a Laplace or double exponential distribution *f*0(*x*) = 0.5 exp(−|*x*|) which leads to particularly tractable expressions.

**Proposition 2.** *Let $F_X(x; \gamma)$ be the cdf of the density* (6) *when $f_0(x) = 0.5\exp(-|x|)$. Set $\mu_T = 0$ and let $g_T(x) = kx^{\xi}$ for $k, \xi > 0$. The v-transform (4) is given by*

$$\mathcal{V}_{\delta,\kappa,\xi}(u) = \begin{cases} 1 - u - (1 - \delta) \exp\left( -\kappa \left( -\ln\frac{u}{\delta} \right)^{\xi} \right) & u \le \delta \\ u - \delta \exp\left( -\kappa^{-1/\xi} \left( -\ln\frac{1 - u}{1 - \delta} \right)^{1/\xi} \right) & u > \delta \end{cases} \tag{7}$$

*where* $\delta = F_X(0) = (1 + \gamma^2)^{-1} \in (0, 1)$ *and* $\kappa = k/\gamma^{\xi+1} > 0$*.*

It is remarkable that (7) is a uniformity-preserving transformation. If we set $\xi = 1$ and $\kappa = 1$, we get

$$\mathcal{V}_{\delta}(u) = \begin{cases} (\delta - u)/\delta & u \le \delta \\ (u - \delta)/(1 - \delta) & u > \delta \end{cases} \tag{8}$$

which obviously includes the symmetric model $\mathcal{V}_{0.5}(u) = |2u - 1|$. The v-transform $\mathcal{V}_{\delta}(u)$ in (8) is a very convenient special case, and we refer to it as the *linear* v-transform.
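An implementation sketch of the family (7) follows; the function name and defaults are illustrative. With the defaults κ = ξ = 1 it reduces to the linear v-transform (8), and simulating uniform inputs gives a quick numerical check of the uniformity-preserving property.

```python
# The v-transform family (7): left branch for u <= delta, right branch otherwise.
import numpy as np

def v_transform(u, delta, kappa=1.0, xi=1.0):
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    lo = u <= delta
    out[lo] = (1 - u[lo]
               - (1 - delta) * np.exp(-kappa * (-np.log(u[lo] / delta)) ** xi))
    out[~lo] = (u[~lo]
                - delta * np.exp(-kappa ** (-1.0 / xi)
                                 * (-np.log((1 - u[~lo]) / (1 - delta))) ** (1.0 / xi)))
    return out
```

For example, `v_transform(u, delta=0.5)` reproduces the symmetric map |2u − 1|, while `delta=0.55, kappa=1.4, xi=0.65` gives the asymmetric shape shown in Figure 3.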

In Figure 3, we show the v-transform $\mathcal{V}_{\delta,\kappa,\xi}$ when $\delta = 0.55$, $\kappa = 1.4$ and $\xi = 0.65$. We will use this particular v-transform to illustrate further properties of v-transforms and find a characterization.

**Figure 3.** An asymmetric v-transform from the family defined in (7). For any v-transform, if $v = \mathcal{V}(u)$ and $u^\ast$ is the dual of *u*, then the points $(u, 0)$, $(u, v)$, $(u^\ast, 0)$ and $(u^\ast, v)$ form the vertices of a square. For the given fulcrum δ, a v-transform can never enter the gray shaded area of the plot.

#### *3.2. Characterizing v-Transforms*

It is easily verified that any v-transform obtained from (4) consists of two arms or branches, described by continuous and strictly monotonic functions; the left arm is decreasing and the right arm increasing. See Figure 3 for an illustration. At the fulcrum δ, we have $\mathcal{V}(\delta) = 0$. Every point $u \in [0, 1] \setminus \{\delta\}$ has a *dual point* $u^\ast$ on the opposite side of the fulcrum such that $\mathcal{V}(u^\ast) = \mathcal{V}(u)$. Dual points can be interpreted as the quantile probability levels of the distribution of *X* that give rise to the same level of volatility.

We collect these properties together in the following lemma and add one further important property that we refer to as the *square property* of a v-transform; this property places constraints on the shape that v-transforms can take and is illustrated in Figure 3.

**Lemma 2.** *A v-transform is a mapping* V : [0, 1] → [0, 1] *with the following properties:*

*1.* $\mathcal{V}(0) = 1$*;*

*2. There exists a point* δ ∈ (0, 1)*, the fulcrum, such that* $\mathcal{V}(\delta) = 0$*;*

*3.* V *is continuous;*

*4.* V *is strictly decreasing on* [0, δ] *and strictly increasing on* [δ, 1]*;*

*5. Every point* $u \in [0, 1] \setminus \{\delta\}$ *has a dual point* $u^\ast$ *on the opposite side of the fulcrum satisfying* $\mathcal{V}(u) = \mathcal{V}(u^\ast)$ *and* $|u^\ast - u| = \mathcal{V}(u)$ *(square property).*

It is instructive to see why the square property must hold. Consider Figure 3 and fix a point $u \in [0, 1] \setminus \{\delta\}$ with $\mathcal{V}(u) = v$. Let $U \sim U(0, 1)$ and let $V = \mathcal{V}(U)$. The events $\{V \le v\}$ and $\{\min(u, u^\ast) \le U \le \max(u, u^\ast)\}$ are the same and hence the uniformity of *V* under a v-transform implies that

$$v = P(V \le v) = P(\min(u, u^\ast) \le U \le \max(u, u^\ast)) = |u^\ast - u|. \tag{9}$$

The properties in Lemma 2 could be taken as the basis of an alternative definition of a v-transform. In view of (9), it is clear that any mapping V that has these properties is a uniformity-preserving transformation. We can characterize the mappings V that have these properties as follows.

**Theorem 1.** *A mapping* V : [0, 1] → [0, 1] *has the properties listed in Lemma 2 if and only if it takes the form*

$$\mathcal{V}(u) = \begin{cases} (1-u) - (1-\delta)\Psi\left(\frac{u}{\delta}\right) & u \le \delta, \\ u - \delta \Psi^{-1}\left(\frac{1-u}{1-\delta}\right) & u > \delta, \end{cases} \tag{10}$$

*where* Ψ *is a continuous and strictly increasing distribution function on* [0, 1]*.*

Our arguments so far show that every v-transform must have the form (10). It remains to verify that every uniformity-preserving transformation of the form (10) can be obtained from construction (4), and this is the purpose of the final result of this section. This allows us to view Definition 2, Lemma 2, and the characterization (10) as three equivalent approaches to the definition of v-transforms.

**Proposition 3.** *Let* V *be a uniformity-preserving transformation of the form* (10) *and $F_X$ a continuous distribution function. Then,* V *can be obtained from construction* (4) *using any volatility proxy transformation with change point* $\mu_T = F_X^{-1}(\delta)$ *and profile*

$$g_T(x) = F_X^{-1}\big(F_X(\mu_T - x) + \mathcal{V}(F_X(\mu_T - x))\big) - \mu_T, \quad x \ge 0. \tag{11}$$

Henceforth, we can view (10) as the general equation of a v-transform. Distribution functions Ψ on [0, 1] can be thought of as *generators* of v-transforms. Comparing (10) with (7), we see that our parametric family $\mathcal{V}_{\delta,\kappa,\xi}$ is generated by $\Psi(x) = \exp\left(-\kappa(-\ln x)^{\xi}\right)$. This is a 2-parameter distribution whose density can assume many different shapes on the unit interval including increasing, decreasing, unimodal, and bathtub-shaped forms. In this respect, it is quite similar to the beta distribution, which would yield an alternative family of v-transforms. The uniform distribution function $\Psi(x) = x$ gives the family of linear v-transforms $\mathcal{V}_{\delta}$.
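The generator view can be sketched directly from (10): supply any continuous, strictly increasing cdf Ψ on [0, 1] together with its inverse. The power cdf $\Psi(x) = x^a$ below (essentially a Beta(a, 1) generator) is an illustrative choice, not one used in the paper, and the function names are assumptions.

```python
# Building a v-transform from a generator Psi via Equation (10).
import numpy as np

def v_from_generator(u, delta, psi, psi_inv):
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    lo = u <= delta
    out[lo] = (1 - u[lo]) - (1 - delta) * psi(u[lo] / delta)
    out[~lo] = u[~lo] - delta * psi_inv((1 - u[~lo]) / (1 - delta))
    return out

a = 2.0
psi = lambda x: x ** a              # Beta(a, 1) cdf on [0, 1]
psi_inv = lambda y: y ** (1.0 / a)  # its inverse
```

With the identity generator Ψ(x) = x the construction reduces to the linear v-transform (8); any other valid generator still maps uniform inputs to uniform outputs.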

In applications, we construct models starting from the building blocks of a tractable v-transform V such as (7) and a distribution *FX*; from these, we can always infer an implied profile function *gT* using (11). The alternative approach of starting from *gT* and *FX* and constructing V via (4) is also possible but can lead to v-transforms that are cumbersome and computationally expensive to evaluate if *FX* and its inverse do not have simple closed forms.

#### *3.3. V-Transforms and Copulas*

If two uniform random variables are linked by the v-transform *V* = V(*U*), then the joint distribution function of (*<sup>U</sup>*, *V*) is a special kind of copula. In this section, we derive the form of the copula, which facilitates the construction of stochastic processes using v-transforms.

To state the main result, we use the notation $\mathcal{V}^{-1}$ and $\mathcal{V}'$ for the inverse function and the gradient function of a v-transform $\mathcal{V}$. Although there is no unique inverse $\mathcal{V}^{-1}(v)$ (except when $v = 0$), the fact that the two branches of a v-transform mutually determine each other allows us to define $\mathcal{V}^{-1}(v)$ to be the inverse of the left branch of the v-transform, given by $\mathcal{V}^{-1}: [0, 1] \to [0, \delta]$, $\mathcal{V}^{-1}(v) = \inf\{u : \mathcal{V}(u) = v\}$. The gradient $\mathcal{V}'(u)$ is defined for all points $u \in [0, 1] \setminus \{\delta\}$, and we adopt the convention that $\mathcal{V}'(\delta)$ is the left derivative as $u \to \delta$.

**Theorem 2.** *Let V and U be random variables related by the v-transform V* = V(*U*)*.*

*1. The joint distribution function of* (*<sup>U</sup>*, *V*) *is given by the copula*

$$C(u, v) = P(U \le u, V \le v) = \begin{cases} 0 & u < \mathcal{V}^{-1}(v) \\ u - \mathcal{V}^{-1}(v) & \mathcal{V}^{-1}(v) \le u < \mathcal{V}^{-1}(v) + v \\ v & u \ge \mathcal{V}^{-1}(v) + v \end{cases} \tag{12}$$

*2. Conditional on V* = *v, the distribution of U is given by*

$$U = \begin{cases} \mathcal{V}^{-1}(v) & \text{with probability } \Delta(v) \text{ if } v \neq 0 \\ \mathcal{V}^{-1}(v) + v & \text{with probability } 1 - \Delta(v) \text{ if } v \neq 0 \\ \delta & \text{if } v = 0 \end{cases} \tag{13}$$

*where*

$$\Delta(v) = -\frac{1}{\mathcal{V}'(\mathcal{V}^{-1}(v))}.\tag{14}$$

*3. E*(Δ(*V*)) = δ*.*

**Remark 1.** *In the case of the symmetric v-transform $\mathcal{V}(u) = |2u - 1|$, the copula in* (12) *takes the form $C(u, v) = \max\left(\min\left(u + \frac{v}{2} - \frac{1}{2},\, v\right),\, 0\right)$. We note that this copula is related to a special case of the tent map copula family $C_{\mathcal{T}_\theta}$ in Rémillard (2013) by $C(u, v) = u - C_{\mathcal{T}_1}(u, 1 - v)$.*

For the linear v-transform family, the conditional probability <sup>Δ</sup>(*v*) in (14) satisfies <sup>Δ</sup>(*v*) = δ. This implies that the value of *V* contains no information about whether *U* is likely to be below or above the fulcrum; the probability is always the same regardless of *V*. In general, this is not the case and the value of *V* does contain information about whether *U* is large or small.

Part (2) of Theorem 2 is the key to stochastically inverting a v-transform in the general case. Based on this result, we define the concept of stochastic inversion of a v-transform. We refer to the function Δ as the *conditional down probability* of V.

**Definition 3** (Stochastic inversion function of a v-transform)**.** *Let* V *be a v-transform with conditional down probability* Δ*. The two-place function* V−<sup>1</sup> : [0, 1] × [0, 1] → [0, 1] *defined by*

$$\mathcal{V}^{-1}(v, w) = \begin{cases} \mathcal{V}^{-1}(v) & \text{if } w \le \Delta(v) \\ v + \mathcal{V}^{-1}(v) & \text{if } w > \Delta(v) \end{cases} \tag{15}$$

*is the stochastic inversion function of* V*.*

The following proposition, which generalizes Lemma 1, allows us to construct general asymmetric processes that generalize the process of Algorithm 1.

**Proposition 4.** *Let V and W be iid U*(0, 1) *variables and let* $\mathcal{V}$ *be a v-transform with stochastic inversion function* $\mathcal{V}^{-1}$*. If $U = \mathcal{V}^{-1}(V, W)$, then $\mathcal{V}(U) = V$ and $U \sim U(0, 1)$.*
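For the linear v-transform (8), the left-branch inverse is $\mathcal{V}^{-1}(v) = \delta(1 - v)$ and the conditional down probability is constant, Δ(v) = δ, so Proposition 4 can be checked directly by simulation; the function name below is illustrative.

```python
# Stochastic inversion (15) for the linear v-transform, where Delta(v) = delta.
import numpy as np

def stochastic_inverse_linear(v, w, delta):
    left = delta * (1 - v)              # V^{-1}(v): the left-branch inverse
    return np.where(w <= delta, left, left + v)

rng = np.random.default_rng(11)
V = rng.uniform(size=200_000)
W = rng.uniform(size=200_000)
U = stochastic_inverse_linear(V, W, delta=0.3)
# U is uniform, applying the linear v-transform to U recovers V,
# and P(U <= delta) = delta as Delta(v) = delta suggests.
```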

In Section 4, we apply v-transforms and their stochastic inverses to the terms of time series models. To understand the effect this has on the serial dependencies between random variables, we need to consider multivariate componentwise v-transforms of random vectors with uniform marginal distributions and these can also be represented in terms of copulas. We now give a result which forms the basis for the analysis of serial dependence properties. The first part of the result shows the relationship between copula densities under componentwise v-transforms. The second part shows the relationship under the componentwise stochastic inversion of a v-transform; in this case, we assume that the stochastic inversion of each term takes place independently given *V* so that all serial dependence comes from *V*.

**Theorem 3.** *Let* $\mathcal{V}$ *be a v-transform and let $U = (U_1, \ldots, U_d)$ and $V = (V_1, \ldots, V_d)$ be vectors of uniform random variables with copula densities $c_U$ and $c_V$, respectively.*

*1. If $V = (\mathcal{V}(U_1), \ldots, \mathcal{V}(U_d))$, then*

$$c_V(v_1, \ldots, v_d) = \sum_{j_1=1}^{2} \cdots \sum_{j_d=1}^{2} c_U(u_{1j_1}, \ldots, u_{dj_d}) \prod_{i=1}^{d} \Delta(v_i)^{I_{\{j_i = 1\}}} (1 - \Delta(v_i))^{I_{\{j_i = 2\}}} \tag{16}$$

*where $u_{i1} = \mathcal{V}^{-1}(v_i)$ and $u_{i2} = \mathcal{V}^{-1}(v_i) + v_i$ for all $i \in \{1, \ldots, d\}$.*

*2. If $U = \left(\mathcal{V}^{-1}(V_1, W_1), \ldots, \mathcal{V}^{-1}(V_d, W_d)\right)$ where $W_1, \ldots, W_d$ are iid uniform random variables that are also independent of $V_1, \ldots, V_d$, then*

$$c\_{\mathcal{U}}(\boldsymbol{u}\_1, \dots, \boldsymbol{u}\_d) = c\_{\mathcal{V}}(\mathcal{V}(\boldsymbol{u}\_1), \dots, \mathcal{V}(\boldsymbol{u}\_d)).\tag{17}$$

#### **4. VT-ARMA Copula Models**

In this section, we study some properties of the class of time series models obtained by the following algorithm, which generalizes Algorithm 1. The models obtained are described as VT-ARMA processes since they are stationary time series constructed using the fundamental building blocks of a v-transform V and an ARMA process.

We can add any marginal behaviour in the final step, and this allows for an infinitely rich choice. We can, for instance, even impose an infinite-variance or an infinite-mean distribution, such as the Cauchy distribution, and still obtain a strictly stationary process for (*Xt*). We make the following definitions.

**Definition 4** (VT-ARMA and VT-ARMA copula process)**.** *Any stochastic process* (*Xt*) *that can be generated using Algorithm 2 by choosing an underlying ARMA process with mean zero and variance one, a v-transform* V*, and a continuous distribution function FX is a VT-ARMA process. The process* (*Ut*) *obtained at the penultimate step of the algorithm is a VT-ARMA copula process.*

Figure 4 gives an example of a simulated process using Algorithm 2 and the v-transform $\mathcal{V}_{\delta,\kappa,\xi}$ in (7) with δ = 0.5, κ = 0.9 and ξ = 1.1. The marginal distribution is a heavy-tailed skewed Student distribution of type (6) with degrees of freedom ν = 3 and skewness γ = 0.8, which gives rise to more large negative returns than large positive returns. The underlying time series model is an ARMA(1,1) model with AR parameter α = 0.95 and MA parameter β = −0.85. See the caption of the figure for full details of the parameters.

**Figure 4.** Top left: realization of length n = 500 of $(X_t)$ for a process with a marginal skewed Student distribution (parameters: ν = 3, γ = 0.8, μ = 0.3, σ = 1), a v-transform of the form (7) (parameters: δ = 0.50, κ = 0.9, ξ = 1.1) and an underlying ARMA(1,1) process (α = 0.95, β = −0.85, σ = 0.95). Top right: the underlying ARMA process $(Z_t)$ in gray with the conditional mean $(\mu_t)$ superimposed in black; horizontal lines at $\mu_t = 0.5$ (a high value) and $\mu_t = -0.5$ (a low value). The corresponding conditional densities are shown in the bottom panels with the marginal density as a dashed line.

## **Algorithm 2:**

1. Generate a realization $z_1, \dots, z_n$ of a Gaussian ARMA(p,q) process $(Z_t)$ with mean zero and variance one.
2. Form $v_t = \Phi(z_t)$ for $t = 1, \dots, n$.
3. Generate iid uniform variables $w_1, \dots, w_n$, independent of $(Z_t)$, and form $u_t = \mathcal{V}^{-1}(v_t, w_t)$.
4. Form $x_t = F_X^{-1}(u_t)$ for a chosen continuous distribution function $F_X$.

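As a concrete illustration, the steps of Algorithm 2 can be sketched as follows (Python; the AR(1) dynamics, linear v-transform and standard logistic marginal are stand-in choices, not the paper's defaults):

```python
import math
import random
from statistics import NormalDist

Phi = NormalDist().cdf

def simulate_vt_arma(n, alpha=0.95, delta=0.45, seed=7):
    """Algorithm 2 sketch: AR(1) dynamics for (Z_t), the linear v-transform
    V_delta, and a standard logistic marginal F_X (illustrative choices)."""
    rng = random.Random(seed)
    sigma_eps = math.sqrt(1 - alpha ** 2)           # constrains var(Z_t) = 1
    z, us, xs = 0.0, [], []
    for _ in range(n):
        z = alpha * z + rng.gauss(0.0, sigma_eps)   # step 1: ARMA value z_t
        v = Phi(z)                                  # step 2: v_t = Phi(z_t)
        left = delta * (1 - v)                      # step 3: stochastic inversion
        u = left if rng.random() <= delta else left + v
        x = math.log(u / (1 - u))                   # step 4: x_t = F_X^{-1}(u_t)
        us.append(u)
        xs.append(x)
    return us, xs

us, xs = simulate_vt_arma(2000)
assert all(0.0 < u < 1.0 for u in us)
```

Swapping the final quantile function changes the marginal behaviour of $(X_t)$ without affecting the serial dependence of the copula process $(U_t)$.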
In the remainder of this section, we concentrate on the properties of VT-ARMA copula processes (*Ut*) from which related properties of VT-ARMA processes (*Xt*) may be easily inferred.

#### *4.1. Stationary Distribution*

The VT-ARMA copula process $(U_t)$ of Definition 4 is a strictly stationary process, since the joint distribution of $(U_{t_1}, \dots, U_{t_k})$ for any set of indices $t_1 < \cdots < t_k$ is invariant under time shifts. This property follows easily from the strict stationarity of the underlying ARMA process $(Z_t)$ according to the following result, which uses Theorem 3.

**Proposition 5.** *Let $(U_t)$ follow a VT-ARMA copula process with v-transform $\mathcal{V}$ and an underlying ARMA(p,q) structure with autocorrelation function $\rho(k)$. The random vector $(U_{t_1}, \dots, U_{t_k})$ for $k \in \mathbb{N}$ has joint density $c^{\text{Ga}}_{P(t_1,\dots,t_k)}\left(\mathcal{V}(u_1), \dots, \mathcal{V}(u_k)\right)$, where $c^{\text{Ga}}_{P(t_1,\dots,t_k)}$ denotes the density of the Gaussian copula $C^{\text{Ga}}_{P(t_1,\dots,t_k)}$ and $P(t_1, \dots, t_k)$ is a correlation matrix with $(i,j)$ element given by $\rho(t_j - t_i)$.*

An expression for the joint density facilitates the calculation of a number of dependence measures for the bivariate marginal distribution of $(U_t, U_{t+k})$. In the bivariate case, the correlation matrix of the underlying Gaussian copula $C^{\text{Ga}}_{P(t,t+k)}$ contains a single off-diagonal value $\rho(k)$ and we simply write $C^{\text{Ga}}_{\rho(k)}$. The Pearson correlation of $(U_t, U_{t+k})$ is given by

$$\rho\left(U_t, U_{t+k}\right) = 12 \int_0^1 \int_0^1 u_1 u_2\, c^{\text{Ga}}_{\rho(k)}\left(\mathcal{V}(u_1), \mathcal{V}(u_2)\right) \mathrm{d}u_1\, \mathrm{d}u_2 - 3. \tag{18}$$

This value is also the value of the Spearman rank correlation $\rho_S(X_t, X_{t+k})$ for a VT-ARMA process $(X_t)$ with copula process $(U_t)$, since the Spearman rank correlation of a pair of continuous random variables is the Pearson correlation of their copula. The calculation of (18) typically requires numerical integration. However, in the special case of the linear v-transform $\mathcal{V}_\delta$ in (8), we can get a simpler expression, as shown in the following result.

**Proposition 6.** *Let $(U_t)$ be a VT-ARMA copula process satisfying the assumptions of Proposition 5 with linear v-transform $\mathcal{V}_\delta$. Let $(Z_t)$ denote the underlying Gaussian ARMA process. Then,*

$$\rho\left(U_t, U_{t+k}\right) = \left(2\delta - 1\right)^{2} \rho_S\left(Z_t, Z_{t+k}\right) = \frac{6(2\delta - 1)^{2}}{\pi} \arcsin\left(\frac{\rho(k)}{2}\right). \tag{19}$$

For the symmetric v-transform $\mathcal{V}_{0.5}$, Equation (19) obviously yields a correlation of zero, so that, in this case, the VT-ARMA copula process $(U_t)$ is a white noise with an autocorrelation function that is zero except at lag zero. However, even a very asymmetric model with δ = 0.4 or δ = 0.6 gives $\rho(U_t, U_{t+k}) = 0.04\, \rho_S(Z_t, Z_{t+k})$, so that serial correlations tend to be very weak.

When we add a marginal distribution, the resulting process $(X_t)$ has a different autocorrelation function to $(U_t)$, but the same rank autocorrelation function. The symmetric model of Section 2 is a white noise process. General asymmetric processes $(X_t)$ are not perfect white noise processes but have only very weak serial correlation.

#### *4.2. Conditional Distribution*

To derive the conditional distribution of a VT-ARMA copula process, we use the vector notation $\boldsymbol{U}_t = (U_1, \dots, U_t)$ and $\boldsymbol{Z}_t = (Z_1, \dots, Z_t)$ to denote the histories of the processes up to time point $t$, and $\boldsymbol{u}_t$ and $\boldsymbol{z}_t$ for their realizations. These vectors are related by the componentwise transformation $\boldsymbol{Z}_t = \Phi^{-1}(\mathcal{V}(\boldsymbol{U}_t))$. We assume that all processes have a time index set given by $t \in \{1, 2, \dots\}$.

**Proposition 7.** *For $t > 1$, the conditional density $f_{U_t \mid \boldsymbol{U}_{t-1}}(u \mid \boldsymbol{u}_{t-1})$ is given by*

$$f_{U_t \mid \boldsymbol{U}_{t-1}}(u \mid \boldsymbol{u}_{t-1}) = \frac{\phi\left(\frac{\Phi^{-1}(\mathcal{V}(u)) - \mu_t}{\sigma_\epsilon}\right)}{\sigma_\epsilon\, \phi\left(\Phi^{-1}(\mathcal{V}(u))\right)} \tag{20}$$

*where $\mu_t = E\left(Z_t \mid \boldsymbol{Z}_{t-1} = \Phi^{-1}(\mathcal{V}(\boldsymbol{u}_{t-1}))\right)$ and $\sigma_\epsilon$ is the standard deviation of the innovation process for the ARMA model followed by $(Z_t)$.*

When $(Z_t)$ is iid white noise, $\mu_t = 0$ and $\sigma_\epsilon = 1$, and (20) reduces to the uniform density $f_{U_t \mid \boldsymbol{U}_{t-1}}(u \mid \boldsymbol{u}_{t-1}) = 1$, as expected. In the case of the first-order Markov AR(1) model $Z_t = \alpha_1 Z_{t-1} + \epsilon_t$, the conditional mean of $Z_t$ is $\mu_t = \alpha_1 \Phi^{-1}(\mathcal{V}(u_{t-1}))$ and $\sigma_\epsilon^2 = 1 - \alpha_1^2$. The conditional density (20) can easily be shown to simplify to $f_{U_t \mid \boldsymbol{U}_{t-1}}(u \mid \boldsymbol{u}_{t-1}) = c^{\text{Ga}}_{\alpha_1}\left(\mathcal{V}(u), \mathcal{V}(u_{t-1})\right)$, where $c^{\text{Ga}}_{\alpha_1}\left(\mathcal{V}(u_1), \mathcal{V}(u_2)\right)$ denotes the copula density derived in Proposition 5. In this special case, the VT-ARMA model falls within the class of first-order Markov copula models considered by Chen and Fan (2006), although the copula is new.
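As a quick numerical sanity check of (20) in the AR(1) special case, the sketch below (Python; the parameter values are illustrative) confirms that the conditional density integrates to one over (0, 1).

```python
import math
from statistics import NormalDist

N = NormalDist()

def v_linear(u, delta=0.45):
    """Linear v-transform V_delta of (8)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1 - delta)

def cond_density(u, u_prev, alpha=0.7, delta=0.45):
    """Conditional density (20) for an underlying AR(1) model, where
    mu_t = alpha * Phi^{-1}(V(u_prev)) and sigma_eps^2 = 1 - alpha^2."""
    mu_t = alpha * N.inv_cdf(v_linear(u_prev, delta))
    sig = math.sqrt(1 - alpha ** 2)
    z = N.inv_cdf(v_linear(u, delta))
    return N.pdf((z - mu_t) / sig) / (sig * N.pdf(z))

# Midpoint-rule check that the conditional density integrates to one.
m = 20_000
total = sum(cond_density((i + 0.5) / m, 0.1) for i in range(m)) / m
assert abs(total - 1.0) < 5e-3
```

The density vanishes at $u = \delta$ (where $\mathcal{V}(u) = 0$) whenever $\sigma_\epsilon < 1$, which is the source of the bimodality discussed below Equation (21).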

If we add a marginal distribution *FX* to the VT-ARMA copula model to obtain a model for (*Xt*) and use similar notational conventions as above, the resulting VT-ARMA model has conditional density

$$f_{X_t \mid \boldsymbol{X}_{t-1}}(x \mid \boldsymbol{x}_{t-1}) = f_X(x)\, f_{U_t \mid \boldsymbol{U}_{t-1}}\left(F_X(x) \mid F_X(\boldsymbol{x}_{t-1})\right) \tag{21}$$

with $f_{U_t \mid \boldsymbol{U}_{t-1}}$ as in (20). An interesting property of the VT-ARMA process is that the conditional density (21) can have a pronounced bimodality for values of $\mu_t$ in excess of zero, that is, in high-volatility situations where the conditional mean of $Z_t$ is higher than its marginal mean of zero; in low-volatility situations, the conditional density is more concentrated around zero. This phenomenon is illustrated in Figure 4. The bimodality in high-volatility situations makes sense: in such cases, it is likely that the next return will be large in absolute value and relatively less likely that it will be close to zero.

The conditional distribution function of $(X_t)$ is $F_{X_t \mid \boldsymbol{X}_{t-1}}(x \mid \boldsymbol{x}_{t-1}) = F_{U_t \mid \boldsymbol{U}_{t-1}}\left(F_X(x) \mid F_X(\boldsymbol{x}_{t-1})\right)$ and hence the $\psi$-quantile $x_{\psi,t}$ of $F_{X_t \mid \boldsymbol{X}_{t-1}}$ can be obtained by solving

$$\psi = F_{U_t \mid \boldsymbol{U}_{t-1}}\left(F_X(x_{\psi,t}) \mid F_X(\boldsymbol{x}_{t-1})\right). \tag{22}$$

For ψ < 0.5, the negative of this value is often referred to as the conditional (1 − ψ)-VaR (value-at-risk) at time *t* in financial applications.
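Since the conditional cdf is monotone in its first argument, (22) lends itself to simple one-dimensional root-finding. A sketch (Python; the AR(1) dynamics and standard logistic marginal are illustrative assumptions, and the conditional cdf is obtained by numerically integrating the density (20)):

```python
import math
from statistics import NormalDist

N = NormalDist()

def v_linear(u, delta=0.45):
    """Linear v-transform V_delta of (8)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1 - delta)

def cond_cdf(u, u_prev, alpha=0.7, delta=0.45, m=4000):
    """F_{U_t | U_{t-1}}(u | u_prev): midpoint-rule integral of (20)."""
    mu_t = alpha * N.inv_cdf(v_linear(u_prev, delta))
    sig = math.sqrt(1 - alpha ** 2)

    def dens(s):
        z = N.inv_cdf(v_linear(s, delta))
        return N.pdf((z - mu_t) / sig) / (sig * N.pdf(z))

    return sum(dens(u * (i + 0.5) / m) for i in range(m)) * u / m

def cond_quantile(psi, x_prev, alpha=0.7, delta=0.45):
    """Solve (22) by bisection on the copula scale; the marginal F_X is a
    standard logistic distribution (an illustrative choice)."""
    F = lambda t: 1.0 / (1.0 + math.exp(-t))
    u_prev = F(x_prev)
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if cond_cdf(mid, u_prev, alpha, delta) < psi:
            lo = mid
        else:
            hi = mid
    return math.log(lo / (1 - lo))   # x_{psi,t} = F_X^{-1}(u_psi)

x_var = cond_quantile(0.05, x_prev=-1.5)   # negative of the conditional 95%-VaR
assert math.isfinite(x_var)
```

In practice, the closed form of $F_{U_t \mid \boldsymbol{U}_{t-1}}$ would be preferred to numerical integration, but the bisection structure is the same.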

#### **5. Statistical Inference**

In the copula approach to dependence modelling, the copula is the object of central interest and marginal distributions are often of secondary importance. A number of different approaches to estimation are found in the literature. As before, let $x_1, \dots, x_n$ represent realizations of variables $X_1, \dots, X_n$ from the time series process $(X_t)$.

The semi-parametric approach developed by Genest et al. (1995) is very widely used in copula inference and has been applied by Chen and Fan (2006) to first-order Markov copula models in the time series context. In this approach, the marginal distribution $F_X$ is first estimated non-parametrically using the scaled empirical distribution function $F_n^{(X)}$ (see definition in Section 1) and the data are transformed onto the (0, 1) scale. This has the effect of creating the pseudo-copula data $u_t = \operatorname{rank}(x_t)/(n+1)$, where $\operatorname{rank}(x_t)$ denotes the rank of $x_t$ within the sample. The copula is fitted to the pseudo-copula data by maximum likelihood (ML).
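The rank transformation is elementary to implement; a minimal sketch (Python, assuming no ties, as is appropriate for continuous data):

```python
def pseudo_copula(xs):
    """Map observations to (0, 1) via u_t = rank(x_t) / (n + 1).
    Assumes continuous data, so ties do not occur."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

assert pseudo_copula([2.5, -0.3, 1.1]) == [0.75, 0.25, 0.5]
```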

As an alternative, the inference-functions-for-margins (IFM) approach of Joe (2015) could be applied. This is also a two-step method, although in this case a parametric model $\hat{F}_X$ is estimated under an iid assumption in the first step and the copula is fitted to the data $u_t = \hat{F}_X(x_t)$ in the second step.

The approach we adopt for our empirical example is to first use the semi-parametric approach to determine a reasonable copula process, then to estimate marginal parameters under an iid assumption, and finally to estimate all parameters jointly using the parameter estimates from the previous steps as starting values.

We concentrate on the mechanics of deriving maximum likelihood estimates (MLEs). The problem of establishing the asymptotic properties of the MLEs in our setting is a difficult one. It is similar to, but appears to be more technically challenging than, the problem of showing consistency and efficiency of MLEs for a Box-Cox-transformed Gaussian ARMA process, as discussed in Terasaka and Hosoya (2007). We are also working with a componentwise transformed ARMA process, although, in our case, the transformation $(X_t) \to (Z_t)$ is via the nonlinear, non-monotonic volatility proxy transformation $T^{(Z)}(x)$ in (5), which is not differentiable at the change point $\mu_T$. We have, however, run extensive simulations which suggest good behaviour of the MLEs in large samples.

#### *5.1. Maximum Likelihood Estimation of the VT-ARMA Copula Process*

We first consider the estimation of the VT-ARMA copula process for a sample of data $u_1, \dots, u_n$. Let $\boldsymbol{\theta}^{(V)}$ and $\boldsymbol{\theta}^{(A)}$ denote the parameters of the v-transform and the ARMA model, respectively. It follows from Theorem 3 (part 2) and Proposition 5 that the log-likelihood for the sample $u_1, \dots, u_n$ is simply the log density of the Gaussian copula under componentwise inverse v-transformation. This is given by

$$\begin{split} L\left(\boldsymbol{\theta}^{(V)}, \boldsymbol{\theta}^{(A)} \mid u_1, \dots, u_n\right) &= L^{*}\left(\boldsymbol{\theta}^{(A)} \mid \Phi^{-1}\left(\mathcal{V}_{\boldsymbol{\theta}^{(V)}}(u_1)\right), \dots, \Phi^{-1}\left(\mathcal{V}_{\boldsymbol{\theta}^{(V)}}(u_n)\right)\right) \\ &\quad - \sum_{t=1}^{n} \ln \phi\left(\Phi^{-1}\left(\mathcal{V}_{\boldsymbol{\theta}^{(V)}}(u_t)\right)\right) \end{split} \tag{23}$$

where the first term $L^*$ is the log-likelihood for an ARMA model with a standard N(0, 1) marginal distribution. Both terms in the log-likelihood (23) are relatively straightforward to evaluate.
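For an underlying AR(1) model, $L^*$ has a simple closed form and (23) can be evaluated directly. The sketch below (Python; the linear v-transform and the test values are illustrative) also illustrates that (23) vanishes identically when the ARMA parameters are zero, the fact underlying the LR test described later in this section.

```python
import math
from statistics import NormalDist

N = NormalDist()

def v_linear(u, delta):
    """Linear v-transform V_delta of (8)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1 - delta)

def copula_loglik(us, alpha, delta):
    """Log-likelihood (23) when the underlying model is a Gaussian AR(1);
    L* is the exact AR(1) likelihood with var(Z_t) constrained to one."""
    zs = [N.inv_cdf(v_linear(u, delta)) for u in us]
    sig = math.sqrt(1 - alpha ** 2)
    lstar = math.log(N.pdf(zs[0]))               # Z_1 ~ N(0, 1) exactly
    for t in range(1, len(zs)):
        lstar += math.log(N.pdf((zs[t] - alpha * zs[t - 1]) / sig) / sig)
    return lstar - sum(math.log(N.pdf(z)) for z in zs)

us = [0.93, 0.12, 0.55, 0.71, 0.04, 0.38]
# With alpha = 0 the ARMA term and the Jacobian term cancel exactly, so the
# log-likelihood is identically zero under the null of no stochastic volatility.
assert abs(copula_loglik(us, 0.0, 0.45)) < 1e-12
```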

The evaluation of the ARMA likelihood $L^*\left(\boldsymbol{\theta}^{(A)} \mid z_1, \dots, z_n\right)$ for parameters $\boldsymbol{\theta}^{(A)}$ and data $z_1, \dots, z_n$ can be accomplished using the Kalman filter. However, it is important to note that the assumption that the data $z_1, \dots, z_n$ are standard normal requires a bespoke implementation of the Kalman filter, since standard software always treats the error variance $\sigma_\epsilon^2$ as a free parameter in the ARMA model. In our case, we need to constrain $\sigma_\epsilon^2$ to be a function of the ARMA parameters so that $\operatorname{var}(Z_t) = 1$. For example, in the case of an ARMA(1,1) model with AR parameter $\alpha_1$ and MA parameter $\beta_1$, this means that $\sigma_\epsilon^2 = \sigma_\epsilon^2(\alpha_1, \beta_1) = \left(1 - \alpha_1^2\right)/\left(1 + 2\alpha_1\beta_1 + \beta_1^2\right)$. The constraint on $\sigma_\epsilon^2$ must be incorporated into the state-space representation of the ARMA model.
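To illustrate the variance constraint, the following sketch (Python) computes $\sigma_\epsilon^2(\alpha_1, \beta_1)$ for the ARMA(1,1) case and confirms by simulation that the resulting process has unit variance (the parameter values follow the Figure 4 example).

```python
import math
import random

def sigma2_arma11(alpha, beta):
    """Innovation variance sigma_eps^2 that enforces var(Z_t) = 1
    for an ARMA(1,1) model with AR parameter alpha and MA parameter beta."""
    return (1 - alpha ** 2) / (1 + 2 * alpha * beta + beta ** 2)

alpha, beta = 0.95, -0.85           # values from the Figure 4 example
sig = math.sqrt(sigma2_arma11(alpha, beta))

# Long simulation of Z_t = alpha * Z_{t-1} + eps_t + beta * eps_{t-1}.
rng = random.Random(11)
z, eps_prev, zs = 0.0, 0.0, []
for _ in range(200_000):
    eps = rng.gauss(0.0, sig)
    z = alpha * z + eps + beta * eps_prev
    eps_prev = eps
    zs.append(z)

var = sum(v * v for v in zs) / len(zs)   # E(Z_t) = 0, so this estimates var(Z_t)
assert abs(var - 1.0) < 0.03
```

In a Kalman filter implementation, `sigma2_arma11` would be recomputed from $(\alpha_1, \beta_1)$ at every likelihood evaluation rather than treated as a free parameter.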

Model validation tests for the VT-ARMA copula can be based on residuals

$$r_t = z_t - \hat{\mu}_t, \qquad z_t = \Phi^{-1}\left(\mathcal{V}_{\hat{\boldsymbol{\theta}}^{(V)}}(u_t)\right) \tag{24}$$

where $z_t$ denotes the implied realization of the normalized volatility proxy variable and where an estimate $\hat{\mu}_t$ of the conditional mean $\mu_t = E\left(Z_t \mid \boldsymbol{Z}_{t-1} = \boldsymbol{z}_{t-1}\right)$ may be obtained as an output of the Kalman filter. The residuals should behave like an iid sample from a normal distribution.

Using the estimated model, it is also possible to implement a likelihood-ratio (LR) test for the presence of stochastic volatility in the data. Under the null hypothesis that $\boldsymbol{\theta}^{(A)} = \mathbf{0}$, the log-likelihood (23) is identically equal to zero. Thus, the size of the maximized log-likelihood $L\left(\hat{\boldsymbol{\theta}}^{(V)}, \hat{\boldsymbol{\theta}}^{(A)} \mid u_1, \dots, u_n\right)$ provides a measure of the evidence for the presence of stochastic volatility.

#### *5.2. Adding a Marginal Model*

If $F_X$ and $f_X$ denote the cdf and density of the marginal model and the parameters are denoted $\boldsymbol{\theta}^{(M)}$, then the full log-likelihood for the data $x_1, \dots, x_n$ is simply

$$\begin{split} L^{\text{full}}(\boldsymbol{\theta} \mid x_1, \dots, x_n) &= \sum_{t=1}^{n} \ln f_X\left(x_t\,;\, \boldsymbol{\theta}^{(M)}\right) \\ &\quad + L\left(\boldsymbol{\theta}^{(V)}, \boldsymbol{\theta}^{(A)} \mid F_X\left(x_1\,;\, \boldsymbol{\theta}^{(M)}\right), \dots, F_X\left(x_n\,;\, \boldsymbol{\theta}^{(M)}\right)\right) \end{split} \tag{25}$$

where the first term is the log-likelihood for a sample of iid data from the marginal distribution *FX* and the second term is (23).

When a marginal model is added, we can recover the implied form of the volatility proxy transformation using Proposition 3. If $\hat{\delta}$ is the estimated fulcrum parameter of the v-transform, then the estimated change point is $\hat{\mu}_T = F_X^{-1}\left(\hat{\delta}\,;\, \hat{\boldsymbol{\theta}}^{(M)}\right)$ and the implied profile function is

$$g_T(x) = F_X^{-1}\left( F_X(\hat{\mu}_T - x) + \mathcal{V}_{\hat{\boldsymbol{\theta}}^{(V)}}\left( F_X(\hat{\mu}_T - x) \right) \right) - \hat{\mu}_T. \tag{26}$$
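The profile function (26) is easy to evaluate once $F_X$ is fixed. A sketch (Python; the logistic marginal centred at $\hat{\mu}_T$ is an illustrative stand-in) also checks the symmetric benchmark: with δ = 0.5 and a marginal symmetric about the change point, the implied profile is the identity.

```python
import math

def v_linear(u, delta):
    """Linear v-transform V_delta of (8)."""
    return (delta - u) / delta if u <= delta else (u - delta) / (1 - delta)

def implied_profile(x, mu_T=0.3, delta=0.5):
    """Profile function g_T in (26) for a logistic marginal centred at mu_T
    (an illustrative marginal with closed-form cdf and quantile function)."""
    F = lambda t: 1.0 / (1.0 + math.exp(-(t - mu_T)))
    F_inv = lambda p: mu_T + math.log(p / (1 - p))
    u = F(mu_T - x)
    return F_inv(u + v_linear(u, delta)) - mu_T

# Symmetric benchmark: with delta = 0.5 and a marginal symmetric about the
# change point, the implied profile is the identity, g_T(x) = x.
for x in (0.25, 1.0, 2.0):
    assert abs(implied_profile(x) - x) < 1e-9
```

Asymmetric fitted parameters ($\hat{\delta} \neq F_X(\hat{\mu}_T + \cdot)$ symmetry broken) tilt the profile away from the identity, which is how the model captures leverage-style asymmetry.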

Note that it is possible to force the change point to be zero in a joint estimation of the marginal model and copula by imposing the constraint $F_X\left(0\,;\, \boldsymbol{\theta}^{(M)}\right) = \delta$ on the fulcrum and marginal parameters during the optimization. However, in our experience, superior fits are obtained when these parameters are unconstrained.
