2.3.2. Vine Copulas

For actual statistical inference, a d-dimensional copula density c can be decomposed into a product of d (d−1)/2 so-called pair-copula constructions (PCCs) based on bivariate

(conditional) copulas [43]. The PCCs involve marginal conditional distributions of the form *F*(*x* |*ω*). Joe [44] showed that, for every j,

$$h(\mathbf{x}|\boldsymbol{\omega}) := F(\boldsymbol{\mathfrak{x}}|\boldsymbol{\omega}) = \frac{\partial \mathbb{C}\_{\mathbf{x},\boldsymbol{\omega}\_{\boldsymbol{\cdot}}|\boldsymbol{\omega}\_{-\boldsymbol{\cdot}}}(F(\boldsymbol{\mathfrak{x}}|\boldsymbol{\omega}\_{-\boldsymbol{\cdot}}), F(\boldsymbol{\omega}\_{\boldsymbol{\cdot}}|\boldsymbol{\omega}\_{-\boldsymbol{\cdot}}))}{\partial F(\boldsymbol{\omega}\_{\boldsymbol{\cdot}}|\boldsymbol{\omega}\_{-\boldsymbol{\cdot}})} \tag{7}$$

where *ω* = *ω*1, . . . , *ω<sup>j</sup>* , . . . , *ω<sup>n</sup>* is a *n*-dimensional vector, *ω<sup>j</sup>* is an arbitrarily selected component of the vector *ω*, and *ω*−*<sup>j</sup>* is a vector of *ω* without the *j*th component; *h*(*x* |*ω*) is the conditional distribution function given the *k*-dimensional vector *ω* (i.e., *h*-function) [43].

Then, the C-vines with one node connected to all others is the focus of this study (as shown in a). The density of the d-dimensional C-vine can be factorized as follows [45]:

$$f(\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_d) = \prod\_{k=1}^d f\_k(\mathbf{x}\_k) \times \prod\_{l=1}^{d-1} \prod\_{j=1}^{d-l} c\_{l, l+j|1:(i-1)} \left( F(\mathbf{x}\_l | \mathbf{x}\_1, \dots, \mathbf{x}\_{i-1}), F\left(\mathbf{x}\_{i+j} | \mathbf{x}\_1, \dots, \mathbf{x}\_{i-1}\right) \right) \tag{8}$$

where *<sup>c</sup>i*,*i*+*j*|1:(*i*−1) are the bivariate (conditional) copula densities, index *j* indicates the trees, while *i* runs over the edges in each tree.

In order to understand the decomposition of C-vine structures, only 5-dimensional C-vine structure is taken as an example to show the pair-copulas of vine structure decomposition in Figure 2a, that is, the joint density of C-vine copula can be decomposed into the following:

*f*12345(*x*1, *x*2, *x*3, *x*4, *x*5) = *f*1(*x*1) · *f*2(*x*2) · *f*3(*x*3) · *f*4(*x*4) · *f*5(*x*5) ·*c*12(*F*1(*x*1), *F*2(*x*2)) · *c*13(*F*1(*x*1), *F*3(*x*3)) · *c*14(*F*1(*x*1), *F*4(*x*4)) · *c*15(*F*1(*x*1), *F*5(*x*5)) ·*c*23|<sup>1</sup> *<sup>F</sup>*2|<sup>1</sup> (*x*2|*x*<sup>1</sup> ), *<sup>F</sup>*3|<sup>1</sup> (*x*3|*x*<sup>1</sup> ) · *<sup>c</sup>*24|<sup>1</sup> *<sup>F</sup>*2|<sup>1</sup> (*x*2|*x*<sup>1</sup> ), *<sup>F</sup>*4|<sup>1</sup> (*x*4|*x*<sup>1</sup> ) · *<sup>c</sup>*25|<sup>1</sup> *<sup>F</sup>*2|<sup>1</sup> (*x*2|*x*<sup>1</sup> ), *<sup>F</sup>*5|<sup>1</sup> (*x*5|*x*<sup>1</sup> ) ·*c*34|12 *<sup>F</sup>*3|12(*x*3|*x*1, *<sup>x</sup>*<sup>2</sup> ), *<sup>F</sup>*4|12(*x*4|*x*1, *<sup>x</sup>*<sup>2</sup> ) · *<sup>c</sup>*35|12 *<sup>F</sup>*3|12(*x*3|*x*1, *<sup>x</sup>*<sup>2</sup> ), *<sup>F</sup>*5|12(*x*5|*x*1, *<sup>x</sup>*<sup>2</sup> ) ·*c*45|123 *<sup>F</sup>*4|123(*x*4|*x*1, *<sup>x</sup>*2, *<sup>x</sup>*<sup>3</sup> ), *<sup>F</sup>*5|123(*x*5|*x*1, *<sup>x</sup>*2, *<sup>x</sup>*<sup>3</sup> ) (9)

where *c*12(*F*1(*x*1), *F*2(*x*2)), denoted as *c*12, represents the density function of pair-copula with marginal distributions *F*1(*x*1) and *F*2(*x*2).

According to the joint density of a C-vine copula presented in Equation (9), a Cvine copula with a certain order for given data can be fitted using all of the pair-copulas (conditional bivariate copulas). Then, the conditional distribution function *C*34*|*<sup>12</sup> and *C*35*|*<sup>12</sup> from tree 3, with edges *F*3*|*12(*x*3*|x*1, *x*2), *F*4*|*12(*x*4*|x*1, *x*2), and *F*5*|*12(*x*5*|x*1, *x*2), can be obtained using Equation (7) along with *C*3*|*12, *C*4*|*12, *C*5*|*<sup>12</sup> and *C*12, *C*13, *C*14, *C*<sup>15</sup> from the first two trees. In general, the whole inferences for the conditional distribution function of predicted variable *x*<sup>5</sup> given *x*1, *x*2, *x*3, and *x*<sup>4</sup> can be decomposed recursively from the bivariate copulas as follows:

*<sup>F</sup>*2|<sup>1</sup> (*x*2|*x*<sup>1</sup> ) <sup>=</sup> *<sup>h</sup>*2|<sup>1</sup> (*F*2(*x*2)|*F*1(*x*1) ) *<sup>F</sup>*3|<sup>1</sup> (*x*3|*x*<sup>1</sup> ) <sup>=</sup> *<sup>h</sup>*3|<sup>1</sup> (*F*3(*x*3)|*F*1(*x*1) ) *<sup>F</sup>*4|<sup>1</sup> (*x*4|*x*<sup>1</sup> ) <sup>=</sup> *<sup>h</sup>*4|<sup>1</sup> (*F*4(*x*4)|*F*1(*x*1) ) *<sup>F</sup>*5|<sup>1</sup> (*x*5|*x*<sup>1</sup> ) <sup>=</sup> *<sup>h</sup>*5|<sup>1</sup> (*F*5(*x*5)|*F*1(*x*1) ) *For Tree* 2 *<sup>F</sup>*3|12(*x*3|*x*<sup>1</sup> , *<sup>x</sup>*<sup>2</sup> ) <sup>=</sup> *<sup>h</sup>*3|12 *<sup>F</sup>*3|<sup>1</sup> (*x*3|*x*<sup>1</sup> ) *<sup>F</sup>*2|<sup>1</sup> (*x*2|*x*<sup>1</sup> ) <sup>=</sup> *<sup>h</sup>*3|12 *<sup>h</sup>*3|<sup>1</sup> (*F*3(*x*3)|*F*1(*x*1) ) *<sup>h</sup>*2|<sup>1</sup> (*F*2(*x*2)|*F*1(*x*1) ) *<sup>F</sup>*4|12(*x*4|*x*<sup>1</sup> , *<sup>x</sup>*<sup>2</sup> ) <sup>=</sup> *<sup>h</sup>*4|12 *<sup>F</sup>*4|<sup>1</sup> (*x*4|*x*<sup>1</sup> ) *<sup>F</sup>*2|<sup>1</sup> (*x*2|*x*<sup>1</sup> ) <sup>=</sup> *<sup>h</sup>*4|12 *<sup>h</sup>*4|<sup>1</sup> (*F*4(*x*4)|*F*1(*x*1) ) *<sup>h</sup>*2|<sup>1</sup> (*F*2(*x*2)|*F*1(*x*1) ) *<sup>F</sup>*5|12(*x*5|*x*<sup>1</sup> , *<sup>x</sup>*<sup>2</sup> ) <sup>=</sup> *<sup>h</sup>*5|12 *<sup>F</sup>*5|<sup>1</sup> (*x*5|*x*<sup>1</sup> ) *<sup>F</sup>*2|<sup>1</sup> (*x*2|*x*<sup>1</sup> ) <sup>=</sup> *<sup>h</sup>*5|12 *<sup>h</sup>*5|<sup>1</sup> (*F*5(*x*5)|*F*1(*x*1) ) *<sup>h</sup>*2|<sup>1</sup> (*F*2(*x*2)|*F*1(*x*1) ) *For Tree* 3 . . . ⇒ *F*(*x*5|*x*<sup>1</sup> , *x*2, *x*3, *x*<sup>4</sup> ) = *h*(*h*(*T*25,1|*T*23,1 )|*h*(*T*24,1|*T*23,1 ) ) (10)

$$\text{where } T\_{ij,1} = h\left(h\left(u\_j|u\_1\right)|h(u\_i|u\_1)\right), \text{ } 2 \le i < j \le 5.$$

2.3.3. CVQR Model

Generally, taking the bivariate copula as an example, the condition distribution function of Y under the condition of X = *x*, i.e., *FY|X*(*y|x*) can be expressed as follows:

$$F\_{Y|X}(y|\mathbf{x}) = \mathbb{C}\_1(F\_X(\mathbf{x}), F\_Y(y)) = \partial \mathbb{C}(u, v) / \partial u \tag{11}$$

where *u* = *FX*(*x*), *v* = *FY*(*y*) are the cumulative distribution function of *y* and *x*, respectively.

For any probabilities *τ* ∈ (0, 1) (e.g., τ = 0.05, 0.1, . . . , 0.95), the τth quantile function of Y given X = *x* from *C*1(*FX*(*x*), *FY*(*y*)) can be derived from the h-function:

$$\pi = F\_{Y|X}(y|\mathbf{x}) \equiv \mathbb{C}\_1(F\_X(\mathbf{x}), F\_Y(y)) \tag{12}$$

$$Q\_Y(\tau|X=\mathbf{x}) = F\_Y^{-1}\left(h^{-1}(\tau|\mu)\right) \tag{13}$$

where *h* −1 () indicates the inverse conditional distribution function (inverse h-function) of a given parametric bivariate copula.

In this study, the main purpose of the C-vine copula-based quantile regression (CVQR) model is to predict the quantile of a response variable Y given the outcome of some predictor variables. For the five-dimensional case, according to Equations (10)–(13), the τth conditional quantile function of x5, *Qx*<sup>5</sup> (*τ*|*x*1, *x*2, *x*3, *x*<sup>4</sup> ), can be derived from the recursive formulation:

$$\begin{array}{l} Q\_{\mathfrak{x}\mathfrak{z}}(\boldsymbol{\tau}|\mathbf{x}\_{1},\mathbf{x}\_{2},\mathbf{x}\_{3},\mathbf{x}\_{4}) = F^{-1}(\boldsymbol{u}\_{\mathfrak{z}}) = \\ F^{-1}\{h^{-1}\{h^{-1}(h^{-1}(\boldsymbol{\tau}|h(\boldsymbol{h}(\boldsymbol{u}\_{4}|\boldsymbol{u}\_{1}))|h(\boldsymbol{u}\_{3}|\boldsymbol{u}\_{1})))|h(\boldsymbol{h}(\boldsymbol{u}\_{3}|\boldsymbol{u}\_{1})|h(\boldsymbol{u}\_{2}|\boldsymbol{u}\_{1}))\} |h(\boldsymbol{u}\_{2}|\boldsymbol{u}\_{1})\}\,\end{array} \tag{14}$$

A C-vine copula-based quantile regression (CVQR) model is developed for monthly streamflow forecasting coupling a C-vine copula model and a quantile regression method within a general optimization framework. Specially, the CVQR model is constructed by modelling the distributions of predictors (i.e., *x*1, . . . *xn*−1) and predicted variable (*xn*) with the selected n-d C-vine (structure), i.e., unconditioned and conditioned pairs (e.g., Equation (9)); then, the predicted variable of *x<sup>n</sup>* is derived from the conditional distribution function (Equations (10)–(14)). In detail, the predicted variable *x*<sup>5</sup> can be obtained from the given predictor variables *x*1, *x*2, *x*3, and *x*4. Firstly, the Monte Carlo simulation is used to generate a sample of 5000 uniformly distributed random numbers spaced [0, 1] as the quantiles τ. Secondly, the 5000 implementations of *x*<sup>5</sup> can be generated using Equation (14), with one random number generated for each quantile τ. Then, the average of these realizations is considered the general prediction.

A recommended tool for statistical inference of vine copulas is statistical software R with the VineCopula package (http://CRAN.R-project.org/, accessed on 20 January 2021). In this study, the Archimedean copula family (Frank, Clayton, and Gumbel copulas [46,47]) and Normal and Student's t copulas are employed to build the C-vine structures. The optimal bivariate copula families associated with parameter estimation are selected and calculated depending on the AIC and BIC using the maximum likelihood estimation (MLE) for the first C-vine tree. Then, based on these pair-copula families and the corresponding estimated parameters, the *h*-function can be used to calculate and specify the pair-copula input for the second C-vine tree. The process is iterated tree by tree until the last paircopula is evaluated. The building steps were detailed in Brechmann and Schepsmeier [48]. Meanwhile, the goodness-of-fit test includes the λ-function and Kolmogorov–Smirnov (KS) test with *p*-values and statistics (Sn) to check whether the selected copula is suitable for describing the observed dependencies, where the λ function is defined as follows:

$$
\lambda(v,\theta) = v - K(v,\theta) \tag{15}
$$

where *K*(*v*, *θ*) = *P*(*C*(*u*1, *u*2|*θ*) ≤ *v*) is the Kendall distribution function of copula *C* with parameter *θ*, and *v* ∈ [0, 1], and (*u*1*, u*2) are the marginal cumulative distribution func-

tions of copula *C*. The λ-function can be obtained by the 'BiCopLambda' function in the VineCopula package. For more descriptions, please refer to Genest and Favre [49], and Genest and Rivest [50].

In general, the main procedures of the proposed CVQR model for monthly streamflow predictions can be expresses as follows:

Step 1: Fit optimal marginal distributions, denoted as *u<sup>i</sup>* = *Fi*(*xi*), (*i* = 1, 2, . . . , *d*);

Step 2: Model the joint probability distributions *C* (*u*1, *u*2), . . . , *C* (*u*1, *u<sup>d</sup>* ), and then, the C-vine copula is iterated tree by tree until the last pair-copula is evaluated *F*(*x*1, *x*2, . . . , *xd*) = *C*(*u*1, *u*2, . . . , *ud*);

Step 3: Calculate the conditional distribution of the predictive variable (monthly streamflow) *u<sup>d</sup>* , *F*(*x<sup>d</sup>* |*x*1, *x*2, . . . , *xd*−<sup>1</sup> );

Step 4: Generate uniformly distributed random numbers τ, and then, predictive variable is derived from the inverse function of the conditional distribution in Step 3, that is, *x<sup>d</sup>* = *F* −1 (*τ*|*x*1, *x*2, . . . , *xd*−<sup>1</sup> ).
