Abstract
We discuss a Bayesian hierarchical copula model for clusters of financial time series. A similar approach has been developed in a recent paper. However, the prior distributions proposed there do not always yield a proper posterior. To circumvent the problem, we adopt a proper global–local shrinkage prior, which is also able to account for potential dependence structures among different clusters. The performance of the proposed model is assessed via simulations and a real data analysis.
1. Introduction
There is a large body of literature on hierarchical models. The idea of pulling the mean of a single group towards the mean across different groups can be traced back at least to Kelley [1]. Tiao and Tan [2] and Hill [3] consider the one-way random effects model and discuss a Bayesian approach to the analysis of variance, motivated by the fact that the frequentist unbiased estimator of the variance of the random effects can be negative. For the same model, Stone and Springer [4] discuss and resolve a paradox that arises with the use of Jeffreys’ prior. The foundation for the Bayesian hierarchical linear model is established in Lindley and Smith [5]. More recently, Gelman [6] reviews prior distributions for variance parameters in hierarchical models.
Zhuang et al. [7] introduced a hierarchical model in a copula framework; they suggest two different priors for the variance parameters: (i) the standard improper prior for scale parameters, which is proportional to , or (ii) a vaguely informative prior, say an inverse gamma density with both parameters equal to a small value.
However, both proposals may be problematic: in the first case, the posterior is simply not proper (as we show in Appendix A); in the second case, the use of small parameters in the inverse gamma priors simply hides the problem without actually solving it; see, for example, Berger [8].
Hobert and Casella [9] also provide another review on the effect of improper priors in the Gibbs sampling algorithm.
In this paper, we propose a Bayesian hierarchical copula model using a different prior. In particular, we adopt a global–local shrinkage prior. These prior distributions arise naturally in linear regression with high-dimensional data, where a sparsity constraint is necessary for the vector of coefficients. Several families of global–local shrinkage priors have been proposed: Park and Casella [10] and Hans [11] discuss the Bayesian LASSO; Carvalho et al. [12] introduce the Horseshoe prior; Armagan et al. [13] propose a Generalized Double Pareto prior. Here, we use the Dirichlet–Laplace prior proposed in Bhattacharya et al. [14], with a slight modification. In a regression framework, it is natural to adopt a prior that shrinks the parameters towards zero; this is not the case for our hierarchical copula model, where the zero value has no particular interpretation. For this reason, we introduce a further level of hierarchy, assuming a prior distribution on the location of the shrinkage point.
The rest of this paper is organized as follows. The next section illustrates the statistical model and the prior distribution, highlighting the differences with the approach described in Zhuang et al. [7]; we conclude that section with a description of the sampling algorithm. In the third section, we perform a simulation study to compare the mean square error of the estimates produced by our model with that of a standard maximum likelihood approach. Then, we reconsider a dataset discussed in Zhuang et al. [7] and compare the results of the two approaches. We conclude with another illustration of the model in the problem of clustering financial time series.
2. Materials and Methods
2.1. The Statistical Model
2.1.1. Likelihood and Prior Distributions
A copula representation recasts a multivariate distribution in such a way that the dependence structure is not influenced by the shape, the parametrization, or the unit of measurement of the marginal distributions. Applications of copulae in statistical inference and a review of the most popular approaches can be found in Hofert et al. [15]. In this paper, we consider several parametric forms of copula functions. In the bivariate case, we use the standard Archimedean families, namely the Joe, Clayton, Gumbel, and Frank copulae. For more than two dimensions, we concentrate on the most popular elliptical versions, namely the Gaussian and Student’s t copulae. Since the main objective of the paper is the clustering of the dependence structure, for the sake of simplicity and without loss of generality, we assume that all marginal distributions are known or, equivalently, that their parameters have been previously estimated. In this way, we can work directly with the transformed variables: , .
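The transformation to the unit interval can be sketched as follows. This is only an illustration under stated assumptions: the Student's t margin, its parameter values, and the variable names are hypothetical choices and are not taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical margin with known parameters (illustrative values only).
margin = stats.t(df=5, loc=0.1, scale=2.0)
x = margin.rvs(size=1000, random_state=rng)

# Probability integral transform: applying the marginal CDF to the
# observations yields values on (0, 1) that carry only the dependence
# information once several margins are treated this way.
u = margin.cdf(x)
```

When the margins are estimated rather than known, the same call is applied with the fitted distribution in place of `margin`.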
Let be the generic copula density function associated with the i-th group. The statistical model can be stated as follows:
where m denotes the number of groups or clusters. Set the following:
and assume the following.
In the previous expressions, and , respectively, denote the lower and the upper bound of the parameter space of the corresponding , and is the mapping of into the real axis; is the dimension of the i-th group, and a is a hyperparameter, which we typically set to 1, although different values can be used. In general, the Archimedean copulae are parametrized in terms of Kendall’s tau, whose range of values has been restricted to for the Clayton, Joe, and Gumbel copulae, while it is set to for the Frank copula. In the elliptical case, the Gaussian copula is parametrized in terms of the correlation coefficient , which ranges in ; finally, Student’s t copula has the additional parameter , the number of degrees of freedom; a discrete uniform prior on has been used here. When the dimension d of the specific group is larger than two, we restrict the analysis to elliptical copulae with an equi-correlation matrix: in that case, it is well known that the range of the correlation parameter is .
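For readers who want to move between Kendall's tau and the usual copula parameters, the sketch below uses two standard relations, theta = 2 tau / (1 - tau) for the Clayton copula and theta = 1 / (1 - tau) for the Gumbel copula. It also checks numerically when an equi-correlation matrix is positive definite, which for dimension d requires a correlation above -1/(d - 1). The function names are illustrative and are not part of the paper.

```python
import numpy as np

def clayton_theta_from_tau(tau):
    """Clayton parameter corresponding to Kendall's tau in (0, 1)."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta_from_tau(tau):
    """Gumbel parameter corresponding to Kendall's tau in [0, 1)."""
    return 1.0 / (1.0 - tau)

def equicorrelation_matrix(d, rho):
    """d x d matrix with ones on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(d) + rho * np.ones((d, d))

def is_valid_equicorrelation(d, rho):
    """True when the equi-correlation matrix is positive definite."""
    eigenvalues = np.linalg.eigvalsh(equicorrelation_matrix(d, rho))
    return bool(eigenvalues.min() > 0.0)
```

For d = 4 the lower bound is -1/3, so a value such as -0.3 is admissible while -0.4 is not.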
Let be the entire observed sample, let be the k-th observation of the i-th component in the j-th group, and let be the number of observations in the j-th group. The posterior distribution on the parameter vector is then described as follows:
where and .
The complex form of the posterior distribution requires the use of simulation-based methods of inference. In particular, we adapt the algorithm of Bhattacharya et al. [14] with a minor modification for the updates of and the shrinkage location . Following Bhattacharya et al. [14], we introduce a vector in order to have a latent variable representation of the prior; then, the following is obtained.
Here, we briefly describe the algorithm. Start the chain at time 0 by drawing a sample from the prior. At time t, we use the following updating procedure:
1. Update :
   (a) Sample from a proposal Cauchy ;
   (b) Set and compute the following.
   (c) Sample ,
   (d) Set if ; otherwise, .
2. Update :
   (a) Sample from a proposal Cauchy ;
   (b) Compute the following.
   (c) Sample ;
   (d) Set if ; otherwise, .
3. Update : sample .
4. Update : sample , and set the following.
5. Update : sample and set the following.
In the previous statements, Cauchy denotes a one-dimensional Cauchy distribution with location a and scale b, while is the generalized inverse Gaussian distribution with the following density function.
Notice that is the inverse Gaussian distribution, and it is known that . Finally, and are scalar tuning parameters.
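Draws from a generalized inverse Gaussian distribution can be obtained with SciPy, as in the sketch below. It assumes the three-parameter form with density proportional to x^(p-1) exp(-(a x + b / x) / 2) for x > 0; this parametrization and the helper name `sample_gig` are assumptions made for illustration. SciPy's `geninvgauss` uses a two-parameter form, so a rescaling is needed.

```python
import numpy as np
from scipy import stats

def sample_gig(p, a, b, size=None, rng=None):
    """Draw from a GIG law with density proportional to
    x**(p - 1) * exp(-(a * x + b / x) / 2) on x > 0 (assumed parametrization)."""
    rng = np.random.default_rng() if rng is None else rng
    # scipy's geninvgauss(p, c) has density proportional to
    # y**(p - 1) * exp(-c * (y + 1 / y) / 2); taking c = sqrt(a * b) and
    # rescaling by sqrt(b / a) gives the three-parameter form above.
    return stats.geninvgauss.rvs(
        p, np.sqrt(a * b), scale=np.sqrt(b / a), size=size, random_state=rng
    )

# With p = -1/2 this is an inverse Gaussian law with mean sqrt(b / a).
draws = sample_gig(-0.5, 2.0, 8.0, size=20000, rng=np.random.default_rng(1))
```

With a = 2 and b = 8, the sample mean of `draws` should be close to 2.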
In the case of the Student’s t copula, we need to add another step between steps 1 and 2 in order to update :
- Update :
   (a) Sample from a discrete uniform distribution in ;
   (b) Compute the following.
   (c) Sample ;
   (d) Set if ; otherwise, .
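The accept/reject updates listed above have the structure of a Metropolis step. The sketch below shows that structure for a single scalar parameter with a Cauchy proposal centred at the current value, which is symmetric, so the acceptance ratio reduces to a ratio of target densities. It is a generic illustration on a toy target (a standard logistic density), not the sampler used in the paper; the centring of the proposal, the function names, and the tuning value are assumptions.

```python
import numpy as np

def cauchy_metropolis(log_target, x0, scale, n_iter, rng):
    """Random-walk Metropolis with a Cauchy proposal centred at the current state."""
    x = x0
    lp = log_target(x)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # Candidate from a Cauchy law with location x and the given scale.
        cand = x + scale * rng.standard_cauchy()
        lp_cand = log_target(cand)
        # Symmetric proposal: accept when log(uniform) < log target ratio.
        if np.log(rng.uniform()) < lp_cand - lp:
            x, lp = cand, lp_cand
        draws[t] = x
    return draws

def log_logistic(x):
    """Log-density of the standard logistic law, written in a stable form."""
    ax = abs(x)
    return -ax - 2.0 * np.log1p(np.exp(-ax))

rng = np.random.default_rng(42)
draws = cauchy_metropolis(log_logistic, x0=0.0, scale=1.0, n_iter=20000, rng=rng)
```

The scale of the proposal plays the role of a scalar tuning parameter: larger values give bolder moves and lower acceptance rates.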
2.1.2. Prior Distribution of
The choice of the prior distribution for the shrinkage location needs some explanation. First of all, notice that, according to our prior specification,
however , so otherwise is the case.
Therefore, given , the median of is . Then, it is easy to show that the natural choice of a uniform prior on for all implies a standard logistic density for .
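The link between a uniform law on a bounded interval and the standard logistic law can be checked numerically. The sketch below assumes that the mapping onto the real axis is the logit of the rescaled parameter; this choice, the bounds, and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def to_real_line(theta, lower, upper):
    """Logit-type map from (lower, upper) onto the real axis (assumed form)."""
    z = (theta - lower) / (upper - lower)
    return np.log(z) - np.log1p(-z)

def from_real_line(eta, lower, upper):
    """Inverse of to_real_line."""
    return lower + (upper - lower) / (1.0 + np.exp(-eta))

rng = np.random.default_rng(7)
lower, upper = -1.0, 1.0
theta = rng.uniform(lower, upper, size=5000)   # uniform on the bounded scale
eta = to_real_line(theta, lower, upper)        # values on the real axis

# Kolmogorov-Smirnov distance between eta and the standard logistic law.
ks = stats.kstest(eta, "logistic")
```

A small value of `ks.statistic` indicates that the transformed draws are compatible with a standard logistic distribution.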
2.1.3. Previous Work
Apart from the prior specification, the model described in the previous sections is the one proposed by Zhuang et al. [7]. We restrict our discussion to the case where each copula has one parameter only. Their prior can be stated as follows.
There is no unique choice for the distributions of , although the authors suggest using weakly informative priors, for example, inverse gamma densities with small hyperparameter values or, as an alternative, an objective prior such as an improper uniform prior. However, one can prove that, in the second case, the posterior distribution cannot be proper, no matter what the sample size is. We show this result in Appendix A. When the posterior distribution is improper, the resulting summary statistics are meaningless: the Markov chain generated by the MCMC algorithm does not have a limiting distribution, so the ergodic theorem does not hold and the simulation output cannot be used for inference. Moreover, even the first solution is not satisfactory. When an improper prior produces an improper posterior, using a vague proper prior typically hides the problem without solving it. In these cases, as shown in Berger [8] (p. 398), a vague prior approximating an improper prior typically concentrates the posterior mass on some boundary of the parameter space.
3. Results
3.1. Simulation Study
We compare the performance of our approach with the results based on a maximum likelihood approach in a simulation study. We will use a Student’s t copula with an equi-correlation matrix and set the number of groups m equal to five. We repeat the procedure 100 times; at iteration j for the i-th group, we sample the true value from a standard normal distribution, the degrees of freedom are sampled from the prior distribution, and the dimensions of the groups are sampled from the uniform discrete distribution in . Given the parameters and dimensions of the groups, we sample 20 observations for each group. In the maximum likelihood framework, we estimate the following:
and compute the standard errors.
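Data from a Student's t copula with an equi-correlation matrix, as used in this simulation study, can be generated with the standard normal/chi-square construction. The sketch below is illustrative only: the function name and the parameter values (dimension 3, correlation 0.5, 4 degrees of freedom) are assumptions, while the sample size of 20 per group follows the text.

```python
import numpy as np
from scipy import stats

def sample_t_copula_equicorr(n, d, rho, nu, rng):
    """Draw n observations from a d-dimensional Student's t copula
    with equi-correlation rho and nu degrees of freedom."""
    corr = (1.0 - rho) * np.eye(d) + rho * np.ones((d, d))
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)
    w = rng.chisquare(nu, size=n) / nu
    # Multivariate t draws: correlated normals divided by a common
    # chi-square mixing variable, one per observation.
    x = z / np.sqrt(w)[:, None]
    # Univariate t CDF maps each margin to (0, 1).
    return stats.t.cdf(x, df=nu)

rng = np.random.default_rng(123)
u = sample_t_copula_equicorr(n=20, d=3, rho=0.5, nu=4, rng=rng)
```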
In the Bayesian framework, we use the posterior mean as a point estimate, obtained from the MCMC algorithm described above. We ran six independent chains of scans, discarded the first as burn-in, and finally computed the via the sample mean of the simulation outputs for all . As tuning parameters, we set and . Then, we compute the following.
Comparisons are performed in terms of the corresponding mean square errors.
Table 1 reports values against for all groups based on 100 simulations.
Table 1. MSE of the proposed Bayesian hierarchical model and of the likelihood-based one.
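The comparison by mean square error can be sketched as follows, assuming the estimates and the true values are stored as arrays with one row per simulation replicate and one column per group. The array layout and the function name are assumptions.

```python
import numpy as np

def mean_squared_error(estimates, truths):
    """Per-group MSE; both inputs have shape (n_replicates, n_groups)."""
    estimates = np.asarray(estimates, dtype=float)
    truths = np.asarray(truths, dtype=float)
    # Average the squared errors over replicates, separately for each group.
    return np.mean((estimates - truths) ** 2, axis=0)

# Tiny worked example with two replicates and two groups.
mse = mean_squared_error([[1.0, 2.0], [3.0, 2.0]], [[0.0, 2.0], [1.0, 2.0]])
```

In the example, the squared errors for the first group are 1 and 4, so its MSE is 2.5, while the second group is estimated exactly.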
3.2. Real Data Applications
This section is devoted to the implementation of the method in two different applications. The first one is the same as in Zhuang et al. [7] and we include it for comparative purposes; to this end, we quantify the goodness of fit of the model using a predictive approach based on the conditional version of the Widely Applicable Information Criterion, WAIC, in a hierarchical setting, as discussed in Millar [16]. The second one deals with clustering financial time series.
3.2.1. Vertebral Column Data
We apply our model to the Vertebral Column data, available at the UCI Machine Learning Repository. It consists of 60 patients with disk hernia, 150 subjects with spondylolisthesis, and 100 healthy individuals; data are available for the following variables: angle of pelvic incidence (PI), angle of pelvic tilt (PT), lumbar lordosis angle (LL), sacral slope (SS), pelvic radius (PR), and the degree of spondylolisthesis (DS). As in Zhuang et al. [7], we adopt the generalized skew-t distribution for the marginals, estimate its parameters by maximum likelihood, and then transform the data via the fitted cumulative distribution function. Computations were performed using the R package sgt, available on CRAN. Table 2 reports the values of the fitted parameters for the marginals.
Table 2. Fitted parameters for each marginal distribution.
Following Zhuang et al. [7], we consider the same parametric copulae for the bivariate distributions of the features of interest, and for each of these, we construct our Bayesian hierarchical copula model for the three groups of subjects. We run six independent chains of simulations and discard the first . We also set and . We did not observe any convergence issues, and the Gelman–Rubin scores (Gelman and Rubin [17]) for each of the six implemented models were very close to the optimal value 1. In terms of goodness of fit, we have computed the WAIC index for all six models. Our finding is that the most significant relation is the one between PI and PT. Table 3 compares the results of Zhuang et al. [7] (model A) with ours (model B). The main difference between the results obtained with the two methods is related to the quantification of posterior uncertainty. Credible intervals obtained with model B are systematically larger than those obtained with model A. We believe this is because the results of model A are obtained by running a chain where some hyperparameters are fixed at estimated values, as explained in Zhuang et al. [7]. Fixing the values of the hyperparameters eliminates a critical source of variation and therefore narrows the credible intervals.
Table 3. Fitted parameters of the copulae.
For ease of comparison, we follow Zhuang et al. [7] and report the results not in terms of parameter but rather according to the natural parameter of each copula, that is, for the Gaussian copula and for the Archimedean ones.
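The Gelman–Rubin score mentioned in this section can be computed from several chains of the same scalar quantity. The sketch below implements the classical potential scale reduction factor of Gelman and Rubin [17] without the later split-chain refinements; the function name and the array layout are assumptions.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar quantity.

    chains: array of shape (n_chains, n_draws), post burn-in.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()    # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)         # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n
    return np.sqrt(var_plus / within)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(6, 2000))                # six chains, same target
stuck = mixed + 5.0 * np.arange(6)[:, None]       # chains centred at different values
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 are compatible with convergence, whereas chains that explore different regions give values well above 1.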
3.2.2. Financial Data Application
Grouping financial time series is important for diversification purposes; a portfolio manager should avoid investing in instruments with a high degree of positive dependence, and clustering procedures allow the construction of groups according to some specific risk measure. In this way, financial instruments that belong to the same group will show a certain degree of association; however, the strength of dependence within groups may well differ from group to group. It is then important to assess the strength of the association for each single cluster, and one way to do this is to use a hierarchical structure, such as the one discussed in this paper.
As a risk measure, we consider the so-called tail index, which measures the strength of dependence between two variables when one of them takes extremely low values. Following De Luca and Zuccolotto [18], we construct a dissimilarity measure based on the lower tail coefficient. Let be a bivariate random vector; the lower tail coefficient of is defined as follows:
or, equivalently,
where is the cumulative distribution function of the copula associated with . In order to estimate , we use the empirical estimator discussed in Joe et al. [19]:
where is the empirical copula, and n is the sample size. The dissimilarity measure is then defined as follows.
The preliminary clustering procedure has been implemented using a complete linkage method. Notice that a bivariate lower tail coefficient is not the only way of modeling dependence on extremely low values: Durante et al. [20] proposed a conditioned correlation coefficient estimated using a nonparametric approach; Fuchs et al. [21] analyzed dissimilarity measures applicable to a multivariate lower tail coefficient.
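The preliminary clustering step can be sketched as follows. The sketch relies on two assumptions that are not recoverable from the text above: the empirical estimator is taken as the empirical copula evaluated at (q, q) divided by q, with threshold q = sqrt(n) / n, and the dissimilarity is taken as minus the logarithm of the estimated coefficient. The function names, the clipping constant, and the toy data are also illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata

def empirical_lower_tail(u, v):
    """Empirical copula at (q, q) divided by q, with q = sqrt(n) / n (assumed)."""
    n = len(u)
    q = np.sqrt(n) / n
    ru = rankdata(u) / n
    rv = rankdata(v) / n
    return np.mean((ru <= q) & (rv <= q)) / q

def tail_dissimilarity(data, eps=1e-6):
    """Pairwise -log(lambda_L) for the columns of data (assumed form)."""
    p = data.shape[1]
    diss = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            lam = empirical_lower_tail(data[:, i], data[:, j])
            # Clip to avoid log(0) when no joint lower-tail event is observed.
            diss[i, j] = diss[j, i] = -np.log(np.clip(lam, eps, 1.0))
    return diss

rng = np.random.default_rng(0)
base = rng.normal(size=(400, 1))
data = np.hstack([base + 0.1 * rng.normal(size=(400, 2)),  # two strongly linked series
                  rng.normal(size=(400, 2))])              # two independent series
diss = tail_dissimilarity(data)
tree = linkage(squareform(diss, checks=False), method="complete")
```

The array `tree` is a SciPy linkage matrix and can be cut into groups with `scipy.cluster.hierarchy.fcluster`.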
We consider the “S&P 500 Full Dataset” available at Kaggle. It contains the most relevant information on the components of the S&P 500. We take the daily closing prices from 5 June 2000 to 5 June 2020 and discard instruments without a complete record for this period. This restricts our analysis to 379 components. For all of them, we computed the log-returns by taking log-differences and filtered the data by fitting, for each time series, an ARMA(1,1)-GJR-GARCH(1,1) model with Student’s t innovations; then, we extracted the residuals and transformed them via the fitted cumulative distribution function in order to obtain pseudo-data. Computations were performed using the CRAN package rugarch. We then computed the empirical estimator of the lower tail coefficient for every possible pair, together with the associated dissimilarity measure, and used them to feed the clustering algorithm. Due to computational complexity, we used the coarsest partition under the constraint that the largest group must have at most 10 components. We obtained 30 groups with more than one component and discarded instruments that belong to groups with only one component. The final number of instruments was thus reduced to 93.
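One possible way to select the coarsest partition under a maximum group size is to scan the merge heights of the dendrogram from the top down and stop at the first cut that satisfies the constraint. The sketch below does this with SciPy on a toy one-dimensional example; the function name and the data are hypothetical, and the authors' actual procedure may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def coarsest_cut(tree, max_size):
    """Labels of the coarsest dendrogram cut whose largest cluster
    has at most max_size members."""
    heights = np.sort(tree[:, 2])[::-1]          # merge heights, highest first
    for h in heights:
        labels = fcluster(tree, t=h, criterion="distance")
        if np.bincount(labels).max() <= max_size:
            return labels
    # No cut satisfies the constraint: every item is its own cluster.
    return np.arange(1, tree.shape[0] + 2)

# Toy data: two tight groups and one distant point on a line.
points = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 20.0]).reshape(-1, 1)
tree = linkage(points, method="complete")
labels = coarsest_cut(tree, max_size=3)
```

Clusters of size one can then be dropped by counting the occurrences of each label.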
We ran the MCMC algorithm described above for the 30 clusters, performing 12 independent chains of scans and discarding the first as burn-in. Tuning parameters were set to , . In this example too, we did not observe any convergence issues, and the Gelman–Rubin score was 1.02. For each scan and for each group, we compute the lower tail coefficient via the following formula:
where is the univariate cumulative distribution function of a Student’s t random variable with degrees of freedom. The copula used in this example was a Student’s t copula with an equi-correlation matrix: As a consequence, we obtained a single value for the lower tail coefficient for each cluster. Table 4 reports the results for each pair that belongs to the same group. Finally, we report the estimation results.
Table 4. Posterior distributions of the lower tail coefficients.
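For a bivariate Student's t copula with correlation rho and nu degrees of freedom, the lower tail coefficient has the well-known closed form 2 T_{nu+1}(-sqrt((nu + 1)(1 - rho) / (1 + rho))), where T_{nu+1} is the univariate Student's t distribution function with nu + 1 degrees of freedom. The sketch below evaluates it; the function name is illustrative, and the inputs may be arrays of posterior draws.

```python
import numpy as np
from scipy import stats

def t_copula_lower_tail(rho, nu):
    """Lower tail coefficient of a bivariate Student's t copula."""
    rho = np.asarray(rho, dtype=float)
    nu = np.asarray(nu, dtype=float)
    arg = -np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * stats.t.cdf(arg, df=nu + 1.0)

# Applied draw by draw, this turns MCMC output for (rho, nu)
# into a posterior sample of the lower tail coefficient.
lam = t_copula_lower_tail(np.array([0.2, 0.5, 0.8]), np.array([4.0, 4.0, 4.0]))
```

The coefficient increases with the correlation and decreases as the degrees of freedom grow.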
4. Conclusions
We discussed and improved a fully Bayesian analysis for a hierarchical copula model proposed in Zhuang et al. [7]. We proposed the use of a proper prior, which is able to induce shrinkage and, at the same time, dependence among different clusters of observations. This prior does not mimic the behavior of an improper prior and is better suited for objectively representing information coming from the data. Our prior belongs to the large family of global–local shrinkage densities, with an extra stage in the hierarchy, due to the absence of a meaningful shrinkage value; in our experience, this approach is very effective and useful in the case of parametric copulae depending on a single parameter. In a more general situation, this approach needs to be modified, and this can be easily accommodated.
Finally, we presented an application in a financial context, where the goal was to estimate the lower tail coefficient of several financial time series in a parametric way using the Student’s t copula.
Funding
B. Liseo acknowledges the financial support of Sapienza Università di Roma, Italy, grant n. RG12117A85687F4D, year 2021.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Dataset Vertebral Column can be found at the website http://archive.ics.uci.edu/ml/datasets/vertebral+column (accessed on 1 June 2021). Dataset S&P stock can be found at the website https://www.kaggle.com/datasets/nroll12/sp-500-full-dataset (accessed on 1 June 2021).
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| S&P | Standard and Poor’s 500 stock exchange index; |
| mle or MLE | Maximum likelihood estimator; |
| MSE | Mean squared error; |
| MCMC | Markov chain Monte Carlo. |
Appendix A
Here, we show that the prior proposed in Zhuang et al. [7] leads to an improper posterior.
The statistical model consists of m d-dimensional copulae governing different sets of observations.
Let ; here, is a scaling parameter that can be considered known. One-to-one mapping functions are needed to put all dependence parameters on the real line. Zhuang et al. [7] made the following assumptions.
Hyper-parameters ’s, , and are given a suitable prior distribution. For the moment, we do not specify the priors and set the following.
Since the ’s are one-to-one, we write instead of . Let U be the observed sample, and let be the k-th observation of i-th component in the j-th group. Let be the sample size of the j-th group. Furthermore, let , and . Finally, let denote the parameter space of the generic parameter .
The next proposition shows that, using standard noninformative priors for scale and location parameters, the resulting posterior will be improper independently of the sample size.
Proposition A1.
If for , and , the posterior distribution is improper for any choice of the copula densities and independently of the sample size.
Proof.
For the sake of clarity, set and . We need to show that the following pseudo-marginal posterior distribution of is not integrable:
where represents the likelihood function. Then, we obtain the following:
with
and
Consider only the following:
and set ; then, we obtain the following.
For any choice of , can be written as follows.
Now, we compute the following.
Notice that the following is the case:
and set and : then, we obtain the following.
So is a convex parabolic function of , and by the Weierstrass theorem, a global maximum exists for all bounded and closed sets. By integrating , one obtains the following.
Let . The second term of the last expression is as follows:
which also implies the following.
For the same argument, one can also see that the following obtains.
It follows that
□
A similar argument can be used to prove the following result.
Proposition A2.
If for , and , the posterior distribution is improper for any choice of copula densities and independently of the sample size.
Proof.
As before, one needs to show that the following pseudo-marginal posterior distribution of does not have a finite integral.
We use the same notation as in Proposition A1 and assume (when , the theorem is trivially true since itself is not defined). With a slight modification of the proof of that proposition, we obtain the following:
and
However, for all , the integral with respect to is not finite, and this again implies the following.
□
References
1. Kelley, T.L. The Interpretation of Educational Measurement; Measurement and Adjustment Series; World Book Company: Yonkers-on-Hudson, NY, USA, 1927.
2. Tiao, G.C.; Tan, W.Y. Bayesian Analysis of Random-Effect Models in the Analysis of Variance. I. Posterior Distribution of Variance-Components. Biometrika 1965, 52, 37–53.
3. Hill, B.M. Inference About Variance Components in the One-Way Model. J. Am. Stat. Assoc. 1965, 60, 806–825.
4. Stone, M.; Springer, B.G.F. A Paradox Involving Quasi Prior Distributions. Biometrika 1965, 52, 623–627.
5. Lindley, D.V.; Smith, A.F.M. Bayes Estimates for the Linear Model. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 1–41.
6. Gelman, A. Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper). Bayesian Anal. 2006, 1, 515–534.
7. Zhuang, H.; Diao, L.; Yi, G.Y. A Bayesian Hierarchical Copula Model. Electron. J. Stat. 2020, 14, 4457–4488.
8. Berger, J. The Case for Objective Bayesian Analysis. Bayesian Anal. 2006, 1, 385–402.
9. Hobert, J.P.; Casella, G. The Effect of Improper Priors on Gibbs Sampling in Hierarchical Linear Mixed Models. J. Am. Stat. Assoc. 1996, 91, 1461–1473.
10. Park, T.; Casella, G. The Bayesian Lasso. J. Am. Stat. Assoc. 2008, 103, 681–686.
11. Hans, C. Bayesian Lasso Regression. Biometrika 2009, 96, 835–845.
12. Carvalho, C.M.; Polson, N.G.; Scott, J.G. The Horseshoe Estimator for Sparse Signals. Biometrika 2010, 97, 465–480.
13. Armagan, A.; Dunson, D.; Lee, J. Generalized Double Pareto Shrinkage. Stat. Sin. 2013, 23, 119–143.
14. Bhattacharya, A.; Pati, D.; Pillai, N.S.; Dunson, D.B. Dirichlet–Laplace Priors for Optimal Shrinkage. J. Am. Stat. Assoc. 2016, 110, 1479–1490.
15. Hofert, M.; Kojadinovic, I.; Maechler, M.; Yan, J. Elements of Copula Modeling with R; Springer Use R! Series; Springer: New York, NY, USA, 2018.
16. Millar, R. Conditional vs. Marginal Estimation of the Predictive Loss of Hierarchical Models Using WAIC and Cross-Validation. Stat. Comput. 2018, 28, 375–385.
17. Gelman, A.; Rubin, D.B. Inference from Iterative Simulation Using Multiple Sequences (with Discussion). Stat. Sci. 1992, 7, 457–472.
18. De Luca, G.; Zuccolotto, P. A Tail Dependence-Based Dissimilarity Measure for Financial Time Series Clustering. Adv. Data Anal. Classif. 2011, 5, 323–340.
19. Joe, H.; Smith, R.L.; Weissman, I. Bivariate Threshold Methods for Extremes. J. R. Stat. Soc. Ser. B (Methodol.) 1992, 54, 171–183.
20. Durante, F.; Pappadà, R.; Torelli, N. Clustering of Financial Time Series in Risky Scenarios. Adv. Data Anal. Classif. 2014, 8, 359–376.
21. Fuchs, S.; Di Lascio, F.M.L.; Durante, F. Dissimilarity Functions for Rank-Invariant Hierarchical Clustering of Continuous Variables. Comput. Stat. Data Anal. 2021, 159, 107201.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).