Risks
  • Article
  • Open Access

4 February 2024

L1 Regularization for High-Dimensional Multivariate GARCH Models

1 Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
2 School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
3 Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11733, USA
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Risks Journal: A Decade of Advancing Knowledge and Shaping the Future

Abstract

The complexity of estimating multivariate GARCH models increases rapidly with the number of asset series. To address this issue, we propose a general regularization framework for high-dimensional GARCH models with BEKK representations and obtain a penalized quasi-maximum likelihood (PQML) estimator. Under some regularity conditions, we establish theoretical properties, such as sparsity and consistency, of the PQML estimator for the BEKK representations. We then carry out simulation studies to show the performance of the proposed inference framework and the procedure for selecting tuning parameters. In addition, we apply the proposed framework to analyze volatility spillover and portfolio optimization problems, using daily prices of 18 U.S. stocks from January 2016 to January 2018, and show that the proposed framework outperforms some benchmark models.

1. Introduction

Modeling the dynamics of high-dimensional variance–covariance matrices is a challenging problem in high-dimensional time series analysis and has wide applications in financial econometrics. Classical time series models for variance–covariance matrices assume that the number of component time series is small relative to the number of observed samples. However, many financial and economic applications nowadays need to model the dynamics of high-dimensional variance–covariance matrices. For example, in modern portfolio management, the number of assets can easily exceed one thousand and be larger than, or of the same order as, the number of observed historical prices of the assets; in analyzing the movements in the financial markets of different products in different countries, it is critical to understand the interdependence and contagion effects of price movements over thousands of markets, while jointly observed financial data are available for only a few decades.
In this paper, we propose an inference procedure with L1 regularization for high-dimensional BEKK representations and obtain a class of penalized quasi-maximum likelihood (PQML) estimators. The L1 regularization allows us to identify important parameters and shrink the non-essential ones to zero, hence providing an estimate of the sparse parameters in BEKK representations. Under some regularity conditions, we establish theoretical properties, such as sparsity and consistency, of the PQML estimator for BEKK representations. The proposed procedure is a fairly general framework that can be applied to a large class of high-dimensional MGARCH models; by applying our regularization techniques, the complexity of making inferences from high-dimensional MGARCH models can be greatly reduced and the intrinsic sparse model structures can be uncovered. We carried out simulation studies to show the performance of the proposed inference framework and the procedure for selecting tuning parameters. In addition, we applied the proposed framework to analyze volatility spillover and portfolio optimization problems, using daily prices of 18 U.S. stocks from January 2016 to January 2018. In the comparison of portfolio optimization based on different MGARCH models, we show that the proposed framework outperforms three benchmark models, i.e., the constant covariance model, the factor MGARCH model, and the dynamic conditional correlation model.
The proposed framework can be viewed as extending regularization techniques from high-dimensional linear models to nonlinear time series models. Since () introduced LASSO for linear regression models, regularization techniques for high-dimensional statistical inference have been studied for a variety of problems in linear models. For example, () proposed the smoothly clipped absolute deviation (SCAD) penalty, which generates sparse estimates of regression coefficients with reduced bias, and explored the so-called "oracle property", in which the estimator has asymptotic properties equivalent to those of the maximum likelihood estimator in the non-penalized model. () proposed adaptive LASSO, which adds adaptive weights for different parameters in the L1 penalty to obtain better estimator performance. () proposed a group LASSO penalty to solve the problem of selecting grouped factors in regression models. () proposed a minimax concave penalty that gives nearly unbiased variable selection in linear regression. In addition to these discussions of regularized estimation in high-dimensional statistics, which rely primarily on independent and identically distributed (i.i.d.) samples and linear models, regularization techniques have also been applied to inference problems in high-dimensional linear time-series models. For instance, () studied a class of penalty functions and showed the oracle properties of the estimators in high-dimensional vector autoregressive (VAR) models. () investigated the theoretical properties of L1-regularized estimates in high-dimensional stochastic regressions with serially correlated errors and in transition matrix estimation for high-dimensional VAR models. () developed a regularization framework for full-factor MGARCH models (), in which the dynamics of the covariance matrices are determined by the dynamics of univariate GARCH processes for orthogonal factors.
Using the group LASSO technique, () studied the inference problem for MGARCH models with vine structure, an alternative to dynamic conditional correlation MGARCH models.
The proposed regularization framework is also related to the problem of estimating p × p covariance matrices using various shrinkage and regularization methods. For instance, () proposed an optimal linear shrinkage method to estimate constant covariance matrices of p-dimensional i.i.d. vectors, and, later on, () extended the method and developed nonlinear shrinkage estimators for high-dimensional covariance matrices. () and () proposed covariance regularization procedures that are based on the thresholding of sample covariance matrices to estimate inverse covariance matrices. () studied sparsistency and rates of convergence for estimating covariance based on penalized likelihood with nonconcave penalties, and () estimated high-dimensional inverse covariance by minimizing L1-penalized log-determinant divergence. This method is also called graphical LASSO and was studied in () and (). We note that all these discussions focus on high-dimensional constant covariance matrices; thus, they do not involve the dynamics of covariance matrices.
The remainder of the paper is organized as follows. Section 2 provides a literature review of MGARCH models and their applications in volatility spillover. Section 3 explains the BEKK model with L1-penalty functions in detail. In Section 4, we provide theoretical properties and implementation procedures for the regularized BEKK model. Simulation results and real data analysis are presented in Section 5 and Section 6, respectively. Section 7 gives concluding remarks.

2. Literature Review

Inspired by the idea of univariate generalized autoregressive conditionally heteroskedastic (GARCH) models (); (); (); (), various multivariate GARCH (MGARCH) models have been proposed over the last three decades to characterize the dynamics of covariance matrices. Among these MGARCH models, the Baba–Engle–Kraft–Kroner (BEKK) model () uses a general specification to describe the dynamics of covariance matrices of an n-dimensional multivariate time series. Since such a specification contains O(n²) unknown parameters, inference on the BEKK model becomes complicated, even for moderately large n. When n² grows at the same order as, or a larger order than, the length of the time series, inference on the MGARCH–BEKK representation becomes even more difficult due to "the curse of dimensionality".
To reduce the complexity of inference procedures for the unknown parameters in MGARCH models, other MGARCH specifications were proposed that reduce the number of unknown parameters. An important improvement is the dynamic conditional correlation (DCC) model (; ; ; ). The DCC model allows for time-varying conditional correlations and reduces the dimensionality by factorizing the conditional covariance matrix into the product of a diagonal matrix of conditional standard deviations and a correlation matrix that evolves dynamically over time. Other MGARCH specifications impose stronger assumptions on the structure and dynamics of the covariance matrices; they include, for example, the MGARCH-in-mean model (), the constant conditional correlation GARCH model (; ; ), the time-varying conditional correlation MGARCH model (), the orthogonal factor MGARCH model (; ), and so on. Although these MGARCH models provide relatively simple inference procedures, the assumed dynamics of the covariance matrices are usually too restrictive to capture their real complexity. Furthermore, these models still fail to address the issue of making inference on high-dimensional MGARCH models.
In addition to modeling the joint behavior of volatilities for a set of returns, another use of MGARCH models is to characterize volatility spillover in financial markets. Volatility spillover refers to the process and magnitude by which instability in one market affects other markets. Volatility spillover is widely observed in equity markets (), bond markets (), futures markets (), exchange markets (), markets of equities and exchanges (), and various industries and commodities (; ). Understanding volatility spillover can provide insight into financial vulnerabilities, as well as the source and nature of financial exposures, for academic researchers, financial practitioners, and regulatory authorities. For investors, since significant volatility spillover may increase non-systemic risk, understanding it can help them diversify the risks associated with their investments. For financial sector regulators, understanding volatility spillover can help them formulate appropriate policies to maintain financial stability, especially when stress from a particular market is transmitted to other markets, so that the risk of systemic instability increases. MGARCH models are generally used to characterize volatility spillover in markets represented by a low-dimensional multivariate series; see (), (); (), (), and (). In particular, () used a multivariate GARCH-in-mean model to study the economic spillover effect across five countries, () applied a BEKK(1,1) model to study the transmission of weekly equity returns and volatility in nine Asian countries from 1988 to 2000, and () employed the BEKK(1,1) specification to study three-dimensional US sector indices. The spillover effect has also been explored recently for other financial markets, such as cryptocurrency markets () and European banks with GARCH models ().
Additionally, there has been an investigation into the spillover effects using network representations derived from GARCH models in recent studies (; ).
The aforementioned studies on spillover effects rely on the foundational structures of the DCC model for analysis (; ; ). As noted above, the assumptions these simplified MGARCH models place on the dynamics of covariance matrices are usually too restrictive to capture their real complexity, and the models do not scale to high-dimensional settings. Under these constraints, the performance and accuracy of these simplified MGARCH models need further investigation in real markets ().

3. The MGARCH–BEKK Representations with L1 Regularization

We first introduce the following notation. Given a vector x and a matrix A, the ith component of x and the ijth element of A are written as x_i and A_{ij}, respectively. The jth column and the ith row vectors of A are denoted as A_{.j} and A_{i.}, respectively. ||x|| is the Euclidean norm of the vector x, and ||x||_∞ is the largest element of x in modulus. ρ(A) is the spectral radius of A, i.e., the largest modulus of the eigenvalues of A. λ_min(A) and λ_max(A) are the minimum and maximum eigenvalues of A, respectively. ||A|| is the spectral norm, i.e., the square root of ρ(A^⊤A). ||A||_∞ represents the operator norm induced by ||x||_∞, i.e., the largest absolute row sum. For any matrix A and vector x such that Ax is well defined, let ||A||_{2,∞} := max_{||x||=1} ||Ax||_∞. We use sign(x) to denote the sign of x: sign(x) = x/|x| if x ≠ 0, and sign(x) = 0 otherwise.

3.1. The MGARCH–BEKK Representation

Let r_t be the vector of returns on n assets in period t, and let ε_t be i.i.d. n-dimensional standard normal random vectors. Let F_t be the sigma field generated by the past information {r_s : s ≤ t}. Then, Σ_t is measurable with respect to F_{t−1}, and the distribution of r_t can be specified as

$$r_t = \Sigma_t^{1/2} \epsilon_t, \qquad \epsilon_t \sim N(0, I_n), \tag{1}$$

where I_n is the n × n identity matrix. Denote the conditional covariance matrix of r_t given F_{t−1} by Σ_t, i.e., Σ_t = Cov(r_t | F_{t−1}). () proposed the following BEKK(a, b) model to characterize the dynamics of Σ_t:

$$\Sigma_t = C^\top C + \sum_{k=1}^{K}\sum_{i=1}^{a} A_{ik}^\top r_{t-i} r_{t-i}^\top A_{ik} + \sum_{k=1}^{K}\sum_{i=1}^{b} B_{ik}^\top \Sigma_{t-i} B_{ik}, \tag{2}$$

where the A_{ik} and B_{ik} are n × n parameter matrices, C is an n × n triangular matrix, and the summation limit K determines the generality of the process.
To illustrate the idea, we consider BEKK(1,1) in our examples and take K = 1 throughout this paper; Model (2) with K = 1 can be written as

$$\Sigma_t = C^\top C + \sum_{i=1}^{a} A_i^\top r_{t-i} r_{t-i}^\top A_i + \sum_{i=1}^{b} B_i^\top \Sigma_{t-i} B_i, \tag{3}$$

in which A_i, B_i, and C are real n × n matrices. Without loss of generality, we choose Σ_t^{1/2} to be symmetric. For identification purposes, () showed the following property for the BEKK model.
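To make the recursion in Model (3) concrete, the following sketch simulates a BEKK(1,1) process in the notation above. The parameter values and the choice to start the recursion at the intercept C^⊤C are our own illustrative assumptions, not from the paper.

```python
import numpy as np

def simulate_bekk11(C, A, B, T, seed=None):
    """Simulate T returns r_t from a BEKK(1,1) model:
    Sigma_t = C'C + A' r_{t-1} r_{t-1}' A + B' Sigma_{t-1} B,
    r_t = Sigma_t^{1/2} eps_t with eps_t ~ N(0, I_n)."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    intercept = C.T @ C
    Sigma = intercept.copy()            # illustrative starting value
    r = np.zeros(n)
    out = np.empty((T, n))
    for t in range(T):
        Sigma = intercept + A.T @ np.outer(r, r) @ A + B.T @ Sigma @ B
        L = np.linalg.cholesky(Sigma)   # any square root of Sigma_t works
        r = L @ rng.standard_normal(n)
        out[t] = r
    return out

# A small bivariate example with stationary, diagonal A and B
C = np.array([[0.10, 0.00],
              [0.05, 0.10]])
A = np.diag([0.30, 0.20])
B = np.diag([0.80, 0.85])
returns = simulate_bekk11(C, A, B, T=500, seed=0)
```

Since C^⊤C is positive definite and the remaining terms are positive semi-definite, every Σ_t in the loop admits a Cholesky factor.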
Proposition 1.
Suppose that the diagonal elements of C, as well as a_{11} and b_{11}, are positive. Then, there exists no other C, A, or B in Model (3) that gives an equivalent representation.
Proposition 1 is also known as the identification condition ().
Let vec and vech be the vector operators that stack the columns of a matrix and the lower triangular part of a matrix, respectively. That is, if

$$Y = \begin{pmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nn} \end{pmatrix},$$

then

$$\mathrm{vec}(Y) = (y_{11}, \ldots, y_{n1}, y_{12}, \ldots, y_{n2}, \ldots, y_{1n}, \ldots, y_{nn})^\top,$$

and

$$\mathrm{vech}(Y) = (y_{11}, \ldots, y_{n1}, y_{22}, \ldots, y_{n2}, \ldots, y_{ii}, \ldots, y_{ni}, \ldots, y_{nn})^\top.$$

Then, Model (3) can be rewritten in vector form:

$$\mathrm{vec}(\Sigma_t) = \mathrm{vec}(C^\top C) + \sum_{i=1}^{a} \mathring{A}_i \, \mathrm{vec}(r_{t-i} r_{t-i}^\top) + \sum_{i=1}^{b} \mathring{B}_i \, \mathrm{vec}(\Sigma_{t-i}), \tag{4}$$

in which Å_i = A_i ⊗ A_i, B̊_i = B_i ⊗ B_i, and ⊗ is the Kronecker product. Since the covariance matrices Σ_t are symmetric, we can also write (3) in vector-half form:

$$\mathrm{vech}(\Sigma_t) = \mathrm{vech}(C^\top C) + \sum_{i=1}^{a} \tilde{A}_i \, \mathrm{vech}(r_{t-i} r_{t-i}^\top) + \sum_{i=1}^{b} \tilde{B}_i \, \mathrm{vech}(\Sigma_{t-i}), \tag{5}$$

where Ã_i = L_n Å_i K_n, B̃_i = L_n B̊_i K_n, and L_n and K_n are the n(n+1)/2 × n² and n² × n(n+1)/2 matrices that, respectively, extract and restore the non-redundant triangular part of the symmetric matrices involved. Note that dim(vec(Σ_t)) = n² and dim(vech(Σ_t)) = n(n+1)/2. For convenience, we denote by θ = (θ_1, …, θ_p)^⊤ the parameter vector in Model (3), in which p = (a + b) n² + n(n+1)/2, so that the matrices C, A_i, and B_i are functions of θ: C = C(θ), A_i = A_i(θ), B_i = B_i(θ). We denote by θ^0 the true parameter vector of the model.
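The vec and vech operators, and the Kronecker identity behind the vec form (4), can be checked numerically. The sketch below is our own illustration in NumPy; it uses the identity vec(A^⊤MA) = (A ⊗ A)^⊤ vec(M), which matches the text's Å_i = A_i ⊗ A_i up to the transpose convention the extraction dropped.

```python
import numpy as np

def vec(Y):
    """Stack the columns of Y."""
    return Y.flatten(order="F")

def vech(Y):
    """Stack the lower-triangular part of Y, column by column."""
    n = Y.shape[0]
    i, j = np.triu_indices(n)
    return Y.T[i, j]   # Y.T[i, j] = Y[j, i]: column-major lower triangle

n = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
M = M + M.T            # a symmetric "covariance-like" matrix

# dim(vec) = n^2 and dim(vech) = n(n+1)/2, as stated in the text
assert vec(M).size == n * n and vech(M).size == n * (n + 1) // 2

# Kronecker identity behind the vec form: vec(A' M A) = (A (x) A)' vec(M)
assert np.allclose(vec(A.T @ M @ A), np.kron(A, A).T @ vec(M))
```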
We assume that the values of r_t in (1) are stationary; the following stationarity condition is then imposed on the BEKK(a, b) Model (5) (see () and ()).
Condition 1 (Stationary Condition).
The n-dimensional return series  r_t  in (1) is stationary if the following conditions hold for Model (3):
(i) 
C*(θ) = C^⊤C is a continuous function of θ, and there exists C_0 > 0 such that det(C*(θ)) ≥ C_0, where det(·) represents the determinant of a matrix;
(ii) 
For any θ, Ã_i(θ) and B̃_i(θ) are continuous functions of θ;
(iii) 
For any θ, ρ( Σ_{i=1}^a Ã_i(θ) + Σ_{i=1}^b B̃_i(θ) ) < 1, i.e., the largest modulus of the eigenvalues of Σ_{i=1}^a Ã_i(θ) + Σ_{i=1}^b B̃_i(θ) is less than 1.
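Condition 1(iii) can be checked numerically. The sketch below works on the vec rather than the vech scale, which is a sufficient (slightly conservative) check: the symmetric subspace is invariant under A ⊗ A and B ⊗ B, so the vech-scale spectral radius is no larger than the vec-scale one.

```python
import numpy as np

def bekk_stationary_vec(A_list, B_list):
    """Sufficient check of Condition 1(iii) on the vec scale:
    rho( sum_i A_i (x) A_i + sum_i B_i (x) B_i ) < 1."""
    n = A_list[0].shape[0]
    M = np.zeros((n * n, n * n))
    for A in A_list:
        M += np.kron(A, A)
    for B in B_list:
        M += np.kron(B, B)
    return bool(np.max(np.abs(np.linalg.eigvals(M))) < 1)

A = np.diag([0.30, 0.20])
B = np.diag([0.80, 0.85])
print(bekk_stationary_vec([A], [B]))           # a stationary pair
print(bekk_stationary_vec([A], [np.eye(2)]))   # an explosive pair
```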

3.2. Likelihood Function

In this section, we discuss some properties of the likelihood of the BEKK ( a , b ) model. Assume that ϵ t follows a standard n-dimensional Gaussian distribution. Ignoring constants, we can write the quasi-log-likelihood as
$$\mathcal{L}_T(\theta) = \frac{1}{2T} \sum_{t=1}^{T} l_t(\theta), \qquad l_t(\theta) = -\left( \log[\det(\Sigma_t)] + r_t^\top \Sigma_t^{-1} r_t \right). \tag{6}$$
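For reference, evaluating L_T(θ) of (6) for a BEKK(1,1) parameterization takes one pass through the data. The sketch below initializes the recursion at C^⊤C, which is our own choice of starting value; the parameter values in the demo call are likewise illustrative.

```python
import numpy as np

def quasi_log_lik(C, A, B, returns):
    """L_T(theta) = (1/2T) * sum_t l_t(theta) for BEKK(1,1),
    with l_t = -(log det Sigma_t + r_t' Sigma_t^{-1} r_t)."""
    T, n = returns.shape
    intercept = C.T @ C
    Sigma = intercept.copy()    # assumed starting value for the recursion
    r_prev = np.zeros(n)
    total = 0.0
    for t in range(T):
        Sigma = intercept + A.T @ np.outer(r_prev, r_prev) @ A + B.T @ Sigma @ B
        _, logdet = np.linalg.slogdet(Sigma)   # Sigma is positive definite
        quad = returns[t] @ np.linalg.solve(Sigma, returns[t])
        total -= logdet + quad                 # accumulate l_t(theta)
        r_prev = returns[t]
    return total / (2.0 * T)

rng = np.random.default_rng(0)
demo_returns = 0.1 * rng.standard_normal((50, 2))
value = quasi_log_lik(np.array([[0.10, 0.00], [0.05, 0.10]]),
                      np.diag([0.30, 0.20]), np.diag([0.80, 0.85]),
                      demo_returns)
```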
Taking the derivative of Σ_t with respect to the ith element of θ, we obtain

$$\frac{\partial \Sigma_t}{\partial \theta_i} = \frac{\partial C^\top C}{\partial \theta_i} + \sum_{j=1}^{a} \left( \frac{\partial A_j^\top}{\partial \theta_i} r_{t-j} r_{t-j}^\top A_j + A_j^\top r_{t-j} r_{t-j}^\top \frac{\partial A_j}{\partial \theta_i} \right) + \sum_{j=1}^{b} \left( \frac{\partial B_j^\top}{\partial \theta_i} \Sigma_{t-j} B_j + B_j^\top \Sigma_{t-j} \frac{\partial B_j}{\partial \theta_i} + B_j^\top \frac{\partial \Sigma_{t-j}}{\partial \theta_i} B_j \right), \tag{7}$$
which can be computed recursively. The derivative in (7) has the following property (the proof is given in Appendix A).
Proposition 2.
Let R_t = vech(r_t r_t^⊤); then,

$$\left\| \frac{\partial \Sigma_t}{\partial \theta_i} \right\| \le \Psi_1 + \Psi_2 \cdot \sup_t \| R_t \|, \tag{8}$$

where Ψ_1 and Ψ_2 are two constants.
Assume that L_T(θ) is twice continuously differentiable in a neighborhood Θ_0 ⊂ Θ of θ^0. We define the averages of the score vector and the Hessian matrix as follows:

$$S_T(\theta) = T^{-1} \sum_{t=1}^{T} s_t(\theta) \quad \text{and} \quad H_T(\theta) = T^{-1} \sum_{t=1}^{T} h_t(\theta),$$

where s_t(θ) = ∂l_t(θ)/∂θ and h_t(θ) = ∂²l_t(θ)/∂θ∂θ^⊤. Taking the derivative of (6) with respect to θ_i yields
$$\frac{\partial l_t}{\partial \theta_i} = \mathrm{Tr}\!\left[ \frac{\partial \Sigma_t}{\partial \theta_i} \left( \Sigma_t^{-1} r_t r_t^\top \Sigma_t^{-1} - \Sigma_t^{-1} \right) \right],$$

$$\frac{\partial^2 l_t(\theta)}{\partial \theta_j \partial \theta_i} = \mathrm{Tr}\!\left[ -\frac{\partial^2 \Sigma_t}{\partial \theta_j \partial \theta_i} \Sigma_t^{-1} + \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} - r_t r_t^\top \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} + r_t r_t^\top \Sigma_t^{-1} \frac{\partial^2 \Sigma_t}{\partial \theta_j \partial \theta_i} \Sigma_t^{-1} - r_t r_t^\top \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} \right],$$
in which Tr(·) represents the trace of a matrix. () showed the following property for l_t(θ).
Proposition 3.
Under Condition 1, the following properties hold:
(i) 
When T → +∞, H_T^0 := (1/T) Σ_{t=1}^T ∂²l_t(θ^0)/∂θ∂θ^⊤ → H in probability for a nonrandom positive-definite matrix H;
(ii) 
For the Fisher information matrix I^0 := E( (∂l_t(θ^0)/∂θ)(∂l_t(θ^0)/∂θ)^⊤ ) = E( S_T^0 (S_T^0)^⊤ ), we have ||I^0|| < ∞;
(iii) 
For θ ∈ Θ, E( sup_{||θ − θ^0|| ≤ ε} |∂³l_t(θ)/∂θ_i∂θ_j∂θ_k| ) is bounded for all ε > 0 and i, j, k = 1, …, p.
In the sparse representation, the majority of the elements of the true parameter vector θ^0 are exactly 0; hence, we can partition θ^0 into two sub-vectors. Let U_0 be the set of indices {j ∈ {1, …, p} : θ_j^0 ≠ 0} and θ_{U_0}^0 be the q-dimensional vector composed of the nonzero elements {θ_j^0 ≠ 0 : j ∈ U_0}. Similarly, we define θ_{U_0^c}^0 as the (p − q)-dimensional zero vector. Without loss of generality, θ^0 is stacked as θ^0 = ((θ_{U_0}^0)^⊤, 0^⊤)^⊤ = ((θ_{U_0}^0)^⊤, (θ_{U_0^c}^0)^⊤)^⊤. For convenience, we define the averages of the "score subvector" S_{U_0,T}(θ) and the "Hessian sub-matrix" H_{U_0,T}(θ) through s_{U_0,t}(θ) = ∂l_t(θ)/∂θ_{U_0} and h_{U_0,t}(θ) = ∂²l_t(θ)/∂θ_{U_0}∂θ_{U_0}^⊤. Similarly, we define S_{U_0^c,T}(θ). We also write S_T(θ^0) = S_T(θ_{U_0}^0, 0) as S_T^0.
Proposition 4.
The quasi-log-likelihood function L T for the BEKK(1,1) has the following properties:
(i) 
For i = 1, …, p, E|√T · S_{T,i}^0|⁴ < ∞, where S_{T,i}^0 is the ith element of S_T^0;
(ii) 
For a sufficiently large T, H_{U_0,T}^0 is almost surely positive definite, and λ_min(H_{U_0,T}^0) = O_p(1);
(iii) 
There exists a neighborhood Θ_{U_0}^0 ⊂ Θ of θ_{U_0}^0 such that, for all θ^{(1)}, θ^{(2)} ∈ Θ_{U_0}^0 and some K_T = O_p(1),

$$\| H_{U_0,T}(\theta^{(1)}, 0) - H_{U_0,T}(\theta^{(2)}, 0) \| \le K_T \| \theta^{(1)} - \theta^{(2)} \|.$$
Here, a_T = O_p(1) means that |a_T| ≤ c with probability 1 as T → ∞, where c is a constant. Proposition 4(i) shows that the fourth moment of the score function S_T is always finite. Proposition 4(ii) indicates that λ_min(H_{U_0,T}^0) is almost surely positive and bounded away from 0. Hence, when the L1 penalty is combined with the quasi-likelihood function, the concavity around θ^0 can be ensured, so that a local maximizer can be obtained. Proposition 4(iii) is trivial in linear models, but not in our case. The proof of Proposition 4 is given in Appendix A.

3.3. L1 Penalty Function and Penalized Quasi-Likelihood

Before discussing the consistency of the sparse estimator, we introduce the following condition, by following the strong irrepresentable condition for LASSO-regularized linear regression models in ().
Condition 2 (Irrepresentable condition).
There exists a neighborhood Θ_{U_0}^0 ⊂ Θ of θ_{U_0}^0 such that

$$\sup_{\theta^{(1)}, \theta^{(2)} \in \Theta_{U_0}^0} \left\| \left[ (\partial / \partial \theta_{U_0}^\top) \, S_{U_0^c, T}(\theta^{(1)}, 0) \right] \left[ H_{U_0,T}(\theta^{(2)}, 0) \right]^{-1} \right\|_\infty \le c$$

for a constant c that takes its value in (0, 1) almost surely.
Definition 1.
The half-minimum signal d is defined as

$$d \;(= d_T) = \frac{1}{2} \min\{ |\theta_j^0| : \theta_j^0 \ne 0 \} = \frac{1}{2} \min_{j \in U_0} |\theta_j^0|.$$
Assume that p_λ(x) is an L1 penalty function, i.e., p_λ(|x|) = λ|x|. We consider the following penalized quasi-likelihood (PQL):

$$Q_T(\theta) = \mathcal{L}_T(\theta) - P_T(\theta), \tag{9}$$

in which P_T(θ) = Σ_{j=1}^p p_λ(|θ_j|) = λ Σ_{j=1}^p |θ_j| is the penalty term and λ (= λ_T) ≥ 0 is the regularization parameter determining the size of the model. If θ̂ maximizes the PQL, i.e.,

$$\hat{\theta} = \arg\max_{\theta \in \Theta} Q_T(\theta),$$
we say that θ ^ is a penalized quasi-maximum likelihood estimator (PQMLE).
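In code, Q_T(θ) of (9) is simply the quasi-log-likelihood minus the weighted L1 norm. The optional mask in the sketch below is our own device; it anticipates Section 5, where the positive diagonal elements of A, B, and C are not shrunk.

```python
import numpy as np

def penalized_qll(loglik_value, theta, lam, penalize=None):
    """Q_T(theta) = L_T(theta) - lambda * sum_j |theta_j| (Eq. (9)).
    `penalize` is a boolean mask selecting which coordinates are penalized."""
    theta = np.asarray(theta, dtype=float)
    if penalize is None:
        penalize = np.ones(theta.shape, dtype=bool)
    return loglik_value - lam * np.abs(theta[penalize]).sum()

# Example: L_T = -1.25, three parameters, lambda = 0.1
q = penalized_qll(-1.25, [0.5, 0.0, -0.3], lam=0.1)   # -> -1.33
```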
Similar to () and (), we add some conditions on the penalty function p λ ( · ) and the half minimum signal.
Condition 3.
The penalty function p λ satisfies the following properties:
(i) 
λ = min{ O(T^{−α}), o(q^{−1/2} T^{−γ} log T) }  for some  α ∈ (δ_0 + γ, (1 − 2δ_0)/4), γ ∈ (0, 1/2],  and large T. Here,  a = O(f(T))  means that  |a / f(T)|  is bounded by a constant, and  b = o(g(T))  means that  |b / g(T)| → 0  when  T → ∞;
(ii) 
d ≥ T^{−γ} log T for some γ ∈ (0, 1/2] and large T, where d is the half-minimum signal defined above.

4. Properties of the PQML Estimator and Implementation

This section studies the sparsity and the consistency of the PQML estimator and discusses some implementation issues.

4.1. Sparsity of the PQML Estimator

First, we introduce three lemmas whose proofs are given in Appendix A. For convenience, we denote Û := supp(θ̂), the set of indices corresponding to all nonzero components of θ̂, where supp denotes the support set, and θ̂_Û is the subvector of θ̂ formed by its restriction to Û. Then, Û^c represents the set of indices corresponding to all zero components of θ̂. We also denote by ⊙ the Hadamard product.
Lemma 1.
When the penalty function p_λ satisfies Condition 3, θ̂ is a strict local maximizer of the L1-PQL Q_T(θ) defined in (9) if

$$S_{\hat{U}, T}(\hat{\theta}) - \lambda_T \, \mathbf{1} \odot \mathrm{sign}(\hat{\theta}_{\hat{U}}) = 0,$$

$$\| S_{\hat{U}^c, T}(\hat{\theta}) \|_\infty < \lambda_T,$$

$$\lambda_{\min}\big[ H_{\hat{U}, T}(\hat{\theta}) \big] > 0,$$

in which 1 represents the vector with all elements equal to 1 and sign(·) is as defined at the beginning of Section 3.
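The three conditions of Lemma 1 are easy to verify numerically for a candidate θ̂, given stand-ins for the score S_T and the matrix H_T; the sketch below assumes those inputs are supplied by the user and is only an illustration of the check itself.

```python
import numpy as np

def strict_local_max_check(theta_hat, S, H, lam, tol=1e-10):
    """Check the conditions of Lemma 1 for a candidate maximizer:
    (a) S_{U,T} - lam * sign(theta_U) = 0 on the support U,
    (b) ||S_{U^c,T}||_inf < lam off the support,
    (c) H restricted to U is positive definite."""
    theta_hat, S = np.asarray(theta_hat), np.asarray(S)
    U = np.flatnonzero(theta_hat != 0)
    Uc = np.flatnonzero(theta_hat == 0)
    a = np.all(np.abs(S[U] - lam * np.sign(theta_hat[U])) <= tol)
    b = np.all(np.abs(S[Uc]) < lam)
    c = np.all(np.linalg.eigvalsh(np.asarray(H)[np.ix_(U, U)]) > 0)
    return bool(a and b and c)
```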
To show the weak oracle property of the PQML estimator, we also need the following lemma.
Lemma 2.
Let w_t be a martingale difference sequence with E|w_t|^m ≤ C_w for all t, where m > 2 and C_w is a constant. Then, we have

$$T^{-m/2} \, E \left| \sum_{t=1}^{T} w_t \right|^m < \infty.$$
Then, the weak oracle property of the PQML estimator can be established by the following theorem, whose proof is provided in Appendix A.
Theorem 1.
(L1-PQML estimator) Under Conditions 2 and 3, for the L1 penalty function P_T(θ) = λ Σ_{i=1}^p |θ_i|, in which p = O(T^δ) and q = O(T^{δ_0}), if

$$\delta \in [0, 4(1 - 2\alpha)), \qquad 0 < \delta_0 < \min\left\{ \tfrac{2}{3}(1 - 2\gamma), \, \gamma \right\},$$

with α ∈ (δ_0 + γ, (1 − 2δ_0)/4), γ ∈ (0, 1/2], and δ > δ_0, then there exists a local maximizer θ̂ = ((θ̂_{U_0})^⊤, (θ̂_{U_0^c})^⊤)^⊤ of Q_T(θ) such that the following properties are satisfied:
(i) 
(Sparsity) θ ^ U 0 c = 0 with probability approaching one;
(ii) 
(Rate of convergence) ||θ̂_{U_0} − θ_{U_0}^0|| = O_p(T^{−γ} log T).
Here, p = O(T^δ) is equivalent to p ≤ c T^δ for some constant c when T → ∞. The growth rate of p is thus controlled by T^δ, and the growth rate of q by the slower T^{δ_0}. For example, to make the growth rate of q much slower than that of p, we can take δ = 3/2, δ_0 = 1/20, γ = 1/30, and α = 1/5, which satisfy the conditions above. Since, in our case, p ∼ O(n²), we have n = O(T^{3/4}) and, hence, it is possible for p to exceed the sample size T. Although the difference between the rates of p and n is not as large as that in (), in which log p = O(T^{1−2α}) and q = o(T), it is sufficient for most cases in practice.

4.2. Implementation and Selection of λ

To compute the whole regularization path of L1-PQML estimators, we note that several algorithms have been proposed to solve penalized optimization problems. For example, () proposed the least-angle regression (LARS) algorithm to compute an efficient solution to the optimization problem for LASSO. Later on, pathwise coordinate descent methods were proposed to solve LASSO-type problems efficiently; see () and (). For the PQML estimator, we used an algorithm inspired by the BLasso algorithm (, ) with some necessary modifications; BLasso is attractive here because it does not need to explicitly calculate the first and second derivatives of the likelihood function, which are complicated in our case. We note that the original BLasso algorithm uses 0 as the initial value for all parameters, but the diagonal elements of A and B are positive by definition, so we make the following modification: we set 0 as the initial value for all off-diagonal elements of A, B, and C, and set the estimates obtained by fitting each component series to a univariate GARCH model as the initial values of the diagonal elements of the parameter matrices A, B, and C.
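The paper's modified BLasso iterates are derivative-free; as a generic illustration of the shrinkage step that any L1 solver performs, here is the soft-thresholding operator together with one proximal-gradient ascent step on Q_T. This is a standard alternative sketch, not the exact algorithm used in the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward 0 by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_step(theta, score, step, lam, shrink):
    """One ascent step on Q_T = L_T - lam * ||theta||_1:
    gradient step on L_T, then soft-threshold the penalized coordinates
    (`shrink` is a boolean mask, e.g. the off-diagonal elements)."""
    theta_new = theta + step * score
    theta_new[shrink] = soft_threshold(theta_new[shrink], step * lam)
    return theta_new

x = soft_threshold(np.array([0.50, -0.20, 0.05]), 0.1)   # -> [0.4, -0.1, 0.0]
```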
Another implementation issue is the selection of the tuning parameter λ, which leads to the problem of model selection. The tuning parameter λ can be chosen by several criteria; common choices include the Akaike information criterion (AIC), the small-sample corrected AIC (AICC), and the Bayesian information criterion (BIC). In addition, () proposed a modified BIC criterion and () extended it to the case p > T. () proposed using Cohen's kappa coefficient, which measures the agreement between two sets. Another method for model selection is cross-validation (CV); () used CV to choose the best model among model selection procedures such as AIC and BIC. In our study, we apply the AIC and BIC criteria on the testing data and select the best tuning parameters. Note that, since our data are ordered in time, k-fold CV is not applicable here, and the data are split in time order.

5. Simulation

In this section, we study the performance of the regularized BEKK models on some simulated examples. Consider Model (3) with n = 4 and a = b = 1; we then have p = 42 parameters, as the matrix C is lower triangular. We assume that the parameter matrices satisfy the stationarity condition, Condition 1, and, for identification purposes, we assume that the diagonal elements of C are positive, a_{11} > 0, and b_{11} > 0. We consider two cases for the matrices A, B, and C, which are summarized in Table 1. In both cases, the indices of the nonzero elements in the coefficient matrices A, B, and C are randomly generated. To ensure that the matrices satisfy Condition 1, the diagonal elements of A and B are randomly generated from a uniform distribution U(−0.45, 0.45), the off-diagonal nonzero elements of A and B are generated from U(−0.5, 0.5), and all the nonzero elements of C are generated from U(−0.1, 0.1).
Table 1. Parameter matrices in simulations.
For each case, we simulate the data r t ( 1 t T ) with T = 600 , and then use the proposed regularized procedure to make inference on the model. Since the diagonal elements in A, B, and C cannot be zero, we do not shrink the diagonal elements in A, B, and C. Additionally, we set the estimates of parameters in univariate GARCH models for each component series as the initial values of diagonal elements in A, B, and C.
To demonstrate the performance of our estimates, we consider three measurements. The first is the success rate in estimating zero and nonzero elements in θ or parameter matrices:
$$\tau_0 = \frac{\sum_{i=1}^{p} I(\theta_i^0 = 0 \;\wedge\; \hat{\theta}_i = 0)}{\sum_{i=1}^{p} I(\theta_i^0 = 0)}, \qquad \tau_0^C = \frac{\sum_{i=1}^{p} I(\theta_i^0 \ne 0 \;\wedge\; \hat{\theta}_i \ne 0)}{\sum_{i=1}^{p} I(\theta_i^0 \ne 0)}.$$
The second measure is the root mean squared error, defined as ν = ||θ^0 − θ̂_λ||_2. The third measure is the Kullback–Leibler information, which is given by

$$\kappa = \frac{1}{2T} \sum_{t=1}^{T} \left( |\Sigma_t \hat{\Sigma}_t^{-1}| - \log |\Sigma_t \hat{\Sigma}_t^{-1}| \right),$$

where Σ̂_t = Ĉ^⊤Ĉ + Â^⊤ r_{t−1} r_{t−1}^⊤ Â + B̂^⊤ Σ̂_{t−1} B̂. We run N = 500 simulations for each case and present the performance measures and their standard errors (in parentheses) for different λs in Table 2.
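The first two performance measures are straightforward to compute from θ^0 and θ̂_λ; the following sketch implements τ_0, τ_0^C, and ν as defined above (the example vectors are made up for illustration).

```python
import numpy as np

def performance_measures(theta_true, theta_hat):
    """tau0: fraction of true zeros estimated as zero;
    tau0C: fraction of true nonzeros estimated as nonzero;
    nu: Euclidean error ||theta_true - theta_hat||_2."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    zero = theta_true == 0
    est_zero = theta_hat == 0
    tau0 = np.mean(est_zero[zero])
    tau0C = np.mean(~est_zero[~zero])
    nu = np.linalg.norm(theta_true - theta_hat)
    return tau0, tau0C, nu

t0, t0c, nu = performance_measures([0.0, 0.0, 1.0, 2.0],
                                   [0.0, 0.5, 1.1, 0.0])
# t0 = 0.5 (one of two true zeros recovered), t0c = 0.5
```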
Table 2. Performance measures in two cases.
To select the tuning parameter λ , we use the first 500 samples as the training data and the last 100 samples as the test data. The training data are used to estimate model parameters θ λ for a given λ , and the test data are used to choose the best λ , i.e., the one that gives the minimum AICs and BICs. That is,
$$\hat{\lambda}_{\mathrm{BIC}} = \arg\min_{\lambda} \mathrm{BIC}_\lambda, \qquad \hat{\lambda}_{\mathrm{AIC}} = \arg\min_{\lambda} \mathrm{AIC}_\lambda,$$

in which BIC_λ and AIC_λ are defined as

$$\mathrm{BIC}_\lambda = \frac{-2 \mathcal{L}_{T_{\mathrm{test}}}(\hat{\theta}_\lambda) + k \log(T_{\mathrm{test}})}{T_{\mathrm{test}}}, \qquad \mathrm{AIC}_\lambda = \frac{-2 \mathcal{L}_{T_{\mathrm{test}}}(\hat{\theta}_\lambda) + 2k}{T_{\mathrm{test}}},$$

where, in this case, k = Σ_{i=1}^p I(θ̂_i^λ ≠ 0), T_test = 100, and

$$\mathcal{L}_{T_{\mathrm{test}}}(\theta) = -\frac{1}{2 T_{\mathrm{test}}} \sum_{t=501}^{600} \left( \log[\det(\Sigma_t)] + r_t^\top \Sigma_t^{-1} r_t \right).$$
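Given the held-out likelihood values and the number of nonzero estimates for each candidate λ, the selection rule above reduces to a small dictionary computation; a sketch with made-up numbers follows.

```python
import numpy as np

def select_lambda(test_loglik, k_nonzero, T_test, criterion="BIC"):
    """Pick the lambda minimizing BIC_lambda or AIC_lambda on the test
    segment, following the formulas above.
    test_loglik: {lambda: L_Ttest(theta_hat_lambda)};
    k_nonzero:   {lambda: number of nonzero estimates}."""
    pen = np.log(T_test) if criterion == "BIC" else 2.0
    score = {lam: (-2.0 * ll + pen * k_nonzero[lam]) / T_test
             for lam, ll in test_loglik.items()}
    return min(score, key=score.get)

# Hypothetical held-out log-likelihoods and model sizes
ll = {0.5: -1.0, 1.0: -1.1, 2.0: -1.5}
k = {0.5: 40, 1.0: 20, 2.0: 10}
best = select_lambda(ll, k, T_test=100, criterion="BIC")
```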
Figure 1 shows the histograms of selected λ s via BIC and AIC with CV for Cases 1 and 2. In general, we can see from Figure 1 that λ is favored by BIC and AIC when its value is between 0.64 and 2. However, slight differences between these two cases can be found. For Case 1, λ s around 1 are most favored by both BIC and AIC, while, for Case 2, λ s around 1 and 2 are most favored by BIC and AIC, respectively.
Figure 1. Histograms of selected λ in Cases 1 (top) and 2 (bottom) via BIC (left) and AIC (right).

6. Real Data Applications

In this section, we use the regularized BEKK representation to study the volatility spillover effect and to find optimal Markowitz mean–variance portfolios. The data we study consist of daily log-returns of 18 stocks during the period 4 January 2016–31 January 2018, which are listed in Table 3 (). Figure 2 shows the time series of these 18 stocks, and Table 4 summarizes the sample mean, the sample standard deviation, the sample skewness, the sample kurtosis, and the correlations of these 18 series. All pairwise correlations are positive in the selected period, and, except for IPG, all the stocks have a positive mean. The sample kurtosis of some stocks is much larger than 3, which indicates that we cannot simply assume that these returns individually follow normal distributions. Hence, it is natural to employ a suitable time series model to examine the data.
Table 3. Full names of 18 tickers.
Figure 2. Daily returns of 18 stocks from 4 January 2016 to 31 January 2018.
Table 4. Correlation and statistical features of 18 stocks for 2016–2017.

6.1. Volatility Spillovers

To use the MGARCH–BEKK representation to analyze a market consisting of 18 stocks, we should recognize that some form of regularization or shrinkage is necessary, owing to the complexity of the volatility dynamics. In particular, we use the proposed L1-regularized BEKK(1,1) model and procedure to study the volatility spillover among the 18 stocks. We first compute the PQML estimates of the model for different λs. Figure 3 shows the structures of the estimated coefficient matrices Â_λ and B̂_λ for λ = 4, 2, 1, 0.5, 0.3, in which the nonzero values of Â_λ and B̂_λ are represented as directional lines among stocks. Since the matrices A and B enter the model through quadratic forms and are not symmetric, we use directional lines to distinguish nonzero upper-diagonal elements from lower-diagonal ones; specifically, if a_{ij} ≠ 0, the directional line points from i to j. As the PQML estimates Â_λ and B̂_λ reveal the significant interdependence and contagion effects among the 18 stocks, the network structures in Figure 3 provide a clear representation of volatility spillover. Furthermore, we notice that, for some moderate values of λ, for example, λ = 0.5, Â_λ is very sparse, whereas B̂_λ exhibits more interdependence among stocks. When larger values of λ are used in the regularization procedure, the PQML estimates Â_λ are quickly shrunk to diagonal matrices, and B̂_λ also becomes sparser than in the case λ = 0.5.
Figure 3. The network structure of estimated matrices A (top) and B (bottom) under different λ s.
Using the PQML estimates $\hat{A}_\lambda$, $\hat{B}_\lambda$, and $\hat{C}_\lambda$ and the BEKK(1,1) representation, we compute the estimated volatilities and dynamic correlations among the 18 stocks. Figure 4 shows the volatilities estimated by the regularized BEKK(1,1) model with $\lambda = 2, 0.5$ and by univariate GARCH models. Note that most volatility series estimated by the three models are similar, except for stocks NFLX, ORCL, and TIF. We also show the estimated dynamic correlations among the 18 stocks in the regularized BEKK(1,1) model with $\lambda = 1$ in Figure 5. We note that most correlations among the 18 stocks are positive during the sample period.
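The fitted volatilities and correlations are obtained by iterating the BEKK(1,1) recursion $\Sigma_t = CC' + A\, r_{t-1}r_{t-1}'\, A' + B\,\Sigma_{t-1}B'$. A minimal sketch with simulated returns and illustrative diagonal parameter matrices (not the fitted PQML estimates) is:

```python
import numpy as np

def bekk_filter(returns, C, A, B):
    """Iterate Sigma_t = C C' + A r_{t-1} r_{t-1}' A' + B Sigma_{t-1} B'
    and return the conditional covariance matrix for each date."""
    T, n = returns.shape
    intercept = C @ C.T
    sigma = np.cov(returns.T)          # initialize at the sample covariance
    out = np.empty((T, n, n))
    for t in range(T):
        out[t] = sigma
        r = returns[t][:, None]
        sigma = intercept + A @ r @ r.T @ A.T + B @ sigma @ B.T
    return out

def to_corr(sigma):
    """Convert a covariance matrix to the implied correlation matrix."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)

rng = np.random.default_rng(0)
ret = 0.01 * rng.standard_normal((250, 3))   # simulated daily returns
sigmas = bekk_filter(ret, C=0.005 * np.eye(3),
                     A=0.3 * np.eye(3), B=0.9 * np.eye(3))
```

Because the intercept $CC'$ is positive definite and the other two terms are positive semi-definite, every filtered $\Sigma_t$ stays positive definite.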
Figure 4. Estimated volatilities by regularized BEKK(1,1) with λ = 2 (red lines), λ = 0.5 (blue lines), and univariate GARCH models (green lines).
Figure 5. Daily estimated conditional correlations when λ = 1 .
To show the overall volatility spillover, we extend the idea of the spillover index in (). Specifically, note that $E[\epsilon_{t+1}\epsilon_{t+1}'] = \Sigma_{t+1} = \Sigma_{t+1}^{1/2}(\Sigma_{t+1}^{1/2})'$, where $\Sigma_{t+1}^{1/2}$ is the unique lower-triangular Cholesky factor of $\Sigma_{t+1}$. Denoting the elements of $\Sigma_t^{1/2}$ by $\sigma_{\frac12,i,j,t}$, the spillover index $S_{t+1}$ is defined as
$$S_{t+1} = \frac{\sum_{i,j=1,\, i \neq j}^{n} \hat{\sigma}_{\frac12,i,j,t+1}^2}{\mathrm{trace}(\hat{\Sigma}_{t+1})} \times 100\%,$$
where $n = 18$ is the number of stocks. We plot the daily spillover indices of the 18 stocks for $\lambda = 2$ and $0.5$. The spillover indices during the sample period vary between 5% and 80%, and smaller values of $\lambda$ seem to generate more correlation among stocks. In particular, three big spikes can be found on 4 February 2016, 24 June 2016, and 9 November 2016. In addition to computing the PQML estimates for different values of $\lambda$, we also compute the whole $L_1$ regularization path. Note that the number of parameters in the BEKK(1,1) model for 18 stocks is $p = 819$, and we only show the regularization path for the $819 - 18 \times 3 = 765$ off-diagonal elements of $\hat{A}_\lambda$, $\hat{B}_\lambda$, and $\hat{C}_\lambda$. Both plots are shown in Figure 6.
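A small numerical sketch of this index: since the sum of squared entries of the Cholesky factor equals $\mathrm{trace}(\Sigma)$, the index is the share of the predicted total variance carried by the off-diagonal Cholesky entries. The snippet below also verifies the parameter count $p = 819$ quoted above (full $A$ and $B$, lower-triangular $C$).

```python
import numpy as np

def spillover_index(sigma):
    """Spillover index in %: off-diagonal squared entries of the
    lower-triangular Cholesky factor relative to trace(Sigma)."""
    L = np.linalg.cholesky(sigma)                    # Sigma^{1/2}, lower-triangular
    off_diag = np.sum(L**2) - np.sum(np.diag(L)**2)  # sum(L**2) == trace(Sigma)
    return 100.0 * off_diag / np.trace(sigma)

# A diagonal covariance has no spillover; correlation pushes the index up.
print(spillover_index(np.eye(3)))                    # 0.0
print(spillover_index(np.array([[1.0, 0.5],
                                [0.5, 1.0]])))       # approx 12.5

# Parameter count of BEKK(1,1) for n = 18 assets.
n = 18
p = 2 * n**2 + n * (n + 1) // 2
print(p)                                             # 819
```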
Figure 6. Daily spillover index (top) and regularization paths of estimated off-diagonal parameters in BEKK regularization Model represented by different colors (bottom).

6.2. Portfolio Optimization

We further apply the regularized BEKK model to Markowitz mean–variance portfolio optimization (). Using the portfolio variance as a measure of risk, Markowitz portfolio optimization theory provides an optimal trade-off between profit and risk. Since the means and the covariance matrix of the assets are assumed known in the theory, they need to be estimated before being plugged into the framework. For high-dimensional portfolios, regularized methods are commonly used to achieve better performance. For instance, () and () used an $L_1$ penalty function for sparse portfolios, and () used a concave optimization-based approach to estimate the optimal portfolio. In our case, we use the regularized BEKK model to predict the covariance matrices in the next period, and then apply Markowitz portfolio theory to find the optimal portfolios.
In particular, we assume that the portfolio consists of $n = 18$ risky assets and denote by $\mu_t$ and $\Sigma_t$ the mean vector and covariance matrix, respectively, of the $n$ risky assets at time $t$. Let $\mathbf{1} = (1, \ldots, 1)'$ be an $n$-dimensional vector of ones. Markowitz mean–variance portfolio theory minimizes the variance of the portfolio, $\min_{w_t} w_t' \Sigma_t w_t$, subject to the constraints $w_t'\mathbf{1} = 1$ and $w_t'\mu_t = \mu^*$, where $\mu^*$ is the target return. When short selling is allowed, the efficient portfolio can be expressed explicitly as
$$w_{\mathrm{effi},t} = \frac{\tilde{b} - \tilde{a}\mu^*}{\tilde{d}}\, \Sigma_t^{-1}\mathbf{1} + \frac{\tilde{c}\mu^* - \tilde{a}}{\tilde{d}}\, \Sigma_t^{-1}\mu_t,$$
where $\tilde{a} = \mu_t'\Sigma_t^{-1}\mathbf{1}$, $\tilde{b} = \mu_t'\Sigma_t^{-1}\mu_t$, $\tilde{c} = \mathbf{1}'\Sigma_t^{-1}\mathbf{1}$, and $\tilde{d} = \tilde{b}\tilde{c} - \tilde{a}^2$. When the target return $\mu^*$ is chosen to minimize the variance of the efficient portfolio, we obtain the global minimum variance (GMV) portfolio:
$$w_{\mathrm{minvar},t} = \frac{\Sigma_t^{-1}\mathbf{1}}{\mathbf{1}'\Sigma_t^{-1}\mathbf{1}}.$$
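The two closed-form weight vectors above translate directly into NumPy; the mean vector and covariance matrix below are illustrative inputs, not the fitted quantities from the data.

```python
import numpy as np

def efficient_portfolio(mu, sigma, mu_star):
    """Closed-form mean-variance weights with short selling allowed."""
    ones = np.ones(len(mu))
    inv = np.linalg.inv(sigma)
    a = mu @ inv @ ones
    b = mu @ inv @ mu
    c = ones @ inv @ ones
    d = b * c - a**2
    return (b - a * mu_star) / d * (inv @ ones) + (c * mu_star - a) / d * (inv @ mu)

def gmv_portfolio(sigma):
    """Global minimum variance weights."""
    ones = np.ones(sigma.shape[0])
    inv = np.linalg.inv(sigma)
    return inv @ ones / (ones @ inv @ ones)

mu = np.array([0.0010, 0.0015, 0.0005])        # illustrative daily means
sigma = np.array([[2.0, 0.3, 0.2],
                  [0.3, 3.0, 0.4],
                  [0.2, 0.4, 1.5]]) * 1e-4     # illustrative covariance
w = efficient_portfolio(mu, sigma, mu_star=0.0010)
```

By construction, the efficient weights are fully invested and hit the target return, and the GMV portfolio has variance no larger than any efficient portfolio's.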
For comparison purposes, we also use three other multivariate volatility models to predict the covariance matrices of the $n = 18$ stocks. The first is very simple: it assumes a constant covariance matrix for the $n$ stocks. The second is a factor-GARCH model (; ; ; ), which assumes the following for the asset return vector $r_t$, a vector $f_t = (f_{1t}, \ldots, f_{kt})'$ of $k$ independent factors, and the factor volatilities:
$$r_t = W f_t, \qquad \mathrm{Cov}(f_t) = \Sigma_t = \mathrm{diag}\{\sigma_{1t}^2, \sigma_{2t}^2, \ldots, \sigma_{kt}^2\},$$
$$\sigma_{it}^2 = 1 + \beta_i f_{i,t-1}^2 + \gamma_i \sigma_{i,t-1}^2,$$
where $W$ is a $k \times k$ lower-triangular matrix with diagonal elements equal to 1. The third covariance model is a dynamic conditional correlation GARCH (DCC–GARCH) model (), which has the form
$$r_t = \Sigma_t^{1/2}\epsilon_t, \qquad \epsilon_t \sim N(0, I_n), \qquad \Sigma_t = D_t R_t D_t,$$
$$Q_t = (1 - \alpha - \beta)\, C + \alpha\, s_{t-1}s_{t-1}' + \beta\, Q_{t-1}, \qquad R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2},$$
where $D_t = \mathrm{diag}(d_{1t}, \ldots, d_{nt})$, $s_{i,t} = r_{i,t}/d_{i,t}$, $s_t = (s_{1,t}, \ldots, s_{n,t})'$, and $R_t$ is the conditional correlation matrix at time $t$, that is, $R_t = \mathrm{Corr}(r_t \mid \mathcal{F}_{t-1})$. Here, $C$ is the unconditional correlation matrix, i.e., $C = E(R_t)$, and $Q_t$ can be interpreted as a conditional covariance matrix of the devolatilized residuals. For the dynamics of the univariate volatilities, the $d_{i,t}$ are assumed to follow a GARCH(1,1) process:
$$d_{i,t}^2 = \omega_i + a_i\, r_{i,t-1}^2 + b_i\, d_{i,t-1}^2,$$
where $(\omega_i, a_i, b_i)$ are the GARCH(1,1) parameters.
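One step of the DCC correlation recursion can be sketched as follows; the unconditional correlation matrix, standardized residual, and $(\alpha, \beta)$ values below are illustrative, not the fitted parameters.

```python
import numpy as np

def dcc_step(C, Q_prev, s_prev, alpha, beta):
    """One DCC update: Q_t = (1-a-b) C + a s s' + b Q_{t-1};
    R_t rescales Q_t to a proper correlation matrix (unit diagonal)."""
    Q = (1 - alpha - beta) * C + alpha * np.outer(s_prev, s_prev) + beta * Q_prev
    d = 1.0 / np.sqrt(np.diag(Q))
    R = Q * np.outer(d, d)
    return Q, R

C = np.array([[1.0, 0.4],
              [0.4, 1.0]])             # illustrative unconditional correlation
Q, R = dcc_step(C, Q_prev=C.copy(), s_prev=np.array([1.0, -0.5]),
                alpha=0.05, beta=0.90)
```

The rescaling step is what guarantees that $R_t$ has a unit diagonal even though $Q_t$ generally does not.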
For each trading day $t$ from 2 January 2018 to 31 January 2018, we first fit the four covariance models to the returns of the 18 stocks from 4 January 2016 up to day $t$, and then compute the one-day-ahead prediction of the covariance matrix. Using the predicted covariance matrices, we compute the portfolios $w_{\mathrm{minvar},t+1}$ and $w_{\mathrm{effi},t+1}$ for $\mu^* = 0.15\%$, $0.10\%$, and $0.05\%$. Table 5 shows the means, standard deviations (SD), and information ratios (IR, i.e., the ratio of mean to standard deviation) of the realized portfolio returns in January 2018. As argued by () and (), these statistics are good measurements of the out-of-sample performance of Markowitz portfolios. As () claimed that it is difficult for Markowitz portfolios to outperform equally weighted portfolios in terms of the out-of-sample mean, we also include the performance of equally weighted portfolios as a benchmark in Table 5. We note that all the means generated by the four covariance models are smaller than that of the equally weighted portfolio (0.430%), and the standard deviations of the covariance models, except the factor-GARCH, are smaller than that of the equally weighted portfolio. Notably, the regularized BEKK model consistently delivers the second-best mean performance, at 0.39%, 0.352%, 0.382%, and 0.416% for GMV and for $\mu^* = 0.15\%$, $0.10\%$, and $0.05\%$, respectively. Moreover, the information ratio of the regularized BEKK model surpasses that of all other portfolios, achieving the highest values across all scenarios (0.601, 0.540, 0.654, and 0.657 for GMV and $\mu^* = 0.15\%$, $0.10\%$, and $0.05\%$). These results show the robustness and efficiency of the regularized BEKK model in portfolio optimization: it consistently delivers competitive mean performance and superior risk-adjusted returns compared to the other covariance models.
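The evaluation statistics reported in Table 5 are straightforward to compute from the realized portfolio returns; a sketch with made-up returns (not the realized January 2018 series):

```python
import numpy as np

def performance(realized):
    """Out-of-sample mean, standard deviation, and information ratio."""
    mean = realized.mean()
    sd = realized.std(ddof=1)     # sample standard deviation
    return mean, sd, mean / sd

# Made-up daily realized portfolio returns for one month (in decimals)
realized = np.array([0.004, -0.002, 0.006, 0.001, -0.001, 0.003])
mean, sd, ir = performance(realized)
```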
Table 5. Performance of portfolios using different covariance models.

7. Discussion and Concluding Remarks

Modeling the dynamics of high-dimensional covariance matrices is an interesting and challenging problem in both financial econometrics and high-dimensional time series analysis. To address this issue, this paper proposes an inference procedure with $L_1$ regularization for sparse representations of high-dimensional BEKK models and obtains a class of penalized quasi-maximum likelihood estimators. The proposed regularization allows us to identify the significant parameters in the BEKK representation and shrink the non-essential ones to zero, hence providing a sparse estimate of the BEKK representation. We show that the sparse BEKK representation has suitable theoretical properties and is promising for applications in portfolio optimization and volatility spillover analysis.
The proposed sparse BEKK representation also contributes to the application of machine learning methods in time series modeling. As most discussions of applying regularization methods to time series modeling focus on regularizing high-dimensional vector autoregressive models and their variants (; ), the sparse representation of the dynamics of high-dimensional variance–covariance matrices seems to have been ignored in the literature. Since obtaining a sparse representation of the dynamics of high-dimensional variance–covariance matrices is crucial for enhancing interpretability in time series modeling, our study bridges this gap by considering a basic $L_1$ regularization method. One obvious extension of the current study is to replace the $L_1$ penalty with other types of penalties for high-dimensional MGARCH models, for instance, the SCAD penalty (), the adaptive LASSO (), and the group LASSO (). With different types of penalty functions, one can regularize the assets in the model under different requirements, and the resulting estimates have different kinds of asymptotic properties.
As the proposed sparse BEKK representation simplifies the dynamics of covariance matrices of high-dimensional time series, it has advantages over existing MGARCH models in some financial applications. In particular, the sparse BEKK representation can capture significant volatility spillover effects in high-dimensional financial time series, which usually cannot be analyzed with other MGARCH models. Since significant volatility spillover is captured, the proposed method also improves the performance of portfolio optimization based on the dynamics of high-dimensional covariance matrices. The proposed procedure can certainly be extended to incorporate more empirical aspects of financial time series. Taking the leverage effect as an example, one may modify the regularization procedure to obtain a sparse representation of high-dimensional multivariate exponential or threshold GARCH models.
Although the proposed framework shows advantages in modeling the dynamics of high-dimensional covariance matrices, the computational challenge is not completely resolved. The main reason is that the proposed inference procedure involves computing derivatives via Kronecker products of parameter matrices. Since the Kronecker product turns two $n \times n$ matrices into an $n^2 \times n^2$ matrix, the required computational memory increases significantly. Hence, the proposed procedure is suitable for problems in which the number of component time series ranges from several to about 100. If the number of assets grows beyond 200, the computational cost remains a major concern. One possible remedy is to train a neural network to approximate the regularized likelihood of the high-dimensional model. In such a way, the proposed regularization of high-dimensional MGARCH models can be extended to characterize the dynamics of covariance matrices of larger size.

Author Contributions

Conceptualization, H.X.; methodology, H.X., H.Z. and S.Y.; software, S.Y.; validation, S.Y., H.Z. and H.X.; formal analysis, S.Y.; investigation, S.Y., H.X. and H.Z.; resources, S.Y.; data curation, S.Y.; writing—original draft preparation, S.Y., H.X. and H.Z.; writing—review and editing, H.X.; visualization, S.Y.; supervision, H.X.; project administration, H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available by request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AICAkaike information criterion
BEKKBaba–Engle–Kraft–Kroner
BICBayesian information criterion
CVCross-validation
DCCDynamic conditional correlation
GARCHGeneralized autoregressive conditionally heteroskedastic
GMVGlobal minimum variance
IRInformation ratio
LARSLeast-angle regression
LASSOLeast absolute shrinkage and selection operator
MGARCHMultivariate GARCH
PQLPenalized quasi-likelihood
PQMLPenalized quasi-maximum likelihood
SCADSmoothly clipped absolute deviation
SDStandard deviation

Appendix A. Proofs of Propositions, Lemmas, and Theorems

Appendix A.1. Proof of Proposition 2

Let $R_t$, $\mathbf{C}$, and $\Sigma_t^*$ be defined by
$$R_t = \big(\mathrm{vech}(r_t r_t')', \ldots, \mathrm{vech}(r_{t-m+1} r_{t-m+1}')'\big)', \qquad \Sigma_t^* = \big(\mathrm{vech}(\Sigma_t)', \ldots, \mathrm{vech}(\Sigma_{t-m+1})'\big)',$$
where $m = \max(a, b)$, and let
$$\mathbf{C} = \big(\mathrm{vech}(CC')', 0', \ldots, 0'\big)',$$
with dimension $mn(n+1)/2$. Define
$$\mathbf{A} = \begin{pmatrix} \tilde{A}_1 & \cdots & \tilde{A}_{m-1} & \tilde{A}_m \\ 0 & \cdots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & 0 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} \tilde{B}_1 & \cdots & \tilde{B}_{m-1} & \tilde{B}_m \\ I & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & I & 0 \end{pmatrix},$$
with the convention $\tilde{A}_i = 0$ if $i > a$ and $\tilde{B}_i = 0$ if $i > b$. Then, the model can be written as
$$\Sigma_t^* = \mathbf{C} + \mathbf{A} R_{t-1} + \mathbf{B}\Sigma_{t-1}^* = \sum_{k=0}^{t-1} \mathbf{B}^k(\theta)\mathbf{C}(\theta) + \mathbf{B}^t(\theta)\Sigma_0^* + \sum_{k=0}^{t-1} \mathbf{B}^k(\theta)\mathbf{A}(\theta) L^k R_{t-1}(\theta_0),$$
where $L$ is the backshift operator, $L r_t = r_{t-1}$. Here, $\Sigma_0^*$ is fixed, and $R_t$ depends on $\theta_0$ but is not a function of $\theta$. Then, we have
$$\frac{\partial \Sigma_t^*}{\partial \theta_i} = \frac{\partial}{\partial \theta_i}\Big(\sum_{k=0}^{t-1} \mathbf{B}^k \mathbf{C}\Big) + \frac{\partial}{\partial \theta_i}\big(\mathbf{B}^t\big)\Sigma_0^* + \frac{\partial}{\partial \theta_i}\Big(\sum_{k=0}^{t-1} \mathbf{B}^k \mathbf{A} L^k\Big) R_{t-1}. \quad \text{(A1)}$$
Since
$$\frac{\partial \mathbf{B}^k}{\partial \theta_i} = \sum_{j=0}^{k-1} \mathbf{B}^j\, \frac{\partial \mathbf{B}}{\partial \theta_i}\, \mathbf{B}^{k-1-j},$$
we have
$$\Big\|\mathbf{B}^j\, \frac{\partial \mathbf{B}}{\partial \theta_i}\, \mathbf{B}^{k-1-j}\Big\| \le \|\mathbf{B}^j\| \cdot \Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\| \cdot \|\mathbf{B}^{k-1-j}\|, \qquad j = 0, \ldots, k-1.$$
Applying Lemma A.3 of (), $\|\mathbf{B}^k\| \le \Psi k^{n_0} \rho_0^k$ for all $k$, where $n_0$ is a fixed integer, $\Psi$ is a constant independent of $\theta$, and $0 \le \rho_0 < 1$, we obtain
$$\Big\|\mathbf{B}^j\, \frac{\partial \mathbf{B}}{\partial \theta_i}\, \mathbf{B}^{k-1-j}\Big\| \le \Psi^2 k^{n_0} \rho_0^{k-1} \Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\|.$$
To bound (A1), there are three terms to control. For the first term,
$$\begin{aligned} \Big\|\frac{\partial}{\partial \theta_i}\Big(\sum_{k=0}^{t-1}\mathbf{B}^k\mathbf{C}\Big)\Big\| &= \Big\|\sum_{k=1}^{t-1}\frac{\partial \mathbf{B}^k}{\partial \theta_i}\,\mathbf{C} + \sum_{k=0}^{t-1}\mathbf{B}^k\,\frac{\partial \mathbf{C}}{\partial \theta_i}\Big\| \le \sum_{k=1}^{t-1}\Big\|\frac{\partial \mathbf{B}^k}{\partial \theta_i}\Big\|\cdot\|\mathbf{C}\| + \sum_{k=0}^{t-1}\|\mathbf{B}^k\|\cdot\Big\|\frac{\partial \mathbf{C}}{\partial \theta_i}\Big\| \\ &\le \Psi^2\|\mathbf{C}\|\sum_{k=1}^{t-1}k^{n_0}\rho_0^{k}\,\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\| + \Psi\,\Big\|\frac{\partial \mathbf{C}}{\partial \theta_i}\Big\|\sum_{k=0}^{t-1}k^{n_0}\rho_0^{k} \le \pi(n_0)\Psi\Big(\Psi\|\mathbf{C}\|\cdot\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\| + \Big\|\frac{\partial \mathbf{C}}{\partial \theta_i}\Big\|\Big), \end{aligned}$$
using $\sum_{k=0}^{t-1} k^{n_0}\rho_0^{k} \le \sum_{k=0}^{t} k^{n_0}\rho_0^{k-1} \le \sum_{k=0}^{\infty} k^{n_0}\rho_0^{k-1} = \pi(n_0)$, where $\pi(n_0)$ is a constant that depends only on $n_0$. If $\rho_0 = 0$, this term is easily bounded because $\mathbf{B}$ is then nilpotent and all sums are finite. In the same way,
$$\Big\|\frac{\partial}{\partial \theta_i}\big(\mathbf{B}^t\big)\Sigma_0^*\Big\| \le \Psi\,\pi(n_0)\,\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\|\cdot\|\Sigma_0^*\|.$$
Finally,
$$\Big\|\frac{\partial}{\partial \theta_i}\Big(\sum_{k=0}^{t-1}\mathbf{B}^k L^k\mathbf{A}\Big)R_{t-1}\Big\| \le \Big\|\sum_{k=0}^{t-1}\Big(\frac{\partial \mathbf{B}^k}{\partial \theta_i}\Big)L^k\mathbf{A}\,R_{t-1}\Big\| + \Big\|\sum_{k=0}^{t-1}\mathbf{B}^k L^k\Big(\frac{\partial \mathbf{A}}{\partial \theta_i}\Big)R_{t-1}\Big\|.$$
Denoting the first and second sums on the right-hand side by $T_1$ and $T_2$, respectively, we have
$$\|T_1\| \le \Psi^2\sum_{k=1}^{t-1}k^{n_0+1}\rho_0^{k-1}\|\mathbf{A}\|\cdot\|R_{t-k-1}\|\cdot\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\| \le \Psi^2\|\mathbf{A}\|\cdot\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\|\cdot\sum_{k=1}^{t-1}k^{n_0+1}\rho_0^{k-1}\cdot\sup_t\|R_t\| \le \pi(n_0+1)\,\Psi^2\|\mathbf{A}\|\cdot\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\|\cdot\sup_t\|R_t\|,$$
and
$$\|T_2\| \le \Psi^2\,\pi(n_0)\,\Big\|\frac{\partial \mathbf{A}}{\partial \theta_i}\Big\|\,\sup_t\|R_t\|.$$
By our assumptions, $\|\mathbf{C}\|$, $\|\partial\mathbf{C}/\partial\theta_i\|$, $\|\mathbf{A}\|$, $\|\partial\mathbf{A}/\partial\theta_i\|$, $\|\partial\mathbf{B}/\partial\theta_i\|$, and $\|\Sigma_0^*\|$ are all bounded. Moreover, since $\mathrm{vech}(\Sigma_t)$ is a subvector of $\Sigma_t^*$, we have $\|\partial\Sigma_t/\partial\theta_i\| \le \|\partial\Sigma_t^*/\partial\theta_i\|$. Hence,
$$\Big\|\frac{\partial\Sigma_t}{\partial\theta_i}\Big\| \le \Psi_1 + \Psi_2\sup_t\|R_t\|,$$
where $\Psi_1 = \Psi\pi(n_0)\big(\Psi\|\mathbf{C}\|\cdot\|\partial\mathbf{B}/\partial\theta_i\| + \|\partial\mathbf{C}/\partial\theta_i\|\big) + \Psi\pi(n_0)\|\partial\mathbf{B}/\partial\theta_i\|\cdot\|\Sigma_0^*\|$ and $\Psi_2 = \Psi^2\pi(n_0+1)\|\mathbf{A}\|\cdot\|\partial\mathbf{B}/\partial\theta_i\| + \Psi^2\pi(n_0)\|\partial\mathbf{A}/\partial\theta_i\|$. □

Appendix A.2. Proof of Proposition 4

As
$$\frac{\partial l_t(\theta)}{\partial\theta_i} = \mathrm{Tr}\Big(\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}r_t r_t'\Sigma_t^{-1} - \frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big),$$
where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, and $E[r_t r_t' \mid \mathcal{F}_{t-1}] = \Sigma_t$, we have $E[\partial l_t(\theta)/\partial\theta_i \mid \mathcal{F}_{t-1}] = 0$, which means that $\partial l_t(\theta)/\partial\theta_i$ is a martingale difference. We then want to prove that $E\big[\big|T^{1/2}\cdot T^{-1}\sum_{t=1}^T \partial l_t(\theta_0)/\partial\theta_i\big|^m\big] = E\big[\big|T^{-1/2}\sum_{t=1}^T \partial l_t(\theta_0)/\partial\theta_i\big|^m\big] < \infty$ holds for $m = 4$. By Lemma 2, this proof is complete if we show that $E[|\partial l_t(\theta_0)/\partial\theta_i|^4] < \infty$. By Proposition 2, $\|\partial\Sigma_t/\partial\theta_i\| \le \Psi_1 + \Psi_2\sup_t\|\mathrm{vech}(r_t r_t')\|$. Since, by the cyclic property of the trace,
$$\mathrm{Tr}\Big(\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}r_tr_t'\Sigma_t^{-1} - \frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big) = \mathrm{Tr}\Big(\big(r_tr_t'\Sigma_t^{-1} - I\big)\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big) \le \big\|I - r_tr_t'\Sigma_t^{-1}\big\|\cdot\Big\|\frac{\partial\Sigma_t}{\partial\theta_i}\Big\|\cdot\|\Sigma_t^{-1}\|,$$
it suffices to show that
$$E\Big[\Big|\frac{\partial l_t(\theta_0)}{\partial\theta_i}\Big|^4\Big] = E\Big[\mathrm{Tr}^4\Big(\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}r_tr_t'\Sigma_t^{-1} - \frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big)\Big] < \infty.$$
Since $\mathrm{Tr}(AB) \le \|A\|\cdot\|B\|$ and $\|\Sigma_t^{-1}\|$ is bounded, there exists a constant $M$ such that $\|\Sigma_t^{-1}\| \le M$ for all $t$. Additionally, $\|I - r_tr_t'\Sigma_t^{-1}\| \le \|I\| + \|r_tr_t'\|\cdot\|\Sigma_t^{-1}\| \le 1 + M\|r_tr_t'\|$; therefore,
$$E\Big[\mathrm{Tr}^4\Big(\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}r_tr_t'\Sigma_t^{-1} - \frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big)\Big] \le E\Big[\big(1 + M\sup_t\|r_tr_t'\|\big)^4\big(\Psi_1 + \Psi_2\sup_t\|\mathrm{vech}(r_tr_t')\|\big)^4\Big].$$
Because $\|r_tr_t'\|$ and $\|\mathrm{vech}(r_tr_t')\|$ are equivalent norms, there exists a constant $k$ such that $\|r_tr_t'\| \le k\|\mathrm{vech}(r_tr_t')\|$. Hence, writing $\|R_t\| = \|\mathrm{vech}(r_tr_t')\|$,
$$E\Big[\big(1 + M\sup_t\|r_tr_t'\|\big)^4\big(\Psi_1 + \Psi_2\sup_t\|R_t\|\big)^4\Big] \le E\Big[\big(1 + kM\sup_t\|R_t\|\big)^4\big(\Psi_1 + \Psi_2\sup_t\|R_t\|\big)^4\Big] = E\Big(\sum_{i=0}^{8} a_i\|R_t\|^i\Big),$$
where the $a_i$ are constants. Since $r_t = \Sigma_t^{1/2}\epsilon_t$, where the $\epsilon_t$ follow a normal distribution, $r_t$ admits finite moments up to order 16. Hence, $E\|R_t\|^i < \infty$ for $i = 0, \ldots, 8$, so $E(\sum_{i=0}^{8} a_i\|R_t\|^i) < \infty$ and $E[|\partial l_t(\theta_0)/\partial\theta_i|^4] < \infty$.
Next, we check (c) and (d). (c) is clear, as noted before. By (III) in Lemma 1, the derivative of $H_{U_0,T}(\theta)$ is bounded. By the mean-value theorem,
$$\mathrm{vec}\big(H_{U_0,T}(\theta^{(1)},0) - H_{U_0,T}(\theta^{(2)},0)\big) = \frac{\partial H_{U_0,T}(\theta,0)}{\partial\theta}\Big|_{\theta=\theta^*}\cdot\big(\theta^{(1)} - \theta^{(2)}\big),$$
where $\theta^*$ lies between $\theta^{(1)}$ and $\theta^{(2)}$. Hence,
$$\big\|H_{U_0,T}(\theta^{(1)},0) - H_{U_0,T}(\theta^{(2)},0)\big\| \le \big\|\mathrm{vec}\big(H_{U_0,T}(\theta^{(1)},0) - H_{U_0,T}(\theta^{(2)},0)\big)\big\| \le \Big\|\frac{\partial H_{U_0,T}(\theta,0)}{\partial\theta}\Big|_{\theta=\theta^*}\Big\|\cdot\big\|\theta^{(1)} - \theta^{(2)}\big\| \le \tilde{K}\big\|\theta^{(1)} - \theta^{(2)}\big\|,$$
where $\tilde{K}$ is bounded by (iii) in Proposition 3; hence, $\tilde{K} = O_p(1)$.
Next, we verify (e) with $\beta = \delta_0/2$. For every $i \in \{1, \ldots, p\}$, it is sufficient to show that $\max_{\|v\|=1}\big|(H_{i1,T}^0, \ldots, H_{iq,T}^0)\,v\big| = O_p(T^{\delta_0/2})$ for $v \in \mathbb{R}^q$. Using the Cauchy–Schwarz inequality and the properties of the norm, the left-hand side is bounded by $\|(H_{i1,T}^0, \ldots, H_{iq,T}^0)\| \le q^{1/2}\max_{1\le j\le q}|H_{ij,T}^0|$. Since, by (I) and (II) in Lemma 1, $H_{ij,T}^0 = O_p(1)$ and $q = O(T^{\delta_0})$, the result follows. □

Appendix A.3. Proof of Lemma 1

First, consider the PQL $Q_T(\theta)$, as defined in (5), on the constrained $\|\hat\theta\|_0$-dimensional subspace $\mathcal{S} := \{\theta \in \mathbb{R}^p : \theta_{\hat{U}^c} = 0\}$ of $\mathbb{R}^p$, where $\theta_{\hat{U}^c}$ denotes the subvector of $\theta$ formed by the components indexed by $\hat{U}^c$. It follows from (12) that $Q_T(\theta)$ is strictly concave in a ball $\mathcal{N}_0 \subset \mathcal{S}$ centered at $\hat\theta$. This, along with (10), entails that $\hat\theta$, as a critical point of $Q_T(\theta)$ in $\mathcal{S}$, is the unique maximizer of $Q_T(\theta)$ in $\mathcal{N}_0$.
Now, we show that $\hat\theta$ is indeed a strict local maximizer of $Q_T(\theta)$ on the whole space $\mathbb{R}^p$. Take a small ball $\mathcal{N}_1 \subset \mathbb{R}^p$ centered at $\hat\theta$ such that $\mathcal{N}_1 \cap \mathcal{S} \subset \mathcal{N}_0$. We then need to show that $Q_T(\hat\theta) > Q_T(\gamma_1)$ for any $\gamma_1 \in \mathcal{N}_1 \setminus \mathcal{N}_0$. Let $\gamma_2$ be the projection of $\gamma_1$ onto $\mathcal{S}$, so that $\gamma_2 \in \mathcal{N}_0$. Thus, it suffices to prove that $Q_T(\gamma_2) > Q_T(\gamma_1)$. By the mean value theorem, we have
$$Q_T(\gamma_1) - Q_T(\gamma_2) = \frac{\partial Q_T(\gamma_0)}{\partial\gamma'}(\gamma_1 - \gamma_2),$$
where the vector $\gamma_0$ lies between $\gamma_1$ and $\gamma_2$. Note that the components of $\gamma_1 - \gamma_2$ are zero for indices in $\hat{U}$, and $\mathrm{sgn}(\gamma_{0j}) = \mathrm{sgn}(\gamma_{1j})$ for $j \in \hat{U}^c$. Therefore, we have
$$\frac{\partial Q_T(\gamma_0)}{\partial\gamma'}(\gamma_1 - \gamma_2) = S_T(\gamma_0)'(\gamma_1 - \gamma_2) - \lambda T\,[\mathrm{sgn}(\gamma_0)]'(\gamma_1 - \gamma_2) = S_{\hat{U}^c,T}(\gamma_0)'\,\gamma_{1\hat{U}^c} - \lambda T\sum_{j\in\hat{U}^c}|\gamma_{1j}|, \quad \text{(A2)}$$
where $\gamma_{1\hat{U}^c}$ is the subvector of $\gamma_1$ formed by the components indexed by $\hat{U}^c$. By (10), there exists some $\delta > 0$ such that, for any $\theta$ in a ball in $\mathbb{R}^p$ centered at $\hat\theta$ with radius $\delta$,
$$\|S_{\hat{U}^c,T}(\theta)\|_\infty < \lambda T. \quad \text{(A3)}$$
We further shrink the radius of the ball $\mathcal{N}_1$ to less than $\delta$, so that $|\gamma_{0j}| \le |\gamma_{1j}| < \delta$ for $j \in \hat{U}^c$ and (A3) holds for any $\theta \in \mathcal{N}_1$. Since $\gamma_0 \in \mathcal{N}_1$, it follows from (A3) that (A2) is strictly less than
$$\lambda T\,\|\gamma_{1\hat{U}^c}\|_1 - \lambda T\,\|\gamma_{1\hat{U}^c}\|_1 = 0.$$
Indeed, since $\|S_{\hat{U}^c,T}(\gamma_0)\|_\infty < \lambda T$, we have $S_{\hat{U}^c,T}(\gamma_0)'\,\gamma_{1\hat{U}^c} < \lambda T\,\|\gamma_{1\hat{U}^c}\|_1$ whenever $\gamma_{1\hat{U}^c} \neq 0$, and $\lambda T\sum_{j\in\hat{U}^c}|\gamma_{1j}| = \lambda T\,\|\gamma_{1\hat{U}^c}\|_1$. Hence, $\frac{\partial Q_T(\gamma_0)}{\partial\gamma'}(\gamma_1 - \gamma_2) < 0$ and $Q_T(\gamma_1) < Q_T(\gamma_2)$. □

Appendix A.4. Proof of Lemma 2

A Marcinkiewicz–Zygmund inequality for martingales () states that
$$E\Big|\sum_{t=1}^T w_t\Big|^m \le \{4m(m-1)\}^{m/2}\, T^{(m-2)/2}\sum_{t=1}^T E|w_t|^m$$
holds for $m > 2$. Because $E|w_t|^m \le C_w$ for all $t$, we have
$$T^{-m/2}\,E\Big|\sum_{t=1}^T w_t\Big|^m \le \{4m(m-1)\}^{m/2}\, T^{-1}\sum_{t=1}^T E|w_t|^m \le \{4m(m-1)\}^{m/2}\, C_w.$$
Thus, the result follows. □

Appendix A.5. Proof for Theorem 1

For notational simplicity, we write, for example, $Q_T\big(((\theta_{U_0})', (\theta_{U_0^c})')'\big)$ as $Q_T(\theta_{U_0}, \theta_{U_0^c})$. Consider the events
$$E_{T1} = \big\{\|S_{U_0,T}^0\|_\infty \le (q^{1/2}/T)^{1/2}\log^{1/4}T\big\}, \qquad E_{T2} = \big\{\|S_{U_0^c,T}^0\|_\infty \le \lambda\,\log^{-1}T\big\},$$
where $q = O(T^{\delta_0})$ and $\lambda = O(T^{-\alpha})$. It follows from Bonferroni's inequality and Markov's inequality, together with Proposition 4(i), that
$$\begin{aligned} P(E_{T1}\cap E_{T2}) &\ge 1 - \sum_{i\in U_0} P\big(|T^{1/2}S_{i,T}^0| > q^{1/4}(\log T)^{1/4}\big) - \sum_{i\in U_0^c} P\big(|T^{1/2}S_{i,T}^0| > T^{1/2-\alpha}(\log T)^{-1}\big) \\ &\ge 1 - \frac{q\max_{i\in U_0}E\big(|T^{1/2}S_{i,T}^0|^4\big)}{q\log T} - \frac{(p-q)\max_{i\in U_0^c}E\big(|T^{1/2}S_{i,T}^0|^4\big)(\log T)^4}{T^{4(1/2-\alpha)}} \\ &= 1 - O(\log^{-1}T) - O\big(T^{\delta - 4(1/2-\alpha)}(\log T)^4\big), \quad \text{(A6)} \end{aligned}$$
where the last two terms are $o(1)$ because of the condition $\delta < 4(1/2 - \alpha)$. On the event $E_{T1}\cap E_{T2}$, we will show that there exists a solution $\hat\theta \in \mathbb{R}^p$ to (10)–(12) with $\mathrm{sgn}(\hat\theta) = \mathrm{sgn}(\theta_0)$ and $\|\hat\theta - \theta_0\|_\infty = O(T^{-\gamma}\log T)$ for some $\gamma \in (0, 1/2]$.
First, we prove that, for sufficiently large $T$, Equation (10) has a solution $\hat\theta_{U_0}$ inside the hypercube $\mathcal{N} = \{\theta_{U_0} \in \mathbb{R}^q : \|\theta_{U_0} - \theta_{U_0}^0\|_\infty \le T^{-\gamma}\log T\}$ when we take $\hat{U} = U_0$. Define the function $\Psi: \mathbb{R}^q \to \mathbb{R}^q$ by
$$\Psi(\theta_{U_0}) = S_{U_0,T}(\theta_{U_0}, 0) - \lambda\,\mathrm{sgn}(\theta_{U_0}). \quad \text{(A7)}$$
Then, (10) is equivalent to $\Psi(\hat\theta_{U_0}) = 0$. To show that the solution is in the hypercube $\mathcal{N}$, we expand $\Psi(\theta_{U_0})$ around $\theta_{U_0}^0$. Function (A7) can be written as
$$\begin{aligned} \Psi(\theta_{U_0}) &= S_{U_0,T}^0 + H_{U_0,T}(\theta_{U_0}^*, 0)(\theta_{U_0} - \theta_{U_0}^0) - \lambda\,\mathrm{sgn}(\theta_{U_0}) \\ &= H_{U_0,T}^0(\theta_{U_0} - \theta_{U_0}^0) + \big[S_{U_0,T}^0 - \lambda\,\mathrm{sgn}(\theta_{U_0})\big] + \big[H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0\big](\theta_{U_0} - \theta_{U_0}^0) \\ &= H_{U_0,T}^0(\theta_{U_0} - \theta_{U_0}^0) + v_T + w_T, \quad \text{(A8)} \end{aligned}$$
where $\theta_{U_0}^*$ lies on the line segment joining $\theta_{U_0}$ and $\theta_{U_0}^0$. Since the matrix $H_{U_0,T}^0$ is invertible by Proposition 4(ii), (A8) can be rewritten as
$$\tilde\Psi(\theta_{U_0}) := (H_{U_0,T}^0)^{-1}\Psi(\theta_{U_0}) = \theta_{U_0} - \theta_{U_0}^0 + (H_{U_0,T}^0)^{-1}v_T + (H_{U_0,T}^0)^{-1}w_T = \theta_{U_0} - \theta_{U_0}^0 + \tilde{v}_T + \tilde{w}_T. \quad \text{(A9)}$$
We now derive bounds for the last two terms in (A9), considering $\tilde{v}_T$ first. For any $\theta_{U_0} \in \mathcal{N}$,
$$\min_{j\in U_0}|\theta_j| \ge \min_{j\in U_0}|\theta_j^0| - T^{-\gamma}\log T \ge d_T - T^{-\gamma}\log T > 0$$
by Condition 3(ii), and $\mathrm{sgn}(\theta_{U_0}) = \mathrm{sgn}(\theta_{U_0}^0)$. Using Condition 3(i), we have
$$\|\lambda\,\mathrm{sgn}(\theta_{U_0})\| = \lambda\, q^{1/2} = o\big(q^{-1/2}T^{-\gamma}\log T\big).$$
This, along with the properties of matrix norms and Proposition 4(ii), entails that, on the event $E_{T1}$,
$$\begin{aligned} \|\tilde{v}_T\| &= \big\|(H_{U_0,T}^0)^{-1}\big[S_{U_0,T}^0 - \lambda\,\mathrm{sgn}(\theta_{U_0})\big]\big\| \le q^{1/2}\big\|(H_{U_0,T}^0)^{-1}\big\|\big(\|S_{U_0,T}^0\|_\infty + \|\lambda\,\mathrm{sgn}(\theta_{U_0})\|\big) \\ &\le q^{1/2}\,O_p(1)\big((q^{1/2}/T)^{1/2}\log^{1/4}T + o(q^{-1/2}T^{-\gamma}\log T)\big) = o_p(T^{-\gamma}\log T), \quad \text{(A11)} \end{aligned}$$
where the last equality follows from $q = O(T^{\delta_0})$ and $\delta_0 < \frac{2}{3}(1 - 2\gamma)$. Next, we consider $\tilde{w}_T$. By the properties of norms and Proposition 4(ii),(iii),
$$\begin{aligned} \|\tilde{w}_T\| &= \big\|(H_{U_0,T}^0)^{-1}\big[H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0\big](\theta_{U_0} - \theta_{U_0}^0)\big\| \le q^{1/2}\big\|(H_{U_0,T}^0)^{-1}\big\|\cdot\big\|\big[H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0\big](\theta_{U_0} - \theta_{U_0}^0)\big\| \\ &\le q\,O_p(1)\,\big\|H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0\big\|\cdot\|\theta_{U_0} - \theta_{U_0}^0\|_\infty \le q\,O_p(1)\,K_T\,\|\theta_{U_0}^* - \theta_{U_0}^0\|\cdot\|\theta_{U_0} - \theta_{U_0}^0\|_\infty. \end{aligned}$$
Since $K_T = O_p(1)$, $q = O(T^{\delta_0})$ with $\delta_0 < \gamma$, and $\|\theta_{U_0} - \theta_{U_0}^0\|_\infty \le T^{-\gamma}\log T$ on $\mathcal{N}$,
$$\|\tilde{w}_T\| = q\,O_p\big(T^{-2\gamma}(\log T)^2\big) = o_p(T^{-\gamma}\log T). \quad \text{(A12)}$$
By (A9), (A11), and (A12), for sufficiently large $T$ and all $i \in U_0$,
$$\tilde\Psi_i(\theta_{U_0}) \ge T^{-\gamma}\log T - \|\tilde{v}_T\| - \|\tilde{w}_T\| \ge 0 \quad \text{if } \theta_i - \theta_i^0 = T^{-\gamma}\log T, \quad \text{(A13)}$$
and
$$\tilde\Psi_i(\theta_{U_0}) \le -T^{-\gamma}\log T + \|\tilde{v}_T\| + \|\tilde{w}_T\| \le 0 \quad \text{if } \theta_i - \theta_i^0 = -T^{-\gamma}\log T. \quad \text{(A14)}$$
By the continuity of $\tilde\Psi$ and the inequalities (A13) and (A14), an application of Miranda's existence theorem tells us that $\tilde\Psi(\theta_{U_0}) = 0$ has a solution $\hat\theta_{U_0}$ in $\mathcal{N}$. Clearly, $\hat\theta_{U_0}$ also solves the equation $\Psi(\theta_{U_0}) = 0$, in view of the first equality in (A8). Thus, we have shown that (10) indeed has a solution in $\mathcal{N}$.
Second, let $\hat\theta = (\hat\theta_{U_0}', \hat\theta_{U_0^c}')' \in \mathbb{R}^p$, with $\hat\theta_{U_0} \in \mathcal{N}$ a solution to (10) and $\hat\theta_{U_0^c} = 0$. Next, we show that $\hat\theta$ satisfies (11) on the event $E_{T2}$. By the triangle inequality and the mean value theorem, we have
$$\lambda^{-1}\|S_{U_0^c,T}(\hat\theta)\|_\infty \le \lambda^{-1}\|S_{U_0^c,T}^0\|_\infty + \lambda^{-1}\|S_{U_0^c,T}(\hat\theta) - S_{U_0^c,T}^0\|_\infty \le (\log T)^{-1} + \lambda^{-1}\big\|(\partial/\partial\theta_{U_0}')\,S_{U_0^c,T}(\hat\theta_{U_0}^{**}, 0)\,(\hat\theta_{U_0} - \theta_{U_0}^0)\big\|_\infty, \quad \text{(A15)}$$
where $\hat\theta_{U_0}^{**}$ lies on the line segment joining $\hat\theta_{U_0}$ and $\theta_{U_0}^0$. The first term of the upper bound in (A15) is negligible, so it suffices to show that the second term is less than $g(0^+) = 1$. Since $\hat\theta_{U_0}$ solves the equation $\Psi(\theta_{U_0}) = 0$ in (10), we obtain
$$S_{U_0,T}^0 + H_{U_0,T}(\hat\theta_{U_0}^*, 0)(\hat\theta_{U_0} - \theta_{U_0}^0) - \lambda\,\mathrm{sgn}(\hat\theta_{U_0}) = 0,$$
with $\hat\theta_{U_0}^*$ lying between $\hat\theta_{U_0}$ and $\theta_{U_0}^0$. By Proposition 4(ii),(iii) and Condition 1, the last term in (A15) can be expressed as
$$\begin{aligned} &\lambda^{-1}\big\|(\partial/\partial\theta_{U_0}')\,S_{U_0^c,T}(\hat\theta_{U_0}^{**}, 0)\,\big[H_{U_0,T}(\hat\theta_{U_0}^*, 0)\big]^{-1}\big[S_{U_0,T}^0 - \lambda\,\mathrm{sgn}(\hat\theta_{U_0})\big]\big\|_\infty \\ &\quad\le \lambda^{-1}\sup_{\theta,\theta'\in\mathcal{N}}\big\|(\partial/\partial\theta_{U_0}')\,S_{U_0^c,T}(\theta, 0)\,\big[H_{U_0,T}(\theta', 0)\big]^{-1}\big\|_\infty\,\big(\|S_{U_0,T}^0\|_\infty + \lambda\big) \\ &\quad\le \lambda^{-1}c\,\big[(q^{1/2}/T)^{1/2}\log^{1/4}T + \lambda\big] = \lambda^{-1}c\,(q^{1/2}/T)^{1/2}\log^{1/4}T + c. \quad \text{(A16)} \end{aligned}$$
By Condition 3(i), the first term in the last line of (A16) is $o_p(1)$; hence, (A16) is eventually less than 1. This verifies (11).
Finally, (12) is guaranteed by Lemma 1: on the event $E_{T1}\cap E_{T2}$, $\hat\theta$ is a strict local maximizer of $Q_T(\theta)$ with $\|\hat\theta - \theta_0\|_\infty = O(T^{-\gamma}\log T)$ and $\hat\theta_{U_0^c} = 0$. Thus, by (A6), the proofs of Theorem 1(a) and (b) are complete. □

References

  1. Aielli, Gian Piero. 2013. Dynamic conditional correlation: On properties and estimation. Journal of Business and Economic Statistics 31: 282–99. [Google Scholar] [CrossRef]
  2. Alexander, Carol. 2000. Orthogonal methods for generating large positive semi-definite covariance matrices. In ICMA Centre Discussion Papers in Finance icma-dp2000-06. London: Henley Business School, Reading University. [Google Scholar]
  3. Ampountolas, Apostolos. 2022. Cryptocurrencies intraday high-frequency volatility spillover effects using univariate and multivariate GARCH models. International Journal of Financial Studies 10: 51. [Google Scholar] [CrossRef]
  4. Apergis, Nicholas, and Anthony Rezitis. 2001. Asymmetric cross-market volatility spillovers: Evidence from daily data on equity and foreign exchange markets. The Manchester School 69: 81–96. [Google Scholar] [CrossRef]
  5. Apergis, Nicholas, and Anthony Rezitis. 2003. An examination of Okun's law: Evidence from regional areas in Greece. Applied Economics 35: 1147–51. [Google Scholar] [CrossRef]
  6. Baillie, Richard T., and Tim Bollerslev. 1990. A multivariate generalized ARCH approach to modeling risk premia in forward foreign exchange rate markets. Journal of International Money and Finance 9: 309–24. [Google Scholar] [CrossRef]
  7. Basu, Sumanta, and George Michailidis. 2015. Regularized estimation in sparse high-dimensional time series models. The Annals of Statistics 43: 1535–67. [Google Scholar] [CrossRef]
  8. Bauwens, Luc, and Sébastien Laurent. 2005. A new class of multivariate skew densities, with application to generalized autoregressive conditional heteroscedasticity models. Journal of Business and Economic Statistics 23: 346–54. [Google Scholar] [CrossRef]
  9. Bickel, Peter J., and Elizaveta Levina. 2008. Covariance regularization by thresholding. The Annals of Statistics 36: 2577–604. [Google Scholar] [CrossRef] [PubMed]
  10. Billio, Monica, Massimiliano Caporin, Lorenzo Frattarolo, and Loriana Pelizzon. 2023. Networks in risk spillovers: A multivariate GARCH perspective. Econometrics and Statistics 28: 1–29. [Google Scholar] [CrossRef]
  11. Bollerslev, Tim. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–27. [Google Scholar] [CrossRef]
  12. Bollerslev, Tim. 1990. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. The Review of Economics and Statistics 72: 498–505. [Google Scholar] [CrossRef]
  13. Bollerslev, Tim, Robert Engle, and Jeffrey Wooldridge. 1988. A capital asset pricing model with time-varying covariances. Journal of Political Economy 96: 116–31. [Google Scholar] [CrossRef]
  14. Boudt, Kris, Jon Danielsson, and Sébastien Laurent. 2013. Robust forecasting of dynamic conditional correlation GARCH models. International Journal of Forecasting 29: 244–57. [Google Scholar] [CrossRef]
  15. Brodie, Joshua, Ingrid Daubechies, Christine De Mol, Domenico Giannone, and Ignace Loris. 2009. Sparse and stable markowitz portfolios. Proceedings of the National Academy of Sciences of the United States of America 106: 12267–72. [Google Scholar] [CrossRef]
  16. Cai, Tony, and Weidong Liu. 2011. Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association 106: 672–84. [Google Scholar] [CrossRef]
  17. Christiansen, Charlotte. 2007. Volatility-Spillover Effects in European Bond Markets. European Financial Management 13: 923–948. [Google Scholar] [CrossRef]
  18. Comte, Fabienne, and Offer Lieberman. 2003. Asymptotic theory for multivariate GARCH processes. Journal of Multivariate Analysis 84: 61–84. [Google Scholar] [CrossRef]
  19. DeMiguel, Victor, Lorenzo Garlappi, and Raman Uppal. 2007. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? The Review of Financial Studies 22: 1915–53. [Google Scholar] [CrossRef]
  20. Diebold, Francis X., and Kamil Yilmaz. 2009. Measuring financial asset return and volatility spillovers, with application to global equity markets. Economic Journal 119: 158–71. [Google Scholar] [CrossRef]
  21. Di Lorenzo, David, Giampaolo Liuzzi, Francesco Rinaldi, Fabio Schoen, and Marco Sciandrone. 2012. A concave optimization-based approach for sparse portfolio selection. Optimization Methods and Software 27: 983–1000. [Google Scholar] [CrossRef]
  22. Efron, Bradley, Trevor Hastie, and Robert Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–499. [Google Scholar] [CrossRef]
  23. Engle, Robert. 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50: 987–1007. [Google Scholar] [CrossRef]
  24. Engle, Robert. 1990. Asset pricing with a factor-ARCH covariance structure: Empirical estimates for Treasury bills. Journal of Econometrics 45: 213–37. [Google Scholar] [CrossRef]
  25. Engle, Robert. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business and Economic Statistics 20: 339–50. [Google Scholar] [CrossRef]
  26. Engle, Robert, and Kenneth Kroner. 1995. Multivariate simultaneous generalized ARCH. Econometric Theory 11: 122–50. [Google Scholar] [CrossRef]
  27. Engle, Robert, and Riccardo Colacito. 2006. Testing and valuing dynamic correlations for asset allocation. Journal of Business and Economic Statistics 24: 238–53. [Google Scholar] [CrossRef]
  28. Engle, Robert, Olivier Ledoit, and Michael Wolf. 2019. Large dynamic covariance matrices. Journal of Business and Economic Statistics 37: 363–75. [Google Scholar] [CrossRef]
  29. Engle, Robert, Takatoshi Ito, and Wen-Ling Lin. 1990. Meteor showers or heat waves? Heteroskedastic intra-daily volatility in the foreign exchange market. Econometrica 58: 525–42. [Google Scholar] [CrossRef]
30. Fan, Jianqing, and Jinchi Lv. 2011. Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory 57: 5467–84. [Google Scholar] [CrossRef]
  31. Fan, Jianqing, and Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96: 1348–60. [Google Scholar] [CrossRef]
  32. Fan, Yingying, and Cheng Yong Tang. 2013. Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society Series B: Statistical Methodology 75: 531–52. [Google Scholar] [CrossRef]
  33. Fastrich, Björn, Sandra Paterlini, and Peter Winker. 2015. Constructing optimal sparse portfolios using regularization methods. Computational Management Science 12: 417–34. [Google Scholar] [CrossRef]
  34. Francq, Christian, and Jean-Michel Zakoian. 2019. GARCH Models: Structure, Statistical Inference and Financial Applications. Hoboken: John Wiley & Sons. [Google Scholar]
  35. Friedman, Jerome, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. Pathwise coordinate optimization. The Annals of Applied Statistics 1: 302–32. [Google Scholar] [CrossRef]
  36. Giacometti, Rosella, Gabriele Torri, Kamonchai Rujirarangsan, and Michela Cameletti. 2023. Spatial Multivariate GARCH Models and Financial Spillovers. Journal of Risk and Financial Management 16: 397. [Google Scholar] [CrossRef]
  37. Hamao, Yasushi, Ronald W. Masulis, and Victor Ng. 1990. Correlations in price changes and volatility across international stock markets. The Review of Financial Studies 3: 281–307. [Google Scholar] [CrossRef]
  38. Hafner, Christian M., and Arie Preminger. 2009. Asymptotic theory for a factor GARCH model. Econometric Theory 25: 336–63. [Google Scholar] [CrossRef]
  39. Hafner, Christian M., Helmut Herwartz, and Simone Maxand. 2022. Identification of structural multivariate GARCH models. Journal of Econometrics 227: 212–27. [Google Scholar] [CrossRef]
40. Hassan, Syed Aun, and Farooq Malik. 2007. Multivariate GARCH modeling of sector volatility transmission. The Quarterly Review of Economics and Finance 47: 470–80. [Google Scholar] [CrossRef]
  41. Hong, Junping, Yi Yan, Ercan Engin Kuruoglu, and Wai Kin Chan. 2023. Multivariate Time Series Forecasting With GARCH Models on Graphs. IEEE Transactions On Signal And Information Processing Over Networks 9: 557–68. [Google Scholar] [CrossRef]
  42. Kaltenhäuser, Bernd. 2002. Return and Volatility Spillovers to Industry Returns: Does EMU Play a Role? CFS Working Paper Series 2002/05. Frankfurt a. M.: Center for Financial Studies (CFS). [Google Scholar]
  43. Lam, Clifford, and Jianqing Fan. 2009. Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics 37: 4254–78. [Google Scholar] [CrossRef] [PubMed]
44. Lanne, Markku, and Pentti Saikkonen. 2007. A multivariate generalized orthogonal factor GARCH model. Journal of Business and Economic Statistics 25: 61–75. [Google Scholar]
  45. Ledoit, Olivier, and Michael Wolf. 2004. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88: 365–411. [Google Scholar] [CrossRef]
  46. Ledoit, Olivier, and Michael Wolf. 2012. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics 40: 1024–60. [Google Scholar] [CrossRef]
47. Ling, Shiqing, and Michael McAleer. 2003. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19: 280–310. [Google Scholar] [CrossRef]
  48. Markowitz, Harry. 1952. Portfolio selection. The Journal of Finance 7: 77–91. [Google Scholar]
49. McAleer, Michael, Suhejla Hoti, and Felix Chan. 2009. Structure and asymptotic theory for multivariate asymmetric conditional volatility. Econometric Reviews 28: 422–40. [Google Scholar] [CrossRef]
  50. NASDAQ Stock Symbols. n.d. Stock Symbol. Available online: https://www.nasdaq.com/market-activity/stocks/ (accessed on 24 January 2024).
  51. Nicholson, William B., David S. Matteson, and Jacob Bien. 2017. VARX-L: Structured regularization for large vector autoregressions with exogenous variables. International Journal of Forecasting 33: 627–51. [Google Scholar] [CrossRef]
  52. Pan, Ming-Shiun, and L. Paul Hsueh. 1998. Transmission of stock returns and volatility between the U.S. and Japan: Evidence from the stock index futures markets. Asia-Pacific Financial Markets 5: 211–25. [Google Scholar] [CrossRef]
  53. Poignard, Benjamin. 2017. New Approaches for High-Dimensional Multivariate Garch Models. General Mathematics [math.GM]. Ph.D. thesis, Université Paris Sciences et Lettres, Paris, France. [Google Scholar]
54. Ravikumar, Pradeep, Martin J. Wainwright, Garvesh Raskutti, and Bin Yu. 2011. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics 5: 935–80. [Google Scholar] [CrossRef]
  55. Rio, Emmanuel. 2017. Asymptotic Theory of Weakly Dependent Random Processes. Berlin: Springer Nature. [Google Scholar]
  56. Sánchez García, Javier, and Salvador Cruz Rambaud. 2022. Machine Learning Regularization Methods in High-Dimensional Monetary and Financial VARs. Mathematics 10: 877. [Google Scholar] [CrossRef]
  57. Shiferaw, Yegnanew A. 2019. Time-varying correlation between agricultural commodity and energy price dynamics with Bayesian multivariate DCC-GARCH models. Physica A: Statistical Mechanics and Its Applications 526: 120807. [Google Scholar] [CrossRef]
  58. Siddiqui, Taufeeque Ahmad, and Mazia Fatima Khan. 2018. Analyzing spillovers in international stock markets: A multivariate GARCH approach. IMJ 10: 57–63. [Google Scholar]
  59. Sun, Wei, Junhui Wang, and Yixin Fang. 2013. Consistent selection of tuning parameters via variable selection stability. Journal of Machine Learning Research 14: 3419–40. [Google Scholar]
  60. Sun, Yan, and Xiaodong Lin. 2011. Regularization for stationary multivariate time series. Quantitative Finance 12: 573–86. [Google Scholar] [CrossRef]
  61. Theodossiou, Panayiotis, and Unro Lee. 1993. Mean and volatility spillovers across major national stock markets: Further empirical evidence. The Journal of Financial Research 16: 337–50. [Google Scholar] [CrossRef]
  62. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58: 267–88. [Google Scholar] [CrossRef]
63. Tse, Yiu Kuen, and Albert K. C. Tsui. 2002. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business and Economic Statistics 20: 351–62. [Google Scholar]
  64. Uematsu, Yoshimasa. 2015. Penalized likelihood estimation in high-dimensional time series models and its application. arXiv arXiv:1504.06706. [Google Scholar]
65. van der Weide, Roy. 2002. GO-GARCH: A multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics 17: 549–64. [Google Scholar] [CrossRef]
66. Vrontos, Ioannis, Petros Dellaportas, and Dimitris N. Politis. 2003. A full-factor multivariate GARCH model. The Econometrics Journal 6: 312–34. [Google Scholar] [CrossRef]
  67. Wang, Hansheng, Bo Li, and Chenlei Leng. 2009. Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 71: 671–83. [Google Scholar] [CrossRef]
68. Worthington, Andrew, and Helen Higgs. 2004. Transmission of equity returns and volatility in Asian developed and emerging markets: A multivariate GARCH analysis. International Journal of Finance & Economics 9: 71–80. [Google Scholar]
  69. Wu, Tong Tong, and Kenneth Lange. 2008. Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics 2: 224–44. [Google Scholar] [CrossRef]
  70. Yuan, Ming, and Yi Lin. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68: 49–67. [Google Scholar] [CrossRef]
  71. Zhang, Cun-Hui. 2010. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38: 894–942. [Google Scholar] [CrossRef]
  72. Zhang, Yongli, and Yuhong Yang. 2015. Cross-validation for selecting a model selection procedure. Journal of Econometrics 187: 95–112. [Google Scholar] [CrossRef]
  73. Zhao, Peng, and Bin Yu. 2006. On model selection consistency of lasso. Journal of Machine Learning Research 7: 2541–67. [Google Scholar]
  74. Zhao, Peng, and Bin Yu. 2007. Stagewise lasso. Journal of Machine Learning Research 8: 2701–26. [Google Scholar]
  75. Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101: 1418–29. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.