1. Introduction
Modeling the dynamics of high-dimensional variance–covariance matrices is a challenging problem in high-dimensional time series analysis with wide applications in financial econometrics. Classical time series models for variance–covariance matrices assume that the number of component time series is small relative to the number of observed samples. However, many financial and economic applications now require modeling the dynamics of high-dimensional variance–covariance matrices. For example, in modern portfolio management, the number of assets can easily run into the thousands and be larger than, or of the same order as, the number of observed historical prices of the assets; in analyzing the movements of financial markets for different products in different countries, it is critical to understand the interdependence and contagion effects of price movements across thousands of markets, while jointly observed financial data span only decades.
In this paper, we propose an inference procedure with regularization for high-dimensional BEKK representations and obtain a class of penalized quasi-maximum likelihood (PQML) estimators. The regularization allows us to identify important parameters and shrink the non-essential ones to zero, hence providing an estimate of sparse parameters in BEKK representations. Under some regularity conditions, we establish some theoretical properties, such as the sparsity and the consistency, of the PQML estimator for BEKK representations. The proposed procedure is a fairly general framework that can be applied to a large class of high-dimensional MGARCH models; by applying our regularization techniques, the complexity of making inferences from high-dimensional MGARCH models can be greatly reduced and the intrinsic sparse model structures can be uncovered. We carried out simulation studies to show the performance of the proposed inference framework and the procedure for selecting tuning parameters. In addition, we applied the proposed framework to analyze volatility spillover and portfolio optimization problems, using daily prices of 18 U.S. stocks from January 2016 to January 2018. In the comparison of portfolio optimization based on different MGARCH models, we show that the proposed framework outperforms three benchmark models, i.e., the constant covariance model, the factor MGARCH model, and the dynamic conditional correlation model.
The proposed framework can be viewed as extending the literature on regularization techniques from high-dimensional linear models to nonlinear time series models. Since
Tibshirani (
1996) introduced LASSO for linear regression models, various regularization techniques concerning high-dimensional statistical inference have been studied for various problems in linear models. For example,
Fan and Li (
2001) proposed the smoothly clipped absolute deviation (SCAD) penalty that generates sparse estimation of regression coefficients with reduced bias and explored the so-called “oracle property”, in which the estimator has asymptotic properties that are equivalent to the maximum likelihood estimator in the non-penalized model.
Zou (
2006) proposed the adaptive LASSO, which adds adaptive weights for different parameters in the $\ell_1$ penalty to obtain better estimator performance.
Yuan and Lin (
2006) proposed a group LASSO penalty to solve the problem of selecting grouped factors in regression models.
Zhang (
2010) proposed a minimax concave penalty that gives nearly unbiased variable selection in linear regression. In addition to discussions on regularized estimation in high-dimensional statistics, which relies primarily on independent and identically distributed (i.i.d.) samples and linear models, regularization techniques have also been applied to study inference problems in high-dimensional linear time-series models. For instance,
Uematsu (
2015) studied a class of penalty functions and showed the oracle properties for the estimators in high-dimensional vector autoregressive (VAR) models.
Basu and Michailidis (
2015) investigated the theoretical properties of $\ell_1$-regularized estimates in high-dimensional stochastic regressions with serially correlated errors and transition matrix estimation in high-dimensional VAR models.
Sun and Lin (
2011) developed a regularization framework for full-factor MGARCH models (
Vrontos et al. 2003), in which the dynamics of covariance matrices are determined by the dynamics of univariate GARCH processes for orthogonal factors. Using the group LASSO technique,
Poignard (
2017) studied the inference problem for MGARCH models with vine structure, an alternative to dynamic conditional correlation MGARCH models.
The proposed regularization framework is also related to the problem of estimating
covariance matrices using various shrinkage and regularization methods. For instance,
Ledoit and Wolf (
2004) proposed an optimal linear shrinkage method to estimate constant covariance matrices of
p-dimensional i.i.d. vectors, and, later on,
Ledoit and Wolf (
2012) extended the method and developed nonlinear shrinkage estimators for high-dimensional covariance matrices.
Bickel and Levina (
2008) and
Cai and Liu (
2011) proposed covariance regularization procedures that are based on the thresholding of sample covariance matrices to estimate inverse covariance matrices.
Lam and Fan (
2009) studied sparsistency and rates of convergence for estimating covariance based on penalized likelihood with nonconcave penalties, and
Ravikumar et al. (
2011) estimated high-dimensional inverse covariance by minimizing $\ell_1$-penalized log-determinant divergence. This method is also called graphical LASSO and was studied in
Yuan and Lin (
2006) and
Friedman et al. (
2007). We note that all these discussions focus on high-dimensional constant covariance matrices; thus, they do not involve the dynamics of covariance matrices.
The remainder of the paper is organized as follows.
Section 2 provides a literature review of MGARCH models and their applications in volatility spillover.
Section 3 explains the BEKK model with $\ell_1$-penalty functions in detail. In
Section 4, we provide theoretical properties and implementation procedures for the regularized BEKK model. Simulation results and real data analysis are presented in
Section 5 and
Section 6, respectively.
Section 7 gives concluding remarks.
2. Literature Review
Inspired by the idea of univariate generalized autoregressive conditionally heteroskedastic (GARCH) models (
Bollerslev 1986;
Engle 1982;
Francq and Zakoian 2019;
Hafner et al. 2022), various multivariate GARCH (MGARCH) models were proposed to characterize the dynamics of covariance matrices during the last three decades. Among these MGARCH models, the Baba–Engle–Kraft–Kroner (BEKK) model (
Engle and Kroner 1995) uses a general specification to describe the dynamics of covariance matrices of an
n-dimensional multivariate time series. Since such a specification contains unknown parameters of order $O(n^2)$, inference on the BEKK model becomes complicated, even when $n$ is not very large. When $n$ increases at the same order as, or a larger order than, the length of the time series, inference on the MGARCH–BEKK representation becomes even more difficult due to “the curse of dimensionality”.
To reduce the complexity of inference procedures for unknown parameters in MGARCH models, other forms of MGARCH specifications were proposed to reduce the number of unknown parameters in the model. An important improvement to MGARCH models is the dynamic conditional correlation (DCC) model (
Aielli 2013;
Bauwens and Laurent 2005;
Boudt et al. 2013;
Engle 2002). The DCC model allows for time-varying conditional correlations and reduces the dimensionality by factorizing the conditional covariance matrix into the product of a diagonal matrix of conditional standard deviations and a correlation matrix that evolves dynamically over time. Other forms of MGARCH specifications make more assumptions on structures and dynamics of covariance matrices and include, for example, the MGARCH in mean model (
Bollerslev et al. 1988), the constant conditional correlation GARCH model (
Bollerslev 1990;
Ling and McAleer 2003;
McAleer et al. 2009), the time-varying conditional correlation MGARCH model (
Tse and Tsui 2002), the orthogonal factor MGARCH model (
Hafner and Preminger 2009;
Lanne and Saikkonen 2007), and so on. Although these MGARCH models provide relatively simple inference procedures, the assumptions on dynamics of covariance matrices are usually too specific to capture the complexity of dynamics of covariance matrices. Furthermore, these models still fail to address the issue of making inference on high-dimensional MGARCH models.
In addition to modeling the joint behavior of volatilities for a set of returns, another use of MGARCH models is to characterize volatility spillover in financial markets. Volatility spillover refers to the process and magnitude by which instability in one market affects other markets. It is widely observed in equity markets (
Hamao et al. 1990), bond markets (
Christiansen 2007), futures markets (
Pan and Hsueh 1998), exchange markets (
Baillie and Bollerslev 1990), markets of equities and exchanges (
Apergis and Rezitis 2001), various industries and commodities (
Apergis and Rezitis 2003;
Kaltenhäuser 2002), and so on. Understanding volatility spillover can provide an insight into financial vulnerabilities, as well as the source and nature of financial exposures, for academic researchers, financial practitioners, and regulatory authorities. For investors, as significant volatility spillover may increase non-systemic risk, understanding volatility spillover can help them diversify the risks associated with their investment. For financial sector regulators, understanding volatility spillover can help them formulate appropriate policies to maintain financial stability, especially when stress from a particular market is transmitted to other markets, such that the risk of systemic instability increases. MGARCH models are generally used to characterize volatility spillover in the markets, which are represented via a low-dimensional multivariate series; see
Hamao et al. (
1990),
Christiansen (
2007),
Pan and Hsueh (
1998),
Engle et al. (
1990), and
Baillie and Bollerslev (
1990). In particular,
Theodossiou and Lee (
1993) used a multivariate GARCH-in-mean model to study the economic spillover effect across five countries,
Worthington and Higgs (
2004) applied a BEKK(1,1) model to study transmission of weekly equity returns and volatility in nine Asian countries from 1988 to 2000, and
Hassan and Malik (
2007) employed the BEKK(1,1) specification to study three-dimensional US sector indices. Spillover effect has also been explored recently for other financial markets, such as cryptocurrency markets (
Billio et al. 2023) and European banks with GARCH models (
Giacometti et al. 2023). Recent studies have also investigated spillover effects using network representations derived from GARCH models (
Ampountolas 2022;
Hong et al. 2023).
The aforementioned studies on spillover effects rely on the foundational structures of the DCC model for analysis (
Ampountolas 2022;
Shiferaw 2019;
Siddiqui and Khan 2018). Although these MGARCH models provide relatively simple inference procedures, the assumptions on dynamics of covariance matrices are usually too specific to capture the complexity of the dynamics of covariance matrices. Moreover, these models still fail to address the issue of making inference on high-dimensional MGARCH models. Under these constraints, the performance and accuracy of these simplified MGARCH models need further investigation in real markets (
Engle and Colacito 2006).
3. The MGARCH–BEKK Representations with Regularization
We first introduce the following notation. Given a vector x and a matrix A, the ith component of x and the $(i,j)$th element of A are written as $x_i$ and $A_{ij}$, respectively. The jth column and the ith row vectors of A are denoted as and , respectively. $\|x\|_2$ is the Euclidean norm for vector x. $\|x\|_\infty$ is the largest element of x in the modulus. $\rho(A)$ is the spectral radius of A, i.e., the largest modulus of eigenvalues of A. $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ are the minimum and maximum eigenvalues of A, respectively. $\|A\|_2$ is the spectral norm, i.e., a square root of $\lambda_{\max}(A^\top A)$. $\|A\|_\infty$ represents the operator norm induced by $\|\cdot\|_\infty$, or the largest absolute row sum. For any matrix A and vector x such that is well defined, let . We use $\mathrm{sgn}(x)$ to denote the sign of $x$ if $x \neq 0$, and $0$ otherwise.
3.1. The MGARCH–BEKK Representation
Let
be the vector of returns on
n assets in period
t. Let
be i.i.d.
n-dimensional standard normal random vectors. Let
be the sigma field generated by the past information from
s. Then,
is measurable with respect to
; the distribution of
can be specified as
where
is an
identity matrix. Denote the conditional covariance matrix of
given
as
, i.e.,
.
Engle and Kroner (
1995) proposed the following BEKK
model to characterize the dynamics of
:
where
, and
are
parameter matrices,
C is an
triangular matrix, and the summation limit
K determines the generality of the process.
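For orientation, the general specification of Engle and Kroner (1995) is commonly written in the form below. The symbols are assumed notation chosen for this sketch, not necessarily the ones used in this paper: $r_t$ for the return vector, $\Sigma_t$ for the conditional covariance matrix, $A_{ik}$ and $B_{jk}$ for the $n \times n$ parameter matrices, and $(p, q)$ for the lag orders.

```latex
\Sigma_t = C C^\top
  + \sum_{k=1}^{K} \sum_{i=1}^{q} A_{ik}^\top \, r_{t-i} r_{t-i}^\top \, A_{ik}
  + \sum_{k=1}^{K} \sum_{j=1}^{p} B_{jk}^\top \, \Sigma_{t-j} \, B_{jk}
```

Every term on the right-hand side is positive semidefinite, so $\Sigma_t$ remains positive definite whenever $C C^\top$ is positive definite. With $K = 1$ and $p = q = 1$, the recursion reduces to $\Sigma_t = C C^\top + A^\top r_{t-1} r_{t-1}^\top A + B^\top \Sigma_{t-1} B$.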
To illustrate the idea, we consider BEKK(1,1) in our examples with
in this paper, which can be written as
in which
, and
C are real
matrices. And, without loss of generality, we choose
to be symmetric. For identification purposes,
Engle and Kroner (
1995) showed the following property for the BEKK model.
Proposition 1. Suppose that the diagonal elements in C and the $(1,1)$ elements $A_{11}$ and $B_{11}$ are positive. Then, there exists no other C, A, or B in Model (3) that gives an equivalent representation.
Let
vec and
vech be the vector operators that stack the columns of a matrix and the lower triangular part of a matrix, respectively. That is, if
then
and
Then, Model (
3) can be rewritten in a vector form:
in which
,
, and ⊗ is the Kronecker product. Since the covariance matrices
are symmetric, we can also write (
3) in the vector-half form:
where
,
, and
and
are matrices of dimension
extracting the upper triangular parts of symmetric matrices
and
. Note that dim
and dim
. For convenience, we denote
by the parameter vector in Model (
3), in which
, so that the matrices
C,
, and
are functions of
:
. And we denote by
the true parameter vector of the model.
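The equivalence between the matrix form and the vec form of a BEKK(1,1) update can be checked numerically. The sketch below assumes the standard recursion $\Sigma_t = CC^\top + A^\top r_{t-1} r_{t-1}^\top A + B^\top \Sigma_{t-1} B$ and the identity $\mathrm{vec}(A^\top X A) = (A \otimes A)^\top \mathrm{vec}(X)$; the function names `vec` and `bekk11_step` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")

def bekk11_step(C, A, B, r_prev, sigma_prev):
    """One BEKK(1,1) update in matrix form: C C' + A' r r' A + B' Sigma B."""
    return C @ C.T + A.T @ np.outer(r_prev, r_prev) @ A + B.T @ sigma_prev @ B

rng = np.random.default_rng(0)
n = 3
# Illustrative parameter values (assumptions, not taken from the paper).
C = np.tril(rng.uniform(0.1, 0.5, size=(n, n)))  # lower triangular, positive diagonal
A = 0.3 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
B = 0.9 * np.eye(n) + 0.02 * rng.standard_normal((n, n))
r_prev = rng.standard_normal(n)
sigma_prev = np.eye(n)

sigma_matrix_form = bekk11_step(C, A, B, r_prev, sigma_prev)

# The same update in vec form, using vec(A' X A) = (A kron A)' vec(X).
sigma_vec_form = (
    vec(C @ C.T)
    + np.kron(A, A).T @ vec(np.outer(r_prev, r_prev))
    + np.kron(B, B).T @ vec(sigma_prev)
)
```

The two computations agree up to floating-point error. The vec form makes the linear structure of the recursion explicit, at the cost of working with $n^2 \times n^2$ matrices.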
We assume that the values of
in (
1) are stationary; then, the following stationary condition should be imposed for the BEKK
Model (
5) (see
Engle and Kroner (
1995) and
Comte and Lieberman (
2003)).
Condition 1 (Stationary Condition). The n-dimensional return series in (
1)
is stationary if the following conditions hold for Model (
3):
- (i)
is a continuous function of , and there exists , , where represents the determinant of a matrix;
- (ii)
For any , and are continuous functions of ;
- (iii)
For any , , i.e., the largest modulus of eigenvalues of is less than 1.
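An eigenvalue condition of this kind can be checked numerically. The sketch below uses the covariance-stationarity check commonly associated with BEKK(1,1), namely that all eigenvalues of $A \otimes A + B \otimes B$ lie inside the unit circle; that this is exactly the matrix appearing in Condition 1(iii) is an assumption, and the function names and parameter values are hypothetical.

```python
import numpy as np

def bekk11_spectral_radius(A, B):
    """Largest eigenvalue modulus of (A kron A) + (B kron B)."""
    M = np.kron(A, A) + np.kron(B, B)
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def is_covariance_stationary(A, B):
    """True when the spectral radius is strictly below 1."""
    return bekk11_spectral_radius(A, B) < 1.0

n = 3
# Diagonal examples, where the radius is simply a^2 + b^2.
A_ok, B_ok = 0.3 * np.eye(n), 0.9 * np.eye(n)    # 0.09 + 0.81 = 0.90
A_bad, B_bad = 0.5 * np.eye(n), 0.9 * np.eye(n)  # 0.25 + 0.81 = 1.06

radius_ok = bekk11_spectral_radius(A_ok, B_ok)
radius_bad = bekk11_spectral_radius(A_bad, B_bad)
```

A check like this is useful when generating random parameter matrices for simulations, since candidate draws that violate the condition can be rejected before any data are simulated.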
3.2. Likelihood Function
In this section, we discuss some properties of the likelihood of the BEKK
model. Assume that
follows a standard
n-dimensional Gaussian distribution. Ignoring constants, we can write the quasi-log-likelihood as
Taking the derivative on
with respect to the
ith element of
, we obtain
which can be computed recursively. The derivative in (
7) has the following property (the proof is given in
Appendix A).
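The recursive nature of the computation can be illustrated with a minimal sketch of a Gaussian quasi-log-likelihood for BEKK(1,1). It assumes the usual form $-\frac{1}{2}\sum_t \left(\log\det\Sigma_t + r_t^\top \Sigma_t^{-1} r_t\right)$ with constants dropped; the normalization used in this paper (for example, averaging over the sample size) may differ. The function name `bekk11_quasi_loglik`, the initial matrix `sigma0`, and the toy data are hypothetical.

```python
import numpy as np

def bekk11_quasi_loglik(C, A, B, returns, sigma0):
    """Gaussian quasi-log-likelihood of a BEKK(1,1) model, constants dropped.

    returns: array of shape (T, n); sigma0: assumed covariance for the first row.
    The first observation is used only to start the recursion.
    """
    sigma = sigma0
    total = 0.0
    for t in range(1, returns.shape[0]):
        r_prev = returns[t - 1]
        # Covariance recursion: C C' + A' r r' A + B' Sigma B.
        sigma = C @ C.T + A.T @ np.outer(r_prev, r_prev) @ A + B.T @ sigma @ B
        _, logdet = np.linalg.slogdet(sigma)
        quad = returns[t] @ np.linalg.solve(sigma, returns[t])
        total += -0.5 * (logdet + quad)
    return total

# Toy check: with A = B = 0 and C = I, every Sigma_t equals the identity,
# so each term is -0.5 * r_t' r_t.
toy_returns = np.ones((4, 2))
toy_value = bekk11_quasi_loglik(
    np.eye(2), np.zeros((2, 2)), np.zeros((2, 2)), toy_returns, np.eye(2)
)
```

Using `slogdet` and `solve` avoids forming explicit determinants and inverses, which matters numerically as the dimension grows.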
Proposition 2. Let ; then, where and are two constants.
Assume that
is twice continuously differentiable in a neighborhood
of
We define the averages of the score vector and Hessian matrix as follows:
where
and
. Taking the derivative of (
6) with respect to
yields
in which
represents the trace of a matrix.
Comte and Lieberman (
2003) showed the following property for
.
Proposition 3. Under Condition 1, the following properties hold:
- (i)
When , for a nonrandom positive-definite matrix H;
- (ii)
For the Fisher information matrix , ;
- (iii)
For , is bounded for all and .
In the sparse representation, the majority of the elements of the true parameter vector
are exactly 0. Hence, we could partition
into two sub-vectors. Let
be the set of indices
and
be the
q-dimensional vector composed of the nonzero elements
Similarly, we define
as a
-dimensional zero vector. Without loss of generality,
is stacked as
. For convenience, we define the average of the “score subvector”
and the “Hessian sub-matrix”
by
and
. Similarly, we define
. We also denote
as
Proposition 4. The quasi-log-likelihood function for the BEKK(1,1) has the following properties:
- (i)
For , , where is the ith element of ;
- (ii)
For a sufficiently large T, is almost surely positive definite, and
- (iii)
There exists a neighborhood of such that, for all and and some ,
Here,
means that
with probability 1 when
and
c is a constant. Proposition 4(i) shows that the fourth moment of the score function
is always finite. Proposition 4(ii) indicates that
is almost surely positive and bounded away from 0. Hence, when the
$\ell_1$ penalty is combined with the quasi-likelihood function, the concavity around
can be ensured, so that a local maximizer can be obtained. Proposition 4(iii) is trivial in linear models, but not in our case. The proof of Proposition 4 is given in
Appendix A.
3.3. Penalty Function and Penalized Quasi-Likelihood
Before discussing the consistency of the sparse estimator, we introduce the following condition, by following the strong irrepresentable condition for LASSO-regularized linear regression models in
Zhao and Yu (
2006).
Condition 2 (Irrepresentable condition). There exists a neighborhood of , such that for a constant c that takes its value in (0,1) almost surely.
Definition 1. The half of the minimum signal d is defined as Assume that
is an
$\ell_1$ penalty function, i.e.,
. We consider the following
penalized quasi-likelihood (PQL):
in which
is the penalty term and
is the regularization parameter determining the size of the model. If
maximizes the PQL, i.e.,
we say that
is a penalized quasi-maximum likelihood estimator (PQMLE).
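The structure of an $\ell_1$-penalized objective can be sketched as follows. The code assumes the generic form "quasi-log-likelihood minus $\lambda$ times the sum of absolute parameter values"; the relative scaling of the two terms in this paper may differ. The optional mask that exempts some parameters from the penalty, the function names, and the inclusion of soft-thresholding (the standard proximal map of the $\ell_1$ penalty, not necessarily the optimizer used by the authors) are all assumptions made for illustration.

```python
import numpy as np

def l1_penalized_objective(quasi_loglik_value, theta, lam, penalized_mask=None):
    """Quasi-log-likelihood minus lam * sum(|theta_i|) over the penalized entries."""
    theta = np.asarray(theta, dtype=float)
    if penalized_mask is None:
        penalized_mask = np.ones(theta.shape, dtype=bool)
    penalty = lam * np.sum(np.abs(theta[penalized_mask]))
    return quasi_loglik_value - penalty

def soft_threshold(z, tau):
    """Proximal map of tau * |.|: shrink toward zero and set small entries to zero."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

theta = np.array([1.0, -2.0, 0.5])
full_penalty_value = l1_penalized_objective(-10.0, theta, lam=0.1)
# Exempt the first entry from shrinkage (hypothetical choice).
mask = np.array([False, True, True])
masked_penalty_value = l1_penalized_objective(-10.0, theta, lam=0.1, penalized_mask=mask)
shrunk = soft_threshold(np.array([3.0, -0.5, 1.0]), tau=1.0)
```

The soft-thresholding step shows why an $\ell_1$ penalty yields exact zeros: any entry whose magnitude does not exceed the threshold is mapped to zero rather than merely made small.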
Similar to
Fan and Lv (
2011) and
Uematsu (
2015), we add some conditions on the penalty function
and the half minimum signal.
Condition 3. The penalty function satisfies the following properties:
- (i)
for some , and large T. Here, means is bounded by a constant and means when ;
- (ii)
for some and large T, where d is the half-minimum signal we defined before.
5. Simulation
In this section, we study the performance of the regularized BEKK models on some simulated examples. Consider Model (
3) with
and
. Note that we then have
parameters, as matrix
C is lower triangular. We assume that the parameter matrices satisfy the stationary condition (Condition 1) and, for identification purposes, we assume that the diagonal elements in
C are positive, $A_{11} > 0$, and $B_{11} > 0$. We consider two cases for matrices
A,
B, and
C, which are summarized in
Table 1. In both cases, the indices of nonzero elements in coefficient matrices
A,
B, and
C are randomly generated. To ensure that the matrices satisfy Condition 1, values of the diagonal elements in
A and
B are randomly generated from a uniform distribution on U
, and the off-diagonal nonzero elements in
A and
B are generated from U
. All the nonzero elements in
C are generated from U
.
For each case, we simulate the data with , and then use the proposed regularized procedure to make inference on the model. Since the diagonal elements in A, B, and C cannot be zero, we do not shrink the diagonal elements in A, B, and C. Additionally, we set the estimates of parameters in univariate GARCH models for each component series as the initial values of diagonal elements in A, B, and C.
To demonstrate the performance of our estimates, we consider three measurements. The first is the success rate in estimating zero and nonzero elements in
or parameter matrices:
The second measure is the root mean squared error (RMSE), which is defined as
. The third measure is the Kullback–Leibler information, which is given by
where
. We run
simulations for each case, and present the performance measures and their standard errors (in parentheses) for different
s in
Table 2.
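The three kinds of measures can be sketched in code. The definitions below are common textbook versions and are assumptions rather than the exact formulas of this paper: the success rate is taken as the share of parameters whose zero or nonzero status is recovered, the RMSE as the root of the mean squared parameter error, and the Kullback–Leibler information as the divergence between two zero-mean Gaussian distributions with the true and the estimated covariance matrices. All function names are hypothetical.

```python
import numpy as np

def support_recovery_rate(theta_true, theta_hat, tol=1e-8):
    """Share of entries whose zero/nonzero status is estimated correctly."""
    true_nonzero = np.abs(np.asarray(theta_true, dtype=float)) > tol
    est_nonzero = np.abs(np.asarray(theta_hat, dtype=float)) > tol
    return float(np.mean(true_nonzero == est_nonzero))

def rmse(theta_true, theta_hat):
    """Root of the mean squared difference between estimate and truth."""
    diff = np.asarray(theta_hat, dtype=float) - np.asarray(theta_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def gaussian_kl(sigma_true, sigma_hat):
    """KL divergence from N(0, sigma_true) to N(0, sigma_hat)."""
    n = sigma_true.shape[0]
    M = np.linalg.solve(sigma_hat, sigma_true)  # sigma_hat^{-1} sigma_true
    _, logdet = np.linalg.slogdet(M)
    return float(0.5 * (np.trace(M) - n - logdet))

rate = support_recovery_rate([0.0, 1.0, 0.0, 2.0], [0.0, 0.5, 0.1, 1.5])
error = rmse([1.0, 2.0], [1.0, 4.0])
kl_same = gaussian_kl(np.eye(2), np.eye(2))
kl_scaled = gaussian_kl(np.eye(2), 2.0 * np.eye(2))
```

In a simulation study, each measure would be computed per replication and then averaged, with the standard error taken across replications.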
To select the tuning parameter
, we use the first 500 samples as the training data and the last 100 samples as the test data. The training data are used to estimate model parameters
for a given
, and the test data are used to choose the best
, i.e., the one that gives the minimum AIC or BIC. That is,
in which the
and
are defined as
where, in this case,
,
, and
Figure 1 shows the histograms of the tuning parameters selected via BIC and AIC with CV for Cases 1 and 2. In general, we can see from
Figure 1 that the tuning parameter
is favored by BIC and AIC when its value is between
and 2. However, slight differences between these two cases can be found. For Case 1,
values around 1 are most favored by both BIC and AIC, while, for Case 2,
values around 1 and 2 are most favored by BIC and AIC, respectively.
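The selection step can be sketched as a search over a grid of candidate tuning parameters. The code assumes the textbook forms $\mathrm{AIC} = -2\ell + 2k$ and $\mathrm{BIC} = -2\ell + k \log m$, where $\ell$ is the test-data quasi-log-likelihood, $k$ the number of nonzero estimated parameters, and $m$ the number of test observations; the definitions used in this paper may differ in detail. The function names and the candidate values are hypothetical.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion (textbook form)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (textbook form)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def select_lambda(candidates, n_obs, criterion="bic"):
    """Return the tuning parameter with the smallest criterion value.

    candidates: iterable of (lam, test_loglik, n_nonzero_params) tuples.
    """
    def score(item):
        _, loglik, k = item
        return bic(loglik, k, n_obs) if criterion == "bic" else aic(loglik, k)
    return min(candidates, key=score)[0]

# Made-up values for illustration: larger lam gives a sparser model
# with a slightly lower test log-likelihood.
candidates = [(0.5, -120.0, 30), (1.0, -123.0, 12), (2.0, -139.0, 5)]
best_by_aic = select_lambda(candidates, n_obs=100, criterion="aic")
best_by_bic = select_lambda(candidates, n_obs=100, criterion="bic")
```

Because BIC penalizes each retained parameter more heavily than AIC once the sample exceeds a handful of observations, the two criteria can favor different tuning parameters on the same data, as in this toy grid.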
7. Discussion and Concluding Remarks
Modeling the dynamics of high-dimensional covariance matrices is an interesting and challenging problem in both financial econometrics and high-dimensional time series analysis. To address this issue, this paper proposes an inference procedure with regularization for the sparse representation of high-dimensional BEKK models and obtains a class of penalized quasi-maximum likelihood estimators. The proposed regularization allows us to find significant parameters in the BEKK representation and shrink the non-essential ones to zero, hence providing a sparse estimate of the BEKK representation. We show that the sparse BEKK representation has suitable theoretical properties and is promising for applications in portfolio optimization and volatility spillover.
The proposed sparse BEKK representation also contributes to the application of machine learning methods in time series modeling. As most discussion on applying regularization methods to time series modeling focuses on regularizing high-dimensional vector autoregressive models and their variants (
Nicholson et al. 2017;
Sánchez García and Cruz Rambaud 2022), it seems that the sparse representation of the dynamics of high-dimensional variance–covariance matrices has been ignored in the literature. Since obtaining such a sparse representation is crucial for enhancing interpretability in time series modeling, our study bridges this gap by considering a basic $\ell_1$ regularization method. One obvious extension of our current study is to replace the $\ell_1$ penalty with other types of penalty for high-dimensional MGARCH models, for instance, the SCAD penalty (
Fan and Li 2001), the adaptive LASSO (
Zou 2006), and the group LASSO (
Yuan and Lin 2006). With different types of penalty functions, one can regularize the assets in the model according to different requirements, so that the resulting estimates have different asymptotic properties.
As the proposed sparse BEKK representation simplifies the dynamics of covariance matrices of high-dimensional time series, it has advantages over existing MGARCH models in some financial applications. In particular, the sparse BEKK representation can capture significant volatility spillover effects in high-dimensional financial time series, which usually cannot be analyzed using other MGARCH models. Since significant volatility spillover is captured, the proposed method also improves the performance of portfolio optimization based on the dynamics of high-dimensional covariance matrices. The proposed procedure can certainly be extended to incorporate more empirical aspects of financial time series. Taking the leverage effect as an example, one may modify the regularization procedure to obtain sparse representation of high-dimensional multivariate exponential or threshold GARCH models.
Although the proposed framework shows advantages in modeling the dynamics of high-dimensional covariance matrices, the computational challenge is not completely resolved. The main reason is that the proposed inference procedure involves a step of computing derivatives via the Kronecker product of parameter matrices. Since the Kronecker product turns two $n \times n$ matrices into an $n^2 \times n^2$ matrix, the memory requirement increases significantly with $n$. Hence, the proposed procedure is suitable for problems in which the number of component time series ranges from several to 100. If the number of assets exceeds 200, the computational cost remains a major concern. One possible remedy is to train a neural network to approximate the regularized likelihood of the high-dimensional model. In this way, the proposed regularization for the high-dimensional MGARCH model can be extended to characterize the dynamics of larger covariance matrices.
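The memory growth can be made concrete with a short calculation. The sketch assumes that the Kronecker product of two $n \times n$ matrices is stored densely in double precision (8 bytes per entry); an implementation that exploits sparsity or avoids forming the product would need less. The function names are hypothetical.

```python
import numpy as np

def kronecker_shape(n):
    """Shape of the Kronecker product of two n x n matrices."""
    return (n * n, n * n)

def kronecker_memory_gib(n, bytes_per_entry=8):
    """Memory in GiB for one dense Kronecker product of two n x n matrices."""
    rows, cols = kronecker_shape(n)
    return rows * cols * bytes_per_entry / 2**30

# Small sanity check against NumPy's own Kronecker product.
small = np.kron(np.ones((3, 3)), np.ones((3, 3)))

memory_n100 = kronecker_memory_gib(100)
memory_n200 = kronecker_memory_gib(200)
```

Under these assumptions, a single dense product needs roughly 0.75 GiB for $n = 100$ and roughly 12 GiB for $n = 200$, a sixteen-fold increase for a doubling of the dimension.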