Risks
  • Article
  • Open Access

4 February 2024

L1 Regularization for High-Dimensional Multivariate GARCH Models

1 Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
2 School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
3 Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11733, USA
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Risks Journal: A Decade of Advancing Knowledge and Shaping the Future

Abstract

The complexity of estimating multivariate GARCH models increases rapidly with the number of asset series. To address this issue, we propose a general regularization framework for high-dimensional GARCH models with BEKK representations and obtain a penalized quasi-maximum likelihood (PQML) estimator. Under some regularity conditions, we establish theoretical properties, such as sparsity and consistency, of the PQML estimator for the BEKK representations. We then carry out simulation studies to show the performance of the proposed inference framework and the procedure for selecting tuning parameters. In addition, we apply the proposed framework to analyze volatility spillover and portfolio optimization problems, using daily prices of 18 U.S. stocks from January 2016 to January 2018, and show that the proposed framework outperforms some benchmark models.

1. Introduction

Modeling the dynamics of high-dimensional variance–covariance matrices is a challenging problem in high-dimensional time series analysis and has wide applications in financial econometrics. Classical time series models for variance–covariance matrices assume that the number of component time series is small relative to the number of observed samples. However, many financial and economic applications nowadays need to model the dynamics of high-dimensional variance–covariance matrices. For example, in modern portfolio management, the number of assets can easily exceed one thousand and be larger than, or of the same order as, the number of observed historical prices of the assets; in analyzing the movements in the financial markets of different products in different countries, it is critical to understand the interdependence and contagion effects of price movements over thousands of markets, while jointly observed financial data are available for only a few decades.
In this paper, we propose an inference procedure with L1 regularization for high-dimensional BEKK representations and obtain a class of penalized quasi-maximum likelihood (PQML) estimators. The L1 regularization allows us to identify important parameters and shrink the non-essential ones to zero, hence providing an estimate of the sparse parameters in BEKK representations. Under some regularity conditions, we establish theoretical properties, such as sparsity and consistency, of the PQML estimator for BEKK representations. The proposed procedure is a fairly general framework that can be applied to a large class of high-dimensional MGARCH models; by applying our regularization techniques, the complexity of making inferences from high-dimensional MGARCH models can be greatly reduced and the intrinsic sparse model structures can be uncovered. We carried out simulation studies to show the performance of the proposed inference framework and the procedure for selecting tuning parameters. In addition, we applied the proposed framework to analyze volatility spillover and portfolio optimization problems, using daily prices of 18 U.S. stocks from January 2016 to January 2018. In the comparison of portfolio optimization based on different MGARCH models, we show that the proposed framework outperforms three benchmark models, i.e., the constant covariance model, the factor MGARCH model, and the dynamic conditional correlation model.
The proposed framework can be viewed as extending regularization techniques from high-dimensional linear models to nonlinear time series models. Since () introduced LASSO for linear regression models, regularization techniques for high-dimensional statistical inference have been studied for a variety of problems in linear models. For example, () proposed the smoothly clipped absolute deviation (SCAD) penalty, which generates sparse estimates of regression coefficients with reduced bias, and explored the so-called "oracle property", in which the estimator has asymptotic properties equivalent to those of the maximum likelihood estimator in the non-penalized model. () proposed adaptive LASSO, which adds adaptive weights for different parameters in the L1 penalty to obtain better estimator performance. () proposed a group LASSO penalty to solve the problem of selecting grouped factors in regression models. () proposed a minimax concave penalty that gives nearly unbiased variable selection in linear regression. In addition to these discussions of regularized estimation in high-dimensional statistics, which rely primarily on independent and identically distributed (i.i.d.) samples and linear models, regularization techniques have also been applied to inference problems in high-dimensional linear time-series models. For instance, () studied a class of penalty functions and showed the oracle properties of the estimators in high-dimensional vector autoregressive (VAR) models. () investigated the theoretical properties of L1-regularized estimates in high-dimensional stochastic regressions with serially correlated errors and in transition matrix estimation for high-dimensional VAR models. () developed a regularization framework for full-factor MGARCH models (), in which the dynamics of the covariance matrices are determined by the dynamics of univariate GARCH processes for orthogonal factors.
Using the group LASSO technique, () studied the inference problem for MGARCH models with vine structure, an alternative to dynamic conditional correlation MGARCH models.
The proposed regularization framework is also related to the problem of estimating p × p covariance matrices using various shrinkage and regularization methods. For instance, () proposed an optimal linear shrinkage method to estimate constant covariance matrices of p-dimensional i.i.d. vectors, and, later on, () extended the method and developed nonlinear shrinkage estimators for high-dimensional covariance matrices. () and () proposed covariance regularization procedures that are based on the thresholding of sample covariance matrices to estimate inverse covariance matrices. () studied sparsistency and rates of convergence for estimating covariance based on penalized likelihood with nonconcave penalties, and () estimated high-dimensional inverse covariance by minimizing L1-penalized log-determinant divergence. This method is also called graphical LASSO and was studied in () and (). We note that all these discussions focus on high-dimensional constant covariance matrices; thus, they do not involve the dynamics of covariance matrices.
The remainder of the paper is organized as follows. Section 2 provides a literature review of MGARCH models and their applications in volatility spillover. Section 3 explains the BEKK model with L1-penalty functions in detail. In Section 4, we provide theoretical properties and implementation procedures for the regularized BEKK model. Simulation results and real data analysis are presented in Section 5 and Section 6, respectively. Section 7 gives concluding remarks.

2. Literature Review

Inspired by the idea of univariate generalized autoregressive conditionally heteroskedastic (GARCH) models (); (); (); (), various multivariate GARCH (MGARCH) models have been proposed over the last three decades to characterize the dynamics of covariance matrices. Among these MGARCH models, the Baba–Engle–Kraft–Kroner (BEKK) model () uses a general specification to describe the dynamics of covariance matrices of an n-dimensional multivariate time series. Since such a specification contains O(n²) unknown parameters, inference on the BEKK model becomes complicated, even for moderately large n. When n² grows at the same order as, or a larger order than, the length of the time series, inference on the MGARCH–BEKK representation becomes even more difficult due to "the curse of dimensionality".
To reduce the complexity of inference procedures for the unknown parameters in MGARCH models, other MGARCH specifications were proposed that reduce the number of unknown parameters. An important improvement is the dynamic conditional correlation (DCC) model (; ; ; ). The DCC model allows for time-varying conditional correlations and reduces the dimensionality by factorizing the conditional covariance matrix into the product of a diagonal matrix of conditional standard deviations and a correlation matrix that evolves dynamically over time. Other MGARCH specifications impose stronger assumptions on the structure and dynamics of the covariance matrices; they include, for example, the MGARCH-in-mean model (), the constant conditional correlation GARCH model (; ; ), the time-varying conditional correlation MGARCH model (), the orthogonal factor MGARCH model (; ), and so on. Although these MGARCH models provide relatively simple inference procedures, the assumed dynamics of the covariance matrices are usually too restrictive to capture their real complexity. Furthermore, these models still fail to address the issue of making inference on high-dimensional MGARCH models.
In addition to modeling the joint behavior of volatilities for a set of returns, another use of MGARCH models is to characterize volatility spillover in financial markets. Volatility spillover refers to the process and magnitude by which instability in one market affects other markets. Volatility spillover is widely observed in equity markets (), bond markets (), futures markets (), exchange markets (), markets of equities and exchanges (), and various industries and commodities (; ). Understanding volatility spillover can provide insight into financial vulnerabilities, as well as the source and nature of financial exposures, for academic researchers, financial practitioners, and regulatory authorities. For investors, since significant volatility spillover may increase non-systemic risk, understanding it can help them diversify the risks associated with their investments. For financial sector regulators, understanding volatility spillover can help them formulate appropriate policies to maintain financial stability, especially when stress from a particular market is transmitted to other markets, so that the risk of systemic instability increases. MGARCH models are generally used to characterize volatility spillover in markets represented by a low-dimensional multivariate series; see (), (); (), (), and (). In particular, () used a multivariate GARCH-in-mean model to study the economic spillover effect across five countries, () applied a BEKK(1,1) model to study the transmission of weekly equity returns and volatility in nine Asian countries from 1988 to 2000, and () employed the BEKK(1,1) specification to study three-dimensional US sector indices. The spillover effect has also been explored recently for other financial markets, such as cryptocurrency markets () and European banks with GARCH models ().
Additionally, there has been an investigation into the spillover effects using network representations derived from GARCH models in recent studies (; ).
The aforementioned studies on spillover effects rely on the foundational structures of the DCC model for analysis (; ; ). As noted above, the assumptions these simplified MGARCH models place on the dynamics of covariance matrices are usually too restrictive to capture their real complexity, and the models do not scale to high-dimensional settings. Under these constraints, the performance and accuracy of these simplified MGARCH models need further investigation in real markets ().

3. The MGARCH–BEKK Representations with L1 Regularization

We first introduce the following notation. Given a vector x and a matrix A, the ith component of x and the ijth element of A are written as x_i and A_{ij}, respectively. The jth column and the ith row vectors of A are denoted as A_{.j} and A_{i.}, respectively. ||x|| is the Euclidean norm of the vector x, and ||x||_∞ is the largest element of x in modulus. ρ(A) is the spectral radius of A, i.e., the largest modulus of the eigenvalues of A. λ_min(A) and λ_max(A) are the minimum and maximum eigenvalues of A, respectively. ||A|| is the spectral norm, i.e., the square root of ρ(A^⊤A). ||A||_∞ represents the operator norm induced by ||x||_∞, i.e., the largest absolute row sum. For any matrix A and vector x such that Ax is well defined, let ||A||_{2,∞} := max_{||x||=1} ||Ax||_∞. We use sign(x) to denote the sign of x: sign(x) = x/|x| if x ≠ 0, and sign(x) = 0 otherwise.

3.1. The MGARCH–BEKK Representation

Let r_t be the vector of returns on n assets in period t, and let ε_t be i.i.d. n-dimensional standard normal random vectors. Let F_t be the sigma field generated by the past information {r_s : s ≤ t}. Then, Σ_t is measurable with respect to F_{t−1}, and the distribution of r_t can be specified as

$$r_t = \Sigma_t^{1/2} \epsilon_t, \qquad \epsilon_t \sim N(0, I_n), \tag{1}$$

where I_n is the n × n identity matrix. Denote the conditional covariance matrix of r_t given F_{t−1} by Σ_t, i.e., Σ_t = Cov(r_t | F_{t−1}). () proposed the following BEKK(a, b) model to characterize the dynamics of Σ_t:

$$\Sigma_t = C^\top C + \sum_{k=1}^{K}\sum_{i=1}^{a} A_{ik}^\top r_{t-i} r_{t-i}^\top A_{ik} + \sum_{k=1}^{K}\sum_{i=1}^{b} B_{ik}^\top \Sigma_{t-i} B_{ik}, \tag{2}$$

where the A_{ik} and B_{ik} are n × n parameter matrices, C is an n × n triangular matrix, and the summation limit K determines the generality of the process.
To illustrate the idea, we consider BEKK(1,1) in our examples and take K = 1 throughout this paper; Model (2) with K = 1 can be written as

$$\Sigma_t = C^\top C + \sum_{i=1}^{a} A_i^\top r_{t-i} r_{t-i}^\top A_i + \sum_{i=1}^{b} B_i^\top \Sigma_{t-i} B_i, \tag{3}$$

in which A_i, B_i, and C are real n × n matrices. Without loss of generality, we choose Σ_t^{1/2} to be symmetric. For identification purposes, () showed the following property for the BEKK model.
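To make the recursion in Model (3) concrete, the following sketch simulates a BEKK(1,1) process in the notation above. The parameter values and the choice to start the recursion at the intercept C^⊤C are our own illustrative assumptions, not from the paper.

```python
import numpy as np

def simulate_bekk11(C, A, B, T, seed=None):
    """Simulate T returns r_t from a BEKK(1,1) model:
    Sigma_t = C'C + A' r_{t-1} r_{t-1}' A + B' Sigma_{t-1} B,
    r_t = Sigma_t^{1/2} eps_t with eps_t ~ N(0, I_n)."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    intercept = C.T @ C
    Sigma = intercept.copy()            # illustrative starting value
    r = np.zeros(n)
    out = np.empty((T, n))
    for t in range(T):
        Sigma = intercept + A.T @ np.outer(r, r) @ A + B.T @ Sigma @ B
        L = np.linalg.cholesky(Sigma)   # any square root of Sigma_t works
        r = L @ rng.standard_normal(n)
        out[t] = r
    return out

# A small bivariate example with stationary, diagonal A and B
C = np.array([[0.10, 0.00],
              [0.05, 0.10]])
A = np.diag([0.30, 0.20])
B = np.diag([0.80, 0.85])
returns = simulate_bekk11(C, A, B, T=500, seed=0)
```

Since C^⊤C is positive definite and the remaining terms are positive semi-definite, every Σ_t in the loop admits a Cholesky factor.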
Proposition 1.
Suppose that the diagonal elements of C, as well as a_{11} and b_{11}, are positive. Then, there exists no other C, A, or B in Model (3) that gives an equivalent representation.
Proposition 1 is also known as the identification condition ().
Let vec and vech be the vector operators that stack the columns of a matrix and the lower triangular part of a matrix, respectively. That is, if

$$Y = \begin{pmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nn} \end{pmatrix},$$

then

$$\mathrm{vec}(Y) = (y_{11}, \ldots, y_{n1}, y_{12}, \ldots, y_{n2}, \ldots, y_{1n}, \ldots, y_{nn})^\top,$$

and

$$\mathrm{vech}(Y) = (y_{11}, \ldots, y_{n1}, y_{22}, \ldots, y_{n2}, \ldots, y_{ii}, \ldots, y_{ni}, \ldots, y_{nn})^\top.$$

Then, Model (3) can be rewritten in vector form:

$$\mathrm{vec}(\Sigma_t) = \mathrm{vec}(C^\top C) + \sum_{i=1}^{a} \mathring{A}_i \, \mathrm{vec}(r_{t-i} r_{t-i}^\top) + \sum_{i=1}^{b} \mathring{B}_i \, \mathrm{vec}(\Sigma_{t-i}), \tag{4}$$

in which Å_i = A_i ⊗ A_i, B̊_i = B_i ⊗ B_i, and ⊗ is the Kronecker product. Since the covariance matrices Σ_t are symmetric, we can also write (3) in vector-half form:

$$\mathrm{vech}(\Sigma_t) = \mathrm{vech}(C^\top C) + \sum_{i=1}^{a} \tilde{A}_i \, \mathrm{vech}(r_{t-i} r_{t-i}^\top) + \sum_{i=1}^{b} \tilde{B}_i \, \mathrm{vech}(\Sigma_{t-i}), \tag{5}$$

where Ã_i = L_n Å_i K_n, B̃_i = L_n B̊_i K_n, and L_n and K_n are the n(n+1)/2 × n² and n² × n(n+1)/2 matrices that, respectively, extract and restore the non-redundant triangular part of the symmetric matrices involved. Note that dim(vec(Σ_t)) = n² and dim(vech(Σ_t)) = n(n+1)/2. For convenience, we denote by θ = (θ_1, …, θ_p)^⊤ the parameter vector in Model (3), in which p = (a + b) n² + n(n+1)/2, so that the matrices C, A_i, and B_i are functions of θ: C = C(θ), A_i = A_i(θ), B_i = B_i(θ). We denote by θ^0 the true parameter vector of the model.
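The vec and vech operators, and the Kronecker identity behind the vec form (4), can be checked numerically. The sketch below is our own illustration in NumPy; it uses the identity vec(A^⊤MA) = (A ⊗ A)^⊤ vec(M), which matches the text's Å_i = A_i ⊗ A_i up to the transpose convention the extraction dropped.

```python
import numpy as np

def vec(Y):
    """Stack the columns of Y."""
    return Y.flatten(order="F")

def vech(Y):
    """Stack the lower-triangular part of Y, column by column."""
    n = Y.shape[0]
    i, j = np.triu_indices(n)
    return Y.T[i, j]   # Y.T[i, j] = Y[j, i]: column-major lower triangle

n = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
M = M + M.T            # a symmetric "covariance-like" matrix

# dim(vec) = n^2 and dim(vech) = n(n+1)/2, as stated in the text
assert vec(M).size == n * n and vech(M).size == n * (n + 1) // 2

# Kronecker identity behind the vec form: vec(A' M A) = (A (x) A)' vec(M)
assert np.allclose(vec(A.T @ M @ A), np.kron(A, A).T @ vec(M))
```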
We assume that the values of r_t in (1) are stationary; the following stationarity condition is then imposed on the BEKK(a, b) Model (5) (see () and ()).
Condition 1 (Stationary Condition).
The n-dimensional return series  r_t  in (1) is stationary if the following conditions hold for Model (3):
(i) 
C*(θ) = C^⊤C is a continuous function of θ, and there exists C_0 > 0 such that det(C*(θ)) ≥ C_0, where det(·) represents the determinant of a matrix;
(ii) 
For any θ, Ã_i(θ) and B̃_i(θ) are continuous functions of θ;
(iii) 
For any θ, ρ( Σ_{i=1}^a Ã_i(θ) + Σ_{i=1}^b B̃_i(θ) ) < 1, i.e., the largest modulus of the eigenvalues of Σ_{i=1}^a Ã_i(θ) + Σ_{i=1}^b B̃_i(θ) is less than 1.
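Condition 1(iii) can be checked numerically. The sketch below works on the vec rather than the vech scale, which is a sufficient (slightly conservative) check: the symmetric subspace is invariant under A ⊗ A and B ⊗ B, so the vech-scale spectral radius is no larger than the vec-scale one.

```python
import numpy as np

def bekk_stationary_vec(A_list, B_list):
    """Sufficient check of Condition 1(iii) on the vec scale:
    rho( sum_i A_i (x) A_i + sum_i B_i (x) B_i ) < 1."""
    n = A_list[0].shape[0]
    M = np.zeros((n * n, n * n))
    for A in A_list:
        M += np.kron(A, A)
    for B in B_list:
        M += np.kron(B, B)
    return bool(np.max(np.abs(np.linalg.eigvals(M))) < 1)

A = np.diag([0.30, 0.20])
B = np.diag([0.80, 0.85])
print(bekk_stationary_vec([A], [B]))           # a stationary pair
print(bekk_stationary_vec([A], [np.eye(2)]))   # an explosive pair
```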

3.2. Likelihood Function

In this section, we discuss some properties of the likelihood of the BEKK ( a , b ) model. Assume that ϵ t follows a standard n-dimensional Gaussian distribution. Ignoring constants, we can write the quasi-log-likelihood as
$$\mathcal{L}_T(\theta) = \frac{1}{2T} \sum_{t=1}^{T} l_t(\theta), \qquad l_t(\theta) = -\left( \log[\det(\Sigma_t)] + r_t^\top \Sigma_t^{-1} r_t \right). \tag{6}$$
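For reference, evaluating L_T(θ) of (6) for a BEKK(1,1) parameterization takes one pass through the data. The sketch below initializes the recursion at C^⊤C, which is our own choice of starting value; the parameter values in the demo call are likewise illustrative.

```python
import numpy as np

def quasi_log_lik(C, A, B, returns):
    """L_T(theta) = (1/2T) * sum_t l_t(theta) for BEKK(1,1),
    with l_t = -(log det Sigma_t + r_t' Sigma_t^{-1} r_t)."""
    T, n = returns.shape
    intercept = C.T @ C
    Sigma = intercept.copy()    # assumed starting value for the recursion
    r_prev = np.zeros(n)
    total = 0.0
    for t in range(T):
        Sigma = intercept + A.T @ np.outer(r_prev, r_prev) @ A + B.T @ Sigma @ B
        _, logdet = np.linalg.slogdet(Sigma)   # Sigma is positive definite
        quad = returns[t] @ np.linalg.solve(Sigma, returns[t])
        total -= logdet + quad                 # accumulate l_t(theta)
        r_prev = returns[t]
    return total / (2.0 * T)

rng = np.random.default_rng(0)
demo_returns = 0.1 * rng.standard_normal((50, 2))
value = quasi_log_lik(np.array([[0.10, 0.00], [0.05, 0.10]]),
                      np.diag([0.30, 0.20]), np.diag([0.80, 0.85]),
                      demo_returns)
```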
Taking the derivative of Σ_t with respect to the ith element of θ, we obtain

$$\frac{\partial \Sigma_t}{\partial \theta_i} = \frac{\partial C^\top C}{\partial \theta_i} + \sum_{j=1}^{a} \left( \frac{\partial A_j^\top}{\partial \theta_i} r_{t-j} r_{t-j}^\top A_j + A_j^\top r_{t-j} r_{t-j}^\top \frac{\partial A_j}{\partial \theta_i} \right) + \sum_{j=1}^{b} \left( \frac{\partial B_j^\top}{\partial \theta_i} \Sigma_{t-j} B_j + B_j^\top \Sigma_{t-j} \frac{\partial B_j}{\partial \theta_i} + B_j^\top \frac{\partial \Sigma_{t-j}}{\partial \theta_i} B_j \right), \tag{7}$$
which can be computed recursively. The derivative in (7) has the following property (the proof is given in Appendix A).
Proposition 2.
Let R_t = vech(r_t r_t^⊤); then,

$$\left\| \frac{\partial \Sigma_t}{\partial \theta_i} \right\| \le \Psi_1 + \Psi_2 \cdot \sup_t \| R_t \|, \tag{8}$$

where Ψ_1 and Ψ_2 are two constants.
Assume that L_T(θ) is twice continuously differentiable in a neighborhood Θ_0 ⊂ Θ of θ^0. We define the averages of the score vector and the Hessian matrix as follows:

$$S_T(\theta) = T^{-1} \sum_{t=1}^{T} s_t(\theta) \quad \text{and} \quad H_T(\theta) = T^{-1} \sum_{t=1}^{T} h_t(\theta),$$

where s_t(θ) = ∂l_t(θ)/∂θ and h_t(θ) = ∂²l_t(θ)/∂θ∂θ^⊤. Taking the derivative of (6) with respect to θ_i yields
$$\frac{\partial l_t}{\partial \theta_i} = \mathrm{Tr}\!\left[ \frac{\partial \Sigma_t}{\partial \theta_i} \left( \Sigma_t^{-1} r_t r_t^\top \Sigma_t^{-1} - \Sigma_t^{-1} \right) \right],$$

$$\frac{\partial^2 l_t(\theta)}{\partial \theta_j \partial \theta_i} = \mathrm{Tr}\!\left[ -\frac{\partial^2 \Sigma_t}{\partial \theta_j \partial \theta_i} \Sigma_t^{-1} + \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} - r_t r_t^\top \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} + r_t r_t^\top \Sigma_t^{-1} \frac{\partial^2 \Sigma_t}{\partial \theta_j \partial \theta_i} \Sigma_t^{-1} - r_t r_t^\top \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} \right],$$
in which Tr(·) represents the trace of a matrix. () showed the following property for l_t(θ).
Proposition 3.
Under Condition 1, the following properties hold:
(i) 
When T → +∞, H_T^0 := (1/T) Σ_{t=1}^T ∂²l_t(θ^0)/∂θ∂θ^⊤ → H in probability for a nonrandom positive-definite matrix H;
(ii) 
For the Fisher information matrix I^0 := E( (∂l_t(θ^0)/∂θ)(∂l_t(θ^0)/∂θ)^⊤ ) = E( S_T^0 (S_T^0)^⊤ ), we have ||I^0|| < ∞;
(iii) 
For θ ∈ Θ, E( sup_{||θ − θ^0|| ≤ ε} |∂³l_t(θ)/∂θ_i∂θ_j∂θ_k| ) is bounded for all ε > 0 and i, j, k = 1, …, p.
In the sparse representation, the majority of the elements of the true parameter vector θ^0 are exactly 0; hence, we can partition θ^0 into two sub-vectors. Let U_0 be the set of indices {j ∈ {1, …, p} : θ_j^0 ≠ 0} and θ_{U_0}^0 be the q-dimensional vector composed of the nonzero elements {θ_j^0 ≠ 0 : j ∈ U_0}. Similarly, we define θ_{U_0^c}^0 as the (p − q)-dimensional zero vector. Without loss of generality, θ^0 is stacked as θ^0 = ((θ_{U_0}^0)^⊤, 0^⊤)^⊤ = ((θ_{U_0}^0)^⊤, (θ_{U_0^c}^0)^⊤)^⊤. For convenience, we define the averages of the "score subvector" S_{U_0,T}(θ) and the "Hessian sub-matrix" H_{U_0,T}(θ) through s_{U_0,t}(θ) = ∂l_t(θ)/∂θ_{U_0} and h_{U_0,t}(θ) = ∂²l_t(θ)/∂θ_{U_0}∂θ_{U_0}^⊤. Similarly, we define S_{U_0^c,T}(θ). We also write S_T(θ^0) = S_T(θ_{U_0}^0, 0) as S_T^0.
Proposition 4.
The quasi-log-likelihood function L T for the BEKK(1,1) has the following properties:
(i) 
For i = 1, …, p, E|√T · S_{T,i}^0|⁴ < ∞, where S_{T,i}^0 is the ith element of S_T^0;
(ii) 
For a sufficiently large T, H_{U_0,T}^0 is almost surely positive definite, and λ_min(H_{U_0,T}^0) = O_p(1);
(iii) 
There exists a neighborhood Θ_{U_0}^0 ⊂ Θ of θ_{U_0}^0 such that, for all θ^{(1)}, θ^{(2)} ∈ Θ_{U_0}^0 and some K_T = O_p(1),

$$\| H_{U_0,T}(\theta^{(1)}, 0) - H_{U_0,T}(\theta^{(2)}, 0) \| \le K_T \| \theta^{(1)} - \theta^{(2)} \|.$$
Here, a_T = O_p(1) means that |a_T| ≤ c with probability 1 as T → ∞, where c is a constant. Proposition 4(i) shows that the fourth moment of the score function S_T is always finite. Proposition 4(ii) indicates that λ_min(H_{U_0,T}^0) is almost surely positive and bounded away from 0. Hence, when the L1 penalty is combined with the quasi-likelihood function, the concavity around θ^0 can be ensured, so that a local maximizer can be obtained. Proposition 4(iii) is trivial in linear models, but not in our case. The proof of Proposition 4 is given in Appendix A.

3.3. L1 Penalty Function and Penalized Quasi-Likelihood

Before discussing the consistency of the sparse estimator, we introduce the following condition, by following the strong irrepresentable condition for LASSO-regularized linear regression models in ().
Condition 2 (Irrepresentable condition).
There exists a neighborhood Θ_{U_0}^0 ⊂ Θ of θ_{U_0}^0 such that

$$\sup_{\theta^{(1)}, \theta^{(2)} \in \Theta_{U_0}^0} \left\| \left[ (\partial / \partial \theta_{U_0}^\top) \, S_{U_0^c, T}(\theta^{(1)}, 0) \right] \left[ H_{U_0,T}(\theta^{(2)}, 0) \right]^{-1} \right\|_\infty \le c$$

for a constant c that takes its value in (0, 1) almost surely.
Definition 1.
The half-minimum signal d is defined as

$$d \;(= d_T) = \frac{1}{2} \min\{ |\theta_j^0| : \theta_j^0 \ne 0 \} = \frac{1}{2} \min_{j \in U_0} |\theta_j^0|.$$
Assume that p_λ(x) is an L1 penalty function, i.e., p_λ(|x|) = λ|x|. We consider the following penalized quasi-likelihood (PQL):

$$Q_T(\theta) = \mathcal{L}_T(\theta) - P_T(\theta), \tag{9}$$

in which P_T(θ) = Σ_{j=1}^p p_λ(|θ_j|) = λ Σ_{j=1}^p |θ_j| is the penalty term and λ (= λ_T) ≥ 0 is the regularization parameter determining the size of the model. If θ̂ maximizes the PQL, i.e.,

$$\hat{\theta} = \arg\max_{\theta \in \Theta} Q_T(\theta),$$
we say that θ ^ is a penalized quasi-maximum likelihood estimator (PQMLE).
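In code, Q_T(θ) of (9) is simply the quasi-log-likelihood minus the weighted L1 norm. The optional mask in the sketch below is our own device; it anticipates Section 5, where the positive diagonal elements of A, B, and C are not shrunk.

```python
import numpy as np

def penalized_qll(loglik_value, theta, lam, penalize=None):
    """Q_T(theta) = L_T(theta) - lambda * sum_j |theta_j| (Eq. (9)).
    `penalize` is a boolean mask selecting which coordinates are penalized."""
    theta = np.asarray(theta, dtype=float)
    if penalize is None:
        penalize = np.ones(theta.shape, dtype=bool)
    return loglik_value - lam * np.abs(theta[penalize]).sum()

# Example: L_T = -1.25, three parameters, lambda = 0.1
q = penalized_qll(-1.25, [0.5, 0.0, -0.3], lam=0.1)   # -> -1.33
```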
Similar to () and (), we add some conditions on the penalty function p λ ( · ) and the half minimum signal.
Condition 3.
The penalty function p λ satisfies the following properties:
(i) 
λ = min{ O(T^{−α}), o(q^{−1/2} T^{−γ} log T) }  for some  α ∈ (δ_0 + γ, (1 − 2δ_0)/4), γ ∈ (0, 1/2],  and large T. Here,  a = O(f(T))  means that  |a / f(T)|  is bounded by a constant, and  b = o(g(T))  means that  |b / g(T)| → 0  when  T → ∞;
(ii) 
d ≥ T^{−γ} log T for some γ ∈ (0, 1/2] and large T, where d is the half-minimum signal defined above.

4. Properties of the PQML Estimator and Implementation

This section studies the sparsity and the consistency of the PQML estimator and discusses some implementation issues.

4.1. Sparsity of the PQML Estimator

First, we introduce three lemmas whose proofs are given in Appendix A. For convenience, we denote Û := supp(θ̂), the set of indices corresponding to all nonzero components of θ̂, where supp denotes the support set, and θ̂_Û is the subvector of θ̂ formed by its restriction to Û. Then, Û^c represents the set of indices corresponding to all zero components of θ̂. We also denote by ⊙ the Hadamard product.
Lemma 1.
When the penalty function p_λ satisfies Condition 3, θ̂ is a strict local maximizer of the L1-PQL Q_T(θ) defined in (9) if

$$S_{\hat{U}, T}(\hat{\theta}) - \lambda_T \, \mathbf{1} \odot \mathrm{sign}(\hat{\theta}_{\hat{U}}) = 0,$$

$$\| S_{\hat{U}^c, T}(\hat{\theta}) \|_\infty < \lambda_T,$$

$$\lambda_{\min}\big[ H_{\hat{U}, T}(\hat{\theta}) \big] > 0,$$

in which 1 represents the vector with all elements equal to 1 and sign(·) is as defined at the beginning of Section 3.
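The three conditions of Lemma 1 are easy to verify numerically for a candidate θ̂, given stand-ins for the score S_T and the matrix H_T; the sketch below assumes those inputs are supplied by the user and is only an illustration of the check itself.

```python
import numpy as np

def strict_local_max_check(theta_hat, S, H, lam, tol=1e-10):
    """Check the conditions of Lemma 1 for a candidate maximizer:
    (a) S_{U,T} - lam * sign(theta_U) = 0 on the support U,
    (b) ||S_{U^c,T}||_inf < lam off the support,
    (c) H restricted to U is positive definite."""
    theta_hat, S = np.asarray(theta_hat), np.asarray(S)
    U = np.flatnonzero(theta_hat != 0)
    Uc = np.flatnonzero(theta_hat == 0)
    a = np.all(np.abs(S[U] - lam * np.sign(theta_hat[U])) <= tol)
    b = np.all(np.abs(S[Uc]) < lam)
    c = np.all(np.linalg.eigvalsh(np.asarray(H)[np.ix_(U, U)]) > 0)
    return bool(a and b and c)
```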
To show the weak oracle property of the PQML estimator, we also need the following lemma.
Lemma 2.
Let w_t be a martingale difference sequence with E|w_t|^m ≤ C_w for all t, where m > 2 and C_w is a constant. Then, we have

$$T^{-m/2} \, E \left| \sum_{t=1}^{T} w_t \right|^m < \infty.$$
Then, the weak oracle property of the PQML estimator can be established by the following theorem, whose proof is provided in Appendix A.
Theorem 1.
(L1-PQML estimator) Under Conditions 2 and 3, for the L1 penalty function P_T(θ) = λ Σ_{i=1}^p |θ_i|, in which p = O(T^δ) and q = O(T^{δ_0}), if

$$\delta \in [0, 4(1 - 2\alpha)), \qquad 0 < \delta_0 < \min\left\{ \tfrac{2}{3}(1 - 2\gamma), \, \gamma \right\},$$

with α ∈ (δ_0 + γ, (1 − 2δ_0)/4), γ ∈ (0, 1/2], and δ > δ_0, then there exists a local maximizer θ̂ = ((θ̂_{U_0})^⊤, (θ̂_{U_0^c})^⊤)^⊤ of Q_T(θ) such that the following properties are satisfied:
(i) 
(Sparsity) θ ^ U 0 c = 0 with probability approaching one;
(ii) 
(Rate of convergence) ||θ̂_{U_0} − θ_{U_0}^0|| = O_p(T^{−γ} log T).
Here, p = O(T^δ) is equivalent to p ≤ c T^δ for some constant c when T → ∞. The growth rate of p is thus controlled by T^δ, and the growth rate of q by the slower T^{δ_0}. For example, to make the growth rate of q much slower than that of p, we can take δ = 3/2, δ_0 = 1/20, γ = 1/30, and α = 1/5, which satisfy the conditions above. Since, in our case, p ∼ O(n²), we have n = O(T^{3/4}) and, hence, it is possible for p to exceed the sample size T. Although the difference between the rates of p and n is not as large as that in (), in which log p = O(T^{1−2α}) and q = o(T), it is sufficient for most cases in practice.

4.2. Implementation and Selection of λ

To compute the whole regularization path of L1-PQML estimators, we note that several algorithms have been proposed to solve penalized optimization problems. For example, () proposed the least-angle regression (LARS) algorithm to compute an efficient solution to the optimization problem for LASSO. Later on, pathwise coordinate descent methods were proposed to solve LASSO-type problems efficiently; see () and (). For the PQML estimator, we used an algorithm inspired by the BLasso algorithm (, ) with some necessary modifications; BLasso is attractive here because it does not need to explicitly calculate the first and second derivatives of the likelihood function, which are complicated in our case. We note that the original BLasso algorithm uses 0 as the initial value for all parameters, but the diagonal elements of A and B are positive by definition, so we make the following modification: we set 0 as the initial value for all off-diagonal elements of A, B, and C, and set the estimates obtained by fitting each component series to a univariate GARCH model as the initial values of the diagonal elements of the parameter matrices A, B, and C.
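The paper's modified BLasso iterates are derivative-free; as a generic illustration of the shrinkage step that any L1 solver performs, here is the soft-thresholding operator together with one proximal-gradient ascent step on Q_T. This is a standard alternative sketch, not the exact algorithm used in the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward 0 by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_step(theta, score, step, lam, shrink):
    """One ascent step on Q_T = L_T - lam * ||theta||_1:
    gradient step on L_T, then soft-threshold the penalized coordinates
    (`shrink` is a boolean mask, e.g. the off-diagonal elements)."""
    theta_new = theta + step * score
    theta_new[shrink] = soft_threshold(theta_new[shrink], step * lam)
    return theta_new

x = soft_threshold(np.array([0.50, -0.20, 0.05]), 0.1)   # -> [0.4, -0.1, 0.0]
```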
Another implementation issue is the selection of the tuning parameter λ, which leads to the problem of model selection. The tuning parameter λ can be chosen by several criteria; common choices include the Akaike information criterion (AIC), the small-sample corrected AIC (AICC), and the Bayesian information criterion (BIC). In addition, () proposed a modified BIC criterion and () extended it to the case p > T. () proposed using Cohen's kappa coefficient, which measures the agreement between two sets. Another method for model selection is cross-validation (CV); () used CV to choose the best model among model selection procedures such as AIC and BIC. In our study, we apply the AIC and BIC criteria on the testing data and select the best tuning parameters. Note that, since our data are ordered in time, k-fold CV is not applicable here, and the data are split in time order.

5. Simulation

In this section, we study the performance of the regularized BEKK models on some simulated examples. Consider Model (3) with n = 4 and a = b = 1; we then have p = 42 parameters, as the matrix C is lower triangular. We assume that the parameter matrices satisfy the stationarity condition, Condition 1, and, for identification purposes, we assume that the diagonal elements of C are positive, a_{11} > 0, and b_{11} > 0. We consider two cases for the matrices A, B, and C, which are summarized in Table 1. In both cases, the indices of the nonzero elements in the coefficient matrices A, B, and C are randomly generated. To ensure that the matrices satisfy Condition 1, the diagonal elements of A and B are randomly generated from a uniform distribution U(−0.45, 0.45), the off-diagonal nonzero elements of A and B are generated from U(−0.5, 0.5), and all the nonzero elements of C are generated from U(−0.1, 0.1).
Table 1. Parameter matrices in simulations.
For each case, we simulate the data r t ( 1 t T ) with T = 600 , and then use the proposed regularized procedure to make inference on the model. Since the diagonal elements in A, B, and C cannot be zero, we do not shrink the diagonal elements in A, B, and C. Additionally, we set the estimates of parameters in univariate GARCH models for each component series as the initial values of diagonal elements in A, B, and C.
To demonstrate the performance of our estimates, we consider three measurements. The first is the success rate in estimating zero and nonzero elements in θ or parameter matrices:
$$\tau_0 = \frac{\sum_{i=1}^{p} I(\theta_i^0 = 0 \;\wedge\; \hat{\theta}_i = 0)}{\sum_{i=1}^{p} I(\theta_i^0 = 0)}, \qquad \tau_0^C = \frac{\sum_{i=1}^{p} I(\theta_i^0 \ne 0 \;\wedge\; \hat{\theta}_i \ne 0)}{\sum_{i=1}^{p} I(\theta_i^0 \ne 0)}.$$
The second measure is the root mean squared error, defined as ν = ||θ^0 − θ̂_λ||_2. The third measure is the Kullback–Leibler information, which is given by

$$\kappa = \frac{1}{2T} \sum_{t=1}^{T} \left( |\Sigma_t \hat{\Sigma}_t^{-1}| - \log |\Sigma_t \hat{\Sigma}_t^{-1}| \right),$$

where Σ̂_t = Ĉ^⊤Ĉ + Â^⊤ r_{t−1} r_{t−1}^⊤ Â + B̂^⊤ Σ̂_{t−1} B̂. We run N = 500 simulations for each case and present the performance measures and their standard errors (in parentheses) for different λs in Table 2.
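The first two performance measures are straightforward to compute from θ^0 and θ̂_λ; the following sketch implements τ_0, τ_0^C, and ν as defined above (the example vectors are made up for illustration).

```python
import numpy as np

def performance_measures(theta_true, theta_hat):
    """tau0: fraction of true zeros estimated as zero;
    tau0C: fraction of true nonzeros estimated as nonzero;
    nu: Euclidean error ||theta_true - theta_hat||_2."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    zero = theta_true == 0
    est_zero = theta_hat == 0
    tau0 = np.mean(est_zero[zero])
    tau0C = np.mean(~est_zero[~zero])
    nu = np.linalg.norm(theta_true - theta_hat)
    return tau0, tau0C, nu

t0, t0c, nu = performance_measures([0.0, 0.0, 1.0, 2.0],
                                   [0.0, 0.5, 1.1, 0.0])
# t0 = 0.5 (one of two true zeros recovered), t0c = 0.5
```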
Table 2. Performance measures in two cases.
To select the tuning parameter λ , we use the first 500 samples as the training data and the last 100 samples as the test data. The training data are used to estimate model parameters θ λ for a given λ , and the test data are used to choose the best λ , i.e., the one that gives the minimum AICs and BICs. That is,
$$\hat{\lambda}_{\mathrm{BIC}} = \arg\min_{\lambda} \mathrm{BIC}_\lambda, \qquad \hat{\lambda}_{\mathrm{AIC}} = \arg\min_{\lambda} \mathrm{AIC}_\lambda,$$

in which BIC_λ and AIC_λ are defined as

$$\mathrm{BIC}_\lambda = \frac{-2 \mathcal{L}_{T_{\mathrm{test}}}(\hat{\theta}_\lambda) + k \log(T_{\mathrm{test}})}{T_{\mathrm{test}}}, \qquad \mathrm{AIC}_\lambda = \frac{-2 \mathcal{L}_{T_{\mathrm{test}}}(\hat{\theta}_\lambda) + 2k}{T_{\mathrm{test}}},$$

where, in this case, k = Σ_{i=1}^p I(θ̂_i^λ ≠ 0), T_test = 100, and

$$\mathcal{L}_{T_{\mathrm{test}}}(\theta) = -\frac{1}{2 T_{\mathrm{test}}} \sum_{t=501}^{600} \left( \log[\det(\Sigma_t)] + r_t^\top \Sigma_t^{-1} r_t \right).$$
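Given the held-out likelihood values and the number of nonzero estimates for each candidate λ, the selection rule above reduces to a small dictionary computation; a sketch with made-up numbers follows.

```python
import numpy as np

def select_lambda(test_loglik, k_nonzero, T_test, criterion="BIC"):
    """Pick the lambda minimizing BIC_lambda or AIC_lambda on the test
    segment, following the formulas above.
    test_loglik: {lambda: L_Ttest(theta_hat_lambda)};
    k_nonzero:   {lambda: number of nonzero estimates}."""
    pen = np.log(T_test) if criterion == "BIC" else 2.0
    score = {lam: (-2.0 * ll + pen * k_nonzero[lam]) / T_test
             for lam, ll in test_loglik.items()}
    return min(score, key=score.get)

# Hypothetical held-out log-likelihoods and model sizes
ll = {0.5: -1.0, 1.0: -1.1, 2.0: -1.5}
k = {0.5: 40, 1.0: 20, 2.0: 10}
best = select_lambda(ll, k, T_test=100, criterion="BIC")
```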
Figure 1 shows the histograms of selected λ s via BIC and AIC with CV for Cases 1 and 2. In general, we can see from Figure 1 that λ is favored by BIC and AIC when its value is between 0.64 and 2. However, slight differences between these two cases can be found. For Case 1, λ s around 1 are most favored by both BIC and AIC, while, for Case 2, λ s around 1 and 2 are most favored by BIC and AIC, respectively.
Figure 1. Histograms of selected λ in Cases 1 (top) and 2 (bottom) via BIC (left) and AIC (right).

6. Real Data Applications

In this section, we use the regularized BEKK representation to study the volatility spillover effect and to find optimal Markowitz mean–variance portfolios. The data we study consist of daily log-returns of 18 stocks during the period 4 January 2016–31 January 2018, which are listed in Table 3 (). Figure 2 shows the time series of these 18 stocks, and Table 4 summarizes the sample mean, the sample standard deviation, the sample skewness, the sample kurtosis, and the correlations of these 18 series. All pairwise correlations are positive in the selected period, and, except for IPG, all the stocks have a positive mean. The sample kurtosis of some stocks is much larger than 3, which indicates that we cannot simply assume that these returns individually follow normal distributions. Hence, it is natural to employ a suitable time series model to examine the data.
Table 3. Full names of 18 tickers.
Figure 2. Daily returns of 18 stocks from 4 January 2016 to 31 January 2018.
Table 4. Correlation and statistical features of 18 stocks for 2016–2017.

6.1. Volatility Spillovers

To use the MGARCH–BEKK representation to analyze a market consisting of 18 stocks, we should recognize that some form of regularization or shrinkage is necessary, owing to the complexity of the volatility dynamics. In particular, we use the proposed L1-regularized BEKK(1,1) model and procedure to study the volatility spillover among the 18 stocks. We first compute the PQML estimates of the model for different λs. Figure 3 shows the structures of the estimated coefficient matrices Â_λ and B̂_λ for λ = 4, 2, 1, 0.5, 0.3, in which the nonzero values of Â_λ and B̂_λ are represented as directional lines among stocks. Since the matrices A and B enter the model through quadratic forms and are not symmetric, we use directional lines to distinguish nonzero upper-diagonal elements from lower-diagonal ones; specifically, if a_{ij} ≠ 0, the directional line points from i to j. As the PQML estimates Â_λ and B̂_λ reveal the significant interdependence and contagion effects among the 18 stocks, the network structures in Figure 3 provide a clear representation of volatility spillover. Furthermore, we notice that, for some moderate values of λ, for example, λ = 0.5, Â_λ is very sparse, whereas B̂_λ exhibits more interdependence among stocks. When larger values of λ are used in the regularization procedure, the PQML estimates Â_λ are quickly shrunk to diagonal matrices, and B̂_λ also becomes sparser than in the case λ = 0.5.
Figure 3. The network structure of estimated matrices A (top) and B (bottom) under different λ s.
Using the PQML estimates $\hat{A}_\lambda$, $\hat{B}_\lambda$, and $\hat{C}_\lambda$ and the BEKK(1,1) representation, we compute the estimated volatilities and dynamic correlations among the 18 stocks. Figure 4 shows the volatilities estimated by the regularized BEKK(1,1) model with $\lambda = 2, 0.5$ and by univariate GARCH models. Note that most volatility series estimated by the three models are similar, except for stocks NFLX, ORCL, and TIF. We also show the estimated dynamic correlations among the 18 stocks in the regularized BEKK(1,1) model with $\lambda = 1$ in Figure 5. We note that most correlations among the 18 stocks are positive during the sample period.
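The fitted volatilities and correlations are obtained by iterating the BEKK(1,1) recursion $\Sigma_t = CC' + A\, r_{t-1}r_{t-1}'\, A' + B\,\Sigma_{t-1}B'$. A minimal sketch with simulated returns and illustrative diagonal parameter matrices (not the fitted PQML estimates) is:

```python
import numpy as np

def bekk_filter(returns, C, A, B):
    """Iterate Sigma_t = C C' + A r_{t-1} r_{t-1}' A' + B Sigma_{t-1} B'
    and return the conditional covariance matrix for each date."""
    T, n = returns.shape
    intercept = C @ C.T
    sigma = np.cov(returns.T)          # initialize at the sample covariance
    out = np.empty((T, n, n))
    for t in range(T):
        out[t] = sigma
        r = returns[t][:, None]
        sigma = intercept + A @ r @ r.T @ A.T + B @ sigma @ B.T
    return out

def to_corr(sigma):
    """Convert a covariance matrix to the implied correlation matrix."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)

rng = np.random.default_rng(0)
ret = 0.01 * rng.standard_normal((250, 3))   # simulated daily returns
sigmas = bekk_filter(ret, C=0.005 * np.eye(3),
                     A=0.3 * np.eye(3), B=0.9 * np.eye(3))
```

Because the intercept $CC'$ is positive definite and the other two terms are positive semi-definite, every filtered $\Sigma_t$ stays positive definite.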
Figure 4. Estimated volatilities by regularized BEKK(1,1) with λ = 2 (red lines), λ = 0.5 (blue lines), and univariate GARCH models (green lines).
Figure 5. Daily estimated conditional correlations when λ = 1 .
To show the overall volatility spillover, we extend the idea of the spillover index in (). Specifically, note that $E[\epsilon_{t+1}\epsilon_{t+1}'] = \Sigma_{t+1} = \Sigma_{t+1}^{1/2}(\Sigma_{t+1}^{1/2})'$, where $\Sigma_{t+1}^{1/2}$ is the unique lower-triangular Cholesky factor of $\Sigma_{t+1}$. Denoting the elements of $\Sigma_t^{1/2}$ by $\sigma_{\frac12,i,j,t}$, the spillover index $S_{t+1}$ is defined as
$$S_{t+1} = \frac{\sum_{i,j=1,\, i \neq j}^{n} \hat{\sigma}_{\frac12,i,j,t+1}^2}{\mathrm{trace}(\hat{\Sigma}_{t+1})} \times 100\%,$$
where $n = 18$ is the number of stocks. We plot the daily spillover indices of the 18 stocks for $\lambda = 2$ and $0.5$. The spillover indices during the sample period vary between 5% and 80%, and smaller values of $\lambda$ seem to generate more correlation among stocks. In particular, three big spikes can be found on 4 February 2016, 24 June 2016, and 9 November 2016. In addition to computing the PQML estimates for different values of $\lambda$, we also compute the whole $L_1$ regularization path. Note that the number of parameters in the BEKK(1,1) model for 18 stocks is $p = 819$, and we only show the regularization path for the $819 - 18 \times 3 = 765$ off-diagonal elements of $\hat{A}_\lambda$, $\hat{B}_\lambda$, and $\hat{C}_\lambda$. Both plots are shown in Figure 6.
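A small numerical sketch of this index: since the sum of squared entries of the Cholesky factor equals $\mathrm{trace}(\Sigma)$, the index is the share of the predicted total variance carried by the off-diagonal Cholesky entries. The snippet below also verifies the parameter count $p = 819$ quoted above (full $A$ and $B$, lower-triangular $C$).

```python
import numpy as np

def spillover_index(sigma):
    """Spillover index in %: off-diagonal squared entries of the
    lower-triangular Cholesky factor relative to trace(Sigma)."""
    L = np.linalg.cholesky(sigma)                    # Sigma^{1/2}, lower-triangular
    off_diag = np.sum(L**2) - np.sum(np.diag(L)**2)  # sum(L**2) == trace(Sigma)
    return 100.0 * off_diag / np.trace(sigma)

# A diagonal covariance has no spillover; correlation pushes the index up.
print(spillover_index(np.eye(3)))                    # 0.0
print(spillover_index(np.array([[1.0, 0.5],
                                [0.5, 1.0]])))       # approx 12.5

# Parameter count of BEKK(1,1) for n = 18 assets.
n = 18
p = 2 * n**2 + n * (n + 1) // 2
print(p)                                             # 819
```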
Figure 6. Daily spillover index (top) and regularization paths of estimated off-diagonal parameters in BEKK regularization Model represented by different colors (bottom).

6.2. Portfolio Optimization

We further apply the regularized BEKK model to Markowitz mean–variance portfolio optimization (). Using the portfolio variance as a measure of risk, Markowitz portfolio optimization theory provides an optimal trade-off between profit and risk. Since the means and the covariance matrix of the assets are assumed known in the theory, they need to be estimated before being plugged into the framework. For high-dimensional portfolios, regularized methods are commonly used to achieve better performance. For instance, () and () used an $L_1$ penalty function for sparse portfolios, and () used a concave optimization-based approach to estimate the optimal portfolio. In our case, we use the regularized BEKK model to predict the covariance matrices in the next period, and then apply Markowitz portfolio theory to find the optimal portfolios.
In particular, we assume that the portfolio consists of $n = 18$ risky assets and denote by $\mu_t$ and $\Sigma_t$ the mean vector and covariance matrix, respectively, of the $n$ risky assets at time $t$. Let $\mathbf{1} = (1, \ldots, 1)'$ be an $n$-dimensional vector of ones. Markowitz mean–variance portfolio theory minimizes the variance of the portfolio, $\min_{w_t} w_t' \Sigma_t w_t$, subject to the constraints $w_t'\mathbf{1} = 1$ and $w_t'\mu_t = \mu^*$, where $\mu^*$ is the target return. When short selling is allowed, the efficient portfolio can be expressed explicitly as
$$w_{\mathrm{effi},t} = \frac{\tilde{b} - \tilde{a}\mu^*}{\tilde{d}}\, \Sigma_t^{-1}\mathbf{1} + \frac{\tilde{c}\mu^* - \tilde{a}}{\tilde{d}}\, \Sigma_t^{-1}\mu_t,$$
where $\tilde{a} = \mu_t'\Sigma_t^{-1}\mathbf{1}$, $\tilde{b} = \mu_t'\Sigma_t^{-1}\mu_t$, $\tilde{c} = \mathbf{1}'\Sigma_t^{-1}\mathbf{1}$, and $\tilde{d} = \tilde{b}\tilde{c} - \tilde{a}^2$. When the target return $\mu^*$ is chosen to minimize the variance of the efficient portfolio, we obtain the global minimum variance (GMV) portfolio:
$$w_{\mathrm{minvar},t} = \frac{\Sigma_t^{-1}\mathbf{1}}{\mathbf{1}'\Sigma_t^{-1}\mathbf{1}}.$$
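The two closed-form weight vectors above translate directly into NumPy; the mean vector and covariance matrix below are illustrative inputs, not the fitted quantities from the data.

```python
import numpy as np

def efficient_portfolio(mu, sigma, mu_star):
    """Closed-form mean-variance weights with short selling allowed."""
    ones = np.ones(len(mu))
    inv = np.linalg.inv(sigma)
    a = mu @ inv @ ones
    b = mu @ inv @ mu
    c = ones @ inv @ ones
    d = b * c - a**2
    return (b - a * mu_star) / d * (inv @ ones) + (c * mu_star - a) / d * (inv @ mu)

def gmv_portfolio(sigma):
    """Global minimum variance weights."""
    ones = np.ones(sigma.shape[0])
    inv = np.linalg.inv(sigma)
    return inv @ ones / (ones @ inv @ ones)

mu = np.array([0.0010, 0.0015, 0.0005])        # illustrative daily means
sigma = np.array([[2.0, 0.3, 0.2],
                  [0.3, 3.0, 0.4],
                  [0.2, 0.4, 1.5]]) * 1e-4     # illustrative covariance
w = efficient_portfolio(mu, sigma, mu_star=0.0010)
```

By construction, the efficient weights are fully invested and hit the target return, and the GMV portfolio has variance no larger than any efficient portfolio's.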
For comparison purposes, we also use three other multivariate volatility models to predict the covariance matrices of the $n = 18$ stocks. The first is very simple: it assumes a constant covariance matrix for the $n$ stocks. The second is a factor-GARCH model (; ; ; ), which assumes the following for the asset return vector $r_t$, a vector $f_t = (f_{1t}, \ldots, f_{kt})'$ of $k$ independent factors, and the factor volatilities:
$$r_t = W f_t, \qquad \mathrm{Cov}(f_t) = \Sigma_t = \mathrm{diag}\{\sigma_{1t}^2, \sigma_{2t}^2, \ldots, \sigma_{kt}^2\},$$
$$\sigma_{it}^2 = 1 + \beta_i f_{i,t-1}^2 + \gamma_i \sigma_{i,t-1}^2,$$
where $W$ is a $k \times k$ lower-triangular matrix with diagonal elements equal to 1. The third covariance model is a dynamic conditional correlation GARCH (DCC–GARCH) model (), which has the form
$$r_t = \Sigma_t^{1/2}\epsilon_t, \qquad \epsilon_t \sim N(0, I_n), \qquad \Sigma_t = D_t R_t D_t,$$
$$Q_t = (1 - \alpha - \beta)\, C + \alpha\, s_{t-1}s_{t-1}' + \beta\, Q_{t-1}, \qquad R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2},$$
where $D_t = \mathrm{diag}(d_{1t}, \ldots, d_{nt})$, $s_{i,t} = r_{i,t}/d_{i,t}$, $s_t = (s_{1,t}, \ldots, s_{n,t})'$, and $R_t$ is the conditional correlation matrix at time $t$, that is, $R_t = \mathrm{Corr}(r_t \mid \mathcal{F}_{t-1})$. Here, $C$ is the unconditional correlation matrix, i.e., $C = E(R_t)$, and $Q_t$ can be interpreted as a conditional covariance matrix of the devolatilized residuals. For the dynamics of the univariate volatilities, the $d_{i,t}$ are assumed to follow a GARCH(1,1) process:
$$d_{i,t}^2 = \omega_i + a_i\, r_{i,t-1}^2 + b_i\, d_{i,t-1}^2,$$
where $(\omega_i, a_i, b_i)$ are the GARCH(1,1) parameters.
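One step of the DCC correlation recursion can be sketched as follows; the unconditional correlation matrix, standardized residual, and $(\alpha, \beta)$ values below are illustrative, not the fitted parameters.

```python
import numpy as np

def dcc_step(C, Q_prev, s_prev, alpha, beta):
    """One DCC update: Q_t = (1-a-b) C + a s s' + b Q_{t-1};
    R_t rescales Q_t to a proper correlation matrix (unit diagonal)."""
    Q = (1 - alpha - beta) * C + alpha * np.outer(s_prev, s_prev) + beta * Q_prev
    d = 1.0 / np.sqrt(np.diag(Q))
    R = Q * np.outer(d, d)
    return Q, R

C = np.array([[1.0, 0.4],
              [0.4, 1.0]])             # illustrative unconditional correlation
Q, R = dcc_step(C, Q_prev=C.copy(), s_prev=np.array([1.0, -0.5]),
                alpha=0.05, beta=0.90)
```

The rescaling step is what guarantees that $R_t$ has a unit diagonal even though $Q_t$ generally does not.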
For each trading day $t$ from 2 January 2018 to 31 January 2018, we first fit the four covariance models to the returns of the 18 stocks from 4 January 2016 up to day $t$, and then compute the one-day-ahead prediction of the covariance matrix. Using the predicted covariance matrices, we compute the portfolios $w_{\mathrm{minvar},t+1}$ and $w_{\mathrm{effi},t+1}$ for $\mu^* = 0.15\%$, $0.10\%$, and $0.05\%$. Table 5 shows the means, standard deviations (SD), and information ratios (IR, i.e., the ratio of mean to standard deviation) of the realized portfolio returns in January 2018. As argued by () and (), these statistics are good measurements of the out-of-sample performance of Markowitz portfolios. As () claimed that it is difficult for Markowitz portfolios to outperform equally weighted portfolios in terms of the out-of-sample mean, we also include the performance of equally weighted portfolios as a benchmark in Table 5. We note that all the means generated by the four covariance models are smaller than that of the equally weighted portfolio (0.430%), and the standard deviations of the covariance models, except the factor-GARCH, are smaller than that of the equally weighted portfolio. Notably, the regularized BEKK model consistently delivers the second-best mean performance, at 0.39%, 0.352%, 0.382%, and 0.416% for GMV and for $\mu^* = 0.15\%$, $0.10\%$, and $0.05\%$, respectively. Moreover, the information ratio of the regularized BEKK model surpasses that of all other portfolios, achieving the highest values across all scenarios (0.601, 0.540, 0.654, and 0.657 for GMV and $\mu^* = 0.15\%$, $0.10\%$, and $0.05\%$). These results show the robustness and efficiency of the regularized BEKK model in portfolio optimization: it consistently delivers competitive mean performance and superior risk-adjusted returns compared to the other covariance models.
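The evaluation statistics reported in Table 5 are straightforward to compute from the realized portfolio returns; a sketch with made-up returns (not the realized January 2018 series):

```python
import numpy as np

def performance(realized):
    """Out-of-sample mean, standard deviation, and information ratio."""
    mean = realized.mean()
    sd = realized.std(ddof=1)     # sample standard deviation
    return mean, sd, mean / sd

# Made-up daily realized portfolio returns for one month (in decimals)
realized = np.array([0.004, -0.002, 0.006, 0.001, -0.001, 0.003])
mean, sd, ir = performance(realized)
```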
Table 5. Performance of portfolios using different covariance models.

7. Discussion and Concluding Remarks

Modeling the dynamics of high-dimensional covariance matrices is an interesting and challenging problem in both financial econometrics and high-dimensional time series analysis. To address this issue, this paper proposes an inference procedure with $L_1$ regularization for sparse representations of high-dimensional BEKK models and obtains a class of penalized quasi-maximum likelihood estimators. The proposed regularization allows us to identify the significant parameters in the BEKK representation and shrink the non-essential ones to zero, hence providing a sparse estimate of the BEKK representation. We show that the sparse BEKK representation has suitable theoretical properties and is promising for applications in portfolio optimization and volatility spillover analysis.
The proposed sparse BEKK representation also contributes to the application of machine learning methods in time series modeling. As most discussions of applying regularization methods to time series modeling focus on regularizing high-dimensional vector autoregressive models and their variants (; ), the sparse representation of the dynamics of high-dimensional variance–covariance matrices seems to have been ignored in the literature. Since obtaining a sparse representation of the dynamics of high-dimensional variance–covariance matrices is crucial for enhancing interpretability in time series modeling, our study bridges this gap by considering a basic $L_1$ regularization method. One obvious extension of the current study is to replace the $L_1$ penalty with other types of penalties for high-dimensional MGARCH models, for instance, the SCAD penalty (), the adaptive LASSO (), and the group LASSO (). With different types of penalty functions, one can regularize the assets in the model under different requirements, and the resulting estimates have different kinds of asymptotic properties.
As the proposed sparse BEKK representation simplifies the dynamics of covariance matrices of high-dimensional time series, it has advantages over existing MGARCH models in some financial applications. In particular, the sparse BEKK representation can capture significant volatility spillover effects in high-dimensional financial time series, which usually cannot be analyzed with other MGARCH models. Since significant volatility spillover is captured, the proposed method also improves the performance of portfolio optimization based on the dynamics of high-dimensional covariance matrices. The proposed procedure can certainly be extended to incorporate more empirical aspects of financial time series. Taking the leverage effect as an example, one may modify the regularization procedure to obtain a sparse representation of high-dimensional multivariate exponential or threshold GARCH models.
Although the proposed framework shows advantages in modeling the dynamics of high-dimensional covariance matrices, the computational challenge is not completely resolved. The main reason is that the proposed inference procedure involves computing derivatives via Kronecker products of parameter matrices. Since the Kronecker product turns two $n \times n$ matrices into an $n^2 \times n^2$ matrix, the required computational memory increases significantly. Hence, the proposed procedure is suitable for problems in which the number of component time series ranges from several to about 100. If the number of assets grows beyond 200, the computational cost remains a major concern. One possible remedy is to train a neural network to approximate the regularized likelihood of the high-dimensional model. In such a way, the proposed regularization of high-dimensional MGARCH models can be extended to characterize the dynamics of covariance matrices of larger size.

Author Contributions

Conceptualization, H.X.; methodology, H.X., H.Z. and S.Y.; software, S.Y.; validation, S.Y., H.Z. and H.X.; formal analysis, S.Y.; investigation, S.Y., H.X. and H.Z.; resources, S.Y.; data curation, S.Y.; writing—original draft preparation, S.Y., H.X. and H.Z.; writing—review and editing, H.X.; visualization, S.Y.; supervision, H.X.; project administration, H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available by request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AICAkaike information criterion
BEKKBaba–Engle–Kraft–Kroner
BICBayesian information criterion
CVCross-validation
DCCDynamic conditional correlation
GARCHGeneralized autoregressive conditionally heteroskedastic
GMVGlobal minimum variance
IRInformation ratio
LARSLeast-angle regression
LASSOLeast absolute shrinkage and selection operator
MGARCHMultivariate GARCH
PQLPenalized quasi-likelihood
PQMLPenalized quasi-maximum likelihood
SCADSmoothly clipped absolute deviation
SDStandard deviation

Appendix A. Proofs of Propositions, Lemmas, and Theorems

Appendix A.1. Proof of Proposition 2

Let $R_t$, $\mathbf{C}$, and $\Sigma_t^*$ be defined by
$$R_t = \big(\mathrm{vech}(r_t r_t')', \ldots, \mathrm{vech}(r_{t-m+1} r_{t-m+1}')'\big)', \qquad \Sigma_t^* = \big(\mathrm{vech}(\Sigma_t)', \ldots, \mathrm{vech}(\Sigma_{t-m+1})'\big)',$$
where $m = \max(a, b)$, and let
$$\mathbf{C} = \big(\mathrm{vech}(CC')', 0', \ldots, 0'\big)',$$
with dimension $mn(n+1)/2$. Define
$$\mathbf{A} = \begin{pmatrix} \tilde{A}_1 & \cdots & \tilde{A}_{m-1} & \tilde{A}_m \\ 0 & \cdots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & 0 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} \tilde{B}_1 & \cdots & \tilde{B}_{m-1} & \tilde{B}_m \\ I & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & I & 0 \end{pmatrix},$$
with the convention $\tilde{A}_i = 0$ if $i > a$ and $\tilde{B}_i = 0$ if $i > b$. Then, the model can be written as
$$\Sigma_t^* = \mathbf{C} + \mathbf{A} R_{t-1} + \mathbf{B}\Sigma_{t-1}^* = \sum_{k=0}^{t-1} \mathbf{B}^k(\theta)\mathbf{C}(\theta) + \mathbf{B}^t(\theta)\Sigma_0^* + \sum_{k=0}^{t-1} \mathbf{B}^k(\theta)\mathbf{A}(\theta) L^k R_{t-1}(\theta_0),$$
where $L$ is the backshift operator, $L r_t = r_{t-1}$. Here, $\Sigma_0^*$ is fixed, and $R_t$ depends on $\theta_0$ but is not a function of $\theta$. Then, we have
$$\frac{\partial \Sigma_t^*}{\partial \theta_i} = \frac{\partial}{\partial \theta_i}\Big(\sum_{k=0}^{t-1} \mathbf{B}^k \mathbf{C}\Big) + \frac{\partial}{\partial \theta_i}\big(\mathbf{B}^t\big)\Sigma_0^* + \frac{\partial}{\partial \theta_i}\Big(\sum_{k=0}^{t-1} \mathbf{B}^k \mathbf{A} L^k\Big) R_{t-1}. \quad \text{(A1)}$$
Since
$$\frac{\partial \mathbf{B}^k}{\partial \theta_i} = \sum_{j=0}^{k-1} \mathbf{B}^j\, \frac{\partial \mathbf{B}}{\partial \theta_i}\, \mathbf{B}^{k-1-j},$$
we have
$$\Big\|\mathbf{B}^j\, \frac{\partial \mathbf{B}}{\partial \theta_i}\, \mathbf{B}^{k-1-j}\Big\| \le \|\mathbf{B}^j\| \cdot \Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\| \cdot \|\mathbf{B}^{k-1-j}\|, \qquad j = 0, \ldots, k-1.$$
Applying Lemma A.3 of (), $\|\mathbf{B}^k\| \le \Psi k^{n_0} \rho_0^k$ for all $k$, where $n_0$ is a fixed integer, $\Psi$ is a constant independent of $\theta$, and $0 \le \rho_0 < 1$, we obtain
$$\Big\|\mathbf{B}^j\, \frac{\partial \mathbf{B}}{\partial \theta_i}\, \mathbf{B}^{k-1-j}\Big\| \le \Psi^2 k^{n_0} \rho_0^{k-1} \Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\|.$$
To bound (A1), there are three terms to control. For the first term,
$$\begin{aligned} \Big\|\frac{\partial}{\partial \theta_i}\Big(\sum_{k=0}^{t-1}\mathbf{B}^k\mathbf{C}\Big)\Big\| &= \Big\|\sum_{k=1}^{t-1}\frac{\partial \mathbf{B}^k}{\partial \theta_i}\,\mathbf{C} + \sum_{k=0}^{t-1}\mathbf{B}^k\,\frac{\partial \mathbf{C}}{\partial \theta_i}\Big\| \le \sum_{k=1}^{t-1}\Big\|\frac{\partial \mathbf{B}^k}{\partial \theta_i}\Big\|\cdot\|\mathbf{C}\| + \sum_{k=0}^{t-1}\|\mathbf{B}^k\|\cdot\Big\|\frac{\partial \mathbf{C}}{\partial \theta_i}\Big\| \\ &\le \Psi^2\|\mathbf{C}\|\sum_{k=1}^{t-1}k^{n_0}\rho_0^{k}\,\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\| + \Psi\,\Big\|\frac{\partial \mathbf{C}}{\partial \theta_i}\Big\|\sum_{k=0}^{t-1}k^{n_0}\rho_0^{k} \le \pi(n_0)\Psi\Big(\Psi\|\mathbf{C}\|\cdot\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\| + \Big\|\frac{\partial \mathbf{C}}{\partial \theta_i}\Big\|\Big), \end{aligned}$$
using $\sum_{k=0}^{t-1} k^{n_0}\rho_0^{k} \le \sum_{k=0}^{t} k^{n_0}\rho_0^{k-1} \le \sum_{k=0}^{\infty} k^{n_0}\rho_0^{k-1} = \pi(n_0)$, where $\pi(n_0)$ is a constant that depends only on $n_0$. If $\rho_0 = 0$, this term is easily bounded because $\mathbf{B}$ is then nilpotent and all sums are finite. In the same way,
$$\Big\|\frac{\partial}{\partial \theta_i}\big(\mathbf{B}^t\big)\Sigma_0^*\Big\| \le \Psi\,\pi(n_0)\,\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\|\cdot\|\Sigma_0^*\|.$$
Finally,
$$\Big\|\frac{\partial}{\partial \theta_i}\Big(\sum_{k=0}^{t-1}\mathbf{B}^k L^k\mathbf{A}\Big)R_{t-1}\Big\| \le \Big\|\sum_{k=0}^{t-1}\Big(\frac{\partial \mathbf{B}^k}{\partial \theta_i}\Big)L^k\mathbf{A}\,R_{t-1}\Big\| + \Big\|\sum_{k=0}^{t-1}\mathbf{B}^k L^k\Big(\frac{\partial \mathbf{A}}{\partial \theta_i}\Big)R_{t-1}\Big\|.$$
Denoting the first and second sums on the right-hand side by $T_1$ and $T_2$, respectively, we have
$$\|T_1\| \le \Psi^2\sum_{k=1}^{t-1}k^{n_0+1}\rho_0^{k-1}\|\mathbf{A}\|\cdot\|R_{t-k-1}\|\cdot\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\| \le \Psi^2\|\mathbf{A}\|\cdot\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\|\cdot\sum_{k=1}^{t-1}k^{n_0+1}\rho_0^{k-1}\cdot\sup_t\|R_t\| \le \pi(n_0+1)\,\Psi^2\|\mathbf{A}\|\cdot\Big\|\frac{\partial \mathbf{B}}{\partial \theta_i}\Big\|\cdot\sup_t\|R_t\|,$$
and
$$\|T_2\| \le \Psi^2\,\pi(n_0)\,\Big\|\frac{\partial \mathbf{A}}{\partial \theta_i}\Big\|\,\sup_t\|R_t\|.$$
By our assumptions, $\|\mathbf{C}\|$, $\|\partial\mathbf{C}/\partial\theta_i\|$, $\|\mathbf{A}\|$, $\|\partial\mathbf{A}/\partial\theta_i\|$, $\|\partial\mathbf{B}/\partial\theta_i\|$, and $\|\Sigma_0^*\|$ are all bounded. Moreover, since $\mathrm{vech}(\Sigma_t)$ is a subvector of $\Sigma_t^*$, we have $\|\partial\Sigma_t/\partial\theta_i\| \le \|\partial\Sigma_t^*/\partial\theta_i\|$. Hence,
$$\Big\|\frac{\partial\Sigma_t}{\partial\theta_i}\Big\| \le \Psi_1 + \Psi_2\sup_t\|R_t\|,$$
where $\Psi_1 = \Psi\pi(n_0)\big(\Psi\|\mathbf{C}\|\cdot\|\partial\mathbf{B}/\partial\theta_i\| + \|\partial\mathbf{C}/\partial\theta_i\|\big) + \Psi\pi(n_0)\|\partial\mathbf{B}/\partial\theta_i\|\cdot\|\Sigma_0^*\|$ and $\Psi_2 = \Psi^2\pi(n_0+1)\|\mathbf{A}\|\cdot\|\partial\mathbf{B}/\partial\theta_i\| + \Psi^2\pi(n_0)\|\partial\mathbf{A}/\partial\theta_i\|$. □

Appendix A.2. Proof of Proposition 4

As
$$\frac{\partial l_t(\theta)}{\partial\theta_i} = \mathrm{Tr}\Big(\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}r_t r_t'\Sigma_t^{-1} - \frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big),$$
where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, and $E[r_t r_t' \mid \mathcal{F}_{t-1}] = \Sigma_t$, we have $E[\partial l_t(\theta)/\partial\theta_i \mid \mathcal{F}_{t-1}] = 0$, which means that $\partial l_t(\theta)/\partial\theta_i$ is a martingale difference. We then want to prove that $E\big[\big|T^{1/2}\cdot T^{-1}\sum_{t=1}^T \partial l_t(\theta_0)/\partial\theta_i\big|^m\big] = E\big[\big|T^{-1/2}\sum_{t=1}^T \partial l_t(\theta_0)/\partial\theta_i\big|^m\big] < \infty$ holds for $m = 4$. By Lemma 2, this proof is complete if we show that $E[|\partial l_t(\theta_0)/\partial\theta_i|^4] < \infty$. By Proposition 2, $\|\partial\Sigma_t/\partial\theta_i\| \le \Psi_1 + \Psi_2\sup_t\|\mathrm{vech}(r_t r_t')\|$. Since, by the cyclic property of the trace,
$$\mathrm{Tr}\Big(\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}r_tr_t'\Sigma_t^{-1} - \frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big) = \mathrm{Tr}\Big(\big(r_tr_t'\Sigma_t^{-1} - I\big)\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big) \le \big\|I - r_tr_t'\Sigma_t^{-1}\big\|\cdot\Big\|\frac{\partial\Sigma_t}{\partial\theta_i}\Big\|\cdot\|\Sigma_t^{-1}\|,$$
it suffices to show that
$$E\Big[\Big|\frac{\partial l_t(\theta_0)}{\partial\theta_i}\Big|^4\Big] = E\Big[\mathrm{Tr}^4\Big(\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}r_tr_t'\Sigma_t^{-1} - \frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big)\Big] < \infty.$$
Since $\mathrm{Tr}(AB) \le \|A\|\cdot\|B\|$ and $\|\Sigma_t^{-1}\|$ is bounded, there exists a constant $M$ such that $\|\Sigma_t^{-1}\| \le M$ for all $t$. Additionally, $\|I - r_tr_t'\Sigma_t^{-1}\| \le \|I\| + \|r_tr_t'\|\cdot\|\Sigma_t^{-1}\| \le 1 + M\|r_tr_t'\|$; therefore,
$$E\Big[\mathrm{Tr}^4\Big(\frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}r_tr_t'\Sigma_t^{-1} - \frac{\partial\Sigma_t}{\partial\theta_i}\Sigma_t^{-1}\Big)\Big] \le E\Big[\big(1 + M\sup_t\|r_tr_t'\|\big)^4\big(\Psi_1 + \Psi_2\sup_t\|\mathrm{vech}(r_tr_t')\|\big)^4\Big].$$
Because $\|r_tr_t'\|$ and $\|\mathrm{vech}(r_tr_t')\|$ are equivalent norms, there exists a constant $k$ such that $\|r_tr_t'\| \le k\|\mathrm{vech}(r_tr_t')\|$. Hence, writing $\|R_t\| = \|\mathrm{vech}(r_tr_t')\|$,
$$E\Big[\big(1 + M\sup_t\|r_tr_t'\|\big)^4\big(\Psi_1 + \Psi_2\sup_t\|R_t\|\big)^4\Big] \le E\Big[\big(1 + kM\sup_t\|R_t\|\big)^4\big(\Psi_1 + \Psi_2\sup_t\|R_t\|\big)^4\Big] = E\Big(\sum_{i=0}^{8} a_i\|R_t\|^i\Big),$$
where the $a_i$ are constants. Since $r_t = \Sigma_t^{1/2}\epsilon_t$, where the $\epsilon_t$ follow a normal distribution, $r_t$ admits finite moments up to order 16. Hence, $E\|R_t\|^i < \infty$ for $i = 0, \ldots, 8$, so $E(\sum_{i=0}^{8} a_i\|R_t\|^i) < \infty$ and $E[|\partial l_t(\theta_0)/\partial\theta_i|^4] < \infty$.
Next, we check (c) and (d). (c) is clear, as noted before. By (III) in Lemma 1, the derivative of $H_{U_0,T}(\theta)$ is bounded. By the mean-value theorem,
$$\mathrm{vec}\big(H_{U_0,T}(\theta^{(1)},0) - H_{U_0,T}(\theta^{(2)},0)\big) = \frac{\partial H_{U_0,T}(\theta,0)}{\partial\theta}\Big|_{\theta=\theta^*}\cdot\big(\theta^{(1)} - \theta^{(2)}\big),$$
where $\theta^*$ lies between $\theta^{(1)}$ and $\theta^{(2)}$. Hence,
$$\big\|H_{U_0,T}(\theta^{(1)},0) - H_{U_0,T}(\theta^{(2)},0)\big\| \le \big\|\mathrm{vec}\big(H_{U_0,T}(\theta^{(1)},0) - H_{U_0,T}(\theta^{(2)},0)\big)\big\| \le \Big\|\frac{\partial H_{U_0,T}(\theta,0)}{\partial\theta}\Big|_{\theta=\theta^*}\Big\|\cdot\big\|\theta^{(1)} - \theta^{(2)}\big\| \le \tilde{K}\big\|\theta^{(1)} - \theta^{(2)}\big\|,$$
where $\tilde{K}$ is bounded by (iii) in Proposition 3; hence, $\tilde{K} = O_p(1)$.
Next, we verify (e) with $\beta = \delta_0/2$. For every $i \in \{1, \ldots, p\}$, it is sufficient to show that $\max_{\|v\|=1}\big|(H_{i1,T}^0, \ldots, H_{iq,T}^0)\,v\big| = O_p(T^{\delta_0/2})$ for $v \in \mathbb{R}^q$. Using the Cauchy–Schwarz inequality and the properties of the norm, the left-hand side is bounded by $\|(H_{i1,T}^0, \ldots, H_{iq,T}^0)\| \le q^{1/2}\max_{1\le j\le q}|H_{ij,T}^0|$. Since, by (I) and (II) in Lemma 1, $H_{ij,T}^0 = O_p(1)$ and $q = O(T^{\delta_0})$, the result follows. □

Appendix A.3. Proof of Lemma 1

First, consider the PQL $Q_T(\theta)$, as defined in (5), on the constrained $\|\hat\theta\|_0$-dimensional subspace $\mathcal{S} := \{\theta \in \mathbb{R}^p : \theta_{\hat{U}^c} = 0\}$ of $\mathbb{R}^p$, where $\theta_{\hat{U}^c}$ denotes the subvector of $\theta$ formed by the components indexed by $\hat{U}^c$. It follows from (12) that $Q_T(\theta)$ is strictly concave in a ball $\mathcal{N}_0 \subset \mathcal{S}$ centered at $\hat\theta$. This, along with (10), entails that $\hat\theta$, as a critical point of $Q_T(\theta)$ in $\mathcal{S}$, is the unique maximizer of $Q_T(\theta)$ in $\mathcal{N}_0$.
Now, we show that $\hat\theta$ is indeed a strict local maximizer of $Q_T(\theta)$ on the whole space $\mathbb{R}^p$. Take a small ball $\mathcal{N}_1 \subset \mathbb{R}^p$ centered at $\hat\theta$ such that $\mathcal{N}_1 \cap \mathcal{S} \subset \mathcal{N}_0$. We then need to show that $Q_T(\hat\theta) > Q_T(\gamma_1)$ for any $\gamma_1 \in \mathcal{N}_1 \setminus \mathcal{N}_0$. Let $\gamma_2$ be the projection of $\gamma_1$ onto $\mathcal{S}$, so that $\gamma_2 \in \mathcal{N}_0$. Thus, it suffices to prove that $Q_T(\gamma_2) > Q_T(\gamma_1)$. By the mean value theorem, we have
$$Q_T(\gamma_1) - Q_T(\gamma_2) = \frac{\partial Q_T(\gamma_0)}{\partial\gamma'}(\gamma_1 - \gamma_2),$$
where the vector $\gamma_0$ lies between $\gamma_1$ and $\gamma_2$. Note that the components of $\gamma_1 - \gamma_2$ are zero for indices in $\hat{U}$, and $\mathrm{sgn}(\gamma_{0j}) = \mathrm{sgn}(\gamma_{1j})$ for $j \in \hat{U}^c$. Therefore, we have
$$\frac{\partial Q_T(\gamma_0)}{\partial\gamma'}(\gamma_1 - \gamma_2) = S_T(\gamma_0)'(\gamma_1 - \gamma_2) - \lambda T\,[\mathrm{sgn}(\gamma_0)]'(\gamma_1 - \gamma_2) = S_{\hat{U}^c,T}(\gamma_0)'\,\gamma_{1\hat{U}^c} - \lambda T\sum_{j\in\hat{U}^c}|\gamma_{1j}|, \quad \text{(A2)}$$
where $\gamma_{1\hat{U}^c}$ is the subvector of $\gamma_1$ formed by the components indexed by $\hat{U}^c$. By (10), there exists some $\delta > 0$ such that, for any $\theta$ in a ball in $\mathbb{R}^p$ centered at $\hat\theta$ with radius $\delta$,
$$\|S_{\hat{U}^c,T}(\theta)\|_\infty < \lambda T. \quad \text{(A3)}$$
We further shrink the radius of the ball $\mathcal{N}_1$ to less than $\delta$, so that $|\gamma_{0j}| \le |\gamma_{1j}| < \delta$ for $j \in \hat{U}^c$ and (A3) holds for any $\theta \in \mathcal{N}_1$. Since $\gamma_0 \in \mathcal{N}_1$, it follows from (A3) that (A2) is strictly less than
$$\lambda T\,\|\gamma_{1\hat{U}^c}\|_1 - \lambda T\,\|\gamma_{1\hat{U}^c}\|_1 = 0.$$
Indeed, since $\|S_{\hat{U}^c,T}(\gamma_0)\|_\infty < \lambda T$, we have $S_{\hat{U}^c,T}(\gamma_0)'\,\gamma_{1\hat{U}^c} < \lambda T\,\|\gamma_{1\hat{U}^c}\|_1$ whenever $\gamma_{1\hat{U}^c} \neq 0$, and $\lambda T\sum_{j\in\hat{U}^c}|\gamma_{1j}| = \lambda T\,\|\gamma_{1\hat{U}^c}\|_1$. Hence, $\frac{\partial Q_T(\gamma_0)}{\partial\gamma'}(\gamma_1 - \gamma_2) < 0$ and $Q_T(\gamma_1) < Q_T(\gamma_2)$. □

Appendix A.4. Proof of Lemma 2

A Marcinkiewicz–Zygmund inequality for martingales () states that
$$E\Big|\sum_{t=1}^T w_t\Big|^m \le \{4m(m-1)\}^{m/2}\, T^{(m-2)/2}\sum_{t=1}^T E|w_t|^m$$
holds for $m > 2$. Because $E|w_t|^m \le C_w$ for all $t$, we have
$$T^{-m/2}\,E\Big|\sum_{t=1}^T w_t\Big|^m \le \{4m(m-1)\}^{m/2}\, T^{-1}\sum_{t=1}^T E|w_t|^m \le \{4m(m-1)\}^{m/2}\, C_w.$$
Thus, the result follows. □

Appendix A.5. Proof for Theorem 1

For notational simplicity, we write, for example, $Q_T\big(((\theta_{U_0})', (\theta_{U_0^c})')'\big)$ as $Q_T(\theta_{U_0}, \theta_{U_0^c})$. Consider the events
$$E_{T1} = \big\{\|S_{U_0,T}^0\|_\infty \le (q^{1/2}/T)^{1/2}\log^{1/4}T\big\}, \qquad E_{T2} = \big\{\|S_{U_0^c,T}^0\|_\infty \le \lambda\,\log^{-1}T\big\},$$
where $q = O(T^{\delta_0})$ and $\lambda = O(T^{-\alpha})$. It follows from Bonferroni's inequality and Markov's inequality, together with Proposition 4(i), that
$$\begin{aligned} P(E_{T1}\cap E_{T2}) &\ge 1 - \sum_{i\in U_0} P\big(|T^{1/2}S_{i,T}^0| > q^{1/4}(\log T)^{1/4}\big) - \sum_{i\in U_0^c} P\big(|T^{1/2}S_{i,T}^0| > T^{1/2-\alpha}(\log T)^{-1}\big) \\ &\ge 1 - \frac{q\max_{i\in U_0}E\big(|T^{1/2}S_{i,T}^0|^4\big)}{q\log T} - \frac{(p-q)\max_{i\in U_0^c}E\big(|T^{1/2}S_{i,T}^0|^4\big)(\log T)^4}{T^{4(1/2-\alpha)}} \\ &= 1 - O(\log^{-1}T) - O\big(T^{\delta - 4(1/2-\alpha)}(\log T)^4\big), \quad \text{(A6)} \end{aligned}$$
where the last two terms are $o(1)$ because of the condition $\delta < 4(1/2 - \alpha)$. On the event $E_{T1}\cap E_{T2}$, we will show that there exists a solution $\hat\theta \in \mathbb{R}^p$ to (10)–(12) with $\mathrm{sgn}(\hat\theta) = \mathrm{sgn}(\theta_0)$ and $\|\hat\theta - \theta_0\|_\infty = O(T^{-\gamma}\log T)$ for some $\gamma \in (0, 1/2]$.
First, we prove that, for sufficiently large $T$, Equation (10) has a solution $\hat\theta_{U_0}$ inside the hypercube $\mathcal{N} = \{\theta_{U_0} \in \mathbb{R}^q : \|\theta_{U_0} - \theta_{U_0}^0\|_\infty \le T^{-\gamma}\log T\}$ when we take $\hat{U} = U_0$. Define the function $\Psi: \mathbb{R}^q \to \mathbb{R}^q$ by
$$\Psi(\theta_{U_0}) = S_{U_0,T}(\theta_{U_0}, 0) - \lambda\,\mathrm{sgn}(\theta_{U_0}). \quad \text{(A7)}$$
Then, (10) is equivalent to $\Psi(\hat\theta_{U_0}) = 0$. To show that the solution is in the hypercube $\mathcal{N}$, we expand $\Psi(\theta_{U_0})$ around $\theta_{U_0}^0$. Function (A7) can be written as
$$\begin{aligned} \Psi(\theta_{U_0}) &= S_{U_0,T}^0 + H_{U_0,T}(\theta_{U_0}^*, 0)(\theta_{U_0} - \theta_{U_0}^0) - \lambda\,\mathrm{sgn}(\theta_{U_0}) \\ &= H_{U_0,T}^0(\theta_{U_0} - \theta_{U_0}^0) + \big[S_{U_0,T}^0 - \lambda\,\mathrm{sgn}(\theta_{U_0})\big] + \big[H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0\big](\theta_{U_0} - \theta_{U_0}^0) \\ &= H_{U_0,T}^0(\theta_{U_0} - \theta_{U_0}^0) + v_T + w_T, \quad \text{(A8)} \end{aligned}$$
where $\theta_{U_0}^*$ lies on the line segment joining $\theta_{U_0}$ and $\theta_{U_0}^0$. Since the matrix $H_{U_0,T}^0$ is invertible by Proposition 4(ii), (A8) can be rewritten as
$$\tilde\Psi(\theta_{U_0}) := (H_{U_0,T}^0)^{-1}\Psi(\theta_{U_0}) = \theta_{U_0} - \theta_{U_0}^0 + (H_{U_0,T}^0)^{-1}v_T + (H_{U_0,T}^0)^{-1}w_T = \theta_{U_0} - \theta_{U_0}^0 + \tilde{v}_T + \tilde{w}_T. \quad \text{(A9)}$$
We now derive bounds for the last two terms in (A9), considering $\tilde{v}_T$ first. For any $\theta_{U_0} \in \mathcal{N}$,
$$\min_{j\in U_0}|\theta_j| \ge \min_{j\in U_0}|\theta_j^0| - T^{-\gamma}\log T \ge d_T - T^{-\gamma}\log T > 0$$
by Condition 3(ii), and $\mathrm{sgn}(\theta_{U_0}) = \mathrm{sgn}(\theta_{U_0}^0)$. Using Condition 3(i), we have
$$\|\lambda\,\mathrm{sgn}(\theta_{U_0})\| = \lambda\, q^{1/2} = o\big(q^{-1/2}T^{-\gamma}\log T\big).$$
This, along with the properties of matrix norms and Proposition 4(ii), entails that, on the event $E_{T1}$,
$$\begin{aligned} \|\tilde{v}_T\| &= \big\|(H_{U_0,T}^0)^{-1}\big[S_{U_0,T}^0 - \lambda\,\mathrm{sgn}(\theta_{U_0})\big]\big\| \le q^{1/2}\big\|(H_{U_0,T}^0)^{-1}\big\|\big(\|S_{U_0,T}^0\|_\infty + \|\lambda\,\mathrm{sgn}(\theta_{U_0})\|\big) \\ &\le q^{1/2}\,O_p(1)\big((q^{1/2}/T)^{1/2}\log^{1/4}T + o(q^{-1/2}T^{-\gamma}\log T)\big) = o_p(T^{-\gamma}\log T), \quad \text{(A11)} \end{aligned}$$
where the last equality follows from $q = O(T^{\delta_0})$ and $\delta_0 < \frac{2}{3}(1 - 2\gamma)$. Next, we consider $\tilde{w}_T$. By the properties of norms and Proposition 4(ii),(iii),
$$\begin{aligned} \|\tilde{w}_T\| &= \big\|(H_{U_0,T}^0)^{-1}\big[H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0\big](\theta_{U_0} - \theta_{U_0}^0)\big\| \le q^{1/2}\big\|(H_{U_0,T}^0)^{-1}\big\|\cdot\big\|\big[H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0\big](\theta_{U_0} - \theta_{U_0}^0)\big\| \\ &\le q\,O_p(1)\,\big\|H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0\big\|\cdot\|\theta_{U_0} - \theta_{U_0}^0\|_\infty \le q\,O_p(1)\,K_T\,\|\theta_{U_0}^* - \theta_{U_0}^0\|\cdot\|\theta_{U_0} - \theta_{U_0}^0\|_\infty. \end{aligned}$$
Since $K_T = O_p(1)$, $q = O(T^{\delta_0})$ with $\delta_0 < \gamma$, and $\|\theta_{U_0} - \theta_{U_0}^0\|_\infty \le T^{-\gamma}\log T$ on $\mathcal{N}$,
$$\|\tilde{w}_T\| = q\,O_p\big(T^{-2\gamma}(\log T)^2\big) = o_p(T^{-\gamma}\log T). \quad \text{(A12)}$$
By (A9), (A11), and (A12), for sufficiently large $T$ and all $i \in U_0$,
$$\tilde\Psi_i(\theta_{U_0}) \ge T^{-\gamma}\log T - \|\tilde{v}_T\| - \|\tilde{w}_T\| \ge 0 \quad \text{if } \theta_i - \theta_i^0 = T^{-\gamma}\log T, \quad \text{(A13)}$$
and
$$\tilde\Psi_i(\theta_{U_0}) \le -T^{-\gamma}\log T + \|\tilde{v}_T\| + \|\tilde{w}_T\| \le 0 \quad \text{if } \theta_i - \theta_i^0 = -T^{-\gamma}\log T. \quad \text{(A14)}$$
By the continuity of $\tilde\Psi$ and the inequalities (A13) and (A14), an application of Miranda's existence theorem tells us that $\tilde\Psi(\theta_{U_0}) = 0$ has a solution $\hat\theta_{U_0}$ in $\mathcal{N}$. Clearly, $\hat\theta_{U_0}$ also solves the equation $\Psi(\theta_{U_0}) = 0$, in view of the first equality in (A8). Thus, we have shown that (10) indeed has a solution in $\mathcal{N}$.
Second, let $\hat\theta = (\hat\theta_{U_0}', \hat\theta_{U_0^c}')' \in \mathbb{R}^p$, with $\hat\theta_{U_0} \in \mathcal{N}$ a solution to (10) and $\hat\theta_{U_0^c} = 0$. Next, we show that $\hat\theta$ satisfies (11) on the event $E_{T2}$. By the triangle inequality and the mean value theorem, we have
$$\lambda^{-1}\|S_{U_0^c,T}(\hat\theta)\|_\infty \le \lambda^{-1}\|S_{U_0^c,T}^0\|_\infty + \lambda^{-1}\|S_{U_0^c,T}(\hat\theta) - S_{U_0^c,T}^0\|_\infty \le (\log T)^{-1} + \lambda^{-1}\big\|(\partial/\partial\theta_{U_0}')\,S_{U_0^c,T}(\hat\theta_{U_0}^{**}, 0)\,(\hat\theta_{U_0} - \theta_{U_0}^0)\big\|_\infty, \quad \text{(A15)}$$
where $\hat\theta_{U_0}^{**}$ lies on the line segment joining $\hat\theta_{U_0}$ and $\theta_{U_0}^0$. The first term of the upper bound in (A15) is negligible, so it suffices to show that the second term is less than $g(0^+) = 1$. Since $\hat\theta_{U_0}$ solves the equation $\Psi(\theta_{U_0}) = 0$ in (10), we obtain
$$S_{U_0,T}^0 + H_{U_0,T}(\hat\theta_{U_0}^*, 0)(\hat\theta_{U_0} - \theta_{U_0}^0) - \lambda\,\mathrm{sgn}(\hat\theta_{U_0}) = 0,$$
with $\hat\theta_{U_0}^*$ lying between $\hat\theta_{U_0}$ and $\theta_{U_0}^0$. By Proposition 4(ii),(iii) and Condition 1, the last term in (A15) can be expressed as
$$\begin{aligned} &\lambda^{-1}\big\|(\partial/\partial\theta_{U_0}')\,S_{U_0^c,T}(\hat\theta_{U_0}^{**}, 0)\,\big[H_{U_0,T}(\hat\theta_{U_0}^*, 0)\big]^{-1}\big[S_{U_0,T}^0 - \lambda\,\mathrm{sgn}(\hat\theta_{U_0})\big]\big\|_\infty \\ &\quad\le \lambda^{-1}\sup_{\theta,\theta'\in\mathcal{N}}\big\|(\partial/\partial\theta_{U_0}')\,S_{U_0^c,T}(\theta, 0)\,\big[H_{U_0,T}(\theta', 0)\big]^{-1}\big\|_\infty\,\big(\|S_{U_0,T}^0\|_\infty + \lambda\big) \\ &\quad\le \lambda^{-1}c\,\big[(q^{1/2}/T)^{1/2}\log^{1/4}T + \lambda\big] = \lambda^{-1}c\,(q^{1/2}/T)^{1/2}\log^{1/4}T + c. \quad \text{(A16)} \end{aligned}$$
By Condition 3(i), the first term in the last line of (A16) is $o_p(1)$; hence, (A16) is eventually less than 1. This verifies (11).
Finally, (12) is guaranteed by Lemma 1: on the event $E_{T1}\cap E_{T2}$, $\hat\theta$ is a strict local maximizer of $Q_T(\theta)$ with $\|\hat\theta - \theta_0\|_\infty = O(T^{-\gamma}\log T)$ and $\hat\theta_{U_0^c} = 0$. Thus, by (A6), the proofs of Theorem 1(a) and (b) are complete. □

References

  1. Aielli, Gian Piero. 2013. Dynamic conditional correlation: On properties and estimation. Journal of Business and Economic Statistics 31: 282–99. [Google Scholar] [CrossRef]
  2. Alexander, Carol. 2000. Orthogonal methods for generating large positive semi-definite covariance matrices. In ICMA Centre Discussion Papers in Finance icma-dp2000-06. London: Henley Business School, Reading University. [Google Scholar]
  3. Ampountolas, Apostolos. 2022. Cryptocurrencies intraday high-frequency volatility spillover effects using univariate and multivariate GARCH models. International Journal of Financial Studies 10: 51. [Google Scholar] [CrossRef]
  4. Apergis, Nicholas, and Anthony Rezitis. 2001. Asymmetric cross-market volatility spillovers: Evidence from daily data on equity and foreign exchange markets. The Manchester School 69: 81–96. [Google Scholar] [CrossRef]
  5. Apergis, Nicholas, and Anthony Rezitis. 2003. An examination of Okun's law: Evidence from regional areas in Greece. Applied Economics 35: 1147–51. [Google Scholar] [CrossRef]
  6. Baillie, Richard T., and Tim Bollerslev. 1990. A multivariate generalized ARCH approach to modeling risk premia in forward foreign exchange rate markets. Journal of International Money and Finance 9: 309–24. [Google Scholar] [CrossRef]
  7. Basu, Sumanta, and George Michailidis. 2015. Regularized estimation in sparse high-dimensional time series models. The Annals of Statistics 43: 1535–67. [Google Scholar] [CrossRef]
  8. Bauwens, Luc, and Sébastien Laurent. 2005. A new class of multivariate skew densities, with application to generalized autoregressive conditional heteroscedasticity models. Journal of Business and Economic Statistics 23: 346–54. [Google Scholar] [CrossRef]
  9. Bickel, Peter J., and Elizaveta Levina. 2008. Covariance regularization by thresholding. The Annals of Statistics 36: 2577–604. [Google Scholar] [CrossRef] [PubMed]
  10. Billio, Monica, Massimiliano Caporin, Lorenzo Frattarolo, and Loriana Pelizzon. 2023. Networks in risk spillovers: A multivariate GARCH perspective. Econometrics and Statistics 28: 1–29. [Google Scholar] [CrossRef]
  11. Bollerslev, Tim. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–27. [Google Scholar] [CrossRef]
  12. Bollerslev, Tim. 1990. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. The Review of Economics and Statistics 72: 498–505. [Google Scholar] [CrossRef]
  13. Bollerslev, Tim, Robert Engle, and Jeffrey Wooldridge. 1988. A capital asset pricing model with time-varying covariances. Journal of Political Economy 96: 116–31. [Google Scholar] [CrossRef]
  14. Boudt, Kris, Jon Danielsson, and Sébastien Laurent. 2013. Robust forecasting of dynamic conditional correlation GARCH models. International Journal of Forecasting 29: 244–57. [Google Scholar] [CrossRef]
  15. Brodie, Joshua, Ingrid Daubechies, Christine De Mol, Domenico Giannone, and Ignace Loris. 2009. Sparse and stable markowitz portfolios. Proceedings of the National Academy of Sciences of the United States of America 106: 12267–72. [Google Scholar] [CrossRef]
  16. Cai, Tony, and Weidong Liu. 2011. Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association 106: 672–84. [Google Scholar] [CrossRef]
  17. Christiansen, Charlotte. 2007. Volatility-Spillover Effects in European Bond Markets. European Financial Management 13: 923–948. [Google Scholar] [CrossRef]
  18. Comte, Fabienne, and Offer Lieberman. 2003. Asymptotic theory for multivariate GARCH processes. Journal of Multivariate Analysis 84: 61–84. [Google Scholar] [CrossRef]
  19. DeMiguel, Victor, Lorenzo Garlappi, and Raman Uppal. 2007. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? The Review of Financial Studies 22: 1915–53. [Google Scholar] [CrossRef]
  20. Diebold, Francis X., and Kamil Yilmaz. 2009. Measuring financial asset return and volatility spillovers, with application to global equity markets. Economic Journal 119: 158–71. [Google Scholar] [CrossRef]
  21. Di Lorenzo, David, Giampaolo Liuzzi, Francesco Rinaldi, Fabio Schoen, and Marco Sciandrone. 2012. A concave optimization-based approach for sparse portfolio selection. Optimization Methods and Software 27: 983–1000. [Google Scholar] [CrossRef]
  22. Efron, Bradley, Trevor Hastie, and Robert Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–499. [Google Scholar] [CrossRef]
  23. Engle, Robert. 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50: 987–1007. [Google Scholar] [CrossRef]
  24. Engle, Robert. 1990. Asset pricing with a factor-ARCH covariance structure: Empirical estimates for Treasury bills. Journal of Econometrics 45: 213–37. [Google Scholar] [CrossRef]
  25. Engle, Robert. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business and Economic Statistics 20: 339–50. [Google Scholar] [CrossRef]
  26. Engle, Robert, and Kenneth Kroner. 1995. Multivariate simultaneous generalized ARCH. Econometric Theory 11: 122–50. [Google Scholar] [CrossRef]
  27. Engle, Robert, and Riccardo Colacito. 2006. Testing and valuing dynamic correlations for asset allocation. Journal of Business and Economic Statistics 24: 238–53. [Google Scholar] [CrossRef]
  28. Engle, Robert, Olivier Ledoit, and Michael Wolf. 2019. Large dynamic covariance matrices. Journal of Business and Economic Statistics 37: 363–75. [Google Scholar] [CrossRef]
  29. Engle, Robert, Takatoshi Ito, and Wen-Ling Lin. 1990. Meteor showers or heat waves? Heteroskedastic intra-daily volatility in the foreign exchange market. Econometrica 58: 525–42. [Google Scholar] [CrossRef]
30. Fan, Jianqing, and Jinchi Lv. 2011. Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory 57: 5467–84. [Google Scholar] [CrossRef]
  31. Fan, Jianqing, and Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96: 1348–60. [Google Scholar] [CrossRef]
  32. Fan, Yingying, and Cheng Yong Tang. 2013. Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society Series B: Statistical Methodology 75: 531–52. [Google Scholar] [CrossRef]
  33. Fastrich, Björn, Sandra Paterlini, and Peter Winker. 2015. Constructing optimal sparse portfolios using regularization methods. Computational Management Science 12: 417–34. [Google Scholar] [CrossRef]
  34. Francq, Christian, and Jean-Michel Zakoian. 2019. GARCH Models: Structure, Statistical Inference and Financial Applications. Hoboken: John Wiley & Sons. [Google Scholar]
  35. Friedman, Jerome, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. Pathwise coordinate optimization. The Annals of Applied Statistics 1: 302–32. [Google Scholar] [CrossRef]
  36. Giacometti, Rosella, Gabriele Torri, Kamonchai Rujirarangsan, and Michela Cameletti. 2023. Spatial Multivariate GARCH Models and Financial Spillovers. Journal of Risk and Financial Management 16: 397. [Google Scholar] [CrossRef]
  37. Hamao, Yasushi, Ronald W. Masulis, and Victor Ng. 1990. Correlations in price changes and volatility across international stock markets. The Review of Financial Studies 3: 281–307. [Google Scholar] [CrossRef]
  38. Hafner, Christian M., and Arie Preminger. 2009. Asymptotic theory for a factor GARCH model. Econometric Theory 25: 336–63. [Google Scholar] [CrossRef]
  39. Hafner, Christian M., Helmut Herwartz, and Simone Maxand. 2022. Identification of structural multivariate GARCH models. Journal of Econometrics 227: 212–27. [Google Scholar] [CrossRef]
40. Hassan, Syed Aun, and Farooq Malik. 2007. Multivariate GARCH modeling of sector volatility transmission. The Quarterly Review of Economics and Finance 47: 470–80. [Google Scholar] [CrossRef]
  41. Hong, Junping, Yi Yan, Ercan Engin Kuruoglu, and Wai Kin Chan. 2023. Multivariate Time Series Forecasting With GARCH Models on Graphs. IEEE Transactions On Signal And Information Processing Over Networks 9: 557–68. [Google Scholar] [CrossRef]
  42. Kaltenhäuser, Bernd. 2002. Return and Volatility Spillovers to Industry Returns: Does EMU Play a Role? CFS Working Paper Series 2002/05. Frankfurt a. M.: Center for Financial Studies (CFS). [Google Scholar]
  43. Lam, Clifford, and Jianqing Fan. 2009. Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics 37: 4254–78. [Google Scholar] [CrossRef] [PubMed]
44. Lanne, Markku, and Pentti Saikkonen. 2007. A multivariate generalized orthogonal factor GARCH model. Journal of Business and Economic Statistics 25: 61–75. [Google Scholar]
  45. Ledoit, Olivier, and Michael Wolf. 2004. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88: 365–411. [Google Scholar] [CrossRef]
  46. Ledoit, Olivier, and Michael Wolf. 2012. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics 40: 1024–60. [Google Scholar] [CrossRef]
47. Ling, Shiqing, and Michael McAleer. 2003. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19: 280–310. [Google Scholar] [CrossRef]
  48. Markowitz, Harry. 1952. Portfolio selection. The Journal of Finance 7: 77–91. [Google Scholar]
49. McAleer, Michael, Suhejla Hoti, and Felix Chan. 2009. Structure and asymptotic theory for multivariate asymmetric conditional volatility. Econometric Reviews 28: 422–40. [Google Scholar] [CrossRef]
  50. NASDAQ Stock Symbols. n.d. Stock Symbol. Available online: https://www.nasdaq.com/market-activity/stocks/ (accessed on 24 January 2024).
  51. Nicholson, William B., David S. Matteson, and Jacob Bien. 2017. VARX-L: Structured regularization for large vector autoregressions with exogenous variables. International Journal of Forecasting 33: 627–51. [Google Scholar] [CrossRef]
  52. Pan, Ming-Shiun, and L. Paul Hsueh. 1998. Transmission of stock returns and volatility between the U.S. and Japan: Evidence from the stock index futures markets. Asia-Pacific Financial Markets 5: 211–25. [Google Scholar] [CrossRef]
  53. Poignard, Benjamin. 2017. New Approaches for High-Dimensional Multivariate Garch Models. General Mathematics [math.GM]. Ph.D. thesis, Université Paris Sciences et Lettres, Paris, France. [Google Scholar]
54. Ravikumar, Pradeep, Martin J. Wainwright, Garvesh Raskutti, and Bin Yu. 2011. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics 5: 935–80. [Google Scholar] [CrossRef]
  55. Rio, Emmanuel. 2017. Asymptotic Theory of Weakly Dependent Random Processes. Berlin: Springer Nature. [Google Scholar]
  56. Sánchez García, Javier, and Salvador Cruz Rambaud. 2022. Machine Learning Regularization Methods in High-Dimensional Monetary and Financial VARs. Mathematics 10: 877. [Google Scholar] [CrossRef]
  57. Shiferaw, Yegnanew A. 2019. Time-varying correlation between agricultural commodity and energy price dynamics with Bayesian multivariate DCC-GARCH models. Physica A: Statistical Mechanics and Its Applications 526: 120807. [Google Scholar] [CrossRef]
  58. Siddiqui, Taufeeque Ahmad, and Mazia Fatima Khan. 2018. Analyzing spillovers in international stock markets: A multivariate GARCH approach. IMJ 10: 57–63. [Google Scholar]
  59. Sun, Wei, Junhui Wang, and Yixin Fang. 2013. Consistent selection of tuning parameters via variable selection stability. Journal of Machine Learning Research 14: 3419–40. [Google Scholar]
  60. Sun, Yan, and Xiaodong Lin. 2011. Regularization for stationary multivariate time series. Quantitative Finance 12: 573–86. [Google Scholar] [CrossRef]
  61. Theodossiou, Panayiotis, and Unro Lee. 1993. Mean and volatility spillovers across major national stock markets: Further empirical evidence. The Journal of Financial Research 16: 337–50. [Google Scholar] [CrossRef]
  62. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58: 267–88. [Google Scholar] [CrossRef]
63. Tse, Yiu Kuen, and Albert K. C. Tsui. 2002. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business and Economic Statistics 20: 351–62. [Google Scholar]
  64. Uematsu, Yoshimasa. 2015. Penalized likelihood estimation in high-dimensional time series models and its application. arXiv arXiv:1504.06706. [Google Scholar]
65. van der Weide, Roy. 2002. GO-GARCH: A multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics 17: 549–64. [Google Scholar] [CrossRef]
66. Vrontos, Ioannis, Petros Dellaportas, and Dimitris N. Politis. 2003. A full-factor multivariate GARCH model. The Econometrics Journal 6: 312–34. [Google Scholar] [CrossRef]
  67. Wang, Hansheng, Bo Li, and Chenlei Leng. 2009. Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 71: 671–83. [Google Scholar] [CrossRef]
68. Worthington, Andrew, and Helen Higgs. 2004. Transmission of equity returns and volatility in Asian developed and emerging markets: A multivariate GARCH analysis. International Journal of Finance & Economics 9: 71–80. [Google Scholar]
  69. Wu, Tong Tong, and Kenneth Lange. 2008. Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics 2: 224–44. [Google Scholar] [CrossRef]
  70. Yuan, Ming, and Yi Lin. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68: 49–67. [Google Scholar] [CrossRef]
  71. Zhang, Cun-Hui. 2010. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38: 894–942. [Google Scholar] [CrossRef]
  72. Zhang, Yongli, and Yuhong Yang. 2015. Cross-validation for selecting a model selection procedure. Journal of Econometrics 187: 95–112. [Google Scholar] [CrossRef]
  73. Zhao, Peng, and Bin Yu. 2006. On model selection consistency of lasso. Journal of Machine Learning Research 7: 2541–67. [Google Scholar]
  74. Zhao, Peng, and Bin Yu. 2007. Stagewise lasso. Journal of Machine Learning Research 8: 2701–26. [Google Scholar]
  75. Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101: 1418–29. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.