1. Introduction
In high-frequency financial econometrics, covariance matrix estimation of asset returns has been extensively studied over the past two decades. High-frequency financial data are commonly modeled as discretely observed semimartingales for which the quadratic covariation matrix plays the role of the covariance matrix, so their treatment often differs from that in a standard i.i.d. setting. In recent years, motivated by applications to portfolio allocation and risk management in a large-scale asset universe, the high-dimensionality problem has attracted much attention in this area. Since the 2000s, great progress has been made in high-dimensional covariance estimation from i.i.d. data, so researchers are naturally led to apply the techniques developed therein to the context of high-frequency data. For example, Wang and Zou [
1] have applied the entry-wise shrinkage methods considered in [
2,
3] to estimating the covariance matrix of high-frequency data which are asynchronously observed with noise. See also [
4,
5,
6,
7] for further developments in this approach. In the meantime, it is well-recognized that the
factor structure is an important ingredient both theoretically and empirically for financial data. In the context of high-dimensional covariance estimation from high-frequency data, this perspective was first taken into account by Fan et al. [
8] and subsequently developed further by, among others, [
9,
10,
11]. Other common methods used in i.i.d. settings have also been investigated in the literature of high-frequency financial econometrics. Hautsch et al. [
12] and Morimoto and Nagata [
13] formally apply eigenvalue regularization methods based on random matrix theory to high-frequency data. Lam et al. [
14] accommodate the non-linear shrinkage estimator of [
15] to a high-frequency data setting with the help of the spectral distribution theory for the realized covariance matrix developed in [
16]. Brownlees et al. [
17] employ the ℓ1-penalized Gaussian MLE, known as the
graphical Lasso, to estimate the precision matrix (the inverse of the covariance matrix) of high-frequency data. The last approach is closely related to the methodology we will focus on. Despite these recent advances, most studies in this area focus only on
point estimation of covariance and precision matrices, and there is little work on
interval estimation and
hypothesis testing for these objects. A few exceptions are [
18,
19,
20]. The first two articles are concerned with continuous-time factor models: Kong and Liu [
18] propose a test for the constancy of the factor loading matrix, while Pelger [
19] assumes constant loadings and develops an asymptotic distribution theory to make inference for the factors and loadings. Meanwhile, Koike [
20] establishes a high-dimensional central limit theorem for the realized covariance matrix which allows us to construct simultaneous confidence regions or carry out multiple testing for entries of the high-dimensional covariance matrix of high-frequency data.
The aim of this study is to develop such a statistical inference theory for the
precision matrix of high-frequency data. This is naturally motivated by the fact that the precision matrix of asset returns plays an important role in mean-variance analysis of portfolio allocation (see e.g., [
21], Chapter 5). We accomplish this purpose by imposing a sparsity assumption on the precision matrix. Such an assumption has a clear interpretation in connection with
Gaussian graphical models: For a Gaussian random vector
with covariance matrix
,
and
are conditionally independent given the other components if and only if the
-th entry of
is equal to 0, so the sparsity of
is interpreted as the sparsity of the edge structure of the
conditional independence graph associated with
. We refer to Chapter 13 of [
22] and references therein for more details on graphical models. This standpoint also makes it interesting to estimate the precision matrix of financial data in view of the recent attention to financial network analysis such as [
23].
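As a toy numerical illustration of this interpretation (the example precision matrix below is made up for illustration and is not from the paper), a zero off-diagonal entry of the precision matrix corresponds to a vanishing partial correlation, and hence, for Gaussian vectors, to conditional independence of the corresponding components given the rest:

```python
import numpy as np

# Toy sparse precision matrix: components 1 and 3 share no edge.
Theta = np.array([[ 2.0, -0.8,  0.0],
                  [-0.8,  2.0, -0.5],
                  [ 0.0, -0.5,  2.0]])
Sigma = np.linalg.inv(Theta)              # covariance of the Gaussian vector
d = np.sqrt(np.diag(Theta))
partial_corr = -Theta / np.outer(d, d)    # off-diagonal entries are partial correlations
print(partial_corr)                       # entry (0, 2) is exactly zero
```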
Statistical inference for high-dimensional sparse precision matrices has been actively studied in the recent literature, and various methodologies have been proposed; see [
24] for an overview. Among others, this paper studies (a weighted version of) the de-biased (or de-sparsified) graphical Lasso in the context of high-frequency data. The de-biased graphical Lasso was introduced in Janková and van de Geer [
25], where its theoretical properties were investigated in the i.i.d. case. In this paper, we consider its weighted version discussed in [
24] because of its theoretically preferable behavior due to its adaptive nature (see Remarks 1 and 2). Compared to the i.i.d. case, we need to handle a new theoretical difficulty in the application to high-frequency data, which is caused by the non-ergodic nature of the problem, i.e., the precision matrix of high-frequency data is generally stochastic and not (stochastically) independent of the observation data. In our context, the precision matrix appears in the coefficients of the linear approximation of the de-biased estimator (see Lemma 1), so it spoils the martingale structure of the linear approximation which we usually have in the i.i.d. case. In a low-dimensional setting, this issue is typically resolved by the concept of
stable convergence (see e.g., [
26]), but the applicability of this approach is questionable in our setting due to the high-dimensionality (see pages 1451–1452 of [
20] for a discussion). Instead, we rely on the recent high-dimensional central limit theory of [
20] to establish the asymptotic distribution theory for the de-biased estimator, where we settle the above difficulty with the help of Malliavin calculus.
The graphical Lasso is an example of penalized estimation methods. We shall mention that penalized estimation has recently become an active research topic in the setting of asymptotic statistics for stochastic processes. For example, penalized quasi-likelihood estimation for stochastic processes has been developed in the fixed-dimensional setting by [
27,
28,
29,
30], while estimation for linearly parameterized high-dimensional diffusion models has been studied in [
31,
32]. Compared to these articles, this paper is novel in that we develop an
asymptotic distribution theory in a
high-dimensional setting.
The rest of this paper is organized as follows. In
Section 2 we develop an abstract asymptotic theory for the weighted graphical Lasso based on a generic estimator for the quadratic covariation matrix of a high-dimensional semimartingale. This allows us to flexibly apply the developed theory to various settings arising in high-frequency financial econometrics. In
Section 3 we extend the scope of the theory to a situation where a known factor structure is present in data and a sparsity assumption is imposed on the precision matrix of the residual process rather than that of the original process. In
Section 4, we apply the abstract theory developed in
Section 3 to a concrete setting where we observe the process at equidistant times without jumps and noise.
Section 5 conducts a Monte Carlo study to assess the finite sample performance of the asymptotic theory, while
Section 6 performs a simple real data analysis for illustration. All the technical proofs are collected in Appendix A, Appendix B, Appendix C and Appendix D.
Notation 1. Throughout the paper, we assume . ⊤
stands for the transpose of a matrix. For a vector , we write the i-th component of x as for . For two vectors , the statement means for all . The identity matrix of size d is denoted by . We write for the set of all matrices. denotes the set of all symmetric matrices. denotes the set of all positive semidefinite matrices. denotes the set of all positive definite matrices. For a matrix A, the -th entry of A is denoted by . Also, and denote the i-th row vector and the j-th column vector, respectively (both are regarded as column vectors). We write for the vectorization of A: For every , we set Also, we write for the -operator norm of A: It is well-known that and . When , denotes the diagonal matrix with the same diagonal entries as A, and we set . If A is symmetric, we denote by and the maximum and minimum eigenvalues of A, respectively. For two matrices A and B, denotes their Kronecker product. When A and B have the same size, we write for their Hadamard product.
For a random variable ξ and , denotes the -norm of ξ. For an l-dimensional semimartingale and a k-dimensional semimartingale , we define . We write for short. If is a.s. invertible, we write .
2. Estimators and Abstract Results
Given a stochastic basis , we consider a d-dimensional semimartingale defined there. We assume is a.s. invertible. In this paper, we consider an asymptotic theory in which the dimension d possibly depends on a parameter n so that d → ∞ as n → ∞. Consequently, both and Y may also depend on n. However, following the custom of the literature, we omit the index n from these objects and many other objects appearing below.
Our aim is to estimate the precision matrix
when we have an estimator
for
; as a corollary, we can also estimate
itself. We assume that
is an
-valued random variable all of whose diagonal entries are a.s. positive, but we do not specify the form of
because the asymptotic theory developed in this section depends on the properties of
rather than its construction. This is convenient because the construction of the estimator depends heavily on observation schemes for
Y (with or without noise, synchronous or not, continuous or discontinuous and so on; see [
33] for details). In
Section 4 we illustrate how we apply the abstract theory developed in this and the next sections to a concrete situation.
We use the
weighted graphical Lasso to estimate
(cf. [
24]). The weighted graphical Lasso estimator
with penalty parameter
based on
is defined by
where
. According to the proof of [
34] (Lemma 1), the optimization problem in Equation (
1) has a unique solution when
and
is positive semidefinite and all the diagonal entries of
are positive, so
is a.s. defined in our setting. In the following we allow
to be a random variable because we typically select
in a data-driven way.
To analyze the theoretical property of
, it is convenient to consider the graphical Lasso estimator
based on the correlation matrix estimator
as follows:
We can easily check .
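As a computational aside, the correlation-based route described above can be sketched in a few lines of Python. The sketch assumes the relation just mentioned, namely that the weighted estimator is obtained by applying the (unweighted) graphical Lasso to the correlation matrix and rescaling by the estimated standard deviations; function names are ours, and the penalty convention of scikit-learn's graphical_lasso may differ in minor details from Equation (1).

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def weighted_glasso(Sigma_hat, lam):
    """Weighted graphical Lasso via the correlation-matrix route (sketch).

    Assumes the weighted estimator equals the plain graphical Lasso applied
    to the correlation matrix R = D^{-1/2} Sigma D^{-1/2}, rescaled back by
    D^{-1/2}, where D = diag(Sigma_hat).
    """
    d_sqrt = np.sqrt(np.diag(Sigma_hat))           # estimated standard deviations
    R_hat = Sigma_hat / np.outer(d_sqrt, d_sqrt)   # correlation matrix estimator
    _, K_hat = graphical_lasso(R_hat, alpha=lam)   # graphical Lasso on correlations
    return K_hat / np.outer(d_sqrt, d_sqrt)        # rescale back to the precision scale
```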
Remark 1. As pointed out in Rothman et al. [35] and Janková and van de Geer [24], the graphical Lasso based on correlation matrices is theoretically preferable to that based on covariance matrices (so the weighted graphical Lasso is also preferable). In particular, we do not need to impose the so-called irrepresentability condition on to derive the theoretical properties of our estimators, which contrasts with Brownlees et al. [17] (see Assumption 2 in [17]). See also Remark 2 for an additional discussion.

We introduce some notation related to the sparsity assumptions we will impose on . Let . For , we set and . Then we define . We also define and . These quantities have a clear interpretation when the matrix A represents the edge structure of some graph so that is equivalent to the presence of an edge between vertices i and j for ; in this case, is the number of edges adjacent to vertex j (which is called the degree of vertex j) and is the total number of edges contained in the graph.
To derive our asymptotic results, we will impose the following structural assumptions on .
- [A1]
as .
- [A2]
as for some sequence , .
- [A3]
as for some sequence , .
[A1] is standard in the literature; see e.g., Condition A1 in [
24]. [A2] states that the sparsity of
is controlled by the deterministic sequence
; we will require the growth rate of
to be moderate. [A3] is another sparsity assumption on
. It is weaker than [A2] in the sense that it always holds true with
under [A2]. However, we can generally take
smaller than
.
2.1. Consistency
Set , and .
Proposition 1. Assume [A1]–[A2]. Let be a sequence of positive-valued random variables satisfying the following conditions:
- [B1]
as .
- [B2]
as .
Then we have and as for any .

Proposition 1 is essentially a rephrasing of Theorem 14.1.3 in [
24]. To get a better convergence rate in Proposition 1, we should choose
as small as possible, where a lower bound of
is determined by the convergence rate of
in the
-norm by [B1]. One typically derives this convergence rate by establishing entry-wise concentration inequalities for
. Such inequalities have already been established for various covariance estimators used in high-frequency financial econometrics; see Theorems 1–2 and Lemma 3 in [
36], Theorem 1 in [
4], Theorem 1 in [
37], and Theorem 2 in [
17] for example. We however note that
should be positive semidefinite to ensure that the graphical Lasso has the unique solution. This property is not necessarily ensured by many covariance estimators used in this area. In this regard, we mention that pre-averaging and realized kernel estimators have versions to ensure this property, for which relevant bounds are available in [
6] (Theorem 2) and [
11] (Lemma 1).
Remark 2 (Comparison to Brownlees et al. [
17]).
Compared with [17] (Theorem 1), Proposition 1 has two major theoretical improvements. First, Proposition 1 does not assume the so-called irrepresentability condition, which is imposed in [17] (Theorem 1) as Assumption 2. In fact, under the assumptions of Proposition 1, the unweighted graphical Lasso estimator adopted in [17] would have the convergence rate (rather than in our case) to estimate in the norm , in view of [24] (Theorem 14.1.2). This means that we need to select so that as to ensure the consistency, which is much stronger than the corresponding assumption [B2] in our setting. Since typically converges to 0 no faster than with n being the sample size (cf. Section 4), the condition excludes high-dimensional settings such that .

Second, Proposition 1 gives consistency in the -operator norm for all , while [17] (Theorem 1) only shows consistency in the -norm. We shall remark that consistency in matrix operator norms is important in applications. For example, the consistency of in the -operator norm implies that eigenvalues of consistently estimate the corresponding eigenvalues of . Also, the consistency in the -operator norm ensures as for any such that . This result is important for portfolio allocation because the weight vector for the global minimum variance portfolio is given by when assets have covariance matrix , where ; see e.g., [21] (Section 5.2).

On the other hand, unlike [17] (Theorem 1), we do not show selection consistency (i.e., as ) under our assumptions. Indeed, in the linear regression setting, it is known that an irrepresentability-type condition is necessary for the selection consistency of the Lasso; see [22] (Section 7.5.3) for more details. This suggests that our estimator would not have the oracle property in the sense of [38] in general. However, we shall remark that the asymptotic mixed normality of the de-biased estimator stated below can be used to construct an estimator with selection consistency via thresholding as in e.g., [39] (Section 3.1) and [40] (Section 4.2). See Corollary 2 and the subsequent discussion for details.
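To illustrate the portfolio-allocation point above, the global minimum variance weights can be computed directly from an estimated precision matrix; the snippet below (a minimal sketch with illustrative names) implements the formula recalled in the remark.

```python
import numpy as np

def gmv_weights(Theta_hat):
    """Global minimum variance portfolio weights w = Theta 1 / (1' Theta 1)."""
    ones = np.ones(Theta_hat.shape[0])
    return Theta_hat @ ones / (ones @ Theta_hat @ ones)
```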
2.2. Asymptotic Mixed Normality
The following lemma states that is asymptotically linear in after bias correction when is sufficiently sparse.
Lemma 1. Suppose that the assumptions of Proposition 1 and [A3] are satisfied. Then we have as , where .

Lemma 1 is an almost straightforward consequence of Equation (
4) and the Karush–Kuhn–Tucker (KKT) conditions for the optimization problem in Equation (
1). As a consequence of this lemma, we obtain the following result, which states that the “de-biased” weighted graphical Lasso estimator
inherits the asymptotic mixed normality of
.
Proposition 2. Suppose that the assumptions of Lemma 1 are satisfied. For every , let , be a positive semidefinite random matrix and be an random matrix, where may depend on n. Assume as . Assume also that and as , where and is a -dimensional standard Gaussian vector independent of , which is defined on an extension of the probability space if necessary. Then,

In a standard i.i.d. setting such that
is non-random, we can usually verify Equation (
5) by the classical Lindeberg central limit theorem when
and
is non-random because
can be written as a sum of independent random variables; see the proof of [
25] (Theorem 1) for example. By contrast,
is generally random and not independent of
in our setting, so
may not be a martingale even if
is a martingale. In the case that
d is fixed, we typically resolve this issue by proving
stable convergence in law of
; see e.g., [
26] for details. However, extension of this approach to the case that
as
is far from trivial as discussed at the beginning of [
20] (Section 3). For this reason, [
20] gives a result to directly establish Equation (
5) type convergence in a high-dimensional setting. This result will be used in
Section 4 to apply our abstract theory to a more concrete setting.
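For concreteness, the de-biasing step itself is a simple matrix operation. The sketch below assumes the de-sparsified form 2Θ̂ − Θ̂Σ̂Θ̂ of Janková and van de Geer [25]; the weighted de-biased estimator analyzed in this paper is of the same type, but its exact expression follows from Lemma 1 and is not reproduced here.

```python
import numpy as np

def debiased_glasso(Theta_hat, Sigma_hat):
    """De-biased (de-sparsified) graphical Lasso, minimal sketch.

    Theta_hat: (weighted) graphical Lasso estimate of the precision matrix.
    Sigma_hat: the input covariance/quadratic covariation estimator.
    """
    return 2.0 * Theta_hat - Theta_hat @ Sigma_hat @ Theta_hat
```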
Remark 3. Proposition 2 also allows m to diverge as , which is necessary when we need to derive an asymptotic approximation of the joint distribution of . Such an approximation can be used to make simultaneous inference for entries of ; see [40] for example.
3. Factor Structure
In financial applications, it is often important to take account of the factor structure of asset prices. In fact, many empirical studies have documented the existence of common factors in financial markets (e.g., [
41] (Section 6.5)). Also, factor models play a dominant role in asset pricing theory (cf. [
21] (Chapter 9)). When common factors are present across asset returns, the precision matrix cannot be sparse because all pairs of the assets are partially correlated given other assets through the common factors. Therefore, in such a situation, it is common practice to impose a sparsity assumption on the precision matrix of the residual process which is obtained after removing the co-movements induced by the factors (see e.g., [
17] (Section 4.2) and [
42] (Section 4.2)). In this section, we accommodate the theory developed in
Section 2 to such an application.
Specifically, suppose that we have an
r-dimensional known factor process
X, and consider the following continuous-time factor model:
Here,
is a non-random
matrix and
Z is a
d-dimensional semimartingale such that
.
and
Z represent the factor loading matrix and residual process of the model, respectively. This model is widely used in high-frequency financial econometrics; see [
8,
9,
11] in the context of high-dimensional covariance matrix estimation. One restriction of the model Equation (
7) is that the factor loading
is assumed to be constant, but there is empirical evidence that
may be regarded as constant in short time intervals (one week or less); see [
18,
43] for instance.
Remark 4. The number of factors r possibly depends on n and may (slowly) diverge as n → ∞. Also, β may depend on n.
We are interested in estimating
based on observation data for
X and
Y while taking account of the factor structure given by Equation (
7). Suppose that we have generic estimators
and
for
and
, respectively.
and
are assumed to be random variables taking values in
and
, respectively. Now, by assumption we have
Assume
is a.s. invertible. Then
can be written as
. Therefore, we can naturally estimate
by
, provided that
is invertible. In practical applications, the invertibility of
is usually not problematic because the number of factors
r is sufficiently small compared to the sample size. However, it is theoretically convenient to (formally) define
in the case that
is singular. For this reason, we take an
-valued random variable
such that
on the event where
is invertible, and redefine
as
. This does not affect the asymptotic properties of our estimators because
is asymptotically invertible under the assumptions we will impose. Now, from Equation (
8),
is estimated by
Since
might be a poor estimator for
because
d can be extremely large in our setting, we apply the weighted graphical Lasso to
in order to estimate
. Specifically, we construct the weighted graphical Lasso estimator
based on
as follows:
Then
is estimated by the inverse of
. Hence our final estimator for
is constructed as
Remark 5. Although we will impose the assumptions which guarantee that the optimization problem in Equation (10) asymptotically has a unique solution with probability 1, it may have no solution for a fixed n. Thus, we formally define as an -valued random variable such that is defined by Equation (10) on the event where the optimization problem in Equation (10) has a unique solution.

Remark 6 (Positive definiteness of ). Since is positive definite by construction, is positive definite (note that we assume is positive semidefinite).
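For orientation, the construction of this section can be sketched as follows (variable names are ours, and the residual and final-step formulas are one natural reading of Equations (8)–(11) rather than a verbatim reproduction; the last step uses the Sherman–Morrison–Woodbury identity, which is a standard way to invert a factor-plus-sparse decomposition). The sketch reuses the weighted_glasso function from Section 2.

```python
import numpy as np

def factor_adjusted_precision(S_yy, S_yx, S_xx, lam):
    """Factor-adjusted precision matrix estimation, sketch.

    S_yy, S_yx, S_xx play the roles of generic estimators of the quadratic
    covariations [Y, Y], [Y, X], [X, X].
    """
    beta_hat = S_yx @ np.linalg.pinv(S_xx)        # loading estimate beta = [Y,X][X,X]^{-1}
    S_z = S_yy - beta_hat @ S_xx @ beta_hat.T     # residual covariation estimate
    Theta_z = weighted_glasso(S_z, lam)           # weighted graphical Lasso on the residuals
    # Recover a precision estimate for Y from Sigma_Y = beta Sigma_X beta' + Sigma_Z
    # via the Sherman-Morrison-Woodbury identity (one standard construction).
    M = np.linalg.inv(np.linalg.pinv(S_xx) + beta_hat.T @ Theta_z @ beta_hat)
    Theta_y = Theta_z - Theta_z @ beta_hat @ M @ beta_hat.T @ Theta_z
    return beta_hat, Theta_z, Theta_y
```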
We will impose the following structural assumptions on the model:
- [C1]
and as .
- [C2]
as .
- [C3]
as .
- [C4]
as for some sequence , .
- [C5]
as for some sequence , .
- [C6]
There is a positive definite matrix such that and as .
[C1]–[C3] are natural structural assumptions on the model and are standard in the literature; see e.g., Assumptions 2.1 and 3.3 in [
44]. [C4]–[C5] are sparsity assumptions on the precision matrix of the residual process and are necessary for our application of the (weighted) graphical Lasso. [C6] requires the factors to have a non-negligible impact on almost all assets and is also standard in the context of covariance matrix estimation based on a factor model; see e.g., Assumption 3.5 in [
44] and Assumption 6 in [
8].
The following result establishes the consistency of the residual precision matrix estimator .
Proposition 3. Assume [C1]–[C4]. Let be a sequence of positive-valued random variables satisfying the following conditions:
- [D1]
, and as , where .
- [D2]
as .
- [D3]
as , where
Then and as for any .
Remark 7. (a) Since and , and can be seen as natural estimators for and , respectively, if β were known. In this sense, [D1] is a natural extension of [B1]. In particular, if as , [D1] follows from the convergences , and under [C1], which are typically derived from entry-wise concentration inequalities for and .
(b) [D3]
ensures that is asymptotically positive semidefinite. This is necessary for guaranteeing that the optimization problem in Equation (10) asymptotically has a unique solution with probability 1.

From Proposition 3 we can also derive the convergence rates for the estimators
and
in appropriate norms, which may be seen as counterparts of Theorems 1–2 in [
8].
Proposition 4. Under the assumptions of Proposition 3, as
Proposition 5. Under the assumptions of Proposition 3, we additionally assume [C5]–[C6]. Then, and as .
Next we present the high-dimensional asymptotic mixed normality of the de-biased version of .
Proposition 6. Suppose that the assumptions of Proposition 3 and [C5]
are satisfied. For every , let , be a positive semidefinite random matrix and be an random matrix, where may depend on n. Assume as . Assume also that and as , where and is a -dimensional standard Gaussian vector independent of , which is defined on an extension of the probability space if necessary. Then, where .

Remark 8. It is worth mentioning that condition Equation (12) is stated for rather than . In other words, for deriving the asymptotic distribution, we do not need to take account of the effect of plugging into β, at least in the first order. This is thanks to Lemma A11.

Although it is generally difficult to derive the asymptotic mixed normality of (the de-biased version of) , this is possible when d is sufficiently large. In fact, in such a situation, the entry-wise behavior of is dominated by as described by the following lemma:
Lemma 2. Under the assumptions of Proposition 5, and as .
Consequently, we obtain the following result.
Proposition 7. Suppose that the assumptions of Proposition 6 and [C6]
are satisfied. Suppose also as . Then we have
4. Application to Realized Covariance Matrix
In this section, we apply the abstract theory developed above to the simplest situation where the processes have no jumps and are observed at equidistant times without noise. Specifically, we consider the continuous-time factor model Equation (
7) and assume that both
Y and
X are observed at equidistant time points
,
. In this case,
is naturally estimated by the
realized covariance matrix:
Analogously, we define
and
. In addition, we assume that
Z and
X are respectively
d-dimensional and
r-dimensional continuous Itô semimartingales given by
where
and
are respectively
d-dimensional and
r-dimensional
-progressively measurable processes,
and
are respectively
-valued and
-valued
-progressively measurable processes, and
is a
-dimensional standard
-Wiener process. To apply the convergence rate results to this setting, we impose the following assumptions:
- [E1]
For all , we have an event and -progressively measurable processes , , and which take values in , , and , respectively, and they satisfy the following conditions:
- (i)
.
- (ii)
, , and on for all .
- (iii)
For all
, there is a constant
such that
where
and
.
- [E2]
and as .
[E1] is a local boundedness assumption on the coefficient processes and is typical in the literature: For example, [E1] is satisfied when
and
are all bounded by some locally bounded process independent of
n. This latter condition is imposed in [
8], among others. [E2] restricts the growth rates of
d and
r. It is indeed an adaptation of [D1] to the present setting.
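For reference, the realized covariation matrices used in this section are computed directly from the equidistant increments; a minimal sketch (input layout and names are ours) is as follows.

```python
import numpy as np

def realized_covariance(log_prices):
    """Realized covariance from synchronous, equidistant log-price observations.

    log_prices: array of shape (n + 1, d) with observations at t_0, ..., t_n.
    Returns the sum of outer products of the observed increments.
    """
    increments = np.diff(log_prices, axis=0)   # n x d matrix of returns
    return increments.T @ increments

def realized_cross_covariance(log_prices_y, log_prices_x):
    """Cross realized covariation between two synchronously observed processes."""
    return np.diff(log_prices_y, axis=0).T @ np.diff(log_prices_x, axis=0)
```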
Theorem 1. Assume [C1]–[C4] and [E1]–[E2]. Let be a sequence of positive-valued random variables such that and as . Then , and as for any . Moreover, if we additionally assume [C5]–[C6], then and as .
Remark 9 (Optimal convergence rate).
From Theorem 1, the convergence rate of to in the -operator norm for any can be arbitrarily close to , which is similar to that in a standard i.i.d. setting (cf. Theorem 14.1.3 in [24]). On the other hand, in the Gaussian i.i.d. setting without factor structure, the minimax optimal rate for this problem is known to be (see [45] (Theorem 1.1) and [46] (Theorem 5)), which can be faster than . In a standard i.i.d. setting, this rate can be attained by using a node-wise penalized regression (see e.g., [46] (Section 3.1)), so it would be interesting to study the convergence rate of such a method in our setting. We leave this to future research. In the meantime, such a method does not ensure the positive definiteness of the estimated precision matrix in general, so our estimator would be preferable for some practical applications such as portfolio allocation.

Next we derive the asymptotic mixed normality of the de-biased estimator in the present setting. As announced, we accomplish this purpose with the help of Malliavin calculus. In the following we will freely use standard concepts and notation from Malliavin calculus. We refer to [
47,
48] (Chapter 1) for detailed treatments of this subject.
We consider the Malliavin calculus with respect to
W. For any real number
and any integer
,
denotes the stochastic Sobolev space of random variables which are
k times differentiable in the Malliavin sense and the derivatives up to order
k have finite moments of order
p. If
, we denote by
the
kth Malliavin derivative of
F, which is a random variable taking values in
. Here, we identify the space
with the set of all
-dimensional
k-way arrays, i.e., real-valued functions on
. Since
is a random function on
, we can consider the value
evaluated at
. We denote this value by
. Moreover, since
takes values in
, we can consider the value
evaluated at
. This value is denoted by
. We remark that the variable
is defined only a.e. on
with respect to the measure
, where
denotes the Lebesgue measure on
. Therefore, if
satisfies some property a.e. on
with respect to
, by convention we will always take a version of
satisfying that property everywhere on
if necessary. We set
. We denote by
the space of all
d-dimensional random variables
F such that
for every
. The space
is defined in an analogous way. Finally, for any
-valued random variable
F and
, we set
We also need to define some variables related to the “asymptotic” covariance matrices of the estimators. We define
random matrix
by
where
. Then we set
and
. In addition, under [E1], we define
similarly to
with replacing
by
.
and
play the roles of the asymptotic covariance matrices of
and
, respectively.
We impose the following assumptions on the model.
- [F1]
We have [E1] and
is a.s. invertible for all
. Moreover, for all
and
,
,
and
where
and
.
- [F2]
The matrix is non-random and as .
- [F3]
and as .
We give a few remarks on these assumptions. First, [F1] imposes the (local) Malliavin differentiability on the coefficient processes of the residual process
Z and the local boundedness on their Malliavin derivatives. Such an assumption is necessary for the application of the high-dimensional mixed normal limit theorem of [
20] to our setting (see Lemma A16). Please note that we do not need to impose this type of assumption on the factor process
X. We also remark that analogous assumptions are sometimes used in the literature of high-frequency financial econometrics even in low-dimensional settings; see e.g., [
49,
50]. Second, [F2] is clearly understood when we consider a Gaussian graphical model associated with
: The non-randomness of
implies that the edge structure of this Gaussian graphical model is determined in a non-random manner (by conditioning, it is indeed sufficient that the edge structure is determined independently of the driving Wiener process
W). Also, we remark that the condition
is equivalent to [C5] with
. It is seemingly possible to relax this condition so that it allows a diverging sequence
as long as
for an appropriate constant
. However, to determine the precise value of
, we need to carefully revise the proof of Lemma A16 so that it allows the quantity inside
in (A7) to diverge as
. To avoid such an additional complexity, we restrict our attention to the case of
. Third, the condition
in [F3] is used again for applying the high-dimensional CLT of [
20].
Now we are ready to state our result. Let be the set of all hyperrectangles in , i.e., consists of all sets A of the form for some , .
Theorem 2. Assume [C1]–[C4]
and [F1]–[F3].
Let be a sequence of positive-valued random variables such that , and as . Then we have as .
Remark 10. is typically chosen to be of order as close to as possible, so is almost equivalent to . This is stronger than the condition which is used to derive the asymptotic normality of the de-biased weighted graphical Lasso estimator in [24] (Theorem 14.1.6) (note that we assume ). This is because Theorem 2 derives approximations of the joint distributions of the de-biased estimator and its Studentization, while [24] (Theorem 14.1.6) focuses only on approximation of their marginal distributions.

Theorem 2 is statistically infeasible in the sense that
is unobservable. Thus, we need to estimate it from the data. Since
is naturally estimated by
, we construct an estimator for
. Define the
-dimensional random vectors
by
where
. Then we set
Lemma 3. Suppose that the assumptions of Theorem 2 are satisfied. Suppose also as and that there is a constant such that as for all . Then, as .

Let us set and .
Corollary 1. Under the assumptions of Lemma 3, we have the following results:
- (a)
Assume and as . Then,
- (b)
Assume and as . Then, as .
Corollary 1(a) particularly implies that
where
and
is the standard normal distribution function. This result can be used to construct entry-wise confidence intervals for
. Meanwhile, combining Corollary 1(b) with [
20] (Proposition 3.2), we can estimate the quantiles of
and
for a given set of indices
by simulation. Such a result can be used to construct simultaneous confidence intervals and control the family-wise error rate in multiple testing for entries of
; see Sections 2.3–2.4 of [
51] for details.
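As an illustration of how Corollary 1(a) is used in practice, entry-wise confidence intervals can be formed as follows. This is only a sketch: it assumes the Studentized statistic has the usual form √n(T̂_ij − Θ_ij)/ŝ_ij with ŝ_ij the estimated asymptotic standard deviation constructed before Lemma 3, and the names below are illustrative.

```python
import numpy as np
from scipy.stats import norm

def entrywise_ci(T_debiased, se_hat, n, alpha=0.05):
    """Entry-wise (1 - alpha) confidence intervals from asymptotic mixed normality.

    T_debiased: de-biased precision matrix estimate.
    se_hat: matrix of estimated asymptotic standard deviations for each entry.
    n: number of observed increments.
    """
    z = norm.ppf(1 - alpha / 2)          # standard normal quantile
    half_width = z * se_hat / np.sqrt(n)
    return T_debiased - half_width, T_debiased + half_width
```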
As announced, another application of our result is to construct an estimator with selection consistency via thresholding. This is carried out by using the following result:
Corollary 2. Let () satisfy and as for some . Define and Then, under the assumptions of Corollary 1(a), we have provided that as .
is bounded away from zero because
under our assumptions. Taking the sequence
so that
in Corollary 2, we can asymptotically recover the support of
. In this case, if we define
by
will be oracle in the sense of [
38]. However, we note that the estimator
would not be continuous in data, so it would not satisfy the third desirable property in [
38] (p. 1349). To construct an oracle estimator for
which is continuous in data, we will need to consider a non-concave penalized estimator as in [
52]. This is left to future research.
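In implementation terms, the support-recovery step of Corollary 2 amounts to thresholding the off-diagonal entries of the de-biased estimator. The sketch below assumes the estimated edge set is {(i, j) : i ≠ j, |T̂_ij| ≥ ρ_n}; the exact definition in Corollary 2 (including any Studentized scaling) is not reproduced here.

```python
import numpy as np

def support_by_thresholding(T_debiased, rho_n):
    """Estimate the support of the precision matrix by entry-wise thresholding."""
    support = np.abs(T_debiased) >= rho_n
    np.fill_diagonal(support, False)   # only off-diagonal entries define edges
    return support
```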
6. Empirical Application
To illustrate the applicability of the proposed method to real data analysis, we conduct a simple empirical study using high-frequency financial data. We take 1 March 2018 as the observation interval
and the log-price processes of the component stocks of the S&P 500 index as the process
Y. In addition, as is often performed in the literature, we regard the SPDR S&P 500 ETF (SPY) as the observable factor process
X. We use 5-minute returns to compute the estimators presented in
Section 4. The dataset is provided by Bloomberg. Please note that our setting implies
and
, yielding a high-dimensional setting of the kind considered in this paper (note that our dataset does not contain observations at the market opening).
The selection procedure presented in
Section 5.1 suggests
. Then we estimate the support
of
by the estimator
with
from Corollary 2.
Figure 1 shows the partial correlation network induced by
, drawn by the R package
igraph. Specifically, it depicts the undirected graph with vertices consisting of the S&P 500 component stocks and an edge set given by
. To illuminate the relationship between the network and sector structures, we color the vertices according to their Global Industry Classification Standard (GICS) sectors. We find that there are strong interconnections within several sectors such as Consumer Staples, Energy, Real Estate and Utilities. The figure also suggests that the network appears to have some characteristics that are commonly observed in scale-free networks: It consists of a giant component with several hubs and a few small components. This is consistent with an observation made in [
42]. Indeed, in [
42] the authors have proposed a model for
that induces a scale-free partial correlation network. According to their model, the decay of the largest eigenvalues of
also exhibits power-law behavior. More precisely, letting
be the ordered eigenvalues of
, we have
with some
for moderate
i and large
d. Then, it is interesting to check whether this is the case in our dataset.
Figure 2 shows the log-log size-rank plot for the 50 largest eigenvalues of
. We see that, except for the three largest eigenvalues, they clearly display power-law behavior.
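For completeness, the size-rank diagnostic in Figure 2 can be reproduced along the following lines (a Python/matplotlib sketch; the figures in the paper were produced with R). Power-law decay of the large eigenvalues shows up as an approximately linear log-log relationship.

```python
import numpy as np
import matplotlib.pyplot as plt

def size_rank_plot(Sigma_hat, k=50):
    """Log-log size-rank plot of the k largest eigenvalues of Sigma_hat."""
    eigvals = np.linalg.eigvalsh(Sigma_hat)[::-1][:k]   # k largest eigenvalues
    ranks = np.arange(1, k + 1)
    plt.loglog(ranks, eigvals, marker="o", linestyle="none")
    plt.xlabel("rank")
    plt.ylabel("eigenvalue")
    plt.show()
```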