Abstract
This study addresses the challenge of estimating high-dimensional covariance matrices in financial markets, where traditional sparsity assumptions often fail due to the interdependence of stock returns across sectors. We present an innovative element-aggregation method that aggregates matrix entries to estimate covariance matrices. The method applies to both sparse and non-sparse matrices, transcending the limitations of sparsity-based approaches, and its computational simplicity keeps implementation complexity low, making it a practical tool for real-world applications. Theoretical analysis confirms the method's consistency and establishes its convergence rate in specific scenarios. Numerical experiments validate the method's superior algorithmic performance compared to conventional methods, as well as its reduction in relative estimation errors. Furthermore, empirical studies in financial portfolio optimization demonstrate the method's significant risk management benefits, particularly its ability to effectively mitigate portfolio risk even with limited sample sizes.
Keywords:
covariance matrix; factor models; high dimensionality; portfolio allocation; element aggregation
MSC:
62H20; 62J07
1. Introduction
Covariance matrix estimation is essential in statistics [1], econometrics [2], finance [3], genomic studies [4], and other related fields. This matrix quantifies the pairwise interdependencies between variables in a dataset, and each element signifies the covariance between two variables. The accuracy of this estimation is critical for statistical and data analyses. As the size of the matrix increases, the efficiency of the estimation process often decreases. Therefore, the development of efficient methods for this estimation remains a significant research challenge. Various approaches concentrate on assuming specific features in the matrix. Initial methods include principal component analysis (PCA) [5] and factor models [6]. Other prevalent techniques encompass the constant correlation approach [7], maximum likelihood estimation (MLE) [8], and shrinkage methods [9,10], among others [11]. Notably, shrinkage methods have demonstrated exceptional efficacy in financial portfolio allocation.
For the estimation of sparse or approximately sparse covariance matrices, various element-wise thresholding techniques for the sample covariance matrix have emerged. These include hard thresholding [12,13], soft thresholding with extensions [14], and adaptive thresholding [15,16]. Generally, these methods offer low computational demands and strong consistency guarantees. However, the resulting matrix might not always be positive semi-definite. To address this, more sophisticated methods have been introduced to ensure the estimators are positive semi-definite [17,18].
While the sparsity condition is often assumed in [6,19], it is not universally applicable. For instance, in financial studies, variables such as stock returns often share common factors, making the sparsity assumption less appropriate. Instead, a more common pattern in the covariance matrix of financial data is the clustering of entries. This occurs because stocks within the same industry sector are often similarly correlated with stocks from other sectors [20]. An illustrative example is provided by [21,22], who analyzed the daily returns of nine companies on the New York Stock Exchange (NYSE) based on 2515 observations from 2000 to 2009. Their analysis revealed a matrix of correlation coefficients without zero entries, with many coefficients sharing identical values. Using statistical hypothesis testing, the correlation coefficients were grouped into five distinct values, as shown in Figure 1a. A similar observation was made by [23]. When the stocks are arranged in a particular order, the correlation coefficient matrix manifests itself as a block-symmetric matrix, as depicted in Figure 1b, which can be written as
where ⊗ is the Kronecker product. We will demonstrate later that this uncomplicated framework is highly effective for estimating the covariance matrix in Corollary 3.
Figure 1.
Correlation coefficient matrix of the daily returns of 9 companies with stock symbols AIG, BA, BAC, GS, INTC, JPM, MS, PG, and WFC on the NYSE market.
The estimation method proposed in this paper derives from the correlation matrix structure mentioned earlier. This structure is widespread in numerous financial research situations, encompassing sparse covariance matrices, the block-wise covariance matrices noted above, and matrices in which all correlation coefficients equal a global constant [24]. However, many existing methods are unsuitable for estimating covariance matrices under this structure. To broaden our scope, we delve into the estimation of covariance matrices with clustered entries. To this end, we propose an element-aggregation method tailored to covariance matrix estimation. Moreover, we examine the theoretical properties of our method and confirm its effectiveness through extensive numerical simulations and real-world data analysis.
The rest of this paper is organized as follows. In Section 2, we propose the element-aggregation method for covariance matrix estimation and describe its implementation. Section 3 establishes the consistency of the estimator and presents the corresponding convergence rates. All theoretical proofs are given in Appendix A, Appendix B, Appendix C, Appendix D, Appendix E and Appendix F. Numerical simulation analyses and real data analysis with portfolio allocation are presented in Section 4 and Section 5, respectively. Conclusions are provided in Section 6.
2. Estimation and Implementation for Covariance Matrix
Suppose the multivariate random variable $X = (X_1, \ldots, X_p)^{\top}$ has the covariance matrix $\Sigma = (\sigma_{ij})_{p \times p}$, and samples $X^{(1)}, \ldots, X^{(n)}$ are generated from $X$. The sample covariance matrix is defined as $\hat{\Sigma} = (\hat{\sigma}_{ij})_{p \times p}$. For $1 \le i, j \le p$,
$$\hat{\sigma}_{ij} = \frac{1}{n} \sum_{l=1}^{n} \bigl(X_i^{(l)} - \bar{X}_i\bigr)\bigl(X_j^{(l)} - \bar{X}_j\bigr),$$
where $\bar{X}_i = n^{-1} \sum_{l=1}^{n} X_i^{(l)}$ and $\bar{X}_j = n^{-1} \sum_{l=1}^{n} X_j^{(l)}$ for $1 \le i, j \le p$.
Motivated by the structure of the correlation matrix discussed in the introduction, and aiming to analyze covariance matrices with clustered entries, we consider the following specification for the covariance matrix. Let $\aleph$ denote the collection of off-diagonal elements in the covariance matrix $\Sigma = (\sigma_{ij})_{p \times p}$, and let the set of distinct elements in $\aleph$ be represented by
$$\{a_1, a_2, \ldots, a_K\},$$
where $K$ is the cardinality of the set $\aleph$, and $K$ changes with $p$. In other words, $a_k \neq a_l$ for all $k \neq l$. For $1 \le k \le K$, the index set of covariance matrix elements that are equal to $a_k$ is defined as
$$G_k = \{(i, j) : \sigma_{ij} = a_k,\ i \neq j\}.$$
In brief, $G_k$ stands for the category of $a_k$. Similarly, for each element $\sigma_{ij}$, the index set of covariance matrix elements that are equal to $\sigma_{ij}$ is defined as
$$G(i, j) = \{(s, t) : \sigma_{st} = \sigma_{ij},\ s \neq t\}.$$
Then, for $(i, j) \in G_k$, we have $G(i, j) = G_k$.
If the sets $G_k$ are known, i.e., the indices for the same elements in $\Sigma$ are known, then we can estimate $a_k$ by averaging the sample covariance elements as follows: for all $(i, j) \in G_k$,
$$\tilde{\sigma}_{ij} = \frac{1}{|G_k|} \sum_{(s, t) \in G_k} \hat{\sigma}_{st}, \qquad (1)$$
where $|G_k|$ is the cardinality of the set $G_k$. Thus, the covariance matrix $\Sigma$ can be estimated by
$$\tilde{\Sigma} = (\tilde{\sigma}_{ij})_{p \times p}, \quad \text{with } \tilde{\sigma}_{ii} = \hat{\sigma}_{ii} \text{ for } 1 \le i \le p. \qquad (2)$$
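When the categories are known, the aggregation in (1) and (2) amounts to averaging within groups. The following minimal Python sketch (our own illustration; the function name, the group representation, and the convention of keeping sample variances on the diagonal are assumptions, not the paper's code) makes this oracle step concrete.

```python
import numpy as np

def oracle_aggregate(S, groups):
    """Oracle element-aggregation: average sample covariances within each
    known index set G_k, keeping the sample variances on the diagonal.

    S      : (p, p) sample covariance matrix.
    groups : list of lists; each inner list holds the (i, j) index pairs of
             off-diagonal entries known to share a common value a_k.
    """
    Sigma_tilde = np.diag(np.diag(S)).astype(float)
    for G in groups:
        a_k = np.mean([S[i, j] for (i, j) in G])  # pooled estimate of a_k
        for (i, j) in G:
            Sigma_tilde[i, j] = a_k
    return Sigma_tilde
```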
Element-Aggregation Estimation Method (ELA)
We now introduce the element-aggregation (ELA) estimation method. As the category $G(i, j)$ is unknown, we estimate it, i.e., the category of each $\sigma_{ij}$, as follows:
$$\hat{G}(i, j) = \bigl\{(s, t) : |\hat{\sigma}_{st} - \hat{\sigma}_{ij}| \le c \sqrt{\hat{\theta}_{ij} \log p / n},\ s \neq t \bigr\}, \qquad (3)$$
where $\hat{\theta}_{ij}$ is a sample estimate of $\theta_{ij} = \operatorname{Var}\{(X_i - \mu_i)(X_j - \mu_j)\}$ and $c$ is a tuning parameter. Regarding the choice of this tuning parameter $c$, Cai and Liu [15] show that a good choice of $c$ does not affect the rate of convergence, but it does affect the numerical performance of the estimators. The tuning parameter $c$ can be fixed at $c = 2$, as suggested in [15], or it can be chosen empirically by cross-validation, such as the five-fold cross-validation used in [12,15]. The tuning parameter $c$ operates akin to a significance level in statistical testing, and the 2-$\sigma$ rule is widely endorsed for normally distributed data, wherein approximately 95% of observations fall within the interval of the mean plus or minus two standard deviations, $[\mu - 2\sigma, \mu + 2\sigma]$. In our simulations, we also find that employing $c = 2$ as the tuning parameter for the proposed ELA algorithm yields commendable performance in high-dimensional cases and does not differ significantly from the parameter obtained by cross-validation. Consequently, we adopt $c = 2$ as the tuning parameter for our subsequent analysis.
Analogous to (1), we estimate the covariance matrix element $\sigma_{ij}$ by
$$\hat{\sigma}_{ij}^{\mathrm{ELA}} = \frac{1}{|\hat{G}(i, j)|} \sum_{(s, t) \in \hat{G}(i, j)} \hat{\sigma}_{st},$$
and $\Sigma$ by $\hat{\Sigma}^{\mathrm{ELA}} = (\hat{\sigma}_{ij}^{\mathrm{ELA}})_{p \times p}$. This is the element-aggregation (ELA) estimator of the covariance matrix.
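For illustration, here is a minimal Python sketch of the full ELA pipeline under simplifying assumptions: it uses a single global threshold $c\sqrt{\log p / n}$ on the sample covariances in place of the entry-adaptive threshold in (3), together with the fixed choice $c = 2$ discussed above. Sorting plus binary search keeps the grouping step fast, in line with the complexity discussion below.

```python
import numpy as np

def ela_estimator(X, c=2.0):
    """Simplified ELA sketch: group off-diagonal sample covariances that lie
    within a common threshold of one another, then average within groups."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    lam = c * np.sqrt(np.log(p) / n)            # global aggregation threshold

    iu = np.triu_indices(p, k=1)                # upper-triangular entries
    vals = S[iu]
    sorted_vals = np.sort(vals)
    csum = np.concatenate(([0.0], np.cumsum(sorted_vals)))

    # For each entry, average all entries within lam of it via binary search;
    # total cost is O(p^2 log p).
    lo = np.searchsorted(sorted_vals, vals - lam, side="left")
    hi = np.searchsorted(sorted_vals, vals + lam, side="right")
    agg = (csum[hi] - csum[lo]) / (hi - lo)

    Sigma_hat = np.diag(np.diag(S)).astype(float)
    Sigma_hat[iu] = agg                         # fill upper triangle
    Sigma_hat.T[iu] = agg                       # mirror to lower triangle
    return Sigma_hat
```

In the paper's version, the threshold additionally adapts to an estimate of the variance of each $\hat{\sigma}_{ij}$, in the spirit of the adaptive thresholding of [15].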
In terms of computational complexity, the ELA process requires $O(p^2 \log p)$ time, where the $\log p$ factor accounts for the binary search computation of $\hat{G}(i, j)$ among the sorted entries of $\hat{\Sigma}$. In comparison, element-wise thresholding has a computational complexity of $O(p^2)$. Therefore, the ELA estimation calculation is only slightly more complex than the element-wise thresholding approach.
3. Theoretical Properties
In this section, we provide the theoretical justification for the element-aggregation estimation under the assumption of a normal distribution of $X$. The results can be proven under more general conditions using more complicated techniques. Let $\|\cdot\|$ denote the operator norm, i.e., $\|A\| = \lambda_{\max}^{1/2}(A^{\top}A)$, where $\lambda_{\max}^{1/2}(A^{\top}A)$ is the square root of the largest eigenvalue of $A^{\top}A$.
Lemma 1.
Let K be the number of different values of the off-diagonal elements in the covariance matrix . Let be the numbers of elements that are equal to in the i-th row of Σ, i.e., . If for some constant and
where and is the estimate in (1), then the covariance matrix estimator in (2) has the following estimation error
An important characteristic is that the effect of dimension on the estimation error is regulated by a slowly varying function, . Lemma 1 could also be extended to the case that is unknown. In such cases, we need to identify the corresponding category . To simplify the explanation, we will use here, since if .
Lemma 2.
Suppose X is normally distributed with for some constant . If
then the identification by (3) is consistent, i.e., for any ,
Theorem 1.
With the above notation, if the conditions in Lemmas 1 and 2 are satisfied, then
Below, we give some special cases of Theorem 1, and show the convergence rates in Corollaries 1–3.
Corollary 1.
Suppose is sparse with a number of non-zero elements in each row of , i.e., , where . If the conditions in Theorem 1 hold, then
This result is a special case of that in [12], and it could be extended to more general cases.
Corollary 2.
Suppose follows normal distribution with If for some , then
Finally, we consider a simple block covariance matrix with the matrix in Figure 1b as a special case.
Corollary 3.
Suppose where is a identity matrix, is a diagonal matrix, is a matrix with all elements 1, and is an symmetric matrix. If M is fixed as , then we have
Remark 1.
Since the dimension M of the matrix and the diagonal matrix assumed in Corollary 3 are fixed (), the different values for their matrix elements are at most and M, respectively, so Σ also has at most different values, which is fixed at . By Lemma 2, the estimates and satisfy the rate of , i.e., for any , a subset of the probability space can be found, such that
where . Using the eigenvalue and Kronecker product properties, we show in Appendix A, Appendix B, Appendix C, Appendix D, Appendix E and Appendix F that the covariance matrix estimate also has the same rate . That is, since the number of selectable elements of the covariance matrix is determined by a fixed M, which does not depend on p, the rate of convergence for the covariance matrix estimate is also independent of p.
Our ELA method estimates covariance matrices with the above structure efficiently and with a dimension-independent rate. Mathematically, this block-wise structure is straightforward. Furthermore, we hypothesize that these conclusions can be extended to more general block-wise matrices, such as those built from the Khatri–Rao or Tracy–Singh products [25].
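To make the block structure concrete, the following snippet builds one covariance matrix of the Kronecker form discussed above; the specific sizes and the entries of A and D are illustrative assumptions, and the exact parametrization in Corollary 3 may differ.

```python
import numpy as np

# One block-constant covariance consistent with Figure 1b and Corollary 3:
# Sigma = A kron J_m + D kron I_m, where J_m is the m x m all-ones matrix.
M, m = 3, 4                                   # M blocks of size m (made up)
A = np.array([[0.5, 0.2, 0.1],
              [0.2, 0.4, 0.3],
              [0.1, 0.3, 0.6]])               # symmetric block-level matrix
D = np.diag([0.5, 0.6, 0.4])                  # diagonal adjustment per block
Sigma = np.kron(A, np.ones((m, m))) + np.kron(D, np.eye(m))

# The number of distinct entries is bounded by a function of M alone, so K
# stays fixed as m (and hence p = M * m) grows.
print(np.unique(np.round(Sigma, 12)).size)
```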
4. Numerical Simulation
Our study presents the ELA method for estimating the covariance matrix using an element-aggregation idea. The ELA method is designed for simplicity and clarity, making it easy to implement; the steps for implementing it were detailed in the preceding section. This section compares the algorithmic performance of the proposed method with several estimation methods for high-dimensional sparse and non-sparse covariance matrices: (1) the adaptive thresholding method [15], (2) the POETk method [19] with $k$ factors, and (3) the Rothman method [18].
The numerical simulation focuses on the relative matrix errors when estimating the covariance matrix and the correlation coefficient matrix. To estimate the covariance matrix, we first standardize each variable, estimate the correlation coefficient matrix, and then multiply its elements by the corresponding sample standard deviations of the random variables. This effectively transforms the estimation of the covariance matrix into the estimation of the corresponding correlation coefficient matrix. Thus, the ELA method's efficiency and accuracy in capturing the underlying data structure can be comprehensively assessed through the dual focus on both covariance and correlation coefficient matrix estimations.
The metric employed to compare the estimation efficiency of these estimation methods is the average matrix loss over 500 replications. The matrix losses are measured by the spectral norm for both the correlation coefficient matrix and the covariance matrix, similar to [18]. To aid visualization, we define the relative estimation errors between the estimated matrix and the actual matrix for both the correlation coefficient matrix $R$ and the covariance matrix $\Sigma$ as follows:
$$\mathrm{err}(R) = \frac{\|\hat{R} - R\|}{\|R\|}, \qquad \mathrm{err}(\Sigma) = \frac{\|\hat{\Sigma} - \Sigma\|}{\|\Sigma\|},$$
where $\|\cdot\|$ denotes the operator norm. The next simulation not only compares the relative matrix estimation errors for various matrix types and algorithms but also shows how the estimation error changes as the matrix dimension $p$ increases, in order to demonstrate more clearly the advantages of the proposed estimation algorithm.
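In code, the relative error reduces to a ratio of spectral norms; a minimal helper (ours, not from the paper) is:

```python
import numpy as np

def relative_spectral_error(est, truth):
    """Relative estimation error in the operator (spectral) norm,
    i.e., ||est - truth|| / ||truth||, with ||.|| the largest singular value."""
    return np.linalg.norm(est - truth, 2) / np.linalg.norm(truth, 2)
```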
The following three covariance matrices are considered.
Example 1.
with .
Example 2.
where for and are IID uniform on [0, 1]. A similar example was used in [16].
Example 3.
Constant block matrix
where , is a matrix with all elements 1, and is the identity matrix of size .
Remark 2.
Due to the assumption of sparse covariance matrices in the adaptive thresholding method, the POET0 method, and the Rothman estimation method [6,15,18], the covariance matrix is estimated as a sparse matrix, where most of the matrix elements are estimated to be zero. This is significantly different from the non-sparse covariance matrix in Example 3. The relative estimation error of the covariance matrix estimated by these methods is relatively large compared to the sample covariance estimation method. Therefore, the adaptive thresholding method, the POET0 method, and the Rothman method cannot be used directly for Example 3.
Note that Σ can be written as
$$\Sigma = \lambda_1 v_1 v_1^{\top} + \lambda_2 v_2 v_2^{\top} + \Sigma_s,$$
where $\Sigma_s$ is a sparse matrix and $\lambda_1$ and $\lambda_2$ are the first two largest eigenvalues of Σ, with $v_1$ and $v_2$ as the corresponding eigenvectors, respectively. By this decomposition, the POETk method with $k = 2$ factors in [19] can be used, i.e., POET2 is applicable to the estimation for Example 3.
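This decomposition is easy to verify numerically. The sketch below (with made-up block sizes and entries) removes the two leading principal components of a two-block covariance and checks that the remainder is approximately sparse:

```python
import numpy as np

# Subtract the two leading principal components of a non-sparse two-block
# covariance (in the spirit of Example 3) and inspect the remainder.
m = 50
A = np.array([[0.5, 0.2],
              [0.2, 0.4]])                       # illustrative block values
Sigma = np.kron(A, np.ones((m, m))) + 0.5 * np.eye(2 * m)

eigvals, eigvecs = np.linalg.eigh(Sigma)         # ascending eigenvalues
low_rank = sum(eigvals[-k] * np.outer(eigvecs[:, -k], eigvecs[:, -k])
               for k in (1, 2))
remainder = Sigma - low_rank                     # Sigma_s in the decomposition
print(np.mean(np.abs(remainder) < 0.05))         # most entries are near zero
```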
Random samples are generated from $N(0, \Sigma)$ with sample sizes $n = 100$ and $n = 200$. The averages of the relative estimation errors, based on 500 replications, are depicted in Figure 2, Figure 3 and Figure 4. The ELA method is represented by a solid red line. Only methods with a performance comparable to the best one are displayed in each panel.
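A Monte Carlo loop of the kind used for the figures can be sketched as follows (reusing the `ela_estimator` and `relative_spectral_error` helpers from the earlier sketches; the seed and defaults are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_relative_error(Sigma, estimator, n=100, reps=500):
    """Average relative spectral-norm error of `estimator` over `reps`
    replications of n samples drawn from N(0, Sigma)."""
    p = Sigma.shape[0]
    errs = []
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        errs.append(relative_spectral_error(estimator(X), Sigma))
    return float(np.mean(errs))

# Example: mc_relative_error(Sigma, ela_estimator, n=100, reps=500)
```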
Figure 2.
Relative estimation error for the correlation matrix (left) and the covariance matrix (right) in Example 1 with sample sizes $n = 100$ and 200; the estimation is based on 500 replications per data point.
Figure 3.
Relative estimation error for the correlation matrix (left) and the covariance matrix (right) in Example 2 with sample sizes $n = 100$ and 200; the estimation is based on 500 replications per data point.
Figure 4.
Relative estimation error for the correlation matrix (left) and the covariance matrix (right) in Example 3 with sample sizes $n = 100$ and 200; the estimation is based on 500 replications per data point.
In Figure 2 for Example 1 with $n = 100$, our ELA estimator for the correlation coefficient matrix $R$ is slightly inferior to POET0 but exceeds Rothman, and for the covariance matrix $\Sigma$ it slightly lags behind both POET0 and Rothman. However, with $n = 200$, the ELA method ranks first for the correlation coefficient matrix and second for the covariance matrix. In general, the ELA method is comparable to both POET and Rothman for Example 1.
In Figure 3 for Example 2, the ELA estimator consistently outperforms both POET and Rothman for both matrices. Although Remark 2 suggests that POET2 is suitable for Example 3, Figure 4 for Example 3 demonstrates the superior efficiency of the ELA method over POET2 for both matrices. In summary, the ELA method significantly outperforms both POET and Rothman for Examples 2 and 3.
Furthermore, we conducted a comprehensive analysis comparing the average computational time across 500 replications for our ELA method alongside the adaptive thresholding method [15], the POET method [19], and the Rothman method [18]. These comparisons are detailed in Table 1. The random samples are again generated from $N(0, \Sigma)$ with $\Sigma$ following Example 1, sample sizes $n = 100$ and 200, and a range of covariance matrix dimensions $p$. Because the POETk methods exhibit broadly similar computational times for different values of $k$, we present only the results for POET1.
Table 1.
The average computational time over 500 replications of the ELA method, the adaptive thresholding method, the POET1 method, and the Rothman method for sample sizes $n = 100$ and 200 (in seconds).
Upon examination of Table 1, it is evident that, for lower dimensions $p$, the ELA method has a higher computational cost than the adaptive thresholding and Rothman methods but outperforms the POET method. For moderate dimensions, the ELA method is slightly slower than adaptive thresholding but remains superior to both the POET and Rothman methods. For higher dimensions $p$, the ELA method demonstrates a significant computational advantage over its counterparts. This suggests that the ELA method scales more efficiently with increasing dimensionality, which is a desirable attribute for high-dimensional data analysis.
Our simulation studies yield the following conclusions. The ELA method outperforms both POET and Rothman for Examples 2 and 3, for both the correlation coefficient matrix and the covariance matrix, with $n = 100$ and $n = 200$; for Example 1, it performs comparably to them. These results emphasize the efficiency of the ELA method for estimating covariance matrices. Our proposed ELA technique is not only computationally efficient but also markedly lowers the relative estimation error, i.e., it incurs minimal loss in the spectral norm for both the correlation coefficient matrix and the covariance matrix.
5. Real Data Analysis and Portfolio Allocation
The component stocks of the S&P 500 index were analyzed using historical daily prices sourced from Yahoo Finance through the R package 'tidyquant'. We selected 430 stocks from the S&P 500 with daily prices spanning over 3000 days, from 1 January 2008 to 31 December 2019, to ensure a sufficiently large sample size. This allowed a larger window (up to N = 500) to be selected when we use the rolling window method later. Our emphasis is on the correlation coefficient matrix and the covariance matrix of daily returns. Although the covariance matrix, in particular the individual stock variances, is known to fluctuate over time, the correlation coefficient matrix remains relatively stable. This stability underscores the significance of evaluating estimation methods based on the correlation coefficient matrix [26]. The constant conditional correlation multivariate GARCH model [26] specifies the conditional covariance of the returns as $H_t = D_t R D_t$, where $R$ is the constant correlation coefficient matrix and $D_t = \operatorname{diag}(\sigma_{1t}, \ldots, \sigma_{pt})$, with $\sigma_{it}$ the conditional volatility modeled by GARCH(1,1) for the $i$-th stock based on its past daily returns. See [27] for more discussion of the performance of GARCH(1,1). Let $r_{kt}$ be the return of the $k$-th stock on day $t$, where $1 \le k \le p$. We begin by standardizing the returns using
$$\tilde{r}_{kt} = r_{kt} / \hat{\sigma}_{kt},$$
where $\hat{\sigma}_{kt}$ is the fitted GARCH(1,1) conditional volatility of the $k$-th stock.
In order to evaluate the estimation performance of the covariance matrix, we assume that the covariance matrix of the standardized returns remains constant over time or changes very slowly.
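As an implementation sketch of this devolatilization step, the snippet below fits a GARCH(1,1) model per stock with the third-party Python package `arch` (the paper does not specify its software; the package choice, the rescaling by 100 for optimizer stability, and the zero-mean specification are our assumptions):

```python
import pandas as pd
from arch import arch_model  # pip install arch

def standardize_returns(returns: pd.DataFrame) -> pd.DataFrame:
    """Divide each stock's daily returns by its fitted GARCH(1,1)
    conditional volatility, so the standardized returns have an
    (approximately) time-constant covariance under the CCC model.

    returns : T x p DataFrame of daily returns, one column per stock.
    """
    out = {}
    for col in returns.columns:
        scaled = 100 * returns[col]        # rescale for optimizer stability
        res = arch_model(scaled, vol="Garch", p=1, q=1,
                         mean="Zero").fit(disp="off")
        out[col] = scaled / res.conditional_volatility
    return pd.DataFrame(out)
```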
Rooted in modern portfolio theory (MPT), portfolio allocation advocates strategically distributing investments across diverse asset classes to optimize the trade-off between risk and return. MPT highlights the significance of not only choosing individual stocks but also the proportional weighting of these stocks in a portfolio [28,29]. Therefore, we investigate the optimal allocation for minimum risk portfolios of stocks, utilizing the covariance matrix estimated by various methods.
To evaluate the various methods, we apply their estimators to construct minimum risk portfolios. For any rolling window of length $N$ ending at time $t$, we utilize the data from that window to estimate the covariance matrix via different methods, including (1) our ELA method, (2) the POETk method [19] with $k$ factors, (3) the shrinkage method [9], denoted by ShrinkMarket, and (4) the simple sample covariance matrix, denoted by Sample. The simple sample estimate of the covariance matrix for a given time $t$ is defined as
$$\hat{\Sigma}_t = \frac{1}{N} \sum_{s = t-N}^{t-1} (r_s - \bar{r}_t)(r_s - \bar{r}_t)^{\top}, \quad \text{with } \bar{r}_t = \frac{1}{N} \sum_{s = t-N}^{t-1} r_s,$$
where $r_s$ is the vector of returns on day $s$.
For an estimated covariance matrix $\hat{\Sigma}_t$, the minimum risk portfolio for time $t$ is defined as
$$w_t = \arg\min_{w :\, w^{\top}\mathbf{1} = 1} w^{\top} \hat{\Sigma}_t w = \frac{\hat{\Sigma}_t^{-1}\mathbf{1}}{\mathbf{1}^{\top}\hat{\Sigma}_t^{-1}\mathbf{1}}.$$
The portfolio return for time $t$ is then $w_t^{\top} r_t$. To determine the portfolio with the lowest risk, we calculate the standard deviation of the portfolio returns $\{w_t^{\top} r_t\}$. Estimators yielding a smaller standard deviation are deemed superior in portfolio allocation.
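The closed-form minimum-variance weights make the backtest straightforward; a small sketch (ours) is given below, with the rolling evaluation indicated in comments:

```python
import numpy as np

def min_risk_weights(Sigma_hat):
    """Global minimum-variance portfolio:
    w = Sigma^{-1} 1 / (1' Sigma^{-1} 1), computed via a linear solve."""
    ones = np.ones(Sigma_hat.shape[0])
    w = np.linalg.solve(Sigma_hat, ones)   # avoids an explicit matrix inverse
    return w / w.sum()

# Rolling evaluation: for each t, estimate Sigma_hat from the window
# [t - N, t), set w_t = min_risk_weights(Sigma_hat), record w_t' r_t, and
# compare the standard deviations of these returns across estimators.
```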
To understand how risk varies with the number of stocks, we alphabetically order the stocks by their symbols on the New York Stock Exchange and analyze growing subsets of stocks of increasing size $p$. The risks associated with portfolios based on different methods are illustrated in Figure 5, with the ELA method highlighted by a red solid line. As the sample size $N$ in the rolling window increases from 100 to 500 across the panels, the performance gap between the different estimation methods narrows, illustrating the homogeneity of the correlation matrix.
Figure 5.
Portfolios of minimum risk based on different estimators of the covariance matrix.
Figure 5 shows that portfolios crafted using the ELA method consistently have the lowest risk compared to the POETk methods, the shrinkage method, and the sample covariance matrix method, especially with smaller rolling-window sample sizes. Therefore, the ELA method is superior in constructing portfolios that minimize risk, is an effective tool for reducing portfolio risk, and is suitable for a wide range of sample sizes.
6. Conclusions
In this paper, we introduce a novel method called element-aggregation (ELA) estimation for the estimation of covariance matrices. The ELA method stands out due to its simplicity, low computational complexity, and applicability to both sparse and non-sparse matrices. Our theoretical analysis shows that the ELA method offers strong consistency while maintaining computational efficiency and dimensional independence, especially concerning block-wise covariance matrices.
In our numerical simulation study, we highlight the exceptional effectiveness of the ELA method in estimating correlation coefficient matrices and covariance matrices using diverse random samples. In comparison to established methods like POET and Rothman, the ELA method consistently either outperforms or equals them. The computational efficiency of our ELA method is complemented by a significant reduction in the relative estimation error for both correlation coefficients and covariance matrices.
In the real data analysis of the S&P 500 index stocks, the ELA method consistently generates portfolios with the lowest risk in comparison to the other methods considered. This outcome is particularly pronounced in scenarios with smaller sample sizes, underscoring the potent ability of the ELA method to construct risk-minimized portfolios.
7. Future Work
This paper delves into the estimation of covariance matrices employing an elemental clustering approach, substantiating and evaluating the consistency and efficiency of the proposed methodology within the realm of high-dimensional data. The exponential growth in data volume and dimensionality due to advancements in computational and storage technologies has led to the emergence of ultra-high-dimensional and high-frequency datasets. Thus, the extensibility of the proposed method to these novel datasets, coupled with the demonstration of its consistency and efficacy, presents a significant avenue for future research. In addition, the exploration of computational strategies to increase efficiency under the constraints of ultra-high-dimensionality and to optimize the trade-off between computational resources and analytical accuracy is imperative. Furthermore, the proposed method has the potential to be integrated into diverse domains, such as psychology, social sciences, and genetic research. This also represents a promising direction for future research.
Funding
This research was funded by the Doctoral Foundation of Yunnan Normal University (Project No. 2020ZB014) and the Youth Project of Yunnan Basic Research Program (Project No. 202201AU070051).
Data Availability Statement
The author confirms that the data supporting the findings of this study are available within the article.
Conflicts of Interest
The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A. Proof of Lemma 1
Proof.
First, recall that with for all . We regard as independent zero-mean random variables with , for all . By Bernstein inequalities, we have
where . For symmetric matrices, by [30], we have
For the right hand side above,
We complete the proof. □
Appendix B. Proof of Lemma 2
Proof.
Let . For any such that , recall that . Let , and . Obviously,
Thus, it is sufficient to prove
For any , we have , i.e., , and
For any , if X follows a normal distribution, then by the Bernstein inequality, we have
and
Thus, as ,
Now, for any , we have , but instead for some . By the assumption, we have when and . Thus,
Appendix C. Proof of Theorem 1
Proof.
It follows immediately by applying Lemma 1 to in Lemma 2. □
Appendix D. Proof of Corollary 1
Proof.
For simplicity, we only consider the case with and . For any , define and . Let . By the assumption, we have and for . Note that
Obviously, if we can prove that for those , the covariance
then
thus Corollary 1 follows.
By the properties of normal distribution, we have
Thus, by assumption that , it follows that
Thus,
This completes the proof of (A6). □
Appendix E. Proof of Corollary 2
Proof.
By (A7) and the assumption that , we have
For and any , define
It is easy to see that for any and . Thus,
and
where is a constant. Moreover, for any set ,
Let . In a similar calculation, we can show that
Thus, we have
Let for all }, i.e., I is the set of elements in the matrix that can be identified by . It can also easily be seen that , and thus
Let , i.e., are all the elements in the matrix that are very small by themselves. When is big enough, and . For any , let
where . It is easy to see that , and by (A9),
Thus,
Finally, let . Because , if , we must have for some , where . Thus, for any . It is easy to see that . Let be defined similarly as above. Because and by definition
thus
Appendix F. Proof of Corollary 3
Proof.
It is easy to see that in there are at most different values, which is fixed as . By Lemma 2, these elements can be consistently identified and be estimated with root-n consistency, i.e., for any , we can find a subset of probability space on which
where and that
Let be the M eigenvalues of , with corresponding eigenvectors , respectively. Then, it is easy to check that the eigenvalues of are and 0 of replications. Because is the identity matrix with the same dimension as , the eigenvalues for are and of replications, …, and of replications.
Let the M eigenvectors of be , respectively. Note that the first eigenvectors of are . Thus, the first M eigenvectors of are and that of are , respectively. It is easy to verify that
for . □
References
- Ledoit, O.; Wolf, M. The power of (non-) linear shrinking: A review and guide to covariance matrix estimation. J. Financ. Econ. 2022, 20, 187–218. [Google Scholar] [CrossRef]
- Zeileis, A. Econometric Computing with HC and HAC Covariance Matrix Estimators. J. Stat. Softw. 2004, 11, 1–17. [Google Scholar] [CrossRef]
- Ledoit, O.; Wolf, M. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. Rev. Financ. Stud. 2017, 30, 4349–4388. [Google Scholar] [CrossRef]
- Schäfer, J.; Strimmer, K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. 2005, 4, 1–32. [Google Scholar] [CrossRef]
- Tipping, M.E.; Bishop, C.M. Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B Stat. Method. 1999, 61, 611–622. [Google Scholar] [CrossRef]
- Fan, J.; Fan, Y.; Lv, J. High dimensional covariance matrix estimation using a factor model. J. Econom. 2008, 147, 186–197. [Google Scholar] [CrossRef]
- Driscoll, J.C.; Kraay, A.C. Consistent covariance matrix estimation with spatially dependent panel data. Rev. Econ. Stat. 1998, 80, 549–560. [Google Scholar] [CrossRef]
- Pourahmadi, M. Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika 2000, 87, 425–435. [Google Scholar] [CrossRef]
- Ledoit, O.; Wolf, M. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empir. Financ. 2003, 10, 603–621. [Google Scholar] [CrossRef]
- Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004, 88, 365–411. [Google Scholar] [CrossRef]
- Pepler, P.T.; Uys, D.W.; Nel, D.G. Regularized covariance matrix estimation under the common principal components model. Commun. Stat. Simul. Comput. 2018, 47, 631–643. [Google Scholar] [CrossRef]
- Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Stat. 2008, 36, 199–227. [Google Scholar] [CrossRef]
- El Karoui, N. Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Stat. 2008, 36, 2717–2756. [Google Scholar] [CrossRef]
- Rothman, A.J.; Levina, E.; Zhu, J. Generalized thresholding of large covariance matrices. J. Am. Stat. Assoc. 2009, 104, 177–186. [Google Scholar] [CrossRef]
- Cai, T.; Liu, W. Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 2011, 106, 672–684. [Google Scholar] [CrossRef]
- Cai, T.T.; Yuan, M. Adaptive covariance matrix estimation through block thresholding. Ann. Stat. 2012, 40, 2014–2042. [Google Scholar] [CrossRef]
- Lam, C.; Fan, J. Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Stat. 2009, 37, 4254–4278. [Google Scholar] [CrossRef] [PubMed]
- Rothman, A.J. Positive definite estimators of large covariance matrices. Biometrika 2012, 99, 733–740. [Google Scholar] [CrossRef]
- Fan, J.; Liao, Y.; Mincheva, M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B Stat. Method. 2013, 75, 603–680. [Google Scholar] [CrossRef]
- Onnela, J.P.; Chakraborti, A.; Kaski, K.; Kertesz, J.; Kanto, A. Dynamics of market correlations: Taxonomy and portfolio analysis. Phys. Rev. E 2003, 68, 056110. [Google Scholar] [CrossRef]
- Matteson, D.; Tsay, R.S. Multivariate Volatility Modeling: Brief Review and a New Approach; Manuscript, Booth School of Business, University of Chicago: Chicago, IL, USA, 2011. [Google Scholar]
- Qian, J. Shrinkage Estimation of Nonlinear Models and Covariance Matrix. Doctoral Thesis, National University of Singapore, Department of Statistics and Applied Probability, Singapore, 2012. Available online: https://core.ac.uk/download/pdf/48656486.pdf (accessed on 27 March 2024).
- Jiang, H.; Saart, P.W.; Xia, Y. Asymmetric conditional correlations in stock returns. Ann. Appl. Stat. 2016, 10, 989–1018. [Google Scholar] [CrossRef]
- Elton, E.J.; Gruber, M.J. Estimating the dependence structure of share prices–implications for portfolio selection. J. Financ. 1973, 28, 1203–1232. [Google Scholar]
- Liu, S. Matrix results on the Khatri-Rao and Tracy-Singh products. Linear Algebra Appl. 1999, 289, 267–277. [Google Scholar] [CrossRef]
- Bollerslev, T. Modeling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. Rev. Econ. Stat. 1990, 72, 498–505. [Google Scholar] [CrossRef]
- Hansen, P.R.; Lunde, A. A forecast comparison of volatility models: Does anything beat a GARCH(1,1)? J. Appl. Econ. 2005, 20, 873–889. [Google Scholar] [CrossRef]
- Aghamohammadi, A.; Dadashi, H.; Sojoudi, M.; Sojoudi, M.; Tavoosi, M. Optimal portfolio selection using quantile and composite quantile regression models. Commun. Stat. Simul. Comput. 2022, 1–11. [Google Scholar] [CrossRef]
- Zhang, Z.; Yue, M.; Huang, L.; Wang, Q.; Yang, B. Large portfolio allocation based on high-dimensional regression and Kendall’s Tau. Commun. Stat. Simul. Comput. 2023, 1–13. [Google Scholar] [CrossRef]
- Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 2013. [Google Scholar]