Article

Homogeneity Test of Multi-Sample Covariance Matrices in High Dimensions

1 Department of Statistics, East China Normal University, Shanghai 200062, China
2 KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China
3 Center for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
4 School of Mathematics and Statistics, Anhui Normal University, Anhui 241002, China
* Authors to whom correspondence should be addressed.
Mathematics 2022, 10(22), 4339; https://doi.org/10.3390/math10224339
Submission received: 13 October 2022 / Revised: 10 November 2022 / Accepted: 11 November 2022 / Published: 18 November 2022
(This article belongs to the Special Issue Statistical Theory and Application)

Abstract: In this paper, a new test statistic based on the weighted Frobenius norm of covariance matrices is proposed to test the homogeneity of multi-group population covariance matrices. The asymptotic distributions of the proposed test under the null and the alternative hypotheses are derived, respectively. Simulation results show that the proposed test procedure tends to outperform some existing test procedures.

1. Introduction

In the last thirty years, statistical methods for high-dimensional data have developed rapidly. A common feature of high-dimensional data is that the data dimension is larger, or vastly larger, than the total sample size. In high-dimensional settings, many classical methods are not well defined or perform poorly, and a great number of statistical methods have therefore been proposed for high-dimensional data. As an important part of statistical inference, hypothesis testing has received particular attention. Testing the equality of several covariance matrices is a fundamental problem in multivariate statistical analysis; see, for example, [1,2]. The problem arises in the analysis of gene expression data: it was shown in [3] that many genes have different variances in gene expression between disease states. For example, one breast cancer dataset classifies patients into three groups based on their gene expression signatures: well-differentiated tumors, moderately differentiated tumors, and poorly differentiated and undifferentiated tumors. As another example, Refs. [4,5,6] used artificial neural networks, a Markov blanket-embedded genetic algorithm and feature selection, respectively, to investigate a gene expression dataset with four groups, 83 samples and dimension 2308, so the sample size is far smaller than the dimension. For such gene datasets, we are interested in checking whether the covariance matrices of the groups are the same. Moreover, equality of covariance matrices across groups is an important assumption of multivariate analysis of variance. Therefore, a test of the equality of several covariance matrices is needed. Specifically, assume that $X_{i1}, \ldots, X_{in_i}$ are independent and identically distributed (iid) p-dimensional random vectors with mean vector $\mu_i$ and covariance matrix $\Sigma_i$, for $i = 1, \ldots, k$ with $k \ge 2$. We want to test the following hypothesis:
$H_0: \Sigma_1 = \cdots = \Sigma_k \quad \text{vs.} \quad H_1: H_0 \ \text{is not true}. \tag{1}$
A classical method is the likelihood ratio test proposed by [7] under the assumption of normal populations. However, when the data dimension is larger, or vastly larger, than the sample size, the likelihood ratio test is not well defined because the sample covariance matrices are singular with probability one. It is thus vital to propose procedures suited to high-dimensional data, a task made more challenging by the curse of dimensionality. For the hypothesis (1), several test procedures exist in the literature. Refs. [8,9,10,11] proposed test statistics based on the Frobenius norm. Ref. [12] presented a test for high-dimensional longitudinal data. Ref. [13] proposed a power-enhanced high-dimensional test for the case where the maximum eigenvalue of $\Sigma_i$ is bounded or $\mathrm{tr}(\Sigma_i^j) = O(p^j)$ for $j = 1, 2, 3, 4$. Ref. [14] presented a test based on the well-known Box test ([15]) for high-dimensional normal data. It is noted that [9] imposed the conditions $0 < \lim_{p\to\infty} \mathrm{tr}(\Sigma^i)/p < \infty$ for $i = 1, \ldots, 8$ to obtain the asymptotic null distribution of his test under normality, where $\Sigma$ is the common covariance matrix under the null hypothesis $H_0$. For high-dimensional normal data, [10] proposed a test under the condition $0 < \lim_{p\to\infty} \mathrm{tr}(\Sigma^i)/p < \infty$ for $i = 1, \ldots, 4$. Ref. [8] imposed $0 < \mathrm{tr}(\Sigma^2)/p < \infty$, among other conditions, to establish the asymptotic properties of his test statistic. The tests in [8,9,10,13,14] thus rely on conditions relating the data dimension p to the sample sizes $n_i$, or on normality, and these conditions restrict the application of their test procedures. Ref. [11] gave a Frobenius norm-based test, which can be seen as an extension of that in [16], in which the data dimension and the sample sizes may diverge arbitrarily.
The existing Frobenius norm-based tests for the hypothesis $H_0$ are almost all built on $\sum_{i<j}^{k} \mathrm{tr}(\Sigma_i - \Sigma_j)^2 = k \sum_{i=1}^{k} \mathrm{tr}(\Sigma_i - \bar{\Sigma})^2$, where $\bar{\Sigma} = \sum_{i=1}^{k} \Sigma_i / k$ denotes the unweighted average of the population covariance matrices $\Sigma_1, \ldots, \Sigma_k$. The deviation of each $\Sigma_i$ from the overall $\bar{\Sigma}$ therefore carries the equal weight $1/k$, which cannot emphasize the deviations of populations with large sample sizes. Motivated by this, we construct a new test statistic from a different Frobenius norm, $\sum_{i=1}^{k} n_i \,\mathrm{tr}(\Sigma_i - \Sigma^*)^2$, where $\Sigma^* = \sum_{i=1}^{k} n_i \Sigma_i / n$ and $n = \sum_{i=1}^{k} n_i$. Here $\Sigma^*$ is a sample-size-weighted average of the k covariance matrices, so the deviations of populations with large sample sizes receive more weight. On the other hand, it is evident that the null hypothesis $H_0$ holds if and only if $\sum_{i=1}^{k} n_i \,\mathrm{tr}(\Sigma_i - \Sigma^*)^2 = 0$.
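To make the construction concrete, the following short sketch (plain NumPy; the function name is ours, not the paper's) evaluates the weighted population target quantity, which vanishes exactly when all covariance matrices coincide:

```python
import numpy as np

def weighted_frobenius_target(sigmas, ns):
    """Population quantity sum_i n_i * tr((Sigma_i - Sigma_star)^2), where
    Sigma_star = sum_i n_i Sigma_i / n weights each covariance matrix by
    its sample size.  It is zero iff Sigma_1 = ... = Sigma_k."""
    ns = np.asarray(ns, dtype=float)
    sigma_star = sum(w * S for w, S in zip(ns / ns.sum(), sigmas))
    return sum(w * np.trace((S - sigma_star) @ (S - sigma_star))
               for w, S in zip(ns, sigmas))
```

Because each deviation is multiplied by its own $n_i$, a group with a large sample size contributes more to the target than under the equal-weight $1/k$ scheme.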
The main purpose of this paper is to develop a new method for testing the homogeneity of k high-dimensional covariance matrices on the basis of the weighted Frobenius norm. For the hypothesis in (1), most existing methods ([8,9,10,13]) impose normality or an explicit relationship between the data dimension p and the sample size n, and may behave poorly for non-normal or ultra-high-dimensional data. Our method requires neither normality nor an explicit relationship between p and n, similar to the method in [11], so it can be applied to non-normal and ultra-high-dimensional data. On the other hand, the difference between the method in [11] and ours is that we use a sample-size-weighted Frobenius norm to build the test statistic, so the two tests behave differently when the sample sizes are unequal. When $k = 2$, our proposed test statistic can be seen as an extension of that in [16], up to a constant, and hence it can also be applied to two-sample data.
Simulation results show that the proposed test behaves differently from existing tests such as the tests in [11,13,14]. We will discuss the differences between the proposed test and the competing tests through numerical comparisons in various scenarios. We observe that the proposed test outperforms the competing tests in both size and power in many cases.
The remainder of the paper is organized as follows. Section 2 presents the statistical model and the conditions imposed in order to construct the new test. In Section 3, the new test statistic is proposed and its asymptotic properties are given. Section 4 reports a numerical study comparing the proposed test with the competing tests. Concluding remarks are provided in Section 5. The proofs of the main results are collected in the Appendices.

2. Preliminaries

We consider the following general multivariate model, which is often used in the literature:
$X_{ij} = \mu_i + \Gamma_i Z_{ij}$
for $j = 1, \ldots, n_i$ and $i = 1, \ldots, k$, where $\Gamma_i$ is a $p \times r$ matrix for some $r \ge p$ such that $\Gamma_i \Gamma_i^T = \Sigma_i$, and $Z_{i1}, \ldots, Z_{in_i}$ are r-variate iid random vectors with $E(Z_{ij}) = 0$ and $\mathrm{Var}(Z_{i1}) = I_r$. Furthermore, writing $Z_{ij} = (z_{ij1}, \ldots, z_{ijr})^T$, the components $z_{ij1}, \ldots, z_{ijr}$ have finite eighth moments with $E(z_{ijl}^4) = \Delta_l$ for some constant $\Delta_l$. Moreover, for any positive integers q and $\alpha_v$, we have
$E\big(z_{ijv_1}^{\alpha_1} \cdots z_{ijv_q}^{\alpha_q}\big) = E\big(z_{ijv_1}^{\alpha_1}\big) \cdots E\big(z_{ijv_q}^{\alpha_q}\big) \tag{2}$
whenever $\sum_{v=1}^{q} \alpha_v \le 8$, where $v_1, \ldots, v_q$ are distinct indices. Condition (2) says that $z_{ijv_1}, \ldots, z_{ijv_q}$ are pseudo-independent, which is naturally satisfied when the samples are generated from a normal distribution.
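The factor model above is easy to simulate. The sketch below (an illustrative helper of ours) draws samples with standard normal factors, which satisfy the moment and pseudo-independence conditions:

```python
import numpy as np

def generate_sample(mu, gamma, n, rng):
    """Draw n iid observations X_j = mu + Gamma Z_j, where Z_j has iid
    standard normal entries (mean 0, identity covariance), so that
    Var(X_j) = Gamma Gamma^T = Sigma."""
    z = rng.standard_normal((n, gamma.shape[1]))  # n x r factor matrix
    return mu + z @ gamma.T                       # n x p observations
```

With `gamma = np.linalg.cholesky(sigma)` this produces normal samples with covariance `sigma`; heavier-tailed choices of Z are equally valid as long as the eighth-moment and pseudo-independence conditions hold.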
In order to obtain the asymptotic distributions of the new test statistic, the following conditions are imposed:
(C1)
As $n \to \infty$, $n_i/n \to c_i \in (0, 1)$ for $i = 1, \ldots, k$.
(C2)
As $n \to \infty$, $p = p(n_1, \ldots, n_k) \to \infty$ and, for arbitrary $l, m, s, h \in \{1, \ldots, k\}$, $\mathrm{tr}(\Sigma_l \Sigma_m) \to \infty$ and $\mathrm{tr}(\Sigma_l \Sigma_m \Sigma_s \Sigma_h) = o\{\mathrm{tr}(\Sigma_l \Sigma_m)\,\mathrm{tr}(\Sigma_s \Sigma_h)\}$.
(C1) implies that all sample sizes increase at the same rate, up to constants. (C2) can be seen as an extension of condition A2 in [16] to the multi-group case; the two conditions are the same as those in [11]. A key aspect of (C2) is that it imposes no explicit relationship between p and the sample size n. It is noted that (C2) holds automatically when all eigenvalues of the k covariance matrices are uniformly bounded. Next, we consider another family of covariance matrices satisfying (C2), namely spiked covariance structures. For convenience, we set $k = 4$ and let $\Sigma_i = \mathrm{diag}(a_{i1} p^{\delta_{i1}}, \ldots, a_{im_i} p^{\delta_{im_i}}, a_{i,m_i+1}, \ldots, a_{ip})$, where the $a_{ij}$, $\delta_{ij}$ and $m_i$ are fixed positive constants with $\delta_{i1} \ge \cdots \ge \delta_{im_i}$ for $i = 1, 2, 3, 4$. Then the leading terms of $\mathrm{tr}(\Sigma_1 \Sigma_2 \Sigma_3 \Sigma_4)$, $\mathrm{tr}(\Sigma_1 \Sigma_2)$ and $\mathrm{tr}(\Sigma_3 \Sigma_4)$ are, respectively, $a_{11} a_{21} a_{31} a_{41}\, p^{\delta_{11}+\delta_{21}+\delta_{31}+\delta_{41}} + O(p)$, $a_{11} a_{21}\, p^{\delta_{11}+\delta_{21}} + O(p)$ and $a_{31} a_{41}\, p^{\delta_{31}+\delta_{41}} + O(p)$. As a result, (C2) holds if $\delta_{11} + \delta_{21} < 1$ and $\delta_{31} + \delta_{41} < 1$. Let $\lambda_{ij}$ denote the jth largest eigenvalue of $\Sigma_i$. If $\delta_{11}, \ldots, \delta_{41}$ are all less than 0.5, then $\lambda_{i1}^2 / \mathrm{tr}(\Sigma_i^2) \to 0$ for $i = 1, \ldots, 4$, which is called a non-strongly spiked eigenvalue (NSSE) structure in [17]. Otherwise, if some $\delta_{i1} \ge 0.5$, then $\lambda_{i1}^2 / \mathrm{tr}(\Sigma_i^2) \nrightarrow 0$, which is called a strongly spiked eigenvalue (SSE) structure in [17]. Therefore, (C2) can hold when all of the covariance matrices have NSSE structures, or when some of the covariance matrices have SSE structures.
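Condition (C2) can be checked numerically for such spiked matrices. The sketch below (our illustration, assuming a single spike per matrix and a unit bulk) evaluates the ratio that (C2) requires to vanish:

```python
import numpy as np

def c2_ratio(p, deltas, a=2.0):
    """For Sigma_i = diag(a * p**delta_i, 1, ..., 1) (one spike, unit bulk),
    evaluate tr(S1 S2 S3 S4) / {tr(S1 S2) * tr(S3 S4)}.  Condition (C2)
    requires this ratio to shrink as p grows.  Diagonal matrices let every
    trace reduce to an elementwise sum."""
    d1, d2, d3, d4 = [np.concatenate(([a * p**d], np.ones(p - 1)))
                      for d in deltas]
    num = np.sum(d1 * d2 * d3 * d4)
    return num / (np.sum(d1 * d2) * np.sum(d3 * d4))
```

With spike exponents summing below the stated thresholds, the ratio decays visibly as p grows, consistent with the discussion above.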
To propose our test statistic, the hypothesis in (1) is rewritten as the following hypothesis
$H_0: \sum_{i=1}^{k} n_i\,\mathrm{tr}(\Sigma_i - \Sigma^*)^2 = 0 \quad \text{vs.} \quad H_1: \sum_{i=1}^{k} n_i\,\mathrm{tr}(\Sigma_i - \Sigma^*)^2 > 0. \tag{3}$
Then, we can construct a new test statistic based on i = 1 k n i tr ( Σ i Σ * ) 2 . Note that
$\sum_{i=1}^{k} n_i\,\mathrm{tr}(\Sigma_i - \Sigma^*)^2 = \sum_{i=1}^{k} \frac{n_i(n - n_i)}{n}\,\mathrm{tr}(\Sigma_i^2) - \sum_{i \ne j}^{k} \frac{n_i n_j}{n}\,\mathrm{tr}(\Sigma_i \Sigma_j).$
Thus, a new test statistic can be given if unbiased estimators of tr ( Σ i 2 ) and tr ( Σ i Σ j ) are respectively obtained for i = 1 , , k .

3. Main Results

In this section, we propose a new test statistic for the hypothesis in (1) and give its asymptotic properties under conditions (C1) and (C2). According to the equivalent hypothesis in (3), our new test statistic is given as the following
$T := \sum_{i=1}^{k} \frac{n_i(n - n_i)}{n} A_i - \sum_{i \ne j}^{k} \frac{n_i n_j}{n} C_{ij},$
where A i and C i j are respectively unbiased estimators of tr ( Σ i 2 ) and tr ( Σ i Σ j ) , which are given as follows
$A_i = \frac{1}{P_{n_i}^2} \sum_{j,l}^{*} \big(X_{ij}^T X_{il}\big)^2 - \frac{2}{P_{n_i}^3} \sum_{j,l,f}^{*} X_{ij}^T X_{il} X_{ij}^T X_{if} + \frac{1}{P_{n_i}^4} \sum_{j,l,f,g}^{*} X_{ij}^T X_{il} X_{if}^T X_{ig} \tag{5}$
and
$C_{ij} = \frac{1}{n_i n_j} \sum_{l=1}^{n_i} \sum_{m=1}^{n_j} \big(X_{il}^T X_{jm}\big)^2 - \frac{1}{n_i P_{n_j}^2} \sum_{l=1}^{n_i} \sum_{f,m}^{*} X_{il}^T X_{jf} X_{il}^T X_{jm} - \frac{1}{P_{n_i}^2 n_j} \sum_{l=1}^{n_j} \sum_{f,m}^{*} X_{jl}^T X_{if} X_{jl}^T X_{im} + \frac{1}{P_{n_i}^2 P_{n_j}^2} \sum_{l,m}^{*} \sum_{f,g}^{*} X_{il}^T X_{jf} X_{im}^T X_{jg}. \tag{6}$
Here $P_{n_i}^{l} = \frac{n_i!}{(n_i - l)!}$ and $\sum^{*}$ denotes summation over mutually distinct indices. Note that T is an unbiased estimator of $\sum_{i=1}^{k} n_i\,\mathrm{tr}(\Sigma_i - \Sigma^*)^2$. The above two unbiased estimators are used in [11,16,18,19], respectively. It is noted that we can assume, without loss of generality, $\mu_1 = \mu_2 = \cdots = \mu_k = 0$ because T is invariant under location transformations. Under this assumption, the leading terms of $A_i$ and $C_{ij}$ are their respective first terms, since the last two terms of $A_i$ and the last three terms of $C_{ij}$ are of smaller order than the first term. As a result, we only compute the first terms of $A_i$ and $C_{ij}$ to save computation time; see [11,18] for more details.
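Assuming centered data (zero means), the leading terms of $A_i$ and $C_{ij}$ and the resulting statistic T can be sketched as follows; the function names are ours, not the paper's:

```python
import numpy as np

def a_lead(x):
    """Leading term of A_i: average of (X_j^T X_l)^2 over ordered pairs
    j != l; unbiased for tr(Sigma^2) when the data are centered."""
    n = x.shape[0]
    g = (x @ x.T) ** 2          # squared inner products of all pairs
    return (g.sum() - np.trace(g)) / (n * (n - 1))

def c_lead(x, y):
    """Leading term of C_ij: average of (X_l^T Y_m)^2 over all cross
    pairs; unbiased for tr(Sigma_i Sigma_j)."""
    return float(((x @ y.T) ** 2).mean())

def t_statistic(samples):
    """Leading-term version of
    T = sum_i n_i(n-n_i)/n * A_i - sum_{i != j} n_i n_j / n * C_ij."""
    ns = np.array([x.shape[0] for x in samples], dtype=float)
    n = ns.sum()
    t = sum(ns[i] * (n - ns[i]) / n * a_lead(samples[i])
            for i in range(len(samples)))
    for i in range(len(samples)):
        for j in range(len(samples)):
            if i != j:
                t -= ns[i] * ns[j] / n * c_lead(samples[i], samples[j])
    return t
```

Under equal covariance matrices T fluctuates around zero, while a clear difference in the $\Sigma_i$ drives it toward the positive weighted Frobenius target.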
It follows from conditions (C1) and (C2) that the variance of T is
$\sigma^2 = \frac{4}{n^2}\left[\sum_{s=1}^{k} \frac{n_s(n-n_s)^2}{n_s-1}\,\mathrm{tr}^2(\Sigma_s^2) + \sum_{s \ne h}^{k} 2 n_s n_h\,\mathrm{tr}^2(\Sigma_s \Sigma_h)\right] + \frac{4}{n^2} \sum_{s=1}^{k} \sum_{h,f=1}^{k} n_s n_h n_f\,\mathrm{tr}\big\{\Gamma_s^T(\Sigma_s - \Sigma_h)\Gamma_s\,\Gamma_s^T(\Sigma_s - \Sigma_f)\Gamma_s\big\} + \frac{8}{n^2} \sum_{s=1}^{k} \sum_{h,f=1}^{k} n_s n_h n_f\,\mathrm{tr}\big\{\Sigma_s(\Sigma_s - \Sigma_h)\Sigma_s(\Sigma_s - \Sigma_f)\big\} + o\big\{\mathrm{tr}^2(\Sigma_s^2)\big\}.$
Then, we can obtain the following asymptotic distribution of T.
Theorem 1.
Under (C1) and (C2), as $\min\{p, n\} \to \infty$, we have
$\frac{T - E(T)}{\sigma} \xrightarrow{d} N(0, 1).$
Proof. 
See Appendix B. □
It is clear that the variance of T under the null hypothesis H 0 is
$\sigma_0^2 = \frac{4}{n^2}\left[\sum_{i=1}^{k} \frac{n_i(n-n_i)^2}{n_i-1}\,\mathrm{tr}^2(\Sigma_i^2) + \sum_{i \ne j}^{k} 2 n_i n_j\,\mathrm{tr}^2(\Sigma_i \Sigma_j)\right].$
As a result, to formulate the test procedure, we need to give a ratio-consistent estimator of σ 0 2 . In this paper, we use the unbiased estimators A i and C i j in (5) and (6) to respectively estimate tr ( Σ i 2 ) and tr ( Σ i Σ j ) . The following lemma is from [11].
Lemma 1.
Under (C1) and (C2), as $\min\{p, n\} \to \infty$,
$\frac{A_i}{\mathrm{tr}(\Sigma_i^2)} \xrightarrow{p} 1, \qquad \frac{C_{ij}}{\mathrm{tr}(\Sigma_i \Sigma_j)} \xrightarrow{p} 1.$
On the basis of Lemma 1, a ratio-consistent estimator of $\sigma_0^2$ is given by
$\hat{\sigma}_0^2 := \frac{4}{n^2}\left[\sum_{i=1}^{k} \frac{n_i(n-n_i)^2}{n_i-1} A_i^2 + \sum_{i \ne j}^{k} 2 n_i n_j C_{ij}^2\right].$
Then, our new test statistic is $\hat{T} = T/\hat{\sigma}_0$. It follows from Theorem 1 and Lemma 1 that $\hat{T} \xrightarrow{d} N(E(T)/\sigma_0, \sigma^2/\sigma_0^2)$ as $\min\{p, n\} \to \infty$. In particular, the proposed statistic $\hat{T}$ is asymptotically standard normal under the null hypothesis $H_0$. As a result, we reject the null hypothesis when $\hat{T} \ge z_\alpha$, where $z_\alpha$ is the upper-$\alpha$ quantile of the standard normal distribution.
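The standardization and rejection rule can be sketched as follows (an illustrative helper of ours, taking T together with plug-in estimates of the traces as inputs):

```python
import numpy as np
from math import erf, sqrt

def t_hat(t, a, c, ns):
    """Standardize T by the plug-in null standard deviation sigma0_hat
    built from estimates a[i] of tr(Sigma_i^2) and c[i][j] of
    tr(Sigma_i Sigma_j); reject H0 at level alpha when the standardized
    statistic is >= z_alpha.  Returns (T_hat, upper-tail p-value)."""
    ns = np.asarray(ns, dtype=float)
    n, k = ns.sum(), len(ns)
    s = sum(ns[i] * (n - ns[i]) ** 2 / (ns[i] - 1) * a[i] ** 2
            for i in range(k))
    s += sum(2 * ns[i] * ns[j] * c[i][j] ** 2
             for i in range(k) for j in range(k) if i != j)
    stat = t / sqrt(4 * s / n ** 2)          # T / sigma0_hat
    return stat, 0.5 * (1 - erf(stat / sqrt(2)))  # 1 - Phi(stat)
```

Because the test is one-sided (large positive T signals heterogeneity), only the upper tail of the standard normal is used for the p-value.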
The following corollary, which gives the asymptotic power function under the alternative hypothesis $H_1$, is easily obtained.
Corollary 1.
Under (C1) and (C2), as $\min\{p, n\} \to \infty$, we have
$P(\hat{T} \ge z_\alpha) = \Phi\left(-\frac{z_\alpha \sigma_0}{\sigma} + \frac{E(T)}{\sigma}\right) + o(1).$

4. Simulation Studies

In this section, we compare our proposed test with six existing tests by simulation. The competing tests are denoted by $T_{ZBHW}$, $T_{ZLGY}$ and $\rho L_k(y)$, from [11,13,14], respectively. Note that the authors of [14] considered eight test statistics corresponding to different choices of y; to save space, we simulate only four of them, namely $\rho L_k(y_i)$, $i = 1, 2, 3, 4$. The $z_{ijr}$ are independently generated from the standard normal distribution $N(0, 1)$ and from the centralized gamma distribution $\mathrm{Gamma}(4, 2) - 2$, respectively, for $i = 1, \ldots, k$, $j = 1, \ldots, n_i$ and $r = 1, \ldots, p$. We set $k = 3$ and $\Gamma_i = \Sigma_i^{1/2}$ for $i = 1, 2, 3$, where the covariance matrices are specified in the following four cases:
  • Case 1: $H_0$: $\Sigma_1 = \Sigma_2 = \Sigma_3 = \Lambda U_0 \Lambda$; $H_1$: $\Sigma_i = \Lambda U_{i-1} \Lambda$, $i = 1, 2, 3$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ with $\lambda_1, \ldots, \lambda_p \overset{\mathrm{iid}}{\sim} \mathrm{Unif}(1, 5)$ and $U_{i-1}$ is the $p \times p$ matrix whose $(a, b)$th element is $(-1)^{a+b}\{(5 - 2(i-1))/6\}^{|a-b|^{0.1}}$.
  • Case 2: $H_0$: $\Sigma_1 = \Sigma_2 = \Sigma_3 = \Lambda U_0 \Lambda$; $H_1$: $\Sigma_i = \Lambda U_{i-1} \Lambda$, $i = 1, 2, 3$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ with $\lambda_1, \ldots, \lambda_p \overset{\mathrm{iid}}{\sim} \mathrm{Unif}(1, 5)$ and $U_{i-1}$ is the $p \times p$ matrix whose $(a, b)$th element is $(-1)^{a+b}\{(4 - i)/5\}^{|a-b|^{0.1}}$.
  • Case 3: $H_0$: $\Sigma_1 = \Sigma_2 = \Sigma_3 = \Lambda_0$; $H_1$: $\Sigma_i = \Lambda_{i-1}$, $i = 1, 2, 3$, where $\Lambda_0 = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ with $\lambda_1, \ldots, \lambda_p \overset{\mathrm{iid}}{\sim} \mathrm{Unif}(1, 5)$; $\Lambda_1$ is the $p \times p$ symmetric matrix with $(a, a)$th elements 1.01, $(a, a+1)$th elements 0.1 and all other elements 0; and $\Lambda_2$ is the $p \times p$ symmetric matrix with $(a, a)$th elements 3, $(a, a+1)$th elements 2, $(a, a+2)$th elements 1 and all other elements 0.
  • Case 4: $H_0$: $\Sigma_1 = \Sigma_2 = \Sigma_3 = \mathrm{diag}(\omega_1, \ldots, \omega_p)$, where $\omega_1, \ldots, \omega_p \overset{\mathrm{iid}}{\sim} \mathrm{Unif}(0.5, 10)$; $H_1$: $\Sigma_i = U_i \Lambda U_i^T$, $i = 1, 2, 3$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ with $\lambda_1, \ldots, \lambda_p \overset{\mathrm{iid}}{\sim} \mathrm{Gamma}(4, 0.5)$, $U_i = (W_i^T W_i)^{-1/2} W_i^T$ and $W_i$ is a $p \times p$ matrix whose entries are independently generated from the normal distribution $N(0, i)$.
Without loss of generality, all the population means are set to 0. The sample sizes and data dimensions are, respectively, n = 45, 95 and p = 16, 32, 64, 128, 256. Empirical sizes and powers are computed at the nominal level α = 0.05 with 10,000 replications.
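The rotation matrices $U_i = (W_i^T W_i)^{-1/2} W_i^T$ of Case 4 can be generated as in the sketch below (our illustration; computing the inverse square root via an eigendecomposition is our choice, not a step prescribed by the paper):

```python
import numpy as np

def case4_rotation(p, var, rng):
    """U = (W^T W)^{-1/2} W^T for W with iid N(0, var) entries.  U is
    orthogonal, so Sigma = U Lambda U^T has the same eigenvalues as a
    diagonal Lambda but a random eigenbasis."""
    w = np.sqrt(var) * rng.standard_normal((p, p))
    vals, vecs = np.linalg.eigh(w.T @ w)        # W^T W is symmetric PD
    inv_sqrt = (vecs / np.sqrt(vals)) @ vecs.T  # (W^T W)^{-1/2}
    return inv_sqrt @ w.T
```

Since W has full rank almost surely, the resulting U satisfies $U U^T = I_p$, so the $\Sigma_i$ in Case 4 share a spectrum but differ in orientation.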
All the simulation results are reported in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8 and Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8; the figures provide an intuitive view, and the tables give the simulated values. First, Tables 1–8 show that the empirical sizes of the test $T_{ZLGY}$ are seriously inflated, especially when n = 95, in all simulation cases. For example, as shown in Table 1, the empirical sizes of $T_{ZLGY}$ are 0.0719, 0.1230, 0.2814, 0.5641 and 0.7692 for n = 95 and p = 16, 32, 64, 128, 256, respectively. Thus, $T_{ZLGY}$ cannot control the nominal size reasonably. Second, Tables 1–4 indicate that the empirical sizes of the tests $\rho L_k(y_i)$, $i = 1, 2, 3, 4$, are closer to the nominal size than those of $T_{ZBHW}$ and $\hat{T}$, which attain empirical sizes of about 7% and 6%, respectively, when n = 95. Thus, $\rho L_k(y_i)$ outperforms $T_{ZBHW}$ and $\hat{T}$ in size under Cases 1 and 2, for both the standard normal and the gamma distribution. At the same time, our new test $\hat{T}$ has higher empirical powers than $T_{ZBHW}$ and $\rho L_k(y_i)$. For example, for n = 95 and p = 256 under Case 1 and the normal distribution, the empirical powers of $\hat{T}$, $T_{ZBHW}$ and $\rho L_k(y_i)$, $i = 1, 2, 3, 4$, are 0.9966, 0.8974, 0.5509, 0.5560, 0.5359 and 0.3469, respectively. Finally, Tables 5–8 show that the six tests $T_{ZBHW}$, $\rho L_k(y_i)$, $i = 1, 2, 3, 4$, and $\hat{T}$ have similar empirical sizes, all close to the nominal size, and similar empirical powers under Case 3. However, under Case 4, the empirical powers of $\rho L_k(y_i)$ are severely deflated, never exceeding 0.1600, while the empirical powers of our new test exceed those of $T_{ZBHW}$ by about 0.04 and 0.16 for n = 45 and 95, respectively.
In summary, our proposed test $\hat{T}$ controls the nominal size reasonably and has greater power than the competing tests in all cases of our simulation, whether the samples come from the normal or the non-normal model. By contrast, $T_{ZLGY}$ fails to control the nominal size; $\rho L_k(y_i)$, $i = 1, 2, 3, 4$, suffers from severely deflated empirical power in some cases; and $T_{ZBHW}$ slightly inflates the empirical size in some cases.

5. Real Data Analysis

The test problem arises in the analysis of gene expression data. It was shown in [3] that many genes have different variances in gene expression between disease states. For example, the breast cancer dataset considered here is classified into three groups based on gene expression signatures: well-differentiated tumors ($n_1$ = 29), moderately differentiated tumors ($n_2$ = 136), and poorly differentiated and undifferentiated tumors ($n_3$ = 35). The breast cancer microarray data, including patient outcome information, were downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) data repository and are accessible through GEO Series accession number GSE11121 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11121, accessed on 26 June 2022). This dataset includes 200 samples, far fewer than the data dimension of 22,283. For this gene dataset, we are interested in checking whether the covariance matrices of the three groups are the same. Since equality of covariance matrices across groups is an important assumption for multivariate analysis of variance, testing this equality is necessary.
The raw breast cancer dataset (22,283 features) had not been preprocessed. We therefore first preprocessed and filtered the data, applying conventional steps such as background adjustment, normalization, and summarization. We then performed feature screening: features whose coefficient of variation fell outside the range (0.25, 1.0) were filtered out, and each retained feature was required to exceed a count value of 1320 in at least 5 samples. This screening yielded a dataset of 200 samples and 1280 features, on which the high-dimensional hypothesis test was then performed.
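The screening step just described can be sketched as follows (our illustration of the stated thresholds; the exact preprocessing pipeline used by the authors is not reproduced here):

```python
import numpy as np

def screen_features(x, cv_range=(0.25, 1.0), count=1320, min_samples=5):
    """Keep features whose coefficient of variation (std / mean) lies
    inside cv_range and that exceed `count` in at least `min_samples`
    samples.  x is a samples-by-features matrix of positive counts."""
    cv = x.std(axis=0, ddof=1) / x.mean(axis=0)
    keep = ((cv > cv_range[0]) & (cv < cv_range[1])
            & ((x > count).sum(axis=0) >= min_samples))
    return x[:, keep]
```

Applied to the 200-by-22,283 expression matrix with these thresholds, a filter of this form reduces the feature set to the screened data described in the text.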
For the screened breast cancer dataset, the observed test statistics (with p-values in parentheses) are $T_{ZBHW}$: 1.92 ($2.75 \times 10^{-2}$), $\rho L_k(y_1)$: 0.46 (0.79), $\rho L_k(y_2)$: 0.4 (0.82), $\rho L_k(y_3)$: 0.91 (0.63), $\rho L_k(y_4)$: 0.27 (0.69), $T_{ZLGY}$: 2.47 ($6.62 \times 10^{-3}$) and $\hat{T}$: 1.96 ($2.52 \times 10^{-2}$). The p-values of the competing statistics $T_{ZBHW}$ and $T_{ZLGY}$ and of our statistic $\hat{T}$ are significant, while those of $\rho L_k(y_i)$ are not.

6. Concluding Remarks

In this paper, we propose a new test of the homogeneity of k-sample covariance matrices in the high-dimensional setting. The asymptotic properties of the proposed test are derived under some regularity conditions, and the test is compared with six competing tests by simulation. Numerical results show that our proposed test controls the nominal size reasonably and attains the highest empirical powers in our simulation scenarios. However, as shown in Section 2, the technical condition (C2) may fail when the covariance matrices have some strongly spiked eigenvalues. Obtaining theoretical results for test statistics under spiked covariance structures is an interesting problem, which we leave as a future research direction.

Author Contributions

Conceptualization, P.S., Y.T. and M.C.; Simulation studies, P.S.; Formal analysis, P.S.; Funding acquisition, Y.T.; Investigation, P.S.; Methodology, P.S. and M.C.; Supervision, Y.T.; Software, P.S.; Visualization, P.S.; Validation, P.S. and M.C.; Writing—original draft, P.S.; Writing—review & editing, P.S. and M.C. All authors have read and agreed to the published version of the manuscript.

Funding

Tang’s research is supported by the Natural Science Foundation of China (No. 12271168) and the 111 Project of China (No. B14019). Cao’s research is supported by the Humanities and Social Sciences Foundation of Ministry of Education (No. 22YJC910001), the Natural Science Foundation of Anhui Province (No. 2108085MA09), the Foundation for Excellent Young Talents in College of Anhui Province (No. gxyqZD2021092) and the Program for Mathematical Statistics Research Team of Anhui Province (No. 2020jxtd102).

Data Availability Statement

Gene expression data used in this manuscript are available to download and explore at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11121, accessed on 26 June 2022. All raw data are available in the GEO database under accession number GSE11121.

Conflicts of Interest

On behalf of the authors, the corresponding author states that there is no conflict of interest.

Appendix A. Some Lemmas

As mentioned in Section 3, we can assume all the population means are 0. In this case, we only need to consider the respective first terms $A_i^* = \frac{1}{P_{n_i}^2} \sum_{j,l}^{*} (X_{ij}^T X_{il})^2$ and $C_{ij}^* = \frac{1}{n_i n_j} \sum_{l=1}^{n_i} \sum_{m=1}^{n_j} (X_{il}^T X_{jm})^2$ of $A_i$ and $C_{ij}$ in (5) and (6). Let $T_1 := \sum_{i=1}^{k} \frac{n_i(n-n_i)}{n} A_i^* - \sum_{i \ne j}^{k} \frac{n_i n_j}{n} C_{ij}^*$; then we only need to prove the asymptotic normality of $T_1$. To do this, define $Z_l = Z_{si}$, where $l = i + \sum_{j=1}^{s-1} n_j$ and $\sum_{j=1}^{0} n_j = 0$, for $i = 1, \ldots, n_s$ and $s = 1, \ldots, k$. A sequence of increasing $\sigma$-fields is defined by $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_j = \sigma\{Z_1, \ldots, Z_j\}$ for $j = 1, \ldots, n$. Let $E_j(\cdot)$ denote the conditional expectation with respect to $\mathcal{F}_j$ and set $D_j := (E_j - E_{j-1}) T_1$. It is easy to obtain $T_1 - E(T_1) = \sum_{j=1}^{n} D_j$.
Lemma A1.
For any n, { D j , F j } j = 1 n is a square integrable martingale difference sequence.
Proof. 
The conclusion is obvious. Hence, the proof is omitted. □
Therefore, to prove Theorem 1, we next apply the martingale central limit theorem.
Lemma A2.
Under conditions (C1) and (C2), as $n, p \to \infty$,
$\frac{\sum_{j=1}^{n} E(D_j^2 \mid \mathcal{F}_{j-1})}{\mathrm{Var}(T_1)} \xrightarrow{p} 1.$
Proof. 
By some calculations, we have
$\sum_{j=1}^{n} E(D_j^2 \mid \mathcal{F}_{j-1}) = \sum_{j=1}^{n} E_{j-1}\big(E_j T_1 - E_{j-1} T_1\big)^2 = \sum_{l=1}^{k} \frac{1}{n^2 n_l^2} \sum_{j=1}^{n_l} E_{\sum_{i=1}^{l-1} n_i + j - 1}\Big[\mathrm{tr}\big\{(X_{lj} X_{lj}^T - \Sigma_l) Q_{lj}\big\}\Big]^2 = \sum_{l=1}^{k} \frac{1}{n^2 n_l^2} \sum_{j=1}^{n_l} \Big[2\,\mathrm{tr}(Q_{lj}\Sigma_l)^2 + \Delta_l\,\mathrm{tr}\big(\Gamma_l^T Q_{lj} \Gamma_l \Gamma_l^T Q_{lj} \Gamma_l\big)\Big],$
where
$Q_{lj} = \frac{n - n_l}{n_l - 1} \sum_{i<j}^{n_l} (X_{li} X_{li}^T - \Sigma_l) + (n - n_l)\Sigma_l - \sum_{h \ne l}^{k} n_h \Sigma_h - n_l \sum_{h<l}^{k} \sum_{i=1}^{n_h} (X_{hi} X_{hi}^T - \Sigma_h) =: Q_{lj1} + Q_{lj2} + Q_{lj3} + Q_{lj4}.$
Note that E ( j = 1 n E ( D j 2 | F j 1 ) ) = Var ( T 1 ) . Next, we prove
$\frac{\mathrm{Var}\big\{\sum_{j=1}^{n} E(D_j^2 \mid \mathcal{F}_{j-1})\big\}}{\mathrm{Var}^2(T_1)} \to 0.$
Note that
$\mathrm{tr}(Q_{lj}\Sigma_l)^2 = \sum_{s=1}^{4} \mathrm{tr}(Q_{ljs}\Sigma_l)^2 + 2 \sum_{1 \le s < t \le 4} \mathrm{tr}\big(Q_{ljs}\Sigma_l Q_{ljt}\Sigma_l\big).$
It then follows that
$\mathrm{Var}\left\{\sum_{l=1}^{k} \frac{1}{n^2 n_l^2} \sum_{j=1}^{n_l} \mathrm{tr}(Q_{lj}\Sigma_l)^2\right\} \le 16 \sum_{s,t=1}^{4} R_{st},$
where $R_{st} = \mathrm{Var}\Big\{\sum_{l=1}^{k} \frac{1}{n^2 n_l^2} \sum_{j=1}^{n_l} \mathrm{tr}\big(E_{\sum_{i=1}^{l-1} n_i + j} Q_{ljs}\Sigma_l\, E_{\sum_{i=1}^{l-1} n_i + j} Q_{ljt}\Sigma_l\big)\Big\}$. In the following, we prove $R_{st} = o\{\mathrm{Var}^2(T_1)\}$ for $s, t = 1, 2, 3$ and 4.
$R_{11} = \mathrm{Var}\left\{\sum_{l=1}^{k} \frac{(n-n_l)^2}{n^2 (n_l-1)^2 n_l^2} \sum_{j_1, j_2 = 1}^{n_l} (n_l - j_1 \vee j_2)\,\mathrm{tr}\big[(X_{lj_1} X_{lj_1}^T - \Sigma_l)(X_{lj_2} X_{lj_2}^T - \Sigma_l)\big]\right\} \le \sum_{l=1}^{k} \frac{2}{(P_{n_l}^2)^2}\,\mathrm{tr}^4(\Sigma_l^2)\Big\{o(1) + O\big(n_l^{-1}\big)\Big\} = o\big\{\mathrm{Var}^2(T_1)\big\}.$
Similarly,
$R_{12} \le \sum_{l=1}^{k} \sum_{j=1}^{n_l} \frac{3(n-n_l)^2 (n_l - j)^2}{n^4 n_l^4 (n_l-1)^2}\,\mathrm{tr}\big(\Sigma_l Q_{lj2} \Sigma_l\big)^2 = \sum_{l=1}^{k} \frac{(n-n_l)^4 (2 n_l - 1)^2}{n^4 n_l^3 (n_l - 1)}\,\mathrm{tr}\big\{(\Sigma_l^4)^2\big\} \le \sum_{l=1}^{k} \frac{(n-n_l)^4 (2 n_l - 1)^2}{n^4 n_l^3 (n_l - 1)}\,\mathrm{tr}^2(\Sigma_l^4) = o\big\{\mathrm{Var}^2(T_1)\big\}.$
$R_{13} \le \sum_{l=1}^{k} \sum_{j=1}^{n_l} \frac{3(n-n_l)^2 (n_l - j)^2}{n^4 n_l^4 (n_l-1)^2}\,\mathrm{tr}\big(\Sigma_l Q_{lj3} \Sigma_l\big)^2 = \sum_{l=1}^{k} \frac{(n-n_l)^2 (2 n_l - 1)^2}{n^4 n_l (n_l - 1)}\,\mathrm{tr}\Big(\Sigma_l^3 \sum_{h \ne l}^{k} n_h \Sigma_h\Big)^2 \le \sum_{l=1}^{k} \frac{(n-n_l)^2 (2 n_l - 1)^2}{n^4 n_l (n_l - 1)}\,\mathrm{tr}(\Sigma_l^4)\,\mathrm{tr}\Big(\Sigma_l \sum_{h \ne l}^{k} n_h \Sigma_h\Big)^2 = o\big\{\mathrm{Var}^2(T_1)\big\}.$
We can obtain $R_{i4} = o\{\mathrm{Var}^2(T_1)\}$ for $i = 2, 3$ and 4 by a similar method. It is noted that $R_{22} = R_{23} = R_{33} = 0$ since $Q_{lj2}$ and $Q_{lj3}$ are nonrandom.
As a result, it follows from the above equalities that
$\mathrm{Var}\left\{\sum_{l=1}^{k} \frac{1}{n^2 n_l^2} \sum_{j=1}^{n_l} 2\,\mathrm{tr}(Q_{lj}\Sigma_l)^2\right\} = o\big\{\mathrm{Var}^2(T_1)\big\}.$
Finally, using similar calculations, we can obtain
$\mathrm{Var}\left\{\sum_{l=1}^{k} \frac{1}{n^2 n_l^2} \sum_{j=1}^{n_l} \mathrm{tr}\big(\Gamma_l^T Q_{lj} \Gamma_l \Gamma_l^T Q_{lj} \Gamma_l\big)\right\} = o\big\{\mathrm{Var}^2(T_1)\big\}.$
This completes the proof of Lemma A2. □
Lemma A3.
Under condition (C2), as $n, p \to \infty$,
$\sum_{j=1}^{n} E(D_j^4) = o\big\{\mathrm{Var}^2(T_1)\big\}.$
Proof. 
By some calculations, for some constants c 1 , c 2 , c 3 and c 4 , we can obtain
$\sum_{j=1}^{n} E(D_j^4) \le c_1 \sum_{l=1}^{k} \frac{1}{n_l^4}\,\mathrm{tr}^4(\Sigma_l^2) + c_2 \sum_{l \ne h}^{k} \frac{n_h}{n_l^2}\,\mathrm{tr}^4(\Sigma_l \Sigma_h)$
and
$\mathrm{Var}^2(T_1) \ge c_3 \sum_{l=1}^{k} \frac{n_l^2 (n-n_l)^4}{n^4 (n_l - 1)^2}\,\mathrm{tr}^4(\Sigma_l^2) + c_4 \sum_{l \ne h}^{k} \frac{n_l^2 n_h^2}{n^4}\,\mathrm{tr}^4(\Sigma_l \Sigma_h).$
As a result,
$\frac{\sum_{j=1}^{n} E(D_j^4)}{\mathrm{Var}^2(T_1)} \lesssim \sum_{l=1}^{k} \frac{1}{n_l} \to 0.$
This completes the proof of Lemma A3. □

Appendix B. Proof of Theorem 1

Proof. 
By Lemmas A1–A3 and the martingale central limit theorem, the conclusion of Theorem 1 follows as $n, p \to \infty$. □

References

  1. Anderson, T.W. An Introduction to Multivariate Statistical Analysis, 3rd ed.; Wiley: Hoboken, NJ, USA, 2003.
  2. Muirhead, R.J. Aspects of Multivariate Statistical Theory; Wiley: New York, NY, USA, 2005.
  3. Schmidt, M.; Böhm, D.; Törne, C.; Steiner, E.; Puhl, A.; Pilch, H.; Lehr, H.; Hengstler, J.; Kölbl, H.; Gehrmann, M. The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 2008, 68, 5405–5413.
  4. Khan, J.; Wei, J.S.; Ringner, M.; Saal, L.H.; Ladanyi, M.; Westermann, F.; Berthold, F.; Schwab, M.; Antonescu, C.R.; Peterson, C.; et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 2001, 7, 673–679.
  5. Zhu, Z.; Ong, Y.S.; Dash, M. Markov blanket-embedded genetic algorithm for gene selection. Pattern Recognit. 2007, 49, 3236–3248.
  6. Li, T.; Zhang, C.; Ogihara, M. A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 2004, 20, 2429–2437.
  7. Wilks, S.S. Sample criteria for testing equality of means, equality of variances, and equality of covariances in a normal multivariate distribution. Ann. Math. Stat. 1946, 17, 257–281.
  8. Ahmad, M.R. Testing homogeneity of several covariance matrices and multi-sample sphericity for high-dimensional data under non-normality. Commun. Stat. Theory Methods 2017, 46, 3738–3753.
  9. Schott, J.R. A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Comput. Stat. Data Anal. 2007, 51, 6535–6542.
  10. Srivastava, M.S.; Yanagihara, H. Testing the equality of several covariance matrices with fewer observations than the dimension. J. Multivar. Anal. 2010, 101, 1319–1329.
  11. Zhang, C.; Bai, Z.; Hu, J.; Wang, C. Multi-sample test for high-dimensional covariance matrices. Commun. Stat. Theory Methods 2018, 47, 3161–3177.
  12. Zhong, P.; Li, R.; Shanto, S. Homogeneity tests of covariance matrices with high-dimensional longitudinal data. Biometrika 2019, 106, 619–634.
  13. Zheng, S.; Lin, R.; Guo, J.; Yin, G. Testing homogeneity of high-dimensional covariance matrices. Stat. Sin. 2020, 30, 35–53.
  14. Qayed, A.; Han, D. Homogeneity test of several covariance matrices with high-dimensional data. J. Biopharm. Stat. 2021, 31, 523–540.
  15. Box, G.E.P. A general distribution theory for a class of likelihood criteria. Biometrika 1949, 36, 317–346.
  16. Li, J.; Chen, S. Two sample tests for high-dimensional covariance matrices. Ann. Stat. 2012, 40, 908–940.
  17. Aoshima, M.; Yata, K. Two-sample tests for high-dimension, strongly spiked eigenvalue models. Stat. Sin. 2018, 28, 43–62.
  18. Jiang, Y.; Wen, C.; Jiang, Y.; Wang, X.; Zhang, H. Use of random integration to test equality of high dimensional covariance matrices. Stat. Sin. 2020.
  19. Chen, S.; Qin, Y. A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat. 2010, 38, 808–835.
Figure 1. Empirical sizes of T_ZBHW (red), ρ_Lk(y1) (orange), ρ_Lk(y2) (yellow), ρ_Lk(y3) (green), ρ_Lk(y4) (cyan), T_ZLGY (blue), and T̂ (purple) under the normal distribution.
Figure 2. Empirical powers of T_ZBHW (red), ρ_Lk(y1) (orange), ρ_Lk(y2) (yellow), ρ_Lk(y3) (green), ρ_Lk(y4) (cyan), T_ZLGY (blue), and T̂ (purple) under the normal distribution.
Figure 3. Empirical sizes of T_ZBHW (red), ρ_Lk(y1) (orange), ρ_Lk(y2) (yellow), ρ_Lk(y3) (green), ρ_Lk(y4) (cyan), T_ZLGY (blue), and T̂ (purple) under the normal distribution.
Figure 4. Empirical powers of T_ZBHW (red), ρ_Lk(y1) (orange), ρ_Lk(y2) (yellow), ρ_Lk(y3) (green), ρ_Lk(y4) (cyan), T_ZLGY (blue), and T̂ (purple) under the normal distribution.
Figure 5. Empirical sizes of T_ZBHW (red), ρ_Lk(y1) (orange), ρ_Lk(y2) (yellow), ρ_Lk(y3) (green), ρ_Lk(y4) (cyan), T_ZLGY (blue), and T̂ (purple) under the Gamma distribution.
Figure 6. Empirical powers of T_ZBHW (red), ρ_Lk(y1) (orange), ρ_Lk(y2) (yellow), ρ_Lk(y3) (green), ρ_Lk(y4) (cyan), T_ZLGY (blue), and T̂ (purple) under the Gamma distribution.
Figure 7. Empirical sizes of T_ZBHW (red), ρ_Lk(y1) (orange), ρ_Lk(y2) (yellow), ρ_Lk(y3) (green), ρ_Lk(y4) (cyan), T_ZLGY (blue), and T̂ (purple) under the Gamma distribution.
Figure 8. Empirical powers of T_ZBHW (red), ρ_Lk(y1) (orange), ρ_Lk(y2) (yellow), ρ_Lk(y3) (green), ρ_Lk(y4) (cyan), T_ZLGY (blue), and T̂ (purple) under the Gamma distribution.
Table 1. Empirical sizes and powers of seven tests under Case 1 and normal distribution.

             p   T_ZBHW   ρ_Lk(y1)  ρ_Lk(y2)  ρ_Lk(y3)  ρ_Lk(y4)  T_ZLGY   T̂
Size, n = 45
            16   0.0476   0.0501    0.0484    0.0508    0.0500    0.0225   0.0509
            32   0.0358   0.0497    0.0507    0.0505    0.0498    0.0227   0.0461
            64   0.0407   0.0492    0.0471    0.0472    0.0534    0.0317   0.0475
           128   0.0422   0.0516    0.0539    0.0532    0.0524    0.0414   0.0448
           256   0.0526   0.0495    0.0489    0.0452    0.0525    0.0710   0.0514
Size, n = 95
            16   0.0633   0.0524    0.0535    0.0522    0.0512    0.0719   0.0601
            32   0.0584   0.0488    0.0537    0.0515    0.0479    0.1230   0.0576
            64   0.0595   0.0518    0.0499    0.0472    0.0534    0.2814   0.0582
           128   0.0628   0.0477    0.0556    0.0499    0.0445    0.5641   0.0594
           256   0.0710   0.0478    0.0477    0.0534    0.0474    0.7692   0.0669
Power, n = 45
            16   0.1291   0.2582    0.1991    0.1224    0.1710    0.3301   0.2837
            32   0.1273   0.2526    0.2160    0.1709    0.1691    0.4441   0.3582
            64   0.1398   0.2528    0.2337    0.2039    0.1710    0.5516   0.4377
           128   0.1552   0.2523    0.2462    0.2270    0.1671    0.6594   0.5222
           256   0.1676   0.2595    0.2490    0.2444    0.1665    0.7380   0.5847
Power, n = 95
            16   0.5490   0.5605    0.4198    0.2466    0.3561    0.8783   0.8870
            32   0.6810   0.5684    0.4816    0.3566    0.3653    0.9559   0.9563
            64   0.7771   0.5669    0.5128    0.4494    0.3603    0.9861   0.9863
           128   0.8476   0.5670    0.5465    0.5048    0.3521    0.9946   0.9924
           256   0.8974   0.5509    0.5560    0.5359    0.3469    0.9974   0.9966
Table 2. Empirical sizes and powers of seven tests under Case 1 and Gamma distribution.

             p   T_ZBHW   ρ_Lk(y1)  ρ_Lk(y2)  ρ_Lk(y3)  ρ_Lk(y4)  T_ZLGY   T̂
Size, n = 45
            16   0.0630   0.0550    0.0610    0.0631    0.0632    0.0329   0.0644
            32   0.0467   0.0523    0.0526    0.0557    0.0561    0.0278   0.0511
            64   0.0434   0.0573    0.0533    0.0494    0.0552    0.0332   0.0537
           128   0.0449   0.0484    0.0494    0.0545    0.0514    0.0461   0.0504
           256   0.0490   0.0498    0.0499    0.0493    0.0514    0.0707   0.0492
Size, n = 95
            16   0.0822   0.0560    0.0657    0.0699    0.0631    0.0851   0.0789
            32   0.0741   0.0531    0.0610    0.0629    0.0594    0.1224   0.0698
            64   0.0619   0.0544    0.0535    0.0545    0.0540    0.2505   0.0617
           128   0.0638   0.0524    0.0540    0.0517    0.0477    0.5037   0.0627
           256   0.0686   0.0508    0.0493    0.0498    0.0517    0.7041   0.0668
Power, n = 45
            16   0.1375   0.2602    0.2085    0.1437    0.1866    0.3225   0.2815
            32   0.1392   0.2613    0.2221    0.1760    0.1815    0.4421   0.3639
            64   0.1429   0.2587    0.2427    0.2094    0.1746    0.5561   0.4463
           128   0.1652   0.2513    0.2538    0.2334    0.1695    0.6536   0.5155
           256   0.1658   0.2565    0.2514    0.2401    0.1639    0.7279   0.5824
Power, n = 95
            16   0.5336   0.5610    0.4219    0.2685    0.3780    0.8629   0.8730
            32   0.6772   0.5664    0.4895    0.3683    0.3681    0.9523   0.9551
            64   0.7749   0.5695    0.5256    0.4493    0.3709    0.9834   0.9849
           128   0.8460   0.5628    0.5391    0.5005    0.3540    0.9929   0.9923
           256   0.8922   0.5608    0.5477    0.5260    0.3470    0.9966   0.9978
Table 3. Empirical sizes and powers of seven tests under Case 2 and normal distribution.

             p   T_ZBHW   ρ_Lk(y1)  ρ_Lk(y2)  ρ_Lk(y3)  ρ_Lk(y4)  T_ZLGY   T̂
Size, n = 45
            16   0.0559   0.0501    0.0485    0.0502    0.0495    0.0242   0.0539
            32   0.0562   0.0474    0.0510    0.0499    0.0460    0.0303   0.0545
            64   0.0553   0.0482    0.0483    0.0501    0.0486    0.0312   0.0528
           128   0.0658   0.0471    0.0472    0.0484    0.0502    0.0455   0.0580
           256   0.0783   0.0497    0.0500    0.0518    0.0509    0.0629   0.0576
Size, n = 95
            16   0.0724   0.0511    0.0499    0.0484    0.0494    0.0750   0.0656
            32   0.0675   0.0492    0.0497    0.0486    0.0491    0.1132   0.0624
            64   0.0747   0.0532    0.0505    0.0483    0.0498    0.2468   0.0641
           128   0.0821   0.0479    0.0465    0.0473    0.0468    0.5114   0.0680
           256   0.0790   0.0514    0.0488    0.0514    0.0492    0.7645   0.0625
Power, n = 45
            16   0.0838   0.0869    0.0804    0.0662    0.0781    0.1447   0.1170
            32   0.0862   0.0933    0.0858    0.0743    0.0788    0.1969   0.1471
            64   0.0883   0.0928    0.0923    0.0838    0.0771    0.2706   0.1844
           128   0.0924   0.0915    0.0933    0.0940    0.0765    0.3505   0.2235
           256   0.1035   0.0989    0.0937    0.0907    0.0779    0.4172   0.2631
Power, n = 95
            16   0.2053   0.1640    0.1405    0.1048    0.1238    0.4722   0.4314
            32   0.2877   0.1693    0.1482    0.1220    0.1225    0.6532   0.5974
            64   0.3641   0.1659    0.1569    0.1405    0.1203    0.7805   0.7090
           128   0.4380   0.1740    0.1632    0.1559    0.1143    0.8758   0.8046
           256   0.5136   0.1680    0.1651    0.1595    0.1129    0.9364   0.8659
Table 4. Empirical sizes and powers of seven tests under Case 2 and Gamma distribution.

             p   T_ZBHW   ρ_Lk(y1)  ρ_Lk(y2)  ρ_Lk(y3)  ρ_Lk(y4)  T_ZLGY   T̂
Size, n = 45
            16   0.0724   0.0605    0.0620    0.0632    0.0594    0.0330   0.0667
            32   0.0570   0.0553    0.0583    0.0562    0.0548    0.0284   0.0556
            64   0.0603   0.0529    0.0508    0.0553    0.0564    0.0349   0.0540
           128   0.0669   0.0467    0.0509    0.0480    0.0547    0.0404   0.0556
           256   0.0806   0.0522    0.0529    0.0489    0.0526    0.0646   0.0581
Size, n = 95
            16   0.0886   0.0556    0.0653    0.0660    0.0649    0.0865   0.0844
            32   0.0792   0.0539    0.0554    0.0580    0.0580    0.1146   0.0718
            64   0.0794   0.0557    0.0550    0.0545    0.0555    0.2220   0.0696
           128   0.0774   0.0549    0.0524    0.0570    0.0490    0.4400   0.0658
           256   0.0855   0.0523    0.0504    0.0486    0.0505    0.7047   0.0691
Power, n = 45
            16   0.0976   0.1041    0.0937    0.0834    0.0889    0.1430   0.1253
            32   0.0882   0.0934    0.0961    0.0870    0.0839    0.1936   0.1486
            64   0.0893   0.0931    0.0942    0.0890    0.0823    0.2746   0.1844
           128   0.0908   0.0972    0.0965    0.0922    0.0748    0.3395   0.2197
           256   0.0982   0.1029    0.0918    0.0913    0.0734    0.4195   0.2553
Power, n = 95
            16   0.2248   0.1807    0.1547    0.1205    0.1464    0.4621   0.4391
            32   0.2886   0.1757    0.1641    0.1403    0.1267    0.6325   0.5913
            64   0.3671   0.1699    0.1681    0.1560    0.1253    0.7592   0.7119
           128   0.4392   0.1675    0.1715    0.1605    0.1184    0.8598   0.8064
           256   0.5194   0.1641    0.1682    0.1593    0.1129    0.9217   0.8760
Table 5. Empirical sizes and powers of seven tests under Case 3 and normal distribution.

             p   T_ZBHW   ρ_Lk(y1)  ρ_Lk(y2)  ρ_Lk(y3)  ρ_Lk(y4)  T_ZLGY   T̂
Size, n = 45
            16   0.0530   0.0517    0.0478    0.0495    0.0484    0.1897   0.0545
            32   0.0491   0.0539    0.0513    0.0514    0.0503    0.2663   0.0517
            64   0.0463   0.0521    0.0498    0.0487    0.0491    0.3956   0.0513
           128   0.0490   0.0481    0.0503    0.0496    0.0475    0.5715   0.0508
           256   0.0492   0.0512    0.0491    0.0478    0.0501    0.7879   0.0511
Size, n = 95
            16   0.0455   0.0489    0.0516    0.0493    0.0496    0.2732   0.0520
            32   0.0407   0.0484    0.0508    0.0493    0.0493    0.3462   0.0535
            64   0.0357   0.0471    0.0501    0.0516    0.0496    0.4335   0.0483
           128   0.0386   0.0483    0.0496    0.0465    0.0483    0.5736   0.0476
           256   0.0381   0.0497    0.0501    0.0532    0.0494    0.7541   0.0506
Power, n = 45
            16   0.9951   0.9363    0.9047    0.8594    0.8190    0.9981   0.9986
            32   0.9988   0.9446    0.9331    0.9079    0.8428    0.9993   0.9997
            64   0.9996   0.9415    0.9403    0.9284    0.8500    0.9997   0.9999
           128   0.9998   0.9433    0.9404    0.9386    0.8547    0.9994   1.0000
           256   0.9999   0.9504    0.9458    0.9476    0.8608    0.9983   1.0000
Power, n = 95
            16   1.0000   0.9999    0.9996    0.9974    0.9952    1.0000   1.0000
            32   1.0000   1.0000    1.0000    0.9999    0.9972    1.0000   1.0000
            64   1.0000   1.0000    0.9998    0.9998    0.9980    1.0000   1.0000
           128   1.0000   1.0000    1.0000    0.9999    0.9983    1.0000   1.0000
           256   1.0000   1.0000    0.9999    1.0000    0.9986    1.0000   1.0000
Table 6. Empirical sizes and powers of seven tests under Case 3 and Gamma distribution.

             p   T_ZBHW   ρ_Lk(y1)  ρ_Lk(y2)  ρ_Lk(y3)  ρ_Lk(y4)  T_ZLGY   T̂
Size, n = 45
            16   0.0743   0.0540    0.0587    0.0633    0.0637    0.1885   0.0740
            32   0.0599   0.0553    0.0544    0.0530    0.0551    0.2490   0.0622
            64   0.0532   0.0476    0.0512    0.0536    0.0557    0.3514   0.0544
           128   0.0506   0.0508    0.0494    0.0511    0.0511    0.5232   0.0503
           256   0.0493   0.0471    0.0533    0.0516    0.0487    0.7273   0.0510
Size, n = 95
            16   0.0619   0.0526    0.0614    0.0679    0.0635    0.2469   0.0709
            32   0.0441   0.0513    0.0565    0.0562    0.0501    0.3059   0.0553
            64   0.0430   0.0526    0.0537    0.0524    0.0558    0.3869   0.0521
           128   0.0368   0.0534    0.0500    0.0506    0.0511    0.5153   0.0478
           256   0.0365   0.0494    0.0463    0.0489    0.0523    0.7029   0.0445
Power, n = 45
            16   0.9924   0.9316    0.9007    0.8519    0.8251    0.9968   0.9972
            32   0.9978   0.9404    0.9286    0.9059    0.8451    0.9984   0.9994
            64   0.9996   0.9411    0.9365    0.9273    0.8532    0.9994   1.0000
           128   0.9999   0.9449    0.9444    0.9360    0.8564    0.9993   1.0000
           256   1.0000   0.9421    0.9459    0.9432    0.8559    0.9986   1.0000
Power, n = 95
            16   0.9998   0.9999    0.9995    0.9979    0.9937    1.0000   1.0000
            32   1.0000   0.9997    0.9996    0.9995    0.9974    1.0000   1.0000
            64   1.0000   0.9999    0.9999    0.9998    0.9979    1.0000   1.0000
           128   1.0000   0.9999    0.9999    0.9998    0.9979    1.0000   1.0000
           256   1.0000   0.9999    1.0000    1.0000    0.9984    1.0000   1.0000
Table 7. Empirical sizes and powers of seven tests under Case 4 and normal distribution.

             p   T_ZBHW   ρ_Lk(y1)  ρ_Lk(y2)  ρ_Lk(y3)  ρ_Lk(y4)  T_ZLGY   T̂
Size, n = 45
            16   0.0517   0.0490    0.0514    0.0484    0.0522    0.1209   0.0548
            32   0.0481   0.0496    0.0506    0.0507    0.0541    0.2649   0.0507
            64   0.0487   0.0480    0.0495    0.0493    0.0487    0.3975   0.0512
           128   0.0459   0.0496    0.0459    0.0515    0.0510    0.5640   0.0511
           256   0.0455   0.0504    0.0483    0.0469    0.0520    0.7811   0.0493
Size, n = 95
            16   0.0450   0.0470    0.0522    0.0489    0.0550    0.2487   0.0513
            32   0.0388   0.0504    0.0511    0.0514    0.0502    0.3486   0.0484
            64   0.0381   0.0504    0.0503    0.0488    0.0502    0.4379   0.0480
           128   0.0329   0.0530    0.0511    0.0487    0.0523    0.5704   0.0441
           256   0.0348   0.0498    0.0508    0.0514    0.0462    0.7527   0.0438
Power, n = 45
            16   0.5566   0.0872    0.0922    0.0930    0.0933    0.5331   0.5965
            32   0.5749   0.0710    0.0698    0.0724    0.0749    0.6018   0.6180
            64   0.5935   0.0596    0.0591    0.0601    0.0633    0.6495   0.6432
           128   0.6095   0.0547    0.0507    0.0522    0.0532    0.7108   0.6546
           256   0.6142   0.0516    0.0527    0.0508    0.0552    0.8269   0.6615
Power, n = 95
            16   0.7418   0.1311    0.1361    0.1359    0.1338    0.8735   0.9232
            32   0.7846   0.0923    0.0916    0.0966    0.0953    0.9024   0.9548
            64   0.8089   0.0680    0.0738    0.0735    0.0679    0.9151   0.9753
           128   0.8184   0.0650    0.0607    0.0615    0.0626    0.9166   0.9824
           256   0.8258   0.0560    0.0548    0.0533    0.0549    0.9161   0.9849
Table 8. Empirical sizes and powers of seven tests under Case 4 and Gamma distribution.

             p   T_ZBHW   ρ_Lk(y1)  ρ_Lk(y2)  ρ_Lk(y3)  ρ_Lk(y4)  T_ZLGY   T̂
Size, n = 45
            16   0.0776   0.0559    0.0589    0.0632    0.0641    0.1338   0.0773
            32   0.0634   0.0526    0.0534    0.0584    0.0567    0.2403   0.0642
            64   0.0532   0.0488    0.0527    0.0527    0.0540    0.3579   0.0546
           128   0.0520   0.0533    0.0500    0.0517    0.0480    0.5195   0.0534
           256   0.0483   0.0495    0.0506    0.0482    0.0512    0.7316   0.0530
Size, n = 95
            16   0.0674   0.0586    0.0605    0.0605    0.0668    0.2348   0.0729
            32   0.0482   0.0528    0.0530    0.0586    0.0514    0.2945   0.0613
            64   0.0425   0.0511    0.0491    0.0551    0.0525    0.3839   0.0533
           128   0.0420   0.0523    0.0522    0.0491    0.0499    0.5164   0.0520
           256   0.0388   0.0456    0.0488    0.0519    0.0511    0.6998   0.0490
Power, n = 45
            16   0.5435   0.0993    0.1039    0.1028    0.1026    0.5191   0.5735
            32   0.5685   0.0697    0.0741    0.0787    0.0756    0.5717   0.6106
            64   0.5901   0.0606    0.0601    0.0615    0.0602    0.6153   0.6368
           128   0.6124   0.0551    0.0552    0.0592    0.0551    0.6810   0.6606
           256   0.6133   0.0511    0.0512    0.0554    0.0510    0.7685   0.6645
Power, n = 95
            16   0.7283   0.1401    0.1468    0.1506    0.1438    0.8526   0.9070
            32   0.7695   0.0973    0.0997    0.1000    0.1019    0.8832   0.9513
            64   0.8081   0.0790    0.0745    0.0738    0.0749    0.9100   0.9746
           128   0.8158   0.0577    0.0627    0.0621    0.0611    0.8980   0.9835
           256   0.8231   0.0570    0.0545    0.0601    0.0550    0.8966   0.9843
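Each "Size" entry in the tables above is a Monte Carlo rejection rate computed under a true null hypothesis, and each "Power" entry is the rejection rate under an alternative. As a minimal illustrative sketch of that bookkeeping (using a hypothetical two-sample permutation test based on the plain Frobenius distance between sample covariance matrices, not the paper's weighted-Frobenius statistic or any of the compared tests), one might write:

```python
import numpy as np

rng = np.random.default_rng(2022)

def frob_stat(X, Y):
    # Squared Frobenius distance between the two sample covariance matrices.
    return float(np.sum((np.cov(X, rowvar=False) - np.cov(Y, rowvar=False)) ** 2))

def perm_pvalue(X, Y, n_perm=100):
    # Permutation p-value for H0: equal covariance matrices (illustrative only).
    obs = frob_stat(X, Y)
    Z = np.vstack([X, Y])
    n1 = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if frob_stat(Z[idx[:n1]], Z[idx[n1:]]) >= obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

def empirical_size(n=30, p=8, reps=200, alpha=0.05):
    # Fraction of Monte Carlo replications that reject a true H0.
    rejections = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))  # both samples have covariance I_p, so H0 holds
        Y = rng.standard_normal((n, p))
        if perm_pvalue(X, Y) <= alpha:
            rejections += 1
    return rejections / reps

print(empirical_size())  # should land near the nominal level 0.05
```

Empirical power is estimated the same way, except that the second sample is drawn with a covariance matrix that differs from the first, as in the Cases above.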
Sun, P.; Tang, Y.; Cao, M. Homogeneity Test of Multi-Sample Covariance Matrices in High Dimensions. Mathematics 2022, 10, 4339. https://doi.org/10.3390/math10224339