Article

High-Dimensional Consistencies of KOO Methods for the Selection of Variables in Multivariate Linear Regression Models with Covariance Structures

by Yasunori Fujikoshi 1,* and Tetsuro Sakurai 2

1 Department of Mathematics, Graduate School of Science, Hiroshima University, 1-3-2 Kagamiyama, Hiroshima 739-8626, Japan
2 School of General and Management Studies, Suwa University of Science, 5000-1 Toyohira, Chino 391-0292, Japan
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(3), 671; https://doi.org/10.3390/math11030671
Submission received: 18 November 2022 / Revised: 28 December 2022 / Accepted: 17 January 2023 / Published: 28 January 2023
(This article belongs to the Special Issue Limit Theorems of Probability Theory)

Abstract: In this paper, we consider the high-dimensional consistencies of KOO methods for selecting explanatory variables in multivariate linear regression with covariance structures. The covariance structures considered are (1) an independent covariance structure with the same variance, (2) an independent covariance structure with different variances, and (3) a uniform covariance structure. For each structure, a sufficient condition for model selection consistency of the KOO method is obtained under a high-dimensional asymptotic framework in which the sample size n, the number p of response variables, and the number k of explanatory variables are all large, with $p/n \to c_1 \in (0,1)$ and $k/n \to c_2 \in [0,1)$, where $c_1 + c_2 < 1$.

1. Introduction

We focus on a multivariate linear regression model relating p response variables $y_1, \ldots, y_p$ to a subset of k candidate explanatory variables $x_1, \ldots, x_k$. Suppose that there are n observations on the p-dimensional response vector $y = (y_1, \ldots, y_p)'$ and the k-dimensional explanatory vector $x = (x_1, \ldots, x_k)'$, and let $Y : n \times p$ and $X : n \times k$ be the corresponding observation matrices. The multivariate linear regression model including all the explanatory variables is, under normality, written as
$$Y \sim N_{n \times p}(X\Theta, \Sigma \otimes I_n), \tag{1}$$
where Θ is a $k \times p$ unknown matrix of regression coefficients, and Σ is a $p \times p$ unknown covariance matrix that is positive definite. Here, $N_{n \times p}(\cdot, \cdot)$ denotes the matrix normal distribution, such that the mean of Y is $X\Theta$ and the covariance matrix of $\mathrm{vec}(Y)$ is $\Sigma \otimes I_n$; equivalently, the rows of Y are independent normal vectors with common covariance matrix Σ. Here, $\mathrm{vec}(Y)$ is the $np \times 1$ vector obtained by stacking the columns of Y on top of one another. We assume that $\mathrm{rank}(X) = k$.
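For concreteness, the following minimal sketch (ours, not from the paper; Python with numpy, all sizes and parameter values arbitrary illustrations) generates one draw from Model (1): the rows of Y are independent $N_p$ vectors with common covariance Σ.

import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 5, 3
X = rng.standard_normal((n, k))          # n x k design matrix
Theta = rng.standard_normal((k, p))      # k x p regression coefficients
Sigma = 2.0 * np.eye(p)                  # ICSS example: sigma_v^2 I_p
L = np.linalg.cholesky(Sigma)
E = rng.standard_normal((n, p)) @ L.T    # rows ~ N_p(0, Sigma)
Y = X @ Theta + E                        # Y ~ N_{n x p}(X Theta, Sigma (x) I_n)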
In multivariate linear regression, the selection of variables is an important concern. One common approach is to specify candidate variable selection models and then apply a model selection criterion such as AIC or BIC. Such a criterion for the full model (1) is expressed as
$$\mathrm{GIC} = -2\log L(\hat{\Xi}) + dg, \tag{2}$$
where $L(\hat{\Xi})$ is the maximized likelihood, $\Xi = (\Theta, \Sigma)$, $d > 0$ is a penalty constant, and $g = kp + \frac{1}{2}p(p+1)$ is the number of unknown parameters. For AIC and BIC, d is defined as 2 and $\log n$, respectively. In the selection of the k variables $x_1, \ldots, x_k$, we identify $\{x_1, \ldots, x_k\}$ with the index set $\omega = \{1, \ldots, k\}$ and denote the GIC for a subset $j \subseteq \omega$ by $\mathrm{GIC}_j$. Then, the model selection based on GIC chooses the following model:
$$\tilde{j} = \arg\min_{j} \mathrm{GIC}_j. \tag{3}$$
Here, the minimum is usually taken over all $2^k - 1$ nonempty subsets of ω. This creates a computational problem for the methods based on GIC, including the AIC and BIC methods, since $2^k - 1$ statistics must be computed to select among k explanatory variables. To avoid this computational problem, [1] proposed a method that is essentially due to [2]. The method, named the knock-one-out (KOO) method by [3], decides "selection" or "no selection" for each variable by comparing the full model with the model obtained by removing that variable. More precisely, the KOO method chooses the model, or the set of variables, given by
$$\hat{j} = \{ j \in \omega \mid \mathrm{GIC}_{\omega_{-j}} > \mathrm{GIC}_{\omega} \}, \tag{4}$$
where $\omega_{-j}$ is a short expression for $\omega \setminus \{j\}$, the set obtained by removing element j from ω. In general, the KOO idea can be applied not only to AIC but to any variable selection criterion or method; a schematic implementation is sketched below.
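The following sketch (ours, with a hypothetical callable gic(subset) evaluating the criterion for a given index set) contrasts exhaustive GIC minimization over all $2^k - 1$ nonempty subsets with the KOO rule (4), which needs only $k + 1$ criterion evaluations.

import itertools

def select_exhaustive(gic, k):
    # argmin of gic over all 2^k - 1 nonempty subsets of {0, ..., k-1}
    subsets = (set(s) for r in range(1, k + 1)
               for s in itertools.combinations(range(k), r))
    return min(subsets, key=gic)

def select_koo(gic, k):
    # KOO rule (4): keep variable j iff removing it from the full model raises GIC
    omega = set(range(k))
    gic_full = gic(omega)
    return {j for j in omega if gic(omega - {j}) > gic_full}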
In the literature on multivariate linear regression, numerous papers have dealt with the variable selection problem as it relates to selecting explanatory variables. When Σ is an unknown positive definite matrix, [4,5,6], for example, showed that in a high-dimensional setting, AIC and $C_p$ have consistency properties, but BIC is not necessarily consistent. KOO methods in the multivariate regression model were studied by [3,7,8]; for the KOO method in discriminant analysis, see [9,10]. For a review, see [11].
In this paper, we assume that the covariance structure is one of the following three covariance structures: (1) an independent covariance structure with the same variance, (2) an independent covariance structure with different variances, and (3) a uniform covariance structure. The numbers of unknown parameters in covariance structures (1)–(3) are 1, p, and 2, respectively. Sufficient conditions for the KOO method given by (4) to be consistent are derived under a high-dimensional asymptotic framework in which the sample size n, the number p of response variables, and the number k of explanatory variables are all large, with $p/n \to c_1 \in (0,1)$ and $k/n \to c_2 \in [0,1)$, where $c_1 + c_2 < 1$. Ref. [12] considered similar problems under covariance structures (1) and (3), as well as under an autoregressive covariance structure, but did not consider them under (2). Moreover, in their study of asymptotic consistency, k was assumed to be fixed, whereas in this paper k may tend to infinity, such that $k/n \to c_2 \in [0,1)$. From the numerical experiments in [12], the probabilities of choosing the true model by the KOO method in Cases (1), an independent covariance structure with the same variance, and (3), a uniform covariance structure, are as shown in Table 1.
In Table 1, k is the number of true (nonzero) explanatory variables, and the true parameter values are omitted.
The present paper is organized as follows. In Section 2, we present notation and preliminaries. In Section 3, we state the KOO methods under Covariance Structures (1)–(3) in terms of key statistics, and we describe an approach to their consistency. In Section 4, Section 5 and Section 6, we prove consistency properties of the KOO methods under Covariance Structures (1)–(3), respectively. In Section 7, our conclusions are discussed.

2. Notations and Preliminaries

Suppose that j denotes a subset of $\omega = \{1, \ldots, k\}$ containing $k_j$ elements, and let $X_j$ denote the $n \times k_j$ matrix comprising the columns of X indexed by the elements of j; thus, $X_\omega = X$. Further, we assume that the covariance matrix Σ has a covariance structure $\Sigma_c$. Then, we have a generic candidate model:
$$M_{c,j}: \ Y \sim N_{n \times p}(X_j \Theta_j, \Sigma_{c,j} \otimes I_n), \tag{5}$$
where $\Theta_j$ is a $k_j \times p$ unknown matrix of regression coefficients.
When $\Sigma_{c,j}$ is a general $p \times p$ unknown covariance matrix, we can write the GIC in (2) as
$$\mathrm{GIC}_{c,j} = n \log|\hat{\Sigma}_j| + np(\log 2\pi + 1) + d\left\{ k_j p + \tfrac{1}{2}p(p+1) \right\}, \tag{6}$$
where $n\hat{\Sigma}_j = Y'(I_n - P_j)Y$ and $P_j = X_j(X_j'X_j)^{-1}X_j'$. When $j = \omega$, model $M_{c,\omega}$ is called the full model; $\hat{\Sigma}_\omega$ and $P_\omega$ are defined from $\hat{\Sigma}_j$ and $P_j$ by setting $j = \omega$, with $k_\omega = k$ and $X_\omega = X$.
In this paper, we consider the cases in which the covariance matrix $\Sigma_c$ belongs to each of the following three structures:
(1) Independent covariance structure with the same variance (ICSS):
$$\Sigma_v = \sigma_v^2 I_p;$$
(2) Independent covariance structure with different variances (ICSD):
$$\Sigma_b = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2);$$
(3) Uniform covariance structure (UCS):
$$\Sigma_u = \sigma_u^2\left(\rho_u^{1-\delta_{ij}}\right)_{1 \le i,j \le p}.$$
The models considered in this paper can be expressed as in (5) with $\Sigma_{v,j}$, $\Sigma_{b,j}$, and $\Sigma_{u,j}$ in place of $\Sigma_{c,j}$. Let $f(Y; \Theta_j, \Sigma_{c,j})$ be the density of Y in (5) with $\Sigma = \Sigma_{c,j}$. In the derivation of the GIC under the covariance structure $\Sigma = \Sigma_{c,j}$, we use the following equality:
$$-2\log \max_{\Theta_j, \Sigma_{c,j}} f(Y; \Theta_j, \Sigma_{c,j}) = np\log(2\pi) + \min_{\Sigma_{c,j}}\left\{ n\log|\Sigma_{c,j}| + \mathrm{tr}\,\Sigma_{c,j}^{-1} Y'(I_n - P_j)Y \right\}. \tag{7}$$
Let $\hat{\Sigma}_{c,j}$ be the quantity minimizing the right-hand side of (7). Then, in each of our models, it satisfies $\mathrm{tr}\,\hat{\Sigma}_{c,j}^{-1} Y'(I_n - P_j)Y = np$, and we obtain
$$\mathrm{GIC}_{c,j} = -2\log f(Y; \hat{\Theta}_j, \hat{\Sigma}_{c,j}) + d\,m_{c,j} = n\log|\hat{\Sigma}_{c,j}| + np(\log 2\pi + 1) + d\,m_{c,j}, \tag{8}$$
where $m_{c,j}$ is the number of independent unknown parameters under $M_{c,j}$, and d is a positive constant that may depend on n. For AIC and BIC, d is defined by 2 ([13]) and $\log n$ ([14]), respectively.
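As an illustration, the following sketch (ours; all names illustrative) evaluates (8) for the three structures, given the residual matrix $W_j = Y'(I_n - P_j)Y$; the closed-form maximum likelihood estimators used below are derived in Sections 4–6, with $m_{c,j} = k_j p + 1$, $k_j p + p$, and $k_j p + 2$ for ICSS, ICSD, and UCS, respectively.

import numpy as np

def gic(W, n, p, kj, d, structure):
    # W: residual matrix Y'(I - P_j)Y; returns GIC_{c,j} in (8)
    if structure == "ICSS":                      # Sigma = sigma^2 I_p
        logdet = p * np.log(np.trace(W) / (n * p))
        m = kj * p + 1
    elif structure == "ICSD":                    # Sigma = diag(sigma_1^2, ..., sigma_p^2)
        logdet = np.sum(np.log(np.diag(W) / n))
        m = kj * p + p
    else:                                        # "UCS": Sigma = alpha(I - G/p) + beta G/p
        one = np.ones(p)
        beta_hat = one @ W @ one / (n * p)
        alpha_hat = (np.trace(W) - one @ W @ one / p) / (n * (p - 1))
        logdet = (p - 1) * np.log(alpha_hat) + np.log(beta_hat)
        m = kj * p + 2
    return n * logdet + n * p * (np.log(2 * np.pi) + 1) + d * m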

3. Approach to Consistencies of KOO Methods

Our KOO method is based on the statistics
$$T_{c,j;d} = \mathrm{GIC}_{c,\omega_{-j}} - \mathrm{GIC}_{c,\omega}, \qquad j \in \omega. \tag{9}$$
In fact, the KOO method chooses the following model:
$$\hat{j}_{c;d} = \left\{ j \in \omega \mid T_{c,j;d} > 0 \right\}.$$
Its consistency can be proven by showing the following two properties:
$$Q_1: \ [F1] \equiv \sum_{j \in j_*} \Pr(T_{c,j;d} \le 0) \to 0, \qquad Q_2: \ [F2] \equiv \sum_{j \notin j_*} \Pr(T_{c,j;d} \ge 0) \to 0,$$
as in [11]. The result can be shown by using the following inequality:
$$\Pr(\hat{j}_{c;d} = j_*) = \Pr\left( \bigcap_{j \in j_*} \{T_{c,j;d} > 0\} \cap \bigcap_{j \notin j_*} \{T_{c,j;d} < 0\} \right) = 1 - \Pr\left( \bigcup_{j \in j_*} \{T_{c,j;d} \le 0\} \cup \bigcup_{j \notin j_*} \{T_{c,j;d} \ge 0\} \right) \ge 1 - \sum_{j \in j_*} \Pr(T_{c,j;d} \le 0) - \sum_{j \notin j_*} \Pr(T_{c,j;d} \ge 0).$$
Here, [F1] bounds the probability that true variables are not selected, and [F2] bounds the probability that nontrue variables are selected. Analogous notation is used for the other variable selection methods. A variable $x_j$ is included in the true set of variables $j_*$ if the jth row $\theta_j$ of Θ is nonzero.
Here, we list some of our main assumptions:
A1: The set $j_*$ of the true explanatory variables is included in the full set, i.e., $j_* \subseteq \omega$, and the set $j_*$ is finite.
A2: The high-dimensional asymptotic framework:
$$p, n, k \to \infty, \quad p/n \to c_1 \in (0,1), \quad k/n \to c_2 \in [0,1), \ \text{where } 0 < c_1 + c_2 < 1.$$
A general model selection method $\hat{j}_{c;d}$ is high-dimensionally consistent if
$$\lim \Pr(\hat{j}_{c;d} = j_*) = 1$$
under the high-dimensional asymptotic framework. Here, "lim" means the limit under A2.

4. Asymptotic Consistency under an Independent Covariance Structure

In this section, we show the asymptotic consistency of the KOO method based on a general information criterion under an independent covariance structure with the same variance. A generic candidate model, when the set of explanatory variables is j, can be expressed as
$$M_{v,j}: \ Y \sim N_{n \times p}(X_j \Theta_j, \Sigma_{v,j} \otimes I_n), \tag{13}$$
where $\Sigma_{v,j} = \sigma_{v,j}^2 I_p$ and $\sigma_{v,j}^2 > 0$. Let us denote the density of Y under (13) by $f(Y; \Theta_j, \sigma_{v,j}^2)$. Then, we have
$$-2\log f(Y; \Theta_j, \sigma_{v,j}^2) = np\log(2\pi) + np\log \sigma_{v,j}^2 + \frac{1}{\sigma_{v,j}^2}\,\mathrm{tr}\,(Y - X_j\Theta_j)'(Y - X_j\Theta_j).$$
Therefore, the maximum likelihood estimators of $\Theta_j$ and $\sigma_{v,j}^2$ under $M_{v,j}$ are given by
$$\hat{\Theta}_j = (X_j'X_j)^{-1}X_j'Y, \qquad \hat{\sigma}_{v,j}^2 = \frac{1}{np}\,\mathrm{tr}\,Y'(I_n - P_j)Y.$$
The general information criterion (8) is given by
$$\mathrm{GIC}_{v,j} = np\log \hat{\sigma}_{v,j}^2 + np(\log 2\pi + 1) + d\,m_{v,j}, \tag{15}$$
where d is a positive constant and $m_{v,j} = k_j p + 1$.
Using (9) and (15), we have
$$T_{v,j;d} \equiv \mathrm{GIC}_{v,\omega_{-j}} - \mathrm{GIC}_{v,\omega} = np\log\left(1 + U_{2j}U_1^{-1}\right) - dp, \tag{16}$$
where, writing $y_{(\ell)}$ for the ℓth column of Y,
$$U_1 = \mathrm{tr}\,Y'(I_n - P_\omega)Y = \sum_{\ell=1}^{p} y_{(\ell)}'(I_n - P_\omega)y_{(\ell)}, \qquad U_{2j} = \mathrm{tr}\,Y'(P_\omega - P_{\omega_{-j}})Y = \sum_{\ell=1}^{p} y_{(\ell)}'(P_\omega - P_{\omega_{-j}})y_{(\ell)}.$$
Suitably scaled, $U_1$ and $U_{2j}$ are independently distributed as a central and a noncentral chi-squared variate, respectively. More precisely, assume that
$$E(Y) = X_{j_*}\Theta_{j_*},$$
and let $\sigma_{v,*}^2 = \sigma_{v,j_*}^2$. Then, using basic distributional properties of quadratic forms of normal variates and Wishart matrices (see [15]), we have the following results:
$$(1)\ U_1/\sigma_{v,*}^2 \sim \chi^2_{(n-k)p}; \qquad (2)\ U_{2j}/\sigma_{v,*}^2 \sim \chi^2_{p}(\delta_{v,j}^2); \qquad (3)\ U_1 \text{ and } U_{2j} \text{ are independent}, \tag{18}$$
where the noncentrality parameter $\delta_{v,j}^2$ is defined by
$$\delta_{v,j}^2 = \frac{1}{\sigma_{v,*}^2}\,\mathrm{tr}\,(X_{j_*}\Theta_{j_*})'(P_\omega - P_{\omega_{-j}})(X_{j_*}\Theta_{j_*}).$$
If $j \notin j_*$, then $\delta_{v,j}^2 = 0$; if $j \in j_*$, then, in general, $\delta_{v,j}^2 \neq 0$. A small simulation check of (18) is sketched below.
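The following check (ours; sizes arbitrary, and Θ = 0 so that every j is a non-true variable with $\delta_{v,j}^2 = 0$) compares simulated means of $U_1/\sigma_{v,*}^2$ and $U_{2j}/\sigma_{v,*}^2$ with the chi-squared degrees of freedom in (18).

import numpy as np

rng = np.random.default_rng(2)
n, p, k, sigma2 = 60, 4, 5, 1.5
X = rng.standard_normal((n, k))
Q, _ = np.linalg.qr(X)                            # P_omega = Q Q'
Qj, _ = np.linalg.qr(np.delete(X, 0, axis=1))     # drop x_1 (non-true here)
u1, u2 = [], []
for _ in range(2000):
    Y = np.sqrt(sigma2) * rng.standard_normal((n, p))   # Theta = 0
    U1 = np.sum((Y - Q @ (Q.T @ Y))**2)           # tr Y'(I - P_omega)Y
    U2j = np.sum((Y - Qj @ (Qj.T @ Y))**2) - U1   # tr Y'(P_omega - P_omega_-1)Y
    u1.append(U1 / sigma2)
    u2.append(U2j / sigma2)
print(np.mean(u1), (n - k) * p)                   # both approximately (n-k)p = 220
print(np.mean(u2), p)                             # both approximately p = 4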
For a sufficient condition for the consistency of the KOO method based on $\mathrm{GIC}_{v,j}$, we assume the following:
$$\text{A3v: For any } j \in j_*, \ \delta_{v,j}^2 = O(np), \ \text{and} \ \lim_{p/n \to c_1} \frac{1}{np}\,\delta_{v,j}^2 = \eta_{v,j}^2 > 0. \tag{19}$$
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $\mathrm{GIC}_{v,j}$ in (15), whose selection rule is given by $\hat{j}_{v;d} = \{ j \mid T_{v,j;d} > 0 \}$. When $j \notin j_*$, from (16), we can write
$$T_{v,j;d} = np\log\left(1 + \chi_p^2/\chi_m^2\right) - dp, \qquad m = (n-k)p.$$
Therefore, we have
$$[F2] = \sum_{j \notin j_*} \Pr\left( np\log(1 + \chi_p^2/\chi_m^2) \ge dp \right) = (k - k_{j_*})\Pr(U \ge h) \le (k - k_{j_*})\Pr(U \ge h_0),$$
where
$$U = \frac{\chi_p^2}{\chi_m^2} - \frac{p}{m-2}, \qquad h = e^{d/n} - 1 - \frac{p}{m-2}, \qquad h_0 = \frac{d}{n} - \frac{p}{m-2}.$$
Note that $h_0 < h$. Then, under the assumption $h_0 > 0$, Chebyshev's inequality gives
$$[F2] \le (k - k_{j_*})\,h^{-2}E[U^2] \le (k - k_{j_*})\,h_0^{-2}E[U^2]. \tag{22}$$
Related to the assumption $h_0 > 0$, we assume the following:
$$\text{A4v: } d > \frac{np}{m-2}\ \left(\to \frac{1}{1-c_2}\right), \ \text{and} \ d = O(n^a), \ 0 < a < 1.$$
The first part of A4v implies $h_0 > 0$. It is easy to see that
$$E[U^2] = \frac{2p(m+p-2)}{(m-2)^2(m-4)} = O\left((n^2 p)^{-1}\right),$$
where the first equality requires $m > 4$. Further, $h_0^{-2} = O(n^{2(1-a)})$. Therefore, from (22), we have that [F2] → 0. A numeric check of the formula for $E[U^2]$ is given below.
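The following comparison (ours; the values of p and m are arbitrary) checks the closed form of $E[U^2]$ against a Monte Carlo estimate.

import numpy as np

rng = np.random.default_rng(3)
p, m = 20, 400
U = rng.chisquare(p, 200000) / rng.chisquare(m, 200000) - p / (m - 2)
print(np.mean(U**2))                                  # Monte Carlo estimate of E[U^2]
print(2 * p * (m + p - 2) / ((m - 2)**2 * (m - 4)))   # closed form; the two agree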
When $j \in j_*$, we can write $T_{v,j;d} = np\log\left(1 + \chi_p^2(\delta_{v,j}^2)/\chi_m^2\right) - dp$. Therefore, we can express [F1] as
$$[F1] = \sum_{j \in j_*} \Pr(\tilde{T}_{v,j;d} \le 0),$$
where
$$\tilde{T}_{v,j;d} = \frac{1}{np}\,T_{v,j;d} = \log\left(1 + \frac{\chi_p^2(\delta_{v,j}^2)}{\chi_m^2}\right) - \frac{d}{n}.$$
Since $\chi_p^2(\delta_{v,j}^2)/\chi_m^2 \to \eta_{v,j}^2/(1-c_2)$ and $d/n \to 0$, Assumptions A3v and A4v easily show that, in probability,
$$\tilde{T}_{v,j;d} \to \log\left(1 + \frac{\eta_{v,j}^2}{1-c_2}\right) > 0.$$
This implies that $\Pr(\tilde{T}_{v,j;d} \le 0) \to 0$ for each of the finitely many $j \in j_*$.
These results imply the following theorem.
Theorem 1.
Suppose that Assumptions A1, A2, A3v, and A4v are satisfied. Then, the KOO method based on the general information criterion $\mathrm{GIC}_{v,j}$ defined by (15) is asymptotically consistent.
An alternative approach for "[F1] → 0" is as follows. When $j \in j_*$, we can write
$$T_{v,j;d} = np\log\left(1 + \chi_p^2(\delta_{v,j}^2)/\chi_m^2\right) - dp.$$
Therefore, we have
$$[F1] = \sum_{j \in j_*} \Pr\left( np\log(1 + \chi_p^2(\delta_{v,j}^2)/\chi_m^2) \le dp \right) = \sum_{j \in j_*} \Pr(\tilde{U}_j \le \tilde{h}_j),$$
where, for $j \in j_*$,
$$\tilde{U}_j = \frac{\chi_p^2(\delta_{v,j}^2)}{\chi_m^2} - \frac{p + \delta_{v,j}^2}{m-2}, \qquad \tilde{h}_j = e^{d/n} - 1 - \frac{p + \delta_{v,j}^2}{m-2} = h - \frac{\delta_{v,j}^2}{m-2}.$$
Then, under $d = O(n^a)$ $(0 < a < 1)$, A3v in (19), and the assumption $\tilde{h}_j < 0$ (or, equivalently, $h < \delta_{v,j}^2/(m-2)$), we have
$$[F1] \le k_{j_*} \max_{j \in j_*} |\tilde{h}_j|^{-2} E[\tilde{U}_j^2].$$
It is easily seen that, for $m > 4$,
$$E[\tilde{U}_j^2] = \frac{2\left\{ (p + \delta_{v,j}^2)^2 + (m-2)(p + 2\delta_{v,j}^2) \right\}}{(m-2)^2(m-4)} = O\left((np)^{-1}\right),$$
and, under $d = O(n^a)$ $(0 < a < 1)$ and A3v,
$$|\tilde{h}_j| \to \frac{\eta_{v,j}^2}{1-c_2} > 0.$$
These imply that [F1] → 0. Note that this approach assumed $\tilde{h}_j < 0$ (or, equivalently, $h < \delta_{v,j}^2/(m-2)$).
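To illustrate Theorem 1, the following Monte Carlo sketch (ours; the sizes, the true set, and the choice of d are arbitrary illustrations satisfying A4v) estimates $\Pr(\hat{j}_{v;d} = j_*)$ under ICSS using $T_{v,j;d}$ in (16).

import numpy as np

def koo_prob(n, p, k, j_star, d, reps=200, seed=1):
    # Estimate Pr(j_hat = j_*) for the ICSS KOO method.
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, k))
        Theta = np.zeros((k, p))
        Theta[sorted(j_star)] = 1.0               # rows in j_* are nonzero
        Y = X @ Theta + rng.standard_normal((n, p))
        Q, _ = np.linalg.qr(X)
        U1 = np.sum((Y - Q @ (Q.T @ Y))**2)       # tr Y'(I - P_omega)Y
        chosen = set()
        for j in range(k):
            Qj, _ = np.linalg.qr(np.delete(X, j, axis=1))
            U2j = np.sum((Y - Qj @ (Qj.T @ Y))**2) - U1
            if n * p * np.log1p(U2j / U1) > d * p:    # T_{v,j;d} > 0, cf. (16)
                chosen.add(j)
        hits += (chosen == set(j_star))
    return hits / reps

# Example: koo_prob(100, 40, 10, {0, 1, 2}, d=10.0) is close to 1.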

5. Asymptotic Consistency under an Independent Covariance Structure with Different Variances

In this section, we assume that the covariance matrix Σ is an independent covariance matrix with different variances, i.e., $\Sigma = \Sigma_b = \mathrm{diag}(\sigma_{b1}^2, \ldots, \sigma_{bp}^2)$. First, let us derive the key statistic $T_{b,j;d} = \mathrm{GIC}_{b,\omega_{-j}} - \mathrm{GIC}_{b,\omega}$. Consider the full candidate model with $E(Y) = X\Theta$,
$$M_{b,\omega}: \ Y \sim N_{n \times p}(X\Theta, \Sigma_b \otimes I_n).$$
Let the density under the full model be expressed as $f(Y; \Theta, \Sigma_b)$. Then, writing $y_{(\ell)}$ and $\theta_{(\ell)}$ for the ℓth columns of Y and Θ, we have
$$-2\log f(Y; \Theta, \Sigma_b) = np\log(2\pi) + \sum_{\ell=1}^{p}\left\{ n\log \sigma_{b\ell}^2 + \frac{1}{\sigma_{b\ell}^2}\,(y_{(\ell)} - X\theta_{(\ell)})'(y_{(\ell)} - X\theta_{(\ell)}) \right\}.$$
It holds that
$$-2\log \max_{\Theta, \Sigma_b} f(Y; \Theta, \Sigma_b) = np(\log 2\pi + 1) + \sum_{\ell=1}^{p} n\log\left\{ \frac{1}{n}\,y_{(\ell)}'(I_n - P_\omega)y_{(\ell)} \right\}. \tag{25}$$
Next, consider the model obtained by removing the jth explanatory variable from the full model $M_{b,\omega}$, which is denoted by $M_{b,\omega_{-j}}$. Similarly,
$$-2\log \max_{M_{b,\omega_{-j}}} f(Y; \Theta, \Sigma_b) = np(\log 2\pi + 1) + \sum_{\ell=1}^{p} n\log\left\{ \frac{1}{n}\,y_{(\ell)}'(I_n - P_{\omega_{-j}})y_{(\ell)} \right\}. \tag{26}$$
Using (25) and (26), we can obtain the general information criterion (8) for the two models $M_{b,\omega}$ and $M_{b,\omega_{-j}}$, and we have
$$T_{b,j;d} \equiv \mathrm{GIC}_{b,\omega_{-j}} - \mathrm{GIC}_{b,\omega} = \sum_{\ell=1}^{p} n\log\left(1 + U_{2\ell}U_{1\ell}^{-1}\right) - dp, \tag{27}$$
where
$$U_{1\ell} = y_{(\ell)}'(I_n - P_\omega)y_{(\ell)}, \qquad U_{2\ell} = y_{(\ell)}'(P_\omega - P_{\omega_{-j}})y_{(\ell)}, \qquad \ell = 1, \ldots, p.$$
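A sketch (ours) of the statistic (27): compared with the ICSS case, the pooled trace is replaced by columnwise residual sums of squares.

import numpy as np

def T_b(Y, X, j, d):
    # T_{b,j;d} in (27) for the ICSD structure
    n, p = Y.shape
    Q, _ = np.linalg.qr(X)
    Qj, _ = np.linalg.qr(np.delete(X, j, axis=1))
    U1 = np.sum((Y - Q @ (Q.T @ Y))**2, axis=0)          # U_{1l}, l = 1, ..., p
    U2 = np.sum((Y - Qj @ (Qj.T @ Y))**2, axis=0) - U1   # U_{2l}
    return n * np.sum(np.log1p(U2 / U1)) - d * p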
Let us assume that
$$E(Y) = X_{j_*}\Theta_{j_*},$$
and let $\sigma_{b,*\ell}^2$, $\ell = 1, \ldots, p$, denote the true variances. Then, as in (18), we have the following results:
$$(1)\ U_{1\ell}/\sigma_{b,*\ell}^2 \sim \chi^2_{n-k}, \ \ell = 1, \ldots, p; \qquad (2)\ U_{2\ell}/\sigma_{b,*\ell}^2 \sim \chi^2_{1}(\delta_{b,j;\ell}^2), \ \ell = 1, \ldots, p; \qquad (3)\ U_{1\ell}, U_{2\ell}, \ \ell = 1, \ldots, p, \ \text{are independent},$$
where the noncentrality parameters $\delta_{b,j;\ell}^2$ are defined by
$$\delta_{b,j;\ell}^2 = \frac{1}{\sigma_{b,*\ell}^2}\,(X_{j_*}\theta_{*(\ell)})'(P_\omega - P_{\omega_{-j}})(X_{j_*}\theta_{*(\ell)}),$$
with $\Theta_* = (\theta_{*(1)}, \ldots, \theta_{*(p)})$. If $j \notin j_*$, then $\delta_{b,j;\ell}^2 = 0$; if $j \in j_*$, then, in general, $\delta_{b,j;\ell}^2 \neq 0$. For a sufficient condition for the consistency of the KOO method based on $\mathrm{GIC}_{b,j}$, we assume the following:
$$\text{A3b: For any } j \in j_*, \ \lim (n-k)^{-1}\delta_{b,j;\ell}^2 = \eta_{b,j;\ell}^2 > 0, \ \text{and} \ \lim \frac{1}{p}\sum_{\ell=1}^{p} \log\left(1 + \frac{1}{n-k}\,\delta_{b,j;\ell}^2\right) = \eta_{b,j}^2 > 0.$$
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $T_{b,j;d}$ in (27), whose selection rule is given by $\hat{j}_{b;d} = \{ j \mid T_{b,j;d} > 0 \}$. When $j \notin j_*$, we have
$$[F2] = \sum_{j \notin j_*} \Pr\left( \sum_{\ell=1}^{p} n\log(1 + U_{2\ell}U_{1\ell}^{-1}) \ge dp \right) \le \sum_{j \notin j_*} \sum_{\ell=1}^{p} \Pr\left( n\log(1 + U_{2\ell}U_{1\ell}^{-1}) \ge d \right).$$
This implies that
$$[F2] \le p(k - k_{j_*})\Pr\left( n\log(1 + \chi_1^2/\chi_{n-k}^2) \ge d \right) = p(k - k_{j_*})\Pr(V \ge r),$$
where
$$V = \frac{\chi_1^2}{\chi_{n-k}^2} - \frac{1}{n-k-2}, \qquad r = e^{d/n} - 1 - \frac{1}{n-k-2}, \qquad r_0 = \frac{d}{n} - \frac{1}{n-k-2}. \tag{32}$$
Note that $r_0 < r$. Then, under the assumption $r_0 > 0$, we have
$$[F2] \le p(k - k_{j_*})\,r^{-2}E[V^2] \le p(k - k_{j_*})\,r_0^{-2}E[V^2]. \tag{33}$$
Related to the assumption $r_0 > 0$, we assume the following:
$$\text{A4b: } d > \frac{n}{n-k-2}\ \left(\to \frac{1}{1-c_2}\right), \ \text{and} \ d = O(n^a), \ 0 < a < 1.$$
The first part of A4b implies $r_0 > 0$. It is easy to see that, for $n - k > 4$,
$$E[V^2] = \frac{2(n-k-1)}{(n-k-2)^2(n-k-4)} = O(n^{-2}).$$
Further, $r_0^{-2} = O(n^{2(1-a)})$. Since $p(k - k_{j_*}) = O(n^2)$, the second-moment bound (33) alone does not suffice for small a; however, applying the same argument with a higher even moment, $\Pr(V \ge r_0) \le r_0^{-2s}E[V^{2s}]$ with $E[V^{2s}] = O(n^{-2s})$ for fixed $s > 1/a$, we have that [F2] → 0.
When $j \in j_*$, we can write $T_{b,j;d} = \sum_{\ell=1}^{p} n\log\{1 + U_{2\ell}U_{1\ell}^{-1}\} - dp$. Therefore, we can express [F1] as follows:
$$[F1] = \sum_{j \in j_*} \Pr(\tilde{T}_{b,j;d} \le 0),$$
where
$$\tilde{T}_{b,j;d} = \frac{1}{np}\,T_{b,j;d} = \frac{1}{p}\sum_{\ell=1}^{p} \log\left(1 + \frac{\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)}{\chi_{n-k;\ell}^2}\right) - \frac{d}{n}.$$
Assumptions A3b and A4b easily show that, in probability,
$$\tilde{T}_{b,j;d} \to \eta_{b,j}^2 > 0.$$
This implies that $\Pr(\tilde{T}_{b,j;d} \le 0) \to 0$.
These results imply the following theorem.
Theorem 2.
Suppose that Assumptions A1, A2, A3b, and A4b are satisfied. Then, the KOO method based on $T_{b,j;d}$ in (27) is asymptotically consistent.
Let us consider an alternative approach for "[F1] → 0", as in the case of the independent covariance structure with the same variance. When $j \in j_*$, we can write
$$[F1] = \sum_{j \in j_*} \Pr\left( \sum_{\ell=1}^{p}\left\{ n\log\left(1 + \frac{\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)}{\chi_{n-k;\ell}^2}\right) - d \right\} \le 0 \right) \le \sum_{j \in j_*} \sum_{\ell=1}^{p} \Pr\left( n\log\left(1 + \frac{\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)}{\chi_{n-k;\ell}^2}\right) - d \le 0 \right) = \sum_{j \in j_*} \sum_{\ell=1}^{p} \Pr\left( \tilde{V}_{j,\ell} \le \tilde{r}_{j,\ell} \right).$$
Here, for $j \in j_*$,
$$\tilde{V}_{j,\ell} = \frac{\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)}{\chi_{n-k;\ell}^2} - \frac{1 + \delta_{b,j;\ell}^2}{n-k-2}, \qquad \tilde{r}_{j,\ell} = e^{d/n} - 1 - \frac{1 + \delta_{b,j;\ell}^2}{n-k-2} = r - \frac{\delta_{b,j;\ell}^2}{n-k-2}, \qquad \ell = 1, \ldots, p,$$
where r is the same quantity as in (32). Note that $\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)$, $\ell = 1, \ldots, p$, are independently distributed as noncentral chi-squared variates $\chi_1^2(\delta_{b,j;\ell}^2)$. Then, under the assumption $\tilde{r}_{j,\ell} < 0$ (or, equivalently, $r < \delta_{b,j;\ell}^2/(n-k-2)$), we have
$$[F1] \le k_{j_*} \max_{j \in j_*} \sum_{\ell=1}^{p} |\tilde{r}_{j,\ell}|^{-2s} E[\tilde{V}_{j,\ell}^{2s}], \qquad s = 1, 2, \ldots. \tag{35}$$
In the above upper bound, it holds that
$$|\tilde{r}_{j,\ell}| \approx \frac{\delta_{b,j;\ell}^2}{n-k} \to \eta_{b,j;\ell}^2.$$
Useful bounds are obtained from the first few moments of $\tilde{V}_{j,\ell}$. For example, for $n - k > 8$,
$$E[\tilde{V}_{j,\ell}^2] = \frac{2\left\{ (1 + \delta_{b,j;\ell}^2)^2 + (n-k-2)(1 + 2\delta_{b,j;\ell}^2) \right\}}{(n-k-2)^2(n-k-4)} = O(n^{-1}), \qquad E[\tilde{V}_{j,\ell}^4] = O(n^{-2}).$$
Then, Bound (35) with s = 2 can be asymptotically expressed as follows:
$$k_{j_*} \sum_{\ell=1}^{p} \eta_{b,j;\ell}^{-4} E[\tilde{V}_{j,\ell}^4] = k_{j_*}\, p\left( \frac{1}{p}\sum_{\ell=1}^{p} \eta_{b,j;\ell}^{-4} \right) \times O(n^{-2}).$$
The above expression is $O(n^{-1})$ under the assumption that $\frac{1}{p}\sum_{\ell=1}^{p} \eta_{b,j;\ell}^{-4}$ converges to a finite limit.

6. Asymptotic Consistency under a Uniform Covariance Structure

In this section, we show the asymptotic consistency of the KOO method based on a general information criterion under a uniform covariance structure. First, following [12], we derive $\mathrm{GIC}_{u,j}$ as in (6) and a key statistic $T_{u,j;d}$ as in (9). A uniform covariance structure is given by
$$\Sigma_u = \sigma_u^2\left(\rho_u^{1-\delta_{ij}}\right) = \sigma_u^2\left\{ (1-\rho_u)I_p + \rho_u 1_p 1_p' \right\},$$
with the Kronecker delta $\delta_{ij}$. The covariance structure is also expressed as
$$\Sigma_u = \alpha\left(I_p - \frac{1}{p}G_p\right) + \beta\,\frac{1}{p}G_p,$$
where
$$\alpha = \sigma_u^2(1-\rho_u), \qquad \beta = \sigma_u^2\{1 + (p-1)\rho_u\}, \qquad G_p = 1_p 1_p',$$
and $1_p = (1, \ldots, 1)'$. The matrices $I_p - p^{-1}G_p$ and $p^{-1}G_p$ are orthogonal idempotent matrices, so we have
$$|\Sigma_u| = \beta\,\alpha^{p-1}, \qquad \Sigma_u^{-1} = \frac{1}{\alpha}\left(I_p - \frac{1}{p}G_p\right) + \frac{1}{\beta}\cdot\frac{1}{p}G_p.$$
These identities can be verified numerically, as sketched below.
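A quick numeric check (ours; the values of p, $\sigma_u^2$, and $\rho_u$ are arbitrary) of the two identities above:

import numpy as np

p, sigma2, rho = 6, 2.0, 0.3
alpha, beta = sigma2 * (1 - rho), sigma2 * (1 + (p - 1) * rho)
G = np.ones((p, p))                                  # G_p = 1_p 1_p'
Sigma = alpha * (np.eye(p) - G / p) + beta * G / p   # elementwise sigma2 * rho^{1-delta_ij}
print(np.linalg.det(Sigma), beta * alpha**(p - 1))   # the two values agree
Inv = (np.eye(p) - G / p) / alpha + (G / p) / beta
print(np.allclose(Inv, np.linalg.inv(Sigma)))        # True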
Now, we consider the multivariate regression model $M_{u,j}$ given by
$$M_{u,j}: \ Y \sim N_{n \times p}(X_j \Theta_j, \Sigma_{u,j} \otimes I_n), \tag{38}$$
where $\Sigma_{u,j} = \alpha_j(I_p - p^{-1}G_p) + \beta_j\,p^{-1}G_p$. Let $H = (h_1, H_2)$ be an orthogonal matrix with $h_1 = p^{-1/2}1_p$, and let
$$W_j = Y'(I_n - P_j)Y \quad \text{and} \quad U_j = H'W_jH.$$
Here, $h_1$ is a characteristic vector of $\Sigma_{u,j}$ corresponding to the characteristic root $\beta_j$, and each column vector of $H_2$ is a characteristic vector of $\Sigma_{u,j}$ corresponding to the root $\alpha_j$. Let the density function of Y under $M_{u,j}$ be denoted by $f(Y; \Theta_j, \alpha_j, \beta_j)$. Then, we have
$$g(\alpha_j, \beta_j) \equiv -2\log \max_{\Theta_j} f(Y; \Theta_j, \alpha_j, \beta_j) = np\log(2\pi) + n(p-1)\log \alpha_j + n\log \beta_j + \mathrm{tr}\,\Psi_j^{-1}U_j,$$
where $\Psi_j = \mathrm{diag}(\beta_j, \alpha_j, \ldots, \alpha_j)$. Therefore, the maximum likelihood estimators of $\alpha_j$ and $\beta_j$ under $M_{u,j}$ are given by
$$\hat{\alpha}_j = \frac{1}{n(p-1)}\,\mathrm{tr}\,H_2'Y'(I_n - P_j)YH_2, \qquad \hat{\beta}_j = \frac{1}{n}\,h_1'Y'(I_n - P_j)Yh_1.$$
The number of independent parameters under $M_{u,j}$ is $m_{u,j} = k_j p + 2$. Noting that $\Psi_j$ is diagonal, we can obtain the general information criterion (GIC) in (8) for Y in (38) as follows:
$$\mathrm{GIC}_{u,j} = n(p-1)\log \hat{\alpha}_j + n\log \hat{\beta}_j + np(\log 2\pi + 1) + d(k_j p + 2).$$
Therefore, we have
$$T_{u,j;d} \equiv \mathrm{GIC}_{u,\omega_{-j}} - \mathrm{GIC}_{u,\omega} = n(p-1)\log\left(\hat{\alpha}_{\omega_{-j}}\hat{\alpha}_\omega^{-1}\right) + n\log\left(\hat{\beta}_{\omega_{-j}}\hat{\beta}_\omega^{-1}\right) - dp = Z_{1j} + Z_{2j}. \tag{40}$$
Here, $Z_{1j}$ and $Z_{2j}$ are defined as follows:
$$Z_{1j} = n(p-1)\log\left(1 + V_{2j}^{(1)}(V_1^{(1)})^{-1}\right) - d(p-1), \qquad Z_{2j} = n\log\left(1 + V_{2j}^{(2)}(V_1^{(2)})^{-1}\right) - d,$$
using the following $V_1^{(i)}, V_{2j}^{(i)}$, $i = 1, 2$:
$$V_1^{(1)} = \mathrm{tr}\,H_2'Y'(I_n - P_\omega)YH_2, \qquad V_{2j}^{(1)} = \mathrm{tr}\,H_2'Y'(P_\omega - P_{\omega_{-j}})YH_2,$$
$$V_1^{(2)} = h_1'Y'(I_n - P_\omega)Yh_1, \qquad V_{2j}^{(2)} = h_1'Y'(P_\omega - P_{\omega_{-j}})Yh_1.$$
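A sketch (ours) of computing $T_{u,j;d}$ in (40) from these quantities, using $\mathrm{tr}\,H_2'WH_2 = \mathrm{tr}\,W - h_1'Wh_1$ so that $H_2$ never needs to be formed explicitly:

import numpy as np

def T_u(Y, X, j, d):
    # T_{u,j;d} in (40) for the UCS structure
    n, p = Y.shape
    h1 = np.ones(p) / np.sqrt(p)
    Q, _ = np.linalg.qr(X)
    Qj, _ = np.linalg.qr(np.delete(X, j, axis=1))
    Rw = Y - Q @ (Q.T @ Y)                           # (I - P_omega)Y
    Rj = Y - Qj @ (Qj.T @ Y)                         # (I - P_omega_-j)Y
    W, Wj = Rw.T @ Rw, Rj.T @ Rj
    a_full, b_full = np.trace(W) - h1 @ W @ h1, h1 @ W @ h1      # V_1^(1), V_1^(2)
    a_drop, b_drop = np.trace(Wj) - h1 @ Wj @ h1, h1 @ Wj @ h1
    Z1 = n * (p - 1) * np.log(a_drop / a_full) - d * (p - 1)     # Z_{1j}
    Z2 = n * np.log(b_drop / b_full) - d                         # Z_{2j}
    return Z1 + Z2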
Related to the distributional reductions of $Z_{1j}, Z_{2j}$, $j = 1, \ldots, k$, we use the following lemma frequently.
Lemma 1.
Let W have a noncentral Wishart distribution $W_p(m, \Sigma; \Omega)$. Let the covariance matrix Σ be decomposed into characteristic roots and vectors as follows:
$$\Sigma = H\Lambda H' = (H_1, \ldots, H_h)\,\mathrm{diag}(\lambda_1 I_{q_1}, \ldots, \lambda_h I_{q_h})\,(H_1, \ldots, H_h)',$$
where $\lambda_1 > \cdots > \lambda_h > 0$ and H is an orthogonal matrix. Then, $\lambda_i^{-1}\,\mathrm{tr}\,H_i'WH_i$, $i = 1, \ldots, h$, are independently distributed as noncentral chi-squared variates with $mq_i$ degrees of freedom and noncentrality parameters $\delta_i^2 = \lambda_i^{-1}\,\mathrm{tr}\,H_i'\Omega H_i$.
Proof.
The result may be proven by considering the characteristic function of $(\lambda_1^{-1}\,\mathrm{tr}\,H_1'WH_1, \ldots, \lambda_h^{-1}\,\mathrm{tr}\,H_h'WH_h)$, which is expressed as follows (see Theorem 2.1.2 in [15]):
$$E\left[ e^{\,i t_1 \lambda_1^{-1}\mathrm{tr}\,H_1'WH_1 + \cdots + i t_h \lambda_h^{-1}\mathrm{tr}\,H_h'WH_h} \right] = E\left[ \mathrm{etr}(KW) \right] = |I_p - 2\Sigma K|^{-m/2}\,\mathrm{etr}\left( \Omega K(I_p - 2\Sigma K)^{-1} \right),$$
where $K = i t_1 \lambda_1^{-1} H_1H_1' + \cdots + i t_h \lambda_h^{-1} H_hH_h'$. The result can be easily obtained by checking that the above last expression equals
$$\prod_{i=1}^{h} (1 - 2it_i)^{-mq_i/2} \exp\left( \frac{it_i}{1 - 2it_i}\,\delta_i^2 \right). \qquad \square$$
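A simulation check (ours; central case, Ω = 0, with the uniform structure used below) of Lemma 1: $\mathrm{tr}\,H_2'WH_2/\alpha$ and $h_1'Wh_1/\beta$ should behave as independent $\chi^2_{m(p-1)}$ and $\chi^2_m$ variates.

import numpy as np

rng = np.random.default_rng(4)
p, m, sigma2, rho = 5, 30, 1.0, 0.4
alpha, beta = sigma2 * (1 - rho), sigma2 * (1 + (p - 1) * rho)
G = np.ones((p, p))
L = np.linalg.cholesky(alpha * (np.eye(p) - G / p) + beta * G / p)
h1 = np.ones(p) / np.sqrt(p)
t1, t2 = [], []
for _ in range(5000):
    Z = rng.standard_normal((m, p)) @ L.T
    W = Z.T @ Z                                      # W ~ W_p(m, Sigma_u)
    t1.append((np.trace(W) - h1 @ W @ h1) / alpha)   # tr H_2'WH_2 / alpha
    t2.append(h1 @ W @ h1 / beta)                    # h_1'Wh_1 / beta
print(np.mean(t1), m * (p - 1))                      # both approximately m(p-1)
print(np.mean(t2), m)                                # both approximately m
print(np.corrcoef(t1, t2)[0, 1])                     # approximately 0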
Assume that the true model is expressed as
$$M_{u,j_*}: \ Y \sim N_{n \times p}(X_{j_*}\Theta_{j_*}, \Sigma_{u,*} \otimes I_n), \tag{42}$$
where $\Sigma_{u,*} = \alpha_*(I_p - p^{-1}G_p) + \beta_*\,p^{-1}G_p$. Using Lemma 1, we have the following lemma.
Lemma 2.
Under the true model (42), it holds that:
(1) $V_1^{(1)}/\alpha_*$ and $V_{2j}^{(1)}/\alpha_*$ are independently distributed as a central chi-squared variate $\chi^2_{(p-1)(n-k)}$ and a noncentral chi-squared variate $\chi^2_{p-1}(\delta_{1j}^2)$, respectively.
(2) $V_1^{(2)}/\beta_*$ and $V_{2j}^{(2)}/\beta_*$ are independently distributed as a central chi-squared variate $\chi^2_{n-k}$ and a noncentral chi-squared variate $\chi^2_{1}(\delta_{2j}^2)$, respectively.
(3) The noncentrality parameters $\delta_{1j}^2$ and $\delta_{2j}^2$ are defined as follows:
$$\delta_{1j}^2 = \frac{1}{\alpha_*}\,\mathrm{tr}\,H_2'(X_{j_*}\Theta_{j_*})'(P_\omega - P_{\omega_{-j}})(X_{j_*}\Theta_{j_*})H_2, \qquad \delta_{2j}^2 = \frac{1}{\beta_*}\,h_1'(X_{j_*}\Theta_{j_*})'(P_\omega - P_{\omega_{-j}})(X_{j_*}\Theta_{j_*})h_1.$$
Here, if $j \notin j_*$, then $\delta_{1j}^2 = 0$ and $\delta_{2j}^2 = 0$.
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $T_{u,j;d}$ in (40), whose selection rule is given by $\hat{j}_{u;d} = \{ j \mid T_{u,j;d} > 0 \}$. For a sufficient condition for the consistency of $\hat{j}_{u;d}$, we assume the following:
$$\text{A3u: For any } j \in j_*, \ \delta_{1j}^2 = O(np), \ \delta_{2j}^2 = O(n), \ \text{and} \ \lim \frac{1}{np}\,\delta_{1j}^2 = \eta_{1j}^2 > 0, \quad \lim \frac{1}{n}\,\delta_{2j}^2 = \eta_{2j}^2 > 0.$$
When $j \notin j_*$, we have
$$[F2] = \sum_{j \notin j_*} \Pr(Z_{1j} + Z_{2j} \ge 0) \le \sum_{j \notin j_*} \left\{ \Pr(Z_{1j} \ge 0) + \Pr(Z_{2j} \ge 0) \right\} \le (k - k_{j_*})\left\{ \Pr(Z^{(1)} \ge s_0^{(1)}) + \Pr(Z^{(2)} \ge s_0^{(2)}) \right\}.$$
Here,
$$Z^{(1)} = \frac{\chi_{p-1}^2}{\chi_{(p-1)(n-k)}^2} - \frac{p-1}{(p-1)(n-k)-2}, \qquad s^{(1)} = e^{d/n} - 1 - \frac{p-1}{(p-1)(n-k)-2}, \qquad s_0^{(1)} = \frac{d}{n} - \frac{p-1}{(p-1)(n-k)-2},$$
$$Z^{(2)} = \frac{\chi_1^2}{\chi_{n-k}^2} - \frac{1}{n-k-2}, \qquad s^{(2)} = e^{d/n} - 1 - \frac{1}{n-k-2}, \qquad s_0^{(2)} = \frac{d}{n} - \frac{1}{n-k-2}.$$
Note that $s_0^{(1)} < s^{(1)}$ and $s_0^{(2)} < s^{(2)}$. Then, under the assumption that $s_0^{(1)} > 0$ and $s_0^{(2)} > 0$, we have
$$[F2] \le (k - k_{j_*})\left\{ (s_0^{(1)})^{-2} E[(Z^{(1)})^2] + (s_0^{(2)})^{-2} E[(Z^{(2)})^2] \right\}. \tag{44}$$
Related to the assumptions $s_0^{(1)} > 0$ and $s_0^{(2)} > 0$, we assume the following:
$$\text{A4u: } d > \frac{n(p-1)}{(p-1)(n-k)-2}\ \left(\to \frac{1}{1-c_2}\right), \quad d > \frac{n}{n-k-2}\ \left(\to \frac{1}{1-c_2}\right), \ \text{and} \ d = O(n^a), \ 0 < a < 1.$$
The first part of A4u implies $s_0^{(1)} > 0$ and $s_0^{(2)} > 0$. It is easy to see that
$$E[(Z^{(1)})^2] = \frac{2(p-1)\left\{ (p-1)(n-k+1) - 2 \right\}}{\{(p-1)(n-k)-2\}^2\{(p-1)(n-k)-4\}} = O(n^{-3}), \qquad E[(Z^{(2)})^2] = \frac{2(n-k-1)}{(n-k-2)^2(n-k-4)} = O(n^{-2}).$$
Further, $(s_0^{(1)})^{-2} = O(n^{2(1-a)})$ and $(s_0^{(2)})^{-2} = O(n^{2(1-a)})$. Therefore, from (44), the $Z^{(1)}$ term tends to zero, and the $Z^{(2)}$ term is $O(n^{1-2a})$, which tends to zero for $a > 1/2$; for smaller a, the $Z^{(2)}$ term can be handled with a higher even moment, as in Section 5, and we have [F2] → 0.
When $j \in j_*$, the chi-squared variates appearing in $Z_{1j}$ and $Z_{2j}$ are noncentral, with noncentrality parameters $\delta_{1j}^2$ and $\delta_{2j}^2$ as in Lemma 2. Therefore, we can express [F1] as follows:
$$[F1] = \sum_{j \in j_*} \Pr(\tilde{T}_{u,j;d} \le 0),$$
where
$$\tilde{T}_{u,j;d} = \frac{1}{np}\,T_{u,j;d} = \frac{p-1}{p}\log\left(1 + \frac{\chi_{p-1}^2(\delta_{1j}^2)}{\chi_{(p-1)(n-k)}^2}\right) + \frac{1}{p}\log\left(1 + \frac{\chi_{1}^2(\delta_{2j}^2)}{\chi_{n-k}^2}\right) - \frac{d}{n}.$$
Assumptions A3u and A4u show that, in probability,
$$\tilde{T}_{u,j;d} \to \log\left(1 + \frac{\eta_{1j}^2}{1-c_2}\right) > 0.$$
This implies that $\Pr(\tilde{T}_{u,j;d} \le 0) \to 0$, and [F1] → 0.
These results imply the following theorem.
Theorem 3.
Suppose that Assumptions A1, A2, A3u, and A4u are satisfied. Then, the KOO method based on $T_{u,j;d}$ in (40) is asymptotically consistent.

7. Concluding Remarks

In this paper, we considered the selection of regression variables in a p-variate linear regression model with one of three covariance structures: (1) ICSS (an independent covariance structure with the same variance), (2) ICSD (an independent covariance structure with different variances), and (3) UCS (a uniform covariance structure). We proposed using a KOO method based on a general information criterion with a penalty constant d, and we established high-dimensional consistencies of the KOO methods with $d = O(n^a)$, $0 < a < 1$. Ref. [12] studied the asymptotic consistencies of KOO methods under (1) and (3); however, in their approach, the number of explanatory variables was fixed, whereas in this paper it may tend to infinity. KOO methods are also computationally feasible. The idea goes back to [1,2], while high-dimensional properties were studied recently in [7,8,9,11].
A high-dimensional study of the KOO method under an autoregressive covariance structure (AUTO), as well as an extension of our results to the case of non-normality, remains as future work.

Author Contributions

Conceptualization, Y.F.; Methodology, Y.F. and T.S.; Software, T.S.; Writing—original draft, Y.F. and T.S.; Writing—review & editing, Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to express their gratitude to Vladimir V. Ulyanov and the three referees for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Nishii, R.; Bai, Z.D.; Krishnaiah, P.R. Strong consistency of the information criterion for model selection in multivariate analysis. Hiroshima Math. J. 1988, 18, 451–462.
2. Zhao, L.C.; Krishnaiah, P.R.; Bai, Z.D. On detection of the number of signals in presence of white noise. J. Multivar. Anal. 1986, 20, 1–25.
3. Bai, Z.; Fujikoshi, Y.; Hu, J. Strong Consistency of the AIC, BIC, Cp and KOO Methods in High-Dimensional Multivariate Linear Regression; Hiroshima Statistical Research Group: Hiroshima, Japan, 2018.
4. Yanagihara, H.; Wakaki, H.; Fujikoshi, Y. A consistency property of the AIC for multivariate linear models when the dimension and the sample size are large. Electron. J. Stat. 2015, 9, 869–897.
5. Fujikoshi, Y.; Sakurai, T.; Yanagihara, H. Consistency of high-dimensional AIC-type and Cp-type criteria in multivariate linear regression. J. Multivar. Anal. 2014, 123, 184–200.
6. Fujikoshi, Y.; Sakurai, T. High-dimensional consistency of rank estimation criteria in multivariate linear model. J. Multivar. Anal. 2016, 149, 199–212.
7. Oda, R.; Yanagihara, H. A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables. Electron. J. Stat. 2020, 14, 1386–1412.
8. Oda, R.; Yanagihara, H. A consistent likelihood-based variable selection method in normal multivariate linear regression. In Intelligent Decision Technologies; Czarnowski, I., Ed.; Springer: Singapore, 2021; Volume 238, pp. 391–401.
9. Fujikoshi, Y.; Sakurai, T. Consistency of test-based method for selection of variables in high-dimensional two-group discriminant analysis. Jpn. J. Stat. Data Sci. 2019, 2, 155–171.
10. Oda, R.; Suzuki, Y.; Yanagihara, H.; Fujikoshi, Y. A consistent variable selection method in high-dimensional canonical discriminant analysis. J. Multivar. Anal. 2020, 175, 1–13.
11. Fujikoshi, Y. High-dimensional consistencies of KOO methods in multivariate regression model and discriminant analysis. J. Multivar. Anal. 2022, 188, 104860.
12. Sakurai, T.; Fujikoshi, Y. Exploring consistencies of information criterion and test-based criterion for high-dimensional multivariate regression models under three covariance structures. In Festschrift in Honor of Professor Dietrich von Rosen's 65th Birthday; Holgersson, T., Singull, M., Eds.; Springer: Berlin, Germany, 2020; pp. 313–334.
13. Akaike, H. Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory; Petrov, B.N., Csáki, F., Eds.; Akadémiai Kiadó: Budapest, Hungary, 1973; pp. 267–281.
14. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
15. Fujikoshi, Y.; Ulyanov, V.V.; Shimizu, R. Multivariate Statistics: High-Dimensional and Large-Sample Approximations; Wiley: Hoboken, NJ, USA, 2010.
Table 1. KOO based on AIC (k = 3): probability of selecting the true model.

(n, p)   (20, 10)   (200, 100)   (20, 10)   (200, 100)
(1)      0.74       1.00         0.77       1.00
(3)      0.47       1.00         0.22       1.00
