1. Introduction
We focus on a multivariate linear regression model of $p$ response variables $y_1, \ldots, y_p$ on a subset of $k$ explanatory variables $x_1, \ldots, x_k$. Suppose that there are $n$ observations on a $p$-dimensional response vector $\mathbf{y}$ and a $k$-dimensional explanatory vector $\mathbf{x}$, and let $\mathbf{Y}: n \times p$ and $\mathbf{X}: n \times k$ be the observation matrices of $\mathbf{y}$ and $\mathbf{x}$ with sample size $n$, respectively. The multivariate linear regression model including all the explanatory variables under normality is written as follows:
$$\mathbf{Y} \sim N_{n \times p}(\mathbf{X}\boldsymbol{\Theta},\ \boldsymbol{\Sigma} \otimes \mathbf{I}_n), \qquad (1)$$
where $\boldsymbol{\Theta}$ is a $k \times p$ unknown matrix of regression coefficients, and $\boldsymbol{\Sigma}$ is a $p \times p$ unknown covariance matrix that is positive definite. $N_{n \times p}(\cdot,\cdot)$ is the matrix normal distribution, such that the mean of $\mathbf{Y}$ is $\mathbf{X}\boldsymbol{\Theta}$, and the covariance matrix of $\operatorname{vec}(\mathbf{Y})$ is $\boldsymbol{\Sigma} \otimes \mathbf{I}_n$; equivalently, the rows of $\mathbf{Y}$ are independently normal with the same covariance matrix $\boldsymbol{\Sigma}$. Here, $\operatorname{vec}(\mathbf{Y})$ is the $np \times 1$ column vector that is obtained by stacking the columns of $\mathbf{Y}$ on top of one another. We assume that $\operatorname{rank}(\mathbf{X}) = k$.
In multivariate linear regression, the selection of variables for the model is an important concern. One approach is to first consider candidate variable selection models and then apply model selection criteria such as $\mathrm{AIC}$ and $\mathrm{BIC}$. Such a criterion for the full model (1) is expressed as follows:
$$\mathrm{GIC} = -2\log L(\hat{\boldsymbol{\Theta}}, \hat{\boldsymbol{\Sigma}}) + d\,g, \qquad (2)$$
where $L(\hat{\boldsymbol{\Theta}}, \hat{\boldsymbol{\Sigma}})$ is the maximal likelihood, $d\,g$ is the penalty term, and $g$ is the number of unknown parameters given by $g = kp + \tfrac{1}{2}p(p+1)$. For $\mathrm{AIC}$ and $\mathrm{BIC}$, $d$ is defined as $2$ and $\log n$, respectively. In the selection of the $k$ variables $x_1, \ldots, x_k$, we identify $\{x_1, \ldots, x_k\}$ with the index set $\omega = \{1, \ldots, k\}$, and denote the criterion for a subset $A \subseteq \omega$ by $\mathrm{GIC}(A)$. Then, the model selection based on $\mathrm{GIC}$ chooses the following model:
$$\hat{A}_{\mathrm{GIC}} = \arg\min_{A \subseteq \omega} \mathrm{GIC}(A). \qquad (3)$$
Here, the minimum is usually taken over all $2^k$ combinations of the explanatory variables. This creates a computational problem for the methods based on minimizing $\mathrm{GIC}(A)$, including the $\mathrm{AIC}$ and $\mathrm{BIC}$ methods, since we need to compute $2^k$ statistics for the selection of $k$ explanatory variables. To avoid this computational problem, [1] proposed a method that is essentially due to [2]. The method, which was named the knock-one-out (KOO) method by [3], determines “selection” or “no selection” for each variable by comparing the model with that variable removed against the full model. More precisely, the KOO method chooses the model, or the set of variables, given by
$$\hat{A}_{\mathrm{KOO}} = \{\, j \in \omega : \mathrm{GIC}(\omega_{-j}) > \mathrm{GIC}(\omega) \,\}, \qquad (4)$$
where $\omega_{-j}$ is a short expression for $\omega \setminus \{j\}$, which is the set obtained by removing element $j$ from the set $\omega$. In general, the KOO method can be applied not only to $\mathrm{AIC}$ but to a general variable selection criterion or method.
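To make the computational contrast concrete, the following is a minimal Python sketch of the KOO rule (4) with an unstructured covariance matrix and an AIC-type penalty; the function names and the default $d = 2$ are our illustrative choices, not an implementation from the literature.

```python
import numpy as np

def gic(Y, X, A, d=2.0):
    """GIC(A) for Gaussian multivariate regression on the variables indexed by A.
    Up to the constant n*p*(log(2*pi) + 1), -2 log L = n * log det(Sigma_hat_A)."""
    n, p = Y.shape
    XA = X[:, sorted(A)]
    B, *_ = np.linalg.lstsq(XA, Y, rcond=None)      # ML estimate of Theta_A
    _, logdet = np.linalg.slogdet((Y - XA @ B).T @ (Y - XA @ B) / n)
    g = len(A) * p + p * (p + 1) / 2                # number of unknown parameters
    return n * logdet + d * g

def koo_select(Y, X, d=2.0):
    """KOO rule (4): keep variable j iff GIC(omega \\ {j}) > GIC(omega)."""
    omega = set(range(X.shape[1]))
    full = gic(Y, X, omega, d)
    return {j for j in omega if gic(Y, X, omega - {j}, d) > full}
```

The full-search rule (3) would require $2^k$ evaluations of `gic`, whereas `koo_select` requires only $k+1$.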
In the literature on multivariate linear regression, numerous papers have dealt with the variable selection problem as it relates to selecting explanatory variables. When $\boldsymbol{\Sigma}$ is an unknown positive definite matrix, [4,5,6], for example, indicated that, in a high-dimensional case, $\mathrm{AIC}$ and $C_p$ have consistency properties, but $\mathrm{BIC}$ is not necessarily consistent. KOO methods in the multivariate regression model were studied by [3] and [7,8]. For the KOO method in discriminant analysis, see [9,10]. For a review, see [11].
In this paper, we assume that the covariance matrix has one of three covariance structures: (1) an independent covariance structure with the same variance, (2) an independent covariance structure with different variances, and (3) a uniform covariance structure. The numbers of unknown parameters in covariance structures (1)–(3) are 1, $p$, and 2, respectively. Sufficient conditions for the KOO method given by (4) to be consistent are derived under a high-dimensional asymptotic framework in which the sample size $n$, the number $p$ of response variables, and the number $k$ of explanatory variables are all large, as in $p/n \to c_p$ and $k/n \to c_k$, where $c_p, c_k \in [0, 1)$. Ref. [12] considered similar problems under covariance structures (1), (3), and (4), an autoregressive covariance structure, but did not consider them under (2). Moreover, in their study of asymptotic consistencies, they assumed that $k$ was fixed, but in this paper, $k$ may tend to infinity, such that $k/n \to c_k \in [0, 1)$.
From the numerical experiments in [12], the probability of choosing the true model by the KOO method is known for Case (1), an independent covariance structure with the same variance, and Case (3), a uniform covariance structure; the results are summarized in Table 1. In Table 1, $k$ is the number of nonzero true explanatory variables, and the true parameter values are omitted. In [12], $k$ was treated as finite; in this paper, $k$ may tend to infinity, such that $k/n \to c_k \in [0, 1)$.
The present paper is organized as follows. In Section 2, we present notations and preliminaries. In Section 3, we state the KOO methods with Covariance Structures (1)–(3) in terms of key statistics; an approach to their consistencies is also stated in Section 3. In Section 4, Section 5 and Section 6, we discuss consistency properties of the KOO methods under Covariance Structures (1)–(3), respectively. In Section 7, our conclusions are discussed.
  2. Notations and Preliminaries
Suppose that $A$ denotes a subset of $\omega = \{1, \ldots, k\}$ containing $k_A$ elements, and $\mathbf{X}_A$ denotes the $n \times k_A$ matrix comprising the columns of $\mathbf{X}$ indexed by the elements of $A$. Then, $\mathbf{X}_\omega = \mathbf{X}$. Further, we assume that the covariance matrix $\boldsymbol{\Sigma}$ has a covariance structure, denoted by $\boldsymbol{\Sigma}_s$. Then, we have a generic candidate model:
$$M_A:\ \mathbf{Y} \sim N_{n \times p}(\mathbf{X}_A\boldsymbol{\Theta}_A,\ \boldsymbol{\Sigma}_s \otimes \mathbf{I}_n), \qquad (5)$$
where $\boldsymbol{\Theta}_A$ is a $k_A \times p$ unknown matrix of regression coefficients. We assume that $\operatorname{rank}(\mathbf{X}_A) = k_A$.
When $\boldsymbol{\Sigma}$ is a $p \times p$ unknown (unstructured) covariance matrix, we can write the $\mathrm{GIC}$ in (2) as follows:
$$\mathrm{GIC}(A) = n\log|\hat{\boldsymbol{\Sigma}}_A| + np(\log 2\pi + 1) + d\left\{k_A p + \tfrac{1}{2}p(p+1)\right\}, \qquad (6)$$
where $\hat{\boldsymbol{\Sigma}}_A = n^{-1}\mathbf{Y}^\top(\mathbf{I}_n - \mathbf{P}_A)\mathbf{Y}$ and $\mathbf{P}_A = \mathbf{X}_A(\mathbf{X}_A^\top\mathbf{X}_A)^{-1}\mathbf{X}_A^\top$. When $A = \omega$, model $M_\omega$ is called the full model. $\hat{\boldsymbol{\Sigma}}_\omega$ and $\mathbf{P}_\omega$ are defined from $\hat{\boldsymbol{\Sigma}}_A$ and $\mathbf{P}_A$ by setting $A = \omega$, i.e., $\mathbf{P}_\omega = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$ and $\hat{\boldsymbol{\Sigma}}_\omega = n^{-1}\mathbf{Y}^\top(\mathbf{I}_n - \mathbf{P}_\omega)\mathbf{Y}$.
In this paper, we consider the cases in which the covariance matrix $\boldsymbol{\Sigma}$ belongs to each of the following three structures:
- (1) Independent covariance structure with the same variance (ICSS): $\boldsymbol{\Sigma}_1 = \sigma^2\mathbf{I}_p$.
- (2) Independent covariance structure with different variances (ICSD): $\boldsymbol{\Sigma}_2 = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2)$.
- (3) Uniform covariance structure (UCS): $\boldsymbol{\Sigma}_3 = \sigma^2\{(1-\rho)\mathbf{I}_p + \rho\mathbf{1}_p\mathbf{1}_p^\top\}$.
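For reference, the three structures can be generated as follows (a small Python sketch; the function names are ours), which also makes the parameter counts 1, $p$, and 2 visible.

```python
import numpy as np

def icss(p, sigma2):
    """(1) ICSS: one parameter (sigma^2)."""
    return sigma2 * np.eye(p)

def icsd(variances):
    """(2) ICSD: p parameters (sigma_1^2, ..., sigma_p^2)."""
    return np.diag(np.asarray(variances, dtype=float))

def ucs(p, sigma2, rho):
    """(3) UCS: two parameters (sigma^2, rho)."""
    return sigma2 * ((1.0 - rho) * np.eye(p) + rho * np.ones((p, p)))
```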
The models considered in this paper can be expressed as in (5) with $\boldsymbol{\Sigma}_s = \boldsymbol{\Sigma}_1$, $\boldsymbol{\Sigma}_2$, and $\boldsymbol{\Sigma}_3$ for $s = 1, 2, 3$. Let $f(\mathbf{Y}; \mathbf{X}_A\boldsymbol{\Theta}_A, \boldsymbol{\Sigma}_s)$ be the density of $\mathbf{Y}$ in (5) with $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_s$. In the derivation of the $\mathrm{GIC}$ under the covariance structure $\boldsymbol{\Sigma}_s$, we use the following equality:
$$-2\log f(\mathbf{Y}; \mathbf{X}_A\boldsymbol{\Theta}_A, \boldsymbol{\Sigma}_s) = np\log 2\pi + n\log|\boldsymbol{\Sigma}_s| + \operatorname{tr}\,\boldsymbol{\Sigma}_s^{-1}(\mathbf{Y} - \mathbf{X}_A\boldsymbol{\Theta}_A)^\top(\mathbf{Y} - \mathbf{X}_A\boldsymbol{\Theta}_A). \qquad (7)$$
Let $(\hat{\boldsymbol{\Theta}}_A, \hat{\boldsymbol{\Sigma}}_s)$ be the quantity minimizing the right-hand side of (7). Then, in our model, it satisfies $\hat{\boldsymbol{\Theta}}_A = (\mathbf{X}_A^\top\mathbf{X}_A)^{-1}\mathbf{X}_A^\top\mathbf{Y}$, and we obtain
$$\mathrm{GIC}_s(A) = -2\log f(\mathbf{Y}; \mathbf{X}_A\hat{\boldsymbol{\Theta}}_A, \hat{\boldsymbol{\Sigma}}_s) + d\,m_s(A), \qquad (8)$$
where $m_s(A)$ is the number of independent unknown parameters under $M_A$ with structure $\boldsymbol{\Sigma}_s$, and $d$ is a positive constant that may depend on $n$. For $\mathrm{AIC}$ and $\mathrm{BIC}$, $d$ is defined by 2 ([13]) and $\log n$ ([14]), respectively.
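Concretely, combining the $k_A p$ regression coefficients with the covariance parameter counts 1, $p$, and 2 stated in the Introduction gives
$$m_1(A) = k_A\,p + 1, \qquad m_2(A) = k_A\,p + p, \qquad m_3(A) = k_A\,p + 2,$$
and these counts reappear in the penalty terms of Sections 4–6.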
  3. Approach to Consistencies of KOO Methods
Our KOO method is based on the key statistics
$$T_j = \mathrm{GIC}_s(\omega_{-j}) - \mathrm{GIC}_s(\omega), \qquad j = 1, \ldots, k. \qquad (9)$$
In fact, the KOO method chooses the following model:
$$\hat{A}_s = \{\, j \in \omega : T_j > 0 \,\}. \qquad (10)$$
Its consistency can be proven by showing the following two properties:
$$[\mathrm{F1}]\ \lim P\Bigl(\min_{j \in A_*} T_j > 0\Bigr) = 1 \quad\text{and}\quad [\mathrm{F2}]\ \lim P\Bigl(\max_{j \notin A_*} T_j \le 0\Bigr) = 1,$$
as in [11]. The result can be shown by using the following inequality:
$$P(\hat{A}_s \ne A_*) \le \sum_{j \in A_*} P(T_j \le 0) + \sum_{j \notin A_*} P(T_j > 0). \qquad (11)$$
Here, the first sum bounds the probability that true variables are not selected, and the second sum bounds the probability that nontrue variables are selected. Such notations are used for the other variable selection methods as well. A variable $x_j$ is included in the true set of variables if $j \in A_*$, i.e., if the $j$th row of the true coefficient matrix $\boldsymbol{\Theta}_*$ is nonzero.
Here, we list some of our main assumptions:
A1: The set $A_*$ of the true explanatory variables is included in the full set, i.e., $A_* \subseteq \omega$, and the set $A_*$ is finite.
A2: The high-dimensional asymptotic framework: $n \to \infty$, $p/n \to c_p \in [0, 1)$, and $k/n \to c_k \in [0, 1)$.
A general model selection criterion $\mathrm{GIC}_s$ is high-dimensionally consistent if
$$\lim P(\hat{A}_s = A_*) = 1 \qquad (12)$$
under the high-dimensional asymptotic framework. Here, “lim” means the limit under A2.
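The consistency property (12) can be probed by simulation. The following self-contained sketch estimates $P(\hat{A}_s = A_*)$ under the ICSS structure of Section 4 with an AIC-type penalty; the data-generating values and function names are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def gic1(Y, X, A, d=2.0):
    """GIC_1(A) under ICSS (Sigma = sigma^2 I_p), dropping n*p*(log(2*pi)+1)."""
    n, p = Y.shape
    XA = X[:, sorted(A)]
    B, *_ = np.linalg.lstsq(XA, Y, rcond=None)
    sigma2_hat = np.sum((Y - XA @ B) ** 2) / (n * p)
    return n * p * np.log(sigma2_hat) + d * (len(A) * p + 1)

def koo1(Y, X, d=2.0):
    omega = set(range(X.shape[1]))
    full = gic1(Y, X, omega, d)
    return {j for j in omega if gic1(Y, X, omega - {j}, d) > full}

def selection_probability(n, p, k, A_star, reps=200, d=2.0):
    """Monte Carlo estimate of P(A_hat = A_*) under the true ICSS model."""
    hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, k))
        Theta = np.zeros((k, p))
        Theta[sorted(A_star)] = 1.0                   # nonzero rows = true variables
        Y = X @ Theta + rng.standard_normal((n, p))   # sigma_*^2 = 1
        hits += koo1(Y, X, d) == A_star
    return hits / reps

print(selection_probability(n=100, p=20, k=10, A_star={0, 1, 2}))
```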
  4. Asymptotic Consistency under an Independent Covariance Structure
In this section, we show an asymptotic consistency of the KOO method on the basis of a general information criterion under an independent covariance structure with the same variance. A generic candidate model when the set of explanatory variables is $A$ can be expressed as follows:
$$\mathbf{Y} \sim N_{n \times p}(\mathbf{X}_A\boldsymbol{\Theta}_A,\ \sigma^2\mathbf{I}_p \otimes \mathbf{I}_n), \qquad (13)$$
where $\boldsymbol{\Theta}_A$ is a $k_A \times p$ unknown matrix and $\sigma^2 > 0$ is an unknown scalar variance. Let us denote the density of $\mathbf{Y}$ under (13) by $f_1(\mathbf{Y}; \mathbf{X}_A\boldsymbol{\Theta}_A, \sigma^2)$. Then, we have
$$-2\log f_1(\mathbf{Y}; \mathbf{X}_A\boldsymbol{\Theta}_A, \sigma^2) = np\log(2\pi\sigma^2) + \frac{1}{\sigma^2}\operatorname{tr}(\mathbf{Y} - \mathbf{X}_A\boldsymbol{\Theta}_A)^\top(\mathbf{Y} - \mathbf{X}_A\boldsymbol{\Theta}_A). \qquad (14)$$
Therefore, the maximum likelihood estimators of $\boldsymbol{\Theta}_A$ and $\sigma^2$ under $M_A$ are given as follows:
$$\hat{\boldsymbol{\Theta}}_A = (\mathbf{X}_A^\top\mathbf{X}_A)^{-1}\mathbf{X}_A^\top\mathbf{Y}, \qquad \hat{\sigma}_A^2 = \frac{1}{np}\operatorname{tr}\,\mathbf{Y}^\top(\mathbf{I}_n - \mathbf{P}_A)\mathbf{Y}.$$
The general information criterion (8) is given by
$$\mathrm{GIC}_1(A) = np\log(2\pi\hat{\sigma}_A^2) + np + d\,(k_A p + 1), \qquad (15)$$
where $d$ is a positive constant, and $m_1(A) = k_A p + 1$.
Using (9) and (15), we have
$$T_j = \mathrm{GIC}_1(\omega_{-j}) - \mathrm{GIC}_1(\omega) = np\log\left(1 + \frac{U_j}{V}\right) - d\,p, \qquad (16)$$
where
$$V = \frac{1}{\sigma_*^2}\operatorname{tr}\,\mathbf{Y}^\top(\mathbf{I}_n - \mathbf{P}_\omega)\mathbf{Y}, \qquad U_j = \frac{1}{\sigma_*^2}\operatorname{tr}\,\mathbf{Y}^\top(\mathbf{P}_\omega - \mathbf{P}_{\omega_{-j}})\mathbf{Y}.$$
$V$ and $U_j$ are independently distributed as a central and a noncentral chi-squared distribution, respectively. More precisely, assume that the true model is
$$\mathbf{Y} \sim N_{n \times p}(\mathbf{X}\boldsymbol{\Theta}_*,\ \sigma_*^2\mathbf{I}_p \otimes \mathbf{I}_n), \qquad (17)$$
and let $\boldsymbol{\Theta}_*$ and $\sigma_*^2$ denote the true parameter values. Then, using basic distributional properties (see [15]) on quadratic forms of normal variates and Wishart matrices, we have the following results:
$$V \sim \chi^2_{(n-k)p}, \qquad U_j \sim \chi^2_p(\delta_j), \qquad (18)$$
where the noncentrality parameter $\delta_j$ is defined by
$$\delta_j = \frac{1}{\sigma_*^2}\operatorname{tr}\,\boldsymbol{\Theta}_*^\top\mathbf{X}^\top(\mathbf{P}_\omega - \mathbf{P}_{\omega_{-j}})\mathbf{X}\boldsymbol{\Theta}_*.$$
If $j \notin A_*$, then $\delta_j = 0$, and if $j \in A_*$, then in general $\delta_j > 0$. For a sufficient condition for the consistency of the KOO method based on $\mathrm{GIC}_1$, we assume
$$\text{A3v:}\quad \text{for any } j \in A_*, \qquad \liminf\, \frac{\delta_j}{np} > 0. \qquad (19)$$
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $\mathrm{GIC}_1$ in (15), whose selection method is given by (10). When $j \notin A_*$, from (16), we can write
$$T_j = np\log\left(1 + \frac{U_j}{V}\right) - dp, \qquad U_j \sim \chi^2_p. \qquad (20)$$
Note that $U_j/p \to 1$ and $V/\{(n-k)p\} \to 1$ in probability under A2. Then, under the assumption $d/n \to 0$, we have
$$P(T_j > 0) = P\left(\frac{U_j}{V} > e^{d/n} - 1\right) = P\left(\frac{n\,U_j}{V} > (1 + o(1))\,d\right). \qquad (21)$$
Related to the assumption $d/n \to 0$, we assume
$$\text{A4v:}\quad \frac{d}{n} \to 0 \quad\text{and}\quad \liminf\, d\,(1 - c_k) > 1.$$
The first part in A4v implies $e^{d/n} - 1 = (1 + o(1))\,d/n$. It is easy to see that
$$\lim \frac{n\,U_j}{V} = \lim \frac{np}{V}\cdot\lim\frac{U_j}{p} = \frac{1}{1 - c_k} \quad\text{in probability}. \qquad (22)$$
Here, for the first equality, assumption A2 is required, and by chi-squared tail bounds the convergence is uniform in $j \notin A_*$. Further, the second part of A4v gives $\liminf d > 1/(1 - c_k)$. Therefore, from (22), we have that [F2] $\lim P\bigl(\max_{j \notin A_*} T_j \le 0\bigr) = 1$ holds.
When $j \in A_*$, we can write $U_j \sim \chi^2_p(\delta_j)$ with $\delta_j > 0$. Therefore, we can express [F1] as
$$\lim P\left(\min_{j \in A_*}\left\{n\log\left(1 + \frac{U_j}{V}\right) - d\right\} > 0\right) = 1,$$
where
$$E[U_j] = p + \delta_j \quad\text{and}\quad \operatorname{Var}[U_j] = 2(p + 2\delta_j).$$
Assumptions A3v and A4v easily show that, for every $j \in A_*$, $U_j/V$ stays bounded away from zero in probability, so that $n\log(1 + U_j/V)$ diverges at the rate $n$, whereas $d = o(n)$; since $A_*$ is finite by A1, the minimum is taken over a bounded number of terms.
This implies that [F1] holds.
These imply the following theorem.
Theorem 1.  Suppose that Assumptions A1, A2, A3v, and A4v are satisfied. Then, the KOO method based on the general information criterion $\mathrm{GIC}_1$ defined by (15) is asymptotically consistent.

An alternative approach for “[F1]”. When $j \in A_*$, we can write
$$P(T_j \le 0) = P\left(U_j \le V\left(e^{d/n} - 1\right)\right).$$
Therefore, we have
$$\sum_{j \in A_*} P(T_j \le 0) \le k_*\,\max_{j \in A_*} P\left(U_j \le (1 + o(1))\,\frac{d}{n}\,V\right),$$
where, for $j \in A_*$, $U_j \sim \chi^2_p(\delta_j)$ and $k_*$ is the number of elements of $A_*$. Then, under A1, A3v in (19), and the assumption $d/n \to 0$ (or equivalently $e^{d/n} - 1 = (1 + o(1))\,d/n$), the threshold $(d/n)V$ is of order $d\,(1 - c_k)\,p = o(np)$, whereas the mean $p + \delta_j$ of $U_j$ grows at the rate $np$. It is easily seen that
$$P\left(U_j \le (1 + o(1))\,\frac{d}{n}\,V\right) \le P\Bigl(U_j \le \tfrac{1}{2}(p + \delta_j)\Bigr) + o(1),$$
where the right-hand side tends to zero by Chebyshev’s inequality, since $\operatorname{Var}[U_j] = 2(p + 2\delta_j)$ is of smaller order than $(p + \delta_j)^2$. These imply [F1]. In this approach, it was assumed that $d/n \to 0$ (or equivalently $e^{d/n} - 1 = (1 + o(1))\,d/n$).
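The distributional reductions (17) and (18) rest on standard facts about quadratic forms in normal matrices and can be checked numerically; the following simulation sketch (our construction, not from the paper) compares the empirical means of $V$ and $U_j$ with $(n-k)p$ and $p + \delta_j$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k, reps = 60, 8, 4, 4000
X = rng.standard_normal((n, k))
Theta = np.vstack([np.ones((2, p)), np.zeros((k - 2, p))])   # A_* = {0, 1}
sigma2 = 1.5

def proj(M):
    return M @ np.linalg.inv(M.T @ M) @ M.T

Q = proj(X) - proj(np.delete(X, 0, axis=1))    # P_omega - P_{omega_{-j}}, j = 0
R = np.eye(n) - proj(X)                        # I_n - P_omega

V = np.empty(reps)
U = np.empty(reps)
for r in range(reps):
    Y = X @ Theta + np.sqrt(sigma2) * rng.standard_normal((n, p))
    V[r] = np.trace(Y.T @ R @ Y) / sigma2
    U[r] = np.trace(Y.T @ Q @ Y) / sigma2

delta = np.trace(Theta.T @ X.T @ Q @ X @ Theta) / sigma2     # noncentrality
print(np.isclose(V.mean(), (n - k) * p, rtol=0.02))          # V ~ chi2_{(n-k)p}
print(np.isclose(U.mean(), p + delta, rtol=0.05))            # U_j ~ chi2_p(delta_j)
```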
  5. Asymptotic Consistency under an Independent Covariance Structure with Different Variances
In this section, we assume that the covariance matrix $\boldsymbol{\Sigma}$ has an independent covariance structure with different variances, i.e., $\boldsymbol{\Sigma}_2 = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. First, let us consider deriving a key statistic $T_j$. Consider a candidate model with $A \subseteq \omega$:
$$\mathbf{Y} \sim N_{n \times p}(\mathbf{X}_A\boldsymbol{\Theta}_A,\ \boldsymbol{\Sigma}_2 \otimes \mathbf{I}_n).$$
Let the density in the full model be expressed as $f_2(\mathbf{Y}; \mathbf{X}\boldsymbol{\Theta}, \boldsymbol{\Sigma}_2)$. Then, we have
$$\min_{\boldsymbol{\Theta},\,\boldsymbol{\Sigma}_2}\bigl\{-2\log f_2(\mathbf{Y}; \mathbf{X}\boldsymbol{\Theta}, \boldsymbol{\Sigma}_2)\bigr\} = n\sum_{i=1}^p \log\hat{\sigma}_{\omega i}^2 + np(\log 2\pi + 1), \qquad (25)$$
where $\hat{\sigma}_{A i}^2 = n^{-1}\mathbf{y}_{(i)}^\top(\mathbf{I}_n - \mathbf{P}_A)\mathbf{y}_{(i)}$ and $\mathbf{y}_{(i)}$ denotes the $i$th column of $\mathbf{Y}$. Next, consider the model removing the $j$th explanatory variable from the full model $M_\omega$, which is denoted by $M_{\omega_{-j}}$ or $\omega_{-j}$. Similarly,
$$\min\bigl\{-2\log f_2(\mathbf{Y}; \mathbf{X}_{\omega_{-j}}\boldsymbol{\Theta}_{\omega_{-j}}, \boldsymbol{\Sigma}_2)\bigr\} = n\sum_{i=1}^p \log\hat{\sigma}_{\omega_{-j}\,i}^2 + np(\log 2\pi + 1). \qquad (26)$$
Using (25) and (26), we can obtain the general information criterion (8) for the two models $M_\omega$ and $M_{\omega_{-j}}$, and we have
$$T_j^{(2)} = \mathrm{GIC}_2(\omega_{-j}) - \mathrm{GIC}_2(\omega) = n\sum_{i=1}^p \log\left(1 + \frac{U_{ji}}{V_i}\right) - d\,p, \qquad (27)$$
where
$$V_i = \frac{1}{\sigma_{*i}^2}\,\mathbf{y}_{(i)}^\top(\mathbf{I}_n - \mathbf{P}_\omega)\mathbf{y}_{(i)}, \qquad U_{ji} = \frac{1}{\sigma_{*i}^2}\,\mathbf{y}_{(i)}^\top(\mathbf{P}_\omega - \mathbf{P}_{\omega_{-j}})\mathbf{y}_{(i)},$$
and $\sigma_{*i}^2$ denotes the true value of $\sigma_i^2$.
Then, as in (18), we have the following results:
$$V_i \sim \chi^2_{n-k} \quad\text{and}\quad U_{ji} \sim \chi^2_1(\delta_{ji}), \qquad i = 1, \ldots, p, \quad\text{all mutually independent},$$
where the noncentrality parameters $\delta_{ji}$ are defined by
$$\delta_{ji} = \frac{r\,\theta_{*ji}^2}{\sigma_{*i}^2}, \qquad r = \mathbf{x}_j^\top(\mathbf{I}_n - \mathbf{P}_{\omega_{-j}})\mathbf{x}_j, \qquad (32)$$
with $\boldsymbol{\Theta}_* = (\theta_{*ji})$ the true coefficient matrix. If $j \notin A_*$, then $\delta_{ji} = 0$ for all $i$, and if $j \in A_*$, then in general $\sum_{i=1}^p \delta_{ji} > 0$. For a sufficient condition for consistency of the KOO method based on $\mathrm{GIC}_2$, we assume
$$\text{A3b:}\quad \text{for any } j \in A_*, \qquad \liminf\, \frac{1}{np}\sum_{i=1}^p \delta_{ji} > 0.$$
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $T_j^{(2)}$ in (9) and (27), whose selection method is given by (10). When $j \notin A_*$, we have
$$T_j^{(2)} = n\sum_{i=1}^p \log\left(1 + \frac{U_{ji}}{V_i}\right) - dp, \qquad U_{ji} \sim \chi^2_1.$$
Note that $\max_{1 \le i \le p}|V_i/(n-k) - 1| \to 0$ in probability. Then, under the assumption $d/n \to 0$, we have
$$P\bigl(T_j^{(2)} > 0\bigr) = P\left(\frac{n}{p}\sum_{i=1}^p \log\left(1 + \frac{U_{ji}}{V_i}\right) > d\right).$$
Related to the assumption $d/n \to 0$, we assume
$$\text{A4b:}\quad \frac{d}{n} \to 0 \quad\text{and}\quad \liminf\, d\,(1 - c_k) > 1.$$
The first part in A4b implies $e^{d/n} - 1 = (1 + o(1))\,d/n$. It is easy to see that
$$\frac{n}{p}\sum_{i=1}^p \log\left(1 + \frac{U_{ji}}{V_i}\right) \to \frac{1}{1 - c_k} \quad\text{in probability}. \qquad (33)$$
Further, by chi-squared tail bounds, the convergence is uniform in $j \notin A_*$. Therefore, from (33), we have that [F2] $\lim P\bigl(\max_{j \notin A_*} T_j^{(2)} \le 0\bigr) = 1$.
When $j \in A_*$, we can write $U_{ji} \sim \chi^2_1(\delta_{ji})$, $i = 1, \ldots, p$. Therefore, we can express [F1] as follows:
$$\lim P\left(\min_{j \in A_*}\left\{\frac{n}{p}\sum_{i=1}^p \log\left(1 + \frac{U_{ji}}{V_i}\right) - d\right\} > 0\right) = 1,$$
where
$$E\left[\sum_{i=1}^p U_{ji}\right] = p + \sum_{i=1}^p \delta_{ji} \quad\text{and}\quad \operatorname{Var}\left[\sum_{i=1}^p U_{ji}\right] = 2\left(p + 2\sum_{i=1}^p \delta_{ji}\right).$$
Assumptions A3b and A4b easily show that the quantity in braces diverges to infinity in probability for every $j \in A_*$, since $\sum_i \delta_{ji}$ grows at the rate $np$ while $d = o(n)$.
This implies that [F1] holds.
These imply the following theorem.
Theorem 2.  Suppose that Assumptions A1, A2, A3b, and A4b are satisfied. Then, the KOO method based on $T_j^{(2)}$ in (27) is asymptotically consistent.

Let us consider an alternative approach for “[F1]”, as in the case of the independent covariance structure with the same variance. When $j \in A_*$, we can write
$$P\bigl(T_j^{(2)} \le 0\bigr) = P\left(n\sum_{i=1}^p \log\left(1 + \frac{U_{ji}}{V_i}\right) \le dp\right).$$
Here, for $j \in A_*$,
$$U_{ji} \sim \chi^2_1(\delta_{ji}), \qquad \delta_{ji} = \frac{r\,\theta_{*ji}^2}{\sigma_{*i}^2},$$
where $r$ is the same one as in (32). Note that $U_{j1}, \ldots, U_{jp}$ are distributed as noncentral chi-squared distributions $\chi^2_1(\delta_{ji})$, and they are independent. Then, under the assumption $d/n \to 0$ (or equivalently $e^{d/n} - 1 = (1 + o(1))\,d/n$), using $\log(1+x) \ge x/(1+x)$, we have
$$P\bigl(T_j^{(2)} \le 0\bigr) \le P\left(\sum_{i=1}^p \frac{U_{ji}}{U_{ji} + V_i} \le (1 + o(1))\,\frac{dp}{n}\right). \qquad (35)$$
In the above upper bound, it holds that $\max_{1 \le i \le p}|V_i/(n-k) - 1| \to 0$ in probability. Useful bounds are obtained by using the first few moments of $\chi^2_m(\delta)$. For example,
$$E\bigl[\chi^2_m(\delta)\bigr] = m + \delta, \qquad \operatorname{Var}\bigl[\chi^2_m(\delta)\bigr] = 2(m + 2\delta).$$
Then, Bound (35) with these moments can be asymptotically evaluated by Chebyshev’s inequality. The above expression is $o(1)$ under the assumption that $(np)^{-1}\sum_{i=1}^p \delta_{ji}$ tends to a positive quantity.
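Before turning to the uniform structure, we note that the ICSD criterion and its key statistics are also simple to compute; a sketch in the same illustrative conventions as before follows (names are ours, constants dropped).

```python
import numpy as np

def gic2(Y, X, A, d=2.0):
    """GIC_2(A) under ICSD, dropping n*p*(log(2*pi)+1); m_2(A) = k_A*p + p."""
    n, p = Y.shape
    XA = X[:, sorted(A)]
    B, *_ = np.linalg.lstsq(XA, Y, rcond=None)
    sig2 = np.mean((Y - XA @ B) ** 2, axis=0)   # sigma_hat_{A,i}^2, i = 1,...,p
    return n * np.log(sig2).sum() + d * (len(A) * p + p)

def koo_T2(Y, X, d=2.0):
    """Key statistics T_j^(2) = GIC_2(omega_{-j}) - GIC_2(omega); select j iff > 0."""
    k = X.shape[1]
    omega = set(range(k))
    full = gic2(Y, X, omega, d)
    return np.array([gic2(Y, X, omega - {j}, d) - full for j in range(k)])
```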
  6. Asymptotic Consistency under a Uniform Covariance Structure
In this section, we show an asymptotic consistency of the KOO method based on a general information criterion under a uniform covariance structure. First, following [12], we derive a $\mathrm{GIC}_3$ as in (6), and a key statistic $T_j$ as in (9). A uniform covariance structure is given by
$$\boldsymbol{\Sigma}_3 = (\sigma_{ab}), \qquad \sigma_{ab} = \sigma^2\{(1-\rho)\delta_{ab} + \rho\}, \qquad (36)$$
with Kronecker delta $\delta_{ab}$. The covariance structure is expressed as follows:
$$\boldsymbol{\Sigma}_3 = \lambda_1\mathbf{P}_1 + \lambda_2\mathbf{P}_2, \qquad (37)$$
where
$$\mathbf{P}_1 = \frac{1}{p}\mathbf{1}_p\mathbf{1}_p^\top, \qquad \mathbf{P}_2 = \mathbf{I}_p - \mathbf{P}_1,$$
and $\lambda_1 = \sigma^2\{1 + (p-1)\rho\}$, $\lambda_2 = \sigma^2(1-\rho)$. Matrices $\mathbf{P}_1$ and $\mathbf{P}_2$ are orthogonal idempotent matrices, so we have
$$\boldsymbol{\Sigma}_3^{-1} = \frac{1}{\lambda_1}\mathbf{P}_1 + \frac{1}{\lambda_2}\mathbf{P}_2, \qquad |\boldsymbol{\Sigma}_3| = \lambda_1\,\lambda_2^{p-1}.$$
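The spectral decomposition (37) and the resulting inverse and determinant formulas can be verified directly; a short numerical check (our construction) follows.

```python
import numpy as np

p, sigma2, rho = 5, 2.0, 0.3
one = np.ones((p, 1))
P1 = one @ one.T / p                    # projection onto span{1_p}
P2 = np.eye(p) - P1                     # its orthogonal complement
lam1 = sigma2 * (1 + (p - 1) * rho)     # eigenvalue for P1 (multiplicity 1)
lam2 = sigma2 * (1 - rho)               # eigenvalue for P2 (multiplicity p - 1)

Sigma3 = sigma2 * ((1 - rho) * np.eye(p) + rho * one @ one.T)
assert np.allclose(Sigma3, lam1 * P1 + lam2 * P2)                  # (37)
assert np.allclose(np.linalg.inv(Sigma3), P1 / lam1 + P2 / lam2)   # inverse
assert np.allclose(np.linalg.det(Sigma3), lam1 * lam2 ** (p - 1))  # determinant
```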
Now, we consider the multivariate regression model $M_A$ given by
$$\mathbf{Y} \sim N_{n \times p}(\mathbf{X}_A\boldsymbol{\Theta}_A,\ \boldsymbol{\Sigma}_3 \otimes \mathbf{I}_n), \qquad (38)$$
where $\boldsymbol{\Sigma}_3$ is as in (37). Let $\mathbf{H} = (\mathbf{h}_1, \mathbf{H}_2)$ be an orthogonal matrix where $\mathbf{h}_1 = p^{-1/2}\mathbf{1}_p$, and let
$$\mathbf{Z} = \mathbf{Y}\mathbf{H} = (\mathbf{z}_1, \mathbf{Z}_2). \qquad (39)$$
Here, $\mathbf{h}_1$ is a characteristic vector of $\boldsymbol{\Sigma}_3$ corresponding to $\lambda_1$, and each column vector of $\mathbf{H}_2$ is a characteristic vector of $\boldsymbol{\Sigma}_3$ corresponding to $\lambda_2$. Let the density function of $\mathbf{Y}$ under $M_A$ be denoted by $f_3(\mathbf{Y}; \mathbf{X}_A\boldsymbol{\Theta}_A, \boldsymbol{\Sigma}_3)$. Then, we have
$$-2\log f_3(\mathbf{Y}; \mathbf{X}_A\boldsymbol{\Theta}_A, \boldsymbol{\Sigma}_3) = np\log 2\pi + n\log\lambda_1 + n(p-1)\log\lambda_2 + \frac{1}{\lambda_1}\|\mathbf{z}_1 - \mathbf{X}_A\boldsymbol{\xi}_{1A}\|^2 + \frac{1}{\lambda_2}\operatorname{tr}(\mathbf{Z}_2 - \mathbf{X}_A\boldsymbol{\Xi}_{2A})^\top(\mathbf{Z}_2 - \mathbf{X}_A\boldsymbol{\Xi}_{2A}),$$
where $\boldsymbol{\xi}_{1A} = \boldsymbol{\Theta}_A\mathbf{h}_1$ and $\boldsymbol{\Xi}_{2A} = \boldsymbol{\Theta}_A\mathbf{H}_2$. Therefore, the maximum likelihood estimators of $\boldsymbol{\Theta}_A$, $\lambda_1$, and $\lambda_2$ under $M_A$ are given by $\hat{\boldsymbol{\Theta}}_A = (\mathbf{X}_A^\top\mathbf{X}_A)^{-1}\mathbf{X}_A^\top\mathbf{Y}$ and by $\hat{\lambda}_{1A}$ and $\hat{\lambda}_{2A}$ below.
The number of independent parameters under $M_A$ is $m_3(A) = k_A p + 2$. Noting that $\mathbf{H}^\top\boldsymbol{\Sigma}_3\mathbf{H}$ is diagonal, we can obtain the general information criterion (GIC) in (8) for $M_A$ in (38) as follows:
$$\mathrm{GIC}_3(A) = n\log\hat{\lambda}_{1A} + n(p-1)\log\hat{\lambda}_{2A} + np(\log 2\pi + 1) + d\,(k_A p + 2). \qquad (40)$$
Here, $\hat{\lambda}_{1A}$ and $\hat{\lambda}_{2A}$ are defined as follows:
$$\hat{\lambda}_{1A} = \frac{1}{n}\,\mathbf{z}_1^\top(\mathbf{I}_n - \mathbf{P}_A)\mathbf{z}_1, \qquad \hat{\lambda}_{2A} = \frac{1}{n(p-1)}\operatorname{tr}\,\mathbf{Z}_2^\top(\mathbf{I}_n - \mathbf{P}_A)\mathbf{Z}_2, \qquad (41)$$
using the following $\mathbf{z}_1$, $\mathbf{Z}_2$, and $\mathbf{P}_A$:
$$\mathbf{z}_1 = \mathbf{Y}\mathbf{h}_1, \qquad \mathbf{Z}_2 = \mathbf{Y}\mathbf{H}_2, \qquad \mathbf{P}_A = \mathbf{X}_A(\mathbf{X}_A^\top\mathbf{X}_A)^{-1}\mathbf{X}_A^\top.$$
Related to the distributional reductions of $\hat{\lambda}_{1A}$, $\hat{\lambda}_{2A}$, and the key statistics, we use the following lemma frequently.
Lemma 1.  Let $\mathbf{W}$ have a noncentral Wishart distribution $W_p(m, \boldsymbol{\Sigma}; \boldsymbol{\Delta})$. Let the covariance matrix $\boldsymbol{\Sigma}$ be decomposed into characteristic roots and vectors as follows:
$$\boldsymbol{\Sigma} = \sum_{i=1}^p \lambda_i\,\mathbf{h}_i\mathbf{h}_i^\top,$$
where $\lambda_1 \ge \cdots \ge \lambda_p > 0$ and $\mathbf{H} = (\mathbf{h}_1, \ldots, \mathbf{h}_p)$ is an orthogonal matrix. Then, $\lambda_i^{-1}\mathbf{h}_i^\top\mathbf{W}\mathbf{h}_i$, $i = 1, \ldots, p$, are independently distributed to noncentral chi-squared distributions with $m$ degrees of freedom and noncentrality parameters $\delta_i = \lambda_i^{-1}\mathbf{h}_i^\top\boldsymbol{\Delta}\mathbf{h}_i$.  Proof.  The result may be proven by considering the characteristic function of $\bigl(\lambda_1^{-1}\mathbf{h}_1^\top\mathbf{W}\mathbf{h}_1, \ldots, \lambda_p^{-1}\mathbf{h}_p^\top\mathbf{W}\mathbf{h}_p\bigr)$, which is expressed as follows (see Theorem 2.1.2 in [15]):
$$E\left[\exp\left(\mathrm{i}\sum_{a=1}^p t_a\,\lambda_a^{-1}\mathbf{h}_a^\top\mathbf{W}\mathbf{h}_a\right)\right] = \prod_{a=1}^p (1 - 2\mathrm{i}t_a)^{-m/2}\exp\left(\frac{\mathrm{i}t_a\delta_a}{1 - 2\mathrm{i}t_a}\right),$$
where $\delta_a = \lambda_a^{-1}\mathbf{h}_a^\top\boldsymbol{\Delta}\mathbf{h}_a$. The result can be easily obtained by checking that the above last expression equals the product of the characteristic functions of independent noncentral chi-squared variables $\chi^2_m(\delta_a)$, $a = 1, \ldots, p$.
□
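As a sanity check on Lemma 1, the following simulation (our construction) compares the empirical mean and variance of $\lambda_i^{-1}\mathbf{h}_i^\top\mathbf{W}\mathbf{h}_i$ with the noncentral chi-squared values $m + \delta_i$ and $2(m + 2\delta_i)$, using the convention $\boldsymbol{\Delta} = \mathbf{M}\mathbf{M}^\top$ for the mean matrix $\mathbf{M}$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, reps = 3, 20, 20000

A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # covariance with known spectrum
lam, H = np.linalg.eigh(Sigma)           # columns of H are the h_i
M = rng.standard_normal((p, m))          # mean matrix; Delta = M M'
L = np.linalg.cholesky(Sigma)

Q = np.empty((reps, p))
for r in range(reps):
    X = M + L @ rng.standard_normal((p, m))
    W = X @ X.T                          # W ~ W_p(m, Sigma; Delta)
    Q[r] = np.einsum('ac,ab,bc->c', H, W, H) / lam   # h_i' W h_i / lam_i

delta = np.einsum('ac,ab,bc->c', H, M @ M.T, H) / lam
print(np.allclose(Q.mean(0), m + delta, rtol=0.05))           # mean: m + delta_i
print(np.allclose(Q.var(0), 2 * (m + 2 * delta), rtol=0.10))  # var: 2(m + 2 delta_i)
```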
Assume that the true model is expressed as
$$\mathbf{Y} \sim N_{n \times p}(\mathbf{X}\boldsymbol{\Theta}_*,\ \boldsymbol{\Sigma}_{3*} \otimes \mathbf{I}_n), \qquad (42)$$
where $\boldsymbol{\Sigma}_{3*} = \lambda_{1*}\mathbf{P}_1 + \lambda_{2*}\mathbf{P}_2$. Using Lemma 1, we have the following lemma.
Lemma 2.  Under True Model (42), it holds that
- (1) $V_1 = \mathbf{z}_1^\top(\mathbf{I}_n - \mathbf{P}_\omega)\mathbf{z}_1/\lambda_{1*}$ and $U_{1j} = \mathbf{z}_1^\top(\mathbf{P}_\omega - \mathbf{P}_{\omega_{-j}})\mathbf{z}_1/\lambda_{1*}$ are independently distributed to a central chi-squared distribution $\chi^2_{n-k}$ and a noncentral chi-squared distribution $\chi^2_1(\delta_{1j})$, respectively.
- (2) $V_2 = \operatorname{tr}\,\mathbf{Z}_2^\top(\mathbf{I}_n - \mathbf{P}_\omega)\mathbf{Z}_2/\lambda_{2*}$ and $U_{2j} = \operatorname{tr}\,\mathbf{Z}_2^\top(\mathbf{P}_\omega - \mathbf{P}_{\omega_{-j}})\mathbf{Z}_2/\lambda_{2*}$ are independently distributed to a central chi-squared distribution $\chi^2_{(n-k)(p-1)}$ and a noncentral chi-squared distribution $\chi^2_{p-1}(\delta_{2j})$, respectively.
- (3) Noncentrality parameters $\delta_{1j}$ and $\delta_{2j}$ are defined as follows:
$$\delta_{1j} = \frac{1}{\lambda_{1*}}\,\boldsymbol{\xi}_{1*}^\top\mathbf{X}^\top(\mathbf{P}_\omega - \mathbf{P}_{\omega_{-j}})\mathbf{X}\,\boldsymbol{\xi}_{1*}, \qquad \delta_{2j} = \frac{1}{\lambda_{2*}}\operatorname{tr}\,\boldsymbol{\Xi}_{2*}^\top\mathbf{X}^\top(\mathbf{P}_\omega - \mathbf{P}_{\omega_{-j}})\mathbf{X}\,\boldsymbol{\Xi}_{2*},$$
with $\boldsymbol{\xi}_{1*} = \boldsymbol{\Theta}_*\mathbf{h}_1$ and $\boldsymbol{\Xi}_{2*} = \boldsymbol{\Theta}_*\mathbf{H}_2$.
Here, if $j \notin A_*$, then $\delta_{1j} = 0$ and $\delta_{2j} = 0$.
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on the key statistic $T_j^{(3)}$ obtained, as in (27), from
$$T_j^{(3)} = \mathrm{GIC}_3(\omega_{-j}) - \mathrm{GIC}_3(\omega) = n\log\left(1 + \frac{U_{1j}}{V_1}\right) + n(p-1)\log\left(1 + \frac{U_{2j}}{V_2}\right) - dp,$$
whose selection method is given by (10). By Lemma 2, $V_1 \sim \chi^2_{n-k}$, $U_{1j} \sim \chi^2_1(\delta_{1j})$, $V_2 \sim \chi^2_{(n-k)(p-1)}$, and $U_{2j} \sim \chi^2_{p-1}(\delta_{2j})$. For a sufficient condition for the consistency of the KOO method based on $\mathrm{GIC}_3$, we assume
A3u: For any $j \in A_*$, $\delta_{1j}/n$ and $\delta_{2j}/(np)$ are bounded, and
$$\liminf\, \frac{\delta_{1j} + \delta_{2j}}{np} > 0.$$
Note that $V_1/(n-k) \to 1$ and $V_2/\{(n-k)(p-1)\} \to 1$ in probability. Then, under the assumption that $d/n \to 0$ and $j \notin A_*$, we have
$$P\bigl(T_j^{(3)} > 0\bigr) = P\left(\frac{n}{p}\log\left(1 + \frac{U_{1j}}{V_1}\right) + \frac{n(p-1)}{p}\log\left(1 + \frac{U_{2j}}{V_2}\right) > d\right).$$
Related to assumptions A1 and A2, we assume
$$\text{A4u:}\quad \frac{d}{n} \to 0 \quad\text{and}\quad \liminf\, d\,(1 - c_k) > 1.$$
The first part in A4u implies $e^{d/n} - 1 = (1 + o(1))\,d/n$ and that the term involving $U_{1j}$ is asymptotically negligible. It is easy to see that
$$\frac{n}{p}\log\left(1 + \frac{U_{1j}}{V_1}\right) \to 0 \quad\text{and}\quad \frac{n(p-1)}{p}\log\left(1 + \frac{U_{2j}}{V_2}\right) \to \frac{1}{1 - c_k} \quad\text{in probability}. \qquad (44)$$
Further, by chi-squared tail bounds, the convergences are uniform in $j \notin A_*$. Therefore, from (44), we have that [F2] $\lim P\bigl(\max_{j \notin A_*} T_j^{(3)} \le 0\bigr) = 1$.
When $j \in A_*$, we can write $U_{1j} \sim \chi^2_1(\delta_{1j})$ and $U_{2j} \sim \chi^2_{p-1}(\delta_{2j})$. Therefore, we can express [F1] as follows:
$$\lim P\left(\min_{j \in A_*}\left\{n\log\left(1 + \frac{U_{1j}}{V_1}\right) + n(p-1)\log\left(1 + \frac{U_{2j}}{V_2}\right) - dp\right\} > 0\right) = 1,$$
where
$$E[U_{1j}] = 1 + \delta_{1j} \quad\text{and}\quad E[U_{2j}] = p - 1 + \delta_{2j}.$$
Assumptions A3u and A4u easily show that the quantity in braces diverges to infinity in probability for every $j \in A_*$, since $\delta_{1j} + \delta_{2j}$ grows at the rate $np$ while $dp = o(np)$. This implies that [F1] holds, and hence the KOO method is consistent.
These imply the following theorem.
Theorem 3.  Suppose that Assumptions A1, A2, A3u, and A4u are satisfied. Then, the KOO method based on $\mathrm{GIC}_3$ in (40) is asymptotically consistent.