1. Introduction
We focus on a multivariate linear regression model of \(p\) response variables on a subset of \(k\) explanatory variables. Suppose that there are \(n\) observations on a \(p\)-dimensional response vector \(\mathbf{y}\) and a \(k\)-dimensional explanatory vector \(\mathbf{x}\), and let \(Y\) (\(n \times p\)) and \(X\) (\(n \times k\)) be the observation matrices of \(\mathbf{y}\) and \(\mathbf{x}\) with sample size \(n\), respectively. The multivariate linear regression model including all the explanatory variables under normality is written as follows:
\[
Y \sim \mathrm{N}_{n \times p}(X\Theta,\ \Sigma \otimes I_n), \tag{1}
\]
where \(\Theta\) is a \(k \times p\) unknown matrix of regression coefficients, and \(\Sigma\) is a \(p \times p\) unknown covariance matrix that is positive definite. \(\mathrm{N}_{n \times p}(\cdot,\cdot)\) is the normal matrix distribution, such that the mean of \(Y\) is \(X\Theta\), and the covariance matrix of \(\operatorname{vec}(Y)\) is \(\Sigma \otimes I_n\); equivalently, the rows of \(Y\) are independently normal with the same covariance matrix \(\Sigma\). Here, \(\operatorname{vec}(Y)\) is the \(np\)-dimensional column vector that is obtained by stacking the columns of \(Y\) on top of one another. We assume that \(X\) has full column rank \(k\).
In multivariate linear regression, the selection of variables for the model is an important concern. One of the approaches is to first consider variable selection models and then apply model selection criteria such as AIC and BIC. Such a criterion for Full Model (1) is expressed as follows:
\[
\mathrm{GIC} = -2\log\hat{L} + d\,g, \tag{2}
\]
where \(\hat{L}\) is the maximal likelihood, \(d\) is the penalty term, and \(g\) is the number of unknown parameters, given by \(g = kp + p(p+1)/2\) under Model (1). For AIC and BIC, \(d\) is defined as 2 and \(\log n\), respectively. In the selection of \(k\) variables \(x_1, \ldots, x_k\), we identify \(\{x_1, \ldots, x_k\}\) with the index set \(\omega = \{1, \ldots, k\}\), and denote the GIC for a subset \(\mathbf{j} \subseteq \omega\) by \(\mathrm{GIC}_{\mathbf{j}}\). Then, the model selection based on GIC chooses the following model:
\[
\hat{\mathbf{j}}_{\mathrm{GIC}} = \arg\min_{\mathbf{j} \subseteq \omega} \mathrm{GIC}_{\mathbf{j}}. \tag{3}
\]
Here, the minimum is usually taken over all combinations of explanatory variables. There are computational problems for the methods based on (3), including the AIC and BIC methods, since we need to compute \(2^k\) statistics for the selection of \(k\) explanatory variables. To avoid this computational problem, [1] proposed a method that was essentially due to [2]. The method, which was named the knock-one-out (KOO) method by [3], determines “selection” or “no selection” for each variable by comparing the model removing that variable with the full model. More precisely, the KOO method chooses the model or the set of variables given by
\[
\hat{\mathbf{j}}_{\mathrm{KOO}} = \{\, j \in \omega : \mathrm{GIC}_{\omega\setminus j} > \mathrm{GIC}_{\omega} \,\}, \tag{4}
\]
where \(\omega\setminus j\) is a short expression for \(\omega\setminus\{j\}\), which is the set obtained by removing element \(j\) from the set \(\omega\). In general, the KOO method can be applied not only to AIC but to a general variable selection criterion or method.
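To make the computational contrast concrete, the following Python sketch implements the KOO rule (4) on top of a generic criterion; it needs only \(k + 1\) criterion evaluations instead of the \(2^k\) required by (3). This is only an illustrative sketch under the unrestricted-covariance GIC of Section 2: the function names `gic` and `koo_select` and the default penalty \(d = 2\) (AIC) are our own choices, not notation from the references.

```python
import numpy as np

def gic(Y, X, d=2.0):
    """GIC for the regression of Y (n x p) on X (n x k) with an
    unrestricted covariance matrix; requires n - k > p so that the
    covariance estimate is nonsingular.  Constants are retained so
    that all submodels are compared on the same scale."""
    n, p = Y.shape
    k = X.shape[1]
    P = X @ np.linalg.solve(X.T @ X, X.T)        # projection onto span(X)
    Sigma_hat = Y.T @ (np.eye(n) - P) @ Y / n    # MLE of the covariance matrix
    g = k * p + p * (p + 1) // 2                 # number of unknown parameters
    _, logdet = np.linalg.slogdet(Sigma_hat)
    return n * logdet + n * p * (np.log(2 * np.pi) + 1) + d * g

def koo_select(Y, X, d=2.0):
    """Knock-one-out rule (4): keep variable j iff removing it increases the GIC.
    Needs only k + 1 criterion evaluations instead of 2**k."""
    full = gic(Y, X, d)
    return [j for j in range(X.shape[1])
            if gic(Y, np.delete(X, j, axis=1), d) > full]
```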
In the literature on multivariate linear regression, numerous papers have dealt with the variable selection problem as it relates to selecting explanatory variables. When \(\Sigma\) is an unknown positive definite matrix, [4,5,6], for example, indicated that, in a high-dimensional case, AIC and \(C_p\) have consistency properties, but BIC is not necessarily consistent. KOO methods in the multivariate regression model were studied by [3] and [7,8]. For the KOO method in discriminant analysis, see [9] and [10]. For a review, see [11].
In this paper, we assume that the covariance structure is one of three covariance structures: (1) an independent covariance structure with the same variance, (2) an independent covariance structure with different variances, and (3) a uniform covariance structure. The numbers of unknown parameters in Covariance Structures (1)–(3) are 1, \(p\), and 2, respectively. Sufficient conditions for the KOO method given by (4) to be consistent are derived under a high-dimensional asymptotic framework, such that the sample size \(n\), the number \(p\) of response variables, and the number \(k\) of explanatory variables are large, as in \(p/n \to c_p\) and \(k/n \to c_k\), where \(c_p, c_k \in [0, 1)\). Ref. [12] considered similar problems under Covariance Structures (1), (3), and (4), an autoregressive covariance structure, but did not consider them under (2). Moreover, in the study of asymptotic consistencies, they assumed that \(k\) was fixed, but in this paper, \(k\) may tend to infinity, such that \(k/n \to c_k \in [0, 1)\). From the numerical experiments in [12], we know the probability of choosing the true model under Cases (1), an independent covariance structure with the same variance, and (3), a uniform covariance structure; the results are summarized in Table 1. In Table 1, \(k\) is the number of nonzero true explanatory variables, and the true parameter values are omitted.
The present paper is organized as follows. In Section 2, we present notations and preliminaries. In Section 3, we state the KOO methods with Covariance Structures (1)–(3) in terms of key statistics, and describe an approach to proving their consistencies. In Section 4, Section 5 and Section 6, we discuss consistency properties of the KOO methods under Covariance Structures (1)–(3), respectively. In Section 7, our conclusions are discussed.
2. Notations and Preliminaries
Suppose that \(\mathbf{j}\) denotes a subset of \(\omega = \{1, \ldots, k\}\) containing \(k_{\mathbf{j}}\) elements, and \(X_{\mathbf{j}}\) denotes the \(n \times k_{\mathbf{j}}\) matrix comprising the columns of \(X\) indexed by the elements of \(\mathbf{j}\). Then, \(X_\omega = X\). Further, we assume that the covariance matrix \(\Sigma\) has one of the covariance structures considered below. Then, we have a generic candidate model:
\[
M_{\mathbf{j}} : Y \sim \mathrm{N}_{n \times p}(X_{\mathbf{j}}\Theta_{\mathbf{j}},\ \Sigma \otimes I_n), \tag{5}
\]
where \(\Theta_{\mathbf{j}}\) is a \(k_{\mathbf{j}} \times p\) unknown matrix of regression coefficients. We assume that \(X_{\mathbf{j}}\) has full column rank \(k_{\mathbf{j}}\).
When \(\Sigma\) is a general \(p \times p\) unknown covariance matrix, we can write the GIC in (2) for \(M_{\mathbf{j}}\) as follows:
\[
\mathrm{GIC}_{\mathbf{j}} = n\log|\hat\Sigma_{\mathbf{j}}| + np(\log 2\pi + 1) + d\,\{k_{\mathbf{j}}p + p(p+1)/2\}, \tag{6}
\]
where \(\hat\Sigma_{\mathbf{j}} = n^{-1}Y'(I_n - P_{\mathbf{j}})Y\) and \(P_{\mathbf{j}} = X_{\mathbf{j}}(X_{\mathbf{j}}'X_{\mathbf{j}})^{-1}X_{\mathbf{j}}'\). When \(\mathbf{j} = \omega\), model \(M_\omega\) is called the full model. Here, \(\hat\Theta_{\mathbf{j}}\) and \(\hat\Sigma_{\mathbf{j}}\) are defined from \(Y\) and \(X_{\mathbf{j}}\) as the maximum likelihood estimators under \(M_{\mathbf{j}}\): \(\hat\Theta_{\mathbf{j}} = (X_{\mathbf{j}}'X_{\mathbf{j}})^{-1}X_{\mathbf{j}}'Y\) and \(\hat\Sigma_{\mathbf{j}} = n^{-1}Y'(I_n - P_{\mathbf{j}})Y\).
In this paper, we consider the cases in which the covariance matrix belongs to one of the following three structures:
- (1) Independent covariance structure with the same variance (ICSS).
- (2) Independent covariance structure with different variances (ICSD).
- (3) Uniform covariance structure (UCS).
The models considered in this paper can be expressed as in (5) with \(\Sigma = \sigma^2 I_p\), \(\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2)\), and \(\Sigma = \sigma^2\{(1-\rho)I_p + \rho\mathbf{1}_p\mathbf{1}_p'\}\) for ICSS, ICSD, and UCS, respectively. Let \(f(Y; \Theta_{\mathbf{j}}, \Sigma)\) be the density of \(Y\) in (5) with the given covariance structure. In the derivation of the GIC under the covariance structure, we use the following equality:
\[
\min_{\Theta_{\mathbf{j}}}\{-2\log f(Y; \Theta_{\mathbf{j}}, \Sigma)\} = np\log 2\pi + n\log|\Sigma| + \operatorname{tr}\{\Sigma^{-1}Y'(I_n - P_{\mathbf{j}})Y\}. \tag{7}
\]
Let \(\hat\Sigma_{\mathbf{j}}\) be the quantity minimizing the right-hand side of (7) within the given covariance structure. Then, in our model, it is the maximum likelihood estimator of \(\Sigma\) under \(M_{\mathbf{j}}\), and we obtain
\[
\mathrm{GIC}_{\mathbf{j}} = -2\log f(Y; \hat\Theta_{\mathbf{j}}, \hat\Sigma_{\mathbf{j}}) + d\,m_{\mathbf{j}}, \tag{8}
\]
where \(m_{\mathbf{j}}\) is the number of independent unknown parameters under \(M_{\mathbf{j}}\), and \(d\) is a positive constant that may depend on \(n\). For AIC and BIC, \(d\) is defined by 2 ([13]) and \(\log n\) ([14]), respectively.
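As a concrete instance, combining the \(k_{\mathbf{j}}p\) regression coefficients with the covariance parameter counts 1, \(p\), and 2 stated in the Introduction, the values of \(m_{\mathbf{j}}\) in (8) for the three structures are
\[
m_{\mathbf{j}}^{\mathrm{ICSS}} = pk_{\mathbf{j}} + 1, \qquad m_{\mathbf{j}}^{\mathrm{ICSD}} = pk_{\mathbf{j}} + p, \qquad m_{\mathbf{j}}^{\mathrm{UCS}} = pk_{\mathbf{j}} + 2.
\]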
3. Approach to Consistencies of KOO Methods
Our KOO method is based on the key statistics
\[
T_j = \mathrm{GIC}_{\omega\setminus j} - \mathrm{GIC}_{\omega}, \quad j \in \omega. \tag{9}
\]
In fact, the KOO method chooses the following model:
\[
\hat{\mathbf{j}} = \{\, j \in \omega : T_j > 0 \,\}. \tag{10}
\]
Its consistency can be proven by showing the following two properties: [F1] for any \(j \in \mathbf{j}_*\), \(\lim\Pr(T_j \le 0) = 0\); and [F2] for any \(j \notin \mathbf{j}_*\), \(\lim\Pr(T_j > 0) = 0\), as in [11]. The result can be shown by using the following inequality:
\[
1 - \Pr(\hat{\mathbf{j}} = \mathbf{j}_*) \le \sum_{j \in \mathbf{j}_*}\Pr(T_j \le 0) + \sum_{j \notin \mathbf{j}_*}\Pr(T_j > 0). \tag{11}
\]
Here, the first sum bounds the probability that true variables are not selected, and the second sum bounds the probability that nontrue variables are selected. Such notations are used for the other variable selection methods as well. A variable \(j\) is included in the true set \(\mathbf{j}_*\) of variables if the \(j\)th row of the true coefficient matrix \(\Theta_*\) is not a zero vector.
Here, we list some of our main assumptions:
A1: The set \(\mathbf{j}_*\) of the true explanatory variables is included in the full set, i.e., \(\mathbf{j}_* \subseteq \omega\), and the set \(\mathbf{j}_*\) is finite.
A2: The high-dimensional asymptotic framework: \(n \to \infty\), \(p/n \to c_p \in [0, 1)\), and \(k/n \to c_k \in [0, 1)\).
A general model selection criterion \(\hat{\mathbf{j}}\) is high-dimensionally consistent if \(\lim\Pr(\hat{\mathbf{j}} = \mathbf{j}_*) = 1\) under a high-dimensional asymptotic framework. Here, “lim” means the limit under A2.
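As a numerical companion to [F1] and [F2], the following Monte Carlo sketch estimates \(\Pr(\hat{\mathbf{j}} = \mathbf{j}_*)\) for the KOO rule, reusing `koo_select` from the sketch in the Introduction. The data-generating choices (standard normal \(X\), \(\Sigma = I_p\), unit coefficients on the true rows) are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def selection_probability(n, p, k, true_set, reps=200, d=2.0):
    """Monte Carlo estimate of Pr(j_hat = j_*) for the KOO rule."""
    hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, k))
        Theta = np.zeros((k, p))
        Theta[list(true_set), :] = 1.0      # nonzero rows mark the true variables
        Y = X @ Theta + rng.standard_normal((n, p))
        hits += set(koo_select(Y, X, d)) == set(true_set)
    return hits / reps

# Under A1-A2 the estimate should approach 1, e.g.:
# selection_probability(n=100, p=10, k=5, true_set={0, 1})
```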
4. Asymptotic Consistency under an Independent Covariance Structure
In this section, we show the asymptotic consistency of the KOO method on the basis of a general information criterion under an independent covariance structure with the same variance. A generic candidate model when the set of explanatory variables is \(\mathbf{j}\) can be expressed as follows:
\[
M_{\mathbf{j}} : Y \sim \mathrm{N}_{n \times p}(X_{\mathbf{j}}\Theta_{\mathbf{j}},\ \sigma^2 I_p \otimes I_n), \tag{13}
\]
where \(\sigma^2 > 0\) and \(\Theta_{\mathbf{j}}\) is a \(k_{\mathbf{j}} \times p\) unknown matrix. Let us denote the density of \(Y\) under (13) by \(f(Y; \Theta_{\mathbf{j}}, \sigma^2)\). Then, we have
\[
-2\log f(Y; \Theta_{\mathbf{j}}, \sigma^2) = np\log 2\pi\sigma^2 + \sigma^{-2}\operatorname{tr}(Y - X_{\mathbf{j}}\Theta_{\mathbf{j}})'(Y - X_{\mathbf{j}}\Theta_{\mathbf{j}}). \tag{14}
\]
Therefore, the maximum likelihood estimators of \(\Theta_{\mathbf{j}}\) and \(\sigma^2\) under \(M_{\mathbf{j}}\) are given as follows:
\[
\hat\Theta_{\mathbf{j}} = (X_{\mathbf{j}}'X_{\mathbf{j}})^{-1}X_{\mathbf{j}}'Y, \qquad \hat\sigma_{\mathbf{j}}^2 = \frac{1}{np}\operatorname{tr}\{Y'(I_n - P_{\mathbf{j}})Y\}.
\]
General Information Criterion (8) is given by
\[
\mathrm{GIC}_{\mathbf{j}} = np\log(2\pi\hat\sigma_{\mathbf{j}}^2) + np + d\,m_{\mathbf{j}}, \tag{15}
\]
where \(d\) is a positive constant, and \(m_{\mathbf{j}} = pk_{\mathbf{j}} + 1\).
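Under ICSS, the criterion (15) needs only the pooled residual variance; a minimal sketch in the same style as before (the function name is our own choice):

```python
import numpy as np

def gic_icss(Y, X, d=2.0):
    """GIC (15) under ICSS: Sigma = sigma^2 * I_p, with m_j = p * k_j + 1."""
    n, p = Y.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)
    sigma2_hat = np.trace(Y.T @ (np.eye(n) - P) @ Y) / (n * p)  # pooled MLE
    m = p * X.shape[1] + 1
    return n * p * (np.log(2 * np.pi * sigma2_hat) + 1) + d * m
```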
Using (9) and (15), we have
\[
T_j = np\log\frac{\hat\sigma_{\omega\setminus j}^2}{\hat\sigma_{\omega}^2} - dp = np\log\Big(1 + \frac{V_j}{U}\Big) - dp, \tag{16}
\]
where \(U\) and \(V_j\) are independently distributed as a central and a noncentral chi-squared distribution, respectively. More precisely, assume that the true model is (13) with \(\mathbf{j} = \mathbf{j}_*\), \(\Theta_{\mathbf{j}} = \Theta_*\), and \(\sigma^2 = \sigma_*^2\), and let
\[
U = \sigma_*^{-2}\operatorname{tr}\{Y'(I_n - P_\omega)Y\}, \qquad V_j = \sigma_*^{-2}\operatorname{tr}\{Y'(P_\omega - P_{\omega\setminus j})Y\}. \tag{17}
\]
Then, using basic distributional properties (see, [15]) on quadratic forms of normal variates and Wishart matrices, we have the following results:
\[
U \sim \chi^2_{p(n-k)}, \qquad V_j \sim \chi^2_{p}(\delta_j^2), \tag{18}
\]
where noncentrality parameter \(\delta_j^2\) is defined by
\[
\delta_j^2 = \sigma_*^{-2}\operatorname{tr}\{\Theta_*'X'(P_\omega - P_{\omega\setminus j})X\Theta_*\}.
\]
If \(j \notin \mathbf{j}_*\), \(\delta_j^2 = 0\), and if \(j \in \mathbf{j}_*\), in general, \(\delta_j^2 > 0\). For a sufficient condition for the consistency of the KOO method based on (15), we assumed
\[
\text{A3v: for any } j \in \mathbf{j}_*, \quad \lim \frac{\delta_j^2}{p\,d} > 1 - c_k. \tag{19}
\]
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on \(\mathrm{GIC}_{\mathbf{j}}\) in (15), whose selection method is given by (10). When \(j \notin \mathbf{j}_*\), from (16), we can write
\[
\Pr(T_j > 0) = \Pr\{np\log(1 + V_j/U) > dp\} = \Pr\{V_j/U > e^{d/n} - 1\}. \tag{20}
\]
Note that \(\mathrm{E}(U) = p(n-k)\) and \(\mathrm{E}(V_j) = p\). Then, under the assumption A2, we have
\[
\frac{U}{np} \to 1 - c_k \quad \text{in probability}. \tag{21}
\]
Related to the assumption A3v, we assumed
A4v: \(d/n \to 0\) and \(\lim d\,(1 - c_k) > 1\).
The first part in A4v implies \(n(e^{d/n} - 1) = d\{1 + o(1)\}\). It is easy to see that
\[
\Pr\{V_j/U > e^{d/n} - 1\} = \Pr\Big\{\frac{V_j}{p} > \frac{U}{np}\cdot n(e^{d/n} - 1)\Big\} = \Pr\Big\{\frac{V_j}{p} > (1 - c_k)\,d\,\{1 + o_p(1)\}\Big\}. \tag{22}
\]
Here, for the last equality, assumption A2 and the first part of A4v are required. Further, \(V_j/p \to 1\) in probability as \(p \to \infty\). Therefore, from (22), we have that [F2] \(\lim\Pr(T_j > 0) = 0\) holds under the second part of A4v.
When \(j \in \mathbf{j}_*\), we can write \(V_j \sim \chi^2_p(\delta_j^2)\) with \(\delta_j^2 > 0\). Therefore, we can express [F1] as
\[
\Pr(T_j \le 0) = \Pr\{V_j \le U(e^{d/n} - 1)\},
\]
where
\[
\mathrm{E}(V_j) = p + \delta_j^2, \qquad \mathrm{E}\{U(e^{d/n} - 1)\} = p(n-k)(e^{d/n} - 1) = p\,d\,(1 - c_k)\{1 + o(1)\}.
\]
Assumptions A3v and A4v easily show that
\[
\lim \frac{\mathrm{E}\{U(e^{d/n} - 1)\}}{\mathrm{E}(V_j)} = \lim \frac{p\,d\,(1 - c_k)}{p + \delta_j^2} < 1,
\]
and both \(V_j\) and \(U\) concentrate around their means under A2. This implies that [F1] \(\lim\Pr(T_j \le 0) = 0\).
These imply the following theorem.
Theorem 1. Suppose that Assumptions A1, A2, A3v, and A4v are satisfied. Then, the KOO method based on the general information criterion defined by (15) is asymptotically consistent.

An alternative approach for “[F1]”. When \(j \in \mathbf{j}_*\), we can write
\[
\Pr(T_j \le 0) = \Pr\Big\{\frac{V_j}{U} \le e^{d/n} - 1\Big\}.
\]
Therefore, we have
\[
\Pr(T_j \le 0) \le \Pr\{V_j \le \mathrm{E}(V_j)\,\kappa\} + \Pr\big[U(e^{d/n} - 1) \ge \mathrm{E}(V_j)\,\kappa\big],
\]
where, for any constant \(\kappa\) with \(\lim p\,d\,(1 - c_k)/\mathrm{E}(V_j) < \kappa < 1\),
\[
\mathrm{Var}(V_j) = 2(p + 2\delta_j^2), \qquad \mathrm{Var}(U) = 2p(n - k).
\]
Then, under A2, A3v in (19) and the assumption \(d/n \to 0\) (or equivalently \(n(e^{d/n} - 1) = d\{1 + o(1)\}\)), we have
\[
\Pr\big[U(e^{d/n} - 1) \ge \mathrm{E}(V_j)\,\kappa\big] \to 0.
\]
It is easily seen that
\[
\Pr\{V_j \le \mathrm{E}(V_j)\,\kappa\} \le \frac{\mathrm{Var}(V_j)}{(1 - \kappa)^2\{\mathrm{E}(V_j)\}^2},
\]
where the right-hand side follows from Chebyshev's inequality, and under A2 and A3v,
\[
\frac{\mathrm{Var}(V_j)}{\{\mathrm{E}(V_j)\}^2} = \frac{2(p + 2\delta_j^2)}{(p + \delta_j^2)^2} \to 0.
\]
These imply that [F1] holds. In this approach, it was assumed that \(d/n \to 0\) (or equivalently \(n(e^{d/n} - 1) = d\{1 + o(1)\}\)).
5. Asymptotic Consistency under an Independent Covariance Structure with Different Variances
In this section, we assume that the covariance matrix \(\Sigma\) has an independent covariance structure with different variances, i.e., \(\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2)\). First, let us consider deriving a key statistic \(T_j\) as in (9). Consider a candidate model with \(\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2)\),
\[
M_{\mathbf{j}} : Y \sim \mathrm{N}_{n \times p}(X_{\mathbf{j}}\Theta_{\mathbf{j}},\ \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2) \otimes I_n). \tag{24}
\]
Let the density in the full model be expressed as \(f(Y; \Theta_\omega, \sigma_1^2, \ldots, \sigma_p^2)\). Then, we have
\[
\min\{-2\log f\} = n\sum_{i=1}^{p}\log\hat\sigma_{\omega i}^2 + np(\log 2\pi + 1), \qquad \hat\sigma_{\omega i}^2 = n^{-1}\mathbf{y}_i'(I_n - P_\omega)\mathbf{y}_i, \tag{25}
\]
where \(\mathbf{y}_i\) is the \(i\)th column of \(Y\). Next, consider the model removing the \(j\)th explanatory variable from the full model \(M_\omega\), which is denoted by \(M_{\omega\setminus j}\) or simply \(\omega\setminus j\). Similarly,
\[
\min\{-2\log f\} = n\sum_{i=1}^{p}\log\hat\sigma_{\omega\setminus j, i}^2 + np(\log 2\pi + 1), \qquad \hat\sigma_{\omega\setminus j, i}^2 = n^{-1}\mathbf{y}_i'(I_n - P_{\omega\setminus j})\mathbf{y}_i. \tag{26}
\]
Using (25) and (26), we can obtain the general information criterion (8) for the two models \(M_\omega\) and \(M_{\omega\setminus j}\), and we have
\[
T_j = \mathrm{GIC}_{\omega\setminus j} - \mathrm{GIC}_\omega = \sum_{i=1}^{p} n\log\Big(1 + \frac{V_{ji}}{U_i}\Big) - dp, \tag{27}
\]
where
\[
U_i = \sigma_{*i}^{-2}\,\mathbf{y}_i'(I_n - P_\omega)\mathbf{y}_i, \qquad V_{ji} = \sigma_{*i}^{-2}\,\mathbf{y}_i'(P_\omega - P_{\omega\setminus j})\mathbf{y}_i, \quad i = 1, \ldots, p. \tag{28}
\]
Then, as in (18), we have the following results:
\[
U_i \sim \chi^2_{n-k}, \qquad V_{ji} \sim \chi^2_1(\delta_{ji}^2), \qquad U_1, \ldots, U_p, V_{j1}, \ldots, V_{jp} \ \text{mutually independent}, \tag{29}
\]
where noncentral parameters \(\delta_{ji}^2\) are defined by
\[
\delta_{ji}^2 = \sigma_{*i}^{-2}\,\theta_{*ji}^2\, r, \tag{31}
\]
with
\[
r = \mathbf{x}_j'(I_n - P_{\omega\setminus j})\mathbf{x}_j, \tag{32}
\]
where \(\theta_{*ji}\) is the \((j, i)\)th element of the true coefficient matrix \(\Theta_*\) and \(\mathbf{x}_j\) is the \(j\)th column of \(X\). If \(j \notin \mathbf{j}_*\), \(\delta_{ji}^2 = 0\), and if \(j \in \mathbf{j}_*\), in general, \(\delta_{ji}^2 > 0\). For a sufficient condition for consistency of the KOO method based on \(T_j\), we assumed
A3b: For any \(j \in \mathbf{j}_*\), \(\displaystyle \lim \frac{1}{p\,d}\sum_{i=1}^{p}\delta_{ji}^2 > 1 - c_k\).
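A minimal sketch of the ICSD criterion assembled from (25) and (26), in the same hedged style as the earlier sketches (the function name is our own choice):

```python
import numpy as np

def gic_icsd(Y, X, d=2.0):
    """GIC under ICSD: Sigma = diag(sigma_1^2, ..., sigma_p^2), m_j = p*k_j + p."""
    n, p = Y.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)
    resid = (np.eye(n) - P) @ Y
    sigma2_hat = (resid ** 2).sum(axis=0) / n    # per-response MLE variances
    m = p * X.shape[1] + p
    return n * np.log(sigma2_hat).sum() + n * p * (np.log(2 * np.pi) + 1) + d * m
```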
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on \(T_j\) in (9), whose selection method is given by (10). When \(j \notin \mathbf{j}_*\), we have
\[
\Pr(T_j > 0) = \Pr\Big\{\sum_{i=1}^{p} n\log\Big(1 + \frac{V_{ji}}{U_i}\Big) > dp\Big\} \le \Pr\Big\{\sum_{i=1}^{p}\frac{n\,V_{ji}}{U_i} > dp\Big\},
\]
using \(\log(1 + x) \le x\). Note that \(\mathrm{E}(U_i) = n - k\) and \(\mathrm{E}(V_{ji}) = 1\). Then, under the assumption A2, we have \(U_i/n \to 1 - c_k\) in probability, uniformly in \(i\). Related to the assumption A3b, we assumed
A4b: \(d/n \to 0\) and \(\lim d\,(1 - c_k) > 1\).
The first part in A4b implies \(n(e^{d/n} - 1) = d\{1 + o(1)\}\). It is easy to see that
\[
\frac{1}{p}\sum_{i=1}^{p} V_{ji} \to 1 \quad \text{in probability as } p \to \infty. \tag{33}
\]
Further, \(\sum_{i} n\,V_{ji}/U_i = (1 - c_k)^{-1}\{1 + o_p(1)\}\sum_{i} V_{ji}\). Therefore, from (33), we have that [F2] \(\lim\Pr(T_j > 0) = 0\) under the second part of A4b.
When \(j \in \mathbf{j}_*\), we can write \(V_{ji} \sim \chi^2_1(\delta_{ji}^2)\). Therefore, we can express [F1] as follows:
\[
\Pr(T_j \le 0) = \Pr\Big\{\sum_{i=1}^{p} n\log\Big(1 + \frac{V_{ji}}{U_i}\Big) \le dp\Big\}, \tag{34}
\]
where
\[
\mathrm{E}\Big(\sum_{i=1}^{p} V_{ji}\Big) = p + \sum_{i=1}^{p}\delta_{ji}^2.
\]
Assumptions A3b and A4b easily show that
\[
\lim \frac{p\,d\,(1 - c_k)}{p + \sum_{i=1}^{p}\delta_{ji}^2} < 1.
\]
This implies that [F1] \(\lim\Pr(T_j \le 0) = 0\).
These imply the following theorem.
Theorem 2. Suppose that Assumptions A1, A2, A3b, and A4b are satisfied. Then, the KOO method based on \(T_j\) in (27) is asymptotically consistent.

Let us consider an alternative approach for “[F1]” as in the case of the independent covariance structure with the same variance. When \(j \in \mathbf{j}_*\), we can write
\[
\Pr(T_j \le 0) \le \Pr\Big\{\sum_{i=1}^{p}\frac{V_{ji}}{U_i + V_{ji}} \le \frac{dp}{n}\Big\}, \tag{35}
\]
using \(\log(1 + x) \ge x/(1 + x)\). Here, for \(i = 1, \ldots, p\),
\[
\delta_{ji}^2 = \sigma_{*i}^{-2}\,\theta_{*ji}^2\, r,
\]
where \(r\) is the same one as in (32). Note that \(V_{ji}\), \(i = 1, \ldots, p\), are distributed as noncentral chi-squared distributions \(\chi^2_1(\delta_{ji}^2)\), and they are independent. Then, we work under the assumption \(d/n \to 0\) (or equivalently \(n(e^{d/n} - 1) = d\{1 + o(1)\}\)). In the above upper bound, it holds, by the independence of \(V_{ji}\) and \(U_i\), that
\[
\mathrm{E}\Big(\frac{V_{ji}}{U_i}\Big) = \mathrm{E}(V_{ji})\,\mathrm{E}\Big(\frac{1}{U_i}\Big) = \frac{1 + \delta_{ji}^2}{n - k - 2}.
\]
Useful bounds are obtained by giving the first few moments of \(U_i^{-1}\). For example,
\[
\mathrm{E}\Big(\frac{1}{\chi^2_m}\Big) = \frac{1}{m - 2}, \qquad \mathrm{E}\Big(\frac{1}{(\chi^2_m)^2}\Big) = \frac{1}{(m - 2)(m - 4)} \quad (m > 4).
\]
Then, Bound (35) with these moments can be asymptotically evaluated by Chebyshev's inequality, and it tends to 0 under A2, A3b, and \(d/n \to 0\). The above expression is under the assumption that \(p^{-1}\sum_{i=1}^{p}\delta_{ji}^2\) tends to a (possibly infinite) quantity.
6. Asymptotic Consistency under a Uniform Covariance Structure
In this section, we show the asymptotic consistency of the KOO method based on a general information criterion under a uniform covariance structure. First, following [12], we derive a GIC as in (6), and a key statistic \(T_j\) as in (9). A uniform covariance structure is given by \(\sigma_{ab} = \sigma^2\{\delta_{ab} + \rho(1 - \delta_{ab})\}\) with Kronecker delta \(\delta_{ab}\). The covariance structure is expressed as follows:
\[
\Sigma_u = \sigma^2\{(1 - \rho)I_p + \rho\mathbf{1}_p\mathbf{1}_p'\} = \gamma_1 P_1 + \gamma_2(I_p - P_1), \tag{36}
\]
where \(\gamma_1 = \sigma^2\{1 + (p - 1)\rho\}\), \(\gamma_2 = \sigma^2(1 - \rho)\), and \(P_1 = p^{-1}\mathbf{1}_p\mathbf{1}_p'\) with \(\mathbf{1}_p = (1, \ldots, 1)'\). Matrices \(P_1\) and \(I_p - P_1\) are orthogonal idempotent matrices, so we have
\[
\Sigma_u^{-1} = \gamma_1^{-1}P_1 + \gamma_2^{-1}(I_p - P_1), \qquad |\Sigma_u| = \gamma_1\,\gamma_2^{\,p-1}. \tag{37}
\]
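For example, with \(p = 3\), \(\sigma^2 = 1\), and \(\rho = 0.5\),
\[
\gamma_1 = \sigma^2\{1 + (p - 1)\rho\} = 2, \qquad \gamma_2 = \sigma^2(1 - \rho) = 0.5,
\]
so that \(|\Sigma_u| = \gamma_1\gamma_2^{p-1} = 0.5\); more generally, \(\Sigma_u\) stays positive definite as long as \(-1/(p - 1) < \rho < 1\).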
Now, we consider the multivariate regression model \(M_{\mathbf{j}}\) given by
\[
M_{\mathbf{j}} : Y \sim \mathrm{N}_{n \times p}(X_{\mathbf{j}}\Theta_{\mathbf{j}},\ \Sigma_u \otimes I_n), \tag{38}
\]
where \(\Sigma_u = \gamma_1 P_1 + \gamma_2(I_p - P_1)\). Let \(H = (\mathbf{h}_1, H_2)\) be an orthogonal matrix where \(\mathbf{h}_1 = p^{-1/2}\mathbf{1}_p\), and let
\[
\mathbf{u}_1 = Y\mathbf{h}_1, \qquad U_2 = YH_2.
\]
Here, \(\mathbf{h}_1\) is a characteristic vector of \(\Sigma_u\) corresponding to \(\gamma_1\), and each column vector of \(H_2\) is a characteristic vector of \(\Sigma_u\) corresponding to \(\gamma_2\). Let the density function of \(Y\) under \(M_{\mathbf{j}}\) be denoted by \(f(Y; \Theta_{\mathbf{j}}, \gamma_1, \gamma_2)\). Then, we have
\[
-2\log f(Y; \Theta_{\mathbf{j}}, \gamma_1, \gamma_2) = np\log 2\pi + n\log\gamma_1 + n(p - 1)\log\gamma_2 + \gamma_1^{-1}Q_1(\Theta_{\mathbf{j}}) + \gamma_2^{-1}Q_2(\Theta_{\mathbf{j}}), \tag{39}
\]
where \(Q_1(\Theta_{\mathbf{j}}) = \|\mathbf{u}_1 - X_{\mathbf{j}}\Theta_{\mathbf{j}}\mathbf{h}_1\|^2\) and \(Q_2(\Theta_{\mathbf{j}}) = \operatorname{tr}(U_2 - X_{\mathbf{j}}\Theta_{\mathbf{j}}H_2)'(U_2 - X_{\mathbf{j}}\Theta_{\mathbf{j}}H_2)\). Therefore, the maximum likelihood estimators of \(\Theta_{\mathbf{j}}\), \(\gamma_1\), and \(\gamma_2\) under \(M_{\mathbf{j}}\) are given by
\[
\hat\Theta_{\mathbf{j}} = (X_{\mathbf{j}}'X_{\mathbf{j}})^{-1}X_{\mathbf{j}}'Y, \qquad \hat\gamma_{\mathbf{j}1} = n^{-1}\mathbf{u}_1'(I_n - P_{\mathbf{j}})\mathbf{u}_1, \qquad \hat\gamma_{\mathbf{j}2} = \{n(p - 1)\}^{-1}\operatorname{tr}\,U_2'(I_n - P_{\mathbf{j}})U_2.
\]
The number of independent parameters under \(M_{\mathbf{j}}\) is \(m_{\mathbf{j}} = pk_{\mathbf{j}} + 2\). Noting that \(H'\Sigma_u H\) is diagonal, we can obtain the general information criterion (GIC) in (8) for \(M_{\mathbf{j}}\) in (38) as follows:
\[
\mathrm{GIC}_{\mathbf{j}} = n\log\hat\gamma_{\mathbf{j}1} + n(p - 1)\log\hat\gamma_{\mathbf{j}2} + np(\log 2\pi + 1) + d(pk_{\mathbf{j}} + 2). \tag{40}
\]
Here, the key statistic \(T_j = \mathrm{GIC}_{\omega\setminus j} - \mathrm{GIC}_\omega\) and its components are defined as follows:
\[
T_j = n\log\Big(1 + \frac{V_{j1}}{S_1}\Big) + n(p - 1)\log\Big(1 + \frac{V_{j2}}{S_2}\Big) - dp, \tag{41}
\]
using the following \(S_1\), \(S_2\), \(V_{j1}\), \(V_{j2}\):
\[
S_1 = \gamma_{*1}^{-1}\mathbf{u}_1'(I_n - P_\omega)\mathbf{u}_1, \qquad V_{j1} = \gamma_{*1}^{-1}\mathbf{u}_1'(P_\omega - P_{\omega\setminus j})\mathbf{u}_1,
\]
\[
S_2 = \gamma_{*2}^{-1}\operatorname{tr}\,U_2'(I_n - P_\omega)U_2, \qquad V_{j2} = \gamma_{*2}^{-1}\operatorname{tr}\,U_2'(P_\omega - P_{\omega\setminus j})U_2.
\]
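A minimal sketch of the UCS criterion (40), computed through the split along \(\mathbf{h}_1 = p^{-1/2}\mathbf{1}_p\) described above (the function name is our own choice; it requires \(p \ge 2\)):

```python
import numpy as np

def gic_ucs(Y, X, d=2.0):
    """GIC (40) under UCS, using the split along h_1 = 1_p / sqrt(p)."""
    n, p = Y.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)
    Q = np.eye(n) - P
    u1 = Y.sum(axis=1) / np.sqrt(p)              # u_1 = Y h_1
    q1 = u1 @ Q @ u1                             # residual sum of squares along h_1
    gamma1_hat = q1 / n
    # tr(Y' Q Y) splits into the h_1 part and the H_2 part
    gamma2_hat = (np.trace(Y.T @ Q @ Y) - q1) / (n * (p - 1))
    m = p * X.shape[1] + 2
    return (n * np.log(gamma1_hat) + n * (p - 1) * np.log(gamma2_hat)
            + n * p * (np.log(2 * np.pi) + 1) + d * m)
```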
Related to the distributional reductions of \(S_1\), \(S_2\), \(V_{j1}\), and \(V_{j2}\), we use the following lemma frequently.
Lemma 1. Let \(W\) have a noncentral Wishart distribution \(W_p(m, \Sigma; \Omega)\). Let the covariance matrix \(\Sigma\) be decomposed into characteristic roots and vectors as follows:
\[
\Sigma = \sum_{i=1}^{p}\lambda_i\mathbf{h}_i\mathbf{h}_i',
\]
where \(\lambda_1 \ge \cdots \ge \lambda_p > 0\) and \(H = (\mathbf{h}_1, \ldots, \mathbf{h}_p)\) is an orthogonal matrix. Then, \(\lambda_i^{-1}\mathbf{h}_i'W\mathbf{h}_i\), \(i = 1, \ldots, p\), are independently distributed to noncentral chi-squared distributions with \(m\) degrees of freedom and noncentrality parameters \(\lambda_i^{-1}\mathbf{h}_i'\Omega\mathbf{h}_i\). Proof. The result may be proven by considering the characteristic function of \((\lambda_1^{-1}\mathbf{h}_1'W\mathbf{h}_1, \ldots, \lambda_p^{-1}\mathbf{h}_p'W\mathbf{h}_p)\), which is expressed as follows (see Theorem 2.1.2 in [15]):
\[
\prod_{a=1}^{p}(1 - 2it_a)^{-m/2}\exp\Big\{\frac{it_a\,\lambda_a^{-1}\mathbf{h}_a'\Omega\mathbf{h}_a}{1 - 2it_a}\Big\},
\]
where \(i = \sqrt{-1}\). The result can be easily obtained by checking that the above last expression equals the product of the characteristic functions of independent \(\chi^2_m(\lambda_a^{-1}\mathbf{h}_a'\Omega\mathbf{h}_a)\) variables. □
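Lemma 1 is easy to check by simulation in the central case \(\Omega = 0\): each \(\lambda_i^{-1}\mathbf{h}_i'W\mathbf{h}_i\) should then behave as a \(\chi^2_m\) variable, with mean \(m\) and variance \(2m\). The sketch below is illustrative only; the choice of \(\Sigma\) is arbitrary.

```python
import numpy as np

def check_lemma1(m=50, p=4, reps=20000, seed=1):
    """Monte Carlo check of Lemma 1 for a central Wishart W_p(m, Sigma)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((p, p))
    Sigma = A @ A.T + p * np.eye(p)              # an arbitrary covariance matrix
    lam, H = np.linalg.eigh(Sigma)               # characteristic roots and vectors
    L = np.linalg.cholesky(Sigma)
    qs = np.empty((reps, p))
    for r in range(reps):
        Z = L @ rng.standard_normal((p, m))      # W = Z Z' ~ W_p(m, Sigma)
        W = Z @ Z.T
        qs[r] = np.einsum('ia,ij,ja->a', H, W, H) / lam
    return qs.mean(axis=0), qs.var(axis=0)       # approx (m, ..., m) and (2m, ..., 2m)
```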
Assume that the true model is expressed as
\[
Y \sim \mathrm{N}_{n \times p}(X\Theta_*,\ \Sigma_{u*} \otimes I_n), \tag{42}
\]
where \(\Sigma_{u*} = \gamma_{*1}P_1 + \gamma_{*2}(I_p - P_1)\). Using Lemma 1, we have the following lemma.
Lemma 2. Under True Model (42), it holds that
- (1) \(S_1\) and \(V_{j1}\) are independently distributed to a central chi-squared distribution \(\chi^2_{n-k}\) and a noncentral chi-squared distribution \(\chi^2_1(\delta_{j1}^2)\), respectively.
- (2) \(S_2\) and \(V_{j2}\) are independently distributed to a central chi-squared distribution \(\chi^2_{(n-k)(p-1)}\) and a noncentral chi-squared distribution \(\chi^2_{p-1}(\delta_{j2}^2)\), respectively.
- (3) Noncentrality parameters \(\delta_{j1}^2\) and \(\delta_{j2}^2\) are defined as follows:
\[
\delta_{j1}^2 = \gamma_{*1}^{-1}\mathbf{h}_1'\Theta_*'X'(P_\omega - P_{\omega\setminus j})X\Theta_*\mathbf{h}_1, \qquad \delta_{j2}^2 = \gamma_{*2}^{-1}\operatorname{tr}\,H_2'\Theta_*'X'(P_\omega - P_{\omega\setminus j})X\Theta_*H_2. \tag{43}
\]
Here, if \(j \notin \mathbf{j}_*\), then \(\delta_{j1}^2 = 0\) and \(\delta_{j2}^2 = 0\).
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on \(T_j\) in (41), whose selection method is given by (10). For a sufficient condition for the consistency of the KOO method, we assumed
A3u: For any \(j \in \mathbf{j}_*\), the limits of \(\delta_{j1}^2/(pd)\) and \(\delta_{j2}^2/(pd)\) exist, and
\[
\lim \frac{\delta_{j1}^2 + \delta_{j2}^2}{p\,d} > 1 - c_k.
\]
Note that \(\mathrm{E}(S_1) = n - k\) and \(\mathrm{E}(S_2) = (n - k)(p - 1)\). Then, under the assumption that A2 holds and \(j \notin \mathbf{j}_*\), we have
\[
\frac{S_1}{n} \to 1 - c_k, \qquad \frac{S_2}{np} \to 1 - c_k \quad \text{in probability}. \tag{44}
\]
Related to assumptions A3u and A2, we assumed
A4u: \(d/n \to 0\) and \(\lim d\,(1 - c_k) > 1\).
The first part in A4u implies \(n(e^{d/n} - 1) = d\{1 + o(1)\}\) and \(V_{j1}/(dp) \to 0\) in probability for \(j \notin \mathbf{j}_*\). It is easy to see that
\[
\Pr(T_j > 0) \le \Pr\Big\{\frac{n\,V_{j1}}{S_1} + \frac{n(p - 1)\,V_{j2}}{S_2} > dp\Big\},
\]
using \(\log(1 + x) \le x\). Further, \(V_{j1} \sim \chi^2_1\) and \(V_{j2}/(p - 1) \to 1\) in probability for \(j \notin \mathbf{j}_*\). Therefore, from (44), we have that [F2] \(\lim\Pr(T_j > 0) = 0\) under the second part of A4u.
When \(j \in \mathbf{j}_*\), we can write
\[
\Pr(T_j \le 0) = \Pr\Big\{n\log\Big(1 + \frac{V_{j1}}{S_1}\Big) + n(p - 1)\log\Big(1 + \frac{V_{j2}}{S_2}\Big) \le dp\Big\}.
\]
Therefore, we can express [F1] as follows:
\[
\Pr(T_j \le 0) \le \Pr\Big\{\frac{n\,V_{j1}}{S_1 + V_{j1}} + \frac{n(p - 1)\,V_{j2}}{S_2 + V_{j2}} \le dp\Big\},
\]
where
\[
\mathrm{E}(V_{j1}) = 1 + \delta_{j1}^2, \qquad \mathrm{E}(V_{j2}) = p - 1 + \delta_{j2}^2.
\]
Assumptions A3u and A4u easily show that
\[
\lim \frac{p\,d\,(1 - c_k)}{p + \delta_{j1}^2 + \delta_{j2}^2} < 1.
\]
This implies that \(\lim\Pr(T_j \le 0) = 0\), and [F1] holds.
These imply the following theorem.
Theorem 3. Suppose that Assumptions A1, A2, A3u, and A4u are satisfied. Then, the KOO method based on the GIC defined by (40) is asymptotically consistent.