Article

High-Dimensional Consistencies of KOO Methods for the Selection of Variables in Multivariate Linear Regression Models with Covariance Structures

by Yasunori Fujikoshi 1,* and Tetsuro Sakurai 2

1 Department of Mathematics, Graduate School of Science, Hiroshima University, 1-3-2 Kagamiyama, Hiroshima 739-8626, Japan
2 School of General and Management Studies, Suwa University of Science, 5000-1 Toyohira, Chino 391-0292, Japan
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(3), 671; https://doi.org/10.3390/math11030671
Submission received: 18 November 2022 / Revised: 28 December 2022 / Accepted: 17 January 2023 / Published: 28 January 2023
(This article belongs to the Special Issue Limit Theorems of Probability Theory)

Abstract: In this paper, we consider the high-dimensional consistencies of KOO methods for selecting explanatory variables in multivariate linear regression with covariance structures. The covariance structures considered are (1) an independent covariance structure with the same variance, (2) an independent covariance structure with different variances, and (3) a uniform covariance structure. For each structure, a sufficient condition for model selection consistency of the KOO method is obtained under a high-dimensional asymptotic framework in which the sample size n, the number p of response variables, and the number k of explanatory variables are all large, with $p/n \to c_1 \in (0,1)$ and $k/n \to c_2 \in [0,1)$, where $c_1 + c_2 < 1$.

1. Introduction

We focus on a multivariate linear regression model relating p response variables $y_1, \ldots, y_p$ to a subset of k candidate explanatory variables $x_1, \ldots, x_k$. Suppose that there are n observations on the p-dimensional response vector $y = (y_1, \ldots, y_p)'$ and the k-dimensional explanatory vector $x = (x_1, \ldots, x_k)'$, and let $Y : n \times p$ and $X : n \times k$ be the corresponding observation matrices. The multivariate linear regression model including all the explanatory variables is, under normality, written as
$$Y \sim N_{n \times p}(X\Theta, \Sigma \otimes I_n), \tag{1}$$
where Θ is a $k \times p$ unknown matrix of regression coefficients, and Σ is a $p \times p$ unknown covariance matrix that is positive definite. Here, $N_{n \times p}(\cdot, \cdot)$ denotes the matrix normal distribution, such that the mean of Y is $X\Theta$ and the covariance matrix of $\mathrm{vec}(Y)$ is $\Sigma \otimes I_n$; equivalently, the rows of Y are independent normal vectors with common covariance matrix Σ. Here, $\mathrm{vec}(Y)$ is the $np \times 1$ vector obtained by stacking the columns of Y on top of one another. We assume that $\mathrm{rank}(X) = k$.
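For concreteness, the following minimal sketch (ours, not from the paper; Python with numpy, all sizes and parameter values arbitrary illustrations) generates one draw from Model (1): the rows of Y are independent $N_p$ vectors with common covariance Σ.

import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 5, 3
X = rng.standard_normal((n, k))          # n x k design matrix
Theta = rng.standard_normal((k, p))      # k x p regression coefficients
Sigma = 2.0 * np.eye(p)                  # ICSS example: sigma_v^2 I_p
L = np.linalg.cholesky(Sigma)
E = rng.standard_normal((n, p)) @ L.T    # rows ~ N_p(0, Sigma)
Y = X @ Theta + E                        # Y ~ N_{n x p}(X Theta, Sigma (x) I_n)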
In multivariate linear regression, the selection of variables is an important concern. One common approach is to specify candidate variable selection models and then apply a model selection criterion such as AIC or BIC. Such a criterion for the full model (1) is expressed as
$$\mathrm{GIC} = -2\log L(\hat{\Xi}) + dg, \tag{2}$$
where $L(\hat{\Xi})$ is the maximized likelihood, $\Xi = (\Theta, \Sigma)$, $d > 0$ is a penalty constant, and $g = kp + \frac{1}{2}p(p+1)$ is the number of unknown parameters. For AIC and BIC, d is defined as 2 and $\log n$, respectively. In the selection of the k variables $x_1, \ldots, x_k$, we identify $\{x_1, \ldots, x_k\}$ with the index set $\omega = \{1, \ldots, k\}$ and denote the GIC for a subset $j \subseteq \omega$ by $\mathrm{GIC}_j$. Then, the model selection based on GIC chooses the following model:
$$\tilde{j} = \arg\min_{j} \mathrm{GIC}_j. \tag{3}$$
Here, the minimum is usually taken over all $2^k - 1$ nonempty subsets of ω. This creates a computational problem for the methods based on GIC, including the AIC and BIC methods, since $2^k - 1$ statistics must be computed to select among k explanatory variables. To avoid this computational problem, [1] proposed a method that is essentially due to [2]. The method, named the knock-one-out (KOO) method by [3], decides "selection" or "no selection" for each variable by comparing the full model with the model obtained by removing that variable. More precisely, the KOO method chooses the model, or the set of variables, given by
$$\hat{j} = \{ j \in \omega \mid \mathrm{GIC}_{\omega_{-j}} > \mathrm{GIC}_{\omega} \}, \tag{4}$$
where $\omega_{-j}$ is a short expression for $\omega \setminus \{j\}$, the set obtained by removing element j from ω. In general, the KOO idea can be applied not only to AIC but to any variable selection criterion or method; a schematic implementation is sketched below.
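The following sketch (ours, with a hypothetical callable gic(subset) evaluating the criterion for a given index set) contrasts exhaustive GIC minimization over all $2^k - 1$ nonempty subsets with the KOO rule (4), which needs only $k + 1$ criterion evaluations.

import itertools

def select_exhaustive(gic, k):
    # argmin of gic over all 2^k - 1 nonempty subsets of {0, ..., k-1}
    subsets = (set(s) for r in range(1, k + 1)
               for s in itertools.combinations(range(k), r))
    return min(subsets, key=gic)

def select_koo(gic, k):
    # KOO rule (4): keep variable j iff removing it from the full model raises GIC
    omega = set(range(k))
    gic_full = gic(omega)
    return {j for j in omega if gic(omega - {j}) > gic_full}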
In the literature on multivariate linear regression, numerous papers have dealt with the variable selection problem as it relates to selecting explanatory variables. When Σ is an unknown positive definite matrix, [4,5,6], for example, showed that in a high-dimensional setting, AIC and $C_p$ have consistency properties, but BIC is not necessarily consistent. KOO methods in the multivariate regression model were studied by [3,7,8]; for the KOO method in discriminant analysis, see [9,10]. For a review, see [11].
In this paper, we assume that the covariance structure is one of the following three covariance structures: (1) an independent covariance structure with the same variance, (2) an independent covariance structure with different variances, and (3) a uniform covariance structure. The numbers of unknown parameters in covariance structures (1)–(3) are 1, p, and 2, respectively. Sufficient conditions for the KOO method given by (4) to be consistent are derived under a high-dimensional asymptotic framework in which the sample size n, the number p of response variables, and the number k of explanatory variables are all large, with $p/n \to c_1 \in (0,1)$ and $k/n \to c_2 \in [0,1)$, where $c_1 + c_2 < 1$. Ref. [12] considered similar problems under covariance structures (1) and (3), as well as under an autoregressive covariance structure, but did not consider them under (2). Moreover, in their study of asymptotic consistency, k was assumed to be fixed, whereas in this paper k may tend to infinity, such that $k/n \to c_2 \in [0,1)$. From the numerical experiments in [12], the probabilities of choosing the true model by the KOO method in Cases (1), an independent covariance structure with the same variance, and (3), a uniform covariance structure, are as shown in Table 1.
In Table 1, k is the number of true (nonzero) explanatory variables, and the true parameter values are omitted.
The present paper is organized as follows. In Section 2, we present notation and preliminaries. In Section 3, we state the KOO methods under Covariance Structures (1)–(3) in terms of key statistics, and we describe an approach to their consistency. In Section 4, Section 5 and Section 6, we prove consistency properties of the KOO methods under Covariance Structures (1)–(3), respectively. In Section 7, our conclusions are discussed.

2. Notations and Preliminaries

Suppose that j denotes a subset of $\omega = \{1, \ldots, k\}$ containing $k_j$ elements, and let $X_j$ denote the $n \times k_j$ matrix comprising the columns of X indexed by the elements of j; thus, $X_\omega = X$. Further, we assume that the covariance matrix Σ has a covariance structure $\Sigma_c$. Then, we have a generic candidate model:
$$M_{c,j}: \ Y \sim N_{n \times p}(X_j \Theta_j, \Sigma_{c,j} \otimes I_n), \tag{5}$$
where $\Theta_j$ is a $k_j \times p$ unknown matrix of regression coefficients.
When $\Sigma_{c,j}$ is a general $p \times p$ unknown covariance matrix, we can write the GIC in (2) as
$$\mathrm{GIC}_{c,j} = n \log|\hat{\Sigma}_j| + np(\log 2\pi + 1) + d\left\{ k_j p + \tfrac{1}{2}p(p+1) \right\}, \tag{6}$$
where $n\hat{\Sigma}_j = Y'(I_n - P_j)Y$ and $P_j = X_j(X_j'X_j)^{-1}X_j'$. When $j = \omega$, model $M_{c,\omega}$ is called the full model; $\hat{\Sigma}_\omega$ and $P_\omega$ are defined from $\hat{\Sigma}_j$ and $P_j$ by setting $j = \omega$, with $k_\omega = k$ and $X_\omega = X$.
In this paper, we consider the cases in which the covariance matrix $\Sigma_c$ belongs to each of the following three structures:
(1) Independent covariance structure with the same variance (ICSS):
$$\Sigma_v = \sigma_v^2 I_p;$$
(2) Independent covariance structure with different variances (ICSD):
$$\Sigma_b = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2);$$
(3) Uniform covariance structure (UCS):
$$\Sigma_u = \sigma_u^2\left(\rho_u^{1-\delta_{ij}}\right)_{1 \le i,j \le p}.$$
The models considered in this paper can be expressed as in (5) with $\Sigma_{v,j}$, $\Sigma_{b,j}$, and $\Sigma_{u,j}$ in place of $\Sigma_{c,j}$. Let $f(Y; \Theta_j, \Sigma_{c,j})$ be the density of Y in (5) with $\Sigma = \Sigma_{c,j}$. In the derivation of the GIC under the covariance structure $\Sigma = \Sigma_{c,j}$, we use the following equality:
$$-2\log \max_{\Theta_j, \Sigma_{c,j}} f(Y; \Theta_j, \Sigma_{c,j}) = np\log(2\pi) + \min_{\Sigma_{c,j}}\left\{ n\log|\Sigma_{c,j}| + \mathrm{tr}\,\Sigma_{c,j}^{-1} Y'(I_n - P_j)Y \right\}. \tag{7}$$
Let $\hat{\Sigma}_{c,j}$ be the quantity minimizing the right-hand side of (7). Then, in each of our models, it satisfies $\mathrm{tr}\,\hat{\Sigma}_{c,j}^{-1} Y'(I_n - P_j)Y = np$, and we obtain
$$\mathrm{GIC}_{c,j} = -2\log f(Y; \hat{\Theta}_j, \hat{\Sigma}_{c,j}) + d\,m_{c,j} = n\log|\hat{\Sigma}_{c,j}| + np(\log 2\pi + 1) + d\,m_{c,j}, \tag{8}$$
where $m_{c,j}$ is the number of independent unknown parameters under $M_{c,j}$, and d is a positive constant that may depend on n. For AIC and BIC, d is defined by 2 ([13]) and $\log n$ ([14]), respectively.
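As an illustration, the following sketch (ours; all names illustrative) evaluates (8) for the three structures, given the residual matrix $W_j = Y'(I_n - P_j)Y$; the closed-form maximum likelihood estimators used below are derived in Sections 4–6, with $m_{c,j} = k_j p + 1$, $k_j p + p$, and $k_j p + 2$ for ICSS, ICSD, and UCS, respectively.

import numpy as np

def gic(W, n, p, kj, d, structure):
    # W: residual matrix Y'(I - P_j)Y; returns GIC_{c,j} in (8)
    if structure == "ICSS":                      # Sigma = sigma^2 I_p
        logdet = p * np.log(np.trace(W) / (n * p))
        m = kj * p + 1
    elif structure == "ICSD":                    # Sigma = diag(sigma_1^2, ..., sigma_p^2)
        logdet = np.sum(np.log(np.diag(W) / n))
        m = kj * p + p
    else:                                        # "UCS": Sigma = alpha(I - G/p) + beta G/p
        one = np.ones(p)
        beta_hat = one @ W @ one / (n * p)
        alpha_hat = (np.trace(W) - one @ W @ one / p) / (n * (p - 1))
        logdet = (p - 1) * np.log(alpha_hat) + np.log(beta_hat)
        m = kj * p + 2
    return n * logdet + n * p * (np.log(2 * np.pi) + 1) + d * m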

3. Approach to Consistencies of KOO Methods

Our KOO method is based on the statistics
$$T_{c,j;d} = \mathrm{GIC}_{c,\omega_{-j}} - \mathrm{GIC}_{c,\omega}, \qquad j \in \omega. \tag{9}$$
In fact, the KOO method chooses the following model:
$$\hat{j}_{c;d} = \left\{ j \in \omega \mid T_{c,j;d} > 0 \right\}.$$
Its consistency can be proven by showing the following two properties:
$$Q_1: \ [F1] \equiv \sum_{j \in j_*} \Pr(T_{c,j;d} \le 0) \to 0, \qquad Q_2: \ [F2] \equiv \sum_{j \notin j_*} \Pr(T_{c,j;d} \ge 0) \to 0,$$
as in [11]. The result can be shown by using the following inequality:
$$\Pr(\hat{j}_{c;d} = j_*) = \Pr\left( \bigcap_{j \in j_*} \{T_{c,j;d} > 0\} \cap \bigcap_{j \notin j_*} \{T_{c,j;d} < 0\} \right) = 1 - \Pr\left( \bigcup_{j \in j_*} \{T_{c,j;d} \le 0\} \cup \bigcup_{j \notin j_*} \{T_{c,j;d} \ge 0\} \right) \ge 1 - \sum_{j \in j_*} \Pr(T_{c,j;d} \le 0) - \sum_{j \notin j_*} \Pr(T_{c,j;d} \ge 0).$$
Here, [F1] bounds the probability that true variables are not selected, and [F2] bounds the probability that nontrue variables are selected. Analogous notation is used for the other variable selection methods. A variable $x_j$ is included in the true set of variables $j_*$ if the jth row $\theta_j$ of Θ is nonzero.
Here, we list some of our main assumptions:
A1: The set $j_*$ of the true explanatory variables is included in the full set, i.e., $j_* \subseteq \omega$, and the set $j_*$ is finite.
A2: The high-dimensional asymptotic framework:
$$p, n, k \to \infty, \quad p/n \to c_1 \in (0,1), \quad k/n \to c_2 \in [0,1), \ \text{where } 0 < c_1 + c_2 < 1.$$
A general model selection method $\hat{j}_{c;d}$ is high-dimensionally consistent if
$$\lim \Pr(\hat{j}_{c;d} = j_*) = 1$$
under the high-dimensional asymptotic framework. Here, "lim" means the limit under A2.

4. Asymptotic Consistency under an Independent Covariance Structure

In this section, we show the asymptotic consistency of the KOO method based on a general information criterion under an independent covariance structure with the same variance. A generic candidate model, when the set of explanatory variables is j, can be expressed as
$$M_{v,j}: \ Y \sim N_{n \times p}(X_j \Theta_j, \Sigma_{v,j} \otimes I_n), \tag{13}$$
where $\Sigma_{v,j} = \sigma_{v,j}^2 I_p$ and $\sigma_{v,j}^2 > 0$. Let us denote the density of Y under (13) by $f(Y; \Theta_j, \sigma_{v,j}^2)$. Then, we have
$$-2\log f(Y; \Theta_j, \sigma_{v,j}^2) = np\log(2\pi) + np\log \sigma_{v,j}^2 + \frac{1}{\sigma_{v,j}^2}\,\mathrm{tr}\,(Y - X_j\Theta_j)'(Y - X_j\Theta_j).$$
Therefore, the maximum likelihood estimators of $\Theta_j$ and $\sigma_{v,j}^2$ under $M_{v,j}$ are given by
$$\hat{\Theta}_j = (X_j'X_j)^{-1}X_j'Y, \qquad \hat{\sigma}_{v,j}^2 = \frac{1}{np}\,\mathrm{tr}\,Y'(I_n - P_j)Y.$$
The general information criterion (8) is given by
$$\mathrm{GIC}_{v,j} = np\log \hat{\sigma}_{v,j}^2 + np(\log 2\pi + 1) + d\,m_{v,j}, \tag{15}$$
where d is a positive constant and $m_{v,j} = k_j p + 1$.
Using (9) and (15), we have
$$T_{v,j;d} \equiv \mathrm{GIC}_{v,\omega_{-j}} - \mathrm{GIC}_{v,\omega} = np\log\left(1 + U_{2j}U_1^{-1}\right) - dp, \tag{16}$$
where, writing $y_{(\ell)}$ for the ℓth column of Y,
$$U_1 = \mathrm{tr}\,Y'(I_n - P_\omega)Y = \sum_{\ell=1}^{p} y_{(\ell)}'(I_n - P_\omega)y_{(\ell)}, \qquad U_{2j} = \mathrm{tr}\,Y'(P_\omega - P_{\omega_{-j}})Y = \sum_{\ell=1}^{p} y_{(\ell)}'(P_\omega - P_{\omega_{-j}})y_{(\ell)}.$$
Suitably scaled, $U_1$ and $U_{2j}$ are independently distributed as a central and a noncentral chi-squared variate, respectively. More precisely, assume that
$$E(Y) = X_{j_*}\Theta_{j_*},$$
and let $\sigma_{v,*}^2 = \sigma_{v,j_*}^2$. Then, using basic distributional properties of quadratic forms of normal variates and Wishart matrices (see [15]), we have the following results:
$$(1)\ U_1/\sigma_{v,*}^2 \sim \chi^2_{(n-k)p}; \qquad (2)\ U_{2j}/\sigma_{v,*}^2 \sim \chi^2_{p}(\delta_{v,j}^2); \qquad (3)\ U_1 \text{ and } U_{2j} \text{ are independent}, \tag{18}$$
where the noncentrality parameter $\delta_{v,j}^2$ is defined by
$$\delta_{v,j}^2 = \frac{1}{\sigma_{v,*}^2}\,\mathrm{tr}\,(X_{j_*}\Theta_{j_*})'(P_\omega - P_{\omega_{-j}})(X_{j_*}\Theta_{j_*}).$$
If $j \notin j_*$, then $\delta_{v,j}^2 = 0$; if $j \in j_*$, then, in general, $\delta_{v,j}^2 \neq 0$. A small simulation check of (18) is sketched below.
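The following check (ours; sizes arbitrary, and Θ = 0 so that every j is a non-true variable with $\delta_{v,j}^2 = 0$) compares simulated means of $U_1/\sigma_{v,*}^2$ and $U_{2j}/\sigma_{v,*}^2$ with the chi-squared degrees of freedom in (18).

import numpy as np

rng = np.random.default_rng(2)
n, p, k, sigma2 = 60, 4, 5, 1.5
X = rng.standard_normal((n, k))
Q, _ = np.linalg.qr(X)                            # P_omega = Q Q'
Qj, _ = np.linalg.qr(np.delete(X, 0, axis=1))     # drop x_1 (non-true here)
u1, u2 = [], []
for _ in range(2000):
    Y = np.sqrt(sigma2) * rng.standard_normal((n, p))   # Theta = 0
    U1 = np.sum((Y - Q @ (Q.T @ Y))**2)           # tr Y'(I - P_omega)Y
    U2j = np.sum((Y - Qj @ (Qj.T @ Y))**2) - U1   # tr Y'(P_omega - P_omega_-1)Y
    u1.append(U1 / sigma2)
    u2.append(U2j / sigma2)
print(np.mean(u1), (n - k) * p)                   # both approximately (n-k)p = 220
print(np.mean(u2), p)                             # both approximately p = 4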
For a sufficient condition for the consistency of the KOO method based on $\mathrm{GIC}_{v,j}$, we assume the following:
$$\text{A3v: For any } j \in j_*, \ \delta_{v,j}^2 = O(np), \ \text{and} \ \lim_{p/n \to c_1} \frac{1}{np}\,\delta_{v,j}^2 = \eta_{v,j}^2 > 0. \tag{19}$$
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $\mathrm{GIC}_{v,j}$ in (15), whose selection rule is given by $\hat{j}_{v;d} = \{ j \mid T_{v,j;d} > 0 \}$. When $j \notin j_*$, from (16), we can write
$$T_{v,j;d} = np\log\left(1 + \chi_p^2/\chi_m^2\right) - dp, \qquad m = (n-k)p.$$
Therefore, we have
$$[F2] = \sum_{j \notin j_*} \Pr\left( np\log(1 + \chi_p^2/\chi_m^2) \ge dp \right) = (k - k_{j_*})\Pr(U \ge h) \le (k - k_{j_*})\Pr(U \ge h_0),$$
where
$$U = \frac{\chi_p^2}{\chi_m^2} - \frac{p}{m-2}, \qquad h = e^{d/n} - 1 - \frac{p}{m-2}, \qquad h_0 = \frac{d}{n} - \frac{p}{m-2}.$$
Note that $h_0 < h$. Then, under the assumption $h_0 > 0$, Chebyshev's inequality gives
$$[F2] \le (k - k_{j_*})\,h^{-2}E[U^2] \le (k - k_{j_*})\,h_0^{-2}E[U^2]. \tag{22}$$
Related to the assumption $h_0 > 0$, we assume the following:
$$\text{A4v: } d > \frac{np}{m-2}\ \left(\to \frac{1}{1-c_2}\right), \ \text{and} \ d = O(n^a), \ 0 < a < 1.$$
The first part of A4v implies $h_0 > 0$. It is easy to see that
$$E[U^2] = \frac{2p(m+p-2)}{(m-2)^2(m-4)} = O\left((n^2 p)^{-1}\right),$$
where the first equality requires $m > 4$. Further, $h_0^{-2} = O(n^{2(1-a)})$. Therefore, from (22), we have that [F2] → 0. A numeric check of the formula for $E[U^2]$ is given below.
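The following comparison (ours; the values of p and m are arbitrary) checks the closed form of $E[U^2]$ against a Monte Carlo estimate.

import numpy as np

rng = np.random.default_rng(3)
p, m = 20, 400
U = rng.chisquare(p, 200000) / rng.chisquare(m, 200000) - p / (m - 2)
print(np.mean(U**2))                                  # Monte Carlo estimate of E[U^2]
print(2 * p * (m + p - 2) / ((m - 2)**2 * (m - 4)))   # closed form; the two agree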
When $j \in j_*$, we can write $T_{v,j;d} = np\log\left(1 + \chi_p^2(\delta_{v,j}^2)/\chi_m^2\right) - dp$. Therefore, we can express [F1] as
$$[F1] = \sum_{j \in j_*} \Pr(\tilde{T}_{v,j;d} \le 0),$$
where
$$\tilde{T}_{v,j;d} = \frac{1}{np}\,T_{v,j;d} = \log\left(1 + \frac{\chi_p^2(\delta_{v,j}^2)}{\chi_m^2}\right) - \frac{d}{n}.$$
Since $\chi_p^2(\delta_{v,j}^2)/\chi_m^2 \to \eta_{v,j}^2/(1-c_2)$ and $d/n \to 0$, Assumptions A3v and A4v easily show that, in probability,
$$\tilde{T}_{v,j;d} \to \log\left(1 + \frac{\eta_{v,j}^2}{1-c_2}\right) > 0.$$
This implies that $\Pr(\tilde{T}_{v,j;d} \le 0) \to 0$ for each of the finitely many $j \in j_*$.
These results imply the following theorem.
Theorem 1.
Suppose that Assumptions A1, A2, A3v, and A4v are satisfied. Then, the KOO method based on the general information criterion $\mathrm{GIC}_{v,j}$ defined by (15) is asymptotically consistent.
An alternative approach for "[F1] → 0" is as follows. When $j \in j_*$, we can write
$$T_{v,j;d} = np\log\left(1 + \chi_p^2(\delta_{v,j}^2)/\chi_m^2\right) - dp.$$
Therefore, we have
$$[F1] = \sum_{j \in j_*} \Pr\left( np\log(1 + \chi_p^2(\delta_{v,j}^2)/\chi_m^2) \le dp \right) = \sum_{j \in j_*} \Pr(\tilde{U}_j \le \tilde{h}_j),$$
where, for $j \in j_*$,
$$\tilde{U}_j = \frac{\chi_p^2(\delta_{v,j}^2)}{\chi_m^2} - \frac{p + \delta_{v,j}^2}{m-2}, \qquad \tilde{h}_j = e^{d/n} - 1 - \frac{p + \delta_{v,j}^2}{m-2} = h - \frac{\delta_{v,j}^2}{m-2}.$$
Then, under $d = O(n^a)$ $(0 < a < 1)$, A3v in (19), and the assumption $\tilde{h}_j < 0$ (or, equivalently, $h < \delta_{v,j}^2/(m-2)$), we have
$$[F1] \le k_{j_*} \max_{j \in j_*} |\tilde{h}_j|^{-2} E[\tilde{U}_j^2].$$
It is easily seen that, for $m > 4$,
$$E[\tilde{U}_j^2] = \frac{2\left\{ (p + \delta_{v,j}^2)^2 + (m-2)(p + 2\delta_{v,j}^2) \right\}}{(m-2)^2(m-4)} = O\left((np)^{-1}\right),$$
and, under $d = O(n^a)$ $(0 < a < 1)$ and A3v,
$$|\tilde{h}_j| \to \frac{\eta_{v,j}^2}{1-c_2} > 0.$$
These imply that [F1] → 0. Note that this approach assumed $\tilde{h}_j < 0$ (or, equivalently, $h < \delta_{v,j}^2/(m-2)$).
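To illustrate Theorem 1, the following Monte Carlo sketch (ours; the sizes, the true set, and the choice of d are arbitrary illustrations satisfying A4v) estimates $\Pr(\hat{j}_{v;d} = j_*)$ under ICSS using $T_{v,j;d}$ in (16).

import numpy as np

def koo_prob(n, p, k, j_star, d, reps=200, seed=1):
    # Estimate Pr(j_hat = j_*) for the ICSS KOO method.
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, k))
        Theta = np.zeros((k, p))
        Theta[sorted(j_star)] = 1.0               # rows in j_* are nonzero
        Y = X @ Theta + rng.standard_normal((n, p))
        Q, _ = np.linalg.qr(X)
        U1 = np.sum((Y - Q @ (Q.T @ Y))**2)       # tr Y'(I - P_omega)Y
        chosen = set()
        for j in range(k):
            Qj, _ = np.linalg.qr(np.delete(X, j, axis=1))
            U2j = np.sum((Y - Qj @ (Qj.T @ Y))**2) - U1
            if n * p * np.log1p(U2j / U1) > d * p:    # T_{v,j;d} > 0, cf. (16)
                chosen.add(j)
        hits += (chosen == set(j_star))
    return hits / reps

# Example: koo_prob(100, 40, 10, {0, 1, 2}, d=10.0) is close to 1.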

5. Asymptotic Consistency under an Independent Covariance Structure with Different Variances

In this section, we assume that the covariance matrix Σ is an independent covariance matrix with different variances, i.e., $\Sigma = \Sigma_b = \mathrm{diag}(\sigma_{b1}^2, \ldots, \sigma_{bp}^2)$. First, let us derive the key statistic $T_{b,j;d} = \mathrm{GIC}_{b,\omega_{-j}} - \mathrm{GIC}_{b,\omega}$. Consider the full candidate model with $E(Y) = X\Theta$,
$$M_{b,\omega}: \ Y \sim N_{n \times p}(X\Theta, \Sigma_b \otimes I_n).$$
Let the density under the full model be expressed as $f(Y; \Theta, \Sigma_b)$. Then, writing $y_{(\ell)}$ and $\theta_{(\ell)}$ for the ℓth columns of Y and Θ, we have
$$-2\log f(Y; \Theta, \Sigma_b) = np\log(2\pi) + \sum_{\ell=1}^{p}\left\{ n\log \sigma_{b\ell}^2 + \frac{1}{\sigma_{b\ell}^2}\,(y_{(\ell)} - X\theta_{(\ell)})'(y_{(\ell)} - X\theta_{(\ell)}) \right\}.$$
It holds that
$$-2\log \max_{\Theta, \Sigma_b} f(Y; \Theta, \Sigma_b) = np(\log 2\pi + 1) + \sum_{\ell=1}^{p} n\log\left\{ \frac{1}{n}\,y_{(\ell)}'(I_n - P_\omega)y_{(\ell)} \right\}. \tag{25}$$
Next, consider the model obtained by removing the jth explanatory variable from the full model $M_{b,\omega}$, which is denoted by $M_{b,\omega_{-j}}$. Similarly,
$$-2\log \max_{M_{b,\omega_{-j}}} f(Y; \Theta, \Sigma_b) = np(\log 2\pi + 1) + \sum_{\ell=1}^{p} n\log\left\{ \frac{1}{n}\,y_{(\ell)}'(I_n - P_{\omega_{-j}})y_{(\ell)} \right\}. \tag{26}$$
Using (25) and (26), we can obtain the general information criterion (8) for the two models $M_{b,\omega}$ and $M_{b,\omega_{-j}}$, and we have
$$T_{b,j;d} \equiv \mathrm{GIC}_{b,\omega_{-j}} - \mathrm{GIC}_{b,\omega} = \sum_{\ell=1}^{p} n\log\left(1 + U_{2\ell}U_{1\ell}^{-1}\right) - dp, \tag{27}$$
where
$$U_{1\ell} = y_{(\ell)}'(I_n - P_\omega)y_{(\ell)}, \qquad U_{2\ell} = y_{(\ell)}'(P_\omega - P_{\omega_{-j}})y_{(\ell)}, \qquad \ell = 1, \ldots, p.$$
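A sketch (ours) of the statistic (27): compared with the ICSS case, the pooled trace is replaced by columnwise residual sums of squares.

import numpy as np

def T_b(Y, X, j, d):
    # T_{b,j;d} in (27) for the ICSD structure
    n, p = Y.shape
    Q, _ = np.linalg.qr(X)
    Qj, _ = np.linalg.qr(np.delete(X, j, axis=1))
    U1 = np.sum((Y - Q @ (Q.T @ Y))**2, axis=0)          # U_{1l}, l = 1, ..., p
    U2 = np.sum((Y - Qj @ (Qj.T @ Y))**2, axis=0) - U1   # U_{2l}
    return n * np.sum(np.log1p(U2 / U1)) - d * p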
Let us assume that
$$E(Y) = X_{j_*}\Theta_{j_*},$$
and let $\sigma_{b,*\ell}^2$, $\ell = 1, \ldots, p$, denote the true variances. Then, as in (18), we have the following results:
$$(1)\ U_{1\ell}/\sigma_{b,*\ell}^2 \sim \chi^2_{n-k}, \ \ell = 1, \ldots, p; \qquad (2)\ U_{2\ell}/\sigma_{b,*\ell}^2 \sim \chi^2_{1}(\delta_{b,j;\ell}^2), \ \ell = 1, \ldots, p; \qquad (3)\ U_{1\ell}, U_{2\ell}, \ \ell = 1, \ldots, p, \ \text{are independent},$$
where the noncentrality parameters $\delta_{b,j;\ell}^2$ are defined by
$$\delta_{b,j;\ell}^2 = \frac{1}{\sigma_{b,*\ell}^2}\,(X_{j_*}\theta_{*(\ell)})'(P_\omega - P_{\omega_{-j}})(X_{j_*}\theta_{*(\ell)}),$$
with $\Theta_* = (\theta_{*(1)}, \ldots, \theta_{*(p)})$. If $j \notin j_*$, then $\delta_{b,j;\ell}^2 = 0$; if $j \in j_*$, then, in general, $\delta_{b,j;\ell}^2 \neq 0$. For a sufficient condition for the consistency of the KOO method based on $\mathrm{GIC}_{b,j}$, we assume the following:
$$\text{A3b: For any } j \in j_*, \ \lim (n-k)^{-1}\delta_{b,j;\ell}^2 = \eta_{b,j;\ell}^2 > 0, \ \text{and} \ \lim \frac{1}{p}\sum_{\ell=1}^{p} \log\left(1 + \frac{1}{n-k}\,\delta_{b,j;\ell}^2\right) = \eta_{b,j}^2 > 0.$$
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $T_{b,j;d}$ in (27), whose selection rule is given by $\hat{j}_{b;d} = \{ j \mid T_{b,j;d} > 0 \}$. When $j \notin j_*$, we have
$$[F2] = \sum_{j \notin j_*} \Pr\left( \sum_{\ell=1}^{p} n\log(1 + U_{2\ell}U_{1\ell}^{-1}) \ge dp \right) \le \sum_{j \notin j_*} \sum_{\ell=1}^{p} \Pr\left( n\log(1 + U_{2\ell}U_{1\ell}^{-1}) \ge d \right).$$
This implies that
$$[F2] \le p(k - k_{j_*})\Pr\left( n\log(1 + \chi_1^2/\chi_{n-k}^2) \ge d \right) = p(k - k_{j_*})\Pr(V \ge r),$$
where
$$V = \frac{\chi_1^2}{\chi_{n-k}^2} - \frac{1}{n-k-2}, \qquad r = e^{d/n} - 1 - \frac{1}{n-k-2}, \qquad r_0 = \frac{d}{n} - \frac{1}{n-k-2}. \tag{32}$$
Note that $r_0 < r$. Then, under the assumption $r_0 > 0$, we have
$$[F2] \le p(k - k_{j_*})\,r^{-2}E[V^2] \le p(k - k_{j_*})\,r_0^{-2}E[V^2]. \tag{33}$$
Related to the assumption $r_0 > 0$, we assume the following:
$$\text{A4b: } d > \frac{n}{n-k-2}\ \left(\to \frac{1}{1-c_2}\right), \ \text{and} \ d = O(n^a), \ 0 < a < 1.$$
The first part of A4b implies $r_0 > 0$. It is easy to see that, for $n - k > 4$,
$$E[V^2] = \frac{2(n-k-1)}{(n-k-2)^2(n-k-4)} = O(n^{-2}).$$
Further, $r_0^{-2} = O(n^{2(1-a)})$. Since $p(k - k_{j_*}) = O(n^2)$, the second-moment bound (33) alone does not suffice for small a; however, applying the same argument with a higher even moment, $\Pr(V \ge r_0) \le r_0^{-2s}E[V^{2s}]$ with $E[V^{2s}] = O(n^{-2s})$ for fixed $s > 1/a$, we have that [F2] → 0.
When $j \in j_*$, we can write $T_{b,j;d} = \sum_{\ell=1}^{p} n\log\{1 + U_{2\ell}U_{1\ell}^{-1}\} - dp$. Therefore, we can express [F1] as follows:
$$[F1] = \sum_{j \in j_*} \Pr(\tilde{T}_{b,j;d} \le 0),$$
where
$$\tilde{T}_{b,j;d} = \frac{1}{np}\,T_{b,j;d} = \frac{1}{p}\sum_{\ell=1}^{p} \log\left(1 + \frac{\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)}{\chi_{n-k;\ell}^2}\right) - \frac{d}{n}.$$
Assumptions A3b and A4b easily show that, in probability,
$$\tilde{T}_{b,j;d} \to \eta_{b,j}^2 > 0.$$
This implies that $\Pr(\tilde{T}_{b,j;d} \le 0) \to 0$.
These results imply the following theorem.
Theorem 2.
Suppose that Assumptions A1, A2, A3b, and A4b are satisfied. Then, the KOO method based on $T_{b,j;d}$ in (27) is asymptotically consistent.
Let us consider an alternative approach for "[F1] → 0", as in the case of the independent covariance structure with the same variance. When $j \in j_*$, we can write
$$[F1] = \sum_{j \in j_*} \Pr\left( \sum_{\ell=1}^{p}\left\{ n\log\left(1 + \frac{\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)}{\chi_{n-k;\ell}^2}\right) - d \right\} \le 0 \right) \le \sum_{j \in j_*} \sum_{\ell=1}^{p} \Pr\left( n\log\left(1 + \frac{\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)}{\chi_{n-k;\ell}^2}\right) - d \le 0 \right) = \sum_{j \in j_*} \sum_{\ell=1}^{p} \Pr\left( \tilde{V}_{j,\ell} \le \tilde{r}_{j,\ell} \right).$$
Here, for $j \in j_*$,
$$\tilde{V}_{j,\ell} = \frac{\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)}{\chi_{n-k;\ell}^2} - \frac{1 + \delta_{b,j;\ell}^2}{n-k-2}, \qquad \tilde{r}_{j,\ell} = e^{d/n} - 1 - \frac{1 + \delta_{b,j;\ell}^2}{n-k-2} = r - \frac{\delta_{b,j;\ell}^2}{n-k-2}, \qquad \ell = 1, \ldots, p,$$
where r is the same quantity as in (32). Note that $\chi_{1;\ell}^2(\delta_{b,j;\ell}^2)$, $\ell = 1, \ldots, p$, are independently distributed as noncentral chi-squared variates $\chi_1^2(\delta_{b,j;\ell}^2)$. Then, under the assumption $\tilde{r}_{j,\ell} < 0$ (or, equivalently, $r < \delta_{b,j;\ell}^2/(n-k-2)$), we have
$$[F1] \le k_{j_*} \max_{j \in j_*} \sum_{\ell=1}^{p} |\tilde{r}_{j,\ell}|^{-2s} E[\tilde{V}_{j,\ell}^{2s}], \qquad s = 1, 2, \ldots. \tag{35}$$
In the above upper bound, it holds that
$$|\tilde{r}_{j,\ell}| \approx \frac{\delta_{b,j;\ell}^2}{n-k} \to \eta_{b,j;\ell}^2.$$
Useful bounds are obtained from the first few moments of $\tilde{V}_{j,\ell}$. For example, for $n - k > 8$,
$$E[\tilde{V}_{j,\ell}^2] = \frac{2\left\{ (1 + \delta_{b,j;\ell}^2)^2 + (n-k-2)(1 + 2\delta_{b,j;\ell}^2) \right\}}{(n-k-2)^2(n-k-4)} = O(n^{-1}), \qquad E[\tilde{V}_{j,\ell}^4] = O(n^{-2}).$$
Then, Bound (35) with s = 2 can be asymptotically expressed as follows:
$$k_{j_*} \sum_{\ell=1}^{p} \eta_{b,j;\ell}^{-4} E[\tilde{V}_{j,\ell}^4] = k_{j_*}\, p\left( \frac{1}{p}\sum_{\ell=1}^{p} \eta_{b,j;\ell}^{-4} \right) \times O(n^{-2}).$$
The above expression is $O(n^{-1})$ under the assumption that $\frac{1}{p}\sum_{\ell=1}^{p} \eta_{b,j;\ell}^{-4}$ converges to a finite limit.

6. Asymptotic Consistency under a Uniform Covariance Structure

In this section, we show the asymptotic consistency of the KOO method based on a general information criterion under a uniform covariance structure. First, following [12], we derive $\mathrm{GIC}_{u,j}$ as in (6) and a key statistic $T_{u,j;d}$ as in (9). A uniform covariance structure is given by
$$\Sigma_u = \sigma_u^2\left(\rho_u^{1-\delta_{ij}}\right) = \sigma_u^2\left\{ (1-\rho_u)I_p + \rho_u 1_p 1_p' \right\},$$
with the Kronecker delta $\delta_{ij}$. The covariance structure is also expressed as
$$\Sigma_u = \alpha\left(I_p - \frac{1}{p}G_p\right) + \beta\,\frac{1}{p}G_p,$$
where
$$\alpha = \sigma_u^2(1-\rho_u), \qquad \beta = \sigma_u^2\{1 + (p-1)\rho_u\}, \qquad G_p = 1_p 1_p',$$
and $1_p = (1, \ldots, 1)'$. The matrices $I_p - p^{-1}G_p$ and $p^{-1}G_p$ are orthogonal idempotent matrices, so we have
$$|\Sigma_u| = \beta\,\alpha^{p-1}, \qquad \Sigma_u^{-1} = \frac{1}{\alpha}\left(I_p - \frac{1}{p}G_p\right) + \frac{1}{\beta}\cdot\frac{1}{p}G_p.$$
These identities can be verified numerically, as sketched below.
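A quick numeric check (ours; the values of p, $\sigma_u^2$, and $\rho_u$ are arbitrary) of the two identities above:

import numpy as np

p, sigma2, rho = 6, 2.0, 0.3
alpha, beta = sigma2 * (1 - rho), sigma2 * (1 + (p - 1) * rho)
G = np.ones((p, p))                                  # G_p = 1_p 1_p'
Sigma = alpha * (np.eye(p) - G / p) + beta * G / p   # elementwise sigma2 * rho^{1-delta_ij}
print(np.linalg.det(Sigma), beta * alpha**(p - 1))   # the two values agree
Inv = (np.eye(p) - G / p) / alpha + (G / p) / beta
print(np.allclose(Inv, np.linalg.inv(Sigma)))        # True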
Now, we consider the multivariate regression model $M_{u,j}$ given by
$$M_{u,j}: \ Y \sim N_{n \times p}(X_j \Theta_j, \Sigma_{u,j} \otimes I_n), \tag{38}$$
where $\Sigma_{u,j} = \alpha_j(I_p - p^{-1}G_p) + \beta_j\,p^{-1}G_p$. Let $H = (h_1, H_2)$ be an orthogonal matrix with $h_1 = p^{-1/2}1_p$, and let
$$W_j = Y'(I_n - P_j)Y \quad \text{and} \quad U_j = H'W_jH.$$
Here, $h_1$ is a characteristic vector of $\Sigma_{u,j}$ corresponding to the characteristic root $\beta_j$, and each column vector of $H_2$ is a characteristic vector of $\Sigma_{u,j}$ corresponding to the root $\alpha_j$. Let the density function of Y under $M_{u,j}$ be denoted by $f(Y; \Theta_j, \alpha_j, \beta_j)$. Then, we have
$$g(\alpha_j, \beta_j) \equiv -2\log \max_{\Theta_j} f(Y; \Theta_j, \alpha_j, \beta_j) = np\log(2\pi) + n(p-1)\log \alpha_j + n\log \beta_j + \mathrm{tr}\,\Psi_j^{-1}U_j,$$
where $\Psi_j = \mathrm{diag}(\beta_j, \alpha_j, \ldots, \alpha_j)$. Therefore, the maximum likelihood estimators of $\alpha_j$ and $\beta_j$ under $M_{u,j}$ are given by
$$\hat{\alpha}_j = \frac{1}{n(p-1)}\,\mathrm{tr}\,H_2'Y'(I_n - P_j)YH_2, \qquad \hat{\beta}_j = \frac{1}{n}\,h_1'Y'(I_n - P_j)Yh_1.$$
The number of independent parameters under $M_{u,j}$ is $m_{u,j} = k_j p + 2$. Noting that $\Psi_j$ is diagonal, we can obtain the general information criterion (GIC) in (8) for Y in (38) as follows:
$$\mathrm{GIC}_{u,j} = n(p-1)\log \hat{\alpha}_j + n\log \hat{\beta}_j + np(\log 2\pi + 1) + d(k_j p + 2).$$
Therefore, we have
$$T_{u,j;d} \equiv \mathrm{GIC}_{u,\omega_{-j}} - \mathrm{GIC}_{u,\omega} = n(p-1)\log\left(\hat{\alpha}_{\omega_{-j}}\hat{\alpha}_\omega^{-1}\right) + n\log\left(\hat{\beta}_{\omega_{-j}}\hat{\beta}_\omega^{-1}\right) - dp = Z_{1j} + Z_{2j}. \tag{40}$$
Here, $Z_{1j}$ and $Z_{2j}$ are defined as follows:
$$Z_{1j} = n(p-1)\log\left(1 + V_{2j}^{(1)}(V_1^{(1)})^{-1}\right) - d(p-1), \qquad Z_{2j} = n\log\left(1 + V_{2j}^{(2)}(V_1^{(2)})^{-1}\right) - d,$$
using the following $V_1^{(i)}, V_{2j}^{(i)}$, $i = 1, 2$:
$$V_1^{(1)} = \mathrm{tr}\,H_2'Y'(I_n - P_\omega)YH_2, \qquad V_{2j}^{(1)} = \mathrm{tr}\,H_2'Y'(P_\omega - P_{\omega_{-j}})YH_2,$$
$$V_1^{(2)} = h_1'Y'(I_n - P_\omega)Yh_1, \qquad V_{2j}^{(2)} = h_1'Y'(P_\omega - P_{\omega_{-j}})Yh_1.$$
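A sketch (ours) of computing $T_{u,j;d}$ in (40) from these quantities, using $\mathrm{tr}\,H_2'WH_2 = \mathrm{tr}\,W - h_1'Wh_1$ so that $H_2$ never needs to be formed explicitly:

import numpy as np

def T_u(Y, X, j, d):
    # T_{u,j;d} in (40) for the UCS structure
    n, p = Y.shape
    h1 = np.ones(p) / np.sqrt(p)
    Q, _ = np.linalg.qr(X)
    Qj, _ = np.linalg.qr(np.delete(X, j, axis=1))
    Rw = Y - Q @ (Q.T @ Y)                           # (I - P_omega)Y
    Rj = Y - Qj @ (Qj.T @ Y)                         # (I - P_omega_-j)Y
    W, Wj = Rw.T @ Rw, Rj.T @ Rj
    a_full, b_full = np.trace(W) - h1 @ W @ h1, h1 @ W @ h1      # V_1^(1), V_1^(2)
    a_drop, b_drop = np.trace(Wj) - h1 @ Wj @ h1, h1 @ Wj @ h1
    Z1 = n * (p - 1) * np.log(a_drop / a_full) - d * (p - 1)     # Z_{1j}
    Z2 = n * np.log(b_drop / b_full) - d                         # Z_{2j}
    return Z1 + Z2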
Related to the distributional reductions of $Z_{1j}, Z_{2j}$, $j = 1, \ldots, k$, we use the following lemma frequently.
Lemma 1.
Let W have a noncentral Wishart distribution $W_p(m, \Sigma; \Omega)$. Let the covariance matrix Σ be decomposed into characteristic roots and vectors as follows:
$$\Sigma = H\Lambda H' = (H_1, \ldots, H_h)\,\mathrm{diag}(\lambda_1 I_{q_1}, \ldots, \lambda_h I_{q_h})\,(H_1, \ldots, H_h)',$$
where $\lambda_1 > \cdots > \lambda_h > 0$ and H is an orthogonal matrix. Then, $\lambda_i^{-1}\,\mathrm{tr}\,H_i'WH_i$, $i = 1, \ldots, h$, are independently distributed as noncentral chi-squared variates with $mq_i$ degrees of freedom and noncentrality parameters $\delta_i^2 = \lambda_i^{-1}\,\mathrm{tr}\,H_i'\Omega H_i$.
Proof.
The result may be proven by considering the characteristic function of $(\lambda_1^{-1}\,\mathrm{tr}\,H_1'WH_1, \ldots, \lambda_h^{-1}\,\mathrm{tr}\,H_h'WH_h)$, which is expressed as follows (see Theorem 2.1.2 in [15]):
$$E\left[ e^{\,i t_1 \lambda_1^{-1}\mathrm{tr}\,H_1'WH_1 + \cdots + i t_h \lambda_h^{-1}\mathrm{tr}\,H_h'WH_h} \right] = E\left[ \mathrm{etr}(KW) \right] = |I_p - 2\Sigma K|^{-m/2}\,\mathrm{etr}\left( \Omega K(I_p - 2\Sigma K)^{-1} \right),$$
where $K = i t_1 \lambda_1^{-1} H_1H_1' + \cdots + i t_h \lambda_h^{-1} H_hH_h'$. The result can be easily obtained by checking that the above last expression equals
$$\prod_{i=1}^{h} (1 - 2it_i)^{-mq_i/2} \exp\left( \frac{it_i}{1 - 2it_i}\,\delta_i^2 \right). \qquad \square$$
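A simulation check (ours; central case, Ω = 0, with the uniform structure used below) of Lemma 1: $\mathrm{tr}\,H_2'WH_2/\alpha$ and $h_1'Wh_1/\beta$ should behave as independent $\chi^2_{m(p-1)}$ and $\chi^2_m$ variates.

import numpy as np

rng = np.random.default_rng(4)
p, m, sigma2, rho = 5, 30, 1.0, 0.4
alpha, beta = sigma2 * (1 - rho), sigma2 * (1 + (p - 1) * rho)
G = np.ones((p, p))
L = np.linalg.cholesky(alpha * (np.eye(p) - G / p) + beta * G / p)
h1 = np.ones(p) / np.sqrt(p)
t1, t2 = [], []
for _ in range(5000):
    Z = rng.standard_normal((m, p)) @ L.T
    W = Z.T @ Z                                      # W ~ W_p(m, Sigma_u)
    t1.append((np.trace(W) - h1 @ W @ h1) / alpha)   # tr H_2'WH_2 / alpha
    t2.append(h1 @ W @ h1 / beta)                    # h_1'Wh_1 / beta
print(np.mean(t1), m * (p - 1))                      # both approximately m(p-1)
print(np.mean(t2), m)                                # both approximately m
print(np.corrcoef(t1, t2)[0, 1])                     # approximately 0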
Assume that the true model is expressed as
$$M_{u,j_*}: \ Y \sim N_{n \times p}(X_{j_*}\Theta_{j_*}, \Sigma_{u,*} \otimes I_n), \tag{42}$$
where $\Sigma_{u,*} = \alpha_*(I_p - p^{-1}G_p) + \beta_*\,p^{-1}G_p$. Using Lemma 1, we have the following lemma.
Lemma 2.
Under the true model (42), it holds that:
(1) $V_1^{(1)}/\alpha_*$ and $V_{2j}^{(1)}/\alpha_*$ are independently distributed as a central chi-squared variate $\chi^2_{(p-1)(n-k)}$ and a noncentral chi-squared variate $\chi^2_{p-1}(\delta_{1j}^2)$, respectively.
(2) $V_1^{(2)}/\beta_*$ and $V_{2j}^{(2)}/\beta_*$ are independently distributed as a central chi-squared variate $\chi^2_{n-k}$ and a noncentral chi-squared variate $\chi^2_{1}(\delta_{2j}^2)$, respectively.
(3) The noncentrality parameters $\delta_{1j}^2$ and $\delta_{2j}^2$ are defined as follows:
$$\delta_{1j}^2 = \frac{1}{\alpha_*}\,\mathrm{tr}\,H_2'(X_{j_*}\Theta_{j_*})'(P_\omega - P_{\omega_{-j}})(X_{j_*}\Theta_{j_*})H_2, \qquad \delta_{2j}^2 = \frac{1}{\beta_*}\,h_1'(X_{j_*}\Theta_{j_*})'(P_\omega - P_{\omega_{-j}})(X_{j_*}\Theta_{j_*})h_1.$$
Here, if $j \notin j_*$, then $\delta_{1j}^2 = 0$ and $\delta_{2j}^2 = 0$.
Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $T_{u,j;d}$ in (40), whose selection rule is given by $\hat{j}_{u;d} = \{ j \mid T_{u,j;d} > 0 \}$. For a sufficient condition for the consistency of $\hat{j}_{u;d}$, we assume the following:
$$\text{A3u: For any } j \in j_*, \ \delta_{1j}^2 = O(np), \ \delta_{2j}^2 = O(n), \ \text{and} \ \lim \frac{1}{np}\,\delta_{1j}^2 = \eta_{1j}^2 > 0, \quad \lim \frac{1}{n}\,\delta_{2j}^2 = \eta_{2j}^2 > 0.$$
When $j \notin j_*$, we have
$$[F2] = \sum_{j \notin j_*} \Pr(Z_{1j} + Z_{2j} \ge 0) \le \sum_{j \notin j_*} \left\{ \Pr(Z_{1j} \ge 0) + \Pr(Z_{2j} \ge 0) \right\} \le (k - k_{j_*})\left\{ \Pr(Z^{(1)} \ge s_0^{(1)}) + \Pr(Z^{(2)} \ge s_0^{(2)}) \right\}.$$
Here,
$$Z^{(1)} = \frac{\chi_{p-1}^2}{\chi_{(p-1)(n-k)}^2} - \frac{p-1}{(p-1)(n-k)-2}, \qquad s^{(1)} = e^{d/n} - 1 - \frac{p-1}{(p-1)(n-k)-2}, \qquad s_0^{(1)} = \frac{d}{n} - \frac{p-1}{(p-1)(n-k)-2},$$
$$Z^{(2)} = \frac{\chi_1^2}{\chi_{n-k}^2} - \frac{1}{n-k-2}, \qquad s^{(2)} = e^{d/n} - 1 - \frac{1}{n-k-2}, \qquad s_0^{(2)} = \frac{d}{n} - \frac{1}{n-k-2}.$$
Note that $s_0^{(1)} < s^{(1)}$ and $s_0^{(2)} < s^{(2)}$. Then, under the assumption that $s_0^{(1)} > 0$ and $s_0^{(2)} > 0$, we have
$$[F2] \le (k - k_{j_*})\left\{ (s_0^{(1)})^{-2} E[(Z^{(1)})^2] + (s_0^{(2)})^{-2} E[(Z^{(2)})^2] \right\}. \tag{44}$$
Related to the assumptions $s_0^{(1)} > 0$ and $s_0^{(2)} > 0$, we assume the following:
$$\text{A4u: } d > \frac{n(p-1)}{(p-1)(n-k)-2}\ \left(\to \frac{1}{1-c_2}\right), \quad d > \frac{n}{n-k-2}\ \left(\to \frac{1}{1-c_2}\right), \ \text{and} \ d = O(n^a), \ 0 < a < 1.$$
The first part of A4u implies $s_0^{(1)} > 0$ and $s_0^{(2)} > 0$. It is easy to see that
$$E[(Z^{(1)})^2] = \frac{2(p-1)\left\{ (p-1)(n-k+1) - 2 \right\}}{\{(p-1)(n-k)-2\}^2\{(p-1)(n-k)-4\}} = O(n^{-3}), \qquad E[(Z^{(2)})^2] = \frac{2(n-k-1)}{(n-k-2)^2(n-k-4)} = O(n^{-2}).$$
Further, $(s_0^{(1)})^{-2} = O(n^{2(1-a)})$ and $(s_0^{(2)})^{-2} = O(n^{2(1-a)})$. Therefore, from (44), the $Z^{(1)}$ term tends to zero, and the $Z^{(2)}$ term is $O(n^{1-2a})$, which tends to zero for $a > 1/2$; for smaller a, the $Z^{(2)}$ term can be handled with a higher even moment, as in Section 5, and we have [F2] → 0.
When $j \in j_*$, the chi-squared variates appearing in $Z_{1j}$ and $Z_{2j}$ are noncentral, with noncentrality parameters $\delta_{1j}^2$ and $\delta_{2j}^2$ as in Lemma 2. Therefore, we can express [F1] as follows:
$$[F1] = \sum_{j \in j_*} \Pr(\tilde{T}_{u,j;d} \le 0),$$
where
$$\tilde{T}_{u,j;d} = \frac{1}{np}\,T_{u,j;d} = \frac{p-1}{p}\log\left(1 + \frac{\chi_{p-1}^2(\delta_{1j}^2)}{\chi_{(p-1)(n-k)}^2}\right) + \frac{1}{p}\log\left(1 + \frac{\chi_{1}^2(\delta_{2j}^2)}{\chi_{n-k}^2}\right) - \frac{d}{n}.$$
Assumptions A3u and A4u show that, in probability,
$$\tilde{T}_{u,j;d} \to \log\left(1 + \frac{\eta_{1j}^2}{1-c_2}\right) > 0.$$
This implies that $\Pr(\tilde{T}_{u,j;d} \le 0) \to 0$, and [F1] → 0.
These results imply the following theorem.
Theorem 3.
Suppose that Assumptions A1, A2, A3u, and A4u are satisfied. Then, the KOO method based on $T_{u,j;d}$ in (40) is asymptotically consistent.

7. Concluding Remarks

In this paper, we considered the selection of regression variables in a p-variate linear regression model with one of three covariance structures: (1) ICSS (an independent covariance structure with the same variance), (2) ICSD (an independent covariance structure with different variances), and (3) UCS (a uniform covariance structure). We proposed using a KOO method based on a general information criterion with a penalty constant d, and we established high-dimensional consistencies of the KOO methods with $d = O(n^a)$, $0 < a < 1$. Ref. [12] studied the asymptotic consistencies of KOO methods under (1) and (3); however, in their approach, the number of explanatory variables was fixed, whereas in this paper it may tend to infinity. KOO methods are also computationally feasible. The idea goes back to [1,2], while high-dimensional properties were studied recently in [7,8,9,11].
A high-dimensional study of the KOO method under an autoregressive covariance structure (AUTO), as well as an extension of our results to the case of non-normality, remains as future work.

Author Contributions

Conceptualization, Y.F.; Methodology, Y.F. and T.S.; Software, T.S.; Writing—original draft, Y.F. and T.S.; Writing—review & editing, Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to express their gratitude to Vladimir V. Ulyanov and the three referees for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Nishii, R.; Bai, Z.D.; Krishnaiah, P.R. Strong consistency of the information criterion for model selection in multivariate analysis. Hiroshima Math. J. 1988, 18, 451–462.
2. Zhao, L.C.; Krishnaiah, P.R.; Bai, Z.D. On detection of the number of signals in presence of white noise. J. Multivar. Anal. 1986, 20, 1–25.
3. Bai, Z.; Fujikoshi, Y.; Hu, J. Strong Consistency of the AIC, BIC, Cp and KOO Methods in High-Dimensional Multivariate Linear Regression; Hiroshima Statistical Research Group: Hiroshima, Japan, 2018.
4. Yanagihara, H.; Wakaki, H.; Fujikoshi, Y. A consistency property of the AIC for multivariate linear models when the dimension and the sample size are large. Electron. J. Stat. 2015, 9, 869–897.
5. Fujikoshi, Y.; Sakurai, T.; Yanagihara, H. Consistency of high-dimensional AIC-type and Cp-type criteria in multivariate linear regression. J. Multivar. Anal. 2014, 123, 184–200.
6. Fujikoshi, Y.; Sakurai, T. High-dimensional consistency of rank estimation criteria in multivariate linear model. J. Multivar. Anal. 2016, 149, 199–212.
7. Oda, R.; Yanagihara, H. A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables. Electron. J. Stat. 2020, 14, 1386–1412.
8. Oda, R.; Yanagihara, H. A consistent likelihood-based variable selection method in normal multivariate linear regression. In Intelligent Decision Technologies; Czarnowski, I., Ed.; Springer: Singapore, 2021; Volume 238, pp. 391–401.
9. Fujikoshi, Y.; Sakurai, T. Consistency of test-based method for selection of variables in high-dimensional two-group discriminant analysis. Jpn. J. Stat. Data Sci. 2019, 2, 155–171.
10. Oda, R.; Suzuki, Y.; Yanagihara, H.; Fujikoshi, Y. A consistent variable selection method in high-dimensional canonical discriminant analysis. J. Multivar. Anal. 2020, 175, 1–13.
11. Fujikoshi, Y. High-dimensional consistencies of KOO methods in multivariate regression model and discriminant analysis. J. Multivar. Anal. 2022, 188, 104860.
12. Sakurai, T.; Fujikoshi, Y. Exploring consistencies of information criterion and test-based criterion for high-dimensional multivariate regression models under three covariance structures. In Festschrift in Honor of Professor Dietrich von Rosen's 65th Birthday; Holgersson, T., Singull, M., Eds.; Springer: Berlin, Germany, 2020; pp. 313–334.
13. Akaike, H. Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory; Petrov, B.N., Csáki, F., Eds.; Akadémiai Kiadó: Budapest, Hungary, 1973; pp. 267–281.
14. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
15. Fujikoshi, Y.; Ulyanov, V.V.; Shimizu, R. Multivariate Statistics: High-Dimensional and Large-Sample Approximations; Wiley: Hoboken, NJ, USA, 2010.
Table 1. KOO based on AIC (k = 3): probability of selecting the true model.

(n, p)   (20, 10)   (200, 100)   (20, 10)   (200, 100)
(1)      0.74       1.00         0.77       1.00
(3)      0.47       1.00         0.22       1.00
