Article

Aggregate Kernel Inverse Regression Estimation

Wenjuan Li, Wenying Wang, Jingsi Chen and Weidong Rao
1 School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming 650221, China
2 School of Mathematics and Computer Science, Jiangxi Science and Technology Normal University, Nanchang 330038, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(12), 2682; https://doi.org/10.3390/math11122682
Submission received: 17 May 2023 / Revised: 5 June 2023 / Accepted: 6 June 2023 / Published: 13 June 2023

Abstract
Sufficient dimension reduction (SDR) is a useful tool for nonparametric regression with high-dimensional predictors. Many existing SDR methods rely on distributional assumptions about the predictors. Wang et al. proposed an aggregate dimension reduction method to reduce the dependence on such assumptions. Motivated by their work, we propose a novel and effective method that combines the aggregate approach with kernel inverse regression estimation. The proposed approach accurately estimates the dimension reduction directions and substantially improves the exhaustiveness of the estimates for complex models. At the same time, it does not depend on the arrangement of slices, and the influence of extreme values of the response is reduced. It performs well in numerical examples and a real data application.

1. Introduction

Sufficient dimension reduction (SDR) has garnered significant interest as an efficient regression tool for high-dimensional data since the groundbreaking work of Li [1]. Given a univariate response $Y \in \mathbb{R}$ and a $p$-dimensional predictor $X \in \mathbb{R}^p$, the goal of SDR is to replace $X$ with a small set of linear combinations $B^{\mathrm{T}}X$, where $B = (\beta_1, \beta_2, \ldots, \beta_d)$ is a $p \times d$ matrix with $d < p$. Let $\mathcal{S}$ be the column space of $B$. That is, SDR seeks a $d$-dimensional subspace $\mathcal{S} \subseteq \mathbb{R}^p$ such that
$$ Y \perp\!\!\!\perp X \mid P_{\mathcal{S}}X, \qquad (1) $$
where "$\perp\!\!\!\perp$" denotes independence and $P_{\mathcal{S}}$ represents the orthogonal projection operator onto $\mathcal{S}$. The intersection of all such subspaces $\mathcal{S}$, provided it also satisfies the conditional independence in (1), is defined as the central subspace (CS) and is denoted by $\mathcal{S}_{Y|X}$. The dimension of $\mathcal{S}_{Y|X}$, denoted by $d_{Y|X}$, is called the structural dimension. When the conditional independence in (1) is replaced by
$$ Y \perp\!\!\!\perp E[Y \mid X] \mid P_{\mathcal{S}}X, \qquad (2) $$
$\mathcal{S}$ is called a mean dimension reduction subspace. The central mean subspace (CMS, Cook and Li [2]), denoted by $\mathcal{S}_{E[Y|X]}$, is the intersection of all mean dimension reduction subspaces, provided it itself satisfies the conditional independence in (2). Cook and Li [2] showed that $\mathcal{S}_{E[Y|X]} \subseteq \mathcal{S}_{Y|X}$.
A variety of approaches for SDR have been proposed, such as sliced inverse regression (SIR [1]), sliced average variance estimation (SAVE [3]), parametric inverse regression (PIR [4]), principal Hessian directions (pHd [5]), contour regression (CR [6]), directional regression (DR [7]), kernel inverse regression (KIR [8]), cumulative mean estimation (CUME [9]) and the sliced average third moment ([10]), among others. However, these methods usually rely on assumptions about the distribution of the predictors, such as the linearity condition (LC) and the constant conditional variance condition (CCV). Cook and Nachtsheim [11] proposed reweighting the predictor vectors to handle non-elliptical distributions. Li and Dong [12] and Dong and Li [13] proposed methods based on the central solution space (CSS), which do not require the linearity condition. Ma and Zhu [14] constructed a semiparametric estimation framework for dimension reduction, removing the reliance on distributional assumptions for the predictors at the cost of an additional semiparametric regression.
Recently, Wang et al. [15] proposed an aggregate dimension reduction (ADR) procedure. The basic idea is to localize the dimension reduction process: the CS is estimated within a local neighborhood of each observation of the predictor vector, and the results of the localized dimension reductions are then aggregated. This greatly reduces the sensitivity of the method to distributional assumptions. Wang and Yin [16] extended this idea to the cumulative slicing estimation framework and proposed aggregate inverse mean estimation (AIME). Wang and Xue [17,18] proposed an ensemble of inverse moment estimators (ELF2M) and a structured covariance ensemble (enCov) to explore the central subspace. By aggregating information from the $k$th moment subspaces, these methods can effectively identify the dimension reduction directions when both the mean and the variance are modeled; in addition, enCov is applicable to both continuous and binary responses. In aggregate dimension reduction, the local kernel matrix may be affected by imbalance among the numbers of observations in the slices and by extreme values of $Y$. As kernel inverse regression avoids these problems, we combine nonparametric kernel methods with the aggregated SDR idea of Wang et al. [15] and propose a method called aggregate kernel inverse regression estimation (AKIRE). We apply the idea of kernel inverse regression locally to find the local dimension reduction subspaces and then aggregate them. In contrast to ADR and AIME, AKIRE provides a kernel estimate of $E(X \mid Y = y)$ through a nonparametric approach, which can be regarded as a smoothed, moving slicing estimator, thus reducing the possible impact of outliers. Moreover, in each local area, combining the kernel estimates of $E(X \mid Y = y)$ at many values of $y$ provides more accurate local dimension reduction results. Numerical studies show that AKIRE yields effective estimates and satisfactory robustness.
The paper is organized as follows. Section 2 reviews ADR and AIME. Section 3 introduces AKIRE, including its algorithm and tuning parameter selection. Simulation studies are included in Section 4 to illustrate the proposed method. Section 5 presents a practical application utilizing real data. Finally, we conclude the paper in Section 6.

2. Review of ADR and AIME

Let $G_i$ be any open set in $\Omega_X \subseteq \mathbb{R}^p$, where $\Omega_X$ is the support of $X$. ADR (Wang et al. [15]) rests on the fact that the local central subspaces $\mathcal{S}_{Y_{G_i} \mid X_{G_i}}$ must also belong to the global central subspace $\mathcal{S}_{Y|X}$. Therefore, the central subspace $\mathcal{S}_{Y|X}$ can always be decomposed into local dimension reduction subspaces, and we can aggregate the local subspaces to recover $\mathcal{S}_{Y|X}$, such that
$$ \mathcal{S}_{Y|X} = \mathrm{Span}\left( \bigcup_{i=1}^{m} \mathcal{S}_{Y_{G_i} \mid X_{G_i}} \right). \qquad (3) $$
Equation (3) guarantees that we can join a finite number of local central subspaces to recover the global central subspace.
Let $\bar{G}_i$ denote the closure of an open set $G_i$ and $\|G_i\|$ denote the "diameter" of $G_i$ in $\Omega_X$, in the sense that $\|G_i\| = \sup\{\|x - x'\| : x, x' \in G_i\}$. Let $\mu_{G_i} = E(X_{G_i})$ and $\dot{h}(y \mid x) = \partial h(y \mid x)/\partial x$, where $X_{G_i}$ follows the conditional distribution of $X$ given $X \in G_i$ and $h(y \mid x)$ is the conditional density of $Y$ given $X = x$. Let $\beta_{G_i}$ be a full-column-rank matrix whose column space equals that of $H_{G_i} = E\{\dot{h}(Y_{G_i} \mid \mu_{G_i})\,\dot{h}^{\mathrm{T}}(Y_{G_i} \mid \mu_{G_i})\}$, where $Y_{G_i}$ is $Y$ restricted to the set $G_i$. The space spanned by the columns of $\beta_{G_i}$ is denoted $\mathrm{Span}(\beta_{G_i})$, and $P_{\beta_{G_i}} = \beta_{G_i}(\beta_{G_i}^{\mathrm{T}}\beta_{G_i})^{-1}\beta_{G_i}^{\mathrm{T}}$ is the projection onto $\mathrm{Span}(\beta_{G_i})$. Note that $\mathrm{Span}(\beta_{G_i}) = \mathrm{Span}(P_{\beta_{G_i}}) \subseteq \mathcal{S}_{Y_{G_i} \mid X_{G_i}}$.
Proposition 1.
(Wang et al. [15]) Suppose that, for a fixed $y \in \Omega_Y$, where $\Omega_Y$ is the support of $Y$, the marginal density of $Y$ is greater than 0, $h(y \mid x)$ is twice differentiable with respect to $x$ on $\bar{G}_i$, and the second derivatives are bounded on $\bar{G}_i$. Then, as $\|G_i\| \to 0$, almost everywhere on $\Omega_Y$,
$$ \left\| \Sigma_{G_i}^{-1}\left\{ E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i}) \right\} - P_{\beta_{G_i}} \Sigma_{G_i}^{-1}\left\{ E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i}) \right\} \right\|_F = O(\|G_i\|), \qquad (4) $$
where $\|A\|_F$ denotes the Frobenius norm of a matrix $A$.
Since $\mathrm{Span}(\beta_{G_i}) \subseteq \mathcal{S}_{Y_{G_i} \mid X_{G_i}}$, Formula (4) indicates that the local inverse mean vector $E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$ can be used to estimate $\mathcal{S}_{Y_{G_i} \mid X_{G_i}}$ when $\|G_i\|$ is small enough. Based on this aggregate approach, Wang et al. [15] proposed kNN sliced inverse regression (kNNSIR) and adaptive kNN sliced inverse regression (a-kNNSIR), using the k-nearest-neighbor localizing mechanism. Wang and Yin [16] extended the above proposition to CUME and gave the following result. Let $m(\tilde{Y}) = E[\{X - E(X)\}\,I(Y < \tilde{Y})]$, where $I(\cdot)$ is an indicator function and $\tilde{Y}$ is an independent copy of $Y$. In addition, Wang and Yin [16] defined the local kernel matrix as $M_i = E[m(\tilde{Y})\,m^{\mathrm{T}}(\tilde{Y})\,\omega(\tilde{Y}) \mid X \in G_i]$, where $\omega(\cdot)$ is a non-negative weight function.
Proposition 2.
(Wang and Yin [16]) Assume the conditional density $h(y \mid x)$ is twice differentiable with respect to $x$, and its second derivative is bounded on $\Omega_X$. Then, when $\max_{i=1,2,\ldots,m} \|G_i\| \to 0$, we have $\mathrm{Span}(\Sigma_{G_i}^{-1} M_i) \subseteq \mathcal{S}_{Y_{G_i} \mid X_{G_i}}$ and $\bigcup_{i=1}^{m} \mathrm{Span}(\Sigma_{G_i}^{-1} M_i) \subseteq \mathcal{S}_{Y|X}$, where $\Sigma_{G_i}$ is the covariance matrix of $X$ within $G_i$.
These two propositions ensure that the aggregation of the local dimension reduction directions belongs to the central subspace. Localization not only transforms a nonlinear data structure into locally linear structures and weakens the linearity condition, but also overcomes the inability of SIR and CUME to detect symmetric structures, as the sketch below illustrates.
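To make the last point concrete, the following toy Python sketch (an illustration, not part of the original paper) contrasts global SIR with an aggregated local SIR on $Y = (\beta^{\mathrm{T}}X)^2 + 0.1\epsilon$ with $\beta = e_1$: globally $E(X \mid Y) \approx 0$ by symmetry, so SIR finds nothing, while aggregating SIR over k-nearest-neighbor patches recovers $e_1$. The sample size, neighborhood size, slice counts and the omission of the local $\Sigma_{G_i}^{-1}$ standardization are all simplifications chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 400, 10, 40
X = rng.standard_normal((n, p))
Y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)   # symmetric about 0 in the e1 direction

def sir_kernel(Xs, Ys, n_slices):
    """SIR kernel matrix: weighted outer products of the slice means of centered X."""
    Xc = Xs - Xs.mean(axis=0)
    M = np.zeros((Xs.shape[1], Xs.shape[1]))
    for s in np.array_split(np.argsort(Ys), n_slices):
        mean_s = Xc[s].mean(axis=0)
        M += (len(s) / len(Ys)) * np.outer(mean_s, mean_s)
    return M

# Global SIR: the slice means are ~0 by symmetry, so the top eigenvector is uninformative.
v_global = np.linalg.eigh(sir_kernel(X, Y, 10))[1][:, -1]

# Aggregated local SIR: run SIR inside each k-NN neighborhood and sum the kernel matrices.
M_agg = np.zeros((p, p))
for i in range(n):
    idx = np.argsort(((X - X[i]) ** 2).sum(axis=1))[:k]
    M_agg += sir_kernel(X[idx], Y[idx], 2)
v_local = np.linalg.eigh(M_agg)[1][:, -1]

print("|<e1, global SIR direction>|      :", round(abs(v_global[0]), 3))  # usually small
print("|<e1, aggregated local direction>|:", round(abs(v_local[0]), 3))   # usually close to 1
```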

3. Aggregate Kernel Inverse Regression Estimation

Both kNNSIR and a-kNNSIR employ SIR for the local dimension reduction in $G_i$. In SIR, the inverse mean $E(X \mid Y) - E(X)$ plays a key role, because $E(X \mid Y) - E(X) \in \Sigma_X \mathcal{S}_{Y|X}$ under the linearity condition. However, SIR uses $E(X \mid Y \in I_i)$ instead of $E(X \mid Y)$, where $I_i$, $i = 1, 2, \ldots, H$, denotes a partition of $\Omega_Y$, and that limits its performance. Moreover, CUME, on which AIME relies, also depends on slicing, with the dataset always divided into two slices.
Slicing is not the only way to exploit $E(X \mid Y)$. In fact, estimating $E(X \mid Y = y)$ directly by kernel smoothing (e.g., with the Nadaraya–Watson estimator) and combining the estimates at many values of $y$ [8] is a powerful alternative, known as kernel inverse regression (KIR). Under some regularity conditions, Zhu and Fang [8] proved the $\sqrt{n}$-consistency and asymptotic normality of the KIR estimators.
We then use KIR for the dimension reduction in each local area. Let $(x_{G_i,j}, y_{G_i,j})$, $j = 1, 2, \ldots, m$, be the data points whose observations of $X$ fall into the local area $G_i$. Denote
$$ \hat{g}_i(y) = \frac{1}{m}\sum_{j=1}^{m} K_h(y_{G_i,j} - y)\,(x_{G_i,j} - \bar{x}_{G_i}) = \frac{1}{mh}\sum_{j=1}^{m} K\{(y_{G_i,j} - y)/h\}\,(x_{G_i,j} - \bar{x}_{G_i}), $$
$$ \hat{f}_i(y) = \frac{1}{m}\sum_{j=1}^{m} K_h(y_{G_i,j} - y) = \frac{1}{mh}\sum_{j=1}^{m} K\{(y_{G_i,j} - y)/h\}, \qquad (5) $$
where $K_h(u) = K(u/h)/h$, $K(\cdot)$ is a kernel function, $h$ is a constant bandwidth and $\bar{x}_{G_i} = \sum_{j=1}^{m} x_{G_i,j}/m$. Then, $E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$ can be estimated by the following Nadaraya–Watson estimator:
$$ \hat{r}_i(y) = \frac{\hat{g}_i(y)}{\hat{f}_i(y)}. \qquad (6) $$
Based on $\hat{r}_i(Y)$, we construct the local kernel matrix for $G_i$,
$$ \hat{\Lambda}_i = \frac{1}{m}\sum_{j=1}^{m} \hat{r}_i(y_{G_i,j})\,\hat{r}_i^{\mathrm{T}}(y_{G_i,j}). \qquad (7) $$
Finally, aggregating all the $\hat{\Lambda}_i$'s from the $G_i$'s gives the global kernel matrix, and its spectral decomposition gives the estimate of the central subspace $\mathcal{S}_{Y|X}$.
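At the sample level, these quantities are easy to compute within a single neighborhood. The following Python sketch (an illustration, not the authors' implementation) evaluates $\hat{g}_i$, $\hat{f}_i$ and $\hat{r}_i$ at the observed responses in $G_i$ with a Gaussian kernel and returns $\hat{\Lambda}_i$; the bandwidth $h$ is supplied by the caller (its choice is discussed in Section 3.2).

```python
import numpy as np

def local_kernel_matrix(Xg, yg, h):
    """Lambda_hat_i for one neighborhood G_i: Nadaraya-Watson estimates of
    E(X_Gi | Y_Gi = y) - E(X_Gi) at each observed y, averaged as outer products."""
    m = len(yg)
    Xc = Xg - Xg.mean(axis=0)                        # x_{G_i,j} - xbar_{G_i}
    U = (yg[:, None] - yg[None, :]) / h              # U[j, l] = (y_{G_i,j} - y_{G_i,l}) / h
    K = np.exp(-0.5 * U ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel K(u)
    f_hat = K.sum(axis=0) / (m * h)                  # f_hat_i(y_{G_i,l})
    g_hat = K.T @ Xc / (m * h)                       # g_hat_i(y_{G_i,l}), stored as rows
    R = g_hat / f_hat[:, None]                       # r_hat_i(y_{G_i,l})
    return R.T @ R / m                               # Lambda_hat_i
```

Summing these local matrices over all neighborhoods (after the local standardization discussed below) and eigen-decomposing the sum then yields the estimated basis of $\mathcal{S}_{Y|X}$.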
Let $\tilde{Y}_{G_i}$ denote an independent copy of $Y_{G_i}$ and define $r_i(y) = E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$. Then, the population counterpart of the above $\hat{\Lambda}_i$ is
$$ \Lambda_i = E\left[ r_i(\tilde{Y}_{G_i})\, r_i^{\mathrm{T}}(\tilde{Y}_{G_i}) \right]. \qquad (8) $$
The following theorem provides the population consistency of the above method.
Theorem 1.
Assume that the conditional density $h(y \mid x)$ is twice differentiable with respect to $x$ and its second derivative is bounded on $\Omega_X$. Then, when $\max_{i=1,2,\ldots,m} \|G_i\| \to 0$, we have $\mathrm{Span}(\Sigma_{G_i}^{-1} \Lambda_i) \subseteq \mathcal{S}_{Y_{G_i} \mid X_{G_i}}$ and $\bigcup_{i=1}^{m} \mathrm{Span}(\Sigma_{G_i}^{-1} \Lambda_i) \subseteq \mathcal{S}_{Y|X}$, where $\Sigma_{G_i}$ is the covariance matrix of $X$ within $G_i$.
The proof follows that of Theorem 2 in Wang et al. [15] and is omitted here.

3.1. Estimation Algorithm

We now summarize the sample-level algorithm for AKIRE. Let $\{(x_i, y_i), i = 1, 2, \ldots, n\}$ be a sample from $(X, Y)$ and assume the structural dimension $d$ is known before estimation. The proposed AKIRE procedure is summarized in Algorithm 1.
Algorithm 1 highlights that the global kernel matrix $\hat{\Lambda}$ aggregates information from all the local central subspaces. Unlike ADR and AIME, we construct the kernel estimates of $E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$ at $y = y_{G_i,j}$, $j = 1, 2, \ldots, m$, and combine them to discover the local central subspaces. This ensures that the information contained in $G_i$ is used more fully and speeds up the convergence of $\hat{B}$.

3.2. Tuning Parameters in Algorithm 1

We now discuss how to choose the tuning parameters in Algorithm 1, including the size $m$ of the local neighborhoods (the number of nearest neighbors), the kernel bandwidth $h$, the order $\alpha$ of the $\Sigma$-envelope, the weight function $\omega(\cdot)$, and the structural dimension $d_{Y|X}$.
Our main improvement concerns the dimension reduction step in the local areas, where a Nadaraya–Watson estimator of $E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$ is used. In this step, the only tuning parameter is the bandwidth $h$, and we use the MATLAB function "ksdensity" to output the best bandwidth. For the tuning parameters in the other steps, we largely adopt the suggestions of Wang et al. [15]. The value of $m$ is selected within the range of $2p$ to $4p$. The order $\alpha$ of the $\Sigma$-envelope is determined from the ratios of the consecutive eigenvalues of $\hat{R}_{G_i}\hat{R}_{G_i}^{\mathrm{T}}$ (Li et al. [19]):
$$ \alpha_{G_i} = \sum_{j=1}^{p-1} I\left( r_j / r_{j+1} > \alpha_0 \right), \qquad (9) $$
where $r_1 \geq \cdots \geq r_p$ are the eigenvalues of the matrix $\hat{R}_{G_i}\hat{R}_{G_i}^{\mathrm{T}}$, $\hat{R}_{G_i} = (\hat{\xi}_{G_i}, \hat{\Sigma}_{G_i}\hat{\xi}_{G_i}, \ldots, \hat{\Sigma}_{G_i}^{\alpha-1}\hat{\xi}_{G_i})$, and $\alpha_0$ is a pre-specified threshold, set to 1.5 in the numerical studies as recommended by Li et al. [19]. As to the weight function, we choose $\omega(\eta_i) = \|\eta_i\|_2^2$. We borrow the bootstrap procedure proposed by Ye and Weiss [20] to estimate $d_{Y|X}$. Let $\hat{\mathcal{S}}_{d^*}$ be an estimate of $\mathcal{S}_{Y|X}$ for a fixed $d^* \in \{1, 2, \ldots, p-1\}$, and let $\hat{\mathcal{S}}_{d^*}^{(b)}$, $b = 1, 2, \ldots, n_b$, be its bootstrap estimates. The structural dimension $d_{Y|X}$ is determined by maximizing the mean of the distances between $\hat{\mathcal{S}}_{d^*}^{(b)}$ and $\hat{\mathcal{S}}_{d^*}$. See Ye and Weiss [20] and Wang et al. [15] for more details.
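A minimal sketch of this order selection, reading the ratio criterion above as a count of consecutive eigenvalue ratios exceeding the threshold (the clipping of tiny eigenvalues is an added numerical guard, not part of the rule):

```python
import numpy as np

def envelope_order(R_hat, alpha0=1.5):
    """Count how many consecutive eigenvalue ratios of R_hat R_hat' exceed alpha0,
    with the threshold 1.5 of Li et al. [19].  R_hat is the local matrix
    (xi_hat, Sigma_hat xi_hat, ..., Sigma_hat^{alpha-1} xi_hat) for one G_i."""
    r = np.sort(np.linalg.eigvalsh(R_hat @ R_hat.T))[::-1]   # r_1 >= ... >= r_p
    r = np.clip(r, 1e-12, None)                              # numerical guard only
    return int(np.sum(r[:-1] / r[1:] > alpha0))
```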
Algorithm 1: Aggregate Kernel Inverse Regression Estimation
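The algorithm box appears as an image in the published article. As a stand-in, the following Python sketch traces the procedure described in Section 3 and Section 3.2 (localize by k-nearest neighbors, run kernel inverse regression within each neighborhood, aggregate, eigen-decompose). It omits the $\Sigma$-envelope and the weight function $\omega(\cdot)$, uses a ridge-regularized local covariance and a rule-of-thumb bandwidth, and is therefore an illustration of the flow rather than a reproduction of Algorithm 1.

```python
import numpy as np

def akire_sketch(X, Y, d, k=None, h=None):
    """Simplified AKIRE flow: localize -> local kernel inverse regression ->
    aggregate -> eigen-decompose.  The Sigma-envelope and weight function of
    Algorithm 1 are omitted; a ridge-regularized local covariance is used instead."""
    n, p = X.shape
    k = k or min(n, 4 * p)                                  # neighborhood size in [2p, 4p]
    Lam = np.zeros((p, p))
    for i in range(n):
        idx = np.argsort(((X - X[i]) ** 2).sum(axis=1))[:k]
        Xg, yg = X[idx], Y[idx]
        hg = h or max(1.06 * yg.std() * k ** (-0.2), 1e-3)  # rule-of-thumb bandwidth
        Xc = Xg - Xg.mean(axis=0)
        U = (yg[:, None] - yg[None, :]) / hg                # pairwise (y_j - y_l)/h
        K = np.exp(-0.5 * U ** 2) / np.sqrt(2 * np.pi)      # Gaussian kernel
        R = (K.T @ Xc) / K.sum(axis=0)[:, None]             # Nadaraya-Watson r_hat_i(y_l)
        Li = R.T @ R / k                                    # local kernel matrix Lambda_i
        Sinv = np.linalg.inv(np.cov(Xg, rowvar=False) + 1e-3 * np.eye(p))
        Lam += Sinv @ Li @ Sinv                             # same span as Sigma^{-1} Lambda_i
    return np.linalg.eigh(Lam)[1][:, -d:][:, ::-1]          # top-d eigenvectors = B_hat
```

For example, calling `akire_sketch(X, Y, d=2)` on data from Model 1 in Section 4 returns a $p \times 2$ matrix whose column space can be compared with $\mathrm{Span}(\beta_1, \beta_2)$ via the criterion $q$ defined there.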

4. Simulation Studies

In this section, we evaluate the finite-sample performance of AKIRE through simulations. We compare AKIRE with the aggregate approaches a-kNNSIR and AIME. The vector correlation coefficient $q$ (Ye and Weiss [20]) is used to measure the estimation accuracy. Various criteria are available for evaluating the accuracy of estimated directions, including the Euclidean distance between $P_B$ and $P_{\hat{B}}$ and the trace correlation coefficient between $B$ and $\hat{B}$, among others. Since ADR and AIME use the criterion $q$, we also use it for comparison. Let $B$ be an orthonormal basis of the CS and $\hat{B}$ be an estimate of $B$ satisfying $\hat{B}^{\mathrm{T}}\hat{B} = I_d$. Then, $q$ is defined as
$$ q = \left| \hat{B}^{\mathrm{T}} B B^{\mathrm{T}} \hat{B} \right|^{1/2}, \qquad (10) $$
where $|\cdot|$ denotes the determinant, $0 \leq q \leq 1$, and a larger $q$ indicates that $\mathrm{Span}(\hat{B})$ is closer to $\mathrm{Span}(B)$.
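A small helper for this criterion, under the reading of $q$ above as the square root of the determinant of $\hat{B}^{\mathrm{T}}BB^{\mathrm{T}}\hat{B}$ for orthonormal bases (the QR orthonormalization and the floor at zero are added safeguards):

```python
import numpy as np

def vector_correlation_q(B_hat, B):
    """q between Span(B_hat) and Span(B): sqrt(det(B_hat' B B' B_hat)) after
    orthonormalizing both bases; q = 1 exactly when the two subspaces coincide."""
    Q1, _ = np.linalg.qr(B_hat)
    Q2, _ = np.linalg.qr(B)
    return float(np.sqrt(max(np.linalg.det(Q1.T @ Q2 @ Q2.T @ Q1), 0.0)))
```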
The following six models are considered in the numerical study:
Model 1: $Y = 0.5(\beta_1^{\mathrm{T}}X)^3 + 0.5(1 + \beta_2^{\mathrm{T}}X)^2 + 0.2\epsilon_1$;
Model 2: $Y = \mathrm{sgn}(2\beta_1^{\mathrm{T}}X + \epsilon_1) \times \log|2\beta_2^{\mathrm{T}}X + 3 + \epsilon_2|$;
Model 3: $Y = 2(\beta_1^{\mathrm{T}}X)^2 + 2\exp(\beta_2^{\mathrm{T}}X)\,\epsilon_1$;
Model 4: $Y = 3\sin(\beta_1^{\mathrm{T}}X/4) + 0.2\{1 + (\beta_2^{\mathrm{T}}X)^2\}\,\epsilon_1$;
Model 5: $Y = \dfrac{\beta_1^{\mathrm{T}}X}{0.5 + (1.5 + \beta_2^{\mathrm{T}}X)^2} + (\beta_3^{\mathrm{T}}X)^2\,\epsilon_1$;
Model 6: $Y = (\beta_1^{\mathrm{T}}X)(\beta_2^{\mathrm{T}}X)^2 + (\beta_3^{\mathrm{T}}X)(\beta_4^{\mathrm{T}}X) + 0.5\epsilon_1$.
All of these models have been studied extensively in the sufficient dimension reduction literature. In Models 1–4, $X \sim N_p(0, \Sigma)$ with $\Sigma = (\sigma_{ij}) = (0.5^{|i-j|})$. In Models 5–6, $X \sim N_p(0, I)$. The standard Gaussian noises $\epsilon_1$ and $\epsilon_2$ are independent of $X$.
Model 1 is taken from Zhu et al. [9], who proposed it for CUME; Model 2 comes from Chen and Li [21]; Model 3 is borrowed from Xia [22]; Model 4 was studied by Li and Wang [7]; Model 5 was used by Wang and Xia [23] for sliced regression (SR); and Model 6 is from Xia et al. [24] in their study of MAVE. The dimension of the CS is two for Models 1–4, three for Model 5 and four for Model 6. In Model 1, $\beta_1 = (1,1,1,0,\ldots,0)^{\mathrm{T}}$ and $\beta_2 = (1,0,0,0,1,3,0,\ldots,0)^{\mathrm{T}}$. In Model 2, $\beta_1 = (0.5,0.5,0.5,0.5,0,\ldots,0)^{\mathrm{T}}$, $\beta_2 = (0,\ldots,0,0.5,0.5,0.5,0.5)^{\mathrm{T}}$, and the function $\mathrm{sgn}(\cdot)$ takes the value $1$ or $-1$ according to the sign of its argument. In Model 3, the first 10 elements of $\beta_1$ and $\beta_2$ are $(1,2,0,\ldots,0,2)^{\mathrm{T}}/3$ and $(0,0,3,4,0,\ldots,0)^{\mathrm{T}}/5$, respectively, and the remaining elements are zeros. In Model 4, $\beta_1 = (1,1,1,0,\ldots,0)^{\mathrm{T}}$ and $\beta_2 = (0,0,0,1,3,0,\ldots,0)^{\mathrm{T}}$. In Model 5, $\beta_1 = (1,0,\ldots,0)^{\mathrm{T}}$, $\beta_2 = (0,1,0,\ldots,0)^{\mathrm{T}}$ and $\beta_3 = (0,0,1,0,\ldots,0)^{\mathrm{T}}$. In Model 6, $\beta_1 = (1,2,3,4,0,\ldots,0)^{\mathrm{T}}/\sqrt{30}$, $\beta_2 = (2,1,4,3,1,2,0,\ldots,0)^{\mathrm{T}}/\sqrt{35}$, $\beta_3 = (0,\ldots,0,2,1,2,1,2,1)^{\mathrm{T}}/\sqrt{15}$ and $\beta_4 = (0,\ldots,0,1,1,1,1)^{\mathrm{T}}/2$.
In Models 1–4, the predictor dimension is set to $p = 20$, and two sample sizes, $n = 200$ and $n = 400$, are compared. In Models 5–6, the sample size is set to $n = 400$, and two dimensions, $p = 10$ and $p = 20$, are compared. In all the models, the size of the nearest neighborhood is taken to be $m = 4p$, and the AKIRE method uses the Gaussian kernel. We conduct 100 replications. The boxplots of $q$ are shown in Figures 1–6.
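For reference, a sketch of how data from Model 1 could be generated under the stated design ($\Sigma_{ij} = 0.5^{|i-j|}$ and the $\beta$'s listed above); the seed and the returned true basis are conveniences for experimentation, not part of the paper's setup.

```python
import numpy as np

def generate_model1(n=200, p=20, seed=0):
    """Model 1: Y = 0.5 (b1'X)^3 + 0.5 (1 + b2'X)^2 + 0.2 eps, X ~ N_p(0, Sigma),
    Sigma_ij = 0.5^|i-j|.  Returns X, Y and the true basis B = (b1, b2)."""
    rng = np.random.default_rng(seed)
    Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    b1 = np.zeros(p); b1[:3] = 1.0                      # beta_1 = (1,1,1,0,...,0)'
    b2 = np.zeros(p); b2[[0, 4, 5]] = [1.0, 1.0, 3.0]   # beta_2 = (1,0,0,0,1,3,0,...,0)'
    Y = 0.5 * (X @ b1) ** 3 + 0.5 * (1.0 + X @ b2) ** 2 + 0.2 * rng.standard_normal(n)
    return X, Y, np.column_stack([b1, b2])
```

Feeding such data to the AKIRE sketch after Algorithm 1 and scoring the result with the $q$ helper above gives accuracy comparisons of the same kind as those summarized in Figures 1–6, though not with the authors' exact implementation.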
Wang et al. [15] and Wang and Yin [16] showed that traditional sufficient dimension reduction methods such as SIR, SAVE, CUME and FSIR are far less effective than a-kNNSIR and AIME for these models; therefore, the proposed method is compared only with a-kNNSIR and AIME. In Model 1, both AIME and AKIRE perform very well, with AIME performing slightly better. In Model 2 (Figure 2a), the median $q$-value of AKIRE is larger, while in Figure 2b the median $q$-value of a-kNNSIR is slightly larger and its box is narrower; thus, AKIRE outperforms the other two methods for $n = 200$, and a-kNNSIR performs slightly better for $n = 400$. In Models 3 and 4, the AKIRE boxplots in Figure 3a and Figure 4a are overall higher than those of a-kNNSIR and AIME, while Figure 3b and Figure 4b show that the median $q$-value of AKIRE is slightly lower than that of AIME but its boxes are narrower. This indicates that AKIRE provides better results at $n = 200$, and that AKIRE and AIME perform similarly at $n = 400$, with AKIRE being more robust. In Models 5 and 6, AKIRE clearly outperforms the other two methods: Figure 5 and Figure 6 show that its median $q$-value is higher than those of the other two approaches, and its boxes are significantly narrower, indicating a small standard deviation of the $q$-values.

5. A Real Data Example

In this section, we analyze a dataset on US college admissions. The dataset, used in the ASA Statistical Graphics Section's 1995 Data Analysis Exposition, is described in the textbook [25]; we obtained it through the ISLR package in R. We are interested in the number of applications ($Y$) received by the 558 private institutions whose number of full-time undergraduates is less than 10,000. We investigate the relationship between $Y$ and 11 predictors: the number of full-time undergraduates, the number of part-time undergraduates, out-of-state tuition, room and board costs, estimated book costs, estimated personal spending, the percentage of faculty with a terminal degree, the student/faculty ratio, the percentage of alumni who donate, the instructional expenditure per student and the graduation rate. All the predictors were standardized separately, and we took the logarithm of the response variable.
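A sketch of this preprocessing in Python, assuming the College data have been exported from the ISLR R package to a CSV file (e.g., via write.csv(ISLR::College, "College.csv")); the column names follow the ISLR data frame and should be checked against the actual export.

```python
import numpy as np
import pandas as pd

# Private institutions with fewer than 10,000 full-time undergraduates (558 rows expected).
college = pd.read_csv("College.csv", index_col=0)
sub = college[(college["Private"] == "Yes") & (college["F.Undergrad"] < 10000)]

predictors = ["F.Undergrad", "P.Undergrad", "Outstate", "Room.Board", "Books", "Personal",
              "Terminal", "S.F.Ratio", "perc.alumni", "Expend", "Grad.Rate"]
X = sub[predictors].to_numpy(dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each predictor separately
y = np.log(sub["Apps"].to_numpy(dtype=float))      # log of the number of applications
```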
We used the bootstrap procedure to determine the dimension $d$ of the central subspace, which was estimated to be two. The dimension reduction directions estimated by AKIRE were
$\hat{\beta}_1 = (0.9834, 0.0757, 0.0991, 0.0391, 0.0280, 0.0025, 0.0146, 0.0984, 0.0396, 0.0339, 0.0489)^{\mathrm{T}}$,
$\hat{\beta}_2 = (0.1625, 0.4000, 0.2775, 0.0683, 0.2017, 0.1113, 0.2455, 0.7462, 0.1019, 0.0712, 0.2154)^{\mathrm{T}}$.
It can be seen that $\hat{\beta}_1^{\mathrm{T}}x$ is mainly driven by the number of full-time undergraduates, while $\hat{\beta}_2^{\mathrm{T}}x$ is mainly driven by the student/faculty ratio and the number of part-time undergraduates. The scatterplots of $(\hat{\beta}^{\mathrm{T}}x, \log(Y))$ are presented in Figure 7, where obvious nonlinear trends exist. AKIRE successfully finds two significant dimension reduction directions, which are more clearly visible than those of aggregate SIR. We then constructed a three-dimensional scatterplot, shown in Figure 8, which further illustrates the nonlinear relationship between $\log(Y)$ and the pair $(\hat{\beta}_1^{\mathrm{T}}x, \hat{\beta}_2^{\mathrm{T}}x)$.

6. Discussion

In this article, we propose AKIRE for estimating the central subspace. The method combines the idea of aggregate dimension reduction with kernel inverse regression, which reduces the dependence on the linearity condition and avoids the influence of the arrangement of slices. Numerical experiments demonstrate the satisfactory performance of the proposed approach, especially with complex models. A direct extension of the current method is to multivariate responses, which can be handled by projective resampling. Another direction is to study how to apply our methodology to the case where $p > n$. Under the assumption of sparsity in the features $X$, Yang et al. [26] proposed a two-step, feature-selection-assisted process: the first step implements a model-free feature selection method to reduce the number of predictors to a manageable scale, and it is in this setting that AKIRE may provide an improvement.

Author Contributions

Methodology, W.L.; software, W.L. and W.W.; writing—original draft preparation, W.L.; writing—review and editing, W.R. and W.W.; supervision, J.C.; Funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (grant no. 12061082), People’s Government of Yunnan Province (grant no. YB2021097), Yunnan Provincial Department of Education Science Research Fund Project (grant no. 2021J0574), Yunnan Fundamental Research Young Scholars Project (grant no. 202001AU070065), Talent Introduction Project of Yunnan University of Finance and Economics (grant no. 2020D02) and PhD Scientific Research Foundation of Jiangxi Science and Technology Normal University (grant no. 2022BSQD16).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank Chen Fei for guidance and encouragement throughout this work. We also thank the editors and two referees for constructive comments which led to a substantial improvement of this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, K.C. Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 1991, 86, 316–327.
2. Cook, R.D.; Li, B. Dimension reduction for conditional mean in regression. Ann. Stat. 2002, 30, 455–474.
3. Cook, R.D.; Weisberg, S. Sliced inverse regression for dimension reduction: Comment. J. Am. Stat. Assoc. 1991, 86, 328–332.
4. Bura, E.; Cook, R.D. Estimating the structural dimension of regressions via parametric inverse regression. J. R. Stat. Soc. Ser. B 2001, 63, 393–410.
5. Li, K.C. On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma. J. Am. Stat. Assoc. 1992, 87, 1025–1039.
6. Li, B.; Zha, H.; Chiaromonte, F. Contour regression: A general approach to dimension reduction. Ann. Stat. 2005, 33, 1580–1616.
7. Li, B.; Wang, S. On directional regression for dimension reduction. J. Am. Stat. Assoc. 2007, 102, 997–1008.
8. Zhu, L.X.; Fang, K.T. Asymptotics for kernel estimate of sliced inverse regression. Ann. Stat. 1996, 24, 1053–1068.
9. Zhu, L.P.; Zhu, L.X.; Feng, Z.H. Dimension reduction in regressions through cumulative slicing estimation. J. Am. Stat. Assoc. 2010, 105, 1455–1466.
10. Yin, X.; Cook, R.D. Estimating central subspaces via inverse third moments. Biometrika 2003, 90, 113–125.
11. Cook, R.D.; Nachtsheim, C.J. Reweighting to achieve elliptically contoured covariates in regression. J. Am. Stat. Assoc. 1994, 89, 592–599.
12. Li, B.; Dong, Y. Dimension reduction for nonelliptically distributed predictors. Ann. Stat. 2009, 37, 1272–1298.
13. Dong, Y.; Li, B. Dimension reduction for non-elliptically distributed predictors: Second-order methods. Biometrika 2010, 97, 279–294.
14. Ma, Y.; Zhu, L. A semiparametric approach to dimension reduction. J. Am. Stat. Assoc. 2012, 107, 168–179.
15. Wang, Q.; Yin, X.; Li, B.; Tang, Z. On aggregate dimension reduction. Stat. Sin. 2020, 30, 1027–1048.
16. Wang, Q.; Yin, X. Aggregate inverse mean estimation for sufficient dimension reduction. Technometrics 2021, 63, 456–465.
17. Wang, Q.; Xue, Y. An ensemble of inverse moment estimators for sufficient dimension reduction. Comput. Stat. Data Anal. 2021, 161, 107241.
18. Wang, Q.; Xue, Y. A structured covariance ensemble for sufficient dimension reduction. Adv. Data Anal. Classif. 2022, 1–24.
19. Li, L.; Cook, R.D.; Tsai, C.L. Partial inverse regression. Biometrika 2007, 94, 615–625.
20. Ye, Z.; Weiss, R.E. Using the bootstrap to select one of a new class of dimension reduction methods. J. Am. Stat. Assoc. 2003, 98, 968–979.
21. Chen, C.H.; Li, K.C. Can SIR be as popular as multiple linear regression? Stat. Sin. 1998, 8, 289–316.
22. Xia, Y. A constructive approach to the estimation of dimension reduction directions. Ann. Stat. 2007, 35, 2654–2690.
23. Wang, H.; Xia, Y. Sliced regression for dimension reduction. J. Am. Stat. Assoc. 2008, 103, 811–821.
24. Xia, Y.; Tong, H.; Li, W.K.; Zhu, L.X. An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B 2002, 64, 363–410.
25. Gareth, J.; Daniela, W.; Trevor, H.; Robert, T. An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2013; pp. 67–68.
26. Yang, B.; Yin, X.; Zhang, N. Sufficient variable selection using independence measures for continuous response. J. Multivar. Anal. 2019, 173, 480–493.
Figure 1. Comparison of the estimation accuracy for Model 1.
Figure 2. Comparison of the estimation accuracy for Model 2.
Figure 3. Comparison of the estimation accuracy for Model 3.
Figure 4. Comparison of the estimation accuracy for Model 4.
Figure 5. Comparison of the estimation accuracy for Model 5.
Figure 6. Comparison of the estimation accuracy for Model 6.
Figure 7. Analysis of the college admission data: (a) scatterplot of $\log(Y)$ vs. the first estimated direction; (b) scatterplot of $\log(Y)$ vs. the second estimated direction.
Figure 8. Analysis of the college admission data: 3D scatterplot of $\log(Y)$ and the two estimated directions.