Article

Aggregate Kernel Inverse Regression Estimation

Wenjuan Li, Wenying Wang, Jingsi Chen and Weidong Rao
1 School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming 650221, China
2 School of Mathematics and Computer Science, Jiangxi Science and Technology Normal University, Nanchang 330038, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(12), 2682; https://doi.org/10.3390/math11122682
Submission received: 17 May 2023 / Revised: 5 June 2023 / Accepted: 6 June 2023 / Published: 13 June 2023

Abstract
Sufficient dimension reduction (SDR) is a useful tool for nonparametric regression with high-dimensional predictors. Many existing SDR methods rely on distributional assumptions about the predictors. Wang et al. proposed an aggregate dimension reduction method to reduce the dependence on such assumptions. Motivated by their work, we propose a novel and effective method that combines the aggregate approach with kernel inverse regression estimation. The proposed approach accurately estimates the dimension reduction directions and substantially improves the exhaustiveness of the estimates for complex models. At the same time, it does not depend on the arrangement of slices, and the influence of extreme values of the response is reduced. It performs well in numerical examples and a real data application.

1. Introduction

Sufficient dimension reduction (SDR) has garnered significant interest as an efficient regression tool for high-dimensional data since the groundbreaking work of Li [1]. Given a univariate response $Y \in \mathbb{R}$ and a $p$-dimensional predictor $X \in \mathbb{R}^p$, the goal of SDR is to replace $X$ with a small set of linear combinations $B^{\mathrm{T}}X$, where $B = (\beta_1, \beta_2, \ldots, \beta_d)$ is a $p \times d$ matrix with $d < p$. Let $\mathcal{S}$ be the column space of $B$. That is, SDR seeks a $d$-dimensional subspace $\mathcal{S} \subseteq \mathbb{R}^p$ such that
$$ Y \perp\!\!\!\perp X \mid P_{\mathcal{S}}X, \qquad (1) $$
where "$\perp\!\!\!\perp$" denotes independence and $P_{\mathcal{S}}$ represents the orthogonal projection operator onto $\mathcal{S}$. The intersection of all such subspaces $\mathcal{S}$, provided it also satisfies the conditional independence in (1), is defined as the central subspace (CS) and is denoted by $\mathcal{S}_{Y|X}$. The dimension of $\mathcal{S}_{Y|X}$, denoted by $d_{Y|X}$, is called the structural dimension. When the conditional independence in (1) is replaced by
$$ Y \perp\!\!\!\perp E[Y \mid X] \mid P_{\mathcal{S}}X, \qquad (2) $$
$\mathcal{S}$ is called a mean dimension reduction subspace. The central mean subspace (CMS, Cook and Li [2]), denoted by $\mathcal{S}_{E[Y|X]}$, is the intersection of all mean dimension reduction subspaces, provided it itself satisfies the conditional independence in (2). Cook and Li [2] showed that $\mathcal{S}_{E[Y|X]} \subseteq \mathcal{S}_{Y|X}$.
A variety of approaches for SDR have been proposed, such as sliced inverse regression (SIR [1]), sliced average variance estimation (SAVE [3]), parametric inverse regression (PIR [4]), principal Hessian directions (pHd [5]), contour regression (CR [6]), directional regression (DR [7]), kernel inverse regression (KIR [8]), cumulative mean estimation (CUME [9]) and the sliced average third moment ([10]), among others. However, these methods usually rely on assumptions about the distribution of the predictors, such as the linearity condition (LC) and the constant conditional variance condition (CCV). Cook and Nachtsheim [11] proposed reweighting the predictor vectors to handle non-elliptical distributions. Li and Dong [12] and Dong and Li [13] proposed methods based on the central solution space (CSS), which do not require the linearity condition. Ma and Zhu [14] constructed a semiparametric estimation framework for dimension reduction, removing the reliance on distributional assumptions for the predictors at the cost of an additional semiparametric regression.
Recently, Wang et al. [15] proposed an aggregate dimension reduction (ADR) procedure. The basic idea is to localize the dimension reduction process: the CS is estimated within a local neighborhood of each observation of the predictor vector, and the results of the localized dimension reductions are then aggregated. This greatly reduces the sensitivity of the method to distributional assumptions. Wang and Yin [16] extended this idea to the cumulative slicing estimation framework and proposed aggregate inverse mean estimation (AIME). Wang and Xue [17,18] proposed an ensemble of inverse moment estimators (ELF2M) and a structured covariance ensemble (enCov) to explore the central subspace. By aggregating information from the $k$th moment subspaces, these methods can effectively identify the dimension reduction directions when both the mean and the variance are modeled; in addition, enCov is applicable to both continuous and binary responses. In aggregate dimension reduction, the local kernel matrix may be affected by imbalance among the numbers of observations in the slices and by extreme values of $Y$. As kernel inverse regression avoids these problems, we combine nonparametric kernel methods with the aggregated SDR idea of Wang et al. [15] and propose a method called aggregate kernel inverse regression estimation (AKIRE). We apply the idea of kernel inverse regression locally to find the local dimension reduction subspaces and then aggregate them. In contrast to ADR and AIME, AKIRE provides a kernel estimate of $E(X \mid Y = y)$ through a nonparametric approach, which can be regarded as a smoothed, moving slicing estimator, thus reducing the possible impact of outliers. Moreover, in each local area, combining the kernel estimates of $E(X \mid Y = y)$ at many values of $y$ provides more accurate local dimension reduction results. Numerical studies show that AKIRE yields effective estimates and satisfactory robustness.
The paper is organized as follows. Section 2 reviews ADR and AIME. Section 3 introduces AKIRE, including its algorithm and tuning parameter selection. Simulation studies are included in Section 4 to illustrate the proposed method. Section 5 presents a practical application utilizing real data. Finally, we conclude the paper in Section 6.

2. Review of ADR and AIME

Let $G_i$ be any open set in $\Omega_X \subseteq \mathbb{R}^p$, where $\Omega_X$ is the support of $X$. ADR (Wang et al. [15]) rests on the fact that the local central subspaces $\mathcal{S}_{Y_{G_i} \mid X_{G_i}}$ must also belong to the global central subspace $\mathcal{S}_{Y|X}$. Therefore, the central subspace $\mathcal{S}_{Y|X}$ can always be decomposed into local dimension reduction subspaces, and we can aggregate the local subspaces to recover $\mathcal{S}_{Y|X}$, such that
$$ \mathcal{S}_{Y|X} = \mathrm{Span}\left( \bigcup_{i=1}^{m} \mathcal{S}_{Y_{G_i} \mid X_{G_i}} \right). \qquad (3) $$
Equation (3) guarantees that we can join a finite number of local central subspaces to recover the global central subspace.
Let $\bar{G}_i$ denote the closure of an open set $G_i$ and $\|G_i\|$ denote the "diameter" of $G_i$ in $\Omega_X$, in the sense that $\|G_i\| = \sup\{\|x - x'\| : x, x' \in G_i\}$. Let $\mu_{G_i} = E(X_{G_i})$ and $\dot{h}(y \mid x) = \partial h(y \mid x)/\partial x$, where $X_{G_i}$ follows the conditional distribution of $X$ given $X \in G_i$ and $h(y \mid x)$ is the conditional density of $Y$ given $X = x$. Let $\beta_{G_i}$ be a full-column-rank matrix whose column space equals that of $H_{G_i} = E\{\dot{h}(Y_{G_i} \mid \mu_{G_i})\,\dot{h}^{\mathrm{T}}(Y_{G_i} \mid \mu_{G_i})\}$, where $Y_{G_i}$ is $Y$ restricted to the set $G_i$. The space spanned by the columns of $\beta_{G_i}$ is denoted $\mathrm{Span}(\beta_{G_i})$, and $P_{\beta_{G_i}} = \beta_{G_i}(\beta_{G_i}^{\mathrm{T}}\beta_{G_i})^{-1}\beta_{G_i}^{\mathrm{T}}$ is the projection onto $\mathrm{Span}(\beta_{G_i})$. Note that $\mathrm{Span}(\beta_{G_i}) = \mathrm{Span}(P_{\beta_{G_i}}) \subseteq \mathcal{S}_{Y_{G_i} \mid X_{G_i}}$.
Proposition 1.
(Wang et al. [15]) Suppose that, for a fixed $y \in \Omega_Y$, where $\Omega_Y$ is the support of $Y$, the marginal density of $Y$ is greater than 0, $h(y \mid x)$ is twice differentiable with respect to $x$ on $\bar{G}_i$, and the second derivatives are bounded on $\bar{G}_i$. Then, as $\|G_i\| \to 0$, almost everywhere on $\Omega_Y$,
$$ \left\| \Sigma_{G_i}^{-1}\left\{ E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i}) \right\} - P_{\beta_{G_i}} \Sigma_{G_i}^{-1}\left\{ E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i}) \right\} \right\|_F = O(\|G_i\|), \qquad (4) $$
where $\|A\|_F$ denotes the Frobenius norm of a matrix $A$.
Since $\mathrm{Span}(\beta_{G_i}) \subseteq \mathcal{S}_{Y_{G_i} \mid X_{G_i}}$, Formula (4) indicates that the local inverse mean vector $E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$ can be used to estimate $\mathcal{S}_{Y_{G_i} \mid X_{G_i}}$ when $\|G_i\|$ is small enough. Based on this aggregate approach, Wang et al. [15] proposed kNN sliced inverse regression (kNNSIR) and adaptive kNN sliced inverse regression (a-kNNSIR), using the k-nearest-neighbor localizing mechanism. Wang and Yin [16] extended the above proposition to CUME and gave the following result. Let $m(\tilde{Y}) = E[\{X - E(X)\}\,I(Y < \tilde{Y})]$, where $I(\cdot)$ is an indicator function and $\tilde{Y}$ is an independent copy of $Y$. In addition, Wang and Yin [16] defined the local kernel matrix as $M_i = E[m(\tilde{Y})\,m^{\mathrm{T}}(\tilde{Y})\,\omega(\tilde{Y}) \mid X \in G_i]$, where $\omega(\cdot)$ is a non-negative weight function.
Proposition 2.
(Wang and Yin [16]) Assume the conditional density $h(y \mid x)$ is twice differentiable with respect to $x$, and its second derivative is bounded on $\Omega_X$. Then, when $\max_{i=1,2,\ldots,m} \|G_i\| \to 0$, we have $\mathrm{Span}(\Sigma_{G_i}^{-1} M_i) \subseteq \mathcal{S}_{Y_{G_i} \mid X_{G_i}}$ and $\bigcup_{i=1}^{m} \mathrm{Span}(\Sigma_{G_i}^{-1} M_i) \subseteq \mathcal{S}_{Y|X}$, where $\Sigma_{G_i}$ is the covariance matrix of $X$ within $G_i$.
These two propositions ensure that the aggregation of the local dimension reduction directions belongs to the central subspace. Localization not only transforms a nonlinear data structure into locally linear structures and weakens the linearity condition, but also overcomes the inability of SIR and CUME to detect symmetric structures, as the sketch below illustrates.
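To make the last point concrete, the following toy Python sketch (an illustration, not part of the original paper) contrasts global SIR with an aggregated local SIR on $Y = (\beta^{\mathrm{T}}X)^2 + 0.1\epsilon$ with $\beta = e_1$: globally $E(X \mid Y) \approx 0$ by symmetry, so SIR finds nothing, while aggregating SIR over k-nearest-neighbor patches recovers $e_1$. The sample size, neighborhood size, slice counts and the omission of the local $\Sigma_{G_i}^{-1}$ standardization are all simplifications chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 400, 10, 40
X = rng.standard_normal((n, p))
Y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)   # symmetric about 0 in the e1 direction

def sir_kernel(Xs, Ys, n_slices):
    """SIR kernel matrix: weighted outer products of the slice means of centered X."""
    Xc = Xs - Xs.mean(axis=0)
    M = np.zeros((Xs.shape[1], Xs.shape[1]))
    for s in np.array_split(np.argsort(Ys), n_slices):
        mean_s = Xc[s].mean(axis=0)
        M += (len(s) / len(Ys)) * np.outer(mean_s, mean_s)
    return M

# Global SIR: the slice means are ~0 by symmetry, so the top eigenvector is uninformative.
v_global = np.linalg.eigh(sir_kernel(X, Y, 10))[1][:, -1]

# Aggregated local SIR: run SIR inside each k-NN neighborhood and sum the kernel matrices.
M_agg = np.zeros((p, p))
for i in range(n):
    idx = np.argsort(((X - X[i]) ** 2).sum(axis=1))[:k]
    M_agg += sir_kernel(X[idx], Y[idx], 2)
v_local = np.linalg.eigh(M_agg)[1][:, -1]

print("|<e1, global SIR direction>|      :", round(abs(v_global[0]), 3))  # usually small
print("|<e1, aggregated local direction>|:", round(abs(v_local[0]), 3))   # usually close to 1
```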

3. Aggregate Kernel Inverse Regression Estimation

Both kNNSIR and a-kNNSIR employ SIR for the local dimension reduction in $G_i$. In SIR, the inverse mean $E(X \mid Y) - E(X)$ plays a key role, because $E(X \mid Y) - E(X) \in \Sigma_X \mathcal{S}_{Y|X}$ under the linearity condition. However, SIR uses $E(X \mid Y \in I_i)$ instead of $E(X \mid Y)$, where $I_i$, $i = 1, 2, \ldots, H$, denotes a partition of $\Omega_Y$, and that limits its performance. Moreover, CUME, on which AIME relies, also depends on slicing, with the dataset always divided into two slices.
Slicing is not the only way to exploit $E(X \mid Y)$. In fact, estimating $E(X \mid Y = y)$ directly by kernel smoothing (e.g., with the Nadaraya–Watson estimator) and combining the estimates at many values of $y$ [8] is a powerful alternative, known as kernel inverse regression (KIR). Under some regularity conditions, Zhu and Fang [8] proved the $\sqrt{n}$-consistency and asymptotic normality of the KIR estimators.
We then use KIR for the dimension reduction in each local area. Let $(x_{G_i,j}, y_{G_i,j})$, $j = 1, 2, \ldots, m$, be the data points whose observations of $X$ fall into the local area $G_i$. Denote
$$ \hat{g}_i(y) = \frac{1}{m}\sum_{j=1}^{m} K_h(y_{G_i,j} - y)\,(x_{G_i,j} - \bar{x}_{G_i}) = \frac{1}{mh}\sum_{j=1}^{m} K\{(y_{G_i,j} - y)/h\}\,(x_{G_i,j} - \bar{x}_{G_i}), $$
$$ \hat{f}_i(y) = \frac{1}{m}\sum_{j=1}^{m} K_h(y_{G_i,j} - y) = \frac{1}{mh}\sum_{j=1}^{m} K\{(y_{G_i,j} - y)/h\}, \qquad (5) $$
where $K_h(u) = K(u/h)/h$, $K(\cdot)$ is a kernel function, $h$ is a constant bandwidth and $\bar{x}_{G_i} = \sum_{j=1}^{m} x_{G_i,j}/m$. Then, $E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$ can be estimated by the following Nadaraya–Watson estimator:
$$ \hat{r}_i(y) = \frac{\hat{g}_i(y)}{\hat{f}_i(y)}. \qquad (6) $$
Based on $\hat{r}_i(Y)$, we construct the local kernel matrix for $G_i$,
$$ \hat{\Lambda}_i = \frac{1}{m}\sum_{j=1}^{m} \hat{r}_i(y_{G_i,j})\,\hat{r}_i^{\mathrm{T}}(y_{G_i,j}). \qquad (7) $$
Finally, aggregating all the $\hat{\Lambda}_i$'s from the $G_i$'s gives the global kernel matrix, and its spectral decomposition gives the estimate of the central subspace $\mathcal{S}_{Y|X}$.
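At the sample level, these quantities are easy to compute within a single neighborhood. The following Python sketch (an illustration, not the authors' implementation) evaluates $\hat{g}_i$, $\hat{f}_i$ and $\hat{r}_i$ at the observed responses in $G_i$ with a Gaussian kernel and returns $\hat{\Lambda}_i$; the bandwidth $h$ is supplied by the caller (its choice is discussed in Section 3.2).

```python
import numpy as np

def local_kernel_matrix(Xg, yg, h):
    """Lambda_hat_i for one neighborhood G_i: Nadaraya-Watson estimates of
    E(X_Gi | Y_Gi = y) - E(X_Gi) at each observed y, averaged as outer products."""
    m = len(yg)
    Xc = Xg - Xg.mean(axis=0)                        # x_{G_i,j} - xbar_{G_i}
    U = (yg[:, None] - yg[None, :]) / h              # U[j, l] = (y_{G_i,j} - y_{G_i,l}) / h
    K = np.exp(-0.5 * U ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel K(u)
    f_hat = K.sum(axis=0) / (m * h)                  # f_hat_i(y_{G_i,l})
    g_hat = K.T @ Xc / (m * h)                       # g_hat_i(y_{G_i,l}), stored as rows
    R = g_hat / f_hat[:, None]                       # r_hat_i(y_{G_i,l})
    return R.T @ R / m                               # Lambda_hat_i
```

Summing these local matrices over all neighborhoods (after the local standardization discussed below) and eigen-decomposing the sum then yields the estimated basis of $\mathcal{S}_{Y|X}$.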
Let $\tilde{Y}_{G_i}$ denote an independent copy of $Y_{G_i}$ and define $r_i(y) = E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$. Then, the population counterpart of the above $\hat{\Lambda}_i$ is
$$ \Lambda_i = E\left[ r_i(\tilde{Y}_{G_i})\, r_i^{\mathrm{T}}(\tilde{Y}_{G_i}) \right]. \qquad (8) $$
The following theorem provides the population consistency of the above method.
Theorem 1.
Assume that the conditional density $h(y \mid x)$ is twice differentiable with respect to $x$ and its second derivative is bounded on $\Omega_X$. Then, when $\max_{i=1,2,\ldots,m} \|G_i\| \to 0$, we have $\mathrm{Span}(\Sigma_{G_i}^{-1} \Lambda_i) \subseteq \mathcal{S}_{Y_{G_i} \mid X_{G_i}}$ and $\bigcup_{i=1}^{m} \mathrm{Span}(\Sigma_{G_i}^{-1} \Lambda_i) \subseteq \mathcal{S}_{Y|X}$, where $\Sigma_{G_i}$ is the covariance matrix of $X$ within $G_i$.
The proof follows that of Theorem 2 in Wang et al. [15] and is omitted here.

3.1. Estimation Algorithm

We now summarize the sample-level algorithm for AKIRE. Let $\{(x_i, y_i), i = 1, 2, \ldots, n\}$ be a sample from $(X, Y)$ and assume the structural dimension $d$ is known before estimation. The proposed AKIRE procedure is summarized in Algorithm 1.
Algorithm 1 highlights that the global kernel matrix $\hat{\Lambda}$ aggregates information from all the local central subspaces. Unlike ADR and AIME, we construct the kernel estimates of $E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$ at $y = y_{G_i,j}$, $j = 1, 2, \ldots, m$, and combine them to discover the local central subspaces. This ensures that the information contained in $G_i$ is used more fully and speeds up the convergence of $\hat{B}$.

3.2. Tuning Parameters in Algorithm 1

We now discuss how to choose the tuning parameters in Algorithm 1, including the size $m$ of the local neighborhoods (the number of nearest neighbors), the kernel bandwidth $h$, the order $\alpha$ of the $\Sigma$-envelope, the weight function $\omega(\cdot)$, and the structural dimension $d_{Y|X}$.
Our main improvement concerns the dimension reduction step in the local areas, where a Nadaraya–Watson estimator of $E(X_{G_i} \mid Y_{G_i} = y) - E(X_{G_i})$ is used. In this step, the only tuning parameter is the bandwidth $h$, and we use the MATLAB function "ksdensity" to output the best bandwidth. For the tuning parameters in the other steps, we largely adopt the suggestions of Wang et al. [15]. The value of $m$ is selected within the range of $2p$ to $4p$. The order $\alpha$ of the $\Sigma$-envelope is determined from the ratios of the consecutive eigenvalues of $\hat{R}_{G_i}\hat{R}_{G_i}^{\mathrm{T}}$ (Li et al. [19]):
$$ \alpha_{G_i} = \sum_{j=1}^{p-1} I\left( r_j / r_{j+1} > \alpha_0 \right), \qquad (9) $$
where $r_1 \geq \cdots \geq r_p$ are the eigenvalues of the matrix $\hat{R}_{G_i}\hat{R}_{G_i}^{\mathrm{T}}$, $\hat{R}_{G_i} = (\hat{\xi}_{G_i}, \hat{\Sigma}_{G_i}\hat{\xi}_{G_i}, \ldots, \hat{\Sigma}_{G_i}^{\alpha-1}\hat{\xi}_{G_i})$, and $\alpha_0$ is a pre-specified threshold, set to 1.5 in the numerical studies as recommended by Li et al. [19]. As to the weight function, we choose $\omega(\eta_i) = \|\eta_i\|_2^2$. We borrow the bootstrap procedure proposed by Ye and Weiss [20] to estimate $d_{Y|X}$. Let $\hat{\mathcal{S}}_{d^*}$ be an estimate of $\mathcal{S}_{Y|X}$ for a fixed $d^* \in \{1, 2, \ldots, p-1\}$, and let $\hat{\mathcal{S}}_{d^*}^{(b)}$, $b = 1, 2, \ldots, n_b$, be its bootstrap estimates. The structural dimension $d_{Y|X}$ is determined by maximizing the mean of the distances between $\hat{\mathcal{S}}_{d^*}^{(b)}$ and $\hat{\mathcal{S}}_{d^*}$. See Ye and Weiss [20] and Wang et al. [15] for more details.
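A minimal sketch of this order selection, reading the ratio criterion above as a count of consecutive eigenvalue ratios exceeding the threshold (the clipping of tiny eigenvalues is an added numerical guard, not part of the rule):

```python
import numpy as np

def envelope_order(R_hat, alpha0=1.5):
    """Count how many consecutive eigenvalue ratios of R_hat R_hat' exceed alpha0,
    with the threshold 1.5 of Li et al. [19].  R_hat is the local matrix
    (xi_hat, Sigma_hat xi_hat, ..., Sigma_hat^{alpha-1} xi_hat) for one G_i."""
    r = np.sort(np.linalg.eigvalsh(R_hat @ R_hat.T))[::-1]   # r_1 >= ... >= r_p
    r = np.clip(r, 1e-12, None)                              # numerical guard only
    return int(np.sum(r[:-1] / r[1:] > alpha0))
```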
Algorithm 1: Aggregate Kernel Inverse Regression Estimation
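The algorithm box appears as an image in the published article. As a stand-in, the following Python sketch traces the procedure described in Section 3 and Section 3.2 (localize by k-nearest neighbors, run kernel inverse regression within each neighborhood, aggregate, eigen-decompose). It omits the $\Sigma$-envelope and the weight function $\omega(\cdot)$, uses a ridge-regularized local covariance and a rule-of-thumb bandwidth, and is therefore an illustration of the flow rather than a reproduction of Algorithm 1.

```python
import numpy as np

def akire_sketch(X, Y, d, k=None, h=None):
    """Simplified AKIRE flow: localize -> local kernel inverse regression ->
    aggregate -> eigen-decompose.  The Sigma-envelope and weight function of
    Algorithm 1 are omitted; a ridge-regularized local covariance is used instead."""
    n, p = X.shape
    k = k or min(n, 4 * p)                                  # neighborhood size in [2p, 4p]
    Lam = np.zeros((p, p))
    for i in range(n):
        idx = np.argsort(((X - X[i]) ** 2).sum(axis=1))[:k]
        Xg, yg = X[idx], Y[idx]
        hg = h or max(1.06 * yg.std() * k ** (-0.2), 1e-3)  # rule-of-thumb bandwidth
        Xc = Xg - Xg.mean(axis=0)
        U = (yg[:, None] - yg[None, :]) / hg                # pairwise (y_j - y_l)/h
        K = np.exp(-0.5 * U ** 2) / np.sqrt(2 * np.pi)      # Gaussian kernel
        R = (K.T @ Xc) / K.sum(axis=0)[:, None]             # Nadaraya-Watson r_hat_i(y_l)
        Li = R.T @ R / k                                    # local kernel matrix Lambda_i
        Sinv = np.linalg.inv(np.cov(Xg, rowvar=False) + 1e-3 * np.eye(p))
        Lam += Sinv @ Li @ Sinv                             # same span as Sigma^{-1} Lambda_i
    return np.linalg.eigh(Lam)[1][:, -d:][:, ::-1]          # top-d eigenvectors = B_hat
```

For example, calling `akire_sketch(X, Y, d=2)` on data from Model 1 in Section 4 returns a $p \times 2$ matrix whose column space can be compared with $\mathrm{Span}(\beta_1, \beta_2)$ via the criterion $q$ defined there.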

4. Simulation Studies

In this section, we evaluate the finite-sample performance of AKIRE through simulations. We compare AKIRE with the aggregate approaches a-kNNSIR and AIME. The vector correlation coefficient $q$ (Ye and Weiss [20]) is used to measure the estimation accuracy. Various criteria are available for evaluating the accuracy of estimated directions, including the Euclidean distance between $P_B$ and $P_{\hat{B}}$ and the trace correlation coefficient between $B$ and $\hat{B}$, among others. Since ADR and AIME use the criterion $q$, we also use it for comparison. Let $B$ be an orthonormal basis of the CS and $\hat{B}$ be an estimate of $B$ satisfying $\hat{B}^{\mathrm{T}}\hat{B} = I_d$. Then, $q$ is defined as
$$ q = \left| \hat{B}^{\mathrm{T}} B B^{\mathrm{T}} \hat{B} \right|^{1/2}, \qquad (10) $$
where $|\cdot|$ denotes the determinant, $0 \leq q \leq 1$, and a larger $q$ indicates that $\mathrm{Span}(\hat{B})$ is closer to $\mathrm{Span}(B)$.
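A small helper for this criterion, under the reading of $q$ above as the square root of the determinant of $\hat{B}^{\mathrm{T}}BB^{\mathrm{T}}\hat{B}$ for orthonormal bases (the QR orthonormalization and the floor at zero are added safeguards):

```python
import numpy as np

def vector_correlation_q(B_hat, B):
    """q between Span(B_hat) and Span(B): sqrt(det(B_hat' B B' B_hat)) after
    orthonormalizing both bases; q = 1 exactly when the two subspaces coincide."""
    Q1, _ = np.linalg.qr(B_hat)
    Q2, _ = np.linalg.qr(B)
    return float(np.sqrt(max(np.linalg.det(Q1.T @ Q2 @ Q2.T @ Q1), 0.0)))
```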
The following six models are considered in the numerical study:
Model 1: $Y = 0.5(\beta_1^{\mathrm{T}}X)^3 + 0.5(1 + \beta_2^{\mathrm{T}}X)^2 + 0.2\epsilon_1$;
Model 2: $Y = \mathrm{sgn}(2\beta_1^{\mathrm{T}}X + \epsilon_1) \times \log|2\beta_2^{\mathrm{T}}X + 3 + \epsilon_2|$;
Model 3: $Y = 2(\beta_1^{\mathrm{T}}X)^2 + 2\exp(\beta_2^{\mathrm{T}}X)\,\epsilon_1$;
Model 4: $Y = 3\sin(\beta_1^{\mathrm{T}}X/4) + 0.2\{1 + (\beta_2^{\mathrm{T}}X)^2\}\,\epsilon_1$;
Model 5: $Y = \dfrac{\beta_1^{\mathrm{T}}X}{0.5 + (1.5 + \beta_2^{\mathrm{T}}X)^2} + (\beta_3^{\mathrm{T}}X)^2\,\epsilon_1$;
Model 6: $Y = (\beta_1^{\mathrm{T}}X)(\beta_2^{\mathrm{T}}X)^2 + (\beta_3^{\mathrm{T}}X)(\beta_4^{\mathrm{T}}X) + 0.5\epsilon_1$.
All of these models have been studied extensively in the sufficient dimension reduction literature. In Models 1–4, $X \sim N_p(0, \Sigma)$ with $\Sigma = (\sigma_{ij}) = (0.5^{|i-j|})$. In Models 5–6, $X \sim N_p(0, I)$. The standard Gaussian noises $\epsilon_1$ and $\epsilon_2$ are independent of $X$.
Model 1 is taken from Zhu et al. [9], who proposed it for CUME; Model 2 comes from Chen and Li [21]; Model 3 is borrowed from Xia [22]; Model 4 was studied by Li and Wang [7]; Model 5 was used by Wang and Xia [23] for sliced regression (SR); and Model 6 is from Xia et al. [24] in their study of MAVE. The dimension of the CS is two for Models 1–4, three for Model 5 and four for Model 6. In Model 1, $\beta_1 = (1,1,1,0,\ldots,0)^{\mathrm{T}}$ and $\beta_2 = (1,0,0,0,1,3,0,\ldots,0)^{\mathrm{T}}$. In Model 2, $\beta_1 = (0.5,0.5,0.5,0.5,0,\ldots,0)^{\mathrm{T}}$, $\beta_2 = (0,\ldots,0,0.5,0.5,0.5,0.5)^{\mathrm{T}}$, and the function $\mathrm{sgn}(\cdot)$ takes the value $1$ or $-1$ according to the sign of its argument. In Model 3, the first 10 elements of $\beta_1$ and $\beta_2$ are $(1,2,0,\ldots,0,2)^{\mathrm{T}}/3$ and $(0,0,3,4,0,\ldots,0)^{\mathrm{T}}/5$, respectively, and the remaining elements are zeros. In Model 4, $\beta_1 = (1,1,1,0,\ldots,0)^{\mathrm{T}}$ and $\beta_2 = (0,0,0,1,3,0,\ldots,0)^{\mathrm{T}}$. In Model 5, $\beta_1 = (1,0,\ldots,0)^{\mathrm{T}}$, $\beta_2 = (0,1,0,\ldots,0)^{\mathrm{T}}$ and $\beta_3 = (0,0,1,0,\ldots,0)^{\mathrm{T}}$. In Model 6, $\beta_1 = (1,2,3,4,0,\ldots,0)^{\mathrm{T}}/\sqrt{30}$, $\beta_2 = (2,1,4,3,1,2,0,\ldots,0)^{\mathrm{T}}/\sqrt{35}$, $\beta_3 = (0,\ldots,0,2,1,2,1,2,1)^{\mathrm{T}}/\sqrt{15}$ and $\beta_4 = (0,\ldots,0,1,1,1,1)^{\mathrm{T}}/2$.
In Models 1–4, the predictor dimension is set to $p = 20$, and two sample sizes, $n = 200$ and $n = 400$, are compared. In Models 5–6, the sample size is set to $n = 400$, and two dimensions, $p = 10$ and $p = 20$, are compared. In all the models, the size of the nearest neighborhood is taken to be $m = 4p$, and the AKIRE method uses the Gaussian kernel. We conduct 100 replications. The boxplots of $q$ are shown in Figures 1–6.
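For reference, a sketch of how data from Model 1 could be generated under the stated design ($\Sigma_{ij} = 0.5^{|i-j|}$ and the $\beta$'s listed above); the seed and the returned true basis are conveniences for experimentation, not part of the paper's setup.

```python
import numpy as np

def generate_model1(n=200, p=20, seed=0):
    """Model 1: Y = 0.5 (b1'X)^3 + 0.5 (1 + b2'X)^2 + 0.2 eps, X ~ N_p(0, Sigma),
    Sigma_ij = 0.5^|i-j|.  Returns X, Y and the true basis B = (b1, b2)."""
    rng = np.random.default_rng(seed)
    Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    b1 = np.zeros(p); b1[:3] = 1.0                      # beta_1 = (1,1,1,0,...,0)'
    b2 = np.zeros(p); b2[[0, 4, 5]] = [1.0, 1.0, 3.0]   # beta_2 = (1,0,0,0,1,3,0,...,0)'
    Y = 0.5 * (X @ b1) ** 3 + 0.5 * (1.0 + X @ b2) ** 2 + 0.2 * rng.standard_normal(n)
    return X, Y, np.column_stack([b1, b2])
```

Feeding such data to the AKIRE sketch after Algorithm 1 and scoring the result with the $q$ helper above gives accuracy comparisons of the same kind as those summarized in Figures 1–6, though not with the authors' exact implementation.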
Wang et al. [15] and Wang and Yin [16] showed that traditional sufficient dimension reduction methods such as SIR, SAVE, CUME and FSIR are far less effective than a-kNNSIR and AIME for these models; therefore, the proposed method is compared only with a-kNNSIR and AIME. In Model 1, both AIME and AKIRE perform very well, with AIME performing slightly better. In Model 2 (Figure 2a), the median $q$-value of AKIRE is larger, while in Figure 2b the median $q$-value of a-kNNSIR is slightly larger and its box is narrower; thus, AKIRE outperforms the other two methods for $n = 200$, and a-kNNSIR performs slightly better for $n = 400$. In Models 3 and 4, the AKIRE boxplots in Figure 3a and Figure 4a are overall higher than those of a-kNNSIR and AIME, while Figure 3b and Figure 4b show that the median $q$-value of AKIRE is slightly lower than that of AIME but its boxes are narrower. This indicates that AKIRE provides better results at $n = 200$, and that AKIRE and AIME perform similarly at $n = 400$, with AKIRE being more robust. In Models 5 and 6, AKIRE clearly outperforms the other two methods: Figure 5 and Figure 6 show that its median $q$-value is higher than those of the other two approaches, and its boxes are significantly narrower, indicating a small standard deviation of the $q$-values.

5. A Real Data Example

In this section, we analyze a dataset on US college admissions. The dataset, used in the ASA Statistical Graphics Section's 1995 Data Analysis Exposition, is described in the textbook [25]; we obtained it through the ISLR package in R. We are interested in the number of applications ($Y$) received by the 558 private institutions whose number of full-time undergraduates is less than 10,000. We investigate the relationship between $Y$ and 11 predictors: the number of full-time undergraduates, the number of part-time undergraduates, out-of-state tuition, room and board costs, estimated book costs, estimated personal spending, the percentage of faculty with a terminal degree, the student/faculty ratio, the percentage of alumni who donate, the instructional expenditure per student and the graduation rate. All the predictors were standardized separately, and we took the logarithm of the response variable.
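A sketch of this preprocessing in Python, assuming the College data have been exported from the ISLR R package to a CSV file (e.g., via write.csv(ISLR::College, "College.csv")); the column names follow the ISLR data frame and should be checked against the actual export.

```python
import numpy as np
import pandas as pd

# Private institutions with fewer than 10,000 full-time undergraduates (558 rows expected).
college = pd.read_csv("College.csv", index_col=0)
sub = college[(college["Private"] == "Yes") & (college["F.Undergrad"] < 10000)]

predictors = ["F.Undergrad", "P.Undergrad", "Outstate", "Room.Board", "Books", "Personal",
              "Terminal", "S.F.Ratio", "perc.alumni", "Expend", "Grad.Rate"]
X = sub[predictors].to_numpy(dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each predictor separately
y = np.log(sub["Apps"].to_numpy(dtype=float))      # log of the number of applications
```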
We used the bootstrap procedure to determine the dimension $d$ of the central subspace, which was estimated to be two. The dimension reduction directions estimated by AKIRE were
$\hat{\beta}_1 = (0.9834, 0.0757, 0.0991, 0.0391, 0.0280, 0.0025, 0.0146, 0.0984, 0.0396, 0.0339, 0.0489)^{\mathrm{T}}$,
$\hat{\beta}_2 = (0.1625, 0.4000, 0.2775, 0.0683, 0.2017, 0.1113, 0.2455, 0.7462, 0.1019, 0.0712, 0.2154)^{\mathrm{T}}$.
It can be seen that $\hat{\beta}_1^{\mathrm{T}}x$ is mainly driven by the number of full-time undergraduates, while $\hat{\beta}_2^{\mathrm{T}}x$ is mainly driven by the student/faculty ratio and the number of part-time undergraduates. The scatterplots of $(\hat{\beta}^{\mathrm{T}}x, \log(Y))$ are presented in Figure 7, where obvious nonlinear trends exist. AKIRE successfully finds two significant dimension reduction directions, which are more clearly visible than those of aggregate SIR. We then constructed a three-dimensional scatterplot, shown in Figure 8, which further illustrates the nonlinear relationship between $\log(Y)$ and the pair $(\hat{\beta}_1^{\mathrm{T}}x, \hat{\beta}_2^{\mathrm{T}}x)$.

6. Discussion

In this article, we propose AKIRE for estimating the central subspace. The method combines the idea of aggregate dimension reduction with kernel inverse regression, which reduces the dependence on the linearity condition and avoids the influence of the arrangement of slices. Numerical experiments demonstrate the satisfactory performance of the proposed approach, especially with complex models. A direct extension of the current method is to multivariate responses, which can be handled by projective resampling. Another direction is to study how to apply our methodology to the case where $p > n$. Under the assumption of sparsity in the features $X$, Yang et al. [26] proposed a two-step, feature-selection-assisted process: the first step implements a model-free feature selection method to reduce the number of predictors to a manageable scale, and it is in this setting that AKIRE may provide an improvement.

Author Contributions

Methodology, W.L.; software, W.L. and W.W.; writing—original draft preparation, W.L.; writing—review and editing, W.R. and W.W.; supervision, J.C.; Funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (grant no. 12061082), People’s Government of Yunnan Province (grant no. YB2021097), Yunnan Provincial Department of Education Science Research Fund Project (grant no. 2021J0574), Yunnan Fundamental Research Young Scholars Project (grant no. 202001AU070065), Talent Introduction Project of Yunnan University of Finance and Economics (grant no. 2020D02) and PhD Scientific Research Foundation of Jiangxi Science and Technology Normal University (grant no. 2022BSQD16).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank Chen Fei for guidance and encouragement throughout this work. We also thank the editors and two referees for constructive comments which led to a substantial improvement of this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, K.C. Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 1991, 86, 316–327.
2. Cook, R.D.; Li, B. Dimension reduction for conditional mean in regression. Ann. Stat. 2002, 30, 455–474.
3. Cook, R.D.; Weisberg, S. Sliced inverse regression for dimension reduction: Comment. J. Am. Stat. Assoc. 1991, 86, 328–332.
4. Bura, E.; Cook, R.D. Estimating the structural dimension of regressions via parametric inverse regression. J. R. Stat. Soc. Ser. B 2001, 63, 393–410.
5. Li, K.C. On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma. J. Am. Stat. Assoc. 1992, 87, 1025–1039.
6. Li, B.; Zha, H.; Chiaromonte, F. Contour regression: A general approach to dimension reduction. Ann. Stat. 2005, 33, 1580–1616.
7. Li, B.; Wang, S. On directional regression for dimension reduction. J. Am. Stat. Assoc. 2007, 102, 997–1008.
8. Zhu, L.X.; Fang, K.T. Asymptotics for kernel estimate of sliced inverse regression. Ann. Stat. 1996, 24, 1053–1068.
9. Zhu, L.P.; Zhu, L.X.; Feng, Z.H. Dimension reduction in regressions through cumulative slicing estimation. J. Am. Stat. Assoc. 2010, 105, 1455–1466.
10. Yin, X.; Cook, R.D. Estimating central subspaces via inverse third moments. Biometrika 2003, 90, 113–125.
11. Cook, R.D.; Nachtsheim, C.J. Reweighting to achieve elliptically contoured covariates in regression. J. Am. Stat. Assoc. 1994, 89, 592–599.
12. Li, B.; Dong, Y. Dimension reduction for nonelliptically distributed predictors. Ann. Stat. 2009, 37, 1272–1298.
13. Dong, Y.; Li, B. Dimension reduction for non-elliptically distributed predictors: Second-order methods. Biometrika 2010, 97, 279–294.
14. Ma, Y.; Zhu, L. A semiparametric approach to dimension reduction. J. Am. Stat. Assoc. 2012, 107, 168–179.
15. Wang, Q.; Yin, X.; Li, B.; Tang, Z. On aggregate dimension reduction. Stat. Sin. 2020, 30, 1027–1048.
16. Wang, Q.; Yin, X. Aggregate inverse mean estimation for sufficient dimension reduction. Technometrics 2021, 63, 456–465.
17. Wang, Q.; Xue, Y. An ensemble of inverse moment estimators for sufficient dimension reduction. Comput. Stat. Data Anal. 2021, 161, 107241.
18. Wang, Q.; Xue, Y. A structured covariance ensemble for sufficient dimension reduction. Adv. Data Anal. Classif. 2022, 1–24.
19. Li, L.; Cook, R.D.; Tsai, C.L. Partial inverse regression. Biometrika 2007, 94, 615–625.
20. Ye, Z.; Weiss, R.E. Using the bootstrap to select one of a new class of dimension reduction methods. J. Am. Stat. Assoc. 2003, 98, 968–979.
21. Chen, C.H.; Li, K.C. Can SIR be as popular as multiple linear regression? Stat. Sin. 1998, 8, 289–316.
22. Xia, Y. A constructive approach to the estimation of dimension reduction directions. Ann. Stat. 2007, 35, 2654–2690.
23. Wang, H.; Xia, Y. Sliced regression for dimension reduction. J. Am. Stat. Assoc. 2008, 103, 811–821.
24. Xia, Y.; Tong, H.; Li, W.K.; Zhu, L.X. An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B 2002, 64, 363–410.
25. Gareth, J.; Daniela, W.; Trevor, H.; Robert, T. An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2013; pp. 67–68.
26. Yang, B.; Yin, X.; Zhang, N. Sufficient variable selection using independence measures for continuous response. J. Multivar. Anal. 2019, 173, 480–493.
Figure 1. Comparison of the estimation accuracy for Model 1.
Figure 2. Comparison of the estimation accuracy for Model 2.
Figure 3. Comparison of the estimation accuracy for Model 3.
Figure 4. Comparison of the estimation accuracy for Model 4.
Figure 5. Comparison of the estimation accuracy for Model 5.
Figure 6. Comparison of the estimation accuracy for Model 6.
Figure 7. Analysis of the college admission data: (a) scatterplot of $\log(Y)$ vs. the first estimated direction; (b) scatterplot of $\log(Y)$ vs. the second estimated direction.
Figure 8. Analysis of the college admission data: 3D scatterplot of $\log(Y)$ and the two estimated directions.