Article

Robust Variable Selection Based on Penalized Composite Quantile Regression for High-Dimensional Single-Index Models

College of Science, China University of Petroleum, Qingdao 266580, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(12), 2000; https://doi.org/10.3390/math10122000
Submission received: 23 March 2022 / Revised: 21 May 2022 / Accepted: 23 May 2022 / Published: 10 June 2022
(This article belongs to the Special Issue Statistical Data Modeling and Machine Learning with Applications II)

Abstract

The single-index model is an intuitive extension of the linear regression model. It has become increasingly popular due to its flexibility in modeling. In this work, we focus on the estimation of the parameters and the unknown link function for the single-index model in a high-dimensional situation. The SCAD and Laplace error penalty (LEP)-based penalized composite quantile regression estimators, which can realize variable selection and estimation simultaneously, are proposed, and a practical iterative algorithm is introduced to obtain efficient and robust estimators. The choices of the tuning parameters, the bandwidth, and the initial values are also discussed. Furthermore, under some mild conditions, we establish the large sample properties and the oracle property of the SCAD and Laplace penalized composite quantile regression estimators. Finally, we evaluate the performance of the proposed estimators through two numerical simulations and a real data application.
MSC:
62F12; 62G08; 62G20; 62J07

1. Introduction

As a generalized regression model, the single-index regression model has a wide range of applications in finance, economics, biomedicine, and other fields. The single-index regression model not only avoids the so-called "curse of dimensionality" of nonparametric models, but also significantly improves the efficiency of model estimation and reveals the relationship between the response variable and high-dimensional covariates, thereby retaining the good interpretability of parametric models and the flexibility of nonparametric models simultaneously [1,2,3,4,5]. However, the single-index regression model also inherits the shortcomings of classical regression models. For example, in practical applications, especially under heavy-tailed error distributions, the bounded error variance that the single-index regression model requires is difficult to satisfy. Moreover, in the mean regression setting, the estimation results of a single-index regression model are very sensitive to extreme values. To overcome these drawbacks, robust regression methods are necessary when fitting single-index regression models to real data.
Compared with mean regression, the quantile regression proposed by [6] can measure the effect of the explanatory variables not only on the center of the distribution but also on the upper and lower tails of the response variable. Quantile regression (QR) estimation, which is not restricted by the error distribution and can effectively avoid the impact of outliers, is more robust than least squares estimation. Furthermore, in order to make full use of the information at different quantiles, composite quantile regression (CQR) was proposed by [7]. [8] added the SCAD-L2 penalty to the loss function and proposed a robust variable selection method based on weighted composite quantile regression (WCQR), which makes variable selection insensitive to high-leverage points and outliers. In this article, we study the estimation and variable selection of the single-index quantile regression model, which is specified in the following form
Y = g(X⊤γ) + ε,
where Y is the response variable, X is a d-dimensional covariate vector, γ is an unknown parameter vector, g(·) is an unknown link function, ε is the random error, and the τth conditional quantile of ε is zero, i.e., P(ε ≤ 0 | X) = τ. In order to identify the model, we assume that ‖γ‖ = 1 and that the first component of γ is positive, where ‖·‖ denotes the Euclidean norm.
There are two estimation problems for the single-index quantile regression model: the estimation of the parameters and the estimation of the link function. The study of estimation for single-index quantile regression models began with [9], which generalized the average derivative method. Meanwhile, [10] proposed a simple algorithm for quantile regression in single-index models and proved the asymptotic properties of the estimators. [3] proposed D-vine copula-based quantile regression, a new algorithm that does not require accurately specifying the shape of the conditional quantiles and avoids typical defects of linear models such as multicollinearity. [11] proposed a non-iterative composite quantile regression (NICQR) estimation algorithm for the single-index quantile regression model, which has high computational efficiency and is suitable for analyzing massive data sets.
In real data, the model is often sparse. The candidate covariates inevitably contain a few irrelevant and unnecessary variables when modeling real data, which can degrade the efficiency of the resulting estimation procedure and increase the complexity of the model. In the case of linear models, many authors have considered variable selection via penalized least squares, which allows simultaneous selection of variables and estimation of the regression parameters. Several penalty functions, including the SCAD [12], the adaptive LASSO [13], and the elastic net [14], have been shown to possess favorable theoretical properties: unbiasedness, sparsity, and continuity, which are regarded as the basic properties that a good estimator should enjoy [15]; in particular, the SCAD penalty enjoys the oracle property. [5] combined the SCAD penalty variable selection method with LM-ANN for modeling, making good use of the advantages of SCAD in dimension reduction and the efficiency of LM-ANN in modeling nonlinear relationships.
Similar to the linear regression model, the set of predictors for the single-index quantile regression model can contain a large number of irrelevant variables. Therefore, it is important to select the relevant variables when fitting the single-index quantile regression model. However, the problem of variable selection for the high-dimensional single-index quantile regression model is not well settled in the literature. In recent years, many significant research results have emerged on this variable selection problem. [16] proposed a non-iterative estimation and variable selection method for the single-index quantile regression model, whose key ingredient is obtaining the initial value and the weight of the penalty function via the inverse regression technique. [17] combined least absolute deviations (LAD) and SCAD for single-index models. However, we note that SCAD is a piecewise continuous spline function. Because of this structure, different spline pieces require different derivative formulas, and the matching formula has to be selected for each piece when carrying out the penalized optimization, which adds to the programming complexity. Therefore, [18] proposed a continuous, bounded, smooth penalty function, the Laplace error penalty (LEP), which does not have a piecewise spline structure, and proved its oracle property. LEP is infinitely differentiable everywhere except at the origin and is therefore much smoother than SCAD. Furthermore, LEP can approximate the L0 penalty, which is viewed as the optimal penalty, arbitrarily closely. Moreover, LEP yields a convex objective function under mild conditions, so that it is easier to compute and to obtain a unique optimal solution with the desired properties.
In this paper, we combine the composite quantile regression method with the SCAD penalty and the Laplace error penalty to construct two sparse estimators for the single-index quantile regression model. Our method realizes variable selection and parameter estimation simultaneously. In addition, we prove that the proposed estimators enjoy large sample properties, including √n-consistency and the oracle property. A simulation study shows that our method has some resistance to heavy-tailed errors and outliers and achieves higher accuracy in parameter estimation.
The rest of this paper is organized as follows. In Section 2, the SCAD penalized composite quantile regression and the Laplace penalized composite quantile regression for single-index models are introduced. Furthermore, an iterative algorithm for the single-index model is analyzed, and the selection of the bandwidth, the tuning parameters, and the initial values is discussed. In Section 3, we state the large sample properties of the SCAD and Laplace penalized composite quantile estimators for single-index models. In Section 4, we illustrate our method and algorithm through two numerical simulations and a real data application. Section 5 includes some concluding remarks. Technical proofs and the algorithm based on LEP are relegated to Appendix A and Appendix B, respectively.

2. Problem Setting and Methodology

2.1. Composite Quantile-SCAD Method for Single-Index Models

We assume that {(X_i, Y_i), i = 1, 2, …, n} are n independent samples from the single-index model (1). Note that there are two estimation problems: the estimation of the parameter vector γ and the estimation of the link function g(·). Given an accurate estimate of γ, the link function g(·) can be locally approximated by a linear function
g(X⊤γ) ≈ g(u) + g′(u)(X⊤γ − u) = a + b(X⊤γ − u),
for X⊤γ in a neighborhood of u, where a = g(u) and b = g′(u) are local constants. Namely, based on an accurate estimate of γ, we can obtain good local linear estimates of g(u) and g′(u), which are ĝ(u) = â and ĝ′(u) = b̂, respectively. So our main interest is to estimate the parameter vector. Following [19], we adopt the MAVE estimate of γ, which is obtained by solving the minimization problem
min_{a, b, ‖γ‖=1} Σ_{j=1}^{n} Σ_{i=1}^{n} [Y_i − a_j − b_j(X_i⊤γ − X_j⊤γ)]² w_ij,
where w_ij = k_h(X_i⊤γ − X_j⊤γ) / Σ_{l=1}^{n} k_h(X_l⊤γ − X_j⊤γ), a = (a_1, a_2, …, a_n)⊤, b = (b_1, b_2, …, b_n)⊤, k_h(·) = k(·/h)/h, k(·) is a symmetric kernel function, and h is the bandwidth. [20] combined MAVE and the LASSO to obtain a sparse estimate (sim-lasso) of γ by solving the following minimization problem
min_{a, b, ‖γ‖=1} Σ_{j=1}^{n} Σ_{i=1}^{n} [Y_i − a_j − b_j(X_i⊤γ − X_j⊤γ)]² w_ij + λ Σ_{j=1}^{n} |b_j| Σ_{k=1}^{d} |γ_k|,
where λ is a nonnegative penalty parameter. Note that the above target loss function is the squared loss of the least squares method; naturally, the LAD criterion can be extended to the single-index model as an alternative to the LS method. [17] combined LAD with SCAD to construct a sparse estimator of γ by solving the following minimization problem
min_{a, b, ‖γ‖=1} Σ_{j=1}^{n} Σ_{i=1}^{n} |Y_i − a_j − b_j(X_i⊤γ − X_j⊤γ)| w_ij + Σ_{j=1}^{n} |b_j| Σ_{k=1}^{d} p_λ(|γ_k|),
where p_λ(·) is the SCAD penalty function proposed by [12]; it is defined in terms of its first-order derivative. For θ > 0,
p′_λ(θ) = λ { I(θ ≤ λ) + (aλ − θ)_+ / [(a − 1)λ] · I(θ > λ) },
where a > 2, λ is a nonnegative penalty parameter, and the notation z_+ stands for the positive part of z. The LAD is a special case of quantile regression, corresponding to the quantile 1/2. This motivates us to generalize composite quantile regression to single-index models. Combining the SCAD penalty function with composite quantile regression, we obtain a sparse estimate γ̂_qr.sim.scad of the parameter γ as the solution to the following minimization problem
min_{a, b, ‖γ‖=1} Σ_{j=1}^{n} Σ_{q=1}^{Q} Σ_{i=1}^{n} ρ_τq[Y_i − a_j − b_j(X_i⊤γ − X_j⊤γ)] w_ij + Σ_{j=1}^{n} |b_j| Σ_{k=1}^{d} p_λ(|γ_k|),
where τ_q = q/(Q + 1) ∈ (0, 1) stands for the τ_q-quantile, q = 1, 2, …, Q with Q the number of quantiles, and ρ_τq(z) = τ_q z·I_[0,∞)(z) − (1 − τ_q) z·I_(−∞,0)(z) is the τ_q-quantile loss function. In addition, we assume that the τ_q-quantile of the random error ε is 0; thus, g(X⊤γ) is the conditional τ_q-quantile of the response variable Y. We denote the target function in (7) by Q_λ^S(a, b, γ).
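To make the pieces of (7) concrete, the following R sketch (our own illustration, not code from the paper; the function names are ours) implements the check loss ρ_τ, the first derivative p′_λ of the SCAD penalty with a = 3.7, and the equally spaced quantile levels τ_q = q/(Q + 1).

```r
# Illustrative R sketch (not from the paper): building blocks of (7).
# Check (quantile) loss rho_tau(z) = tau*z*I(z >= 0) - (1 - tau)*z*I(z < 0)
check_loss <- function(z, tau) z * (tau - (z < 0))

# First derivative of the SCAD penalty for theta > 0, with a = 3.7 (Fan and Li, 2001)
scad_deriv <- function(theta, lambda, a = 3.7) {
  lambda * ((theta <= lambda) +
            pmax(a * lambda - theta, 0) / ((a - 1) * lambda) * (theta > lambda))
}

# Equally spaced quantile levels tau_q = q/(Q + 1), q = 1, ..., Q
Q <- 5
tau_seq <- (1:Q) / (Q + 1)
```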

2.2. Composite Quantile–LEP Method for Single-Index Models

The Laplace error penalty function with two tuning parameters was proposed by [18]. Unlike other penalty functions, this penalty is naturally constructed as a bounded smooth function rather than a piecewise spline. Its shape is similar to that of SCAD, but it is much smoother, which prompts us to apply it to composite quantile regression for single-index models. Combining LEP with composite quantile regression, we obtain a sparse estimate γ̂_qr.sim.lep of the parameter γ as the solution to the following minimization problem
min_{a, b, ‖γ‖=1} Σ_{j=1}^{n} Σ_{q=1}^{Q} Σ_{i=1}^{n} ρ_τq[Y_i − a_j − b_j(X_i⊤γ − X_j⊤γ)] w_ij + Σ_{j=1}^{n} |b_j| Σ_{k=1}^{d} P_{λ,κ}(|γ_k|),
where P λ , κ ( · ) is LEP. For θ > 0 ,
P_{λ,κ}(θ) = λ(1 − e^{−θ/κ}),
where λ and κ are two nonnegative tuning parameters regularizing the magnitude of the penalty and controlling the degree of approximation to the L0 penalty, respectively. This penalty is called the Laplace penalty function because the function e^{−θ/κ} has the form of the Laplace density. We denote the target function in (8) by Q_{λ,κ}^S(a, b, γ).
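The following R sketch (again our own illustration, with hypothetical function names) implements the LEP and its first derivative, which is needed for the local linear approximation used in Appendix B; shrinking κ shows how LEP approaches the L0 penalty.

```r
# Illustrative R sketch (not from the paper): the Laplace error penalty and its derivative.
lep <- function(theta, lambda, kappa) lambda * (1 - exp(-theta / kappa))
lep_deriv <- function(theta, lambda, kappa) (lambda / kappa) * exp(-theta / kappa)

# Smaller kappa brings LEP closer to the L0 penalty lambda * I(theta > 0)
curve(lep(x, lambda = 1, kappa = 0.5), from = 0, to = 2, ylab = "penalty")
curve(lep(x, lambda = 1, kappa = 0.1), add = TRUE, lty = 2)
```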

2.3. Computation

Given the initial estimate γ̂, the SCAD penalty function can be locally linearly approximated as follows [21]. For γ̂_j ≠ 0,
p_λ(|γ_j|) ≈ p_λ(|γ̂_j|) + p′_λ(|γ̂_j|)(|γ_j| − |γ̂_j|),
for γ_j ≈ γ̂_j. Removing a few irrelevant terms, (7) can be rewritten as
min_{a, b, ‖γ‖=1} Σ_{j=1}^{n} Σ_{q=1}^{Q} Σ_{i=1}^{n} ρ_τq[Y_i − a_j − b_j(X_i⊤γ − X_j⊤γ)] w_ij + Σ_{j=1}^{n} |b_j| Σ_{k=1}^{d} p′_λ(|γ̂_k|)|γ_k|.
We denote the target function in (11) by Q_λ^{S*}(a, b, γ). Note that Q_λ^{S*} is scale-invariant in γ; hence, when minimizing Q_λ^{S*}, we restrict γ to unit length, ‖γ‖ = 1, by normalizing γ.
In order to obtain an accurate estimate of γ and g ( · ) , we introduce a new iterative algorithm. Then our estimation procedure is described in detail as follows:
Step 0.
Obtain an initial estimate of γ. Standardize the initial estimate γ̂ such that ‖γ̂‖ = 1 and γ̂_1 > 0.
Step 1.
Given an estimate γ ^ , we obtain { a ^ j , b ^ j , j = 1 , 2 , , n } by solving
min_{(a_j, b_j)} Σ_{i=1}^{n} Σ_{q=1}^{Q} ρ_τq[Y_i − a_j − b_j(X_i⊤γ̂ − X_j⊤γ̂)] w_ij + |b_j| Σ_{k=1}^{d} p′_λ(|γ̂_k|)|γ̂_k| = min_{(a_j, b_j)} Σ_{i=1}^{n+1} Σ_{q=1}^{Q} ρ[Y_i* − (A, B)(a_j, b_j)⊤] w_ij*,
where h is the optimal bandwidth, (ρ, Y_i*, A, B, w_ij*) = (ρ_τq, Y_i, 1, X_i⊤γ̂ − X_j⊤γ̂, w_ij) for i = 1, 2, …, n, and (ρ, Y_i*, A, B, w_ij*) = (1/Q, 0, 0, Σ_{k=1}^{d} p′_λ(|γ̂_k|)|γ̂_k|, 1) for i = n + 1. The rq(·) function in the R package "quantreg" can be used to obtain {â_j, b̂_j, j = 1, 2, …, n}. Moreover, for the SCAD penalty, we can apply a difference-of-convex algorithm [22] to simplify the computation.
Step 2.
Given { a ^ j , b ^ j , j = 1 , 2 , , n } , update γ ^ by solving
min_γ Σ_{j=1}^{n} Σ_{q=1}^{Q} Σ_{i=1}^{n} ρ_τq[Y_i − â_j − b̂_j(X_i⊤γ − X_j⊤γ)] w_ij + Σ_{j=1}^{n} |b̂_j| Σ_{k=1}^{d} p′_λ(|γ̂_k|)|γ_k|.
If d is very large, we can apply a fast and efficient coordinate descent algorithm [23], or combine it with the MM algorithm [24] to reduce the computation.
Step 3.
Scale b̂ ← sgn(γ̂_1)·‖γ̂‖·b̂ and γ̂ ← sgn(γ̂_1)·γ̂/‖γ̂‖.
Step 4.
Continue Step 1–Step 3 until convergence.
Step 5.
Given the final estimate γ̂ from Step 4, we estimate g(·) at any point u by ĝ(u; h, γ̂) = â, where
(â, b̂) = arg min_{(a, b)} Σ_{q=1}^{Q} Σ_{i=1}^{n} ρ_τq[Y_i − a − b(X_i⊤γ̂ − u)] k_h(X_i⊤γ̂ − u).
Remark 1.
The above algorithm is designed for the SCAD penalty function. Similarly, we can obtain the corresponding algorithm for the Laplace penalty function by replacing SCAD with LEP.
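As a minimal illustration of the local weighted quantile fits that appear in Steps 1 and 5, the following R sketch fits a local linear quantile regression of Y on the centered index at a point u for a single quantile level τ, using the rq() function from the "quantreg" package with Gaussian kernel weights. It is a simplified sketch (single quantile, no penalty term), not the full composite algorithm; the toy data and function names are ours.

```r
# Illustrative R sketch (not from the paper): local linear quantile fit of g at a
# point u for one quantile level tau, given a current index estimate gamma_hat.
library(quantreg)                       # provides rq()

local_linear_qfit <- function(y, index, u, tau, h) {
  v <- index - u                        # centered index values X_i' gamma_hat - u
  w <- dnorm(v / h) / h                 # Gaussian kernel weights k_h(.)
  fit <- rq(y ~ v, tau = tau, weights = w)
  est <- coef(fit)
  c(a_hat = unname(est[1]),             # estimate of g(u)
    b_hat = unname(est[2]))             # estimate of g'(u)
}

# Toy usage with made-up data (for illustration only)
set.seed(1)
x <- matrix(rnorm(200 * 3), 200, 3)
gamma_hat <- c(1, 1, 0) / sqrt(2)
y <- drop(sin(x %*% gamma_hat)) + rnorm(200, sd = 0.2)
local_linear_qfit(y, drop(x %*% gamma_hat), u = 0, tau = 0.5, h = 0.3)
```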

2.4. The Selections of Bandwidth, Tuning Parameters, and Initial Value

The selection of the bandwidth plays a crucially important role in local polynomial smoothing because it controls the curvature of the fitted function. Cross-validation (CV) and generalized cross-validation (GCV) can be utilized to choose a proper bandwidth, but these methods are not computationally practical due to the large amount of calculation. For local linear quantile regression, [25] obtained an approximately optimal bandwidth under suitable assumptions and derived the rule-of-thumb bandwidth h_τ = h_m {τ(1 − τ)/ψ²(Φ^{−1}(τ))}^{1/5}, where ψ(·) and Φ(·) are the probability density function and the cumulative distribution function of the standard normal distribution, respectively, and h_m is the optimal bandwidth of least squares regression. There are many algorithms for the selection of h_m. [26] found that the rule-of-thumb bandwidth h_m = {4/(d + 2)}^{1/(4+d)} n^{−1/(4+d)} works fairly well in single-index models, where d is the dimension of the kernel function. We combine a multiplier depending only on τ with the optimal bandwidth h_m of the LS regression to obtain an h_τ with good properties.
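A sketch of these rule-of-thumb bandwidths in R, assuming ψ and Φ above are the standard normal density and distribution functions:

```r
# Illustrative R sketch (not from the paper): rule-of-thumb bandwidths, assuming
# psi and Phi in the text are the standard normal density and distribution functions.
rule_of_thumb_h <- function(n, d = 1, tau = 0.5) {
  h_m <- (4 / (d + 2))^(1 / (4 + d)) * n^(-1 / (4 + d))   # LS bandwidth of [26]
  h_m * (tau * (1 - tau) / dnorm(qnorm(tau))^2)^(1 / 5)   # quantile adjustment of [25]
}
rule_of_thumb_h(n = 200, d = 1, tau = 0.5)
```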
There are several selection methods for SCAD's nonnegative tuning parameter λ, such as CV, GCV, AIC, and BIC. Following [27], we utilize the following BIC-type criterion to choose a proper tuning parameter for SCAD in this paper:
BIC(λ) = (1/σ̃) Σ_{i=1}^{n} Σ_{q=1}^{Q} ρ_τq[Y_i − g(X_i⊤γ̂(λ))] + log(n)·df/2,
where σ̃ = (1/n) Σ_{i=1}^{n} Σ_{q=1}^{Q} ρ_τq[Y_i − g(X_i⊤γ̃)], with γ̃ being the composite quantile estimator without penalty, and df is the number of non-zero coefficients of γ̂(λ). We then choose the optimal tuning parameter by minimizing the above criterion. Moreover, for LEP, we utilize the extended Bayesian information criterion (EBIC) [28] to choose the tuning parameters λ and κ:
EBIC(λ, κ) = log(σ̂²) + [log(n) + log log(d)]·df/n,
where σ̂² = (1/(n − 1)) Σ_{i=1}^{n} Σ_{q=1}^{Q} ρ_τq[Y_i − g(X_i⊤γ̂)] and df is the number of non-zero coefficients of γ̂(λ, κ). Similarly, the tuning parameters are selected by minimizing the above criterion over a grid of (λ, κ) values.
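The tuning-parameter search can be organized as a simple grid search over the criterion. The following R sketch is schematic: fit_sim_cqr is a hypothetical placeholder for the full iterative procedure of Section 2.3, assumed to return the fitted composite quantile loss and the number of non-zero index coefficients.

```r
# Schematic R sketch (not from the paper): BIC-type selection of lambda as in the
# criterion above. `fit_sim_cqr` is a hypothetical routine that runs the algorithm of
# Section 2.3 for a given lambda and returns list(loss = composite loss, df = #nonzero).
select_lambda <- function(lambda_grid, fit_sim_cqr, sigma_tilde, n) {
  bic <- sapply(lambda_grid, function(lam) {
    fit <- fit_sim_cqr(lam)
    fit$loss / sigma_tilde + log(n) * fit$df / 2
  })
  lambda_grid[which.min(bic)]
}
```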
The initial value of the unknown parameter is required at the beginning of the iteration of our algorithm. A convenient choice is γ̂/‖γ̂‖, where γ̂ is the composite quantile estimator without penalty.

3. Theoretical Properties

A good estimator is supposed to satisfy unbiasedness, continuity, and the so-called oracle property. Thus, in this section, we discuss the large sample properties of the SCAD penalized composite quantile regression and the Laplace penalized composite quantile regression for single-index models. We consider data {(X_i, Y_i), i = 1, 2, …, n} consisting of n observations from model (1), as in Section 2. Moreover, let X_i = (X_i1⊤, X_i2⊤)⊤, γ = (γ_1⊤, γ_2⊤)⊤, X_i1 ∈ ℝ^s, X_i2 ∈ ℝ^{d−s}. In addition, γ_0 = (γ_10⊤, γ_20⊤)⊤ stands for the true regression parameter of model (1) with ‖γ_0‖ = 1, where the s components of γ_10 are non-zero. We suppose the following regularity conditions hold:
(i)
The density function of X⊤γ is positive and uniformly continuous for γ in a neighborhood of γ_0. Further, the density of X⊤γ_0 is continuous and bounded away from 0 and ∞ on its support D.
(ii)
The function g ( · ) has a continuous and bounded second derivative in D.
(iii)
The kernel function k ( · ) is a symmetric density function with bounded support and a bounded first derivative.
(iv)
The density function f_Y(·) of Y is continuous and has a bounded derivative; it is bounded away from 0 and ∞ on its compact support.
(v)
The following expectations exist:
C_0 = E{ g′(X⊤γ_0)² [X − E(X|X⊤γ_0)][X − E(X|X⊤γ_0)]⊤ },
C_1 = E{ f_Y(g(X⊤γ_0)) g′(X⊤γ_0)² [X − E(X|X⊤γ_0)][X − E(X|X⊤γ_0)]⊤ }.
(vi)
h → 0 and nh → ∞.
Given (â_j, b̂_j), let H = Σ_{j=1}^{n} |b̂_j|, a_n = max{ p′_λ(|γ_0j|) : γ_0j ≠ 0 }, and let γ̂_qr.sim.scad be the solution of (7). We note that, under condition (ii), the first derivative g′ is bounded; thus, H = O_P(n).
Theorem 1.
Assume conditions (i)–(v) hold. If max{ p″_λ(|γ_0k|) : γ_0k ≠ 0 } → 0 and a_n = O(n^{−1/2}), then there exists a local minimizer of (7) such that ‖γ̂_qr.sim.scad − γ_0‖ = O_P(n^{−1/2} + a_n), with ‖γ̂_qr.sim.scad‖ = ‖γ_0‖ = 1.
According to Theorem 1, there exists a √n-consistent SCAD penalized composite quantile regression estimator of γ if a proper tuning parameter λ is selected. Let c_n = ( p′_λ(|γ_01|)sgn(γ_01), …, p′_λ(|γ_0s|)sgn(γ_0s) )⊤ and Σ_λ = diag( p″_λ(|γ_01|), …, p″_λ(|γ_0s|) ).
Lemma 1.
Assume the same conditions as in Theorem 1 and that
lim inf_{n→∞} lim inf_{θ→0+} p′_{λ_n}(θ)/λ_n > 0.
If λ → 0 and √H·λ → ∞ as n → ∞, then, with probability tending to 1, for any given γ_1 satisfying ‖γ_1 − γ_10‖ = O_P(n^{−1/2}) and any constant C, we have
Q_λ^S((γ_1⊤, 0⊤)⊤) = min_{‖γ_2‖ ≤ C n^{−1/2}} Q_λ^S((γ_1⊤, γ_2⊤)⊤).
Theorem 2.
Assume the same conditions as in Theorem 1 and that the penalty function p_λ(|θ|) satisfies condition (17). If λ → 0 and √H·λ → ∞, then, with probability tending to 1, the √n-consistent local minimizer γ̂_qr.sim.scad = ((γ̂_1^qr.sim.scad)⊤, (γ̂_2^qr.sim.scad)⊤)⊤ in Theorem 1 must satisfy:
(i) Sparsity: γ ^ 2 q r . s i m . s c a d = 0 .
(ii) Asymptotic normality:
√n { ( Q C_11 + H Σ_λ/n )( γ̂_1^qr.sim.scad − γ_10 ) + H c_n/n } →_D N( 0, 0.25 Q² C_01 ),
where C 11 is the top-left s-by-s sub-matrix of C 1 and C 01 is the top-left s-by-s sub-matrix of C 0 .
Theorem 2 shows that the SCAD penalized composite quantile regression estimator has the so-called oracle property when λ → 0 and √H·λ → ∞.
Remark 2.
In this section, we discuss the large sample properties of the SCAD penalized composite quantile estimator ( γ ^ q r . s i m . s c a d ) in detail. Similarly, we can also show the large sample properties of the Laplace penalized composite quantile estimator ( γ ^ q r . s i m . l e p ).

4. Numerical Studies

4.1. Simulation Studies

In this section, we evaluate the proposed estimators by simulation studies and compare the performance of different penalized estimators with the oracle estimator. In particular, in order to reduce the computational burden, we take the Gaussian kernel as the kernel function in our simulations. Moreover, we do not tune the parameter a and set a = 3.7, as suggested by [12] for the SCAD penalty, and we set the number of quantiles to Q = 5. Next, we compare the performance of the following four estimators for the single-index model:
  • lad.sim.scad: the LAD estimators with the SCAD penalty;
  • cqr.sim.scad: the composite quantile estimators with the SCAD penalty;
  • cqr.sim.lep: the composite quantile estimators with the Laplace error penalty (LEP);
  • Oracle: the oracle estimators (composite quantile regression without penalty under the true model).
In order to evaluate the performances of the above estimators, we consider the following criteria:
  • MAD (the mean absolute deviation) of γ̂: MAD = (1/n) Σ_{i=1}^{n} |X_i⊤γ̂ − X_i⊤γ_0|.
  • NC: the average number of non-zero coefficients that are correctly estimated to be non-zero.
  • NIC: the average number of zero coefficients that are incorrectly estimated to be non-zero.
Additionally, an estimated coefficient is treated as 0 if its absolute value is smaller than 10^{−6}. A sketch of these criteria is given below.
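The following small R sketch (ours, for illustration) computes the three criteria for a single replicate:

```r
# Illustrative R sketch (not from the paper): evaluation criteria for one replicate.
eval_criteria <- function(X, gamma_hat, gamma0, tol = 1e-6) {
  mad <- mean(abs(X %*% gamma_hat - X %*% gamma0))   # mean absolute deviation of the index
  nc  <- sum(abs(gamma_hat) >= tol & gamma0 != 0)    # true non-zeros correctly kept
  nic <- sum(abs(gamma_hat) >= tol & gamma0 == 0)    # true zeros incorrectly kept
  c(MAD = mad, NC = nc, NIC = nic)
}
```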
Scenario 1. We assume that the single-index model has the following form:
Y = 2(X⊤γ_0) + 10 exp((X⊤γ_0)²/5) + ε,
where γ_0 = γ/‖γ‖ with γ = (1, 1, 2, 0.5, 0, …, 0)⊤ being a 15-dimensional vector with only four non-zero components (the true coefficients). The covariates are generated from a multivariate normal distribution with the correlation between X_i and X_j set to 0.5^|i−j|, and the response Y is generated from the above model. Then, to examine the impact of different error distributions, we consider the following five cases (a data-generation sketch is given after the list):
  • N ( 0 , 1 ) : the standard normal distribution (N);
  • t ( 3 ) : the t-distribution with 3 degrees of freedom;
  • DE: the double exponential distribution with median 0 and scale parameter 1/2;
  • CN: the contaminated normal distribution 0.9 N(0, 1) + 0.1 N(0, 25);
  • Outlier: an outlier case, in which 10% of the responses are shifted by a constant c = 5.
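The following R sketch (ours, for illustration) generates one replicate of Scenario 1 under the standard normal error; the other error settings are obtained by changing how eps is drawn.

```r
# Illustrative R sketch (not from the paper): one replicate of Scenario 1
# with N(0,1) errors; change how `eps` is drawn for the other error settings.
set.seed(2022)
n <- 100; d <- 15
gamma  <- c(1, 1, 2, 0.5, rep(0, d - 4))
gamma0 <- gamma / sqrt(sum(gamma^2))             # normalize to unit length

Sigma <- 0.5^abs(outer(1:d, 1:d, "-"))           # correlation 0.5^|i-j| between covariates
X <- MASS::mvrnorm(n, mu = rep(0, d), Sigma = Sigma)

index <- drop(X %*% gamma0)
eps <- rnorm(n)                                  # e.g., rt(n, 3) for the t(3) setting
Y <- 2 * index + 10 * exp(index^2 / 5) + eps     # link function of Scenario 1
```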
In order to perform the simulations, we generated 200 replicates with moderate sample sizes n = 100 and 200. The corresponding results are reported in Table 1.
Scenario 2. The model set-up is similar to Scenario 1, except that the link function is X⊤γ_0 + 4|X⊤γ_0 + 1|. This link function was also analyzed by [20]. Table 2 summarizes the corresponding results.
From Table 1 and Table 2, we note that, for all five error distributions, cqr.sim.lep performs best, cqr.sim.scad second, and lad.sim.scad worst. This is consistent with our theoretical findings. Furthermore, all estimators improve as the sample size increases.

4.2. Real Data Application: Boston Housing Data

In this section, the methods are illustrated through an analysis of the Boston housing data. The data (506 observations and 14 variables) are available in the R package "MASS", and the definitions of the dependent variable (MEDV) and the explanatory variables are given in Table 3. We checked the data for missing values and examined each variable through violin plots. Figure 1 and Figure 2 show the violin plots of the first seven and last seven variables, respectively. It is obvious from Figure 1 and Figure 2 that there are clear outliers in the CRIM and B columns. To examine the linear relationships among the variables, the heat map between the variables is given in Figure 3. It can be seen from the heat map that the variables RM, PTRATIO, LSTAT, and MEDV have certain correlations, and the correlation coefficients between INDUS and NOX, CRIM and RAD, RAD and TAX, and INDUS and TAX are 0.7, 0.8, 0.9, and 0.7, respectively. Therefore, there is high correlation among the variables, so the single-index regression model can be considered.
The Boston housing data have been utilized in many regression studies, and the potential relationship between MEDV and the explanatory variables has been investigated [10,17]. For single-index quantile regression, [10] introduced a practical algorithm in which the unknown link function g(·) is estimated by local linear quantile regression and the parametric index is estimated through linear quantile regression; however, variable selection was not considered. For single-index regression models, [17] considered penalized LAD regression, which deals with variable selection and estimation simultaneously. However, the LAD estimator is only the special case of the quantile estimator with τ = 0.5, and the two studies mentioned above both use a single quantile level. In this article, we construct new sparse estimators for single-index quantile regression models based on the composite quantile regression method combined with the SCAD penalty and the Laplace error penalty.
Due to the sparsity of data in the regions concerned, the estimated quantile curves may cross at both tails, similar to [14]. The results of the real data example and the simulation studies confirm the reasonableness and effectiveness of our method in practice.
In order to improve the numerical performance, we standardize the response variable MEDV and all predictor variables except CHAS before applying our method. An estimated coefficient is treated as 0 if its absolute value is smaller than 10^{−12}. The corresponding results are reported in Table 4.
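A minimal R sketch of this preprocessing step, using the Boston data from the "MASS" package (column names are lowercase in that package; the helper name is ours):

```r
# Illustrative R sketch (not from the paper): Boston housing data preprocessing.
library(MASS)                                    # provides the Boston data set
data(Boston)

y <- drop(scale(Boston$medv))                    # standardized response MEDV
keep <- setdiff(names(Boston), c("medv", "chas"))
X <- cbind(scale(Boston[, keep]), chas = Boston$chas)  # standardize all predictors except CHAS

# Treat estimated coefficients with |value| < 1e-12 as exactly zero
threshold_zero <- function(coefs, tol = 1e-12) ifelse(abs(coefs) < tol, 0, coefs)
```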
From Table 4, we see that all methods achieve variable selection and parameter estimation simultaneously in this real problem. Moreover, all methods select the sparse model including RM, DIS, PTRATIO, and LSTAT, the same as the fit using all predictors without penalty (cqr.sim), and the estimates of cqr.sim.lep are the closest to those of cqr.sim. These results indicate that only four explanatory variables are significant and the rest are irrelevant.

5. Conclusions

In this article, we propose SCAD and Laplace penalized composite quantile regression estimators for single-index models in the high-dimensional case. Compared with the least squares method, composite quantile regression yields estimators that are robust to heavy-tailed error distributions and outliers. A practical iterative algorithm was then introduced; it is based on composite quantile regression and uses local linear smoothing to estimate the unknown link function. The method realizes variable selection and parameter estimation simultaneously by combining the two penalty functions with composite quantile regression. In addition, we proved that the proposed estimators enjoy large sample properties, including √n-consistency and the oracle property. Furthermore, the estimators were evaluated and illustrated by numerical studies. Moreover, from the Boston housing data we can draw the following conclusion: all three estimation methods select the sparse model with the same significant variables, and the estimates of cqr.sim.lep are the closest to those of cqr.sim, which suggests that applying the LEP penalty changes the estimated link function very little compared with the unpenalized fit.

Author Contributions

Formal analysis, Z.L.; Methodology, Y.S.; Software, M.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the NNSF project (61503412) of China and the NSF project (ZR2019MA016) of the Shandong Province of China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorems

Proof of Theorem 1.
Given (â_j, b̂_j), we first define some additional notation. Let α_n = n^{−1/2} + a_n, Y_ij = Y_i − â_j − b̂_j X_ij⊤γ_0 with X_ij = X_i − X_j, and γ = γ_0 + α_n u, where u is a d-dimensional vector. Let S(u) = Σ_{i,j,q} ρ_τq[Y_ij − α_n b̂_j X_ij⊤u] w_ij − Σ_{i,j,q} ρ_τq(Y_ij) w_ij, and set ‖u‖ = C, where C is a large enough constant.
Our purpose is to prove that γ̂_qr.sim.scad is √n-consistent; that is, to show that for any given ε > 0 and large n, there is a large enough constant C such that
P{ inf_{‖u‖=C} Q_λ^S(γ_0 + α_n u) > Q_λ^S(γ_0) } ≥ 1 − ε,
which implies that there exists a local minimizer γ̂ in the ball {γ_0 + α_n u : ‖u‖ ≤ C} such that ‖γ̂ − γ_0‖ = O_P(α_n) with probability at least 1 − ε.
Let
D_n(u) = Q_λ^S(γ_0 + α_n u) − Q_λ^S(γ_0) ≥ S(u) + H Σ_{k=1}^{s} [ p_λ(|γ_0k + α_n u_k|) − p_λ(|γ_0k|) ] =: I + II.
First, we consider I and partition it into I 1 and I 2 . We have
I = Σ_{i,j,q} ρ_τq[Y_ij − b̂_j X_ij⊤ α_n u] w_ij − Σ_{i,j,q} ρ_τq(Y_ij) w_ij = Σ_{i,j,q} w_ij b̂_j X_ij⊤ α_n u [ I(Y_ij < b̂_j X_ij⊤ α_n u) − τ_q ] + Σ_{i,j} Q w_ij Y_ij I( b̂_j X_ij⊤ α_n u < Y_ij < 0 ) = I_1 + I_2.
Obviously,
I_1 ≤ Σ_{i,j} Q w_ij ‖b̂_j X_ij‖ α_n ‖u‖ = n α_n ‖u‖ · ( Σ_{i,j} Q w_ij ‖b̂_j X_ij‖ / n ).
Note that Σ_{i,j} Q w_ij ‖b̂_j X_ij‖ / n = O_P(1), which follows from the proof of Theorem 2 of [10]. Therefore, I_1 = O_P(n α_n)‖u‖ = O_P(n α_n²)‖u‖. Next, we compute the mean and variance of I_2 directly.
E I_2 = Σ_{i,j} Q w_ij ∫_{−∞}^{+∞} y · I( b̂_j X_ij⊤ α_n u < y < 0 ) f_{Y_ij}(y) dy = Σ_{i,j} Q w_ij ∫_{b̂_j X_ij⊤ α_n u}^{0} y f_{Y_ij}(y) dy = Σ_{i,j} Q w_ij · P( b̂_j X_ij⊤ α_n u < Y_ij < 0 ) = Σ_{i,j} Q w_ij P_ij,
where f_{Y_ij}(·) is the probability density function of Y_ij and P_ij stands for P( b̂_j X_ij⊤ α_n u < Y_ij < 0 ). Note that E I_2 > 0 and, furthermore, E I_2 does not vanish. Moreover,
Var(I_2) = Σ_{i,j} Q w_ij · E[ Y_ij · I( b̂_j X_ij⊤ α_n u < Y_ij < 0 ) − P_ij ]² ≤ Σ_{i,j} Q w_ij ( max_{i,j} |Y_ij| P_ij − P_ij² ) = Σ_{i,j} Q w_ij P_ij ( max_{i,j} |Y_ij| − P_ij ) → 0.
In addition, by taking a Taylor expansion of P_λ(|γ_k|) and using the basic inequality, we obtain
II = H Σ_{k=1}^{s} [ P_λ(|γ_0k + α_n u_k|) − P_λ(|γ_0k|) ] = H Σ_{k=1}^{s} [ α_n P′_λ(|γ_0k|) sgn(γ_0k) u_k + 0.5 α_n² P″_λ(|γ_0k|) u_k² ] ≤ √s H α_n a_n ‖u‖ + H max_{1≤k≤s} P″_λ(|γ_0k|) α_n² ‖u‖².
According to the condition H = O_P(n), max{ P″_λ(|γ_0k|) : γ_0k ≠ 0 } → 0, and ‖u‖ = C, D_n(u) in (A2) is dominated by I_2, which is positive. Thus, we prove (A1). □
Proof of Lemma 1.
Since γ = γ_0 + α_n u, let u_1 = α_n^{−1}(γ_1 − γ_01), u_2 = α_n^{−1}(γ_2 − γ_02), and u = (u_1⊤, u_2⊤)⊤. A direct computation gives
Q_λ^S((γ_1⊤, 0⊤)⊤) − Q_λ^S((γ_1⊤, γ_2⊤)⊤) = S((u_1⊤, 0⊤)⊤) − S((u_1⊤, u_2⊤)⊤) − H Σ_{k=s+1}^{d} P_λ(|γ_k|).
According to the proof of Theorem 1 and ‖u‖ = O(1), we have I_1 = O_P(n α_n²) and I_2 = O_P(1); thus, S((u_1⊤, u_2⊤)⊤) = I = O_P(1). Similarly, S((u_1⊤, 0⊤)⊤) = O_P(1). By the mean value theorem and P_λ(0) = 0, we obtain the following inequality
H Σ_{k=s+1}^{d} P_λ(|γ_k|) = H Σ_{k=s+1}^{d} P′_λ(|γ_k*|) |γ_k| ≥ H λ ( lim inf_{λ→0} lim inf_{θ→0+} P′_λ(θ)/λ ) Σ_{k=s+1}^{d} |γ_k|,
where 0 < |γ_k*| < |γ_k| (k = s+1, …, d). Since √H·λ → ∞, we have Hλ = √H(√H·λ), which is of higher order than O_P(√H); hence, the last term of (A8) dominates in magnitude. As a result, Q_λ^S((γ_1⊤, 0⊤)⊤) − Q_λ^S((γ_1⊤, γ_2⊤)⊤) < 0 for large n. This proves Lemma 1. □
Proof of Theorem 2.
(i)
Follows from Lemma 1.
(ii)
By partitioning the vector u = (u_1⊤, u_2⊤)⊤ and using P_λ(0) = 0 and (A2), we have
D_n((u_1⊤, 0⊤)⊤) = S((u_1⊤, 0⊤)⊤) + H Σ_{k=1}^{s} [ P_λ(|γ_0k + α_n u_k|) − P_λ(|γ_0k|) ] = S((u_1⊤, 0⊤)⊤) + P_λ(u_1),
where P_λ(u_1) = H Σ_{k=1}^{s} [ P_λ(|γ_0k + α_n u_k|) − P_λ(|γ_0k|) ]. Moreover, by a Taylor expansion, P_λ(u_1) can be rewritten as
P_λ(u_1) = H α_n c_n⊤ u_1 + (1/2) H α_n² u_1⊤ Σ_λ u_1.
Let
δ* = â_j + b̂_j X_{1ij}⊤(γ_01 + α_n û_1),   δ_1* = â_j + b̂_j X_{1ij}⊤ γ_01.
In order to find the minimizer û_1 of D_n((u_1⊤, 0⊤)⊤), we compute its derivative and set D_n′((u_1⊤, 0⊤)⊤) = 0, which yields the following equation:
Σ_{i,j,q} w_ij b̂_j X_{1ij} α_n [ τ_q − I( Y_i − â_j − b̂_j X_{1ij}⊤(γ_01 + α_n û_1) < 0 ) ] + H α_n c_n + H α_n² Σ_λ û_1 = 0.
That is,
(1/n) Σ_{i,j,q} b̂_j X_{1ij} w_ij [ I(Y_i < δ*) − τ_q ] = (1/n) [ H c_n + H α_n Σ_λ û_1 ].
Let
Z_1 = n^{−1/2} Σ_{i,j,q} b̂_j X_{1ij} w_ij [ I(Y_i < δ_1*) − τ_q ],
B_1 = n^{−1} Σ_{i,j,q} b̂_j X_{1ij} w_ij [ F_Y(δ_1*) − F_Y(δ*) ],
B_2 = n^{−1} Σ_{i,j,q} b̂_j X_{1ij} w_ij { [ I(Y_i < δ_1*) − I(Y_i < δ*) ] − [ F_Y(δ_1*) − F_Y(δ*) ] },
where F Y ( · ) is the cumulative distribution function of Y. Therefore, we have
(1/n) Σ_{i,j,q} b̂_j X_{1ij} w_ij [ I(Y_i < δ*) − τ_q ] = (1/√n) Z_1 + B_1 + B_2.
By taking the Taylor’s expansion for F Y ( · ) , we can obtain that
B_1 = Q α_n n^{−1} Σ_{i,j} b̂_j² f_Y(δ_1*) w_ij X_{1ij} X_{1ij}⊤ û_1 ≈ Q α_n C_11 û_1,
where f_Y(·) is the probability density function of Y. According to the direct calculation of the mean and variance as in [15], we have B_2 = o_P(Q/√n) = o_P(1/√n). Moreover, combining (A14), (A15), and û_1 = α_n^{−1}(γ̂_1 − γ_01), (A13) can be rewritten in the following form:
√n { ( Q C_11 + (H/n) Σ_λ )( γ̂_1 − γ_01 ) + (H/n) c_n } = Z_1 + o_P(1).
Note that Z_1 →_D Q · N(0, 0.25 C_01). We can then obtain
√n { ( Q C_11 + (H/n) Σ_λ )( γ̂_1 − γ_01 ) + (H/n) c_n } →_D N( 0, 0.25 Q² C_01 ).
Thus, we prove Theorem 2. □
Remark A1.
Above, we prove the √n-consistency and oracle property of the SCAD penalized composite quantile estimator γ̂_qr.sim.scad. Similarly, we can show the same properties for the Laplace penalized composite quantile estimator γ̂_qr.sim.lep.

Appendix B. The Algorithm Based on LEP

Similar to the SCAD penalty function, by the local linear approximation of LEP and removal of a few irrelevant terms, (8) can be rewritten as
min_{a, b, ‖γ‖=1} Σ_{j=1}^{n} Σ_{q=1}^{Q} Σ_{i=1}^{n} ρ_τq[Y_i − a_j − b_j(X_i⊤γ − X_j⊤γ)] w_ij + Σ_{j=1}^{n} |b_j| Σ_{k=1}^{d} P′_{λ,κ}(|γ̂_k|)|γ_k|.
We denote the target function in (A18) by Q λ , κ S * ( a , b , γ ) . Moreover, the iterative algorithm based on LEP is as follows.
Step 0.
We obtain an initial estimate of γ and standardize it such that ‖γ̂‖ = 1 and γ̂_1 > 0.
Step 1.
Given an estimate γ ^ , we obtain { a ^ j , b ^ j , j = 1 , 2 , , n } by solving
min_{(a_j, b_j)} Σ_{i=1}^{n} Σ_{q=1}^{Q} ρ_τq[Y_i − a_j − b_j(X_i⊤γ̂ − X_j⊤γ̂)] w_ij + |b_j| Σ_{k=1}^{d} P′_{λ,κ}(|γ̂_k|)|γ̂_k| = min_{(a_j, b_j)} Σ_{i=1}^{n+1} Σ_{q=1}^{Q} ρ[Y_i* − (A, B)(a_j, b_j)⊤] w_ij*,
where h is the optimal bandwidth, (ρ, Y_i*, A, B, w_ij*) = (ρ_τq, Y_i, 1, X_i⊤γ̂ − X_j⊤γ̂, w_ij) for i = 1, 2, …, n, and (ρ, Y_i*, A, B, w_ij*) = (1/Q, 0, 0, Σ_{k=1}^{d} P′_{λ,κ}(|γ̂_k|)|γ̂_k|, 1) for i = n + 1.
Step 2.
Given { a ^ j , b ^ j , j = 1 , 2 , , n } , update γ ^ by solving
min_γ Σ_{j=1}^{n} Σ_{q=1}^{Q} Σ_{i=1}^{n} ρ_τq[Y_i − â_j − b̂_j(X_i⊤γ − X_j⊤γ)] w_ij + Σ_{j=1}^{n} |b̂_j| Σ_{k=1}^{d} P′_{λ,κ}(|γ̂_k|)|γ_k|.
Step 3.
Scale b̂ ← sgn(γ̂_1)·‖γ̂‖·b̂ and γ̂ ← sgn(γ̂_1)·γ̂/‖γ̂‖.
Step 4.
Continue Step 1–Step 3 until convergence.
Step 5.
Given the final estimate γ̂ from Step 4, we estimate g(·) at any point u by ĝ(u; h, γ̂) = â, where
(â, b̂) = arg min_{(a, b)} Σ_{q=1}^{Q} Σ_{i=1}^{n} ρ_τq[Y_i − a − b(X_i⊤γ̂ − u)] k_h(X_i⊤γ̂ − u).

References

  1. Kuruwita, C.N. Variable selection in the single-index quantile regression model with high dimensional covariates. Commun. Stat.-Simul. Comput. 2021, 1–13. [Google Scholar] [CrossRef]
  2. Sara, M.; Amena, U.; Faridoon, K.; Mohammed, N.A.; Mohammed, A.; Sanaa, A. Comparison of weighted lag adaptive LASSO with Autometrics for Covariate Selection and forecasting using time-series data. Complexity 2022, 2022, 2649205. [Google Scholar]
  3. Kraus, D.; Czado, C. D-vine copula based quantile regression. Comput. Stat. Data Anal. 2017, 110, 1–18. [Google Scholar] [CrossRef] [Green Version]
  4. Imtiaz, S.; Abdul, G.; Abdollah, A.M. The COVID-19 pandemic and speculation in energy, precious metals, and agricultural futures. J. Behav. Exp. Financ. 2021, 30, 100498. [Google Scholar]
  5. Mozafari, Z.; Arab Chamjangali, M.; Arashi, M.; Goudarzi, N. Performance of smoothly clipped absolute deviation as a variable selection method in the artificial neural network based QSAR studies. J. Chemom. 2021, 35, e3338. [Google Scholar] [CrossRef]
  6. Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50. [Google Scholar] [CrossRef]
  7. Zou, H.; Yuan, M. Composite quantile regression and the oracle model selection theory. Ann. Stat. 2008, 36, 1108–1126. [Google Scholar] [CrossRef]
  8. Cao, Z.; Kang, X.; Wang, M. Doubly robust weighted composite quantile regression based on SCAD-L2. Can. J. Stat. 2021. [Google Scholar] [CrossRef]
  9. Chaudhuri, P.; Doksum, K.; Samarov, A. On average derivative quantile regression. Ann. Stat. 1997, 25, 715–744. [Google Scholar] [CrossRef]
  10. Wu, T.Z.; Yu, K.; Yu, Y. Single-index quantile regression. J. Multivar. Anal. 2010, 101, 1607–1621. [Google Scholar] [CrossRef] [Green Version]
  11. Jiang, R.; Yu, K. Single-index composite quantile regression for massive data. J. Multivar. Anal. 2020, 180, 104669. [Google Scholar] [CrossRef]
  12. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  13. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef] [Green Version]
  14. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. 2005, 67, 301–320. [Google Scholar] [CrossRef] [Green Version]
  15. Fan, J.; Lv, J. A selective overview of variable selection in high dimensional feature space. Stat. Sin. 2010, 20, 101–148. [Google Scholar]
  16. Kuruwita, C.N. Non-iterative estimation and variable selection in the single-index quantile regression model. Commun. Stat.-Simul. Comput. 2016, 45, 3615–3628. [Google Scholar] [CrossRef]
  17. Yang, H.; Lv, J.; Guo, C. Penalized LAD regression for single-index models. Commun. Stat.-Simul. Comput. 2016, 45, 2392–2408. [Google Scholar] [CrossRef]
  18. Wen, C.; Wang, X.; Wang, S. Laplace error penalty-based variable selection in high dimension. Scand. J. Stat. 2015, 42, 685–700. [Google Scholar] [CrossRef]
  19. Xia, Y.; Tong, H.; Li, W.K. An adaptive estimation of dimension reduction space (with discussion). J. R. Stat. Soc. Ser. B 2002, 64, 363–410. [Google Scholar] [CrossRef]
  20. Zeng, P.; He, T.; Zhu, Y. A Lasso-type approach for estimation and variable selection in single index models. J. Comput. Graph. Stat. 2012, 21, 92–109. [Google Scholar] [CrossRef]
  21. Zou, H.; Li, R. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 2008, 36, 1509–1533. [Google Scholar] [PubMed]
  22. An, L.T.H.; Tao, P.D. Solving a class of linearly constrained indefinite quadratic problems by d.c. algorithms. J. Glob. Optim. 1997, 11, 253–285. [Google Scholar]
  23. Wu, T.T.; Lange, K. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2008, 2, 224–244. [Google Scholar] [CrossRef]
  24. Hunter, D.R.; Lange, K. Quantile regression via an MM algorithm. J. Comput. Graph. Stat. 2000, 9, 60–77. [Google Scholar]
  25. Yu, K.; Jones, M. Local linear quantile regression. J. Am. Stat. Assoc. 1998, 93, 228–237. [Google Scholar] [CrossRef]
  26. Wang, Q.; Yin, X. A nonlinear multi-dimensional variable selection method for high dimensional data: Sparse MAVE. Comput. Stat. Data Anal. 2008, 52, 4512–4520. [Google Scholar] [CrossRef]
  27. Shows, H.S.; Lu, W.; Zhang, H.H. Sparse estimation and inference for censored median regression. J. Stat. Plan. Inference 2010, 140, 1903–1917. [Google Scholar] [CrossRef] [Green Version]
  28. Chen, J.; Chen, Z. Extended Bayesian information for model selection with large model spaces. Biometrika 2008, 95, 759–771. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Violin diagram of the first seven variables.
Figure 2. Violin diagram of the last seven variables.
Figure 3. Heat map between the variables.
Table 1. Simulation results for Scenario 1 based on 200 replications.
Error Distribution   Method         n = 100                     n = 200
                                    MAD (%)   NC     NIC        MAD (%)   NC     NIC
N(0,1)               lad.sim.scad   11.65     3.96   3.53       7.63      4.00   1.57
                     cqr.sim.scad   11.61     3.93   3.45       7.60      4.00   1.54
                     cqr.sim.lep    11.59     3.91   3.44       7.58      4.00   1.53
                     Oracle         11.57     3.90   3.42       7.56      4.00   1.51
t(3)                 lad.sim.scad   13.72     3.90   3.73       9.25      3.99   1.99
                     cqr.sim.scad   13.70     3.95   3.70       9.21      4.00   1.94
                     cqr.sim.lep    13.68     3.96   3.68       9.18      4.00   1.93
                     Oracle         13.67     3.98   3.67       9.15      4.00   1.91
DE                   lad.sim.scad   8.79      3.97   3.20       5.69      4.00   1.76
                     cqr.sim.scad   8.76      3.97   3.08       5.65      4.00   1.76
                     cqr.sim.lep    8.74      3.98   3.05       5.63      4.00   1.76
                     Oracle         8.72      3.99   3.00       5.61      4.00   1.76
CN                   lad.sim.scad   16.65     3.82   2.83       10.78     3.94   1.55
                     cqr.sim.scad   16.63     3.85   2.80       10.75     3.95   1.53
                     cqr.sim.lep    16.62     3.87   2.78       10.74     3.97   1.52
                     Oracle         16.61     3.89   2.77       10.71     3.98   1.50
Outlier              lad.sim.scad   13.76     3.95   2.84       10.24     3.97   1.84
                     cqr.sim.scad   13.74     3.96   2.83       10.23     3.97   1.81
                     cqr.sim.lep    13.73     3.97   2.82       10.22     3.98   1.80
                     Oracle         13.72     3.98   2.81       10.20     3.99   1.78
MAD (the mean absolute deviation) of γ̂: MAD = (1/n) Σ_{i=1}^{n} |X_i⊤γ̂ − X_i⊤γ_0|; NC: the average number of non-zero coefficients that are correctly estimated to be non-zero; NIC: the average number of zero coefficients that are incorrectly estimated to be non-zero.
Table 2. Simulation results for Scenario 2 based on 200 replications.
Error Distribution   Method         n = 100                     n = 200
                                    MAD (%)   NC     NIC        MAD (%)   NC     NIC
N(0,1)               lad.sim.scad   9.90      3.98   3.27       8.26      4.00   1.41
                     cqr.sim.scad   9.84      3.98   3.24       8.19      4.00   1.35
                     cqr.sim.lep    9.81      3.99   3.23       8.16      4.00   1.34
                     Oracle         9.79      3.90   3.21       8.14      4.00   1.32
t(3)                 lad.sim.scad   12.15     3.97   3.41       8.57      4.00   1.72
                     cqr.sim.scad   12.08     3.98   3.36       8.51      4.00   1.68
                     cqr.sim.lep    12.05     3.99   3.34       8.50      4.00   1.67
                     Oracle         12.03     3.88   3.32       8.47      4.00   1.64
DE                   lad.sim.scad   7.63      4.00   3.16       4.97      4.00   2.18
                     cqr.sim.scad   7.56      4.00   3.13       4.94      4.00   2.11
                     cqr.sim.lep    7.54      4.00   3.11       4.92      4.00   2.08
                     Oracle         7.51      4.00   3.09       4.89      4.00   2.06
CN                   lad.sim.scad   12.02     3.95   3.06       11.31     3.97   1.24
                     cqr.sim.scad   11.97     3.96   3.03       11.26     3.97   1.21
                     cqr.sim.lep    11.96     3.98   3.02       11.25     3.98   1.18
                     Oracle         11.93     3.86   2.97       11.23     3.88   1.15
Outlier              lad.sim.scad   11.95     3.95   3.18       9.47      4.00   1.62
                     cqr.sim.scad   11.92     3.96   3.15       9.42      4.00   1.59
                     cqr.sim.lep    11.89     3.98   3.13       9.41      4.00   1.58
                     Oracle         11.86     3.87   3.10       9.39      4.00   1.55
MAD (the mean absolute deviation) of γ̂: MAD = (1/n) Σ_{i=1}^{n} |X_i⊤γ̂ − X_i⊤γ_0|; NC: the average number of non-zero coefficients that are correctly estimated to be non-zero; NIC: the average number of zero coefficients that are incorrectly estimated to be non-zero.
Table 3. Description of variables for Boston housing data.
Variables   Description
MEDV        Median value of owner-occupied homes in USD thousands
CRIM        Per capita crime rate by town
ZN          Proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS       Proportion of non-retail business acres per town
CHAS        Charles River dummy variable (= 1 if tract bounds river, 0 otherwise)
NOX         Nitric oxide concentrations (parts per 10 million)
RM          Average number of rooms per dwelling
AGE         Proportion of owner-occupied units built prior to 1940
DIS         Weighted distances to five Boston employment centers
RAD         Index of accessibility to radial highways
TAX         Full-value property-tax rate per USD 10,000
PTRATIO     Pupil–teacher ratio by town
B           1000(Bk − 0.63)², where Bk is the proportion of the Black population by town
LSTAT       Percentage of lower-status population
Table 4. Coefficient estimates for Boston housing data.
Variables   cqr.sim.scad          cqr.sim.lep           lad.sim.scad   ls.sim.lasso   cqr.sim
            τ = 0.25   τ = 0.75   τ = 0.25   τ = 0.75   τ = 0.5                       τ = 0.25   τ = 0.75
CRIM        0.3092     0.3089     0.3082     0.3081     0.3083         0              0.3076     0.3075
ZN          0          0          0          0          0              −0.0690        0          0
INDUS       0          0          0          0          0              0              0          0
CHAS        0          0          0          0          0              0              0          0
NOX         0          0          0          0          0              0              0          0
RM          −0.1884    −0.1883    −0.1871    −0.1870    −0.1872        −0.5300        −0.1866    −0.1864
AGE         0          0          0          0          0              0              0          0
DIS         0.1453     0.1451     0.1443     0.1442     0.1444         0.1163         0.1439     0.1437
RAD         0          0          0          0          0              −0.0460        0          0
TAX         0          0          0          0          0              0              0          0
PTRATIO     0.1877     0.1876     0.1870     0.1868     0.1871         0.1069         0.1865     0.1863
B           0          0          0          0          0              0              0          0
LSTAT       0.9042     0.9040     0.9031     0.9029     0.9032         0.8319         0.9024     0.9023
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
