Abstract
The paper develops an adaptive bi-level variable selection methodology for quantile regression models with a diverging number of covariates. Traditional variable selection techniques in quantile regression, such as the lasso and group lasso, operate predominantly at either the individual-variable level or the group level, but not at both simultaneously. To address this limitation, we introduce an adaptive group bridge approach for quantile regression that selects variables at both the group and within-group levels simultaneously. The proposed method offers several notable advantages. First, it handles the heterogeneous and/or skewed data inherent to quantile regression. Second, it accommodates quantile regression models in which the number of parameters grows with the sample size. Third, by employing a carefully designed penalty function, it surpasses traditional group bridge estimation in identifying important within-group variables with high precision. Fourth, it exhibits the oracle group selection property, meaning that the relevant variables at both the group and within-group levels are identified with probability converging to one. Several numerical studies corroborate our theoretical results.
Keywords: quantile regression; bi-level variable selection; adaptive bridge estimator; diverging parameters
MSC: 62F12; 62F35
1. Introduction
Over the past three decades, variable selection has emerged as a critical tool in diverse scientific disciplines, including biomedical research, environmental science, and financial econometrics. Its importance lies in enhancing model interpretability, as statistical models with fewer important variables are more easily understood than their fully specified counterparts (Hastie, Tibshirani, and Friedman [1]). In practice, extensive data on potential predictors are collected to ensure that no important predictive relationships are overlooked. To diminish variability and enhance interpretability, it is necessary to seek a parsimonious model using a smaller subset of the collected variables (Ahn and Kim [2]).
Although there is a large body of literature on variable selection, most works focus on the selection of individual variables. In many regression problems, however, important predictors are related and a manifestation of underlying common factors (Yuan and Lin [3]). Categorical factors, for example, are often represented by groups of indicator functions, whereas continuous factors may be modeled using basis functions. Moreover, groups of measurements are frequently employed to detect unobservable latent variables or to assess various aspects of complex entities. For instance, gene expression data might be categorized by biological pathways, and genetic markers grouped by the genes or haplotypes they represent (Goeman and Buhlmann [4]). Methods focused exclusively on individual variable selection can be suboptimal in these contexts, as they may overlook the information provided by the structure of groups, thereby potentially leading to incoherent and inefficient models.
In addition to convex/nonconvex regularizers designed for individual variable selection, such as the lasso (Tibshirani [5]), bridge (Frank and Friedman [6]), smoothly clipped absolute deviation penalty (Fan and Li [7]), and minimax concave penalty (Zhang [8]), several penalty functions have been developed to accommodate selection at the group level. Yuan and Lin [3] introduced the group lasso, where the penalty function is composed of the norms of predefined groups of variables. This approach promotes sparsity at the group level, while applying ridge-regression-like shrinkage within each group. Meier et al. [9] extended this concept to logistic regression models, and Zhao et al. [10] further extended it to accommodate overlapping and hierarchical group structures. While these aforementioned approaches can effectively perform variable selection at the group level, they do not facilitate individual-level variable selection within groups.
The group bridge penalty (Huang et al. [11]) applies a group penalty to the norms of groups, thereby enabling bi-level selection by promoting sparse solutions both at the group level and within individual groups. Bi-level selection is crucial for models that require identification of relevant groups of variables, as well as the selection of significant variables within those groups. Several recent works related to the group bridge penalty include [2,12], and references therein. Further advancing this area, Cai et al. [13] proposed an adaptive bi-level variable selection method for analyzing multivariate failure time data using the Cox proportional hazards model, showcasing the versatility and applicability of bi-level selection methods in survival data analysis. Buch et al. [14] demonstrated that bi-level selection methods offer enhanced model interpretability over traditional approaches like LASSO, particularly by effectively addressing the complex interactions present in grouped data, such as those encountered in omics research. More recently, Buch et al. [15] further highlighted the potential of bi-level methods to improve the flexibility and precision of variable selection, enabling more refined control over both group-level and individual-level sparsity. A thorough examination of the theoretical underpinnings and practical implications of bi-level variable selection techniques can be found in [16].
The existing bi-level variable selection approaches exhibit sensitivity to the tails of the unobservable error distribution. Additionally, when heterogeneity is present in response data, sparse estimators derived from least squares methods may yield inefficient results. Quantile regression (QR) emerges as a robust alternative to classical mean regression, offering a comprehensive view of the entire response distribution, while being resistant to heterogeneity, see [17,18,19]. Within the framework of QR, various regularization methods have been developed to identify significant group structures and individual variables in covariate data with an inherent group structure. Notable contributions include the works of Ciuperca ([20,21]), among others. Regarding bi-level variable selection, Ahn and Kim [2] investigated group bridge and adaptive group bridge penalties for competing risk QR with a diverging number of group variables, thereby facilitating bi-level variable selection in this specific context. Similarly, Shi and Wilke [22] employed a flexible-yet-dependent competing risks QR model to elucidate the relationships between early and late retirement transitions and various informative registers. Despite the advantages of the aforementioned approaches, there is a paucity of theoretical and computational aspects for bi-level variable selection in QR models in the presence of a diverging number of parameters. This highlights the necessity for further exploration and development of robust bi-level variable selection methodologies to effectively manage the complexities of QR models in such contexts. In addition, the frequent occurrence of heterogeneous group-structured data in medical and health-related research underscores the need for effective methods to identify both relevant groups and key individual variables within those groups [14,15]. 
Addressing these challenges was the primary motivation behind this study, as accurate group and within-group selection is crucial for improving the model interpretability and predictive performance for complex health data [16].
In this paper, we introduce an adaptive group bridge methodology for bi-level variable selection in QR models with a diverging number of parameters. The proposed bi-level variable selection approach utilizes an adaptive penalty function to simultaneously identify group structures and individual variables within the groups. The proposed estimation procedure offers several notable advantages: it effectively handles the heterogeneity and skewness often encountered in regression analysis; it is scalable, making it suitable for models with an increasing number of parameters as the sample size grows; it outperforms traditional group bridge methods by efficiently identifying key within-group variables, leveraging the flexibility of the adaptive penalty function; and it achieves the oracle group selection property, ensuring accurate identification of relevant variables at both the group and within-group levels with high probability. Additionally, an iterative optimization algorithm is presented to address the computational challenges posed by the non-differentiable check loss function and nonconvex penalty function, thereby improving both the computational efficiency and practical applicability.
The remainder of this paper is structured as follows: In Section 2, the proposed adaptive group bridge quantile methods are introduced, including a comprehensive description of the computational algorithm and the selection criteria for tuning parameters. The asymptotic properties of the proposed sparse estimation procedure are rigorously developed in Section 3. Section 4 presents simulation studies and an application to real data, providing empirical validation of the proposed methods. A concise discussion is provided in Section 5. All technical derivations and proofs are relegated to the Appendix A.
2. Methods
2.1. Adaptive Bi-Level Variable Selection in QR Models
Consider a sample of size n from some unknown population, where is the response of interest, and is the covariate or prediction vector. We focus on a family of linear QR models, which can be expressed as
where is an unknown coefficient vector, and is an independent random error variable with a -th quantile equal to zero.
Suppose the prediction variables can be divided into J groups. Let be subsets of representing known groupings of the design covariates. For any subset , let denote the -dimensional sub-vector of indexed by A, where denotes the cardinality of A. At a prefixed quantile level , the j-th group is denoted by . Additionally, a quantile level will be omitted from various expressions, such as and , whenever this is clear from the context.
To gain a comprehensive understanding of the relationship between the response variable and its predictors, it is crucial to not only select important groups of variables but also to identify significant individual members within these groups across different quantile levels. This process, known as bi-level selection, ensures a more detailed and accurate representation of the underlying data structure. To this end, we here introduce an adaptive group bridge quantile estimator , which is a minimizer of the adaptive group bridge penalized quantile loss function , i.e.,
where is the check loss function, with being the indicator function; is a tuning parameter that controls the penalty level; are constants for adjusting different dimensions of ; is a consistent estimator of its true counterpart ; v is a non-negative constant representing the penalty level of individuals within group ; and .
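In conventional notation, a formulation consistent with the description above is the following (a sketch using standard QR symbols; the particular form of the adaptive weights and exponents shown here are common choices rather than the paper's exact expressions):

```latex
\rho_\tau(u) = u\,\{\tau - I(u < 0)\},
\qquad
Q_n(\boldsymbol\beta)
  = \sum_{i=1}^{n} \rho_\tau\!\bigl(Y_i - \mathbf{X}_i^{\top}\boldsymbol\beta\bigr)
  + \lambda_n \sum_{j=1}^{J} c_j
    \Bigl(\sum_{k \in G_j} |\tilde\beta_k|^{-v}\,|\beta_k|\Bigr)^{\gamma},
\qquad 0 < \gamma \le 1,
```

where $\tilde{\boldsymbol\beta}$ denotes a consistent initial estimator. Under this form, $v = 0$ recovers the (non-adaptive) group bridge penalty and $\gamma = 1$ the adaptive group lasso, matching the special cases discussed below.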
In general, groups are allowed to overlap, and their union may be a proper subset of the entire set of variables, ensuring that variables not included in are not subject to penalization. When , the penalty term corresponds to the group bridge penalty, whereas for , it becomes the adaptive group bridge penalty. The adaptive group bridge estimator reduces to individual variable selection when for all j, and it is equivalent to the adaptive hierarchical lasso penalty developed by [23] when and for all j.
The objective function demonstrates considerable flexibility in the variable selection regime. Specifically, when , it simplifies to the adaptive group lasso selection method for QR models in [20,24]. Furthermore, when and for , it reduces to the adaptive lasso objective function. When , it aligns with the group bridge variable selection method in QR models, analogous to that of mean regression models studied by [11]. Additionally, when and , utilizing a conditional hazard loss function, the estimation procedure transitions into an adaptive bi-level variable selection methodology for multivariate failure time models (Cai et al. [13]).
In what follows, we will show that the designed objective function (2) can be employed for bi-level variable selection in QR models. To see this, for , define
where is a penalty parameter, and with given by
Proposition 1.
Assume that . If , then minimizes if and only if minimizes , where and for .
This proposition is analogous to the characterization of the component selection and smoothing method of [25]. Examining the form of defined in (3), we observe that minimizing with respect to yields sparse solutions at both the group and individual variable levels. Specifically, the penalty is an adaptively weighted penalty, resulting in sparsity in . Moreover, for , small values force , leading to group selection. Following a similar approach to that outlined in [11], the validity of Proposition 1 can be rigorously verified. The detailed derivations are omitted here for brevity.
2.2. Algorithm
Since the check loss function is non-differentiable and the adaptive group bridge penalty is not a convex function for , minimizing the objective function in (2) with respect to poses a significant challenge. To address this difficulty, we propose an iterative optimization algorithm, as follows:
Step 1. At a given quantile level , obtain an initial estimator of in model (1), i.e.,
Step 2. Compute
Step 3. Compute
Step 4. Repeat Steps 2 and 3 until convergence. For instance, can be chosen as the convergence rule.
The proposed algorithm is guaranteed to converge, since it monotonically decreases the non-negative objective function (2) at each iteration. The minimization in Step 3 can be efficiently implemented by directly applying the adaptive LASSO penalized QR method of [26]. Generally, this algorithm converges to a local minimizer, depending on the initial value , due to the non-convex nature of the adaptive group bridge penalty. The flexibility and robustness of this iterative approach make it well-suited to bi-level variable selection problems in QR settings.
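Steps 1–4 can be sketched as follows. The sketch uses a local linear approximation of the bridge penalty, so that Step 3 reduces to a weighted-L1 quantile regression, solved here via its standard linear-programming formulation. The specific weight-update formula, the small ridge-in constant `1e-8`, and the tolerance are illustrative assumptions, not the paper's exact expressions:

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_qr(X, y, tau, w):
    """Weighted-L1 penalized quantile regression via its LP formulation.

    Solves  min_b  sum_i rho_tau(y_i - x_i'b) + sum_k w_k |b_k|
    by splitting b = b+ - b- and residuals r = u+ - u-, all parts >= 0.
    """
    n, p = X.shape
    # variable order: [b+, b-, u+, u-]
    c = np.concatenate([w, w, tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    return z[:p] - z[p:2 * p]

def adaptive_group_bridge_qr(X, y, tau, groups, lam, gamma=0.5, v=1.0,
                             n_iter=50, tol=1e-6):
    """Iterative algorithm: initial fit, group-weight update, weighted-L1 QR."""
    n, p = X.shape
    beta = weighted_l1_qr(X, y, tau, np.zeros(p))   # Step 1: unpenalized initial fit
    w_ad = 1.0 / (np.abs(beta) ** v + 1e-8)          # adaptive weights from initial fit
    for _ in range(n_iter):
        w = np.empty(p)
        for g in groups:                             # Step 2: group-level LLA weights
            norm_g = np.sum(w_ad[g] * np.abs(beta[g])) + 1e-8
            w[g] = lam * gamma * norm_g ** (gamma - 1) * w_ad[g]
        beta_new = weighted_l1_qr(X, y, tau, w)      # Step 3: weighted-L1 QR subproblem
        if np.max(np.abs(beta_new - beta)) < tol:    # Step 4: convergence check
            beta = beta_new
            break
        beta = beta_new
    return beta
```

Because the bridge penalty is concave in the group norm, the linear approximation assigns large weights to groups whose current adaptive L1 norm is small, which drives whole groups to zero while the adaptive weights remove individual zero coefficients within retained groups.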
3. Theoretical Properties
The proposed adaptive bi-level variable selection method for QR relies on adaptive weights , where and . This implies that an important variable in the j-th group will receive a smaller penalty, whereas less significant variables will incur larger penalties. Consequently, the initial estimator in (4) must be a consistent estimator of the true parameter in the QR model (1) with a diverging number of parameters. To ensure this consistency, we need the following technical conditions:
- (A1)
- Random error terms are independently and identically distributed with the -th quantile equal to zero and possess a continuous, positive density in a neighborhood of zero. The distribution functions are absolutely continuous with .
- (A2)
- Let and denote the largest and smallest eigenvalues of a positive definite matrix M, respectively. There are two positive constants and , such that
- (A3)
- as .
- (A4)
- The dimension satisfies , where .
Condition (A1) is standard in QR models (e.g., [26,27]). Condition (A2) ensures that the design matrix of the true model at the sample level is well-behaved. Condition (A3) is required for high-dimensional QR models (see [21,28]). Condition (A4) permits the number of parameters to diverge as the sample size n increases.
Theorem 1.
Suppose that Conditions (A1)–(A4) hold. Then, we have .
Theorem 1 indicates that the initial estimator is a consistent estimator of its true counterpart in the QR model (1). This consistency is crucial for the effectiveness of the adaptive bi-level variable selection method, ensuring reliable and accurate estimation in the presence of a diverging number of parameters.
Next, we present the asymptotic properties of . In particular, two scenarios are considered: (i) and , i.e., the group bridge penalty; (ii) and , i.e., the adaptive group bridge penalty. Specifically, we show that the group bridge estimators correctly select groups of nonzero coefficients with probability converging to one, while the adaptive group bridge estimator correctly identifies nonzero variables at both the group and within-group levels with probability approaching one. Moreover, the asymptotic distributions of nonzero components of these two penalized estimators are derived under different conditions.
Without loss of generality, we define and , such that and , where for and for . Write and as the true values of with indices belonging to and , respectively. Additionally, to distinguish the individual memberships between nonzero ’s and zero ’s , we define and such that if and if .
We first study the oracle property of the group bridge estimator at the given quantile level . For any vector , denote its norm by . Let . We assume that
- (A5)
- is bounded and , where the constants satisfy and as .
- (A6)
- For fixed unknowns ,
Conditions (A5) and (A6) control the tuning parameter , the number of variables within each group, and the magnitude of the true parameters in nonzero groups. Condition (A5) is a simplified version of Assumption 3 in [11] based on Conditions (A2) and (A4). Together with Condition (A2) and in Condition (A6), we have . If we further assume , Condition (A5) still holds with .
The following theorem provides the large sample theory of the group bridge estimator at the given quantile level .
Theorem 2.
Assume in (2). Suppose that Conditions (A1)–(A6) hold. Then, we have
- (i)
- Consistency: .
- (ii)
- Group variable selection consistency: .
- (iii)
- Asymptotic distribution: for fixed unknowns , where , with W following , the leading submatrix of Υ with , and Υ satisfying as . In particular, if , then , where denotes convergence in distribution.
Theorem 2 demonstrates the asymptotic oracle property in group selection. Moreover, the estimator of coefficients in non-zero groups is -consistent and, in general, converges to the argmin of the Gaussian process .
From Theorem 2, we see that the group bridge method can consistently select nonzero group variables but may not effectively remove all unimportant variables within these groups. However, this issue can be addressed by using in (2), i.e., the adaptive group bridge penalty, which can consistently eliminate zero individual variables within nonzero groups by setting the corresponding weights to be large. To establish the oracle property of the adaptive group bridge, we need the following conditions:
- (A7)
- For some and such that , , , , , and
- (A8)
- , , .
Conditions (A7) and (A8) allow to diverge as . Condition (A7) controls the number of nonzero parameters and also represents the minimal signal strength condition, requiring that the smallest magnitude of nonzero parameters diminishes to zero at a rate slower than . Condition (A8) restricts and v as to prove the oracle property. Specifically, Condition (A8) implies that v and satisfy and , given . If , the third part of Condition (A8) becomes .
The following theorem establishes the oracle property of the adaptive group bridge quantile estimator at a given quantile level .
Theorem 3.
Assume in (2). Suppose that Conditions (A1)–(A4), (A7), and (A8) hold. Then, we have
- (i)
- Consistency: .
- (ii)
- Bi-level variable selection consistency: .
- (iii)
- Asymptotic distribution: for fixed unknowns , where is the leading submatrix of Υ with .
Theorem 3 demonstrates the oracle property of the adaptive group bridge estimator. The first part of Theorem 3 presents the convergence rate of this estimator. The second part shows that the adaptive group bridge consistently identifies not only important groups but also significant within-group variables. Moreover, the third part indicates that when the number of nonzero variables is fixed, the asymptotic distributions of the adaptive group bridge estimators are asymptotically equivalent to those obtained from the truly underlying variables at both the group and within-group levels.
The tuning parameter defined in (2) controls the trade-off between the goodness of fit and the model complexity. It is of great importance to select an optimal to achieve bi-level variable selection consistency in QR models. To this end, we here employ the following BIC-type criterion proposed by [29]:
where is the number of nonzero estimates given . Given a range of the tuning parameter values, the optimal tuning parameter is selected as the minimizer of .
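The criterion can be computed as below. The displayed formula uses one common BIC form from the penalized-QR literature (log of the summed check loss plus a degrees-of-freedom term); the scaling constant `c_n` is an assumption and may differ from the exact criterion of [29]:

```python
import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - I(u < 0))
    return u * (tau - (u < 0))

def qr_bic(X, y, beta_hat, tau, c_n=1.0):
    """BIC-type criterion for penalized QR (a common form; c_n is assumed)."""
    n = len(y)
    loss = np.sum(check_loss(y - X @ beta_hat, tau))
    df = np.count_nonzero(beta_hat)   # number of nonzero estimates
    return np.log(loss) + df * np.log(n) / (2 * n) * c_n
```

Given a grid of candidate tuning parameters, one fits the model at each value and retains the fit whose estimated coefficient vector minimizes `qr_bic`, trading goodness of fit against model complexity.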
4. Numerical Studies
4.1. Simulation Studies
We employed simulation studies to evaluate the finite-sample performance of the adaptive bi-level variable selection in QR models. Two scenarios were considered in our simulations. In Experiment 1, the group sizes were uniform, each comprising the same number of covariates. Conversely, in Experiment 2, the group sizes varied. Notably, both experiments included scenarios where some groups contained a mix of zero and nonzero coefficients. This setup allowed for the examination of the effectiveness of the proposed variable selection methods under different structural complexities and varying degrees of sparsity within the groups. The sample size was in each example.
Experiment 1. In this experiment, there were eight groups, each consisting of five covariates. The covariate vector was , where for . To generate , we first generated 40 independent random variables from the standard normal distribution. Then, () were simulated from a multivariate normal distribution with mean zero and . Thus, the covariates were simulated as
Finally, the response variable Y was simulated as
where is the conditional -quantile of e, and e is generated from . We considered two cases for : (i) ; and (ii) , which was used to investigate the effect of heteroscedasticity. Additionally, two different types of coefficient vectors were considered:
- (a)
- , , , .
- (b)
- , , , , .
Under the above settings, scenario (a) assumed that all coefficients within each group were either all nonzero or all zero. This scenario was designed to evaluate the finite sample performance of the proposed bi-level variable selection method at the group level, in comparison to several well-known variable selection methods for QR models, which operate at either the individual or group level. In scenario (b), however, some coefficients within a nonzero group, such as groups 3 and 4, were equal to zero. This setting was specifically designed to assess the performance of bi-level variable selection, as traditional methods that focus solely on individual or group-level selection may produce suboptimal results in such cases.
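A sketch of the Experiment 1 data-generating process is given below. The shared-factor construction used to induce within-group correlation and the particular coefficient values are illustrative assumptions standing in for the experiment's exact formulas:

```python
import numpy as np

def simulate_experiment1(n=200, tau=0.5, sigma=1.0, seed=0):
    """Sketch of Experiment 1: 8 groups of 5 covariates with within-group
    correlation, and an error whose tau-th quantile is zero.
    The factor-mixing construction is a hypothetical stand-in."""
    rng = np.random.default_rng(seed)
    J, pj = 8, 5
    Z = rng.standard_normal((n, J * pj))   # independent latent variables
    U = rng.standard_normal((n, J))        # one shared factor per group
    X = np.empty((n, J * pj))
    for j in range(J):
        cols = slice(j * pj, (j + 1) * pj)
        # correlation 0.5 within each group, unit variance
        X[:, cols] = (Z[:, cols] + U[:, [j]]) / np.sqrt(2.0)
    # scenario (a)-style coefficients: some all-nonzero groups, some all-zero
    beta = np.zeros(J * pj)
    beta[0:5] = 1.0
    beta[5:10] = 0.5
    e = sigma * rng.standard_normal(n)
    e -= np.quantile(e, tau)               # center error at its tau-th quantile
    y = X @ beta + e
    return X, y, beta
```

Shifting the error by its empirical tau-th quantile enforces the model identification condition that the conditional tau-th quantile of the error is zero; the heteroscedastic case replaces the constant `sigma` with a covariate-dependent scale.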
Experiment 2. In this experiment, the group sizes varied across groups. Specifically, there were four groups with size five and four groups with size three. The covariate vector was , where for , and for . To generate , we first generated 32 random variables from . Then, () were generated from a multivariate normal distribution with mean zero and . The covariates were then simulated as
Finally, the response variable y was simulated as in (5). Additionally, we here considered two different group structures of coefficients in model (5), as follows:
- (a)
- , , , , .
- (b)
- , , , , , , , .
These settings reflected the different structural characteristics in the coefficient vectors. In scenario (a), some groups had all zero coefficients, while others had nonzero coefficients, with the presence of zero coefficients within nonzero groups in scenario (b).
The proposed estimation procedures were applied to the simulated model (5) at the quantile levels , and , respectively. For each of the six combinations of quantile levels and parameter settings, 1000 datasets were independently generated, following the data generation processes outlined in Experiments 1 and 2. For each dataset, the adaptive group bridge quantile estimator with and , denoted by GB, and the adaptive group bridge quantile estimator with and , denoted by AGB, were computed. In addition, we evaluated the mean square error (MSE), which was calculated by , where is the estimator of evaluated on the ith dataset for a given .
The performance of the proposed AGB methods was compared with two existing variable selection techniques. The first technique was the smoothly clipped absolute deviation (SCAD) method, as developed by [27]. This method integrates the SCAD penalty into the QR loss function to achieve effective individual variable selection. The second technique was the adaptive group lasso quantile estimator (AGL), as described in [20]. The AGL method employs an adaptive lasso penalty at the group level within the QR framework to improve the identification of significant covariate groups.
The results for 1000 repetitions in each of the six cases are reported in Table 1 and Table 2, which correspond to the scenarios of homoscedasticity (i.e., ) and heteroscedasticity (i.e., ), respectively. In the tables, the notations “NG” and “NV” denote the average number of groups and individual variables selected by each variable selection method, respectively. The notations “%CG” and “%CI” represent the proportions that the corresponding variable selection method correctly identified as nonzero group variables and nonzero individual variables for the underlying model, respectively. “MSER” denotes the ratio of the median MSE of each variable selection method to that of the oracle estimator. The oracle values for these measures are also listed in Table 1 and Table 2. Clearly, the closer a method’s result is to the oracle value, the better its performance.
Table 1.
Simulation results for Experiment 1.
Table 2.
Simulation results for Experiment 2.
From Table 1 and Table 2, several key insights can be derived: (1) In scenario (a) of Experiment 1, designed for group variable selection, the proposed AGB and GB methods were comparable to AGL when all individual variables within each group were nonzero. Both methods effectively identified the true nonzero and zero groups, with mean group sizes closely approximating the actual number of true groups, indicating that the proposed bi-level variable selection procedures performed robustly at the group level; (2) When zero coefficients were present within a nonzero group, as observed in scenario (b) of Experiment 1 and in both scenarios (a) and (b) of Experiment 2, the AGB method outperformed AGL in correctly identifying the magnitudes of nonzero variables. Additionally, AGB surpassed GB in accurately identifying nonzero individual variables, demonstrating the effectiveness of the proposed methods in individual-level variable selection; (3) The SCAD method exhibited poor performance across all considered settings, because SCAD is primarily designed for individual variable selection and does not exploit group structure information; (4) When sparsity existed at both the group and within-group levels, the AGB method was superior to the other competitors listed in Table 1 and Table 2 in terms of estimation and variable selection performance in most cases, even when heteroscedasticity was present in the response data; (5) The proposed methods were resistant to variations in the number of individuals within a group, as evidenced by the results from the two experiments, underscoring the versatility and reliability of the bi-level variable selection approach.
Overall, the findings reveal that the adaptive group bridge method for QR was capable of achieving both group and within-group variable selection, even in the presence of heterogeneity and variation in the number of individuals within groups, and competed effectively with a number of existing variable selection methods.
4.2. An Example
In this section, the Birthwt dataset, collected at Baystate Medical Center, Springfield, Massachusetts, in 1986, was utilized to illustrate the effectiveness of the proposed methods. The Birthwt dataset consists of 189 observations, with 16 predictors and an outcome variable, birth weight, which is available both as a continuous measure and as a binary indicator for low birth weight. In this analysis, the birth weight in kilograms was taken as the response variable Y, while the other 16 variables served as covariates. These covariates were divided into eight groups, as outlined below:
- age1 (), age2 (), age3 (): Orthogonal polynomials of the first, second, and third degree, representing the mother’s age in years.
- lwt1 (), lwt2 (), lwt3 (): Orthogonal polynomials of the first, second, and third degree, representing the mother’s weight in pounds at the last menstrual period.
- white (), black (): Indicator variables for the mother’s race; “other” serves as the reference group.
- smoke (): Smoking status during pregnancy.
- ptl1 (), ptl2 (): Indicator variables for one or for two or more previous premature labors, respectively. No previous premature labor serves as the reference category.
- ht (): History of hypertension.
- ui (): Presence of uterine irritability.
- ftv1 (), ftv2 (), ftv3 (): Indicator variables for one, for two, or for three or more physician visits during the first trimester, respectively. No visits serves as the reference category.
The primary objective of this study was to investigate whether Y was related to the covariates . To achieve this, a QR model was employed to fit the dataset. Specifically, the -th conditional quantile of was assumed to be
where is the -th quantile of Y given . Under this assumption, the group structures were defined as follows: , , , , , , , and . The group settings for and were designed to examine whether age and mother’s weight have linear or nonlinear effects on birth weight, respectively. The same rationale applies to the group settings for , and . Three different quantile levels, , , and , were considered. As in the simulation studies, the adaptive group lasso (AGL) estimator and the group bridge estimator proposed by [11] (denoted as GB-LS) were computed for comparison.
For each specified quantile level , the GB and AGB estimators of the coefficient vector were computed. To assess model performance, the mean absolute prediction error (APE) was defined by
where represents the predicted value based on the estimated coefficients .
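Assuming the usual definition of mean absolute prediction error, the computation reduces to a one-liner:

```python
import numpy as np

def mean_abs_prediction_error(X, y, beta_hat):
    # APE = (1/n) * sum_i | y_i - x_i' beta_hat |
    return np.mean(np.abs(y - X @ beta_hat))
```

A smaller APE indicates better predictive accuracy of the fitted quantile model on the observed responses.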
Table 3 presents the point estimates of the parameters and the APE values for the four different methods. An analysis of Table 3 reveals several key findings: (i) While the GB-LS method focused on variable selection based on their effects on the mean, the other methods selected different variables across various quantiles. For example, at the lower quantile (), the AGB and GB methods selected ht and ui as significant variables, whereas these variables became less significant at higher quantiles. This suggests that ht and ui had a greater impact on lower birth weights, but their influence diminished for higher birth weights; (ii) The AGL method exclusively performed selection at the group level, while the GB and AGB methods were capable of identifying variables at both the group and individual levels. For instance, at , AGL treated the group of physician visits as insignificant, whereas AGB selected only ftv1 as insignificant, demonstrating a more precise selection at the individual level; (iii) When comparing APE values, the AGB and GB methods often yielded similar results. However, AGB typically produced sparser models. For instance, at , AGB selected fewer variables than GB, while maintaining comparable APE values, indicating the efficiency of the AGB method in balancing model sparsity and predictive accuracy; (iv) The AGL method occasionally failed to identify relevant groups at certain quantile levels. For example, at , AGL did not select the physician visit group (ftv1, ftv2, ftv3), whereas AGB identified ftv1 and ftv3 as significant. This highlighted AGB’s ability to adapt across different quantiles and capture the varying effects of covariates throughout the birth weight distribution; (v) The age group had a significant impact on birth weight across all three quantile levels considered. Specifically, we observed no linear relationship between age and birth weight, as the coefficient of was zero.
However, and emerged as important predictors, indicating a significant nonlinear effect on birth weight. A similar pattern was observed within groups , , and , suggesting complex, non-linear dynamics in their influence on birth weight. Overall, these findings underscored the effectiveness and flexibility of the proposed AGB method for handling the bi-level variable selection problem in QR models, particularly when considering the effects of differing quantile levels.
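The unpenalized quantile fits underlying such an analysis can be reproduced with any QR solver. As a minimal, self-contained sketch (illustrative only; it implements plain check-loss minimization, not the paper's penalized AGB estimator, and the function name is our own), the QR problem at a given level can be solved exactly as a linear program:

```python
import numpy as np
from scipy.optimize import linprog

def fit_quantile(X, y, tau):
    """Fit linear quantile regression at level tau by solving the
    standard LP reformulation: split residuals into positive and
    negative parts u+, u- and minimize tau*u+ + (1-tau)*u-."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # X beta + u+ - u- = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# sanity check: with an intercept-only design and tau = 0.5,
# the fitted coefficient is the sample median
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
X = np.ones((5, 1))
beta = fit_quantile(X, y, tau=0.5)   # beta[0] should equal 3.0
```

Refitting at several tau values, as in Table 3, then amounts to repeated calls with different levels; the penalized AGB fit additionally requires the bi-level penalty described in the methodology.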
Table 3.
Results of the real data analysis.
5. Conclusions and Discussion
In this paper, we introduced an adaptive bi-level variable selection method for QR models with a diverging number of covariates. The method employs an adaptive penalty that enables simultaneous selection at both the group and individual levels, addressing challenges related to sparsity, heterogeneity, and skewness in data. Through rigorous theoretical analysis, we established the asymptotic properties of the proposed estimators, confirming their consistency and efficiency. The simulation studies demonstrated the superior performance of the proposed method, particularly in scenarios where both group-level and within-group sparsity existed. The adaptive bi-level method consistently outperformed traditional variable selection techniques in terms of selecting the correct groups and identifying the most relevant individual variables within those groups. Additionally, the real data application from the Birthwt dataset further validated the method’s practical utility. It effectively identified key covariates influencing birth weight at different quantiles, offering improved interpretability and predictive accuracy across various quantile levels. Overall, the findings suggest that the adaptive bi-level method is a robust and flexible approach to variable selection in complex, high-dimensional QR models.
Throughout this study, the dimension of parameters in QR models was assumed to grow with the sample size. However, extending the proposed procedure to high-dimensional settings, where the number of covariates exceeds the sample size, is of significant interest. In such cases, further investigation into the bi-level variable selection procedure, in terms of both theory and optimization, would be necessary for high-dimensional QR models with grouped variables. In addition, the convolution-type smoothing techniques of [30,31] could be employed to achieve bi-level variable selection in high-dimensional QR models under high dimensionality. These interesting extensions are beyond the scope of the present paper, and are left for future research.
Author Contributions
Conceptualization, X.D. and Z.Y.; methodology, X.D. and Z.Y.; validation, X.D.; formal analysis, Z.Y.; investigation, X.D.; writing—original draft, Z.Y.; writing—review and editing, X.D. and Z.Y.; supervision, Z.Y.; project administration, Z.Y.; funding acquisition, X.D. and Z.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Yunnan Fundamental Research Projects (Grant Number 202401AS070152), the National Natural Science Foundation of China (Grant Number 12001244), and the Major Basic Research Project of the Natural Science Foundation of the Jiangsu Higher Education Institutions (Grant Number 19KJB110007).
Data Availability Statement
The real data that are used to illustrate the proposed methods are available at https://search.r-project.org/CRAN/refmans/fic/html/birthwt.html (accessed on 16 October 2024).
Acknowledgments
The authors wish to thank the Editor-in-Chief, the Associate Editor and three reviewers for their many helpful and insightful comments and suggestions that greatly improved the paper.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Proof of Theorem 1.
Let . It is required to show that for all , there exists a constant sufficiently large such that, for sufficiently large n,
To achieve this, for some constant , consider the expectation of the difference:
This expectation can be rewritten as
where denotes the cumulative distribution function of the errors .
Given condition (A4), . Additionally, under condition (A3), as . Using the mean value theorem and the fact that the density f has a bounded first derivative in the neighborhood of 0, it follows that
From condition (A2), it holds that
Next, define the random variable , and the random vector , where . Then, the process can be expressed as
Given that , together with condition (A3), it follows that
which implies
Using conditions (A1)–(A3), it follows that
Defining the random variable , it follows that . This, along with , implies by the Bienaymé–Tchebychev inequality that as . Consequently, . Therefore, Equation (A3) can be expressed as
This implies that
Using the central limit theorem, converges in distribution to a centered Gaussian distribution, since and with .
Taking into account conditions (A2) and (A4), for a sufficiently large constant C, it follows that
for sufficiently large n. Therefore, inequality (A1) is satisfied, considering conditions (A1) and (A2). □
Proof of Theorem 2.
We begin by establishing part (i) of Theorem 2. Let and define . It suffices to show that, for any , there exists a sufficiently large C such that
Considering that
where
We apply Knight’s identity (Knight and Fu [32]) for any two scalars w and v, yielding
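Knight's identity, in its standard scalar form rho_tau(w - v) - rho_tau(w) = -v(tau - 1{w < 0}) + ∫_0^v [1{w <= s} - 1{w <= 0}] ds, can be checked numerically. The sketch below verifies it on random inputs, approximating the integral by a midpoint Riemann sum (the grid size is an arbitrary choice):

```python
import numpy as np

def rho(w, tau):
    """Quantile check loss: rho_tau(w) = w * (tau - 1{w < 0})."""
    return w * (tau - (w < 0))

def knight_rhs(w, v, tau, n_grid=200_000):
    """Right-hand side of Knight's identity; the integral over [0, v]
    is approximated by a signed midpoint Riemann sum."""
    ds = v / n_grid
    s = (np.arange(n_grid) + 0.5) * ds          # midpoints of the grid
    integral = np.sum(((w <= s).astype(float) - float(w <= 0)) * ds)
    return -v * (tau - (w < 0)) + integral

rng = np.random.default_rng(0)
for _ in range(100):
    w, v = rng.uniform(-2, 2, size=2)
    tau = rng.uniform(0.05, 0.95)
    lhs = rho(w - v, tau) - rho(w, tau)
    assert abs(lhs - knight_rhs(w, v, tau)) < 1e-4
print("identity holds on 100 random (w, v, tau) triples")
```

The identity is useful precisely because it splits the loss difference into a linear term in v (driving the asymptotic normality) and a nonnegative integral remainder (driving the quadratic approximation).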
The difference between and can thus be expressed as
Given that and , we obtain
Thus, is of order . For , using the proof of Theorem 1, it holds that
This implies that is of order . Therefore, by choosing C sufficiently large, dominates uniformly for .
For the lower bound of , consider the case where . Since for , it follows that
where and is defined in condition (A5). Therefore, it holds that
Since is of order , the first term on the right-hand side of (A5) dominates the third term uniformly for when C is sufficiently large. This completes the proof of part (i) of Theorem 2.
Next, we establish the group selection consistency. Using Theorem 2-(i), belongs, with a probability converging to one, to the set for sufficiently large . For any with and for all constants , we show that
with a probability tending to one as .
Consider the parameter set . We demonstrate that as . Let and such that and .
Using the definition of , it holds that
This leads to
For , since the density f is bounded in a neighborhood of 0, it holds that
Given condition (A2), it follows that . By analogous calculations, using the independence of , we have as . Since , applying the Bienaymé–Tchebychev inequality yields
For , we rewrite as
Using conditions (A1)–(A3) and the boundedness of , we have
Similarly, we can show that . Using the Bienaymé–Tchebychev inequality, it follows that
Consequently, we obtain
Thus, it follows that
For the lower bound of , by (A5), we have
If , then
Hence, . Since , from condition (A5), we conclude that
Finally, we establish the asymptotic distribution of . Since are fixed, , so that condition (A6) implies condition (A5), and
Therefore, the proof of Theorem 2-(i) applies with the reduced and the reduced number of coefficients . Thus,
Let and , where is a zero vector of dimension , and is a -dimensional constant vector. Using part (i) of Theorem 2, with probability approaching one, .
On the other hand, can be rewritten as
Following the arguments of [26], , where denotes convergence in distribution. According to [11], we have
Therefore, . Since , using the argmin continuous mapping theorem (Kim and Pollard [33]), , which completes the proof. □
Proof of Theorem 3.
We first establish the consistency of . Let and . It is sufficient to show that for every , there exists such that
Since
where
We can utilize Knight’s identity (Knight and Fu [32]) to rewrite the difference between and as
Since and , it follows that
Thus, we have . For , from the proof of Theorem 1, it holds that
It follows that is of order . By choosing a sufficiently large C, dominates uniformly in .
Next, can be re-expressed as
Consider first. We analyze two cases: Case when for all and , and Case where at least one for some and some .
In Case , assume that for all and all . Noting that for , and using condition (A7), it follows that with for . Thus, for sufficiently large n such that for all ,
Since for , and given conditions (A7) and (A8), this term is dominated by .
In Case , where at least one for some and some , consider the term of ,
where the last equality holds due to with . Since by condition (A8), dominates and .
Next, consider
We conclude that
Since converges in probability to the non-zero for , and , . Meanwhile, for , so is at least . Therefore, dominates as . Hence, for sufficiently large n,
By for , for sufficiently large n we obtain
Using similar arguments to those in (A10), dominates . Consider ,
where , and the last equality holds due to with . Since , dominates and . Thus, by (A9) and (A11), if at least one , dominates and for sufficiently large n.
Combining (A9) and (A11) with (A8), for sufficiently large n, it holds that
which completes the proof of Theorem 3(i).
Next, we demonstrate variable selection consistency. Let , where C is a constant and . Define
It suffices to show that for sufficiently large n,
Following the argument of the consistency proof for the adaptive group bridge estimator, we can verify that and are of order .
Consider :
Since for , and , using condition (A8), dominates and . Similarly to (A9), also dominates and .
Therefore, for sufficiently large n,
which proves the individual variable selection consistency.
Finally, we show the asymptotics of , where . Note that the proof of Theorem 3(i) still works with the reduced and reduced number of coefficients . Thus,
Let and , where is a zero vector of dimension and is a -dimensional constant vector. From the consistency of , with probability approaching one, , with .
Using (A4), can be rewritten as
Following the arguments of [26], , where denotes convergence in distribution and . Similarly to [11], it follows from condition (A8) that . Therefore, . Using the epi-convergence results of [34], . This completes the proof. □
References
- Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009.
- Ahn, K.W.; Kim, S. Variable selection with group structure in competing risks quantile regression. Stat. Med. 2018, 37, 1577–1586.
- Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 2006, 68, 49–67.
- Goeman, J.; Bühlmann, P. Analyzing gene expression data in terms of gene sets: Methodological issues. Bioinformatics 2007, 23, 980–987.
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
- Frank, I.E.; Friedman, J.H. A statistical view of some chemometrics regression tools. Technometrics 1993, 35, 109–135.
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
- Zhang, C. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
- Meier, L.; Van De Geer, S.; Bühlmann, P. The group lasso for logistic regression. J. R. Stat. Soc. Ser. B 2008, 70, 53–71.
- Zhao, W.; Zhang, R.; Liu, J. Sparse group variable selection based on quantile hierarchical lasso. J. Appl. Stat. 2014, 41, 1658–1677.
- Huang, J.; Ma, S.; Xie, H. A group bridge approach for variable selection. Biometrika 2009, 96, 339–355.
- Huang, J.; Li, L.; Liu, Y.; Zhao, X. Group selection in the Cox model with a diverging number of covariates. Stat. Sin. 2014, 24, 1787–1810.
- Cai, K.; Shen, H.; Lu, X. Adaptive bi-level variable selection for multivariate failure time model with a diverging number of covariates. Test 2022, 31, 968–993.
- Buch, G.; Schulz, A.; Schmidtmann, I.; Strauch, K.; Wild, P.S. Interpretability of bi-level variable selection methods. Biom. J. 2024, 66, 2300063.
- Buch, G.; Schulz, A.; Schmidtmann, I.; Strauch, K.; Wild, P.S. Sparse group penalties for bi-level variable selection. Biom. J. 2024, 66, 2200334.
- Buch, G.; Schulz, A.; Schmidtmann, I.; Strauch, K.; Wild, P.S. A systematic review and evaluation of statistical methods for group variable selection. Stat. Med. 2023, 42, 331–352.
- Dai, D.; Tang, A.; Ye, J. High-dimensional variable selection for quantile regression based on variational Bayesian method. Mathematics 2023, 11, 2232.
- Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005.
- Li, Y.; Zhu, J. L1-norm quantile regression. J. Comput. Graph. Stat. 2008, 17, 163–185.
- Ciuperca, G. Adaptive group LASSO selection in quantile models. Stat. Pap. 2019, 60, 173–197.
- Ciuperca, G. Adaptive elastic-net selection in a quantile model with diverging number of variable groups. Statistics 2020, 54, 1147–1170.
- Shi, S.; Wilke, R.A. Variable selection with group structure: Exiting employment at retirement age—A competing risks quantile regression analysis. Empir. Econ. 2022, 62, 119–155.
- Zhou, N.; Zhu, J. Group variable selection via a hierarchical lasso and its oracle property. Stat. Interface 2010, 3, 557–574.
- Ouhourane, M.; Yang, Y.; Benedet, A.L. Group penalized quantile regression. Stat. Methods Appl. 2022, 31, 1–35.
- Li, Y.; Zhang, H. Component selection and smoothing in multivariate nonparametric regression. Ann. Stat. 2006, 34, 2272–2297.
- Wu, Y.; Liu, Y. Variable selection in quantile regression. Stat. Sin. 2009, 19, 801–817.
- Zhong, W.; Zhu, L.; Li, R. Regularized quantile regression and robust feature screening for single index models. Stat. Sin. 2016, 26, 69–95.
- Zou, H.; Zhang, H. On the adaptive elastic-net with a diverging number of parameters. Ann. Stat. 2009, 37, 1733–1751.
- Lee, E.R.; Noh, H.; Park, B.U. Model selection via Bayesian information criterion for quantile regression models. J. Am. Stat. Assoc. 2014, 109, 216–229.
- Fernandes, M.; Guerre, E.; Horta, E. Smoothing quantile regressions. J. Bus. Econ. Stat. 2021, 39, 338–357.
- He, X.; Pan, X.; Tan, K.M. Smoothed quantile regression with large-scale inference. J. Econom. 2023, 232, 367–388.
- Knight, K.; Fu, W. Asymptotics for lasso-type estimators. Ann. Stat. 2000, 28, 1356–1378.
- Kim, J.; Pollard, D. Cube root asymptotics. Ann. Stat. 1990, 18, 191–219.
- Geyer, C.J. On the asymptotics of constrained M-estimation. Ann. Stat. 1994, 22, 1993–2010.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).