Relaxed Adaptive Lasso and Its Asymptotic Results
Abstract
1. Introduction
2. Relaxed Adaptive Lasso and Asymptotic Results
2.1. Definition
2.2. Algorithm
- Step (1).
- For a given $\gamma>0$, we use a consistent pilot estimator (e.g., OLS) to construct the weights $\hat{w}_j=1/|\hat{\beta}_j|^{\gamma}$ of the adaptive lasso, following the definition of Zou [6]. The pilot estimator can also be replaced with other consistent estimators, e.g., the ridge regression estimator (see the code sketch after Step (4)).
- Step (2).
- Define , where
- Step (3).
- The process of computing the relaxed adaptive lasso solutions is then identical to that of computing the relaxed lasso solutions in Meinshausen [11]. The relaxed lasso estimator is defined as $\hat{\beta}^{\lambda,\phi}=\arg\min_{\beta}\,n^{-1}\sum_{i=1}^{n}\bigl(Y_i-X_i^{\top}\{\beta\cdot 1_{\mathcal{M}_{\lambda}}\}\bigr)^{2}+\phi\lambda\|\beta\|_{1}$ with $\phi\in(0,1]$, where $\mathcal{M}_{\lambda}$ denotes the active set selected with penalty $\lambda$. The Lars algorithm is first used to compute all the adaptive lasso solutions. A total of $h$ resulting models are retained, attained at the sorted penalty parameters $\lambda_{1}>\lambda_{2}>\cdots>\lambda_{h}$. When the penalty parameter is close to zero, for example, all variables with nonzero coefficients are selected and the solution coincides with the OLS fit; at the other extreme, a sufficiently large penalty shrinks all estimators to zero, leading to the null model. Therefore, a moderate value in the sequence of penalty parameters is chosen, so that the selected model lies between these two extremes. Then, an OLS estimator is defined along the direction of the adaptive lasso solutions obtained in the previous step. If there exists at least one component $j$ meeting the condition in Meinshausen [11], then all the adaptive lasso solutions on the set of selected variables are identical to the set of relaxed lasso estimators; otherwise, the solutions are computed by linear interpolation between the adaptive lasso estimator and the OLS estimator.
- Step (4).
- Output the relaxed adaptive lasso solutions.
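As referenced in Step (1), the following is a minimal sketch of the weight construction and of the reduction of the adaptive lasso to an ordinary lasso on a rescaled design. The function names, the use of NumPy, and the ridge option are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def adaptive_weights(X, y, gamma=1.0, ridge=0.0):
    """Zou-style adaptive weights w_j = 1/|b_j|^gamma from a pilot estimator.

    The pilot is OLS when ridge == 0, or a ridge estimator otherwise
    (the choice of pilot and of gamma are assumptions for illustration).
    """
    n, p = X.shape
    beta_pilot = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    return 1.0 / np.abs(beta_pilot) ** gamma

def rescale_design(X, w):
    """Adaptive lasso on X is an ordinary lasso on the rescaled design x_j / w_j."""
    return X / w  # divides column j by w[j]

def backtransform(beta_rescaled, w):
    """Map a lasso solution on the rescaled design back to adaptive lasso coefficients."""
    return beta_rescaled / w
```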
- Step (1).
- As before, $\mathcal{M}_{\lambda}$ denotes the active set of the adaptive lasso, and $\hat{\beta}^{\mathrm{ada}}_{\lambda}$ denotes the adaptive lasso estimator. The relaxed adaptive lasso solution can then be defined as in Section 2.1.
- Step (2).
- The submatrix of active predictors has full column rank, so that its Gram matrix is invertible; thus, the OLS estimator on the active set is $\hat{\beta}^{\mathrm{ols}}_{\mathcal{M}_{\lambda}}=(X_{\mathcal{M}_{\lambda}}^{\top}X_{\mathcal{M}_{\lambda}})^{-1}X_{\mathcal{M}_{\lambda}}^{\top}Y$.
- Step (3).
- Define $x_{j}^{**}=x_{j}/\hat{w}_{j}$, where $\hat{w}_{j}$ is the adaptive weight; then solving the adaptive lasso is identical to solving the lasso problem on the rescaled design $X^{**}=(x_{1}^{**},\dots,x_{p}^{**})$. By means of the Karush–Kuhn–Tucker (KKT) optimality conditions, the lasso solution over its active set can be written in closed form (see the sketch following Step (4)). From this transformation of the predictor matrix, it follows that the adaptive lasso estimator is recovered componentwise as $\hat{\beta}_{j}=\hat{\beta}^{**}_{j}/\hat{w}_{j}$.
- Step (4).
- Thus, the improved solution of the relaxed adaptive lasso can be written in closed form, as illustrated in the sketch below.
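To make Steps (3)–(4) easier to follow, here is a hedged reconstruction of the closed-form argument. The objective scaling, the shorthand $\mathcal{M}$ for the active set, the sign vector $s_{\mathcal{M}}$, and the final convex-combination display are illustrative assumptions consistent with the steps above, not the paper's exact formulas.

```latex
% Lasso on the rescaled design X** (columns x_j / \hat w_j) with objective
% \|Y - X^{**}\beta\|_2^2 + \lambda\|\beta\|_1; \mathcal{M} abbreviates the
% active set \mathcal{M}_\lambda and s_{\mathcal{M}} its sign vector.
% KKT stationarity on the active set gives the closed form
\[
\hat\beta^{**}_{\mathcal{M}}(\lambda)
  = \bigl(X^{**\top}_{\mathcal{M}} X^{**}_{\mathcal{M}}\bigr)^{-1}
    \Bigl(X^{**\top}_{\mathcal{M}} Y - \frac{\lambda}{2}\, s_{\mathcal{M}}\Bigr)
  = \hat\beta^{\mathrm{ols},**}_{\mathcal{M}}
    - \frac{\lambda}{2}\bigl(X^{**\top}_{\mathcal{M}} X^{**}_{\mathcal{M}}\bigr)^{-1} s_{\mathcal{M}} .
\]
% Relaxing the penalty to \phi\lambda and dividing componentwise by the weights
% (the back-transformation at the end of Step (3)) moves the estimator linearly
% between the OLS refit (\phi = 0) and the adaptive lasso fit (\phi = 1),
% provided the active set and the signs do not change on this segment:
\[
\hat\beta^{\mathrm{relax}}_{\mathcal{M}}(\lambda,\phi)
  = \phi\,\hat\beta^{\mathrm{ada}}_{\mathcal{M}}(\lambda)
    + (1-\phi)\,\hat\beta^{\mathrm{ols}}_{\mathcal{M}},
  \qquad \phi \in [0,1].
\]
```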
Algorithm 1. The simple algorithm for the relaxed adaptive lasso.
Input: a given constant …, the weight vector …
Precompute: …
Initialization: Let … be the optimal parameter corresponding to the modified models. Set … to an initial order number.
Define …, where …
for … do
    if … then
        …
    else
        Set …
until …
Output: …
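Since the listing above is heavily compressed, the following runnable sketch illustrates the same workflow in Python: compute the adaptive lasso path by Lars on the rescaled design, refit OLS on each active set, and relax with a parameter φ. The use of scikit-learn's lars_path, the φ-grid, and the function name are illustrative assumptions; the interpolation step follows the description in Step (3) of the simple algorithm rather than a verbatim transcription of Algorithm 1.

```python
import numpy as np
from sklearn.linear_model import lars_path

def relaxed_adaptive_lasso_path(X, y, w, phis=(1.0, 0.5, 0.0)):
    """Relaxed adaptive lasso solutions along the Lars path.

    w    : adaptive weights, e.g. from adaptive_weights() above
    phis : relaxation parameters; phi = 1 is the adaptive lasso, phi = 0 the OLS refit
    Returns {lambda: {phi: coefficient vector}}.
    """
    Xs = X / w                                     # adaptive lasso == lasso on rescaled design
    alphas, _, coefs = lars_path(Xs, y, method="lasso")
    solutions = {}
    for k, lam in enumerate(alphas):
        beta_ada = coefs[:, k] / w                 # back-transform to the original design
        active = np.flatnonzero(beta_ada)
        if active.size == 0:                       # null model at the largest penalties
            solutions[lam] = {phi: np.zeros(X.shape[1]) for phi in phis}
            continue
        beta_ols = np.zeros(X.shape[1])
        beta_ols[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        # relax: move between the adaptive lasso fit (phi = 1) and the OLS refit (phi = 0)
        solutions[lam] = {phi: phi * beta_ada + (1 - phi) * beta_ols for phi in phis}
    return solutions
```

In practice, the pair (λ, φ) would then be selected by cross-validation, as described in Section 3.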
Algorithm 2. The improved algorithm for the relaxed adaptive lasso.
Input: adaptive lasso estimator …, OLS estimator …, weight vector …
Precompute: …; let … be the active set of the adaptive lasso
Initialization: Define …
for … do
    if … then
        compute …
    else
        stop iterations
until …
Output: …
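A corresponding sketch for the improved computation: given a fitted adaptive lasso estimator, a single OLS refit on its active set suffices, and every relaxed solution is then a cheap convex combination. The combination form is the assumption derived in the math sketch above, and the function name is illustrative.

```python
import numpy as np

def improved_relaxed_solutions(X, y, beta_ada, phis):
    """One OLS refit on the adaptive lasso active set, then relaxed solutions for each phi.

    Assumes the active set and coefficient signs stay fixed along the phi-path
    (the caveat noted in the math sketch above).
    """
    beta_ada = np.asarray(beta_ada, dtype=float)
    active = np.flatnonzero(beta_ada)
    beta_ols = np.zeros_like(beta_ada)
    if active.size:
        beta_ols[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
    return {phi: phi * beta_ada + (1.0 - phi) * beta_ols for phi in phis}
```

For example, improved_relaxed_solutions(X, y, beta_ada, phis=np.linspace(0, 1, 11)) yields the whole φ-path for one λ without re-running Lars.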
2.3. Asymptotic Results
3. Simulation
3.1. Setup
- i.
- Given sample sizes $n\in\{100,500,1000\}$ and data dimensions $p\in\{20,50\}$.
- ii.
- The true regression coefficient vector has its first signal variables taking nonzero values equally spaced from 0.5 to 10, and the remaining coefficients are zero.
- iii.
- The design matrix is generated from a multivariate normal distribution with covariance matrix $\Sigma$; the correlation between predictor variables is set to 0.5.
- iv.
- The theoretical signal-to-noise ratio in this simulation is defined as $\mathrm{SNR}=\beta^{\top}\Sigma\beta/\sigma^{2}$. We consider either SNR = 0.2 (low) or SNR = 0.8 (high) and use it to determine the noise variance $\sigma^{2}$, so that the response variable $Y$ generated from the linear regression model follows $Y\sim N(X\beta,\sigma^{2}I_{n})$ (a data-generation sketch is given after this list).
- v.
- We compute the weights of the adaptive lasso via the ridge regression estimator. For each method, five-fold cross-validation is used to select the penalty parameters, with the cross-validation loss chosen to minimize prediction error on the test set. Furthermore, we pick the least complex model whose accuracy is comparable to that of the best model, under the “one-standard-error” criterion (Franklin [22]).
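As referenced in item (iv), the following is a hedged sketch of the data-generating step. The equicorrelated covariance structure, the number of signal variables s, and the mean-zero design are assumptions filling in details that items (ii)–(iv) leave implicit; the SNR-based choice of the noise variance follows the stated definition.

```python
import numpy as np

def simulate(n, p, s, snr, rho=0.5, seed=None):
    """Draw (X, y) roughly as in Section 3.1.

    X ~ N(0, Sigma) with unit variances and pairwise correlation rho (assumed
    equicorrelation), the first s coefficients equally spaced from 0.5 to 10,
    and noise variance sigma^2 set from SNR = beta' Sigma beta / sigma^2.
    """
    rng = np.random.default_rng(seed)
    Sigma = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    beta = np.zeros(p)
    beta[:s] = np.linspace(0.5, 10.0, s)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    sigma2 = beta @ Sigma @ beta / snr
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    return X, y, beta

# e.g. X, y, beta = simulate(n=500, p=20, s=5, snr=0.2, seed=0)
```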
3.2. Evaluation Metrics
3.3. Summary of Results
4. Application to Real Data
4.1. Dataset
4.2. Analysis Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Lemma 1
Appendix B. Proof of Lemma 2
Appendix C. Proof of Lemma 3
Appendix D. Proof of Theorem 1
Appendix E. Proof of Theorem 2
Appendix F. Proof of Theorem 3
Appendix G. Proof of Theorem 4
References
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
- Wang, S.; Weng, H.; Maleki, A. Which bridge estimator is optimal for variable selection? arXiv 2017, arXiv:1705.08617.
- Meinshausen, N.; Bühlmann, P. High-dimensional graphs and variable selection with the lasso. Ann. Stat. 2006, 34, 1436–1462.
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
- Fan, J.; Li, R. Statistical challenges with high dimensionality: Feature selection in knowledge discovery. arXiv 2006, arXiv:math/0602133.
- Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
- Fan, J.; Peng, H. Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 2004, 32, 928–961.
- Donoho, D.L.; Johnstone, J.M. Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81, 425–455.
- Breiman, L. Better subset regression using the nonnegative garrote. Technometrics 1995, 37, 373–384.
- Yuan, M.; Lin, Y. On the non-negative garrotte estimator. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2007, 69, 143–161.
- Meinshausen, N. Relaxed lasso. Comput. Stat. Data Anal. 2007, 52, 374–393.
- Hastie, T.; Tibshirani, R.; Tibshirani, R.J. Extended comparisons of best subset selection, forward stepwise selection, and the lasso. arXiv 2017, arXiv:1707.08692.
- Mentch, L.; Zhou, S. Randomization as regularization: A degrees of freedom explanation for random forest success. arXiv 2019, arXiv:1911.00190.
- Bloise, F.; Brunori, P.; Piraino, P. Estimating intergenerational income mobility on sub-optimal data: A machine learning approach. J. Econ. Inequal. 2021, 19, 643–665.
- He, Y. The Analysis of Impact Factors of Foreign Investment Based on Relaxed Lasso. J. Appl. Math. Phys. 2017, 5, 693–699.
- Kang, C.; Huo, Y.; Xin, L.; Tian, B.; Yu, B. Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine. J. Theor. Biol. 2019, 463, 77–91.
- Tay, J.K.; Narasimhan, B.; Hastie, T. Elastic net regularization paths for all generalized linear models. arXiv 2021, arXiv:2103.03475.
- Fu, W.; Knight, K. Asymptotics for lasso-type estimators. Ann. Stat. 2000, 28, 1356–1378.
- Zhao, P.; Yu, B. On model selection consistency of Lasso. J. Mach. Learn. Res. 2006, 7, 2541–2563.
- Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499.
- Huang, J.; Ma, S.; Zhang, C.H. Adaptive Lasso for sparse high-dimensional regression models. Stat. Sin. 2008, 18, 1603–1618.
- Franklin, J. The elements of statistical learning: Data mining, inference and prediction. Math. Intell. 2005, 27, 83–85.
- McCullagh, P.; Nelder, J.A. Generalized Linear Models; Routledge: Oxfordshire, UK, 2019.
- Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2008, 70, 849–911.
- Li, R.; Zhong, W.; Zhu, L. Feature screening via distance correlation learning. J. Am. Stat. Assoc. 2012, 107, 1129–1139.
| p | n | Method | RR | RTE | PVE | MSE | Number of Nonzeros |
|---|---|--------|----|-----|-----|-----|--------------------|
| 20 | 100 | Lasso | 0.997 | 1.206 | 0.4 | 96.2 | 1 |
| 20 | 100 | Rlasso | 0.997 | 1.205 | 0.6 | 95.4 | 1 |
| 20 | 100 | Alasso | 0.995 | 1.205 | 0.8 | 94.5 | 1 |
| 20 | 100 | Radlasso | 0.986 | 1.203 | 2.4 | 100.1 | 2 |
| 20 | 500 | Lasso | 0.990 | 1.199 | 1.6 | 91.4 | 4 |
| 20 | 500 | Rlasso | 0.987 | 1.198 | 2.1 | 90.3 | 2 |
| 20 | 500 | Alasso | 0.989 | 1.198 | 1.9 | 90.5 | 3 |
| 20 | 500 | Radlasso | 0.974 | 1.196 | 4.3 | 86.5 | 6 |
| 20 | 1000 | Lasso | 0.987 | 1.197 | 2.1 | 90.2 | 5 |
| 20 | 1000 | Rlasso | 0.983 | 1.196 | 2.9 | 89.6 | 3 |
| 20 | 1000 | Alasso | 0.985 | 1.197 | 2.4 | 90.1 | 4 |
| 20 | 1000 | Radlasso | 0.974 | 1.195 | 4.4 | 86.8 | 7 |
| 50 | 100 | Lasso | 0.998 | 1.197 | 0.4 | 99.7 | 1 |
| 50 | 100 | Rlasso | 0.997 | 1.197 | 0.5 | 99.9 | 1 |
| 50 | 100 | Alasso | 0.993 | 1.196 | 1.2 | 98.8 | 2 |
| 50 | 100 | Radlasso | 0.985 | 1.195 | 2.3 | 106.8 | 2 |
| 50 | 500 | Lasso | 0.992 | 1.200 | 1.4 | 93.4 | 4 |
| 50 | 500 | Rlasso | 0.986 | 1.199 | 2.3 | 92.5 | 2 |
| 50 | 500 | Alasso | 0.988 | 1.199 | 1.9 | 91.6 | 3 |
| 50 | 500 | Radlasso | 0.976 | 1.197 | 4.0 | 90.6 | 5 |
| 50 | 1000 | Lasso | 0.987 | 1.195 | 2.1 | 88.8 | 5 |
| 50 | 1000 | Rlasso | 0.982 | 1.195 | 2.9 | 88.0 | 3 |
| 50 | 1000 | Alasso | 0.985 | 1.195 | 2.5 | 88.4 | 4 |
| 50 | 1000 | Radlasso | 0.974 | 1.193 | 4.3 | 86.5 | 6 |
| p | n | Method | RR | RTE | PVE | MSE | Number of Nonzeros |
|---|---|--------|----|-----|-----|-----|--------------------|
| 20 | 100 | Lasso | 0.980 | 1.789 | 8.8 | 75.1 | 5 |
| 20 | 100 | Rlasso | 0.972 | 1.783 | 12.1 | 73.8 | 3 |
| 20 | 100 | Alasso | 0.975 | 1.785 | 11.1 | 72.8 | 4 |
| 20 | 100 | Radlasso | 0.960 | 1.773 | 17.8 | 75.2 | 5 |
| 20 | 500 | Lasso | 0.969 | 1.781 | 13.8 | 61.5 | 7 |
| 20 | 500 | Rlasso | 0.962 | 1.775 | 17.1 | 60.7 | 5 |
| 20 | 500 | Alasso | 0.967 | 1.780 | 14.7 | 61.9 | 6 |
| 20 | 500 | Radlasso | 0.956 | 1.771 | 19.7 | 58.8 | 9 |
| 20 | 1000 | Lasso | 0.966 | 1.762 | 14.8 | 59.3 | 8 |
| 20 | 1000 | Rlasso | 0.959 | 1.756 | 17.8 | 58.5 | 6 |
| 20 | 1000 | Alasso | 0.964 | 1.760 | 15.9 | 59.3 | 7 |
| 20 | 1000 | Radlasso | 0.956 | 1.753 | 19.4 | 57.1 | 9 |
| 50 | 100 | Lasso | 0.985 | 1.784 | 6.7 | 75.5 | 4 |
| 50 | 100 | Rlasso | 0.978 | 1.779 | 9.4 | 73.4 | 3 |
| 50 | 100 | Alasso | 0.974 | 1.775 | 11.4 | 69.7 | 6 |
| 50 | 100 | Radlasso | 0.963 | 1.766 | 16.2 | 83.3 | 4 |
| 50 | 500 | Lasso | 0.970 | 1.773 | 13.1 | 62.9 | 7 |
| 50 | 500 | Rlasso | 0.963 | 1.767 | 16.6 | 61.5 | 5 |
| 50 | 500 | Alasso | 0.967 | 1.770 | 14.6 | 61.9 | 6 |
| 50 | 500 | Radlasso | 0.958 | 1.763 | 18.7 | 60.4 | 7 |
| 50 | 1000 | Lasso | 0.967 | 1.774 | 14.6 | 59.7 | 8 |
| 50 | 1000 | Rlasso | 0.960 | 1.768 | 17.9 | 58.7 | 6 |
| 50 | 1000 | Alasso | 0.964 | 1.772 | 15.8 | 59.4 | 7 |
| 50 | 1000 | Radlasso | 0.957 | 1.765 | 19.3 | 57.6 | 8 |
Method | Lasso | Rlasso | Alasso | Radlasso |
---|---|---|---|---|
MSE | 0.521 | 0.485 | 0.575 | 0.429 |
| Order Number | Explanatory Variable | Coefficient |
|--------------|----------------------|-------------|
| | Cash Flow from Operations | 0.008 |
| | Net Increase in Cash and Cash Equivalents | 0.048 |
| | Net Accounts Receivable | 0.208 |
| | Non-Current Assets | −0.214 |
| | Business Taxes and Surcharges | −0.265 |
| | Interest Income | 0.130 |
| | Profit and Loss from Asset Disposal | 0.154 |
| | Cash Paid for Commodities or Labor | 0.386 |
| | Cash Paid to and for Employees | 0.569 |
| | Cash Flow from Financing Activities Net Amount | −0.080 |