A Review on Variable Selection in Regression Analysis
Abstract
1. Introduction
2. Typology of Procedures
- Test-based
- Penalty-based
- Screening-based
2.1. Test-Based
2.2. Penalty-Based
2.3. Screening-Based
3. Linear Models
3.1. Testing
3.2. Penalty
3.2.1. Norm Penalties
- SparseStep
- LASSO
- Ridge
3.2.2. Concave Penalties
- Non-negative garrote
- SCAD
- MCP
3.3. Screening
3.3.1. Regressor Based
3.3.2. Covariance Based
4. Grouped Models
4.1. Penalty
4.1.1. Single-Level
4.1.2. Bi-Level
5. Additive Models
5.1. Penalty
5.2. Screening
6. Partial Linear Models
6.1. Standard
6.1.1. Penalty
6.2. Varying Coefficients
6.2.1. Penalty
6.2.2. Testing/Penalty
7. Non-Parametric Models
7.1. Testing
7.2. Penalty
7.3. Screening
7.3.1. Model-Free
- DC-SIS (Li et al. 2012): The Distance Correlation (DC) is a generalization of the Pearson correlation coefficient in terms of norm distances. It can be written as:
$$ \mathrm{DC}(X_j, Y) = \frac{\mathrm{dcov}(X_j, Y)}{\sqrt{\mathrm{dcov}(X_j, X_j)\,\mathrm{dcov}(Y, Y)}}, $$
where $\mathrm{dcov}$ denotes the distance covariance. Predictors are ranked by the marginal utility $\widehat{\omega}_j = \widehat{\mathrm{DC}}^2(X_j, Y)$ (a numerical sketch covering the three utilities in this list is given after the list).
- HSIC-SIS (Balasubramanian et al. 2013): The Hilbert–Schmidt Independence Criterion (HSIC) generalizes the previous measure, as it defines a distance metric in a reproducing kernel Hilbert space (RKHS):
$$ \rho(X_j, Y) = \frac{\mathrm{HSIC}(X_j, Y)}{\sqrt{\mathrm{HSIC}(X_j, X_j)\,\mathrm{HSIC}(Y, Y)}}. $$
We recognize again the form of the usual correlation, this time written in terms of kernels. To avoid choosing the kernel bandwidths, the authors take the supremum of the criterion over a family of kernels. Empirically, the ranking measure is simpler to compute:
$$ \widehat{\mathrm{HSIC}}(X_j, Y) = \frac{1}{(n-1)^2}\,\operatorname{tr}(K_j H L H), \qquad H = I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top, $$
where $K_j$ and $L$ are the kernel (Gram) matrices of $X_j$ and $Y$.
- KCCA-SIS (Liu et al. 2016): The Kernel Canonical Correlation Analysis (KCCA) is the latest improvement in the field of non-parametric screening. It encompasses SIS, as it can handle non-linearities. Unlike DC-SIS, it is scale-free and does not rely on a Gaussian assumption. Although it shares many aspects with HSIC-SIS, it differs in one respect: HSIC is based on the maximum covariance between transformations of two variables, while KCCA uses the maximum correlation between the transformations, obtained by removing the marginal variations. Their measure is defined through the correlation operator:
$$ V_{Y X_j} = \Sigma_{YY}^{-1/2}\,\Sigma_{Y X_j}\,\Sigma_{X_j X_j}^{-1/2}. $$
Because the covariance operators may not be invertible, they introduce a ridge penalty $\varepsilon$:
$$ V_{Y X_j}^{\varepsilon} = (\Sigma_{YY} + \varepsilon I)^{-1/2}\,\Sigma_{Y X_j}\,(\Sigma_{X_j X_j} + \varepsilon I)^{-1/2}. $$
The correlation measure is then defined as the norm of this operator, $\rho_j = \|V_{Y X_j}^{\varepsilon}\|$. Empirical estimates of the covariance operators are obtained after a singular value decomposition of the kernel matrices (the latter being the same as in HSIC). While the kernel bandwidths can be chosen optimally ex ante, $\varepsilon$ has to be estimated via GCV over a grid of values.
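To make these screening utilities concrete, here is a minimal R sketch. It is an illustrative reconstruction, not code from the cited papers: the distance correlation comes from the `energy` package, HSIC and the ridge-regularized kernel correlation are coded directly with a single Gaussian kernel and a median-heuristic bandwidth, and the toy data, the fixed kernel, and the fixed ridge parameter `eps` are all simplifying assumptions (the papers take a supremum over a kernel family and tune the ridge term by GCV).

```r
# Model-free screening utilities (illustrative sketch, not the papers' code).
library(energy)  # provides dcor() for distance correlation

# Gaussian Gram matrix with a median-heuristic bandwidth
gram <- function(x) {
  d2 <- as.matrix(dist(x))^2
  exp(-d2 / (2 * median(d2[d2 > 0])))
}

# Empirical HSIC: tr(K H L H) / (n - 1)^2, with H the centering matrix
hsic <- function(x, y) {
  n <- length(y)
  H <- diag(n) - matrix(1 / n, n, n)
  sum(diag(gram(x) %*% H %*% gram(y) %*% H)) / (n - 1)^2
}

# Ridge-regularized kernel correlation, in the spirit of KCCA-SIS:
# trace of the product of normalized, centered Gram matrices
kcorr <- function(x, y, eps = 1e-3) {
  n <- length(y)
  H <- diag(n) - matrix(1 / n, n, n)
  Kx <- H %*% gram(x) %*% H
  Ky <- H %*% gram(y) %*% H
  Rx <- Kx %*% solve(Kx + n * eps * diag(n))
  Ry <- Ky %*% solve(Ky + n * eps * diag(n))
  sum(diag(Rx %*% Ry))
}

# Toy data: only the first two of fifty predictors matter
set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * sin(X[, 2]) + rnorm(n)

# DC-SIS marginal utilities; keep the d top-ranked variables
omega <- apply(X, 2, function(xj) dcor(xj, y)^2)
d <- floor(n / log(n))  # a common choice for the screened set size
keep <- order(omega, decreasing = TRUE)[1:d]
```

Replacing `dcor(xj, y)^2` in the `apply` call with `hsic(xj, y)` or `kcorr(xj, y)` yields HSIC-SIS- and KCCA-SIS-style rankings over the same toy data.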
7.3.2. Model-Based
8. Improving on Variable Selection
8.1. Stability Selection
8.2. Ranking-Based Variable Selection
9. Discussion
Funding
Acknowledgments
Conflicts of Interest
References
- Abenius, Tobias. 2012. Lassoshooting: L1 Regularized Regression (Lasso) Solver Using the Cyclic Coordinate Descent algorithm aka Lasso Shooting. R Package Version 0.1.5-1. Available online: https://CRAN.R-project.org/package=lassoshooting (accessed on 15 November 2018).
- Akaike, Hirotugu. 1973. Information Theory and an Extension of the Maximum Likelihood Principle. Paper presented at 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, September 2–8; pp. 267–81. [Google Scholar]
- Bach, Francis R. 2008. Bolasso: Model Consistent Lasso Estimation through the Bootstrap. Paper presented at 25th International Conference on Machine Learning, Helsinki, Finland, July 5–9; pp. 33–40. [Google Scholar]
- Balasubramanian, Krishnakumar, Bharath Sriperumbudur, and Guy Lebanon. 2013. Ultrahigh dimensional feature screening via RKHS embeddings. Artificial Intelligence and Statistics 31: 126–34. [Google Scholar]
- Baranowski, Rafal, Patrick Breheny, and Isaac Turner. 2015. rbvs: Ranking-Based Variable Selection. R Package Version 1.0.2. Available online: https://CRAN.R-project.org/package=rbvs (accessed on 15 November 2018).
- Baranowski, Rafal, Yining Chen, and Piotr Fryzlewicz. 2018. Ranking-based variable selection for high-dimensional data. Statistica Sinica, in press. [Google Scholar] [CrossRef]
- Bickel, Peter J., Friedrich Götze, and Willem R. van Zwet. 2012. Resampling Fewer Than n Observations: Gains, Losses, and Remedies for Losses. New York: Springer, pp. 267–97. [Google Scholar]
- Blum, Avrim L., and Pat Langley. 1997. Selection of relevant features and examples in machine learning. Artificial Intelligence 97: 245–71. [Google Scholar] [CrossRef]
- Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3: 1–122. [Google Scholar] [CrossRef]
- Breaux, Harold J. 1967. On Stepwise Multiple Linear Regression. Technical Report. Aberdeen Proving Ground: Army Ballistic Research Laboratory. [Google Scholar]
- Breheny, Patrick, and Jian Huang. 2009. Penalized methods for bi-level variable selection. Statistics and Its Interface 2: 369. [Google Scholar] [CrossRef] [PubMed]
- Breheny, Patrick, and Jian Huang. 2011. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Annals of Applied Statistics 5: 232–53. [Google Scholar] [CrossRef] [PubMed]
- Breheny, Patrick, and Jian Huang. 2015. Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing 25: 173–87. [Google Scholar] [CrossRef] [PubMed]
- Breiman, Leo, and Jerome H. Friedman. 1985. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80: 580–98. [Google Scholar] [CrossRef]
- Breiman, Leo. 1995. Better subset regression using the nonnegative garrote. Technometrics 37: 373–84. [Google Scholar] [CrossRef]
- Breiman, Leo. 2001. Random forests. Machine Learning 45: 5–32. [Google Scholar] [CrossRef]
- Castle, Jennifer L., Jurgen A. Doornik, and David F. Hendry. 2011. Evaluating automatic model selection. Journal of Time Series Econometrics 3. [Google Scholar] [CrossRef]
- Castle, Jennifer L., and David F. Hendry. 2010. A low-dimension portmanteau test for non-linearity. Journal of Econometrics 158: 231–45. [Google Scholar] [CrossRef]
- Cawley, Gavin C., and Nicola L. C. Talbot. 2010. On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research 11: 2079–107. [Google Scholar]
- Chen, Xueying, and Min-Ge Xie. 2014. A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica 24: 1655–84. [Google Scholar]
- Cheng, Guang, Hao H. Zhang, and Zuofeng Shang. 2015. Sparse and efficient estimation for partial spline models with increasing dimension. Annals of the Institute of Statistical Mathematics 67: 93–127. [Google Scholar] [CrossRef] [PubMed]
- Choi, Nam Hee, William Li, and Ji Zhu. 2010. Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association 105: 354–64. [Google Scholar] [CrossRef]
- Ding, Ying, Shaowu Tang, Serena G. Liao, Jia Jia, Steffi Oesterreich, Yan Lin, and George C. Tseng. 2014. Bias correction for selecting the minimal-error classifier from many machine learning models. Bioinformatics 30: 3152–58. [Google Scholar] [CrossRef] [PubMed]
- Doornik, Jurgen A. 2009. Econometric Model Selection with More Variables Than Observations. Oxford: Economics Department, University of Oxford, Unpublished Work. [Google Scholar]
- de Rooi, Johan, and Paul Eilers. 2011. Deconvolution of pulse trains with the L0 penalty. Analytica Chimica Acta 705: 218–26. [Google Scholar] [CrossRef] [PubMed]
- Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–99. [Google Scholar]
- Epprecht, Camila, Dominique Guegan, Álvaro Veiga, and Joel Correa da Rosa. 2017. Variable Selection and Forecasting via Automated Methods for Linear Models: Lasso/adalasso and Autometrics. Documents de travail du Centre d’Economie de la Sorbonne 2013.80. Paris: Centre d’Economie de la Sorbonne. [Google Scholar]
- Eugster, Manuel, Torsten Hothorn, The Students of the ‘Advanced R Programming Course’ Hannah Frick, Ivan Kondofersky, Oliver S. Kuehnle, Christian Lindenlaub, Georg Pfundstein, Matthias Speidel, Martin Spindler, Ariane Straub, and et al. 2013. hgam: High-Dimensional Additive Modelling. R Package Version 0.1-2. Available online: https://CRAN.R-project.org/package=hgam (accessed on 15 November 2018).
- Fan, Jianqing, Yang Feng, and Rui Song. 2011. Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association 106: 544–57. [Google Scholar] [CrossRef] [PubMed]
- Fan, Jianqing, and Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96: 1348–60. [Google Scholar] [CrossRef]
- Fan, Jianqing, and Jinchi Lv. 2008. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B 70: 849–911. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Fan, Jianqing, and Jinchi Lv. 2010a. A selective overview of variable selection in high dimensional feature space. Statistica Sinica 20: 101. [Google Scholar]
- Fan, Jianqing, and Jinchi Lv. 2010b. Sure Independence Screening. R Package Version. Available online: https://cran.r-project.org/web/packages/SIS/SIS.pdf (accessed on 15 November 2018).
- Fan, Jianqing, Richard Samworth, and Yichao Wu. 2009. Ultrahigh dimensional feature selection: Beyond the linear model. Journal of Machine Learning Research 10: 2013–38. [Google Scholar] [PubMed]
- Fan, Jianqing, and Wenyang Zhang. 2008. Statistical methods with varying coefficient models. Statistics and Its Interface 1: 179. [Google Scholar] [CrossRef] [PubMed]
- Flom, Peter L., and David L. Cassell. 2007. Stopping Stepwise: Why Stepwise and Similar Selection Methods Are Bad, and What You Should Use. Paper presented at NorthEast SAS Users Group Inc 20th Annual Conference, Baltimore, MD, USA, November 11–14. [Google Scholar]
- Frank, Ildiko, and Jerome H. Friedman. 1993. A statistical view of some chemometrics regression tools. Technometrics 35: 109–35. [Google Scholar] [CrossRef]
- Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33: 1. [Google Scholar] [CrossRef] [PubMed]
- Friedman, Jerome H. 1991. Multivariate adaptive regression splines. The Annals of Statistics 19: 1–67. [Google Scholar] [CrossRef]
- Fu, Wenjiang J. 1998. Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics 7: 397–416. [Google Scholar]
- Hall, Peter, and Hugh Miller. 2009. Using generalized correlation to effect variable selection in very high dimensional problems. Journal of Computational and Graphical Statistics 18: 533–50. [Google Scholar] [CrossRef]
- Hannan, Edward J., and Barry G. Quinn. 1979. The determination of the order of an autoregression. Journal of the Royal Statistical Society. Series B 41: 190–95. [Google Scholar]
- Hastie, Trevor, and Bradley Efron. 2013. Lars: Least Angle Regression, Lasso and Forward Stagewise. R Package Version 1.2. Available online: https://CRAN.R-project.org/package=lars (accessed on 15 November 2018).
- Hendry, David F., and Jean-Francois Richard. 1987. Recent Developments in the Theory of Encompassing. Technical Report. Louvain-la-Neuve: Université catholique de Louvain, Center for Operations Research and Econometrics (CORE). [Google Scholar]
- Hoerl, Arthur E., and Robert W. Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12: 55–67. [Google Scholar] [CrossRef]
- Hofner, Benjamin, and Torsten Hothorn. 2017. Stabs: Stability Selection with Error Control. R Package Version 0.6-3. Available online: https://CRAN.R-project.org/package=stabs (accessed on 15 November 2018).
- Hu, Tao, and Yingcun Xia. 2012. Adaptive semi-varying coefficient model selection. Statistica Sinica 22: 575–99. [Google Scholar] [CrossRef]
- Huang, Jian, Patrick Breheny, and Shuangge Ma. 2012. A selective review of group selection in high-dimensional models. Statistical Science 27. [Google Scholar] [CrossRef] [PubMed]
- Huang, Jian, Shuangge Ma, Huiliang Xie, and Cun-Hui Zhang. 2009. A group bridge approach for variable selection. Biometrika 96: 339–55. [Google Scholar] [CrossRef] [PubMed]
- Hurvich, Clifford M., and Chih-Ling Tsai. 1989. Regression and time series model selection in small samples. Biometrika 76: 297–307. [Google Scholar] [CrossRef]
- Hurvich, Clifford M., and Chih-Ling Tsai. 1990. The impact of model selection on inference in linear regression. The American Statistician 44: 214–17. [Google Scholar]
- Jović, Alan, Karla Brkić, and Nikola Bogunović. 2015. A Review of Feature Selection Methods with Applications. Paper presented at 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, May 25–29; pp. 1200–5. [Google Scholar]
- Ke, Tracy, Jiashun Jin, and Jianqing Fan. 2014. Covariate assisted screening and estimation. The Annals of Statistics 42: 2202. [Google Scholar] [CrossRef] [PubMed]
- Ke, Tracy, and Fan Yang. 2017. Covariate assisted variable ranking. arXiv, arXiv:1705.10370. [Google Scholar]
- Kim, Yongdai, Hosik Choi, and Hee-Seok Oh. 2008. Smoothly clipped absolute deviation on high dimensions. Journal of the American Statistical Association 103: 1665–73. [Google Scholar] [CrossRef]
- Kowalski, Matthieu. 2014. Thresholding Rules and Iterative Shrinkage/Thresholding Algorithm: A Convergence Study. Paper presented at 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, October 27–30; pp. 4151–55. [Google Scholar]
- Lafferty, John, and Larry Wasserman. 2008. Rodeo: Sparse, greedy nonparametric regression. The Annals of Statistics 36: 28–63. [Google Scholar] [CrossRef]
- Li, Runze, Liying Huang, and John Dziak. 2018. VariableScreening: High-Dimensional Screening for Semiparametric Longitudinal Regression. R Package Version 0.2.0. Available online: https://CRAN.R-project.org/package=VariableScreening (accessed on 15 November 2018).
- Li, Runze, and Hua Liang. 2008. Variable selection in semiparametric regression modeling. The Annals of Statistics 36: 261. [Google Scholar] [CrossRef] [PubMed]
- Li, Runze, Wei Zhong, and Liping Zhu. 2012. Feature screening via distance correlation learning. Journal of the American Statistical Association 107: 1129–39. [Google Scholar] [CrossRef] [PubMed]
- Lian, Heng, Hua Liang, and David Ruppert. 2015. Separation of covariates into nonparametric and parametric parts in high-dimensional partially linear additive models. Statistica Sinica 25: 591–607. [Google Scholar]
- Liaw, Andy, and Matthew Wiener. 2002. Classification and regression by randomForest. R News 2: 18–22. [Google Scholar]
- Lin, Yi, and Hao H. Zhang. 2006. Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics 34: 2272–97. [Google Scholar] [CrossRef]
- Liu, Tianqi, Kuang-Yao Lee, and Hongyu Zhao. 2016. Ultrahigh dimensional feature selection via kernel canonical correlation analysis. arXiv, arXiv:1604.07354. [Google Scholar]
- Lumley, Thomas. 2017. Leaps: Regression Subset Selection. R Package Version 3.0. Available online: https://CRAN.R-project.org/package=leaps (accessed on 15 November 2018).
- Mallows, Colin L. 1973. Some comments on Cp. Technometrics 15: 661–75. [Google Scholar]
- McIlhagga, William H. 2016. Penalized: A MATLAB toolbox for fitting generalized linear models with penalties. Journal of Statistical Software 72. [Google Scholar] [CrossRef]
- Mehmood, Tahir, Kristian Hovde Liland, Lars Snipen, and Solve Sæbø. 2012. A review of variable selection methods in partial least squares regression. Chemometrics and Intelligent Laboratory Systems 118: 62–69. [Google Scholar] [CrossRef]
- Meier, Lukas, Sara Van de Geer, and Peter Buhlmann. 2009. High-dimensional additive modeling. The Annals of Statistics 37: 3779–821. [Google Scholar] [CrossRef]
- Meinshausen, Nicolai, and Peter Bühlmann. 2006. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34: 1436–62. [Google Scholar] [CrossRef]
- Meinshausen, Nicolai, and Peter Bühlmann. 2010. Stability selection. Journal of the Royal Statistical Society: Series B 72: 417–73. [Google Scholar] [CrossRef]
- Milborrow, Stephen. 2018. Earth: Multivariate Adaptive Regression Splines. R Package Version 4.6.2. Available online: https://CRAN.R-project.org/package=earth (accessed on 15 November 2018).
- Nadaraya, Elizbar A. 1964. On estimating regression. Theory of Probability & Its Applications 9: 141–42. [Google Scholar]
- Ni, Xiao, Hao H. Zhang, and Daowen Zhang. 2009. Automatic model selection for partially linear models. Journal of Multivariate Analysis 100: 2100–11. [Google Scholar] [CrossRef] [PubMed]
- Park, Byeong U., Enno Mammen, Young K. Lee, and Eun Ryung Lee. 2015. Varying coefficient regression models: A review and new developments. International Statistical Review 83: 36–64. [Google Scholar] [CrossRef]
- Pretis, Felix, J. James Reade, and Genaro Sucarrat. 2018. Automated general-to-specific (GETS) regression modeling and indicator saturation for outliers and structural breaks. Journal of Statistical Software 86: 1–44. [Google Scholar] [CrossRef]
- Radchenko, Peter, and Gareth M. James. 2010. Variable selection using adaptive nonlinear interaction structures in high dimensions. Journal of the American Statistical Association 105: 1541–53. [Google Scholar] [CrossRef]
- Ravikumar, Pradeep, Han Liu, John Lafferty, and Larry Wasserman. 2007. Spam: Sparse Additive Models. Paper presented at 20th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, December 3–6; Red Hook: Curran Associates Inc., pp. 1201–8. [Google Scholar]
- Saeys, Yvan, Iñaki Inza, and Pedro Larrañaga. 2007. A review of feature selection techniques in bioinformatics. Bioinformatics 23: 2507–17. [Google Scholar] [CrossRef] [PubMed]
- Saldana, Diego Franco, and Yang Feng. 2018. SIS: An R package for sure independence screening in ultrahigh-dimensional statistical models. Journal of Statistical Software 83: 1–25. [Google Scholar] [CrossRef]
- Santos, Carlos, David F. Hendry, and Soren Johansen. 2008. Automatic selection of indicators in a fully saturated regression. Computational Statistics 23: 317–35. [Google Scholar] [CrossRef]
- Schwarz, Gideon. 1978. Estimating the dimension of a model. The Annals of Statistics 6: 461–64. [Google Scholar] [CrossRef]
- Shah, Rajen D., and Richard J. Samworth. 2013. Variable selection with error control: Another look at stability selection. Journal of the Royal Statistical Society: Series B 75: 55–80. [Google Scholar] [CrossRef] [Green Version]
- Steyerberg, Ewout W., Marinus J. C. Eijkemans, and J. Dik F. Habbema. 1999. Stepwise selection in small data sets: A simulation study of bias in logistic regression analysis. Journal of Clinical Epidemiology 52: 935–42. [Google Scholar] [CrossRef]
- Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B 58: 267–88. [Google Scholar]
- Tibshirani, Robert, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. 2005. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B 67: 91–108. [Google Scholar] [CrossRef] [Green Version]
- Ulbricht, Jan. 2012. lqa: Penalized Likelihood Inference for GLMs. R Package Version 1.0-3. Available online: https://CRAN.R-project.org/package=lqa (accessed on 15 November 2018).
- van den Burg, Gerrit J. J., Patrick J. F. Groenen, and Andreas Alfons. 2017. Sparsestep: Approximating the counting norm for sparse regularization. arXiv, arXiv:1701.06967. [Google Scholar]
- Varma, Sudhir, and Richard Simon. 2006. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 7: 91. [Google Scholar] [PubMed]
- Wang, Hansheng. 2009. Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association 104: 1512–24. [Google Scholar] [CrossRef]
- Wang, Hansheng, and Yingcun Xia. 2009. Shrinkage estimation of the varying coefficient model. Journal of the American Statistical Association 104: 747–57. [Google Scholar] [CrossRef]
- Wang, Lifeng, Guang Chen, and Hongzhe Li. 2007. Group scad regression analysis for microarray time course gene expression data. Bioinformatics 23: 1486–94. [Google Scholar] [CrossRef] [PubMed]
- Watson, Geoffrey S. 1964. Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A 26: 359–72. [Google Scholar]
- Weisberg, Sanford. 2005. Applied Linear Regression. Hoboken: John Wiley & Sons, vol. 528. [Google Scholar]
- Wen, Canhong, Wenliang Pan, Mian Huang, and Xueqin Wang. 2014. cdcsis: Conditional Distance Correlation and Its Related Feature Screening Method. R Package Version 1.0. Available online: https://CRAN.R-project.org/package=cdcsis (accessed on 15 November 2018).
- Whittingham, Mark J., Philip A. Stephens, Richard B. Bradbury, and Robert P. Freckleton. 2006. Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology 75: 1182–89. [Google Scholar] [CrossRef] [PubMed]
- Wu, Tong Tong, and Kenneth Lange. 2008. Coordinate descent algorithms for lasso penalized regression. The Annals of Applied Statistics 2: 224–44. [Google Scholar] [CrossRef]
- Yuan, Ming, and Yi Lin. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B 68: 49–67. [Google Scholar] [CrossRef] [Green Version]
- Zhang, Cun-Hui. 2007. Penalized Linear Unbiased Selection. Camden: Rutgers University. [Google Scholar]
- Zhang, Cun-Hui. 2010. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38: 894–942. [Google Scholar] [CrossRef]
- Zhang, Hao H., and Chen-Yen Lin. 2013. cosso: Fit Regularized Nonparametric Regression Models Using COSSO Penalty. R Package Version 2.1-1. Available online: https://CRAN.R-project.org/package=cosso (accessed on 15 November 2018).
- Zhang, Jing, Yanyan Liu, and Yuanshan Wu. 2017. Correlation rank screening for ultrahigh-dimensional survival data. Computational Statistics & Data Analysis 108: 121–32. [Google Scholar]
- Zhao, Tuo, Xingguo Li, Han Liu, and Kathryn Roeder. 2014. SAM: Sparse Additive Modelling. R Package Version 1.0.5. Available online: https://CRAN.R-project.org/package=SAM (accessed on 15 November 2018).
- Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101: 1418–29. [Google Scholar] [CrossRef]
- Zou, Hui, and Trevor Hastie. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B 67: 301–20. [Google Scholar] [CrossRef]
1. Some of them are not presented in this paper either because they are out of its scope, e.g., the Bayesian framework, or because they are special cases of other ones.
2. Even though it started in 1987, improvements are still being reported.
3. This should be investigated more deeply; to the best of our knowledge, no paper has compared their non-linear regression to the very well-known non-parametric procedures such as kernels or splines. An obvious link can be made with Projection Pursuit Regression (PPR); in this respect, we claim that Autometrics may be a special case of PPR.
4.
5. One can get insights into how to connect the LASSO to stepwise regression (Efron et al. 2004) via the forward stagewise method of Weisberg (2005).
6.
7. This is true only for large values of the parameters; the reader can build intuition for this phenomenon from thresholding methods (Kowalski 2014).
8. Usually the ridge, because it has an analytical solution.
9. Because of its low computational cost, but it can be estimated with any non-parametric regression technology.
10. Distance Correlation-SIS, Hilbert–Schmidt Independence Criterion-SIS, Kernel Canonical Correlation Analysis-SIS, and the Generalized Correlation, respectively.
11. Mean Decrease Impurity, Mean Decrease Accuracy, and the Regularization Of Derivative Expectation Operator, respectively.
12. These are known as "greedy algorithms", where the optimal global solution is sought by taking optimal local solutions (a small sketch of such a greedy forward search follows these notes).
13. They also introduced it as SpIn (SpAM with INteractions) in their paper but claimed that interactions would then not be treated efficiently.
14. Which is out of the scope of this paper but still very important.
15. This obviously relates to the problem raised when discussing stepwise regression. Here, the ensemble is a subset of the model space.
16. In the case of a regression, it is how well the subregion can be approximated by a constant.
17. Without replacement, random samples have to be non-overlapping.
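As a concrete illustration of note 12, the following minimal R sketch performs a greedy forward search for a linear model: at each step it adds the single regressor that most reduces the residual sum of squares. The toy data and the number of steps are illustrative assumptions, not a procedure from any cited package.

```r
# Greedy forward selection by residual sum of squares (illustrative sketch).
set.seed(2)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 3] + X[, 7] + rnorm(n)

active <- integer(0)                      # indices selected so far
for (step in 1:5) {                       # one locally optimal choice per step
  candidates <- setdiff(seq_len(p), active)
  rss <- sapply(candidates, function(j) {
    fit <- lm(y ~ X[, c(active, j)])      # refit with one extra regressor
    sum(resid(fit)^2)
  })
  active <- c(active, candidates[which.min(rss)])
}
active  # the greedily selected regressors, in order of inclusion
```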
|  | Screening | Penalty | Testing |
|---|---|---|---|
| Linear | SIS, SFR, CASE, FA-CAR | SparseStep, LASSO, Ridge, BRidge, SCAD, MCP, NNG, SHIM | Stepwise, Autometrics |
| Group |  | gLASSO, gBridge, gSCAD, gMCP, ElasticNet |  |
| Additive | NIS, CR-SIS | SpAM, penGAM |  |
| Partial Linear |  | kernelLASSO, adaSVC, DPLSE, PSA, PEPS | SP-GLRT |
| Non-Parametric | DC-SIS, HSIC-SIS, KCCA-SIS, Gcorr, MDI, MDA, RODEO | VANISH, COSSO | MARS |