Is Seeing Believing? A Practitioner’s Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies
Abstract
1. Introduction
2. Materials and Methods
2.1. The Frequentist High-Dimensional Inference
2.1.1. Classical Large Sample Properties
2.1.2. Post-Selection Inference
2.1.3. Data Shuffling via Sample Splitting and Bootstrapping
2.1.4. The Knock-Off Procedures
2.1.5. Remarks on Model Assumptions, Robustness and Inference Measures
2.2. The Bayesian Inferences in High Dimensions
2.2.1. The Shrinkage Priors
2.2.2. The Robust Likelihood
2.3. Connections between High-Dimensional Frequentist and Bayesian Methods
3. Simulation Study
4. Examples
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
ADMM | Alternating direction method of multipliers
BL | Bayesian LASSO
BLSS | Bayesian LASSO with spike-and-slab prior
BMI | Body mass index
CI | Confidence interval
CV | Cross-validation
FDR | False discovery rate
GE | Gene expression
GWAS | Genome-wide association study
LAD | Least absolute deviation
LASSO | Least absolute shrinkage and selection operator
MCMC | Markov chain Monte Carlo
MCP | Minimax concave penalty
MPM | Median probability model
PGEE | Penalized generalized estimating equations
RBL | Robust Bayesian LASSO
RBLSS | Robust Bayesian LASSO with spike-and-slab prior
SCAD | Smoothly clipped absolute deviation
SNP | Single-nucleotide polymorphism
Appendix A
Appendix A.1
Error | Measure | BL | BLSS | RBL | RBLSS | Debiased-LASSO | Selective Inference
---|---|---|---|---|---|---|---
Error 1: N(0,1) | Error of β̂ | 1.773 | 0.250 | 2.898 | 0.274 | 1.038 | 0.718
 | Standard deviation | 0.312 | 0.113 | 0.299 | 0.121 | 0.026 | 0.023
 | Error of ŷ | 11.891 | 0.331 | 16.553 | 0.601 | 0.128 | 1.021
 | Standard deviation | 0.797 | 0.176 | 0.681 | 0.291 | 0.027 | 0.100
 | TP | 2.696 | 3.000 | 0.906 | 3.000 | 3.000 | 3.000
 | Standard deviation | 0.460 | 0.000 | 0.549 | 0.000 | 0.000 | 0.000
 | FP | 0.000 | 0.112 | 0.000 | 0.341 | 3.057 | 21.533
 | Standard deviation | 0.000 | 0.357 | 0.000 | 0.647 | 0.547 | 1.722
 | Coverage of β1 | 0.138 | 0.957 | 0.167 | 0.939 | 0.938 | 0.725
 | Coverage of β2 | 0.153 | 0.950 | 0.087 | 0.920 | 0.945 | 0.725
 | Coverage of β3 | 0.166 | 0.943 | 0.079 | 0.927 | 0.943 | 0.725
 | Average length (β1) | 0.789 | 0.412 | 1.030 | 0.416 | 0.458 | -
 | Average length (β2) | 0.901 | 0.411 | 1.309 | 0.413 | 0.456 | -
 | Average length (β3) | 0.908 | 0.410 | 1.607 | 0.412 | 0.457 | -
 | Coverage of zero coefficients | 0.995 | 0.994 | 0.998 | 0.994 | 0.957 | 0.020
 | Average length (zeros) | 0.405 | 0.006 | 0.814 | 0.010 | 0.456 | -
Error 2: t(2) | Error of β̂ | 2.549 | 1.096 | 3.125 | 0.376 | 3.049 | 1.730
 | Standard deviation | 0.551 | 0.881 | 0.372 | 0.285 | 0.103 | 0.081
 | Error of ŷ | 27.124 | 8.645 | 23.650 | 0.390 | 0.067 | 2.099
 | Standard deviation | 39.807 | 40.405 | 3.622 | 0.229 | 0.021 | 0.213
 | TP | 1.581 | 2.611 | 0.474 | 2.972 | 1.996 | 2.731
 | Standard deviation | 0.793 | 0.610 | 0.558 | 0.208 | 0.117 | 0.066
 | FP | 0.007 | 15.999 | 0.000 | 0.039 | 0.849 | 19.204
 | Standard deviation | 0.083 | 78.785 | 0.000 | 0.223 | 0.247 | 1.632
 | Coverage of β1 | 0.204 | 0.790 | 0.236 | 0.950 | 0.956 | 0.582
 | Coverage of β2 | 0.195 | 0.865 | 0.149 | 0.963 | 0.961 | 0.657
 | Coverage of β3 | 0.192 | 0.895 | 0.121 | 0.962 | 0.962 | 0.657
 | Average length (β1) | 0.969 | 1.073 | 1.106 | 0.666 | 1.282 | -
 | Average length (β2) | 1.218 | 1.127 | 1.636 | 0.629 | 1.281 | -
 | Average length (β3) | 1.374 | 1.152 | 1.653 | 0.631 | 1.276 | -
 | Coverage of zero coefficients | 0.997 | 0.996 | 0.999 | 0.994 | 0.954 | 0.018
 | Average length (zeros) | 0.703 | 0.208 | 0.923 | 0.007 | 1.282 | -
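For readers who want to probe results of this kind themselves, the de-sparsified (debiased) LASSO intervals summarized in the Debiased-LASSO column can be computed with the R package hdi [21]. The sketch below is minimal and hypothetical: the table above does not record n, p, or the effect sizes, so those values are placeholders rather than the authors' actual design.

```r
# A minimal sketch, NOT the authors' exact design: n, p, and the three
# nonzero coefficients below are hypothetical placeholders.
library(hdi)

set.seed(1)
n <- 100; p <- 500
beta <- c(rep(1, 3), rep(0, p - 3))       # three true signals, matching TP <= 3 above
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X %*% beta + rnorm(n))    # Error 1: N(0,1); use rt(n, 2) for Error 2

fit <- lasso.proj(x = X, y = y)           # de-sparsified (debiased) LASSO
ci  <- confint(fit, level = 0.95)         # 95% confidence interval per coefficient
ci[1:3, ]                                 # intervals for the nonzero coefficients
mean(ci[, 1] <= beta & beta <= ci[, 2])   # coverage in this single replicate
```

Averaging the coverage indicator over many replicates yields empirical coverage probabilities comparable to the table entries.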
References
- Saeys, Y.; Inza, I.; Larrañaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007, 23, 2507–2517. [Google Scholar] [CrossRef] [PubMed]
- Ma, S.; Huang, J. Penalized feature selection and classification in bioinformatics. Briefings Bioinform. 2008, 9, 392–403. [Google Scholar] [CrossRef] [PubMed]
- O’Hara, R.B.; Sillanpää, M.J. A review of Bayesian variable selection methods: What, how and which. Bayesian Anal. 2009, 4, 85–117. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical learning with sparsity. Monogr. Stat. Appl. Probab. 2015, 143, 8. [Google Scholar]
- Wu, Y.; Wang, L. A survey of tuning parameter selection for high-dimensional regression. Annu. Rev. Stat. Its Appl. 2020, 7, 209–226. [Google Scholar] [CrossRef]
- Breheny, P.J. Marginal false discovery rates for penalized regression models. Biostatistics 2019, 20, 299–314. [Google Scholar] [CrossRef]
- Friedman, J.; Hastie, T.; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008, 9, 432–441. [Google Scholar] [CrossRef]
- Fan, J.; Liao, Y.; Liu, H. An overview of the estimation of large covariance and precision matrices. Econom. J. 2016, 19, C1–C32. [Google Scholar] [CrossRef]
- Meinshausen, N.; Bühlmann, P. Stability selection. J. R. Stat. Soc. Ser. Stat. Methodol. 2010, 72, 417–473. [Google Scholar] [CrossRef]
- Benjamini, Y.; Heller, R.; Yekutieli, D. Selective inference in complex research. Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 2009, 367, 4255–4271. [Google Scholar] [CrossRef]
- Gelman, A.; Loken, E. The statistical crisis in science. Am. Sci. 2014, 102, 460–465. [Google Scholar] [CrossRef]
- Benjamini, Y. Selective inference: The silent killer of replicability. Harv. Data Sci. Rev. 2020, 2. [Google Scholar] [CrossRef]
- Wang, H. Bayesian graphical lasso models and efficient posterior computation. Bayesian Anal. 2012, 7, 867–886. [Google Scholar] [CrossRef]
- Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 289–300. [Google Scholar] [CrossRef]
- Benjamini, Y.; Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 2001, 29, 1165–1188. [Google Scholar] [CrossRef]
- Wu, C.; Ma, S. A selective review of robust variable selection with applications in bioinformatics. Briefings Bioinform. 2015, 16, 873–883. [Google Scholar] [CrossRef]
- Zhou, F.; Ren, J.; Lu, X.; Ma, S.; Wu, C. Gene–environment interaction: A variable selection perspective. Epistasis Methods Protoc. 2021, 2212, 191–223. [Google Scholar]
- Noh, H.; Chung, K.; Van Keilegom, I. Variable selection of varying coefficient models in quantile regression. Electron. J. Stat. 2012, 6, 1220–1238. [Google Scholar] [CrossRef]
- Tang, Y.; Wang, H.J.; Zhu, Z.; Song, X. A unified variable selection approach for varying coefficient models. Stat. Sin. 2012, 22, 601–628. [Google Scholar] [CrossRef]
- Zhou, F.; Ren, J.; Ma, S.; Wu, C. The Bayesian regularized quantile varying coefficient model. Comput. Stat. Data Anal. 2023, 107808. [Google Scholar] [CrossRef]
- Dezeure, R.; Bühlmann, P.; Meier, L.; Meinshausen, N. High-dimensional inference: Confidence intervals, p-values and R-software hdi. Stat. Sci. 2015, 30, 533–558. [Google Scholar] [CrossRef]
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
- Cai, T.T.; Sun, W. Large-scale global and simultaneous inference: Estimation and testing in very high dimensions. Annu. Rev. Econ. 2017, 9, 411–439. [Google Scholar] [CrossRef]
- Kuchibhotla, A.K.; Kolassa, J.E.; Kuffner, T.A. Post-selection inference. Annu. Rev. Stat. Its Appl. 2022, 9, 505–527. [Google Scholar] [CrossRef]
- Chernozhukov, V.; Chetverikov, D.; Kato, K.; Koike, Y. High-dimensional data bootstrap. Annu. Rev. Stat. Its Appl. 2023, 10, 427–449. [Google Scholar] [CrossRef]
- Huang, Y.; Li, C.; Li, R.; Yang, S. An overview of tests on high-dimensional means. J. Multivar. Anal. 2022, 188, 104813. [Google Scholar] [CrossRef]
- Zhang, D.; Khalili, A.; Asgharian, M. Post-model-selection inference in linear regression models: An integrated review. Stat. Surv. 2022, 16, 86–136. [Google Scholar] [CrossRef]
- Heinze, G.; Wallisch, C.; Dunkler, D. Variable selection–A review and recommendations for the practicing statistician. Biom. J. 2018, 60, 431–449. [Google Scholar] [CrossRef]
- Bühlmann, P.; Kalisch, M.; Meier, L. High-dimensional statistics with a view toward applications in biology. Annu. Rev. Stat. Its Appl. 2014, 1, 255–278. [Google Scholar] [CrossRef]
- Farcomeni, A. A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Stat. Methods Med. Res. 2008, 17, 347–388. [Google Scholar] [CrossRef]
- Lu, Z.; Lou, W. Bayesian approaches to variable selection: A comparative study from practical perspectives. Int. J. Biostat. 2022, 18, 83–108. [Google Scholar] [CrossRef] [PubMed]
- Fridley, B.L. Bayesian variable and model selection methods for genetic association studies. Genet. Epidemiol. Off. Publ. Int. Genet. Epidemiol. Soc. 2009, 33, 27–37. [Google Scholar] [CrossRef]
- Müller, P.; Parmigiani, G.; Rice, K. FDR and Bayesian Multiple Comparisons Rules. In Bayesian Statistics 8, Proceedings of the Eighth Valencia International Meeting, Valencia, Spain, 2–6 June 2006; Oxford University Press: Oxford, UK, 2007. [Google Scholar] [CrossRef]
- Bhadra, A.; Datta, J.; Polson, N.G.; Willard, B. Lasso meets horseshoe. Stat. Sci. 2019, 34, 405–427. [Google Scholar] [CrossRef]
- Shafer, G.; Vovk, V. A tutorial on conformal prediction. J. Mach. Learn. Res. 2008, 9, 371–421. [Google Scholar]
- Fontana, M.; Zeni, G.; Vantini, S. Conformal prediction: A unified review of theory and new challenges. Bernoulli 2023, 29, 1–23. [Google Scholar] [CrossRef]
- Angelopoulos, A.N.; Bates, S. Conformal prediction: A gentle introduction. Found. Trends Mach. Learn. 2023, 16, 494–591. [Google Scholar] [CrossRef]
- Fan, J.; Lv, J. A selective overview of variable selection in high dimensional feature space. Stat. Sin. 2010, 20, 101. [Google Scholar]
- Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis; Chapman and Hall/CRC, 1995. [Google Scholar]
- Bolstad, W.M.; Curran, J.M. Introduction to Bayesian Statistics; John Wiley & Sons: Hoboken, NJ, USA, 2016. [Google Scholar]
- Rousseeuw, P.; Yohai, V. Robust regression by means of S-estimators. In Robust and Nonlinear Time Series Analysis: Proceedings of a Workshop Organized by the Sonderforschungsbereich 123 “Stochastische Mathematische Modelle”; Springer: Berlin/Heidelberg, Germany, 1984; pp. 256–272. [Google Scholar]
- Huber, P.J.; Ronchetti, E.M. Robust Statistics; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
- Yu, K.; Moyeed, R.A. Bayesian quantile regression. Stat. Probab. Lett. 2001, 54, 437–447. [Google Scholar] [CrossRef]
- Svensén, M.; Bishop, C.M. Robust Bayesian mixture modelling. Neurocomputing 2005, 64, 235–252. [Google Scholar] [CrossRef]
- Hjort, N.L.; Holmes, C.; Müller, P.; Walker, S.G. Bayesian Nonparametrics; Cambridge University Press: Cambridge, UK, 2010; Volume 28. [Google Scholar]
- Ghosal, S.; van der Vaart, A.W. Fundamentals of Nonparametric Bayesian Inference; Cambridge University Press: Cambridge, UK, 2017; Volume 44. [Google Scholar]
- Hastie, T.; Tibshirani, R.; Friedman, J.H.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2. [Google Scholar]
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942. [Google Scholar] [CrossRef] [PubMed]
- Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
- Sun, T.; Zhang, C.H. Scaled sparse linear regression. Biometrika 2012, 99, 879–898. [Google Scholar] [CrossRef]
- Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
- Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K. Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. Stat. Methodol. 2005, 67, 91–108. [Google Scholar] [CrossRef]
- Tibshirani, R.J. The solution path of the generalized lasso. Ann. Stat. 2011, 39, 1335–1371. [Google Scholar] [CrossRef]
- Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. Stat. Methodol. 2006, 68, 49–67. [Google Scholar] [CrossRef]
- Meier, L.; Van De Geer, S.; Bühlmann, P. The group lasso for logistic regression. J. R. Stat. Soc. Ser. Stat. Methodol. 2008, 70, 53–71. [Google Scholar] [CrossRef]
- Huang, J.; Breheny, P.; Ma, S. A selective review of group selection in high-dimensional models. Stat. Sci. 2012, 27. [Google Scholar] [CrossRef]
- Li, C.; Li, H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 2008, 24, 1175–1182. [Google Scholar] [CrossRef]
- Huang, J.; Ma, S.; Li, H.; Zhang, C.H. The sparse Laplacian shrinkage estimator for high-dimensional regression. Ann. Stat. 2011, 39, 2021. [Google Scholar] [CrossRef] [PubMed]
- Wu, C.; Zhang, Q.; Jiang, Y.; Ma, S. Robust network-based analysis of the associations between (epi) genetic measurements. J. Multivar. Anal. 2018, 168, 119–130. [Google Scholar] [CrossRef] [PubMed]
- Wu, C.; Zhou, F.; Ren, J.; Li, X.; Jiang, Y.; Ma, S. A selective review of multi-level omics data integration using variable selection. High-Throughput 2019, 8, 4. [Google Scholar] [CrossRef] [PubMed]
- Meinshausen, N.; Bühlmann, P. High-dimensional graphs and variable selection with the lasso. Ann. Stat. 2006, 34, 1436–1462. [Google Scholar] [CrossRef]
- Rinaldo, A.; Wasserman, L.; G’Sell, M. Bootstrapping and sample splitting for high-dimensional, assumption-lean inference. Ann. Stat. 2019, 47, 3438–3469. [Google Scholar] [CrossRef]
- Lockhart, R.; Taylor, J.; Tibshirani, R.J.; Tibshirani, R. A significance test for the lasso. Ann. Stat. 2014, 42, 413. [Google Scholar] [CrossRef]
- Lee, J.D.; Sun, D.L.; Sun, Y.; Taylor, J.E. Exact post-selection inference, with application to the lasso. Ann. Stat. 2016, 44, 907–927. [Google Scholar] [CrossRef]
- Tibshirani, R.J.; Taylor, J.; Lockhart, R.; Tibshirani, R. Exact post-selection inference for sequential regression procedures. J. Am. Stat. Assoc. 2016, 111, 600–620. [Google Scholar] [CrossRef]
- Taylor, J.; Tibshirani, R.J. Statistical learning and selective inference. Proc. Natl. Acad. Sci. USA 2015, 112, 7629–7634. [Google Scholar] [CrossRef]
- Berk, R.; Brown, L.; Buja, A.; Zhang, K.; Zhao, L. Valid post-selection inference. Ann. Stat. 2013, 41, 802–837. [Google Scholar] [CrossRef]
- Bachoc, F.; Leeb, H.; Pötscher, B.M. Valid confidence intervals for post-model-selection predictors. Ann. Stat. 2019, 47, 1475–1504. [Google Scholar] [CrossRef]
- Bachoc, F.; Preinerstorfer, D.; Steinberger, L. Uniformly valid confidence intervals post-model-selection. Ann. Stat. 2020, 48, 440–463. [Google Scholar] [CrossRef]
- Javanmard, A.; Montanari, A. Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 2014, 15, 2869–2909. [Google Scholar]
- Bühlmann, P. Statistical significance in high-dimensional linear models. Bernoulli 2013, 19, 1212–1242. [Google Scholar] [CrossRef]
- Van de Geer, S.; Bühlmann, P.; Ritov, Y.; Dezeure, R. On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 2014, 42, 1166–1202. [Google Scholar] [CrossRef]
- Zhang, C.H.; Zhang, S.S. Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. Stat. Methodol. 2014, 76, 217–242. [Google Scholar] [CrossRef]
- Song, Q.; Liang, F. Nearly optimal Bayesian shrinkage for high-dimensional regression. Sci. China Math. 2023, 66, 409–442. [Google Scholar] [CrossRef]
- Javanmard, A.; Javadi, H. False discovery rate control via debiased lasso. Electron. J. Stat. 2019, 13, 1212–1253. [Google Scholar] [CrossRef]
- Liang, W.; Zhang, Q.; Ma, S. Hierarchical false discovery rate control for high-dimensional survival analysis with interactions. Comput. Stat. Data Anal. 2024, 192, 107906. [Google Scholar] [CrossRef]
- Wasserman, L.; Roeder, K. High dimensional variable selection. Ann. Stat. 2009, 37, 2178. [Google Scholar] [CrossRef]
- Meinshausen, N.; Meier, L.; Bühlmann, P. P-values for high-dimensional regression. J. Am. Stat. Assoc. 2009, 104, 1671–1681. [Google Scholar] [CrossRef]
- Shah, R.D.; Samworth, R.J. Variable selection with error control: Another look at stability selection. J. R. Stat. Soc. Ser. Stat. Methodol. 2013, 75, 55–80. [Google Scholar] [CrossRef]
- Dai, C.; Lin, B.; Xing, X.; Liu, J.S. False discovery rate control via data splitting. J. Am. Stat. Assoc. 2023, 118, 2503–2520. [Google Scholar] [CrossRef]
- Dai, C.; Lin, B.; Xing, X.; Liu, J.S. A scale-free approach for false discovery rate control in generalized linear models. J. Am. Stat. Assoc. 2023, 118, 1551–1565. [Google Scholar] [CrossRef]
- Candes, E.; Fan, Y.; Janson, L.; Lv, J. Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. J. R. Stat. Soc. Ser. Stat. Methodol. 2018, 80, 551–577. [Google Scholar] [CrossRef]
- Barber, R.F.; Candès, E.J. Controlling the false discovery rate via knockoffs. Ann. Stat. 2015, 43, 2055–2085. [Google Scholar] [CrossRef]
- Barber, R.F.; Candès, E.J. A knockoff filter for high-dimensional selective inference. Ann. Stat. 2019, 47, 2504–2537. [Google Scholar] [CrossRef]
- Barber, R.F.; Candès, E.J.; Samworth, R.J. Robust inference with knockoffs. Ann. Stat. 2020, 48, 1409–1431. [Google Scholar] [CrossRef]
- Romano, Y.; Sesia, M.; Candès, E. Deep knockoffs. J. Am. Stat. Assoc. 2020, 115, 1861–1872. [Google Scholar] [CrossRef]
- Jordon, J.; Yoon, J.; van der Schaar, M. KnockoffGAN: Generating knockoffs for feature selection using generative adversarial networks. In Proceedings of the International Conference on Learning Representations, Vancouver, Canada, 30 April–3 May 2018. [Google Scholar]
- Bates, S.; Candès, E.; Janson, L.; Wang, W. Metropolized knockoff sampling. J. Am. Stat. Assoc. 2021, 116, 1413–1427. [Google Scholar] [CrossRef]
- Sesia, M.; Sabatti, C.; Candès, E.J. Gene hunting with hidden Markov model knockoffs. Biometrika 2019, 106, 1–18. [Google Scholar] [CrossRef] [PubMed]
- Huang, D.; Janson, L. Relaxing the assumptions of knockoffs by conditioning. Ann. Stat. 2020, 48, 3021–3042. [Google Scholar] [CrossRef]
- Sesia, M.; Katsevich, E.; Bates, S.; Candès, E.; Sabatti, C. Multi-resolution localization of causal variants across the genome. Nat. Commun. 2020, 11, 1093. [Google Scholar] [CrossRef] [PubMed]
- Dai, R.; Barber, R. The knockoff filter for FDR control in group-sparse and multitask regression. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 19–24 June 2016; pp. 1851–1859. [Google Scholar]
- Dai, R.; Zheng, C. False discovery rate-controlled multiple testing for union null hypotheses: A knockoff-based approach. Biometrics 2023, 79, 3497–3509. [Google Scholar] [CrossRef]
- Zou, H.; Yuan, M. Composite quantile regression and the oracle model selection theory. Ann. Stat. 2008, 36, 1108–1126. [Google Scholar] [CrossRef]
- Wu, Y.; Liu, Y. Variable selection in quantile regression. Stat. Sin. 2009, 19, 801–817. [Google Scholar]
- Fan, J.; Fan, Y.; Barut, E. Adaptive robust variable selection. Ann. Stat. 2014, 42, 324. [Google Scholar] [CrossRef]
- Kepplinger, D. Robust variable selection and estimation via adaptive elastic net S-estimators for linear regression. Comput. Stat. Data Anal. 2023, 183, 107730. [Google Scholar] [CrossRef]
- Belloni, A.; Chernozhukov, V.; Kato, K. Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems. Biometrika 2015, 102, 77–94. [Google Scholar] [CrossRef]
- Belloni, A.; Chernozhukov, V.; Kato, K. Valid post-selection inference in high-dimensional approximately sparse quantile regression models. J. Am. Stat. Assoc. 2019, 114, 749–758. [Google Scholar] [CrossRef]
- Han, D.; Huang, J.; Lin, Y.; Shen, G. Robust post-selection inference of high-dimensional mean regression with heavy-tailed asymmetric or heteroskedastic errors. J. Econom. 2022, 230, 416–431. [Google Scholar] [CrossRef]
- He, X.; Pan, X.; Tan, K.M.; Zhou, W.X. Smoothed quantile regression with large-scale inference. J. Econom. 2023, 232, 367–388. [Google Scholar] [CrossRef] [PubMed]
- Huang, H. Controlling the false discoveries in LASSO. Biometrics 2017, 73, 1102–1110. [Google Scholar] [CrossRef] [PubMed]
- Su, W.; Bogdan, M.; Candes, E. False discoveries occur early on the lasso path. Ann. Stat. 2017, 45, 2133–2150. [Google Scholar] [CrossRef]
- Bogdan, M.; Van Den Berg, E.; Sabatti, C.; Su, W.; Candès, E.J. SLOPE—Adaptive variable selection via convex optimization. Ann. Appl. Stat. 2015, 9, 1103. [Google Scholar] [CrossRef]
- Liang, W.; Ma, S.; Lin, C. Marginal false discovery rate for a penalized transformation survival model. Comput. Stat. Data Anal. 2021, 160, 107232. [Google Scholar] [CrossRef]
- Tadesse, M.G.; Vannucci, M. Handbook of Bayesian Variable Selection; Chapman & Hall: London, UK, 2021. [Google Scholar]
- Ickstadt, K.; Schäfer, M.; Zucknick, M. Toward integrative Bayesian analysis in molecular biology. Annu. Rev. Stat. Its Appl. 2018, 5, 141–167. [Google Scholar] [CrossRef]
- Mallick, H.; Yi, N. Bayesian methods for high dimensional linear models. J. Biom. Biostat. 2013, 1, 005. [Google Scholar]
- Park, T.; Casella, G. The bayesian lasso. J. Am. Stat. Assoc. 2008, 103, 681–686. [Google Scholar] [CrossRef]
- Casella, G.; Ghosh, M.; Gill, J.; Kyung, M. Penalized regression, standard errors, and Bayesian lassos. Bayesian Anal. 2010, 5, 369–411. [Google Scholar] [CrossRef]
- Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R. A sparse-group lasso. J. Comput. Graph. Stat. 2013, 22, 231–245. [Google Scholar] [CrossRef]
- Mitchell, T.J.; Beauchamp, J.J. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 1988, 83, 1023–1032. [Google Scholar] [CrossRef]
- George, E.I.; McCulloch, R.E. Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 1993, 88, 881–889. [Google Scholar] [CrossRef]
- Barbieri, M.M.; Berger, J.O. Optimal predictive model selection. Ann. Stat. 2004, 32, 870–897. [Google Scholar] [CrossRef]
- Barbieri, M.M.; Berger, J.O.; George, E.I.; Ročková, V. The median probability model and correlated variables. Bayesian Anal. 2021, 16, 1085–1112. [Google Scholar] [CrossRef]
- Wasserman, L. Bayesian model selection and model averaging. J. Math. Psychol. 2000, 44, 92–107. [Google Scholar] [CrossRef]
- Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, C.T. Bayesian model averaging: A tutorial (with comments by M. Clyde, David Draper and E.I. George, and a rejoinder by the authors). Stat. Sci. 1999, 14, 382–417. [Google Scholar] [CrossRef]
- Xu, X.; Ghosh, M. Bayesian variable selection and estimation for group lasso. Bayesian Anal. 2015, 10, 909–936. [Google Scholar] [CrossRef]
- Ren, J.; Zhou, F.; Li, X.; Chen, Q.; Zhang, H.; Ma, S.; Jiang, Y.; Wu, C. Semiparametric Bayesian variable selection for gene-environment interactions. Stat. Med. 2020, 39, 617–638. [Google Scholar] [CrossRef]
- Bai, R.; Ročková, V.; George, E.I. Spike-and-slab meets LASSO: A review of the spike-and-slab LASSO. In Handbook of Bayesian Variable Selection; Chapman & Hall: London, UK, 2021; pp. 81–108. [Google Scholar]
- Carvalho, C.M.; Polson, N.G.; Scott, J.G. Handling sparsity via the horseshoe. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Clearwater Beach, FL, USA, 16–18 April 2009; pp. 73–80. [Google Scholar]
- Ročková, V. Bayesian estimation of sparse signals with a continuous spike-and-slab prior. Ann. Stat. 2018, 46, 401–437. [Google Scholar] [CrossRef]
- Carvalho, C.M.; Polson, N.G.; Scott, J.G. The horseshoe estimator for sparse signals. Biometrika 2010, 97, 465–480. [Google Scholar] [CrossRef]
- Polson, N.G.; Scott, J.G. Shrink globally, act locally: Sparse Bayesian regularization and prediction. Bayesian Stat. 2010, 9, 105. [Google Scholar]
- Ročková, V.; George, E.I. EMVS: The EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 2014, 109, 828–846. [Google Scholar] [CrossRef]
- Ročková, V.; George, E.I. The spike-and-slab lasso. J. Am. Stat. Assoc. 2018, 113, 431–444. [Google Scholar] [CrossRef]
- Fu, W.J. Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Stat. 1998, 7, 397–416. [Google Scholar] [CrossRef]
- Wu, T.T.; Lange, K. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2008, 2, 224–244. [Google Scholar] [CrossRef]
- Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1. [Google Scholar] [CrossRef]
- Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499. [Google Scholar] [CrossRef]
- Tang, Y.; Wang, H.J.; Zhu, Z. Variable selection in quantile varying coefficient models with longitudinal data. Comput. Stat. Data Anal. 2013, 57, 435–449. [Google Scholar] [CrossRef]
- Tang, Z.; Shen, Y.; Zhang, X.; Yi, N. The spike-and-slab Lasso generalized linear models for prediction and associated genes detection. Genetics 2017, 205, 77–88. [Google Scholar] [CrossRef]
- Liu, Y.; Ren, J.; Ma, S.; Wu, C. The spike-and-slab quantile LASSO for robust variable selection in cancer genomics studies. Stat. Med. 2024. [Google Scholar] [CrossRef] [PubMed]
- Nie, L.; Ročková, V. Bayesian bootstrap spike-and-slab LASSO. J. Am. Stat. Assoc. 2023, 118, 2013–2028. [Google Scholar] [CrossRef]
- Newton, M.A.; Raftery, A.E. Approximate Bayesian inference with the weighted likelihood bootstrap. J. R. Stat. Soc. Ser. Stat. Methodol. 1994, 56, 3–26. [Google Scholar] [CrossRef]
- Kuo, L.; Mallick, B. Variable selection for regression models. Sankhyā Indian J. Stat. Ser. B 1998, 60, 65–81. [Google Scholar]
- Carlin, B.P.; Chib, S. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. Stat. Methodol. 1995, 57, 473–484. [Google Scholar] [CrossRef]
- Bhattacharya, A.; Pati, D.; Pillai, N.S.; Dunson, D.B. Dirichlet–Laplace priors for optimal shrinkage. J. Am. Stat. Assoc. 2015, 110, 1479–1490. [Google Scholar] [CrossRef]
- Bhadra, A.; Datta, J.; Polson, N.G.; Willard, B. The horseshoe+ estimator of ultra-sparse signals. Bayesian Anal. 2017, 12, 1105–1131. [Google Scholar] [CrossRef]
- Johnson, V.E.; Rossell, D. On the use of non-local prior densities in Bayesian hypothesis tests. J. R. Stat. Soc. Ser. Stat. Methodol. 2010, 72, 143–170. [Google Scholar] [CrossRef]
- Shin, M.; Bhattacharya, A.; Johnson, V.E. Scalable Bayesian variable selection using nonlocal prior densities in ultrahigh-dimensional settings. Stat. Sin. 2018, 28, 1053. [Google Scholar]
- Yu, K.; Zhang, J. A three-parameter asymmetric Laplace distribution and its extension. Commun. Stat. Theory Methods 2005, 34, 1867–1879. [Google Scholar] [CrossRef]
- Li, Q.; Lin, N.; Xi, R. Bayesian regularized quantile regression. Bayesian Anal. 2010, 5, 533–556. [Google Scholar] [CrossRef]
- Lu, X.; Fan, K.; Ren, J.; Wu, C. Identifying gene–environment interactions with robust marginal Bayesian variable selection. Front. Genet. 2021, 12, 667074. [Google Scholar] [CrossRef] [PubMed]
- Ren, J.; Zhou, F.; Li, X.; Ma, S.; Jiang, Y.; Wu, C. Robust Bayesian variable selection for gene–environment interactions. Biometrics 2023, 79, 684–694. [Google Scholar] [CrossRef] [PubMed]
- Reich, B.J.; Bondell, H.D.; Wang, H.J. Flexible Bayesian quantile regression for independent and clustered data. Biostatistics 2010, 11, 337–352. [Google Scholar] [CrossRef]
- McLachlan, G.J.; Lee, S.X.; Rathnayake, S.I. Finite mixture models. Annu. Rev. Stat. Its Appl. 2019, 6, 355–378. [Google Scholar] [CrossRef]
- Neal, R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 2000, 9, 249–265. [Google Scholar] [CrossRef]
- Li, Y.; Schofield, E.; Gönen, M. A tutorial on Dirichlet process mixture modeling. J. Math. Psychol. 2019, 91, 128–144. [Google Scholar] [CrossRef]
- Görür, D.; Edward Rasmussen, C. Dirichlet process gaussian mixture models: Choice of the base distribution. J. Comput. Sci. Technol. 2010, 25, 653–664. [Google Scholar] [CrossRef]
- Khalili, A. An overview of the new feature selection methods in finite mixture of regression models. J. Iran. Stat. Soc. 2022, 10, 201–235. [Google Scholar]
- Barcella, W.; De Iorio, M.; Baio, G. A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models. Can. J. Stat. 2017, 45, 254–273. [Google Scholar] [CrossRef]
- Peel, D.; McLachlan, G.J. Robust mixture modelling using the t distribution. Stat. Comput. 2000, 10, 339–348. [Google Scholar] [CrossRef]
- Wu, C.; Li, G.; Zhu, J.; Cui, Y. Functional mapping of dynamic traits with robust t-distribution. PLoS ONE 2011, 6, e24902. [Google Scholar] [CrossRef] [PubMed]
- Yao, W.; Wei, Y.; Yu, C. Robust mixture regression using the t-distribution. Comput. Stat. Data Anal. 2014, 71, 116–127. [Google Scholar] [CrossRef]
- Lee, S.; McLachlan, G.J. Finite mixtures of multivariate skew t-distributions: Some recent and new results. Stat. Comput. 2014, 24, 181–202. [Google Scholar] [CrossRef]
- Wang, H.; Li, G.; Jiang, G. Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J. Bus. Econ. Stat. 2007, 25, 347–355. [Google Scholar] [CrossRef]
- Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar]
- Wang, L.; Zhou, J.; Qu, A. Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 2012, 68, 353–360. [Google Scholar] [CrossRef]
- Zhou, F.; Ren, J.; Li, G.; Jiang, Y.; Li, X.; Wang, W.; Wu, C. Penalized variable selection for lipid–environment interactions in a longitudinal lipidomics study. Genes 2019, 10, 1002. [Google Scholar] [CrossRef]
- Zhou, F.; Lu, X.; Ren, J.; Fan, K.; Ma, S.; Wu, C. Sparse group variable selection for gene–environment interactions in the longitudinal study. Genet. Epidemiol. 2022, 46, 317–340. [Google Scholar] [CrossRef]
- Breheny, P.; Huang, J. Penalized methods for bi-level variable selection. Stat. Its Interface 2009, 2, 369. [Google Scholar] [CrossRef]
- Li, J.; Wang, Z.; Li, R.; Wu, R. Bayesian group lasso for nonparametric varying-coefficient models with application to functional genome-wide association studies. Ann. Appl. Stat. 2015, 9, 640. [Google Scholar] [CrossRef] [PubMed]
- Wu, C.; Cui, Y.; Ma, S. Integrative analysis of gene–environment interactions under a multi-response partially linear varying coefficient model. Stat. Med. 2014, 33, 4988–4998. [Google Scholar] [CrossRef] [PubMed]
- Wu, C.; Shi, X.; Cui, Y.; Ma, S. A penalized robust semiparametric approach for gene–environment interactions. Stat. Med. 2015, 34, 4016–4030. [Google Scholar] [CrossRef] [PubMed]
- Wu, C.; Zhong, P.S.; Cui, Y. Additive varying-coefficient model for nonlinear gene-environment interactions. Stat. Appl. Genet. Mol. Biol. 2018, 17, 20170008. [Google Scholar] [CrossRef] [PubMed]
- Han, A.K. Non-parametric analysis of a generalized regression model: The maximum rank correlation estimator. J. Econom. 1987, 35, 303–316. [Google Scholar] [CrossRef]
- Khan, S.; Tamer, E. Partial rank estimation of duration models with general forms of censoring. J. Econom. 2007, 136, 251–280. [Google Scholar] [CrossRef]
- Steele, J.M.; Steiger, W.L. Algorithms and complexity for least median of squares regression. Discret. Appl. Math. 1986, 14, 93–100. [Google Scholar] [CrossRef]
- Alfons, A.; Croux, C.; Gelper, S. Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann. Appl. Stat. 2013, 7, 226–248. [Google Scholar]
- She, Y.; Owen, A.B. Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 2011, 106, 626–639. [Google Scholar] [CrossRef]
- Wang, X.; Jiang, Y.; Huang, M.; Zhang, H. Robust variable selection with exponential squared loss. J. Am. Stat. Assoc. 2013, 108, 632–643. [Google Scholar] [CrossRef]
- Yu, C.; Yao, W. Robust linear regression: A review and comparison. Commun. Stat.-Simul. Comput. 2017, 46, 6261–6282. [Google Scholar] [CrossRef]
- Yang, Y.; Wang, H.J.; He, X. Posterior inference in Bayesian quantile regression with asymmetric Laplace likelihood. Int. Stat. Rev. 2016, 84, 327–344. [Google Scholar] [CrossRef]
- Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877. [Google Scholar] [CrossRef]
- Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4. [Google Scholar]
- Wang, G.; Sarkar, A.; Carbonetto, P.; Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. Stat. Methodol. 2020, 82, 1273–1300. [Google Scholar] [CrossRef] [PubMed]
- Logsdon, B.A.; Hoffman, G.E.; Mezey, J.G. A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis. BMC Bioinform. 2010, 11, 1–13. [Google Scholar] [CrossRef] [PubMed]
- Carbonetto, P.; Stephens, M. Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Anal. 2012, 7, 73–108. [Google Scholar] [CrossRef]
- Sunnåker, M.; Busetto, A.G.; Numminen, E.; Corander, J.; Foll, M.; Dessimoz, C. Approximate Bayesian computation. PLoS Comput. Biol. 2013, 9, e1002803. [Google Scholar] [CrossRef]
- Beaumont, M.A. Approximate Bayesian computation. Annu. Rev. Stat. Its Appl. 2019, 6, 379–403. [Google Scholar] [CrossRef]
- Zhang, L.; Baladandayuthapani, V.; Mallick, B.K.; Manyam, G.C.; Thompson, P.A.; Bondy, M.L.; Do, K.A. Bayesian hierarchical structured variable selection methods with application to molecular inversion probe studies in breast cancer. J. R. Stat. Soc. Ser. C Appl. Stat. 2014, 63, 595–620. [Google Scholar] [CrossRef]
- Ren, J.; Zhou, F.; Li, X.; Wu, C. Package ‘roben’, 2020. Available online: https://cran.r-project.org/web/packages/roben/index.html (accessed on 23 July 2024).
- Xia, L.; Nan, B.; Li, Y. Debiased lasso for generalized linear models with a diverging number of covariates. Biometrics 2023, 79, 344–357. [Google Scholar] [CrossRef]
- Wang, L.; Li, H.; Huang, J.Z. Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 2008, 103, 1556–1569. [Google Scholar] [CrossRef]
- Koenker, R.; Portnoy, S.; Ng, P.T.; Zeileis, A.; Grosjean, P.; Ripley, B.D. Package ‘quantreg’. Reference Manual. 2018. Available online: https://cran.r-project.org/web/packages/quantreg/quantreg.pdf (accessed on 23 July 2024).
- Sherwood, B.; Maidman, A. Package ‘rqPen’. R Foundation for Statistical Computing; R Core Team: Vienna, Austria, 2017. [Google Scholar]
- Vilenchik, D. Simple statistics are sometime too simple: A case study in social media data. IEEE Trans. Knowl. Data Eng. 2019, 32, 402–408. [Google Scholar] [CrossRef]
Reference | Type | Description | Inference Procedure | Numerical Study |
---|---|---|---|---|
Dezeure et al. (2015) [21] | Theory and frequentist | Provided a selective survey of high-dimensional inference for frequentist regularization methods, focusing on sample splitting, regularized projection, and bias correction. The R package hdi is also introduced. | Confidence intervals and p-values in both linear and generalized linear models. Multiple testing correction included. | Both simulation and a case study on riboflavin data. |
Fan and Lv (2010) [38] | Theory and frequentist | Systematically overviewed the theory, methods, and applications of high-dimensional variable selection, including the oracle property and ultrahigh-dimensional settings. | Discussed oracle properties in both the classical and ultrahigh-dimensional settings. | No. |
Cai and Sun (2017) [23] | Theory and frequentist | Surveyed recently developed large-scale multiple testing procedures with FDR control and examined efficient procedures for handling hierarchical, grouped, and dependent structures. | Multiple testing with FDR control. | No. |
Kuchibhotla et al. (2022) [24] | Theory and frequentist | Reviewed three categories of inference methods after variable selection: sample splitting, simultaneous inference and conditional selective inference. | Confidence intervals and p-values. | No simulation. Case studies on Boston Housing data. |
Chernozhukov et al. (2023) [25] | Theory and frequentist | Reviewed recent developments in the high-dimensional bootstrap, including high-dimensional central limit theorems, multiplier and empirical bootstraps, and applications. | Confidence intervals and p-values. | Provided R code to compute different versions of p-values on hedge fund data (n = 50, p = 2000). |
Huang et al. (2022) [26] | Theory and frequentist | Surveyed statistical tests for high-dimensional mean problems, with a focus on two-sample mean tests for differential gene expression analysis. | Power of tests and type I error control. | Simulation. |
Zhang et al. (2022) [27] | Theory and frequentist | A theoretical review of post-selection inference under linear models. | Examined confidence intervals and coverage probabilities. | Simulation. |
Heinze et al. (2018) [28] | Applied (for practicing statisticians) and frequentist | Surveyed variable selection methods for low-dimensional problems, including backward/forward/stepwise/best-subset selection and the LASSO. | Inference was not explicitly examined. Discussed model stability, resampling, and the bootstrap. | No simulation. Case study on body-fat data (n = 251, p = 13). |
Bühlmann et al. (2014) [29] | Applied and frequentist | Reviewed uncertainty quantification using type I error and p-values for high-dimensional linear models (including generalized linear and mixed models), graphical models, and causal inference. | FDR and p-values. | No simulation. A case study on riboflavin data with n = 71 and p = 4088. |
Benjamini et al. (2009) [10] | Applied and frequentist | Summarized the current successes and future trends in inference with FDR and multiple comparisons. | Discussed the false discovery rate (FDR) and the family-wise error rate (FWER). | No. |
Farcomeni (2008) [30] | Applied and frequentist | Reviewed multiple hypothesis testing with control of different error measures related to the FDR and its variants. | Assessed the validity of controlling a variety of FDR-related error measures in multiple hypothesis testing. | Simulation and two case studies on clinical trials with multiple endpoints and DNA microarrays. |
O’Hara and Sillanpää (2009) [3] | Theory and Bayesian | Reviewed major categories of Bayesian variable selection methods, including indicator model selection, adaptive shrinkage, and stochastic search variable selection. | Posterior distributions of regression coefficients and posterior inclusion probabilities. | Both simulation and real data. |
Lu and Lou (2022) [31] | Applied and Bayesian | Surveyed Bayesian variable selection under a variety of shrinkage priors and conducted a comprehensive comparative study. | Coverage probability of prediction assessed on real data. | Simulation and a case study on body-fat data (n = 251, p = 13). |
Fridley (2009) [32] | Applied and Bayesian | Reviewed and compared Bayesian variable and model selection in genetic associations. | Posterior inclusion probability and credible intervals. | Both simulation and case studies on 17 SNPs genotyped from two genes. |
Müller et al. (2007) [33] | Theory and Bayesian | Reviewed and compared Bayesian approaches to multiple testing. | Bayesian FDR and its variants. | Real data from DNA microarray studies. |
Bhadra et al. (2019) [34] | Theory and Bayesian | Surveyed two major classes of shrinkage methods, the LASSO and the horseshoe, with respect to high-dimensional inference, efficiency, and scalability. | Examined theoretical optimality in high-dimensional inference. | Used simulated data to check theoretical assumptions. |
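Many of the FDR-based procedures surveyed above trace back to Benjamini–Hochberg adjustment of a vector of marginal p-values [14], which is available in base R. A minimal illustration follows; the p-values below are made up for demonstration and are not from the paper.

```r
# Benjamini-Hochberg FDR control on a vector of (made-up) p-values.
pvals <- c(0.0001, 0.0020, 0.0080, 0.0350, 0.0410, 0.2000, 0.5500, 0.9100)
p_bh  <- p.adjust(pvals, method = "BH")   # step-up adjusted p-values
which(p_bh <= 0.10)                       # hypotheses rejected at FDR level 0.10
```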
Error | Measure | BL | BLSS | RBL | RBLSS | Debiased-LASSO | Selective Inference
---|---|---|---|---|---|---|---
Error 1: N(0,1) | Error of β̂ | 0.951 | 0.305 | 1.713 | 0.329 | 0.688 | 0.497
 | Standard deviation | 0.213 | 0.146 | 0.284 | 0.159 | 0.019 | 0.017
 | Error of ŷ | 10.782 | 0.312 | 14.739 | 0.576 | 0.061 | 0.712
 | Standard deviation | 0.777 | 0.333 | 0.818 | 0.285 | 0.021 | 0.085
 | TP | 2.961 | 3.000 | 1.941 | 3.000 | 3.000 | 3.000
 | Standard deviation | 0.194 | 0.000 | 0.419 | 0.000 | 0.000 | 0.000
 | FP | 0.000 | 0.096 | 0.000 | 0.275 | 1.430 | 14.996
 | Standard deviation | 0.000 | 0.333 | 0.000 | 0.560 | 0.438 | 1.503
 | Coverage of β1 | 0.774 | 0.942 | 0.946 | 0.920 | 0.913 | 0.788
 | Coverage of β2 | 0.966 | 0.945 | 0.990 | 0.931 | 0.775 | 0.790
 | Coverage of β3 | 0.740 | 0.949 | 0.795 | 0.932 | 0.911 | 0.784
 | Average length (β1) | 0.937 | 0.472 | 1.369 | 0.476 | 0.483 | -
 | Average length (β2) | 1.075 | 0.526 | 1.892 | 0.531 | 0.482 | -
 | Average length (β3) | 1.008 | 0.471 | 1.892 | 0.477 | 0.481 | -
 | Coverage of zero coefficients | 0.994 | 0.994 | 0.996 | 0.994 | 0.956 | 0.015
 | Average length (zeros) | 0.414 | 0.006 | 0.829 | 0.010 | 0.482 | -
Error 2: t(2) | Error of β̂ | 1.602 | 1.353 | 2.033 | 0.486 | 2.222 | 1.243
 | Standard deviation | 0.784 | 0.941 | 0.451 | 0.331 | 0.103 | 0.065
 | Error of ŷ | 25.542 | 4.911 | 23.405 | 0.383 | 0.025 | 1.416
 | Standard deviation | 0.063 | 14.225 | 4.082 | 0.225 | 0.017 | 0.176
 | TP | 2.200 | 2.437 | 1.350 | 2.948 | 2.409 | 2.856
 | Standard deviation | 0.784 | 0.612 | 0.720 | 0.222 | 0.093 | 0.047
 | FP | 0.004 | 10.376 | 0.000 | 0.042 | 0.329 | 13.183
 | Standard deviation | 0.063 | 62.511 | 0.000 | 0.186 | 0.212 | 1.357
 | Coverage of β1 | 0.751 | 0.745 | 0.906 | 0.954 | 0.910 | 0.722
 | Coverage of β2 | 0.898 | 0.843 | 0.949 | 0.960 | 0.755 | 0.788
 | Coverage of β3 | 0.636 | 0.906 | 0.665 | 0.964 | 0.901 | 0.794
 | Average length (β1) | 1.219 | 1.143 | 1.448 | 0.833 | 1.418 | -
 | Average length (β2) | 1.601 | 1.555 | 1.971 | 0.856 | 1.409 | -
 | Average length (β3) | 1.564 | 1.348 | 2.004 | 0.728 | 1.412 | -
 | Coverage of zero coefficients | 0.996 | 0.996 | 0.997 | 0.995 | 0.955 | 0.015
 | Average length (zeros) | 0.704 | 0.120 | 0.953 | 0.008 | 1.414 | -
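The Selective Inference column in these tables corresponds to conditional post-selection inference at a fixed LASSO penalty [151]. A minimal sketch with the selectiveInference R package follows; the data-generating values and the penalty here are hypothetical and do not reproduce the simulation design above.

```r
# A minimal sketch of selective inference after the lasso at a FIXED penalty;
# n, p, effect sizes, and lambda are hypothetical placeholders.
library(glmnet)
library(selectiveInference)

set.seed(2)
n <- 100; p <- 50
X <- scale(matrix(rnorm(n * p), n, p))
y <- as.numeric(X[, 1:3] %*% c(1, 1.5, 2) + rnorm(n))

lambda <- 20                                      # fixed penalty (not cross-validated)
gfit <- glmnet(X, y, standardize = FALSE)
bhat <- coef(gfit, x = X, y = y, s = lambda / n, exact = TRUE)[-1]

out <- fixedLassoInf(X, y, bhat, lambda,
                     sigma = estimateSigma(X, y)$sigmahat)
out                                               # selective p-values and intervals
```

Note that the intervals are valid conditionally on the selected model, which is why coverage for the true zero coefficients in the tables above behaves very differently from the unconditional methods.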
Reference | Model | Inferences and Software | Case Study |
---|---|---|---|
Wang et al. (2012) [160] | Penalized generalized estimating equations (PGEE) for longitudinal data | Confidence intervals based on asymptotic properties of the oracle estimator. R package PGEE (ver 1.5). | Yeast cell-cycle gene expression (GE) data, n = 297, p = 96. Response: log-transformed time-varying GE; predictors: matching scores of binding probabilities. |
Breheny (2019) [6] | Penalized regression (LASSO, SCAD, and MCP) | Marginal FDR for penalized regression. R package ncvreg (ver 3.14.3). | (1) TCGA breast cancer data, n = 536, p = 17,322. Response: BRCA1 expression; predictors: GE. (2) Genetic association study of cardiac fibrosis, n = 313, p = 660,496. Response: log-scale ratio of cardiomyocytes to fibroblasts in heart tissue; predictors: SNPs. |
Xia et al. (2023) [185] | De-biased LASSO for generalized linear models | Confidence intervals based on a refined de-biasing estimation approach. | Boston lung cancer data. Response: binary, with 723 controls and 651 cases (n = 1374); predictors: 103 SNPs and 4 clinical covariates. |
Meinshausen and Bühlmann (2010) [9] | Graphical LASSO and LASSO | Error control on the expected number of falsely selected edges of the graph. | Riboflavin (vitamin gene-expression) data with n = 115 subjects and p = 160 GEs. No phenotype. |
Zhang et al. (2014) [183] | Generalized hierarchical structured (bi-level) Bayesian variable selection | Bayesian credible intervals and FDR. | Breast cancer study. Response: binary; cases: 184 TNBC subtype; controls: 787 other and unclassified subtypes (n = 971). Predictors: 167,574 probes for copy number measurements. |
Zhou et al. (2023) [20] | Bayesian regularized quantile varying coefficient model | Bayesian credible intervals on nonlinear gene–environment interaction effects. R package pqrBayes (ver 1.0.2). | Type 2 diabetes data with SNP measurements, n = 1716. Response: BMI; predictors: 53,408 SNPs. |
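As one concrete entry point to the software in this table, the marginal FDR approach of Breheny (2019) [6] is implemented in ncvreg. The sketch below uses the small prostate dataset shipped with the package rather than the TCGA or cardiac fibrosis data analyzed in the paper.

```r
# A minimal sketch of marginal FDR (mFDR) along a penalized regression path,
# using ncvreg's bundled prostate data in place of the paper's case-study data.
library(ncvreg)

data(Prostate)                               # example data included with ncvreg
fit <- ncvreg(Prostate$X, Prostate$y, penalty = "lasso")
head(mfdr(fit))                              # estimated mFDR along the penalty path
```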
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).