Asymptotics of Subsampling for Generalized Linear Regression Models under Unbounded Design
Abstract
1. Introduction
2. Preliminaries
2.1. Subsampling M-Estimation
2.2. Generalized Linear Models
3. Main Results
3.1. Subsampling M-Estimation Problem
3.2. Conditional Asymptotic Properties of Subsampled GLMs with Unbounded Covariates
- (A.1)
- The range of the unknown parameter is an open subset of .
- (A.2)
- For any , .
- (A.3)
- For any and , , where .
- (A.4)
- For any and , there exists a function such that
- (A.5)
- When and , where and is the smallest eigenvalue of the matrix .
- (A.6)
- , .
3.3. Unconditional Asymptotic Properties of Subsampled GLMs with Unbounded Covariates
- (B.1)
- where is the unbounded covariate of GLMs.
- (B.2)
- For ,
- (B.3)
- (B.4)
- in (3) is twice continuously differentiable and each of its derivatives has a positive minimum.
- (B.5)
- in (3) is twice continuously differentiable and each of its derivatives has a positive minimum.
- (C.1)
- (C.2)
- , for , where means the k-th element of vector and means the j-th element of vector .
- (C.3)
- is three times continuously differentiable for every x within its domain.
- (C.4)
- For any , .
- (C.5)
- and .
- (C.6)
- ,
- (C.7)
4. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Technical Details
- (D.1)
- (D.2)
- The Fisher information matrix
- (D.3)
- For any given , there exists a positive number and a positive function such that and
- (E.1)
- is finite and nonsingular.
- (E.2)
- For ,
- (E.3)
- For ,
- (F.1)
- ;
- (F.2)
- for some sequence of positive definite matrices with i.e., the largest eigenvalue is uniformly bounded;
- (F.3)
- For some probability distribution , ∗ denotes convolution and denotes the law of random variates:
- (G.1)
- ;
- (G.2)
- . Then
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Teng, G.; Tian, B.; Zhang, Y.; Fu, S. Asymptotics of Subsampling for Generalized Linear Regression Models under Unbounded Design. Entropy 2023, 25, 84. https://doi.org/10.3390/e25010084