Resampling under Complex Sampling Designs: Roots, Development and the Way Forward
Abstract
1. Introduction
1.1. Generalities
1.2. Superpopulation Model and Sampling Design: Basic Aspects
- (a)
- is a probability measure on for every in .
- (b)
- is a Borel-measurable function of for every .
1.3. Descriptive and Analytic Inference
2. From Efron’s iid Bootstrap to Pseudo-Population Based Resampling
2.1. Efron’s Bootstrap: A Few Basic Aspects
- Conditionally on , the r.v.s in are i.i.d. with common d.f. , the finite population d.f.
- Unconditionally, the r.v.s in are i.i.d. with common d.f. .
- E1.
- Conditionally on , converges weakly to a Brownian bridge W on the scale of as N, n increase. The same result also holds unconditionally.
- E2.
- weakly converges to a Brownian bridge W on the scale of as N increases.
- E3.
- and are asymptotically independent.
- E4.
- If , with , then converges weakly to , as n, N increase.
- E5.
- Conditionally on , , converges weakly to a Brownian bridge on the scale of as N, n increase.
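As a concrete reference point for the i.i.d. case discussed in this section, the following minimal sketch implements Efron's bootstrap for a generic statistic. The function name `efron_bootstrap`, the simulated normal data, and the choice of the sample mean as the statistic are illustrative assumptions, not part of the original text.

```python
import numpy as np


def efron_bootstrap(sample, statistic, n_boot=1000, seed=0):
    """Return n_boot bootstrap replicates of `statistic` (hypothetical helper).

    Each replicate is computed on n values drawn with replacement from the
    observed sample, i.e., i.i.d. draws from the empirical d.f.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.size
    reps = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(sample, size=n, replace=True)
        reps[b] = statistic(resample)
    return reps


# Toy i.i.d. data (assumed for illustration only).
data_rng = np.random.default_rng(42)
y = data_rng.normal(loc=10.0, scale=2.0, size=200)

reps = efron_bootstrap(y, np.mean, n_boot=2000, seed=1)
boot_se = reps.std(ddof=1)

# For the sample mean, the ideal bootstrap standard error has a closed form:
# the plug-in standard deviation divided by sqrt(n).
plugin_se = y.std(ddof=0) / np.sqrt(y.size)
print(f"bootstrap SE: {boot_se:.4f}, plug-in SE: {plugin_se:.4f}")
```

For the mean, the Monte Carlo standard error should agree with the closed-form plug-in value up to simulation noise, which gives a quick sanity check of the resampling loop.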
3. Failure of Efron’s Bootstrap in the Non-i.i.d. Case
- S1.
- Conditionally on , converges weakly to , where W is a Brownian bridge on the scale of as N, n increase. The same result also holds unconditionally.
- S2.
- weakly converges to a Brownian bridge W on the scale of as N increases.
- S3.
- and are asymptotically independent.
- S4.
- converges weakly to W, a Brownian bridge on the scale of , as n, N increase.
- S5.
- Conditionally on and , converges weakly to a Brownian bridge on the scale of as N, n increase.
- This is the closest to Efron’s original idea of replicating, at a sample level, the sampling process from the population.
- This is the only resampling procedure justified by asymptotic arguments similar to those of [17] for Efron’s bootstrap.
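The role of the sampling fraction in the non-i.i.d. case can be checked numerically. The sketch below assumes simple random sampling without replacement, chosen only because its design-based variance estimator for the sample mean, (1 - f) s²/n with f = n/N, is classical. It compares that estimator with the variance obtained by applying Efron's bootstrap naively to the sample. The population model, the sizes N and n, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy finite population and sample sizes (sampling fraction f = 0.4).
N, n = 1000, 400
f = n / N
population = rng.gamma(shape=2.0, scale=3.0, size=N)

# One sample selected by simple random sampling without replacement.
sample = rng.choice(population, size=n, replace=False)

# Classical design-based variance estimator of the sample mean,
# including the finite population correction (1 - f).
design_var_est = (1.0 - f) * sample.var(ddof=1) / n

# Naive Efron bootstrap: resample n units with replacement from the sample,
# ignoring the way the sample was selected.
n_boot = 4000
boot_means = np.array(
    [rng.choice(sample, size=n, replace=True).mean() for _ in range(n_boot)]
)
naive_boot_var = boot_means.var(ddof=1)

# The naive bootstrap variance is close to s^2 / n, so the ratio is
# roughly 1 / (1 - f): the finite population correction is missed.
ratio = naive_boot_var / design_var_est
print(f"naive bootstrap / design-based variance: {ratio:.2f}")
print(f"1 / (1 - f): {1.0 / (1.0 - f):.2f}")
```

With a sampling fraction that does not vanish, the naive bootstrap overstates the variance by a factor of about 1/(1 - f), which is the kind of mismatch that motivates design-aware resampling.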
4. Accounting for the Sampling Design in Resampling: The Pseudo-Population Approach
4.1. Pseudo-Populations: Definition
4.2. Resampling from Pseudo-Populations
4.3. Resampling Based on Pseudo-Populations: Basic Results for Descriptive Inference
- Under appropriate regularity conditions, the conditional distribution of , given and , converges weakly, as both n and N tend to infinity, to a Gaussian process with null mean function and covariance kernel . This result, furthermore, holds for a set of sequences of s and s having -probability 1.
- If the functional is Hadamard-differentiable at with Hadamard derivative , then, again conditionally on and , tends in distribution to , which is a Normal variate with zero expectation and variance .
- .
- Under appropriate regularity conditions, the conditional distribution of , given , , , converges weakly, as both n and N tend to infinity, to a Gaussian process with a null mean function and covariance kernel . This result, furthermore, holds for a set of sequences of s and s having -probability 1 and in probability w.r.t. the sampling design.
- .
- If the functional is continuously Hadamard-differentiable at , with Hadamard derivative , then, again conditionally on , , , tends in distribution to , which turns out to be a Normal variate with zero expectation and variance .
- Conditional approach. A single pseudo-population is constructed, and M independent bootstrap samples are drawn. In this way, M independent replications are generated.
- Unconditional approach. M independent pseudo-populations are constructed, and from each of them, a single bootstrap sample is drawn. In this case, M independent replications are generated.
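The two approaches above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pseudo-population is built by replicating each sampled unit floor(1/π_i) times plus one extra copy with probability equal to the fractional part of 1/π_i, resampling uses Poisson sampling with the original inclusion probabilities, and the statistic is the Hájek mean. All function names are hypothetical.

```python
import numpy as np


def build_pseudo_population(y, pi, rng):
    """Assumed construction: randomized rounding of the weights 1/pi."""
    w = 1.0 / pi
    base = np.floor(w).astype(int)
    extra = rng.random(w.size) < (w - base)  # one more copy w.p. frac(1/pi)
    counts = base + extra
    return np.repeat(y, counts), np.repeat(pi, counts)


def hajek_from_poisson(y_star, pi_star, rng):
    """Draw a Poisson sample from the pseudo-population; return its Hajek mean."""
    while True:
        take = rng.random(y_star.size) < pi_star
        if take.any():  # redraw in the (very unlikely) case of an empty sample
            break
    w = 1.0 / pi_star[take]
    return np.sum(w * y_star[take]) / np.sum(w)


def conditional_replicates(y, pi, M, rng):
    """One pseudo-population, M independent bootstrap samples."""
    y_star, pi_star = build_pseudo_population(y, pi, rng)
    return np.array([hajek_from_poisson(y_star, pi_star, rng) for _ in range(M)])


def unconditional_replicates(y, pi, M, rng):
    """M independent pseudo-populations, one bootstrap sample from each."""
    reps = np.empty(M)
    for m in range(M):
        y_star, pi_star = build_pseudo_population(y, pi, rng)
        reps[m] = hajek_from_poisson(y_star, pi_star, rng)
    return reps


# Toy finite population and Poisson sample (assumed for illustration only).
rng = np.random.default_rng(123)
N = 2000
x = rng.uniform(1.0, 5.0, size=N)
y_pop = 3.0 + 2.0 * x + rng.normal(size=N)
pi_pop = 200 * x / x.sum()  # inclusion probabilities, expected sample size 200
in_sample = rng.random(N) < pi_pop
y, pi = y_pop[in_sample], pi_pop[in_sample]

hajek_hat = np.sum(y / pi) / np.sum(1.0 / pi)
cond = conditional_replicates(y, pi, M=500, rng=rng)
uncond = unconditional_replicates(y, pi, M=500, rng=rng)
print(f"Hajek estimate: {hajek_hat:.3f}")
print(f"conditional SE: {cond.std(ddof=1):.3f}, unconditional SE: {uncond.std(ddof=1):.3f}")
```

The only difference between the two functions is where the pseudo-population is built: once outside the loop in the conditional approach, and inside the loop in the unconditional one. With a deterministic pseudo-population construction the two would coincide; the randomized rounding assumed here is what makes them differ.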
4.4. Resampling Based on Pseudo-Populations: Basic Results for Analytic Inference
- The generation of s from the superpopulation model.
- The selection of the sample from the finite population.
- 1.
- Under appropriate regularity conditions, the (unconditional) distribution of converges weakly, as both n and N tend to infinity, to a Gaussian process with a null mean function and covariance kernel .
- .
- Under appropriate regularity conditions, and conditionally on , , , the distribution of converges weakly, as both n and N tend to infinity, to the same Gaussian process with a null mean function and covariance kernel .
- 2.
- The limiting process can be written as , where is the limiting Gaussian process obtained for descriptive inference, is an independent Gaussian process (essentially, a Brownian bridge on the scale of ), and f is the limiting value of the sampling fraction.
- 3.
- If the functional is Hadamard-differentiable at , with Hadamard derivative , then tends in distribution to , which turns out to be a Normal variate with zero expectation and variance .
- .
- If the functional is continuously Hadamard-differentiable at , with Hadamard derivative , then, conditionally on , , and , tends in distribution to the same Normal variate with zero expectation and variance .
5. Computational Issues
6. Open Problems and Final Considerations
Author Contributions
Funding
Institutional Review Board Statement
Conflicts of Interest
References
- Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26.
- Mashreghi, Z.; Haziza, D.; Léger, C. A survey of bootstrap methods in finite population sampling. Stat. Surv. 2016, 10, 1–52.
- McCarthy, P.J.; Snowden, C.B. The bootstrap and finite population sampling. In Vital and Health Statistics; Public Health Service Publication, U.S. Government Printing: Washington, DC, USA, 1985; Volume 95, pp. 1–23.
- Rao, J.N.K.; Wu, C.F.J. Resampling inference with complex survey data. J. Am. Stat. Assoc. 1988, 83, 231–241.
- Sitter, R.R. A resampling procedure for complex data. J. Am. Stat. Assoc. 1992, 87, 755–765.
- Chatterjee, A. Asymptotic properties of sample quantiles from a finite population. Ann. Inst. Stat. Math. 2011, 63, 157–179.
- Rao, J.N.K.; Wu, C.F.J.; Yue, K. Some recent work on resampling methods for complex surveys. Surv. Methodol. 1992, 18, 209–217.
- Conti, P.L.; Marella, D. Inference for quantiles of a finite population: Asymptotic vs. resampling results. Scand. J. Stat. 2015, 42, 545–561.
- Beaumont, J.F.; Patak, Z. On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling. Int. Stat. Rev. 2012, 80, 127–148.
- Antal, E.; Tillé, Y. A direct bootstrap method for complex sampling designs from a finite population. J. Am. Stat. Assoc. 2011, 106, 534–543.
- Gross, S.T. Median estimation in sample surveys. In Proceedings of the Section on Survey Research Methods, American Statistical Association, Houston, TX, USA, 11–14 August 1980; pp. 181–184.
- Chao, M.T.; Lo, S.H. A bootstrap method for finite population. Sankhya 1985, 47, 399–405.
- Booth, J.G.; Butler, R.W.; Hall, P. Bootstrap methods for finite populations. J. Am. Stat. Assoc. 1994, 89, 1282–1289.
- Holmberg, A. A bootstrap approach to probability proportional-to-size sampling. In Proceedings of the ASA Section on Survey Research Methods, Alexandria, VA, USA, 1998; pp. 378–383.
- Chauvet, G. Méthodes de Bootstrap en Population Finie. Ph.D. Dissertation, Laboratoire de Statistique d’enquêtes, CREST-ENSAI, Université de Rennes, Rennes, France, 2007.
- Conti, P.L. On the estimation of the distribution function of a finite population under high entropy sampling designs, with applications. Sankhya B 2014, 76, 234–259.
- Bickel, P.J.; Freedman, D. Some asymptotic theory for the bootstrap. Ann. Stat. 1981, 9, 1196–1216.
- van der Vaart, A. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 1998.
- Pfeffermann, D.; Sverchkov, M. Parametric and semi-parametric estimation of regression models fitted to survey data. Sankhya B 1999, 61, 166–186.
- Conti, P.L.; Marella, D.; Mecatti, F.; Andreis, F. A unified principled framework for resampling based on pseudo-populations: Asymptotic theory. Bernoulli 2020, 26, 1044–1069.
- Pfeffermann, D.; Sverchkov, M. Prediction of finite population totals based on the sample distribution. Surv. Methodol. 2004, 30, 79–92.
- Boistard, H.; Lopuhaä, H.P.; Ruiz-Gazen, A. Functional central limit theorems for single-stage sampling design. Ann. Stat. 2017, 45, 1728–1758.
- Bertail, P.; Chautru, E.; Clémençon, S. Empirical Processes in Survey Sampling with (Conditional) Poisson Designs. Scand. J. Stat. 2017, 44, 97–111.
- Han, Q.; Wellner, J.A. Complex sampling designs: Uniform limit theorems and applications. Ann. Stat. 2021, 49, 459–485.
- Di Iorio, A. Analytic Inference in Finite Population Framework Via Resampling. Unpublished Ph.D. Thesis, Department of Statistical Science, Sapienza Università di Roma, Roma, Italy, 2016.
- Ranalli, M.G.; Mecatti, F. Comparing Recent Approaches for Bootstrapping Sample Survey Data: A First Step Towards a Unified Approach. In Proceedings of the ASA Section on Survey Research Methods, Alexandria, VA, USA, 2012; pp. 4088–4099.
- Quatember, A. Pseudo-Populations—A Basic Concept in Statistical Surveys; Springer: New York, NY, USA, 2015.
- Quatember, A. The Finite Population Bootstrap—From the Maximum Likelihood to the Horvitz-Thompson Approach. Austrian J. Stat. 2014, 43, 93–102.
- Conti, P.L.; Mecatti, F.; Nicolussi, F. Efficient unequal probability resampling from finite populations. Comput. Stat. Data Anal. 2022, 167, 107366.
- Thompson, S.K. Sampling, 3rd ed.; Wiley: New York, NY, USA, 2012.
- Thompson, S.K. Adaptive and Network Sampling for Inference and Interventions in Changing Populations. J. Surv. Stat. Methodol. 2017, 5, 1–21.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Conti, P.L.; Mecatti, F. Resampling under Complex Sampling Designs: Roots, Development and the Way Forward. Stats 2022, 5, 258-269. https://doi.org/10.3390/stats5010016