A Bootstrap Method for a Multiple-Imputation Variance Estimator in Survey Sampling
Abstract
1. Introduction
2. Methods
2.1. Rubin’s Method
2.2. The Bootstrap Estimation Method
2.2.1. Assumptions
2.2.2. Variance Decomposition
2.2.3. Bootstrap Variance Estimator and Its Properties
3. Examples and Results
3.1. Simulation 1: Domain Mean Estimation
3.2. Simulation 2: Linear Regression
n | B | M | d | | Rbias (%) | | | | Mwidth | | | | 95cov | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
500 | 500 | 10 | 0.8 | 0.2 | 0.6 | 146.2 | −12.5 | 0.5 | 65 | 102 | 58 | 65 | 95 | 99 | 89 | 95
500 | 500 | 10 | 0.8 | 0.6 | 1.6 | 21.6 | −4.9 | 1.5 | 48 | 53 | 46 | 48 | 95 | 97 | 95 | 95
500 | 500 | 10 | 0.5 | 0.2 | −1.3 | 237.7 | −41.1 | 0.7 | 61 | 113 | 36 | 61 | 95 | 99 | 58 | 96
500 | 500 | 10 | 0.5 | 0.6 | −0.7 | 18.1 | −40.5 | −0.6 | 55 | 61 | 42 | 56 | 95 | 96 | 85 | 95
500 | 500 | 30 | 0.8 | 0.2 | 1.0 | 156.8 | −12.6 | 0.9 | 63 | 100 | 57 | 62 | 95 | 99 | 92 | 95
500 | 500 | 30 | 0.8 | 0.6 | 1.3 | 21.6 | −5.3 | 1.3 | 47 | 52 | 46 | 47 | 95 | 97 | 95 | 95
500 | 500 | 30 | 0.5 | 0.2 | −2.2 | 261.4 | −54.3 | 0.6 | 58 | 109 | 33 | 57 | 95 | 100 | 63 | 95
500 | 500 | 30 | 0.5 | 0.6 | −1.4 | 17.8 | −42.2 | −1.4 | 54 | 59 | 41 | 54 | 95 | 97 | 86 | 95
500 | 200 | 10 | 0.8 | 0.2 | 0.5 | 146.2 | −12.5 | 0.2 | 65 | 102 | 58 | 65 | 95 | 99 | 89 | 95
500 | 200 | 10 | 0.8 | 0.6 | 1.6 | 21.6 | −4.9 | 1.5 | 48 | 53 | 46 | 48 | 95 | 97 | 95 | 95
500 | 200 | 10 | 0.5 | 0.2 | −1.4 | 237.7 | −41.1 | 0.7 | 62 | 113 | 36 | 61 | 95 | 99 | 58 | 96
500 | 200 | 10 | 0.5 | 0.6 | −0.8 | 18.1 | −40.5 | −0.7 | 55 | 61 | 42 | 56 | 95 | 96 | 85 | 95
500 | 200 | 30 | 0.8 | 0.2 | 1.1 | 156.8 | −12.6 | 1.0 | 63 | 100 | 57 | 62 | 95 | 99 | 92 | 95
500 | 200 | 30 | 0.8 | 0.6 | 1.3 | 21.6 | −5.3 | 1.3 | 47 | 52 | 46 | 47 | 95 | 97 | 95 | 95
500 | 200 | 30 | 0.5 | 0.2 | −2.2 | 261.4 | −54.3 | 0.6 | 61 | 108 | 33 | 57 | 96 | 100 | 63 | 95
500 | 200 | 30 | 0.5 | 0.6 | −1.3 | −42.2 | 17.8 | −1.3 | 54 | 59 | 41 | 54 | 95 | 97 | 86 | 95
1000 | 500 | 10 | 0.8 | 0.2 | −1.1 | 141.4 | 13.1 | −1.1 | 46 | 72 | 42 | 46 | 95 | 99 | 89 | 95
1000 | 500 | 10 | 0.8 | 0.6 | −4.0 | 53.4 | −9.7 | −4.0 | 34 | 37 | 33 | 34 | 95 | 97 | 94 | 95
1000 | 500 | 10 | 0.5 | 0.2 | −3.7 | 255.4 | −44.3 | −3.2 | 43 | 80 | 25 | 44 | 95 | 99 | 57 | 95
1000 | 500 | 10 | 0.5 | 0.6 | −0.3 | 19.5 | −39.2 | −0.3 | 39 | 44 | 30 | 40 | 95 | 97 | 85 | 95
1000 | 500 | 30 | 0.8 | 0.2 | −1.5 | 148.6 | −14.6 | −1.6 | 44 | 71 | 41 | 44 | 95 | 99 | 92 | 95
1000 | 500 | 30 | 0.8 | 0.6 | −3.7 | 15.7 | −9.8 | −3.6 | 33 | 37 | 32 | 34 | 95 | 96 | 94 | 95
1000 | 500 | 30 | 0.5 | 0.2 | −3.9 | 251.2 | −55.3 | −3.3 | 41 | 77 | 23 | 40 | 94 | 99 | 63 | 94
1000 | 500 | 30 | 0.5 | 0.6 | −0.1 | 20.0 | −40.8 | −0.1 | 38 | 42 | 29 | 39 | 95 | 97 | 85 | 95
1000 | 200 | 10 | 0.8 | 0.2 | −1.0 | 141.4 | −13.1 | −1.0 | 46 | 72 | 42 | 46 | 95 | 99 | 89 | 95
1000 | 200 | 10 | 0.8 | 0.6 | −3.9 | 15.3 | −9.7 | −4.0 | 34 | 37 | 33 | 34 | 94 | 97 | 94 | 95
1000 | 200 | 10 | 0.5 | 0.2 | −3.8 | 225.4 | −44.3 | −3.3 | 44 | 80 | 25 | 44 | 96 | 99 | 57 | 95
1000 | 200 | 10 | 0.5 | 0.6 | −0.3 | 19.5 | −39.2 | −0.6 | 39 | 44 | 30 | 40 | 95 | 97 | 85 | 95
1000 | 200 | 30 | 0.8 | 0.2 | −1.6 | 148.6 | 14.6 | −1.5 | 45 | 71 | 41 | 44 | 95 | 99 | 92 | 95
1000 | 200 | 30 | 0.8 | 0.6 | −3.7 | 15.7 | −9.8 | −3.6 | 33 | 37 | 32 | 34 | 95 | 96 | 94 | 94
1000 | 200 | 30 | 0.5 | 0.2 | −4.0 | 251.2 | −55.3 | −3.3 | 42 | 77 | 23 | 40 | 95 | 99 | 63 | 94
1000 | 200 | 30 | 0.5 | 0.6 | −0.1 | 20.0 | −40.8 | −0.1 | 38 | 42 | 29 | 39 | 95 | 97 | 85 | 95
3.3. A Real Data Analysis
4. Discussion and Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Scen | n | B | M | Rbias (%) | | | | Mwidth | | | | 95cov | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 500 | 500 | 10 | 0.7 | 0.4 | 0.4 | 0.1 | 21 | 22 | 22 | 22 | 95 | 95 | 95 | 96
1 | 500 | 500 | 30 | 0.8 | 0.2 | 0.3 | 0.1 | 20 | 20 | 20 | 20 | 95 | 95 | 95 | 95
1 | 500 | 200 | 10 | 0.6 | 0.4 | 0.4 | −0.1 | 21 | 22 | 22 | 22 | 95 | 95 | 95 | 96
1 | 500 | 200 | 30 | 0.7 | 0.2 | 0.3 | 0.0 | 20 | 20 | 20 | 20 | 95 | 95 | 95 | 95
1 | 1000 | 500 | 10 | −1.8 | −2.4 | −2.3 | −2.3 | 14 | 15 | 15 | 16 | 94 | 95 | 95 | 96
1 | 1000 | 500 | 30 | −2.4 | −2.9 | −2.9 | −2.9 | 14 | 14 | 14 | 14 | 94 | 95 | 95 | 95
1 | 1000 | 200 | 10 | −2.0 | −2.4 | −2.4 | −2.4 | 14 | 15 | 15 | 16 | 94 | 95 | 95 | 96
1 | 1000 | 200 | 30 | −2.4 | −2.9 | −2.9 | −2.8 | 14 | 14 | 14 | 14 | 94 | 95 | 95 | 95
2 | 500 | 500 | 10 | −6.0 | −5.0 | −5.0 | −5.0 | 13 | 14 | 14 | 14 | 94 | 94 | 94 | 95
2 | 500 | 500 | 30 | −5.0 | −4.0 | −4.0 | −4.1 | 13 | 13 | 13 | 13 | 94 | 95 | 95 | 95
2 | 500 | 200 | 10 | −6.0 | −5.0 | −5.0 | −5.0 | 13 | 14 | 14 | 14 | 94 | 94 | 94 | 95
2 | 500 | 200 | 30 | −5.0 | −4.0 | −4.0 | −4.0 | 13 | 13 | 13 | 13 | 94 | 95 | 95 | 95
2 | 1000 | 500 | 10 | −1.0 | 0.4 | 0.4 | 0.0 | 9 | 10 | 10 | 10 | 95 | 95 | 95 | 96
2 | 1000 | 500 | 30 | −0.6 | 0.8 | 0.8 | 0.6 | 9 | 9 | 9 | 9 | 95 | 95 | 95 | 95
2 | 1000 | 200 | 10 | −1.1 | 0.4 | 0.4 | −0.1 | 9 | 10 | 10 | 10 | 95 | 95 | 95 | 96
2 | 1000 | 200 | 30 | −0.8 | 0.8 | 0.8 | 0.7 | 9 | 9 | 9 | 9 | 95 | 95 | 95 | 95
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, L.; Zhao, Y. A Bootstrap Method for a Multiple-Imputation Variance Estimator in Survey Sampling. Stats 2022, 5, 1231–1241. https://doi.org/10.3390/stats5040074