Combining Probability and Nonprobability Samples by Using Multivariate Mass Imputation Approaches with Application to Biomedical Research
Abstract
1. Introduction
2. Materials and Methods
2.1. Multivariate Mass Imputation Approaches
2.2. Monte Carlo Simulation Study
2.3. Real Data Application
3. Results
3.1. Monte Carlo Simulation Study
3.2. Real Data Application
4. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Variable | Population | Probability Sample | Nonprobability Sample |
---|---|---|---|
X1 (Value=1) | 0.200 | 0.199 | 0.041 |
X1 (Value=2) | 0.300 | 0.299 | 0.077 |
X2 | 2.300 | 2.301 | 2.836 |
X3 | 5.300 | 5.298 | 5.988 |
X4 | 0.602 | 0.602 | 0.688 |
X5 (Value=1) | 0.159 | 0.159 | 0.049 |
X5 (Value=2) | 0.538 | 0.538 | 0.478 |
X6 | 0.303 | 0.304 | 0.336 |
X7 | 1.600 | 1.602 | 2.727 |
Variable | Method | Estimate | Bias |
---|---|---|---|
X4 | mice (pmm) | 0.598 | −0.0036 |
X4 | mice (cart) | 0.573 | −0.0288 |
X4 | mice (rf) | 0.706 | 0.1041 |
X4 | gerbil | 0.603 | 0.0012 |
X5 (Value=1) | mice (pmm) | 0.159 | 0.0002 |
X5 (Value=1) | mice (cart) | 0.140 | −0.0188 |
X5 (Value=1) | mice (rf) | 0.048 | −0.1112 |
X5 (Value=1) | gerbil | 0.160 | 0.0012 |
X5 (Value=2) | mice (pmm) | 0.537 | −0.0007 |
X5 (Value=2) | mice (cart) | 0.543 | 0.0052 |
X5 (Value=2) | mice (rf) | 0.569 | 0.0314 |
X5 (Value=2) | gerbil | 0.534 | −0.0039 |
X6 | mice (pmm) | 0.312 | 0.0091 |
X6 | mice (cart) | 0.282 | −0.0213 |
X6 | mice (rf) | 0.269 | −0.0339 |
X6 | gerbil | 0.308 | 0.0049 |
X7 | mice (pmm) | 1.603 | 0.0025 |
X7 | mice (cart) | 1.603 | 0.0025 |
X7 | mice (rf) | 1.603 | 0.0025 |
X7 | gerbil | 1.603 | 0.0025 |
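The Bias column can be checked by hand. The sketch below assumes that bias is defined as the estimate minus the corresponding population mean from the first table; the reported figures are consistent with that reading, but the definition is not stated in the tables themselves. Because it starts from estimates already rounded to three decimals, it reproduces the published four-decimal biases only approximately. The variable and function names are illustrative.

```python
# Assumption: bias = estimate - population mean (taken from the first table).
# Inputs are the three-decimal values printed in the tables, so the results
# can differ from the published four-decimal biases in the last digits.
population_x4 = 0.602

estimates_x4 = {
    "mice (pmm)": 0.598,
    "mice (cart)": 0.573,
    "mice (rf)": 0.706,
    "gerbil": 0.603,
}


def bias(estimate: float, truth: float) -> float:
    """Signed difference between an estimate and the population value."""
    return estimate - truth


biases_x4 = {
    method: round(bias(est, population_x4), 3)
    for method, est in estimates_x4.items()
}

for method, b in biases_x4.items():
    print(f"{method:12s} {b:+.3f}")
```

For X4, the recomputed values agree with the published ones to about three decimals, with the random forest variant of mice showing by far the largest deviation.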
Variable | Value | BRFSS Weighted Frequency (Percent) | TBRFSS Unweighted Frequency (Percent) |
---|---|---|---|
age * | 18–24 | 46,597 (17.07) | 37 (5.83) |
age * | 25–29 | 30,027 (11.00) | 48 (7.56) |
age * | 30–34 | 32,567 (11.93) | 46 (7.24) |
age * | 35–39 | 29,459 (10.79) | 49 (7.72) |
age * | 40–44 | 19,838 (7.27) | 55 (8.66) |
age * | 45–49 | 17,961 (6.58) | 51 (8.03) |
age * | 50–54 | 21,637 (7.93) | 63 (9.92) |
age * | 55–59 | 21,303 (7.81) | 94 (14.80) |
age * | 60–64 | 16,142 (5.91) | 76 (11.97) |
age * | 65–69 | 12,267 (4.49) | 59 (9.29) |
age * | 70+ | 25,129 (9.21) | 57 (8.98) |
gender * | Male | 133,198 (48.80) | 140 (22.05) |
gender * | Female | 139,728 (51.20) | 495 (77.95) |
marital * | Married | 120,946 (44.31) | 242 (38.11) |
marital * | Divorced/Separated | 50,397 (18.47) | 142 (22.36) |
marital * | Widowed | 16,701 (6.12) | 60 (9.45) |
marital * | Never Married | 72,022 (26.39) | 114 (17.95) |
marital * | Member of Unmarried Couple | 12,861 (4.71) | 77 (12.13) |
education * | Less than High School | 38,116 (13.97) | 63 (9.92) |
education * | High School Graduate | 103,878 (38.06) | 191 (30.08) |
education * | Some College/Technical School | 89,158 (32.67) | 231 (36.38) |
education * | College Graduate | 41,774 (15.31) | 150 (23.62) |
employ * | Employed/Self-employed | 157,742 (57.80) | 400 (62.99) |
employ * | Unemployed/Homemaker/Student | 49,507 (18.14) | 72 (11.34) |
employ * | Retired | 31,124 (11.40) | 104 (16.38) |
employ * | Unable to Work | 34,553 (12.66) | 59 (9.29) |
income * | Less than USD 10,000 | 24,554 (9.00) | 117 (18.43) |
income * | Less than USD 15,000 | 11,586 (4.25) | 60 (9.45) |
income * | Less than USD 20,000 | 32,404 (11.87) | 63 (9.92) |
income * | Less than USD 25,000 | 29,114 (10.67) | 76 (11.97) |
income * | Less than USD 35,000 | 35,740 (13.10) | 88 (13.86) |
income * | Less than USD 50,000 | 42,416 (15.54) | 89 (14.02) |
income * | Less than USD 75,000 | 40,524 (14.85) | 79 (12.44) |
income * | USD 75,000 or More | 56,587 (20.73) | 63 (9.92) |
BMI Cat * | Underweight/Healthy weight | 64,439 (23.61) | 105 (16.54) |
BMI Cat * | Overweight | 98,507 (36.09) | 176 (27.72) |
BMI Cat * | Obese | 109,980 (40.30) | 354 (55.75) |
general health * | Excellent | 37,839 (13.86) | 56 (8.82) |
general health * | Very Good | 78,767 (28.86) | 144 (22.68) |
general health * | Good | 85,727 (31.41) | 261 (41.10) |
general health * | Fair/Poor | 70,593 (25.87) | 174 (27.40) |
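The unweighted TBRFSS percentages appear to be plain shares of the sample total. A minimal check on the gender rows, assuming the two categories cover every TBRFSS respondent (the names below are illustrative):

```python
# Unweighted TBRFSS counts for gender, copied from the table above.
tbrfss_gender_counts = {"Male": 140, "Female": 495}

# Assumption: every respondent falls into exactly one of these categories,
# so the sample size is the sum of the counts.
tbrfss_total = sum(tbrfss_gender_counts.values())

# Percent = 100 * count / total, rounded to two decimals as in the table.
tbrfss_gender_percent = {
    level: round(100 * count / tbrfss_total, 2)
    for level, count in tbrfss_gender_counts.items()
}

print(tbrfss_total, tbrfss_gender_percent)
```

The same total is obtained by summing the counts within any other variable in the table, for example the eleven age groups.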
Variable | Naïve | Mice (pmm) | Mice (cart) | Mice (rf) | Gerbil |
---|---|---|---|---|---|
cvd | 0.0353 | 0.0434 | 0.0290 | −0.0081 | 0.0401 |
asth | −0.0300 | −0.0273 | −0.0405 | −0.0867 | −0.0179 |
hlthcov | −0.1391 | −0.1548 | −0.1012 | −0.0535 | −0.1197 |
stroke | −0.0082 | −0.0015 | 0.0033 | −0.0334 | −0.0027 |
diabete | 0.1070 | 0.0508 | 0.0667 | 0.0252 | 0.0515 |
smoke | −0.0732 | 0.0154 | 0.0188 | −0.1343 | 0.0400 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, S.; Woodruff, A.M.; Campbell, J.; Vesely, S.; Xu, Z.; Snider, C. Combining Probability and Nonprobability Samples by Using Multivariate Mass Imputation Approaches with Application to Biomedical Research. Stats 2023, 6, 617-625. https://doi.org/10.3390/stats6020039