An Approach to Integrating a Non-Probability Sample in the Population Census
Abstract
1. Introduction
2. Methods
2.1. Sampling from the Finite Population
2.2. Auxiliary Data and Outcome Regression Model
2.3. Estimation of Population Parameters
2.3.1. Post-Stratified Generalized Regression Estimator
2.3.2. Inverse Probability Weighting Estimator Based on the Propensity Score Model
- A1. The indicator and the study variable are independent given the covariates .
- A2. All units have a nonzero propensity score: for all .
- A3. The indicators and are independent, given and for .
- C1. The population size N and the sample size satisfy .
- C2. There exist and such that for all units .
- C3. The finite population and the propensity scores satisfy , as well as , and is a positive definite matrix.
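As a rough illustration of inverse probability weighting, the sketch below computes a Hájek-type weighted mean of a study variable over a non-probability sample, given propensity scores that have already been estimated. The function name `ipw_mean`, the normalised (Hájek) form, and the toy numbers are assumptions made for illustration; they are not taken from the paper, and the propensity model itself is not fitted here.

```python
from typing import Sequence


def ipw_mean(y: Sequence[float], propensity: Sequence[float]) -> float:
    """Hajek-type inverse probability weighted mean (illustrative sketch).

    y          -- study variable observed in the non-probability sample
    propensity -- estimated participation probabilities for the same units
    """
    if len(y) != len(propensity):
        raise ValueError("y and propensity must have the same length")
    # Assumption A2 requires strictly positive propensity scores.
    if any(p <= 0.0 or p > 1.0 for p in propensity):
        raise ValueError("propensity scores must lie in (0, 1]")
    weights = [1.0 / p for p in propensity]
    weighted_total = sum(w * v for w, v in zip(weights, y))
    return weighted_total / sum(weights)


# Hypothetical toy data: units with low propensity receive large weights.
y_toy = [1.0, 0.0, 1.0, 1.0]
p_toy = [0.5, 0.1, 0.5, 0.25]
estimate = ipw_mean(y_toy, p_toy)  # weights 2, 10, 2, 4 give 8 / 18
```

In this toy example the unweighted sample mean is 0.75, whereas the weighted estimate is about 0.44, because the single unit with a small propensity score stands in for many similar non-participants.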
2.3.3. Generalized Difference Estimator
2.3.4. Doubly Robust Estimator
- C4. For each , is continuous in and for in the neighborhood of , and .
- C5. For each , is continuous in and for in the neighborhood of , and .
2.3.5. Composite Estimators
3. Application to the Survey of the Lithuanian Census
3.1. Motivation
3.2. Sample Selection
- (i) First, a voluntary online survey was carried out from 15 January to 28 February 2021. It collected statistical data from approximately 2% of the census population (about 54,000 respondents), resulting in the non-probability sample .
- (ii) After the online survey ended, a sampling frame for probability sampling was constructed. It excluded certain addresses, e.g., if at least one individual from the address had participated in the online survey, if the address was an institution, or if more than 15 individuals were permanent residents there, among other rules. These units, which were not included in the sampling frame, comprised the part of the sample s.
- (iii) Lastly, the probability sample was drawn from the sampling frame , which was divided into strata according to the municipality intersected with the area of residence, i.e., urban or rural. The number of addresses sampled from a particular stratum was proportional to the size of the stratum, resulting in around 40,000 addresses sampled from the Population Register in total; approximately 6% of the census population was interviewed through the telephone survey (about 171,000 respondents).
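The proportional allocation described in step (iii) can be sketched as follows. This is a minimal illustration, not the procedure actually used in the census: the function name `proportional_allocation`, the largest-remainder rounding rule, and the stratum labels and sizes are all hypothetical.

```python
from typing import Dict


def proportional_allocation(stratum_sizes: Dict[str, int],
                            total_sample: int) -> Dict[str, int]:
    """Allocate total_sample units to strata proportionally to stratum size.

    Fractional allocations are rounded with the largest-remainder rule so
    that the stratum sample sizes add up exactly to total_sample.
    """
    frame_size = sum(stratum_sizes.values())
    exact = {h: total_sample * n_h / frame_size
             for h, n_h in stratum_sizes.items()}
    # Start from the integer parts of the exact proportional shares.
    allocation = {h: int(share) for h, share in exact.items()}
    leftover = total_sample - sum(allocation.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - allocation[h],
                          reverse=True)
    for h in by_remainder[:leftover]:
        allocation[h] += 1
    return allocation


# Hypothetical strata: municipality crossed with urban/rural residence.
sizes = {"A_urban": 5000, "A_rural": 3000, "B_urban": 2000}
alloc = proportional_allocation(sizes, 100)
# alloc == {"A_urban": 50, "A_rural": 30, "B_urban": 20}
```

Under proportional allocation every stratum has roughly the same sampling fraction, so the design weights of sampled addresses are approximately equal across strata.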
3.3. Imputation of Missing Values
3.4. Application to Religion Proportions
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
MAR | missing at random |
IPW | inverse probability weighting |
DR | doubly robust |
Variable | Category | Voluntary Sample | Population | Difference in % |
---|---|---|---|---|
Ethnicity | Pole | 0.35 | 0.07 | 441 |
Education | higher | 0.48 | 0.20 | 134 |
County | Vilnius | 0.64 | 0.29 | 121 |
Employment | employed | 0.63 | 0.45 | 41 |
Age group | ≥30, <50 | 0.37 | 0.27 | 37 |
Marital status | married | 0.52 | 0.42 | 25 |
Gender | male | 0.41 | 0.46 | −11 |
Ethnicity | Lithuanian | 0.56 | 0.85 | −34 |
Education | (lower) secondary | 0.24 | 0.37 | −35 |
Education | primary | 0.09 | 0.20 | −55 |
Religion | Voluntary Sample | Population | Difference in % |
---|---|---|---|
Karaites | 0.00130 | 0.00009 | 1307 |
New Apostolic Church | 0.00161 | 0.00014 | 1049 |
Evangelical Reformed Believers | 0.00833 | 0.00207 | 302 |
Other | 0.01596 | 0.00514 | 211 |
Pentecostalists | 0.00198 | 0.00067 | 194 |
Greek Catholics (Uniats) | 0.00048 | 0.00021 | 131 |
Evangelical Lutherans | 0.01311 | 0.00585 | 124 |
Judaists | 0.00074 | 0.00035 | 112 |
Baptists and Free Churches | 0.00083 | 0.00048 | 74 |
Sunni Muslims | 0.00130 | 0.00085 | 52 |
Not indicated | 0.07621 | 0.10090 | −24 |
Seventh Day Adventist Church | 0.00026 | 0.00032 | −20 |
None | 0.07580 | 0.06424 | 18 |
Old Believers | 0.00615 | 0.00683 | −10 |
Orthodox | 0.04047 | 0.03787 | 7 |
Roman Catholics | 0.75548 | 0.77398 | −2 |
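The "Difference in %" column above appears to be the relative difference between the voluntary-sample share and the population share. The sketch below assumes that definition (it is inferred from the tabulated values, not stated in the text), and the function name `relative_difference_pct` is hypothetical.

```python
def relative_difference_pct(sample_share: float, population_share: float) -> float:
    """Relative difference of a sample share from a population share, in percent."""
    if population_share == 0:
        raise ValueError("population share must be nonzero")
    return 100.0 * (sample_share - population_share) / population_share


# Shares taken from the table above: (voluntary sample, population).
rows = {
    "Roman Catholics": (0.75548, 0.77398),
    "Orthodox": (0.04047, 0.03787),
    "None": (0.07580, 0.06424),
    "Not indicated": (0.07621, 0.10090),
}
rounded = {name: round(relative_difference_pct(s, p))
           for name, (s, p) in rows.items()}
# rounded == {"Roman Catholics": -2, "Orthodox": 7, "None": 18, "Not indicated": -24}
```

For very small groups the recomputed value can deviate from the tabulated one, because the published shares are themselves rounded to five decimal places.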
Roman Catholics | 0.78391 | 0.77233 | 0.73664 | 0.74101 |
Not indicated | 0.05671 | 0.10112 | 0.15701 | 0.15025 |
None | 0.09696 | 0.06146 | 0.05408 | 0.05477 |
Orthodox | 0.04150 | 0.04113 | 0.03433 | 0.03482 |
Old Believers | 0.00806 | 0.00767 | 0.00434 | 0.00419 |
Evangelical Lutherans | 0.00565 | 0.00604 | 0.00389 | 0.00398 |
Other | 0.00282 | 0.00493 | 0.00566 | 0.00625 |
Evangelical Reformed Believers | 0.00208 | 0.00221 | 0.00122 | 0.00126 |
Pentecostalists | 0.00037 | 0.00061 | 0.00117 | 0.00158 |
Sunni Muslims | 0.00075 | 0.00089 | 0.00058 | 0.00064 |
Baptists and Free Churches | 0.00034 | 0.00044 | 0.00017 | 0.00016 |
Judaists | 0.00039 | 0.00040 | 0.00025 | 0.00029 |
Greek Catholics (Uniats) | 0.00010 | 0.00023 | 0.00030 | 0.00038 |
Seventh Day Adventist Church | 0.00016 | 0.00030 | 0.00014 | 0.00013 |
New Apostolic Church | 0.00012 | 0.00014 | 0.00015 | 0.00020 |
Karaites | 0.00008 | 0.00010 | 0.00008 | 0.00010 |
Roman Catholics | 0.73664 | 0.73349 | 0.74101 | 0.73811 |
Not indicated | 0.15701 | 0.15452 | 0.15025 | 0.14832 |
None | 0.05408 | 0.05319 | 0.05477 | 0.05401 |
Orthodox | 0.03433 | 0.03804 | 0.03482 | 0.03592 |
Old Believers | 0.00434 | 0.00503 | 0.00419 | 0.00486 |
Evangelical Lutherans | 0.00389 | 0.00460 | 0.00398 | 0.00475 |
Other | 0.00566 | 0.00636 | 0.00625 | 0.00709 |
Evangelical Reformed Believers | 0.00122 | 0.00151 | 0.00126 | 0.00175 |
Pentecostalists | 0.00117 | 0.00126 | 0.00158 | 0.00191 |
Sunni Muslims | 0.00058 | 0.00069 | 0.00064 | 0.00094 |
Baptists and Free Churches | 0.00017 | 0.00024 | 0.00016 | 0.00037 |
Judaists | 0.00025 | 0.00031 | 0.00029 | 0.00051 |
Greek Catholics (Uniats) | 0.00030 | 0.00034 | 0.00038 | 0.00060 |
Seventh Day Adventist Church | 0.00014 | 0.00017 | 0.00013 | 0.00031 |
New Apostolic Church | 0.00015 | 0.00017 | 0.00020 | 0.00035 |
Karaites | 0.00008 | 0.00009 | 0.00010 | 0.00022 |
Religion | (i) | (ii) | (iii) |
---|---|---|---|
New Apostolic Church | 1 | 6 | −88 |
Karaites | 4 | 43 | −88 |
Greek Catholics (Uniats) | 2 | 16 | −84 |
Seventh Day Adventist Church | 4 | 22 | −79 |
Judaists | 6 | 35 | −76 |
Pentecostalists | 1 | 3 | −73 |
Baptists and Free Churches | 9 | 42 | −71 |
Sunni Muslims | 6 | 20 | −65 |
Evangelical Reformed Believers | 6 | 12 | −46 |
Other | 2 | 2 | −14 |
Evangelical Lutherans | 7 | 7 | −7 |
Old Believers | 19 | 22 | 4 |
Orthodox | 18 | 9 | 146 |
None | 0 | 0 | 261 |
Not indicated | 0 | 0 | 366 |
Roman Catholics | 0 | 0 | 1389 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Burakauskaitė, I.; Čiginas, A. An Approach to Integrating a Non-Probability Sample in the Population Census. Mathematics 2023, 11, 1782. https://doi.org/10.3390/math11081782