2.2. Pollution Exposure
In urban areas in China, the main sources of PM2.5 are electric power plants, industrial facilities, automobiles, and heating, while in rural areas PM2.5 is primarily produced by biomass burning, agricultural dust, and windblown sources outside the region. According to official monitoring data, the annual mean PM2.5 concentration across the 338 monitored cities was 50 µg/m³ in 2016 [26], which is much higher than the 35 µg/m³ standard set by the 2012 National Ambient Air Quality Standard (NAAQS) and the 10 µg/m³ standard set by the World Health Organization [27].
Prior to 2012, there was no formal regulation of PM2.5 in China and there were few ground-level monitors for PM2.5. The 2012 NAAQS mandated the monitoring and reporting of PM2.5 and set more stringent standards for other pollutants such as PM10. The new standards were implemented in stages: the first phase, in 2012, covered 66 cities including municipalities, provincial capitals, provincial-level cities, and major cities in the Jing-Jin-Ji region (also known as the national capital region), the Yangzi River Delta, and the Pearl River Delta; the second phase, in 2013, covered 116 additional cities; and the third phase, in 2014, added another 177 cities. By the end of 2014, all prefecture-level cities were included. Following Barwick et al. [25], we leverage the roll-out of monitoring as an information shock to households.
As of 2017, there were 1436 air pollution monitors across the country (refer to http://www.cnemc.cn/sssj/ for more information, accessed on 12 March 2020); however, monitor coverage remains sparse even in densely populated cities. During our study period, from 2009 to 2014, fewer pollution monitors were in place and even fewer recorded PM2.5 levels. We therefore use the satellite-derived annual mean PM2.5 estimates developed by van Donkelaar et al. [28], which combine Aerosol Optical Depth (AOD) retrievals from the NASA MODIS, MISR, and SeaWiFS instruments with the GEOS-Chem chemical transport model, and are subsequently calibrated to regional ground-based observations of both total and compositional mass using Geographically Weighted Regression (GWR). The calibration is conducted at the global scale and not for China exclusively. Monitoring PM2.5 was not mandatory in China before 2012. The data consist of estimated annual mean PM2.5 concentrations from 2009 to 2014 at the global scale with a grid cell resolution of 0.01° × 0.01°, which corresponds to roughly one square kilometer.
As discussed earlier, monitor-level PM2.5 data are not available at the beginning of our study period. Using monitor data from 2015 and 2016, however, we find that the correlation between the remote-sensing estimates and the monitor-level averages at monitored sites is about 0.7, without information on the composition of airborne particulates. The mean pollution level across monitored locations is 53.4 and 49.02 µg/m³ for 2015 and 2016, respectively, while the mean of the remote-sensing estimates is slightly lower, at 52.7 and 46.9 µg/m³, respectively. This slight difference is expected, as remote-sensing estimates tend to understate pollution at higher levels due to saturation [29]. In addition to better temporal coverage, remote-sensing estimates offer better spatial coverage than monitor-based readings in our setting, since the monitoring data would not cover the locations of all surveyed households. We illustrate this in Figure 1, which presents the monitor locations and heat maps of satellite-derived pollution estimates (Figure 1a–c) and the distributions of pollution levels (Figure 1d) for Beijing, Chongqing, and Shanghai, three of the largest cities in China. In all three cities, we observe sparse monitor coverage and large within-city variation in pollution concentrations in both monitored and unmonitored areas.
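The comparison between satellite estimates and monitor averages can be illustrated with a minimal sketch. It assumes each monitor has already been paired with the satellite estimate for the grid cell that contains it; the pairing rule, the function name `pearson`, and the sample values are hypothetical and are not taken from the data used in this paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical annual means (µg/m³) at four monitored sites.
monitor_means = [62.0, 48.5, 75.3, 39.1]
satellite_means = [58.4, 50.2, 66.0, 41.7]
r = pearson(monitor_means, satellite_means)
```

A single correlation coefficient summarizes agreement only at monitored sites; it says nothing about how well the estimates perform in unmonitored areas.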
We aggregate the pollution estimates for the 21,592,032 grid cells within the administrative boundaries of China to the sub-district level. The number of sub-districts in a city is much larger than the number of pollution monitors. In Beijing, Chongqing, and Shanghai, for example, there are 12, 17, and 9 monitors, respectively; in contrast, there are 325, 1071, and 230 sub-districts in the respective cities according to the 2010 Township Population Census.
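The aggregation step can be sketched as follows. This is an illustration only: it assumes that each grid cell has already been assigned to a sub-district (for example, by the location of its centroid) and that the sub-district value is an unweighted mean of its cells. The text does not specify either choice, and the function name and identifiers are hypothetical.

```python
from collections import defaultdict


def aggregate_to_subdistrict(cells):
    """Average grid-cell PM2.5 values within each sub-district.

    cells: iterable of (subdistrict_id, pm25) pairs; a pm25 of None marks a
    cell with no estimate and is skipped.
    Returns a dict mapping subdistrict_id to the mean PM2.5 of its cells.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for subdistrict_id, pm25 in cells:
        if pm25 is None:
            continue
        totals[subdistrict_id] += pm25
        counts[subdistrict_id] += 1
    return {sid: totals[sid] / counts[sid] for sid in counts}


# Hypothetical grid cells for two sub-districts.
cells = [("A", 40.0), ("A", 60.0), ("B", 30.0), ("B", None)]
means = aggregate_to_subdistrict(cells)
```

Sub-districts with no valid cell are absent from the result rather than assigned a default value, so missing exposure stays visible downstream.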
We match air pollution to individuals at the sub-district level, using the survey year and month. Table A1 in Appendix A presents the survey schedule and the number of individuals surveyed in each month. As our data on labor hours are based on the year prior to the interview, we construct the pollution exposure measure for the same period by calculating the weighted average of pollution over the 12 months prior to the interview. For an individual living in sub-district j interviewed in year t and month m, the pollution level assigned is this 12-month weighted average.
We assign pollution based on the sub-district of the individual’s residential address. It is, of course, possible that individuals may be exposed to a different pollution level at the workplace. We do not observe the precise location of the workplace for respondents, but the survey indicates whether the individual works outside their home sub-district. We do not expect, ex ante, that this measurement error would lead to any bias, but as a robustness check we also consider the sub-sample of individuals who work in the same sub-district as they reside.
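The 12-month weighted average described above can be sketched as follows. The weights are an assumption, since the exact formula is not reproduced in this section. The sketch treats the 12 months before an interview in month m of year t as months 1 to m − 1 of year t plus months m to 12 of year t − 1, so the annual mean for year t receives weight (m − 1)/12 and the annual mean for year t − 1 receives the remainder. The function name `assigned_exposure` is hypothetical.

```python
def assigned_exposure(annual_pm25, year, month):
    """Weighted average of two annual means over the 12 months before the interview.

    annual_pm25: dict mapping year -> annual mean PM2.5 for one sub-district.
    year, month: interview year t and interview month m (1-12).
    Assumed weights: (m - 1)/12 on year t and (13 - m)/12 on year t - 1.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    weight_current = (month - 1) / 12
    weight_previous = 1 - weight_current
    return weight_current * annual_pm25[year] + weight_previous * annual_pm25[year - 1]


# Hypothetical annual means (µg/m³) for one sub-district.
annual = {2011: 60.0, 2012: 48.0}
january = assigned_exposure(annual, 2012, 1)  # entirely the 2011 mean
july = assigned_exposure(annual, 2012, 7)     # equal weight on both years
```

Under these assumed weights, a January interview draws only on the previous year's mean, and later interview months shift weight toward the current year.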
2.3. Descriptive Statistics
About 64% of the 94,660 observations report labor force participation status, of which 37,750 (62%) are in the labor force. Among those in the labor force, 5.7% are unemployed. The actual sample size for the extensive-margin models is smaller, as individuals who report labor force participation and unemployment status in only one wave are dropped in the individual fixed-effects estimations. Sample restrictions on missing values in key independent variables and on post-migration observations also apply. The resulting sample size is 56,064 for the labor force participation model and 29,796 for the unemployment model. While 35,595 observations report hours worked, 29,389 are from individuals who are observed at least twice in the sample. We further restrict the sample by removing the 3.7% of observations with missing information on key variables, such as the month of interview and the sub-district identifier. We do not include post-migration observations (about 5% of the remaining sample), as the survey does not record the timing of the move, making it impossible to assign pollution exposure accurately. The resulting sample size for our baseline specification is 25,665 observations for 11,474 individuals.
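The panel restriction behind the fixed-effects samples can be sketched as a simple filter that keeps only individuals observed at least twice, since a single observation contributes no within-individual variation. The record layout and the field name `pid` are hypothetical, and the sketch leaves out the other restrictions described above (missing key variables and post-migration observations).

```python
from collections import Counter


def keep_repeat_individuals(rows, id_field="pid"):
    """Keep observations from individuals who appear at least twice.

    rows: list of dicts, one per person-wave observation.
    id_field: key holding the individual identifier (hypothetical name).
    """
    counts = Counter(row[id_field] for row in rows)
    return [row for row in rows if counts[row[id_field]] >= 2]


# Hypothetical person-wave observations.
rows = [
    {"pid": 1, "wave": 2010, "hours": 40.0},
    {"pid": 1, "wave": 2012, "hours": 45.0},
    {"pid": 2, "wave": 2010, "hours": 50.0},
]
panel = keep_repeat_individuals(rows)
```

If other restrictions are applied afterwards, the filter may need to be re-run, because dropping observations can leave some individuals with a single remaining wave.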
The average hours worked per week in our sample is 42.5, and those in non-agricultural sectors work almost 20 hours longer than those employed only in agriculture. The within-person change in hours worked is centered around −0.82, with a standard deviation of 23.85. The average annual PM2.5 for the sample is 44 µg/m³. Non-agricultural workers face higher pollution, with a PM2.5 level of 49 µg/m³ compared to 42 µg/m³ for those who work in agriculture, reflecting the urban–rural pollution gap. The within-person change in PM2.5 exposure has a mean of −0.45 and a standard deviation of 5.12. The distributions of hours worked and PM2.5 (both levels and within-individual changes) are presented in Figure A1.
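The within-person changes summarized above can be illustrated with a short sketch. It assumes that "within-person change" means the difference between consecutive observed waves for the same individual; the text does not state the definition, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def within_person_changes(observations):
    """Differences between consecutive waves for each individual.

    observations: iterable of (person_id, wave, value) tuples.
    Returns the list of changes, ordered by person_id and then by wave.
    Individuals observed only once contribute no change.
    """
    by_person = defaultdict(list)
    for person_id, wave, value in observations:
        by_person[person_id].append((wave, value))
    changes = []
    for person_id in sorted(by_person):
        series = sorted(by_person[person_id])  # order each person's waves
        for (_, previous), (_, current) in zip(series, series[1:]):
            changes.append(current - previous)
    return changes


# Hypothetical weekly hours for three individuals.
observations = [
    (1, 2010, 40.0), (1, 2012, 45.0),
    (2, 2010, 50.0), (2, 2012, 42.0), (2, 2014, 44.0),
    (3, 2010, 38.0),
]
changes = within_person_changes(observations)
```

The same function applies unchanged to PM2.5 exposure by passing exposure values in place of hours.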
The baseline sample includes equal shares of males and females; about 56% of the sample has a primary school education or lower; about 9% of the sample is single, while 17% live with children aged 7 or below. Table A2 provides the descriptive statistics for the key variables used in the analysis.