3. Results
Table 5 presents the results from the DID estimation. Although the coefficient of the DID estimator remains positive and statistically significant at the 5% level, the decline in its absolute value indicates that demographic characteristics play an important role in explaining health condition. According to the results of the t-test, young and single individuals tend to move to cities in hopes of better economic opportunities. Young people are generally healthier than elderly people; therefore, the health advantage can be partly attributed to youth and vigor. These empirical results are in line with previous studies [6,16,17,22,36,42].
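As a minimal sketch of the kind of specification behind such an estimate, the DID estimator can be read as the coefficient on an interaction term in a linear regression. The paper's concluding section names the two dimensions as agricultural hukou and migrant status; the function name, variable names, and toy numbers below are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def did_interaction(y, migrant, agri_hukou):
    """Fit y = b0 + b1*migrant + b2*agri_hukou + b3*(migrant*agri_hukou) by OLS.

    Returns the four coefficients; b3 is the DID-style interaction estimate.
    """
    y = np.asarray(y, dtype=float)
    m = np.asarray(migrant, dtype=float)
    h = np.asarray(agri_hukou, dtype=float)
    X = np.column_stack([np.ones_like(y), m, h, m * h])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical health scores, one observation per (migrant, hukou) cell.
y = [3.0, 3.5, 2.5, 4.0]
migrant = [0, 1, 0, 1]
agri_hukou = [0, 0, 1, 1]

beta = did_interaction(y, migrant, agri_hukou)
# The interaction equals (4.0 - 2.5) - (3.5 - 3.0): the migrant gap among
# agricultural-hukou holders minus the migrant gap among everyone else.
print(beta[3])
```

Adding demographic covariates as extra columns of `X` is what lets the interaction coefficient shrink when those covariates explain part of the raw gap.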
Given the heterogeneity of the pre-trend between the treated and control groups according to the result of the t-test, we employed propensity score matching to mitigate the selection bias and obtain a more comparable treatment and control group. Finally, we combined the PSM and DID to find a more reliable average health effect of migration on Chinese peasant workers.
In the first step, we used logit regression to estimate the propensity score of entering the treatment group (migrants). In addition to age, gender, education level, and marital status, we added socioeconomic characteristics as matching covariates, thereby further enhancing the comparability of the control group.
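The first step can be illustrated with a small, self-contained logit fit. This is only a sketch under stated assumptions: it uses plain Newton-Raphson rather than any particular statistics package, a single hypothetical covariate (age), and made-up data; the names `fit_logit` and `propensity_scores` are not from the paper.

```python
import numpy as np

def fit_logit(X, y, n_iter=50):
    """Maximum-likelihood logit fit by Newton-Raphson; an intercept is added."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def propensity_scores(X, beta):
    """Predicted probability of treatment (here: being a migrant)."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return 1.0 / (1.0 + np.exp(-X @ beta))

# Hypothetical data: younger respondents are more often migrants.
age = np.array([50, 45, 40, 35, 30, 25])
migrant = np.array([0, 0, 1, 0, 1, 1])

beta = fit_logit(age, migrant)
ps = propensity_scores(age, beta)
print(beta, ps)
```

In practice the covariate matrix would hold all matching variables, including the socioeconomic factors described next.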
To keep the regression framework concise, we used factor analysis following Hamilton [46]. The principal factor method with iterated communalities can identify the latent dimensions that best explain the correlations among the variables. We applied this method and then an oblique rotation, which simplifies the factor pattern and allows some degree of correlation among factors. Although correlated factors are statistically less parsimonious because they have overlapping variances, the factors generated in this article are regarded as dimensions of socioeconomic status and are not necessarily uncorrelated, so oblique rotation is more in line with the actual situation than the orthogonal rotation commonly used in the existing literature. According to the KMO (Kaiser-Meyer-Olkin) test results shown in Table 6, the KMO value of the majority of our selected variables is more than 0.7 and the overall KMO value is 0.6775, which implies that factor analysis is appropriate here.
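For readers unfamiliar with the KMO statistic, the sketch below shows the standard definition: squared correlations divided by the sum of squared correlations and squared partial correlations, computed both per variable and overall. The function name and the toy correlation matrix are assumptions for illustration and are unrelated to the paper's data.

```python
import numpy as np

def kmo(corr):
    """Kaiser-Meyer-Olkin measure from a correlation matrix.

    Returns (per-variable KMO values, overall KMO).
    """
    corr = np.array(corr, dtype=float)  # copy so the input is not modified
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale              # partial (anti-image) correlations
    np.fill_diagonal(corr, 0.0)         # only off-diagonal terms enter KMO
    np.fill_diagonal(partial, 0.0)
    r2 = corr ** 2
    q2 = partial ** 2
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + q2.sum())
    return per_variable, overall

# Toy example: four variables with a common pairwise correlation of 0.5.
corr = np.full((4, 4), 0.5)
np.fill_diagonal(corr, 1.0)
per_variable, overall = kmo(corr)
print(per_variable, overall)
```

Values closer to one mean that partial correlations are small relative to raw correlations, which is the condition under which a few common factors can summarize the variables.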
According to the loading of different variables on factors, we extracted three factors: Subjective socioeconomic perception, career influence, and objective socioeconomic status. The subjective socioeconomic perception factor includes the self-rated family financial situation, socioeconomic status compared with peers, autonomy in determining the way of working, frequency of depression, social equality perception, and whether a CCP member or not. The career influence factor includes whether the company is within the system and whether the company is a state-owned enterprise. The objective socioeconomic status factor includes household income per capita, whether the respondent possesses a private car, whether the respondent has medical insurance, whether the respondent signed a written labor contract with the current employer, the respondent’s fluency level of speaking Mandarin, and the frequency of participation in cultural events.
After completing the factor construction, we sorted the data randomly and subsequently implemented the PSM and PSM-DID approaches. One-to-one matching was the primary method (using the observation with the closest propensity score as the control). Kernel matching and local linear regression matching were performed as robustness tests. We first used one-to-one matching; as shown in Appendix A: Figure A1, the standardized bias of most variables narrowed after matching. According to Appendix B: Figure A2, the majority of the observations were in the “On Support” category, which indicates only a slight loss of samples during PSM. We subsequently applied the two other matching techniques, kernel matching and local linear regression matching, as robustness checks. As shown in Table 7, there were no conspicuous differences among the different PSM approaches, which implies that the PSM results do not depend on the specific matching method.
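The one-to-one matching step and the standardized bias diagnostic can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes matching with replacement and no caliper (the paper does not state either), and all names and numbers are hypothetical.

```python
import numpy as np

def nearest_neighbor_match(ps_treated, ps_control):
    """For each treated unit, index of the control with the closest score.

    Assumption: matching with replacement, no caliper.
    """
    pt = np.asarray(ps_treated, dtype=float)[:, None]
    pc = np.asarray(ps_control, dtype=float)[None, :]
    return np.abs(pt - pc).argmin(axis=1)

def standardized_bias(x_treated, x_control):
    """Standardized percentage bias of one covariate between two groups."""
    xt = np.asarray(x_treated, dtype=float)
    xc = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((xt.var(ddof=1) + xc.var(ddof=1)) / 2.0)
    return 100.0 * (xt.mean() - xc.mean()) / pooled_sd

# Hypothetical propensity scores.
ps_treated = [0.80, 0.30]
ps_control = [0.10, 0.35, 0.75]
idx = nearest_neighbor_match(ps_treated, ps_control)

# Hypothetical covariate values for a balance check.
bias = standardized_bias([1.0, 3.0], [0.0, 2.0])
print(idx, bias)
```

Comparing `standardized_bias` before matching with the same quantity computed on the matched controls is one way to read a bias-reduction plot of the kind referenced above.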
In the next step, we used the PSM-DID (propensity score matching combined with difference-in-differences) model to construct a more comparable control group. As shown in Table 8, the DID estimator is neutralized after the PSM-DID process, which indicates that the migrant workers’ health advantage could disappear after controlling for their attributes.
The balance test shown in Table 9 indicates that the PSM approach helped eliminate the differences between the two groups: the mean values of the selected covariates no longer differed significantly between the treatment and control groups, which suggests that applying PSM-DID is appropriate here.
Given that our dependent variable, health status, is defined by self-rated health, which is an ordinal variable, we employed ordered logit regression and mixed-effect logit regression as robustness checks.
First, we used the ordered logit approach. As shown in
Table 10, in Model 1, we included only migrant status, hukou status, and their interaction term. In Model 2, based on Model 1, we added further demographic variables: age, gender, marital status, and the interaction term combining gender and marital status. In Model 3, based on Model 2, we added education level. In Models 4, 5, and 6, we added the three previously generated factors one at a time: subjective socioeconomic perception, career influence, and objective socioeconomic status. Finally, in Model 7, we added the three factors simultaneously to Model 3.
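To make the ordered logit setup concrete, the sketch below computes category probabilities under the usual proportional-odds form, P(y ≤ j) = logistic(c_j − xβ). The cutpoints and linear-predictor values are hypothetical and are not estimates from Table 10.

```python
import numpy as np

def ordered_logit_probs(xb, cutpoints):
    """Category probabilities for an ordered logit model.

    xb        : linear predictor x'beta for one respondent
    cutpoints : increasing thresholds c_1 < ... < c_{K-1}
    """
    cut = np.asarray(cutpoints, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(cut - xb)))    # P(y <= j) at each cutpoint
    cum = np.concatenate([[0.0], cum, [1.0]])  # add P(y <= 0) and P(y <= K)
    return np.diff(cum)                        # P(y = j) for j = 1..K

# Five health categories require four cutpoints (hypothetical values).
cutpoints = [-2.0, -1.0, 0.0, 1.0]
baseline = ordered_logit_probs(xb=0.0, cutpoints=cutpoints)
higher = ordered_logit_probs(xb=1.0, cutpoints=cutpoints)
print(baseline, higher)
```

A positive coefficient raises `xb` and shifts probability mass toward the healthier categories, which is how the signs reported for the models can be read.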
In Model 1, we incorporated only migrant status, hukou category, and their interaction term. The coefficient of mobility was positive and significant at the 1‰ level, and the coefficient of agricultural hukou was negative and significant at the 1‰ level, as expected. The migration process is highly demanding on health. The widening gap between rural and urban China, including but not limited to basic infrastructure, sanitary conditions, public goods, medical care services, and the affordability of remedies, could have deleterious effects on rural residents’ health [13,22,36]. In addition, farming may involve pesticides, which could also harm farmers. However, the coefficient of our interaction term is positive, statistically significant at the 1‰ level, and has the largest absolute value, which is consistent with the healthy migrant hypothesis: because the migration process is stressful and involves working in unfamiliar cities, better health is a prerequisite. Furthermore, in rural China, the quality of education lags far behind that of cities, which results in lower general human capital [47] among rural migrants. Their urban employers do not want to invest in job training for them because the hukou system, which functions like an internal passport, inhibits rural migrant workers from settling in the destination city; the resulting paucity of job training reduces their specific human capital. This lack of human capital makes health particularly valuable to migrant workers.
In Model 2, when we added age, gender, and marital status to Model 1, the coefficient of mobility declined in both absolute value and statistical significance, as did the interaction term of mobility and agricultural hukou. The coefficient of age is negative and significant at the 1‰ level, which is expected since young people are generally healthier than the elderly. According to our results, women and single respondents tend to be less healthy. Interestingly, the interaction term combining female and single status is positive and significant at the 5% level, which may suggest that as Chinese rural women move to cities, they can disentangle themselves from violent husbands and overbearing in-laws; women in rural China are more likely to suffer domestic violence than women in urban China [48,49,50]. Due to urbanization, the grip of tradition loosens, and women have more choice about whom they marry or live with; therefore, living among strangers in metropolises may not be a cause for despair but a chance to throw off the fetters of custom and kinship. All of these factors make their lives more bearable and lead to better health status among single women.
In Model 3, when we added education, the coefficient of mobility fell to zero, and the coefficient of the interaction term combining mobility and agricultural hukou also declined compared to Model 1. The protective effect of education on health is revealed in Model 3. In the regression results of Models 4 to 7, the statistical significance of mobility, agricultural hukou, and their interaction term decreased as other demographic, cognitive, and socioeconomic characteristics were gradually incorporated into the model, and the coefficient of mobility turned from positive to negative. The coefficient of the interaction term combining mobility and agricultural hukou decreased in both absolute value and statistical significance, which implies that migration does not have a positive effect on health in China. The initial positive and significant health effect could be the result of self-selection since healthier individuals are more capable of migrating. We note that most rural migrant workers cannot obtain necessary medical treatment because they lack a local hukou, which determines their access to public health services in the destination cities. When a dangerous work environment and dilapidated residences lead to a precipitous deterioration in migrant workers’ health and increase their demand for medical treatment, they are more likely to return to their hometown to address their declining health. The coefficient of the interaction term combining single and female and the coefficient of education become insignificant but remain positive. The advantages of education and of being single for women may be reflected in socioeconomic factors.
However, people in different regions may apply different criteria when rating their health. In the next step, we recoded self-reported health as a binary variable, with “very unhealthy”, “less healthy”, and “ordinary” equal to zero and “fairly healthy” and “very healthy” equal to one. We employed a mixed-effect logit model to allow the intercepts and slopes to vary among respondents from different regions. First, we included each interview location as a random intercept in the mixed-effect logit model. Compared to the standard logit regression, the likelihood ratio test indicates that the random intercept varies significantly across locations. We re-estimated the seven aforementioned models using mixed-effect logit regression, as shown in Table 11. When we incorporated the random intercept of every interview location into the mixed-effect logit model, the positive effect of mobility on rural migrant health decreased as other demographic, cognitive, and socioeconomic factors were gradually brought into the model, which corresponds to our assumption. The initial better health condition was largely due to self-selection, given that better health could be rural migrant workers’ most important competitive advantage.
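The recoding of self-reported health into a binary outcome can be written out explicitly. The category labels come from the text above; the function name and the choice to raise an error on unknown labels are assumptions made for this sketch.

```python
# Category labels as given in the text; 1 = healthy, 0 = not healthy.
HEALTHY = {"fairly healthy", "very healthy"}
NOT_HEALTHY = {"very unhealthy", "less healthy", "ordinary"}

def dichotomize(response):
    """Map a five-category self-rated health answer to 0 or 1."""
    if response in HEALTHY:
        return 1
    if response in NOT_HEALTHY:
        return 0
    # Assumption: unknown or missing labels are rejected rather than guessed.
    raise ValueError(f"unexpected health category: {response!r}")

answers = ["very unhealthy", "less healthy", "ordinary",
           "fairly healthy", "very healthy"]
coded = [dichotomize(a) for a in answers]
print(coded)
```

The binary outcome is what enters the mixed-effect logit model, where each interview location contributes its own intercept shift.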
Next, we considered whether the effect of mobility on rural migrants’ health varied among regions. We incorporated the random intercept and slope simultaneously into our seven previously constructed models, which allowed the coefficient of the interaction term combining agricultural hukou and migration to vary across survey regions. Compared to the standard logit regression, the likelihood ratio test indicates that the random slope varies significantly across regions. In the next step, we determined the total effect (= random effect + fixed effect) of the interaction term combining mobility and agricultural hukou on health status in each interview location in the seven models. As shown in Table 12, the absolute value and statistical significance of the coefficient of the interaction term combining mobility and agricultural hukou both decreased as other characteristics gradually entered the model, which supports the interpretation that the positive health effect stemmed from initial health rather than the migration process. Figure 1 presents these results visually: the total health effect of the interaction term combining mobility and agricultural hukou turned from positive to negative in some interview locations when other related factors entered the model.
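The location-specific total effect described above is simply the fixed slope plus each location's random deviation. The sketch below illustrates that arithmetic with hypothetical location names and values; it does not reproduce the estimates in Table 12 or Figure 1.

```python
def total_effects(fixed_slope, random_slopes):
    """Total effect per location = fixed effect + location-specific random effect."""
    return {loc: fixed_slope + dev for loc, dev in random_slopes.items()}

# Hypothetical fixed slope and random deviations for two interview locations.
fixed_slope = 0.25
random_slopes = {"location_A": 0.25, "location_B": -0.5}

effects = total_effects(fixed_slope, random_slopes)
negative_locations = [loc for loc, eff in effects.items() if eff < 0]
print(effects, negative_locations)
```

A location whose random deviation outweighs a small positive fixed slope ends up with a negative total effect, which is the pattern the figure describes for some locations.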
Before we began to explore the salmon bias hypothesis in China, we decided to use the Harmonized CHARLS (China Health and Retirement Longitudinal Study) data to further verify the healthy migrant hypothesis to improve the robustness check.
We used the dependent variable self-reported health in 2013 and 2015, the dummy variable agricultural hukou in 2013 (agricultural hukou = 1, others = 0), the dummy variable “work status” in 2013 (non-agricultural work = 1, others = 0 in 2012), and their interactive term as our independent variables of interest. Given the ordinal attributes of our dependent variables, we employed the ordered logit model to explore the health effect of mobility on migrant workers. In the Harmonized CHARLS data, there were two scales of self-reported health. Through our adjustment, we had four groups of self-reported health: One of the scales ranges from 1 for Poor to 5 for Excellent in 2013 and 2015, and the other scale ranges from 1 for Very Bad to 5 for Very Good in 2013 and 2015. As shown in
Table 13, regardless of the method of defining self-reported health, the results are similar: The coefficient of agricultural hukou is negative and significant at the 1% level in the four models, which implies that rural residents’ health is generally worse than that of urban citizens. The coefficient of non-agricultural employment history is positive and significant at the 1% level. We are interested in the coefficient of the interaction term combining agricultural hukou and non-agricultural work experiences. In Model 1 and Model 3, using the dependent variable measuring health in 2013, the coefficient of the interaction term was positive and significant at the 5% level. When the dependent variable measured health status in 2015, the coefficient of both agricultural hukou and non-agricultural work experiences did not fluctuate conspicuously. In contrast, the coefficients of the interaction terms both became insignificant and even negative in Model 2, and their absolute values both obviously declined, which indicates that working in cities had a negative effect on the migrant workers. This finding is consistent with the previously generated results suggesting that the initial better health among migrant workers was due to self-selection. Healthier people tend to migrate for better remuneration, and the process of working outside their hukou-registered locale neutralizes their initial health advantage.
After testing the healthy migrant hypothesis in China, we chose samples who lived in rural areas in 2015 and whose work status in the 2013 survey questionnaire was non-agricultural work (i.e., engaged in non-agricultural work in 2012) to further explore the salmon bias hypothesis. First, we used two scales of the ordinal variable “self-reported health” as the dependent variable (one of the scales ranges from 1 for Poor to 5 for Excellent, the other scale ranges from 1 for Very Bad to 5 for Very Good), the dummy variable “work status” in 2013 (non-agricultural work = 1, others = 0 in 2012), the dummy variable “rural or urban residence” in 2015 (rural = 1, urban = 0), and their interactive term as our independent variables of interest. Considering the attributes of our ordinal dependent variable, we employed the ordered logit model. There is an overlap between agricultural hukou and living in a rural area. To assuage concerns about collinearity, we dropped the hukou category in our subsequent functional framework. As shown in
Table 14, regardless of which approach to measuring self-reported health we chose, the results were similar. The coefficient of non-agricultural work experience in 2012 was positive and significant at the 1% level in Models 1 and 2. In Models 3 and 4, when we added demographic and cognitive characteristics such as gender, age, and education level, the coefficient of non-agricultural work remained positive and significant at the 1% level. We are most interested in the coefficient of the interaction term combining living in rural areas in 2015 and non-agricultural work experience in 2012, which is insignificant and small in absolute value in all four models. Given the initial better health condition among migrant workers when they start to migrate, the low statistical significance and low absolute value of the interaction term indicate that rural migrant workers who return to their hometowns tend to experience health deterioration.
Thanks to the large number of health indicators in the Harmonized CHARLS, we could use factor analysis with iterated communalities combined with oblique rotation, which permits correlation among factors, to extract additional health indicators. The factors obtained through oblique rotation can represent health conditions in different dimensions, which allowed us to vary the dependent variable as a robustness check for the salmon bias hypothesis. Note, however, that larger values of these health indicators represent worse health conditions in the Harmonized CHARLS. Consequently, the coefficients of the independent variables have the opposite sign when symptoms or health behaviors are used to represent health compared with self-reported health. For instance, in the Harmonized CHARLS, the frequency of drinking in the previous year equals zero when the interviewee never drank in the previous year and equals eight when the interviewee drank more than twice per day; lower drinking frequency represents better health behavior. Medical care utilization is constrained by the household budget, and less affluent rural residents may endure disease or choose cheaper approaches to manage their health problems rather than going to hospital and seeking standard treatment. To assuage concerns of endogeneity, we therefore excluded health care utilization and insurance and included only symptoms (for example, ADLs, IADLs, CESD10) and health behaviors (for example, drinking and smoking) in the factor analysis that produced new factors representing the health condition in 2015.
We chose 20 variables in the Harmonized CHARLS to reflect the health condition in 2015: (1) Six-item summary of activities of daily living (ADL) containing bathing, dressing, eating, getting in/out of bed, using the toilet, and controlling urination (each item equals one if the interviewee had difficulty completing this item independently and otherwise equals zero); (2) five-item summary of instrumental activities of daily living (IADL) including whether the interviewee had difficulty managing money, taking medications, shopping for groceries, preparing meals, and making phone calls; (3) seven-item summary of any difficulty with mobility activities, including walking 100 m, climbing several flights of stairs, getting up from a chair, stooping, kneeling or crouching, extending arms up, lifting 5 kg, and picking up a small coin; (4) CESD10 ranging from 0 to 30 with higher scores indicating that the respondent felt more negative during the past week; (5)–(17) the respondent’s answer to the question regarding whether a doctor had told the respondent that he or she had a specific condition, including high blood pressure; diabetes or high blood sugar; cancer or a malignant tumor; chronic lung disease; heart attack, coronary heart disease, angina, congestive heart failure, or other heart problems; stroke; emotional, nervous, or psychiatric problems; arthritis; dyslipidemia; liver disease; kidney disease; stomach or other digestive disease; and asthma; (18) the respondent’s response to the question regarding whether a doctor had told the respondent that he or she had a memory-related condition; (19) frequency of drinking behavior during the last year; and (20) current smoking habit.
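The summary measures in items (1) to (3) are counts of difficulties. A minimal sketch of that scoring for the six ADL items is shown below; the item keys and the respondent record are hypothetical and do not correspond to actual Harmonized CHARLS variable names.

```python
# Hypothetical keys for the six ADL items listed in the text.
ADL_ITEMS = ["bathing", "dressing", "eating", "bed", "toilet", "urination"]

def summary_score(respondent, items):
    """Count the items on which the respondent reports difficulty (1 = difficulty)."""
    return sum(respondent[item] for item in items)

# Hypothetical respondent with difficulty bathing and getting in/out of bed.
respondent = {"bathing": 1, "dressing": 0, "eating": 0,
              "bed": 1, "toilet": 0, "urination": 0}
adl_score = summary_score(respondent, ADL_ITEMS)
print(adl_score)
```

Because each item is coded one for difficulty, a higher summary score indicates worse functioning, in line with the direction of the other indicators in the list.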
The Kaiser-Meyer-Olkin measure of sampling adequacy in
Appendix C shows that the KMO value of the majority of the variables is more than 0.7 and the overall KMO value is 0.7398, which indicates that the factor analysis is appropriate.
This large number of variables may reflect a smaller number of underlying dimensions. We employed the principal factor method with iterated communalities, and through oblique rotation, we extracted four factors to represent health in different dimensions. According to the rotation result, the four extracted factors represent daily activities, internal disease, organ disease, and unhealthy behaviors. The daily activities factor includes ADL, IADL, mobility difficulties, and CESD10; the internal disease factor includes high blood pressure, memory-related conditions, stroke, diabetes, cancer, psychiatric problems, heart problems, and dyslipidemia; the organ disease factor includes lung, digestive, asthma, arthritis, liver, and kidney problems; and the unhealthy behaviors factor includes drinking and smoking habits.
As shown in Table 15, we employed ordinary least squares (OLS) to examine the salmon bias hypothesis. When the dependent variables are daily activities or organ diseases, the results are consistent with our previous results. Regardless of whether the demographic and cognitive control variables are included, healthier people tended to perform non-agricultural work in 2012. In comparison, the coefficient of our interaction term of greatest interest, combining non-agricultural work experience in 2012 and living in a rural area in 2015, declined in both absolute value and statistical significance, which implies that the returnees often had declining health. Interestingly, when the dependent variables were internal disease or unhealthy behavior, some differences emerged. The results of the OLS model suggest that individuals living in rural areas in 2015 or working in non-agricultural sectors in 2012 had fewer internal diseases, while those who lived in rural areas in 2015 and also had non-agricultural work experience tended to suffer more internal diseases than others. Given the initial better health condition among rural migrant workers, this result strongly supports the salmon bias hypothesis. Those who choose to return to their rural hometown often lose their health advantages, and a non-agricultural employment history can produce chronic diseases among them via poor working environments. According to the regression results of the last two OLS models, the returnees seem to have fewer unhealthy behaviors than their peers who remained in destination cities, which may reveal a possible mechanism of health deterioration: living and working in cities without urban hukou could lead to drinking and smoking. These unhealthy behaviors may temporarily alleviate socioeconomic pressure and depression.
4. Discussion and Conclusions
This paper examined the “healthy migrant hypothesis” and the “salmon bias hypothesis” in China. Our empirical evidence supports both hypotheses in the Chinese context. In the ordered logit model, when we included only mobility, agricultural hukou, and their interaction term, the coefficient of the interaction term was positive and significant at the 1% level. When we gradually added other demographic, cognitive, and socioeconomic characteristics, this positive effect diminished in both absolute value and statistical significance. When we included the random intercept and random slope in the mixed-effect logit model, which allowed the intercept and the slope of the interaction term combining agricultural hukou and migration to vary across survey locations, this pattern persisted, which suggests that the health advantage among rural migrant workers can be attributed to their previous self-selection rather than to the effect of migration. People endowed with better initial health are more likely to migrate to cities seeking economic opportunities. Under the draconian hukou system, their career choices, access to local medical care, and opportunities for public services are limited in the host cities, and they may suffer discrimination from native citizens and mistreatment from their employers. Consequently, their initial health advantage gradually disappears. Because China’s New Rural Cooperative Medical System is only valid in hukou-registered locations, rural migrant workers tend to return to their hometown after their health deteriorates.
The difficulties of rural migrant workers can be blamed in part on broader conditions, such as the strong linkage between destination cities’ public services and local hukou, which puts those services out of reach; their position at the bottom of career ladders caused by inadequate knowledge and the hukou system; dangerous and even polluted working environments; and crowded and dirty living conditions. After migrant workers experience deteriorating health, returning to their rural hometowns seems to be their best choice. According to our empirical outcomes, the returnees often had declining health. This is a serious problem in China. As the “healthy migrant” and “salmon bias” effects work in the same direction, the burden on the New Rural Cooperative Medical System continually increases, and the already wide gap in population health between urban and rural China widens further. The conclusions of this article have important policy implications. Rural migrant workers must adapt if they are to survive in destination cities, and governments can help them to do so, for example by loosening the linkage between local hukou and medical care and providing more public services (especially low-rent housing) and occupational choices. Eliminating the problem altogether will be impossible, but considering the vitally important role rural migrant workers play in long-term development and the importance of promoting people’s happiness and perceptions of equality, it is time to help them overcome various obstacles in cities rather than allowing hukou to keep its role of continuously signaling permission.
This article makes the following important contributions. (1) China has experienced the largest domestic migration process in human history. Empirical testing of the relationship between rural-urban migration and health in China can help people to more accurately understand the relationship between population migration and health. (2) We present some innovations in the research methods. Considering that rural and urban dualization formed under the hukou system, we utilized two dimensions—agricultural hukou and migrant status—to employ difference-in-difference (DID) to focus on the interaction term combining agricultural hukou and migration. Furthermore, we used propensity score matching-difference in difference (PSM-DID) to find a more comparable group and overcome some of the defects of DID to examine the healthy migrant hypothesis. Moreover, considering that interviewees in different regions may have different criteria for health conditions, we incorporated the random intercept of each survey site and the random slope of the interactive term combining agricultural hukou and migration into the mixed-effect logit regression model. (3) Existing articles often focus on a single independent variable and usually use one micro database. Based on the existing research, we explored the “healthy migrant hypothesis” and the “salmon bias hypothesis” under Chinese household registration systems by focusing on the interaction term combining agricultural hukou and migrant status, previous non-agricultural working experiences and subsequent residence in rural areas. We also utilized two Chinese micro-databases to make our conclusions more cogent. (4) Urbanization and its concomitant rural-urban migration are objective processes, and many developing countries are experiencing an urbanization process similar to that in China. In this regard, the conclusions of this paper on rural-urban migration and health can provide more general value.
Our study has some limitations. For example, because our focus was on the interaction term, the Heckman two-step method and the entropy balancing method were not used. Furthermore, the mental health of Chinese migrant workers was not used as a dependent variable to test whether the healthy migrant hypothesis and the salmon bias hypothesis also apply to it. We leave these aspects for future research.