*2.3. Regression Model*

According to the seminal work of Henderson [56] and Roback [57], when a spatial equilibrium is achieved, residents' utility levels are the same across regions. Since migrants tend to move to cities with higher levels of utility, whether they choose to settle down depends on the utility levels those cities offer. Income is an important determinant of migrants' utility, and satisfactory air quality can also improve their subjective wellbeing [58,59]. Therefore, the higher the income and the better the air quality of a city, the higher the utility experienced by migrants, and hence the higher the probability of settling down in that city, with everything else being equal [60]. However, in many cases, high income and satisfactory air quality are incompatible. The existence of an environmental Kuznets curve means that the relationship between income and air pollution tends to follow an inverted U shape [61–63]. Especially in developing countries with relatively low per capita income, such as China, higher-income areas often face more serious air pollution. Therefore, whether migrants choose to settle down in a city involves a trade-off between higher income and poorer air quality. Although poorer air quality can significantly decrease the settlement intentions of migrants, a higher income may compensate for the resulting loss of utility. To test the influences of both air quality and income on the settlement intentions of migrants, the following regression models were specified:

$$SI_{ij} = \beta_0 + \beta_1 PM2.5_{ij} + \beta_3 income_{ij} + \lambda X + \rho Z + \mu_{ij} \tag{1}$$

$$SI_{ij} = \beta_0 + \beta_1 PM2.5_{ij} + \beta_2 PM2.5_{ij} \times income_{ij} + \beta_3 income_{ij} + \lambda X + \rho Z + \mu_{ij} \tag{2}$$

where *i* indexes the individual migrant and *j* the Chinese city. *SI<sub>ij</sub>* stands for the settlement intention of migrant *i* in city *j*, *PM2.5<sub>ij</sub>* represents the air quality of city *j*, and *income<sub>ij</sub>* represents the monthly income of migrant *i* in city *j*. Since the explained variable *SI* is a binary dummy variable, this study used a logistic regression model to estimate model (1). Note that when estimating a binary choice model such as model (1), logistic and probit regression models generally yield similar results [64]; model (1) was therefore also estimated with a probit regression model in the robustness test that follows. Because a logistic regression model is nonlinear, the coefficients in model (1) are not marginal effects, as they would be in a linear regression model, but their signs are consistent with those of the marginal effects [64]. Exponentiating an estimated coefficient (raising *e* to its power) gives the corresponding odds ratio. According to the theoretical analysis above, β<sub>1</sub> < 0, β<sub>2</sub> > 0, and β<sub>3</sub> > 0 were expected.
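The distinction between coefficients, odds ratios, and marginal effects in a logistic model can be illustrated with a minimal sketch. The coefficient values, the function names, and the evaluation point below are hypothetical and chosen only for illustration; they are not estimates from this study, and the control vectors *X* and *Z* are omitted for brevity.

```python
import math

# Hypothetical logit coefficients for model (1); illustrative values only,
# not estimates reported in this study. Controls X and Z are omitted.
BETA = {"const": 0.5, "pm25": -0.02, "income": 0.3}


def settle_probability(pm25, income):
    """P(SI = 1) under a logit model with the hypothetical coefficients."""
    index = BETA["const"] + BETA["pm25"] * pm25 + BETA["income"] * income
    return 1.0 / (1.0 + math.exp(-index))


def odds(p):
    """Odds of settling, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(name):
    """Multiplicative change in the odds for a one-unit increase in `name`."""
    return math.exp(BETA[name])


def marginal_effect(name, pm25, income):
    """Change in P(SI = 1) per unit of `name`, evaluated at (pm25, income).

    For a logit model this is beta * p * (1 - p), so it shares the sign of
    the coefficient but varies with the evaluation point.
    """
    p = settle_probability(pm25, income)
    return BETA[name] * p * (1.0 - p)


if __name__ == "__main__":
    for name in ("pm25", "income"):
        print(name,
              "odds ratio:", round(odds_ratio(name), 4),
              "marginal effect:", round(marginal_effect(name, 40.0, 4.0), 4))
```

In this sketch a negative coefficient corresponds to an odds ratio below one and a negative marginal effect, while the size of the marginal effect depends on where it is evaluated, which is why coefficients from a logistic model cannot be read directly as marginal effects.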

Furthermore, *X* represents a vector of individual-level control variables that affect the settlement intention of the migrant (*nation*, *gender*, *age*, *party*, *edu*, *hukou*, *marriage*, *time*, *distance<sub>1</sub>*, *distance<sub>2</sub>*, *distance<sub>3</sub>*, *reason*), all of which came from the CMDS 2017. *Z* represents a vector of city-level variables that affect the settlement intentions of migrants (*third*, *trade*, *pgdp*, *gdpr*), all of which came from the China City Statistical Yearbook 2018. In addition, provincial fixed effects were controlled for in model (1), and μ<sub>ij</sub> represents the error term. Model (2) adds the interaction term *PM2.5* × *income* to model (1).
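Because the models are estimated by logistic regression, the right-hand side of model (2) can be read as the log-odds of settling. As a sketch under that reading (the symbols $p_{ij}$ and $\eta_{ij}$ are introduced here for illustration and do not appear in the original specification), the interaction term makes the effect of PM2.5 on the log-odds depend on income:

$$\eta_{ij} = \ln\frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \beta_1 PM2.5_{ij} + \beta_2 PM2.5_{ij} \times income_{ij} + \beta_3 income_{ij} + \lambda X + \rho Z, \qquad p_{ij} = \Pr(SI_{ij} = 1)$$

$$\frac{\partial \eta_{ij}}{\partial PM2.5_{ij}} = \beta_1 + \beta_2 \, income_{ij}$$

With β<sub>1</sub> < 0 and β<sub>2</sub> > 0, the negative effect of PM2.5 on the log-odds weakens as income rises, and it would vanish at an income level of −β<sub>1</sub>/β<sub>2</sub>, provided that level lies within the observed income range. This is the sense in which a higher income can compensate for poorer air quality.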

The specific definitions and descriptive statistics for all variables are reported in Tables 2 and 3.


**Table 2.** Definitions of variables.


**Table 3.** Descriptive statistics of the variables.
