*2.3. Methodology*

The procedure for estimating the causal relationships between PM2.5 and the above socioeconomic factors using panel data from 2000 to 2015 comprised five steps: the unit root test, the panel cointegration test, panel fully modified ordinary least squares (FMOLS) regression, the Granger causality test, and variance decomposition with impulse response analysis. The details are as follows:

A unit root test checks whether a time series variable has a unit root and is therefore non-stationary [33]. A unit root in a variable can lead to spurious regression in the subsequent regression analysis [34]. The null hypothesis is that a unit root exists, i.e., that the variable is non-stationary. In this study, the Levin, Lin and Chu (LLC) and Im, Pesaran and Shin (IPS) tests were used.
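The idea behind the IPS test can be illustrated with a minimal sketch: a Dickey-Fuller regression is run for each cross-sectional unit and the resulting t-statistics are averaged into a t-bar statistic. The sketch below is not the EViews procedure used in this study. It omits lag augmentation and the standardization of t-bar with tabulated moments that the IPS test requires before inference, and the function names (`simulate_ar1`, `df_t_statistic`, `ips_t_bar`) and the simulated data are assumptions made for illustration only.

```python
import math
import random


def simulate_ar1(phi, n_obs, rng):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal shocks.

    phi = 1 gives a random walk (unit root); |phi| < 1 gives a stationary series.
    """
    series = [0.0]
    for _ in range(n_obs - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


def df_t_statistic(series):
    """t-statistic on y_{t-1} from regressing (y_t - y_{t-1}) on a constant and y_{t-1}."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    rho = sxy / sxx
    intercept = mean_y - rho * mean_x
    residuals = [y - intercept - rho * x for x, y in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    return rho / math.sqrt(sigma2 / sxx)


def ips_t_bar(panel):
    """Average of the unit-level Dickey-Fuller t-statistics (the IPS t-bar)."""
    stats = [df_t_statistic(series) for series in panel]
    return sum(stats) / len(stats)


# Hypothetical panel with the same dimensions as this study (14 units, 16 periods).
demo_rng = random.Random(42)
demo_panel = [simulate_ar1(1.0, 16, demo_rng) for _ in range(14)]
print("t-bar for simulated random walks:", round(ips_t_bar(demo_panel), 3))
```

A strongly negative t-bar counts as evidence against the unit root null hypothesis. With only 16 periods per unit, the individual statistics are noisy, which is one reason panel tests pool information across units.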

A panel cointegration test is used to test whether there is a long-term stable equilibrium relationship between variables. In this study, the Pedroni method was used to test the cointegration relationship between the socioeconomic variables and PM2.5 concentrations [16].

The panel FMOLS regression designed by Phillips [35] is utilized to provide the optimal estimations of cointegrating regressions [36]. This method modifies least squares to account for the autocorrelation effects and the endogeneity in the regressors due to the existence of a cointegration relationship [35,37]. In this study, the panel FMOLS regression was used to explore the direction of the long-term effects of *ln*GDPPC, *ln*UIS and *ln*IND on *ln*PM2.5. The relationship between the variables was expressed as Equation (1):

$$
\ln \text{PM}_{2.5,it} = \alpha + \beta_1 \ln \text{GDPPC}_{it} + \beta_2 \ln \text{UIS}_{it} + \beta_3 \ln \text{IND}_{it} + \varepsilon_{it} \tag{1}
$$

where the subscripts *i* (*i* = 1, ..., 14) and *t* (*t* = 1, ..., 16) index the city and the time period in the panel, respectively. α is the intercept; β1, β2 and β3 are the partial coefficients of *ln*GDPPC, *ln*UIS and *ln*IND; and ε is the error term.
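Because Equation (1) is specified in logarithms, each slope coefficient can be read as an elasticity. As a brief worked illustration (the 1% change is a hypothetical example, not a result of this study):

$$
\beta_1 = \frac{\partial \ln \text{PM}_{2.5,it}}{\partial \ln \text{GDPPC}_{it}}, \qquad \Delta \ln \text{PM}_{2.5,it} \approx \beta_1 \, \Delta \ln \text{GDPPC}_{it}
$$

so a 1% increase in GDPPC, with UIS and IND held constant, is associated with an approximate change of β1% in PM2.5 concentrations. The same reading applies to β2 and β3.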

The panel vector error correction model (VECM) was used to investigate the direction of Granger causality between the variables in the panel in the short and long term. In this study, short-term causality represented weak Granger causality because the dependent variable responds only to short-term shocks of the stochastic environment (an environment in which the agent's actions do not uniquely determine the outcome), whereas long-term causality referred to the independent variable's response to the deviation from long-term equilibrium [22,38]. Generally, short-term causality affected 1–2 periods, while long-term causality represented the causal relationship over the whole period from 2000 to 2015 [22]. Short-term Granger causality was assessed with the χ2-Wald statistics for the joint significance of the coefficients on the lagged terms of the explanatory variables [38]. Long-term Granger causality was determined by the significance of the error correction term (ECT). If the variables are cointegrated, at least one, or all, of the ECT coefficients are expected to be negative and significantly different from zero [22].
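For orientation, one equation of a panel VECM of this kind can be written in a generic form, shown here with *ln*PM2.5 as the dependent variable. This is a sketch of the standard specification, not the exact model estimated in this study. The lag length *p* and the symbols φ, θ, λ and *u* are assumptions introduced for illustration.

$$
\Delta \ln \text{PM}_{2.5,it} = \alpha_i + \sum_{k=1}^{p} \phi_{k} \, \Delta \ln \text{PM}_{2.5,i,t-k} + \sum_{k=1}^{p} \theta_{1k} \, \Delta \ln \text{GDPPC}_{i,t-k} + \sum_{k=1}^{p} \theta_{2k} \, \Delta \ln \text{UIS}_{i,t-k} + \sum_{k=1}^{p} \theta_{3k} \, \Delta \ln \text{IND}_{i,t-k} + \lambda \, \text{ECT}_{i,t-1} + u_{it}
$$

$$
\text{ECT}_{i,t-1} = \ln \text{PM}_{2.5,i,t-1} - \hat{\alpha} - \hat{\beta}_1 \ln \text{GDPPC}_{i,t-1} - \hat{\beta}_2 \ln \text{UIS}_{i,t-1} - \hat{\beta}_3 \ln \text{IND}_{i,t-1}
$$

In this notation, short-term Granger causality from GDPPC to PM2.5 corresponds to a Wald test of the null hypothesis θ11 = ... = θ1p = 0, and long-term causality corresponds to a test of λ = 0, with a negative and significant λ indicating adjustment back towards the long-term equilibrium. Analogous equations would be written with each of the other variables as the dependent variable.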

Variance decomposition explains the amount of information each endogenous variable contributes to the other variables in the autoregressions. The impulse response function indicates the effects of a shock to one innovation on the current and future values of the endogenous variables [38,39]. The Cholesky decomposition technique was used in the VECM to determine the contribution of one variable to another and to estimate how each variable responds to changes in the other variables [22].
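The mechanics of Cholesky-based impulse responses and variance decomposition can be sketched for a small system. The example below assumes a two-variable VAR(1), which is simpler than the panel VECM of this study. The coefficient matrix, the error covariance matrix and all function names are hypothetical values chosen for illustration, not estimates from the data.

```python
import math


def cholesky(matrix):
    """Lower-triangular factor P with P P^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def orthogonalized_irf(coef, sigma, horizons):
    """Responses Theta_h = A^h P for h = 0..horizons-1 in y_t = A y_{t-1} + u_t.

    Entry [i][j] of each matrix is the response of variable i to a
    one-standard-deviation orthogonalized shock in variable j.
    """
    n = len(coef)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    chol = cholesky(sigma)
    responses = []
    for _ in range(horizons):
        responses.append(matmul(power, chol))
        power = matmul(coef, power)
    return responses


def variance_decomposition(responses):
    """Share of each variable's forecast error variance due to each shock, by horizon."""
    n = len(responses[0])
    cumulative = [[0.0] * n for _ in range(n)]
    shares = []
    for theta in responses:
        for i in range(n):
            for j in range(n):
                cumulative[i][j] += theta[i][j] ** 2
        shares.append([[cumulative[i][j] / sum(cumulative[i]) for j in range(n)]
                       for i in range(n)])
    return shares


# Hypothetical VAR(1) coefficients and error covariance (illustrative only).
coef = [[0.5, 0.1], [0.2, 0.3]]
sigma = [[1.0, 0.3], [0.3, 0.5]]
shares = variance_decomposition(orthogonalized_irf(coef, sigma, 10))
for step, table in enumerate(shares, start=1):
    print(step, [[round(value, 3) for value in row] for row in table])
```

Because the Cholesky factor is lower triangular, the first variable in the ordering is affected on impact only by its own shock, so the results depend on the order in which the variables enter the system.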

The above methods were implemented in EViews 8.0 (IHS Global Inc., Englewood, CA, USA), and the relevant statistical principles followed the user guide [40,41].
