2.4.2. Regression Models and Evaluation

As Zhai et al. [29] noted, when the leading principal components are used as explanatory variables without explicit selection rules and standards, the contributions of the predictors that truly drive PM2.5 variations can remain unclear. To better understand the relationship between the built environment and PM2.5 and to establish regression models, this study carried out stepwise regression analysis on all principal factors, which screens the factors and retains only those with a significant impact on the dependent variables [29]. The regression models were verified following the common practice in related research fields, in which the neighborhood samples are divided into test samples, used to establish the model, and verification samples, held out for validation [35]. The split must leave enough test samples to establish the regression model while reserving a sufficient number of verification samples. Therefore, one neighborhood sample in each city was randomly selected for validation: WH4, HF5, NJ3, SH4, and HZ4. The remaining 32 neighborhood samples served as test samples for constructing the regression model.
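The sample split and factor screening described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the entry criterion of the stepwise regression is not stated here, so the sketch assumes forward-only selection driven by AIC, and it assumes that the city is encoded as the two-letter prefix of each sample ID. The function names `holdout_one_per_city`, `ols_aic`, and `forward_stepwise` are hypothetical.

```python
import random
import numpy as np


def holdout_one_per_city(sample_ids, seed=0):
    """Randomly hold out one sample per city.

    Assumption: the city is the two-letter prefix of the sample ID.
    """
    rng = random.Random(seed)
    by_city = {}
    for sid in sample_ids:
        by_city.setdefault(sid[:2], []).append(sid)
    held_out = {rng.choice(ids) for ids in by_city.values()}
    fit_ids = [s for s in sample_ids if s not in held_out]
    val_ids = [s for s in sample_ids if s in held_out]
    return fit_ids, val_ids


def ols_aic(X, y, cols):
    """AIC (up to an additive constant) of an OLS fit with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def forward_stepwise(X, y):
    """Greedy forward selection of principal factors.

    Assumption: a factor enters only if it lowers the AIC; a full
    stepwise procedure would also test already-entered factors for removal.
    """
    selected, remaining = [], list(range(X.shape[1]))
    best = ols_aic(X, y, selected)
    while remaining:
        score, j = min((ols_aic(X, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break  # no remaining factor improves the model
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected
```

Here `X` would hold the principal factor scores of the samples used to establish the model (one row per sample) and `y` the corresponding PM2.5 indicator; the returned list gives the column indices of the retained factors in order of entry.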

The accuracy of the regression model was assessed by comparing the predicted and actual values of the dependent variable. The relative error (RE) was used to evaluate the accuracy of the predicted values of the PM2.5 indicators for the five validation samples.

$$\text{RE}_{i} = \frac{|y_{i}^{\prime} - y_{i}|}{y_{i}} \times 100\% \tag{3}$$

where $\text{RE}_{i}$ is the relative error of the *i*-th validation sample, and $y_{i}^{\prime}$ and $y_{i}$ are the predicted value and actual value of a PM2.5 indicator of the *i*-th validation sample, respectively.
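As a small illustration of Equation (3), the relative error of a single validation sample could be computed as follows. The function name `relative_error` is hypothetical, and the sketch assumes a strictly positive actual value, which holds for PM2.5 concentrations.

```python
def relative_error(predicted, actual):
    """Relative error in percent, following Equation (3).

    Assumption: `actual` is strictly positive, so the denominator
    needs no absolute value.
    """
    if actual <= 0:
        raise ValueError("actual value must be positive")
    return abs(predicted - actual) / actual * 100.0
```

Applying the function to the predicted and actual PM2.5 indicator of each of the five validation samples yields one RE value per sample.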

#### **3. Results and Discussion**
