*3.2. Construction and Verification of Models*

Based on the 22 principal factors, 24 regression models were constructed for the six PM2.5 relative indicators at four pollution levels; each model retained only the principal factors that significantly influenced PM2.5. The regression models of the six PM2.5 relative indicators at the overall pollution level passed the test of significance (Table 2). However, the principal factors included in the six models differed, indicating that different principal factors have complex impacts on the range, duration, and rate of PM2.5 increase or decrease. The number of principal factors included in a regression model ranged from 3 to 11, and models that included more principal factors tended to have a higher adj\_*R*<sup>2</sup> value. Across these models, P3, P4, P13, and P17 were the four principal factors that appeared most frequently, indicating their significant effects on the increase/decrease in PM2.5. Overall, the principal factors explained approximately 60.6~81.3% of the variation in the PM2.5 reduction indicators but only approximately 23.2~67.9% of the variation in the PM2.5 increase indicators.
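As an illustrative aside, the adj\_*R*<sup>2</sup> statistic discussed above can be computed from a model's observed and predicted values. The following is a minimal sketch that assumes the standard definition, adj\_*R*<sup>2</sup> = 1 − (1 − *R*<sup>2</sup>)(*n* − 1)/(*n* − *p* − 1), for *n* samples and *p* retained factors plus an intercept; the function name and the toy numbers are hypothetical and are not taken from the study.

```python
def adjusted_r2(actual, predicted, n_factors):
    """Adjusted R^2 for a linear model with an intercept and n_factors predictors.

    Assumes the standard textbook definition; this is not necessarily the exact
    routine used by the statistical software in the study.
    """
    n = len(actual)
    if n - n_factors - 1 <= 0:
        raise ValueError("need more samples than predictors plus the intercept")
    mean_actual = sum(actual) / n
    # Residual and total sums of squares
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R^2 for the number of predictors in the model
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_factors - 1)


# Toy values (hypothetical, for illustration only)
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
adj_one = adjusted_r2(actual, predicted, n_factors=1)
adj_two = adjusted_r2(actual, predicted, n_factors=2)
```

For the same fit quality, the value computed with two factors is lower than the value computed with one. Because the statistic penalizes each additional predictor, it rises only when an added factor reduces the residual variance enough to offset that penalty.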



Note: \*\*\*, \*\*, and \* indicate that the factors passed the test of significance at the 1%, 5%, and 10% levels, respectively, and the numbers in brackets indicate the regression coefficient and the standardized coefficient, respectively.

The regression models of the PM2.5 indicators at the different pollution levels also passed the test of significance (Table S6). First, although the principal factors included in the different models differed greatly, some principal factors appeared frequently and had strong impacts on PM2.5. However, these principal factors varied with the pollution level. For example, P3, P1, and P16 were important principal factors affecting the relative indicators of PM2.5 at the slight, moderate, and heavy pollution levels, respectively. Second, the explanatory power of the principal factors for the different PM2.5 indicators showed a similar trend across pollution levels. The explanatory power for Cin was higher than that for Cin', and that for Δtin was generally between them. The explanatory power for Cde was lower than that for Cde', and that for Δtde was often in between. Nonetheless, how much the principal factors explained differed between the PM2.5 increase and decrease indicators. At the slight pollution level, the principal factors explained relatively more (approximately 52~81%) of the PM2.5 decrease indicators and less (approximately 16~49%) of the PM2.5 increase indicators. At the moderate pollution level, the principal factors explained more of the PM2.5 increase indicators (approximately 70~84%) and less of the PM2.5 decrease indicators (approximately 60~62%). At the heavy pollution level, the gap between the increase and decrease indicators narrowed, with the explained share concentrated at approximately 60~75%.

For validation, the prediction accuracy was calculated for the verification neighborhood samples. Figure 4 compares the predicted and actual values of the PM2.5 indicators. In general, the predicted and actual values of each PM2.5 indicator were similar. A few individual samples showed large differences between the predicted and actual values; these occurred mainly at the heavy pollution level, followed by the moderate pollution level. Furthermore, the accuracy of the different PM2.5 indicators for the five verification samples was compared using the RE value calculated via Equation (3). At the overall pollution level, the RE of the six PM2.5 indicators of each verification sample was mostly less than 10%. The prediction error of the different verification samples showed considerable randomness. The maximum prediction error was for Cin of sample HZ4 (33.3%), and the minimum was for Δtde of sample WH4 (0.3%). Across pollution levels, the prediction error of the verification samples tended to increase with the pollution level, and the prediction error of each verification sample varied greatly.
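The RE comparison described above can be sketched in a few lines. Equation (3) is not reproduced in this section, so the sketch assumes the common definition RE = |predicted − actual| / |actual| × 100%; the function names and the sample values are hypothetical and do not come from the verification data.

```python
def relative_error(predicted, actual):
    """Relative error in percent (assumed form of Equation (3))."""
    if actual == 0:
        raise ValueError("relative error is undefined when the actual value is zero")
    return abs(predicted - actual) / abs(actual) * 100.0


def share_below(errors, threshold=10.0):
    """Fraction of relative errors strictly below the threshold (in percent)."""
    return sum(e < threshold for e in errors) / len(errors)


# Hypothetical (predicted, actual) pairs, for illustration only
pairs = [(52.0, 50.0), (9.5, 10.0), (4.0, 3.0)]
errors = [relative_error(p, a) for p, a in pairs]
fraction_under_10 = share_below(errors)
```

Here `share_below` summarizes how many indicator values fall under a chosen RE threshold, which mirrors the "mostly less than 10%" style of reporting used in the text.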

**Figure 4.** Validation for regression models of six PM2.5 indicators at the different pollution levels. (**a**) Cin; (**b**) Δtin; (**c**) Cin'; (**d**) Cde; (**e**) Δtde; (**f**) Cde'.

The prediction error was relatively low at the overall pollution level because more days of data (24 days) were used for testing. As the pollution level increased, fewer days were available for verification, which made the results more vulnerable to accidental or sudden factors outside the built environment and increased the prediction error. Although the short-term prediction error of these models is unstable, the models can still achieve high accuracy in predicting the long-term PM2.5 change trend of a block and therefore have high application value.

Using fewer days of data in the analysis usually leads to a lower R<sup>2</sup> and a greater RE. In this study, owing to the limited number of heavy pollution days, only 3 days of data were used for analysis at this pollution level. However, previous studies used 4 days of PM2.5 data and 3 days of instrument-monitored data, respectively, to analyze the effects of urban lake wetlands, neighboring urban greenery, and plant communities on PM2.5 [42,43]. Although accidental factors may influence results derived from such limited data, the data are sufficient for analysis.
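A back-of-the-envelope calculation can illustrate why fewer days tend to give a larger error. The sketch below assumes, purely for illustration, that day-to-day disturbances are independent with equal variance, so that the standard error of an *n*-day mean scales as 1/√*n*; this assumption and the function name are not part of the study.

```python
import math


def standard_error_ratio(days_few, days_many):
    """How many times larger the standard error of a mean over days_few days is
    than over days_many days, assuming independent, equal-variance daily noise."""
    if days_few <= 0 or days_many <= 0:
        raise ValueError("day counts must be positive")
    return math.sqrt(days_many / days_few)


# 3 days (heavy pollution level) versus 24 days (overall level), as mentioned in the text
ratio = standard_error_ratio(3, 24)
```

Under this simplified assumption, random disturbances would weigh on a 3-day average roughly 2.8 times (√8) as heavily as on a 24-day average. Real daily PM2.5 values are typically autocorrelated, so this indicates the direction of the effect rather than its size.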
