2.1.1. Benchmark Regression Model

Liu et al. [28] investigated the relationship between industrial agglomeration and environmental pollution by adding industrial agglomeration to the production function. Zhu and Yan [45] extended this model by introducing urbanization into the equation. Following the theory and method of Zhu and Yan [45], this paper constructed a regression equation that includes new urbanization (*nurb*), the agglomeration of energy-intensive industries (*hagg*) and NOx emissions (*NE*). Based on previous studies [46–50], five control variables were introduced: industrial structure (*is*), technological innovation (*r&d*), foreign direct investment (*fdi*), environmental regulation (*er*) and economic development (*pgdp*). To reduce heteroscedasticity, all variables were log-transformed, and the benchmark regression model was specified as follows:

$$\begin{aligned} \ln NE_{it} &= \beta + \beta_1 \ln nurb_{it} + \beta_2 \ln hagg_{it} + \beta_3 \ln is_{it} + \beta_4 \ln r\&d_{it} + \beta_5 \ln fdi_{it} \\ &\quad + \beta_6 \ln er_{it} + \beta_7 \ln pgdp_{it} + \varepsilon_{it} \end{aligned} \tag{1}$$

In Equation (1) and the following equations, *εit* is the random error term.
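As an illustration of how a log-log specification such as Equation (1) can be read, the following minimal sketch fits a reduced version of the model by pooled ordinary least squares on synthetic data. The data, the sample size and the "true" coefficients are assumptions made only for this example, and only two regressors are included; it is not the estimation procedure used in this paper. In such a model, each slope can be interpreted as an elasticity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of region-year observations

# Synthetic positive-valued regressors (stand-ins for nurb and hagg only).
nurb = rng.lognormal(mean=0.0, sigma=0.5, size=n)
hagg = rng.lognormal(mean=0.0, sigma=0.5, size=n)

# Assumed coefficients, chosen purely for illustration.
beta0, beta1, beta2 = 1.0, -0.3, 0.6
ln_ne = (beta0 + beta1 * np.log(nurb) + beta2 * np.log(hagg)
         + rng.normal(0.0, 0.1, size=n))  # random error term

# Pooled OLS on the logged variables: columns are constant, ln nurb, ln hagg.
X = np.column_stack([np.ones(n), np.log(nurb), np.log(hagg)])
coef, *_ = np.linalg.lstsq(X, ln_ne, rcond=None)

print("estimated [beta0, beta1, beta2]:", np.round(coef, 3))
```

Under these assumptions, the estimated slopes should lie close to the assumed values, so that a 1% increase in the synthetic *nurb* is associated with a change of roughly `beta1` percent in the synthetic *NE*.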

New urbanization and the agglomeration of energy-intensive industries interact with each other, and NOx emissions exhibit a certain dynamic lag. Therefore, the interaction term between new urbanization and the agglomeration of energy-intensive industries, together with the first-order lag of NOx emissions, was introduced to build the following dynamic model:

$$\begin{aligned} \ln NE_{it} &= \eta + \eta_0 \ln NE_{i,t-1} + \eta_1 \ln nurb_{it} + \eta_2 \ln hagg_{it} + \eta_3 \, c\ln nurb_{it} \times c\ln hagg_{it} \\ &\quad + \eta_4 \ln is_{it} + \eta_5 \ln r\&d_{it} + \eta_6 \ln fdi_{it} + \eta_7 \ln er_{it} + \eta_8 \ln pgdp_{it} + \varepsilon_{it} \end{aligned} \tag{2}$$

In Equation (2), the variables in the interaction term are mean-centered, which effectively mitigates the multicollinearity caused by introducing the interaction term. *c* ln *nurbit* and *c* ln *haggit* denote the centered values of ln *nurbit* and ln *haggit*, respectively.
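The effect of centering on the collinearity between a main effect and the interaction term can be checked numerically. The sketch below uses synthetic, independently drawn values as hypothetical stand-ins for ln *nurb* and ln *hagg* (their means and variances are assumptions) and compares the correlation between ln *nurb* and the interaction term before and after centering.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # hypothetical sample size

# Synthetic logged variables with non-zero means (illustrative values only).
ln_nurb = rng.normal(2.0, 0.3, size=n)
ln_hagg = rng.normal(2.0, 0.3, size=n)

# Interaction built from the raw logged variables.
raw_inter = ln_nurb * ln_hagg

# Interaction built from mean-centered variables (c ln nurb, c ln hagg).
c_ln_nurb = ln_nurb - ln_nurb.mean()
c_ln_hagg = ln_hagg - ln_hagg.mean()
cen_inter = c_ln_nurb * c_ln_hagg

# Correlation between the main effect and each version of the interaction.
corr_raw = np.corrcoef(ln_nurb, raw_inter)[0, 1]
corr_cen = np.corrcoef(ln_nurb, cen_inter)[0, 1]

print(f"corr(ln nurb, raw interaction):      {corr_raw:.3f}")
print(f"corr(ln nurb, centered interaction): {corr_cen:.3f}")
```

With these assumed distributions, the raw interaction is strongly correlated with the main effect, whereas the centered interaction is nearly uncorrelated with it.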

Introducing the first-order lag of NOx emissions gives the regression model a dynamic interpretation, but it also introduces an endogeneity problem. The difference generalized method of moments (DIF-GMM) can reduce the influence of endogeneity on the model estimation. However, DIF-GMM suffers from serious problems, such as weak instrumental variables and imprecise coefficient estimates. For this reason, scholars combined the level equation with the first-difference equation and proposed the system generalized method of moments (SYS-GMM) [51,52]. With limited samples, the system GMM method yields more accurate estimates. Therefore, Stata 16.0 [53] was used to estimate the model with the system GMM method.
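The endogeneity caused by a lagged dependent variable can be illustrated with a small simulation. The sketch below is not SYS-GMM and does not reproduce the Stata estimation used in this paper; it swaps in the simpler Anderson-Hsiao instrumental-variable estimator, which, like DIF-GMM, works on the first-difference equation and uses a lagged level as the instrument. The panel dimensions, the persistence parameter `rho` and the unit fixed effects `alpha` are all assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, burn = 2000, 8, 50  # hypothetical panel: N units, T periods, burn-in
rho = 0.5                 # assumed true coefficient on the lagged outcome

# Simulate y_it = rho * y_{i,t-1} + alpha_i + e_it with unit fixed effects.
alpha = rng.normal(0.0, 0.5, size=N)
y = np.zeros((N, T + burn))
for t in range(1, T + burn):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(0.0, 1.0, size=N)
y = y[:, burn:]  # keep the last T periods only

# Within (fixed-effects) estimator: demean by unit, then regress y_t on y_{t-1}.
y_cur, y_lag = y[:, 1:], y[:, :-1]
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
rho_within = (yl * yc).sum() / (yl * yl).sum()

# Anderson-Hsiao estimator: first-difference the model to remove alpha_i,
# then instrument the differenced lag with the level y_{i,t-2}.
dy = np.diff(y, axis=1)  # dy[:, k] = y[:, k+1] - y[:, k]
dy_cur = dy[:, 1:]       # differenced outcome at t
dy_lag = dy[:, :-1]      # differenced outcome at t-1 (endogenous regressor)
z = y[:, :-2]            # instrument: level at t-2
rho_ah = (z * dy_cur).sum() / (z * dy_lag).sum()

print(f"true rho:                 {rho:.3f}")
print(f"within estimator:         {rho_within:.3f}")
print(f"Anderson-Hsiao estimator: {rho_ah:.3f}")
```

In this setting, the within estimator is expected to be biased downward because the panel is short, whereas the instrumented estimate should lie closer to the assumed `rho`. System GMM goes further by adding moment conditions from the level equation.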
