*2.1. Data*

#### 2.1.1. Air Quality and Weather Monitoring Network in Milan

Data on pollution and weather conditions of Lombardy are collected from the Lombardy Regional Agency for Environmental Protection (ARPA Lombardia), which makes a large open data portal fully available to users (https://www.dati.lombardia.it/). The agency manages a diffuse monitoring system distributed among the regional territory and counting on hundreds of monitoring stations collecting intra-daily information on climate and pollution through sensors.

Installed within the borders of Milan are seven weather monitoring stations and five air quality monitoring stations. Air quality stations are classified according to a taxonomy system that identifies the type and function in the network. The stations Liguria (ARPA code 539), Marche (ARPA code 501), Senato (ARPA code 548), and Verziere (ARPA code 528) are urban traffic control units: sensors installed near important roads and intersections in order to accurately quantify the pollution generated by traffic. The station Città Studi (ARPA code 705) is instead of type urban background, that is, the station is located in such a position that the level of pollution is not mainly influenced by specific sources but by the integrated contribution of all the upwind sources at the station with respect to the predominant directions of the winds on the site [8]. The seven weather stations are Marche (ARPA code 501), Lambrate (ARPA code 100), Zavattari (ARPA code 503), Brera (ARPA code 620), Feltre (ARPA code 869), Rosellini (ARPA code 1327), and Juvara (ARPA code 502).

Figure 2 georeferences on the map the exact position of each station and allows for identifying the position with respect to Area B and Area C. Air quality stations are represented as blue points, while weather stations are the red points. Marche station (ARPA code 501), in the upper side of the map, is the only one to collect both weather and pollution data and is represented with a double label, the first one blue and the second red.

The spatial distribution of the stations is not uniform: air quality stations cover northern, eastern, central, and southern parts of the city, leaving the western districts uncovered; climate stations cover in detail the city centre and all the northern neighbours but are not installed in the south.

#### 2.1.2. Temporal Coverage, Pollutants, and Weather Measures

The analysis presented in this paper takes into account daily measures from 1 January 2014 to 30 September 2019, generating an overall sample of 2099 daily observations.

The whole, the monitoring system provides information about many urban pollutants, such as carbon dioxide, particulates, and oxides. All the pollutants are measured as μg/m3. As already stated in the Introduction, we focus our attention on concentrations of total nitrogen oxides (NOx) and nitrogen dioxide (NO2), which are mainly primary gaseous pollutants, hence considered as proxies of pollution emissions due to human activities, first of all car traffic.

Weather stations provide measures of local temperature (◦C), rainfall (cumulated mm), humidity (%), global radiation (W/m2), wind speed (m/s), and wind direction. The wind direction is expressed in clockwise degrees from 0◦ to 360◦; for example, 90◦ identifies winds going from east to west. To make results easier to interpret, we decide to aggregate the measurements on wind direction and speed by constructing a set of new variables that describe the average speed in the four quadrants of the compass rose. The Northeast quadrant (QNE) corresponds to degrees between 0 and 90, the Southeast quadrant (QSE) to degrees from 90 to 180, the Southwest quadrant (QSW) to degrees from 180 to 270 and the Northwest quadrant (QNW) to the remaining values lying between 270 and 360 degrees.

These measures will be used in the modeling part to capture local weather conditions specific to the city of Milan. Instead of using the data referring to the weather station closest to each air quality station, we preferred to aggregate each of the climate variables through a daily city average valid for each pollution station. In this way, the subsequent models will be fully comparable guaranteeing the maximum possible spatial coverage.

#### 2.1.3. Anthropogenic Activities

Human activities, and therefore the quality of the air we breathe, are often affected by calendar events that are recorded based on national, local, and religious holidays and weekends. Calendar effects are captured by a set of covariates, which identify the weekends and the main Italian holidays, both religious and secular. Holidays are collected in a dummy variable named *Holidays*, while the weekends are contained in a dummy variable named *WeekEnd*. Specific effects related to the behavior of people can be observed when holidays coincide with the weekend; therefore, we considered two terms of interaction between the two dummies. The interaction terms include those holidays that fall on Saturday, denoted as *Saturday:Holiday*, and those on which they fall on Sunday, which is *Sunday:Holiday*.

For a correct assessment of the effects of the traffic policy on pollutants concentrations in Milan, it is necessary to purify the estimates from any external weather or socio-economic effects overlapping with the policy and which may hence alter policy effects. This operation is accomplished by introducing a counter-factual term into the models represented by the pollution levels observed in other cities surrounding Milan. We considered seven important urban centres located in the Lombardy Po Valley area, which show socio-demographic and economic characteristics and weather conditions similar to Milan, but which cannot be directly affected by the limited traffic zone. These urban centres are Bergamo (East), Brescia (far East), Cremona (far Southeast), Lodi (Southeast), Pavia (South), Saronno (North) and Treviglio (East). As reported in Figure 3, the considered candidates cover a large territory surrounding Milan in all the directions while maintaining a sufficient distance to be considered independent in terms of traffic.

#### *2.2. Methods: Average and Median Difference before and after the Policy*

Figure 4 shows the temporal evolution of yearly average and median concentrations in the period preceding and following the entry into force of the policy for each control units located in Milan. According to the figure, starting from 2015, the city of Milan recorded a generalized reduction of concentration levels especially in peripheral areas, such as Marche and Liguria. Observed mean values for 2019 present a further reduction of concentrations rather apparently anomalous and significant. The comparison between the levels of NOx and NO2 pairs for each station shows obvious common trends between the two pollutants both considering the annual average and median values. Averages and medians follow similar temporal patterns, but focusing on nitrogen oxides sensors, it is possible to note that the medians are significantly smaller than the averages, highlighting the heavy-tailed characteristic of the distribution (positive asymmetry) and the presence of extreme values. Following these facts, an interesting question to investigate is if, and how much, the greater difference observed in 2019 can be attributed to traffic restrictions, or if it is due to a general de-carbonization trend that the city is experiencing, or to weather variations not considered yet.

**Figure 3.** Georeferentiation of counter-factual candidates. Geographical positioning of the counterfactual candidates with respect to Milan.

**Figure 4.** Pollutant levels in Milan (μg/m3). Observed concentrations levels of NOx and NO2 between 2014 and 2019 with yearly average and median values. Values are expressed as μg/m3.

Before investigating the factors and causes that may have generated these sharp reductions, we perform a preliminary analysis of the concentration levels pre-and-post policy, in order to quantify the changes observed in 2019 both in Milan and in the other centres. Since air quality data present outliers and heavy-tail distributions given by extreme events, the only use of average values for central tendency estimation can provide misleading results. Therefore, we compare the central values obtained both considering the sample mean and the sample median, which is notoriously a more robust indicator if outliers occur [26,27].

The comparison is performed through the computation of two statistics based on the difference of central tendency indicators. The first statistic computes the difference between the average of the observations gathered after the policy intervention and the average of observations referring to the sub-period 2014–2018. The second statistics consists of computing the difference between median concentration levels observed in 2019 and before that year. The difference in average concentrations is denoted by *dAVG*, whereas the difference in median concentrations is denoted as *dMED*. Since both sub-periods are treated as independent of each other, from the statistical perspective, the statistics are assimilable to unpaired samples statistics.

Both statistics use the observations collected between 25 February and 30 September of each year, with a total length of 214 days. Approaches of this type can be framed in a context of treatment-control analysis, in which the data referred to the year 2019 constitute the treatment group, while the observations collected between 2014 and 2018 compose the control group. Control data refer to a 5-year-period; therefore, the concentrations measured on the same calendar day are aggregated into a single representative value calculated as the daily average concentration of the period 2014 to 2018. Denoting as *cij* the observed pollutant concentration during the day *j*, where *j* = 25 *February*, ..., 30 *September*, of the year *i*, where *i* = 2014, ..., 2018, the average for a generic calendar day *j* is computed as *cj* = ∑<sup>2018</sup> *i*=2014 *cij* 5.

Let *U* = {*uj*, *j* = 1, 2, ..., 214} be the treatment observations and *V* = {*vj*, *j* = 1, 2, ..., 214} the control observations, the difference of averages is defined as *dAVG* = *AVG*(*U*) − *AVG*(*V*) and the difference of medians is calculated as *dMED* = *MED*(*U*) − *MED*(*V*), where *AVG*(.) is the temporal sample mean and *MED*(.) is the temporal sample median.

#### *2.3. Methods: Time Series Modeling Using a State Space Approach*

In this section, we discuss the time series models used to identify the policy effect, the estimation algorithms, and the related inference. Firstly, we introduce a brief description of the basic structural model (BSM) using a state space approach for time series analysis and the estimation algorithm based on the Kalman filter [28,29]. Then, we present a three-step procedure used to select the most representative model in terms of predictive power and quality of fit. As a last step, we explain how the policy intervention is included in the models and how it should be interpreted.

#### 2.3.1. Basic Structural Model for Air Quality Data

According to their physical characteristics, air pollution concentrations time series are often characterized by seasonality, high persistence [30,31], strong right skewness with uni-modal distribution, and scale invariance [32]. Therefore, we analyze the concentrations using the basic structural model [33,34] augmented by deterministic regressors for weather conditions and socio-economic features.

BSM is defined as a simple unobservable components model composed by local linear trend (LLT), stochastic seasonality, and irregular (white noise) component. LLT describes both the temporal evolution of the series level and its slope, while the seasonal component aims to capture cyclical behaviors given by natural and anthropogenic phenomena. We modeled the seasonal component using a trigonometric form for daily data, hence with period *s* = 365, and considering only a few harmonics given the very regular and almost deterministic behavior of the series. This fact avoids the risk of a model over-parametrization.

Let {*y*1, *y*2, ..., *yn*} be the time series of the observed pollution concentrations in logarithmic scale, the state space form of BSM without regressors is composed by the following equations:

$$y\_t = \mu\_t + \gamma\_t + \varepsilon\_{t\ \prime} \tag{1}$$

where *εt* ∼ *N*(0, *σ*2*ε* ) is the measurement error and

$$\text{LLT (Level)}: \quad \mu\_{\text{l}} = \mu\_{\text{l}-1} + \beta\_{\text{l}-1} + \eta\_{\text{l}}, \qquad \eta\_{\text{l}} \sim \text{WN}(0, \sigma\_{\text{\eta}}) \,, \tag{2}$$

$$LLT\ (Slope):\quad \beta\_l = \beta\_{l-1} + \mathbb{Z}\_l,\qquad \mathbb{Z}\_l \sim \text{WN}(0, \sigma\_{\mathbb{Z}})\ ,\tag{3}$$

$$Stochastic\text{ }s\text{-}asonality:\quad \gamma\_t = \sum\_{j=1}^{k} \gamma\_{j,t}.\tag{4}$$

where *k* ≤ *s*2 is the number of included harmonics and *γj*,*<sup>t</sup>* is the non-stationary stochastic cycle

$$
\begin{bmatrix}
\gamma\_{j,t} \\
\gamma\_{j,t}^\*
\end{bmatrix} = \begin{bmatrix}
\cos(2\pi j/s) & \sin(2\pi j/s) \\
\end{bmatrix} \begin{bmatrix}
\gamma\_{j,t-1} \\
\gamma\_{j,t-1}^\*
\end{bmatrix} + \begin{bmatrix}
\omega\_{j,t} \\
\omega\_{j,t}^\*
\end{bmatrix} \tag{5}
$$

*ωt* ∼ *WN*(0, *<sup>σ</sup>*2*ω*) and *<sup>ω</sup>*<sup>∗</sup>*t* ∼ *WN*(0, *<sup>σ</sup>*2*ω*) are white-noise processes with mean zero and variance *<sup>σ</sup>*2*ω*.

Equation (1) is called measurement equation and describes the evolution of the observed series as the sum of the underlying components, while Equations (2)–(4) are named transition equations. Equations (2) and (3) compose the LLT and describe respectively the unobservable processes of the level and the slope, whereas Equation (4) describes the trigonometric seasonality evolution. Weather, socio-economic factors, and policy intervention will be included in the models adding a set of deterministic components to the measurement Equation (1). Since the BSM with Gaussian errors belongs to the class of Gaussian linear models, the estimation step has been performed using the Kalman Filter algorithm, an iterative procedure, which allows estimating simultaneously the unobservable components and the model's parameters by maximizing the Gaussian likelihood function.

When dealing with Gaussian linear state space models, the parameters estimated using a maximum likelihood (ML) approach inherit the asymptotic properties of ML estimators [29]. The distribution of the MLE is asymptotically approximated using a Gaussian distribution, which allows deriving the usual asymptotic confidence intervals and t-tests for significance. Assuming a significance level of 5%, the estimates are statistically significant if the standardized value lies outside of the interval [−1.96,1.96], obtained using the quantiles of a Standard Normal distribution. Moreover, since the dependent variable is expressed in logarithmic scale, the coefficients have to be interpreted as relative increases or decreases in concentration levels due to a unitary increase in the explanatory variable.

#### 2.3.2. Three-Step Model Selection

We now propose a three-step procedure for model selection, which considers multiple rules based on cross-validation, information criteria, and stepwise regression. To avoid estimation bias due to the policy introduction, all the steps are computed using only the observations before the introduction of Area B that is, from 1 January 2014 to 24 February 2019.

Step 1 is designed for selecting the most predictive seasonal component, defined in Equation (4), comparing different model specifications, which consider a varying number of harmonics *k* for the trigonometric function. Specifically, we fit 10 alternative models for each station: in each of them, the trigonometric seasonality is modeled by an increasing number of harmonics ranging from *k* = 1 to *k* = 10. The use of an increasing number of sinusoids, in our case up to 10, allows the modeling of complex seasonality with strong variations within short periods, but at the same time increases the model complexity.

Once the seasonal component has been selected, step 2 introduces in Equation (1) a counter-factual term *xt* able to capture weather and socio-economic factors common to the Po basin and affecting the air quality of Milan. In our approach, the counter-factual candidates are the time series introduced in Section 2.1.3 and which refer to the measurements of pollutant concentrations in seven important cities around Milan. The new measurement Equation can be written as follows:

$$
\Delta y\_t = \mu\_t + \gamma\_t + \theta \mathbf{x}\_t + \varepsilon\_t \tag{6}
$$

where *xt* is the logarithm of the counter-factual time series and *θ* is its coefficient, *μt* follows Equations (2) and (3), and *γt* follows the specification obtained by step 1. The expected sign of *θ* is positive: higher levels of air pollution should correspond to high values in nearby cities due to similar conditions.

In step 3, we identify the best subset of calendar events and weather covariates, capturing residual variations not ye<sup>t</sup> covered by the counter-factual or by the latent components. These residual variations are estimated by the smoothed observation disturbances from Equation (6) that is *<sup>ε</sup>*<sup>ˆ</sup>*t*, and describe residual patterns that have not been explained by the persistence of series, the seasonality or characteristics common to nearby territories of the region.

Relevant weather and calendar covariates are selected through a backward-forward stepwise regression algorithm, which uses as a starting model the auxiliary linear regression expressed in Equation (7). The equation represents the full model which sets up the smoothed observation errors *ε*ˆ*t* as dependent variable and the weather conditions and calendar events as covariates:

$$\begin{aligned} \varepsilon\_{l} &= \tau\_{l} \textit{Holdays} + \tau\_{l} \textit{WeekEnd} + \tau\_{\sf S} \textit{Saturday} : \textit{Holdays} + \tau\_{\sf S} \textit{Sunday} : \textit{Holday} \\ &+ \tau\_{\sf S} \textit{Temperature} + \tau\_{\sf S} \textit{Rainfall} + \tau\_{\sf S} \textit{Radianity} + \tau\_{\sf S} \textit{Hundredity} \\ &+ \tau\_{\sf S} \textit{VindSpeed}\_{\sf N \to} + \tau\_{\sf N \to} \textit{VindSpeed}\_{\sf N \to} + \tau\_{\sf T} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf T} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf T} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf T} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit{VindSpeed}\_{\sf S \to} + \tau\_{\sf S} \textit$$

The stepwise regression is set up twice for each station: in one case, it selects the model according to the Akaike's Information Criterion (AIC), while, in the other, it uses the Bayesian Information Criterion (BIC). The algorithm starts estimating the full model and computes the AIC or the BIC. Iteratively, it drops out the predictors one at a time; at each step, it computes the new information criterion and considers whether the criterion is improved by adding back in a variable removed at a previous step. The procedure ends when the reintroduction of each omitted variable does not improve the information criteria.

In the first two steps, we select the seasonal component and the counter-factual term by fitting and comparing alternative models based on Equations from (1) to (4) according to their predictive power and their ability to adapt adequately to the observed data. The first principle, which tests the predictive power of the models, relies on the minimization of the cross-validated mean square forecasting error (MSFE) evaluated for up to 10-step-ahead forecast horizon, that is, *y*<sup>ˆ</sup>*t*+*<sup>h</sup>* ∀ *h* = 1, 2, ..., 10, while the second compares the models in terms of estimation quality. The latter computes both corrected Akaike's Information Criteria (AICc) and BIC intending to select the model that minimizes both. To identify a unique model for all the stations located in Milan, we proceed to a global comparison, both graphical and analytical, of the two blocks of indicators, giving attention to the overall performances and not focusing only on individual outputs.

According to the cross-validation principle for time series [35,36], we split the full time series into two subsets: a training set for model estimation and a test set for evaluating the out-of-sample forecast performances. The training set includes all the measurements until 24 February 2018, while the test set contains observations relative to the sub-period 25 February 2018–24 February 2019, for a

total count of 365 out-of-sample observations. The exclusion of observations after the start of the traffic restrictions makes it possible to obtain unbiased estimates of the policy effects avoiding overlapping with other unidentified factors. Before starting the iterative loop, the model to evaluate is estimated just on time using the observations included in the original training set. At the end of the estimation, the cross-validation algorithm is iteratively implemented as follows. For each iteration, the algorithm extracts the first ten observations available in the test set, generating a forecasting set, and computes three quantities: the 1-to-10 step-ahead forecasts that is *y*<sup>ˆ</sup>*t*+*<sup>h</sup>* ∀ *h* = 1, ..., 10, the forecast errors *y* ˆ *t*+*h* − *yt*+*h* and the quadratic forecast errors (*y*<sup>ˆ</sup>*t*+*<sup>h</sup>* − *yt*+*<sup>h</sup>*)2. The first out-of-sample observation is discarded and the set of forecasting observations is updated right-shifting the forecast horizon by 1 unit and adding the new observation. These operations are repeated for a number of times equal to the length of the test set, in our cases 365 times. The algorithm returns the output of 365 different sequences of 1–10 step-ahead forecasts; for each step-ahead *h* = 1, 2, ..., 10, the MSFE is calculated as *MSFE*(*h*) = ∑<sup>365</sup> *j*=1 (*y*<sup>ˆ</sup>*t*+*h*−*yt*+*<sup>h</sup>*)<sup>2</sup> 365.

#### 2.3.3. Policy Intervention Analysis

The introduction of new rules or limitations to individual behaviors can lead to the co-existence of multiple effects with different structure, such as simultaneous immediate changes and adaptive changes that take a long time before visible effects occur. Take into consideration that this fact leads to implement intervention analysis, which includes both permanent and transitory effects. Further details and examples of ARMA-like transfer function applied to intervention analysis are available in Pelagatti ([29]).

The policy intervention is modeled through the combination of two individual effects: (1) a permanent effect, estimated by *δ*1 that measures the level shift of pollutant concentrations given by the treatment and modeled as a step dummy, which is *D*1*t*, which assumes a value equal to 1 starting from 25 February 2019; (2) a transitory effect, estimated by *δ*0 and evolving according to a first-order difference dynamics of the type

$$w\_{\!\!\!\/} = \lambda w\_{\!\!\!\/-1} + \delta\_0 D\_{\!\!\/+1} \tag{8}$$

where *D*2,*t* is a impulse dummy, which assumes value equal 1 for 25 February 2019 and 0 otherwise and *λ* measures the persistence of the transitory effect. The sum of the two effects returns the total effect, which expresses the estimated overall reduction or increase in air pollutant levels generated by the policy. The measurement equation after the three-step model selection and augmented by the policy intervention is eventually expressed as follows:

$$y\_l = \mu\_l + \gamma\_l + \theta \mathbf{x}\_l + Z\_l \Phi + \delta\_1 D\_{1t} + w\_l + \varepsilon\_l \tag{9}$$

where *yt* is the logarithm of pollution concentrations in one of the stations in Milan, *xt* is the logarithm of pollution concentrations in the optimal counter-factual station, *μt* is the LLT evolving according to Equations (2) and (3), *γt* is the optimal seasonal component selected in step one, *Zt* is a matrix containing the set of optimal subset of weather and calendar covariates selected in step 3, and Φ is the associate vector of coefficients.
