### *3.1. Spatial Methodology*

To investigate the spatial distribution of PTB risk at the census-block level in Paris, we used a spatial scan statistic approach implemented in the SaTScan software [56].

The null hypothesis (H0) states that the risk of PTB is uniformly distributed throughout the study area. The alternative hypothesis (H1) states that PTB risk is higher within the cluster than in census blocks outside the cluster.

In our study, the Poisson probability model implemented in the SaTScan software [56] was chosen as the *cluster analysis method*. The number of PTB cases (a rare event) in each census block is assumed to follow a Poisson distribution. The input data for the Poisson model are the number of cases (PTB) and the population at risk (all births) in each census block, which are used to determine whether the cases are significantly clustered in space.
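Under H0, the expected number of cases in a census block is its share of the population at risk applied to the total number of observed cases. The minimal sketch below illustrates this unadjusted calculation; the function name `expected_cases` and the toy birth counts are hypothetical and only show the arithmetic, not SaTScan's internal code.

```python
def expected_cases(populations: list[float], total_cases: int) -> list[float]:
    """Expected PTB cases per census block under H0 (uniform risk).

    Each block receives a share of the observed total proportional to its
    population at risk (births), so the expectations sum to total_cases.
    """
    total_pop = sum(populations)
    if total_pop <= 0:
        raise ValueError("total population at risk must be positive")
    return [total_cases * p / total_pop for p in populations]


# Hypothetical toy data: births in four census blocks, 12 PTB cases in total.
births = [100, 300, 200, 400]
print(expected_cases(births, 12))  # → [1.2, 3.6, 2.4, 4.8]
```

Because the expectations are rescaled to the observed total, the analysis is conditioned on the total number of cases, as noted for Equation (1).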

We therefore compute a relative risk (RR), in which the expected number of cases in each census block is weighted by its population at risk. The RR of a cluster is the ratio of observed to expected cases within the cluster divided by the ratio of observed to expected cases outside the cluster (Equation (1)):

$$RR = \frac{c / E[c]}{(C - c) / (E[C] - E[c])} = \frac{c / E[c]}{(C - c) / (C - E[c])} \tag{1}$$

where *c* is the number of observed PTB cases within the cluster, *E*[*c*] is the expected number of PTB cases within the cluster under H0, and *C* is the total number of PTB cases in the data set. Because the analysis is conditioned on the total number of observed cases, *E*[*C*] = *C*.
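Equation (1) can be evaluated directly from *c*, *E*[*c*] and *C*. The following is a minimal sketch with a hypothetical function name `relative_risk` and made-up counts; it is only meant to make the arithmetic concrete.

```python
import math


def relative_risk(c: int, expected_c: float, total_cases: int) -> float:
    """Relative risk of a cluster, Equation (1).

    Because the analysis is conditioned on the total number of cases,
    the expected count outside the cluster is total_cases - expected_c.
    """
    inside = c / expected_c
    outside = (total_cases - c) / (total_cases - expected_c)
    if outside == 0:  # every case falls inside the cluster
        return math.inf
    return inside / outside


# Hypothetical cluster: 30 observed vs 20 expected cases, 100 cases overall.
print(round(relative_risk(30, 20.0, 100), 3))  # → 1.714
```

Here the ratio inside the cluster is 30/20 = 1.5 and outside it is 70/80 = 0.875, giving RR = 1.5/0.875 ≈ 1.714.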

The procedure for identifying the most likely cluster is as follows. First, a circle is centered on the centroid of each census block, with a radius varying from zero up to a maximum at which the circle includes at most 50% of the population at risk [57]. Second, repeating this for every centroid, the circles cover the whole study area, and for each circle the PTB rate inside it is compared with what would be expected under a random distribution. A census block belongs to a circle when its centroid lies within it; therefore, although the radius varies continuously, each centroid generates only a finite number of distinct windows (sets of census blocks).
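The window-generation step can be sketched as follows, assuming planar centroid coordinates and simple Euclidean distance. The function `circular_windows` is hypothetical, and this simplified version adds blocks one at a time even when several lie at exactly the same distance, so SaTScan's own handling of ties and coordinates may differ.

```python
import math


def circular_windows(centroids, populations, max_share=0.5):
    """Enumerate distinct circular scanning windows (sets of block indices).

    For each centroid, blocks are added in order of increasing distance;
    a window stops growing once adding the next block would push the
    population inside above max_share of the total population at risk.
    """
    total_pop = sum(populations)
    windows = set()
    for cx, cy in centroids:
        # Order all census blocks by distance from this centre.
        order = sorted(
            range(len(centroids)),
            key=lambda j: math.dist((cx, cy), centroids[j]),
        )
        inside, pop_inside = [], 0.0
        for j in order:
            if pop_inside + populations[j] > max_share * total_pop:
                break
            inside.append(j)
            pop_inside += populations[j]
            windows.add(frozenset(inside))  # duplicates collapse in the set
    return windows


# Hypothetical toy layout: four blocks on a line with 100 births each.
centroids = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(len(circular_windows(centroids, [100, 100, 100, 100])))  # → 7
```

Storing each window as a `frozenset` makes the point in the text explicit: many radii map to the same set of census blocks, so only a finite number of distinct windows needs to be evaluated.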

The scan statistic is likelihood-based: a likelihood is computed for each window, and the window with the maximum likelihood is the most likely cluster, which is then tested for statistical significance. For the Poisson model, the likelihood function of a given window is proportional to Equation (2):

$$\left(\frac{c}{E[c]}\right)^{c} \left(\frac{C - c}{C - E[c]}\right)^{C - c} I\left(c > E[c]\right) \tag{2}$$

where *C* is the total number of PTB cases, *c* is the observed number of PTB cases within the window, and *E*[*c*] is the covariate-adjusted expected number of PTB cases within the window under the null hypothesis. Because the analysis is conditioned on the total number of observed cases, *C* − *E*[*c*] is the expected number of cases outside the window. *I*() is an indicator function equal to 1 when the window contains more cases than expected under H0 (*c* > *E*[*c*]) and 0 otherwise, so that only windows with an elevated PTB risk are retained.
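In practice the logarithm of Equation (2) is usually computed, since maximizing the log-likelihood ratio selects the same window. The minimal sketch below uses a hypothetical function name `poisson_llr`; it assumes *E*[*c*] > 0 and applies the indicator by returning 0 for windows without an excess of cases.

```python
import math


def poisson_llr(c: int, expected_c: float, total_cases: int) -> float:
    """Log of the Poisson likelihood ratio in Equation (2).

    Returns 0 when the indicator I(c > E[c]) is 0, i.e. when the window
    does not contain more cases than expected. expected_c must be positive.
    """
    if c <= expected_c:
        return 0.0
    outside = total_cases - c
    expected_outside = total_cases - expected_c
    llr = c * math.log(c / expected_c)
    if outside > 0:  # the term 0 * log(0) is taken as 0
        llr += outside * math.log(outside / expected_outside)
    return llr


# Hypothetical window: 30 observed vs 20 expected cases, 100 cases overall.
print(round(poisson_llr(30, 20.0, 100), 3))  # → 2.817
```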

The most likely clusters are identified with a likelihood ratio test [58], and the associated *p*-value is obtained by Monte Carlo replication [59]. We used 999 Monte Carlo replications to ensure adequate power for detecting clusters, with a significance level of 0.05.
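The Monte Carlo test can be sketched as a rank test: the maximum log-likelihood ratio of the observed data is compared with that of datasets simulated under H0. This sketch assumes that each replicate redistributes the same total number of cases among census blocks in proportion to their expected counts (a multinomial draw, consistent with conditioning on *C*); the function names, the fixed seed and the toy inputs are hypothetical, and SaTScan's implementation may differ in its details.

```python
import math
import random


def window_llr(c, e, total):
    """Log of Equation (2) for one window; 0 unless c > e (indicator I)."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total - c > 0:  # the term 0 * log(0) is taken as 0
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def max_llr(cases, expected, windows):
    """Largest log-likelihood ratio over all candidate windows."""
    total = sum(cases)
    return max(
        window_llr(sum(cases[i] for i in w), sum(expected[i] for i in w), total)
        for w in windows
    )


def monte_carlo_p_value(cases, expected, windows, n_rep=999, seed=1):
    """Rank-based Monte Carlo p-value of the most likely cluster."""
    rng = random.Random(seed)
    total = sum(cases)
    blocks = range(len(cases))
    observed = max_llr(cases, expected, windows)
    at_least_as_extreme = 0
    for _ in range(n_rep):
        # Null replicate: the same total of cases, allocated to blocks
        # with probabilities proportional to their expected counts.
        sim = [0] * len(cases)
        for i in rng.choices(blocks, weights=expected, k=total):
            sim[i] += 1
        if max_llr(sim, expected, windows) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_rep + 1)


# Hypothetical toy data: block 0 has a clear excess of cases.
windows = [[0], [1], [2], [3], [0, 1], [1, 2], [2, 3]]
print(monte_carlo_p_value([15, 2, 2, 1], [5.0] * 4, windows))
```

With 999 replications the smallest attainable *p*-value is 1/1000 = 0.001, which is why this number of replications is a common choice for a 0.05 significance level; the toy example above should print a value well below 0.05.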

### *3.2. Analytical Strategy and Results Interpretation*


To incorporate covariates into the model, we categorized NO2 concentrations and the socioeconomic deprivation index into five groups according to the quintiles of their distributions. Because the SaTScan software does not accommodate interaction terms, we created several dummy variables combining the socioeconomic deprivation and air pollution categories.
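One way to see how such categorical covariates enter a Poisson scan is indirect standardization: each census block's expected count is its number of births multiplied by the PTB rate of its covariate stratum. The sketch below assumes that the strata are the 25 combinations of deprivation and NO2 quintiles and that quintiles are assigned by rank; the function names are hypothetical, ties are broken by input order, and the authors' exact cut-points and SaTScan coding may differ.

```python
from collections import defaultdict


def quintile_groups(values):
    """Rank-based quintile group (1-5) for each value; ties keep input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 5 // n + 1
    return groups


def adjusted_expected(cases, births, deprivation, no2):
    """Covariate-adjusted expected cases by indirect standardization.

    Each block's stratum is its (deprivation quintile, NO2 quintile) pair,
    i.e. one of the 25 combined categories. Assumes every stratum has births.
    """
    strata = list(zip(quintile_groups(deprivation), quintile_groups(no2)))
    stratum_cases = defaultdict(float)
    stratum_births = defaultdict(float)
    for s, c, b in zip(strata, cases, births):
        stratum_cases[s] += c
        stratum_births[s] += b
    # Expected = births in the block x PTB rate of its stratum in the city.
    return [b * stratum_cases[s] / stratum_births[s] for s, b in zip(strata, births)]
```

By construction the adjusted expectations still sum to the total number of cases, so Equations (1) and (2) apply unchanged with *E*[*c*] replaced by the covariate-adjusted values.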

In the first step (the unadjusted model), a statistically significant test means that the risk of PTB is not randomly distributed in the city of Paris: a cluster of census blocks presents a significantly higher PTB risk than census blocks located outside the cluster [59].

For the other three steps, in which the models are adjusted for one or more covariates, we followed Kulldorff's work [57] and used several statistical criteria to test H0: the location of the cluster (a shift, disappearance, or no change), the level of statistical significance of the cluster, and the likelihood ratio of each model.

These criteria lead to three possible outcomes. First, if after adjustment the most likely cluster remains in the same location (whether or not it is still significant) and its likelihood ratio decreases, the variable(s) incorporated in the model partially explain the excess risk [56]. Second, if the most likely cluster shifts (i.e., its centroid changes), this suggests that the covariate(s) in the model explain the excess risk of the original cluster [56], which allows a secondary cluster to be identified. Finally, if the most likely cluster disappears entirely, the adjusted PTB risk is randomly distributed over the study area. To map and visualize the location of the statistically significant most likely clusters, we used ArcGIS software (ESRI, Meudon, France).
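The interpretation rules for the adjusted models can be summarized as a small decision function. In this sketch the `Cluster` record, its fields and `interpret_adjustment` are hypothetical; `None` stands for the case where no most likely cluster remains after adjustment, and the final "unchanged" branch covers the "no change" location outcome, for which the three rules above give no specific interpretation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    centroid: str  # hypothetical identifier of the cluster's central census block
    llr: float     # log-likelihood ratio of the most likely cluster


def interpret_adjustment(unadjusted: Cluster, adjusted: Optional[Cluster]) -> str:
    """Classify how covariate adjustment changed the most likely cluster."""
    if adjusted is None:
        return "disappeared: adjusted PTB risk is randomly distributed"
    if adjusted.centroid != unadjusted.centroid:
        return "shifted: covariates explain the original cluster's excess risk"
    if adjusted.llr < unadjusted.llr:
        return "same location, lower LLR: covariates partially explain the excess risk"
    return "unchanged: these criteria give no evidence that covariates explain it"


# Hypothetical example: same centroid, likelihood ratio drops after adjustment.
print(interpret_adjustment(Cluster("A", 20.0), Cluster("A", 12.0)))
```

Checking the centroid before the likelihood ratio mirrors the order of the criteria in the text: a change of location is interpreted first, and the likelihood ratio is only compared when the cluster stays in place.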
