**2. Dataset**

We used the global centroid moment tensor (CMT) catalog containing 11,638 earthquakes with a depth ≤ 50 km recorded over the past almost 40 years between 1980 and 2019 [13,14]. We considered only events above the completeness magnitude as threshold Mw = 5.5 [13,15]. The epicenter distribution of these events is shown in Figure 1. The current seismic sequences present in the seismic catalog have been detected by the [16] declustering algorithm, and the related parameters are provided and implemented in the ZMAP software [17]. Figure 2 shows the mainshocks (red dots) and foreshocks/aftershocks

(blue dots) in three zones in the world (Chile, Mexico, and Indonesia). Table 1 shows the number of events present in each subcatalog. We stress that the declustered catalog (i.e., the catalog containing only the mainshocks of the sequences) has 6440 events, about 45% less with respect to the complete catalog. This work aims to maintain as much data as possible and use all the available earthquakes in the catalog for the spatial distribution modeling. We underline that the use of a global catalog, instead of regional catalogs, has some drawbacks: a high threshold for the completeness magnitude (in our case Mw 5.5), difficulty in recognizing volcanic events, and large uncertainties in hypocentral estimation. The main advantage is a large number of strong events, which can be collected in a few years.

**Figure 1.** Location of earthquakes in the global centroid moment tensor (CMT) catalog with a depth ≤ 50 km recorded over the past almost 40 years between 1980 and 2019 [13,14]; blue letters indicate the zones of the zoom-in Figure 2.



**Figure 2.** Location of earthquakes in the global centroid moment tensor (CMT) catalog with a depth ≤ 50 km recorded over the past almost 40 years between 1980 and 2019 [13,14]; (**<sup>a</sup>**–**<sup>c</sup>**) show the mainshocks (red dots) and foreshocks/aftershocks (blue dots) in some zones in the world: Indonesia, Mexico, and Chile.

#### **3. A New Smoothed Seismicity Approach**

Building a spatial grid is the first step to constructing a spatial smoothed seismicity model [3,4]. In this work, we used a global spatial regular grid, 0.5◦ by 0.5◦. Therefore, we need to compute the contribution to each event in the seismic catalog to the generic *i*-th spatial grid; the following equation describes that contribution:

$$f\_{\bar{i}} = \sum\_{j=1}^{N} cK\_{\bar{i}\bar{j}}A\_{\bar{i}}d\_{\bar{j}} \tag{1}$$

where *fi* represents the normalized total seismic rate for the *i*-th spatial grid, *N* is the total number of events in the complete (i.e., not declustered) catalog, *c* is the normalization factor *c* = 1∑ *fi* , *Kij* is the kernel function that depends on the distance between the center of the *i-th* spatial cell and the epicentre of the *j*-th earthquake, *Ai* is the area of the *i*-th spatial cell, and *dj* is the correction to take into account the foreshocks and aftershocks contribution to the spatial model.

The following Gaussian kernel function [3] is used:

$$K\_{ij} = \frac{1}{2\pi\sigma^2} e^{-\frac{r\_{ij}^2}{2\sigma^2}} \tag{2}$$

where *rij* is the distance between the center of the *i*-th spatial cell and the epicentre of the *j*-th earthquake, and *σ* is the free parameter of the model that rules the amplitude of the smoothing. However, we note that different kernel functions can also be employed in smoothing the epicenters from the earthquake catalog [4,18]. The smoothing distance, σ, involved in each earthquake may be defined differently in various smoothed seismicity models. For example, the fixed smoothed seismicity models practiced a single smoothing distance for all earthquakes. The adaptive smoothed seismicity models represent unique smoothing distances for each earthquake between an event and its *n*th closest neighbors (NN), resulting in spatially varying smoothing distances [4]. The distance becomes smaller in regions of high seismicity than in areas with sparse seismicity. It is one of the crucial parameters in the smoothed seismicity models both for the earthquake rates and the spatial variations of the earthquake activity rates in a region [19]. The correction parameter *dj* represents the innovative part of our method. It is defined as following *dj* = 1*Sj* , where *Sj* is the number of events in the seismic sequence and contains the *j*-th event. For example, if a seismic sequence contains ten events, one mainshock, and nine aftershocks, each event receives a weight, = 110 . Since the sum of all the weights is equal to one, the inclusion of aftershocks does not create a spatial bias in the model [6], and it leads to a better description of the fault that generated the sequence.

Using this simple correction may help better identify the active fault structures and their features in a region. Removing all the aftershocks and foreshocks [3,4], giving very high weight to the mainshocks only [10], may lead to an incomplete or biased view of the spatial distribution of future seismicity. Conversely, considering all the events in the sequence with a uniform weight, as in our method, increases the model's forecasting performance.

We underline that with Equation (1), we build normalized smoothed seismicity models, i.e., the sum of all the rates in the spatial cells are equal to 1. In this work, we do not face the problem of the total number of events and their magnitude frequency distribution, already treated in [9]. In that work, the seismicity rates are corrected by a proposed technique that allows counting all events in the complete seismic catalog by quickly adjusting the magnitude frequency relationships. Our method differs from theirs, since we only deal with the spatial distribution of the seismicity by using an equal weight for all the events of the corresponding seismic sequence and incorporating aftershocks to improve the spatial resolution of the model.

#### **4. Likelihood Testing for Spatial Variation of Seismicity**

To perform the maximum likelihood estimation of the parameter *σ* (both for the fixed and adaptive smoothing approach) and to assess the performance of the model, we avoid considering the Poisson distribution of seismic events because this assumption is rarely satisfied by the seismic catalogs [20,21]. Since we are interested only in the spatial distribution of the events, and with Equation (1), we model the normalized spatial distribution of events, we defined the log-likelihood (*LL*) of the observations with:

$$LL(X|M) = \sum\_{i=1}^{N} \log(f\_i) \tag{3}$$

where *X* is the set of the *N* observations (i.e., the epicenters of the events in the seismic catalog), *M* is the spatial model, *log* is the natural logarithm, and *fi* is the seismic rate of the spatial cell where the *i*-th event is located. We note that this formulation differs from the spatial *LL* defined by [22] and has been commonly used in many seismic experiments [23], since the Poisson hypothesis has been abandoned in our study. The *LL* of Equation (3) may be ratified as the classical *LL* of a bivariate probability density function (represented by the model, *M*) in case we assume the independence between the observations in the set *X*. Additionally, in the case of nonindependent observations, the *LL* can be still used for scoring the models (some authors, in this case, called the function "pseudo-likelihood", [24].

To perform a pseudoprospective evaluation of the models first, we calculated the log-likelihood values by dividing the earthquake catalog into two parts: (1) the learning catalog, which contains the events recorded between 1980 and 2009 and is used to construct trial smoothed seismicity models, and (2) the testing catalog, which covers the last ten years of catalogs (2010–2019). The same *LL* of Equation (3) is also used to evaluate the performance of the models.

We applied the fixed and adaptive smoothing methods with and without our correction to include aftershocks and foreshocks for a total number of four different models. First, we used the learning catalog to compute the optimal smoothing parameters from the maximum-likelihood estimations (MLE), which strongly vary with smoothing distance (fixed smoothing) and neighbor number (adaptive smoothing). In the case of fixed smoothing, we used a vector of possible sigma (from 5 km to 200 km, with a spacing of 5 km), while for the adaptive smoothing, a set of possible neighbor numbers) are considered from 1 to 20, with a spacing of 1. The first part of the learning catalog (1980–1999) with a period of twenty years is utilized to build various smoothed seismicity models with different sigma and NN values. Finally, the nearest neighbor numbers and the correlation distances are calculated through maximum-likelihood optimization for the four smoothed seismicity models using the last ten years of the learning catalog (2000–2009). The results of these estimations are summarized in Table 2 and Figure 3.

We underline that these obtained MLE values are suitable only in the case of a global catalog: regional estimation of these parameters can lead to different MLE values (e.g., smaller sigma and larger NN).


**Table 2.** MLE of the parameters.

**Figure 3.** MLE for the parameters sigma and NN for the Fixed (**a**), Adaptive (**b**), Fixedcorrected (**c**), and Adaptivecorrected (**d**) models. Blue curves represent the log-likelihood functions, red dots the maximum of these functions.
