*2.4. Performance Evaluation*

## 2.4.1. ETAS Framework

The efficiency of the method to correctly identify spatio-temporal correlated seismicity is evaluated on a simulated ETAS catalog, where the underlying structure of the clusters is known a priori. Additionally, we aim to demonstrate the performance of the method compared to widely used clustering algorithms. In particular, our approach is compared with the Nearest-Neighbor (NN), Gardner and Knopoff (GK) window-based and Reasenberg (RB) link-based algorithms. A detailed review on each one of them is given in Appendix B.

The ETAS model belongs to a wide class of branching processes where the occurrence rate of earthquakes, known as intensity function, depends on the history of all previous seismicity and consists of two parts given by

$$\lambda\left(t, \mathbf{x} = (\mathbf{x}, y) / H\_t\right) = \mu(\mathbf{x}) + K \sum\_{j: t\_j < t, x\_j \in \Sigma\_0} e^{a\left(m\_j - m\_t\right)} \mathbf{g}\left(t\right) f(\mathbf{x}),\tag{1}$$

where *j* runs over all past earthquakes with magnitude larger than or equal to *mc*. The intensity function is evaluated at each event that occurred during the time interval [*tin*, *T*] and inside the target region Σ ⊆ Σ0. However, events in the broader region Σ0 and time interval [*<sup>t</sup>*0, *<sup>T</sup>*], with *t*0 < *tin*, can be considered triggering events, therefore, they should be included in the evaluation of *<sup>λ</sup>*(*<sup>t</sup>*, **<sup>x</sup>**). The first term of the right-hand side of Equation (1) expresses the background seismicity *μ*(**x**) which is assumed stationary in time (mother events) and clustered in space due to the fault network geometry. The latter term represents the space-time-dependent seismicity (daughter events) expressed through the following empirical laws:


Each mother event generates its daughters (first generation), the daughters generate their own descendants (second generation), and so on. In this way, a cluster is defined as a

sequence of events with a common mother event (first event in the cluster). There are also single events, i.e., mother events without any subsequent triggered earthquake.

## 2.4.2. Simulation Procedure

For the simulation of an ETAS earthquake catalog, first, we need to generate the mother events. The heterogeneity in space can be preserved from the spatial coordinates of a declustered earthquake catalog. In particular, we implement a declustering procedure to the earthquake catalog of Greece (Σ0 = [19◦*E* − 29◦*E*] × [33.7◦*N* − 42◦*N*]) for earthquakes with *mc* = 2.5 during the period of 2011–2019, using the NN method, and then we produce *Nmain* mother events according to a Poisson distribution with mean value equal to the number of the identified background events. Their coordinates are sampled with replacement from the declustered catalog by adding a random factor. The occurrence times are simulated from a uniform distribution *<sup>U</sup>*(*<sup>t</sup>*0, *<sup>T</sup>*), where *t*0 = 0 and *T* = 20 years.

The magnitudes are independent from the earthquakes' spatial and temporal distribution and follow the Gutenberg–Richter (GR) law truncated from the left at the completeness magnitude, *mc* = 2.5, and from the right at the maximum observed magnitude of the instrumental earthquake records in Greece plus a small factor, *mmax* = 7.8. The functional form of their distribution is the following, *s*(*m*)=(*βe*<sup>−</sup>*βm*)/(*e*<sup>−</sup>*βmc* − *e*<sup>−</sup>*βmmax* ), where *β* relates to the *b*-value of the GR law with *β* = *blog*(10), and we chose *b* = 1.0. After the generation of the background events, we simulate their aftershock number, following Poisson distribution with expected rate equal to the productivity of the model, *<sup>k</sup>*(*mj*) = *Ke<sup>a</sup>*(*mj*−*mc* ) . Their occurrence times are sampled from the modified Omori law, *g*(*t*), and the locations from the isotropic spatial distribution function, *f*(**x**). For next-generation daughters, the triggering step is repeated until there are no more generated events. In Table 1, we give the parameter set that produced the ETAS catalog.

**Table 1.** ETAS parameters used for the synthetic earthquake catalog with *mc* = 2.5. The target area is Corinth Gulf, Greece, with <sup>Σ</sup>*x*[*tin*, *T*]=[21.3◦*E* − 23.2◦*E*] × [37.9◦*N* − 38.6◦*N*] × [2, <sup>20</sup>]. The number of clustered events is *N* = 4253 and the number of mother events is *Nbg* = 1595.


2.4.3. Evaluation

The ETAS earthquake catalog is divided into either single events or mother events with their descendants. We define by **X** = {**<sup>X</sup>***k*}*<sup>k</sup>*=1,. . . ,*Nc* the true partition of the catalog, where *Nc* corresponds to the number of clusters, and with **Y***i* = {**<sup>Y</sup>***n*}*<sup>n</sup>*=1,. . . ,*Nyi* , *i* = 1, . . . , *K*, the partition after the implementation of the MAP-DBSCAN and the other *K* − 1 methods, including different tuning of the algorithm's parameters. *Nyi* is the number of clusters after the implementation of method *i*.

Next, we define the Jaccard index [46], which is a measure to quantify the overlap between two partitions, in our case, the true one of the ETAS catalog, **X**, and the one of the *i*-th implemented algorithm, **Y***i*. The Jaccard index is expressed by *J*1(**<sup>X</sup>**,**<sup>Y</sup>**) = *<sup>a</sup>*11/(*<sup>a</sup>*11 + *a*10 + *<sup>a</sup>*01), where *a*11 indicates the number of pairs of elements which are correctly assigned into the same cluster (true links), *<sup>a</sup>*01—the number of pairs of elements which are in the same cluster in the ETAS catalog and in different clusters in the estimated one (missed links) and *<sup>a</sup>*10—the number of pairs of elements which are wrongly identified as clustered events (false links). If all the initial clusters are correctly identified by the implemented method, then *a*10 = 0 = *a*01 and *J*1(**<sup>X</sup>**, **Y**) = 1. Conversely, if all pairs are wrongly identified as clustered or independent, then *a*11 = 0 and, as a consequence, *J*1(**<sup>X</sup>**,**<sup>Y</sup>**) = 0.

In addition, we introduce a generalization of the Jaccard index, *J*2(**<sup>X</sup>**,**<sup>Y</sup>**) = *b*11/(*b*11 + *b*10 + *b*01), to identify the partition **Y** with the best discrimination between the background seismicity and clustered elements, following the definition in Lippiello and Bountzis [47]. We consider as background seismicity single events and the mother events of each cluster, i.e., the one that initiated a cascade of events. Here, *b*11 represents the number of common background events in the two partitions, *b*10 is the number of elements wrongly identified as mother events in the partition **Y**, whereas *b*01 corresponds to the number of true mother events identified as clustered elements in the partition **Y**. In Table 2, we show the Jaccard index values (*Ji*, *i* = 1, 2) after the implementation of the different clustering algorithms. In particular, for the MAP-DBSCAN algorithm we show the one with the best results in terms of the Jaccard index (MAP-DBSCAN27). In Appendix B, the results for all input parameters are given.

**Table 2.** *Ji*, *i* = 1, 2, values for 3 parameter sets (PS) of RB and GK algorithms, respectively, and the corresponding values of the MAP-DBSCAN and NN methods.


The window-based method removes all the events within *d*(*M*) kilometers and *t*(*M*) days after a main shock with magnitude *M*. We used three different temporal and spatial intervals, given by Equations (A2) (GK1), (A3) (GK2) and (A4) (GK3). The Reasenberg algorithm combines a deterministic spatial window and a probabilistic temporal one, determined by the Omori law. We used three sets of parameters in the ZMAP tool. RB1 (Table A1) corresponds to the original parameters proposed in Reasenberg [12]. In the second set, RB2, we extend the spatial zone by increasing the factor *rf act* from 10 to 20 km, whereas, in the third set, RB3, we also extend the temporal window modifying the parameters *τmin* and *τmax* (see Table A1). The NN method separates seismicity into background and clustered events according to the bimodal distribution of the rescaled time and distance metrics. The algorithm needs as input two parameters, the b-value and the fractal dimension of the earthquake locations, which are considered equal to *b* = 1.0 and *df* = 1.51, respectively. For the MAP-DBSCAN method, first, the optimal MAP model and the corresponding rate threshold (*λthr* = *λ*1) are determined. Then, different temporal constraints are tested for the merging of the potential clusters (consecutive events above the rate threshold) and, subsequently, the DBSCAN is implemented. We set the minimum number of neighbors equal to *Npts* = 2, similar to the minimum size of a cluster that can be given as output from the other algorithms. For the distance threshold, , we tested a wide range of possible values (see Table A2).

The NN method shows the best performance in the construction of the clusters (*J*1 = 0.756) and in the detection of the mother events (*J*2 = 0.727). This is also evident by its cumulative number of background seismicity (purple line in Figure 1a), which is the closest one to the initial catalog (dotted black line). The temporal evolution of background seismicity is shown in Figure 1b–f across the longitude for ease of reading as west–east normal faults dominate the area. For the NN method, no large gaps are evident in the spacetime evolution of the declustered seismicity, although there is a significant concentration of events between the 7th and 8th year of the catalog, which is also persistent in both RB2 and GK3 methods (orange ellipses in Figure 1d–f) and less apparent on MAP-DBSCAN method (orange ellipse in Figure 1c). The high efficiency of the NN method is probably related to the metric it uses, which is similar to the ETAS one with *<sup>λ</sup>j*(*ti*, *xi*)=(*ti* − *tj*)−1*r*<sup>−</sup>*df ij* 10*bmj* and *c* = 0, *p* = 1, *d* = 0, *q* = *df* and *a* = *b*. The windowing technique seems to overestimate the temporal and spatial windows, since it removes large amounts of seismicity (blue ellipse in Figure 1f), in accordance with previous results [11]. The same gap between the

14th and 15th year of the catalog is also evident in the background seismicity from the MAP-DBSCAN method, however, it is smaller and some sparse seismicity is left (blue ellipse in Figure 1c). On the other hand, Reasenberg's declustered catalog has more events than any other method (pink, magenta and green line, Figure 1a) and significant concentrations of events are visible in the space-time evolution of the background seismicity (orange and purple ellipses in Figure 1e).

**Figure 1.** (**a**) Cumulative number of background events for each algorithm, the initial ETAS catalog with black color and the mother events of the ETAS synthetic catalog with the black dotted line. The space-time evolution (**b**) of the initial ETAS catalog and of the background seismicity for the four best algorithms, (**c**) MAP-DBSCAN27 (*J*2 = 0.647), (**d**) NN (*J*2 = 0.727), (**e**) RB2 (*J*2 = 0.630), (**f**) GK3 (*J*2 = 0.676). Colored ellipses stand for large gaps and significant concentration of events.

Best overlapping among the true, **X**, and the estimated partition, **Y**, does not mean necessarily the best detection for the declustered seismicity. For instance, the GK3 partition is characterized by a lower index, *J*1 = 0.585, than the MAP-DBSCAN27 partition, *J*1 = 0.627, however, its declustering catalog is more accurate (GK3-*J*2 = 0.676 > *J*2 = 0.647- MAP-DBSCAN27). Nevertheless, both indexes combined, the MAP-DBSCAN partition shows a higher efficiency than the rest of the algorithms, except for the NN. The Jaccard index values for the rest of the MAP-DBSCAN input parameters are quite stable with small fluctuations from the best parameter set (MAP-DBSCAN27), apart from the smallest distance cutoff, = 2.5 km, which seems inadequate for capturing the spatial correlations among the events (Appendix B.4).

## **3. Earthquake Data**

We considered three areas in the region of Greece (Figure 2a), which consist of distinctive seismotectonic units. The selection of the three study areas is based on criteria related to the homogeneity of the type of faulting, the comparatively intense continuous seismicity and the existence of seismic excitations during the study period. The first area, Corinth Gulf (CG) (Figure 2b), is undergoing high extensional deformation rates. The seismicity is mainly associated with eight major faults that bound the rift to the south and dip to the north [48]. The area of central Ionian Islands (CII) (Figure 2c) is characterized by the highest moment rate in the Mediterranean region. Its main seismotectonic feature is the Kefalonia

Transform Fault Zone (KTFZ) which extends for more than 100 km along the western coastlines of Lefkada and Kefalonia Islands, comprising two distinct main branches, the Lefkada and Kefalonia faults, respectively. Right lateral strike slip motion with a minor thrust component is the dominant faulting type [49,50]. The third area is located in the North Aegean Sea (NAS) (Figure 2d) and is dominated by dextral strike-slip faulting, along the North Aegean Trough (NAT) and its parallel branches [51], as a consequence of the westward propagation of the North Anatolian Fault into the Aegean [52]. The driving mechanism of the active deformation in the Aegean region is the subduction of the oceanic lithosphere of the Eastern Mediterranean under the continental Aegean microplate, forming the Hellenic Subduction Zone and the extensional back arc Aegean area due to the slab rollback [53].

For the investigation of the clustering properties in the three areas, we considered earthquake datasets from the regional catalog of the Geophysics Department of the Aristotle University of Thessaloniki [54], compiled with the recordings of the Hellenic Unified Seismological Network (HUSN). The earthquake catalogs of CG, CII and NAS, which we denote henceforth as D1, D2 and D3, include 25,595, 24,085 and 21,139 events, respectively, occurring between 2012 and 2019. For the determination of their completeness magnitude, we implemented the Goodness-of-Fit (GFT) method [55], assuming that earthquakes follow the Gutenberg–Richter (GR) law, *logN* = *a* − *bM*. In particular, the differences between the observed and the synthetic frequency-magnitude distributions are computed for increasing magnitude bins as threshold values. The completeness magnitude is defined as the first magnitude bin at which the difference falls under the 5% residual (Figure S1). Figure S1a–c show the residuals for the three datasets and Figure S1d–f present the GR law for the corresponding complete datasets. The *b*-value is calculated by means of the maximum likelihood method proposed by [56] and found equal to *b* = 0.97, 0.88, 0.89, for the D1, D2, D3 datasets, respectively (Table 3). The resulting magnitude threshold for the three datasets is equal to *Mc* = 1.5, 2.2, 2.1, with 13,043, 6981, 8328 events (Table 3), respectively, the epicenters of which are shown in the maps of each study area (Figure 2).

**Table 3.** The magnitude of completeness, *Mc*, for the datasets of the three areas CG, CII and NAS, along with the productivity, *a*, and the *b*-value of the GR law. *N* and *Nc* denote the initial number of events and the ones with *M* ≥ *Mc*, respectively.


**Figure 2.** Maps of the study areas depicting seismicity along with major faults (yellow lines). (**a**) The main seismotectonic features of the area of Greece. The black lines illustrate the active boundaries and the arrows the relative plate motions. Rectangles enclose the three study areas. (**b**) The area of Corinth Gulf where the major faults are shown (yellow lines) along with seismicity during 2012–2019. (**c**) The area of Central Ionian Islands where the Kefalonia Transform Fault Zone and the collision front are shown (yellow lines) along with seismicity during 2012–2019. (**d**) The area of North Aegean Sea where the NAT is traced (yellow line) along with seismicity during 2012–2019. The legend is common for the three study areas.
