**1. Introduction**

As a rule, strong earthquakes may not occur over the entire territory of a seismically active region. Critical objectives of the seismic hazard assessment include recognition of the areas prone to strong earthquakes. An effective instrument to accomplish this objective is pattern recognition. The fundamental possibility of employment methods and algorithms for pattern recognition to identify potentially high seismicity areas was first substantiated by remarkable mathematician I.M. Gelfand et al. in 1972 [1,2]. The developed approach was later called EPA (Earthquake-Prone Areas) [3–7].

The EPA method was developed in the fundamental papers of I.M. Gelfand and V.I. Keilis-Borok, members of the Academy of Sciences of the USSR; A.D. Gvishiani, academician of the RAS; Al.An. Soloviev, associate member of the RAS; and famous Soviet and Russian scientists, namely Sh.A. Guberman, M.P. Zhidkov, V.G. Kossobokov, A.I. Gorshkov,

**Citation:** Dzeboev, B.A.; Gvishiani, A.D.; Agayan, S.M.; Belov, I.O.; Karapetyan, J.K.; Dzeranov, B.V.; Barykina, Y.V. System-Analytical Method of Earthquake-Prone Areas Recognition. *Appl. Sci.* **2021**, *11*, 7972. https://doi.org/10.3390/app11177972

Academic Editor: Giuseppe Lacidogna

Received: 30 July 2021 Accepted: 26 August 2021 Published: 28 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

V.A. Gurvich, E.Ya. Rantsman, I.M. Rotvain, etc. Prominent foreign geophysicists, seismologists, geologists, and mathematicians took an active part in developing EPA. These include F. Press and L. Knopoff, members of the United States National Academy of Sciences; Professors A. Cisternas, J. Bonnin, E. Philip, C. Weber, and J. Sallantin, French scientists; as well as M. Caputo and G. Panza, members of the National Academy of Sciences of Italy, etc. [5–7].

In the classical Gelfand–Keilis-Borok setting, the problem of strong earthquake-prone areas recognition (EPA problem) is formulated as follows. In a considered seismically active region, it is necessary to recognize the areas prone to strong earthquakes (with magnitude *M* ≥ *M*0, where *M*0 is a given threshold). These areas are sought among the recognition objects identified in the region. As the recognition objects, morphostructural nodes or intersections of morphostructural lineaments obtained as a result of morphostructural zoning (MSZ) of the region are considered [8–12]. It is necessary to divide the set of recognition objects *W* into two non-intersecting classes: class *B* consisting of the objects whose vicinities are prone to strong earthquakes and class *H*, which is composed of the objects where such earthquakes cannot occur.

This classification is carried out with the employment of the pattern recognition algorithm with learning. It uses the learning set *W*0, which is determined based on the information about the seismicity of the region. In turn, *W*0 consists of two non-intersecting subsets *B*0, containing objects that are a priori referred to as class *B*, and *H*0, containing the representatives of class *H*. The result of applying the recognition algorithm is a decision function based on which an object from *W* can be attributed to class *B* or *H*, and the classification of the objects itself [4–6].

A detailed literary review for almost half a century of the development and application of pattern recognition algorithms to solve the recognition problem of areas prone to strong earthquakes (EPA approach), carried out by the authors of this article, is given in [5]. This paper examines the applied pattern recognition algorithms, the studied regions, and methods for assessing the reliability of the results obtained, including the theory of dynamic and limit recognition problems. This work is devoted to the presentation of modern mathematical methods developed at the Geophysical Center of the Russian Academy of Sciences, aimed at overcoming the difficulties that may arise today when using the EPA approach in its classic 50-year version. In the present paper, the problem of earthquake-prone areas recognition is considered in the classical formulation of Gelfand– Keilis-Borok.

Talking about the drawbacks of the EPA method, the following should be noted. The formation of learning material is a fundamental phase of recognition. Learning set *B*0 includes the objects with known epicenters of strong earthquakes in the vicinities. It is reasonable to assume that set *B*0 formed in this way is highly likely to contain no a priori errors or so few of them that they are unable to affect significantly the recognition result. It is hard to form a similar "clean" learning material of class *H*. The set *H*0 contains either all objects not included in *B*0 or objects with known earthquakes with magnitude *M* < *M*0 − *δ*, where *δ* >0[5,13] in the vicinities.

As regards the essence of the EPA problem, which represents a threshold problem of recognition [4,14,15], learning class *H*0 entails inherent potential errors. Accordingly, a low seismicity learning class is not at all a totality of benchmark objects that cannot be related to strong earthquake-prone areas. Learning sets *B*0 and *H*0 turn out to be disparate [16,17], and the EPA procedure ignores this fact [13].

Many years of applying the EPA method in numerous mountainous countries of the world [5] demonstrated the need to avoid learning asymmetry in recognition. Making EPA findings more reliable necessitates amending the recognition unit by adding a learning algorithm based on the only high seismicity class *B*0, which includes objects with known strong earthquakes in their vicinities [13]. The development of this kind of recognition algorithm has been one of the main objectives of this study.

Another drawback of the EPA method is that the identification of recognition objects and the measurement of their geological-geophysical and geomorphologic characteristics is a time- and effort-consuming problem. That said, the possibility of using selected objects needs separate justification for every region [5]. The foregoing illustrates that the practical employment of the EPA method is still challenging to a grea<sup>t</sup> extent. This forced the authors to develop new state-of-the-art algorithmic systems, which enable the automation of the recognition process. The key objective of study was to set up and develop this kind of system.

#### **2. Materials and Methods**

#### *2.1. Recognition of the Areas Prone to Strong Earthquakes with One Learning Class*

In the papers [4,5], the EPA approach was further developed by creating a new algorithm called Barrier-3, employed in the recognition unit [13,16,17]. It should be especially noted that the principal difference of the Barrier-3 algorithm from the dichotomy algorithms previously used in EPA is that learning is based only on *B*0 high seismicity class set.

Broadly speaking, the Barrier-3 algorithm, which learns based only on one class, is not a dichotomy algorithm. However, it can be used in the EPA. Barrier-3 also divides the territory into two non-intersecting areas that are either prone or not prone to strong earthquakes.

Similarly, to the dichotomy algorithms [5], the Barrier-3 algorithm views disjunctive nodes or axis intersections of the morphostructural lineaments as recognition objects. Such selection of objects derives from their deep tectonic connection with strong earthquakes. The confinedness of the strong earthquake epicenters to the intersections of morphostructural lineaments was statistically confirmed in the paper [18].

The objective of the Barrier-3 algorithm is to study the characteristics of learning set *B*0 of the only high seismicity class and identify the objects that are "similar" to the learning objects based on the knowledge obtained. The latter are declared as high seismicity ones. Speaking the language of the theory of sets, Barrier-3 accomplishes the objective of constructing in the finite set of objects *W* and its subset *B*, broadening the only learning class *B*0. For this reason, the disparity measure between two arbitrary objects is constructed for every characteristic. This allows finding and measuring the "barrier," which divides these objects as part of the considered characteristic. This assessment acts as a proximity measure for the initial set *W*, which enables giving exact meaning to the idea of proximity to *B*0 based on a given totality of characteristics. Let us move on to the description of the mathematical construction of the Barrier-3 algorithm [13,16].

Let us assume that Π = {*π*} is a finite totality of numerical characteristics of recognition objects *w* ∈ *W*, and *π* : *W* → *R*, *B*0 is own subset in *W* for learning of high seismicity class *B*. Based on the totality of characteristics Π of a set of objects *W*, it is necessary to construct a set *<sup>P</sup>*Π(*<sup>B</sup>*0) that would adequately expand *B*0 in the sense of requirements of the formulated problem.

The proximity of the objects *w*1 and *w*2 by characteristic *π* is "hampered" by all those objects *w* whose *π*(*w*) values lie in between the values *<sup>π</sup>*(*<sup>w</sup>*1) and *<sup>π</sup>*(*<sup>w</sup>*2). They make up a barrier:

$$\mathcal{B}\_{\pi}(w\_1, w\_2) = \{ w \in \mathcal{W} : \min(\pi(w\_1), \pi(w\_2)) \le \pi(w) \le \max(\pi(w\_1), \pi(w\_2)) \} \tag{1}$$

It is natural to assume that the lower the barrier <sup>Б</sup>*π*(*<sup>w</sup>*1, *<sup>w</sup>*2), the better it is for the proximity of *w*1 and *w*2 on *W* by characteristic *π*. This observation explains the name of the algorithm.

Let us call the ratio

$$\rho\_{\pi}(w\_1, w\_2) = |\mathcal{S}\_{\pi}(w\_1, w\_2)| / |W| \tag{2}$$

disparity measure between *w*1 and *w*2 by characteristic *π* or simply barrier measure.

The Barrier-3 algorithm forms the set of *<sup>P</sup>*Π(*v*) objects that are close to *v* ∈ *B*0 from *W* based on the characteristics Π in three phases.

Phase one: formation of set *<sup>P</sup>π*(*v*) of objects that are close to *v* in *W* by characteristic *π* using the minimality threshold *απ*(*v*):

$$P\_{\pi}(\upsilon) = \{ w \in \mathcal{W} \, : \, \rho\_{\pi}(w, \upsilon) \le a\_{\pi}(\upsilon) \}\tag{3}$$

The threshold *απ*(*v*) performs the functions of the flexible lower boundary of the set {*ρπ*(*<sup>w</sup>*, *<sup>v</sup>*), *w* ∈ *W*} and can be obtained, for example, using Kolmogorov averaging with the value *s* < 0:

$$\alpha\_{\pi}(v) = \left(\frac{\sum\_{\overline{\pi} \in \mathcal{W}} \rho\_{\pi}(\overline{\pi}v, v)^{s}}{|W|}\right)^{1/s} \tag{4}$$

Phase two: formation on *W* of the value *p*Π(*w*|*v*) showing the proximity of *w* to *v* based on all characteristics Π:

$$p\_{\Pi}(w|v) = |\{\pi \in \Pi : w \in P\_{\pi}(v)\}|.\tag{5}$$

Here *<sup>P</sup>π*(*v*) is determined by the formula (3), and the integral exponent *p*Π(*w*|*v*) introduced by the formula (5) varies from 0 to |Π|.

Phase three: formation in *W* of a subset of *<sup>P</sup>*Π(*v*) objects that are close to *v* based on all characteristics Π using the maximality threshold *β*Π(*v*):

$$P\_{\Pi}(\upsilon) = \{ w \in \mathcal{W} \, : \, p\_{\Pi}(w|\upsilon) \ge \beta\_{\Pi}(\upsilon) \}\tag{6}$$

The threshold *β*Π(*v*) performs the functions of flexible upper boundary of the set of values {*p*Π(*w*|*v*), *w* ∈ *<sup>W</sup>*}. Similarly, to the formula (4), *β*Π(*v*) can be constructed using Kolmogorov averaging with the value *q* > 0:

$$\beta\_{\mathrm{II}}(\boldsymbol{v}) = \left(\frac{\sum\_{\overline{\boldsymbol{w}} \in \boldsymbol{W}} p\_{\mathrm{II}}(\overline{\boldsymbol{w}} | \boldsymbol{v})^{q}}{|\boldsymbol{W}|}\right)^{1/q} \tag{7}$$

As a result, a sought set *<sup>P</sup>*Π(*<sup>B</sup>*0) is obtained by the formula:

$$P\_{\text{II}}(B\_0) = \cup\_{\upsilon \in B\_0} P\_{\text{II}}(\upsilon). \tag{8}$$

For the quantitative assessments of the contribution of characteristics to the formation of a sought subset of high seismicity objects, the algorithm features additional computational units. In parallel with the computation of *p*Π(*w*|*v*), a binary matrix is formed *<sup>M</sup>*Π(*w*|*v*):

$$M\_{\Pi}(w|v)\_{i,j} = \begin{cases} \ 1, \ w\_{\hat{\imath}} \in P\_{\pi\_{\hat{\jmath}}}(v) \\\ \ 0, w\_{\hat{\imath}} \notin P\_{\pi\_{\hat{\jmath}}}(v) \end{cases}, \ i = 1, \ldots, |\mathcal{W}|, \ j = 1, \ldots, |\Pi|. \tag{9}$$

Every element of the matrix (9) determines whether or not the object *w* ∈ *W* belongs to the set *<sup>P</sup><sup>π</sup>j*(*v*), *πj* ∈ Π of the objects that are close to *v*. String summation of the matrix *<sup>M</sup>*Π(*w*|*v*) for all *w* ∈ *<sup>P</sup>*Π(*v*) forms the vector *<sup>W</sup>*Π(*v*):

$$\mathcal{W}\_{\Pi}(\upsilon)\_{\dot{j}} = \sum\_{k} M\_{\Pi}(w|\upsilon)\_{k,j\prime} \ w\_k \in P\_{\Pi}(\upsilon)\_{\prime} \ \dot{j} = 1, \ldots, |\Pi|. \tag{10}$$

The elements of the vector (10) illustrate the contribution of characteristics *πj* ∈ Π to the formation of a subset *<sup>P</sup>*Π(*v*) of the objects that are close to *v* ∈ *B*0. The quantitative assessments of the contribution of characteristics to the formation of *<sup>P</sup>*Π(*<sup>B</sup>*0) are undertaken in two phases:

Element-by-element summation of all vectors *<sup>W</sup>*Π(*v*), *v* ∈ *B*0 and normalization on |*<sup>B</sup>*0| allows obtaining the average contribution of characteristics to the recognition of a sought high seismicity subset *<sup>P</sup>*Π(*<sup>B</sup>*0).

The sorting of *<sup>W</sup>*Π(*v*), *v* ∈ *B*0 and the selection for each of them of three characteristics with the greatest values, followed by the summation of the number of belongings of such characteristics to the formed threes, enable assessing the contribution of such characteristics through their classification as the "strongest." This class will be called Top 3 ranking. This explains the name of the algorithm Barrier-3.

The set of recognition objects is represented as a disjoint union of *W* = *B H* high and low seismicity classes, *B* = *<sup>P</sup>*Π(*<sup>B</sup>*0) ⊇ *B*0, and *H* = *<sup>W</sup>*\*<sup>B</sup>*.

The Barrier-3 algorithm was employed as the EPA recognition block to recognize the areas prone to crustal earthquakes with *M* ≥ 6.0 in the Caucasus and the Altai–Sayan– Baikal region. Here, 16 intersections of the axes of morphostructural lineaments with known epicenters of crustal earthquakes with *M* ≥ 6.0, starting from 1900, in their vicinities (with a radius of 50 km in the Altai–Sayan–Baikal region and 25 km in the Caucasus) were used as the learning sets *B*0 of high seismicity class in both regions.

The list of the geological-geophysical and geomorphological characteristics of considered vicinities of lineament intersections used for recognition by Barrier-3 algorithm is given in Table 1. Highlighted in bold type are 7 characteristics that were selected to be used for recognition in the Altai–Sayan–Baikal region based on the findings from the assessment of informativeness for the instance of one learning class; in italics, 11 characteristics selected for recognition in the Caucasus are given. It is noteworthy that 4 characteristics (Hmin, Top, R2, and dB) were selected for recognition using the Barrier-3 algorithm in both regions. Based on the threshold magnitude of the recognized earthquake-prone areas (*M* ≥ 6.0), the circles with a radius of 25 km were selected as vicinities within which the values of characteristics were computed. For reproducibility of the result and its greater reliability, the values of characteristics of objects were computed automatically using the smart GIS developed by the Geophysical Center of the Russian Academy of Sciences (http://seismgis.gcras.ru/ access date: 30 July 2021) [19,20].


**Table 1.** The list of characteristics of the recognition objects (intersections of lineaments). Bold type— characteristics used by the Barrier-3 algorithm for recognition in the Altai–Sayan–Baikal region; italics—the Caucasus; bold italics—both regions.

#### *2.2. Recognition of Strong Earthquake-Prone Areas Based on Identifying Dense Condensations of Point Objects*

In this part of the article, let us depart from the recognition objects described above (disjunctive nodes and intersections of the axes of morphostructural lineaments) and their geological-geophysical and geomorphologic characteristics. The task considered here will be closer to reality. Namely, in addition to the reliable classification of a finite set of point objects, a formalized and reproducible transition from a recognized high seismicity set *B* to a real two-dimensional region in the plane with the cardinality of the continuum will be required. Strong earthquakes can occur within and cannot occur outside of this sought region. In other words, the innovation we make to the statement of the problem will be building an image of the set of recognition objects *W* in the studied region *S* as in the subset of the Euclidean plane:

$$F\_{\mathcal{T}} : \mathcal{W} \to F\_{\mathcal{T}}(\mathcal{W}) \subset S \subset \mathbb{R}^2,\tag{11}$$

where *γ* is a set of free mapping parameters *Fγ* [21].

A sought mapping *Fγ* must meet the following necessary conditions:


It is natural to view such two-dimensional sets *<sup>F</sup>γ*(*B*) of the cardinality of the continuum as actual flat zones within which strong earthquakes can occur. The selection of values *γ* of any given variant among those that meet the conditions (a), (b), and (c) is based on the system approach using control experiments.

The algorithmic system called Formalized Clustering and Zoning (FCAZ) developed by the authors [21,22] is used as sought mapping *<sup>F</sup>γ*. It represents systems analysis method concerning Discrete Mathematical Analysis (DMA) [23–30].

The FCAZ method enables an effective recognition of strong earthquake-prone areas based on the cluster analysis [31] of a catalog of seismic events. It represents a consistent application of Discrete Perfect Sets (DPS) algorithms [21,32–34] and E2XT [22] (Figure 1). Unlike the EPA procedure, the FCAZ systems analysis method uses neither morphostructural zoning nor dichotomy learning algorithms. It relies on the topological filtering of a finite set of epicenters of fairly weak earthquakes, which act as recognition objects.

The fundamental difference of the FCAZ systems method from the EPA procedure is that FCAZ has a formalized block (E2XT algorithm) of passing from the classification of a finite set of point objects to sought flat high seismicity zones. The E2XT block constructs an unambiguous mapping of a set of objects identified by the DPS algorithm in the flat zones of non-zero measure. Strong earthquakes can occur within and at the boundary of such zones. Such mapping allows for the first time to switch in the problem of earthquake-prone areas recognition from the simple pattern recognition to full-fledged systems analysis. Specifically, this makes it possible to unambiguously isolate, using sharp bound, a subsystem of recognized high seismicity zones from their non-empty complement.

The core of FCAZ is the DPS topological filtering algorithm (Figure 1) [21] isolating clusters as own subsets in a set *W*. This is what makes DPS different from classical clustering algorithms. It is aimed at isolating in a finite set of Euclidean space of flat regions with a given density level *α*.

It should be emphasized that DPS is effective for the considered problem exactly because it distinguishes between compact, connected groups of objects and their fuzzy, unstructured complement. In other words, DPS cuts out isolated objects, "attracting" the rest into dense clusters. That said, unlike classical clustering algorithms, in DPS by no means all objects end up in clusters. This is what makes DPS new and innovative as a systems analysis algorithm.

**Figure 1.** The flow chart of system-analytical FCAZ method. Black blocks—DPS algorithm steps; blue blocks—E2XT algorithm.

The DPS algorithm has two free parameters: *q* < 0 for the calculation of localization radius *rq*(*W*), which is determined as the power mean of all nontrivial pairwise distances in a set of recognition objects *W*, and *β* ∈ [−1, 1] is the maximality level of the necessary density of *α* = *<sup>α</sup>*(*β*, *q*) DPS clusters. The output value is a set (*α*(*β*, *q*)) *α*, which is dense in each of its elements.

The result of applying the DPS algorithm is as follows:

$$\text{DPS}(q,\beta): \mathcal{W} \to \{B\_1, \dots, B\_n\},\tag{12}$$

where parameters *q* and *β* determine a particular type of DPS clusters; *B*1, ... , *Bn*, own connected subsets in a set of recognition objects. In other words, *B* = *ni*=<sup>1</sup> *Bi*, where *Bi*, *i* = 1, ... , *n* are recognized DPS clusters, *B*, *Bi* ⊂ *W*, and *W*\*B* represents a significant part of the set *W*. It is noteworthy that if the latter is false, then the recognition result is trivial.

During the next phase, DPS clusters *B*1, ... , *Bn* are transformed by the E2XT algorithm into flat zones of the cardinality of the continuum (i.e., mapping of *Fγ* is performed). During this phase, mapping is constructed:

$$\text{E}^2\text{XT}(\delta, \mathbb{C}, \omega, \upsilon):\ B\_i \to F(B\_i),\tag{13}$$

where *δ* is the step of the geographical grid, *ω* < 0, *v* < 0 are free parameters, *C* is connection type, *Bi* are determined by the formula (12), *<sup>F</sup>*(*Bi*), *i* = 1, ... , *n* are sought flat zones of non-zero measure. If mapping (13) meets the above conditions (a), (b), and (c), then *F*(*Bi*) there are sought areas prone to strong earthquakes.

Accordingly, the FCAZ system is a composition of two algorithms:

$$\text{FCAZ}(\gamma) = \text{E}^2 \text{XT}(\delta, \mathbb{C}, \omega, \upsilon) \diamond \text{DPS}(q, \beta), \tag{14}$$

and *γ* is a set of free parameters of FCAZ:

$$\gamma = \{\delta, \mathbb{C}, \omega, v\} \cup \{q, \beta\} = \{\delta, \mathbb{C}, \omega, v, q, \beta\}. \tag{15}$$

The existence of mappings (13–15) allows considering FCAZ as a systems analysis method. FCAZ(*γ*), indeed, processes input data from beginning to end in a unified system. That said, the solution presented takes shape of sought two-dimensional zones and not their palliatives, representing finite sets of points in the plane [21].

The constructions of DPS and E2XT algorithms feature artificial intelligence blocks that automatically select optimal values *β* in DPS and *ω*, *v* in E2XT. This makes the result of FCAZ recognition objective and reproducible. An optimal value *β* enables recognizing DPS clusters for which the difference between the totality of densities (in the inherent sense of the algorithm) of objects inside the clusters and the totality of densities of objects outside of the clusters will be the greatest possible. The selected *ω* and *v* make it possible to find an optimal combination of connection and scannability of DPS clusters. A detailed description of mathematical constructions of the DPS and E2XT algorithms is provided in the papers [21,22].

FCAZ delivers a system approach to studying strong earthquake-prone areas. Its characteristic property lies in the fact that the recognition of sought zones in the regions of the globe that differ in structure relies on universal facts and methods, which enable a uniform approach toward solving the entire class of such problems. The statement of the problem and the process of its solving represent a unified system that is sufficiently invariable relative to geological structure, the selection of threshold magnitudes of sought strong earthquakes, objects, etc. That said, the FCAZ applicability condition is the state of seismological and geological-geophysical exploration of regions, which manifests itself in the high quality of earthquake catalog.

The next parts of the paper will demonstrate how FCAZ allows performing reliable recognition of the strongest ( *M* ≥ 7.75), strong ( *M* ≥ 6.0), and significant earthquakeprone areas across different mountain countries of the world. The reliability of obtained results was assessed with the help of control experiments and through comparison with high seismicity zones recognized by EPA approach. The division of earthquakes into the strongest, strong, and significant ones was made by the authors solely to simplify the description of the results obtained.

It should be noted that previously the recognition of high seismicity zones was performed just for one fixed magnitude threshold *M*0. In the present paper, a method for the successive recognition of the areas prone to earthquakes for different threshold magnitudes in the same region is proposed. It is based on the repeated application of FCAZ method to a set of recognition objects, which is successively narrowed down by way of DPS clustering. The new method was called Successive Formalized Clustering and Zoning and is abbreviated as SFCAZ [35]. Accordingly, the classical EPA problem formulated at the beginning of the article is for the first time expanded to a more systemic complicated problem of the successive recognition of the areas prone to earthquakes in the same region for several threshold magnitudes. The mathematical construction of the SFCAZ method is described in detail in the paper [35].
