#### *3.2. Conditioning Factors*

Landslide occurrence is considered to be affected by a variety of natural and anthropogenic factors representing the conditions of a given region. These conditioning factors can be separated into two main categories: (a) the preparatory factors which create suitable conditions for a landslide by changing the state of a slope from stable to marginally stable, and (b) the triggering factors which initiate a landslide by changing the state of a slope from marginally stable to unstable [43]. Morphological and hydro-lithological conditions of the region of interest are represented by natural preparatory factors, whereas the human interventions on it are represented by anthropogenic preparatory factors. The triggering factors mainly represent climatic and seismic conditions related to rainstorms and earthquakes, respectively.

**Figure 2.** Multi-temporal Google Earth images: (**a**,**c**) before the landslides; (**b**,**d**) after the landslides. The red dashed lines indicate the location of the landslide before it happened, and the red solid line shows the scar of the landslide after it occurred.

Since no official guidelines are used by the scientific community for the selection of factors, the characteristics of the study area, data availability and a literature review [29,30] were taken into account for this study. In total, fourteen conditioning factors were selected, including both preparatory and triggering factors. In particular, the altitude, slope angle, slope aspect, profile curvature, plan curvature, stream density, stream power index (SPI), topographical wetness index (TWI), lithology, proximity to faults and soil type were used as natural preparatory factors; the land use/cover and proximity to roads were used as anthropogenic preparatory factors, and the mean annual rainfall was used as a triggering factor.

Defined as the height above a reference point (typically above the mean sea level), altitude is an important conditioning factor due to its gravitational potential energy. In general, the higher the slope angle is, the higher the likelihood of failure. Therefore, steep slopes are more prone to failures. The slope aspect is defined as the azimuth-based orientation of terrain and is highly related to exposure to sunlight; evapotranspiration; and rainfall's effects on weathering, soil, vegetation cover and root development [44]. Expressed by different types, such as plan and profile, the curvature indicates the runoff and erosion factors of water. The plan curvature is perpendicular to the maximum slope direction, whereas the profile curvature is parallel to the same direction [45]. By retaining more rainfall water and erosion-induced sediment than convex slopes, concave slopes are correlated with higher likelihoods of failure.

Considering its effects on groundwater recharge, stream density constitutes another important factor for landslide activity. It is defined as the ratio of the total length of streams to the area of the study region. A high stream density is linked to low surface water infiltration and thus to mass movements with high velocity [46]. SPI is another hydrological factor, which measures the erosive power of the streams, whereas TWI quantifies the moisture content of the surface [32].

Lithology is one of the most crucial factors for LS assessments, since different lithological formations have different slope instability performances in terms of strength and permeability. In a tectonically active country such as Greece, the faults seem to be associated with extensive fractured zones and steep relief anomalies presenting favorable conditions for landslides [35]. Hence, landslides are usually found in proximity to faults. Additionally, different soil types can have different impacts on surface infiltration and groundwater flow, depending on their particular physical and mechanical properties [47].

Changes in land use/cover as a result of human activities such as cultivation, deforestation and forest logging can significantly affect the occurrence of landslides. Proximity to roads can also reflect the human impact on landslides, as road construction at the base of a slope tends to degrade its stability.

Rainfall, which increases the pore water pressure and reduces the shear strength of the soil [48], is a basic triggering mechanism not only for the development of new landslides but also for the re-activation of old ones. Particularly in Greece, rainfall-triggered landslides are among the most frequent and devastating disasters [38]. It is worth mentioning that, since the majority of earthquakes that occurred in the study area during the last two decades were characterized by relatively low magnitudes (*M*<sub>w</sub> between 3.0 and 3.5) and great depth (greater than 15 km) [49], a seismic factor was not included in the analysis.

As shown in Table 2, all the above conditioning factors were represented by GIS-supported data formats. Most of them were already in raster format (grids); the others were converted from vector format (point, line, or polygon features) to raster format with a 25 m spatial resolution.

#### *3.3. Geographical Detector (GeoDetector)*

GeoDetector is a spatially-based multivariate statistical model which was developed in 2010 by Wang et al. [50]. It can detect the spatially stratified heterogeneity of a given phenomenon according to the basic principle that if a determinant is associated with the phenomenon, then there may be some similarities between their spatial distributions. Furthermore, it can reveal the driving forces behind the phenomenon by quantifying the impacts of individual determinants and of their pairwise interactions. The phenomenon under investigation as a dependent variable can be represented by either numerical continuous or discrete classified (stratified) data, and the determinants as explanatory variables exclusively by classified data.


**Table 2.** Summary of the datasets representing the conditioning factors.

In the case of LS, GeoDetector can detect whether a conditioning factor (explanatory variable) causes the spatial stratified heterogeneity of landslide occurrence (presence or absence of a landslide, dependent variable) or not. In particular, it can quantify the degree of impact of each factor on the landslide occurrence using a q-statistic calculated as follows [51]:

$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N\sigma^2} \tag{1}$$

where *h* = 1, 2, ... , *L* is a given class (stratum) of an explanatory variable; *L* is the number of classes; *N<sub>h</sub>* and *N* are the numbers of samples in class *h* and in the entire study area, respectively; and *σ<sub>h</sub>*<sup>2</sup> and *σ*<sup>2</sup> are the variances of the dependent variable in class *h* and in the entire study area, respectively. The q value ranges from 0 to 1; the higher it is, the more the explanatory variable contributes to the dependent variable. A *p*-statistic, an indicator of statistical significance for each explanatory variable, is also calculated from a non-central F-distribution:

$$p(q < x) = p\left(F < \frac{N-L}{L-1} \frac{x}{1-x}\right) = 1-a \tag{2}$$

where *a* is the probability of q being higher than or equal to *x*. At the 95% confidence level, an explanatory variable with a *p* value greater than 0.05 is considered to have a statistically insignificant relationship with the dependent variable and can be eliminated from the model.
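The q-statistic of Equation (1) can be illustrated with a minimal sketch. It is not the GeoDetector software itself: the function names, the toy sample and the use of the population variance (dividing by the sample count) are assumptions made for illustration only.

```python
from collections import defaultdict


def variance(values):
    """Population variance (assumption: divide by the sample count)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def q_statistic(strata, y):
    """q = 1 - sum_h(N_h * var_h) / (N * var) for one classified factor."""
    groups = defaultdict(list)
    for h, value in zip(strata, y):
        groups[h].append(value)
    total = len(y) * variance(y)
    if total == 0:
        raise ValueError("dependent variable has zero variance")
    within = sum(len(g) * variance(g) for g in groups.values())
    return 1.0 - within / total


# Hypothetical toy sample: 8 pixels and one factor with two classes.
factor_class = ["A", "A", "A", "A", "B", "B", "B", "B"]
landslide = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 = landslide, 0 = non-landslide
q = q_statistic(factor_class, landslide)
print(f"q = {q:.2f}")
```

In this toy sample, class A holds three of the four landslide pixels, so the factor explains part, but not all, of the variance of the binary dependent variable.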

By estimating the value of the q-statistic corresponding to the interaction of two explanatory variables, GeoDetector can also quantify the degree of the interactive impact of each pair of conditioning factors on landslide occurrence. As shown in Table 3, the type of interaction can then be determined by comparing this value with the individually estimated values.
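The interaction assessment can be sketched as follows. The sketch assumes that the interaction strata are obtained by overlaying the classes of the two factors, so that each pair of classes forms one stratum. The type labels follow the classification commonly reported in the GeoDetector literature and are assumed, not confirmed, to match Table 3; all function names and the toy data are hypothetical.

```python
from collections import defaultdict


def q_statistic(strata, y):
    """q-statistic of Equation (1); N_h * var_h is a sum of squared deviations."""
    groups = defaultdict(list)
    for h, value in zip(strata, y):
        groups[h].append(value)

    def ssd(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    # Assumes y is not constant, otherwise the denominator is zero.
    return 1.0 - sum(ssd(g) for g in groups.values()) / ssd(y)


def interaction_q(x1, x2, y):
    """q for the overlay of two factors: each class pair is one stratum."""
    return q_statistic(list(zip(x1, x2)), y)


def interaction_type(q1, q2, q12, tol=1e-9):
    """Assumed labels, following the usual GeoDetector classification."""
    low, high = min(q1, q2), max(q1, q2)
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    if q12 > q1 + q2:
        return "nonlinear-enhance"
    if q12 > high:
        return "bi-enhance"
    if q12 > low:
        return "uni-weaken"
    return "nonlinear-weaken"


# Hypothetical toy sample in which neither factor alone explains y.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]
q1, q2 = q_statistic(x1, y), q_statistic(x2, y)
q12 = interaction_q(x1, x2, y)
kind = interaction_type(q1, q2, q12)
```

Here each factor alone has no explanatory power, whereas their overlay separates the landslide pixels completely, which is the extreme case of a nonlinear enhancement.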


**Table 3.** Types of interaction between two explanatory variables (X1 and X2).

#### *3.4. Information Value (IV)*

IV is a bivariate statistical model which was initially proposed by Yin and Yan [52] and later modified by van Westen [53]. It estimates class-level weight values based on the spatial associations between landslide occurrence and each class of each conditioning factor. The IV for a given factor class is the natural logarithm of the ratio of the landslide density in this class to the landslide density in the entire study area (or factor):

$$IV = \ln\left(\frac{\mathit{Npix}(S_i)/\mathit{Npix}(N_i)}{\sum \mathit{Npix}(S_i)/\sum \mathit{Npix}(N_i)}\right) \tag{3}$$

where *Npix(S<sub>i</sub>)* is the number of landslide pixels within the factor class *i*, and *Npix(N<sub>i</sub>)* is the number of all pixels in the same class. The calculated value can be either positive or negative, and the higher (or lower) it is, the more (or less) significant the contribution of the relevant factor class to landslide occurrence.
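Equation (3) can be illustrated with a short sketch for a single factor. The class labels and pixel counts below are hypothetical, and returning `None` for a class without landslide pixels (where the logarithm is undefined) is an assumption of this sketch.

```python
import math


def information_values(landslide_pixels, total_pixels):
    """IV per class: ln(class landslide density / overall landslide density).

    Both arguments map a class label to a pixel count.
    """
    overall = sum(landslide_pixels.values()) / sum(total_pixels.values())
    iv = {}
    for cls, n_total in total_pixels.items():
        n_landslide = landslide_pixels.get(cls, 0)
        if n_landslide == 0 or n_total == 0:
            iv[cls] = None  # assumption: ln(0) is undefined, treated as NA
        else:
            iv[cls] = math.log((n_landslide / n_total) / overall)
    return iv


# Hypothetical pixel counts for one factor with three classes.
total = {"class_1": 1000, "class_2": 500, "class_3": 500}
slides = {"class_1": 10, "class_2": 30, "class_3": 0}
iv = information_values(slides, total)
```

A class whose landslide density exceeds the overall density (class_2) receives a positive IV, and a class with a lower density (class_1) receives a negative one.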

#### **4. LS Assessment by Hybrid Modeling**

Considering the functionalities and data requirements of the two models composing the GeoDIV hybrid model, two GIS-based data processing procedures initially took place under the general methodological framework (Figure 3): (non-)landslide sampling and factor preparation. For sampling, the landslide inventory dataset was divided into two subsets used as inputs for the model's training (training dataset) and validation (validation dataset), respectively. Of the 60 landslides contained in the inventory, 80% (48 landslides) were randomly selected for the training dataset. The remaining 20% (12 landslides) constituted the validation dataset. Based on the sizes of the mapped landslides and the spatial resolution of the obtained factor data, the entire study area was then tiled into grid pixels of 25 × 25 m as the basic analysis unit, resulting in 188 training and 41 validation landslide pixels. The IV model required only a landslide dataset, whereas the GeoDetector model required both landslide and non-landslide datasets. Hence, in order to construct the dependent variable for GeoDetector, an equal number of pixels was randomly selected from the part of the study area not affected by landslides and added to the training dataset (376 pixels in total). The target values of 0 and 1 were assigned to the non-landslide and landslide pixels, respectively, making the dependent variable a binary classified dataset.
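The sampling procedure can be sketched as follows. This is a simplified illustration and not the GIS workflow used in the study: landslides are represented by hypothetical integer identifiers, pixels by hypothetical (row, column) tuples, and the random seed is arbitrary.

```python
import random


def split_inventory(landslide_ids, train_fraction=0.8, seed=0):
    """Randomly split the inventory into training and validation subsets."""
    rng = random.Random(seed)
    ids = list(landslide_ids)
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def build_training_sample(landslide_pixels, candidate_pixels, seed=0):
    """Pair each landslide pixel (target 1) with a random non-landslide pixel (target 0)."""
    rng = random.Random(seed)
    landslide_set = set(landslide_pixels)
    pool = [p for p in candidate_pixels if p not in landslide_set]
    non_landslide = rng.sample(pool, len(landslide_pixels))
    return [(p, 1) for p in landslide_pixels] + [(p, 0) for p in non_landslide]


# 60 hypothetical landslide identifiers split 80/20.
train_ids, valid_ids = split_inventory(range(60))

# Hypothetical 100 x 100 pixel grid with 188 landslide training pixels.
grid = [(r, c) for r in range(100) for c in range(100)]
landslide_pixels = grid[:188]
sample = build_training_sample(landslide_pixels, grid)
```

With these inputs the split yields 48 training and 12 validation landslides, and the balanced sample contains 376 pixels, matching the counts reported above.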

**Figure 3.** Methodological framework for the development of the hybrid GeoDIV model.

In regard to factor preparation, the raster layers of conditioning factors on a continuous numerical scale (altitude, slope angle, profile curvature, plan curvature, stream density, SPI, TWI, proximity to faults, proximity to roads and mean annual rainfall) were divided into a number of discrete classes (Figure 4). In this study, the number of categories and their relative break values were mainly determined by the "natural breaks (Jenks)" classification method [54]. In this method, class breaks identify the most similar within-group values and maximize the differences between classes according to the deviations about the median [55]. Additionally, the raster layers of factors originally on a discrete classified scale (slope aspect, lithology, soil type, and land use/cover) were prepared by grouping them into more or less common initial classes (Figure 4).
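The idea behind natural-breaks classification can be sketched with a small dynamic-programming routine. Note the assumptions: this sketch minimizes the within-class sum of squared deviations from the class means (the common Fisher-Jenks formulation), which may differ in detail from the variant implemented in the GIS software used in the study, and it is only practical for small samples. The function names and values are hypothetical.

```python
from bisect import bisect_left


def natural_breaks(values, k):
    """Return the upper bound of each of k classes, in ascending order."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(i, j):
        """Sum of squared deviations from the mean for xs[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: minimum total SSE when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i
    bounds, j = [], n
    for c in range(k, 0, -1):  # backtrack from the last class to the first
        bounds.append(xs[j - 1])
        j = cut[c][j]
    return bounds[::-1]


def classify(value, bounds):
    """Zero-based class index of a value, given the class upper bounds."""
    return min(bisect_left(bounds, value), len(bounds) - 1)


# Hypothetical factor values forming three obvious groups.
bounds = natural_breaks([50, 1, 11, 2, 52, 3, 10, 51, 12], 3)
```

For these values the routine places the class boundaries in the two large gaps, so each group of neighbouring values ends up in its own class.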

After the data processing procedures, the GeoDIV model was implemented. A database was first created by matching the sample of 376 training pixels with each factor layer. The fourteen classified factors, as independent variables, and the landslide presence or absence (binary target value of 1 or 0), as the dependent variable, were then entered into the GeoDetector software, developed by Xu and Wang [56], to determine the impacts of the factors and of their pairwise interactions on the spatial stratified heterogeneity of landslide occurrence represented by the training sample. This determination included the calculation of q values for the factors and their pairwise interactions (Tables 4 and 5). To incorporate in the model only the factors with statistically significant relationships with landslide occurrence, the estimated *p* values (Table 4) were also used for factor selection. In accordance with the requirement for *p* values less than 0.05 at the 95% confidence level, altitude, slope angle, plan curvature, stream density, TWI, proximity to faults, proximity to roads, lithology, soil type and land use/cover remained in the model. Conversely, slope aspect, profile curvature, SPI and mean annual rainfall did not qualify for further analysis, as their relationships with landslide occurrence were statistically insignificant (i.e., *p* values greater than 0.05) at the same confidence level.

**Figure 4.** Conditioning factors: (**a**) altitude; (**b**) slope angle; (**c**) slope aspect; (**d**) plan curvature; (**e**) profile curvature; (**f**) stream density; (**g**) SPI; (**h**) TWI; (**i**) lithology; (**j**) proximity to faults; (**k**) soil type; (**l**) mean annual rainfall; (**m**) land use/cover; (**n**) proximity to roads.



\* indicate the factors eliminated from GeoDetector according to the *p* values.


**Table 5.** The q-statistic values for the pairwise interactions between the conditioning factors, calculated using GeoDetector.

Subsequently, by matching only the 188 landslide training pixels with each layer of the statistically significant factors, the landslide density for each of their classes was estimated. The IVs were then calculated by Equation (3) to determine the impact of each class on landslide occurrence (Figure 5).

**Figure 5.** The estimated IVs for the classes of conditioning factors qualified from the factor selection: (**a**) altitude; (**b**) slope angle; (**c**) plan curvature; (**d**) stream density; (**e**) TWI; (**f**) lithology; (**g**) proximity to faults; (**h**) soil type; (**i**) land use/cover; (**j**) proximity to roads. NA values (or no bars) indicate "not applicable" for these classes.

By using the q values from GeoDetector as factor-level weights and IVs as class-level weights, the overall landslide susceptibility (LS) score was estimated through a GIS-based weighted linear combination of statistically significant factors:

$$LS = \sum_{j=1}^{n} W_j \times s_{i,j} \tag{4}$$

where *W<sub>j</sub>* is the weight of a given factor *j*, *s<sub>i,j</sub>* is the weight for a given class *i* of factor *j* and *n* is the number of factors. The spatial distribution of the estimated overall score was visualized by an LS map divided into five classes ("very low", "low", "moderate", "high" and "very high" susceptibility) according to the "natural breaks (Jenks)" method (Figure 6).
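The weighted linear combination of Equation (4) can be sketched for a single pixel as follows. The factor names, class labels and weight values are hypothetical and are not taken from Tables 4 and 5 or Figure 5; the sketch also assumes that every class of every factor has a numeric IV.

```python
def ls_score(pixel_classes, factor_weights, class_weights):
    """LS = sum over factors j of W_j * s_(i,j), where i is the pixel's class of factor j."""
    return sum(
        weight * class_weights[factor][pixel_classes[factor]]
        for factor, weight in factor_weights.items()
    )


# Hypothetical factor-level weights (q values) and class-level weights (IVs).
factor_weights = {"factor_a": 0.2, "factor_b": 0.1}
class_weights = {
    "factor_a": {"class_1": -0.5, "class_2": 1.0},
    "factor_b": {"class_1": 0.8, "class_2": -0.4},
}

# One pixel, described by the class it falls into for each factor.
pixel = {"factor_a": "class_2", "factor_b": "class_1"}
score = ls_score(pixel, factor_weights, class_weights)
```

Applying the same function to every pixel of the study area would give the continuous score surface that is subsequently divided into the five susceptibility classes.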

**Figure 6.** The landslide susceptibility map produced by the hybrid GeoDIV model.
