**1. Introduction**

A landslide is a gravity-driven environmental process which involves the movement of rocks, debris, earth, or a combination of them down a slope [1]. According to official data, landslides were the third most frequent natural disaster worldwide in 2020, after floods and storms and ahead of earthquakes [2]. Generally, extreme weather events due to climate change and high seismic activity, in combination with the poorly planned expansion of human activities (deforestation of slopes, uncontrolled irrigation, etc.), have contributed to a global upward trend in landslide occurrence in recent years [3].

Because they occur without warning and seriously threaten both natural and human environments, landslides are a major problem. Through severe damage to, or even destruction of, infrastructure and property, they generate larger annual economic losses (billions of euros) than any other natural disaster in many countries. In addition, a considerable number of people are injured and, in some cases, killed by them each year. Indicatively, during 1998–2017, a total of 4.8 million people were affected by landslides worldwide, 18,414 of whom were killed [4]. The environmental effects of landslides are mainly changes in terrain morphology and increased sediment loads in rivers, with subsequent transport to dams.

**Citation:** Polykretis, C.; Grillakis, M.G.; Argyriou, A.V.; Papadopoulos, N.; Alexakis, D.D. Integrating Multivariate (GeoDetector) and Bivariate (IV) Statistics for Hybrid Landslide Susceptibility Modeling: A Case of the Vicinity of Pinios Artificial Lake, Ilia, Greece. *Land* **2021**, *10*, 973. https://doi.org/10.3390/land10090973

Academic Editors: Enrico Miccadei, Cristiano Carabella and Giorgio Paglia

Received: 20 August 2021 Accepted: 13 September 2021 Published: 15 September 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The increased frequency of landslides and the severity of their effects have led to growing interest from the international scientific community. Since predicting occurrence and intensity remains challenging, most of the attention has been given to determining the potential spatial locations of landslides. This spatial information can be acquired through landslide susceptibility (LS) assessment and mapping. LS refers to the potential landslide activity as a result of terrain conditions [5]. An assessment relies on the spatial distribution of past landslides in an area and their relation to its terrain conditions, in order to generate spatial predictions for areas that are not landslide-affected but have similar conditions. The output is a map presenting the region of interest divided into homogeneous zones of susceptibility [6]. LS maps with high levels of accuracy and reliability are considered crucial tools that can then be used as inputs for disaster management plans.

Advancements in the geospatial tools of geographic information systems (GIS) and remote sensing (RS), assisted by improvements in computer processing power, have improved LS modeling over the last few decades. Based on the literature, a considerable number of models are currently available for assessing LS at different spatial scales. In terms of their degree of objectivity and their need for landslide occurrence data, these models can be separated into two groups: qualitative and quantitative models. The qualitative (or semi-quantitative) models estimate a susceptibility score on the basis of weights assigned to landslide conditioning factors by one or more experts. They suffer from low objectivity associated with the experts' subjective judgements [7]. On the other hand, the quantitative models decrease bias in the weight assignments, since they depend on fixed mathematical rules, regardless of any expert judgement [8]. In particular, the impacts of different conditioning factors on past occurrences are quantitatively determined, resulting in high objectivity.

The current capability for acquiring multi-temporal landslide occurrence data through RS-based approaches has led to the wide use of data-driven quantitative models. These models range from complicated geotechnical and advanced machine learning models to more conventional statistical analysis models. Based on mechanical laws for the calculation of a safety factor, the geotechnical models [9,10] examine slope stability from the perspective of the mechanical properties of the slope. Modeled on human learning procedures, machine learning models are used to solve problems characterized by nonlinear functions and data. Commonly applied machine learning models are artificial neural networks (ANN), support vector machines (SVM), random forests (RF) and decision trees (DT) [11–13].

Regarding statistical analysis models, their fundamental principle is to estimate the probability of a landslide under the existence of spatial associations between the conditioning factors and past landslides [14]. Depending on the examination of factors individually or cumulatively, they can be either bivariate or multivariate. In bivariate modeling, weights are calculated for the classes of each individual factor by their levels of association with landslides in a historic dataset. Frequency ratio (FR), information value (IV) and weights of evidence (WoE) constitute the main representatives of bivariate models [15,16]. Conversely, in multivariate modeling, all the factors are sampled, and the presence or absence of landslide is determined for each of the sampling units [17]. Then, weights are calculated for the factors via statistical means. Among the multivariate models, logistic regression (LR) is doubtless the most used [18,19]. However, models such as LR consider the factors as explanatory variables without taking into account the spatial information contained in them and exploring their impacts on landslide occurrence (dependent variable) from a spatial perspective. In order to overcome this limitation, new spatially-based multivariate models have been put forward recently. These models can address the specificities of each space and consider that spatial variations in landslides may cause different responses to variations in the factor variables. Such a model is the Geographical Detector (GeoDetector). Although GeoDetector has been tested in various studies of health, social and environmental sciences [20–22] over the last few years, its use in landslide-related research has been quite limited. Since it provides an effective way to identify and eliminate redundant variables, GeoDetector has been used in a few relevant studies [14,23,24] for factor selection purposes.
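To make the two families concrete, the following is a minimal Python sketch, not the implementation used in this study. It assumes the commonly cited definitions: the information value of a factor class is the natural logarithm of the ratio between the landslide density in that class and the overall landslide density, and the GeoDetector factor detector is q = 1 − (within-class sum of squares)/(total sum of squares). The function names, the two slope classes and the eight-cell inventory are hypothetical and serve only as illustration.

```python
import math
from collections import defaultdict


def information_value(classes, landslide):
    """Return {class: IV}, with IV = ln(class landslide density / overall density)."""
    cells = defaultdict(int)   # number of cells per class
    slides = defaultdict(int)  # number of landslide cells per class
    for cls, hit in zip(classes, landslide):
        cells[cls] += 1
        slides[cls] += hit
    overall = sum(slides.values()) / len(classes)
    if overall == 0:
        raise ValueError("the inventory contains no landslide cells")
    iv = {}
    for cls in cells:
        density = slides[cls] / cells[cls]
        # ln(0) is undefined, so classes without landslides get None here.
        iv[cls] = math.log(density / overall) if density > 0 else None
    return iv


def q_statistic(classes, values):
    """Factor detector q = 1 - (within-class sum of squares / total sum of squares)."""
    mean = sum(values) / len(values)
    total_ss = sum((v - mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no variation to explain
    groups = defaultdict(list)
    for cls, v in zip(classes, values):
        groups[cls].append(v)
    within_ss = 0.0
    for members in groups.values():
        m = sum(members) / len(members)
        within_ss += sum((v - m) ** 2 for v in members)
    return 1.0 - within_ss / total_ss


# Hypothetical toy inventory: one factor (slope) with two classes, eight cells.
slope_class = ["low"] * 4 + ["high"] * 4
landslide = [0, 0, 0, 1, 1, 1, 1, 0]  # 1 = landslide present in the cell

iv = information_value(slope_class, landslide)  # bivariate class weights
q = q_statistic(slope_class, landslide)         # explanatory power of the factor
```

In this toy case the "high" class receives a positive IV and the "low" class a negative one, while q equals 0.25, meaning that the slope classes explain a quarter of the spatial variance in landslide presence. How classes without any landslides are treated (here they are simply left undefined) is a design choice that varies between implementations.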

In general, all the quantitative models have proven beneficial for identifying locations that are prone to landslides; however, some shortcomings still characterize them. The geotechnical models require detailed mechanical data on soil or rock, and as a result they are only suitable for studying small regions or single slopes. Although the statistical models are easy to understand and perform well in most cases, they struggle to handle large amounts of data. Moreover, despite their ability to handle large amounts of nonlinear data, machine learning models are not significantly better than the statistical ones, and do not perform well under different conditions and in different areas [25]. In order to produce the most reliable LS map for a region of interest, one possible solution is to compare different models and select the optimal one in terms of accuracy and prediction ability. Several studies have compared two or more different models to identify the most suitable one for a specific region [26–28].

The aforementioned shortcomings tend to increase the uncertainty and reduce the efficiency of models when applied individually. Thus, another solution has gained popularity recently: the development of hybrid models. Hybrid modeling can resolve the shortcomings of individual models and improve performance. This type of modeling has been gradually applied in LS assessment studies over the last decade. For instance, in the work of Arabameri et al. [8], the efficiency of integrating statistical (FR) and machine learning (RF) models was explored for LS mapping in northern Iran. To assess LS in a region of India, Saha et al. [29] integrated a statistical and a machine learning model to improve on their individual accuracies. Chen et al. [30] applied a combination of bivariate (WoE) and multivariate (LR) statistical models with a machine learning model (RF) for LS mapping of a mountainous region of China. Roy et al. [31] delineated LS zones in districts of India by integrating bivariate statistical (WoE) and machine learning (SVM) models. Chowdhuri et al. [32] introduced hybrid models integrating statistical and machine learning models for spatially predicting landslide occurrence in a basin in India. In addition, some studies have improved the performance of machine learning models by combining them with optimization or meta-heuristic algorithms [33,34].

In Greece, landslide activity has been strongly facilitated by the frequent occurrence of intense rainfall and seismic events. The country's complex geomorphological setting (strained geological formations and steep slopes) and uncontrolled land use in landslide-prone areas have also contributed. As a result, interest in and awareness of the importance of LS assessments for regions of Greece have increased, particularly over the last decade. However, the majority of relevant studies have focused on the implementation of individual statistical and machine learning models [35–37], rather than integrated approaches. The work of Chalkias et al. [38] constitutes an exception.

The region of Peloponnese has experienced severe natural disasters, including floods, earthquakes, landslides and wildfires. Specifically, landslides have heavily damaged settlements within its boundaries (mainly in its northern and western parts), resulting in partial destruction and necessary relocations to nearby geologically stable lands. Considering all the above, the present study aimed to assess the LS of a wetland in northwestern Peloponnese and to create a reliable LS map of it. To this end, a hybrid LS modeling approach is proposed based on the integration of two different statistical models, the multivariate GeoDetector and the bivariate IV. Past landslide occurrence and conditioning factor datasets were incorporated into the hybrid model, named GeoDIV, and analyzed in a GIS environment to determine the spatial distribution of susceptibility. In order to confirm the targeted reliability of the LS map, the performance of the proposed GeoDIV model was compared with that of the individual IV model in a validation procedure.
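As an illustration of what integrating a multivariate and a bivariate model can mean in practice, the sketch below shows one plausible scheme: class-level IV weights are summed per cell, either directly (individual IV model) or after scaling each factor by its GeoDetector q value (hybrid). This weighting scheme is an assumption made here for illustration, since this section does not specify how GeoDIV combines the two models. The function name, the factor names, and all IV and q values are hypothetical.

```python
def susceptibility_index(cell_classes, iv_tables, factor_weights=None):
    """Sum the IV of each factor's class in a cell, optionally weighted per factor.

    cell_classes:   {factor: class observed in this cell}
    iv_tables:      {factor: {class: IV}}
    factor_weights: {factor: weight}, e.g. GeoDetector q values; None means
                    every factor counts equally (plain IV model).
    """
    total = 0.0
    for factor, cls in cell_classes.items():
        weight = 1.0 if factor_weights is None else factor_weights[factor]
        total += weight * iv_tables[factor][cls]
    return total


# Hypothetical class weights and factor weights (not values from the study).
iv_tables = {
    "slope": {"low": -0.69, "high": 0.41},
    "lithology": {"flysch": 0.30, "limestone": -0.20},
}
q_weights = {"slope": 0.25, "lithology": 0.10}

cell = {"slope": "high", "lithology": "flysch"}
iv_only = susceptibility_index(cell, iv_tables)            # individual IV model
hybrid = susceptibility_index(cell, iv_tables, q_weights)  # q-weighted hybrid
```

Under this assumed scheme, a factor with low explanatory power contributes less to the final index than it would in the unweighted IV sum, which is one way a hybrid could suppress weakly informative factors.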
