1. Introduction
Landslides are a multifarious phenomenon, representing a substantial global hazard and endangering human lives, infrastructure, and transportation networks [1,2]. Annually, slope failures incur substantial direct and indirect economic losses, potentially amounting to millions of dollars [3,4]. Over the past two decades, researchers worldwide have increasingly recognized landslide susceptibility maps as a valuable tool for identifying areas vulnerable to landslide risk in urban or rural settings [3]. Such maps are particularly essential in the Medea region, where the specific geological, climatic, and physiographic conditions make it prone to landslide occurrences. Moreover, human activities indirectly contribute to landslide risk: significant population growth has prompted the government to expand the road network and urbanize the region, thereby placing more infrastructure in landslide-prone areas. The construction of such infrastructure typically involves excavation and tunneling activities, which disrupt the natural equilibrium of soils and can induce slope instability [1,4]. The landslide that occurred in January 2014 within the Jebel El Ouahch Tunnel in Constantine is a prominent example, directly linked to tunneling activities [1]. As a consequence, the tunnel was closed, and the alignment of the A1 highway corridor was modified [1]. Hence, a landslide susceptibility map is a crucial tool in landslide hazard management, providing local governments with the information needed to develop master plans and devise solutions that mitigate the catastrophic consequences of landslides. Such maps facilitate the development of appropriate planning and decision-making tools to address landslide risks effectively [3,5,6].
Over the past several decades, numerous approaches have been used to develop methodologies for predicting landslide susceptibility and producing corresponding maps [7,8]. These approaches can be categorized into three main types: qualitative, quantitative, and semi-quantitative methods [9,10]. Qualitative methods are straightforward approaches that rely primarily on direct field measurements and on the knowledge and experience of experts. Quantitative methods, on the other hand, are regarded as rigorous and objective approaches that use statistical and mathematical techniques to analyze data [10]. Quantitative methods are used to mitigate the subjectivity inherent to landslide susceptibility assessments by integrating geotechnical and statistical models. Furthermore, new hybrid methods have been introduced in the literature. These methods merge qualitative and quantitative approaches to assess the importance of the input parameters in generating landslide hazard maps, and they are commonly called semi-quantitative methods [11]. The ease of use and effectiveness of the three aforementioned methods have rendered them popular and valuable, owing to their straightforward representation of the dependent variable (i.e., landslide susceptibility) and independent variables (i.e., its drivers) [12,13]. To the best of the authors’ knowledge, landslide inventories and heuristic methods stand out as the most frequently employed qualitative approaches [14]. The primary drawback of qualitative methods lies in their subjectivity, which stems from ranking landslide predisposing factors according to the experience of individual experts [14].
Similarly, quantitative methods can be categorized into several subclasses, including statistical, deterministic, and machine learning approaches [10]. Deterministic methods, also called geotechnical methods, have seen extensive applications in the literature. These approaches rely on geotechnical parameters determined on-site, coupled with the engineering principles of slope instability, typically expressed through a safety factor. However, deterministic methods tend to overlook climatic and anthropogenic factors [14]. The primary limitation of deterministic methods is their reliance on comprehensive geotechnical and hydrological data, which can be challenging to gather for large areas [14]. Moreover, these methods are generally applicable only to mapping small areas [15]. Typically, statistical methods aim to predict the relationship between historical landslides and their drivers through bivariate or multivariate techniques [10]. Logistic regression, weight of evidence, and analytical hierarchy process methods are among the statistical approaches commonly employed for modeling landslide susceptibility. However, a key criticism of statistical methods is their requirement for the drivers to follow a normal distribution, which may not always be true under real-world conditions. Additionally, these methods inherently assume linearity [16] and rely on simplified assumptions, such as linear behavior or production heuristics, which can limit their effectiveness in modeling complex nonlinear phenomena [17,18].
To address these limitations, machine learning models have been proposed for landslide susceptibility mapping. These models leverage sophisticated algorithms to model the intricate nonlinear relationships by analyzing the conditioning factors of both landslide and non-landslide locations [10]. Machine learning is an algorithmic approach that iteratively learns from the available data to uncover underlying relationships or hidden patterns, thereby constructing accurate analytical models [10,19,20,21]. So far, numerous researchers have employed machine learning methods for modeling landslide susceptibility, with the most commonly utilized approaches in the literature including Artificial Neural Networks [14,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38], Support Vector Machines [22,26,27,30,39,40,41,42,43,44,45,46], Decision Trees [23,39,41,42,43,47,48,49,50,51,52], Random Forest [30,40,51,52,53,54,55,56,57,58], Adaptive Neuro-Fuzzy Inference System [41,59,60,61,62,63,64,65], and Deep Neural Network [66].
However, certain limitations have been mentioned in the literature with respect to traditional machine learning models. The process of selecting the optimal model can be time-consuming, and these models are prone to issues such as overfitting and underfitting. Moreover, the convergence of algorithms during the training phase relies heavily on the initial complex values, leading to slow training speeds [16]. Recent studies have primarily focused on enhancing traditional machine learning methods through the use of hybrid metaheuristic algorithms [67,68]. These approaches serve as a prevalent solution to address the challenges encountered by machine learning algorithms. The primary advantage of integrating metaheuristic algorithms with machine learning methods is the enhancement of convergence during the learning phase toward the optimal solution. However, despite this benefit, hybrid machine learning methods have received relatively little attention in the context of landslide susceptibility mapping.
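To make the idea of metaheuristic-driven training concrete, the following minimal sketch uses a basic global-best Particle Swarm Optimization loop to fit the two parameters of a toy linear model by minimizing its mean squared error. It is an illustration under stated assumptions, not the configuration used in this study: the function name `pso_minimize`, the swarm settings (inertia, `c1`, `c2`, bounds), and the toy data are all hypothetical, and a hybrid such as ANFIS-PSO would optimize the membership-function and consequent parameters of the fuzzy system rather than a slope and an intercept.

```python
import random


def pso_minimize(loss, dim, n_particles=20, n_iters=100, bounds=(-5.0, 5.0),
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise loss(vector) with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [loss(p) for p in pos]         # personal best losses
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


# Hypothetical toy training set generated from y = 2*x + 1.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse(params):
    """Mean squared error of the linear model y = w*x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


params, err = pso_minimize(mse, dim=2)
print(params, err)
```

Because the swarm explores many starting points at once, the result depends less on a single initial guess than a purely gradient-based fit would, which is the convergence benefit the hybrid methods aim for.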
Building upon the background information provided, the current research aims to develop a novel advanced hybrid machine learning model designed to assess landslide susceptibility effectively. This model is subsequently integrated into a geographical information system (GIS), resulting in the generation of an accurate map highlighting landslide-prone areas. The proposed map serves as a valuable tool for decision-makers and land use managers in the Medea area, which is known to be highly vulnerable to landslides, helping them to mitigate landslide hazards; the approach also has the potential to be used in other regions of the world. The study contributes novel hybrid metaheuristic machine learning methods, which combine the Genetic Algorithm, Particle Swarm Optimization, Harris Hawks Optimization, and the Salp Swarm Algorithm with ANFIS, to enhance landslide susceptibility mapping accuracy. Furthermore, K-fold cross-validation has been employed to check that the models generalize well and are less prone to overfitting and underfitting. Additionally, the research fills a significant gap in the literature by focusing on the understudied region of Medea and providing valuable insights into landslide susceptibility in this vulnerable area.
3. Results
3.1. Database Compilation
Compiling the database is a critical step in assessing the relationship between landslide occurrence and its causal factors. Landslides tend to recur under conditions similar to those of previous incidents. Therefore, identifying the distribution of past landslides is crucial for a comprehensive study. In this research, Google Earth images were used to locate landslide sites using the Historical Imagery tool, which enables the visualization of changes over time in the study area map. This approach is valuable for monitoring suspicious sites over time and identifying potential landslide areas. Following this, field surveys were conducted to validate the suspected sites; assess the sizes and shapes of landslides; determine the movement types; conduct site diagnoses; and characterize the activity level (active, dormant, etc.) of failed slopes.
To assess the performance of hybrid machine learning methods, various thematic layers representing several factors influencing landslides were prepared. The selection of drivers depended on data availability, information gathered from field surveys, and specific characteristics of the study area. These factors were digitized, organized, and rasterized using Geographic Information System (GIS) software, specifically ArcGIS 10.8, to integrate them into the proposed model and generate the final map.
The adopted input factors are lithology, elevation, slope, land cover, distance to stream, distance to road, precipitation, and slope aspect. The lithology layer (X1) was created using data from a 1:50,000-scale geologic map and was categorized into four distinct groups according to their landslide susceptibility, as illustrated in Table 3. The elevation (X2), slope (X3), and slope aspect (X8) maps were derived from the Shuttle Radar Topography Mission (SRTM) database. The elevation values (X2) ranged from 265 to 1800 m and were classified into six categories as follows: (1) 265–400 m, (2) 400–600 m, (3) 600–800 m, (4) 800–1000 m, (5) 1000–1200 m, and (6) >1200 m. The slope layer (X3) was created using the “slope function” in ArcGIS 10.8 and categorized into four sets as follows: (1) 0–5°, (2) 5–12.5°, (3) 12.5–25°, and (4) 25–90°. The land cover layer (X4) was categorized into five classes: (1) residential, (2) uncultivated, (3) cultivated, (4) grassland, and (5) forest. Distance to stream (X5) and distance to road (X6) were classified into six sets as follows: (1) 0–25 m, (2) 25–50 m, (3) 50–100 m, (4) 100–200 m, (5) 200–300 m, and (6) >300 m. Finally, precipitation (X7) was divided into five classes as follows: (1) 100–200 mm, (2) 200–300 mm, (3) 300–400 mm, (4) 400–500 mm, and (5) 600–800 mm. These classifications are based on the analysis of the susceptibility of each factor class to landslide occurrences, drawing on findings from previous landslide studies.
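The reclassification of continuous layers into ordinal classes can be sketched as a lookup against the class breaks listed above. The snippet below is a minimal illustration, not the ArcGIS workflow used in the study: the helper `classify` is hypothetical, and because the published classes share their edge values (e.g., 400 m closes class 1 and opens class 2), the sketch assumes that a value lying exactly on a break is assigned to the upper class.

```python
from bisect import bisect_right

# Upper breaks of each class, taken from the class limits stated in the text.
ELEVATION_BREAKS_M = [400, 600, 800, 1000, 1200]   # six elevation classes
SLOPE_BREAKS_DEG = [5, 12.5, 25]                   # four slope classes
DISTANCE_BREAKS_M = [25, 50, 100, 200, 300]        # six distance classes


def classify(value, breaks):
    """Return the 1-based class index of value for ascending class breaks.

    Assumption: a value equal to a break falls into the upper class.
    """
    return bisect_right(breaks, value) + 1


# Example cell: 750 m elevation, 10 degree slope, 350 m from the nearest stream.
cell_classes = (
    classify(750, ELEVATION_BREAKS_M),
    classify(10, SLOPE_BREAKS_DEG),
    classify(350, DISTANCE_BREAKS_M),
)
print(cell_classes)
```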
3.2. Correlation between Inputs and Target
The statistical relationship between landslide susceptibility and the input parameters was examined using SPSS software version 20.0. Table 4 displays the correlation matrix, which provides a descriptive overview of the data distribution together with the Pearson correlation coefficient (R) and its significance, both between landslide susceptibility and each input and among the inputs themselves. The results revealed significance levels below 0.05 for X1, X2, X3, X7, and X8, indicating that these correlations are statistically significant. According to Smith’s classification (1986) [83], landslide susceptibility exhibits a consistent correlation with the input parameters, except for X4, X5, and X6, which demonstrate poor correlation. Because the Pearson coefficient captures only linear association, these weak correlations suggest that the underlying relationship may be complex and nonlinear, which motivates the use of advanced machine learning techniques for accurate modeling.
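For readers who want to reproduce this kind of check outside SPSS, the sketch below computes the Pearson coefficient and the associated t statistic, t = R·sqrt((n − 2)/(1 − R²)), in plain Python. The function names and the sample vectors are illustrative assumptions rather than data from this study. With n = 160 samples (158 degrees of freedom), a correlation is significant at the two-sided 0.05 level when |t| exceeds roughly 1.98.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing R = 0 with n paired observations (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical slope classes and landslide labels (1 = landslide, 0 = none).
slope_class = [1, 2, 2, 3, 3, 4, 4, 1]
landslide = [0, 0, 1, 1, 1, 1, 1, 0]
r = pearson_r(slope_class, landslide)
print(r, t_statistic(r, len(slope_class)))
```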
3.3. Optimal Input Selection Using the Gamma Test (GT)
In this section, the impact of each input on landslide susceptibility was evaluated by constructing eight different combinations of the input factors (X1, X2, X3, X4, X5, X6, X7, and X8), as outlined in Table 5. The first combination includes all eight parameters (referred to as the initial set). Similarly, the second combination consists of seven input factors (All-X1), excluding the lithology parameter; the seventh combination includes all inputs except precipitation (X7), and so forth for the remaining combinations, as detailed in Table 5. The results of the GT analysis reveal that factors X1, X3, X5, X7, and X8 have a significant influence on the output. These five input factors were selected based on the highest values of the gamma statistic (Γ) and V-ratio obtained when each of them was excluded, which indicates that omitting them degrades the model the most. The findings clearly indicate that the combination of lithology, slope, distance to stream, precipitation, and slope aspect exhibited the lowest value of the gamma statistic. Consequently, this set was identified as the optimal combination of input variables for modeling landslide susceptibility, as determined by the Gamma Test method.
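The Gamma Test estimates the output variance that no smooth model can explain from a given input set, so a lower Γ (and a lower V-ratio, Γ divided by the output variance) points to a more informative input combination. The sketch below follows the usual near-neighbour formulation: for the k-th nearest neighbour of every point, it averages the squared input distance δ(k) and half the squared output difference γ(k), then takes the intercept of the regression line of γ on δ as Γ. It is a minimal illustration under assumptions: the function name, the choice of p = 10 neighbours, and the synthetic data are hypothetical and do not reproduce the settings or results of this study.

```python
import random


def gamma_test(X, y, p=10):
    """Return (Gamma, slope A, V_ratio) from the near-neighbour Gamma test."""
    m = len(X)
    deltas = [0.0] * p   # mean squared input distance to the k-th neighbour
    gammas = [0.0] * p   # mean half squared output difference to that neighbour
    for i in range(m):
        # Squared distances from point i to every other point, nearest first.
        neigh = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(m) if j != i
        )[:p]
        for k, (d2, j) in enumerate(neigh):
            deltas[k] += d2 / m
            gammas[k] += (y[j] - y[i]) ** 2 / (2 * m)
    # Least-squares line gamma = Gamma + A * delta through the p points.
    mean_d, mean_g = sum(deltas) / p, sum(gammas) / p
    slope = (sum((d - mean_d) * (g - mean_g) for d, g in zip(deltas, gammas))
             / sum((d - mean_d) ** 2 for d in deltas))
    gamma = mean_g - slope * mean_d
    mean_y = sum(y) / m
    var_y = sum((v - mean_y) ** 2 for v in y) / m
    return gamma, slope, gamma / var_y


# Synthetic check: a smooth noise-free target versus a target that is pure noise.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(300)]
y_smooth = [a + b for a, b in X]
y_noise = [rng.gauss(0.0, 1.0) for _ in X]
_, _, v_smooth = gamma_test(X, y_smooth)
_, _, v_noise = gamma_test(X, y_noise)
print(v_smooth, v_noise)
```

A V-ratio near 0 means the inputs almost fully determine the output through a smooth function, whereas a value near 1 means the inputs carry little usable information, which is how competing input combinations can be ranked.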
3.4. Landslide Susceptibility Classification through the Hybrid Metaheuristic Method
To determine the optimal machine learning model, the study employed a two-step approach: first, selecting influential input parameters based on the literature recommendations, and second, identifying the best machine learning methods. Initially, eight factors were chosen, and the Gamma Test method was then applied to identify the optimal inputs. Subsequently, five statistical measures were used to assess and compare the performances of various models during both the training and validation phases. The results, including sensitivity, specificity, precision, accuracy, and Pearson correlation coefficient (R), are presented in Table 6.
The dataset was split into two parts: 80% for training and 20% for validation, comprising 128 samples for training and 32 for validation. As shown in Table 6, the landslide susceptibility modeled using the various hybrid metaheuristic methods exhibited the following ranges of performance metrics during the training phase: sensitivity (95.83–100%), specificity (94.94–98.73%), precision (92.16–98%), accuracy (96.09–99.22%), and Pearson correlation coefficient (R) (92–99.21%). Similarly, during the validation phase, the ranges were sensitivity (85.71–100%), specificity (95.45–100%), precision (88.89–100%), accuracy (93.75–100%), and Pearson correlation coefficient (R) (87.26–99.97%). The results clearly demonstrate that the ANFIS-HHO model, which combines ANFIS with the HHO algorithm and was trained with the optimal inputs identified by the GT method, produced the most accurate predictions. In the training/validation phases, this model achieved a sensitivity of 100%/100%, specificity of 98.734%/100%, precision of 98%/100%, accuracy of 99.22%/100%, and Pearson correlation coefficient (R) of 99.21%/99.97%. Additionally, the ANFIS-SSA model performed satisfactorily and was ranked as the second-best model. On the other hand, the ANFIS-GA model yielded the weakest results in predicting landslide susceptibility. In terms of the performance hierarchy of the hybrid metaheuristic machine learning models during the training and validation phases, the order is ANFIS-HHO, ANFIS-SSA, ANFIS-PSO, and ANFIS-GA.
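The four classification measures follow directly from the confusion matrix. The sketch below shows the arithmetic; the counts passed to it are an assumption, chosen as one confusion matrix over 128 training samples that is consistent with the ANFIS-HHO training figures quoted above (the text itself does not report the individual counts).

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, precision, and accuracy in percent."""
    return {
        "sensitivity": 100 * tp / (tp + fn),            # landslides correctly flagged
        "specificity": 100 * tn / (tn + fp),            # stable sites correctly cleared
        "precision": 100 * tp / (tp + fp),              # flagged sites that are landslides
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
    }


# Inferred counts (assumption): 49 landslide and 79 non-landslide samples,
# with a single false positive and no false negatives.
m = classification_metrics(tp=49, fp=1, tn=78, fn=0)
print({name: round(value, 3) for name, value in m.items()})
```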
3.5. Evaluating the Best-Fitted Model Using the K-Fold Cross-Validation Approach
The predictive capability of the optimal ANFIS-HHO model was evaluated using a five-fold cross-validation approach. Notably, previous studies focusing on predicting landslide susceptibility often assessed their models based on a single split, which limited the verification of their models’ ability to address overfitting and underfitting issues. Figure 13 illustrates the performance measures of the optimal ANFIS-HHO model using five-fold cross-validation with validation data for each split.
The results clearly demonstrate the efficacy of the ANFIS-HHO model, with correlation coefficients ranging between 0.972 and 0.994 for validation data across the five splits. This substantiates the predictive capability of the optimal ANFIS-HHO model to not only learn from existing data but also generalize well to novel validation data, effectively overcoming the overfitting and underfitting challenges.
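The five-fold procedure can be sketched as follows: the sample indices are shuffled once and dealt into five equal folds, and each fold in turn serves as the validation set while the other four are used for training. With the 160 samples of this study, every split has 128 training and 32 validation samples, matching the 80/20 proportion used earlier. The generator name and the fixed seed are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # deal indices round-robin into k folds
    for i in range(k):
        validation = folds[i]
        train = [j for f in range(k) if f != i for j in folds[f]]
        yield train, validation


splits = list(k_fold_indices(160, k=5))
for fold_number, (train, validation) in enumerate(splits, start=1):
    # A model would be trained on `train` and scored on `validation` here.
    print(fold_number, len(train), len(validation))
```

Each sample is used for validation exactly once, so the spread of the per-fold scores gives a more reliable picture of generalization than a single split.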
3.6. Landslide Susceptibility Mapping
The landslide modeling process was conducted using four hybrid metaheuristic machine learning methods based on the training dataset. The performance of each model was assessed using five statistical indicators, revealing ANFIS-HHO as the most suitable model. To generate landslide susceptibility maps, susceptibility indices were computed for all pixels in the study area using a 30 × 30 m grid size. The Fishnet Tool in ArcToolbox facilitated this step. Subsequently, the ANFIS-HHO model was integrated into ArcGIS software to classify the susceptibility indices of each pixel based on the optimal input layers. Figure 14 illustrates the resulting landslide susceptibility maps, showing three susceptibility classes: low, moderate, and high. The distribution of susceptibility classes reveals that 48.39% of the study area exhibits low susceptibility to landslides, while 22.31% and 29.29% have moderate and high susceptibilities, respectively.
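The step from per-pixel susceptibility indices to class shares can be sketched in a few lines. The class thresholds below are hypothetical, since the text does not state the break values used to separate low, moderate, and high susceptibility; only the 30 × 30 m cell size comes from the study.

```python
from collections import Counter

LOW_MAX = 0.33        # hypothetical upper limit of the "low" class
MODERATE_MAX = 0.66   # hypothetical upper limit of the "moderate" class
CELL_SIZE_M = 30      # grid cell edge length stated in the text


def susceptibility_class(index):
    """Map a susceptibility index in [0, 1] to a class label."""
    if index <= LOW_MAX:
        return "low"
    if index <= MODERATE_MAX:
        return "moderate"
    return "high"


def class_shares(indices):
    """Return the percentage of cells falling into each susceptibility class."""
    counts = Counter(susceptibility_class(v) for v in indices)
    total = len(indices)
    return {name: 100 * counts[name] / total for name in ("low", "moderate", "high")}


def cells_to_km2(n_cells):
    """Convert a number of 30 x 30 m cells to an area in square kilometres."""
    return n_cells * CELL_SIZE_M ** 2 / 1_000_000


shares = class_shares([0.1, 0.2, 0.5, 0.9])
print(shares, cells_to_km2(10_000))
```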
The map illustrates an increasing susceptibility from plateau surfaces to streams, primarily influenced by slope angle. Plateau surfaces demonstrate low susceptibility due to factors such as lithological characteristics (e.g., hard rock), distance from streams, low precipitation, and gentle slopes. Moderate susceptibility is observed in ravines and convex slope ruptures delineating plateaus, indicative of lithological alteration zones (passage from a hard layer to soft layer). Conversely, high susceptibility zones are characterized by soft soil lithology, sparse vegetation cover, steep slopes, high precipitation, and proximity to streams.
3.7. Comparison between Our Model and the Models Proposed by the Literature
To assess the effectiveness of the proposed ANFIS-HHO model, a comparative study was conducted involving several empirical models from the literature predicting landslide susceptibility, as outlined in Table 7. The comparison was based on classification accuracy, sensitivity, and specificity, crucial indicators for evaluating prediction accuracy, where values close to 100% represent the best model. The results of the comparative study revealed that our proposed ANFIS-HHO model outperformed the others, demonstrating the highest classification accuracy, sensitivity, and specificity, with values of 99.22%, 100%, and 98.734%, respectively.
Following our model, the random forest model proposed by Dou et al. [52] ranked second, providing acceptable accuracy. The performance hierarchy of the literature models in the comparison was as follows: Dou et al. [52], Benbouras [8], Kavzoglu et al. [39], Tien Bui et al. [26], Aghdam et al. [59], Dao et al. [52], and Yeon et al. [50]. The effectiveness of our suggested model is attributed to the metaheuristic hybrid machine learning method, which automates the training process and achieves better performance in a short period of time.
4. Discussion
4.1. Significance of the Results
In this research, we aimed to contribute to the landslide research community by enhancing the performance of landslide susceptibility models. To the authors’ knowledge, the quality of these models relies heavily on the chosen method, and the current study therefore explores the effectiveness of novel hybrid metaheuristic machine learning methods. Furthermore, although Medea Wilaya is highly vulnerable to landslides, the existing literature lacks a comprehensive landslide map of the area. To address these gaps, the efficacy of four metaheuristic algorithms combined with the Adaptive Neuro-Fuzzy Inference System (ANFIS) method was examined for landslide susceptibility assessment. These algorithms are the Genetic Algorithm (ANFIS-GA), Particle Swarm Optimization (ANFIS-PSO), Harris Hawks Optimization (ANFIS-HHO), and the Salp Swarm Algorithm (ANFIS-SSA). It is worth noting that the use of hybrid metaheuristic machine learning methods in landslide assessment is relatively rare and is a first for the study area.
Our findings highlight that the ANFIS-HHO model emerged as the most suitable model, exhibiting higher values of sensitivity (100/100), specificity (98.734/100), precision (98/100), accuracy (99.22/100), and Pearson correlation coefficient (R) (99.21/99.97) during the training/validation phases compared to the other models. Furthermore, we evaluated the newly developed model using the K-fold cross-validation method, demonstrating its ability to generalize to new data without overfitting or underfitting and its superior precision compared to the other empirical models proposed in the literature.
Our results hold significant importance for landslide research and hazard assessment. By demonstrating the effectiveness of hybrid metaheuristic machine learning methods in improving landslide susceptibility models, we provide valuable insights for researchers and practitioners. The identification of the ANFIS-HHO model as the most suitable for landslide prediction underscores the potential of these advanced techniques in addressing complex geological phenomena. The proposed approach rests on the optimal ANFIS-HHO model assessed by K-fold cross-validation, which evaluates the model’s performance on different subsets of the dataset, reducing the risk of overfitting and underfitting and enhancing its generalization. It draws on essential input parameters for landslide susceptibility, such as lithology, slope, elevation, distance to stream, land cover, precipitation, slope aspect, and distance to road. Moreover, the integration of our improved ANFIS-HHO model into GIS software facilitates the generation of accurate landslide susceptibility maps, enabling decision-makers and land use managers to implement effective risk management strategies.
4.2. Inner Validation of the Results
We believe that our proposed methodology, which combines hybrid machine learning techniques with GIS tools, offers a straightforward approach that can be replicated in other regions facing similar challenges. Historically, by the beginning of the 21st century, microzoning maps and machine learning methods gained widespread usage in northern countries, aiding decision-makers and land use managers in various applications [84]. Today, the imperative to leverage these tools for developing up-to-date microzoning maps extends to several countries in the south, reflecting their significance in addressing contemporary challenges [84]. In this context, landslide susceptibility maps hold considerable importance [85], serving as critical resources for informed decision-making and risk management.
4.3. External Validation of the Results
The results of the current study demonstrate a significant enhancement in the performance of the landslide model through the utilization of hybrid machine learning methods. Compared to traditional methods, the metaheuristic hybrid ANFIS-HHO method yielded highly significant results. Furthermore, ANFIS-HHO outperformed the models proposed in the literature, showing a 9.22% improvement over the ANFIS and DNN proposed by Aghdam et al. [59] and Dao et al. [52], respectively, a 4.81% improvement over the SVM proposed by Kavzoglu et al. [39], a 4.14% improvement over the PSOGSA-ANN proposed by Benbouras [8], and a 3.63% improvement over the Random Forest proposed by Dou et al. [52]. These findings are consistent with expectations, as hybrid machine learning techniques, whether employed for prediction or classification tasks, can mitigate bias and variance while averting issues such as overfitting and underfitting, thus enhancing the predictive capability of traditional methods.
4.4. Importance of the Results
The significance of our results lies in their potential to inform landslide risk management strategies and land use planning efforts in landslide-prone regions. By accurately assessing the landslide susceptibility and generating detailed susceptibility maps, our methodology empowers decision-makers to implement proactive measures to mitigate the impact of landslides on human lives and infrastructures. Furthermore, our study highlights the efficacy of hybrid machine learning techniques in enhancing traditional landslide modeling approaches, paving the way for future research and practical applications in similar contexts worldwide.
4.5. Study Limitations and Future Directions
The main advantage of hybrid metaheuristic machine learning methods, in contrast to traditional approaches, lies in their ability to automate training processes and achieve superior performance within shorter timeframes. Moreover, these methods can combine various algorithms, harnessing the strengths of each to create methodologies that are more adaptable than conventional machine learning techniques. However, it is crucial to acknowledge several limitations of the present work. The most important one is the relatively small sample size used in this study, which may affect the precision of the landslide susceptibility map. This limitation could hinder the model’s capacity to generalize to novel conditions or scenarios not accounted for during the training phase. Researchers often rely on extensive and diverse datasets compiled from various sources to bolster learning outcomes; future studies could therefore benefit from incorporating data gathered across multiple countries to enrich the learning process and enhance the model performance. Furthermore, implementing the proposed models may pose other challenges in future research. Results are typically presented as complex matrices computed using transfer functions, which may prove cumbersome to reuse in subsequent cases, particularly given the need to integrate the model with external programs such as ArcGIS, as demonstrated in this study.
5. Conclusions
This study explored the efficacy of advanced hybrid machine learning methods, not previously applied to the study area, in generating a reliable model for assessing landslide susceptibility in Medea Wilaya, which is known to be highly vulnerable to landslides. To achieve this objective, historical landslide locations were identified using Google Earth images, and multiple field surveys were performed. The Gamma Test method was then employed for optimal input selection, revealing that lithology, slope, distance to stream, precipitation, and slope aspect constitute the optimal input set. Following this, four metaheuristic algorithms (GA, PSO, HHO, and SSA) were combined with the ANFIS method, yielding the ANFIS-GA, ANFIS-PSO, ANFIS-HHO, and ANFIS-SSA models, which were trained on the selected optimal input set. The accuracy of the proposed models was assessed using five statistical indicators. The comparative assessments highlighted the superior accuracy of the ANFIS-HHO model, which exhibited the best performance in terms of sensitivity (100/100), specificity (98.734/100), precision (98/100), accuracy (99.22/100), and Pearson correlation coefficient (R) (99.21/99.97) during the training/validation phases compared to the other models. Additionally, the predictive capability of the ANFIS-HHO model was evaluated using five-fold cross-validation, which yielded consistently high correlation coefficients (0.972 to 0.994) across the validation splits, indicating the absence of overfitting or underfitting issues. The proposed ANFIS-HHO model, assessed by the K-fold cross-validation approach, therefore reflects a rigorous optimization for accurate predictions.
Comparative analyses with the proposed models in the literature confirmed the significant improvement of our proposed ANFIS-HHO model. Finally, the proposed model was integrated into GIS software to produce an accurate map depicting landslide-prone areas. This map can serve as a valuable tool for decision-makers and land use managers in mitigating landslide hazards in the Medea region.
Theoretically, our study contributes to expanding our knowledge by providing an in-depth insight into the application of advanced machine learning techniques in landslide susceptibility assessments. By investigating the efficacy of hybrid metaheuristic algorithms combined with the Adaptive Neuro-Fuzzy Inference System (ANFIS) method, we contribute to the body of literature on landslide modeling methodologies.
Methodologically, our study presents a novel approach to landslide susceptibility modeling by integrating multiple advanced techniques and methodologies. The utilization of hybrid metaheuristic algorithms represents an innovative methodological advancement, offering a robust framework for improving the predictive accuracy and reliability. Moreover, our research demonstrates the feasibility of integrating machine learning models with Geographic Information Systems (GIS) software for practical applications in hazard mapping and risk management. This methodological innovation has implications beyond landslide research and can be adapted to other geospatial modeling applications, contributing to interdisciplinary approaches in geological–geotechnical risks.
Finally, the current research has the potential to inform decision-making processes and improve disaster preparedness efforts in landslide-prone regions. Moreover, the methodologies developed in this study can serve as a foundation for future research endeavors in the broader field of natural hazard assessment and mitigation.