Landslide Susceptibility Evaluation Based on the Combination of Environmental Similarity and BP Neural Networks

Wang, Ruiting; Xi, Wenfei; Huang, Guangcai; Yang, Zhiquan; Yang, Kunwu; Zhuang, Yongzai; Cao, Ruihan; Zhou, Dingjie; Ma, Yijie

doi:10.3390/land14040839


by Ruiting Wang ¹, Wenfei Xi ¹,²,³,*, Guangcai Huang ⁴,*, Zhiquan Yang ³,⁵, Kunwu Yang ¹, Yongzai Zhuang ¹, Ruihan Cao ¹, Dingjie Zhou ⁶ and Yijie Ma ¹

¹ Faculty of Geography, Yunnan Normal University, Kunming 650500, China

² Key Laboratory of Highland Geographic Processes and Environmental Change in Yunnan Province, Kunming 650500, China

³ Key Laboratory of Early Rapid Identification, Prevention and Control of Geological Diseases in Traffic Corridor of High Intensity Mountainous Area of Yunnan Province, Kunming 650093, China

⁴ Guizhou Institute of Geological Survey, Guiyang 550081, China

⁵ Faculty of Public Safety and Emergency Management, Kunming University of Science and Technology, Kunming 650093, China

⁶ Surveying and Mapping Engineering Institute of Yunnan Province, Kunming 650224, China

* Authors to whom correspondence should be addressed.

Land 2025, 14(4), 839; https://doi.org/10.3390/land14040839

Submission received: 5 March 2025 / Revised: 3 April 2025 / Accepted: 9 April 2025 / Published: 11 April 2025


Abstract

Landslides represent a widespread global geological hazard, presenting significant risks to both human populations and critical infrastructure. The accuracy of landslide susceptibility evaluation models serves as a critical prerequisite for landslide hazard prediction and risk management, while insufficient landslide sample data may constrain the reliability of susceptibility modeling and evaluation results. To address the challenge of limited landslide samples in complex mountainous areas, this study proposes a novel landslide susceptibility evaluation method integrating environmental similarity theory with a backpropagation neural network (Environmental Similarity Model–BP Neural Network, ESM-BP). Taking the Baihetan reservoir area as the study region, the environmental similarity degrees between potential prediction points and historical landslide samples were calculated using eight environmental factors. A normal distribution approach was employed to classify similarity thresholds, thereby constructing an enhanced landslide sample dataset. The BP neural network model was subsequently applied for susceptibility assessment, with comparative validation against support vector machine (SVM) and random forest (RF) models. The experimental results demonstrate the following: (1) The integration of environmental similarity theory effectively expanded the dataset by 4398 samples with distinct susceptibility levels, resolving data scarcity issues and significantly enhancing the model’s generalization capabilities. (2) Among the three models tested with supplemented samples, the BP neural network performed best, improving the accuracy by 0.02 and 0.14 relative to SVM and RF, respectively, raising the Kappa coefficient by 0.02 and 0.18, and reducing the RMSE by 0.04 and 0.21. This methodology enhances the applicability and reliability of landslide susceptibility evaluation models in complex mountainous environments, providing innovative insights for related research in landslide susceptibility assessment.

Keywords:

environmental similarity; BP neural network; machine learning; environmental factors; assessment of landslide susceptibility

1. Introduction

Landslides are prevalent geological hazards globally, characterized by suddenness, destructiveness, and a high frequency, severely disrupting normal production, livelihoods, and ecological balance [1,2,3,4]. According to the Global Fatal Landslide Database (GFLD), over 4000 fatal landslide events occurred worldwide between 2004 and 2016, resulting in more than 50,000 fatalities [5,6,7]. China is also highly susceptible to landslide disasters [8]. From 2010 to 2019, nearly 90,000 landslides were recorded nationwide, causing over 8000 casualties and direct economic losses of approximately CNY 4.5 billion [9,10]. Landslide susceptibility evaluation enables the scientific prediction of landslide occurrence probabilities, providing critical support for disaster emergency responses and regional safety management [11,12,13,14]. However, challenges such as the complexity of geological surveys in mountainous areas and incomplete historical landslide data have led to insufficient landslide samples, limiting the accuracy of regional susceptibility evaluations and models’ predictive capabilities [15,16]. Therefore, developing accurate landslide susceptibility evaluation methods under data scarcity is essential for early risk warning and disaster mitigation in complex mountainous regions [17].

The current approaches to landslide susceptibility evaluation under sample scarcity fall into three categories: model optimization, sample augmentation, and transfer learning. The first category focuses on optimizing susceptibility evaluation models. Traditional methods such as logistic regression, frequency ratios, and artificial neural networks for susceptibility mapping heavily rely on large sample sizes, struggling to adapt to data-limited scenarios [18]. Researchers have thus explored high-generalization or low-sample-dependent models. For instance, H. A. Nefeslioglu [19] applied the Analytic Hierarchy Process (AHP) to establish a susceptibility evaluation framework for data-deficient regions, innovatively validating the applicability of traditional statistical models under sample scarcity. However, this approach overlooked the landslide sample characteristics, compromising the evaluation accuracy [20,21,22].

The second category employs sample augmentation strategies, such as remote sensing interpretation, to expand landslide databases. E. M. Van Den [23] integrated historical landslide news texts and Google Earth imagery features across Europe to construct a continental-scale landslide spatial database, significantly enhancing the susceptibility mapping accuracy through data fusion. J. Du [24] achieved intelligent landslide identification in Tibet’s Jilong Valley using remote sensing and machine learning, but their supervised learning method suffered from manual interpretation biases and lacked the quantitative characterization of the slope deformation dynamics, reducing the results’ reliability.

The third category involves transfer learning for cross-regional landslide prediction. Fu Zhiyong [25] improved the model performance in target regions by transferring landslide samples from source regions without extensive field data. Yuan Tian [26] proposed a transfer-learning-based slope displacement prediction method emphasizing slope gradient effects. However, in complex geological environments with diverse influencing factors, the predictive capacity of such methods diminishes significantly.

In summary, existing methods address landslide susceptibility evaluation under sample scarcity to some extent. However, in southwestern mountainous regions, complex environments hinder effective sample expansion via remote sensing or transfer learning, limiting model applicability [27]. Geographical environments critically influence landslide development: similar environmental configurations yield analogous landslide processes, with higher environmental similarity leading to closer geographical features [28,29,30]. Thus, environmental similarity can supplement existing landslide samples by identifying analogous points [31,32,33].

This study proposes an Environmental Similarity-Based Landslide Susceptibility Evaluation Method (ESM-BP), which calculates the environmental similarity between known samples and prediction points using multiple factors and integrates a BP neural network for susceptibility assessment in data-scarce mountainous regions. The results enhance the applicability and reliability of landslide prediction models in complex geological settings, offering both theoretical and practical significance.

2. Theoretical Methods

2.1. Environmental Similarity Method Process

Landslides occur due to the combined influence of multiple factors, including the geological conditions, the topography, and human activities [34]. The core of landslide susceptibility assessment lies in accurately identifying the key geographical and environmental characteristics that contribute to landslide occurrence [28]. This study employs the environmental similarity theory to quantify the similarity in environmental factors between prediction points and historical landslide samples. Specifically, historical landslide points are considered as reference samples, while environmental variables such as the elevation, slope, and NDVI are used to characterize the primary conditions associated with landslide occurrence. By applying the environmental similarity approach, a spatial comparison of the environmental factors between prediction points and landslide samples is conducted, measuring the degree of similarity between them. This similarity is further categorized into different susceptibility levels, which are then utilized as input parameters for the landslide susceptibility evaluation model, as illustrated in Figure 1.

2.2. Environmental Similarity Calculation Method

The environmental similarity assessment method relies on the geographic and environmental characteristics of past landslide occurrences to identify potential landslide-prone areas within the study region. This approach evaluates the resemblance between the environmental conditions of historical landslide sites and those of prediction points to estimate the probability of landslide occurrence. The calculation process primarily involves two key components: determining the similarity of individual environmental factors and integrating these values to obtain the overall environmental similarity.

The calculation of the similarity for individual environmental factors involves estimating the similarity between each environmental factor of the prediction points in the study area and the corresponding historical landslide sample points. In this study, a Gaussian similarity function is used to calculate the relationship between the two. The calculation formula is as follows:

W(x, y) = e^{-\frac{\|x - y\|^{2}}{2\sigma^{2}}}

(1)

In the formula, W(x,y) represents the similarity between the point to be inferred x and the historical landslide sample point y under a continuous variable environmental factor; ||x − y|| is the Euclidean distance between the point to be inferred and the historical landslide sample point. σ represents the standard deviation of the Gaussian distribution, which controls the rate of similarity decay. The larger the standard deviation, the slower the similarity decay, and vice versa. Using the Gaussian similarity function, the similarity between each point to be inferred and all the historical landslide sample points in each environmental factor is calculated.
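As a minimal sketch of Equation (1) for a single continuous factor, the Python snippet below evaluates the Gaussian similarity in its standard form, with 2σ² in the denominator, between one prediction point and several landslide samples. The function name `gaussian_similarity`, the example elevations, and the value σ = 200 m are illustrative assumptions and are not taken from this study.

```python
import math

def gaussian_similarity(x, y, sigma):
    """Gaussian similarity for a scalar factor: exp(-(x - y)^2 / (2 * sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

# Hypothetical elevations (m): one prediction point and three landslide samples.
prediction_elevation = 1500.0
landslide_elevations = [1500.0, 1700.0, 2100.0]
sigma = 200.0  # assumed decay scale, not a value reported in the paper

similarities = [gaussian_similarity(prediction_elevation, y, sigma)
                for y in landslide_elevations]
```

With these assumed inputs, the similarity equals 1 for the identical elevation and decays as the elevation difference grows. A larger σ would flatten this decay, as described above.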

For discrete environmental factors (e.g., slope aspect, land use), a simple binary matching method is used to calculate the similarity. Specifically, when the categories of the prediction point and the historical landslide sample point match, they are considered fully similar, with a similarity value of 1; when the categories do not match, the similarity value is set to 0. The specific formula is as follows:

W(x, y) = \begin{cases} 1, & x = y \\ 0, & x \neq y \end{cases}

(2)

The comprehensive similarity calculation seeks to further synthesize the similarity of individual environmental factors and calculate the overall similarity between each point to be inferred and the sample points of historical landslides. Since the occurrence of landslides is usually affected by the combined action of multiple environmental factors, this study considers the contributions of multiple environmental factors to the similarity. The specific calculation formula is as follows:

W_{i} = f (W_{i, 1}, W_{i, 2}, \dots, W_{i, l}, \dots, W_{i, n})

(3)

In the formula, W_{i,l} represents the similarity between the i-th point to be inferred and the l-th historical landslide sample point; W_i represents the comprehensive similarity of the i-th point to be inferred; and f is the integration function. In this study, the mean method was selected as the integration method, and the average similarity of all environmental factors was taken as the value of the comprehensive similarity.
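The integration step can be sketched as follows, combining the Gaussian similarity for continuous factors, binary matching for discrete factors, and the mean as the integration function f. The factor names, σ values, and sample records are hypothetical. The text does not state how similarities to several landslide samples are summarised, so taking the best match (`max`) is an assumption made only for this illustration.

```python
import math

def factor_similarity(a, b, sigma=None):
    """Gaussian similarity when sigma is given, binary matching otherwise."""
    if sigma is None:
        return 1.0 if a == b else 0.0
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def comprehensive_similarity(point, sample, sigmas):
    """Mean of the per-factor similarities (integration function f = mean).

    `sigmas` maps each factor name to a sigma (continuous factor)
    or to None (discrete factor).
    """
    values = [factor_similarity(point[name], sample[name], sigma)
              for name, sigma in sigmas.items()]
    return sum(values) / len(values)

# Hypothetical factor subset; the study itself uses eight factors.
sigmas = {"elevation": 200.0, "slope": 10.0, "land_use": None}
point = {"elevation": 1500.0, "slope": 25.0, "land_use": "forest"}
samples = [
    {"elevation": 1500.0, "slope": 25.0, "land_use": "forest"},
    {"elevation": 1700.0, "slope": 35.0, "land_use": "cropland"},
]

per_sample = [comprehensive_similarity(point, s, sigmas) for s in samples]
# Assumption: summarise against all landslide samples by the best match.
overall = max(per_sample)
```

An unweighted mean treats every factor as equally important. A weighted mean would be a natural variant if factor weights were available.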

2.3. Sample Dataset Construction Method

Threshold determination is a critical step in constructing a sample dataset to distinguish landslide susceptibility levels. To rationally define thresholds, this study employs a normal distribution model combined with the similarity values of known landslide samples for analysis. Specifically, based on environmental similarity theory, environmental similarity values are calculated for each pixel. By analyzing the distribution patterns of similarity values at corresponding pixels for historical landslide samples, different susceptibility level intervals are classified. Within each interval, sample expansion is conducted according to the environmental similarity values of the prediction points, generating additional samples representing distinct susceptibility levels to supplement the dataset. The normal distribution formula is expressed as

f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right)

(4)

where x denotes the environmental similarity value, μ represents the mean of the environmental similarity values, and σ is the standard deviation of the environmental similarity for historical landslide samples. During sample augmentation, σ governs the dispersion of the environmental similarity distribution, thereby influencing the range and density of the expanded samples. To ensure consistency in the environmental characteristics between augmented samples and historical landslide data, this study determines the σ value based on the distribution features of the environmental similarity from known landslide samples.

\sigma = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (S_{i} - \mu)^{2}}

(5)

In the formula, S_i represents the environmental similarity value of the i-th historical landslide sample point, and N denotes the total number of historical landslide samples. Additionally, data balance is carefully addressed during the dataset construction process. Due to significant disparities in the quantities of landslide samples across different risk levels, a balanced sample selection strategy is implemented under the guidance of environmental similarity theory to enhance the balance and representativeness of the dataset. This strategy ensures balanced sample quantities for each susceptibility level, thereby improving the model’s generalization capabilities.
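A short sketch of Equations (4) and (5) is given below. It estimates μ and σ from the similarity values of the landslide samples and evaluates the normal density. The six similarity values are invented for illustration, and `normal_pdf` is a helper name introduced here.

```python
import math
import statistics

def normal_pdf(x, mu, sigma):
    """Equation (4): density of a normal distribution at x."""
    return (math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
            / math.sqrt(2.0 * math.pi * sigma ** 2))

# Hypothetical similarity values S_i for N = 6 landslide samples.
landslide_similarities = [0.91, 0.93, 0.92, 0.94, 0.93, 0.95]

mu = statistics.fmean(landslide_similarities)
# Equation (5) divides by N, i.e., the population standard deviation.
sigma = statistics.pstdev(landslide_similarities, mu)
peak_density = normal_pdf(mu, mu, sigma)
```

Because Equation (5) divides by N rather than N − 1, `statistics.pstdev` is the matching call here. `statistics.stdev` would give a slightly larger value for small samples.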

2.4. BP Neural Network Model

The backpropagation (BP) neural network is a multilayer perceptron model based on artificial neural networks. The principle of BP is to iteratively train the network using gradient descent, continuously adjusting the weights in the network to minimize the mean square error between the computed results and the expected output and ultimately determining the weight values for each layer. BP neural networks have several advantages, including distributed information storage, the ability for large-scale parallel processing, adaptability, self-learning capabilities, and strong fault tolerance and robustness, making them highly flexible and stable in handling complex problems. Through the biomimetic neural network structure, BP neural networks can automatically learn complex nonlinear relationships, thereby classifying and predicting the landslide susceptibility.
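To make the training principle concrete, the following sketch implements a one-hidden-layer network trained by backpropagation with per-sample gradient descent on the squared error. It is a toy illustration only. The class name `TinyBPNetwork`, the layer sizes, the learning rate, and the four-point dataset are assumptions and do not reproduce the architecture or hyperparameters used in this study.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyBPNetwork:
    """One-hidden-layer perceptron trained by backpropagation."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, output

    def loss(self, inputs, targets):
        """Mean squared error over a dataset."""
        return sum((self.forward(x)[1] - t) ** 2
                   for x, t in zip(inputs, targets)) / len(inputs)

    def train_step(self, x, target, learning_rate):
        hidden, output = self.forward(x)
        # Error term at the output: derivative of 0.5 * (output - target)^2
        # with respect to the output pre-activation.
        delta_out = (output - target) * output * (1.0 - output)
        # Propagate the error back to the hidden units before any update.
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w_out, hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= learning_rate * delta_out * h
        self.b_out -= learning_rate * delta_out
        for j, delta in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= learning_rate * delta * xi
            self.b_hidden[j] -= learning_rate * delta

# Toy data: the label simply equals the first feature.
inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [0.0, 0.0, 1.0, 1.0]

network = TinyBPNetwork(n_inputs=2, n_hidden=3)
initial_loss = network.loss(inputs, targets)
for _ in range(2000):
    for x, t in zip(inputs, targets):
        network.train_step(x, t, learning_rate=0.5)
final_loss = network.loss(inputs, targets)
```

For a five-level susceptibility task, the single sigmoid output would be replaced by one output per class. The weight update rule would keep the same structure.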

2.5. Model Accuracy Verification Method

This study integrates environmental similarity theory with the BP neural network model for landslide susceptibility assessment. To comprehensively evaluate the model’s performance, both qualitative and quantitative analysis methods are employed. The qualitative analysis involves visualizing the spatial distribution of landslide susceptibility, comparing the susceptibility levels across different regions with actual landslide occurrences to intuitively demonstrate the model’s predictive accuracy. Meanwhile, the quantitative analysis applies statistical methods and model evaluation metrics to rigorously validate the prediction results, ensuring the model’s reliability and practical applicability.

In this research, to fully utilize landslide data while enhancing the model’s generalization ability, a hold-out validation approach is adopted, dividing the sample dataset into a training set and a test set at a 7:3 ratio. Several commonly used and representative evaluation metrics, including the accuracy, Kappa index, RMSE, and ROC curve, are selected to assess and validate the model’s predictive performance. Additionally, a t-test is conducted to determine whether the differences in classification performance among the models are statistically significant. The corresponding calculation formulas and definitions are provided below.
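The 7:3 division can be sketched with a seeded shuffle, as below. The helper name `split_70_30` and the seed are assumptions. The split is not stratified by susceptibility level, which a real experiment might prefer, and the 4500 integer identifiers merely stand in for the expanded sample set.

```python
import random

def split_70_30(samples, seed=42):
    """Shuffle a copy of the samples and cut it at 70% for training."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Placeholder identifiers standing in for the expanded sample dataset.
samples = list(range(4500))
train_set, test_set = split_70_30(samples)
```

Fixing the seed makes the partition reproducible, so that all compared models are trained and tested on the same samples.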

Accuracy: This is an important indicator in measuring the correctness of the model’s predictions. It represents the proportion of correctly predicted samples to the total number of samples.

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(6)

In this context, TP denotes true positives, TN represents true negatives, FP signifies false positives, and FN indicates false negatives. This metric offers a clear assessment of the model’s overall predictive performance.

Kappa coefficient (Kappa index): The Kappa coefficient is an indicator used to evaluate the consistency of the predictions produced by a classification model. It measures the agreement between the model’s predictions and the actual classes after correcting for the agreement expected by chance, avoiding evaluation based solely on the overall accuracy rate. The value range is −1 to 1.

\text{Kappa} = \frac{P_{o} - P_{e}}{1 - P_{e}}

(7)

P_o represents the observed agreement, i.e., the proportion of instances where the model’s predictions match the actual outcomes; P_e represents the expected agreement, which is the probability that two classification results would agree by chance. The Kappa coefficient is effective in mitigating misleading results caused by class imbalance and provides an objective evaluation of the model’s predictive power.
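The accuracy and Kappa definitions can be checked on a small multi-class example, sketched below. The label vectors are invented, with classes 0 to 4 standing in for the five susceptibility levels, and `accuracy_and_kappa` is a helper name introduced here. For multi-class labels, accuracy reduces to the proportion of matching predictions, which equals P_o.

```python
from collections import Counter

def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Kappa) for two equally long label sequences."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n  # P_o
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # P_e: chance agreement from the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return observed, (observed - expected) / (1.0 - expected)

# Hypothetical labels: eight of ten predictions are correct.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 3]

accuracy, kappa = accuracy_and_kappa(y_true, y_pred)
```

In this example, P_o = 0.8 and P_e = 0.2, giving a Kappa of 0.75, which is lower than the accuracy because part of the agreement is attributed to chance.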

Root mean squared error (RMSE): The RMSE is a commonly used evaluation metric to quantify the error between predicted and true values. In landslide susceptibility assessments, the prediction typically reflects the likelihood of landslide occurrence or susceptibility ranking. The RMSE evaluates the difference between the predicted outcomes and actual observations.

\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (\hat{y}_{i} - y_{i})^{2}}

(8)

In this formula, n signifies the sample size, specifically the total number of instances within the test set; \hat{y}_{i} corresponds to the predicted output of the i-th sample, derived from the model’s input-driven computations; and y_{i} represents the actual value for the i-th sample, i.e., the true observed result or label.
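Equation (8) translates directly into a few lines of Python. The sketch below treats the five susceptibility levels as the ordinal values 0 to 4, which is an assumption about how labels are encoded, and the label vectors are invented.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ordinal labels: two predictions are off by one level.
observed_levels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
predicted_levels = [0, 0, 1, 2, 2, 2, 3, 3, 4, 3]

error = rmse(observed_levels, predicted_levels)
```

Unlike accuracy, the RMSE penalises a prediction that misses by several levels more heavily than one that misses by a single level.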

The t-test is a statistical method for hypothesis testing, primarily used to determine whether there is a significant difference between the means of two groups of data. The formula is expressed as

t = \frac{\bar{D}}{s_{D} / \sqrt{n}}

(9)

where \bar{D} denotes the mean of the paired differences, s_{D} represents the standard deviation of the paired differences, and n indicates the sample size of the paired data.
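A minimal sketch of the paired t statistic in Equation (9) follows. The two accuracy lists are invented scores for two models over five hypothetical repeated runs. The sample standard deviation (dividing by n − 1) is assumed for s_D, as is conventional for this test. Converting t into a p-value requires the Student t distribution with n − 1 degrees of freedom, which is not part of the Python standard library and is therefore omitted.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """t = mean(D) / (s_D / sqrt(n)) for paired differences D = a - b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired observations")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.fmean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(len(differences)))

# Hypothetical accuracies of two models over five paired runs.
model_a = [0.90, 0.91, 0.89, 0.92, 0.90]
model_b = [0.88, 0.90, 0.86, 0.89, 0.89]

t_value = paired_t_statistic(model_a, model_b)
```

Here the mean difference is 0.02 and s_D is 0.01, so t is approximately 4.47. A larger absolute t indicates stronger evidence that the two models differ.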

3. Overview of the Study Area and Data Preprocessing

3.1. Study Area

The Baihetan Hydropower Station, situated in the lower reaches of the Jinsha River at the junction of Sichuan and Yunnan in China, is currently the world’s second-largest hydropower station. The reservoir’s catchment area encompasses approximately 24,200 km², with a total storage capacity reaching 20.627 billion m³. This region experiences a subtropical plateau monsoon climate, characterized by distinct wet and dry seasons with synchronous rainfall and temperature patterns. Rainfall infiltration softens the rock and soil masses, increases the pore water pressure, and reduces the shear strength, making the area susceptible to landslides. Furthermore, active faults such as the Zemuhe and Xiaojiang fault zones traverse the area, resulting in fractured and loosened rock masses that further diminish the slope stability. The landscape is primarily shaped by fluvial erosion, tectonic activity, and glacial processes, featuring deeply incised river valleys and severe weathering and denudation. These geological and hydrometeorological conditions contribute to the frequent occurrence of landslide disasters in the region. This study focuses on the Baihetan reservoir area, covering a total area of 4649.68 km², as illustrated in Figure 2.

3.2. Data Preprocessing

3.2.1. Historical Landslide Data

The primary sources of historical landslide data include field measurements and manual visual interpretation. Using the high-resolution satellite imagery provided by the Google Earth platform, combined with landslide morphological characteristics, surrounding vegetation types, geological environments, and topographical conditions, the manual identification and preliminary delineation of landslide areas were conducted. This approach offers a high level of flexibility. To ensure the accuracy and completeness of the landslide data, the interpreted results were compared and verified against existing geological survey data, further refining and validating the available landslide data. In addition, the relevant geological literature [35,36] was referenced to enhance the accuracy and comprehensiveness of the dataset. In this study, a total of 102 landslide sample points were collected in the Baihetan reservoir area [37].

3.2.2. Environmental Factor Data

The formation of landslide hazards typically results from interactions among multiple geographical environmental factors. These factors not only directly influence the slope stability but also determine the scale and spatial distribution characteristics of landslide events. Based on a comprehensive analysis of the topography, geological conditions, and prior research findings in the Baihetan reservoir area, this study quantified the weights of 11 environmental factors—rainfall, elevation, slope, aspect, curvature, lithology, NDVI, distance to roads, distance to rivers, distance to faults, and land use type—through feature importance analysis combined with existing landslide data. To enhance the experimental accuracy, redundant factors with low weights (curvature, rainfall, and lithology) were excluded. Eight factors were ultimately selected as core indicators for landslide susceptibility evaluation, namely the elevation, slope, aspect, NDVI, distance to roads, distance to rivers, distance to faults, and land use type [38], as summarized in Table 1.

Among these, the elevation directly affects the potential energy and internal stress of slopes, serving as a foundational condition for landslide initiation. The slope is a critical determinant of the landslide likelihood, with specific ranges significantly increasing the susceptibility. Aspect reflects the slope’s orientation, influencing its stability. The Normalized Difference Vegetation Index (NDVI), a widely used metric for vegetation coverage, correlates with the slope stability, as plant root systems reinforce soil and mitigate erosion. The distances to roads, rivers, and faults quantify the impacts of human engineering activities, hydrological processes, and tectonic zones on the landslide probability. The land use type, as a comprehensive environmental factor, further reflects the potential influence of human activities on landslide occurrence.

The Sentinel-2 imagery data used in this study were sourced from the European Space Agency (ESA) and processed using the ArcGIS (v10.8) software to generate terrain-related environmental factors such as the elevation, slope, and aspect. Data on roads, rivers, land use, and faults were obtained from OpenStreetMap (OSM) and the National Earthquake Science Data Center, with spatial analysis used to derive the distances to rivers, roads, and faults. All data were standardized to a 30 m spatial resolution [39], as illustrated in Figure 3.

4. Results Analysis

4.1. Sample Expansion Based on Environmental Similarity

The analysis of landslide sample point similarity reveals that there are no sample points with a similarity value below 0.80. In the range of 0.80 to 0.90, there are relatively few landslide sample points, and their distribution is scattered. This indicates that the sample points in this range are somewhat incidental and do not represent the general occurrence of landslides, thus lacking high representativeness in susceptibility assessment. Between 0.92 and 0.94, the number of landslide sample points increases sharply, reaching its maximum value within this range. The dense distribution of sample points in this range suggests that the similarity values effectively reflect high-risk areas for landslide occurrences, as illustrated in Figure 4.

The thresholds are defined as follows. Similarity values between 0 and 0.80 define the very low-susceptibility area, where the similarity is low, the sample points are sparse, and the probability of landslides is very low. Values between 0.80 and 0.90 define the low-susceptibility area, where a small number of sample points exist, indicating a low probability of landslides but still some potential risk. Values between 0.90 and 0.92 define the medium-susceptibility area, where the number of landslide points gradually increases and the distribution trend is obvious, indicating a moderate probability of landslides. Values between 0.92 and 0.94 define the high-susceptibility area, where the landslide sample points are dense and reach a maximum, clearly reflecting that this area is highly susceptible to landslides. Values between 0.94 and 1 define the extremely high-susceptibility area, where the similarity is the highest and the potential risk of landslides is the greatest. Through this threshold classification, landslide susceptibility is categorized into five levels. Equal quantities of augmented samples are generated for each susceptibility class to ensure dataset balance. Consequently, the original 102 historical landslide points are expanded to a dataset of 4500 samples.
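The threshold classification and the balanced selection can be sketched as follows. The thresholds come from the text above. Treating each lower bound as inclusive, the candidate similarity values, the per-level quota, and the helper names are assumptions made for illustration.

```python
import bisect
import random

THRESHOLDS = [0.80, 0.90, 0.92, 0.94]
LEVELS = ["very low", "low", "medium", "high", "extremely high"]

def susceptibility_level(similarity):
    """Map a similarity value in [0, 1] to one of the five levels.

    Assumption: each interval includes its lower bound, so 0.80 is "low".
    """
    return LEVELS[bisect.bisect_right(THRESHOLDS, similarity)]

def balanced_sample(similarities, per_level, seed=0):
    """Pick up to `per_level` candidate indices from each level."""
    rng = random.Random(seed)
    by_level = {level: [] for level in LEVELS}
    for index, value in enumerate(similarities):
        by_level[susceptibility_level(value)].append(index)
    return {level: rng.sample(indices, min(per_level, len(indices)))
            for level, indices in by_level.items()}

# Hypothetical similarity values of candidate prediction points.
candidates = [0.10, 0.50, 0.79, 0.81, 0.85, 0.89, 0.905, 0.91,
              0.915, 0.925, 0.93, 0.935, 0.95, 0.97, 0.99]

selected = balanced_sample(candidates, per_level=2)
```

Drawing the same number of points from every level keeps the expanded dataset balanced, even though far more pixels fall into the lower similarity ranges.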

4.2. Comparative Analysis of Landslide Susceptibility Models

This study compares three mainstream landslide susceptibility assessment methods: BPNN, SVM, and RF. The landslide susceptibility levels are categorized into five classes: extremely low, low, moderate, high, and extremely high. Initially, the models were trained and tested using existing historical landslide samples, resulting in landslide susceptibility maps (Figure 5) based on these three models.

Due to the insufficient amount of landslide sample data, the model parameters could not be fully optimized, which made it difficult to accurately capture the relationships between landslide features and environmental factors, resulting in underfitting. Moreover, the data exhibited biases and imbalances, which reduced the model’s prediction accuracy and limited its predictive capacity. Additionally, when the sample sizes are small, the model becomes more sensitive to outliers, leading to unstable predictions and bias.

To demonstrate the applicability and effectiveness of the proposed method, an experimental comparison was performed. In this experiment, the sample dataset constructed using environmental similarity was used for model training and landslide susceptibility evaluation, with the results shown in Figure 6.

The comparative analysis of the landslide susceptibility evaluation methods reveals that the BPNN model achieves superior sensitivity and classification accuracy in specific regions, delivering a more refined spatial representation of the landslide risk distribution. Relative to the SVM and RF models, the BPNN framework exhibits enhanced capabilities in deciphering non-linear relationships among sample features. Under complex geological conditions, this method not only maintains robust consistency in susceptibility classification but also demonstrates exceptional precision in localizing high-risk zones. Furthermore, it effectively elucidates intrinsic correlations between the landslide dynamics and environmental drivers, yielding reliable susceptibility predictions even in data-deficient regions. This represents a pragmatic and scalable technical paradigm for regional landslide risk management. This study systematically evaluates the performance of the BPNN, SVM, and RF models in landslide susceptibility mapping. Key accuracy metrics are summarized as follows.

An analysis of Figure 7 reveals that the BPNN demonstrates superior performance, significantly outperforming the other models. This method achieves high classification accuracy and consistency, with lower RMSE values indicating stronger discriminative capabilities in distinguishing landslide susceptibility levels and the more precise capture of environmental factor interactions. In comparison, the SVM also exhibits robust performance, reflecting reliable classification consistency. Overall, the BPNN excels across all three metrics—accuracy, the Kappa coefficient, and the RMSE—while the RF model underperforms relative to the others. Although the SVM delivers commendable results, it remains inferior to the BPNN.

As illustrated in Figure 8, the ROC curves demonstrate that the BPNN (AUC = 0.85) achieves the best performance, with its curve positioned closer to the top-left corner, indicating a higher true positive rate (TPR) at lower false positive rates (FPR). The SVM (AUC = 0.80) slightly outperforms the RF model (AUC = 0.78). Our comprehensive analysis confirms that the BPNN is better suited for landslide susceptibility evaluation in this study.

A t-test was conducted to compare the models’ performance. Table 2 reveals that all p-values are below 0.05, confirming the statistically significant differences in performance among the RF, SVM, and BPNN models. This underscores the substantial disparities in their classification efficacy.

4.3. Landslide Susceptibility Evaluation Based on the ESM-BP Method

The landslide susceptibility of the study area was evaluated using a BPNN model combined with sample data supplemented by environmental similarity (ESM-BP). The landslide susceptibility evaluation results are shown in Figure 9.

The experimental findings demonstrate strong alignment between the susceptibility predictions generated by the ESM-BP framework and the spatial distribution of historical landslide occurrences. Notably, elevated landslide susceptibility is concentrated along riverbank zones, with the majority of documented landslides clustering within high-susceptibility areas identified by the model. Furthermore, the framework has identified previously unrecorded high-susceptibility zones in data-scarce regions, underscoring its capability to uncover latent risks. The quantitative analysis of the susceptibility class proportions across the study area reveals the pronounced spatial stratification of the landslide probability, highlighting regions requiring prioritized risk mitigation measures.

According to Table 3, extremely low- and low-susceptibility areas occupy the vast majority of the study area, at more than 70%. This indicates that most areas have a low risk of landslides and are less likely to experience landslides. In addition, about 25% of the study area (including moderately susceptible, highly susceptible, and extremely susceptible areas) is at risk of landslides to varying degrees.

An analysis of the spatial distribution of the landslide susceptibility across distinct environmental factors (Figure 10) reveals heterogeneous susceptibility patterns governed by specific geoenvironmental conditions. Notably, a pronounced correlation exists between the slope gradient and susceptibility levels. Within the gentle slope range of 0–10°, very low susceptibility dominates, covering 423.73 km², which underscores the minimal landslide probability in low-relief terrain. As the slope steepens, the susceptibility escalates progressively: high-susceptibility areas peak at 259.47 km² in the 20–30° range, the largest spatial extent for this category. Similarly, moderate- (204.49 km²) and high-susceptibility (171.71 km²) zones prevail in the 30–40° range, in line with empirical observations that steeper slopes constitute the primary landslide-prone regions.
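Area figures of this kind are typically obtained by cross-tabulating raster cells by factor class and susceptibility grade. The sketch below illustrates the idea in plain Python; the slope bin edges, the 30 m cell size, the sample cells, and the function names are all assumptions made for illustration and are not taken from the study's processing chain.

```python
from collections import defaultdict

# Assumed slope bins in degrees (upper bound exclusive); the last bin is open-ended.
SLOPE_EDGES = [0, 10, 20, 30, 40]


def slope_bin(slope_deg):
    """Return a label such as '20-30' or '>=40' for a slope value in degrees."""
    for lower, upper in zip(SLOPE_EDGES, SLOPE_EDGES[1:]):
        if lower <= slope_deg < upper:
            return f"{lower}-{upper}"
    return f">={SLOPE_EDGES[-1]}"


def area_by_bin_and_grade(cells, cell_area_km2):
    """Sum the area of raster cells for each (slope bin, susceptibility grade)."""
    table = defaultdict(float)
    for slope_deg, grade in cells:
        table[(slope_bin(slope_deg), grade)] += cell_area_km2
    return dict(table)


# Hypothetical cells: (slope in degrees, susceptibility grade).
cells = [
    (4.0, "Extremely low"),
    (8.5, "Extremely low"),
    (24.0, "High"),
    (27.5, "High"),
    (26.0, "Middle"),
    (35.0, "Middle"),
    (41.0, "High"),
]
# Assumed 30 m x 30 m cells, i.e. 0.0009 km² each.
table = area_by_bin_and_grade(cells, cell_area_km2=0.0009)
for key in sorted(table):
    print(key, round(table[key], 4))
```

The same tabulation applies to the other factors in Figure 10 by swapping the binning function, for example binning by NDVI or by distance to rivers.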

Furthermore, high-propensity zones cluster near the tributaries of the reservoir’s hydrological network. Empirical evidence suggests that reservoir-impoundment-induced hydrogeological alterations, coupled with bank erosion and abrupt vegetation dynamics, destabilize the slope equilibrium, amplifying the landslide risks. The susceptibility is further modulated by the elevation, aspect, vegetation density, and anthropogenic activity. Southern and southwestern slopes exhibit elevated landslide probabilities, while cultivated and urbanized areas demonstrate reduced risks. Unutilized lands, however, display heightened susceptibility. Forested regions, characterized by robust vegetation (NDVI > 0.6), exhibit the largest very low-susceptibility area (1761.7 km²), attributed to root reinforcement enhancing soil cohesion. Nevertheless, localized high-susceptibility patches persist due to steep gradients and unfavorable geological substrates.

The elevational analysis identifies 1000–2500 m as a high-risk altitudinal band, with moderate-to-high-susceptibility zones predominating. With respect to vegetation, low NDVI values (sparse vegetation) correlate strongly with elevated landslide likelihoods, whereas denser vegetation coverage (NDVI > 0.4) progressively mitigates the risk, culminating in minimal susceptibility in well-vegetated areas.

5. Discussion and Conclusions

5.1. Discussion

This study establishes a landslide susceptibility evaluation model based on environmental similarity theory. The results reveal significant spatial overlap between high-similarity zones and historically documented landslide areas, demonstrating that the environmental similarity method effectively captures critical geoenvironmental features influencing landslide occurrence. Even under conditions of limited sample data, this approach provides reliable scientific support for landslide risk assessment. However, the proposed methodology remains in the preliminary exploration phase and requires extensive further research for refinement, particularly in the following three aspects.

(1): Environmental Similarity Methodology: By integrating multiple environmental factors (e.g., elevation, slope, NDVI), this method quantifies the similarity between evaluation points and historical landslide samples, offering a novel framework for susceptibility evaluation. While it reduces the reliance on large sample datasets and shows promising applicability in this study, the current experimental results lack an in-depth analysis of the model’s predictive mechanisms and systematic comparisons with alternative sample augmentation or transfer learning approaches. A core assumption of the method is the consistency between historical landslide samples and the current environmental conditions. However, extreme climate events (e.g., extreme rainfall, prolonged droughts) can trigger non-stationary geological responses, significantly diminishing the predictive power of conventional environmental factors. Additionally, region-specific environmental factors must be carefully selected based on the local characteristics. Future work should rigorously validate the method’s applicability under diverse extreme climate scenarios and complex geological settings, while exploring optimized factor selection strategies through multidimensional experiments and comparative analyses to comprehensively elucidate its strengths and limitations.
(2): Threshold Determination for Susceptibility Classification: The reasonable classification of landslide susceptibility levels and the establishment of an effective sample dataset are crucial in improving the model’s performance in landslide susceptibility assessment. Existing threshold classification methods often rely on fixed thresholds or manually defined standards, which lack universality and fail to comprehensively reflect the geological environment and landslide mechanisms across different regions. In this study, a normal distribution method was employed for threshold classification, providing a scientifically grounded representation of the spatial distribution patterns of landslides within the study area. This approach outperforms traditional empirical methods. However, due to variations in the geological characteristics and landslide mechanisms across different regions, the current threshold classification method still has certain limitations. Future research could further optimize threshold classification by integrating multidimensional environmental factors and regional differences. Additionally, exploring adaptive algorithms for dynamic threshold adjustment mechanisms could enhance the method’s adaptability to different geographical conditions and temporal variations [40].
(3): Data Limitations: Although this study demonstrates the effectiveness of the ESM-BP method in landslide susceptibility assessment, certain data-related limitations require further exploration. First, historical landslide data were validated and refined by integrating field surveys, manual visual interpretation, and existing geological survey data. However, these methods are inherently susceptible to human subjectivity and interpretation biases, potentially increasing the data uncertainty. Second, although the environmental similarity method has been employed to expand the limited sample set, the scarcity of high-quality historical data may constrain the accuracy of similarity calculations within the study area. Future research could explore the integration of multi-source datasets or the application of transfer learning to further enrich the original landslide dataset.
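Returning to points (1) and (2), the general idea of scoring a candidate point against historical landslides and then splitting the scores with distribution-based cut points can be sketched as follows. This is not the formulation used in this study: the Gaussian-shaped per-factor similarity, the weighted average, the maximum over samples, the spread values, and the mean ± one standard deviation rule are all assumptions chosen for illustration. The three weights are borrowed from Table 1, and every sample value is hypothetical.

```python
import math
import statistics


def factor_similarity(value, reference, spread):
    """Gaussian-shaped similarity in (0, 1]; equals 1 when value == reference."""
    return math.exp(-((value - reference) ** 2) / (2.0 * spread ** 2))


def point_similarity(point, landslides, weights, spreads):
    """Weighted per-factor similarity to the most similar historical landslide."""
    total_weight = sum(weights.values())
    best = 0.0
    for sample in landslides:
        score = sum(
            weights[name] * factor_similarity(point[name], sample[name], spreads[name])
            for name in weights
        ) / total_weight
        best = max(best, score)
    return best


def classify_by_normal_cut_points(similarities):
    """Label each value relative to mean - sd and mean + sd (assumed rule)."""
    mean = statistics.mean(similarities)
    sd = statistics.pstdev(similarities)
    labels = []
    for s in similarities:
        if s >= mean + sd:
            labels.append("high")
        elif s <= mean - sd:
            labels.append("low")
        else:
            labels.append("moderate")
    return labels


# Hypothetical inputs: three factors only, with illustrative spreads.
weights = {"elevation": 0.19, "slope": 0.14, "ndvi": 0.08}
spreads = {"elevation": 300.0, "slope": 8.0, "ndvi": 0.15}
landslides = [
    {"elevation": 900.0, "slope": 28.0, "ndvi": 0.25},
    {"elevation": 1200.0, "slope": 33.0, "ndvi": 0.30},
]
candidates = [
    {"elevation": 900.0, "slope": 28.0, "ndvi": 0.25},   # identical to a landslide
    {"elevation": 1500.0, "slope": 20.0, "ndvi": 0.45},
    {"elevation": 2600.0, "slope": 5.0, "ndvi": 0.80},   # very different setting
    {"elevation": 1100.0, "slope": 30.0, "ndvi": 0.35},
]
sims = [point_similarity(p, landslides, weights, spreads) for p in candidates]
print([round(s, 3) for s in sims])

# Cut points applied to a separate, hypothetical list of similarity scores.
scores = [0.05, 0.30, 0.45, 0.50, 0.55, 0.60, 0.70, 0.95]
labels = classify_by_normal_cut_points(scores)
print(labels)
```

Because the cut points are derived from the score distribution itself, they shift with the study area, which is the property that fixed or manually defined thresholds lack.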

5.2. Conclusions

This study develops an integrated landslide susceptibility assessment framework (ESM-BP) by coupling environmental similarity theory with a BPNN to mitigate challenges arising from limited sample availability in susceptibility modeling. Specifically, leveraging the principles of environmental similarity, the method quantifies the similarity between prediction points and historical landslide samples, employs a normal-distribution-based threshold strategy to augment landslide datasets, and integrates a BPNN for susceptibility evaluation. Comparative validation against SVM and RF models demonstrates that the ESM-BP framework effectively addresses the data scarcity constraints inherent to conventional susceptibility assessments reliant on sparse historical samples. The experimental results reveal the following.

(1): By introducing the theory of environmental similarity, the original 102 historical landslide points were expanded to 4500 samples with different susceptibility levels, an increase of 4398 samples.
(2): With the supplemented samples, the BP neural network model achieved the highest accuracy among the three models, with an accuracy of 0.93, a Kappa coefficient of 0.91, and an RMSE of 0.25. Compared with SVM, the accuracy increased by 0.02, the Kappa coefficient increased by 0.02, and the RMSE decreased by 0.04; compared with RF, the accuracy increased by 0.14, the Kappa coefficient increased by 0.18, and the RMSE decreased by 0.21.
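For readers who want to reproduce metrics of this kind, the sketch below computes accuracy, Cohen's Kappa, and RMSE from predicted and reference class labels. It uses the standard textbook definitions; whether the study computed RMSE on class labels or on continuous outputs is not stated here, so treating the labels as numbers is an assumption, and the label vectors are hypothetical.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(
        true_counts[c] * pred_counts[c] for c in set(y_true) | set(y_pred)
    ) / (n * n)
    return (observed - expected) / (1.0 - expected)


def rmse(y_true, y_pred):
    """Root mean square error between numeric class labels."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical class labels (0 = lowest susceptibility, 2 = highest); not study data.
y_true = [0, 0, 1, 1, 2, 2, 2, 0, 1, 2]
y_pred = [0, 0, 1, 2, 2, 2, 1, 0, 1, 2]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
err = rmse(y_true, y_pred)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.3f}, RMSE = {err:.3f}")
```

Kappa is lower than accuracy in this toy case because part of the raw agreement would be expected by chance given the class frequencies.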

This study addressed the difficulty of predicting landslide susceptibility under conditions of sample scarcity by combining a BP neural network with an environmental similarity metric. Future research can further optimize the threshold division method by incorporating richer environmental data and dynamic climate information to improve the evaluation accuracy. In summary, this study not only demonstrates methodological innovation for landslide susceptibility evaluation but also provides new ideas for the prediction and management of other geological hazards and risks.

Author Contributions

Conceptualization, R.W., W.X., R.C. and Y.M.; methodology, R.W.; software, Y.Z.; validation, R.W. and W.X.; formal analysis, R.W.; investigation, R.W.; resources, R.W.; data curation, R.W.; writing—original draft preparation, R.W.; writing—review and editing, W.X., G.H., Z.Y., K.Y. and D.Z.; visualization, R.W.; supervision, W.X.; project administration, W.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Scientific and Technological Projects of Yunnan Province: Research on Key Technologies of Ecological Environment Monitoring and Intelligent Management of Natural Resources in Yunnan (No. 202202AD080010), the Basic Research Plan Outstanding Youth Fund Project of Yunnan Province (Grant No. 202401AV070010), the National Natural Science Foundation of China (Grant No. 41861134008), the Muhammad Asif Khan Academician Workstation of Yunnan Province (Grant No. 202105AF150076), the Key R&D Program of Yunnan Province (Grant No. 202003AC100002), the Guizhou Scientific and Technology Fund (Grant No. QKHJ-ZK (2023) YB 193), the Yunnan Province Innovation Team Project (202305AS350003), the Guizhou Provincial Basic Research Program (Natural Science) (No. MS013), and the Yunnan Science and Technology Plan Key Project (202401AS070638).

Data Availability Statement

The original data presented in the study are openly available from the Copernicus Data Space Ecosystem (https://dataspace.copernicus.eu/), OpenStreetMap (https://www.openstreetmap.org/), the National Earthquake Data Center (https://data.earthquake.cn/), and the National Tibetan Plateau/Third Pole Environment Data Center (https://data.tpdc.ac.cn/).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Huang, Y.; Zhao, L. Review on landslide susceptibility mapping using support vector machines. Catena 2018, 165, 520–529.
2. Ado, M.; Amitab, K.; Maji, A.K.; Jasińska, E.; Gono, R.; Leonowicz, Z.; Jasiński, M. Landslide susceptibility mapping using machine learning: A literature survey. Remote Sens. 2022, 14, 3029.
3. Pradhan, B.; Dikshit, A.; Lee, S.; Kim, H. An explainable AI (XAI) model for landslide susceptibility modeling. Appl. Soft Comput. 2023, 142, 110324.
4. Di Napoli, M.; Carotenuto, F.; Cevasco, A.; Confuorto, P.; Di Martire, D.; Firpo, M.; Pepe, G.; Raso, E.; Calcaterra, D. Machine learning ensemble modelling as a tool to improve landslide susceptibility mapping reliability. Landslides 2020, 17, 1897–1914.
5. Dai, F.; Lee, C.F.; Ngai, Y.Y. Landslide risk assessment and management: An overview. Eng. Geol. 2002, 64, 65–87.
6. Froude, M.J.; Petley, D.N. Global fatal landslide occurrence from 2004 to 2016. Nat. Hazards Earth Syst. Sci. 2018, 18, 2161–2181.
7. Petley, D. Global patterns of loss of life from landslides. Geology 2012, 40, 927–930.
8. Liu, X.; Miao, C. Large-scale assessment of landslide hazard, vulnerability and risk in China. Geomat. Nat. Hazards Risk 2018, 9, 1037–1052.
9. Wang, C.; Lin, Q.; Wang, L.; Jiang, T.; Su, B.; Wang, Y.; Mondal, S.K.; Huang, J.; Wang, Y. The influences of the spatial extent selection for non-landslide samples on statistical-based landslide susceptibility modelling: A case study of Anhui Province in China. Nat. Hazards 2022, 112, 1967–1988.
10. Lin, Q.; Wang, Y. Spatial and temporal analysis of a fatal landslide inventory in China from 1950 to 2016. Landslides 2018, 15, 2357–2372.
11. Ullah, I.; Aslam, B.; Shah, S.H.I.A.; Tariq, A.; Qin, S.; Majeed, M.; Havenith, H.-B. An integrated approach of machine learning, remote sensing, and GIS data for the landslide susceptibility mapping. Land 2022, 11, 1265.
12. Liao, M.; Wen, H.; Yang, L. Identifying the essential conditioning factors of landslide susceptibility models under different grid resolutions using hybrid machine learning: A case of Wushan and Wuxi counties, China. Catena 2022, 217, 106428.
13. Zeng, T.; Jin, B.; Glade, T.; Xie, Y.; Li, Y.; Zhu, Y.; Yin, K. Assessing the imperative of conditioning factor grading in machine learning-based landslide susceptibility modeling: A critical inquiry. Catena 2024, 236, 107732.
14. Keefer, D.K.; Larsen, M.C. Assessing landslide hazards. Science 2007, 316, 1136–1138.
15. Conoscenti, C.; Rotigliano, E.; Cama, M.; Caraballo-Arias, N.A.; Lombardo, L.; Agnesi, V. Exploring the effect of absence selection on landslide susceptibility models: A case study in Sicily, Italy. Geomorphology 2016, 261, 222–235.
16. Hong, H.; Miao, Y.; Liu, J.; Zhu, A.-X. Exploring the effects of the design and quantity of absence data on the performance of random forest-based landslide susceptibility mapping. Catena 2019, 176, 45–64.
17. Lin, Q.; Lima, P.; Steger, S.; Glade, T.; Jiang, T.; Zhang, J.; Liu, T.; Wang, Y. National-scale data-driven rainfall induced landslide susceptibility mapping for China by accounting for incomplete landslide data. Geosci. Front. 2021, 12, 101248.
18. Wang, L.-J.; Guo, M.; Sawada, K.; Lin, J.; Zhang, J. A comparative study of landslide susceptibility maps using logistic regression, frequency ratio, decision tree, weights of evidence and artificial neural network. Geosci. J. 2016, 20, 117–136.
19. Nefeslioglu, H.A.; Sezer, E.A.; Gokceoglu, C.; Ayas, Z. A modified analytical hierarchy process (M-AHP) approach for decision support systems in natural hazard assessments. Comput. Geosci. 2013, 59, 1–8.
20. Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth-Sci. Rev. 2018, 180, 60–91.
21. Kawabata, D.; Bandibas, J. Landslide susceptibility mapping using geological data, a DEM from ASTER images and an artificial neural network (ANN). Geomorphology 2009, 113, 97–109.
22. Nanehkaran, Y.A.; Chen, B.; Cemiloglu, A.; Chen, J.; Anwar, S.; Azarafza, M.; Derakhshani, R. Riverside landslide susceptibility overview: Leveraging artificial neural networks and machine learning in accordance with the United Nations (UN) Sustainable Development Goals. Water 2023, 15, 2707.
23. Van Den Eeckhaut, M.; Hervás, J.; Jaedicke, C.; Malet, J.-P.; Montanarella, L.; Nadim, F. Statistical modelling of Europe-wide landslide susceptibility using limited landslide inventory data. Landslides 2012, 9, 357–369.
24. Du, J.; Glade, T.; Woldai, T.; Chai, B.; Zeng, B. Landslide susceptibility assessment based on an incomplete landslide inventory in the Jilong Valley, Tibet, Chinese Himalayas. Eng. Geol. 2020, 270, 105572.
25. Fu, Z.; Li, C.; Yao, W. Landslide susceptibility assessment through TrAdaBoost transfer learning models using two landslide inventories. Catena 2023, 222, 106799.
26. Tian, Y.; Deng, Y.-L.; Zhang, M.-Z.; Pang, X.; Ma, R.-P.; Zhang, J.-X. Short-term displacement prediction for newly established monitoring slopes based on transfer learning. China Geol. 2024, 7, 351–364.
27. Ran, P.; Li, S.; Zhuo, G.; Wang, X.; Meng, M.; Liu, L.; Chen, Y.; Huang, H.; Ye, Y.; Lei, X. Early identification and influencing factors analysis of active landslides in mountainous areas of southwest China using SBAS−InSAR. Sustainability 2023, 15, 4366.
28. Zhu, A.X.; Lu, G.; Liu, J.; Qin, C.Z.; Zhou, C. Spatial prediction based on Third Law of Geography. Ann. GIS 2018, 24, 225–240.
29. Zhu, A.-X.; Miao, Y.; Liu, J.; Bai, S.; Zeng, C.; Ma, T.; Hong, H. A similarity-based approach to sampling absence data for landslide susceptibility mapping using data-driven methods. Catena 2019, 183, 104188.
30. Chen, W.; Peng, J.; Hong, H.; Shahabi, H.; Pradhan, B.; Liu, J.; Zhu, A.-X.; Pei, X.; Duan, Z. Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China. Sci. Total Environ. 2018, 626, 1121–1135.
31. Conforti, M.; Borrelli, L.; Cofone, G.; Gullà, G. Exploring performance and robustness of shallow landslide susceptibility modeling at regional scale using different training and testing sets. Environ. Earth Sci. 2023, 82, 161.
32. Yi, Y.; Zhang, Z.; Zhang, W.; Jia, H.; Zhang, J. Landslide susceptibility mapping using multiscale sampling strategy and convolutional neural network: A case study in Jiuzhaigou region. Catena 2020, 195, 104851.
33. Habumugisha, J.M.; Chen, N.; Rahman, M.; Islam, M.M.; Ahmad, H.; Elbeltagi, A.; Sharma, G.; Liza, S.N.; Dewan, A. Landslide susceptibility mapping with deep learning algorithms. Sustainability 2022, 14, 1734.
34. Yang, Z.; Zhao, Q.; Gan, J.; Zhang, J.; Chen, M.; Zhu, Y. Damage evolution characteristics of siliceous slate with varying initial water content during freeze-thaw cycles. Sci. Total Environ. 2024, 950, 175200.
35. Li, L.; Xu, C.; Yao, X.; Shao, B.; Ouyang, J.; Zhang, Z.; Huang, Y. Large-scale landslides around the reservoir area of Baihetan hydropower station in southwest China: Analysis of the spatial distribution. Nat. Hazards Res. 2022, 2, 218–229.
36. Li, Z.-H.; Shi, A.-C.; Xiao, H.-X.; Niu, Z.-H.; Jiang, N.; Li, H.-B.; Hu, Y.-X. Robust landslide recognition using UAV datasets: A case study in Baihetan reservoir. Remote Sens. 2024, 16, 2558.
37. Harp, E.L.; Keefer, D.K.; Sato, H.P.; Yagi, H. Landslide inventories: The essential part of seismic landslide hazard analyses. Eng. Geol. 2011, 122, 9–21.
38. Zhu, Y.; Qiu, H.; Liu, Z.; Ye, B.; Tang, B.; Li, Y.; Kamp, U. Rainfall and water level fluctuations dominated the landslide deformation at Baihetan reservoir, China. J. Hydrol. 2024, 642, 131871.
39. Qiu, H.; Zhu, Y.; Zhou, W.; Sun, H.; He, J.; Liu, Z. Influence of DEM resolution on landslide simulation performance based on the Scoops3D model. Geomat. Nat. Hazards Risk 2022, 13, 1663–1681.
40. Wang, Q.; Li, W.; Chen, W.; Bai, H. GIS-based assessment of landslide susceptibility using certainty factor and index of entropy models for the Qianyang County of Baoji City, China. J. Earth Syst. Sci. 2015, 124, 1399–1415.

Figure 1. Basic idea.

Figure 2. Overview of the study area.

Figure 3. Environmental factor data preprocessing. (a) Elevation; (b) Slope; (c) Aspect; (d) NDVI; (e) Distance to road; (f) Distance to fault; (g) Distance to river; (h) Land use.

Figure 4. Sample similarity distribution.

Figure 5. Landslide susceptibility evaluation based on historical landslide samples. (a) BPNN; (b) RF; (c) SVM.

Figure 6. Landslide susceptibility evaluation based on environmental similarity supplementary samples. (a) BPNN; (b) RF; (c) SVM.

Figure 7. Model accuracy evaluation.

Figure 8. ROC curves.

Figure 9. Landslide susceptibility evaluation results based on ESM-BP.

Figure 10. Area distribution of landslide susceptibility grades in different environmental factors. (a) Elevation; (b) Slope; (c) Aspect; (d) Distance to road; (e) Distance to river; (f) Distance to fault; (g) NDVI; (h) Land use.

Table 1. Environmental factor data.

Category	Factor	Weight	Data Source
Topography	Elevation	0.19	https://dataspace.copernicus.eu/
Topography	Slope	0.14
Topography	Aspect	0.12
Topography	Curvature	0.03
Vegetation cover	NDVI	0.08
Human activity	Distance to roads	0.11	https://www.openstreetmap.org/
Human activity	Land use	0.09
Hydrology	Distance to rivers	0.10
Geology	Distance to faults	0.12	https://data.earthquake.cn/
Geology	Lithology	0.01	https://data.earthquake.cn/
Meteorology	Rainfall	0.01	https://data.tpdc.ac.cn/

Table 2. t-test results.

Model Comparison	p-Value
RF vs. SVM	0.00067
RF vs. BPNN	0.0109
SVM vs. BPNN	0.0036

Table 3. Analysis of the proportion of landslide susceptibility grades by area.

Total Area: 4649.68 km²
Grade	Area/km²	Percentage
Extremely low	2931.93	63.06%
Low	533.28	11.47%
Middle	562.84	12.11%
High	587.52	12.64%
Extremely high	34.11	0.73%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
