1. Introduction
Landslides are highly destructive natural disasters that seriously affect the safety of human life and property, as well as social and economic development [1,2]. Landslides account for nearly 9% of natural disasters worldwide [3], and China is one of the countries most extensively and severely affected, with landslides occurring in mountainous and hilly regions across the country. According to the State Statistical Bureau (http://www.stats.gov.cn/, accessed on 1 January 2023), there were 332,715 geological disasters in China from 2004 to 2021, including 237,487 landslides, resulting in more than 24,000 casualties and direct economic losses of approximately USD 1.5 billion. It is therefore essential to identify areas that are potentially susceptible to landslides [4].
Landslide susceptibility mapping (LSM) quantitatively predicts the spatial distribution of landslide susceptibility in a region by combining regional topography, geological structure, hydro-meteorology and other characteristics, and is of great significance to landslide prevention and management and to urban planning [5,6]. Earlier studies on LSM mainly adopted statistical models (e.g., entropy [7], frequency ratio [8] and linear regression [9]). Statistical models based on historical landslides and geographical conditions have the advantage of being quantitative and objective in their analysis. However, traditional statistical models are weak at explaining the complex, non-linear relationship between landslides and their conditioning factors, and the selection of factor weights is subjective, which makes it difficult to handle high-dimensional, large data sets; their accuracy also leaves room for improvement.
With the development of data mining technology, machine learning methods such as support vector machine (SVM) [10,11], decision tree (DT) [12,13] and random forest (RF) [14,15] have been increasingly used in landslide susceptibility mapping. The choice of evaluation method is crucial in landslide susceptibility mapping and directly affects the generalization ability and transparency of the model. Machine learning methods explore the complex relationship between landslides and their conditioning factors based on historical landslide data, and offer high evaluation accuracy, strong generalization performance and less over-fitting. Pradhan [16] compared three relatively new approaches, namely SVM, DT and the adaptive neuro-fuzzy inference system (ANFIS), for predicting landslides in the Penang Mountain area, Malaysia, and demonstrated the significant advantage of machine learning models in reflecting the relationship between environmental factors and landslide susceptibility. Most scholars have studied the basic conditioning factors and decision mechanisms of landslides using different machine learning models or hybrid machine learning models [17,18]. Liao et al. [19] discussed the effect of hybrid machine learning models on landslide susceptibility evaluation at different grid resolutions, aiming to identify the underlying conditioning factors of landslides and improve the predictive capability of landslide susceptibility evaluation models. In recent years, gradient boosting has attracted wide attention for its excellent prediction capability and stability. In particular, XGBoost, LightGBM and CatBoost are increasingly being used in landslide research. Xu et al. [20] proposed a superposition concept and an ensemble learning technique for eight types of machine learning model. The optimized models were found to outperform ordinary regression models, and the ensemble learning models were combined and applied to landslide prediction, improving the robustness and generalization ability of the machine learning models. Combining XGBoost and LightGBM, Zhang et al. [21] developed an incident reliability analysis method and used it to analyze the Bazimen Landslide in the Three Gorges Reservoir Area. The model enables the efficient and accurate evaluation of time-variant failure probabilities in practical applications and reduces the computational cost of performing extensive deterministic analyses.
While machine learning models can improve evaluation accuracy and outperform traditional models, the “black-box” character of these algorithms reduces their transparency and credibility, because their outputs cannot readily be explained to users [22]. This greatly limits their application in high-risk fields such as geological disaster prediction, autonomous driving and medical diagnosis, and has severely hindered the development of machine learning in a number of areas. In recent years, post hoc interpretation algorithms [23] have provided new directions for the interpretation of black-box models. These algorithms build explanatory models that describe the working mechanism and decision-making behavior of a learned model, supporting fairer and more robust decisions while ensuring the causality of model inference [24]. Commonly used post hoc machine learning interpretation algorithms include SHapley Additive exPlanations (SHAP) [25], partial dependence plots (PDP) [26,27], local interpretable model-agnostic explanations (LIME) [28], global surrogate models [29] and Dalex [30]. These analysis tools are developing rapidly, and SHAP in particular is gaining popularity because it is simple to operate and comprehensive in content. It has been used with good results in other research areas, but post hoc interpretation algorithms are rarely used in the LSM field. Shaker [31] developed an explainable Alzheimer's disease (AD) diagnosis and progression detection model using random forest to forecast the early diagnosis and incidence of AD within three years, and applied the SHAP algorithm to provide global and instance-level explanations that enhance the credibility of the random forest model. Zhou [22] presented an interpretable combined model based on SHAP and XGBoost that can provide a scientific basis for landslide hazard assessment and can be used as a comprehensive evaluation framework for landslide susceptibility. Omer [32] proposed a new hierarchical binary prediction framework for the susceptibility assessment of geological and hydrological disasters, such as floods and landslides, using a combined ERT and PSO classification scheme interpreted with the SHAP algorithm.
Geomorphic features are an important component of the earth surface system. The geomorphology classification system fundamentally classifies the basic elements of geomorphology (relief, slope, elevation, etc.), constituent materials (bedrock, unconsolidated sediments), the forces of genesis and the landform formation environment based on geomorphic genesis [33,34]. The various internal and external forces that shape landforms, together with their magmatic differentiation, lead to the multiple superposition of modern geomorphic entities, resulting in geomorphic diversity and variability. The study of the genesis of different landform types is therefore one of the fundamental and central elements of geographic research. However, previous studies of landslide susceptibility have generally treated the entire research area as a single unit, an approach that is effective and accurate mainly for areas with simple, uniform landform environments. Most research areas contain subareas with complex landform environments and topographies, and these subareas have different geological structures and environmental conditions even when the same evaluation factors are applied. In addition, the degree to which the evaluation factors influence landslides may differ between subareas depending on the landform type, which reduces the reliability and authenticity of the evaluation results. Hence, it is necessary to evaluate the landslide susceptibility of different areas in a larger study area by zoning blocks according to their landform type, to assess and interpret the contribution of conditioning factors, and to investigate the variation in the intrinsic factors that affect landslide occurrence under the conditions of different landform types.
Chongqing, located in southwestern China, is one of the most landslide-prone areas in China due to its complex geological and geomorphological environment. For this reason, this paper takes two areas of Chongqing with widely varying landform and geological conditions, namely the corrosion layered high and middle mountain region (Zone I), and the middle mountainous region of strong karst gorges (Zone II), as the research area in order to explore the internal cause mechanisms and spatial distribution of the different landform types that affect landslide occurrence. The objectives of this paper were as follows: (1) To use the Bayesian optimization algorithm to optimize the parameters of LightGBM and XGBoost, select the optimal hyperparameters for training in order to construct a landslide susceptibility evaluation model for Zones I and II, and to evaluate the accuracy of the model in both zones. Then, to select the model with the higher accuracy to construct landslide susceptibility mappings. (2) To test the accuracy of the prediction model and the difference in performance between the two algorithms using collinearity analysis and McNemar’s Test. (3) To interpret the prediction results based on the SHAP algorithm’s factor importance ranking and single-factor dependency plots in order to explore the factors inherent to the different landform conditions that influence landslides. (4) To sample individual landslides from Zones I and II separately, and apply local and individual interpretation to them based on a summary plot of SHAP values.
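Objective (2) relies on McNemar's Test to compare the two classifiers on the same test samples. The following is a minimal sketch of that test, assuming the continuity-corrected chi-square form; the function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the paper's data.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-square test for two classifiers.

    Only the discordant samples matter:
      b = samples model A classifies correctly and model B misclassifies
      c = samples model A misclassifies and model B classifies correctly
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The two models never disagree, so there is no evidence of a difference.
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical toy data: 20 test cells, model A is always right,
# model B misclassifies 8 of them.
y_true = [1] * 10 + [0] * 10
pred_a = list(y_true)
pred_b = [0] * 4 + [1] * 6 + [1] * 4 + [0] * 6

statistic, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

A small p-value suggests that the two models' error patterns differ by more than chance; it does not by itself say which model is better, which is why it is read alongside accuracy and AUC.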
4. Discussion
4.1. LightGBM-SHAP Hybrid Model
This paper first uses a Bayesian optimization algorithm to optimize the model parameters and applies the optimized hyperparameters to two different machine learning algorithms to investigate landslide susceptibility in areas with different landform types.
Figures 8 and 9 and Tables 7 and 8 demonstrate that both the LightGBM and XGBoost algorithms had strong predictive power. The test-set accuracies of LightGBM were 0.068 and 0.055 higher than those of XGBoost in Zones I and II, respectively, which shows that LightGBM outperformed XGBoost in prediction. For this reason, the LightGBM algorithm was employed for training in this paper. Its landslide susceptibility results for both zones were reasonable and consistent with the distribution of historical landslide cases, indicating the stronger generalization capability of the LightGBM algorithm. Verified with these practical examples, LightGBM, a gradient boosting framework developed after XGBoost with a focus on efficiency, was more accurate, handled large-scale data and required less memory, which agrees well with previous research results.
The application of machine learning algorithms in the field of landslides is now relatively mature, but most research has been limited to comparing the prediction performance of different models. Sahin [64] constructed GBM, XGBoost and RF models to map landslide susceptibility and compared them, finding that XGBoost had the best predictive power. However, this is far from sufficient, as researchers should aim for high predictive power together with comprehensive explanatory performance in order to make research more rational and transparent. At the same time, few scholars have assessed landslide susceptibility based on different landform types or investigated their intrinsic causal mechanisms, which could lead to inaccurate evaluation results for research areas with complex landform conditions.
While the LightGBM model achieved favorable prediction results in this study, the intrinsic prediction mechanism of the machine learning model was difficult to interpret and explain, which reduces the credibility and transparency of the model and seriously hinders the use of machine learning in the field of landslide susceptibility. Therefore, by combining the LightGBM and SHAP algorithms, this paper proposed an interpretable machine learning system. The model was constructed on the basis of different geomorphological types, and the model output was interpreted both globally and locally in order to explore the relevance of each factor to the occurrence of landslides, the contribution of each conditioning factor to the evaluation result of each evaluation unit in the research area, and the changes in the dominant factors leading to landslides under different geomorphological conditions. The SHAP waterfall plot was also used to analyze the causes and predict the risks of a randomly selected landslide sample and to comprehensively investigate the decision-making mechanism of the model.
4.2. Global Explanation
The summary plot provides a comprehensive interpretation of sample prediction results. It gives an intuitive sense of how features affect the overall predicted value, while visualizing the importance of the features and clearly showing the distribution of SHAP values for each feature [65,66].
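To make concrete what a SHAP value is, the sketch below computes exact Shapley values for one sample by enumerating every coalition of features. It is an illustration only: the toy model, the function name `shapley_values` and the choice to replace absent features with a fixed baseline are assumptions made here, whereas SHAP applied to tree models such as LightGBM uses a far more efficient tree-specific algorithm and averages over background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for the single sample x.

    Assumption for this sketch: a feature that is absent from a coalition
    takes its value from `baseline` (a fixed reference sample).
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model with one additive term and one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions always sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that the summary and waterfall plots rely on. Here the interaction between the second and third features is split equally between them.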
In this study, the importance of each factor was calculated and ranked from a global perspective based on the SHAP algorithm (Figures 11, 12, 13 and 14). The Zone I factors were ranked by importance in descending order as follows: elevation, distance from rivers, distance from roads, surface cutting depth, land use, average annual rainfall, relief, curvature, distance from faults, aspect, TWI, POI, lithology, slope, NDVI and TRI. The Zone II factors were ranked by importance as follows: elevation, average annual rainfall, distance from roads, land use, distance from faults, surface cutting depth, distance from rivers, TWI, NDVI, relief, POI, curvature, lithology, aspect, slope and TRI. Combining these SHAP results, we concluded that elevation, surface cutting depth, land use, average annual rainfall and distance from roads had a significant influence on the occurrence of landslides in both Zones I and II, and that these five factors remained important even in areas with different geomorphological types. The difference was that the distance from rivers ranked second in importance in Zone I, whereas the distance from faults ranked fifth in importance in Zone II. This section analyzes, in turn, the dominant factors common to both zones and those specific to each zone.
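A global ranking of this kind is commonly obtained by averaging the absolute SHAP value of each factor over all evaluation units. The sketch below assumes that convention; the function name `rank_factors`, the three factor names kept for brevity and the SHAP numbers are all hypothetical and do not reproduce the paper's results.

```python
def rank_factors(shap_matrix, names):
    """Rank factors by mean absolute SHAP value, largest first.

    shap_matrix[i][j] is the SHAP value of factor j for evaluation unit i.
    """
    n_units = len(shap_matrix)
    mean_abs = [
        sum(abs(row[j]) for row in shap_matrix) / n_units
        for j in range(len(names))
    ]
    return sorted(zip(names, mean_abs), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three evaluation units and three factors.
names = ["elevation", "slope", "distance_from_roads"]
shap_matrix = [
    [0.40, -0.05, 0.20],
    [-0.50, 0.02, -0.10],
    [0.30, 0.01, 0.25],
]

for name, score in rank_factors(shap_matrix, names):
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: a factor that strongly promotes landslides in some units and strongly inhibits them in others would otherwise average out to roughly zero and look unimportant.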
4.2.1. Common Dominant Factors
Elevation controls the local vegetation type and cover, and reflects the intensity of human activity [67]. Elevation ranked first in importance in both zones. It was negatively correlated with landslide occurrence in Zones I and II, with landslides being inhibited at high elevations and promoted at low elevations. The reason is that Zone I is a layered mountainous and hilly landform shaped by internal uplift, with steep slopes, crisscrossing valleys and loose soil, while Zone II is characterized by the coexistence of middle and low canyon landforms and reverse landforms. Loose sediments were likely to accumulate in the low-altitude areas of the two zones, close to the water system. There were also extensive human engineering activities in these areas (such as excavation of the slope toe, deforestation, sewage infiltration and excessive exploitation of groundwater), which reduced slope stability and greatly increased the probability of landslide occurrence. In high-altitude areas with high vegetation cover, low human activity and a high soil and water conservation capacity, landslides were less likely to occur.
The surface cutting depth, unlike relief, targets small localized areas, using the proximity of the watershed to the nearest valley channel as an indicator. It is an important reference for studying the development of soil erosion and surface erosion because it reflects the valley depth, the relative elevation difference within the zone and surface fragmentation in the vertical direction. At the same time, the degree of landscape fragmentation varied with geological formation and lithology, which affected regional erosion and ecology, and thus the occurrence of landslides. As shown in Figures 13d and 14f, this factor played an important role in the landslides in Zones I and II. Zone I presents alternating high and low terrain formed by the crisscrossing of deep valleys and middle mountains and by the structural form of Daba Mountain. This landform is mainly present in layered high-medium mountainous areas. For mountainous landforms, the external geomorphological process is mostly erosion and denudation. Therefore, the surface cutting depth plays a promoting role in the low-value area. Zone II is high in the northwest and low in the southeast. The river flows from west to east, and the terrain is deeply cut. There are many valleys and mountains, mainly of middle and low elevation, with obvious moderate cutting. It is a typical moderately to deeply cut middle-mountain terrain with little flat area. The greater the surface cutting depth, the more strongly landslide occurrence is promoted.
The average annual rainfall is obtained from long-term observation, and its influence on slope stability is related to permeability, hydrophilicity and the initial water content before precipitation. It has a great effect on the local surface runoff level and groundwater flow. Rain infiltrates and erodes the slope and scours the rock and soil on its surface, which increases the pore water pressure, softens the rock and soil and reduces the bearing capacity of the slope; it also affects the development of vegetation, thus promoting or inducing the occurrence of landslides. However, as can be seen from Figures 13f and 14b, the same amount of precipitation had very different effects on landslides in the two zones. There was a positive correlation between precipitation and landslide occurrence in Zone I, i.e., a promoting effect on landslides in areas with high precipitation and an inhibitory effect in areas with low precipitation. In contrast, precipitation in Zone II was negatively correlated with landslide occurrence, acting as an inhibitor in areas with high precipitation and promoting landslides in areas with low precipitation. This is because Zone I is predominantly a stratified mountainous landscape with an undulating topography in a humid zone, with high precipitation and strong flowing water. The surface soil type is mainly lime (rock) soil, and the area is mostly slightly eroded. An increase in rainfall aggravates slope erosion and reduces slope stability. As a result, when the average annual rainfall in Zone I increases, the potential for landslide hazards increases. Zone II is a mountainous and hilly landform with high mountains, steep slopes, undulating terrain and abundant rainfall. Its surface soil type is mainly yellow brown soil with severe soil erosion, so landslides can occur at lower rainfall levels. Although high rainfall aggravates erosion, it also carries away the loose material on the surface of the slope, making the slope more compact and stable and thus reducing the possibility of landslide occurrence. In addition, the multi-year average rainfall used in this study mainly reflects an indirect influence on landslide occurrence through its long-term effect on vegetation development, soil moisture content, surface erosion and other aspects of the hazard-forming environment. There may be a potential correlation between historical precipitation and landslide development or landslide incubation processes.
Compared with direct landslide triggering factors, such as ‘24 h cumulative rainfall (daily rainfall)’, ‘effective rainfall in the early stage (such as 10 days before the landslide)’ and ‘rainfall duration’, the average annual rainfall does not directly trigger landslides, but can be one of the underlying factors conditioning their formation. Because the magnitude and spatial distribution of the average annual rainfall differ greatly between Zone I and Zone II, and the geomorphology of the two areas is very different, the average annual rainfall has a significantly different effect on the occurrence of landslides in the two zones.
Human activities that violate natural laws and destabilize slopes can trigger landslides. Engineering construction, slope excavation and filling change the sliding force of a slope. Surface loading increases the weight of the slope, while excavation of the slope foot creates a free surface; either can lead to the revival of old landslides, slope instability or the intensification of natural landslides, and thus to large-scale landslides. Examples include excavating the foot of a slope, constructing railways and roads, and building houses on the mountain. Vigorous blasting and forced excavation during construction can cause slumps in the lower part of the slope due to the loss of support, followed by landslides on the side slopes, endangering road construction operations. Accordingly, as can be seen from Figures 13c and 14c, this factor contributed to the occurrence of landslides when the distance from roads was within 500 m, and its contribution diminished only as the distance increased.
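A threshold such as the 500 m distance can be read off a single-factor dependence plot by averaging SHAP values within distance bins and looking for the bin where the mean changes sign. The sketch below illustrates that reading; the function name `binned_mean_shap`, the bin edges and all numbers are hypothetical and are not the values behind Figures 13c and 14c.

```python
def binned_mean_shap(feature_values, shap_values, edges):
    """Mean SHAP value in each half-open bin [edges[i], edges[i + 1]).

    Returns None for a bin that contains no samples.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for value, shap_value in zip(feature_values, shap_values):
        for i in range(n_bins):
            if edges[i] <= value < edges[i + 1]:
                sums[i] += shap_value
                counts[i] += 1
                break
    return [sums[i] / counts[i] if counts[i] else None for i in range(n_bins)]


# Hypothetical samples: distance from roads (m) and the SHAP value of that factor.
distances = [50, 200, 450, 600, 1200, 2500]
shap_values = [0.30, 0.22, 0.05, -0.04, -0.15, -0.20]
edges = [0, 500, 1000, 3000]

means = binned_mean_shap(distances, shap_values, edges)
for low, high, mean in zip(edges[:-1], edges[1:], means):
    print(f"{low}-{high} m: mean SHAP = {mean:+.3f}")
```

A positive bin mean indicates that the factor pushes predictions towards landslide occurrence in that distance range, and a negative mean indicates the opposite.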
Different land use practices affect the inherent mechanisms of landslides differently, and with a growing population and economic development in both urban and rural areas, human engineering activities have altered the original balance of ancient landslides. The reactivation of ancient landslides and slope instability as a result of irrational land use is frequent, as in the revival of the Maliuzui landslide in the Banan District of Chongqing and the Daheba ancient landslide in the Wanzhou District of Chongqing. Figures 13e and 14d show that forest, shrub, grassland and cultivated land had an inhibitory effect on landslides because forest–shrub–grassland conserves soil and water. Land use mainly affects the critical strength at which landslides are induced: vegetation enhances the surface strength of the soil, and root systems help to fix it during landslides. Cultivated paddy fields, by contrast, can reduce the anti-sliding force of slopes. Urban land, however, plays a significant role in promoting landslides. Cities not only gather large numbers of people; their construction works are also the most numerous and have the greatest impact on landslides. Landslides are more densely distributed in urban areas than in other types of areas with the same geological topography. Urban land use practices, such as landfill, surface loading, the unreasonable discharge of sewage and unreasonable industrial mining that forms goafs, have an important impact on landslides. In addition, steel, concrete and other hard materials are used in urban construction. Engineering excavation exposes slopes and enhances landslide susceptibility in the case of weak rock or soil slopes. In summary, the impact of the land use type on landslide development is fundamentally different from that of geomorphological factors.
4.2.2. Comparison of Individual Dominant Factors
As can be seen from Figures 13b and 14e, the distinctive dominant factors in Zone I and Zone II are the distance from rivers and the distance from faults, respectively.
Distance from rivers: River erosion is one of the common factors affecting the formation of landslides. Erosion by flowing water steepens the valley bank, destabilizing and even destroying the slope, while undercutting the slope toe and the sliding face, which leads to soil sliding and the collapse of the bank slope. The pattern and degree of river erosion change as riverbank erosion evolves. In the karst landform of Zone I, flowing water acts on limestone. Under long-term river erosion (river incision and lateral erosion), the rock and soil accumulated at the foot of the bank slope were removed, weakening their support of the sliding body and reducing stability. For example, seven tributaries, including the Daning River and Baolong River, are strongly incised in the north-south direction.
Distance from faults: Faults are among the most important structures in the crust. Generally speaking, geological faults cause a large number of landslides, and structural faults usually reduce the strength of the surrounding rock [68,69]. The more developed the faults that cut and separate a slope are, the greater the density and scale of landslides. The geological structure of Zone II is the Sichuan–Hunan–Guizhou uplift fold belt in southeastern Chongqing, part of the neo-cathaysian tectonic system. The Shapley values indicate that landslides become more likely as the distance from faults increases, owing to the more developed gullies and deeper cutting in the north and south of the landslide-prone mountains. The geological structure, including faults and folds, has a great impact on the formation of landslides. In general, the rock mass near fold cores and fault zones is broken, landslides are well developed, and the crustal stability of active tectonic sections is poor. In particular, landslides are densely developed along recently and strongly active fault zones.
4.3. Local Explanation
Taking the Jinjiling landslide in Wushan County (Zone I) and the Jiweishan landslide in Wulong County (Zone II) as case studies, this paper analyzed the decision-making process of the model based on the waterfall plots generated by SHAP, which provided a local explanation of the causes of individual landslides.
The average annual rainfall, distance from roads, TWI, elevation and surface cutting depth were the dominant factors affecting the occurrence of the Jinjiling landslide in Zone I. Combined with the field analysis results, we found that the landslide was surrounded by armchair-shaped terrain, with multiple gullies gathering in the Longdong Gully, so that surface water and groundwater converged into the landslide area, creating favorable conditions for landslide occurrence. Due to tectonic compression, the rock in the landslide area was extremely fragmented, which made it easy for rainwater and groundwater to infiltrate and for groundwater to accumulate in the landslide area. At the same time, the shallow layer of the landslide consisted mainly of accumulated broken stone soil, which had a loose structure and good permeability and allowed the rapid infiltration of atmospheric rainfall and surface water. In addition, the landslide deformed strongly after severe rainfall. The continuous and concentrated rainfall increased the self-weight of the slide, and the soil was pushed towards the central front and flowed under its own weight. Furthermore, the front of the landslide had been excavated during road construction, creating an excavated face that provided the conditions for the landslide to shear out. In summary, the topography, geological structure and stratigraphic lithology collectively provided the physical foundation and prerequisite for the formation of the landslide, while the interaction between human engineering activities (landfill projects) and heavy rainfall was the key factor contributing to the deformation of the Jinjiling landslide.
The Jiweishan landslide in Zone II illustrates a unique mechanism of slope failure [63], with the dominant factors for its occurrence being the distance from roads, aspect (slope orientation), surface cutting depth and relief. Given the complex local terrain and dense vegetation, the geological conditions for the formation of this landslide were hidden and complex, usually manifesting as lateral landslip. The Jiweishan landslide was the result of the joint action of weak strata, the controlling geological structure, underground mining and karstification. Jiweishan is an inclined thick limestone hill structure of a kind that is widely distributed in the main limestone hill areas (Chongqing, Sichuan, Hubei, etc.). Since the 1920s, there have been continuous mining activities on the hill, and a hollow area of more than 5 × 10⁴ m² had been formed in the lower part of the slope, which had a certain influence on the deformation and stress adjustment of the hill. In areas with steep slopes, the shear strength and stability of the slopes were even weaker, and a large number of sloping hills were characterized by instability. Therefore, the formation of a large-scale dangerous rock mass on the Jiweishan landslide was mainly attributed to the high and steep terrain and to large-area iron ore mining under the hill. With the intensification of human activities, especially at lower elevations and closer to roads, the eco-environment on the ground surface was increasingly damaged, all of which contributed to the occurrence of landslides.
The waterfall plot helps us to explore the internal occurrence mechanism of individual landslides in a comprehensive and clear manner, and is highly practical in terms of providing a basis for disaster management authorities to make decisions.
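The arithmetic behind a waterfall plot is simple: starting from the model's base value, the SHAP contribution of each factor is added in turn until the prediction for that one sample is reached. The sketch below reproduces this accumulation; the function name `waterfall_steps`, the base value and the contributions are hypothetical, are expressed in arbitrary model output units, and are not the values obtained for the Jinjiling or Jiweishan landslides.

```python
def waterfall_steps(base_value, contributions):
    """Accumulate SHAP contributions for one sample, largest magnitude first.

    Returns a list of (factor, contribution, running_total) tuples; the last
    running_total equals the model output for the sample.
    """
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    total = base_value
    steps = []
    for name, phi in ordered:
        total += phi
        steps.append((name, phi, total))
    return steps


# Hypothetical base value and SHAP contributions for a single landslide sample.
base_value = -0.4
contributions = {
    "TWI": 0.3,
    "average_annual_rainfall": 0.9,
    "aspect": -0.2,
    "distance_from_roads": 0.6,
}

for name, phi, total in waterfall_steps(base_value, contributions):
    print(f"{name:>24s}: {phi:+.2f} -> {total:+.2f}")
```

Positive contributions push the sample towards the landslide class and negative ones away from it, so the ordered list shows at a glance which factors dominated the prediction for that specific slope.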
4.4. Future Work
Notably, the SHAP summary plots showed that the surface cutting depth had high contribution values in the landslide susceptibility zoning of both zones, further suggesting that it plays an important role in the occurrence of landslides. However, this factor has rarely been selected as a conditioning factor in previous studies. The incision density, incision depth and surface fluctuation degree are three single-factor indicators that can be combined to quantify the degree of surface fragmentation. In this paper, only the incision depth (surface cutting depth) and the surface fluctuation degree (relief) were selected, in order to examine the factors influencing the landslide mechanism from the macroscopic perspective of the vertical dimension and regional topographic features. It is therefore advised that the surface incision density be included as a conditioning factor in future studies on landslide susceptibility.
5. Conclusions
In this study, 16 landslide susceptibility conditioning factors were extracted from multiple sources of data, such as satellite imagery, geological data and hydro-meteorological data, in areas with distinct topographic and geological features. Negative (non-landslide) samples were randomly selected at a 1:1 ratio to the historical landslide samples. A comprehensive and interpretable landslide susceptibility evaluation framework for two mountain landforms, based on the SHAP-LightGBM algorithm, was constructed in order to compare and explore the variation and spatial distribution of the dominant factors that induce landslides under different geomorphological conditions, and to explore the internal decision-making mechanism behind the landslide susceptibility results produced by machine learning algorithms. The aim of this study was to improve the scientific accuracy and transparency of zoning results and to minimize the influence of different geomorphological conditions on the results of landslide evaluation. This paper provides a reference for interpretable research on landslide hazard management and machine learning in two distinct areas: the corrosion layered high-middle mountain region and the middle mountainous region of strong karst gorges. By selecting two typical landform type areas in Chongqing as the research area, this paper assessed the feasibility, interpretability and prediction accuracy of the proposed model. The conclusions are as follows.
1. The AUC values of the Bayesian-optimized LightGBM and XGBoost models are 0.9649 and 0.9292 for Zone I, and 0.9920 and 0.9773 for Zone II, respectively. Most of the area on the landslide susceptibility map falls in the very low and low susceptibility zones, while the high and very high susceptibility zones contain the majority of the historical landslides in the research area, with the landslide density increasing gradually from the very low to the very high susceptibility zone. Both algorithms are accurate in predicting landslide occurrence in both zones, and the LightGBM model built with the optimized hyperparameters has the higher evaluation accuracy, which further validates the excellent prediction performance of the algorithm with real cases. The LSM results based on the LightGBM algorithm are realistic, reliable and scientific, and therefore have great application prospects.
2. Elevation, surface cutting depth, land use, distance from roads and average annual rainfall are the dominant factors common to the two landform types. The distance from rivers is more relevant in the corrosion layered high-middle mountain region (Zone I), while the distance from faults has a stronger influence on the distribution of landslides in the typical low-medium mountain, multi-gorge region (Zone II).
3. The single-factor dependence plots generated by the SHAP algorithm quantify the contribution of individual factors to the evaluation results of the model for individual evaluation units. In addition, analyzing the degree to which each factor influenced individual landslides, together with their genesis, takes into account the uniqueness and diversity of individual landslide causes. The integrated SHAP-LightGBM interpretation model, proposed to evaluate the causes of a single landslide occurrence and to predict its risk, is of great value in the field of landslide susceptibility prediction.
4. SHAP is an effective algorithm for interpreting landslide susceptibility assessment results. The integrated interpretation framework based on the SHAP-LightGBM model can measure the importance and interaction of factors at both global and local levels. This enables scholars to comprehensively and explicitly understand and analyze the distribution characteristics of each factor during modelling and the occurrence pattern of landslide hazards, improves the credibility of machine learning algorithms, and provides a reference for research on the interpretability of machine learning. It is believed that SHAP and other XAI analysis tools will become an integral part of later research on machine learning systems.
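The AUC values reported in conclusion 1 can be read as the probability that a randomly chosen landslide unit receives a higher susceptibility score than a randomly chosen non-landslide unit. The sketch below computes AUC directly from that definition; the function name `roc_auc`, the labels and the scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical test set: 1 = landslide unit, 0 = non-landslide unit.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

print(f"AUC = {roc_auc(labels, scores):.4f}")
```

In this toy case eight of the nine landslide/non-landslide pairs are ordered correctly, so values close to 1, such as those reported for both zones, indicate that almost every pair is ranked correctly.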