1. Introduction
Landslides are defined as the downward movement of soil, rock, and organic materials under the influence of gravity. They typically occur when the shear strength of slope materials is exceeded by shear stress, often due to external triggers such as intense rainfall, seismic activity, or human disturbances [1]. Globally, landslides pose a significant threat to both human life and infrastructure. Between 1995 and 2014, over 3876 documented landslide events led to approximately 11,689 injuries and 163,658 fatalities. In 2014 alone, at least 174 landslides occurred worldwide, resulting in severe human and environmental losses [2].
Regional statistics also highlight the devastating impacts of landslides. In the United Kingdom, a prolonged period of above-average rainfall from April to December 2012, one of the wettest periods in the country’s meteorological history, triggered a marked increase in landslide events, as recorded in the National Landslide Database (NLD) maintained by the British Geological Survey (BGS) [3]. Similarly, across Europe from 1995 to 2014, 476 landslides were reported, causing 1370 deaths and 784 injuries. The most affected countries during this period included Turkey (335 deaths), Italy (283), Russia (169), and Portugal (91) [4].
Landslide countermeasures are actions taken to prevent slope instability or to reduce its impacts. These measures can be broadly classified into two categories: structural and non-structural. To implement the most appropriate countermeasure, it is crucial to first identify the specific type and mechanism of the landslide [5].
Landslide classification varies from country to country. In Japan, the term “landslide” encompasses three main phenomena: slope failure (also known as cliff failure), landslides, and debris flows. Collectively, these are referred to as sediment disasters [6]. Although rockfall is not classified as a type of landslide, it is included in inventories under a separate category.
In Sri Lanka, landslides are classified into four types: slope failure, slides, debris flows, and rockfalls. The definitions of these classes are quite similar in Japan and Sri Lanka. Although landslides had previously been categorized in Sri Lanka, a more detailed classification system was introduced to the inventory in 2018 with the support of JICA. This updated classification is referred to as the recently modified inventory in this paper. Past disaster records are essential for designing any kind of countermeasure. The National Building Research Organisation (NBRO) in Sri Lanka has collected and managed a large number of landslide records. Because those records were stored on paper, without disaster type and in differing formats, they are difficult to use for risk assessment and countermeasure design [5].
Figure 1 shows the distribution of landslide and sediment disaster types in Kegalle District, Sri Lanka, and Tokushima Prefecture, Japan, from 2002 to 2022. Slope failures are the predominant type in both regions, comprising 41% and 58% of recorded events, respectively. Slide/landslide events are more common in Kegalle (30%) than in Tokushima (18%), whereas debris flows occur only in Tokushima (13%). Rockfalls account for 26% of events in Kegalle but are absent from the Tokushima records. Other types represent less than 5% in both datasets. For Kegalle District, a recently updated inventory was used to create Figure 1.
Numerous studies have explored the factors contributing to landslide occurrence, commonly referred to as landslide conditioning factors. These include topographic elements (e.g., slope, elevation, aspect), geological conditions (e.g., soil type, lithology), land use, and proximity to natural and built features such as water bodies, streams, and buildings. Various modeling approaches have been applied to evaluate the relative importance of these factors. A summary of recent research is presented in Table 1, highlighting the number of conditioning factors and incidents considered, along with the associated conclusions.
The matrix presented summarizes the consideration of various Landslide Conditioning Factors (LCFs) across multiple studies, categorized by the number of landslides (LS) examined in each case. The horizontal axis represents individual studies or datasets, with the respective number of LS considered indicated at the top of each column. The vertical axis lists the 24 LCFs that are commonly applied in landslide susceptibility modelling, including topographic, geological, hydrological, and anthropogenic parameters.
The analysis reveals that Land Use/Land Cover, Climate (Rainfall), Elevation/Altitude, Slope, and NDVI/Vegetation are the most consistently used LCFs, often classified as highly influential across datasets. In contrast, variables such as Thickness, Plane Curvature, Distance from Structures, and Flow Accumulation are less frequently considered. Some parameters, such as Soil Type and Distance to Faults, are included selectively and often marked as least influential, reflecting their varying importance depending on the local geological context.
The variation in LCF usage across studies reflects both the differences in data availability and the geomorphological context of the study areas. For example, hydrological factors such as TWI and SPI are predominantly considered in studies focusing on rainfall-induced landslides, whereas Distance to Epicenter is only relevant in seismically triggered events.
Overall, the comparative assessment highlights a trend toward prioritizing topographic and vegetation-related factors as primary determinants of landslide susceptibility, while certain localized parameters are incorporated on a case-by-case basis. Further information on LCFs is given in Section 2 under the first objective, which is to investigate the factors that affect landslide occurrence.
In Sri Lanka, landslides pose a serious threat, particularly within the central highlands, which encompass 12 districts recognized for their high susceptibility to slope instability [17]. While some landslides occur due to natural factors such as intense rainfall or geological conditions, anthropogenic activities (such as deforestation, unregulated construction, and poor land-use practices) have significantly amplified the risk. Alarmingly, nearly 30% of the national population resides within these mountainous regions, increasing their exposure to such hazards. Major landslide events recorded in 2003, 2007, 2010, 2011, 2012, 2014, 2015, and 2016 collectively resulted in nearly 1000 fatalities. Furthermore, approximately 20,000 km², equivalent to 30.7% of Sri Lanka’s total land area, is classified as highly landslide-prone [18].
Given this alarming trend, the identification and mapping of landslide-prone zones are essential for disaster risk reduction and sustainable land-use planning. Landslide assessments are typically conducted through a tiered approach that includes susceptibility mapping (identifying areas likely to experience landslides), hazard mapping (considering frequency and magnitude), and risk mapping (accounting for both hazard and vulnerability) [19].
Significance of the Study:
Historical disaster records are essential for developing effective and targeted countermeasures against landslides. In Sri Lanka, the National Building Research Organisation (NBRO) has been systematically collecting landslide-related data for many years. However, much of this information has traditionally been stored in paper-based formats or across disparate systems, making it difficult to access and analyze for risk assessment and mitigation planning. While NBRO currently maintains a disaster inventory that includes basic attributes such as the date of occurrence, location, scale, and rainfall data, there remains a critical need to organize and categorize this information in a structured digital format. To address this gap, NBRO has initiated the development of an Excel-based database that compiles key parameters, including disaster location, occurrence date, rainfall conditions, landslide type, event scale, and the resulting damage [20]. Such a system would greatly enhance the ability to design context-specific structural and non-structural interventions.
Although landslide susceptibility models, such as logistic regression, have been widely developed, the type of landslide is not considered in countries including Bangladesh, India, Indonesia, and Nepal. The significance and behavior of contributing variables are known to vary considerably between regions [21]. Moreover, many existing models are applied at national or regional scales and often overlook the fact that landslides are influenced by distinct sets of conditioning factors [22]. Failing to account for these differences may lead to generalized or ineffective mitigation measures.
Therefore, this study aims to enhance landslide risk management by predicting the type of landslide reported in the National Inventory, enabling the implementation of more appropriate and effective countermeasures. Given the severe consequences of landslides on human life, infrastructure, and the environment, improving predictive capability is a critical step toward reducing future impacts and enhancing community resilience.
Aim and Objectives:
The aim of this project is to develop a model for earthquake-unaffected regions to identify cliff-type landslides from a landslide inventory where the type is not specified. This will reference an area with a suitable inventory and a similar range of elevation and annual average rainfall.
Objectives are:
To investigate the factors that influence the occurrence of landslides.
To identify the most suitable techniques and tools for determining the relationship between Landslide Causative Factors (LCF) and landslide types.
To select an area that has an appropriate inventory and a comparable range of elevation and annual average rainfall.
To develop and train a model that finds the relationship between LCF and landslide type, including triggering LCF, and to validate the model.
To predict and validate cliff-type landslides in the inventory of the study area.
Study area:
Sri Lanka is an island in South Asia, and the Kegalle District, indicated in Figure 2 below, serves as the study area. The map delineates the administrative extent of Kegalle District (highlighted in light yellow with an orange boundary) within the Sabaragamuwa Province of Sri Lanka. The district boundary is clearly demarcated to differentiate it from surrounding administrative units.
Figure 3 below illustrates the Digital Elevation Model (DEM) of the Kegalle District, Sri Lanka, highlighting the spatial variation in elevation across the region. The DEM values range from 10 m to 1934 m above mean sea level, with lower elevations (depicted in green) concentrated in the western and central portions of the district, and higher elevations (depicted in red) predominantly located along the eastern boundary adjacent to the Central Province highlands. The district boundary is demarcated in orange for spatial reference. The inset map in the upper left corner shows the location of Kegalle District within Sri Lanka, providing geographical context for the study area. This elevation distribution reflects the district’s varied topography, which plays a critical role in influencing geomorphological processes and potential landslide susceptibility.
Figure 4 presents the spatial distribution of annual average rainfall within the defined study boundary, estimated using Inverse Distance Weighting (IDW) interpolation. Rainfall is depicted using a blue-scale gradient, where lighter tones (166.988 mm) indicate lower rainfall and darker tones (451.768 mm) represent higher rainfall.
Figure 5 illustrates the structural geology and lithological composition of the study area within the Kegalle District. Geological contacts and boundaries are shown using standardized symbols, including inferred and approximate geological boundaries, axial traces of folds, faults, shear zones, and thrust lines. Lithological units are color-coded, encompassing rock types such as biotite-hornblende gneiss, calc-gneiss, granite gneiss, quartzite, marble, charnockite, and garnet granulite. Structural features such as antiformal and synformal folds, fractures, and overturned structures are indicated with distinct symbology. The inset map (upper right) locates the study area within Sri Lanka, providing a broader geographic context. The base map includes topographic relief to aid interpretation of structural patterns in relation to terrain.
Figure 6 depicts the spatial distribution of land use categories within the Kegalle District boundary. Land use types are color-coded to represent barren land, coconut plantations, dense and open forests, forest plantations, homesteads/gardens, water bodies, other cultivation areas, paddy fields, rubber plantations, rock outcrops, scrub lands, sparsely used croplands, tea plantations, and other miscellaneous uses. The predominant land use types in the Kegalle District are homesteads/gardens (39.8%), paddy fields (25.6%), and rubber plantations (18.6%), reflecting a mixed agro-residential landscape.
In Sri Lanka, 10 of the 25 districts (Badulla, Nuwara-Eliya, Kegalle, Kandy, Ratnapura, Matale, Kalutara, Matara, Galle, and Hambantota) are highly susceptible to landslides. According to statistics from 1974 to 2020, Kegalle District ranks fifth in the number of landslide incidents and second in fatalities associated with landslides [23].
2. Materials and Methods
2.1. Investigation of Factors Affecting Landslide Occurrence
A total of 52 conditioning factors were identified through a review of thirty-one research papers from different countries that considered LCFs for susceptibility mapping (Figure 7, Figure 8 and Figure 9). Root cohesion was excluded from this study, as it requires specific knowledge of the tree plantation area. Therefore, only 51 factors were included for initial screening. Further screening occurred during model training, which is discussed in Section 3 and Section 4.
The pie chart illustrates the proportion of landslide susceptibility mapping referred to for each year from 2017 to 2023. The largest contribution occurred in 2023 (32.3%), followed by 2021 (19.4%), 2020 (16.1%), and 2019 (9.7%).
The pie chart (Figure 8) displays the global distribution of landslide susceptibility mapping efforts by country. China accounts for the largest share (32.3%), followed by Sri Lanka (12.9%) and Africa (9.7%). Moderate contributions are recorded for India, Iran, Austria, and Japan (each 6.5%). Smaller proportions (3.2%) are reported for America, Nepal, Pakistan, Slovakia, Turkey, and the global “World” category.
The pie chart in Figure 9 illustrates the distribution of the number of Landslide Conditioning Factors (LCFs) considered by various authors in landslide susceptibility studies. The number of LCFs used varies notably, reflecting differences in methodological approaches, data availability, and study objectives. The largest proportion of studies (39%) incorporated 11–15 LCFs, suggesting a preference for a moderately comprehensive factor set that balances analytical robustness with data manageability. The second most common range is 6–10 LCFs (32%), which may be adopted in studies with data limitations or those focusing on specific regional conditions.
A smaller share of studies (16%) used 1–5 LCFs, likely representing preliminary assessments, rapid hazard mapping, or research emphasizing a limited set of dominant conditioning variables (e.g., slope, lithology, and rainfall). Only 7% of studies applied 16–20 LCFs, and 6% considered 21–25 LCFs—these larger ranges are typically found in highly detailed, data-rich investigations aiming for maximum predictive accuracy. This distribution reflects a methodological trade-off: including more LCFs may capture complex interactions influencing landslide susceptibility, but also increases data demands, processing time, and the risk of multicollinearity in statistical models. Conversely, fewer LCFs simplify analysis but may omit relevant influencing factors, potentially reducing model reliability.
Figure 10 below illustrates the Landslide Conditioning Factors (LCFs) considered, along with conclusions drawn by previous authors. Dominant Factors Frequently Considered and Judged Highly Influential: Slope (87%), land use (65%), elevation (66%), and aspect (61%) are among the most frequently cited LCFs, with a high proportion of studies concluding they are highly influential in landslide occurrence. Lithology (74%), distance to roads (58%), and distance from rivers/streams (61%) also exhibit high reference and high impact ratings, suggesting their strong geotechnical and geomorphological relevance.
Moderately Considered Factors: Factors such as topographical position index (39%), geology (38%), drainage density (52%), and soil type (16%) are moderately referenced but still considered significant by many authors. This indicates that while they are not universally included, they are often judged important when data are available.
Factors with Low Reference but High Impact in Specific Contexts: Rainfall (69% considered highly influential despite fewer references), stream power index (19%), and road curvature (10%) show cases where local or regional conditions may elevate their importance. These are often context-dependent, where climatic or anthropogenic pressures dominate landslide triggers.
Least Referenced and Least Impactful Factors: Several LCFs (e.g., river proximity, sediment load of river, instability indications, topographic wetness index) are rarely referenced (3–6%) and are generally concluded as having minimal influence. These may be either redundant with other factors or less relevant in most study areas.
Trend in Author Preferences: The high peaks for slope, lithology, and elevation confirm that terrain morphology and geological structure are universally acknowledged as the most critical drivers of landslide susceptibility. The clustering of low-reference factors suggests that while numerous potential LCFs exist, researchers tend to prioritize a core set of well-established variables.
2.2. Identification of the Most Suitable Techniques to Determine the Relationship Between LCFs and Landslide Types
The Forest-based and Boosted Classification and Regression (FBCR) tool, implemented in Esri ArcGIS Pro 3.5.2, was used to analyze the relationship between LCFs and various types of landslides. In previous studies, the random forest model demonstrated superior performance in landslide prediction [36], the Random Forest Classifier (RFC) provided the best results for susceptibility assessment [2], and the developed Random Forest Machine (RFM) was found to be a promising tool for assisting local authorities in mitigating shallow landslide hazards [37]. One-third of the reviewed literature employed the Random Forest method and reported the highest accuracy, while other studies applied different machine learning techniques.
The FBCR tool utilizes two supervised machine learning methods: an adaptation of the random forest algorithm developed by Leo Breiman and Adele Cutler, and the Extreme Gradient Boosting (XGBoost) algorithm created by Tianqi Chen and Carlos Guestrin. It enables predictions for both categorical (classification) and continuous (regression) variables. Explanatory variables can include fields in the attribute table of the training features, raster datasets, and distance features used to calculate proximity values. In addition to validating model performance based on training data, predictions can be made for either features or a prediction raster [38,39].
The gradient-boosted model was chosen because it builds the model through boosting: each decision tree is created sequentially from the original training data, and each subsequent tree corrects the errors of the previous trees, so that multiple weak learners combine into a strong predictive model. The technique incorporates regularization and early stopping, which help prevent overfitting, and provides greater control over hyperparameters, though it is more complex [38,39].
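The sequential error-correcting idea described above can be illustrated with a minimal sketch. It is not the ArcGIS Pro or XGBoost implementation: it assumes a single explanatory variable, squared-error loss, and one-split trees ("stumps"), and it omits regularization and early stopping. The toy slope values and labels are hypothetical.

```python
# Illustrative sketch only: gradient boosting with one-split "stumps" on a
# single feature and squared-error loss. All data below are hypothetical.

def fit_stump(xs, residuals):
    """Find the threshold on xs whose two-sided means best fit the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # exclude the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def predict(base, stumps, rate, x):
    """Base value plus the shrunken contribution of every stump."""
    return base + sum(rate * (lm if x <= t else rm) for t, lm, rm in stumps)


def boost(xs, ys, n_trees=50, rate=0.3):
    """Each new stump is fitted to the errors left by the stumps before it."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - predict(base, stumps, rate, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps


def mse(base, stumps, rate, xs, ys):
    """Mean squared error of the ensemble on the given points."""
    return sum((y - predict(base, stumps, rate, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: slope angle (degrees) against a 0/1 landslide label.
slopes = [5, 10, 15, 20, 25, 30, 35, 40]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
base, stumps = boost(slopes, labels, n_trees=50, rate=0.3)
err_one = mse(base, stumps[:1], 0.3, slopes, labels)  # error with the first stump only
err_all = mse(base, stumps, 0.3, slopes, labels)      # error with all 50 stumps
```

Comparing `err_one` with `err_all` shows how the training error shrinks as weak learners accumulate; on real data, this is also where regularization and early stopping become necessary to avoid overfitting.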
2.3. Selection of a Study Area with an Appropriate Inventory and Comparable Elevation and Annual Average Rainfall Ranges
Sri Lanka has similar topographical and geological conditions to Japan [40]. Information on all 47 Japanese prefectures was analyzed, leading to the selection of Wakayama Prefecture (WP) and Tokushima Prefecture (TP) as reference areas. The availability of adequate inventory data, including types of landslides, elevation ranges, and annual average rainfall ranges, was considered, as outlined in Table 2.
A comparable elevation range matters because 14 LCFs are derived from the Digital Elevation Model (DEM): elevation, aspect, slope, profile curvature, plane curvature, TWI, STI, SPI, TRI, TPI, direct radiation, duration of direct radiation, flow accumulation, and flow direction. Population density was also compared across the chosen prefectures, since factors such as land use, NDVI, distance to roads, and distance from structures correlate with population density. Environmental LCFs, including geology, soil type, soil thickness, and distance from water bodies, were difficult to compare.
The LCF distance from the earthquake epicenter was analyzed separately because Japan is an earthquake-prone country, whereas Sri Lanka is not. Landslide and earthquake inventories [41] for Tokushima Prefecture (TP) and Wakayama Prefecture (WP) were examined to identify instances of earthquake-induced landslides. Earthquake incidents from 2002 to 2022 that occurred within a 90 km radius of the centers of TP and WP were considered (see Figure 11). The analysis revealed that no earthquake-induced landslides occurred in TP, whereas 12 cliff-type landslides were identified in WP. An earthquake-induced landslide likelihood map was generated by reviewing existing literature [42,43], which indicated the following likelihood of landslide occurrence based on earthquake magnitude: for magnitudes less than 4, landslides are rare or nonexistent; for magnitudes between 4 and 5.5, the likelihood is low to moderate; and for magnitudes greater than 5.5, the likelihood is high.
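The screening logic described above (a 90 km radius around a prefecture center, followed by magnitude-based likelihood classes) could be sketched as follows. The center coordinates and earthquake records are hypothetical placeholders, the great-circle (haversine) distance is an assumed choice of distance measure, and the handling of the exact boundary values 4 and 5.5 is an assumption.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def landslide_likelihood(magnitude):
    """Likelihood class by magnitude; boundary handling at 4 and 5.5 is assumed."""
    if magnitude < 4.0:
        return "rare or nonexistent"
    if magnitude <= 5.5:
        return "low to moderate"
    return "high"


def quakes_within(center, quakes, radius_km=90.0):
    """Keep the (lat, lon, magnitude) records lying within radius_km of center."""
    return [q for q in quakes
            if haversine_km(center[0], center[1], q[0], q[1]) <= radius_km]


# Hypothetical center and earthquake records (lat, lon, magnitude).
center = (34.0, 134.0)
quakes = [(34.2, 134.1, 4.8), (36.0, 137.0, 6.1), (34.0, 134.5, 3.2)]
selected = quakes_within(center, quakes)
classes = [landslide_likelihood(m) for _, _, m in selected]
```

In this toy example, the second record lies well outside the 90 km radius and is dropped before any likelihood class is assigned.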
2.4. Development, Training, and Validation of a Model to Analyse the Relationship Between LCFs, Landslide Types, and Triggering Factor Occurrence
A total of 24 layers were created by collecting data from relevant authorities and processing it with ArcGIS Pro. NDVI (Normalized Difference Vegetation Index) and DEM (Digital Elevation Model) layers were generated using satellite images downloaded from the USGS website [44]. Additional layers, including Aspect, Slope, Profile Curvature, Plane Curvature, Topographic Wetness Index (TWI), Stream Transportation Index (STI), Stream Power Index (SPI), Topographic Roughness Index (TRI), Topographic Position Index (TPI), Direct Radiation, Duration of Direct Radiation, Flow Accumulation, and Flow Direction, were developed based on the DEM layer. The soil thickness layer was created using information from the ISRIC website [45]. Additional details regarding the remaining layers can be found in Table 3 below.
Aspect, also referred to as exposure, indicates the compass direction a terrain surface faces. Slope represents the rise or fall of the land surface. Profile Curvature runs parallel to the maximum slope, while Plane Curvature is perpendicular to it. The Topographic Wetness Index (TWI), also known as the Compound Topographic Index (CTI), quantifies the topographic control over hydrological processes. The Stream Transportation Index (STI) describes erosion and deposition processes, while the Stream Power Index (SPI) indicates potential flow erosion at specific topographic points.
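As a rough illustration of two of the hydrological indices named above, the sketch below uses the commonly cited formulations TWI = ln(a / tan β) and SPI = a · tan β, where a is the specific catchment area and β is the local slope. These formulas are assumed here; the paper does not state which formulation was used, and the minimum-slope guard is an added assumption to avoid division by zero on flat cells.

```python
import math


def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic Wetness Index, ln(a / tan(beta)).

    min_slope_deg is an assumed guard against tan(0) on flat cells.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))


def spi(specific_catchment_area, slope_deg):
    """Stream Power Index, a * tan(beta)."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))


# Hypothetical cell: 500 m^2 per unit contour width, compared at two slope angles.
wet_gentle = twi(500.0, 5.0)
wet_steep = twi(500.0, 30.0)
power_gentle = spi(500.0, 5.0)
power_steep = spi(500.0, 30.0)
```

For the same contributing area, the gentler cell gets the higher wetness index and the steeper cell the higher stream power, which is why the two indices carry complementary information.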
The Topographic Roughness Index (TRI) measures elevation differences between adjacent DEM cells, calculating the difference in elevation values between a center cell and the eight surrounding cells. The Topographic Position Index (TPI) categorizes topographic positions into upper, middle, and lower landscape segments. The Duration of Direct Radiation layer indicates the length of time each location receives direct incoming solar radiation, while the NDVI measures the greenness and density of vegetation captured in satellite images.
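The two indices described above can be sketched for a single cell. The TRI formulation used here (square root of the summed squared differences between the center cell and its eight neighbours) is one common variant and is an assumption, as is the standard NDVI ratio (NIR − Red) / (NIR + Red); the elevations and reflectances are hypothetical.

```python
import math


def tri(window):
    """Roughness of the center cell of a 3x3 elevation window.

    Assumed formulation: square root of the summed squared differences
    between the center cell and its eight neighbours.
    """
    center = window[1][1]
    return math.sqrt(sum((window[i][j] - center) ** 2
                         for i in range(3) for j in range(3)
                         if (i, j) != (1, 1)))


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


# Hypothetical 3x3 elevation windows (metres).
flat = [[100.0, 100.0, 100.0], [100.0, 100.0, 100.0], [100.0, 100.0, 100.0]]
pit = [[101.0, 101.0, 101.0], [101.0, 100.0, 101.0], [101.0, 101.0, 101.0]]
tri_flat = tri(flat)
tri_pit = tri(pit)
vegetated = ndvi(0.5, 0.1)  # hypothetical reflectances of a vegetated pixel
```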
Layers related to soil type, land use, and geology were processed using tools such as polygon-to-raster, copy raster, and float. The distances to structures, roads, streams, and faults were calculated using the distance accumulation tool. The 20-year average annual rainfall data were processed year by year using the Inverse Distance Weighting (IDW) tool in GIS and then summarized using the cell statistics tool.
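The rainfall workflow described above (interpolate each year with IDW, then combine the yearly surfaces) could be sketched for a single raster cell as follows. This is a simplified stand-in for the GIS tools, not their implementation: the gauge coordinates, rainfall totals, power of 2, and the use of the mean as the cell statistic are all assumptions for illustration.

```python
def idw(stations, x, y, power=2.0):
    """Inverse Distance Weighting estimate at (x, y).

    stations is a list of (sx, sy, value); weights are 1 / distance**power.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0.0:
            return value  # the cell coincides with a gauge
        weight = 1.0 / d2 ** (power / 2.0)
        num += weight * value
        den += weight
    return num / den


# Hypothetical gauges: (x km, y km, annual rainfall in mm) for two years.
yearly_gauges = {
    2021: [(0.0, 0.0, 2000.0), (10.0, 0.0, 3000.0)],
    2022: [(0.0, 0.0, 2200.0), (10.0, 0.0, 3200.0)],
}

# Interpolate each year separately at one cell, then average across years,
# mirroring the year-by-year IDW followed by a cell-statistics step.
cell = (5.0, 0.0)
per_year = [idw(gauges, cell[0], cell[1]) for gauges in yearly_gauges.values()]
mean_rain = sum(per_year) / len(per_year)
```

Interpolating per year before averaging keeps each year's gauge network independent, which matters if the set of reporting gauges changes over the record.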
Data on landslide incidents from 2002 to 2022 were collected from the National Building Research Organisation (NBRO) in Sri Lanka for Kegalle District (KD), while data for TP and WP were obtained from the SABO Prefectural Department. Since 2018, with support from JICA, the NBRO has worked to modify and maintain landslide inventories more effectively, including categorizing different landslide types. For validation, landslide incident data, including types, were gathered for KD from the recently updated NBRO inventory. The number of landslide points is presented in Table 4 below.
Landslide susceptibility mapping using data mining methods can be considered a binary classification task. Therefore, the same number of non-landslide points was randomly selected from landslide-free areas and divided using a 70/30 ratio [49]. Landslide points were labeled as 1, and non-landslide points as 0. A layer was created based on 167 cliff-type landslide points and 167 non-landslide points to run the model. Non-landslide points were generated using tools such as buffer, erase, and create random points in ArcGIS Pro 3.5.2.
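The balanced sampling and 70/30 division described above could be sketched as follows. The point coordinates are hypothetical, and the random seed and shuffling strategy are assumptions; in the study itself, the points were produced with GIS tools rather than with code like this.

```python
import random


def balanced_split(landslide_pts, candidate_non_pts, train_frac=0.7, seed=42):
    """Label landslides 1 and an equal-sized random non-landslide sample 0,
    then shuffle and split the labelled points into training and test sets."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    non_pts = rng.sample(candidate_non_pts, len(landslide_pts))
    labelled = [(p, 1) for p in landslide_pts] + [(p, 0) for p in non_pts]
    rng.shuffle(labelled)
    cut = round(train_frac * len(labelled))
    return labelled[:cut], labelled[cut:]


# Hypothetical coordinates: 167 landslide points and a larger pool of
# candidate points drawn from landslide-free areas.
landslide_pts = [(i, i) for i in range(167)]
candidates = [(1000 + i, i) for i in range(1000)]
train, test = balanced_split(landslide_pts, candidates)
```

With 334 labelled points, a 70/30 division yields 234 training and 100 test points.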
All layers were processed to have the same extent, raster format, pixel type, pixel depth, and cell size of 33.952976 m, and were set to use the WGS 1984 Web Mercator (auxiliary sphere) coordinate system. Tools used for processing included copy raster, clip raster, float, define projection, and project raster.
The model was developed using the Forest-based and Boosted Classification and Regression (FBCR) tool in ArcGIS Pro 3.5.2. This process involved inputting 24 explanatory variable layers along with a point layer representing cliff types, categorized as landslide (LS) and non-landslide (non-LS), for a total of 334 points. The prediction type selected was “train only,” and the model utilized was a gradient boosted model, which constructs a series of sequential decision trees. Each subsequent decision tree is designed to minimize the error (bias) of the previous tree. As a result, the gradient boosted model effectively combines several weak learners to create a robust predictive model.
For training, the input feature consisted of the cliff-type point layer, while the variable to be predicted was the occurrence of LS in cliffs. The explanatory training datasets were loaded, and for the land use, soil type, and geology layers, the categorical box was checked. Output files, including the trained model, trained features, variable importance table (VIT), confusion matrix (CM), and validation table, were saved in a geodatabase. The training data exclusion rate for validation was increased from 10% to 30%. Environment settings were configured, and the model was trained by adjusting various parameters as shown in Table 5 below. Throughout the training process, key metrics such as accuracy, sensitivity, Matthews correlation coefficient (MCC), F1 score, mean, median, standard deviation, and the shape of the histogram were monitored.
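For reference, the classification metrics monitored above can be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not results from this study.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, F1 score, and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1, "mcc": mcc}


# Hypothetical validation counts for LS (positive) versus non-LS (negative).
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```

MCC uses all four cells of the matrix, so it is a useful complement to accuracy and the F1 score when judging a binary LS/non-LS model.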
Explanatory variables of low importance were found to degrade the model’s accuracy and other metrics. Consequently, the geology, soil type, and land use LCFs were removed due to their low importance, as illustrated in Figure 12. The model was finalized once all output metrics met the established criteria.
2.5. Prediction and Validation of Cliff-Type Landslides in the Study Area Inventory
To predict and validate cliff-type landslides in the study area, the satisfactorily trained model was utilized to generate a raster prediction, which was then saved in the geodatabase. This prediction focused on the training area (TP) for validation, as only 70% of the cliff LS points were used for training. The column containing explanatory raster predictions was updated using KD layers, and the model was executed for KD prediction. A cliff-type LS point layer was created with GIS tools, referencing the TP inventory. This layer was overlaid with the predicted output layer of the TP subarea for model validation. Similarly, a cliff-type LS point layer was created using the recently modified KD inventory and overlaid for validation. The model was also employed to predict cliff-type LS within the KD inventory prior to its modification.
Figure 13 below presents the methodological flowchart developed for identifying cliff-type landslide susceptibility. The process begins with the selection and preparation of input datasets, which include 21 landslide conditioning factor layers and landslide inventory points. The model is trained using machine learning classifiers—specifically, the Forest-based and Boosted Tree algorithms—within a defined training area (TP). Validation is conducted using standard performance metrics such as accuracy and sensitivity. The validated model is then applied to the target area (KD) using the same set of conditioning factors. Finally, predictions are compared with a recently updated landslide inventory to evaluate the model’s reliability and accuracy. This structured approach ensures the systematic generalization of the model across different spatial domains.
4. Discussion
Landslides are a pervasive geohazard worldwide, causing extensive human, infrastructural, and environmental losses, with their occurrence governed by complex interactions among climatic, geological, geomorphological, and anthropogenic factors. The prevalence of slope failures, debris flows, rockfalls, and other landslide types is influenced by local environmental conditions and classification practices; however, a comparison between Kegalle District, Sri Lanka, and Tokushima Prefecture, Japan, reveals broadly similar classification patterns despite regional differences. The adoption of type-specific inventories, such as the recently modified NBRO database in Sri Lanka, enhances hazard modelling by enabling a more precise distinction between failure mechanisms and supporting the design of targeted countermeasures. Literature comparisons indicate that commonly prioritized landslide conditioning factors (LCFs), including land use, rainfall, slope, elevation, and NDVI, are widely acknowledged as influential, yet the integration of less frequently applied but contextually significant variables, such as plane curvature, soil thickness, and distance from earthquake epicenters, can strengthen predictive accuracy when adapted to local settings.
To determine the most suitable analytical approach for exploring the relationship between LCFs and specific landslide types, this study employed the Forest-based and Boosted Classification and Regression (FBCR) tool in ArcGIS Pro. Random forest (RF), widely used in susceptibility mapping, consistently demonstrates high predictive accuracy, with approximately one-third of recent studies reporting superior results compared to other machine learning methods. While RF offers robust ensemble-based predictions, the gradient boosting method further enhances model performance by sequentially correcting misclassifications, incorporating regularisation, and reducing overfitting.
The choice of study and reference areas—Kegalle District (KD) in Sri Lanka and Wakayama (WP) and Tokushima (TP) prefectures in Japan—was informed by the availability of type-specific inventories and broadly comparable elevation and rainfall ranges. Earthquake-induced landslides were assessed separately given the differing seismic hazard contexts of Japan and Sri Lanka. Inventory analysis revealed no earthquake-triggered landslides in TP but 12 cliff-type events in WP between 2002 and 2022. These findings, supported by literature-based magnitude thresholds for landslide triggering, underscore the importance of region-specific factor selection and methodological adaptation in enhancing the relevance and reliability of landslide-type prediction models.
The relative importance of each LCF was first evaluated against trends reported in previous studies (Figure 10). Consistent with earlier work, DEM, slope, and land use emerged as highly influential, while factors such as soil thickness and building proximity, less frequently highlighted in the literature, showed strong importance in this study, reflecting local anthropogenic influences. Moderate-importance variables, including direct duration radiation, terrain ruggedness index (TRI), and profile curvature, contributed meaningfully to the model’s discriminatory power, highlighting the multifaceted influence of both geomorphic form and solar energy input on slope stability.
In modelling the relationship between LCFs and cliff-type landslide occurrence, 24 explanatory variables derived from satellite imagery, digital elevation models, thematic maps, and official geospatial datasets were processed to a consistent spatial resolution and projection. Many of these variables are terrain-derived indices (such as slope, curvature metrics, TWI, SPI, TRI, and TPI) that are well established in the literature for their geomorphic relevance. The gradient boosting algorithm was selected for model training due to its iterative bias-reduction process, with parameter tuning (e.g., number of trees, lambda, gamma) applied to optimize predictive performance while avoiding overfitting. Geology, soil type, and land use were excluded owing to their low importance scores, leaving 21 LCFs in the final model; this reflects the benefits of data-driven feature selection in enhancing model generalizability.
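For orientation, the role of the lambda and gamma parameters can be written out for the regularised objective used in XGBoost-style gradient boosting. This is an assumption about the formulation behind the tool's parameters, which the text does not state. At boosting round t, a new tree f_t is fitted to minimise the loss of the updated prediction plus a complexity penalty:

```latex
\mathcal{L}^{(t)} = \sum_{i=1}^{n} \ell\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\bigr) + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma\, T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2}
```

Here T is the number of leaves in the tree and w_j is the weight of leaf j. Under this reading, a larger gamma discourages additional splits and a larger lambda shrinks leaf weights, which is how both parameters limit overfitting.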
The model results confirm that cliff-type landslides can be effectively predicted using terrain-derived explanatory variables in combination with advanced ensemble learning algorithms. The performance metrics—accuracy (0.84), sensitivity (0.84), F1 score (0.84), and Matthews correlation coefficient (MCC) (0.68)—indicate a high level of predictive reliability, consistent with or exceeding the performance of similar studies employing Random Forest and gradient boosting techniques for landslide susceptibility mapping. The relatively low standard deviation (0.03) across multiple runs further suggests model stability and generalizability. The prediction performance of 95% for both landslide and non-landslide classes in TP highlights balanced classification ability, reducing the risk of bias toward either class.
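As a reading aid, the four reported metrics can be derived from a confusion matrix as sketched below. The counts are hypothetical and were chosen only because they reproduce the reported values; they are not the study's actual confusion matrix, and the function name is illustrative.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, F1 and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall) is the share of true landslide points that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1, "mcc": mcc}


# Hypothetical balanced test set of 200 points (assumption, not study data).
metrics = classification_metrics(tp=84, fp=16, tn=84, fn=16)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For balanced classes with symmetric errors, MCC equals 2 × accuracy − 1, so an accuracy of 0.84 paired with an MCC of 0.68 is what such a matrix would produce.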
The variable importance analysis identified DEM, distance from roads, soil thickness, slope, and proximity to buildings as the most influential factors. While slope and elevation consistently appear among top predictors in global literature, as shown in Figure 51 below, the prominence of soil thickness and building proximity in this study reflects local geomorphic and anthropogenic influences, particularly in densely settled terrain where slope cutting and surface loading can exacerbate instability. Solar-radiation and topographic indices such as Direct Duration Radiation (DDR), Terrain Ruggedness Index (TRI), and Profile Curvature also emerged as moderately influential, each contributing ~7% to the model’s gain (Table 7), reinforcing the role of topographic form and energy in slope failure processes.
Validation outcomes demonstrate the added value of using type-specific inventories. In TP, 112 of 118 cliff-type landslide points (95%) overlapped with predicted high-susceptibility zones, while in KD, the recently modified inventory yielded a match rate of 80.9% (72 out of 89 points). In contrast, validation against KD’s earlier non-classified inventory resulted in only 39.1% (115 out of 294 points) alignment, underscoring the critical importance of detailed landslide type information for accurate hazard modelling. This finding echoes observations that type-specific susceptibility mapping significantly improves the spatial precision of hazard delineation and the relevance of mitigation strategies.
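The overlay validation described above can be sketched as a point-in-cell count. This is a minimal illustration on a hypothetical 3 × 3 grid, not the GIS workflow used in the study; the function name, grid layout, and coordinates are all assumptions.

```python
def overlay_match_rate(points, susceptibility, cell_size, origin, high_value=1):
    """Share of inventory points that fall in high-susceptibility cells.

    points: iterable of (x, y) map coordinates
    susceptibility: 2D list of class codes, row 0 being the northernmost row
    origin: (x_min, y_max), the upper-left corner of the grid
    Points outside the grid are ignored.
    """
    x_min, y_max = origin
    n_rows, n_cols = len(susceptibility), len(susceptibility[0])
    inside = matched = 0
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            inside += 1
            if susceptibility[row][col] == high_value:
                matched += 1
    return matched / inside if inside else float("nan")


# Hypothetical susceptibility grid (1 = high) with 10 m cells.
grid = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
# Hypothetical inventory points; the last one lies outside the grid.
inventory = [(5, 25), (15, 25), (15, 15), (25, 5), (35, 5)]
rate = overlay_match_rate(inventory, grid, cell_size=10, origin=(0, 30))
print(f"match rate: {rate:.2f}")
```

Running the same count separately for a type-specific and a non-classified inventory gives the kind of comparison reported above.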
Spatial overlay analysis between predicted cliff-type landslide zones and individual triggering factors further reinforced the model interpretation. The thematic maps (Figure 52, Figure 53, Figure 54, Figure 55 and Figure 56 below) illustrate how high-susceptibility areas spatially correspond with elevation, slope, soil thickness, proximity to roads and buildings, rainfall, and areas of high direct duration radiation.
The DEM comparison in Figure 52 shows that the majority of predicted cliff-type landslide zones coincide with areas in the mid-elevation range (15.138–250.114 m), highlighting the role of intermediate relief zones in fostering cliff instability. These elevations may represent transitional zones between valley floors and ridge tops, where slope gradients, drainage convergence, and weathering profiles combine to produce unstable conditions.
Soil thickness analysis in Figure 53 shows that many predicted landslide areas overlap with zones of maximum soil depth (182.048–200 cm). Thicker soils may promote instability by increasing gravitational load and water retention, thus raising pore-water pressures during heavy rainfall events.
The comparison between the slope classification (Figure 54) and the predicted cliff-type landslide distribution (right panel) reveals a clear spatial correspondence between the predicted zones and areas with moderate slope gradients (3.5–28.497°), represented in yellow. This suggests that cliff-type landslides in the Tokushima subarea are more frequently associated with moderately steep slopes than with extremely steep terrain (>28.498°).
Overlaying transportation networks (Figure 55) revealed that many predicted cliff-type landslide areas intersect or lie in close proximity to road corridors. This relationship is consistent with findings from previous studies that road construction, cut slopes, and associated drainage modifications can locally reduce slope stability and accelerate failure processes. Similarly, the distribution of buildings (Figure 56) shows a notable concentration within or near predicted landslide-prone zones, suggesting heightened exposure of infrastructure to slope hazards. This proximity underscores the potential socio-economic implications of such failures and the necessity for integrated land-use and hazard management planning.
Collectively, these results demonstrate that a carefully selected set of LCFs, combined with advanced machine learning algorithms, can produce high-accuracy, transferable models for cliff-type landslide susceptibility. The approach enables more targeted hazard management, particularly when applied to inventories with detailed landslide type classifications, thereby bridging the gap between susceptibility modelling and engineering-scale countermeasure design.
Limitations
While the model demonstrates strong performance metrics, several limitations should be acknowledged. First, the landslide and earthquake inventories used for training may be incomplete or biased: some events that could influence the model’s learning process may have gone undetected or been misclassified. Additionally, the environmental variables were restricted to the available spatial layers, potentially overlooking relevant but unmeasured factors, such as the Soil Water Index and groundwater levels.
Although the model is based on numerical and range-based factors, which reduces sensitivity to regional differences, there remains a significant risk that predictions may become unreliable when approaching the model’s extrapolation limits. In this study, the model was trained using data from Japan and applied to Sri Lanka; while the numerical nature of the conditioning factors enables interpolation and limited extrapolation, the possibility of reduced accuracy under extreme or unrepresented value ranges cannot be ruled out.
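One way to gauge this extrapolation risk is to compare the value range of each conditioning factor in the training area with the values found in the target area. The sketch below is a hypothetical check, not part of the reported workflow; the factor names and sample values are invented for illustration.

```python
def training_ranges(rows):
    """Minimum and maximum of each factor over the training samples."""
    factors = rows[0].keys()
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in factors}


def out_of_range_share(target_rows, ranges):
    """Per factor, the share of target samples outside the training range."""
    n = len(target_rows)
    shares = {}
    for factor, (low, high) in ranges.items():
        outside = sum(1 for r in target_rows if not low <= r[factor] <= high)
        shares[factor] = outside / n
    return shares


# Hypothetical samples from the training area (assumed values).
train = [
    {"slope_deg": 5, "elevation_m": 20},
    {"slope_deg": 28, "elevation_m": 250},
    {"slope_deg": 15, "elevation_m": 120},
]
# Hypothetical samples from the target area (assumed values).
target = [
    {"slope_deg": 10, "elevation_m": 300},
    {"slope_deg": 35, "elevation_m": 100},
    {"slope_deg": 12, "elevation_m": 60},
    {"slope_deg": 20, "elevation_m": 500},
]
shares = out_of_range_share(target, training_ranges(train))
for factor, share in shares.items():
    print(f"{factor}: {share:.0%} of target samples outside the training range")
```

Factors with a large out-of-range share mark the parts of the target area where predictions rely on extrapolation and deserve the most caution.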
Lastly, although the analysis incorporated the Cliff landslide (LS) point layer and 21 landslide conditioning factors (LCFs), only the earthquake point layer was considered beyond the Tokushima Prefecture boundary, with a buffer applied outside the study area. This represents a limitation of the study, as the other 20 LCFs were not extended beyond Tokushima Prefecture. Expanding the spatial coverage of all conditioning factors, rather than restricting them to the prefectural boundary, could provide a more comprehensive assessment and improve the robustness of the model results.
5. Conclusions
This study demonstrates the effectiveness of advanced machine learning methods—specifically Random Forest and Gradient Boosting—in modelling cliff-type landslide susceptibility using a diverse set of landslide conditioning factors (LCFs). By integrating terrain-derived indices, hydrological parameters, proximity measures, and selected anthropogenic variables, the developed models achieved high predictive accuracy, stability, and balanced classification performance. The results confirm that inventories containing detailed landslide type classifications greatly improve model precision, with validation outcomes showing substantially higher agreement rates compared to non-classified datasets.
The variable importance analysis revealed that both widely recognised factors (e.g., slope, elevation) and context-specific variables (e.g., soil thickness, proximity to buildings) significantly contribute to cliff-type landslide prediction. Spatial overlay analysis highlighted the strong alignment between high-susceptibility zones and conditions such as moderate slope gradients, mid-range elevations, thick soil cover, and proximity to human infrastructure. These findings underscore the importance of combining universally relevant predictors with locally significant factors to achieve optimal model performance.
The methodological approach developed in this research can be adapted to other landslide types and geographic contexts, provided that suitable inventory data and environmental covariates are available. Beyond academic contributions, the study offers practical value for hazard management, land-use planning, and the design of targeted countermeasures. Future work should focus on integrating temporal triggers such as rainfall intensity-duration thresholds and seismic parameters, as well as testing model transferability across regions with similar geomorphological characteristics.
Recommendations:
To enhance the comprehensiveness and practical utility of landslide susceptibility modeling, it is recommended to integrate temporal variables, such as rainfall intensity, earthquake occurrences, and historical landslide timing, into future models. Including these time-dependent factors would allow for predictions not only of spatial risk but also of the likely timing of landslide events. Additionally, future studies should focus on developing models specifically designed for areas affected by earthquakes, as seismic activity significantly contributes to triggering landslides, especially in steep and unstable terrain. Furthermore, it is advisable to expand the modeling approach to other types of landslides, such as debris flows and rockfalls.