1. Introduction
Landslides are a physical hazard that frequently results in devastating human and economic losses around the world [1, 2, 3]. Various underlying geological, lithological, and morphological characteristics make an area prone to these hazards. In addition, landslides can happen as a result of anthropogenic activities or can be triggered by natural forces such as earthquakes, melting snow, or extreme precipitation [4, 5, 6].
Landslides that are triggered by rainfall are common phenomena in mountainous tropical regions. These landslides are associated with prolonged, high-intensity periods of precipitation that can initiate mass soil movement through changes in pore pressure and seepage forces in the soil [7, 8, 9]. Rainfall-triggered landslides are usually shallow (0.3–2 m) and are often driven by one of two mechanisms. In the first, the hydraulic conductivity of the weathering profile decreases, creating a perched water flow that is parallel to the slope. This reduces the shear strength of the soil, which leads to slope failure. In the second mechanism, water from the surface advances into the slope while it is still unsaturated and, in this case, low suction results in a rigid mass slope failure [8, 10].
Over the years, scholars have tried to define statistical or empirical correlations between slope failures and rainfall intensity and duration. These relationships are often expressed mathematically as rainfall thresholds, that is, curves that separate the slope’s stability and failure zones [11]. Since their inception in Caine (1980) [12], precipitation thresholds have been established for Rainfall Intensity-Duration (I-D), Cumulative Rainfall Event-Duration (E-D), Cumulative Rainfall Event-Intensity (E-I), Rainfall Cumulative (R), and other relationships between intraday rain and antecedent rainfall [13].
Inevitably, these thresholds are highly influenced by temporal and spatial factors such as the location and extent of the study area and the instruments (rain gauges or remote sensors) used to calculate them. To a large extent, in-situ sensors (gauges) have been used to derive rainfall thresholds in various areas of the world. In India’s Himalayan region, for example, several authors have used in-situ data to define rainfall thresholds. These scholars combined intensity-duration thresholds based on daily rainfall with antecedent rain aggregated over different numbers of days, such as 2, 3, 5, and 20 days [14, 15, 16].
Nonetheless, recent advancements in satellite technologies have provided a promising and reliable source of data to map and model landslide susceptibility, hazard, risk, and impacts in various areas of the world. Satellite products such as the Global Precipitation Measurement Mission (GPM), for example, provide rainfall estimates that can help evaluate rain as a landslide trigger at large scales [9, 17]. Satellite soil moisture products have also been successfully adapted in various shallow landslide studies. Ray et al. (2007), for example, used soil moisture data from the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) to demonstrate the correlation between moisture conditions, rainfall patterns observed from the Tropical Rainfall Measuring Mission (TRMM), and landslide occurrence. Brocca et al. (2012) used the soil water index (SWI) derived from the Advanced SCATterometer (ASCAT) to obtain soil moisture indicators that can help predict landslide occurrence. Cullen et al. (2016) developed a shallow landslide index (SLI) derived from the Soil Moisture Active Passive mission (SMAP) and GPM that can be used as a dynamic indicator of the total amount of antecedent moisture and rainfall needed to trigger a shallow landslide in North America.
Various studies have used remote sensors, or a combination of remote sensors and gauges, to derive rainfall thresholds for landslides. Brunetti et al. (2021), for example, used GPM and SM2RAIN-ASCAT (Soil Moisture to Rain) rainfall products together with daily rain gauge observations from the India Meteorological Department to study 197 rainfall-induced landslides. In this instance, the satellite products outperformed the in-situ sensors because of their better spatial and temporal resolutions [18]. In contrast, Rossi et al. (2017) described three statistical procedures for defining satellite and gauge thresholds in central Italy. In this case, the thresholds derived from satellite data were lower than those obtained from gauges because the satellite products underestimated the “ground” rainfall measured by the gauges [19].
Despite these developments, the accuracy of satellite information is sometimes limited by the area’s physical characteristics. This is the case in tropical regions, where dense vegetation prevents the instruments from retrieving reliable readings. Complex and heavily vegetated tropical areas usually pose a significant challenge for remote earth observations. For example, exploratory analysis of data retrieved from NASA’s GPM and SMAP missions over Colombia, South America, does not show the expected association between rainfall and soil moisture. Methods such as those described in Cullen et al. (2016) capture the connection between remotely sensed precipitation and soil moisture content but are useful only for less complex and less vegetated terrains.
Perhaps for this reason, various physical, and not satellite-based, rainfall thresholds have been determined for the Colombian region. Marin et al. (2021), for example, applied a physically based model to define rainfall intensity-duration thresholds and predict areas susceptible to shallow landslides in tropical mountain basins of the Colombian Andes [20]. Nonetheless, to the knowledge of the authors, as of the time of this writing, satellite-based landslide rainfall thresholds for this area are not available.
Framework
This work proposes a framework for the development of a rainfall-triggered landslide threshold derived from a system that incorporates satellite observations and physical ground instrumentation at regional and global scales. As previously stated, remotely sensed antecedent soil moisture conditions for the extent of the study area are not available. Therefore, we derive soil wetness conditions using a four-year (2016–2019) rainfall time series from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS).
First, we establish a relationship between two successive rainfall episodes and the dry period in between for the entire series. Subsequently, we test the performance of these parameters in conjunction with static factors in a logistic regression. Here we leverage the information provided in inventories, the expert opinion of specialists in the region, and the various heuristic, statistical, and deterministic analyses in C.J. van Westen (2008) [21] to determine the static factors that should be incorporated into the analysis.
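As an illustration of the first step, the sketch below splits a daily rainfall series into alternating wet and dry runs and extracts the last wet/dry/wet pattern. It is a minimal sketch, not the procedure used in this work: the wet-day cutoff (`wet_threshold_mm`), the function names, and the convention that index 1 denotes the earlier rainfall and index 2 the later one are assumptions made for the example.

```python
from itertools import groupby

def split_episodes(daily_rain_mm, wet_threshold_mm=0.0):
    """Group a daily rainfall series into alternating wet and dry runs.

    Returns (is_wet, total_mm, n_days) tuples in time order.
    The wet-day cutoff is an assumption, not a value from the paper.
    """
    episodes = []
    for is_wet, run in groupby(daily_rain_mm, key=lambda mm: mm > wet_threshold_mm):
        values = list(run)
        episodes.append((is_wet, sum(values), len(values)))
    return episodes

def last_two_rainfalls(episodes):
    """Return the last wet/dry/wet pattern as RS1, PR1, DT, RS2, PR2, or None.

    Assumed naming: RS = rainfall sum (mm), PR = rainfall duration (days),
    DT = dry period (days); index 1 is the earlier rainfall, 2 the later one.
    """
    for i in range(len(episodes) - 1, 1, -1):
        later, dry, earlier = episodes[i], episodes[i - 1], episodes[i - 2]
        if later[0] and not dry[0] and earlier[0]:
            return {"RS1": earlier[1], "PR1": earlier[2], "DT": dry[2],
                    "RS2": later[1], "PR2": later[2]}
    return None

# Made-up daily series in mm/day
series = [0, 12, 8, 0, 0, 0, 5, 20, 3]
print(last_two_rainfalls(split_episodes(series)))
```

Running the same extraction over every wet/dry/wet window of the series, rather than only the last one, would give one set of dynamic variables per rainfall/dry period.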
Dividing the dataset into training and testing sets, we then formulate a relationship between the slope angle and the new dynamic parameters, expressed as a threshold that, once exceeded, is expected to trigger a landslide. Later, we compare the performance of the proposed threshold and the well-known event-duration (E-D) method for the training and testing sets. Finally, we represent the proposed threshold values in a hazard map of the study area.
4. Results
4.1. Logistic Regression—Dynamic and Static Factors
The coefficients of each factor affecting the “log-odds”, estimated by maximum likelihood (MLE) in the logistic model, are presented in the Z-factor for Equation (3) above:
In Equation (3), P tends to 1 (landslide) as Z in Equation (6) increases. In contrast, as Z decreases, the probability of a shallow landslide event tends to 0 (no-landslide). A positive coefficient therefore pushes the probability towards a landslide, and a negative coefficient pushes it towards no-landslide.
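The behaviour of P with respect to Z can be checked numerically. The sketch below assumes that Equation (3) is the standard logistic function, P = 1 / (1 + e^(−Z)); the function name is hypothetical and the Z values are arbitrary.

```python
import math

def landslide_probability(z):
    """Standard logistic link: maps the linear predictor Z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary Z values: P rises towards 1 as Z increases and falls towards 0 as Z decreases
for z in (-5.0, 0.0, 5.0):
    print(z, round(landslide_probability(z), 4))
```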
Landcover and soil type are categorical variables with six and five categories, respectively, for the study area and are described in depth in Smets (2020) and H. Eswaran (2016), respectively. Because this is a data-driven model, we do not assign any weight to any soil or landcover type. Instead, we create dummy variables for each category at each location. After selecting the informative categories with the RFE process, we are left with those that make an event either more or less likely. Categories with positive coefficients increase the likelihood of an event, and those with negative coefficients decrease it.
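The dummy-variable step can be sketched as follows. The category labels are invented for the example and are not the landcover classes of the study area, and the RFE selection that follows the encoding is not shown.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Hypothetical landcover labels at four locations
columns, encoded = one_hot(["forest", "cropland", "forest", "urban"])
print(columns)
for row in encoded:
    print(row)
```

Each location ends up with exactly one column set to 1, so no ordering or weight is implied among the categories.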
Validation results of the logistic regression demonstrate that the model can correctly predict 73% of the cases using the newly created dynamic variables.
Figure 5a shows the ROC curve, which summarizes the model predictability through the area under the curve (AUC). The AUC reflects the probability that a randomly chosen actual landslide incident receives a higher predicted score than a randomly chosen non-landslide case. The model has an AUC of 0.73, suggesting acceptable data-driven predictability for landslide events.
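The AUC can be read as a ranking probability: the chance that a randomly chosen landslide case receives a higher score than a randomly chosen non-landslide case. The sketch below computes it directly from that definition by comparing every positive/negative pair; the scores and labels are made up, and the function name is hypothetical.

```python
def auc_by_ranking(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up predicted probabilities and observed outcomes (1 = landslide)
scores = [0.9, 0.8, 0.4, 0.3, 0.6]
labels = [1, 0, 1, 0, 1]
print(auc_by_ranking(scores, labels))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for an inventory of a few hundred events.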
Figure 5b shows the confusion matrix for the model. From these values we compute the precision, (true positive)/(true positive + false positive); the recall, (true positive)/(true positive + false negative); and the F1-measure, which combines precision and recall. These measurements are listed in Table 5 below.
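For reference, the three measures follow from the confusion-matrix counts as sketched below. The counts are invented for the example and are not the values of Figure 5b or Table 5.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall, and F1 (harmonic mean of the two) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives
precision, recall, f1 = classification_metrics(tp=40, fp=10, fn=20)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```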
The odds ratio (OR) describes how a one-unit increase in a variable changes the odds of initiating a landslide event. In Table 6 below we see that for a one-unit increase in PR1, the odds of a landslide happening are multiplied by 0.718. An OR below 1 lowers the odds, and an OR above 1 raises them. The other independent variables can be interpreted the same way.
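In a logistic model, the odds ratio of a variable is the exponential of its coefficient. The sketch below shows that conversion and the corresponding percentage change in the odds, using the reported OR of 0.718 for PR1; the function names are hypothetical, and the coefficient is back-calculated from the OR purely for illustration.

```python
import math

def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a variable."""
    return math.exp(coefficient)

def percent_change_in_odds(or_value):
    """Percentage change in the odds implied by an odds ratio."""
    return (or_value - 1.0) * 100.0

# Coefficient back-calculated from the reported OR (illustration only)
coefficient_pr1 = math.log(0.718)
print(round(odds_ratio(coefficient_pr1), 3))
print(round(percent_change_in_odds(0.718), 1))
```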
4.2. Landslide Triggering Factor (LTF) Thresholds—Dynamic Factors and Slope
The LTF value for the training (65%) and testing (35%) datasets is assessed using Equation (4) for every two rainfall/dry periods at each landslide location from 1 January 2016 to 31 December 2019. LTF values that are associated with an actual landslide event are set as thresholds for the corresponding slope angles. Equation (7) and Figure 6 below show the inverse function that relates the LTF and the Slope angle.
where the coefficient of determination (R²) for the LTF threshold-Slope relationship as per Equation (7) is R² = 0.836.
Figure 6 shows that the LTF-Slope angle relation changes rapidly at smaller slope angles, whereas it barely fluctuates at larger ones. For slopes greater than 25°, the threshold approaches an asymptotic average value of 1.227 with a standard deviation of 0.104.
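To illustrate how an inverse relation of this kind can be fitted, the sketch below assumes the hypothetical form LTF = a / slope + b, which may differ from the form actually used in Equation (7). Under that assumption the fit reduces to ordinary least squares on 1/slope, and b plays the role of the asymptote at large slope angles. The data points are synthetic.

```python
def fit_inverse(slopes_deg, ltf_values):
    """Least-squares fit of the assumed form LTF = a / slope + b."""
    u = [1.0 / s for s in slopes_deg]          # linearize: LTF = a * u + b
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(ltf_values) / n
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    sxy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, ltf_values))
    a = sxy / sxx
    b = mean_y - a * mean_u
    return a, b

# Synthetic points generated from a = 20, b = 1.2 (not data from this study)
slopes = [5.0, 10.0, 20.0, 40.0]
ltf = [5.2, 3.2, 2.2, 1.7]
a, b = fit_inverse(slopes, ltf)
print(round(a, 3), round(b, 3))
```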
The LTF-slope relationship is congruent with the physical mechanisms that drive rainfall-triggered landslides. Physically, a rainfall-triggered landslide develops as the moisture content of the soil and its pore pressure increase. The slope fails when the driving force along the slip failure surface is greater than the shear strength of the material and its cohesion [50]. On steep slopes, the weight of the soil along the slope surface is already significant. In this case, a small amount of additional water weight is likely to initiate a failure. This explains the low variation of the LTF in the high slope ranges. In contrast, for small slope angles, the soil’s weight component along the slip surface that contributes to the slope failure is relatively small, and therefore a substantial additional water weight component is needed to initiate a failure. Hence, for small slope ranges, the LTF exhibits larger variations.
Landslide Triggering Factor Error—False Positive Rate (FPR)
The false positive rate is defined as the probability of falsely rejecting the null hypothesis. In our case, it represents the negative cases in the data that were mistakenly reported as positive or where the LTF threshold was exceeded but there was no landslide event. Consequently, we use the FPR concept to check the adequacy of the LTF threshold value as per Equation (8) below:
where LTFOver is the number of times in the rainfall series that the LTF value exceeded the established LTF threshold, and TotalPeriods is the total number of rainfall/dry periods from the first day of the rainfall series to the actual landslide event. In this case, the maximum observed FPR value for all training cases was 0.271, corresponding to an overall accuracy of about 73%. Similarly, the testing dataset, 35% of the cases, presented a maximum FPR of 0.274, corresponding to an overall performance of 72.6%.
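The FPR check can be sketched as below, assuming that Equation (8) is the ratio LTFOver / TotalPeriods implied by the definitions above. The LTF series is invented; only the threshold of 1.227 is taken from the text, and the function name is hypothetical.

```python
def false_positive_rate(period_values, threshold):
    """Share of rainfall/dry periods before the event whose value exceeds the threshold.

    Assumes FPR = (number of exceedances) / (total number of periods).
    """
    exceedances = sum(1 for value in period_values if value > threshold)
    return exceedances / len(period_values)

# Made-up LTF values for the periods preceding one landslide event
ltf_series = [0.4, 1.5, 0.9, 1.3, 0.2, 0.7, 1.1, 0.3]
print(false_positive_rate(ltf_series, threshold=1.227))
```

The same function applies to the E-D threshold if the series holds accumulated rainfall values and the threshold is the E-D value for the matching duration.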
4.3. Accumulated Rainfall Duration (E-D) Threshold—Dynamic Factors and Slope
The E-D values in the power and linear forms for the training dataset are shown in Figure 7a,b below. In these figures, the E-D threshold values are fitted with the corresponding E-D linear and power forms as per Equations (5a) and (5b) above.
For each form, the fitted curve is given by Equations (9a) and (9b).
Equations (9a) and (9b) define whether a landslide event happens or not. Here, for an observed accumulated rainfall (E) over a time duration (D), if E is greater than the threshold, a landslide should be expected. In this case, the linear form exhibits the higher coefficient of determination (R² = 0.68); thus, we use this form.
Accumulated Rainfall Duration (E-D) Thresholds Error—False Positive Rate (FPR)
We calculate the E-D threshold false positive rate (FPR) using the same approach as for the LTF FPR. In Equation (10) below, we replace LTFOver from Equation (8) above with EDOver.
where EDOver is the number of times in the rainfall series that the E value exceeded the established E-D threshold, and TotalPeriods is the total number of rainfall/dry periods from the first day of the rainfall series to the actual landslide event. In this case, the maximum observed FPR value for all training cases was 0.60.
4.4. LTF Threshold vs. E-D Threshold
It is well known that the antecedent moisture conditions of the soil before a landslide event are critical for landslide initiation. Regardless of the intensity and duration of a rainfall episode, shallow landslides are directly affected by soil moisture conditions [51, 52]. Various physically based analyses have demonstrated that slope instability does not depend only on the intensity of the rain or its duration. Extensive precipitation within a dry period can trigger a landslide just as a low-intensity rainfall during a wet period can [53]. Similarly, pre-existing wet conditions can cause large debris flows during or following a downpour [54].
The E-D method uses the duration and accumulation of the triggering rainfall to establish a threshold value that, once exceeded, will lead to a landslide event. The LTF method, instead, not only considers the triggering rainfall but also evaluates the effects of the preceding precipitation before the triggering rain and the dry period in between the two rainfalls. The maximum observed FPR for the E-D method for the training dataset was 0.60. Conversely, the LTF FPR was 0.271 as noted above.
Figure 8a,b below show the E-D and LTF threshold FPR values for the training and testing datasets. The LTF threshold performs better than the E-D threshold in 71% of the training cases and 81% of the testing cases.
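A case-by-case comparison of this kind can be sketched as below, assuming that "performs better" means a lower FPR at the same landslide location. The FPR values are invented and the function name is hypothetical.

```python
def share_ltf_better(ltf_fpr, ed_fpr):
    """Fraction of cases in which the LTF FPR is strictly lower than the E-D FPR."""
    better = sum(1 for ltf, ed in zip(ltf_fpr, ed_fpr) if ltf < ed)
    return better / len(ltf_fpr)

# Made-up per-location FPR values for four landslide cases
ltf_fpr = [0.10, 0.27, 0.05, 0.20]
ed_fpr = [0.30, 0.20, 0.60, 0.25]
print(share_ltf_better(ltf_fpr, ed_fpr))
```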
The difference in performance between the two thresholds can potentially be explained by the introduction of parameters that simulate the state of the soil before the landslide event and by relating them to the slope inclination. The LTF method considers five dynamic variables (RS1, PR1, RS2, PR2, and DT) that roughly simulate the wet state of the soil before the landslide event, which affects the probability of a landslide. This simplification rests on the notion that the amount of rainfall necessary to trigger a landslide is inversely proportional to the slope angle. Conversely, by design, the E-D threshold does not consider any information about the soil wetness before the event, which limits its performance.
4.5. Landslide Triggering Factor—(LTF) Thresholds Hazard Map
A landslide hazard map that shows the probability of where and when an event would happen, as defined by Guzzetti et al. (2005) [55], can be derived by applying the LTF threshold concept to the slope angle distribution in the study area. From Equation (4) above, we can now derive a dynamic value that can be mapped for all areas where the quantities of both rainfall periods (RS1, RS2), their durations (PR1, PR2), and the dry period in between (DT) exceed the LTF threshold as follows:
By defining DM as in Equation (11):
And then substituting Equations (4), (7), and (10) into Equation (11) and simplifying:
The DynamicMap (DM) quantities represent the average rainfall of the two rainfalls, in mm/day, divided by the dry period between the two rainfalls, in days. In Figure 9 below, a landslide should be expected when the DM quantity exceeds the LTF threshold multiplied by the slope. Areas located in the mountainous Andes region show relatively low DM (Equation (12)) values necessary to trigger a landslide. This is expected, as the area is characterized by steep slopes. Conversely, areas located in the Caribbean Sea coastal region, the Pacific Ocean coastal region, and the lowlands of the Amazon and Orinoco regions, with gentle or very low slopes, require higher DM values.
The DM values map highlights the Andes region as the area most at risk of landslides, with the lowest DM threshold values. From a risk management perspective, new DM values can be derived from weather forecasts, and a landslide should be expected where the newly calculated DM values exceed the DM thresholds presented here.
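The forecast check can be sketched as follows. Because the DM equations are not reproduced here, the sketch uses one possible reading of the verbal description: the mean daily rainfall over both rainfall episodes divided by the dry period between them, in mm/day². The function names, the forecast values, and the mapped threshold are all hypothetical.

```python
def dynamic_map_value(rs1, pr1, rs2, pr2, dt):
    """Assumed DM reading: mean daily rainfall of both episodes over the dry period.

    rs1, rs2: rainfall totals (mm); pr1, pr2: rainfall durations (days);
    dt: dry period between the two rainfalls (days). Result in mm/day^2.
    """
    mean_daily_rainfall = (rs1 + rs2) / (pr1 + pr2)
    return mean_daily_rainfall / dt

def landslide_expected(dm_value, dm_threshold):
    """True when the forecast DM value exceeds the mapped DM threshold."""
    return dm_value > dm_threshold

# Hypothetical forecast for one map cell and a hypothetical mapped threshold
dm_forecast = dynamic_map_value(rs1=120.0, pr1=2, rs2=200.0, pr2=3, dt=2)
print(dm_forecast, landslide_expected(dm_forecast, dm_threshold=30.0))
```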
4.6. Challenges and Limitations
It is important to highlight the limitations of this work. First, the landslide records on which this work was based do not provide a timestamp of the event. For this reason, a daily average of rainfall estimates is used to develop both the logistic model and the LTF thresholds. Information that reflects the exact time of the event could have a significant impact on the LTF and DM thresholds. CHIRPS, for example, provides rainfall information every 6 h, and having a landslide inventory that gives a time of event could be useful to improve performance.
Second, because of the lack of satellite-based antecedent soil moisture information for the study area, wet and dry periods are used to simulate the effect that antecedent soil wetness would have on landslide initiation. This is itself a significant assumption, as the intrinsic physical dynamics of soil moisture conditions are not accounted for. Furthermore, this notion is based on rainfall estimates, and although these approximations have demonstrated high correlations with in-situ gauges, they have their limitations and uncertainties. Consequently, they have a direct effect on the LTF and DM thresholds.
Finally, although the DM map could serve as guidance for the landslide rainfall threshold, it is essential to note that the DM map is based on a 30 m resolution, but many landslides occur at smaller scales.
5. Conclusions
Rainfall-triggered landslides are a significant and constant hazard for the Andes region. This danger, coupled with the scarcity of on-site instrumentation and the limited reliability of remotely sensed information, calls for more imaginative ways to approach the problem. We present a data-driven solution in the form of dynamic variables derived from a satellite infrared precipitation and station dataset, CHIRPS, to simulate soil moisture variations and build a landslide triggering threshold in the Colombian region.
With the assumptions detailed above, we study four years of daily rainfall at 346 landslide events in the region. We focus on the two consecutive rainfall occurrences and corresponding dry periods leading to each one of the landslide events in the inventory. We then use them as dynamic variables. We first investigate the relationship of these rainy/dry periods in a logistic regression, where results demonstrate an acceptable performance of 73%.
Consequently, we take the rainy and dry periods right before the landslide event and simplify them as the PMC value. This value serves as an indicator for the moisture content in the soil before the landslide event. The triggering rainfall episode is then expressed as the RST value. Accordingly, these two factors are normalized with the effect of the slope angle giving rise to the LTF concept.
The LTF model allows for the allocation of threshold values associated with slope angles, and we see that as the slope increases the LTF decreases. The LTF also serves as guidance for landslide hazards in the region, as areas with lower threshold values and high slopes are at a higher risk of landslides and vice versa.
Although the simulated LTF lacks details about the complex processes that drive soil moisture mechanics, it attempts to simulate them by including the triggering rain, the antecedent rainfall, and the dry period in between. When the LTF is compared to the E-D threshold, the LTF performs better for 81% of the testing cases.
Although various physically based landslide rainfall thresholds have been developed for the study area, no satellite-based thresholds currently exist. The LTF threshold is one of the first satellite-based thresholds for the Colombian region that attempts to simulate the effect of antecedent soil moisture in the area.
DM values for the Andes region, which range between 32 and 46 mm/day², mark this as the region with the least precipitation (in two rainfall episodes) necessary to trigger a landslide. Although the DM map could serve as a guide for vulnerability and risk, several challenges should be resolved to “fine-tune” the thresholds. These include introducing a “time of event” parameter and physical or reliable satellite-based antecedent soil moisture information when it becomes available.