**4. Discussion**

Choosing the best-suited method from the obtained observations demands a detailed analysis of the effect of the thresholds on the study area. Separate statistical analyses were conducted for the landslide occurrences in each polygon. As pointed out in a recent study, statistical attributes are reliable parameters for comparing different methodologies for threshold definition [48]. When the available information about the distribution of rainfall is coarse, the likelihood of underestimating the threshold values is higher; from an operational point of view, this could lead to a number of false alarms. Hence, it is important to complement the analysis with quantitative statistical attributes. The attributes are calculated using a confusion matrix that compares the prediction of each defined threshold with the occurrence of landslides. The daily threshold predictions over the study period (2010–2018) were used for the verification of the thresholds. A true positive is counted when the threshold is exceeded and a landslide is reported on that day. If no landslide is reported when the threshold is exceeded, a false positive is counted. Similarly, if a landslide is reported without the threshold being exceeded, it is considered a missed alarm and counted as a false negative. When the threshold is not exceeded and no landslide is reported, a true negative is counted.
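The daily counting scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function and variable names are assumptions, and the inputs are boolean daily series of threshold exceedances and landslide reports.

```python
def confusion_counts(exceeded, landslide):
    """Count TP/FP/FN/TN from two daily boolean series.

    exceeded[i]  -- True if the threshold was crossed on day i
    landslide[i] -- True if a landslide was reported on day i
    """
    tp = fp = fn = tn = 0
    for e, l in zip(exceeded, landslide):
        if e and l:
            tp += 1   # true positive: alarm issued, landslide occurred
        elif e and not l:
            fp += 1   # false positive: alarm issued, no landslide
        elif l and not e:
            fn += 1   # false negative: missed alarm
        else:
            tn += 1   # true negative: no alarm, no landslide
    return tp, fp, fn, tn
```

Applied to the full 2010–2018 daily record of one threshold, these four counts form the confusion matrix from which the statistical attributes are derived.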

A perfect model would predict all the landslides correctly, without any false positives or false negatives. The performance of our tests is far from perfect, but the main objective of this work is to compare, in relative terms, the different approaches used for threshold analysis. The higher the numbers of true positives and true negatives, the better the model. Here, we use derived parameters such as efficiency, sensitivity, specificity, and the likelihood ratio for a better understanding of the relative performance of the different thresholds. These statistics can help optimize the threshold model configuration by identifying a balance between false and missed alarms. The results are summarized in Table 1 below.
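The derived parameters follow the standard definitions from the confusion matrix. The sketch below uses those textbook formulas (including the negative predictive power discussed later); the function name and the illustrative counts are assumptions, not values from Table 1.

```python
def skill_scores(tp, fp, fn, tn):
    """Standard skill scores derived from confusion-matrix counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "efficiency": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # likelihood ratio combines sensitivity and specificity in one number
        "likelihood_ratio": sensitivity / (1.0 - specificity),
        "negative_predictive_power": tn / (tn + fn),
    }

# Illustrative counts only (not taken from the paper):
scores = skill_scores(tp=8, fp=40, fn=2, tn=950)
```

With counts dominated by true negatives, as in this study, the efficiency and negative predictive power are close to one even when many false alarms are issued, which is why the likelihood ratio is the more discriminating comparison metric.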

The high number of false positives in all the cases points towards a low positive predictive power of the model. The very high number of true negatives, in comparison with the order of magnitude of the other parameters, increases the efficiency of the model. The thresholds are conservative in nature, with much lower numbers of false negatives, and the negative predictive power is very close to one in all the cases. As expected, the number of false positives increases as the exceedance probability decreases, which reduces the efficiency considerably. Conversely, if the exceedance probability is raised above 5%, the number of missed alarms increases beyond 5%, which is also not acceptable. Hence, a 5% exceedance probability can be considered when defining a threshold.
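A threshold at a given exceedance probability can be obtained with a standard frequentist fit of the intensity–duration power law, lowering the intercept so that only the chosen fraction of triggering events falls below the curve. The sketch below is a generic illustration of that technique under assumed inputs (durations in h, intensities in mm/h); it is not the authors' calibration code.

```python
import numpy as np

def id_threshold(durations, intensities, exceedance=0.05):
    """Frequentist power-law threshold I = alpha * D**(-beta).

    Fits log10(I) = a + b*log10(D) by least squares, then shifts the
    intercept to the `exceedance` quantile of the residuals, so that
    roughly that fraction of triggering events lies below the curve.
    """
    logD = np.log10(np.asarray(durations, dtype=float))
    logI = np.log10(np.asarray(intensities, dtype=float))
    b, a = np.polyfit(logD, logI, 1)                 # slope, intercept
    residuals = logI - (a + b * logD)
    a_shift = a + np.quantile(residuals, exceedance)  # lower the line
    return 10.0 ** a_shift, -b                        # alpha, beta
```

For a given duration `d`, the threshold intensity is then `alpha * d ** (-beta)`; lowering `exceedance` lowers the curve, trading missed alarms for false alarms as discussed above.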

A perfect prediction model would have sensitivity and specificity values of one. Sensitivity indicates the true positive rate of the model, and specificity indicates the true negative rate. In this study, the T0.05 thresholds have sensitivity values of one, but this comes at the cost of very low specificity values, which is not acceptable. The likelihood ratio considers the effects of both sensitivity and specificity at the same time and can be taken as a reliable parameter for comparing different methods [49]. The analysis shows that the Imax thresholds have the maximum likelihood ratio at the three different exceedance probabilities. At the same time, the separate thresholds derived for each polygon perform well. Hence, this study proposes a regional-scale threshold at 5% exceedance probability using the Imax approach, together with four separate thresholds, one for each polygon, operating at the local scale.

Where the polygon-wise thresholds are lower than those derived from the merged dataset (R2 and R4), the single regional-scale threshold performs better than the separate polygon-wise thresholds owing to a lower number of false alarms. In the other two polygons (R1 and R3), the polygon-wise thresholds can be preferred over the regional-scale threshold. Separate local-scale thresholds have the advantage of more uniform climatic and geological conditions, but the lower number of events available for calibration, especially in R3 and R4, is the major constraint in their definition. However, when creating a single dataset for the whole region, the merged approach of considering the nearest rain gauge is less likely to be adopted than the Imax approach. In the case of the peak intensity approach, a high-intensity burst at the beginning of a rainfall event produces a false alarm that is sustained throughout the event, continually predicting the possibility of a landslide. Even though the thresholds defined in this way appear higher than those of all the other approaches, the method is not found to be effective in reducing false alarms.

These results are useful for understanding the sensitivity of I–D threshold models to boundary conditions such as the rain gauge selection, the intensity definition, and the strategy of subdividing the area into independent alert zones. The derived thresholds are not yet ready to be operated in a LEWS, but the results highlight shortcomings that could be addressed with future improvements. For instance, it would be very useful to use rainfall data with a higher temporal resolution (e.g., hourly) and to take into account the effect of antecedent rainfall conditions during the monsoon season by using state-of-the-art approaches such as weighted antecedent precipitation indexes [49–51] or soil moisture estimates [21].
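One common formulation of a weighted antecedent precipitation index is the recursive form API_t = k · API_(t−1) + P_t, where k is an empirical daily decay factor. The sketch below illustrates that generic formulation only; the decay value and function name are assumptions, and the specific weighting schemes of [49–51] may differ.

```python
def antecedent_precipitation_index(daily_rain, k=0.85):
    """Recursive antecedent precipitation index: API_t = k*API_{t-1} + P_t.

    daily_rain -- daily rainfall totals (mm)
    k          -- empirical decay factor (0 < k < 1); 0.85 is a commonly
                  quoted placeholder value, not calibrated for this region
    """
    api, series = 0.0, []
    for p in daily_rain:
        api = k * api + p   # yesterday's index decays, today's rain adds
        series.append(api)
    return series
```

Such an index could complement the I–D thresholds by flagging days on which the soil is already wet from earlier monsoon rainfall, when smaller triggering events become critical.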


**Table 1.** Statistical comparison of the derived thresholds (the maximum likelihood ratio values are highlighted in bold).

*Water* **2020** , *12*, 1000
