## *4.3. Threshold Validation*

The validation of the national thresholds was based on two subsets of data: (i) a calibration set containing 70% of all reconstructed rainfall conditions (258), and (ii) a validation set containing the remaining 30% (110). The two subsets were randomly selected 100 times, so that each validation run used a different random split while the number of conditions in each subset remained the same. In addition, all the rainfall conditions that (presumably) did not cause landslides in the considered period were also reconstructed. In each run, the thresholds at different NEPs, calculated using the MPRCs in the calibration set, were compared with the MPRCs in the validation set and with the rainfall conditions that did not trigger landslides. Therefore, 100 contingency tables were determined [33,51], reporting true positives (TP, i.e., landslide-triggering rainfall conditions predicted by the thresholds), false positives (FP, i.e., rainfall conditions not resulting in landslides incorrectly classified as landslide-triggering), true negatives (TN, i.e., rainfall conditions not resulting in landslides and not predicted by the thresholds) and false negatives (FN, i.e., landslide-triggering rainfall conditions located below the thresholds). From each table, three skill scores were calculated: the true positive rate, i.e., TPR = TP/(TP + FN); the false positive rate, i.e., FPR = FP/(FP + TN); and the true skill statistic, i.e., TSS = TPR − FPR. Moreover, the FPR and TPR values were used to draw the receiver operating characteristic (ROC) curve (Figure 8). The best prediction is achieved when TPR = 1 (all observed landslides correctly detected) and FPR = 0 (no false positives) and is represented by the upper left green point in Figure 8 (best prediction point). The threshold closest to the best prediction point is assumed to be optimal.
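The skill scores derived from a single contingency table can be sketched as follows. The counts used in the example are illustrative only (they are not taken from Table 2), and the Euclidean form of the distance δ to the best prediction point is an assumption, since the paper does not spell out its definition:

```python
import math

def skill_scores(tp, fp, tn, fn):
    """Skill scores from one validation contingency table.

    Returns TPR, FPR, TSS and delta, the distance from the best
    prediction point (FPR = 0, TPR = 1) in ROC space. The Euclidean
    distance used for delta is an assumption, not stated in the text.
    """
    tpr = tp / (tp + fn)                # true positive rate
    fpr = fp / (fp + tn)                # false positive rate
    tss = tpr - fpr                     # true skill statistic
    delta = math.hypot(fpr, 1.0 - tpr)  # distance to (0, 1) in ROC space
    return tpr, fpr, tss, delta

# Illustrative counts only (hypothetical, not from Table 2)
tpr, fpr, tss, delta = skill_scores(tp=95, fp=30, tn=210, fn=15)
```

Repeating this calculation for each of the 100 random splits, and averaging, yields mean indexes of the kind reported in Table 2.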

Table 2 reports the mean values of the performance indexes of the thresholds calculated at different NEPs over the 100 validation runs. As the non-exceedance probability increases, the number of false negatives rises and that of the true positives decreases. Conversely, lowering the thresholds causes an increase in the number of false positives and a decrease in the number of true negatives. If the thresholds are used in a LEWS, false positives lead to false alarms and false negatives lead to missed alarms. It should be noted that the number of false positives may be greatly overestimated due to a lack of landslide information, i.e., many landslides may have occurred but were not recorded. Likewise, the true negatives may also be overestimated. It has been observed that even a slight underestimation of the number of landslide occurrences can lead to an increase in the uncertainty of the prediction (and consequently of the system) performance [51].

**Figure 8.** Classification of thresholds at different non-exceedance probabilities (black points) in the ROC space. The threshold closest to the best prediction point (green point) is the optimal threshold. Horizontal and vertical bars represent the range of variation of TPR and FPR over the 100 runs in which the MPRCs are randomly selected.

**Table 2.** Mean values of the performance indexes of the thresholds calculated at different non-exceedance probabilities. The 15% threshold has the highest scoring indexes. NEP, non-exceedance probability; TP, true positive; FN, false negative; FP, false positive; TN, true negative; TPR, true positive rate; FPR, false positive rate; TSS, true skill statistic; δ, distance from the perfect classification point. The optimal value of TPR and TSS is 1, while that of FPR and δ is 0.


The validation showed that the best-performing threshold is that at 15% NEP, which has the shortest distance δ from the best prediction point and the highest mean TSS value over the 100 validation runs (Table 2; Figure 8). This threshold is represented by the equation:

$$E = (8.9 \pm 1.0)D^{(0.42 \pm 0.03)} \tag{2}$$

The relative uncertainties of these parameters are slightly higher (Δα/α = 11.2%; Δγ/γ = 7%) than those reported in Table 1. This is due to the smaller number of rainfall conditions available for calibration (258 out of 368).
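Applying the validated 15% threshold of Equation (2) amounts to comparing the cumulated event rainfall *E* (in mm) of a rainfall condition of duration *D* (in h) with the threshold value. A minimal sketch, using the mean parameter values and ignoring their uncertainties (the function names are hypothetical):

```python
def threshold_E(D, alpha=8.9, gamma=0.42):
    """Mean 15% NEP threshold from Equation (2): E = alpha * D**gamma.

    alpha (mm) and gamma are the mean fitted parameters; their
    uncertainties (+/- 1.0 and +/- 0.03) are ignored in this sketch.
    """
    return alpha * D ** gamma

def exceeds_threshold(E_mm, D_h):
    """True if a rainfall condition (E in mm, D in h) plots above the
    threshold, i.e. would be classified as potentially triggering."""
    return E_mm > threshold_E(D_h)

# For a 24 h event, the mean threshold is about 33.8 mm of rain.
```

In a LEWS, a condition plotting above the curve would raise an alert; the parameter uncertainties could be propagated to obtain an uncertainty band around the threshold rather than a single curve.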

## *4.4. Comparison with Other Thresholds*

Comparing the proposed new thresholds with the existing ones, in particular with the Slovenian threshold calculated by Rosi et al. [25] (4 in Figure 9), a large difference in the intercept of the thresholds and a small difference in the slope of the functions is noticeable: the new thresholds T5,SVN and T15,SVN are much lower than the previously calculated Slovenian thresholds [25]. Nevertheless, the thresholds defined in this work are higher than those defined for Central and Southern Europe (an area that includes Slovenia) by Guzzetti et al. [6] (2 in Figure 9) and lower, in particular at short durations, than the global thresholds by Caine [5] and Guzzetti et al. [7] (1 and 3 in Figure 9, respectively). These differences can mainly be ascribed to the use of different sets of input data, such as the number of landslides and the time period considered, as well as the available rainfall data. Rosi et al. [25] used landslides that occurred between 2007 and 2014 and a limited rainfall dataset (1 rain gauge per 460 km²). On the other hand, T5,SVN was defined with the same method, the same temporal resolution of the rainfall data (hourly) and the same non-exceedance probability as the thresholds for Italy [39] (5 in Figure 9) and for the Italian Alpine area [40] (6 in Figure 9).

Interestingly, T5,SVN is very similar to the Italian threshold, while its slope differs from that of the Alpine threshold. Comparing the 5% threshold defined for Slovenia with that defined with the same approach for Italy, some differences are observed. The Slovenian threshold has a similar slope and a lower intercept than the Italian one. Furthermore, the relative uncertainties for the Slovenian case study are higher. This is due to the lower number of empirical data points (368 compared to the 2309 in the Italian case) and to a different distribution of points in the *ED* graph. In fact, the percentage of MPRCs with *D* ≤ 6 h is 5.4% in the Slovenian case and 12% in the Italian case. This is due to the coarser (daily vs. hourly) temporal resolution of the landslide data in Slovenia.

The difference between the new Slovenian thresholds and the threshold for the Alpine chain can be ascribed to the same cause. One could have expected the Slovenian threshold to be similar to the Alpine one, given the similar environment and latitude. However, this difference is again related to the different temporal resolution of the two landslide catalogs: daily for Slovenia and hourly for the Alps. Working with daily landslide information can result in missing several very short (<6 h) rainfall events that can drive the slope of the threshold.

**Figure 9.** Comparison between the 5% and 15% thresholds for Slovenia and other global (1 and 3), regional (2 and 6) and national (4 and 5) thresholds. Sources, numbered chronologically: 1, global threshold by Caine [5]; 2, threshold for Central and Southern Europe by Guzzetti et al. [6]; 3, global threshold by Guzzetti et al. [7]; 4, national threshold for Slovenia by Rosi et al. [25]; 5, national threshold for Italy by Peruccacci et al. [39]; 6, threshold for the Alps by Palladino et al. [40].
