*4.2. Increased Efficiency of the Method*

We test the modified *AR-S* method using the regional-scale *S* data [49] and the calibration landslide data set from [35] to calculate thresholds with 0.05 and 0.10 exceedance probability. The 145 landslide events constituting the calibration set yield 435 weighted event dates, of which eight are discarded because they do not meet the *AR* ≥ 5 mm requirement; thus, 427 data instances remain in the threshold analysis (constituting *Q*, Figure 4). The 5% probability level is the most frequently used in landslide hazard and early warning studies [9,41,64]. We also include the 10% level because threshold estimation then relies on a larger data subset *T* (at the 5% probability level, *t* = 85; at the 10% level, *t* = 171). Significance measures mentioned throughout the paper are associated with the significance level p = 0.05. The R open-source software, release 3.4.3 (http://www.r-project.org, last access: 14 April 2019), was used for all analyses. The *AR* thresholds at the 5% and 10% exceedance probability levels were estimated as (Figure 5)

$$AR\ (5\%) = (4.8 \pm 0.6) \times S^{(-1.16 \pm 0.08)} \ \left(R^2 = 0.69\right) \tag{6}$$

$$AR\ (10\%) = (6.4 \pm 0.6) \times S^{(-1.08 \pm 0.07)} \ \left(R^2 = 0.62\right) \tag{7}$$
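Using the central parameter values from Equations (6) and (7), the thresholds can be evaluated directly. The following is a quick illustration only; the function names are ours, and the parameter uncertainties (±0.6, ±0.08, etc.) are ignored:

```python
def ar_threshold(s, alpha, beta):
    """Power-law AR threshold AR = alpha * S**beta (central values only)."""
    return alpha * s ** beta

# Central parameter values from Equations (6) and (7):
def ar_5(s):                       # 5% exceedance probability threshold
    return ar_threshold(s, 4.8, -1.16)

def ar_10(s):                      # 10% exceedance probability threshold
    return ar_threshold(s, 6.4, -1.08)

# The negative exponents encode the inverse S-AR relation: more
# susceptible terrain (larger S) fails under less antecedent rain.
print(round(ar_5(0.5), 1))         # threshold at S = 0.5, approx. 10.7 mm
print(ar_5(0.2) > ar_5(0.8))       # True: lower-S terrain needs more rain
```

Note that, for any given *S*, the 10% curve lies above the 5% curve, consistent with the higher exceedance probability leaving more events below the threshold.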

Contrary to the unrealistic results obtained with the original *AR-S* approach (Equations (3) and (4)), we obtain here plausible, marked inverse relations between *S* and *AR* [65,66]. Moreover, the threshold equations are now associated with meaningful average R<sup>2</sup> coefficients of 0.69 and 0.62, and all bootstrap iterations provide significant α and β parameters for both thresholds. Recall that the bootstrap procedure here consists of repeating the threshold calibration phase 5000 times, each iteration being based on random sampling (with replacement) from the *R* data set until the number of sampled data equals that of the *r* points of the data set. The subset of lowest-*AR* data is then selected from the random sample before threshold estimation. The means and standard deviations of the 5000 estimates of α and β define the parameter values and their uncertainties (Δα and Δβ). These results indicate an excellent performance of the modified *AR-S* threshold approach, in which the spread of the data subset used for threshold calibration is forced over the entire *S* range. Obviously, the strongly negative slopes result in decreased values of the intercept α in Equations (6) and (7) compared with Equations (3) and (4), respectively.
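The bootstrap procedure described above can be sketched as follows. This is a minimal illustration under our own simplifying assumptions, not the authors' implementation: the class binning, the lowest-*AR* fraction `frac`, and the ordinary least-squares fit in log-log space are ours, and the data are invented.

```python
import math
import random
import statistics

def fit_power_law(points):
    """OLS fit of log(AR) = log(alpha) + beta * log(S); returns (alpha, beta)."""
    xs = [math.log(s) for s, _ in points]
    ys = [math.log(ar) for _, ar in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    alpha = math.exp(my - beta * mx)
    return alpha, beta

def bootstrap_threshold(data, frac=0.2, n_classes=5, n_iter=5000, seed=1):
    """data: list of (S, AR) pairs.  Each iteration resamples with
    replacement until the sample matches the original size, keeps the
    lowest-AR fraction in each log(S) class (forcing the calibration
    subset to spread over the whole S range), and fits a power law.
    Returns mean/std of alpha and beta over all iterations."""
    rng = random.Random(seed)
    log_s = [math.log(s) for s, _ in data]
    lo, hi = min(log_s), max(log_s)
    alphas, betas = [], []
    for _ in range(n_iter):
        sample = [rng.choice(data) for _ in data]   # resample, same size
        subset = []
        for k in range(n_classes):                  # one bin per log(S) class
            a = lo + k * (hi - lo) / n_classes
            b = lo + (k + 1) * (hi - lo) / n_classes
            cls = [p for p in sample if a <= math.log(p[0]) <= b]
            cls.sort(key=lambda p: p[1])            # lowest-AR data first
            subset += cls[:max(1, int(frac * len(cls)))]
        alpha, beta = fit_power_law(subset)
        alphas.append(alpha)
        betas.append(beta)
    return (statistics.mean(alphas), statistics.stdev(alphas),
            statistics.mean(betas), statistics.stdev(betas))

# Toy usage: synthetic data following a noisy inverse S-AR relation.
gen = random.Random(0)
data = []
for _ in range(200):
    s = gen.uniform(0.05, 1.0)
    data.append((s, 5 * s ** -1.1 * gen.uniform(0.5, 2.0)))
a_mean, a_sd, b_mean, b_sd = bootstrap_threshold(data, n_iter=200)
```

The recovered `b_mean` is negative, as expected for an inverse relation, and `a_sd`/`b_sd` play the role of Δα and Δβ.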

**Figure 5.** Log–log plot of antecedent rain (mm) vs. landslide susceptibility (regional scale, [49]) for the landslide events on the reported day and on the days before and after that date (point size proportional to the associated weights, i.e., 0.67 and 0.17, respectively). The green and red curves are the *AR* thresholds at the 5% and 10% exceedance probability levels, respectively, with their uncertainties shown as shaded areas; both were obtained with the modified *AR-S* method (Figure 4). Ndata is the number of data in the expanded calibration set. The dashed lines delimit the log(*S*) classes.
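As the caption notes, each reported event date is expanded into three weighted dates (the reported day plus the days before and after it), and dates failing the *AR* ≥ 5 mm requirement are discarded. A minimal sketch of this expansion; the function name, data structures, and rainfall values are ours, invented for illustration:

```python
from datetime import date, timedelta

def expand_events(events, ar_lookup, ar_min=5.0):
    """Expand each reported event date into three weighted dates: the
    reported day (weight 0.67) and the days before and after (0.17 each),
    discarding dates that fail the AR >= ar_min (mm) requirement.
    events: list of (date, S) pairs; ar_lookup: dict date -> AR in mm."""
    weighted = []
    for day, s in events:
        for offset, w in [(0, 0.67), (-1, 0.17), (1, 0.17)]:
            d = day + timedelta(days=offset)
            ar = ar_lookup.get(d)
            if ar is not None and ar >= ar_min:
                weighted.append((d, s, ar, w))
    return weighted

# Toy usage with one invented event and rainfall record: the 3.2 mm date
# fails the AR >= 5 mm requirement and is dropped, leaving two dates.
events = [(date(2014, 3, 10), 0.4)]
ar_lookup = {date(2014, 3, 9): 12.0,
             date(2014, 3, 10): 18.5,
             date(2014, 3, 11): 3.2}
rows = expand_events(events, ar_lookup)   # two weighted (date, S, AR, w) rows
```

The same filtering, applied to the full calibration set, is what reduces the 435 weighted event dates to the 427 instances retained in *Q*.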

Though it performs satisfactorily, the modified *AR-S* threshold method leaves two minor issues open. The first is the very close parameterization of the 5% and 10% thresholds, caused by the similar actual FNRs of 0.05 and 0.07 obtained from the 5% and 10% thresholding, respectively. In particular, the too-low actual FNR associated with the 10% threshold equation betrays the real nature of the problem, which lies in the insufficient number of data in the low-*S* classes, preventing the constitution of a complete data subset *T* for estimating the desired threshold. This issue is independent of the size of the original data set because, however large the number of recorded events, their distribution across the *S* range will remain similarly unequal, with low-*S* classes relatively deficient in data, especially for thresholds with higher exceedance probabilities, which demand larger calibration subsets. Owing to the specific distribution of the data in the *AR-S* space, the *AR-S* approach inevitably implies a trade-off between high exceedance probability levels and a degraded distribution of the data from which the threshold is estimated. Fortunately, the more conservative low-exceedance-probability thresholds (typically 5%) are the least affected by this issue.
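The actual FNR figures quoted above can be checked mechanically: for a calibrated threshold, the actual FNR is simply the (weighted) share of calibration points lying below the fitted curve. A minimal sketch with invented points; the function name and data are ours:

```python
def actual_fnr(points, alpha, beta, weights=None):
    """Weighted fraction of (S, AR) points below the threshold
    AR = alpha * S**beta, i.e. events the threshold would fail to
    flag (false negatives)."""
    if weights is None:
        weights = [1.0] * len(points)
    missed = sum(w for (s, ar), w in zip(points, weights)
                 if ar < alpha * s ** beta)
    return missed / sum(weights)

# Three invented points checked against the Eq. (6) central values;
# only the first lies below the curve (threshold at S = 0.5 is ~10.7 mm).
pts = [(0.5, 8.0), (0.5, 15.0), (0.2, 40.0)]
print(actual_fnr(pts, 4.8, -1.16))   # 1 of 3 points missed -> 0.333...
```

For a well-behaved 10% threshold, this quantity should sit near 0.10 on the calibration set; the 0.07 obtained in practice is the symptom of the data-scarce low-*S* classes discussed above.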

High relative uncertainties (on the order of 10%) on parameter α might be another source of concern. However, beyond being subjective, the criterion chosen by [9] to qualify threshold quality, namely a relative uncertainty greater than 10%, is barely usable in the *AR-S* space, where the many outliers in the data distribution alter the efficiency of the bootstrap technique of uncertainty estimation (see Section 5). Moreover, in addition to the fit uncertainty, the bootstrap-based parameter errors obtained here from our weighted approach include the event-date uncertainty and are also affected by the partly erratic character of the data distribution, inherent to the combination of ground (*S*) and meteorological (*AR*) variables on which the method relies. We thus conclude that the benefits of a method yielding thresholds directly modulated by the environmental conditions greatly outweigh the shortcomings of a slightly higher uncertainty, mainly on the threshold line intercept.
