*2.2. Data*

This study is based on land surface and landslide data from various sources and at different spatial resolutions. As our target resolution, we used 10 m × 10 m to account for the dependence of landslide susceptibility on local-scale topography.

**Figure 1.** Overview of study area. (**A**): Location of study areas in Austria. (**B**): Landslides visible in Waidhofen's slope map (red rectangle in (**C**)). (**C**): HRDTM and derived landslide inventory. (**D**): A landslide on cropland in the district South East Styria, which occurred after an extreme rainfall event in June 2009, and "disappeared" in the following years. Photo taken on 23 July 2009.

> 2.2.1. Land Surface Data

For both study areas, airborne LiDAR-based HRDTMs of 1 m × 1 m resolution were available, which were provided by the GIS department of the Styrian governmen<sup>t</sup> for Paldau (acquisition year: 2009), and by the provincial governmen<sup>t</sup> of Lower Austria for Waidhofen (acquisition year: 2017), respectively (Table A1).

Datasets of hydrologic and hydropedologic characteristics were compiled by the Austrian Research Centre for Forests in 2014 for Waidhofen [39] (50 m × 50 m resolution) and in 2017 for Paldau [40] (100 m × 100 m resolution), respectively, and were made available by the respective federal states. The datasets include information on soil parameters such as soil type, total pore volume, and hydraulic conductivity of the topsoil (0–20 cm).

Geological basemaps at a 1:50,000 scale were provided by the Geological Survey of Austria for Waidhofen and by the Styrian GIS department for Paldau, respectively. In both geological basemaps, alluvial deposits were corrected in order to match valley floors visible in the HRDTM. Furthermore, we reclassified the geological units into a smaller number of relevant classes. In Waidhofen the reclassification was based on lithology and geomechanical properties which resulted in seven lithological units of (i) alluvial deposits, (ii) talus and glacial deposits, (iii) Inneralpine Neogene, (iv) Klippen zone, (v) flysch zone, (vi) Upper Austroalpine marls, and (vii, reference level) Upper Austroalpine limestone. In Paldau, we specified five geological units based on grain size distribution and age of origin: (i, reference level) Neogene formations with coarse-grained layers, (ii) Neogene formations dominated by fine-grained sediments, (iii) pre-Würmian Pleistocene formations, (iv) Würmian and Holocene sediments, and (v) other units.

#### 2.2.2. LULC Legacy

By combining unique historical spatially explicit information on LULC in the two study areas and numerical information on land use intensity, we were able to create information on LULC legacies that could be used as an input in landslide susceptibility modeling. This is, to our knowledge, the first study that uses such a long, multi-temporal and spatially explicit LULC record to better understand regional-scale landslide susceptibility.

To generate this dataset, we collected, digitized and harmonized data depicting different spatial patterns for the years 1820, 1960 and 2015 (Table 1). Our data sources included the Franciscan Cadastre of 1820, aerial photographs of 1960, and aerial orthophotos combined with Integrated Administration and Control System (IACS) data of 2015 ("present-day"). Numerical information on LULC intensity (e.g., agricultural yields and machinery numbers) was collected at communal, district or provincial levels from archival sources and both historical and recent statistics, and was up- or downscaled to match the municipal level. The data sources assigned to a time cut slightly differed in their year of origin between the municipalities (Table 1). The selection of the time cuts was based on characteristic socio-economic framework conditions of the Central European cultural landscape following Bender [41] (i.e., time cuts around 1850, 1914, 1960 and present age) as well as on the availability and temporal proximity of data.

We created a vector GIS database with LULC polygons from these sources using ESRI ArcGIS. We distinguished between forest area, cropland and grassland, and classified the remaining LULC as "settlement and other" (for details refer to Table A2 in the Appendix A). Due to varying data quality (Q, Table 1), in the first step, the cadastral maps (1820) and orthophotos (2015) were georeferenced and digitized (EPSG: 32633) since these allowed us to map boundaries between distinct land uses. In a second step, we used these layers as base maps for digitizing the less distinct areas from greyscale aerial photographs (1960). Spatially persistent natural and built structures were kept unchanged for all time cuts. Furthermore, we consistently used a display scale of 1:1000 to keep the digitization error constant. The estimated overall positional error is up to 3–5 m for 1820, 5–10 m for 1960, and less than 3 m for 2015.


**Table 1.** Sources of land use legacy information of at least acceptable spatial and thematic quality identified for the case study areas.

Wh: Waidhofen, P: Paldau; **\*** Data status of temporal extent of 2016; Quality (Q): ++: high spatial and thematic quality, +: high spatial or thematic quality, ~: acceptable spatial and thematic quality. Adapted from Knevels et al. [45].

> Based on the numerical information, we quantified biomass extraction as yields (kg fresh weight per ha and year, kg FW/ha/a) at the lowest possible administrative level and allocated it to cropland, grassland and forests. Biomass extraction represents the output from the agricultural production system and is thus an indicator of the LULC intensity during historical time periods [46]. For the biomass extraction from cropland, we employed the cereal yields as proxy; data on harvest from meadows was collected for grassland yields, and wood extraction was derived from forest yields.

> We finally intersected the three vector datasets for 1820, 1960 and 2015 successively to one final file, keeping all available attributes, and converted the result to raster format at the uniform target resolution. Biomass extraction was summed over the time cuts to obtain a cumulative land use intensity (Figure A1 in Appendix A). This is an innovative approach for this socio-ecological indicator in the context of landslide science as other authors calculated cumulative materials flows [47] or greenhouse gas emissions [48] in other application contexts. To avoid artifacts due to geometric inaccuracies inherent in the vector files and from sliver polygons, isolated pixels were identified, and the affected grid cells were excluded from the landslide sampling design described below (see Section 2.2.3). The estimated digitization errors (3–10 m) might partly be counteracted by using a target resolution of 10 m × 10 m (i.e., resolution corresponds to largest estimated error).

The created historical LULC legacy dataset was made publicly available [49].

#### 2.2.3. Airborne LiDAR-Derived Landslide Inventories

For both study areas, we derived historical landslide inventories by mapping landslides visible in the HRDTM following the approach proposed by Schulz [22]. We included landslides in earth and debris materials, focusing on the slide type with possible transitions

to complex slide flows according to the classification scheme by Cruden and Varnes [50]. Landslides were digitized as polygon features also distinguishing between landslide body and scarp (Figure 1B). Selected mapped landslides were inspected in the field for validation, and we corrected the inventory where necessary. In total, in Waidhofen, 621 landslides were mapped, covering 5.31% of the municipal area, and in Paldau, 418 landslides (4.14% of the area; Figure 1C; Table A3 in the Appendix A).

Following Hussin et al. [51], we randomly sampled landslide presence points in the landslide scarp area using the recommended 50-m distance constraint, and attributed the point to the corresponding grid cell in our target grid (in total 974 and 559 landslide points in Waidhofen and Paldau, respectively). For sampling non-landslide points, we defined the landslide-free area by excluding the mapped landslides and a surrounding 50-m buffer to account for digitization errors. We furthermore excluded so-called trivial areas—areas considered as not susceptible to landsliding (e.g., floodplains, flat areas) [52]. Isolated grid cells (see Section 2.2.2.) and anthropogenic structures with similar geomorphometric characteristics as landslides (e.g., quarries) were also masked. For the landslide absence locations, we distributed random points in a 1:1 sampling ratio using a minimum nearestneighbor distance of 50 m to reduce spatial autocorrelation.

## *2.3. Methods*

The relationships between LULC legacies and landslide distribution were analyzed using GAMs [53,54] while also accounting for the local topography as an important preparatory factor for landslides [8]. GAMs have become popular in landslide susceptibility studies due to their ability to model nonlinear relationships while allowing for a separate interpretation of additive effects in terms of odds ratios and variable importance [55].

Our analysis was conducted in the free and open source computing environment R (R version 3.5.3) [56]. We used the GAM implementation of the *mgcv* package [54] and the *mlr* package [57] as the modeling framework. Furthermore, for terrain analysis we used System for Automated Geoscientific Analysis (SAGA) GIS 6.3.0 [58] through *RSAGA* [59] and Terrain Analysis Using Digital Elevation Models (TauDEM) 5.3 [60] via R system calls.

For downscaling the input data to the target resolution, we applied bilinear interpolation for resampling using SAGA GIS. However, we acknowledge that we are unable to capture local-scale patterns of geology or soil parameters.

#### 2.3.1. Landslide Susceptibility Modeling Design

For landslide susceptibility modeling, we related land surface variables, soil parameters, lithological units and LULC legacies as predictors to landslide occurrences (Table 2 for overview).

Our model design enabled us to explore relationships between LULC legacy and landslide distribution, and to improve the understanding of the potential biases in airborne LiDAR-derived landslide inventories. We created landslide susceptibility models with different sets of input variables: (i) The baseline model 'GAM-Base' excludes LULC legacy variables; for the assessment of the LULC legacy effect, we built (ii) a GAM using the present-day LULC as an additional predictor (GAM-2015), (iii) a GAM based on the LULC legacy information from 1960 to 2015 (GAM-1960), and (iv) a GAM based on the LULC legacy information since 1820 (GAM-1820). Moreover, we tested (v) a GAM using the setting of GAM-2015, but excluding potentially inventory-biasing observations following the recommendations of Petschko et al. [27] (GAM-2015-Masked; i.e., all observations located in continuously forest-covered areas). Furthermore, we allowed modeled landslide occurrences to be dependent on the combined effect of the historical biomass extraction and present-day LULC class rather than modeling LULC legacy variables as additive terms. Thus, the predictor of LULC legacy information in GAM-1820 and GAM-1960 was implemented as a parametric, linear interaction term between the LULC of 2015 and the historical biomass extraction (sum since 1820 or 1960). Moreover, we tested the model's

transferability between study areas, but excluded predictors that are specific to each area (i.e., lithology).

For both study areas, the airborne LiDAR-derived historical landslide inventory and the created landslide susceptibility models are available as Supplementary Materials.


**Table 2.** Predictor variables for landslide susceptibility modeling.

Setting, scale-dependent parameters: r: radius; w,t,e: Parameters in SAGA GIS module Relative Heights and Slope Positions; \* ref = reference level (see Section 2.2.1).

#### 2.3.2. Assessment of the Effect of Land Use Legacy

The empirical effect of LULC legacy on landslide occurrence was assessed in terms of model performance, variable importance, odds ratios (OR) of the LULC classes, and transformationfunctionplotsofthethreemostimportantpredictor-responserelationships.

For the model assessment, we applied a *k*-fold spatial cross-validation (SpCV) to achieve independent test areas and thus a bias-reduced predictive performance as a measure of model generalization [55]. For SpCV, we partitioned the data into five disjoint folds using *k*-means clustering of the coordinates (*k* = 5), and repeated this procedure 100 times. In each repetition, subsequently, four folds were used as training data while the remaining fold was used for validation until each fold was used once for validation (i.e., *k* models per repetition, 500 models in total). Furthermore, we ensured comparability between the models' performance estimates using identical training and validation data for each study area and repetition.

The area under the receiver operating characteristic curve (AUROC) was computed as the performance measure. The AUROC is a quality measure suitable for binary response data and is a common evaluation tool for landslide susceptibility models [55]. AUROC values lie between 0.5 (no discrimination) and 1.0 (perfect discrimination), and were interpreted following the recommendations of Hosmer et al. [68]. Additionally, differences in model performances were tested using Wilcoxon signed-rank tests (*α* = 0.05; R *coin* package) [69,70], and *p*-values were adjusted for multiple comparisons according to Benjamini and Hochberg [71].

As a measure of variable importance, we extracted for each variable the mean decrease in deviance explained (mDD, %) under the consideration of all SpCV models. The mDD indicates the explanatory contribution of a variable to the overall explained deviance of the corresponding model [72,73]. The higher the mDD value, the greater is the contribution of a variable, and thus its importance. To compute the mDD, we left the variable of interest out while fixing the remaining smoothing parameters (*mgcv::gam sp* argument) during model (re-)training, and subsequently measured the percentage differences of the deviance explained [73].

Transformation function and OR plots were used to explore the relationships between landslide distribution and the three most important predictors as well as LULC legacy of each model setting. Additionally, we extracted comparative predictor-response relationships reported in studies of the same area [73,74]. A transformation function plot shows the predictor-response relationship as a parametric (linear) or non-parametric smoothing function by using the additive structure of a GAM. We visualized predictor-response relationships on the logit scale (i.e., linear predictor scale). An OR indicates the chance that an outcome occurs given a specific exposure, relative to a reference exposure [75]. An OR less than one means an exposure with lower odds of the outcome while an OR higher than one shows an exposure with higher odds of the outcome, while accounting for the other variables in the model; an OR of one is associated with no influence of the exposure [75]. We calculated ORs for the LULC classes with 'forest' as reference level. Additionally, we derived ratios of ORs (rOR) by dividing a model's predictor-response relationships by the corresponding relationship in GAM-Base. rOR enables a more sensitive comparison between models.
