*2.3. Variable Selection*

The environmental variables (n = 98) were reduced using the variance inflation factor (*VIF* < 10) [28] and the AUCRF R package [36,37]. We selected the most parsimonious model, i.e., the one with the lowest number of variables [28], built with the non-collinear variables that maximized the AUC value of the Random Forest (RF) model prediction [36,37]. The *VIF* was calculated as:

$$VIF = \frac{1}{1 - R^2} \tag{1}$$

where *R*<sup>2</sup> is the coefficient of determination obtained by regressing the evaluated variable on all the remaining variables. The selection of non-collinear variables was carried out using the stepwise procedure of the *usdm* package, depending on the importance of each variable [38], in the *R* program [39]. Variable importance was determined through simple linear correlations between the predictions of the model including all the variables (full model) and the predictions of the model excluding the evaluated variable (reduced model) [28].
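As an illustration of Equation (1), a threshold of *VIF* < 10 corresponds to *R*<sup>2</sup> < 0.9 for the regression of a variable on the others. The following is a minimal sketch of a stepwise VIF filter, written in plain Python rather than with the *usdm* package. It assumes that the variable with the highest VIF is dropped at each step until all remaining values fall below the threshold. The function names (`r_squared`, `vif`, `vif_stepwise`) and the toy data are hypothetical and are not part of the original analysis.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: perfectly collinear predictors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on predictors (with intercept)."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bk * c[i] for bk, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("constant variable: R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def vif(data):
    """Return {name: VIF} for a dict mapping variable names to value lists."""
    out = {}
    for name, y in data.items():
        others = [v for k, v in data.items() if k != name]
        r2 = r_squared(y, others)
        out[name] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)  # Eq. (1)
    return out


def vif_stepwise(data, threshold=10.0):
    """Drop the highest-VIF variable until every VIF is below the threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return sorted(kept)


# Toy data (hypothetical): "c" is almost exactly the sum of "a" and "b".
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [3, 1, 4, 1, 5, 9, 2, 6]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
c = [x + y + e for x, y, e in zip(a, b, noise)]
print(vif_stepwise({"a": a, "b": b, "c": c}))  # "c" is removed first
```

In this toy case all three variables initially exceed the threshold because "c" is nearly a linear combination of the other two; once "c" is removed, the remaining pair is only weakly correlated and the procedure stops.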

The AUCRF R package implements a stepwise variable selection procedure and returns the AUC value reached. The order in which the variables are added to the model is determined by a variable importance measure, for which two options are available: the mean decrease Gini (MDG) and the mean decrease accuracy (MDA). The package also estimates the probability that each variable is selected to build the model; for more details, see [36]. Hence, the model is first fed with the variable that returns the best AUC, and new variables are added until the model is built with all available variables. The package returns the number and identity of the variables that give the best AUC value, together with their variable importance and probability-of-selection scores [36].
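The selection logic described above can be sketched as follows. This is an illustrative outline in plain Python, not the AUCRF implementation: it does not fit a Random Forest, and the scoring function `toy_auc_of` is a hypothetical stand-in for "fit a model on this subset and return its AUC". The sketch assumes that variables are already ranked by importance, evaluates the nested subsets in that order, and keeps the subset with the highest AUC, preferring fewer variables when AUC values tie.

```python
def auc(scores, labels):
    """AUC via the rank-sum identity: the fraction of (presence, absence)
    pairs in which the presence receives the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_subset_by_auc(ranked_vars, auc_of):
    """Evaluate nested subsets in ranking order and return (subset, AUC)
    for the best one; the strict '>' keeps the smaller subset on ties."""
    best_vars, best_auc = None, float("-inf")
    for k in range(1, len(ranked_vars) + 1):
        subset = ranked_vars[:k]
        value = auc_of(subset)
        if value > best_auc:
            best_vars, best_auc = subset, value
    return best_vars, best_auc


# Toy data (hypothetical): 3 presences followed by 3 absences.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "temp": [9, 8, 4, 5, 2, 1],
    "rain": [1, 2, 5, 1, 3, 2],
    "slope": [0, 0, 0, 5, 0, 0],
}


def toy_auc_of(subset):
    """Stand-in for model fitting: score each site by the sum of the
    selected columns and return the AUC of that score."""
    scores = [sum(columns[v][i] for v in subset) for i in range(len(labels))]
    return auc(scores, labels)


ranked = ["temp", "rain", "slope"]  # assumed importance ranking
print(best_subset_by_auc(ranked, toy_auc_of))  # (['temp', 'rain'], 1.0)
```

In this toy example the AUC rises when the second variable is added and falls when the third is added, so the two-variable subset is returned.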
