*2.4. Statistical Models*

In this work, we used an ensemble approach (biomod2) [22] to describe the current and future habitat of *Euacalyptus grandis* and *E. dunnii* in Uruguay, as a basis to study potential changes in the optimal plantation areas for both species, under different climate scenarios. The biomod2 R package assembles the SDM which includes the Generalized Lineal Model (GLM), Generalized Additive Models (GAM), classification and regression trees (CART), Boosted Regression Tress (BRT), Random Forest (RF), Maximum entropy (MaxEnt) flexible determinant analysis (FDA), artificial neuronal networks (ANN), Multivariate Adaptive Regression Splines (MARS), and Surface Range Envelope (SRE). We used ensemble models calculated using the mean, median, coefficient of variation, confidence interval (inferior and superior), committee averaging (CA), and probability mean weight decay (WD) of the single model prediction. The CA was achieved by a binary (presence/absence) transformation using the threshold of the single model predictions. The threshold is the maximum score of the evaluation metric ("True Skill Statistics", TSS) for the evaluated dataset. Subsequently, the probability value of each pixel was calculated as the mean of the single pixel predictions. The WD ensemble modelling scaled the individual model predictions according to their accuracy statistic value (AUC) and the sum of all individual models [37,40]. We made ensemble predictions based on all single models' projections with an AUC > 0.90.

#### *2.5. Selection and Validation of the Model*

The evaluation and selection of the model correspond to the determination of the predictive capacity based on the quantification of the error and the misclassified data. These errors may be of commission, which is the classification as an absence of a data point that is present (false negative), and of omission, which is the opposite. We randomly split our dataset into two subsets, 70% of the data to train the model and 30% for model evaluation using cross-validation (100) which yielded 100 different fits per model. Models with higher mean values and smaller variations were considered as being the most accurate ones [28]. The criteria used to determine the predictive capacity of the

model were the Kappa coefficient (K), the Area Under the ROC (Operational Characteristic of the Receptor) Curve (AUC) and the coefficient TSS. The first of these was used to estimate the veracity of the maps developed and is a qualitative measure of the concordance between the categoric predictors and describes the concordance rate between the observed and predicted values. The values of K vary from +1 to −1, where +1 represents a perfect classification and negative values represent a random fitment [41]. The AUC is the graphical representation of the commission errors on the horizontal axis (presences classified incorrectly, 1-specificity), and the omission errors (real presences which are omitted, sensitivity) are represented on the vertical axis for the whole value range [42]. The values obtained vary from 0.5 to 1, where 1 represents a perfect classification (presences and absences) and 0.5 represents a random prediction. According to Thuiller et al. [43], the fitted model may be classified as poor (AUC < 0.8), satisfactory (0.8 < AUC < 0.9), good (0.9 < AUC < 0.95), or very good (0.95 < AUC < 1). The coefficient TSS was applied to estimate the precision of the model with the presence/pseudo-absence data [44]. This compares the number of correct predictions except for those attributed to the random and, in the same way as the K index, considers the omission and commission errors. Its values oscillate between −1 and 1, where 1 indicates a perfect concordance and negative values indicate random behavior [42]. This index is defined according to the following expression:

$$\text{TSS} = \text{sensitivity} + \text{specificity} - 1 \tag{2}$$

In this case, K index and TSS were calculated according to maximum TSS threshold [45].

#### *2.6. Comparison of Current and Future Distributions*

Current and future model predictions were obtained at a pixel resolution ~1 km. Each pixel of the prediction contains the probability of occurrence of the species ranging from 0 (not probable) to 1 (highly probable). To better visualization of the probability of occurrence the probability values were classified in four categories (0–25%, low; 26–50%, moderately low; 51–75%, moderately high; 76–100%, high) for the present or future occurrence of both species in Uruguay. To calculate the variation in the area occupied in the future, compared to the area currently occupied by the two species, the probability maps were reclassified to 0 (absence) and 1 (present), using the same threshold applied to calculate TSS and Kappa. The total area of the potential distribution for each single projection was calculated by counting the number of pixels and multiplies it by the area of a pixel and the changed area was presented as percentage of total area loss (<100%) or gained (>100%) [30].

#### **3. Results**
