*2.2. Species Distribution Ensemble Modelling*

According to the existing literature, ensemble forecasting that combines different SDM techniques is regarded as one of the most powerful, stable, and widely referenced methods for analyzing the potential impact of climate change on tree species [31,64]. Ensemble (or consensus) modelling is based on the idea that each modelling output represents a possible state of the real distribution. With this technique, single-model projections are combined into a final surface by averaging their predictions. In this paper, the ensemble technique was used as the predictive method for each of the 19 forest tree species to estimate their potential land suitability under current (i.e., 1981–2010) and future climate conditions (i.e., 2050s, RCP 4.5). The average was computed as the mean of the single-model projections weighted by their True Skill Statistic (TSS) [65], which was calculated through a cross-validation procedure using 75% of the data for training and 25% for testing [42,66]. Furthermore, to account for the uncertainty arising from the choice of SDM, nine algorithms were used to model tree species distributions. Fifty replications were performed for each algorithm, for a total of 450 single-model projections for each investigated species. The algorithms implemented here include generalized linear model (GLM), generalized additive model (GAM), classification tree analysis (CTA), artificial neural network (ANN), flexible discriminant analysis (FDA), multivariate adaptive regression splines (MARS), random forest (RF), and maximum entropy (MAXENT). The code for these algorithms is available in the biomod2 package [67] for the R statistical language [68].
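The TSS-weighted averaging described above can be illustrated with a minimal Python sketch using NumPy. It is not the biomod2 implementation. The helper names (`tss`, `max_tss`, `split_75_25`, `tss_weighted_ensemble`) are hypothetical. Two further choices are assumptions of this sketch: TSS is taken at the cut-off that maximizes it on the test data, and models with a non-positive TSS receive zero weight.

```python
import numpy as np

# Hypothetical helpers: a sketch of TSS weighting, not the biomod2 implementation.


def tss(observed, predicted):
    """True Skill Statistic = sensitivity + specificity - 1 for binary vectors."""
    obs = np.asarray(observed, dtype=bool)
    pred = np.asarray(predicted, dtype=bool)
    tp = np.sum(obs & pred)    # presences predicted as presences
    fn = np.sum(obs & ~pred)   # presences predicted as absences
    tn = np.sum(~obs & ~pred)  # absences predicted as absences
    fp = np.sum(~obs & pred)   # absences predicted as presences
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def max_tss(observed, suitability):
    """Best TSS over all cut-offs of a continuous suitability score (assumption)."""
    s = np.asarray(suitability, dtype=float)
    return max(tss(observed, s >= t) for t in np.unique(s))


def split_75_25(n_samples, rng):
    """Random 75% training / 25% testing split of sample indices."""
    idx = rng.permutation(n_samples)
    cut = int(round(0.75 * n_samples))
    return idx[:cut], idx[cut:]


def tss_weighted_ensemble(projections, tss_scores):
    """Mean of single-model projections (one row per model) weighted by TSS.

    Models with TSS <= 0 get zero weight; this exclusion rule is an assumption.
    """
    proj = np.asarray(projections, dtype=float)  # shape (n_models, n_cells)
    weights = np.clip(np.asarray(tss_scores, dtype=float), 0.0, None)
    if weights.sum() == 0:
        raise ValueError("no model has a positive TSS")
    return weights @ proj / weights.sum()


# Usage with toy numbers: two hypothetical models projected on two cells
rng = np.random.default_rng(42)
train_idx, test_idx = split_75_25(100, rng)   # 75 training, 25 testing indices
maps = np.array([[0.2, 0.8], [0.6, 0.4]])
print(tss_weighted_ensemble(maps, [0.5, 0.25]))  # approx. [0.333 0.667]
```

In practice each single-model TSS would come from `max_tss` evaluated on the 25% test split, and the resulting scores would be passed as weights to `tss_weighted_ensemble`.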

To avoid collinearity among the predictors, a principal component analysis (PCA) was performed on the complete set of climatic variables [69]. PCA transforms the original predictors into uncorrelated (i.e., orthogonal) components while preserving the whole variability of the analyzed ecological system (i.e., the ecological variability of the Italian environment). The PCA-derived components were then used as inputs for the SDMs. Presences were obtained by filtering the NFI plots with a basal area share threshold of 15%, following a previous investigation [25]: a plot was considered a presence when the target species accounted for more than 14.99% of its total basal area. Afterwards, 10 different pseudo-absence (PA) datasets, each containing as many pseudo-absences as presences, were generated with the Surface Range Envelope (SRE) method [70]. Although absences could in principle be derived from the NFI dataset using tree-level information, treating all the plots where the species was not detected as absences can bias model predictions, even when prevalence is set to 0.5 [27]. The main reason is that, in a managed environment, presence is objectively defined, whereas absence can result either from an inhospitable environment or from forest management decisions (e.g., selective logging), and the NFI data provide no information to distinguish between these possibilities. Overall, this produced a final set of 4500 single-algorithm, single-PA predictions for the consensus model calculation. Soil information was not included in the models, as soil conditions were considered nearly stable over the study period.
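The two preprocessing steps above, decorrelating the climatic predictors with PCA and defining presences from the basal area share, can be sketched in Python with NumPy. This sketch assumes a sites-by-variables climate matrix and per-plot basal areas. It also standardizes the predictors before the rotation, which is an assumption, since the text does not state how the variables were scaled. The function names `pca_scores` and `presence_mask` are hypothetical.

```python
import numpy as np

# Hypothetical helpers: PCA via SVD and the 15% basal-area presence rule.


def pca_scores(climate):
    """Rotate climatic predictors onto orthogonal principal components.

    climate: array of shape (n_sites, n_vars). Variables are standardised
    first (assumption). All components are kept, so total variance is preserved.
    """
    x = np.asarray(climate, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt.T                      # site coordinates on the components
    explained = s**2 / np.sum(s**2)        # share of variance per component
    return scores, explained


def presence_mask(species_basal_area, total_basal_area, min_share=0.1499):
    """True where the target species holds more than 14.99% of plot basal area."""
    share = (np.asarray(species_basal_area, dtype=float)
             / np.asarray(total_basal_area, dtype=float))
    return share > min_share


# Usage with synthetic data: two strongly correlated variables and one independent
rng = np.random.default_rng(0)
t = rng.normal(size=200)
climate = np.column_stack([t, 0.8 * t + 0.2 * rng.normal(size=200),
                           rng.normal(size=200)])
scores, explained = pca_scores(climate)
print(np.round(np.corrcoef(scores, rowvar=False), 6))  # approximately the identity
print(presence_mask([3.0, 1.4, 2.0], [20.0, 10.0, 10.0]))  # [ True False  True]
```

Keeping every component means the sum of the component variances equals the sum of the standardized predictor variances. This is the sense in which the rotation preserves the whole variability while removing the correlation between inputs.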

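The pseudo-absence step can be sketched in Python under simplifying assumptions. The presence envelope is taken as the per-variable range between the 2.5% and 97.5% quantiles of the predictors at presence sites. Candidate pseudo-absences are background sites that fall outside that envelope on at least one variable. Each dataset draws as many pseudo-absences as there are presences. This is a hypothetical illustration loosely modelled on the surface range envelope idea, not the exact procedure of reference [70] or of biomod2. The quantile value and the function names `sre_envelope` and `sre_pseudo_absences` are assumptions.

```python
import numpy as np

# Hypothetical helpers: a simplified SRE-style pseudo-absence sampler.


def sre_envelope(presence_env, quantile=0.025):
    """Per-variable [q, 1 - q] quantile range of predictors at presence sites."""
    lo = np.quantile(presence_env, quantile, axis=0)
    hi = np.quantile(presence_env, 1.0 - quantile, axis=0)
    return lo, hi


def sre_pseudo_absences(presence_env, background_env, n_sets=10,
                        quantile=0.025, seed=None):
    """Draw n_sets pseudo-absence index sets from sites outside the envelope.

    Each set has as many pseudo-absences as there are presences.
    """
    presence_env = np.asarray(presence_env, dtype=float)
    background_env = np.asarray(background_env, dtype=float)
    lo, hi = sre_envelope(presence_env, quantile)
    # a background site lies outside the envelope if any variable leaves [lo, hi]
    outside = np.any((background_env < lo) | (background_env > hi), axis=1)
    candidates = np.flatnonzero(outside)
    n_pa = presence_env.shape[0]
    if candidates.size < n_pa:
        raise ValueError("too few background sites outside the envelope")
    rng = np.random.default_rng(seed)
    # sampling without replacement within a set; different sets may overlap
    return [rng.choice(candidates, size=n_pa, replace=False)
            for _ in range(n_sets)]


# Usage with synthetic PCA scores: presences clustered, background widespread
rng = np.random.default_rng(1)
presences = rng.uniform(0.0, 1.0, size=(50, 2))
background = rng.uniform(-2.0, 3.0, size=(2000, 2))
pa_sets = sre_pseudo_absences(presences, background, n_sets=10, seed=7)
print(len(pa_sets), [len(s) for s in pa_sets][:3])  # 10 [50, 50, 50]
```

Each returned index array, paired with the presences, forms one balanced presence/pseudo-absence dataset on which every algorithm would then be fitted.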