### *2.4. Bioclimatic, Physiographic, and Soil Variables*

The spatial distribution of species within a geographic area depends on their interactions with several environmental factors that contribute to their development and coexistence [40]. Accordingly, 33 variables were selected for the modeling (Table 2): 19 bioclimatic variables and solar radiation obtained from WorldClim 2.1 (https://www.worldclim.org/; accessed on 5 January 2021) [37]; 3 topographic variables derived from a digital elevation model (DEM) obtained from the United States Geological Survey (USGS) web portal (http://srtm.usgs.gov; accessed on 28 December 2020); relative humidity obtained from the Climatic Research Unit (CRU) [41] (www.cru.uea.ac.uk; accessed on 1 May 2021); and 9 soil properties collected from SoilGrids 0.5.3 (http://soilgrids.org; accessed on 15 January 2021) [42]. All variables were resampled to a common spatial resolution of 250 m. Collinearity between variables causes overfitting, increases uncertainty, and decreases the statistical power of the model [43]. To reduce it, the variables were grouped into clusters according to the Pearson correlation coefficient using the function `removeCollinearity` from the R package `virtualspecies` [44] in R 3.6, so that variables correlated at |r| ≥ 0.7 fell into the same cluster. This threshold is commonly used to minimize multicollinearity in fitted models [43].
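The grouping step can be illustrated with a minimal Python sketch. It is not the `virtualspecies` code. It assumes complete-linkage agglomerative clustering on the distance 1 - |r|, stopped at height 1 - 0.7, so that every pair of variables within a cluster has |r| ≥ 0.7. The variable names (`bio1`, `bio11`, `elev`, `ph`) and their values at six sampling points are hypothetical, not data from this study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinearity_clusters(variables, cutoff=0.7):
    """Complete-linkage clustering on 1 - |r|, stopped at height 1 - cutoff,
    so all variables in a cluster are pairwise correlated at |r| >= cutoff."""
    names = list(variables)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - abs(pearson(variables[a], variables[b]))
        dist[(a, b)] = dist[(b, a)] = d

    clusters = [[name] for name in names]
    height = 1.0 - cutoff
    while len(clusters) > 1:
        # Complete linkage: the distance between two clusters is their farthest pair.
        d, i, j = min(
            (max(dist[(a, b)] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > height:
            break  # the closest clusters are no longer correlated at |r| >= cutoff
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Hypothetical values of four variables at six sampling points.
samples = {
    "bio1": [10, 12, 14, 16, 18, 20],
    "bio11": [9, 11.5, 13, 16, 17.5, 20],
    "elev": [3000, 2600, 2200, 1800, 1400, 1000],
    "ph": [5.1, 6.3, 4.8, 6.0, 5.0, 6.2],
}
clusters = collinearity_clusters(samples, cutoff=0.7)
print(clusters)
```

Here `bio1`, `bio11`, and `elev` end up in one cluster (elevation is perfectly negatively correlated with `bio1`, which is why the absolute value of r is used), while `ph` stays on its own.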

**Table 2.** Variables for MaxEnt modeling of *Cedrela* in Peru.


<sup>1</sup> Variables in bold are those retained for the MaxEnt model (one per cluster), namely those with the smallest difference in regularized training gain between the jackknife runs.

To select one representative variable per cluster, a preliminary MaxEnt model was run with all the variables (the configuration is explained in Section 3.2). Within each cluster, the variable that performed best in the jackknife test [25] was selected, i.e., the one with the smallest difference in regularized training gain between a model generated with all variables except the variable of interest and a model generated with only the variable of interest [21] (Table 2).
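Once the jackknife gains are available, the per-cluster selection rule reduces to a minimum search. The sketch below assumes the two regularized training gains per variable (model without the variable, model with only the variable) have already been read from the MaxEnt jackknife output. The function name, cluster contents, and gain values are hypothetical, and taking the absolute difference is an assumption about how "smallest difference" is measured.

```python
def select_representatives(clusters, gain_without, gain_only):
    """Pick one variable per cluster: the one whose two jackknife gains differ least."""
    chosen = []
    for cluster in clusters:
        best = min(cluster, key=lambda v: abs(gain_without[v] - gain_only[v]))
        chosen.append(best)
    return chosen


# Hypothetical regularized training gains from a jackknife run.
gain_without = {"bio1": 1.52, "bio11": 1.50, "elev": 1.49, "ph": 1.45}
gain_only = {"bio1": 0.80, "bio11": 0.95, "elev": 0.70, "ph": 0.30}
clusters = [["bio1", "bio11", "elev"], ["ph"]]

print(select_representatives(clusters, gain_without, gain_only))
```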

### *2.5. Execution of the Model*

The biogeographic distribution model for the 10 species of the genus *Cedrela* was built with the maximum entropy algorithm [31], which estimates the probability of potential distribution of each species from presence data (locations), using the open-source software MaxEnt ver. 3.4.1 (https://biodiversityinformatics.amnh.org/open\_source/maxent/; accessed on 10 November 2020). For model validation, 75% of the presence records, selected at random, were used for training and the remaining 25% for testing [31]. The algorithm was run with 100 replicates, each with a maximum of 5000 iterations and a different random partition (bootstrap method); other settings (e.g., extrapolation and graph drawing) were kept at their default values [45].
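The run configuration above can be expressed as a MaxEnt batch-mode command. The Python sketch below only assembles the argument list; it does not run MaxEnt. The flag names are taken from MaxEnt's command-line help and should be checked against the installed version; the JVM heap size, the file and directory names, and the function name are hypothetical. The logistic output format that the study reports using is set here as well.

```python
import shlex


def maxent_command(samples_csv, layers_dir, out_dir, jar="maxent.jar"):
    """Assemble a MaxEnt 3.4.x batch-mode command line mirroring the
    settings described in the text. The command is built, not executed."""
    return [
        "java", "-mx512m", "-jar", jar,        # JVM heap size is an assumption
        f"samplesfile={samples_csv}",          # presences: species, longitude, latitude
        f"environmentallayers={layers_dir}",   # directory of 250 m rasters
        f"outputdirectory={out_dir}",
        "randomtestpoints=25",                 # 25% of presences held out for testing
        "replicates=100",                      # 100 replicate runs
        "replicatetype=bootstrap",             # bootstrap resampling across replicates
        "maximumiterations=5000",              # iteration cap per run
        "outputformat=logistic",               # continuous output in [0, 1]
        "autorun",                             # start without waiting for the GUI
    ]


# Hypothetical file and directory names.
cmd = maxent_command("cedrela_presences.csv", "layers_250m", "maxent_out")
print(shlex.join(cmd))
# To run it (requires Java and maxent.jar): subprocess.run(cmd, check=True)
```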

The resulting models were validated using the area under the curve (AUC) of the receiver operating characteristic (ROC) [31,46,47]. Based on the AUC values, five performance levels were distinguished: excellent (>0.9), good (0.8–0.9), accepted (0.7–0.8), poor (0.6–0.7), and invalid (<0.6) [46,48]. We used the logistic output format to obtain, for each of the 10 evaluated species, a raster of continuous values ranging from 0 to 1. Each raster was then reclassified into four classes: (1) "high potential" habitat (>0.6), (2) "moderate" habitat (0.4–0.6), (3) "low potential" habitat (0.2–0.4), and (4) "no potential" habitat (<0.2) [24,25,28,48].
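The evaluation and reclassification steps can be sketched in a few lines of Python. The AUC here is computed as the probability that a random presence point scores higher than a random background point (ties count as half), which is the rank-based equivalent of the area under the ROC curve; MaxEnt reports AUC against background rather than true absence points. How values falling exactly on a class boundary are assigned is an assumption, as are the scores and the small raster, which are hypothetical.

```python
def auc(presence_scores, background_scores):
    """ROC AUC as the probability that a random presence outscores a random
    background point (ties count as half), i.e. the Mann-Whitney statistic."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def auc_level(value):
    # Boundary handling is an assumption: exactly 0.9 and 0.8 both count as "good".
    if value > 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "accepted"
    if value >= 0.6:
        return "poor"
    return "invalid"


def habitat_class(suitability):
    # Logistic output in [0, 1]; boundaries handled the same way as in auc_level.
    if suitability > 0.6:
        return "high potential"
    if suitability >= 0.4:
        return "moderate"
    if suitability >= 0.2:
        return "low potential"
    return "no potential"


def reclassify(raster):
    """Map every cell of a 2-D grid (list of rows) to its habitat class."""
    return [[habitat_class(v) for v in row] for row in raster]


# Hypothetical scores and raster, for illustration only.
score = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(round(score, 3), auc_level(score))
print(reclassify([[0.05, 0.25, 0.45], [0.65, 0.60, 0.20]]))
```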
