#### *2.10. Classification Methods*

All classifications were performed using the "caret" package in R [44]. Specifically, we used the "kernlab" and "randomForest" packages for SVM and RF, respectively [45,46]. We used the radial basis function kernel for SVM and optimized the *C* and *sigma* values through a grid search. For RF, we set *ntree* to 500 and searched for the best *mtry* value by testing 4, 8, 16, and 32. As our dataset was imbalanced, we selected the models that produced the highest Kappa rather than the highest overall accuracy (OA) [44].
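Kappa corrects observed agreement for the agreement expected by chance, which is why it is more informative than OA when classes are imbalanced. The following sketch (in Python, for illustration only; the study itself used R/caret, and the confusion-matrix values below are made up) shows how a majority-class-biased classifier can score a high OA but a low Kappa:

```python
# Cohen's kappa and overall accuracy (OA) from a confusion matrix
# (rows = true class, columns = predicted class).

def kappa_and_oa(cm):
    n = sum(sum(row) for row in cm)
    oa = sum(cm[i][i] for i in range(len(cm))) / n          # observed agreement
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(len(cm))) for j in range(len(cm))]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2  # chance agreement
    return (oa - pe) / (1 - pe), oa

# Hypothetical imbalanced 2-class case: almost all predictions fall
# into the majority class.
cm = [[90, 0],   # 90 majority samples, all predicted correctly
      [8, 2]]    # only 2 of 10 minority samples predicted correctly
k, oa = kappa_and_oa(cm)
print(round(oa, 2), round(k, 2))   # OA looks strong, Kappa does not
```

Here OA is 0.92 while Kappa is only about 0.31, so model selection by Kappa penalizes the majority-class bias that OA hides.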

Class balancing was applied to the training data during cross-validation using the up-sampling method from the "caret" package, while the test data were left intact. The up-sampling method randomly samples (with replacement) the minority classes until they reach the same size as the majority class (the class with the most samples). Classifications using up-sampling are referred to as balanced classifications hereafter.
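The up-sampling step can be sketched as follows (a minimal Python illustration of the behaviour described above, assuming it mirrors caret's `upSample`; it is not the authors' actual code):

```python
# Up-sampling: draw minority-class samples with replacement until every
# class has as many samples as the majority class.
import random
from collections import Counter

def up_sample(samples, labels, seed=42):
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(v) for v in by_class.values())   # majority class size
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # samples drawn with replacement to top the class up to `target`
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for s in xs + extra:
            out_x.append(s)
            out_y.append(y)
    return out_x, out_y

x = list(range(10))
y = ["a"] * 7 + ["b"] * 3          # imbalanced: 7 vs 3 samples
bx, by = up_sample(x, y)
print(Counter(by))                  # both classes now have 7 samples
```

Because only the training folds are balanced, the test data keep their original class distribution and the reported accuracies remain unbiased.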

#### *2.11. Measures of Performance*

OA was calculated as the number of correctly classified samples divided by the total number of samples. We used precision and recall, equivalent to user's and producer's accuracy [47], to evaluate performance at the class level. We also calculated the F1-score, the harmonic mean of precision and recall, as follows:

$$F1 = 2 \times \left(\frac{precision \times recall}{precision + recall}\right) \tag{1}$$

The F1-score increases with higher precision or recall and with greater similarity between the two values. It ranges from 0 to 1, where 1 is the best value and 0 the worst.
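A short worked example of Equation (1) makes this behaviour concrete (the precision and recall values below are chosen for illustration only):

```python
# F1 as the harmonic mean of precision and recall, per Equation (1).

def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0                  # degenerate case: no true positives
    return 2 * (precision * recall) / (precision + recall)

print(f1_score(0.8, 0.8))   # equal values: F1 equals them -> 0.8
print(f1_score(0.9, 0.1))   # dissimilar values pull F1 down -> 0.18
```

With equal precision and recall the F1-score matches them, while a large gap between the two (e.g. 0.9 vs. 0.1) drags the harmonic mean well below their arithmetic mean.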
