#### *3.6. Supervised Classification*

All the classifications are performed using the Python scikit-learn package [129].

#### 3.6.1. Random Forest (RF)

RF is an ensemble classifier that uses a set of Classification And Regression Trees (CARTs) to make a prediction [130]. The trees are created by drawing subsets of the training samples with replacement (a bagging approach). In standard classification trees, each node is split using the best split among all variables. In RF, each node is split using the best split among a randomly selected subset of features of user-defined size (*Mtry*, usually set to the square root of the number of input variables [131]). By growing the forest up to a user-defined number of trees (*Ntree*, usually set to 500, although values such as 100, 1000, or 5000 have been investigated [131]), the algorithm creates trees that have high variance and low bias. The final classification decision is taken by averaging (using the arithmetic mean) the class-assignment probabilities calculated by all the trees.

For this study, *Mtry* = 500 and *Ntree* ∈ {500, 1000, 2000, 5000}.
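
A minimal scikit-learn sketch of these settings is given below. The synthetic `X` and `y` are only placeholders for the field spectra and vegetation-type labels, and the remaining `RandomForestClassifier` parameters are left at their defaults; in scikit-learn, *Mtry* corresponds to `max_features` and *Ntree* to `n_estimators`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the spectra (n_samples x n_features) and vegetation types.
X, y = make_classification(n_samples=300, n_features=600, n_informative=50,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Ntree values investigated in this study; Mtry = 500 is passed as max_features.
for n_trees in (500, 1000, 2000, 5000):
    rf = RandomForestClassifier(n_estimators=n_trees, max_features=500,
                                n_jobs=-1, random_state=0)
    rf.fit(X, y)
    proba = rf.predict_proba(X)  # class-assignment probabilities averaged over the trees
```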

#### 3.6.2. Support Vector Machines (SVM)

SVM is a supervised, non-parametric statistical learning technique; therefore, no assumption is made on the distribution of the data [132]. The main idea of SVM classification is to construct a hyperplane as a decision surface such that the margin of separation between two classes is maximized. To do this, the original feature space is mapped into a space of higher dimensionality, where the classes can be modelled as linearly separable. This transformation is implicitly performed by applying kernel functions to the original data. The classifier is trained through a constrained optimization process associated with a complex cost function. For problems that involve multiple classes, the binary SVM classifier is extended to a multi-class classifier using methods such as one-against-all and one-against-one.

For this study, two kernels are retained: a linear kernel (SVM linear) and a Gaussian kernel (SVM RBF).
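
The two retained kernels could be set up as follows; the data are again synthetic placeholders, and the regularization parameter `C` and the RBF width `gamma` are shown at illustrative scikit-learn defaults rather than the values used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the spectra and vegetation types.
X, y = make_classification(n_samples=200, n_features=100, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

svm_linear = SVC(kernel="linear", C=1.0).fit(X, y)            # linear kernel
svm_rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)   # Gaussian (RBF) kernel
# SVC handles the multi-class case internally with a one-against-one scheme.
```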

#### 3.6.3. Regularized Logistic Regression (RLR)

RLR is a linear model based on logistic regression with an additional regularization term. This classifier has been successfully used with high dimensional data (gene selection in cancer classification [133], feature selection in remote sensing [28,29,134]).

For this study, the ℓ1-norm and ℓ2-norm regularization terms are investigated.
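
In scikit-learn, the two penalties correspond to the `penalty` argument of `LogisticRegression`; the sketch below uses synthetic placeholder data, and the solver and `C` (inverse regularization strength) are illustrative choices not specified by the study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the spectra and vegetation types.
X, y = make_classification(n_samples=200, n_features=100, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# L1-regularized logistic regression (requires a solver supporting the L1 penalty).
rlr_l1 = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000).fit(X, y)
# L2-regularized logistic regression (the scikit-learn default penalty).
rlr_l2 = LogisticRegression(penalty="l2", C=1.0, max_iter=5000).fit(X, y)
```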

#### 3.6.4. Partial Least Squares-Discriminant analysis (PLS-DA)

PLS-DA is based upon the classical partial least squares regression method for constructing predictive models [135]. The goal of PLS regression is to provide dimension reduction in applications where the response variable is related to the predictor variables. In the case of PLS-DA, the response variable (i.e., vegetation type) is binary and expresses class membership [136,137]. This classifier has been successfully used with high-dimensional data (gene selection [138], tree species discrimination [139]).

For this study, the number of latent variables is fixed to the number of vegetation types minus one [138]. This method is not applied to the selected spectral vegetation indices but to the spectral signatures and their transformations over the spectral ranges, because it is commonly used when the number of features is much larger than the number of spectra.
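
scikit-learn does not provide a dedicated PLS-DA class; one common sketch, assumed here for illustration, is to regress a binary class-membership matrix with `PLSRegression` and assign each spectrum to the class with the highest predicted score. The data below are synthetic placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_classification
from sklearn.preprocessing import LabelBinarizer

# Synthetic stand-in for the spectral signatures and vegetation types.
X, y = make_classification(n_samples=200, n_features=300, n_informative=30,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Binary response matrix expressing class membership (one column per vegetation type).
lb = LabelBinarizer()
Y = lb.fit_transform(y)

# Number of latent variables fixed to (number of vegetation types - 1).
pls = PLSRegression(n_components=Y.shape[1] - 1)
pls.fit(X, Y)

# Predicted scores are continuous; assign each spectrum to the class with the highest score.
y_pred = lb.classes_[np.argmax(pls.predict(X), axis=1)]
```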

#### *3.7. Classification Accuracy Evaluation*

To evaluate the classification accuracy of the supervised classifiers, a 30-fold cross-validation is used and six training sample sizes are investigated: 50%, 45%, 40%, 35%, 30%, and 25% of all spectra.
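
The exact resampling scheme is not fully specified here; one way to reproduce the idea, assuming 30 stratified random splits per training fraction and using RF as the example classifier, is sketched below with synthetic placeholder data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for the spectra and vegetation types.
X, y = make_classification(n_samples=300, n_features=100, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

for train_size in (0.50, 0.45, 0.40, 0.35, 0.30, 0.25):
    # 30 random stratified train/test splits for each training sample size.
    splitter = StratifiedShuffleSplit(n_splits=30, train_size=train_size, random_state=0)
    scores = []
    for train_idx, test_idx in splitter.split(X, y):
        clf = RandomForestClassifier(n_estimators=500, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    print(f"train size {train_size:.0%}: mean OA = {np.mean(scores):.3f}")
```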

To evaluate classifier performance, overall accuracy and the F1-score are used. Overall accuracy is the number of correctly classified spectra divided by the total number of spectra, whereas the F1-score is given by:

$$\text{F1-score} = 2 \cdot \frac{\text{PA} \cdot \text{UA}}{\text{PA} + \text{UA}}, \tag{9}$$

where PA (Producer's Accuracy) is the fraction of reference spectra of a class that are correctly classified, whereas UA (User's Accuracy) is the fraction of spectra assigned to a class that actually belong to it.
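
Both quantities can be read per class from the confusion matrix; the sketch below, with hypothetical label arrays, computes the overall accuracy and the per-class F1-score of Equation (9), and checks it against scikit-learn's implementation.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical reference labels and classifier predictions for illustration.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 2, 1, 1])

oa = accuracy_score(y_true, y_pred)      # correctly classified spectra / all spectra

cm = confusion_matrix(y_true, y_pred)    # rows: reference classes, columns: predicted classes
pa = np.diag(cm) / cm.sum(axis=1)        # Producer's Accuracy per class
ua = np.diag(cm) / cm.sum(axis=0)        # User's Accuracy per class
f1_manual = 2 * pa * ua / (pa + ua)      # Equation (9), per class

# The same per-class F1-scores obtained directly from scikit-learn:
f1_sklearn = f1_score(y_true, y_pred, average=None)
```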

#### **4. Results and Discussion**
