### 2.4. Classification

#### 2.4.1. Random Forest (RF)

The RF ensemble trains each decision tree on a bootstrap sample, i.e., approximately 2/3 of the original dataset (referred to as the "in-bag" sample), while the remaining 1/3 of the data is used to compute an internal measure of accuracy (referred to as the "out-of-bag" or OOB error) [25]. To produce the forest of decision trees, two parameters need to be set: the number of unpruned trees to grow, known as ntree; and the number of predictor variables (i.e., wavebands) randomly selected as split candidates, known as mtry [25]. At each node, mtry variables are tested to identify the best split when growing a tree. This random variable selection produces weakly correlated trees, which helps prevent over-fitting. In a classification framework, the final class label is determined by a majority vote across all the decision trees in the forest. For a detailed account of RF, see [25,50]. RF was implemented using the 'randomForest' package [51] in the R statistical software environment [49]. The default values for ntree (ntree = 500) and mtry (mtry = √*p*, where *p* is the number of predictor variables) were used, following [50,52].
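For illustration, a minimal sketch of this setup in R is given below. The data frame `train_df` and its factor column `class` are hypothetical names; the paper does not list the objects used.

```r
library(randomForest)

# Illustrative sketch: `train_df` is a hypothetical data frame whose columns
# are the waveband predictors plus a factor column `class` holding the labels.
set.seed(42)

p <- ncol(train_df) - 1                  # number of predictor wavebands

rf_model <- randomForest(
  class ~ .,
  data  = train_df,
  ntree = 500,                           # default number of unpruned trees
  mtry  = floor(sqrt(p)),                # default mtry = sqrt(p) for classification
  importance = TRUE
)

print(rf_model)                          # summary includes the OOB error estimate
```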

#### 2.4.2. Extreme Gradient Boosting (XGBoost)

XGBoost, like gradient boosting, is based on three essential elements: (i) a loss function that needs to be optimised; (ii) a multitude of weak decision trees used for classification; and (iii) an additive model that combines the weak decision trees to produce a more accurate classification model [31]. XGBoost optimises the loss function while simultaneously constructing the additive model [30,31]. The loss function accounts for the classification errors introduced by the weak decision trees [31]. For a detailed account of XGBoost, see [30]. XGBoost was implemented using the 'xgboost' package [53] in the R statistical software environment [49]. XGBoost requires the optimisation of several key parameters (Table 1); however, to facilitate a fair comparison between RF and XGBoost, the default values for all parameters were used to construct the XGBoost models, with nrounds set to 500.
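A minimal sketch of the corresponding call is shown below. The matrix `x_train`, the label vector `y_train`, and the `multi:softmax` objective are assumptions for illustration (the paper does not state the objective used); all other parameters are left at the package defaults.

```r
library(xgboost)

# Illustrative sketch: `x_train` is a hypothetical numeric matrix of waveband
# predictors; `y_train` is an integer class vector coded 0..(k-1), as xgboost expects.
dtrain  <- xgb.DMatrix(data = x_train, label = y_train)
n_class <- length(unique(y_train))

xgb_model <- xgb.train(
  params = list(
    objective = "multi:softmax",   # assumed multiclass objective
    num_class = n_class
    # eta, max_depth, gamma, etc. left at the package defaults (Table 1)
  ),
  data    = dtrain,
  nrounds = 500                    # matches ntree = 500 used for RF
)
```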

Furthermore, to produce more robust models and prevent over-fitting, 10-fold cross-validation was performed for both RF and XGBoost.
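One way to set this up in R is sketched below using the 'caret' package; this is an assumption for illustration, as the paper does not specify how the cross-validation was implemented, and the object names are hypothetical.

```r
library(caret)

# Illustrative sketch: 10-fold cross-validation for both classifiers.
ctrl <- trainControl(method = "cv", number = 10)
p    <- ncol(train_df) - 1

# RF, holding mtry at its default value of sqrt(p)
rf_cv <- train(class ~ ., data = train_df, method = "rf",
               trControl = ctrl, ntree = 500,
               tuneGrid  = data.frame(mtry = floor(sqrt(p))))

# XGBoost, holding all tuning parameters at the xgboost package defaults
xgb_cv <- train(class ~ ., data = train_df, method = "xgbTree",
                trControl = ctrl,
                tuneGrid  = data.frame(nrounds = 500, max_depth = 6, eta = 0.3,
                                       gamma = 0, colsample_bytree = 1,
                                       min_child_weight = 1, subsample = 1))
```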


**Table 1.** Key parameters used for XGBoost classification.
