**3. Results**

#### *3.1. Comparison of Feature Sets and Classifiers*

The highest OA for both classifiers was achieved with the MNF + ALS feature set (Table 2). The difference between the SVM and RF classifications was statistically significant (*p* < 0.05) only when the reflectance or NVI feature set was used (Table S2). The MNF feature set outperformed the reflectance, NVI and ALS feature sets with statistical significance. For the SVM classification, fusing ALS features with MNF features improved the OA with statistical significance compared with the classification using MNF features alone (Tables S3 and S4). For the RF classification, the improvement between these feature sets was not statistically significant. The highest classification accuracy was achieved with the SVM classifier and the MNF + ALS feature set, but the improvement over the RF classification with the same feature set was not statistically significant. Generally, the OAs were low because many species with fewer samples were classified poorly.


**Table 2.** Classification results for the different feature sets using support vector machine and random forest classifiers with all of the species with more than three samples classified separately.
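For reference, OA and Kappa can both be derived from a confusion matrix. The following is a minimal sketch: the function names and the toy 3-class matrix are illustrative assumptions, not the code or data used in this study.

```python
import numpy as np

def overall_accuracy(cm):
    """Share of samples on the diagonal of a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    # Chance agreement from the row and column marginals
    p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical confusion matrix (rows: reference, columns: predicted)
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 2, 8]]

oa = overall_accuracy(cm)
kappa = cohen_kappa(cm)
```

Because Kappa discounts chance agreement, it is lower than OA whenever OA is below 100%, which matches the pattern of the values reported in this section.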

#### *3.2. Feature Selection*

Feature selection had only a small impact on OA and Kappa (Table S5), and the change in OA was statistically significant only for the NVI feature set classified with the SVM classifier. However, the same level of accuracy could be achieved with a smaller number of input features (Table 3). The important spectral regions were found at 400–450 nm, with 550–570 nm and 700–800 nm also being important. The most important MNF component (MNF9) had high weights in the same regions where we found important spectral bands (Table 4 and Figure 6). The most important vegetation indices were the anthocyanin content index (ACI) and the anthocyanin reflectance index (ARI) (Table S1 and Table 4), which were calculated from spectral bands centered at 549, 698 and 788 nm; these bands also appear as spikes in the MNF9 component.

**Table 3.** Features selected by VSURF at the prediction phase for the different feature sets. The features are ordered by importance, starting with the most important.


**Figure 6.** Contribution (weight) of different wavelengths to the most important MNF component (MNF9; vertical bars) plotted over the mean spectra of the 10 species with the most samples. Bars represent the absolute weight, and the sign is indicated with color (positive, negative).

#### *3.3. Jeffries–Matusita Distance*

The spectral regions with the highest JM distances between species (the nine species with the most samples were selected for closer inspection) were found most often near 400 nm and 550 nm (Figure 7). There were notable differences between species. For example, *Euphorbia kibwezensis* had no bands with the highest distance between species at around 470–740 nm, but it had an important region around 750 nm, where its reflectance is notably lower than that of the other species.

**Figure 7.** Reflectance (mean and standard deviation) for selected species and the wavelengths with the greatest JM distances between species (vertical lines).
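As an illustration of how a band-wise JM distance between two species can be computed, the sketch below uses the common Gaussian form based on the Bhattacharyya distance, scaled to the range 0 to 2. The Gaussian assumption, the scaling, the function name and the synthetic spectra are all assumptions made for this example; they are not taken from the analysis reported here.

```python
import numpy as np

def jm_distance_per_band(x1, x2):
    """Band-wise JM distance between two classes.

    x1, x2: arrays of shape (n_samples, n_bands) holding reflectance
    for two species. Each band is treated as a univariate Gaussian
    (an assumption of this sketch). Returns one value per band.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    v1, v2 = x1.var(axis=0, ddof=1), x2.var(axis=0, ddof=1)
    s = 0.5 * (v1 + v2)  # pooled variance
    # Univariate Bhattacharyya distance
    b = 0.125 * (m1 - m2) ** 2 / s + 0.5 * np.log(s / np.sqrt(v1 * v2))
    # JM distance saturates at 2 for fully separable classes
    return 2.0 * (1.0 - np.exp(-b))

# Synthetic spectra: two "species" that differ only in band index 2
rng = np.random.default_rng(0)
species_a = rng.normal(0.3, 0.02, size=(30, 5))
species_b = species_a.copy()
species_b[:, 2] += 1.0

jm = jm_distance_per_band(species_a, species_b)
best_band = int(np.argmax(jm))
```

Plotting the bands with the largest `jm` values as vertical lines over the mean spectra would give a figure in the style of Figure 7.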

#### *3.4. Data Balancing*

The mean OA (10 iterations with the best performing feature set, MNF + ALS with feature selection) was 57.1% for the imbalanced and 56.0% for the balanced classification (all 31 species). The mean F1-scores ranged between 0% (*Ficus sur*) and 84.9% (*Eucalyptus* spp.) in the imbalanced setting (Figure 8). *Acacia mearnsii*, *Grevillea robusta*, *Eucalyptus* spp. and *Euphorbia kibwezensis* had mean F1-scores of 73.8%, 74.6%, 84.9% and 71.7%, respectively, with low variability. The species with fewer samples had high variability and lower F1-scores. However, *Erythrina abyssinica*, *Acacia tortilis* and *Ficus sycomorus*, with 9, 4 and 8 samples, respectively, performed better than *Persea americana* and *Cupressus lusitanica*, with 42 and 31 samples, respectively. Up-sampling had only a minor impact on the results.

**Figure 8.** F1-scores for all species (more than three samples) in the imbalanced and balanced (up-sampling) settings using the support vector machine classifier and features selected by VSURF (MNF + ALS).
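The balancing step can be sketched as random up-sampling with replacement until every class matches the largest one. This is an assumption about the up-sampling variant, made for illustration only; the function name and the toy data are hypothetical. Up-sampling is normally applied to the training samples only, so that duplicated samples do not leak into the test set.

```python
import numpy as np

def upsample_to_largest(X, y, seed=0):
    """Randomly duplicate samples of smaller classes (with replacement)
    until every class has as many samples as the largest class."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    selected = []
    for cls, n in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing number of samples from the class itself
        extra = rng.choice(idx, size=target - n, replace=True)
        selected.append(np.concatenate([idx, extra]))
    keep = np.concatenate(selected)
    return X[keep], y[keep]

# Hypothetical training set: 4, 2 and 1 samples in three classes
X_train = np.arange(14).reshape(7, 2)
y_train = np.array(["a", "a", "a", "a", "b", "b", "c"])

X_bal, y_bal = upsample_to_largest(X_train, y_train)
balanced_counts = dict(zip(*np.unique(y_bal, return_counts=True)))
```

For a class with very few samples, the duplicated rows add no new spectral information, which is one plausible reason why up-sampling changes recall and precision more than it changes OA.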

#### *3.5. Grouping by Frequency*

Combining the species with fewer than 20 samples increased the mean OA to 70.2% (imbalanced) and 69.2% (balanced), while the mean Kappa was 62.9% and 61.3% for the imbalanced and balanced classification, respectively (Figure 9). Up-sampling increased recall for *Acacia mearnsii*, *Cupressus lusitanica* and *Persea americana*, while precision decreased. We found a notable increase in the mean F1-score only for *Persea americana*, while the F1-scores of species with more samples and higher initial classification accuracy decreased slightly.

**Figure 9.** Precision, recall and F1-scores for the species with more than 20 samples and the Other class in the balanced and imbalanced settings using the support vector machine classifier and features selected by VSURF (MNF + ALS).
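The frequency-based grouping amounts to relabeling every species below a sample-count threshold as a single class. A minimal sketch follows; the function name, the toy labels and the treatment of a species with exactly 20 samples (kept as its own class here) are assumptions of this example.

```python
from collections import Counter

def group_rare_classes(labels, min_samples=20, other="Other"):
    """Replace labels of classes with fewer than `min_samples`
    samples by a single `other` label."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_samples else other for lab in labels]

# Hypothetical label list: one frequent and two rare species
labels = ["Grevillea robusta"] * 25 + ["Ficus sur"] * 4 + ["Acacia tortilis"] * 4
grouped = group_rare_classes(labels)
grouped_counts = Counter(grouped)
```

Note that the merged class can end up larger than several of the retained species, so the class balance changes as well as the number of classes.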

#### *3.6. Single Species Classification*

Up-sampling had the biggest impact on the results when the species were classified individually against all remaining species (Figure 10). The mean recall increased and the mean precision decreased for most species. The mean F1-scores increased notably for *Acacia seyal*, *Acacia tortilis* and *Ficus sycomorus*, from 40.2%, 64.3% and 82.6% to 51.3%, 77.3% and 87.2%, respectively.

**Figure 10.** Classification results when each species was classified individually against a mixed group of all other species (results shown for species with F1-score > 50%) in the balanced and imbalanced settings (SVM classifier and MNF + ALS feature set with feature selection). The class-level accuracies for the "other" class are not shown.
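The single-species setting can be sketched as a relabeling of the data into the target species and a mixed "other" class, followed by the usual precision, recall and F1 computation for the target. The helper names and the toy labels below are hypothetical, and the convention of returning 0 when a denominator is zero is an assumption of this sketch.

```python
def to_binary(labels, target, rest="other"):
    """Keep the target species label and merge all others into `rest`."""
    return [lab if lab == target else rest for lab in labels]

def precision_recall_f1(y_true, y_pred, target):
    """Precision, recall and F1-score for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    # Return 0.0 when a denominator is zero (assumed convention)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical reference and predicted labels for six crowns
y_true = ["a", "a", "b", "c", "a", "b"]
y_pred = ["a", "b", "b", "a", "a", "c"]

scores = precision_recall_f1(to_binary(y_true, "a"), to_binary(y_pred, "a"), "a")
```

An increase in recall accompanied by a decrease in precision, as reported for most species after up-sampling, corresponds to fewer false negatives and more false positives in this computation.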

#### *3.7. Grouping Species Based on Jeffries–Matusita Distance*

The species with high F1-scores and low variability (*Acacia mearnsii*, *Grevillea robusta*, *Eucalyptus* spp. and *Euphorbia kibwezensis*) were classified separately, while four groups were created for the remaining species based on JM distance (Table 4). Two of the groups were classified with F1-score > 60%, while the other two had mean F1-scores around 50% (Figure 11). The mean OA was 66% and the mean Kappa was 61%, and up-sampling had only a small impact on the overall performance. All species in Group 3 are fruit-bearing trees with economic importance.

**Table 4.** Groups generated using JM distances and the total number of samples in each group.


**Figure 11.** Classification results with JM-distance-based class grouping in the balanced and imbalanced settings using the support vector machine classifier and features selected by VSURF (MNF + ALS).

#### *3.8. Comparison of Different Approaches*

We selected four of the species (*Acacia mearnsii*, *Grevillea robusta*, *Eucalyptus* spp. and *Euphorbia kibwezensis*) with the highest F1-scores for a closer comparison of how they were affected depending on how the remaining species were grouped (Figure 12). For the selected species, the highest mean precision and F1-scores were achieved when the species were classified individually against mixed groups of all other species. The highest recall for *Acacia mearnsii* and *Grevillea robusta*, the species with the greatest number of samples, was achieved when all 31 species were classified separately.

**Figure 12.** Comparison of precision, recall and F1-score for the selected tree species with different grouping methods and up-sampling (All = all 31 species classified individually, Single = each species classified against a mixed group of all other species, JM = JM distance used to group species, Lim20 = species with fewer than 20 samples grouped together).
