#### *2.12. Jeffries–Matusita Distance*

We expected that some species could not be classified individually, and that pooling all species below an arbitrarily defined minimum number of samples into a single group would not be meaningful. We therefore used the Jeffries–Matusita (JM) distance to find spectrally and structurally similar subgroups of species. The multiclass JM distances were calculated with the varSel package in R [48], using the best-performing feature set. First, the JM distances were calculated for all species pairs. Next, the species with the fewest samples was grouped with the species to which it had the lowest JM distance. This step was repeated until each species belonged to one of the groups. Finally, the whole process was repeated to obtain a smaller number of groups. The process started from the species with the fewest samples so that the resulting groups would contain enough samples for building a stable classification model.
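The grouping procedure can be illustrated with a minimal Python sketch. It is not the varSel implementation used in the study. It assumes that each class is multivariate Gaussian and that the JM distance takes the common form JM = 2(1 − e^(−B)), where B is the Bhattacharyya distance, so that values range from 0 to 2. The function names `jm_distance` and `group_species`, and the rule for joining an already existing group, are hypothetical and only one possible reading of the steps described above.

```python
import numpy as np


def jm_distance(x1, x2):
    """JM distance between two classes (rows = samples, columns = features).

    Assumption: both classes are multivariate Gaussian, and
    JM = 2 * (1 - exp(-B)) with B the Bhattacharyya distance.
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s = (s1 + s2) / 2.0
    d = m1 - m2
    # Log-determinants are numerically safer than raw determinants.
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_1 = np.linalg.slogdet(s1)
    _, logdet_2 = np.linalg.slogdet(s2)
    b = d @ np.linalg.solve(s, d) / 8.0
    b += 0.5 * (logdet_s - 0.5 * (logdet_1 + logdet_2))
    return 2.0 * (1.0 - np.exp(-b))


def group_species(counts, jm):
    """Greedy grouping, starting from the species with the fewest samples.

    counts: dict mapping species name -> number of samples.
    jm: dict mapping frozenset({a, b}) -> JM distance of that pair.
    Hypothetical rule: each ungrouped species joins its nearest species,
    or that species' group if one already exists.
    """
    group_of = {}
    groups = []
    for sp in sorted(counts, key=counts.get):
        if sp in group_of:
            continue
        others = (o for o in counts if o != sp)
        nearest = min(others, key=lambda o: jm[frozenset((sp, o))])
        if nearest in group_of:
            group = group_of[nearest]
        else:
            group = {nearest}
            groups.append(group)
            group_of[nearest] = group
        group.add(sp)
        group_of[sp] = group
    return groups
```

Running `group_species` again on the merged groups, with JM distances recomputed between groups, would correspond to the final step of reducing the number of groups.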

#### *2.13. Statistical Significance Tests*

McNemar's test without continuity correction was used for testing statistical significance [18,49,50]. It is an appropriate method when the sample size is small [51]. Specifically, McNemar's test was used to assess: (1) whether the OAs differed significantly between the feature sets; (2) whether the OAs differed significantly between the SVM and RF classification results; and (3) whether the feature selection had a significant impact. McNemar's tests were calculated from leave-one-out cross-validation (LOOCV) results. A limitation of McNemar's test is that it does not measure the variation resulting from the choice of training sets or the internal randomness of the algorithm. As the RF results vary between iterations, we repeated the LOOCV for the RF classifier 50 times and used the mode of the predictions for each sample when calculating McNemar's tests.
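As an illustration of the test, the following Python sketch computes McNemar's statistic without continuity correction, χ² = (b − c)² / (b + c), where b and c count the samples that only one of the two classifiers predicted correctly. It also shows how a per-sample mode could be taken over repeated runs. The function names `mode_predictions` and `mcnemar` are hypothetical, and the sketch is not the code used in the study.

```python
import math
from collections import Counter


def mode_predictions(runs):
    """Most frequent predicted label per sample over repeated runs.

    runs: list of prediction lists, one list per repeated LOOCV run,
    all in the same sample order.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*runs)]


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test without continuity correction.

    Returns (chi2, p), where p comes from the chi-square
    distribution with one degree of freedom.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    # b: only classifier A correct; c: only classifier B correct.
    b = sum(pa == t and pb != t for t, pa, pb in triples)
    c = sum(pa != t and pb == t for t, pa, pb in triples)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

For the RF classifier, the list passed as `pred_a` or `pred_b` would be the output of `mode_predictions` over the 50 repeated LOOCV runs.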
