## *3.1. Statistical Analysis of Investigated Classification Scenarios*

The mean F1 score was calculated for all classes on two sets: the test set and the validation set. The test set was dependent on the training set—the pixels in these sets were drawn from the same polygons, so the number of pixels in the test set decreased with an increasing number of pixels in the training set (Table 2). The high accuracy level obtained for this set is, therefore, not surprising, nor can it be used to compare the classifiers.

In contrast, the validation set had a fixed number of observations (4835 pixels) and was spatially independent of the other data sets. Regardless of the classifier used, the mean F1 scores for all classes on the validation set were higher for classifications performed on the 30 MNF transformation bands (0.854–0.918) than for those performed on the 430 hyperspectral bands (0.760–0.853).
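As an illustration of the metric reported here, the following minimal sketch computes per-class F1 scores and their unweighted mean from a confusion matrix. The matrix orientation (rows are reference classes, columns are predicted classes), the function names, and the toy counts are assumptions made for this example and are not taken from the study.

```python
def per_class_f1(confusion):
    """Return the F1 score of each class from a square confusion matrix.

    Assumption: rows are reference classes, columns are predicted classes.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # missed reference pixels
        fp = sum(confusion[r][k] for r in range(n)) - tp  # wrongly assigned pixels
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mean_f1(confusion):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


# Hypothetical 3-class pixel counts, for illustration only
toy = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 44],
]
print([round(s, 3) for s in per_class_f1(toy)], round(mean_f1(toy), 3))
```

Because the mean is unweighted, every class contributes equally regardless of how many validation pixels it contains.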


**Table 2.** Classifier training parameters and their average F1 scores.

The accuracy level for both classifiers increased with the number of training pixels used for classification (Figure 5). The distributions of the mean F1 score for all classes revealed that when the number of training pixels increased, the interquartile range of the obtained accuracies decreased, so the results obtained in 100 iterations were more stable. What is more, the use of a smaller number of training pixels caused a greater decrease in the accuracy of classifications performed on the original hyperspectral bands than in the case of classifications performed on the MNF transformation bands. The most stable distributions and the highest F1 scores for all classes were obtained by the classifications performed on a set of 30 MNF transformation bands and 300 training pixels (the median F1 for RF was about 0.92, while the median F1 for SVM was about 0.88).
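The stability comparison described above rests on the median and interquartile range of the scores obtained over repeated iterations. The sketch below shows one way to compute both with the Python standard library; the score lists are hypothetical and much shorter than the 100 iterations used in the study, and the "inclusive" quantile method is an assumption, since the text does not state how the quartiles were calculated.

```python
from statistics import median, quantiles


def summarize_scores(scores):
    """Return (median, interquartile range) of a list of F1 scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q3 - q1


# Hypothetical mean F1 scores from repeated runs (not values from the study)
few_pixels = [0.78, 0.82, 0.85, 0.80, 0.88, 0.76, 0.84]
many_pixels = [0.91, 0.92, 0.92, 0.93, 0.91, 0.92, 0.93]

for name, scores in (("few", few_pixels), ("many", many_pixels)):
    med, iqr = summarize_scores(scores)
    print(f"{name}: median={med:.3f}, IQR={iqr:.3f}")
```

A narrower interquartile range for the larger training set corresponds to the more stable results described in the text.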

**Figure 5.** Distributions of mean F1 scores for all classes calculated on the validation data set for the SVM and RF classifiers, both analyzed raster data sets, and different numbers of training pixels. Explanations are presented in Figure 4.

To check whether the F1 scores of the tested scenarios differed significantly, the Mann–Whitney–Wilcoxon test was carried out at a significance level of 0.05 (Figure 6). The differences between most of the considered scenarios were statistically significant. One exception was the pair of SVM classifications on MNF bands using 200 and 300 training pixels. In addition, no statistically significant difference was found between the RF classification performed on 430 hyperspectral bands using 300 training pixels and the SVM classification on a very limited data set consisting of 30 MNF bands and 30 training pixels.
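For readers who want to reproduce this kind of pairwise comparison, the sketch below implements a two-sided Mann–Whitney U test using only the Python standard library. It relies on the normal approximation with a tie correction and no continuity correction, which is an assumption of this example; the text does not state which software or variant of the test was used, and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # ranks are 1-based; ties share the mean
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # significance level used in the text

# Hypothetical F1 scores for two scenarios (not values from the study)
scenario_a = [0.91, 0.92, 0.93, 0.92, 0.94, 0.90, 0.93, 0.95]
scenario_b = [0.84, 0.86, 0.85, 0.88, 0.83, 0.87, 0.85, 0.89]
u, p = mann_whitney_u(scenario_a, scenario_b)
print(f"U={u}, p={p:.4f}, significant={p < ALPHA}")
```

Applying the function to every pair of scenarios and recording whether `p < ALPHA` would give a matrix of the kind shown in Figure 6.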

**Figure 6.** Matrix of statistical significance between scenarios calculated on the basis of F1 accuracy for all classes using the Mann–Whitney–Wilcoxon U test (red fields indicate significant differences between populations at the 0.05 significance level). Names of scenarios contain an acronym of the classification algorithm (RF or SVM), information about raster data (430 HS or 30 MNF), and size of the training data set in pixels.

*Remote Sens.* **2020**, *12*, 516

An analysis of the distribution of F1 scores for individual classes of identified species (Figure 7) makes it possible to draw conclusions about the best data sets and algorithms for classifying each class.

**Figure 7.** F1 score distribution for validation data set (**a**) 430 bands and (**b**) 30 MNF bands. The horizontal axis of the charts indicates the number of pixels in the training set used to classify the given species using RF or SVM classifiers. The vertical axis shows the accuracy of the scenarios.

The *Solidago* spp. class was identified well with all classifiers and raster data sets (the F1 score was above 0.95). The accuracy levels increased with an increasing number of training pixels, although the differences in accuracy resulting from the change in the size of the training sets were small. Slightly higher mean F1 scores were recorded for the Random Forest classifier. *Solidago* are marked by their very characteristic yellow color and spectral properties, which distinguished them from other classes in the imaging, and additionally tend to form large, uniform fields, so the almost perfect identification of this species was not surprising.

In the case of the *Rubus* spp. class, the best identification results were obtained for the SVM classification on 30 MNF bands using 300 training pixels (F1 = 0.97), but application of the same classifier with the number of training pixels reduced to 100 resulted in a similar accuracy level. Good results were also obtained for the RF classification on the same raster data set and 300 training pixels (F1 = 0.95). The F1 scores obtained on 430 hyperspectral data bands were lower (F1 RF from 0.7 to 0.76, and F1 SVM from 0.71 to 0.84).

*Calamagrostis epigejos* was a more difficult plant species to identify. However, high F1 scores of around 0.91 were obtained using the SVM algorithm, 30 MNF transformation bands, and sets of 200 and 300 training pixels. A similar accuracy level was also obtained for the SVM classification and 300 training pixels on 430 hyperspectral bands (F1 = 0.9). The Random Forest classification resulted in lower accuracy levels for this species, with F1 scores between 0.7 and 0.82 on the hyperspectral data set, and between 0.76 and 0.83 on the MNF transformation bands. The accuracy increased with the growth of the number of training pixels.

Considering the mean accuracy level for three species identified in the research area, it can be concluded that the best spatial distribution was obtained using the SVM algorithm and 200 or 300 training pixels (F1 = 0.95). For the other classes distinguished in the image (i.e., plant background, forests, buildings, bare soil, and shadows), the best F1 scores (from 0.93 to 0.96) were obtained with the RF algorithm. However, in terms of accuracy for all the classes together, the best accuracy (Kappa = 0.92, F1 for all classes = 0.92) was obtained for the RF classifier, 30 MNF bands, and sets of 200 and 300 training pixels.

To sum up, the SVM algorithm and the data set consisting of 30 MNF bands and 300 training pixels proved to be the best for identifying the *Calamagrostis* and *Rubus* classes. In the case of *Solidago* and background classes, better results were obtained with the Random Forest classifier. However, goldenrod was classified well (mean F1 > 0.95) on both sets of raster data and with different numbers of training pixels. On the other hand, in the case of background classes, the best results were obtained for 30 MNF bands and 200 training pixels. This may indicate that the Random Forest method works better for the classification of spectrally uniform, large forms of land use, which differ significantly from their surroundings, while the SVM method is better for identifying plant species that are more spectrally different and similar to the background classes.

## *3.2. Best Model Plant Species Identification Accuracy*

A set of data consisting of 30 MNF bands and 300 training pixels was selected on the basis of the analysis of statistical accuracy to develop images showing spatial distributions of the analyzed species in the research area. Figure 8 presents distributions of the producer and user accuracies for 100 iterations of classifications performed on a selected set of data using both classifiers.

For the *Rubus* spp. class, the RF classifier yielded a median user's accuracy three percentage points lower than that of SVM, while the differences in the producer's accuracy levels between the classifiers were small. Both the producer's and user's accuracies for *Solidago* spp. were very high (close to 100%); a slight underestimation was detected only in the case of the SVM classification (producer's accuracy about 93%). In contrast, the *Calamagrostis epigejos* class achieved the lowest median producer's and user's accuracies of all classes. The SVM classifier achieved higher producer's and user's accuracy levels for *C. epigejos* (PA = 96%, UA = 87%) than the RF classifier (PA = 88%, UA = 78%).
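The producer's accuracy (PA) and user's accuracy (UA) discussed above can be derived from a confusion matrix as sketched below. The orientation of the matrix (rows are reference classes, columns are predicted classes), the function name, and the toy counts are assumptions for illustration and are not taken from Tables 3 and 4.

```python
def producer_user_accuracy(confusion):
    """Return a list of (PA, UA) pairs, one per class.

    Assumption: rows are reference classes, columns are predicted classes.
    PA = correct pixels / reference pixels of the class (omission side).
    UA = correct pixels / pixels assigned to the class (commission side).
    """
    n = len(confusion)
    result = []
    for k in range(n):
        correct = confusion[k][k]
        reference_total = sum(confusion[k])
        predicted_total = sum(confusion[r][k] for r in range(n))
        pa = correct / reference_total if reference_total else 0.0
        ua = correct / predicted_total if predicted_total else 0.0
        result.append((pa, ua))
    return result


# Hypothetical 3-class pixel counts, for illustration only
toy = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 44],
]
for k, (pa, ua) in enumerate(producer_user_accuracy(toy)):
    print(f"class {k}: PA={pa:.1%}, UA={ua:.1%}")
```

A PA below the UA for a class indicates that it tends to be underestimated on the map, while a UA below the PA indicates overestimation.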

The resulting images for both classification methods, prepared for the iterations with the best mean F1 scores for all classes, are presented and compared below (Figure 9). The correctness of species identification was also assessed on the basis of the confusion matrices (Tables 3 and 4).

**Figure 8.** User's and producer's accuracies for the classification based on 30 MNF bands and the 300-pixel training set.

**Figure 9.** Classification results of the (**a**) SVM and (**b**) RF based on 30 MNF bands and 300-pixel training sets; SVM: Kappa coefficient = 0.89, OA = 91.21%; RF: Kappa coefficient = 0.92, OA = 93.23%.


**Table 3.** Confusion matrix of the SVM classification with 30 MNF bands and the 300-pixel training set (Kappa coefficient = 0.89, OA = 91.21%).

**Table 4.** Confusion matrix of the RF classification with 30 MNF bands and the 300-pixel training set (Kappa coefficient = 0.92, OA = 93.23%).
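The overall accuracy (OA) and Kappa coefficient quoted in the captions above are both functions of the confusion matrix. The following sketch shows the standard formulas under the assumption that rows are reference classes and columns are predicted classes; the function names and the toy counts are hypothetical and do not reproduce the values of Tables 3 and 4.

```python
def overall_accuracy(confusion):
    """Share of correctly classified pixels (diagonal sum / total)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def cohen_kappa(confusion):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_observed = overall_accuracy(confusion)
    # Chance agreement from the products of row and column marginals
    p_expected = sum(
        sum(confusion[k]) * sum(confusion[r][k] for r in range(n))
        for k in range(n)
    ) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical 3-class pixel counts, for illustration only
toy = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 44],
]
print(f"OA={overall_accuracy(toy):.2%}, Kappa={cohen_kappa(toy):.2f}")
```

Kappa is lower than OA whenever part of the observed agreement could be expected by chance given the class proportions.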


*Rubus* spp. was identified near forest borders and buildings, and its spatial distribution for the SVM method reflected reality more accurately than the result of using the RF method (Figure 9). There was a slight overestimation of this species in the case of the RF method, especially in places with trees and bushes near buildings (Table 4). The *Calamagrostis epigejos* and *Solidago* spp. classes can be found in the open spaces of non-agricultural meadows. The spatial distribution of *Solidago* in the image resulting from the use of the RF method reflected reality almost perfectly, and in the case of the SVM method, the underestimation of this species applied mainly to uncut meadows in the south of the area. On the other hand, the *Calamagrostis epigejos* class was slightly overestimated in the results of both classifications, especially in places with dry or mowed meadows. The SVM classification image presents the spatial distribution of this species in the research area with greater precision (Table 3), and its estimations were more accurate, especially in places with bare soils, which have a similar spectral response.
