**4. Discussion**

The effects of the raster data set and the number of training pixels on the classification accuracy of three invasive or expansive plant species were tested in this paper using the Random Forest and Support Vector Machine methods. Dividing the patterns into three sets (the training set, the test set, and the spatially independent validation set) allows for a reliable assessment of the classification accuracy. Balanced training sets of 30, 50, 100, 200, and 300 pixels per class were tested. The test set was strongly spatially correlated with the training set, which led to inflated accuracy results; therefore, it was used only for the initial accuracy assessment. Surprisingly, the accuracy measures (PA, UA, F1) calculated on the test data set increased despite the decrease in the number of patterns in the test set. This highlights the importance of using a spatially separate data set for proper accuracy assessment. A constant set of validation pixels that remained unchanged between iterations and was spatially separate from the training data allowed us to assess the classification accuracy reliably. Separating the data used to assess the classification results from the data used to train the classifiers avoided the artificial inflation of accuracy caused by spatial correlation between pixels belonging to the same reference polygon. Such a method allows for more objective comparisons of classification algorithms and data sets, while delivering more trustworthy accuracy metrics.

The very act of creating training and test or validation data sets introduces human or random bias into any comparison. To reduce this bias, the training and validation data sets were created multiple times. Such approaches have been used repeatedly in the past [24,51,52] and have proven more reliable for classifier comparison. The accuracy of any machine learning procedure is directly related to the quality of the samples used for training and validation of a given classifier. Repeated sampling of pixels for the reference sets, followed by accuracy assessment, minimized the impact of pixel selection for training on the classification accuracy and allowed an objective assessment of the impact of the tested data sets on the effectiveness of species identification [26,52,53].
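The sampling scheme described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names, the tuple layout `(pixel_id, class_label, polygon_id)`, the 30% validation share, and the synthetic reference data are all assumptions made for illustration. The idea is that whole reference polygons, rather than individual pixels, are assigned to the validation set, so that no polygon contributes pixels to both sets, and that balanced training sets are then drawn repeatedly from the remaining pool.

```python
import random
from collections import defaultdict


def split_polygons(pixels, validation_fraction, seed):
    """Reserve whole polygons for validation (hypothetical helper).

    pixels: list of (pixel_id, class_label, polygon_id) tuples.
    Assumes each polygon belongs to exactly one class and that every
    class has enough polygons to leave some for training.
    """
    rng = random.Random(seed)
    polygons_by_class = defaultdict(set)
    for _, label, polygon in pixels:
        polygons_by_class[label].add(polygon)

    validation_polygons = set()
    for label in sorted(polygons_by_class):
        polygons = sorted(polygons_by_class[label])
        rng.shuffle(polygons)
        n_validation = max(1, round(len(polygons) * validation_fraction))
        validation_polygons.update(polygons[:n_validation])

    train_pool = [p for p in pixels if p[2] not in validation_polygons]
    validation = [p for p in pixels if p[2] in validation_polygons]
    return train_pool, validation


def balanced_sample(pool, n_per_class, rng):
    """Draw the same number of training pixels from every class."""
    by_class = defaultdict(list)
    for pixel in pool:
        by_class[pixel[1]].append(pixel)
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.sample(by_class[label], n_per_class))
    return sample


# Synthetic reference data: 3 classes, 10 polygons each, 40 pixels per polygon.
pixels = [
    (f"{label}-{polygon}-{i}", label, f"{label}-{polygon}")
    for label in ("A", "B", "C")
    for polygon in range(10)
    for i in range(40)
]

# The validation set is fixed once; training sets are redrawn in each iteration.
train_pool, validation = split_polygons(pixels, validation_fraction=0.3, seed=0)
rng = random.Random(1)
training_sets = [balanced_sample(train_pool, 50, rng) for _ in range(5)]
```

Keeping the validation set fixed while the training sets are redrawn means that any spread in the accuracy measures across iterations reflects the choice of training pixels only.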

The analyses showed that, regardless of the selected classifier, classifications performed on 30 MNF transformation bands yielded a higher F1 score for all classes (0.854–0.918) than those performed on 430 hyperspectral data bands (0.760–0.853). Reducing the number of input layers to several dozen of the most informative bands is recommended for the Random Forest and Support Vector Machine algorithms, as it yields higher accuracy levels and significantly shortens the classification time [51,54–56]. During the classification of herbaceous vegetation in the Hortobágy National Park (Eastern Hungary), a higher overall accuracy level was obtained for nine MNF transformation bands (SVM = 82.06%, RF = 79.14%) than for 128 original bands of AISA Eagle (SVM = 72.85%, RF = 72.89%) [27]. Similarly, when tree species were identified from AISA Eagle data using the SVM algorithm, classification of the MNF-transformed data increased the classification agreement by about 30% compared to the classification performed on the original bands [57]. The first 30 MNF transformation bands were also used, for example, to identify four invasive or expansive species in central Poland, yielding high F1 scores: about 0.80 for *Filipendula ulmaria* and *Molinia caerulea*, about 0.79 for *Phragmites australis*, and about 0.73 for *Solidago gigantea* [58].

Increasing the number of training pixels raised the F1 scores for the three species analyzed in this article and simultaneously narrowed their distribution, which indicates stabilization of the results. Our observations indicate that the preferred number of training patterns is at least 300 pixels per class, regardless of the classifier used. In the case of 30 MNF and the SVM algorithm, 300 was the optimal value because there were no statistically significant differences between training data sets containing 200 and 300 pixels per class (Figure 6). Because larger continuous areas of invasive plants were not available in our research area, we limited the analysis to 300 pixels and were therefore unable to assess the impact of a larger number of training pixels per class on the classification results. A similar trend was observed when different sets of training pixels (from 10 to 30 pixels) and raster data were tested for the classification of 20 herbaceous species in Eastern Hungary by means of the SVM and RF algorithms [27]. In that study, the highest overall accuracy (SVM: 82.06%; RF: 79.14%) was obtained using the largest of the tested sets of patterns (30 training pixels), and the overall classification accuracy decreased with a decreasing number of training pixels (lower by about 2 percentage points for the set of 10 training pixels).

After a detailed analysis, it can be concluded that the Support Vector Machine algorithm was more resistant to smaller numbers of training patterns and yielded a higher mean F1 score for the three plant species (F1 SVM = 0.95) than the Random Forest algorithm (F1 RF = 0.92) on the best data set (30 MNF, 300 training pixels). The SVM result image had a lower mean F1 score for the background classes (F1 SVM = 0.82, F1 RF = 0.91), but the classification errors occurred mainly between different background classes and not between the background and the plant species.
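A small sketch may help show why confusion among background classes lowers the background F1 scores without affecting the species scores. It computes PA, UA, and F1 per class from a confusion matrix, using the standard definitions (PA as the diagonal count divided by the row total of reference pixels, UA as the diagonal count divided by the column total of classified pixels, F1 as their harmonic mean). The function name, the class labels, and the counts are hypothetical and are not taken from this study.

```python
def per_class_metrics(confusion, labels):
    """Return PA, UA, and F1 for every class.

    confusion[i][j] is the number of reference pixels of class i
    that the classifier assigned to class j.
    """
    metrics = {}
    for k, label in enumerate(labels):
        correct = confusion[k][k]
        reference_total = sum(confusion[k])                # row sum
        classified_total = sum(row[k] for row in confusion)  # column sum
        pa = correct / reference_total if reference_total else 0.0
        ua = correct / classified_total if classified_total else 0.0
        f1 = 2 * pa * ua / (pa + ua) if pa + ua else 0.0
        metrics[label] = {"PA": pa, "UA": ua, "F1": f1}
    return metrics


# Hypothetical counts: most errors fall between the two background classes.
labels = ["species", "background 1", "background 2"]
confusion = [
    [95, 3, 2],
    [2, 80, 18],
    [1, 22, 77],
]

metrics = per_class_metrics(confusion, labels)
for label in labels:
    print(label, {name: round(value, 2) for name, value in metrics[label].items()})
```

In this toy matrix the off-diagonal counts are concentrated in the background rows and columns, so the species class keeps a high F1 score while both background classes score lower.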

Visual interpretation of the result images and statistical accuracy analyses indicated that both classifiers detected the plant species of this study in the research area with a very high level of accuracy. Correct identification of species was also confirmed by additional field verifications carried out after the analyses. High classification accuracy levels obtained for the analyzed scenarios may also be due to the optimal time in which the imaging was obtained [26,59]. The analyzed species are in their flowering and fruiting phases at the turn of August and September, which makes them more distinctive thanks to their characteristic colors of inflorescences, fruits, and leaves (Table 5).

The classification accuracy of *Solidago* spp. was very high (F1 > 0.95) for both classifiers and both raster data sets. This is not surprising because this plant's yellow inflorescences form homogeneous fields, which are easy to distinguish from other objects in images, and it would probably even be possible to use photointerpretation for this task. *Solidago gigantea* was identified in central Poland using 30 MNF transformation bands (a mosaic of hyperspectral data from the same HySpex sensors) and the Random Forest method; a lower F1 score for the species, about 0.73, and a slightly higher F1 score for the background, about 0.94, were obtained [58]. *Solidago* spp. has also been classified with high accuracy (F1 about 0.83, UA = 0.71, PA = 1.0) on the Hungarian–Slovak cross-border site using 15 MNF bands (a mosaic of hyperspectral data from AISA Eagle II) and the maximum likelihood method [61]. High identification accuracy of one of the goldenrod species, *Solidago altissima* (F1 score of about 0.86, UA = 0.94, PA = 0.80), was also obtained during research conducted in the Watarase wetlands in Japan using only 3 MNF transformation bands (a mosaic of hyperspectral data from AISA Eagle) and generalized linear models [19].


**Table 5.** Comparison of acquired results with references.

*Rubus* spp. was classified in the research area with F1 scores ranging from 0.70 to 0.97, with the highest accuracy obtained for the Support Vector Machine method and 30 MNF transformation bands. High accuracy (OA = 87.8% and Kappa = 0.75) was also obtained during the detection of *Rubus armeniacus* in open areas in Surrey, BC, Canada, by means of a combination of CASI hyperspectral imagery with LiDAR data and the Random Forest algorithm [62]. Similarly, when *Rubus fruticosus* sp. agg. was identified in the Kosciuszko National Park in Australia, an F1 score of about 0.83 was obtained for blackberry using 23 bands of a mosaic of hyperspectral data from HyMap after MNF transformation and the Mixture-Tuned Matched Filter (MTMF) algorithm [32]. On the other hand, research on the identification of *Rubus cuneifolius* in the eastern parts of South Africa using the SVM algorithm and multispectral data led to much lower accuracy: the F1 scores for the Landsat data varied from 0.33 to 0.48, while the scores for the Sentinel-2 data were between 0.34 and 0.58, which confirms that hyperspectral data allow for much more accurate detection of blackberries [63].

Identification of *Calamagrostis epigejos* resulted in F1 scores between 0.70 and 0.91, depending on the algorithm and data set used. As before, the best combination for wood small-reed classification turned out to be the SVM algorithm with MNF-transformed bands (F1 scores from 0.86 to 0.91), while the RF method resulted in F1 scores between 0.76 and 0.83, depending on the number of pixels used for training. Classifications of *C. epigejos* carried out at various growth stages confirmed that flowering time (around September) facilitated correct identification of wood small-reed [26]. In addition, the use of the Random Forest method and MNF transformation bands on the HySpex hyperspectral data led to an F1 score of 0.72, which is an accuracy level close to the one obtained for wood small-reed in our research. Lower accuracy (producer accuracy of 68% and user accuracy of 51%) was obtained in the classification of plant communities representing *Calamagrostis villosa* when the APEX data and the SVM method were used [60]. However, an average PA of about 82% and UA of about 75% were obtained for wood small-reed grasses during the classification of high-mountain vegetation communities using 40 MNF transformation bands of the DAIS 7915 data and neural networks [64]. This was similar to the results obtained in our work on 30 MNF bands with the RF algorithm (PA of about 88%, UA of about 78%) and lower than the results for SVM (PA of about 96%, UA of about 87%).
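Because some of the studies cited above report PA and UA while others report F1, it can help to put them on a common scale. The sketch below does this with the standard definition of F1 as the harmonic mean of PA and UA. The function name is an assumption, and since the inputs are the rounded, approximate values quoted in the paragraph above, the resulting F1 values are approximations as well.

```python
def f1_from_pa_ua(producer_accuracy, user_accuracy):
    """F1 as the harmonic mean of producer's and user's accuracy (both in 0..1)."""
    total = producer_accuracy + user_accuracy
    if total == 0:
        return 0.0
    return 2 * producer_accuracy * user_accuracy / total


# PA and UA values as quoted in the text above, expressed as fractions.
reported = {
    "C. villosa, APEX + SVM [60]": (0.68, 0.51),
    "DAIS 7915 + neural networks [64]": (0.82, 0.75),
    "this work, 30 MNF + RF": (0.88, 0.78),
    "this work, 30 MNF + SVM": (0.96, 0.87),
}

approximate_f1 = {
    name: round(f1_from_pa_ua(pa, ua), 2) for name, (pa, ua) in reported.items()
}
for name, value in approximate_f1.items():
    print(f"{name}: F1 of about {value}")
# Prints values of about 0.58, 0.78, 0.83, and 0.91, in that order.
```

The last two values sit at the upper ends of the F1 ranges reported above for RF (0.76 to 0.83) and SVM (0.86 to 0.91), which is consistent with those PA and UA figures describing the best-performing training sets.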
