**1. Introduction**

The spread of invasive and expansive species is one of the main threats to biodiversity and the functioning of ecosystems [1]. It results in the transformation of natural habitats, the displacement of native species, and the degradation of environmental conditions (e.g., the number of existing micro- and macrophytes). It also generates economic losses by degrading the quality of soil and destroying road and railway infrastructure [2]. In the European Union (EU), the cost of controlling and combating invasive species is estimated at approximately 12 billion EUR per year [3]. Implementing appropriate remedial strategies and effectively limiting the effects of invasion require constant monitoring, as emphasized in EU Regulation No. 1143/2014.

The species that pose a threat to natural habitats protected under the Natura 2000 program in Poland include native expansive plants such as blackberry shrubs (*Rubus* spp. L.) and the perennial wood small-reed (*Calamagrostis epigejos* (L.) Roth), as well as foreign invasive goldenrod species (*Solidago* spp. L.). These species do not have high habitat requirements; they also reproduce quickly, both vegetatively and generatively, and they stifle other plants [4]. They negatively impact valuable natural habitats, such as inland sand calcareous grasslands, mountain and lowland *Nardus* grasslands, *Molinia* meadows, and alluvial meadows, as well as extensively used fresh lowland and mountain hay and bent-grass meadows [5–7]. To prevent further changes in the vegetation, these harmful species should be identified and removed, preferably at the early stages of invasion.

The current monitoring of changes in plant species is based on fixed target areas. Individual specimens of the species found in the target areas are counted, and the observed regularities are extrapolated to the whole area, which can vary due to, for example, environmental components or land use. In comparison to traditional field surveys, remote sensing allows for objective and repeatable monitoring that can be conducted on both local and global scales [8,9]. Considering the complexity of class distinctions, i.e., both intra-class similarities and differences between classes, the data that can be used for this purpose are multispectral (e.g., Landsat [10] or WorldView-2 [11]) or hyperspectral (e.g., HyMap) [12]. As hyperspectral data provide a nearly continuous record of spectral reflectance, they carry a lot of information about the biophysical and chemical characteristics of the analyzed vegetation [13–15]. Either hyperspectral satellite data (e.g., Hyperion [16] and CHRIS [15,17]) or aerial data (e.g., APEX [18] and AISA [19,20]) are used, depending on the size of the research area and the canopy characteristics of the identified vegetation. Airborne data are more useful for the detection of small, less compact patches of plant species because of their high spatial resolution [16]. A study of Mediterranean plants in southern France confirmed that spectral and spatial resolution influence the accuracy of vegetation mapping [21]. The highest accuracy in classifying five vegetation types was obtained using the airborne hyperspectral imaging sensor HyMap. Depending on the classification method used, the overall accuracies (OAs) ranged from 62.3% for k-nearest neighbor (k-nn), through 67.7% for Random Forest (RF) and 70.2% for Support Vector Machine (SVM), up to 72.5% for Artificial Neural Networks (ANNs). The use of ASTER satellite data resulted in slightly lower accuracy levels (from 60.3%), and the worst results were obtained using multispectral Landsat 7 ETM+ data (59.3%).

Multi-dimensional, large-scale image data can be used effectively when they are processed with modern classification methods such as Support Vector Machine (SVM) [22] or Random Forest (RF) [23]. Both are considered to be among the most effective classification methods [21]. The SVM algorithm transforms the original space and then constructs an optimal hyperplane in the multi-dimensional feature space, which divides the data into different classes with the largest possible margin of separation. The algorithm works well on noisy data and with a small number of training pixels, as long as they are sufficient to define the support vectors, and it usually achieves higher accuracy than other classification algorithms [21,24]. The SVM method was compared with different types of neural networks (MLP, multilayer perceptron; RBF, radial basis function; CANFIS, co-active neuro-fuzzy inference system) used for classifying five types of cultivated plants in Spain using HyMap data [25]. The results showed that, despite small differences in classification accuracy (OA<sub>SVM</sub> = 96.4%, OA<sub>MLP</sub> = 94.5%, OA<sub>RBF</sub> = 94.1%, OA<sub>CANFIS</sub> = 94.2%), the SVM algorithm was more efficient than neural networks in terms of stability, reliability, simplicity, and the speed of the classification process. Moreover, SVM achieved very high accuracy (OA = 93%) in the detection of invasive *Solanum mauritianum* shrubs in *Pinus patula* plantations in southern Africa on the basis of AISA Eagle images [20].
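The maximum-margin idea described above is commonly written as the soft-margin optimization problem below. This is the standard textbook formulation, given here only as an illustration; the notation is an assumption and is not taken from the cited studies: $\mathbf{x}_i$ is the spectrum of training pixel $i$, $y_i \in \{-1, +1\}$ its class label, $\varphi$ the mapping into the transformed feature space, $\xi_i$ a slack variable that tolerates noisy pixels, and $C$ a penalty parameter.

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
$$

In this form the width of the margin is $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ maximizes the separation between classes. Only the training pixels for which the constraint is active (the support vectors) determine the hyperplane, which is one way to see why a small training set can be sufficient.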

The RF algorithm, on the other hand, works by creating many decision trees, each based on a random subset of the training data, and the final decision is made by combining the votes of the individual trees [23]. The advantages of this method are its resistance to overfitting to the training set and its short classification time. Good results were achieved by using the RF method to study the invasion of *Euphorbia esula* and *Centaurea maculosa* in Montana [15]. The classification accuracies based on the airborne hyperspectral HySpex images for these plant species were 86% and 84%, respectively. Additionally, the Random Forest algorithm has proved its worth in identifying two expansive grassland species, *Molinia caerulea* and *Calamagrostis epigejos*, in the Silesian Upland in Poland. HySpex and LiDAR (light detection and ranging) products from the Riegl LMS-Q680i scanner were used in that study, which obtained the highest median Kappa of 0.85 (F1 = 0.89, where F1 is the harmonic mean of the user's (UA) and producer's (PA) accuracies) for *M. caerulea* identification and 0.65 (F1 = 0.73) for *C. epigejos* [26].
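The bagging-and-voting principle behind RF can be illustrated with a deliberately simplified sketch. It is not the implementation used in the cited studies: each "tree" here is only a one-split stump trained on a bootstrap sample and one randomly chosen band, and the reflectance values, the four-band spectra, and the class names `target` and `background` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(samples, feature):
    """Fit a one-split tree (stump) on a single feature.

    samples is a list of (spectrum, label) pairs. Returns a tuple
    (feature, threshold, left_label, right_label), where "left" means
    spectrum[feature] <= threshold.
    """
    majority, majority_count = Counter(y for _, y in samples).most_common(1)[0]
    values = sorted({x[feature] for x, _ in samples})
    best = (feature, values[0], majority, majority)  # fallback: constant prediction
    best_correct = majority_count
    # Try every midpoint between consecutive distinct values as a threshold.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = Counter(y for x, y in samples if x[feature] <= threshold)
        right = Counter(y for x, y in samples if x[feature] > threshold)
        left_label, left_count = left.most_common(1)[0]
        right_label, right_count = right.most_common(1)[0]
        if left_count + right_count > best_correct:
            best_correct = left_count + right_count
            best = (feature, threshold, left_label, right_label)
    return best


def fit_forest(samples, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]  # draw with replacement
        feature = rng.randrange(n_features)                 # random band for this tree
        forest.append(fit_stump(bootstrap, feature))
    return forest


def predict(forest, spectrum):
    """Return the class that receives the most votes from the individual trees."""
    votes = Counter(
        left if spectrum[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical training pixels: four-band spectra scattered around two reflectance levels.
data_rng = random.Random(42)


def make_spectrum(center):
    return [center + data_rng.uniform(-0.05, 0.05) for _ in range(4)]


training = [(make_spectrum(0.2), "target") for _ in range(20)]
training += [(make_spectrum(0.6), "background") for _ in range(20)]

forest = fit_forest(training)
print(predict(forest, [0.1, 0.1, 0.1, 0.1]), predict(forest, [0.7, 0.7, 0.7, 0.7]))
```

Because every tree sees a different bootstrap sample and a different band, the errors of individual trees tend to be uncorrelated, and the majority vote is more stable than any single tree. A full RF would grow deeper trees and draw a random subset of bands at every split.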

The SVM and RF methods also yielded good results in the classification of 20 types of grassy vegetation in the Hortobágy National Park in eastern Hungary on the basis of AISA Eagle II data [27]. The highest classification accuracy was obtained using the first nine Minimum Noise Fraction (MNF) transformation bands of the hyperspectral image and 30 random training pixels (OA<sub>SVM</sub> = 82.06%, OA<sub>RF</sub> = 79.14%, and OA<sub>ML</sub> = 80.78% for the maximum likelihood (ML) classifier). However, when the training set was reduced to 10 pixels, the SVM and RF methods still maintained high levels of accuracy (79.57% and 76.55%, respectively), while the ML accuracy dropped significantly to 52.56%. This low sensitivity to the training sample size is a big advantage of these algorithms, especially SVM. On the other hand, the RF algorithm had a short image classification time (3 min) compared to the other methods used on the same data set (SVM = 16 min, ML = 8 min). Studies of Mediterranean vegetation (mainly shrubs varying in height from about 0.5 m to almost 5 m) carried out in Languedoc in southern France demonstrated that the RF and SVM methods extracted more information from hyperspectral data than traditional classifiers (e.g., classification tree (CT), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and k-nearest neighbor (k-nn)), especially when the spectral differences between classes were small [21]. When distinguishing 15 plant species, the overall accuracies of the modern methods, i.e., SVM and RF (OA<sub>SVM</sub> = 39.2–47.9%, OA<sub>RF</sub> = 39.3–49.5%), were higher than those recorded for the traditional methods (OA<sub>CT</sub> = 28.6–44.4%, OA<sub>LDA</sub> = 37–45.1%, OA<sub>QDA</sub> = 37.5–39.3%, OA<sub>k-nn</sub> = 18–28.8%), depending on the set of input data. The artificial neural network (ANN) method was also used to identify plant species; however, this experiment did not lead to satisfactory results.
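The accuracy measures quoted in this section (OA, UA, PA, F1, and Kappa) can all be derived from a confusion matrix. The sketch below shows the usual definitions; the function name `accuracy_metrics`, the two class names, and the pixel counts are hypothetical and serve only to illustrate the arithmetic.

```python
def accuracy_metrics(matrix, labels):
    """Compute OA, Cohen's Kappa, and per-class PA, UA, and F1.

    matrix[i][j] is the number of reference pixels of class i that were
    classified as class j. Assumes every class has at least one reference
    pixel and at least one classified pixel.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    oa = correct / total  # overall accuracy

    reference_totals = [sum(matrix[i]) for i in range(n)]                  # row sums
    classified_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # column sums

    # Agreement expected by chance, used by Cohen's Kappa.
    expected = sum(r * c for r, c in zip(reference_totals, classified_totals)) / total**2
    kappa = (oa - expected) / (1 - expected)

    per_class = {}
    for k, label in enumerate(labels):
        pa = matrix[k][k] / reference_totals[k]   # producer's accuracy
        ua = matrix[k][k] / classified_totals[k]  # user's accuracy
        f1 = 2 * ua * pa / (ua + pa)              # harmonic mean of UA and PA
        per_class[label] = {"PA": pa, "UA": ua, "F1": f1}
    return oa, kappa, per_class


# Hypothetical validation counts for a two-class problem.
labels = ["target", "background"]
matrix = [
    [45, 5],   # reference "target": 45 correct, 5 missed
    [15, 35],  # reference "background": 15 false alarms, 35 correct
]

oa, kappa, per_class = accuracy_metrics(matrix, labels)
print(f"OA = {oa:.2f}, Kappa = {kappa:.2f}")
for label, scores in per_class.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

Reporting F1 and Kappa alongside OA is useful when the target species covers only a small share of the image, because OA alone can stay high even when the rare class is mapped poorly.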

The aim of the current analysis was to verify whether the expansive/invasive *Rubus* spp., *Calamagrostis epigejos*, and *Solidago* spp. are characterized by a specific set of spectral characteristics that allows them to be distinguished from the surrounding species, which together form a mix of fuzzy, covered patterns. Moreover, the impact of the number of pixels in the training data set on the classification accuracy was analyzed. Two well-known reference classification algorithms, SVM and RF, which are commonly used because of their effectiveness, were applied.

The proposed method could be applied in extensively used agricultural areas (managed with traditional farming methods) and is not limited to selected test areas.

## **2. Materials and Methods**

### *2.1. Study Area and Objects of the Study*

The research area was located in southern Poland near the town of Malinowice (Silesian Province) and covered approximately 10.6 km<sup>2</sup> of the Natura 2000 habitat (Figure 1). This is an upland area covering the Tarnogóra Hummock and the Katowice Upland, with a transitional temperate climate. The area is dominated by grasslands, meadows, and forests. Blackberry (mainly *Rubus caesius* L., European dewberry), various species of goldenrod (*Solidago* spp.), and wood small-reed grass (*Calamagrostis epigejos*) occur very frequently in this area.

*Rubus* spp. L., a genus of plants in the *Rosaceae* family commonly called bramble, is one of the most important expansive taxa [28]. Blackberries are native to Asia, Europe, and North and South America [29], and they often pose a threat to young forest crops and habitats protected under the Natura 2000 program. They are typically shrubs (up to 3 m high) with perennial roots, biennial prickly stems, and edible fruits that are aggregates of drupelets [29]. Blackberries can be found in all kinds of environments, including forests, shrubs, meadows, wastelands, and roadsides. Vegetative reproduction and the production of a large number of seeds, which are spread by birds and other animals, allow them to quickly colonize new areas [30]. They bloom from May to September. According to the latest data, there are 105 *Rubus* species in Poland alone [31]. *Rubus* spp. L. is linked to negative economic and environmental consequences (e.g., changes in the dominant type of vegetation, soil depletion, or increased susceptibility to fires) [32]. The spectral characteristics of the individual *Rubus* species are very similar, which is why they were identified collectively in this paper, without division into individual species.

**Figure 1.** Field research polygons in the Malinowice area.

Another widespread, expansive species that degrades grassland and meadow communities is *Calamagrostis epigejos* (L.) Roth, commonly referred to as wood small-reed [33]. It is a perennial grass in the *Poaceae* family that is native to Eurasia [5] and has spread to North America [34]. The plant has thick and rigid blades that can be up to 2 m high and complex inflorescences in the form of a panicle. Wood small-reed propagates vegetatively, through numerous stolons, as well as generatively, through seeds (i.e., kernels) [35]. It blooms from July to September, often forming extensive single-species stands whose colors vary from green to brown to purple. Wood small-reed grows in meadows, forests, and urban areas, along railways, and on roadsides. A large amount of reed biomass is deposited in non-hay areas, and its lengthy decomposition time causes acidification of the substrate and hinders the development of other plants [36].

Among the most invasive plants, which pose a huge threat to native species and the biodiversity of entire ecosystems, are representatives of the goldenrod genus (*Solidago* spp. L.). They are perennials from the *Asteraceae* family, imported from North America to Europe as decorative plants [37]. Goldenrod is represented by three invasive species: *Solidago canadensis* (Canadian goldenrod), *Solidago gigantea* (tall goldenrod), and *Solidago graminifolia* (grass-leaved goldenrod) [38,39]. These plants have stiff shoots that can be up to 2 m tall, ending in pyramidal panicle clusters, which are formed by flowers clustered in heads [40]. They propagate vegetatively, by means of underground rhizomes, and generatively, with the help of light seeds (achenes with pappus) that can be spread over considerable distances [41]. They quickly begin to dominate and often form dense single-species patches. They bloom from July to October, forming characteristic yellow inflorescences. Goldenrods have a high tolerance for various soil types, but they require exposure to full sun [42]. They grow in open habitats such as meadows, wastelands, and anthropogenic areas, and along roads and river banks [2].
