### *3.2. Pollen DNA Extraction Optimization*

The largest quantity of DNA from 10 mg of pollen was obtained with the CTAB lysis buffer containing 0.04% SDS and 0.2 mg of proteinase K per sample. The average concentration of the extracted DNA was 16.57 and 13.62 ng·μL<sup>−1</sup> for *Poa pratensis* and *Bromus inermis* pollen, respectively. Regardless of the extraction protocol, the purity of the extracted DNA was in the range of 1.883–2.006 (OD 260/230) and 2.095–2.142 (OD 260/280). Increasing the proteinase K concentration in the lysis buffer lowered the DNA yield, whereas increasing the lysis time slightly increased the yield in most cases (Table 3). We therefore chose a protocol with a 2 h lysis incubation in the CTAB lysis buffer with 0.04% SDS and 0.2 mg of proteinase K per sample for all further extractions.



The quantity of DNA extracted from fourfold serial dilutions of the pollen suspension decreased steadily with the pollen count and became undetectable by the fluorometric method starting from the sample with 2350 pollen grains (Table 4). We therefore chose 10,000 pollen grains for creating the artificial pollen mixes.

**Table 4.** Pollen DNA extraction test results.


### *3.3. 5′-ETS, ITS1, ITS2, and trnL-F Barcodes Comparison*

All four barcodes were amplified from DNA of herbarium specimens of 14 reference Poaceae species, Sanger sequenced, and submitted to the GenBank database. The obtained sequences were aligned with the corresponding GenBank sequences of these barcodes and used to construct a local reference database. The length and GC content of the barcode sequences vary slightly within each marker, except for the length of 5′-ETS: 307–363 bp, GC content 29–33% for *trnL-F*; 175–509 bp, GC content 50–59% for 5′-ETS; 190–204 bp, GC content 55–67% for ITS1; 193–207 bp, GC content 59–68% for ITS2. Evaluation of intra- and interspecific variability showed that, while all barcodes have low intraspecific distances, the 5′-ETS barcode has the highest interspecific distance, closely followed by ITS2 (Table 5). The plastome barcode *trnL-F* showed the lowest intra- and interspecific distances compared to the nuclear barcodes.
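As an illustration of how length and GC content ranges such as those above can be derived from reference sequences, a minimal Python sketch follows. The function names and the toy sequences are hypothetical and are not taken from the study; excluding gaps and ambiguous bases from the GC calculation is an assumption made here, not a documented step of the original analysis.

```python
def gc_content(seq: str) -> float:
    """Return the GC content (%) over unambiguous bases (A, C, G, T) only."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence has no unambiguous bases")
    gc = sum(b in "GC" for b in counted)
    return 100.0 * gc / len(counted)


def summarize(seqs: dict[str, str]) -> dict[str, tuple[int, float]]:
    """Map each record name to (length in bp without gap characters, GC content in %)."""
    return {
        name: (len(s.replace("-", "")), round(gc_content(s), 1))
        for name, s in seqs.items()
    }


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real barcode sequences).
    toy = {"record_1": "ATGC--GGCC", "record_2": "ATATATGCAT"}
    for name, (length, gc) in summarize(toy).items():
        print(f"{name}: {length} bp, GC {gc}%")
```

Taking the minimum and maximum of these per-record values over all records of one marker would give ranges in the form reported in the text.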


**Table 5.** Intra- and interspecific distance statistics.

However, the difference between the barcode sequences is low for species of the same genus (*Poa* in this study). For example, all barcodes of *Poa annua* and *Poa supina* have identical sequences, which makes these species impossible to distinguish. Other possible sources of misidentification, with a barcoding gap of less than 1%, are as follows: for the ITS1 barcode, *Arrhenatherum elatius* vs. *Calamagrostis epigeios* and *Alopecurus pratensis*, and *Lolium perenne* vs. *Festuca pratensis* (barcoding gaps of −0.008, 0.001, and −0.0001, respectively); for ITS2, *Poa pratensis* vs. *Phleum pratense* (−0.0142), and *Calamagrostis epigeios* vs. *Briza media*, *Poa pratensis*, and *Phleum pratense* (−0.002, 0.007, and 0.008, respectively); for *trnL-F*, *Poa annua* vs. *Poa pratensis*, *Alopecurus pratensis*, and *Phleum pratense*, and *Lolium perenne* vs. *Festuca pratensis* (−0.004, 0.009, 0.009, and 0.0000, respectively). Barcoding gaps for all four barcodes are presented in Figure 2. Additionally, barcode intra- and interspecific distances per species are presented in Supplementary Figure S2.
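The section does not state how the barcoding gap was calculated. The sketch below therefore assumes one common definition: for a pair of species, the minimum interspecific distance minus the maximum intraspecific distance, with distances taken as uncorrected p-distances on aligned sequences. The function names, the distance model, and the toy records are assumptions for illustration only.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcoding_gap(records: list[tuple[str, str]], sp_a: str, sp_b: str) -> float:
    """Assumed definition: min distance between sp_a and sp_b sequences
    minus the max distance within either species (0 if only one sequence each)."""
    seqs_a = [s for name, s in records if name == sp_a]
    seqs_b = [s for name, s in records if name == sp_b]
    if not seqs_a or not seqs_b:
        raise ValueError("both species need at least one sequence")
    inter = [p_distance(a, b) for a in seqs_a for b in seqs_b]
    intra = [
        p_distance(x, y)
        for group in (seqs_a, seqs_b)
        for x, y in combinations(group, 2)
    ]
    return min(inter) - max(intra, default=0.0)


if __name__ == "__main__":
    # Toy (species, aligned sequence) records; hypothetical data.
    toy = [
        ("sp_A", "ACGTACGTAC"),
        ("sp_A", "ACGTACGTAT"),
        ("sp_B", "ACGTACGTAT"),
        ("sp_C", "TCGAACGGAT"),
    ]
    for a, b in combinations(["sp_A", "sp_B", "sp_C"], 2):
        print(a, "vs.", b, round(barcoding_gap(toy, a, b), 4))
```

Under this definition, a gap at or below zero means that some sequences of the two species are at least as similar to each other as sequences within one species, which is the situation described above for pairs such as *Lolium perenne* vs. *Festuca pratensis*.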

### *3.4. Metabarcoding Analysis of the Artificial Pollen Mixes*

Using the optimized protocol for pollen DNA extraction, we obtained DNA at 1.2–1.5 ng·μL<sup>−1</sup> from the artificial pollen mixes (am). The quality of the DNA was the same as that obtained for the *Poa pratensis* and *Bromus inermis* single-species pollen at the optimization step. Amplification was successful for all barcodes and all artificial pollen mix samples, although the amplification efficiency differed significantly between the barcodes and decreased as follows: ITS2 > ITS1 > 5′-ETS > *trnL-F* (confidence intervals for amplified barcode concentrations are 20.02 ± 4.44, 13.04 ± 3.47, 8.32 ± 1.69, and 0.43 ± 0.07 ng·μL<sup>−1</sup>, respectively).

The species composition of the artificial pollen mixes determined by HTS analysis is congruent with the actual pollen species content in 10 out of 18 artificial mixes. The most frequent erroneous identifications occurred in mixes containing either *Lolium* or *Festuca* pollen. In these mixes, *Lolium* was erroneously detected where only *Festuca* was present, and vice versa. However, the abundance of the erroneously identified species is often low (less than or close to 1%). This issue is common to all barcodes in the study and is most pronounced for the plastome *trnL-F* barcode (1.7–4.9% *Festuca/Lolium* errors). Nuclear barcodes show fewer errors of this type, with the fewest for ITS2, for which the abundance of erroneously identified *Lolium* or *Festuca* is close to 0 in all cases.
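The comparison described above (read-based abundance per species and detection of taxa that are absent from the known mix) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the read counts are invented toy values, not data from the study.

```python
def relative_abundance(read_counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts into relative abundance (%)."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned")
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


def unexpected_taxa(read_counts: dict[str, int], expected: set[str]) -> dict[str, float]:
    """Taxa with reads that are absent from the known mix, with their abundance in %."""
    abundance = relative_abundance(read_counts)
    return {
        taxon: pct
        for taxon, pct in abundance.items()
        if taxon not in expected and pct > 0
    }


if __name__ == "__main__":
    # Invented read counts for a mix that contains no Lolium pollen.
    counts = {"Festuca pratensis": 4900, "Phleum pratense": 5000, "Lolium perenne": 100}
    mix = {"Festuca pratensis", "Phleum pratense"}
    print(unexpected_taxa(counts, mix))
```

In this toy case, *Lolium perenne* would be flagged at 1% of the reads, which corresponds to the low-abundance *Festuca/Lolium* errors described above.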

**Figure 2.** Barcoding gaps for all four DNA barcodes in the study per species.

Spearman's correlation coefficient between the HTS-determined abundance and the true abundance of each species in the artificial pollen mixes decreases as follows: ITS1 > ITS2 > *trnL-F* > 5′-ETS (0.8, 0.78, 0.63, and 0.59, respectively). For the 5′-ETS barcode, the *Bromus inermis* abundance in all the mixes (0.41–3.16%) is significantly lower than for the other barcodes and than in the actual mix composition. As the complexity of the artificial pollen mix increases, the detected abundance of the *Bromus inermis* 5′-ETS decreases. The low representation of the 5′-ETS barcode of *Bromus inermis* is most likely related to the length of the amplified 5′-ETS fragment (444 bp vs. 220 bp on average for the other reference species in the database, except for 509 bp in *Poa supina* and *Poa annua*), which could lower the amplification efficiency of the *Bromus inermis* 5′-ETS when it is mixed with other species.
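For reference, Spearman's coefficient is the Pearson correlation of the ranks of the two variables. The standard-library Python sketch below shows the calculation with average ranks for ties; the function names and the toy abundances are hypothetical, and the study itself may have used a statistical package rather than code of this kind.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy true vs. read-based abundances (%) for one mix; invented values.
    true_pct = [50.0, 30.0, 20.0]
    hts_pct = [62.0, 37.6, 0.4]
    print(round(spearman_rho(true_pct, hts_pct), 3))
```

Because only ranks enter the calculation, a barcode can reach a high coefficient while still misestimating the abundance of a species, as in the toy values above where the third species is strongly underrepresented but the rank order is preserved.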

Overall, the nuclear barcodes proved to be the most effective in amplification and species classification. The plastome *trnL-F* barcode demonstrated a lower amplification efficiency and a higher rate of erroneously identified species than the nuclear barcodes. Although the mix composition could be determined well qualitatively by metabarcoding analysis, the quantitative results for each pollen species, determined by read counts, are rarely congruent with the actual abundance of pollen species in the mix. Most of the congruent quantitative results were achieved using the ITS1 and ITS2 barcodes (Figure 3).

**Figure 3.** Metabarcoding results for the artificial pollen mixes.
