**3. Discussion**

This study used the gd-GBS application, in combination with the Haplotag pipeline, for the first time in CWG to generate a data matrix of 192 genotypes × 45,507 SNP markers and to capture genome-wide genetic variants for evaluating the genetic diversity present in tetraploid CWG. The diversity analysis revealed that 15.8% of the SNP variation resided among the 12 lines, and the model-based Bayesian analysis identified four major clusters of the assayed samples. These research outputs are useful not only for understanding the genetic diversity of CWG and for its breeding, but are also encouraging for the molecular characterization of non-model polyploid plants.

The revealed patterns of genetic diversity are interesting. First, the model-based Bayesian approach in STRUCTURE identified four major clusters of the assayed genotypes, while the distance-based approaches PCoA and UPGMA identified three major clusters; the neighbor-joining analysis, however, agreed with the STRUCTURE result. Following the pedigree of the assayed genotypes (Table S1), we could infer that the model-based Bayesian analysis and the neighbor-joining analysis resolved population substructure (an outcome of probable processes such as genetic drift, migration, mutation, and selection) more distinctly than PCoA and UPGMA. The results also showed that most of the genotypes grouped together within their lines, revealing that the different lines were distinct. The STRUCTURE analysis (Figure 2A), neighbor-joining analysis (Figure 3), PCoA (Figure 4B), and UPGMA dendrogram (Figure 5A) revealed the genetic distinctness of lines Karabalykskij 202, PGR 16830, S8959E, and Vysokij 9. S8959E is a breeding line in the Saskatoon program, but it is a selection from the Russian genebank line Vysokij 9. Although it has been recurrently selected for vigorous growth and plant type, it has not been interpollinated with any other lines, which explains its distinctness from the other Canadian cultivars/breeding lines. However, STRUCTURE revealed that all genotypes from line S8959E except one (S8959E-14; Figure 2A) showed high affinity with the line from Kazakhstan. This is also supported by the UPGMA clustering (Figure 5A), while the neighbor-joining analysis revealed the relatedness of the lines from Russia. These findings will serve as valuable information for the genetic improvement of CWG for forage production.
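
As a rough illustration of what a distance-based UPGMA step does, the sketch below clusters a small toy matrix by average linkage. The data, the 0/1/2 dosage coding, the distance measure (mean absolute dosage difference), and the function names are all assumptions made for illustration; this is not the software or the settings used in this study.

```python
from itertools import combinations

# Hypothetical toy data: 6 genotypes x 8 SNPs, coded as allele dosage 0/1/2.
# Rows 0-2 mimic one line and rows 3-5 another; all values are invented.
TOY_PROFILES = [
    [0, 0, 1, 0, 2, 2, 0, 1],
    [0, 0, 1, 0, 2, 2, 1, 1],
    [0, 1, 1, 0, 2, 2, 0, 1],
    [2, 2, 0, 1, 0, 0, 2, 0],
    [2, 2, 0, 1, 0, 1, 2, 0],
    [2, 1, 0, 1, 0, 0, 2, 0],
]


def dosage_distance(a, b):
    """Mean absolute difference in allele dosage across SNPs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def upgma_clusters(profiles, n_clusters):
    """Merge clusters by average linkage (UPGMA) until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    # Pairwise distances between the original samples, keyed by (i, j), i < j.
    d = {(i, j): dosage_distance(profiles[i], profiles[j])
         for i, j in combinations(range(len(profiles)), 2)}

    def avg_dist(c1, c2):
        # UPGMA distance: unweighted mean over all between-cluster sample pairs.
        total = sum(d[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


two_groups = upgma_clusters(TOY_PROFILES, n_clusters=2)
print(two_groups)
```

With these invented profiles the two simulated lines separate cleanly. Real GBS data are far noisier, which is one reason different clustering methods can disagree on the number of groups.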

Our analysis showed high within-line genetic variation (Table 2) in the assayed CWG lines, which is in agreement with studies on highly outcrossing species [37]. Overall, our genetic diversity results are in accordance with the diversity studies of CWG reported by Mellish et al. [18] using AFLP markers and by Che et al. [31] and Che et al. [33,34] using SSR markers. The somewhat higher among-line variation (15.8%) observed in the present study may partly be due to the narrower genetic base of eight of the breeding lines/cultivars relative to the three genebank lines and the one line of Russian origin (S8959E). Most of the Canadian cultivars and breeding lines shared one or more common parents in their genetic background (Table S1), and they have gone through many cycles of recurrent selection for vigor and yield. Thus, there has probably been a slight reduction in heterozygosity, as indicated by the generally higher inbreeding coefficients (Figure 5B). The distinctness of the lines S8959E, Vysokij 9, Karabalykskij 202, and PGR 16830 can be attributed to their Asian origin and to the absence of interpollination with Canadian cultivars/lines and of selection under Canadian conditions, except for the recurrent selection of line S8959E mentioned above. Thus, the cultivars/breeding lines have likely lost some within-line variation while diverging further from the unselected Asian lines, which explains some of the increase in among-line variation. Further research is needed on the utilization of the genetic variability of these lines, with a focus on morpho-physiological studies, adaptation, and their use in breeding programs. Likewise, the distinctness of the line S9491 in the UPGMA analysis (Figure 5A) is attributed to its synthesis from seven different lines/cultivars from breeding programs in Saskatoon and Logan, Utah, USA.
The line S9514 was directly selected from S9491, which explains why these two lines clustered together (green cluster) in the STRUCTURE analysis (Figure 2) and the neighbor-joining analysis (Figure 3). However, the Canadian cultivar "Kirk", developed partly from a plant introduction obtained from a botanical garden in Finland (University of Turku) in 1968, showed shared pedigree with some or all of the Kazakhstan lines based on model-based Bayesian clustering (Figure 2A) and neighbor-joining analysis (Figure 3). While the origin of the plant introduction from the University of Turku remains unknown, it can be reasoned that this original introduction may have a common genetic background with some of the Kazakhstan lines based on Bayesian clustering.

The extent of reduction in heterozygosity, as explained by Fst, was greater in the cultivars than in most of the breeding lines. Two cultivars, "AC-Goliath" and "Kirk", had lower diversity, as indicated by higher inbreeding coefficients (Fst values) (Figure 5B), perhaps because they were synthesized from the interpollination of fewer genotypes than many of the breeding lines. Also, most of the breeding lines included the cultivars "Kirk" and "AC-Goliath", along with other sources, in their pedigrees. The cultivar "Newkirk" was selected from progenies of crosses between "Kirk" and "AC-Goliath". However, the inbreeding coefficient of "Newkirk" was lower than those of the parental cultivars, indicating a higher level of heterozygosity. The three breeding lines S9516, S9544, and S9556 showed high within-line genetic diversity, as indicated by the greater dispersal of these lines on the PCoA (Figure 4B), the higher within-line variation (92.2%) explained by a separate AMOVA, and the lower line-specific Fst (Figure 5B). This greater genetic diversity could be attributed to the inclusion of diverse germplasm sources during their synthesis (Table S1). The high within-line variability suggests that there is sufficient genetic variation in all lines in this study to make progress from selection. Including germplasm from the Asian lines in the breeding program, to interpollinate with the Canadian cultivars/breeding lines, will increase diversity.
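
To make the among-line versus within-line partition concrete, the following sketch computes a one-level AMOVA-style variance split from pairwise squared distances. It follows the standard textbook partition of sums of squared deviations; the toy profiles, the squared Euclidean distance, and the function name `amova_one_level` are assumptions for illustration and do not reproduce the analysis or software used here.

```python
from itertools import combinations

# Hypothetical toy data: 6 genotypes x 8 SNPs (dosage 0/1/2), two lines of 3.
TOY_PROFILES = [
    [0, 0, 1, 0, 2, 2, 0, 1],
    [0, 0, 1, 0, 2, 2, 1, 1],
    [0, 1, 1, 0, 2, 2, 0, 1],
    [2, 2, 0, 1, 0, 0, 2, 0],
    [2, 2, 0, 1, 0, 1, 2, 0],
    [2, 1, 0, 1, 0, 0, 2, 0],
]
TOY_LINES = ["A", "A", "A", "B", "B", "B"]


def amova_one_level(profiles, groups):
    """One-level AMOVA on squared Euclidean distances between profiles.

    Returns (percent_among, percent_within, phi_st).
    """
    n = len(profiles)
    labels = sorted(set(groups))
    g = len(labels)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def ssd(indices):
        # Sum of squared deviations recovered from pairwise squared distances.
        idx = list(indices)
        pair_sum = sum(sq_dist(profiles[i], profiles[j])
                       for i, j in combinations(idx, 2))
        return pair_sum / len(idx)

    members = {lab: [i for i, x in enumerate(groups) if x == lab]
               for lab in labels}
    ssd_total = ssd(range(n))
    ssd_within = sum(ssd(m) for m in members.values())
    ssd_among = ssd_total - ssd_within

    # Mean squares and the average group-size coefficient n0.
    ms_within = ssd_within / (n - g)
    ms_among = ssd_among / (g - 1)
    n0 = (n - sum(len(m) ** 2 for m in members.values()) / n) / (g - 1)

    # Variance components; phi_st is the among-group share of the total.
    var_within = ms_within
    var_among = (ms_among - ms_within) / n0
    total = var_among + var_within
    return 100 * var_among / total, 100 * var_within / total, var_among / total


pct_among, pct_within, phi_st = amova_one_level(TOY_PROFILES, TOY_LINES)
print(f"among: {pct_among:.1f}%  within: {pct_within:.1f}%  phi_st: {phi_st:.3f}")
```

The toy lines are deliberately well separated, so most of the variance falls among groups. In an outcrossing species such as CWG the split runs the other way, with most variation residing within lines.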

Our gd-GBS application identified thousands of genome-wide SNP markers for assessing the extent of genetic diversity in the non-model polyploid CWG, for which no prior genomic information was available. These results demonstrate the technical feasibility and effectiveness of GBS for sampling genome-wide genetic variability in other perennial grass species with complex genomes. High-resolution analysis of plant genetic diversity, with about 45,000 SNP markers spread over the genome, is more informative than analysis with relatively few markers, such as the AFLP and SSR markers used in previous studies [1,12,18,38–40]. Also, the experimental cost of sampling genome-wide variants in this study was roughly \$12,000, suggesting the feasibility of a wider application of GBS to characterize other perennial polyploid grass species. The results of the present study, along with those published for northern wheatgrass and wild oat [12,13], demonstrate the utility of GBS in the molecular characterization of non-model plants with complex ploidy and genetic structures.
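
The cost figure can be put on a per-sample basis with simple arithmetic. The sketch below uses only the numbers stated in this section and assumes, for simplicity, that every cell of the 192 × 45,507 matrix is counted whether or not the genotype call is missing; the variable names are illustrative.

```python
# Figures taken from the text; the cost is described only as "roughly" this amount.
total_cost = 12_000   # reported approximate experimental cost
n_genotypes = 192     # rows of the data matrix
n_snps = 45_507       # columns of the data matrix

cost_per_genotype = total_cost / n_genotypes
n_cells = n_genotypes * n_snps          # assumes every cell counts, missing or not
cost_per_cell = total_cost / n_cells

print(f"cost per genotype: {cost_per_genotype:.2f}")
print(f"matrix cells: {n_cells:,}")
print(f"cost per matrix cell: {cost_per_cell:.5f}")
```

Under these assumptions the cost works out to about 62.50 per genotype and roughly 0.0014 per matrix cell.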
