#### *2.2. Whole-Genome Pattern of LD*

LD and LD decay rates were analyzed for each population separately and for the merged population using the filtered SNP data. The physical distances between pairwise SNPs at which the LD *r*<sup>2</sup> dropped to half were 1242, 223, 728 and 272 kb for the BM, EV, SU and merged populations, respectively, indicating substantial variation in LD decay rate across populations (Figure 2). The average LD *r*<sup>2</sup> values of the BM, EV, SU and merged populations were 0.37, 0.26, 0.28 and 0.30, respectively, and the number of haplotype blocks was estimated at 599, 648, 206 and 1205, respectively (Table S3).
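As a rough illustration of the two quantities reported above, the sketch below computes a pairwise *r*<sup>2</sup> and a half-decay distance. It rests on assumptions that are not taken from the paper: genotypes are coded 0/1 (as for homozygous lines), *r*<sup>2</sup> is taken as the squared Pearson correlation, "half" is read as half of the *r*<sup>2</sup> at the shortest distance, and linear interpolation stands in for the fitted non-linear model of Section 4.2. The function names `ld_r2` and `half_decay_distance` are hypothetical.

```python
from statistics import mean


def ld_r2(a, b):
    """Squared Pearson correlation between two 0/1-coded SNP vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")  # a monomorphic SNP has no defined r^2
    return cov * cov / (va * vb)


def half_decay_distance(distances_kb, r2_values):
    """Distance (kb) at which r^2 first falls to half of its value at the
    shortest distance, using linear interpolation between points.

    Assumption: this replaces the paper's fitted decay curve, which is
    not reproduced here. Returns None if r^2 never drops that far.
    """
    points = sorted(zip(distances_kb, r2_values))
    target = points[0][1] / 2
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 > target >= r1:
            return d0 + (r0 - target) * (d1 - d0) / (r0 - r1)
    return None
```

In practice the inputs to `half_decay_distance` would be mean *r*<sup>2</sup> values per distance bin; a slower decay, as reported for BM, gives a larger returned distance.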

*Int. J. Mol. Sci.* **2018**, *19*, x 4 of 24


**Figure 1.** Distribution of 17,288 SNPs, 114 selective sweeps and 33 QTL on the 15 chromosomes of flax for each of the three bi-parental populations BM, EV and SU and for the merged population (BM + EV + SU). Four vertical bars from left to right for each chromosome represent the BM + EV + SU, BM, EV and SU populations, respectively. Short horizontal lines on bars represent SNPs. QTL regions are highlighted in cyan and by vertical blue lines. Red triangles identify the peak SNP of each QTL. Selective sweeps are represented by short vertical black lines.

**Figure 2.** Intra-chromosome LD (*r*<sup>2</sup>) decay of SNP pairs over the entire flax genome as a function of physical distances (kb) of pair-wise SNPs for the three individual and merged populations. The curves are drawn based on a fitted non-linear model (see Section 4.2).



#### *2.3. Genetic Diversity and Population Structure*

Nucleotide diversity (*π*) was estimated at 41.52, 38.26 and 3.95 for the BM, EV and SU populations, respectively (Table 1), consistent with the number of SNPs identified in each population. Strong population differentiation (*Fst*) was observed between BM and SU (0.44) and between EV and SU (0.48), whereas differentiation between BM and EV was weak (*Fst* = 0.04) (Table 1).
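The two statistics above can be illustrated with a minimal sketch. The estimators are textbook forms chosen for simplicity, not the ones confirmed by the paper: per-site nucleotide diversity for biallelic SNPs summed over sites, and Nei's *Fst* computed as a ratio of sums across loci with equal population weights. The function names and the scaling of *π* are assumptions, so the outputs are not expected to reproduce the values in Table 1.

```python
def site_pi(alt_count, n):
    """Per-site nucleotide diversity for a biallelic SNP.

    alt_count: copies of the alternate allele among n sampled chromosomes.
    This is the average number of differences between two distinct samples.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    return 2 * alt_count * (n - alt_count) / (n * (n - 1))


def nucleotide_diversity(alt_counts, n):
    """Sum of per-site diversity over SNPs (assumed scaling, unnormalized)."""
    return sum(site_pi(k, n) for k in alt_counts)


def nei_fst(freqs_pop1, freqs_pop2):
    """Nei's Fst = (HT - HS) / HT from per-locus allele frequencies.

    HT is the expected heterozygosity of the pooled populations and HS the
    mean within-population heterozygosity; numerator and denominator are
    summed across loci before taking the ratio.
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        num += h_total - h_within
        den += h_total
    return num / den if den > 0 else float("nan")
```

Under these definitions, two populations with similar allele frequencies give an *Fst* near 0, as reported for BM versus EV, while populations fixed for different alleles give an *Fst* of 1.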

**Table 1.** Genetic differentiation (*Fst*) between the three bi-parental populations (upper triangle elements) and nucleotide diversity (*π*) within these populations (diagonal elements).



The genetic structure within the merged population was assessed based on the 17,288 SNP loci from the 260 individuals using two methods: principal component analysis (PCA) and discriminant analysis of principal components (DAPC). Bi-plots of the first three principal components of the PCA showed five distinct clusters (Figure 3a,b). The BM and EV populations each contained two sub-populations (BM1 and BM2; EV1 and EV2), while SU formed a single cluster. DAPC corroborated the same five clusters (Figure 3c,d). Therefore, a DAPC Q matrix based on the five clusters was generated and used as covariates to account for population stratification in GWAS and in estimating the phenotypic variation explained by the SNPs.

**Figure 3.** Principal component analysis (PCA) and discriminant analysis of principal components (DAPC) of the 260 individuals in three bi-parental populations (BM, EV and SU) based on 17,288 SNPs. (**a**) Bi-plot of the first and second principal components (PCs); (**b**) Bi-plot of the first and third PCs; (**c**) *k*-means clustering analysis based on 100 chosen PCs shows that the optimal number of clusters (*k*) is 5, that is, where the Bayesian information criterion (BIC) is lowest (arrow); (**d**) DAPC scatter plot. Percentages in parentheses in the axis titles of (**a**) and (**b**) represent the variance explained by the respective PCs. Individuals from the BM and EV populations grouped into two subpopulations each, BM1 and BM2, and EV1 and EV2, respectively.

#### *2.4. h*<sup>2</sup><sub>*SNP*</sub>

Phenotypic variation of traits was largely explained by SNPs in the three individual populations and the merged population (Table 2). The average *h*<sup>2</sup><sub>*SNP*</sub> for all 11 traits was 0.51. The largest *h*<sup>2</sup><sub>*SNP*</sub> values among the four populations ranged from 0.45 (YLD) to 0.90 (PAL). More than 80% of the phenotypic variation in one of the populations was explained by identified SNPs for days to maturity (DTM), IOD, PAL, STE, LIO and LIN. The *h*<sup>2</sup><sub>*SNP*</sub> varied from one population to another depending on the genetic variation between the two parents. For SU, little or no phenotypic variation was explained by SNPs for DTM, plant height (PLH) and STE. For EV, relatively little phenotypic variation (*h*<sup>2</sup><sub>*SNP*</sub> < 0.1) was explained by SNPs for STE and OLE.

**Table 2.** Phenotypic variation explained by all SNPs (*h*<sup>2</sup><sub>*SNP*</sub>) and identified QTL (*h*<sup>2</sup><sub>*GWAS*</sub>) for 11 traits in different populations.




YLD: seed yield; PLH: plant height; DTM: days to maturity; PRO: protein content; OIL: oil content; IOD: iodine value; PAL: palmitic acid content; STE: stearic acid content; OLE: oleic acid content; LIO: linoleic acid content; LIN: linolenic acid content; BM: CDC Bethune/Macbeth; EV: E1747/Viking; SU: SP2047/UGG5-5. § *h*<sup>2</sup><sub>*GWAS*</sub> of YLD was estimated based on the phenotypes in a single environment (Morden/2012). For all other traits, *h*<sup>2</sup><sub>*GWAS*</sub> was estimated based on the BLUP estimates of phenotypes.
