*4.3. Population Structure, LD, Genome-Wide Association Study, and Candidate Genes*

Population structure was estimated with 259 neutral SSR loci [41] distributed across the 15 chromosomes of flax. STRUCTURE v.2.3.4 [47] was run with a predefined number of genetic clusters (K) ranging from 1 to 5, using 50,000 burn-in iterations followed by 100,000 MCMC iterations, with five independent runs for each value of K. The most likely number of clusters was determined with the Evanno method [77] as implemented in the R package POPHELPER v.1.1.10 [78]. A total of 771,914 SNPs, obtained from the 1.7 million SNPs by removing those with a minor allele frequency (MAF) <0.05 or >10% missing data, were used to produce a dendrogram using the neighbor-joining (NJ) algorithm implemented in TASSEL v.5.2.31 [70]. Genome-wide and intrachromosomal linkage disequilibrium (LD) between pairs of SNPs was estimated from the same 771,914 filtered SNPs as the squared allele frequency correlation (*r*<sup>2</sup>) in TASSEL v.5.2.31 [70]. LD values were plotted against physical distance, and LD decay was modeled using the Hill and Weir [79] function. A cut-off value of *r*<sup>2</sup> = 0.1 was set to estimate the average size of LD blocks [41].
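
For illustration only, the ΔK statistic of the Evanno method can be sketched in Python as below. This is not the POPHELPER implementation used in the study. The function name `evanno_delta_k` and the log-likelihood values are invented for the example, and the sketch assumes the common simplification of taking the second-order difference of the mean Ln P(D) across runs and dividing by the standard deviation of Ln P(D) at that K.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from replicate Ln P(D) values per K.

    lnp_by_k maps each K to a list of Ln P(D) values, one per run.
    Delta K is only defined for K values that have both neighbours
    (K - 1 and K + 1) and a non-zero standard deviation across runs.
    """
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    sd_l = {k: stdev(v) for k, v in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # first and last K have no second-order difference
        if sd_l[k] == 0:
            continue  # avoid division by zero
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta_k[k] = second_diff / sd_l[k]
    return delta_k


# Invented Ln P(D) values for K = 1..5 with five runs each (not study data).
toy_lnp = {
    1: [-1000, -1001, -999, -1000, -1000],
    2: [-900, -901, -899, -900, -900],
    3: [-880, -881, -879, -880, -880],
    4: [-870, -871, -869, -870, -870],
    5: [-865, -866, -864, -865, -865],
}

delta_k = evanno_delta_k(toy_lnp)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

With K tested from 1 to 5, ΔK can only be evaluated for K = 2 to 4, which is why the statistic cannot by itself support K = 1.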

GWAS was performed in TASSEL v.5.2.31 [70] using the 771,914 filtered SNPs. Three models were evaluated: GLM-Q, GLM-PCA, and MLM-K. In GLM-Q, the Q matrix generated by STRUCTURE was used as a cofactor to adjust for population stratification. In GLM-PCA, up to ten principal components (PCs) were included as covariates. The ten PCs were generated in TASSEL v.5.2.31 [70] from 105,038 SNPs (MAF > 0.05 and genotype calls present in at least 95% of the 200 genotypes). For MLM-K, a kinship matrix was computed in TASSEL v.5.2.31 [70] from the same set of 105,038 SNPs and used as a covariate to account for cryptic relatedness. Quantile–quantile (Q–Q) plots were generated with the R package qqman [80] to compare the fit of the different models. The final Manhattan plots were also generated with qqman [80]. A Bonferroni-corrected threshold of *P* = 0.1/771,914, corresponding to −log<sub>10</sub>(*P*) = 6.88, was used to declare significant marker–trait associations.
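
The Bonferroni threshold stated above can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic only: the function name `bonferroni_threshold` and the example *P* values are assumptions introduced for illustration and are not part of the TASSEL or qqman workflow.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Return the Bonferroni-corrected threshold on the -log10(P) scale."""
    return -math.log10(alpha / n_tests)


# Values taken from the text: alpha = 0.1 and 771,914 tested SNPs.
threshold = bonferroni_threshold(0.1, 771_914)

# Invented P values, used only to show how the threshold is applied.
example_p_values = [1e-9, 5e-7, 1e-3]
significant = [p for p in example_p_values if -math.log10(p) >= threshold]
print(round(threshold, 3), significant)
```

The computed value is about 6.888, in line with the reported 6.88.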

To identify candidate genes associated with significant SNPs, the JBrowse feature of Phytozome v.12.1 (http://phytozome.jgi.doe.gov/pz/portal.html) was used to examine the *L. usitatissimum* v.1.0 genome [71] for genes relevant to MC and HC in flaxseed. As described above, a cut-off value of *r*<sup>2</sup> = 0.1 was used to estimate the average LD block size for each chromosome. This physical distance defined the window searched for candidate genes on either side of the most significant SNPs. A plausible candidate gene was defined by the following criteria: (a) the gene had a function known to be related to the trait evaluated, based on gene ontology term descriptions in Phytozome; (b) BLASTX searches against the *Arabidopsis* genome returned orthologous protein sequences with functions associated with the phenotypes of interest.
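
The window-based search on either side of a significant SNP can be illustrated with the following Python sketch. In the study this step was carried out by browsing the genome in JBrowse, so the code is only a hypothetical equivalent: the `Gene` record, the function `genes_in_window`, the chromosome labels, the gene identifiers, and the coordinates are all invented for the example.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based start coordinate (bp)
    end: int    # 1-based end coordinate (bp), inclusive


def genes_in_window(snp_chrom: str, snp_pos: int, window_bp: int,
                    genes: List[Gene]) -> List[Gene]:
    """Return genes overlapping [snp_pos - window_bp, snp_pos + window_bp].

    window_bp stands for the chromosome-specific LD block size, i.e. the
    distance at which r^2 decays to the chosen cut-off.
    """
    lo = max(1, snp_pos - window_bp)
    hi = snp_pos + window_bp
    return [g for g in genes
            if g.chrom == snp_chrom and g.start <= hi and g.end >= lo]


# Invented annotation and SNP position (not study data).
toy_genes = [
    Gene("geneA", "Lu1", 1_000, 2_000),
    Gene("geneB", "Lu1", 60_000, 62_000),
    Gene("geneC", "Lu2", 1_500, 2_500),
    Gene("geneD", "Lu1", 9_000, 12_000),
]

hits = genes_in_window("Lu1", 5_000, 5_000, toy_genes)
print([g.gene_id for g in hits])
```

Genes returned by such a search would still need to be screened against criteria (a) and (b) before being considered plausible candidates.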
