*4.2. Whole Genome Resequencing, SNP Calling, SNP Imputation and LD Analysis*

Three populations consisting of 97 randomly selected lines from BM, 91 from EV, 72 from SU including five parents (one parent is the reference genome) were grown in growth cabinets with a 16-h light and 8-h dark cycle at 20/18 ◦C. DNA was extracted from young leaf tissue using the DNeasy 96 Plant kit (Qiagen, Mississauga, ON, Canada) according to the manufacturer's instructions. The DNA was subsequently restricted, size-selected and quantified prior to the construction of the reduced representation libraries used for Illumina sequencing as previously described [47]. Reduced representation libraries from a total of 260 individuals of the three populations, i.e., 96 randomly selected from BM, 89 from EV, 70 from SU, and five parents (One parent CDC Bethune of BM is used as a reference genome) were re-sequenced by the Michael Smith Genome Sciences Centre of the BC Cancer Agency, Genome British Columbia (Vancouver, BC, Canada) using 100-bp paired-end reads on an Illumina HiSeq 2000 platform (Illumina Inc., San Diego, CA, USA).

SNP calling was performed using the revised AGSNP pipeline [47,48,68]. The flax scaffold sequences of cultivar CDC Bethune [46] were used as reference for read mapping. Then SNPs were called using SAMtools [69] and further filtered using a set of criteria such as mapped read depth, consensus base ratio, mapping quality score and homopolymers with a validation rate of 96.8% for the called SNPs as described in detail [47]. Finally SNPs with a MAF < 0.05 and a genotyping rate <60% were removed for further analysis. The coordinates of all SNPs were extracted from the chromosome-based flax pseudomolecules v2.0 [45]. Missing data for these filtered SNPs were imputed using Beagle v.4.2 [70].

Intra-chromosome LD (*r* 2 ) was calculated using plink ver. 1.9 [71] with the parameters "-r2 -ld-window-kb 2000 -ld-window-r2 0". Before LD calculation, SNP data were pruned using the parameter "–indep-pairwise 2000 50 0.9" to remove SNPs with high *r* 2 (>0.9) in a 2000 kb window with step size of 50 SNPs. Pair-wise *r* <sup>2</sup> values were plotted against the base pair distance, and curves

of LD decay with distances of paired SNPs were fitted using a non-linear regression model [72] as follows:

$$r^2 = \frac{10+cd}{(2+cd)(11+cd)} \times \left\{ 1 + \frac{(3+cd)\left(12+12cd+(cd)^2\right)}{n(2+cd)(11+cd)} \right\},\tag{1}$$

where *c* is the coefficient to be estimated, *d* is the distance between pair-wise SNPs, and *n* is the number of total gametes, corresponding to twice the number of individuals in a population. The R function *nls* was used to fit the model. The rate of LD decay for each population was determined from the fitted model at the half of the maximum LD (*r* 2 ). Haplotype blocks were calculated using plink with the parameters "–blocks no-pheno-req –blocks-max-kb 2000".
