*4.3. Genomic Data*

A total of 258,873 SNPs were obtained from the 370 accessions after redundant SNPs were pruned [4]. Missing SNP genotypes (14.13% on average) were imputed using Beagle v.4.2 with default parameters [61]. Our previous GWAS analyses of PS in flax were conducted separately for each combination of the six phenotypic datasets (five individual years and the 5-year average) and ten statistical methods [4]. The statistical methods for GWAS included three single-locus models (GLM [62], MLM [63] and GEMMA [64]) and seven multi-locus models (FarmCPU [65], mrMLM [66], FASTmrEMMA [67], ISIS EM-BLASSO [68], pLARmEB [69], pKWmEB [70], FASTmrMLM [71]). For GLM, MLM and FarmCPU, the first six principal components (PCs), accounting for 33.04% of the total variation, were used as covariates to account for population structure, while Frappe (http://med.stanford.edu/tanglab/software/frappe.html) was used to estimate the population structure of the 370 accessions for the other six multi-locus models. GEMMA does not require a Q matrix. The significance threshold for all three single-locus methods (GLM, MLM and GEMMA) and the multi-locus method FarmCPU was a critical *p* value (α = 0.05) subjected to Bonferroni correction, that is, a corrected *p* value of 1.93 × 10<sup>−7</sup> (0.05/258,873 SNPs), while a log of odds (LOD) score of three was used to detect robust association signals for the remaining six multi-locus models. The R package MVP (https://github.com/XiaoleiLiuBio/MVP) was used for the GLM, MLM and FarmCPU analyses, the GEMMA software (https://github.com/genetics-statistics/GEMMA) for GEMMA and the R package mrMLM (https://cran.r-project.org/web/packages/mrMLM/index.html) for the other six multi-locus models. The details of the GWAS analyses are described in Reference [4].
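
The Bonferroni-corrected threshold quoted above can be checked with a short calculation. The following Python sketch is for illustration only and is not part of the analysis pipeline (the GWAS analyses were run with the R packages and software listed above). The function names `bonferroni_threshold` and `is_significant` are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test p-value cutoff under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def is_significant(p_value: float, threshold: float) -> bool:
    """A SNP is declared significant when its p value is below the cutoff."""
    return p_value < threshold


ALPHA = 0.05        # critical p value stated in the text
N_SNPS = 258_873    # number of SNPs after pruning, as stated in the text

threshold = bonferroni_threshold(ALPHA, N_SNPS)
print(f"{threshold:.2e}")  # 1.93e-07
```

Because the correction divides α by the number of tests, the cutoff depends only on the size of the marker set and not on the GWAS model, which is why the same value applies to GLM, MLM, GEMMA and FarmCPU.
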
A total of 500 non-redundant QTL for PS were identified from the 370 diverse flax accessions, including 134 QTL that were statistically stable in all five years and 67 QTL with relatively stable and large effects [4]. These three QTL datasets (500 unique QTL, 134 statistically stable QTL and 67 stable and large-effect QTL) were used for GP model construction. In addition, we performed Pearson's χ<sup>2</sup> test with Yates' continuity correction to detect all SNPs significantly associated with PS at a probability level of 10<sup>−5</sup>. The three QTL sets and the genome-wide SNP set were used to construct the GP models. Thus, GP models for the 24 combinations of the four marker sets and the six phenotypic datasets were built and compared.
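
For readers unfamiliar with the continuity-corrected test, the Python sketch below shows how Pearson's χ<sup>2</sup> statistic with Yates' correction is computed for a single SNP. It rests on assumptions not stated in the text: each SNP is summarized as a 2 × 2 contingency table (two alleles by two phenotype classes), the counts are invented for illustration, and the function name `yates_chi2_2x2` is hypothetical. It is not the code used in this study.

```python
import math


def yates_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson's chi-squared test with Yates' continuity correction.

    The 2 x 2 table is [[a, b], [c, d]]. Returns (chi2, p_value) with
    one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    # Yates' correction shrinks |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


P_LEVEL = 1e-5  # probability level stated in the text

# Invented counts: rows are the two alleles, columns the two phenotype classes.
chi2, p_value = yates_chi2_2x2(90, 30, 40, 80)
print(f"chi2 = {chi2:.2f}, significant = {p_value < P_LEVEL}")
```

Applied to every SNP in turn, a test of this kind yields the set of markers whose *p* value falls below the 10<sup>−5</sup> level.
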
