*2.4. Data Analysis*

## 2.4.1. Genetic Data

All population genetic analyses were performed in R using the packages adegenet v2.1.3 [62] and dartR v1.1.11 [63]. First, the SNP dataset was filtered in dartR to remove potentially erroneous SNPs. Monomorphic SNPs were excluded, followed by SNPs with a reproducibility of <95%, SNPs with a call rate of <90% (i.e., SNPs with more than 10% missing genotypes), and secondaries (additional SNPs located on the same sequence tag).
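The call-rate and monomorphism criteria can be illustrated with a minimal Python sketch. It is not the dartR implementation: the genotype coding (0, 1, 2 copies of the alternate allele, `None` for missing), the function names, and the example data are assumptions made for illustration, and the reproducibility and secondaries filters are omitted because they rely on DArT metadata.

```python
from typing import Dict, List, Optional

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype at one SNP."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def is_monomorphic(calls: List[Genotype]) -> bool:
    """True when all non-missing genotypes are the same homozygote."""
    observed = {g for g in calls if g is not None}
    return observed <= {0} or observed <= {2}


def filter_snps(snps: Dict[str, List[Genotype]],
                min_call_rate: float = 0.90) -> Dict[str, List[Genotype]]:
    """Keep polymorphic SNPs whose call rate is at least min_call_rate."""
    return {
        name: calls
        for name, calls in snps.items()
        if not is_monomorphic(calls) and call_rate(calls) >= min_call_rate
    }


# Hypothetical data: three SNPs scored in ten individuals.
example = {
    "snp_a": [0, 1, 2, 0, 1, 1, 2, 0, 1, None],        # polymorphic, 90% called
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic
    "snp_c": [0, 1, None, None, 2, 1, 0, 1, 2, 0],     # 80% called
}
kept = filter_snps(example)
```

In this sketch a SNP with a call rate of exactly 90% is retained, matching the stated threshold of removing SNPs with a call rate of <90%.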

Pairwise FST, estimated as θ [64], was calculated between the five putative populations (Cry3Bb1, Cry34/35Ab1, Cry3Bb1\_Cry34/35Ab1, adapted to crop rotation, and non-resistant), along with observed (HO) and expected (HE) heterozygosity. Departure from Hardy–Weinberg equilibrium (HWE) was tested for each population using the function *gl.report.hwe* in dartR [63], which includes Bonferroni correction for multiple testing. Per-locus population genetic statistics, including observed heterozygosity (HO), the inbreeding coefficient (FIS), and FST corrected for the number of individuals, were calculated using the function *gl.basic.stats* in dartR. To summarize genetic similarity among populations, a neighbor-joining tree was constructed using *gl.tree.nj* in dartR.
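For orientation, the per-locus quantities named above can be sketched in Python from their textbook definitions. This is an illustrative simplification, not the dartR routine, which may apply sample-size corrections that are omitted here. The function name, genotype coding (0, 1, 2 copies of the alternate allele, `None` for missing), and example data are assumptions.

```python
from typing import List, Optional, Tuple


def locus_stats(calls: List[Optional[int]]) -> Tuple[float, float, float]:
    """Return (HO, HE, FIS) for one biallelic locus.

    HO  = proportion of heterozygotes among called genotypes
    HE  = 2 * p * (1 - p), with p the alternate-allele frequency
    FIS = 1 - HO / HE (undefined, returned as NaN, when HE is 0)
    """
    observed = [g for g in calls if g is not None]
    n = len(observed)
    if n == 0:
        raise ValueError("no called genotypes at this locus")
    p = sum(observed) / (2 * n)
    ho = sum(g == 1 for g in observed) / n
    he = 2 * p * (1 - p)
    fis = 1 - ho / he if he > 0 else float("nan")
    return ho, he, fis


# Hypothetical loci: one at Hardy-Weinberg proportions, one lacking heterozygotes.
balanced = locus_stats([0, 0, 1, 1, 2, 2, 1, 1])
no_hets = locus_stats([0, 0, 2, 2])
```

A positive FIS indicates a heterozygote deficit relative to Hardy–Weinberg expectations, and a negative value indicates an excess.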

The Bayesian model-based clustering algorithm implemented in STRUCTURE v2.3.4 [65] was employed to determine the genetic structure of the WCR populations investigated. The number of genetic clusters (*K*) tested ranged from 1 to 6 (one more than the number of putative populations in the complete data set), with 10 replicate runs for each value of *K*. Each run consisted of a burn-in of 10,000 iterations followed by 100,000 Markov chain Monte Carlo iterations under the admixture model of ancestry with correlated allele frequencies; all other STRUCTURE parameters were left at their defaults. The most suitable value of *K* was determined using the Δ*K* (Evanno) method as implemented in Structure Harvester web version 0.6.94 [66], where the highest Δ*K* value was taken to indicate the number of genetic clusters.
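The Δ*K* statistic can be sketched as follows, assuming it is computed as the absolute second-order difference of the mean log-likelihood across replicates, divided by the standard deviation of the replicates at that *K*. The function name and the log-likelihood values are hypothetical and are not results from this study. Δ*K* is undefined for the smallest and largest *K* tested.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute Delta K for every K that has both K-1 and K+1 available.

    lnp maps each K to the list of ln P(D) values from its replicate runs.
    Delta K = |L(K+1) - 2*L(K) + L(K-1)| / sd(L(K)), with L the replicate mean.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp[k])
            if sd > 0:  # skip K values with no variation between replicates
                second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
                result[k] = abs(second_diff) / sd
    return result


# Made-up ln P(D) values for three replicates at K = 1..4 (illustration only).
example_lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-888.0, -898.0, -878.0],
}
dk = delta_k(example_lnp)
best_k = max(dk, key=dk.get)
```

Because Δ*K* cannot be evaluated at the boundaries of the tested range, a true *K* of 1 cannot be identified by this criterion alone.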

The marker-based kinship matrix (*K*) was calculated from the same genotypes using the VanRaden method [67] and was then used to create a clustering heat map of the association mapping panel in GAPIT [68].
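A minimal Python sketch of a VanRaden-type genomic relationship matrix is given below for illustration. It is not the GAPIT code. It assumes genotypes coded as 0, 1, 2 copies of the alternate allele with no missing values, and computes the matrix as ZZ' / (2 Σ p(1 − p)), where Z holds genotypes centered by twice the allele frequency. The function name and example data are hypothetical.

```python
from typing import List


def vanraden_kinship(genotypes: List[List[int]]) -> List[List[float]]:
    """Kinship matrix from an individuals-by-SNPs matrix coded 0/1/2."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Alternate-allele frequency of each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Scaling factor: twice the sum of p * (1 - p) over all SNPs.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; kinship is undefined")
    # Center each genotype by its expected value, 2p.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zj)) / scale for zj in z]
        for zi in z
    ]


# Hypothetical panel: four individuals genotyped at three SNPs.
example_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
kinship = vanraden_kinship(example_genotypes)
```

The resulting matrix is symmetric, and because the genotypes are centered on the sample allele frequencies, each of its rows sums to zero.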
