*2.4. Sequence Data Analysis and SNP Calling*

A radish reference genome was available at the time of this study, so Illumina raw reads were aligned to it for SNP discovery. Raw reads were de-multiplexed and the barcode sequences were removed. Reads that did not contain the expected restriction sites for both enzymes were discarded. The remaining reads were then filtered and trimmed using the recommended settings in Trimmomatic-0.39 [25]. The Burrows–Wheeler Aligner (BWA) software [26] was used to align the clean reads from each individual against the radish reference genome. Genomic data (Rs 1.0 chromosome) at the chromosome pseudomolecule level from the Radish Genome Database (http://radish-genome.org/) served as the reference genome [27]. Alignment files were converted to BAM files using SAMtools [28]. The Genome Analysis Toolkit (GATK) and Picard tools [29] were used for variant calling. The GATK 'HaplotypeCaller' was used to identify candidate SNP and indel sites, and the resulting variants were filtered using GATK 'VariantFiltration' (Figure 1).


**Figure 1.** Pipeline for calling single nucleotide polymorphisms (SNPs) from genotyping-by-sequencing (GBS) data.
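
For orientation, the per-sample steps described above can be sketched as an ordered list of command lines. This is an illustrative sketch only, not the exact commands used in the study. The following are assumptions that do not come from the text: single-end reads, the `bwa mem` algorithm, GATK4-style syntax, the Trimmomatic step values, the `QD < 2.0` filter, the read-group string, and all file names and the function name `build_pipeline`. De-multiplexing, reference indexing, and any Picard steps are omitted because the text does not specify how they were run.

```python
from __future__ import annotations

import shlex


def build_pipeline(sample: str, reference: str, raw_fastq: str,
                   outdir: str = "out") -> list[list[str]]:
    """Return the per-sample commands, in order, as argument lists.

    Nothing is executed here; the function only assembles the commands.
    All parameter values below are placeholders, not the study's settings.
    """
    trimmed = f"{outdir}/{sample}.trimmed.fastq.gz"
    sam = f"{outdir}/{sample}.sam"
    bam = f"{outdir}/{sample}.sorted.bam"
    raw_vcf = f"{outdir}/{sample}.raw.vcf.gz"
    filtered_vcf = f"{outdir}/{sample}.filtered.vcf.gz"

    # GATK expects read-group information; bwa expands the literal "\t".
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"

    return [
        # 1. Quality filtering and trimming (step values are assumed).
        ["java", "-jar", "trimmomatic-0.39.jar", "SE", "-phred33",
         raw_fastq, trimmed,
         "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # 2. Alignment of clean reads to the reference (bwa mem is assumed).
        ["bwa", "mem", "-R", read_group, "-o", sam, reference, trimmed],
        # 3. Conversion to a coordinate-sorted BAM file, then indexing.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # 4. Discovery of candidate SNP and indel sites.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
         "-O", raw_vcf],
        # 5. Filtering of the called variants (threshold is assumed).
        ["gatk", "VariantFiltration", "-R", reference, "-V", raw_vcf,
         "-O", filtered_vcf,
         "--filter-name", "lowQD", "--filter-expression", "QD < 2.0"],
    ]


if __name__ == "__main__":
    # Hypothetical sample and file names, for illustration only.
    for cmd in build_pipeline("sample01", "Rs1.0.fasta", "sample01.fastq.gz"):
        print(shlex.join(cmd))
```

Returning argument lists rather than running the tools keeps the sketch independent of any local installation; each list could be passed to `subprocess.run(cmd, check=True)` once the actual parameters are substituted.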
