#### *2.1. Plant Materials*

Two *G. paniculata* wild-type accessions, one with pink flowers (WT-P) and one with white flowers (WT-W), were used in this study (Figure 1). The WT-P plant had previously been used for de novo genome sequencing and therefore provided the reference genome. The WT-W plant was used for genome resequencing to generate InDel markers. In addition, we selected four commercial cultivars of *G. paniculata* ('YX1', 'YX2', 'YX3' and 'YX4') to identify and validate the polymorphic InDel markers. 'YX1', 'YX2' and 'YX4' are representative commercial varieties with white petals that differ in flower size ('YX1' > 'YX4' > 'YX2', from large to small), whereas 'YX3' has pink flowers similar in size to those of 'YX2' (Figure 1). All plant materials were provided by Yuxi Yunxing Biological Technology Co., Ltd. (Yuxi, China).

#### *2.2. Variation Detection by Genome Resequencing*

Fresh young leaves of *G. paniculata* WT-W were used for genome resequencing. Sequencing was performed on an MGISEQ-2000 sequencer in PE150 mode, after which the raw reads (8.66 Gb) were filtered to generate clean reads (8.05 Gb) for subsequent analysis. Using in-house scripts, we filtered out adapter sequences, consecutive bases at the read ends with base quality < Q20, reads shorter than 50 bp, and singletons. The clean reads were then aligned to the *G. paniculata* reference genome using BWA mem (v0.7.17) with default settings [26]. The alignment results were sorted using Samtools (v1.9) [27].
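The in-house filtering scripts are not part of this description, so the following Python sketch only illustrates how the stated quality and length criteria could be applied to a read pair. It assumes Phred+33 quality encoding and omits adapter removal. The function names are hypothetical.

```python
from typing import Optional, Tuple

MIN_QUAL = 20      # Q20 threshold stated in the text
MIN_LEN = 50       # minimum read length (bp) stated in the text
PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding

Read = Tuple[str, str]  # (sequence, quality string)


def trim_low_quality_ends(seq: str, qual: str) -> Read:
    """Strip consecutive bases with quality < MIN_QUAL from both read ends."""
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < MIN_QUAL:
        start += 1
    while end > start and scores[end - 1] < MIN_QUAL:
        end -= 1
    return seq[start:end], qual[start:end]


def clean_read(seq: str, qual: str) -> Optional[Read]:
    """Return the trimmed read, or None if it is shorter than MIN_LEN."""
    seq, qual = trim_low_quality_ends(seq, qual)
    if len(seq) < MIN_LEN:
        return None
    return seq, qual


def clean_pair(read1: Read, read2: Read) -> Optional[Tuple[Read, Read]]:
    """Keep a pair only if both mates survive, so that no singletons remain."""
    r1 = clean_read(*read1)
    r2 = clean_read(*read2)
    if r1 is None or r2 is None:
        return None
    return r1, r2
```

Dropping the whole pair when one mate fails is one way to satisfy the singleton criterion; the original scripts may have handled it differently.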

SNPs and InDels were called using GATK HaplotypeCaller (v4.1.4.1, Broad Institute, Cambridge, MA, USA) with default settings [28]. We further filtered the calls using GATK VariantFiltration with the following parameters: SNP filtering (QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < −12.5, ReadPosRankSum < −8.0); InDel filtering (QD < 2.0, FS > 200.0, ReadPosRankSum < −20.0). Copy number variations (CNVs) were detected using CNVnator (v0.3.2) with default settings [29]. Structural variations (SVs) were identified using Manta (v1.6.0) [30]. Mutational positions, genomic regions and potential amino acid changes were assessed using ANNOVAR (v2019, Wang Kai, PA, USA) [31]. Circos (v0.69, Martin Krzywinski, Vancouver, BC, Canada) was used to plot the genome-wide distribution of variation [32].
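The hard-filter thresholds listed above can be restated as a simple predicate. The Python sketch below is an illustration only, not the GATK implementation: it takes a variant's annotations as a plain mapping and reports whether any listed threshold is violated. Treating a missing annotation as a pass is an assumption, and the function name is hypothetical.

```python
from typing import List, Mapping, Optional, Tuple

Rule = Tuple[str, str, float]  # (annotation, comparison, cutoff)

# Thresholds exactly as listed in the text.
SNP_FILTERS: List[Rule] = [
    ("QD", "<", 2.0),
    ("FS", ">", 60.0),
    ("MQ", "<", 40.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]
INDEL_FILTERS: List[Rule] = [
    ("QD", "<", 2.0),
    ("FS", ">", 200.0),
    ("ReadPosRankSum", "<", -20.0),
]


def fails_hard_filter(info: Mapping[str, Optional[float]], is_indel: bool) -> bool:
    """Return True if any listed threshold is violated for this variant."""
    rules = INDEL_FILTERS if is_indel else SNP_FILTERS
    for key, op, cutoff in rules:
        value = info.get(key)
        if value is None:
            continue  # assumption: a missing annotation does not trigger the filter
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            return True
    return False
```

For example, a variant with FS = 100 fails the SNP criteria but passes the InDel criteria, because the FS cutoff for InDels is 200.0.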

**Figure 1.** The phenotype of *G. paniculata* wild-type accessions and commercial cultivars used in this study. (**A**) The pink-flowered wild type of *G. paniculata* (WT-P). (**B**) The white-flowered wild type of *G. paniculata* (WT-W). (**C**) The flower phenotype of the four commercial cultivars ('YX1', 'YX2', 'YX3' and 'YX4', from left to right). Bar = 1 cm.

#### *2.3. Development of InDel Markers*

We selected InDels that were longer than 10 bp and spaced approximately 2 Mb apart along the genome. Positions with excess InDels, which might interfere with PCR verification, were excluded. After suitable InDels were selected, a ~400 bp genomic sequence spanning each InDel was used as the template for primer design. The primers were designed on NCBI and named after the chromosome number and the physical position (N-XX.XX, Table S1).
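The selection criteria can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the ±200 bp crowding window, the template window centred on the InDel, the greedy left-to-right spacing rule, and the reading of XX.XX as the position in Mb are all assumptions, and every name is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MIN_INDEL_LEN = 10   # "over 10 bp", from the text
SPACING = 2_000_000  # ~2 Mb between markers, from the text
FLANK = 200          # assumption: ~400 bp template centred on the InDel
CROWD_WINDOW = 200   # assumption: range searched for neighbouring InDels


@dataclass(frozen=True)
class InDel:
    chrom: int   # chromosome number
    pos: int     # 1-based physical position (bp)
    length: int  # size of the insertion or deletion (bp)


def is_crowded(target: InDel, all_indels: List[InDel]) -> bool:
    """True if another InDel lies within CROWD_WINDOW bp of the target."""
    return any(
        other is not target
        and other.chrom == target.chrom
        and abs(other.pos - target.pos) <= CROWD_WINDOW
        for other in all_indels
    )


def select_markers(indels: List[InDel]) -> List[InDel]:
    """Greedily pick long, isolated InDels at least SPACING bp apart."""
    selected: List[InDel] = []
    last_pos: Dict[int, int] = {}
    for indel in sorted(indels, key=lambda i: (i.chrom, i.pos)):
        if indel.length <= MIN_INDEL_LEN or is_crowded(indel, indels):
            continue
        prev = last_pos.get(indel.chrom)
        if prev is None or indel.pos - prev >= SPACING:
            selected.append(indel)
            last_pos[indel.chrom] = indel.pos
    return selected


def marker_name(indel: InDel) -> str:
    """N-XX.XX style name; assumes XX.XX is the position in Mb."""
    return f"{indel.chrom}-{indel.pos / 1e6:.2f}"


def template_window(indel: InDel) -> Tuple[int, int]:
    """Start and end (bp) of the ~400 bp primer-design template."""
    return max(1, indel.pos - FLANK), indel.pos + FLANK
```

The crowding check is quadratic in the number of InDels, which is acceptable for a sketch; a genome-wide call set would need a sorted sweep or an interval index.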

#### *2.4. PCR Analyses of InDel Markers*

Total DNA of the two wild-type plants and the four commercial cultivars was extracted from fresh leaves using the CTAB method [33]. Template DNA was amplified with the designed primers in a 10 μL reaction (7.3 μL ddH2O; 1 μL 10× Taq buffer; 0.8 μL dNTPs; 0.2 μL primers; 0.1 μL Taq enzyme; 0.4 μL DNA template) using the following PCR program: initial denaturation at 95 °C for 5 min; 29 cycles of 95 °C for 30 s, 56 °C for 30 s and 72 °C for 30 s; and a final extension at 72 °C for 7 min. After PCR, 3 μL of DNA loading buffer was added to the PCR product, and the mixture was separated on a 3.5% agarose gel.
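For reference, the thermal program can be written as structured data, from which the total programmed hold time follows directly. The Python sketch below uses only the temperatures, durations and cycle number stated above; the class and function names are hypothetical, and ramping time between steps is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    temp_c: float  # hold temperature (°C)
    seconds: int   # hold duration (s)


INITIAL_DENATURATION = Step(95, 5 * 60)
CYCLE: List[Step] = [Step(95, 30), Step(56, 30), Step(72, 30)]
N_CYCLES = 29
FINAL_EXTENSION = Step(72, 7 * 60)


def total_hold_seconds() -> int:
    """Sum of all programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (
        INITIAL_DENATURATION.seconds
        + N_CYCLES * per_cycle
        + FINAL_EXTENSION.seconds
    )
```

With these values the holds add up to 300 s + 29 × 90 s + 420 s = 3330 s, or 55.5 min per run before ramping is taken into account.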
