*2.4. Differentially Expressed Genes Analysis*

In the current study, differentially expressed genes (DEGs) were selected using a cut-off value of |log2FoldChange| ≥ 1 and an adjusted *p*-value < 0.05. Thousands of DEGs were identified through pairwise comparisons (Figure 3a). Among them, the B vs. A group had the largest number of DEGs, followed by the D vs. A and C vs. A groups. It is worth noting that the number of up-regulated genes was significantly higher than that of down-regulated genes, with only one exception: the C vs. B comparison. Interestingly, the largest difference between the numbers of DEGs was observed in the D vs. A comparison, reaching a total of 1003 genes. Furthermore, the common DEGs among the D vs. A, C vs. A, and B vs. A groups were selected using the Venn Diagram Plotter tool (http://omics.pnl.gov/software/venn-diagram-plotter (accessed on 22 January 2022)). The results indicate that there were 1684 genes found to be differentially expressed in the three-way comparisons (Figure 3b), which suggests that these DEGs might have different functions, resulting in different petal color pattern forms.

**Figure 3.** Number of differentially expressed genes (DEGs) in pairwise comparisons of the four varieties. (**a**) The total number of DEGs is up-regulated and down-regulated. The red and cyan colors indicate the up-regulated and down-regulated genes, respectively. (**b**) Venn diagram of DEGs for making a comparison among the three groups.

To explore the differences in gene expression patterns, we performed a hierarchical clustering analysis in a heatmap. Overall, three main clusters of samples were identified (Figure 4). The first cluster mainly consisted of A samples, and cluster two mainly consisted of D samples, whereas cluster three comprised B and C samples. The smallest number of up- and down- regulated DEGs was identified in the C vs. B comparison (Figure 3a), which implied transcriptional similarity between B and C samples. However, the contents of anthocyanins between B and C were distinctly different (Table S1). Therefore, we may assume that the 167 unique up-regulated genes in B samples might be related to the anthocyanin biosynthetic pathway. Meanwhile, we found that the expression patterns of most DEGs were opposite between A and D samples.

**Figure 4.** Heatmap showing the relative expression profiles of DEGs by Euclidean distance among the four samples. A selection of genes is shown in rows, and each column represents an individual sample. The upper dendrogram showed gene clustering by complete linkage, while the left showed clustering of gene expression quantity. The filled colors indicate the gene expression levels (from low—green to high—red).

To verify the reliability of the RNA-seq analysis, we randomly selected several genes for validation using quantitative RT-PCR (qRT-PCR). All the primers used in this study are provided in Table S2, and these genes are responsible for L-ascorbate oxidase, pectin esterase, and beta-glucuronosyltransferase. The results show that the coefficient of determination (R2) between the qRT-PCR data and RNA-seq data was 0.82 (Figure 5), which indicates the high reliability of the RNA-seq results in our study.

**Figure 5.** Regressions of gene expression ratio between RNA-seq and qRT-PCR with GAPDH as a reference. The *x*-axis represents relative gene expression fold change obtained by qRT-PCR, and the *y*-axis represents relative gene expression fold change obtained by RNA-seq. Each point represents the average of the three biological replicates.

#### *2.5. Classification of GO and KEGG Terms*

With gene ontology (GO) annotation, DEGs were classified into three major functional categories: biological process (BP), cellular component (CC), and molecular function (MF). The three types of comparisons presented similar distribution patterns, while the numbers and types of enriched pathways were different (Figures S1 and S2). For the category of BP, the pathway "localization" (GO: 0051179)–"oligosaccharide transport" (GO: 0015772)– "sucrose transport" (GO: 0015770) had a higher number of genes than others (Figure S2a–c). Among the CC functions, the pathway "cell periphery" (GO: 0071944)–"plasma membrane" (GO: 0005886)–"obsolete intrinsic component of plasma membrane" (GO: 0031226) presents the higher number of DEGs (Figure S2d–f). In the MF category, most of the DEGs were mapped to "nutrient reservoir activity" (GO: 0045735) and "catalytic activity" (GO: 003824) (Figure S2g–i).

GO terms with a corrected *p*-value < 0.05 were considered to be significantly enriched. For simplicity, only the top twenty most significant GO terms were selectively presented in each pairwise comparison. GO analysis revealed that most enriched GO terms were assigned to molecular function, followed by biological process (Figure S3 and Table S3). In the D vs. A comparison, these DEGs were associated with six BP terms, five CC terms, and nine MF terms. Comparison of the C sample with the A sample revealed that there were six, five, and nine functional terms assigned to the main categories of three ontologies. Among the 20 GO terms, 2, 3 and 15 GO terms belonged to these 3 functional terms in the B vs. A comparison. Notably, we found that most of the significantly enriched GO MF terms contain genes involved in "sucrose transmembrane transporter activity" (GO: 008515) and "disaccharide transmembrane transporter activity" (GO: 0015154).

The KEGG enrichment analysis was performed to further explore the various metabolic and biosynthesis pathways of the DEGs obtained in different comparisons. The top 20 most significantly enriched pathways that were associated with the DEGs are listed in Figure 6. For example, "Plant hormone signal transduction", "Phenylpropanoid biosynthesis", and "Anthocyanin biosynthesis" were significantly enriched in the D vs. A comparison. "Biosynthesis of secondary metabolites" and "Anthocyanin biosynthesis" were

most highly enriched in the C vs. A comparison. "Glutathione metabolism", "Anthocyanin biosynthesis", and "Phenylpropanoid biosynthesis" were detected in the B vs. A comparison. In particular, pathways related to "phenylpropanoid biosynthesis" (ko00940), and "Anthocyanin biosynthesis" (ko00942) were all significantly enriched in the three pairwise comparisons.

**Figure 6.** KEGG pathway enrichment analysis of the identified DEGs among the three comparison groups. The *x*-axis represents the rich factor, the *y*-axis indicates the name of the KEGG pathway. The size of the dot indicates the number of DEGs in each pathway, while dot color represents the *q*-value of the enrichment.

## *2.6. The Key DEGs Involved in Anthocyanin Biosynthesis Pathway*

Since the contents of anthocyanin components in four peals were correlated with color intensity (Table S1), to investigate the differences in anthocyanin biosynthesis in the petals of the four varieties, DEGs in the anthocyanin biosynthesis pathway were detected, including the phenylpropanoid biosynthesis, flavonoid biosynthesis, and anthocyanin biosynthesis pathways (Table 4). Phenylalanine is the major amino acid precursor for phenylpropanoids. In our study, the gene (*Rhsim04G0145800*) encoding phenylalanine ammonia-Lyase (PAL) was down-regulated in the A sample relative to the B and D samples. Similarly, one gene (*Rhsim01G0211600*) linked to 4-coumarateCoA ligase (4CL) in converting 4-coumaric acid to 4-coumaroyl-CoA was down-regulated in the A sample, as compared to the C and D samples. Additionally, several enzymes were identified, such as flavonoid 3 ,5 -hydroxylase (F3 5 H), flavonol synthase (FLS), flavanone 4-reductase (DFR), and anthocyanidin synthase (ANS), which were associated with two, six, three, and one DEGs, respectively. It has been noticed that six genes (*RhsimUnG0095500*, *RhsimUnG0143300*, *RhsimUnG0134400*, *Rhsim04G0219900*, *RhsimUnG0007900*, *RhsimUnG0105700*) encoding FLS were highly expressed in the B, C, and D samples, especially in the D sample. These results suggest that the genes encoding FLS might be the key to absence of pigments in white petals, leading to more dihydroflavonol entering the flavonols branch. In contrast, three genes in the A sample associated with DFR (*Rhsim06G0030600*, *Rhsim06G0030500*, and *Rhsim06G0030400*) were more highly expressed than in the B, C, or D samples (Table 4). These results indicate that the genes encoding DFR might be critical for color formation, due to causing more dihydroflavonols to enter into the anthocyanin pathway. Interestingly, DFR in the B sample compared to the A sample showed a greater degree of up-regulation than in the C sample, and the high expression of ANS (*Rhsim07G0096600)* in the B sample may explain the similarity of anthocyanins content between A and B samples.


**Table 4.** Gene expression levels of anthocyanin biosynthesis in pairwise comparisons.

Note: Blank present in the table indicates that there is no significant difference in gene expression.

As we know, anthocyanins and flavonoids are mainly pigments in flowers. In this current study, DEGs involved in the anthocyanin biosynthesis pathway were detected according to KEGG pathway analysis. In total, 17 genes were detected in the D vs. A comparison. Compared with the D sample, 12 genes were up-regulated and 5 genes were down-regulated in the A sample. Correspondingly, 19 DEGs were detected between C and A samples, including 14 up-regulated genes and 5 down-regulated genes in the A sample. Additionally, 22 genes were identified in the B vs. A comparison, of which 15 genes were up-regulated and 7 genes down-regulated in the A sample (Table S4). The overlaps among the three pairwise comparisons were calculated, and 11 DEGs were found in the overlapping regions. In addition, eight genes were up-regulated in the A sample relative to the B, C, and D samples, including one, two, and five DEGs encoding UGAT, 5AT, and UGT75C1, respectively (Table S4).
