**2. Results**

#### *2.1. The Contents of Anthocyanins Components and Total Flavonoids*

The petals of *R. pulchrum* bear deep-colored blotches in the center. The anthocyanin components and their contents in both petals and blotches were analyzed by HPLC and compared with standards. Pigment peaks appeared in samples A, B, and C at approximately 7.32, 9.07, 10.55, 12.30, and 15.6 min, consistent with the retention times of the five pigment standards (delphinidin, 6.95; cyanidin, 8.91; pelargonidin, 10.82; peonidin, 11.95; malvidin, 14.80 min) (Figure 1). In the petals, peonidin was the main anthocyanin in sample A. Peonidin content was significantly lower in B and C, where pelargonidin became the main anthocyanin determining petal color (Table S1). Peonidin was the main anthocyanin in the deep-colored blotches. No anthocyanin components were detected in the petals of sample D, and only small amounts of anthocyanins were detected in its blotches. Total flavonoid content was significantly higher in lighter-colored petals, whereas the flavonoid content of the blotches did not differ significantly among samples (Table 1).
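The peak assignment described above can be illustrated with a minimal sketch that pairs each observed peak with the standard of closest retention time. The nearest-neighbor rule and the names `assign_peaks`, `STANDARDS`, and `OBSERVED_PEAKS` are assumptions made for illustration and are not necessarily the procedure used in this study; only the retention times are taken from the text.

```python
# Retention times (min) of the five pigment standards, as reported in the text.
STANDARDS = {
    "delphinidin": 6.95,
    "cyanidin": 8.91,
    "pelargonidin": 10.82,
    "peonidin": 11.95,
    "malvidin": 14.80,
}

# Approximate peak retention times (min) reported for samples A, B, and C.
OBSERVED_PEAKS = [7.32, 9.07, 10.55, 12.30, 15.6]


def assign_peaks(peaks, standards):
    """Pair each peak with the standard of closest retention time.

    Returns a list of (peak, standard_name, offset) tuples, where offset is
    the peak time minus the standard's retention time, in minutes.
    This nearest-neighbor rule is an illustrative assumption.
    """
    assignments = []
    for peak in peaks:
        name = min(standards, key=lambda s: abs(peak - standards[s]))
        assignments.append((peak, name, round(peak - standards[name], 2)))
    return assignments


assignments = assign_peaks(OBSERVED_PEAKS, STANDARDS)
for peak, name, offset in assignments:
    print(f"{peak:5.2f} min -> {name:12s} (offset {offset:+.2f} min)")
```

Under this rule each of the five peaks maps to a different standard, with offsets below 1 min in every case.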

**Figure 1.** High-performance liquid chromatography (HPLC) chromatograms of mixed anthocyanins in petals. The *y*-axis represents the peak area. DP, delphinidin; CY, cyanidin; PG, pelargonidin; PN, peonidin; MV, malvidin.


**Table 1.** The content of total flavonoids.

Values are means ± standard deviations. Different lowercase letters indicate significant differences between treatments based on one-way ANOVA (*p* < 0.05). QE mg/g DW, mg quercetin equivalent per g of dry weight.

#### *2.2. Overview of RNA-Seq Data and Sequence Assembly*

To further investigate the molecular mechanism of *R. pulchrum* petal coloration, transcriptome analysis was performed by RNA-seq. Libraries were prepared from three biological replicates of each of the four varieties, giving twelve libraries in total. The cDNA libraries were sequenced on the Illumina HiSeq 2000 platform. Between 38.67 and 43.72 million raw reads were generated for each variety (Table 2). After removing low-quality reads, 35.55 to 40.56 million high-quality clean reads were obtained, accounting for over 90% of the raw reads. The percentage of bases with a quality score of at least Q30 (high sequencing quality) was no less than 89.80%. The clean reads were then aligned to the reference genome using HISAT2 (version 2.1.0). The number of mapped reads ranged from 28.56 to 32.65 million, and the mapping ratio of each sample ranged from 79.32% to 80.66%. In addition, uniquely mapped reads (reads aligned to only one position) accounted for more than 94.70% of the mapped reads (Table 2).


**Table 2.** Summary of the transcriptome sequencing dataset.

Sample: name of the petal sample; Raw Reads: the total number of original reads before filtering; Clean Reads: the reads remaining after filtering, where the percentage in parentheses is clean reads/raw reads; Q30 (%): the proportion of bases with a base-calling error rate of 0.1% or less; Total Mapped: the total number of reads mapped to the reference genome, where the percentage in parentheses is total mapped/clean reads; Uniquely Mapped: the reads that map to the reference genome at only one site, where the percentage in parentheses is uniquely mapped/total mapped.
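The Q30 threshold in the table note follows from the Phred scale, in which a quality score Q corresponds to an error probability of 10^(−Q/10). The sketch below converts a score to its error probability and computes a Q30 fraction for a single read. The function names, the Phred+33 ASCII offset, and the example quality string are hypothetical and chosen for illustration only; they are not data from this study.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to a base-calling error probability."""
    return 10 ** (-q / 10)


def q30_fraction(quality_string, offset=33):
    """Fraction of bases in a read with Phred quality >= 30.

    Assumes Phred+33 ASCII encoding of the FASTQ quality line.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(score >= 30 for score in scores) / len(scores)


# Q30 corresponds to an error probability of about 0.001, i.e. 0.1%.
q30_error = phred_to_error_probability(30)

# Hypothetical quality line: 'I' = Q40, 'F' = Q37, '#' = Q2.
example_quality = "IIIIFFFF##"
example_q30 = q30_fraction(example_quality)

print(f"Q30 error probability: {q30_error:.4f}")
print(f"Q30 fraction of example read: {example_q30:.2f}")
```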

The abundance of each gene was calculated as fragments per kilobase of transcript per million mapped reads (FPKM). Pearson's correlation coefficient was used to assess the similarity of expression patterns among samples. Gene expression levels were similar within sample groups and differed between groups (Figure 2a), indicating that the results are reliable. Principal component analysis (PCA) was used to further explore these differences and similarities, and it showed a clear separation among the four varieties with different colors. The biological replicates of each sample clustered together, indicating a high degree of transcriptional similarity (Figure 2b). Collectively, these results show that the RNA sequencing quality was suitable for further analysis.
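As an illustration of the two calculations above, the following sketch applies the standard FPKM formula and computes Pearson's correlation coefficient between two replicates. The fragment counts, gene lengths, and library sizes are hypothetical, and the helper names `fpkm` and `pearson` are introduced here for illustration; this is not the pipeline used in the study.

```python
import math


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: three genes in two replicates; the second replicate
# was sequenced at twice the depth of the first.
gene_lengths = [1000, 2000, 500]
counts_rep1, total_rep1 = [100, 400, 50], 1_000_000
counts_rep2, total_rep2 = [200, 800, 100], 2_000_000

fpkm_rep1 = [fpkm(c, l, total_rep1) for c, l in zip(counts_rep1, gene_lengths)]
fpkm_rep2 = [fpkm(c, l, total_rep2) for c, l in zip(counts_rep2, gene_lengths)]
r = pearson(fpkm_rep1, fpkm_rep2)

print(fpkm_rep1, fpkm_rep2, round(r, 4))
```

Because FPKM normalizes for both gene length and sequencing depth, the two hypothetical replicates yield the same values despite their different library sizes, and their correlation coefficient is 1.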

**Figure 2.** Analysis of sample relationships. (**a**) Pearson correlation coefficients among the biological replicates of the different samples. The correlation coefficient between two samples was calculated from the FPKM values of those samples. The left and upper sides of the figure show sample clustering, and the right and lower sides show sample names. (**b**) Principal component analysis (PCA) of the similarities and differences among the four samples used for RNA-seq.

#### *2.3. Annotation of the R. pulchrum Transcriptome*

Five public databases were used to annotate the unigenes of *R. pulchrum*. Based on the BLASTx results, a total of 32,999 unigenes were annotated, of which 28,273 (85.68%), 18,054 (54.71%), 24,301 (73.64%), 19,099 (57.88%), and 11,507 (34.87%) were annotated in the Nr, Swiss-Prot, Pfam, GO, and KEGG databases, respectively (Table 3).
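The percentages above are each database's annotated count divided by the 32,999 annotated unigenes. A minimal sketch of this arithmetic, using only the counts reported in the text (the variable names are illustrative):

```python
# Counts reported in the text.
TOTAL_ANNOTATED = 32_999
ANNOTATED = {
    "Nr": 28_273,
    "Swiss-Prot": 18_054,
    "Pfam": 24_301,
    "GO": 19_099,
    "KEGG": 11_507,
}

# Percentage of annotated unigenes found in each database, to two decimals.
percentages = {
    db: round(100 * count / TOTAL_ANNOTATED, 2)
    for db, count in ANNOTATED.items()
}

for db, pct in percentages.items():
    print(f"{db:10s} {ANNOTATED[db]:6d} ({pct:.2f}%)")
```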

**Table 3.** Summary of functional annotation of transcripts in the five public databases searched.

