*2.1. De Novo Transcriptome Assembly and Gene Expression Profiles in the Two Turnips at Five Developmental Stages*

In this work, two widely grown *Brassica rapa* ssp. rapa varieties in Xinjiang (China) including GT with green-colored root skin and PT with purple-colored root skin were studied. Skin samples were collected at five different developmental stages, namely seedling stage (S1, 15 days after sowing (DAS)), early stage of fleshy root expansion (S2, 30 DAS), full expansion stage of fleshy root (S3, 45 DAS), maturity stage of fleshy root (S4, 55 DAS) and harvest stage of fleshy root (S5, 65 DAS). The phenotypes of young and mature turnip roots for GT and PT are presented in Figure 1. The objective of the work was to elucidate the mechanisms underlying the differential skin coloration in these turnips with respect to the accumulation of anthocyanin compounds. First, we de novo sequenced and assembled the transcriptome from 30 samples of the two turnips.

The RNA-seq yielded a total of 232.22 Gb clean data, on average 6.20Gb for each sample with 90.74% of bases scoring Q30 and above (Table 1). A total of 76,152 unigenes were obtained after assembly using the Trinity software and 17,594 unigenes have length of more than 1 kb. The N50 length obtained was approximately 1443 bp (Table 2). The detected gene number in this study was much higher than the reported gene number (41,174 genes) in *Brassica rapa* ssp. pekinensis variety Chiifu-401-42 [42] or (40,708 genes) in *Brassica rapa* ssp. rapa [43]. We performed the functional annotation of the unigenes in various database, including NR, Swiss-Prot, KEGG, COG, KOG, GO and

Pfam databases, which resulted in 52,449 unigenes successfully annotated (Table 3). The clean data of each sample was serialized with the assembled unigene libraries and the mapping result statistics are presented in Table S1. Gene expression levels were estimated with the fragments per kilobase of exon per million fragments mapped (FPKM) values ranging from 0.04 to 5,566,185 (Figure 2A). were related to the turnip variety. For example, Cluster 1 gathered mostly samples of GT, Cluster 2 grouped samples of PT, and Cluster 3 had two subgroups each mostly made of samples from a unique variety (Figure 2B). These results indicate that the global gene expression profile is quite uniform regardless of the developmental stages and only few differentially expressed genes between

the two varieties may be associated with the difference in the turnip skin coloration.

not observe a clear separation according to the developmental stages but to some extent, Clusters

*Int. J. Mol. Sci.* **2019**, *20*, x FOR PEER REVIEW 3 of 19

assembly using the Trinity software and 17,594 unigenes have length of more than 1 kb. The N50 length obtained was approximately 1443 bp (Table 2). The detected gene number in this study was much higher than the reported gene number (41,174 genes) in *Brassica rapa* ssp. pekinensis variety Chiifu-401-42 [42] or (40,708 genes) in *Brassica rapa* ssp. rapa [43]. We performed the functional annotation of the unigenes in various database, including NR, Swiss-Prot, KEGG, COG, KOG, GO and Pfam databases, which resulted in 52,449 unigenes successfully annotated (Table 3). The clean data of each sample was serialized with the assembled unigene libraries and the mapping result statistics are presented in Table S1. Gene expression levels were estimated with the fragments per kilobase of exon per million fragments mapped (FPKM) values ranging from 0.04 to 5,566,185

**Figure 1.** The phenotypes of young and mature purple-colored turnip and green-colored turnip **Figure 1.** The phenotypes of young and mature purple-colored turnip and green-colored turnip roots. *Int. J. Mol. Sci.* **2019**, *20*, x FOR PEER REVIEW 5 of 19

**Figure 2.** Overview of the transcriptome sequencing. (**A**) Gene expression profiles in the 30 libraries. PT represents the purple turnip while GT represents the green turnip; (**B**) heatmap clustering showing correlation among turnip samples based on global expression profiles. The number 1, 2 and **Figure 2.** Overview of the transcriptome sequencing. (**A**) Gene expression profiles in the 30 libraries. PT represents the purple turnip while GT represents the green turnip; (**B**) heatmap clustering showing correlation among turnip samples based on global expression profiles. The number 1, 2 and 3 represent the 3 Clusters of samples.

*2.2. Differentially Expressed Genes Between the Two Turnips and Analysis of Major Regulator Genes* 

3 represent the 3 Clusters of samples.

represent a key stage for turnip skin coloration.

turnip skin color formation.

Transcription factors (TF) are the major regulators of gene expression profiles [45]. We therefore extended the study on the major transcription factor families differentially expressed between the two turnips. Our analysis showed that nine main TF families modulate the global gene expression levels among the two turnips (Figure 3B,C). In addition, the highest number of TF could be noticed at S3, which correlates well with the observed significant DEGs at this developmental stage (Figure 3A,B), showing that TFs are the main drivers of gene expression changes leading to the differential turnip skin coloration. Among the detected TF families, MYB, bHLH and WRKY families showed more active members involved in gene regulation (Figure 3B,C), therefore we deduce that these TF families may be crucial for the regulation of structural genes involved in turnip skin coloration. Distinctly, the genes *c42189.graph\_c1* (WRKY) was strongly down-regulated over most of the developmental stages in PT while the genes *c33188.graph\_c0* (MYB), *c44079.graph\_c0* (MYB) and *c37493.graph\_c0* (bHLH) exhibited the opposite trend. Given the role of WRKY, MYB and bHLH TFs in the regulation of structural genes involved in pigment (flavonoid-anthocyanin) biosynthesis in plants [33], it is tempting to speculate that these four key genes are the major regulators during

To identify the differentially expressed genes (DEG) related to turnip skin coloration, we compared the FPKM values of each gene in PT to GT at the different developmental stages and retained DEGs with fold change > 2 and a false discovery rate (FDR) correction set at *p* < 0.01 [44]. We


**Table 1.** Overview of the transcriptome sequencing dataset and quality check.

**Table 2.** Statistics of the unigene assembly results.


**Table 3.** Functional annotation statistics of the unigenes.


Hierarchical clustering of the samples based on FPKM displayed 3 Clusters of samples. We did not observe a clear separation according to the developmental stages but to some extent, Clusters were related to the turnip variety. For example, Cluster 1 gathered mostly samples of GT, Cluster 2 grouped samples of PT, and Cluster 3 had two subgroups each mostly made of samples from a unique variety (Figure 2B). These results indicate that the global gene expression profile is quite uniform regardless of the developmental stages and only few differentially expressed genes between the two varieties may be associated with the difference in the turnip skin coloration.
