**Red Chinese Cabbage Transcriptome Analysis Reveals Structural Genes and Multiple Transcription Factors Regulating Reddish Purple Color**

**Jana Jeevan Rameneni 1,**† **, Su Ryun Choi 1,**† **, Sushil Satish Chhapekar 1,**† **, Man-Sun Kim <sup>1</sup> , Sonam Singh <sup>1</sup> , So Young Yi <sup>1</sup> , Sang Heon Oh <sup>1</sup> , Hyuna Kim <sup>1</sup> , Chang Yeol Lee <sup>1</sup> , Man-Ho Oh <sup>2</sup> , Jhongchul Lee <sup>3</sup> , Oh Ha Kwon <sup>3</sup> , Sang Un Park <sup>4</sup> , Sun-Ju Kim <sup>5</sup> and Yong Pyo Lim 1,\***


Received: 1 April 2020; Accepted: 18 April 2020; Published: 21 April 2020

**Abstract:** Reddish purple Chinese cabbage (RPCC) is a popular variety of *Brassica rapa* (AA = 20). It is rich in anthocyanins, which have many health benefits. We detected novel anthocyanins including cyanidin 3-(feruloyl) diglucoside-5-(malonoyl) glucoside and pelargonidin 3-(caffeoyl) diglucoside-5-(malonoyl) glucoside in RPCC. Analyses of transcriptome data revealed 32,395 genes including 3345 differentially expressed genes (DEGs) between 3-week-old RPCC and green Chinese cabbage (GCC). The DEGs included 218 transcription factor (TF) genes and some functionally uncharacterized genes. Sixty DEGs identified from the transcriptome data were analyzed in 3-, 6- and 9-week old seedlings by RT-qPCR, and 35 of them had higher transcript levels in RPCC than in GCC. We detected *cis-*regulatory motifs of MYB, bHLH, WRKY, bZIP and AP2/ERF TFs in anthocyanin biosynthetic gene promoters. A network analysis revealed that MYB75, MYB90, and MYBL2 strongly interact with anthocyanin biosynthetic genes. Our results show that the late biosynthesis genes *BrDFR, BrLDOX, BrUF3GT, BrUGT75c1-1, Br5MAT, BrAT-1, BrAT-2, BrTT19-1,* and *BrTT19-2* and the regulatory MYB genes *BrMYB90, BrMYB75,* and *BrMYBL2-1* are highly expressed in RPCC, indicative of their important roles in anthocyanin biosynthesis, modification, and accumulation. Finally, we propose a model anthocyanin biosynthesis pathway that includes the unique anthocyanin pigments and genes specific to RPCC.

**Keywords:** anthocyanins; anthocyanin biosynthetic genes; *cis*-regulatory motifs; DEGs; network analysis; qRT-PCR; reddish purple Chinese cabbage; transcriptome; transcription factors

#### **1. Introduction**

Introgression breeding is an important traditional breeding technique for transferring key agronomic traits between two distinct species [1]. Using this technique, improvements have been made to many *Brassica* traits, such as disease resistance, male sterility, seed color, oil quality traits, and other morphological traits [1,2]. Some purple Brassicaceae lines have also been generated [3,4]. Red Chinese cabbage (*Brassica rapa* ssp. *pekinensis* L.) is reddish purple in color and rich in anthocyanins. This vibrantly colored variety is a popular addition to salads and it has important antioxidant properties [4].

Anthocyanins are a class of secondary metabolites that are synthesized through the phenylpropanoid pathway [5]. These water-soluble compounds with red, purple, or blue colors are synthesized in the cytosol and stored in the vacuole [6]. Anthocyanins play crucial roles in reducing damage from, and in defense responses against, abiotic stresses such as ultraviolet exposure, wounding, high light, chilling, pollution, osmotic stress, and nutrient deficiency, as well as biotic stresses such as pathogen infection [7].

The important structural genes in the anthocyanin biosynthetic pathway are those encoding phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumaroyl CoA ligase (4CL), chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonoid 30 -hydroxylase (F30H), dihydroflavonol 4-reductase (DFR), leucoanthocyanidin dioxygenase (LDOX), UDP-flavonoid glucosyl transferase (UFGT) and glutathione S-transferase (GST) [8]. The late anthocyanin biosynthesis genes are regulated by a complex of transcription factors (TFs), MYB-bHLH-WD40, known as the MBW complex. In vivo assays in *Antirrhinum* revealed that a flower-specific MYB protein activates the transcription of genes involved in phenylpropanoid biosynthesis [9,10]. Four MYB TFs, encoded by *PAP1*/*MYB75, PAP2*/*MYB90, MYB113*, and *MYB114*, have been identified to control anthocyanin biosynthesis in vegetative tissues of *Arabidopsis* [11,12]. Previous biochemical and genetic studies have shown that TTG1 (WD40), GL3/EGL3/TT8 (bHLH) and PAP1/PAP2/MYB113/MYB114 (MYB) are components of potential WBM complexes that activate anthocyanin biosynthesis [13,14]. Interestingly, recent studies on proanthocyanidin and anthocyanin biosynthesis pathways in *Arabidopsis* and *Petunia* suggest that WRKY TFs regulate color accumulation along with the MBW complex [10,15]. Similarly, studies on bZIP-TFs revealed the mechanisms by which they regulate anthocyanin accumulation in apples [16,17].

Secondary metabolites are abundant in Chinese cabbage. Previous studies have shown that some secondary metabolites are specific to particular species, cultivars, or varieties. Such metabolites are usually detected by high performance liquid chromatography (HPLC) coupled with liquid chromatography-tandem mass spectrometry (LC-MS/MS) [4,18–20]. Recent metabolic profiling studies have demonstrated that cyanidin derivatives are highly accumulated in *Brassica* species [18,21–23]. It is difficult to identify all the genes related to specific traits in a plant system, but transcriptome sequencing allows for the prediction of functional genes in the plant genome. Several RNA-seq studies have been conducted for *Brassica* species, and their results have led to the identification of sets of genes that are expressed under various conditions and/or in certain genotypes. This information is very useful for the functional characterization of target genes [24–27].

Reddish purple Chinese cabbage (RPCC) is a variety of *B. rapa*. It is an economically important leafy vegetable that is widely cultivated and consumed in East Asian countries, especially Korea, due to its health promoting properties [28]. To date, the underlying molecular mechanisms and the genes regulating the anthocyanin pigments responsible for its vivid red color are unexplored, and may differ from those reported previously for other *Brassica* lines. In this study, we used a next generation sequencing (NGS) based RNA-sequencing approach to identify a novel set of genes involved in anthocyanin biosynthesis, and the regulation of this pathway, in a red Chinese cabbage variety. A comprehensive analysis of this plant including computational, leaf chemotype, and expressional abundance analyses shows the significance of this variety.

#### **2. Results**

#### *2.1. Estimation of Anthocyanin Content in RPCC and GCC Leaf Samples*

The red color is a distinguishing feature of some *Brassica* species, and is due to the accumulation of anthocyanins [29]. To determine the types of anthocyanins that confer color in RPCC, we analyzed anthocyanins in the innermost and outermost leaves of 9-week-old GCC and RPCC plants (Table 1 and Table S1) by HPLC-MS/MS. We detected 13 anthocyanin pigments, 12 of which were derivatives of cyanidin. The other one was a pelargonidin derivative. The most abundant pigment in the innermost leaves of RPCC was cyanidin 3-(feruloyl) diglucoside-5-(malonoyl) glucoside (peak 8) with a concentration of 12.63 ± 1.37 mg/g dry weight, followed by pelargonidin 3-(caffeoyl) diglucoside-5-(malonoyl) glucoside (peak 7) with a concentration of 5.66 ± 0.60 mg/g dry weight (Figure 1; Table 1 and Table S1). In the outermost leaves of RPCC, the most abundant pigments were cyanidin 3-(feruloyl) diglucoside-5-(malonoyl) glucoside and cyanidin 3-*O*-(sinapoyl)(feruloyl) diglucoside-5-*O*-(malonyl) glucoside (Figure 1; Table 1 and Table S1). The total amount of anthocyanins was 32.31 ± 2.84 mg/g dry weight in the innermost leaf and 10.17 ± 1.69 mg/g dry weight in the outermost leaf of RPCC. No anthocyanins were detected in the GCC leaves (Table 1). Interestingly, the RPCC pigment peaks 8 and 7 and a few other pigments identified in this study (Table S1) had not been reported previously. This result indicates that cyanidin 3-(feruloyl) diglucoside-5-(malonoyl) glucoside and pelargonidin 3-(caffeoyl) diglucoside-5-(malonoyl) glucoside are specific to this variety of RPCC, and contribute to its color.

**Figure 1.** HPLC chromatogram of anthocyanin pigments detected in leaf extract of reddish purple Chinese cabbage at 520 nm. Horizontal axis shows retention time (min); vertical axis indicates strength of the peak (mAU).



RPCC-reddish purple Chinese cabbage; GCC- green Chinese cabbage; IL-inner leaf; OL- outer leaf; anthocyanin content-mg/g dry weight.

#### *2.2. RNA-Sequencing of RPCC and GCC Samples*

To identify differences in gene transcription between RPCC and GCC, we obtained and sequenced 53,127,646 (GCC) and 48,179,516 (RPCC) total reads, making up 5.37 Gb (giga bases) and 4.87 Gb, respectively (Table 2) with a GC content of 47.6% and 47.2%, respectively. After filtering, 38,611,432 and 34,924,846 clean reads were obtained for the GCC and RPCC samples, respectively, with Q30 values of 94.4% and 94.1%, respectively. A total 32,395 non-redundant transcripts were identified with varying lengths, most in the range of 501 to 1000 nucleotide bases, followed by 200–500 and 1001–1500 nucleotide bases (Figure S1a). Among the identified transcripts, 90.2% were annotated to the following public databases: NR-Viridiplantae (88.61%)¸ Phytozome (88.05%), UniProtKB–Viridiplantae (85.31%), KOG (84.06%), GO (83.72%), InterProscan (68.43%), and KEGG (18.87%) (Figure S1b).



#### *2.3. Identification of DEGs*

The cDNA libraries of RPCC and GCC were mapped to the *B. rapa* reference genome with coverage of 95.2% and 94.66%, respectively (Figure 2a). Among the mapped and annotated DEGs, the highest proportion had RPKM (reads per kilobase per million mapped reads) values of >10, followed by RPKM values of 1 to 4 (Figure 2b). Among the predicted 3345 DEGs, 2706 were up-regulated and 639 were down-regulated in RPCC vs. GCC (Figure 2c; Table S2 and Table S3). Of them, 643 of the up-regulated DEGs and 354 of the down-regulated DEGs between RPCC and GCC samples were not functionally characterized (Table S2 and Table S3). The DEGs included many ABGs. A Bland–Altman (MA) plot was constructed to show the differentiation of gene expression between the two samples by plotting the values onto M (log ratio) and A (mean average) scales. The differences in gene expression are shown on the MA plot, where genes with ≥1-fold expression values are shown in red and those with negative log2 fold values are shown in green (Figure S2).

**Figure 2.** RNA-seq data for reddish purple Chinese cabbage. (**a**) Percentage of transcripts mapped to reference genome; (**b**) Gene expression values (RPKM); (**c**) Differentially expressed genes (DEGs) between two genotypes [green (GCC) and reddish purple (RPCC)]; (**d**) Transcription factor families identified in the transcriptome.

#### *2.4. Identification of Transcription Factor Genes*

Transcription factors play crucial roles in regulating gene expression, and many TFs control ABGs [30]. Hence, we searched for TFs that regulate anthocyanin biosynthesis in *B. rapa*. We detected 1625 TF genes in 54 TF families from our transcriptome data (Figure 2d; Table S4 and Table S5). The proportions of TF genes in different TF families were as follows: 8.7% in the basic helix loop helix (bHLH) family, 7.6% in the ethylene response factor (ERF) family, 6.05% in the myeloblastosis (MYB) family, 5.77% in the WRKY family, and 5.72% in the MYB-related family (Figure 2d). Additionally, 218 TF genes with log2fold change expression ≥1 and 37 TFs with log2fold change expression ≤ 1 were identified (Table S4 and Table S5). A few TF genes had very high log2fold change values (5–11.3), including *MYB90 (Bra004162), MYB75 (Bra039763),* three *RRTF1s (Bra017656, Bra011529, Bra034624), CBF4 (Bra028290), MYBL2 (Bra016164), TTG2 (Bra023112), DDF1 (Bra019777)* and *ERF-13 (Bra037630)* (Table 3). These DEGs may be involved in various functions related to pigment accumulation in RPCC.


**Table**



**Table 3.** *Cont.*


**Table3.***Cont.*


**Table 3.***Cont.*

#### *2.5. Functional Characteristics of DEGs and KEGG Pathway Enrichment Analysis*

We performed GO analyses to annotate the DEGs. The top 20 enriched terms in each GO category of DEGs were selected based on significance (*p* < 0.05) and are summarized in Figure 3. In the BP category, the majority of DEGs were involved in "response to chemical stimulus" followed by "response to organic substance", "response to endogenous stimulus", and "cellular response to chemical stimulus" processes. Similarly, in the MF category, the majority of genes were involved in "DNA binding", followed by "transcription factor activity", "sequence specific DNA binding" and "calcium ion binding". In the CC category, many of the DEGs were related to "extracellular region", "cell wall", "external encapsulating structure" and "plant type cell wall" (Figure 3). We further categorized the up-regulated and down-regulated genes. Among the up-regulated DEGs, a total of 385 GO terms were identified, comprising 328 in the BP category, 35 in the MF category, and 22 in the CC category. Among the down-regulated DEGs, a total of 210 GO terms were identified, comprising 133 terms in the BP category, 50 in the MF category, and 22 in the CC category (Table S6.) In the BP category, a few genes were involved in "biosynthetic and metabolic processes of anthocyanins, ethylene signaling, flavonoids and phenylpropanoids", "catabolic and metabolic processes of L-phenylalanine", "cinnamic acid biosynthesis", "response to absence of light", and "response to temperature stimulus". In the MF category, some genes were involved in "phenylalanine ammonia-lyase activity" and "O-methyltransferase activity" (Table S6.). Some of the down-regulated genes were involved in "biosynthesis process", "negative regulation of cellular metabolic process", and "response to UV-C" (Table S6).

**Figure 3.** Gene ontology (GO) analysis of differentially expressed genes (DEGs) between red and green Chinese cabbage. DEGs were grouped into three categories: (**a**) Biological process; (**b**) molecular function; and (**c**) cellular component. X-axis shows gene annotation term; y-axis shows number of genes.

Next, we conducted a KEGG enrichment analysis to identify pathways significantly enriched with DEGs. The pathways enriched with up-regulated DEGs were "biosynthesis of secondary metabolites", "metabolic pathways", "biosynthesis of flavonoids and phenylpropanoids", "metabolism of fructose, mannose, starch, and sucrose", and several others (Figure 4 and Table S7). The down-regulated DEGs were involved in 46 types of functions related to organ development, and other functions related to plant growth and development (Table S7).

**Figure 4.** KEGG enrichment analysis of differentially expressed anthocyanin biosynthetic genes. *X*-axis shows KEGG terms and y-axis shows enrichment factor. Gene count and corrected *p*-values are shown on right.

#### *2.6. Genes Related to Anthocyanin Biosynthesis Identified from Transcriptome Data*

From the transcriptome data, we identified 255 ABGs (Table S8) comprising 58 phenylpropanoid biosynthetic genes (PBGs), 56 early biosynthetic genes (EBGs), 67 late biosynthetic genes (LBGs), 19 anthocyanin transporter genes (ATGs), 29 other anthocyanin biosynthesis regulatory genes (OABRGs) and 26 regulatory TF genes (Table S8). The main PBGs were those encoding cinnamyl alcohol dehydrogenase (CAD), caffeoyl CoA O-methyltransferase (CCoAMT), cinnamoyl-CoA reductase (CCR), cinnamate 4-hydroxylase (C4H), 4-coumarate: coenzyme A ligase (4CL), O-methyltransferase (OMT), and phenylalanine ammonia lyase (PAL). These genes are involved in different stages of the phenylpropanoid biosynthesis pathway and showed large differences in transcript levels (−0.7 to 3.2-fold change) between RPCC and GCC (Table S8). Similarly, EBGs encoding chalcone isomerase (CHI), chalcone synthase (CHS), flavanone-3-hydroxylase (F3H), flavonoid 30 -hydroxylase (F30H), and flavonol synthase (FLS) showed large differences in their transcript levels between RPCC and GCC (−1.1 to 4.9-fold change) (Table S8). The LBGs showing large differences in transcript levels (−1.9 to 9.3-fold change) between RPCC and GCC encoded beta glucosidase (BGLU), dihydroflavonol 4-reductase (DFR), leucoanthocyanidin dioxygenase (LDOX), 5-*O*-glucoside-6-*O*-malonyltransferase (5MAT), UDP-glucose: flavonoid 3-o-glucosyltransferase (UF3GT), and UDP-glucosyltransferases

(UGT). Genes for anthocyanin transporters and regulatory TFs included those encoding glutathione S-transferase 26/TRANSPARENT TESTA 19 (GST26/TT19), multidrug and toxic compound extrusion (MATE), basic helix-loop-helix 32 (BHLH32), ENHANCER OF GLABRA 3 (EGL3), GLABRA 3 (GL3), myeloblastosis protein 75 (MYB75), myeloblastosis protein 90 (MYB90), TRANSPARENT TESTA 8 (TT8) and TRANSPARENT TESTA GLABRA 1 (TTG1), and TRANSPARENT TESTA GLABRA (TTG2). These genes showed differences in expression between RPCC and GCC ranging from a −0.5 to 11.3-fold change. Interestingly, we identified a few OABRGs with high log2fold expression values (≥ 2) (Table S8). These results indicate that ABGs have vital roles in anthocyanin biosynthesis, transportation, and accumulation in the leaf tissues of RPCC at the seedling stage (3 weeks).

#### *2.7. Expression Analysis of Anthocyanin Biosynthetic Genes by qRT-PCR*

In general, gene expression studies can demonstrate the biological activity of genes in plants. We confirmed the reproducibility and accuracy of DEGs identified in our transcriptome data through qRT-PCR analyses. Although transcriptome sequencing was performed using samples from 3-week-old plants (seedlings), we checked the expression profiles of genes in samples from 6-week-old (rosette stage) and 9-week-old (heading stage) plants of GCC and RPCC. This analysis of gene expression at three developmental stages sheds light on the expression profile of ABGs and their effect on anthocyanin biosynthesis and accumulation throughout plant development. In general, the anthocyanin biosynthesis pathway can be classified into three phases; 1. The phenylpropanoid pathway; 2. Early steps of the flavonoid pathway; and 3. The anthocyanin pathway [31]. For the validation of gene expression, we selected 60 genes identified from the transcriptome data and from a previous study [26] that are involved in various stages of anthocyanin biosynthesis (Table 4; Table S9).

In the phenylpropanoid pathway, three PAL genes (*BrPAL1, BrPAL2* and *BrPAL4*), *BrC4H*, and the 4CL homolog *Br4CL2* were detected in GCC and RPCC at three developmental stages. The transcriptome data from 3-week-old plants (heat map, Figure 5a) and the qRT-PCR analyses showed that the transcript levels of these genes were much higher in RPCC than in GCC. Gene expression was also compared among stages (seedling, rosette, and heading stages) (Figure 5a). Similar to PBGs, EBGs such as *BrCHS*, *BrCHI* and *BrCHI1, BrF3H,* and *BrF3*0*H-1* showed similar expression patterns in both the transcriptome analysis (heat map, Figure 5b) and qRT-PCR analyses (Figure 5b). Unlike PBGs and EBGs, the LBG *BrDFR* was expressed only in the RPCC at all stages, while the LBG *BrLDOX* was expressed at lower levels at the early stage. The *BrLDOX* transcript levels gradually increased from the rosette to the heading stage in RPCC (Figure 5c). The transcript levels of MYBs including *BrMYB90, BrMYB75, and BrMYBL2-1* varied among different stages. *BrMYB90* and *BrMYB75* showed maximum transcript levels in RPCC at the seedling stage, while *BrMYBL2-1* had higher transcript levels at the rosette and heading stages than at the seedling stage in RPCC (Figure 5d).

The downstream LBGs are involved in the acylation, glycosylation, and methylation of anthocyanins and include genes encoding acyltransferase (AT), glycosyltransferase (GT) and O-methyltransferase (OMT) [32,33]. The downstream LBGs selected from the transcriptome data for qRT validation included *BrUF3GT*, *BrUGT75C1-1, BrUGT73B2, Br5MAT, BrAT-1, and BrAT-2. BrUF3GT, BrUGT75C1-1*, and *BrUGT73B2* showed increased expression while *Br5MAT* and *BrAT-1* showed decreased expression from the seedling to heading stages in RPCC. The highest transcript level of *BrAT-2* was at the rosette stage in RPCC (Figure 5e). Interestingly, all the LBGs including regulatory MYB (RM) genes showed low or no expression in GCC compared with RPCC in the transcriptome data (heat map, Figure 5e) and in the qRT-PCR analyses (Figure 5e).


Detailsof60genesselectedforvalidationbyqRT\_PCR


**Table 4.**

*Cont.*


**Table 4.** *Cont.*

*Int.*

**Figure 5.** Validation of anthocyanin biosynthetic genes (ABGs) detected in transcriptome data by qRT-PCR analyses of reddish purple (RPCC) and green (GCC) leaf tissue samples. (**a**) Phenylpropanoid pathway genes; (**b**) early biosynthesis pathway genes; (**c**) and (**e**) late biosynthesis pathway genes; (**d**) regulatory MYB genes; and (**f**) transporter genes. Gene expression levels were normalized against that of *Actin*. Error bars are based on mean of three technical replicates. Schematic representation of anthocyanin biosynthetic pathway is shown in left corner. Heatmaps in middle and right corner indicate transcript abundance of ABGs.

Among many anthocyanin transporter genes, four *B. rapa* transporter genes including *BrMATE-1, BrMATE-2*, and *BrTT19-1* showed differential expression between GCC and RPCC (heat map, Figure 5f). At the seedling and rosette stages, both *BrMATE-1* and *BrMATE-2* were expressed at higher levels in

RPCC than in GCC. *BrTT19-1* and *BrTT19-2* transcript levels were high at all stages in RPCC, but at negligible or undetectable levels in GCC at all stages (Figure 5f). The remaining ABGs, showed diverse expression patterns in the qRT-PCR analyses (Figure S3). Most of the analyzed genes had higher transcript levels in RPCC than in GCC (Figure S3).

We also conducted qRT-PCR analyses for some MYB TF genes showing differences in expression levels between RPCC and GCC in transcriptome data (Table S4). Among the five selected MYB genes, BrMYB15 and BrMYB51-2 showed >1-fold expression in RPCC than in GCC at all stages, and BrMYB77 transcript levels increased from the rosette to the heading stage in RPCC but not in GCC (Figure 6).

**Figure 6.** qRT-PCR validation of MYBs with high transcript abundance in transcriptome data. qRT-PCR expression values were normalized against that of *Actin*. Error bars are based on mean of three technical replicates.

A correlation analysis revealed a strong correlation between the transcriptome data and qRT-PCR data (R = 0.81) (Figure 7). Overall, similar trends in gene expression were detected from transcriptome data and qRT-PCR analyses. The expression patterns of PBGs, EBGs, LBGs, TGs, and RMs implied that LBGs, TGs and RMs play crucial roles in anthocyanin biosynthesis during different developmental stages of RPCC.

**Figure 7.** Correlation analysis between RNA-seq and qRT-PCR methods. Log2fold values of RNA-seq data (x-axis) are plotted against log2fold values of qRT-PCR (*y*-axis) data.

#### *2.8. Promoter Analysis of Anthocyanin Biosynthetic Genes*

Cis-regulatory elements (CREs) are binding sites for TFs in the promoters of target genes. To identify CREs, we analyzed the promoter regions of 22 important ABGs. The 2-kb region upstream of the transcription start site (TSS) was extracted and analyzed by the New PLACE program to find CRE motifs (Figure 8a). Among the predicted CREs, most were binding elements for MYB, bHLH, WRKY, bZIP and Ap2/ERF TFs (Figure 8b and Table S10). Those binding to MYB TFs were the most abundant, followed by those binding to bHLH, WRKY, bZIP and AP2/ERF TFs (Figure 8b). MYB CREs have been found in the promoters of genes related to secondary metabolism, flavonoid biosynthesis, anthocyanin biosynthesis, and plant defense (Table S10). The bHLH and bZIP CREs are known to be involved in the light response, tissue specific activation of phenylpropanoid biosynthetic genes, sugar repression, seed development, and the biosynthesis of phenylpropanoids, lignin, and flavonoids. AP2/ERF CREs are involved in functions related to the ethylene response, the jasmonate response, and secondary metabolism (Table S10). Therefore, the results of our study and other studies [34,35] indicate that MYB, and bHLH CREs regulate the expression of genes at all stages in the anthocyanin biosynthesis pathway.

**Figure 8.** Cis-regulatory elements predicted in upstream promoter regions of anthocyanin biosynthetic genes (ABGs). (**a**) Example of plant gene organization and important cis-elements in promoter. (**b**) Number of each type of cis-element identified in ABGs. P, promoter; E, exon; I, intron.

#### *2.9. Regulatory Network Analysis of Anthocyanin Biosynthesis Genes*

To identify the interactions among anthocyanin biosynthetic genes and the transcription factor genes including MYB, bHLH, WRKY, bZIP, and AP2/ERF (with log2fold change > 2) (Tables 3 and 4), a putative interactive network was constructed (Figure 9). Among them, 37 *B. rapa* genes (yellow circles) showed 147 interactions, which could be classified into two types: activation (↓/↑) and repression (⊥) (Figure 9). The gene network results showed that MYB75 interacts with a gene encoding an acyl transferase family protein, as well as MYB90, 5MAT, TT5, AGT, TT4, UGT78D2, TT19, DFR, and UF3GT, and activates them to promote anthocyanin biosynthesis. Two LBGs (DFR and LDOX) are positively regulated by TFs such as PIF3, MYB32, HY5, and TT2 (Figure 9) and DFR is also positively regulated by the TT8 TF. Besides gene–gene interactions, this network analysis revealed many other interactions

among TFs and structural genes involved in various functions. Among the repressors, the MYBL2 TF interacts with MYB75, DFR, TT2, TT8, GL2, and EGL3; MYB75 represses SCPL10; both EGL3 and GL3 repress LDOX; and bHLH32 represses DFR (Figure 9). These results indicate that interactions between TFs and their target genes play a vital role in the regulation of anthocyanin biosynthesis and other metabolic functions related to the growth and development of RPCC.

**Figure 9.** Gene regulatory network of anthocyanin biosynthetic genes and important transcription factors. DEGs detected from our transcriptome data are shown in yellow: other interactive genes involved in various functions including anthocyanin biosynthesis are shown in blue. Up tack (⊥) indicates repressors and arrow (↓/↑) indicates activators.

#### **3. Discussion**

The red color in Chinese cabbage has been introduced through different techniques of introgression breeding. Among the introduced varieties, Xie et al. [3] introgressed the red color phenotype by crossing a Chinese cabbage variety with red color *Brassica juncea* through the embryo rescue technique. While in our study, RPCC and red Chinese cabbage (RCC) reported by Lee et al. [4], have been developed through interspecific-crossing between Chinese cabbage and red cabbage.

#### *3.1. Novel Anthocyanin Pigments Responsible for the Color of RPCC*

Red Chinese cabbage is an economically important variety that is rich in various secondary metabolites including anthocyanins [4]. This variety accumulates red pigments at an early stage of plant growth (seedling stage), so it is an ideal system to study the genes and regulatory TFs that are involved in regulating color (anthocyanin) accumulation at the early stages of plant development (Figure 10). Fruits and vegetables contain five main types of anthocyanins, with different frequencies of occurrence: cyanidin (50%), delphinidin (12%), pelargonidin (12%), peonidin (12%), malvidin (7%), and petunidin (7%) [36]. We detected 11 anthocyanin variants in the RPCC samples, approximately 85% of which were cyanidin isoforms. Thus, our results and those of other studies show that cyanidin derivatives are the most abundant type of anthocyanins in red Chinese cabbage [4]. In addition,

the anthocyanin pigments accumulated in RPCC and in previous studies [3,4] are entirely different, indicating that the color accumulation and regulation are due to different sets of anthocyanin pigments. In accordance with the results of the HPLC analysis, we propose a model of the anthocyanin biosynthesis pathway that generates cyanidin 3-(feruloyl) diglucoside-5-(malonoyl) glucoside, and pelargonidin 3-(caffeoyl) diglucoside-5-(malonoyl) glucoside in RPCC (Figure 11). Most of the anthocyanin components identified in this study have been detected in radish or other *Brassica* crops but not in red Chinese cabbage [20–23,37–43]. Hence, 11 of the 13 pigments identified in our study were detected for the first time in RPCC (Table S1). Joo et al [28] have proved that the anthocyanin rich extract has the ability to lower the risk of vascular inflammatory diseases.

**Figure 10.** Chinese cabbage at young and mature stages. (**a**). Green (GCC) and (**b**). reddish purple (RPCC).

**Figure 11.** Schematic representation of anthocyanin biosynthetic pathway based on anthocyanin pigments and important anthocyanin biosynthetic genes (ABGs) identified from transcriptome data.

#### *3.2. Anthocyanin Biosynthesis Genes Are Di*ff*erentially Regulated in RPCC*

Transcriptome sequencing is an advanced NGS technique that can be used to predict novel genes, gene function, and genome evolution. Comparative transcriptome sequencing between two different phenotypes, GCC and RPCC, revealed differences in the expression levels of genes involved in anthocyanin biosynthesis and the regulation of this pathway [26,44]. The transcriptome sequencing of RPCC and GCC at the seedling stage revealed 3345 DEGs, which included unique genes with unknown functions and TF genes. About 255 DEGs were involved in various functions related to phenylpropanoids, lignins, flavonols, and anthocyanins. Further qRT-PCR analyses showed that PBGs and EBGs were expressed at levels 0.5- to 1.0-fold higher at the seedling stage than at the other two stages (rosette and heading) in RPCC. Many of these genes encoded proteins involved in the early phases of anthocyanin biosynthesis [7,45,46]. Our results indicate that *BrPAL, BrPAL2, BrPAL4, BrC4H, Br4CL2, BrCHS, BrCHI, BrCHI1, BrF3H,* and *BrF3*0*H-1* may be involved in the early phase of anthocyanin biosynthesis (i.e., the conversion of phenylalanine to dihydroquercetin) in RPCC. As shown in Figure 5e, the important LBGs *BrDFR* and *BrLDOX*, whose encoded products catalyze the conversion of dihydroquercetin to cyanidin at the late stage of anthocyanin biosynthesis, were expressed in RPCC at all stages, but not in GCC at any stage. Previous studies have shown that DFR plays crucial roles in anthocyanin accumulation in many plant species under different abiotic stress conditions [47,48]. Similarly, in *Arabidopsis,* sucrose and jasmonic acid have been shown to induce LBGs such as *DFR, LDOX*, and *UF3GT*, leading to anthocyanin accumulation [45,49]. Interestingly, we detected high transcript levels of the LBGs *BrDFR, BrLDOX*, and *BrUF3GT* in RPCC but not in GCC, indicating that anthocyanin biosynthesis occurs in RPCC under normal growth conditions without induction by external factors such as sugars or hormones.

Some downstream LBGs are involved in p-coumaroylation (*At3AT1*: At1g03940), glucosylation (*UGT75C1*:At4g14090) and malonylation (*At5MAT*: At3g29590) [33,50–52]; the orthologs of these genes (*BrAT-1, BrUGT75C1*-*1*, and *Br5MAT*) were expressed only in RPCC. Accordingly, the HPLC analyses detected p-coumaroyl (*At3AT1*: At1g03940) diglucoside (*UGT75C1*: At4g14090), indicating that anthocyanins have been modified as reported previously [33]. Our qRT-PCR analyses showed that the transcript levels of *BrUGT73B2* and *BrAT-2*, whose encoded proteins catalyze p-coumaroylation and glucosylation, respectively, were much higher in RPCC than in GCC at the rosette and heading stages. TT19 encodes a transporter involved in the movement and accumulation of anthocyanins in the Brassicaceae [53,54]. In this study, two TT19 paralogs, *BrTT19-1* and *BrTT19-2,* which have a common ortholog in *A. thaliana* (*AtTT19*: AT5G17220), showed very high transcript levels in RPCC, indicating that transport of anthocyanins from the cytosol to the vacuole is an important process in RPCC.

#### *3.3. Di*ff*erential Expression of MYBs Regulates Reddish Purple Color Accumulation*

In general, ABGs are controlled by various TFs, including the MYB, bHLH, and WD40 TFs that make up the most well-known complex in plants, the MBW complex [55,56]. A study using the transcriptome approach in red Chinese cabbage identified that the anthocyanin pathway regulating genes are TT8 (*Bra037887*) and PAP1 (*c3563g1i2*) [3]. Most TFs that are known to regulate anthocyanins are MYB TFs. A study on grapes reported that the MYBA transcription factor regulates the anthocyanin biosynthesis pathway through controlling the expression of *UFGT* [57]. Because TFs, especially MYB, are known to be important for controlling expression of ABGs, we searched our transcription data for TF sequences. As a result, we identified 25 TF genes with log2fold change > 3: 12 TF genes with log2fold change > 4, five TF genes with log2fold change > 5 and > 6, and one TF gene with log2fold change > 10 (Table 3). They included two important MYB genes: *MYB90* (log2fold change, 11.3) and *MYB75* (log2fold change, 6.8), which are homologs of *PRODUCTION OF ANTHOCYANIN PIGMENT 2 (PAP2)* and *PRODUCTION OF ANTHOCYANIN PIGMENT 1 (PAP1)*. These genes showed similar transcript levels in qRT-PCR and transcriptome analyses, and their high expression levels in RPCC suggested that they might be involved in the positive regulation of LBGs during anthocyanin biosynthesis [13,58]. We identified duplicate copies of *MYBL2, Bra016164 (BrMYBL2-1)* and *Bra007957 (BrMYBL2-2),* in *B.*

*rapa*, which are orthologs of *A. thaliana AtMYBL2* (*AT1G71030*). The transcript levels of both *BrMYBL2-1* and *BrMYBL2-2* were very high, as revealed by transcriptome analysis (log2fold change of 5.5 and 4.06, respectively) and qRT-PCR analysis. A previous study detected a similar expression pattern of *MYBL2* in *B. rapa,* indicative of its role as a positive regulator [59]. However, other studies on the Brassicaceae have demonstrated that MYBL2 can function as a negative regulator of anthocyanin biosynthesis [60,61]. Functional characterization of *BrMYBL2-1* and *Bra007957* (*BrMYBL2-2*) will clarify the molecular mechanisms of these genes in RPCC. Our results also showed that MYB binding motifs are highly conserved not only in LBGs but in most of the analyzed ABGs.

#### **4. Materials and Methods**

#### *4.1. Plant Material and Sample Collection*

The RPCC was developed (through introgression hybridization) and registered by the Kwonnong Seed Company (Cheongju, S. Korea) as described by Lee et al. [4]. The lines used in this study, green Chinese cabbage (GCC) and RPCC, belong to *B. rapa* L. ssp. *pekinensis* and were selected on the basis of their distinct color phenotypes (Figure 1). Seeds of GCC and RPCC were germinated in a growth chamber under a 16-h light/8-h dark photoperiod at 24 ◦C. Leaf samples (innermost and outermost leaves) were collected from three biological replicates at three growth stages: the seedling, rosette, and heading stages (at 3-, 6-, and 9-weeks-old, respectively). The leaf samples were stored at −70 ◦C until further analysis.

#### *4.2. Anthocyanin Extraction and HPLC Analysis*

The total anthocyanin content of freeze-dried outer and inner leaf tissues of 9-week-old GCC and RPCC was determined by HPLC and LC-MS/MS, as described previously [37]. Each 100-mg lyophilized leaf sample was mixed with 2 mL water:formic acid (95:5 *v*/*v*) followed by 5 min vortexing and 20 min sonication. The sample was centrifuged at 9200× *g* for 15 min at 4 ◦C and the supernatant was filtered through a 0.45-µm PTFE hydrophilic syringe filter. From this filtrate, 10 µL was used for estimating anthocyanin content. The sample was injected into an Agilent 1200 series HPLC connected to a 4000 Q-Trap LC-ECI-MS/MS system. A Synergy 4µL Polar-RP 80A column (250 × 4.6 mm i.d., particle size 4 µm; Phenomenex, Torrance, CA, USA) with a Security Guard Cartridge (AQ C18, 4 × 3.0 mm KJO-4282; Phenomenex, Torrance, CA, USA) were used. Anthocyanin pigments were detected at 520 nm. The oven temperature was set to 40 ◦C. The composition of the mobile phase was as follows: solvent A: water:formic acid (95:5 *v*/*v*), and solvent B (acetonitrile:formic acid, 95:5 *v*/*v*). The gradient conditions were as follows: 0–8 min, 8–13 min, 13% solvent B; 13–20 min, 20–23 min, 17% solvent B; 23–30 min, 30–40 min, 20% solvent B; 40–40.1 min, 5% solvent B; and 40.1–50 min, 5% solvent B. The anthocyanin concentration in each sample was measured by comparison of the area of each peak with that of the external standard (cyanidin-3-*O*-glucoside) on the HPLC chromatogram. Mean ± SD values were calculated from the three replicates of each sample.

#### *4.3. Sequence Pre-Processing and Assembly*

For transcriptome analysis, total RNA was extracted from leaf tissues of 3-week-old plants using an RNeasy Mini kit (Qiagen, Valencia, CA, USA) and sequenced with the Illumina Hi-seq2000 platform by SEEDERS Inc. (Korea). The sequence data was submitted to the NCBI database and are available under the accession number PRJNA612946. The raw reads were trimmed using the Dynamic-Trim and Length-Sort programs of the Solexa QA [62] package. Based on the Dynamic-Trim (phred score ≥ 20) and Length-Sort (short read length ≥ 25bp) parameters, clean reads were obtained. The clean reads were assembled according to the protocols of Velvet (version 1.2.08) and Oases (version 0.2.08) software [63]. The optimal k-mer was selected based on the max length, average length, and N50 according to the total length of the assembled sequence.

#### *4.4. Mapping and Annotation of Transcripts*

To identify gene function, the transcripts were used as queries in BlastX searches against the amino acid sequences in the BRAD [64] and KEGG databases with the following parameters: filter criterion: e-value ≤ 1e−10, best hits. Mapping was performed using Bowtie2 (v2.1.0) software with the following limitation: mismatch ≤ 2 bp, computed by the penalty method) [65]. Transcript levels were normalized using the R package of DESeq [66], and this software was also used to calculate the gene expression values for each sample with data deviation.

#### *4.5. Identification of Transcription Factors*

To identify TFs, sequences of all *B. rapa* 4127 TFs were downloaded from the Plant Transcription Factor Database [67] (http://planttfdb.cbi.pku.edu.cn/). The total assembled transcripts were compared and analyzed using BlastX software with the parameters e-value ≤ 1e−50 and identity ≥ 50, and annotated.

#### *4.6. Prediction of Di*ff*erentially Expressed Genes*

Differentially expressed genes (DEGs) were defined as those with at least a log2fold difference in transcript levels between the RPCC and GCC samples. Up-regulated genes were those with log2 fold-change greater than 1, and down-regulated genes were those with log2 fold-change less than −1 [22].

#### *4.7. Expression Analyses*

Leaf samples collected from 3-, 6-, and 9-week-old plants were collected and immediately frozen in liquid nitrogen. Total RNA was extracted using an RNeasy Mini kit (Qiagen). cDNA was synthesized using a Reverse Transcription System kit (Promega, Madison, WI, USA). The resulting cDNA was used as a template for qRT-PCR analyses, which were conducted with the CFX96 Real-Time system (Bio-Rad, Hercules, CA, USA). qRT-PCR analyses were conducted for 60 anthocyanin biosynthetic genes (ABGs) with three biological replicates and three technical replicates using the following conditions: 95 ◦C for 3 min; 39 cycles of 95 ◦C for 15 s and 58 ◦C for 20 s. The relative expression levels were determined by normalizing the data with the comparative Ct method 2−[∆∆*C*t] [68] using the *Actin* gene as a reference.

#### *4.8. Gene Ontology Annotation and KEGG Enrichment Analyses*

Gene ontology (GO) [69] alignments were performed using total transcripts and 'GO DB' with the threshold of 'counts ≥ 1' and 'GO depth' set to 3. Genes were classified into three functional categories, BP (Biological Process), CC (Cellular Component), and MF (Molecular Function). Then, gene annotation (filter criterion: e-value ≤ 1e−10, best hits) was performed through comparisons with amino acid sequences in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database using BLASTX.

To predict the functional characteristics of up- and down-regulated genes, we first computed *p*-values based on Fisher's exact test; these values were taken as indicators of significance of the gene in the respective function. Further KEGG enrichment analysis was performed using the KOBAS online tool (http://kobas.cbi.pku.edu.cn/kobas3) [70]. These analyses provided pathway annotations for the transcripts.

#### *4.9. Identification of Cis-Regulatory Elements*

The 2-kb region upstream from the transcription start site of selected genes was screened to identify cis-regulatory motifs using the New PLACE web tool [71].

#### *4.10. Network Analysis*

We carried out network analysis to construct an active gene-to-gene regulatory network of genes positively correlated with the anthocyanin biosynthetic pathway. The STRING 10.0 database (http://string-db.org/) was used to obtain an interaction network of these genes in *B. rapa* [72], according to orthologous genes in *A. thaliana*. Every link has a score from 0 to 1, where 1 is considered as the highest confidence link for reconstruction [73].

#### **5. Conclusions**

In conclusion, 11 of the 13 pigments detected in RPCC are reported for the first time for this variety. Analyses of the transcriptome data from two varieties at the seedling stage revealed many unique transcripts including DEGs and TF genes that are involved in a multitude of functions in growth and development. Our results show that many DEGs between the red and green varieties are involved in the biosynthesis of secondary metabolites such as phenylpropanoids, lignins, flavonoids, and anthocyanins, and in the regulation of these biosynthetic pathways. Further qRT-PCR expression analyses confirmed that ABGs and many TFs play essential roles in anthocyanin biosynthesis. The gene-to-gene interaction network illustrates the possible regulatory mechanism of MYBs with ABGs during anthocyanin biosynthesis in RPCC. Overall, our study describes the pigments in RPCC, identifies the important anthocyanin biosynthetic genes and TF genes that control the anthocyanin biosynthesis pathway, and proposes a model for the possible interaction mechanism between ABGs and TFs.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1422-0067/21/8/2901/ s1. References [74–85] are cited in Supplementary Materials. Figure S1. Statistics of non-redundant transcripts predicted in transcriptome data. (a) Length distribution of transcripts. (b) Transcripts annotated in public databases. Figure S2 MA plot of DEGs between green (G) vs. reddish purple (RP) Chinese cabbage identified from transcriptome data. X-axis shows log2fold change and y-axis shows differentially expressed genes. Red and green dots indicate up-regulated and down-regulated genes, respectively. Figure S3. Additional ABGs showing differential expression in transcriptome data validated by qRT-PCR. Expression values were normalized against that of *Actin*. Error bars are based on mean of three technical replicates. Table S1. Total anthocyanin pigments identified in reddish purple Chinese cabbage leaf samples. Table S2. Total up-regulated DEG transcripts between RPCC and GCC with annotations from BRAD and TAIR databases. Table S3. Total down-regulated DEG transcripts between RPCC and GCC with annotations from BRAD and TAIR databases. Table S4. Transcription factor genes up-regulated between RPCC and GCC samples, as identified from transcriptome data. Table S5. Transcription factor genes down-regulated between RPCC and GCC, as identified from transcriptome data. Table S6. Gene ontology (GO) annotation of DEG transcripts predicted from transcriptome data. GO annotations are shown for up-regulated genes (left) and down-regulated genes (right). Table S7. KEGG pathway annotation of DEG transcripts predicted in transcriptome data. KEGG annotations are shown for up-regulated genes (left) and down-regulated genes (right). Table S8. Genes identified from transcriptome data that are involved in phenylpropanoid and anthocyanin pathways with expression values and GO and KEGG annotations. Table S9. List of primer sequences used for qRT-PCR of anthocyanin biosynthetic genes. Table S10. *Cis*-regulatory elements (CREs) identified in upstream promoter regions of anthocyanin biosynthetic genes.

**Author Contributions:** Conceptualization, J.J.R., S.R.C., S.S.C., S.Y.Y., M.-H.O., S.U.P., S.-J.K., J.L., O.H.K. and Y.P.L.; Data curation, J.J.R., S.H.O.; Formal analysis, J.J.R., S.S.C., S.Y.Y., S.H.O., H.K., C.Y.L., S.U.P., K.S.-J.; Funding acquisition, S.R.C. and YPL; Investigation, J.J.R., S.Y.Y., H.K., C.Y.L., J.L. and O.H.K.; Methodology, J.J.R., S.S.C., S.H.O., H.K., C.Y.L., J.L. and O.H.K.; Project administration, S.R.C. and Y.P.L.; Resources, S.R.C., J.L., O.H.K. and Y.P.L.; Software, J.J.R. and M.-S.K.; Supervision, S.R.C. and Y.P.L.; Validation, J.J.R. and S.S.; Writing–original draft, J.J.R.; Writing–review & editing, J.J.R., S.R.C., S.S.C., S.Y.Y., M.-S.K.,S.S., M.-H.O., S.U.P., S.-J. K., J.L., O.H.K. and Y.P.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, and Forestry (IPET) through Golden Seed Project (213006-05-4-SBD30, 213006-05-4-SB110), funded by Ministry of Agriculture, Food and Rural Affairs (MAFRA), Ministry of Oceans and Fisheries (MOF), Rural Development Administration (RDA) and Korea Forest Services (KFS).

**Acknowledgments:** We thank Jennifer Smith, from Liwen Bianji, Edanz Group China (www.liwen bianji.cn/ac), for editing the English text of a draft of this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**




#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
