**1. Introduction**

In Orchidaceae, *Dendrobium* species are one of the most popular orchids known for their medicinal and commercial value in potted and cut flower industries [1]. The *Dendrobium* genus contains approximately 1800 species and are mainly distributed throughout Asia and the South Pacific [2]. *Dendrobium catenatum* (also named *Dendrobium o*ffi*cinale*), *Dendrobium nobile*, and *Dendrobium candidum* are used in herbal medicines in many Asian countries [3]. Moreover, the *Dendrobium* genus is known for its valuable floral traits including colors, morphologies, and scent and *Dendrobium* species are regarded as some of the important commercial cut flowers. A variety of *Dendrobium* hybrids have been created that have improved flower colors. However, limitation of genetic resources in *Dendrobium* limits the extent to which flower color can be modified.

*D. bigibbum* is an epiphytic or lithophytic orchid that contains cylindrical pseudobulbs, each having between three and five green or purplish leaves and arching flowering stems with up to 20 usually lilac-purple flowers. The *D. bigibbum* plants containing purple spots on their leaves are very popular in the commercial market. Although colorful leaves and flowers add significant ornamental values to orchids, our understanding of the differential pigmentation in *D. bigibbum* remains limited. Natural agents extracted from various parts of *Dendrobium* contain bioactive substances, such as phenolic compounds, anthocyanins, and polysaccharides [4–6]. Many of these phenolic compounds and anthocyanins have well-known antioxidant activities [7] and contribute to leaf and flower coloration [8]. Anthocyanins are water-soluble, which are present in the vacuoles of plant epidermal cells and impart an orange, red, or blue color to flowers, fruits, stems, leaves, and roots [9]. Anthocyanin biosynthesis is a well-studied secondary metabolic pathway in plants that involves the conversion of phenylalanine into 4-coumaryl-CoA, followed by their conversions to flavonoid compounds. Studies in antirrhinum [10], petunia [11,12], maize [13,14], *Brassica* [15,16], and *Arabidopsis* [17,18] have identified genes that regulate anthocyanin production, and these can be broadly classified into two major groups. The first group consists of enzymes that participate in anthocyanin biosynthesis, including phenylalanine ammonia-lyase (PAL), chalcone synthetase (CHS), chalcone isomerase (CHI), flavanone 3-β-hydroxylase (F3H), dihydroflavonol 4-reductase (DFR), anthocyanin synthase (ANS), and UDP-glucose flavonoid 3-O-glucosyl transferase [19]. Loss-of-function mutations in *CHS*, *CHI*, *F3H*, *DFR*, or *ANS* abolish anthocyanin biosynthesis, and plants harboring these mutations often produce colorless tissues [20–24]. Anthocyanin accumulations in green and red leaves of *Dendrobium o*ffi*cinale* stems have been associated with *ANS* and UDP-glucose flavonoid 3-*O*-glucosyl transferase expression [25]. Moreover, Yu et al. suggested that among anthocyanins, delphinidin 3,5-*O*-diglucoside and cyanidin 3-*O*-glucoside may be responsible for the red peel color of *D. o*ffi*cinale* [25]. The second group contains MYBs, basic helix-loop-helixes (bHLHs), or WD40 repeat transcription factors (TFs) that regulate the expression levels of genes involved in anthocyanin biosynthesis [26,27]. Earlier studies on *Arabidopsis* and *Medicago truncatula* indicated that MYB2 acts as a transcriptional repressor of anthocyanin biosynthesis and that the overexpression of MYB2 abolishes anthocyanin biosynthesis [28, 29]. However, the overexpression of orchid *MYB2* in petunia results in increased petal pigmentation [11]. Likewise, the transient overexpression of *Phalaenopsis equestris MYB2* positively regulates anthocyanin pigmentation and is associated with the increased expression of downstream genes *PeF3H5*, *PeDFR1*, and *PeANS3* [30]. Conversely, silencing of *PeMYB2* results in reduced anthocyanin accumulation [31]. Thus, depending on the plant system, MYB2 appears to serve as either a negative or positive regulator of anthocyanin biosynthesis.

In this study, we characterized an orchid mutant that was isolated on the basis of its unusual leaf color. The *D. bigibbum* mutant accumulated higher levels of anthocyanin, which in turn was associated with the increased expression of genes regulating anthocyanin biosynthesis. This also included the *MYB2* gene, which, when transiently expressed in a heterologous system, led to induction of genes associated with anthocyanin biosynthesis.

#### **2. Results**

#### *2.1. The Purple Mutant of D. Bigibbum Accumulates Higher Levels of Anthocyanin*

We used γ-irradiated *D. bigibbum* rhizomes to produce a mutant that developed purple leaves in comparison to the green leaves seen on wild type (WT) plants (Figure 1A,B) (Figure S1). This mutant, designated as RB016-S7, was propagated through four generations of tissue culturing. To determine whether the purple coloration of the mutant's leaves was associated with anthocyanin pigmentation, we used a pH-differential-based method to quantify the anthocyanin content. The anthocyanin content in the purple leaves (11.68 mg/g dry weight) was ~7.0-fold higher than in the green leaves (1.66 mg/g dry weight) (Figure 1C). Thus, the purple coloration of RB016-S7 leaves was likely associated with the increased biosynthesis of anthocyanins.

To understand the biochemical basis of the increased anthocyanin production in the RB16-S7 mutant, we analyzed genome-wide changes in gene expression. Total RNAs from WT and RB016-S7 leaves were used to construct six cDNA libraries that were sequenced using the Illumina HiSeq 2500 platform. After filtering and quality trimming the raw reads, we obtained 47–66 million high quality reads. Using Trinity, the clean reads from the six libraries were assembled into 110,104 transcripts,

**BEFORE TRIMMING** 

**AFTER ASSEMBLY**  Number of transcripts in the combined

Number of unigenes in the combined

data 110,104

data 32,575

Total nucleotides of transcripts (bp) 122,947,955 Total nucleotides of unigenes (bp) 34,155,642

**AFTER TRIMMING** 

with an average length of 1116 bp, and these were then assembled into 32,575 unigenes, with an average length of 1048 bp (Table 1). The sequence length distribution of unigenes showed that 8373 unigenes (25.7%) ranged from 100 to 500 bp, 11,350 unigenes (34.8%) ranged from 501 to 1000 bp, and 6488 unigenes (19.91%) had lengths of more than 1500 bp (Figure 2). The 30,714 unigenes were matched with the non-redundant (nr) database, and among these, 26,851 unigenes matched sequences from *Dendrobium catenatum*, followed by *Phalaenopsis equestris* (2263) and *Apostasia shenzhenica* (209) (Figure 3A). Furthermore, this was consistent with the phylogenetic analysis carried out among native *Dendrobium spp*, *Cymbidium spp*, *P. equestris* and *A. shenzhenica* orchids, which, as expected, showed relatedness among *Dendrobium spp* (Figure 3B). *Int. J. Mol. Sci.* **2020**, *21*, x FOR PEER REVIEW 3 of 17

**Figure 1.** Images of *Dendrobium bigibbum* and the anthocyanin contents in the leaves. (**A**) Morphological phenotypes of typical wild type (WT) and RB016-S7 mutant *D. bigibbum* plants*.* (**B**) Relative anthocyanin contents in the WT and RB016-S7 mutant. Error bars represent standard deviations (*n* = 3). The experiment was repeated three times with similar results. (**C**) The number of up- and downregulated differentially expressed genes (DEGs) in the WT 3 versus RB016-S7 mutant **Figure 1.** Images of *Dendrobium bigibbum* and the anthocyanin contents in the leaves. (**A**) Morphological phenotypes of typical wild type (WT) and RB016-S7 mutant *D. bigibbum* plants. (**B**) Relative anthocyanin contents in the WT and RB016-S7 mutant. Error bars represent standard deviations (*n* = 3). The experiment was repeated three times with similar results. (**C**) The number of up- and downregulated differentially expressed genes (DEGs) in the WT 3 versus RB016-S7 mutant comparison.



which, as expected, showed relatedness among *Dendrobium spp* (Figure 3B). Q30, base call accuracy of 99.9%; N50, the sequence length of the shortest unigene at 50% of the total genome length.

**Table 1.** Summary of RNA sequencing and de novo transcriptome assembly results.

Number of clean reads 56,856,166 66,205,490 57,973,956 55,934,410 56,856,134 47,287,332 GC content (%) 46.42 44.22 45.73 47.06 46.79 46.55 Q30 percentage (%) 95.67 95.86 95.56 95.43 95.68 95.22

**Sequences Control 1 Control 2 Control 3 RB016-S7-1 RB016-S7-2 RB016-S7-3** 

Mean length of transcripts (bp) 1,116 Mean length of unigenes (bp) 1,048 N50 of unigenes (bp) 1350

Mean length of transcripts (bp) 1,116 Mean length of unigenes (bp) 1,048 N50 of unigenes (bp) 1350

Q30, base call accuracy of 99.9%; N50, the sequence length of the shortest unigene at 50% of the total

Q30, base call accuracy of 99.9%; N50, the sequence length of the shortest unigene at 50% of the total

*Int. J. Mol. Sci.* **2020**, *21*, x FOR PEER REVIEW 4 of 17

**Figure 2.** Sequence length distribution of the unigenes in *D. bigibbum* transcriptomes**.** The *x*-axis indicates unigene length intervals from 200 bp to >3000 bp. The *y*-axis indicates the number of unigenes of each given length. **Figure 2.** Sequence length distribution of the unigenes in *D. bigibbum* transcriptomes. The *x*-axis indicates unigene length intervals from 200 bp to >3000 bp. The *y*-axis indicates the number of unigenes of each given length. **Figure 2.** Sequence length distribution of the unigenes in *D. bigibbum* transcriptomes**.** The *x*-axis indicates unigene length intervals from 200 bp to >3000 bp. The *y*-axis indicates the number of unigenes of each given length.

**Figure 3.** Species distribution of the BLAST search results in the nr database. (**A**) A cut off E-value of 10-5 was used. Different species are indicated by different colors. (**B**) A reference phylogenetic tree derived from rDNA ITS 2 sequences of 14 species of *Dendrobium*, 3 species of *Cymbidium*, *Apostasia shenzhenica*, and *Phalaenopsis equestris*. (**C**) The number of up- and downregulated differentially expressed genes (DEGs) in the wild type versus RB016-S7 mutant comparison. **Figure 3.** Species distribution of the BLAST search results in the nr database. (**A**) A cut off E-value of 10-5 was used. Different species are indicated by different colors. (**B**) A reference phylogenetic tree derived from rDNA ITS 2 sequences of 14 species of *Dendrobium*, 3 species of *Cymbidium*, *Apostasia shenzhenica*, and *Phalaenopsis equestris*. (**C**) The number of up- and downregulated differentially expressed genes (DEGs) in the wild type versus RB016-S7 mutant comparison. **Figure 3.** Species distribution of the BLAST search results in the nr database. (**A**) A cut off E-value of 10−<sup>5</sup> was used. Different species are indicated by different colors. (**B**) A reference phylogenetic tree derived from rDNA ITS 2 sequences of 14 species of *Dendrobium*, 3 species of *Cymbidium*, *Apostasia shenzhenica*, and *Phalaenopsis equestris*. (**C**) The number of up- and downregulated differentially expressed genes (DEGs) in the wild type versus RB016-S7 mutant comparison.

#### *2.2. Functional Annotation and Classification*

*2.2. Functional Annotation and Classification*  In the Gene Ontology (GO) analysis, 17,498 unigenes (53.71%) were assigned to three GO terms and were categorized into 41 functional groups (FDR < 0.05) (Table S1). The GO assignments were *2.2. Functional Annotation and Classification*  In the Gene Ontology (GO) analysis, 17,498 unigenes (53.71%) were assigned to three GO terms and were categorized into 41 functional groups (FDR < 0.05) (Table S1). The GO assignments were In the Gene Ontology (GO) analysis, 17,498 unigenes (53.71%) were assigned to three GO terms and were categorized into 41 functional groups (FDR < 0.05) (Table S1). The GO assignments were divided into three categories: biological process (BP), cellular component (CC), and molecular function (MF). Among these, 10,569 unigenes (32.4%), 9195 unigenes (28.2%), and 12,401 unigenes (38%), were assigned to BP, CC, and MF, respectively. In the BP category, the predicted proteins were mainly distributed in metabolic process (30.61%) and cellular process (28.36%), followed by biological regulation (7.34%), localization (6.87%), and regulation of biological process (6.09%). Predicted proteins assigned to the CC category were mainly associated with cellular anatomical entity (55.61%), intracellular (30.19%), and

protein-containing complex (12.84%). Furthermore, in the MF category, the most heavily represented groups were linked to catalytic activity (47.13%), binding (40.78%), and transporter activity (5.04%) (Figure S2).

To predict and classify the gene functions, we queried all the unigenes against the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) (v4.5) database. This database contains the functional descriptions and classifications of the orthologous proteins, including Clusters of Orthologous Groups and euKaryotic Orthologous Groups. This analysis allowed us to allocate 27,963 unigenes to 25 eggNOG classifications. Among them, the eggNOG category of functional unknown (S, 27.88%) represented the largest group, followed by signal transduction mechanisms (T, 8.58%), posttranslational modification, protein turnover, chaperones (O, 8.13%), transcription (K, 8.09%), and carbohydrate transport and metabolism (G, 5.70%) (Figure S3).

Next, we mapped the assembled unigenes to the reference anthocyanin pathways, including metabolism, genetic information processing, environmental information processing, and cellular processes, in the KEGG (http://www.kegg.jp/kegg/pathway.html). The 6314 unigenes were assigned to 394 KEGG sub-pathways (Table S2). These pathways included KEGG orthology (KO) entries for metabolism (3503 KOs), genetic information processing (950 KOs), environmental information processing (488 KOs), cellular processes (702 KOs), and organismal systems (671 KOs) (Figure S4).
