**2. Results and Discussion**

#### *2.1. Functional Annotation and Classification*

Transcriptome sequencing was performed on an Illumina HiSeq platform (Beijing Novogene Biological Information Technology Co., Ltd., Beijing, China). Clean reads were assembled de novo with Trinity using the default parameters, yielding 140,042 transcripts, which were then clustered into 98,389 unigenes with TGICL version 2.1. Of these, 18,521 unigenes (18.82% of the total) were longer than 1 kb, indicating high assembly integrity (Table 1). To learn more about the features and functions of these unigenes, we aligned them against seven public databases: Nr, Nt, Pfam, KOG/COG, Swiss-Prot, KEGG, and GO (Table 2). Some unigenes could not be annotated in every one of the seven databases, which indicates that the assembly contains unknown unigene sequences. This is consistent with reports for *Glycyrrhiza uralensis* [19], *Luculia gratissima* [20], and *Prunella vulgaris* [21], in which unigene annotation likewise did not reach 100%. Two explanations are possible: some unigenes may be too short to carry complete annotation information, or some unigenes may be specific to this species and therefore not yet recognized or included in the databases. The specific reasons need to be studied further via sequence analysis and gene expression verification.
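The proportion of long unigenes reported above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the pipeline used in this study; the function names `percent_of_total` and `fraction_longer_than` are hypothetical helpers introduced only for illustration, and the sample length list is made up.

```python
def percent_of_total(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


def fraction_longer_than(lengths, threshold=1000):
    """Return (count, percent) of sequences strictly longer than threshold bp."""
    count = sum(1 for n in lengths if n > threshold)
    percent = 100.0 * count / len(lengths) if lengths else 0.0
    return count, percent


# Figures reported in the text: 18,521 of 98,389 unigenes exceed 1 kb.
print(percent_of_total(18521, 98389))  # 18.82

# Illustrative (made-up) unigene lengths in bp, not data from this study.
print(fraction_longer_than([500, 1000, 1500, 3000]))  # (2, 50.0)
```

Applied to the full list of unigene lengths from an assembly, the second helper would give the same kind of summary as the long-unigene row of a length distribution table.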


**Table 1.** Length distribution after assembly in the flower transcriptome of *L. yunnanensis*.

**Table 2.** The statistics of the unigenes' annotation in the flower transcriptome of *L. yunnanensis*.


According to the species distribution of the BLAST hits against Nr (Figure 1A), *Coffea canephora* (Rubiaceae, *Coffea*) had the highest matching rate, followed by *Vitis vinifera* (Vitaceae, *Vitis*) and then *Nicotiana tomentosiformis* (Solanaceae, *Nicotiana*). Figure 1B shows that 38.5% of the unigenes could be fully matched to Nr (the smaller the E-value, the higher the degree of matching and the similarity), so the overall degree of matching to Nr is relatively high. Figure 1C shows that 11.0% of the unigenes had more than 95% similarity and 54% had 80–95% similarity (the higher the similarity, the higher the confidence).
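A similarity distribution such as the one in Figure 1C can be derived by binning the percent identity of each unigene's best Nr hit. The sketch below assumes the input is a plain list of percent-identity values (for example, taken from the `pident` column of BLAST tabular output); the function name `similarity_shares`, the "<80%" class, and the sample values are assumptions for illustration and do not come from this study.

```python
def similarity_shares(percent_identities):
    """Return the percentage of hits falling in each similarity class.

    Classes follow the two ranges named in the text (>95% and 80-95%);
    the remaining "<80%" class is an assumption added for completeness.
    """
    counts = {">95%": 0, "80-95%": 0, "<80%": 0}
    for p in percent_identities:
        if p > 95:
            counts[">95%"] += 1
        elif p >= 80:
            counts["80-95%"] += 1
        else:
            counts["<80%"] += 1
    total = len(percent_identities)
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up percent identities for five best hits.
shares = similarity_shares([99.0, 96.5, 90.0, 85.0, 70.0])
print(shares)
```

The same binning approach applies to E-values (Figure 1B), with thresholds on the E-value instead of on percent identity.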

All unigenes were assessed for GO assignments based on the Nr annotations. The unigenes were categorized into biological process (BP), cellular component (CC), and molecular function (MF) to reveal the gene function classification (Figure 2). Within these functional groups, "metabolic process" was the dominant group among biological processes (282,588), which contained 144,606 unigenes, and second was "cellular process" (62,514). Among cellular components (83,109), "cell part" (24,619) contained the largest number of unigenes, followed by "organelle" (19,336). With a total of only 57,203 unigenes in the molecular function category, the two largest groups were "binding" and "catalytic activity", which included 41,251 and 11,311 unigenes, respectively. This result was similar to those for *Lycium barbarum* [22], *Rhododendron fortunei* [23], and *Elaeagnus mollis* Diels [24], in which metabolic process and cellular process contained the largest numbers of unigenes among all subcategories.

The genes successfully annotated by KOG were classified into 26 KOG groups, as shown in Figure 3. The dominant group was general function prediction (1936), followed by posttranslational modification, protein turnover, and chaperones (1459), while the smallest group was cell motility (only 4). The KOG classification results of the three species mentioned above are also consistent with the results of our study. In addition, the group of unknown function contained 443 unigenes, accounting for 3.6% of the total annotation information in the *L. yunnanensis* flower transcriptome. Studies on *Castanea henryi* (Skan) Rehd. et Wils. [25], *Pinus yunnanensis* Franch. [26], *Rhododendron longipedicellatum* Lei Cai et Y. P. Ma [27], and *Phyllanthus emblica* [28] have produced similar results. We speculate that this may be caused by insufficient annotation information.

By comparison against the KEGG database, the unigenes were classified into 22 KEGG pathways according to the pathways in which they are involved. Of these pathways, the most represented was translation (957), followed by carbohydrate metabolism (918) and signal transduction (910) (Figure 4).

**Figure 1.** Characterization of assembled *L. yunnanensis* unigenes using the Nr database: (**A**) Species distribution for the assembled unigenes. (**B**) E-value distribution for the assembled unigenes. (**C**) Similarity distribution for the assembled unigenes.

**Figure 2.** GO classification of assembled unigenes of the *L. yunnanensis* flower transcriptome. The *x*-axis indicates the subgroups in GO annotation; the *y*-axis indicates the number of genes in each category.

**Figure 3.** KOG classification of assembled unigenes of the *L. yunnanensis* flower transcriptome. The *x*-axis indicates the 26 groups in KOG annotation; the *y*-axis indicates the percentage of the number of genes in each group relative to the total number of annotated genes.

**Figure 4.** KEGG metabolic pathways of assembled unigenes of the *L. yunnanensis* flower transcriptome: (**A**) cellular processes; (**B**) environmental information processing; (**C**) genetic information processing; (**D**) metabolism; (**E**) organismal systems. The *x*-axis indicates the number of genes in each metabolic pathway and the ratios of the number of genes to total number of annotated genes; the *y*-axis indicates the names of the KEGG metabolic pathways.

The unigenes were roughly divided into 3 functional categories and 56 subcategories according to GO function, among which metabolic process, cellular process, cell, cell part, organelle, binding, and catalytic activity were highly enriched. In other words, the expression levels of genes related to cellular activity, metabolic activity, and catalytic activity were high, indicating that *L. yunnanensis* has strong metabolic capacity. We hypothesize that this may be related to the continuous cell proliferation of the meristem during flower development and the vigorous metabolic activities in the flower organs of *L. yunnanensis*. Xia et al. [29] analyzed the flower transcriptome of *Camellia sinensis*, and their results and hypotheses were similar to ours. KEGG functional annotation analysis showed that the unigenes could be grouped into 5 categories, among which the pathways related to metabolism and genetic information processing accounted for the highest proportions, and the metabolism-related pathways contained the largest number of genes. This further supports the view that metabolic activity in *L. yunnanensis* was strong during this period. Additionally, the KEGG functional annotation included pathways for amino acid metabolism, carbohydrate metabolism, lipid metabolism, biosynthesis of other secondary metabolites, and environmental adaptation. These data provide a molecular basis for further research on resistance mechanisms and allow us to explore the genes related to the flowering regulation and environmental adaptability of *L. yunnanensis*.
