Article

Full-Length Transcriptome Sequencing and Identification of Genes Related to Terpenoid Biosynthesis in Cinnamomum migao H. W. Li

1 Pharmacy College, Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
2 Key Laboratory of State Forestry Administration on Biodiversity Conservation in Karst Mountain Area of Southwest of China, School of Life Science, Guizhou Normal University, Guiyang 550025, China
3 Yunfu Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Yunfu 527400, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Forests 2023, 14(10), 2075; https://doi.org/10.3390/f14102075
Submission received: 14 September 2023 / Revised: 11 October 2023 / Accepted: 11 October 2023 / Published: 17 October 2023
(This article belongs to the Special Issue Genetic Regulation of Growth and Development of Woody Plants)

Abstract

Cinnamomum migao H. W. Li is an evergreen woody plant found only in southwest China. The volatile oils from the fruits of C. migao have long been used by local ethnic minorities as a distinctive traditional medicine. Owing to its low seed germination rate, destructive logging, and limited artificial planting, C. migao is in danger of extinction. It is therefore urgent to protect and sustainably use this species with the help of molecular biology, especially by targeting the genes involved in the biosynthesis of the terpenoids in its volatile oil. However, no genomic data are available for this plant. In this study, full-length transcriptome sequencing was combined with next-generation sequencing (NGS) to identify the key genes involved in terpenoid biosynthesis in C. migao. More than 39.9 Gb of raw data was obtained, from which 515,929 circular consensus sequences (CCSs) were extracted. After clustering of the 472,858 full-length non-chimeric (FLNC) reads by similarity and correction with NGS data, 139,871 consensus isoforms were obtained. Removing redundant transcripts then yielded 73,575 non-redundant transcripts, and 70,427 isoforms were successfully annotated using public databases. Differentially expressed transcripts (DETs) across four developmental stages of the C. migao fruit were also analyzed, and 5764 transcripts showed stage-specific expression. Finally, 15, 6, and 1 transcripts were identified as being involved in the biosynthesis of sesquiterpenoids, diterpenoids, and monoterpenoids, respectively. This study provides a basis for future research on gene mining, genetic breeding, and metabolic engineering in C. migao.

1. Introduction

Cinnamomum migao H. W. Li, commonly known as “Da Guo Mu Jiang Zi”, is an important economical and medicinal plant in China. It was first recorded in the “Supplement of the Compendium of Materia Medica” by Xuemin Zhao in 1765 [1]. Specimens of C. migao were originally collected by Mr. Xitao Cai in 1958 and identified as a new species of Cinnamomum in Lauraceae by Prof. Xiwen Li in 1978 [2]. As a tropical and subtropical plant, C. migao generally grows at altitudes of 300–850 m and is only distributed in southwest China, including Guizhou, Yunnan, and Guangxi Provinces. Due to its rich root system, hard durable wood, good water retention ability, and strong photosynthesis, C. migao can not only improve and protect ecological environments, but also provide quality materials for building, furniture, and handicrafts. More importantly, the ripe fruits of C. migao are also used as a common medication among the Miao and Buyi ethnic groups in Guizhou Province [3]. The volatile oils of C. migao fruits have multiple therapeutic functions for various diseases and symptoms, such as stomachache, abdominal pain, pectoralgia, rheumatic arthritis pain, nausea, and chest tightness [4,5]. In our previous research, monoterpenoids and sesquiterpenoids, the components with the highest content of the volatile oil of C. migao, were shown to have the potential to play a major pharmaceutical role in clinical practices [6,7]. This is consistent with the results of other groups [8,9,10]. The Miao medicine “Li Qi Huo Xue Di Wan” was developed using the volatile oil as the main raw material to treat angina pectoris. Meanwhile, C. migao fruits can also be used as raw materials for the extraction of natural flavorings and precious spices, and are widely applied for the production of cigarettes, food additives, and cosmetics [1].
It should be noted that C. migao is becoming increasingly scarce because of environmental degradation, destructive lumbering, the lack of artificial planting, and this plant’s low seed germination rate. Encouragingly, some researchers and enterprises have started to study the introduction and cultivation of C. migao and have achieved some good results. For example, Jingzhong Chen et al. assessed the effects of light, stratification, alternating temperatures, and gibberellic acid on the seed dormancy release of C. migao, and their results showed that the most effective method was gibberellic acid pretreatment combined with the 15 °C stratification treatment [11]. However, it is still extremely urgent to protect wild C. migao.
Besides artificial cultivation, genetic resource conservation and exploitation is another approach to species protection. With the development of omics technology, high-throughput sequencing has become an important tool for molecular biology research. Presently, there are no genomic data for C. migao. Thus, transcriptome sequencing is an effective method for gathering genetic data and analyzing gene function. For example, Xiaolong Huang et al. first used NGS to sequence the seeds of C. migao and determined the gene expression patterns during seed germination, which provided a foundation for uncovering the molecular mechanisms of seed germination [12]. This is the sole report on transcriptome sequencing in C. migao.
To enrich the genetic data for C. migao, especially for the genes involved in the biosynthesis of the volatile oils in its fruits, this study used Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology combined with NGS to generate the full-length transcriptome of C. migao. The transcriptome differences between the developmental stages of C. migao fruits were also analyzed. The resulting transcriptome data will enable the mining of hub genes involved in the biosynthesis of the volatile oil.

2. Materials and Methods

2.1. Plant Materials

The C. migao plant materials were collected in the field in Luodian County, Guiyang City, Guizhou Province, China (106°64′ E, 25°29′ N). The plants grow on the slope of a hilly area with a karst landform and a subtropical humid monsoon climate. The average annual temperature is 19.6 °C and the average annual rainfall is 1100 mm. The fruits were collected from the same plant (24 years old) from May to November, rapidly frozen in liquid nitrogen, and stored in a −80 °C freezer. Fruits from four developmental stages were collected and used for RNA extraction. These samples were named CMFI (fruits at the post-flowering period), CMFII (fruits at the young fruit stage), CMFIII (fruits at the expanding period), and CMFIV (fruits at the maturation stage). The roots (CMR), stems (CMS), and leaves (CML) were also collected for full-length transcriptome analysis.

2.2. RNA Extraction, Library Construction, and Transcriptome Sequencing

In total, 21 RNA samples were extracted from seven C. migao sample types (CMFI, CMFII, CMFIII, CMFIV, CMR, CMS, and CML; each extracted in triplicate) using an RNA extraction kit (Biomarker, Beijing, China). Intact total RNA shows distinct bands corresponding to 28S, 18S, and 5S rRNA. The 28S and 18S peaks are relatively sharp, and the 28S band is 1–2 times as bright as the 18S band, while the 5S band is generally very weak or absent. These characteristics can be detected using agarose gel electrophoresis. Based on the height and sharpness of the 28S and 18S peaks, the Agilent 2100 produces an RIN (RNA integrity number) value that indicates the quality of an RNA sample. Therefore, RNA quality was also assessed using an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA).
For full-length transcriptome sequencing, the 21 high-quality RNA samples were pooled and used as the template for full-length cDNA synthesis with a SMARTer™ PCR cDNA Synthesis Kit, following the user’s manual. Briefly, 1 μg of total RNA was reverse-transcribed into first-strand cDNA using the 3′ SMART CDS Primer II A and the SMART II A Oligonucleotide. Then, 40 μL of TE buffer was added to dilute the first-strand reaction product. Subsequently, 1 μL of the diluted single-strand cDNA was amplified by long-distance (LD) PCR using 5′ PCR Primer II A. After amplification, end repair, adapter ligation, and exonuclease digestion, the cDNA library was constructed and sequenced on the PacBio platform. For the Illumina platform, RNA samples with an RIN value above 8.0 were used for library construction and sequencing. mRNA from the 12 RNA samples of C. migao fruits at four developmental stages was enriched using oligo(dT) magnetic beads and randomly fragmented with fragmentation buffer. cDNA was then synthesized using random hexamers and purified with AMPure XP beads. After end repair, A-tailing, and adapter ligation, 12 cDNA libraries were constructed by PCR amplification and sequenced on an Illumina HiSeq 2500 platform (Biomarker, Beijing, China).

2.3. Raw Data Analysis

For full-length transcriptome data, CCS reads were extracted from the raw data and polished. After the deletion of the 3′ primer, 5′ primer, and poly-A tail, the CCS reads were divided into full-length reads, non-full-length reads, chimeric reads, and non-chimeric reads. Subsequently, the FLNC reads were clustered using the IsoSeq module of SMRTLink software (version 11.1; Biomarker, Beijing, China), generating high-quality isoforms (accuracy > 0.99) and low-quality isoforms. Then, redundant isoforms were removed using CD-HIT software (Binary version; Biomarker, Beijing, China) to generate non-redundant transcripts. The integrity assessment of the non-redundant transcriptome was achieved using Benchmarking Universal Single-Copy Orthologs (BUSCO).
For the NGS transcriptome data, reads containing the adapter, reads comprising poly-N, and low-quality reads were removed using fastp (version 0.18.0; GitHub, San Francisco, CA, USA) to generate clean data [13]. Meanwhile, Q30, the GC content, and base average error rate were calculated. Clean reads from all samples were mapped to the full-length transcriptome using Bowtie2 (version 2.2.8; Johns Hopkins University, Baltimore, MD, USA) [14].
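The three quality metrics named above (Q30, GC content, and average base error rate) can be illustrated with a minimal Python sketch. It is not the fastp implementation; it assumes Phred+33 quality encoding, and the function names are hypothetical.

```python
def q30_fraction(quality_strings, phred_offset=33):
    """Return the fraction of bases whose Phred quality score is at least 30."""
    total = 0
    passed = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # Assumption: qualities are ASCII-encoded with a Phred+33 offset.
            if ord(ch) - phred_offset >= 30:
                passed += 1
    return passed / total if total else 0.0


def gc_content(sequences):
    """Return the fraction of G and C bases across all reads."""
    total = sum(len(seq) for seq in sequences)
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in sequences)
    return gc / total if total else 0.0


def mean_error_rate(quality_strings, phred_offset=33):
    """Average per-base error probability, using P = 10 ** (-Q / 10)."""
    probs = [10 ** (-(ord(ch) - phred_offset) / 10)
             for qual in quality_strings for ch in qual]
    return sum(probs) / len(probs) if probs else 0.0
```

Here Q30 is the share of bases with an estimated error probability of at most 0.001, which is why it is reported alongside the average error rate.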

2.4. SSR, CDS, TFs, and LncRNA Prediction

Transcripts longer than 500 bp were used for simple sequence repeat (SSR) analysis with MIcroSAtellite (MISA) software (version 2.1; Leibniz Institute, Gatersleben, Germany) [15]. The coding sequences (CDSs) were predicted using TransDecoder software (version 5.5.0; GitHub, San Francisco, CA, USA) based on the length of the open reading frame (ORF), the log-likelihood score, and BLAST alignment of the amino acid sequences with protein domain sequences. Transcription factors (TFs) were predicted using iTAK software (version 1.6; Feilab, Cornell University, Ithaca, NY, USA) [16]. Long non-coding RNAs (lncRNAs) were predicted using the Coding Potential Calculator (CPC) [17], Coding-Non-Coding Index (CNCI) [18], Coding Potential Assessment Tool (CPAT) [19], and Protein family (Pfam) database [20].
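The idea behind the SSR search can be sketched with regular expressions. This is a simplified illustration rather than the MISA algorithm: the minimum repeat counts per motif length are assumed values (the text does not state the thresholds used), compound SSRs are not handled, and the function names are hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are repeats of a shorter motif (e.g. "AA"),
            # so a homopolymer is reported once, as a mono-nucleotide SSR.
            if any(unit_len % k == 0 and unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len)):
                continue
            repeats = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), unit, repeats))
    return sorted(hits)


def ssrs_in_transcripts(transcripts, min_length=500):
    """Search only transcripts longer than min_length bp, as in the text."""
    return {tid: find_ssrs(seq)
            for tid, seq in transcripts.items() if len(seq) > min_length}
```

Coordinates are 0-based and end-exclusive, following Python slicing.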

2.5. Functional Annotation of Transcripts

The non-redundant transcripts were functionally annotated by BLAST searches against several databases, including the Non-Redundant Protein Sequence Database (NR) [21], Gene Ontology (GO) [22], Cluster of Orthologous Groups (COG) [23], and evolutionary Genealogy of Genes: Non-supervised Orthologous Groups (eggNOG).

2.6. DETs Analysis

The clean NGS reads were aligned to the non-redundant transcripts using STAR (Spliced Transcripts Alignment to a Reference) software (version 5.5.0; Cold Spring Harbor, New York, NY, USA). Transcript expression levels were quantified as FPKM (fragments per kilobase of transcript per million fragments mapped) values using RSEM software (version 5.5.0; GitHub, San Francisco, CA, USA) [24]. DESeq software (version 5.5.0; Biomarker, Beijing, China) was used for the differential expression analysis of samples at different developmental stages. The p-values were corrected for multiple testing to obtain the false discovery rate (FDR). Transcripts with a fold change ≥ 2 and FDR < 0.01 were considered DETs.
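The quantification and screening steps can be sketched as follows. This is not the RSEM or DESeq code. The text does not name the p-value correction method, so the Benjamini-Hochberg procedure is assumed here, and the fold-change criterion is assumed to apply in both directions (up- and down-regulation). All function names are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (FDR) in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def is_det(fold_change, fdr, min_fold=2.0, max_fdr=0.01):
    """Apply the screening criteria: fold change >= 2 (either direction), FDR < 0.01."""
    changed = fold_change >= min_fold or fold_change <= 1.0 / min_fold
    return changed and fdr < max_fdr
```

For example, a 2000 bp transcript with 500 mapped fragments in a library of 10 million mapped fragments has an FPKM of 25.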

2.7. RT-qPCR Analyses

Eight transcripts at four different developmental stages of C. migao fruits were randomly selected for RT-qPCR analysis. The actin gene was selected as the normalization control. The first-strand cDNAs were reverse-transcribed from the CMFI, CMFII, CMFIII, and CMFIV of C. migao fruits using the MightyScript First-Strand cDNA Synthesis Master Mix (Sangon Biotech, Shanghai, China). RT-qPCR was conducted using a BioRad CFX96 Real-Time PCR System (BIO-RAD, Hercules, CA, USA) and TransStart® Green qPCR SuperMix (TransGen, Beijing, China). Each PCR reaction in a 20 μL volume comprised 1 μL of template cDNA, 0.4 μL of forward primer, 0.4 μL of reverse primer, 10 μL of 2× TransStart® Green qPCR SuperMix, and 9.2 μL of nuclease-free water. The amplification of the target genes was carried out as follows: 60 s at 95 °C, followed by 40 cycles of 5 s at 95 °C and 60 s at 60 °C. The 2^(−ΔΔCt) method was used to calculate gene expression through normalization to the actin gene from C. migao. Three independent biological replicates were conducted for each sample.
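The relative quantification used here can be written out as a short Python sketch. The choice of calibrator sample (for example, one developmental stage used as the baseline) is an assumption, as the text does not state it, and the function name is hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) method.

    ct_ref_* are the Ct values of the reference gene (actin in this study);
    the calibrator is the sample against which expression is expressed.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)
```

With Ct values of 24 (target) and 18 (actin) in a sample, and 26 and 18 in the calibrator, ΔΔCt is −2 and the target is four times as abundant in the sample as in the calibrator.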

3. Results

3.1. Transcriptome Analysis

For the NGS transcriptome, clean data (high-quality reads) were generated after removing the sequencing adapters and primers and filtering out low-quality data (Table S1). For the full-length transcriptome, 21 RNA samples obtained from the roots, stems, leaves, and fruits at four developmental stages of C. migao were pooled and used for library construction and third-generation sequencing. In total, 39.9 Gb of raw data was obtained. Based on the criteria of full passes ≥ 3 and sequence accuracy > 0.9, 515,929 CCSs were extracted from the raw data (Table 1).
The lengths of the CCSs ranged from 1000 bp to 6000 bp, with a mean read length of 2498 bp (Figure 1). Of the 515,929 CCSs, 472,858 reads (91.65%) were FLNC reads. After the clustering of similar FLNC reads and correction with NGS data, 139,871 consensus isoforms were obtained, of which 139,843 (99.98%) were high-quality (HQ) isoforms. Afterwards, transcripts with high similarity were merged and redundant transcripts were removed. Finally, 73,575 non-redundant transcripts were obtained. In addition, BUSCO identified 1234 complete transcripts, including 522 single-copy and 712 duplicated transcripts, which indicated the good integrity of the non-redundant transcriptome (Figure 2).
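The proportions reported in this section follow directly from the read counts. The short Python check below recomputes them; the variable names are illustrative, and the share of consensus isoforms retained after redundancy removal is a derived figure that the text does not report.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts taken from the text of this section.
ccs_reads = 515_929
flnc_reads = 472_858
consensus_isoforms = 139_871
hq_isoforms = 139_843
non_redundant_transcripts = 73_575
busco_single_copy = 522
busco_duplicated = 712

flnc_pct = percent(flnc_reads, ccs_reads)
hq_pct = percent(hq_isoforms, consensus_isoforms)
retained_pct = percent(non_redundant_transcripts, consensus_isoforms)
busco_complete = busco_single_copy + busco_duplicated

print(f"FLNC reads: {flnc_pct:.2f}% of CCSs")
print(f"HQ isoforms: {hq_pct:.2f}% of consensus isoforms")
print(f"Retained after redundancy removal: {retained_pct:.2f}%")
print(f"Complete BUSCO transcripts: {busco_complete}")
```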

3.2. SSR, CDS, TFs, and LncRNA Prediction

An SSR is a piece of DNA composed of repeated basic units with 1–6 nucleotides, which can be used as molecular markers in plant breeding. After SSR analysis using MISA software (version 2.1; Leibniz Institute, Gatersleben, Germany), a total of 62,628 SSRs were identified, including mono-nucleotide, di-nucleotide, tri-nucleotide, tetra-nucleotide, penta-nucleotide, hexa-nucleotide, and compound SSRs. Among these, the number of mono-nucleotide SSRs was the highest, at 28,447 (Figure 3).
The CDS is the part of the mRNA that encodes protein. Using TransDecoder software (version 5.5.0; GitHub, San Francisco, CA, USA), a total of 4797 complete ORFs were predicted, which were between 100 and 2300 bp in length (Figure 4).
TFs, also known as trans-acting factors, refer to DNA-binding proteins that can specifically interact with the cis-acting elements and activate or inhibit the transcription of target genes. Thus, TFs directly control gene transcription levels and ultimately regulate the types and content of metabolites. The results showed that there were more than 3400 TFs in C. migao fruits. The top 30 TFs are illustrated in Figure 5; these include bHLH (240), B3-ARF (215), and C2H2 (199) TFs.
A lncRNA is a non-coding RNA longer than 200 nucleotides that can interact with mRNA, DNA, protein, and even miRNA to regulate gene expression at the epigenetic, transcriptional, post-transcriptional, translational, and post-translational levels. Non-coding transcripts were identified by CPC, CNCI, CPAT, and Pfam analyses, and the intersection of the four result sets yielded 2246 lncRNAs (Figure 6). The target genes of the lncRNAs were also predicted with the LncTar tool.
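The intersection step can be illustrated with Python sets. The sketch assumes that each tool's output has already been reduced to a list of transcript IDs judged non-coding. Applying the 200-nucleotide length cutoff at this point is also an assumption, and the function name and IDs in the usage example are hypothetical.

```python
def candidate_lncrnas(noncoding_calls, lengths, min_length=200):
    """Keep transcripts called non-coding by every tool and longer than min_length nt.

    noncoding_calls: {tool_name: iterable of transcript IDs called non-coding}
    lengths: {transcript_id: length in nucleotides}
    """
    if not noncoding_calls:
        return set()
    shared = set.intersection(*(set(ids) for ids in noncoding_calls.values()))
    return {tid for tid in shared if lengths.get(tid, 0) > min_length}


calls = {
    "CPC": ["t1", "t2", "t3"],
    "CNCI": ["t1", "t2", "t4"],
    "CPAT": ["t1", "t2"],
    "Pfam": ["t1", "t2", "t5"],
}
lengths = {"t1": 850, "t2": 150, "t3": 900, "t4": 400, "t5": 1200}
lncrnas = candidate_lncrnas(calls, lengths)
```

Requiring agreement among all four methods is conservative: a transcript flagged as coding by any one of them is excluded.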

3.3. Functional Annotation

The non-redundant transcripts were searched against the NR, Swiss-Prot, GO, KOG, Pfam, and KEGG databases for functional annotation. A total of 70,427 isoforms were successfully annotated, among which 18,568 isoforms were simultaneously annotated in five databases (Figure 7). The NR database annotated the highest number of transcripts (70,212), and the COG database annotated the lowest number of transcripts (31,769).
In order to comprehensively describe the properties of genes and their products in C. migao, the NR database was adopted for the functional annotation of all non-redundant transcripts. A total of 70,212 transcripts were annotated and classified into three categories according to their potential functions, including cellular components, molecular functions, and biological processes (Figure 8). For molecular functions, most of the annotated genes were involved in catalytic activity, binding, transporter activity, reproduction, and structural molecule activity. The main biological processes were the metabolic process, cellular process, single-organism process, biological regulation, and localization.
Meanwhile, gene functions were further analyzed using the COG, KOG and eggNOG databases, and 31,769, 45,834, and 69,253 transcripts were obtained, respectively. As a whole, all annotated genes were distributed in 26 biological processes (Figure 9). In the COG and KOG function classifications, “general function prediction only”, “signal transduction mechanisms”, “posttranslational modification, protein turnover, chaperones”, “transcription”, and “carbohydrate transport and metabolism” were the top five classifications. However, “function unknown” was more prevalent than the other categories in the eggNOG function classification.

3.4. Identification of DETs in C. migao Fruits at Different Developmental Stages

A total of 28,331 transcripts were expressed in all the CMFI, CMFII, CMFIII, and CMFIV fruits. Meanwhile, 5764 transcripts showed period-specific expression, with 913, 1227, 2637, and 987 transcripts specifically expressed in CMFI, CMFII, CMFIII and CMFIV, respectively (Figure 10).
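The partition into shared and stage-specific transcripts can be sketched as below. The expression cutoff used to call a transcript "expressed" is not stated in the text, so the FPKM ≥ 1 threshold is an assumption, and the function name and input layout are hypothetical.

```python
STAGES = ("CMFI", "CMFII", "CMFIII", "CMFIV")


def partition_by_stage(expression, threshold=1.0):
    """Split transcripts into those expressed in all stages and in exactly one.

    expression: {transcript_id: {stage: mean FPKM}}
    Returns (shared, specific), where specific maps each stage to a set of IDs.
    """
    shared = set()
    specific = {stage: set() for stage in STAGES}
    for tid, by_stage in expression.items():
        # Assumption: a transcript counts as expressed when FPKM >= threshold.
        expressed_in = [s for s in STAGES if by_stage.get(s, 0.0) >= threshold]
        if len(expressed_in) == len(STAGES):
            shared.add(tid)
        elif len(expressed_in) == 1:
            specific[expressed_in[0]].add(tid)
    return shared, specific
```

Transcripts expressed in two or three stages fall into neither group, which corresponds to the remaining regions of a four-set Venn diagram.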
The DETs among the four developmental stages were analyzed through pairwise comparisons (CMFI vs. CMFII, CMFI vs. CMFIII, CMFI vs. CMFIV, CMFII vs. CMFIII, CMFII vs. CMFIV, and CMFIII vs. CMFIV). The results showed that CMFII vs. CMFIV had the highest number of DETs (14,160), with 5241 up-regulated transcripts and 8919 down-regulated transcripts. CMFI vs. CMFIII, CMFI vs. CMFIV, CMFII vs. CMFIII, and CMFIII vs. CMFIV also showed relatively large numbers of DETs. CMFI vs. CMFII produced the lowest number of DETs (219), with 173 up-regulated transcripts and 46 down-regulated transcripts (Figure 11).

3.5. Analysis of DETs Involved in Terpenoid Biosynthesis

Terpenoids are the main components of the volatile oils in C. migao fruits. The MEP and MVA pathways provide the substrates for the synthesis of different terpenoids. Here, 36 DETs encoding 15 enzymes involved in the terpenoid backbone pathway were identified through removing repeated genes, fragments of genes, and genes with lower expression levels (FPKM < 1).
In the MVA pathway, 19 DETs were identified, including 4 CmAACTs, 3 CmHMGCSs, 3 CmHMGCRs, 2 CmMKs, 2 CmPMKs, 3 CmMDCs, and 2 CmFPPSs (Figure 12). In general, most DETs showed the highest expression in CMFI or CMFII and the lowest expression in CMFIV (Table S2). Of the four CmAACTs, only CmAACT3 showed the highest expression in CMFII. The other three, CmAACT1, CmAACT2, and CmAACT4, showed the highest expression in CMFIV and the lowest expression in CMFI or CMFII. Similarly, the two CmPMKs were expressed in all stages, with the highest expression in CMFIII and CMFIV, respectively. Meanwhile, 15 DETs were involved in the MEP pathway, including 4 CmDXSs, 1 CmDXR, 1 CmCMS, 1 CmCMK, 1 CmICS, 2 CmHDSs, 1 CmHDR, and 4 CmGPPSs. Most genes in the MEP pathway showed the same expression trends as those in the MVA pathway, except for the CmDXSs. Of the four CmDXSs, only CmDXS2 showed the highest expression in CMFII and the lowest expression in CMFIV, whereas CmDXS1, CmDXS3, and CmDXS4 showed the highest expression in CMFIV or CMFIII. In addition, two CmIDSs, encoding products that catalyze the interconversion of IPP and DMAPP, were identified.
The FPP and GPP synthesized through the MVA and MEP pathways serve as substrates for the biosynthesis of sesquiterpenoids and monoterpenoids, respectively. In C. migao fruits, the volatile oils are primarily composed of monoterpenoids and sesquiterpenoids, whose synthesis requires terpenoid synthases (TPSs). Therefore, 22 hub TPS genes with different expression levels were identified. The TPS family consists of seven sub-families: TPSa, TPSb, TPSc, TPSd, TPSe/f, TPSg, and TPSh. Each sub-family is restricted to particular plant lineages or performs specific functions. The phylogenetic tree showed that 15 TPSs belong to the TPSa sub-family, which is responsible for the biosynthesis of sesquiterpenoids (Figure 13, Table S3). Meanwhile, six TPSs were clustered into the TPSc sub-family, which is related to diterpenoid biosynthesis. Only one TPS (F01_transcript_130486) belonged to the TPSb sub-family, which is responsible for the biosynthesis of monoterpenoids.
Most TPSs displayed similar expression trends, with the highest expression in CMFI or CMFII, and the lowest expression in CMFIV or CMFIII. However, two TPSs (F01_transcript_22652 and F01_transcript_27229) showed the highest expression in CMFIV, and four TPSs (F01_transcript_12629, F01_transcript_130486, F01_transcript_15110, and F01_transcript_16333) had the highest expression level in CMFIII (Figure 14, Table S4). These special TPSs require more attention in future research.

3.6. RT-qPCR of DETs

To validate the expression levels of the DETs, eight transcripts were randomly selected for real-time quantitative polymerase chain reaction (RT-qPCR) analysis. Unlike the other genes, F01_transcript_22562 and F01_transcript_17265 had the highest expression in CMFIV rather than in CMFI or CMFII (Figure 15, Table S5). The expression trends of the eight genes tested by RT-qPCR were in good agreement with the NGS data, supporting the accuracy of the transcriptome data.

4. Discussion

Cinnamomum migao H. W. Li is an important economic and medicinal evergreen plant that is restricted to southwest China. For many years, C. migao has been a common medication used by ethnic groups. Presently, multiple pharmaceutical drugs derived from C. migao have been produced and are prevalent in the market. Most studies related to C. migao have been focused on the separation, purification, and identification of chemicals in the volatile oil, as well as their pharmacological activity. However, this woody plant is facing the danger of extinction due to destructive lumbering, environmental degradation and its own physiological factors. In order to better protect the wild resource of C. migao, Xiaolong Huang et al. used RNA-Seq technology to obtain NGS transcriptome data for the C. migao seed, which provided valuable information regarding the regulation mechanism of seed germination in C. migao [12]. To some extent, these results provide good technical support and theoretical guidance for the artificial breeding of C. migao. As a medicinal woody plant, the mature fruits of C. migao are the most important medicinal part, but studies are lacking on its genetics.
To our knowledge, a genome has been reported only for Cinnamomum camphora [25,26,27,28], and transcriptome analyses have been reported for no more than 10 species of camphor plants, including Cinnamomum chago [29], Cinnamomum longepaniculatum [30], Cinnamomum cassia [31], and Cinnamomum burmannii [32]. At present, no genomic data are available for C. migao, and the biosynthesis pathway of the important terpenoids in its volatile oil is unclear. Therefore, more comprehensive and complete transcriptome data are needed. Full-length transcriptome sequencing based on the PacBio platform using SMRT technology is an effective method to obtain transcriptome data and has been applied to many species [33,34]. For example, Minzhen Yin et al. used SMRT sequencing to investigate tissue-specific gene expression in the roots, stems, and leaves of Astragalus mongholicus [35]. Their sequencing generated 643,812 CCS reads, yielding 121,107 non-redundant transcript isoforms. Forty-four differentially expressed genes (DEGs) involved in isoflavonoid biosynthesis and 44 DEGs involved in triterpenoid saponin biosynthesis were identified.
Although SMRT technology provides longer read lengths, its sequencing error rate is also higher. Therefore, NGS, which has higher accuracy, was combined with full-length transcriptome sequencing to analyze the candidate genes involved in the biosynthesis of specific metabolites. For example, a combination of NGS and full-length transcriptome sequencing was performed to obtain candidate genes in Astragalus mongholicus Bunge, and 44 DETs from 16 gene families that encode enzymes involved in triterpenoid saponin biosynthesis were identified [36]. In this study, a total of 58 DETs involved in the biosynthesis of terpenoids were identified.
The major components of the volatile oils of C. migao are terpenoids, which have clear pharmaceutical activities and are widely distributed in the fruits of Lauraceae plants. In the full-length transcriptome data of this study, a total of 276 transcripts were found to be related to the biosynthesis of terpenoids. The DETs in different developmental stages were identified, including 36 genes involved in the terpenoid backbone pathway. The MEP and MVA pathways provide different precursors for the biosynthesis of various terpenoids [37,38]. HMGCRs and DXSs play important roles in the MVA and MEP pathways, respectively [39,40]. Three identified CmHMGCRs showed similar expression trends, with the highest expression level in CMFI or CMFII and the lowest expression level in CMFIV. Among the four CmDXSs, CmDXS1 and CmDXS3 had similar expression trends, with higher expression in CMFIV and lower expression in CMFI. In contrast, CmDXS2 had a completely different expression trend, with higher expression in CMFII and the lowest expression in CMFIV.
In addition, the most abundant terpenoids in the volatile oil of C. migao are monoterpenoids and sesquiterpenoids, whose biosynthesis requires TPSs. Based on the reaction mechanism and products formed by terpenoid synthase, plant terpenoid synthase genes can be divided into two types: type I and type II [41]. However, most plant TPSs are type I. To date, the functional analysis of terpenoid synthase genes in terrestrial plants has been widely studied, including those in Arabidopsis thaliana, citrus, eucalyptus, grapes, and apples [42,43,44,45]. In our study, 22 hub TPS genes with different expression levels were identified. Among these, 15 sesqui-TPSs and 6 di-TPSs were involved in the biosynthesis of sesquiterpenoids and diterpenoids, respectively. Notably, only one mono-TPS was identified. This is mainly due to the versatility of mono-TPS and incomplete sequencing data. In addition, most TPSs had the same expression trends, with the highest expression in CMFI or CMFII, and the lowest expression in CMFIV. It is noteworthy that F01_transcript_130486 and F01_transcript_22652 showed different expression trends, with the highest expression in CMFIII or CMFIV, not in CMFI or CMFII.

5. Conclusions

In our study, a combination of full-length transcriptome and NGS sequencing was used for the first time to obtain transcriptome data for Cinnamomum migao. A total of 73,575 non-redundant transcripts were obtained and the structural prediction and functional annotation of these transcripts were performed. Meanwhile, the NGS of C. migao fruits at four different developmental stages was also performed. Fifty-eight DETs involved in the biosynthesis of terpenoids were identified, and the sub-families of TPSs were classified through the construction of a phylogenetic tree. This research is a valuable basis for future studies on gene mining, the biosynthesis of terpenoids and genetic breeding in C. migao. In future studies, we plan to obtain the complete sequence of key DETs using molecular cloning technology and test their biological function in the biosynthesis of terpenoids.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/f14102075/s1, Table S1. NGS data statistics of C. migao; Table S2. The expression levels of transcripts involved in the MEP and MVA pathways; Table S3. ID information of TPSs used for the construction of the phylogenetic tree; Table S4. The expression levels of TPSs; Table S5. Primers used for RT-qPCR.

Author Contributions

Conceptualization and writing—original draft, Z.J. and Q.G.; data curation and visualization, L.L. and D.K.; writing—review and editing, T.Z. and W.S.; funding acquisition and investigation, Y.Z. and Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. U1812403-2), Research and Demonstration on Key technologies for conservation and Innovative utilization of germplasm resources of important Southern medicine in Guangdong Province (Grant No. [2021]163), grants from Guizhou Science and Technology Department ([2019]1019), the ‘Thousand’ level Innovative Talents Project in Guizhou [Grant No. ZQ2018004] and the Innovation and Entrepreneurship Training Program for College Students in Guizhou University of Traditional Chinese Medicine (grant No. [2021]36).

Data Availability Statement

The datasets supporting the conclusions and description of a complete protocol can be found within the manuscript and its additional files. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shi, H.Y.; Shi, Q.L.; Wang, T.W.; Qin, J.Z.; Wang, Y.P. Migao-a special Miao medicinal plants in Guizhou Province. In Proceedings of the National Symposium on Miao Medicine, Guiyang, China, 1 November 2003.
  2. Zheng, Y.Y.; Qiu, D.W.; Liang, G.Y.; Sun, X.H.; Sui, Y.H. Research and Industrialization of “Da Guo Mu Jiang Zi” in Guizhou Province. World Sci. Technol. 2005, 02, 112–114.
  3. Wang, J.C.; Liu, J.M.; Wen, A.H.; Gao, P. Research progress on the medicinal plant “Da Guo Mu Jiang Zi” in Guizhou Province. Heilongjiang Agric. Sci. 2015, 5, 157–160.
  4. Guo, J.T.; Zhang, Y.P.; Liu, J.; Xu, J.; Cheng, C.; Liu, Y. Comparison of volatile oil composition and antioxidant activity in different parts of Lisea lancilimba Merr. Sci. Technol. Food Ind. 2023, 44, 306–315.
  5. Liu, J.; Guo, J.T.; Liu, Y.; Cheng, C.; Huang, K.; Jian, L.N.; Xu, J.; Zhang, Y.P. Extraction optimization, composition analysis of volatile oil from Lisea lancilimba Merr. and its antioxidant activity. Sci. Technol. Food Ind. 2022, 9, 211–219.
  6. Sun, H.C.; Li, J.; Zhou, Z.H.; Wang, Z.D.; Dong, H.; Tang, X.Q.; Liu, W.Q.; Zhang, L.Y. Research progress in chemical constituents, pharmacological effect and industrialization of Lisea lancilimba Merr. Cent. South Pharm. 2022, 20, 668–671.
  7. Yan, T.; Zhou, Z.Y.; Luo, D.Y.; Chi, M.Y.; Wang, A.M.; Huang, Y.; Zheng, L. GC-MS analysis of three different methods for extracting volatile oil components from fresh and dry fruits of Cinnamomum migao. J. Chin. Med. Mater. 2022, 1, 123–129.
  8. Chen, S.Y.; Tang, Z.X.; Yang, X.; Tu, X.H.; Yuan, L.F.; Pan, S.J.; Deng, W.J.; Zhao, T.T. Study on the molecular mechanism of main volatile components of Lisea lancilimba Merr. Chin. J. Ethnomed. Ethnopharmacol. 2022, 31, 33–38.
  9. Huang, K.; Liu, J.; Huang, C.H.; Liu, Y.; Cheng, C.; Zhang, Y.P.; Xu, J. Comparative analysis of volatile oil and fatty oil constituents from Cinnamomum migao in different sources. China Pharm. 2020, 31, 1961–1966.
  10. Luo, J.; Zhu, D.; Liao, X.; Huang, J.; Tang, J.; Liu, W.; Bao, J.P.; Tan, J.H. Comparative research on the components of volatile oils in Listea Lam based on the theory of using fresh materials in Miao medicine. Lishizhen Med. Mater. Res. 2019, 20, 574–576.
  11. Chen, J.Z.; Huang, X.L.; Xiao, X.F.; Liu, J.M.; Liao, X.F.; Sun, Q.W.; Peng, L.; Zhang, L. Seed Dormancy Release and Germination Requirements of Cinnamomum migao, an Endangered and Rare Woody Plant in Southwest China. Front. Plant Sci. 2022, 13, 770940.
  12. Huang, X.L.; Tian, T.; Chen, J.Z.; Wang, D.; Tong, B.L.; Liu, J.M. Transcriptome analysis of Cinnamomum migao seed germination in medicinal plants of southwest China. BMC Plant Biol. 2021, 21, 270.
  13. Raza, A.; Su, W.; Hussain, M.A.; Mehmood, S.S.; Zhang, X.; Cheng, Y.; Zou, X.; Lv, Y. Integrated Analysis of Metabolome and Transcriptome Reveals Insights for Cold Tolerance in Rapeseed (Brassica napus L.). Front. Plant Sci. 2021, 12, 721681.
  14. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359.
  15. Beier, S.; Thiel, T.; Münch, T.; Scholz, U.; Mascher, M. MISA-web: A web server for microsatellite prediction. Bioinformatics 2017, 33, 2583–2585.
  16. Zheng, Y.; Jiao, C.; Sun, H.; Rosli, H.G.; Pombo, M.A.; Zhang, P.; Banf, M.; Dai, X.; Martin, G.B.; Giovannoni, J.J.; et al. iTAK: A Program for Genome-wide Prediction and Classification of Plant Transcription Factors, Transcriptional Regulators, and Protein Kinases. Mol. Plant 2016, 9, 1667–1670.
  17. Kong, L.; Zhang, Y.; Ye, Z.Q.; Liu, X.Q.; Zhao, S.Q.; Wei, L.; Gao, G. CPC: Assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007, 35, W345–W349.
  18. Sun, L.; Luo, H.; Bu, D.; Zhao, G.; Yu, K.; Zhang, C.; Liu, Y.; Chen, R.; Zhao, Y. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 2013, 41, e166.
  19. Wang, L.G.; Park, H.J.; Dasari, S.; Kocher, J.P.; Li, W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression mode. Nucleic Acids Res. 2013, 41, e74.
  20. Finn, R.D.; Coggill, P.; Eberhardt, R.Y.; Eddy, S.R.; Mistry, J.; Mitchell, A.L.; Potter, S.C.; Punta, M.; Qureshi, M.; Sangrador-Vegas, A. The Pfam protein family database: Towards a more sustainable future. Nucleic Acids Res. 2016, 44, D279–D285.
  21. Deng, Y.Y.; Li, J.Q.; Wu, S.F.; Zhu, Y.P.; Chen, Y.W.; He, F.C. Integrated NR Database in Protein Annotation System and Its Localization. Comput. Eng. 2006, 32, 71–74.
  22. Ashburner, M.; Ball, C.A.; Blake, J.A.; Botstein, D.; Butler, H.; Cherry, J.M.; Davis, A.P.; Dolinski, K.; Dwight, S.S.; Eppig, J.T.; et al. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000, 25, 25–29.
  23. Tatusov, R.L.; Galperin, M.Y.; Natale, D.A. The COG database: A tool for genome scale analysis of protein functions and evolution. Nucleic Acids Res. 2000, 28, 33–36.
  24. Li, B.; Dewey, C.N. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 2011, 12, 323.
  25. Shen, T.; Qi, H.; Luan, X.; Xu, W.; Yu, F.; Zhong, Y.; Xu, M. The chromosome-level genome sequence of the camphor tree provides insights into Lauraceae evolution and terpene biosynthesis. Plant Biotech. J. 2022, 20, 244–246.
  26. Jiang, R.; Chen, X.; Liao, X.; Peng, D.; Han, X.; Zhu, C.; Wang, P.; Hufnagel, D.E.; Wang, L.; Li, K.; et al. A Chromosome-Level Genome of the Camphor Tree and the Underlying Genetic and Climatic Factors for Its Top-Geoherbalism. Front. Plant Sci. 2022, 13, 827890.
  27. Wang, X.D.; Xu, C.Y.; Zheng, Y.J.; Wu, Y.F.; Zhang, Y.T.; Zhang, T.; Xiong, Z.Y.; Yang, H.K.; Li, J.; Fu, C.; et al. Chromosome-level genome assembly and resequencing of camphor tree (Cinnamomum camphora) provides insight into phylogeny and diversification of terpenoid and triglyceride biosynthesis of Cinnamomum. Hortic. Res. 2022, 9, uhac216.
  28. Li, D.; Lin, H.Y.; Wang, X.; Bi, B.; Gao, Y.; Shao, L.; Zhang, R.; Liang, Y.; Xia, Y.; Zhao, Y.P.; et al. Genome and whole-genome resequencing of Cinnamomum camphora elucidate its dominance in subtropical urban landscapes. BMC Biol. 2023, 21, 192.
  29. Zhang, X.; Zhang, Y.; Wang, Y.H.; Shen, S.K. Transcriptome Analysis of Cinnamomum chago: A Revelation of Candidate Genes for Abiotic Stress Response and Terpenoid and Fatty Acid Biosyntheses. Front. Genet. 2018, 9, 505.
  30. Zhao, X.; Yan, Y.; Zhou, W.H.; Feng, R.Z.; Shuai, Y.K.; Yang, L.; Liu, M.J.; He, X.Y.; Wei, Q. Transcriptome and metabolome reveal the accumulation of secondary metabolites in different varieties of Cinnamomum longepaniculatum. BMC Plant Biol. 2022, 22, 243.
  31. Gao, H.; Zhang, H.; Hu, Y.; Xu, D.; Zheng, S.; Su, S.; Yang, Q. De Novo transcriptome assembly and metabolomic analysis of three tissue types in Cinnamomum cassia. Chin. Herb. Med. 2023, 15, 310–316.
  32. Guo, S.; Liang, J.; Deng, Z.; Lu, Z.; Fu, M.; Su, J. Full-Length Transcriptome Sequencing Combined with RNA-Seq to Analyze Genes Related to Terpenoid Biosynthesis in Cinnamomum burmannii. Curr. Issues Mol. Biol. 2022, 44, 4197–4215.
  33. Zhao, L.; Zhang, H.; Kohnen, M.V.; Prasad, K.V.S.K.; Gu, L.; Reddy, A.S.N. Analysis of Transcriptome and Epitranscriptome in Plants Using PacBio Iso-Seq and Nanopore-Based Direct RNA Sequencing. Front. Genet. 2019, 10, 253.
  34. He, Z.; Su, Y.; Wang, T. Full-Length Transcriptome Analysis of Four Different Tissues of Cephalotaxus oliveri. Int. J. Mol. Sci. 2021, 22, 787.
  35. Yin, M.; Chu, S.; Shan, T.; Zha, L.; Peng, H. Full-length transcriptome sequences by a combination of sequencing platforms applied to isoflavonoid and triterpenoid saponin biosynthesis of Astragalus mongholicus Bunge. Plant Methods 2021, 17, 61.
  36. Vranová, E.; Coman, D.; Gruissem, W. Network analysis of the MVA and MEP pathways for isoprenoid synthesis. Annu. Rev. Plant Biol. 2013, 64, 665–700.
  37. Xia, J.; Lou, G.; Zhang, L.; Huang, Y.; Yang, J.; Guo, J.; Qi, Z.; Li, Z.; Zhang, G.; Xu, S.; et al. Unveiling the spatial distribution and molecular mechanisms of terpenoid biosynthesis in Salvia miltiorrhiza and S. grandifolia using multi-omics and DESI-MSI. Hortic. Res. 2023, 10, uhad109.
  38. Jo, Y.; DeBose-Boyd, R.A. Post-Translational Regulation of HMG CoA Reductase. CSH. Perspect. Biol. 2022, 14, a041253.
  39. Wanke, M.; Skorupinska-Tudek, K.; Swiezewska, E. Isoprenoid biosynthesis via 1-deoxy-D-xylulose 5-phosphate/2-C-methyl-D-erythritol 4-phosphate (DOXP/MEP) pathway. Acta Biochim. Pol. 2001, 48, 663–672.
  40. Jia, Q.; Brown, R.; Köllner, T.G.; Fu, J.; Chen, X.; Wong, G.K.; Gershenzon, J.; Peters, R.J.; Chen, F. Origin and early evolution of the plant terpene synthase family. Proc. Natl. Acad. Sci. USA 2022, 119, e2100361119.
  41. Tholl, D.; Chen, F.; Petri, J.; Gershenzon, J.; Pichersky, E. Two sesquiterpene synthases are responsible for the complex mixture of sesquiterpenes emitted from Arabidopsis flower. Plant J. 2005, 42, 757–771.
  42. Dornelas, M.C.; Mazzafera, P. A genomic approach to characterization of the Citrus terpene synthase gene family. Genet. Mol. Biol. 2007, 30, 832–840.
  43. Külheim, C.; Padovan, A.; Hefer, C.; Krause, S.T.; Köllner, T.G.; Myburg, A.A.; Degenhardt, J.; Foley, W.J. The Eucalyptus terpene synthase gene family. BMC Genom. 2015, 16, 450.
  44. Martin, D.M.; Aubourg, S.; Schouwey, M.B.; Daviet, L.; Schalk, M.; Toub, O.; Lund, S.T.; Bohlmann, J. Functional annotation, genome organization and phylogeny of the grapevine (Vitis vinifera) terpene synthase gene family based on genome assembly, FLcDNA cloning, and enzyme assays. BMC Plant Biol. 2010, 10, 226.
  45. Nieuwenhuizen, N.J.; Green, S.A.; Chen, X.; Bailleul, E.J.; Matich, A.J.; Wang, M.Y.; Atkinson, R.G. Functional genomics reveals that a compact terpene synthase gene family can account for terpene volatile production in apple. Plant Physiol. 2013, 161, 787–804.
Figure 1. Distribution of CCS read lengths.
Figure 2. BUSCO assessment of non-redundant transcriptome.
Figure 3. The distribution of different types of SSRs.
Figure 4. Distribution of CDS lengths.
Figure 5. Distribution of top 30 TFs in C. migao.
Figure 6. Venn diagram of lncRNAs in four databases.
Figure 7. Venn diagram of functionally annotated transcripts in five databases.
Figure 8. GO database annotations.
Figure 9. COG, KOG, and eggNOG database annotations.
Figure 10. Venn diagram of DETs in four developmental stages of C. migao fruits.
Figure 11. Up-regulated and down-regulated DETs in different comparisons.
Figure 12. Heatmap of DETs involved in terpenoid backbone biosynthesis.
Figure 13. The phylogenetic tree of TPSs.
Figure 14. Heatmap of TPSs in four different developmental stages.
Figure 15. The phylogenetic tree of TPSs.
Table 1. Full-length transcriptome of C. migao.

| Sample | Read (Gb) | cDNA Size | CCS Number | Read (Bases) | Mean Read Length |
|---|---|---|---|---|---|
| C. migao | 39.9 | 1–6 k | 515,929 | 1,289,205,518 | 2498 |
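
As a quick consistency check on Table 1, the mean read length should be close to the total number of bases divided by the number of CCS reads. The following is a minimal Python sketch using only the values in the table; the variable and function names are illustrative assumptions and are not part of the authors' pipeline.

```python
# Values copied from Table 1 (C. migao full-length transcriptome).
ccs_number = 515_929          # number of CCS reads
total_bases = 1_289_205_518   # total bases across all CCS reads
reported_mean_length = 2498   # mean read length as reported in Table 1


def mean_read_length(bases: int, reads: int) -> float:
    """Return the arithmetic mean read length in bases (hypothetical helper)."""
    if reads <= 0:
        raise ValueError("reads must be positive")
    return bases / reads


computed = mean_read_length(total_bases, ccs_number)

# Compare the reported value with the computed mean after dropping the fraction.
print(f"computed mean read length: {computed:.1f} bp")
print(f"agrees with Table 1: {int(computed) == reported_mean_length}")
```

The computed mean falls between 2498 and 2499 bases, so the tabulated value appears to be the mean with the fractional part dropped rather than rounded.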

Ju, Z.; Gong, Q.; Liang, L.; Kong, D.; Zhou, T.; Sun, W.; Pang, Y.; Zhang, Y. Full-Length Transcriptome Sequencing and Identification of Genes Related to Terpenoid Biosynthesis in Cinnamomum migao H. W. Li. Forests 2023, 14, 2075. https://doi.org/10.3390/f14102075
