#### *2.4. Differentially Expressed Carotenoid Biosynthetic Genes between the Avocado Mesocarp and Seed*

A comparison of the avocado mesocarp and seed at five developmental stages, based on KEGG pathway enrichment among all DEGs, identified the carotenoid biosynthetic pathway in four of the five developmental stages (Figure 3). The DEGs detected in the avocado mesocarp and seed transcriptomes included 17 unigenes that putatively encode 11 enzymes in the carotenoid biosynthetic pathway (Table 1).

*Int. J. Mol. Sci.* **2019**, *20*, x FOR PEER REVIEW

**Figure 3.** Results of the KEGG enrichment analysis of differentially expressed genes (DEGs) between the avocado mesocarp and seed at five developmental stages.

**Table 1.** Unigenes related to carotenoid biosynthesis.

| Gene Name | Functional Protein Name | Enzyme Commission Number | Unigene ID |
|---|---|---|---|
| *PaPSY* | 15-cis-phytoene synthase | 2.5.1.32 | c103350.graph\_c0, c113873.graph\_c5 |
| *PaPDS* | Phytoene desaturase | | c103201.graph\_c0, c104826.graph\_c4 |
| *PaZ-ISO* | 15-cis-*ζ*-carotene isomerase | 5.2.1.12 | c109620.graph\_c1 |
| *PaZDS* | *ζ*-carotene desaturase | 1.3.5.6 | c108741.graph\_c1, c115069.graph\_c3 |

An analysis of the carotenoid biosynthesis unigenes that were differentially expressed during the five mesocarp and seed developmental stages (Figure 4) revealed that the following 15 unigenes were more highly expressed in the mesocarp than in the seed at each of the five examined time-points: *PaPSY* (c103350.graph\_c0 and c113873.graph\_c5), *PaPDS* (c103201.graph\_c0 and c104826.graph\_c4), *PaZ-ISO* (c109620.graph\_c1), *PaZDS* (c108741.graph\_c1 and c115069.graph\_c3), *PaCRTISO* (c108133.graph\_c1), *PaLCY-E* (c117627.graph\_c3), *PaLCY-B* (c92930.graph\_c0 and c110018.graph\_c0), *PaCYP97C* (c110544.graph\_c0), *PaZEP* (c109893.graph\_c0 and c116714.graph\_c5), and *PaNSY* (c92501.graph\_c0). In contrast, the expression level of a second *PaNSY* unigene (c106233.graph\_c1) was considerably lower in the mesocarp than in the seed at each of the five time-points (Figure 4; Table S6). Additionally, *PaCYP97A* (c106779.graph\_c0) was expressed at lower levels in the mesocarp than in the seed from 75 to 180 DAFB, but the opposite pattern was observed at 215 DAFB (Figure 4; Table S6). The *PaPSY*, *PaPDS*, *PaZ-ISO*, *PaZDS*, *PaCRTISO*, *PaLCY-E*, and *PaLCY-B* expression levels were 1.09- to 22.41-fold higher in the mesocarp than in the seed at each of the five time-points (Table S6). To confirm the accuracy of the high-throughput sequencing results, the expression levels of ten unigenes involved in the carotenoid biosynthetic pathway (representing *PaPSY*, *PaPDS*, *PaZ-ISO*, *PaLCY-E*, *PaCYP97C*, *PaZEP*, and *PaNSY*) were analyzed by quantitative real-time polymerase chain reaction (qRT-PCR) (Figure 5). The resulting expression patterns of these genes during the five mesocarp and seed developmental stages were consistent with the RNA-seq data.
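As a rough illustration of how a mesocarp-to-seed fold difference of the kind reported in Table S6 can be derived from FPKM values, the following Python sketch divides the two expression levels at each time-point. The FPKM numbers, the optional pseudocount, and the function name `fold_change` are hypothetical placeholders chosen for illustration; they are not values or methods taken from this study.

```python
import math


def fold_change(mesocarp_fpkm, seed_fpkm, pseudocount=0.0):
    """Return the mesocarp/seed expression ratio for one time-point.

    The pseudocount is an illustrative assumption: it can be set to a
    small positive value to avoid division by zero for unexpressed unigenes.
    """
    denominator = seed_fpkm + pseudocount
    if denominator == 0:
        raise ValueError("seed FPKM is zero; supply a pseudocount")
    return (mesocarp_fpkm + pseudocount) / denominator


# Hypothetical FPKM values for one unigene at the five time-points (DAFB).
time_points = [75, 110, 145, 180, 215]
mesocarp = [12.0, 18.5, 20.0, 15.0, 30.0]
seed = [6.0, 3.7, 4.0, 7.5, 2.0]

for dafb, m, s in zip(time_points, mesocarp, seed):
    ratio = fold_change(m, s)
    print(f"{dafb} DAFB: {ratio:.2f}-fold (log2 = {math.log2(ratio):.2f})")
```

A ratio above 1 corresponds to higher expression in the mesocarp, and the log2 form is often preferred because up- and down-regulation become symmetric around zero.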

**Figure 4.** Carotenoid biosynthetic pathway and transcript levels during avocado mesocarp and seed developmental stages. FPKM: fragments per kilobase of transcript per million mapped fragments; gene expression levels at 75, 110, 145, 180, and 215 days after full bloom (DAFB) are indicated with colored bars; the top and bottom bars for each unigene represent the avocado mesocarp and seed, respectively; PSY: 15-cis-phytoene synthase; PDS: phytoene desaturase; Z-ISO: 15-cis-*ζ*-carotene isomerase; ZDS: *ζ*-carotene desaturase; CRTISO: carotenoid isomerase; LCY-E: lycopene *ε*-cyclase; LCY-B: lycopene *β*-cyclase; CYP97A: P450 *β*-ring carotene hydroxylase; CYP97C: P450 *ε*-ring carotene hydroxylase; ZEP: zeaxanthin epoxidase; NSY: neoxanthin synthase.

**Figure 5.** The FPKM values and relative expression levels of 10 carotenoid biosynthesis unigenes in avocado mesocarp (**a**) and seed (**b**). Error bars are standard errors of the mean from three biological replicates and two technical replicates.

#### *2.5. General Properties of Single-Molecule Long-Reads*

Full-length cDNA sequences derived from poly-A-tailed RNA samples were normalized and subjected to SMRT sequencing with the PacBio RS II platform. A total of 25.79 and 17.67 Gb of clean data were generated for the avocado mesocarp and seed libraries, respectively. Each SMRT cell produced 651,260 and 586,430 reads of inserts (ROIs) from the library (1–6 kb) in avocado mesocarp and seed, respectively. These ROIs had mean lengths of 2200 and 2239 bp and quality scores of 0.96 and 0.94 in the mesocarp and seed, respectively. The ROIs were further classified, yielding 495,245 and 403,108 full-length nonchimeric reads in avocado mesocarp and seed, respectively. On the basis of the iterative isoform-clustering algorithm, 233,014 and 238,219 consensus isoforms were acquired in avocado mesocarp and seed, respectively, with mean lengths of 2170 and 2027 bp (Table S7). After the redundant sequences among all high-quality transcripts and corrected low-quality transcripts were removed with CD-HIT (c = 0.90), 76,345 and 68,618 nonredundant transcripts remained. The SMRT and Illumina HiSeq 2000 sequencing data were deposited in the GenBank database (accession numbers PRJNA551932 and PRJNA559779).
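The counts reported above can be turned into simple retention fractions to show how much data survives each processing step. The Python sketch below does this arithmetic using only the numbers given in the text; treating the ROI counts as per-tissue totals is an assumption made for illustration, and the dictionary layout and the helper name `retention` are hypothetical.

```python
# Counts reported in the text for each tissue.
counts = {
    "mesocarp": {"roi": 651_260, "flnc": 495_245,
                 "consensus": 233_014, "nonredundant": 76_345},
    "seed": {"roi": 586_430, "flnc": 403_108,
             "consensus": 238_219, "nonredundant": 68_618},
}


def retention(kept, total):
    """Fraction of items kept between two processing steps."""
    return kept / total


summary = {}
for tissue, c in counts.items():
    # Share of ROIs classified as full-length nonchimeric (FLNC) reads.
    flnc_fraction = retention(c["flnc"], c["roi"])
    # Share of consensus isoforms remaining after redundancy removal.
    nonredundant_fraction = retention(c["nonredundant"], c["consensus"])
    summary[tissue] = {
        "flnc_fraction": flnc_fraction,
        "nonredundant_fraction": nonredundant_fraction,
    }
    print(f"{tissue}: FLNC/ROI = {flnc_fraction:.1%}, "
          f"nonredundant/consensus = {nonredundant_fraction:.1%}")
```

Under that assumption, roughly three quarters of the mesocarp ROIs and about two thirds of the seed ROIs are full-length nonchimeric, and about one third or less of the consensus isoforms remain after redundancy removal.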

#### *2.6. Isoforms in Carotenoid Biosynthetic Pathway between the Avocado Mesocarp and Seed*

KEGG analysis indicated that a total of 104 and 59 isoforms corresponded to the 11 putative genes in the carotenoid biosynthetic pathway in the avocado mesocarp and seed, respectively (Figure 6). Each of the 11 putative genes was represented by 2 to 23 isoforms in the mesocarp and by 1 to 15 isoforms in the seed. *PaPSY* had the largest number of isoforms in both the mesocarp and the seed. For 10 of the 11 putative genes, the number of isoforms was 1.33- to 5.50-fold higher in the mesocarp than in the seed. In contrast, the number of isoforms corresponding to *PaCYP97A* was lower in the mesocarp than in the seed.

**Figure 6.** Number of single-molecule real-time (SMRT) isoforms corresponding to the putative 11 genes in the carotenoid biosynthetic pathway.

#### *2.7. Verification of Transcriptome Profiling in Carotenoid Biosynthetic Pathway between the Avocado Mesocarp and Seed by Metabolite Profiling via HPLC*

Finally, to validate the NGST and SMRT transcriptome profiling of the carotenoid biosynthetic pathway in the avocado mesocarp and seed, the alpha- and beta-carotene contents were measured by HPLC during the five mesocarp and seed developmental stages (Figure S3). The mesocarp alpha- and beta-carotene contents increased slightly from 75 days after full bloom (DAFB) (0.21 and 0.13 µg/g fresh weight (FW), respectively) to 110 DAFB (0.24 and 0.19 µg/g FW, respectively). They then decreased to their lowest levels (0.18 and 0.12 µg/g FW, respectively) at 145 DAFB, before increasing again up to 215 DAFB, peaking at 0.27 and 0.28 µg/g FW, respectively (Figure 7). Trace amounts of alpha- and beta-carotene were detected in developing seeds, with the contents fluctuating between 0.01 and 0.02 µg/g FW from 75 to 215 DAFB (Figure 7).

**Figure 7.** Alpha-carotene and beta-carotene contents during five avocado mesocarp (**a**) and seed (**b**) developmental stages.

#### **3. Discussion**

Because it is inexpensive and rapid, transcriptome sequencing is useful for obtaining a large number of unigene sequences for an organism that lacks an available reference sequence [38]. To the best of our knowledge, for avocado, NGST transcriptome sequencing has been used to investigate fatty acid biosynthesis [39–41], but not any other metabolic biosynthetic pathway. Within our transcriptome assembly, 109.13 and 104.10 Gb of sequence data were generated for the avocado mesocarp and seed, respectively, during five developmental stages. Additionally, the 100,837 identified unigenes may be useful for subsequent analyses of metabolic biosynthetic pathways in avocado or related species. The N50 and mean lengths of the avocado unigenes in our study were 1725 and 847.40 bp, respectively, which suggests that our sequence assembly was accurate and effective. The N50 value in this study was higher than those obtained for avocado mesocarp samples from four developmental stages (1050 bp) [41] and for our previous avocado samples from five mixed organs (1283 bp) [42], whereas the mean length in this study was lower than those obtained in both studies (987 and 922 bp) [41,42]. One recent advance in transcriptome sequencing technology has been the development of the long-read SMRT sequencing technique, which enables researchers to obtain a substantial number of full-length sequences from a cDNA library [28]. In the current study, the PacBio SMRT system was applied to generate the full-length transcriptomes of avocado mesocarp and seed. The 25.79 and 17.67 Gb of SMRT data produced in this study provide comprehensive insights into the avocado mesocarp and seed, respectively, and might serve as the genetic basis for future research on avocado. Interestingly, the full-length transcriptome sequence described herein is also the first such sequence for a plant species from the family Lauraceae.
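Because the comparison above relies on both the N50 and the mean unigene length, a short Python sketch of how the two statistics are commonly computed may help explain why one can be higher while the other is lower. The function names and the toy length list are hypothetical and are not taken from the assembly described here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical unigene lengths (bp): a few long and several short sequences.
toy_lengths = [2000, 1500, 1000, 500, 300, 200]

print("N50:", n50(toy_lengths))
print("Mean length:", round(mean_length(toy_lengths), 2))
```

The N50 is weighted toward long sequences, whereas the mean is pulled down by many short ones, so an assembly with numerous short unigenes can show a high N50 together with a comparatively low mean length.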

Carotenoids are widely distributed isoprenoid pigments with very diverse biological functions in plants [12]. Carotenoids accumulate as secondary metabolites in leaves [9,22], fruits [21,26,43], and roots [24,25]. The carotenoid biosynthetic pathway has been extensively studied in many photosynthetic and non-photosynthetic organisms, and some researchers confirmed that in most plant species, carotenoid accumulation is mainly controlled by regulating the transcription of genes related to carotenoid biosynthesis [12]. However, the transcript profiles of genes related to carotenoid biosynthesis in avocado fruit remained unclear. In our avocado NGST transcriptome database, we identified 17 unigenes encoding 11 putative enzymes involved in the carotenoid biosynthetic pathway in avocado fruit. Fifteen of the 17 unigenes were more highly expressed in the mesocarp than in the seed at each of the five examined time-points. Meanwhile, the SMRT transcriptome database in our study indicated that, for 10 of the putative genes in the carotenoid biosynthetic pathway, the number of isoforms was higher in the mesocarp than in the seed. Furthermore, the metabolite (alpha- and beta-carotene) profiling via HPLC in the avocado mesocarp and seed during five developmental stages validated the results of our NGST and SMRT transcriptome profiling. These results suggest that the upregulated expression of most unigenes encoding the 11 putative enzymes involved in the carotenoid biosynthetic pathway might contribute to the higher carotenoid pathway flux in the avocado mesocarp than in the seed. In addition, an increase in the gene dosage (isoform number) of most carotenoid biosynthesis-related genes could also accelerate carotenoid accumulation. Previous studies revealed that gene dosage balance affects agronomic traits in plants, and defined the linkage between quantitative traits and gene dosage variation [44–46]. Consequently, we infer that the gene dosage variation and the associated changes in the expression of these unigenes might be important for controlling the carotenoid contents in avocado during the mesocarp and seed developmental stages.

An earlier investigation showed that upregulated *PSY* and *PDS* expression levels are correlated with the total carotenoid content during the tomato fruit maturation stage [47]. Similarly, *PSY*, *ZDS*, *CRTISO*, and *LCY-E* might be key genes for controlling carotenoid contents in *M. cochinchinensis* ripening fruits [21]. Additionally, *LCY-B* expression contributes to the accumulation of carotenoids in papaya [48], kiwifruit [49], and citrus [50] fruits. Another study indicated that *PSY* expression is also related to the alpha- and beta-carotene as well as total carotenoid contents in red pepper fruits [51]. In *B. campestris* L. subsp. *chinensis* var. *rosularis* Tsen and Lee leaves, *LCY-E* and *ZDS* expression may be vital for carotenoid biosynthesis [22]. In celery, *PSY* and *LCY-E* expression may be important for promoting beta-carotene biosynthesis. In the potato tuber, *PSY* expression is considered to increase the beta-carotene content [52]. Welsch [53] also suggested that *PSY* expression mediates beta-carotene accumulation in cassava roots. Thus, analyses of the differences in gene expression profiles may yield new insights into carotenoid biosynthetic mechanisms and identify diverse carotenogenic genes expressed in various developmental stages, tissues, and species as well as in response to specific treatments.

The identification of genes encoding enzymes related to the carotenoid biosynthetic pathway not only facilitates the characterization of physiological functions in higher plants, but also provides useful information relevant for metabolic engineering. On the basis of NGST and SMRT transcriptome sequencing in this study, we investigated the differences in carotenoid biosynthesis between the avocado mesocarp and seed. However, carotenoid biosynthesis involves complex biological processes regulated by many biological pathways (i.e., the MVA, MEP, and carotenoid biosynthetic pathways) and genes. The NGST and SMRT transcriptome database described herein may represent a useful resource for clarifying carotenoid biosynthesis in various avocado tissues. Additionally, to the best of our knowledge, this study is the first to integrate the Illumina and PacBio SMRT sequencing platforms for investigating avocado mesocarp and seed developmental stages via transcriptome sequencing and assembly without a reference genome. We believe that the transcriptome dataset will provide a solid foundation for future functional and genomics-based analyses of avocado, and will be useful for elucidating metabolic biosynthetic mechanisms.
