**1. Introduction**

Avocado (*Persea americana* Mill.) is a member of the family Lauraceae in the order Laurales and is widely grown in countries and regions with a tropical-to-cool climate [1–3]. Avocado is among the most economically important subtropical/tropical fruit crops worldwide, with considerable increases in yield reported in several countries, including Mexico, the USA, Indonesia, Chile, Spain, Israel, Colombia, South Africa, and Australia [4]. Certain avocado constituents, such as carotenoids, lipids, sugars, proteins, minerals, vitamins, and other nutrients and active ingredients, provide nutritional and health benefits [5–7].

Carotenoids in fruits have been extensively studied because of their nutritional benefits for humans [5,8]. Moreover, some carotenoids serve as precursors of vitamin A, strigolactones, and abscisic acid, and some apocarotenoids are also potent antioxidants and colorants [9]. Additionally, carotenoid derivatives substantially affect the aromatic flavor of fruits, thereby making the fruits more desirable to consumers and seed dispersers [10,11]. Carotenoids are natural yellow-to-red pigments, mainly C40 terpenoids, that occur in most photosynthetic organisms as well as in some non-photosynthetic fungi and bacteria [12]. In plants, carotenoids primarily contribute to photosynthesis, photomorphogenesis, photoprotection, light-harvesting processes, and growth and development [12–14]. The plant carotenoid metabolic pathway has been well elucidated. This pathway consists of a series of chemical reactions, including condensations, dehydrogenations, cyclizations, hydroxylations, and epoxidations [12].

Next-generation high-throughput sequencing technologies (NGST) have recently become popular options for transcriptome sequencing experiments because they enable high-throughput, efficient, accurate, and reproducible analyses [15–20]. Previous studies suggested that changes in carotenoid biosynthesis and accumulation are correlated with changes in the expression of genes encoding carotenoid metabolic pathway enzymes [8,12]. Transcriptome sequencing methods have been used to investigate the carotenoid biosynthetic mechanism in many species, including *Momordica cochinchinensis* [21], *Brassica campestris* [22], green alga [23], celery [9], carrot [24,25], and *Euscaphis konishii* [26]. However, few reports have used transcriptome sequencing to investigate the carotenoid biosynthetic mechanism in avocado.

Among the third-generation sequencing platforms, PacBio RS II, which is regarded as the first commercialized third-generation sequencer, is based on single-molecule real-time (SMRT) technology [27]. The PacBio RS II system can produce much longer reads than second-generation sequencing platforms, and has been applied to effectively capture full-length transcript sequences [28]. SMRT technology has three main advantages over second-generation sequencing options: it generates longer reads, it has a higher consensus accuracy, and it is less biased [29]. A previous study revealed that SMRT technology can precisely ascertain alternative polyadenylation sites and full-length splice isoforms, and can also detect a higher isoform density than that annotated in the reference genome [30]. Over nearly 3 years, the application of SMRT technology has helped to elucidate the complexity of the transcriptome and the molecular mechanisms underlying metabolite synthesis in safflower [27], *Zanthoxylum bungeanum* [28], *Trifolium pratense* [30], sugarcane [31], switchgrass [32], *Medicago sativa* [33], *Zanthoxylum planispinum* [34], bermudagrass [35], *Camellia sinensis* [36], and *Cassia obtusifolia* [37]. To our knowledge, the application of SMRT technology to a plant species in the family Lauraceae has not yet been reported.

In this study, the Illumina HiSeq 2000 next-generation sequencing platform and the PacBio RS II third-generation sequencing platform were combined to generate transcriptome data for exploring the carotenoid biosynthetic pathways in the avocado mesocarp and seed. We integrated the gene dosage variation and the associated changes in gene expression to identify genes that are likely important for carotenoid accumulation. Additionally, metabolite profiling (alpha- and beta-carotene contents) via high-performance liquid chromatography (HPLC) was used to provide supporting validation of the transcriptomic analyses. The data obtained in this comprehensive study, which combines full-length transcript sequences with a de novo transcriptome assembly from short-read sequencing, will be useful for investigating the main physiological, biochemical, and molecular metabolic mechanisms in the avocado mesocarp and seed.
