**3. Discussion**

In the plant kingdom, terpenes are traditionally classified as secondary metabolites. Thousands of terpenes have been identified, and they play significant roles not only in resistance to stress conditions but also in flavor formation [19,20,24,25]. However, each species is capable of synthesizing only a small fraction of them, and this capacity has evolved in plants as a result of selection for increased fitness via better adaptation to the local ecological niche of each species [12]. Terpene synthases are responsible for the synthesis of the various terpene molecules [12]; plant TPS gene families are medium-sized, and the number of family members varies among species [16]. The Rosaceae family, which includes fruit crops and ornamental flowers, has significant economic value; however, a comprehensive molecular evolutionary and functional analysis of its TPSs has been lacking. In this study, we screened eight Rosaceae species for the TPS family. We identified TPSs by searching for proteins that contain both domains as well as proteins that contain only one of the two domains, thus minimizing the chance of missing putative TPSs. We found that the TPS family in Rosaceae is medium-sized, consistent with a previous study [13], ranging from 10 TPSs in *P. mira* to 76 in *R. chinensis*. Loss of either the N-terminal or the C-terminal domain occurred frequently in Rosaceae species.
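The identification strategy described above, which keeps proteins with both domains as well as proteins with a single domain, can be illustrated with a minimal sketch. It assumes that domain hits have already been parsed into a mapping from protein ID to a set of Pfam accessions. The accessions PF01397 (terpene synthase N-terminal domain) and PF03936 (terpene synthase metal-binding domain) are an assumption about which domain models would be used, and the function name and protein IDs are hypothetical.

```python
# Assumed Pfam accessions for the two TPS domains; the text does not name
# the domain models that were actually used.
N_TERM = "PF01397"  # terpene synthase, N-terminal domain
C_TERM = "PF03936"  # terpene synthase family, metal-binding (C-terminal) domain


def classify_tps(domain_hits):
    """Assign each protein to a domain category.

    domain_hits maps a protein ID to the set of Pfam accessions detected in
    it. Proteins carrying neither TPS domain are not putative TPSs and are
    left out of the result.
    """
    classes = {}
    for protein, domains in domain_hits.items():
        has_n = N_TERM in domains
        has_c = C_TERM in domains
        if has_n and has_c:
            classes[protein] = "both"
        elif has_n:
            classes[protein] = "N-terminal only"
        elif has_c:
            classes[protein] = "C-terminal only"
    return classes


# Hypothetical protein IDs, for illustration only.
example_hits = {
    "prot1": {"PF01397", "PF03936"},
    "prot2": {"PF01397"},
    "prot3": {"PF03936", "PF00069"},
    "prot4": {"PF00069"},
}
example_classes = classify_tps(example_hits)
```

Keeping the single-domain categories separate from the "both" category makes it straightforward to count how often each domain has been lost in a given species.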

All the Rosaceae TPSs in this study were assigned to seven known clades, TPS a–g. The number of members in each TPS clade varied among the three Rosaceae subfamilies; for example, more than two TPS-c gene copies were present in the Maloideae and Rosoideae subfamilies, while only one copy was detected in Prunoideae species. The average number of TPSs in Prunoideae species is lower than that in Maloideae and Rosoideae; the absence of a recent whole-genome duplication (WGD), apart from a triplicated arrangement, may have limited the expansion of TPSs in Prunoideae. Additionally, the smaller number of TPSs in the early Prunoideae species *P. mira* and the larger number in modern peach, *P. persica*, further revealed the evolutionary plasticity of the TPS family. Lineage- and species-specific expansions of the TPSs were widely observed within different TPS clades in Rosaceae. The variation in family size and the differentiation of TPSs in Rosaceae may play roles in the specialization of essential traits and in species differentiation. It has been proposed that lineage-specific genes have a greater chance of contributing to phenotypic variation because their roles are not essential. Further synteny analysis showed that segmental and tandem duplications both drove the expansion of the TPS gene family in Rosaceae: in *M. domestica* and *P. persica*, both segmental and tandem duplication contributed to family expansion, whereas in *F. vesca*, tandem duplication played the most important role. Ka/Ks calculations further revealed that TPS genes mainly evolved under purifying selection, except for several gene pairs; the estimated divergence times indicated that the TPS-e clade diverged relatively early.
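The Ka/Ks reasoning above can be sketched as follows. A ratio below 1 is conventionally read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection; the divergence time of a duplicated pair is commonly approximated as T = Ks / (2λ), where λ is the synonymous substitution rate per site per year. The function names, the tolerance around 1, and the rate used in the example are illustrative assumptions, not values taken from this study.

```python
def selection_mode(ka, ks, tol=0.05):
    """Classify a gene pair by its Ka/Ks ratio.

    tol is a hypothetical tolerance band around 1 within which the pair is
    reported as evolving neutrally.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


def divergence_time_mya(ks, rate):
    """Approximate divergence time in million years as Ks / (2 * rate).

    rate is the synonymous substitution rate per site per year and must be
    chosen for the lineage under study.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2 * rate) / 1e6


# Illustrative values only; neither the Ka/Ks pair nor the rate comes from
# the study.
mode = selection_mode(ka=0.12, ks=0.40)
time_mya = divergence_time_mya(ks=0.30, rate=1.5e-8)
```

Because the time estimate scales inversely with the assumed rate, relative comparisons between clades (for example, which clade diverged earlier) are more robust than the absolute dates.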

Since closely related enzymes can differ in their product profiles, subcellular localization, or substrates, prediction strategies based on sequence similarity usually cannot accurately describe the specialized function of TPSs and can only roughly indicate the pathways in which they are involved. In our study, we nevertheless predicted TPS function by BLAST searches against the UniProt and KEGG pathway databases, and the results revealed functional diversity among TPS clades and species. For the three Prunoideae species, the functional classification of TPSs within each clade was relatively conserved, apart from variation in the number of family members. For the other two subfamilies, in contrast, the putative functions of TPSs were highly diverse; for example, up to seven different types of synthases were predicted for the TPS-a clade in *P. betulifolia*. In addition, we observed that the early-diverged TPS-e clade is functionally conserved: most of its members were predicted to be ent-kaur-16-ene synthases, with exceptions in *M. domestica*. Although TPS-e and TPS-f are sister clades that clustered together, we found that their functions have differentiated: TPS-e members mainly participate in diterpenoid biosynthesis, whereas most TPS-f members are involved in the monoterpene biosynthesis pathway. TPS-f has expanded in *P. persica*, and the product of the S-linalool synthases it encodes is an essential aromatic substance in peach fruits [21]. Our findings indicate that the TPS family in Rosaceae species possesses remarkable functional diversity, with different clades expanding in different lineages through gene duplication and divergence. The emergence of altered subcellular localization and new substrate specificities among TPSs is a dynamic process that contributes to trait differentiation. However, experimental data, such as metabonomics and enzyme assays, are still needed to verify these observations.

The expression profiling of the *TPS* gene family in three Rosaceae species (*P. persica*, *M. domestica*, and *F. vesca*) showed that most identified TPSs were expressed in at least one tissue. Most TPSs were expressed specifically in a single tissue, and few were expressed in ripe fruits. Within each TPS clade, the expression pattern also varied among species; for example, TPS-f genes were highly expressed in *P. persica* but expressed at a lower level in *M. domestica*. We also found that many paralogs exhibited divergent expression patterns, either in tissue distribution or in expression abundance, suggesting that expression divergence might contribute significantly to gene survival and functional differentiation after gene expansion. It is worth noting that some of the expressed putative TPSs did not have both domains, whereas many TPSs with both domains were not expressed. This finding suggests that TPS expression is complex: not all complete TPSs are functional, and some functional TPSs may have lost activity in one of the two domains. The putative TPSs lacking one domain are assumed to have arisen through partial duplication and to be pseudogenes that have lost their original function. A previous study found that 12% of pseudogenes still contained detectable open reading frames and were effectively expressed. The resulting transcripts may contribute to the synthesis of small interfering RNA species that regulate the parent transcripts [26]. Fast-evolving families involved in ubiquitination and secondary metabolism tend to contain the highest numbers of pseudogenes [26,27]. Hence, the functionality of the expressed "fragmental" TPSs in our study still needs further investigation.
