## *2.5. Ka/Ks Ratios of TPS Family Members in Rosaceae*

In addition, based on the phylogenetic relationships in Figure 3, we treated TPS pairs derived from recent duplication as paralogs. There are two types of paralogs: "between-species paralogs" and "within-species paralogs" [22]. In this study, we used only the latter type. As a result, a total of 82 TPS paralogs from recent duplication were found in the six Rosaceae species (*P. persica*, *P. mume*, *M. domestica*, *P. betulifolia*, *F. vesca*, *R. chinensis*). To explore the selection pressure acting on TPS evolution, Ka/Ks values were calculated for the six Rosaceae species (Table S3); Ka/Ks < 1 implies purifying selection, Ka/Ks = 1 represents neutral evolution, and Ka/Ks > 1 indicates positive selection. The Ka/Ks values of TPS paralogs were mostly less than one, suggesting that these genes evolved under purifying selection. Five paralogous pairs had Ka/Ks values greater than one, including three pairs from the TPS-a clade (Ro.chi-TPS18/Ro.chi-TPS20, Ma.dom-TPS14/Ma.dom-TPS13, and Ma.dom-TPS6/Ma.dom-TPS8), one pair from the TPS-b clade (Fr.ves-TPS11/Fr.ves-TPS9), and one pair from the TPS-c clade (Ma.dom-TPS29/Ma.dom-TPS30), which indicates that they evolved under positive selection. Divergence times were calculated from Ks values, which showed that five paralogous pairs diverged less than 5.5 Mya; in particular, the divergence time of Ro.chi-TPS18 and Ro.chi-TPS20 was estimated to be around 0.55 Mya, indicating that these paralogs diverged recently. We observed that divergence time varied among clades. The divergence times of paralogs from the TPS-e clade were all greater than 11.4 Mya, earlier than those in other clades, indicating that the TPS-e paralogs diverged relatively long ago.
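The Ka/Ks interpretation and the Ks-based dating described above can be illustrated with a minimal Python sketch. It assumes the commonly used conversion T = Ks / (2r), where r is the synonymous substitution rate per site per year; the rate actually used is not stated in this section, so `ASSUMED_RATE` below is a placeholder, and the function names and example inputs are hypothetical rather than taken from Table S3.

```python
# Placeholder synonymous substitution rate (substitutions per site per year).
# ASSUMPTION: this section does not state the rate; replace with the one used.
ASSUMED_RATE = 1.5e-8


def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Label a paralogous pair by its Ka/Ks ratio.

    Ka/Ks < 1 -> purifying, Ka/Ks = 1 -> neutral, Ka/Ks > 1 -> positive.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


def divergence_time_mya(ks: float, rate: float = ASSUMED_RATE) -> float:
    """Estimate divergence time in million years ago as Ks / (2 * rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate) / 1e6


if __name__ == "__main__":
    # Hypothetical Ka and Ks values for illustration only.
    ka, ks = 0.02, 0.10
    print(classify_selection(ka, ks), round(divergence_time_mya(ks), 2))
```

Because the estimated time scales inversely with the rate, any comparison of divergence times across clades depends on applying the same rate to every pair.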

## *2.6. Function Diversity of TPSs in Rosaceae*

For the TPSs of the six representative plants shown in Figure 3, we predicted function by BLAST searches against the UniProt and KEGG pathway databases; subcellular localization was predicted using TargetP [23] and pLoc-mPlant (www.jci-bioinfo.cn/pLocmPlant/, accessed on 24 August 2021). Detailed information is listed in Tables 3 and S2. Subcellular localization analysis indicated that TPSs were predominantly localized to the cytoplasm and chloroplasts, and only a few were localized to mitochondria. TPSs from the TPS-a clade were mostly located in the cytoplasm and chloroplasts and showed diverse functions. Most of the characterized TPSs (~95%) in the TPS-a clade are involved in the sesquiterpenoid and triterpenoid biosynthesis pathway and the monoterpene biosynthesis pathway, using geranyl diphosphate (GPP) and farnesyl diphosphate (FPP) as substrates. In *M. domestica* and *P. betulifolia*, however, several TPSs from the TPS-a clade are involved in diterpenoid biosynthesis, which uses geranylgeranyl diphosphate (GGPP) as substrate. This finding suggests that both the cytosolic mevalonic acid (MVA) pathway and the plastidic methylerythritol phosphate (MEP) pathway are represented in the TPS-a clade. All characterized TPSs in the TPS-b clade are involved in either the sesquiterpenoid or the monoterpene biosynthesis pathway. Members of the TPS-g clade mainly produce acyclic mono- and sesquiterpenoid products. The TPS-c and TPS-e clades mainly participate in diterpenoid biosynthesis. Surprisingly, despite the close relationship between the TPS-f and TPS-e/c clades, we found that TPS-f members are involved in the sesquiterpenoid and triterpenoid biosynthesis pathway and the monoterpene biosynthesis pathway. Furthermore, we observed that TPSs from the same clade can participate in different pathways; for example, TPS-e members were all predicted to be ent-kaur-16-ene synthases, except for some in *M. domestica* that are involved in other pathways.

In the Prunoideae subfamily, the functions of TPSs in the different clades were relatively conserved among the three plants, and more TPS copies from the TPS-a clade involved in the monoterpene biosynthesis pathway were detected in *P. persica*. However, the functional diversity of TPSs is more pronounced in Maloideae and Rosoideae species, as seen in the TPS-a clade of *P. betulifolia*. These findings indicate that the TPS family in Rosaceae species possesses remarkable flexibility in evolving enzyme substrate specificity, and that different clades have expanded in different lineages through gene duplication and divergence. Proteins with altered subcellular localization and new substrate specificities can therefore be expected to have evolved.



Notes: \*, a1 represents the catalytic type of TPS listed in the second row; number in the bracket "(19)" represents family number of this TPS in the corresponding species.
