**4. Materials and Methods**

#### *4.1. Identification of TPSs in Rosaceae*

A total of eight Rosaceae species were included in the identification of TPSs: three Prunoideae, three Maloideae, and two Rosoideae species, all with completely sequenced and annotated genomes. The genome files were mostly downloaded from NCBI (https://www.ncbi.nlm.nih.gov, accessed on 2 August 2021) and GDR (https://www.Rosaceae.org, accessed on 2 August 2021). All genomes were the most recently released chromosome-scale versions; detailed genome information is summarized in Table 1. Six representative sequences of TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g from *Vitis vinifera* from a previous study [28], and one representative sequence of TPS-d from *Abies grandis*, were used as queries to search the corresponding subject protein sequences of each Rosaceae species. Two different methods were used to identify TPSs in the Rosaceae species. First, we ran BLASTP searches across the complete genome with an E-value cut-off of 0.00001 to reduce false positives. Second, we searched with Hidden Markov Model (HMM) profiles of TPS domains using HMMER software with an E-value cut-off of 0.001 [29]. Redundant sequences were removed by manual inspection. Subsequently, we verified all sequences by checking for the Pfam domains PF03936 (metal-binding domain) and PF01397 (N-terminal TPS domain) using PfamScan with default parameters [30], with Pfam-A as the search database. PfamScan searches whole sequences against the Pfam database and annotates sequence blocks as known domains; only significant domains are retained, and different domains within one protein can be predicted simultaneously. Ultimately, genes containing at least one TPS domain were confirmed as members of the TPS gene family and named in numerical order.
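The selection logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes BLASTP results in the standard 12-column tabular format (subject ID in column 2, E-value in column 11), HMMER and PfamScan results already parsed into plain dictionaries, and that candidates from the two searches are pooled (the union is an assumption, as is ordering by protein ID before numbering). The function names and the species prefix `Pp` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

BLAST_EVALUE_CUTOFF = 1e-5   # 0.00001, as stated for the BLASTP search
HMMER_EVALUE_CUTOFF = 1e-3   # 0.001, as stated for the HMMER search
TPS_PFAM_DOMAINS = {"PF03936", "PF01397"}


def blast_hits(outfmt6_lines: Iterable[str],
               cutoff: float = BLAST_EVALUE_CUTOFF) -> Set[str]:
    """Collect subject IDs from 12-column BLAST tabular lines passing the cut-off."""
    hits: Set[str] = set()
    for line in outfmt6_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= cutoff:
            hits.add(subject_id)
    return hits


def hmmer_hits(evalues: Dict[str, float],
               cutoff: float = HMMER_EVALUE_CUTOFF) -> Set[str]:
    """Keep protein IDs whose (pre-parsed, hypothetical) HMMER E-value passes the cut-off."""
    return {pid for pid, evalue in evalues.items() if evalue <= cutoff}


def confirm_tps(candidates: Set[str],
                pfam_domains: Dict[str, Set[str]]) -> List[str]:
    """Retain candidates carrying at least one of the two TPS Pfam domains."""
    return sorted(pid for pid in candidates
                  if pfam_domains.get(pid, set()) & TPS_PFAM_DOMAINS)


def name_tps(confirmed: List[str], prefix: str) -> Dict[str, str]:
    """Assign names in numerical order; the prefix scheme is an assumption."""
    return {pid: f"{prefix}TPS{i}" for i, pid in enumerate(confirmed, start=1)}


if __name__ == "__main__":
    blast = ["q1\tprotA\t90.0\t100\t1\t0\t1\t100\t1\t100\t1e-30\t200"]
    candidates = blast_hits(blast) | hmmer_hits({"protB": 1e-4})
    confirmed = confirm_tps(candidates, {"protA": {"PF03936", "PF01397"}})
    print(name_tps(confirmed, "Pp"))
```

Pooling the two candidate sets before the Pfam check keeps the domain verification as the single final criterion, which mirrors the rule that a gene with at least one TPS domain is accepted.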

#### *4.2. Motif Annotation, Subcellular Localization, and Physical Localization*

The conserved motifs were predicted using the online MEME software with the following settings: classic motif discovery mode; a site distribution of zero or one occurrence per sequence (zoops); a 0-order background model; a maximum of 20 different motifs; a minimum motif width of 6; and a maximum motif width of 50 [31]. A shuffling step was also performed prior to the MEME/MAST analysis to validate the identified motifs. Conserved domains were annotated based on the Conserved Domain Database (CDD v3.19) at NCBI. TargetP and pLoc-mPlant (www.jcibioinfo.cn/pLoc-mPlant/, accessed on 24 August 2021) were used to predict the subcellular localization of TPS proteins [32]. For each species, we obtained the chromosomal positions of the TPSs from the annotation files and drew a sketch map of the genes' physical locations using TBtools [33]. Protein functions were also predicted by BLAST searches against the UniProt and KEGG pathway databases with default parameters.
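The shuffling control can be sketched as follows. The text does not specify how shuffling was performed, so this example assumes a residue-level shuffle that preserves the length and amino acid composition of each protein; the function name `shuffle_sequences` and the fixed seed are illustrative choices only. Motifs recovered from real sequences but absent from such shuffled sequences are less likely to reflect composition bias alone.

```python
import random
from typing import Dict


def shuffle_sequences(seqs: Dict[str, str], seed: int = 0) -> Dict[str, str]:
    """Return a copy of each sequence with its residues randomly permuted.

    Length and residue composition are preserved; residue order is not.
    A fixed seed makes the control set reproducible.
    """
    rng = random.Random(seed)
    shuffled: Dict[str, str] = {}
    for name, seq in seqs.items():
        residues = list(seq)
        rng.shuffle(residues)  # in-place permutation of the residue list
        shuffled[name] = "".join(residues)
    return shuffled


if __name__ == "__main__":
    # Toy sequences, not real TPS proteins.
    example = {"seq1": "MDDLRRWWTEAYHNDTHK", "seq2": "RLADDIGTSIEE"}
    for name, seq in shuffle_sequences(example).items():
        print(name, seq)
```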
