*4.6. De Novo Transcriptome Assembly and Annotation*

De novo transcriptome assembly was performed with clean data, obtained from the raw data by removing reads containing adaptors, reads with more than 10% unknown nucleotides, and low-quality reads. The data were assembled using the Trinity platform [51] with the parameters 'K-mer = 31' and 'K-mer cover = 6'. First, short reads with a certain length of overlap were combined to form longer contigs. Then, based on their paired-end information, clean reads were mapped back to the corresponding contigs. The resulting transcript sequences were defined as unigenes. All assembled unigenes were then annotated using BLASTx (E-value ≤ 1 × 10<sup>−5</sup>) against protein databases, including the National Center for Biotechnology Information non-redundant (Nr, ftp://ftp.ncbi.nih.gov/blast/db/), Swiss-Prot (https://www.uniprot.org/), Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/), and Gene Ontology (GO, http://geneontology.org/) databases. Nr and Swiss-Prot annotate gene function, KEGG is used to understand biological systems, and GO divides genes into functional categories. When a unigene matched more than one database, the BLASTx results were used in the following order of priority: Nr, Swiss-Prot, KEGG, and COG. When a unigene could not be aligned to any of these protein databases, its protein-coding sequence and sequence direction were determined using the ESTScan program.
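The priority rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `assign_annotation`, the input layout (one best BLASTx hit per database, given as a subject identifier and an E-value), and the example identifiers are assumptions made for illustration. Only the E-value cutoff and the database order are taken from the text.

```python
from typing import Dict, Optional, Tuple

# Values taken from the text: E-value cutoff and database priority order.
E_VALUE_CUTOFF = 1e-5
DB_PRIORITY = ("Nr", "Swiss-Prot", "KEGG", "COG")


def assign_annotation(
    hits: Dict[str, Tuple[str, float]]
) -> Optional[Tuple[str, str]]:
    """Return (database, subject_id) from the highest-priority database
    whose best hit passes the E-value cutoff, or None if no hit passes.

    `hits` maps a database name to its best hit as (subject_id, e_value).
    This input layout is hypothetical, not a BLAST output format.
    """
    for db in DB_PRIORITY:
        hit = hits.get(db)
        if hit is not None and hit[1] <= E_VALUE_CUTOFF:
            return db, hit[0]
    # No database produced a significant hit; in the workflow described
    # above, such unigenes would be passed to ESTScan instead.
    return None


# Hypothetical example: the Nr hit is too weak, so the KEGG hit is used.
example_hits = {"Nr": ("protein_A", 1e-3), "KEGG": ("K00001", 1e-20)}
selected = assign_annotation(example_hits)
```

Here `selected` is the KEGG hit, because the Nr hit fails the cutoff and no Swiss-Prot hit is present. A unigene for which the function returns `None` corresponds to the case handled by ESTScan in the text.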
