*4.3. Transcriptome Assembly, Annotation, and Coding Sequence Prediction*

Clean data (clean reads) were obtained by discarding reads containing adapters, reads containing ambiguous (poly-N) bases, and low-quality reads in which more than 50% of the bases had a Phred quality score (Q) ≤ 20. The read files established independently for each library/sample were used to assemble the transcriptome with the Trinity program (version 2.5.1) [54]; min\_kmer\_cov was set to 2 and all other parameters were left at their default values. The assembled transcripts were hierarchically clustered into unigenes on the basis of shared reads and expression with the Corset program [55].
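The quality criterion above (discard a read if more than 50% of its bases have Q ≤ 20) can be sketched as a simple per-read filter. This is an illustrative helper, not part of the published pipeline; `low_quality` is a hypothetical function name, and Phred+33 quality encoding (standard Illumina FASTQ) is assumed:

```python
def low_quality(qual_str, q_cutoff=20, frac=0.5, offset=33):
    """Return True if more than `frac` of the bases have Phred quality <= q_cutoff.

    `qual_str` is the FASTQ quality line; Phred+33 encoding is assumed,
    so each quality score is ord(char) - 33.
    """
    low = sum(1 for c in qual_str if ord(c) - offset <= q_cutoff)
    return low / len(qual_str) > frac

# 'I' encodes Q40 (high quality); '#' encodes Q2 (low quality)
print(low_quality("III#####"))  # 5 of 8 bases are low quality -> True
print(low_quality("IIII####"))  # exactly half, not more than 50% -> False
```

In practice this filtering is done by dedicated tools rather than custom scripts, but the criterion they apply is the one shown here.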

Unigenes were functionally annotated with a BLASTX alignment algorithm (E-value threshold of 10<sup>−5</sup>) against the following databases: KOG/COG, Swiss-Prot (a manually annotated and reviewed protein sequence database), Pfam (searched with the HMMER 3.0 package), Nr (non-redundant protein sequences), and Nt (non-redundant nucleotide sequences). The KEGG Automatic Annotation Server [56] was used to map these genes onto the KEGG metabolic pathway database. For pathway enrichment, the rich factor was calculated as (the number of DEGs annotated to a pathway / the number of all DEGs) / (the number of unigenes annotated to that pathway / the number of all unigenes annotated in KEGG). Additionally, Blast2GO (version 2.5) [57] was used to annotate the unigenes with GO terms on the basis of their BLASTX hits against the Pfam and Nr databases, with an E-value cut-off of 10<sup>−6</sup>.

To predict coding sequences, the unigenes were first searched against the Nr and Swiss-Prot databases with a BLAST algorithm, and the open reading frames of the matched sequences were obtained directly from the alignments. The coding sequences of the remaining unigenes were predicted with ESTScan (version 3.0.3) (https://sourceforge.net/projects/estscan/).
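The rich-factor calculation is a simple ratio of ratios and can be expressed directly. The function name and the counts in the example are hypothetical, chosen only to illustrate the arithmetic:

```python
def rich_factor(degs_in_pathway, all_degs, unigenes_in_pathway, all_kegg_unigenes):
    """Enrichment ratio for one KEGG pathway:

    (DEGs in the pathway / all DEGs) divided by
    (unigenes in the pathway / all KEGG-annotated unigenes).

    A value > 1 indicates the pathway is over-represented among the DEGs.
    """
    return (degs_in_pathway / all_degs) / (unigenes_in_pathway / all_kegg_unigenes)

# Hypothetical counts: 30 of 1200 DEGs fall in a pathway that contains
# 150 of 20,000 KEGG-annotated unigenes.
print(round(rich_factor(30, 1200, 150, 20000), 4))  # -> 3.3333
```

Here 2.5% of the DEGs map to a pathway that holds only 0.75% of all annotated unigenes, giving an enrichment of about 3.3-fold.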
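For unigenes with database hits, the coding region follows directly from the alignment; for the rest, ESTScan predicts it statistically. As a toy illustration of the underlying idea of ORF extraction (this is not the ESTScan algorithm, and a real tool would also scan the reverse complement), a minimal forward-strand ORF finder might look like:

```python
def longest_orf(seq):
    """Return the longest ORF (ATG..stop, same reading frame) on the
    forward strand of a DNA sequence, or "" if none is found.

    Scans all three forward frames; stop codons are TAA, TAG, TGA.
    """
    best = ""
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        start = None
        for idx, codon in enumerate(codons):
            if codon == "ATG" and start is None:
                start = idx
            elif codon in ("TAA", "TAG", "TGA") and start is not None:
                orf = "".join(codons[start:idx + 1])
                if len(orf) > len(best):
                    best = orf
                start = None
    return best

# The ORF sits in the third reading frame of this example sequence.
print(longest_orf("CCATGAAATTTTAGGG"))  # -> ATGAAATTTTAG
```

ESTScan improves on this kind of naive scan by using a hidden Markov model of coding statistics, which lets it detect coding regions even in reads with frameshift errors.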
