*4.4. Transcriptome Assembly and Gene Functional Annotation*

After low-quality reads were removed from the raw data, a total of 44.59 Gb of clean data was obtained. Trinity was then used to assemble the clean reads: the reads were broken into shorter k-mers, the k-mers were extended into contigs, the contigs were grouped into clusters, and the transcript sequences were finally resolved within each cluster using the de Bruijn graph algorithm and the read information.
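The k-mer and de Bruijn graph steps described above can be illustrated with a minimal sketch. This is not Trinity's implementation: the function names (`kmers`, `build_de_bruijn`, `extend_contig`), the toy reads, and the greedy extension rule are assumptions made purely for illustration, and real assemblers additionally handle sequencing errors, coverage, and branching paths.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all overlapping k-mers of a read, in order of occurrence."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Each k-mer adds an edge from its (k-1)-base prefix to its
    (k-1)-base suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Greedily extend a contig from `start` along unambiguous edges.

    Extension stops at a dead end, at a branch (more than one distinct
    successor), or when a node would be visited twice (a cycle).
    """
    contig = start
    node = start
    seen = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break
        nxt = successors.pop()
        if nxt in seen:
            break
        contig += nxt[-1]  # each step adds exactly one new base
        seen.add(nxt)
        node = nxt
    return contig
```

With the two toy reads `ATGGCG` and `GGCGTA` and k = 4, calling `extend_contig(build_de_bruijn(reads, 4), "ATG")` merges the overlapping reads into the single contig `ATGGCGTA`.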

Using BLAST, the unigene sequences were compared against the NR, KOG, COG, GO, KEGG, Pfam and Swiss-Prot databases. After the amino acid sequences of the unigenes were predicted, HMMER [63] was used to search them against the Pfam database [64] to obtain functional annotation information for all unigenes.
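As an illustration of how database hits can be reduced to one annotation per unigene, the sketch below keeps the highest-scoring hit for each query from BLAST tabular output (the standard 12-column `-outfmt 6` layout). The function name `best_hits`, the unigene and subject identifiers in the usage note, and the E-value cutoff of 1e-5 are assumptions for illustration only; the cutoff actually used in this study is not stated here.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Select the best hit per query from BLAST tabular (outfmt 6) lines.

    Columns (0-based): 0 = query id, 1 = subject id, 10 = E-value,
    11 = bit score. `max_evalue` is a hypothetical cutoff, not a value
    taken from the study.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is not significant under the assumed cutoff
        # keep the hit with the highest bit score for each query
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Passing an open file handle of tabular BLAST results, for example `best_hits(open("unigene_vs_nr.tsv"))` with a hypothetical file name, returns a dictionary mapping each unigene identifier to its `(subject, evalue, bitscore)` tuple.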
