*4.4. Transcriptome Assembly and Bioinformatics Analysis*

Transcriptome assembly was accomplished based on the left.fq and right.fq using Trinity [17] with min\_kmer\_cov set to 2 by default and all other parameters set default. In brief, the left files (read1 files) from all libraries/samples were pooled into one big left.fq file, and right files (read2 files) into one big right.fq file. Gene function was annotated based on Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (A manually annotated and reviewed protein sequence database), KO (KEGG Ortholog database), GO (Gene Ontology) databases. Gene expression levels were estimated by RSEM [41] for each sample. Clean data were mapped back onto the assembled transcriptome. Read counts for each gene were obtained from the mapping results. For differential expression analysis, prior to differential gene expression analysis, for each sequenced library, the read counts were adjusted by edgeR program package [42] through one scaling normalized factor. Differential expression analysis of six samples was performed using the DEGseq R package [43].

All transcripts were searched against the latest versions (as of August 2018) of Nr (nonredundant) database (http://www.ncbi.nlm.nih.gov/) and the Swiss-Prot database (http://www.gpmaw.com/ html/swiss-prot.html) using the BLAST program with an *e* < 10−<sup>5</sup> . The transcripts with the top hits were selected as unigenes. Open reading frames (ORFs) were predicted using the GetORF program contained in the EMBOSS software package. The Blast2GO program was used for GO annotation (http: //www.geneontology.org), and the unigenes were aligned to the eggNOG (evolutionary genealogy of genes: non-supervised orthologous groups) database (http://www.ncbi.nlm.nih.gov/COG/) to identify functional categories. The KEGG database (http://www.genome.jp/kegg/) was used for pathway annotation. All searches were conducted using an e-value cut-off of 10−<sup>5</sup> . GO terms were downloaded from the GO Analysis Toolkit and Database for Agriculture Community (AGRI go, http: //bioinfo.cau.edu.cn/agriGO/download.php). All the genes identified with significant differential expression (*p* < 0.05) and FC >2 in this study were used as inputs to carry out GO enrichment

analysis. Gene Ontology (GO) enrichment analysis of the differentially expressed genes (DEGs) was implemented by the GOseq R packages based Wallenius non-central hyper-geometric distribution [44] that can adjust for gene length bias in DEGs. KEGG pathway enrichment analysis used KOBAS [19] software to test the statistical enrichment of the differential expression genes in the KEGG pathways. In the scatterplot, the rich factor is the ratio of the differentially expressed gene number to the total gene number in a certain pathway.
