*2.3. Transcriptome Assembly*

Raw sequencing reads were cleaned using the trim\_galore v0.5.0 and the clean data was quality controlled by FASTQC v0.11.7 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then, clean reads were used to assemble the transcriptome for each sample using Trinity software with default parameters, as described [21]. In detail, high quality RNA-Seq reads were used to generate overlapping k-mers (25) and Inchworm was used to assemble sorted k-mers into transcript contigs based on the (k-1)-mer overlaps. Next, Chrysalis was used to cluster related Inchworm contigs into components by using grouped raw reads and paired read links. Then, a de Bruijn graph for each cluster was built by Chrysalis and reads were partitioned among the clusters. Finally, Butterfly was used to process the individual graphs and ultimately report the full-length transcripts. To remove redundant sequences, CD-HIT was used to cluster the assembled highly similar transcripts into Unigenes [22], which can be accessed in the NCBI TAS platform under the accession number GIF00000000. BUSCO v4 was used to evaluate the completeness of the assembled Unigenes [23] using the eukaryota\_odb10 dataset.

#### *2.4. Annotation for the Transcriptome*

We annotated the assembled transcriptome by aligning them to di fferent databases. Initially, the transcriptome was searched against the NR (Non-Redundant Protein Sequence Database), NT (Nucleotide Sequence Database), KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, gene ontology (GO), Pfam, and UniProt/SwissProt databases using BLAST software, and hits with an e-value > 1 × 10−<sup>5</sup> were filtered. Then, BLAST2GO was used to retrieve the GO annotation in terms of the biological process, cellular component, and molecular function [24]. Using the enzyme commission numbers produced by BLAST2GO, we mapped the assembled transcriptome to the KEGG pathway database and obtained the pathway annotation. rRNA transcripts were predicted using RNAMMER [25].

#### *2.5. Likely Protein Identification and Annotation*

We next extracted the likely proteins from the assembled transcriptome using TransDecoder. Then, the likely proteins were searched against the UniProtKB/Swiss-Prot database to identify known proteins, functional PFAM domains were identified using HMMER [26], signal peptides were predicted using SignalP [27], and transmembrane domains were predicted using TMHMM Sever v2.0 [28]. The EggNOG database v4.1 [29] was searched against to identify proteins in EuKaryotic Orthologous Groups (KOG), Clusters of Orthologous Groups (COGs), and non-supervised orthologous groups (NOGs). All the annotation for the assembled genes and likely proteins were subjected to the Trinotate v3.1.1 (http://trinotate.github.io) for combination.

Based on the gene annotation and likely protein annotation, we obtained the Unigenes, which were annotated into olfactory gene families, such as OBP, OR, IR, GR, and SNMP, using their names as key words. For the CSP transcripts, we aligned the Unigenes to all the CSP transcripts from NCBI GenBank. Hits with an e-value > 1 × 10−<sup>5</sup> were filtered.

#### *2.6. Noncoding Genes and microRNA Genes*

Unannotated Ashmead genes were processed by the Coding Potential Calculator (CPC v2) with default parameters to identify potential long noncoding genes [30]. To identify potential microRNA (miRNA) genes, we first mapped all the animal mature miRNAs to the noncoding genes with a maximal of two mismatches [31]. Then, MIREAP was used to predict the miRNA precursors and MIRANDA was used to predict the target genes of these miRNAs [32].

#### *2.7. Di*ff*erential Expression Analysis*

We aligned the clean reads of each sample to the Unigenes using Bowtie2 and profiled the gene expression using RSEM [33]. The trimmed mean of the M-values (TMM) method was used for normalization and edgeR was used to identify di fferentially expressed genes with the following cut-o ffs [34]: Count > 5, log2 fold change (log2 fc) >1 or log2 fc < −1, *p*-value < 0.05, and false discovery rate (FDR) < 0.05.

#### *2.8. Function Enrichment Analysis*

We calculated the *p*-value (calculated by Fisher's exact test) and *q*-value (calculated by the R package 'qvalue') for each GO term and KEGG pathway involved in the differentially expressed genes. Enriched terms should satisfy the following criteria: *p*-value < 0.05 and *q*-value < 0.05.

## *2.9. Phylogenetic Analysis*

Phylogenetic trees were reconstructed for OBPs and CSPs using MEGA7 software [35]. We obtained the likely protein sequences for the top 5 highly expressed OBP and CSP transcripts. These sequences together with the homology protein sequences, obtained from NCBI, were subjected to MEGA7 to create phylogenetic trees using the neighbor-joining method. The bootstrap procedure based on 1000 replicates was used to assess node support, and the node support values < 50% were not shown. Figtree v1.4.3 (https://github.com/rambaut/figtree/) was used to visualize the results.
