*2.2. Functional Annotation and Classification of the Unigenes*


Functional annotation of the unigenes was performed by searching for homologues in the NCBI non-redundant protein sequence database (*N*r), the NCBI nucleotide sequence database (*N*t), Pfam (Protein family), the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot (SwissProt), Gene Ontology (GO), and Eukaryotic Orthologous Groups (KOG) using the Basic Local Alignment Search Tool (BLAST) [18]. An e-value cut-off of 10<sup>−5</sup> was applied for homologue recognition. The results are shown in Table 2. In total, 64,551 unigenes (55.12%) were annotated in at least one database and 8973 unigenes (7.66%) were annotated in all databases; 51,105 unigenes (43.64%) were annotated in the *N*r protein database.
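As an illustration of how an e-value cut-off of this kind is typically applied, the following minimal Python sketch counts the unigenes that have at least one BLAST hit at or below the threshold. It assumes BLAST tabular output (`-outfmt 6`), in which the query ID is the first column and the e-value is the eleventh; the function names, the toy records, and the use of "less than or equal to" for the cut-off are assumptions made for illustration and are not taken from the original pipeline.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the text


def annotated_unigenes(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Return the set of query IDs with at least one hit at or below the cutoff.

    Assumes BLAST tabular output (-outfmt 6): tab-separated fields with the
    query ID in column 1 and the e-value in column 11.
    """
    hits = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])
        if evalue <= cutoff:
            hits.add(query_id)
    return hits


def percent_annotated(n_annotated, n_total):
    """Percentage of unigenes annotated, rounded to two decimals."""
    return round(100.0 * n_annotated / n_total, 2)


# Toy records (hypothetical IDs and scores), not data from the study.
example = io.StringIO(
    "unigene_1\tsubj_A\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
    "unigene_2\tsubj_B\t40.0\t50\t30\t0\t1\t50\t1\t50\t2.0\t20\n"
    "unigene_3\tsubj_C\t75.0\t80\t20\t0\t1\t80\t1\t80\t1e-6\t90\n"
)
hits = annotated_unigenes(example)
share = percent_annotated(len(hits), 3)
```

Here `hits` contains only the two toy unigenes whose best e-value passes the threshold, and `share` is the corresponding percentage of the three toy queries.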

**Table 2.** Summary of function annotation of the *Betula halophila* transcriptome.


The GO analysis indicated that a total of 41,116 unigenes were assigned to the three main GO categories (biological process, cellular component, and molecular function) and 56 sub-categories (Figure 2). In the biological process category, genes involved in cellular process, metabolic process, and single-organism process were dominant. In the cellular component category, genes involved in cell, cell part, and organelle were highly represented. The molecular function category mainly included genes involved in binding and catalytic activity.

Total unigenes: 117,091 (100%)

*Int. J. Mol. Sci.* **2018**, *19*, x 4 of 13

**Figure 2.** Gene ontology (GO) classification of unigenes. The GO terms are summarized into three main categories: biological process, cellular component, and molecular function.

The KOG analysis showed that a total of 15,572 unigenes were divided into 26 functional classes, represented by the letters A to Z (Figure 3). Among the 26 categories, the largest group was 'Post-translational modification, protein turnover, chaperones' (2104, 13.51%), followed by 'General function prediction' (1906, 12.24%), 'Translation, ribosomal structure, and biogenesis' (1552, 9.97%), 'RNA processing and modification' (1272, 8.17%), and 'Signal transduction' (1248, 8.01%). The smallest groups were 'Cell motility' (7, 0.04%) and 'Unnamed protein' (2, 0.01%).

**Figure 3.** Eukaryotic Orthologous Groups (KOG) classification of the unigenes.

The KEGG pathway analysis revealed that 18,876 (16.12%) of the unigenes could be mapped to the KEGG database and were assigned to 129 pathways (Figure 4). The pathway involving the highest number of unigenes was 'Translation' (1915, 10.14%), followed by 'Folding, sorting, and degradation' (1465, 7.76%), 'Carbohydrate metabolism' (1388, 7.35%), and 'Overview' (1012, 5.36%). These results provide an important basis for studying the mechanism of the *B. halophila* response to salt.


**Figure 4.** Kyoto Encyclopedia of Genes and Genomes (KEGG) classification of KO annotated unigenes.

#### *2.3. Differential Expression Genes in B. halophila Response to Salt*

To identify genes that are differentially expressed in response to salt in *B. halophila*, we compared the differentially expressed tags of the two libraries. As a result, a total of 519 differentially expressed genes (DEGs) with *q* value < 0.05 and |log<sub>2</sub>(fold change)| > 1 were identified in the two libraries (Table S1). As shown in Figure 5a, there were more down-regulated genes (351) than up-regulated genes (168). Among these DEGs, 332 were present in both libraries, 66 were detected only in the salt stress library, and 121 were detected only in the control library (Figure 5b). In this study, the transcription factor AT-Hook Motif Nuclear Localized gene (AHL) was the most up-regulated gene in leaves after salt stress. Conversely, a dehydrin (DHN) was the most down-regulated gene. These results suggest that the two genes may be closely associated with the salt resistance of *B. halophila*. The GO and KEGG classifications of the 519 DEGs were analyzed (Figure S1). GO and KEGG enrichment analyses were performed to further analyze the functions of the 519 DEGs.
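The selection criteria stated above (*q* value < 0.05 and |log2(fold change)| > 1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GeneResult` record, the function names, and the toy values are hypothetical, and the fold change is assumed to be expressed as salt-treated relative to control, so that a positive value means up-regulation.

```python
from dataclasses import dataclass

Q_CUTOFF = 0.05       # q value threshold stated in the text
LOG2FC_CUTOFF = 1.0   # |log2(fold change)| threshold stated in the text


@dataclass
class GeneResult:
    gene_id: str
    log2_fold_change: float  # assumed: salt-treated relative to control
    q_value: float


def is_deg(gene):
    """True if the gene passes both the q value and fold-change thresholds."""
    return gene.q_value < Q_CUTOFF and abs(gene.log2_fold_change) > LOG2FC_CUTOFF


def split_degs(results):
    """Return (up-regulated IDs, down-regulated IDs) among the DEGs."""
    up = [g.gene_id for g in results if is_deg(g) and g.log2_fold_change > 0]
    down = [g.gene_id for g in results if is_deg(g) and g.log2_fold_change < 0]
    return up, down


# Toy values (hypothetical), not data from the study.
results = [
    GeneResult("gene_a", 3.2, 0.001),   # passes both thresholds, up
    GeneResult("gene_b", -2.5, 0.010),  # passes both thresholds, down
    GeneResult("gene_c", 0.4, 0.001),   # fold change too small
    GeneResult("gene_d", -4.0, 0.200),  # q value too large
]
up, down = split_degs(results)
```

Applied to a full result table, the lengths of the two lists would correspond to the up-regulated and down-regulated counts reported for the comparison.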


**Figure 5. A**. Up-regulated and down-regulated differentially expressed genes in SC vs. CK; **B**. Venn diagrams showing unique and shared differentially expressed genes (DEGs) in SC (green) vs. CK (purple); **C**. Scatterplot of GO category enrichment of DEGs in SC vs. CK; **D**. Scatterplot of enriched KEGG pathways for DEGs in SC vs. CK. Rich factor is the ratio of the number of differentially expressed genes to the total number of genes in a given pathway. The size and color of each dot represent the gene number and the range of the q value, respectively.
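The rich factor defined in the caption is a simple ratio, which the following minimal Python sketch makes explicit. The function name and the example counts are hypothetical and serve only to illustrate the definition.

```python
def rich_factor(n_deg_in_pathway, n_genes_in_pathway):
    """Ratio of DEGs in a pathway to all genes annotated to that pathway."""
    if n_genes_in_pathway <= 0:
        raise ValueError("pathway must contain at least one annotated gene")
    return n_deg_in_pathway / n_genes_in_pathway


# Hypothetical pathway with 50 annotated genes, 5 of which are DEGs.
example_rich_factor = rich_factor(5, 50)
```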

#### *2.4. GO category Enrichment of DEGs Under Salt Stress*

To characterize the function of the DEGs under salt stress, GO category enrichment analysis was performed using Fisher's exact test with *p* value ≤ 0.05 as the cutoff. The analysis of the 519 DEGs showed that they were mainly involved in the biological processes 'plant-type cell wall organization' and 'plant-type cell wall organization or biogenesis', the cellular component 'cell wall', and the molecular function 'structural constituent of cell wall' (Figure 5c, Table S2). For the up-regulated DEGs, the molecular function 'metalloendopeptidase activity' was most highly enriched (Table S3). For the down-regulated DEGs (Figure 5c), in the biological process (BP) category, 'plant-type cell wall organization', 'plant-type cell wall organization or biogenesis', 'cell wall organization', and 'external encapsulating structure organization' were most highly enriched. In the cellular component (CC) category, 'cell wall', 'cytosolic part', 'cytosol', and 'external encapsulating structure' were the main enriched terms. In the molecular function (MF) category, the most enriched term was 'structural constituent of cell wall' (Table S4).
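For readers unfamiliar with how Fisher's exact test is used for term enrichment, the sketch below computes a one-sided p value from the hypergeometric distribution using only the Python standard library (Python 3.8 or later for `math.comb`). It is a minimal illustration, not the software used in the study: the choice of a one-sided (over-representation) test is an assumption, and the function name, argument names, and toy counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p value for over-representation.

    k: DEGs annotated with the GO term
    n: total DEGs
    K: background genes annotated with the GO term
    N: total background genes
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)  # largest possible overlap
    total = comb(N, n)  # number of ways to draw n genes from the background
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy counts (hypothetical): 20 background genes, 5 carry the term,
# 4 DEGs in total, 3 of which carry the term.
p_value = fisher_enrichment_p(k=3, n=4, K=5, N=20)
is_enriched = p_value <= 0.05  # cutoff stated in the text
```

With these toy counts the tail sum is (10 × 15 + 5 × 1) / 4845, so the term would pass the 0.05 cutoff. In practice a multiple-testing correction is usually applied across all tested terms, which this sketch does not do.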

