*Article* **Global Profiling of Genes Expressed in the Silk Glands of Philippine-Reared Mulberry Silkworms** *(Bombyx mori)*

**Pauline Nicole O. de la Peña, Adria Gabrielle D. Lao and Ma. Anita M. Bautista \***

National Institute of Molecular Biology and Biotechnology, University of the Philippines Diliman, Quezon City 1101, Philippines; podelapena1@up.edu.ph (P.N.O.d.l.P.); adlao@up.edu.ph (A.G.D.L.) **\*** Correspondence: mmbautista20@up.edu.ph

**Simple Summary:** The sericulture industry in the Philippines remains to be enhanced through efforts to improve local strains of the silkworm *Bombyx mori,* such as selective hybridization. Genetic trait markers could help improve breeding; however, there is a scarcity of genetic data on local strains. This study aims to bridge this gap by analyzing the gene expression profiles of Philippine-reared silkworm strains through next-generation sequencing. Transcriptome assemblies were generated, and gene expressions were compared between silkworm strains reared in different temperatures, revealing genes that may be important for development in a tropical environment. This study is the first to provide transcriptome datasets for the Philippine-reared *B. mori* strains, which may serve as a resource for improving local strains and increasing silk production.

**Abstract:** RNA sequencing was used to assemble transcriptome data for Philippine-reared silkworm and compare gene expression profiles of strains reared in high- and low-temperature environments. RNA was isolated from the silk glands of fifth instar larvae and mRNA-enriched libraries were sequenced using Illumina NextSeq 500. Transcriptome reads were assembled using reference-based and de novo assemblers, and assemblies were evaluated using different metrics for transcriptome quality, including the read mapping rate, E90N50, RSEM-eval, and the presence of single-copy orthologs. All transcriptome assemblies were able to reconstruct >40,000 transcripts. Differential expression analysis found 476 differentially expressed genes (DEGs; 222 upregulated, 254 downregulated) in strains reared in different temperatures. Among the top DEGs were myrosinase, heat shock proteins, serine protease inhibitors, dehydrogenases, and regulators of the juvenile hormone. Validation of some of the top DEGs using qPCR supported the findings of the in silico analysis. GO term enrichment analysis reveals an overrepresentation of GO terms related to nucleotide metabolism and biosynthesis, lipid and carbohydrate metabolic processes, regulation of transcription, nucleotide binding, protein binding, metal binding, catalytic activity, oxidoreductase activity, and hydrolase activity. The data provided here will serve as a resource for improving local strains and increasing silk production of Philippine-reared *B. mori* strains.

**Keywords:** silkworm; *Bombyx mori*; RNA-seq; silk gland; transcriptome analysis

#### **1. Introduction**

The mulberry silkworm *Bombyx mori* is an economically important insect for the commercial production of silk, which is highly valued for its texture and luster, and for being one of the strongest natural fibers [1]. Silk has many uses such as in garments, upholstery, parachute linings, insulation, and as suturing material in surgeries. It is also considered a promising biomaterial in tissue engineering due to its mechanical strength and biocompatibility [2].

Since sericulture was introduced to the Philippines in the 1970s, local silkworm stocks have originated from Chinese, Japanese, and European strains [3]. Previous studies determined that temperate strains produce finer and stronger silk fibers, while tropical strains

**Citation:** de la Peña, P.N.O.; Lao, A.G.D.; Bautista, M.A.M. Global Profiling of Genes Expressed in the Silk Glands of Philippine-Reared Mulberry Silkworms *(Bombyx mori)*. *Insects* **2022**, *13*, 669. https:// doi.org/10.3390/insects13080669

Academic Editor: Silvia Cappellozza

Received: 24 June 2022 Accepted: 20 July 2022 Published: 24 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

are more robust and disease-resistant but produce coarser and weaker silk fibers [4]. There have been efforts to breed hybrids that are acclimatized to the Philippine climate with desired phenotypic traits such as a high hatch rate, cocoon yield, cocoon weight, and silk filament quality [5–7]. Local silk production, however, has been reported to continue lagging behind demand, with 10 metric tons of demand annually and only 1 metric ton of raw silk production [8]. Low production has been attributed to the quality of local silkworm strains and poor rearing management [9]. The low silk yield may be improved by understanding how it is produced. The molecular mechanisms behind the different silk yields in silkworm strains, for example, need to be investigated as they are not yet well-understood [10,11]. Knowledge to be generated from molecular studies may therefore help in breeding local silkworm strains with a capacity to produce high silk yield. Markers associated with high silk yield and other economically important traits may also be developed for marker-assisted selection and breeding. Thus, this endeavor would need genetic resources, which can be sourced from genome and transcriptome sequences of foreign strains of *B. mori* in public databases such as the Silkworm Genomic Database (SilkDB) [12], KAIKObase [13], SilkTransDB (transcriptome) [14], SilkSatDb (microsatellites) [15], and BmTEdb (transposable elements) [16].

While there are already several genomic resources for the silkworm, transcriptome analysis can be used to gain insight into the functional elements of the genome and its biological pathways. One such technique is RNA sequencing (RNA-seq), which utilizes next-generation sequencing to provide high-throughput and high-resolution data to analyze expressed genes in a cost-efficient manner. RNA-seq can be used to identify novel transcripts, splice variants, and single nucleotide polymorphisms [17]. It also allows quantification over a dynamic range since it detects both highly and lowly expressed genes, making it ideal for gene expression analysis [18]. Recently, RNA-seq has been used by Zhang et al. to reveal gene expression changes in silkworms exposed to hydrogen sulfide [19]. Yokoi et al. also used RNA-seq to generate reference transcriptome data for major tissues of the silkworm p50T strain [20]. However, available transcriptome resources currently lack representative sequences from Philippine-reared *B. mori* strains; hence, the need to generate such resources is deemed important.

This study aimed to generate transcriptome resources for Philippine-reared mulberry silkworms, which can be utilized in future studies for its potential industrial valorization and to address the issue of low silk yield in the Philippines. This investigation focused on genes found in the silk glands of different strains of Philippine-reared silkworm.

#### **2. Materials and Methods**

#### *2.1. Insect and Tissue Collection*

Silkworms reared at the Philippine Textile Research Institute Technology Center in Misamis Oriental (PTRI-TCMO, Villanueva, Misamis Oriental) and the Technology Center of the Department of Science and Technology Cordillera Administrative Region (DOST-CAR, Benguet) were collected in 2018. The ambient temperature in TCMO and CAR at the time of collection was 31.7 ◦C and 23.9 ◦C, respectively. For each site, priority strains (MO204, LAT51, ITA) and a poorly performing strain (DMMMSU119) were used as representative strains. The strains were classified by the rearing facilities as either priority strains or poorly performing strains based on their rearing performance (hatching ratio, larval mortality, and pupation ratio). Economic characteristics such as silk yield and quality were not considered in the classification of priority strains. Three individuals per strain were used as bioreplicates. Silkworms were collected on the 3rd day of the 5th instar larval stage and were characterized morphologically. Larvae were preserved by flash-freezing in liquid nitrogen and stored at −80 ◦C until ready for dissection. Larvae were dissected in a petri dish on ice to collect whole silk glands. Silk glands were chopped using sterile surgical blades and homogenized mechanically using a mortar and pestle.

#### *2.2. RNA Extraction and Sequencing*

Total RNA was extracted from 100 μg of the tissue with the TRIzol reagent (Invitrogen, Life Technologies, Carlsbad, CA, USA). Total RNA was treated with TURBO DNAse (Ambion, Life Technologies, Carlsbad, CA, USA) and purified with the RNA Clean and Concentrator kit (Zymo Research, Irvine, CA, USA). The quality of RNA extracts was visually estimated by running them in 1% agarose gel at 90 V for 40 min. RNA purity was estimated by measuring the 260/280 and 260/230 absorbance ratios in a NanoDrop 2000/2000c Spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA), and RNA was quantified by fluorometry using a Qubit RNA BR Assay kit (Thermo Fisher Scientific) according to the manufacturer's instructions. Finally, extracts were analyzed in the Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA) to evaluate the RNA Integrity Number (RIN). Samples with RIN ≥ 6 were used for library preparation and sequencing.

The RNA samples were diluted and used as input (0.1–4 μg) for mRNA-enrichment library preparation using the TruSeq Stranded RNA Library Prep Kit (Illumina, San Diego, CA, USA). Twelve libraries were prepared (4 strains × 3 bioreplicates) with fragment sizes ranging 131–279 bp. Libraries were denatured, diluted to 1.8 pM, and combined with a spike-in of 1.0% PhiX sequencing control. The libraries were sequenced in the Philippine Genome Center using a NextSeq 500 (Illumina, San Diego, CA, USA) platform and a NextSeq v2 high output kit (Illumina, San Diego, CA, USA) with 300 cycles (2 × 150 pairedend sequencing).

#### *2.3. Preprocessing, Alignment, and Transcriptome Assembly*

Illumina bcl2fastq (v. 2.2.0) was used to convert sequencing base call files to FASTQ format. FastQC was used to evaluate reads and visualize read quality. Raw reads were processed using Trimmomatic (0.39) and fastp (0.20.0) to trim adapter sequences and filter low-quality reads.

The splice-aware aligner STAR (2.7.0f) was used to align reads to the *B. mori* reference genome from NCBI (RefSeq GCF\_000151625.1). The BAM file was processed with RNA-SeQC (1.1.9) to evaluate RNA-Seq data bias based on the total read count, coverage, and expression correlation (RPKM-based estimation of expression levels). Another aligner, HISAT (2.1.0), was used for alignment in preparation for generating the StringTie assembly. HISAT alignment files were used for transcriptome assembly evaluation and expression quantification.

Reference-based assembly was performed using Cufflinks (2.2.1) and StringTie (1.3.6). In addition to reference-based assembly, de novo assembly was also performed using Trinity (2.8.6). The percentage of aligned reads was determined using Bowtie. Contig statistics (number of transcripts, N50, mean, and median contig length) were generated using Trinitystats.pl. Assemblies were evaluated using DETONATE (1.11) to estimate their RSEM-EVAL scores. The completeness of the assemblies was evaluated using BUSCO (v. 3.0.2).

#### *2.4. Expression Quantification and Differential Expression Analysis*

FeatureCounts (1.4.6-p5) was used for gene-level quantification and DESeq2 (1.30.1) was used for gene-level differential expression analysis. Differentially expressed genes (DEGs) were filtered based on the false discovery rate (FDR)/adjusted *p*-value < 0.1 and |log2 fold change| > 1. Figures to visualize DEGs were generated from the plotMA function of DESeq2 and R packages EnhancedVolcano (1.8.0) and pheatmap (1.0.12).

The annotation of DEGs was performed against the NCBI nr protein database with an e-value cut-off of 1 × <sup>10</sup>−5. GO terms were mapped to DEGs by matching the protein IDs to the GO annotations in the newest release of the *B. mori* genome annotations from KAIKObase 4.1.0 (https://kaikobase.dna.affrc.go.jp/KAIKObase\_download.html (accessed on 4 April 2021). The R package goseq (1.42.0) was used to determine GO term enrichment among DEGs. Significantly enriched GO terms were summarized and visualized using REVIGO (http://revigo.irb.hr/ (accessed on 8 April 2021).

#### *2.5. Quantitative Real-Time PCR (qPCR)*

Quantitative real-time PCR assays were carried out using the Bio-Rad CFX96™ Touch system (Bio-Rad Laboratories, Inc., Hercules, CA, USA). Each 10-μL reaction contained 5 μL of 2X iTaq Universal SYBR Green Supermix (Bio-Rad Laboratories, Inc., Hercules, CA, USA), 5 pmol each of forward and reverse primers, and 25 ng of cDNA, following a thermal profile comprising initial denaturation at 95 ◦C for 5 min, 40 cycles of denaturation at 95 ◦C for 10 s, annealing and extension at 60 ◦C for 30 s, and a melt curve assay from 95 ◦C to 65 ◦C in 0.5 ◦C stepwise increments. Primers for select genes differentially expressed between CAR and TCMO were designed using Primer-BLAST and are shown in Table 1.

**Table 1.** Primers used for qPCR validation of differentially expressed genes in *Bombyx mori* strains.


Total RNA was extracted from silkworms reared at the PTRI-TCMO and PTRI-CAR, followed by TURBO DNase treatment as previously described. RNA from two individual silkworms from each location was pooled and used to generate cDNA using the ProtoScript II First Strand cDNA Synthesis Kit (New England Biolabs, Ipswich, MA, USA) according to the manufacturer's recommendations. The cDNA was quantified by fluorometry using the Qubit ssDNA Assay kit (Thermo Fisher Scientific). Serial dilutions (10×) of pooled cDNA samples were prepared to generate a standard curve for each primer set used for the qPCR assays, starting at 100 ng of cDNA. The standard curves were used in conjunction with cycle threshold values to calculate starting quantities of the transcripts, and the expression levels and fold changes of each gene were normalized to *B. mori* ribosomal protein 49 [20]. All reactions were performed in triplicate. Calculations were performed using the Bio-Rad CFX Manager (v. 3.1). Statistical significance was determined using Student's t-test with GraphPad Prism 6.

#### **3. Results**

#### *3.1. Selection of Representative Silkworm Strains*

Representative strains from the two collection sites were selected based on their phenotypic and economic characteristics (Table 2). Each strain can be differentiated by its larval markings. MO204 and DMMMSU119 are milky white, while LAT51 and ITA are beige.


**Table 2.** Characteristics of silkworm strains used in the study.

Cocoon shell percentage: Ratio of the weight of silk shell to the weight of the whole cocoon; Raw silk percentage: Ratio of the weight of raw silk to the weight of the whole cocoon.

The rearing sites had different conditions, particularly temperature and relative humidity (RH), the most important environmental factors in silkworm rearing. During sample collection, the temperature in TCMO was higher by 7.8 ± 0.1 ◦C and RH was also higher by 12%.

#### *3.2. RNA Sequencing and Quality Control*

High-quality RNA was extracted (RIN ≥ 6), and the sequencing libraries had fragment sizes ranging from 131 to 279 bp, as determined by the Agilent TapeStation. Sequencing yielded 50.1 Gb of 169.2 M reads passing filter, with 96.1% of reads having >Q30.

The number of raw reads from RNA sequencing is shown in Table 3. All ITA bioreplicates yielded lower than the expected number of reads. Because the number of reads was deemed insufficient, ITA was no longer included in downstream analyses.


**Table 3.** Statistics of raw RNA-seq read sequences of *Bombyx mori* strains.

Trimmomatic was used for trimming Illumina adapters that were ligated to tag each sample during sequencing, and for quality trimming on leading and trailing sequences. Sequences that were under 36 bp were also dropped. A sliding window of 4:20 was used, where the sequences will be trimmed if the quality drops below Q20 in a 4-base window. After processing with Trimmomatic, most sequences were still retained, indicating that only a small number of sequences were dropped due to having low-quality bases (Figure 1). After Trimmomatic processing, poly G sequences still appeared in overrepresented sequences in FastQC, particularly in forward reads. Poly G sequences are artifacts in NextSeq sequencing due to its two-color filter that calls for G bases when there is no emission detected. Poly G sequences were filtered out using fastp, after which 77.9–83.1% of the raw reads were retained.

**Figure 1.** *Bombyx mori* raw read counts and proportion of reads after processing.

#### *3.3. RNA-seq Read Alignment to the Reference Genome*

Two samples per strain with the most read counts were aligned to the *B. mori* reference genome RefSeq GCF\_000151625.1 [21] using the splice-aware aligner STAR. Most single and paired reads were uniquely mapped to the reference with only a small percentage (0.11–0.53%) of unmapped reads (Figure 2). The DMMMSU replicates had a higher percentage of reads that were mapped to multiple loci (22.73% and 14.70% for DMMMSU-2 and DMMMSU-4, respectively), while for LAT51 and MO204 replicates, more than 90% of reads were uniquely mapped.

Moreover, most of the splice sites detected by STAR were in the GTF annotation of the reference genome. It has been reported that STAR generates a large number of putative novel splice sites [22], but in this case, the splice sites were validated by the reference genome annotation.

**Figure 2.** Summary of STAR alignment statistics of *Bombyx mori* RNA-seq reads to the reference genome. (PE: Paired-end reads, SE: Single-end reads). Generated using STAR.log files and MultiQC [23].

Samtools flagstat was also run to analyze data from the FLAG field of the generated BAM files. According to flagstat, most of the mapped reads were properly paired and mapped with their mate. No duplicates were detected, but secondary alignments were produced (ranging from 9.6 to 31%). This coincides with the STAR statistics indicating that there were several reads mapped to multiple loci (ranging from 7.43 to 22.73%). In the BAM files of the paired-end reads, less than 1% were singletons, and 0 reads had mates mapped to a different chromosome.

#### *3.4. Transcriptome Assembly and Evaluation Metrics*

After assembling the reads using different assemblers, the assemblies were mapped back to the input reads to evaluate their read support. Figure 3 shows that the reads had the highest alignment rates with the Trinity assembly, while the Cufflinks and StringTie assemblies had comparable alignment rates. Since the reads from the DMMMSU samples had a lower percentage of uniquely mapped reads, it could be expected that the referencebased assemblies would have less support from these samples.

**Figure 3.** Overall alignment rate of RNA-seq reads to the transcriptome assemblies.

Transcriptome assemblies are also evaluated based on the number and length of unigenes (unique sequences reconstructed in the assembly; includes not only genes but all assembled contigs). According to the contig statistics in Table 4, the Trinity assembly had the lowest N50 (1313) and lowest median contig length (468). Cufflinks had the highest contiguity with the highest N50 (3414), highest median contig length (1738), and average contig length (2305.35). However, in a transcriptome assembly, long contig lengths and high N50 are not as important as in genome assemblies. The most highly expressed transcripts are not necessarily the longest ones, and the majority of transcripts are expected to have low expression levels. Transcript lengths are also not indicative of a good transcriptome assembly since transcripts may be present in a wide range of sizes.

**Table 4.** Contig statistics of the different transcriptome assemblies from *Bombyx mori* strains. Statistics generated from TrinityStats.pl.


Trinity assemblies had the highest overall alignment rates (96.60–98.00%), while the alignment rate for the Cufflinks and StringTie assemblies were comparable (60.5–91.10% and 58.80–90.2%, respectively).

A metric more appropriate for transcriptome assemblies is the ExN50 statistic, which represents N50 based on the most highly expressed transcripts that represent x% of the normalized expression data (e.g., E90N50 is the N50 value for 90% of the normalized expressed transcripts, excluding the lowly expressed transcripts) [24]. To obtain the ExN50 scores, the expression data were first normalized using kallisto, then count matrices were generated. Finally, the Trinity scripts contig\_ExN50\_statistic.pl and plot\_ExN50\_statistic.Rscript were run to compute and visualize ExN50 contig statistics.

Comparing the N50 of the assemblies with their E90N50, a reversal is observed. When all contigs were considered (E100), Cufflinks and StringTie had a higher N50 than Trinity, indicating longer contig lengths. However, when N50 is confined to 90% of expression data, the Trinity assembly resulted in a higher value for E90N50. This suggests that lowly expressed genes make up longer contigs in reference-based assemblers but not in the de novo assembler. The ExN50 profiles in Figure 4 show that the peak N50 occurs at higher Ex values, indicating that the sequencing depth was enough to assemble the lowly expressed transcripts.

**Figure 4.** ExN50 profiles of the (**a**) Cufflinks, (**b**) StringTie, and (**c**) Trinity merged *Bombyx mori* transcriptome assemblies.

A reference-free evaluation metric for transcriptome assemblies was developed by Li et al. [25] in the package DETONATE (De novo Transcriptome RNA-seq Assembly with or without the Truth Evaluation). DETONATE has two component packages, RSEM-EVAL and REF-EVAL. RSEM-EVAL is based on the RSEM (RNA-Seq by Expectation Maximization) algorithm. RSEM-EVAL is based on a probabilistic model that combines different factors including the compactness of the assembly and read support [25]. The RSEM-EVAL score is a log joint probability based on three components: Likelihood of the assembly, assembly prior, and a Bayesian information criterion penalty. Penalties are imposed when assemblies have too many bases or contigs or have an unusual distribution of contig lengths relative to the expected read coverage. RSEM-EVAL scores are always reported to be negative, but higher scores are indicative of better assemblies. Out of the three transcriptome assemblies, Trinity obtained the highest RSEM-EVAL score (Table 5). Aside from the RSEM-EVAL scores, REF-EVAL scores are also reported in Table 5. REF-EVAL reports the recall and precision of the assemblies at the contig and nucleotide levels. The recall is the fraction of reference elements (contigs, scaffolds, nucleotides) that are recovered in the assembly, while precision is the fraction of assembly elements that recover a reference element. The F1 score represents the harmonic mean of the recall and precision scores:

$$F\_1 = \frac{2 \times recall \times precision}{recall + precision}.\tag{1}$$


**Table 5.** DETONATE scores of the different *Bombyx mori* transcriptome assemblies.

The k-mer compression (*KC*) score is computed by the formula:

$$\text{K\!C} = \text{WKR} \left( \text{weighted } kmer \text{ recall} \right) - \text{I\!CR} \left( \text{inverse} \, \text{compression\, rate} \right), \tag{2}$$

where *WKR* measures an assembly's recall of the k-mer content in the reference, with each k-mer weighted by relative frequency, and *ICR* measures the compression of the RNA-seq data.

To evaluate the completeness of assemblies, Benchmarking Universal Single-Copy Orthologs (BUSCO) was employed using the dataset insecta\_odb9 (2016). BUSCOs are core genes from OrthoDB that are expected to be present as single copies in orthologous groups. Recovered genes are classified as 'complete' when their lengths are within two standard deviations of the group mean length; 'fragmented' when they are only partially recovered; and 'missing' if they are unrecovered [26]. Both reference-guided assemblies had a high percentage (>95%) of completely recovered BUSCOs, while the Trinity assembly had the lowest number of BUSCOs at 80.10% (Figure 5). The StringTie assembly had more single-copy and fewer duplicate orthologs. Duplicated BUSCOs should be rare and could indicate erroneous assembly of haplotypes [26]. Running BUSCO for unmerged transcriptome assemblies (per sample) resulted in a small percentage of duplicated BUSCOs, so the duplication may be due to heterozygous alleles that were not collapsed when the assemblies were merged.

To summarize the results of the different transcriptome assembly evaluation metrics, the Trinity de novo assembly performed best in terms of alignment rate, assembled the greatest number of transcripts with high contig length represented by E90N50, and had the best RSEM-eval score. However, when it comes to the completeness of assembly based on BUSCO, Cufflinks outperformed the other assemblers.

#### *3.5. Expression Quantification, Normalization, and Differential Expression Analysis*

The reads were counted according to gene-level features using FeatureCounts (1.4.6-p5), which is part of the R Subread package [27]. Quantification was run at the meta-feature level (gene level), where reads that overlap multiple exons of the same gene are counted exactly once, provided there is no overlap with another gene (i.e., exon spanning reads will be counted once). Read count normalization was performed with DE analysis using DESeq2's size factor normalization. After running DESeq2, 1647 DEGs were identified within the threshold *p* < 0.1, but after FDR correction, only 476 DEGs were within the adjusted threshold *padj* < 0.1. Factor levels were set by location, where TCMO is the reference condition and CAR is set as the contrast condition. Among the 476 DEGs, there were 222 upregulated genes and 254 downregulated genes in CAR strains.

**Figure 5.** Measure of completeness of transcriptome assemblies from *Bombyx mori* strains using BUSCO (dataset: insecta\_odb9).

A heat map of the 476 DEGs that passed the significance threshold is shown in Figure 6. Clustering of the DEGs by location is apparent in the heat map. The replicates for each strain show similar DEGs, except for a cluster of DEGs that are upregulated in bioreplicate MO204-4 but not in other MO204 replicates. This DEG cluster is composed of loci with uncharacterized proteins (LOC101741578, LOC101739373, LOC101736556, LOC101741319, LOC101738771, LOC101740400, LOC101735788). LOC101741578 has a conserved domain (COG5240 SEC21; vesicle coat complex COPI, gamma subunit) that is found in proteins involved in intracellular traffic and secretion [12]. Since these genes are highly upregulated in MO204-4 but not in other TCMO strains, this may be due to individual differences that were not taken into account (e.g., sex differences).

DEGs were filtered by the criteria *p*adj < 0.1 and |log2 FC| > 1 to select statistically and biologically significant genes; genes passing this significance threshold are represented by red points in the volcano plot shown in Figure 7. The filtered DEGs with their corresponding gene and protein IDs are listed in Table S1.

#### *3.6. Functional Annotation of Select DEGs*

The top DEGs were selected based on statistical significance and high fold changes and are summarized in Table 6. This list includes fold changes, *p*adj values, as well as the protein names (if available) and the functions derived from Pfam/InterPro annotations available in UniProt [29].

**Figure 6.** Heat map of the 112 DEGs from *Bombyx mori* with *p*adj < 0.1. The values shown are scaled to the distances from the average of each row (gene).

Gene Ontology (GO) terms were mapped to the gene sets by matching the protein IDs to the GO annotations in the newest release of *B. mori* genome annotations from KAIKObase (ver. 4.1.0, July 2020). Their annotation contains the newest predicted gene models, descriptions including InterPro IDs, GO terms, and best hit in the NCBI nr database. Out of 14,124 assayed genes, 11,792 were annotated with protein IDs from nr, and 3948 were annotated with GO terms.

GO term enrichment was determined using the R package goseq. Significantly overrepresented GO terms are summarized in Tables S1 and S3 for upregulated and downregulated genes, respectively. Among the upregulated genes, the biological processes that were enriched are related to nucleotide metabolism and biosynthesis, lipid and carbohydrate metabolic processes, and regulation of transcription. Among down-regulated genes, similar biological processes were found to be enriched, but the ontology for carbohydrate metabolic process stands out with a low *p*-value. For cellular components, there is no overlap of GO terms for up- and downregulated genes. For molecular functions, nucleotide, protein, and metal binding are enriched in upregulated genes, while DNA binding, ATP binding, and zinc binding are enriched in downregulated genes. Catalytic activity, oxidoreductase, and hydrolase activities are also enriched molecular functions found in both sets.

**Figure 7.** Volcano plot of DEGs from *Bombyx mori*, log fold change vs. −log *P*. Gray: Nonsignificant genes; green: Genes with significant fold changes but insignificant adjusted *p*-values; blue: Genes that passed the significance threshold according to adjusted *p*-value but did not have significant fold changes; red: Genes that have adjusted *p*-value < 0.1 and significant fold change (|log2 FC| > 1). Graph generated in R package EnhancedVolcano [28].

**Table 6.** Summary of select top DEGs between different *Bombyx mori* strains (CAR vs. TCMO).


#### *3.7. qPCR Validation of Select DEGs*

Among the selected top DEGs in Table 6, six were selected for validation using qPCR: Heat shock protein 20 (Hsp20), oxygen-dependent choline dehydrogenase-like (odcdl), myrosinase 1 (myr), peroxidase-like (pl), uncharacterized protein (LOC105842393), and uncharacterized protein tetraspanin family (ts). The first three were observed in silico to be downregulated in CAR vs. TCMO silkworms, while the last three were calculated to be upregulated. The results for the qPCR found Hsp20 to be significantly downregulated in CAR vs. TCMO samples (*p* < 0.0001), as were odcdl (*p* = 0.0005) and myr (*p* = 0.0008). LOC105842393 was found to be significantly upregulated in CAR vs. TCMO (*p* = 0.028), as were pl (*p* = 0.0001) and ts (*p* = 0.0093). These data support the results of in silico analysis (Figure 8). Additionally, log2FC values were calculated and compared with the values produced using DESeq2, as presented in Figure 8. While differences were found between the log2FC values between qPCR and DESeq2, these may be due to the individual gene expression differences between samples used for sequencing and samples used for qPCR. Despite these differences, the trends of downregulation or upregulation in CAR vs. TCMO samples observed during qPCR appear to support those found during in silico analysis.

**Figure 8.** Results of qPCR validation of differentially expressed genes. Heat shock protein 20 (Hsp20), oxygen-dependent choline dehydrogenase-like (odcdl), and myrosinase 1 (myr) were downregulated in CAR vs. TCMO, while uncharacterized protein (LOC105842393), peroxidase-like (pl), and uncharacterized protein tetraspanin family (ts) were upregulated in CAR vs. TCMO. Log2FC values calculated from qPCR results are compared with values obtained during DESeq2 analysis. Error bars indicate standard deviation; asterisks denote significant change in CAR vs. TCMO (Student's *t*-test), *p* < 0.05.

#### **4. Discussion**

In this study, we compared gene expression in silk glands of *B. mori* strains reared in two different local facilities with different rearing temperatures. According to previous studies, the optimal temperature for silkworm growth is 20–28 ◦C and 23–28 ◦C for silk productivity [30]. The ambient temperature in the rearing sites during collection was 31.7 ◦C and 23.9 ◦C for TCMO and CAR, respectively. These data are based on bivoltine strains, although polyvoltine strains acclimatized in tropical countries are known to tolerate higher temperatures. Nevertheless, it has been shown that lower temperatures are better for silkworm productivity and larval duration [31].

Closely related to temperature is humidity, thus, an important factor to consider. It influences the withering of leaves in rearing beds, which affects larval feeding. However, there is no limiting range for humidity, and most insects can develop as long as they can control their water balance [30].

Among the downregulated genes in CAR were myrosinase, heat shock proteins, and serine protease inhibitors, which are related to defense and ensuring proper protein folding. These were downregulated in CAR strains likely because chaperone and protease inhibitors are not needed as much in lower temperatures. Aside from Hsp20.1, other heat shock proteins that are significantly downregulated are Hsp 68, Hsp 70, Hsp 23.7 precursor, and Hsp 25.4 precursor.

On the other hand, there were upregulated genes found in CAR related to defense such as peroxidase-like protein, defensin, and other proteins related to oxidoreductase activity. Many of these protein products are yet to be characterized, but sequence motifs provide some hints about their function based on information from protein families and conserved domains.

There were also several dehydrogenases found among the DEGs, such as oxygendependent choline dehydrogenase-like protein, L-sorbose 1-dehydrogenase, glucose dehydrogenase [FAD, quinone], 15-hydroxyprostaglandin dehydrogenase, farnesol dehydrogenase, dihydropyrimidine dehydrogenase [NADP(+)], mitochondrial aldehyde dehydrogenase, and alcohol dehydrogenases. Dehydrogenases are a type of oxidoreductases that oxidize substrates by reducing an electron acceptor and are important for their role in detoxification of metabolites. Malate dehydrogenase (BmMDH1) and 6-phosphogluconate dehydrogenase (Bm6PGD) were previously found to be differentially expressed in Dazao silkworms exposed to different temperatures, which implies enhancement in energy metabolism and ATPase expression at low temperatures [32].

Another DEG of note is SOSS complex subunit B homolog (log2FC: −2.0042; *p*adj: −2.004 × <sup>10</sup>−13). The SOSS complex binds to single-stranded DNA and is involved in DNA damage response, cell cycle checkpoint activation, homologous repair of doublestranded breaks, and ATM-dependent signaling pathways [33]. Downregulation of SOSS in CAR strains may indicate that there is less DNA damage response in silkworms reared in low-temperature environments.

Microvitellogenin (log2FC: 6.2283; *p*adj: 0.023) was one of the genes found upregulated in CAR strains, which encodes a low molecular weight (30 kDa) lipoprotein (30 K proteins, or 30 KPs). These lipoproteins are synthesized in the *B. mori* fat body and secreted in the hemolymph during the last instar larva stage. They are involved in several physiological processes, such as energy storage, embryonic development, and immune response [34]. Upregulation of microvitellogenin may indicate higher energy storage and enhanced immunoprotection in low-temperature strains [35,36].

Another upregulated DEG was the protein eyes shut (log2FC: 5.7860; *p*adj: 0.024), also known as EGF-like protein 10, an essential protein for the formation of photoreceptor cells in *Drosophila* [33]. Most EGF-like domains are found in extracellular domains of membranebound proteins or secreted proteins. Aside from its EGF-like domain, *B. mori* protein eyes shut also has a laminin G domain, which is associated with different functions such as cell adhesion, signaling, migration, assembly, and differentiation [37].

Three regulators of the juvenile hormone (JH) were found among the DEGs in this analysis: Juvenile hormone epoxide hydrolase-like protein 2 (Jheh-lp2), juvenile hormone epoxide hydrolase precursor (Jheh2), and cytosolic juvenile hormone binding protein 36 kDa subunit (Cjhbp). JHEH inactivates juvenile hormones by hydrolyzing the epoxide groups in JH. Differential expression of JH regulators could explain the observed differences in larval development between CAR and TCMO strains, wherein CAR strains have a longer larval duration (23 days for TCMO, 27 days for CAR). JH has long been known to affect silk production, and the application of JH analogs has been used to prolong the 5th instar stage to increase larval weight and silk secretion [38]. More recently, the JH regulatory pathway has been demonstrated to influence Fib-H expression through the action of transcription factors *Bmdimm* and *Bmsage* [39]. Zhou et al. [40] showed that JH biosynthesis and sexual maturation are delayed in cotton ballworm (*Helicoverpa armigera*) reared at 19 ◦C compared to those reared in higher temperatures. Liu et al. [41] showed that JH titers in Formosan termite (*Coptotermes formonasus*) were positively correlated with temperature. In contrast, Geister et al. [42] found no correlation between temperature and JH titers in the tropical butterfly *Bicyclus anynana*.

Caytaxin was downregulated in CAR strains (log2FC: −2.8609; *p*adj: 0.032). Caytaxin is a member of the BNIP2 (Bcl2-/adenovirus E1B nineteen kDa-interacting protein 2) protein family (pfam12496). Caytaxins interact with pro- and anti-apoptotic molecules in the cell and are involved in Rho GTPase regulation [43].

Another DEG found was dymeclin (log2FC: 1.7453; *p*adj: 0.030), which belongs to the Dymeclin (Dyggve-Melchior-Clausen syndrome protein) protein family (pfam09742) in plants and animals. Dymeclin proteins have a length of approximately 700 residues and contain many leucine and isoleucine in their conserved domain [12]. Mutations in human dymeclin cause Dyggve–Melchior–Clausen syndrome (DMC, MIM 223800), an autosomal recessive disorder. Dymeclin is a peripheral membrane protein dynamically associated with the Golgi apparatus. In *Caenorhabditis elegans*, a member of the Dymeclin protein family is hid1 (high-temperature-induced dauer-formation protein 1)*,* which encodes a highly conserved transmembrane protein that could be involved in vesicle secretion or intercellular signaling [44].

Previous comparative studies found similar enrichment in GO terms as those in this study. In one study comparing wild and domestic silkworms, oxidoreductase activity (GO: 0016491) was the only enriched GO term in four pairwise comparisons [11]. In another study comparing Chinese silkworm strains JingSong and Lan10, which have different rates of silk production, the dominant GO term was membrane-enclosed lumen under the cellular component [10].

#### **5. Conclusions**

The present study provides, for the first time, valuable information on the transcriptome of *B. mori* strains found in the Philippines. The differential expression analysis performed on silk glands of *B. mori* strains grown in different sites gives us insight into the processes and functions linked to the down- and up-regulated genes. This molecular information could be used as a source of trait markers for the improvement of local silkworm strains to enhance the sericulture industry in the Philippines. For future directions, aside from protein annotations and GO terms, KEGG pathway analysis may also be performed to further elucidate the functions of DEGs in silkworms and in silk production. It is also recommended to perform further quantitative PCR assays to validate the other bioinformatically obtained DEGs, in addition to the DEGs validated here. Furthermore, this study could be extended to transcriptome analysis of other *B. mori* strains, tissues, and developmental stages.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/ 10.3390/insects13080669/s1, Table S1: DEGs between *B. mori* CAR and TCMO strains within the *p*-adj threshold, Table S2: GO terms enriched among upregulated genes in the silk glands of selected *B. mori* strains from CAR, Table S3: GO terms enriched among downregulated genes in the silk glands of selected *B. mori* strains from CAR.

**Author Contributions:** P.N.O.d.l.P.: Methodology, investigation, data curation, formal analysis, visualization, writing—original draft; A.G.D.L.: Investigation, formal analysis, visualization, writing; M.A.M.B.: Conceptualization, funding acquisition, methodology, project administration, supervision, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by in-house funding for 2020 provided by the National Institute of Molecular Biology and Biotechnology, University of the Philippines—Diliman for P.N.O.d.L.P.'s master's thesis.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The RNA-seq dataset generated in this study has been deposited in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus [45] accessible through GEO Series Accession Number GSE184152.

**Acknowledgments:** The authors would like to thank the Department of Science and Technology— Philippine Textile Research Institute (DOST-PTRI) for providing administrative and material support for sample collection, and the PTRI Technology Center in Misamis Oriental and DOST Cordillera Administrative Region Technology Center for providing silkworm samples.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

