Next Article in Journal
Airway Epithelial-Derived Immune Mediators in COVID-19
Next Article in Special Issue
Navigating the Landscape: A Comprehensive Review of Current Virus Databases
Previous Article in Journal
Genomic and Pathologic Characterization of the First FAdV-C Serotype 4 Isolate from Black-Necked Crane
Previous Article in Special Issue
Viral Instant Mutation Viewer: A Tool to Speed Up the Identification and Analysis of New SARS-CoV-2 Emerging Variants and Beyond
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Communication

Bioinformatic Surveillance Leads to Discovery of Two Novel Putative Bunyaviruses Associated with Black Soldier Fly

by
Hunter K. Walt
1,*,
Emilia Kooienga
2,
Jonathan A. Cammack
3,4,
Jeffery K. Tomberlin
3,
Heather R. Jordan
2,
Florencia Meyer
1 and
Federico G. Hoffmann
1,5,*
1
Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University, Starkville, MS 39762, USA
2
Department of Biology, Mississippi State University, Starkville, MS 39762, USA
3
Department of Entomology, Texas A&M University, College Station, TX 77843, USA
4
EVO Conversion Systems, LLC, College Station, TX 77845, USA
5
Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS 39762, USA
*
Authors to whom correspondence should be addressed.
Viruses 2023, 15(8), 1654; https://doi.org/10.3390/v15081654
Submission received: 27 June 2023 / Revised: 21 July 2023 / Accepted: 26 July 2023 / Published: 29 July 2023
(This article belongs to the Special Issue Virus Bioinformatics 2023)

Abstract

:
The black soldier fly (Hermetia illucens, BSF) has emerged as an industrial insect of high promise because of its ability to convert organic waste into nutritious feedstock, making it an environmentally sustainable alternative protein source. As global interest rises, rearing efforts have also been upscaled, which is highly conducive to pathogen transmission. Viral epidemics have stifled mass-rearing efforts of other insects of economic importance, such as crickets, silkworms, and honeybees, but little is known about the viruses that associate with BSF. Although BSFs are thought to be unusually resistant to pathogens because of their expansive antimicrobial gene repertoire, surveillance techniques could be useful in identifying emerging pathogens and common BSF microbes. In this study, we used high-throughput sequencing data to survey BSF larvae and frass samples, and we identified two novel bunyavirus-like sequences. Our phylogenetic analysis grouped one in the family Nairoviridae and the other with two unclassified bunyaviruses. We describe these putative novel viruses as BSF Nairovirus-like 1 and BSF uncharacterized bunyavirus-like 1. We identified candidate segments for the full BSF Nairovirus-like 1 genome using a technique based on transcript co-occurrence and only a partial genome for BSF uncharacterized bunyavirus-like 1. These results emphasize the value of routine BSF colony surveillance and add to the number of viruses associated with BSF.

1. Introduction

Insects have become a point of interest as sustainable alternative sources of food and feed. Insect rearing is less environmentally impactful than traditional food and feed production, but insects must be reared in mass to produce a substantial amount of biomass [1]. Mass rearing of animals is highly conducive to pathogen transmission, as many are reared in close quarters. Insects are no exception, and the production of insects of economic importance, such as crickets (Orthoptera: Gryllidae), silkworms (Lepidoptera: Bombycidae), and honeybees (Hymenoptera: Apidae), has been historically stifled by viral epidemics [2].
The black soldier fly (BSF) (Hermetia illucens) is an insect of emerging industrial importance. This is because of its ability to convert organic waste to highly nutritious biomass for animal feed and potentially food for humans [3]. It has also been investigated as a source of antimicrobial compounds, antioxidants, and biodiesel [4,5,6]. As a result, there is global interest in upscaling the rearing of the species; however, relatively little is known about the microbes that could help or hinder its production [7]. It has been proposed that BSF are unusually resistant to pathogens because of their large repertoire of antimicrobial peptides, and until recently, no viruses were known to be associated with BSF [8,9,10,11]. Pienaar et al. [11] mined publicly available BSF data for endogenous viral elements (EVEs) and viruses and found multiple EVEs and one exogenous toti-like virus. This suggests that there is a history of BSF-virus interactions and emphasizes the need for investigating BSF-virus interaction.
In this study, we use a computational approach to survey RNA-seq datasets for viruses associated with BSF. We use transcriptomic data generated from previous BSF studies to search for putative novel viral-like sequences associated with BSF. We use a sequence similarity-based approach to identify putative viral sequences in our data, infer phylogenetic relationships based on the RNA-dependent RNA polymerase (RdRp) gene, and identify candidate sequences of diverged genome segments of these viruses based on the co-occurrence of transcripts between samples [12]. Our results emphasize the potential of high-throughput sequencing technology as a biological surveillance tool that can also identify emerging pathogens of BSF. In addition, this type of data provides a foundation to develop molecular diagnostic tools to actively monitor colonies for viral infection [13].

2. Materials and Methods

2.1. Sample Preparation and Sequencing

Larvae (n = 19), BSF digestate substrate (frass) (n = 17), and a sample of the diet (n = 1) with no larvae (spent grain) were sampled from industrial-scale rearing enclosures. Larvae were surface sterilized by dipping in 10% bleach solution for 1 min, followed by two successive deionized water washes for one minute. Following this, the intestinal tract (gut) was dissected from the larvae. RNA was purified from the larval guts, frass, and the initial spent grain diet using a Direct-zol™ RNA MiniPrep kit (Zymo, Irvine, CA), following the manufacturer’s protocol, and RNA products were quantified with a Qubit® 2.0 fluorometer (Invitrogen, Waltham, MA), and then ran on a gel to determine the RNA quality. A NEB Ultra RNA Library Kit (New England Biolabs, Ipswich, MA) was used to convert the RNA to cDNA, avoiding steps involving rRNA depletion and mRNA enrichment while preserving total RNA concentrations. cDNA libraries were multiplexed using NEBNext Oligos for Illumina (New England Biolabs, Ipswich, MA) to create five pooled libraries. The resulting multiplexed libraries were quality verified using a 4150 TapeStation System (Agilent, Santa Clara, CA) and then submitted for shotgun whole metatranscriptome sequencing using an Illumina HiSeq 2000 instrument (Illumina, San Diego, CA) generating 151 bp reads. Sequences were deposited in the Sequence Read Archive database under BioProject ID: PRJA976287.

2.2. Metatranscriptome Assembly

The paired-end reads were quality checked using fastQC v.0.11.9 [14] and quality trimmed using trimmomatic v.0.39 with the options ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10, SLIDINGWINDOW:4:15, and MINLEN: 30 [15]. All read libraries were then mapped to the reference BSF genome (GCA_905115235.1) [16] using HISAT2 v.2.2.1 [17], and only the non-BSF reads were retained using the –un-conc option. The non-BSF reads were subsequently assembled using Trinity v2.14.0 [18] through Trinity’s Docker container [19].

2.3. Identification of Conserved Viral Sequences

To identify the putative viral sequences, all transcripts in the assembled transcriptomes were renamed by sample ID and a unique identifier and concatenated. Next, the concatenated transcriptome was clustered using CD-hit-EST [20] at a threshold of 95% identity (calculated as the number of identical bases divided by the length of the shorter sequence), a word size of 8, and a minimum length of 500 nt. The representative sequences from these clusters were then annotated using DIAMOND + BLASTx using the –very-sensitive flag and an e-value threshold of 1e-2 [21,22]. Sequences matching RNA viruses were considered of probable viral origin. To eliminate false positives, these sequences were further inspected to establish the presence of ORFs of viral origin using NCBI’s ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/ accessed 5 March 2023). All large ORFs were translated and aligned to NCBI’s nr protein database using the BLASTp webserver (https://blast.ncbi.nlm.nih.gov/Blast.cgi accessed 5 March 2023). Viral conserved protein domains were verified using NCBI’s conserved domain database (https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml accessed 5 March 2023) and InterProScan (https://www.ebi.ac.uk/interpro/search/sequence/ accessed 5 March 2023) [23]. Sequences matching RNA viruses that passed the criteria listed above were considered as viral conserved sequences.

2.4. Identification of Diverged Viral Fragments

To find diverged genomic segments of viruses in the assembled transcriptomes, we implemented a method to identify transcripts that co-occur with the viral conserved sequences [12]. We assumed that diverged genomic segments would not have significant BLAST hits, so we only considered unannotated transcripts in the ensuing analysis. We wrote a Python script that determined all the samples where a viral conserved sequence is found. Next, we used two metrics that assessed how often any other transcripts co-occur with the viral conserved sequence: a viral co-occurrence metric (Vco) and a transcript co-occurrence metric (Tco). Vco measures how frequently a transcript is found in the same sample as a viral conserved sequence and is used to filter out transcripts that are not frequently found in the sample as the viral conserved sequence. Mathematically, Vco is equal to the cardinal number of the intersection of V and T divided by the cardinal number of V, where V is the set of samples where a viral conserved sequence is found, and T is the set of samples that a transcript occurs in (Equation (1)).
V c o = n ( V T ) n ( V ) ,
Because  n ( V T ) n ( V ) ,  Vco ranges from 0 (where the transcript and viral conserved sequence of interest are never found in the same sample) to 1 (where the transcript and viral conserved sequence of interest are only found together). Intermediate values of Vco (0 < Vco < 1) indicate that the viral conserved sequence and the transcript co-occur in some samples but not in all.
The transcript co-occurrence metric (Tco) is a complementary metric that assesses whether a transcript is found in samples without the viral conserved sequence. Tco is described by Equation (2) and is the cardinal number of the intersection of V and T divided by the cardinal number of T.
T c o = n ( V T ) n ( T ) ,
Because  n ( V T ) n ( T ) ,  Tco ranges from 0 (where the transcript and the viral conserved sequence of interest are never found in the same sample) to 1 (where the transcript is only found with the viral conserved sequence). Intermediate values of Tco (0 < Tco < 1) indicate that the transcript is found in samples with the viral conserved sequence but also in additional samples. Ideally, Tco and Vco will be close to 1. In our case, we used a threshold of Vco = 0.75 and Tco = 0.5 to designate a transcript as putatively viral. The remaining transcripts were filtered by an ORF size > 500 nucleotides and inspected for other viral attributes using NCBI’s conserved domain database, BLAST, and InterProscan.

2.5. Phylogenetic Analysis

To confirm the identity of the novel viruses, we performed a phylogenetic analysis using the RNA-dependent RNA-polymerase (RdRp) gene from a diverse selection of bunyaviruses. The amino acid sequence for each bunyavirus RdRp was downloaded from NCBI (https://www.ncbi.nlm.nih.gov/ accessed 5 March 2023). The ORF containing the novel viruses’ RdRp was extracted and translated using NCBI’s ORFfinder. All resulting amino acid sequences were aligned using mafft v7.490, using the E-INS-I algorithm and the –maxiterate option set to 1000 [24]. ModelFinder [25] was used through the IQ-TREE2 v2.0.7 to find the best evolutionary model, and IQ-TREE2 v2.0.7 was used to infer the most likely tree [26]. Branch support was measured using ultrafast bootstrap with 1000 replicates, the Shimodaira–Hasegawa-like approximate likelihood ratio test (SH-aLRT) with 1000 replicates, and the aBayes test [27,28,29].

2.6. Incorporation of Public Data

All publicly available BSF transcriptomic data (n = 116, Table S4) was downloaded using the SRA toolkit v.3.0.0 (https://hpc.nih.gov/apps/sratoolkit.html accessed 27 October 2022) and mapped to the candidate viral segments using BWA-mem v2.2.1 [30]. Resulting sequence alignment files (.SAMs) were converted to binary (.BAMs), sorted, and indexed using SAMtools v.1.6 [31]. The presence of viral transcripts in the public data was inspected manually using Integrative Genomics Viewer v.2.14.1 [32]. If a viral conserved sequence was found, the corresponding SRA dataset was processed using the same process described in Section 2.2, added to the concatenated metatranscriptome, and the co-occurrence analysis was run again.

3. Results

We mapped 13,352,390 reads generated from 37 BSF larvae and frass samples to the BSF reference genome (GCA_905115235.1) [16]. An overview of the samples and the percentage of the reads that mapped to the BSF genome is shown in Table 1. For each sample, we assembled the reads that did not map to the BSF genome, resulting in 37 metatranscriptomes. CD-hit clustering of the metatranscriptomes resulted in 7962 distinct sequence clusters, all of which shared greater than 95% sequence identity. The representative sequence from each cluster was annotated using diamond BLASTx, resulting in 7830 (98.3%) annotations. We filtered these results for viral hits and found two transcripts with significant hits to RdRps of arthropod-infecting bunyaviruses: one transcript was 5696 nucleotides in length, and the other was 4542 nucleotides. We hereon refer to the virus with the longer RdRp transcript as BSF uncharacterized bunyavirus-like 1, and the other we refer to as BSF Nairovirus-like virus 1. We detected BSF uncharacterized bunyavirus-like 1 sequences in the samples T6WA1, T6WA2, T6WC1, T6WR3, and T6WR4, and BSF Nairovirus-like 1 sequences in samples T6WA1, T6WA2, T6WC1, and T6WR4.
Both transcripts shared their top five BLAST hits (Table S1). We confirmed the presence of the bunyaviral RdRp domain using the NCBI conserved domain database and InterProscan. We found the BSF uncharacterized bunyavirus-like 1 encodes for a bunyavirus-like RdRp conserved domain between nucleotides 3950-4294 (E-value = 2.43 × 10−6), and BSF Nairovirus-like 1 encodes for a bunyavirus-like RdRp conserved domain between nucleotides 2052-2888 (E-value = 1.10 × 10−15). Other putative viral sequences, such as RNA phages, were found and were not included in our analyses because they are likely associated with microbes (Table S2) [33].
To phylogenetically analyze the two novel putative viruses, we translated their coding sequences and aligned them with 55 known bunyavirus RdRp amino acid sequences retrieved from NCBI to construct a phylogenetic tree using the VT+F+R7 amino acid substitution model [34], which was chosen as the best fit by ModelFinder (Figure 1). The sequences were chosen to represent all bunyavirus families documented in NCBI [35]. The phylogeny shows that one putative virus groups with two taxonomically uncharacterized arthropod-infecting bunyaviruses: Kristianstad virus (detected in mosquitoes) [36] and Beihai barnacle virus 6 (detected in barnacles) [37]. Interestingly, BSF Nairovirus-like virus 1 groups with the RdRp of Abu Hammad virus, which is in the Nairoviridae family that was isolated from a tick. Many of the viruses in the Nairoviridae family also infect vertebrates and can be transmitted by arthropod vectors [38]. We hereon refer to this putative virus as BSF Nairovirus-like 1. The full phylogenetic tree with no collapsed nodes is shown in Figure S1.
Generally, bunyaviruses have a segmented genomic architecture with a large (L) segment containing the conserved RdRp domain, a medium (M) segment, and a small (S) segment. Because of the error-prone replication by RdRp and extreme selection pressures from host-virus interactions, the M and S segments of bunyaviruses can become very diverged and sometimes impossible to detect based on sequence similarity. We did not find any BLAST results that matched other viral M or S segments, so we employed an approach based on transcript co-occurrence across samples, similar to the approach taken by Batson et al. (2021) [12]. We assumed that if an L segment occurred in multiple samples, the M and S segments would also occur in those samples. Because BSF uncharacterized bunyavirus-like 1 occurred in five samples (out of 37) (Figure 2), and BSF Nairovirus-like 1 occurred in four samples (out of 37) (Figure 2), we were able to employ this method.
We wrote a Python script that takes a defined list of viral conserved sequences (the two novel bunyavirus RdRp transcripts in our case), determines the samples where the viral conserved sequence occurred, and returns a viral co-occurrence metric (Vco, Equation (1)) and a transcript co-occurrence metric (Tco, Equation (2)). Vco is a metric to ensure that a transcript occurs in the same samples as the viral conserved sequence, while Tco is a metric to assess whether a transcript co-occurs with the viral conserved sequence and no other samples. Ideally, Vco and Tco should equal 1 (i.e., a transcript perfectly occurs with a viral conserved sequence and only in those samples), but because of limited sequencing depth, we set the Vco threshold to 0.75 and the Tco threshold to 0.5 to identify candidate M and S segments. We assumed that diverged viral genome segments had no blast hits with an e-value higher than our 1 × 10−2 threshold, so only unannotated transcripts were considered. We found five candidate genomic segments for uncharacterized BSF bunyavirus-like 1, and seven candidate genomic segments for BSF Nairovirus-like 1. These transcripts were filtered by the presence of complete ORFs greater than 500 nucleotides in length. This left us with four candidates for each viral genome, but they were the same for both BSF uncharacterized bunyavirus-like 1 and BSF Nairovirus-like 1.
To determine which candidate genomic segments go with each virus, we mined all available BSF transcriptomic data by mapping their reads to the putative L segments. No SRA reads mapped to BSF Nairovirus-like 1, but reads from six datasets mapped to BSF uncharacterized bunyavirus-like 1 (Table S3). Notably, all the public data this virus occurred in were processed in the same lab as the samples used in this study, but for a different experiment (BioProject Accession: PRJNA542977). The datasets containing BSF uncharacterized bunyavirus-like 1 were processed in the same way as the reads generated in our study (i.e., trimmed with Trimmomatic, quality checked with FastQC, and assembled with Trinity), and the transcripts were added to the co-occurrence analysis. After repeating the co-occurrence analysis with the SRA data included, BSF uncharacterized bunyavirus-like 1 only had one candidate that met our criteria. This sequence was also present in the first co-occurrence analysis. Two candidate diverged viral segments appeared with BSF Nairovirus-like 1 with Vco = 1 and Tco = 1. Thus, we assigned these two transcripts as the M and S segments for BSF Nairovirus-like 1, which are 2837 and 922 nucleotides long, respectively (Figure 3A). We assigned the transcript identified with BSF uncharacterized bunyavirus-like 1 as its S segment, as it is 853 nucleotides long (Figure 3B).

4. Discussion

In this study, we discovered two novel putative bunyaviruses associated with BSF. To our knowledge, they are only the second and third viruses associated with BSF. Although we cannot confirm if these viruses are pathogenic to BSF, diagnostic molecular tools can be used to investigate the prevalence of these viruses and if they coincide with BSF disease symptoms. We show that both viruses share most recent common ancestors with other insect viruses, although both viruses were only found in frass samples. We suspect two reasons why these viruses were only found in frass samples. (1.) The virus might infect tissues other than the gut, which are the only BSF tissues used in this study. (2.) Sequencing depth might not have been sufficient to detect these viruses in larval tissues. Alternatively, these viruses may infect components of the BSF frass, as bunyaviruses are diverse and can infect a broad range of hosts (Abudurexiti et al. 2019). Either way, our phylogeny suggests that these are arthropod-infecting viruses, as their closest phylogenetic relatives also infect arthropods. Interestingly, BSF Nairovirus-like 1 is phylogenetically related to viruses that infect vertebrates, which could be a point of concern for food and feed safety. Additionally, we detected BSF uncharacterized bunyavirus-like 1 in 11 datasets (five from this study, six from a previous study, BioProject Accession: PRJNA542977, Table S3). These datasets were all produced in a single lab, though from different BSF experiments conducted in differing years and in different rearing locations. However, the BSFs utilized for each of these experiments were from the same original colony. It will therefore be important to understand the broader distribution of this virus across different BSF strains. Future studies should be conducted to confirm the host specificity and pathogenicity of these viruses to ensure that they do not affect BSF rearing.
The main limitations of this study relate to the data used, as our sequencing data had poor depth. This may be why we were unable to obtain a Vco and Tco of one for BSF uncharacterized bunyavirus-like 1, hence why we could not identify a candidate M segment. On the other hand, we show that novel viruses can be discovered even at low sequencing depths, especially when frass samples are used. Finally, this method can be easily applied to any system, although it is optimal to have a high-quality genome to reduce the amount of host reads.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v15081654/s1, Figure S1: Fully uncollapsed bunyavirus phylogenetic tree of the Bunyavirus RdRps used in this study; Figure S2: Transcript co-occurrence across samples; Table S1: Top five BLAST hits of the two novel bunyavirus L segments discovered in this study; Table S2: Putative microbe-associated viruses found within BSF and frass samples; Table S3: SRA data that were positive for BSF uncharacterized bunyavirus-like 1; Table S4: List of all SRA data used in this study by accession number.

Author Contributions

Conceptualization, H.K.W., J.A.C., J.K.T., H.R.J., F.M. and F.G.H.; formal analysis, H.K.W.; investigation, E.K. and H.R.J.; methodology, H.K.W. and F.G.H.; software, H.K.W.; supervision, H.R.J., F.M. and F.G.H.; writing—original draft, H.K.W.; writing—review and editing, H.K.W., J.A.C., J.K.T., H.R.J., F.M. and F.G.H. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded, in part, by the Defense Advanced Research Projects Agency (Grant No. D172-002-0010).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All reads generated from this study were deposited in the NCBI sequence read archive under BioProject ID: PRJNA976287. Python code for co-occurrence analysis can be found at https://github.com/hunterkwalt/virus_co-occur accessed on 20 June 2023.

Acknowledgments

We would like to thank Amber E. McDonald for her input on the Figure design.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Van Huis, A. Insects as Food and Feed, a New Emerging Agricultural Sector: A Review. J. Insects Food Feed. 2020, 6, 27–44. [Google Scholar] [CrossRef] [Green Version]
  2. Eilenberg, J.; Vlak, J.M.; Nielsen-LeRoux, C.; Cappellozza, S.; Jensen, A.B. Diseases in Insects Produced for Food and Feed. J. Insects Food Feed. 2015, 1, 87–102. [Google Scholar] [CrossRef] [Green Version]
  3. Tomberlin, J.K.; van Huis, A. Black Soldier Fly from Pest to “crown Jewel” of the Insects as Feed Industry: An Historical Perspective. J. Insects Food Feed 2020, 6, 1–4. [Google Scholar] [CrossRef] [Green Version]
  4. Zhu, Z.; Rehman, K.U.; Yu, Y.; Liu, X.; Wang, H.; Tomberlin, J.K.; Sze, S.H.; Cai, M.; Zhang, J.; Yu, Z.; et al. De Novo Transcriptome Sequencing and Analysis Revealed the Molecular Basis of Rapid Fat Accumulation by Black Soldier Fly (Hermetia illucens, L.) for Development of Insectival Biodiesel. Biotechnol. Biofuels 2019, 12, 194. [Google Scholar] [CrossRef] [Green Version]
  5. Xia, J.; Ge, C.; Yao, H. Antimicrobial Peptides from Black Soldier Fly (Hermetia illucens) as Potential Antimicrobial Factors Representing an Alternative to Antibiotics in Livestock Farming. Animals 2021, 11, 1937. [Google Scholar] [CrossRef] [PubMed]
  6. Lu, J.; Guo, Y.; Muhmood, A.; Zeng, B.; Qiu, Y.; Wang, P.; Ren, L. Probing the Antioxidant Activity of Functional Proteins and Bioactive Peptides in Hermetia illucens Larvae Fed with Food Wastes. Sci. Rep. 2022, 12, 2799. [Google Scholar] [CrossRef]
  7. Joosten, L.; Lecocq, A.; Jensen, A.B.; Haenen, O.; Schmitt, E.; Eilenberg, J. Review of Insect Pathogen Risks for the Black Soldier Fly (Hermetia illucens) and Guidelines for Reliable Production. Entomol. Exp. Appl. 2020, 168, 432–447. [Google Scholar] [CrossRef]
  8. Park, S.I.; Chang, B.S.; Yoe, S.M. Detection of Antimicrobial Substances from Larvae of the Black Soldier Fly, Hermetia illucens (Diptera: Stratiomyidae). Entomol. Res. 2014, 44, 58–64. [Google Scholar] [CrossRef]
  9. Zhan, S.; Fang, G.; Cai, M.; Kou, Z.; Xu, J.; Cao, Y.; Bai, L.; Zhang, Y.; Jiang, Y.; Luo, X.; et al. Genomic Landscape and Genetic Manipulation of the Black Soldier Fly Hermetia illucens, a Natural Waste Recycler. Cell Res. 2020, 30, 50–60. [Google Scholar] [CrossRef]
  10. Moretta, A.; Salvia, R.; Scieuzo, C.; Di Somma, A.; Vogel, H.; Pucci, P.; Sgambato, A.; Wolff, M.; Falabella, P. A Bioinformatic Study of Antimicrobial Peptides Identified in the Black Soldier Fly (BSF) Hermetia illucens (Diptera: Stratiomyidae). Sci. Rep. 2020, 10, 16875. [Google Scholar] [CrossRef]
  11. Pienaar, R.D.; Gilbert, C.; Belliardo, C.; Herrero, S.; Herniou, E.A. First Evidence of Past and Present Interactions between Viruses and the Black Soldier Fly, Hermetia illucens. Viruses 2022, 14, 1274. [Google Scholar] [CrossRef] [PubMed]
  12. Batson, J.; Dudas, G.; Haas-Stapleton, E.; Kistler, A.L.; Li, L.M.; Logan, P.; Ratnasiri, K.; Retallack, H. Single Mosquito Metatranscriptomics Identifies Vectors, Emerging Pathogens and Reservoirs in One Assay. Elife 2021, 10, e68353. [Google Scholar] [CrossRef]
  13. Semberg, E.; de Miranda, J.R.; Low, M.; Jansson, A.; Forsgren, E.; Berggren, Å. Diagnostic Protocols for the Detection of Acheta Domesticus Densovirus (AdDV) in Cricket Frass. J. Virol. Methods 2019, 264, 61–64. [Google Scholar] [CrossRef]
  14. Andrews, S. FASTQC. A Quality Control Tool for High Throughput Sequence Data. 2010. [Google Scholar]
  15. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A Flexible Trimmer for Illumina Sequence Data. Bioinformatics 2014, 30, 2114–2120. [Google Scholar] [CrossRef] [Green Version]
  16. Generalovic, T.N.; McCarthy, S.A.; Warren, I.A.; Wood, J.M.D.; Torrance, J.; Sims, Y.; Quail, M.; Howe, K.; Pipan, M.; Durbin, R.; et al. A High-Quality, Chromosome-Level Genome Assembly of the Black Soldier Fly (Hermetia illucens L.). G3 Genes Genomes Gene. 2021, 11, jkab085. [Google Scholar] [CrossRef]
  17. Kim, D.; Paggi, J.M.; Park, C.; Bennett, C.; Salzberg, S.L. Graph-Based Genome Alignment and Genotyping with HISAT2 and HISAT-Genotype. Nat. Biotechnol. 2019, 37, 907–915. [Google Scholar] [CrossRef]
  18. Grabherr, M.G.; Haas, B.J.; Yassour, M.; Levin, J.Z.; Thompson, D.A.; Amit, I.; Adiconis, X.; Fan, L.; Raychowdhury, R.; Zeng, Q.; et al. Full-Length Transcriptome Assembly from RNA-Seq Data without a Reference Genome. Nat. Biotechnol. 2011, 29, 644–652. [Google Scholar] [CrossRef] [Green Version]
  19. Merkel, D. Docker: Lightweight Linux Containers for Consistent Development and Deployment. Linux J. 2014, 2014, 2. [Google Scholar]
  20. Li, W.; Godzik, A. Cd-Hit: A Fast Program for Clustering and Comparing Large Sets of Protein or Nucleotide Sequences. Bioinformatics 2006, 22, 1658–1659. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  21. Buchfink, B.; Reuter, K.; Drost, H.-G. Sensitive Protein Alignments at Tree-of-Life Scale Using DIAMOND. Nat. Methods 2021, 18, 366–368. [Google Scholar] [CrossRef] [PubMed]
  22. Buchfink, B.; Xie, C.; Huson, D.H. Fast and Sensitive Protein Alignment Using DIAMOND. Nat. Methods 2015, 12, 59–60. [Google Scholar] [CrossRef]
  23. Quevillon, E.; Silventoinen, V.; Pillai, S.; Harte, N.; Mulder, N.; Apweiler, R.; Lopez, R. InterProScan: Protein Domains Identifier. Nucleic Acids Res. 2005, 33, W116–W120. [Google Scholar] [CrossRef] [Green Version]
  24. Katoh, K.; Standley, D.M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 2013, 30, 772–780. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Kalyaanamoorthy, S.; Minh, B.Q.; Wong, T.K.F.; von Haeseler, A.; Jermiin, L.S. ModelFinder: Fast Model Selection for Accurate Phylogenetic Estimates. Nat. Methods 2017, 14, 587–589. [Google Scholar] [CrossRef] [Green Version]
  26. Minh, B.Q.; Schmidt, H.A.; Chernomor, O.; Schrempf, D.; Woodhams, M.D.; von Haeseler, A.; Lanfear, R. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 2020, 37, 1530–1534. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Anisimova, M.; Gil, M.; Dufayard, J.F.; Dessimoz, C.; Gascuel, O. Survey of Branch Support Methods Demonstrates Accuracy, Power, and Robustness of Fast Likelihood-Based Approximation Schemes. Syst. Biol. 2011, 60, 685–699. [Google Scholar] [CrossRef] [Green Version]
  28. Nguyen, L.-T.; Schmidt, H.A.; von Haeseler, A.; Minh, B.Q. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. Evol. 2015, 32, 268–274. [Google Scholar] [CrossRef]
  29. Minh, B.Q.; Nguyen, M.A.T.; Von Haeseler, A. Ultrafast Approximation for Phylogenetic Bootstrap. Mol. Biol. Evol. 2013, 30, 1188–1195. [Google Scholar] [CrossRef] [Green Version]
  30. Vasimuddin, M.; Misra, S.; Li, H.; Aluru, S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. In Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, 20–24 May 2019; pp. 314–324. [Google Scholar]
  31. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R. The Sequence Alignment/Map Format and SAMtools. Bioinformatics 2009, 25, 2078–2079. [Google Scholar] [CrossRef] [Green Version]
  32. Robinson, J.T.; Thorvaldsdóttir, H.; Winckler, W.; Guttman, M.; Lander, E.S.; Getz, G.; Mesirov, J.P. Integrative Genomics Viewer. Nat. Biotechnol. 2011, 29, 24–26. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Callanan, J.; Stockdale, S.R.; Shkoporov, A.; Draper, L.A.; Ross, R.P.; Hill, C. RNA Phage Biology in a Metagenomic Era. Viruses 2018, 10, 386. [Google Scholar] [CrossRef] [Green Version]
  34. Müller, T.; Vingron, M. Modeling Amino Acid Replacement. J. Comput. Biol. 2000, 7, 761–776. [Google Scholar] [CrossRef]
  35. Wang, D.; Fu, S.; Wu, H.; Cao, M.; Liu, L.; Zhou, X.; Wu, J. Discovery and Genomic Function of a Novel Rice Dwarf-Associated Bunya-like Virus. Viruses 2022, 14, 1183. [Google Scholar] [CrossRef]
  36. Pettersson, J.H.-O.; Shi, M.; Eden, J.-S.; Holmes, E.C.; Hesson, J.C. Meta-Transcriptomic Comparison of the RNA Viromes of the Mosquito Vectors Culex Pipiens and Culex Torrentium in Northern Europe. Viruses 2019, 11, 1033. [Google Scholar] [CrossRef] [Green Version]
  37. Shi, M.; Lin, X.-D.; Tian, J.-H.; Chen, L.-J.; Chen, X.; Li, C.-X.; Qin, X.-C.; Li, J.; Cao, J.-P.; Eden, J.-S.; et al. Redefining the Invertebrate RNA Virosphere. Nature 2016, 540, 539–543. [Google Scholar] [CrossRef] [PubMed]
  38. Walker, P.J.; Widen, S.G.; Wood, T.G.; Guzman, H.; Tesh, R.B.; Vasilakis, N. A Global Genomic Characterization of Nairoviruses Identifies Nine Discrete Genogroups with Distinctive Structural Characteristics and Host-Vector Associations. Am. J. Trop. Med. Hyg. 2016, 94, 1107–1122. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Phylogenetic tree of Bunyavirales RNA-dependent RNA polymerase. The viruses discovered in this paper are in red text, and their clade is highlighted in green. The green dots along the branches represent a minimum bootstrap value of 75. Families that did not contain the novel putative viral transcripts were collapsed at their common ancestral node. The tree was rooted at the midpoint.
Figure 1. Phylogenetic tree of Bunyavirales RNA-dependent RNA polymerase. The viruses discovered in this paper are in red text, and their clade is highlighted in green. The green dots along the branches represent a minimum bootstrap value of 75. Families that did not contain the novel putative viral transcripts were collapsed at their common ancestral node. The tree was rooted at the midpoint.
Viruses 15 01654 g001
Figure 2. Viral genome occurrence across samples. Occurrence matrix showing the samples in which the putative viral sequences discovered in this study were found. Black boxes indicate that a viral genome segment was present, and grey boxes indicate that a viral genome segment was absent. The full sample occurrence matrix is shown in Figure S2.
Figure 2. Viral genome occurrence across samples. Occurrence matrix showing the samples in which the putative viral sequences discovered in this study were found. Black boxes indicate that a viral genome segment was present, and grey boxes indicate that a viral genome segment was absent. The full sample occurrence matrix is shown in Figure S2.
Viruses 15 01654 g002
Figure 3. Genome Segments of the two bunyaviruses discovered in this study. (A) The proposed genome of BSF Nairovirus-like 1. The L, M, and S segments are 4543, 2837, and 922 nucleotides, respectively. The bunyavirus RdRp conserved domain lies between nucleotides 3950–4294 (E-value = 2.43 × 10−6) on the L segment. (B) The proposed genome of BSF uncharacterized bunyavirus-like 1. The L and S segments are 5696 and 853 nucleotides, respectively. No transcript met the requirements for an M segment candidate. The bunyavirus RdRp conserved domain lies between nucleotides 2052–2888 (E-value = 1.10 × 10−15) on the L segment.
Figure 3. Genome Segments of the two bunyaviruses discovered in this study. (A) The proposed genome of BSF Nairovirus-like 1. The L, M, and S segments are 4543, 2837, and 922 nucleotides, respectively. The bunyavirus RdRp conserved domain lies between nucleotides 3950–4294 (E-value = 2.43 × 10−6) on the L segment. (B) The proposed genome of BSF uncharacterized bunyavirus-like 1. The L and S segments are 5696 and 853 nucleotides, respectively. No transcript met the requirements for an M segment candidate. The bunyavirus RdRp conserved domain lies between nucleotides 2052–2888 (E-value = 1.10 × 10−15) on the L segment.
Viruses 15 01654 g003
Table 1. Sample Overview.
Table 1. Sample Overview.
SampleNumber of Reads% Mapped to BSF Genome Sample Type
SG1 35,728 0.59% Spent Grain
T3LA2 23,447 32.26% Larvae
T3LA4 1,017,935 85.79% Larvae
T3LC1 18,192 15.88% Larvae
T3LC2 22,027 38.51% Larvae
T3LC4 162,829 86.39% Larvae
T3LR1 1,137,578 85.40% Larvae
T3LR3 12,247 24.13% Larvae
T3WA1 604 1.90% Frass
T3WA2 7373 2.10% Frass
T3WA4 7613 1.83% Frass
T3WC1 17,569 0.73% Frass
T3WC2 21,706 1.21% Frass
T3WC4 52,404 0.53% Frass
T3WR1 315,689 0.90% Frass
T3WR3 47,411 0.47% Frass
T6LA1 388,220 86.20% Larvae
T6LA2 1,264,636 90.31% Larvae
T6LA3 212,178 77.46% Larvae
T6LA4 125 9.20% Larvae
T6LC1 89,711 60.09% Larvae
T6LC2 551 26.23% Larvae
T6LC3 1,500,632 86.55% Larvae
T6LC4 833,147 93.56% Larvae
T6LR1 66 0.76% Larvae
T6LR2 210 30.71% Larvae
T6LR3 75 53.33% Larvae
T6LR4 456,745 95.04% Larvae
T6WA1 995,664 5.14% Frass
T6WA2 956,959 2.84% Frass
T6WA4 10,006 1.82% Frass
T6WC1 684,749 2.26% Frass
T6WC2 737,122 3.18% Frass
T6WC3 93,147 15.04% Frass
T6WC4 79,076 1.98% Frass
T6WR1 20,437 1.34% Frass
T6WR2 1,130,727 2.12% Frass
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Walt, H.K.; Kooienga, E.; Cammack, J.A.; Tomberlin, J.K.; Jordan, H.R.; Meyer, F.; Hoffmann, F.G. Bioinformatic Surveillance Leads to Discovery of Two Novel Putative Bunyaviruses Associated with Black Soldier Fly. Viruses 2023, 15, 1654. https://doi.org/10.3390/v15081654

AMA Style

Walt HK, Kooienga E, Cammack JA, Tomberlin JK, Jordan HR, Meyer F, Hoffmann FG. Bioinformatic Surveillance Leads to Discovery of Two Novel Putative Bunyaviruses Associated with Black Soldier Fly. Viruses. 2023; 15(8):1654. https://doi.org/10.3390/v15081654

Chicago/Turabian Style

Walt, Hunter K., Emilia Kooienga, Jonathan A. Cammack, Jeffery K. Tomberlin, Heather R. Jordan, Florencia Meyer, and Federico G. Hoffmann. 2023. "Bioinformatic Surveillance Leads to Discovery of Two Novel Putative Bunyaviruses Associated with Black Soldier Fly" Viruses 15, no. 8: 1654. https://doi.org/10.3390/v15081654

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop