Viral Network Analyzer (VirNA): A Novel Minimum Spanning Networks Algorithm for Investigating Viral Evolution
Abstract
1. Introduction
2. Results
2.1. VirNA
2.2. Comparison with State-of-the-Art Tools
2.3. Comparison with Phylogeny Results
2.4. Performance Test of VirNA
3. Materials and Methods
3.1. VirNA Algorithm Details
3.1.1. VirNA Algorithm Definitions
- Root sequence: the first sequence of the input multiple alignment is designated as the root sequence R and will not be included in the output network. This is the reference sequence against which mutations in the dataset are identified;
- Input genomic sequences: let S = {s1, s2, …, sn} be the set of input genomic sequences;
- Mutation sets: for each sequence si, where i = 1, 2, …, n, we define a set of mutations Mi = {m1, m2, …, mp}, representing the p single-character differences (i.e., mutations) between the root sequence R and the sequence si;
- Network nodes: identical sequences are grouped in a single node. Each node Gk = {s1, s2, …, sm}, with m ≤ n, is a set of identical sequences, i.e., s1 = s2 = … = sm;
- Compatibility: the mutation set Mi is said to be compatible with the mutation set Mj if and only if Mi ⊂ Mj or Mj ⊂ Mi;
- Connected component: a connected component of a directed graph is defined as a subgraph in which, for each pair of nodes Gi and Gj, there is either a path from Gi to Gj or a path from Gj to Gi.
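The definitions above can be illustrated with a minimal Python sketch. It is not the VirNA implementation: the function names (`mutation_set`, `group_identical`, `compatible`) and the encoding of a mutation as a (position, root character, sequence character) tuple are assumptions made here for illustration, and ⊂ is read as proper inclusion, which is equivalent to non-strict inclusion once identical sequences have been grouped into one node.

```python
from collections import defaultdict


def mutation_set(root, seq):
    """Return the single-character differences between the root and seq.

    Each mutation is encoded as (position, root_char, seq_char); this
    encoding is an illustrative assumption, not taken from the paper.
    """
    if len(root) != len(seq):
        raise ValueError("aligned sequences must have the same length")
    return frozenset(
        (pos, r, s) for pos, (r, s) in enumerate(zip(root, seq)) if r != s
    )


def group_identical(sequences):
    """Group identical sequences into nodes.

    Returns a dict mapping each distinct sequence to the list of input
    indices that carry it.
    """
    nodes = defaultdict(list)
    for idx, seq in enumerate(sequences):
        nodes[seq].append(idx)
    return dict(nodes)


def compatible(m_i, m_j):
    """True when one mutation set is a proper subset of the other."""
    return m_i < m_j or m_j < m_i


root = "ACGTACGT"
m_a = mutation_set(root, "ACGTACGA")  # one mutation, at position 7
m_b = mutation_set(root, "ACCTACGA")  # mutations at positions 2 and 7
m_c = mutation_set(root, "TCGTACGT")  # one mutation, at position 0
print(compatible(m_a, m_b))  # True: m_a is contained in m_b
print(compatible(m_a, m_c))  # False: neither set contains the other
```

Using `frozenset` keeps the mutation sets hashable and lets the subset operators express the compatibility test directly.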
3.1.2. Initialization
- For each node Gi, compute the corresponding set of mutations Mi compared to the root R;
- Calculate the Hamming distance between all pairs of nodes and store the values in a list HD = [HD1,2, HD1,3, …, HDn−1,n], where HDi,j is the Hamming distance between nodes Gi and Gj, with i = 1, …, n, j = i, …, n, and i ≠ j;
- Sort the list of Hamming distances HD in ascending order, removing duplicate values: SHD = {HD1, …, HDk}, where each HDi is in HD, k ≤ n(n − 1), HDi ≠ HDj for each i ≠ j, and HDa < HDb for each a, b = 1, …, k with a < b;
- Create a starting network, MSN(0), in which none of the nodes (grouped sequences) are connected.
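As a hedged illustration of the initialization, the sketch below computes the pairwise Hamming distances and the sorted list of distinct values. The function names and the choice of a dict keyed by index pairs (i, j) with i < j are assumptions made for this example; the paper does not prescribe a data structure.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(nodes):
    """Map each unordered node pair (i, j), i < j, to its Hamming distance."""
    return {
        (i, j): hamming(nodes[i], nodes[j])
        for i, j in combinations(range(len(nodes)), 2)
    }


def sorted_unique_distances(hd):
    """Distinct distances in ascending order (the list SHD)."""
    return sorted(set(hd.values()))


# One representative sequence per node (toy data).
nodes = ["ACGTACGA", "ACCTACGA", "TCGTACGT"]
hd = pairwise_hamming(nodes)
print(hd)                           # {(0, 1): 1, (0, 2): 2, (1, 2): 3}
print(sorted_unique_distances(hd))  # [1, 2, 3]
```

The starting network MSN(0) then corresponds to an empty edge list over these nodes.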
3.1.3. Iterative Steps
- HDi,j == HDm;
- Gi is compatible with Gj;
- Gi and Gj do not belong to the same connected component;
- Stop criteria at step m:
- All the nodes Gi, 1 ≤ i ≤ n, are part of a single connected component of the current network MSN(m − 1);
- HDm > D, where HDm is the m-th distance in SHD, and D is a user-defined maximum allowable distance.
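The conditions and stop criteria listed above can be sketched as a loop over the sorted distances. This is an interpretation under stated assumptions, not the authors' code: it assumes that at step m an edge is added for every pair of nodes satisfying all three conditions, that component membership is read from the network as it stood at the end of the previous step, MSN(m − 1), and that the iteration stops as soon as either stop criterion holds. The name `build_msn`, the union-find bookkeeping, and the (position, character) encoding of mutations are all hypothetical choices.

```python
from itertools import combinations


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def build_msn(root, nodes, max_distance):
    """Return the edges (i, j, distance) of the network.

    root: reference sequence; nodes: one representative sequence per node;
    max_distance: the user-defined threshold D.
    """
    n = len(nodes)
    muts = [
        frozenset((p, c) for p, (r, c) in enumerate(zip(root, s)) if r != c)
        for s in nodes
    ]
    dist = {
        (i, j): sum(a != b for a, b in zip(nodes[i], nodes[j]))
        for i, j in combinations(range(n), 2)
    }
    parent = list(range(n))
    components = n
    edges = []
    for d in sorted(set(dist.values())):
        # Stop criteria: a single component, or distance beyond D.
        if components == 1 or d > max_distance:
            break
        # Assumption: components are those of MSN(m - 1), so snapshot them.
        label = [find(parent, x) for x in range(n)]
        for (i, j), hd in dist.items():
            if hd != d:
                continue  # condition 1: distance equals HDm
            if not (muts[i] < muts[j] or muts[j] < muts[i]):
                continue  # condition 2: mutation sets are compatible
            if label[i] == label[j]:
                continue  # condition 3: different connected components
            edges.append((i, j, d))
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
    return edges


root = "ACGTACGT"
nodes = ["ACGTACGA", "ACCTACGA", "TCGTACGT", "ACCTAGGA"]
print(build_msn(root, nodes, max_distance=10))  # [(0, 1, 1), (1, 3, 1)]
```

In this toy run, node 2 stays isolated because its mutation set is not nested with any other, so the loop ends by exhausting the distances rather than by reaching a single component. Taking the snapshot of components before each step lets several equal-distance edges between the same two components coexist, which is what distinguishes a network from a tree under this reading.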
3.1.4. Output
3.2. Real Data, State of the Art Tools and Phylogeny
4. Concluding Remarks
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mazzotti, G.; Bianco, L.; Lavezzo, E.; Bado, M.; Toppo, S.; Fontana, P. Viral Network Analyzer (VirNA): A Novel Minimum Spanning Networks Algorithm for Investigating Viral Evolution. Int. J. Mol. Sci. 2025, 26, 2008. https://doi.org/10.3390/ijms26052008