Probably Correct: Rescuing Repeats with Short and Long Reads
Abstract
1. Introduction
2. Reference Genomes Are Inherently Incomplete
3. Short Reads
3.1. Methods of Multi-Mapping Read Assignment
3.2. Multi-Mapping Reads in RNA-Seq, ChIP-Seq, Hi-C, and Exome Sequencing
3.3. Repeat Masking and Its Consequences
3.4. Sex Chromosomes
4. Long Reads
4.1. Long-Read Sequencing Strategies
4.2. Differentiating (Nearly) Identical Repeat Arrays
4.3. Long-Read Assemblies
5. Future
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Chromosome | Mapping proportion to hg38 (%) | Mapping proportion to itself (%) | Mapping proportion to masked hg38 (%) | Mapping proportion to masked itself (%) |
|---|---|---|---|---|
| 1 | 7.6 | 33.8 | 5.2 | 6.2 |
| 2 | 7.7 | 34.7 | 5.3 | 6.5 |
| 3 | 6.6 | 32.1 | 4.4 | 4.0 |
| 4 | 6.4 | 32.4 | 4.1 | 4.0 |
| 5 | 6.0 | 31.5 | 3.9 | 3.7 |
| 6 | 5.4 | 30.6 | 3.8 | 3.6 |
| 7 | 5.0 | 31.5 | 3.4 | 4.5 |
| 8 | 4.6 | 30.0 | 3.2 | 3.0 |
| 9 | 3.9 | 30.3 | 2.6 | 3.6 |
| 10 | 4.3 | 30.7 | 3.0 | 4.5 |
| 11 | 4.2 | 29.0 | 3.3 | 3.1 |
| 12 | 4.2 | 29.3 | 2.8 | 2.7 |
| 13 | 3.2 | 27.7 | 2.2 | 2.1 |
| 14 | 2.8 | 28.6 | 2.0 | 2.0 |
| 15 | 2.5 | 27.3 | 1.8 | 1.8 |
| 16 | 2.7 | 27.7 | 2.0 | 3.1 |
| 17 | 2.4 | 27.6 | 1.7 | 3.1 |
| 18 | 2.5 | 26.9 | 1.7 | 1.7 |
| 19 | 1.6 | 25.0 | 0.9 | 1.0 |
| 20 | 2.1 | 26.9 | 1.6 | 2.7 |
| 21 | 1.4 | 25.8 | 0.9 | 1.2 |
| 22 | 1.2 | 26.0 | 0.8 | 2.8 |
| X | 4.9 | 29.9 | 2.7 | 2.6 |
| Y¹ | 0.2 | 23.6 | 0.4 | 2.1 |
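As a worked illustration of the arithmetic behind the per-chromosome mapping proportions, the Python sketch below transcribes the table's values and computes, for each chromosome, the change in percentage points when the target is repeat-masked. The names `TABLE` and `masking_change` are hypothetical helpers introduced only for this illustration; the sketch restates the published numbers and makes no assumption about how the underlying proportions were measured.

```python
"""Percentage-point change in mapping proportion after repeat masking.

Values are transcribed from the per-chromosome table; the tuple order
mirrors the table's columns. TABLE and masking_change are illustrative
helpers, not part of any published pipeline.
"""

# chromosome -> (to hg38, to itself, to masked hg38, to masked itself), in %
TABLE = {
    "1": (7.6, 33.8, 5.2, 6.2),
    "2": (7.7, 34.7, 5.3, 6.5),
    "3": (6.6, 32.1, 4.4, 4.0),
    "4": (6.4, 32.4, 4.1, 4.0),
    "5": (6.0, 31.5, 3.9, 3.7),
    "6": (5.4, 30.6, 3.8, 3.6),
    "7": (5.0, 31.5, 3.4, 4.5),
    "8": (4.6, 30.0, 3.2, 3.0),
    "9": (3.9, 30.3, 2.6, 3.6),
    "10": (4.3, 30.7, 3.0, 4.5),
    "11": (4.2, 29.0, 3.3, 3.1),
    "12": (4.2, 29.3, 2.8, 2.7),
    "13": (3.2, 27.7, 2.2, 2.1),
    "14": (2.8, 28.6, 2.0, 2.0),
    "15": (2.5, 27.3, 1.8, 1.8),
    "16": (2.7, 27.7, 2.0, 3.1),
    "17": (2.4, 27.6, 1.7, 3.1),
    "18": (2.5, 26.9, 1.7, 1.7),
    "19": (1.6, 25.0, 0.9, 1.0),
    "20": (2.1, 26.9, 1.6, 2.7),
    "21": (1.4, 25.8, 0.9, 1.2),
    "22": (1.2, 26.0, 0.8, 2.8),
    "X": (4.9, 29.9, 2.7, 2.6),
    "Y": (0.2, 23.6, 0.4, 2.1),
}


def masking_change(row):
    """Return (masked - unmasked) in percentage points: vs hg38, vs itself."""
    to_hg38, to_itself, to_masked_hg38, to_masked_itself = row
    # Round to one decimal, matching the precision of the table.
    return (round(to_masked_hg38 - to_hg38, 1),
            round(to_masked_itself - to_itself, 1))


if __name__ == "__main__":
    for chrom, row in TABLE.items():
        d_hg38, d_self = masking_change(row)
        print(f"chr{chrom}: {d_hg38:+.1f} pp vs hg38, {d_self:+.1f} pp vs itself")
```

For chromosome 1, for example, the proportion against itself falls from 33.8% to 6.2% (a change of -27.6 percentage points). Chromosome Y against hg38 is the only entry where masking raises the proportion (0.2% to 0.4%).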
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Cechova, M. Probably Correct: Rescuing Repeats with Short and Long Reads. Genes 2021, 12, 48. https://doi.org/10.3390/genes12010048