Hardware Acceleration of Identifying Barcodes in Multiplexed Nanopore Sequencing
Abstract
1. Introduction
2. Background
2.1. Sequencing Barcode Identification Method
Algorithm 1. Barcode identification algorithm in [12].
2.2. DP Algorithm for Identifying Indel
1. Calculate the edit distance between sequence elements and ; the calculation equation is
2. Backtrack from to to find the optimum path (i,j), which can be expressed as
3. Use the optimum path (i,j) to get the vector (i,j) that marks the indel positions, as follows
4. Based on the marker vector (i,j), the insertion and deletion errors in the corrupted sequence can be corrected.
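The four steps above can be sketched in software as follows. This is a minimal illustration that assumes the standard unit-cost (Levenshtein) recurrence, since the exact equations are not reproduced here. The function names `edit_distance_table`, `backtrack`, and `apply_marks`, the operation labels `"M"`, `"S"`, `"I"`, `"D"`, and the `fill` value used for deleted positions are all hypothetical and are not taken from the paper.

```python
def edit_distance_table(s, p):
    """Step 1: fill the (len(s)+1) x (len(p)+1) table of edit distances."""
    n, m = len(s), len(p)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == p[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # extra symbol in s
                          D[i][j - 1] + 1,         # symbol of p missing in s
                          D[i - 1][j - 1] + cost)  # match or substitution
    return D


def backtrack(D, s, p):
    """Steps 2-3: walk from D[n][m] back to D[0][0] and label each move."""
    i, j = len(s), len(p)
    ops = []
    while i > 0 or j > 0:
        cost = 0 if (i > 0 and j > 0 and s[i - 1] == p[j - 1]) else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + cost:
            ops.append("M" if cost == 0 else "S")
            i -= 1
            j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append("I")  # assumed meaning: insertion in the received sequence
            i -= 1
        else:
            ops.append("D")  # assumed meaning: deletion in the received sequence
            j -= 1
    ops.reverse()
    return ops


def apply_marks(ops, d, fill=0):
    """Step 4: drop inserted symbols of d and pad deleted positions with fill."""
    out, k = [], 0
    for op in ops:
        if op == "D":
            out.append(fill)
        else:
            if op != "I":
                out.append(d[k])
            k += 1
    return out


s, p = "ACGT", "AGT"
ops = backtrack(edit_distance_table(s, p), s, p)
print(ops)                             # ['M', 'I', 'M', 'M']
print(apply_marks(ops, [1, 0, 1, 1]))  # [1, 1, 1]
```

Substitutions are left in place by `apply_marks`, which matches the division of labor described in the text: indels are handled by the DP stage, and the remaining substitution errors are left to the code's decoder.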
3. Barcode Identification Accelerator
1. The -bit barcode r = is transformed into the sequence s = and code d = through the demapping module;
2. In the cyclic shifting module, it is assumed that the code d and the sequence s have been cyclically shifted to the left l times () to obtain the sequence and codeword . Meanwhile, the predetermined sequence p has been cyclically shifted to the left k times () to obtain the sequence ;
3. The shifted sequences , , and code are input into the dynamic programming module. This module identifies the indels in code by calculating the edit distance between the sequences s and p and backtracking along the optimal path, which yields the modified codeword ;
4. Subsequently, the remaining substitution errors in the codeword are decoded by the cyclic code decoder. If the decoded codeword meets the condition, that is, all syndromes are equal to zero, it is cyclically shifted to the right k times and output as the final result. Otherwise, the accelerator returns to the cyclic shifting module;
5. In the cyclic shifting module, if , we cyclically shift the sequence to the left once, and then perform indel identification. When , if , the sequence and the codeword are cyclically shifted to the left once, and then indel identification is performed; otherwise, the accelerator directly outputs the final decoding result .
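The retry loop in steps 2 to 5 can be sketched as two nested searches over the shift counts. This is a simplified software view under stated assumptions: the shift limits `max_l` and `max_k` stand in for bounds that are not reproduced here, and `fix_indels` and `decode` are hypothetical callables standing in for the dynamic programming module and the cyclic code decoder. None of these names come from the paper.

```python
def rotl(seq, k):
    """Cyclically shift a non-empty list to the left k times."""
    k %= len(seq)
    return seq[k:] + seq[:k]


def rotr(seq, k):
    """Cyclically shift a non-empty list to the right k times."""
    return rotl(seq, -k)


def identify(s, d, p, fix_indels, decode, max_l, max_k):
    """Try every (l, k) shift pair until the decoder accepts a codeword.

    fix_indels(s_l, p_k, d_l) -> codeword with indels corrected (hypothetical)
    decode(codeword) -> (ok, decoded), ok meaning all syndromes are zero
    Returns (ok, result); on failure, result is the last decoding attempt.
    """
    decoded = None
    for l in range(max_l + 1):            # outer loop: shift s and d together
        s_l, d_l = rotl(s, l), rotl(d, l)
        for k in range(max_k + 1):        # inner loop: shift p
            p_k = rotl(p, k)
            ok, decoded = decode(fix_indels(s_l, p_k, d_l))
            if ok:
                return True, rotr(decoded, k)  # undo the k left shifts
    return False, decoded


# Toy stand-ins: no indel correction, and a decoder that accepts one codeword.
accept = [1, 0, 0]
result = identify(
    s=[0, 0, 1], d=[0, 0, 1], p=[0, 1, 1],
    fix_indels=lambda s_l, p_k, d_l: d_l,
    decode=lambda c: (c == accept, c),
    max_l=2, max_k=0,
)
print(result)  # (True, [1, 0, 0])
```

The inner loop over `k` and outer loop over `l` mirror the order in step 5, where the predetermined sequence is shifted first and the received sequence and codeword are shifted only once the shifts of the predetermined sequence are exhausted.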
3.1. Dynamic Programming Module
3.1.1. Edit-Distance Calculation Module
- PE sequentially receives in every clock cycle and sends it to PE in the next clock cycle. Similarly, PE receives the 1-bit from PE and outputs it to PE in the following clock cycle;
- The elements in every column correspond to one bit, respectively, in the sequence . In other words, as far as PE is concerned, the corresponding bit in the sequence (i.e., ) is constant, and it is only read at the beginning of the process;
- At the clock cycle, the input of PE is the output result of the same PE in the clock cycle. Moreover, the input and of PE are the output results of the PE in the and clock cycles, respectively;
- The calculated result is not only the input of PE, but also stored in a random access memory (RAM) with a width of K bits and depth of , where n is the sequence’s length, and it satisfies .
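The data flow described above, in which each PE reuses its own previous output and the outputs of the neighboring PE from the previous two clock cycles, corresponds to evaluating the edit-distance table one anti-diagonal per clock cycle. The following sketch simulates that schedule in software. It assumes the standard unit-cost recurrence and one PE per column; the function name `systolic_edit_distance` and the cycle counter are illustrative only and do not model the RAM width or pipeline latency of the actual design.

```python
def systolic_edit_distance(s, p):
    """Simulate a linear systolic array with one PE per column of the table.

    In clock cycle t, PE j computes cell (i, j) with i = t - j, so all cells
    on the anti-diagonal i + j = t are produced in the same cycle.
    Assumes both sequences are non-empty.
    Returns (edit distance, number of simulated clock cycles).
    """
    n, m = len(s), len(p)
    D = [[0] * (m + 1) for _ in range(n + 1)]  # plays the role of the result RAM
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j

    cycles = 0
    for t in range(2, n + m + 1):
        first_pe, last_pe = max(1, t - n), min(m, t - 1)
        for j in range(first_pe, last_pe + 1):  # PEs active in this cycle
            i = t - j
            cost = 0 if s[i - 1] == p[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,         # same PE, previous cycle
                D[i][j - 1] + 1,         # left neighbor, previous cycle
                D[i - 1][j - 1] + cost,  # left neighbor, two cycles earlier
            )
        cycles += 1
    return D[n][m], cycles


print(systolic_edit_distance("kitten", "sitting"))  # (3, 12)
```

Because every cell on an anti-diagonal depends only on the two preceding anti-diagonals, the inner loop iterations are independent, which is what lets the hardware evaluate them in parallel. The simulated cycle count is n + m - 1 rather than the n × m steps of a sequential evaluation.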
3.1.2. Indel-Marking Module
- If is the minimum, it means that an insertion error appears in the sequence at the position to i. In this case, the shift register sfr1 writes 2-bit data “01” in the upper direction;
- If is the minimum, it indicates that a deletion error appears in the sequence at the position to i. In this case, the shift register sfr1 writes 2-bit data “10” in the upper direction;
- If is the minimum, it means that the data are transmitted correctly or there is a substitution error in the sequence at position . In this case, the shift register sfr1 writes 2-bit data “11” in the upper direction.
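The three rules above can be sketched as a backtracking pass over the stored edit-distance table that emits one 2-bit mark per step. Which neighboring cell corresponds to an insertion and which to a deletion is an assumption here, as is the tie-breaking rule (the diagonal neighbor is preferred, which keeps the path optimal for the unit-cost recurrence). The function name `mark_indels` is hypothetical, and a Python list stands in for the shift register.

```python
MATCH_OR_SUB = "11"  # correct transmission or substitution
INSERTION = "01"
DELETION = "10"


def mark_indels(D):
    """Backtrack through a filled edit-distance table D and emit 2-bit marks.

    D[i][j] is the edit distance between the first i received symbols and the
    first j symbols of the predetermined sequence. Marks are returned in
    sequence order (first position first).
    """
    i, j = len(D) - 1, len(D[0]) - 1
    marks = []
    while i > 0 and j > 0:
        diag, up, left = D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]
        if diag <= up and diag <= left:  # assumed tie-break: prefer diagonal
            marks.append(MATCH_OR_SUB)
            i -= 1
            j -= 1
        elif up <= left:                 # assumed: upper neighbor = insertion
            marks.append(INSERTION)
            i -= 1
        else:                            # assumed: left neighbor = deletion
            marks.append(DELETION)
            j -= 1
    marks.extend([INSERTION] * i)        # leftover received symbols
    marks.extend([DELETION] * j)         # leftover predetermined symbols
    marks.reverse()
    return marks


# Table for received "ACGT" against predetermined "AGT" (one inserted symbol).
table = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 1, 2],
    [3, 2, 1, 2],
    [4, 3, 2, 1],
]
print(mark_indels(table))  # ['11', '01', '11', '11']
```

The mark `"01"` at the second position flags the extra symbol in the received sequence, which a subsequent correction stage can then remove.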
3.1.3. Indel-Correction Module
3.2. Cyclic Code Decoder
4. Results
4.1. Performance Verification
4.2. Hardware Implementation Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
FPGA | Field-programmable gate array |
NGS | Next-generation sequencing |
DNA | Deoxyribonucleic acid |
CPU | Central processing unit |
DP | Dynamic programming |
PE | Processing element |
LSA | Linear systolic array |
RAM | Random-access memory |
BCH | Bose–Chaudhuri–Hocquenghem |
KES | Key equation solver |
IBM | Inversionless Berlekamp–Massey |
GPU | Graphics processing unit |
References
- Hardwick, S.A.; Deveson, I.W.; Mercer, T.R. Reference standards for next-generation sequencing. Nat. Rev. Genet. 2017, 18, 473–484.
- Hu, T.; Chitnis, N.; Monos, D.; Dinh, A. Next-generation sequencing technologies: An overview. Hum. Immunol. 2021, 82, 801–811.
- Krishnan, A.R.; Sweeney, M.; Vasic, J.; Galbraith, D.W.; Vasic, B. Barcodes for DNA sequencing with guaranteed error correction capability. Electron. Lett. 2011, 47, 236–237.
- Abeynayake, S.W.; Fiorito, S.; Dinsdale, A.; Whattam, M.; Crowe, B.; Sparks, K.; Campbell, P.R.; Gambley, C. A rapid and cost-effective identification of invertebrate pests at the borders using MinION sequencing of DNA barcodes. Genes 2021, 8, 1138.
- Chen, W.; Han, M.; Zhou, J.; Ge, Q.; Wang, P.; Zhu, S.; Song, L.; Yuan, Y. An artificial chromosome for data storage. Natl. Sci. Rev. 2021, 8, nwab028–nwab037.
- Chen, W.; Wang, L.; Han, M.; Han, C.; Li, B. Sequencing barcode construction and identification methods based on block error-correction codes. Sci. China Life Sci. 2020, 63, 1580–1592.
- Hamady, M.; Walker, J.J.; Harris, J.K.; Gold, N.J.; Knight, R. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat. Methods 2008, 5, 235–237.
- Bystrykh, L.V. Generalized DNA barcode design based on hamming codes. PLoS ONE 2012, 7, e36852.
- Buschmann, T.; Bystrykh, L.V. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinform. 2013, 14, 272–281.
- Hawkins, J.A.; Jones, S.K.; Finkelstein, I.J.; Press, W.H.; Callan, C.G.; Troyanskaya, O.G.; Weissman, J.S. Indel-correcting DNA barcodes for high-throughput sequencing. Proc. Natl. Acad. Sci. USA 2018, 115, e6217–e6226.
- Kracht, D.; Schober, S. Insertion and deletion correcting DNA barcodes based on watermarks. BMC Bioinform. 2015, 16, 50.
- Chen, W.; Wang, P.; Wang, L.; Zhang, D.; Han, M.; Han, M.; Song, L. Low-complexity and highly robust barcodes for error-rich single molecular sequencing. 3 Biotech 2021, 11, 78.
- Goenka, S.D.; Gorzynski, J.E.; Shafin, K.; Fisk, D.G.; Pesout, T.; Jensen, T.D.; Monlong, J. Accelerated identification of disease-causing variants with ultra-rapid nanopore genome sequencing. Nat. Biotechnol. 2022, 40, 1035–1041.
- Gorzynski, J.E.; Goenka, S.D.; Shafin, K.; Jensen, T.D.; Fisk, D.G.; Grove, M.E. Ultrarapid nanopore genome sequencing in a critical care setting. N. Engl. J. Med. 2022, 386, 700–702.
- Pomerantz, A.; Peñafiel, N.; Arteaga, A.; Bustamante, L.; Pichardo, F.; Coloma, L.A.; Prost, S. Real-time DNA barcoding in a rainforest using nanopore sequencing: Opportunities for rapid biodiversity assessments and local capacity building. Gigascience 2018, 7, giy033.
- Awad, M. FPGA supercomputing platforms: A survey. In Proceedings of the International Conference on Field Programmable Logic, Prague, Czech Republic, 31 August–2 September 2009; pp. 564–568.
- Leiserson, C.E.; Thompson, N.C.; Emer, J.S.; Kuszmaul, B.C.; Tao, B.S. There is plenty of room at the top: What will drive computer performance after Moore’s law. Science 2020, 368, e9744.
- Stivala, A.; Stuckey, P.J.; Banda, M.G.D.L.; Hermenegildo, M.; Wirth, A. Lock-free parallel dynamic programming. J. Parallel Distrib. Comput. 2010, 70, 839–848.
- Guo, X.; Hong, W.; Devabhaktuni, V. A systolic array-based FPGA parallel architecture for the BLAST algorithm. ISRN Bioinform. 2012, 2012, 195658.
- Casale-Brunet, S.; Bezati, E.; Mattavelli, M. Design space exploration of dataflow-based Smith-Waterman FPGA implementations. In Proceedings of the 2017 IEEE International Workshop on Signal Processing Systems (SiPS), Lorient, France, 3–5 October 2017; pp. 1–6.
- Shah, H.A.; Hasan, L.; Koo, I. Optimized and portable FPGA-based systolic cell architecture for Smith-Waterman-based DNA sequence alignment. J. Inf. Commun. Converg. Eng. 2016, 14, 26–34.
- Koliogeorgi, K.; Voss, N.; Fytraki, S.; Xydis, S.; Soudris, D. Dataflow acceleration of Smith-Waterman with traceback for high throughput next generation sequencing. In Proceedings of the 2019 29th International Conference on Field Programmable Logic and Applications (FPL), Barcelona, Spain, 8–12 September 2019; pp. 74–80.
- Jain, M.; Koren, S.; Miga, K.H.; Quick, J.; Rand, A.C.; Sasani, T.A. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 2018, 36, 338–345.
- Kruskal, J.B. An overview of sequence comparison: Time warps, string edits, and macromolecules. SIAM Rev. 1983, 25, 201–237.
- Nawaz, Z.; Nadeem, M.; Someren, H.V.; Bertels, K. A parallel FPGA design of the Smith-Waterman traceback. In Proceedings of the 2010 International Conference on Field-Programmable Technology, Beijing, China, 8–10 December 2010; pp. 454–459.
- Tithi, J.J.; Crago, N.C.; Emer, J.S. Exploiting spatial architectures for edit distance algorithms. In Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Monterey, CA, USA, 23–25 March 2014; pp. 23–34.
- Benkrid, K.; Liu, Y.; Benkrid, A.S. A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2009, 17, 561–570.
- Rucci, E.; Garcia, C.; Botella, G.; Giusti, A.D.; Naiouf, M.; Prieto-Matias, M. Swifold: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences. BMC Syst. Biol. 2018, 12, 96.
- Gok, M.; Yilmaz, C. Efficient cell designs for systolic Smith-Waterman implementations. In Proceedings of the 16th International Conference on Field Programmable Logic and Applications, Madrid, Spain, 28–30 August 2007; pp. 1–4.
- Reed, I.S.; Shih, M.T. VLSI design of inverse-free Berlekamp-Massey algorithm. IEE Proc. E (Comput. Digit. Tech.) 1991, 138, 295–298.
- Chen, C.; Han, Y.S.; Wang, Z.; Bai, B. A new inversionless Berlekamp-Massey algorithm with efficient architecture. In Proceedings of the 2019 IEEE International Workshop on Signal Processing Systems (SiPS), Nanjing, China, 20–23 October 2019; pp. 48–53.
- Hwang, T. Parallel decoding of binary BCH codes. Electron. Lett. 1991, 27, 2223–2225.
- Davey, M.C.; Mackay, D.J.C. Reliable communication over channels with insertions, deletions, and substitutions. IEEE Trans. Inf. Theory 2001, 42, 687–698.
- Ezpeleta, J.; Krsticevic, F.J.; Bulacio, P.; Tapia, E. Designing robust watermark barcodes for multiplex long-read sequencing. BMC Bioinform. 2017, 33, 807–813.
- Costea, P.I.; Lundeberg, J.; Akan, P. Tag GD: Fast and accurate software for DNA tag generation and demultiplexing. PLoS ONE 2013, 8, e57521.
- Tambe, A.; Pachter, L. Barcode identification for single cell genomics. BMC Bioinform. 2019, 20, 32–41.
Barcode Length | Registers | LUTs | Slices | Frequency/MHz |
---|---|---|---|---|
12-nt | 1084 (1%) | 2676 (1%) | 983 (2%) | 226.4 |
15-nt | 1860 (1%) | 4583 (3%) | 1366 (3%) | 228.7 |
31-nt | 7093 (2%) | 24,712 (16%) | 8171 (21%) | 198.6 |
Barcode Length | BCH Code | CPU Time | FPGA Time | Speedup |
---|---|---|---|---|
12-nt | (12,4,2) | 4 min 30.3 s | 3.9 s | 69× |
15-nt | (15,5,3) | 6 min 54.5 s | 5.3 s | 78× |
31-nt | (31,16,3) | 54 min 16.1 s | 17.6 s | 183× |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hu, W.; Zhang, Y.; Zhang, H.; Chen, W. Hardware Acceleration of Identifying Barcodes in Multiplexed Nanopore Sequencing. Electronics 2022, 11, 2596. https://doi.org/10.3390/electronics11162596