Image Storage in DNA by an Extensible Quaternary Codec System
Abstract
1. Introduction
- Dividing an image into B, G, and R color channels for separate encoding and storage.
- Applying quaternary Huffman encoding to each color channel, directly encoding image information into nucleotide sequences composed of ATCG, eliminating the traditional “data → binary data stream → ATCG sequence” process, and overcoming the storage density limit of 2 bits/nt.
- Incorporating Reed–Solomon error correction codes into each DNA sequence to enhance the extensibility of the proposed method.
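The channel separation in the first item can be pictured with a minimal sketch. It assumes the image is already loaded as rows of (B, G, R) integer tuples; the function name `split_bgr` and this in-memory layout are illustrative assumptions, not details taken from the paper.

```python
def split_bgr(image):
    """Split an image given as rows of (B, G, R) tuples into three flat channels.

    The row-major flattening order is an assumption made for this sketch.
    """
    blue, green, red = [], [], []
    for row in image:
        for b, g, r in row:
            blue.append(b)
            green.append(g)
            red.append(r)
    return blue, green, red


# A hypothetical 2x2 image; each channel is then encoded independently.
tiny_image = [
    [(10, 20, 30), (10, 25, 30)],
    [(12, 20, 31), (10, 20, 30)],
]
blue_channel, green_channel, red_channel = split_bgr(tiny_image)
```

Keeping the channels separate means each one gets its own symbol statistics, so a code built per channel can adapt to that channel's pixel distribution.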
2. Materials and Research Methods
2.1. Encoding
- Probability Calculation: The probability of the occurrence of each pixel value in the image is calculated and used as the weight of the leaf nodes of the Huffman tree. The nodes are sorted in ascending order of their weights (from bottom to top).
- Tree Construction: The four nodes with the smallest weights are sequentially encoded as A/T/C/G. The weights of these four nodes are summed, and the result is used as the weight of a new node to construct the next level of the Huffman tree. This process is repeated until all nodes are encoded. Replacing each pixel value with its codeword then yields the initial sequence, denoted as seq_init.
- Metadata Generation: The initial encoded sequence is analyzed to generate an information table and a tree table, which are stored in the computer to enhance decoding accuracy.
- Biochemical Constraints: Due to limitations in DNA synthesis and sequencing technologies [22,23,24], biochemical constraints must be applied to the initial sequence. The ETQ system adopts a two-step extended mapping strategy:
  - GC Content Control: A 5-to-6 mapping is used, grouping every five bases and mapping them to six bases with a GC ratio close to 50%.
  - Homopolymer Length Control: Sliding windows are applied to replace the homopolymers “CCCC”, “GGGG”, “TTTT”, and “AAAA” with “ATGCC”, “ATGCG”, “ATGCT”, and “ATGCA”, respectively, ensuring a controlled homopolymer length.
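The tree construction and the homopolymer substitution described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: zero-weight dummy leaves are added so that every merge takes exactly four nodes (standard practice for 4-ary Huffman codes, not stated in the text), ties are broken by insertion order, and all function names are hypothetical. The 5-to-6 GC mapping is omitted because its table is not given here.

```python
import heapq
from collections import Counter
from itertools import count

BASES = "ATCG"  # branch labels, in the order listed in the text


def quaternary_huffman(values):
    """Return a {pixel value: nucleotide codeword} table from symbol frequencies."""
    tie = count()  # unique tie-breaker so heap entries never compare nodes
    heap = [(w, next(tie), ("leaf", sym)) for sym, w in Counter(values).items()]
    # Assumption: pad with zero-weight dummies until (leaves - 1) % 3 == 0,
    # so that each merge step can take exactly four nodes.
    while len(heap) > 1 and (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tie), ("dummy", None)))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(4)]  # four smallest weights
        weight = sum(item[0] for item in group)
        heapq.heappush(heap, (weight, next(tie), ("node", [g[2] for g in group])))

    table = {}

    def walk(node, prefix):
        kind, payload = node
        if kind == "leaf":
            table[payload] = prefix or BASES[0]  # single-symbol edge case
        elif kind == "node":
            for base, child in zip(BASES, payload):
                walk(child, prefix + base)

    walk(heap[0][2], "")
    return table


def encode_pixels(values, table):
    """Concatenate codewords to form the initial sequence (seq_init in the text)."""
    return "".join(table[v] for v in values)


def decode_pixels(seq, table):
    """Invert encode_pixels; this works because Huffman codes are prefix-free."""
    inverse = {code: sym for sym, code in table.items()}
    out, buf = [], ""
    for base in seq:
        buf += base
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out


def break_homopolymers(seq, run=4):
    """Left-to-right scan replacing each run of four identical bases X with ATGC+X."""
    out, i = [], 0
    while i < len(seq):
        window = seq[i:i + run]
        if len(window) == run and window == window[0] * run:
            out.append("ATGC" + window[0])
            i += run
        else:
            out.append(seq[i])
            i += 1
    return "".join(out)
```

For a hypothetical 16-pixel channel with values `[0]*8 + [255]*4 + [128]*2 + [64, 32]`, this sketch assigns one-base codewords to the three most frequent values and two-base codewords to the other two, giving 18 nt for 128 bits of pixel data before constraints, error correction, and the stored tables are accounted for. Reversing the homopolymer substitution at decode time is not addressed by this sketch.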
2.2. Error Correction
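- Syndrome Computation: The syndrome values $S_j$ are computed by evaluating the received polynomial $r(x)$ at consecutive powers of the primitive element $\alpha$, specifically at $\alpha^j$ for $j = 1, 2, \ldots, n-k$, within the Galois field over which the code is defined. This operation mathematically corresponds to $S_j = r(\alpha^j) = \sum_{i=0}^{n-1} r_i \alpha^{ij}$.
- Error Locator Polynomial: Employ the Berlekamp–Massey algorithm to derive the coefficients $\lambda_i$ of the error locator polynomial $\Lambda(x) = 1 + \sum_{i=1}^{\nu} \lambda_i x^i$, where $\nu$ is the number of errors.
- Root Identification: Execute a Chien search [30] to determine the roots of $\Lambda(x)$, yielding the error positions $i_k$.
- Error Magnitude Calculation: Apply Forney’s algorithm [31] to compute the error magnitudes $e_{i_k}$.
- Error Correction: Reconstruct the error polynomial $e(x)$ using the error positions and magnitudes, and recover the codeword $c(x) = r(x) - e(x)$.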
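The decoding pipeline above can be illustrated with a deliberately simplified sketch. It makes several assumptions that are not from the paper: a small prime field GF(13) with primitive element 2, non-systematic encoding by multiplication with the generator polynomial, and only two parity symbols, so a single error is located and corrected in closed form rather than with the Berlekamp–Massey and Forney algorithms. All names are hypothetical.

```python
P = 13        # assumed prime field size, chosen only to keep arithmetic simple
ALPHA = 2     # 2 is a primitive element of GF(13)
N_PARITY = 2  # n - k; two parity symbols allow one symbol error to be corrected


def poly_mul(a, b):
    """Multiply two polynomials over GF(P); index i holds the coefficient of x^i."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out


def poly_eval(coeffs, x):
    """Evaluate a polynomial at x in GF(P) using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def rs_encode(message):
    """Non-systematic encoding: c(x) = m(x) * g(x), g(x) = prod_j (x - alpha^j)."""
    g = [1]
    for j in range(1, N_PARITY + 1):
        g = poly_mul(g, [(-pow(ALPHA, j, P)) % P, 1])
    return poly_mul(message, g)


def syndromes(received):
    """S_j = r(alpha^j) for j = 1..n-k; all zero for an error-free codeword."""
    return [poly_eval(received, pow(ALPHA, j, P)) for j in range(1, N_PARITY + 1)]


def correct_single_error(received):
    """Correct at most one symbol error (special case of the general decoder)."""
    s1, s2 = syndromes(received)
    if s1 == 0 and s2 == 0:
        return list(received)
    if s1 == 0:
        raise ValueError("more than one error; beyond this sketch")
    # For one error e at position i: S1 = e*alpha^i and S2 = e*alpha^(2i).
    locator = s2 * pow(s1, -1, P) % P  # equals alpha^i (needs Python 3.8+)
    # Exhaustive search over positions, playing the role of the Chien search.
    for i in range(len(received)):
        if pow(ALPHA, i, P) == locator:
            magnitude = s1 * pow(locator, -1, P) % P  # e = S1 / alpha^i
            fixed = list(received)
            fixed[i] = (fixed[i] - magnitude) % P     # c(x) = r(x) - e(x)
            return fixed
    raise ValueError("error position outside the codeword")


codeword = rs_encode([3, 1, 4, 1, 5])
corrupted = list(codeword)
corrupted[3] = (corrupted[3] + 6) % P  # inject one symbol error
recovered = correct_single_error(corrupted)
```

With more parity symbols the same syndromes feed the Berlekamp–Massey step, and the closed-form division used here generalizes to the Chien search and Forney's formula.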
2.3. Synthesis and Sequencing
3. Results
4. Discussion
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Zhirnov, V.; Zadegan, R.M.; Sandhu, G.S.; Church, G.M.; Hughes, W.L. Nucleic Acid Memory. Nat. Mater. 2016, 15, 366–370.
2. Recent News Release. Available online: https://www.wsts.org/76/Recent-News-Release (accessed on 16 April 2025).
3. Market Analysis Perspective: Worldwide Enterprise Storage Systems. 2024. Available online: https://my.idc.com/getdoc.jsp?containerId=US47269521 (accessed on 16 April 2025).
4. Gartner Forecasts Worldwide IT Spending to Grow 9.3% in 2025. Available online: https://www.gartner.com/en/newsroom/press-releases/2024-10-23-gartner-forecasts-worldwide-it-spending-to-grow-nine-point-three-percent-in-2025 (accessed on 16 April 2025).
5. Bonnet, J.; Colotte, M.; Coudy, D.; Couallier, V.; Portier, J.; Morin, B.; Tuffet, S. Chain and Conformation Stability of Solid-State DNA: Implications for Room Temperature Storage. Nucleic Acids Res. 2009, 38, 1531–1546.
6. Zhou, Y.; Bi, K.; Ge, Q.; Lu, Z. Advances and Challenges in Random Access Techniques for In Vitro DNA Data Storage. ACS Appl. Mater. Interfaces 2024, 16, 43102–43113.
7. Chen, W.D.; Kohll, A.X.; Nguyen, B.H.; Koch, J.; Heckel, R.; Stark, W.J.; Ceze, L.; Strauss, K.; Grass, R.N. Combining Data Longevity with High Storage Capacity—Layer-by-Layer DNA Encapsulated in Magnetic Nanoparticles. Adv. Funct. Mater. 2019, 29, 1901672.
8. Church, G.M.; Gao, Y.; Kosuri, S. Next-Generation Digital Information Storage in DNA. Science 2012, 337, 1628.
9. Goldman, N.; Bertone, P.; Chen, S.; Dessimoz, C.; LeProust, E.M.; Sipos, B.; Birney, E. Towards Practical, High-Capacity, Low-Maintenance Information Storage in Synthesized DNA. Nature 2013, 494, 77–80.
10. Erlich, Y.; Zielinski, D. DNA Fountain Enables a Robust and Efficient Storage Architecture. Science 2017, 355, 950–954.
11. Organick, L.; Ang, S.D.; Chen, Y.-J.; Lopez, R.; Yekhanin, S.; Makarychev, K.; Racz, M.Z.; Kamath, G.; Gopalan, P.; Nguyen, B.; et al. Random Access in Large-Scale DNA Data Storage. Nat. Biotechnol. 2018, 36, 242–248.
12. Pan, C.; Tabatabaei, S.K.; Tabatabaei Yazdi, S.M.H.; Hernandez, A.G.; Schroeder, C.M.; Milenkovic, O. Rewritable Two-Dimensional DNA-Based Data Storage with Machine Learning Reconstruction. Nat. Commun. 2022, 13, 2984.
13. Ping, Z.; Chen, S.; Zhou, G.; Huang, X.; Zhu, S.; Zhang, H.; Lee, H.H.; Lan, Z.; Cui, J.; Chen, T.; et al. Towards Practical and Robust DNA-Based Data Archiving Using the Yin–Yang Codec System. Nat. Comput. Sci. 2022, 2, 234–242.
14. Ceze, L.; Nivala, J.; Strauss, K. Molecular Digital Data Storage Using DNA. Nat. Rev. Genet. 2019, 20, 456–466.
15. Chen, W.; Han, M.; Zhou, J.; Ge, Q.; Wang, P.; Zhang, X.; Zhu, S.; Song, L.; Yuan, Y. An Artificial Chromosome for Data Storage. Natl. Sci. Rev. 2021, 8, nwab028.
16. Rasool, A.; Hong, J.; Jiang, Q.; Chen, H.; Qu, Q. BO-DNA: Biologically Optimized Encoding Model for a Highly-Reliable DNA Data Storage. Comput. Biol. Med. 2023, 165, 107404.
17. Ding, L.; Wu, S.; Hou, Z.; Li, A.; Xu, Y.; Feng, H.; Pan, W.; Ruan, J. Improving Error-Correcting Capability in DNA Digital Storage via Soft-Decision Decoding. Natl. Sci. Rev. 2024, 11, nwad229.
18. Dey, S.; Fan, C.; Gothelf, K.V.; Li, J.; Lin, C.; Liu, L.; Liu, N.; Nijenhuis, M.A.D.; Saccà, B.; Simmel, F.C.; et al. DNA Origami. Nat. Rev. Methods Primers 2021, 1, 13.
19. Postigo, A.; Marcuello, C.; Verstraeten, W.; Sarasa, S.; Walther, T.; Lostao, A.; Göpfrich, K.; del Barrio, J.; Hernández-Ainsa, S. Folding and Functionalizing DNA Origami: A Versatile Approach Using a Reactive Polyamine. J. Am. Chem. Soc. 2025, 147, 3919–3924.
20. Grass, R.; Heckel, R.; Puddu, M.; Paunescu, D.; Stark, W. Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angew. Chem. 2015, 54, 2552–2555.
21. Moon, B.; Jagadish, H.V.; Faloutsos, C.; Saltz, J.H. Analysis of the Clustering Properties of the Hilbert Space-Filling Curve. IEEE Trans. Knowl. Data Eng. 2001, 13, 124–141.
22. Baek, D.; Joe, S.-Y.; Shin, H.; Park, C.; Jo, S.; Chun, H. Recent Progress in High-Throughput Enzymatic DNA Synthesis for Data Storage. Biochip J. 2024, 18, 357–372.
23. Masaki, Y.; Onishi, Y.; Seio, K. Quantification of Synthetic Errors during Chemical Synthesis of DNA and Its Suppression by Non-Canonical Nucleosides. Sci. Rep. 2022, 12, 12095.
24. Hoose, A.; Vellacott, R.; Storch, M.; Freemont, P.S.; Ryadnov, M.G. DNA Synthesis Technologies to Close the Gene Writing Gap. Nat. Rev. Chem. 2023, 7, 144–161.
25. Chen, Y.-J.; Takahashi, C.N.; Organick, L.; Bee, C.; Ang, S.D.; Weiss, P.; Peck, B.; Seelig, G.; Ceze, L.; Strauss, K. Quantifying Molecular Bias in DNA Data Storage. Nat. Commun. 2020, 11, 3264.
26. Proakis, J.; Salehi, M. Communication Systems Engineering; Prentice Hall: Upper Saddle River, NJ, USA, 1994.
27. Blawat, M.; Gaedke, K.; Hütter, I.; Chen, X.-M.; Turczyk, B.; Inverso, S.; Pruitt, B.W.; Church, G.M. Forward Error Correction for DNA Data Storage. Procedia Comput. Sci. 2016, 80, 1011–1022.
28. Meiser, L.C.; Antkowiak, P.L.; Koch, J.; Chen, W.D.; Kohll, A.X.; Stark, W.; Heckel, R.; Grass, R. Reading and Writing Digital Data in DNA. Nat. Protoc. 2019, 15, 86–101.
29. Press, W.; Hawkins, J.; Jones, S.K.; Schaub, J.M.; Finkelstein, I.J. HEDGES Error-Correcting Code for DNA Storage Corrects Indels and Allows Sequence Constraints. Proc. Natl. Acad. Sci. USA 2020, 117, 18489–18496.
30. Chien, R. Cyclic Decoding Procedures for Bose-Chaudhuri-Hocquenghem Codes. IEEE Trans. Inf. Theory 1964, 10, 357–363.
31. Forney, G. On Decoding BCH Codes. IEEE Trans. Inf. Theory 1965, 11, 549–557.
32. Chen, S. Notes/Notebooks/ReedSolomonErasureCodes.Ipynb at Master·Chenshuo/Notes. Available online: https://github.com/chenshuo/notes/blob/master/notebooks/ReedSolomonErasureCodes.ipynb (accessed on 8 January 2025).
33. Caruthers, M.H. The Chemical Synthesis of DNA/RNA: Our Gift to Science. J. Biol. Chem. 2013, 288, 1420–1427.
34. Xu, C.; Zhao, C.; Ma, B.; Liu, H. Uncertainties in Synthetic DNA-Based Data Storage. Nucleic Acids Res. 2021, 49, 5451–5469.
35. Lu, M.; Wang, Y.; Qiang, W.; Cui, J.; Wang, Y.; Huang, X.; Dai, J. Towards High-Density Storage of Text and Images into DNA by the “Xiao-Pang” Codec System. Sci. China Life Sci. 2023, 66, 1447–1450.
36. Yan, Z.; Liang, C.; Wu, H. A Segmented-Edit Error-Correcting Code with Re-Synchronization Function for DNA-Based Storage Systems. IEEE Trans. Emerg. Top. Comput. 2023, 11, 605–618.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pang, R.; Dong, Y.; Zhao, X. Image Storage in DNA by an Extensible Quaternary Codec System. Appl. Sci. 2025, 15, 4760. https://doi.org/10.3390/app15094760