General Designs Reveal a Purine-Pyrimidine Structural Code in Human DNA
Abstract
1. Introduction
1.1. The Genome as an Information Source Containing Patterns and Codes
1.2. Genomic Signatures and General Designs
1.3. The Power of the Binary Components Analysis
1.4. The Physical and Chemical Properties of DNA Bases
1.5. Aim of Experiment and Importance of Findings
2. Methods, Data and Concepts
2.1. Obtaining Genomic Sequences from Human Genome Database
2.2. Mononucleotide and Dinucleotide Frequencies
2.3. Genomic Signatures: Odds Ratios and Relative Abundance Profiles
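For orientation, the dinucleotide odds ratio commonly used in the genomic-signature literature (following Karlin and co-workers) can be written as below. This is the standard form, offered as an assumption about what this section defines. The symbols $f_X$, $f_{XY}$ and $\rho_{XY}$ are notation introduced here for illustration.

```latex
% f_X    : frequency of nucleotide X in the sequence
% f_{XY} : frequency of the dinucleotide XpY (X followed by Y)
\rho_{XY} = \frac{f_{XY}}{f_X \, f_Y}
% Under independence of neighbouring bases, \rho_{XY} = 1.
% For double-stranded DNA the symmetrized version is used,
\rho^{*}_{XY} = \frac{f^{*}_{XY}}{f^{*}_{X} \, f^{*}_{Y}}
% where the starred frequencies are computed from the sequence
% concatenated with its reverse complement.
```

Values of $\rho_{XY}$ above 1 indicate over-representation of the dinucleotide relative to what the base composition alone would predict, and values below 1 indicate under-representation.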
2.4. Components Analysis: Odds Ratios, Relative Abundance and Distance from Randomness
- The purine/pyrimidine RY dataset: RpR, RpY, YpR, and YpY;
- The weak/strong WS dataset: WpW, WpS, SpW, and SpS;
- The keto/amino KM dataset: KpK, KpM, MpK, and MpM.
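The three binary datasets listed above can be sketched in Python as follows. This is a minimal illustration, not the article's pipeline. The groupings (R = A/G, Y = C/T, W = A/T, S = C/G, K = G/T, M = A/C) are the standard IUPAC classes. The function names, the single-strand counting, and the definition of distance from randomness as the mean absolute deviation of the four odds ratios from 1 are assumptions made here.

```python
from collections import Counter

# Standard IUPAC binary classes (general chemistry, not copied from the article).
BINARY_MAPS = {
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},  # purine / pyrimidine
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},  # weak / strong
    "KM": {"G": "K", "T": "K", "A": "M", "C": "M"},  # keto / amino
}


def binary_dinucleotide_odds(seq, scheme):
    """Return odds ratios f_xy / (f_x * f_y) for the four binary dinucleotides.

    Hypothetical helper: counts on one strand only, without the
    reverse-complement symmetrization used in some genomic-signature work.
    """
    mapping = BINARY_MAPS[scheme]
    # Bases outside A/C/G/T (e.g. N) become None and break adjacency,
    # so no dinucleotide is counted across them.
    symbols = [mapping.get(base) for base in seq.upper()]
    mono = Counter(s for s in symbols if s is not None)
    di = Counter(a + b for a, b in zip(symbols, symbols[1:])
                 if a is not None and b is not None)
    classes = sorted(set(mapping.values()))
    if any(mono[c] == 0 for c in classes) or not di:
        raise ValueError("sequence needs both classes and at least one dinucleotide")
    n_mono, n_di = sum(mono.values()), sum(di.values())
    odds = {}
    for x in classes:
        for y in classes:
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            odds[f"{x}p{y}"] = (di[x + y] / n_di) / expected
    return odds


def distance_from_randomness(odds):
    """Assumed measure: mean absolute deviation of the odds ratios from 1."""
    return sum(abs(rho - 1.0) for rho in odds.values()) / len(odds)


if __name__ == "__main__":
    toy = "ACGTTGCAAGGCTA"  # toy sequence, not real genomic data
    for scheme in BINARY_MAPS:
        odds = binary_dinucleotide_odds(toy, scheme)
        print(scheme, odds, distance_from_randomness(odds))
```

Counting adjacent pairs on the original sequence, rather than first deleting ambiguous bases, avoids creating dinucleotides that do not exist in the DNA.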
2.5. Sequence Analysis and Statistics
2.6. Real DNA Sequences Versus a Random Model
- H0 (null hypothesis): there is no difference between real and random sequences for each binary component (RY, WS, KM).
- H1 (alternative hypothesis): there is a difference between real and random sequences for each binary component.
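One way to make the comparison between H0 and H1 concrete is a shuffle-based check, sketched below for the RY component. This is a permutation-style illustration and not the repeated-measures ANOVA reported in Appendix A. It assumes that the random model is a composition-preserving shuffle of the same sequence and that distance from randomness is the mean absolute deviation of the odds ratios from 1. All function names are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
import random
from collections import Counter

RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def ry_distance(seq):
    """Mean |odds ratio - 1| over RpR, RpY, YpR, YpY (assumed distance measure)."""
    sym = [RY[b] for b in seq.upper()]  # assumes only A/C/G/T are present
    mono = Counter(sym)
    di = Counter(a + b for a, b in zip(sym, sym[1:]))
    n, n_di = len(sym), len(sym) - 1
    total = 0.0
    for x in "RY":
        for y in "RY":
            expected = (mono[x] / n) * (mono[y] / n)
            total += abs((di[x + y] / n_di) / expected - 1.0)
    return total / 4


def shuffled(seq, rng):
    """Random model assumed here: same base composition, random order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def compare_to_random(seq, n_shuffles=200, seed=0):
    """Return (observed distance, mean null distance, empirical p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = ry_distance(seq)
    null = [ry_distance(shuffled(seq, rng)) for _ in range(n_shuffles)]
    # One-sided empirical p-value with the usual +1 correction.
    p = (1 + sum(d >= observed for d in null)) / (n_shuffles + 1)
    return observed, sum(null) / len(null), p


if __name__ == "__main__":
    # Toy sequence with long purine and pyrimidine tracts, not real data.
    toy = ("AGGA" * 10 + "CTTC" * 10) * 5
    print(compare_to_random(toy))
```

A small p-value here means that the observed distance is rarely matched by shuffled versions of the same sequence, which would favour H1 for that component.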
3. Results and Discussion
3.1. Distance from Randomness Comparison of Binary Components RY, KM, and WS
3.2. Genomic Signatures and Odds Ratios for the Binary Components RY, KM, and WS
3.3. Information Content: Patterns and Codes in the DNA
3.4. The Use of Binary Components Analysis: Theory of Breakdown of Information Content
3.5. Assumptions, Limitations and Future Research
4. Conclusions
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Chromosome No. | ANOVA F(df1, df2) | p | Effect size | RY vs. WS | RY vs. KM | WS vs. KM |
---|---|---|---|---|---|---|
1. | F(1.87, 4303.25) = 9896.044 | < 0.001 | 0.811 | 0.063 | 0.033 | −0.03 |
2. | F(1.94, 4669.58) = 13,948.031 | < 0.001 | 0.853 | 0.06 | 0.025 | −0.035 |
3. | F(1.85, 3662.67) = 10,741.541 | < 0.001 | 0.844 | 0.058 | 0.025 | −0.033 |
4. | F(1.87, 3550.4) = 9521.713 | < 0.001 | 0.834 | 0.053 | 0.02 | −0.033 |
5. | F(1.63, 2954.66) = 6436.407 | < 0.001 | 0.78 | 0.056 | 0.023 | −0.032 |
6. | F(1.95, 3319.54) = 10,573.787 | < 0.001 | 0.862 | 0.06 | 0.025 | −0.036 |
7. | F(1.72, 2733.99) = 6212.169 | < 0.001 | 0.796 | 0.059 | 0.024 | −0.034 |
8. | F(1.84, 2662.38) = 6442.536 | < 0.001 | 0.817 | 0.055 | 0.024 | −0.031 |
9. | F(1.77, 2155.44) = 5441.261 | < 0.001 | 0.817 | 0.06 | 0.027 | −0.033 |
10. | F(1.66, 2216.09) = 5461.652 | < 0.001 | 0.804 | 0.061 | 0.026 | −0.035 |
11. | F(1.8, 2423.01) = 4060.935 | < 0.001 | 0.751 | 0.058 | 0.032 | −0.025 |
12. | F(1.77, 2357.78) = 4766.399 | < 0.001 | 0.782 | 0.061 | 0.029 | −0.032 |
13. | F(1.72, 1681.74) = 3222.249 | < 0.001 | 0.767 | 0.052 | 0.018 | −0.034 |
14. | F(1.7, 1534.44) = 3065.551 | < 0.001 | 0.772 | 0.061 | 0.028 | −0.033 |
15. | F(1.63, 1377.82) = 3458.144 | < 0.001 | 0.804 | 0.065 | 0.03 | −0.036 |
16. | F(1.63, 1329.35) = 3076.435 | < 0.001 | 0.79 | 0.061 | 0.029 | −0.031 |
17. | F(1.51, 1250.69) = 2419.446 | < 0.001 | 0.745 | 0.072 | 0.04 | −0.031 |
18. | F(1.65, 1316.46) = 2266.129 | < 0.001 | 0.739 | 0.055 | 0.025 | −0.03 |
19. | F(1.64, 956.8) = 1505.212 | < 0.001 | 0.721 | 0.069 | 0.037 | −0.032 |
20. | F(1.59, 1013.05) = 2682.895 | < 0.001 | 0.808 | 0.062 | 0.031 | −0.03 |
21. | F(1.52, 605.95) = 692.452 | < 0.001 | 0.634 | 0.053 | 0.021 | −0.031 |
22. | F(1.49, 581.68) = 806.02 | < 0.001 | 0.674 | 0.06 | 0.035 | −0.025 |
X. | F(1.93, 2979.63) = 5842.198 | < 0.001 | 0.791 | 0.055 | 0.02 | −0.035 |
Y. | F(1.79, 471.43) = 291.074 | < 0.001 | 0.525 | 0.041 | 0.011 | −0.029 |
Chromosome | RY Mean | RY Std. Deviation | WS Mean | WS Std. Deviation | KM Mean | KM Std. Deviation | N |
---|---|---|---|---|---|---|---|
1. | 0.124 | 0.020 | 0.060 | 0.024 | 0.091 | 0.009 | 2303 |
2. | 0.116 | 0.017 | 0.056 | 0.016 | 0.091 | 0.009 | 2405 |
3. | 0.116 | 0.018 | 0.058 | 0.019 | 0.091 | 0.009 | 1980 |
4. | 0.108 | 0.016 | 0.055 | 0.017 | 0.088 | 0.009 | 1897 |
5. | 0.114 | 0.018 | 0.058 | 0.025 | 0.091 | 0.009 | 1812 |
6. | 0.115 | 0.017 | 0.055 | 0.016 | 0.090 | 0.008 | 1700 |
7. | 0.117 | 0.019 | 0.058 | 0.024 | 0.093 | 0.010 | 1589 |
8. | 0.114 | 0.017 | 0.059 | 0.018 | 0.091 | 0.009 | 1447 |
9. | 0.119 | 0.018 | 0.059 | 0.022 | 0.093 | 0.010 | 1217 |
10. | 0.120 | 0.017 | 0.059 | 0.025 | 0.094 | 0.009 | 1332 |
11. | 0.123 | 0.022 | 0.065 | 0.027 | 0.090 | 0.010 | 1345 |
12. | 0.120 | 0.020 | 0.059 | 0.026 | 0.091 | 0.010 | 1331 |
13. | 0.107 | 0.017 | 0.055 | 0.020 | 0.089 | 0.010 | 979 |
14. | 0.119 | 0.019 | 0.058 | 0.024 | 0.091 | 0.010 | 905 |
15. | 0.124 | 0.017 | 0.059 | 0.026 | 0.094 | 0.008 | 846 |
16. | 0.126 | 0.018 | 0.066 | 0.026 | 0.097 | 0.012 | 818 |
17. | 0.136 | 0.021 | 0.064 | 0.034 | 0.096 | 0.010 | 829 |
18. | 0.116 | 0.022 | 0.061 | 0.026 | 0.091 | 0.009 | 800 |
19. | 0.139 | 0.020 | 0.070 | 0.035 | 0.102 | 0.009 | 584 |
20. | 0.129 | 0.016 | 0.067 | 0.026 | 0.098 | 0.009 | 639 |
21. | 0.115 | 0.020 | 0.062 | 0.030 | 0.093 | 0.015 | 400 |
22. | 0.133 | 0.018 | 0.073 | 0.031 | 0.098 | 0.014 | 391 |
X. | 0.112 | 0.020 | 0.057 | 0.019 | 0.092 | 0.009 | 1548 |
Y. | 0.108 | 0.018 | 0.068 | 0.030 | 0.097 | 0.015 | 264 |
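The two Appendix A tables can be cross-checked against each other. The sketch below assumes that the pairwise columns (RY vs. WS, RY vs. KM, WS vs. KM) are differences between the corresponding means, which the tables themselves do not state. It uses only four chromosomes, with values copied from the tables above. The dictionary layout and function names are hypothetical.

```python
# chromosome -> ((RY mean, WS mean, KM mean),
#                (RY vs. WS, RY vs. KM, WS vs. KM)) as printed in Appendix A
ROWS = {
    "1": ((0.124, 0.060, 0.091), (0.063, 0.033, -0.030)),
    "2": ((0.116, 0.056, 0.091), (0.060, 0.025, -0.035)),
    "17": ((0.136, 0.064, 0.096), (0.072, 0.040, -0.031)),
    "Y": ((0.108, 0.068, 0.097), (0.041, 0.011, -0.029)),
}


def mean_differences(ry, ws, km):
    """Differences in the same order as the pairwise columns."""
    return (ry - ws, ry - km, ws - km)


def max_discrepancy(rows):
    """Largest absolute gap between recomputed and reported differences."""
    worst = 0.0
    for means, reported in rows.values():
        for computed, printed in zip(mean_differences(*means), reported):
            worst = max(worst, abs(computed - printed))
    return worst


if __name__ == "__main__":
    # Means are rounded to three decimals, so gaps of about 0.001 are expected.
    print(f"largest discrepancy: {max_discrepancy(ROWS):.4f}")
```

For these four rows the recomputed differences agree with the reported ones to within rounding of the three-decimal means, which is consistent with the assumed interpretation but does not prove it.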
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cohen, D. General Designs Reveal a Purine-Pyrimidine Structural Code in Human DNA. Mathematics 2022, 10, 2723. https://doi.org/10.3390/math10152723