*Review* **Metal Ion-Directed Specific DNA Structures and Their Functions**

**Toshihiro Ihara \* , Yusuke Kitamura and Yousuke Katsuda**

Division of Materials Science and Chemistry, Faculty of Advanced Science and Technology, Kumamoto University, 2-39-1, Kurokami, Chuo-ku, Kumamoto 860-8555, Japan; ykita@kumamoto-u.ac.jp (Y.K.); katsuda2243@kumamoto-u.ac.jp (Y.K.)

**\*** Correspondence: toshi@chem.kumamoto-u.ac.jp; Tel.: +81-96-342-3873

**Abstract:** Various DNA structures, including specific metal ion complexes, have been designed based on the knowledge of canonical base pairing as well as general coordination chemistry. The role of metal ions in these studies is quite broad and diverse. Metal ions can be targets themselves in analytical applications, essential building blocks of certain DNA structures that one wishes to construct, or they can be responsible for signal generation, such as luminescence or redox. Using DNA conjugates with metal chelators, one can more freely design DNA complexes with diverse structures and functions by following the simple HSAB rule. In this short review, the authors summarize a part of their DNA chemistries involving specific metal ion coordination. It consists of three topics: (1) significant stabilization of DNA triple helix by silver ion; (2) metal ion-directed dynamic sequence edition through global conformational change by intramolecular complexation; and (3) reconstruction of luminescent lanthanide complexes on DNA and their analytical applications.

**Keywords:** DNA conjugate; metal ion; triple helix; silver ion; lanthanide; ATP sensor; aptamer; terpyridine; sequence edition; DNAzyme

### **1. Introduction**

Almost 20 years after the completion of the Human Genome Project [1,2], nucleic acid chemistry is once again a research focus, with the emergence of a number of new fields, including epigenetics, RNA interference, noncoding RNA, mRNA vaccines, iPS cells, and nucleic acid medicine. Molecular engineering in nucleic acid chemistry has become more flexible than ever before to meet the demands and new challenges in these emerging research fields, taking advantage of functional nucleic acids such as aptamers [3], ribozyme [4], and DNAzyme [5–7], as well as programmed spontaneous strand exchange reactions such as DNA circuits [8,9].

The basis of any molecular engineering of nucleic acids, after all, is the knowledge and techniques for the formation of canonical and some noncanonical structures of nucleic acids. Under certain conditions, we are now able to logically design a variety of static and dynamic structures of DNA/RNA by predicting the most stable duplex structures that will form in the solutions containing these mixtures. The pioneering work in predicting the thermodynamic stability of duplex structures based on the nearest-neighbor model was undoubtedly revolutionary [10–13]. These advances have contributed to the development of almost all modern hybridization-based techniques widely used for gene expression control, gene editing, and analysis, such as antisense, RNAi, CRISPR/Cas9, and in situ hybridization, among others.

In our previous work, using synthetic DNAs and DNA conjugates, we reported various conjugates consisting of oligo DNAs and functional molecules, e.g., anthracene [14], β-cyclodextrin [15], ferrocene [16], and several metal ion chelators [17]. The DNA conjugates were programmed or designed to change their structures in various ways in response to specific stimuli. Outputs, including photochemical ligations, luminescence, and electrochemical responses resulting from the structural changes, have been used to detect the stimuli themselves, complementary DNA/RNA, or other biomolecules. The merit of nucleic acids as molecular platforms is that pre-designed structures can be precisely constructed in a bottom-up fashion, providing an unparalleled advantage with respect to all biotechnological applications as mentioned above. This short article summarizes some of the works we conducted after one of the authors (T.I.) visited the laboratory of Prof. Breslauer at Rutgers University in 2001–2002, especially for the systems related to complexation with metal ions. In these works, metal ions were used as a critical structural factor and a signal generator in the design of both static and dynamic DNA structures.

**Citation:** Ihara, T.; Kitamura, Y.; Katsuda, Y. Metal Ion-Directed Specific DNA Structures and Their Functions. *Life* **2022**, *12*, 686. https://doi.org/10.3390/life12050686

Academic Editors: Tigran Chalikian and Jens Völker

Received: 5 April 2022 Accepted: 2 May 2022 Published: 5 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

### **2. Stabilization of a Parallel-Motif DNA Triplex by Silver Ion**

When designing ligands for specific sequences in DNA duplexes, triple helix formation is a useful recognition motif, inasmuch as the formation of the base triplets follows the simple rule of complementary Hoogsteen hydrogen bonding, CG.C<sup>+</sup> and TA.T, for the parallel motif of the triplex. However, triplexes containing CG.C<sup>+</sup> triplets form only in weakly acidic solution, because the N3 position of the cytosines (p*K*<sub>a</sub> = 4.5) in the third strand must be protonated to fulfill its complementarity [18]. With the aim of achieving sufficient stability under physiological conditions, a large number of chemically modified DNAs have been developed by taking advantage of the highly advanced techniques of organic synthesis [19].
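The pH sensitivity of the CG.C<sup>+</sup> triplet can be illustrated with the Henderson–Hasselbalch relation. The following is a minimal sketch, assuming that the p*K*<sub>a</sub> of 4.5 quoted for cytosine N3 applies unchanged to the third strand (within a triplex the apparent value may be shifted); the function name `protonated_fraction` is introduced here for illustration only.

```python
def protonated_fraction(ph: float, pka: float = 4.5) -> float:
    """Fraction of sites carrying the proton at a given pH (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed pKa of 4.5 for cytosine N3, taken from the text above.
for ph in (4.5, 5.5, 7.0):
    print(f"pH {ph:.1f}: protonated fraction = {protonated_fraction(ph):.4f}")
```

Under this assumption, only about 0.3% of the cytosines would be protonated at pH 7.0, which is in line with the poor stability of unmodified parallel-motif triplexes at neutral pH.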

We reported an effective alternative method for the stabilization of the parallel-motif triple helix of DNA using silver ions (Ag<sup>+</sup>) [20]. Ono et al. reported that the formation of C–C and T–T mispairings in the duplex is promoted by Ag<sup>+</sup> and Hg<sup>2+</sup>, respectively [21]. In these duplexes, the ions were placed between the bases to form specific bridges (C–Ag<sup>+</sup>–C, T–Hg<sup>2+</sup>–T). These results led to the idea that it might be possible to stabilize triplex structures containing CG.C<sup>+</sup> base triplets with Ag<sup>+</sup>. The silver ion was expected to displace the N3 proton of a cytosine in CG.C<sup>+</sup> to form a metal ion-mediated base triplet, CG.CAg<sup>+</sup>, as shown in Figure 1a. This process was expected to stabilize parallel-motif triplexes even at neutral pH.

Figure 1b shows the UV melting curves at pH 7.0 and the pH dependence of the triplex–duplex transition temperature. Surprisingly, the addition of an equimolar amount of Ag<sup>+</sup> (relative to CG.C<sup>+</sup>) increased the melting temperature of the triplex by more than 30 °C under neutral conditions [20]. In the absence of Ag<sup>+</sup>, the melting temperature clearly depended on pH. In the presence of Ag<sup>+</sup>, by contrast, this correlation disappeared, and a biphasic feature consisting of two pH-independent regions was observed. A phase diagram of the structure of the Ag<sup>+</sup>-mediated nucleobase complex could be drawn based on this characteristic melting temperature–pH property. Mass spectrometry (ESI-TOF MS) clearly showed the quantitative formation of the Ag<sup>+</sup>-mediated base triplet, CG.CAg<sup>+</sup>. The results of modeling studies by DFT (B3LYP/6-31G\*//3-21G) suggest that the cytosines on the third strand are forced to twist out of the plane of the Watson–Crick GC pairs in CG.CAg<sup>+</sup> triplets, because the coordination distance in N–Ag<sup>+</sup>–N would be longer than that of the Hoogsteen hydrogen bonds, N–H<sup>+</sup>–N, in CG.C<sup>+</sup>. The deviation from the typical triplex structure observed in studies using CD is consistent with this non-planarity of CG.CAg<sup>+</sup>.

The method described here for the stabilization of DNA triplexes is both simple and effective. All that is required is the addition of an equimolar amount of Ag<sup>+</sup> to the solution containing the DNA triplex. The triplexes mediated by Ag<sup>+</sup> were found to be stable even in weakly basic solution and can be applied in various research tasks, including the regulation of DNAzyme activity [22], sensing [23,24], and luminous Ag nanocluster formation [25].

**Figure 1.** Triplex stabilization by silver ion. (**a**) Structure of the triplex and CG.CAg<sup>+</sup> base triplet; (**b**) upper left: UV melting curves in the presence of Ag<sup>+</sup> with various feeding ratios. Only the temperature of the triplex–duplex transition increased with the addition of Ag<sup>+</sup>. Bottom left: pH dependence of the melting temperatures of triplex in the absence and presence of Ag<sup>+</sup>. The melting temperature in the presence of Ag<sup>+</sup> consists of two pH-independent regions. Right: Phase diagram of the structure of Ag<sup>+</sup>-mediated nucleobase complexes.

### **3. Metal Ion-Directed Dynamic Splicing of DNA through Global Conformational Change by Intramolecular Complexation**

The metal ion-directed global conformational control of DNA was performed as follows. Two terpyridine units were built into distal sites on the DNA backbone to prepare a conjugate, i.e., **terpy2DNA**. The two terpyridines formed a stable intramolecular 1:2 complex, [M(terpy)<sub>2</sub>]<sup>2+</sup>, with divalent transition metal ions, M<sup>2+</sup>, namely Fe<sup>2+</sup>, Ni<sup>2+</sup>, Cu<sup>2+</sup>, and Zn<sup>2+</sup>. By the specific formation of an intramolecular metal complex, the part of the DNA sequence between the two terpyridine units was reversibly excluded, and the two flanking external DNA segments were directly connected with each other to form an Ω-shaped structure presenting a new sequence (Figure 2). This can be regarded as a metal ion-directed reversible edition of the DNA sequence or dynamic DNA splicing [26].

Conformational control of **terpy2DNA** was confirmed via UV melting with the complementary tandem sequence of the two external segments. The results show that the duplex structure was significantly stabilized in the presence of an equimolar amount (to **terpy2DNA**) of transition metal ions. In addition, in the presence of the metal ions, the melting curves became more cooperative, indicating that the two sequences outside the terpyridines dissociated cooperatively in a narrow temperature range. The dependence of duplex stabilization on the metal ion feeding ratio (*r* = [M<sup>2+</sup>]/[**terpy2DNA**]) differed among the metal ions. In the case of Fe<sup>2+</sup> and Ni<sup>2+</sup>, the duplex remained stable even when additional metal ions were added to *r* = 2 or 3. In contrast, the duplex was destabilized at higher feeding ratios of Cu<sup>2+</sup> and Zn<sup>2+</sup>. The stability of the duplex was maintained even in the presence of excess amounts of Fe<sup>2+</sup> and Ni<sup>2+</sup>, because the Ω-shaped conformation of **terpy2DNA** was preserved owing to the magnitudes of the two successive binding constants with terpyridine, *K*<sub>1</sub> < *K*<sub>2</sub>. As for Cu<sup>2+</sup> and Zn<sup>2+</sup>, the global conformation of **terpy2DNA** changed from the Ω-form to a linear form, accompanying the transition of the complex type formed on **terpy2DNA** from [M(terpy)<sub>2</sub>]<sup>2+</sup> (on **terpy2DNA**·M<sup>2+</sup>) to 2[M(terpy)]<sup>2+</sup> (on **terpy2DNA**·2M<sup>2+</sup>) with increasing amounts of ions, owing to their binding properties with terpyridine, *K*<sub>1</sub> > *K*<sub>2</sub>. This indicates that the general trends in the complexation of transition metal ions found in textbooks of coordination chemistry remain valid on DNA.
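The contrast between the two regimes of stepwise binding constants can be explored numerically. The sketch below is only an illustration under loud assumptions: it treats the terpyridines as free, independent ligands in solution (ignoring the tethering on one strand and any effective-molarity effect), the values of the two stepwise constants and the strand concentration are hypothetical rather than measured, and the function name `bis_complex_fraction` is ours. It solves the ligand mass balance by bisection and reports the fraction of terpyridine held in the 1:2 complex.

```python
def bis_complex_fraction(k1, k2, ligand_total, metal_total, iterations=200):
    """Fraction of ligand held in ML2 for stepwise constants k1, k2 (in M^-1)."""
    beta2 = k1 * k2  # overall formation constant of ML2

    def bound_ligand(free_ligand):
        # Ligand bound to metal ([ML] + 2[ML2]) for a trial free-ligand concentration.
        denom = 1.0 + k1 * free_ligand + beta2 * free_ligand ** 2
        return metal_total * (k1 * free_ligand + 2.0 * beta2 * free_ligand ** 2) / denom

    # Bisection on the ligand mass balance: free + bound = total.
    low, high = 0.0, ligand_total
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if mid + bound_ligand(mid) > ligand_total:
            high = mid
        else:
            low = mid
    free_ligand = 0.5 * (low + high)
    denom = 1.0 + k1 * free_ligand + beta2 * free_ligand ** 2
    ml2 = metal_total * beta2 * free_ligand ** 2 / denom
    return 2.0 * ml2 / ligand_total


CONJUGATE = 1e-6  # mol/L of strand (hypothetical); each strand carries two terpyridines
for label, k1, k2 in (("K1 < K2", 1e7, 1e14), ("K1 > K2", 1e13, 1e6)):  # hypothetical constants
    for r in (1, 2, 3):
        frac = bis_complex_fraction(k1, k2, 2 * CONJUGATE, r * CONJUGATE)
        print(f"{label}, r = {r}: fraction of terpy in 1:2 complex = {frac:.3f}")
```

With these assumed constants, the *K*<sub>1</sub> < *K*<sub>2</sub> case keeps nearly all of the ligand in the 1:2 complex up to *r* = 3, whereas the *K*<sub>1</sub> > *K*<sub>2</sub> case shifts almost entirely to 1:1 complexes once the metal is in excess. This matches the qualitative behavior described above, although the absolute numbers should not be read as a model of the tethered system.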



**Figure 2.** Metal ion-directed reversible edition of the DNA sequence. The sequence of **terpy2DNA** is edited by intramolecular complexation with appropriate metal ions through Ω-shaped global conformational change.

We then applied the metal ion-directed sequence edition based on the Ω-motif to regulate the function of a split DNAzyme with peroxidase-like activity. To be activated, the fragments of the split DNAzyme need to be reconstituted to form a G-quadruplex structure. As shown in Figure 3a, **terpy2DNA** was used as the tunable template to activate the split DNAzyme. The reaction was monitored by the color change associated with the oxidation of the substrate, 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS). Figure 3b shows the time course of the reaction profiles. Equimolar concentrations of Fe<sup>2+</sup> and Ni<sup>2+</sup> (*r* = 1) restored the activity of the split DNAzyme in the presence of **terpy2DNA** [26]. Cu<sup>2+</sup> and Zn<sup>2+</sup> also showed a moderate effect on the restoration of split DNAzyme activity. As we expected, the global conformation of **terpy2DNA** was fixed in an Ω-shape by the intramolecular formation of [M(terpy)<sub>2</sub>]<sup>2+</sup>. Subsequently, the new sequence presented on **terpy2DNA**·M<sup>2+</sup> worked as an effective template to reconstruct the integrated active form of the DNAzyme.

The results demonstrated that the global DNA structure and, furthermore, the activity of the DNAzyme were controlled by local metal complexation events that could be rationally designed based on general coordination chemistry. The technique of dynamic DNA splicing proposed in this study should be compatible with the construction of molecular systems consisting of functional DNA, such as aptamers and DNAzymes. Based on the Ω-motif, one could control the activity of reconstituted functional DNA or RNA, the thermodynamics and kinetics of strand exchange, and gene expression.

**Figure 3.** Metal ion-directed regulation of DNAzyme activity. (**a**) Allosteric regulation of split DNAzyme activity by metal ion-directed dynamic sequence edition of the template, **terpy2DNA**. (**b**) Left: Time courses of the ABTS oxidation by the split DNAzyme with **terpy2DNA** in the presence of Fe<sup>2+</sup> and Ni<sup>2+</sup>. Red, split DNAzyme/**terpy2DNA** + Fe<sup>2+</sup>; blue, split DNAzyme/**terpy2DNA** + Ni<sup>2+</sup>; black, split DNAzyme/**terpy2DNA**, no metal ions. Right: Images of reaction solutions shown in the time courses.

### **4. Reconstruction of Luminescent Lanthanide Complexes on DNA and Their Analytical Applications**

We demonstrated a straightforward genetic analysis using DNA-templated cooperative complexation between a luminescent lanthanide ion (Ln<sup>3+</sup>: Tb<sup>3+</sup> or Eu<sup>3+</sup>) and two DNA conjugates. Ethylenediaminetetraacetic acid (EDTA) and 1,10-phenanthroline (phen) were covalently attached to the ends of oligo DNAs to form a pair of conjugates, i.e., capture and sensitizer probes, respectively. The sequences of these split probes were designed so as to form a tandem duplex with targets (templates) with their auxiliary units facing each other, providing a microenvironment to accommodate Ln<sup>3+</sup> (Figure 4a) [27]. The results of time-resolved luminescence studies showed that the formation of luminous ternary complexes, EDTA/Ln<sup>3+</sup>/phen, depends on the sequence of the targets. The intensity of the luminescence is affected by the binding affinities of the probes or the local structural disruption caused by one-base mispairing [28].

This technique was applied to multicolored allele typing based on single nucleotide polymorphisms (SNPs) in the thiopurine S-methyltransferase gene by the concomitant use of two capture probes, which are complementary to a part of the wild-type (**wt**) and the mutant (**mut**) of the gene. First, the capture probes for **wt** and **mut** were mixed with equimolar amounts of Tb<sup>3+</sup> and Eu<sup>3+</sup>, respectively. Both allele-specific capture probes with Ln<sup>3+</sup> and the sensitizer probe were then added to three different solutions containing the targets, **wt**/**wt**, **mut**/**mut**, and **wt**/**mut**. The solutions emitted distinctive colors, i.e., green, red, and yellow for **wt**/**wt**, **mut**/**mut**, and **wt**/**mut**, respectively; the colors were identifiable with the naked eye (Figure 4b) [29].
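The two-color readout lends itself to a simple decision rule. The sketch below is a hypothetical illustration, not the analysis used in the cited work: it assumes that background-corrected emission intensities are available for the Tb<sup>3+</sup> (green, **wt** probe) and Eu<sup>3+</sup> (red, **mut** probe) channels, and the function name `call_genotype` and the ratio cutoff of 3 are arbitrary choices made for the example.

```python
def call_genotype(tb_green: float, eu_red: float, ratio_cutoff: float = 3.0) -> str:
    """Assign a genotype from two background-corrected channel intensities.

    tb_green: emission of the Tb3+ complex formed on the wt capture probe.
    eu_red:   emission of the Eu3+ complex formed on the mut capture probe.
    ratio_cutoff is a hypothetical threshold, not a value from the source.
    """
    if tb_green <= 0.0 and eu_red <= 0.0:
        return "no call"  # neither complex formed
    if eu_red <= 0.0 or tb_green / eu_red >= ratio_cutoff:
        return "wt/wt"    # green dominates
    if tb_green <= 0.0 or eu_red / tb_green >= ratio_cutoff:
        return "mut/mut"  # red dominates
    return "wt/mut"       # comparable green and red, perceived as yellow


for green, red in ((10.0, 1.0), (1.0, 10.0), (5.0, 5.0)):
    print(green, red, call_genotype(green, red))
```

In practice the cutoff would need to be calibrated against known **wt**/**wt**, **mut**/**mut**, and **wt**/**mut** samples, because the two lanthanide complexes differ in brightness.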

The system was applied as a molecular nanodevice consisting of the lanthanide complex and stem-loop structured oligo DNA. The nanodevice was synthesized by the introduction of EDTA and phen at the 5'- and the 3'-end of the DNA, respectively. This device was named the lanthanide complex molecular beacon (**LCMB**). In the stem-loop structure of **LCMB**, the two auxiliary units were placed in close proximity, providing a microenvironment to accommodate Ln<sup>3+</sup>. The characteristic emissions of Tb<sup>3+</sup> and Eu<sup>3+</sup> were clearly observed in the solution containing the nanodevice and the corresponding Ln<sup>3+</sup> ("on" state). In contrast, little emission was observed in the presence of the DNA complementary to the loop region; the auxiliary units were separated from each other when the duplex was formed ("off" state). The ATP aptamer (**iATP**) was used as an interface for the application of **LCMB** to ATP sensing. The sequence of **LCMB** was designed to be complementary to a part of **iATP** (Figure 5a). With the addition of ATP to the **LCMB**/**iATP** duplex, the luminescence signal turned on as a result of the restoration of the **LCMB** stem-loop structure accompanying the displacement of **iATP** from **LCMB** by ATP. A highly specific response was observed for ATP among NTPs, as shown in Figure 5b [30].

**Figure 4.** Multicolored allele typing using time-resolved luminescence from lanthanide complexes (Tb<sup>3+</sup> and Eu<sup>3+</sup>) cooperatively formed with a pair of split probes. (**a**) The structure of the Ln<sup>3+</sup> complex formed on tandem duplex of the split probes with target sequence. (**b**) Allele typing of thiopurine S-methyltransferase gene.

Nonenzymatic amplification of the luminescent signal from the Ln complexes on the DNA scaffold was performed through catalytic hairpin assembly (CHA) and hybridization chain reaction (HCR), which are typical DNA circuits consisting of autonomous successive strand exchange reactions [31,32]. For HCR, four hairpin DNA conjugates were prepared; two of them carry EDTA on both ends, and phens are attached to both ends of the other two hairpin DNA strands. The sequences of the four hairpin DNA strands were designed so as to provide a long DNA wire as the product, with Ln complexes at every junction. The HCR was initiated by a small amount of target DNA, acting as an initiator. Figure 6a shows the scheme of the HCR amplification. The luminescence signal significantly increased with the progress of HCR after target addition. Signal contrast was very high, and the sequence selectivity was preserved in this system [32]. To improve the amplification rate, the system was redesigned to form a cruciform product consisting of four hairpins by CHA (Figure 6b). The sequences of the hairpin monomers were modified so as to hybridize convergently to form a closed cruciform structure. Ln complexes were expected to form at each of the four tips of the cruciform. The target miRNA *let-7a* was detected using time-resolved luminescence measurement techniques [32]. The CHA system (cruciform formation) was found to be more efficient than the earlier HCR version (DNA wire), probably due to the difference in molecular sizes of the products.

**Figure 5.** ATP sensing using **LCMB** and **iATP**. (**a**) Operating principle of ATP sensing using competitive reaction over **iATP** between ATP and **LCMB**; (**b**) luminescence signal response of ATP sensor to NTPs.

**Figure 6.** Nonenzymatic signal amplification by DNA circuits: (**a**) Luminous DNA wire was produced by target-initiated HCR; (**b**) Luminous cruciform DNA was produced by CHA.

### **5. Perspective**

In recent years, research on nucleic acids has uncovered new and challenging issues as mentioned above, and nucleic acid conjugates show promise as a molecular tool that can be used to meet those challenges. In addition to the standard complementary nucleic acids, functional nucleic acids, such as aptamers and DNAzyme, as well as nonnatural nucleic acids have been added to the options as nucleic acid components of the conjugates. Furthermore, given the diversity of functional molecules that pair with DNAs, an infinite number of combinations are possible in the design of nucleic acid conjugates. With the emergence of "click chemistry", the in situ synthesis of conjugate molecules is now possible, further expanding the potential of these molecules [33,34]. As demonstrated in the above-referenced studies, it is always critical to accurately predict the structure of nucleic acid conjugates. The fundamentals in the physical chemistry of nucleic acids that Professor Breslauer and his research groups have achieved are pioneering and of universal value. The sum of these works can be considered a milestone in the history of nucleic acid science.
We would like to conclude this brief note by sending our best wishes from Japan to Professor Breslauer, on the occasion of his 75th birthday.

**Author Contributions:** Conceptualization, T.I.; writing—original draft, review, editing, T.I., Y.K. (Yusuke Kitamura) and Y.K. (Yousuke Katsuda). All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Grant-in-Aid for Scientific Research on Innovative Areas (Coordination Programming, Area 2107) (no. 24108734 to T.I.), Challenging Exploratory Research (No. 19K22259 to T.I.), and Scientific Research (B) (no. 24350040, 15H03829, and 20H02769 to T.I.) from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

**Acknowledgments:** We are grateful to our collaborators for their many contributions to all the works discussed in this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Helen M. Berman <sup>1</sup>, Catherine L. Lawson <sup>2</sup> and Bohdan Schneider <sup>3,\*</sup>**


**Abstract:** In this review, we describe the creation of the Nucleic Acid Database (NDB) at Rutgers University and how it became a testbed for the current infrastructure of the RCSB Protein Data Bank. We describe some of the special features of the NDB and how it has been used to enable research. Plans for the next phase as the Nucleic Acid Knowledgebase (NAKB) are summarized.

**Keywords:** nucleic acid structures; nucleic acid conformation; biological structure database; DNA; RNA; validation standards

### **1. Introduction**

The first single crystal structures of nucleic acids were determined in the 1970s, almost twenty years after the model of the DNA double helix based on fiber data was published [1,2]. Short fragments of RNA yielded the first atomic-level views of the double helix and demonstrated conformational flexibility [3–5]. These structures were archived as small molecules in the Cambridge Structural Database (CSD) [6]. The structure of tRNA, determined in 1974 [7–9], showed that RNA can fold into a compact structure and demonstrated the importance of tertiary interactions. As DNA synthesis became possible, structures of the DNA double helix with predefined sequences were determined. The first structures were left-handed Z-form DNA fragments [10], and in 1981, the first single crystal structure of a full turn of B-form DNA was published [11]. The tRNA structures and larger nucleic acid fragments were archived in the Protein Data Bank (PDB) [12]. By 1990, there were nearly 100 publicly released nucleic acid structures, thus allowing analyses of sequence-dependent features, hydration patterns, and ligand interactions.

During the late 1970s and 1980s, several faculty members in the Chemistry Department at Rutgers University focused their research on nucleic acids. Ken Breslauer worked on the macroscopic properties of nucleic acids using calorimetric approaches [13–16]; these works, seminal for the understanding of thermodynamics of DNA, have continued to this day [17–20]. Roger Jones developed new methods to synthesize DNA [21]. Jerry Manning developed the counterion condensation theory to understand DNA folding [22], and continued this work in collaboration with the Breslauer group [23]. Wilma Olson performed detailed analyses of the structure of DNA [24]. During that period, Helen Berman carried out nucleic acid crystallography research at the Institute for Cancer Research in Philadelphia and had close interactions with the Rutgers group. In 1989, she joined the Chemistry faculty at Rutgers.

The setting at Rutgers was ideal for collaborative studies using both experimental and computational approaches to investigate nucleic acid structure. It was necessary to have a resource that contained the structural information which resided in the CSD, in the PDB, or in the laboratories of individual researchers to facilitate these efforts. In collaboration with David Beveridge, with whom Berman was collaborating on computational analyses of nucleic acid hydration, Olson and Berman proposed to create the Nucleic Acid Database (NDB). In the early 1990s, funding was received from the National Science Foundation to establish "A Comprehensive Database of the Three-Dimensional Structures of Nucleic Acids". The goal was to create a searchable database that would integrate information from several sources and make a variety of reports, thus enabling research on nucleic acid structure.

**Citation:** Berman, H.M.; Lawson, C.L.; Schneider, B. Developing Community Resources for Nucleic Acid Structures. *Life* **2022**, *12*, 540. https://doi.org/10.3390/life12040540

Academic Editors: Tigran Chalikian, Jens Völker and Bruce J. Nicholson

Received: 14 March 2022 Accepted: 31 March 2022 Published: 6 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

### **2. Development of the Nucleic Acid Database**

The first step in the development of the NDB was to collect and curate the structural data [25]. Coordinates were accessed from the CSD and the PDB. Each structure and experiment was carefully reviewed to create appropriate annotations beyond what was available from each resource. Rather than working directly with the flat files maintained by the PDB, the NDB imported the parsed data files into a relational database management system (DBMS). Sybase [26] was chosen as the DBMS in large part because it was being used by GenBank [27,28]. A query system called NDBquery was put into place. In the early years, distribution was accomplished via FTP and a system called Gopher [29]. By 1995, a web server was set up, which generated a modest amount of activity to access and analyze the 350 structures represented in the NDB. The NDB was actively involved in the development of mmCIF, whose data model is compatible with a relational DBMS. By 1996, mmCIF [30] became the master format for the NDB. The software that was developed and the experience gained using this data representation set the stage for the management of the Protein Data Bank using mmCIF as the master format by the Research Collaboratory for Structural Bioinformatics (RCSB) beginning in 1998.

The NDB also became a driver for the creation of geometrical standards for nucleic acid structures. Careful analysis of high-resolution structures from CSD permitted the calculation of standard reference bond distances and angles for the bases, sugars, and phosphates of nucleic acids [31,32]. Using these values, Parkinson et al. [33] created new parameters that enabled improved refinement of nucleic acid-containing crystal structures against their experimental data. Those standards were widely used. In 1998, the NDB helped organize a conference whose outcome was the standard coordinate frame definition for nucleic acid bases [34]. This standard became widely adopted by researchers studying nucleic acid base morphology.

### **3. Features of the NDB**

In addition to facilitating access to primary data for nucleic acid structures, the NDB provides tables of derived features, such as classifications of base pairing topologies [35], backbone torsion angles, and conformational and base pair classifications [36,37].

The NDB also offers different types of data visualization and presentation. The most important is the NDB Atlas page (Figure 1), which gives summary information about the structure and visualizations of the crystal asymmetric unit, the biological unit, and unit cells; for RNA structures, it provides a view that combines the secondary and tertiary structural features. Links to other resources are also provided.

The functionality of the NDB and its query engine was first and foremost driven by the research projects of nucleic acid structural and computational biologists. Careful attention was given to the quality and uniformity of the metadata so that it would be possible to use Boolean logic to create queries; individual questions could be made into logical constructs joined by logical AND, OR, and NOT. This requirement represented a challenge for building a robust system of precisely defined terms incorporated into a formal computer-readable language; mmCIF was that dictionary.
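The idea of joining individual questions with AND, OR, and NOT can be illustrated with a minimal Python sketch. It is not the NDB query engine: the `Entry` record, its field names, the helper functions, and the four entries with their identifiers are all hypothetical and chosen only to show how uniform metadata makes such queries composable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified metadata record for one structure."""
    entry_id: str
    polymer: str        # "DNA" or "RNA"
    conformation: str   # e.g. "A", "B", "Z"
    resolution: float   # in angstroms
    has_protein: bool


# A query is any function that answers yes or no for a single entry.
Query = Callable[[Entry], bool]


def AND(*queries: Query) -> Query:
    return lambda entry: all(q(entry) for q in queries)


def OR(*queries: Query) -> Query:
    return lambda entry: any(q(entry) for q in queries)


def NOT(query: Query) -> Query:
    return lambda entry: not query(entry)


def field_equals(name: str, value) -> Query:
    """Elementary question: does the named field have exactly this value?"""
    return lambda entry: getattr(entry, name) == value


def resolution_at_most(cutoff: float) -> Query:
    """Elementary question: is the resolution at least as good as the cutoff?"""
    return lambda entry: entry.resolution <= cutoff


def run_query(entries: List[Entry], query: Query) -> List[str]:
    """Return the identifiers of all entries that satisfy the query."""
    return [entry.entry_id for entry in entries if query(entry)]


# Fictitious entries, for illustration only.
ENTRIES = [
    Entry("EX0001", "DNA", "B", 1.9, False),
    Entry("EX0002", "DNA", "Z", 1.0, False),
    Entry("EX0003", "DNA", "B", 2.8, True),
    Entry("EX0004", "RNA", "A", 1.5, False),
]

# "DNA AND (B-form OR Z-form) AND NOT protein-bound AND resolution <= 2.0"
query = AND(
    field_equals("polymer", "DNA"),
    OR(field_equals("conformation", "B"), field_equals("conformation", "Z")),
    NOT(field_equals("has_protein", True)),
    resolution_at_most(2.0),
)

print(run_query(ENTRIES, query))
```

Such composition only works when every entry uses the same controlled terms for the same concept, which is the role the text assigns to the mmCIF dictionary.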

The NDB website was designed so that the user could select structures with features of interest and then use those structures for further analysis, e.g., through the creation of detailed tabular or graphical reports. Soon after the first functional version of the NDB was available, we started to use its potential to study the geometrical features of nucleic acids. The original NDB reporting capability allowed the user to obtain tabular reports of various properties of the selected nucleic acid structures from basic information about the publication or refinement parameters and graphical reports of selected geometric features such as bond distances (Figure 2) or torsion angles (Figure 3). Once funding for the NDB became limited in the 2000s, it was not possible to maintain these reporting capabilities.

**Figure 1.** NDB Atlas page of a tRNA structure.

**Figure 2.** Example NDB report of geometric features of nucleic acids, based on structures available in the 1990s. Histograms show the P–O5′ valence distances in high-resolution DNA, and in all DNA structures.

**Figure 3.** NDB graphical report of torsion angle distribution [38] for the Drew–Dickerson dodecamer, PDB ID 1BNA, NDB ID BDL001 [11]. Blue sectors indicate torsion angle limits for all structures annotated as B–DNA. Overlaid black tick marks are measured torsion values for BDL001. Adjacent black/grey sectors denote average values and spreads of 1 and 2 estimated standard deviations. Note that two averages are indicated for several torsions, e.g., for δ (two distinct sugar puckers) and ε (BI versus BII forms). Values reflect NDB data available in 1996.

### **4. Research Enabled by the NDB**

The NDB has been used by many researchers to analyze the structures of nucleic acids. There are over 1100 citations to the original NDB article. The types of research enabled by the NDB include DNA conformational analyses [39], DNA structure prediction [40], RNA structure prediction [41], analyses of protein-nucleic acid interactions [42,43], and the creation of new specialty databases [44].

In our research, we have used the NDB to study a variety of aspects of nucleic acids. For example, we surveyed A, B, and Z-form double helical DNA structures and used Fourier averaging to determine hydration patterns, e.g., for DNA nitrogenous bases [45]. Both base and later phosphate studies showed sequence and conformation-dependent water position preferences (Figure 4).

**Figure 4.** Sequence dependence of DNA hydration. Two distinct hydration patterns are shown for the A-form major groove for (**a**) 5′–GC–3′ and (**b**) 5′–CG–3′, based on analyses of structures available in the mid–1990s NDB [44]. A more recent analysis of hydration using larger and functionally more relevant dinucleotide fragments is available at watlas.datmos.org/watna (accessed on 30 March 2022).

The growing volume of available crystal structures with ever-growing sequence variability also led us to ask whether conformational properties of various DNA and RNA forms could be better characterized. This task posed new challenges to NDB querying and reporting capabilities. Specific subsets of structures were selected based on sequence, function, or structural features using SQL queries; their properties were reported as text or graphs (Figure 5). Ultimately, we were able to sharpen conformational definitions for established subtypes of A-B-Z forms (Figure 6) [46].
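The kind of subset selection described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the two-table schema, the column names, the identifiers, and the torsion values are invented for the example and do not reproduce the actual NDB schema or data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE structure (
            id         TEXT PRIMARY KEY,
            polymer    TEXT,   -- 'DNA' or 'RNA'
            annotation TEXT,   -- functional annotation, e.g. 'ribozyme'
            resolution REAL    -- in angstroms
        );
        CREATE TABLE torsion (
            structure_id TEXT REFERENCES structure(id),
            residue      INTEGER,
            alpha        REAL,  -- degrees, 0 to 360
            gamma        REAL   -- degrees, 0 to 360
        );
        """
    )
    # Fictitious rows, for illustration only.
    conn.executemany(
        "INSERT INTO structure VALUES (?, ?, ?, ?)",
        [
            ("EX_R1", "RNA", "ribozyme", 2.5),
            ("EX_R2", "RNA", "tRNA", 2.0),
            ("EX_D1", "DNA", "duplex", 1.8),
        ],
    )
    conn.executemany(
        "INSERT INTO torsion VALUES (?, ?, ?, ?)",
        [
            ("EX_R1", 1, 295.0, 55.0),
            ("EX_R1", 2, 160.0, 180.0),
            ("EX_R2", 1, 290.0, 60.0),
            ("EX_D1", 1, 300.0, 50.0),
        ],
    )
    return conn


def ribozyme_alpha_gamma(conn: sqlite3.Connection, max_resolution: float):
    """Select (alpha, gamma) pairs for RNA entries annotated as ribozymes."""
    cursor = conn.execute(
        """
        SELECT t.structure_id, t.residue, t.alpha, t.gamma
        FROM torsion AS t
        JOIN structure AS s ON s.id = t.structure_id
        WHERE s.polymer = 'RNA'
          AND s.annotation = 'ribozyme'
          AND s.resolution <= ?
        ORDER BY t.structure_id, t.residue
        """,
        (max_resolution,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in ribozyme_alpha_gamma(build_demo_db(), max_resolution=3.0):
        print(row)
```

The rows returned by such a query are the raw material for a text report or for a scatter plot of one torsion angle against another.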

**Figure 5.** NDB torsion angle scatter plot. The distribution of backbone torsion angles α and γ observed in crystal structures of RNA annotated in the NDB as ribozymes in 1996 is shown. α describes rotations around the P–O5′ phosphodiester bond, γ around the C5′–C4′ bond. Plots were created directly on the NDB website as PostScript formatted reports.

**Figure 6.** Scatter plot of backbone torsion angles ζ and α + 1. ζ shows the variation in rotation around the O3′–P phosphodiester bond and α + 1 around the P–O5′ bond (labeled α + 1 because this bond belongs to the sequentially following nucleotide). Data for DNA alone are shown as dark blue crosses, for protein-DNA complexes as light blue dots, and for RNA as red dots. The scattergram shows data for all nucleotide residues in the 1998 NDB. Clusters of some major conformational types are labeled.

This analysis revealed that no nucleic acid form can be unequivocally classified by torsion angle pairs; a more sophisticated multidimensional analysis was needed.

The growing number of nucleic acid structures and the appearance of new forms such as quadruplexes and large folded RNAs demonstrated the plasticity of nucleic acid molecules. It became clear that the conformational space of nucleic acids is extremely complex and that capturing it would require a concerted understanding of base pairing motifs and the backbone structural variability.

Early analyses showed that backbone conformational variability was fundamentally influenced by flexibility around the O3′–P–O5′ phosphodiester bonds that connected adjacent nucleotide residues, described by torsion angles ζ and α [47]. Our multidimensional statistical analysis, therefore, focused on dinucleotide fragments analyzed in torsion space, taking full advantage of the availability of the NDB and PDB.
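As background for working in torsion space, a torsion angle such as ζ or α is the dihedral angle defined by four consecutive atoms; in the standard nomenclature ζ is defined by C3′–O3′–P–O5′ and α by O3′–P–O5′–C5′. The following Python sketch computes such an angle from Cartesian coordinates. The function names are illustrative, and the coordinates in the usage example are simple made-up points, not atoms from a real structure.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180].

    The sign is positive when the far bond (p2->p3) is rotated clockwise
    relative to the near bond (p1->p0), viewed along the central bond p1->p2.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def to_0_360(angle_deg: float) -> float:
    """Map an angle to the 0 to 360 degree range often used for backbone torsions."""
    return angle_deg % 360.0


if __name__ == "__main__":
    # Four made-up points forming a right-angle dihedral.
    angle = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
    print(round(angle, 1), to_0_360(-65.0))
```

Because torsion angles are periodic, any statistical analysis built on them has to treat values near 0° and 360° as neighbors rather than as extremes.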

In the 2000s, research by several groups on RNA backbone flexibility culminated in an RNA Consortium consensus set of dinucleotide conformers [48]. The effort was later complemented by an analogous set of DNA conformers [49] and, ultimately, a comprehensive classification system for dinucleotide fragments covering both DNA and RNA [50]. This classification algorithm provides an automated structural ranking of dinucleotide fragments at two levels of detail: fully geometrical classification into dinucleotide conformational classes (NtC) and a more human-accessible structural alphabet (CANA). The assignment of the CANA and NtC classes makes it possible to study the structural propensities of dinucleotide sequences. For example, analysis of DNA in complexes with transcription factors and in histone core particles showed important trends of protein interactions with specific bending-associated NtC classes (Figure 7) [49].

**Figure 7.** Transcription factors and proteins of the histone core particle bend the DNA duplex differently [49]. (**Left**) Bending by transcription factors is achieved mostly by local adaptation to the A form (highlighted in red); shown is DNA from a complex with TFIIB–Related Factor Brf2 (PDB id 4ROC [51]). (**Right**) Bending by the histone core particle is associated with the BII form (highlighted in blue); shown are the first 75 base pairs from a histone core particle (PDB id 5F99 [52]). When statistically measured over many structures, the BII form appears in histone-wrapped DNA every tenth step, corresponding to one full turn of the duplex; the periodicity of the BII form appearance explains the DNA bending.

the DNA bending. NtC assignments have also inspired development of a new validation tool linking the global geometry criterion (closeness of fit to the nearest NtC class) and the quality of NtC assignments have also inspired development of a new validation tool linking the global geometry criterion (closeness of fit to the nearest NtC class) and the quality of fit into electron density (Figure 8) [50]. It offers a simple information-rich graphical representation of the overall quality of nucleic acid structure in the form of a 2D graph.

fit into electron density (Figure 8) [50]. It offers a simple information-rich graphical representation of the overall quality of nucleic acid structure in the form of a 2D graph. In an additional effort to understand, classify, and validate nucleic acids, we have developed a procedure similar to Ramachandran analysis for proteins, making use of eta (η) and theta (θ) virtual torsion angles (pseudotorsions) [53,54]. Measured (η,θ) pairs define backbone conformations for each central residue within a trinucleotide. Plots are designed to quickly reveal rare conformations that may need extra checking (Figure 9). A web server was recently set up to investigate the utility of this approach for RNA structures determined using cryoEM (ptp.emdataresource.org) (accessed on 30 March 2022).

**Figure 8.** Geometrically well-defined dinucleotides fit well into their electron densities. Real Space Correlation Coefficient (RSCC, horizontal axis) measures how closely the model electron density resembles the experimental density and rmsd (vertical axis) measures how closely the geometry of the model resembles the closest NtC class in the so called golden set [50]. NtC class BB00 (**left**) characterizes the B form, AA00 (**center**) A form in both DNA and RNA, and NANT (**right**) are all unclassified dinucleotides. Geometrically unclassified dinucleotides fit significantly worse to the elec-

tron density.

sentation of the overall quality of nucleic acid structure in the form of a 2D graph.

In the 2000s, research conducted by several groups concentrated on analysis of RNA

backbone flexibility culminated in an RNA Consortium consensus set of dinucleotide conformers [48]. The effort was later complemented by an analogous set of DNA conformers [49] and, ultimately, a comprehensive classification system for dinucleotide fragments covering both DNA and RNA [50]. This classification algorithm provides an automated structural ranking of dinucleotide fragments at two levels of detail: fully geometrical classification into dinucleotide conformational classes (NtC) and a more human-accessible structural alphabet (CANA). The assignment of the CANA and NtC classes makes it possible to study the structural propensities of dinucleotide sequences. For example, analysis of DNA in transcription factors and in histone core particle complexes showed important trends of protein interactions with specific bending associated NtC classes (Figure 7) [49].

**Figure 7.** Transcription factors and proteins of the histone core particle bend DNA duplex differently [49]. (**Left**) bending by transcription factors is acquired mostly by local adaptation to the A form (highlighted in red); shown is DNA from complex with TFIIB–Related Factor Brf2 (PDB id 4ROC [51]). (**Right**) bending by the histone core particle is associated with the BII form (highlighted in blue); shown are first 75 base pairs from a histone core particle (PDB id 5F99 [52]); when statistically measured over many structures, the BII form appears in histone-wrapped DNA every tenth step corresponding to one full turn of duplex; the periodicity of the BII form appearance explains

NtC assignments have also inspired development of a new validation tool linking the global geometry criterion (closeness of fit to the nearest NtC class) and the quality of fit into electron density (Figure 8) [50]. It offers a simple information-rich graphical repre-

**Figure 8.** Geometrically well-defined dinucleotides fit well into their electron densities. Real Space Correlation Coefficient (RSCC, horizontal axis) measures how closely the model electron density resembles the experimental density and rmsd (vertical axis) measures how closely the geometry of the model resembles the closest NtC class in the so called golden set [50]. NtC class BB00 (**left**) characterizes the B form, AA00 (**center**) A form in both DNA and RNA, and NANT (**right**) are all unclassified dinucleotides. Geometrically unclassified dinucleotides fit significantly worse to the electron density. **Figure 8.** Geometrically well-defined dinucleotides fit well into their electron densities. Real Space Correlation Coefficient (RSCC, horizontal axis) measures how closely the model electron density resembles the experimental density and rmsd (vertical axis) measures how closely the geometry of the model resembles the closest NtC class in the so called golden set [50]. NtC class BB00 (**left**) characterizes the B form, AA00 (**center**) A form in both DNA and RNA, and NANT (**right**) are all unclassified dinucleotides. Geometrically unclassified dinucleotides fit significantly worse to the electron density. In an additional effort to understand, classify, and validate nucleic acids, we have developed a procedure similar to Ramachandran analysis for proteins, making use of eta (η) and theta (θ) virtual torsion angles (pseudotorsions) [53,54]. Measured (η,θ) pairs define backbone conformations for each central residue within a trinucleotide. Plots are designed to quickly reveal rare conformations that may need extra checking (Figure 9). A web server was recently set up to investigate the utility of this approach for RNA structures determined using cryoEM (ptp.emdataresource.org) (accessed on 30 March 2022).

**Figure 9.** Pseudotorsion plot, a simple coarse-level RNA backbone conformation validation tool. Measured eta and theta (η,θ) pseudotorsion values for each trinucleotide (black dots, red x's) are plotted against a quality-filtered virtual torsion angle distribution derived from a large number of RNA structures (contours), analogous to the Ramachandran plot for proteins.
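The (η,θ) measurement itself can be illustrated with a minimal Python sketch. It assumes the commonly used atom definitions, which are not spelled out in this text: η is the torsion C4'(i−1)–P(i)–C4'(i)–P(i+1) and θ is the torsion P(i)–C4'(i)–P(i+1)–C4'(i+1). The function names `dihedral` and `pseudotorsions` and the input layout (one dictionary of atom coordinates per residue) are hypothetical, chosen for illustration only. This is not the code behind the web server mentioned above.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)   # normals of the two planes
    length = math.sqrt(_dot(b1, b1))          # length of the central bond
    b1_unit = (b1[0] / length, b1[1] / length, b1[2] / length)
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1_unit)
    return math.degrees(math.atan2(y, x))

def pseudotorsions(residues):
    """Return (index, eta, theta) for every residue that has both neighbours.

    `residues` is a hypothetical input format: a list of dicts holding the
    (x, y, z) coordinates of the "P" and "C4'" atoms, in chain order.
    """
    result = []
    for i in range(1, len(residues) - 1):
        prev, cur, nxt = residues[i - 1], residues[i], residues[i + 1]
        eta = dihedral(prev["C4'"], cur["P"], cur["C4'"], nxt["P"])
        theta = dihedral(cur["P"], cur["C4'"], nxt["P"], nxt["C4'"])
        result.append((i, eta, theta))
    return result

# Toy coordinates (not a real structure), used only to exercise the functions.
toy_chain = [
    {"P": (2, 0, 0), "C4'": (1, 0, 0)},
    {"P": (0, 0, 0), "C4'": (0, 0, 1)},
    {"P": (0, 1, 1), "C4'": (1, 1, 1)},
]
toy_angles = pseudotorsions(toy_chain)
```

Each returned (η,θ) pair corresponds to one point on a plot such as Figure 9. Comparing such points with the reference distribution requires the quality-filtered data set described above, which this sketch does not include.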

### **5. Current State of Nucleic Acid Structural Biology**

When the NDB was established in the early 1990s, most of the nucleic acid structures were small fragments with the exception of tRNA. There were a few structures of protein-nucleic acid complexes, limited to virus capsids with viral genomic RNA or DNA and transcription factors bound to duplex DNA. Molecular machines, such as the ribosome, were yet to be determined. In contrast, there are now more than 14,000 nucleic acid-containing structures in the PDB and NDB (Figure 10). A notable trend is the recent increase in the use of electron microscopy (EM) for structure determination. Protein/DNA complexes are the most abundant, followed by protein/RNA, DNA-only, and RNA-only. In addition to the increase in the number of structures, the structures are very diverse, as shown in Figure 11.


**Figure 10.** Current statistics for nucleic acid-containing structures. (**a**) New structures released into the PDB, by year and method; (**b**) Distribution of nucleic acid-containing structures. NDB archives and annotates structures determined using X-ray crystallography or NMR. Electron microscopy structures are not included in the NDB, but will be included in the NAKB, the planned successor to NDB.


**Figure 11.** Diversity of structures containing DNA (top rows) and RNA (bottom rows). Nucleic acids are shown with ribbon backbones (random colors) and base blocks (A: red, C: yellow, G: green, T: blue, U: cyan). Proteins are shown as gold ribbons. From top left: B-form duplex DNA (1BNA [11]), Holliday junction (5DSB [55]), parallel-stranded DNA quadruplex (139D [56]), nucleosome core particle (1KX5 [57]), *trp* repressor/operator complex (1TRR [58]), DNA repair enzyme rev1 (6X6Z [59]), A-form duplex RNA (402D [60]), tRNA Asp (6UGG [61]), glutamine II riboswitch (6QN3 [62]), bacterial ribosome (4YBB [63]), spliceosomal E complex (6N7P [64]), Ebola virus matrix protein octamer (7K5L [65]). Images were generated using DSSR and PyMOL [66].

These structures have significantly expanded our knowledge of structure/function relationships and raised the potential of new knowledge from systematic analyses of structure collections. Many different databases and tools have been created to enable specialized analyses of nucleic acid structures. Some have focused on DNA [67], some on RNA [68–72], and some on the interactions between proteins and nucleic acids [73,74]. A systematic long-term analysis of dinucleotides led to a unified RNA + DNA automated classification system [50] available at DNATCO (dnatco.datmos.org) (accessed on 30 March 2022). The NDB (ndbserver.rutgers.edu) (accessed on 30 March 2022) is unique in that all nucleic acid structures and their complexes are contained in a single resource.

### **6. Going Forward**

The NDB is maintained to the extent that new structures and manually curated annotations are added each week, but there has been little significant development since its last full funding in 2003. Even so, thousands of users from the Americas, Asia, Europe, and other locations continue to make multiple visits to the NDB website each month. The most heavily visited pages are Advanced Search and the DNA and RNA galleries.

In 2018, the collaborative group of scientists managing both the NDB (at Rutgers) and RNAhub services (at Bowling Green State University) proposed to create the Nucleic Acid Knowledge Base (NAKB), with the goal of integrating information already in the NDB with additional sequence, structure, function, and interaction-based annotations for all major classes of NA-containing 3D structures. This new service, which will ultimately replace the NDB, is currently under construction. The NAKB aims to enable users to quickly find and download all structures and metadata relevant to their search topic, whether broad or focused, based on the NDB's internal curation scheme, computationally generated annotations, and/or external database references for DNA, RNA, mixed NA, and for NA-binding enzymatic, regulatory, and structural proteins. All NA-containing structures in the PDB will be indexed, including structures obtained using electron microscopy. The NAKB will be updated weekly.

The NDB has employed manual expert curation collected over three decades to identify major NA secondary structure features (duplex, triplex, and quadruplex) and high-level classifications (e.g., ribosomal RNA or telomeric DNA), as well as interactions with ligands (e.g., minor groove binding) and protein classification [37]. Integrated computationally created annotations have included bond distance, angle, and torsion geometries, base and base-pair morphologies, as well as RNA 3D motifs, interactions (base pair types and parameters, base-to-backbone, and base stacking interactions), and RNA equivalence (3D structure similarity) classes [75,76].

New NAKB content will include equivalence class calculations for all nucleic acid molecule types (RNA, DNA, hybrid nucleic acids), enabling more accurate retrieval for closely related NA structures, analogous to the way that UniProt identifier mapping has improved search capabilities for related proteins in PDB [77]. Computationally derived annotations produced by DSSR software [78], including secondary structure features, sugar pucker type, and pseudo-torsion angles, will be added.

New search capabilities will be developed for specific classes of chemical modifications of nucleotides; nucleic acid 3D structure motifs by their common names, for example, G-Quadruplex, R-loop, Holliday Junction, Sarcin–Ricin, Kink-turn; ribosome functional states, e.g., full or single subunit, translational state, and numbers and positions of bound tRNAs; and deeper classification levels for selected proteins such as transcription factors.

The NAKB website will employ a modern web infrastructure with flexible data representation viewable on phones and tablets as well as desktop computers. For each NA-containing 3D structure, an atlas page will provide a summary overview of annotations as well as access to 1D, 2D, and 3D visualizations, external analysis tools, and file downloads. Mappings to external database links will initially include PDB, UniProt, RNAcentral, and Rfam. External analysis tools will include DNATCO and DNAproDB. Some of the reporting functions that were available in the original NDB will be re-enabled so that the types of conformational analysis described earlier can again be performed.

**Author Contributions:** Conceptualization, H.M.B.; writing—original draft, review, editing, H.M.B., C.L.L. and B.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The early funding of the NDB by the National Science Foundation is gratefully acknowledged. Funding for B.S. is from program Inter excellence of Ministry of Education, Youth, and Sports of the Czech Republic, grant number LTAUSA18197, and institutional support to the Institute of Biotechnology of the Czech Academy of Sciences, grant number RVO 86652036. Funding for C.L.L. is from the National Institutes of Health General Medical Sciences, grant numbers R01 GM079429 and R01 GM085238.

**Acknowledgments:** We are grateful to our collaborators for their many contributions over NDB's three decades of existence.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Life* Editorial Office E-mail: life@mdpi.com www.mdpi.com/journal/life

www.mdpi.com ISBN 978-3-0365-6097-7