*Article* **Stabilizing DNA–Protein Co-Crystals via Intra-Crystal Chemical Ligation of the DNA**

**Abigail R. Ward <sup>1</sup>, Sara Dmytriw <sup>2</sup>, Ananya Vajapayajula <sup>2</sup> and Christopher D. Snow <sup>1,2,</sup>\***


**Abstract:** Protein and DNA co-crystals are most commonly prepared to reveal structural and functional details of DNA-binding proteins when subjected to X-ray diffraction. However, biomolecular crystals are notoriously unstable in solution conditions other than their native growth solution. To achieve greater application utility beyond structural biology, biomolecular crystals should be made robust against harsh conditions. To overcome this challenge, we optimized chemical DNA ligation within a co-crystal. Co-crystals from two distinct DNA-binding proteins underwent DNA ligation with the carbodiimide crosslinking agent 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) under various optimization conditions: 5′ vs. 3′ terminal phosphate, EDC concentration, EDC incubation time, and repeated EDC dose. This crosslinking and DNA ligation route did not destroy crystal diffraction. In fact, the ligation of DNA across the DNA–DNA junctions was clearly revealed via X-ray diffraction structure determination. Furthermore, crystal macrostructure was fortified. Neither the loss of counterions in pure water, nor incubation in blood serum, nor incubation at low pH (2.0 or 4.5) led to apparent crystal degradation. These findings motivate the use of crosslinked biomolecular co-crystals for purposes beyond structural biology, including biomedical applications.

**Keywords:** co-crystal engineering; chemical ligation; bioconjugation; X-ray diffraction; DNA; DNA-binding protein

#### **1. Introduction**

Beyond serving as the fundamental components of life, proteins and DNA are also key building blocks for nanoscale self-assemblies. Biomolecular assemblies, ranging from 2D arrays to 3D crystals, are useful tools for structural biology, bio-catalysis, and biomedical applications [1–3]. Porous biomolecular crystals can even act as macromolecular scaffolds [4], providing structural details to guest macromolecules [5]. However, downstream applications of interest, including X-ray diffraction, are hindered by crystal fragility and intolerance to solvent conditions other than the crystal growth solution. In this study, we establish a protocol for the chemical ligation of DNA inside of crystals and we demonstrate structural resilience of crosslinked co-crystals which may further their application utility.

DNA assembly stability is a limiting factor for DNA nanotechnology and DNA crystals. While coding DNA sticky base overhangs can drive self-assembly, the non-covalent DNA base stacking interactions and Watson–Crick hydrogen bonds that stabilize the junctions are only stable under specific conditions. For example, crystallization conditions for DNA crystals typically feature high concentrations of divalent cations such as Mg(II) to balance the negative phosphate backbone of DNA [6]. DNA–protein co-crystals may be similarly reliant on counterions, particularly if counterions stabilize the DNA–protein binding event [7]. Crystal forms that bring DNA building blocks into close proximity are very sensitive to the counterion environment, and often dissolve or convert into a disordered aggregate when placed in water. To maximize application versatility, DNA structures should ideally be robust to solution variations, not just ionic strength but also temperature and pH [1]. Introducing covalent bonds across DNA–DNA interfaces has the potential to dramatically improve crystal macro-structure stability and could also improve X-ray diffraction.

**Citation:** Ward, A.R.; Dmytriw, S.; Vajapayajula, A.; Snow, C.D. Stabilizing DNA–Protein Co-Crystals via Intra-Crystal Chemical Ligation of the DNA. *Crystals* **2022**, *12*, 49. https://doi.org/10.3390/cryst12010049

Academic Editor: Abel Moreno

Received: 5 December 2021 Accepted: 21 December 2021 Published: 30 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Bioconjugation, or crosslinking, is a well-established strategy to improve the structural integrity of protein and DNA crystals [8]. The protein–protein interfaces found within protein crystals tend to be rich in primary amines and carboxylic acids. If all neighboring building blocks can be covalently linked, the resulting covalent organic framework can be a robust material. In traditional protein X-ray crystallography, glutaraldehyde, a highly reactive crosslinker, can increase crystal stability in varying solution conditions, and can even improve diffraction resolution [9,10]. In our previous work on protein crystals, we have found that glyoxal offers an effective alternative to glutaraldehyde [11,12]. Chemical crosslinking and photo-crosslinking methods for DNA crystals are also established in the literature [13–15]; however, we wanted to focus on a protocol in which the crosslinking does not require a specific sequence of DNA and does not add atoms to the structure (a zero-length crosslink).

Arguably the most natural form of sequence-independent DNA crosslinking is *ligation*, where the nicks dividing stacked dsDNA blocks are removed to generate longer contiguous DNA strands. For example, Li et al. used T4 DNA Ligase to ligate the DNA junctions within highly porous DNA crystals [16]. This elegant approach is limited to crystals that have large enough solvent channels for enzyme ingress. Here, we sought to optimize a chemical ligation alternative to the use of ligase that would be applicable to crystals with both large and small pores.

Our chemical ligation chemistry relies on 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC), a water-soluble carbodiimide [8]. EDC is widely used, especially in protein conjugation, to crosslink primary amines to carboxylic acids. A less common chemistry for EDC is the activation of a terminal phosphate such that a suitably placed nucleophile can displace the leaving group [17]. When that nucleophile is the hydroxyl of a neighboring DNA strand, this chemistry results in a zero-length crosslink: a scar-less chemical ligation of DNA (Figure 1). EDC has been used to ligate dsDNA hairpins in solution [17], to link the phosphate backbone of stacked DNA in liquid crystals [18] and to stabilize a 600 nucleotide DNA origami structure [19]. Our work represents the first ligation via EDC of co-crystals containing protein and DNA. We show that EDC crosslinking dramatically increases crystal stability at the macroscale and does not destroy the crystal nanostructure (i.e., treated crystals are still suitable for study via X-ray diffraction).

To demonstrate generality, we chemically ligate two different co-crystals of DNA-binding proteins containing stacked DNA–DNA interfaces (Figure 2). For convenience, we will refer to crystals of the RepE54 transcription factor bound to cognate 21-mer dsDNA as Co-Crystal One (CC1) (Figure 2A–D) and we will refer to crystals of the E2F8 transcription factor bound to cognate 15-mer dsDNA as Co-Crystal Two (CC2) (Figure 2E–G). The asymmetric unit for each co-crystal consists of a DNA-binding protein and short, cognate DNA duplex. Both co-crystals have existing models in the Protein Data Bank (PDB). CC1 is closely related to existing PDB entry 1rep, though the 1rep model corresponds to a crystal with differing DNA at the junction (Table S1). CC2 is identical to existing PDB entry 4yo2. The CC1 and CC2 crystals used in this study consist of dsDNA that is either blunt-ended or carries terminal 5′ or 3′ phosphates (Figure 3). In each co-crystal system, the crosslinking variables tested were terminal 5′ vs. 3′ phosphates, crosslinking time, EDC concentration, and repeated EDC dose. After EDC crosslinking, co-crystals had dramatically increased structural integrity with respect to changes in the solution condition.

To show foundational feasibility for biomedical applications, we demonstrated that crosslinked co-crystals remain robust in aqueous environments, blood serum, and at pH values found in the stomach (pH 2.0) or lysosomes (pH 4.5). Therefore, the EDC crosslinking results provided here may justify further investigation of chemically ligated co-crystals or pure DNA crystals as biomaterials. For scaffold-assisted crystallography [3] it is also important to note that the crosslinked co-crystals still diffracted X-rays. The crosslinked crystals tested here diffracted nearly as well as non-crosslinked crystals (anecdotally, a typical ~0.3 Å resolution difference). Additionally, we showed that this chemical ligation method is independent of the DNA sequence at the DNA–DNA junction. For example, despite differing DNA sequences at the junctions of CC1 and CC2, chemical ligation was effective in both cases. In summary, EDC ligation is a practical approach for crosslinking DNA inside of crystals and the optimized chemical crosslinking shown can provide the stability needed for diverse downstream applications.

**Figure 1.** The mechanism of chemical DNA ligation with EDC. (**A**) A terminal 5′ hydroxyl and a terminal 3′ phosphate on neighboring DNA chains. The phosphate interacts with EDC to form an intermediate (**B**) and the hydroxyl displaces the reactive intermediate to form a zero-length crosslink (**C**) between the two DNA chains. R<sup>1</sup> is the nucleobase and R<sup>2</sup> is the phosphate backbone.

**Figure 2.** (**A**) The building block for co-crystal 1 (CC1) consists of the RepE54 transcription factor bound to 21-mer cognate DNA (represented here by PDB entry 1rep). (**B**) A collection of neighboring CC1 unit cells oriented to show the DNA stacks in two dimensions, with protein at 50% transparency. (**C**) The CC1 lattice has C121 symmetry, and all DNA–DNA junctions are symmetry equivalent to (**D**) the single DNA–DNA junction shown here. (**E**) The building block for co-crystal 2 (CC2) consists of the E2F8 transcription factor bound to 15-mer cognate DNA (represented here by PDB entry 4yo2). (**F**) A collection of neighboring CC2 unit cells oriented to show the DNA stacks in two dimensions, with protein at 50% transparency. (**G**) The CC2 lattice has P3221 symmetry, and all DNA–DNA junctions are symmetry equivalent to the single DNA–DNA junction shown here. Images were generated in PyMOL.

**Figure 3.** Examples of six co-crystal variants relative to a 100 micron scale bar. The CC1 crystals have C121 symmetry and tend to grow as monoclinic prisms: (**A**) CC1 without terminal phosphates, (**B**) CC1 with terminal 5′ phosphate, and (**C**) CC1 with terminal 3′ phosphate. In contrast, CC2 crystals have P3221 symmetry and tend to grow as truncated hexagonal prisms: (**D**) CC2 without terminal phosphates, (**E**) CC2 with terminal 5′ phosphate, and (**F**) CC2 with terminal 3′ phosphate.

#### **2. Materials and Methods**

#### *2.1. Protein Cloning, Expression, and Purification*

The protein sequence (Protocol S1) of RepE54 transcription factor (CC1 protein) from PDB code 1rep was cloned into a PSB3 vector with an N-terminal 6-Histag [20,21]. The Histone Source at Colorado State University expressed and purified CC1 protein as follows. *E. coli* CodonPlus RIPL competent cells were transformed with the CC1 protein expression plasmid and grown at 37 °C to a density of OD<sub>600</sub> 0.6 in 2xYT broth containing Ampicillin (100 mg/L) and Chloramphenicol (25 mg/L). Isopropyl-β-D-thiogalactoside (IPTG) was added at 0.4 mM and the culture was continually shaken at 37 °C for 3 h. Cells were harvested by centrifugation and resuspended in PBS buffer supplemented with 300 mM NaCl, 0.2 mM AEBSF, and 5 mM β-mercaptoethanol and were homogenized by sonication at 50% output (10 cycles of 45 s on, 120 s off). Lysate was recovered by centrifugation at 27,000× *g* for 25 min. The supernatant was loaded onto Ni Excel Sepharose resins (CV = 15 mL, Cytiva), washed and eluted by a linear gradient of 0–500 mM imidazole in resuspension buffer. The fractions containing CC1 protein were pooled, concentrated using Amicon Ultra-15 10 kDa MWCO centrifugal filter unit (EMD Millipore) and loaded onto a size-exclusion HiLoad Superdex 200 PG column (Cytiva) equilibrated with sodium citrate buffer (100 mM Sodium citrate pH 6.2, 100 mM KCl, 10 mM MgCl<sub>2</sub> and 10% glycerol). Fractions containing CC1 protein were collected, concentrated to 15 mg/mL, and stored at −80 °C after freezing with liquid nitrogen.

The E2F8 transcription factor (CC2 protein) plasmid was graciously donated by the Taipale Lab (Protocol S1). The protein was expressed and purified based on previous guidelines [22]. CC2 protein with a TEV protease-cleavable N-terminal thioredoxin tag was expressed with a T7 promoter in *E. coli* BL21(DE3) cells. Upon addition of 0.5 mM IPTG, the cells were outgrown at 25 °C for 20 h. The cell pellets were sonicated in lysis buffer and applied to HisTrap (HisPur™ Ni-NTA Resin) equilibrated with HisTrap buffer (500 mM NaCl, 100 mM HEPES, 10 mM imidazole, 10% glycerol, 0.5 mM TCEP, pH 7.5). The protein was eluted with 200 mM imidazole in HisTrap buffer. CC2 protein was TEV cleaved from thioredoxin during dialysis using Snakeskin MWCO 10 kDa into HisTrap buffer. The cleaved product was separated from thioredoxin and TEV Protease by HisTrap, eluting with addition of HisTrap buffer. The CC2 protein was purified further with Nuvia™ cPrime™ Hydrophobic Cation Exchange Media, equilibrated with cation exchange buffer (50 mM NaCl, 100 mM HEPES, 10% glycerol, 0.5 mM TCEP, pH 7.5), and eluted with 100 mM NaCl in cation exchange buffer. The fractions containing CC2 protein were pooled, concentrated using Amicon Ultra-15 10 kDa MWCO centrifugal filter unit (EMD Millipore) and loaded onto a size-exclusion HiLoad Superdex 200 PG column (Cytiva) equilibrated with CC2 storage buffer (150 mM NaCl, 20 mM HEPES, 5% glycerol, 0.5 mM TCEP, pH 7.5). Size exclusion was completed at CSU's Histone Source. Fractions containing CC2 protein were collected, concentrated to 10 mg/mL, and stored at −80 °C after flash freezing with liquid nitrogen.

All protein sample purification was analyzed with SDS-PAGE (NuPAGE™ 4–12% Bis-Tris Gel) with MES SDS running buffer. Gels were stained with Imperial™ Protein stain. Protein concentrations were determined with Bradford Assay using Coomassie Plus™ Protein Assay Reagent.

#### *2.2. DNA Duplex Annealing*

DNA duplex sequences are given in Figure 2 and Table S1. The RepE54 co-crystal oligomers were designed from the original 22-mer in PDB code 1rep [20]. All sequences contained the 19 bp iteron sequence for DNA–protein binding, but the original duplex was truncated from a 22-mer to a 21-mer to eliminate an unresolved dangling base and to give a blunt ended DNA interaction for crosslinking. The E2F8 transcription factor co-crystal oligomers were the original duplex found in PDB code 4yo2 [22]. The CC1 and CC2 oligomers were synthesized and HPLC purified by Integrated DNA Technologies with termini containing no phosphates, 5′ phosphates, or 3′ phosphates. The oligomers were resuspended: CC1 oligomers in 50 mM Tris HCl, 100 mM KCl pH 7.0 and CC2 oligomers in 10 mM Tris base, 150 mM NaCl, 1 mM EDTA pH 7.5. The DNA duplexes were annealed by combining cognate ssDNA oligomers in a 1:1 molar ratio, heating to 94 °C for 2 min then slowly cooling to room temperature over approximately 60 min. The final concentrations of CC1 and CC2 duplexes were 4 mM and 1 mM, respectively. DNA stocks were quantified with a Qubit4 (Qubit™ 1× dsDNA HS Assay Kit).

#### *2.3. DNA–Protein Complex Co-Crystallization*

All co-crystals were grown via sitting drop vapor diffusion. At 30 min prior to crystal plate setup, the protein and DNA were incubated at a 1:1.2 molar ratio. The DNA–protein complexes were kept on ice for 30 min prior to use. CC1, RepE54 transcription factor co-crystal, crystallization conditions were 30–120 mM MgCl<sub>2</sub>, 2–16% PEG 400 and 100–220 mM Tris HCl pH 8.0. CC2, E2F8 transcription factor co-crystal, crystallization conditions were 40–300 mM ammonium sulfate, 5% PEG 400, 5–20% PEG 3350, and 80 mM HEPES pH 7.1. Crystals grew to a size of 50–150 µm<sup>3</sup> in a range of 24 h to 7 days.

#### *2.4. EDC Crosslinking Co-Crystals*

Co-crystals were washed in conditions similar to crystal growth conditions where growth buffer components that interfere with crosslinking were substituted (i.e., primary amines, carboxylic acids, and divalent cations). The CC1 wash solution consisted of 30–120 mM NaCl (substituting for MgCl<sub>2</sub>), 2–16% PEG 400 and 100–220 mM MES pH 6.0 (substituting for Tris HCl pH 8.0). The CC2 wash solution consisted of 20–300 mM lithium sulfate (substituting for ammonium sulfate), 5% PEG 400, 10–30% PEG 3350 (an increase of 10% PEG 3350 compared to the growth solution), and 80 mM MES pH 6.0 (substituting for HEPES pH 7.1). The 10% additional PEG 3350 for CC2 appeared to prevent the crystals from degrading upon addition of the wash. The co-crystals were washed in 9-well glass plates (Hampton) to remove additional protein and DNA monomers and unwanted buffer components. 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) (Advanced Chemtech CAS#:25952-53-8) was resuspended in the wash solution to final concentration values ranging from 5 to 80 mg/mL and used immediately. The co-crystals were crosslinked in a 200 µL EDC solution volume for varying time points. The co-crystal crosslinking reaction was quenched by moving crystals to 1× Tris-Borate-EDTA (TBE) buffer pH 8.3 containing 3.5 M urea.

#### *2.5. DNA Gel Electrophoresis and Densitometry*

Crosslinked co-crystals were dissolved in 3.5 M urea in 1× TBE supplemented with Proteinase K and incubated at 50 °C overnight. When crystals were too robust to dissolve under these harsh conditions, the crystals were heated to 94 °C for 1 h and glass crystal crushers (Hampton) were used to crush the crystals prior to chemical and enzymatic attack. The crystals were analyzed with 10% or 15% Novex™ TBE-Urea Polyacrylamide Gel Electrophoresis (PAGE) with 1× TBE running buffer. DNA ladders were GeneRuler Low Range DNA Ladder (Thermo Scientific, Houston, USA) for CC1 gels and Ultra Low Range DNA Ladder (Invitrogen) for CC2 gels. The control lanes included 1-mer dsDNA and 2-mer dsDNA, prepared by annealing oligos as described in Section 2.2 (Table S1 Duplex IDs 1.1, 1.4, 2.1, and 2.4). Gels were incubated with 3× GelRed™ Nucleic Acid Gel Stain and imaged with a UVP Bioimaging System on the Ethidium Bromide setting. For further validation, selected ligation products for CC1 were also analyzed with a TapeStation D1000 ScreenTape assay (Agilent) (Figure S1) at CSU's Next Generation Sequencing Core. The gels and TapeStation were analyzed via densitometry.

#### *2.6. DNA Gels and Densitometry*

For densitometry, we used ImageJ (1.52k) to obtain raw x, y, intensity values for the gels shown in Section 3.1. We averaged these data over x values and used custom Python scripts (within "cocrystal\_ligation\_scripts.zip" hosted on Zenodo [23]) as well as the lmfit module [24] to obtain non-linear best fits of the gel intensity. Specifically, we modeled peaks using Gaussian functions. We also modeled the background using diffuse Gaussian functions. Crystals with more crosslinking produced overlapping gel bands for higher-order ligation products. One of the benefits of using a mathematical curve fitting framework is our ability to fit (albeit approximately) these populations. Specifically, we fit the peak position trend using the well-separated gel bands corresponding to smaller ligation products. Then, we fit the highly overlapping region using extrapolated peak positions with fitting parameter restrictions implemented via lmfit. Inspection of the fitting results (Figure S2) gave us confidence that the fit to the higher-order band intensities was reasonable.
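As an illustration of the peak-plus-background model described above, the following minimal sketch evaluates a lane profile as a sum of narrow Gaussians (bands) and broad Gaussians (background). It is not the authors' script: the function names, the peak-height parameterization, and all numerical parameters are assumptions chosen for illustration, and the non-linear fitting step (performed with lmfit in the source) is omitted.

```python
import math


def gaussian(x, amplitude, center, sigma):
    """Gaussian parameterized by peak height (an assumed convention)."""
    return amplitude * math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def lane_profile(x, bands, background):
    """Modeled lane intensity at position x.

    bands and background are lists of (amplitude, center, sigma) tuples:
    narrow Gaussians for gel bands, broad ones for the diffuse background.
    """
    band_signal = sum(gaussian(x, *params) for params in bands)
    diffuse_signal = sum(gaussian(x, *params) for params in background)
    return band_signal + diffuse_signal


# Hypothetical parameters (pixel positions, arbitrary intensity units).
bands = [(120.0, 400.0, 6.0), (80.0, 330.0, 6.0)]
background = [(10.0, 350.0, 150.0)]
profile = [lane_profile(x, bands, background) for x in range(600)]
```

In a fit, the band centers for overlapping higher-order products would be held near extrapolated positions, which is the role of the parameter restrictions mentioned above.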

In principle, longer DNA ligation products can adsorb a greater number of GelRed fluorophores, proportionally with the DNA length. Ignoring this effect might cause us to overestimate the ligation yield. Accordingly, we proceeded to normalize the estimated molar ratio of the ligation products (Section 3.2) by dividing each band intensity by the assigned DNA block size (divide by N for N-mer DNA blocks). The raw band intensity fits are provided in Table S2.
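The length normalization described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name is hypothetical, the input intensities are invented (they are not the values in Table S2), and the sketch assumes that stain signal scales linearly with DNA length.

```python
def normalized_mole_fractions(band_intensity):
    """Convert fitted band intensities into approximate mole fractions.

    band_intensity maps block size N (1 for an unligated duplex, 2 for a
    ligated 2-mer, ...) to the fitted intensity of that gel band.
    """
    # Assumption: an N-mer binds N times more stain per molecule than a
    # 1-mer, so dividing by N gives a quantity proportional to moles.
    relative_moles = {n: intensity / n for n, intensity in band_intensity.items()}
    total = sum(relative_moles.values())
    return {n: moles / total for n, moles in relative_moles.items()}


# Hypothetical intensities in arbitrary units.
fractions = normalized_mole_fractions({1: 100.0, 2: 100.0, 3: 60.0})
```

With these invented inputs, the 2-mer band is as bright as the 1-mer band but represents only half as many molecules, which is why skipping the normalization would inflate the apparent ligation yield.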

#### *2.7. Random Ligation Model*

As exemplified in Section 3.2, the densitometry data could be interpreted in terms of the relative population of unfused DNA, ligated 2-mer, ligated 3-mer, etc. We sought to interpret these data in terms of the likely percentage of the dsDNA-dsDNA interfaces that have gained at least one covalent bond via EDC ligation. First, we used the estimated molar ratio of products from gel densitometry to estimate the fraction of potential ligation sites that were ligated. Second, to compute the expected distribution of fused DNA blocks of varying length, we implemented a simple 1D simulation in Python (Protocol S2, also within "cocrystal\_ligation\_scripts.zip" [23]) in which all 85,712 nicks between DNA blocks in a 1D stack of 42,857 blocks (a 300 micron stack) were equally likely to be randomly removed in each unit of time. This "random ligation model" is arguably the least complex theoretical model for the crosslinking process, ignoring transport phenomena and assuming that all possible ligation sites throughout the crystal undergo ligation randomly with equal probability per unit of time. We also developed a biased ligation model in which sites near the crystal interior are less likely to be ligated than sites near the crystal surface (Protocol S3, also within "cocrystal\_ligation\_scripts.zip" [23]).
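A random ligation model of this kind can be sketched in a few lines. The code below is not Protocol S2; it is a simplified illustration under stated assumptions: time is collapsed into a single cumulative probability `p_nick` that any given nick has been ligated, each junction carries two independent nicks, and two blocks count as fused when at least one nick at their junction is ligated. The function name and the value of `p_nick` are hypothetical.

```python
import random
from collections import Counter


def simulate_random_ligation(n_blocks, p_nick, seed=0):
    """Return a Counter mapping fused-block length to number of products."""
    rng = random.Random(seed)
    lengths = Counter()
    run = 1  # length (in blocks) of the product currently being extended
    for _ in range(n_blocks - 1):  # one junction between adjacent blocks
        # Each junction has two nicks, ligated independently (assumption).
        nick_a = rng.random() < p_nick
        nick_b = rng.random() < p_nick
        if nick_a or nick_b:
            run += 1  # junction fused: the product grows by one block
        else:
            lengths[run] += 1  # double-strand break: close this product
            run = 1
    lengths[run] += 1  # close the final product at the end of the stack
    return lengths


# Stack size taken from the text; p_nick = 0.3 is an arbitrary example.
products = simulate_random_ligation(42857, 0.3)
```

Under these assumptions the expected fraction of fused junctions is 1 − (1 − `p_nick`)², so the simulated product distribution can be compared with the normalized gel band populations for a range of `p_nick` values.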

#### *2.8. X-ray Diffraction Data Collection, Refinement and Omit Maps*

Single-crystal X-ray diffraction (XRD) data were collected for CC1 crystals containing 5′ and 3′ terminal phosphates. Crosslinked crystals with 3′ terminal phosphates were also analyzed via XRD. Crystals were briefly swished through cryo-protectant solution (300 mM MgCl<sub>2</sub>, 30% PEG 400, and 100 mM Tris HCl pH 8.0) and flash-frozen in liquid nitrogen. Frozen crystals were stored in Rigaku ACTOR Magazines (Mitegen) and shipped to the Advanced Light Source Beamline 4.2.2 for data collection. Full datasets were collected on a CMOS detector from 0 to 180 degrees with an omega delta of 0.2° and an exposure time of 0.3 s. Data were processed with XDS [25], and molecular replacement and refinement were performed within PHENIX [26] and COOT [27]. As a result, the original co-crystal for RepE54 transcription factor (2.60 Å PDB code 1rep) was updated with a higher-resolution structure (1.89 Å PDB code 7rva). The updated structure was solved with molecular replacement using the PDB code 1rep. CC1 crystal structures containing 5′ or 3′ terminal phosphates were solved via molecular replacement in PHENIX using the updated original CC1 as a starter model. For all structures, the same R-free flags were used during refinement in PHENIX and COOT. Structure factor data were truncated using I/σ(I) > 1.5 as a cutoff. The resulting structures were of: CC1 with terminal 5′ phosphates (PDB code: 7sgc), CC1 with terminal 3′ phosphates (PDB code: 7sdp), low EDC crosslinked (5 mg/mL EDC, 12 h) CC1 with terminal 3′ phosphates (PDB code: 7soz), and heavy EDC crosslinked (30 mg/mL EDC, 12 h, two doses) CC1 with terminal 3′ phosphates (PDB code: 7spm). Standard X-ray diffraction data quality statistics are provided in Tables 1 and 2.

Omit maps were generated for each structure, shown in Section 3.3. To prevent bias of the electron density at the junctions in crosslinked structures, discovery and omit maps were generated with structures containing no terminal phosphates. After generating discovery and omit maps, the terminal phosphates were added to the structures and refined for submission to the PDB. In the final PHENIX refine of heavy crosslinked CC1 with terminal 3′ phosphates, a custom geometry bond restraint was added because the electron density indicated ligation at both junctions. The terminal 3′ P and flanking 5′ OH were given a bond length restraint of 1.59 Å, the ideal length of the phosphate-oxygen bond in the DNA backbone [28].

**Table 1.** X-ray diffraction statistics for the updated original CC1 crystal, the CC1 crystal with terminal 5′ phosphates, and the CC1 crystal with terminal 3′ phosphates.


\* Values in parentheses are for high-resolution shell.


**Table 2.** X-ray diffraction statistics for the CC1 crystal with terminal 3′ phosphates and low crosslink (5 mg/mL EDC for 12 h) and the CC1 crystal with terminal 3′ phosphates and heavy crosslink (two doses of 30 mg/mL EDC for 12 h).

\* Values in parentheses are for high-resolution shell.

#### *2.9. Stability Assays*

Crystals were crosslinked using the Section 2.4 protocol, with 15 mg/mL EDC for 20 h. The EDC reaction was quenched in 50 mM Tris base pH 8.0 for 30 min. The crystals were equilibrated in crosslinking wash solution for 30 min prior to looping to stringent conditions. The stability test buffers used were as follows: molecular biology grade water (CORNING), very low pH 2.0 0.01 M HCl buffer (to mimic stomach acid), a moderately low pH 4.5 citrate buffer (46 mM sodium citrate, 54.1 mM citric acid to mimic lysosomal fluid pH), and blood serum (HyClone, bovine calf serum). Pictures for each trial are in Figures S3–S6. Crystal pictures were obtained with a Moticam 3.0 MP camera attached to a Motic SMZ-168 stereozoom microscope and crystal measurements were performed in Motic Images Plus 2.0 (Figure S4 and Protocol S4).

#### **3. Results**

#### *3.1. Chemical Ligation in Co-Crystals*

Within our two co-crystal families (CC1 and CC2), we observed clear evidence of chemical ligation of stacked DNA duplexes. Both co-crystals demonstrated broadly similar ligation results, emphasizing the generality of this ligation method to co-crystals in which blunt-ended DNA blocks are suitably positioned to resemble contiguous DNA. As shown in Table 3, the PDB entries for the parent structures of both CC1 (7rva) and CC2 (4yo2) have junction step geometry that is reasonably comparable to contiguous B-DNA as calculated using x3DNA [29]. Except for the twist and roll across the CC2 junction (as seen in PDB entry 4yo2), all step geometry parameters are within 2 standard deviations of the B-DNA mean. It is possible that other co-crystals in which the DNA–DNA junctions have a geometry less like contiguous DNA would resist ligation. Additionally, in the preliminary crosslinking tests shown here, crosslinking succeeded regardless of the sequence at the DNA ends. CC1 has GC/CG flanking ends while CC2 has AT/TA flanking ends. The sequence independence of this ligation strategy is advantageous for DNA structure design projects where the junction sequence may be constrained for functional reasons. Table 3 also reports an interesting asymmetry between the two nick sites at the DNA–DNA junctions within the CC1 family of structures. We report the distance between C5′ and O3′ to avoid relying on the less certain O5′ position. For calibration, an idealized B-DNA model from x3DNA had C5′ to O3′ distances of 2.73 Å for contiguous bases, but this span is variable (2.99 ± 0.17 Å) elsewhere within the dsDNA of PDB entry 7rva. One of the two CC1 nick sites, chain B, was invariably closer than chain A (e.g., 3.75 Å rather than 4.22 Å in CC1-3′P), and electron density suggested that this shorter gap (chain B) was more readily ligated.

**Table 3.** DNA–DNA junction geometry parameters before and after phosphorylation and ligation. The likelihood of successful chemical ligation for stacked DNA may depend on geometry details across the junction. Here, we compare the geometry of the junction in the parent PDB models for CC1 and CC2, as well as the blunt-ended 5′ or 3′ phosphorylated CC1 crystals, to the geometry of contiguous bases in idealized B-DNA from Olson et al., 1998 [30]. The junctions are not symmetric, and differing distances for the two nicks across the junctions are also shown.


\* Base pair step parameters from Olson et al. 1998 [30]. C5′ to O3′ distance from x3DNA idealized B-DNA. † Values differ from B-DNA by more than 2 standard deviations. ‡ Values are the distances in the refined "discovery" models prior to addition of the 3′ phosphate (not PDB 7spm).

EDC crosslinking was tested for both 5′ and 3′ phosphate laden crystals. For CC1, the 3′ phosphate resulted in a higher ligation yield than the 5′ phosphate in each trial (Figures 4 and S7–S11). On the other hand, CC2 showed only a modest difference in ligation yield between 3′ and 5′ phosphates. Given the limited dataset, it is premature to conclude that 3′ phosphates will typically give a higher ligation yield within co-crystals.

**Figure 4.** TBE-urea gels of (**A**) CC1 and (**B**) CC2 chemical ligation. In both co-crystals, additional ligation was achieved with increased EDC concentration and a second EDC dose. (**A**) A 10% TBE-urea gel of CC1 illustrating a much-improved ligation product distribution for 3′ vs. 5′ phosphates. (**B**) A 15% TBE-urea gel of CC2 illustrating a modestly improved ligation product distribution for 3′ vs. 5′ phosphates. Assigned band sizes are given in bp.

In both systems, DNA ligation was dependent on the presence of the terminal phosphates as well as on the crystal template; control crystals lacking terminal phosphates yielded no observable ligation products (Figure S10). Additionally, freely diffusing DNA blocks carrying terminal phosphates (but lacking the co-crystal scaffold) also yielded no observable ligation products when exposed to EDC (Figure S10). This second control demonstrated that the scaffold was necessary for ensuring efficient ligation of blunt-ended DNA blocks. The lack of observable ligation for building blocks outside the crystal "scaffold" precludes a systematic study of the effects of precursor ligation on crystal growth. Future work will determine, as a function of sticky overhang length, the extent to which blocks with sticky overhangs can be ligated within crystals and in solution.

Crosslinking reaction time was clearly and directly related to ligation reaction yield during the first 12 h (Figure S7). It was less clear if reaction yield was further improved by incubation beyond 12 h. Therefore, 12 h crosslinking incubations were used for the subsequent ligation optimization trials.

In the next series of experiments, we optimized EDC concentration for maximum ligation yield. We assayed the ligation product distribution as a function of concentration from 5 mg/mL EDC to 80 mg/mL EDC. As hypothesized, increasing the concentration of EDC increases the ligation of DNA duplexes in the co-crystals (Figures 4 and S8). In CC1 trials, we did not see a noticeable increase in ligation beyond 30 mg/mL. However, in CC2 trials, there was improved ligation at 60 mg/mL. We also subjected the co-crystals to multiple fresh doses of EDC (30 mg/mL) to determine if we could achieve near 100% ligation. For both co-crystal systems, multiple doses of EDC did increase ligation yields (Figure S9) but did not approach 100% ligation yields.

Reaction buffer components were critical for successful ligation. We observed, at the outset of this project, that the presence of magnesium chloride in the crosslinking buffer appeared to interfere with the crosslinking reaction. This was problematic because the CC1 crystal growth conditions contain a significant amount of magnesium chloride. In our crystallization trials, 30–120 mM magnesium chloride was required for growth [21]. Additionally, there is a structural Mg(II) at the DNA–protein interface coordinated by Glu77 and Asp81. To circumvent the apparent deleterious role of Mg(II) on CC1 crosslinking, we replaced magnesium chloride with sodium chloride in the wash solution for all CC1 crosslinking trials. At the conclusion of the project, we again confirmed that Mg(II) was deleterious to ligation by adding Mg(II) to the optimized ligation protocol. Specifically, we verified that supplementing the crosslinking incubation buffer with 90 mM or 110 mM MgCl<sub>2</sub> noticeably reduced the ligation yield (Figure S11). The exact role of Mg(II) in inhibiting the ligation reaction is not clear, but might involve reduced availability of the nucleophilic phosphate groups.

#### *3.2. Ligation Model Compared to Experimental Co-Crystal Ligation*

The ligation product distributions we experimentally obtained should shed light on the stochastic process of ligation. Using densitometric analysis of electrophoresis results, we quantified the population ratio of bands assigned to non-modified DNA blocks as well as fused 2-mer, 3-mer, etc. For selected gels, we also obtained TapeStation results (Figure S1). The relative population of the end-product distribution was fairly consistent for gel band populations measured with TBE-urea gels in ImageJ compared to the automated TapeStation analysis (Figure S1).

Next, we sought to calculate a global performance metric for the ligation yield, *P<sub>LIG</sub>*, as the fraction of all possible DNA–DNA nick sites throughout a crystal that were ligated. One destructive assay to quantify the ligation yield throughout an entire crystal is to analyze the implications of the final DNA product distribution recovered after the crystal is dissolved and the protein components are removed. A related quantity is *P<sub>DSB</sub>*, the probability that any random DNA–DNA junction within the crystal remains a double-strand break (DSB). If we count the number of DNA blocks of each length (*n<sub>i</sub>*) present in the crystal, we ignore edge effects and estimate the total number of DSB as *N<sub>DSB</sub>* = ∑<sub>*i*</sub> *n<sub>i</sub>*. For the same crystal, the estimated total number of original junctions (regardless of final ligation status) would be *N<sub>JXN</sub>* = ∑<sub>*i*</sub> *i*·*n<sub>i</sub>*. For example, adding a single fused 3-mer to the crystal increases the DSB tally by one, but increases the tally of all possible junctions by three. Then, to compute the total probability of encountering DSB, we calculate:

$$P_{DSB} = \frac{N_{DSB}}{N_{JXN}} = \frac{\sum_{i} n_{i}}{\sum_{i} i \cdot n_{i}} = \frac{\sum_{i} n_{i} / \sum_{i} n_{i}}{\sum_{i} i \cdot n_{i} / \sum_{i} n_{i}} = \frac{1}{\sum_{i} i \cdot x_{i}} \tag{1}$$

In the final equation, *x<sub>i</sub>* is the mole fraction for the DNA block of length *i*. Therefore, to estimate the *P<sub>DSB</sub>*, we can use estimated mole fractions from electrophoresis and densitometry (Figures 4 and S2 and Table 4). Accurately calculating *P<sub>DSB</sub>* does require including the small mole fractions for higher-order products (Table S3) since longer products contribute proportionally more to ∑<sub>*i*</sub> *i*·*x<sub>i</sub>*. To estimate the uncertainty in each *P<sub>DSB</sub>*, we used 500 numerical trials in which random noise was added to *i*·*x<sub>i</sub>* to mimic densitometry measurement error. We used noise comparable to *i*·*x<sub>i</sub>* for the highest-order ligation products (normal variate with standard deviation 0.03), such that the smallest *i*·*x<sub>i</sub>* values would regularly fall to 0 after the addition of random noise.
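The calculation in Equation (1) and the noise-based uncertainty estimate can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis script. The mole fractions are hypothetical placeholders (not values from Table 4 or Table S3), the function names are ours, and two details are assumptions: the length-weighted terms are scaled to sum to one before noise is added, and noisy values are clipped at zero.

```python
import random
import statistics

def p_dsb(mole_fractions):
    """Equation (1): P_DSB = sum_i(x_i) / sum_i(i * x_i)."""
    n_dsb = sum(mole_fractions.values())
    n_jxn = sum(i * x for i, x in mole_fractions.items())
    return n_dsb / n_jxn

def p_dsb_uncertainty(mole_fractions, sigma=0.03, trials=500, seed=0):
    """Perturb the length-weighted terms i*x_i with Gaussian noise, recompute P_DSB."""
    rng = random.Random(seed)
    n_jxn = sum(i * x for i, x in mole_fractions.items())
    # Length-weighted terms scaled to sum to 1 (assumed to mimic relative band intensities).
    weighted = {i: i * x / n_jxn for i, x in mole_fractions.items()}
    samples = []
    for _ in range(trials):
        # Assumption: a noisy term that drops below zero is clipped to zero.
        noisy = {i: max(0.0, w + rng.gauss(0.0, sigma)) for i, w in weighted.items()}
        # Divide by i to return to mole-fraction-like terms before applying Equation (1).
        samples.append(p_dsb({i: w / i for i, w in noisy.items()}))
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical mole fractions keyed by block length i (NOT data from the paper).
x = {1: 0.40, 2: 0.25, 3: 0.15, 4: 0.10, 5: 0.06, 6: 0.04}
estimate = p_dsb(x)
mean_p, sd_p = p_dsb_uncertainty(x)
print(estimate, mean_p, sd_p)
```

Because each term is multiplied by the block length, dropping the higher-order products from `x` would shrink the denominator and overestimate the double-strand break probability, which is why the small mole fractions matter.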

While the probability of encountering a double-strand break in the crystal (*P<sub>DSB</sub>*) is an important parameter, it would also be useful to know *P<sub>LIG</sub>*, the probability of each terminal phosphate having undergone ligation. In the context of the random ligation model (RLM), ligation events throughout the crystal are independent and occur with equal probability at all nick sites. Therefore, the incidence of double-strand breaks within the crystal should occur with the joint probability of independent events, *P<sub>DSB</sub>* = (1 − *P<sub>LIG</sub>*)<sup>2</sup>. Thus, the overall probability that a random terminal phosphate within the crystal will be ligated is *P<sub>LIG</sub>* = 1 − √*P<sub>DSB</sub>*. The joint probability that DNA junctions will be double ligated is *P<sub>DLIG</sub>* = *P<sub>LIG</sub>*<sup>2</sup> = 1 − 2√*P<sub>DSB</sub>* + *P<sub>DSB</sub>*, and the probability that they will be singly ligated is *P<sub>SLIG</sub>* = 2*P<sub>LIG</sub>*(1 − *P<sub>LIG</sub>*) = 2√*P<sub>DSB</sub>* − 2*P<sub>DSB</sub>*, so that the three outcomes sum to one.
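As a quick check of these relationships, the sketch below derives the per-phosphate and per-junction probabilities from a given double-strand break probability. It assumes only the independence stated for the RLM (two nick sites per junction, each ligated with the same probability). The function name and the input value of 0.25 are illustrative choices, not values taken from Table 4.

```python
import math

def ligation_probabilities(p_dsb):
    """Derive RLM probabilities from P_DSB, assuming two independent nick sites per junction."""
    p_lig = 1.0 - math.sqrt(p_dsb)          # per-phosphate ligation probability
    return {
        "P_LIG": p_lig,
        "P_DLIG": p_lig ** 2,               # both strands ligated
        "P_SLIG": 2.0 * p_lig * (1.0 - p_lig),  # exactly one strand ligated
        "P_DSB": (1.0 - p_lig) ** 2,        # neither strand ligated
    }

probs = ligation_probabilities(0.25)  # illustrative input value
print(probs)
```

With an input of 0.25, half of the phosphates are ligated, a quarter of the junctions are doubly ligated, and half are singly ligated, so three quarters of the junctions carry at least one ligated strand.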

**Table 4.** Distribution of DNA block sizes as a function of crosslinking protocol and 3′ vs. 5′ terminal phosphates. The data shown correspond with the gel lanes in Figure 4. The crosslinking protocols low, medium, and high were 1 dose of 5 mg/mL EDC for 12 h, 1 dose of 30 mg/mL EDC for 12 h, and 2 doses of 30 mg/mL EDC for 12 h each, respectively. The values in this table are weighted so that the DNA length and dye intensity contribute to the final value. Unweighted values are found in Table S2. The full table including estimated mole fractions for higher-order products is found in Table S3. *P<sub>DSB</sub>*, *P<sub>SB</sub>*, and *P<sub>LIG</sub>* were calculated for each crosslinked crystal sample. Uncertainties are standard deviations in derived quantities after 500 trials in which noise (standard deviation 0.03) is introduced into relative band intensities.


\* Calculated from experimental mole fractions per Equation (1). Other probabilities are calculated as shown.

This analysis of the electrophoresis experiments suggests that ~50% of the terminal phosphates within the most thoroughly crosslinked CC1-3′P crystal have undergone ligation. On the other hand, ~75% of the DNA–DNA junctions in this crystal had at least one ligated chain. In summary, the ligation product ratio analysis suggests that a moderate fraction of the phosphates within these crystals have undergone the target ligation reaction, leading to an important question. What factors are limiting the yield? Incomplete ligation could result if a random population of terminal phosphates are missing, or otherwise incapable of on-target ligation. We used simulations to verify that the predicted RLM product ratio did not change when we postulated that a random subset of nick sites is incapable of ligation. This makes sense because junctions that are randomly selected to be incapable of ligation are functionally equivalent to sites that are randomly selected to be ligated last (i.e., after we stop ligating since we have reached *P<sub>DSB</sub>*).
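The equivalence argument can be illustrated with a small simulation. The sketch below is not the authors' simulation (Protocol S2 is not reproduced here). It assumes a single long one-dimensional thread of stacked duplexes, two nick sites per junction, and independent ligation at each site instead of sequential ligation up to a threshold. The function name, thread length, and probabilities are hypothetical.

```python
import random
from collections import Counter

def block_mole_fractions(n_junctions, p_lig, dead_fraction=0.0, seed=0):
    """Simulate one long DNA thread and return the mole fraction of each block length."""
    rng = random.Random(seed)
    counts = Counter()
    length = 1
    for _junction in range(n_junctions):
        ligated = 0
        for _strand in range(2):  # two nick sites per junction, one per strand
            capable = rng.random() >= dead_fraction
            if capable and rng.random() < p_lig:
                ligated += 1
        if ligated == 0:
            counts[length] += 1  # double-strand break: the current block ends here
            length = 1
        else:
            length += 1          # at least one strand is continuous: the block grows
    counts[length] += 1          # close the final block
    total = sum(counts.values())
    return {i: n / total for i, n in sorted(counts.items())}

# Both runs have the same effective per-site ligation probability (0.5),
# but the second gets there with 20% of sites incapable of ligation.
all_capable = block_mole_fractions(200_000, p_lig=0.5, seed=0)
with_dead = block_mole_fractions(200_000, p_lig=0.625, dead_fraction=0.2, seed=1)
print(all_capable[1], with_dead[1])
print(all_capable[2], with_dead[2])
```

In this sketch only the product of the capable fraction and the ligation probability enters the outcome at each site, so the two runs agree within sampling noise.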

It may also be possible that ligating one phosphate at a DNA–DNA junction would negatively affect neighboring ligation probabilities. However, evidence for such allostery is lacking. Instead, the observed product distributions for CC1 ligation outcomes (Table 4), were close to the distributions predicted by the RLM (Figure S12). One small but consistent deviation from the RLM was a lower 2-mer, and higher 3-mer population than predicted. This observation seems to preclude the simplest negative allostery scenario (where one ligation event would reduce the probability at flanking sites). We cannot rule out the possibility that this discrepancy is an artifact associated with the gel electrophoresis densitometry.

The CC2 ligation outcomes (Table 4) were significantly less consistent with distributions predicted by the RLM. Once more, the 3-mer population was often higher than expected, frequently exceeding the 2-mer population (which never happens in the RLM). This effect also seemed to extend to anomalously common 4-mers. A more striking divergence from the RLM prediction was the high population of non-ligated 1-mer blocks. Regardless of the RLM fit, the significant difference between the 1-mer mole fractions and the *P<sub>DSB</sub>* values obtained from all the mole fractions strongly implies that the RLM is lacking.
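Both diagnostics follow from the shape of the RLM prediction. If every junction is independently a double-strand break with the same probability, block lengths along a thread follow a geometric distribution, so the mole fractions fall monotonically with length and the 1-mer mole fraction equals the double-strand break probability. The sketch below encodes that closed form as an illustration under this independence assumption. It is not the fitting procedure used for Figure S12, and the "observed" distribution is a hypothetical example, not CC2 data.

```python
def rlm_mole_fractions(p_dsb, max_len=60):
    """Geometric RLM prediction: x_i = P_DSB * (1 - P_DSB)**(i - 1)."""
    return {i: p_dsb * (1.0 - p_dsb) ** (i - 1) for i in range(1, max_len + 1)}

def one_mer_excess(mole_fractions):
    """Observed 1-mer mole fraction minus P_DSB from Equation (1); zero under the RLM."""
    p_dsb = sum(mole_fractions.values()) / sum(i * x for i, x in mole_fractions.items())
    return mole_fractions[1] - p_dsb

predicted = rlm_mole_fractions(0.25)
# Hypothetical distribution with more 3-mers than 2-mers (NOT data from the paper).
hypothetical = {1: 0.60, 2: 0.10, 3: 0.15, 4: 0.15}
print(one_mer_excess(predicted))      # close to zero, up to truncation at max_len
print(one_mer_excess(hypothetical))   # positive: more 1-mers than the RLM allows
```

A positive excess is the signature described above: the population of unligated blocks is larger than the overall double-strand break probability can account for.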

To investigate, we tested biased ligation model simulations. One possible explanation is that the ligation outcomes were driven partially by kinetics and molecular transport phenomena. Hypothetically, ligation sites near the crystal exterior might be more likely to be ligated than possible sites near the crystal center since reactive molecules must traverse the outer layers to reach the interior. To determine the likely implications of this scenario, we conducted biased random ligation simulations (Protocol S3) that increased the probability of ligation events near the surface, decreased the probability at the center, and terminated the random ligation process at a set *P<sub>DSB</sub>* threshold. Perhaps counterintuitively, this spatial bias increased the predicted 1-mer mole fraction. A high 1-mer fraction is partially consistent with the observed product distribution for CC2. The overall lower ligation yield achieved for CC2 crystals compared to CC1 is also consistent with the hypothesis that the CC2 crystal interior is systematically under-ligated.
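One way to see why a spatial bias raises the 1-mer fraction is to treat the crystal as a mixture of regions, each obeying the RLM locally but with its own ligation probability. The sketch below is a deliberately simplified stand-in for the simulations in Protocol S3, not a reproduction of them. It assumes two hypothetical shells of equal size, ignores blocks that span the shell boundary, and picks ligation probabilities so that both scenarios share the same overall double-strand break probability of 0.25.

```python
import math

def mixed_shell_distribution(shells):
    """shells: list of (fraction_of_junctions, per-phosphate ligation probability)."""
    junctions = sum(weight for weight, _ in shells)
    # A junction stays a double-strand break with probability (1 - p_lig)**2,
    # and each break closes exactly one block.
    blocks = sum(weight * (1.0 - p) ** 2 for weight, p in shells)
    # A block is a 1-mer when the next junction is also a break: (1 - p_lig)**4 per junction.
    one_mers = sum(weight * (1.0 - p) ** 4 for weight, p in shells)
    return {"P_DSB": blocks / junctions, "x_1": one_mers / blocks}

uniform = mixed_shell_distribution([(1.0, 0.5)])
# Hypothetical bias: well-ligated exterior, poorly ligated interior, same overall P_DSB.
p_interior = 1.0 - math.sqrt(0.46)
biased = mixed_shell_distribution([(0.5, 0.8), (0.5, p_interior)])
print(uniform)
print(biased)
```

The poorly ligated interior contributes most of the blocks, and nearly half of its blocks are 1-mers, so the pooled 1-mer fraction rises well above the overall double-strand break probability even though the total yield is unchanged.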

#### *3.3. Ligation Structural Details*

Co-crystal structural details were revealed with X-ray diffraction at the Advanced Light Source beamline 4.2.2. Electrophoresis data (Figure 4) suggest that the CC2 DNA is stacked as intended. However, while high-resolution diffraction for CC2 crystals should be possible (3.07 Å reported by Morgunova et al. [22]), our CC2 crystals have, to date, yielded poor diffraction (>10 Å). Therefore, we chose to focus on the CC1 crystals as the model crystals to observe ligation via X-ray diffraction.

Here, we report five new crystal structures for CC1. We obtained a 1.89 Å dataset for the original co-crystal, which revealed additional details beyond the original model (PDB code: 1rep, 2.60 Å). Komori et al. varied the DNA building block to optimize resolution [20], finding that dangling Ts resulted in the best data. Our updated structure provides a rationale for this empirical observation. Specifically, one of the dangling T bases is resolved, and participates in a crystallographic contact. Removing the dangling Ts decreased the resolution of our native structures from 1.9 Å to 2.7 Å (CC1-5′p) or 3.01 Å (CC1-3′p). Once crystals were crosslinked with low (15 mg/mL 12 h) and heavy (2 doses 30 mg/mL 12 h) EDC, the crystals maintained diffraction, albeit with a moderate loss in resolution (3.14 Å and 3.28 Å, respectively).

Models were refined with PHENIX [26] and COOT [27]. The electron density for the heavily ligated DNA junction was consistent with contiguous DNA, despite omitting the terminal phosphate throughout prior refinement calculations. Figure 5 shows omit maps where any terminal phosphates are omitted, along with the bases flanking the junctions. The potential for overlapping electron density contributions from non-ligated and ligated phosphates makes it difficult to quantify occupancy. Nonetheless, we observed clear trends. Prior to ligation, the positions of 3′ phosphates (Figure 5C) or 5′ phosphates (Figure 5D) were reasonably clear.

**Figure 5.** Omit maps for (**A**) the CC1 DNA–DNA junction. (**B**) Our updated model for the original structure resolves one of the dangling 5′ terminal bases (white sticks). Prior to ligation, CC1 crystals grow with either (**C**) terminal 3′ phosphates or (**D**) terminal 5′ phosphates. Whereas (**E**) low dose EDC ligation results in minor changes to the electron density for CC1 with 3′ phosphates, (**F**) high dose EDC ligation results in electron density consistent with ligated DNA. Neighboring protein is hidden for clarity. All meshes are omit maps (mFo-DFc) contoured at 3.0 rmsd. All four bases flanking the junction (orange sticks) were omitted. To faithfully represent COOT contours in PyMOL, we turned off automatic map normalization and instead set the contour level to 3.0 rmsd. Table S4 has the corresponding e/Å<sup>3</sup> values.

Consistent with the lower distance between C5′ and O3′ for chain B (Table 3), the electron density was invariably higher for the right-hand nick (chain B:chain B). When contoured at 3.0 rmsd, the omit map electron density was even contiguous for the non-ligated CC1 3′P case (Figure 5C). Notably, the maps for crystals subjected to EDC (Figure 5E,F) are discovery maps in the sense that the models were refined in the absence of terminal 3′ phosphates. After light ligation (Figure 5E), the omit map was not clearly changed. However, after heavy ligation (Figure 5F), there was very strong electron density on the right and solid electron density on the left. Phosphates were added prior to submission to the PDB (entry 7spm) and our final refinement calculation for CC1 3′P High included bond length restraints between the model and its symmetry neighbor to ensure a reasonable phosphate geometry.

It is somewhat remarkable that ligation was visible in the electron density trend (Figure 5C–F), despite the incomplete ligation yield suggested by the electrophoresis data (Table 4). In principle, the clarity of the ligation sites in the electron density maps may vary depending on whether the X-ray beam is diffracting from a highly ligated region of the crystal.

#### *3.4. Co-Crystal Stabilization Effects from Ligation and Crosslinking*

To determine if crosslinked co-crystals may be suitable for various applications, including biomedical applications at physiologically relevant conditions, the co-crystals were crosslinked (20 h, 15 mg/mL EDC) and subjected to a panel of harsh conditions: a stomach acid mimic, a lysosomal fluid mimic, blood serum (bovine calf), and deionized water (Figure 6). The conditions chosen, especially the stomach acid mimic and deionized water, were challenging for native crystals (no crosslink) since DNA-containing crystals typically require stabilizing counterions.

**Figure 6.** A survey of crosslinked crystals (15 mg/mL EDC 20 h) with terminal 3′ phosphates in four stringent solutions. (**A**) CC1-3′p crosslinked crystals incubated in pH 4.5, pH 2.0 and water for seven days and blood serum for twenty-four hours. (**B**) CC2-3′p crosslinked crystals incubated in pH 4.5, pH 2.0 and water for seven days and blood serum for twenty-four hours.

In the stomach acid mimic (0.01 M hydrochloric acid pH 2), the non-crosslinked co-crystals were observed to convert to an aggregate (Figure S4). Remarkably, in the stomach acid solution, the entire set of crosslinked crystals demonstrated enhanced stability, not dissolving even after 7 days. The 3′ phosphate crosslinked crystals did not change macrostructure for at least 5 days in the harshly acidic environment (Figure S4). Co-crystals without phosphates were also crosslinked and these crystals expanded dramatically in the acidic environment after 24 h (~430 ± 70% volume change), demonstrating the importance of the DNA ligation for crystal stability. Crosslinked co-crystals also maintained integrity in a lysosomal mimic buffer (pH 4.5) and blood serum with no measurable changes to the crystal dimensions after 24 and 72 h, respectively (Figures S5 and S6).

In deionized water, the co-crystal stability resulting from crosslinking was exceptional (Figures 7 and S3). Within one minute of transferring co-crystals to deionized water, non-crosslinked crystals (with the interesting exception of CC2-3′P) completely dissolved or were converted to an aggregate. When the co-crystals were crosslinked (20 h, 15 mg/mL EDC), the crystals remained intact and lacked observable changes to their surface quality or dimensions for at least 7 days (Figures 7 and S3). Interestingly, crosslinked co-crystals without terminal phosphates remained unperturbed, just like the 3′ and 5′ phosphorylated crystals. These results indicate that the protein–protein crosslinks created within the co-crystals were sufficient to maintain macroscopic crystal integrity in water. The distinct stability of crosslinked crystals in water confirmed our hypothesis that crystals can be stabilized with new covalent crosslinks. Specifically, the non-covalent interactions that make up crystals can be reinforced with chemical crosslinking, which prevents crystals from degrading rapidly in an ion-free environment (deionized water).

**Figure 7.** The crystals were crosslinked with 15 mg/mL EDC for 20 h and quenched with Tris base pH 8.2 for 30 min prior to transfer to the wash solution. All scale bars are 100 µm. (**A**) CC1 crystals in wash solution containing 50 mM NaCl, 14% PEG 400, and 200 mM MES buffer pH 6.0. The concentrations of the wash solution matched the initial crystal growth solutions, but we replaced MgCl<sub>2</sub> with NaCl and Tris HCl pH 8.0 with MES buffer pH 6.0. (**B**) CC1 crystals after transitioning to an ion-free environment (deionized water). The crosslinked crystals (left three panels) remained intact for 7 days. Non-crosslinked control crystals (right three columns) dissolved or converted to an aggregate at various immediate time points.

#### **4. Discussion**

Our strategy in this work was to identify an EDC ligation protocol (EDC concentration, incubation time, and repeated dosage regimen) that optimized reaction yield, without chasing diminishing returns. Accordingly, our final protocol uses 30 mg/mL EDC, an incubation time of 12 h, and two repeated doses within reaction sizes of approximately 200 microliters to ligate the DNA present within approximately 500 ng of co-crystals. Under these conditions, stacked DNA within co-crystals was reliably ligated to a significant extent. We used gel densitometry and detailed Gaussian peak fitting to estimate the fraction of the population for each ligated species (Figure S2).

Global analysis of the ligation product distribution suggested that the most thoroughly crosslinked CC1 crystals feature ligation of nearly half of all possible ligation sites, covalently linking about 75% of the DNA–DNA junctions through at least one strand. Ligation was corroborated by single-crystal XRD where we could directly observe ligation in electron density omit maps (Figure 5).

Apart from small systematic deviations, the random ligation model (RLM, Protocol S2) was able to fit the ligation product distribution for CC1 (Figure S12). In contrast, the CC2 ligation results could not be fit to the RLM as accurately (Figure S12). In particular, the CC2 crystals appeared to have a 1-mer mole fraction that was significantly larger than the total *P<sub>DSB</sub>*, which is inconsistent with the RLM. This could be explained by invoking transport limitations. Specifically, one way to boost the 1-mer mole fraction is if the exterior of the crystal has a higher ligation probability than the interior (Protocol S3).

As mentioned in the Introduction, EDC ligation of DNA has been reported in the literature in the context of DNA hairpins in solution, liquid DNA crystals, and DNA origami. Notably, there has not been a consensus on whether 5′ or 3′ phosphate placement results in a superior yield. Fraccia et al. used 3′ phosphates for the EDC ligation of liquid DNA crystals [15], whereas Kramer and Richert used 5′ phosphates for the EDC ligation of a DNA origami structure [14]. Directly comparing 5′ and 3′ phosphates, Obianyor et al. showed EDC ligation of a hairpin DNA structure and reported 95% ligation yield for DNA with 3′ phosphates, whereas the 5′ phosphates yielded 40% ligation [17]. They hypothesized that the 3′ phosphate ligation reaction could benefit from a primary alcohol nucleophile (Figure 1B) and that the geometry difference of the two phosphate positions could contribute to reaction yields. Our data suggest that 3′ phosphates may be superior in the context of a crystal, though comparison between CC1 and CC2 suggests that the results may be system dependent. Our XRD data (Figure 5) furthermore suggest that the results may vary for different nick sites within the same crystal.

It is not clear why 3′ phosphates were more readily ligated than 5′ phosphates in CC1. Conceivably, the rate limiting step for the ligation reaction may be the attack of the hydroxyl on the activated EDC intermediate. Perhaps the short-arm 3′-EDC intermediate is more accessible to the long-arm 5′ hydroxyl than a long-arm 5′-EDC intermediate is to a short-arm 3′ hydroxyl. Notably, one of the 3′ phosphates in the CC1 lattice (chain B) is close (5.65 Å) to a symmetry copy of itself (Figure S13), whereas the 5′ phosphate is farther (9.29 Å). Therefore, 3′ phosphate ligation might be favored due to the greater reduction in electrostatic repulsion upon ligation. However, the CC2 ligation results were more balanced (albeit still favoring 3′ phosphates), suggesting that the relative efficacy of 3′ or 5′ phosphates will be system dependent.

Analyses via gel electrophoresis showed that the ligation yield increased concomitant with the EDC incubation time, but also that the reaction yield appeared to plateau short of full ligation. The cause is unclear. Transport considerations and EDC conjugation to protein sites complicate reaction modeling. One consideration is that the predicted active half-life for EDC in water at 298 K is sixteen hours [17]. However, the ligation yields also appeared to plateau for repeated EDC dosing. Perhaps incomplete ligation is due to a small fraction of DNA strands lacking the necessary terminal phosphate. Alternately, perhaps some EDC-activated phosphates have been ligated to third-party molecules. Perhaps a small DNA strand population is missing a base. Further investigation may be worthwhile prior to future work that depends on near 100% ligation.
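To put the quoted half-life in context, the sketch below estimates how much active EDC would remain after an incubation if hydrolysis were the only loss. It assumes simple first-order decay with the sixteen-hour half-life cited above for water at 298 K. It does not account for EDC consumed by reaction with protein or DNA, or for any difference between pure water and the crosslinking buffer. The function name is ours.

```python
def edc_fraction_remaining(hours, half_life_hours=16.0):
    """Fraction of active EDC left after `hours`, assuming first-order decay."""
    return 0.5 ** (hours / half_life_hours)

for t in (12, 20, 24):
    print(t, round(edc_fraction_remaining(t), 3))
```

Under these assumptions roughly 59% of the EDC would still be active at the end of a 12 h incubation, which is in line with the observation that hydrolysis alone does not obviously explain the plateau.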

In addition to optimizing conditions for our two co-crystals, we have established a set of generalizable guidelines for DNA ligation within co-crystals regarding optimal reaction conditions, phosphate composition, and concentration of EDC. First, it is imperative to optimize the wash solution for each respective system, eliminating components that could interfere with crosslinking. Reactive amines and carboxylic acids are obvious components to eliminate, to avoid forming off-target species. Additionally, we empirically found that it was important to minimize the concentration of the standard divalent cation Mg(II). Re-introducing 90–110 mM Mg(II) into our optimized protocol, we observed a dramatic reduction in the yield (Figure S11). Second, since we found that the best phosphate for ligation may depend on subtle geometry differences, we recommend testing both 5′ and 3′ phosphates for new co-crystal systems. Finally, the EDC concentration used for ligation of a new co-crystal may need to be optimized. Our co-crystals did not dissolve when introduced to crosslinking agents, with the highest concentration at 80 mg/mL. However, in past experiments, we found that the concentration of EDC in the crosslinking reaction drop can affect the integrity of co-crystals. Biomolecular crystals are typically fragile, and a drastic change in solution conditions can cause crystals to fall apart. Therefore, when working with a new system, we recommend testing a range of EDC concentrations. Dosing experiments may be necessary for systems that need a "gentle", multistep transition to harsher conditions. These guidelines may apply to crystals composed of only DNA, as well.

With data for two example co-crystals, generalization is difficult. CC1 and CC2 differ in numerous ways (e.g., DNA length of 21 bp vs. 15 bp, crystal space group, different base pairs spanning the DNA–DNA junction, different DNA sequences in general including flanking base pairs) which makes it difficult to determine which variables may be predictive of ligation yield. Given our observation that ligation may be very sensitive to the nick geometry (Table 3 and Figure 5), we hypothesize that several factors will be particularly important due to their influence on the nick geometry. The DNA sequence at the junction, and to a lesser extent the flanking bases, will affect the base pair stacking energy, which would be expected to change the nick geometry probability distribution. Other nick-site ligation yield differences may be driven by the crystallographic symmetry, particularly the presence or absence of neighboring groups in addition to intrinsic geometry differences between the nick sites (e.g., a slightly higher nick distance for chain A nick sites in CC1 crystals).

The crystal stability produced after the chemical ligation of stacked DNA within crystals opens the door for downstream applications, especially for DNA nanotechnology efforts. As shown here, even incomplete ligation can result in dramatic stabilization effects with tangible benefits to suitable application targets. No obvious EDC-induced crosslinks were visible at the two distinct protein–protein interfaces in the CC1 system. Further experiments will be needed to specifically seek and identify any EDC-induced protein–protein or DNA–protein conjugation.

It is possible that DNA ligation provided strong stabilizing effects because both CC1 and CC2 are held together by DNA–DNA junctions in two dimensions (Figures 6 and 7). Essentially, by ligating the stacked DNA in these cases we are forming longer "threads" that are woven together. Stabilization of devices or materials is intriguing if this stabilization allows them to provide or preserve functionality in various biomedical contexts (e.g., in the digestive system, the blood stream, or within lysosomes). It may also be useful if crosslinking allows crystals to remain stable and diffract to high resolution under buffer conditions that mimic physiological conditions (e.g., inside the nucleus), thereby allowing XRD structure determination under conditions besides the idiosyncratic conditions that allow for co-crystal growth.

Along the same lines, one traditional concern crystallographers have regarding crosslinking chemistry is that subjecting a crystal to handling, buffer changes, and reactive chemicals, can degrade the diffraction resolution. For example, subjecting crystals to the common crosslinking agent, glutaraldehyde, can rapidly degrade diffraction resolution. However, supplying aldehydes via gentle vapor diffusion [9] can improve outcomes. We have observed that using glyoxal and EDC can likewise result in negligible diffraction loss, particularly if the reactive chemistry is quenched [11,12]. In the case of CC1, we have once again found that carefully optimized crosslinking protocols can maintain diffraction. Another notable benefit of the EDC crosslinking method is that crystals were not "damaged" during the reaction chemistry. For comparison, when crosslinking HEWL crystals with glutaraldehyde, careful optimization was required to avoid forming cracks in the crystals [31].

Future work may determine if the ligation yield differs for sticky overhang junctions compared to the blunt end junctions used in this work. Similarly, yield may also depend on the DNA bases that span the junction. That said, the current work suggests that the method may be sequence independent because the CC1 junction has a GC/CG and the CC2 junction has an AT/TA. In summary, the reported protocol is a reliable crosslinking strategy using the zero-length crosslinking agent EDC to effect DNA ligation at blunt-end DNA–DNA junctions held together by the co-crystal lattice. Post-ligation stability paves the way for biomedical applications.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cryst12010049/s1, Figure S1. TapeStation analysis and matching gel electrophoresis, Figure S2. Densitometry results and annotation (corresponds to main text Figure 4), Figure S3. Co-crystal stability test—water, Figure S4. Co-crystal stability test—very low pH 2.0 to mimic stomach acid, Figure S5. Co-crystal stability test—moderately low pH 4.5 to mimic lysosomal fluid, Figure S6. Co-crystal stability test—blood serum, Figure S7. Gel electrophoresis of varied EDC crosslink time, Figure S8. Gel electrophoresis of varied EDC crosslink concentration, Figure S9. Schematic and gel electrophoresis of varied EDC crosslink dose, Figure S10. Schematic and gel electrophoresis of the controls—crystals with no terminal phosphates and duplexes with terminal phosphates in-solution, Figure S11. Magnesium chloride's effect on the EDC crosslinking of CC1 crystals, Figure S12. Best fits of random ligation model (RLM) to product distribution data, and Figure S13. Terminal phosphates position due to crystallographic symmetry; Table S1. DNA oligonucleotide sequences used in this study, Table S2. Ligation percentages from gel densitometry (unweighted), Table S3. Full version of densitometry output Table 2, and Table S4: Absolute electron density values for the Figure 5 electron density maps; Protocol S1. Protein sequences for cloning and overexpression in *E. coli*, Protocol S2. Random ligation model: simulation and calculations, Protocol S3. Spatial biased random ligation model, and Protocol S4. Crystal measurements

**Author Contributions:** Conceptualization, A.R.W., S.D., A.V. and C.D.S.; data curation, A.R.W., S.D., A.V. and C.D.S.; formal analysis, A.R.W., S.D., A.V. and C.D.S.; funding acquisition, A.R.W. and C.D.S.; investigation, A.R.W., S.D., A.V. and C.D.S.; methodology, A.R.W., S.D., A.V. and C.D.S.; project administration, A.R.W. and C.D.S.; resources, C.D.S.; software, A.R.W. and C.D.S.; supervision, A.R.W. and C.D.S.; validation, A.R.W., S.D., A.V. and C.D.S.; visualization, A.R.W. and C.D.S.; writing original draft, A.R.W. and C.D.S.; writing—review and editing, A.R.W., S.D., A.V. and C.D.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This material is based upon work supported by the National Science Foundation under Grant No. NSF DMR 2003748 and NSF DMR 1506219. The team also gratefully acknowledges support for undergraduate researchers from the Nelson Family Faculty Excellence Award.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available in Zenodo at doi:10.5281/zenodo.5748969.

**Acknowledgments:** Hataichanok (Mam) Scherman, Histone Source at Colorado State University for the expression and purification of the RepE54 transcription factor and the purification of E2F8 transcription factor. Mark Stenglein and Mikaela Samsel at the Next Generation Sequencing Facility at Colorado State University for TapeStation analysis. Jay Nix at the ALS Beamline 4.2.2 for extensive support of the XRD data collection. The Taipale Lab for their CC2 protein plasmid donation. Thaddaus Huber for cloning expertise and PSB3 plasmid.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A New L-Proline Amide Hydrolase with Potential Application within the Amidase Process**

**Sergio Martinez-Rodríguez 1,2,\*, Rafael Contreras-Montoya <sup>3</sup> , Jesús M. Torres <sup>1</sup> , Luis Álvarez de Cienfuegos <sup>3</sup> and Jose Antonio Gavira 2,\***


**Abstract:** L-proline amide hydrolase (PAH, EC 3.5.1.101) is a barely described enzyme belonging to the peptidase S33 family, and is highly similar to prolyl aminopeptidases (PAP, EC. 3.4.11.5). Besides its *S*-stereoselective character towards piperidine-based carboxamides, this enzyme also hydrolyses different L-amino acid amides, turning it into a potential biocatalyst within the Amidase Process. In this work, we report the characterization of L-proline amide hydrolase from *Pseudomonas syringae* (PsyPAH) together with the first X-ray structure for this class of L-amino acid amidases. Recombinant PsyPAH showed optimal conditions at pH 7.0 and 35 °C, with an apparent thermal melting temperature of 46 °C. The enzyme behaved as a monomer at the optimal pH. The L-enantioselective hydrolytic activity towards different canonical and non-canonical amino-acid amides was confirmed. Structural analysis suggests key residues in the enzymatic activity.

**Keywords:** amidase; amino acid; amidase process; proline; aminopeptidase; S33 family

#### **1. Introduction**

L-proline amide hydrolase (PAH, EC 3.5.1.101) is a barely described enzyme which, up to now, has only been characterized in some detail in *Pseudomonas azotoformans* IAM 1603 (LaaAPa) [1,2]. PAH belongs to the serine peptidase S33 family, together with prolyl aminopeptidases (PAP, EC. 3.4.11.5) or prolinases (Pro-Xaa dipeptidase, 3.4.13.18). PAH was suggested as a different member of this family since LaaAPa showed a different substrate scope than PAPs [2]. On the other hand, the enzyme proved enantioselective towards different piperidine-based carboxamides, L-prolinamide, and other different amino acid amides (Figure 1A). Since LaaAPa was applied in the context of the so-called "Amidase Process" for the industrial production of optically pure amino acids, its different substrate scope prompted its nomenclature also as L-amino acid amidase [2–4]. This biotechnological process consists of the dynamic kinetic resolution of amino acid amide mixtures using an α-amino-ε-caprolactam racemase together with a stereoselective "D- or L-amidase" (Figure 1B, [3]).

As for other enzymes with biotechnological interest, the general "amidase" nomenclature might confuse neophyte and experienced researchers, since it includes different unrelated enzymes. The enzymatic resolution of the two isomers of proline amide (D and L) was already achieved using an "amidase" from hog kidney more than half a century ago [5]; this enzyme also proved useful for the resolution of diverse amino acid amides ([6] and references therein).

**Citation:** Martinez-Rodríguez, S.; Contreras-Montoya, R.; Torres, J.M.; de Cienfuegos, L.Á.; Gavira, J.A. A New L-Proline Amide Hydrolase with Potential Application within the Amidase Process. *Crystals* **2022**, *12*, 18. https://doi.org/10.3390/cryst12010018

Academic Editors: Kyeong Kyu Kim and Dinadayalane Tandabany

Received: 25 November 2021 Accepted: 21 December 2021 Published: 23 December 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Figure 1.** (**A**) Substrates recognized by L-proline amide hydrolase. (**B**) General scheme of the "Amidase Process". The full line represents the "L-system", whereas the dashed line represents the "D-system". ACLR: α-amino-ε-caprolactam racemase.

A reduced number of "L-amidases" have been studied in some detail, such as those from *Pseudomonas azotoformans* [1], *Ochrobactrum anthropi* [7,8], and *Brevundimonas diminuta* [2]. Enzymes from *Pseudomonas putida* [9] or *Mycobacterium neoaurum* ATCC 25795 [10], and different aminopeptidases (EC 3.4.11.X) and amidases (EC 3.5.1.4) have also been shown to hydrolyze amino acid amides with good enantioselectivity [11,12]. Some of the latter enzymes have been applied at the industrial level [9,11,12]. An isolated example of the hydrolysis of amino acid esters and amides by acylase I has been reported, despite this enzyme being mainly used for the hydrolysis of N-acetyl-amino acids [13]. Peptide amidases from *Citrus sinensis* and *Stenotrophomonas maltophilia* also allowed enzymatic resolution of racemic N-acetyl amino acid amides, yielding N-acetyl-L-amino acids with optical purity ≥ 99% [14]. As with the "amidase" nomenclature, PAPs present a similar scenario: whereas many of the reported PAPs show a clear preference for proline residues ([15] and references therein), not all of them are obligate "proline aminopeptidases". Some members of this family have shown cleaving activity with different amino acid derivatives to different extents [15–17].

To better understand enzymes with L-amidase activity and potential industrial interest, we have embarked on the characterization of a putative PAP from *Pseudomonas syringae* (PsyPAH). This enzyme is highly similar to LaaAPa, which is the only PAH characterized showing L-amino acid amidase activity [1]. On the other hand, the closest structural homolog of PsyPAH to date is the amidohydrolase VinJ from *Streptomyces halstedii* (PDB 3WMR, 55% seq id.), which has a highly different substrate scope [18]. In this work, we provide biochemical and biophysical characterization, together with the first X-ray structure of a PAH enzyme (PAP-like) with experimentally proven "L-amidase" activity. Going a step further, and based on sequence and structural information, we have categorized the different L-amidase enzymes in the literature in an attempt to facilitate comprehension of their potential biotechnological application.

#### **2. Materials and Methods**

The different amino acid amides and *p*-nitroanilide derivatives used for activity measurement of PsyPAH were purchased from VWR (VWR International Eurolab S.L, Barcelona, Spain), TCI chemicals, Alfa Aesar, or Acros (Cymit Quimica, Barcelona, Spain). Other amino acid amides were synthesized as previously described [19] (see supporting information). Other chemicals were from Sigma Aldrich (Sigma-Aldrich, St. Louis, MO, USA).

#### *2.1. Cloning, Overexpression, and Purification of PsyPAH*

A DNA sequence corresponding to the putative L-amidase from *Pseudomonas syringae* pv. *tomato* (UniProt A0A0Q0CYJ4) was synthesized and cloned into pET-22b (NZYtech, Lisboa, Portugal) for over-expression in *Escherichia coli*. The resulting construct allows the overproduction of PsyPAH fused to a C-terminal His6-tag. *E. coli* BL21 (DE3) (Agilent, Madrid, Spain) was transformed with this plasmid and grown on solid LB medium supplemented with 100 µg·mL<sup>−1</sup> of ampicillin. A single colony was transferred into 10 mL of LB medium with ampicillin at the above-mentioned concentration and incubated overnight at 37 °C. Then, 500 mL of LB supplemented with ampicillin was inoculated with 5 mL of the overnight culture. After 3–4 h of incubation at 37 °C with vigorous shaking, the OD<sub>600</sub> of the resulting culture was 0.6–0.8. To induce the over-expression of PsyPAH, isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to a final concentration of 0.2 mM and the culture was kept at 16 °C overnight. Cells were collected by centrifugation (4000 rpm, 4 °C, 20 min) and subsequently frozen at −80 °C until use.

The pellet corresponding to 1 L was resuspended in 10 mL of 20 mM sodium phosphate, 20 mM of imidazole, and 300 mM of NaCl pH 8.0 (washing buffer, WB). Cells were lysed on ice via sonication with a Branson sonicator (6 periods of 60 s (1 s on, 1 s off), amplitude 25%) and then centrifuged (13,000 rpm, 10 min, RT). The resulting supernatant was applied to a HisPur Ni-NTA column (1 mL, Thermo Fisher, Waltham, MA, USA) previously equilibrated with 10 mL of WB. The column was then washed with 12 mL of WB and protein was eluted with 3 mL of 20 mM of sodium phosphate, 300 mM of imidazole, and 300 mM of NaCl pH 8.0. Subsequently, protein samples were loaded onto a Superdex 200 16/60 XK gel-filtration column (GE Healthcare, Boston, MA, USA) in an AKTA-prime FPLC system (GE Healthcare) using 20 mM of Hepes pH 7.0 as a running buffer. The peak corresponding to PsyPAH was concentrated up to 20 mg·mL<sup>−1</sup> using 30 kDa concentrators (Amicon Ultra-Millipore) and dialyzed in 20 mM of Hepes pH 7.0 (4 °C). Protein was frozen at −80 °C until use. Protein purity was verified by SDS-PAGE. Protein concentrations were determined from the absorbance at 280 nm (ε = 49,390 M<sup>−1</sup>·cm<sup>−1</sup>).
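The concentration determination from the absorbance at 280 nm follows the Beer–Lambert law. The following is a minimal sketch of that arithmetic, assuming a 1 cm path length and using the theoretical monomer mass (36.7 kDa) quoted in the Results to convert molar to mass concentration; the function name, the dilution factor, and the example reading are hypothetical and are not taken from the original protocol.

```python
# Beer-Lambert estimate of protein concentration from A280 (illustrative sketch).
EPSILON_280 = 49_390.0   # M^-1 cm^-1, extinction coefficient quoted in the text
MW_MONOMER = 36_700.0    # g/mol, theoretical monomer mass quoted in the Results


def protein_conc_mg_per_ml(a280, dilution=1.0, path_cm=1.0):
    """Return the stock concentration in mg/mL for a reading on a diluted sample."""
    molar = a280 / (EPSILON_280 * path_cm)   # mol/L in the cuvette
    return molar * MW_MONOMER * dilution      # g/L is numerically equal to mg/mL


if __name__ == "__main__":
    # Hypothetical reading: A280 = 0.673 measured on a 40-fold dilution.
    print(f"{protein_conc_mg_per_ml(0.673, dilution=40):.1f} mg/mL")
```

A reading in the linear range of the spectrophotometer (roughly 0.1 to 1.0) is assumed, which is why the example works on a diluted sample rather than on the concentrated stock.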

#### *2.2. Activity Measurement*

Different amino acid amides (10 mM) were used as possible substrates for PsyPAH (amide derivatives of Gly, L-Pro and D-Pro, L-Trp, L-Phe, L-*tert*-Leu, L-Ala, L-norVal, L-Met, L-homoPhe, L-Ser, L-norLeu, L-Leu, L-2-ABA, and L-Val). The phenate method was used to measure ammonia formation [20], with slight modifications. Reaction volumes of 200 µL and a final enzyme concentration of 0.1–0.2 mg·mL<sup>−1</sup> (pH 7.0, 35 °C) were used. After 5–15 min, the reaction was stopped by mixing with 540 µL of freshly prepared phenate solution. A total of 280 µL of 2.5% sodium hypochlorite and 140 µL of 25 µM MnCl<sub>2</sub> were then added, followed by incubation at 70 °C for 40 min. Absorbance was measured at 625 nm. (NH<sub>4</sub>)<sub>2</sub>SO<sub>4</sub> standards were used for all the assays. Three replicates were conducted for each experiment.
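Converting the absorbance at 625 nm into an ammonium concentration requires a standard curve. The sketch below fits a straight line to a set of standards by ordinary least squares and inverts it for an unknown sample. The standard concentrations and absorbances are invented for illustration only (they are not data from this work), and the helper names are hypothetical.

```python
# Linear standard curve for a colorimetric ammonium assay (illustrative sketch).

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ammonium_mM(a625, slope, intercept):
    """Invert the standard curve to get the ammonium concentration in mM."""
    return (a625 - intercept) / slope


# Hypothetical standards, expressed as ammonium concentration (mM).
# Note that one (NH4)2SO4 formula unit supplies two ammonium ions.
STD_MM = [0.0, 0.25, 0.5, 1.0, 2.0]
STD_A625 = [0.020, 0.145, 0.270, 0.520, 1.020]

if __name__ == "__main__":
    slope, intercept = fit_line(STD_MM, STD_A625)
    print(f"sample at A625 = 0.40 -> {ammonium_mM(0.40, slope, intercept):.2f} mM")
```

Dividing the resulting concentration by the reaction time and the enzyme concentration would give a specific activity, provided the reading falls within the range covered by the standards.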

Kinetic parameters for L-prolinamide and L-leucinamide were calculated with substrate concentrations ranging from 0.1 to 15 mM, using 100 mM stock solutions (in 100 mM of phosphate buffer pH 7.0). Reactions were carried out at 35 °C and pH 7.0 (using 20–400 µg·mL<sup>−1</sup> PsyPAH concentrations depending on the substrate). After 5–15 min (pre-experiments suggested this reaction time as appropriate for V<sub>0</sub> calculation), ammonium formation was measured with the phenate method (see above). The activity with *p*-nitroanilide derivatives was measured following sample absorption at 405 nm. K<sub>m</sub> and k<sub>cat</sub> were measured using L-Leu and L-Pro *p*-nitroanilide concentrations ranging from 0.1 to 10 mM, using 500 mM stock solutions (in acetonitrile). Reactions were carried out at 35 °C and pH 7.0 (using 8–80 ng·mL<sup>−1</sup> PsyPAH concentrations depending on the substrate, with a constant 2% concentration of acetonitrile in the reaction). A calibration was performed and plotted with *p*-nitroaniline in the same buffer used for activity determination (experimental ε = 9265 cm<sup>−1</sup>·M<sup>−1</sup>, similar to that reported previously [21]). Three replicates were conducted for each experiment.
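The text does not state how K<sub>m</sub> and k<sub>cat</sub> were extracted from the initial rates. As one possible illustration, the sketch below estimates V<sub>max</sub> and K<sub>m</sub> with the Hanes–Woolf linearization of the Michaelis–Menten equation, [S]/v = [S]/V<sub>max</sub> + K<sub>m</sub>/V<sub>max</sub>, chosen here only because it needs no external libraries; nonlinear regression is generally preferred for real data. The substrate concentrations mirror the range quoted above, while the rates are synthetic values generated from assumed parameters.

```python
# Hanes-Woolf estimate of Michaelis-Menten parameters (illustrative sketch).

def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(s_values, v_values):
    """Fit [S]/v = [S]/Vmax + Km/Vmax by least squares; return (vmax, km)."""
    y = [s / v for s, v in zip(s_values, v_values)]
    n = len(s_values)
    mean_s = sum(s_values) / n
    mean_y = sum(y) / n
    sxx = sum((s - mean_s) ** 2 for s in s_values)
    sxy = sum((s - mean_s) * (yi - mean_y) for s, yi in zip(s_values, y))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_s  # equals Km / Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax


def kcat(vmax_mM_per_s, enzyme_mM):
    """Turnover number in s^-1 from Vmax and the total enzyme concentration."""
    return vmax_mM_per_s / enzyme_mM


# Synthetic, noise-free rates from assumed Vmax = 2.0 and Km = 3.0 (arbitrary units).
S_MM = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 15.0]
RATES = [michaelis_menten(s, 2.0, 3.0) for s in S_MM]

if __name__ == "__main__":
    vmax, km = hanes_woolf_fit(S_MM, RATES)
    print(f"Vmax = {vmax:.3f}, Km = {km:.3f} mM")
```

When the highest accessible substrate concentration lies well below K<sub>m</sub>, as reported later for L-Leu-amide, only the ratio V<sub>max</sub>/K<sub>m</sub> is well determined, since the rate is then approximately linear in [S].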

#### *2.3. Size Exclusion Chromatography (SEC-FPLC)*

PsyPAH was loaded onto a Tricorn Superdex 200 gel-filtration column (GE Healthcare) using an AKTA-prime FPLC system (GE Healthcare), with 20 mM of sodium phosphate pH 7.0 as a running buffer. BSA (66 kDa), ovalbumin (43 kDa), carbonic anhydrase (29 kDa), and RNase A (13.7 kDa) were used as standards for molecular mass determination (Cytiva Gel Filtration Calibration Kits).
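Molecular mass estimation by SEC usually relies on a linear relation between the logarithm of the molecular mass of the standards and their elution volume. The sketch below illustrates that calibration with the four standards listed above. The elution volumes are hypothetical placeholders (the actual values are not given in the text), and the function names are likewise assumptions.

```python
# SEC calibration: log10(mass) versus elution volume (illustrative sketch).
import math

# Standard masses (kDa) from the text; elution volumes (mL) are hypothetical.
STANDARDS_KDA = [66.0, 43.0, 29.0, 13.7]
ELUTION_ML = [13.9, 14.8, 15.6, 17.1]


def fit_calibration(volumes, masses_kda):
    """Least-squares fit of log10(mass) = intercept + slope * volume."""
    y = [math.log10(m) for m in masses_kda]
    n = len(volumes)
    mean_v = sum(volumes) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_v) ** 2 for v in volumes)
    sxy = sum((v - mean_v) * (yi - mean_y) for v, yi in zip(volumes, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_v


def estimate_mass_kda(volume_ml, slope, intercept):
    """Apparent molecular mass (kDa) of a species eluting at volume_ml."""
    return 10 ** (intercept + slope * volume_ml)


if __name__ == "__main__":
    slope, intercept = fit_calibration(ELUTION_ML, STANDARDS_KDA)
    print(f"apparent mass at 15.3 mL: {estimate_mass_kda(15.3, slope, intercept):.1f} kDa")
```

Because the calibration reflects hydrodynamic size rather than mass, a non-globular or unusually compact protein can deviate from its theoretical mass by several kDa.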

#### *2.4. Dynamic Light Scattering*

DLS measurements were performed in a Zetasizer Nano instrument (Malvern Instruments Ltd., Malvern, UK). Experiments were performed with PsyPAH (1.3 mg·mL<sup>−1</sup>) in 20 mM of sodium phosphate pH 7.0 at 25 °C. Samples were centrifuged for 10 min at 13,000 rpm before measurement. The PsyPAH sample was measured 3 times with 10 runs each (in automatic mode for time selection).
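DLS instruments typically report a hydrodynamic radius by converting the measured translational diffusion coefficient through the Stokes–Einstein relation, R<sub>h</sub> = k<sub>B</sub>T/(6πηD). The sketch below shows that conversion in both directions. The viscosity of 0.89 mPa·s (pure water at 25 °C) is an assumption, since the value used for the buffer is not stated in the text, and the function names are hypothetical.

```python
# Stokes-Einstein conversion between diffusion coefficient and hydrodynamic radius.
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def diffusion_coefficient(rh_m, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Translational diffusion coefficient (m^2/s) of a sphere of radius rh_m."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * rh_m)


def hydrodynamic_radius(d_m2_s, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Hydrodynamic radius (m) from a diffusion coefficient (m^2/s)."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * d_m2_s)


if __name__ == "__main__":
    d = diffusion_coefficient(2.5e-9)  # a 2.5 nm radius at 25 degrees C
    print(f"D = {d:.2e} m^2/s")
```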

#### *2.5. Thermal Shift Assays*

Thermal shift assays were carried out using a QuantStudio 3 qPCR (Thermo Fisher). A concentrated PsyPAH sample was 10-fold diluted directly into different 100-mM buffers (sodium acetate, pHs 4.0–5.6; sodium phosphate, pHs 6.0–8.0; tetraborate HCl/NaOH, pHs 8.0–10.0) to a final concentration of 1.4 mg·mL<sup>−1</sup>, and kept at 4 °C overnight. Aqueous SYPRO (50×) was added to a final 10× concentration. Thermal denaturation was monitored by measuring the changes in fluorescence resulting from SYPRO binding. Denaturation data were collected from 25 to 99 °C at a scan rate of 3 °C·min<sup>−1</sup>. Three replicates were conducted in all cases. Despite the irreversibility of the thermal unfolding, apparent T<sub>m</sub> values were calculated using a Boltzmann fit to the raw data, with Protein Thermal Shift software v1.3 (Thermo Fisher).
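For reference, a Boltzmann fit of a melting curve is commonly written as a two-state sigmoid. The form below is the usual textbook parameterization and is given only as an assumption, since the exact equation implemented in the software is not stated in the text. Here $F_{\min}$ and $F_{\max}$ are the pre- and post-transition fluorescence plateaus, $a$ is a slope factor, and $T_m^{\mathrm{app}}$ is the temperature at which the fluorescence is halfway between the two plateaus:

$$
F(T) = F_{\min} + \frac{F_{\max} - F_{\min}}{1 + \exp\left(\dfrac{T_m^{\mathrm{app}} - T}{a}\right)}
$$

Setting $T = T_m^{\mathrm{app}}$ gives $F = (F_{\min} + F_{\max})/2$, which is why the fitted midpoint is reported as the apparent melting temperature.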

#### *2.6. Crystallization*

Freshly purified recombinant His6-tagged PsyPAH (20 mg·mL<sup>−1</sup>, 20 mM of Hepes pH 7.0) was used to set up initial crystallization screenings with the HRCS I & II (Hampton Research, Palo Alto, CA, USA). The hanging drop configuration of the vapor diffusion method with a 1:1 ratio of the reservoir and protein solution was used. Crystallization experiments were kept at 20 °C in an incubator. Crystals were obtained using 0.2 M of sodium acetate trihydrate, 0.1 M of sodium cacodylate trihydrate pH 6.5, and 30% *w*/*v* polyethylene glycol 8000 after 48 h.

#### *2.7. Data Collection and Refinement*

Target crystals were identified under a microscope using polarized light, separated with a microtool, fished out of the drop with a loop, and transferred to a 1-µL drop of mother solution containing 20% (*v*/*v*) glycerol as cryo-protectant. After soaking for less than 60 s, crystals were flash-cooled in liquid nitrogen and stored until data collection.

X-ray diffraction data were collected at ID30B (ESRF, Grenoble, France). Diffraction data were indexed and integrated using XDS [22] and scaled with AIMLESS from the CCP4 suite [23]. The crystal structure of PsyPAH was determined by the molecular replacement method with PHASER [24] using the structure of the amidohydrolase VinJ from *Streptomyces halstedii* (PDB ID: 3WMR) [18] as the search model. Refinement was done with PHENIX [25] and Refmac [26] with cycles of manual rebuilding using COOT [27] and finalized using several cycles of refinement applying TLS parameterization [28]. The final refined model was checked with Molprobity [29]. Data collection and refinement statistics are summarized in Table 1.


**Table 1.** Data collection and refinement statistics. (Statistics for the highest-resolution shell are shown in parentheses.)

#### *2.8. Sequence and Structure Analysis*

PDBsum was used for global structure analysis [30]. Clustal Omega [31] and ESPript [32] were used for multiple sequence alignment and phylogenetic analysis. The iTOL server was used for tree representation [33]. The Dali server [34] was used to search for other members of the peptidase S33 superfamily with a similar fold to that presented by the PsyPAH structure. Graphical representation of 3D structural models was conducted with PyMOL [35].

#### **3. Results and Discussion**

#### *3.1. PsyPAH Characterization*

Recombinant C-His6-tagged PsyPAH was purified using nickel affinity chromatography and SEC-FPLC (Size-exclusion chromatography-Fast Protein Liquid Chromatography) (>95% purity, yield of 10 mg per L of culture). SEC-FPLC showed an estimated molecular mass of 33 ± 2 kDa in phosphate buffer pH 7.0 (Figure 2A), slightly lower than the theoretical molecular mass of the monomer (36.7 kDa). An estimated R<sub>h</sub> of 2.5 ± 0.40 nm was obtained for PsyPAH by DLS (20 mM phosphate pH 7.0). This value is slightly higher than that reported for carbonic anhydrase (29 kDa, 2.37 nm [36]), and agrees with the value obtained by SEC-FPLC. Thermal Shift Assays (TSA) showed single thermal transitions in the pH range from 6.0 to 11.0 as a result of SYPRO binding (Figure 2B, inset).

Apparent thermal midpoints (T<sub>m</sub><sup>app</sup>) could be calculated from Boltzmann fitting, with values ranging from 35.8 to 46.0 °C in that pH interval (Figure 2B). The maximum T<sub>m</sub><sup>app</sup> value coincided with the optimum pH for activity of the enzyme (pH 7.0; Figure 2C). The optimal reaction temperature was 35 °C (Figure 2D), whereas enzymatic activity was lost at 50 °C.

**Figure 2.** (**A**) SEC-FPLC of PsyPAH (black continuous line) in phosphate buffer 20 mM pH 7.0. Protein standards represented in dashed lines are BSA (132 and 66 kDa), carbonic anhydrase (29 kDa), and RNAse A (13.7 kDa). (**B**) Apparent Tms calculated for PsyPAH at different pHs. The inset corresponds to the TSA experiment of PsyPAH in phosphate buffer at pH 7.0. Relative activity of PsyPAH as a function of pH (**C**) and Temperature (**D**).

Both the optimal temperature and pH were lower than those reported previously for LaaAPa [2]. No activity loss was observed after incubation of PsyPAH at 30 °C for 14 h and it also retained over 75% of its activity after incubation at 35 °C for the same period of time. PsyPAH stored at −80 °C maintained full activity for more than two years. Biochemical parameters of PsyPAH were assayed with the amide and *p*-nitroanilide derivatives of L-Pro and L-Leu (Table 2), showing the expected L-amidase activity of the enzyme. Whilst we could not determine the K<sub>m</sub> values for two of the substrates used due to the limit of detection of the method (L-Pro-*p*-nitroanilide) and the solubility of the substrate (L-Leu-amide), visual inspection of the kinetic profiles (Figure S1) supports that the K<sub>m</sub> for the amide derivatives of L-Pro and L-Leu is at least one order of magnitude higher than for the *p*-nitroanilide derivative (Table 2 and Figure S1). These results suggest that the presence of the aromatic aniline moiety of the substrate improves PsyPAH-binding, which might reflect a better accommodation of these substrates into the active site.

**Table 2.** Kinetic parameters of PsyPAH with L-Pro and L-Leu amide and *p*-nitroanilide derivatives (pH 7.0, 35 °C). \* Could not be determined due to detection limit of the determination method. \*\* Could not be determined due to the solubility of this substrate. \*\*\* Obtained from the linear part of the kinetic plot (see Figure S1).


We have also qualitatively tested the activity of PsyPAH towards different canonical and non-canonical L-amino acid amides. PsyPAH was able to hydrolyze glycinamide, L-alaninamide, L-phenylalaninamide, L-methioninamide, L-serinamide, L-valinamide, L-tryptophanamide, L-norvalinamide, L-homophenylalaninamide, L-norleucinamide, and L-2-aminobutyramide (data not shown). No activity was detected towards D-prolinamide or L-*tert*-leucinamide.

#### *3.2. PsyPAH Sequence Analysis*

Since E.C. classification is based solely on the enzymatic reaction, different enzymes catalyzing the same reaction can share the same nomenclature (e.g., L-amidases), even when their sequences are highly different. This is a recurrent issue in the biotechnological field, where it is common to discover novel enzymes after screening methods for a desired specific activity, from which they are named. The general "amidase" nomenclature used in the context of the "Amidase Process" might thus initially confuse neophyte researchers in this field, since many different enzymes classified under E.C. 3.5.1 are named as "amidases" [12,37]. Previous studies on L-amidases of biotechnological interest already highlighted enzymes belonging to different protein families [2,12].

Phylogenetic analysis of the primary sequence of enzymes with L-amidase activity shows four different enzyme groups (Table 3 and Figure S2). The broad-spectrum amidase from *Ochrobactrum anthropi* [7] forms an "acetamidase/formamidase clan" (Pfam PF03069), together with the enzymes from *Enterobacter cloacae* and *Thermus* sp. (Table 3). The industrially-used L-amidase from *Pseudomonas putida* (a leucine aminopeptidase [9]) and LaaABd form a separate "aminopeptidase clan", belonging to the peptidase M17 family (Pfam PF00883). On the other hand, the leucyl-aminopeptidase from *Aeromonas proteolytica* [38] constitutes an isolated clan, which belongs to the peptidase M28 family (Pfam PF04389, Table 3). Finally, LaaAPa and PsyPAH are grouped into a "peptidase S33 clan".

Thus, from a biotechnological point of view, it is important to bear in mind that different "L-amidases" belonging to different protein families exist when dealing with the so-called "Amidase Process". Besides their application on the production of amino acids, some of these L-amidases have also found other biotechnological applications [37,39,40], further increasing their potential and economic interest.

**Table 3.** Different enzymes with L-enantioselective amidase activity described in the literature with potential application in the production of amino acids. \* It is not clear from the literature whether the hog kidney amidase used in the 50s for the resolution of amino acids [5,6] might correspond to a leucyl aminopeptidase or a PAP, or even if they are the same enzyme [41,42].


#### *3.3. Overall Structure of PsyPAH*

PsyPAH crystallized in the common space group P2<sub>1</sub>2<sub>1</sub>2<sub>1</sub> and presents a single polypeptide chain in the asymmetric unit, consistent with the monomeric state observed in solution. As ascertained from primary sequence analysis, PsyPAH belongs to the hugely diverse α/β hydrolase superfamily and more specifically to the serine peptidase family S33 (clan SC) [47]. The α/β hydrolase fold family of enzymes is one of the largest groups of structurally related enzymes with diverse catalytic functions. It contains several enzymes found to have a second promiscuous function on alternative substrates [48,49]. Like other members of this family, PsyPAH is constituted by two different domains, namely the catalytic domain (residues 1–141 and 231–end) and the cap domain (residues 142–230; Figure 3). The catalytic domain is formed by an αβα sandwich containing the conserved catalytic triad motif of the family (Ser113, Asp253, His280), whereas the cap domain is constituted exclusively by α-helices. A DALI search shows more than 140 structures with a Z-score over 20 when using the PDB90 subset database. However, only three structures present a sequence identity over 25% with PsyPAH (Table S1): the amidohydrolase VinJ from *Streptomyces halstedii* (PDB 3WMR, 55% seq. id. [18]), a putative uncharacterized PAP from *Mycobacterium smegmatis* (MysPAP, PDB 3NWO, 50% seq. id.), and the Tricorn protease-interacting aminopeptidase F1 from *Thermoplasma acidophilum* (APF1, PDB 1MU0, 34% seq. id. [50]), with RMSD values of 1.0, 2.2, and 1.7 Å, respectively. Other peptidase S33 family members, such as epoxide hydrolases and esterases, appear with sequence identities below 21% (Table S1). On the other hand, other characterized PAPs included in the ESTHER database [51] whose structures are known present a lower structural similarity with PsyPAH. This is the case of *Xanthomonas campestris* PAP (XcPAP, PDB 1AZW, [52]) or *Serratia marcescens* PAP (SmPAP, PDB 1QTR, [53]).
Other known PAP family structures are those from the PAP-related protein TTHA1809 from *Thermus thermophilus* (PDB 2YYS, [54]) and putative PAP from yeast *Glaciozyma antarctica* (PDB 5YHP, unpublished results). However, no biochemical data is available for these enzymes.

**Figure 3.** Overall fold of PsyPAH. The catalytic domain is shown in cyan/purple, whereas the cap domain appears in red. The catalytic triad (Ser113, Asp253, His280) is shown in stick mode to account for its position in the catalytic domain.

#### *3.4. Differences on the Substrate Binding Groove (SBG) Seem to Account for the Substrate Scope of PsyPAH*

The substrate specificity and function of prolyl peptidases were proposed early on to be determined by the cap domain [53], where the substrate first needs to bind before reaching the catalytic center to be hydrolyzed. The specificity of the exopeptidase activity of SmPAP was thus proposed to originate from steric impediments of this smaller domain, which would block the entrance of extra residues at the N-terminal proline of the substrate [53]. Differences in the substrate binding entrance were already highlighted for APF1, XcPAP, and SmPAP, with the latter showing larger openings to the active site [50]. An overview of the PsyPAH homolog structures reveals that whereas the catalytic domains are spatially conserved, the cap domain presents clear positional differences (Figure 4). Interestingly, the largest differences in the cap domain are observed when comparing PsyPAH with the two characterized PAPs, XcPAP and SmPAP (Figure 4A), while better fits are observed with VinJ, APF1, and the uncharacterized MysPAP (Figure 4B). The substrate binding groove of APF1 (SBG, also known as E1 site [50,55]) was experimentally located between two helices of the cap domain (e.g., PDB 1XRP, Figure 5A); the spatial disposition of the SBGs within the cap domains of PsyPAH, VinJ, and MysPAP is conserved (Figures 5 and S3), revealing clear differences with respect to the other PAP structures: the SBG of XcPAP (and SmPAP) is in a completely different position to the other enzymes, in between the cap and the catalytic domains (Figure 4A, [53]). The different position of the SBG makes the catalytic center more accessible to the solvent in XcPAP and SmPAP, supporting the acceptance of long peptides. However, in APF1, the N-terminal peptide needs to enter the catalytic center through a narrow hollow, where it can be processed [50,55]. This "smaller" SBG supports the acceptance of shorter peptides when compared with XcPAP and SmPAP.
This should also be the case for PsyPAH as observed by the SBG configuration (Figures 4, 5 and S3).

**Figure 4.** (**A**) Superposition of PsyPAH with "real" PAPs belonging to *Xanthomonas campestris* (XcPAP, PDB 1AZW) and from *Serratia marcescens* (SmPAP, PDB 1QTR). (**B**) Superposition of PsyPAH, amidohydrolase VinJ (PDB 3WMR), APF1 from *Thermoplasma acidophilum* (PDB 1MTZ), and putative uncharacterized PAP from *Mycobacterium smegmatis* (MysPAP, PDB 3NWO). (**C**) Sequence alignment of PsyPAH, VinJ, APF1, and MysPAP shown in panel B. Residues comprising the SBGs (red circles) or catalytic triad (asterisks)/catalytic cleft (black circles) are highlighted.

**Figure 5.** (**A**) Surface representation of APF1 showing the binding site of the tetrapeptide PLGG (E213Q mutant, PDB 1XRP). Red: Cap domain; orange: Catalytic domain. (**B**) Surface representation of PsyPAH showing the putative substrate binding site. Red: Cap domain; orange: Catalytic domain. The orientation is exactly the same as in (**A**). (**C**) Surface representation of XcPAP showing the putative substrate binding site. Red: Cap domain; orange: Catalytic domain. The orientation has been rotated approximately 90° with respect to (**A**).

The closest structural homolog of PsyPAH known to date is VinJ [18], but the only exhaustive analysis of the binding mode among homolog structures has been carried out with APF1 [50] (Table S2). Comparison of residues comprising different regions of APF1 (E1, S1, S1′ [50,55]) with those of PsyPAH, VinJ, and MysPAP reveals totally conserved residues, despite a low overall conservation (Table S3). The different substrate scope of VinJ compared to APF1 was explained by the presence of a unique polyketide binding tunnel (which partly corresponds to the E1 site, Tables S3 and S4) and a smaller S1 site in VinJ [18]. This unique feature of VinJ is necessary for fitting the polyketide moiety on the surface of the enzyme (and other VinJ-proteins used for the synthesis of β-amino acid containing macrolactams [18]). Comparison of this hydrophobic tunnel with PsyPAH, APF1, and MysPAP confirms the unique character of this binding site in VinJ, which shows an overall higher hydrophobic character (Figure S4, Table S4). Specifically, residues F176VinJ and Y205VinJ were hypothesized to provide additional hydrophobic interactions with the polyketide chain of the substrate [18]. However, counterpart residues in PsyPAH, APF1, and MysPAP are overall more polar (Table S4). In this sense, E200APF1 (counterpart of Y205VinJ) has been experimentally proven to be responsible for peptide docking [55] (see below). These structural differences suggest that PsyPAH is not a VinJ-type protein, and also support a binding mode and catalytic mechanism closer to that reported for APF1 (Table S4).

Different proline-containing liganded structures of APF1 (Table S2; PDBs 1XQY, 1XRP, and 1XRR [55]) show Y178APF1 and E200APF1 as responsible for Pro-docking at the E1 site (Figure 6). The counterpart residues Y186PsyPAH and D208PsyPAH plausibly have a key role in substrate positioning in PsyPAH; in fact, D208PsyPAH shows alternative orientations, suggesting a dynamic character for substrate binding. A lower volume of the PsyPAH SBG is observed when compared to APF1 (Figure 6), arising from (i) displacement of the P176-V190PsyPAH helix towards the catalytic domain (originating from the "closure" of the frontside of the E1 site by L179-D208PsyPAH) and (ii) the presence of longer or more voluminous side chains at the backside of the E1 site (Figure 6).

**Figure 6.** Superposition of PsyPAH (white tones) and APF1 bound to PLGG peptide (PDB 1XQY, main chain in blue tones, peptide in sky blue tones). The numbering of the enzymes is that which appears in the corresponding PDBs (PsyPAH, black numbering; APF1, grey numbering).

R183PsyPAH generates a stacking interaction with W196PsyPAH (W188APF1), closing the backside of the SBG and impeding the accommodation of longer peptides. Residues L179PsyPAH (Q171APF1) and R183PsyPAH (N175APF1) would also hamper the presence of similar peptide ligands in PsyPAH (Figure 6). Finally, F204PsyPAH (L196APF1) and I207PsyPAH (A199APF1) reduce the SBG cleft volume, producing a higher hydrophobic character of this site compared to APF1, but lower than that presented by VinJ [18]. In fact, the hydrophobicity of this site would partly explain why PsyPAH can hydrolyze different aliphatic/aromatic amino acid amides, or even why the *p*-nitroanilide derivatives were hydrolyzed more efficiently than the amide derivatives (Table 2); the environment generated by Y186PsyPAH, W196PsyPAH, and F204PsyPAH seems highly appropriate for the accommodation of an aromatic moiety. In this sense, it might be interesting to ascertain whether other L-amino acid-amide derivatives with more voluminous amide substituents could be a more suitable starting material for their kinetic resolution using this subfamily of L-amidases. These differences in the SBG would support the different substrate specificity of PAHs when compared to PAPs.

#### *3.5. Putative Catalytic Centre of PsyPAH*

S113PsyPAH, D253PsyPAH, and H280PsyPAH comprise the canonical clan SC class catalytic triad of the family (Table S3). The putative catalytic center of PsyPAH is buried in the structure, accessible through the deep hollow contiguous to the E1 site, where the substrate needs to enter to be cleaved. Although we were not able to obtain a ligand-bound structure through soaking experiments, an extra density was found at the S1 site in our crystallographic data, assigned as a phosphate molecule most likely arising from the initial purification buffer. This molecule is at a binding distance of N218PsyPAH and E222PsyPAH (Figures 7A and S5). Superposition with APF1 bound to L-proline reveals that both ligands occupy the same spatial position (Figure 7B). The counterpart residues N209APF1 and E213APF1 were proven to be key for L-Pro-binding [55], together with Y205APF1 and E245APF1. Since both enzymes process Pro-containing substrates and the four residues are conserved (N218PsyPAH, E222PsyPAH, Y214PsyPAH, and E254PsyPAH), a common L-Pro binding mode can be defined (Figure 7B).

**Figure 7.** (**A**) Catalytic cleft of PsyPAH showing the modeled phosphate molecule. (**B**) Superposition of APF1 bound to L-Proline at the S1 site (PDB 1XRR, blue tones) and PsyPAH bound to phosphate (white tones). The numbering corresponds to PsyPAH residues.

Analogously to APF1, our structural model also supports the carbonyl/amide groups from the peptide bonds of G42PsyPAH (G37APF1) and W114PsyPAH (Y106APF1) as the constituents of the oxyanion hole (Figure 7A,B, Table S3). Despite the conservation of these key residues, the rest of the amino acids comprising the S1 and S1′ sites are quite different (Table S3), while providing an overall hydrophobic character to these environments. It is important to highlight that P139PsyPAH (L131APF1) and W145PsyPAH (T137APF1) transform the PsyPAH S1 site into a much smaller and more hydrophobic cleft when compared to APF1, which might account for the substrate scope of PsyPAH toward different non-polar amides (see above).

Finally, further comparison of PsyPAH with SmPAP and XcPAP reveals that whereas the catalytic triad (S113PsyPAH, D253PsyPAH, and H280PsyPAH) is positionally conserved in the catalytic domain, key binding residues of the cap domain (N218PsyPAH, E222PsyPAH, Y214PsyPAH, and E254PsyPAH, also present in APF1, VinJ, and MysPAP, Table S3), are not conserved. These results confirm the discussion about the PAP classification and provide additional clues on the different substrate scope observed among PAPs [56].

#### **4. Conclusions**

In conclusion, we report the first crystal structure of a PAP-like amidase (S33 peptidase clan) at 1.95 Å resolution with potential application within the "Amidase Process", showing a broad substrate specificity toward different canonical and non-canonical amino acid amides. Structural and sequence analyses allow one to decipher different L-amidase subfamilies, a prerequisite to finding enzymes with new or improved properties. Besides, the overall structure of PsyPAH is more similar to VinJ (an S33 peptidase, not a PAP), and structural comparison showed a higher conservation of residues key to the activity of APF1 (an S33 peptidase, not a strict PAP), suggesting a similar catalytic mechanism to that proposed for the latter. The lower volume and hydrophobicity of the S1 and E1 sites seem to account for the activity with smaller L-amino acid amides.

Therefore, our results confirm PsyPAH as a distinct member of the S33 peptidase family, which is not strictly a PAP enzyme. Future work should focus on understanding the substrate specificity of amidases belonging to the S33 peptidase clan through mutational and structural studies. Since the divergence of the cap domain among these enzymes seems critical for substrate specificity, special attention should be paid to classifying them accurately.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/cryst12010018/s1. Synthesis of different amino-acid amides. Figure S1: Kinetic determinations for L-Pro- and L-Leu-amide. Figure S2: Phylogenetic analysis of different enzymes with proven L-amidase activity. Figure S3: Surface representation of different peptidase S33 family members. Figure S4: Comparison of the polyketide substrate binding site of VinJ with that of APF1, MysPAP, and PsyPAH. Figure S5: Omit maps calculated for the PsyPAH structure. Table S1: Homolog structures of PsyPAH obtained with the DALI server. Table S2: Different ligand-bound structures of Tricorn Interacting Factor F1. Table S3: Residues involved in substrate binding and catalysis in the different pockets of APF1. Table S4: Comparison of residues proposed in the polyketide binding tunnel in VinJ.

**Author Contributions:** Conceptualization, S.M.-R. and J.A.G.; methodology, all authors; investigation, all authors; writing—original draft preparation, S.M.-R. and J.A.G.; writing—review and editing, all authors; funding acquisition, S.M.-R., R.C.-M., L.Á.d.C. and J.A.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Spanish Ministry of Science and Innovation/FEDER funds grant PID2020-116261GB-I00/AEI/10.13039/501100011033 (JAG), from the FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades grants P18-FR-3533 (LAC) and P12-FQM-790 (RCM), and from the University of Granada grant PPJI2017-1 (SMR).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Coordinates and structure factors have been deposited at the PDB with accession code 7A6G.

**Acknowledgments:** We are grateful to the European Synchrotron Radiation Facility (ESRF), Grenoble, France, for the provision of time through proposals Mx1938 and Mx2064, and the staff at ID30B beamline for their assistance during data collection. SMR and JTP are also grateful to the Andalusian Regional Government through the Endocrinology and Metabolism Group (CTS-202). We want to thank "Unidad de Excelencia Química aplicada a Biomedicina y Medioambiente" of the University of Granada.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**

