*2.5. Ring Puckering:* Δ*G(α1, α2) Minima for All Compounds*

Quantitative calculation of pyranose ring puckering probabilities is valuable for comparison to high-quality experimental data for pyranoses with multiple thermodynamically accessible puckering conformations, as in the case of IdoA and Ido. However, such calculation by either integration around eABF Δ*G*(α1, α2) minima or summing of re-weighted probabilities for individual snapshots from CMAP-biased simulations entails substantial post-simulation effort following the initial computation of Δ*G*(α1, α2) with eABF. Unlike IdoA and Ido, most of the pyranose monosaccharides considered here are expected to have a single thermodynamically important Δ*G*(α1, α2) minimum that corresponds to either the 4C1 or 1C4 chair pucker conformation. As such, tabulation of Δ*G* minima values in the four quadrants of (α1, α2) space provides a convenient semi-quantitative means to evaluate the behavior of the force field model for those compounds.

Table 2 lists the Δ*G* minimum value in each of the four quadrants of (α1, α2) space for each of the 45 compounds studied. It also correlates each thermodynamically important minimum (i.e., having a value of <3 kcal/mol) with the puckering conformation associated with the value of (α1, α2) for that Δ*G* minimum. This correlation was carried out using computed Cremer-Pople parameters (detailed in "Materials and Methods: Definition of 4C1, 1C4, 2SO, OS2, and other ring puckering conformations") for trajectory snapshots with (α1, α2) values within a 10◦ radius of the location of the Δ*G* minimum.

**Table 2.** Minimum Δ*G*(α1, α2) values in each of the four quadrants of the (α1, α2) reaction coordinate, and the corresponding major ring puckering conformation(s).



**Table 2.** *Cont.*

<sup>1</sup> Data in kcal/mol are averages from triplicate simulations for the minimum Δ*G*(α1, α2) in that quadrant. For example, "**−**,+" indicates the quadrant defined by (α<sup>1</sup> < 0◦, α<sup>2</sup> > 0◦). Standard error of the mean values are in parentheses. <sup>2</sup> Conformations are listed in order of most likely to least likely. Only conformations corresponding to Δ*G*+,−, Δ*G*−,+, Δ*G*+,+, and/or Δ*G*−,<sup>−</sup> < 3 kcal/mol are listed.

As expected, most of the pyranose monosaccharides have a single major pucker conformation: the 4C1 or the 1C4 chair. Aside from IdoA and Ido, which were discussed in the previous section, exceptions to this are Me*β*GlcNAc, *β*GalNAc, Me*β*GalNAc, αXyl, αNeu5Ac, and MeαNeu5Ac.

Me*β*GlcNAc, *β*GalNAc, and Me*β*GalNAc all have their Δ*G*(α1, α2) global minimum corresponding to the 4C1 chair conformation, as expected. They each also have a secondary minimum, but in all three cases the associated Δ*G*(α1, α2) is no less than 2.5 kcal/mol, which translates to a probability of no more than 1.5%. For Me*β*GlcNAc, the secondary minimum arises from skew-boat puckering, whereas for *β*GalNAc and Me*β*GalNAc, the secondary minimum is the 1C4 chair conformation.

αXyl has the 4C1 chair conformation as its global minimum and a secondary Δ*G*(α1, α2) minimum corresponding to the 1C4 chair and with a value of 2.17 kcal/mol. This compares favorably to the value of 1.65 kcal/mol computed with the GROMOS 56a6CARBO force field (Table 1 in [48]), which is also exactly the value from Angyal's scheme for determining ring puckering free energies [70]. We note that data from Angyal's scheme have been used for quantitative comparison in other force field evaluations for a wide variety of pyranoses. It is worth emphasizing here that the Angyal data, though based in experiment, are indirect and were deemed by Angyal to be "calculated interaction energies". Concerning his "calculated interaction energies", Angyal writes, "an approximate calculation serves as a useful guide and can be readily carried out by adding the values of all of the non-bonded interactions occurring in each conformer, plus the value of the anomeric effect [70]". That is, those Angyal data for the 4C1 to 1C4 equilibrium in D-aldopyranoses listed in Table 1 of [70] are calculated as a simple sum of experimental values from model compounds, in

contrast to being directly measured for each monosaccharide, for example, through NMR experiments [46].

Neu5Ac is discussed at length in Appendix A, below. Part of that discussion involves comparison to structures from PDB crystal structures. On the one hand, all simulation data here are for isolated Neu5Ac monosaccharides in liquid water, whereas the PDB data are from crystal environments and typically involve Neu5Ac having non-covalent interactions with other biomolecules and/or being covalently attached to other monosaccharides. On the other hand, there is substantial congruence between the aqueous simulation data and the experimental crystal data for Neu5Ac (Figure A1b,d,f,h in Appendix A). Indeed, a computational study of Neu5Ac ring puckering in vacuum and in explicit water noted that the structure of Neu5Ac bound in influenza neuraminidase belonged to conformations preferentially sampled in the aqueous simulations [71]. And, an analysis of high-resolution PDB data for Me*β*GlcNAc noted that while nearly 97% of structures in the data set were in the 4C1 chair conformation, 2.6% were boats or skew boats [72], which correlates closely with data from the present work. Therefore, in addition to NMR data from directly analogous experimental systems of monosaccharides in liquid water, PDB data may be useful as benchmarks for the type of force field-based simulations described here.

On a final note, control eABF simulations for THP yield a Δ*G*(α1, α2) plot that is symmetric about both α<sup>1</sup> = α<sup>2</sup> and about α<sup>1</sup> = −α<sup>2</sup> (Supplementary Material Figure S12), as expected. There are two equivalent global minima at 4C1 = 1C4, and boat/skew-boat conformations are over 5 kcal/mol higher in free energy. Thus, the exocyclic functional groups in the pyranose monosaccharides considered here can be thought of as introducing two types of perturbations to the THP Δ*G*(α1, α2): breaking of the symmetry, and altering the balance of chair vs. boat/skew-boat energetics.

#### **3. Materials and Methods**

#### *3.1. Force Field*

All systems were modeled using the CHARMM all-atom additive force field for carbohydrates [36,38] and the CHARMM-modified TIP3P water parameters [73,74] as contained in the "jul20" release of the CHARMM force field available as "toppar\_c36\_jul20.tgz" from http://mackerell.umaryland.edu/charmm\_ff.shtml (accessed on 3 March 2021). Systems with a carboxylate functional group additionally used sodium ion parameters [75,76] included in the same release. During the course of the present work, we discovered a set of typos in the jul20 parameter file that affect Neu5Ac puckering energetics. Full details are provided in Appendix A. The data presented in this manuscript and the associated Supplementary Material reflect the correct parameters as developed in [38].

#### *3.2. System Construction*

Solvated systems were constructed for each monosaccharide in Figure 2 using either the *α* or the *β* anomer or one of the corresponding O-methyl glycosides, resulting in four unique systems for each monosaccharide. Monosaccharide coordinates were constructed from default force field internal geometries. The solvent consisted of a cubic box of water molecules at the experimental density of water and having an edge length of the longest dimension of the monosaccharide plus 30 Å; water molecules within 3 Å of the monosaccharide were deleted. In systems with a carboxylate group, a single sodium ion replaced a water molecule randomly selected and at least 6 Å from the monosaccharide. All system construction was carried out using the CHARMM program, v. c45b1 [77]. A single system containing tetrahydropyran (THP) was similarly constructed.

#### *3.3. Molecular Dynamics Simulations*

Each system was simulated in triplicate under periodic boundary conditions. Each replicate within the triplicate was assigned random initial velocities using a unique random seed to generate a unique trajectory. Simulations were carried out using the NAMD software, v. 2.13 [78]. Electrostatic and Lennard-Jones interactions employed a 10-Å spherical

cutoff. Lennard-Jones interaction energies were smoothly switched to zero in the interval 8–10 Å [79], an isotropic correction was applied for Lennard-Jones interactions beyond the cutoff [80], and the particle-mesh Ewald method with a 1 Å grid spacing accounted for electrostatic interactions beyond the cutoff [81]. After 1000 steps of energy minimization, each system was heated through re-initializing velocities to the target temperature of 298 K every 1000 molecular dynamics steps across 20,000 total steps with an integration timestep of 0.5 fs and positional restraints on solute non-hydrogen atoms. The SHAKE [82] and SETTLE algorithms [83] were respectively used to constrain all bonds involving hydrogen atoms and water geometries to their equilibrium values, and a temperature of 298 K and a pressure of 1 atm were maintained using Langevin thermostatting [84] and Nosé-Hoover Langevin barostatting [85,86]. Following heating, positional restraints were removed and data were collected from 200-ns production simulations (100 × 10<sup>6</sup> timesteps with an integration timestep of 2.0 fs).

The Extended-System Adaptive Biasing Force (eABF) methodology [43,44] was used to determine the free energy of pyranose ring puckering, Δ*G*(*α*1, *α*2), using reaction coordinates proposed by Babin and Sagui [64], where *α*<sup>1</sup> is the dihedral angle defined by the atoms O5-C1-C2-C3 and *α*<sup>2</sup> is the dihedral defined by the atoms C3-C4-C5-O5, except in the case of sialic acid in which these dihedrals are defined by O6-C2-C3-C4 and C4-C5-C6-O6, respectively. Δ*G*(*α*1, *α*2) was computed from the CZAR gradient estimate [43] using a Poisson equation formalism [87] implemented within NAMD via the Colvars software module [88]. eABF parameters included a fictitious particle spring constant of *k*B*T*/degree/degree and sampling with a 1◦ × 1◦ bin size and restrained with half-harmonic potentials to the range −75◦ < *α*1,2 < 75◦. Application of the biasing force in a given bin was scaled by 0 for the first 100 samples and then linearly scaled from 0 to 100% between 100 and 200 samples. Non-biased control simulations followed the same protocol but with no eABF sampling.

Additional CMAP-biased simulations were carried out for iduronate and for idose by applying a fixed bias equal to −0.5 × Δ*G*(*α*1, *α*2) through the CHARMM force field CMAP term [69]. The representation of this bias using CMAP is not exact relative to the reference values computed by eABF simulation, as CMAP uses a square grid with 15◦ intervals between grid points and bicubic interpolation approximate −0.5 × Δ*G*(*α*1, *α*2) for off-grid values of (*α*1, *α*2). CMAP-biased simulations were run using the OpenMM software, v. 7.5.1 [89] and a molecular dynamics protocol similar to that used for non-biased control NAMD simulations.

#### *3.4. Molecular Dynamics Trajectory Analysis*

Molecular dynamics trajectories were analyzed with the CHARMM software, including for the computation of Cremer-Pople ring puckering parameters [55]. VMD [90] was used for visualization and the creation of molecular graphics.
