**1. Introduction**

Glycosylation is a common and important post-translational modification to proteins in eukaryotic biology. Additionally, carbohydrates are key components of eukaryotic lipids that make up the bilayers in which transmembrane proteins are embedded [1]. The carbohydrate portions of glycosylated proteins and glycolipids are called glycans. Naturally occurring glycans in vertebrates, including in humans, are composed of the monosaccharides D-glucose (Glc), *N*-acetyl-D-glucosamine (GlcNAc), D-galactose (Gal), *N*-acetyl-D-galactosamine (GalNAc), D-mannose (Man), D-xylose (Xyl), L-fucose (Fuc), *N*acetyl-D-neuraminic acid (Neu5Ac), D-glucuronic acid (GlcA), and L-iduronic acid (IdoA), all in their pyranose forms [2] (Figure 1). As Neu5Ac, GlcA, and IdoA are expected to be deprotonated under typical physiological conditions, Figure 1 shows their conjugate base forms, *N*-acetyl-D-neuraminate, D-glucuronate, and L-iduronate, and it is these forms that are exclusively considered in what follows. Examples of glycans as components of glycosylated proteins are the *N*-glycans [3] and O-glycans [4] attached to glycoproteins and the glycosaminoglycans attached to proteoglycans [5]. Experimental atomic-resolution structural biology on glycosylated proteins is complicated by the non-template based synthesis of the attached glycans [6], which precludes a convenient source of homogeneous

**Citation:** Guvench, O.; Martin, D.; Greene, M. Pyranose Ring Puckering Thermodynamics for Glycan Monosaccharides Associated with Vertebrate Proteins. *Int. J. Mol. Sci.* **2022**, *23*, 473. https://doi.org/ 10.3390/ijms23010473

Academic Editors: Istvan Simon and Csaba Magyar

Received: 6 December 2021 Accepted: 28 December 2021 Published: 31 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

sample from biological sources, the intrinsic flexibility of glycans, which hinders conformational analysis by X-ray crystallography and NMR spectroscopy [7], and the covalent linkage of proteins with glycans, which can affect the structural properties of both the glycan and protein components [8–10]. In the context of membrane proteins, experimental atomic-resolution structural biology using X-ray crystallography entails extracting the membrane protein from its native lipid environment in order to create protein crystals [11], which means the effects of natural glycolipids in the native bilayer are not included in the structure determination. Therefore, the impact of glycans on protein structure continues to be at the frontiers of protein structure research.

**Figure 1.** Compounds considered in the current study. Glc carbon atoms are numbered in blue. All other monosaccharides follow the same numbering scheme, except for Neu5Ac, which is numbered as pictured. All monosaccharides are drawn as the *β* anomer. The *α* anomer is created by inversion of the configuration at carbon 2 for Neu5Ac and at carbon 1 for all other monosaccharides. Both anomers for each monosaccharide as well as the corresponding O-methyl glycosides, formed by methylation at the anomeric carbon hydroxyl, were studied, for a total of 45 compounds (44 monosaccharides + THP).

Computational approaches for three-dimensional modeling of the atomic-resolution conformational properties of glycans have been developed and applied to help bridge the gaps in experimental methods [12–26]. Widely used among these computational approaches are explicit solvent molecular dynamics simulations employing atomistic force fields such as GLYCAM06 [27,28], GROMOS 53A6GLYC [29,30], GROMOS 56a6CARBO/ CARBO\_R [31–33], OPLS-AA [34,35], and CHARMM [36–39]. The quality of the results from such molecular dynamics simulations depends upon the quality of the force field parametrization. The conformational properties of glycans are determined principally by flexibility in the rings of the constituent monosaccharides and in the glycosidic linkages connecting them (Figure 2) [12,40], and thus it is important for force field parametrizations to accurately capture the physics of these sources of flexibility in order to ensure reliable modeling results.

**Figure 2.** Pyranose ring puckering (red) and glycosidic bond rotation (blue) are the major sources of polymer flexibility in vertebrate glycans.

Since pyranose ring puckering occurs at the microsecond and beyond timescale [40–42], which is near the upper limit of typical present-day all-atom explicit-solvent molecular dynamics simulations, limitations in force field accuracy may not be readily apparent simply based on analysis of such simulation results. Here, we systematically determine the ring puckering thermodynamics of all compounds in Figure 2, including both the *α* and the *β* anomers and their corresponding O-methyl glycosides for the ten monosaccharides (i.e., 45 systems total), with the widely-used CHARMM force field. Extended System Adaptive Biasing Force (eABF) [43,44] is applied to achieve well-converged equilibrium statistics for ring puckering probabilities, with error estimates from triplicate 200-ns simulations for each system. The ring puckering thermodynamics from these simulations are in line with expected behavior, including for the highly flexible IdoA, and imply that the CHARMM force field can be used with confidence to correctly capture pyranose ring puckering contributions to glycan conformational heterogeneity in the context of the vertebrate glycans such as N-glycans, O-glycans, glycosaminoglycans, and glycolipids.

We additionally consider idose in its pyranose form, since, like IdoA, idose has a close balance between 4C1 and 1C4 chair probabilities, which makes it a useful test of force field accuracy. In agreement with prior computational results [45] and recent NMR data [46], the CHARMM carbohydrate force field performs very well in capturing the close balance for idose as well as for IdoA. Finally, for completeness, we include tetrahydropyran, which is the basic six-membered ring scaffold common to all of the monosaccharides considered here (Figure 2). As expected, there is an exact 50:50 balance for chair-chair interconversion for THP.

It is possible to tune ring puckering thermodynamics by selectively refining specific force field parameters and by using ring puckering thermodynamics as target data in the parametrization process. In the case of the GROMOS force field, such an approach was taken as a force field revision [31,32,47], and has yielded excellent results for ring puckering across a wide variety of pyranoses [32,33,45,48]. In the case of CHARMM, ring puckering thermodynamics in solution were not used as target data for CHARMM parametrization, and both bonded and nonbonded force field parameters, which built upon quantum mechanical gas phase puckering energetics for tetrahydropyran [36,49], are conserved across all of the different monosaccharides considered here. This demonstrates it is possible to correctly account for pyranose monosaccharide ring puckering thermodynamics in solution with a general transferable bonded and nonbonded force field parameter set. In the case of CHARMM, combining this parameter set with CHARMM force field parameters

for proteins [50–52] can enable accurate modeling of glycoproteins and proteoglycans, and combining these parameters set with CHARMM force field parameters for lipids can do the same for glycolipids [53,54], which in turn can enable accurate modeling of transmembrane proteins embedded in complex bilayers composed of natural lipids.

## **2. Results and Discussion**

#### *2.1. Reaction Coordinate and Sampling Approach*

Ring puckering for pyranose monosaccharides is commonly described using the Cremer-Pople (C-P) parameters (*Q*, *θ*, *φ*) [55], which provide a convenient quantitative means to identify both the extent and the nature of the puckering using spherical coordinates. The puckering amplitude *Q* describes the extent or magnitude of the puckering, while the angular values 0◦ ≤ *θ* ≤ 180◦ and 0◦ ≤ *φ* < 360◦ describe the nature of the puckering. "Polar" values of *θ* near 0◦ and 180◦ correspond to 4C1 and 1C4 chair conformations, respectively, while "equatorial" values of *θ* near 90◦ correspond to boat and skew boat conformations, with the *φ* value indicating the specific boat or skew boat (e.g., 2SO). Intermediate or "tropical" values of *θ*, which are between the poles and the equator, correspond to envelope and half-envelope conformations, with the *φ* value indicating the specific envelope or half envelope [56].

Due to the long timescale for interconversion between 4C1 and 1C4, it is impractical to precisely determine the balance of probabilities and, hence, free energy difference, Δ*G*, between these conformations for pyranose monosaccharides on a routine basis using standard all-atom explicit-solvent molecular dynamics simulations. This is true for pyranoses where Δ*G* ≈ 0 due to the energy barrier separating the conformations [41,42], and the situation is even more difficult in cases where Δ*G* is substantially different than zero due to the difficulty in achieving equilibrium sampling of the unfavored conformation.

A logical means to address this issue is to apply a bias to *θ* during a simulation and either to reweight the sampling distribution to get unbiased conformational probabilities or to directly compute Δ*G* from the bias. Such an approach employing metadynamics [57,58] has enabled a number of studies to this end [29,32,33,45,48,59]. As demonstrated in these studies, this approach allows one to obtain a good estimate for Δ*G* with much less computation time than through standard (non-biased) molecular dynamics. There are two potential downsides to using a bias on *θ*. The first is the need to develop specialized computer code for the bias since the C-P *θ* is not a standard cartesian or internal coordinate. The second is that, while the single parameter *θ* can differentiate the two chair conformations from each other and also non-chair conformations from chair conformations, it cannot differentiate one non-chair conformation from another non-chair conformation. This second potential downside can be addressed by introducing a second simultaneous bias on *φ* but at the expense of further complicating the first downside.

For these reasons, direct use of dihedral angles is an attractive alternative. For example, Pickett and Strauss (P-S) defined three out-of-plane dihedrals constructed as various combinations of atoms in the pyranose ring [60], and it has been shown that simultaneous biases on all three of these angles can be effectively used to sample pyranose ring puckering [61]. In fact, the P-S and C-P approaches are mathematically equivalent [62]. However, there is an important practical difference with regard to applying biases on C-P parameters versus P-S out-of-plane dihedrals: only the two angular C-P parameters are required to uniquely identify the pucker nature (as opposed to magnitude) of a particular conformation whereas all three P-S out-of-plane dihedrals are required to do the same [59,63].

Babin and Sagui (B-S) have also proposed using dihedral angles for biased sampling of pyranose ring pucker [64]. In contrast to the P-S approach, only two dihedral angles are used in their scheme, α<sup>1</sup> ≡ O5–C1–C2–C3 and α<sup>2</sup> ≡ C3–C4–C5–O5, and the dihedral angles are real dihedrals determined by sequentially bonded atoms. Babin and Sagui have shown that biased sampling of (*α*1, *α*2) is an effective approach for sampling IdoA and GlcA puckering, and Alibay and Bryce have extended on these two monosaccharides to sulfated variants, as well as to non-sulfated and sulfated variants of GlcNAc, Gal, and GalNAc [65]. In what follows, we demonstrate that major minima in Δ*G*(α1, α2) are populated by unique conformations. As such, it is possible to do direct integration of regions of Δ*G*(α1, α2) to determine Δ*G* not only between the 4C1 and 1C4 chairs, but also between specific boat/skew-boat conformations.

#### *2.2. Extended System Adaptive Biasing Force (eABF) Sampling of the B-S (α1, α2) Reaction Coordinate*

Methyl α-L-idopyranosiduronic acid (MeαIdoA) (Figure 2 "IdoA" with a methylated axial C1 hydroxyl) serves as a good test system to demonstrate the efficacy of eABF sampling of (α1, α2) owing to a small (<1 kcal/mol [46]) Δ*G* for conversion between the 4C1 and 1C4 chair conformations and a large energy barrier (~10 kcal/mol from the present work based on transition path saddle points in Figure 3), and therefore, slow kinetics, for this transition. Triplicate 200-ns eABF simulations with simultaneous biases on α<sup>1</sup> and α<sup>2</sup> and seeded with different randomized initial velocities yield essentially identical results across the entire Δ*G*(α1, α2) surface (Figure 3). Not only are the thermodynamic minima equal in both value and location, but so are the saddle regions and even the maxima, which demonstrates the excellent convergence properties of eABF for this system. Δ*G*(α1, α2) data are similarly well-converged for all 45 systems in this study (four different anomerization/methylation states for each of the 11 monosaccharides in Figure 2 plus tetrahydropyran; Supplementary Material Figures S1–S12).

**Figure 3.** MeαIdoA Δ*G*(α1, α2) from eABF simulation. Each panel is from a separate 200-ns simulation seeded with different initial random velocities. α<sup>1</sup> and α<sup>2</sup> are in degrees. Δ*G*(α1, α2) is in kcal/mol, with contours drawn every 1 kcal/mol, colored from 0–3 kcal/mol, and labeled every 2 kcal/mol.

Additionally, each major thermodynamic minimum, that is, where Δ*G*(α1, α2) < 3 kcal/mol, is populated by a single type of ring puckering conformation (Figure 4). We have chosen 3 kcal/mol as a cutoff value for the definition of major thermodynamic minimum since, at the simulation temperature of 298 K, values greater than 3 kcal/mol correspond to small probabilities, specifically, less than 0.64%. This association between a single ring puckering conformation and each major thermodynamic minimum in Δ*G*(α1, α2) holds for all 44 monosaccharides in this study, which illustrates the practical utility of the B-S reaction coordinate for characterizing pyranose ring puckering not only for chair conformations but also for specific non-chair conformations.

Kinetic data from the simulations clearly show the efficacy of eABF combined with the B-S (α1, α2) reaction coordinate for effectively sampling pyranose ring pucker, which is not surprising given the excellent convergence properties of Δ*G*(α1, α2) with eABF as discussed previously. Serving as a negative control, standard (non-biased) triplicate simulations of MeαIdoA starting from the 1C4 chair undergo at most one transition in C-P *θ* during 200 ns (Figure 5a). Specifically, two of the simulations maintain *θ* ∼= 180◦, indicating they are trapped in the initial conformation, while the third transitions at 25 ns to *θ* ∼= 90◦ and remains there, indicating it is stuck in the equatorial boat/skew-boat region of puckering space. Therefore, standard sub-microsecond explicit solvent molecular dynamics simulation is inadequate for the task of sampling puckering conformations for pyranoses modeled with the CHARMM force field.

**Figure 4.** Sampling of specific MeαIdoA ring puckering conformations during eABF simulation with the Babin-Sagui (α1, α2) reaction coordinate. Sampled (α1, α2) values are separated into those for 4C1, 1C4, and 2SO (blue, red, and green dots, respectively, in panel "a") and for all other (black dots, panel "b") puckering conformations. α<sup>1</sup> and α<sup>2</sup> are in degrees. Δ*G*(α1, α2) is in kcal/mol, with contours drawn every 1 kcal/mol from 0–5 kcal/mol and colored from 0–3 kcal/mol. Puckering data have been aggregated across the triplicate simulations, and Δ*G*(α1, α2) is from the first simulation in the triplicate.

**Figure 5.** MeαIdoA conformational transitions in standard (non-biased) (**a**), eABF (**b**), and CMAPbiased (**c**) molecular dynamics simulations. eABF and CMAP biased simulations have biasing on the Babin-Sagui (α1, α2) reaction coordinate. Data in each panel are from triplicate simulations (blue, red, and green) seeded with different random initial velocities.

In contrast, with eABF sampling, during the first 25 ns, as the time-dependent biasing force becomes a progressively better estimate of the thermodynamic force along (α1, α2), transitions in *θ* start to become induced (Figure 5b). Beyond *t* = 25 ns, there is rapid transitioning on the nanosecond timescale from the 4C1 chair (*<sup>θ</sup>* ∼= <sup>0</sup>◦), through boat/skewboat conformations (*<sup>θ</sup>* ∼= <sup>90</sup>◦), to the 1C4 chair (*<sup>θ</sup>* ∼= <sup>90</sup>◦) and back again, indicating sufficient

sampling of (α1, α2) by eABF to provide an accurate estimate of the thermodynamic force along (α1, α2). As a technical point, the eABF approach applies a bias not to (α1, α2) directly but to extended degrees of freedom attached to (α1, α2), and the thermodynamic force on (α1, α2) is recovered from the biasing force applied to these extended degrees of freedom [43,44]. Standard ABF is not possible for sampling (α1, α2) since α<sup>1</sup> and α<sup>2</sup> do not meet the required orthogonality condition for standard ABF [66–68] owing to the sharing of atoms carbon 1 and oxygen 5 in both of the dihedral angle definitions. For additional information on this point, we refer interested readers to the cyclohexane data in Figure 2 of reference [44] and the associated discussion therein, which vividly demonstrates errors in estimation of cyclohexane puckering free energy with standard ABF that are corrected with eABF.

As a positive control, and similar to the approach of Babin and Sagui [64], we ran an additional set of simulations that employed CMAP-biased sampling [51,69]. In these simulations, the potential energy was defined by *U*non-biased + *U*CMAP, where *U*non-biased is the same CHARMM additive force field function used in the non-biased simulations here and *<sup>U</sup>*CMAP is *<sup>U</sup>*CMAP(α1, <sup>α</sup>2) ∼= −0.5 × <sup>Δ</sup>*G*(α1, <sup>α</sup>2). Unlike in the eABF simulations, the bias, in this case from the CMAP term, is fixed. We note that *U*CMAP(α1, α2) is only approximately equal to −0.5 × Δ*G*(α1, α2) since, while Δ*G*(α1, α2) was computed on a square grid with a grid spacing of 1◦, the grid spacing for the CMAP potential is 15◦. As expected, there is excellent sampling of C-P *θ* from the very beginning of the triplicate simulations (Figure 5c). While there is rapid barrier crossing with this approach, there is less uniform sampling across all values of *θ* as compared to eABF sampling, with a strong tendency to favor sampling of polar and equatorial values of *θ* as compared to tropical values (Figure 5b vs. Figure 5c). This resulted from the factor of 0.5 used in the definition of *U*CMAP(α1, α2), and was done to maximize importance sampling of thermodynamically favored regions of (α1, α2) space while still lowering barriers sufficiently to achieve ergodic sampling of (α1, α2) on the 200-ns time scale of the simulations. As expected, thermodynamically unfavored regions of (α1, α2) correspond to tropical values, which in turn are envelope and half-envelope conformations with high degrees of ring strain.

Plotting C-P (*θ*, *φ*) values sampled during the eABF and the CMAP-biased simulations further validates the degree to which these two biasing methods applied to the (α1, α2) reaction coordinate enable sampling of pyranose puckering space. In addition to excellent coverage of the two chair conformations 4C1 and 1C4 located in the polar regions, there is good coverage of the equatorial region for 75◦ < *φ* < 270◦, which includes 5S1, 2,5B, 2SO, B3,O, 1S3, 1,4B, and 1S5, in order of increasing *φ* (Figure 6). That said, there is very limited sampling of equatorial regions outside this range of *φ* values, resulting from the fact that the two-dimensional B-S (α1, α2) reaction coordinate is not a perfect replacement for biased sampling of the two-dimensional C-P (*θ*, *φ*) reaction coordinate. Nonetheless, it is reasonable to assume conformations not sampled are very high in free energy and that the thermodynamically relevant conformations have all been sampled. This latter point is emphasized by comparing these sampling data for eABF versus CMAP biasing. In the case of eABF, as time increases, sampling approaches that for a distribution biased by −Δ*G*(α1, α2), whereas for CMAP biasing, sampling is that for a distribution biased by −0.5 × Δ*G*(α1, α2), as discussed above. As such, eABF provides more complete coverage of (*θ*, *φ*) space (Figure 6a) as compared to CMAP-biased sampling (Figure 6b).

**Figure 6.** MeαIdoA Cremer-Pople (*θ*, *φ*) values sampled during eABF (**a**) and CMAP-biased (**b**) molecular dynamics simulations. Pyranose ring puckering regions [56] ("4C1", "northern tropical", "2SO", etc.) are labeled as defined in the Materials and Methods section. Biasing was applied to the Babin-Sagui (α1, α2) reaction coordinate. Data in each panel are from triplicate simulations (blue, red, and green) seeded with different random initial velocities. eABF simulations were 200 ns and CMAP-biased simulations were 1000 ns.

## *2.3. Using eABF-Computed* Δ*G(α1, α2) to Calculate Specific Ring Puckering Conformation Probabilities*

Given that each major thermodynamic minimum for MeαIdoA is populated by a single type of puckering conformation, as shown above, it is possible simply to integrate the probabilities associated with each minimum to determine relative probabilities for specific ring puckering conformations. We operationalized this by converting Δ*G*(α1, α2) values from eABF simulations to probabilities *p*(α1, α2) using the Boltzmann relationship *p*(α1, α2) = exp(Δ*G*(α1, α2)/−*RT*), where *R* is the universal gas constant and *T* is the temperature. We then separated the data based on the (α1, α2) quadrant, and summed up all values of *p* for each (α1, α2) bin having an associated value Δ*G*(α1, α2) < 3 kcal/mol within a 20◦ degree radius of the most favorable thermodynamic minimum in that quadrant. This yields at most one summed probability, *P*, per quadrant of the (α1, α2) coordinate. In the case of MeαIdoA, there are three such values, *P*+,+, *P*-,+, and *P*+,-; the subscript here indicates the quadrant, for example, the quadrant defined by (α<sup>1</sup> < 0◦, α<sup>2</sup> > 0◦) for "−, +". As discussed above, for MeαIdoA, the "+, −" minimum corresponds uniquely to the 4C1 ring pucker conformation, "−, +" to 1C4, and "+, +" to 2SO (Figure 4), which allows for the assignment of probability values to specific ring pucker conformations based on eABF Δ*G*(α1, α2) results.

#### *2.4. Ring Puckering Probabilities: Idose and Iduronate*

Among the molecules considered in this study (Figure 2), the Ido and IdoA compounds are well known to exhibit significant conformational flexibility with regard to ring pucker. There are recent high-quality experimental results quantifying this, but with variable agreement with prior molecular dynamics simulation studies [46]. Comparison of 4C1: 1C4 ring puckering probability ratios shows good agreement between the present simulation results and these available experimental data (Table 1). In addition to probabilities from the eABF Δ*G*(α1, α2) results, we have included probabilities computed from the CMAP-biased simulations. These were determined by collecting all 4C1 conformations from a CMAPbiased simulation, assigning a probability *p* = exp(*U*CMAP/−*RT*) to each conformation to

account for the effect of the CMAP bias, and then summing up the *p* values to get the total probability for the 4C1 pucker. This was likewise carried out for the 1C4 pucker, and the two total probabilities were normalized to sum to 100% (Table 1, "CMAP-biased simulations").


**Table 1.** 4C1: 1C4 ring puckering probability ratios in idose (Ido) and iduronate (IdoA) compounds.

<sup>1</sup> Data are averages from triplicate simulations with standard error of the mean values in parentheses.

Converting the 4C1: 1C4 ring puckering probability ratios *r* to free energies using the relationship Δ*G* = −*RT*ln(*r*) and plotting these Δ*G* values further illustrates how well the force field approach treats the close balance between 4C1 and 1C4 ring conformations. These Δ*G* values for the 4C1 to 1C4 equilibrium from the eABF and from the CMAP-biased simulations are typically within 0.5 kcal/mol of the experimental values (Figure 7a and Figure 7b, respectively). This very small degree of error is excellent for a force field model, and is not much different than what is seen when comparing the results from the two different simulation approaches using the same force field (Figure 7c).

**Figure 7.** Comparison of Δ*G* values for the 4C1 to 1C4 equilibrium in Ido and IdoA compounds from eABF simulations, CMAP-biased simulations, and NMR experiments. Data are presented as eABF vs. NMR (**a**), CMAP-biased vs. NMR (**b**), and eABF vs. CMAP-biased (**c**). The specific compounds and the experimental data from NMR experiments are as detailed in Table 1. Simulation data points are averages from triplicate simulations, with error bars representing 95% confidence intervals. The solid diagonal is the line *y* = *x*, and the dotted diagonal lines are ±0.5 kcal/mol.
