**1. Introduction**

The diverse group of protein–carbohydrate conjugates called proteoglycans (PGs) is a fundamental component of tissue structure in animals and can be found in the extracellular matrix (ECM) as well as on and within cells. PGs bind growth factors [1–12], enzymes [2,12], membrane receptors [12], and ECM molecules [2,12,13]. By doing so, they modulate signal transduction [13,14], tissue morphogenesis [2,8–11], and matrix assembly [2,15–17]. PG bioactivity often depends on the covalently linked carbohydrate chains called glycosaminoglycans (GAGs), which are linear, highly negatively charged, and structurally diverse carbohydrate polymers. GAGs mediate receptor–ligand complex formation by either forming non-covalent complexes with proteins or inhibiting the formation of complexes with other biomolecules. This makes GAGs key modulators in many diseases and gives them potential therapeutic applications. For example, heparan sulfate (HS) is released during sepsis and induces septic shock [18,19]; the removal of chondroitin sulfate (CS) may enhance memory retention and slow neurodegeneration in patients with Alzheimer's disease [20–22]; and dermatan sulfate (DS) deficiency has been implicated in Ehlers–Danlos syndrome, so screening for DS in urine could be used as an early diagnostic tool [23,24].

GAG binding sites on proteins are determined by protein sequence and structure, with requirements for both shape and charge complementarity [12,25]. Thus, GAG function depends on GAG three-dimensional structure and conformation, and even subtle structural differences affect function. For example, while CS and DS have many functional differences, their only structural difference is the chirality of the uronic acid monosaccharides. Although much is known about GAG function, studying GAG conformational thermodynamics at atomic resolution remains a largely unsolved problem for existing experimental methods, mainly because of the structural and conformational complexities of GAGs. For example, a given GAG consists of a repeating sequence of a particular disaccharide, but conformational complexity is introduced through flexibility in the glycosidic linkages between monosaccharides [26–30] (Figure 1). Additional complexity results from non-template-based synthesis [31] and variable enzymatic sulfation [32], which means that a biological sample of a GAG composed of a specific disaccharide repeat will be polydisperse and heterogeneous owing to the variable length and sulfation of the individual polymer molecules. Liquid chromatography–mass spectrometry (LC-MS) [33–35], X-ray crystallography [36–41], and nuclear magnetic resonance (NMR) [42–45] are used to study GAGs but are limited in their ability to account for all of these complexities. Additionally, some studies have used results from LC-MS [45], X-ray crystallography [46], and NMR [46–51] to compare and validate conformational data from molecular dynamics (MD) simulations. This suggests that MD simulations can complement experimental analysis methods by providing realistic three-dimensional atomic-resolution molecular models of GAG conformational ensembles [52–56].

**Figure 1.** Compact non-sulfated chondroitin 20-mer conformation arising from flexible glycosidic linkages (red) between monosaccharide rings (GalNAc in blue and GlcA in cyan). The molecular graphics throughout are produced with the VMD program [57].

A critical challenge with MD simulations of GAGs is that a single biological GAG polymer chain may contain up to 200 monosaccharide units [9]. When fully solvated, the resulting system will have in excess of 10<sup>6</sup> atoms. It is not feasible to routinely simulate such a system using current graphics processing unit (GPU)-accelerated MD codes with a modern GPU and multi-core CPU. This limits the utility of all-atom explicit-solvent MD as a tool for routine conformational analysis of GAGs of this size.
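The order of magnitude quoted above can be checked with a rough estimate. The sketch below counts only water atoms in a cubic box and ignores the solute and ions. The water number density (about 33.4 molecules per nm<sup>3</sup> near 1 g/cm<sup>3</sup>) is a standard value, but the rise of 0.5 nm per monosaccharide and the choice of box edges are illustrative assumptions, not values from this work.

```python
# Back-of-the-envelope estimate of explicit-solvent system size.
# Assumptions (not from the paper): cubic box, water atoms only,
# and a rough rise of 0.5 nm per monosaccharide along the chain.

WATER_MOLECULES_PER_NM3 = 33.4  # liquid water at about 1 g/cm^3
ATOMS_PER_WATER = 3             # one oxygen, two hydrogens


def contour_length_nm(n_monosaccharides: int, rise_nm: float = 0.5) -> float:
    """Approximate length of a fully extended chain (rise_nm is assumed)."""
    return n_monosaccharides * rise_nm


def water_atoms_in_cubic_box(edge_nm: float) -> float:
    """Number of water atoms that fill a cubic box of the given edge."""
    return edge_nm ** 3 * WATER_MOLECULES_PER_NM3 * ATOMS_PER_WATER


length = contour_length_nm(200)
for fraction in (0.25, 0.5, 1.0):
    edge = fraction * length
    atoms = water_atoms_in_cubic_box(edge)
    print(f"box edge {edge:6.1f} nm -> about {atoms:.2e} water atoms")
```

Under these assumptions, even a box whose edge is only a quarter of the extended 200-mer length already holds more than 10<sup>6</sup> water atoms, and the count grows with the cube of the edge.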

Coarse-grained (CG) MD simulations are the most feasible current alternative to all-atom explicit-solvent MD, as they entail fewer degrees of freedom for the solute [48] and often an implicit (continuum) description of the solvent [58,59]. This can make CG MD two to three orders of magnitude faster, thereby allowing for the handling of large systems [60], such as GAG 200-mers. Indeed, a recent CG model using glycosidic linkage and ring pucker energy functions has provided previously unseen details of the structure–dynamics relationship of GAGs in the context of PGs [48]. An important insight from that study was that GAGs, in contrast to the unique ordered conformations of folded proteins, need to be considered as existing in conformational ensembles containing a large diversity of three-dimensional conformations.

As an alternative approach to using CG MD to generate such conformational ensembles for GAGs, we propose using glycosidic linkage and monosaccharide ring conformations from unbiased all-atom explicit-solvent MD simulations [56,61–63] of short GAG polymers to rapidly construct conformational ensembles for GAGs of arbitrary length. Toward this end, we studied a non-sulfated chondroitin 20-mer with the sequence [-4 glucuronate β1-3 N-acetylgalactosamine β1-]<sub>10</sub> for its simplicity and homogeneity. We first ran microsecond-scale all-atom explicit-solvent MD on the 20-mer and used the resulting trajectories to develop a database of conformations. From this database, we randomly selected individual values for the bond lengths, bond angles, and dihedral angles in the glycosidic linkages connecting glucuronate (GlcA) and N-acetylgalactosamine (GalNAc) and in the monosaccharide rings. These values were used to construct a 20-mer conformational ensemble. The comparison of the constructed ensemble with the MD-generated ensemble of 20-mer conformations revealed similar end-to-end distance distributions, with a strong bias toward extended conformations in both cases. Short end-to-end distances associated with more compact conformations were facilitated by the sampling of non-<sup>4</sup>C<sub>1</sub> ring puckering by GlcA. This change in ring geometry, which occurs rarely on the microsecond timescale, introduced kinks into the polymer, causing it to bend back toward itself. The fact that the MD-generated ensemble had a great deal of variability in both end-to-end distances and radii of gyration demonstrates the inherent flexibility of the chondroitin polymer in aqueous solution. The fact that the constructed ensemble has very similar conformational properties to the MD-generated ensemble suggests that there is little correlation between the individual dihedral angle values that determine the internal geometry of a given conformation. Therefore, on the timescale of the simulations, the non-sulfated chondroitin 20-mer does not appear to have any higher-order structure, in contrast to, for example, the secondary and tertiary structure seen in proteins. This lack of higher-order structure was borne out in a comparison of end-to-end distances for constructed vs. MD-generated ensembles of 10-mers, with the constructed ensemble built using the 20-mer database. Finally, we used the methodology to produce conformational ensembles of 100-mers and of 200-mers. The ability to model polymers with biologically relevant chain lengths (e.g., 100- to 200-mers) will provide insights into GAG binding by other biomolecules. This will be especially useful in understanding the formation of complexes containing multiple biomolecules bound to a single GAG.
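The idea of building conformations from independently drawn internal coordinates can be illustrated with a deliberately simplified sketch. It is not the all-atom algorithm described here: each monosaccharide is collapsed to a single site, and the "database" is filled with placeholder random values rather than MD data. The function names (`place_atom`, `build_chain`, `end_to_end`, `radius_of_gyration`) and all numerical values are hypothetical. Sites are placed one at a time from a bond length, bond angle, and dihedral angle using the standard internal-to-Cartesian construction (often called NeRF), with each value drawn independently of the others.

```python
import math
import random


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    norm = math.sqrt(u[0] ** 2 + u[1] ** 2 + u[2] ** 2)
    return (u[0] / norm, u[1] / norm, u[2] / norm)


def place_atom(a, b, c, r, theta, phi):
    """Place site d so that |c-d| = r, angle(b,c,d) = theta, dihedral(a,b,c,d) = phi."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal to the a-b-c plane
    m = _cross(n, bc)                  # completes the local right-handed frame
    x = -r * math.cos(theta)
    y = r * math.sin(theta) * math.cos(phi)
    z = r * math.sin(theta) * math.sin(phi)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))


def build_chain(n_sites, db, rng):
    """Build one conformation (n_sites >= 3) from independent random draws."""
    r1, r2 = rng.choice(db["bond"]), rng.choice(db["bond"])
    th = rng.choice(db["angle"])
    # Seed the first three sites in the xy-plane.
    chain = [(0.0, 0.0, 0.0),
             (r1, 0.0, 0.0),
             (r1 - r2 * math.cos(th), r2 * math.sin(th), 0.0)]
    while len(chain) < n_sites:
        chain.append(place_atom(chain[-3], chain[-2], chain[-1],
                                rng.choice(db["bond"]),
                                rng.choice(db["angle"]),
                                rng.choice(db["dihedral"])))
    return chain


def end_to_end(chain):
    return math.dist(chain[0], chain[-1])


def radius_of_gyration(chain):
    center = tuple(sum(p[i] for p in chain) / len(chain) for i in range(3))
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in chain) / len(chain))


# Placeholder database: synthetic values, NOT taken from any simulation.
demo_rng = random.Random(42)
demo_db = {
    "bond": [demo_rng.gauss(0.5, 0.01) for _ in range(1000)],            # nm
    "angle": [math.radians(demo_rng.gauss(115.0, 5.0)) for _ in range(1000)],
    "dihedral": [math.radians(demo_rng.uniform(-180.0, 180.0)) for _ in range(1000)],
}
ensemble = [build_chain(20, demo_db, demo_rng) for _ in range(200)]
mean_e2e = sum(end_to_end(c) for c in ensemble) / len(ensemble)
mean_rg = sum(radius_of_gyration(c) for c in ensemble) / len(ensemble)
print(f"mean end-to-end distance: {mean_e2e:.2f} nm, mean Rg: {mean_rg:.2f} nm")
```

Because every draw is independent, a sketch like this can only reproduce an MD ensemble when the underlying internal coordinates are themselves weakly correlated, which is the property the comparison of constructed and MD-generated ensembles is meant to test. Chain length enters only through `n_sites`, so longer polymers cost little more than shorter ones.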

Other programs that construct three-dimensional atomic-resolution models of GAG polymers exist, for example, Glycam GAG Builder [64], POLYS Glycan Builder [65], CarbBuilder [66], and MatrixDB GAG Builder [67,68]. These programs allow the user to choose GAG type, length, and sequence and are useful tools for producing an initial structure for MD simulations. Glycam and POLYS Glycan Builder allow the user either to specify particular glycosidic linkage dihedral angle values or to use default parameters pulled from their databases. The databases used by Glycam, POLYS Glycan Builder, and CarbBuilder include GAG mono- and disaccharide structures determined by molecular mechanics and/or MD. MatrixDB pulls from databases of experimentally determined conformations of GAG disaccharides from crystallized GAG–protein complexes. While the user has the option to choose the GAG length, these tools are intended for shorter GAG polymers. In contrast to these tools, our algorithm pulls from a database of full conformational landscapes of unbound GAG 20-mers. Additionally, our algorithm is intended for modeling long GAG polymers with biologically relevant chain lengths and can quickly produce large ensembles (e.g., on the order of 10,000 3-D models) of polymer conformations that we would expect to see in simulation. Thus, it eliminates the need to simulate the long polymers themselves, reducing time and computational cost.
