*2.2. Construction Algorithm to Generate GAG Conformational Ensembles*

The conformational data described above served as inputs to an algorithm we developed to generate chondroitin polymer conformational ensembles of user-specified length and with a user-specified number of conformations. The algorithm works as follows:


$$E_{\rm rdihe} = k_{\rm dihe} \ast \sum \left(\phi_1 - \phi_0\right)^2 \tag{1}$$

3. To ensure that conformational ensembles contain no non-physical conformations, a bond potential energy (*E*<sub>b</sub>) cutoff is applied. This cutoff is the sum of a polymer-length-specific component and a constant that is independent of polymer length. The length-specific component is the post-minimization bond potential energy of the polymer constructed in a fully extended conformation (i.e., with the same glycosidic linkage φ and ψ angles as the starting conformation for MD simulations), minimized using the same restraints and minimization protocol used for each frame of the constructed ensemble (outlined above). The constant is added as a buffer to account for slight variations in the energies of other extended conformations.

Because linkage and ring conformations are treated independently and selected at random, a bond may pierce another monosaccharide ring, and minimization may not correct this. To estimate the ring-piercing bond strain energy for each exocyclic bond not participating in a glycosidic linkage, a system containing two non-bonded monosaccharides (i.e., GlcA and GalNAc, GlcA and GlcA, or GalNAc and GalNAc) was constructed such that an exocyclic bond of one monosaccharide pierces the ring of the other. To estimate the bond strain energy for each bond participating in a glycosidic linkage, a system containing one disaccharide unit (i.e., GlcAβ1-3GalNAc or GalNAcβ1-4GlcA) and a single monosaccharide (i.e., GlcA or GalNAc) was constructed such that a linkage bond in the disaccharide pierces the ring of the single monosaccharide. Systems containing interlocking rings (i.e., GlcA-GalNAc, GlcA-GlcA, and GalNAc-GalNAc) were also constructed to estimate the bond strain energy of the bonds piercing the opposite ring. The same energy minimization protocol used in the algorithm was applied to each pierced conformation, as well as to a conformation in which the non-bonded saccharide units are 20 Å apart, and the post-minimization lengths of the bond that pierces the ring in the initial conformation were compared. The pierced bond length (*x*<sub>2</sub>), the non-pierced bond length (*x*<sub>1</sub>), and the equilibrium bond length (*x*<sub>0</sub>) and corresponding force-field bond-stretching constant (*k*<sub>b</sub>) from the CHARMM parameter file were used to estimate a lower bound on the energy (∆*E*<sub>b</sub>) resulting from the bond distortion (Equation (2)).

$$
\Delta E_{\rm b} = k_{\rm b} \ast \left[ \left( x_2 - x_0 \right)^2 - \left( x_1 - x_0 \right)^2 \right] \tag{2}
$$
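As a minimal sketch of how Equation (2) can be evaluated, the Python function below computes the lower-bound strain energy from the three bond lengths and the force constant. The function name and all numerical values are hypothetical and chosen only for illustration; they are not parameters or results from this work. The sketch assumes bond lengths in Å and a force constant in kcal/mol/Å², with the harmonic bond term written without a factor of 1/2, as in Equation (2).

```python
def bond_strain_lower_bound(k_b, x0, x1, x2):
    """Lower bound on bond strain energy (kcal/mol), following Equation (2).

    k_b : bond-stretching force constant (kcal/mol/A^2)
    x0  : equilibrium bond length (A)
    x1  : post-minimization bond length without ring piercing (A)
    x2  : post-minimization bond length with ring piercing (A)
    """
    return k_b * ((x2 - x0) ** 2 - (x1 - x0) ** 2)


# Hypothetical values for illustration only (not taken from this work).
k_b = 300.0
x0 = 1.5
x1 = 1.52
x2 = 2.2

delta_e_b = bond_strain_lower_bound(k_b, x0, x1, x2)
print(f"Estimated lower bound on bond strain: {delta_e_b:.2f} kcal/mol")
```

Subtracting the non-pierced term removes the small residual strain that the bond carries even without ring piercing, so the result isolates the distortion attributable to piercing.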

Of all conformations that still had a bond piercing a ring after minimization, the smallest ∆*E*<sub>b</sub> was 132.3 kcal/mol. Of the conformations in which ring piercing was corrected during minimization, the maximum ∆*E*<sub>b</sub> was below 1 kcal/mol. Thus, a buffer of 100 kcal/mol, which lies between these two values, is added to the post-minimization bond potential energy of the initial extended conformation for any given polymer length. If the post-minimization bond potential energy of a given frame exceeds this cutoff, the frame is excluded from the ensemble.
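The cutoff logic described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: the function names and the energies are hypothetical, and only the 100 kcal/mol buffer comes from the text.

```python
BUFFER_KCAL_PER_MOL = 100.0  # length-independent buffer stated in the text


def bond_energy_cutoff(extended_bond_energy, buffer=BUFFER_KCAL_PER_MOL):
    """Cutoff = post-minimization bond energy of the extended polymer + buffer."""
    return extended_bond_energy + buffer


def select_frames(frame_bond_energies, extended_bond_energy):
    """Return indices of frames whose bond energy does not exceed the cutoff."""
    cutoff = bond_energy_cutoff(extended_bond_energy)
    return [i for i, e_b in enumerate(frame_bond_energies) if e_b <= cutoff]


# Hypothetical post-minimization bond energies (kcal/mol), for illustration only.
extended_bond_energy = 250.0
frame_bond_energies = [255.3, 262.1, 398.7, 349.9, 350.1]

kept = select_frames(frame_bond_energies, extended_bond_energy)
print("Frames kept:", kept)
```

Because the length-specific component is recomputed for each polymer length, the same buffer can be reused for any chain length without retuning.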

For internal validation of our implementation of the algorithm, we compared the following properties among MD-generated ensembles and constructed ensembles both before and after energy minimization: bond length probability distributions for each type of bond (i.e., C-C single bond, C-O single bond, C=O double bond, C-O partial double bond of the GlcA carboxylate group, C2-N single bond between the GalNAc amide and ring carbon, C-N single bond within the GalNAc amide, C-H bond, O-H bond, and N-H bond); free energies ∆*G*(φ, ψ) for β1-3 and β1-4 glycosidic linkages; C-P parameters of GlcA and GalNAc monosaccharide rings; end-to-end distance distributions; and scatterplots of radius of gyration as a function of end-to-end distance. Additionally, bond potential energy distributions from constructed ensembles after energy minimization were plotted to verify that the algorithm calculated an appropriate energy cutoff and gave the expected energy distributions for the given polymer size.
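Two of the validation metrics named above, end-to-end distance and radius of gyration, can be computed per frame as in the sketch below. The function names are hypothetical. The sketch also assumes that the end-to-end distance is measured between two chosen terminal atoms and that the radius of gyration is unweighted unless masses are supplied; the exact atom selection and weighting used in this work may differ.

```python
import math


def end_to_end_distance(first_atom, last_atom):
    """Euclidean distance between two terminal atoms, given as (x, y, z)."""
    return math.dist(first_atom, last_atom)


def radius_of_gyration(coords, masses=None):
    """Root-mean-square distance of atoms from their (mass-weighted) center."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal weights by default
    total = sum(masses)
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    ]
    mean_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    ) / total
    return math.sqrt(mean_sq)


# Toy frame: four points spaced 1 A apart on a line (not real polymer data).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

d_ee = end_to_end_distance(coords[0], coords[-1])
r_g = radius_of_gyration(coords)
print(f"end-to-end = {d_ee:.3f} A, radius of gyration = {r_g:.3f} A")
```

Applying both functions to every frame of an MD-generated and a constructed ensemble yields the paired values needed for the distributions and scatterplots described above.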

To assess whether MD-generated 20-mer conformations are suitable for constructing chondroitin polymers of variable length, we constructed a non-sulfated chondroitin 10-mer ensemble using the algorithm and compared it to chondroitin 10-mer conformational ensembles generated by MD using the same protocol as the 20-mer simulations. We also constructed conformational ensembles of a non-sulfated chondroitin 100-mer and 200-mer to demonstrate the efficacy and efficiency of our algorithm in constructing conformational ensembles of chondroitin polymers with biologically relevant chain lengths.
