**2. Results and Discussion**

Figure 1 illustrates how coordinate uncertainty in NMR-derived "ensembles" (panel B) tracks coordinate variance in MD simulations (e.g., at 300 K in panel D). The position of the carbonyl oxygen atom in residue 42 varies both across structural models in the NMR ensemble and over MD trajectories (panels C and D), and this oxygen atom is splayed more than the carbonyl carbon to which it is attached in panels B–D. However, the crystallographic B-factor for this carbonyl oxygen (22.15) is neither particularly high nor much larger than that of the carbonyl carbon (21.83). Meanwhile, on the opposite side of that peptide bond's plane, the amide nitrogen of residue 43 is relatively well superimposed in the NMR ensemble and MD trajectory. The peptide plane appears to pivot around the amide nitrogen and proton. However, in the crystallographic structure, the B-factor of this amide nitrogen (21.47) is barely lower than those of the carbonyl atoms.

**Figure 1.** Backbone traces of residues 41–43 from Q8ZRJ2. (**A**) Crystallographic structure (PDB ID 2ES9) colored by B-factor, with blue being low, green being moderate, and red being high. Residue numbers shown in this panel apply to all panels. (**B**) FindCore superimposition of the NMR ensemble (PDB ID 2JN8). This superimposition was calculated using a core atom set drawn from all heavy atoms (using all deposited models in the FindCore calculation) and not merely the residues shown. THESEUS superimpositions of MD trajectories simulated using the AMBER force field at (**C**) 100 K and (**D**) 300 K, calculated from the entire MD trajectory using all heavy atoms and showing snapshots 100 and 1000. In panels (**B**–**D**), carbonyl oxygens are red, amide nitrogens are blue, carbons are green, and amide hydrogens are white. Note the splaying of the carbonyl oxygens in panels (**B**–**D**) and the relatively well superimposed amide nitrogens in panels (**B**) and (**D**). Even in panel (**C**), amide nitrogens are better superimposed than carbonyl oxygens. In general, peptide planes appear to pivot, with the amide protons and/or amide nitrogens being relatively immobile and the carbonyl oxygens at the opposite end of the peptide plane being relatively mobile. This pattern is not apparent in the B-factors depicted in panel (**A**).
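The "splaying" discussed for Figure 1 can be quantified as a per-atom coordinate variance across superimposed frames. The following is a minimal sketch, assuming the frames have already been superimposed (for example by THESEUS) and that one atom's positions are available as a list of (x, y, z) tuples; the function name and the toy coordinates are illustrative and are not taken from the original analysis.

```python
def coordinate_variance(positions):
    """Mean squared distance of one atom from its average position.

    positions: list of (x, y, z) tuples for a single atom, one per
    already-superimposed frame or model.
    """
    n = len(positions)
    centroid = [sum(p[i] for p in positions) / n for i in range(3)]
    return sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in positions
    ) / n


# Made-up positions: a tightly clustered atom and a splayed atom.
tight = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
splayed = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0), (0.0, 0.9, 0.0)]
print(coordinate_variance(tight), coordinate_variance(splayed))
```

A larger value for one atom than for its bonded neighbor corresponds to the visual impression of that atom being more spread out in the superimposed panels.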

Application of Friedman's test [36] to coordinate uncertainties of NMR structures (Figure 2, first two columns), ranked from lowest to highest on a per-residue basis, yielded results that confirmed what was observed in the development of the Expanded FindCore method [22]. For almost all NMR ensembles considered, whether superimposed using FindCore or THESEUS, the average rank of the carbonyl oxygen (O) atoms was higher than the average ranks of the amide nitrogen (N), Cα, and carbonyl carbon (C') atoms. In many structures, the average rank of C' and N atoms was lower than the average rank of the Cα atoms. Average ranks (averaged on a per-structure basis) of backbone heavy atoms in THESEUS superimposed MD trajectories (Figure 2, third column) were also higher for O atoms and lower for C' and N atoms. When analyzing crystallographic B-factors, however, average ranks did not generally vary much with the atom type (Figure 2, fourth column).
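The per-residue ranking and the Friedman statistic described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the input values are made up, the function names are hypothetical, ties receive average ranks, and the statistic omits the tie correction that statistical packages usually apply.

```python
from statistics import mean

ATOMS = ("N", "CA", "C", "O")


def rank_within_residue(values):
    """Rank values from lowest (1) to highest, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def average_ranks(per_residue):
    """per_residue: one row per residue, columns in ATOMS order."""
    ranked = [rank_within_residue(row) for row in per_residue]
    return [mean(col) for col in zip(*ranked)]


def friedman_statistic(per_residue):
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(per_residue), len(per_residue[0])
    rbar = average_ranks(per_residue)
    return 12 * n / (k * (k + 1)) * sum((r - (k + 1) / 2) ** 2 for r in rbar)


# Made-up coordinate uncertainties for five residues (columns N, CA, C, O).
toy = [
    [0.30, 0.35, 0.31, 0.60],
    [0.22, 0.28, 0.25, 0.41],
    [0.40, 0.38, 0.39, 0.75],
    [0.18, 0.24, 0.20, 0.33],
    [0.27, 0.31, 0.26, 0.52],
]
print(dict(zip(ATOMS, average_ranks(toy))))
print(friedman_statistic(toy))
```

With four atom types, an average rank of 2.5 for every atom type corresponds to no preference at all, which is roughly what the B-factor column of Figure 2 shows.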

**Figure 2.** Distribution of average ranks of coordinate uncertainties, variances, and B-factors of backbone heavy atoms. As described in the main text, atoms in each residue are ranked by (**A**–**D**) coordinate uncertainty of FindCore superimposed NMR ensembles, (**E**–**H**) coordinate uncertainty of THESEUS superimposed NMR structures, (**I**–**L**) coordinate variances of THESEUS superimposed MD trajectories, and (**M**–**P**) B-factors. For each structure, an average rank is calculated for each backbone heavy atom type: (first row) amide N, (second row) Cα, (third row) carbonyl C, and (fourth row) carbonyl O. For superimposed NMR ensembles (columns one and two) and MD trajectories (column three), a clear pattern is visible: average ranks for amide nitrogen atoms and carbonyl carbon atoms are often lower than average ranks for Cα atoms. The average ranks for carbonyl oxygen atoms are usually higher. When backbone heavy atoms are ranked by B-factor, however, the average ranks for all backbone heavy atoms typically lie between 2 and 3. The average ranks plotted in this figure are tabulated in Tables S2–S5, Supplementary Materials.

Multiple comparisons subsequent to Friedman's test (Figure 3) indicated that, for NMR ensembles and MD trajectories, the coordinate uncertainties and, respectively, variances (as ranked on a per-residue basis) for O atoms were significantly higher than the coordinate uncertainties or variances for N and C' atoms in almost all ensembles or trajectories explored. In many superimposed NMR ensembles, coordinate uncertainties for O atoms were also significantly higher than coordinate uncertainties for Cα atoms, and, in a few superimposed NMR ensembles, coordinate uncertainties for Cα atoms were higher than those for N and C' atoms. In most superimposed MD trajectories, coordinate variances for O atoms were also significantly higher than coordinate variances for Cα atoms, but coordinate variances for Cα atoms were not significantly higher than those for N and C' atoms. However, only a few crystal structures showed any significant differences in B-factors between atom types.

**Figure 3.** Results of Friedman's test and subsequent multiple comparisons analysis. A bar associated with a comparison X < Y that is n units high indicates that, in n structures, the assessed measure of coordinate variability is significantly lower for atom type X than for atom type Y. For example, in panel (**A**), the bar associated with C < O being 39 units high indicates that, in 39 NMR ensembles, the coordinate uncertainties (calculated using FindCore superimpositions) for carbonyl carbons are significantly lower (according to Friedman's test) than those for carbonyl oxygens. Mean ranks are considered significantly different if they differ by more than three standard deviations. Assessed measures of coordinate variability are (**A**) coordinate uncertainties in FindCore superimposed NMR ensembles, (**B**) coordinate uncertainties in THESEUS superimposed NMR ensembles, (**C**) coordinate variances in THESEUS superimposed MD trajectories, and (**D**) crystallographic B-factors. Note that, in almost all superimposed NMR ensembles (independent of superimposition method), as well as in almost all THESEUS superimposed MD trajectories, amide nitrogens and carbonyl carbons have significantly lower coordinate uncertainties or variances than carbonyl oxygens. However, only a small number of crystallographic structures yield any significant results when Friedman's test is used to compare B-factors of different atom types.
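The counting behind each bar in Figure 3 can be sketched for a single structure as below. The caption does not say which standard deviation the three-standard-deviation criterion refers to; this sketch assumes it is the null-hypothesis standard error of a difference between two Friedman mean ranks, sqrt(k(k+1)/(6n)) for k atom types and n residues. That choice, the function name, and the example mean ranks are assumptions made for illustration only.

```python
import math
from itertools import combinations


def significant_pairs(mean_ranks, n_residues, n_sd=3.0):
    """Return (lower, higher) atom-type pairs whose mean ranks differ
    by more than n_sd assumed standard errors."""
    k = len(mean_ranks)
    # Assumed criterion: SE of a difference of two mean ranks under the null.
    se = math.sqrt(k * (k + 1) / (6 * n_residues))
    pairs = []
    for a, b in combinations(mean_ranks, 2):
        lo, hi = sorted((a, b), key=mean_ranks.get)
        if mean_ranks[hi] - mean_ranks[lo] > n_sd * se:
            pairs.append((lo, hi))
    return pairs


# Made-up mean ranks for one 100-residue structure (they sum to 10 for k = 4).
example = {"N": 1.9, "CA": 2.4, "C": 2.0, "O": 3.7}
print(significant_pairs(example, n_residues=100))
```

Each returned pair (X, Y) would add one unit to the bar labeled X < Y; repeating the calculation over all structures gives the bar heights.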

Unlike superimposed NMR ensembles and MD trajectories, in which the coordinate uncertainties or variances of backbone heavy atoms in a residue tended to be lowest for N and C' atoms and highest for O atoms, crystallographic B-factors showed no such persistent pattern. On the other hand, the pattern in coordinate variances in superimposed MD structures persisted across MD trajectories run using different force fields (AMBER99SB vs. OPLS) and temperatures (100 K vs. room temperature) and did not depend on whether the SeMET residues found in the crystal structures used to seed MD calculations were replaced with MET residues.

That carbonyl oxygens tend to have significantly higher coordinate variances in THESEUS superimposed MD ensembles, as well as higher coordinate uncertainties across FindCore superimposed NMR "ensembles", indicates that the pattern of coordinate uncertainties observed in NMR-derived structures is not solely an artifact of the superimposition method (THESEUS vs. FindCore), of the particular force field used (AMBER and OPLS in MD simulations, CNS [37,38] and XPLOR-NIH [39] in NMR refinement), or of the particular characteristics of an NMR-based structural determination (e.g., a lack of experimentally derived restraints on carbonyl oxygen atoms). The persistence of the tendency for carbonyl oxygens to have higher coordinate variability in both ensembles explored via MD simulation and NMR-derived "ensembles", which typically consist of models resulting from replicated simulated annealing calculations, indicates that this tendency is not solely an artifact of the structure sampling scheme used in NMR calculations. It may be the case that NMR structures not refined using CNS or XPLOR-NIH do not generally have carbonyl oxygens with high relative coordinate uncertainties. However, the one unrefined structure (1XPV) analyzed in this study did have carbonyl oxygens with high relative coordinate uncertainties.

One possible explanation of the high relative carbonyl oxygen uncertainties in superimposed NMR structures and variances in MD trajectories is that force fields do not adequately restrain the positions of carbonyl oxygens. Carbonyl oxygen atoms are known to interact favorably with aromatic rings via n-π\* interactions [40,41] and also participate in hydrogen bonding, whose representation in classical force fields is often deficient [42]. Hydrogen bonding is important in stabilizing protein tertiary structure [43], and carbonyl oxygen atoms in regions of secondary structure typically participate in hydrogen bonds.

Figure 4 shows that the carbonyl oxygen atoms with relatively high coordinate uncertainties in NMR structures and relatively high coordinate variances in MD trajectories include both those participating in intramolecular hydrogen bonds and those hydrogen bonded only to solvent. Nevertheless, some carbonyl oxygen atoms in secondary structure have relatively greater coordinate variances across MD trajectories than coordinate uncertainties in NMR structures. As NMR-based structure calculations typically involve additional restraints on hydrogen bonding atoms (based on H/D exchange data and/or secondary structure as established from resonance assignments), it may be the case that MD simulations could benefit from better representation of hydrogen bonding [42] and other non-covalent interactions [41] in MD force fields.

In addition to potentially inadequately representing quantum mechanical phenomena such as hydrogen bonding and n-π\* interactions, many force fields strongly penalize any deviation of a peptide bond from planarity. In particular, requiring peptide bonds to remain planar may cause more complex motions of the amide backbone to be represented by simple rocking motions about an axis near the N–C bond axis but angled slightly toward the Cα. This motional model, by placing carbonyl oxygens furthest from the axis of motion (and Cα atoms second furthest), inappropriately represents them as being most mobile. Deficiencies in representing hydrogen bonding in force fields [42] may also be problematic when such deficiencies result in insufficient restraints on carbonyl oxygen positions. Hydrogen bonds, which are important in stabilizing protein tertiary structure [43], may represent important restraints on carbonyl oxygen positions across MD trajectories, just as they do in NMR-based structural determination.
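The geometric part of this argument can be made explicit with a small-angle sketch. Assume, for illustration only, that the peptide unit rocks as a rigid body through a small angle θ (with zero mean) about a fixed axis, and let d_i denote the perpendicular distance of atom i from that axis. Both symbols are introduced here and are not part of the original analysis.

```latex
\Delta r_i \approx d_i\,\theta
\quad\Longrightarrow\quad
\langle \Delta r_i^{2} \rangle \approx d_i^{2}\,\langle \theta^{2} \rangle ,
\qquad
\frac{\langle \Delta r_{\mathrm{O}}^{2} \rangle}{\langle \Delta r_{\mathrm{N}}^{2} \rangle}
\approx \left( \frac{d_{\mathrm{O}}}{d_{\mathrm{N}}} \right)^{2}
```

Under this simplified model, the apparent coordinate variance grows with the square of an atom's distance from the rocking axis, so the atoms furthest from the axis (the carbonyl oxygen, then Cα) are assigned the largest variances whether or not they are physically the most mobile.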

It is possible that the pattern of coordinate uncertainties and variances observed, respectively, in superimposed NMR and MD ensembles actually represents internal motions of peptide bond units in proteins in the solution state. Carbonyl oxygen atoms, branching off from the main polypeptide chain, may have enhanced thermal motion relative to backbone atoms on the main chain. In fact, other atoms branching off from the main chain, including Cβ atoms and even amide protons, tend to have significantly more coordinate uncertainty in superimposed NMR ensembles and coordinate variance across superimposed MD trajectories than amide nitrogen or carbonyl carbon atoms (Figures S1–S4, Supplementary Materials). In the crystallographic structures, however, Friedman's test finds Cβ B-factors to be significantly higher than amide nitrogen B-factors in more structures than it finds carbonyl oxygen B-factors to be.

**Figure 4.** F-scores comparing UniProt ID Q8ZRJ2 backbone heavy atom coordinate uncertainties and variances. (**A**) The first model in the NMR "ensemble" 2JN8, (**B**) the final snapshot of the MD trajectory seeded with 2ES9 (replacing Se-MET residues with MET residues, run at 300 K with the AMBER 99SB force field), and (**C**) the crystallographic structure, PDB ID 2ES9, each colored on a per-residue basis by the F-score described in the Materials and Methods section (Equation (1)). Red indicates an F-score greater than 10 (relative uncertainty, variance, or B-factor of carbonyl oxygen coordinates quite high), white an F-score equal to 1, and blue an F-score less than 0.1. Green indicates residues for which the coordinate uncertainty, variance, or B-factor for the carbonyl oxygen was actually less than that for the corresponding amide nitrogen. Dotted lines indicate hydrogen bonds: carbonyl oxygen atoms with high relative coordinate uncertainties and variances occur both in hydrogen-bonded secondary structure and in loop regions. Some helical regions, likely endowed with extra restraints in the NMR-based structure determination process, do have slightly fewer carbonyl oxygens with high relative coordinate uncertainty as compared with the MD trajectory, illustrating the potential importance of hydrogen bonding in "fixing" the position of carbonyl oxygen atoms with high relative coordinate variances. By comparison, the crystallographic structure, PDB ID 2ES9, has relatively few carbonyl oxygens with high relative B-factors, as indicated by the relative dearth of red in panel (**C**).

It is often assumed that crystallographic B-factors correlate well with internal flexibility. Under that assumption, the absence of a persistent pattern in the B-factors for backbone atoms would suggest that any such pattern observed in MD trajectories and NMR "ensembles" is an artifact. However, even in an ideal case where crystallographic B-factors arise entirely from static and dynamic disorder, these B-factors reflect protein dynamics in the crystalline state and not in the solution state [44]. Moreover, previous studies have not only shown that NMR coordinate uncertainties correlate well with coordinate variances in MD trajectories but also have demonstrated that crystallization has a "flattening" effect on protein flexibility [25]. Additionally, since Debye-Waller theory attributes any reduction in diffraction pattern intensities, relative to those expected given a static protein structure, to localized harmonic motion, other processes that reduce diffraction pattern intensities may result in over-estimation or even under-estimation of protein flexibility [45]. Relatedly, values obtained for B-factors depend on the refinement techniques used in interpreting X-ray data [25].
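For readers who want to place B-factors on the same footing as coordinate variances, the standard isotropic Debye-Waller relation, B = 8π²⟨u²⟩, converts a B-factor into a mean-square displacement along one direction. The sketch below applies it to the three B-factors quoted for Figure 1, assuming they are in the conventional units of Å² (the units are not stated in the text) and using a hypothetical helper name.

```python
import math


def rms_displacement_from_b(b_factor):
    """Isotropic Debye-Waller relation B = 8*pi^2*<u^2>.

    Returns sqrt(<u^2>), the rms displacement along one direction,
    in the length unit implied by the B-factor (assumed here: angstrom).
    """
    return math.sqrt(b_factor / (8 * math.pi ** 2))


# B-factors quoted in the discussion of Figure 1 (units assumed to be A^2).
quoted = {"O (residue 42)": 22.15, "C (residue 42)": 21.83, "N (residue 43)": 21.47}
for atom, b in quoted.items():
    print(f"{atom}: B = {b:.2f} -> rms u = {rms_displacement_from_b(b):.3f}")
```

Under this relation the three atoms differ by less than 2% in rms displacement, which is consistent with the observation that the B-factors do not reproduce the splaying seen in the NMR and MD superimpositions.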

Nevertheless, the patterns described in this paper, as well as the relatively high correlations between the statistical coordinate uncertainties derived from NMR and the putatively physical coordinate variances across MD ensembles, may very well indicate deficiencies common to all force fields. Fully exploring the pervasiveness of these patterns will require MD simulations and analysis of NMR structures beyond the systems studied here. However, the analysis presented in this paper shows that coordinate variances and uncertainties from at least some MD trajectories and NMR ensembles have properties not found in B-factors. This divergence between B-factors and coordinate variances potentially indicates that there remain critical concerns in force field development. Future studies of MD trajectories will hopefully reveal which potentially inaccurate aspects of force fields, such as the requirement that peptide bonds remain planar and inadequacies in the representation of non-covalent interactions (including hydrogen bonding and solvent/protein interactions), need the most adjustment. Addressing such deficiencies in force field construction can result in better descriptions of protein structure and, hence, facilitate the accurate prediction of protein dynamics, structure, and folding pathways.
