**1. Introduction**

Large molecules and biomolecules can have a high degree of motional flexibility, affecting their function [1]. Common sources of information about protein flexibility and modes of motion include crystallographic B-factors [2], molecular dynamics (MD) simulations [3], and Nuclear Magnetic Resonance (NMR) spectroscopy, including relaxation measurements [4–7] and even chemical shift data [8,9].

Each of the above techniques for evaluating protein flexibility yields an incomplete picture of protein dynamics in solution. Crystallographic B-factors are affected by packing and other special features of the crystalline state [10]. In addition, many factors may reduce the intensities of the "reflections" in a protein crystal's X-ray diffraction pattern, and, hence, elevated crystallographic B-factors that may not solely indicate macromolecular flexibility [11]. The quality of MD simulations is dependent on the quality of the seed structure and force field used, despite recent efforts applying MD simulations to NMRderived structures [12]. NMR relaxation experiments provide a critical source of data for evaluating individual MD trajectories as well as the force fields and other methodological details of MD simulations [13,14]. The combination of multiple assessments of protein flexibility has proven particularly illuminating [15]. For example, the combination of NMR-relaxation data with MD simulations yields a detailed picture of protein dynamics and motional modes [16].

While lacking a universally accepted analog of the B-factor, the NMR-based structure determination process itself provides insight into protein flexibility. Atoms in loop residues and other flexible regions of a protein typically have fewer long-range "contacts" to atoms in other residues. This paucity of contacts leads to both increased flexibility of loop regions [17,18] as well as poor convergence for loop residue positions in NMR-based

**Citation:** Reinknecht, C.; Riga, A.; Rivera, J.; Snyder, D.A. Patterns in Protein Flexibility: A Comparison of NMR "Ensembles", MD Trajectories, and Crystallographic B-Factors. *Molecules* **2021**, *26*, 1484. https:// doi.org/10.3390/molecules26051484

Academic Editor: Marilisa Leone

Received: 31 December 2020 Accepted: 28 February 2021 Published: 9 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

structure calculations [19–22], provided the structure refinement process does not lead to inaccurate rather than imprecise coordinates [23]. Moreover, the primary source of structural restraints in NMR-based structure calculations are NOESY (Nuclear Overhauser Effect Spectroscopy) experiments. Fast motions reduce NOEs while intermediate time scale motion causes line broadening that can interfere with the identification of NOESY cross-peaks. Thus, NMR yields a paucity of restraints for particularly flexible regions of a protein leading to poor convergence in NMR-based structure determination, and coordinate uncertainties in an NMR-derived "ensemble" of structures [24]. While, strictly speaking, such coordinate uncertainties measure the local reproducibility of the NMRbased structural determination process, coordinate uncertainties across NMR ensembles are highly correlated to coordinate variances across MD trajectories [25].

While they can provide key insights into protein flexibility and dynamics, evaluation of uncertainties in protein structure coordinates inferred from NMR data is a non-trivial and non-physical process. Typically, NMR-based structure calculations generate multiple (typically 10–40) models [22]. Such collections of structural models are called "ensembles". Although NMR ensemble generation can effect Boltzmann sampling [19–21], generally NMR ensembles, including those analyzed in this study, are not actually Boltzmann ensembles.

Calculation of coordinate variances requires the superimposition of NMR ensembles. However, inclusion of poorly converged coordinates can bias the superimposition process, reducing the applicability of the resulting coordinate variances [26,27]. Limiting the calculation of an optimal superimposition to a core atom set, determined in a superimposition independent manner using either circular variances of backbone dihedral angles [28] or an interatomic variance matrix [27,29], ensures calculation of optimal superimpositions and, hence, of appropriate coordinate uncertainties. Alternatively, assumptions concerning the distribution of coordinate variances can lead to model-based superimposition methods such as THESEUS, which assumes a multivariate Gaussian distribution of coordinate uncertainties [30,31].

Identification of a core atom set is a critical step in solving two distinct, albeit related, problems. Not only does identification of a core atom set an important step in calculating coordinate uncertainties via superimposition, but such a core atom or residue sets also convey in which regions the NMR-based structure calculation process has converged [22]. Since these two problems are different, their optimal solutions may differ slightly. For example, application of the FindCore method, which identifies core atom sets for use in assessing the precision of NMR ensembles, to the distinct, albeit related problem of identifying well-converged core atom sets for CASP10 [32,33], required extension of the FindCore method into an approach known as Expanded FindCore [22].

Software used in the CASP10 competition also required any residue with core atoms to have all backbone heavy atoms in the core. The process of modifying Expanded Find-Core to meet this requirement revealed carbonyl oxygens from otherwise well-defined residues whose positions were poorly defined in NMR-based structural calculations. Given the relation between coordinate uncertainties in NMR-derived structures and physical flexibility as described above, this discovery raised questions about the high uncertainties (relative to other backbone heavy atoms in the same residue) of those carbonyl oxygens. How common are these relatively uncertain carbonyl oxygens and is this high relative uncertainty an artifact of the NMR-based structure determination process or is it indicative of a pattern in backbone atom flexibilities?

Addressing these questions requires a comparison of NMR ensembles with complementary structural information, such as that obtained from crystallographic data, as well as with MD trajectories that provide insight into protein flexibility. Protein structures obtained by the North East Structure Genomics (NESG; http://www.nesg.org/ accessed on 31 December 2020) consortium facilitated this analysis. The NESG performed crystallization and HSQC (Heteronuclear Single Quantum Coherence Spectroscopy) screening in parallel for robustly expressed protein targets resulted in more than 40 NMR/X-ray crystal structural pairs [34,35].

The analysis presented here demonstrates the persistence of a pattern in coordinate variances across structural "ensembles" obtained using multiple force fields, superimposition techniques, and sampling schemes (i.e., restrained, simulated annealing and similar schemes in NMR structural refinement vs. the unrestrained constant temperature approach used in MD). This persistent pattern does not necessarily occur in Crystallographic Bfactors of backbone heavy atoms. That the relatively high uncertainty of carbonyl oxygens persists, in almost all MD trajectories simulated in this study, indicates that the relatively high uncertainty of carbonyl oxygens is not solely an artifact of NMR-based structural determination. The pattern in backbone heavy atom coordinate uncertainties reflects either a physical reality of peptide bond motion not evident in crystallographic data or a shortcoming common to multiple force fields. If the latter explanation is true, the analysis presented here underscores that further improvements in force field parameterization are necessary for better prediction and calculation of a protein structure and dynamics.
