**2. CABS Dynamics and Interaction Model**

Since its development, the CABS model (C-alpha, C-beta and Side chain model) has been applied to a variety of modeling problems, such as protein folding mechanisms [49,50,52–57], protein structure prediction [58–61], protein–peptide docking including large-scale conformational flexibility [62–68] and simulations of near-native fluctuations of globular proteins [69–73]. When combined with careful bioinformatics selection of the generated models, CABS proved to be one of the two most accurate structure prediction tools evaluated in the CASP (Critical Assessment of protein Structure Prediction) experiment [60]. The CABS model uses up to four atoms or pseudo-atoms per residue (see the description below), but outputs protein systems in C-alpha representation only. Therefore, for practical applications, the obtained models need to be reconstructed to all-atom representation. In the various multiscale modeling tools discussed below, CABS has been integrated with a MODELLER-based reconstruction procedure [74]. Other reconstruction scenarios are also possible and may improve the quality of the local protein structure. One option is to combine separate tools for backbone reconstruction from the C-alpha trace and for side chain reconstruction, for example BBQ [75] and SCWRL [76], respectively, optionally followed by further refinement [77].

In this review, we discuss the applicability of the CABS CG model and its knowledge-based statistical force field [28] to the modeling of disordered or unfolded protein states. In the CABS model the polypeptide chain representation is reduced to up to four united atoms per residue (see Figure 1). These interaction centers represent lattice-confined C-alpha atoms, C-beta atoms, the united side chain pseudo-atom and, additionally, pseudo-atoms representing the geometrical centers of peptide bonds, which are needed to define the hydrogen pseudo-bond. An example of a polypeptide chain in CABS representation is presented in Figure 1b. The restriction of the C-alpha trace to the underlying cubic lattice with a small spacing (0.61 Å [28]) may appear to be a drastic simplification, but it is not. Allowing small fluctuations of the C-alpha–C-alpha distance enables hundreds of possible orientations of this pseudo-bond, and thereby the resulting model chains do not show any noticeable directional biases. Furthermore, the average resolution of the C-alpha traces is acceptable and below 0.5 Å [28]. Additionally, the lattice representation enables pre-calculation of local moves and of the corresponding changes of interactions, which makes simulations several times faster than with otherwise equivalent continuous-space CG models [11].
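The claim that a fluctuating C-alpha–C-alpha distance yields hundreds of pseudo-bond orientations can be checked with a short enumeration. The sketch below is only illustrative: the lattice spacing of 0.61 Å comes from the text, whereas the window of allowed squared bond lengths (29 to 49 lattice units, roughly 3.3 to 4.3 Å) and the function name `allowed_bond_vectors` are assumptions introduced here, not values confirmed by this section.

```python
from itertools import product

LATTICE_SPACING = 0.61  # Å, lattice spacing quoted in the text

# ASSUMPTION: window of allowed squared C-alpha-C-alpha vector lengths,
# in lattice units, chosen for illustration only.
MIN_SQ, MAX_SQ = 29, 49


def allowed_bond_vectors(min_sq=MIN_SQ, max_sq=MAX_SQ):
    """Enumerate integer lattice vectors whose squared length lies in the window."""
    reach = int(max_sq ** 0.5) + 1  # no component can exceed sqrt(max_sq)
    vectors = []
    for x, y, z in product(range(-reach, reach + 1), repeat=3):
        if min_sq <= x * x + y * y + z * z <= max_sq:
            vectors.append((x, y, z))
    return vectors


vectors = allowed_bond_vectors()
# Convert lattice lengths to Å to see the range of pseudo-bond lengths.
lengths = [LATTICE_SPACING * (x * x + y * y + z * z) ** 0.5 for x, y, z in vectors]
print(len(vectors), round(min(lengths), 2), round(max(lengths), 2))
```

Because every enumerated vector is a distinct direction or length on the lattice, the count gives a rough sense of how finely the pseudo-bond orientation is discretized under the assumed window.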

The CABS model uses a knowledge-based statistical force field that consists of generic, sequence-independent interaction terms that favor protein-like conformations, and sequence-dependent interaction terms that determine some structural details [11,28,78]. The generic force field terms are derived from general features of polypeptide chains that result in protein-like behavior of the model chains. They account for properties of protein chains such as local stiffness, biases toward secondary structures and packing compactness. The residue–residue interaction terms are based on contact geometry statistics collected from folded globular proteins (illustrated in Figure 2a). Nevertheless, the local packing regularities in unfolded states appear to be very similar to those observed in native structures [11,28,33]. Consequently, CABS simulations have provided correct pictures of protein folding [49,52–56,60] and of the flexibility of globular proteins [70,71].

The resulting force field takes the form of a precomputed matrix of contact pseudo-energies, presented schematically in Figure 2b. Additionally, to allow successful modeling of membrane proteins, the CABS force field can be extended by introducing effective dielectric constant terms [79].

**Figure 1.** A three-residue protein fragment in: all-atom (**a**) and CABS model (**b**) representation. The spheres represent atoms: blue, C-alpha and C-beta atoms (the same in both representations); yellow, side chain atoms (one pseudo-atom in CABS); red, atoms involved in the peptide bond (one pseudo-atom in CABS, placed in the geometric center of the peptide bond). A single slice (layer) of the lattice that confines the C-alpha trace in the CABS model is also presented.


**Figure 2.** Key elements of a residue–residue interaction term in the CABS model force field. Panel (**a**) shows three examples of contact geometries in CABS representation: parallel (P), antiparallel (A), and intermediate (M), used to derive contact statistics from experimentally derived structures of folded globular proteins. Panel (**b**) shows an example matrix of contact energies, which depend on the geometry of the contacting pair and the main chain geometry (compact (C) or extended (E)) of both amino acids (left part of the panel), and also on the amino acid identities (right part of the panel; the amino acids are represented using the one-letter code). The matrix shown is PCC, which gives the interaction energies between residues in parallel orientation (P) where both residues belong to a compact type of structure (C).

The main difference between CABS and other statistical force fields used in CG models of similar resolution [11] is the context and orientation dependence of the side chain interaction pseudo-energy, which encodes characteristic patterns observed in globular proteins. For instance, oppositely charged side chains in single globules mostly contact in an almost parallel fashion (usually on the surface of a globule), while antiparallel contacts (usually in the buried regions of the protein globule) are very rare. Therefore, in the context-dependent force field these antiparallel contacts of oppositely charged residues are treated as repulsive. In this way, the CABS force field implicitly incorporates information on the complicated interaction patterns with the solvent (via contact statistics) and its entropic contribution to system thermodynamics [11,28].
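The context and orientation dependence described above can be pictured as a lookup keyed by contact geometry, main chain geometry and amino acid identity. The following is a minimal sketch under stated assumptions: the orientation is classified from the angle between two side chain vectors using a hypothetical cosine cutoff, the table is assumed to be symmetric under exchange of the two residues, and the two energy values are placeholders chosen only to mirror the charged-pair example in the text, not CABS parameters.

```python
import math


def orientation_class(sc_vec_i, sc_vec_j, cutoff=0.5):
    """Classify a contact as parallel (P), antiparallel (A) or intermediate (M).

    ASSUMPTION: the class is taken from the cosine of the angle between the
    two side chain vectors; the cutoff of 0.5 is hypothetical.
    """
    dot = sum(a * b for a, b in zip(sc_vec_i, sc_vec_j))
    norm_i = math.sqrt(sum(a * a for a in sc_vec_i))
    norm_j = math.sqrt(sum(b * b for b in sc_vec_j))
    cos_angle = dot / (norm_i * norm_j)
    if cos_angle > cutoff:
        return "P"
    if cos_angle < -cutoff:
        return "A"
    return "M"


# PLACEHOLDER pseudo-energies (arbitrary units), not taken from CABS.
# Key: (orientation, main chain geometry of i, of j, amino acid i, amino acid j)
CONTACT_ENERGY = {
    ("P", "C", "C", "K", "E"): -1.0,  # parallel charged pair: attractive
    ("A", "C", "C", "K", "E"): +1.0,  # antiparallel charged pair: repulsive
}


def contact_energy(orientation, conf_i, conf_j, aa_i, aa_j, table=CONTACT_ENERGY):
    """Look up a contact pseudo-energy; unknown contexts default to 0.0."""
    key = (orientation, conf_i, conf_j, aa_i, aa_j)
    if key in table:
        return table[key]
    # ASSUMPTION: the table is symmetric when both residues are swapped.
    mirrored = (orientation, conf_j, conf_i, aa_j, aa_i)
    return table.get(mirrored, 0.0)


geometry = orientation_class((1.0, 0.0, 0.0), (-1.0, 0.1, 0.0))
print(geometry, contact_energy(geometry, "C", "C", "K", "E"))
```

A flat dictionary keeps the sketch short; a precomputed array indexed by the same five attributes would serve the same purpose.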

Applying a mean-force force field derived from folded proteins to simulations of less-structured systems raises justified questions about the validity of this approach in studies of disordered protein regions. The folding events observed in simulations performed using the CABS force field are consistent with both the experimental data and all-atom MD simulations [49,52,80,81]. Thus, it is hypothesized that unstructured (unfolded, partially unfolded or intrinsically disordered) proteins share, to a significant extent, the stabilizing interaction patterns observed for their well-structured counterparts [82,83].

The CABS method uses the MC asymmetric Metropolis sampling scheme that governs a set of local motions as well as multi-residue, small-distance moves of the C-alpha atoms (see Figure 3). The method uses a replica exchange algorithm with simulated annealing to enhance the sampling of conformational states. The simulation is organized as a set of nested loops, in which *s* MC steps are grouped into each of *y* MC cycles, and these in turn into *a* annealing cycles. Each MC step consists of a preset number of attempts to perform each of the five standard precomputed moves. The available motions and the details of implementation of the sampling scheme are presented in Figure 3.
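For readers unfamiliar with replica exchange, the swap test between two replicas can be sketched as follows. This is the textbook parallel tempering criterion in reduced units (Boltzmann constant set to 1); whether CABS uses exactly this form is not stated in this section, and the function name `swap_probability` is introduced here for illustration.

```python
import math


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Textbook replica exchange acceptance, min(1, exp[(b_i - b_j)(E_i - E_j)]).

    Reduced units with k_B = 1, so b = 1 / T. Replica i currently sits at
    temperature temp_i with energy energy_i, and likewise for replica j.
    """
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# The colder replica (T = 1) holds the higher energy: the swap is always accepted.
print(swap_probability(-5.0, 1.0, -10.0, 2.0))
# The colder replica already holds the lower energy: the swap is accepted only sometimes.
print(swap_probability(-10.0, 1.0, -5.0, 2.0))
```

Swaps of this kind let conformations trapped at low temperature escape through the higher-temperature replicas, which is the sampling enhancement the paragraph above refers to.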

**Figure 3.** Sampling scheme of the CABS model. Blue panels show implementation details of Monte Carlo (MC) iterations (loops). The orange panel shows all motions that may be performed in a single MC step. The simulation is organized as a set of nested loops, in which the *s* number of MC steps is organized into the *y* number of cycles, and these in *a* annealing cycles (number of *a*, *y* or *s* cycles can be controlled by the user in CABS-flex and CABS-dock standalone packages [72]). In the orange panel, numbers 1 to 5 denote the available moves, presented together with the number of attempts to perform a move in each of the MC steps. The resulting trajectory is comprised of simulation snapshots saved at the end of each MC cycle.
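The nested loop organization can be illustrated with a toy example. The sketch below keeps only the loop structure (*a* annealing cycles, *y* MC cycles, *s* MC steps, one snapshot per MC cycle) and makes several simplifying assumptions that are not part of CABS: a one-dimensional toy energy, a single random displacement in place of the five moves, the plain symmetric Metropolis criterion, a linear temperature schedule, and no replica exchange. All names and default values are hypothetical.

```python
import math
import random


def toy_energy(x):
    """Toy 1D energy standing in for the CABS force field (assumption)."""
    return x * x


def metropolis_accept(delta_e, temperature, rng):
    """Plain Metropolis criterion in reduced units (k_B = 1)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)


def run_toy_simulation(a, y, s, t_start=2.0, t_end=1.0, seed=0):
    """Run a annealing cycles of y MC cycles of s MC steps each."""
    rng = random.Random(seed)
    x = 5.0
    trajectory = []
    for anneal in range(a):
        # ASSUMPTION: temperature decreases linearly over the annealing cycles.
        frac = anneal / (a - 1) if a > 1 else 1.0
        temperature = t_start + frac * (t_end - t_start)
        for _cycle in range(y):
            for _step in range(s):
                trial = x + rng.uniform(-0.5, 0.5)  # one toy move per step
                if metropolis_accept(toy_energy(trial) - toy_energy(x), temperature, rng):
                    x = trial
            trajectory.append(x)  # snapshot saved at the end of each MC cycle
    return trajectory


trajectory = run_toy_simulation(a=3, y=4, s=10)
print(len(trajectory))  # a * y snapshots
```

With this layout the trajectory length is fixed by *a* and *y*, while *s* controls how much sampling separates consecutive snapshots.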

The combination of the key features of CABS (its representation, force field and the scale of the movements used in the MC scheme) makes it suitable for the investigation of protein pseudo-dynamics. As mentioned above, the fine-grained lattice improves sampling efficiency, achieving effective timescales of milliseconds. Compared with MD, this is a considerably broader time range: in a study of the flexibility of folded proteins [71], CABS dynamics was estimated to be around 6 × 10³ times cheaper in terms of computational cost than classical MD. The chosen micro-motions allow cooperative, large-scale motions through accumulation over simulation steps. The ensemble of structures produced by the CABS method resembles a dynamic ensemble averaged over the effective timescale. Due to the nature of the method, the picture of local dynamics is distorted at the level of local moves. However, based on the works mentioned above that compared our simulations with experimental data, it may be argued that the long-time pseudo-dynamics recovers a realistic picture of protein motions averaged over time.

The timescale of CABS simulations is not defined a priori and depends on the CABS simulation temperature, because the statistical force field contains hidden entropic contributions that account for implicit solvent effects and multi-body interactions. Nevertheless, the effective timescale of MC dynamics can be approximately identified by comparison with MD trajectories from sufficiently long simulations. This comparison was discussed in detail previously, against both MD results [69] and NMR ensembles [71].

The CABS model is presently used as the simulation engine of a few multiscale modeling tools that merge CABS with model reconstruction to all-atom resolution. Those include the CABS-dock method for flexible protein–peptide docking (available as a web server [62] at http://biocomp.chem.uw.edu.pl/CABSdock and a standalone application [84] at https://bitbucket.org/lcbio/cabsdock/, accessed on 30 January 2019). In comparison to other protein–peptide docking tools, reviewed recently [85], CABS-dock offers a unique opportunity for modeling large-scale rearrangements of protein receptor structure during on-the-fly docking of fully flexible peptides. Another CABS-based tool, CABS-flex, enables fast simulations of protein flexibility (available as a web server [73] at http://biocomp.chem.uw.edu.pl/CABSflex and a standalone application [72] at https://bitbucket.org/lcbio/cabsflex/, accessed on 30 January 2019). This approach has also been incorporated as a module in the Aggrescan3D method for prediction of protein aggregation properties (available as a web server [86] at http://biocomp.chem.uw.edu.pl/A3D and a standalone application at https://bitbucket.org/lcbio/aggrescan3D, accessed on 30 January 2019). By using CABS-flex predictions, Aggrescan3D enables predicting the impact of protein conformational fluctuations on aggregation properties. Finally, the CABS model is used in the CABS-fold method for protein structure prediction, either in a de novo fashion (from an amino acid sequence only) or guided by user-provided templates or distance restraints (available as a web server [58] at http://biocomp.chem.uw.edu.pl/CABSfold/, accessed on 30 January 2019). The CABS-based tools, together with their descriptions, are also available from the websites of the laboratories: http://biocomp.chem.uw.edu.pl/ and http://lcbio.pl/ (accessed on 30 January 2019).
