*3.2. Folding and Flexibility of Globular Proteins*

The CABS model has been applied to de novo simulations of protein folding (i.e., using no knowledge of the protein structure) for several model systems that have been extensively studied both experimentally and computationally. These include barnase [50,52], chymotrypsin inhibitor [50,52], the B1 domain of protein G [49,50], the B domain of protein A [53], and others [50,54]. The CABS modeling protocol was also extended to study the effect of chaperonins on the folding mechanism [55]. In these works, various properties were analyzed, including residue–residue contact frequencies, radius of gyration, and residual secondary structure. The resulting pictures, which covered protein dynamics from highly denatured states to near-native ensembles, agreed well with the available experimental data.
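
Two of the ensemble properties listed above, radius of gyration and residue–residue contact frequency, are straightforward to compute from a coarse-grained trajectory. The minimal sketch below assumes that each frame is a list of C-alpha coordinates in Å; the 8 Å contact cutoff, the minimum sequence separation of 3, and the function names are illustrative assumptions, not parameters or tools taken from the cited studies.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(frame: Sequence[Coord]) -> float:
    """Root-mean-square distance of the C-alpha atoms from their centroid."""
    n = len(frame)
    cx = sum(x for x, _, _ in frame) / n
    cy = sum(y for _, y, _ in frame) / n
    cz = sum(z for _, _, z in frame) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in frame)
    return math.sqrt(sq / n)


def contact_frequency(frames: Sequence[Sequence[Coord]], cutoff: float = 8.0,
                      min_separation: int = 3) -> List[List[float]]:
    """Fraction of frames in which residues i and j lie within `cutoff` (Å).

    Pairs closer than `min_separation` in sequence are skipped (left at 0.0),
    because they are trivially close through the chain.
    """
    n = len(frames[0])
    freq = [[0.0] * n for _ in range(n)]
    for frame in frames:
        for i in range(n):
            for j in range(i + min_separation, n):
                if math.dist(frame[i], frame[j]) <= cutoff:
                    freq[i][j] += 1.0
                    freq[j][i] += 1.0  # keep the map symmetric
    return [[count / len(frames) for count in row] for row in freq]
```

In the resulting matrix, values near 1 mark persistent contacts, while intermediate values flag transient ones, which is the kind of information that distinguishes denatured from near-native ensembles.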

For example, simulations of barnase folding reproduced the folding pathway in close agreement with NMR data for denatured states and with phi-value analysis [52]. The simulations show that barnase folding starts with the formation of a folding nucleation site consisting of protein fragments that correspond to two strands of a beta sheet and one of the helices in the folded structure (presented in Figure 5d). In addition, the characteristic patterns of hydrophobic interactions that are crucial for the initiation and maintenance of folding agree with the experimental data (see the discussion in Reference [52]; the contact map obtained from these simulations is presented in Figure 5d).

#### **4. Conclusions**

The case studies presented here review applications of the CABS model to simulations of disordered or unfolded protein states. As discussed, the method captured experimentally determined features of the investigated systems, such as binding-site localization, key contacts, peptide hot-spot areas, distinct conformational states of the system, transient encounter complexes, and intermediate states in protein folding [49,52,63,64]. Additionally, CABS enables investigation of the fluctuations of globular proteins around the native (input) structure [69–73].
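
Fluctuations around the native structure are commonly summarized as a per-residue root-mean-square fluctuation (RMSF) profile. The minimal sketch below assumes that the frames are already superimposed on a common reference, so no structural fitting is performed, and measures fluctuations about the ensemble-average position; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-residue RMSF about the ensemble-average position.

    Assumes all frames are pre-aligned on a common reference structure.
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    result = []
    for i in range(n_res):
        # Average position of residue i over the ensemble.
        mean = tuple(sum(frame[i][k] for frame in frames) / n_frames for k in range(3))
        # Mean squared displacement of residue i from that average.
        msd = sum(math.dist(frame[i], mean) ** 2 for frame in frames) / n_frames
        result.append(math.sqrt(msd))
    return result
```

Peaks in such a profile typically correspond to flexible loops and termini, whereas low values mark the rigid core.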

A number of tools are commonly used for sampling disordered protein states, and their predictions agree with experimental studies [91–95]. The CABS method is complementary to these tools and provides a unique approach that allows effective modeling of both ordered and disordered elements of the system. As observed in many previous studies, these features allow CABS to provide accurate pictures of folding pathways [49,52–56,60] and near-native dynamics [70,71]. Because of its coarse-graining, geometric details are inevitably lost, and their reconstruction is approximate [11,28]. The main distinctive feature of CABS compared with the available tools is that ensemble generation is driven by a (pseudo-)energy and may therefore provide some information on the dynamics of the system. This is not the case for the above-mentioned methods based on random walks [91,92,95].
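
The difference between energy-driven sampling and a random walk can be made concrete with the Metropolis acceptance criterion, the standard rule in Monte Carlo sampling. CABS is based on Monte Carlo sampling, but the sketch below is a generic one-dimensional toy that assumes a simple harmonic energy; it does not reproduce the CABS move set or force field, and the function name is hypothetical.

```python
import math
import random
from typing import Callable, List


def metropolis(energy: Callable[[float], float], x0: float, n_steps: int,
               step: float, kT: float, rng: random.Random) -> List[float]:
    """Generic Metropolis Monte Carlo on a single coordinate.

    A random trial move is accepted with probability min(1, exp(-dE / kT)),
    so the samples follow the Boltzmann distribution of `energy`.
    """
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # symmetric trial move
        e_new = energy(x_new)
        d_e = e_new - e
        # Downhill moves are always accepted; uphill moves only sometimes.
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
            x, e = x_new, e_new
        samples.append(x)
    return samples
```

For the toy energy E(x) = x^2 with kT = 1, the sampled mean of x^2 approaches the Boltzmann value of 0.5. Accepting every trial move unconditionally would turn the same loop into a plain random walk whose samples carry no information about the energy surface.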

On the other hand, the side-chain interaction terms of the CABS force field lack a clear physical interpretation, which may be a disadvantage compared with physics-based approaches that allow a straightforward and detailed description of each term [93,94].

It is, however, noteworthy that statistical force fields suffer from inherent limitations that depend on the chosen method of derivation. The most commonly discussed challenges include transferability, solvent interactions, and the integration of experimental data. Here, we briefly summarize these topics; a detailed discussion of the limitations of this approach and possible workarounds can be found in review works [11,17]. The transferability of statistical force fields may be limited, as they are always applicable only to a certain subset of proteins. Therefore, knowledge-based approaches may perform poorly for rare or atypical structures for which appropriate statistics of contact patterns could not be collected. It should also be noted that interactions with the solvent are averaged and treated implicitly, which may lead to significant discrepancies if the method is applied to non-standard solvent conditions (such as extreme pH values). The CABS force field is derived assuming averaged solvent conditions for folded globular proteins. Therefore, subtle effects, such as those of small molecules or pH, cannot be simulated rigorously, although averaged effects (see the modeling of the chaperonin effect [55]) can be approximately taken into consideration.
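
Why sparse statistics hurt transferability can be illustrated with the inverse Boltzmann relation commonly used to derive knowledge-based potentials, E(a, b) = -kT ln(N_obs(a, b) / N_exp(a, b)). The sketch below assumes this generic scheme with an added pseudocount c; it is not the derivation of the CABS force field, and the residue-pair labels and counts in the test are invented for illustration.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[str, str]


def inverse_boltzmann(observed: Dict[Pair, int], expected: Dict[Pair, float],
                      kT: float = 1.0, pseudocount: float = 1.0) -> Dict[Pair, float]:
    """Generic knowledge-based pair energy (illustrative, not the CABS form).

    E(a, b) = -kT * ln((N_obs + c) / (N_exp + c)); the pseudocount c keeps
    the logarithm finite for pairs that were never observed.
    """
    return {
        pair: -kT * math.log((observed.get(pair, 0) + pseudocount)
                             / (expected[pair] + pseudocount))
        for pair in expected
    }
```

For a rare pair with an expected count of 2, observing zero versus one contact shifts the derived energy by ln 2, about 0.7 kT, whereas one extra count barely changes a pair observed hundreds of times. This is why rare or atypical structures can be described poorly by statistical energies.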

One of the most challenging tasks in modeling protein systems is the effective incorporation of sparse experimental data to drive the modeling procedure. In the CABS model, experimental data can be readily introduced into the simulation as distance restraints, weighted according to their certainty. A thorough discussion of this possibility is presented in the documentation of CABS-based tools for fast modeling of protein flexibility and protein–peptide docking [66,72,73]. On a similar basis, CABS simulations can be guided by computational predictions from other sources or integrated with other modeling tools of various resolutions. Therefore, the CABS model can be incorporated into integrative modeling pipelines that would benefit from its effective sampling scheme. The recently published standalone application and web server tools are available for integration with external pipelines (access links are presented in the last paragraph of Section 2).
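
Certainty-weighted distance restraints can be sketched as an additional energy term added to the force field. The flat-bottom harmonic form, the 1 Å tolerance, the tuple layout, and the function name below are all assumptions chosen for illustration; the actual restraint implementation is described in the documentation of the CABS-based tools cited above.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]
# Hypothetical layout: (residue i, residue j, target distance in Å, weight).
Restraint = Tuple[int, int, float, float]


def restraint_energy(frame: Sequence[Coord], restraints: Sequence[Restraint],
                     tolerance: float = 1.0) -> float:
    """Flat-bottom harmonic penalty (illustrative form, not the CABS one).

    Zero while |d_ij - target| <= tolerance, quadratic in the excess beyond
    it, and scaled by each restraint's weight (its assumed certainty).
    """
    total = 0.0
    for i, j, target, weight in restraints:
        violation = abs(math.dist(frame[i], frame[j]) - target) - tolerance
        if violation > 0:
            total += weight * violation ** 2
    return total
```

With this scheme, a low-certainty restraint receives a small weight, so violating it costs little, while a confident restraint dominates the penalty and steers sampling more strongly.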

**Author Contributions:** S.K. and A.K. conceptualized this review. M.P. performed the simulations and analyzed the results for the AR/FxxLF system. The review was written by M.P.C., A.E.B-D., A.K. and S.K.

**Funding:** This research was funded by NCN Poland, grant number MAESTRO2014/14/A/ST6/00088.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

