#### **1. Introduction**

Intrinsically disordered proteins (IDPs) play important roles in the regulation of many biological processes, such as cell growth, cell signaling, and cell survival. To exert these functions, their intrinsically disordered regions (IDRs) often bind to different proteins with high specificity and low affinity [1–4]. In many cases, IDRs adopt well-structured conformations when bound to their partners [5]. Segments that undergo such a disorder-to-order transition upon binding are frequently called Molecular Recognition Features (MoRFs) in the literature [4,6–10].

A typical IDR containing a MoRF is the WASP-homology 2 (WH2) motif, which is found in about 50 proteins [11]. WH2 motifs are actin-binding modules of about 30–50 residues that are key players in regulating actin polymerization and the dynamics and organization of the actin cytoskeleton [11–13]. Proteins of the WH2 family can contain one to four WH2 motifs, each able to bind one G-actin monomer (Table S1). In the unbound state, WH2 motifs are intrinsically disordered, and, in complex with actin, they all share a similar binding mode: their N-terminal part folds into an *α*-helix that binds to the barbed face of actin, between subdomains 1 and 3, while their central consensus sequence "LKKV" adopts an extended conformation lying on the actin surface, between subdomains 1 and 2 [11,14,15] (see Figure 2B). Although these actin–WH2 motif structures were determined by X-ray diffraction, the common fold adopted by different WH2 motifs upon binding to actin indicates, with reasonable confidence, that the bound conformation in solution is probably similar.

It should be noted that, when a WH2 motif or a peptide construct encompassing a WH2 motif is co-crystallized with actin, only about 20 residues, generally from the beginning of the helical segment to the consensus sequence "LKKV", are resolved in most crystallographic complexes (Table S1). Only the crystallographic structures 2A41, 2D1K, and 5YPU contain coordinates for almost all residues of the co-crystallized WH2 motifs. The absence of atomic coordinates for the regions following the consensus sequence "LKKV" in most crystallographic structures suggests that these regions probably remain highly flexible and disordered upon binding to actin, forming so-called fuzzy complexes. This raises several questions: What are the conformational dynamics of these invisible regions? Do they interact with actin and, if so, through which residues?
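Finding which residues of a co-crystallized construct lack coordinates amounts to comparing the construct's residue numbers with those present in the deposited model and grouping the missing ones into contiguous stretches. The sketch below assumes both lists of residue numbers have already been extracted (for example from the deposited sequence and the ATOM records); the function name `unresolved_ranges` and the numbers in the example are hypothetical and do not describe any specific PDB entry.

```python
def unresolved_ranges(construct, resolved):
    """Return contiguous (start, end) ranges, inclusive, of residues in
    `construct` (ascending residue numbers) that are absent from `resolved`."""
    resolved = set(resolved)
    ranges = []
    for resnum in construct:
        if resnum in resolved:
            continue
        if ranges and resnum == ranges[-1][1] + 1:
            ranges[-1][1] = resnum           # extend the current gap
        else:
            ranges.append([resnum, resnum])  # start a new gap
    return [tuple(r) for r in ranges]


# Hypothetical 40-residue construct in which only residues 5-24 are modeled.
print(unresolved_ranges(range(1, 41), range(5, 25)))  # [(1, 4), (25, 40)]
```

A long gap after the resolved stretch is the kind of pattern that, as argued above, points to regions remaining flexible in the bound state.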

A more general and still debated question regarding IDRs concerns the mechanism of their specific binding to their partners. The formation of IDP–protein complexes can follow pathways lying between two limiting mechanisms [16]: "induced fit", in which the disordered region binds to its partner and folds into an ordered structure on its surface, and "conformational selection", in which the folded structure preexists within the conformational ensemble of the unbound IDP and is recognized by the protein partner. However, the observation of preexisting structured segments in IDRs does not necessarily prove that binding proceeds by direct conformational selection [17]. For example, in an alternative mechanism, the protein partner could first bind to any region of the IDR and then slide to the specific binding site that has the correct complementary conformation [18]. Thus, more detailed investigations are required to gain insight into the early events and pathways of IDP–protein recognition.
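For reference, the two limiting mechanisms can be written as the standard textbook kinetic schemes below. The symbols are introduced here for illustration only: $U$ denotes conformers of the unbound IDR that lack the bound-like structure, $F$ conformers already in the bound-like fold, $P$ the protein partner, and $UP$ a complex in which the IDR is still disordered. The sliding alternative mentioned above is not represented.

```latex
\begin{aligned}
\text{conformational selection:}\quad & U \rightleftharpoons F, \qquad F + P \rightleftharpoons FP,\\
\text{induced fit:}\quad & U + P \rightleftharpoons UP \rightleftharpoons FP.
\end{aligned}
```

The two schemes differ only in whether folding precedes or follows binding, which is why observing $F$ in the free ensemble does not by itself establish which branch dominates the flux.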

In this report, we address these issues for the verprolin homology domain (V) of the Neural Wiskott–Aldrich Syndrome Protein (N-WASP), which contains two WH2 motifs. Together with the Arp2/3 complex, N-WASP stimulates actin filament branching and the formation of dendritic filament networks that shape or deform cell membranes in several cellular processes, such as cell motility and endocytosis [19,20]. The 505-residue sequence of human N-WASP can be decomposed into seven domains: an N-terminal WASP homology domain WH1 (segment 1–150), a basic domain B (186–200), a GTPase-binding domain GBD (203–274), a proline-rich domain PRD (277–392), a verprolin homology domain V (405–450), a cofilin homology domain C (451–485), and an acidic domain A (486–505) [21,22]. N-WASP domain V binds and recruits G-actin monomers, while domains C and A bind the Arp2/3 complex. These interactions allow the nucleation of new branch filaments [19,23,24]. N-WASP domain V is composed of two WH2 motifs (Table S1), each able to bind one G-actin [25–27]. Interestingly, the presence of two WH2 motifs in N-WASP domain V induces faster actin polymerization than in other WASP-family proteins, which have only one WH2 motif [28]. However, the structural mechanism by which a WH2 tandem binds two actin monomers and accelerates polymerization and branching has not been fully elucidated.

Two crystallographic structures of the N-WASP WH2 tandem in complex with actin are available in the Protein Data Bank: a 1:1 actin–domain VC complex (2VCP [27]) and a 2:1 actin–WH2 tandem complex (3M3N [26]). In both structures, however, only about 20 residues of each WH2 motif, from the helical N-terminal part to the consensus sequence "LKKV", could be resolved (Table S1). Notably, the actin dimer in the 3M3N complex has an overall longitudinal arrangement similar to that in the actin filament [26]. This suggests that N-WASP domain V might favor the formation of actin dimers in a longitudinal, filament-like conformation, which might accelerate actin polymerization. However, confirming this scenario requires a detailed description of how the 1:1 and 2:1 actin–domain V complexes form in solution.

Previously, we structurally characterized the unbound state of a construct encompassing N-WASP domain V (Figure S1) by combining various biophysical techniques [29]. Multiple molecular dynamics (MD) simulations generated a conformational ensemble of this construct (which we continue to call "N-WASP domain V" for simplicity) in very good agreement with both NMR chemical shifts and SAXS intensity measurements. In this ensemble, several conformations displayed transient *α*-helices in the WH2 motifs, suggesting that these secondary structures might be selected by actin during recognition. Here, we test the validity of this hypothesis and, more generally, investigate the early events of actin recognition by these *α*-MoRFs, using protein–protein docking calculations and multiple MD simulations. In addition, since N-WASP has a tandem of WH2 motifs, we examine the possible molecular pathways leading to the ternary complex of domain V with two actins.

#### **2. Results**

NMR experiments and MD simulations previously showed that unbound N-WASP domain V has two transient *α*-helical structures (one per WH2 motif) in regions 10–15 and 37–43, corresponding to residues 407–412 and 434–440 of the full-length protein sequence (Figure S1) [29].
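Both correspondences quoted above (10 → 407 and 37 → 434) imply the same constant offset of 397 between construct and full-length numbering. The sketch below assumes this offset holds across the whole construct (i.e., that the construct is a contiguous fragment) and uses the domain boundaries listed in the Introduction; the function names are hypothetical helpers, not part of any published tool.

```python
# Offset inferred from the two correspondences quoted in the text (10 -> 407, 37 -> 434).
# Assumption: the construct is a contiguous fragment of the full-length sequence.
OFFSET = 407 - 10  # 397

# Human N-WASP domain boundaries (inclusive), as listed in the Introduction.
DOMAINS = {
    "WH1": (1, 150), "B": (186, 200), "GBD": (203, 274), "PRD": (277, 392),
    "V": (405, 450), "C": (451, 485), "A": (486, 505),
}


def to_full_length(construct_resnum: int) -> int:
    """Convert a construct residue number to full-length N-WASP numbering."""
    return construct_resnum + OFFSET


def domain_of(full_resnum: int):
    """Name of the annotated domain containing a residue, or None (linker)."""
    for name, (start, end) in DOMAINS.items():
        if start <= full_resnum <= end:
            return name
    return None


for segment in [(10, 15), (37, 43)]:
    full = tuple(to_full_length(r) for r in segment)
    print(segment, "->", full, domain_of(full[0]), domain_of(full[1]))
# (10, 15) -> (407, 412) V V
# (37, 43) -> (434, 440) V V
```

Under the same offset, the recognition segments 9–18 and 37–46 would correspond to residues 406–415 and 434–443, both within domain V (405–450).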

#### *2.1. Monomeric Actin–Domain V Encounter Complexes Generated by Docking Calculations*

To examine whether these two helical MoRFs are preferential recognition sites for actin, we blindly docked the representative structures of the 527 most populated clusters of the N-WASP domain V conformational ensemble (derived from MD simulations with the A03ws force field [29]) onto actin chain B extracted from the PDB structure 2VCP [27]. Each docking run generated about 1300 different poses of domain V on actin, yielding a total of 702,920 encounter complexes. The likelihood of these complexes was evaluated with the scoring function 2/3B*best* InterEvScore [30]. We retained the 1% of complexes (i.e., 7030 structures) with the highest 2/3B*best* score as the most probable actin–domain V structures. Compared with the 527 cluster representative structures, the domain V conformations retrieved in the 7030 most probable complexes are slightly more compact, as shown by the radius of gyration distributions (Figure S2), indicating that extended conformations of domain V did not particularly favor binding to actin. At the local level, the difference in per-residue *α*-helix probability between the two ensembles (527 cluster representatives and 7030 ligands) is small and may not be significant (Figure S2).
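The two operations in this paragraph, keeping the best-scoring 1% of poses and comparing compactness through the radius of gyration, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used here: it assumes a higher 2/3B*best* score means a more probable complex (as stated above), rounds the 1% cut upward, and computes an unweighted radius of gyration by default; the scores in the example are random placeholders, not real data.

```python
import math
import numpy as np


def top_fraction(scores, fraction=0.01):
    """Indices of the best-scoring `fraction` of poses, best first.
    Assumes that a higher score means a more probable complex."""
    scores = np.asarray(scores, dtype=float)
    k = math.ceil(fraction * scores.size)
    return np.argsort(scores)[::-1][:k]


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of a set of 3D coordinates (unit masses by default)."""
    coords = np.asarray(coords, dtype=float)
    m = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    center = (m[:, None] * coords).sum(axis=0) / m.sum()
    sq_dev = ((coords - center) ** 2).sum(axis=1)
    return math.sqrt((m * sq_dev).sum() / m.sum())


# With 702,920 poses, keeping the top 1% retains ceil(7029.2) = 7030 complexes.
rng = np.random.default_rng(0)
scores = rng.normal(size=702_920)  # placeholder scores, not real data
print(len(top_fraction(scores)))  # 7030
```

Computing `radius_of_gyration` for every ligand in the selected complexes and for every cluster representative gives the two distributions to compare; whether atoms should be mass-weighted is a choice the sketch leaves to the caller.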

We first analyzed the residues at the protein–protein interface in the 7030 most probable complex structures. The probability of each N-WASP domain V residue to be in contact with actin was computed and is plotted in Figure 1. Actin clearly recognizes two regions of domain V preferentially, delimited by residues 8–18 and 37–50. The first binding site is shorter than the second, which might be related to the different propensities of the two WH2 motifs to form *α*-helical structures (Figure S2). Nevertheless, comparing the two regions with a high probability of contacting actin reveals a consensus sequence that emerges as the most probable recognition site for actin: 9KAALLDQIRE18 and 37RDALLDQIRQ46 in the first and second WH2 motifs, respectively. Both recognition segments exhibit a similar pattern in which a positively charged residue (K9 or R37) precedes two moderate-probability residues (A10/A11 or D38/A39), followed by two high-probability hydrophobic residues (L12/L13 or L40/L41), two moderate-probability residues (D14/Q15 or D42/Q43), and two other high-probability residues (I16/R17 or I44/R45). This periodicity suggests that the regions of domain V recognized by actin are *α*-helical structures rather than short linear motifs (SLiMs) in coil or extended conformations. The chemical nature of these residues also indicates that the central parts of the recognized segments are amphiphilic helices whose hydrophobic faces contact actin.
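The helical periodicity argument can be checked on an idealized helical wheel, where each residue is rotated by 100° relative to the previous one (3.6 residues per turn). The sketch below is a rough illustration, not the analysis performed here: it uses a crude binary hydrophobic set (L, I, V, M, F, an assumption, with no hydrophobicity scale), takes the mean direction of those residues as the "hydrophobic face", and reports each residue's angular distance from it.

```python
import math

STEP = 100.0                # degrees per residue on an ideal alpha-helix (3.6 residues/turn)
HYDROPHOBIC = set("LIVMF")  # crude binary classification, chosen for illustration


def wheel_angles(seq, first_resnum):
    """Map residue labels (e.g. 'L12') to their angle on an ideal helical wheel."""
    return {f"{aa}{first_resnum + i}": (STEP * i) % 360 for i, aa in enumerate(seq)}


def mean_direction(angles):
    """Circular mean (degrees) of a collection of angles."""
    x = sum(math.cos(math.radians(a)) for a in angles)
    y = sum(math.sin(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(y, x)) % 360


def angular_distance(a, b):
    d = abs(a - b) % 360
    return min(d, 360 - d)


def hydrophobic_face(seq, first_resnum):
    """Direction of the hydrophobic face and each residue's distance from it."""
    wheel = wheel_angles(seq, first_resnum)
    hyd = [a for label, a in wheel.items() if label[0] in HYDROPHOBIC]
    face = mean_direction(hyd)
    return face, {label: angular_distance(a, face) for label, a in wheel.items()}


for seq, start in [("KAALLDQIRE", 9), ("RDALLDQIRQ", 37)]:
    face, dist = hydrophobic_face(seq, start)
    print(seq, round(face, 1), {k: round(v) for k, v in dist.items()})
```

On this idealized wheel, L12, L13, and I16 (and likewise L40, L41, and I44) fall within about 55° of the face direction, whereas D14 and E18 sit more than 150° away, consistent with an amphiphilic helix presenting one hydrophobic side.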

**Figure 1.** Probability of the N-WASP domain V residues to be within 4 Å of actin. Orange and magenta dashed lines indicate the protein regions in *α*-helix (as revealed by the X-ray structure 2VCP [27]) and the consensus sequences "LKKV" [14,31], respectively.
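A per-residue contact probability of this kind can be computed as the fraction of complexes in which at least one atom of a residue lies within 4 Å of the receptor. The sketch below assumes an any-atom minimum-distance criterion and 0-based residue indices; the exact atom selection used for Figure 1 is not specified here, and the function names are hypothetical.

```python
import numpy as np


def residue_contacts(lig_xyz, lig_resid, rec_xyz, n_res, cutoff=4.0):
    """Boolean array of length n_res: True if any atom of a ligand residue
    lies within `cutoff` (Å) of any receptor atom.
    lig_resid holds a 0-based residue index for each ligand atom."""
    lig_xyz = np.asarray(lig_xyz, dtype=float)
    rec_xyz = np.asarray(rec_xyz, dtype=float)
    lig_resid = np.asarray(lig_resid, dtype=int)
    # Dense (n_lig_atoms, n_rec_atoms) distance matrix via broadcasting.
    dist = np.linalg.norm(lig_xyz[:, None, :] - rec_xyz[None, :, :], axis=-1)
    atom_in_contact = dist.min(axis=1) < cutoff
    contact = np.zeros(n_res, dtype=bool)
    contact[lig_resid[atom_in_contact]] = True
    return contact


def contact_probability(complexes, n_res, cutoff=4.0):
    """Fraction of complexes in which each ligand residue contacts the receptor.
    `complexes` is an iterable of (lig_xyz, lig_resid, rec_xyz) tuples."""
    hits = [residue_contacts(l, r, rec, n_res, cutoff) for l, r, rec in complexes]
    return np.mean(hits, axis=0)


# Toy example: a two-residue ligand against a one-atom receptor in two poses.
lig, resid = [[0, 0, 0], [10, 0, 0]], [0, 1]
poses = [(lig, resid, [[3, 0, 0]]), (lig, resid, [[100, 0, 0]])]
print(contact_probability(poses, n_res=2))
```

The dense distance matrix keeps the sketch short; for thousands of full-atom complexes, a neighbor-search structure such as a k-d tree would scale better.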

Moreover, among the most probable complexes, the conserved residues 22LKKV25 and 50LKSV53 have a significantly lower probability of contacting actin than the two binding sites above (Figure 1). This suggests that, after the recognition of region 9–18 or 37–46 by actin, the N-WASP consensus sequences "LKKV" would move and anchor to the actin surface in a second step. This scenario was further examined using MD simulations, as presented in the next section.

Before that, we investigated the preferential location of the two N-WASP regions 9–18 and 37–46 on the actin surface. To that end, the probability that each actin residue is contacted by one of these two segments was computed over the 7030 most probable complexes predicted by docking (Figure 2A). Among the actin residues frequently contacted by regions 9–18 and 37–46, we retrieved those (Y143, G146, T148, G168, Y169, L349, T351, M355, and F375) that contact the N-WASP segment 37–46 in structure 2VCP [27]. However, segments 9–18 and 37–46 can also bind with high probability to other patches of the actin surface, notably residues 171–173 and 283–290, which are not close to the cognate binding site (Figure 2). These observations could arise from various factors, including limitations of the rigid-body docking procedure and imperfections of the coarse-grained scoring function. They could also be related to the fact that, in most selected conformations of N-WASP domain V used in the docking calculations, segments 9–18 and 37–46 were not fully helical, unlike in the crystallographic complex (Figure S2). This might favor binding to pockets of the actin surface with no particular shape, to the detriment of the groove that is expected to accommodate the WH2 motif helices. In these cases, the conformational transition of these N-WASP regions toward full *α*-helices might not lead to stable complexes. In addition, these non-specific binding sites on the actin monomer extend over the actin–actin interface of longitudinal dimers and might therefore be observed less often in such actin assemblies.
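The comparison made in this paragraph, checking which of the 2VCP contact residues are among the frequently contacted ones and which frequent contacts lie elsewhere, reduces to set operations once a probability threshold is chosen. In the sketch below, the reference set is the residue list given in the text, while the threshold and the probabilities in the example are hypothetical values for illustration only.

```python
# Actin residues contacting N-WASP segment 37-46 in 2VCP, as listed in the text.
REFERENCE_2VCP = {143, 146, 148, 168, 169, 349, 351, 355, 375}


def compare_with_reference(prob_by_residue, reference, threshold):
    """Split frequently contacted residues (probability >= threshold) into
    those in the reference contact set and those outside it."""
    frequent = {res for res, p in prob_by_residue.items() if p >= threshold}
    recovered = sorted(frequent & reference)
    elsewhere = sorted(frequent - reference)
    return recovered, elsewhere


# Hypothetical probabilities and threshold, for illustration only.
probs = {143: 0.62, 146: 0.51, 148: 0.20, 171: 0.70, 285: 0.55, 375: 0.40}
print(compare_with_reference(probs, REFERENCE_2VCP, threshold=0.5))
# ([143, 146], [171, 285])
```

The `elsewhere` list also contains residues adjacent to the cognate site; deciding whether a patch is genuinely distant from it would require an additional structural distance criterion.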

**Figure 2.** (**A**) Probability of actin residues to be within 4 Å of domain V regions 9–18 or 37–46. Red dashed lines indicate actin residues in contact with the N-WASP helical segment in structure 2VCP [27]. (**B**) Views of the actin surface colored according to the probabilities in (**A**). Blue, white, and red indicate actin residues with low, intermediate, and high probabilities of being contacted by domain V, respectively. For reference, yellow and green ribbons represent the helical region and the conserved sequence LKSV of the second WH2 motif, as observed in 2VCP [27].
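How the surface views were rendered is not stated; one common approach, assumed here, is to store per-residue values in the B-factor field (columns 61–66) of the PDB ATOM/HETATM records and let a molecular viewer color by that field, for instance with PyMOL's `spectrum b, blue_white_red`. The helper below is a hypothetical sketch; its default chain "B" follows the actin chain extracted from 2VCP mentioned above.

```python
def set_bfactors(pdb_lines, prob_by_residue, chain="B"):
    """Write per-residue values into the B-factor field (columns 61-66) of
    ATOM/HETATM records of one chain; other lines are returned unchanged.
    Lines are expected without trailing newlines (e.g. from str.splitlines())."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 26 and line[21] == chain:
            resnum = int(line[22:26])             # residue sequence number, columns 23-26
            value = prob_by_residue.get(resnum, 0.0)
            line = line.ljust(66)                 # make sure the B-factor field exists
            line = f"{line[:60]}{value:6.2f}{line[66:]}"
        out.append(line)
    return out
```

Residues absent from the probability table receive 0.0, so they would be drawn at the low (blue) end of the scale.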

Overall, docking calculations of representative conformations of free domain V onto the actin monomer yielded many encounter complexes in which N-WASP segments 9–18 and 37–46 are preferentially bound to actin, but at both specific and non-specific sites. In these encounter complexes, the consensus sequences "LKKV" have a low probability of contacting actin, whereas they are attached to actin in all available crystallographic complex structures. This suggests a two-step association mechanism involving large conformational rearrangements of domain V after the formation of a productive encounter complex in which either segment 9–18 or 37–46 occupies the cognate binding site of actin.
