#### *2.2. Statistical Physical Modelling of Domain-Linker-Domain Enzymes*

To determine how the disordered linker influences the (re)binding kinetics of the binding domains within a DLD-type enzyme, we used a statistical-kinetic approximation of their binding/unbinding behavior. As the effect of linker length depends on the distances between binding sites and on the on/off rates of the binding domains, we used the cellulose/cellulase system (Cel7A in Table 1) as a representative example. To describe the kinetic behavior of the system, we used a Gaussian approximation of the exact Freely Jointed Chain (FJC) model (see Supplementary Methods and Figure S1). Figure 2 shows the results of varying the parameters of a sample case in which the tethering domain (cf. Figure 1D) is bound at a substrate site, and we calculate the average binding time (the time it takes for half of the free domains to bind a target binding site on the substrate; cf. Supplementary Methods, Equations (S9) and (S10)). Considering the concentration distribution of the free domain around the bound tethering domain (Figure S1), and integrating binding events (kinetics) based on the binding rate of cellulases (Table S3) over all binding sites within reach of the free domain, we find (Figure 2A) that the average time required for re-binding (Supplementary Equation (S10)) increases with increasing linker length. If we assume a threshold set by the dissociation kinetics of the tethering domain (for illustration, a dissociation half-time, i.e., the time taken for half of the bound domains to dissociate, of 3 × 10<sup>−3</sup> s), the system is processive below a certain linker length (re-binding is preferred over dissociation) and becomes non-processive for longer linkers (e.g., the threshold linker length is 50 residues in Figure 2A). Note that the domains in this model have no physical dimensions (they are point-like), which is why the curve has no minimum (although a minimum does appear to be imposed by the separation between binding sites, which sets a lower limit on the Kuhn segments).
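The calculation described above can be illustrated with a minimal sketch. It is not the authors' implementation (Supplementary Equations (S9) and (S10) are not reproduced here) but a simplified stand-in under stated assumptions: the free domain is point-like, its concentration at a site is the Gaussian-chain end-to-end density converted to molar units, binding is first order with a rate summed over all sites on a one-dimensional thread, and the site occupied by the tethering domain is excluded. The contour length per residue (0.38 nm) and the association rate constant `k_on` are hypothetical placeholders, not values from Table S3.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
NM3_PER_LITRE = 1e24       # 1 L = (1e8 nm)^3
RESIDUE_LENGTH_NM = 0.38   # ASSUMPTION: contour length contributed per residue


def gaussian_density(r_nm, contour_nm, kuhn_nm):
    """End-to-end probability density (nm^-3) of a Gaussian chain at distance r."""
    mean_sq = contour_nm * kuhn_nm  # <R^2> = N * b^2 = L * b
    prefactor = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    return prefactor * math.exp(-3.0 * r_nm ** 2 / (2.0 * mean_sq))


def binding_half_time(n_residues, kuhn_nm=0.88, spacing_nm=1.026,
                      k_on=1e5, n_sites=500):
    """Half-time (s) for the free domain to bind any site on a 1D thread.

    k_on (M^-1 s^-1) is a HYPOTHETICAL placeholder, not a measured value.
    """
    contour_nm = n_residues * RESIDUE_LENGTH_NM
    density_sum = 0.0
    # Sites lie at +/- i * spacing; i = 0 is occupied by the tethering domain.
    for i in range(1, n_sites + 1):
        density_sum += 2.0 * gaussian_density(i * spacing_nm, contour_nm, kuhn_nm)
    molar_sum = density_sum * NM3_PER_LITRE / AVOGADRO  # effective molarity
    return math.log(2.0) / (k_on * molar_sum)


def is_processive(n_residues, dissociation_half_time=3e-3, **kwargs):
    """True if re-binding is faster than dissociation of the tethering domain."""
    return binding_half_time(n_residues, **kwargs) < dissociation_half_time


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        print(n, binding_half_time(n), is_processive(n))
```

In this sketch the binding half-time grows with linker length once the chain is long enough to span several sites, because the summed density scales roughly as the inverse of the contour length. The absolute times depend entirely on the assumed `k_on`, so the threshold length it yields should not be compared with Figure 2A.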

**Figure 2.** Modelling linker length in processive enzymes. Average binding times (t<sub>b</sub>) of a free domain linked to the tethering domain already bound to the substrate by a disordered linker of the given length (cf. Figure 1D, and Supplementary Equations (S9) and (S10)). The substrate is modelled based on cellulose geometry: it is assumed to contain binding sites spaced equidistantly every 1.026 nm (1 cellobiose unit) in the X dimension for a thread, and every 2 nm in the Y dimension in the case of a sheet. (**A**) Average binding time of the free domain with a random-coil linker (length of Kuhn segment (l<sub>k</sub>) = 0.88 nm) and binding domains with no physical dimensions. (**B**) Lengthening the Kuhn segment from 0.88 nm (random coil) to 7.04 nm (PPII helix) significantly slows binding and reduces processivity. (**C**) "Diluting" binding sites on the substrate (by lengthening the distance between binding sites from 1 cellobiose unit to 7) has a dramatic effect on binding time. (**D**) Binding to a 2D substrate (sheet) is much faster than binding to a 1D substrate (fibril), making the enzyme more processive. In all panels, if we assume a dissociation half-time of 3 × 10<sup>−3</sup> s (limited by catalysis), the enzyme is typically processive at shorter, but not at longer, linker lengths (see text for details).

Therefore, a spatially confined diffusional search by the free domain can result in processivity under certain circumstances, namely when (re)binding by the free domain is kinetically favored over dissociation of the tethering domain. Next, we asked how the flexibility of the linker affects the binding time of the free domain. To this end, we ran the statistical-kinetic model while varying the length of the Kuhn segments (and therefore the persistence length of the chain, see Supplementary Methods) from 0.88 nm (characteristic of random-coil chains) to 7.04 nm (characteristic of a polyproline II (PPII) helix). The effect is marked (Figure 2B): a more rigid linker gives longer binding times and makes the enzyme less processive (e.g., at a length of 30 residues, the enzyme is processive with a Kuhn-segment length of 0.88 nm, but not of 3.52 nm). Linker flexibility may therefore be a prime factor in determining the amino acid composition and sequence conservation of processive linkers, as shown later.
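To make the role of chain stiffness concrete, the following sketch tabulates standard Gaussian-chain quantities for a 30-residue linker at the three Kuhn-segment lengths mentioned in the text. It is an illustration under assumptions, not the model of the Supplementary Methods: the contour length per residue (0.38 nm) is assumed, the persistence length is taken as half the Kuhn length (the usual worm-like-chain relation), and the function name `chain_statistics` is hypothetical.

```python
import math

RESIDUE_LENGTH_NM = 0.38  # ASSUMPTION: contour length contributed per residue


def chain_statistics(n_residues, kuhn_nm):
    """Gaussian-chain descriptors of a linker with the given Kuhn length."""
    contour_nm = n_residues * RESIDUE_LENGTH_NM
    mean_sq = contour_nm * kuhn_nm  # <R^2> = L * b
    return {
        "kuhn_segments": contour_nm / kuhn_nm,
        "persistence_nm": kuhn_nm / 2.0,  # worm-like-chain relation b = 2 * l_p
        "rms_end_to_end_nm": math.sqrt(mean_sq),
        # Density of the free end at the tether point (nm^-3); it sets the
        # scale of the local concentration seen by nearby binding sites.
        "peak_density_nm3": (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5,
    }


if __name__ == "__main__":
    for kuhn_nm in (0.88, 3.52, 7.04):
        print(kuhn_nm, chain_statistics(30, kuhn_nm))
```

At fixed contour length the peak density scales as the Kuhn length to the power −3/2, so an eightfold stiffer chain dilutes the free domain near the tether by a factor of about 23. With only one or two Kuhn segments in the chain, as for the stiffest case here, the Gaussian approximation itself becomes questionable, and the numbers should be read as a trend only.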

As the calculated binding time is an aggregate value (integrating binding events over all substrate binding sites that can be reached by the free domain, see Supplementary Equation (S10)), we intuitively expect processivity to increase when possible binding sites are closer to each other, i.e., when there are more sites within reach of the free domain. This is demonstrated formally by varying the spacing of sites (Figure 2C), which shows that a processive enzyme can be made non-processive by moving the target sites farther apart (this depends on linker length and could actually be a tuned feature of each system). By similar logic, one might expect the level of processivity to be higher when target sites are spread over a two-dimensional surface, which makes more sites available for binding. This is shown formally in Figure 2D, where the enzyme is clearly much more processive with a two-dimensional substrate.
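The dependence on site spacing and substrate dimensionality can be sketched by summing a Gaussian-chain density over a lattice of sites. This is a simplified illustration, not the authors' calculation: it assumes point-like domains, a contour length of 0.38 nm per residue, first-order binding with the same rate constant at every site (so the binding time is inversely proportional to the summed density), and exclusion of the site held by the tethering domain. The function names are hypothetical.

```python
import math

RESIDUE_LENGTH_NM = 0.38  # ASSUMPTION: contour length contributed per residue


def gaussian_density(r_sq_nm2, mean_sq_nm2):
    """Gaussian-chain end-to-end density (nm^-3) at squared distance r^2."""
    prefactor = (3.0 / (2.0 * math.pi * mean_sq_nm2)) ** 1.5
    return prefactor * math.exp(-3.0 * r_sq_nm2 / (2.0 * mean_sq_nm2))


def summed_site_density(n_residues, kuhn_nm=0.88, dx_nm=1.026,
                        dy_nm=None, n_max=100):
    """Sum the free-domain density over all lattice sites except the origin.

    dy_nm=None models a 1D thread; a number models a 2D sheet with rows
    spaced dy_nm apart. The binding time is proportional to 1 / (this sum).
    """
    mean_sq = n_residues * RESIDUE_LENGTH_NM * kuhn_nm
    rows = range(-n_max, n_max + 1) if dy_nm is not None else (0,)
    total = 0.0
    for j in rows:
        y = j * dy_nm if dy_nm is not None else 0.0
        for i in range(-n_max, n_max + 1):
            if i == 0 and j == 0:
                continue  # site occupied by the tethering domain
            x = i * dx_nm
            total += gaussian_density(x * x + y * y, mean_sq)
    return total


if __name__ == "__main__":
    thread = summed_site_density(50)
    diluted = summed_site_density(50, dx_nm=7 * 1.026)
    sheet = summed_site_density(50, dy_nm=2.0)
    print("binding time, diluted / thread:", thread / diluted)
    print("binding time, sheet / thread:", thread / sheet)
```

For a 50-residue random-coil linker, sevenfold dilution of the sites pushes even the nearest site into the tail of the distribution, so the summed density falls by far more than a factor of seven. The sheet always gives a larger sum than the thread, because it contains every site of the thread plus the additional rows.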

Another caveat to the model calculations concerns whether, besides qualitatively assessing whether an enzyme is processive or not, we can draw quantitative conclusions on the level of processivity (the average number of elementary steps taken upon engagement with the substrate before releasing it). Here, one has to note that the extent of processivity is straightforward to define, but not trivial, and probably not unequivocal, to measure. Furthermore, being a kinetic phenomenon, it may show high stochastic fluctuations and may be very sensitive to experimental conditions.
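As a rough illustration of how the two half-times could translate into a level of processivity, one can treat re-binding and dissociation as competing first-order processes. This is an idealization that is not part of the model described in the text: it assumes that every re-binding event counts as one elementary step, that steps are independent, and that both rates stay constant. The function names are hypothetical.

```python
import math


def rebinding_probability(binding_half_time, dissociation_half_time):
    """Probability that the free domain re-binds before the tether dissociates."""
    k_bind = math.log(2.0) / binding_half_time
    k_off = math.log(2.0) / dissociation_half_time
    return k_bind / (k_bind + k_off)


def expected_steps(binding_half_time, dissociation_half_time):
    """Mean number of re-binding events before release (geometric distribution)."""
    p = rebinding_probability(binding_half_time, dissociation_half_time)
    return p / (1.0 - p)


if __name__ == "__main__":
    # Example: binding ten times faster than dissociation (half-times in s).
    print(rebinding_probability(3e-4, 3e-3), expected_steps(3e-4, 3e-3))
```

Under these assumptions the expected number of steps equals the ratio of the dissociation half-time to the binding half-time, so the threshold used in Figure 2 (equal half-times) corresponds to one step on average. The geometric distribution also has a standard deviation comparable to its mean, which is consistent with the high stochastic fluctuations noted above.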

Nevertheless, one can infer the typical linker-length range in which a particular enzyme may behave processively (say, 10–100 residues, cf. the intersection of the red and blue traces in Figure 2A). This inference may also suggest that linker length and the distance between substrate binding sites have co-evolved. As an additional note, whereas preferential binding (over dissociation) follows from the kinetic setup of the system, its capacity for unidirectionality does not. As a diffusive move can occur equally well in the backward direction (Figure 1D), directionality may stem from additional mechanistic elements, such as the use of energy and/or post-translational modifications of the substrate. These may even include degradation of the substrate, such as that of extracellular matrix proteins in the case of MMP-9 [33,35] or of cellulose in the case of cellulases [31,32,36]. Such changes to the substrate may hinder backward movement and result in rapid unidirectional, forward translocation (Figure 1D).

#### *2.3. Multiple Examples of DLD-Type Processive Enzymes*

The foregoing modelling studies show the potential for processivity encoded in the DLD arrangement of enzymes. Next, we demonstrate that there are many such enzymes in biology. Out of 47 processive enzymes of various mechanisms (Table S1), a simple literature search identified 12 processive systems that appear to rely on the DLD domain arrangement, such as MMP-9 [33,37], RNAse H1 [5], or a variety of glycohydrolases [6,31,32]. These ATP-independent enzymes, listed in Table 1, are analyzed further.

#### 2.3.1. Structural Disorder of Linkers in Monomeric Processive Enzymes

A critical element of processivity in these DLD-type processive enzymes is the structural disorder of the linker region connecting the binding domains, which has been experimentally demonstrated in only a few cases. For example, the cellulose-binding domain can be effectively separated from the catalytic domain of cellobiohydrolase I by limited proteolysis [38], in agreement with the extreme proteolytic sensitivity of IDPs [34]. Structural disorder was directly observed in cellulases Cel6A and Cel6B by small-angle X-ray scattering (SAXS) [39], in xylanase 10C by X-ray crystallography [40], and in MMP-9 by atomic-force microscopy (AFM) [33]. Besides these few examples, however, structural disorder has not yet been systematically analyzed in monomeric processive enzymes.

To this end, we applied bioinformatic predictions of local structural disorder to the linker regions of the DLD enzymes in Table 1 (Figure 3). Prediction of structural disorder by IUPred [41] for three processive enzymes (MMP-9, Cel6A and RNAse H1) shows a distinctive pattern of a very sharp transition from local order in the binding domains to structural disorder within the linker region. Given the reliability of disorder prediction [42], we may conclude that the linker region in processive enzymes is always disordered, as confirmed for all the cases collected from the literature (cf. Table 1, predicted disorder values). Interestingly, the length of the linkers in these processive enzymes always falls within the critical range suggested by the model calculations above (cf. Figure 2).

**Figure 3.** Structural disorder of linker regions in processive enzymes. The linker region in monomeric processive enzymes tends to be highly disordered, as shown here for three illustrative examples by the IUPred algorithm [41]. Traces of disorder score are given for the human matrix metalloproteinase-9 (MMP-9) sequence (**A**), bacterial cellulase 6A (**B**) and Ribonuclease H1 (RNAse H1) (**C**). In each case, the sharp transition from order to disorder (IUPred score > 0.5) and again to order clearly delimits the linker as a disordered element connecting two globular domains. Globular domains are visualized on top of the diagrams, with blue rectangles representing binding domains and red ones representing catalytic domains.
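The way a disorder profile delimits a linker can be sketched as a simple thresholding step. The example below does not call IUPred; it assumes that per-residue scores have already been obtained and are supplied as a plain list, and it reports every contiguous stretch with a score above 0.5. The function name, the `min_length` filter and the score values are hypothetical and serve only as illustration.

```python
def disordered_segments(scores, threshold=0.5, min_length=1):
    """Return (start, end) pairs, 1-based and inclusive, of residues whose
    disorder score exceeds the threshold for at least min_length residues."""
    segments = []
    start = None
    for index, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = index  # a disordered stretch opens here
        elif start is not None:
            if index - start >= min_length:
                segments.append((start, index - 1))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments


if __name__ == "__main__":
    # MADE-UP profile: ordered domain, disordered linker, ordered domain.
    profile = [0.1] * 5 + [0.8] * 4 + [0.2] * 3
    print(disordered_segments(profile))
```

A `min_length` of a few residues can be used to ignore isolated spikes in the profile. The choice of such a cutoff is a judgment call and is not specified in the text.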
