#### *2.3. Algal Cellulases Are Multimodular*

Sequence alignment (Supplementary Figure S1a) and homology models (Figure 3) reveal that algal GH9 cellulases consist of catalytic and non-catalytic modules. Multidomain cellulases are widespread among many taxonomic groups; however, cellulases from anaerobic bacteria, found in cellulosomes, have the most complex architecture, consisting of different types of modules (Supplementary Figure S3). For example, *Clostridium cellulolyticum* produces 13 modular GH9 cellulases that differ in the number and arrangement of CD (single), CBM (0–2), dockerin (0–1), and Ig-like (0–1) domains [24]. However, among the templates, only the full sequences of 1JS4/4TF4 and 1KFG/1GA2, comprising CD, linker, and CBM, have been crystallized (Supplementary Figure S3) [18,25]. Multimodular cellulases are more efficient than the free enzyme (with only a CD) owing to synergism arising from the close proximity between the enzyme and the cellulosic substrate [1,2,26]. Glycosylated linkers provide flexibility to the CD for higher activity [27] and protease protection, as well as increased binding to the cellulose surface [28] (see also Section 2.6). The statistics regarding homology-based modelling are given in Supplementary Table S1, showing the top templates employed by I-TASSER, such as 1KS8 (an endocellulase from a termite), 1JS4, 1TF4 (mixed endo-/exocellulase from *Thermobifida fusca*), and 1UT9 (exocellulase from *Clostridium thermocellum*).

**Figure 3.** Homology models of selected family GH9 cellulases. Blue, CD (catalytic domain); pink, CBM (carbohydrate-binding module); grey, linker; yellow, Ig-like domain; red/?, unknown. Organism names, accession/PDB codes, and cellulase types are given alongside the structures. (**a**) Cr, *Chlamydomonas reinhardtii*; (**b**) and (**c**) Gp, *Gonium pectorale* and Vc, *Volvox carteri*. The domain arrangement is given below each structure, with a dot showing the separation between two adjacent domains; arrow, linker. The I-TASSER statistics are given in Supplementary Table S1, and the X-ray structures of the templates (PDB: 1JS4/4TF4, 1KFG/1GA2, 1UT9, 1KS8, 1CLC and 2YIK) used by I-TASSER for generating the homology models are given in Supplementary Figure S3.

Some physicochemical properties (Supplementary Table S2) and the arrangement of the various domains in algal (Figure 3) and some non-algal cellulases (Supplementary Figure S3) are given. In addition to GH9 CDs, all algal cellulases were found to have putative CBMs linked to the CD by linkers, except CrCel9C (hereafter referred to, for simplicity, as Cr9C; Figure 3a). Only Cr9C was found to have an Ig-like domain and no CBM, whereas Cr9D had an unknown sequence at the N-terminus (Figure 3a). Interestingly, Gp KXZ51468 (hereafter referred to as Gp51468) had two consecutive CDs and a single CBM, separated by two linkers (CD1-linker-CD2-linker-CBM), whereas Vc2952174 had two putative CBMs and a single CD. Spinach cellulase (Supplementary Figure S3b,c) was found to have a domain arrangement similar to that of Gp51468, whereas the bacterial cellulase from *Caldocellum saccharolyticum* has one family GH9 CD and another family GH48 CD, along with three CBMs [29]. In contrast to bacterial cellulases [30], dockerin and Fn3-like domains were found to be absent in algal, plant, and invertebrate metazoan cellulases (Figure 3). Multimodular microalgal cellulases were found to be closest to invertebrate metazoan homologs (Figure 2, Supplementary Figure S1b); however, in contrast to the modular cellulase from abalone, many invertebrate cellulases (such as that from termite) were found to be non-modular (Supplementary Figure S3).
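As a way of making the module inventories above easy to query, the following is a minimal Python sketch that records, for three of the enzymes, only the module counts stated in the text. The dictionary layout and the helper names (`ARCHITECTURES`, `count`, `enzymes_without`, `is_multi_cd`) are illustrative assumptions and are not part of the analysis pipeline used in this work; linker counts are recorded only where the text gives them.

```python
# Module counts as stated in the text (hypothetical data layout).
# Modules not mentioned for an enzyme are simply left out.
ARCHITECTURES = {
    "Cr9C": {"CD": 1, "Ig-like": 1},              # no CBM reported
    "Gp51468": {"CD": 2, "linker": 2, "CBM": 1},  # CD1-linker-CD2-linker-CBM
    "Vc2952174": {"CD": 1, "CBM": 2},             # two putative CBMs
}


def count(enzyme, module):
    """Return how many copies of `module` are recorded for `enzyme`."""
    return ARCHITECTURES[enzyme].get(module, 0)


def enzymes_without(module):
    """Return the sorted names of enzymes with no recorded copy of `module`."""
    return sorted(name for name in ARCHITECTURES if count(name, module) == 0)


def is_multi_cd(enzyme):
    """True if the enzyme carries more than one catalytic domain."""
    return count(enzyme, "CD") > 1


if __name__ == "__main__":
    print("No CBM:", enzymes_without("CBM"))
    print("Gp51468 has multiple CDs:", is_multi_cd("Gp51468"))
```

Storing counts rather than an ordered module string avoids asserting a domain order that the text does not specify for every enzyme.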

#### *2.4. Active-Site Architecture Shows Different Types of Cellulolytic Activities in Algal GH9 Cellulases*

Cellulolytic organisms secrete different types of cellulases, in addition to β-glucosidases, xylanases and lignin-degrading enzymes [1,7]. In this work, we focus on microalgal cellulases. Various residues (white) involved in substrate binding (S-labelled), Ca²⁺-binding (M-labelled), catalytic residues (C-labelled), and loops (pink boxed) within the CD (blue highlighted region) are shown in the multiple alignment (Supplementary Figure S1a). All the catalytic acidic residues are strictly conserved in all GH9 cellulases, including those of microalgae (Figure 1 upper panel, Supplementary Figure S1a and Figure 4a). However, there is variation in Ca²⁺-binding residues among algal and other GH9 cellulases, probably reflecting different metal-binding affinities. For example, a D residue is substituted by an A in Vc2952174 and by a G in 1UT9 (M-labelled, blue highlighted, Supplementary Figure S1a), neither of which can bind calcium. In glucanases, the active-site is mostly lined by aromatic residues in order to bind sugar moieties, although polar amino acids are also present (Figure 4b). These polar amino acids bind to cellulose via H-bonding and hydrophobic interaction, whereas aromatic amino acids interact via CH-π interaction with the sugar rings [18,31]. The substrate-binding residues around the active-site pocket are mostly conserved among GH9 sequences (Figure 4b); however, there are exceptions (red residues, Supplementary Figure S1a). In Region II (Figure 1, lower panel), the highly conserved H and R are replaced by V465 and K467, respectively, in Vc2952174 (Supplementary Figure S1a).

Cellulases can be classified into endo-, exo-, and exo/endo- (also called processive endoglucanases) [3,4] (Figure 5). It has been shown that cellulases with similar sequences have different specificities, implying that exo- versus endo- versus exo/endo activities are a consequence of subtle differences in and around the active-site cleft [32]. In spite of this limitation, modelling of algal CDs may give valuable insight into their likely mode of action, although determination of X-ray structures and experimental data obtained via enzyme assays using various substrates (soluble, amorphous, and crystalline), as well as product analysis (cellobiose versus oligosaccharides), are more reliable options [4].

**Figure 4.** Active-site pocket of selected GH9 cellulases showing conserved residues around the substrate superimposed on each other. (**a**) Catalytic-residues (above, E412; below left to right, D54, D57) and (**b**) binding-residues (above from left to right: W253, F205, H124, R361, W127; below from left to right: H306, Y417, W301, Y408, H359). The residue number refers to that of termite (PDB, 1KS8). Red, substrate (C4 + C2); blue, termite; green, algae; pink, spinach; black, *T. fusca* (4TF4); orange, fungus (*Neocallimastix patriciarum*).

**Figure 5.** Various mechanisms of GH9 family cellulases found in algal enzymes. (**a**) Random cleavage of cellulose by endoglucanases to form oligosaccharides; (**b**) sequential cleavage of cellulose by GH9 exoglucanase-like enzyme due to partial blockage of the active site similar to that found in CbhA from Ruminiclostridium, 1UT9; (**c**) sequential cleavage of cellulose by processive endoglucanases (also called exo/endo cellulases) into oligosaccharides not longer than cellotetraoses due to blockage after the −4 binding site. Numbers (−4 to +2) show binding subsites (non-reducing to reducing) in the cellulase catalytic domain (blue). Arrows show the cleavage site; hexagon, glucose units.

Space-filled active-sites of algal and selected GH9 cellulases derived from homology models of the CD were compared with the X-ray structures of endocellulases, an exoglucanase (both from GH6 and GH9 families), and mixed exo/endo processive endoglucanases (Figure 6). Binding and catalytic residues are shown as space-filled atoms, along with cellotetraose + cellobiose substrates (Figure 6b, top and lower panels). The X-ray structures of 4TF4 from *T. fusca* and 1KFG from *C. cellulolyticum*, with their respective oligosaccharides, were found to superimpose perfectly on each other, implying that the substrate can be modelled on algal cellulases, with the aim of locating the position of the substrate and the orientation of subsites within the active-site [3,25]. To show the accessibility of the substrate, the degree of indentations and obstructions in the form of blockages and cavities around the active-site are depicted in different shades of color ranging from orange/red (humps) to dark blue (depressions) (Figure 6, middle panels).

Cellulose is a very recalcitrant crystalline polymer due to extensive H-bonding between tightly packed glucose chains [1], and the cleavage of β-glycosidic bonds is much more energy-demanding than that of α-linked glycosidic bonds [33]. Glucanases that act on cellulose are broadly divided into endocellulases and exocellulases [34]. Endocellulases have an open cleft or groove structure that allows the binding of many sugar units and the random cleavage of internal bonds, with concomitant release of the cellulose chain after every cleavage, finally producing oligosaccharides (Figure 5a). In contrast, exocellulases (cellobiohydrolases) from families GH5 and GH6, such as that from *Humicola insolens* (Figure 6f), have a tunnel structure due to loops that partially cover the active-site, enabling the cellulose chain to thread through [5]. This geometry allows the exoglucanase to hold on to the cellulose chain during product release without losing the chain to the surroundings. During such a processive movement, every alternate glucose unit is presented to the active site, resulting in the liberation of the cellobiose product [35].

In nature, however, the differentiation between an exo- and endo-acting cellulase is, at best, blurred, since many cellulases show characteristics of both endo- and exocellulases, depending on the active-site architecture and/or the presence of a CBM close to the reducing end of the active-site [3,18,35,36]. True exocellulases with unique tunnel-like active-sites, such as in GH6 cellobiohydrolase [35] (Figure 6f, *H. insolens*), have not yet been found in GH9 family cellulases. However, one GH9 member (CbhA from *Ruminiclostridium*, 1UT9) was shown to have exocellulase activity (despite the absence of a tunnel-like active site), which was explained by the abrupt blockage of the active-site after the −2 subsite by a GEDNGLW loop, which is absent in other GH9 cellulases (Figures 5b and 6e). However, a transient tunnel formation by extended loops (DIYA-NDDY, Supplementary Figure S1a; Figure 6p–r) upon substrate binding has also been proposed [23].

The mixed exo/endo type cellulases show some type of blockage of the active-site [3,18]. A classic example of this type of active-site architecture is a processive cellulase (E4) from *Thermobifida fusca*, in which the non-reducing end is blocked (Figure 6c, grey "tower block"). This block acts as a "measuring stick", resulting in cleavage products no longer than cellotetraose that exit towards the bottom, whereas the remaining chain is held in place by the C-terminal CBM and is fed to the active-site in a processive manner (Figure 5c) [3,18]. With time, the enzyme cleaves cellulose and cellooligosaccharides (G5-G6) into cellotetraose and smaller oligosaccharides (G1-G3). Further incubation of G3-G6 cellooligosaccharides with E4 results in the formation of a mixture of cellotetraose, cellotriose, cellobiose, and glucose products, as determined by thin-layer chromatography using purified enzymes [37]. The formation of cellotriose and cellobiose from amorphous and crystalline cellulose by cloned and purified *Clostridium thermocellum* and *Saccharophagus degradans* processive endoglucanases has also been demonstrated [4]. Other mechanisms have also been described to account for processive endoglucanases, including the presence of a CBM that binds cellulose, disrupts its crystalline structure, and feeds substrate to the active-site. Additionally, an increase in the substrate affinity for the active-site, which prevents instant dissociation of the cellulose chain after the initial attack, has also been proposed [36]. Interestingly, a change in a single amino acid around the active-site can convert a non-processive into a processive pectinase [38]. Later work extended this to *C. cellulolyticum* cellulases and proposed that the presence of a single critical aromatic residue around the active-site can influence the processive behaviour [22].
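The "measuring stick" idea can be illustrated with a deliberately idealized Python sketch: a chain of glucose units is shortened from one end in steps fixed by the position of the block (four subsites, −4 to −1), so that no released product is longer than cellotetraose. The function name `measuring_stick_products` and the parameter `block_after` are hypothetical, and the model ignores everything else described above (random initial attack, secondary cleavage of the released oligosaccharides, and dissociation of the chain); it is a toy picture, not a kinetic model of E4.

```python
def measuring_stick_products(chain_length, block_after=4):
    """Idealized processive cleavage of a glucan chain.

    chain_length: number of glucose units in the starting chain.
    block_after:  number of subsites before the block (4 for a block
                  after the -4 subsite), i.e. the longest product allowed.
    Returns the list of product lengths in order of release; the last
    entry is whatever remains once the chain fits within the block.
    """
    if chain_length < 1 or block_after < 1:
        raise ValueError("chain_length and block_after must be positive")
    products = []
    remaining = chain_length
    while remaining > block_after:
        products.append(block_after)  # release one block-sized product
        remaining -= block_after      # the rest is fed forward processively
    products.append(remaining)
    return products


if __name__ == "__main__":
    for n in (5, 6, 10):
        print(f"G{n} ->", measuring_stick_products(n))
```

Under these assumptions, G5 and G6 give cellotetraose plus glucose or cellobiose, respectively, which agrees with the initial product pattern described above; the later appearance of cellotriose and further glucose would require modelling cleavage of the released oligosaccharides as well.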

The active-site architecture of all algal enzymes, except Vc2952174, shows a fully open cleft (Figure 6a–n, upper panels). Where tower blocks are present towards the non-reducing end, the enzymes may be either GH9 exo (as 1UT9) [23] or exo/endo processive enzymes [5,19,39]. In contrast, the active-site architecture and the accessibility analysis of Cr9D and Gp44756 indicate an open cleft without any "tower blocks" at the non-reducing end of the active-site (Figure 5a; Figure 6i,l, upper and lower panels), implying that these may simply be pure endocellulases [22].

The cavity analysis gives additional support for the presence of humps (greenish to yellowish shades) near the non-reducing end (−4 subsite) in the exo/endo-type cellulases, whereas the active-site cleft is depicted in shades of blue depending on the depth (Figure 6j,k,m,n, middle panels). The lower panels (Figure 6) show the view from the reducing end looking down the active-site cleft. This view confirms that, whereas pure endoglucanases (1KS8, a; 1CLC, b; Cr9D, i; Gp44756, l; Figure 6, lower panels) show low-height obstructions, GH9 exo/endo and exo-type cellulases are characterized by tower blocks (Figure 6, lower panels; 4TF4/1JS4, c; 1KFG, d; 1UT9, e; Cr9B, g; Cr9C, h; Gp51468, j and k; Gp51466, m; and both Vc cellulases, n and o). As both GH9 exo- and exo/endo processive cellulases are characterized by tower blocks (Figure 6), to unambiguously distinguish between these two types, the loops responsible for purely exocellulase activity in CbhA from *Ruminiclostridium* (1UT9) were modelled for all algal enzymes (Figure 6p–r). Modelling of CbhA (1UT9) shows that these extra loops (QGY-WGS and NSPH-GCFT, Supplementary Figure S1a, pink boxed) are responsible for exo-activity by either partially covering the active-site near the non-reducing end (−4 subsite) or by running parallel along the active-site (IYAE-NDDY, Supplementary Figure S1a). This specific conformation means that they are modelled to cover the active-site upon substrate binding (Figure 6p–r, red loops) [23].

Among all algal cellulases (Figure 6g–o), only Vc2952174 has the necessary loop (CVSR-GSAR, Supplementary Figure S1a) that can block the active-site (Figure 6o,r; light pink), similar to that of the CbhA exocellulase (red loops). In purely endo (1KS8) and exo/endo (4TF4) cellulases, these loops point away from the active-site, as seen in all algal cellulases except Vc2952174 (Figure 6p–r). The loops in all microalgal cellulases (Figure 6p–r) that are equivalent to the CbhA long loop (Figure 6p–r, red) running parallel to the active-site are much shorter. In the absence of X-ray structures and experimental data, it is not clear whether these shorter loops in microalgal cellulases will occlude the active-site upon substrate binding, as in CbhA [23]. It is noteworthy that, in Vc2952174 (CVSR-GSAR) and Cr9B (THTD-GSSS), there is an extra loop that is absent in all the other cellulases described here (Supplementary Figure S1a). This loop covers the active-site in Vc2952174 (Figure 5b; Figure 6r, light pink loop), but is farther away from the active-site in Cr9B (Figure 6p, yellow loop).

Currently available experimental data on *Chlamydomonas* (Cr) cellulases can be used to support our assignment of the different algal cellulases as GH9 exo-, endo-, and exo/endo types. These cellulase types can be differentiated on the basis of the substrates utilized and the products released [12]. Cellulases with open clefts, such as exo-acting GH9, endo, and exo/endo, can hydrolyze filter paper, as well as carboxymethyl cellulose (CMC). However, whereas endo-acting enzymes form oligosaccharides, preferably from amorphous cellulose, exo and exo/endo-acting enzymes can also produce cellobiose [18,34,37]. The published results showed that a mixture of all three Cr cellulases can utilize CMC, crystalline Avicel, and filter paper, with the release of C5, C4, and C3, as well as C2 (cellobiose) as products, suggesting the presence of either an exo- or an endo- and at least one processive mixed exo/endo type of cellulase [12].

Collectively, based on the experimental data on Cr cellulases [12] and the active-site and loop analysis described here (Figure 6), it can be deduced that all algal cellulases are likely to be exo/endo processive enzymes (presence of tower blocks, with exo-loops shortened and pointing away from the active-site), except for Cr9D (Figure 6i) and Gp44756 (Figure 6l), which seem to be endoglucanases (absence of tower blocks, with exo-loops shortened and pointing away from the active-site). The corresponding tower block (Figure 6e, 1UT9), due to the extra loops [23] responsible for exocellulase activity in CbhA, seems to be pointing away from the active-site or is shortened in all microalgal cellulases, except in Vc2952174 (Figure 6o), where the loop (light pink) covers the active-site (Figures 5b and 6r). However, it is possible that these shortened loops in microalgal enzymes (Figure 6p–r) may close the active-site upon substrate binding. It is interesting that both the C- and N-terminal CDs in Gp51468 seem to have similar activities (processive exo/endo). It is noteworthy that Gp51468 is composed of two CDs separated by a linker, an arrangement that is also found in spinach cellulase (Figure 3).
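The qualitative reasoning used above (tower block present or absent; exo-type loop covering the active-site or pointing away) can be summarized as a small decision rule. The Python sketch below is only a restatement of that reasoning under simplifying assumptions: the function `classify_gh9`, the boolean encoding, and the label "exo-like" are hypothetical, and the feature values are read from the description of Figure 6 in the text rather than computed from structures.

```python
def classify_gh9(tower_block, loop_covers_active_site):
    """Assign a tentative activity type from two structural observations.

    tower_block:             True if a "tower block" is seen at the
                             non-reducing end of the active-site.
    loop_covers_active_site: True if an exo-type loop covers the
                             active-site (as in CbhA, 1UT9).
    """
    if not tower_block:
        return "endo"
    if loop_covers_active_site:
        return "exo-like"
    return "exo/endo processive"


# (tower_block, loop_covers_active_site), as described in the text.
FEATURES = {
    "1KS8": (False, False),
    "4TF4": (True, False),
    "1UT9": (True, True),
    "Cr9B": (True, False),
    "Cr9D": (False, False),
    "Gp44756": (False, False),
    "Vc2952174": (True, True),
}

if __name__ == "__main__":
    for name, observed in FEATURES.items():
        print(f"{name}: {classify_gh9(*observed)}")
```

Because the shortened loops of the microalgal enzymes might still close over the active-site upon substrate binding, any label produced this way is a hypothesis to be tested by enzyme assays and product analysis, not a confirmed activity.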

**Figure 6.** Active-site architecture of selected family GH9 cellulases determined from X-ray structures and homology models. Upper panels (**a**–**o**): Top view of the active-site, showing the critical residues surrounding it. Blue, H; pink, W; turquoise, R/K; yellow, S; in lieu of substrate-binding, W; green, Y; orange, F; red, catalytic residues (E/D); grey, blocking residues/loops. Middle panels (**a**–**o**): Cavity analysis of the active-site pocket highlighting clefts, tunnels, and blocks in various shades. Dark blue, completely buried; orange/red, at least 75% surface accessible. Upper and middle panels show the substrate (C4 + C2) from the −4 non-reducing (left) to the +2 reducing end (right). Lower panels (**a**–**o**): View of the active-site from the +2 to the −4 subsite looking down the cleft/barrel, highlighting the absence or presence of "tower blocks" (grey) at the non-reducing end. The extra loop in Vc2952174 (**o**) is shown as ball and stick (pink). (**p**–**r**): Analysis of the blocking loops/secondary structure elements in microalgal CDs (**p**, Cr; **q**, Gp; **r**, Vc) compared with 1KS8 (endo-type, white), 4TF4 (exo/endo-type, brown) and 1UT9 (exo-type, red); (**p**) Cr9B (XP\_001701544), yellow; Cr9C (XP\_001701546), light green; Cr9D (XP\_001696497), dark green; (**q**) N-Gp (KXZ51468), turquoise; C-Gp (KXZ51468), blue; Gp (KXZ51466), magenta; Gp (KXZ44756), orange; (**r**) Vc (XP\_002952174), light pink; Vc (XP\_002958622), dark blue. Black, cleaved hexose substrate. The text description is as in Figure 3.

#### *2.5. Novel Cysteine-Rich CBM in Algal Cellulases*

The non-catalytic CBMs recognize polysaccharides and promote the association of the enzyme with its substrate, although standalone CBMs that are not linked to CDs have also been described [40]. Based on sequence similarity, CBMs are currently divided into 83 families (CAZy database available online: http://www.cazy.org/Carbohydrate-Binding-Modules.html). Three main functions of CBMs have been described [40] that include concentrating CDs of enzymes on the surface of polysaccharides for enhanced degradation, targeting distinct regions of a polysaccharide, such as crystalline cellulose [41,42], and, possibly, disrupting polysaccharide structure via replacement of H-bonds in crystalline cellulose by H-bonds from polar residues in CBM [3,43]. In addition, CBMs were proposed to help feed cellulose chain into the catalytic site, especially in the case of processive endocellulases [18].

Among the microalgal cellulases that have been described here, only Cr9C does not have a CBM, whereas all other enzymes have CBMs located on the C-terminal side, with Vc2952174 having two CBMs (Figure 3). Multiple alignment (Supplementary Figure S1a, pink highlighted), phylogenetic analysis (Supplementary Figure S4, built using the sequences in Supplementary Figure S5), and an identity matrix of putative microalgal CBMs compared to CBMs across different families (1–6, 10, 11, 12, 14, 17/28, 18, 20, 41, 43–45, 47–50, 53, 81) and taxonomic groups (bacteria, fungi, microalgae, invertebrates, plants) show low similarity between them. The identity matrix of top hits (Supplementary Figure S6) shows that, although Cr, Gp, and Vc putative CBMs have high identity (15–75%) with each other, microalgal CBMs show lower identity (19–27%) with members of known families, implying that Cr, Gp, and Vc CBMs do not belong to any of the previously described families in the CAZy database (available online: http://www.cazy.org/Carbohydrate-Binding-Modules.html). Like CBM14 and CBM18 family members, Cr, Gp, and Vc putative CBMs have a high percentage of cysteine residues.
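To make explicit what a pairwise identity value means, the following minimal Python sketch computes percent identity between sequences that are already aligned (equal length, gaps written as "-"). It is an illustration only: the convention used here (identical residues divided by the number of columns in which neither sequence has a gap) is one of several in common use and is an assumption, not necessarily the definition used for Supplementary Figure S6; the sequences are made-up toy strings, not CBM sequences from this work.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored, so the
    denominator is the number of residue-to-residue columns.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def identity_matrix(aligned):
    """Return {(name_i, name_j): percent identity} for all ordered pairs."""
    names = list(aligned)
    return {
        (p, q): percent_identity(aligned[p], aligned[q])
        for p in names
        for q in names
    }


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, for illustration only).
    toy = {
        "seq_a": "CAGC-TWC",
        "seq_b": "CAGCATYC",
        "seq_c": "C-GKATWC",
    }
    for pair, value in identity_matrix(toy).items():
        print(pair, f"{value:.1f}%")
```

Because the choice of denominator (alignment length, shorter sequence length, or ungapped columns) changes the reported percentage, identity values from different tools are comparable only when the same convention is used.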

To identify motifs in Cr, Gp, and Vc, CBM sequences from members belonging to different families and taxonomic groups were subjected to MEME analysis (Table 1). In addition to 2-C motifs that were found in all algal CBMs, 6-C and 4-C residue motifs were only found in Cr9B, Gp51466, Gp51468, and Vc2958622 (Table 1, motifs 1–2). It is noteworthy that none of the microalgal CBMs (Cr, Gp and Vc) have a Hevein motif, which is characteristic of cysteine-rich CBM18 members. None of the motifs 1–3 are found in any other algal, bacterial, fungal, or plant CBMs, including Cys-rich CBM1, CBM14, and CBM18, nor in CBMs that are commonly associated with endo- (EC 3.2.1.4) and exoglucanases (EC 3.2.1.91). Based on our results (Supplementary Figures S3 and S4, Table 1), we propose that Cys-rich algal GH9-appended CBMs be classified into a new CBM family or two separate families. One family may include Cr9B, Gp51466, Gp51468, and Vc2958622, whereas the other may include Cr9D, Gp44756, and Vc2952174.
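Two simple descriptors that underlie the notion of a "cysteine-rich" module are the cysteine content and the spacing between consecutive cysteines. The Python sketch below computes both for a single sequence; the function names and the toy sequence are hypothetical and chosen only for illustration, and this is not a substitute for the MEME motif analysis reported in Table 1.

```python
def cysteine_stats(sequence):
    """Return (number of C residues, percentage of C residues)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n_cys = seq.count("C")
    return n_cys, 100.0 * n_cys / len(seq)


def cysteine_spacings(sequence):
    """Return the number of residues between consecutive cysteines."""
    positions = [i for i, residue in enumerate(sequence.upper()) if residue == "C"]
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]


if __name__ == "__main__":
    toy = "ACDCGGCKC"  # made-up sequence, not from any CBM in this work
    n_cys, pct = cysteine_stats(toy)
    print(f"{n_cys} Cys ({pct:.1f}%), spacings: {cysteine_spacings(toy)}")
```

Conserved spacing patterns of this kind are what distinguish families such as those carrying a Hevein motif, so comparing spacing lists across sequences gives a quick first check before a full motif search.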

It has been proposed that a lack of aromatic residues in the CBM binding region, along with flexible linkers, results in a decreased cellulose-CBM affinity that can promote movement and feeding of the cellulose chain to the catalytic site of processive endoglucanases [18]. In the context of the microalgal GH9-appended exo/endo processive glucanases described here (Section 2.4, Figure 6), this feature may be crucial; however, the absence of structural data precludes drawing any further conclusions. The presence of multiple C residues (10–16) in algal cellulases is also intriguing. For example, a CBM-like region on the C-terminal side of a CD/linker containing an eight-cysteine box with four disulfide bridges has been proposed to promote substrate binding, help in the folding of secretory proteins, maintain conformational stability, and induce a conformational change required for activity [44]. In the present study, the identification of novel CBMs in Cr, Gp, and Vc is solely based on the evidence that modular cellulases require CBM modules, along with CD and linkers (Supplementary Figure S1a). In view of the novelty of the algal CBMs described here, binding data between CBMs and cellulose are vital for unequivocal confirmation.


**Table 1.** Motif analysis of GH9-appended microalgal CBMs (carbohydrate binding modules) by MEME.

None of the CBM1 and CBM14 motifs were found in any Cr, Gp, and Vc sequences. For comparison, the consensus sequence of the Hevein motif is provided in the table.
