#### *2.6. Algal Cellulases Have PS-Rich Linkers*

Linkers were long regarded simply as a connecting "rope" between the CD and CBMs whose flexibility allows cellulases to inch forward on the cellulose surface with a caterpillar-like movement [45]. However, recent data point to additional crucial functions, such as binding of glycosylated linkers to the cellulose substrate [28] and modulation of endoglucanase activity [42]. Putative linkers have been found in all algal cellulases (Figure 7). All these linkers are located between the CD and CBMs, except in Cr9C, which lacks a CBM (Figure 3). Collectively, algal linkers can be roughly classified into P/S-rich (N-Gp51468, C-Vc2952174 and Vc2958622), P/S/T-rich (Cr9D) and non-P/S/T linkers (Cr9B, Cr9C, Gp51466, C-Gp51468 and N-Vc2952174). Although it is straightforward to identify PS/PT linkers [29,46], it is not possible to confidently assign linker functions to non-P/S/T regions because of their high sequence variability [27,30,46] (Figure 7). In cellulases, linkers are on average 20–50 residues long [47]. However, linkers as short as 6–14 residues that lack S and/or T residues, and linkers longer than 100 residues, have also been reported, as have Ig-like or Fn3-like domains that take the place of linkers between CD and CBD modules (Figure 3) [27,30,32,46]. Interestingly, Vc2952174 has two linkers (N-Vc2952174 lacking P/S/T residues and C-Vc2952174 rich in P/S residues), one for each CBM (Figures 3 and 7). On the other hand, Gp51468 has two linkers on the C-terminal side of a single CBM (Figure 3), where N-Gp51468 is PS-rich and C-Gp51468 lacks P, S, and T residues (Figure 7). It is intriguing to note that GH9-appended microalgal linkers have a preponderance of PS residues (Figure 7), which contrasts with the PT residues found in the cellulase linkers of the invertebrate metazoan abalone [48] and of *Caldocellum saccharolyticum* [29]. While many cellulase-appended linkers from *Pseudomonas fluorescens* are S-rich, these have very low P content [32,47], unlike microalgal linkers, which are P/S-rich (Figure 7).
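The grouping of linkers into P/S-rich, P/S/T-rich and non-P/S/T classes can be illustrated with a simple composition check. The sketch below is only an illustration: the function name, the 15% cut-off and the example sequences are hypothetical and are not the criteria or data used in this study.

```python
from collections import Counter


def classify_linker(seq: str, threshold: float = 0.15) -> str:
    """Classify a linker by its fractional P, S and T content.

    The threshold is an assumed, illustrative cut-off, not a published rule.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("linker sequence must not be empty")
    counts = Counter(seq)
    # Fraction of the linker made up of each residue of interest
    frac = {aa: counts[aa] / len(seq) for aa in "PST"}
    if frac["P"] >= threshold and frac["S"] >= threshold:
        # T content decides between the P/S and P/S/T classes
        return "P/S/T-rich" if frac["T"] >= threshold else "P/S-rich"
    return "non-P/S/T"


# Made-up sequences, for illustration only
for name, linker in [("toy1", "PSPSPSPPSS"), ("toy2", "PSTPSTPSTG"), ("toy3", "GAGEGKDGAL")]:
    print(name, classify_linker(linker))
```

A composition threshold of this kind only flags residue bias; it says nothing about glycosylation or function, which is why non-P/S/T regions remain difficult to interpret.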

Glycosylation shows great diversity that depends on the sugars (type, sequence, chain length, branching point, anomeric nature) attached to various amino acid side chains, which generally include N for *N*-linked modification; S, T, and Y for *O*-linked modification via the OH-group; and W for C-mannosylation. Whereas *N*-linked glycosylation requires an N-X-S/T consensus sequence (X can be any amino acid, except P), no consensus motif has been described for *O*-glycosylation. As glycans are secondary gene products, glycosylation is also cell/tissue and species specific [49]. Although *N*-glycosylation of the secreted proteins in microalgae is well documented [50], no information is available regarding *O*-glycosylation in modular algal glucanases. However, glycosylation has been described in fungal glucanases, including those of *Trichoderma reesei*, which display extensive modification of linkers with di- and tri-saccharides at the OH-groups of T and S residues [51]. S and T residues confer different properties to glycosylated peptides. In contrast to S, the steric repulsion between the side chain methyl group in T and the carbohydrate moiety can drastically alter the sugar to peptide backbone orientation, with the possibility of altered water structure and/or H-bond formation [52]. These modifications likely lead to changes in the binding affinity of S versus T *O*-glycosylated peptides to the polysaccharide substrate; however, further experimental verification is required in the case of the cellulase linker-cellulose interaction.
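The N-X-S/T consensus for *N*-linked glycosylation (X being any residue except P) lends itself to a simple pattern search. The following is a minimal sketch; the function name and the test sequence are hypothetical, and a match only marks a potential site, not an experimentally confirmed modification.

```python
import re

# N-X-S/T with X != P. The lookahead keeps overlapping sequons
# (e.g. "NNTS") from hiding one another.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of the N residue in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Made-up sequence: N2 (NGS), N9 (NNT) and N10 (NTS) match; N6 (NPT) does not
print(find_sequons("MNGSANPTNNTS"))  # [2, 9, 10]
```

No comparable pattern can be written for *O*-glycosylation, since no consensus motif has been described for it.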

The presence of linkers with different amino acid sequences implies different functions. Linkers are highly divergent in length and sequence, but typically contain G, P, S, and T residues. P imparts an extended conformation [53] and does not form H-bonds, while G provides flexibility, and S/T are often involved in *O*-glycosylation, which confers rigidity, stability, and protease-resistance [27,47]. Recent studies have found that the length (distance between the CD and CBM) and rigidity/flexibility of linkers play a critical role in the efficient functioning of cellulases; however, the precise role of a linker in the structure-function of modular cellulases is not yet fully understood [54]. For example, an increase in the number of PS/T boxes enhanced the cellulolytic activity on crystalline cellulose due to desorption of the enzyme from the substrate [55], whereas progressive shortening of linkers was shown to cause a decrease in flexibility, with a concomitant reduction in activity and enhancement in stability [56].


**Figure 7.** Linkers in microalgal cellulases (between the arrows). Black, PS, or PST linkers; red, putative linker sequence (or may be part of C-terminal CD or N-terminal CBM).

#### *2.7. Expression of Cellulases in Gonium pectorale (Gp)*

RT-qPCR revealed increased expression for two of the three cellulases analyzed in this study when *Gonium* was cultivated in the presence of cellulose (Figure 8). The genes *Gp51466* and *Gp51468* showed a statistically significant increase in expression in the presence of filter paper. The cellulase encoded by *Gp44756* showed the same trend, but the increase was not statistically significant. Nevertheless, considering the phylogenetic position of *Gp44756*, the modelling results shown in Figure 6, and the gene expression analysis, it is reasonable to assume that *Gp44756* is the *G. pectorale* ortholog of *CrCel9D*.

**Figure 8.** Gene expression analysis of the three *G. pectorale* cellulases after growth for 14 days under continuous light in the presence/absence of 0.1% (*w*/*v*) filter paper. Asterisks denote statistically significant differences according to Student's *t*-test (\* *p*-value < 0.05).

#### **3. Materials and Methods**

#### *3.1. Computational Methods*

The cellulase accession numbers are indicated in the phylogenetic tree. The sequences were taken from [17], manually truncated to CDs, and supplemented by BLAST searches in the Metazome database (Available online: https://metazome.jgi.doe.gov/pz/portal.html#!search?show=BLAST). The physico-chemical properties of algal cellulases were determined using the ProtParam tool (Available online: http://web.expasy.org/protparam/). Conserved domains and GH-family assignment were identified with the MotifScan (Available online: http://myhits.isb-sib.ch/cgi-bin/motif\_scan) and ScanProsite [21] algorithms. The pairwise and multiple alignments of algal cellulases used to identify conserved residues and motifs were generated with CLUSTAL-Ω (Available online: http://www.ebi.ac.uk/Tools/msa/clustalo/) [57]. The 3D homology models of the algal sequences, comprising complete sequences as well as CD regions only, were generated with the I-TASSER Suite (Available online: http://zhanglab.ccmb.med.umich.edu/I-TASSER/) [58] utilizing LOMETS, SPICKER, and TM-align. The models were then refined using REMO, which optimizes the backbone hydrogen-bonding networks, and FG-MD, which removes steric clashes and improves the torsion angles. Separate homology models for Gp51468 and the spinach homolog were generated due to the presence of two CDs. The residues implicated in substrate binding and activity were manually annotated using the 3D structures of cellulase templates available in the PDBsum database (Available online: http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=index.html) [59], CLUSTAL-Ω, the COACH/COFACTOR tools within the I-TASSER Suite, and published literature. The final structures showing various domains, conserved regions, motifs, and active-site architecture (including surface accessibility, blocks, clefts, and tunnels) were visualized by superimposing each model on the *T. fusca* template (4TF4) in the presence of cellotetraose substrate from −1 to −4 subsites and cellobiose from +1 to +2 subsites with DeepView Swiss-PdbViewer v4.1 (Available online: http://www.expasy.org/spdbv/) [60]. For the phylogenetic analysis, truncated CD and CBM sequences (the sequences are given in Supplementary Figures S2 and S5) were aligned with CLUSTAL-Ω and the alignment was submitted to PhyML [61] (Available online: http://phylogeny.lirmm.fr/phylo\_cgi/one\_task.cgi?task\_type=phyml) to obtain a maximum likelihood phylogenetic tree (100 bootstraps). The tree was visualized with iTOL-Interactive Tree of Life (Available online: http://itol.embl.de/). Putative CBMs in Cr, Gp, and Vc algae were analyzed using MotifScan and the CAZY database (Available online: http://www.cazy.org/Carbohydrate-Binding-Modules.html) by a manual search through all 83 families. For CBM analysis, the selected sequences from the CAZY database belonging to various families were truncated to only the CBM parts by subjecting these sequences to MotifScan (Available online: https://myhits.isb-sib.ch/cgi-bin/motif\_scan). Full sequences were used for standalone CBMs, especially from the algae, and for those sequences where CBM motifs were not identified. The CBM sequences were aligned using CLUSTAL-Ω and subjected to the MEME tool (Available online: http://meme-suite.org/tools/meme) for the discovery of novel motifs [62].

#### *3.2. Growth of G. pectorale, RNA Extraction, cDNA Synthesis and RT-qPCR*

*G. pectorale* (strain K3-F3–4, mating type minus, NIES-2863 obtained from the Microbial Culture Collection at National Institute for Environmental Studies, Tsukuba, Japan; Available online: http://mcc.nies.go.jp/) was grown under continuous light (1300 lux) in 50 mL of modified Bold's 3N medium (UTEX, Austin, TX, USA) for 14 days in the presence/absence of autoclaved 0.1% *w*/*v* Whatman Grade 1 filter paper (Merck, Darmstadt, Germany). Algae were centrifuged for 10 min at 15,000× *g*, the pellet immediately frozen in liquid nitrogen, and cells disrupted using sterilized 5 mm stainless steel beads and a bead beater (Retsch MM400, Aartselaar, Belgium) set at 20 Hz for 2 min (the holders were previously cooled with liquid nitrogen to avoid heating of the samples during disruption). Total RNA was extracted using the Qiagen RNA extraction kit (Qiagen, Leusden, The Netherlands) coupled to the on-column DNase I digestion. The RNA purity and quality were measured with a Nanodrop ND-1000 (Thermo Scientific, Villebon-sur-Yvette, France) and a 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA), respectively. Two hundred nanograms of RNA were retrotranscribed into cDNA with the ProtoScript II RTase (New England Biolabs, Leiden, The Netherlands) and random primers, according to the manufacturer's instructions.

The cDNA was diluted to 2 ng/μL and 2 μL were used for the RT-qPCR analysis (final volume of the reaction: 10 μL) in 384-well plates. An automated liquid handling robot (epMotion 5073, Eppendorf, Hamburg, Germany) was used to prepare the plates, which were run on a Viia™ 7 System (Thermo Scientific, Villebon-sur-Yvette, France). The TaqMan Low ROX 2x Mix was used (Takyon, Eurogentec, Seraing, Belgium). To ensure robust results, the TaqMan chemistry was used to evaluate *Gonium* cellulase relative expression (the fluorescent dye and quencher used are FAM-TAMRA; target-specific primers and probes are given in Supplementary Table S3). The expression of the *Gonium* cellulases was calculated with qbase<sup>+</sup> version 3.1 (Biogazelle, Zwijnaarde, Belgium; Available online: www.qbaseplus.com) after normalization against *rpl23* and *eef1*, the genes that the program geNorm™ identified as the most stable. Normalized relative quantities were calculated according to [63], by considering specific target PCR efficiency and multiple reference gene normalization. Here, four candidate reference genes were validated for gene expression analysis: *eef1*, encoding the eukaryotic translation elongation factor 1α; *rpl23*, encoding the 60S ribosomal protein L23; *tbpA*, coding for the TATA-box binding protein; and *tubA1*, coding for α-tubulin. For statistical analysis, the normalized relative quantities exported from qbase<sup>+</sup> were subjected to a Student's *t*-test, as implemented in Excel.
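The idea behind efficiency-corrected normalization against several reference genes can be sketched as follows. This assumes the common form in which each relative quantity is the amplification efficiency raised to the Cq difference between a calibrator and the sample, and the target quantity is divided by the geometric mean of the reference quantities. The function names and numbers are hypothetical, and the sketch is not the qbase<sup>+</sup> implementation.

```python
from math import prod


def relative_quantity(efficiency: float, cq_calibrator: float, cq_sample: float) -> float:
    """RQ = E ** (Cq_calibrator - Cq_sample); E = 2.0 means perfect doubling per cycle."""
    return efficiency ** (cq_calibrator - cq_sample)


def normalized_relative_quantity(target_rq: float, reference_rqs: list[float]) -> float:
    """Divide the target RQ by the geometric mean of the reference-gene RQs."""
    if not reference_rqs:
        raise ValueError("at least one reference gene is required")
    geometric_mean = prod(reference_rqs) ** (1.0 / len(reference_rqs))
    return target_rq / geometric_mean


# Hypothetical Cq values and efficiencies, for illustration only
target = relative_quantity(efficiency=1.95, cq_calibrator=26.0, cq_sample=24.0)
references = [
    relative_quantity(efficiency=2.00, cq_calibrator=20.0, cq_sample=20.2),
    relative_quantity(efficiency=1.98, cq_calibrator=22.0, cq_sample=21.9),
]
print(normalized_relative_quantity(target, references))
```

Using the geometric rather than the arithmetic mean keeps a single outlying reference gene from dominating the normalization factor.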

The primers used in this study are reported in Supplementary Table S3. Primers were designed using Primer3Plus (Available online: http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi/) and further checked with the OligoAnalyzer 3.1 tool from Integrated DNA Technologies (Available online: http://eu.idtdna.com/calc/analyzer). Primer efficiencies were checked via qPCR using a six-point, five-fold serial dilution of cDNA starting at 20 ng.
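A dilution series of this kind is commonly evaluated with a standard curve: Cq is regressed on log10 of the template amount, and the amplification factor is taken as E = 10^(−1/slope), with E = 2 corresponding to 100% efficiency. The sketch below assumes that approach; the function name is hypothetical and the Cq values are computed for an ideal assay rather than measured.

```python
from math import log2, log10


def amplification_efficiency(inputs_ng: list[float], cq_values: list[float]) -> float:
    """Fit Cq against log10(input) by least squares and return E = 10 ** (-1 / slope)."""
    xs = [log10(q) for q in inputs_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope)


# Six points of a five-fold serial dilution starting at 20 ng
inputs = [20 / 5 ** i for i in range(6)]
# Illustrative (not measured) Cq values: an ideal assay gains log2(5) cycles per step
cqs = [18.0 + i * log2(5) for i in range(6)]

e = amplification_efficiency(inputs, cqs)
print(f"amplification factor: {e:.3f}, efficiency: {(e - 1) * 100:.1f}%")
```

With real data, the fitted slope deviates from the ideal value of about −3.32, and the resulting target-specific efficiency is what enters the relative quantification.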

#### **4. Conclusions and Future Direction**

This is the first report on the bioinformatics of algal family GH9 cellulases. The GH9 catalytic domains of algal cellulases form a distinct group, which is phylogenetically closer to invertebrate metazoan than to plant or bacterial homologs. All algal enzymes were found to be modular, and analysis of the active-site architecture of the considered CDs indicates endoglucanase and mixed exo/endo (processive endoglucanase) types of activities. It has been suggested that the lack of pure cellobiohydrolases (exo-acting) in algae is compensated by the presence of many processive endoglucanases, along with endocellulases, to produce a simple and efficient enzyme system for the degradation of cellulose [4]. Except for Cr9C, all cellulase homologs have at least one putative C-terminal novel cysteine-rich CBM. The presence of novel CBMs and PS-rich linkers, in combination with CDs, indicates that the studied cellulases may have enhanced catalytic properties suitable for the efficient degradation of cellulosic biomass. In this context, Gp51468 is of special interest as it is composed of two CDs with exo/endo activities, two different linkers, and a single CBM. Future work will involve cloning, purification, and crystallization of Gp51468 to fully understand its mode of action, as well as growing it in the presence of different cellulosic substrates for the production of valuable biochemicals.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1422-0067/19/6/1782/s1.

**Author Contributions:** G.G. and K.S.S. conceived the idea of writing the paper and designed the experiments; G.G. and K.S.S. performed the experiments; K.S., S.L. and I.A. contributed to the experimental work concerning bioinformatics and the TaqMan-based RT-qPCR; G.G., I.A. and K.S.S. analyzed the data; G.G. and K.S.S. wrote the paper draft; all authors discussed, contributed to and edited the paper.

**Acknowledgments:** The personal assistance of K.S.S. and I.A. by KFUPM is acknowledged.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
