*Review* **β-Xylosidases: Structural Diversity, Catalytic Mechanism, and Inhibition by Monosaccharides**

**Ali Rohman <sup>1,2</sup>, Bauke W. Dijkstra <sup>3</sup> and Ni Nyoman Tri Puspaningsih <sup>1,2,</sup>\***


Received: 15 October 2019; Accepted: 4 November 2019; Published: 6 November 2019

**Abstract:** Xylan, a prominent component of cellulosic biomass, has a high potential for degradation into reducing sugars, and subsequent conversion into bioethanol. This process requires a range of xylanolytic enzymes. Among them, β-xylosidases are crucial, because they hydrolyze more glycosidic bonds than any of the other xylanolytic enzymes. They also enhance the efficiency of the process by degrading xylooligosaccharides, which are potent inhibitors of other hemicellulose-/xylan-converting enzymes. On the other hand, the β-xylosidase itself is also inhibited by monosaccharides that may be generated in high concentrations during the saccharification process. Structurally, β-xylosidases are diverse enzymes with different substrate specificities and enzyme mechanisms. Here, we review the structural diversity and catalytic mechanisms of β-xylosidases, and discuss their inhibition by monosaccharides.

**Keywords:** biomass; hemicellulose; bioethanol; xylanolytic enzyme; hemicellulase; glycoside hydrolase

### **1. Introduction**

Xylan is a prominent component of cellulosic biomass, a heterogeneous complex of carbohydrate polymers (cellulose and hemicellulose) and lignin, a complex polymer of phenylpropane units [1–3]. Hemicellulose, including xylan, makes up approximately one-third of the carbohydrate content of common agricultural and forestry waste [1–3]. It is among the most inexpensive non-food biomass that is sustainably available in nature in large quantities, and that can be converted into biofuel or other value-added products, such as low-calorie sweeteners, prebiotics, surfactants and various specialty chemicals [1,4–6]. Structurally, xylan is a complex heteropolysaccharide with a glycosidic β-(1,4)-linked d-xylose backbone that is frequently substituted with side chains of arabinose, glucuronic acid and other groups. In turn, these side chains may be further esterified with acetic, ferulic, and *p*-coumaric acids (Figure 1). The type and frequency of the side chains and their substituents vary with the source of xylans [2,7–11].

**Figure 1.** Example of the structure of a plant xylan with the cleavage sites of various xylanolytic enzymes indicated. A β-d-xylopyranose unit with numbered carbon atoms is shown in the middle. Glycosidic bonds and xylanolytic enzymes that hydrolyze them are depicted in the same color [7,9].

As a complex heteropolysaccharide, full degradation of xylan into its monosaccharide constituents requires the concerted action of various hydrolytic xylan-degrading enzymes with different specificities (Figure 1). These enzymes include α-l-arabinofuranosidase (EC 3.2.1.55), α-d-glucuronidase (EC 3.2.1.139), acetylxylan esterase (EC 3.1.1.72), and *p*-coumaric acid and ferulic acid esterases (EC 3.1.1.73), which release the side chain substituents from the xylan backbone, and *endo*-β-1,4-xylanase (EC 3.2.1.8), which works synergistically with β-xylosidase (EC 3.2.1.37) to break down the xylan backbone. *Endo*-β-1,4-xylanase hydrolyses the internal β-(1,4) linkages of the xylan backbone producing short xylooligosaccharides, while β-xylosidase removes xylose units from the non-reducing termini of these xylooligosaccharides [2,7,8,10]. In nature, xylanolytic enzymes are mainly found in numerous saprophytic microorganisms, such as fungi, actinomycetes and other bacteria, as well as in the rumen biota of higher animals. The microorganisms secrete the enzymes, for example, as a strategy for expanding their versatility to use primary carbon sources [7,11–14].

Xylan-degrading enzymes have found application as environmentally friendly agents in a wide range of industrial processes, such as bleaching of paper pulp, deinking of recycled paper, enhancing the digestibility and nutritional properties of animal feed, degumming of plant fiber sources, manufacturing of beer and wine, clarification of fruit juices and maceration of fruits and vegetables, preparation of high-fiber baked goods, and the extraction of coffee [2,7,15,16]. Furthermore, the enzymes are applied during the saccharification of pretreated agricultural and forestry cellulosic biomass into fermentable sugars [2,15], e.g., for producing biofuel.

In xylan saccharification, β-xylosidase is a crucial enzyme since, of all the xylanolytic enzymes, it cleaves the greatest number of glycosidic bonds [17–19]. In addition, because xylooligosaccharides are potent inhibitors of *endo*-β-1,4-xylanases and cellulases, the activity of β-xylosidase can improve the efficiency of the saccharification process by degrading the xylooligosaccharides and thus alleviating inhibition of those enzymes [7,11,20–23]. However, most of the characterized β-xylosidases are themselves inhibited, to some extent, by xylose, arabinose, glucose, and/or other monosaccharides [2,11,24–27]. This is an important problem, since in industrial cellulosic biomass saccharification, the monosaccharides may accumulate to concentrations high enough to significantly reduce the activity of β-xylosidase, even in simultaneous saccharification and fermentation processes, where monosaccharides are directly consumed by the fermenting organisms [26,27]. This adverse property may severely reduce the efficiency of β-xylosidases in the saccharification process.

In this review, the structural diversity, catalytic mechanisms and inhibition by monosaccharides of β-xylosidases are discussed.

### **2. Structural Diversity of β-Xylosidases**

β-Xylosidases are a group of structurally diverse enzymes with varying specificities, in line with the diversity of the organisms that produce them and the heterogeneity of their substrates [28]. However, as commonly observed in other glycoside hydrolases (GHs), they hydrolyze the glycosidic bond via one of two routes, either with overall retention or with overall inversion of the anomeric carbon configuration [29].

GHs are classified in the Carbohydrate-Active Enzymes database (CAZy; http://www.cazy.org/), which groups the enzymes into families based on their amino acid sequence similarities [30,31]. As there is a direct relationship between amino acid sequence similarity and similarity of folding, the classification also represents the structural features and commonality of the catalytic mechanism of the enzymes. Thus, enzymes in a particular family display highly similar three-dimensional structures and catalytic mechanisms [29,32]. At present, 161 GH families (GH1 to GH161) are represented on the CAZy server. Nevertheless, despite divergent amino acid sequences, several different GH families show significantly similar protein folding and active site architecture. Such GH families are considered to have a common ancestor and, therefore, have been grouped together into a clan [33]. To date, 18 GH clans (GH-A to GH-R) have been assigned in the database.

A search using the enzyme classification number for β-xylosidase (EC 3.2.1.37) in the CAZy database [31] revealed that enzymes with this number are presently found in 11 different GH families (Table 1). Nevertheless, a further literature examination suggests that 3 families, i.e., GH1, GH54 and GH116, may not contain enzymes with β-xylosidase activity on natural substrates. The enzymes from *Reticulitermes flavipes* (RfBGluc-1; GenPept accession No. ADK12988) [34] and *R. santonensis* De Feytaud (GenPept ADT62000) [35] in GH1, and *Trichoderma koningii* G-39 (TkAbf; GenPept AAA81024) [36] in GH54 were classified as β-xylosidases because they hydrolyze artificial nitrophenyl-β-d-xylopyranoside derivatives. However, to our knowledge there is no evidence that these enzymes are able to release xylose from natural substrates. Similarly, a bifunctional aryl β-glucosidase/β-xylosidase from the hyperthermophilic archaeon *Saccharolobus solfataricus* P2 (formerly *Sulfolobus solfataricus*; SSO1353; GenPept AAK41589) in GH116 is called so based on its activity on aryl β-glucosides and β-xylosides, but the enzyme likely does not hydrolyze xylooligosaccharides [37]. All in all, this suggests that enzymes with β-xylosidase activity on natural substrates currently occur in only 8 GH families in the CAZy database, i.e., in GH families 3, 5, 30, 39, 43, 51, 52 and 120.
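The family count above is simple set arithmetic and can be sketched as follows. This is only a minimal illustration: the variable names are hypothetical, and the family numbers are taken solely from the text of this section, not from a live query of the CAZy database.

```python
# GH families in which CAZy lists at least one EC 3.2.1.37 entry (from the text).
annotated_families = {1, 3, 5, 30, 39, 43, 51, 52, 54, 116, 120}

# Families whose annotated enzymes have so far only been shown to act on
# artificial aryl β-xylosides, not on natural substrates (from the text).
artificial_substrate_only = {1, 54, 116}

# Families with evidence of β-xylosidase activity on natural substrates.
natural_substrate_families = sorted(annotated_families - artificial_substrate_only)

print(len(annotated_families))     # 11
print(natural_substrate_families)  # [3, 5, 30, 39, 43, 51, 52, 120]
```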


**Table 1.** Distribution of the current β-xylosidases in the CAZy database, their catalytic domain fold, their type of catalytic mechanism, and their catalytic residues.

†: Catalysis by GHs commonly proceeds with either retention or inversion of the substrate's anomeric carbon configuration. See main text for further information. ‡: It is unknown whether the enzymes from GH1, GH54, and GH116 have β-xylosidase activity on natural substrates. #: Not part of a clan. §: General base; General acid. %: Not assigned in the CAZy database. Data are from the crystal structure of the α-l-arabinofuranosidase from *Aspergillus kawachii* IFO4308 [38].

#### *2.1. Glycoside Hydrolase Clan A (GH-A)*

GH families 1, 5, 30, 39 and 51 are part of clan GH-A, the largest clan in the CAZy database with currently 23 GH families. Enzymes in this clan all have a (β/α)8 catalytic domain, also known as triose-phosphate isomerase (TIM) barrel domain [39].

Within clan GH-A, structural data for β-xylosidases are currently only available for GH39, i.e., for the β-xylosidases from *Thermoanaerobacterium saccharolyticum* B6A-RI (TsXynB; Protein Data Bank (PDB) code 1PX8; Figure 2e) [40], *Geobacillus stearothermophilus* T-6 (GsXynB1; PDB 2BS9) [41] and *Caulobacter crescentus* NA1000 (CcXynB2; PDB 4EKJ) [42]. These enzymes fold into a three-domain structure, consisting of an N-terminal (β/α)8-barrel catalytic domain, sequentially followed by a β-sandwich and an α-helical accessory domain. Their structures are very similar. Superposition of the structures of isolated proteins gave an overall root mean squared deviation (RMSD) of 1.6 Å for 462 amino acid residues. However, while CcXynB2 exists as a monomer in solution [42], TsXynB and GsXynB1 are present as tetramers [40,41]. The absence of a short amino acid sequence at the C-terminus of CcXynB2, compared to the other two enzymes, has been suggested to prevent the formation of a stable tetramer [42]. Additionally, it has been proposed that subtle structural differences in the accessory domains of these β-xylosidases slightly alter their overall structure and the accessibility of their catalytic region [42].

In the absence of structural data for β-xylosidases from families GH1, GH5, GH30, and GH51, we generated homology models to compare the 3D structures of β-xylosidases from the different families of the GH-A clan. Models were built of the β-glucosidase/β-xylosidase RfBGluc-1 (GH1; Figure 2a) [34], a β-xylosidase from *Phanerochaete chrysosporium* BKM-F-1767 (PcXyl5; GenPept AHL69750; GH5; Figure 2c) [43], a β-glucosidase/β-xylosidase from *Phytophthora infestans* (PiBGX1; GenPept AAK19754; GH30; Figure 2d) [44], and an α-l-arabinofuranosidase/β-xylosidase from *Arabidopsis thaliana* (AtAraf; GenPept AAF19575; GH51; Figure 2h) [45], using 3D structures of their nearest homologs as templates. All resulting models display a (β/α)8-barrel catalytic domain that is highly similar to the catalytic domain of GH39 β-xylosidases (e.g., Figure 2e), and they show that the catalytic residues of GH39 are present at the equivalent positions in the GH1, GH5, GH30, and GH51 β-xylosidase families. A multiple structural alignment of the catalytic domains of these models and GH39 β-xylosidases gave an overall RMSD of 3.4 Å for 168 amino acid residues, with PcXyl5 being the most divergent from the other structures. In contrast, the structures of their accessory domains varied with the family. The accessory domains of GH39 β-xylosidases are absent in RfBGluc-1 and PcXyl5, but they are retained at a comparable position in PiBGX1 and AtAraf, albeit with some modifications. The major differences are observed for the third domain, in which the GH39 α-helical domain is replaced by a β-sheet and a loop structure in PiBGX1 and AtAraf, respectively.
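The RMSD values quoted in this section summarize how far equivalent atoms lie from each other after superposition. The sketch below shows that calculation under simplifying assumptions: the two structures are already superposed and the equivalent atoms (e.g., Cα atoms) are already paired. The function name `rmsd` and the toy coordinates are hypothetical and are not taken from any PDB entry.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean squared deviation (Å) between paired, pre-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances over all atom pairs.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: three atom pairs, each displaced by exactly 2 Å along x.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(a, b))  # 2.0
```

Structural alignment programs additionally search for the optimal rotation, translation, and residue pairing before reporting an RMSD; those steps are deliberately omitted here.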

**Figure 2.** Three-dimensional (3D) structures of β-xylosidases from various GH families. Helix, strand, and loop structures are colored in magenta, blue, and green, respectively. GH family numbers and fold type of their catalytic domains are shown. The structures represented are (**a**) RfBGluc-1 from *Reticulitermes flavipes* (GenPept ADK12988); (**b**) GlyA1 from metagenomic cow rumen fluid (PDB 5K6L); (**c**) PcXyl5 from *Phanerochaete chrysosporium* BKM-F-1767 (GenPept AHL69750); (**d**) PiBGX1 from *Phytophthora infestans* (GenPept AAK19754); (**e**) TsXynB from *Thermoanaerobacterium saccharolyticum* B6A-RI (PDB 1PX8); (**f**) RS223-BX from an uncultured organism (PDB 4MLG); (**g**) GsXynB3 from *Geobacillus stearothermophilus* T-6 (PDB 2EXH); (**h**) AtAraf from *Arabidopsis thaliana* (GenPept AAF19575); (**i**) GT2\_24\_00240 from *Parageobacillus thermoglucosidasius* TM242 (PDB 4C1O); (**j**) TkAbf from *Trichoderma koningii* G-39 (GenPept AAA81024); (**k**) SSO1353 from *Saccharolobus solfataricus* P2 (GenPept AAK41589); and (**l**) TsXylC from *Thermoanaerobacterium saccharolyticum* JW/SL-YS485 (PDB 3VST). The structures of (a), (c), (d), (h), (j), and (k) were modeled using PDB entries 3VIK, 1EQP, 2XWE, 2C8N, 1WD3, and 5BVU, respectively, which belong to the same GH family but do not have β-xylosidase activity. Structure modeling was performed using the Swiss-Model server [46]. Figure 2, Figure 3, and Figure 6 were produced using the program PyMOL (The PyMOL Molecular Graphics System, v. 0.99, Schrödinger, LLC, http://www.pymol.org).

#### *2.2. Glycoside Hydrolase Family 3 (GH3)*

While GH families 1, 5, 30, 39 and 51 are part of clan GH-A, other β-xylosidases belong to families that are not part of this clan. GH3 is one of the largest and most diverse GH families in the CAZy database [28,47]. It contains more than 23,400 entries with various enzyme activities, including β-xylosidase, β-glucosidase, β-glucosylceramidase, β-N-acetylhexosaminidase, and α-l-arabinofuranosidase activities. A number of GH3 enzymes are reported to be bi/multifunctional, particularly toward synthetic substrates.

Enzymes in family GH3 vary considerably in the lengths of their peptide chains [48,49] and, consequently, in the number of tertiary structure domains [48,50]. The basic structure of GH3 members is a single (β/α)8 TIM-barrel domain [48,50], similar to the domain that is observed in clan GH-A. In most members, the domain is followed by an (α/β)-sandwich domain that varies in size [48], e.g., (α/β)6 in *Kluyveromyces marxianus* NBRC1777 β-glucosidase [51], (α/β)5 in *Thermotoga neapolitana* β-glucosidase [49], or even only an αβα motif in *Bacillus subtilis* 168 β-N-acetylglucosaminidases [52]. Sometimes the order of the domains in the primary structure is reversed [48]. Although these two domains are generally sufficient to organize the active site of GH3 enzymes, frequently GH3 members are extended with a fibronectin type III (FnIII) domain of unknown function at the C-terminus of the (α/β)-sandwich domain [48,49]. Moreover, in some GH3 members, the (α/β)-sandwich domain is interrupted by a PA14 domain. This domain appeared to be important for the substrate specificity of the *Kluyveromyces marxianus* NBRC1777 β-glucosidase [51].

A total of 103 enzymes with β-xylosidase annotation are currently found in GH3, making it the largest β-xylosidase-containing GH family. A protein domain search using the program InterProScan 5 [53] revealed that the majority of the GH3 β-xylosidases are composed of three domains (TIM-barrel, (α/β)-sandwich and FnIII). However, a bifunctional β-xylosidase/β-glucosidase from *Erwinia chrysanthemi* D1 (EcBgxA; GenPept AAA80156) [54] has two domains (TIM-barrel and (α/β)-sandwich) and a β-xylosidase from an environmental sample (G06-24; GenPept ACY24766) [55] has four domains (TIM-barrel, (α/β)-sandwich, FnIII, and PA14). A phylogenetic analysis clustered these two enzymes divergently from the other GH3 β-xylosidases [56].

3D structures of GH3 β-xylosidases are available for a β-xylosidase from the fungus *Trichoderma reesei* RutC-30 (TrBxl1; PDB 5A7M; GenPept CAA93248) [57] and a β-glucosidase/β-xylosidase from metagenomic cow rumen fluid (GlyA1; PDB 5K6L; Figure 2b) [58]. Both structures have a (β/α)8 TIM-barrel, an (α/β)6-sandwich, and an FnIII domain, but at different positions in the primary structure. As observed for the majority of GH3 structures [48], TrBxl1 has its TIM-barrel domain at the N-terminus, followed sequentially by the (α/β)-sandwich and FnIII domains. This order is reversed in GlyA1, where the (α/β)-sandwich domain is at the N-terminus, followed by the FnIII and TIM-barrel domains. In addition, GlyA1 has an extra domain of unknown structure at its C-terminus [58]. Despite these differences, the 3D structures of TrBxl1 and GlyA1 are conserved, with the TIM-barrel and (α/β)-sandwich domains, as well as the catalytic residues, superimposing reasonably well when the domains are structurally aligned.

#### *2.3. Glycoside Hydrolase Family 43 (GH43)*

GH43 is the second largest β-xylosidase-containing GH family with currently 96 members annotated as β-xylosidase. In addition to β-xylosidases, this family also contains enzymes with (putative) α-l-arabinofuranosidase, arabinanase, xylanase, galactan 1,3-β-galactosidase, α-1,2-l-arabinofuranosidase, *exo*-α-1,5-l-arabinofuranosidase, *exo*-α-1,5-l-arabinanase, or β-1,3-xylosidase activities. As observed for the GH3 members, several enzymes in this family are bi/multifunctional.

Together with GH62, GH43 is grouped into clan GH-F in the CAZy database, which is structurally characterized by a 5-bladed β-propeller catalytic domain [59]. Some of its members contain only this single catalytic domain, and, based on their domain architecture, were classified as type I [60]. In other members, the catalytic domain is extended with a family 6 carbohydrate-binding module (type II) or a unique β-sandwich domain that is designated as X19 [61] (type III), or the enzymes have an even more complex domain composition and organization (type IV). The extensions are commonly fused at the C-terminus of the catalytic domain [60,62], although in a β-xylosidase from *G. thermoleovorans* IT-08 (GbtXyl43B), for example, the extension is at the N-terminus [63]. Thus, GH43 contains enzymes that vary both in the lengths of their primary structure and in their number of structural domains.
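The type I to IV scheme described above amounts to a small decision rule on the domains that accompany the catalytic β-propeller. A minimal sketch of that rule follows; the function name `gh43_type` and the string labels `"CBM6"` and `"X19"` are hypothetical conventions chosen for illustration, and the sketch ignores whether the extension is N- or C-terminal.

```python
def gh43_type(accessory_domains):
    """Classify a GH43 enzyme by the domains fused to its β-propeller.

    accessory_domains: list of labels for every domain other than the
    catalytic 5-bladed β-propeller, e.g. [], ["CBM6"] or ["X19"].
    """
    if not accessory_domains:
        return "I"    # catalytic domain only
    if accessory_domains == ["CBM6"]:
        return "II"   # family 6 carbohydrate-binding module
    if accessory_domains == ["X19"]:
        return "III"  # X19 β-sandwich domain
    return "IV"       # any more complex composition

print(gh43_type([]))               # I
print(gh43_type(["X19"]))          # III
print(gh43_type(["X19", "CBM6"]))  # IV
```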

For detailed characterization, enzymes in GH43 have been divided into 37 subfamilies, GH43\_1 to GH43\_37 [61]. In this classification, β-xylosidases are currently found in 16 different subfamilies, with the majority belonging to subfamilies GH43\_1 and GH43\_11. Two GH43\_1 β-xylosidase crystal structures are currently present in the PDB, i.e., from an uncultured organism (RS223-BX; PDB 4MLG; Figure 2f) [64] and from a compost metagenome (CoXyl43; PDB 5GLK) [65]. The structurally best characterized β-xylosidases are from GH43\_11, with crystal structures available of seven different β-xylosidases, i.e., the β-xylosidases from *B. subtilis* (PDB 1YIF) (Patskovsky et al., unpublished work), *Clostridium acetobutylicum* ATCC 824 (CaXyl43\_11; PDB 1YI7) (Teplyakov et al., unpublished work), *B. halodurans* (PDB 1YRZ) (Fedorov et al., unpublished work), *G. stearothermophilus* T-6 (PDB 2EXH; Figure 2g) [66], *Selenomonas ruminantium* GA192 (PDB 3C2U) [67], *B. pumilus* IPO (PDB 5ZQJ) [68], and *Bacillus* sp. HJ14 (PDB 6IFE) [69]. Additionally, GH43 β-xylosidase 3D structures are also found in subfamilies GH43\_12 and GH43\_26, i.e., the β-xylosidases from *G. thermoleovorans* IT-08 (PDB 5Z5D) [70] and *C. acetobutylicum* ATCC 824 (CaXyl43\_26; PDB 3K1U) (Osipiuk et al., unpublished work), respectively.

While β-xylosidases from GH43\_1 and GH43\_26 have only a single 5-bladed β-propeller catalytic domain [64,65] and belong to type I GH43 [60], those from GH43\_11 and GH43\_12 possess an additional X19 domain at the C-terminus [66–68,70] and belong to type III GH43 [60]. Although the architecture of the 5-bladed β-propeller is highly conserved among the GH43 β-xylosidases [64], structural superposition of the type I and type III catalytic domains gave a high RMSD. This is because the catalytic domains of the type I enzymes have several significantly longer loops than those of type III. The single catalytic domain of the type I GH43 β-xylosidases is sufficient for activity, but the enzymes are strongly activated by divalent metal ions, particularly calcium. Indeed, these enzymes contain a metal-binding site close to their active site [64,65]. In contrast, the type III GH43 β-xylosidases have no such metal-binding site [66–68,70]. The X19 domain, which is only found in a subset of GH43 subfamilies [61], appears to be crucial for the catalytic activity of the type III GH43 β-xylosidases, since removing this domain abolished the activity of the GH43\_11 β-xylosidases from *Thermobifida fusca* YX [71] and *Enterobacter* sp. [72]. In fact, a loop from the X19 domain contributes a Phe residue to the active site of the type III β-xylosidases [66–68,70], and this Phe is spatially conserved among all GH43 β-xylosidase structures. Only in CaXyl43\_26 is this Phe missing. Unfortunately, no biochemical evidence is available on this enzyme's substrate preferences and catalytic activity, but given that all other enzymes in GH43\_26 are α-l-arabinofuranosidases [61], some doubt that CaXyl43\_26 is a genuine β-xylosidase seems justified. These observations suggest that although GH43 β-xylosidases adopt different overall folds, the enzymes have a common active site organization and use a conserved Phe to interact with the substrate in subsite −1 (see below; Figure 4b).

#### *2.4. Glycoside Hydrolase Family 52 (GH52)*

Currently, GH52 contains 112 entries, of which 11 enzymes are annotated as β-xylosidases. These β-xylosidases have comparable amino acid sequence lengths of about 700 amino acid residues, with the exception of a β-xylosidase from *G. stearothermophilus* 236 (GsXylA, GenPept AAA50863), which is composed of only 618 amino acid residues. The GH52 β-xylosidases are very similar to each other with amino acid sequence identities of around 41%–90%. In this family, crystal structures are available for β-xylosidases from *Parageobacillus thermoglucosidasius* TM242 (GT2\_24\_00240; PDB 4C1O; Figure 2i) [73] and *G. stearothermophilus* T-6 (Xyn52B2; PDB 4RHH) (Dann et al., unpublished work). With a sequence identity of 86%, the two proteins fold into almost the same structure; they display two distinct domains, an N-terminal β-sandwich domain and a C-terminal (α/α)6-barrel domain. The catalytic residues of the GH52 enzymes, which are Glu-357 and Asp-517 in GT2\_24\_00240 [73], are located in the (α/α)6-barrel domain. Protein homology modeling based on the structure of GT2\_24\_00240 suggested that the domains are conserved among the GH52 β-xylosidases, except for the C-terminal domain of GsXylA. Compared to the other GH52 β-xylosidases, the C-terminal domain of GsXylA lacks five α-helices, such that it displays an open half-barrel structure.

#### *2.5. Glycoside Hydrolase Family 54 (GH54)*

Most of the characterized enzymes in GH54 are annotated as α-l-arabinofuranosidases. However, two sequences in this family are annotated as β-xylosidase. TkAbf from *T. koningii* G-39 was characterized as a bifunctional α-l-arabinofuranosidase/β-xylosidase due to its activity on synthetic nitrophenyl derivatives of α-l-arabinofuranoside and β-d-xylopyranoside with comparable *kcat*/*Km* values [36]. A three-dimensional structure of a GH54 enzyme is currently only available for an α-l-arabinofuranosidase from *A. kawachii* IFO4308 (AkAbfB; PDB 1WD3) [38]. The primary structures of TkAbf (500 residues) and AkAbfB (499 residues) are very similar with an amino acid sequence identity of 73%. Therefore, the three-dimensional structure of TkAbf was predicted by homology modeling using the crystal structure of AkAbfB as a template. The predicted model (Figure 2j) consists of two domains that correspond to the N-terminal catalytic domain and the C-terminal arabinose-binding domain of the AkAbfB structure [38]. The catalytic domain folds into a β-sandwich similar to that of clan GH-B enzymes, while the arabinose-binding domain has a β-trefoil structure that belongs to the family 42 carbohydrate-binding module [28,38].

#### *2.6. Glycoside Hydrolase Family 116 (GH116)*

In GH116, SSO1353 is the only enzyme that exhibits β-xylosidase activity. As mentioned above, this enzyme does not hydrolyze xylooligosaccharides, but it is active on artificial substrates such as *p*-nitrophenyl- and methylumbelliferyl-linked β-d-xylopyranosides [37]. Currently, a three-dimensional structure of a GH116 member is only available for the β-glucosidase from the thermophilic bacterium *T. xylanolyticum* LX-11 (TxGH116; PDB 5BVU). This protein folds into an N-terminal β-sandwich domain and a C-terminal (α/α)6 solenoid catalytic domain [74]. The primary structure similarity of SSO1353 and TxGH116 is rather low with an amino acid sequence identity of only ~20%. However, homology modeling of SSO1353 based on the structure of TxGH116 using the Swiss-Model server [46] produced a model of relatively good quality with a Global Model Quality Estimation (GMQE) value of 0.63 (on a scale of 0–1). As expected, the model displays a two-domain fold, i.e., an N-terminal β-sandwich domain and a C-terminal (α/α)6-barrel domain (Figure 2k), very much like the domain organization of the GH52 proteins (see above). Importantly, the modeling placed the catalytic nucleophile and acid/base residues of SSO1353 (Glu-335 and Asp-462, respectively) [37] at about the same positions as those of the GH52 β-xylosidase GT2\_24\_00240 (Glu-357 and Asp-517, respectively) [73]. In view of this structural similarity and the conservation of the catalytic residues, GH families 52 and 116 were recently grouped into clan GH-O [74].

#### *2.7. Glycoside Hydrolase Family 120 (GH120)*

Of the 176 sequences that are currently available in the CAZy database for the GH120 family, two enzymes were characterized and identified as β-xylosidases, i.e., enzymes from *Thermoanaerobacterium saccharolyticum* JW/SL-YS485 (TsXylC; GenPept ABM68042) [75] and *Bifidobacterium adolescentis* LMG10502 (BaXylB; GenPept BAF39080) [56]. While TsXylC was shown to be active on xylobiose and xylotriose [75], BaXylB prefers xylotriose or longer xylooligosaccharides as its substrate [56]. The three-dimensional structure of TsXylC has been reported to consist of a core domain with a right-handed parallel β-helix fold, a common fold observed in several GHs, polysaccharide lyases, and carbohydrate esterases. This core domain is interrupted by an Ig-like β-sandwich domain (PDB 3VST, Figure 2l). Both domains are important for organizing the active site of the enzyme [25]. BaXylB shares 47% amino acid sequence identity with TsXylC. A homology model of BaXylB based on the structure of TsXylC suggested that the active site residues and their positions are conserved in the two enzymes, except for Trp-362 in BaXylB, which is a histidine (His-352) in TsXylC.

**Figure 3.** β-Xylosidase active site. Molecular surface drawing of active sites of β-xylosidases colored according to their electrostatic potential (negative, red; neutral, white; positive, blue). Complexed ligands are depicted in ball and stick representation with carbon atoms in green. The active sites are of (**a**) GlyA1 from metagenomic cow rumen fluid in complex with xylose (PDB 5K6N; GH3); (**b**) GsXynB1 from *Geobacillus stearothermophilus* T-6 in complex with 2,5-dinitrophenyl-β-d-xyloside (PDB 2BFG; GH39); (**c**) CoXyl43 from a compost metagenome in complex with xylose and xylobiose (PDB 5GLN; GH43\_1); (**d**) GsXynB3 from *G. stearothermophilus* T-6 in complex with xylobiose (PDB 2EXJ; GH43\_11); (**e**) GT2\_24\_00240 from *Parageobacillus thermoglucosidasius* TM242 in complex with xylobiose (PDB 4C1P; GH52); and (**f**) TsXylC from *Thermoanaerobacterium saccharolyticum* JW/SL-YS485 in complex with xylobiose (PDB 3VSU; GH120). The electrostatic potential was calculated using the APBS (Adaptive Poisson–Boltzmann Solver) implemented in the program PyMol [76]. (**g**) Generalized schematic diagram of a β-xylosidase active site with a ligand bound at subsites –1 and +1. Catalytic residues (see below) are represented by carboxylate groups and their catalytic roles are indicated. The exact positions of the catalytic residues vary with enzymes (see Figure 6).

### **3. Active Site of β-Xylosidases**

Despite the diversity of their three-dimensional folds, all structurally characterized β-xylosidases display a typical pocket-shaped active site (Figure 3) that is well suited to *exo*-acting enzymes [29]. The pocket is negatively charged due to the presence of several acidic residues, but it also contains hydrophobic patches of aromatic residues (Figure 3a–f). It has only a single route for substrates to enter and products to exit. The active site pocket can be conceptually divided into two subsites, each of which can accommodate a monosaccharide residue (Figure 3g). One subsite is buried, and, in several enzyme-xylobiose complexes (e.g., PDBs 2EXJ, 4C1P and 3VSU), interacts with the –1 non-reducing-end xylose (subsite –1), while the other is more open and binds the +1 xylose (subsite +1). Substrates with more than two xylose residues must have the additional residues beyond +1 exposed to the bulk solvent [25,67]. The active site architecture seems to be both necessary and sufficient for β-xylosidase activity of the enzymes.

Furthermore, comparison of active site structures of several β-xylosidase-ligand complexes suggests that there are similar interactions between the enzymes and their ligands (Figure 4). The ligand in subsite –1 is strongly bound to the enzyme by a large number of hydrogen bonds and a few hydrophobic stacking interactions. In contrast, the ligand in subsite +1 interacts less strongly with the enzyme, with fewer hydrogen bonds but more hydrophobic stacking interactions.

**Figure 4.** Interactions between active site residues of β-xylosidases and their ligands. The ligands 2,5-DNPX (2,5-dinitrophenyl-β-d-xyloside) and BXP (β-d-xylobiopyranose) are represented with purple bonds and their binding subsites –1 and +1 are indicated. Catalytic residues are labeled in magenta. Hydrogen bonds are shown as dashed lines and their distances are indicated in Å, while hydrophobic interactions are rendered with arcs. The active sites are of (**a**) GsXynB1 (PDB 2BFG); (**b**) GsXynB3 (PDB 2EXJ); (**c**) GT2\_24\_00240 (PDB 4C1P); and (**d**) TsXylC (PDB 3VSU), which represent β-xylosidases from GH families 39, 43, 52, and 120, respectively (see caption of Figure 3 for further details of the enzymes). Interaction analysis and figure preparation were performed using LigPlot<sup>+</sup> [77].

### **4. Catalytic Mechanism of β-Xylosidases**

With respect to their catalytic mechanism most GHs can generally be classified into retaining and inverting enzymes [29,78]. The retaining GHs hydrolyze their substrates with overall retention of the stereochemistry of the anomeric carbon atom of the hydrolyzed glycosidic bond, while the inverting GHs yield a product with an inverted stereochemistry of the anomeric carbon atom [29,78]. In both mechanisms, the enzymes rely on two catalytic carboxylate groups that function as a nucleophile and a general acid/base in the retaining enzymes, or as a general base and a general acid in the inverting enzymes, respectively (Figure 5).

The retaining enzymes use a two-step double-displacement mechanism, in which enzyme glycosylation is followed by deglycosylation. In the glycosylation step, the nucleophile attacks the anomeric carbon to form a glycosyl–enzyme intermediate with inverted configuration at the anomeric carbon. Concomitantly, the (protonated) acid/base residue transfers its proton to the glycosidic oxygen atom to cleave the scissile glycosidic bond. Departure of the aglycone creates space that allows a catalytic water molecule to approach the anomeric center. In the deglycosylation step, the incoming catalytic water molecule, which is activated by the now negatively charged acid/base, attacks the anomeric carbon to release the glycone product from the intermediate. This attack re-inverts the configuration of the anomeric carbon, and hence the released glycone has the same stereochemistry as it had in the substrate. In contrast, the inverting enzymes follow a single-displacement mechanism to hydrolyze the glycosidic bond. A catalytic water molecule, which is deprotonated by the general base, performs a nucleophilic attack on the anomeric carbon in concert with the general acid protonating the glycosidic oxygen. This cleaves the scissile glycosidic bond and frees the glycone with inverted stereochemistry at its anomeric carbon.
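The stereochemical bookkeeping of the two mechanisms can be illustrated with a minimal Python sketch. It is only a toy model: the names `Anomer`, `invert`, `retaining_hydrolysis`, and `inverting_hydrolysis` are hypothetical and introduced here for illustration, and each nucleophilic displacement at the anomeric carbon is simply treated as one inversion of configuration.

```python
from enum import Enum


class Anomer(Enum):
    """Configuration at the anomeric carbon (illustrative labels only)."""
    ALPHA = "alpha"
    BETA = "beta"


def invert(config: Anomer) -> Anomer:
    """One displacement at the anomeric carbon flips its configuration."""
    return Anomer.ALPHA if config is Anomer.BETA else Anomer.BETA


def retaining_hydrolysis(substrate: Anomer) -> tuple[Anomer, Anomer]:
    """Double displacement: returns (intermediate, product) configurations."""
    # Glycosylation: the enzyme nucleophile attacks, giving a
    # glycosyl-enzyme intermediate with inverted configuration.
    intermediate = invert(substrate)
    # Deglycosylation: the activated water attacks and inverts again.
    product = invert(intermediate)
    return intermediate, product


def inverting_hydrolysis(substrate: Anomer) -> Anomer:
    """Single displacement: the activated water attacks once."""
    return invert(substrate)


if __name__ == "__main__":
    # beta-Xylosidase substrates carry a beta-linked xylose at subsite -1.
    intermediate, product = retaining_hydrolysis(Anomer.BETA)
    print("retaining:", intermediate.value, "intermediate,", product.value, "product")
    print("inverting:", inverting_hydrolysis(Anomer.BETA).value, "product")
```

Two inversions return the glycone to its starting configuration, whereas a single inversion leaves it in the opposite configuration, which is the distinction drawn between retaining and inverting enzymes.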

**Figure 5.** The two common types of catalysis by glycoside hydrolases as adapted from Davies and Henrissat [29]. (**a**) The retaining mechanism. The nucleophile and the general acid/base are represented as B- and AH, respectively. (**b**) The inverting mechanism. The general base and the general acid are represented as B- and AH, respectively. The typical distances of the catalytic residues in both mechanisms are indicated in Å. In most GHs, A and B are either Asp or Glu. See main text for further details.

Except for the enzymes from family GH43, all β-xylosidases in the CAZy database are predicted to have a retaining mechanism (Table 1). Among these retaining β-xylosidases, structural data with bound ligand are available for enzymes from GH3 [58], GH39 [40–42], GH52 [73], and GH120 [25]. A structural alignment of these β-xylosidases on the basis of their bound ligands revealed that the carboxylate groups of their catalytic nucleophiles, which are Glu in GH39 and GH52, and Asp in GH3 and GH120, are spatially conserved relative to the bound ligand (Figure 6a). They are at a suitable distance (~3.1 Å) and in the right position to react with the anomeric carbon of the scissile glycosidic bond. On the other hand, the carboxylate groups of their catalytic acid/base residues, which are Glu in GH3, GH39, and GH120, and Asp in GH52, are spatially less conserved, although they are at productive hydrogen-bonding distances (~3.2 Å on average) from the corresponding glycosidic oxygen atom.

β-Xylosidases from GH43 are inverting enzymes [79]. They use Asp and Glu as the general base and general acid, respectively [70,80,81]. Similar to the catalytic acid/base of the retaining β-xylosidases, their catalytic acid is within hydrogen-bonding distance (~2.7 Å) of the glycosidic oxygen atom of the scissile bond (Figure 6b). However, compared to the catalytic nucleophile of the retaining β-xylosidases, their catalytic base is located farther away from the anomeric carbon atom of the scissile glycosidic bond, at a distance of ~5.2 Å [65,66,70]. This distance provides sufficient space to accommodate a catalytic water molecule that can be activated by the catalytic base to attack the anomeric carbon [66]. It has been observed generally for GHs that the distance between the carboxylate groups of the catalytic nucleophile and acid/base of retaining enzymes is shorter (~5 Å) than the distance between the carboxylates of the catalytic base and acid of inverting enzymes (~8–10 Å) [29,82]. This is also the case for the GH43 β-xylosidases. Indeed, in the inverting GH43 β-xylosidase from *G. stearothermophilus* T-6, for example, a distance of ~7.9 Å between the carboxylate groups of its catalytic residues has been observed [66].
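As a rough illustration of this distance criterion, the following Python sketch computes the separation between two carboxylate positions and compares it with the typical values quoted above (~5 Å for retaining and ~8–10 Å for inverting enzymes). The function names, the example coordinates, and the 6.5 Å cutoff are assumptions made here for illustration only; the cutoff is simply the midpoint between the two typical values and is not taken from the cited studies.

```python
import math

# Typical carboxylate-carboxylate separations quoted in the text (in Å).
RETAINING_TYPICAL = 5.0
INVERTING_RANGE = (8.0, 10.0)

# Assumed cutoff: midpoint between 5 Å and 8 Å (illustrative only).
ASSUMED_CUTOFF = 6.5

Point = tuple[float, float, float]


def carboxylate_separation(a: Point, b: Point) -> float:
    """Euclidean distance (Å) between two carboxylate positions."""
    return math.dist(a, b)


def suggest_mechanism(separation: float, cutoff: float = ASSUMED_CUTOFF) -> str:
    """Heuristic label derived from the separation alone."""
    return "retaining-like" if separation < cutoff else "inverting-like"


if __name__ == "__main__":
    # Hypothetical coordinates placed 7.9 Å apart, matching the separation
    # reported for the GH43 enzyme from G. stearothermophilus T-6.
    base = (0.0, 0.0, 0.0)
    acid = (7.9, 0.0, 0.0)
    d = carboxylate_separation(base, acid)
    print(f"{d:.1f} Å -> {suggest_mechanism(d)}")
```

Such a heuristic can only flag a likely mechanism. In practice, the stereochemical outcome of the reaction has to be established experimentally, and real coordinates would be taken from the carboxylate atoms of a deposited structure.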

**Figure 6.** Positions of the catalytic residues relative to the xylosyl moiety bound in subsite -1 of the active sites of (**a**) retaining and (**b**) inverting β-xylosidases. The structures are of GlyA1 (PDB 5K6N; GH3; carbon atoms in pink), GsXynB1 (PDB 2BFG; GH39; green), GT2\_24\_00240 (PDB 4C1P; GH52; cyan), and TsXylC (PDB 3VSU; GH120; blue), which are retaining β-xylosidases, and GsXynB3 (PDB 2EXJ; GH43; white), which is an inverting β-xylosidase (see caption of Figure 3 for further details of the enzymes). Important distances (in Å) are shown next to dashed lines.

#### **5. Inhibition of** β**-Xylosidases by Monosaccharides**
