**Expression, Characterisation and Homology Modelling of a Novel Hormone-Sensitive Lipase (HSL)-Like Esterase from** *Glaciozyma antarctica*

#### **Hiryahafira Mohamad Tahir 1,2, Raja Noor Zaliha Raja Abd Rahman 1,3, Adam Thean Chor Leow 1,4 and Mohd Shukuri Mohamad Ali 1,2,\***


Received: 2 October 2019; Accepted: 5 November 2019; Published: 1 January 2020

**Abstract:** Microorganisms that survive in extremely cold places such as Antarctica have gained research attention because their proteins have unique features, such as the ability to withstand extreme temperature, salinity, and pressure, that make them desirable for biotechnological applications. Here, we report the first hormone-sensitive lipase (HSL)-like esterase from a *Glaciozyma* species, a psychrophilic yeast; the enzyme is designated GlaEst12-like esterase. In this study, the putative lipolytic enzyme was cloned, expressed in *E. coli*, purified, and characterised for its biochemical properties. Protein sequence analysis showed that GlaEst12 shared about 30% sequence identity with chain A of the bacterial hormone-sensitive lipase E40. It belongs to the H group since its amino acid sequence contains the conserved motifs Histidine-Glycine-Glycine-Glycine (HGGG) and Glycine-Aspartate-Serine-Alanine-Glycine (GDSAG). The recombinant GlaEst12 was successfully purified via one-step Ni-Sepharose affinity chromatography. Interestingly, GlaEst12 showed properties unusual for an enzyme of psychrophilic origin, since it had an optimal temperature between 50 and 60 °C and was stable at alkaline pH. Unlike other HSL-like esterases, this esterase showed higher activity towards medium-chain ester substrates than towards shorter-chain esters. The 3D structure of GlaEst12, predicted by homology modelling using Robetta software, showed a structure composed mainly of the α/β hydrolase fold, with the catalytic residues found at Ser232, Glu341, and His371.

**Keywords:** psychrophilic yeast; hormone-sensitive lipase; *Glaciozyma antarctica*; Antarctica; homology modelling

#### **1. Introduction**

The lipolytic enzymes consist of esterases (EC 3.1.1.1) and lipases (EC 3.1.1.3), which catalyse both the cleavage and formation of ester bonds [1]. Although they share a similar structure, i.e., the α/β hydrolase fold, esterases prefer to hydrolyse fatty acid esters with acyl chains of fewer than 10 carbon atoms, whereas lipases are able to hydrolyse esters of long-chain fatty acids with more than 10 carbon atoms [2]. Based on sequence similarity, these proteins have been classified into four groups, namely, C (cholinesterases and fungal lipases), L (lipoprotein and bacterial lipases), H (mammalian hormone-sensitive lipase and the hormone-sensitive lipase (HSL)-like family), and X (α/β hydrolases that do not belong to any of the other groups) [3].

The H group consists of two members, HSL and HSL-like enzymes, both of which have conserved motifs, namely GDSAG or Glycine-Threonine-Serine-Alanine-Glycine (GTSAG) and HGGG. Hormone-sensitive lipase is an enzyme that is mostly found in mammalian tissue and is stimulated by several hormones, such as catecholamines, ACTH, and glucagon, to hydrolyse triglycerides into free fatty acids and glycerol, which gives it a pivotal role in providing the major source of energy for most tissues [4,5]. The other member of the H group, the HSL-like family, mostly originates from microbial sources and has protein sequences similar to that of HSL, especially in the C-terminal catalytic domain [6]. Although the catalytic mechanism and the function of the N-terminal domain of HSL-like enzymes in microorganisms remain scarcely explored, these enzymes have biotechnological applications, such as in biosensors to detect foodborne bacteria and organophosphate pesticides [7,8]. In addition, HSL-like enzymes have potential uses in the pharmaceutical, biodiesel, and detergent industries [6,9,10].

Several HSL-like enzymes have been reported from microbial sources, such as RmEstB from the thermophilic fungus *Rhizomucor miehei* [11], PMGL2 from the permafrost bacterium *Psychrobacter cryohalolentis* [12], and E25 HSL esterase from a surface sediment sample E505 collected from the South China Sea [13]. Even though there are many reports on the heterologous expression and biochemical characterisation of HSL-like esterases from psychrophilic microorganisms, there are few reports on HSL-like esterases specifically from psychrophilic yeast. The discovery of a new HSL-like esterase from psychrophilic yeast not only provides an opportunity for biotechnological application but also gives crucial information on novel sequences, characterisation, and the structure–function relationship.

*Glaciozyma antarctica* strain PI12 is a member of the phylum Basidiomycota that was previously known as *Leucosporidium antarcticum* [14]. This psychrophilic yeast was isolated from sea ice near the Casey Research Station in Antarctica and has an optimum growth temperature of 12 °C but can grow at up to 18 °C [15]. Several proteins from *G. antarctica* have been successfully expressed, such as proteases, antifreeze protein, α-amylase, and chitinase [16–19]. In this work, we report the heterologous expression, purification, biochemical characterisation, and structural prediction of the first HSL-like esterase from *Glaciozyma antarctica*, and we also believe this enzyme is the first HSL-like esterase from a psychrophilic basidiomycete yeast.

#### **2. Results and Discussion**

#### *2.1. Sequence Analysis of GlaEst12*

The amino acid sequence of the *Glaciozyma antarctica* hormone-sensitive lipase (GlaEst12) esterase was searched for similarity against the Protein Data Bank at the National Center for Biotechnology Information (NCBI) (https://blast.ncbi.nlm.nih.gov/Blast.cgi) using BLASTP. The search showed that GlaEst12 had low sequence similarity (about 30% identity) with chain A of the crystal structure of esterase E40 from the bacterial HSL family and apparently no homology to HSL-like esterases from psychrophilic yeast or bacteria. The lack of similarity between GlaEst12 and sequences from psychrophilic microorganisms has two possible explanations. Firstly, few enzymes of this type have been discovered from cold environments, so the GlaEst12 sequence is more likely to match enzymes from mesophiles and thermophiles. A hormone-sensitive lipase from *Psychrobacter* sp. isolated from Antarctic seawater was likewise reported to be closely related to HSL-like esterases from mesophiles [20]. Secondly, GlaEst12 showed mesophilic or thermophilic characteristics rather than psychrophilic features. The limited similarity of GlaEst12 to known HSL-like esterase sequences indicates that it has fewer conserved residues, which may confer novel properties. The nucleotide sequence of GlaEst12 revealed an open reading frame (ORF) of 1200 nucleotides, which encodes 399 amino acids with a predicted molecular weight of 44.5 kDa. This esterase lacks a signal peptide and has a theoretical isoelectric point (pI) of 7.72.
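The relationship between the ORF length and the protein length quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes the 1200-nucleotide ORF includes one stop codon, and it uses the common rule of thumb of about 110 Da per residue, which is not the method used in this study (the reported 44.5 kDa comes from the actual amino acid composition).

```python
# Illustrative check of ORF length versus protein length.
# Assumption: the ORF length includes a single terminal stop codon.

AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def orf_to_protein_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Return the number of encoded residues for an ORF of orf_nt nucleotides."""
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


def rough_mass_kda(n_residues: int) -> float:
    """Very rough molecular mass estimate in kDa from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    residues = orf_to_protein_length(1200)
    print(residues)                  # 399 residues
    print(rough_mass_kda(residues))  # rough estimate, close to the reported 44.5 kDa
```

The rough estimate is only a sanity check; a composition-based tool such as ProtParam gives the value reported in the text.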

The multiple sequence alignment was performed using ENDscript with seven other proteins that showed the highest sequence similarity to GlaEst12 esterase: chain A of esterase from the bacterial HSL family (PDB ID: 4XVC A); chain A of mutant S202w/203f of the Esterase E40 (PDB ID: 5GMS A); chain C of mutant M3+s202w/i203f of Esterase E40 (PDB ID: 5GMR C); chain A of esterase/lipase from uncultured bacterium (PDB ID: 3V9A A); chain A of Hormone-sensitive lipase-like Este5 (PDB ID: 3FAK A); chain A of hyper-thermophilic carboxylesterase from the archaeon *Archaeoglobus fulgidus* (PDB ID: 1JJI A); and chain A of MGS-MT1, an alpha/beta hydrolase enzyme from a Lake Matapan deep-sea metagenome library. Surprisingly, the GlaEst12 sequence is closely related to mesophilic and thermophilic esterases; none of them is from a psychrophile or an Antarctic environment. This finding suggests that highly similar protein sequences do not necessarily come from the same environment.
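To make the notion of percentage identity concrete, the following minimal sketch computes identity over two already-aligned sequences. It is not the BLASTP or ENDscript calculation; the denominator used here (columns in which neither sequence has a gap) is an assumption, and the two toy sequences are hypothetical, not GlaEst12 or E40.

```python
# Minimal percent-identity calculation over a pairwise alignment.
# Assumption: identity is counted over columns where neither sequence has a gap.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity between two aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: 5 identical residues out of 7 ungapped columns.
    toy_query = "HGGGAFVL"
    toy_hit = "HGGASF-L"
    print(round(percent_identity(toy_query, toy_hit), 1))
```

Other tools divide by the full alignment length or by the shorter sequence length, so reported identities for the same pair can differ slightly.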

Multiple sequence alignment showed that GlaEst12 belongs to the H group of lipolytic enzymes, which consists of proteins that have sequence similarity with the HSL subfamily. Most members of the H group have two highly conserved motifs, which are also present in the GlaEst12 sequence: His-Gly-Gly-Gly upstream of the catalytic triad, and the GDSAG motif containing the catalytic serine [3]. Figure 1 shows that GlaEst12 conforms to the characteristics of the H group, as indicated by the red residues of the HGGG and GDSAG motifs. The alignment with other proteins suggested that the catalytic residues of GlaEst12 are Ser232, Glu341, and His371. The hormone-sensitive lipase-like (HSL-like) family is widely found in microorganisms, animals, and plants. The microbial HSL-like family mostly consists of two subfamilies, GDSAG and GTSAG [21]. Since the serine residue is located in the middle of the pentapeptide motif, between aspartate and alanine, we propose that GlaEst12 is a new member of the GDSAG subfamily of the HSL family.
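The motif-based assignment described above can be sketched as a simple pattern search. The code below is an illustration, not the procedure used in this work: the function names are hypothetical, and the toy sequence is invented for demonstration and is not the GlaEst12 sequence.

```python
import re

# Patterns for the two H-group motifs discussed in the text.
HGGG_PATTERN = re.compile(r"HGGG")
PENTAPEPTIDE_PATTERN = re.compile(r"G[DT]SAG")  # GDSAG or GTSAG


def find_motifs(sequence: str) -> dict:
    """Return 1-based start positions and matched text for each motif."""
    seq = sequence.upper()
    return {
        "HGGG": [(m.start() + 1, m.group()) for m in HGGG_PATTERN.finditer(seq)],
        "pentapeptide": [
            (m.start() + 1, m.group()) for m in PENTAPEPTIDE_PATTERN.finditer(seq)
        ],
    }


def classify_subfamily(sequence: str) -> str:
    """Return 'GDSAG', 'GTSAG', or 'unassigned' from the first pentapeptide hit."""
    hits = find_motifs(sequence)["pentapeptide"]
    return hits[0][1] if hits else "unassigned"


if __name__ == "__main__":
    toy = "MKTAHGGGFVLAAGDSAGGNLA"  # hypothetical sequence, for illustration only
    print(find_motifs(toy))
    print(classify_subfamily(toy))
    # The putative catalytic serine is the third residue of the pentapeptide.
    start, _ = find_motifs(toy)["pentapeptide"][0]
    print("serine position:", start + 2)
```

A plain pattern match only flags candidates; the assignment in the text also relies on the alignment context shown in Figure 1.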

Furthermore, a phylogenetic tree was constructed from the amino acid sequence aligned with closely related proteins and with other members of the HSL-like esterase family from prokaryotic and eukaryotic microorganisms (PDB ID: 4QO5; 4Q30; 4WY8; 4WY5; Accession number: WP\_012330536.1, ADH59412, QBH67630.1, KX580963.1). The results showed that GlaEst12 is grouped under the GDSAG motif subfamily (Figure 2) together with other proteins containing a GDSAG conserved sequence. Interestingly, GlaEst12 was assigned to a different sub-branch from the other GDSAG subfamily members, indicating that this sequence differs from the other esterases. The difference may be due to the presence of an extra α-helix at the N-terminal region, which was absent in all other esterases. Apart from that, this esterase is most closely related to eukaryotic proteins, such as RmEstA (PDB 4WY5) and RmEstB (PDB 4WY8), since they come from fungal species.


**Figure 1.** Multiple alignment of the amino acid sequence and secondary structure of the *Glaciozyma antarctica* hormone-sensitive lipase (GlaEst12) esterase with other related proteins. Squiggles indicate helices, arrows indicate β-strands, TT letters indicate a turn, ω letters indicate random coil, and the catalytic triad residues are indicated by an arrow symbol. The identical and highly conserved residues are indicated by red and yellow colour, respectively.

**Figure 2.** Phylogenetic tree of representative esterase sequences from the microbial hormone-sensitive lipase (HSL) family generated using MEGA 7.0. The amino acid sequences were retrieved from the National Center for Biotechnology Information (NCBI) and Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) databases. The tree was inferred by the neighbour-joining method with a Jones-Taylor-Thornton matrix-based model. The black box indicates GlaEst12.

#### *2.2. Expression and Purification of Recombinant GlaEst12*

GlaEst12-like esterase was cloned and expressed in an *E. coli* BL21(DE3)/pET32b(+) expression system, which resulted in the accumulation of expressed GlaEst12 in the form of inclusion bodies. It is well known that recombinant protein expressed at a high level in *E. coli* often accumulates as partially folded or misfolded protein. HSL-like esterases from *Psychrobacter* sp. TA144 [20] and *Mycobacterium tuberculosis* LIPY [22] were also expressed as inclusion bodies. In the case of recombinant GlaEst12, the active enzyme was successfully renatured (Figure 2). A protein band corresponding to GlaEst12-like esterase with an expected size of 63 kDa was visualised by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE). Because GlaEst12 was expressed at a high level in inclusion bodies, solubilisation and refolding were required to recover the bioactive protein. The aggregated protein in the inclusion bodies was solubilised with a high concentration of urea and then refolded by dilution. The refolded GlaEst12 showed esterase activity, which indicated the successful solubilisation and refolding of GlaEst12 from inclusion bodies. The expression of recombinant GlaEst12 was optimised with respect to temperature, induction time, and isopropyl β-D-1-thiogalactopyranoside (IPTG) concentration, giving optimal values of 16 °C, 20 h, and 10 μM, respectively.

The crude refolded GlaEst12 was purified in one step using nickel Sepharose affinity chromatography. The N-terminal polyhistidine tag (His-tag) encoded by the pET32b(+) vector was fused to and expressed together with the esterase, which enables the His-tagged protein to bind the immobilised nickel(II) ions [23]. The crude refolded GlaEst12 was loaded onto a nickel Sepharose column, and the bound protein was eluted using an ascending stepwise gradient of imidazole. The bound protein eluted at 300 mM imidazole and was checked for the presence of the target protein by a lipase assay and SDS-PAGE. Figure 3 shows a single band on SDS-PAGE with a molecular weight of 63 kDa, which is consistent with the size of GlaEst12 (i.e., 45 kDa) fused with the 18 kDa fusion tag encoded by the pET32b(+) vector, indicating the successful purification of GlaEst12 esterase. The esterase was purified to homogeneity with a 40% yield and a purification fold of 1.72. The size of GlaEst12 was estimated by comparison with the protein marker. Unlike other HSL esterases, GlaEst12 has a molecular mass of 45 kDa, which is slightly higher than the reported range of 30–40 kDa [24–26].
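For readers unfamiliar with purification tables, the yield and purification fold quoted above are conventionally derived from total activity and total protein at each step. The sketch below shows the standard definitions; the activity and protein values are hypothetical placeholders chosen for illustration, not the measurements from this study.

```python
from dataclasses import dataclass


@dataclass
class PurificationStep:
    """Total activity (U) and total protein (mg) recovered at one step."""
    name: str
    total_activity_u: float
    total_protein_mg: float

    @property
    def specific_activity(self) -> float:
        """Specific activity in U/mg."""
        return self.total_activity_u / self.total_protein_mg


def yield_percent(step: PurificationStep, crude: PurificationStep) -> float:
    """Activity recovered at this step relative to the crude sample."""
    return 100.0 * step.total_activity_u / crude.total_activity_u


def purification_fold(step: PurificationStep, crude: PurificationStep) -> float:
    """Ratio of specific activities (step / crude)."""
    return step.specific_activity / crude.specific_activity


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    crude = PurificationStep("refolded crude", 1000.0, 10.0)
    purified = PurificationStep("Ni-Sepharose eluate", 400.0, 2.5)
    print(yield_percent(purified, crude))      # percent yield
    print(purification_fold(purified, crude))  # purification fold
    # Expected size of the fusion protein: enzyme plus vector-encoded tag.
    print(45 + 18, "kDa")
```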

**Figure 3.** The SDS-PAGE analysis of purified GlaEst12 esterase. Lane M: unstained protein marker (Thermo Scientific, Waltham, MA, USA); Lane 1: refolded GlaEst12; Lane 2: purified GlaEst12-like esterase via Ni-Sepharose affinity chromatography.

#### *2.3. Characterisation of Purified GlaEst12*

#### 2.3.1. Effect of Temperature on GlaEst12 Esterase Activity and Stability

The effect of temperature on GlaEst12 esterase activity and stability was studied by measuring the activity from 10 to 70 °C at intervals of 10 °C. Interestingly, the purified GlaEst12 was active over a broad temperature range from 10 to 70 °C, with an optimum temperature of 60 °C (980 U/mL; Figure 4a), and its activity dropped drastically at 70 °C. This indicates that GlaEst12 esterase exhibits thermophilic rather than psychrophilic characteristics, whereas most reported enzymes from psychrophilic organisms are active at low temperature, with optimal temperatures between 0 and 30 °C [27]. However, this is not the first reported enzyme from a psychrophilic microorganism with a broad temperature range; microbes isolated from cold environments have already been reported to produce thermotolerant lipases, such as AMS3 lipase from *Pseudomonas* sp. [28] and lipase ZJB09193 from *Candida antarctica* [29]. Another possible explanation for the ability of GlaEst12 to withstand higher temperatures is the presence of three cysteines in its amino acid composition. The cysteine side chain contains a thiol group that can form a disulphide bond, which increases the rigidity of the protein and contributes to thermostability [27]. Figure 4b shows the thermostability of GlaEst12, which was tested by incubating the enzyme at 10–70 °C for 30 min. GlaEst12 was most stable at 50 °C, and when it was incubated at lower temperatures, from 10 to 40 °C, the reduction in enzyme activity was less than 40%. Thermal stability is one of the important criteria for an enzyme to be used for industrial purposes [30]. Esterases or lipases that have optimal activity at low or high temperatures are versatile biocatalysts [31].

**Figure 4.** Effect of temperature on the (**a**) activity and (**b**) stability of purified GlaEst12. The optimal temperature was determined by measuring the enzymatic activity at temperatures ranging from 10 to 70 °C using C10 as a substrate. The maximum activity was observed at 60 °C. The temperature stability of the purified esterase was determined by measuring the residual activity after the enzyme had been incubated for 30 min at different temperatures (10–80 °C), with the assay performed at the optimum temperature. Error bars represent standard deviation (*n* = 3). The absence of a bar indicates an error smaller than the symbol.

#### 2.3.2. Effects of pH on GlaEst12 Activity and Stability

The effect of pH on purified GlaEst12 esterase was tested in different buffers with pH ranging from 4 to 11. This esterase showed maximum activity at pH 8 in Tris-HCl buffer and tended to be stable at pH 7 to 9. Figure 5a shows that enzyme activity increased from pH 6 in sodium acetate to pH 7 and 8 in sodium phosphate, peaked at pH 8 in Tris-HCl, and then decreased gradually from pH 9 in Tris-HCl to pH 9–10 in glycine-OH. Extremely acidic and alkaline buffers (i.e., pH less than 6 and pH more than 10, respectively) were unfavourable for this esterase, with enzyme activity of less than 100 U/mL. The pH stability of GlaEst12 was studied by treating the enzyme with various buffers for 30 min. The residual activity after incubation was then measured, and the highest activity was denoted as 100% relative activity, as shown in Figure 5b. The pH stability followed a similar pattern to the effect of pH on enzyme activity: GlaEst12 esterase was most stable in Tris-HCl at pH 8 and retained more than 50% residual activity at pH 7–9. Moreover, similar findings have been reported for other hormone-sensitive lipases, such as those from *Psychrobacter* sp. TA144 and *R. miehei*, which have higher activity and tend to be stable at pH 8 [11,20]. Buffers in the pH range of 4–6 gave less than 10% enzyme activity, suggesting that extremely acidic conditions may affect the secondary structure, which ultimately reduces the esterase activity.
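The relative-activity normalisation used in the stability and specificity experiments (highest activity set to 100%) can be written as a one-line calculation. The sketch below is illustrative only; the activity values are hypothetical placeholders and not data from Figure 5.

```python
# Normalise measured activities so that the highest value equals 100%.


def relative_activity(activities: dict) -> dict:
    """Return each activity as a percentage of the maximum activity."""
    if not activities:
        raise ValueError("no activities supplied")
    peak = max(activities.values())
    if peak <= 0:
        raise ValueError("maximum activity must be positive")
    return {condition: 100.0 * value / peak for condition, value in activities.items()}


if __name__ == "__main__":
    # Hypothetical activities in U/mL, for illustration only.
    measured = {"pH 7": 400.0, "pH 8": 800.0, "pH 9": 600.0}
    for condition, percent in relative_activity(measured).items():
        print(f"{condition}: {percent:.0f}%")
```

The same function applies unchanged to temperature, substrate, metal ion, or solvent series, provided the reference is the maximum; when the reference is an untreated control, divide by the control value instead.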

**Figure 5.** Effect of pH on the (**a**) activity and (**b**) stability of purified GlaEst12 esterase. The optimal pH was determined by measuring enzyme activity using pNP (C10) as a substrate in different buffer systems ranging from pH 4–11. The pH stability was determined by incubating the enzyme in different buffers for 30 min and measuring the residual activity at the optimum pH. The buffer systems used were: sodium acetate (pH 4–6) (blue, filled square); sodium phosphate (pH 6–8) (orange, filled circle); Tris-HCl (pH 8–9) (grey, filled triangle), and glycine-OH (pH 9–11) (yellow, filled diamond). Error bars represent standard deviation (*n* = 3). The absence of a bar indicates an error smaller than the symbol.

#### 2.3.3. Substrate Specificity of GlaEst12 Esterase

The substrate specificity of GlaEst12 was examined using various *p*-nitrophenyl (pNP) esters with acyl chain lengths from C2 to C16 in the standard assay. The esterase showed the highest specificity toward a medium-chain ester, pNP decanoate (C10), rather than shorter- or longer-chain pNP esters. Figure 6 shows that more than 80% of the activity was achieved when GlaEst12 used C8 as a substrate, while the activity dropped by about 50% when the esterase was assayed with chains longer than 10 carbons. The shorter chains, such as C2 and C4, gave the lowest activity, less than 10%, indicating that GlaEst12 has low specificity toward shorter carbon chains. These results differ from those of other esterases: for example, E40, the HSL esterase with protein sequence similarity to GlaEst12, had the highest activity toward pNP butyrate (C4) [32]; RmEstA from *R. miehei* prefers pNP hexanoate (C6) [24]; and Est22 from a deep-sea metagenomic library has the highest activity on pNP butyrate [33].

**Figure 6.** Effect of different pNP esters on purified GlaEst12 esterase. The activity of the esterase was measured using different pNP esters at 60 °C in 50 mM Tris-HCl pH 8. The highest activity, obtained with the *p*-nitrophenyl decanoate (C10) substrate, is shown as 100%. Error bars represent standard deviation (*n* = 3).

#### 2.3.4. Effect of Metal Ions on Esterase Activity

The importance of metal ions in enzyme catalysis is well established, since many reported enzymes depend on metal ions for enhanced activity. Metal ions have different roles: they may take part in redox reactions, stabilise negative charges, or activate substrates by virtue of their Lewis acid properties [34]. The effect of metal ions on GlaEst12-like esterase was studied by treating the enzyme with various metal ions at concentrations of 1 mM and 5 mM. Figure 7 shows that Na<sup>+</sup>, K<sup>+</sup>, Ca<sup>2+</sup>, and Mn<sup>2+</sup> enhanced the activity above that of the control (enzyme without metal ions). However, 1 mM and 5 mM of Mg<sup>2+</sup>, Ni<sup>2+</sup>, and Cu<sup>2+</sup> decreased or abolished the GlaEst12 esterase activity. For Rb<sup>+</sup>, the lower concentration increased the enzyme activity, but the higher concentration had a negative effect. Most studies of HSL-like esterases have shown that Ni<sup>2+</sup> and Cu<sup>2+</sup> tend to decrease enzyme activity, as for EstAG1 from *Staphylococcus saprophyticus* and RmEstB from *R. miehei* [11,35].

**Figure 7.** Effect of metal ions on the activity of purified GlaEst12-like esterase. The relative activity of the unincubated enzyme without metal ions (control) was taken as 100%. Error bars represent standard deviation (*n* = 3).

#### 2.3.5. Effect of Organic Solvents on GlaEst12

The stability of GlaEst12 esterase in organic solvents was studied by incubating the enzyme with polar and non-polar organic solvents selected on the basis of their log P values. Figure 8 shows that the activity of GlaEst12 increased with dimethyl sulfoxide (DMSO) (104%), 1-propanol (123%), and toluene (113%) compared to the control. However, other solvents, such as methanol, acetonitrile, benzene, octanol, xylene, and *n*-hexane, destabilised the protein. Among organic solvents, DMSO tends to stabilise GlaEst12 and other HSL esterases, consistent with many previous studies reporting that this solvent enhances lipolytic activity, such as for RmEstB esterase from *R. miehei*, rEst1 from *Rheinheimera* sp., and EstAG1 from *S. saprophyticus* [11,35,36]. Based on these results, the enzyme showed low tolerance to most of the organic solvents tested, presumably because it was unable to resist denaturation by the solvents, and the presence of these solvents may prevent access of the substrate to the active site [37].

**Figure 8.** Effect of various organic solvents on the activity of purified GlaEst12-like esterase. The relative activity of the unincubated enzyme without organic solvent (control) was taken as 100%. The log P is the logarithm of the partition coefficient, P, of the solvent between *n*-octanol and water, and is used as a quantitative measure of polarity. Error bars represent standard deviation (*n* = 3).

#### *2.4. Homology Modelling and Validation of GlaEst12*

The homology modelling of GlaEst12 was done using the Robetta server (http://robetta.bakerlab.org/). The software uses two approaches to predict the structure, namely, comparative modelling and *de novo* structure prediction. The *de novo* method, known as the Rosetta fragment insertion method, is used when the query sequence does not match a template sequence [38]. Based on the multiple sequence alignment, the crystal structure of esterase E40 from the bacterial HSL family was chosen as the template to generate a 3D structure of GlaEst12 because it gave the highest sequence identity (about 30%) and its structure had already been solved by X-ray diffraction [32]. Figure 9a shows the predicted model of this esterase, which exists as a dimer of two monomeric subunits. Each monomer consists of 33.08% α-helix, 9.52% β-sheet, and 57.39% other structures. The high proportion of α-helix in the structure might help this enzyme function in the Antarctic environment, because an increase in the number of α-helices in a protein structure tends to make the enzyme more flexible, which is responsible for enzyme activity at low temperatures [39]. The active site of GlaEst12-like esterase was predicted to comprise Ser232, Glu341, and His371 (Figure 9b), which play an important role in allowing the accessibility of the substrate. The serine residue in the active site acts as a nucleophile that attacks the carbonyl group of the substrate, and this reaction forms a tetrahedral intermediate with the substrate, assisted by the other catalytic residues, i.e., histidine and glutamate. In contrast with other esterases and lipases, mammalian HSL and the HSL-like esterase group have the conserved HGGG motif. This sequence usually forms a loop that is located in close proximity to the active site and contributes to the formation and stabilisation of the oxyanion hole [40].

**Figure 9.** (**a**) The predicted GlaEst12 structure exists as a dimer composed of two chains, A (purple) and B (cyan), containing α-helices, β-sheets, and coiled structures. (**b**) The catalytic triad of GlaEst12 was positioned at Ser232, Glu341, and His371 and is depicted in yellow.

The predicted structure of the HSL esterase was assessed with 3D profiles using online tools (Table 1). VERIFY 3D was used to assess the accuracy of the atomic (3D) model by comparing it with structures that had already been solved by crystallography or nuclear magnetic resonance (NMR). The results showed that 87.8% of the GlaEst12 residues scored 0.2 or above in VERIFY 3D. Although this value was lower than 90%, the structure was accepted because the low-scoring residues are in the N-terminal region, and the GlaEst12 sequence is conserved mostly in the central region, as revealed by the multiple sequence alignment (Figure 1). This result is consistent with a previous study, which stated that the HSL lipase from the psychrophilic *Psychrobacter* sp. has sequence similarity with other homologous HSL proteins from the central region to the catalytic region. However, this psychrophilic enzyme has an additional sequence at the N-terminal region, which is expected to be an additional domain unique to the cold-adapted protein [20]. The four additional α-helices at the N-terminus of GlaEst12 compared with other HSLs might support the existence of the additional domain in the HSL lipase from *Psychrobacter*, since both enzymes are from psychrophilic Antarctic microorganisms. In addition, the ERRAT tool was used to assess the accuracy of the atom distribution in the protein, and GlaEst12 had a high score of more than 90%. The predicted structure was also validated using a Ramachandran plot, which revealed that 84.8% of the residues (about 328 residues) were located in the favoured region, while the remaining 14.8%, 0.3%, and 0.1% were located in the allowed, general, and outlier regions, respectively. The residues in the disallowed region, about 0.1% of the total, included one member of the catalytic triad, Ser232. The presence of the catalytic serine in the disallowed region suggested that the enzyme is in an active conformation. In contrast, the predicted structure of AMS8 lipase revealed that the catalytic serine is located in the allowed region and that the protein is in a closed conformation, since it has a lid structure that covers the active site [41].

**Table 1.** The summary score for the predicted structure of GlaEst12 esterase using online web tools.


#### **3. Materials and Methods**

#### *3.1. Sequence Analysis of GlaEst12*

Previously, a psychrophilic yeast named *G. antarctica* was successfully isolated from sea ice near Casey Research Station, Antarctica. Whole-genome sequencing of this organism was done using 454 pyrosequencing and Illumina technology, and the protein information of *G. antarctica* was deposited in the *Glaciozyma antarctica* Genome Database (GanDB) (www.mgi-nibm.my/glaciozyma\_antarctica) [42]. The gene encoding a putative esterase was chosen and named *Glaciozyma antarctica* hormone-sensitive lipase (GlaEst12) esterase. The protein sequence of GlaEst12 was analysed using BLASTp against the GenBank database at the NCBI (http://www.ncbi.nih.gov) to search for similarity with other proteins. The amino acid composition, molecular weight, and predicted pI value of GlaEst12 were determined using Expasy Tools (https://web.expasy.org/protparam/). The presence of a signal peptide was predicted using the online SignalP-5.0 server (http://www.cbs.dtu.dk/services/SignalP/). Sequence similarity and secondary structure information from the aligned sequences were obtained using ENDscript 2.0 (http://endscript.ibcp.fr). The phylogenetic tree was constructed using MEGA 7.0, whereby the GlaEst12 protein was aligned with eight additional proteins (accession numbers: WP\_012330536.1, ADH59412, QBH67630.1, KX580963.1, 4WY8, 4WY5, 4QO5, and 4Q30). The alignment was generated using Clustal W, and the evolutionary history was inferred using the neighbour-joining method with the Jones-Taylor-Thornton (JTT) model.

#### *3.2. Gene Synthesis, Bacteria Strains, and Plasmids*

The GlaEst12-like esterase gene sequence of 1200 nucleotides was sent for gene synthesis. Codon optimisation was performed based on the codons preferred by *E. coli* to enhance GlaEst12 expression in the *E. coli* host system. The codon-optimised gene was synthesised with EcoRI and XhoI restriction sites placed at the beginning and the end of the gene sequence, respectively (Integrated DNA Technologies, Coralville, IA, USA). The gene was also cloned into a cloning vector (pUCIDT) and supplied as dried plasmid. Since the pUCIDT/GLA HSL plasmid was in powder form, it was resuspended according to the manufacturer's protocol (Integrated DNA Technologies, Coralville, IA, USA). For cloning and expression of the protein, pET32b (Merck, Kenilworth, NJ, USA) and *E. coli* BL21(DE3) were used as the vector and expression host, respectively.

#### *3.3. Cloning of GlaEst12 in E. coli*

The gene encoding GlaEst12 was amplified by PCR using the recombinant plasmid pUCIDT/GLA HSL as a template. A set of forward and reverse primers with EcoRI and XhoI restriction sites was designed based on the optimised GlaEst12 esterase gene sequence. The forward and reverse sequences are 5′-CGTGAATTCGATGTTGAGTCCTG-3′ and 5′-GAGCTCTTAAAACTTCCCGTCTA-3′, respectively, in which the underlined nucleotide sequences represent the sequences of EcoRI and XhoI. The PCR product was purified using a Gel Extraction kit (GeneAll, Seoul, Korea) and then digested with the restriction enzymes EcoRI and XhoI. The digested PCR product was cloned into a pET32b vector (Merck, Kenilworth, NJ, USA), transformed into *E. coli* BL21(DE3), and plated on tributyrin agar plates containing ampicillin. The agar plates were incubated at 37 °C for 16 h, followed by incubation at 4 °C for 24 h. Positive transformants were indicated by the formation of halo zones around the colonies on tributyrin agar supplemented with ampicillin.

#### *3.4. Expression, Solubilisation, and Refolding of GlaEst12 Inclusion Bodies*

Recombinant GlaEst12 was expressed from the pET32b(+) vector in *E. coli* BL21(DE3). Expression was induced with 10 μM IPTG at 16 °C for 20 h. Because the enzyme was mostly expressed as inclusion bodies, GlaEst12 was solubilised as follows. The *E. coli* cells were harvested by centrifugation at 10,000× *g* for 15 min. The supernatant was discarded, and the pellet was resuspended in 20 mL of 50 mM Tris-HCl (pH 8) and sonicated for 6 min at an output of 2 and a duty cycle of 20 (Sonifier® SLP150, Branson, Danbury, CT, USA). The lysate was centrifuged at 10,000× *g* for 15 min, and the pellet containing the insoluble protein was resuspended in Tris-HCl buffer (pH 8) containing 8 M urea. The resuspended mixture was then incubated at 4 °C for 4 h with constant agitation. After incubation, the mixture was centrifuged under the same conditions as above, and the supernatant was used for refolding. The GlaEst12-like esterase in the supernatant was renatured by a 10× dilution of the denaturant in 50 mM Tris-HCl buffer (pH 8). The solubilised protein was diluted in one step using a peristaltic pump at a flow rate of 0.5 mL/min, with thorough stirring at 4 °C. The refolded protein was then subjected to the enzyme assay.

#### *3.5. Purification of Recombinant of GlaEst12-Like Esterase*

The His-tagged recombinant GlaEst12 was purified by single-step Ni-Sepharose affinity chromatography on a Nickel-Sepharose HP column (XK16/20) (GE Healthcare, Boston, MA, USA). The column was equilibrated with binding buffer [20 mM sodium phosphate, 10 mM imidazole, 500 mM NaCl (pH 7.4)] at a flow rate of 1 mL/min. The filtered crude protein was then loaded onto the column, and the bound protein was eluted with an ascending step gradient of elution buffer [20 mM sodium phosphate, 500 mM imidazole, 500 mM NaCl (pH 7.4)]. The eluted proteins were collected in 2 mL fractions. The fractions containing the protein of interest were identified by pNP assay and SDS-PAGE, pooled, dialysed against buffer [50 mM Tris-HCl, 50 mM NaCl (pH 8)], and stored at 4 °C for further characterisation. The molecular weight of GlaEst12 was determined by SDS-PAGE with a 6% stacking gel and a 12% separating gel, as described by Laemmli (1970), with some modifications [43]. The gel was stained with Coomassie Brilliant Blue R 250 (BioRad, Hercules, CA, USA) and destained with a destaining solution. The molecular mass of the protein was estimated using broad-range protein standard markers (unstained protein marker, 18.4–116 kDa; Thermo Scientific, Waltham, MA, USA).

#### *3.6. Enzyme Assay*

A spectrophotometric method was used to determine GlaEst12 activity with a pNP ester as substrate. The pNP released from the substrate was measured according to the method described by Sumby et al. (2009), with some modifications [44]. The reaction mixture consisted of 950 μL of 50 mM Tris-HCl (pH 8), 25 μL of 10 mM *p*-nitrophenyl decanoate (C10:0), and 25 μL of 0.1 mg/mL enzyme. The mixture was incubated with shaking at 150 rpm at 60 °C for 10 min. The liberated pNP was then measured at 410 nm using a Biochrom WPA UV/Visible spectrophotometer (Cambridge, UK). The absorbance of the sample was corrected by subtracting that of a control consisting of the same mixture without the enzyme. One unit of esterase activity was defined as the release of 1.0 μmol of pNP per min under the conditions stated above.
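The unit definition above can be turned into a short calculation. The following is a minimal sketch, assuming the Beer–Lambert law applies, a 1 cm path length, and a 1 mL reaction volume. The molar extinction coefficient of pNP under these assay conditions is not stated in the text, so `epsilon_mM` is a required input that must be determined experimentally; the function names are hypothetical.

```python
def pnp_activity_units(a_sample, a_blank, epsilon_mM, minutes=10.0,
                       volume_ml=1.0, path_cm=1.0):
    """Return esterase activity in units (umol pNP released per min).

    epsilon_mM: extinction coefficient of pNP in mM^-1 cm^-1. This value is
    NOT given in the text (assumption: the user supplies a measured value
    for the actual buffer, pH, and wavelength).
    """
    delta_a = a_sample - a_blank                 # subtract the no-enzyme control
    conc_mM = delta_a / (epsilon_mM * path_cm)   # Beer-Lambert: c = A / (eps * l)
    umol_released = conc_mM * volume_ml          # 1 mM equals 1 umol per mL
    return umol_released / minutes


def specific_activity(units, enzyme_mg_per_ml=0.1, enzyme_ul=25.0):
    """Units per mg of enzyme in the assay (25 uL of 0.1 mg/mL by default)."""
    enzyme_mg = enzyme_mg_per_ml * enzyme_ul / 1000.0
    return units / enzyme_mg
```

With the volumes given in the text, each assay contains 2.5 μg of enzyme, so any measured activity in units is divided by 0.0025 mg to obtain U/mg.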

#### *3.7. Characterisation of Purified GlaEst12*

#### 3.7.1. Effect of Temperature on Activity and Stability

The effect of temperature on the activity of purified GlaEst12-like esterase was determined by measuring the esterase activity (as described in Section 3.6) at temperatures of 10–80 °C (10 °C intervals) for 10 min. For thermostability, 25 μL of the enzyme was first incubated in 50 mM Tris-HCl (pH 8) at temperatures of 10–70 °C (10 °C intervals) for 30 min without substrate. The residual enzyme activity was then assayed with 10 mM *p*-nitrophenyl decanoate (C10) as substrate at the optimum temperature of 60 °C for 10 min.

#### 3.7.2. Effect of pH and pH Stability

Different buffers were used to determine the optimum pH and buffer for GlaEst12-like esterase over the pH range 4–11. The buffers used were 50 mM sodium acetate (pH 4.0–6.0), 50 mM sodium phosphate (pH 6.0–8.0), 50 mM Tris-HCl (pH 8.0–9.0), and 50 mM glycine-NaOH (pH 9.0–11.0). The pH stability was investigated by incubating the enzyme in the buffers listed above at 60 °C for 30 min, followed by the enzyme assay (as described in Section 3.6).

#### 3.7.3. Effect of Substrate Specificity

The substrate specificity was determined using *p*-nitrophenyl esters of various chain lengths, including *p*-nitrophenyl acetate (C2), *p*-nitrophenyl butyrate (C4), *p*-nitrophenyl octanoate (C8), *p*-nitrophenyl decanoate (C10), *p*-nitrophenyl laurate (C12), *p*-nitrophenyl myristate (C14), and *p*-nitrophenyl palmitate (C16). The reaction mixtures, containing 25 μL of the purified enzyme, 950 μL of 50 mM Tris-HCl (pH 8), and 10 mM of the respective substrate, were assayed at 60 °C for 10 min.

#### 3.7.4. Effect of Metals Ions

GlaEst12-like esterase was treated with 1 mM and 5 mM metal ions (i.e., Li<sup>+</sup>, Na<sup>+</sup>, K<sup>+</sup>, Rb<sup>+</sup>, Mg<sup>2+</sup>, Ca<sup>2+</sup>, Mn<sup>2+</sup>, Ni<sup>2+</sup>, Cu<sup>2+</sup>), and the treated enzyme was then subjected to the enzyme assay. For 1 mM metal ions, 25 μL of the enzyme in 940 μL of Tris-HCl buffer (pH 8) was treated with 10 μL of metal ion solution for 30 min at 60 °C. Then, 25 μL of 10 mM *p*-nitrophenyl decanoate (C10) was added to the mixture, which was assayed as described in Section 3.6. For 5 mM metal ions, the composition was the same except that 900 μL of buffer and 50 μL of metal ion solution were used. The stability was determined as the activity relative to the control (i.e., without metal ions).
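The volumes above can be checked with simple dilution arithmetic (C1·V1 = C2·V2). The sketch below assumes that 1 mM and 5 mM refer to final concentrations in the complete 1 mL assay; the stock concentration is not stated in the text and is only inferred here. Function names are hypothetical.

```python
def total_volume_ul(buffer_ul, enzyme_ul, metal_ul, substrate_ul):
    """Total assay volume in uL after the substrate has been added."""
    return buffer_ul + enzyme_ul + metal_ul + substrate_ul


def implied_stock_mM(final_mM, metal_ul, total_ul=1000.0):
    """Stock concentration implied by C1*V1 = C2*V2 (an inference, not a
    value given in the text)."""
    return final_mM * total_ul / metal_ul


def relative_activity_percent(treated, control):
    """Activity of the treated enzyme relative to the metal-free control."""
    return 100.0 * treated / control


# Both recipes add up to 1000 uL and imply the same stock concentration.
low = implied_stock_mM(1.0, 10.0, total_volume_ul(940, 25, 10, 25))
high = implied_stock_mM(5.0, 50.0, total_volume_ul(900, 25, 50, 25))
```

Under this assumption, both recipes are consistent with a single 100 mM metal ion stock, with the buffer volume reduced to keep the total at 1 mL.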

#### 3.7.5. Effect of Organic Solvents

The esterase was incubated for 30 min at 60 °C with various organic solvents at a concentration of 25% (*v*/*v*). The solvents were selected based on their log P values (in parentheses): DMSO (−1.22), methanol (−0.76), acetonitrile (−0.33), 1-propanol (1.36), benzene (2.0), toluene (2.5), octanol (2.9), xylene (3.15), and *n*-hexane (3.16). The pre-incubation mixtures contained 700 μL of 50 mM Tris-HCl (pH 8), 25 μL of the enzyme, and 250 μL of organic solvent, and were subsequently assayed with 10 mM *p*-nitrophenyl decanoate (C10) at 60 °C. The stability was determined as the activity relative to the control (i.e., without organic solvent).
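As a quick check of the stated 25% (*v*/*v*), the solvent fraction can be computed from the listed volumes. This sketch assumes that 25 μL of substrate is added after pre-incubation, as in Section 3.6; that volume is not restated in this subsection. The function name is hypothetical.

```python
def solvent_percent(solvent_ul, other_volumes_ul):
    """Volume fraction of solvent (% v/v), ignoring any volume change on mixing."""
    total = solvent_ul + sum(other_volumes_ul)
    return 100.0 * solvent_ul / total


# During pre-incubation: buffer + enzyme + solvent (975 uL in total).
pre_incubation = solvent_percent(250.0, [700.0, 25.0])

# During the assay, assuming 25 uL of substrate is then added (1000 uL in total).
assay = solvent_percent(250.0, [700.0, 25.0, 25.0])
```

On these assumptions the solvent makes up about 25.6% of the pre-incubation mixture and exactly 25% of the final assay volume.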

#### *3.8. Homology Modelling and Structure Validation*

Homology modelling was used to predict the 3D structure of GlaEst12 from templates deposited in the Protein Data Bank (PDB) that have high similarity to GlaEst12. The 3D structure was generated using the Robetta server (http://robetta.bakerlab.org), which provides automated tools for protein structure prediction, while the figures were prepared using the Chimera visual system (www.cgl.ucsf.edu/chimera). The protein structure was validated using online tools such as the Ramachandran Plot (http://www-cryst.bioc.cam.ac.uk/), Errat [45], and VERIFY 3D [46].

#### **4. Conclusions**

A novel HSL-like esterase, GlaEst12, from the psychrophilic yeast *G. antarctica* is introduced. Multiple sequence alignment with other hormone-sensitive lipase proteins revealed GlaEst12 to be a new member of the GDSAG motif subfamily of the HSL family. GlaEst12-like esterase was successfully expressed in *E. coli* and purified by single-step nickel-Sepharose affinity chromatography. Biochemical characterisation showed that this esterase has, interestingly, higher activity and stability at elevated temperatures, an unusual feature for an HSL-like esterase isolated from a psychrophilic yeast. In addition, the esterase was activated when treated with metal ions (Na<sup>+</sup>, K<sup>+</sup>, Ca<sup>2+</sup>, Mn<sup>2+</sup>) and stabilised when incubated with 1-propanol and toluene. Homology modelling of the GlaEst12-like esterase predicted a structure composed of a typical α/β hydrolase fold, with the catalytic residues found at Ser232, Glu341, and His371. The ability of GlaEst12 to withstand a broad temperature range and remain stable in an alkaline environment makes it a potential catalyst for industrial applications.

**Author Contributions:** Conceptualisation, H.M.T. and M.S.M.A.; methodology, H.M.T. and M.S.M.A.; validation, M.S.M.A., R.N.Z.R.A.R. and A.T.C.L.; formal analysis, H.M.T. and M.S.M.A.; investigation, H.M.T.; Resources, M.S.M.A., R.N.Z.R.A.R. and A.T.C.L.; data curation, H.M.T. and M.S.M.A.; writing (review and editing) H.M.T. and M.S.M.A.; visualisation, H.M.T., M.S.M.A. and R.N.Z.R.A.R.; supervision, M.S.M.A., R.N.Z.R.A.R., A.T.C.L.; project administration, M.S.M.A.; funding acquisition, M.S.M.A., R.N.Z.R.A.R. and A.T.C.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** Putra Grant funded this research, grant number 9601600.

**Acknowledgments:** This research and the scholarship of H.M.T. were supported by a research grant (GP-IPS/2017/9601600) and the Graduate Research Fellowship (GRF) fund from Universiti Putra Malaysia.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Characterization of the Novel Ene Reductase Ppo-Er1 from** *Paenibacillus polymyxa*

#### **David Aregger, Christin Peters and Rebecca M. Buller \***

Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Department of Life Sciences and Facility Management, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820 Waedenswil, Switzerland; David.Aregger@zhaw.ch (D.A.); Christin.peters@zhaw.ch (C.P.)

**\*** Correspondence: rebecca.buller@zhaw.ch; Tel.: +41-58-934-5438

Received: 31 January 2020; Accepted: 14 February 2020; Published: 19 February 2020

**Abstract:** Ene reductases enable the asymmetric hydrogenation of activated alkenes allowing the manufacture of valuable chiral products. The enzymes complement existing metal- and organocatalytic approaches for the stereoselective reduction of activated C=C double bonds, and efforts to expand the biocatalytic toolbox with additional ene reductases are of high academic and industrial interest. Here, we present the characterization of a novel ene reductase from *Paenibacillus polymyxa*, named Ppo-Er1, belonging to the recently identified subgroup III of the old yellow enzyme family. The determination of substrate scope, solvent stability, temperature, and pH range of Ppo-Er1 is one of the first examples of a detailed biophysical characterization of a subgroup III enzyme. Notably, Ppo-Er1 possesses a wide temperature optimum (T<sub>opt</sub>: 20–45 °C) and retains high conversion rates of at least 70% even at 10 °C reaction temperature making it an interesting biocatalyst for the conversion of temperature-labile substrates. When assaying a set of different organic solvents to determine Ppo-Er1's solvent tolerance, the ene reductase exhibited good performance in up to 40% cyclohexane as well as 20 vol% DMSO and ethanol. In summary, Ppo-Er1 exhibited activity for thirteen out of the nineteen investigated compounds, for ten of which Michaelis–Menten kinetics could be determined. The enzyme exhibited the highest specificity constant for maleimide with a *k*<sub>cat</sub>/*K*<sub>M</sub> value of 287 mM<sup>−1</sup> s<sup>−1</sup>. In addition, Ppo-Er1 proved to be highly enantioselective for selected substrates with measured enantiomeric excess values of 92% or higher for 2-methyl-2-cyclohexenone, citral, and carvone.

**Keywords:** biocatalysis; ene reductase; enzyme sourcing; old yellow enzyme; solvent stability

#### **1. Introduction**

Many bioactive molecules contain at least one chiral center rendering the development of effective asymmetric synthesis methods essential for the chemical industry. Besides the well-established metal- and organocatalytic approaches [1], biocatalytic strategies offer an interesting alternative to install chirality into small molecules. To date, industrial biocatalysis has mastered a range of enzyme families including ketoreductases [2], transaminases [3], and imine reductases [4]. Looking forward, the increasing power of genomic mining and enzyme engineering will allow industrial access to even more enzyme families leading to an expansion of the available biocatalytic toolbox [5].

The families of enzymes collectively known as ene reductases (ERs) catalyze the stereoselective trans- and, more rarely, cis-hydrogenation of activated alkenes [6–9]. Thus, ene reductases offer a valuable access route to asymmetric compounds, which is complementary to the chemical cis-hydrogenation catalyzed by chiral rhodium or ruthenium phosphine catalysts [10,11]. Today, ene reductases are classified into five enzyme groups, which differ in structure, reaction mechanism, substrate spectrum, and stereoselectivity (Figure 1) [12]. While enoate reductases, medium- and short-chain dehydrogenases/reductases (MDR and SDR), as well as the recently discovered quinone reductase-like ene reductases [13], are currently being investigated in terms of their industrial potential [14], enzymes stemming from the old yellow enzyme (OYE) family are established members of the biocatalytic toolbox and are the best characterized and most extensively employed ene reductases today [6].

**Figure 1.** Overview of the classification within the ene reductase family [15]. QnoR (NADPH-dependent quinone reductase like ene-reductases), EnoR (enoate reductase), OYE (old yellow enzyme), MDR (medium-chain dehydrogenase/reductase), and SDR (short-chain dehydrogenase/reductase); Class I (classical OYE); Class II (thermophilic-like OYE) and Class V (fungal OYE).

Isolated in 1932 by Warburg and Christian from bottom-fermented brewer's yeast (*Saccharomyces pastorianus*), the first such ene reductase was named "yellow enzyme" [16]. After the discovery of several additional members belonging to the same enzyme family the "yellow enzyme" was renamed to "old yellow enzyme" (OYE1) [17]. OYEs preferentially accept α,β-unsaturated ketones, aldehydes, nitroalkenes, and some carboxylic acids as substrates [7]. In the last decade, the catalytic mechanism of OYEs has been exhaustively investigated and its general principle is well understood: The enzymes follow a bi–bi ping–pong mechanism, which can be divided into a reductive and an oxidative half reaction [18]. In the reductive half-reaction, flavin mononucleotide (FMN) is reduced through hydride transfer from NAD(P)H, whereas in the oxidative half reaction a hydride is transferred from the reduced flavin to the C<sup>β</sup> of the activated alkene. The missing proton for the C<sup>α</sup> is transferred via a tyrosine residue from the opposite site [18,19], ultimately leading to an anti-addition hydrogenation.

The catalytic machinery of OYE enzymes is supported by a typical (α,β)<sub>8</sub>-barrel (TIM-barrel) fold with additional secondary structural elements present (e.g., four β-strands and five α-helices in OYE1 [20]; six β-strands and two α-helices in 12-oxophytodienoate reductase OPR [18]). The folded domain is known to occur in different oligomeric states, such as monomers (PETN reductase) [21], dimers (OYE1) [20], tetramers (dimers of dimers such as YqjM [22] or TOYE [23]), octamers, and dodecamers [23]. The oligomerization state is often governed by the position and amino acid composition of surface loops [7]. In addition, the constitution of the loops can influence thermostability [23].

Notably, amino acid sequence alignments of OYE homologs show high conservation in specific regions of the proteins, such as residues involved in catalysis, FMN, and substrate binding [7,15,23]. To account for the remaining differences in sequence and the resulting structural features, the old yellow enzyme family can be further divided into five subclasses [15]. While enzyme members of class I, also termed "classical" old yellow enzymes, and class II, introduced by Scrutton's group in 2010 and dubbed "thermophilic-like" [23], have been well explored [7,14], the recently described classes III–V are less well investigated [15,24].

Synthetic applications of ene reductases are manifold and range from the preparation of profens [25–27] and chiral γ-amino acids [28–30] to the synthesis of chiral phosphonates [31] and nitroalkanes [32], precursors in the synthesis of pharmaceutically active ingredients. To further promote an off-the-shelf synthetic use of ene reductases, which can significantly reduce the time and cost of implementing a biocatalytic step in a process, we set out to expand the available biocatalytic toolbox [15]. In this context, not only is the discovery and engineering of novel ene reductases of great utility [33], but a careful characterization of the new biocatalysts is also needed, as it may lead to the construction of a more targeted enzyme library associated with reduced screening time and costs.

Herein, we showcase the detailed characterization of Ppo-Er1 from *Paenibacillus polymyxa*, an OYE subclass III enzyme, and highlight the enzyme's substrate scope, kinetic parameters, solvent tolerance, as well as pH and temperature profile. The data presented may facilitate future screening and engineering studies and, in selected cases, thus, lead to the faster adoption of an ene reductase in chemical process development.

#### **2. Results and Discussion**

The enzyme Ppo-Er1 from *P. polymyxa* was discovered during the screening of 19 bacterial wild-type strains from the Culture Collection of Switzerland, as previously described [15]. Ppo-Er1 (41.3 kDa) is characterized by a substantial sequence similarity with the old yellow enzyme YqiG from *Bacillus subtilis* (50%) [34], Bac-OYE2 from *Bacillus* sp. (50%) [35], Lla-Er from *Lactococcus lactis* (39%) [15], and LacER from *Lactobacillus paracasei* (47%) [36], all of which belong to the subclass III of the OYE family. In detail, Ppo-Er1 contains a specific combination of motifs known from the classical and thermophilic-like groups that has been found to be characteristic for class III enzymes [15]: Gln104 and Arg228, predicted to interact with the pyrimidine ring of FMN [22]; His171 and Asn175, proposed to interact with N1 and N3 of FMN [22,37]; Thr30, suggested to interact with isoalloxazine ring O4 of FMN [38]; and Met29, Leu324, and Arg321, which presumably interact with the dimethyl benzene moiety of FMN. As expected, subclass III old yellow enzyme Ppo-Er1 is thus phylogenetically positioned between classical and thermophilic-like OYEs.

#### *2.1. Expression and Characterization of Ppo-Er1*

The ready-to-use plasmid consisting of pET-28b(+) vector and the Ppo-Er1 sequence was assembled by Twist Bioscience and a C-terminal His<sub>6</sub> tag for protein purification by affinity chromatography was included. The soluble recombinant expression of Ppo-Er1 in *Escherichia coli* BL21 (DE3) was achieved in terrific broth (TB) medium at 25 °C. Ppo-Er1 was purified by affinity chromatography using Ni-NTA resin (Figure S1) and the cofactor FMN was reconstituted before further analysis. FMN reconstitution (100 μM) proved necessary to obtain a fully active enzyme as without this step the enzyme preparation only exhibited 8% (0.05 U/mg for cyclohexenone) of the expected activity (0.61 U/mg for cyclohexenone). This effect was also described for the OYEs LacER [36] and Lla-Er [15]. In the case of LacER, for example, the addition of FMN after purification by DEAE ion exchange chromatography increased the activity by a factor of 92 from 0.0018 to 0.168 U/mg for the substrate *trans*-2-hexen-1-al. This observation suggests that, similar to other known OYEs, the binding affinity of Ppo-Er1 to FMN under purification conditions is low, a fact that has to be kept in mind for any following activity analysis. The storage stability of the purified Ppo-Er1 proved to be very good, boding well for the enzyme's incorporation in potential enzyme screens: At −20 °C and in the presence of 20% glycerol, the enzyme did not lose any activity even when stored for an extended period of time (one week), whereas an activity drop of approximately 20% was observed after incubation for 10 days at 4 °C (no additives). In contrast to a number of reported OYEs [15,39], we found that NADPH and NADH are equally preferred physiological cofactors of Ppo-Er1 (Figure S14) allowing for maximum flexibility in the choice of recycling system during process development. Both the coupled-enzyme approach [40] and the use of alternative hydride sources [41,42] will thus be conceivable options to avoid having to add stoichiometric amounts of the coenzymes.

The oligomeric state of Ppo-Er1 was determined via gel filtration by correlation with a commercial gel filtration standard containing proteins of specific size. Based on this comparison, Ppo-Er1 mostly occurs as a monomer (Figure S2) as do for example PETN from *Enterobacter cloacae* [21] and RmER from *Ralstonia metallidurans* [43], both thermophilic-like ene reductases.

Further relevant parameters for application such as optimum pH, optimum temperature, and long-term temperature stability were determined using the substrate cyclohexenone. The pH profile of Ppo-Er1 was measured in Davies buffer covering pH 5 to pH 10 [44], in which the enzyme reached about 50% of the activity observed in 50 mM phosphate buffer (Figure S3). The pH profile was found to be bell-shaped, exhibiting a narrow optimum at pH 6.5–7.5 (Figure 2). Beyond this range, enzyme activity decreased rapidly, especially when the enzyme was pre-incubated for a longer time period (24 h) in the measurement buffers (Figure 2). In the case of other characterized class III OYEs such as LacER [36] and YqiG [15,34], a similar pH profile was determined albeit with a wider pH working range as indicated by the reported optimum activities in the range of pH<sub>opt</sub> 8–9 and pH<sub>opt</sub> 6–9, respectively. Notably, OYE enzymes belonging to other subclasses exhibit similar pH profiles as reported for Ppo-Er1, e.g., the "classical" XenB [45] and NemA [45] with a pH<sub>opt</sub> of 6–7.5, the "thermophilic-like" YqjM [46] and Chr-OYE3 [47] with a pH<sub>opt</sub> of 6–8, and the class IV enzyme Ppo-Er3 [15] with a pH<sub>opt</sub> of 7–8.5.

**Figure 2.** pH profile of Ppo-Er1 measured between pH 5 and pH 10 in Davies buffer [44]. The enzyme was preincubated at 25 °C in the respective measurement buffer solution for 10 min or 24 h to determine the activity and stability of Ppo-Er1 as a function of pH. A relative specific activity of 100% corresponds to an activity of 0.41 U/mg for cyclohexenone. The error bars show the standard deviation of triplicates.

In terms of thermal robustness, Ppo-Er1 possesses interesting long-term stability. After 24 h incubation at 20 °C, enzyme activity toward cyclohexenone remained virtually unchanged, whereas residual activity of approximately 70% was detected after an equally long incubation time at 30 °C. Furthermore, short-term exposure of Ppo-Er1 to 45 °C led to only a marginal loss in activity (<10%) allowing the enzyme to be used for applications that require higher temperatures (Figure 3). These results are in line with data obtained for other class III and IV enzymes such as YqiG and Ppo-Er3, which have reported T<sub>opt</sub> values of 25–40 °C [15,34]. Strikingly, Ppo-Er1 retained a relative specific activity of >70% at temperatures as low as 10 °C making the enzyme an interesting candidate to be used for the transformation of thermolabile substrates such as aldehydes (Figure 3). Overall, our Ppo-Er1 data confirm that the temperature profile of class III enzymes resembles those of their mesophilic counterparts of class I, for example NemA [45] with a reported T<sub>opt</sub> of 30–50 °C and OYE2p [48] with a T<sub>opt</sub> of 25–40 °C. Finally, we employed the *Thermo*FAD technique to determine the melting temperature of Ppo-Er1 and found that the ene reductase unfolds at T<sub>m</sub> = 46.5 ± 1 °C (Figure S15).

**Figure 3.** The temperature profile and the temperature stability of Ppo-Er1. For the temperature profile, Ppo-Er1 was incubated for 5 min at different temperatures (10–60 °C) and directly measured for the conversion of the substrate cyclohexenone (1 mM). For the temperature stability measurement, Ppo-Er1 was incubated at four different temperatures (4–40 °C) for 24 h and then measured at 25 °C. The error bars show the standard deviation of triplicates. A relative specific activity of 100% corresponds to an activity of 0.52 U/mg for cyclohexenone.

The use of cosolvents is often a "must" in biocatalytic processes due to the presence of high concentrations of various organic substrates. Consequently, in many instances the solvent stability of enzymes needs to be optimized by enzyme engineering to generate catalysts that are compatible with the process conditions [49]. To verify the stability of Ppo-Er1 in the presence of a set of typical solvents, we thus determined the enzymatic activity over a concentration range of 10–40% of DMSO, DMF, cyclohexane, ethanol, and ethyl acetate. The enzyme performed best in cyclohexane (assayed substrate: 1 mM hexenal), which did not cause a significant loss in activity even when supplemented to a final volume of up to 40% in the assay. Alternatively, DMSO could be considered as a viable cosolvent for Ppo-Er1 as the enzyme was virtually unaffected up to a concentration of 20% *v*/*v*. Even at a concentration of 30% *v*/*v* DMSO, Ppo-Er1 retained a relative activity of approximately 80% (assay substrate: 1 mM cyclohexenone). The solvent ethanol was shown to also be a suitable choice for this enzyme, as it was tolerated well up to a concentration of 10% *v*/*v*. DMF or ethyl acetate, however, should not be used in combination with Ppo-Er1 as their presence was found to be detrimental for enzymatic activity. Already at a concentration of 10% *v*/*v* activity drops of 30% and 85% were observed, respectively (Figure 4).

In comparison to most known old yellow enzymes, Ppo-Er1 exhibits similar solvent resistance: The thermophilic-like OYE YqjM [46] has been reported to remain active in an analogous concentration range of DMSO, DMF, and ethyl acetate as Ppo-Er1. However, an ethanol concentration of 10% *v*/*v* led to a strong reduction of the half-life of YqjM, which we did not observe in the case of Ppo-Er1. TOYE [23], another thermophilic-like OYE, was reported to exhibit a 50% loss of activity at an ethanol concentration of 45% corresponding to a higher stability toward this solvent compared to Ppo-Er1, whereas the classical PETNR [50] already lost 50% activity in the presence of an ethanol concentration of 20% *v*/*v*. In this context, it should be noted that organic-solvent-tolerant ene reductases have also been reported: FOYE1, originating from an acidophilic iron oxidizer, was shown to perform well in many solvent systems with up to 20% *v*/*v* solvents (ethanol, methanol, acetone, isopropanol, DMSO, THF) clearly outperforming all abovementioned ene reductases in terms of solvent stability [51].

**Figure 4.** Overview of the solvent stability of Ppo-Er1 in DMSO (dimethyl sulfoxide), DMF (dimethyl formamide), cyclohexane, ethanol, and ethyl acetate in a concentration range of 10%–40% *v*/*v*. The standard enzyme assay was performed while the concentration of solvents was varied (substrate for cyclohexane: 1 mM hexenal, all other solvents: 1 mM cyclohexenone). Data are shown as values relative to an enzyme assay without cosolvent in which 100% relative conversion corresponds to the production of 0.84 mM cyclohexanone or 0.49 mM hexanal, respectively. The error bars show the standard deviation of triplicates, except for the 30% *v*/*v* cyclohexane point for which only two measurements were available.

#### *2.2. Substrate Scope, Determination of Michaelis–Menten Parameters, and Stereoselectivity*

To determine the substrate profile of Ppo-Er1, the enzyme was tested for the conversion of nineteen structurally diverse aliphatic and cyclic alkenes bearing ketone, aldehyde, nitro, carboxylic acid, or ester moieties as electron-withdrawing groups. For thirteen substrates, product formation by Ppo-Er1 could be detected. Cyclohexenone, hexenal, 2-methyl-2-pentenal, 4-phenyl-3-buten-2-one, cinnamic aldehyde, maleimide, and carvone (at 5 mM concentration) were converted especially well, and >99% conversion was obtained within 4 h (Table 1). Substrates not accepted by Ppo-Er1 included α,β-unsaturated carboxylic acids such as butenic acid, cinnamic acid, and citraconic acid as well as the ketones 3-methyl-2-cyclohexenone and 3-methyl-2-cyclopentenone, which are characterized by an additional methyl group in the β-position. The α,β-unsaturated ester ethyl crotonate was also not converted.

Based on the obtained data, it can be concluded that the overall substrate profile of Ppo-Er1 resembles that of other subclass III enzymes such as YqiG [15,34] and Lla-Er [15]. For example, 5 mM of cinnamic aldehyde and cyclohexenone are also well converted by Lla-Er [15] (65% ± 4.2% and 23% ± 3.1%) and YqiG [15] (58% ± 2.4% and 55% ± 6.1%) after 1 h at 30 °C. Notably, however, marked differences in substrate acceptance by class III enzymes occur for some of the investigated substrates, highlighting the importance of in-depth substrate profiling: Whereas carvone and maleimide are very well converted by Ppo-Er1 (both: >99%), Lla-Er, for example, accepts these compounds only poorly (carvone: 2.6% ± 0.1%, maleimide: not converted) [15]. Diethylbenzylidenemalonate conversion by YqiG [15,34] (11% ± 1.3%), on the other hand, significantly exceeded the product formation achieved by Lla-Er (<1%) [15] and Ppo-Er1 (1.2%). Moreover, 3-methyl-2-cyclopentenone, which is not converted by Ppo-Er1, Lla-Er [15], and YqiG [15,34], has been shown to be accepted by LacER [36]. Generally, we noted that Ppo-Er1 has a restricted substrate acceptance for cyclic β-methylated substrates such as 3-methyl-2-cyclohexenone and 3-methyl-2-cyclopentenone, which possibly results from a difficulty in accepting substituents at the C<sup>β</sup> position of cyclic compounds in the active site in analogy to other class II, III, and IV enzymes [15,39]. In addition, carboxylic acids and esters seem to be non-optimal alkene activating groups for this enzyme as conversion of the corresponding substrates was low or not detectable.

**Table 1.** Conversion, steady-state kinetics <sup>(a)</sup>, and enantiomeric excess (ee) of various substrates converted with purified enzymes as determined after 4 h at 20 °C (n.d.: not detected; n.s.: not soluble). The given uncertainties show the standard deviation of triplicates.


(a) Reactions (1 mL) were performed in potassium phosphate buffer (50 mM, pH 7.0) containing NADPH (175 μM), substrate (20 μM–80 mM, depending on the substrate), Ppo-Er1 (0.61 μM), and DMSO to solubilize the substrates. The reactions were followed continuously by monitoring NADPH oxidation at 340 nm for 90 s at 25 °C.
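The assay described in the footnote can be related to kinetic parameters with a few lines of arithmetic. The sketch below is illustrative only: it assumes a 1 cm path length and the commonly cited NADPH extinction coefficient of 6.22 mM<sup>−1</sup> cm<sup>−1</sup> at 340 nm (the value actually used is not stated in this section), and it treats the kinetics as single-substrate Michaelis–Menten at fixed NADPH. Function names are hypothetical.

```python
NADPH_EPSILON_340 = 6.22  # mM^-1 cm^-1; commonly cited value, assumed here


def rate_uM_per_s(delta_a340, seconds=90.0, path_cm=1.0):
    """Average NADPH consumption rate (uM/s) from the decrease in A340."""
    conc_mM = delta_a340 / (NADPH_EPSILON_340 * path_cm)  # Beer-Lambert
    return conc_mM * 1000.0 / seconds                     # mM -> uM, per second


def turnover_per_s(rate_uM_s, enzyme_uM=0.61):
    """Apparent turnover (s^-1): rate divided by the enzyme concentration."""
    return rate_uM_s / enzyme_uM


def michaelis_menten(s_mM, kcat, km_mM):
    """Turnover (s^-1) predicted at substrate concentration s_mM."""
    return kcat * s_mM / (km_mM + s_mM)


def catalytic_efficiency(kcat, km_mM):
    """kcat/Km in mM^-1 s^-1, the quantity compared between enzymes."""
    return kcat / km_mM
```

Fitting `michaelis_menten` to turnover values measured across the substrate range yields *k*<sub>cat</sub> and *K*<sub>m</sub>, whose ratio is the catalytic efficiency reported in mM<sup>−1</sup> s<sup>−1</sup>.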

To complement the substrate acceptance profile, Michaelis–Menten parameters of Ppo-Er1 for ten diverse substrates were determined (Table 1, Figures S4–S13). Within the tested substrate range, Ppo-Er1 showed the highest catalytic efficiency for maleimide (*k*<sub>cat</sub>/*K*<sub>m</sub> = 287 mM<sup>−1</sup> s<sup>−1</sup>) followed by trans-β-methyl-β-nitrostyrene (*k*<sub>cat</sub>/*K*<sub>m</sub> = 41 mM<sup>−1</sup> s<sup>−1</sup>). In combination with the conversion data, the measured kinetic parameters (Table 1) indicate a general preference for alkenes carrying a phenyl substituent at the C<sup>β</sup> position of the substrates. Overall, Ppo-Er1's catalytic efficiency for other typical ene reductase substrates such as carvone (*k*<sub>cat</sub>/*K*<sub>m</sub> = 0.5 mM<sup>−1</sup> s<sup>−1</sup>) and cyclohexenone (*k*<sub>cat</sub>/*K*<sub>m</sub> = 0.4 mM<sup>−1</sup> s<sup>−1</sup>) was found to be in a similar range as those described for other well-known OYEs such as the classical PETNR (carvone: *k*<sub>cat</sub>/*K*<sub>m</sub> = 2 mM<sup>−1</sup> s<sup>−1</sup>; cyclohexenone: *k*<sub>cat</sub>/*K*<sub>m</sub> = 5 mM<sup>−1</sup> s<sup>−1</sup>) [50] and the thermophilic-like YqjM (cyclohexenone: *k*<sub>cat</sub>/*K*<sub>m</sub> = 6.4 mM<sup>−1</sup> s<sup>−1</sup>) [46] (Table 2). Maleimide, however, is better converted by ene reductases from photosynthetic extremophiles such as CtOYE (*k*<sub>cat</sub>/*K*<sub>m</sub> = 1940 mM<sup>−1</sup> s<sup>−1</sup>) or GsOYE (*k*<sub>cat</sub>/*K*<sub>m</sub> = 399 mM<sup>−1</sup> s<sup>−1</sup>) [52], the thermophilic-like OYERo2 (*k*<sub>cat</sub>/*K*<sub>m</sub> = 10,800 mM<sup>−1</sup> s<sup>−1</sup>) [53], or the class III OYE YqiG (*k*<sub>cat</sub>/*K*<sub>m</sub> = 800 mM<sup>−1</sup> s<sup>−1</sup>) (Table 2).

**Table 2.** Comparison of the catalytic efficiencies (mM<sup>−1</sup> s<sup>−1</sup>) of a range of known old yellow enzymes (OYEs) (YqiG [34], PETNR [50], YqjM [46], TOYE [23], DrER [43], RmER [43], and OYERo2 [53]) from classes I–III.


In addition to determining the steady-state kinetic parameters, we also investigated the stereopreference of Ppo-Er1. Based on our results with four selected substrates, Ppo-Er1 displays a similar stereopreference to other reported OYE class III enzymes (Table 3), preferentially forming the *S*-product when converting 2-methyl-2-pentenal and citral and forming the *R*-product when transforming carvone and 2-methyl-2-cyclohexenone. Notably, the detected ee values of Ppo-Er1 are generally superior to values determined for YqiG and Lla-Er [15], with the only exception being the enantiomeric excess reported for the conversion of carvone by Lla-Er (>99.9% ee). It should be noted, however, that Lla-Er displayed a low conversion of 2.6% of 5 mM substrate after 1 h at 30 °C compared to the >99% conversion of 5 mM substrate by Ppo-Er1 after 4 h at 20 °C.
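As a brief illustration of how the ee values discussed above are obtained, the following sketch computes the enantiomeric excess from the amounts of the two enantiomers (for instance, integrated chiral GC peak areas). The function name and the peak areas are hypothetical and are not taken from the measurements reported here.

```python
def enantiomeric_excess(major: float, minor: float) -> float:
    """Return the enantiomeric excess in percent.

    ee = |major - minor| / (major + minor) * 100, where the two inputs are
    the amounts (e.g. peak areas) of the two enantiomers.
    """
    total = major + minor
    if total <= 0:
        raise ValueError("the two amounts must sum to a positive value")
    return 100.0 * abs(major - minor) / total


# Hypothetical peak areas: 99 units of the R-product and 1 unit of the S-product.
ee_percent = enantiomeric_excess(99.0, 1.0)
print(f"ee = {ee_percent:.1f}%")
```

An ee value carries little weight on its own if only a small fraction of the substrate was converted, which is why the conversion is reported alongside it in the comparison above.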

**Table 3.** The enantiomeric excess of some selected OYEs (YqiG [15], Lla-Er [15], Ppo-Er3 [15], OPR1 [54], OPR3 [54], PETNR [50], YqjM [54], TOYE [23]) from classes I–IV. The values presented for YqjM were measured as a reference for Ppo-Er1 and compared with the literature [54].


#### **3. Materials and Methods**

#### *3.1. Materials*

All chemicals were purchased from Merck (Darmstadt, Germany), VWR (Hannover, Germany), or Carl Roth (Karlsruhe, Germany). The purchased chemicals were of the highest available purity or of analytical grade and were used without further purification unless otherwise specified. NADPH tetrasodium salt was ordered from Oriental Yeast Co. Ltd. (Tokyo, Japan). The plasmid (pET 28b(+) incl. Ppo-Er1) was ordered from Twist Bioscience (San Francisco, CA, USA). The HisTrap FF and the HiTrap Desalting columns were ordered from GE Healthcare (Uppsala, Sweden).

#### *3.2. Plasmid*

Twist Bioscience (San Francisco, CA, USA) cloned the synthetic gene of the codon-optimized Ppo-Er1 (Accession No.: WP\_013369181) with *Nde*I and *Xho*I into the commercial pET28b(+) vector.

#### *3.3. Bacterial Strains and Culture Conditions*

*E. coli* BL21 (DE3) [*fhuA2 [lon] ompT gal (*λ *DE3) [dcm]* Δ*hsdS*] was purchased from New England Biolabs (Beverly, MA, USA). *E. coli* strains were cultured routinely in lysogeny broth (LB) or TB media supplemented with kanamycin (50 μg mL<sup>−1</sup>). Bacterial cultures were incubated in baffled Erlenmeyer flasks in a New Brunswick Innova 42 orbital shaker at 200 rpm and 37 °C. Bacteria on agar plates were incubated in a HERATherm Thermo Scientific incubator under air. All materials and biotransformation media were sterilized by autoclaving at 121 °C for 20 min. Aqueous stock solutions were sterilized by filtration through 0.22 μm syringe filters. Agar plates were prepared with LB medium supplemented with 1.5% (w/v) agar.

#### *3.4. Expression*

Ppo-Er1 was expressed in *E. coli* BL21 (DE3) by inoculating TB medium (400 mL) supplemented with kanamycin (50 μg mL<sup>−1</sup>) with an overnight culture (4 mL; 1:100). The culture was incubated at 37 °C and 180 rpm until an optical density (OD<sub>600</sub>) of 0.5–0.8 was reached. Afterward, expression was induced by the addition of 100 μM IPTG, and incubation was continued at 25 °C for 18 h. Cells were harvested by centrifugation at 4500× *g* for 10 min at 4 °C, and the pellet was either used directly or stored at −20 °C.

#### *3.5. Enzyme Purification*

Cells were disrupted by resuspending the pellet from a 400 mL culture in 20 mL buffer (100 mM sodium phosphate buffer pH 7.5, 300 mM NaCl, supplemented with 30 mM imidazole) followed by a single passage through a French press (2000 psi). The crude extract was separated from the cell debris by centrifugation at 8000× *g* for 45 min. Purification was achieved by affinity chromatography exploiting the C-terminal His-Tag using an automated Äkta purifier system. The crude extract was filtered (0.45 μm) and applied to a pre-equilibrated 5 mL HisTrap FF column. Unbound protein was washed out with five column volumes of buffer supplemented with 45 mM imidazole. Ppo-Er1 was eluted with three column volumes of buffer supplemented with 300 mM imidazole. The resulting fractions were collected and analyzed by SDS-PAGE. The fractions with a high content of Ppo-Er1 were pooled and desalted using 50 mM sodium phosphate buffer (pH 7.5) to remove the imidazole. This step was performed on the Äkta purifier system using three coupled 5 mL HiTrap desalting columns. After the system was equilibrated, the Ppo-Er1-containing sample was applied and fractionated. The protein fractions were analyzed via the integrated online absorption measurement at 280 nm. The protein content of the pooled purified sample was determined by measuring the absorption with a NanoDrop One (Thermo Fisher Scientific) system and using the molecular weight (41.3 kDa) and extinction coefficient (ε at λ = 280 nm: 38,390 M<sup>−1</sup> cm<sup>−1</sup>) of Ppo-Er1 for the calculation. The extinction coefficient was obtained by using the online calculation tool Prot pi [55].
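The concentration calculation described above follows the Beer–Lambert law (A = ε · c · l). The sketch below shows the arithmetic using the molecular weight and extinction coefficient stated in the text; the absorbance reading of 0.50 and the 1 cm path length are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
EPSILON_280 = 38_390.0   # M^-1 cm^-1, extinction coefficient of Ppo-Er1 (from the text)
MOLECULAR_WEIGHT = 41_300.0  # g/mol (41.3 kDa, from the text)


def protein_concentration(a280: float, path_cm: float = 1.0):
    """Return (molar concentration in M, mass concentration in mg/mL).

    Beer-Lambert: A = epsilon * c * l, hence c = A / (epsilon * l).
    """
    molar = a280 / (EPSILON_280 * path_cm)
    mg_per_ml = molar * MOLECULAR_WEIGHT  # mol/L * g/mol = g/L = mg/mL
    return molar, mg_per_ml


# Assumed reading: A280 = 0.50, normalised to a 1 cm path length.
molar, mg_per_ml = protein_concentration(0.50)
print(f"{molar * 1e6:.1f} uM, {mg_per_ml:.2f} mg/mL")
```

Because Ppo-Er1 is a flavoprotein, the bound flavin also absorbs at 280 nm; whether this contribution was accounted for is not stated in the text, so the sketch ignores it.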

#### *3.6. Activity Assay*

The activity measurements were recorded spectrophotometrically by observing NADPH consumption at 340 nm for 60–90 s in a 1 mL (1 cm) plastic cuvette in the Lambda 465 (PDA UV/VIS) system from Perkin Elmer. The biocatalytic experiments to obtain the pH and the temperature profile were conducted in sodium phosphate buffer (50 mM, pH 7.5) using 175 μM NADPH, 1 mM cyclohexenone, and 0.61 μM purified Ppo-Er1. For the determination of the Michaelis–Menten parameters, the substrate concentration was varied in the range of 20 μM–80 mM depending on the substrate while the enzyme concentration was kept constant at 0.61 μM. For the pH profile, Davies buffer [44] was used. All measurements were done in triplicate. Background NADPH consumption was determined in assays in which either the enzyme or the substrate had been omitted. The substrates were solubilized as 1 M stocks in DMSO.
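To make the data treatment behind such an assay concrete, the following sketch converts an absorbance slope at 340 nm into a NADPH consumption rate and then estimates Michaelis–Menten parameters. Several points are assumptions rather than details from the text: the NADPH extinction coefficient of 6220 M<sup>−1</sup> cm<sup>−1</sup> is the commonly used literature value, the fitting method actually used for Table 1 is not stated (a Hanes–Woolf linearisation is used here only because it needs no external libraries, and nonlinear regression is generally preferable for real data), and the data set is synthetic.

```python
NADPH_EPSILON_340 = 6220.0  # M^-1 cm^-1; common literature value (assumption)
ENZYME_UM = 0.61            # enzyme concentration in the assay (from the text)


def rate_from_slope(delta_a340_per_min: float, path_cm: float = 1.0) -> float:
    """Convert an absorbance decrease (A340 per min) into a rate in uM/s."""
    molar_per_min = delta_a340_per_min / (NADPH_EPSILON_340 * path_cm)
    return molar_per_min * 1e6 / 60.0


def hanes_woolf_fit(substrate_mM, rates):
    """Fit [S]/v = [S]/Vmax + Km/Vmax by least squares; return (Km, Vmax)."""
    xs = list(substrate_mM)
    ys = [s / v for s, v in zip(xs, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    return intercept * vmax, vmax


# Synthetic, noise-free data generated from Km = 2.0 mM and Vmax = 0.5 uM/s.
substrate = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
rates = [0.5 * s / (2.0 + s) for s in substrate]

km, vmax = hanes_woolf_fit(substrate, rates)
kcat = vmax / ENZYME_UM  # s^-1, since Vmax is in uM/s and [E] in uM
print(f"Km = {km:.2f} mM, kcat = {kcat:.2f} s^-1, kcat/Km = {kcat / km:.2f} mM^-1 s^-1")
```

In practice, the background rates measured without enzyme or without substrate would be subtracted from each slope before fitting.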

#### *3.7. Biocatalysis Reaction*

The in vitro biocatalysis reactions were performed using desalted Ppo-Er1 (at a concentration of 12.1 μM) and 5 mM substrate (1 M stock in DMSO), supplemented with 100 μM NADPH, 10 mM glucose, and 5 μL GDH (20% w/v cell suspension). The reaction volume was adjusted to 1 mL in a glass vial using sodium phosphate buffer (200 mM, pH 7.0), and the reaction was incubated for 4 h at 20 °C and 1000 rpm. To determine the solvent stability of Ppo-Er1, the biocatalysis reaction conditions were adapted to include 2.4 μM Ppo-Er1 and 0%–40% v/v solvent (ethanol, ethyl acetate, DMSO, DMF, cyclohexane) in a total reaction volume of 1 mL for 50 min at 20 °C and 1000 rpm. All biocatalysis reactions were done in triplicate; biocatalysis results were verified by control reactions omitting the enzyme.
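The concentrations above imply that NADPH is present only in catalytic amounts and must be regenerated by the glucose/GDH system. The short sketch below works through this arithmetic with the stated concentrations; the function names are hypothetical, and the calculation assumes one NADPH consumed per substrate molecule reduced.

```python
SUBSTRATE_UM = 5000.0  # 5 mM substrate (from the text)
NADPH_UM = 100.0       # 100 uM NADPH (from the text)
ENZYME_UM = 12.1       # 12.1 uM Ppo-Er1 (from the text)
GLUCOSE_UM = 10000.0   # 10 mM glucose (from the text)


def cofactor_cycles(substrate_uM: float, cofactor_uM: float, conversion: float = 1.0) -> float:
    """Minimum number of times each NADPH must be regenerated for a given conversion."""
    return substrate_uM * conversion / cofactor_uM


def enzyme_turnovers(substrate_uM: float, enzyme_uM: float, conversion: float = 1.0) -> float:
    """Average number of substrate molecules converted per enzyme molecule."""
    return substrate_uM * conversion / enzyme_uM


cycles = cofactor_cycles(SUBSTRATE_UM, NADPH_UM)
turnovers = enzyme_turnovers(SUBSTRATE_UM, ENZYME_UM)
glucose_equivalents = GLUCOSE_UM / SUBSTRATE_UM

print(f"NADPH regeneration cycles at full conversion: {cycles:.0f}")
print(f"Turnovers per Ppo-Er1 molecule at full conversion: {turnovers:.0f}")
print(f"Glucose equivalents relative to substrate: {glucose_equivalents:.1f}")
```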

#### *3.8. GC-Analysis*

One milliliter biocatalysis reactions were extracted once with 500 μL methyl *tert*-butyl ether (incl. 1 g/L 1-octanol as internal standard). The phase separation was achieved by centrifugation of the biphasic sample, and the organic phase was separated and subjected to GC analysis (Table S1).

#### *3.9. Gel Filtration*

For the determination of the oligomeric state of Ppo-Er1, the Äkta purifier system with a HiLoad 16/600 Superdex 75 pg column (GE Healthcare, Uppsala, Sweden) and sodium phosphate buffer (50 mM, pH 7.5) was used. In a first step, the system was calibrated using the gel filtration standard from Bio-Rad (1.35–670 kDa; Prod. no.: #1511901). Then, flavin-saturated Ppo-Er1 was applied to the system under identical conditions.

#### *3.10. Melting Temperature*

The unfolding temperature was determined by a *Thermo*FAD assay [56] using a Rotor-Gene Q RT-PCR machine. Protein samples (0.3–0.5 mg/mL) in 20 μL sodium phosphate buffer pH 7 were measured using a temperature gradient from 25 to 90 °C, with a fluorescence measurement after every 0.5 °C increase following a 10 s delay for signal stabilization. The measurements were performed in triplicate using a 470 nm excitation wavelength and a 510 nm emission wavelength.
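A melting temperature is commonly read from such a curve as the temperature at which the fluorescence signal rises most steeply, that is, the maximum of its first derivative. The sketch below illustrates this on a synthetic sigmoidal curve sampled on the 25–90 °C, 0.5 °C grid described above; the curve shape, the midpoint of 50 °C, and the function name are assumptions for illustration and do not represent the measured data.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the temperature at which dF/dT (central differences) is largest."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Temperature grid from 25 to 90 degrees C in 0.5 degree steps (131 points).
temps = [25.0 + 0.5 * i for i in range(131)]

# Synthetic unfolding curve: a sigmoid with a hypothetical midpoint of 50 degrees C.
HYPOTHETICAL_TM = 50.0
signal = [1.0 / (1.0 + math.exp(-(t - HYPOTHETICAL_TM) / 2.0)) for t in temps]

tm = melting_temperature(temps, signal)
print(f"Estimated melting temperature: {tm:.1f} degrees C")
```

Real traces are noisy, so a smoothing step or a sigmoidal fit would usually precede the derivative; this is omitted here for brevity.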

#### **4. Conclusions**

Ppo-Er1 is a well-expressed, easy to purify, old yellow enzyme belonging to the recently introduced subclass III designation. In terms of cofactor preference, the enzyme accepts NADPH and NADH equally well, whereas pH and optimum temperature resemble those of previously described OYEs. Notably, the enzyme exhibits only slightly reduced performance (>70% conversion of 1 mM cyclohexenone) at lowered temperatures (10 ◦C) making it a possible candidate for the transformation of labile substrates such as some aldehydes. In addition, the enzyme was shown to have noteworthy stability in the presence of the solvents cyclohexane (up to at least 40% v/v), DMSO, and ethanol (up to 20% v/v).

The substrate profile analysis with a set of 19 representative alkenes established Ppo-Er1's substrate scope, highlighting its acceptance of a variety of linear and cyclic compounds with often excellent transformation efficiencies and exquisite stereoselectivity (e.g., 98% ee for carvone). Complementing this analysis with the determination of steady-state kinetics for ten of the substrates allowed us to conclude that Ppo-Er1 classifies well with other subgroup III old yellow enzymes.

In summary, our in-depth characterization of Ppo-Er1 enlarges the available panel of ene reductases with a versatile biocatalyst having interesting synthetic properties. Its introduction into the biocatalytic toolbox may further facilitate academic and industrial efforts when screening for biocatalysts capable of asymmetric double bond reduction. Looking forward, Ppo-Er1's performance could be further optimized via enzyme and process engineering.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4344/10/2/254/s1. Figure S1: SDS-PAGE of the different purification steps for the ene reductase Ppo-ER1; Figure S2: Gel filtration of Ppo-ER1; Figure S3: Activity of Ppo-ER1 in the two used buffers; Table S1: Overview of the used GC-methods; Figure S4: Michaelis–Menten kinetic for maleimide; Figure S5: Michaelis–Menten kinetic for trans-β-methyl-β-nitrostyrene; Figure S6: Michaelis–Menten kinetic for cyclohexanone; Figure S7: Michaelis–Menten kinetic for cinnamaldehyde; Figure S8: Michaelis–Menten kinetic for 2-methyl-2-pentenal; Figure S9: Michaelis–Menten kinetic for carvone; Figure S10: Michaelis–Menten kinetic for citral; Figure S11: Michaelis–Menten kinetic for 2-methyl-2-cyclohexenone; Figure S12: Michaelis–Menten kinetic for cyclopentenone; Figure S13: Michaelis–Menten kinetic for hexenal; Figure S14: Comparison conversion with NADH and NADPH; Figure S15: Melting curve.

**Author Contributions:** Conceptualization: C.P. and R.M.B.; experimental work: D.A.; writing: D.A., C.P., and R.M.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Cofactor F420-Dependent Enzymes: An Under-Explored Resource for Asymmetric Redox Biocatalysis**

**Mihir V. Shah 1,**†**, James Antoney 1,2,**†**, Suk Woo Kang 1,2, Andrew C. Warden 1, Carol J. Hartley 1, Hadi Nazem-Bokaee 1, Colin J. Jackson 1,2 and Colin Scott 1,\***


Received: 20 September 2019; Accepted: 10 October 2019; Published: 20 October 2019

**Abstract:** The asymmetric reductions of enoates, imines and ketones are among the most important reactions in biocatalysis. These reactions are routinely conducted using enzymes that use nicotinamide cofactors as reductants. The deazaflavin cofactor F420 also has electrochemical properties that make it suitable as an alternative to nicotinamide cofactors for use in asymmetric reduction reactions. However, cofactor F420-dependent enzymes remain under-explored as a resource for biocatalysis. This review considers the cofactor F420-dependent enzyme families with the greatest potential for the discovery of new biocatalysts: the flavin/deazaflavin-dependent oxidoreductases (FDORs) and the luciferase-like hydride transferases (LLHTs). The characterized F420-dependent reductions that have the potential for adaptation for biocatalysis are discussed, and the enzymes best suited for use in the reduction of oxidized cofactor F420 to allow cofactor recycling in situ are considered. Further discussed are the recent advances in the production of cofactor F420 and its functional analog FO-5′-phosphate, which remains an impediment to the adoption of this family of enzymes for industrial biocatalytic processes. Finally, the prospects for the use of this cofactor and dependent enzymes as a resource for industrial biocatalysis are discussed.

**Keywords:** cofactor F420; deazaflavin; oxidoreductase; hydride transfer; hydrogenation; asymmetric synthesis; cofactor biosynthesis

#### **1. Introduction**

Enzymes that catalyze the asymmetric reduction of activated double bonds are among the most important in biocatalysis, allowing access to chiral amines from imines (C=N), *sec*-alcohols from ketones (C=O), and enantiopure products derived from enoates (C=C). To date, the reduction of imines, ketones and enoates has been achieved largely using enzymes that draw their reducing potential from the nicotinamide cofactors NADH and NADPH; e.g., imine reductases, ketoreductases and Old Yellow Enzymes [1–4]. However, there has been recent interest in an alternative reductive cofactor, cofactor F420 (8-hydroxy-5-deazaflavin) [5,6].

Cofactor F420 is a deazaflavin that is structurally similar to flavins (Figure 1), with a notable difference at position 5 of the isoalloxazine ring, which is a nitrogen in flavins and a carbon in deazaflavins. Additionally, while C-7 and C-8 are methylated in riboflavin, they are not in cofactor F420: C-8 is hydroxylated and C-7 is unsubstituted. These structural differences cause significant differences in the electrochemical properties of cofactor F420 and flavins: at −360 to −340 mV, the redox mid-point potential of cofactor F420 is not only lower than that of the flavins (−205 mV to −220 mV), but it is also lower than that of the nicotinamides (−320 mV) [7]. Additionally, as a consequence of the replacement of N-5 by a carbon, cofactor F420 cannot form a semiquinone (Figure 1), which means that unlike the flavins, cofactor F420 can only perform two-electron reductions.
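To give a sense of what these potential differences mean energetically, the following sketch converts the mid-point potentials quoted above into a standard free energy change and an equilibrium constant for a two-electron transfer, using ΔG = −nFΔE and K = exp(−ΔG/RT). The choice of −340 mV for cofactor F420, −210 mV as a representative flavin value, and a temperature of 298.15 K are assumptions made for illustration.

```python
import math

FARADAY = 96485.332      # C/mol
GAS_CONSTANT = 8.314462618  # J/(mol K)


def delta_g_kj(e_donor_mV: float, e_acceptor_mV: float, n: int = 2) -> float:
    """Free energy change (kJ/mol) for transferring n electrons from donor to acceptor."""
    delta_e_volts = (e_acceptor_mV - e_donor_mV) / 1000.0
    return -n * FARADAY * delta_e_volts / 1000.0


def equilibrium_constant(dg_kj: float, temp_K: float = 298.15) -> float:
    """Equilibrium constant corresponding to a free energy change in kJ/mol."""
    return math.exp(-dg_kj * 1000.0 / (GAS_CONSTANT * temp_K))


# Reduced F420 (-340 mV, assumed) donating a hydride to NAD(P)+ (-320 mV).
dg_nadp = delta_g_kj(-340.0, -320.0)
k_nadp = equilibrium_constant(dg_nadp)

# Reduced F420 (-340 mV, assumed) donating to a flavin (-210 mV, assumed mid-range value).
dg_flavin = delta_g_kj(-340.0, -210.0)

print(f"F420H2 -> NAD(P)+: dG = {dg_nadp:.2f} kJ/mol, K = {k_nadp:.1f}")
print(f"F420H2 -> flavin:  dG = {dg_flavin:.1f} kJ/mol")
```

The small driving force between cofactor F420 and the nicotinamides indicates that the direction of hydride exchange between them can be shifted readily by reaction conditions, whereas reduction of a flavin by reduced F420 is strongly favored.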

**Figure 1.** The structures of NAD(P) (top), cofactor F420 and its synthetic analog FOP (center) and common flavins (riboflavin, FMN and FAD; bottom). The oxidized and reduced forms are shown, as is the flavin semiquinone. Dashed lines indicate the differences in the structures of FOP and cofactor F420, and riboflavin, FMN and FAD.

Cofactor F420 was originally described in methanogenic archaea, where it plays a pivotal role in methanogenesis [8,9]. Cofactor F420 has since been described in a range of soil bacteria, where it supports diverse metabolic activities, including catabolism of recalcitrant molecules (such as picric acid) and the production of secondary metabolites, such as antibiotics [7]. A comprehensive review of the biochemistry and physiological roles of cofactor F420 was recently published by Greening and coworkers [7]. This review considers the potential of F420-dependent enzymes in industrial biocatalysis, focusing on the enzyme families relevant to biocatalytic applications and the reactions that they catalyze. Cofactor recycling strategies and cofactor production are also discussed, with a focus on the prospects for achieving low-cost production at scale in the latter case.

#### **2. Families of F420-Dependent Enzymes Relevant to Biocatalysis**

With respect to their prospective biocatalytic applications, the two most important families of F420-dependent enzymes are the Flavin/Deazaflavin Oxidoreductase (FDOR) and Luciferase-Like Hydride Transferase (LLHT) families, although F420-dependent enzymes from other families have also been shown to have catalytic activities of interest (e.g., TomJ, the imine-reducing flavin-dependent monooxygenase, or OxyR, the tetracycline oxidoreductase) [10,11]. The FDOR and LLHT families are large and contain highly diverse flavin/deazaflavin-dependent enzymes. In both families, there are enzymes with preferences for flavins, such as flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD), as well as those that use cofactor F420 [12,13]. Moreover, there are F420-dependent FDORs that have been shown to be able to promiscuously bind FMN and use it in oxidation reactions [14]. In this section, the FDOR and LLHT families and the classes of reaction that they catalyze are discussed.

#### *2.1. The FDOR Superfamily*

The FDOR superfamily (PFAM Clan CL0336) can be broadly divided into two groups: the FDOR-As (which includes a sub-group called the FDOR-AAs) and the FDOR-Bs. The FDOR-As are restricted to *Actinobacteria* and *Chloroflexi* and to date no FDOR-As have been described that use cofactors other than F420 [7,12]. The FDOR-Bs are found in a broader range of bacterial genera than the FDOR-A enzymes, and in addition to F420-dependent enzymes, this group also includes heme oxygenases, flavin-sequestering proteins, pyridoxine 5' oxidases and a number of proteins of unknown function [12,15–17]. Both groups of FDOR are highly diverse, with many homologs often found within a single bacterial genome (e.g., *Mycobacterium smegmatis* has 28 FDORs) [18]. In addition, the majority of the enzymes of this family are yet to be characterized with respect to either their biochemical or physiological function, and therefore the FDORs represent a currently under-explored source of enzymes for biocatalysis.

The FDOR enzymes share a characteristic split β-barrel fold that forms part of the cofactor-binding pocket. The majority of the protein sequences of enzymes currently identified as belonging to this family are small single-domain proteins. The topologies of the two FDOR subgroups are broadly similar (Figure 2), with the split-barrel core composed of 7–8 strands and with 4–5 helices interspersed. All FDOR-Bs studied so far have been demonstrated to be dimeric, with strands β2, β3, β5 and β6 making up the core of the dimer interface (Figure 2). In structures of full-length FDOR-As solved to date, the N-terminal helix (if present) lies on the opposite face of the beta sheet to that in FDOR-Bs. Thus, the N-terminus occupies part of the dimer interface region and prevents interaction between the sheets of adjacent monomers. In contrast to the FDOR-Bs, the oligomerization state of the FDOR-As is more varied. While a number of FDOR-As have been determined to be monomeric [18], the deazaflavin-dependent nitroreductase (DDN) from *M. tuberculosis* forms soluble aggregates through the amphipathic N-terminal helix [19]. DDN and the FDOR-AA subgroup have been shown to be membrane-associated [20–22], and FDOR-AAs have been associated with fatty acid metabolism [12]. No structures of FDOR-AAs have been solved to date.

**Figure 2.** Representative structures of F420-dependent FDOR-A (PDB: 3R5Z, panels **A** and **C**) and FDOR-B (PDB: 5JAB, panels **B** and **D**). Both are predominantly composed of a single β-sheet forming a split barrel. The N-terminal helices are spatially displaced between the two families, falling on opposite faces of the β-sheet.

#### *2.2. The LLHT Family*

The LLHT family forms part of the Luciferase-Like Monooxygenase family (PFAM PF00296). Its members adopt an (α/β)<sub>8</sub> TIM-barrel fold with four insertion regions, IS1–4 (Figure 3). IS1 contains a short loop and forms part of the substrate cleft. IS2 contains two antiparallel β-strands, and IS3 contains a helical bundle at the C-terminus of the β-barrel and contains the remainder of the substrate-binding pocket (Figure 3). All structures solved to date from the LLHT family contain a non-prolyl *cis* peptide in β3 [23–26]. Recent phylogenetic reconstructions have shown that the F420-dependent LLHTs form two clades: the F420-dependent reductases and the F420-dependent dehydrogenases [27]. The F420-dependent reductases contain methylenetetrahydromethanopterin reductases (MERs), which catalyze the reversible, ring-opening cleavage of a carbon-nitrogen bond during the biosynthesis of folate in some archaea [28–30]. The F420-dependent dehydrogenases can be further divided into three subgroups. The first contains F420-dependent secondary alcohol dehydrogenases (ADFs) and the hydroxymycolic acid reductase from *M. tuberculosis* [31]. The second contains the F420-dependent glucose-6-phosphate dehydrogenases (FGDs) from *Mycobacteria* and *Rhodococcus*, while the third appears to comprise more general sugar-phosphate dehydrogenases [27]. In contrast to the heterodimeric structure of bacterial luciferase, the F420-dependent dehydrogenases form homodimers with the dimer interface burying a relatively large portion of the surface area of the monomers (≈2000 Å<sup>2</sup>, roughly 15% of the total surface area) [24–26]. A number of enzymes involved in the F420-dependent degradation of nitroaromatic explosives, such as picrate and 2,4-dinitroanisole, appear to belong to the LLHT family as well [32,33].

**Figure 3.** Structure of a representative luciferase-like hydride transferase (LLHT) (PDB: 1RHC). A 3D representation of the biologically relevant dimer (panel **A**). Monomer of an LLHT with insertion sequences IS1–4 highlighted, along with the helical bundle composed of α7–9 (panel **B**). Topology diagram showing the (α/β)<sub>8</sub> fold with insertion sequences highlighted: IS1, red; IS2, orange; IS3, light green; IS4, pink. The helical bundle of α7–9 is highlighted in purple (panel **C**).

#### *2.3. Cofactor F420-Dependent Reactions with Relevance to Biocatalysis*

From the perspective of biocatalysis, cofactor F420-dependent enzymes catalyze a number of key reductions including the reduction of enoates, imines, ketones and nitro-groups (Table 1; Figure 4).

**Figure 4.** Representative cofactor F420-dependent oxidoreductions with the potential for adaptation to biocatalytic applications. Included are: nitroreduction, enoate reduction, ketoreduction and imine reduction (from top to bottom). For clarity, only the dehydropiperidine ring of the thiopeptide is shown and partial structures for biliverdin-Ixα and phthiodiolone dimycocerosates are shown.

For enoate reductions, a small number of FDORs have been studied. However, the substrate range for most of these enzymes is yet to be fully elucidated. The ability of the mycobacterial FDORs to reduce activated C=C double bonds was first identified when DDN was shown to be responsible for activating the bicyclic nitroimidazole PA-824 in *M. tuberculosis*. These enzymes were then shown to also reduce enoates in aflatoxins, coumarins, furanocoumarins and quinones [6,12,14,16,34–38]. Recent studies have shown that these enzymes are promiscuous and can use cyclohexen-1-one, malachite green and a wide range of other activated ene compounds as substrates [35]. However, few FDOR studies to date have examined their kinetic properties and stereospecificity. In one of these studies, FDORs from *Mycobacterium hassiacum* (FDR-Mha) and *Rhodococcus jostii* RHA1 (FDR-Rh1 and FDR-Rh2) were shown to reduce a range of structurally diverse enoates with conversions ranging from 12 to >99% and *e.e*. values of up to >99% [6]. Interestingly, it has been proposed that both the hydride and proton transfer from F420H2 in these reactions are directed to the same face of the activated double bond (Figure 5), which results in the opposite enantioselectivity compared to that of the FMN-dependent Old Yellow Enzyme family of enoate reductases [6]. This suggests that the F420-dependent FDORs may provide a stereocomplementary enoate reductase toolbox. However, other studies suggest that protonation of the substrate is mediated by solvent or an enzyme side-chain (as it is in Old Yellow Enzyme) [37]. Further structure/function studies are needed to fully understand the mechanistic diversity of this family of enzymes.

**Figure 5.** Enoate reduction by a flavin-dependent enzyme (Old Yellow Enzyme) and the proposed mechanism for cofactor F420-dependent reduction. Notably the mechanism of reduction yields *trans*-hydrogenation products for Old Yellow Enzyme and *cis*-hydrogenation products for the F420-dependent enzymes.

The LLHT family contains several enzymes with alcohol oxidase or ketoreductase activity (Table 1; Figure 4). The F420-dependent glucose-6-phosphate dehydrogenases of several species have been investigated [25,26,39]. Although an extensive survey of their substrate ranges has yet to be conducted, it has been demonstrated that glucose is a substrate for the *Rhodococcus jostii* RHA1 enzymes [26]. An F420-dependent alcohol dehydrogenase (ADH) from *Methanogenium liminatans* has been shown to catalyze the oxidation of the short chain aliphatic alcohols 2-propanol, 2-butanol and 2-pentanol (*k*<sub>cat</sub> of 85, 49 and 23.1 s<sup>−1</sup> and *K*<sub>M</sub> of 2.2, 1.2 and 7.2 mM, respectively) [40], but it was unable to oxidize primary alcohols, polyols or secondary alcohols with more than five carbons. It is unclear whether these alcohol oxidations are reversible, but in the oxidative direction, these reactions provide enzymes that can be used to recycle reduced cofactor F420 (see Section 4). Alcohol oxidation can also be used to produce ketones as intermediates in biocatalytic cascades that can then be used in subsequent reactions, such as those catalyzed by transaminases or amine dehydrogenases in chiral amine synthesis [1,41–43] or by ketoreductases or alcohol dehydrogenases in chiral *sec*-alcohol synthesis (i.e., deracemization or stereoinversion of *sec*-alcohols). This approach can be achieved in a one-pot cascade if different cofactors are used for the oxidation and reduction (Figure 6) [44].
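The kinetic constants quoted above for the *Methanogenium liminatans* ADH can be compared more directly as catalytic efficiencies (the ratio of the turnover number to the Michaelis constant). The sketch below performs this arithmetic, assuming that the values are listed in the same order as the substrates (2-propanol, 2-butanol, 2-pentanol); the variable names are illustrative only.

```python
# (kcat in s^-1, KM in mM) for each substrate, as quoted in the text.
adh_kinetics = {
    "2-propanol": (85.0, 2.2),
    "2-butanol": (49.0, 1.2),
    "2-pentanol": (23.1, 7.2),
}

# Catalytic efficiency kcat/KM in mM^-1 s^-1.
efficiency = {name: kcat / km for name, (kcat, km) in adh_kinetics.items()}

# Rank substrates from most to least efficiently oxidized.
ranked = sorted(efficiency.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name}: kcat/KM = {value:.1f} mM^-1 s^-1")
```

On this basis, 2-propanol and 2-butanol are oxidized with similar efficiency (roughly 40 mM<sup>−1</sup> s<sup>−1</sup>), while 2-pentanol is oxidized about an order of magnitude less efficiently, consistent with the reported inability of the enzyme to oxidize longer secondary alcohols.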

At least one F420-dependent ketoreductase has been described. The mycobacterial F420-dependent phthiodiolone ketoreductase catalyzes a key reduction in the production of phthiocerol dimycocerosate, a diacylated polyketide found in the mycobacterial cell wall [45]. Although the physiological role of this enzyme has been elucidated, biochemical studies of the catalytic properties and substrate range are required to assess this enzyme's potential for use as a biocatalyst.

**Figure 6.** Proposed scheme for one-pot enzyme cascades for deracemization/stereoinversion of *sec*-alcohols (top) and chiral amine synthesis (bottom) using cofactor F420-dependent alcohol oxidation.

F420-dependent enzymes have also been shown to reduce imines (Table 1; Figure 4). An FDOR from *Streptomyces tateyamensis* (TpnL) is responsible for the reduction of dehydropiperidine in the piperidine-containing series *a* group of thiopeptide antibiotics produced in this bacterium (Figure 4). TpnL was identified as the F420-dependent dehydropiperidine reductase responsible for the reduction of the dehydropiperidine ring in thiostrepton A to produce the piperidine ring in the core macrocycle of thiostrepton A [45]. TpnL activity was affected by substrate inhibition at concentrations higher than 2 μM of thiostrepton A, preventing the measurement of the *K*<sub>M</sub>, but its *k*<sub>cat</sub>/*K*<sub>M</sub> was measured at 2.80 × 10<sup>4</sup> M<sup>−1</sup> s<sup>−1</sup> [45]. The substrates for phthiodiolone ketoreductase and TpnL are large secondary metabolites and, as yet, it is unclear if these enzymes will accept smaller substrates or substrates with larger/smaller heterocycles (e.g., dehydropyrroles).

Another F420-dependent imine reductase (TomJ) has been described from *Streptomyces achromogenes* that reduces the imine in 4-ethylidene-3,4-dehydropyrrole-2-carboxylic acid during the production of the secondary metabolite tomaymycin, which has been shown to have potentially interesting pharmaceutical properties [11]. Additionally, the reduction of a prochiral dihydropyrrole to a pyrrole is a reaction with a number of biocatalytic applications [5].

Nitroreductases have potential application in the reduction of a prochiral nitro group to form a chiral amine [46]. The LLHT family F420-dependent nitroreductase Npd from *Rhodococcus* catalyzes the two-electron reduction of two nitro groups in picric acid during catabolism of the explosive TNT (Table 1; Figure 4) [47]. While this stops short of reducing the nitro group to an amine, this catalytic activity may contribute to a reductive cascade that achieves this conversion.

The final class of reaction for consideration in this review is the unusual, reversible ring-opening/closing reaction catalyzed by the MERs (Figure 4; Table 1). This reaction is required for folate biosynthesis in some archaea [23,28–30]. However, ring-closing reactions of this type could be used for producing N-containing heterocycles, which are intermediates in the synthesis of numerous pharmaceuticals [48,49]. The promiscuity of the MERs has not yet been investigated, and so the potential to re-engineer these enzymes is not fully understood.


**Table 1.** Characterized F420-dependent enzymes with activities that could be adapted for biocatalytic applications.

#### **3. Cofactor Recycling for Cofactor F420**

Cofactor recycling is essential for the practical application of the F420-dependent enzymatic processes in biocatalysis. There are various strategies for cofactor regeneration for NADH and NADPH, including enzymatic, chemical, electrochemical and photochemical methods [52]. In this section, the potential enzymes for the regeneration of cofactor F420 are discussed. As most of the industrially relevant F420-dependent reactions are asymmetric reductions, F420-dependent oxidases are required for cofactor regeneration. Figure 7 shows the characterized enzymes that catalyze F420-dependent oxidations that could be applied in cofactor F420 reduction.

Emulating methods developed for nicotinamide cofactors, both formate dehydrogenase (FDH) and glucose 6-phosphate dehydrogenase (G6PD) enzymes are attractive enzymatic routes for cofactor reduction both in vitro [53–56] and in vivo [57,58]. Fortunately, F420-dependent G6PDs and FDHs have been identified and characterized. The F420-dependent G6PD from *Mycobacteria* (FGD) is one potential cofactor F420-recycling enzyme. FGD is the only enzyme in these bacteria known to reduce oxidized cofactor F420. The intracellular concentration of G6P in *Mycobacteria* is up to 100-fold higher than it is in *E. coli*, which provides a ready source of reducing power for F420-dependent reduction reactions [59]. FGDs from *Rhodococcus jostii* and *Mycobacterium smegmatis* have been studied and expressed in *E. coli*; both enzymes were stable in in vitro assays [26,39,60]. Both FGDs have been expressed in engineered *E. coli* producing cofactor F420 together with FDORs [38,59]. FGDs have been shown to efficiently regenerate reduced cofactor F420 both in vivo and in vitro. However, the cost of the glucose-6-phosphate and the need to separate reaction products from the accumulated FGD byproduct (6-phosphoglucono-δ-lactone) may prove to be impediments to the adoption of FGD as a recycling system for cofactor F420 in in vitro biotransformations.

**Figure 7.** Cofactor F420-dependent oxidation reactions that could be exploited to produce reduced cofactor F420.

Formate is an excellent reductant for cofactor recycling, with FDH-dependent cofactor reduction yielding carbon dioxide, a volatile byproduct that can be easily removed from the reaction mixture, thereby simplifying the downstream processing of the product of interest. Additionally, formate is a low-cost reagent, leading to favorable process economics. Most methanogens can use formate as the sole electron donor using F420-dependent formate dehydrogenase [61]. The soluble F420-dependent FDH from *Methanobacterium formicicum* has been expressed in *E. coli* [62], purified and studied in vitro, reducing 41.2 μmol of F420 min<sup>−1</sup> mg<sup>−1</sup> of FDH, with non-covalently bound FAD required for optimal activity [8]. *Methanobacterium ruminantium* FDH reduces cofactor F420 at a much slower rate than the *M. formicicum* enzyme: 0.11 μmol of F420 min<sup>−1</sup> mg<sup>−1</sup> of FDH [8]. As yet, the use of F420-dependent FDHs for in vitro cofactor recycling has been sparsely studied. However, as these enzymes are soluble and can be heterologously expressed, they represent a promising system for use in cofactor F420-dependent biocatalytic processes.

Another potential recycling system for cofactor F420 is the F420:NADPH oxidoreductase (Fno), which couples the reduction of cofactor F420 with the oxidation of NADPH. Methanogenic archaea use this enzyme to transfer reducing equivalents from hydrogenases to produce NADPH via F420, while in bacteria it functions in the opposite direction, that is, to provide the cell with reduced F420 via NADPH [63]. Fno is also required for the production of reduced F420 for tetracycline production in *Streptomyces* [63]. The Fno enzymes from the thermophilic bacterium *Thermobifida fusca* and the thermophilic archaeon *Archaeoglobus fulgidus* have been expressed in *E. coli* [64,65]. These enzymes are thermostable, with their highest activity observed at 65 °C. As the redox midpoint potentials of NADP and cofactor F420 are very similar, it is perhaps unsurprising that pH has a significant influence on the equilibrium of the reaction, with the reduction of NADP<sup>+</sup> favored at high pH (8–10) and the reduction of F420 favored at low pH (4–6) [64,65]. The Fno from *Streptomyces griseus* has also been purified and characterized, and also displayed pH-dependent reaction directionality [66]. Fno may be an excellent enzyme for the in vivo reduction of cofactor F420, where NADPH would be provided from central metabolism. However, for its use as a cofactor F420 recycling enzyme in vitro, Fno would need to be coupled with an NADPH regenerating enzyme, such as an NADPH-dependent formate dehydrogenase [67]. This added complexity and cost may limit the use of Fno-dependent cofactor F420 recycling in vitro.

Hydrogenotrophic archaea, including methanogens and sulfate-reducing archaea, possess an essential, cofactor F420-dependent hydrogenase (FrhAGB) [68–71]. These nickel/iron enzymes could potentially be used in vivo to allow the direct H<sub>2</sub>-dependent reduction of cofactor F420. However, as these heterododecameric enzymes have complex cofactor requirements (four [4Fe-4S] clusters, a NiFe center and FAD), are oxygen-sensitive and tend to aggregate [71], it is unclear if they can be made suitable for in vitro use.

#### **4. Cofactor Production**

The lack of a scalable production system for cofactor F420 has been noted as a major impediment to the adoption of F420-dependent enzymes by industry [5]. Cofactor F420 is available as a research reagent (http://www.gecco-biotech.com/), but its production at scale is not yet economic. In fact, most research laboratories with an interest in cofactor F420-dependent enzymes synthesize and purify the cofactor themselves using slow-growing F420 producing microorganisms, most commonly methanogens and actinobacteria (Table 2). The economic production of cofactor F420 at large scale is not feasible using natural producers as they are ill-suited to industrial fermentation and generally lack the genetic tools required to improve cofactor F420 yield.


**Table 2.** Published production systems for cofactor F420.

<sup>a</sup> The molecular weight of F420 with one glutamate tail is 773.6 Da, which was used to convert values published as μg of F420, noting that micro-organisms produce a mixture of F420 species with different numbers of glutamates (1–9) attached. <sup>b</sup> Concentration estimated through absorbance at 420 nm using an extinction coefficient of 41.4 mM<sup>−1</sup> cm<sup>−1</sup> [73]. <sup>c</sup> F420 concentration per g of wet cell weight. <sup>d</sup> Concentration of F420 not mentioned in the publication, but the F420 yield was stated to be 10 times higher than that of wild-type *M. smegmatis*. <sup>e</sup> Concentration estimated through absorbance at 400 nm using an extinction coefficient of 25.7 mM<sup>−1</sup> cm<sup>−1</sup> [74].
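The unit conversions behind these footnotes can be sketched in a few lines of Python. This is only an illustration: the function names and example inputs are assumptions, and a 1 cm path length is assumed by default. Only the extinction coefficient at 420 nm and the molecular weight of F420 with one glutamate are taken from the footnotes.

```python
# Constants taken from the Table 2 footnotes.
EPSILON_420_PER_MM_CM = 41.4   # extinction coefficient at 420 nm, mM^-1 cm^-1
MW_F420_1_DA = 773.6           # molecular weight of F420 with one glutamate, g/mol


def absorbance_to_micromolar(a420, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in micromolar."""
    millimolar = a420 / (EPSILON_420_PER_MM_CM * path_cm)
    return millimolar * 1000.0


def micrograms_to_micromoles(mass_ug):
    """Convert a mass of F420 (ug) to an amount (umol) using the F420-1 weight."""
    return mass_ug / MW_F420_1_DA


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    print(absorbance_to_micromolar(0.414))   # absorbance of 0.414 in a 1 cm cuvette
    print(micrograms_to_micromoles(773.6))   # one micromole's worth of F420-1
```

Because natural producers make a mixture of glutamate chain lengths, a mass-based conversion with the F420-1 weight is an approximation, as the footnote itself notes.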

Recently, there have been significant advances towards the scalable production of the cofactor for F420-dependent enzymes. *M. smegmatis* has been engineered to overexpress the biosynthetic genes for cofactor F420 production, leading to a substantial improvement in yields (Table 2) [72]. However, *M. smegmatis* is not ideally suited as a fermentation organism as it is slow growing, forms clumps during cultivation and does not have GRAS (generally recognized as safe) status. More recently, the biosynthetic pathway for cofactor F420 has been successfully transplanted to *E. coli* [59], allowing the heterologous production of the cofactor at levels similar to those of the natural F420 producers (Table 2), with F420 accumulating to 0.38 μmol per gram of dry cells [59].

There is scope to further improve the production of F420 in *E. coli*. Cofactor F420 does not appear to be toxic to *E. coli* [59], which suggests that there is little interaction between F420 and the enzymes of *E. coli* (although this is yet to be confirmed experimentally). The thermodynamics of cofactor F420 production are favorable (Appendix A), suggesting that there are no major thermodynamic impediments to improving yield. Interestingly, the first dedicated step of cofactor F420 production (catalyzed by CofC/FbiD) is not energetically favorable and may consequently be sensitive to intracellular metabolite concentrations. In addition to the engineering considerations that this may impose, it may also be responsible for the biochemical diversity of this step in different microorganisms. In different microbes, the CofC/FbiD-dependent step uses 2-phospholactate [75], 3-phosphoglycerate [76] or phosphoenolpyruvate [59] as a substrate, which may reflect the relative abundance of those metabolites in various bacteria and archaea and the thermodynamic constraints on this step.

Another recent advance is the production of a synthetic analog of cofactor F420, called FO-5′-phosphate (FOP). FOP is derived from FO, the metabolic precursor of cofactor F420, which is phosphorylated using an engineered riboflavin kinase [38]. FOP has also been shown to function as an active cofactor for cofactor F420-dependent enzyme activities, albeit with a penalty in the rates of these reactions [38]. Drenth and coworkers prepared FO by chemical synthesis, using a method developed by Hossain et al. [77]. However, it is likely that the engineered kinase for the phosphorylation of FO could be introduced into an organism that over-produces FO, allowing for the production of FOP by fermentation. This semisynthetic pathway would have the advantage that it needs only two biosynthetic steps, instead of the four steps needed for cofactor F420 production, and demands less metabolic input from the native host metabolism (e.g., no glutamate is required) [38]. The production of FOP also opens the possibility of making deazaflavin analogs of FMN and FAD, which would be electrochemically more like F420 than flavins, but may still bind FMN- and FAD-dependent enzymes and potentially allow access to new chemistry with already well-characterized enzymes.

#### **5. Prospects**

Reduced cofactor F420 is electrochemically well suited for biocatalytic applications, and the small number of F420-dependent enzymes characterized to date show promise as potential biocatalysts (as discussed above). However, before these enzymes can be widely and effectively used as biocatalysts, further research is needed to better characterize them as the biochemistry of cofactor F420-dependent enzymes remains under-explored. The LLHT and FDOR families are a rich source of highly diverse enzymes with considerable potential for biocatalysis, albeit much of the research to date has focused on the physiological roles of these enzymes, rather than their in vitro enzymology. Although some of these enzymes have been shown to have small molecule substrates, those involved with secondary metabolite biosynthesis tend to act on high molecular weight substrates and it is not yet clear whether they will accept lower molecular weight molecules.

To be cost competitive, cofactor F420 needs to have effective recycling systems. The enzymes for cofactor recycling have already been identified, although there have been few studies investigating their performance in this role. Moreover, alternative cofactor recycling strategies, such as electrochemical or photochemical recycling, have not yet been investigated for cofactor F420. The production of cofactor F420 at scale and at low cost remains a roadblock for the use of these enzymes by industry. However, considerable progress has been made on this front in the last few years and it is likely that low cost cofactor F420, or F420 surrogates, will soon be available. Additionally, the availability of F420-producing bacteria with tools for facile genetic manipulation, along with a growing number of empirically determined protein structures, opens up the prospect of improving this class of enzymes using in vitro evolution and rational design. It is notable that there is still some uncertainty concerning the mechanistic detail of F420-dependent reactions, which needs to be addressed through a detailed structure/function analysis to enable a rational design of these enzymes.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4344/9/10/868/s1.

**Author Contributions:** Writing—original draft preparation, M.V.S., J.A., S.W.K., H.N.-B., C.J.J. and C.S.; writing review and editing, M.V.S., J.A., S.W.K., H.N.-B., A.C.W., C.J.H., C.J.J. and C.S.

**Funding:** M.V.S. and H.N.-B. are funded by the CSIRO Synthetic Biology Future Science Platform. J.A. is funded by the Australian Government Research Training Program (AGRTP) and S.W.K. is funded by the Korean Institute of Science and Technology and the AGRTP.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

5AD: 5′-Deoxyadenosine; 5ARPD: 5-amino-6-(D-ribitylamino)uracil; 5ARPD4HB: 5-amino-5-(4-hydroxybenzyl)-6-(D-ribitylimino)-5,6-dihydrouracil; dF420-0: Dehydro coenzyme F420-0 (oxidized); EPPG: Enolpyruvyl-diphospho-5′-guanosine; Fo: 7,8-Didemethyl-8-hydroxy-5-deazariboflavin; F420-0: Coenzyme F420-0 (oxidized); F420-1: Coenzyme F420-1 (oxidized); F420-2: Coenzyme F420-2 (oxidized); F420-3: Coenzyme F420-3 (oxidized); FMN: Flavin mononucleotide (oxidized); FMNH2: Flavin mononucleotide (reduced); GDP: Guanosine diphosphate; GMP: Guanosine monophosphate; Glu: L-Glutamate; GTP: Guanosine triphosphate; H+: Proton; ImiAce: 2-iminoacetate or Dehydroglycine; Met: L-Methionine; NH4: Ammonium; PEP: Phosphoenolpyruvate; Pi: Phosphate; PPi: Diphosphate; SAMe: S-Adenosyl-L-methionine; Tyr: L-Tyrosine.

#### **Appendix A Thermodynamics of F420 Biosynthesis**

The thermodynamic properties of each of the steps in cofactor F420 biosynthesis were estimated to evaluate the feasibility of increasing the production of the cofactor in an engineered microorganism. The pathway assembled by Bashiri et al. [59] in *E. coli* was used (i.e., PEP was used as substrate for CofC). The standard transformed Gibbs free energy (Δ<sub>r</sub>G<sup>t</sup>) of each step was calculated under physiological conditions (25 °C, pH 7, and ionic concentration of 0.25 M) as described elsewhere [78,79]. The overall Gibbs free energy (ΔG<sup>t</sup>) was then calculated by summing all individual Δ<sub>r</sub>G<sup>t</sup> values (Table A1). The Gibbs free energy of formation (Δ<sub>f</sub>G) for each metabolite in the pathway was obtained (Supplementary Information) from comprehensive lists of metabolites whose Δ<sub>f</sub>G values were estimated using a group contribution method [80,81]. The Δ<sub>f</sub>G for each metabolite was then converted into its transformed form (Δ<sub>f</sub>G<sup>t</sup>) using the method of Alberty [78]. For any metabolite with a missing Δ<sub>f</sub>G, the data were collected from relevant biochemical databases and the literature [82–84]. Because some metabolites, such as the glutamates in F420-n, can exist in different protonation states, inconsistencies in their Δ<sub>f</sub>G values among databases and the literature are inevitable. Thus, Δ<sub>r</sub>G<sup>t</sup> for reactions containing metabolites with varying Δ<sub>f</sub>G was calculated for each of the differing Δ<sub>f</sub>G values, generating a total of four sets of Δ<sub>r</sub>G<sup>t</sup>. Finally, the mean and standard deviation were calculated across these sets to yield the variation in each reaction as well as in the overall pathway (Table A1).
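The bookkeeping described above (summing the per-step reaction energies within each data set, then taking the mean and standard deviation across the sets) can be sketched as follows. The step labels and all numerical values are invented placeholders, not the values of Table A1. The use of the sample standard deviation is also an assumption, since the text does not say which estimator was applied.

```python
from statistics import mean, stdev

# Placeholder reaction energies in kJ/mol: one list per step, one entry per
# Delta_f G data set. These numbers are invented for illustration and are NOT
# the values reported in Table A1.
step_energies = {
    "step_1": [12.0, 14.0, 13.0, 13.0],
    "step_2": [-250.0, -248.0, -252.0, -250.0],
    "step_3": [-30.0, -28.0, -32.0, -30.0],
}


def summarize(per_step):
    """Return per-step (mean, sd) and the (mean, sd) of the pathway totals."""
    n_sets = len(next(iter(per_step.values())))
    # Overall energy for each data set = sum of its step energies.
    totals = [sum(values[i] for values in per_step.values()) for i in range(n_sets)]
    step_stats = {name: (mean(v), stdev(v)) for name, v in per_step.items()}
    return step_stats, (mean(totals), stdev(totals))


step_stats, overall = summarize(step_energies)
for name, (m, s) in step_stats.items():
    print(f"{name}: {m:.2f} (±{s:.2f}) kJ/mol")
print(f"overall: {overall[0]:.2f} (±{overall[1]:.2f}) kJ/mol")
```

Summing within each set before averaging keeps the overall standard deviation tied to the same four data sets as the per-step values.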

The data shown in Table A1 confirm that the overall cofactor F420 biosynthesis pathway is thermodynamically feasible under the given conditions. However, certain steps in this pathway impose a thermodynamic barrier with respect to the physiological conditions examined. For example, CofC seems to be one of the major thermodynamically unfavorable steps in the whole pathway, possibly due to the energy-dependent synthesis of EPPG, one of the precursors for making F420. CofG/H combined appears to be the most thermodynamically favorable step in the whole pathway, driving the biosynthesis of FO, the other key precursor for F420 biosynthesis. Interestingly, the formation of the F420-2 molecule seems to be the most favorable step among the F420 molecules downstream in the pathway. It should be noted that the thermodynamic calculations were only performed up to three steps of F420 molecule production (i.e., F420-3), largely because of the high level of inconsistency in the data available for Δ<sub>f</sub>G of higher F420 molecules.


**Table A1.** Standard transformed Gibbs free energy of reaction (Δ<sub>r</sub>G<sup>t</sup>) for the F420 biosynthesis pathway, calculated from the transformed Gibbs free energy of metabolite formation (Δ<sub>f</sub>G<sup>t</sup>) at 25 °C, pH 7, and an ionic concentration of 0.25 M.

<sup>a</sup> For simplicity, protons were omitted in these equations and subsequent calculations as the Δ<sub>f</sub>G<sup>t</sup> of a proton under the set conditions is ~0.08 kJ. However, all Δ<sub>r</sub>G<sup>t</sup> calculations are based on a balanced equation. <sup>b</sup> The mean values of the four sets are shown for each reaction, with their standard deviations in parentheses. <sup>c</sup> The Δ<sub>f</sub>G of 5ARPD4HB has only been reported in MetaCyc, where it was inferred by computational analysis. Including it in the calculations of Δ<sub>r</sub>G<sup>t</sup> for CofG and CofH results in −225.88 (±0) and −894.62 (±36), respectively. <sup>d</sup> Hydrolysis of PPi (H<sub>3</sub>P<sub>2</sub>O<sub>7</sub><sup>3−</sup> + H<sub>2</sub>O → 2 HPO<sub>4</sub><sup>2−</sup> + H<sup>+</sup>) yields a Δ<sub>r</sub>G<sup>t</sup> of ~17 kJ/mole, resulting in less than 2% change in the overall Δ<sub>r</sub>G<sup>t</sup>.


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **A Machine Learning Approach for Efficient Selection of Enzyme Concentrations and Its Application for Flux Optimization**

**Anamya Ajjolli Nagaraja 1,2,3,4, Philippe Charton 2,3,4, Xavier F. Cadet 5, Nicolas Fontaine 5, Mathieu Delsaut 1, Birgit Wiltschi 6, Alena Voit 6, Bernard Offmann 7, Cedric Damour 1, Brigitte Grondin-Perez 1 and Frederic Cadet 2,3,4,\***


Received: 28 January 2020; Accepted: 28 February 2020; Published: 4 March 2020

**Abstract:** The metabolic engineering of pathways has been used extensively to produce molecules of interest on an industrial scale. Methods like gene regulation or substrate channeling helped to improve the desired product yield. Cell-free systems are used to overcome the weaknesses of engineered strains. One of the challenges in a cell-free system is selecting the optimized enzyme concentration for optimal yield. Here, a machine learning approach is used to select the enzyme concentration for the upper part of glycolysis. The artificial neural network approach (ANN) is known to be inefficient in extrapolating predictions outside the box: high predicted values will bump into a sort of "glass ceiling". In order to explore this "glass ceiling" space, we developed a new methodology named glass ceiling ANN (GC-ANN). Principal component analysis (PCA) and data classification methods are used to derive a rule for a high flux, and ANN to predict the flux through the pathway using the input data of 121 balances of four enzymes in the upper part of glycolysis. The outcomes of this study are i. in silico selection of optimum enzyme concentrations for a maximum flux through the pathway and ii. experimental in vitro validation of the "out-of-the-box" fluxes predicted using this new approach. Surprisingly, flux improvements of up to 63% were obtained. Gratifyingly, these improvements are coupled with a cost decrease of up to 25% for the assay.

**Keywords:** machine learning; flux optimization; artificial neural network; synthetic biology; glycolysis; metabolic pathways optimization; cell-free systems

#### **1. Introduction**

Many chemical molecules like peptides, organic acids, etc., are synthesized by different methods such as chemical reactions [1–5] and fermentation process for their application in everyday life. Due to the depletion of non-renewable resources, synthesis of these molecules through a biological system is essential on an industrial scale [6,7]. For decades, scientists have been successful in producing different chemical molecules through microbial fermentation by optimizing the process [7–10]. The costs of microbial fermentation are low, for instance, in comparison to mammalian cell cultures. Microbial systems are easily scalable, use inexpensive synthetic media and have lower batch-to-batch variability [11]. However, microbial systems such as *Escherichia coli* or yeasts have no or only limited capacity for post-translational modifications. Microbial biosynthesis may show low productivity and the coproduction of by-products is possible, which make product recovery complex and protracted [12]. With the advancement of science and technology, there is a continuous effort to improve productivity through novel techniques like gene regulation, which helps to channel the pathway in particular directions, substrate channeling where reactants are directed to the active site of enzymes [13,14], quorum sensing [15], enzyme engineering, etc. However, even after numerous studies, synthesizing some molecules on an industrial scale through microbial fermentation is not cost-effective.

Nobel laureate Eduard Buchner laid the foundation for the cell-free system (CFS) of biomolecule production by converting sugar into ethanol in 1897. The CFS has since been used successfully in the synthesis of many products like bio-hydrogen [16,17], bio-ethanol [18,19], antibodies [20], vaccines [21], proteins [22], etc. The CFS is classified into two broad categories: (i) cell-extract based, in which the host cells are lysed [23,24], and (ii) purified-enzyme based, in which the system contains a mixture of purified enzymes and cofactors [25]. The advantages of the CFS include a high tolerance to toxic compounds, a rapid development timeline, easy incorporation of unnatural amino acids and easy purification of the product. Its disadvantages include poor scalability and the difficulty of post-translational modification of proteins [22]. The selection of enzymes is crucial in metabolic engineering since low performing enzymes result in poor titer and yield. Homology based methodologies like Selenzyme [26] have been developed to select better performing enzymes. One of the main challenges of purified enzyme-based CFS is the selection of optimum enzyme concentrations for maximum product formation. The experimental selection of optimum enzyme concentrations is expensive and tedious.

Researchers became more interested in the mathematical modeling of biological systems due to the availability of data from omics studies [27]. The modeling helps to organize the system information, to simulate and hence optimize the experiment and to understand system characteristics. Out of many different kinds of modeling methods, constraints-based and statics-based models, as well as kinetics-based or dynamic models have been used extensively to study metabolic pathways. The constraints-based methods such as the flux balance analysis [28] depend on physicochemical constraints like mass and energy balance [29]. However, the constraint-based method does not provide information about the concentration of metabolites. Kinetic modeling depends on the kinetic parameters of the enzymes involved in the pathway and provides information about their concentrations [30]. Kinetic modeling of pathways helps to better understand their behavior and replicate the system. Since the kinetic parameters are essential for this kind of modeling, it is not always easy to replicate the system. Finding the kinetic parameters is expensive, tedious [31], and some parameters are difficult to estimate experimentally [32]. For example, phosphofructokinase requires more than ten parameters to model [33]. Hence, the development of a computational method for selecting optimum enzyme concentrations without detailed knowledge of their kinetic parameters, using other existing experimental data, is helpful.

Machine learning methods help to predict the outcome based on the existing experimental data. The artificial neural network (ANN) is one such method inspired by brain architecture [34]. The neural network consists of connections between three layers: input, hidden and output layer. An activation function for the hidden layer is used to define the output. The neural network has been widely used in different fields of science for system identification and control, pattern recognition, medical diagnosis, weather prediction, etc. In particular, the ANN has been used for the selection of optimized medium components in the fermentation process for producing different molecules such as lipids from *Chlorella vulgaris* [35] and spinosyns from *Saccharopolyspora spinosa* [36]. ANN was employed, for instance, for the prediction of the flux through mammalian gluconeogenesis, using the simulated data from metabolite isotopic labeling [37]. Glycolysis, one of the central carbon metabolism pathways, is not only important for organisms, but also in biotechnology for producing different biomolecules [38]. Many chemicals such as organic acids [39,40] and biofuels [41,42] have been successfully produced with high titer using engineered microorganisms including *Saccharomyces cerevisiae* or *Escherichia coli*. Glycolysis is widely studied from various perspectives. The availability of data from Fievet et al. [43] for flux prediction with different enzyme concentrations makes it a good candidate for developing a new approach to select optimum enzyme concentrations.

Previously, ANN was used to predict the flux through the upper part of glycolysis using the concentrations of four enzymes, i.e., phosphoglucose isomerase (PGI), phosphofructokinase (PFK), fructose bisphosphate aldolase (FBA), and triosephosphate isomerase (TPI), as the input to the model [44]. The predicted flux has a root mean square error (RMSE) of 0.84 and an R<sup>2</sup> of 0.93, with 13 hidden units. Since the ANN is a training-based method, the new prediction depends on the training dataset. Since ANN is not efficient in extrapolating predictions [45,46], the new predictions will always lie in the range of the known output predictions; in other words, we could say that they will remain "in-the-box". High predicted output values will bump into a sort of "glass ceiling". Our working hypothesis was that, in reality, actual flux values could be higher than the predicted ones. So, in order to explore this "glass ceiling" space, we developed a new methodology (GC-ANN, for glass ceiling ANN) to predict the flux for the upper part of glycolysis, given enzyme concentrations, using an artificial neural network. The outcomes of this study are i. in silico selection of optimum enzyme concentrations for maximum flux through the pathway and ii. experimental in vitro validation of the "out-of-the-box" flux predicted using this new approach. Initially, we expected to obtain slight improvements, i.e., improved flux values close to the highest one that we fed into the model. Surprisingly, improvements of up to 63% were obtained. Moreover, these improvements are coupled with a cost decrease of up to 25% for the assay.

#### **2. Methodology**

#### *2.1. Data for New Methodology*

The data from Fievet et al. [43] were used to develop the new methodology for selecting optimum enzyme concentrations using ANN. The dataset consisted of 121 combinations of four enzymes (PGI, PFK, FBA and TPI) for the upper part of glycolysis, with flux values ranging from 0.74 to 12.9 μM/s. The total concentration of the four enzymes was kept constant at 101.9 mg/L. The flux was measured as NADH consumption through G3PDH. For more details about the data, please refer to the experimental section of the research article by Fievet et al. [43].

#### *2.2. ANN-Based Flux Prediction Workflow*

The new GC-ANN methodology is explained in three steps: (i) the preparation stage: the data dimension is reduced to identify possibly correlated variables, the rule for obtaining higher flux (>12 μM/s) is derived from the data, and a neural network model is built to predict the flux using the enzyme concentrations; (ii) the execution stage: new enzyme concentrations are generated using the rule obtained and the flux is predicted for the new concentrations using ANN; and (iii) the validation stage: the new methodology of predicting flux using ANN is validated through simulation and experiment.

#### 2.2.1. Preparation Stage

#### Reduction of Data Dimensionality

Principal component analysis (PCA) is one of the methods for the reduction of dimensionality of the dataset [47,48]. For datasets with a high degree of freedom, PCA is very useful to find possible correlations between the variables. PCA is performed using the R (V 3.4.3; R Development Core Team (2008)) package FactoMineR [49].

#### Visualization of Data

Three-dimensional viewing of data could provide insight into the distribution of flux in the space. Therefore, the fluxes in the 3D space of concentrations PGI, PFK, and TPI were visualized using R statistical packages plot3D [50] and plot3Drgl [51].

#### Classification of Data for Higher Flux (> 12 μM/s)

Data classification is the process of categorizing data into various homogeneous groups or types based on common characteristics. Decision tree analysis is a method of data classification helping to search for possible associations within the dataset. The decision tree is a simple tree-like graph method to understand and interpret the observations. The discriminant analysis helps to discriminate between the groups of data. The classification is supported by a discriminant analysis.

The data were classified into five groups by flux value (in μM/s): 0.728–3.17, 3.17–5.6, 5.6–8.04, 8.04–10.5 and 10.5–12.9. Approximately 40% of the data are in the final group, which contains the higher flux values (greater than 10.5 μM/s). The R packages klaR [52] and rpart [53] were used for the discriminant analysis and the decision tree, respectively. The results from the decision tree and discriminant analysis were used to derive the concentration rule for higher flux values (> 12 μM/s) through the pathway.
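As an illustration of the grouping step, the following Python sketch assigns a flux value to one of the five classes using the boundaries listed above. It is not the authors' R code. The function names are hypothetical, and assigning a value that falls exactly on a boundary to the upper class is an assumption.

```python
from bisect import bisect_right

# Class boundaries (uM/s) as listed in the text.
EDGES = [0.728, 3.17, 5.6, 8.04, 10.5, 12.9]


def flux_class(flux):
    """Return the 1-based class index (1..5) for a flux value in uM/s."""
    if not EDGES[0] <= flux <= EDGES[-1]:
        raise ValueError(f"flux {flux} is outside the classified range")
    # bisect_right counts the edges <= flux; clamp so the top edge stays in class 5.
    return min(bisect_right(EDGES, flux), len(EDGES) - 1)


def class_fractions(fluxes):
    """Fraction of observations falling into each of the five classes."""
    counts = [0] * (len(EDGES) - 1)
    for f in fluxes:
        counts[flux_class(f) - 1] += 1
    return [c / len(fluxes) for c in counts]


# Made-up flux values, used only to show the call pattern.
print(class_fractions([1.0, 11.0, 12.0, 4.0]))
```

Applied to the real dataset, `class_fractions` would give the share of observations per group, which the text reports as roughly 40% for the highest class.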

#### Neural Network Model

The artificial neural network for predicting the flux through the upper part of glycolysis is built using the data described earlier in the section "Data for new methodology". The model predicts flux as NADH consumption through the pathway. The model is built using the R package neuralnet [54], which gives us the freedom to choose between two activation functions: logistic and tanh [44].
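To make the architecture concrete, the sketch below shows the forward pass of a network with one hidden layer and a choice between the logistic and tanh activation functions. It is a minimal illustration in Python rather than the neuralnet model itself. The weights are arbitrary untrained placeholders, two hidden units are used for brevity, and the linear output unit is an assumption.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias,
            activation=logistic):
    """Forward pass of a one-hidden-layer network with a linear output unit."""
    hidden = [
        activation(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical, untrained parameters: 4 inputs (PGI, PFK, FBA, TPI), 2 hidden units.
hidden_weights = [[0.1, -0.2, 0.05, 0.0], [0.0, 0.1, -0.1, 0.2]]
hidden_biases = [0.0, 0.0]
output_weights = [1.0, -1.0]
output_bias = 0.5

scaled_inputs = [0.0, 0.0, 0.0, 0.0]  # placeholder, already-scaled concentrations
print(forward(scaled_inputs, hidden_weights, hidden_biases, output_weights, output_bias))
print(forward(scaled_inputs, hidden_weights, hidden_biases, output_weights, output_bias,
              activation=math.tanh))
```

Both activation functions saturate for large inputs, so the hidden-layer outputs are bounded regardless of how large the enzyme concentrations become.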

#### 2.2.2. Execution Stage

#### Generation of New Enzyme Concentration

The new enzyme concentrations were generated between the highest (PGI = 70, PFK = 70, FBA = 86.1, TPI = 66.1 mg/L) and lowest (PGI = 1, PFK = 1, FBA = 2, TPI = 1.66 mg/L) concentrations of the data from Fievet et al. [43], with a step size of 1 mg/L using R script. The total enzyme concentration of four enzymes was kept constant at 101.9 mg/L as in Fievet et al. [43]. The newly generated concentrations were used in the additional analysis.
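One way to enumerate such balances is sketched below in Python. The original work used an R script whose details are not given here, so the scheme is an assumption: PGI, PFK and TPI are stepped in 1 mg/L increments from their lowest values, and FBA is set by the constant total and kept only if it falls within its own bounds.

```python
TOTAL = 101.9  # mg/L, total concentration of the four enzymes
BOUNDS = {     # (lowest, highest) concentrations in mg/L, from the text
    "PGI": (1.0, 70.0),
    "PFK": (1.0, 70.0),
    "FBA": (2.0, 86.1),
    "TPI": (1.66, 66.1),
}
STEP = 1.0     # mg/L
EPS = 1e-9     # tolerance for floating-point comparisons


def grid(low, high, step=STEP):
    """Values low, low + step, ... that do not exceed high."""
    n = int((high - low + EPS) // step)
    return [low + i * step for i in range(n + 1)]


def generate_balances():
    """Yield (PGI, PFK, FBA, TPI) tuples whose sum equals TOTAL."""
    fba_low, fba_high = BOUNDS["FBA"]
    for pgi in grid(*BOUNDS["PGI"]):
        for pfk in grid(*BOUNDS["PFK"]):
            for tpi in grid(*BOUNDS["TPI"]):
                fba = TOTAL - pgi - pfk - tpi  # closes the mass balance
                if fba_low - EPS <= fba <= fba_high + EPS:
                    yield (pgi, pfk, round(fba, 2), tpi)


if __name__ == "__main__":
    # Print the first generated balance as a quick check.
    print(next(generate_balances()))
```

Using a generator avoids holding every candidate balance in memory before the prediction step.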

#### Flux Prediction Using ANN

Newly generated enzyme concentrations were fed to the ANN model to predict the flux. The data consisted of flux values ranging from 0.74 to 12.9 μM/s. Since ANN is not good for extrapolation, these values limit the prediction to this range. Nevertheless, it is likely that new enzyme concentrations could provide higher flux. However, ANN prediction will remain in the glass ceiling space. Hence, we decided to explore this space with squeezed flux, i.e., the flux that lies in this particular space. Thus, for our study, fluxes > 12 μM/s predicted by ANN and the concentrations that obeyed the rule derived to obtain possible higher flux values from data classification were retained.
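The retention step can be sketched as a simple filter. The Python below is illustrative only: the predictor and the rule are toy stand-ins, not the trained ANN or the rule derived from the classification. Requiring both conditions to hold at once is an assumption about how the two criteria were combined.

```python
FLUX_THRESHOLD = 12.0  # uM/s, the glass-ceiling cut-off stated in the text


def select_candidates(balances, predict_flux, obeys_rule, threshold=FLUX_THRESHOLD):
    """Keep balances whose predicted flux exceeds the threshold and that obey the rule."""
    selected = []
    for balance in balances:
        flux = predict_flux(balance)
        if flux > threshold and obeys_rule(balance):
            selected.append((balance, flux))
    return selected


# Hypothetical stand-ins, for illustration only: a toy predictor and a toy rule.
def toy_predictor(balance):
    pgi, pfk, fba, tpi = balance
    return 0.2 * min(pgi, pfk, fba, tpi) + 10.0


def toy_rule(balance):
    pgi, pfk, fba, tpi = balance
    return pfk >= pgi


candidates = [(20.0, 30.0, 31.9, 20.0), (30.0, 20.0, 31.9, 20.0), (5.0, 40.0, 51.9, 5.0)]
for balance, flux in select_candidates(candidates, toy_predictor, toy_rule):
    print(balance, round(flux, 2))
```

Passing the predictor and the rule as functions keeps the filter independent of how either one is obtained.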

#### 2.2.3. Validation of Methodology

The artificial neural network-based methodology for flux prediction was validated in two steps. In the first step of validation, the available kinetic parameters from Fievet et al. [43] helped us to build the model and replicate the experimental conditions. In the second step, the methodology was experimentally validated.

#### Simulation of Upper Part of Glycolysis

In CellDesigner (ver4.4) [55,56], the kinetic model of the upper part of glycolysis was built using the kinetic parameters from Fievet et al. [43] and the parameters for cofactors chosen from the BRENDA [57] database (Table 1). The model was built to replicate the experimental condition with the Michaelis-Menten equation (Table 1). ATP is regenerated using the creatine kinase system. The hexokinase concentration was kept constant at 0.1 μM and flux was measured as NADH consumption, as catalyzed by 1 μM of G3PDH. The concentrations of PGI, PFK, FBA and TPI were varied according to the selected balance from Section 2.2.2. (i.e., with concentrations that provide a flux ≥ 12 μM/s as predicted by the ANN model). The concentrations were converted from mg/L to μM using the molecular weight as suggested by Fievet et al.
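Two of the calculations mentioned above, the conversion from mg/L to μM and a Michaelis-Menten rate, can be sketched in Python as follows. The irreversible single-substrate rate law is the textbook form and may be simpler than the rate laws in Table 1. The enzyme parameters in the example are hypothetical and are not taken from Fievet et al. or BRENDA.

```python
def mg_per_l_to_micromolar(conc_mg_per_l, mol_weight_da):
    """Convert a mass concentration (mg/L) to micromolar given a molecular weight (g/mol)."""
    return conc_mg_per_l / mol_weight_da * 1000.0


def michaelis_menten_rate(enzyme_um, substrate_um, kcat_per_s, km_um):
    """Irreversible Michaelis-Menten rate in uM/s: v = kcat * [E] * [S] / (Km + [S])."""
    return kcat_per_s * enzyme_um * substrate_um / (km_um + substrate_um)


# Hypothetical enzyme: 50 mg/L of a 50 kDa protein, kcat = 100 s^-1, Km = 200 uM.
enzyme_um = mg_per_l_to_micromolar(50.0, 50_000.0)
print(enzyme_um)
print(michaelis_menten_rate(enzyme_um, substrate_um=200.0, kcat_per_s=100.0, km_um=200.0))
```

With the substrate concentration equal to Km, the rate is half of the maximum rate kcat·[E], which gives a quick sanity check on any parameter set.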

The model was simulated for 120 s using COPASI [58] to measure NADH consumption. The slope of the NADH decay between 60 and 120 s was taken as the flux through the pathway. The 182 enzyme balances yielding a flux ≥ 15 μM/s in the in silico simulation were selected as the potential higher flux balances.
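The flux estimate from an NADH time course can be sketched as a least-squares slope over the 60 to 120 s window. The Python below is an illustration, not the COPASI workflow. The trace is synthetic, and taking the flux as the negated slope (so that consumption gives a positive value) is an assumption about the sign convention.

```python
def linear_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def flux_from_nadh(times_s, nadh_um, start_s=60.0, end_s=120.0):
    """Flux (uM/s) as the negated NADH slope within the [start_s, end_s] window."""
    window = [(t, c) for t, c in zip(times_s, nadh_um) if start_s <= t <= end_s]
    if len(window) < 2:
        raise ValueError("need at least two time points in the window")
    times, values = zip(*window)
    return -linear_slope(times, values)


# Synthetic trace for illustration: NADH falling linearly at 0.5 uM/s from 200 uM.
times_s = list(range(0, 121, 10))
nadh_um = [200.0 - 0.5 * t for t in times_s]
print(flux_from_nadh(times_s, nadh_um))
```

Restricting the fit to the later window excludes the initial transient, during which the intermediates have not yet reached a steady state.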

#### Experimental Validation

The upper part of glycolysis was reconstructed as described in Fievet et al. [43] (Figure 1). The in vitro system consisted of varied concentrations of PGI, PFK, FBA and TPI. The HK and G3PDH were kept constant and creatine kinases were used to regenerate ATP in the system. The NADH decay was measured as flux through the pathway. The slope of the linear NADH decay was used to calculate the flux in μM/s.

#### 2.2.4. The Workflow of the Proposed Methodology

Based on the data listed in Fievet et al. [43], the ANN model was built to predict the flux using enzyme balances, and the rule for enzyme balance for higher flux was obtained by data classification. The fluxes for newly generated enzyme balances were predicted using the ANN model. The balances with a flux value > 12 μM/s (balances from the glass-ceiling) and the balances obeying the derived rule for higher flux were selected as potential higher flux balances. These selected balances were validated using the kinetic model and by experiments. The methodology that followed for exploring the glass-ceiling of ANN (GC-ANN) is represented diagrammatically in Figure 2.


**Figure 1.** CellDesigner diagram for the upper part of glycolysis, which replicates the experimental conditions described by Fievet et al. [41]. HK: hexokinase, PGI: glucose-6-phosphate isomerase, PFK: phosphofructokinase, FBA: aldolase, TPI: triose-phosphate isomerase, G3PDH: glycerol-3-phosphate dehydrogenase, CK: creatine kinase, re: reaction.

**Figure 2.** The methodology followed to obtain the new flux values from the generated enzyme concentration.

#### **3. Application and Results**

#### *3.1. Preparation*

#### 3.1.1. Data Dimension Reduction

In our study, PCA did not provide much information regarding the data. The total four-enzyme concentration was constant in the system, which reduced the degree of freedom to limit the enzyme concentrations to three. If the total enzyme concentration is not constant or the dataset presents a high degree of freedom, PCA will be more useful for obtaining uncorrelated variables: this is why we mentioned PCA as a useful tool in the framework of this methodology.

#### 3.1.2. Visualization of Data

After the PCA, data was visualized in 3D (Figure 3). We could observe on the plot that the higher flux (red dots) was quite distinct. This is a good indication that a quantitative method could be applied and should provide good results. Indeed, this is verified in the section "Flux prediction using ANN" (Figure 3). In this methodology, we were exploring the space around those higher flux concentrations to obtain new concentrations of PGI, PFK and TPI.

**Figure 3.** Three-dimensional visualization of Fievet et al. [43] enzyme balances after PCA (Dim1: 43.55%, Dim2: 23.78% and Dim3: 17.56%). The change from blue to red indicates the gradient from low to high fluxes, respectively. The standard deviation of the experimental flux is represented on the third dimension.

#### 3.1.3. Enzyme Concentration Rule

Decision tree analysis was performed using the R package rpart after dividing the data into five groups, which provided the best compromise on the gain in inter-class inertia. The five groups were determined using k-means clustering.

Figure 4 shows the classification of the data: the percentage at each node is the share of the data belonging to that branch of the tree, and the fractions give the distribution of those data across the five groups. For example, 89% of the data had an FBA concentration > 11 and were distributed across the five groups in fractions of 0.01, 0.09, 0.17, 0.29 and 0.44 (Figure 4, node 3).

**Figure 4.** Decision tree analysis for Fievet et al. [43] data to obtain the rule for higher flux (≥ 12 μM/s). The data are classified into five groups, i.e., flux values of 0.728–3.17, 3.17–5.6, 5.6–8.04, 8.04–10.5 and 10.5–12.9 μM/s.

Among the different methods of discriminant analysis studied, rpart performed the best with an approximate error rate of 0.1. The different methods studied were LDA (linear discriminant analysis), QDA (quadratic discriminant analysis), SKNN (simple k nearest neighbors), RDA (regularized discriminant analysis) and naïve Bayes (under R package). For the SKNN method, the error rate was low but it led to an over-classification (data not shown). Figure 5 represents the discriminant analysis for the classification of data from Fievet et al. [43] using the rpart [53] method from R.

**Figure 5.** Discriminant analysis for the classification of data from Fievet et al. [43] using the rpart [53] method from R. Color code according to the feature space of data, where group 1 (flux: 0.728–3.17 μM/s) is shown in light blue, group 2 (flux: 3.17–5.6 μM/s) in dark blue, group 3 (flux: 5.6–8.04 μM/s) in white, group 4 (flux: 8.04–10.5 μM/s) in light pink and group 5 (flux: 10.5–12.9 μM/s) in dark pink. Numbers in black represent the data classified to the same group, and numbers in red represent data misclassified into the other groups.

After using the decision tree (Figure 4) and discriminant analysis (Figure 5), the following rule was derived to obtain a flux ≥ 12 μM/s:

PGI < 11; 10 < PFK < 16; TPI < 18; 59 < FBA (mg/L), which corresponds to PGI < 15.07 U/mL; 0.7 U/mL < PFK < 1.12 U/mL; TPI < 264.42 U/mL; 2.48 U/mL < FBA.

The conversion from mg/L to U/mL is given in Methods S1 in the Supplementary Materials. The derived rule is applied to select the best concentrations of the enzymes PFK, PGI, TPI and FBA to obtain a high flux through the pathway.
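The derived rule can be written as a simple predicate over one enzyme balance. The sketch below is illustrative only: the function name is hypothetical, all inequalities are taken as strict, and the FBA bound is assumed to read FBA > 59 mg/L, which is the direction restated in the Discussion (59 < FBA) and the one consistent with the FBA values of the balances quoted in this article.

```python
def obeys_high_flux_rule(pgi: float, pfk: float, fba: float, tpi: float) -> bool:
    """Return True if a balance (all concentrations in mg/L) satisfies the rule.

    Rule (mg/L): PGI < 11; 10 < PFK < 16; TPI < 18; FBA > 59.
    """
    return pgi < 11 and 10 < pfk < 16 and tpi < 18 and fba > 59


# Balance quoted in Section 3.4 of the article (mg/L).
print(obeys_high_flux_rule(pgi=2, pfk=12, fba=81.24, tpi=4.66))

# Hypothetical counter-example (not from the study): PFK is too low.
print(obeys_high_flux_rule(pgi=2, pfk=5, fba=81.24, tpi=4.66))
```

Because the inequalities are strict, a balance lying exactly on a bound (for example PFK = 16 mg/L) is rejected by this predicate; whether the original analysis treated the bounds as inclusive is not stated.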

#### 3.1.4. Neural Network Model

ANN is a training-based method, so the structure of the neural network needs to be chosen carefully: it depends on the number of inputs, the sampling in the training dataset and the outputs. The structure was determined based on our previous study [44]. The neuralnet package from the R statistical tool was used with the logistic activation function and 13 hidden units in a single layer. The resulting ANN model has an RMSE value of 0.84 and an R<sup>2</sup> value of 0.93 under leave-one-out cross-validation [44].
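For readers unfamiliar with the two quality measures, the sketch below shows how RMSE and R<sup>2</sup> can be computed from leave-one-out predictions. It is not the authors' R code: the `fit`/`predict` interface is a hypothetical convention, R<sup>2</sup> is assumed to be the coefficient of determination (1 − SS<sub>res</sub>/SS<sub>tot</sub>), and the toy model (predicting the training mean) merely stands in for the ANN.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def leave_one_out_predictions(inputs, targets, fit, predict):
    """Predict each sample with a model fitted on all the other samples."""
    held_out = []
    for i in range(len(inputs)):
        train_x = inputs[:i] + inputs[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        model = fit(train_x, train_y)
        held_out.append(predict(model, inputs[i]))
    return held_out


# Toy stand-in model (NOT the ANN of the study): always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


# Hypothetical toy data, used only to exercise the functions.
xs = [[1.0], [2.0], [3.0], [4.0]]
ys = [2.0, 4.0, 6.0, 8.0]
loo = leave_one_out_predictions(xs, ys, fit_mean, predict_mean)
print(rmse(ys, loo), r_squared(ys, loo))
```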

#### *3.2. Execution*

#### 3.2.1. Generation of New Enzyme Concentrations

The new concentrations of PFK, PGI, TPI and FBA were generated as explained in the methodology section. These new balances were used for further analysis to predict the flux.

#### 3.2.2. Flux Prediction Using ANN

The new balances were fed into the previously built neural network to predict the flux. The ANN-predicted flux for the newly generated data was visualized in three dimensions (Figure 6).

**Figure 6.** Three-dimensional visualization of flux predicted by an artificial neural network (ANN) for newly generated enzyme concentrations. The color gradient is from the lowest (blue) to the highest (red) predicted flux.

As expected, the new prediction remained in the box (see the maximum value of the color gradient bar in Figure 6) since ANN is a training-based method that depends on the training dataset. The high predicted values bump into the "glass ceiling". Our hypothesis was that even though they remain in the roof of the "glass ceiling", the experimental values could be higher than the predicted ones. By exploring this space, we could obtain new balances with higher flux values.

In order to explore the "glass ceiling" space, we developed this new methodology (named GC-ANN) using the artificial neural network to predict the flux through the upper part of glycolysis for given enzyme concentrations. In this study, we showed (see below in the section validation) that by careful selection of enzyme concentrations from the "glass-ceiling" space, it is possible to obtain higher flux values "out-of-the-box".

Among all the enzyme concentrations generated between the minimum and maximum of the experimental data, only balances with a flux above 12 μM/s predicted by the neural network that also obeyed the enzyme concentration rule were selected as potential high-flux balances (335 balances in total, a balance being defined as a mixture of the four enzymes PGI, PFK, FBA and TPI).
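The selection step can be sketched as a grid enumeration followed by two filters. Everything numeric in the example is an assumption made for illustration: the grid ranges, the fixed total of 100 mg/L and the stand-in predictor are hypothetical (the real predictor is the trained ANN), and the FBA bound is taken as FBA > 59 mg/L as stated in the Discussion. The sketch therefore does not reproduce the 335 balances of the study.

```python
from itertools import product


def generate_balances(pgi_range, pfk_range, tpi_range, total):
    """Yield (pgi, pfk, fba, tpi) on a 1 mg/L grid with a fixed total concentration."""
    for pgi, pfk, tpi in product(pgi_range, pfk_range, tpi_range):
        fba = total - pgi - pfk - tpi  # FBA takes up the remainder of the total
        if fba > 0:
            yield (pgi, pfk, fba, tpi)


def obeys_rule(pgi, pfk, fba, tpi):
    """Enzyme concentration rule in mg/L (strict inequalities assumed)."""
    return pgi < 11 and 10 < pfk < 16 and tpi < 18 and fba > 59


def select_candidates(balances, predict_flux, threshold=12.0):
    """Keep balances whose predicted flux exceeds the threshold and that obey the rule."""
    return [b for b in balances if predict_flux(*b) > threshold and obeys_rule(*b)]


# Stand-in predictor (hypothetical, NOT the trained ANN of the study).
def toy_predictor(pgi, pfk, fba, tpi):
    return 12.5 if pfk >= 12 else 8.0


# Hypothetical grid: 1-20 mg/L for PGI, PFK and TPI, total fixed at 100 mg/L.
grid = generate_balances(range(1, 21), range(1, 21), range(1, 21), total=100)
selected = select_candidates(grid, toy_predictor)
print(len(selected), "candidate balances")
```

Fixing the total concentration leaves three free variables, which is why only three enzymes are enumerated and the fourth is obtained by difference.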

#### *3.3. Validation*

The methodology for exploring the glass-ceiling using ANN (GC-ANN) was validated in two steps: first using the kinetic model and second, in vitro.

#### 3.3.1. Simulation of Upper Part of Glycolysis

The kinetic model was built using CellDesigner [55,56] (Figure 1) and validated with COPASI [58] using the 121 concentrations from Fievet et al. [43]. The model has an RMSE value of 1.58 and an R<sup>2</sup> of 0.84 in a cross-validation procedure, compared to the experimentally determined flux (Figure 7). Figure 7 indicates that the kinetic model was adequate and could be used for the validation of the new approach. The highest flux predicted by the kinetic model of the reconstituted upper part of glycolysis was 14.93 μM/s, whereas the highest experimentally observed flux was 12.9 μM/s. The flux predicted by ANN for new enzyme balances from the section "Flux prediction using ANN" was compared with the simulated flux for each enzyme (Figure 8). Figure 8 shows that the balances that were predicted with higher flux through GC-ANN were also estimated to have higher flux using the kinetic model. This validates the good quality of the kinetic model.

**Figure 7.** Relationship between the experimental flux (JFievet) estimated by Fievet et al. [43] and the flux estimated by the kinetic model in COPASI [58].

#### 3.3.2. Experimental Validation of the Methodology

To validate this new approach to exploring the glass-ceiling (GC-ANN), the new enzyme balances generated were assayed in vitro. For the control experiment, 10 enzyme balances from previously used Fievet et al. [43] enzyme concentrations (Figure 9) were selected (Figure 10; Table S1). These selected balances have a correlation R<sup>2</sup> of 0.99 and an RMSE of 0.17 between the predicted flux from our kinetic model and the experimental flux assessed by Fievet et al. [43]. Figure 9 shows that balances selected for the control study are an appropriate choice. Two of these selected Fievet's balances were tested experimentally. The resulting fluxes for these two balances were 0.59 (±0.10) μM/s and 8.03 (±0.56) μM/s (see Table S2 in the Supporting Information) while Fievet et al. had determined 1.22 (±0.08) μM/s and 11.05 (±0.29) μM/s, respectively.

**Figure 8.** The relationship between flux values predicted by ANN vs COPASI for newly generated enzyme balances. The enzymes considered are: upper, left (PGI), right (PFK), lower left (TPI), right (FBA). The color gradient from blue to red represents the particular enzyme concentration from low to high, respectively.

**Figure 9.** Correlation between Fievet et al. [43] experimental flux and COPASI-predicted flux. The balances corresponding to these flux values were selected as experimental controls.

**Figure 10.** Comparison between glass ceiling ANN (GC-ANN) predicted flux and simulated flux. The enzyme balances corresponding to these flux values were selected for experimental validation of the methodology.

From the GC-ANN approach, 31 new balances were selected (Figure 10; Table S1) for experimental validation. The flux values associated with the selected balances had a coefficient of determination R<sup>2</sup> of 0.44 between GC-ANN predictions and simulated flux. This low R<sup>2</sup> between the ANN and COPASI predictions is due to the glass-ceiling effect: because the ANN cannot produce "out-of-the-box" values, an underestimation of the flux was expected.

#### Enzyme Assays for Measurement of Kinetic Parameters

HK activity was assessed using glucose-6-phosphate dehydrogenase (G6PDH) in a coupled reaction. The substrate glucose was converted to 6-phosphogluconate, and the formation of NADPH was followed spectrophotometrically at 340 nm (Figure 11A).

We assessed the activities of PGI, PFK and FBA using a coupled NADH assay applied to the upper part of glycolysis (Figure 11B). To determine the activity of PGI, we started the assay with glucose-6-P (Figure 11B, reaction 1); for the measurement of the activities of PFK and FBA, fructose 6-P and fructose 1,6-bisP were used as the substrates (Figure 11B, reactions 2 and 3). All reactions were monitored by reading the absorbance of NADH at 340 nm and the initial rates were used to calculate the Michaelis constant Km and the maximal velocity Vmax. The kinetic parameters Km for HK, PGI, PFK and FBA corresponded well to the values listed by the manufacturer (Sigma) or by the Enzyme Database Brenda (Table 2). Nevertheless, some enzymes, particularly HK and FBA, showed lower specific activity compared to the Sigma reference. The loss of activity could have occurred during delivery and/or storage of the enzymes or could be attributed to a different enzyme assay.

**Figure 11.** (**A**) Coupled HK/G6PDH assay to assess the HK activity. (**B**) Coupled NADH assay to assess the activities of PGI, PFK, and FBA. The individual reactions were started with substrates indicated by the numbers in circles.

**Table 2.** Summary of the kinetic parameters of HK, PGI, PFK, and FBA. The experimentally assessed values were deduced from Lineweaver-Burk and Eadie-Hofstee plots. Reference values for Km and Vmax from Brenda and Sigma's product data sheets are indicated, respectively. Lot No., lot number; sp. act., specific activity.


\* measured in this study, n.d.: not determined in this study.

#### Flux Determinations

The reaction mixtures for the measurements of the flux through the upper part of glycolysis were based on Fievet et al. [43] (Table 3). In contrast to Fievet et al., we based our mixtures on relative enzyme activities rather than enzyme concentrations. Calculations are explained in Method S1, in the Supplementary Materials.


**Table 3.** Comparison of ANN predicted flux (JANN in μM/s), simulated flux (JCopasi in μM/s) and experimentally assessed flux (JExp in μM/s). The four enzymes PGI, PFK, FBA and TPI were used at the indicated concentrations for the experimental assessment of the flux with mean deviation (M.D) of triplicates.

Out of 41 selected balances, 31 newly predicted enzyme concentrations were tested experimentally to estimate flux. All 31 new enzyme balances experimentally tested were estimated with flux values greater than 12 μM/s (Table 3). Table 3 shows that 28 out of 31, i.e., 90.3%, had a value above 15.0 μM/s, as expected according to the kinetic model. Moreover, 31 out of 31, i.e., 100%, had a value above 12.0 μM/s, as expected according to our methodology.

#### *3.4. Application: Selection of Cost-Efficient Enzyme Balances*

For industrial-scale production, the selection of the best enzyme concentrations in terms of cost is essential. Therefore, we estimated the cost per μM of NADH consumed per second for all the enzyme balances generated (Figure 12) and for those selected balances from ANN prediction that obey the enzyme concentration rule (flux greater than 12 μM/s), i.e., 335 balances from the section "Flux prediction using ANN" (Figure 13). The calculations are described in Method S2 in the Supplementary Materials. Calculating the cost of each reaction when selecting enzyme balances could help to reduce cost. Figures 12 and 13 show the variation in cost according to each balance and its flux and allow the selection of balances with higher flux at low cost.

**Figure 12.** 3D-representation of cost estimated for all the enzyme concentrations generated. The color gradient is according to the cost required for each balance: blue is the lowest and red is the highest cost for a selected balance of the four enzymes PGI, PFK, FBA and TPI.

**Figure 13.** 3D-representation of the cost estimated for the enzyme concentration that obeys the rule obtained for higher flux values. The color gradient is according to the cost required for each balance, blue is the lowest and red is the highest cost for a selected balance of the four enzymes PGI, PFK, FBA and TPI.

As an example: the enzyme balance (in mg/L) with PGI = 2, PFK = 12, FBA = 81.24 and TPI = 4.66 (index 13 in Table S6 of the Supporting Information) could give a flux of 12.1 μM/s with a cost of 3.79 EUR.

#### **4. Discussion**

Traditionally, chemical molecules are synthesized by the chemical reaction of petroleum-based products. Due to the depletion of petroleum products, in-vivo biosynthesis has gained a lot of attention. Limitations of the cellular production system, such as low productivity, by-product formation, and low host cell tolerance to toxins moved the focus towards development of cell-free systems. Compared to cell systems, cell-free systems have high productivity and high toxin tolerance [22]. The selection of optimal enzyme concentrations for maximal productivity is a crucial step for industrial scale, cell-free production of biomolecules. The modeling of metabolic pathways helps to study and predict the behavior of the biological system. Constraint-based methods facilitate the understanding of the system but do not provide information about the concentration of the individual metabolites. In contrast, kinetic models provide information about individual metabolite concentrations but require kinetic parameters of enzymes, which are tedious and expensive to determine [32]. Design of experiment (DOE) is a systematic approach to optimize the conditions for biomolecule production in the field of biotechnology [63]. In DOE, multiple variables are studied to find the correlation between the variables and the final outcome. The main objective of DOE is to reduce the number of experiments, time and cost; our study has the same objective. The benefit of GC-ANN is that the objective optimum can be "out-of-the-box" but will nevertheless be found without additional experiments.

#### *4.1. GC-ANN Approach Could be Used to Predict "Out-of-the-Box" Values*

In this study, a new methodology, GC-ANN, to select the optimum enzyme balances for industrial biotechnology is devised. This approach aims to see beyond the "glass ceiling", using an artificial neural network and different statistical methods like PCA and data classification. The method was designed and validated for the upper part of glycolysis but could be applied to any other natural or reconstituted biosynthesis pathway.

The workflow of the methodology used in the upper part of glycolysis is summarized in Figure 2. In the first step, for selecting the optimum concentrations of the four relevant enzymes PGI, PFK, FBA and TPI, a rule was devised for high flux values (supported by Figures 3–5). We generated all possible balances using a step of 1 mg/L in terms of variation for each enzyme concentration. The balances newly generated in the present study have higher and lower limits than those in Fievet et al. [43]. These new enzyme balances were used to predict the flux through the upper glycolysis using ANN, and the predicted fluxes were depicted in 3D representation (Figure 6); we observed a zone (Figure 6, brown zone) with predicted flux > 12 μM/s. To explore this space in order to obtain even higher fluxes, the high-flux-rule was applied, i.e., 10 < PFK < 16; PGI < 11; TPI < 18; 59 < FBA (in mg/L), and 335 enzyme balances were scrutinized. The main idea behind our approach is based on the fact that: *i.* ANN is known to be a good tool for predicting class and/or quantitative values inside the box (i.e., prediction close to training data), *ii*. the brown region in Figure 6 contains values that are all very close to 12 μM/s (from 12 to 12.9 μM/s) because ANN is not useful for extrapolation and new predictions remain inside the box; and *iii*. we postulate that among these flux values, in fact, some could be higher than predicted.

In the second step, to validate our hypothesis we conducted in silico and in vitro experiments.

#### 4.1.1. In-Silico Validation

First, because kinetic parameters were available and to avoid unnecessary expenses linked to in vitro assays, we built a kinetic model. Figure 7 shows good agreement (R<sup>2</sup> = 0.84) between the fluxes predicted by the kinetic model and all the flux values experimentally assessed by Fievet et al. [41]. Then, we selected 10 balances associated with experimental values between 0.74 and 12.9 μM/s of Fievet's data for the benchmark study. Figure 9 shows excellent correlation, with an R<sup>2</sup> of 0.99 and an RMSE of 0.17, between the predicted flux from our kinetic model and the experimental flux assessed by Fievet et al. Taken together, these first results were a good validation of our kinetic model.

Second, we intended to validate our in vitro assay by reproducing the results obtained by Fievet et al. [43]. We decided to carry out in vitro experiments for the balances that had a good correlation between simulated and experimental flux. The experimentally determined fluxes using the balances selected from the Fievet data were lower than those previously determined by these authors (Table S3). Nevertheless, the fold-increase was comparable (approximately 9-fold, this study vs. 13-fold, Fievet et al. [43]). The deviation of the absolute flux values could be attributed to experimental settings, i.e., NADH depletion assay in cuvettes at 390 nm (Fievet et al. [43]) vs. in 96 well plates at 365 nm, in this study; or to differences in the assays performed to measure kinetic parameters of the individual enzymes.

Finally, as our kinetic model has been validated, we used it to conduct the first verification, in silico, of our hypothesis. For the 31 new balances selected according to the methodology described above (Section 3.3.2), Figure 10 shows how the flux values predicted by GC-ANN compare with the values simulated by the kinetic model. All the balances selected from the brown zone (Figure 6) indeed had simulated fluxes above 12.0 μM/s. Moreover, according to the kinetic model, the flux should be above 15.0 μM/s. This is a first, in silico, validation of our hypothesis, i.e., the ANN-based approach could be used to predict "out-of-the-box" values.

At this point, we had to keep in mind that this preliminary verification was conducted because the kinetic model was possible to establish, but this step is not mandatory in the proposed methodology. Indeed, the 31 balances were chosen first, based only on the outcome of GC-ANN methodology that combines ANN and different statistical methods like PCA and data classification.

#### 4.1.2. In Vitro Validation

The 31 new enzyme balances were assessed experimentally. Table 3 proves our hypothesis: with careful selection of enzyme concentrations from the glass ceiling, it is possible to obtain higher flux values. For the 27 best enzyme balances, the improvement of flux ranged from 20% (observed flux: 15.4, original flux: 12.9) to 63% (observed flux: 21.0, original flux: 12.9). This clearly demonstrates that exploring the predicted values, which hit the "glass ceiling" using the GC-ANN approach is a good way to select the optimum enzyme concentration.
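The improvement figures quoted above are relative changes with respect to the highest flux of the original dataset (12.9 μM/s). A minimal sketch of the arithmetic, with a hypothetical helper name, is given below; the lower bound evaluates to about 19.4%, which the text rounds to 20%.

```python
def percent_improvement(observed: float, reference: float) -> float:
    """Relative increase of observed over reference, in percent."""
    return (observed - reference) / reference * 100.0


reference_flux = 12.9  # highest flux in the original dataset, μM/s
for observed_flux in (15.4, 21.0):  # extreme values quoted in the text, μM/s
    gain = percent_improvement(observed_flux, reference_flux)
    print(f"{observed_flux} μM/s: {gain:.1f}% above {reference_flux} μM/s")
```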

Since artificial neural networks do not require much information regarding experimental conditions and, particularly in our case, kinetic parameters that are hard to obtain, they are easy to apply in different fields of science. Our GC-ANN approach could be applied to any pathway provided the experimental data are available. Currently, we are looking for other experimental datasets to which this methodology can be applied.

#### *4.2. The Proposed Methodology Is Cost-Efficient*

From an industrial perspective, production costs per quantity of product are very important. Choosing an enzyme balance that results in maximum flux at a very low cost per given quantity of product is essential. The ANN-based methodology makes it easy to estimate the total cost. The approximate price for each reaction was calculated using the details provided by the manufacturer, such as specific activity and units of enzyme in the sample. We could calculate the approximate cost required for 1 μM of product formation per second through the pathway. This would help us to decide which is the most suitable enzyme balance for maximum flux in terms of cost minimization, which is important for industrial-scale production. For example, one balance giving a flux of 12.1 μM/s costs approximately 6.28 EUR, whereas the same flux value could be achieved with another balance for 3.79 EUR (about 40% cheaper). Figure 12 clearly shows how costs vary. Details are provided in Table S6 and Figure S1. Among the enzyme combinations selected for the validation of our methodology, PGI = 3, PFK = 16, FBA = 80.24 and TPI = 2.66 (mg/L) had an estimated flux value of 20.6 μM/s with the lowest cost of 0.197 EUR per μM of NADH consumed per second using GC-ANN methodology for the selection of enzyme balances (Figure S2). In contrast, the lowest price in Fievet et al. [43] with the selected balance PGI = 7, PFK = 12, FBA = 66.23 and TPI = 16.66 (mg/L) was 0.349 EUR per μM/s with an experimentally estimated flux value of 12.35 μM/s (Figure S2). This method, therefore, makes it possible to identify the production costs of 1 μM of product, which ranged from 0.197 to 6.28 EUR, in order to choose the best compromise between the cost and speed of the reaction.
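The cost comparison above can be illustrated with a short sketch that picks the cheapest balance above a required flux and reports the relative saving. Only the numbers quoted in this section are used; the data structure, the function names and the 12 μM/s threshold are illustrative assumptions, and the underlying price calculation (Method S2) is not reproduced here.

```python
def relative_saving(reference_cost: float, new_cost: float) -> float:
    """Saving of new_cost relative to reference_cost, in percent."""
    return (reference_cost - new_cost) / reference_cost * 100.0


def cheapest_balance(candidates, min_flux):
    """Return the candidate with the lowest cost per unit flux above min_flux."""
    eligible = [c for c in candidates if c["flux"] >= min_flux]
    return min(eligible, key=lambda c: c["cost_per_flux"])


# Values quoted in the text: balances as (PGI, PFK, FBA, TPI) in mg/L,
# flux in μM/s and cost in EUR per μM of NADH consumed per second.
candidates = [
    {"label": "GC-ANN", "balance": (3, 16, 80.24, 2.66),
     "flux": 20.6, "cost_per_flux": 0.197},
    {"label": "Fievet et al.", "balance": (7, 12, 66.23, 16.66),
     "flux": 12.35, "cost_per_flux": 0.349},
]

best = cheapest_balance(candidates, min_flux=12.0)
print(best["label"], best["cost_per_flux"])

# Two balances quoted with the same flux of 12.1 μM/s: 6.28 EUR versus 3.79 EUR.
print(f"saving: {relative_saving(6.28, 3.79):.1f}%")
```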

Lastly and interestingly, the validated kinetic model makes it possible to generate a huge amount of data so as to feed our ANN-based model with more flux values from the newly predicted enzyme balances. This should be explored in future studies.

#### **5. Materials and Methods**

All enzymes as well as phosphocreatine, glucose-6-phosphate, fructose-6-phosphate and fructose-1,6-bisphosphate were purchased from Sigma-Aldrich (St. Louis, MO, USA). D-Glucose, ATP, NADH, and NADP were obtained from Carl Roth GmbH (Karlsruhe, Germany). Hexokinase (HK), phosphoglucoisomerase (PGI), triose-phosphate isomerase (TPI), and glucose-6-phosphate dehydrogenase (G6PDH) originated from baker's yeast; fructose biphosphate aldolase (FBA), glycerol-3-phosphate dehydrogenase (G3PDH), and creatine kinase (CK) were obtained from rabbit muscle and phosphofructokinase (PFK) originated from *Bacillus stearothermophilus*. The enzymes were obtained as lyophilized powder except for PGI and TPI, which were ammonium sulphate suspensions. Detailed information on the enzymes used is provided in Table S1 of Supplementary Materials.

#### *5.1. Determination of Protein Concentration*

Protein concentrations were determined using the Bradford protein assay [64] from Bio-Rad Laboratories (Hercules, CA, USA). A 10 μL aliquot of each protein solution was mixed with 200 μL of Bio-Rad Protein Assay Dye Reagent and incubated for 5 min at room temperature, and the absorbance was measured spectrophotometrically at 595 nm. A dilution series of 0.06–0.5 mg/mL BSA (Carl Roth GmbH) was used for calibration.

#### *5.2. Enzyme Assays for the Determination of Kinetic Parameters*

Enzyme assays were performed in 96-well UV-STAR® microplates (Greiner Bio-One GmbH, Kremsmünster, Austria) in a total volume of 100 μL at 25 °C. The reaction buffer contained 50 mM PIPES (pH 7.5), 100 mM KCl, and 5 mM magnesium acetate. The cofactors for the reactions were 1 mM ATP and 1 mM NADH or NADP.

HK activity was measured with 0.05 U HK, 2.5 U G6PDH, and glucose concentrations from 10 to 0.01 mM. PGI activity was measured with 0.02–0.01 U PGI, 1–0.5 U PFK, 0.5 U FBA, 2 U G3PDH, 5 U TPI, and glucose 6-phosphate concentrations ranging from 30 to 0.03 mM. PFK activity was measured with 0.02 U PFK, 0.5 U FBA, 2 U G3PDH, 5 U TPI, and fructose 6-phosphate concentrations from 10 to 0.01 mM. FBA activity was measured with 0.01–0.05 U FBA, 2 U G3PDH, 5 U TPI, and fructose 1,6-bisphosphate concentrations from 10 to 0.01 mM. All reactions were monitored by recording the absorption at a wavelength of 340 nm (molar extinction coefficient ε<sub>340 nm, 25 °C</sub> = 6.22 L mmol<sup>−1</sup> cm<sup>−1</sup>). For calculation of the kinetic parameters Vmax, Km, and kcat, we used Lineweaver-Burk as well as Eadie-Hofstee representations.
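The two linearizations used to extract Km and Vmax can be sketched as follows. The Lineweaver-Burk form is 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, and the Eadie-Hofstee form is v = Vmax − Km (v/[S]). The code is a minimal illustration, not the analysis script of the study: function names are hypothetical, and the data are synthetic, noise-free rates generated from assumed values (Km = 0.5 mM, Vmax = 2.0 in arbitrary units) over the 0.01–10 mM substrate range mentioned above.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def lineweaver_burk(substrate, rates):
    """Return (Km, Vmax) from a fit of 1/v against 1/[S]."""
    slope, intercept = fit_line([1.0 / s for s in substrate],
                                [1.0 / v for v in rates])
    vmax = 1.0 / intercept      # intercept = 1/Vmax
    return slope * vmax, vmax   # slope = Km/Vmax


def eadie_hofstee(substrate, rates):
    """Return (Km, Vmax) from a fit of v against v/[S]."""
    slope, intercept = fit_line([v / s for s, v in zip(substrate, rates)], rates)
    return -slope, intercept    # slope = -Km, intercept = Vmax


# Synthetic, noise-free Michaelis-Menten data (assumed Km = 0.5 mM, Vmax = 2.0).
substrate_mm = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
rates = [2.0 * s / (0.5 + s) for s in substrate_mm]

km_lb, vmax_lb = lineweaver_burk(substrate_mm, rates)
km_eh, vmax_eh = eadie_hofstee(substrate_mm, rates)
print(f"Lineweaver-Burk: Km = {km_lb:.3f} mM, Vmax = {vmax_lb:.3f}")
print(f"Eadie-Hofstee:   Km = {km_eh:.3f} mM, Vmax = {vmax_eh:.3f}")
```

With real, noisy measurements the two representations weight the data differently (the double-reciprocal plot amplifies errors at low substrate concentrations), which is one reason for comparing both.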

#### *5.3. Flux Measurements*

The total reaction volume of 100 μL contained fixed concentrations of 3 mM NADH, 20 mM phosphocreatine, 1 μM CK, 0.1 μM HK, and 1 μM G3PDH. The concentrations of PGI, PFK, FBA, and TPI were varied as indicated (Section 3.3.2). The reactions were started with 1 mM ATP and 100 mM glucose. Blank reactions contained all ingredients except ATP and glucose. Each condition was measured in triplicate. The NADH decay was monitored every 3 s at 365 nm using a SynergyMxSMATBLD(+) Gen5 SW plate reader (SZABO-SCANDIC, Vienna, Austria). The slope of NADH decay was measured as the flux through the pathway (molar extinction coefficient ε<sub>365 nm, 25 °C</sub> = 3.4 L mmol<sup>−1</sup> cm<sup>−1</sup>).
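Converting an absorbance slope into a flux relies on the Beer-Lambert law, A = ε·c·l. The sketch below assumes this conversion; the function name is hypothetical, and both the optical path length (0.3 cm) and the absorbance slope are illustrative values that are not reported in the text. Only the extinction coefficient at 365 nm is taken from the description above.

```python
def absorbance_slope_to_flux(slope_abs_per_s: float,
                             epsilon_l_per_mmol_cm: float,
                             path_cm: float) -> float:
    """Convert an NADH absorbance slope (1/s) into a flux in μM/s.

    Beer-Lambert: A = epsilon * c * l, with c in mmol/L (mM).
    A decreasing absorbance (negative slope) gives a positive flux.
    """
    if epsilon_l_per_mmol_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    mm_per_s = -slope_abs_per_s / (epsilon_l_per_mmol_cm * path_cm)
    return mm_per_s * 1000.0  # mM/s -> μM/s


EPSILON_365 = 3.4   # L mmol^-1 cm^-1 at 365 nm, as given in the text
PATH_CM = 0.3       # assumed path length for 100 μL in a 96-well plate

# Hypothetical absorbance slope (not a measured value from the study).
flux_um_per_s = absorbance_slope_to_flux(-0.0153, EPSILON_365, PATH_CM)
print(f"flux: {flux_um_per_s:.2f} μM/s")
```

In practice the path length of a microplate well depends on the fill volume and plate geometry, so it would need to be measured or corrected for by the reader rather than assumed.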

#### **6. Conclusions**

The selection of enzymes is an important step in the production of biomolecules. Methods based on homology are widely used to select the best performing enzymes. In addition, the selection of optimum enzyme balances is also crucial. Most methods use kinetic information for concentration selection via modeling. However, the determination of kinetic parameters is not always easy; therefore, developing new methodologies for selecting the optimum enzyme balances is of great interest.

In this study, we developed a new approach, GC-ANN, which uses an artificial neural network along with different statistical methods (PCA and data classification) to select enzyme balances that improve the flux as well as the costs. The selected balances might not be the balances with the highest flux, but they would be among the best. This approach allows cost-efficient selection of enzyme balances using a small existing dataset, and it opens the door for rapid optimization of cell-free systems in an industrial environment.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4344/10/3/291/s1, Figure S1: The cost predicted (in EUR) for the four-enzyme concentration (PGI, PFK, FBA, and TPI) selected for experimental validation. The blue is lowest, to highest in red; Figure S2: The cost predicted (in EUR) for the four-enzyme concentration (PGI, PFK, FBA, and TPI) selected by Fievet et al. (2006). The blue is lowest, to highest in red; Table S1: Enzymes used in this study for the upper part of glycolysis. All enzymes were from Sigma; Table S2: The measured enzyme activities for the enzymes involved in the upper part of glycolysis (see also Table 2 in the main text); Table S3: The enzyme concentrations (mg/L) predicted from ANN and in-silico modeling to have higher flux values. For the experimental validation, we used relative concentrations of enzymes obtained as explained in Method S1; Table S4: Specification of enzymes used for the calculation of cost for the preparatory stage of glycolysis from sigma. Specific activities are calculated by Fievet et al; Table S5: Comparison of flux predicted between Fievet et al. selected concentration (JFievet) and new estimation during current work (Jobs); Table S6: The calculated price for the μM of NADH consumed per second by the enzyme concentration selected for the experiment; Methods S1: concentration based on relative activity; Method S2: Cost Calculation.

**Author Contributions:** F.C., C.D. and P.C. designed the method. A.A.N., P.C., X.F.C., N.F., M.D., B.W., A.V., B.O., C.D., B.G.-P. and F.C. participated in the design of the study and performed the analysis. A.A.N. and M.D. wrote algorithms. A.A.N., P.C., X.F.C., C.D. and F.C. wrote and corrected the manuscript. All authors read and approved the final version of the manuscript.

**Funding:** AAN is supported by a PhD grant from the Region Reunion and European Union (FEDER) under European operational program INTERRG V-2014-2020, file number 20161449, tiers 234273. We gratefully acknowledge support from: *i.* the Federal Ministry for Digital and Economic Affairs (bmwd), the Federal Ministry for Transport, Innovation and Technology (bmvit), the Styrian Business Promotion Agency SFG, the Standortagentur Tirol, Government of Lower Austria and ZIT—Technology Agency of the City of Vienna through the COMET-Funding Program managed by the Austrian Research Promotion Agency FFG.; *ii.* Peaccel via a research program co-funded by the European Union (UE) and Region Reunion (FEDER). The funding agencies had no influence on the research process.

**Conflicts of Interest:** Authors declare no conflict of interest.

**Availability of Data and Materials:** R-scripts used for the analysis are found at https://github.com/DSIMB/GC-ANN-Enzyme-Concentration-Selection.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Immobilization of** β**-Galactosidases on the** *Lactobacillus* **Cell Surface Using the Peptidoglycan-Binding Motif LysM**

#### **Mai-Lan Pham 1, Anh-Minh Tran 1,2, Suwapat Kittibunchakul 1, Tien-Thanh Nguyen 3, Geir Mathiesen 4 and Thu-Ha Nguyen 1,\***


Received: 25 April 2019; Accepted: 7 May 2019; Published: 12 May 2019

**Abstract:** Lysin motif (LysM) domains are found in many bacterial peptidoglycan hydrolases. They can bind non-covalently to peptidoglycan and have been employed to display heterologous proteins on the bacterial cell surface. In this study, we aimed to use a single LysM domain derived from a putative extracellular transglycosylase Lp\_3014 of *Lactobacillus plantarum* WCFS1 to display two different lactobacillal β-galactosidases, the heterodimeric LacLM-type from *Lactobacillus reuteri* and the homodimeric LacZ-type from *Lactobacillus delbrueckii* subsp. *bulgaricus*, on the cell surface of different *Lactobacillus* spp. The β-galactosidases were fused with the LysM domain and the fusion proteins, LysM-LacLMLreu and LysM-LacZLbul, were successfully expressed in *Escherichia coli* and subsequently displayed on the cell surface of *L. plantarum* WCFS1. β-Galactosidase activities obtained for *L. plantarum* displaying cells were 179 and 1153 U per g dry cell weight, or the amounts of active surface-anchored β-galactosidase were 0.99 and 4.61 mg per g dry cell weight for LysM-LacLMLreu and LysM-LacZLbul, respectively. LysM-LacZLbul was also displayed on the cell surface of other *Lactobacillus* spp. including *L. delbrueckii* subsp. *bulgaricus*, *L. casei* and *L. helveticus*, however *L. plantarum* is shown to be the best among *Lactobacillus* spp. tested for surface display of fusion LysM-LacZLbul, both with respect to the immobilization yield as well as the amount of active surface-anchored enzyme. The immobilized fusion LysM-β-galactosidases are catalytically efficient and can be reused for several repeated rounds of lactose conversion. This approach, with the β-galactosidases being displayed on the cell surface of non-genetically modified food-grade organisms, shows potential for applications of these immobilized enzymes in the synthesis of prebiotic galacto-oligosaccharides.

**Keywords:** *Lactobacillus*; β-galactosidase; immobilization; cell surface display; LysM domains

#### **1. Introduction**

β-Galactosidases catalyze the hydrolysis and transgalactosylation of β-D-galactopyranosides (such as lactose) [1–3] and are widespread in nature. In their hydrolysis mode they cleave lactose (or related compounds) and are thus used in the dairy industry to remove lactose from various products. An attractive biocatalytic application is found in the transgalactosylation potential of these enzymes, which is based on their catalytic mechanism [1,4]. β-Galactosidases can be obtained from different sources including microorganisms, plants and animals; however, microbial sources of β-galactosidase are of great biotechnological interest because of easier handling, higher multiplication rates, and higher production yields. Recently, a number of studies have focused on the use of the genus *Lactobacillus* for the production and characterization of β-galactosidases, including the enzymes from *L. reuteri, L. acidophilus, L. helveticus, L. plantarum, L. sakei, L. pentosus, L. bulgaricus, L. fermentum* and *L. crispatus* [5–15]. β-Galactosidases from *Lactobacillus* species differ in their molecular organization [6,8,10,12,16]. The predominant glycoside hydrolase family 2 (GH2) β-galactosidases found in lactobacilli are of the LacLM type, which are heterodimeric proteins encoded by the two overlapping genes, *lacL* and *lacM*, including *lacLM* from *L. reuteri* [16], *L. acidophilus* [6], *L. helveticus* [7], *L. pentosus* [11], *L. plantarum* [8], and *L. sakei* [10]. Di- or oligomeric GH2 β-galactosidases of the LacZ type, encoded by the single *lacZ* gene, are found less often in lactobacilli, for example in *L. bulgaricus* [12]. Lactobacilli have been studied intensively with respect to their enzymes for various reasons, one of which is their 'generally recognized as safe' (GRAS) status and their safe use in food applications. It is anticipated that galacto-oligosaccharides (GOS) produced by these β-galactosidases will have better selectivity for growth and metabolic activity of this bacterial genus in the gut.

An economical, sustainable and intelligent use of biocatalysts can be achieved through immobilization, where the enzyme is bound onto a suitable food-grade carrier. Efforts have been made to immobilize β-galactosidases from *L. reuteri*, a LacLM-type, and *Lactobacillus bulgaricus*, a LacZ-type, on chitin using the chitin binding domain (ChBD) of *Bacillus circulans* WL-12 chitinase A1 [17]. Cell surface display has been shown to be a new strategy for enzyme immobilization, in which the food-grade organism *L. plantarum* serves both as a cell factory for the production of enzymes useful for food applications and as the carrier for the immobilization of the over-expressed enzyme, which is anchored on the cell surface [18,19]. This enables the direct use of the microbial cells straight after the fermentation step as immobilized biocatalysts, offering the known advantages of immobilization (reuse of enzyme, stabilization, etc.) together with a significant simplification of the production process, since neither costly downstream processing of the cells producing the enzyme (cell disruption, protein purification, etc.) nor carrier material is necessary. We recently reported cell surface display of mannanolytic and chitinolytic enzymes in *L. plantarum* using two anchors from *L. plantarum*, a lipoprotein-anchor derived from the Lp\_1261 protein and a cell wall anchor (cwa2) derived from the Lp\_2578 protein [19]. However, this approach works less efficiently with dimeric and oligomeric enzymes, such as β-galactosidases from lactobacilli, due to low secretion efficiency of target proteins. Therefore, we are interested in finding another strategy to display lactobacillal β-galactosidases on the *Lactobacillus* cell surface for use as immobilized biocatalysts in lactose conversion and GOS formation processes.

There are two principally different ways of anchoring a secreted protein to the bacterial cell wall: covalently, via the sortase pathway, or non-covalently, via a protein domain that interacts strongly with cell wall components. In sortase-mediated anchoring, the secreted protein carries a C-terminal anchor containing the so-called LPxTG motif followed by a hydrophobic domain and a positively charged tail [20]. The hydrophobic domain and the charged tail keep the protein from being released to the medium, thereby allowing recognition of the LPxTG motif by a membrane-associated transpeptidase called sortase [20–22]. The sortase cleaves the peptide bond between threonine and glycine in the LPxTG motif and links the now C-terminal threonine of the surface protein to a pentaglycine in the cell wall [21–25]. One of the non-covalent cell display systems exploits so-called LysM domains, the peptidoglycan binding motifs that are known to promote cell wall association of several natural proteins [23,26]. These domains have been used to display proteins in lactic acid bacteria (LAB) by fusing the LysM domain N- or C-terminally to the target protein [27–30]. In *L. plantarum* WCFS1, ten proteins are predicted to be displayed at the cell wall through LysM domains [31].

In the present study, we exploit a single LysM domain derived from the Lp\_3014 protein in *L. plantarum* WCFS1 for external attachment of two lactobacillal β-galactosidases, a LacLM-type from *L. reuteri* and a LacZ-type from *L. bulgaricus*, on the cell surface of four *Lactobacillus* species. Active β-galactosidases immobilized through cell-surface display can serve as safe and stable non-GMO food-grade biocatalysts for the production of prebiotic GOS.

#### **2. Results**

#### *2.1. Expression of Recombinant Lactobacillal* β*-Galactosidases in E. coli*

The overlapping *lacLM* genes from *L. reuteri* L103 and the *lacZ* gene from *L. bulgaricus* DSM20081, both encoding β-galactosidases, were each fused at the N-terminus to the LysM motif for expression and later attachment of the hybrid proteins to the peptidoglycan layer of *Lactobacillus* spp. An 88-residue fragment containing the LysM motif of the 204-residue Lp\_3014 protein, an extracellular transglycosylase of *L. plantarum* WCFS1 [31,32], was fused to the two β-galactosidases for production in *E. coli*. The two hybrid sequences were then cloned into the expression vector pBAD containing an N-terminal 7 × Histidine tag for immunodetection, yielding pBAD3014LacLMLreu and pBAD3014LacZLbul (Figure 1).

**Figure 1.** The expression vectors for LysM-LacLMLreu (**A**) and LysM-LacZLbul (**B**) in *E. coli*. The vectors are the derivatives of the pBAD vector (Invitrogen, Carlsbad, CA, USA) containing a 7 × His tag sequence fused to a single LysM domain from Lp\_3014, *L. plantarum* WCFS1. LacLMLreu encoded by two overlapping genes *lacLM* and LacZLbul encoded by the *lacZ* gene are the β-galactosidases from *L. reuteri* and *L. delbrueckii* subsp. *bulgaricus* DSM 20081, respectively. See text for more details.

The *E. coli* strains were cultivated in Luria-Bertani (LB) medium and induced for gene expression (as described in Materials and Methods). SDS-PAGE and Western blot analyses of cell-free extracts (Figure 2) showed the production of the two recombinant β-galactosidases, LysM-LacLMLreu and LysM-LacZLbul. As judged by SDS-PAGE (Figure 2A), LysM-LacLMLreu shows two bands with apparent molecular masses of ~90 kDa and ~35 kDa, corresponding to the large subunit (LacL) and the small subunit (LacM), respectively. These values are in agreement with the reported molecular masses of 73 and 35 kDa for the two subunits of β-galactosidase from *L. reuteri* [5,16]. The increase in molecular mass of the larger subunit in LysM-LacLMLreu is due to the added His-LysM fragment (~18 kDa). On the other hand, β-galactosidase from *L. bulgaricus* was reported to be a homodimer, consisting of two identical subunits of ~115 kDa [12]. As expected, SDS-PAGE analysis of a cell-free extract of LysM-LacZLbul showed a single band of ~130 kDa, corresponding to the molecular mass of a single LacZ subunit fused with the 18 kDa fragment comprising the histidine tag and the LysM domain (Figure 2A). Western blot analysis of the crude, cell-free extracts was performed using anti-His antibody for detection. Figure 2B shows that the recombinant bacteria produced the expected proteins, LysM-LacL (lane 2) and LysM-LacZ (lane 4). LacM was not detected as it does not contain the histidine tag.
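The apparent masses seen on the gel can be compared with a simple additive estimate built from the subunit masses and the ~18 kDa His-LysM fragment quoted above. The following is a minimal sketch in Python; the function name `fusion_mass_kda` is illustrative only, and the additive estimate is an assumption that ignores any anomalous migration on SDS-PAGE.

```python
# Approximate fusion-protein masses from the values quoted in the text.
HIS_LYSM_KDA = 18.0  # His tag + LysM fragment (~18 kDa, as stated in the text)


def fusion_mass_kda(subunit_kda: float, tag_kda: float = HIS_LYSM_KDA) -> float:
    """Return the expected mass of a tagged subunit (simple additive estimate)."""
    return subunit_kda + tag_kda


lysm_lacl = fusion_mass_kda(73.0)   # LacL subunit of L. reuteri (reported 73 kDa)
lysm_lacz = fusion_mass_kda(115.0)  # LacZ subunit of L. bulgaricus (reported ~115 kDa)

print(f"LysM-LacL: ~{lysm_lacl:.0f} kDa, LysM-LacZ: ~{lysm_lacz:.0f} kDa")
```

The estimates of roughly 91 and 133 kDa are consistent with the bands observed at ~90 and ~130 kDa.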

**Figure 2.** SDS-PAGE analysis (**A**) and Western blot analysis (**B**) of a cell-free extract of crude β-galactosidase fusion proteins, LysM-LacLMLreu (non-induced: lane 1, induced: lane 2) and LysM-LacZLbul (non-induced: lane 3, induced: lane 4), overexpressed in *E. coli* HST08. LacLMLreu encoded by two overlapping genes *lacLM* and LacZLbul encoded by *lacZ* gene are the β-galactosidases from *L. reuteri* and *L. delbrueckii* subsp. *bulgaricus* DSM 20081, respectively. The cultivation and induction conditions are as described in Materials and Methods and samples were taken at different time points after induction during cultivations. The arrows indicate the subunits of the recombinant β-galactosidases. M denotes the Precision protein ladder (Biorad, CA, USA).

To check if the heterologously produced enzymes were functionally active, β-galactosidase activities of cell-free lysates of *E. coli* cells carrying different expression vectors were measured. The highest yields obtained for the two recombinant enzymes were 11.1 ± 1.6 kU*o*NPG per L of medium with a specific activity of 6.04 ± 0.03 U·mg<sup>−1</sup> for LysM-LacLMLreu and 46.9 ± 2.7 kU*o*NPG per L of medium with a specific activity of 41.1 ± 0.9 U·mg<sup>−1</sup> for LysM-LacZLbul (Table 1). The β-galactosidase activities in non-induced *E. coli* cells were negligible for both LysM-LacLMLreu and LysM-LacZLbul, showing that the activity is from the overproduced β-galactosidases (Table 1).


**Table 1.** β-Galactosidase activities in cell-free lysates of *E. coli* cells carrying different expression vectors.

#### *2.2. Display of Lactobacillal* β*-Galactosidases on Lactobacillus Cell Surface*

To investigate the attachment of the two hybrid proteins, LysM-LacLMLreu and LysM-LacZLbul, to the cell wall of *L. plantarum*, cell-free crude extracts from *E. coli* harboring β-galactosidases corresponding to 50 U*o*NPG (~5–6 mg protein) were incubated with *L. plantarum* cells collected from one mL cultures at OD600 ~4.0. The enzymes and *L. plantarum* were incubated at 37 °C with gentle agitation, and after 24 h of incubation, the residual activities in the supernatant as well as on the cell surface were determined for both enzymes (Table 2A). The immobilization yield (IY) is a measure of how much of the applied protein bound to the surface of *Lactobacillus* cells. Immobilization yields for LysM-LacLMLreu and LysM-LacZLbul were 6.5% and 31.9%, respectively. SDS-PAGE analysis of the samples after the immobilization procedure showed strong bands of LysM-LacL and LacM or LysM-LacZ in the residual supernatants (Figure 3A, lane 2; Figure 3B, lane 2), indicating relatively high amounts of non-anchored proteins in the supernatants. Two successive washing steps with 50 mM sodium phosphate buffer (NaPB, pH 6.5) did not release the enzymes, showing that the immobilization is both effective and stable (Figure 3A, lanes 4, 5; Figure 3B, lanes 3, 4). The low immobilization yield for LysM-LacLMLreu was confirmed by the SDS-PAGE analysis (Figure 3A, lane 3). Western blot analysis of the crude, cell-free extracts of *L. plantarum* LacZLbul-displaying cells was performed using an anti-His antibody for detection, showing the presence of LacZLbul (Figure 3C; lane 3). Flow cytometry confirmed the surface localization of both enzymes, LysM-LacLMLreu and LysM-LacZLbul, as clear shifts in the fluorescence signals for *L. plantarum* LacLMLreu- and LacZLbul-displaying cells in comparison to the control strain were observed (Figure 4A,B). The surface-displayed enzymes were shown to be functionally active. β-Galactosidase activities obtained for *L. plantarum* displaying cells were 179 and 1153 U per g dry cell weight, corresponding to approximately 0.99 and 4.61 mg of active, surface-anchored β-galactosidase per g dry cell mass for LysM-LacLMLreu and LysM-LacZLbul (Table 2A), respectively.


**Figure 3.** SDS-PAGE analysis (**A**,**B**) and Western blot analysis (**C**) of immobilization of recombinant enzymes. LacLMLreu encoded by two overlapping genes *lacLM* and LacZLbul encoded by *lacZ* gene are the β-galactosidases from *L. reuteri* and *L. delbrueckii* subsp. *bulgaricus* DSM 20081, respectively. The arrows indicate the subunits of the recombinant β-galactosidases. M denotes the Precision protein ladder (Biorad, CA, USA). (**A**) Cell-free crude extracts of *E. coli* HST08 harboring pBAD3014LacLMLreu (containing LysM-LacLMLreu) at 18 h of induction (lane 1); flow through during immobilization (lane 2); surface anchored-LysM-LacLMLreu in *L. plantarum* WCFS1 (lane 3) and washing fractions (lanes 4, 5); non-displaying *L. plantarum* WCFS1 cells, negative control (lane 6). (**B**) Cell-free crude extracts of *E. coli* HST08 harboring pBAD3014LacZLbul (containing LysM-LacZLbul) at 18 h of induction (lane 1); flow through during immobilization on the cell surface of *L. plantarum* WCFS1 (lane 2) and washing fractions (lanes 3, 4); flow through during immobilization on the cell surface of *L. delbrueckii* subsp. *bulgaricus* DSM 20081 (lane 5) and washing fractions (lanes 6, 7); flow through during immobilization on cell surface of *L. casei* (lane 8) and washing fractions (lanes 9, 10); flow through during immobilization on cell surface of *L. helveticus* DSM 20075 (lane 11) and washing fractions (lanes 12, 13). (**C**) Cell-free crude extracts of *E. coli* HST08 harboring pBAD3014LacZLbul (containing LysM-LacZLbul) at 18 h of induction (lane 1); non-displaying *L. plantarum* WCFS1 cells (lane 2) and surface anchored-LysM-LacZLbul in *L. plantarum* WCFS1 (lane 3); non-displaying *L. delbrueckii* subsp. *bulgaricus* DSM 20081 cells (lane 4) and surface anchored-LysM-LacZLbul in *L. delbrueckii* subsp. *bulgaricus* DSM 20081 (lane 5); surface anchored-LysM-LacZLbul in *L. casei* (lane 6) and non-displaying *L. casei* cells (lane 7); non-displaying *L. helveticus* DSM 20075 cells (lane 8) and surface anchored-LysM-LacZLbul in *L. helveticus* DSM 20075 (lane 9).

**Figure 4.** Analysis of surface localization of LysM-LacLMLreu and LysM-LacZLbul in *Lactobacillus* cells by using flow cytometry: surface anchored-LysM-LacLMLreu in *L. plantarum* WCFS1 (**A**, green line); surface anchored-LysM-LacZLbul in *L. plantarum* WCFS1 (**B**, blue line), in *L. delbrueckii* subsp. *bulgaricus* DSM 20081 (**C**, red line), in *L. casei* (**D**, purple line) and in *L. helveticus* DSM 20075 (**E**, olive line). Non-displaying *Lactobacillus* cells were used as negative controls (A–E, black line).


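**Table 2.** Immobilization of (**A**) recombinant lactobacillal β-galactosidases on *L. plantarum* WCFS1 cell surface and (**B**) recombinant β-galactosidase from *L. bulgaricus* DSM 20081 (LysM-LacZLbul) on the cell surface of different *Lactobacillus* spp.

| (**B**) *Lactobacillus* host | Residual activity in supernatant (%) | IY (%) *<sup>a</sup>* | Activity on cell surface (%) *<sup>b</sup>* | Activity on cell surface (U/g DCW) *<sup>b</sup>* | AR (%) *<sup>c</sup>* | Active surface-anchored β-galactosidase (mg/g DCW) *<sup>d</sup>* |
|---|---|---|---|---|---|---|
| *L. bulgaricus* DSM 20081 | 71.3 ± 0.9 | 28.7 | 14.0 ± 0.9 | 795 ± 53 | 48.5 | 3.18 ± 0.11 |
| *L. casei* | 76.1 ± 0.9 | 23.9 | 15.1 ± 0.8 | 861 ± 48 | 63.2 | 3.44 ± 0.09 |
| *L. helveticus* DSM 20075 | 75.3 ± 0.9 | 24.7 | 14.3 ± 0.5 | 812 ± 27 | 57.7 | 3.25 ± 0.11 |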

*<sup>a</sup>* IY (%) was calculated by subtraction of the residual enzyme activity (%) in the supernatant after immobilization from the total activity applied (100%). *<sup>b</sup>* Activity on the cell surface (%) is the percentage of enzyme activity measured on the cell surface to the total applied activity. Activity on the cell surface (U/g DCW) is calculated as the amount of enzyme (Units) per g dry cell weight. *<sup>c</sup>* Activity retention, AR (%), is the ratio of activity on the cell surface (%) to IY (%). *<sup>d</sup>* It was calculated based on specific activities of purified LacLMLreu of 180 U/mg protein [16] and of purified LacZLbul (His Tagged) of 250 U/mg protein [12]. Values given are the average value from at least two independent experiments, and the standard deviation was always less than 5%.
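The definitions in the footnotes above can be expressed as a short calculation. The sketch below, in Python, applies them to the three Table 2B rows for *L. bulgaricus*, *L. casei* and *L. helveticus*; the class and function names are illustrative and not taken from the study. Small deviations from the tabulated AR values are to be expected, presumably because the published numbers were derived from unrounded measurements.

```python
from dataclasses import dataclass

# Specific activity of purified His-tagged LacZLbul (footnote d), in U/mg protein.
SPECIFIC_ACTIVITY_LACZ = 250.0


@dataclass
class Row:
    host: str
    residual_pct: float     # activity remaining in the supernatant (% of applied)
    surface_pct: float      # activity measured on the cell surface (% of applied)
    surface_u_per_g: float  # activity on the cell surface (U per g dry cell weight)


def immobilization_yield(row: Row) -> float:
    """IY (%) = 100% applied activity minus residual activity in the supernatant."""
    return 100.0 - row.residual_pct


def activity_retention(row: Row) -> float:
    """AR (%) = activity on the cell surface (%) relative to IY (%)."""
    return 100.0 * row.surface_pct / immobilization_yield(row)


def active_enzyme_mg_per_g(row: Row, specific_activity: float) -> float:
    """Amount of active enzyme (mg per g DCW) from surface activity and U/mg."""
    return row.surface_u_per_g / specific_activity


rows = [
    Row("L. bulgaricus DSM 20081", 71.3, 14.0, 795.0),
    Row("L. casei", 76.1, 15.1, 861.0),
    Row("L. helveticus DSM 20075", 75.3, 14.3, 812.0),
]

for row in rows:
    print(
        f"{row.host}: IY = {immobilization_yield(row):.1f}%, "
        f"AR = {activity_retention(row):.1f}%, "
        f"active enzyme = {active_enzyme_mg_per_g(row, SPECIFIC_ACTIVITY_LACZ):.2f} mg/g DCW"
    )
```

The same `active_enzyme_mg_per_g` helper, called with the 180 U/mg specific activity of LacLMLreu and the 179 U/g DCW reported for LysM-LacLMLreu, gives about 0.99 mg/g DCW, in line with the value quoted in the text.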

Due to higher immobilization yields and increased amounts of active surface-anchored protein in *L. plantarum*, LysM-LacZLbul was chosen for further analysis of its display on the cell surface of other food-relevant *Lactobacillus* spp. including *L. bulgaricus*, *L. casei* and *L. helveticus*. The parameters of residual activities in the supernatant after the anchoring experiment, activity on the cell surface, immobilization yields, activity retention and amounts of active surface-anchored LysM-LacZLbul were determined and are presented in Table 2B. It was shown that surface-anchored LysM-LacZLbul was released from the cell surface of *L. casei* during the subsequent washing steps (Figure 3B, lanes 9, 10). Western blot analysis of the crude, cell-free extracts of *Lactobacillus* LysM-LacZLbul-displaying cells indicated the binding of LysM-LacZLbul to all four *Lactobacillus* spp. tested (Figure 3C; lanes 3, 5, 6, 9) as was also confirmed by flow cytometry (Figure 4B–E). *L. plantarum* bound most efficiently among the tested *Lactobacillus* species shown by the highest immobilization yield and the highest amount of active, surface-anchored LysM-LacZLbul (Table 2B).

#### *2.3. Enzymatic Stability of* β*-Galactosidase-Displaying Cells*

Both temperature stability and reusability of β-galactosidase displaying cells were determined. For temperature stability, *L. plantarum* galactosidase-displaying cells were incubated in 50 mM sodium phosphate buffer (NaPB), pH 6.5 at different temperatures, and at certain time intervals, the residual β-galactosidase activities on *L. plantarum* cell surface were measured. Both LysM-LacLMLreu and LysM-LacZLbul-displaying cells are very stable at −20 °C with a half-life time of activity (τ<sub>1/2</sub>) of approximately 6 months (Table 3). The half-life time of activity of LysM-LacLMLreu-displaying cells at 30 °C is 55 h, whereas half-life times of activity of LysM-LacZLbul-displaying cells at 30 °C and 50 °C are 120 h and 30 h, respectively (Table 3).


**Table 3.** Stability of *L. plantarum* β-galactosidase-displaying cells at various temperatures *<sup>a</sup>*.

*<sup>a</sup> L. plantarum* galactosidase-displaying cells were incubated in 50 mM sodium phosphate buffer (NaPB), pH 6.5 at different temperatures. Experiments were performed at least in duplicate. *<sup>b</sup>* not determined.
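To give a feel for what the reported half-lives imply over a working day, the sketch below estimates the fraction of activity remaining after a chosen incubation time. It assumes simple first-order (exponential) inactivation, which is an assumption made here for illustration and is not a model stated in the study; the function name `residual_activity` is likewise hypothetical.

```python
def residual_activity(t_hours: float, half_life_hours: float) -> float:
    """Fraction of initial activity left after t_hours.

    Assumes first-order (exponential) inactivation; this is an illustrative
    assumption, not a kinetic model reported by the authors.
    """
    return 0.5 ** (t_hours / half_life_hours)


# Half-lives of activity quoted in the text (hours), keyed by (enzyme, temperature in °C).
half_lives_h = {
    ("LysM-LacLMLreu", 30): 55.0,
    ("LysM-LacZLbul", 30): 120.0,
    ("LysM-LacZLbul", 50): 30.0,
}

for (enzyme, temp_c), t_half in half_lives_h.items():
    fraction = residual_activity(24.0, t_half)
    print(f"{enzyme} at {temp_c} °C: {100 * fraction:.0f}% of activity left after 24 h")
```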

To test the reusability of LysM-LacLMLreu- and LysM-LacZLbul-displaying cells, the enzyme activity was measured during several repeated rounds of lactose conversion with two washing steps between each cycle. The enzymatic activities of *L. plantarum* LysM-LacZLbul-displaying cells decreased by ~23% and 27% at 30 °C and 50 °C (Figure 5), respectively, after three conversion/washing cycles, indicating that these displaying cells can be reused for several rounds of biocatalysis at the tested temperatures. LysM-LacLMLreu-displaying cells are less stable than LysM-LacZLbul-displaying cells, as only 56% of the initial β-galactosidase activity is retained at 30 °C after the third cycle (Figure 5). LysM-LacZLbul-displaying cells retained 35% of β-galactosidase activity after the fourth cycle at 50 °C, and 57% and 51% after the fourth and fifth cycle, respectively, at 30 °C (Figure 5). These observations indicate that immobilized fusion LysM-β-galactosidases can be reused for at least four to five repeated rounds of lactose conversion.

**Figure 5.** Enzymatic activity of surface-displayed β-galactosidases, LysM-LacLMLreu and LysM-LacZLbul, during several repeated rounds of lactose conversion using *L. plantarum* WCFS1 displaying cells. Experiments were performed in duplicate, and the standard deviation was always less than 5%.

#### *2.4. Formation of Galacto-Oligosaccharides (GOS)*

Figure 6 shows the formation of GOS using *L. plantarum* cells displaying β-galactosidase LacZ from *L. bulgaricus* (LysM-LacZLbul) with 1.0 ULac β-galactosidase activity per mL of the reaction mixture and 205 g/L initial lactose in 50 mM sodium phosphate buffer (pH 6.5) at 30 °C. The maximal GOS yield was around 32% of total sugars obtained at 72% lactose conversion after 7 h of conversion. This observation shows that surface-displayed LacZ is able to convert lactose to form galacto-oligosaccharides. We could identify the main GOS products of transgalactosylation, which are β-D-Gal*p*-(1→6)-D-Glc, β-D-Gal*p*-(1→3)-D-Lac, β-D-Gal*p*-(1→3)-D-Glc, β-D-Gal*p*-(1→3)-D-Gal, β-D-Gal*p*-(1→6)-D-Gal, and β-D-Gal*p*-(1→6)-D-Lac. This is similar to the product profile when performing the conversion reaction with the free enzyme as previously reported [12].
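The percentages quoted above can be turned into approximate concentrations at the point of maximal GOS yield. The sketch below is a rough mass balance in Python; it assumes that the total sugar mass stays at the initial 205 g/L (neglecting the small mass of water taken up during hydrolysis) and that everything that is neither lactose nor GOS is glucose or galactose. Both are simplifying assumptions for illustration, and the function name `sugar_balance` is hypothetical.

```python
INITIAL_LACTOSE_G_L = 205.0       # initial lactose concentration (g/L)
LACTOSE_CONVERSION = 0.72         # fraction of lactose converted at maximal GOS yield
GOS_YIELD_OF_TOTAL_SUGARS = 0.32  # GOS as a fraction of total sugars


def sugar_balance(initial_g_l: float, conversion: float, gos_fraction: float) -> dict:
    """Approximate sugar concentrations (g/L), assuming constant total sugar mass."""
    remaining_lactose = initial_g_l * (1.0 - conversion)
    gos = initial_g_l * gos_fraction
    # Remainder is attributed to the monosaccharides glucose and galactose.
    monosaccharides = initial_g_l - remaining_lactose - gos
    return {
        "lactose": remaining_lactose,
        "GOS": gos,
        "monosaccharides": monosaccharides,
    }


balance = sugar_balance(INITIAL_LACTOSE_G_L, LACTOSE_CONVERSION, GOS_YIELD_OF_TOTAL_SUGARS)
for sugar, conc in balance.items():
    print(f"{sugar}: ~{conc:.1f} g/L")
```

Under these assumptions, the mixture contains roughly 66 g/L GOS, 57 g/L residual lactose and 82 g/L monosaccharides.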

**Figure 6.** Course of reaction for lactose conversion by surface-displayed β-galactosidase from *L. bulgaricus* (LysM-LacZLbul) in *L. plantarum* WCFS1 as determined by HPLC. The batch conversion was carried out at 30 °C using 205 g/L initial lactose concentration in 50 mM NaPB (pH 6.5) and constant agitation (500 rpm). *L. plantarum* LysM-LacZLbul displaying cells were added to equivalent concentrations of 1.0 ULac/mL of the reaction mixture. Experiments were performed in duplicate, and the standard deviation was always less than 5%.

#### **3. Discussion**

Surface display of proteins on cells of lactic acid bacteria (LAB) generally requires genetic modifications, which might have limitations in food and medical applications due to the sensitive issue of the use of genetically modified organisms (GMO). Anchoring heterologous proteins on the cell surface of non-genetically modified (non-GMO) LAB via cell wall binding domains, including surface layer protein (SLP) domains [33,34], LysM domains [26,30,35–37] and WxL domains [38], is attracting increasing interest.

Lysin motif (LysM) domains are found in many bacterial peptidoglycan hydrolases [26,38,39]. Peptidoglycan contains sugar (glycan) chains, which consist of *N*-acetylglucosamine (NAG) and *N*-acetylmuramic acid (NAM) units joined by glycosidic linkages. Proteins harboring LysM motifs have been shown to bind non-covalently to the peptidoglycan layer and have been employed to display heterologous proteins on the bacterial cell-surface [26,40,41]. These domains can contain single or multiple LysM motifs [41], and they have been used to display proteins in LAB by fusion either to the N- or C-terminus of a target protein [27–30]. Interestingly, the LysM motif derived from the *L. plantarum* Lp\_3014 transglycosylase has been used successfully for surface display of invasin [36] and a chemokine fused to an HIV antigen [37] previously.

In this work, we used the single LysM domain derived from Lp\_3014 to anchor two different lactobacillal β-galactosidases, a heterodimeric type from *L. reuteri* and a homodimeric type from *L. bulgaricus*, on the cell surface of four species of lactobacilli. Functionally active fusion proteins, LysM-LacLMLreu and LysM-LacZLbul, were successfully expressed in *E. coli*. However, the expression yield of LysM-LacLMLreu was ten-fold lower than that of the β-galactosidase from *L. reuteri* (LacLMLreu) without LysM expressed previously in *E. coli*, which was reported to be 110 kU of β-galactosidase activity per liter of cultivation medium [16]. This may indicate that the fusion of the LysM domain has a negative effect on the expression level. Interestingly, the expression yields of LysM-LacZLbul were 4-fold and 7-fold higher in terms of volumetric and specific activities, respectively, than those of LysM-LacLMLreu using the same host, expression system and induction conditions. β-Galactosidase from *L. bulgaricus* (LysM-LacZLbul) is a homodimer whereas β-galactosidase from *L. reuteri* (LysM-LacLMLreu) is a heterodimer; hence the fusion of the LysM domain only to the LacL subunit might explain the discrepancy between the yields of these two fusion proteins, possibly through different folding mechanisms.
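The fold differences quoted above follow directly from the yields reported in Section 2.1. A minimal sketch of the arithmetic in Python is given below; the dictionary layout and variable names are illustrative only.

```python
# Yields from Section 2.1: (volumetric activity in kU per L, specific activity in U per mg).
yields = {
    "LysM-LacLMLreu": (11.1, 6.04),
    "LysM-LacZLbul": (46.9, 41.1),
}
# Reported volumetric yield of LacLMLreu expressed without LysM (kU per L) [16].
LACLM_WITHOUT_LYSM_KU_PER_L = 110.0

vol_lm, spec_lm = yields["LysM-LacLMLreu"]
vol_z, spec_z = yields["LysM-LacZLbul"]

volumetric_fold = vol_z / vol_lm                       # LacZ fusion vs LacLM fusion
specific_fold = spec_z / spec_lm                       # LacZ fusion vs LacLM fusion
lysm_penalty_fold = LACLM_WITHOUT_LYSM_KU_PER_L / vol_lm  # LacLM without vs with LysM

print(f"Volumetric activity: {volumetric_fold:.1f}-fold higher for LysM-LacZLbul")
print(f"Specific activity: {specific_fold:.1f}-fold higher for LysM-LacZLbul")
print(f"LacLMLreu without LysM: {lysm_penalty_fold:.1f}-fold higher than with LysM")
```

The ratios of about 4.2, 6.8 and 9.9 correspond to the rounded 4-fold, 7-fold and ten-fold figures in the text.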

Not surprisingly, the affinity of homodimeric LysM-LacZLbul for peptidoglycan is significantly higher than that of LysM-LacLMLreu, as shown by the immobilization yields (Table 2A). As mentioned above, LacLMLreu from *L. reuteri* is a heterodimer and the LysM domain is fused N-terminally only to LacL, whereas LacZLbul from *L. bulgaricus* is a homodimer; hence each of the identical subunits carries its own LysM domain, leading to stronger attachment of LacZ to the *L. plantarum* cell wall. This is a likely explanation for the higher immobilization yields observed for LysM-LacZLbul. Even though the immobilization yields obtained in this study were significantly lower than those for the same enzymes when a chitin binding domain (ChBD) together with chitin was used [17], the activity retention (AR) values on the *L. plantarum* cell surface (46.9% and 63.5% for LysM-LacLMLreu and LysM-LacZLbul, respectively) were significantly higher. The AR values for ChBD-LacLM, LacLM-ChBD and LacZ-ChBD using chitin beads were 19%, 26% and 13%, respectively [17]. Notably, the amount of active surface-anchored LysM-LacLMLreu (0.99 ± 0.02 mg per g dry cell weight) on the cell surface of *L. plantarum* WCFS1 is significantly lower than that of LysM-LacZLbul (4.61 ± 0.05 mg per g dry cell weight). This is mainly due to the low immobilization yield of LysM-LacLMLreu. *L. plantarum* cells collected from one mL cultures at OD600 ~4.0 were used in the immobilization reactions; hence the amount of *L. plantarum* cells was estimated to be ~3.0 × 10<sup>9</sup> cfu/mL. Therefore, we calculated that 8.22 μg LysM-LacLMLreu and 38.3 μg LysM-LacZLbul were anchored on 3.0 × 10<sup>9</sup> *L. plantarum* cells, or 0.002 pg LysM-LacLMLreu and 0.012 pg LysM-LacZLbul per *L. plantarum* cell. Xu et al. (2011) reported the use of the putative muropeptidase MurO (Lp\_2162) from *L. plantarum* containing two putative LysM repeat regions for displaying a green fluorescent protein (GFP) and a β-galactosidase from *Bifidobacterium bifidum* on the surface of *L. plantarum* cells [42]. They reported that 0.008 pg of GFP was displayed per cell on non-treated *L. plantarum* cells, while the amount of active surface-anchored β-galactosidase from *B. bifidum* on the surface of *L. plantarum* cells was not reported in that study.
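The per-cell amounts can be reproduced from the anchored mass and the estimated cell count given above. The sketch below shows the unit conversion in Python; the function name `pg_per_cell` is illustrative. The computed values (about 0.003 and 0.013 pg per cell) agree in order of magnitude with the 0.002 and 0.012 pg quoted in the text, which may have been truncated rather than rounded.

```python
CELLS = 3.0e9          # estimated number of L. plantarum cells per reaction
PG_PER_MICROGRAM = 1e6  # 1 μg = 1,000,000 pg


def pg_per_cell(micrograms: float, cells: float) -> float:
    """Convert a total anchored mass (μg) into picograms per cell."""
    return micrograms * PG_PER_MICROGRAM / cells


anchored_ug = {
    "LysM-LacLMLreu": 8.22,
    "LysM-LacZLbul": 38.3,
}

for enzyme, mass_ug in anchored_ug.items():
    print(f"{enzyme}: {pg_per_cell(mass_ug, CELLS):.4f} pg per cell")
```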

Further, we tested the capability of the fusion protein LysM-LacZLbul to bind to the cell wall of three other *Lactobacillus* species. *L. plantarum* showed the best capacity among the tested *Lactobacillus* species for surface anchoring of LysM-LacZLbul (Table 2B), whereas *L. bulgaricus*, *L. casei* and *L. helveticus* are comparable in terms of the amount of active surface-anchored enzyme.

The highest GOS yield of 32% obtained with the surface-immobilized enzyme (Figure 6) is lower than the yield obtained with the free enzyme LacZ from *L. bulgaricus*, which was previously reported to be approximately 50% [12]. This could be due to the binding of LysM-LacZLbul to the peptidoglycan and the attachment of the enzyme on the *Lactobacillus* cell surface, which might hinder the access of the substrate lactose to the active site of the enzyme. Interestingly, the GOS yield obtained from lactose conversion using *L. plantarum* cells displaying β-galactosidase (LysM-LacZLbul) from *L. bulgaricus* is significantly higher than the yield obtained with β-galactosidase immobilized on chitin (LacZ-ChBD), which was previously reported to be around 23%–24% [12]. This indicates that β-galactosidase from *L. bulgaricus* anchored on the *L. plantarum* cell surface is more catalytically efficient than its immobilized form on chitin.

#### **4. Materials and Methods**

#### *4.1. Bacterial Strains and Culture Conditions*

The bacterial strains and plasmids used in this study are listed in Table 4. *Lactobacillus plantarum* WCFS1, isolated from human saliva as described by Kleerebezem et al. [32], was originally obtained from NIZO Food Research (Ede, The Netherlands) and maintained in the culture collection of the Norwegian University of Life Sciences, Ås, Norway. *L. helveticus* DSM 20075 (ATCC 15009) and *L. delbrueckii* subsp. *bulgaricus* DSM 20081 (ATCC 11842) were obtained from the German Collection of Microorganisms and Cell Cultures (DSMZ; Braunschweig, Germany). *L. casei* was obtained from the culture collection of the Food Biotechnology Laboratory, BOKU-University of Natural Resources and Life Sciences Vienna. *Lactobacillus* strains were cultivated on MRS medium (*Lactobacillus* broth according to De Man, Rogosa and Sharpe [43]) (Carl Roth, Karlsruhe, Germany) at 37 °C without agitation. *E. coli* NEB5α (New England Biolabs, Frankfurt am Main, Germany) was used as the cloning host in the transformation of DNA fragments, whereas *E. coli* HST08 (Clontech, Mountain View, CA, USA) was used as the expression host strain. *E. coli* strains were cultivated in Luria-Bertani (LB) medium (10 g/L tryptone, 10 g/L NaCl, and 5 g/L yeast extract) at 37 °C with shaking at 140 rpm. Agar media were prepared by adding 1.5% agar to the respective media. When needed, ampicillin was added to the media to a final concentration of 100 μg/mL for *E. coli* cultivations.


#### *4.2. Chemicals, Enzymes and Plasmids*

All chemicals and enzymes were purchased from Sigma (St. Louis, MO, USA) unless stated otherwise and were of the highest quality available. All restriction enzymes, Phusion high-fidelity DNA polymerase, T4 DNA ligase, and corresponding buffers were from New England Biolabs (Frankfurt am Main, Germany). Staining dyes, DNA and protein standard ladders were from Bio-Rad (Hercules, CA, USA). All plasmids used in this study are listed in Table 4.

#### *4.3. DNA Manipulation*

Plasmids were isolated from *E. coli* strains using the Monarch Plasmid Miniprep Kit (New England Biolabs, Frankfurt am Main, Germany) according to the manufacturer's instructions. PCR amplifications of DNA were done using Q5 High-Fidelity 2X Master Mix (New England Biolabs). The primers used in this study, which were supplied by VBC-Biotech Service (Vienna, Austria), are listed in Table 5. PCR products and DNA fragments obtained by digestion with restriction enzymes were purified using the Monarch DNA Gel Extraction Kit (New England Biolabs), and the DNA amounts were estimated using a Nanodrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA). The sequences of PCR-generated fragments were verified by DNA sequencing performed by a commercial provider (Microsynth, Vienna, Austria). The ligation of DNA fragments was performed using the NEBuilder HiFi Assembly Cloning Kit (New England Biolabs). All plasmids were transformed into chemically competent *E. coli* NEB5α cells following the manufacturer's protocol to obtain the plasmids in sufficient amounts. The constructed plasmids (Table 4) were chemically transformed into the expression host strain *E. coli* HST08.


**Table 5.** Primers used in the study.

\* The nucleotides in italics are the positions that anneal to the DNA of the target genes (*lacLM* or *lacZ*).

#### *4.4. Plasmid Construction*

Two recombinant fusion proteins were constructed. The first fusion protein was based on LacLM from *L. reuteri* and the LysM domain attached upstream of LacLM (termed LysM-LacLMLreu). The second fusion protein was based on LacZ from *L. delbrueckii* subsp. *bulgaricus* DSM 20081 and the LysM domain attached upstream of LacZ (termed LysM-LacZLbul). Plasmid pBAD\_3014AgESAT\_DC (Table 4) [44] was used for the construction of the expression plasmids. This plasmid is a derivative of the pBAD vector (Invitrogen, Carlsbad, CA, USA) containing a 7 × His tag sequence and a single LysM domain from Lp\_3014, which is a putative extracellular transglycosylase with a LysM peptidoglycan binding domain from *L. plantarum* WCFS1 (NCBI reference sequence no. NC\_004567.2) [31,32], fused to the hybrid tuberculosis antigen AgESAT-DC [44]. The fragment of *lacLM* genes from *L. reuteri* was amplified from the plasmid pHA1032 (Table 4) [16] with the primer pair Fwd1LreuSalI and Rev1LreuEcoRI (Table 5), whereas the *lacZ* gene from *L. bulgaricus* was amplified from the plasmid pTH103 (Table 4) [12] with the primer pair Fwd2LbulSalI and Rev2LbulEcoRI (Table 5). The PCR-generated products were then cloned into the *Sal*I and *EcoR*I cloning sites of the pBAD\_3014AgESAT\_DC vector using the NEBuilder HiFi DNA Assembly Cloning Kit (New England Biolabs) following the manufacturer's instructions, resulting in two expression plasmids, pBAD3014LacLMLreu and pBAD3014LacZLbul (Figure 1).

#### *4.5. Gene Expression in E. coli*

The constructed plasmids pBAD3014LacLMLreu and pBAD3014LacZLbul were chemically transformed into expression host *E. coli* HST08. For gene expression, overnight cultures of *E. coli* HST08 were diluted in 300 mL of fresh LB broth containing 100 μg/mL ampicillin to an OD600 of ~0.1 and incubated at 37 ◦C with shaking at 140 rpm to an OD600 ~0.6. Gene expression was then induced by L-arabinose to a final concentration of 0.7 mg/mL and the cultures were incubated further at 25 ◦C for 18 h with shaking at 140 rpm. Cells were harvested at an OD600 of ~3.0 by centrifugation at 4000× *g* for 30 min at 4 ◦C, washed twice, and resuspended in 50 mM sodium phosphate buffer (NaPB), pH 6.5. Cells were disrupted by using a French press (AMINCO, Maryland, USA). Debris was removed by centrifugation (10,000× *g* for 15 min at 4 ◦C) to obtain the crude extract.
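The dilution and induction steps above reduce to simple proportionality (C1·V1 = C2·V2). The following is a minimal sketch of that arithmetic, not part of the published protocol; the function names and the overnight OD600 of 3.0 are hypothetical, while the 300 mL volume, the starting OD600 of ~0.1 and the 0.7 mg/mL L-arabinose concentration are taken from the text. The volume added with the inoculum or inducer stock is neglected.

```python
def inoculum_volume_ml(od_overnight: float, od_target: float, final_volume_ml: float) -> float:
    """Volume of overnight culture that gives od_target in final_volume_ml (C1*V1 = C2*V2)."""
    if od_overnight <= 0 or od_target <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if od_target > od_overnight:
        raise ValueError("target OD cannot exceed the OD of the overnight culture")
    return od_target * final_volume_ml / od_overnight


def inducer_mass_mg(final_conc_mg_per_ml: float, culture_volume_ml: float) -> float:
    """Mass of inducer needed to reach the final concentration in the culture."""
    if final_conc_mg_per_ml < 0 or culture_volume_ml <= 0:
        raise ValueError("concentration must be non-negative and volume positive")
    return final_conc_mg_per_ml * culture_volume_ml


if __name__ == "__main__":
    # Hypothetical overnight OD600 of 3.0; the other values are from the text.
    print(f"inoculum: {inoculum_volume_ml(3.0, 0.1, 300.0):.1f} mL")
    print(f"L-arabinose: {inducer_mass_mg(0.7, 300.0):.0f} mg")
```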

#### *4.6. Immobilization of* β*-Galactosidases on Lactobacillus Cell Surface*

One mL of *Lactobacillus* culture was collected at OD600 ~4.0 by centrifugation (4000× *g* for 5 min at 4 ◦C) and the cells were washed with 50 mM sodium phosphate buffer (NaPB), pH 6.5. The cell pellets were then mixed with one mL of diluted cell-free crude extract containing 50 U*o*NPG/mL (~5–6 mg protein/mL) of fused LysM-β-galactosidase (LysM-LacLMLreu or LysM-LacZLbul) and incubated at 37 ◦C for 24 h with gentle agitation. *Lactobacillus* β-galactosidase displaying cells were separated from the supernatants by centrifugation (4000× *g* for 5 min at 4 ◦C). Cells were then washed twice with NaPB (pH 6.5); the supernatants and wash solutions were collected for SDS-PAGE analysis and for activity and protein measurements. *Lactobacillus* β-galactosidase displaying cells were resuspended in NaPB (pH 6.5) for further studies.

#### *4.7. Protein Determination*

Protein concentrations were determined using the method of Bradford [45] with bovine serum albumin (BSA) as standard.

#### *4.8.* β*-Galactosidase Assays*

β-Galactosidase activity was determined using *o*-nitrophenyl-β-D-galactopyranoside (*o*NPG) or lactose as the substrate as previously described [5] with modifications. When the chromogenic substrate *o*NPG was used, the reaction was started by adding 20 μL of *Lactobacillus* β-galactosidase displaying cell suspension to 480 μL of 22 mM *o*NPG in 50 mM NaPB (pH 6.5) and stopped by adding 750 μL of 0.4 M Na2CO3 after 10 min of incubation at 30 ◦C. The release of *o*-nitrophenol (*o*NP) was measured by determining the absorbance at 420 nm. One unit of *o*NPG activity was defined as the amount of β-galactosidase releasing 1 μmol of *o*NP per minute under the defined conditions.
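The unit definition above can be turned into a short calculation. The sketch below is illustrative only: it assumes a blank-corrected absorbance, a Beer-Lambert conversion, and an extinction coefficient for *o*NP that the user must determine under the actual assay conditions (the value used in the example is a placeholder, not a literature or measured value). The volumes and incubation time are those given in the text; the function name is hypothetical.

```python
def onpg_activity_u_per_ml(
    a420: float,
    epsilon_mM_cm: float,
    path_cm: float = 1.0,
    sample_ul: float = 20.0,     # cell suspension volume (from the text)
    substrate_ul: float = 480.0, # oNPG solution volume (from the text)
    stop_ul: float = 750.0,      # Na2CO3 stop solution volume (from the text)
    minutes: float = 10.0,       # incubation time (from the text)
) -> float:
    """Activity in U per mL of sample, where 1 U releases 1 umol oNP per minute."""
    if epsilon_mM_cm <= 0 or path_cm <= 0 or minutes <= 0 or sample_ul <= 0:
        raise ValueError("epsilon, path length, time and sample volume must be positive")
    total_ml = (sample_ul + substrate_ul + stop_ul) / 1000.0
    onp_mM = a420 / (epsilon_mM_cm * path_cm)  # Beer-Lambert; mM equals umol/mL
    onp_umol = onp_mM * total_ml               # amount of oNP in the stopped mixture
    return onp_umol / minutes / (sample_ul / 1000.0)


if __name__ == "__main__":
    # Placeholder absorbance and extinction coefficient, for illustration only.
    print(onpg_activity_u_per_ml(a420=0.8, epsilon_mM_cm=4.0))
```

Samples whose absorbance falls outside the linear range of the photometer would need to be diluted first, with the dilution factor multiplied into the result.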

When lactose was used as the substrate, 20 μL of *Lactobacillus* β-galactosidase displaying cell suspension was added to 480 μL of a 600 mM lactose solution in 50 mM sodium phosphate buffer, pH 6.5. After 10 min of incubation at 30 ◦C, the reaction was stopped by heating the reaction mixture at 99 ◦C for 5 min. The reaction mixture was cooled to room temperature, and the release of D-glucose was determined using the test kit from Megazyme. One unit of lactase activity was defined as the amount of enzyme releasing 1 μmol of D-glucose per minute under the given conditions.

#### *4.9. Gel Electrophoresis Analysis*

For visual observation of the expression level of the two recombinant β-galactosidases (LysM-LacLMLreu and LysM-LacZLbul) in *E. coli* and the effectiveness of the immobilization, cell-free extracts, supernatants, and wash solutions were analyzed by Sodium Dodecyl Sulfate Polyacrylamide Gel Electrophoresis (SDS-PAGE). Protein bands were visualized by staining with Bio-safe Coomassie (Bio-Rad). The determination of protein mass was carried out using Unstained Precision plus Protein Standard (Bio-Rad).

#### *4.10. Western Blotting*

Proteins in the cell-free extracts were separated by SDS-PAGE. Protein bands were then transferred to a nitrocellulose membrane using the Trans-Blot Turbo™ Transfer System (Bio-Rad) following the manufacturer's instructions. Monoclonal mouse anti-His antibody (Penta His Antibody, BSA-free) was obtained from Qiagen (Hilden, Germany), diluted 1:5000 and used as recommended by the manufacturer. The protein bands were visualized by using polyclonal rabbit anti-mouse antibody conjugated with horseradish peroxidase (HRP) (Dako, Denmark) and the Clarity™ Western ECL Blotting Substrate from Bio-Rad (Hercules, CA, USA).

#### *4.11. Flow Cytometry*

*Lactobacillus* β-galactosidase displaying cells were resuspended in 50 μL of phosphate buffered saline (PBS) (137 mM NaCl, 2.7 mM KCl, 2 mM KH2PO4, and 10 mM Na2HPO4, pH 7.4) containing 2% of BSA (PBS-B) and 0.1 μL of Penta His Antibody, BSA-free (Qiagen; diluted 1:500 in PBS-B). After incubation at room temperature for 40 min, the cells were centrifuged at 4000× *g* for 5 min at 4 ◦C and washed three times with 500 μL PBS-B. The cells were subsequently incubated with 50 μL PBS-B and 0.1 μL anti-mouse IgG H&L/Alexa Fluor 488 conjugate (Cell Signaling Technology, Frankfurt am Main, Germany, diluted 1:750 in PBS-B) for 40 min in the dark at room temperature. After washing five times with 500 μL PBS-B, the stained cells were analyzed by flow cytometry using a CytoFLEX Flow Cytometer (Beckman Coulter, Brea, CA, USA) following the manufacturer's instructions.

#### *4.12. Temperature Stability and Reusability of Immobilized Enzymes*

The temperature stability of immobilized enzymes was studied by incubating *L. plantarum* LysM-LacLMLreu- and LysM-LacZLbul-displaying cells in 50 mM NaPB (pH 6.5) at various temperatures (−20, 4, 30, 50 ◦C). At certain time intervals, samples were withdrawn, the residual activity was measured using *o*NPG as the substrate under standard assay conditions and the τ<sub>1/2</sub> value was determined.
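The text does not state how the half-life was derived from the residual activities. One common approach, shown here as a sketch under the assumption of first-order inactivation, is to fit ln(residual activity) against time by least squares and take the half-life as ln 2 divided by the decay constant. The function name is hypothetical, and the data in the example are synthetic, generated from a known half-life to check the fit; they are not measurements.

```python
import math


def half_life_first_order(times, residual_activities):
    """Half-life from a least-squares fit of ln(activity) vs time.

    Assumes first-order inactivation, A(t) = A0 * exp(-k * t), so the
    half-life is ln(2) / k. Times and the result share the same unit.
    """
    if len(times) != len(residual_activities) or len(times) < 2:
        raise ValueError("need at least two matching (time, activity) points")
    if any(a <= 0 for a in residual_activities):
        raise ValueError("activities must be positive to take the logarithm")
    ys = [math.log(a) for a in residual_activities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    k = -sxy / sxx  # decay constant is the negative slope
    if k <= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / k


if __name__ == "__main__":
    # Synthetic check: activities generated with a half-life of 24 time units.
    times = [0, 12, 24, 48]
    activities = [100 * 0.5 ** (t / 24) for t in times]
    print(half_life_first_order(times, activities))
```

If the inactivation does not follow first-order kinetics, the log-linear plot will curve and a different model would be needed.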

To test the reusability of immobilized enzymes, several repeated rounds of lactose conversion at 30 ◦C using LysM-LacLMLreu- and LysM-LacZLbul-displaying cells and at 50 ◦C using LysM-LacZLbul-displaying cells were carried out with 600 mM initial lactose in 50 mM NaPB (pH 6.5) and constant agitation (500 rpm). The enzyme activity during these repeated cycles, with two intermediate washing steps, was measured using *o*NPG as the substrate under standard assay conditions.

#### *4.13. Lactose Conversion and Formation of Galacto-Oligosaccharides (GOS)*

The conversion of lactose was carried out in discontinuous mode using *L. plantarum* cells displaying β-galactosidase LacZ from *L. bulgaricus* (LysM-LacZLbul). The conversion was performed at 30 ◦C using 205 g/L initial lactose concentration in 50 mM NaPB (pH 6.5) and constant agitation (500 rpm). *L. plantarum* LysM-LacZLbul displaying cells were added to equivalent concentrations of 1.0 ULac/mL of reaction mixture. Samples were withdrawn at intervals, heated at 99 ◦C for 5 min and further analyzed for lactose, galactose, glucose and GOS present in the samples.
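The 205 g/L lactose used here corresponds to roughly the 600 mM stated for the activity and reusability assays. The sketch below shows the conversion, assuming the mass concentration refers to anhydrous lactose (C12H22O11, 342.30 g/mol); the text does not say whether the anhydrous or monohydrate form was weighed, so this basis is an assumption. The function name is hypothetical.

```python
LACTOSE_ANHYDROUS_G_PER_MOL = 342.30  # C12H22O11; assumed basis, not stated in the text


def g_per_l_to_mM(conc_g_per_l: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration (g/L) to a molar concentration (mM)."""
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    return conc_g_per_l / molar_mass_g_per_mol * 1000.0


if __name__ == "__main__":
    lactose_mM = g_per_l_to_mM(205.0, LACTOSE_ANHYDROUS_G_PER_MOL)
    print(f"205 g/L lactose is about {lactose_mM:.0f} mM")
```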

#### *4.14. Analysis of Carbohydrate Composition*

The carbohydrate composition in the reaction mixture was analyzed by high-performance liquid chromatography (HPLC) equipped with a Dionex ICS-5000+ system (Thermo Fisher Scientific) consisting of an ICS-5000+ dual pump (DP) and an electrochemical detector (ED). Separations were performed at room temperature on CarboPac PA-1 column (4 × 250 mm) connected to a CarboPac PA-1 guard column (4 × 50 mm) (Thermo Fisher Scientific) with flow rate 1 mL/min. All eluents A (150 mM NaOH), B (150 mM NaOH and 500 mM sodium acetate) and C (deionized water) were degassed by flushing with helium for 30 min. Separation of D-glucose, D-galactose, lactose and allolactose was carried out with a run with the following gradient: 90% C with 10% A for 45 min at 1.0 mL/min, followed by 5 min with 100% B. The concentration of saccharides was calculated by interpolation from external standards. Total GOS concentration was calculated by subtraction of the quantified saccharides (lactose, glucose, galactose) from the initial lactose concentration. The GOS yield (%) was defined as the percentage of GOS produced in the samples compared to initial lactose.
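The GOS quantification described above is a mass balance: total GOS is the initial lactose minus the quantified lactose, glucose and galactose, and the yield is that difference expressed as a percentage of the initial lactose. A minimal sketch follows; the function name is hypothetical and the sample concentrations in the example are invented for illustration, not taken from the results.

```python
def gos_by_difference(initial_lactose: float, lactose: float,
                      glucose: float, galactose: float):
    """Return (total GOS, GOS yield in %) by mass balance.

    All concentrations must share one unit (for example g/L).
    GOS = initial lactose - (residual lactose + glucose + galactose).
    """
    if initial_lactose <= 0:
        raise ValueError("initial lactose must be positive")
    gos = initial_lactose - (lactose + glucose + galactose)
    if gos < 0:
        raise ValueError("quantified sugars exceed the initial lactose")
    return gos, 100.0 * gos / initial_lactose


if __name__ == "__main__":
    # Invented sample values (g/L); only the 205 g/L initial lactose is from the text.
    gos, yield_pct = gos_by_difference(205.0, lactose=60.0, glucose=50.0, galactose=15.0)
    print(f"GOS: {gos:.1f} g/L, yield: {yield_pct:.1f}%")
```

Because GOS is obtained by difference, any error in the three quantified sugars carries directly into the GOS value.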

#### *4.15. Statistical Analysis*

All experiments and measurements were conducted at least in duplicate, and the standard deviation (SD) was always less than 5%. The data are expressed as the mean ± SD when appropriate.
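The reporting convention above can be sketched as follows. The sketch assumes that "less than 5%" refers to the SD relative to the mean and that the sample standard deviation (n − 1 denominator) is used; the text states neither explicitly. The function name and the duplicate values in the example are hypothetical.

```python
import statistics


def summarize_replicates(values):
    """Return (mean, sample SD, relative SD in %) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    if mean == 0:
        raise ValueError("relative SD is undefined for a zero mean")
    return mean, sd, 100.0 * sd / abs(mean)


if __name__ == "__main__":
    # Hypothetical duplicate measurement, for illustration only.
    mean, sd, rel = summarize_replicates([10.0, 10.4])
    print(f"{mean:.2f} ± {sd:.2f} (relative SD {rel:.1f}%)")
```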

#### **5. Conclusions**

This work describes the immobilization of two lactobacillal β-galactosidases, a β-galactosidase from *L. reuteri* of the heterodimeric LacLM-type and one from *L. bulgaricus* of the homodimeric LacZ-type, on the *Lactobacillus* cell surface using a peptidoglycan-binding motif as an anchor, in this case, the single LysM domain Lp\_3014 from *L. plantarum* WCFS1. The immobilized fusion LysM-β-galactosidases are catalytically efficient and can be reused for several repeated rounds of lactose conversion. Surface anchoring of β-galactosidases in *Lactobacillus* results in safe, non-GMO and stable biocatalysts that can be used in the applications for lactose conversion and production of prebiotic galacto-oligosaccharides.

**Author Contributions:** Conceptualization, G.M. and T.-H.N.; Data Curation, M.-L.P., G.M. and T.-H.N.; Investigation, M.-L.P., A.-M.T., S.K. and T.-T.N.; Methodology, M.-L.P. and T.-H.N.; Supervision, T.-H.N.; Writing-Original Draft Preparation, M.-L.P.; Writing-Review & Editing, T.-H.N.

**Funding:** M.-L.P. thanks the European Commission for the Erasmus Mundus scholarship under the ALFABET project. S.K. and A.-M.T. are thankful for the Ernst Mach—ASEA Uninet scholarships granted by the OeAD—Austrian Agency for International Cooperation in Education and Research and financed by the Austrian Federal Ministry of Science, Research and Economy. T.-H.N. acknowledges the support from the Austrian Science Fund (FWF Project V457-B22).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Bacillus subtilis* **Lipase A—Lipase or Esterase?**

#### **Paula Bracco 1, Nelleke van Midden 1, Epifanía Arango 1, Guzman Torrelo 1, Valerio Ferrario 2, Lucia Gardossi <sup>2</sup> and Ulf Hanefeld 1,\***


Received: 20 February 2020; Accepted: 5 March 2020; Published: 7 March 2020

**Abstract:** The question of how to distinguish between lipases and esterases is about as old as the definition of the subclassification is. Many different criteria have been proposed to this end, all indicative but not decisive. Here, the activity of lipases in dry organic solvents as a criterion is probed on a minimal α/β hydrolase fold enzyme, the *Bacillus subtilis* lipase A (BSLA), and compared to *Candida antarctica* lipase B (CALB), a proven lipase. Both hydrolases show activity in dry solvents and this proves BSLA to be a lipase. Overall, this demonstrates the value of this additional parameter to distinguish between lipases and esterases. Lipases tend to be active in dry organic solvents, while esterases are not active under these circumstances.

**Keywords:** hydrolase; lipase; esterase; *Bacillus subtilis* lipase A; transesterification; organic solvent; water activity

#### **1. Introduction**

Lipases and esterases both catalyze the hydrolysis of esters. This has led to the longstanding question: how can we distinguish between a lipase and an esterase? As the simple hydrolysis of an ester does not suffice, a range of different criteria has been suggested [1–5]. (1) The oldest distinction is the kinetic and structural criterion of interfacial activation, which was already described in 1936 [6]. However, several lipases do not fulfill this; in particular, the much-used *Candida antarctica* lipase B (CALB) does not [7]. (2) Directly linked to the interfacial activation is the lid that covers the active site of many lipases and, via a conformational change, makes the active site more accessible once an interface is present. Again, this is not the case for all lipases [1–3,7,8]. (3) Primary sequence data were shown not to be distinctive enough [2]. (4) Substrates and inhibitors, such as Orlistat, can be utilized to distinguish between esterases and lipases but, again, they are not precise. However, the different substrate ranges are indicative. Esterases tend to be capable of the hydrolysis of water-soluble esters and, in general, short and/or branched side chain esters, while lipases hydrolyze triglycerides, apolar esters, substituted with linear side chains, as well as waxes. This is seen as a reliable but not decisive criterion [2,9]. (5) The activity of the enzyme in the presence of (water-miscible) organic solvents has been proposed as a property of lipases, but other enzymes fulfill this criterion, too [2,10–14]. (6) A parameter already investigated some time ago is the activity of lipases in the absence of water, i.e., in modestly polar, water-non-miscible solvents at very low water activities (aw). Out of all enzymes tested, only lipases and the closely related cutinases are active at low aw [1–5,10,15–19]. While not all lipases display this property, it is highly distinctive [20,21].

To probe whether aw is indeed a suitable parameter to distinguish between lipases and esterases and between lipases and other hydrolases in general, we studied the behavior of *Bacillus subtilis* lipase A (BSLA) [9]. BSLA is a small (181 amino acids, 19 kDa) serine hydrolase (Figure 1). It is neither interfacially activated nor does it have a lid (criteria one and two) [9,22–24] and sequence data are not conclusive, but it is a minimal α/β hydrolase fold enzyme [9,23,25]. The substrate range clearly qualifies BSLA as a lipase, as does the stability in the presence of solvents [9,26–29]. This stability has even been significantly improved in recent mutational studies and BSLA mutants can be very stable in the presence of water-miscible solvents, such as dimethyl sulfoxide (DMSO), dioxane and trifluoroethanol [30,31]. Studies on BSLA in dry organic solvents are, however, missing. As an experimental parameter, we demonstrate the activity of BSLA in dry toluene. Toluene is not water-miscible and has a logP of 2.5 [32]. It is commonly used in organic synthesis and is highly suitable for lipases and also other enzymes with an α/β hydrolase fold. To date, only lipases were shown to be active in toluene with a very low aw [1,3].

**Figure 1.** *Bacillus subtilis* lipase A (BSLA) is the smallest serine hydrolase with an α/β hydrolase fold. With only 181 amino acids, it has a molecular weight of 19 kDa. The depicted BSLA structure is pdb 1R50 and the catalytic triad His156, Ser77 and Asp133 and the oxyanion hole Ile12 and Met78 are highlighted. The figure was created with PyMOL.

Additionally, we extend the structural assignment of the hydrolase character with the GRID-based (Fortran program [33]) Global Positioning System in Biological Space (BioGPS) investigation [33]. BioGPS utilizes surface shape and polarity as criteria. It is based neither on direct sequence comparison nor on structure superimposition [33,34]. Earlier studies with this method had placed CALB in both the esterase and lipase group. CALB works extremely well in dry solvents and is, therefore, often applied in reactions that require these conditions, such as dynamic kinetic resolutions [1,3,7,35]. On the other hand, it lacks interfacial activation, and major conformational changes do not take place when CALB comes into contact with an apolar second phase (see above). As such, BioGPS recognized the ambivalence in the assignment of CALB as a lipase well.

Here, we describe the investigation of BSLA by BioGPS and a comparison to other lipases, in particular CALB. We also probe the lipase character of both BSLA and CALB at different aw. In this manner, new experimental and computational criteria for the esterases and lipases are introduced and investigated.

#### **2. Results**

#### *2.1. BioGPS*

BioGPS descriptors can be utilized to explore enzyme active site properties and to group them according to their similarities and differences. As such, they can help to explore promiscuous activities. In an earlier study, the character of CALB was investigated in a set of 42 serine hydrolases. The set contained 11 amidases, nine proteases, 11 esterases and 11 lipases, one of them being CALB [33]. Here, we expand this set with BSLA, utilizing the pdb 1R50 with a resolution of 1.4 Å for the structural information (Table S1). Three probes were used to map specific electrostatic and geometrical active site properties. The O-probe evaluates the H-bond donor properties of the enzymes; the N1 probe, on the contrary, evaluates the H-bond acceptor properties and the DRY probe evaluates the hydrophobic interactions [33]. The DRY probe is clearly of special importance for enzymes that accept hydrophobic substrates, as is the case for lipases.

Considering each property separately, the O-probe located BSLA (pdb 1R50) not among the lipases, but in the amidases cluster, together with a number of esterases (Figure S1a). Equally, the N1-probe (Figure S1b) placed BSLA among the amidases. The DRY probe (Figure S1c), again, placed BSLA amongst the amidases and esterases. This is, in all cases, in contrast to CALB (pdb 1TCA) but it should also be noted that *Candida rugosa* lipase (CRL), a classic lipase with a prominent movement of the lid (criteria one and two) is also always outside the lipase cluster in the different analyses. The previous study ascribed this behavior to the lower hydrophobic nature of the active site of CRL (pdb 1CRL) when compared to the other lipases [8,33].

In the global score, which considers all the mapped properties of the BioGPS together, BSLA can be found firmly among the amidases and esterases (Figure 2), while CALB is in the lipase cluster in the area overlapping with the esterase cluster. CRL, again, is outside the lipase cluster and indeed seems to take a separate position.

**Figure 2.** BioGPS of 43 serine hydrolases, for BSLA the data of pdb 1R50 were utilized (global score). Each analyzed enzyme structure is placed within a multidimensional space. Relative distances between each enzyme and all the other enzymes are determined by a statistical principal component analysis. The pdb codes of the processed enzyme structures are indicated in different colors according to their class: lipases in blue, amidases in red, proteases in cyan and esterases in green; the BSLA structure is in black.

BSLA is, according to its substrate range, very clearly a lipase and not an esterase. Amidase activity has to date not been reported for BSLA. While initially surprising, these results also indicate that the study should be extended with an activity assay for amidases.

#### *2.2. Amidase Activity*

To probe for amidase activity in BSLA, an amidase activity assay is utilized (Scheme 1). This assay employs benzyl chloroacetamide as standard amide. The released amine reacts with 4-nitro-7-chloro-benzo-2-oxa-1,3-diazole, yielding an adduct that can directly be quantified spectrophotometrically at 475 nm [36].

**Scheme 1.** Amidase activity assay [36]. The assay can be quantified spectrophotometrically. BSLA showed no activity in this assay, ruling out amidase activity.

BSLA showed no activity in this assay. A control experiment with another serine hydrolase, the acyltransferase from *Mycobacterium smegmatis* (*Ms*Act), was performed. This enzyme is an acyltransferase [37,38] and displays promiscuous amidase activity [39,40]. *Ms*Act exhibited activity in this 24 h assay (> 20% conversion of the 5 mM substrate), showing that even minor, promiscuous activities are detectable. This rules out amidase activity for BSLA and supports the earlier assignment of the enzyme as a lipase.

#### *2.3. BSLA Activity in Dry Organic Solvents*

To probe the activity of BSLA at low aw, toluene was used as the solvent and the transesterification of 1-octanol with vinyl acetate was performed as a test reaction (Scheme 2). 1-Octanol, a long-chain aliphatic compound, is a good substrate for lipases [1–5,9], and vinyl acetate is a readily available and widely utilized acyl donor in lipase-catalyzed acylation reactions [1,3,41,42]. All reactions were performed with lyophilized BSLA. In parallel, CALB was also tested to ensure direct comparability with one of the most-used lipases. CALB was utilized both as lyophilized enzyme and immobilized as Novozym 435. The latter preparation is most commonly employed, both in the laboratory and on industrial scale [43].

**Scheme 2.** Test reaction for the activity of BSLA at low aw. The reaction was performed in toluene at 30 ◦C, with a ratio of 1-octanol to vinyl acetate of 1:5 and aw < 0.1, 0.23 and 0.75.

BSLA and CALB were produced by expressing the codon-optimized genes in *E. coli* BL21 (DE3) within pET22b. Subsequent purification gave both enzymes in good purity (Figure 3). With this expression system, neither enzyme is glycosylated. The CALB in Novozym 435, produced and immobilized by Novozymes, however, is expressed in *Aspergillus oryzae* and is, therefore, glycosylated [44].

**Figure 3.** Sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) gels of purified BSLA (19 kDa) and *Candida antarctica* lipase B (CALB) (33 kDa).

Three different aw were tested: aw < 0.1, to establish whether BSLA shows the activity in dry solvent that is only observed for lipases; aw = 0.23, a low value at which most enzymes lose all their activity; and aw = 0.75, a value at which most enzymes are active [10,21,45,46]. To rigorously ascertain these values, the solvent and reagents, including the internal standard decane, and the enzyme preparations were equilibrated via the vapor phase with dried molecular sieves (activated at elevated temperatures, 5 Å) for aw < 0.1 [47]. For the other aw, the enzyme preparations and the other components were equilibrated via the gas phase with an oversaturated solution of potassium acetate (aw = 0.23) or sodium chloride (aw = 0.75) [48–52]. For all components, the water content was determined by Karl Fischer titration and equilibrations were considered complete when no further changes were observed (24–48 h, Table 1). As vinyl acetate was found to negatively affect the Karl Fischer titration, it was freshly distilled and dried with activated molecular sieves for 16 h before use. The activity of the different enzyme preparations was also followed with the tributyrin and *p*-nitrophenol acetate activity assays [2,5,53–56] during equilibration, to establish optimal equilibration times. For BSLA, a small loss of activity over time was observed, while both CALB preparations were stable.

**Table 1.** Equilibration to different aw via vapor phase over a saturated solution of salt [47,51] and via the salt pair method [50]. All reaction components, except the acyl donor, were mixed and equilibrated overnight at 30 ◦C. Finally, dried and freshly distilled vinyl acetate was added in order to start the reaction. The water content was determined by Karl Fischer Titration after 48 h.


a) Not applicable (NA).

Once reagents and enzymes were equilibrated, the reactions were performed with 100 mM 1-octanol and 500 mM vinyl acetate in previously equilibrated toluene at 30 ◦C and 1000 rpm (Figure 4). Equal activity of the enzymes (Units) was utilized as determined with the tributyrin activity assay. CALB and, in particular, the well-established commercial preparation of CALB, Novozym 435, performed very well. In both cases, full conversion to 1-octyl acetate was observed. In comparison, BSLA displayed lower conversions (Figure 4). However, the key indicator for a lipase is its activity at low aw. Here, BSLA and Novozym 435 performed best. For the synthesis of 1-octyl acetate, the trend is a reduction in specific rate at higher aw (Figure 5). BSLA is very active in dry solvent, as is Novozym 435. Both display lower activities at higher aw. CALB does not follow this trend.

**Figure 4.** Activity of BSLA, CALB and Novozym 435 in toluene with different aw. U = μmol butyric acid × min<sup>−1</sup> in tributyrin activity assay, 0.5–1.2 U of catalyst, 1-octanol (100 mM), vinyl acetate (5 eq.), ISTD: Decane (500 mM), 1 mL reaction volume, 24 h, 30 ◦C and 1000 rpm. Blanks were performed in the absence of enzyme and showed no conversion. Final conversions are given as inset; the color corresponds to the aw.

**Figure 5.** Activity of BSLA, CALB and Novozym 435 in toluene with different aw. Reaction conditions: 0.5–1.2 U of catalyst, 100 mM 1-octanol, 500 mM vinyl acetate, 500 mM decane (ISTD), in dry toluene (1 mL reaction) at 30 ◦C and 1000 rpm. U: μmol butyric acid × min<sup>−1</sup>. Blanks were performed in the absence of enzyme and showed no conversion.

In an earlier study, it had been demonstrated, for different CALB preparations, that this change in activity in the synthesis reaction to 1-octyl acetate can be due to the hydrolysis of the acyl donor vinyl acetate [47]. Therefore, the synthesis reaction at aw < 0.1 was repeated for BSLA with a 1-octanol to vinyl acetate ratio of 1:1 (Figure 6). Almost the same rate and conversion were observed as with the 1:5 ratio, indicating that, at this low aw, essentially no hydrolysis occurred, as was the case for Novozym 435, as reported earlier. Overall, these differences in performance at altered aw can be ascribed to several influences [47,57,58]. Novozym 435 is an immobilized enzyme and its high activity can be linked to the dispersion of the enzyme on a large surface, promoting its mass transfer and preventing particle aggregation. In contrast, the lyophilized enzymes have a reduced accessibility of the individual enzymes in the preparation. Furthermore, it is well established that immobilized enzymes are better protected against the acetaldehyde that is a side product of the acylation reaction [59]. A difference in susceptibility to acetaldehyde-induced deactivation might also cause the alterations in rate between the two pure enzymes. Similarly, ionization and water clustering can influence the activity [60,61], leading to these alterations. To demonstrate that the observed effect is general, the experiments were repeated, but this time with BSLA that was dried by co-lyophilization with a salt to establish the desired aw [62,63]. The enzyme is now in a different environment and two different aw were established, < 0.1 and 0.57. At < 0.1, very similar results were obtained. Equally, at higher aw, the ester formation slowed down as before, but could be restarted by adding additional vinyl acetate (Figure 7).

**Figure 6.** Activity of BSLA, toluene at aw < 0.1. Reaction conditions: 0.5–1.2 U of catalyst, 100 mM 1-octanol, 100 mM or 500 mM vinyl acetate, 500 mM decane (ISTD), in dry toluene (1 mL reaction) at 30 ◦C and 1000 rpm. U: μmol butyric acid × min<sup>−1</sup>. Blanks were performed in the absence of enzyme and showed no conversion.

**Figure 7.** Activity of BSLA co-lyophilized with the appropriate salt, toluene at aw < 0.1 or 0.57. Reaction conditions: 0.5–1.2 U of catalyst, 100 mM 1-octanol, 100 mM vinyl acetate, 500 mM decane (ISTD), in dry toluene (1 mL reaction) at 30 ◦C and 1000 rpm. U: μmol butyric acid × min<sup>−1</sup>. Blanks were performed in the absence of enzyme, i.e., in the presence of salt, and showed no conversion. After 8 h (480 min) an additional equivalent of vinyl acetate was added.

To confirm this activity of BSLA (equilibrated via the gas phase) in dry toluene as a general property, the reaction was repeated in dry methyl-t-butyl ether (MTBE) at the same low aw < 0.1. Enzymes display the same activity in organic solvents when these have the same aw [64]. Indeed, the BSLA-catalyzed esterification displayed a very similar reaction progress in MTBE and toluene (Figure 8). This confirms the activity of BSLA at low aw, in line with the earlier observed catalytic activity of CALB at low aw [47] and of *Rhizomucor miehei* lipase at very low aw [65].

**Figure 8.** Activity of BSLA in MTBE and toluene at aw < 0.1. Reaction conditions: 0.5–1.2 U of catalyst, 100 mM 1-octanol, 500 mM vinyl acetate, 500 mM decane (ISTD), in dry solvent (1 mL reaction) at 30 ◦C and 1000 rpm. U: μmol butyric acid × min<sup>−1</sup>. Blanks were performed in the absence of enzyme and showed no conversion.

#### **3. Discussion**

Interestingly, the BioGPS analysis seems to identify features of the BSLA active site that are shared with amidases. In particular, BSLA seems to share similar H-bond capabilities with amidases, as evidenced by the single-probe clustering. The possible promiscuous amidase activity of BSLA was probed with an amidase activity assay (Scheme 1) [36]. This revealed a complete absence of amidase activity. While indicative, this is not conclusive, as this might also be due to substrate specificity. Amidases are characterized by a developed network of H-bond acceptors and donors as described in previous work [33]. The aromatic moiety of the substrate molecule might thus prevent a good interaction with such an H-bond/hydrophilic network. More generally, for amidases, the necessity of a hydrogen bond network that stabilizes the NH hydrogen to suppress its deprotonation was reported earlier [66]. Given the very open active site of BSLA, the minimal serine hydrolase with an α/β hydrolase fold, it is not entirely surprising that this type of hydrogen bond network has never been described for this enzyme.

The test of aw as parameter for the assignment of a serine hydrolase as lipase gave conclusive results. BSLA and CALB displayed good activity at low aw. In line with earlier results, the synthetic catalytic activity of CALB varies depending on the preparation. Earlier studies had shown that the observed synthetic catalytic activity competes with the hydrolytic activity, that is to say, the parallel hydrolysis of vinyl acetate [47]. This can lead to an apparent decrease in synthetic activity, as is observed for Novozym 435 and BSLA at higher aw (Figure 5). This trend has already been reported for Novozym 435 [47]. Just like Novozym 435 [47], BSLA displays essentially no hydrolysis of vinyl acetate at low aw < 0.1. This is confirmed in experiments with a ratio of 1:1 of alcohol to vinyl acetate; a similar rate of synthesis was observed (Figures 6 and 7). The fact that free CALB displays higher synthetic rates at higher aw is also in line with the literature [47]. It had earlier been demonstrated for CALB that the ratio of synthesis to hydrolysis depends on the preparation of the enzyme used and that it increases with aw for purified, free CALB [47]. The activity of BSLA at low aw was proven also with a different solvent, MTBE (Figure 8).

BioGPS is a complementary computational tool to investigate the character of an enzyme and delivers a useful input to help us explore the scope of an enzyme more thoroughly. The parameter aw is an indicative tool to determine whether an enzyme is a lipase or an esterase. Just like the substrate scope, it is not absolute, but is highly indicative. Essentially, a serine hydrolase that is active at low aw is a lipase and not an esterase, while the reverse statement is not valid. Or, as it was recently summarized: "This long-standing and biased question could be compared to the search for differences between humans and mammals, which implicitly means that one does not consider humans as mammals! Obviously, lipases are a special kind of esterases like humans are a special kind of mammals." [2].

#### **4. Materials and Methods**

#### *4.1. Materials*

#### Chemicals and Enzymes

1-Propanol, 1-octanol, toluene (extra dry), decane, *p*-nitrophenyl butyrate, *p*-nitrophenol, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), tributyrin, 2-methyl-2-propanol, butyric acid and caprylic acid were purchased from Sigma-Aldrich (Schnelldorf, Germany) and Acros (Geel, Belgium), and used without further purification. Vinyl acetate was purchased from Sigma-Aldrich and distilled before use. Novozym 435 (immobilized lipase B from *Candida antarctica*) was made available by Novozymes (Bagsværd, Denmark). Bovine serum albumin and lysozyme from chicken egg white were purchased from Sigma-Aldrich. Bradford reagent was purchased from Bio-Rad (Hercules, C.A., USA). Medium and buffer components were purchased from BD, Merck (Darmstadt, Germany) or J.T. Baker (Geel, Belgium).

#### Strains and Plasmids

Strains *Escherichia coli* (*E. coli*) HB2151 and *E. coli* HB2151 pCANTAB 5E bsla were kindly provided by Prof. Bauke Dijkstra and Prof. Wim Quax, University of Groningen, the Netherlands. Strains *Escherichia coli* (*E. coli*) BL21 (DE3), *E. coli* TOP10 and plasmid pET22b(+) were utilized for all further work.

#### *4.2. Methods*

#### Cloning pET22bbsla and pET22bcalb

The gene of BSLA (as confirmed by sequencing ID: CP011115.1, range from 292296 to 292841, protein AKCA5803.1) was amplified by PCR from vector pCANTAB 5E BSLA using primers BSLA F: 5′-CCTTTCTATGCGGCCCAGC-3′ and BSLA + XhoI R: 5′-CCGCTCGAGCGCCTTCGTATTCTGG-3′. Thereby, restriction site XhoI was introduced for subsequent cloning of BSLA (NcoI, XhoI) into vector pET22b+ (in frame with pelB and His-tag signals). The resulting vector was named pET22bBSLA. The wild type CALB was synthesized by BaseClear (Leiden, The Netherlands). The codon-optimized genes were cloned into pET22b(+) using the NcoI and NotI restriction sites as previously described [67], in order to be in frame with the pelB sequence and a C-terminal His-tag of the plasmid.

#### Expression and Purification of BSLA

This protocol was adapted from [24]. A freshly grown colony of *E. coli* HB2151 pCANTAB 5E BSLA was used to inoculate a 1 L shake flask containing 100 mL of 2xTY medium (1.6% *w*/*v* bactotryptone, 1% *w*/*v* bacto yeast extract and 0.5% *w*/*v* sodium chloride), ampicillin (100 μg/mL final concentration) and isopropyl-β-D-thiogalactopyranoside (IPTG, 1 mM final concentration). After 16 h at 28 ◦C and 150 rpm (Innova Incubator, Hamburg, Germany) the cells were harvested, washed with 10 mM Tris buffer pH 7.4 and stored at −20 ◦C. The periplasm isolation protocol was adapted from [67] and consisted of the resuspension of the overexpressed cells in 1 mL of 10 mM Tris buffer pH 8.0 containing sucrose (25% *w*/*v*), EDTA (2 mM) and lysozyme (0.5 mg/mL). After incubation on ice for 20 min, 250 μL of 10 mM Tris buffer pH 8.0 containing sucrose (20% *w*/*v*) and MgCl2 (125 mM) was added. The suspension was centrifuged and the supernatant containing the periplasmic fraction was desalted using a PD10 column (GE Healthcare, New York, N.Y., USA) to 100 mM potassium phosphate buffer pH 7.4. Afterwards, the solution was shock-frozen with liquid nitrogen and stored at −20 ◦C for future biocatalysis applications. BSLA was purified mainly from the media. First, proteins were precipitated by adding 50% *v*/*v* saturated ammonium sulphate (2.8 M final concentration) for 5 h at 4 ◦C. After centrifugation, the solid fraction was dissolved in 100 mL of 100 mM potassium phosphate buffer pH 7.4, filtered through a 0.45 μm filter and loaded onto a previously equilibrated 5 mL HisTrap column (GE Healthcare) using an NGC chromatography system (Bio-Rad, Hercules, C.A., USA). The loaded proteins were washed with equilibration buffer, potassium phosphate (100 mM, pH 7.4) containing 500 mM NaCl and 20 mM imidazole. The His-tagged BSLA was eluted with a linear gradient from 0–100% potassium phosphate (100 mM, pH 7.4) containing 500 mM NaCl and 500 mM imidazole.
The progress of the purification was monitored at 280 nm. Fractions containing the target protein (as confirmed by SDS-PAGE and activity assay, 50–60% of the gradient) were combined, concentrated and desalted with a PD-10 column (GE Healthcare) to potassium phosphate buffer (100 mM, pH 7.4). The purified enzyme (68–148 μg/L medium) was aliquoted (2.5 U/vial, units determined by tributyrin assay) and freeze dried for 16 h, −80 ◦C and stored at −20 ◦C under nitrogen atmosphere.
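As an illustration of where the target fractions elute, the imidazole concentration along the linear gradient can be estimated as follows. This is a minimal sketch that assumes the gradient starts from the wash buffer (20 mM imidazole) and ends at the elution buffer (500 mM imidazole); the function name is hypothetical.

```python
def imidazole_mM(percent_b, start_mM=20.0, end_mM=500.0):
    """Imidazole concentration at a given % of elution buffer in a linear gradient.

    Assumes 0% corresponds to the wash buffer (20 mM imidazole) and
    100% to the elution buffer (500 mM imidazole).
    """
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must be between 0 and 100")
    return start_mM + (end_mM - start_mM) * percent_b / 100.0


# The target protein was found at 50-60% of the gradient.
for pct in (50, 60):
    print(f"{pct}% of gradient -> {imidazole_mM(pct):.0f} mM imidazole")
```

Under these assumptions, the BSLA-containing fractions correspond to roughly 260–308 mM imidazole.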

Protein Sequence of BSLA-His (AKCA5803.1):

MAAEHNPVVMVHGIGGASFNFAGIKSYLVSQGWSRDKLYAVDFWDKTGTNYNNGPVLSR FVQKVLDETGAKKVDIVAHSMGGANTLYYIKNLDGGNKVANVVTLGGANRLTTGKALPGTDP NQKILYTSIYSSADMIVMNYLSRLDGARNVQIHGVGHIGLLYSSQVNSLIKEGLNGGGQNTKALEH HHHHH

#### Expression and Purification of CALB

An LB-Amp plate (100 μg/mL) was used to freshly grow *E. coli* BL21 (DE3) pET22bCALB from a −80 ◦C DMSO stock. After incubation at 37 ◦C for 16 h, a single colony was used to inoculate a 5 mL LB-Amp (100 μg/mL) preculture and grown for 8 h at the same temperature. Large-scale expressions were carried out in 0.5 L of ZYM-5052 media (placed in 2 L shake flask), 2% *v*/*v* of the preculture was used for inoculation. After 17 h expression at 22 ◦C and 170 rpm, an optical density (600 nm) of approximately 3 was obtained in all cases. Afterwards, cells were spun down, washed with 10 mM potassium phosphate buffer pH 7.4 and stored at −20 ◦C. ZYM-5052 medium [68]: The main cultures were grown in ZYM-5052 medium containing 50 mL 50xM (Na2HPO4·12H2O 448 g/L, KH2PO4 170 g/L, NH4Cl 134 g/L, Na2SO4 35.5 g/L), 20 mL 50x5052 (100 g/L α-D-lactose, 250 g/L glycerol and 25 g/L glucose dissolved in ddH2O) and 2 mL of MgSO4 solution (1 M in ddH2O) and filled to 1 L with ZY medium (casamino acids 10 g/L—tryptone in this case—yeast extract 5 g/L). Additionally, 0.2 mL of trace element solution was added to the media. The purification of His-tagged CALB was performed from the periplasmic fraction, as described for BSLA in the previous sections.

Protein Sequence of CALB-His (Sequence ID: 4K6G\_A):

MALPSGSDPAFSQPKSVLDAGLTCQGASPSSVSKPILLVPGTGTTGPQSFDSNWIPLSTQLGYTPC WISPPPFMLNDTQVNTEYMVNAITALYAGSGNNKLPVLTWSQGGLVAQWGLTFFPSIRSKVDRLMA FAPDYKGTVLAGPLDALAVSAPSVWQQTTGSALTTALRNAGGLTQIVPTTNLYSATDEIVQPQVSNS PLDSSYLFNGKNVQAQAVCGPLFVIDHAGSLTSQFSYVVGRSALRSTTGQARSADYGITDCNPLPAN DLTPEQKVAAAALLAPAAAAIVAGPKQNCEPDLMPYARPFAVGKRTCSGIVTPAAALEHHHHHH

#### Bradford Assay

Total protein concentration was determined using Bradford reagent in a microtiter plate (MTP) reader format (96-well plates) [69]. Appropriately diluted samples were mixed with Bradford reagent (5x), incubated at room temperature (RT) for 5 min and the absorbance measured at 595 nm (in triplicate). The calibration curve was prepared using bovine serum albumin as the standard.
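The evaluation of such a calibration can be sketched as a simple linear fit. The standard concentrations and absorbance values below are hypothetical and serve only to illustrate the calculation; real Bradford curves are linear only over a limited range.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def protein_conc(a595, slope, intercept, dilution_factor=1.0):
    """Protein concentration (mg/mL) of the undiluted sample."""
    return (a595 - intercept) / slope * dilution_factor


# Hypothetical BSA standards (mg/mL) and mean A595 of triplicates.
bsa_mg_per_ml = [0.0, 0.125, 0.25, 0.5, 0.75, 1.0]
a595 = [0.05, 0.125, 0.20, 0.35, 0.50, 0.65]

slope, intercept = fit_line(bsa_mg_per_ml, a595)

# Hypothetical sample: A595 of 0.41 measured after a 10-fold dilution.
sample = protein_conc(0.41, slope, intercept, dilution_factor=10.0)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, sample={sample:.2f} mg/mL")
```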

#### Lipase Activity: Tributyrin Assay

A tributyrin assay for determining lipase activity was performed according to the literature [54]. The assay is based on pH change by acid formation when tributyrin is hydrolyzed by the enzyme. *p*-Nitrophenol was used as pH indicator (colorless at pH 5.5 and yellow at pH 7.5) and the acid concentration was determined by a calibration curve with known amounts of butyric acid (from 0 mM to 40 mM). A negative control was performed by adding buffer instead of an enzyme sample. The substrate consumption (0.8 mM initial concentration) was monitored at 410 nm, 30 ◦C for 15 min, every 38 s by a microtiter plate reader (in 96-well plates, Synergy 2, BioTek, Winooski, V.T., USA). Plates were shaken for 5 s before every read. The different buffers needed for this assay contained 2.5 mM 3-(*N*-morpholino)propanesulfonic acid (MOPS) (pH 7.2), CHAPS (to dissolve acids) and β-cyclodextrin (to dissolve acids into the solution, to increase the linearity). The activity was determined in U, which is equivalent to μmol acid formed per minute. The assays were done in triplicate. For performing this assay with immobilized enzymes, a larger scale (3 mL) in glass vials with a magnetic stirrer was applied. These were placed on a stirring platform and, for Novozym 435, samples (120 μL) were taken over time and placed in a 96-well plate. If desired, the assay can also be performed with Trioctanoin.

#### Esterase/Lipase Activity: *p*-Nitrophenol Assay

This protocol was adapted from [53] to an MTP reader equipped for 96 well plates. The enzymatic hydrolysis of *p*-nitrophenyl butyrate with the concomitant formation of *p*-nitrophenol was monitored at 405 nm, 37 ◦C and recorded for 30 min. For this, a calibration curve of *p*-nitrophenol in potassium phosphate buffer (100 mM, pH 7.4) was prepared (levels from 0-500 μM, 200 μL total volume, in triplicate) and control reactions without enzyme extracts were performed. Lyophilized cell-free extract or pure enzymes were re-dissolved in potassium phosphate buffer (100 mM, pH 7.4) (approximately 20–30 mg/mL) and proper dilutions were added into a preheated potassium phosphate buffer (100 mM, pH 7.4) solution containing 3 mM *p*-nitrophenyl butyrate. The esterase activity measured was corrected by subtracting the activity observed in the controls (no enzyme). By definition, one unit of enzyme (U) is equivalent to 1 μmol of *p*-nitrophenol formed per minute.
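The conversion of a measured absorbance slope into units can be sketched as follows. The slopes and the calibration factor are hypothetical, and the well volume of 200 μL is assumed to equal the volume stated for the calibration curve.

```python
def activity_units(sample_slope, blank_slope, cal_slope_per_uM, well_volume_uL=200.0):
    """Activity in U (μmol p-nitrophenol formed per minute) in one well.

    sample_slope, blank_slope: absorbance change per minute at 405 nm
    cal_slope_per_uM: absorbance per μM p-nitrophenol from the calibration curve
    """
    if cal_slope_per_uM <= 0:
        raise ValueError("calibration slope must be positive")
    # Subtract the no-enzyme control, then convert absorbance/min to μM/min.
    rate_uM_per_min = (sample_slope - blank_slope) / cal_slope_per_uM
    # μM (= μmol/L) times volume in L gives μmol.
    return rate_uM_per_min * well_volume_uL * 1e-6


# Hypothetical readings: 0.060 A/min (sample), 0.004 A/min (blank), 0.008 A/μM.
u_in_well = activity_units(0.060, 0.004, 0.008)
print(f"{u_in_well * 1000:.2f} mU per well")
```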

#### Karl Fischer Titration

A Metrohm KF Coulometer Karl Fischer titration setup was used, according to the manufacturer's instructions, to determine the water content in ppm. Samples (100 μL) were taken from the solvent and injected into the system in duplicate. In general, master mixes of toluene after equilibration with aw < 0.1, aw 0.23 and aw 0.75 contained 20, 120 and 360 ppm respectively. A deviation of 5–10 ppm per sample was observed.

#### Equilibration of Solvents/Enzymes to the Desired aw

All materials, reagents, enzymes and solvents used for the biocatalytic reactions were carefully dried and kept under nitrogen atmosphere with molecular sieves (5 Å) at all times. In all cases, the water content was monitored by Karl Fischer titration and as standard parameter compounds with a water content below 100 ppm were considered dry and suitable for the reaction.

#### Vapor Phase Method

Oversaturated salt solutions and activated molecular sieves were used to equilibrate the solvents and enzymes needed in the transesterification reaction with a desired water activity [47–50]. In the case of working with dry systems, a master mix was prepared including solvent, substrates (without vinyl acetate) and internal standard, all components previously dried with activated molecular sieves achieving aw < 0.1. Lyophilized BSLA, Novozym 435 and CALB were dried over silica in desiccators under a vacuum at room temperature (20–25 ◦C) for 24, 48 and 24 h, respectively. In order to achieve higher water activities, the master mix and the enzymes were equilibrated over saturated salt solutions of potassium acetate (KAc) and sodium chloride (NaCl) at 30 ◦C for 48 h, resulting in aw 0.23 and aw 0.75 at 30 ◦C, respectively [51]. As exceptions, BSLA and free CALB were equilibrated for a shorter period of only 24 h.

#### Salt Pairs Method

The protocol was adapted from [62]. The enzymes were lyophilized with anhydrous salts (Na2HPO4 or NaAc) in a ratio 1:99 (3 mg pure BSLA or CALB enzyme and 297 mg of the respective salt). For the background reaction (no-enzyme), only lyophilized salts were added. An amount of 10 mg of the co-lyophilized enzyme was added under a nitrogen atmosphere to the previously dried reaction components (except vinyl acetate) and a specific amount of water was introduced under the nitrogen atmosphere. The moles of water added to the reaction mixture were calculated in order to generate the couple of hepta- and dihydrated phosphates in the case of Na2HPO4 (5 moles of water per mol of salt, aw ~ 0.57) and the couple of tri- and anhydrous acetate in the case of NaAc (1.5 moles of water per mol of salt, aw ~ 0.25) [50]. After overnight equilibration, the last substrate was added (freshly distilled and dry vinyl acetate) to begin the reaction. In the case of the dry system, no water was added (aw < 0.1).
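The amount of water to be added per reaction follows directly from the stated stoichiometry. The sketch below assumes that the 10 mg of co-lyophilized material consists of 99% salt by mass (ratio 1:99) and uses the molar masses of the anhydrous salts; the function and dictionary names are hypothetical.

```python
WATER_MW = 18.015  # g/mol

# Molar mass of the anhydrous salt and moles of water per mole of salt (from the text).
SALTS = {
    "Na2HPO4": {"mw": 141.96, "mol_water_per_mol_salt": 5.0},
    "NaAc": {"mw": 82.03, "mol_water_per_mol_salt": 1.5},
}


def water_to_add(salt, mix_mg=10.0, salt_mass_fraction=0.99):
    """Return (μmol water, mg water) for a given mass of co-lyophilized enzyme/salt."""
    props = SALTS[salt]
    salt_umol = mix_mg * salt_mass_fraction / props["mw"] * 1000.0  # mg/(g/mol) = mmol
    water_umol = salt_umol * props["mol_water_per_mol_salt"]
    water_mg = water_umol * WATER_MW / 1000.0
    return water_umol, water_mg


for name in SALTS:
    umol, mg = water_to_add(name)
    print(f"{name}: {umol:.0f} μmol water, about {mg:.2f} mg")
```

Under these assumptions, about 6.3 mg of water is needed for the Na2HPO4 pair and about 3.3 mg for the NaAc pair.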

#### Transesterification Catalyzed by Lipases in Organic Solvents under Fixed Water Activities

The protocol was adapted from [47]. The reaction conditions included substrates 1-propanol, 1-octanol, 2-octanol or benzyl alcohol (100 mM), freshly distilled vinyl acetate (1 or 5 equiv. with respect to the initial substrate concentration) and decane (ISTD, 500 mM final concentration). A range of 0.5–1 U of purified enzymes or 1 mg of immobilized Novozym 435 was tested as catalyst. Toluene or methyl-*t*-butyl ether was used as medium (1 mL total volume in GC airtight vials). Reactions were carried out for 25 h at 30 ◦C and 1000 rpm (thermoblock Eppendorf, Hamburg, Germany). Negative controls were run for both substrates in the absence of enzyme. All reactions were performed in duplicate and monitored over time by gas chromatography.
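One common way to evaluate such GC data with an internal standard is sketched below. The peak areas and the relative response factor are hypothetical; the response factor would have to be determined from a calibration with authentic product, and the source does not state how the conversion was actually calculated.

```python
def product_concentration_mM(area_product, area_istd, istd_mM=500.0, response_factor=1.0):
    """Product concentration from peak areas relative to the internal standard.

    response_factor: (area ratio) / (concentration ratio) of product vs. ISTD,
    a hypothetical calibration value in this sketch.
    """
    if area_istd <= 0 or response_factor <= 0:
        raise ValueError("ISTD area and response factor must be positive")
    return (area_product / area_istd) * istd_mM / response_factor


def conversion_percent(product_mM, initial_substrate_mM=100.0):
    """Conversion relative to the initial alcohol concentration (100 mM)."""
    return 100.0 * product_mM / initial_substrate_mM


# Hypothetical chromatogram: ester peak 12000, decane peak 300000, response factor 0.8.
ester_mM = product_concentration_mM(12000, 300000, response_factor=0.8)
print(f"{ester_mM:.1f} mM ester, {conversion_percent(ester_mM):.1f}% conversion")
```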

#### Analytics: GC and GC-MS

Gas chromatography (GC) and gas chromatography-mass spectrometry (GC-MS) methods were adapted from [47]. Samples were injected in a gas chromatograph (GC-2014, Shimadzu, Kyoto, Japan) equipped with a CP Sil 5 column (50 m × 0.53 mm × 1.0 μm). Injector and detector temperatures were set to 340 and 360 ◦C, respectively. The initial column temperature was set to 35 ◦C for 5 min, followed by an increase of 15 ◦C/min up to 60 ◦C, held for 0.5 min, and 15 ◦C/min up to 160 ◦C, held for 2 min. Finally, a burnout was introduced, 30 ◦C/min up to 325 ◦C. The retention times for 1-propanol, vinyl acetate, toluene, decane, 1-octanol and 1-octylacetate were 1.69, 1.87, 6.64, 10.52, 11.19 and 12.79 min, respectively. To confirm the product's structure, samples were also injected in a gas chromatograph-mass spectrometer (GC-MS QP2010s, Kyoto, Japan) equipped with a CP Sil 5 column (25 m × 0.25 mm × 0.4 μm). The injector, interface and ion source temperatures were set to 315, 250 and 200 ◦C, respectively. The retention times for vinyl acetate, 1-octanol and 1-octylacetate were 1.78, 11.57 and 12.93 min, respectively.
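The oven program of the GC method can be written out to obtain the total run time and the column temperature at a given retention time. This is an illustrative sketch; it assumes no hold after the final burnout ramp, which the text does not specify.

```python
# Each step: (ramp rate in °C/min or None for the initial isothermal step,
#             target temperature in °C, hold time in min)
PROGRAM = [
    (None, 35.0, 5.0),
    (15.0, 60.0, 0.5),
    (15.0, 160.0, 2.0),
    (30.0, 325.0, 0.0),  # burnout; hold time assumed to be zero
]


def oven_temperature(t_min, program=PROGRAM):
    """Column temperature (°C) at time t_min after injection."""
    temp = program[0][1]
    elapsed = 0.0
    for rate, target, hold in program:
        if rate is not None:
            ramp_time = (target - temp) / rate
            if t_min <= elapsed + ramp_time:
                return temp + rate * (t_min - elapsed)
            elapsed += ramp_time
        temp = target
        if t_min <= elapsed + hold:
            return temp
        elapsed += hold
    return temp


def total_run_time(program=PROGRAM):
    """Total duration (min) of all ramps and holds."""
    temp = program[0][1]
    total = 0.0
    for rate, target, hold in program:
        if rate is not None:
            total += (target - temp) / rate
        temp = target
        total += hold
    return total


print(f"Total run time: {total_run_time():.2f} min")
print(f"Oven temperature at 12.79 min: {oven_temperature(12.79):.1f} °C")
```

With these settings the program lasts about 21.3 min, and 1-octylacetate (12.79 min) elutes during the second ramp at roughly 144 °C.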

#### Amidase Activity Assay: Hydrolysis of Benzyl Chloroacetamide

This protocol was adapted from the literature [36]. The biocatalysis conditions included a total volume 500 μL in a 2 mL Eppendorf tube containing 5 mM benzyl chloroacetamide (stock solution of 500 mM in THF), 100 μg/mL enzyme (as quantified by Bradford assay), in 25 mM potassium phosphate buffer pH 7.0 with 10 % *v*/*v* THF. The conversion was carried out for 24 h at 37 ◦C and 500 rpm. Afterwards, the derivatization of 200 μL of reaction mixture was carried out with 50 μL of NBDCl (20 mM in DMSO) for 1 h, at 37 ◦C and 500 rpm. UV detection at 475 nm was performed.

#### BSLA Structure

The structure of BSLA was taken from the Protein Data Bank (PDB). The structure downloaded (1R50) was treated by removing all molecules but the protein chain with the software PyMOL. The thus-generated structure was used for visual inspection with PyMOL as well as input for the BioGPS analysis.

#### *4.3. BioGPS Computational Analysis*

The BioGPS analysis and projection were taken from previously published work. The BSLA BioGPS analysis was performed using the BioGPS software provided by Molecular Discovery Ltd. (Borehamwood, Hertfordshire, UK) by projecting the enzyme according to its active site properties in the previously performed analysis. The identification of the BSLA active site and the calculation of its properties were performed as previously described. Specifically, FLAPsite was used for automatic active site identification. The active site was mapped using a GRID approach and the resulting computed properties were considered as electron-density-like fields centered on each atom, which correspond to the so-called pseudo-molecular interaction fields (pseudo-MIFs). Four different properties were mapped: the active site shape (H probe), H-bond donor properties (O probe), H-bond acceptor capabilities (N1 probe), and hydrophobicity (DRY probe). The magnitude of the interaction of the N1 and O probes also includes, implicitly, information about the charge contribution, as these probes already have a partially positive and negative charge, respectively. The pseudo-MIF points were filtered, by means of a weighted energy-based and space-coverage function, and then used for the generation of quadruplets obtained from all possible combinations of four pseudo-MIF points. Thus, the BSLA active site was described by a series of quadruplets. Finally, BSLA was projected according to its series of quadruplets and scored by the previously performed BioGPS analysis.
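The idea of describing an active site by quadruplets of points can be illustrated with a toy example. The sketch below is not the BioGPS or FLAP implementation: the point labels and coordinates are invented, and characterising each quadruplet by its six pairwise distances is an assumption made for illustration.

```python
from itertools import combinations
from math import comb, dist

# Hypothetical filtered pseudo-MIF points: label -> (x, y, z) in Å.
points = {
    "H_1": (0.0, 0.0, 0.0),
    "O_1": (3.0, 0.0, 0.0),
    "N1_1": (0.0, 4.0, 0.0),
    "DRY_1": (0.0, 0.0, 5.0),
    "DRY_2": (3.0, 4.0, 0.0),
}

# All possible combinations of four points.
quadruplets = list(combinations(points, 4))


def quadruplet_distances(quad):
    """Sorted pairwise distances between the four points of a quadruplet."""
    return sorted(dist(points[a], points[b]) for a, b in combinations(quad, 2))


print(f"{len(points)} points give {len(quadruplets)} quadruplets")
for quad in quadruplets:
    print(quad, [round(d, 2) for d in quadruplet_distances(quad)])
```

The number of quadruplets grows as n!/(4!(n−4)!) with the number of points n, which is one reason why the points are filtered before the quadruplets are generated.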

#### **5. Conclusions**

The longstanding question "what differentiates lipases from esterases?" has led to a list of six parameters that are indicative, but not decisive. Here, we have probed BioGPS and aw as parameters to distinguish between lipases and esterases, utilizing the minimal serine hydrolase with an α/β fold, BSLA, as a test enzyme. While BioGPS has been used successfully to address similar questions earlier, it was not indicative in this case. The clear assignment of BSLA as either esterase or lipase was a challenging task. The high catalytic activity of BSLA at low aw clearly demonstrated this serine hydrolase to be a lipase. In future studies, activity at low aw should, therefore, be utilized to support the differentiation of lipases and esterases.

**Supplementary Materials:** The following file is available online at http://www.mdpi.com/2073-4344/10/3/308/s1, Figure S1: Bio GPS of 43 serine hydrolases, for BSLA the data of pdb 1R50 were utilized: (**a**) H-bond donor; (**b**) H-bond acceptor; (**c**) Hydrophobicity; Table S1: Enzymes utilized for the Bio GPS study with the relevant PDB codes.

**Author Contributions:** Conceptualization, U.H., L.G., P.B. and G.T.; methodology, P.B., V.F. and G.T.; validation, N.v.M., E.A., P.B. and V.F.; formal analysis, P.B., U.H. and L.G.; investigation, N.v.M., E.A., P.B. and V.F.; resources, U.H. and L.G.; data curation, N.v.M., E.A., P.B. and V.F.; writing—original draft preparation, U.H., P.B. and V.F.; writing—review and editing, U.H. and L.G.; visualization, U.H. and L.G.; supervision, P.B., U.H. and L.G.; project administration, U.H. and L.G.; funding acquisition, U.H. and L.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by BE-BASIC, grant number FES-0905 to P.B. and G.T. L.G. received funding from the University of Trieste, FRA 2018.

**Acknowledgments:** Excellent technical support by the technicians of the BOC group and by Linda Otten is gratefully acknowledged. Bauke Dijkstra and Wim Quax, University of Groningen, the Netherlands kindly provided the *Escherichia coli* (*E. coli*) HB2151 and *E. coli* HB2151 pCANTAB 5E bsla. L.G. is grateful to Molecular Discovery Ltd. for providing software access.

**Conflicts of Interest:** The authors declare no conflict of interest.



© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Hydrolysis of Glycosyl Thioimidates by Glycoside Hydrolase Requires Remote Activation for Efficient Activity**

#### **Laure Guillotin 1, Zeinab Assaf 1, Salvatore G. Pistorio 2, Pierre Lafite 1, Alexei V. Demchenko 2 and Richard Daniellou 1,\***


Received: 18 September 2019; Accepted: 30 September 2019; Published: 1 October 2019

**Abstract:** Chemoenzymatic synthesis of glycosides relies on efficient glycosyl donor substrates able to react rapidly and efficiently, yet with increased stability towards chemical or enzymatic hydrolysis. In this context, glycosyl thioimidates have previously been used as efficient donors, in the case of hydrolysis or thioglycoligation. In both cases, the release of the thioimidoyl aglycone was remotely activated through a protonation driven by a carboxylic residue in the active site of the corresponding enzymes. A recombinant glucosidase (*Dt*Gly) from *Dictyoglomus thermophilum*, previously used in biocatalysis, was also able to use such glycosyl thioimidates as substrates. Yet, enzymatic kinetic values analysis, coupled to mutagenesis and in silico modelling of *Dt*Gly/substrate complexes demonstrated that the release of the thioimidoyl moiety during catalysis is only driven by its leaving group ability, without the activation of a remote protonation. In the search for efficient glycosyl donors, glycosyl thioimidates are attractive and efficient. Their utility, however, is limited to enzymes able to promote leaving group release by remote activation.

**Keywords:** glycoside hydrolase; thioglycosides; biocatalysis

#### **1. Introduction**

Enzymes proved to be efficient synthetic tools for the eco-compatible synthesis of many classes of compounds. Non-organic solvents, mild experimental conditions and high regio- or stereo- specificity inherent to biocatalyzed reactions have increased the added value of enzymes in transformation processes, from the laboratory bench to the industrial scale [1]. Moreover, genetic modifications of recombinant enzymes are now powerful tools to easily alter versatility and properties of the engineered proteins. Rational mutagenesis, directed evolution, or even de novo design have dramatically broadened the applicability of enzymes in biocatalysis [2].

In the glycochemistry field, a vast array of carbohydrate-metabolizing enzymes (CAZYmes), including glycoside hydrolases (GH) or glycosyltransferases (GT), has been engineered and used for the chemo-enzymatic synthesis of glycosides [3]. The corresponding methodologies have proved useful in numerous applications ranging from glycosylated natural products to pharmaceuticals [4,5]. However, only a few examples in the literature describe the use of CAZYmes for the preparation of synthetic thioglycosides that exhibit a sulphur atom linking the glycone and aglycone counterparts instead of more conventional oxygen or nitrogen atoms [6]. Interestingly, when compared to the corresponding *O*-glycosides, *S*-glycosides are highly stable towards enzymatic and acidic hydrolyses. As a result, thioglycosides have been used as substrate analogues or inhibitors of *O*-GH involved in many diseases including cancer, lysosomal storage disorder, viral and bacterial infections [7,8].

Activated glycosyl donors have been used for a long time, especially in chemoenzymatic synthesis of oligosaccharides [9–11]. In retaining GH, where the stereochemistry of the anomeric carbon is conserved, these activated donors are of high interest because they enable the formation of the glycosyl-enzyme intermediate through the release of the leaving group (Figure 1). This first step is common to all enzymatic activities (hydrolase [12], transglycosidase [13], halogenase [12] and thioligase [14]) because the final outcome of the reaction only depends on the nature of the nucleophile that will attack the glycosyl-enzyme intermediate in the second step. Depending on the reaction and the substrate employed, this step can be rate-determining.

**Figure 1.** Schematic mechanism of the first step involving the glycosyl-enzyme intermediate formation in retaining GH. The leaving group (LG) release can also be catalysed through another catalytic residue according to its nature. Depending on the nucleophile (Nu) attacking the intermediate, three reactions can take place—hydrolysis, transglycosylation or thioligation.

In addition to the well characterized *O*-glycosides bearing a potent leaving group, some activated S-glycosides have been reported as efficient substrates for thioligases [15] or glycoside hydrolases [16]. This latter hydrolytic activity is peculiar as very few examples of *S*-glycosides hydrolysis by glycoside hydrolases have been reported in literature [16–26]. Among those examples, putting aside Glc*N*Acase, GH4 and myrosinase that do not operate through the canonical GH mechanism, only almond β-glucosidase GH1, *Aspergillus niger* GH3 [16,22,27], *Micromonospora viridifaciens* sialidase [21], *Caldocellum saccharolyticum* glucosidase [24] and *Oryza sativa* Os4BGlu12 [23] have been isolated and identified as thioglycoside hydrolases (Table 1).



<sup>a</sup> Ratio of *k*<sub>cat</sub>/*K*<sub>M</sub> for thioglycoside substrate vs. corresponding *para*-nitrophenyl glycoside. <sup>b</sup> Ratio of *k*<sub>cat</sub>/*K*<sub>M</sub> for octyl *S*-glucoside vs. octyl *O*-glucoside.

In most cases, *S*-substrate hydrolysis is much less efficient than the rate observed for the corresponding *O*-substrate. Indeed, thioglycosides are less efficient substrates because no general acid/base catalysis is available [28]. Yet, a new class of reactive thioglucosides (Figure 2, Table 1) bearing a thioimidoyl moiety was reported, which were efficiently hydrolysed by almond GH1, as well as *A. niger* GH3 [16]. In both cases, the authors demonstrated that benzoxazolyl 1-thio-β-D-glucopyranoside (Glc*S*Box) and benzimidazolyl 1-thio-β-D-glucopyranoside (Glc*S*Biz) hydrolyses were catalysed by remote activation of the C-S bond through protonation of the ring nitrogen in the aglycone. Such remote activation was also described in the case of Araf51 [15], which was able to use similar arabinofuranosyl thioimidates as glycosyl donors in thioglycoligation reaction [29–35]. In the context of chemoenzymatic synthesis of glycosides, these substrates are attractive because of their high stability towards chemical hydrolysis in aqueous solutions, as well as efficient leaving group ability [15].

**Figure 2.** Substrates used in this study.

In this work, we demonstrated that *Dt*Gly, a GH previously used in chemoenzymatic synthesis of *O*-glycosides, was able to hydrolyse these glycosyl thioimidates. Combined in vitro enzymatic analysis with in silico modelling of the enzyme-substrate interaction have helped us to decipher the molecular mechanism of this rare hydrolysis.

#### **2. Results**

#### *2.1. DtGly Can Hydrolyze Thioglycosides*

*Dt*Gly (uniprot B5YCI2\_DICT6) is encoded by *dicth\_0359* gene in the thermophile *Dictyoglomus thermophilum* genome. We have recently reported the cloning, expression and purification of this protein [36]. As many other *D. thermophilum* proteins [37–41], *Dt*Gly was found to be thermostable and also exhibited a wide substrate specificity, as it is able to hydrolyse *p*NP β-D-glucoside, *p*NP β-D-galactoside and *p*NP β-D-fucoside. Moreover, our previous study demonstrated that *Dt*Gly could be used in chemoenzymatic synthesis of glycosides, thereby serving as an attractive biocatalyst that needed to be assayed for other substrates [36].

In this context, we have focused on thioglycoside hydrolysis, as few examples of *S*-GH are available in literature. Three *S*-containing substrates were tested, namely Glc*S*Biz, Glc*S*Box and Glc*S*Taz that bear benzimidazolyl, benzoxazolyl and thiazolinyl aglycones, respectively (Figure 2).

Unlike *p*NP-Glc, wherein the hydrolysis can be easily monitored by quantification of the released *p*NP group, hydrolysis rates of the *S*-containing substrates were determined by quantification of the released glucose. This was achieved by monitoring *o*-dianisidine oxidation enzymatically coupled to glucose production [42]. This methodology applied to *p*NP-Glc hydrolysis gave similar kinetic values to those previously reported using *p*NP quantification (*data not shown*).

All three *S*-containing substrates were hydrolysed by *Dt*Gly (Table 2), with *K*<sub>M</sub> values higher than, but in the same order of magnitude as, those observed for *p*NP-Glc (2- to 5-fold increase). However, the catalytic rate *k*<sub>cat</sub> was decreased by one order of magnitude, indicating that the reaction is dramatically slowed in the case of *S*-containing substrates. Therefore, the catalytic efficiencies of Glc*S*Biz, Glc*S*Box and Glc*S*Taz were found to be 20 to 40 times lower than the value determined for *p*NP-Glc.
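The order of magnitude of this decrease can be rationalised from the two effects stated above. The following is a rough estimate using the rounded factors from the text (a roughly 10-fold lower catalytic rate and a 2- to 5-fold higher Michaelis constant), not the measured values of Table 2:

$$
\frac{(k_{\mathrm{cat}}/K_{\mathrm{M}})_{S}}{(k_{\mathrm{cat}}/K_{\mathrm{M}})_{p\mathrm{NP}}}
= \frac{k_{\mathrm{cat},S}}{k_{\mathrm{cat},p\mathrm{NP}}} \times \frac{K_{\mathrm{M},p\mathrm{NP}}}{K_{\mathrm{M},S}}
\approx \frac{1}{10} \times \left(\frac{1}{5}\ \text{to}\ \frac{1}{2}\right)
= \frac{1}{50}\ \text{to}\ \frac{1}{20}
$$

This estimate is consistent in magnitude with the 20- to 40-fold lower catalytic efficiencies derived from the measured parameters.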

**Table 2.** Kinetic parameters of WT and acid/base E159Q mutant of *Dt*Gly. pNP-Glc hydrolysis activity was measured by *p*NP release quantification. Other substrate hydrolysis activities were determined by quantification of the released glucose. All experiments were done in three independent replicates and are expressed as Mean ± SD.


<sup>a</sup> Previously reported data [36].

Glc*S*Biz, Glc*S*Box and Glc*S*Taz have previously been used as substrates for sweet almond and *A. niger* β-glucosidases [16], yet with a much different behaviour. Glc*S*Biz was hydrolysed by this enzyme as efficiently as *p*NP-Glc. Kinetics analysis proved that Glc*S*Biz was efficiently hydrolysed by those glucosidases thanks to the remote protonation of the imidazole ring nitrogen. A much lower activity was observed for Glc*S*Box and no activity could be observed for Glc*S*Taz.

To better understand the chemistry underlying the thioglucoside hydrolysis by *Dt*Gly, we first investigated whether these substrates were efficiently binding in the active site, because low GH activities can arise from a second binding mode of substrates, as already reported [43]. Inhibition of *p*NP-Glc hydrolysis by Glc*S*Biz demonstrated that the latter is a competitor in the active site to *p*NP-Glc (Figure 3). Moreover, it efficiently binds into the active site, as an inhibitory constant *K*<sub>i</sub> of 177 ± 11 μM was calculated from the inhibition curves.

**Figure 3.** Lineweaver–Burk plot of wild-type *Dt*Gly inhibition with increasing concentrations of Glc*S*Biz. Data are expressed as mean ± SD from three independent experiments. Inhibitor concentrations are respectively depicted as crosses (0 μM), circles (100 μM), triangles (250 μM), diamonds (500 μM) and squares (1000 μM). Inset: 2X zoom on axes origin highlighting the intersection of fitted lines on *y*-axis.
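The competitive behaviour can be illustrated with the standard competitive inhibition model, in which the apparent Michaelis constant increases by a factor (1 + [I]/*K*i) while the maximal rate is unchanged. The sketch below uses the reported inhibition constant of 177 μM and the inhibitor concentrations of Figure 3; the kinetic parameters passed to `reciprocal_rate` are arbitrary placeholders, not measured values.

```python
KI_UM = 177.0  # reported inhibition constant of GlcSBiz (μM)


def apparent_km_factor(inhibitor_uM, ki_uM=KI_UM):
    """Factor by which the apparent K_M increases under competitive inhibition."""
    return 1.0 + inhibitor_uM / ki_uM


def reciprocal_rate(inv_s, km, vmax, inhibitor_uM, ki_uM=KI_UM):
    """Double-reciprocal form: 1/v = (K_M,app / Vmax) * (1/[S]) + 1/Vmax."""
    km_app = km * apparent_km_factor(inhibitor_uM, ki_uM)
    return (km_app / vmax) * inv_s + 1.0 / vmax


for conc in (0, 100, 250, 500, 1000):
    print(f"[I] = {conc:4d} μM -> apparent K_M increases {apparent_km_factor(conc):.2f}-fold")

# The y-intercept (1/[S] = 0) does not depend on the inhibitor concentration.
print(reciprocal_rate(0.0, km=1.0, vmax=2.0, inhibitor_uM=0),
      reciprocal_rate(0.0, km=1.0, vmax=2.0, inhibitor_uM=1000))
```

In this model the slopes of the double-reciprocal lines increase with inhibitor concentration while all lines share the same *y*-intercept, which is the pattern described for the inset of Figure 3.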

#### *2.2. Identification of Residues Surrounding the Thioglycoside Substrates in DtGly Active Site*

Structural analysis of *Dt*Gly was carried out to identify potential residues that might be involved in *S*-containing substrate hydrolysis mechanism. Despite our efforts to crystallize *Dt*Gly, no diffracting crystal could be obtained, thus we decided to build a homology model of the enzyme. To do so, a 3D structure of β-glycosidase from *Pyrococcus horikoshii* was chosen because of its high sequence identity (resp. homology) with *Dt*Gly of 45% (resp. 63%) [44].

An initial model of residues 2-416 was built using a ModWeb server (ModPipe Protein Quality Score of 1.6, considered as reliable); missing residues were then added and the overall model was equilibrated by several cycles of energy minimization and molecular dynamics (Figure 4A).

**Figure 4.** (**A**) Overall representation of *Dt*Gly model. Helices and sheets are respectively coloured in blue and orange. (**B**) Model of docked Glc*S*Biz in *Dt*Gly active site. Residues surrounding the ligand binding pocket are depicted as sticks. For clarity purposes, hydrogens are not represented. Catalytic residues Glu159 (acid/base) and Glu324 (nucleophile) are highlighted in bold. H-bonds are indicated as dashed lines.

In order to evaluate potential roles of active site residues in sulphur-containing substrate hydrolysis, modelling of substrate-bound *Dt*Gly was done by molecular docking. Using the conformation of glucosides in other closely related GH1 X-ray structures (β-glucosidase from *Thermotoga maritima*, PDB 1OIM and 1OIF [45]), Glc*S*Biz, Glc*S*Box and Glc*S*Taz were independently docked into the *Dt*Gly active site. Figure 4B depicts the residues surrounding Glc*S*Biz, as well as the network of H-bonds between the sugar moiety and several polar residues (Gln20, Glu159, Glu324, Asn367, Glu369). An additional H-π interaction between glucose and Trp362 is also visible, as already seen for other GH [40]. The same interactions were found for other substrates or conformations (see the Supplementary Materials).

In the context of identifying potential residues involved in *S*-glycoside activation during hydrolysis, this model confirms that no acidic residue except Glu159 is close enough to remotely protonate the aglycone moieties of the substrates, as expected from the in vitro assays.

#### *2.3. DtGly Hydrolysis of S-Glycosides Does Not Involve General Acid*/*Base Catalysis*

In our model, the catalytic glutamate Glu159, which acts as the acid/base residue in the retaining GH mechanism [46], is the only one close enough to activate thioglycoside hydrolysis. Although direct protonation on sulphur cannot occur in the case of thioglycosides [28], examples of distant protonation of the aglycone by a catalytic residue have been reported [15]. We thus generated two mutants, namely Glu159Ala and Glu159Gln, to assess the potential role of this residue in thioglycoside hydrolysis. Unlike Glu159Gln, which could be purified to homogeneity, the Glu159Ala mutant was not soluble after cell lysis and thus could not be purified. This mutant was therefore not used in further experiments.

The Glu159Gln mutation led to a dramatic decrease in catalytic efficiency for *p*NP-Glc, as shown in Table 2. The *K*<sub>M</sub> value for this substrate is lower in the mutant but in the same order of magnitude (200 μM vs. 460 μM), which can be explained by conservation of the active site structure in the mutant together with the 150-fold decrease in *k*<sub>cat</sub>, because *K*<sub>M</sub> depends on *k*<sub>cat</sub>. This loss of hydrolytic activity upon acid/base mutation is usual, as reported in many other studies, especially those concerning thioligase generation [14,40].

When using Glc*S*Box as a substrate, *Dt*Gly Glu159Gln exhibits a reduced *k*<sub>cat</sub> value (0.38 to 0.06 s<sup>−1</sup>), as expected because the nucleophilic attack by water is no longer activated by deprotonation. However, the second-order rate constant remains unchanged, indicating that the first step of the reaction is not compromised by the removal of the glutamate residue. Thus, the release of the thiol leaving group is not activated by Glu159 and is only dictated by its leaving capability (i.e., pKa).
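The arithmetic behind this argument can be made explicit. Only the two *k*<sub>cat</sub> values come from the text; the resulting *K*<sub>M</sub> ratio is an inference from the statement that *k*<sub>cat</sub>/*K*<sub>M</sub> is unchanged, not a value read from Table 2:

$$
\frac{k_{\mathrm{cat}}^{\mathrm{E159Q}}}{K_{\mathrm{M}}^{\mathrm{E159Q}}} \approx \frac{k_{\mathrm{cat}}^{\mathrm{WT}}}{K_{\mathrm{M}}^{\mathrm{WT}}}
\;\Longrightarrow\;
\frac{K_{\mathrm{M}}^{\mathrm{E159Q}}}{K_{\mathrm{M}}^{\mathrm{WT}}} \approx \frac{k_{\mathrm{cat}}^{\mathrm{E159Q}}}{k_{\mathrm{cat}}^{\mathrm{WT}}} = \frac{0.06}{0.38} \approx 0.16
$$

Under this assumption, *k*<sub>cat</sub> and *K*<sub>M</sub> for Glc*S*Box decrease in parallel by roughly six-fold in the mutant, which is the pattern expected when a step after the first chemical step is slowed while the first step itself is unaffected.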

#### **3. Discussion**

We have previously used *D. thermophilum Dt*Gly as a versatile tool for the synthesis of glycosides and looked for alternative substrates for this enzyme. Thioimidate glycosyl donors have been used for a long time in organic synthesis to generate a wide range of glycosides and glycans [34,47–49]. In this context, we tested glycosyl thioimidates previously reported as substrates for almond GH1 and *A. niger* GH3 [16]. Examples of cloning and characterization of thioglycoside hydrolases are scarce in the literature, and even fewer studies on the mechanism underlying thioglycoside hydrolysis by GH have been published.

*Dt*Gly is able to hydrolyse *S*-glycosides, with lower activities than those observed for *O*-glycosides. This hydrolytic activity is rate-limited by release of the thiol-containing leaving group and not by water nucleophilic attack, unlike the generally accepted mechanism for *O*-glycoside hydrolysis [19,28]. The modelling of substrate-*Dt*Gly complexes as well as mutagenesis of the acid/base residue also demonstrated that no residue was able to remotely protonate the benzimidazole group or the sulphur atom. Although *Dt*Gly is able to hydrolyse *S*-containing substrates without general acid catalysis, the hydrolysis rate is limited by the leaving group capability, as no remote activation is possible. The pKa values of the leaving groups 2-mercaptobenzimidazole (for Glc*S*Biz) and 2-mercaptobenzoxazole (for Glc*S*Box) have been experimentally determined as 5.8 [50] and 6.58 [51], respectively. To our knowledge, no value is available for 2-mercaptothiazoline (for Glc*S*Taz).

In the case of almond and *A. niger* glycosidases, a remote protonation occurring on a nitrogen atom of the benzimidazole moiety of Glc*S*Biz was shown to accelerate the leaving group release, thus increasing the catalytic rate to a value close to those observed for *O*-glycosides. Another GH exhibiting thioglycosidase activity on 2'-thio-benzimidazolyl arabinosides activated by remote protonation, namely Ara*f* 51 [15], was also reported. The modelling of the Ara*f* 51/substrate complex demonstrated that the nucleophile catalytic residue was responsible for the remote protonation on the imidazole nitrogen, mostly because a furanosyl ring is much more flexible than a pyranosyl ring and allows the nucleophile residue to interact with the aglycone.

#### **4. Materials and Methods**

#### *4.1. Materials*

*para*-Nitrophenyl β-D-glucopyranoside (*p*NP-Glc) was purchased from Carbosynth (Oxford, UK). 2-Benzoxazolyl 1-thio-β-D-glucopyranoside (Glc*S*Box) [52], 2-benzimidazolyl 1-thio-β-D-glucopyranoside (Glc*S*Biz) [16] and 2-thiazolinyl 1-thio-β-D-glucopyranoside (Glc*S*Taz) [49] were prepared as previously described. Unless otherwise specified, all other chemicals were purchased from Thermo Fisher Scientific (Waltham, MA, USA) and were of the purest quality available. Mutagenic primers were purchased from Eurofins Genomics (Ebersberg, Germany) and the WT *Dt*Gly coding expression plasmid (pET28a-*dtgly*) was prepared as previously described [36].

#### *4.2. Production of WT and E159Q DtGly*

pET28a-*dtgly* was used as a template for mutagenic PCR with the QuikChange Site-directed mutagenesis kit (Agilent, Les Ulis, France). Primers containing the desired mutation at the acid/base residue position (E159) consisted of a pair of complementary oligonucleotides designed using the Agilent tools website (www.genomics.agilent.com, mutated codons are highlighted in bold): *Dt*Gly E159A: 5'-gaattactggatgactataaat**gcg**cccaatgcttatgccttt-3' and *Dt*Gly E159Q: 5'-atcttgtgaattactggatgactataaat**cag**cccaatgcttatg-3'. Mutagenesis was performed according to the kit procedure. Sequences of pET28a-*dtgly*E159A and pET28a-*dtgly*E159Q were verified by Sanger sequencing at Eurofins Genomics (Ebersberg, Germany).
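A quick way to sanity-check mutagenic primers such as these is to translate them in the reading frame set by the mutated codon. The sketch below is illustrative only: it uses the standard genetic code, marks the mutated codon in upper case (a convention of this sketch, replacing the bold type), and the function name is hypothetical.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Primer sequences from the text; the mutated codon is written in upper case.
PRIMERS = {
    "E159A": "gaattactggatgactataaatGCGcccaatgcttatgccttt",
    "E159Q": "atcttgtgaattactggatgactataaatCAGcccaatgcttatg",
}


def translate_in_codon_frame(primer):
    """Translate a primer in the frame defined by its upper-case codon.

    Returns the peptide and the index of the mutated residue within it.
    """
    start = next(i for i, ch in enumerate(primer) if ch.isupper())
    offset = start % 3  # bases to skip so that the codon is in frame
    seq = primer.upper()
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    peptide = "".join(CODON_TABLE[codon] for codon in codons)
    return peptide, (start - offset) // 3


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        peptide, index = translate_in_codon_frame(primer)
        print(name, peptide, "mutated residue:", peptide[index])
```

With these sequences, the mutated codon encodes Ala (gcg) or Gln (cag) and sits between an Asn and a Pro codon in both primers.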

Production and purification of *Dt*Gly variants were done as previously reported [36]. Briefly, *Escherichia coli* Rosetta(DE3) cells transformed with expression plasmids were grown in LB medium supplemented with chloramphenicol (34 μg/mL) and kanamycin (30 μg/mL) at 37 °C until OD<sub>600</sub> reached 0.6. Induction was then done by addition of 1 mM IPTG and the culture was incubated overnight at 25 °C. Cells were harvested and lysed by freeze-thaw cycles and sonication, and the supernatant was clarified by heat treatment for 15 min at 70 °C before centrifugation. Finally, the supernatant was loaded on a nickel column (HisPure, Thermo Scientific) and the protein was purified by elution with lysis buffer containing 500 mM imidazole.

#### *4.3. pNP Release Quantification Assay*

The activity of *Dt*Gly variants (WT or mutants) towards *p*NP-Glc hydrolysis was determined at 37 °C in a 200 μL incubation containing 5 ng of the enzyme, 0.01–10 mM *p*NP-Glc, citrate-phosphate buffer (20 mM, pH 6) and 0.1–1 mM GlcBox for inhibition studies. After 20 min of incubation, 100 μL of 1 M Na<sub>2</sub>CO<sub>3</sub> was added and released *p*NP was quantified at 405 nm (ε<sub>405</sub> = 19,500 M<sup>−1</sup>·cm<sup>−1</sup>). Prism 4 (GraphPad) was used to fit data according to the Michaelis-Menten model or the competitive inhibition model and to retrieve kinetic parameters.
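The conversion from absorbance to released *p*NP follows the Beer-Lambert law. The sketch below is a minimal illustration using the extinction coefficient, volumes and incubation time stated above; the optical path length (`path_cm`) is an assumption, since it is not given in the text, and the absorbance is assumed to be blank-subtracted. Function names are illustrative.

```python
EPSILON_405 = 19_500.0   # M^-1 cm^-1, from the text
FINAL_VOLUME_UL = 300.0  # 200 µL incubation + 100 µL Na2CO3, from the text
INCUBATION_MIN = 20.0    # from the text


def pnp_concentration_um(a405, path_cm=1.0):
    """pNP concentration (µM) from a blank-subtracted A405 (Beer-Lambert)."""
    return a405 / (EPSILON_405 * path_cm) * 1e6


def pnp_released_nmol(a405, path_cm=1.0, volume_ul=FINAL_VOLUME_UL):
    """Amount of pNP (nmol) in the quenched sample."""
    # µM x µL gives pmol; divide by 1000 to obtain nmol.
    return pnp_concentration_um(a405, path_cm) * volume_ul / 1000.0


def rate_nmol_per_min(a405, path_cm=1.0, minutes=INCUBATION_MIN):
    """Average hydrolysis rate over the incubation (nmol pNP per min)."""
    return pnp_released_nmol(a405, path_cm) / minutes


if __name__ == "__main__":
    a405 = 0.195  # hypothetical reading
    print(f"{pnp_concentration_um(a405):.1f} µM pNP, "
          f"{pnp_released_nmol(a405):.2f} nmol, "
          f"{rate_nmol_per_min(a405):.3f} nmol/min")
```

In a microplate the effective path length is usually shorter than 1 cm, so `path_cm` would need to be set for the actual reader and well volume.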

#### *4.4. Glucose Release Assay*

To determine the hydrolysis rates of Glc*S*Box, Glc*S*Taz and Glc*S*Biz by *Dt*Gly variants, the glucose produced was quantified using a continuous coupled enzyme assay [42]. Incubations were similar to those for *p*NP-Glc hydrolysis, with the addition of glucose oxidase from *Aspergillus niger* (Sigma-Aldrich, Saint Louis, MO, USA, 0.4 u), horseradish peroxidase (Sigma-Aldrich, 0.4 u) and *o*-dianisidine (Sigma-Aldrich, 100 μM). Dianisidine oxidation coupled to glucose production was monitored at 442 nm for 30 min. Prism 4 (GraphPad) was used to fit data according to the Michaelis-Menten model and to retrieve kinetic parameters.

#### *4.5. Computational Studies*

The structure of β-glycosidase from *Pyrococcus horikoshii* [44] (PDB 1VFF, 45%/63% sequence identity/homology) was used as a template for homology model building using the ModWeb server from the A. Sali Laboratory (https://modbase.compbio.ucsf.edu/modweb/). The resulting model was prepared with AmberTools [53] and equilibrated using NAMD software [54] and the Amber fb15 force field [55] (3 cycles of 10,000 minimization steps and 0.5 ns dynamics at 100 K).

Docking of Glc*S*Box, Glc*S*Taz and Glc*S*Biz substrates into the *Dt*Gly active site model was done by first applying AM1-BCC charges to the ligands [56]. Each substrate was then placed 10 Å away, facing the active site (according to PDB 1VFF). *Dt*Gly-substrate complexes were formed using steered molecular dynamics [57] at 100 K, using as the final orientation the structural alignment of the glucose moiety in its binding pocket according to the closest structures bearing a ligand in their active site (β-glucosidase from *Thermotoga maritima*, PDB 1OIM and 1OIF) [45]. The *Dt*Gly backbone was kept constrained during the whole procedure. Finally, protein-ligand complex models were equilibrated by releasing substrate constraints and applying several cycles of energy minimization (10,000 steps, steepest descent) followed by molecular dynamics (100 K, 1 ns). Final complex models were obtained by a final energy minimization. For each substrate, several initial conformations were tested, mostly by rotation of the glycosidic bond. All structural figures were drawn using the PyMOL Molecular Graphics system 1.8 (www.pymol.org).

#### **5. Conclusions**

This study demonstrates that glycosyl thioimidates are not universal glycosyl donors for chemoenzymatic syntheses. While the above examples of efficient enzymatic activities using such substrates were reported in the literature, they rely on activation by protonation of the aglycone moiety, either by a distant carboxylic acid residue (almond GH1) or by the catalytic nucleophile (Ara*f* 51, Figure 5). *Dt*Gly appears to exemplify the general case of an enzyme that can use those substrates without acid catalysis, yet with a much lower activity. This study paves the way for broadening *Dt*Gly applications in biocatalysis. Identification of efficient substrates and mutation into a thioligase are currently under further investigation.

**Figure 5.** Glycosyl thioimidates require remote activation to promote the leaving group release.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4344/9/10/826/s1, Figure S1: Model of docked Glc*S*Box and Glc*S*Taz in *Dt*Gly active site.

**Author Contributions:** Conceptualization, R.D. and A.V.D.; Enzymatic studies, L.G., P.L. and Z.A.; data curation and modelling, P.L.; chemical synthesis of substrates, S.G.P. writing—original draft preparation, P.L.; writing—review and editing, P.L., R.D. and A.V.D.; project administration, R.D.; funding acquisition, R.D. and A.V.D.

**Funding:** L.G., P.L. and R.D. thank the Région Centre Val de Loire for funding (APR IR GLYCOPEPS). A.V.D. thanks the NIGMS for generous support of this work (GM GM111835).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Deciphering the Role of V88L Substitution in NDM-24 Metallo-**β**-Lactamase**

#### **Zhihai Liu 1,2, Alessandra Piccirilli 3, Dejun Liu 2, Wan Li 2, Yang Wang <sup>2</sup> and Jianzhong Shen 2,\***


Received: 25 July 2019; Accepted: 28 August 2019; Published: 2 September 2019

**Abstract:** The New Delhi metallo-β-lactamase-1 (NDM-1) is a typical carbapenemase and plays a crucial role in antibiotic-resistant bacterial infections. Phylogenetic analysis, performed on known NDM variants, classified NDM enzymes into seven clusters. Three of them include most of the NDM variants. In this study, we evaluated the role of the V88L substitution in NDM-24 by kinetic and structural analysis. Functional results showed that V88L did not significantly increase the resistance level in the NDM-24 transformant toward penicillins, cephalosporins, meropenem, and imipenem. Concerning ertapenem, *E. coli* DH5α/NDM-24 showed a MIC value 4-fold higher than that of *E. coli* DH5α/NDM-1. The determination of the *k*<sub>cat</sub>, *K*<sub>m</sub>, and *k*<sub>cat</sub>/*K*<sub>m</sub> values for NDM-24, compared with NDM-1 and NDM-5, demonstrated an increase in substrate hydrolysis for all the β-lactams tested, except penicillins. Thermostability testing revealed that V88L had a destabilizing effect on NDM-24. The V88L substitution occurred in a β-strand and was associated with a low β-sheet content in the secondary structure, as evidenced by the CD analysis data. In conclusion, the V88L substitution increases the enzyme activity and decreases the protein stability. This study characterizes the role of the V88L substitution in NDM-24 and provides insight into the evolution of NDM variants.

**Keywords:** New Delhi metallo-β-lactamase; NDM-24; kinetic profile; secondary structure

#### **1. Introduction**

Metallo-β-lactamases (MBLs) are a group of enzymes that confer high resistance to most β-lactams. The active site of these enzymes contains one or two zinc ions that are crucial for the catalytic mechanism [1]. Based on their amino acid sequences, MBLs have been divided into subclasses B1, B2, and B3 [2]. Within subclass B1, the New Delhi metallo-β-lactamase (NDM-1) is one of the most widespread carbapenemases. NDM-1 was first identified in 2008 in a clinical strain of *Klebsiella pneumoniae* [3]. NDM-1-producing bacteria can hydrolyse all β-lactams (except monobactams), including carbapenems, the "last resort" antibiotics used in clinical therapy. NDM-1 genes are located on plasmids that mediate their dissemination across different bacterial strains [4,5]. However, the clinical success of NDM is also due to the fact that it is a lipoprotein anchored to the outer membrane, which results in an unusual stability of NDM-1 and enables secretion in Gram-negative bacteria [6–8].

To date, more than 26 variants differing by a limited number of substitutions have been identified [9]. Previous studies revealed that these substitutions have increased the hydrolytic activity of NDM-1 toward several β-lactams, resulting in increased resistance in the host bacteria [10]. Crystal structures showed that NDM-1 presents the typical αβ/βα fold of MBLs [11,12]. In this enzyme, the zinc ions are coordinated by six conserved residues: H120, H122, and H189 for Zn1 (BBL numbering) and D124, C208, and H250 for Zn2 (BBL numbering). The active site is surrounded by Loop 3 (residues 67-73) and Loop 10 (residues 210-230), involved in substrate accommodation [12]. The most frequent substitution in NDM-1 is M154L, found in 11 NDM variants (NDM-4, -5, -7, -8, -12, -13, -15, -17, -19, -20 and -21) [9,13–16]. In addition, V88L has been reported in five NDM variants (NDM-5, -17, -20, -21 and -24). Other frequent substitutions are A233V (NDM-6, -15, -19 and -27), D130G (NDM-8 and -14), D130N (NDM-7 and -19), and D95N (NDM-3 and -27) [10]. The single substitutions M154L and D130G seem to increase the carbapenemase activity in NDM-4 and NDM-14, respectively [17,18]. In contrast, the combination D130G/M154L (NDM-8) reduces the hydrolysis of carbapenems [19]. The main goal of this study was to evaluate the role of the V88L substitution in the NDM-24 enzyme. NDM-24 was generated in the laboratory by site-directed mutagenesis of NDM-1, and its enzymatic properties, protein structure, and thermal stability were studied in comparison with NDM-1 and NDM-5.

#### **2. Results and Discussion**

#### *2.1. Phylogenetic Analysis*

A phylogenetic analysis of NDM-1 variants was performed in order to classify these enzymes based on their amino acid similarities. Overall, the NDM variants were classified into three major clusters (NDM-1, NDM-4, and NDM-24), two minor clusters (NDM-3 and NDM-6), and two divergent proteins (NDM-14 and NDM-10). As shown in Figure 1, the NDM variants are well categorized. The NDM-1 cluster includes eight variants that showed only one amino acid replacement, except for NDM-18, in which an insertion of five amino acids has been found (positions 48-52). In the NDM-4 group, all variants possess a replacement at position 154. In particular, except for NDM-11, which contains the M154V substitution, all variants share M154L. In the NDM-24 group, the valine at position 88 has been replaced by a leucine (V88L). Concerning the two minor groups, similar characteristics were observed, with the D95N and A233V substitutions for the NDM-3 and NDM-6 clusters, respectively.

**Figure 1.** New Delhi metallo-β-lactamase-1 (NDM-1) variants phylogenetic analysis. Phylogenetic groups were differently coloured: For example, the NDM-24 cluster is coloured in green.

#### *2.2. Functional Study*

The NDM-24 variant was obtained by site-directed mutagenesis using NDM-1 as the template. All genes (*bla*NDM-1, *bla*NDM-5, and *bla*NDM-24) were cloned into pHSG398 under the control of the same promoter, so that their expression was expected to be the same. The *E. coli* DH5α recombinant strains were used to test susceptibility to a wide panel of β-lactams. As shown in Table 1, the susceptibility tests revealed that NDM-1, NDM-5, and NDM-24 confer resistance to most β-lactams with similar MIC values, suggesting that the NDM enzymes were successfully expressed in the host cells. A different behaviour was observed for carbapenems, for which the MIC values were markedly lower than those of penicillins and cephalosporins with the exception of cefepime, as previously reported [15–20]. Concerning ertapenem, NDM-24 and NDM-5 showed 4- and 8-fold increases in MIC values, respectively, with respect to NDM-1. Based on these data, the V88L substitution enhances resistance toward ertapenem.
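Because the MICs were determined by a microdilution method (Section 3.2), which reads MICs on a two-fold dilution series, fold differences are often easier to compare when expressed as dilution steps. The following minimal sketch uses only the fold changes stated above; the function names are illustrative and no MIC values from Table 1 are assumed.

```python
import math


def fold_change(mic_variant, mic_reference):
    """Fold change of a MIC relative to a reference MIC (same units)."""
    return mic_variant / mic_reference


def dilution_steps(fold):
    """Number of two-fold dilution steps corresponding to a fold change."""
    return math.log2(fold)


# Ertapenem fold changes relative to NDM-1, as stated in the text.
ERTAPENEM_FOLD = {"NDM-24": 4.0, "NDM-5": 8.0}

if __name__ == "__main__":
    for variant, fold in ERTAPENEM_FOLD.items():
        print(f"{variant}: {fold:.0f}-fold = {dilution_steps(fold):.0f} dilution steps")
```

Expressed this way, the ertapenem MIC is two dilution steps above that of NDM-1 for NDM-24 and three for NDM-5.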


**Table 1.** Antimicrobial susceptibility of *E. coli* BL21(DE3)-DH5α carrying *bla*NDM-1, *bla*NDM-5, and *bla*NDM-24.

#### *2.3. Characteristics of Enzyme Activity*

In order to obtain soluble and active enzymes, the recombinant plasmids were successfully expressed in the *E. coli* BL21 (DE3) cells as described in the methods section. After purification, the enzymes were checked on SDS-PAGE to confirm the solubility and purity (>90%) (Figure S1). The MALDI-TOF mass spectrometry was used to confirm the molecular mass of the three enzymes, which corresponds to 24884,024 Da (Figure S2). To investigate the enzyme activity, the kinetic parameters for NDM-1, NDM-5, and NDM-24 were determined (Table 2).

All the NDM variants in this study were able to hydrolyse all the β-lactams tested. Compared with NDM-1, NDM-24 showed lower *K*<sub>m</sub> values for penicillins and ceftazidime, whereas the values for carbapenems were quite similar. Comparing the *k*<sub>cat</sub> values, NDM-24 hydrolyses all β-lactams, except penicillins, better than NDM-1 and NDM-5. In particular, the *k*<sub>cat</sub> values of NDM-24 are 2.26-, 1.61-, 2.73-, 2.02-, 2.17-, and 1.75-fold higher than those of NDM-1 towards penicillin G, ceftazidime, cefepime, imipenem, meropenem, and ertapenem, respectively. This was also confirmed by a slight increase in catalytic efficiency. This result indicates an important role of V88L in substrate hydrolysis. The contribution of V88L appears similar to that of M154L, as shown by the calculated *k*<sub>cat</sub>/*K*<sub>m</sub> values (Table 2). This was possibly due to differences in intrinsic properties, such as enzyme stability, protein expression, and adaptability [21–24], and in the nutritional conditions of bacteria in vivo and in vitro. Comparing the *k*<sub>cat</sub>/*K*<sub>m</sub> values for carbapenems, the carbapenemase activity of NDM-5 was similar to that of NDM-24, but higher than that of NDM-1. A recent study showed an increase in the catalytic efficiency (*k*<sub>cat</sub>/*K*<sub>m</sub>) for meropenem in NDM-5 (V88L and M154L). In NDM-4, which contains only M154L, no significant change was observed, suggesting that V88L, rather than M154L, might play a role in enhancing NDM enzyme activity. Moreover, an increase in carbapenemase activity was also observed in the evolutionary NDM variants, such as NDM-17 (V88L, M154L, and E170K) and NDM-20 (V88L, M154L, and R270H) [10,15,16].



proteins initially purified His-tag, purification. β-lactams: AMP, ampicillin; TAG, ceftazidime; PEN, penicillin G; FEP, cefepime; IPM, imipenem; MEM, meropenem; ETP, ertapenem.

#### *2.4. Thermal Stability*

As previously reported, mutations in the NDM variants can affect enzyme stability, changing the persistence lifetime in the bacterial host and, consequently, antibiotic resistance [25]. To determine whether the V88L substitution influences the stability of NDM-24, circular dichroism (CD) was used to assay protein stability by recording signal changes. NDM-5 was used as a reference to analyze the effect of M154L. Compared with NDM-1 and NDM-5, NDM-24 (59.41 ± 0.06 °C) possessed the lowest melting temperature (Figure 2). Notably, the destabilizing effect of V88L was compensated by M154L in NDM-5, which had a markedly higher melting temperature than NDM-24 (69.13 ± 3.6 °C compared to 59.41 ± 0.06 °C). Moreover, NDM-5 showed a higher stability than NDM-1, suggesting a stabilizing role of M154L. This is in agreement with a previous report that the M154L mutation would be a turning point for the NDM variants, in which combining M154L with additional substitutions benefits the NDM enzymes by increasing thermostability [10]. In the NDM-24 group there are four variants (NDM-5, -17, -20, and -21) in which the combination of the V88L and M154L substitutions gives favorable results in terms of stability and environmental selection.

**Figure 2.** (**A**) Thermal stability melting curves. (**B**) Melting temperatures of the NDM enzymes as determined by the circular dichroism analysis: NDM-1, 63.61 ± 0.57 °C; NDM-V88L, 59.41 ± 0.06 °C; and NDM-5, 69.13 ± 3.6 °C. Data are the means of triplicate experiments, with error bars showing the standard deviation (±SD).
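The shifts in melting temperature discussed here can be tabulated directly from the values in the Figure 2 caption. The sketch below is illustrative: the combined SD assumes independent errors added in quadrature, which is an assumption of this example rather than an analysis reported by the authors, and NDM-V88L is labelled NDM-24 as in the text.

```python
import math

# Melting temperatures (mean, SD) in °C, from the Figure 2 caption.
TM = {
    "NDM-1": (63.61, 0.57),
    "NDM-24": (59.41, 0.06),
    "NDM-5": (69.13, 3.6),
}


def delta_tm(variant, reference="NDM-1"):
    """Tm difference (variant - reference) and combined SD.

    The SDs are combined in quadrature, assuming independent errors.
    """
    (tm_v, sd_v), (tm_r, sd_r) = TM[variant], TM[reference]
    return tm_v - tm_r, math.hypot(sd_v, sd_r)


if __name__ == "__main__":
    for variant, reference in (("NDM-24", "NDM-1"),
                               ("NDM-5", "NDM-1"),
                               ("NDM-5", "NDM-24")):
        diff, sd = delta_tm(variant, reference)
        print(f"{variant} vs {reference}: dTm = {diff:+.2f} ± {sd:.2f} °C")
```

The comparatively large SD reported for NDM-5 dominates the combined uncertainty of any comparison that involves it.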

#### *2.5. Structure Analysis*

Previous reports indicated that mutations in NDM influence the α-helical and β-sheet content and loop flexibility [26]. For example, the Q123A substitution in NDM-1 leads to a decrease in the α-helical content [27]. To determine whether the V88L substitution could modify the NDM-24 structure, the secondary structure was determined from the far-UV CD spectrum. The CD spectra of all NDM variants were fitted and are shown in Figure 3. The spectra were superimposable at most wavelengths and showed the characteristics of the αβ/βα fold, a typical and conserved protein structure in MBLs [28]. The presence of a positive band at 192 nm and two negative peaks at 208 nm (the minimum) and 220 nm suggested the dominance of β-sheet and α-helical content. The major differences were observed near 192 nm, an α-helical peak, and at 208–220 nm, the α-helical and β-sheet bands. Overall, the α-helical content was found to range between 13% and 20% in NDM-1, NDM-5, and NDM-24 (Table 3), in agreement with previous reports, and the β-sheet content was high, around 30% [27]. Compared to NDM-1, NDM-24 possesses a higher α-helical content and a lower β-sheet content, suggesting that V88L was responsible for the changes in secondary structure content of NDM-24. Furthermore, the secondary structure prediction (Figure S3) confirmed that the V88L substitution occurred at the terminus of a β-strand, which may be prematurely terminated, leading to a decrease in the β-sheet content. Kumar et al. claimed that 152A, located in the β-strand, drastically influenced the NDM-5 activity and protein thermolability, by reducing the β-sheet content [26]. Our analysis demonstrates that the emergence of M154L (in NDM-5) caused the α-helical content to decrease and the β-sheet content to increase relative to NDM-24, while the α-helical and β-sheet contents of NDM-5 were between those of NDM-1 and NDM-24. In addition, the 3D model of NDM-24 (Figure 4) was generated by using NDM-1 (PDB accession: 5N0H, 4RBS, 4HKY, and 4EYL) and NDM-5 (PDB accession: 6MGY and 4TZE) as templates. Although residue 88L is far from the active site groove and from the active loops (Loop 3 and Loop 10), differences in structure content, stability, and enzyme activity were ascertained. Several studies confirmed that non-active-site substitutions can influence the NDM catalytic efficiency [29], and our results on the V88L substitution support this view.

**Figure 3.** Normalized circular dichroism (CD) spectra of the NDM enzymes tested. MRE: Mean residue ellipticity.


*<sup>a</sup>* The CDPro program package was used to analyse the data using two algorithms: CONTINLL and SELCON3. *<sup>b</sup>* H(r), regular α-helix; H(d), distorted α-helix; S(r), regular β-strand; S(d), distorted β-strand; Trn, turns; Unrd, unordered. *<sup>c</sup>* The reference protein sets (IBasis sets) were adopted.

**Figure 4.** Cartoon model of NDM-24. To obtain a credible model, six NDM crystal structures (NDM-1 (5N0H, 4RBS, 4HKY, and 4EYL) and NDM-5 (6MGY and 4TZE)) were used. Residue 88L and the active pocket are labelled.

#### **3. Material and Methods**

#### *3.1. Site-Directed Mutagenesis, Cloning and Expression of NDM Variants*

The *bla*NDM-1 and *bla*NDM-5 encoding genes were obtained from clinical *Escherichia coli* strains as previously described [15,16]. Site-directed mutagenesis was performed to generate *bla*NDM-24 using the pHSG398/NDM-1 plasmid as template and primers listed in Table S1, as previously described [30].

First, the *bla*NDM genes were cloned into a pHSG398 vector (TaKaRa Bio, Dalian, China) using the *BamHI* and *XhoI* restriction sites. In a second PCR experiment, the *bla*NDM variants were amplified without the signal peptide, introducing a Tobacco Etch Virus (TEV) protease cleavage site at the N-terminus. To overexpress the enzymes, the amplicons were inserted into a pET-28a(+) plasmid. *E. coli* DH5α competent cells were used as a non-expression host, and *E. coli* BL21(DE3) was used for enzyme overexpression. The authenticity of the recombinant plasmids was verified by PCR and Sanger sequencing.

#### *3.2. Antimicrobial Susceptibility Tests*

The phenotypic profile was characterized by a microdilution method using a bacterial inoculum of 5 × 10<sup>5</sup> CFU/mL according to the Clinical and Laboratory Standards Institute [31,32]. *E. coli* ATCC25922 was used as a negative control.

#### *3.3. Production and Purification of NDM-1, NDM-5, and NDM-24*

NDM-1, NDM-5, and NDM-24 were extracted from 0.5 L cultures of *E. coli* BL21(DE3)/pETNDM-1, *E. coli* BL21(DE3)/pETNDM-5, and *E. coli* BL21(DE3)/pETNDM-24, respectively. The cultures were grown at 37 °C to an A<sub>600</sub> of 0.5, and 0.4 mM IPTG was added. After addition of IPTG, the cultures were incubated at 20 °C for 16 h under aerobic conditions. Thereafter, a cell supernatant containing the soluble NDM protein was obtained from the lysed bacteria by centrifugation at 10<sup>4</sup> rpm. Protein purification followed the manufacturer's instructions (Qiagen, Hilden, Germany) using Ni-nitrilotriacetic acid (NTA) agarose. The turbo tobacco etch virus (TEV) protease (Accelagen, San Diego, CA, USA) was used to remove the His tags and obtain the untagged protein. SDS-PAGE was performed to estimate the purity of the NDM enzymes. The protein concentration was determined using a BCA Protein Quantification Kit (Vazyme Biotech Co., Ltd, Nanjing, China). The β-lactamase activity was monitored at each purification step using the colour change of nitrocefin (1 mg/mL), a chromogenic cephalosporin, according to a previous report [20].

#### *3.4. Determination of Kinetic Parameters*

Steady-state kinetic experiments were performed by following the hydrolysis of each substrate at 25 °C in 50 mM phosphate buffer, pH 7.0, in the presence of 20 μM ZnSO<sub>4</sub>. The data were collected with a SpectraMax M5 multi-detection microplate reader (Molecular Devices, Sunnyvale, CA, USA) as previously described [33]. Kinetic parameters were determined under initial-rate conditions using GraphPad Prism® version 5.01 (San Diego, CA, USA) to generate the Michaelis-Menten curves, or by analysing the complete hydrolysis time courses [34,35]. Each kinetic value is the mean of the results of three different measurements. The error was below 5%. NDM-5 was used as a reference to normalize.
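To illustrate how *V*<sub>max</sub>, *K*<sub>m</sub> and *k*<sub>cat</sub> are extracted from initial rates, the sketch below uses a Hanes-Woolf linearisation ([S]/v against [S]) with an ordinary least-squares line. This is a simpler stand-in for the nonlinear Michaelis-Menten fit performed in GraphPad Prism, not the procedure used in the study, and the data are hypothetical and noise-free.

```python
def michaelis_menten(substrate, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)


def hanes_woolf_fit(substrates, rates):
    """Estimate (vmax, km) from the line [S]/v = [S]/Vmax + Km/Vmax."""
    xs = list(substrates)
    ys = [s / v for s, v in zip(substrates, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # equals 1/Vmax
    intercept = mean_y - slope * mean_x  # equals Km/Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax


def kcat(vmax_um_per_s, enzyme_um):
    """Turnover number (s^-1) from Vmax (µM/s) and enzyme concentration (µM)."""
    return vmax_um_per_s / enzyme_um


# Hypothetical data: substrate in µM, rates in µM/s.
S_UM = [5.0, 10.0, 25.0, 50.0, 100.0, 250.0]
V_UM_S = [michaelis_menten(s, vmax=2.0, km=50.0) for s in S_UM]
VMAX_FIT, KM_FIT = hanes_woolf_fit(S_UM, V_UM_S)

if __name__ == "__main__":
    print(f"Vmax = {VMAX_FIT:.3f} µM/s, Km = {KM_FIT:.1f} µM, "
          f"kcat = {kcat(VMAX_FIT, 0.01):.0f} s^-1 at 0.01 µM enzyme")
```

With real, noisy data a nonlinear fit is generally preferable, because linearisations distort the error structure of the measurements.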

#### *3.5. Circular Dichroism and Structure Analysis*

The circular dichroism (CD) spectra (185 to 260 nm) were recorded with a Chirascan Plus CD spectrophotometer (Applied Photophysics Ltd, Leatherhead, UK) equipped with a Peltier temperature-controlled cell holder, at 25 °C using a 1-mm pathlength cuvette, and a Savitzky-Golay filter was applied to the baseline-corrected spectral data. Proteins were diluted to 0.05-0.2 mg/mL with 5 mM sodium phosphate buffer, pH 7.0 [36]. The spectrum data at 207 nm were used as the baseline to normalize and calibrate the data, eliminating minor errors due to different concentrations [37]. The analysis was performed using the CONTINLL and SELCON3 algorithms with reference data sets three and nine, respectively [38]. The super-secondary (tertiary) structures of the proteins were analysed with the CDPro software package, which is available at the CDPro website: https://sites.bmb.colostate.edu/sreeram/CDPro/ [38,39]. To locate the V88L substitution and analyse its effect on the structure, the pharmacophore modeling and screening software program Discovery Studio (version 2018) was employed to generate a three-dimensional (3D) interconnected model of NDM-24 using NDM-1 and NDM-5 as templates, for which reliable crystal structure data were collected from the PDB database.

#### *3.6. Thermal Stability Testing*

The melting temperature (*Tm*) was used as a measure of protein stability. The *Tm* was determined by recording the change in the CD signal at 222 nm. Data were collected at a ramp of 1 ◦C/min over a temperature range of 20–90 ◦C. A two-state model using nonlinear regression (Boltzmann) in OriginPro 9.1.64 (OriginLab, Northampton, MA, USA) was applied to analyse the data. The *Tm* is defined as the temperature at which 50% of the protein has melted and represents the thermal stability.
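The idea behind the *Tm* determination can be sketched as follows. The example generates a synthetic melting curve from a Boltzmann-type sigmoid and reads off the midpoint by linear interpolation. This is a simplified stand-in for the nonlinear regression performed in OriginPro; the parameter names (`folded`, `unfolded`, `width`) and all numerical values are hypothetical.

```python
import math


def boltzmann(temp, folded, unfolded, tm, width):
    """Two-state sigmoid: signal goes from `folded` to `unfolded` around `tm`."""
    return unfolded + (folded - unfolded) / (1.0 + math.exp((temp - tm) / width))


def estimate_tm(temps, signal):
    """Temperature at which the signal is halfway between its first and last value."""
    half = (signal[0] + signal[-1]) / 2.0
    points = list(zip(temps, signal))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if y0 != y1 and min(y0, y1) <= half <= max(y0, y1):
            # Linear interpolation between the two bracketing data points.
            return t0 + (half - y0) * (t1 - t0) / (y1 - y0)
    raise ValueError("signal never crosses the half-transition value")


# Synthetic curve sampled every 1 degree from 20 to 90 (hypothetical values).
temps = [float(t) for t in range(20, 91)]
signal = [boltzmann(t, -20.0, -5.0, 55.0, 2.5) for t in temps]

tm_est = estimate_tm(temps, signal)
print(f"Estimated Tm: {tm_est:.1f}")
```

The interpolation assumes flat baselines before and after the transition; sloping baselines would require fitting the full model instead.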

#### **4. Conclusions**

Our study explored the biological function of NDM-24 and probed the role of the V88L substitution in structure, enzyme activity, and stability. In brief, this non-active-site change enhances the enzyme activity by increasing the turnover rate, which is related to an indirect effect on conformation. However, V88L came at a cost: it significantly decreased the protein stability and would shorten the persistence lifetime in the cell, so that NDM-24-producing transformants hardly exhibit an outstanding increase in antibiotic resistance. Meanwhile, alterations in the secondary structure content, such as a lower β-sheet content, appear to play a role in the NDM instability, which is consistent with the V88L substitution occurring in a β-strand. According to previous data, the V88L/M154L combination appears to be favorable in NDM evolution under environmental selection pressure [14]. Further analysis of the significance of non-active-site residues will help in understanding the resistance mechanism and broaden insight into the development of inhibitors, such as potential antibiotic candidates that act by reducing the protein stability lifetime.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4344/9/9/744/s1. Figure S1: SDS-PAGE of NDM-24. Lane 1: NDM-24 containing His Tags; Lane 2: NDM was cleaved by using Turbo tobacco etch virus (TEV) protease to remove His Tags (Accelagen, San Diego, CA, USA): tagged protein (2a) and untagged protein (2b); Lane 3: untagged protein; Lane M: Marker, Figure S2: Molecular mass spectrometry of NDM-24 estimated by MALDI-TOF, Figure S3: Predicted secondary structure of NDM-24, Table S1: Oligonucleotides used in this study.

**Author Contributions:** J.S. designed the study. Z.L., D.L., and W.L. collected the data. Z.L., Y.W., and D.L. analyzed and interpreted the data. Z.L., A.P., D.L., Y.W., and J.S. wrote the report. All authors revised, reviewed and approved the final report.

**Funding:** The study was supported by grants from the National Key Research and Development Program of China (2018YFD0500300), and the National Natural Science Foundation of China (81861138051 and 81661138002).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Decoding Essential Amino Acid Residues in the Substrate Groove of a Non-Specific Nuclease from** *Pseudomonas syringae*

#### **Lynn Sophie Schwardmann, Sarah Schmitz, Volker Nölle and Skander Elleuche \***

Miltenyi Biotec B.V. & Co. KG, Friedrich-Ebert-Straße 68, 51429 Bergisch Gladbach, Germany; l.schwardmann@web.de (L.S.S.); sarahs@miltenyibiotec.de (S.S.); VolkerN@miltenyibiotec.de (V.N.) **\*** Correspondence: skander.elleuche@miltenyibiotec.de; Tel.: +49-2204-8306-4452

Received: 17 October 2019; Accepted: 6 November 2019; Published: 9 November 2019

**Abstract:** Non-specific nucleases (NSN) are of interest for biotechnological applications, including industrial downstream processing of crude protein extracts or cell-sorting approaches in microfabricated channels. Bacterial nucleases belonging to the superfamily of phospholipase D (PLD) are featured for their ability to catalyze the hydrolysis of nucleic acids in a metal-ion-independent manner. In order to gain a deeper insight into the composition of the substrate groove of a NSN from *Pseudomonas syringae*, semi-rational mutagenesis based on a structure homology model was applied to identify amino acid residues on the protein's surface adjacent to the catalytic region. A collection of 12 mutant enzymes each with a substitution to a positively charged amino acid (arginine or lysine) was produced in recombinant form and biochemically characterized. Mutations in close proximity to the catalytic region (inner ring) either dramatically impaired or completely abolished the enzymatic performance, while amino acid residues located at the border of the substrate groove (outer ring) only had limited or no effects. A K119R substitution mutant displayed a relative turnover rate of 112% compared to the original nuclease. In conclusion, the well-defined outer ring of the substrate groove is a potential target for modulation of the enzymatic performance of NSNs belonging to the PLD superfamily.

**Keywords:** DNase; kinetic profiles; RNase; semi-rational mutagenesis; substrate specificity

#### **1. Introduction**

Non-specific nucleases (NSN) are a group of enzymes that hydrolyze deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) in all conformations, including single-stranded and double-stranded or linear and circularized substrates, without sequence specificity [1]. NSNs are ubiquitously distributed among all organisms and are of great potential for versatile biotechnological and clinical applications [2–4].

Enzymes that are highly indiscriminate towards different substrates are generally considered as potential evolutionary starting points for developing novel or more specific catalytic activities [5–7]. Members of the phospholipase D (PLD; Enzyme Commission number (EC) 3.1.4.4) superfamily are known to accept a wide range of ester substrates, including nucleic acids [8–10]. PLDs are mainly represented in eukaryotes and predominately catalyze the hydrolysis of phosphatidylcholine to produce choline and phosphatidic acid [11]. Moreover, PLDs act as important key players in various physiological processes, including cell migration and membrane trafficking [10]. This family of enzymes usually encodes two copies of the conserved HxK(x)4D(x)6GSxN motif in one gene.

A structurally related bacterial enzyme (Nuc) was initially described from the human pathogenic microorganism *Salmonella enterica* subsp. *enterica* serovar *Typhimurium*. Nuc contains a single HxK(x)4D(x)6GSxN motif, but forms a homodimer, and is capable of degrading nucleic acids in a non-specific manner [12,13]. This enzyme is among the very few known nucleases that do not depend on a metal ion in their catalytic region, and is therefore of potential interest for biotechnological applications that take place in buffers supplemented with metal ion chelators, such as ethylenediaminetetraacetic acid (EDTA) or ethylene glycol-bis(β-aminoethyl ether)-N,N,N′,N′-tetraacetic acid (EGTA). The group of metal-ion-independent nucleases mainly consists of two PLD-like, site-specific restriction endonucleases from *Bacillus firmus* and *Bacillus megaterium*, WSV191 from the white spot syndrome virus and GBSV1-NSN from a thermophilic bacteriophage, as well as the restriction glycosylase R.*Pab*I from the hyperthermophilic archaeon *Pyrococcus abyssi* [14–18].

So far, only three isozymes of bacterial PLD-like NSNs, beside Nuc from *S. enterica* subsp. *enterica* serovar *Typhimurium*, have been investigated and described in detail with regard to their biochemical and biophysical properties: (1) *Ec*Nuc from *Escherichia coli* has been shown to be applicable during cell lysis and protein purification in EDTA-containing buffers; and (2) two isozymes from the plant pathogenic competitor bacterium *Pantoea agglomerans* were shown to be the result of an ancient gene duplication event followed by diversification [19,20]. These enzymes are completely devoid of catalytic performance towards lipids and exclusively degrade nucleic acids in a non-specific manner.

In this study, another metal-ion-independent NSN (DNase/D157G) from *Pseudomonas syringae* was mutagenized using a semi-rational strategy to gain a deeper insight into the composition of the substrate groove. Homology modeling was applied to identify amino acid residues on the surface of the NSN in the surroundings of the catalytic site, which is buried at the bottom of the putative substrate groove. It has been shown before that positively charged amino acid residues that can interact with the proximal negatively charged phosphate groups in nucleic acids had a stimulating impact on the catalytic activity of human DNase I [21]. Therefore, positively charged amino acids were introduced at positions on the surface of DNase/D157G within the substrate groove. Two regions were defined that were either directly adjacent to the catalytic site (inner ring) or at the border of the substrate groove (outer ring). Substitutions in the inner ring dramatically impaired or completely abolished the catalytic activity, while mutagenesis in the outer ring had no or little effect. DNase variant K119R/D157G displayed increased activity, the temperature optimum of variant S143R/D157G was shifted from 50 ◦C to 40 ◦C, and N95K/D157G and S143R/D157G exhibited an increased tolerance towards 50 mM of EDTA.

#### **2. Results**

#### *2.1. Identification of Amino Acid Residues within the Substrate Groove of a NSN from Pseudomonas sp.*

Phylogenetic analyses revealed a highly conserved NSN within the genomes of *Pseudomonas* species. These enzymes are related to Nuc from *S. enterica* subsp. *enterica* serovar *Typhimurium* (~57% identity in a 159-amino-acid overlap) and are highly active at neutral pH, in the presence of salt concentrations up to 250 mM, and in a temperature range between 4 and 50 ◦C (Supplementary Materials Figure S1). It has been shown that amino acid residues D157, E157, and G157 occur naturally in homologues from the genus *Pseudomonas*. A comparison of the catalytic activities in our laboratory revealed that an enzyme variant containing G157 is superior to a variant with a negatively charged amino acid at position 157 (unpublished results). Therefore, the natural amino acid sequence from a NSN of *P. syringae* containing a single amino acid substitution at position 157 (DNase/D157G) was used in our study as the starting variant for semi-rational mutagenesis to generate double-mutants.

A homology model of the enzyme DNase/D157G was produced based on the crystal structure from *S. enterica* subsp. *enterica* serovar *Typhimurium* (Figure 1). The enzyme is modelled as a hypothetical homodimer, with the catalytic site buried within a putative substrate groove at the dimeric interface. The catalytic site is composed of amino acids H122, K124, G136, S137, and N139, that are part of the HxK(x)4D(x)6GSxN motif, while D129 has been proposed to be of structural relevance [12]. Fifteen different amino acid residues were identified that are present on the protein surface within the substrate groove. These amino acid residues were either assigned to be part of an inner ring that is directly adjacent to the catalytic site or to an outer ring that surrounded the inner ring amino acid residues. The following amino acids were identified as being located on the protein surface close to the catalytic site: Y63, S64, T66, I120, and S141, while P68, H91, G92, D94, N95, A97, A101, K119, A142, and S143 are defined as being part of the outer ring (Figure A1 Appendix A).

**Figure 1.** Homology model of non-specific nucleases (NSNs) from *Pseudomonas syringae* in top and side views. Low resolution model of DNase/D157G using Nuc (PDB: 1BYS\_A) as a template. PyMOL was used to highlight a hypothetical dimeric structure with dark and light grey monomers. The conserved residues of the HxK(x)4D(x)6GSxN motif are given in dark and light red. Naturally occurring amino acids of the outer and inner rings are highlighted in dark and light blue, while substituted positively charged amino acid residues are indicated in dark and light green. Cartoon illustrations at the bottom are used to simplify the orientation of amino acid residues of the inner and outer ring within the potential substrate groove surrounding the catalytic site.

Surface-presented amino acid residues were exchanged for lysine or arginine to improve substrate binding and modulate the enzymatic performance or to identify amino acids essential for catalytic activity. Histidine was not considered due to its bulkiness, aggravating the risk for interference with the structure of the protein. Potential steric hindrance was determined by in silico mutagenesis, and the following substitutions were selected: Y63K, S64K, T66R, P68R, H91R, D94K, N95K, K119R, I120K, S141K, A142R, and S143R. Due to the fact that K119 is the only positively charged amino acid within the substrate groove, this lysine was replaced with arginine. Amino acid residues G92, A97, and A101 were not mutagenized due to expected clashes with adjacent amino acids.
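The selection described above can be written down as a small data structure. The sketch below parses the substitution codes, checks that every replacement residue is lysine (K) or arginine (R), and assigns each mutant to the inner or outer ring as listed in this section. The variable and function names are hypothetical and serve only to organize the published lists.

```python
import re

# Ring assignment of the fifteen surface positions, as listed in Section 2.1.
INNER_RING = {63, 64, 66, 120, 141}
OUTER_RING = {68, 91, 92, 94, 95, 97, 101, 119, 142, 143}

# The twelve substitutions selected after in silico mutagenesis.
SUBSTITUTIONS = ["Y63K", "S64K", "T66R", "P68R", "H91R", "D94K",
                 "N95K", "K119R", "I120K", "S141K", "A142R", "S143R"]


def parse_substitution(code):
    """Split a code such as 'Y63K' into (original residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def classify(code):
    """Return 'inner' or 'outer' depending on the position of the substitution."""
    _, position, _ = parse_substitution(code)
    if position in INNER_RING:
        return "inner"
    if position in OUTER_RING:
        return "outer"
    raise ValueError(f"position {position} is not part of the substrate groove")


rings = {code: classify(code) for code in SUBSTITUTIONS}
mutated_positions = {parse_substitution(code)[1] for code in SUBSTITUTIONS}
not_mutated = (INNER_RING | OUTER_RING) - mutated_positions

for code, ring in rings.items():
    print(f"{code}: {ring} ring")
print("Positions without substitution:", sorted(not_mutated))
```

The positions left without a substitution correspond to G92, A97, and A101, which were excluded because of expected steric clashes.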

#### *2.2. Production and Purification of Recombinant Nuclease Mutants*

Recombinant DNase/D157G double-mutants were produced in *Escherichia coli* Veggie BL21 (DE3), except for mutant P68R/D157G, because this expression strain could not be transformed with the corresponding expression plasmid. Therefore, *Escherichia coli* Veggie BL21 (DE3) pLysS was used for the production of this mutant. All recombinant mutant enzymes were produced in soluble form and could be purified using a two-step approach combining affinity and ion exchange chromatography. Purification strategy was optimized using the Äkta purifier (Figure 2). The purification level of all recombinant nucleases was visualized with sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and Western blotting analyses using an anti-HIS antibody.

**Figure 2.** Biochemical and enzymatic analysis of purified nuclease mutants. SDS-PAGE results showed the purity of HIS-tagged nucleases (29 kDa) after a two-step purification approach composed of affinity and ion exchange chromatography (upper row). Western blotting analyses confirmed the presence of HIS-tagged nucleases (middle row). Qualitative activity assays used ethidium bromide staining in 96-well plates (lower row): white colored dots indicate the presence of ethidium bromide intercalating into DNA and grey dots indicate incomplete degradation, while a complete loss of fluorescence is due to complete DNA degradation. Solid line boxes: inactive variants Y63K/D157G and S141K/D157G were purified in non-optimized gravity flow experiments. Dashed line box: variant P68R/D157G was produced in expression strain *E. coli* Veggie BL21(DE3) pLysS. Note: "+" indicates activity and "−" indicates inactivity. SDS-PAGE results, including all purification steps, are shown in Figure A2 Appendix A.

A qualitative assay using ethidium bromide to visualize non-degraded, double-stranded DNA (dsDNA) was applied to demonstrate that mutants Y63K/D157G and S141K/D157G were inactive, while mutant S64K/D157G completely degraded dsDNA after 6 h, mutant T66R/D157G after 24 h, and mutant I120K/D157G only partly hydrolyzed DNA after 24 h. These five mutants contain amino acid substitutions within the inner ring of the substrate groove. The remaining mutants as well as the positive control DNase/D157G completely hydrolyzed dsDNA after 30 min (Figure 2).

To ensure that the correct mutants were purified and characterized, the molecular masses of the purified enzymes were determined by mass spectrometry in addition to sequence verification of generated plasmids. Measured molecular weights are in accordance with predicted masses (Table 1). Variant P68R/D157G could not be properly identified due to impurities.


**Table 1.** Molecular masses of nuclease variants.

<sup>1</sup> Theoretical molecular weights (MW) of monomers were calculated using *Compute pI*/*MW*. <sup>2</sup> n.d.—not detectable.

#### *2.3. Biochemical Properties of Nucleases*

In vitro activity assays using dsDNA at 25 ◦C without the addition of ethidium bromide were conducted to confirm the qualitative plate assays. The reaction was stopped after 1 h of incubation and samples were loaded onto an agarose gel to investigate the level of hydrolysis of sheared dsDNA (Figure 3). As expected, inactive mutant enzymes Y63K/D157G and S141K/D157G were not capable of degrading dsDNA, while recombinant enzymes S64K/D157G, T66R/D157G, and I120K/D157G, respectively, exhibited reduced activity levels compared to the initial nuclease variant DNase/D157G and the remaining mutants.

**Figure 3.** Digestion of sheared dsDNA by nuclease mutants. Sheared dsDNA (UltraPure™ Salmon Sperm DNA Solution) exhibited an average size of ≤ 2000 bps. Negative control containing substrate without enzymes is indicated as "dsDNA". The positive control DNase/D157G and mutants H91R/D157G, D94K/D157G, N95K/D157G, K119R/D157G, A142R/D157G, and S143R/D157G were able to completely hydrolyze sheared dsDNA. S64K/D157G, T66R/D157G, and I120K/D157G partially degraded sheared dsDNA within 1 h at 25 ◦C. Y63K/D157G and S141K/D157G did not exhibit activity towards sheared dsDNA. Note: "+" indicates activity and "−" indicates inactivity, without any quantification of the activity level.

After 1 h of incubation, mutant enzyme T66R/D157G had only initiated the hydrolysis of dsDNA, with some low molecular weight fragments visible at the bottom of the agarose gel. Therefore, the catalytic activity of this mutant was investigated with regard to its degradation velocity. Identical concentrations of mutant enzyme T66R/D157G were incubated with substrate for 30 min, 1 h, 2 h, 4 h, and 24 h, respectively. Reactions were stopped and the samples were loaded onto an agarose gel, revealing that the substrate is degraded slowly and is still not completely digested after 4 h of incubation (Figure 4).

**Figure 4.** Degradation of sheared dsDNA by mutant T66R/D157G over time. Sheared dsDNA (UltraPure™ Salmon Sperm DNA Solution) exhibited an average size of ≤ 2000 bps. Reaction was stopped after indicated times (between 0.5 and 24 h). White dashed arrow indicates level of nucleic acid molecular weights. The dsDNA is completely hydrolyzed after 24 h of incubation at 25 ◦C.

To investigate the substrate promiscuity of active recombinant nuclease mutants, the enzymatic hydrolysis was studied towards the following substrates: unsheared dsDNA, single-stranded genomic DNA (ssDNA), circularized DNA, and RNA from bacteriophage MS2. Mutant enzymes with amino acid substitutions in the outer ring of the substrate groove that were active towards sheared dsDNA (Figure 3) also degraded unsheared dsDNA, ssDNA, circularized DNA, and RNA (Figure 5, Figure A3).

**Figure 5.** Substrate specificity of mutants with amino acid substitutions in the outer ring. Positive control DNase/D157G and mutant P68R/D157G were incubated for 1 h at 25 ◦C with different types of DNA and RNA. Further outer ring mutants H91R/D157G, D94K/D157G, N95K/D157G, K119R/D157G, A142R/D157G, and S143R/D157G were also able to degrade all types of nucleic acids (Figure A3 Appendix A). "Control" indicates negative controls containing substrate but no enzyme in the reaction mixture.

Nuclease mutant enzyme Y63K/D157G was also inactive towards unsheared dsDNA, ssDNA, and circularized DNA, while S141K/D157G showed some activity on all substrates except sheared dsDNA (Figures 3 and 6). However, mutant Y63K/D157G exhibited some activity towards RNA. In good agreement with previous results, mutant enzyme T66R/D157G exhibited reduced activity compared to control DNase/D157G and active mutant enzymes when incubated with both DNA and RNA (Figure 6). In contrast to sheared dsDNA, mutant S64K/D157G completely digested unsheared dsDNA, ssDNA, and circularized DNA, and partially digested RNA, while I120K/D157G was also active on every substrate, but only completely degraded RNA within 1 h at 25 ◦C (Figure 6).

**Figure 6.** Substrate specificity of mutants with amino acid substitutions in the inner ring. Mutant Y63K/D157G was not able to degrade any type of nucleic acid. Mutants S64K/D157G, T66R/D157G, and I120K/D157G partially degraded all types of nucleic acids within 1 h at 25 ◦C, while mutant S141K/D157G exhibited low levels of degradation activity. "Control" indicates negative controls containing substrate but no enzyme in the reaction mixture.

Furthermore, pH and temperature optima of active mutants were determined. Every mutant as well as the initial variant DNase/D157G displayed optimal activity at pH 7.0. The temperature optima were around 50 ◦C for the initial variant DNase/D157G and all active mutants, except for mutants S64K/D157G and S143R/D157G, which showed optimal activities at 60–70 ◦C and 40 ◦C, respectively.

#### *2.4. Enzyme Kinetics*

Michaelis–Menten kinetics using sheared dsDNA confirmed the result of the qualitative activity assays: mutant enzymes Y63K/D157G and S141K/D157G did not exhibit any activity towards sheared dsDNA in these assays, and the catalytic performances of mutant enzymes S64K/D157G, T66R/D157G, and I120K/D157G were lower than those of the initial nuclease variant DNase/D157G and the remaining active mutant enzymes at 25 ◦C. Their relative turnover rates were below 10% compared with DNase/D157G and their catalytic efficiency (kcat/KM) was below 0.2 s<sup>−1</sup> μM<sup>−1</sup> (Table 2). Only the turnover number of mutant enzyme K119R/D157G (1034 s<sup>−1</sup>) surpassed that of DNase/D157G (924 s<sup>−1</sup>), but the latter mutant enzyme also exhibited the lowest substrate affinity of all mutants (KM = 428 μM).
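The two derived quantities used here are simple ratios. As a brief sketch, the relative turnover rate is the turnover number of a variant divided by that of the reference DNase/D157G, and the catalytic efficiency is kcat divided by KM. The function names are hypothetical; the two turnover numbers are the ones quoted above.

```python
def relative_turnover_percent(kcat_variant, kcat_reference):
    """Turnover number of a variant relative to the reference enzyme, in percent."""
    return 100.0 * kcat_variant / kcat_reference


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/KM (here in s^-1 uM^-1 for kcat in s^-1, KM in uM)."""
    return kcat / km


# Turnover numbers quoted in the text (in s^-1).
kcat_k119r = 1034.0
kcat_reference = 924.0

relative = relative_turnover_percent(kcat_k119r, kcat_reference)
print(f"K119R/D157G relative turnover: {relative:.0f}%")
```

The result agrees with the relative turnover rate of 112% stated for the K119R substitution mutant in the abstract.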


**Table 2.** Kinetic characteristics of nuclease mutants <sup>1</sup>.

<sup>1</sup> All experiments were done at 25 ◦C. <sup>2</sup> Not detectable, below the detection limit.

#### *2.5. EDTA Tolerance of Nuclease Mutants*

As metal-ion-independent proteins, NSNs of the PLD-like family are known to tolerate chelating agents, such as EDTA. DNase/D157G displayed > 80% relative activity at a concentration of 20 mM EDTA and still preserved > 60% residual activity in the presence of 50 mM EDTA. Active positively charged mutant enzymes were also tested in the presence of EDTA in concentrations between 1 mM and 50 mM, without any abnormalities, except that mutant enzymes P68R/D157G, N95K/D157G, I120K/D157G, and S143R/D157G were almost unaffected by any concentration of EDTA (> 80% residual activity at concentrations up to 50 mM), while mutant enzymes T66R/D157G and K119R/D157G only showed residual activities of 25% and 34% in the presence of 50 mM EDTA, respectively.

#### **3. Discussion**

Commercially available NSNs are of high potential for the elimination of nucleic acids during protein downstream processing, to reduce the viscosity, or for prevention of cell clumping in cell sorting approaches [1,20,22]. The prototype non-specific nuclease is the metal-ion-dependent NSN from *Serratia marcescens* that is commercially sold under the trademark "Benzonase® Nuclease" (Merck KGaA, Darmstadt, Germany). This enzyme has been investigated in detail with regard to protein maturation, secretion, catalytic mechanism, and biotechnological applications [2]. Further NSNs, especially metal-ion-independent enzymes, have only been rarely investigated to date [13,16,18–20,23].

In this study, a semi-rational approach was used to select suitable mutation sites within the substrate groove of an NSN from *Pseudomonas syringae*, which were substituted by site-directed mutagenesis against positively charged amino acid residues to modulate the affinity for negatively charged substrates. Critical amino acid residues for the enzymatic performance of different nucleases were routinely identified by site-directed mutagenesis in previous studies [21,24–26].

Under natural conditions, it has been hypothesized that either positive mutations are first installed in a protein, while negative and neutral mutations are accumulated over time (Neo-Darwinian hypothesis), or neutral mutations pave the way for a flexible evolution and positive or negative mutations are installed as a response to certain conditions (competing hypothesis) [27]. By comparing homology models of proteins with singular amino acid substitutions, it is difficult to determine which of the specific mutations evokes an advantageous, neutral, or even deleterious effect on the protein performance, with certain effects on secondary and tertiary structures often being totally unpredictable [28]. Therefore, in silico mutagenesis was exclusively used to predict steric hindrance between introduced charged amino acid residues and amino acids of the catalytic site, or the predicted substrate groove followed by experimental testing of produced mutants.

Fifteen amino acid residues on the protein surface near the catalytic site were identified in a NSN that is highly conserved within the genus *Pseudomonas*. In silico mutagenesis revealed that twelve amino acids could be substituted with either arginine or lysine, without any steric effects in the molecular model. Eleven mutants could be produced in recombinant form in *E. coli* Veggie BL21 (DE3). However, transformation of *E. coli* Veggie BL21 (DE3) with an expression plasmid encoding mutant P68R/D157G did not result in any clones, but the recombinant enzyme could be produced in expression strain *E. coli* Veggie BL21 (DE3) pLysS with a very low yield. The additional plasmid pLysS encodes T7 lysozyme to lower the background expression level of genes under the control of the T7 promoter. Therefore, it can be hypothesized that background expression of the gene encoding mutant P68R/D157G in pLysS-less expression strains leads to lethality of the host strain. It is worth mentioning that in another experiment, the same results were observed when proline at position 68 was replaced with negatively charged aspartate in our control. Nevertheless, mutant P68R/D157G only exhibited a turnover rate of 11% compared with the original variant DNase/D157G. However, the reduced activity may also be dependent on protein impurities that were still present after application of a two-step purification approach.

Artificially increasing the positive charge of the putative substrate groove in the NSN did not accelerate the catalytic performance of the enzyme. In another study, the activity of human DNase I also dropped with the addition of basic amino acids compared with the wild-type enzyme. However, suboptimal conditions for the wild-type enzyme, such as increased salt concentrations, accelerated the performance of the mutated enzyme variants [21]. Eleven out of twelve mutants in our portfolio displayed reduced activity at optimal conditions, while K119R/D157G was the only variant exhibiting an increased turnover number compared with DNase/D157G. However, this enzyme was the sole exception with a basic amino acid replaced by another basic amino acid. Two mutations within the inner ring of the substrate groove, namely Y63K and S141K, completely abolished the ability to hydrolyze sheared dsDNA, while mutant S141K/D157G was still capable of partially hydrolyzing unsheared dsDNA and circular plasmid DNA. Both mutants were also active towards RNA, but it is important to note that although all buffers were prepared under sterile conditions, the possibility that the observed activity is an artefact due to contamination with RNase cannot be excluded. Interestingly, substitution of asparagine at position 95 within the outer ring of the substrate groove slightly impaired kcat. A comparable effect has also been observed in another mutant, in which asparagine was replaced by serine. The latter amino acid occurs naturally in the homologous NSN in some members of the genus *Pseudomonas*.

Mutants were also probed for stability and activity effects in the presence of chelating agents. A slight performance improvement was detected for mutants N95K/D157G and S143R/D157G with regard to the tolerance of high concentrations of EDTA, which were not affected by concentrations up to 50 mM. Metal-ion-dependent DNases are usually completely inhibited by low concentrations of EDTA (1–5 mM), and even some metalloproteins that tolerate EDTA are dramatically impaired by concentrations of 50 mM [29,30]. However, the related NSNs from *Escherichia coli* and *Pantoea agglomerans* were already inhibited at concentrations above 20 mM of EDTA, which is similar to the results obtained with mutant enzymes T66R/D157G and K119R/D157G [19,20]. Nevertheless, these data are in line with the crystal structure of Nuc from *S. enterica* subsp. *enterica* serovar *Typhimurium*, demonstrating that NSNs of the superfamily of PLD proteins are not metal-ion-dependent [12].

The optimal growth temperature of *Pseudomonas syringae* is 28 ◦C, but the highest activity of the nuclease and its mutants was determined to take place between 40 and 70 ◦C [31]. These results are in line with previous observations of enzymes derived from psychro-, meso-, and thermophiles that displayed a temperature optimum above their preferred growth temperatures [19,20,29,32,33].

Detailed enzyme characterizations are always a prerequisite for understanding the functionality of enzymes and to enable the modulation of their catalytic performances [34]. It has been shown that the enzyme family of NSNs from the genus *Pseudomonas* is a promising model protein for modifying the catalytic performance with regard to turnover number, temperature optimum, or EDTA tolerance by single amino acid substitutions. Due to their enzymatic properties, PLD-like NSNs from bacteria are of great potential for versatile biotechnological applications, and for this reason the discovery of novel enzymes, the optimization of available candidates, and the development of further applications are all highly needed [4,20,35–37]. Finally, the discovery of novel candidates and the extensive characterization in combination with straight-forward protein engineering techniques will lead to the production of more tailor-made enzymes for specific biotechnological applications.

#### **4. Materials and Methods**

#### *4.1. Strain and Culture Conditions*

*Escherichia coli* strains Veggie BL21 (DE3) and Veggie BL21(DE3) pLysS (both from Merck KGaA, Darmstadt, Germany) were used for gene expression and protein production. *E. coli* strain NEB® 5-alpha (New England Biolabs, Frankfurt/Main, Germany) was used for plasmid propagation and maintenance.

#### *4.2. Computational Sequence Analysis and Structure Modelling*

Protein sequence data of a non-specific nuclease (WP\_050543862.1) from the gram-negative, ice-nucleating bacterium *Pseudomonas syringae* was identified and biochemically characterized in our laboratory (unpublished results). The naturally occurring amino acid substitution D157G in related homologous sequences was shown to be beneficial for catalytic activity of the enzyme compared to the wild-type polypeptide sequence. The three-dimensional model of the bacterial nuclease DNase/D157G was generated by the SWISS-MODEL online server, using the crystal structure of a homologous nuclease (PDB ID:1BYS\_A) from *Salmonella enterica* subsp. *enterica* serovar *Typhimurium* as a template. The structure model was analyzed and visualized using the PyMOL software package (PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC, New York, NY, USA). Amino acid residues that are located on the surface of the predicted substrate groove were identified and chosen to be substituted with positively charged amino acids (lysine or arginine). Fifteen amino acid residues were identified to be orientated towards the protein surface close to the catalytic region. These amino acids were assigned to two groups: (1) inner ring (closely located to the catalytic site): Y63, S64, T66, I120, and S141; (2) outer ring (distantly located to the catalytic region): P68, H91, G92, D94, N95, A97, A101, K119, A142, and S143 (Figure A1 Appendix A). In silico mutagenesis was performed to discriminate between lysine and arginine residues, replacing amino acid residues that are located on the surface of the predicted substrate groove. The substitution with the preferred amino acid residue, either arginine or lysine, did not result in any steric clashes with adjacent amino acids in the protein for 12 out of 15 amino acids. Therefore, amino acid residues G92, A97, and A101 were excluded from site-directed mutagenesis.

#### *4.3. Cloning of Nuclease Variants*

Genes encoding nuclease variants with amino acid substitutions were codon-optimized for expression in *E. coli* and synthesized by ATUM (Newark, CA, USA). Flanking *Nco*I and *Aat*II restriction sites were used for unidirectional ligation into the linearized vector pET24d(+) (Merck KGaA, Darmstadt, Germany), equipped with a double HIS tag. Sequence verification of inserted genes was done by Eurofins Genomics (Ebersberg, Germany).

#### *4.4. Gene Expression and Protein Purification*

Expression of genes in *E. coli* Veggie BL21 (DE3) was performed as described earlier [20]. In brief, nuclease variants were produced with *E. coli* Veggie BL21 (DE3) in 1 L cultures in 2 L Erlenmeyer shaking flasks. In our hands, it was not possible to transform *E. coli* Veggie BL21 (DE3) with a plasmid coding for the P68K/D157G mutant. Therefore, *E. coli* BL21 (DE3) pLysS was used as an alternative expression host for this NSN variant. Cells were grown under constant shaking (250 rpm) at 37 °C until an optical density at 600 nm (OD600) of 0.6–0.8 was reached. Afterwards, gene expression was induced by the addition of 0.4 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). Cells were harvested 4 h post-induction by centrifugation for 15 min at 4 °C and 2880× *g*.

Pelleted cells were resuspended in lysis buffer (50 mM NaPO4, pH 7.3) at 1 g per 5 mL, with a minimal volume of 20 mL, and disrupted by high-pressure homogenization (constant cell disruption systems, Constant Systems Limited, Northants, UK) at 5 °C and 1250 bar. Crude protein extract was incubated for 30 min at 37 °C to enable digestion of nucleic acids by the recombinantly expressed nuclease. Afterwards, the sample was centrifuged at 4000× *g* for 30 min at 4 °C.

HIS-tagged fusion enzymes were purified in a two-step approach using a combination of affinity chromatography (AC) and ion-exchange chromatography (IEX). Initially, gravity flow experiments using Ni Sepharose 6 Fast Flow and SP Sepharose Fast Flow cation exchange chromatography resins (both GE Healthcare, Munich, Germany) were done to test the activity of partly purified mutant enzymes. Afterwards, the Äkta purifier (GE Healthcare, Munich, Germany) was used to optimize the purification strategy. Supernatant from cell disruption was loaded onto a HisTrap FF Crude histidine-tagged protein purification column (GE Healthcare, Munich, Germany) equilibrated with 50 mM NaPO4, 50 mM NaCl, 5 mM imidazole, pH 7.3. The loaded column was connected to an Äkta protein purifier system and washed with 10 column volumes of 50 mM NaPO4, 50 mM NaCl, 50 mM imidazole, pH 7.3, prior to elution with 50 mM NaPO4, 50 mM NaCl, 500 mM imidazole, pH 7.3. For the purification by IEX, the eluate from the Ni Sepharose affinity purification was diluted 1:4 with IEX running buffer (25 mM NaPO4, pH 6.0) to reach a conductivity of 7–8 mS/cm. A HiTrap-SP FF column (GE Healthcare, Munich, Germany) was equilibrated with 10 column volumes of IEX running buffer before the sample was loaded. The column was then washed with 10 column volumes of 50 mM sodium phosphate, 300 mM NaCl, pH 6.0, prior to elution with 50 mM sodium phosphate, 700 mM NaCl, pH 6.0. All steps were performed at a flow rate of 1 mL/min.

Finally, a PD-10 desalting column (GE Healthcare, Munich, Germany) was used to replace the IEX elution buffer with storage buffer (50 mM NaPO4, 25 mM NaCl, pH 7.3). SDS-PAGE (ProGel Tris Glycin 4–20%, Anamed Elektrophorese GmbH, Groß-Bierberau/Rodau, Germany) in combination with a Western blot using a nitrocellulose blotting membrane (GE Healthcare, Munich, Germany) was used for visualization of recombinant nuclease mutants. An anti-HIS horseradish peroxidase (HRP) antibody (Miltenyi Biotec B.V. & Co. KG, Bergisch Gladbach, Germany) and the Immobilon™ Western HRP substrate (Merck, Darmstadt, Germany) were used to detect HIS-tagged proteins.
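The lysis buffer rule stated above (1 g of cells per 5 mL, but never less than 20 mL) is simple arithmetic and can be written as a small helper. This is a minimal sketch: the function name is hypothetical, and reading the mass as wet pellet mass is an assumption, since the text does not say how the pellet was weighed.

```python
def lysis_buffer_volume_ml(cell_mass_g, ml_per_g=5.0, minimum_ml=20.0):
    """Lysis buffer volume for a cell pellet.

    Uses 5 mL buffer per gram of cells with a 20 mL floor, as stated in the
    text. `cell_mass_g` is assumed to be the wet pellet mass (assumption).
    """
    if cell_mass_g < 0:
        raise ValueError("cell mass must be non-negative")
    return max(minimum_ml, cell_mass_g * ml_per_g)


# Hypothetical pellet masses, for illustration only.
for mass in (2.0, 4.0, 6.5):
    print(f"{mass} g pellet: {lysis_buffer_volume_ml(mass)} mL lysis buffer")
```

Pellets of up to 4 g therefore receive the 20 mL minimum, and larger pellets scale linearly.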

Enzyme concentrations were measured using a (micro-) Bradford approach in 96-well plate format. A plate reader (EMax, Molecular Devices, San Jose, CA, USA) was used to determine absorbances at 590 nm, which were evaluated with the software SoftmaxPro V5 (Molecular Devices, San Jose, CA, USA). The identities of the nuclease mutants based on their molecular masses were verified with the micrOTOF-Q II Benchtop Mass Spectrometer (Bruker, Billerica, MA, USA).

#### *4.5. Enzyme Activity Assays*

Enzyme activity was determined both qualitatively and quantitatively based on the depolymerization of nucleic acids. All activity assays were done at pH 7, which has been determined to be the optimal pH. Due to the limited stability of nuclease mutants at elevated temperatures, all assays were done at 25 °C. No thermal effect on the stability was observed for any mutant at this temperature.

(1) Qualitative ethidium bromide staining: this assay was adapted from [4]. Recombinant enzyme was incubated with 5 μg sheared double-stranded genomic DNA (dsDNA), namely UltraPure™ Salmon Sperm DNA Solution (Thermo Fisher Scientific, Darmstadt, Germany). This method is based on quenching of the fluorescence of ethidium bromide intercalated into DNA. Repeated fluorescence recordings were taken using a VWR® imager (VWR International, Radnor, PA, USA) over a duration of up to 24 h.

(2) Qualitative visualization by agarose gel electrophoresis: this assay was adapted from [20]. Substrate specificity was tested with sheared dsDNA, unsheared dsDNA, namely deoxyribonucleic acid from calf thymus (Sigma-Aldrich, St. Louis, MO, USA), single-stranded genomic DNA (ssDNA) from calf thymus (Sigma-Aldrich, St. Louis, MO, USA), RNA from bacteriophage MS2 (Sigma-Aldrich, St. Louis, MO, USA), and circularized plasmid DNA. Qualitative levels of activity are exclusively interpreted as "+" (active enzyme) and "−" (inactive enzyme) in Figure 3, Figure 5, and Figure 6, and do not allow any quantification of activities.

(3) Quantitative measurements were done in Corning® 96-well UV-transparent plates (Merck KGaA, Darmstadt, Germany) using the Victor™ X4 Multilabel Plate Reader (PerkinElmer, Rodgau, Germany), as described previously [20]. Standard activity assays were conducted in 50 mM sodium phosphate buffer at pH 7.3. For the generation of pH profiles, reactions were performed in 50 mM sodium acetate buffer at pH 5 and 6, in 50 mM sodium phosphate buffer at pH 6, 7, and 8, and in Tris/HCl buffer at pH 7, 8, and 9. Temperature profiles were recorded in the range of 10 to 90 °C. EDTA tolerance was tested at the following concentrations: 0, 1, 2, 5, 10, 20, and 50 mM.

(4) Kinetic parameters: Vmax and KM were obtained by fitting the Michaelis–Menten equation to the data by non-linear regression, as described previously [20,38]. It has been speculated that the initial reaction rate of high molecular weight substrates is reduced at high substrate concentrations due to the extension of the lag phase [39]. Therefore, maximum reaction rates rather than initial rates were evaluated for each substrate concentration to determine defined kinetic parameters. All experiments were done in triplicate. The error level was below 10%.
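As an illustration of the kinetic analysis, the Michaelis–Menten model v = Vmax·[S]/(KM + [S]) can be fitted to rate data by non-linear least squares. The sketch below is not the authors' procedure (the text refers to [20,38] for that); it is a minimal, dependency-free example that assumes a separable fit: for each trial KM the best Vmax follows in closed form, and KM is located by golden-section search on a logarithmic scale. All function names and the synthetic data are hypothetical.

```python
import math


def michaelis_menten(s, vmax, km):
    """Reaction rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def best_vmax(substrate, rates, km):
    """Least-squares Vmax for a fixed KM (the model is linear in Vmax)."""
    x = [s / (km + s) for s in substrate]
    return sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)


def sum_squared_error(substrate, rates, km):
    """Residual sum of squares at a given KM, using the best Vmax for it."""
    vmax = best_vmax(substrate, rates, km)
    return sum(
        (v - michaelis_menten(s, vmax, km)) ** 2
        for s, v in zip(substrate, rates)
    )


def fit_michaelis_menten(substrate, rates, tol=1e-10):
    """Return (Vmax, KM) minimising the squared error.

    Golden-section search over log(KM); substrate concentrations must be > 0.
    The search assumes a single minimum within the bracket.
    """
    lo = math.log(min(substrate) / 1000.0)
    hi = math.log(max(substrate) * 1000.0)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        err_a = sum_squared_error(substrate, rates, math.exp(a))
        err_b = sum_squared_error(substrate, rates, math.exp(b))
        if err_a < err_b:
            hi = b
        else:
            lo = a
    km = math.exp((lo + hi) / 2.0)
    return best_vmax(substrate, rates, km), km


# Synthetic, noise-free example data (arbitrary units, not from the paper).
concentrations = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
observed = [michaelis_menten(s, 10.0, 2.0) for s in concentrations]
vmax_fit, km_fit = fit_michaelis_menten(concentrations, observed)
print(f"Vmax = {vmax_fit:.4f}, KM = {km_fit:.4f}")
```

In practice, the rates passed to such a fit would be the maximum reaction rates measured at each substrate concentration, as described above, and a dedicated fitting package would normally also report parameter uncertainties.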

#### **5. Conclusions**

The composition of the substrate groove was investigated by a combination of structural modelling, multiple sequence alignment, and site-directed mutagenesis. Amino acid residues that are in close proximity to the catalytic site (inner ring of the substrate groove) are of tremendous importance for proper activity of NSN, while amino acids at the border of the substrate groove (outer ring) are promising targets for modulation of the enzymatic properties with regard to turnover number, EDTA tolerance, and temperature preference.

#### **6. Patents**

A patent application describing the utilization of non-specific nucleases from the genus *Pseudomonas* and their application potential in cell-sorting approaches has been submitted by Miltenyi Biotec B.V. & Co. KG.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4344/9/11/941/s1. Figure S1: Sequence alignment of DNase/D157G vs. Nuc from *S. enterica* subsp. *enterica* serovar *Typhimurium*.

**Author Contributions:** Conceptualization, S.E.; methodology, L.S.S. and S.S.; validation, L.S.S., S.S., V.N., and S.E.; resources, V.N.; visualization, L.S.S. and S.E.; writing—original draft preparation, S.E.; writing—review and editing, S.E. and V.N.; All authors approved the final manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors thank Jens Hellmer (Miltenyi Biotec B.V. & Co. KG) for mass spectrometry analyses and Marek Wieczorek (Miltenyi Biotec B.V. & Co. KG) for discussion.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A**

**Figure A1.** Homology model of NSN from *Pseudomonas syringae* in top view. Amino acid residues of the conserved HxK(x)4D(x)6GSxN motif are given as sticks in dark and light red in the respective monomer. Numbering of amino acids that are part of the catalytic site was omitted for clarity (encircled in red, dashed line), except for amino acid residue D129, which is part of the HxK(x)4D(x)6GSxN motif, but not part of the catalytic site. Naturally occurring amino acids of the outer and inner rings are also given as sticks and highlighted in dark and light blue, while substituted positively charged amino acid residues are indicated in dark and light green.

**Figure A2.** SDS-PAGE results illustrating the purification of all mutant enzymes. M—protein marker, RE—crude extract, PE—pellet, SN—supernatant, ENi—elution fraction Ni-agarose, EIEX—elution fraction ion exchange chromatography. Asterisks (\*) indicate that inactive mutants Y63K/D157G and S141K/D157G were purified using non-optimized gravity flow purification approaches, resulting in lower purities.

**Figure A3.** Substrate specificity of mutants with amino acid substitutions in the outer ring. All incubations were done for 1 h at 25 °C. "Control" indicates negative controls containing substrate but no enzyme in the reaction mixture.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
