Article

Sequence-Structure Analysis Unlocking the Potential Functional Application of the Local 3D Motifs of Plant-Derived Diterpene Synthases

1
National Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming 650091, China
2
Key Laboratory of Microbial Diversity in Southwest China, Ministry of Education, Yunnan Institute of Microbiology, School of Life Sciences, Yunnan University, Kunming 650091, China
3
College of Mathematics and Computer Science, Dali University, Dali 671003, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Biomolecules 2024, 14(1), 120; https://doi.org/10.3390/biom14010120
Submission received: 12 December 2023 / Revised: 31 December 2023 / Accepted: 11 January 2024 / Published: 17 January 2024
(This article belongs to the Section Bioinformatics and Systems Biology)

Abstract

Plant-derived diterpene synthases (PdiTPSs) play a critical role in the formation of structurally and functionally diverse diterpenoids. However, the specificity or functional-related features of PdiTPSs are not well understood. For a more profound insight, we collected, constructed, and curated 199 functionally characterized PdiTPSs and their corresponding 3D structures. The complex correlations among their sequences, domains, structures, and corresponding products were comprehensively analyzed. Ultimately, our focus narrowed to the geometric arrangement of local structures. We found that local structural alignment can rapidly localize product-specific residues that have been validated by mutagenesis experiments. Based on the 3D motifs derived from the residues around the substrate, we successfully searched diterpene synthases (diTPSs) from the predicted terpene synthases and newly characterized PdiTPSs, suggesting that the identified 3D motifs can serve as distinctive signatures in diTPSs (I and II class). Local structural analysis revealed the PdiTPSs with more conserved amino acid residues show features unique to class I and class II, whereas those with fewer conserved amino acid residues typically exhibit product diversity and specificity. These results provide an attractive method for discovering novel or functionally equivalent enzymes and probing the product specificity in cases where enzyme characterization is limited.

1. Introduction

Diterpenoid natural products belong to a class of widely distributed C20 isoprenoids, with more than 18,000 members identified in plants [1]. They play an important role in plant growth and development [2], mediate complex plant-environment interactions [3], and also have applications in medicine, flavor, and food industries [4,5,6,7]. All the discovered diterpenoids can be classified based on their core diterpene skeletons by removing all heteroatoms and stereocenters and reducing unsaturated structures [1,8].
Diterpenoids are highly diversified and complex compounds derived from the 5-carbon building blocks isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP). Geranylgeranyl pyrophosphate (GGPP) synthase catalyzes the coupling of IPP and DMAPP in a processive head-to-tail fashion to generate linear hydrocarbon molecules. Then, diTPSs and cytochrome P450 monooxygenases (P450s) are responsible for synthesizing a variety of intermediates and modifying skeletons [2,9,10,11]. In particular, diTPSs catalyze remarkably complex cyclization cascades with structural and stereochemical precision, creating a chemical library of 20-carbon hydrocarbons.
Based on the reaction mechanism, diTPSs employ either ionization-induced carbocation formation (diTPSs I), protonation-induced carbocation formation (diTPSs II), or both mechanisms in bifunctional enzymes (diTPSs I/II) [12]. diTPSs II adopt β, βγ, or αβγ domain architectures, and their functional active site motif, DXDD, is located at the interface of the βγ domains, where the substrate GGPP reacts to generate an intermediate [13,14]. diTPSs I adopt α, αβ, or αβγ domain architectures and house the functional active site motif DDXXD in the α domain. In this region, the products of diTPSs II are converted into diterpene precursors [13,14]. diTPSs I/II exist as fusion proteins with αβγ domains. Truncation studies failed to generate independently functional constructs of the α (class I) and βγ (class II) domains, indicating that protein stability and the structural integrity of each catalytic domain largely depend on crucial interactions at the α-β domain interface [15,16]. Steady-state kinetic measurements showed that diTPSs II products diffuse into solution and then rebind to the active site of diTPSs I for the final cyclization reaction. This result contradicts the assumption of substrate channels between active sites [17]. Furthermore, the structural homology between the β and γ domains of squalene-hopene cyclase, together with their 23% amino acid sequence identity, suggests ancestral gene duplication and fusion events, with catalytic activity evolving at the domain-domain interface [18,19]. The available PdiTPS crystal structures indicate that most PdiTPSs contain up to three domains, namely γ, β, and α [15,20,21,22]. PdiTPSs I commonly feature non-active βγ domains, and PdiTPSs II generally have an inactive α domain.
These diTPSs make a significant contribution to synthesizing a diverse range of diterpene skeletons. In plants, there are over a hundred known diterpene skeletons [1], yet the currently identified plant diterpene synthases can synthesize fewer than one-tenth of them. The limited knowledge of enzyme substrate recognition and product distribution of diTPSs hinders the identification of novel functional diTPSs. Classic multiple-sequence alignment methods have been used to identify the functional motifs of diTPSs. The identified functional sequence motifs include DXDD [23], DDXXD [24,25,26], NSE/DTE [27], PIX [28], and LHS...PNV [29,30,31]. Structural alignments also provide access to 3D motifs, which are typically formed by specific subsets of amino acid residues. The connections between these residues might not be readily apparent in the amino acid sequence, but they are consistently present in the three-dimensional spatial structure of the enzyme. These characteristics may be shared by the majority of enzymes, while at the same time, they can be unique to a specific enzyme class and relevant to its function [32,33]. They are characterized by the spatial proximity of multiple non-contiguous amino acid residues brought together by the folding of the protein chain, resulting in a conserved or convergent spatial arrangement of structural modules.
Obtaining a substantial number of reliable structures to capture these 3D structural motifs was a challenging endeavor in the past. AlphaFold2 has generated reliable protein structures, helping to overcome the shortage of structural information. Its success in accurately predicting the three-dimensional (3D) folding of proteins and enzymes marks a revolution in structural biology. AlphaFold2, along with its predicted structures, has found diverse applications, including studies of protein pockets [34], complex structure prediction [35,36], studies of structural similarity [37], and novel fold predictions [38]. It has predicted structures for billions of proteins, encompassing virtually all known proteins [39]. This has further spurred the exploration of new protein families [40], the discovery of enzymes with new functions [41], and the investigation of the evolution and functionality of ancient proteins [42]. Through structure-based methods and the identification of functionally relevant motifs, we can delve into multifunctional enzymes and the plasticity of functional sites. Ultimately, these studies will assist in delineating the 3D fingerprints of enzymes, providing crucial references for the exploration of uncharted protein functionalities. Techniques are now available to perform structural site comparisons, such as ProBis [43], PocketAlign [44], and SiteMotif [45]. These tools can align binding sites of interest in enzymes, paving the way for localized structural investigations into the functional diTPSs.
Furthermore, by examining the residue-function relationship of Selaginella moellendorffii miltiradiene synthase (SmMDS), specific residues around the substrate that are responsible for product specificity, such as E690, S717, and H721, were identified [22]. The analysis of structures and catalytic mechanisms also suggests that the cavity formed by the substrate-surrounding residues can select the substrate [46]. Product-changing mutational studies and structural analyses provide valuable insights for investigating PdiTPS function. However, this research has covered only a small fraction of the characterized PdiTPSs. Moreover, there has not yet been a comprehensive investigation of PdiTPSs based on both sequence and structure.
Here, a manually curated and annotated database has been utilized to investigate the partitioning of PdiTPSs functions, using sequence similarity network (SSN) and phylogenetic analysis. The correlations were examined among various factors, including overall sequences, subsequences, overall structures, residues around the substrate, and product similarity. We calculated the residue preferences surrounding the substrate and analyzed their spatial conservation to identify the range of residues that significantly affect substrate type and product outcome. We analyzed the structural motifs around the substrates of PdiTPSs and used these motifs to search uncharacterized terpene synthases as well as recently characterized PdiTPSs. The comprehensive analysis here yielded valuable insights into exploring product-specific residues in PdiTPSs and the mapping patterns between PdiTPSs and their functions.

2. Materials and Methods

2.1. Collect Characterized PdiTPSs

To find potentially characterized PdiTPSs, we manually searched the literature up to May 2022 for experimental characterizations of diterpenes and collected the corresponding GenBank accession numbers (NCBI). Hidden Markov models (HMMs) of the N-terminal domain (PF01397) and C-terminal domain (PF03936) of terpene synthases were downloaded from the Pfam database. Hmmsearch v.3.1.2 was used to search for the N- and C-terminal sequences in each PdiTPS sequence. When multiple N- or C-terminal sequences were identified, the result with the lowest E-value was retained (Tables S5 and S6).
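The retention rule described above (keep the lowest E-value hit per domain) can be sketched as follows. This is a minimal illustration that assumes the hmmsearch results have already been parsed into tuples; the tuple layout, the function name `best_hits`, and the example values are hypothetical and are not taken from Tables S5 and S6.

```python
from typing import Dict, Iterable, Tuple

# Assumed record layout: (sequence_id, pfam_id, e_value, start, end)
Hit = Tuple[str, str, float, int, int]


def best_hits(hits: Iterable[Hit]) -> Dict[Tuple[str, str], Hit]:
    """For each (sequence, Pfam domain) pair, keep the hit with the lowest E-value."""
    best: Dict[Tuple[str, str], Hit] = {}
    for hit in hits:
        key = (hit[0], hit[1])
        if key not in best or hit[2] < best[key][2]:
            best[key] = hit
    return best


# Hypothetical hits for illustration only.
example_hits = [
    ("seqA", "PF01397", 1e-40, 230, 410),
    ("seqA", "PF01397", 3e-05, 20, 90),
    ("seqA", "PF03936", 2e-60, 450, 720),
    ("seqB", "PF03936", 5e-30, 300, 560),
]
selected = best_hits(example_hits)
for (seq_id, pfam_id), hit in sorted(selected.items()):
    print(seq_id, pfam_id, hit[2], hit[3], hit[4])
```

Keying on the pair of sequence and domain keeps one N-terminal and one C-terminal region per enzyme, which is what the subsequence analyses in later sections rely on.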

2.2. Creation of Sequence Similarity Networks

All-versus-all pairwise local sequence alignments were performed using SSNpipe v.1.0.0 for PdiTPSs C, N, NC domains, and overall sequences [47]. The BLAST result files were searched with E-value thresholds ranging from 10⁻⁵ to 10⁻¹⁴⁰ at 5 log unit intervals. The networks were visualized using Cytoscape (version 3.8.2).
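As a rough illustration of the threshold scan described above, the sketch below enumerates the E-value exponents from -5 to -140 in steps of 5 and filters a list of pairwise hits at a chosen threshold. It does not reproduce SSNpipe; the function names and the edge tuples are assumptions made for this example.

```python
from typing import List, Tuple

Edge = Tuple[str, str, float]  # assumed layout: (query_id, subject_id, e_value)


def evalue_exponents(start: int = -5, stop: int = -140, step: int = 5) -> List[int]:
    """Exponents of the E-value thresholds: -5, -10, ..., -140."""
    return list(range(start, stop - 1, -step))


def filter_edges(edges: List[Edge], exponent: int) -> List[Tuple[str, str]]:
    """Keep non-self edges whose E-value is at or below 10**exponent."""
    cutoff = 10.0 ** exponent
    return [(a, b) for a, b, e in edges if a != b and e <= cutoff]


# Hypothetical pairwise hits for illustration only.
edges = [("A", "B", 1e-150), ("A", "C", 1e-30), ("B", "C", 1e-8)]
for exponent in (-5, -10, -50):
    print(exponent, filter_edges(edges, exponent))
```

Stricter thresholds remove weaker edges, so the network breaks into smaller clusters as the exponent decreases.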

2.3. Phylogenetic Analysis

The protein sequences were aligned using MAFFT v.7.310 [48] with three methods (globalpair, genafpair, and localpair). Manual inspection was performed to ensure proper alignment of known motifs such as the DDXXD and DXDD motifs. Phylogenetic trees were inferred using IQ-TREE v.2.0.3 [49] with 1000 bootstrap replicates and the following parameters: -s Mafft_Sequence -m JTT+F+R7 (full-length)/JTT+F+R5 (C-terminal and N-terminal subsequences) -B 1000 -nt AUTO. Levopimaradiene synthase from the hornwort Phaeoceros carolinianus was specified as the outgroup, and the tree was visualized with Chiplot [50].

2.4. Retrieve and Visualize Sequence Motifs

A total of 199 PdiTPSs amino acid sequences were submitted to the MEME online tool https://meme-suite.org/meme/tools/meme (accessed on 23 November 2022) to identify sequence motifs. Based on the width of functional motifs, the number of motifs was set to 20, and the minimum width of the motifs was set to 3. Other parameters were set to default values.

2.5. Calculations on Similarity and Correlation between Sequences, Structures, and Small Molecules

The similarities between C-terminal, N-terminal, and overall sequences were compared using the TBtools [51] Protein Pairwise Similarity Matrix. TM-align [52] was used to compare the topological similarities of the residues around the substrates and of the overall structures, with the command "TMalign Pdb-A Pdb-B -outfmt 2", and the TM-score was used as the measure of structural similarity. The Dice Similarity Coefficient (DSC) [53] was used to measure chemical similarity on extended connectivity fingerprints (ECFP/Morgan fingerprint, radius 2). The SMILES strings of the small molecules used in the calculations are provided in Table S4. For each pair of PdiTPSs, their corresponding products were combined into all possible pairs, and the similarity score of each pair was calculated using the RDKit similarity matrix. For PdiTPSs that produce only one product, there is only one product pair and thus only one similarity value. For those that produce multiple products, several product pairs were obtained, and the average of their similarity scores was used to represent the product similarity value. The Pearson correlation coefficient (PCC) was calculated using the ggstatsplot package in R software (version 4.3.1) [54]. The correlation analysis involved the following factors: the overall sequences of 199 PdiTPSs, the sequence similarities of C-terminal, N-terminal, and NC-terminal subsequences, as well as the structures formed by residues surrounding the substrates and the overall structures.
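The product-similarity step above can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' pipeline: fingerprints are represented as sets of integer feature identifiers (in practice they would come from a cheminformatics toolkit such as RDKit), and the product names and feature sets are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice similarity coefficient of two feature sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def mean_product_similarity(
    products_x: List[str], products_y: List[str], fingerprints: Dict[str, Set[int]]
) -> float:
    """Average Dice similarity over all product pairs of two enzymes."""
    scores = [
        dice(fingerprints[p], fingerprints[q])
        for p, q in product(products_x, products_y)
    ]
    return sum(scores) / len(scores)


# Hypothetical fingerprints (sets of feature identifiers), for illustration only.
fps = {"p1": {1, 2, 3, 4}, "p2": {3, 4, 5, 6}, "p3": {1, 2, 3, 4}}
print(mean_product_similarity(["p1"], ["p2"], fps))        # one product pair
print(mean_product_similarity(["p1"], ["p2", "p3"], fps))  # mean over two pairs
```

Averaging over all product pairs gives a single value per enzyme pair, so multi-product enzymes can be compared on the same scale as single-product enzymes.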

2.6. Structure Prediction and Molecular Docking

A total of 153 structures of PdiTPSs were downloaded from the AlphaFold database https://alphafold.com/ (accessed on 30 July 2022), and the 3D structures of the rest were predicted using ColabFold [55,56]. The obtained structures were docked with substrates using the CB-Dock2 molecular docking program https://cadd.labshare.cn/cb-dock2 (accessed on 11 September 2022). Default parameters were used for molecular docking. The docking results generated by CB-Dock2 were selected with reference to the crystal structures. Amino acids within a 4–10 Å range of the substrate were selected using the command "select AA, byres all within 4 of sele" in PyMOL (version 2.0) for further analysis. The diagrams were made with PyMOL.
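The residue selection above (whole residues with at least one atom near the substrate) can be approximated outside PyMOL as shown below. This is a sketch, not the PyMOL implementation: the atom tuple layout, the function name `residues_within`, and the coordinates are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Assumed atom layout: (chain, residue_number, residue_name, (x, y, z))
Atom = Tuple[str, int, str, Tuple[float, float, float]]


def residues_within(
    protein_atoms: Sequence[Atom],
    ligand_coords: Sequence[Tuple[float, float, float]],
    cutoff: float,
) -> List[Tuple[str, int, str]]:
    """Return residues that have at least one atom within `cutoff` Å of any ligand atom."""
    selected = set()
    for chain, resnum, resname, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            selected.add((chain, resnum, resname))
    return sorted(selected)


# Hypothetical coordinates for illustration only.
atoms = [
    ("A", 10, "ASP", (0.0, 0.0, 3.0)),
    ("A", 10, "ASP", (0.0, 0.0, 9.0)),
    ("A", 11, "LEU", (0.0, 0.0, 7.0)),
    ("A", 12, "PHE", (0.0, 0.0, 12.0)),
]
ligand = [(0.0, 0.0, 0.0)]
for cutoff in (4.0, 6.0, 8.0, 10.0):
    print(cutoff, residues_within(atoms, ligand, cutoff))
```

Running the selection at several cutoffs mirrors the 4–10 Å range used here, with each larger cutoff returning a superset of the residues selected at a smaller one.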

2.7. Analyzing Amino Acid Frequencies and Preferences in the Proximity of Substrates

Software iLearnPlus v.1.0.1 [57] was used to extract the amino acid composition (AAC) and grouped amino acid composition (GAAC) features from the residues surrounding the substrates and the overall sequences. The resulting features included the frequencies of the 20 amino acids and their categorization based on 5 physicochemical properties (aliphatic, aromatic, negative charge, positive charge, uncharged). The residue preferential value was then calculated as the ratio of the frequency of each residue around the substrate to its frequency in the overall sequence.
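The preferential value defined above is a ratio of two amino acid frequencies. The sketch below computes it directly from two strings of one-letter residue codes; it does not use iLearnPlus, and the function names and the toy sequences are assumptions for illustration.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> Dict[str, float]:
    """Amino acid composition: frequency of each of the 20 standard residues."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def residue_preference(pocket: str, full_sequence: str) -> Dict[str, float]:
    """Ratio of each residue's frequency around the substrate to its overall frequency."""
    pocket_freq = aac(pocket)
    full_freq = aac(full_sequence)
    # Residues absent from the full sequence are skipped to avoid division by zero.
    return {a: pocket_freq[a] / full_freq[a] for a in AMINO_ACIDS if full_freq[a] > 0}


# Toy example: D is enriched near the substrate, A is depleted.
preference = residue_preference(pocket="DDAA", full_sequence="DDAAAAAAAA")
print(preference)
```

A value above 1 indicates a residue that is more frequent around the substrate than in the sequence as a whole, and a value below 1 indicates depletion.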

2.8. Capturing Structure Motifs and Application

We utilized SiteMotif [45] to retrieve structural motifs generated by residues within 6 Å of the substrates for PdiTPSs that produce SK1-SK15, as well as motifs generated by residues within 8 Å of the substrates for PdiTPSs that produce different diterpene intermediates. Cutoff values of M-dist-min > 0.6 and M-dist-max > 0.4 were used to identify representative structural motifs in SiteMotif. Subsequently, the retrieved motifs were visualized using PyMOL (version 2.0) and saved as aligned sequences. We further created Logo plots for these sequences using Hiplot (ORG) [58].
Additionally, we downloaded 342 putative terpene synthase structures from the AlphaFold2 protein structure database. We used pyScoMotif (version 0.9.50) [59] to search for structural motifs in the uncharacterized terpene synthase structures and the newly identified structures of PdiTPSs. The '--residue type policy = relaxed' option was used to permit a maximum of one residue mutation while generating similar structural motifs.

3. Results and Discussion

3.1. Overview of Functional Annotations of PdiTPSs

We provide a curated database of 199 functional PdiTPSs, including 27 bifunctional enzymes, 90 class I enzymes, and 82 class II enzymes (Table S1). These PdiTPSs were derived from 69 plant species belonging to 26 families and 52 genera, producing 16 diterpene intermediates and 63 diterpene precursors. Of these products, only a small fraction was found to be associated with multiple PdiTPSs, while the majority of products were primarily catalyzed by a single PdiTPS, which affected the product-specific analysis.
To solve this problem, the existing terpenoid skeleton classification system [60] was employed to group these products into 16 different types. Products from PdiTPSs I and PdiTPSs I/II were classified into 15 skeleton types, while those from PdiTPSs II were classified into a single type based on the preserved phosphate group features (Table S1). Detailed information about the structures of the products and substrates, as well as the grouping of backbone structures, is provided in the supplementary material (Tables S2 and S3). The classification scheme allowed us to group multiple products from a single enzyme into the same category. For example, SsSS synthase from Salvia sclarea [5] can catalyze the dephosphorylation and minor rearrangement of 9 diterpene intermediates to produce 11 diterpene precursors, all of which fall into the same labdane scaffold. However, this classification system is not always effective; for example, TrTPS13 from Tripterygium regelii produces five products belonging to pimarane, kaurane, and stemodane scaffolds (ntkrn, sndarpardn, ipsfdn, spsfdn, sdmon), and a PdiTPS from Grindelia hirsutula produces three products (abedn, epmnlo, mnlo) belonging to labdane and abietane scaffolds (Table S3). Therefore, in this study, we also attempt to explore how the sequences and structures of PdiTPSs correlate with the similarity of their substrates or products. Although the diversity of products and high substrate promiscuity challenge the analysis of enzyme product specificity, these enzymes can be used to explore and obtain new diterpenes. For example, Peter and his colleagues established a new biosynthetic pathway of 16 diterpenes using extremely promiscuous diTPSs from the plant Salvia sclarea (SsSS) and the bacterium Kitasatospora griseola (KgTS) [61]. Subsequently, they used the heterozygosity of diTPSs for combinatorial biosynthesis in their research. Their work offers novel biosynthetic access to almost 19 labdane-related diterpenes, showcasing the power of the combinatorial approach for expanding chemical diversity and potential application [62].

3.2. The SSN Topology Displays Fuzzy Functional Relationships of PdiTPSs

We examined whether the skeleton classification system could be used for functional mapping in SSN analysis. This all-pairs local sequence-based comparison method can rapidly generate a network of nodes and edges using any expectation value (E) as a threshold. SSNs can be used to assign functions to uncharacterized enzymes; for example, they have been used to explore the functional relationship between the glutamylation domains of the lantibiotic dehydratases [63] and to achieve functional partitioning of sesquiterpene synthases, correctly predicting new sesquiterpene synthases from five different fungi [64].
The analysis above suggests that C-terminal, N-terminal, NC-terminal subsequences, and overall sequences can be used for classifying PdiTPSs I and PdiTPSs II. In general, the N-terminal (Figure 1a), C-terminal (Figure 1b), and NC-terminal (Figure 1c) networks generated by SSN mainly resulted in multiple backbone clustering. In particular, labdane, abietane, and pimarane skeletons were often clustered together, despite the clear differences in their product structures. In contrast, the product skeleton clusters obtained from multiple sequence comparisons of N-terminal subsequences were more refined, with fewer outliers.
While PdiTPSs from different species clustered together based on multiple sequence alignments, grouping them according to products has limitations. Moreover, we observed that the most abundant and widely distributed skeletons, such as labdane, abietane, pimarane, and kaurane [65], are predominantly produced by PdiTPSs from early diverging plant lineages, such as ferns and mosses (Figure 1). The SSN analysis provided further insights into the relationship between the product backbones and enzyme sequences. We found that the labdane and kaurane skeletons serve as crucial linkage points for other skeleton groups (Figure 1c). This implies that they may be the fundamental skeletons driving the evolution and diversification of diTPS products, but further experimental validation is required to confirm this hypothesis.

3.3. The Functional Subgroups of PdiTPSs Remain Unclear from a Phylogenetic Perspective

The phylogenetic tree of PdiTPSs was constructed to detect evolutionary relationships and identify lineages with similar features. By examining the change in product skeletons, it should be possible to identify the potential correlation between products and PdiTPSs mapping. Terpene synthases commonly contain two conserved structural domains, the N-terminal and C-terminal domains. These two structural domains comprise active sites, and in the analysis of product specificity in plant sesquiterpene synthases, isolated C-terminal subsequence characteristics have been found to effectively explain product specificity [66,67]. Therefore, phylogenetic trees for N-terminal and C-terminal subsequences were constructed.
Unfortunately, the phylogenetic tree did not provide a clear division of PdiTPSs based on their functions. However, the labdane, clerodane, pimarane, kaurane, and PdiTPSs II products were frequently present among the products of the early diverging PdiTPSs. While this phylogenetic tree exclusively illustrates the gene's evolutionary relationships, similar patterns are also evident in the phylogenetic trees of the overall sequences (Figure 2), as well as the N-terminal (Figure S1A) and C-terminal (Figure S1B) domains. The atisane, trachylobane, beyerane, and stemodane skeletons were found among the products of the late-emerging PdiTPSs. Along the major branches of the trees, the products of PdiTPSs showed a trend toward multi-ring skeletons. Moreover, the PdiTPSs that produced the casbane, cembrane, taxane, vulgarisane, and pseudolarane skeletons showed shorter evolutionary distances from the ancestral PdiTPSs. This provides valuable insights into how the function and evolution of PdiTPSs may contribute to species-specific adaptations to unique ecological niches. However, additional investigations are necessary to further explore the distribution patterns of diterpenoid compound types and their relationship with the evolutionary status of plants. Similar research has been carried out to investigate the distribution of terpenoid compounds and their biosynthetic pathways in various species of Isodon plants [68,69].
The domain composition of PdiTPSs showed that the γβα triple-domain structure and βα bi-domain structure alternately appeared in the phylogenetic tree, indicating a phenomenon of continual loss and acquisition of structural domain subsequences during the evolution of terpene synthases. Additionally, the bi-domain βα structure, seen only in angiosperms (Figure 2), originated from the loss of the γ domain in ancestral terpene synthases with the γβα structure [13,69].
The LHS and PNV motifs (Figure 2) were copalyl diphosphate synthase (CPS)-specific motifs [29], as confirmed by sequence-based analysis. These two motifs were conserved in PdiTPSs II, but had undergone mutations in PdiTPSs I. The histidine (H) residue in the FEHXW motif exerted cooperative GGPP/Mg2+ inhibition on CPS [70], but histidine was not always conserved in the FEHXW motif of PdiTPSs II. Although the function of the aromatic amino acids in this motif remained unclear, in PdiTPSs I the motif was no longer predominantly composed of aromatic amino acids, but rather of aliphatic and uncharged amino acids. The PIX motif (Figure 2) was related to ent-kaurene [28] and was lost in the PdiTPSs of angiosperms that produced primarily labdane and abietane, as well as in those producing the polycyclic skeletons casbane, cembrane, and vulgarisane. This motif was present in the PdiTPSs of mosses that produce labdane and abietane, but had undergone mutations. This suggests that potentially product-specific motifs might exist in PdiTPSs, which have evolved through deletions and mutations, resulting in enzyme sequences with newly acquired functions. Motifs that differ from those in other PdiTPSs, or that are absent from other PdiTPSs, may therefore help uncover product-specific motifs.

3.4. PdiTPSs with Conserved N-Terminal and Variable C-Terminal Subsequences

Box plots were used to visualize the sequence similarity features of PdiTPSs. Larger values for the upper quartile and lower quartile in the box plot indicate higher similarity among sequences. Statistical results (Figure 3) showed that the C-terminal was more conserved in PdiTPSs I, while the N-terminal was more conserved in PdiTPSs II. The main differences in sequence similarity between PdiTPSs I and PdiTPSs II were located in the C-terminal (Figure 3), suggesting that the use of C-terminal subsequences might facilitate divergence. The similarity distribution of PdiTPSs I and PdiTPSs I/II sequences was lower in the NC-terminal and overall regions than in the C- and N-terminal subsequences, while the similarity distribution of PdiTPSs II and PdiTPSs I/II sequences was lower in the C-terminal than in the N-terminal, NC-terminal, and overall sequences (Figure 3).
After obtaining the basic conservation features of the PdiTPSs sequences, we compared and assessed sequence similarity in terms of substrate recognition and product generation. Theoretically, sequence similarity among enzymes that recognize the same substrates or produce the same products should be higher than among enzymes that recognize different substrates or produce different products. The results are consistent with this expectation. The N-terminal subsequences had the highest upper and lower quartiles of sequence similarities in identifying the same substrates and producing the same products. In contrast, the C-terminal subsequences had the lowest lower quartile of sequence similarities in identifying different substrates and producing different products. Additionally, the C-terminal subsequences presented the fewest 100% sequence similarity values in identifying different substrates and producing different products (Figure 4a,b). Similarly, a phylogenetic tree constructed from the C-terminal subsequences of sesquiterpene synthases has been reported to group enzymes based on their product types [66]. This is also reflected in the SSN analysis, where the C-terminal subsequences generated more distinct clusters and distant outliers for different types of products than the N-terminal subsequences at the same threshold (Figure 1a,b).

3.5. PdiTPSs Have a Conserved Global Structure and a Flexible Fold of Surrounding Residues

While some clues have been obtained from sequences to predict the substrates and products of PdiTPSs, these have not yet expanded our insights into the functional features. Hence, further examination of PdiTPSs' structures and their substrate-surrounding residues is warranted. The distribution of fold similarities for the global structures was calculated (Figure 3), with most TM scores exceeding 0.75. Furthermore, the median TM scores of the overall structures when recognizing the same and different substrates (Figure 4d), as well as producing the same and different products (Figure 4e), were also greater than 0.75 and closer to the upper quartile (Q3). These results suggested that PdiTPSs that perform different functions share a similar TPS fold. Here, only the superimposed representative structures of the different types of PdiTPSs at the median (Q2) of the TM scores are shown (Figure 3). Supplementary Figure S2 shows the global structural superpositions of the representative structures of the different types of PdiTPSs in Figure 3 at the 2 extreme values and 3 quartiles of the TM score. Furthermore, increasing the selected residue range around the substrate led to a higher TM score, signifying greater differences in the local structure formed by residues closer to the substrate. Figure 4c displays the location of the local structure constituted by the residues surrounding the substrate. Conversely, the conservation of the local structure formed by residues farther from the substrate increased, although the rate of increase leveled off (Figure 4d,e).
Both the local structures formed by substrate-surrounding residues and the overall structures exhibited significantly higher TM scores for enzymes sharing the same substrates or products than for those with different substrates or products (Figure 4d,e). This trend was more pronounced than the corresponding difference in sequence similarity (Figure 4a,b). The results of the structural analysis are similar to the evaluation of sequence similarity. For enzymes that recognize the same substrates or produce the same products, the TM scores of both the residues around the substrates and the whole structures are greater than 0.5. Conversely, the TM scores for structures recognizing different substrates and producing different products are less than 0.5. Therefore, the local structures formed by residues within 6 Å of the substrates appeared to be the best choice for determining substrate and product similarity.

3.6. N-Terminal Subsequence Strongly Correlates with Overall Sequence Similarity

We have provided some insights for the determination of product types at both the sequence and structure levels. Hence, it would be interesting to further explore methods for quantitatively assessing the relationship between PdiTPSs sequences, structures, and products. The correlation was evaluated using Pearson's correlation coefficient (PCC) [71], with only the final coefficient being shown here. The statistical results showed that the correlation between the similarities of the C-terminal subsequences and the overall sequences (PCC = 0.46, p < 0.001) was significantly weaker than that between the N-terminal subsequences and the overall sequences (PCC = 0.91, p < 0.001). In addition, there was almost no correlation between the C-terminal and N-terminal subsequences (PCC = 0.26, p < 0.001).
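For reference, the PCC between two lists of pairwise similarity values can be computed as in the sketch below. It is a plain-Python illustration of the standard formula, not the ggstatsplot workflow used in this study, and the input values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long lists of values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical pairwise similarity values for illustration only.
n_terminal_similarity = [0.2, 0.4, 0.6, 0.8]
overall_similarity = [0.3, 0.5, 0.7, 0.9]
print(pearson(n_terminal_similarity, overall_similarity))
```

Each input list would hold one similarity value per enzyme pair, so the coefficient describes how closely two similarity measures track each other across all pairs.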
The weak correlation of sequence similarities between the C-terminal and N-terminal subsequences indicates that their contributions to PdiTPSs specificity are indeed significantly different. However, the phylogenetic trees of the C-terminal subsequence, N-terminal subsequence, and full-length sequence are similar. Another noteworthy observation is that the sequence similarity between the N-terminal subsequence and the full-length sequence is highly correlated, while the C-terminal subsequence is only weakly related to either the N-terminal or the full-length sequence. A study has suggested that the N-terminal subsequence has remained conserved during evolution [14]. In contrast, the C-terminal subsequence likely underwent functional selection and evolved at a faster rate in order to acquire new functions to adapt to the environment. Clues to this can be found in our statistics, which show greater sequence divergence in the C-terminal subsequences than in the N-terminal subsequences.

3.7. Substrate-Surrounding Residue Topology in PdiTPSs Is Independent of the Overall Structure

The analysis based on sequence features provides limited insights into substrate recognition and functional diversification of PdiTPSs. Protein structures provide a higher resolution platform for understanding function, but acquiring protein structures is expensive and difficult. The emergence of AlphaFold2 and its high accuracy is exciting, as it has been applied to understand the mechanisms of enzymes [72]. In this study, AlphaFold2 was used to obtain structural data for the PdiTPSs. Blind docking using CB-Dock2 can reveal key residues that are functionally relevant in the binding pocket [73,74,75]. Based on the structural data, the variable arrangement of the γ, β, and α domains in PdiTPSs (Figure 2) was observed to be an important strategy for expanding and diversifying diTPSs. Their combination, presence, and absence constitute the structural chemistry of diTPSs [14,15,21].
In addition, the correlation analysis showed that the similarity of the overall structure increased with the similarity of the overall sequence (PCC = 0.78, p < 0.001). The N-terminal subsequence showed a high correlation with the overall sequence and a moderate correlation with the TM score of the overall structure (PCC = 0.69, p < 0.001). Conversely, the C-terminal subsequence differed significantly from the N-terminal and overall sequences and exhibited a weak correlation with the overall structure (PCC = 0.33, p < 0.001). For the residues surrounding the substrate, the trend was reversed relative to that seen for the overall structure (Table S9). The average correlation between the C-terminal subsequence and the residues around the substrate (0.54) was higher than that between the N-terminal subsequence and the residues around the substrate (0.37). Combining the N- and C-terminal subsequences as the NC-terminal subsequence increased the average correlation with residues around the substrate to 0.58, which is understandable because both the N- and C-terminal subsequences contain residues around the substrate. Furthermore, there was no correlation between residues around the substrate and the topology of the overall structure, as indicated by the TM score distribution. This distribution was significantly lower for residues around the substrate compared to the overall structure (Figure 4d,e). This indicates that the local structures of PdiTPSs, formed by residues around the substrates, may have significantly different folding mechanisms compared to the overall structures.
To evaluate the relationship between sequence similarity, the TM scores of protein structures, and products, the impact of sequence similarities and TM scores on products was indirectly assessed by measuring the strength of their correlations. The final correlation coefficients between product similarities and sequences and structures are summarized in Table 1. Surprisingly, the overall sequence had the highest correlation with the product, while the residues around the substrate and the overall topology had a weak correlation with the product. This phenomenon can be explained by the fact that our research primarily centers on assessing the geometric compatibility between the substrate and the enzyme pocket. It is evident that, in the catalytic process of diTPSs, the geometric selection of the substrate by the enzyme is only one of the factors. Hence, we propose a deeper exploration of the conserved physicochemical properties of residues occupying the same spatial vicinity to the substrate. This avenue of investigation promises a more insightful elucidation of the intricate interplay between diTPSs and the process of product formation.
To further validate the impact of the correlation between product and sequence similarity on product grouping, the similarity mapping between products and conserved motifs in PdiTPSs was analyzed by MEME. Four signature motifs (LHS, PNV, FERLW, and PIX), located within longer MEME motifs of different lengths (Figure 5), were identified. The pairwise similarities of these motifs were then correlated with the similarities of the products. The correlation between the similarities of motif 2 (PNV and FERLW) and products was 0.51, while the correlations for motif 1 (PIX) and motif 3 (LHS) were 0.32 and 0.36, respectively. Motifs showing stronger product correlations exhibit fewer mutations in nonfunctional motifs and vice versa. This correlation indirectly reflects the differentiation level among enzyme motifs. Motifs with high product correlation likely represent functional domains of the PdiTPSs, whereas positions with low product correlation may contain product-specific motifs. Additionally, these functionally identified motifs, except for DDXXD and DXDD, can be aligned across all diTPSs and easily identified. However, the remaining motifs can only be discovered in enzymes with the same functions. After aligning enzymes with different functions, these motifs are overshadowed (Figure 5e–g). Yet, evidently, these motifs have fixed positions in the structures (Figure 5a–d). This phenomenon further emphasizes our focus on structural features.

3.8. Aromatic Residues around PdiTPSs Substrates Affect the Substrate Selection

Generally, the probability of catalysis-associated residues appearing elsewhere in the sequence should be lower than that of their appearing at the catalytic center. Aromatic amino acids were the most frequently observed residues around substrates of different sizes in PdiTPSs (Figure 6a,b). This may be because the aromatic rings of these amino acids provide electrons to stabilize carbocation intermediates in PdiTPSs catalysis, facilitating substrate conversion. Specifically, tryptophan (W) was more likely to occur around the linear substrate GGPP, while phenylalanine (F) and tyrosine (Y) were more likely to be present around the initial cyclization intermediates that serve as substrates of PdiTPSs (Figure 7b,d).
The active sites of PdiTPSs usually contain at least 2–3 aromatic residues, which most likely guide the intermediates involved in the reaction through spatial constraints and cation-π interactions. Moreover, the number of aromatic residues in the active site can be used to predict the promiscuity of the enzyme [76]. This report, however, focuses on how the observed substrate structures relate to the types of aromatic residues observed around them.
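A simple way to tabulate such observations is to count the aromatic residues among the residues selected around a substrate. The sketch below assumes that a pocket is given as a list of one-letter residue codes and that only F, W, and Y are treated as aromatic (histidine is deliberately excluded here); both the helper names and the example pocket are hypothetical.

```python
from collections import Counter

# Assumption: F, W, and Y are counted as aromatic; histidine is excluded.
AROMATIC = {"F", "W", "Y"}


def aromatic_profile(pocket_residues):
    """Count each aromatic residue type among the pocket residues."""
    counts = Counter(r for r in pocket_residues if r in AROMATIC)
    return {aa: counts.get(aa, 0) for aa in sorted(AROMATIC)}


def aromatic_fraction(pocket_residues):
    """Fraction of pocket residues that are aromatic (0.0 for an empty pocket)."""
    if not pocket_residues:
        return 0.0
    n_aromatic = sum(1 for r in pocket_residues if r in AROMATIC)
    return n_aromatic / len(pocket_residues)


# Hypothetical pocket composition, for illustration only.
pocket = list("DDWFYIVSTW")
print(aromatic_profile(pocket), aromatic_fraction(pocket))
```

Comparing such profiles between pockets docked with GGPP and pockets docked with cyclized intermediates would be one way to reproduce the kind of W versus F/Y comparison described above.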

3.9. The Spatial Landscape of Conserved Residues and Less Conserved Residues around PdiTPSs Substrates and Their Applications

The substrate-surrounding residues of PdiTPSs include those whose impacts on product formation have been experimentally verified. Note that the physicochemical properties and spatial orientations of these residues surrounding the substrate were highly conserved. Some frequently occurring residues, such as arginine (R), cysteine (C), tryptophan (W), aspartic acid (D), isoleucine (I), serine (S), threonine (T), valine (V), phenylalanine (F), tyrosine (Y), and methionine (M) (Figure 6b), were also conserved in their spatial positions (Figure 7c,d). Except for arginine (R), cysteine (C), phenylalanine (F), and tryptophan (W), the effects of these residues on enzyme function and product have been demonstrated by mutagenesis experiments. For example, the PdiTPSs OsKSL5i: I664T and OsKSL5i: I718V from rice (Oryza sativa) specifically produced ent-pimara-8(14),15-diene and ent-isokaur-15-ene, respectively [77]. Six PdiTPSs from Tripterygium wilfordii independently evolved new functions by mutating specific residues, including TwKSL1v2: M607\T638A, TwKSL3: M608\I639, TwKSL2: A608\I639, TwCPS3: I115\N327\V328\H268, TwCPS5: T115\A327\T326, and TwCPS6: Y265 [78]. Moreover, mutation of glutamic acid (E) at position 690 to arginine (R), phenylalanine (F), lysine (K), proline (P), or aspartic acid (D), or mutation of serine (S) at position 721 to valine (V), in SmMDS resulted in product loss [22].
Low-frequency residues, including alanine (A) and histidine (H), also contribute to product specificity. For example, the AgAS: A723S mutant of abietadiene synthase specifically produced pimaradienes, and the H268 residue in TwCPS3 also contributed to product specificity [78,79]. Additionally, these low-frequency residues around the substrate often appeared spatially conserved but less conserved in physicochemical properties (Figure 7c,d). Mutated residues affecting the products mainly occurred within 6 Å of the substrates in PdiTPSs I (Figure 7a) and within 8 Å of the substrates in PdiTPSs II (Figure 7c). For specific residue shapes and maps within 4 Å to 8 Å of the substrates of representative PdiTPSs I and PdiTPSs I/II that produced SK1-SK15 skeleton types, as well as those of PdiTPSs II, please refer to the Supplementary Materials (Figures S3–S23).
The impact of residues surrounding the substrate on product outcomes has also been observed in other enzymes. For example, the residues around the substrate in P450 enzymes can control regio- and stereo-specificity in the biosynthesis of bacterial heterodimeric diketopiperazines [80]. The local structural analysis reveals that exploring and examining local structural alignment to generate sequence-order independent structural site motifs [45,81] might offer us a perspective to unveil the remarkable chemical diversity of PdiTPSs.
Given our observation of conserved spatial arrangements among certain residues around the substrates of PdiTPSs, we employed SiteMotif to search separately for conserved structural motifs among the residues within 8 Å and 6 Å of the substrates in PdiTPSs that generate different diterpene scaffolds and intermediates. This method allows for the rapid batch retrieval of structural motifs within multiple local structures and has been proven effective in obtaining motifs of binding sites for glutathione-binding proteins [45]. Based on the results retrieved by SiteMotif, we observed significant differences between the local structural motifs of PdiTPSs producing various diterpene scaffolds (Figure 7b) and those producing intermediates (Figure 7d). Considering the inclusion of diverse characteristics in PdiTPSs, we hypothesize that these two motifs may represent the characteristic site atlas of residues around the substrates of PdiTPSs. pyScoMotif [59] was then used to search for 3D structural motifs similar to these two motifs in newly characterized PdiTPSs and unreviewed terpene synthases, and the results indicate that these motifs are present even in PdiTPSs that produce new terpene scaffolds [82]. Furthermore, these two motifs can be used to rectify incorrectly annotated terpene synthases obtained from UniProt. Supplementary Tables S7 and S8 contain information about these two motifs and the search results.
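To make the idea of a sequence-order-independent 3D motif more concrete, the following sketch matches a small motif against a pocket by comparing residue types and pairwise inter-residue distances within a tolerance. It is a deliberately simplified illustration and does not reproduce the algorithms of SiteMotif or pyScoMotif; the function name, the tolerance, and the coordinates are all assumptions. Because it relies on distances only, it cannot distinguish a motif from its mirror image.

```python
import math


def find_motif(motif, pocket, tol=1.0):
    """Search `pocket` for residues that reproduce `motif`.

    motif, pocket: lists of (residue_type, (x, y, z)), one point per residue
                   (for example, the C-alpha atom).
    Returns the pocket indices matched to each motif residue, or None.
    Residue order in the sequence plays no role in the match.
    """

    def extend(assigned):
        k = len(assigned)
        if k == len(motif):
            return assigned
        m_type, m_xyz = motif[k]
        for idx, (p_type, p_xyz) in enumerate(pocket):
            if p_type != m_type or idx in assigned:
                continue
            # Every distance to already-matched residues must agree within tol.
            consistent = all(
                abs(
                    math.dist(m_xyz, motif[j][1])
                    - math.dist(p_xyz, pocket[assigned[j]][1])
                ) <= tol
                for j in range(k)
            )
            if consistent:
                result = extend(assigned + [idx])
                if result is not None:
                    return result
        return None

    return extend([])


# Hypothetical three-residue motif and a translated, reordered pocket.
motif = [("D", (0.0, 0.0, 0.0)), ("D", (4.0, 0.0, 0.0)), ("W", (0.0, 6.0, 0.0))]
pocket = [
    ("W", (10.0, 16.0, 5.0)),
    ("G", (12.0, 12.0, 5.0)),
    ("D", (14.0, 10.0, 5.0)),
    ("D", (10.0, 10.0, 5.0)),
]
print(find_motif(motif, pocket))
```

The exhaustive backtracking used here is adequate for motifs of a few residues; dedicated tools rely on indexing and scoring schemes that scale to large structure databases.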
From the comprehensive results mentioned above, we can infer that these two structural motifs recur across PdiTPSs. This indicates the presence of strictly conserved feature modules around the substrates of PdiTPSs. Meanwhile, the relaxed residues around the substrates suggest their influence on catalytic specificity. It should be noted that the structures we employed are mostly predictions generated by AlphaFold2, which may not provide conformations with complete catalytic activity or accurately position side chains.

4. Conclusions

We have compiled and meticulously curated an extensive dataset of PdiTPSs with substrate–product pairs, sequences, and structural details, making it the most comprehensive resource of these enzymes to date. The dataset can facilitate in-depth sequence and structure analysis of PdiTPSs. Excellent examples of such analyses in related enzyme families include the virtual screening of P450 using characterized enzymes [83], product specificity analysis of sesquiterpene synthases [66], prediction of substrate classes for acyltransferases [84], and identification and classification of terpene synthases [85,86]. Furthermore, the dataset serves as an essential repository of enzyme bioparts for combinatorial biology experiments to produce non-natural diterpene products and a broader spectrum of diterpene derivatives [62,87]. We attempted to categorize diterpene products by their skeletal structure. While this simplifies analysis and function comparison, it encounters challenges when one enzyme produces multiple diterpene skeletons.
A strong correlation is observed between N-terminal subsequences and overall sequences, alongside significant sequence differences between N- and C-terminal subsequences. Structural conservation increases with greater similarity between the N-terminal sequence and the overall sequence. Additionally, the topology of the local structure around the substrate is independent of that of the overall structure. Quantitative analysis indicates that both sequence and structural similarities influence product distribution, though establishing a robust correlation remains elusive, underscoring the intricate relationships between sequence and structure in the functionality of PdiTPSs. Ultimately, our attention shifted to the examination of local structural features. This analysis allowed us to identify the spatial geometric arrangement of residues around substrates. Additionally, we assigned distinct potential functions to the conserved and less conserved residues. These conserved residues, in addition to the aspartic acid (D) in the DDXXD/DXDD, include tryptophan (W), phenylalanine (F), tyrosine (Y), methionine (M), arginine (R), serine (S), threonine (T), lysine (K), and proline (P). These residues, along with aspartic acid (D), exhibit a stable distribution around the substrate of PdiTPSs. Furthermore, the spatial arrangement of these residues gives rise to distinctive 3D motifs in PdiTPSs I and PdiTPSs II, respectively. In contrast, if residues around a substrate exhibit high variability at the same spatial location, they tend to be product-specific residues. This correlation was supported by mutagenesis experiments involving previously reported product-specific residues.
It is noteworthy that AlphaFold2, renowned for its excellent stereochemical features [88], enables us to conduct hypothetical analyses of the overall and local structures of PdiTPSs. However, limitations remain: in some cases, even very high-confidence predictions differ from experimental maps on a global scale through distortion and domain orientation, and on a local scale in backbone and side-chain conformation [55,88]. Therefore, although it has facilitated developments in the fields of biology and medicine [89], it cannot replace the results obtained from experiments and crystal structures. Considering its limitations, our structural analysis does not address intricate substrate-residue interactions and other factors like conformational changes in catalytic states [90,91]. In summary, AlphaFold2 can serve as a valuable tool to assist in hypothesis generation for experimental design and complement the interpretation of final experimental results. However, recognizing its limitations is crucial. Additionally, despite efforts to collect characterized PdiTPSs, insufficient data hinder normalization and omissions may exist. Further characterization of PdiTPSs remains essential for continuous optimization of computational studies. Finally, the 3D motifs are contingent on the molecular docking results and may therefore exhibit minor variations due to method and model choices. Despite rigorous manual checks, the complete elimination of such differences remains unattainable.
In conclusion, a comprehensive sequence-to-structure analysis of PdiTPSs adds to the current knowledge of diTPSs. The results highlight the spatial distribution of residues around the substrate of PdiTPSs, showcasing both conservation (3D motifs) and variation (less conserved residues) that contribute to the maintenance and diversification of PdiTPSs functionality. The identified local structure signatures around substrates prove beneficial for annotating diTPSs, offering valuable insights into a more comprehensive understanding of these enzymes from a structural perspective. The concept of 3D motifs, such as the well-known catalytic triad (Ser-His-Asp), was in fact proposed a considerable time ago. Its widespread application, however, has been hindered by the challenges in obtaining a large number of reliable enzyme structures. The advent of AlphaFold2 and its continuous optimization have significantly improved this situation, allowing for effective analysis of enzyme product specificity through structural analysis. This approach has been extended to the large-scale annotation of enzyme functions [92,93], showcasing the advantages and potential of structural analysis in understanding PdiTPSs. Our analysis nevertheless has some limitations. Firstly, the 3D motifs obtained for PdiTPSs were not systematically compared with those of other terpene synthases. Future research could focus on a systematic comparison of 3D motifs in various classes of terpene synthases. Additionally, a comparative analysis of the similarities and differences in 3D motifs among terpene synthases producing diverse products could be explored. Secondly, the present method for obtaining 3D motifs was cumbersome and unsuitable for large datasets. The use of a faster approach, such as US-align, enables quick batch structure alignment and sequence output [94]. Visualizing these results with tools like Jalview [95] allows for swift identification of conserved and highly variable residue positions. These methods will enhance the efficiency of enzyme structure analysis.
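As a rough illustration of how conserved and highly variable positions might be flagged once structure-based alignments are available, the sketch below scores each alignment column by Shannon entropy. The input is assumed to be a list of equal-length aligned sequences (for example, exported from a structural alignment), and the entropy threshold is an arbitrary, hypothetical choice rather than a value used in this study.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def classify_positions(aligned, threshold=0.5):
    """Label each column as 'conserved' or 'variable'.

    aligned:   list of equal-length aligned sequences
    threshold: hypothetical entropy cutoff in bits
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for i in range(length):
        entropy = column_entropy([seq[i] for seq in aligned])
        labels.append("conserved" if entropy <= threshold else "variable")
    return labels


# Hypothetical aligned pocket residues from four enzymes.
aligned = ["DDIVW", "DDTAW", "DDIVW", "DDMSW"]
print(classify_positions(aligned))
```

Under the interpretation proposed in this work, columns labeled as conserved would be candidates for the 3D motif, whereas variable columns at fixed spatial positions would be candidates for product-specific residues.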

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom14010120/s1. Supplementary File S1, Supplementary File S2 [96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154].

Author Contributions

Y.Z.: Conceptualization, Data Curation, Formal Analyses, Investigation, Visualization, and Writing—Original Draft. Y.L. (Yupeng Liang): Conceptualization, Investigation, and Methodology. G.L.: Data Curation and Formal Analyses. Y.L. (Yi Li): Supervision and Funding Acquisition. X.H.: Supervision, Project Administration, and Writing—Review and Editing. M.W.: Supervision, Writing—Review and Editing, Funding Acquisition, Project Administration, and Resources. All authors have read and agreed to the published version of the manuscript.

Funding

The research was financially supported by a grant (No. 2021KF011) from the State Key Laboratory for Conservation and Utilization of Bio-Resources at Yunnan University, as well as a grant (62241602) from the National Natural Science Foundation of China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the supplementary materials.

Acknowledgments

We would like to express our sincere gratitude to Gabriel Cia for his invaluable assistance during the use of pyScoMotif, which significantly contributed to the successful completion of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zeng, T.; Chen, Y.X.X.; Jian, Y.X.; Zhang, F.; Wu, R.B. Chemotaxonomic investigation of plant terpenoids with an established database (TeroMOL). New Phytol. 2022, 235, 662–673.
  2. Zerbe, P.; Bohlmann, J. Plant diterpene synthases: Exploring modularity and metabolic diversity for bioengineering. Trends Biotechnol. 2015, 33, 419–428.
  3. Gershenzon, J.; Dudareva, N. The function of terpene natural products in the natural world. Nat. Chem. Biol. 2007, 3, 408–414.
  4. Jennewein, S.; Croteau, R. Taxol: Biosynthesis, molecular genetics, and biotechnological applications. Appl. Microbiol. Biotechnol. 2001, 57, 13–19.
  5. Caniard, A.; Zerbe, P.; Legrand, S.; Cohade, A.; Valot, N.; Magnard, J.L.; Bohlmann, J.; Legendre, L. Discovery and functional characterization of two diterpene synthases for sclareol biosynthesis in Salvia sclarea (L.) and their relevance for perfume manufacture. BMC Plant Biol. 2012, 12, 1–13.
  6. Schalk, M.; Pastore, L.; Mirata, M.A.; Khim, S.; Schouwey, M.; Deguerry, F.; Pineda, V.; Rocci, L.; Daviet, L. Toward a Biosynthetic Route to Sclareol and Amber Odorants. J. Am. Chem. Soc. 2012, 134, 18900–18903.
  7. Philippe, R.N.; De Mey, M.; Anderson, J.; Ajikumar, P.K. Biotechnological production of natural zero-calorie sweeteners. Curr. Opin. Biotechnol. 2014, 26, 155–161.
  8. Bemis, G.W.; Murcko, M.A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887–2893.
  9. Chen, F.; Tholl, D.; Bohlmann, J.; Pichersky, E. The family of terpene synthases in plants: A mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant J. 2011, 66, 212–229.
  10. Banerjee, A.; Hamberger, B. P450s controlling metabolic bifurcations in plant terpene specialized metabolism. Phytochem. Rev. 2018, 17, 81–111.
  11. Bathe, U.; Tissier, A. Cytochrome P450 enzymes: A driving force of plant diterpene diversity. Phytochemistry 2019, 161, 149–162.
  12. Jia, Q.D.; Köllner, T.G.; Gershenzon, J.; Chen, F. MTPSLs: New Terpene Synthases in Nonseed Plants. Trends Plant Sci. 2018, 23, 121–128.
  13. Christianson, D.W. Structural and Chemical Biology of Terpenoid Cyclases. Chem. Rev. 2017, 117, 11570–11648.
  14. Faylo, J.L.; Ronnebaum, T.A.; Christianson, D.W. Assembly-Line Catalysis in Bifunctional Terpene Synthases. Acc. Chem. Res. 2021, 54, 3780–3791.
  15. Zhou, K.; Gao, Y.; Hoy, J.A.; Mann, F.M.; Honzatko, R.B.; Peters, R.J. Insights into Diterpene Cyclization from Structure of Bifunctional Abietadiene Synthase from Abies grandis. J. Biol. Chem. 2012, 287, 6840–6850.
  16. Peters, R.J.; Carter, O.A.; Zhang, Y.; Matthews, B.W.; Croteau, R.B. Bifunctional abietadiene synthase: Mutual structural dependence of the active sites for protonation-initiated and ionization-initiated cyclizations. Biochemistry 2003, 42, 2700–2707.
  17. Peters, R.J.; Ravn, M.M.; Coates, R.M.; Croteau, R.B. Bifunctional abietadiene synthase: Free diffusive transfer of the (+)-copalyl diphosphate intermediate between two distinct active sites. J. Am. Chem. Soc. 2001, 123, 8974–8978.
  18. Wendt, K.U.; Poralla, K.; Schulz, G.E. Structure and function of a squalene cyclase. Science 1997, 277, 1811–1815.
  19. Wendt, K.U.; Schulz, G.E. Isoprenoid biosynthesis: Manifold chemistry catalyzed by similar enzymes. Structure 1998, 6, 127–133.
  20. Köksal, M.; Jin, Y.H.; Coates, R.M.; Croteau, R.; Christianson, D.W. Taxadiene synthase structure and evolution of modular architecture in terpene biosynthesis. Nature 2011, 469, 116–120.
  21. Köksal, M.; Hu, H.Y.; Coates, R.M.; Peters, R.J.; Christianson, D.W. Structure and mechanism of the diterpene cyclase ent-copalyl diphosphate synthase. Nat. Chem. Biol. 2011, 7, 431–433.
  22. Tong, Y.R.; Ma, X.L.; Hu, T.Y.; Chen, K.; Cui, G.H.; Su, P.; Xu, H.F.; Gao, W.; Jiang, T.; Huang, L.Q. Structural and mechanistic insights into the precise product synthesis by a bifunctional miltiradiene synthase. Plant Biotechnol. J. 2022, 21, 165–175.
  23. Abbas, F.; Ke, Y.; Yu, R.; Yue, Y.; Amanullah, S.; Jahangir, M.M.; Fan, Y. Volatile terpenoids: Multiple functions, biosynthesis, modulation and manipulation by genetic engineering. Planta 2017, 246, 803–816.
  24. Starks, C.M.; Back, K.W.; Chappell, J.; Noel, J.P. Structural basis for cyclic terpene biosynthesis by tobacco 5-epi-aristolochene synthase. Science 1997, 277, 1815–1820.
  25. Rynkiewicz, M.J.; Cane, D.E.; Christianson, D.W. Structure of trichodiene synthase from Fusarium sporotrichioides provides mechanistic inferences on the terpene cyclization cascade. Proc. Natl. Acad. Sci. USA 2001, 98, 13543–13548.
  26. Whittington, D.A.; Wise, M.L.; Urbansky, M.; Coates, R.M.; Croteau, R.B.; Christianson, D.W. Bornyl diphosphate synthase: Structure and strategy for carbocation manipulation by a terpenoid cyclase. Proc. Natl. Acad. Sci. USA 2002, 99, 15375–15380.
  27. Degenhardt, J.; Köllner, T.G.; Gershenzon, J. Monoterpene and sesquiterpene synthases and the origin of terpene skeletal diversity in plants. Phytochemistry 2009, 70, 1621–1637.
  28. Jia, M.; Zhou, K.; Tufts, S.; Schulte, S.; Peters, R.J. A Pair of Residues That Interactively Affect Diterpene Synthase Product Outcome. ACS Chem. Biol. 2017, 12, 862–867.
  29. Potter, K.; Criswell, J.; Zi, J.C.; Stubbs, A.; Peters, R.J. Novel Product Chemistry from Mechanistic Analysis of ent-Copalyl Diphosphate Synthases from Plant Hormone Biosynthesis. Angew. Chem. Int. Ed. Engl. 2014, 53, 7198–7202.
  30. Potter, K.C.; Jia, M.R.; Hong, Y.J.; Tantillo, D.J.; Peters, R.J. Product Rearrangement from Altering a Single Residue in the Rice syn-Copalyl Diphosphate Synthase. Org. Lett. 2016, 18, 1060–1063.
  31. Potter, K.C.; Zi, J.C.; Hong, Y.J.; Schulte, S.; Malchow, B.; Tantillo, D.J.; Peters, R.J. Blocking Deprotonation with Retention of Aromaticity in a Plant ent-Copalyl Diphosphate Synthase Leads to Product Rearrangement. Angew. Chem. Int. Ed. Engl. 2016, 55, 634–638.
  32. Parasuram, R.; Mills, C.L.; Wang, Z.; Somasundaram, S.; Beuning, P.J.; Ondrechen, M.J. Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases. Methods 2016, 93, 51–63.
  33. Hedstrom, L. Serine protease mechanism and specificity. Chem. Rev. 2002, 102, 4501–4524.
  34. Wong, F.; Krishnan, A.; Zheng, E.J.; Stärk, H.; Manson, A.L.; Earl, A.M.; Jaakkola, T.; Collins, J.J. Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 2022, 18, e11081.
  35. Humphreys, I.R.; Pei, J.M.; Baek, M.; Krishnakumar, A.; Anishchenko, I.; Ovchinnikov, S.; Zhang, J.; Ness, T.J.; Banjade, S.; Bagde, S.R.; et al. Computed structures of core eukaryotic protein complexes. Science 2021, 374, eabm4805.
  36. Burke, D.F.; Bryant, P.; Barrio-Hernandez, I.; Memon, D.; Pozzati, G.; Shenoy, A.; Zhu, W.S.; Dunham, A.S.; Albanese, P.; Keller, A.; et al. Towards a structurally resolved human protein interaction network. Nat. Struct. Mol. Biol. 2023, 30, 216–225.
  37. Akdel, M.; Pires, D.E.; Pardo, E.P.; Jänes, J.; Zalevsky, A.O.; Mészáros, B.; Bryant, P.; Good, L.L.; Laskowski, R.A.; Pozzati, G.; et al. A structural biology community assessment of AlphaFold2 applications. Nat. Struct. Mol. Biol. 2022, 29, 1056–1067.
  38. Bordin, N.; Sillitoe, I.; Nallapareddy, V.; Rauer, C.; Lam, S.D.; Waman, V.P.; Sen, N.; Heinzinger, M.; Littmann, M.; Kim, S.; et al. AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun. Biol. 2023, 6, 160.
  39. Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Stroe, O.; Wood, G.; Laydon, A.; et al. AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022, 50, D439–D444.
  40. Durairaj, J.; Waterhouse, A.M.; Mets, T.; Brodiazhenko, T.; Abdullah, M.; Studer, G.; Tauriello, G.; Akdel, M.; Andreeva, A.; Bateman, A.; et al. Uncovering new families and folds in the natural protein universe. Nature 2023, 622, 646–653.
  41. Huang, J.Y.; Lin, Q.P.; Fei, H.Y.; He, Z.X.; Xu, H.; Li, Y.J.; Qu, K.L.; Han, P.; Gao, Q.; Li, B.S.; et al. Discovery of deaminase functions by structure-based protein clustering. Cell 2023, 186, 3182–3195.
  42. Barrio-Hernandez, I.; Yeo, J.; Jänes, J.; Mirdita, M.; Gilchrist, C.L.M.; Wein, T.; Varadi, M.; Velankar, S.; Beltrao, P.; Steinegger, M. Clustering predicted structures at the scale of the known protein universe. Nature 2023, 622, 637–645.
  43. Konc, J.; Janežič, D. ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics 2010, 26, 1160–1168.
  44. Yeturu, K.; Chandra, N. PocketAlign: A Novel Algorithm for Aligning Binding Sites in Protein Structures. J. Chem. Inf. Model. 2011, 51, 1725–1736.
  45. Sankar, S.; Chandra, N. SiteMotif: A graph-based algorithm for deriving structural motifs in Protein Ligand binding sites. PLoS Comput. Biol. 2022, 18, e1009901. [Google Scholar] [CrossRef]
  46. Tao, H.; Lauterbach, L.; Bian, G.K.; Chen, R.; Hou, A.W.; Mori, T.; Cheng, S.; Hu, B.; Lu, L.; Mu, X.; et al. Discovery of non-squalene triterpenes. Nature 2022, 606, 414–419. [Google Scholar] [CrossRef]
  47. Viborg, A.H.; Terrapon, N.; Lombard, V.; Michel, G.; Czjzek, M.; Henrissat, B.; Brumer, H. A subfamily roadmap of the evolutionarily diverse glycoside hydrolase family 16 (GH16). J. Biol. Chem. 2019, 294, 15973–15986. [Google Scholar] [CrossRef]
  48. Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30, 3059–3066. [Google Scholar] [CrossRef]
  49. Nguyen, L.T.; Schmidt, H.A.; von Haeseler, A.; Minh, B.Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015, 32, 268–274. [Google Scholar] [CrossRef]
  50. Xie, J.; Chen, Y.; Cai, G.; Cai, R.; Hu, Z.; Wang, H. Tree Visualization By One Table (tvBOT): A web application for visualizing, modifying and annotating phylogenetic trees. Nucleic Acids Res. 2023, 51, W587–W592. [Google Scholar] [CrossRef]
  51. Chen, C.; Chen, H.; Zhang, Y.; Thomas, H.R.; Frank, M.H.; He, Y.; Xia, R. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol. Plant. 2020, 13, 1194–1202. [Google Scholar] [CrossRef] [PubMed]
  52. Zhang, Y.; Skolnick, J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005, 33, 2302–2309. [Google Scholar] [CrossRef]
  53. Willett, P.; Barnard, J.M.; Downs, G.M. Chemical similarity searching. J. Chem. Inform. Computer Sci. 1998, 38, 983–996. [Google Scholar] [CrossRef]
  54. Patil, I. Visualizations with statistical details: The ‘ggstatsplot’ approach. J. Open Source Softw. 2021, 6, 983–996. [Google Scholar] [CrossRef]
  55. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
  56. Mirdita, M.; Schütze, K.; Moriwaki, Y.; Heo, L.; Ovchinnikov, S.; Steinegger, M. ColabFold: Making protein folding accessible to all. Nat. Methods 2022, 19, 679–682. [Google Scholar] [CrossRef] [PubMed]
  57. Chen, Z.; Zhao, P.; Li, C.; Li, F.Y.; Xiang, D.X.; Chen, Y.Z.; Akutsu, T.; Daly, R.J.; Webb, G.I.; Zhao, Q.Z.; et al. iLearnPlus: A comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization. Nucleic Acids Res. 2021, 49, e60. [Google Scholar] [CrossRef]
  58. Li, J.; Miao, B.; Wang, S.; Dong, W.; Xu, H.; Si, C.; Wang, W.; Duan, S.; Lou, J.; Bao, Z.; et al. Hiplot: A comprehensive and easy-to-use web service for boosting publication-ready biomedical data visualization. Brief. Bioinform. 2022, 23, bbac261. [Google Scholar] [CrossRef] [PubMed]
  59. Cia, G.; Kwasigroch, J.M.; Stamatopoulos, B.; Rooman, M.; Pucci, F. pyScoMotif: Discovery of similar 3D structural motifs across proteins. Bioinform. Adv. 2023, 3, vbad158. [Google Scholar] [CrossRef]
  60. Hu, Z.M.; Liu, X.Y.; Tian, M.; Ma, Y.; Jin, B.L.; Gao, W.; Cui, G.H.; Guo, J.; Huang, L.Q. Recent progress and new perspectives for diterpenoid biosynthesis in medicinal plants. Med. Res. Rev. 2021, 41, 2971–2997. [Google Scholar] [CrossRef]
  61. Jia, M.; Potter, K.C.; Peters, R.J. Extreme promiscuity of a bacterial and a plant diterpene synthase enables combinatorial biosynthesis. Metab. Eng. 2016, 37, 24–34. [Google Scholar] [CrossRef]
  62. Jia, M.R.; Mishra, S.K.; Tufts, S.; Jernigan, R.L.; Peters, R.J. Combinatorial biosynthesis and the basis for substrate promiscuity in class I diterpene synthases. Metab. Eng. 2019, 55, 44–58. [Google Scholar] [CrossRef]
  63. Ortega, M.A.; Hao, Y.; Zhang, Q.; Walker, M.C.; van der Donk, W.A.; Nair, S.K. Structure and mechanism of the tRNA-dependent lantibiotic dehydratase NisB. Nature 2015, 517, 509–512. [Google Scholar] [CrossRef]
  64. Zhang, C.Q.; Chen, X.X.; Orban, A.; Shukal, S.; Birk, F.; Too, H.P.; Ruhl, M. Agrocybe aegerita Serves as a Gateway for Identifying Sesquiterpene Biosynthetic Enzymes in Higher Fungi. ACS Chem. Biol. 2020, 15, 1268–1277. [Google Scholar] [CrossRef]
  65. Johnson, S.R.; Bhat, W.W.; Bibik, J.; Turmo, A.; Hamberger, B.; Evolutionary Mint Genomics Consortium. A database-driven approach identifies additional diterpene synthase activities in the mint family (Lamiaceae). J. Biol. Chem. 2019, 294, 1349–1362. [Google Scholar] [CrossRef]
  66. Durairaj, J.; Di Girolamo, A.; Bouwmeester, H.J.; de Ridder, D.; Beekwilder, J.; van Dijk, A.D.J. An analysis of characterized plant sesquiterpene synthases. Phytochemistry 2019, 158, 157–165. [Google Scholar] [CrossRef]
  67. Durairaj, J.; Melillo, E.; Bouwmeester, H.J.; Beekwilder, J.; de Ridder, D.; van Dijk, A.D.J. Integrating structure-based machine learning and co-evolution to investigate specificity in plant sesquiterpene synthases. PLoS Comput. Biol. 2021, 17, e1008197. [Google Scholar] [CrossRef]
  68. Li, H.; Pu, J.X.; Li, J. Diterpenoids Chemodiversity of the Genus Isodon Spach from Lamiaceae. Plant Divers. 2013, 35, 81–88. [Google Scholar]
  69. Wang, Z.B.; Nelson, D.R.; Zhang, J.; Wan, X.Y.; Peters, R.J. Plant (di)terpenoid evolution: From pigments to hormones and beyond. Nat. Prod. Rep. 2023, 40, 452–469. [Google Scholar] [CrossRef]
  70. Jia, Q.D.; Brown, R.; Köllner, T.G.; Fu, J.Y.; Chen, X.L.; Wong, G.K.S.; Gershenzon, J.; Peters, R.J.; Chen, F. Origin and early evolution of the plant terpene synthase family. Proc. Natl. Acad. Sci. USA 2022, 119, e2100361119. [Google Scholar] [CrossRef] [PubMed]
  71. Riziotis, I.G.; Ribeiro, A.J.M.; Borkakoti, N.; Thornton, J.M. Conformational Variation in Enzyme Catalysis: A Structural Study on Catalytic Residues. J. Mol. Biol. 2022, 434, 167517. [Google Scholar] [CrossRef] [PubMed]
  72. Zhai, G.Q.; Zhang, Z.Y.; Dong, C.J. Mutagenesis and functional analysis of SotB: A multidrug transporter of the major facilitator superfamily from Escherichia coli. Front. Microbiol. 2022, 13, 1024639. [Google Scholar] [CrossRef]
  73. Chen, Z.R.; Lv, Q.L.; Peng, H.W.; Liu, X.Y.; Hu, W.L.; Hu, J.F. Drug screening against F13 protein, the target of tecovirimat, as potential therapies for monkeypox virus. J. Infect. 2023, 86, 195–198. [Google Scholar] [CrossRef] [PubMed]
  74. Liu, Y.; Yang, X.C.; Gan, J.H.; Chen, S.; Xiao, Z.X.; Cao, Y. CB-Dock2: Improved protein ligand blind docking by integrating cavity detection, docking and homologous template fitting. Nucleic Acids Res. 2022, 50, W159–W164. [Google Scholar] [CrossRef] [PubMed]
  75. Adrián, A.F.; Rodriguez, C.; Gonzalez-Chavez, R.; Georgellis, D. The Escherichia coli two-component signal sensor BarA binds protonated acetate via a conserved hydrophobic-binding pocket. J. Biol. Chem. 2021, 297, 101383. [Google Scholar]
  76. Zhang, F.; An, T.Y.; Tang, X.W.; Zi, J.C.; Luo, H.B.; Wu, R.B. Enzyme Promiscuity versus Fidelity in Two Sesquiterpene Cyclases (TEAS versus ATAS). ACS Catal. 2020, 10, 1470–1484. [Google Scholar] [CrossRef]
  77. Xu, M.M.; Wilderman, P.R.; Peters, R.J. Following evolution’s lead to a single residue switch for diterpene synthase product outcome. Proc. Natl. Acad. Sci. USA 2007, 104, 7397–7401. [Google Scholar] [CrossRef]
  78. Tu, L.C.; Cai, X.B.; Zhang, Y.F.; Tong, Y.R.; Wang, J.; Su, P.; Lu, Y.; Hu, T.Y.; Luo, Y.F.; Wu, X.Y.; et al. Mechanistic analysis for the origin of diverse diterpenes in Tripterygium wilfordii. Acta Pharm. Sin. B 2022, 12, 2923–2933. [Google Scholar] [CrossRef]
  79. Wilderman, P.R.; Peters, R.J. A single residue switch converts abietadiene synthase into a pimaradiene specific cyclase. J. Am. Chem. Soc. 2007, 129, 15736–15737. [Google Scholar] [CrossRef]
  80. Sun, C.; Luo, Z.; Zhang, W.; Tian, W.; Peng, H.; Lin, Z.; Deng, Z.; Kobe, B.; Jia, X.; Qu, X. Molecular basis of regio- and stereo-specificity in biosynthesis of bacterial heterodimeric diketopiperazines. Nat. Commun. 2020, 11, 6251. [Google Scholar] [CrossRef]
  81. Sankar, S.; Chandran Sakthivel, N.; Chandra, N. Fast Local Alignment of Protein Pockets (FLAPP): A System-Compiled Program for Large-Scale Binding Site Alignment. J. Chem. Inf. Model. 2022, 62, 4810–4819. [Google Scholar] [CrossRef] [PubMed]
  82. Li, C.; Wang, S.; Yin, X.; Guo, A.; Xie, K.; Chen, D.; Sui, S.; Han, Y.; Liu, J.; Chen, R.; et al. Functional Characterization and Cyclization Mechanism of a Diterpene Synthase Catalyzing the Skeleton Formation of Cephalotane-Type Diterpenoids. Angew. Chem. Int. Ed. Engl. 2023, 62, e202306020. [Google Scholar] [CrossRef] [PubMed]
  83. Wang, H.; Wang, Q.; Liu, Y.Q.; Liao, X.P.; Chu, H.Y.; Chang, H.; Cao, Y.; Li, Z.G.; Zhang, T.C.; Cheng, J.; et al. PCPD: Plant cytochrome P450 database and web-based tools for structural construction and ligand docking. Synth. Syst. Biotechnol. 2021, 6, 102–109. [Google Scholar] [CrossRef]
  84. Kruse, L.H.; Weigle, A.T.; Irfan, M.; Martínez-Gómez, J.; Chobirko, J.D.; Schaffer, J.E.; Bennett, A.A.; Specht, C.D.; Jez, J.M.; Shukla, D.; et al. Orthology-based analysis helps map evolutionary diversification and predict substrate class use of BAHD acyltransferases. Plant J. 2022, 111, 1453–1468. [Google Scholar] [CrossRef]
  85. Priya, P.; Yadav, A.; Chand, J.; Yadav, G. Terzyme: A tool for identification and analysis of the plant terpenome. Plant Methods 2018, 14, 4. [Google Scholar] [CrossRef]
  86. Domingues, D.S.; Oliveira, L.S.; Lemos, S.M.C.; Barros, G.C.C.; Ivamoto-Suzuki, S.T. A Bioinformatics Tool for Efficient Retrieval of High-Confidence Terpene Synthases (TPS) and Application to the Identification of TPS in Coffea and Quillaja. Methods Mol. Biol. 2022, 2469, 43–53. [Google Scholar] [PubMed]
  87. Bathe, U.; Schmidt, J.; Frolov, A.; Soboleva, A.; Frank, O.; Dawid, C.; Tissier, A. Production of 130 diterpenoids by combinatorial biosynthesis in yeast. bioRxiv 2023, preprint. [Google Scholar]
  88. Terwilliger, T.C.; Liebschner, D.; Croll, T.I.; Williams, C.J.; McCoy, A.J.; Poon, B.K.; Afonine, P.V.; Oeffner, R.D.; Richardson, J.S.; Read, R.J.; et al. AlphaFold predictions are valuable hypotheses and accelerate but do not replace experimental structure determination. Nat. Methods 2023. online ahead of print. [Google Scholar] [CrossRef]
  89. Yang, Z.Y.; Zeng, X.X.; Zhao, Y.; Chen, R.S. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct. Target. Ther. 2023, 8, 115. [Google Scholar] [CrossRef]
  90. Driller, R.; Janke, S.; Fuchs, M.; Warner, E.; Mhashal, A.R.; Major, D.T.; Christmann, M.; Brück, T.; Loll, B. Towards a comprehensive understanding of the structural dynamics of a bacterial diterpene synthase during catalysis. Nat. Commun. 2018, 9, 3971. [Google Scholar] [CrossRef]
  91. Raz, K.; Levi, S.; Gupta, P.K.; Major, D.T. Enzymatic control of product distribution in terpene synthases: Insights from multiscale simulations. Curr. Opin. Biotechnol. 2020, 65, 248–258. [Google Scholar] [CrossRef]
  92. Sieg, J.; Rarey, M. Searching similar local 3D micro-environments in protein structure databases with MicroMiner. Brief. Bioinform. 2023, 24, bbad357. [Google Scholar] [CrossRef] [PubMed]
  93. Derry, A.; Altman, R.B. Explainable protein function annotation using local structure embeddings. bioRxiv 2023. preprint. [Google Scholar]
  94. Zhang, C.X.; Shine, M.; Pyle, A.M.; Zhang, Y. US-align: Universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat. Methods 2022, 19, 1109–1115. [Google Scholar] [CrossRef] [PubMed]
  95. Waterhouse, A.M.; Procter, J.B.; Martin, D.M.A.; Clamp, M.; Barton, G.J. Jalview Version 2-a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25, 1189–1191. [Google Scholar] [CrossRef]
  96. Andersen-Ranberg, J.; Kongstad, K.T.; Nielsen, M.T.; Jensen, N.B.; Pateraki, I.; Bach, S.S.; Zerbe, P.; Staerk, D.; Bohlmann, J.; Moller, B.L.; et al. Expanding the Landscape of Diterpene Structural Diversity through Stereochemically Controlled Combinatorial Biosynthesis. Angew. Chem. Int. Ed. Engl. 2016, 55, 2142–2146. [Google Scholar] [CrossRef]
  97. Božić, D.; Papaefthimiou, D.; Brückner, K.; de Vos, R.C.H.; Tsoleridis, C.A.; Katsarou, D.; Papanikolaou, A.; Pateraki, I.; Chatzopoulou, F.M.; Dimitriadou, E.; et al. Towards Elucidating Carnosic Acid Biosynthesis in Lamiaceae: Functional Characterization of the Three First Steps of the Pathway in Salvia fruticosa and Rosmarinus officinalis. PLoS ONE 2015, 10, e0124106. [Google Scholar] [CrossRef]
  98. Brückner, K.; Božić, D.; Manzano, D.; Papaefthimiou, D.; Pateraki, I.; Scheler, U.; Ferrer, A.; de Vos, R.C.H.; Kanellis, A.K.; Tissier, A. Characterization of two genes for the biosynthesis of abietane-type diterpenes in rosemary (Rosmarinus officinalis) glandular trichomes. Phytochemistry 2014, 101, 52–64. [Google Scholar] [CrossRef] [PubMed]
  99. Bryson, A.E.; Lanier, E.R.; Lau, K.H.; Hamilton, J.P.; Vaillancourt, B.; Mathieu, D.; Yocca, A.E.; Miller, G.P.; Edger, P.P.; Buell, C.R. Uncovering a miltiradiene biosynthetic gene cluster in the Lamiaceae reveals a dynamic evolutionary trajectory. Nat. Commun. 2023, 14, 343. [Google Scholar] [CrossRef]
  100. Chen, X.Y.; Berim, A.; Dayan, F.E.; Gang, D.R. A (-)-kolavenyl diphosphate synthase catalyzes the first step of salvinorin A biosynthesis in Salvia divinorum. J. Exp. Bot. 2017, 68, 1109–1122. [Google Scholar] [CrossRef]
  101. Cho, E.M.; Okada, A.; Kenmoku, H.; Otomo, K.; Toyomasu, T.; Mitsuhashi, W.; Sassa, T.; Yajima, A.; Yabuta, G.; Mori, K.; et al. Molecular cloning and characterization of a cDNA encoding cassa-12,15-diene synthase, a putative diterpenoid phytoalexin biosynthetic enzyme, from suspension-cultured rice cells treated with a chitin elicitor. Plant J. 2004, 37, 1–8. [Google Scholar] [CrossRef]
  102. Cui, G.H.; Duan, L.X.; Jin, B.L.; Qian, J.; Xue, Z.Y.; Shen, G.A.; Snyder, J.H.; Song, J.Y.; Chen, S.L.; Huang, L.Q.; et al. Functional Divergence of Diterpene Syntheses in the Medicinal Plant Salvia miltiorrhiza. Plant Physiol. 2015, 169, 1607–1618. [Google Scholar] [CrossRef] [PubMed]
  103. Du, G.; Gong, H.Y.; Feng, K.N.; Chen, Q.Q.; Yang, Y.L.; Fu, X.L.; Lu, S.; Zeng, Y. Diterpene synthases facilitating production of the kaurane skeleton of eriocalyxin B in the medicinal plant. Phytochemistry 2019, 158, 96–102. [Google Scholar] [CrossRef] [PubMed]
  104. Falara, V.; Akhtar, T.A.; Nguyen, T.T.H.; Spyropoulou, E.A.; Bleeker, P.M.; Schauvinhold, I.; Matsuba, Y.; Bonini, M.E.; Schilmiller, A.L.; Last, R.L.; et al. The Tomato Terpene Synthase Gene Family. Plant Physiol. 2011, 157, 770–789. [Google Scholar] [CrossRef]
  105. Falara, V.; Pichersky, E.; Kanellis, A.K. A copal-8-ol diphosphate synthase from the angiosperm Cistus creticus subsp. creticus is a putative key enzyme for the formation of pharmacologically active, oxygen-containing labdane-type diterpenes. Plant Physiol. 2010, 154, 301–310. [Google Scholar] [CrossRef] [PubMed]
  106. Gao, W.; Hillwig, M.L.; Huang, L.; Cui, G.H.; Wang, X.Y.; Kong, J.Q.; Yang, B.; Peters, R.J. A Functional Genomics Approach to Tanshinone Biosynthesis Provides Stereochemical Insights. Org. Lett. 2009, 11, 5170–5173. [Google Scholar] [CrossRef] [PubMed]
  107. Hall, D.E.; Zerbe, P.; Jancsik, S.; Quesada, A.L.; Dullat, H.; Madilao, L.L.; Yuen, M.; Bohlmann, J. Evolution of Conifer Diterpene Synthases: Diterpene Resin Acid Biosynthesis in Lodgepole Pine and Jack Pine Involves Monofunctional and Bifunctional Diterpene Synthases. Plant Physiol. 2013, 161, 600–616. [Google Scholar] [CrossRef]
  108. Hansen, N.L.; Heskes, A.M.; Olsen, C.E.; Hallström, B.M.; Andersen-Ranberg, J.; Hamberger, B.; Evolutionary Mint Genomics Consortium. The terpene synthase gene family in harbors a labdane-type diterpene synthase among the monoterpene synthase TPS-b subfamily. Plant J. 2017, 89, 429–441. [Google Scholar] [CrossRef]
  109. Harris, L.J.; Saparno, A.; Johnston, A.; Prisic, S.; Xu, M.; Allard, S.; Kathiresan, A.; Ouellet, T.; Peters, R.J. The maize gene is induced by attack and encodes an ent-copalyl diphosphate synthase. Plant Mol. Biol. 2005, 59, 881–894. [Google Scholar] [CrossRef]
  110. Hayashi, K.; Kawaide, H.; Notomi, M.; Sakigi, Y.; Matsuo, A.; Nozaki, H. Identification and functional analysis of bifunctional ent-kaurene synthase from the moss Physcomitrella patens. FEBS Lett. 2006, 580, 6175–6181. [Google Scholar] [CrossRef]
  111. Heskes, A.M.; Sundram, T.C.M.; Boughton, B.A.; Jensen, N.B.; Hansen, N.L.; Crocoll, C.; Cozzi, F.; Rasmussen, S.; Hamberger, B.; Evolutionary Mint Genomics Consortium; et al. Biosynthesis of bioactive diterpenoids in the medicinal plant Vitex agnus-castus. Plant J. 2018, 93, 943–958. [Google Scholar] [CrossRef] [PubMed]
  112. Hillwig, M.L.; Xu, M.M.; Toyomasu, T.; Tiernan, M.S.; Wei, G.; Cui, G.H.; Huang, L.Q.; Peters, R.J. Domain loss has independently occurred multiple times in plant terpene synthase evolution. Plant J. 2011, 68, 1051–1060. [Google Scholar] [CrossRef] [PubMed]
  113. Ignea, C.; Ioannou, E.; Georgantea, P.; Loupassaki, S.; Trikka, F.A.; Kanellis, A.K.; Makris, A.M.; Roussis, V.; Kampranis, S.C. Reconstructing the chemical diversity of labdane-type diterpene biosynthesis in yeast. Metab. Eng. 2015, 28, 91–103. [Google Scholar] [CrossRef]
  114. Inabuy, F.S.; Fischedick, J.T.; Lange, I.; Hartmann, M.; Srividya, N.; Parrish, A.N.; Xu, M.M.; Peters, R.J.; Lange, B.M. Biosynthesis of Diterpenoids in Tripterygium Adventitious Root Cultures. Plant Physiol. 2017, 175, 92–103. [Google Scholar] [CrossRef]
  115. Jackson, A.J.; Hershey, D.M.; Chesnut, T.; Xu, M.M.; Peters, R.J. Biochemical characterization of the castor bean kaurene synthase(-like) family supports quantum chemical view of diterpene cyclization. Phytochemistry 2014, 103, 13–21. [Google Scholar] [CrossRef] [PubMed]
  116. Jin, B.L.; Cui, G.H.; Guo, J.; Tang, J.F.; Duan, L.X.; Lin, H.X.; Shen, Y.; Chen, T.; Zhang, H.B.; Huang, L.Q. Functional Diversification of Kaurene Synthase-Like Genes in Isodon rubescens. Plant Physiol. 2017, 174, 943–955. [Google Scholar] [CrossRef]
  117. Keeling, C.I.; Dullat, H.K.; Yuen, M.; Ralph, S.G.; Jancsik, S.; Bohlmann, J. Identification and Functional Characterization of Monofunctional Copalyl Diphosphate and Kaurene Synthases in White Spruce Reveal Different Patterns for Diterpene Synthase Evolution for Primary and Secondary Metabolism in Gymnosperms. Plant Physiol. 2010, 152, 1197–1208. [Google Scholar] [CrossRef]
  118. Keeling, C.I.; Madilao, L.L.; Zerbe, P.; Dullat, H.K.; Bohlmann, J. The Primary Diterpene Synthase Products of Levopimaradiene/Abietadiene Synthase (PaLAS) Are Epimers of a Thermally Unstable Diterpenol. J. Biol. Chem. 2011, 286, 21145–21153. [Google Scholar] [CrossRef]
  119. Keeling, C.I.; Weisshaar, S.; Ralph, S.G.; Jancsik, S.; Hamberger, B.; Dullat, H.K.; Bohlmann, J. Transcriptome mining, functional characterization, and phylogeny of a large terpene synthase gene family in spruce (Picea spp.). BMC Plant Biol. 2011, 11, 43. [Google Scholar] [CrossRef]
  120. Kirby, J.; Nishimoto, M.; Park, J.G.; Withers, S.T.; Nowroozi, F.; Behrendt, D.; Rutledge, E.J.G.; Fortman, J.L.; Johnson, H.E.; Anderson, J.V.; et al. Cloning of casbene and neocembrene synthases from Euphorbiaceae plants and expression in Saccharomyces cerevisiae. Phytochemistry 2010, 71, 1466–1473. [Google Scholar] [CrossRef]
  121. Li, J.L.; Chen, Q.Q.; Jin, Q.P.; Gao, J.; Zhao, P.J.; Lu, S.; Zeng, Y. IeCPS2 is potentially involved in the biosynthesis of pharmacologically active Isodon diterpenoids rather than gibberellin. Phytochemistry 2012, 76, 32–39. [Google Scholar] [CrossRef] [PubMed]
  122. Mafu, S.; Karunanithi, P.S.; Palazzo, T.A.; Harrod, B.L.; Rodriguez, S.M.; Mollhoff, I.N.; O’Brien, T.E.; Tong, S.; Fiehn, O.; Tantillo, D.J.; et al. Biosynthesis of the microtubule-destabilizing diterpene pseudolaric acid B from golden larch involves an unusual diterpene synthase. Proc. Natl. Acad. Sci. USA 2017, 114, 974–979. [Google Scholar] [CrossRef] [PubMed]
  123. Margis-Pinheiro, M.; Zhou, X.R.; Zhu, Q.H.; Dennis, E.S.; Upadhyaya, N.M. Isolation and characterization of a Ds-tagged rice (Oryza sativa L.) GA-responsive dwarf mutant defective in an early step of the gibberellin biosynthesis pathway. Plant Cell Rep. 2005, 23, 819–833. [Google Scholar] [CrossRef] [PubMed]
  124. Martin, D.M.; Fäldt, J.; Bohlmann, J. Functional characterization of nine Norway Spruce TPS genes and evolution of gymnosperm terpene synthases of the TPS-d subfamily. Plant Physiol. 2004, 135, 1908–1927. [Google Scholar] [CrossRef]
  125. Misra, R.C.; Garg, A.; Roy, S.; Chanotiya, C.S.; Vasudev, P.G.; Ghosh, S. Involvement of an ent-copalyl diphosphate synthase in tissue-specific accumulation of specialized diterpenes in Andrographis paniculata. Plant Sci. 2015, 240, 50–64. [Google Scholar] [CrossRef]
  126. Morrone, D.; Jin, Y.H.; Xu, M.M.; Choi, S.Y.; Coates, R.M.; Peters, R.J. An unexpected diterpene cyclase from rice: Functional identification of a stemodene synthase. Arch. Biochem. Biophys. 2006, 448, 133–140. [Google Scholar] [CrossRef] [PubMed]
  127. Otomo, K.; Kanno, Y.; Motegi, A.; Kenmoku, H.; Yamane, H.; Mitsuhashi, W.; Kawa, H.; Toshima, H.; Itoh, H.; Matsuoka, M.; et al. Diterpene cyclases responsible for the biosynthesis of phytoalexins, momilactones A, B, and oryzalexins A-F in rice. Biosci. Biotechnol. Biochem. 2004, 68, 2001–2006. [Google Scholar] [CrossRef]
  128. Pateraki, I.; Andersen-Ranberg, J.; Heskes, A.M.; Martens, H.J.; Zerbe, P.; Bach, S.S.; Moller, B.L.; Bohlmann, J.; Hamberger, B.; Evolutionary Mint Genomics Consortium. Manoyl oxide (13R), the biosynthetic precursor of forskolin, is synthesized in specialized root cork cells in Coleus forskohlii. Plant Physiol. 2014, 164, 1222–1236. [Google Scholar] [CrossRef]
  129. Pelot, K.A.; Hagelthorn, D.M.; Addison, J.B.; Zerbe, P. Biosynthesis of the oxygenated diterpene nezukol in the medicinal plant is catalyzed by a pair of diterpene synthases. PLoS ONE 2017, 12, e0176507. [Google Scholar] [CrossRef]
  130. Prisic, S.; Xu, M.M.; Wilderman, P.R.; Peters, R.J. Rice contains two disparate ent-copalyl diphosphate synthases with distinct metabolic functions. Plant Physiol. 2004, 136, 4228–4236. [Google Scholar] [CrossRef]
  131. Richman, A.S.; Gijzen, M.; Starratt, A.N.; Yang, Z.; Brandle, J.E. Diterpene synthesis in Stevia rebaudiana: Recruitment and up-regulation of key enzymes from the gibberellin biosynthetic pathway. Plant J. 1999, 19, 411–421. [Google Scholar] [CrossRef] [PubMed]
  132. Sallaud, C.; Giacalone, C.; Töpfer, R.; Goepfert, S.; Bakaher, N.; Rösti, S.; Tissier, A. Characterization of two genes for the biosynthesis of the labdane diterpene Z-abienol in tobacco (Nicotiana tabacum) glandular trichomes. Plant J. 2012, 72, 1–17. [Google Scholar] [CrossRef] [PubMed]
  133. Sawada, Y.; Katsumata, T.; Kitamura, J.; Kawaide, H.; Nakajima, M.; Asami, T.; Nakaminami, K.; Kurahashi, T.; Mitsuhashi, W.; Inoue, Y.; et al. Germination of photoblastic lettuce seeds is regulated via the control of endogenous physiologically active gibberellin content, rather than of gibberellin responsiveness. J. Exp. Bot. 2008, 59, 3383–3393. [Google Scholar] [CrossRef]
  134. Schepmann, H.G.; Pang, J.H.; Matsuda, S.P.T. Cloning and characterization of Ginkgo biloba levopimaradiene synthase which catalyzes the first committed step in ginkgolide biosynthesis. Arch. Biochem. Biophys. 2001, 392, 263–269. [Google Scholar] [CrossRef]
  135. Smith, M.W.; Yamaguchi, S.; Ait-Ali, T.; Kamiya, Y. The first step of gibberellin biosynthesis in pumpkin is catalyzed by at least two copalyl diphosphate synthases encoded by differentially regulated genes. Plant Physiol. 1998, 118, 1411–1419. [Google Scholar] [CrossRef]
  136. Su, P.; Guan, H.Y.; Zhao, Y.J.; Tong, Y.R.; Xu, M.M.; Zhang, Y.F.; Hu, T.Y.; Yang, J.; Cheng, Q.Q.; Gao, L.H.; et al. Identification and functional characterization of diterpene synthases for triptolide biosynthesis from Tripterygium wilfordii. Plant J. 2018, 93, 50–65. [Google Scholar] [CrossRef] [PubMed]
  137. Sugai, Y.; Ueno, Y.; Hayashi, K.-i.; Oogami, S.; Toyomasu, T.; Matsumoto, S.; Natsume, M.; Nozaki, H.; Kawaide, H. Enzymatic 13C Labeling and Multidimensional NMR Analysis of Miltiradiene Synthesized by Bifunctional Diterpene Cyclase in Selaginella moellendorffii. J. Biol. Chem. 2011, 286, 42840–42847. [Google Scholar] [CrossRef]
  138. Sun, T.P.; Kamiya, Y. The Arabidopsis GA1 locus encodes the cyclase ent-kaurene synthetase A of gibberellin biosynthesis. Plant Cell 1994, 6, 1509–1518. [Google Scholar]
  139. Sun, W.; Leng, L.; Yin, Q.G.; Xu, M.M.; Huang, M.K.; Xu, Z.C.; Zhang, Y.J.; Yao, H.; Wang, C.X.; Xiong, C.; et al. The genome of the medicinal plant provides insight into the biosynthesis of the bioactive diterpenoid neoandrographolide. Plant J. 2019, 97, 841–857. [Google Scholar] [CrossRef]
  140. Nakagiri, T.; Lee, J.B.; Hayashi, T. cDNA cloning, functional expression and characterization of ent-copalyl diphosphate synthase from Scoparia dulcis L. Plant Sci. 2005, 169, 760–767. [Google Scholar] [CrossRef]
  141. Wildung, M.R.; Croteau, R. A cDNA clone for taxadiene synthase, the diterpene cyclase that catalyzes the committed step of taxol biosynthesis. J. Biol. Chem. 1996, 271, 9201–9204. [Google Scholar] [CrossRef] [PubMed]
  142. Xu, M.M.; Hillwig, M.L.; Prisic, S.; Coates, R.M.; Peters, R.J. Functional identification of rice syn-copalyl diphosphate synthase and its role in initiating biosynthesis of diterpenoid phytoalexin/allelopathic natural products. Plant J. 2004, 39, 309–318. [Google Scholar] [CrossRef] [PubMed]
  143. Yamaguchi, S.; Saito, T.; Abe, H.; Yamane, H.; Murofushi, N.; Kamiya, Y. Molecular cloning and characterization of a cDNA encoding the gibberellin biosynthetic enzyme ent-kaurene synthase B from pumpkin (Cucurbita maxima L.). Plant J. 1996, 10, 203–213. [Google Scholar] [CrossRef] [PubMed]
  144. Yamaguchi, S.; Sun, T.P.; Kawaide, H.; Kamiya, Y. The GA2 locus of Arabidopsis thaliana encodes ent-kaurene synthase of gibberellin biosynthesis. Plant Physiol. 1998, 116, 1271–1278. [Google Scholar] [CrossRef] [PubMed]
  145. Yang, R.K.; Du, Z.Y.; Qiu, T.; Sun, J.; Shen, Y.T.; Huang, L.L. Discovery and Functional Characterization of a Diverse Diterpene Synthase Family in the Medicinal Herb Isodon lophanthoides Var. gerardiana. Plant Cell Physiol. 2021, 62, 1423–1435. [Google Scholar] [CrossRef]
  146. Zerbe, P.; Chiang, A.; Yuen, M.; Hamberger, B.; Evolutionary Mint Genomics Consortium; Draper, J.A.; Britton, R.; Bohlmann, J. Bifunctional cis-abienol synthase from Abies balsamea discovered by transcriptome sequencing and its implications for diterpenoid fragrance production. J. Biol. Chem. 2012, 287, 12121–12131. [Google Scholar] [CrossRef]
  147. Zerbe, P.; Hamberger, B.; Yuen, M.M.S.; Chiang, A.; Sandhu, H.K.; Madilao, L.L.; Nguyen, A.; Hamberger, B.; Evolutionary Mint Genomics Consortium; Bach, S.S.; et al. Gene discovery of modular diterpene metabolism in nonmodel systems. Plant Physiol. 2013, 162, 1073–1091. [Google Scholar] [CrossRef]
  148. Zerbe, P.; Rodriguez, S.M.; Mafu, S.; Chiang, A.; Sandhu, H.K.; O’Neil-Johnson, M.; Starks, C.M.; Bohlmann, J. Exploring diterpene metabolism in non-model species: Transcriptome-enabled discovery and functional characterization of labda-7,13E-dienyl diphosphate synthase from Grindelia robusta. Plant J. 2015, 83, 783–793. [Google Scholar] [CrossRef]
  149. Zhou, K.; Peters, R.J. Investigating the conservation pattern of a putative second terpene synthase divalent metal binding motif in plants. Phytochemistry 2009, 70, 366–369. [Google Scholar] [CrossRef]
  150. Li, H.X.; Wu, S.; Lin, R.X.; Xiao, Y.R.; Morotti, A.L.M.; Wang, Y.; Galilee, M.; Qin, H.W.; Huang, T.; Zhao, Y.; et al. The genomes of medicinal skullcaps reveal the polyphyletic origins of clerodane diterpene biosynthesis in the family Lamiaceae. Mol. Plant 2023, 16, 549–570. [Google Scholar] [CrossRef]
  151. Niu, S.H.; Yuan, L.; Zhang, Y.C.; Chen, X.Y.; Li, W. Isolation and expression profiles of gibberellin metabolism genes in developing male and female cones of Pinus tabuliformis. Funct. Integr. Genom. 2014, 14, 697–705. [Google Scholar] [CrossRef] [PubMed]
  152. Qiu, T.; Li, Y.Y.; Wu, H.S.; Yang, H.; Peng, Z.Q.; Du, Z.Y.; Wu, Q.W.; Wang, H.B.; Shen, Y.T.; Huang, L.L. Tandem duplication and sub-functionalization of clerodane diterpene synthase originate the blooming of clerodane diterpenoids in Scutellaria barbata. Plant J. 2023, 116, 375–388. [Google Scholar] [CrossRef] [PubMed]
  153. Lee, J.B.; Ohmura, T.; Yamamura, Y. Functional Characterization of Three Diterpene Synthases Responsible for Tetracyclic Diterpene Biosynthesis in Scoparia dulcis. Plants 2023, 12, 69. [Google Scholar] [CrossRef] [PubMed]
  154. Ma, L.T.; Lee, Y.R.; Tsao, N.W.; Wang, S.Y.; Zerbe, P.; Chu, F.H. Biochemical characterization of diterpene synthases of Taiwania cryptomerioides expands the known functional space of specialized diterpene metabolism in gymnosperms. Plant J. 2019, 100, 1254–1272. [Google Scholar] [CrossRef]
Figure 1. Similarity networks of PdiTPSs sequences. (a) Clusters of PdiTPSs product skeletons defined by N-terminal subsequence similarity (E-value threshold 10⁻⁷⁰). (b) Clusters of PdiTPSs product skeletons defined by C-terminal subsequence similarity (E-value threshold 10⁻⁷⁰). (c) Clusters of PdiTPSs product skeletons defined by NC-terminal subsequence similarity (E-value threshold 10⁻¹²⁰).
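The way an E-value cutoff turns pairwise hits into clusters like those in Figure 1 can be illustrated with a minimal sketch. It assumes that pairwise E-values are already available (for example, from an all-vs-all sequence search). The sequence names and E-values are invented for illustration, and `cluster_by_evalue` is a hypothetical helper, not the authors' pipeline.

```python
from collections import defaultdict


def cluster_by_evalue(hits, threshold):
    """Group sequences into connected components of a similarity network.

    hits: iterable of (seq_a, seq_b, evalue) tuples.
    An edge is kept only when evalue <= threshold.
    """
    graph = defaultdict(set)
    for a, b, evalue in hits:
        graph[a]  # register every sequence as a node, even if it ends up isolated
        graph[b]
        if a != b and evalue <= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        # Depth-first traversal collects one connected component
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Invented pairwise E-values, for illustration only
hits = [
    ("tps1", "tps2", 1e-90),
    ("tps2", "tps3", 1e-75),
    ("tps3", "tps4", 1e-40),
    ("tps4", "tps5", 1e-130),
]
clusters_70 = cluster_by_evalue(hits, 1e-70)    # looser cutoff, larger clusters
clusters_120 = cluster_by_evalue(hits, 1e-120)  # stricter cutoff, more singletons
```

In this toy data, tightening the threshold from 10⁻⁷⁰ to 10⁻¹²⁰ splits the network into smaller clusters, which is why the choice of cutoff matters when comparing panels.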
Figure 2. The relationship between the overall sequence phylogeny of PdiTPSs, their product scaffolds, function-related motifs, and biosource classification. The phylogenetic tree labels each enzyme with its accession ID, product class, and scaffold classification. It also displays the skeleton structures of SK1-SK15, the product structures of PdiTPSs II, the function-verified motifs in some PdiTPSs, and the domain composition of each PdiTPS. The color blocks of motifs and domains serve only to distinguish the continuous conserved motifs and domains. In the PdiTPSs biosource classification, blocks of the same color represent the same phylum.
Figure 3. Distribution of sequence and structural similarities and representative structures (Q2, 50%). Distribution of sequence and structural similarities among N-terminal, C-terminal, NC-terminal sequences, overall sequences, and overall structures of PdiTPSs I, II, and I/II. Superimposition of representative structures at the Q2 position based on the TM-score. In the distributions of sequence and structural similarities, the same color represents the same sequence type and structure type, respectively.
Figure 4. Distribution of similarities between N-terminal, C-terminal, NC-terminal subsequences, overall sequences, and overall structures of PdiTPSs for the same and different substrates and products. (a) Comparison of sequence similarities between N-terminal, C-terminal, and NC-terminal subsequences and overall sequences for the same and different substrates. (b) Comparison of sequence similarities between N-terminal, C-terminal, and NC-terminal subsequences and overall sequences for the same and different products. (c) The representative display of the range of positions of residues around the substrate under analysis. (d) Comparison of structure fold similarities between the local structures formed by residues within 4 Å, 6 Å, 8 Å, and 10 Å of the substrates with the overall structures for the same and different substrates. (e) Comparison of fold similarities between the local structures formed by residues within 4 Å, 6 Å, 8 Å, and 10 Å of the substrates with the overall structures for the same and different products. Asterisks indicate statistical significance (*: p < 0.05, ***: p < 0.001), and “NS” indicates no statistical difference. The same color in (a,b) represents the same sequence types. The same color in (d,e) represents the same local structure types.
Figure 5. The positions of validated structures, along with the number of motifs and specific residues near the motifs in the three classes of PdiTPSs. (a) The crystal structure of taxadiene synthase (diTPSs I) [20]. (b) The crystal structure of ent-copalyl diphosphate synthase (diTPSs II) [21]. (c) The crystal structure of casbene synthase (diTPSs I). (d) The crystal structure of abietadiene cyclase (diTPSs I/II) [15]. (e) Seqlogo of the PIX motif and its neighboring residues. (f) Seqlogo of the PNV and FERLW motifs and their neighboring residues. (g) Seqlogo of the LHS motif and its neighboring residues. The black numbers represent the total number of sequences retrieved by MEME for the PNV, FERLW, LHS, and PIX motifs. The red numbers represent the number of occurrences of each motif in the three classes of PdiTPSs. In (a–d), yellow, green, and blue represent the γ, β, and α structural domains, respectively.
Figure 6. Preference of 5 types of amino acids in the residues around the substrate. (a) Frequency distribution of five types of amino acids (aliphatic, aromatic, negative charge, positive charge, and uncharged) within 4 Å, 6 Å, 8 Å, and 10 Å from the substrate compared to their overall frequency within the structure. (b) Comparison of the frequency of 20 amino acids within the structure to their frequency within 4 Å, 6 Å, 8 Å, and 10 Å from the substrate. Asterisks indicate statistical significance (***: p < 0.001), and “NS” indicates no statistical difference.
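The comparison in panel (a) can be sketched as a frequency count per amino acid class. The grouping of the 20 standard amino acids into the five classes below is an assumption made for illustration, as the caption does not list the assignment; the helper `class_frequencies` and both toy residue strings are hypothetical.

```python
from collections import Counter

# Assumed grouping (one-letter codes); the assignment used in the figure may differ.
CLASSES = {
    "aliphatic": set("AGILMPV"),
    "aromatic": set("FWY"),
    "negative charge": set("DE"),
    "positive charge": set("HKR"),
    "uncharged": set("CNQST"),
}

def class_frequencies(residues):
    """Return the fraction of residues falling in each class."""
    counts = Counter()
    for aa in residues:
        for name, members in CLASSES.items():
            if aa in members:
                counts[name] += 1
                break
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in CLASSES}

# Hypothetical residues near the substrate versus the whole structure.
pocket = "FWYDDLIA"
whole = "FWYDDLIAKRSTNQGG"

pocket_freq = class_frequencies(pocket)
whole_freq = class_frequencies(whole)
```

In this toy input the aromatic fraction is higher near the substrate than in the whole structure, which is the kind of enrichment the figure tests for; the statistical test itself is not reproduced here.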
Figure 7. (a) Structurally aligned representative residues (within 6 Å) around the substrates of PdiTPSs (I and I/II) producing different backbones, shown superimposed. Residue labels follow ent-kaurene synthase (AtKSL: AF-AAC39443). (b) Residue conservation at the positions depicted in (a). (c) Structurally aligned representative residues (within 8 Å) around the substrates of PdiTPSs (II and I/II) producing different intermediates, shown superimposed. Residue labels follow ent-copalyl diphosphate synthase (TwTPS3: AF-ANO43020). (d) Residue conservation at the positions depicted in (c). In (b,d), aromatic residues and residues shown to affect product outcome in mutagenesis studies are highlighted; red font indicates mutated residues.
Table 1. PCC for PdiTPSs sequences, structures, and products.
| Content | Pearson’s Correlation Coefficient |
|---|---|
| Local structures of 4 Å: Products | 0.38 |
| Local structures of 6 Å: Products | 0.4 |
| Local structures of 8 Å: Products | 0.39 |
| Local structures of 10 Å: Products | 0.4 |
| Overall structures: Products | 0.36 |
| C-subsequences: Products | 0.35 |
| N-subsequences: Products | 0.54 |
| NC-subsequences: Products | 0.54 |
| Overall structures: Products | 0.55 |
Note: All p values < 0.001 in the table’s statistical results.
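For readers unfamiliar with the statistic in Table 1, the following is a minimal sketch of Pearson's correlation coefficient in plain Python. The two input lists stand in for paired similarity scores (for example, a structural similarity and a product similarity for each enzyme pair); how the values were actually paired in the analysis is an assumption here, and the numbers are toy data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance term and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Toy paired scores (hypothetical, not from the paper).
structure_similarity = [1, 2, 3, 4, 5]
product_similarity = [2, 4, 5, 4, 5]

r = pearson(structure_similarity, product_similarity)
```

The coefficient ranges from -1 to 1. The sketch does not compute the p values mentioned in the table note.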
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhao, Y.; Liang, Y.; Luo, G.; Li, Y.; Han, X.; Wen, M. Sequence-Structure Analysis Unlocking the Potential Functional Application of the Local 3D Motifs of Plant-Derived Diterpene Synthases. Biomolecules 2024, 14, 120. https://doi.org/10.3390/biom14010120
