
Representing and Quantifying Conformational Changes of Kinases and Phosphatases Using the TSR-Based Algorithm

Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
Author to whom correspondence should be addressed.
Kinases Phosphatases 2024, 2(4), 315-339;
Submission received: 26 August 2024 / Revised: 4 November 2024 / Accepted: 5 November 2024 / Published: 8 November 2024


Protein kinases and phosphatases are key signaling proteins and are important drug targets. An explosion in the number of publicly available 3D structures of proteins has been seen in recent years. Three-dimensional structures of kinase and phosphatase have not been systematically investigated. This is due to the difficulty of designing structure-based descriptors that are capable of quantifying conformational changes. We have developed a triangular spatial relationship (TSR)-based algorithm that enables a unique representation of a protein’s 3D structure using a vector of integers (keys). The main objective of this study is to provide structural insight into conformational changes. We also aim to link TSR-based structural descriptors to their functions. The 3D structures of 2527 kinases and 505 phosphatases are studied. This study results in several major findings as follows: (i) The clustering method yields functionally coherent clusters of kinase and phosphatase families and their superfamilies. (ii) Specific TSR keys are identified as structural signatures for different types of kinases and phosphatases. (iii) TSR keys can identify different conformations of the well-known DFG motif of kinases. (iv) A significant number of phosphatases have their own distinct DFG motifs. The TSR keys from kinases and phosphatases agree with each other. TSR keys are successfully used to represent and quantify conformational changes of CDK2 upon the binding of cyclin or phosphorylation. TSR keys are effective when used as features for unsupervised machine learning and for key searches. If discriminative TSR keys are identified, they can be mapped back to atomic details within the amino acids involved. In conclusion, this study presents an advanced computational methodology with significant advantages in not only representing and quantifying conformational changes of protein structures but also having the capability of directly linking protein structures to their functions.

1. Introduction

Protein phosphorylation is an important cellular regulatory mechanism, as many enzymes and receptors are activated or deactivated by phosphorylation and dephosphorylation events [1]. Nearly 70% of all eukaryotic cellular proteins are regulated by phosphorylation and dephosphorylation [2]. Protein kinases mediate cellular signal transduction through phosphorylation. Their hyperactivity, malfunction or overexpression is found in several diseases, mostly tumors [3]. Kinase inhibitors can therefore be valuable for the treatment of cancers [1]. Phosphatases catalyze dephosphorylation, the reverse of the reaction catalyzed by kinases.
Proteins require the formation of specific 3D structures to manifest their essential biological functions for sustaining life. The 3D structures of molecular complexes reveal the atomic details of intra- and inter-molecular interactions, and thus provide important information for understanding the molecular mechanisms [4]. Structural comparison techniques benefit from the ever-expanding repositories of the Protein Data Bank (PDB) [5]. Structurally similar proteins tend to have similar functions, even if their amino acid sequences are not similar to one another. Structurally different proteins could have similar structures at their active sites. Thus, it is very important to identify proteins with similar global or local structures, especially by employing search mechanisms for local 3D structures that can capture remote structural homologies and analogies from the growing database for elucidating the mechanisms of protein functions. In addition, life is about relationships between molecules, not about a property of any single molecule [6]. To understand the biological effect of the binding of an interacting molecule, one must understand the conformational changes of a protein, especially local structures (e.g., a phosphorylation site) induced by molecular binding.
The comparison of two protein structures, despite its apparent simplicity, is a non-trivial challenge. The triangular spatial relationship (TSR) algorithm simplifies this complex problem to one of matching two integer vectors. We have reported protein comparison studies that employ the TSR-based algorithm where triangles are constructed using Cα atoms of a protein as the vertices [7]. A TSR key (an integer) is computed using the length, angle and vertex labels of triangles based on a rule-based label assignment formula, ensuring the assignment of the same key to identical TSRs across different proteins. The unique strengths of the method include (i) an approach to represent molecular 3D structures, complementing the need for structural superimposition or alignment [7] and (ii) the ability to accurately quantify structural similarity, either globally or locally, by counting the common TSR keys between two proteins. In addition, the TSR keys have the advantage of performing structure-based BLAST searches through the PDB structures [8]. The first structure study of a protein kinase, a eukaryotic protein kinase A, described by Knighton et al. in 1991, uncovered the architecture of the enzyme catalytic site together with structural insight into the substrate and inhibitor binding mode [9,10]. For the current investigation, 3D structures of 2527 kinases and 505 phosphatases are identified from the PDB. Their families and subfamilies are annotated and labeled.
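The idea of comparing two proteins by counting common TSR keys can be illustrated with a minimal Python sketch. It assumes each protein is already available as a list of integer keys. The function names and the generalized Jaccard normalization are illustrative assumptions, not the published implementation, and the key values are toy data.

```python
from collections import Counter

def common_key_count(keys_a, keys_b):
    """Count the TSR keys shared by two proteins, respecting key frequency."""
    ca, cb = Counter(keys_a), Counter(keys_b)
    # Counter "&" keeps, for every key, the smaller of the two frequencies.
    return sum((ca & cb).values())

def generalized_jaccard(keys_a, keys_b):
    """Assumed normalization: shared occurrences / occurrences in the union."""
    ca, cb = Counter(keys_a), Counter(keys_b)
    union = sum((ca | cb).values())  # "|" keeps the larger frequency per key
    return sum((ca & cb).values()) / union if union else 0.0

# Toy key vectors (hypothetical values, not taken from real structures).
protein_a = [5484102, 5484102, 7199432, 1001]
protein_b = [5484102, 7199433, 1001, 1001]
shared = common_key_count(protein_a, protein_b)
similarity = generalized_jaccard(protein_a, protein_b)
```

Because the comparison works on integer multisets, no structural superimposition is needed at this step.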
This study focuses on the global and local structure comparisons of kinases and phosphatases by taking advantage of the TSR-based method and the structures identified from the PDB and annotated. The multiplicity of functional characteristics of proteins is reflected by the hierarchical organization of their spatial structures. A protein structure dataset can be arranged into hierarchical relations for understanding the mechanisms underlying the structural relationships (Figure 1a). The objectives of this work are to (i) identify the common and specific substructures (Figure 1b) and (ii) understand conformational changes upon the binding of a partner (Figure 1c) and due to the protein phosphorylation or dephosphorylation (Figure 1d).
Most proteins perform their biological function by interacting with themselves or other types of molecules. Accurately predicting protein structures and protein–protein interactions will provide biological insights into protein functions, disease prevalence and therapy development [11]. Prediction of protein structures or protein–protein interactions is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics and computer science [11,12]. The descriptors used in such prediction include sequence-based descriptions [13], structural domains [14], experimental data [15,16,17], physiochemical descriptors [18] and structural descriptors [18]. One focus of this study is to understand structural relationships and changes. Additionally, common and specific TSR keys and TSR keys associated with conformational changes can be developed as new types of structure-based descriptors for prediction purposes.

2. Results

2.1. Extent of Structure Similarities Shared Among Kinases, Phosphatases and Kinases/Phosphatases Combined

Most kinases contain a conserved catalytic domain [19] that has a small N-terminal lobe, which is predominantly composed of five β-sheets and an α-helix, and a larger C-terminal lobe, which is predominantly composed of six helices, an activation loop and a catalytic loop [20]. The catalytic domain of most phosphatases also shows high conservation [19,21]. To understand the structural relationship of kinases and phosphatases, we performed a hierarchical cluster analysis using 2527 kinase structures and 505 phosphatase structures. The result shows well-separated kinase subfamily clusters, well-separated phosphatase subfamily clusters and a few mixed kinase and phosphatase clusters (Supplementary File S1) (Figure 2a), suggesting highly distinct structural characteristics of certain kinase and phosphatase subfamilies and, on the other hand, high similarity between other kinase and phosphatase subfamilies. Using feature filtering through the grouping of amino acids with similar chemical properties and structures has the advantage of improving cluster analyses [22]. Thus, we have applied amino acid grouping in the cluster analysis of kinase and phosphatase structures. The MDS analysis with amino acid grouping reveals a similar result (Figure 2b) to that of the hierarchical clustering without applying amino acid grouping.
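For readers unfamiliar with hierarchical clustering, the following pure-Python sketch shows one way structures could be grouped once pairwise distances are available. The linkage criterion (average linkage), the function name and the toy distances are assumptions made for illustration. This section does not state which linkage or distance measure was used.

```python
from itertools import combinations

def average_linkage(names, dist, n_clusters):
    """Agglomerative clustering with average linkage.

    names: list of structure identifiers.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > max(1, n_clusters):
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical distances between four toy structures.
names = ["kinA", "kinB", "phoA", "phoB"]
toy_dist = {frozenset(pair): d for pair, d in [
    (("kinA", "kinB"), 0.40), (("phoA", "phoB"), 0.25),
    (("kinA", "phoA"), 0.95), (("kinA", "phoB"), 0.90),
    (("kinB", "phoA"), 0.92), (("kinB", "phoB"), 0.97),
]}
clusters = average_linkage(names, toy_dist, n_clusters=2)
```

In this toy case the two kinase-like entries and the two phosphatase-like entries end up in separate clusters, which mirrors the kind of separation described above.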
The TSR algorithm represents a protein 3D structure with all possible triangles, each of which is computed into an integer (a TSR key). This unique representation makes it possible to determine a common substructure among kinases, among phosphatases and between kinases and phosphatases. Quantification of such a common substructure will provide insight into the hierarchical cluster analysis. In total, 5391, 52,714 and 4178 “Distinct_Common” triangles are identified for kinases, phosphatases and kinases/phosphatases combined, respectively, when the number of occurrences of individual triangles (key frequency) is not counted (Figure 2c). If key frequencies are counted, 126,333, 815,511 and 105,677 “Total_Common” triangles on average are found for kinases, phosphatases and kinases/phosphatases combined, respectively (Figure 2c). To estimate the percentage of “Distinct_Common” and “Total_Common” substructures, the numbers of “Distinct” and “Total” triangles are calculated for kinases, phosphatases and kinases/phosphatases combined, and then “Distinct_Common” is divided by “Distinct” and “Total_Common” is divided by “Total” (Figure 2c). In percentage terms, the calculation indicates that kinases share 0.681% of TSR keys (“Distinct_Common” keys/“Distinct” keys: 5391/803,018) when key frequency is not counted and 3.13% of TSR keys (“Total_Common” keys/“Total” keys: 126,333/4,245,673) when key frequencies are counted. Phosphatases share higher average percentages of TSR keys, given by Distinct_Common % (6.32% without counting key frequency) and Total_Common % (16.5% with counting key frequency). As expected, the average percentages of TSR keys given by Distinct_Common % (without counting key frequency: 0.523%) and Total_Common % (with counting key frequency: 2.55%) for the whole dataset (kinases and phosphatases combined) are lower.
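The four quantities discussed above can be sketched in a few lines of Python. This is a hedged illustration on toy data. The function name is hypothetical, and averaging the per-protein counts is an assumption, because this section does not spell out how the counts are aggregated across proteins.

```python
from collections import Counter

def common_key_stats(proteins):
    """proteins: dict mapping a protein name to its list of TSR keys (with repeats)."""
    counters = [Counter(keys) for keys in proteins.values()]
    n = len(counters)
    # Keys present in every protein of the group ("Distinct_Common").
    common = set(counters[0]).intersection(*counters[1:])
    # Assumption: "Distinct" and "Total" are per-protein averages.
    distinct = sum(len(c) for c in counters) / n
    total = sum(sum(c.values()) for c in counters) / n
    # Occurrences of the common keys, averaged over proteins ("Total_Common").
    total_common = sum(sum(c[k] for k in common) for c in counters) / n
    return {
        "Distinct_Common": len(common),
        "Distinct": distinct,
        "Total_Common": total_common,
        "Total": total,
        "Distinct_Common_%": 100 * len(common) / distinct,
        "Total_Common_%": 100 * total_common / total,
    }

# Toy group of three proteins with small integer keys (hypothetical data).
toy = {"p1": [1, 1, 2, 3], "p2": [1, 2, 2, 4], "p3": [1, 2, 5, 5, 5]}
stats = common_key_stats(toy)
```

In the toy group, keys 1 and 2 occur in all three proteins, so `Distinct_Common` is 2, while `Total_Common` also reflects how often those two keys occur in each protein.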
To further understand structural relationships, we calculate the number of distinct triangles for the kinase family and the phosphatase family. Surprisingly, although they have different folds, the two families share a very high percentage (97.1%) of all distinct triangles in the dataset (proteins of both families together) (Figure 2d). More importantly, this shows that only a small percentage of keys are specific to the kinase or phosphatase class. As introduced in the Methods section, “Near_Specific” sets of keys are identified by applying thresholds to the fractions of target-class and contrasting-class proteins that contain a key. Only such keys are deemed eligible to represent the local structural characteristics of each family, and almost all special keys do not satisfy the thresholds for the probability of occurrence in the target class and the probability of occurrence in the contrasting classes. In the rest of the paper, we refer to the “Near_Specific” keys of a class or subclass as “Specific” TSR keys or specific substructures. From the calculation, we were able to identify three “Specific” TSR keys for the kinase family and six “Specific” TSR keys for the phosphatase family (Figure 2e). The role of high-resolution 3D structures of proteins and complexes has been recognized as fundamental for the rational design of drug molecules [23,24]. Representative “Specific” TSR keys are shown as examples of substructures for the kinase (Figure 2f) and phosphatase (Figure 2g) families.
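A minimal sketch of the “Near_Specific” selection is given below. The function name and the threshold values (0.95 and 0.05) are placeholders chosen for illustration. The actual thresholds are defined in the Methods section and are not reproduced here.

```python
def near_specific_keys(key_sets, labels, target, min_target=0.95, max_other=0.05):
    """Return keys that are frequent in the target class and rare elsewhere.

    key_sets: list of sets of TSR keys, one set per protein.
    labels:   class label of each protein, in the same order.
    Thresholds are hypothetical placeholders, not the published values.
    """
    target_sets = [s for s, lab in zip(key_sets, labels) if lab == target]
    other_sets = [s for s, lab in zip(key_sets, labels) if lab != target]
    if not target_sets:
        return {}
    selected = {}
    for key in set().union(*target_sets):
        # Fraction of target-class proteins that contain the key.
        p_target = sum(key in s for s in target_sets) / len(target_sets)
        # Fraction of contrasting-class proteins that contain the key.
        p_other = sum(key in s for s in other_sets) / len(other_sets) if other_sets else 0.0
        if p_target >= min_target and p_other <= max_other:
            selected[key] = (p_target, p_other)
    return selected

# Toy dataset: three "kinase" and two "phosphatase" key sets (hypothetical keys).
key_sets = [{1, 2, 3}, {1, 2}, {1, 4}, {2, 4}, {2, 5}]
labels = ["kinase", "kinase", "kinase", "phosphatase", "phosphatase"]
kinase_specific = near_specific_keys(key_sets, labels, target="kinase")
```

Only key 1 passes in this toy case: it occurs in every kinase set and in no phosphatase set, whereas key 2 is shared by both classes.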

2.2. Certain Protein Kinase Families Have Their Own Structural Characteristics

Traditionally, molecular phylogenies are constructed as trees based on an amino acid sequence similarity coupled with an underlying theory of sequence evolution [25]. The extreme sequence divergence seen in the kinase superfamily makes such determinations problematic [26]. The development of the computational method and the acceleration in the rate of deposition to the PDB make a more comprehensive 3D structural study of the kinase superfamily possible. Protein structure is generally much more conserved than protein sequence. A 3D structural study will not only provide mechanistic understanding of protein functions but will also complement sequence-based analysis.
The kinase dataset contains 107 different types of kinases, with the numbers of kinases in each type ranging from a few to several hundred instances (Supplementary File S2). This study focuses on the cAMP dependent protein kinase (cAMPDK), mitogen-activated protein kinase (MAK or MAPK), cyclin dependent kinase (CDK), casein kinase II (CKII) and epidermal growth factor receptor (EGFR) families (Figure 3a), which have more instances in the dataset. There are ~20 members in the CDK family regulating the cell cycle, transcription and splicing [27]. The CDKs are in active or inactive forms depending on whether they associate with their partners, known as cyclins [28]. If they associate with cyclins, they will be in their active forms. If they do not associate with cyclins, they are in their inactive forms. We have identified eight members (CDK1–2 and 4–9) of CDKs from the PDB (Figure 3a). The hierarchical clustering shows that cAMPDK, MAK, CDK, CKII and EGFR have their individual clusters (Figure 3b) (Supplementary File S3), demonstrating their structural characteristics. However, not all the CDKs are in one single cluster as predicted. One large cluster and three small clusters are observed. The large cluster includes CDK1, CDK2, CDK4, CDK5 and CDK6 (Figure 3b). CDK7, CDK8 and CDK9 form their own three smaller clusters (Figure 3b). The clustering results can be partially interpreted by “Distinct_Common” and “Total_Common” keys. cAMPDK, MAK, CKII and EGFR families have their high structural similarity measured by percentage of “Distinct_Common” and “Total_Common” TSR keys (Figure 3c). In contrast, the CDK family has a relatively low structural similarity (Figure 3c). We hypothesized that applying amino acid grouping would increase the opportunity for more CDKs to be clustered together. However, the result shows the opposite effect (Figure 3d). 
The MDS analysis demonstrates that CDK2, CDK8 and CDK9 are well separated from other CDKs, suggesting the structural characteristics of these three CDKs. The rest of the CDKs have well-separated instances and overlapping instances (Figure 3e).
To interpret the clustering results more completely, we next discuss distinctive structural characteristics seen in one family but not in other families in the kinase/phosphatase dataset. “Specific” TSR keys are identified for the cAMPDK, MAK, CDK, CKII and EGFR families (Figure 4a). Representative examples are shown for the cAMPDK (Figure 4b), MAK (Figure 4c), CDK (Figure 4d), CKII (Figure 4e) and EGFR (Figure 4f) families. High specificities are observed for the “Specific” keys of cAMPDK (98.2% vs. 0.512% of other kinases and 0.132% of phosphatases), MAK (99.9% vs. 0.666% of other kinases and 0.792% of phosphatases), CKII (97.9% vs. 0.247% of other kinases and 0.475% of phosphatases) and EGFR (97.6% vs. 0.630% of other kinases and 0.396% of phosphatases). Relatively low specificity is observed for the “Specific” TSR keys of the CDK family (83.8% vs. 2.08% of other kinases and 2.84% of phosphatases). The CDK family contains eight members in the dataset. We further identify “Specific” keys for each member. Two and three “Specific” TSR keys are identified for CDK2 and CDK9, respectively (Figure 5a). The examples are shown in Figure 5b for CDK2 and Figure 5c for CDK9. The “Specific” keys combined with “Distinct_Common” and “Total_Common” keys explain the clustering results. In addition, the “Specific” keys represent unique substructures of a protein family or subfamily that provide the structural foundation for designing specific inhibitors.

2.3. Certain Protein Phosphatase Families Have Their Structural Characteristics

To understand phosphatase structural relationships, we first perform global structural clustering and MDS analyses. Seven different types of phosphatase structures are annotated in the dataset (Supplementary File S4). Because two types of phosphatases (phosphoprotein phosphatase (PPP) and protein tyrosine phosphatase (PTP) families) contain more instances than other types, we focus on the discussion around these two types of phosphatases. The TSR-based method cannot separate the PPP family from the PTP family and vice versa, either when not applying (Figure 6a) (Supplementary File S5) or when applying amino acid grouping (Figure 6b). It implies that the two phosphatase families likely share similar substructures, and each family may not have highly “Specific” TSR keys. As expected, applying amino acid grouping increases the structural similarity of phosphatases (Figure 6c).
Six and twenty-eight members are annotated for the PPP and PTP families, respectively. The six subfamilies of PPP are separated (Figure 6d). Two members of the PTP family, PTPN1 and PTPN11, have large numbers of structures. PTPN1s form their own large cluster, while PTPN11s are separated into three clusters (Figure 6e), suggesting more diversity among PTPN11s than among PTPN1s, which is supported by the percentages of “Distinct_Common” and “Total_Common” keys (Figure 6f). Using an approach similar to that used for kinases, we calculated “Specific” keys for understanding the hierarchical relationships of phosphatase structures and for interpreting the MDS and clustering results. The phosphatase structures are arranged in a hierarchy with all phosphatases at the root level, PPP, PTP and others at level 1 and PTPN1, PTPN11 and others at level 2 (Figure 7a). Four and eight “Specific” keys are identified as structural signatures for the PPP and PTP families at level 1 (Figure 7b), and representative examples are shown in Figure 7c,d, respectively. At level 2, five “Specific” keys are found for each of the PTPN1 and PTPN11 subfamilies (Figure 7e). Their representative examples are illustrated in Figure 7f for PTPN1 and Figure 7g for PTPN11.

2.4. Structural Motifs Have Their Structural Characteristics

The DFG Motif

As stated earlier, protein kinases play important roles in many cellular signaling pathways, such as cell cycle regulation and apoptosis [29]. Protein kinases share a similar 3D catalytic domain that is composed of an N-terminal lobe and a C-terminal lobe. The two lobes come together through surfaces formed by the C-helix and the activation loop (A-loop), a dynamic feature that is a common site for phosphorylation that regulates kinase activity. Close interactions between C-helix and the A-loop are critical for active state, while a common theme of the inactive state is the disruption of interactions involving the C-helix and the A-loop [30]. The A-loop is also involved in recognizing protein substrates and forms part of a groove that extends along the front face of the C-terminal lobe [30]. However, the N-terminal lobe is also involved in protein substrate recognition found in some kinases such as PINK1 (PTEN-induced kinase 1) [31]. Well-studied tyrosine kinases Src and EGFR are regulated by a C-helix out/C-helix in mechanism [32]. Activity of most kinases is partially regulated by the phosphorylation status and position of the activation loop, which begins with the highly conserved DFG (Asp-Phe-Gly) motif and ends with a sequence usually similar to APE (Ala-Pro-Glu) [33,34]. C-helix contains a critical glutamate residue that forms a salt bridge with a catalytic lysine [35]. The two most common conformations of the DFG motif are DFG-in, in which the A-loop interacts with the C-helix (the salt bridge between Lys and Glu is present), and DFG-out, in which the A-loop is directed away from the C-helix (the salt bridge between Lys and Glu is broken) [30].
Most kinases have a DFG motif that plays an important role in regulating their kinase activity [20,36]. Therefore, we decided to study the DFG motif using the TSR-based method. First, we perform a sequence alignment of 34 kinases selected from seven kinase families. The alignment shows low similarity (amino acid sequence similarity: 22.0%, and amino acid sequence identity: 0.8%). Only three amino acids, DFG, are aligned across all sequences (Figure 8a). Second, we identify all kinases that have the DFG motif. A significant portion (2077 out of 2527 kinases = 82.2%, Supplementary File S6) of the kinases in the dataset have the DFG motif. The CKIIs have a DWG motif instead of the DFG motif (Figure 8b), which partially explains why not all the kinases in the dataset have the DFG motif. Third, we calculate the TSR keys for the DFG motif and discover three groups of TSR keys for the DFG motif. One is a dominant group that consists of 2031 kinases with the TSR keys of 5484102 (more instances) or 5484101 (fewer instances). The second group contains 41 kinases and has the TSR key of 5484131. The third group has only five kinases and their TSR key values are less than 5484101. The three groups have different MaxDist (Figure 8c) (t-test, p < 0.001) and Theta (Figure 8d) (t-test, p < 0.001) values, suggesting different geometries. To visualize such geometry differences, we randomly select two kinases: 3ZO4: cAMPDK, which has the TSR key of 5484102 for the DFG motif; and 3DCP: MAK, which has the TSR key of 5484131 for the DFG motif. Each of the two structures contains an inhibitor. In the case of 3ZO4, Asp of the DFG motif points toward the inhibitor (QWI) (Figure 8e). In contrast, Asp of the DFG motif points away from the inhibitor (SB2) for 3DCP (Figure 8f). Intermediate conformations (DFG-inter) have been reported [37].
The DFG-Asp residue points into the active site in the DFG-in conformation, whereas DFG-Asp and DFG-Phe swap positions in the DFG-out conformation [38]. However, it is worth noting that we cannot draw the conclusion that these three groups of TSR keys are directly related to the DFG-in, DFG-out and DFG-inter conformations.
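The grouping of DFG-motif keys described above can be expressed as a small lookup. In this sketch the key values 5484101, 5484102 and 5484131 and the PDB entries 3ZO4 and 3DCP come from the text. The group names, the function name and the third toy entry with its key are illustrative assumptions.

```python
from collections import Counter

DOMINANT_KEYS = {5484101, 5484102}  # keys reported for the dominant group
SECOND_KEY = 5484131                # key reported for the second group

def dfg_group(key):
    """Assign a DFG-motif TSR key to one of the three groups reported in the text."""
    if key in DOMINANT_KEYS:
        return "group 1"
    if key == SECOND_KEY:
        return "group 2"
    if key < min(DOMINANT_KEYS):
        return "group 3"
    return "unassigned"  # any other key value is outside the reported groups

# 3ZO4 and 3DCP keys are from the text; "toy_entry" and its key are hypothetical.
dfg_keys = {"3ZO4": 5484102, "3DCP": 5484131, "toy_entry": 5484090}
groups = {pdb_id: dfg_group(key) for pdb_id, key in dfg_keys.items()}
group_sizes = Counter(groups.values())
```

As stated above, membership in one of these groups should not be read as a direct assignment to the DFG-in, DFG-out or DFG-inter conformation.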
We previously reported a limited number of phosphatases that have the DFG motif [39]. This motivated us to search for the DFG motif in the phosphatase dataset, although we have not seen other publications that report the DFG motif in the phosphatase family. Surprisingly, 281 serine/threonine and tyrosine phosphatases out of a total of 505 have the DFG motif (Supplementary File S7). Two groups of TSR key values are observed. One group has the TSR key value of 5484101/5484102 and the other has the key value of 5484131. As expected, the two groups have different MaxDist values (Figure 8c) (t-test, p < 0.05). However, the two groups have similar Theta values (Figure 8d). In a randomly selected example, 1WAX: PTP with the TSR key of 5484102, Asp of the DFG motif points toward the inhibitor (LO1) (Figure 9a). In the case of 1NL9: PTP1B with the TSR key of 5484131, another randomly selected example, Asp of the DFG motif points away from the inhibitor (989) (Figure 9b). This observation agrees with what was discussed for kinases.
A WPD-loop is reported in phosphatases [40,41]. The Asp of the WPD motif functions as an acid and a base and participates directly in the catalytic reaction [40,41]. This motivated us to study the WPD motif. Surprisingly, we find that the WPD motif and the DFG motif of tyrosine phosphatases share a common Asp residue (Figure 9c,d). Certain tyrosine phosphatases have a DMG motif (Figure 9c) instead of the DFG motif. It appears that certain serine/threonine phosphatases have a WxDP sequence. The proline residues from the WxDP sequence of serine/threonine phosphatases and the proline residues from the WPD motif of tyrosine phosphatases are aligned together (Figure 9c). Most of the phosphatases (410 out of 505 phosphatases) have a WPD motif (Supplementary File S8). Two types of kinases (CDK7 and BSK8) also have a WPD sequence. WPD motifs or sequences of phosphatases and kinases have MaxDist (Figure 10a) and Theta (Figure 10b) values that differ from those of all the combinations of triangles constituted by Trp, Pro and Asp residues. Two randomly selected WPD motifs of phosphatases (3S3F: PTP10D and 1BZC: PTP1B) are shown in Figure 10c,d. Both WPD motifs share the Asp residue with their DFG motifs. Those WPD motifs have very close TSR keys (1BZC: 7199432 vs. 3S3F: 7199433, i.e., a small difference in the associated Theta values), suggesting a similar geometry. In contrast, the DFG motifs have different conformations (1BZC: 5484102 vs. 3S3F: 5484131). For kinases, the WPD sequences and the DFG motifs are located at different places (Figure 10e) compared with those of phosphatases. The WxDP sequence is identified for 69 instances of phosphatases. Most of them are serine/threonine phosphatases and a few are tyrosine phosphatases. The WxDP sequences have their characteristic structures (Figure 10f,g) (t-test, p < 0.001). However, the biological functions of the WxDP sequences are not clear.
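At the sequence level, the motifs discussed above can be located with regular expressions. The sketch below is only an illustration: the helper names are hypothetical and the sequences are short made-up strings rather than real PDB entries. It finds WPD, DFG/DMG and WxDP occurrences and reports Asp positions shared by a WPD motif and a following DFG or DMG motif.

```python
import re

def starts(sequence, pattern):
    """0-based start indices of every match, including overlapping ones."""
    # A lookahead matches without consuming characters, so overlaps are kept.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence)]

def shared_asp_sites(sequence, first="WPD", second="D[FM]G"):
    """Indices of Asp residues that end `first` and begin `second` (e.g. WPDFG)."""
    second_starts = set(starts(sequence, second))
    # The Asp of WPD sits two residues after the Trp.
    return [i + 2 for i in starts(sequence, first) if i + 2 in second_starts]

# Made-up toy sequences (hypothetical, for illustration only).
tyrosine_like = "TREVWPDFGVPE"   # WPD directly followed by FG
dmg_variant = "AKLWPDMGTT"       # DMG in place of DFG
ser_thr_like = "KLWADPNGTR"      # WxDP sequence

shared = shared_asp_sites(tyrosine_like)
shared_dmg = shared_asp_sites(dmg_variant)
wxdp = starts(ser_thr_like, "W.DP")
```

Sequence matching of this kind only locates candidate motifs. The geometric comparison still relies on the TSR keys, MaxDist and Theta of the corresponding triangles.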

2.5. Quantify and Define the Conformational Changes upon the Binding of an Interacting Protein and Phosphorylation

Life is about molecular interactions. For example, C-helix out in inactive CDK2 is stabilized by a refolded conformation of the A-loop that occupies the space between the Lys and Glu [42]. Binding of Cyclin A brings about a substantial conformational change. Phosphorylation of the A-loop on Thr160 reinforces this change [43]. Mutations in the A-loop that cause constitutive kinase activity occur frequently in cancers, and these might act to destabilize autoinhibitory conformations and/or stabilize an active conformation [44]. To demonstrate the application of the TSR-based method in quantifying and defining conformational changes, we have further annotated a subset of the kinase/phosphatase dataset: CDK2. In total, 214 CDK2 structures are labeled based on whether CDK2 has an interacting protein and/or is phosphorylated at threonine 160. Based on these criteria, four groups are labeled in the CDK2 dataset (Supplementary File S9) as follows: 130 structures of CDK2 Thr160 (unphosphorylated) without an interacting protein, 3 structures of CDK2 Thr160 with Spy as the interacting protein, 24 structures of CDK2 Thr160 with cyclin as the interacting protein and 57 structures of CDK2 TPO160 (phosphorylated Thr160) with cyclin as the interacting protein.
Six amino acids, including Arg126, Arg150, Tyr159, His161, Glu162 and Tyr180, are identified for the CDK2 phosphorylation site at Thr160. Both the hierarchical cluster (Figure 11a) and MDS (Figure 11b) analyses of these six amino acids clearly reveal these four groups. These results demonstrate structural characteristics of each group of CDK2 structures. To demonstrate the conformational changes induced by binding of cyclin and phosphorylation, we focus on the following three groups: Thr160 without an interacting protein, Thr160 with cyclin and TPO160 with cyclin. To show the conformational changes induced by binding of cyclin, we compare local structures between the group of Thr160 without an interacting protein and the group of Thr160 with cyclin. Three triangles—Arg126-Arg150-His161, Arg150-Tyr159-Glu162 and His161-Glu162-Tyr180—of the two groups have different MaxDist (Figure 11c) and Theta (Figure 11d) values (t-test, p < 0.05 or 0.001). To demonstrate the conformational changes upon the phosphorylation, we compare the group of Thr160 with cyclin to the group of TPO160 with cyclin. These two groups have different MaxDist and Theta values (Figure 11c,d) for the three triangles of Arg126-Arg150-His161, Arg150-Tyr159-Glu162 and His161-Glu162-Tyr180 (t-test, p < 0.05 or 0.001). One representative phosphorylation site is shown in Figure 11e. The TSR-based method uses MaxDist and Theta values of three triangles constituted from six amino acids in the phosphorylation site at Thr160 to quantify and define conformational changes induced by binding of cyclin to CDK2 or Thr160 phosphorylation. Such conformational changes are illustrated in Figure 11f.
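The group comparisons above rely on t-tests of MaxDist and Theta values. The text does not state which t-test variant was used, so the sketch below assumes Welch's unequal-variance form and computes only the test statistic and the degrees of freedom with the Python standard library. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two sample means.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical MaxDist values (in angstroms) for one triangle in two CDK2 groups.
maxdist_without_cyclin = [10.0, 10.2, 9.8, 10.1, 9.9]
maxdist_with_cyclin = [11.0, 11.3, 10.9, 11.2, 11.1]
t_stat, dof = welch_t(maxdist_without_cyclin, maxdist_with_cyclin)
```

Converting the statistic to a p-value requires the t distribution. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.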

3. Discussion

We have demonstrated that the TSR-based method can quantify conformational changes of the phosphorylation site upon binding of an interacting protein or protein phosphorylation. The conformational changes at the interaction surface or site can propagate to a location a long distance away to cause global structural changes. To further detect the global structural changes of CDK2 upon the binding of cyclin or phosphorylation, we perform a hierarchical cluster analysis using entire structures. Two well-defined clusters are observed. One cluster consists of the structures of CDK2 with an interacting protein and the other cluster contains the CDK2 structures without an interacting protein (Figure 12a). This result clearly demonstrates the global structural changes of CDK2 upon the binding of cyclin or Spy. All three groups of Thr160 without an interacting protein, Thr160 with cyclin and TPO160 with cyclin have similar global structural similarities (Figure 12b). The global structural similarity between Thr160 with cyclin and TPO160 with cyclin is greater than those between Thr160 without an interacting protein and Thr160 with cyclin, and between Thr160 without an interacting protein and TPO160 with cyclin (Figure 12b). As a comparison, the local structural similarities of Thr160 phosphorylation site are shown in Figure 12c. TPO160 with cyclin has the highest local structural similarity on average.
“Specific” and “common” TSR keys are identified for the kinase family, the phosphatase family, kinase subfamilies and phosphatase subfamilies. Searching a database of protein structures for structures that are similar to, or contain substructures that are similar to, a query structure is a significant problem in structural biology and bioinformatics [45]. One unique feature of the TSR algorithm is its use of an integer-based vector representation for 3D structures. This makes a local structural search against all structures in the PDB possible. A well-folded 3D protein structure is a prerequisite for biological activity [46], yet it is not well understood how unfolded proteins reach their native state. Protein misfolding is often associated with human diseases (e.g., Parkinson’s, Huntington’s and Alzheimer’s) [47,48]. Soluble and correctly folded proteins typically bury a significant fraction (60% to 80%) of their nonpolar residues inside the core to minimize exposure to the hydrophilic environment of typical intracellular media [49,50]. The driving factors for protein folding are numerous non-covalent inter-residue interactions [12]. Such interactions could be from local residues or long-distance residues. Local residue interactions bias short stretches of the chain toward forming specific secondary structures, while favorable long-distance interactions can be formed even before the global native fold is reached [51]. The common substructures identified for kinases and phosphatases suggest that these substructures may be structurally or functionally important. It has been reported that the common substructure motifs among different protein folds are of critical importance for biological function predictions [52]. One future direction is to carefully examine the location and type (polar vs. nonpolar) of the amino acids in those common substructures, which may provide a mechanistic understanding of protein folding.
Identification of interacting proteins is an important step in the elucidation of cell regulatory mechanisms [53,54]. Studies have shown that identifying potential binding sites of small molecules allows docking protocols to be optimized and may also help to elucidate unknown protein functions [55]. Predictions of molecular interactions based on structural information can be superior to predictions based on non-structural evidence [56]. The specific substructures and conformational changes identified using TSR keys in this study could greatly aid such predictions in the future.

4. Methods

4.1. Key Generation

The process begins with extracting Cα atoms from the PDB files of each protein under analysis. Next, the three side lengths and angles of all triangles constructible from these Cα atoms are systematically calculated. Each of the 20 amino acids is assigned a unique integer identifier, using 20 consecutive integers [7]. For triangle i, we map the amino acids at the three vertices to their corresponding integer IDs, giving three labels $l_{i1}$, $l_{i2}$ and $l_{i3}$. We then apply the rule-based label determination of the vertices of each triangle [57], which ensures that the same TSR triangle is represented by the same integer key across proteins. Once $l_{i1}$, $l_{i2}$ and $l_{i3}$ are determined for triangle i, we calculate θ1 using Equation (1) and θΔ based on the θ1 value.
$$\theta_1 = \cos^{-1}\left(\frac{d_{13}^2 - \left(\frac{d_{12}}{2}\right)^2 - d_3^2}{2 \times \left(\frac{d_{12}}{2}\right) \times d_3}\right), \qquad \theta_\Delta = \begin{cases} \theta_1 & \text{if } \theta_1 \le 90^\circ \\ 180^\circ - \theta_1 & \text{otherwise} \end{cases} \qquad (1)$$
  • $d_{13}$: distance between $l_{i1}$ and $l_{i3}$ for triangle i;
  • $d_{12}$: distance between $l_{i1}$ and $l_{i2}$ for triangle i;
  • $d_3$: distance between $l_{i3}$ and the midpoint of $l_{i1}$ and $l_{i2}$ for triangle i.
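The geometry behind Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `theta_and_maxdist` is hypothetical, the three points are assumed to be Cα coordinates already ordered as the vertices labeled l_i1, l_i2 and l_i3, and the triangle is assumed to be non-degenerate (no coincident vertices, third vertex not at the midpoint).

```python
import math

def theta_and_maxdist(p1, p2, p3):
    """Return (theta_delta in degrees, MaxDist) for one triangle.

    p1, p2, p3 are (x, y, z) coordinates assumed to be ordered as the
    vertices labeled l_i1, l_i2 and l_i3 (the ordering rule itself is
    not implemented here).
    """
    midpoint = tuple((a + b) / 2.0 for a, b in zip(p1, p2))
    d12 = math.dist(p1, p2)
    d13 = math.dist(p1, p3)
    d23 = math.dist(p2, p3)
    d3 = math.dist(p3, midpoint)   # vertex l_i3 to midpoint of l_i1-l_i2
    half = d12 / 2.0

    # Equation (1); the clamp only guards against floating-point round-off.
    cos_theta1 = (d13 ** 2 - half ** 2 - d3 ** 2) / (2.0 * half * d3)
    cos_theta1 = max(-1.0, min(1.0, cos_theta1))
    theta1 = math.degrees(math.acos(cos_theta1))

    theta_delta = theta1 if theta1 <= 90.0 else 180.0 - theta1
    max_dist = max(d12, d13, d23)  # longest edge of the triangle
    return theta_delta, max_dist

if __name__ == "__main__":
    print(theta_and_maxdist((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0)))
```

Because only inter-point distances enter the calculation, translating or rotating all three points leaves both returned values unchanged.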
We refer to the value of θΔ as Theta and to D as MaxDist [7]. Theta is defined as the angle, no greater than 90°, between the line from the midpoint of the $l_{i1}$–$l_{i2}$ edge to the opposite vertex $l_{i3}$ and half of the $l_{i1}$–$l_{i2}$ edge. MaxDist is defined as the length of the longest edge of a triangle. Once the labels $l_{i1}$, $l_{i2}$, $l_{i3}$, D and θΔ are determined, we use Equation (2) to calculate the key for each triangle.
$$k = \theta_T d_T (l_{i1} - 1) m^2 + \theta_T d_T (l_{i2} - 1) m + \theta_T d_T (l_{i3} - 1) + \theta_T (d - 1) + (\theta - 1) \qquad (2)$$
  • $m$: the total number of distinct labels;
  • $\theta$: the bin value for the class in which $\theta_\Delta$, the angle representative, falls; to achieve discretization we use the adaptive unsupervised iterative discretization algorithm;
  • $\theta_T$: the total number of distinct discretization levels (or number of bins) for the angle representative;
  • $d$: the bin value for the class in which D, the length representative, falls; to achieve discretization we use the adaptive unsupervised iterative discretization algorithm;
  • $d_T$: the total number of distinct discretization levels (or number of bins) for the length representative.
Crucially, the generated key for each triangle depends on the vertex labels ($l_{i1}$, $l_{i2}$ and $l_{i3}$), Theta (θ) and MaxDist (D). This design ensures that keys, while remaining invariant to rotations and translations, can effectively capture scale changes in protein structures, making them suitable for alignment-free pairwise comparison of 3D structures.
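Equation (2) is a mixed-radix encoding of the five discretized quantities, which the following sketch makes explicit. The function name `tsr_key` is hypothetical, labels and bin values are assumed to be 1-based (as the "− 1" terms suggest), and the small values of m, θT and dT in the usage example are purely illustrative; they are not the bin counts used in the study.

```python
def tsr_key(l1, l2, l3, theta_bin, d_bin, m, theta_T, d_T):
    """Integer key of one triangle following Equation (2).

    l1, l2, l3 : vertex labels in 1..m (already ordered by the labeling rule)
    theta_bin  : bin value of Theta in 1..theta_T
    d_bin      : bin value of MaxDist in 1..d_T
    """
    for value, upper in ((l1, m), (l2, m), (l3, m),
                         (theta_bin, theta_T), (d_bin, d_T)):
        if not 1 <= value <= upper:
            raise ValueError(f"value {value} outside 1..{upper}")
    return (theta_T * d_T * (l1 - 1) * m ** 2
            + theta_T * d_T * (l2 - 1) * m
            + theta_T * d_T * (l3 - 1)
            + theta_T * (d_bin - 1)
            + (theta_bin - 1))

if __name__ == "__main__":
    # Illustrative sizes only: 3 labels, 4 angle bins, 3 length bins.
    print(tsr_key(1, 1, 1, 1, 1, m=3, theta_T=4, d_T=3))
    print(tsr_key(3, 3, 3, 4, 3, m=3, theta_T=4, d_T=3))
```

Under these assumptions every combination of labels and bins maps to a different integer, which is what allows a key to be traced back to the amino acids and geometry that produced it.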

4.2. Protein Structural Similarity and Distance Calculation

We apply the generalized Jaccard coefficient measure [58], Equation (3), for the calculation of similarity between two proteins.
$$Jac_{gen} = \frac{\sum_{i=1}^{n} \epsilon_i}{\sum_{i=1}^{n} z_i} \qquad (3)$$
where n is the total number of unique keys in proteins p1 and p2.
Equivalence $\epsilon$ for a given key $k_i$ in two different proteins p1 and p2 is defined as $\epsilon_i = k_i^{p1} \cap k_i^{p2}$, where $\cap$ is defined by the minimum of the counts of the corresponding keys.
Difference $z$ for a given key $k_i$ in a pair of proteins is defined as $z_i = k_i^{p1} \cup k_i^{p2}$, where $\cup$ is defined by the maximum of the counts of the corresponding keys. The count of a key is the number of times that key occurs (occurrence frequency) within a protein.
Once a similarity matrix is generated, the distance matrix is derived simply by taking each value in the similarity matrix and subtracting it from 1. Protein structure clustering is visualized based on average linkage clustering [59]. A six-layer fully connected neural network is used for classifying protein structures [39]. Structural images are prepared using the visual molecular dynamics (VMD) package [60].
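The similarity and distance calculations can be illustrated with Python's standard `collections.Counter`, whose `&` and `|` operators take the per-key minimum and maximum of counts. This is a sketch under stated assumptions: the function names are hypothetical, each protein is represented simply as a list of its triangle keys, and returning 0.0 for two empty key lists is a choice made here, not something specified in the text.

```python
from collections import Counter

def generalized_jaccard(keys_p1, keys_p2):
    """Generalized Jaccard similarity between two lists of TSR keys."""
    c1, c2 = Counter(keys_p1), Counter(keys_p2)
    numerator = sum((c1 & c2).values())    # per-key minimum of counts
    denominator = sum((c1 | c2).values())  # per-key maximum of counts
    return numerator / denominator if denominator else 0.0

def distance_matrix(proteins):
    """Pairwise distances (1 - similarity) for a list of key lists."""
    n = len(proteins)
    return [[1.0 - generalized_jaccard(proteins[i], proteins[j])
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    p1 = [1, 1, 2, 3]   # key 1 occurs twice in protein p1
    p2 = [1, 2, 2, 4]
    print(generalized_jaccard(p1, p2))
    print(distance_matrix([p1, p2]))
```

A matrix of this form is the kind of input that an average-linkage hierarchical clustering routine would take.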

4.3. Amino Acid Grouping

For the case of amino acid grouping, we follow the formula reported earlier [22]. Ser and Thr are grouped together with the same integer because Ser and Thr have similar structures and functions. Similarly, we group and assign the same integers for Ala and Val; Leu and Ile; Phe and Trp; Asp and Glu; Asn and Gln; and Lys and Arg. For the case of grouping, out of 20 distinct amino acids, 14 are combined to form 7 amino acid categories and the other 6 remain in a category by themselves. Thus, we end up with 13 total integer IDs, 1 for each amino acid category.
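The grouping described above can be written as a small lookup table. In this sketch the group memberships follow the text, but the particular integer assigned to each category (1 to 13, in the order listed) is an arbitrary choice for illustration, and the names `GROUPS` and `GROUPED_LABEL` are hypothetical.

```python
# Seven two-member groups named in the text, followed by the six
# amino acids that remain in a category by themselves.
GROUPS = [
    ("SER", "THR"), ("ALA", "VAL"), ("LEU", "ILE"), ("PHE", "TRP"),
    ("ASP", "GLU"), ("ASN", "GLN"), ("LYS", "ARG"),
    ("GLY",), ("PRO",), ("CYS",), ("MET",), ("TYR",), ("HIS",),
]

# Illustrative IDs only: the actual integers used in the study are not given here.
GROUPED_LABEL = {
    residue: group_id
    for group_id, group in enumerate(GROUPS, start=1)
    for residue in group
}

if __name__ == "__main__":
    print(GROUPED_LABEL["SER"], GROUPED_LABEL["THR"])
    print(len(GROUPED_LABEL), len(set(GROUPED_LABEL.values())))
```

With grouping in place, the number of distinct labels m in the key calculation becomes 13 instead of 20.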

4.4. Dataset Preparation

Protein kinases are classified into five major groups according to the amino acid residue that they phosphorylate, including serine/threonine protein kinases (STPKs) [61,62], tyrosine protein kinases (TKs) [61,62], histidine-specific kinases [63,64], dual-specificity protein kinases [65] and aspartic acid/glutamic acid-specific protein kinases [64]. Serine/threonine protein kinase is a large family of protein kinases, including a number of kinases such as CDK [66], MAK [67], cAMPDK [68], casein kinase (CK) [69], etc. The CDK subunit needs to be combined with the corresponding cyclin to activate. Activated CDKs exhibit protein kinase activity that phosphorylates different substrate proteins, thereby initiating or regulating the cell cycle. MAPKs have been classified into extracellular signal-regulated kinases (ERKs), c-Jun N-terminal kinases (JNKs) and p38-MAPKs (MAK) [67]. Tyrosine kinases are classified into non-receptor tyrosine protein kinases (NRTKs) and receptor tyrosine kinases (RTKs) [70,71,72,73]. NRTKs mainly include Abl, FES, JAK, ACK, SYK, TEC, FAK, Src and CSK, whereas RTKs include ~20 different RTK classes (EGFR family and insulin receptor family) [72]. Based on the kinase (sub)families, we searched for kinases from the PDB and prepared a dataset containing 2527 kinase structures. Detailed information including PDB IDs, chains and hierarchical class labels can be found in Supplementary File S2.
Phosphatases are classified into three families [74,75]: the PPP family, the metallo-dependent protein phosphatase (PPM) family that dephosphorylates phosphoserine and phosphothreonine residues and the PTP family that dephosphorylates phosphotyrosine amino acids [76]. A subfamily of the PTPs, the dual-specificity phosphatases, dephosphorylates all three phosphoamino acids [76]. The PPP family consists of PP1, PP2A, PP3 (PP2B/calcineurin), PP4, PP5, PP6 and PP7 [77]. Analogous to the kinase structures, we have built a phosphatase dataset that contains 505 phosphatase structures. Detailed information about the phosphatase dataset can be found in Supplementary File S4.

4.5. Sequence Alignment

SnapGene and Vector NTI are used to perform multiple sequence alignments.

4.6. Terminology and Quantitative Analyses

4.6.1. Terminology

D—set of proteins in a dataset.
Class X—set of proteins representing a subclass of the dataset.
Class Y—set of proteins representing another subclass of the dataset.
Note: in the case of a dataset with only two classes, Class Y can be referred to as Class ¬X and Class ¬Y is Class X; D = (Class X ∪ Class Y).
T—set of TSR keys present in one or more proteins in D.
Xi—set of TSR keys present in ith protein in Class X.
F(Xi)—sum of occurrence frequencies of TSR keys present in ith protein Xi of Class X.
X—{∪ (Xi)|Xi in Class X}.
Y—{∪ (Yi)|Yi in Class Y}.
T—(X ∪ Y).
Common (D)—set of TSR keys present in both one or more proteins in Class X, and in one or more proteins in Class Y, or (X ∩ Y).
Specific (Class X)—set of TSR keys present in one or more proteins in X, and not occurring in any protein in Y, or (X − (X ∩ Y)).
common (D)—set of TSR keys present in every protein in D.
Fcommon(D) (Xi)—sum of occurrence frequencies of all keys from common (D) in protein Xi.
Common (Class X)—set of TSR keys present in one or more proteins in Class X (union of keys specific to subgroups of X). If Class P and Class Q are subclasses of Class X, then these Common keys are (P ∩ Q).
common (Class X)—set of TSR keys present in every protein in Class X.
Fcommon(X) (Xi)—sum of occurrence frequencies of all keys from common(X) in protein Xi.
specific (Class X)—set of TSR keys present in every protein in Class X, and not occurring in any protein in Class Y (common(X) − common(D)).
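For a two-class dataset, the key sets defined above reduce to ordinary set operations. The sketch below is illustrative only: the function name `key_sets` and the dictionary field names are hypothetical, each protein is represented as a set of its distinct keys, and each class is assumed to contain at least one protein. For "specific", the code follows the worded definition (present in every protein of the class and in no protein of the other class).

```python
def key_sets(class_x, class_y):
    """Key sets for a two-class dataset.

    class_x, class_y: non-empty lists of sets, one set of distinct
    TSR keys per protein.
    """
    X = set().union(*class_x)              # keys seen anywhere in Class X
    Y = set().union(*class_y)              # keys seen anywhere in Class Y
    common_x = set.intersection(*class_x)  # keys in every protein of X
    common_y = set.intersection(*class_y)  # keys in every protein of Y
    return {
        "T": X | Y,
        "Common_D": X & Y,
        "Specific_X": X - Y,
        "Specific_Y": Y - X,
        "common_X": common_x,
        "common_Y": common_y,
        "common_D": common_x & common_y,
        # Worded definition: in every protein of one class, in none of the other.
        "specific_X": common_x - Y,
        "specific_Y": common_y - X,
    }

if __name__ == "__main__":
    class_x = [{1, 2, 3, 7}, {1, 2, 4, 7}, {1, 2, 5}]
    class_y = [{1, 3, 6, 8}, {1, 6, 9}]
    for name, keys in key_sets(class_x, class_y).items():
        print(name, sorted(keys))
```

In this toy input, key 2 is in every protein of Class X and absent from Class Y, so it ends up in both the "Specific" and the "specific" sets of Class X, whereas key 7 is only "Specific".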

4.6.2. Quantitative Analysis

Distinct_Common (D)—|common (D)|.
Note: |Q| denotes the cardinality of the set Q.
Distinct_Common (Class X)—|common (Class X)|.
Distinct (D)—average of the numbers of distinct TSR keys over proteins in D, ((∑i |Xi| + ∑i |Yi|)/|D|).
Distinct (Class X)—average of the numbers of distinct TSR keys over proteins in Class X, (∑i |Xi|/|Class X|).
Total_Common (D)—average, over all proteins in D, of sums of occurrence frequencies of TSR keys in common (D), ((∑i Fcommon(D)(Xi) + ∑i Fcommon(D)(Yi))/|D|).
Total_Common (Class X)—average, over all proteins in Class X, of the sums of occurrence frequencies of TSR keys in common (X), (∑i Fcommon(X)(Xi)/|Class X|).
Total (D)—average, over all proteins in D, of sums of occurrence frequencies of all TSR keys in T, ((∑i F(Xi) + ∑i F(Yi))/|D|).
Total (Class X)—average, over all proteins in Class X, of sums of occurrence frequencies of all TSR keys in T, (∑i F(Xi)/|Class X|).

4.6.3. Percentage of Common Substructures

Distinct_Common (D) % = Distinct_Common(D) × 100/Distinct (D)
Distinct_Common (Class X) % = Distinct_Common(Class X) × 100/Distinct (Class X)
Total_Common (D) % = Total_Common (D) × 100/Total (D)
Total_Common (Class X) % = Total_Common (Class X) × 100/Total (Class X)
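The four quantities and the two percentages can be computed together for any one group of proteins (the whole dataset D or a single class). The following is a minimal sketch, assuming each protein is given as a `Counter` mapping a TSR key to a positive occurrence frequency and that the group is non-empty; the function name `common_key_statistics` and the result field names are hypothetical.

```python
from collections import Counter

def common_key_statistics(proteins):
    """Distinct/Total statistics for one group of proteins.

    proteins: non-empty list of Counter objects (key -> occurrence
    frequency, assumed positive), one per protein.
    """
    n = len(proteins)
    common = set.intersection(*(set(p) for p in proteins))
    distinct_common = len(common)
    distinct = sum(len(p) for p in proteins) / n
    total_common = sum(sum(p[k] for k in common) for p in proteins) / n
    total = sum(sum(p.values()) for p in proteins) / n
    return {
        "Distinct_Common": distinct_common,
        "Distinct": distinct,
        "Total_Common": total_common,
        "Total": total,
        "Distinct_Common_%": distinct_common * 100 / distinct,
        "Total_Common_%": total_common * 100 / total,
    }

if __name__ == "__main__":
    group = [Counter({1: 2, 2: 1}), Counter({1: 4, 3: 3})]
    print(common_key_statistics(group))
```

Passing all proteins of D gives the dataset-level values, and passing only the proteins of Class X gives the class-level values.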

4.6.4. Identifying “Specific” Substructure Probabilities by Protein Subclasses

Note: Most Specific keys are of little interest because they occur in only a small percentage of proteins of the target class. Most of the interesting keys are in (Common(D) − common(D)). “Specific” keys are identified from these.
For these keys, we define filtered key sets (or specific substructures):
Near_Specific (Class X) keys—{K|key K appears in at least 70% of proteins in Class X and in no more than 30% of proteins in Class Y, and K ∈ {(Common(D) − common(D)) ∪ Specific (Class X)}}.
Near_Specific (Class Y) keys—{K|key K appears in at least 70% of proteins in Class Y and in no more than 30% of proteins in Class X, and K ∈ {(Common(D) − common(D)) ∪ Specific (Class Y)}}.
Now, the protein class-conditional probabilities with respect to the filtered “Specific” keys are given by:
Near_Specific (Class X)-in-Class X %—average, over all proteins in Class X, of the key set percentages of keys from the set Near_Specific (Class X) occurring in a protein.
Note: here, Class X is the target class and Class Y is the contrasting class.
Near_Specific (Class Y)-in-Class Y %—average, over all proteins in Class Y, of the key set percentages of keys from the set Near_Specific (Class Y) occurring in a protein.
Note: here, Class Y is the target class and Class X is the contrasting class.
The key set percentage of keys from a key set S appearing in a protein is (the number of keys from S appearing in the protein × 100/|S|).
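The 70%/30% filter and the key set percentage can be sketched as follows. The function names are hypothetical, each protein is represented as a set of its distinct keys, both classes are assumed to be non-empty, and returning 0.0 for an empty key set S is a choice made here to avoid division by zero.

```python
def near_specific(target, contrast, min_target=0.70, max_contrast=0.30):
    """Near_Specific keys of the target class against the contrasting class.

    target, contrast: non-empty lists of sets of distinct keys per protein.
    """
    X = set().union(*target)
    Y = set().union(*contrast)
    common_d = set.intersection(*target, *contrast)  # keys in every protein of D
    candidates = ((X & Y) - common_d) | (X - Y)

    def fraction(key, proteins):
        return sum(key in p for p in proteins) / len(proteins)

    return {k for k in candidates
            if fraction(k, target) >= min_target
            and fraction(k, contrast) <= max_contrast}

def key_set_percentage(protein_keys, key_set):
    """Percentage of keys from key_set that appear in one protein."""
    if not key_set:
        return 0.0  # choice made for this sketch
    return len(protein_keys & key_set) * 100 / len(key_set)

def class_average_percentage(proteins, key_set):
    """Average key set percentage over all proteins of a class."""
    return sum(key_set_percentage(p, key_set) for p in proteins) / len(proteins)

if __name__ == "__main__":
    class_a = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 4}, {1}]
    class_b = [{1, 2, 5}, {1, 4, 5}, {1, 4, 5}, {1, 5}, {1, 5}]
    keys_a = near_specific(class_a, class_b)
    print(sorted(keys_a), class_average_percentage(class_a, keys_a))
```

In this toy input, key 2 appears in four of the five proteins in the target class and in one of the five in the contrasting class, so it passes both thresholds, while key 4 fails because it appears in two of the five contrasting proteins.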

4.6.5. An Example for Understanding Terminology

A given dataset has two protein classes: X and Y. There are three proteins (X1, X2 and X3) in Class X, whereas there are two proteins (Y1 and Y2) in Class Y. The sets of distinct TSR keys for each protein in the dataset are shown in Figure 13a. The key occurrences (frequencies) for each protein can be found in Figure 13b. Figure 13c shows “Specific” TSR keys of Class X and Class Y and “Common” TSR keys of the dataset. “common” TSR keys for Class X, Class Y and the dataset are illustrated in Figure 13d, 13e and 13f, respectively. “common” keys of the dataset are the intersection between “common” keys of Class X and “common” keys of Class Y. Figure 13g reveals how “Distinct_Common”, “Total_Common”, “Distinct” and “Total” keys are calculated. “specific” TSR keys for Class X and Class Y are shown in Figure 13h. The “specific” TSR keys of Class X (Y) are a subset of the “Specific” TSR keys of Class X (Y).
To extract meaningful information from a dataset, it is important to include as many proteins as possible and increase diversity as much as possible. For such a dataset, “specific” keys often do not exist (unlike the keys in Figure 13h). In contrast, “Near_Specific” can be identified. Figure 14 is an example of a “Near_Specific” keyset and a “Near_Specific” key. A dataset has two classes: A and B. Each class has five proteins (Figure 14a). Five distinct keys (K1-K5) are found for the dataset and key occurrences can be found in Figure 14a. An example of a “Near_Specific” keyset and an example of a “Near_Specific” single key can be found in Figure 14b. The calculations of keyset and single key percentages are illustrated in Figure 14c. Figure 14d shows a bar plot based on the data from Figure 14c.

4.7. Statistical Analyses

A t-test is used to identify statistical differences between different structural comparison methods. A threshold of p < 0.05 is used to determine significance.

5. Conclusions

The key contributions of this study can be summarized as follows:
  • We introduce new, large, hierarchically organized kinase and phosphatase datasets and new DFG, WPD and CDK2 sub-datasets, each containing meaningful annotations and labels. The Python source code for this study is publicly available on GitHub.
  • Hierarchical clustering and MDS analyses reveal that hierarchically organized kinases and phosphatases have structural characteristics and diversities.
  • The interpretation of clustering results is achieved using “common” and “Specific” TSR keys. “Specific” keys represent the structural signatures of particular types of kinases or phosphatases.
  • TSR keys can meaningfully represent different conformations of the well-known DFG motif of kinases.
  • A significant number of phosphatases have a DFG motif. The TSR keys for DFG motifs from phosphatases agree with those from kinases.
  • TSR keys are successfully used to represent and quantify conformational changes of CDK2 upon binding of cyclin or phosphorylation.
In summary, this study demonstrates the capabilities of an advanced computational methodology with notable advantages not only to represent and quantify conformational changes of protein structures but also to directly link protein structures to their functions.

Supplementary Materials

The following supporting information can be downloaded at:, File S1: Clustermap: 5 clusters; File S2: Kinase_2527; File S3: Clustermap_Kinase; File S4: Phosphatase_505; File S5: Clustermap_Phosphatase; File S6: DFG_Kinase; File S7: DFG_Phosphatase; File S8: WPD_Phosphatase; File S9: CDK2_Thr_TPO160_Cyclin.

Author Contributions

W.X. proposed and designed this study. T.I.M., K.R., S.F. and K.H.O. carried out the study, collected data and prepared the figures. T.I.M. and K.R. wrote the Python code. Y.W. and V.R. contributed to the discussions. W.X. and V.R. wrote the manuscript, and all authors read and revised the manuscript. All authors have read and agreed to the published version of the manuscript.


Funding

This study is supported by NIH NIGMS grant (1R15GM144944-01).

Institutional Review Board Statement

This study did not involve human subjects and/or animals.

Data Availability Statement

These data were derived from the following resource available in the public domain: (accessed on 1 October 2024). The authors confirm that the data supporting the findings of this study are available within the article and its Supplementary Materials. The source code is available for academic users on GitHub: (accessed on 1 October 2024), (accessed on 1 October 2024) and (accessed on 1 October 2024).


Acknowledgments

Most of this research was conducted with high-performance computational resources provided by the Louisiana Optical Network Infrastructure ( (accessed on 1 October 2024)). Caleb Rahim helped in preparing the datasets. We thank the LONI support team, especially Feng Chen, Jianxiong Li and Oleg Starovoytov.

Conflicts of Interest

The authors declare no conflicts of interest.


References

  1. Ardito, F.; Giuliani, M.; Perrone, D.; Troiano, G.; Lo Muzio, L. The crucial role of protein phosphorylation in cell signaling and its use as targeted therapy (Review). Int. J. Mol. Med. 2017, 40, 271–280.
  2. Olsen, J.V.; Vermeulen, M.; Santamaria, A.; Kumar, C.; Miller, M.L.; Jensen, L.J.; Gnad, F.; Cox, J.; Jensen, T.S.; Nigg, E.A.; et al. Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Sci. Signal 2010, 3, ra3.
  3. Sebolt-Leopold, J.S.; Herrera, R. Targeting the mitogen-activated protein kinase cascade to treat cancer. Nat. Rev. Cancer 2004, 4, 937–947.
  4. Berman, H.; Henrick, K.; Nakamura, H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 2003, 10, 980.
  5. Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.; Meyer, E.F., Jr.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. The Protein Data Bank: A computer-based archival file for macromolecular structures. J. Mol. Biol. 1977, 112, 535–542.
  6. Zuckerkandl, E.; Pauling, L. Molecular disease, evolution, and genic diversity. In Horizons in Biochemistry; Kasha, M., Pullman, B., Eds.; Academic Press: New York, NY, USA, 1962; pp. 189–225.
  7. Kondra, S.; Sarkar, T.; Raghavan, V.; Xu, W. Development of a TSR-Based Method for Protein 3-D Structural Comparison With Its Applications to Protein Classification and Motif Discovery. Front. Chem. 2021, 8, 602291.
  8. Chen, F.; Milon, I.T.; Khajouie, P.; Myers, A.; Xu, W. A Parallel Implementation for Large-Scale TSR-based 3D Structural Comparisons of Protein and Amino Acid. Curr. Bioinform. 2024, 19, 1–16.
  9. Knighton, D.R.; Zheng, J.; Ten Eyck, L.F.; Ashford, V.A.; Xuong, N.-H.; Taylor, S.S.; Sowadski, J.M. Crystal Structure of the Catalytic Subunit of Cyclic Adenosine Monophosphate-Dependent Protein Kinase. Science 1991, 253, 407–414.
  10. Knighton, D.R.; Zheng, J.; Ten Eyck, L.F.; Xuong, N.-H.; Taylor, S.S.; Sowadski, J.M. Structure of a Peptide Inhibitor Bound to the Catalytic Subunit of Cyclic Adenosine Monophosphate-Dependent Protein Kinase. Science 1991, 253, 414–420.
  11. Soleymani, F.; Paquet, E.; Viktor, H.; Michalowski, W.; Spinello, D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput. Struct. Biotechnol. J. 2022, 20, 5316–5341.
  12. Huang, B.; Kong, L.; Wang, C.; Ju, F.; Zhang, Q.; Zhu, J.; Gong, T.; Zhang, H.; Yu, C.; Zheng, W.M.; et al. Protein Structure Prediction: Challenges, Advances, and the Shift of Research Paradigms. Genom. Proteom. Bioinform. 2023, 21, 913–925.
  13. Pazos, F.; Valencia, A. Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng. 2001, 14, 609–614.
  14. Sprinzak, E.; Margalit, H. Correlated sequence-signatures as markers of protein-protein interaction. J. Mol. Biol. 2001, 311, 681–692.
  15. Fields, S.; Song, O. A novel genetic system to detect protein-protein interactions. Nature 1989, 340, 245–246.
  16. Ho, Y.; Gruhler, A.; Heilbut, A.; Bader, G.D.; Moore, L.; Adams, S.L.; Millar, A.; Taylor, P.; Bennett, K.; Boutilier, K.; et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415, 180–183.
  17. Zhu, H.; Bilgin, M.; Bangham, R.; Hall, D.; Casamayor, A.; Bertone, P.; Lan, N.; Jansen, R.; Bidlingmaier, S.; Houfek, T.; et al. Global analysis of protein activities using proteome chips. Science 2001, 293, 2101–2105.
  18. Bock, J.R.; Gough, D.A. Whole-proteome interaction mining. Bioinformatics 2003, 19, 125–134.
  19. Merrill, R.A.; Strack, S. Protein Kinases and Phosphatases. In Molecular Pain; Zhuo, M., Ed.; Springer: New York, NY, USA, 2007; pp. 187–205.
  20. Huse, M.; Kuriyan, J. The conformational plasticity of protein kinases. Cell 2002, 109, 275–282.
  21. Seok, S.H. Structural Insights into Protein Regulation by Phosphorylation and Substrate Recognition of Protein Kinases/Phosphatases. Life 2021, 11, 957.
  22. Sarkar, T.; Raghavan, V.V.; Chen, F.; Riley, A.; Zhou, S.; Xu, W. Exploring the effectiveness of the TSR-based protein 3-D structural comparison method for protein clustering, and structural motif identification and discovery of protein kinases, hydrolase, and SARS-CoV-2’s protein via the application of amino acid grouping. Comput. Biol. Chem. 2021, 92, 107479.
  23. Duran-Frigola, M.; Mosca, R.; Aloy, P. Structural Systems Pharmacology: The Role of 3D Structures in Next-Generation Drug Development. Chem. Biol. 2013, 20, 674–684.
  24. Schneider, G.; Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discov. 2005, 4, 649–663.
  25. Fitch, W.M.; Margoliash, E. Construction of phylogenetic trees. Science 1967, 155, 279–284.
  26. Scheeff, E.D.; Bourne, P.E. Structural evolution of the protein kinase-like superfamily. PLoS Comput. Biol. 2005, 1, e49.
  27. Łukasik, P.; Załuski, M.; Gutowska, I. Cyclin-Dependent Kinases (CDK) and Their Role in Diseases Development-Review. Int. J. Mol. Sci. 2021, 22, 2935.
  28. Ekholm, S.V.; Reed, S.I. Regulation of G(1) cyclin-dependent kinases in the mammalian cell cycle. Curr. Opin. Cell Biol. 2000, 12, 676–684.
  29. Chang, F.; Steelman, L.S.; Shelton, J.G.; Lee, J.T.; Navolanic, P.M.; Blalock, W.L.; Franklin, R.; McCubrey, J.A. Regulation of cell cycle progression and apoptosis by the Ras/Raf/MEK/ERK pathway (Review). Int. J. Oncol. 2003, 22, 469–480.
  30. Arter, C.; Trask, L.; Ward, S.; Yeoh, S.; Bayliss, R. Structural features of the protein kinase domain and targeted binding by small-molecule inhibitors. J. Biol. Chem. 2022, 298, 102247.
  31. Schubert, A.F.; Gladkova, C.; Pardon, E.; Wagstaff, J.L.; Freund, S.M.V.; Steyaert, J.; Maslen, S.L.; Komander, D. Structure of PINK1 in complex with its substrate ubiquitin. Nature 2017, 552, 51–56.
  32. Jura, N.; Zhang, X.; Endres, N.F.; Seeliger, M.A.; Schindler, T.; Kuriyan, J. Catalytic control in the EGF receptor and its connection to general kinase regulatory mechanisms. Mol. Cell 2011, 42, 9–22.
  33. Taylor, S.S.; Kornev, A.P. Protein kinases: Evolution of dynamic regulatory proteins. Trends Biochem. Sci. 2011, 36, 65–77.
  34. Johnson, L.N.; Lewis, R.J. Structural Basis for Control by Phosphorylation. Chem. Rev. 2001, 101, 2209–2242.
  35. Baker, Z.D.; Rasmussen, D.M.; Levinson, N.M. Exploring the conformational landscapes of protein kinases: Perspectives from FRET and DEER. Biochem. Soc. Trans. 2024, 52, 1071–1083.
  36. Nolen, B.; Taylor, S.; Ghosh, G. Regulation of protein kinases; controlling activity through activation segment conformation. Mol. Cell 2004, 15, 661–675.
  37. Dodson, C.A.; Kosmopoulou, M.; Richards, M.W.; Atrash, B.; Bavetsias, V.; Blagg, J.; Bayliss, R. Crystal structure of an Aurora-A mutant that mimics Aurora-B bound to MLN8054: Insights into selectivity and drug design. Biochem. J. 2010, 427, 19–28.
  38. Hubbard, S.R.; Wei, L.; Hendrickson, W.A. Crystal structure of the tyrosine kinase domain of the human insulin receptor. Nature 1994, 372, 746–754.
  39. Kondra, S.; Chen, F.; Chen, Y.; Chen, Y.; Collette, C.J.; Xu, W. A study of a hierarchical structure of proteins and ligand binding sites of receptors using the triangular spatial relationship-based structure comparison method and development of a size-filtering feature designed for comparing different sizes of protein structures. Proteins 2022, 90, 239–257.
  40. Pannifer, A.D.B.; Flint, A.J.; Tonks, N.K.; Barford, D. Visualization of the Cysteinyl-phosphate Intermediate of a Protein-tyrosine Phosphatase by X-ray Crystallography. J. Biol. Chem. 1998, 273, 10454–10462.
  41. Shen, R.; Crean, R.M.; Olsen, K.J.; Corbella, M.; Calixto, A.R.; Richan, T.; Brandão, T.A.S.; Berry, R.D.; Tolman, A.; Loria, J.P.; et al. Insights into the importance of WPD-loop sequence for activity and structure in protein tyrosine phosphatases. Chem. Sci. 2022, 13, 13524–13540.
  42. De Bondt, H.L.; Rosenblatt, J.; Jancarik, J.; Jones, H.D.; Morgan, D.O.; Kim, S.H. Crystal structure of cyclin-dependent kinase 2. Nature 1993, 363, 595–602.
  43. Jeffrey, P.D.; Russo, A.A.; Polyak, K.; Gibbs, E.; Hurwitz, J.; Massagué, J.; Pavletich, N.P. Mechanism of CDK activation revealed by the structure of a cyclinA-CDK2 complex. Nature 1995, 376, 313–320.
  44. Miller, M.L.; Reznik, E.; Gauthier, N.P.; Aksoy, B.A.; Korkut, A.; Gao, J.; Ciriello, G.; Schultz, N.; Sander, C. Pan-Cancer Analysis of Mutation Hotspots in Protein Domains. Cell Syst. 2015, 1, 197–209.
  45. Stivala, A.D.; Stuckey, P.J.; Wirth, A.I. Fast and accurate protein substructure searching with simulated annealing and GPUs. BMC Bioinform. 2010, 11, 446.
  46. Mecha, M.F.; Hutchinson, R.B.; Lee, J.H.; Cavagnero, S. Protein folding in vitro and in the cell: From a solitary journey to a team effort. Biophys. Chem. 2022, 287, 106821.
  47. Speed, M.A.; Wang, D.I.; King, J. Specific aggregation of partially folded polypeptide chains: The molecular basis of inclusion body composition. Nat. Biotechnol. 1996, 14, 1283–1287.
  48. Tycko, R. Amyloid polymorphism: Structural basis and neurobiological relevance. Neuron 2015, 86, 632–645.
  49. Chothia, C. Hydrophobic bonding and accessible surface area in proteins. Nature 1974, 248, 338–339.
  50. Chothia, C. The nature of the accessible and buried surfaces in proteins. J. Mol. Biol. 1976, 105, 1–12.
  51. Kuhlman, B.; Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 2019, 20, 681–697.
  52. Wu, S.; Zhang, Y. Recognizing Protein Substructure Similarity Using Segmental Threading. Structure 2010, 18, 858–867.
  53. Bonetta, L. Protein-protein interactions: Interactome under construction. Nature 2010, 468, 851–854.
  54. Vidal, M.; Cusick, M.E.; Barabási, A.L. Interactome networks and human disease. Cell 2011, 144, 986–998.
  55. Shirvanyants, D.; Alexandrova, A.N.; Dokholyan, N.V. Rigid substructure search. Bioinformatics 2011, 27, 1327–1329.
  56. Zhang, Q.C.; Petrey, D.; Deng, L.; Qiang, L.; Shi, Y.; Thu, C.A.; Bisikirska, B.; Lefebvre, C.; Accili, D.; Hunter, T.; et al. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 2012, 490, 556–560.
  57. Guru, D.S.; Nagabhushan, P. Triangular spatial relationship: A new approach for spatial knowledge representation. Pattern Recognit. Lett. 2001, 22, 999–1006.
  58. Jaccard, P. Etude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull. Soc. Vaudoise Sci. Nat. 1901, 37, 547–579.
  59. Ackerman, M.; Ben-David, S. A characterization of linkage-based hierarchical clustering. J. Mach. Learn. Res. 2016, 17, 8182–8198.
  60. Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual molecular dynamics. J. Mol. Graph. 1996, 14, 33–38.
  61. Krupa, A.; Preethi, G.; Srinivasan, N. Structural Modes of Stabilization of Permissive Phosphorylation Sites in Protein Kinases: Distinct Strategies in Ser/Thr and Tyr Kinases. J. Mol. Biol. 2004, 339, 1025–1039.
  62. Bossemeyer, D. Protein kinases—Structure and function. FEBS Lett. 1995, 369, 57–61.
  63. Fuhs, S.R.; Hunter, T. pHisphorylation: The emergence of histidine phosphorylation as a reversible regulatory modification. Curr. Opin. Cell Biol. 2017, 45, 8–16.
  64. Swanson, R.V.; Alex, L.A.; Simon, M.I. Histidine and aspartate phosphorylation: Two-component systems and the limits of homology. Trends Biochem. Sci. 1994, 19, 485–490.
  65. Dhanasekaran, N.; Premkumar Reddy, E. Signaling by dual specificity kinases. Oncogene 1998, 17, 1447–1455.
  66. Russo, A.A.; Jeffrey, P.D.; Pavletich, N.P. Structural basis of cyclin-dependent kinase activation by phosphorylation. Nat. Struct. Biol. 1996, 3, 696–700.
  67. Wada, T.; Penninger, J.M. Mitogen-activated protein kinases in apoptosis regulation. Oncogene 2004, 23, 2838–2849.
  68. Taylor, S.S.; Buechler, J.A.; Yonemoto, W. cAMP-dependent protein kinase: Framework for a diverse family of regulatory enzymes. Annu. Rev. Biochem. 1990, 59, 971–1005.
  69. Knippschild, U.; Gocht, A.; Wolff, S.; Huber, N.; Löhler, J.; Stöter, M. The casein kinase 1 family: Participation in multiple cellular processes in eukaryotes. Cell. Signal. 2005, 17, 675–689.
  70. Robinson, D.R.; Wu, Y.-M.; Lin, S.-F. The protein tyrosine kinase family of the human genome. Oncogene 2000, 19, 5548–5557.
  71. Hubbard, S.R.; Till, J.H. Protein tyrosine kinase structure and function. Annu. Rev. Biochem. 2000, 69, 373–398.
  72. Siveen, K.S.; Prabhu, K.S.; Achkar, I.W.; Kuttikrishnan, S.; Shyam, S.; Khan, A.Q.; Merhi, M.; Dermime, S.; Uddin, S. Role of Non Receptor Tyrosine Kinases in Hematological Malignances and its Targeting by Natural Products. Mol. Cancer 2018, 17, 31.
  73. Pendergast, A.M. Nuclear tyrosine kinases: From Abl to WEE1. Curr. Opin. Cell Biol. 1996, 8, 174–181.
  74. Jin, J.; Pawson, T. Modular evolution of phosphorylation-based signalling systems. Philos. Trans. R. Soc. London. Ser. B Biol. Sci. 2012, 367, 2540–2555.
  75. Sacco, F.; Perfetto, L.; Castagnoli, L.; Cesareni, G. The human phosphatase interactome: An intricate family portrait. FEBS Lett. 2012, 586, 2732–2739.
  76. Barford, D.; Das, A.K.; Egloff, M.-P. The Structure and Mechanism of Protein Phosphatases: Insights into Catalysis and Regulation. Annu. Rev. Biophys. 1998, 27, 133–164.
  77. Peti, W.; Nairn, A.C.; Page, R. Structural basis for protein phosphatase 1 regulation and specificity. FEBS J. 2013, 280, 596–611.
Figure 1. The objectives of this study. (a) An example of the hierarchical organization of a protein structure dataset. A and B are two protein families. A1, A2 and A3 are the subfamilies of family A, and B1 and B2 are the subfamilies of family B. (b) Objectives Ia and Ib are to identify the common and specific substructures of protein families. Common and specific substructures are illustrated. (c) Objective IIa is to quantify structural changes of a binding site of a protein induced by the binding of its interacting protein. The conformational changes of a binding site are illustrated. (d) Objective IIb is to quantify structural changes of a phosphorylation site of a protein induced by phosphorylation or dephosphorylation. The conformational changes of a phosphorylation site are illustrated.
Figure 2. Hierarchical clustering study of kinase and phosphatase structures. (a,b) The TSR keys are calculated for each structure and the generalized Jaccard algorithm is used to calculate pairwise structural similarities. Pairwise structural distances are calculated based on the similarity values. The pairwise structural distances are the input for hierarchical clustering (a) and MDS (b) analyses. This approach is applicable to other clustering and MDS analyses in this study. The numbers of kinase structures and phosphatase structures are indicated. (a) The amino acid grouping algorithm is not used. The color scales on the upper and left sides indicate structural distances. (b) The amino acid grouping algorithm is used. (c) The numbers of distinct common, total common, distinct and total TSR keys are calculated for kinases and phosphatases together, for kinases alone and for phosphatases alone. The number of protein structures is labeled. The average numbers are indicated. (d) The Venn diagram shows the numbers of distinct TSR keys shared by the kinase and phosphatase classes, TSR keys specific to the kinase class only and TSR keys specific to the phosphatase class only. (e) “Specific” TSR keys for the kinase family and the phosphatase family are identified. The average values, SDs and coefficients of variation are indicated. The “Specific” TSR keys are shown. (f) The triangles associated with the “Specific” TSR keys of kinases are illustrated. PDB ID, inhibitor, amino acids and TSR keys are shown. (g) The amino acids associated with the six “Specific” TSR keys for phosphatases are shown. PDB ID and inhibitor are shown.
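The similarity-to-distance step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each structure is represented as a multiset of integer TSR keys, that the generalized Jaccard coefficient is the sum of per-key minimum counts divided by the sum of per-key maximum counts, and that distance is taken as 1 minus similarity (the caption only says distances are "based on" similarity). The structure names and the key 7000001 are hypothetical.

```python
from collections import Counter
from itertools import combinations


def generalized_jaccard(a: Counter, b: Counter) -> float:
    """Generalized Jaccard similarity between two key-count vectors."""
    keys = a.keys() | b.keys()
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    # Assumption: two empty key vectors are treated as identical.
    return shared / total if total else 1.0


def pairwise_distances(structures: dict) -> dict:
    """Map each unordered pair of structure names to 1 - similarity (assumed)."""
    return {
        (p, q): 1.0 - generalized_jaccard(structures[p], structures[q])
        for p, q in combinations(sorted(structures), 2)
    }


# Hypothetical toy input: TSR key -> number of occurrences in the structure.
s1 = Counter({5484102: 2, 5484131: 1})
s2 = Counter({5484102: 1, 7000001: 1})
distances = pairwise_distances({"structure_1": s1, "structure_2": s2})
print(distances)
```

A distance table of this kind could then be handed to a hierarchical clustering routine (for example `scipy.cluster.hierarchy.linkage` on the condensed form) or to an MDS implementation that accepts precomputed dissimilarities. Which tools the study actually used is not stated in the captions.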
Figure 3. Hierarchical study of kinase 3D structures. (a) The kinase superfamily is organized in a hierarchical fashion. Level 1 includes cAMPDK, MAK, CDK, CKII, EGFR and others. Level 2 includes CDK1–2 and 4–9. (b) The hierarchical clustering result of kinase structures is shown. The amino acid grouping algorithm is not used. cAMPDK, MAK, CKII, EGFR and members of CDKs are labeled and the numbers of the structures are indicated. (c) The percentages of distinct and total common TSR keys for cAMPDK, MAK, CDK, CKII and EGFR are calculated and presented. The numbers of structures in each family and average values are indicated. The error bars represent SDs. (d) The hierarchical clustering result of kinase structures is shown. The amino acid grouping algorithm is used. cAMPDK, CKII, MAK, EGFR and members of CDKs are labeled and the numbers of the structures are indicated. (e) The MDS analysis of the structures of CDK family members. The numbers of structures used in the analysis are labeled.
Figure 4. Specific TSR keys for cAMPDK, MAK, CDK, CKII and EGFR are identified, quantified and presented. (a) The numbers of “Specific” TSR keys for each family, numbers of structures and average values are indicated. The error bars represent SDs. (b–f) Examples of “Specific” TSR keys are illustrated for cAMPDK (b), MAK (c), CDK (d), CKII (e) and EGFR (f). One of the longest connected specific key components is shown for each family. A connected component is defined as two or more triangles that share a joint vertex or a joint edge. This definition is applicable to other figures. PDB IDs and amino acids are labeled.
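The connected-component definition in the Figure 4 caption (two or more triangles sharing a vertex or an edge) can be illustrated with a small union-find sketch. It is a sketch under stated assumptions, not the authors' code: each triangle is a tuple of three residue labels, the labels below are hypothetical, and "longest" is read here as the component with the most triangles. Because a shared edge always implies shared vertices, testing for a shared vertex covers both cases.

```python
from collections import defaultdict


def connected_components(triangles):
    """Group triangles that share at least one vertex; keep groups of size >= 2."""
    parent = list(range(len(triangles)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    # Join every triangle to the first triangle that used the same vertex.
    first_seen = {}
    for idx, tri in enumerate(triangles):
        for vertex in tri:
            if vertex in first_seen:
                union(first_seen[vertex], idx)
            else:
                first_seen[vertex] = idx

    groups = defaultdict(list)
    for idx, tri in enumerate(triangles):
        groups[find(idx)].append(tri)
    # The caption requires two or more triangles per component.
    return [group for group in groups.values() if len(group) >= 2]


# Hypothetical residue labels, for illustration only.
triangles = [
    ("ASP145", "PHE146", "GLY147"),
    ("GLY147", "LYS33", "GLU51"),
    ("GLU51", "LEU55", "VAL64"),
    ("TYR15", "THR14", "GLY13"),  # shares nothing, so it is not in a component
]
components = connected_components(triangles)
largest = max(components, key=len)
print(len(components), len(largest))
```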
Figure 5. Specific TSR keys for CDK2 and CDK9 are identified, quantified and presented. (a) The numbers of “Specific” TSR keys for CDK2 and CDK9, numbers of structures and average values are indicated. The error bars represent SDs. (b,c) Examples of “Specific” TSR keys are illustrated for CDK2 (b) and CDK9 (c). One of the longest connected “Specific” key components is shown for CDK2 and CDK9.
Figure 6. Hierarchical relationships of phosphatase families. (a,b) MDS analyses of different phosphatase families. The numbers of the structures in each family are labeled. The amino acid grouping algorithm is not used for (a) but is used for (b). (c) The difference in structural similarities with and without the amino acid grouping algorithm is illustrated. The average values of structural similarities are labeled. (d) MDS analysis of PPP subfamily structures. The numbers of structures in each subfamily are labeled. (e) The hierarchical clustering of PTP subfamily structures. The numbers of the structures in each subfamily are labeled. (f) The percentages of distinct and total “common” TSR keys for PPP and PTP subfamilies are calculated and presented. The number of structures in each family and average values are indicated. The error bars represent SDs.
Figure 7. Hierarchical relationships of phosphatase families and subfamilies are demonstrated using the specific TSR keys. (a) A graph representation showing the hierarchical relationships of phosphatase structures. (b) The numbers of “Specific” TSR keys for PPP and PTP families, numbers of structures and average values are indicated. The error bars represent SDs. (c,d) Examples of “Specific” TSR keys are illustrated for PPP (c) and PTP (d) families. One of the longest connected “Specific” key components is shown for PPP (c) and PTP (d) families. (e) The numbers of “Specific” TSR keys for PTPN1 and PTPN11 subfamilies, numbers of structures and average values are indicated. The error bars represent SDs. (f,g) Examples of “Specific” TSR keys are illustrated for PTPN1 (f) and PTPN11 (g) subfamilies. One of the longest connected “Specific” key components is shown for PTPN1 (f) and PTPN11 (g) subfamilies.
Figure 8. The DFG motif’s structural characteristics. (a,b) The amino acid sequence alignment of selected kinases using Vector NTI. The DFG (a) and DWG (a,b) motifs are highlighted. (c,d) MaxDist (c) and Theta (d) values of DFG motifs with different TSR numbers from kinases and phosphatases are calculated and presented. *** means p value < 0.001. ** means p value < 0.01. * means p value < 0.05. These notations are applicable to other figures. The numbers of kinases and phosphatases and structures in each TSR key category are shown. This is applicable to other figures. (e,f) Examples of the DFG motifs of kinases with the TSR values of 5484102 (e) and 5484131 (f) are illustrated. PDB IDs, amino acids, inhibitors and keys are labeled.
Figure 9. DFG and WPD motifs of phosphatases. (a,b) Examples of the DFG motifs of phosphatases with the TSR values of 5484102 (a) and 5484131 (b) are illustrated. PDB IDs, amino acids, inhibitors and keys are labeled. (c) The sequence alignment shows the DFG and WPD motifs and the WxDP sequence. (d) A DFG motif and a WPD motif share a common Asp residue.
Figure 10. The WPD motif of phosphatases has its own structural characteristics. (a,b) MaxDist (a) and Theta (b) values of WPD motifs from phosphatases and kinases are calculated and presented. Numbers of structures in each category, averages and SDs are shown. (c–e) Representative WPD and DFG motifs are illustrated for phosphatases (c,d) and kinase (e). PDB IDs, amino acids and inhibitors are labeled. (f,g) MaxDist (f) and Theta (g) values of WxDP sequences with different TSR numbers from phosphatases and kinases are calculated and presented. The numbers of kinases and phosphatases and structures in each TSR key category are shown. *** means p value < 0.001.
Figure 11. Conformational changes of the phosphorylation site of CDK2 induced by binding of cyclin or phosphorylation are quantified using the TSR-based algorithm. (a) Hierarchical cluster analysis of CDK2 structures categorized by absence or presence of an interacting protein, and phosphorylated or not phosphorylated. (b) The structures categorized in (a) are also studied by the MDS analysis. (a,b) The numbers of structures in each category are labeled. (c,d) MaxDist (c) and Theta (d) values of six amino acids in the phosphorylation site of CDK2 with or without cyclin and phosphorylated and not phosphorylated are calculated and presented. Mean values are labeled. The numbers of structures for Thr160_NoCylin, Thr160_Cyclin and TPO_Cyclin are 130, 24 and 57, respectively. (e) An example of the phosphorylation site of CDK2 is illustrated. PDB ID, amino acids, ATP and molecules are labeled. (f) A skeletal model illustrates the conformational changes of the phosphorylation site of CDK2 due to the binding of cyclin or phosphorylation/dephosphorylation. *** means p value < 0.001. * means p value < 0.05.
Figure 12. CDK2 undergoes structural changes upon the binding of an interacting protein. (a) Hierarchical clustering of CDK2 structures with or without an interacting protein. Numbers of structures in each category are labeled. (b) The pairwise structural similarities of CDK2 within each category or between any two categories are shown. (c) The pairwise structural similarities of the CDK2 phosphorylation site within each category or between any two categories are shown. (b,c) Means are labeled. Error bars indicate SDs.
Figure 13. An example illustrates the calculations of “Specific”, “specific”, “Common” and “common” TSR keys for Class X, Class Y and the dataset. (a–h) Red represents Class X, blue represents Class Y and green represents key frequencies.
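The idea of common and specific keys in the Figure 13 caption can be sketched with plain set operations. The paper distinguishes capitalized and lowercase variants (“Specific” versus “specific”, “Common” versus “common”); those exact definitions are not given in the captions and are not reproduced here. The sketch below uses one simple reading, stated as an assumption: a key is common to a class if it occurs in every structure of that class, and specific to a class if it is common to that class and occurs in no structure of any other class. All key values are hypothetical.

```python
def common_keys(structures):
    """Keys present in every structure of a class (assumed reading of 'common')."""
    key_sets = [set(keys) for keys in structures]
    return set.intersection(*key_sets) if key_sets else set()


def specific_keys(target, others):
    """Keys common to the target class and absent from all other structures.

    This is an assumed reading of 'specific', not the paper's formal definition.
    """
    seen_elsewhere = set().union(*(set(keys) for keys in others))
    return common_keys(target) - seen_elsewhere


# Hypothetical toy classes: each structure is the set of TSR keys it contains.
class_x = [{101, 102, 103}, {101, 102, 104}]
class_y = [{102, 105}, {102, 106}]

print(common_keys(class_x))             # keys in every Class X structure
print(common_keys(class_x + class_y))   # keys in every structure of the dataset
print(specific_keys(class_x, class_y))  # Class X keys never seen in Class Y
```

Working with sets ignores how often a key occurs within a structure. A frequency-aware variant would need per-key counts, which is presumably what the key frequencies shown in green in the figure refer to.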
Figure 14. An example illustrates the calculations of a “Near_Specific” TSR keyset or single key for Class A, Class B and the dataset. (a–d) Red represents Class A and blue represents Class B.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Milon, T.I.; Rauniyar, K.; Furman, S.; Orthi, K.H.; Wang, Y.; Raghavan, V.; Xu, W. Representing and Quantifying Conformational Changes of Kinases and Phosphatases Using the TSR-Based Algorithm. Kinases Phosphatases 2024, 2, 315-339.
