GraphPhos: Predict Protein-Phosphorylation Sites Based on Graph Neural Networks
Abstract
1. Introduction
2. Results and Discussion
2.1. Comparison of General and Specific Category Site-Prediction Performance
2.2. Analysis of Protein Structure Results from Different Sources
- (i) training with experimental structures and predicting with experimental structures;
- (ii) training with AlphaFold-predicted structures and predicting with experimental structures;
- (iii) training with AlphaFold-predicted structures and predicting with AlphaFold-predicted structures.
2.3. Ablation Study
2.3.1. The Ablation of Modules
2.3.2. The Ablation of Features
2.4. Comparison Results with Other Methods
2.5. Generalizability Analysis
3. Materials and Methods
3.1. Datasets
3.2. Features
3.2.1. Sequence Features
- One-hot encoding, with a dimension of , where L represents the sequence length.
- Physicochemical properties of proteins, where each amino acid type is represented by a 5-dimensional principal component vector, resulting in a dimension of L × 5 [41].
- Secondary structure (SS) of proteins, covering 8 secondary-structure classes: 3₁₀-helix (G), α-helix (H), π-helix (I), β-bridge (B), β-strand (E), bend (S), β-turn (T), and coil (C). The SS features also include the solvent-accessible surface area (ASA) and the upper and lower half-sphere exposures (HSEU and HSED), giving a total of 11 dimensions per residue. These features are computed with the SPOT-1D-Single method [44].
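The sequence features above can be sketched in Python as follows. This is a minimal illustration, not the GraphPhos code. It makes three assumptions. First, the one-hot block uses a 20-letter amino-acid alphabet, because the exact one-hot dimension is not given here. Second, the SPOT-1D-Single outputs (SS class string, ASA, HSEU, HSED) are treated as already-computed inputs. Third, the 5-dimensional physicochemical descriptors of [41] are left out, because their values are not reproduced in this section.

```python
from typing import List

# Assumption: the 20 standard amino acids, one column each in the one-hot block.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# The 8 secondary-structure classes listed in the text (single-letter codes).
SS_CLASSES = "GHIBESTC"


def one_hot_sequence(seq: str) -> List[List[float]]:
    """Encode a sequence as an L x 20 one-hot matrix (unknown residues -> all zeros)."""
    rows = []
    for residue in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


def ss_features(
    ss: str, asa: List[float], hseu: List[float], hsed: List[float]
) -> List[List[float]]:
    """Build the L x 11 SS block: 8 one-hot SS classes followed by ASA, HSEU and HSED."""
    if not (len(ss) == len(asa) == len(hseu) == len(hsed)):
        raise ValueError("all per-residue inputs must have the same length L")
    rows = []
    for i, state in enumerate(ss):
        row = [1.0 if state == c else 0.0 for c in SS_CLASSES]
        row += [asa[i], hseu[i], hsed[i]]
        rows.append(row)
    return rows


def concat_features(*blocks: List[List[float]]) -> List[List[float]]:
    """Concatenate per-residue feature blocks column-wise (all blocks need L rows)."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("feature blocks disagree on sequence length")
    return [[x for b in blocks for x in b[i]] for i in range(length)]
```

For a sequence of length L, concatenating the two blocks gives an L × 31 matrix: 20 one-hot columns plus 11 SS columns. The physicochemical and PSSM blocks would be appended in the same way.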
3.2.2. Structural Features
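Structure-based GNN predictors commonly turn a 3D structure, whether experimental (PDB) or AlphaFold-predicted, into a residue graph. Residues are linked when their Cα atoms lie within a distance cutoff. The sketch below shows that construction under stated assumptions. The Cα-only representation and the 10 Å default cutoff are illustrative choices, not parameters reported for GraphPhos.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_graph(ca_coords: Sequence[Coord], cutoff: float = 10.0) -> Dict[int, List[int]]:
    """Adjacency list linking residues whose C-alpha atoms are within `cutoff` angstroms.

    The 10 A default is an illustrative assumption, not a value taken from GraphPhos.
    """
    n = len(ca_coords)
    adjacency: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance between the two C-alpha positions.
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency
```

The all-pairs loop is O(L²), which is fine for single proteins. A spatial index would be the usual choice for very long chains.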
3.3. Methods
3.3.1. GraphSAGE
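GraphSAGE [47] updates each node by combining its own embedding with an aggregate of its neighbors' embeddings. The following is a minimal sketch of one layer using the mean aggregator from Hamilton et al. It concatenates the node's embedding with the neighbor mean, applies a linear map and ReLU, and then L2-normalizes the result. The weight matrix, activation and normalization choice are assumptions for illustration, not the GraphPhos configuration.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix w (rows x len(x)) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sage_mean_layer(
    h: Matrix,
    adjacency: Dict[int, List[int]],
    w: Matrix,
    normalize: bool = True,
) -> Matrix:
    """One GraphSAGE layer with a mean aggregator.

    h: node features, one row of length d per node (here: per residue).
    adjacency: node index -> list of neighbor indices.
    w: weights of shape d_out x (2 * d), applied to [h_v || mean(h_u for u in N(v))].
    """
    d = len(h[0])
    out: Matrix = []
    for v, h_v in enumerate(h):
        neighbors = adjacency.get(v, [])
        if neighbors:
            h_nbr = [sum(h[u][k] for u in neighbors) / len(neighbors) for k in range(d)]
        else:
            h_nbr = [0.0] * d  # isolated node: no neighborhood signal
        z = matvec(w, h_v + h_nbr)  # linear map of the concatenation
        z = [max(0.0, x) for x in z]  # ReLU
        if normalize:
            norm = sum(x * x for x in z) ** 0.5
            if norm > 0:
                z = [x / norm for x in z]  # L2 normalization of the output embedding
        out.append(z)
    return out
```

Stacking k such layers lets each residue draw on structural neighbors up to k hops away. The adjacency can come from any residue graph built from the structure.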
3.3.2. ProtBERT Combined with multiCNN
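One common reading of a "multiCNN" is a multi-kernel 1D CNN. It runs several convolution branches with different kernel widths over the per-residue ProtBERT embeddings and concatenates their outputs, so each residue sees local windows of several sizes. The sketch below assumes zero "same" padding, ReLU, and caller-chosen kernel widths. The branch count and widths used by GraphPhos are not given in this section. For the ProtTrans ProtBert model the input would have 1024 channels per residue; the test uses one channel for readability.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = residues, columns = channels


def conv1d_same(x: Matrix, filters: Sequence[Matrix], bias: Sequence[float]) -> Matrix:
    """1D convolution along the sequence with zero 'same' padding, followed by ReLU.

    x: L x C_in per-residue features.
    filters: C_out filters, each of shape k x C_in with k odd.
    bias: one value per filter.
    Like deep-learning conv layers, this computes a cross-correlation (no kernel flip).
    """
    length, c_in = len(x), len(x[0])
    out: Matrix = []
    for i in range(length):
        row = []
        for f, filt in enumerate(filters):
            half = len(filt) // 2
            acc = bias[f]
            for t, weights in enumerate(filt):
                j = i + t - half  # sequence position under kernel tap t
                if 0 <= j < length:  # positions outside the sequence act as zero padding
                    acc += sum(weights[c] * x[j][c] for c in range(c_in))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def multi_cnn(
    x: Matrix, branches: Sequence[Tuple[Sequence[Matrix], Sequence[float]]]
) -> Matrix:
    """Run each (filters, bias) branch over x and concatenate branch outputs per residue."""
    outputs = [conv1d_same(x, filters, bias) for filters, bias in branches]
    return [[v for o in outputs for v in o[i]] for i in range(len(x))]
```

Because every branch keeps the sequence length L, the concatenated output stays aligned residue by residue. It can therefore be fused with graph-derived features for the same residue.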
3.3.3. MultiFNN
3.4. Implementation Details
3.5. Evaluation Metrics
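The result tables report accuracy, precision and F1-score. The sketch below shows how these are conventionally computed for binary site prediction (positive = phosphorylated) from confusion-matrix counts. Recall is included because F1 is the harmonic mean of precision and recall. The 0/1 label encoding and the choice of 0.0 for undefined ratios are assumptions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = phosphorylated site)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])` gives TP = 2, FN = 1, FP = 1 and TN = 2, so all four metrics equal 2/3.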
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Walsh, C.T.; Garneau-Tsodikova, S.; Gatto, G.J., Jr. Protein posttranslational modifications: The chemistry of proteome diversifications. Angew. Chem. Int. Ed. 2005, 44, 7342–7372.
2. Audagnotto, M.; Dal Peraro, M. Protein post-translational modifications: In silico prediction tools and molecular modeling. Comput. Struct. Biotechnol. J. 2017, 15, 307–319.
3. Deribe, Y.L.; Pawson, T.; Dikic, I. Post-translational modifications in signal integration. Nat. Struct. Mol. Biol. 2010, 17, 666–672.
4. Cohen, P. The role of protein phosphorylation in neural and hormonal control of cellular activity. Nature 1982, 296, 613–620.
5. Johnson, L.N. The regulation of protein phosphorylation. Biochem. Soc. Trans. 2009, 37, 627–641.
6. Cohen, P. The origins of protein phosphorylation. Nat. Cell Biol. 2002, 4, E127–E130.
7. Fleuren, E.D.; Zhang, L.; Wu, J.; Daly, R.J. The kinome ‘at large’ in cancer. Nat. Rev. Cancer 2016, 16, 83–98.
8. Creixell, P.; Schoof, E.M.; Simpson, C.D.; Longden, J.; Miller, C.J.; Lou, H.J.; Perryman, L.; Cox, T.R.; Zivanovic, N.; Palmeri, A.; et al. Kinome-wide decoding of network-attacking mutations rewiring cancer signaling. Cell 2015, 163, 202–217.
9. Pearson, R.B.; Kemp, B.E. [3] Protein kinase phosphorylation site sequences and consensus specificity motifs: Tabulations. Methods Enzymol. 1991, 200, 62–81.
10. Diella, F.; Cameron, S.; Gemünd, C.; Linding, R.; Via, A.; Kuster, B.; Sicheritz-Pontén, T.; Blom, N.; Gibson, T.J. Phospho.ELM: A database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinform. 2004, 5, 79.
11. Leuchowius, K.J.; Weibrecht, I.; Söderberg, O. In situ proximity ligation assay for microscopy and flow cytometry. Curr. Protoc. Cytom. 2011, 56, 9–36.
12. Fuchs, S.M.; Strahl, B.D. Antibody recognition of histone post-translational modifications: Emerging issues and future prospects. Epigenomics 2011, 3, 247–249.
13. Maiti, S.; Hassan, A.; Mitra, P. Boosting phosphorylation site prediction with sequence feature-based machine learning. Proteins Struct. Funct. Bioinform. 2020, 88, 284–291.
14. He, Z.; Yang, C.; Guo, G.; Li, N.; Yu, W. Motif-All: Discovering all phosphorylation motifs. BMC Bioinform. 2011, 12, S22.
15. Liu, X.; Wu, J.; Gong, H.; Deng, S.; He, Z. Mining conditional phosphorylation motifs. IEEE ACM Trans. Comput. Biol. Bioinform. 2014, 11, 915–927.
16. Puntervoll, P.; Linding, R.; Gemund, C.; Chabanis-Davidson, S.; Mattingsdal, M.; Cameron, S.; Martin, D.M.; Ausiello, G.; Brannetti, B.; Costantini, A.; et al. ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 2003, 31, 3625–3630.
17. Sigrist, C.J.; De Castro, E.; Cerutti, L.; Cuche, B.A.; Hulo, N.; Bridge, A.; Bougueleret, L.; Xenarios, I. New and continuing developments at PROSITE. Nucleic Acids Res. 2012, 41, D344–D347.
18. Amanchy, R.; Periaswamy, B.; Mathivanan, S.; Reddy, R.; Tattikota, S.G.; Pandey, A. A curated compendium of phosphorylation motifs. Nat. Biotechnol. 2007, 25, 285–286.
19. Mount, D.W. Using BLOSUM in sequence alignments. Cold Spring Harb. Protoc. 2008, 2008, pdb–top39.
20. Jung, I.; Matsuyama, A.; Yoshida, M.; Kim, D. PostMod: Sequence based prediction of kinase-specific phosphorylation sites with indirect relationship. BMC Bioinform. 2010, 11, S10.
21. Suo, S.B.; Qiu, J.D.; Shi, S.P.; Chen, X.; Liang, R.P. PSEA: Kinase-specific prediction and analysis of human phosphorylation substrates. Sci. Rep. 2014, 4, 4524.
22. Blom, N.; Sicheritz-Pontén, T.; Gupta, R.; Gammeltoft, S.; Brunak, S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 2004, 4, 1633–1649.
23. Basu, S.; Plewczynski, D. AMS 3.0: Prediction of post-translational modifications. BMC Bioinform. 2010, 11, 210.
24. Huang, H.D.; Lee, T.Y.; Tzeng, S.W.; Horng, J.T. KinasePhos: A web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res. 2005, 33, W226–W229.
25. Gao, J.; Thelen, J.J.; Dunker, A.K.; Xu, D. Musite, a tool for global prediction of general and kinase-specific phosphorylation sites. Mol. Cell. Proteom. 2010, 9, 2586–2600.
26. Dou, Y.; Yao, B.; Zhang, C. PhosphoSVM: Prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine. Amino Acids 2014, 46, 1459–1469.
27. Kim, J.H.; Lee, J.; Oh, B.; Kimm, K.; Koh, I. Prediction of phosphorylation sites using SVMs. Bioinformatics 2004, 20, 3179–3184.
28. Biswas, A.K.; Noman, N.; Sikder, A.R. Machine learning approach to predict protein phosphorylation sites by incorporating evolutionary information. BMC Bioinform. 2010, 11, 273.
29. Xue, Y.; Li, A.; Wang, L.; Feng, H.; Yao, X. PPSP: Prediction of PK-specific phosphorylation site with Bayesian decision theory. BMC Bioinform. 2006, 7, 1–12.
30. Dang, T.H.; Van Leemput, K.; Verschoren, A.; Laukens, K. Prediction of kinase-specific phosphorylation sites using conditional random fields. Bioinformatics 2008, 24, 2857–2864.
31. Wei, L.; Xing, P.; Tang, J.; Zou, Q. PhosPred-RF: A novel sequence-based predictor for phosphorylation sites using sequential information only. IEEE Trans. Nanobiosci. 2017, 16, 240–247.
32. Qiu, W.R.; Xiao, X.; Xu, Z.C.; Chou, K.C. iPhos-PseEn: Identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier. Oncotarget 2016, 7, 51270.
33. Wang, D.; Zeng, S.; Xu, C.; Qiu, W.; Liang, Y.; Joshi, T.; Xu, D. MusiteDeep: A deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics 2017, 33, 3909–3916.
34. Luo, F.; Wang, M.; Liu, Y.; Zhao, X.M.; Li, A. DeepPhos: Prediction of protein phosphorylation sites with deep learning. Bioinformatics 2019, 35, 2766–2773.
35. Ahmed, S.; Kabir, M.; Arif, M.; Khan, Z.U.; Yu, D.J. DeepPPSite: A deep learning-based model for analysis and prediction of phosphorylation sites using efficient sequence information. Anal. Biochem. 2021, 612, 113955.
36. Guo, L.; Wang, Y.; Xu, X.; Cheng, K.K.; Long, Y.; Xu, J.; Li, S.; Dong, J. DeepPSP: A global–local information-based deep neural network for the prediction of protein phosphorylation sites. J. Proteome Res. 2020, 20, 346–356.
37. Lv, H.; Dao, F.Y.; Zulfiqar, H.; Lin, H. DeepIPs: Comprehensive assessment and computational identification of phosphorylation sites of SARS-CoV-2 infection using a deep learning-based approach. Briefings Bioinform. 2021, 22, bbab244.
38. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
39. UniProt Consortium. UniProt: A worldwide hub of protein knowledge. Nucleic Acids Res. 2019, 47, D506–D515.
40. Jones, P.; Côté, R.G.; Martens, L.; Quinn, A.F.; Taylor, C.F.; Derache, W.; Hermjakob, H.; Apweiler, R. PRIDE: A public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res. 2006, 34, D659–D663.
41. Venkatarajan, M.S.; Braun, W. New quantitative descriptors of amino acids based on multidimensional scaling of a large number of physical–chemical properties. Mol. Model. Annu. 2001, 7, 445–453.
42. Boutet, E.; Lieberherr, D.; Tognolli, M.; Schneider, M.; Bairoch, A. UniProtKB/Swiss-Prot: The manually annotated section of the UniProt KnowledgeBase. In Plant Bioinformatics: Methods and Protocols; Springer: Berlin/Heidelberg, Germany, 2007; pp. 89–112.
43. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
44. Singh, J.; Litfin, T.; Paliwal, K.; Singh, J.; Hanumanthappa, A.K.; Zhou, Y. SPOT-1D-Single: Improving the single-sequence-based prediction of protein secondary structure, backbone angles, solvent accessibility and half-sphere exposures using a large training set and ensembled deep learning. Bioinformatics 2021, 37, 3464–3472.
45. Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rihawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Towards cracking the language of Life’s code through self-supervised deep learning and high performance computing. arXiv 2020, arXiv:2007.06225.
46. Godzik, A.; Skolnick, J. Flexible algorithm for direct multiple alignment of protein structures and sequences. Bioinformatics 1994, 10, 587–596.
47. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1025–1035.
48. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
49. Suzek, B.E.; Wang, Y.; Huang, H.; McGarvey, P.B.; Wu, C.H.; UniProt Consortium. UniRef clusters: A comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 2015, 31, 926–932.
50. Steinegger, M.; Mirdita, M.; Söding, J. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nat. Methods 2019, 16, 603–606.
51. Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in Pytorch. 2017. Available online: https://openreview.net/forum?id=BJJsrmfCZ (accessed on 15 January 2025).
Site | Accuracy | Precision | F1-Score
---|---|---|---
S | 0.8888 | 0.7927 | 0.8374
T | 0.8590 | 0.7418 | 0.7951
Y | 0.7985 | 0.6460 | 0.7120
STY | 0.9196 | 0.8519 | 0.8829
Train | Test | Accuracy | Precision | F1-Score
---|---|---|---|---
AlphaFold2 | AlphaFold2 | 0.9136 | 0.8566 | 0.8810
PDB | AlphaFold2 | 0.9197 | 0.8509 | 0.8828
PDB | PDB | 0.9196 | 0.8519 | 0.8829
Train | Test | Accuracy | Precision | F1-Score |
---|---|---|---|---|
AlphaFold2-high | AlphaFold2 | 0.9174 | 0.8543 | 0.8823 |
AlphaFold2-low | AlphaFold2 | 0.4244 | 0.8445 | 0.5133 |
Site | Model | Accuracy | Precision | F1-Score
---|---|---|---|---
STY | GraphPhos | 0.9196 | 0.8519 | 0.8829
STY | GraphPhos (w/o SAGE) | 0.8271 | 0.8589 | 0.8398
STY | GraphPhos (w/o ProtBERT-CNN) | 0.8277 | 0.8606 | 0.8408
S | GraphPhos | 0.8888 | 0.7927 | 0.8374
S | GraphPhos (w/o SAGE) | 0.6524 | 0.8036 | 0.6999
S | GraphPhos (w/o ProtBERT-CNN) | 0.8593 | 0.8054 | 0.8271
T | GraphPhos | 0.8593 | 0.8054 | 0.8271
T | GraphPhos (w/o SAGE) | 0.7371 | 0.7586 | 0.7425
T | GraphPhos (w/o ProtBERT-CNN) | 0.8590 | 0.7418 | 0.7951
Y | GraphPhos | 0.7985 | 0.646 | 0.712
Y | GraphPhos (w/o SAGE) | 0.7851 | 0.6665 | 0.7144
Y | GraphPhos (w/o ProtBERT-CNN) | 0.7516 | 0.6387 | 0.6878
Features | Site | Accuracy | Precision | F1-Score
---|---|---|---|---
phy + pssm + ss | S | 0.8729 | 0.8042 | 0.8332
phy + pssm + ss | T | 0.8034 | 0.7577 | 0.7763
phy + pssm + ss | Y | 0.7237 | 0.7023 | 0.7017
phy + pssm + ss | STY | 0.9084 | 0.8563 | 0.8790
one-hot + pssm + ss | S | 0.6712 | 0.7966 | 0.7133
one-hot + pssm + ss | T | 0.8539 | 0.753 | 0.7951
one-hot + pssm + ss | Y | 0.6598 | 0.6893 | 0.6696
one-hot + pssm + ss | STY | 0.6339 | 0.8606 | 0.7161
one-hot + phy + ss | S | 0.8671 | 0.8086 | 0.8328
one-hot + phy + ss | T | 0.7247 | 0.7642 | 0.7395
one-hot + phy + ss | Y | 0.5237 | 0.6911 | 0.5517
one-hot + phy + ss | STY | 0.8017 | 0.8511 | 0.8223
one-hot + phy + pssm | S | 0.8766 | 0.7987 | 0.833
one-hot + phy + pssm | T | 0.8574 | 0.7616 | 0.7985
one-hot + phy + pssm | Y | 0.7907 | 0.6515 | 0.7099
one-hot + phy + pssm | STY | 0.8831 | 0.8537 | 0.8661
Method | S | T | Y
---|---|---|---
Musite | 0.6923 | 0.6478 | 0.6180
PhosphoSVM | 0.7086 | 0.6669 | 0.6391
iPhos-PseEn | 0.7680 | 0.7520 | 0.7900
MusiteDeep | 0.8095 | 0.8095 | 0.8571
DeepPhos | 0.6890 | 0.6890 | 0.6890
DeepPSP | 0.8021 | 0.8021 | 0.7619
DeepIPs | 0.8063 | 0.8063 | 0.8333
GraphPhos | 0.8819 | 0.9600 | 0.9779
Model | Accuracy | Precision | F1-Score
---|---|---|---
GraphPhos-ran | 0.9136 | 0.8566 | 0.8810
GraphPhos-sim | 0.8718 | 0.8932 | 0.8811
Residue | Train Set Positive | Train Set Negative | Test Set Positive | Test Set Negative
---|---|---|---|---
Serine | 10,420 | 10,420 | 2814 | 2814
Threonine | 1244 | 1244 | 224 | 224
Tyrosine | 695 | 695 | 80 | 80
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, Z.; Yang, X.; Gao, S.; Liang, Y.; Shi, X. GraphPhos: Predict Protein-Phosphorylation Sites Based on Graph Neural Networks. Int. J. Mol. Sci. 2025, 26, 941. https://doi.org/10.3390/ijms26030941