DeepSub: Utilizing Deep Learning for Predicting the Number of Subunits in Homo-Oligomeric Protein Complexes
Abstract
1. Introduction
2. Results and Discussion
2.1. Processing and Analyzing Datasets Extracted from UniProt
2.2. Cross-Validation on the Training Set
2.3. Comparison with QUEEN
2.4. Case Study
2.5. Web Platform
3. Materials and Methods
3.1. Datasets
3.2. The Model Architecture of DeepSub
3.3. Model Training
3.4. Baseline Models
3.5. Loss Function
3.6. Evaluation Metrics
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

EC Number | Protein Name | Protein Count | Subunit Labels |
---|---|---|---|
1.15.1.1 | superoxide dismutase | 126 | 1, 2, 3, 4, 6 |
2.5.1.41 | phosphoglycerol geranylgeranyltransferase | 11 | 1, 2, 4, 5, 6 |
2.7.7.7 | DNA-directed DNA polymerase | 217 | 1, 2, 3, 4, 6 |
3.2.1.21 | beta-glucosidase | 12 | 1, 2, 4, 6, 8 |
3.4.11.5 | prolyl aminopeptidase | 8 | 1, 2, 3, 4, 6 |
3.5.1.4 | amidase | 6 | 1, 2, 4, 6, 8 |
3.6.1.1 | inorganic diphosphatase | 116 | 1, 2, 3, 6, 12 |
3.6.1.15 | nucleoside-triphosphate phosphatase | 16 | 1, 2, 4, 6, 12 |
4.2.1.1 | carbonic anhydrase | 13 | 1, 2, 3, 4, 6 |
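The table above appears to list EC numbers whose proteins are annotated with several different subunit counts, which indicates that the EC number alone does not fix the oligomeric state. A summary of this shape could be derived by grouping UniProt-style records by EC number. The sketch below assumes a simple (EC number, accession, subunit count) tuple format and uses made-up records; `records`, `summarize_by_ec`, and `ambiguous_ecs` are hypothetical names, not part of DeepSub or its dataset pipeline.

```python
from collections import defaultdict

# Hypothetical toy records: (EC number, accession, subunit count).
# Accessions and labels are illustrative placeholders, not DeepSub data.
records = [
    ("1.15.1.1", "PROT_A", 2),
    ("1.15.1.1", "PROT_B", 4),
    ("1.15.1.1", "PROT_C", 2),
    ("3.5.1.4", "PROT_D", 6),
    ("3.5.1.4", "PROT_E", 8),
    ("4.2.1.1", "PROT_F", 1),
]


def summarize_by_ec(rows):
    """Group rows by EC number; return {ec: (protein count, sorted distinct subunit labels)}."""
    proteins = defaultdict(set)
    labels = defaultdict(set)
    for ec, accession, n_subunits in rows:
        proteins[ec].add(accession)
        labels[ec].add(n_subunits)
    return {ec: (len(proteins[ec]), sorted(labels[ec])) for ec in proteins}


def ambiguous_ecs(summary, min_labels=2):
    """Return EC numbers whose proteins carry at least `min_labels` distinct subunit counts."""
    return sorted(ec for ec, (_, labs) in summary.items() if len(labs) >= min_labels)


summary = summarize_by_ec(records)
for ec, (count, labs) in sorted(summary.items()):
    # One line per EC number: EC, protein count, distinct subunit labels.
    print(ec, count, ", ".join(map(str, labs)))
```

Counting distinct accessions (a set) rather than rows keeps duplicate records from inflating the protein count.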

Fold | mPrecision | mRecall | mACC | mF1 |
---|---|---|---|---|
1 | 0.882 | 0.861 | 0.970 | 0.871 |
2 | 0.946 | 0.958 | 0.970 | 0.948 |
3 | 0.976 | 0.916 | 0.970 | 0.937 |
4 | 0.883 | 0.857 | 0.970 | 0.870 |
5 | 0.977 | 0.9648 | 0.972 | 0.970 |
6 | 0.8795 | 0.870 | 0.971 | 0.875 |
7 | 0.876 | 0.862 | 0.970 | 0.869 |
8 | 0.879 | 0.856 | 0.968 | 0.867 |
9 | 0.980 | 0.964 | 0.969 | 0.971 |
10 | 0.880 | 0.864 | 0.969 | 0.871 |
Mean ± SD | 0.916 ± 0.047 | 0.897 ± 0.048 | 0.970 ± 0.001 | 0.905 ± 0.046 |
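The per-fold rows in the cross-validation table above are consistent with macro-averaged metrics, and the summary row matches the mean and sample standard deviation of the ten folds for mPrecision and mRecall. The sketch below shows one way both could be computed. It assumes, without confirmation from the text shown here, that each subunit count is a class, that metrics are computed one-vs-rest per class and then averaged, and that per-class accuracy counts true negatives. That last assumption would explain why mACC stays near 0.97 while precision and recall vary more. `macro_metrics` and `mean_sd` are hypothetical helper names.

```python
import statistics


def macro_metrics(y_true, y_pred, classes):
    """Macro-average one-vs-rest precision, recall, accuracy and F1 over `classes`.

    Assumption: each subunit count is one class, and per-class accuracy is
    (TP + TN) / N, so true negatives are credited.
    """
    n = len(y_true)
    sums = {"mPrecision": 0.0, "mRecall": 0.0, "mACC": 0.0, "mF1": 0.0}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        sums["mPrecision"] += precision
        sums["mRecall"] += recall
        sums["mACC"] += (tp + tn) / n
        sums["mF1"] += f1
    return {name: total / len(classes) for name, total in sums.items()}


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator) across folds."""
    return statistics.mean(values), statistics.stdev(values)


# Per-fold values transcribed from the 10-fold cross-validation table above.
folds = {
    "mPrecision": [0.882, 0.946, 0.976, 0.883, 0.977, 0.8795, 0.876, 0.879, 0.980, 0.880],
    "mRecall": [0.861, 0.958, 0.916, 0.857, 0.9648, 0.870, 0.862, 0.856, 0.964, 0.864],
}
for name, values in folds.items():
    mean, sd = mean_sd(values)
    print(f"{name}: {mean:.3f} ± {sd:.3f}")
```

This prints `mPrecision: 0.916 ± 0.047` and `mRecall: 0.897 ± 0.048`, matching the summary row. For the toy call `macro_metrics([1, 2, 2, 3], [1, 2, 3, 3], classes=[1, 2, 3])`, plain accuracy is 0.75 but mACC is about 0.833, because each one-vs-rest comparison also credits true negatives.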

Organism | UniProt ID | Complex | QUEEN Prediction | DeepSub Prediction |
---|---|---|---|---|
Corynebacterium glutamicum | P08499 | - | 4 | 2 |
Mus musculus | O89029 | Matrilin-4 complex | 1 | 3 |
Homo sapiens | O95460 | Matrilin-4 complex | 1 | 3 |
Mus musculus | B2RPV6 | Multimerin-1 complex | 2 | 3 |
Homo sapiens | Q13201 | Multimerin-1 complex | 2 | 3 |
Mus musculus | A6H6E2 | Multimerin-2 complex | 2 | 3 |
Homo sapiens | Q9H8L6 | Multimerin-2 complex | 2 | 3 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Deng, R.; Wu, K.; Lin, J.; Wang, D.; Huang, Y.; Li, Y.; Shi, Z.; Zhang, Z.; Wang, Z.; Mao, Z.; et al. DeepSub: Utilizing Deep Learning for Predicting the Number of Subunits in Homo-Oligomeric Protein Complexes. Int. J. Mol. Sci. 2024, 25, 4803. https://doi.org/10.3390/ijms25094803
Deng R, Wu K, Lin J, Wang D, Huang Y, Li Y, Shi Z, Zhang Z, Wang Z, Mao Z, et al. DeepSub: Utilizing Deep Learning for Predicting the Number of Subunits in Homo-Oligomeric Protein Complexes. International Journal of Molecular Sciences. 2024; 25(9):4803. https://doi.org/10.3390/ijms25094803
Chicago/Turabian Style: Deng, Rui, Ke Wu, Jiawei Lin, Dehang Wang, Yuanyuan Huang, Yang Li, Zhenkun Shi, Zihan Zhang, Zhiwen Wang, Zhitao Mao, et al. 2024. "DeepSub: Utilizing Deep Learning for Predicting the Number of Subunits in Homo-Oligomeric Protein Complexes" International Journal of Molecular Sciences 25, no. 9: 4803. https://doi.org/10.3390/ijms25094803
APA Style: Deng, R., Wu, K., Lin, J., Wang, D., Huang, Y., Li, Y., Shi, Z., Zhang, Z., Wang, Z., Mao, Z., Liao, X., & Ma, H. (2024). DeepSub: Utilizing Deep Learning for Predicting the Number of Subunits in Homo-Oligomeric Protein Complexes. International Journal of Molecular Sciences, 25(9), 4803. https://doi.org/10.3390/ijms25094803