Grain Protein Function Prediction Based on CNN and Residual Attention Mechanism with AlphaFold2 Structure Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Datasets
2.2. Data Representation
2.2.1. Protein Sequence Coding
2.2.2. Secondary Structural Features
2.2.3. Structural Contact Map Features
2.3. Model Architecture and Implementation
2.3.1. Multimodal Multiscale Network (MMSNet)
2.3.2. Residual Attention Mechanism
2.3.3. Model Configuration and Training Details
2.4. Comparison with Existing Methods
2.5. Evaluation Metrics
3. Results and Discussion
3.1. Comparative Analysis of Algorithm Performance Combining Sequence and Structural Features
3.2. Performance Comparison Between Residual Attention Mechanism and Traditional Pooling Layer
3.3. Comparative Analysis of Performance with Other Model Methods
3.4. Analysis of Model Prediction Results
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

Dataset | Sub-Ontology | Training Samples | Validation Samples | Test Samples | Total | Number of Classes | GO Filtering Threshold |
---|---|---|---|---|---|---|---|
Japonica | MF | 2157 | 540 | 675 | 3372 | 121 | 50 |
Japonica | BP | 2067 | 517 | 646 | 3230 | 254 | 50 |
Japonica | CC | 2085 | 522 | 652 | 3259 | 59 | 50 |
Indica | MF | 260 | 65 | 82 | 407 | 52 | 10 |
Indica | BP | 68 | 17 | 22 | 107 | 28 | 10 |
Indica | CC | 192 | 48 | 61 | 301 | 28 | 10 |
Maize | MF | 254 | 64 | 80 | 398 | 48 | 10 |
Maize | BP | 76 | 20 | 25 | 121 | 31 | 10 |
Maize | CC | 152 | 38 | 48 | 238 | 27 | 10 |
Wheat | MF | 131 | 33 | 42 | 206 | 56 | 5 |
Wheat | BP | 37 | 10 | 12 | 59 | 29 | 5 |
Wheat | CC | 61 | 16 | 20 | 97 | 27 | 5 |
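
The sample counts in the dataset table can be checked with a few lines of arithmetic. The sketch below recomputes the train/validation/test shares for a handful of rows; the function name `split_fractions` is hypothetical, and the roughly 64/16/20 proportion it reports is an observation from the counts, not a split ratio stated in the table.

```python
# Sample counts copied from the dataset table: (train, validation, test, total).
COUNTS = {
    ("Japonica", "MF"): (2157, 540, 675, 3372),
    ("Japonica", "BP"): (2067, 517, 646, 3230),
    ("Indica", "BP"): (68, 17, 22, 107),
    ("Maize", "CC"): (152, 38, 48, 238),
    ("Wheat", "BP"): (37, 10, 12, 59),
}


def split_fractions(train, val, test):
    """Return the train/validation/test shares, rounded to three decimals."""
    total = train + val + test
    return tuple(round(n / total, 3) for n in (train, val, test))


for (grain, ontology), (train, val, test, total) in COUNTS.items():
    # The three subsets should add up to the reported total.
    assert train + val + test == total
    print(grain, ontology, split_fractions(train, val, test))
```

For the smallest sets, such as Wheat BP with 59 proteins, the test subset holds only 12 samples, so metrics reported on it are likely to vary more between runs than those for Japonica.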

Grain | Sub-Ontology | Algorithm | Fmax | AvgPr | AvgRe | AUPR | MCC |
---|---|---|---|---|---|---|---|
Japonica | MF | MCNN-1D (S) | 0.650 | 0.710 | 0.604 | 0.741 | 0.656 |
Japonica | MF | MCNN-1D (S+SS) | 0.730 | 0.772 | 0.692 | 0.824 | 0.734 |
Japonica | MF | MMSNet (S+SS+DS) | 0.764 | 0.818 | 0.720 | 0.849 | 0.762 |
Japonica | BP | MCNN-1D (S) | 0.544 | 0.638 | 0.476 | 0.530 | 0.486 |
Japonica | BP | MCNN-1D (S+SS) | 0.645 | 0.710 | 0.588 | 0.648 | 0.593 |
Japonica | BP | MMSNet (S+SS+DS) | 0.654 | 0.728 | 0.592 | 0.661 | 0.603 |
Japonica | CC | MCNN-1D (S) | 0.754 | 0.796 | 0.714 | 0.778 | 0.691 |
Japonica | CC | MCNN-1D (S+SS) | 0.789 | 0.790 | 0.788 | 0.814 | 0.724 |
Japonica | CC | MMSNet (S+SS+DS) | 0.800 | 0.824 | 0.776 | 0.819 | 0.739 |
Indica | MF | MCNN-1D (S) | 0.566 | 0.720 | 0.468 | 0.477 | 0.419 |
Indica | MF | MCNN-1D (S+SS) | 0.654 | 0.808 | 0.552 | 0.564 | 0.500 |
Indica | MF | MMSNet (S+SS+DS) | 0.668 | 0.776 | 0.594 | 0.580 | 0.522 |
Indica | BP | MCNN-1D (S) | 0.558 | 0.804 | 0.430 | 0.627 | 0.457 |
Indica | BP | MCNN-1D (S+SS) | 0.588 | 0.670 | 0.596 | 0.641 | 0.392 |
Indica | BP | MMSNet (S+SS+DS) | 0.604 | 0.748 | 0.514 | 0.655 | 0.482 |
Indica | CC | MCNN-1D (S) | 0.699 | 0.746 | 0.658 | 0.677 | 0.569 |
Indica | CC | MCNN-1D (S+SS) | 0.737 | 0.786 | 0.696 | 0.731 | 0.649 |
Indica | CC | MMSNet (S+SS+DS) | 0.752 | 0.794 | 0.716 | 0.728 | 0.667 |
Maize | MF | MCNN-1D (S) | 0.556 | 0.690 | 0.472 | 0.498 | 0.423 |
Maize | MF | MCNN-1D (S+SS) | 0.563 | 0.610 | 0.528 | 0.509 | 0.439 |
Maize | MF | MMSNet (S+SS+DS) | 0.574 | 0.642 | 0.526 | 0.513 | 0.442 |
Maize | BP | MCNN-1D (S) | 0.716 | 0.872 | 0.618 | 0.766 | 0.621 |
Maize | BP | MCNN-1D (S+SS) | 0.784 | 0.918 | 0.688 | 0.817 | 0.711 |
Maize | BP | MMSNet (S+SS+DS) | 0.798 | 0.970 | 0.678 | 0.831 | 0.718 |
Maize | CC | MCNN-1D (S) | 0.638 | 0.740 | 0.560 | 0.480 | 0.438 |
Maize | CC | MCNN-1D (S+SS) | 0.714 | 0.896 | 0.596 | 0.589 | 0.512 |
Maize | CC | MMSNet (S+SS+DS) | 0.719 | 0.870 | 0.614 | 0.597 | 0.516 |
Wheat | MF | MCNN-1D (S) | 0.620 | 0.782 | 0.516 | 0.464 | 0.402 |
Wheat | MF | MCNN-1D (S+SS) | 0.651 | 0.814 | 0.544 | 0.461 | 0.371 |
Wheat | MF | MMSNet (S+SS+DS) | 0.664 | 0.824 | 0.558 | 0.480 | 0.390 |
Wheat | BP | MCNN-1D (S) | 0.635 | 0.720 | 0.570 | 0.634 | 0.522 |
Wheat | BP | MCNN-1D (S+SS) | 0.705 | 0.676 | 0.760 | 0.669 | 0.513 |
Wheat | BP | MMSNet (S+SS+DS) | 0.724 | 0.680 | 0.782 | 0.679 | 0.530 |
Wheat | CC | MCNN-1D (S) | 0.449 | 0.594 | 0.372 | 0.335 | 0.314 |
Wheat | CC | MCNN-1D (S+SS) | 0.531 | 0.612 | 0.476 | 0.351 | 0.258 |
Wheat | CC | MMSNet (S+SS+DS) | 0.547 | 0.624 | 0.510 | 0.368 | 0.272 |
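
The Fmax, AvgPr and AvgRe columns are easier to read with the metric written out. The sketch below implements the protein-centric Fmax commonly used in CAFA-style evaluation: precision is averaged over proteins that receive at least one prediction at a threshold, recall over proteins that have at least one true term, and the best harmonic mean across thresholds is kept. That the reported numbers were computed exactly this way is an assumption; the function name `fmax` and the threshold grid are hypothetical. The reported values do sit close to the harmonic mean of AvgPr and AvgRe in the same row (0.710 and 0.604 give about 0.653 against a reported 0.650).

```python
import numpy as np


def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax over a grid of decision thresholds.

    y_true:  (n_proteins, n_terms) binary annotation matrix.
    y_score: (n_proteins, n_terms) predicted scores in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.01, 0.99, 99)  # assumed grid, step 0.01

    n_true = y_true.sum(axis=1)
    has_true = n_true > 0
    best = 0.0
    for t in thresholds:
        y_pred = y_score >= t
        n_pred = y_pred.sum(axis=1)
        n_hit = (y_pred & y_true).sum(axis=1)
        has_pred = n_pred > 0
        if not has_pred.any() or not has_true.any():
            continue
        # Precision: only proteins with at least one predicted term.
        precision = np.mean(n_hit[has_pred] / n_pred[has_pred])
        # Recall: only proteins with at least one true term.
        recall = np.mean(n_hit[has_true] / n_true[has_true])
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy example: two proteins, three GO terms.
truth = [[1, 1, 0], [0, 1, 1]]
scores = [[0.9, 0.4, 0.2], [0.1, 0.8, 0.35]]
print(round(fmax(truth, scores), 3))
```

In the toy example any threshold between 0.2 and 0.35 separates true from false terms for both proteins, so the maximum is reached there.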

Sub-Ontology | Algorithm | Fmax | AvgPr | AvgRe | AUPR | MCC |
---|---|---|---|---|---|---|
MF | MMSNet (Max Pooling) | 0.660 ± 0.033 | 0.690 ± 0.045 | 0.632 ± 0.028 | 0.700 ± 0.055 | 0.619 ± 0.043 |
MF | MMSNet (Average Pooling) | 0.618 ± 0.044 | 0.652 ± 0.057 | 0.584 ± 0.036 | 0.645 ± 0.069 | 0.575 ± 0.062 |
MF | MMSNet (GlobalMax) | 0.732 ± 0.025 | 0.776 ± 0.041 | 0.696 ± 0.026 | 0.831 ± 0.021 | 0.738 ± 0.025 |
MF | MMSNet (GlobalAverage) | 0.754 ± 0.031 | 0.798 ± 0.046 | 0.710 ± 0.032 | 0.845 ± 0.024 | 0.754 ± 0.028 |
MF | MMSNet (ResidualAttention) | 0.764 ± 0.014 | 0.818 ± 0.025 | 0.720 ± 0.028 | 0.849 ± 0.015 | 0.762 ± 0.014 |
BP | MMSNet (Max Pooling) | 0.536 ± 0.014 | 0.600 ± 0.052 | 0.488 ± 0.016 | 0.485 ± 0.017 | 0.450 ± 0.015 |
BP | MMSNet (Average Pooling) | 0.518 ± 0.015 | 0.566 ± 0.026 | 0.476 ± 0.021 | 0.456 ± 0.012 | 0.420 ± 0.013 |
BP | MMSNet (GlobalMax) | 0.628 ± 0.019 | 0.696 ± 0.024 | 0.574 ± 0.016 | 0.634 ± 0.019 | 0.579 ± 0.025 |
BP | MMSNet (GlobalAverage) | 0.638 ± 0.015 | 0.698 ± 0.031 | 0.584 ± 0.008 | 0.643 ± 0.016 | 0.583 ± 0.021 |
BP | MMSNet (ResidualAttention) | 0.654 ± 0.011 | 0.728 ± 0.020 | 0.592 ± 0.015 | 0.661 ± 0.009 | 0.603 ± 0.009 |
CC | MMSNet (Max Pooling) | 0.746 ± 0.012 | 0.794 ± 0.021 | 0.708 ± 0.032 | 0.734 ± 0.018 | 0.668 ± 0.015 |
CC | MMSNet (Average Pooling) | 0.732 ± 0.020 | 0.764 ± 0.029 | 0.700 ± 0.021 | 0.712 ± 0.030 | 0.650 ± 0.024 |
CC | MMSNet (GlobalMax) | 0.790 ± 0.009 | 0.816 ± 0.022 | 0.764 ± 0.008 | 0.816 ± 0.008 | 0.729 ± 0.012 |
CC | MMSNet (GlobalAverage) | 0.794 ± 0.005 | 0.814 ± 0.014 | 0.778 ± 0.010 | 0.818 ± 0.007 | 0.734 ± 0.008 |
CC | MMSNet (ResidualAttention) | 0.800 ± 0.007 | 0.824 ± 0.017 | 0.776 ± 0.005 | 0.819 ± 0.005 | 0.739 ± 0.007 |
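
To make the comparison between the pooling variants concrete, the following is a minimal NumPy sketch of class-specific residual attention in the spirit of Zhu and Wu (ICCV 2021): per-class logits are the global-average score plus a weighted, attention-pooled score over positions. It is not the MMSNet implementation; the tensor shapes, the function name `residual_attention_logits`, and the defaults for `lam` and `temperature` are assumptions chosen for illustration.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def residual_attention_logits(features, class_weights, lam=0.2, temperature=1.0):
    """Combine average pooling with class-specific attention pooling.

    features:      (positions, channels) feature map from the convolutional trunk.
    class_weights: (classes, channels) one classifier vector per GO term.
    Returns a (classes,) vector of logits.
    """
    # Score of every position for every class.
    scores = features @ class_weights.T              # (positions, classes)
    # Baseline: global average pooling of the per-position scores.
    base = scores.mean(axis=0)
    # Class-specific attention over positions, sharpened by the temperature.
    attention = softmax(temperature * scores, axis=0)
    attended = (attention * scores).sum(axis=0)
    # Residual combination: average pooling plus a weighted attention term.
    return base + lam * attended


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(50, 8))   # hypothetical: 50 positions, 8 channels
weights = rng.normal(size=(5, 8))        # hypothetical: 5 GO terms
print(residual_attention_logits(feature_map, weights).shape)
```

Setting `lam=0` recovers global average pooling, and a very large `temperature` makes the attention term approach global max pooling, so this formulation interpolates between the GlobalAverage and GlobalMax rows of the table.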

Grain | Sub-Ontology | Protein | Real Function | Predicted Function |
---|---|---|---|---|
Japonica | MF | P29250 (LOX2_ORYSJ) | GO:0046872 GO:0043169 GO:0043167 GO:0005488 GO:0016491 GO:0003824 | GO:0046872 GO:0043169 GO:0043167 GO:0005488 GO:0016491 GO:0003824 |
Japonica | MF | P16081 (NIA1_ORYSJ) | GO:0003824 GO:0005488 GO:1901363 GO:0097159 GO:0046914 GO:0043168 | GO:0003824 GO:0005488 GO:1901363 GO:0097159 GO:0003676 GO:0003723 GO:0016787 |
Japonica | MF | Q0JKI9 (ARFB_ORYSJ) | GO:0003677 GO:0003676 GO:0097159 GO:1901363 GO:0005488 | GO:0003677 GO:0003676 GO:0097159 GO:1901363 GO:0005488 |
Japonica | BP | Q10RB4 (BGAL5_ORYSJ) | GO:0008152 GO:0044238 GO:0005975 | GO:0008152 GO:0044238 |
Japonica | BP | Q8W0A1 (BGAL2_ORYSJ) | GO:0005975 GO:0044238 GO:0008152 | GO:0005975 GO:0044238 GO:0008152 GO:0009058 GO:0016051 |
Japonica | CC | Q84YK8 (LOXC2_ORYSJ) | GO:0009507 GO:0009536 GO:0043231 GO:0043227 GO:0043229 GO:0043226 GO:0110165 | GO:0009507 GO:0009536 GO:0043231 GO:0043227 GO:0043229 GO:0043226 GO:0110165 GO:0016020 |
Japonica | CC | Q6YSJ5 (AGO16_ORYSJ) | GO:0005737 GO:0110165 | GO:0005737 GO:0110165 |
Japonica | CC | Q7XP59 (GLR31_ORYSJ) | GO:0005886 GO:0016020 GO:0110165 | GO:0016020 GO:0110165 |
Indica | MF | A2Y9M4 (SSY1_ORYSI) | GO:0003824 GO:0016740 | GO:0003824 |
Indica | MF | P0C461 (RR12_ORYSI) | GO:0003735 GO:0005198 | GO:0003735 GO:0005198 GO:0005488 |
Indica | BP | Q01IX6 (DAO_ORYSI) | GO:0009987 GO:0008152 GO:0044237 | GO:0009987 GO:0008152 |
Indica | BP | A2YNH4 (6PGL2_ORYSI) | GO:0044238 GO:0071704 GO:0008152 | GO:0044238 GO:0071704 GO:0008152 |
Indica | CC | B8BKI8 (MCM2_ORYSI) | GO:0032991 | GO:0032991 GO:0110165 |
Indica | CC | B8ATT7 (VLN4_ORYSI) | GO:0005737 GO:0110165 | GO:0005737 GO:0110165 |
Maize | MF | P29390 (FRI2_MAIZE) | GO:0003824 GO:0016491 | GO:0003824 |
Maize | MF | P06677 (ZEA9_MAIZE) | GO:0045735 | GO:0045735 GO:0003824 |
Maize | BP | P33488 (ABP4_MAIZE) | GO:0009987 | GO:0009987 GO:0044237 GO:0008152 |
Maize | BP | B6SU46 (AAMT2_MAIZE) | GO:0006950 GO:0050896 | GO:0006950 GO:0050896 |
Maize | CC | Q9LKX9 (RBR1_MAIZE) | GO:0032993 GO:0032991 | GO:0032993 GO:0032991 |
Maize | CC | Q41764 (ADF3_MAIZE) | GO:0110165 GO:0005737 | GO:0110165 |
Wheat | MF | Q8L803 (RK9_WHEAT) | GO:0003735 GO:0005198 | GO:0003735 GO:0005198 GO:0003824 |
Wheat | MF | Q5I7K3 (RS29_WHEAT) | GO:0003735 GO:0005198 | GO:0003735 GO:0005198 |
Wheat | BP | B6DZC8 (1FEH3_WHEAT) | GO:0008152 GO:0044238 GO:0071704 GO:0005975 | GO:0008152 GO:0044238 GO:0071704 |
Wheat | BP | O04706 (GAO1B_WHEAT) | GO:0009416 GO:0009314 GO:0009628 GO:0050896 | GO:0009416 GO:0009314 GO:0009628 GO:0050896 |
Wheat | CC | Q41560 (HS16B_WHEAT) | GO:0005737 GO:0110165 | GO:0005737 GO:0110165 |
Wheat | CC | Q01481 (WIR1B_WHEAT) | GO:0110165 GO:0016020 | GO:0110165 |
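
The agreement between the real and predicted annotations in the table can be quantified per protein with simple set arithmetic. The sketch below scores two of the Japonica rows; the helper name `set_scores` is hypothetical, and plain set overlap is a simplification that ignores the Gene Ontology hierarchy, so it is offered as a reading aid and not as the evaluation procedure behind the reported metrics.

```python
def set_scores(real, predicted):
    """Precision, recall and F1 of a predicted GO term set against the real set."""
    real, predicted = set(real), set(predicted)
    hits = len(real & predicted)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(real) if real else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# P16081 (NIA1_ORYSJ), Japonica MF: four shared terms, two missed, three extra.
nia1_real = "GO:0003824 GO:0005488 GO:1901363 GO:0097159 GO:0046914 GO:0043168".split()
nia1_pred = ("GO:0003824 GO:0005488 GO:1901363 GO:0097159 "
             "GO:0003676 GO:0003723 GO:0016787").split()

# Q10RB4 (BGAL5_ORYSJ), Japonica BP: both predicted terms are correct, one is missed.
bgal5_real = "GO:0008152 GO:0044238 GO:0005975".split()
bgal5_pred = "GO:0008152 GO:0044238".split()

for name, real, pred in [("P16081", nia1_real, nia1_pred),
                         ("Q10RB4", bgal5_real, bgal5_pred)]:
    p, r, f = set_scores(real, pred)
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

For P16081 this gives a precision of 4/7 and a recall of 4/6, while Q10RB4 has perfect precision but misses GO:0005975, the most specific of its three real terms.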
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, J.; Zhang, X.; Huang, K.; Wei, Y.; Guan, X. Grain Protein Function Prediction Based on CNN and Residual Attention Mechanism with AlphaFold2 Structure Data. Appl. Sci. 2025, 15, 1890. https://doi.org/10.3390/app15041890