The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction
Abstract
1. Introduction
2. An Overview of Multiple Sequence Alignment
2.1. Multiple Sequence Alignment for Protein Monomer
2.1.1. Dynamic Programming-Based Pairwise Alignment
2.1.2. Multiple Sequence Alignment
2.1.3. Sequence-Based Approaches for Protein Monomer’s MSA
2.1.4. HMM-Based Approaches for Protein Monomer’s MSA
2.1.5. k-Mer-Based Approaches
2.1.6. Multi-Stage Hybrid Approaches to Search Metagenome
2.1.7. Deep Learning-Based Approaches
2.2. Multiple Sequence Alignment for Protein Complex
2.2.1. Genomic Distance-Based Approaches
2.2.2. Phylogeny-Based Approaches
2.2.3. Protein–Protein Interactions Databases-Based Approaches
2.2.4. Protein Language Models-Based Approaches
2.2.5. Hybrid Approaches for Protein Complex’s MSA
2.3. Multiple Sequence Alignment for RNA
2.3.1. Sequence-Based Approaches for RNA’s MSA
2.3.2. HMM-Based Approaches for RNA’s MSA
2.3.3. Covariance Model-Based Approaches
2.3.4. Hybrid Approaches for RNA’s MSA
2.4. Alternative for MSA in Application Tasks, Protein Language Model
2.4.1. PLMs with MSA as Input
2.4.2. Autoencoding PLMs with Single-Sequence Input
2.4.3. Autoregressive PLM with Single-Sequence Input
2.4.4. Other Types of PLMs
3. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
References
- Wu, S.; Zhang, Y. LOMETS: A local meta-threading-server for protein structure prediction. Nucleic Acids Res. 2007, 35, 3375–3382. [Google Scholar] [CrossRef] [PubMed]
- Söding, J.; Biegert, A.; Lupas, A.N. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005, 33, W244–W248. [Google Scholar] [CrossRef] [PubMed]
- Adhikari, B.; Cheng, J. CONFOLD2: Improved contact-driven ab initio protein structure modeling. BMC Bioinform. 2018, 19, 22. [Google Scholar] [CrossRef] [PubMed]
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef] [PubMed]
- Jones, P.; Binns, D.; Chang, H.Y.; Fraser, M.; Li, W.; McAnulla, C.; McWilliam, H.; Maslen, J.; Mitchell, A.; Nuka, G.; et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 2014, 30, 1236–1240. [Google Scholar] [CrossRef]
- Zhang, C.; Zheng, W.; Freddolino, P.L.; Zhang, Y. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping. J. Mol. Biol. 2018, 430, 2256–2265. [Google Scholar] [CrossRef]
- Chen, K.; Mizianty, M.J.; Kurgan, L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics 2012, 28, 331–341. [Google Scholar] [CrossRef]
- Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402. [Google Scholar] [CrossRef]
- Yang, J.; Roy, A.; Zhang, Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics 2013, 29, 2588–2595. [Google Scholar] [CrossRef]
- Chauhan, J.S.; Rao, A.; Raghava, G.P. In silico platform for prediction of N-, O- and C-glycosites in eukaryotic protein sequences. PLoS ONE 2013, 8, e67008. [Google Scholar] [CrossRef]
- Hwang, S.; Gou, Z.; Kuznetsov, I.B. DP-Bind: A web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 2007, 23, 634–636. [Google Scholar] [CrossRef] [PubMed]
- Paz, I.; Kosti, I.; Ares, M., Jr.; Cline, M.; Mandel-Gutfreund, Y. RBPmap: A web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res. 2014, 42, W361–W367. [Google Scholar] [CrossRef] [PubMed]
- Sang, X.; Xiao, W.; Zheng, H.; Yang, Y.; Liu, T. HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection. Comput. Math. Methods Med. 2020, 2020, 1384749. [Google Scholar] [CrossRef]
- Zaman, R.; Chowdhury, S.Y.; Rashid, M.A.; Sharma, A.; Dehzangi, A.; Shatabda, S. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features. BioMed Res. Int. 2017, 2017, 4590609. [Google Scholar] [CrossRef]
- Disfani, F.M.; Hsu, W.L.; Mizianty, M.J.; Oldfield, C.J.; Xue, B.; Dunker, A.K.; Uversky, V.N.; Kurgan, L. MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins. Bioinformatics 2012, 28, i75–i83. [Google Scholar] [CrossRef]
- Sharma, R.; Kumar, S.; Tsunoda, T.; Patil, A.; Sharma, A. Predicting MoRFs in protein sequences using HMM profiles. BMC Bioinform. 2016, 17, 504. [Google Scholar] [CrossRef]
- Wuyun, Q.; Chen, Y.; Shen, Y.; Cao, Y.; Hu, G.; Cui, W.; Gao, J.; Zheng, W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024, 29, 832. [Google Scholar] [CrossRef]
- Pearson, W.R. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 1990, 183, 63–98. [Google Scholar]
- Hughey, R.; Krogh, A. SAM: Sequence Alignment and Modeling Software System; University of California at Santa Cruz: Santa Cruz, CA, USA, 1995. [Google Scholar]
- Steinegger, M.; Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 2017, 35, 1026–1028. [Google Scholar] [CrossRef]
- Zheng, W.; Wuyun, Q.; Li, Y.; Zhang, C.; Freddolino, P.L.; Zhang, Y. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nat. Methods 2024, 21, 279–289. [Google Scholar] [CrossRef]
- Kaminski, K.; Ludwiczak, J.; Pawlicki, K.; Alva, V.; Dunin-Horkawicz, S. pLM-BLAST: Distant homology detection based on direct comparison of sequence representations from protein language models. Bioinformatics 2023, 39, btad579. [Google Scholar] [CrossRef] [PubMed]
- Hopf, T.A.; Schärfe, C.P.; Rodrigues, J.P.; Green, A.G.; Kohlbacher, O.; Sander, C.; Bonvin, A.M.; Marks, D.S. Sequence co-evolution gives 3D contacts and structures of protein complexes. Elife 2014, 3, e03430. [Google Scholar] [CrossRef] [PubMed]
- Zeng, H.; Wang, S.; Zhou, T.; Zhao, F.; Li, X.; Wu, Q.; Xu, J. ComplexContact: A web server for inter-protein contact prediction using deep learning. Nucleic Acids Res. 2018, 46, W432–W437. [Google Scholar] [CrossRef] [PubMed]
- Liu, Z.; Yu, D.J. cpxDeepMSA: A Deep Cascade Algorithm for Constructing Multiple Sequence Alignments of Protein-Protein Interactions. Int. J. Mol. Sci. 2022, 23, 8459. [Google Scholar] [CrossRef]
- Chen, B.; Xie, Z.; Qiu, J.; Ye, Z.; Xu, J.; Tang, J. Improved the heterodimer protein complex prediction with protein language models. Brief. Bioinform. 2023, 24, bbad221. [Google Scholar] [CrossRef]
- Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T.L. BLAST+: Architecture and applications. BMC Bioinform. 2009, 10, 421. [Google Scholar] [CrossRef]
- Wheeler, T.J.; Eddy, S.R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 2013, 29, 2487–2489. [Google Scholar] [CrossRef]
- Nawrocki, E.P.; Kolbe, D.L.; Eddy, S.R. Infernal 1.0: Inference of RNA alignments. Bioinformatics 2009, 25, 1335–1337. [Google Scholar] [CrossRef]
- Eggenhofer, F.; Hofacker, I.L.; Höner Zu Siederdissen, C. RNAlien–Unsupervised RNA family model construction. Nucleic Acids Res. 2016, 44, 8433–8441. [Google Scholar] [CrossRef]
- Rao, R.M.; Liu, J.; Verkuil, R.; Meier, J.; Canny, J.; Abbeel, P.; Sercu, T.; Rives, A. MSA Transformer. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8844–8856. [Google Scholar]
- Rives, A.; Meier, J.; Sercu, T.; Goyal, S.; Lin, Z.; Liu, J.; Guo, D.; Ott, M.; Zitnick, C.L.; Ma, J.; et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 2021, 118, e2016239118. [Google Scholar] [CrossRef]
- Ferruz, N.; Schmidt, S.; Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 2022, 13, 4348. [Google Scholar] [CrossRef] [PubMed]
- Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7112–7127. [Google Scholar] [CrossRef]
- Needleman, S.B.; Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970, 48, 443–453. [Google Scholar] [CrossRef] [PubMed]
- Xu, Z.; Yang, Y.; Huang, B. A teaching approach from the exhaustive search method to the Needleman–Wunsch algorithm. Biochem. Mol. Biol. Educ. 2017, 45, 194–204. [Google Scholar] [CrossRef] [PubMed]
- Smith, T.F.; Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 1981, 147, 195–197. [Google Scholar] [CrossRef] [PubMed]
- Iovino, B.G.; Ye, Y. Protein embedding based alignment. BMC Bioinform. 2024, 25, 85. [Google Scholar] [CrossRef]
- Pantolini, L.; Studer, G.; Pereira, J.; Durairaj, J.; Tauriello, G.; Schwede, T. Embedding-based alignment: Combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone. Bioinformatics 2024, 40, btad786. [Google Scholar] [CrossRef]
- van Kempen, M.; Kim, S.S.; Tumescheit, C.; Mirdita, M.; Lee, J.; Gilchrist, C.L.M.; Söding, J.; Steinegger, M. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 2024, 42, 243–246. [Google Scholar] [CrossRef]
- Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797. [Google Scholar] [CrossRef]
- Lipman, D.J.; Altschul, S.F.; Kececioglu, J.D. A Tool for Multiple Sequence Alignment. Proc. Natl. Acad. Sci. USA 1989, 86, 4412–4415. [Google Scholar] [CrossRef]
- Bonizzoni, P.; Vedova, G.D. The complexity of multiple sequence alignment with SP-score that is a metric. Theor. Comput. Sci. 2001, 259, 63–79. [Google Scholar] [CrossRef]
- Feng, D.-F.; Doolittle, R.F. Progressive sequence alignment as a prerequisitetto correct phylogenetic trees. J. Mol. Evol. 1987, 25, 351–360. [Google Scholar] [CrossRef] [PubMed]
- Thompson, J.D.; Higgins, D.G.; Gibson, T.J. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22, 4673–4680. [Google Scholar] [CrossRef] [PubMed]
- Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30, 3059–3066. [Google Scholar] [CrossRef]
- Notredame, C.; Higgins, D.G.; Heringa, J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000, 302, 205–217. [Google Scholar] [CrossRef]
- McWhite, C.D.; Armour-Garb, I.; Singh, M. Leveraging protein language models for accurate multiple sequence alignments. Genome Res. 2023, 33, 1145–1153. [Google Scholar] [CrossRef]
- Guindon, S.; Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 2003, 52, 696–704. [Google Scholar] [CrossRef]
- Nguyen, L.-T.; Schmidt, H.A.; von Haeseler, A.; Minh, B.Q. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol. Biol. Evol. 2014, 32, 268–274. [Google Scholar] [CrossRef]
- Kumar, S.; Stecher, G.; Li, M.; Knyaz, C.; Tamura, K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol. Biol. Evol. 2018, 35, 1547–1549. [Google Scholar] [CrossRef]
- Kumar, S.; Stecher, G.; Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 2016, 33, 1870–1874. [Google Scholar] [CrossRef]
- Kumar, S.; Tamura, K.; Jakobsen, I.B.; Nei, M. MEGA2: Molecular evolutionary genetics analysis software. Bioinformatics 2001, 17, 1244–1245. [Google Scholar] [CrossRef] [PubMed]
- Kumar, S.; Tamura, K.; Nei, M. MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers. Comput. Appl. Biosci. 1994, 10, 189–191. [Google Scholar] [CrossRef] [PubMed]
- Lupo, U.; Sgarbossa, D.; Bitbol, A.-F. Protein language models trained on multiple sequence alignments learn phylogenetic relationships. Nat. Commun. 2022, 13, 6298. [Google Scholar] [CrossRef]
- Chao, J.; Tang, F.; Xu, L. Developments in Algorithms for Sequence Alignment: A Review. Biomolecules 2022, 12, 546. [Google Scholar] [CrossRef] [PubMed]
- Lipman, D.J.; Pearson, W.R. Rapid and sensitive protein similarity searches. Science 1985, 227, 1435–1441. [Google Scholar] [CrossRef]
- Dumas, J.-P.; Ninio, J. Efficient algorithms for folding and comparing nucleic acid sequences. Nucleic Acids Res. 1982, 10, 197–206. [Google Scholar] [CrossRef]
- Wilbur, W.J.; Lipman, D.J. Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 1983, 80, 726–730. [Google Scholar] [CrossRef]
- Henikoff, S.; Henikoff, J.G. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 1992, 89, 10915–10919. [Google Scholar] [CrossRef]
- Müller, T.; Spang, R.; Vingron, M. Estimating amino acid substitution models: A comparison of Dayhoff’s estimator, the resolvent approach and a maximum likelihood method. Mol. Biol. Evol. 2002, 19, 8–13. [Google Scholar] [CrossRef]
- Tomii, K.; Yamada, K. Systematic Exploration of an Efficient Amino Acid Substitution Matrix: MIQS. Methods Mol. Biol. 2016, 1415, 211–223. [Google Scholar] [CrossRef]
- Prlić, A.; Domingues, F.S.; Sippl, M.J. Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng. 2000, 13, 545–550. [Google Scholar] [CrossRef] [PubMed]
- Jia, K.; Jernigan, R.L. New amino acid substitution matrix brings sequence alignments into agreement with structure matches. Proteins 2021, 89, 671–682. [Google Scholar] [CrossRef] [PubMed]
- Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. [Google Scholar] [CrossRef]
- Buchfink, B.; Xie, C.; Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60. [Google Scholar] [CrossRef]
- Ma, B.; Tromp, J.; Li, M. PatternHunter: Faster and more sensitive homology search. Bioinformatics 2002, 18, 440–445. [Google Scholar] [CrossRef]
- Park, J.H.; Karplus, K.; Barrett, C.; Hughey, R.; Haussler, D.; Haussler, D.; Hubbard, T.J.P.; Chothia, C. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol. 1998, 284, 1201–1210. [Google Scholar] [CrossRef]
- Eddy, S.R. Hidden Markov models. Curr. Opin. Struct. Biol. 1996, 6, 361–365. [Google Scholar] [CrossRef]
- Hughey, R.; Krogh, A.; Hughey, R.; Krogh, A. Hidden Markov models for sequence analysis. Extension and analysis of the basic method. Bioinformatics 1996, 12, 95–107. [Google Scholar] [CrossRef]
- Karplus, K.; Barrett, C.; Hughey, R. Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998, 14, 846–856. [Google Scholar] [CrossRef]
- Potter, S.C.; Luciani, A.; Eddy, S.R.; Park, Y.; Lopez, R.; Finn, R.D. HMMER web server: 2018 update. Nucleic Acids Res. 2018, 46, W200–W204. [Google Scholar] [CrossRef]
- Madera, M.; Gough, J. A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res. 2002, 30, 4321–4328. [Google Scholar] [CrossRef] [PubMed]
- Barrett, C.; Hughey, R.; Karplus, K. Scoring hidden Markov models. Comput. Appl. Biosci. 1997, 13, 191–199. [Google Scholar] [CrossRef] [PubMed]
- Remmert, M.; Biegert, A.; Hauser, A.; Söding, J. HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 2012, 9, 173–175. [Google Scholar] [CrossRef] [PubMed]
- Söding, J. Protein homology detection by HMM–HMM comparison. Bioinformatics 2005, 21, 951–960. [Google Scholar] [CrossRef]
- Edgar, R.C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26, 2460–2461. [Google Scholar] [CrossRef]
- Edgar, R. Local homology recognition and distance measures in linear time using compressed amino acid alphabets. Nucleic Acids Res. 2004, 32, 380–385. [Google Scholar] [CrossRef]
- Chao, K.-M.; Pearson, W.; Miller, W. Aligning two sequences within a specified diagonal band. Comput. Appl. Biosci. 1992, 8, 481–487. [Google Scholar] [CrossRef]
- Ovchinnikov, S.; Park, H.; Varghese, N.; Huang, P.-S.; Pavlopoulos, G.A.; Kim, D.E.; Kamisetty, H.; Kyrpides, N.C.; Baker, D. Protein structure determination using metagenome sequence data. Science 2017, 355, 294–298. [Google Scholar] [CrossRef]
- Zhang, C.; Zheng, W.; Mortuza, S.M.; Li, Y.; Zhang, Y. DeepMSA: Constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics 2020, 36, 2105–2112. [Google Scholar] [CrossRef]
- Johnson, L.S.; Eddy, S.R.; Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinform. 2010, 11, 431. [Google Scholar] [CrossRef]
- Peng, Z.; Wang, W.; Wei, H.; Li, X.; Yang, J. Improved protein structure prediction with trRosettaX2, AlphaFold2, and optimized MSAs in CASP15. Proteins 2023, 91, 1704–1711. [Google Scholar] [CrossRef] [PubMed]
- Du, Z.; Peng, Z.; Yang, J. Toward the assessment of predicted inter-residue distance. Bioinformatics 2022, 38, 962–969. [Google Scholar] [CrossRef] [PubMed]
- Mistry, J.; Finn, R.D.; Eddy, S.R.; Bateman, A.; Punta, M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 2013, 41, e121. [Google Scholar] [CrossRef] [PubMed]
- Mirdita, M.; von den Driesch, L.; Galiez, C.; Martin, M.J.; Söding, J.; Steinegger, M. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 2017, 45, D170–D176. [Google Scholar] [CrossRef] [PubMed]
- Zhang, L.; Chen, J.; Shen, T.; Li, Y.; Sun, S. Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence Alignment Generation. arXiv 2023, arXiv:2306.01824. [Google Scholar]
- Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Liu, W.; Wang, Z.; You, R.; Xie, C.; Wei, H.; Xiong, Y.; Yang, J.; Zhu, S. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat. Commun. 2024, 15, 2775. [Google Scholar] [CrossRef]
- Jones, S.; Thornton, J.M. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA 1996, 93, 13–20. [Google Scholar] [CrossRef]
- Ovchinnikov, S.; Kamisetty, H.; Baker, D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. Elife 2014, 3, e02030. [Google Scholar] [CrossRef]
- Evans, R.; O’Neill, M.; Pritzel, A.; Antropova, N.; Senior, A.; Green, T.; Žídek, A.; Bates, R.; Blackwell, S.; Yim, J.; et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2022. [Google Scholar] [CrossRef]
- Szklarczyk, D.; Morris, J.H.; Cook, H.; Kuhn, M.; Wyder, S.; Simonovic, M.; Santos, A.; Doncheva, N.T.; Roth, A.; Bork, P.; et al. The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017, 45, D362–D368. [Google Scholar] [CrossRef]
- Harrison, P.W.; Alako, B.; Amid, C.; Cerdeño-Tárraga, A.; Cleland, I.; Holt, S.; Hussein, A.; Jayathilaka, S.; Kay, S.; Keane, T.; et al. The European Nucleotide Archive in 2018. Nucleic Acids Res. 2019, 47, D84–D88. [Google Scholar] [CrossRef]
- Federhen, S. The NCBI Taxonomy database. Nucleic Acids Res. 2012, 40, D136–D143. [Google Scholar] [CrossRef]
- UniProt Consortium, T. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 2018, 46, 2699. [Google Scholar] [CrossRef]
- Lupo, U.; Sgarbossa, D.; Bitbol, A.-F. Pairing interacting protein sequences using masked language modeling. arXiv 2023, arXiv:2308.07136. [Google Scholar] [CrossRef]
- Liu, J.; Guo, Z.; Wu, T.; Roy, R.S.; Quadir, F.; Chen, C.; Cheng, J. Enhancing alphafold-multimer-based protein complex structure prediction with MULTICOM in CASP15. Commun. Biol. 2023, 6, 1140. [Google Scholar] [CrossRef]
- Suzek, B.E.; Huang, H.; McGarvey, P.; Mazumder, R.; Wu, C.H. UniRef: Comprehensive and non-redundant UniProt reference clusters. Bioinformatics 2007, 23, 1282–1288. [Google Scholar] [CrossRef]
- UniProt Consortium, T. UniProt: A worldwide hub of protein knowledge. Nucleic Acids Res. 2019, 47, D506–D515. [Google Scholar] [CrossRef]
- Markowitz, V.M.; Ivanova, N.N.; Szeto, E.; Palaniappan, K.; Chu, K.; Dalevi, D.; Chen, I.M.; Grechkin, Y.; Dubchak, I.; Anderson, I.; et al. IMG/M: A data management and analysis system for metagenomes. Nucleic Acids Res. 2008, 36, D534–D538. [Google Scholar] [CrossRef]
- Liu, J.; Guo, Z.; Wu, T.; Roy, R.S.; Chen, C.; Cheng, J. Improving AlphaFold2-based protein tertiary structure prediction with MULTICOM in CASP15. Commun. Chem. 2023, 6, 188. [Google Scholar] [CrossRef]
- Hofacker, I.L.; Bernhart, S.H.; Stadler, P.F. Alignment of RNA base pairing probability matrices. Bioinformatics 2004, 20, 2222–2227. [Google Scholar] [CrossRef]
- Dowell, R.D.; Eddy, S.R. Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinform. 2006, 7, 400. [Google Scholar] [CrossRef] [PubMed]
- Nawrocki, E.P.; Eddy, S.R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 2013, 29, 2933–2935. [Google Scholar] [CrossRef] [PubMed]
- Zhang, T.; Singh, J.; Litfin, T.; Zhan, J.; Paliwal, K.; Zhou, Y. RNAcmap: A fully automatic pipeline for predicting contact maps of RNAs by evolutionary coupling analysis. Bioinformatics 2021, 37, 3494–3500. [Google Scholar] [CrossRef] [PubMed]
- Lorenz, R.; Bernhart, S.H.; Höner Zu Siederdissen, C.; Tafer, H.; Flamm, C.; Stadler, P.F.; Hofacker, I.L. ViennaRNA Package 2.0. Algorithms Mol. Biol. 2011, 6, 26. [Google Scholar] [CrossRef]
- Singh, J.; Hanson, J.; Paliwal, K.; Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 2019, 10, 5407. [Google Scholar] [CrossRef]
- Hanumanthappa, A.K.; Singh, J.; Paliwal, K.; Singh, J.; Zhou, Y. Single-sequence and profile-based prediction of RNA solvent accessibility using dilated convolutional neural network. Bioinformatics 2021, 36, 5169–5176. [Google Scholar] [CrossRef]
- Zhang, C.; Zhang, Y.; Pyle, A.M. rMSA: A Sequence Search and Alignment Algorithm to Improve RNA Structure Modeling. J. Mol. Biol. 2023, 435, 167904. [Google Scholar] [CrossRef]
- Weinreb, C.; Riesselman, A.J.; Ingraham, J.B.; Gross, T.; Sander, C.; Marks, D.S. 3D RNA and Functional Interactions from Evolutionary Couplings. Cell 2016, 165, 963–975. [Google Scholar] [CrossRef]
- Steinegger, M.; Meier, M.; Mirdita, M.; Vöhringer, H.; Haunsberger, S.J.; Söding, J. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinform. 2019, 20, 473. [Google Scholar] [CrossRef]
- Ho, J.; Kalchbrenner, N.; Weissenborn, D.; Salimans, T. Axial attention in multidimensional transformers. arXiv 2019, arXiv:1912.12180. [Google Scholar]
- Ram, S.; Bepler, T. Few Shot Protein Generation. arXiv 2022, arXiv:2204.01168. [Google Scholar]
- Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130. [Google Scholar] [CrossRef] [PubMed]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. Albert: A lite bert for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
- Clark, K.; Luong, M.-T.; Le, Q.V.; Manning, C.D. Electra: Pre-training text encoders as discriminators rather than generators. arXiv 2020, arXiv:2003.10555. [Google Scholar]
- Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110. [Google Scholar] [CrossRef]
- Su, J.; Han, C.; Zhou, Y.; Shan, J.; Zhou, X.; Yuan, F. SaProt: Protein Language Modeling with Structure-aware Vocabulary. bioRxiv 2023. [Google Scholar] [CrossRef]
- Oord, A.v.d.; Vinyals, O.; Kavukcuoglu, K. Neural Discrete Representation Learning. arXiv 2017, arXiv:1711.00937. [Google Scholar]
- Yang, K.K.; Zanichelli, N.; Yeh, H. Masked inverse folding with sequence transfer for protein representation learning. Protein Eng. Des. Sel. 2023, 36, gzad015. [Google Scholar] [CrossRef]
- Frazer, J.; Notin, P.; Dias, M.; Gomez, A.; Min, J.K.; Brock, K.; Gal, Y.; Marks, D.S. Disease variant prediction with deep generative models of evolutionary data. Nature 2021, 599, 91–95. [Google Scholar] [CrossRef]
- Chowdhury, R.; Bouatta, N.; Biswas, S.; Floristean, C.; Kharkar, A.; Roy, K.; Rochereau, C.; Ahdritz, G.; Zhang, J.; Church, G.M.; et al. Single-sequence protein structure prediction using a language model and deep learning. Nat. Biotechnol. 2022, 40, 1617–1623. [Google Scholar] [CrossRef] [PubMed]
- Wu, R.; Ding, F.; Wang, R.; Shen, R.; Zhang, X.; Luo, S.; Su, C.; Wu, Z.; Xie, Q.; Berger, B.; et al. High-resolution de novo structure prediction from primary sequence. bioRxiv 2022. [Google Scholar] [CrossRef]
- Leinonen, R.; Diez, F.G.; Binns, D.; Fleischmann, W.; Lopez, R.; Apweiler, R. UniProt archive. Bioinformatics 2004, 20, 3236–3237. [Google Scholar] [CrossRef] [PubMed]
- Joshi, M.; Chen, D.; Liu, Y.; Weld, D.S.; Zettlemoyer, L.; Levy, O. Spanbert: Improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Linguist. 2020, 8, 64–77. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef]
- Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.V.; Salakhutdinov, R. Transformer-xl: Attentive language models beyond a fixed-length context. arXiv 2019, arXiv:1901.02860. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. In Proceedings of the Advances in Neural Information Processing Systems (NIPS’19), Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Madani, A.; Krause, B.; Greene, E.R.; Subramanian, S.; Mohr, B.P.; Holton, J.M.; Olmos, J.L., Jr.; Xiong, C.; Sun, Z.Z.; Socher, R.; et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 2023, 41, 1099–1106. [Google Scholar] [CrossRef]
- UniProt Consortium, T. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2007, 35, D193–D197. [Google Scholar] [CrossRef]
- Finn, R.D.; Bateman, A.; Clements, J.; Coggill, P.; Eberhardt, R.Y.; Eddy, S.R.; Heger, A.; Hetherington, K.; Holm, L.; Mistry, J.; et al. Pfam: The protein families database. Nucleic Acids Res. 2014, 42, D222–D230. [Google Scholar] [CrossRef]
- Nijkamp, E.; Ruffolo, J.A.; Weinstein, E.N.; Naik, N.; Madani, A. ProGen2: Exploring the boundaries of protein language models. Cell Syst. 2023, 14, 968–978.e963. [Google Scholar] [CrossRef]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Hesslow, D.; Zanichelli, N.; Notin, P.; Poli, I.; Marks, D. RITA: A Study on Scaling Up Generative Protein Sequence Models. arXiv 2022, arXiv:2205.05789. [Google Scholar]
- Notin, P.; Dias, M.; Frazer, J.; Marchena-Hurtado, J.; Gomez, A.; Marks, D.S.; Gal, Y. Tranception: Protein fitness prediction with autoregressive transformers and inference-time retrieval. arXiv 2022, arXiv:2205.13760. [Google Scholar]
- Meier, J.; Rao, R.; Verkuil, R.; Liu, J.; Sercu, T.; Rives, A. Language models enable zero-shot prediction of the effects of mutations on protein function. bioRxiv 2021. [Google Scholar] [CrossRef]
- Chen, B.; Cheng, X.; Li, P.; Geng, Y.-a.; Gong, J.; Li, S.; Bei, Z.; Tan, X.; Wang, B.; Zeng, X.; et al. xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein. arXiv 2024, arXiv:2401.06199. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv 2019, arXiv:1910.10683. [Google Scholar]
- Iovino, B.G.; Tang, H.; Ye, Y. Protein domain embeddings for fast and accurate similarity search. Genome Res. 2024, 34, 1434–1444. [Google Scholar] [CrossRef]
- Yang, P.; Zheng, W.; Ning, K.; Zhang, Y. Decoding the link of microbiome niches with homologous sequences enables accurately targeted protein structure prediction. Proc. Natl. Acad. Sci. USA 2021, 118, e2110828118. [Google Scholar] [CrossRef]
- Gil, N.; Fiser, A. The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis. Bioinformatics 2019, 35, 12–19. [Google Scholar] [CrossRef]
- Jones, D.T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 1999, 292, 195–202. [Google Scholar] [CrossRef]
- Wu, S.; Zhang, Y. ANGLOR: A composite machine-learning algorithm for protein backbone torsion angle prediction. PLoS ONE 2008, 3, e3400. [Google Scholar] [CrossRef] [PubMed]
- Adhikari, B.; Hou, J.; Cheng, J. DNCON2: Improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 2018, 34, 1466–1472. [Google Scholar] [CrossRef] [PubMed]
- Hanson, J.; Paliwal, K.; Litfin, T.; Yang, Y.; Zhou, Y. Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks. Bioinformatics 2018, 34, 4039–4045. [Google Scholar] [CrossRef]
- He, B.; Mortuza, S.M.; Wang, Y.; Shen, H.-B.; Zhang, Y. NeBcon: Protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics 2017, 33, 2296–2306. [Google Scholar] [CrossRef] [PubMed]
- Wang, S.; Sun, S.; Li, Z.; Zhang, R.; Xu, J. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput. Biol. 2017, 13, e1005324. [Google Scholar] [CrossRef]
- Wu, S.; Zhang, Y. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 2008, 72, 547–556. [Google Scholar] [CrossRef]
- Zheng, W.; Zhang, C.; Wuyun, Q.; Pearce, R.; Li, Y.; Zhang, Y. LOMETS2: Improved meta-threading server for fold-recognition and structure-based function annotation for distant-homology proteins. Nucleic Acids Res. 2019, 47, W429–W436. [Google Scholar] [CrossRef]
- Weigt, M.; White, R.A.; Szurmant, H.; Hoch, J.A.; Hwa, T. Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Natl. Acad. Sci. USA 2009, 106, 67–72. [Google Scholar] [CrossRef]
- Bitbol, A.F.; Dwyer, R.S.; Colwell, L.J.; Wingreen, N.S. Inferring interaction partners from protein sequences. Proc. Natl. Acad. Sci. USA 2016, 113, 12180–12185. [Google Scholar] [CrossRef]
- Szurmant, H.; Weigt, M. Inter-residue, inter-protein and inter-family coevolution: Bridging the scales. Curr. Opin. Struct. Biol. 2018, 50, 26–32. [Google Scholar] [CrossRef]
- Gueudré, T.; Baldassi, C.; Zamparo, M.; Weigt, M.; Pagnani, A. Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc. Natl. Acad. Sci. USA 2016, 113, 12186–12191. [Google Scholar] [CrossRef] [PubMed]
- Sankoff, D. Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math. 1985, 45, 810–825. [Google Scholar] [CrossRef]
- Mathews, D.H.; Turner, D.H. Dynalign: An algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol. 2002, 317, 191–203. [Google Scholar] [CrossRef] [PubMed]
- Will, S.; Reiche, K.; Hofacker, I.L.; Stadler, P.F.; Backofen, R. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Comput. Biol. 2007, 3, e65. [Google Scholar] [CrossRef]
- Baek, M.; McHugh, R.; Anishchenko, I.; Jiang, H.; Baker, D.; DiMaio, F. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 2024, 21, 117–121. [Google Scholar] [CrossRef]
- Pearce, R.; Omenn, G.S.; Zhang, Y. De Novo RNA Tertiary Structure Prediction at Atomic Resolution Using Geometric Potentials from Deep Learning. bioRxiv 2022. [Google Scholar] [CrossRef]
- Wang, W.; Feng, C.; Han, R.; Wang, Z.; Ye, L.; Du, Z.; Wei, H.; Zhang, F.; Peng, Z.; Yang, J. trRosettaRNA: Automated prediction of RNA 3D structure with transformer network. Nat. Commun. 2023, 14, 7266. [Google Scholar] [CrossRef]
- Gainza, P.; Nisonoff, H.M.; Donald, B.R. Algorithms for protein design. Curr. Opin. Struct. Biol. 2016, 39, 16–26. [Google Scholar] [CrossRef]
- Lapedes, A.S.; Giraud, B.G.; Liu, L.; Stormo, G.D. Correlated Mutations in Models of Protein Sequences: Phylogenetic and Structural Effects; Lecture Notes-Monograph Series; Institute of Mathematical Statistics: Hayward, CA, USA, 1999; pp. 236–256. [Google Scholar]
- Hopf, T.A.; Green, A.G.; Schubert, B.; Mersmann, S.; Schärfe, C.P.I.; Ingraham, J.B.; Toth-Petroczy, A.; Brock, K.; Riesselman, A.J.; Palmedo, P.; et al. The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics 2019, 35, 1582–1584. [Google Scholar] [CrossRef]
- Morcos, F.; Pagnani, A.; Lunt, B.; Bertolino, A.; Marks, D.S.; Sander, C.; Zecchina, R.; Onuchic, J.N.; Hwa, T.; Weigt, M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl. Acad. Sci. USA 2011, 108, E1293–E1301. [Google Scholar] [CrossRef]
- Michaud, J.M.; Madani, A.; Fraser, J.S. A language model beats alphafold2 on orphans. Nat. Biotechnol. 2022, 40, 1576–1577. [Google Scholar] [CrossRef]
Objective | Classification | Advantages | Limitations |
---|---|---|---|
MSA for protein monomer | Sequence-based approaches | Such methods perform well on short sequences or sequences with high similarity. | Such methods have limited sensitivity to distantly related homologous sequences. |
MSA for protein monomer | HMM-based approaches | Such methods significantly improve sensitivity and alignment quality, allowing for better capture of distant homology. | On very large databases, such methods can be slow, especially during complex model training and alignment. |
MSA for protein monomer | k-mer-based approaches | Such methods enable fast and accurate searching of large-scale databases, further enhancing speed and sensitivity. | The precision of the resulting MSAs can still be improved. |
MSA for protein monomer | Multi-stage hybrid approaches | Such methods enable fast and highly sensitive exploration of metagenomic databases, integrating multiple specialized tools to generate optimal MSAs. | The algorithms are complex and require substantial computational resources. |
MSA for protein monomer | Deep learning-based approaches | Such methods significantly improve sensitivity in identifying homologous query-target pairs with low sequence identity but high structural similarity. | In local mode, alignments are often shorter yet more accurate, and their evolutionary significance remains to be explored. |
MSA for protein complex | Genomic distance-based approaches | The algorithm is simple and intuitive and requires no additional information. | Such methods are better suited to prokaryotes. |
MSA for protein complex | Phylogeny-based approaches | Such methods address a weakness of genomic distance-based methods in eukaryotes, where a single MSA rich in paralogs can prevent those methods from identifying potential interactions. | The abundant homologous sequences in metagenomic databases cannot be fully utilized to guide the assembly of multi-chain structures. |
MSA for protein complex | Protein–protein interactions databases-based approaches | Integrating protein interaction databases for MSA refinement can help produce more stable results. | Such MSA construction methods are hand-crafted and effective only in specific domains. |
MSA for protein complex | PLM-based approaches | Such methods enable highly automated MSA concatenation. | Their feasibility and effectiveness in practical applications remain to be evaluated. |
MSA for protein complex | Hybrid approaches | Such methods integrate various homology detection strategies and monomer MSA concatenation techniques to achieve high-quality, deep, and versatile MSA construction. | The construction of MSAs for heteromeric complexes requires further improvement. |
MSA for RNA | Sequence-based approaches | Such methods perform well on short sequences or sequences with high similarity. | Such methods have limited sensitivity to distantly related homologous sequences. |
MSA for RNA | HMM-based approaches | HMM-based methods capture remote homologous relationships better than sequence-based methods. | These methods do not use RNA secondary structure information. |
MSA for RNA | CM-based approaches | CM-based approaches use conserved secondary structure features as supplementary information, which is particularly important for identifying functionally similar RNA molecules with significant sequence divergence. | These methods rely on predefined consensus models, and their performance may be suboptimal when applied to unknown RNA sequences. |
MSA for RNA | Hybrid approaches | These methods integrate various MSA techniques to achieve high-quality, deep, and versatile MSA construction. | The algorithms are complex and require substantial computational resources. |
PLMs | With MSA as input | Compared with single-sequence input, such methods yield better performance on downstream tasks. | The demand for computational resources is higher. |
PLMs | Autoencoding objectives with single-sequence input | Such methods implicitly and more effectively capture the evolutionary and co-evolutionary information of sequences, reducing time costs. Bidirectional learning under autoencoding objectives is better at learning the contextual relationships of amino acids. | PLM-based methods with autoencoding objectives perform comparably to MSA-based methods in general protein understanding tasks but exhibit relatively lower accuracy in structure prediction. |
PLMs | Autoregressive objectives with single-sequence input | Autoregressive objectives are more suitable for protein generation tasks. | These methods do not adequately capture the complex global interactions of amino acids. |
PLMs | Others | These methods combine the advantages of both autoencoding and autoregressive objectives. | These methods lack designs specifically tailored to the features of protein sequences. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, C.; Wang, Q.; Li, Y.; Teng, A.; Hu, G.; Wuyun, Q.; Zheng, W. The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction. Biomolecules 2024, 14, 1531. https://doi.org/10.3390/biom14121531