Post-Alignment Adjustment and Its Automation
Abstract
1. Introduction
2. Criteria and Methods Used to Identify Suboptimal Sites in Alignments
2.1. Sum-of-Pairs Score (SPS)
2.2. Pairwise Alignment Inconsistency Index (PAI)
2.3. Position Weight Matrix Differential (PWMD)
3. A Comparison of Methods with Huntingtin Sequence Alignment
4. Discussion
5. Conclusions
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| | T/-(1) | T/T(1) | T/I(1) | I/-(1) | SPS |
|---|---|---|---|---|---|
| Score(2) | −6 | 5 | −1 | −6 | |
| Alignment 1 | 10 | 6 | 4 | | −34 + C(3) |
| Alignment 2 | 6 | 10 | | 4 | −10 + C(3) |
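The SPS entries in the table above can be checked as a weighted sum of pair counts and pair scores. The sketch below is only an illustration of that arithmetic: the names `PAIR_SCORES`, `PAIR_COUNTS`, and `partial_sps` are hypothetical, the constant C is left symbolic, and the blank count cells are assumed to be zero (I/- for Alignment 1 and T/I for Alignment 2, the only placement consistent with the reported totals).

```python
# Pair scores from the "Score" row of the table.
PAIR_SCORES = {"T/-": -6, "T/T": 5, "T/I": -1, "I/-": -6}

# Pair counts per alignment. Zeros are ASSUMED for the blank cells:
# they are the only values that reproduce the reported SPS totals.
PAIR_COUNTS = {
    "Alignment 1": {"T/-": 10, "T/T": 6, "T/I": 4, "I/-": 0},
    "Alignment 2": {"T/-": 6, "T/T": 10, "T/I": 0, "I/-": 4},
}


def partial_sps(counts, scores):
    """Sum count * score over all pair types (excludes the constant C)."""
    return sum(n * scores[pair] for pair, n in counts.items())


sps = {name: partial_sps(c, PAIR_SCORES) for name, c in PAIR_COUNTS.items()}
for name, value in sps.items():
    print(f"{name}: {value} + C")
```

Under these assumptions the sums are −34 + C and −10 + C, matching the SPS column, so Alignment 2 has the higher sum-of-pairs score whatever the value of C.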
AA | Alignment 1, Site 20 | Alignment 1, Site 21 | Alignment 2, Site 20 | Alignment 2, Site 21 |
---|---|---|---|---|
A | −3.4621 | −3.4621 | −3.4621 | −3.4621 | |
R | −3.4632 | −3.4632 | −3.4632 | −3.4632 | |
N | −3.4620 | −3.4620 | −3.4620 | −3.4620 | |
D | −3.4625 | −3.4625 | −3.4625 | −3.4625 | |
C | −3.4757 | −3.4757 | −3.4757 | −3.4757 | |
Q | −3.4632 | −3.4632 | −3.4632 | −3.4632 | |
E | −3.4616 | −3.4616 | −3.4616 | −3.4616 | |
G | −3.4625 | −3.4625 | −3.4625 | −3.4625 | |
H | −3.4673 | −3.4673 | −3.4673 | −3.4673 | |
I | −3.4628 | 2.9353 | −3.4628 | 2.9353 | |
L | −3.4612 | −3.4612 | −3.4612 | −3.4612 | |
K | −3.4624 | −3.4624 | −3.4624 | −3.4624 | |
M | −3.4645 | −3.4645 | −3.4645 | −3.4645 | |
F | −3.4629 | −3.4629 | −3.4629 | −3.4629 | |
P | −3.4628 | −3.4628 | −3.4628 | −3.4628 | |
S | −3.4619 | −3.4619 | −3.4619 | −3.4619 | |
T | 4.1089 | 3.5976 | 4.2457 | 3.3770 | |
W | −3.4649 | −3.4649 | −3.4649 | −3.4649 | |
Y | −3.4632 | −3.4632 | −3.4632 | −3.4632 | |
V | −3.4621 | −3.4621 | −3.4621 | −3.4621 |
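For readers unfamiliar with how per-site scores of this kind arise, the following is a minimal sketch of one common textbook form of a position weight matrix column: a log2 odds ratio of site frequency to background frequency, with a small pseudocount so that unobserved residues receive a finite negative score. It is not the article's exact formula and will not reproduce the values in the table; the function name `pwm_column`, the pseudocount of 0.01, the four-letter toy background, and the example column are all assumptions made for illustration.

```python
import math
from collections import Counter


def pwm_column(residues, background, pseudocount=0.01):
    """Log2 odds scores for one alignment column (gaps are ignored).

    ASSUMED form: score = log2(((count + a * p_bg) / (n + a)) / p_bg),
    where a is the pseudocount and p_bg the background frequency.
    """
    observed = [r for r in residues if r != "-"]
    counts = Counter(observed)
    n = len(observed)
    scores = {}
    for aa, p_bg in background.items():
        p_site = (counts.get(aa, 0) + pseudocount * p_bg) / (n + pseudocount)
        scores[aa] = math.log2(p_site / p_bg)
    return scores


# Hypothetical toy background and column, not taken from the article.
background = {"T": 0.25, "I": 0.25, "A": 0.25, "S": 0.25}
column = list("TTTI-")
scores = pwm_column(column, background)
for aa, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {s:.4f}")
```

The qualitative pattern is the same as in the table: residues present at a site score positive or near zero, while every absent residue receives nearly the same strongly negative score, set mostly by the pseudocount.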
Alignment | LG (substitution matrix) | JTT (substitution matrix) | BLOSUM62 (substitution matrix) |
---|---|---|---|
Figure 3A | −126.6903 | −122.6004 | −126.7423 |
Figure 3B | −106.7703 | −105.2280 | −106.9387 |
Site | Q | P |
---|---|---|
18 | −0.0374 | −4.3223 |
19 | −0.0374 | −4.3223 |
20 | −0.0374 | −4.3223 |
21 | −0.0374 | −4.3223 |
22 | −0.0374 | −4.3223 |
23 | −0.0374 | −4.3223 |
24 | −0.0374 | −4.3223 |
25 | 1.4974 | −4.3223 |
26 | 1.4974 | −4.3223 |
27 | 2.2241 | −4.3223 |
28 | 4.2125 | −4.3223 |
29 | 4.2125 | −4.3223 |
30 | 4.2125 | −4.3223 |
31 | 4.2125 | −4.3223 |
32 | 4.2125 | −4.3223 |
33 | 4.2125 | −4.3223 |
34 | 4.1387 | −4.3223 |
35 | 4.0609 | −4.3223 |
36 | 4.0609 | −4.3223 |
37 | 4.0609 | −4.3223 |
38 | 2.8964 | 2.5727 |
39 | −0.0374 | −4.3223 |
40 | −4.3224 | −0.1637 |
41 | −4.3224 | −0.1637 |
42 | −4.3224 | 0.7953 |
43 | −4.3224 | 0.7953 |
44 | 0.9251 | 3.4601 |
© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xia, X. Post-Alignment Adjustment and Its Automation. Genes 2021, 12, 1809. https://doi.org/10.3390/genes12111809