Evolutionary Characterization of the Short Protein SPAAR
Abstract
1. Introduction
2. Materials and Methods
2.1. Initial Homology Search
2.2. Syntenic Alignments
2.2.1. Curation of Precomputed LastZ Alignments
2.2.2. Identification of SPAAR ORFs from Syntenic Alignments
2.2.3. Identification of Unannotated HRCT1 Orthologs
2.3. Gene Expression Analysis
2.4. Remote Homology Detection
2.5. Conservation Analyses in Mammalian Lineages
2.5.1. Multiple Sequence Alignments and Guide Trees
2.5.2. Conservation Analyses in Mammalian Lineages
2.6. Structural Predictions
2.7. Analyses of Long SPAAR ORF
2.7.1. Addition of Primate Sequences
2.7.2. Ancestral Sequence Reconstruction
2.7.3. Site-Specific Positive Selection Test
3. Results
3.1. Identification of SPAAR Orthologs Outside of Placental Mammals
3.2. Sequence Divergence and Structural Conservation of SPAAR Orthologs
3.3. Emergence of Long SPAAR Isoform from Noncoding Sequences in Primates
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Species 1 | Species 2 | N | S | dN | dS | dN/dS | p-Value |
---|---|---|---|---|---|---|---|
Human | Mouse | 170.1 | 54.9 | 0.1795 | 1.2873 | 0.1395 (±0.0545) | 5.4222 × 10^−7 * |
Koala | Wombat | 167.4 | 57.6 | 0.0426 | 0.0997 | 0.4274 (±0.267) | 4.177 × 10^−1 |
Tasmanian Devil | Wombat | 168 | 57 | 0.0545 | 0.3203 | 0.1701 (±0.0816) | 1.004 × 10^−3 * |
Tasmanian Devil | Koala | 166.9 | 58.1 | 0.061 | 0.4442 | 0.1373 (±0.0618) | 3.410 × 10^−5 * |
Opossum | Wombat | 168.3 | 56.7 | 0.0826 | 0.296 | 0.2789 (±0.1211) | 1.707 × 10^−2 * |
Opossum | Koala | 168.8 | 56.2 | 0.0881 | 0.3333 | 0.2642 (±0.1117) | 8.962 × 10^−3 * |
Opossum | Tasmanian Devil | 162.2 | 62.8 | 0.071 | 0.4726 | 0.1504 (±0.0639) | 3.429 × 10^−5 * |
Platypus | Echidna | 162.8 | 71.2 | 0.0523 | 0.1289 | 0.4061 (±0.2110) | 2.481 × 10^−1 |
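
The dN/dS column can be sanity-checked against the dN and dS columns. The following is a minimal sketch, not the authors' pipeline: it recomputes each ratio from the rounded table values and compares it with the reported figure. The tolerance of 1e-3 is an assumption chosen to absorb rounding in the published dN and dS values. Reading N and S as the numbers of nonsynonymous and synonymous sites, so that (N + S) / 3 approximates the aligned codon count, is likewise an assumption based on common usage rather than something stated in the table.

```python
# Rows copied from the table above:
# (species 1, species 2, N, S, dN, dS, reported dN/dS)
ROWS = [
    ("Human", "Mouse", 170.1, 54.9, 0.1795, 1.2873, 0.1395),
    ("Koala", "Wombat", 167.4, 57.6, 0.0426, 0.0997, 0.4274),
    ("Tasmanian Devil", "Wombat", 168.0, 57.0, 0.0545, 0.3203, 0.1701),
    ("Tasmanian Devil", "Koala", 166.9, 58.1, 0.0610, 0.4442, 0.1373),
    ("Opossum", "Wombat", 168.3, 56.7, 0.0826, 0.2960, 0.2789),
    ("Opossum", "Koala", 168.8, 56.2, 0.0881, 0.3333, 0.2642),
    ("Opossum", "Tasmanian Devil", 162.2, 62.8, 0.0710, 0.4726, 0.1504),
    ("Platypus", "Echidna", 162.8, 71.2, 0.0523, 0.1289, 0.4061),
]


def check_rows(rows, tol=1e-3):
    """Recompute dN/dS per row and flag whether it matches the reported value.

    tol is a hypothetical tolerance for rounding in the published columns.
    Returns a list of dicts, one per species pair.
    """
    results = []
    for sp1, sp2, n_sites, s_sites, dn, ds, reported in rows:
        ratio = dn / ds
        results.append({
            "pair": f"{sp1} vs {sp2}",
            "recomputed": ratio,
            "reported": reported,
            "consistent": abs(ratio - reported) <= tol,
            # Assumption: N + S is the total number of sites, three per codon.
            "codons": round((n_sites + s_sites) / 3),
        })
    return results


if __name__ == "__main__":
    for row in check_rows(ROWS):
        status = "ok" if row["consistent"] else "MISMATCH"
        print(f'{row["pair"]}: recomputed {row["recomputed"]:.4f}, '
              f'reported {row["reported"]:.4f}, ~{row["codons"]} codons, {status}')
```

Small differences in the fourth decimal place are expected, because the published dN and dS values are themselves rounded.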
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, J.; Wacholder, A.; Carvunis, A.-R. Evolutionary Characterization of the Short Protein SPAAR. Genes 2021, 12, 1864. https://doi.org/10.3390/genes12121864