SpliceProt 2.0: A Sequence Repository of Human, Mouse, and Rat Proteoforms
Abstract
1. Introduction
2. Results
2.1. SpliceProt 2.0 Sequence Diversity at Transcript and Protein Levels
2.2. Purging SpliceProt 2.0 Protein Sequences Predicted to Be Targeted for the NMD Pathway and First Methionine Sequence Selection
2.3. The SpliceProt Release 2.0
2.4. SpliceProt 2.0 Performance against Other Databases for Proteotypic Peptide Detection
2.5. Identification of Orthologous Proteoforms
2.5.1. Relationships between the Orthologous Proteoforms, Shotgun Proteomics Analysis, and Transcript Quantification
2.5.2. Orthologous NMD Pathway Targets
2.6. Web Repository—User Interface
2.6.1. Search Tab
2.6.2. Download Tab
2.6.3. Submit Query Tab
3. Discussion
4. Materials and Methods
4.1. SpliceProt 2.0 Construction
4.1.1. Computational Translation of Predicted Transcripts
4.1.2. Identification of Hypothetical Transcripts Predicted to Be Susceptible to Degradation by the Nonsense-Mediated Decay Pathway
4.2. Using SpliceProt 2.0 for Shotgun Proteomic Analysis
4.2.1. Database Search Using Publicly Available Shotgun Proteomics Data
4.2.2. Proposed Strategy to Proteotypic Peptide Identification after Peptide Spectrum Match Search
4.2.3. Benchmarking for Classic Shotgun Proteomics Analysis Using Known Databases
4.2.4. SpliceProt, OpenProt, and UniProtKB Database Comparisons
4.3. Healthy Liver RNA-Seq Analysis
4.4. Identification of Orthologous Proteoforms
4.5. Web Interface Implementation
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| | Human | Mouse | Rat |
---|---|---|---|
Number of transcripts | 242,578 | 135,694 | 37,453 |
Number of transcripts selected for computational translation | 203,709 | 115,321 | 34,868 |
Number of polypeptide sequences obtained after computational translation | 120,964 | 74,702 | 24,739 |
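
The counts in the table above imply how much of each species' transcript set is carried through each step. A minimal Python sketch of that arithmetic follows; the `COUNTS` layout and the `retention` helper are illustrative names introduced here, not part of SpliceProt, and the second percentage is simply the ratio of polypeptide sequences to selected transcripts.

```python
from typing import Tuple

# Counts copied from the table above:
# (transcripts, transcripts selected for translation, polypeptide sequences obtained).
COUNTS = {
    "human": (242_578, 203_709, 120_964),
    "mouse": (135_694, 115_321, 74_702),
    "rat": (37_453, 34_868, 24_739),
}


def retention(total: int, selected: int, translated: int) -> Tuple[float, float]:
    """Return (% of transcripts selected, polypeptides as a % of selected transcripts)."""
    return 100 * selected / total, 100 * translated / selected


for species, row in COUNTS.items():
    selected_pct, translated_pct = retention(*row)
    print(f"{species}: {selected_pct:.1f}% selected, {translated_pct:.1f}% yielded a polypeptide")
```

Read this way, roughly 84% of human transcripts were selected for translation, and the rat set shows the highest retention at both steps.
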
Species | Canonical Proteins | Non-Canonical Proteins | Total Proteins |
---|---|---|---|
Human | 30,976 | 55,616 | 86,592 |
Mouse | 25,138 | 37,301 | 61,439 |
Rat | 16,136 | 7251 | 23,387 |
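
A quick consistency check on the table above is to confirm that canonical plus non-canonical proteins equals the stated total for each species. The sketch below does only that; `ROWS` and `total_mismatch` are illustrative names, and the numbers are copied from the table as printed.

```python
# Rows copied from the table above: (canonical, non-canonical, stated total).
ROWS = {
    "human": (30_976, 55_616, 86_592),
    "mouse": (25_138, 37_301, 61_439),
    "rat": (16_136, 7_251, 23_387),
}


def total_mismatch(canonical: int, non_canonical: int, total: int) -> int:
    """Return (canonical + non_canonical) - total; zero means the row is consistent."""
    return canonical + non_canonical - total


# Keep only the species whose parts do not add up to the stated total.
mismatches = {
    species: total_mismatch(*row)
    for species, row in ROWS.items()
    if total_mismatch(*row) != 0
}
print(mismatches)  # → {'mouse': 1000}
```

The human and rat rows sum exactly, while the mouse row gives 25,138 + 37,301 = 62,439 against a listed total of 61,439. Which of the three mouse figures is off cannot be determined from the table alone.
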
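Species | SpliceProt 2.0 against SwissProt (Mean) | SpliceProt 2.0 against SwissProt (SD) | OpenProt 1.6 against SwissProt (Mean) | OpenProt 1.6 against SwissProt (SD) | SpliceProt 2.0 against OpenProt 1.6 (Mean) | SpliceProt 2.0 against OpenProt 1.6 (SD) |
---|---|---|---|---|---|---|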
Human | 99.9 | 0.017 | 96.2 | 15.2 | 79.9 | 32.6 |
Mouse | 99.9 | 0.018 | 98.2 | 9.36 | 87.3 | 26.0 |
Rat | 99.9 | 0.009 | 96.2 | 15.2 | 97.0 | 11.0 |
Species | | SpliceProt 2.0 for PSM Search | OpenProt 1.6 | UniProtKB/SwissProt | UniProtKB/TrEMBL |
---|---|---|---|---|---|
Human | Peptides | 13,503 | 905 | 15,090 | 11,351 |
Human | Proteins | 1805 | 451 | 1986 | 1243 |
Mouse | Peptides | 10,375 | 437 | 9944 | 9560 |
Mouse | Proteins | 1793 | 237 | 2347 | 1405 |
Rat | Peptides | 20,400 | 122 | 13,504 | 15,021 |
Rat | Proteins | 4032 | 72 | 3212 | 3710 |
Datasets | Species | Proteins |
---|---|---|
Total | human | 120,932 |
Total | mouse | 74,694 |
Total | rat | 24,739 |
Orthologous | human/mouse | 23,458 |
Orthologous | human/rat | 13,633 |
Orthologous | rat/mouse | 13,292 |
Triads (perfect match) | human/mouse/rat | 12,257 |
Comparison | Identity Score | Needle | RBH |
---|---|---|---|
human vs. mouse | 100% | 479 | 3752 |
human vs. rat | 100% | 203 | 2223 |
rat vs. mouse | 100% | 609 | 1263 |
human vs. mouse | 60–99.9% | 13,208 | 6019 |
human vs. rat | 60–99.9% | 7588 | 3278 |
rat vs. mouse | 60–99.9% | 10,180 | 1581 |
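
The RBH column in the table above is read here as reciprocal best hits, a common way of pairing orthologs: two sequences from different species are paired only when each is the other's top-scoring hit. The following is a minimal sketch under that assumption. The identifiers and scores are made up, and `best_hits` and `reciprocal_best_hits` are illustrative helpers rather than the authors' pipeline.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, alignment score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (the first seen wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical toy search results in both directions.
human_vs_mouse = [("h1", "m1", 950.0), ("h1", "m2", 400.0),
                  ("h2", "m2", 700.0), ("h3", "m2", 650.0)]
mouse_vs_human = [("m1", "h1", 940.0), ("m2", "h2", 710.0), ("m2", "h3", 640.0)]

pairs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)  # → [('h1', 'm1'), ('h2', 'm2')]
```

In the toy data, `h3` also hits `m2` best, but `m2` prefers `h2`, so that pair is dropped. This strictness is one plausible reason an RBH count can be lower than the number of pairs found by pairwise alignment in the same identity range.
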
Human Gene Symbol | Human Ensembl | Human Proteotypic Peptides | Human TPM | Mouse Gene Symbol | Mouse Ensembl | Mouse Proteotypic Peptides | Mouse TPM | Rat Gene Symbol | Rat Ensembl | Rat Proteotypic Peptides | Rat TPM |
---|---|---|---|---|---|---|---|---|---|---|---|
ADK | ENST00000539909 | 2 | 0 | Adk | ENSMUST00000045376 | 2 | 145.1 | Adk | ENSRNOT00000016709 | 1 | 156.8 |
CMAS | ENST00000229329 | 5 | 19.8 | Cmas | ENSMUST00000032419 | 1 | 27.8 | Cmas | ENSRNOT00000018734 | 1 | 49.3 |
DDX3Y | ENST00000336079 | 1 | 6.5 | Ddx3y | ENSMUST00000091190 | 2 | 8.6 | Ddx3y | ENSRNOT00000092078 | 2 | 0 |
FAM120A | ENST00000277165 | 3 | 20.2 | Fam120a | ENSMUST00000060805 | 1 | 56.5 | Fam120a | ENSRNOT00000060568 | 16 | 91.2 |
FGA | ENST00000651975 | 13 | 515.5 | Fga | ENSMUST00000166581 | 8 | 62.6 | Fga | ENSRNOT00000064091 | 15 | 2326.7 |
GDI2 | ENST00000380191 | 3 | 58.7 | Gdi2 | ENSMUST00000223396 | 2 | 5.9 | Gdi2 | ENSRNOT00000024952 | 5 | 2703.4 |
GLYCTK * | ENST00000436784 | 6 | 14.6 | Glyctk | ENSMUST00000036382 | 2 | 52.3 | Glyctk * | ENSRNOT00000074595 | 3 | 211.5 |
GLYCTK * | ENST00000436784 | 6 | 14.6 | Glyctk | ENSMUST00000112543 | 2 | 34.5 | Glyctk * | ENSRNOT00000074595 | 3 | 211.5 |
GLYCTK * | ENST00000436784 | 6 | 14.6 | Glyctk | ENSMUST00000159809 | 2 | 8.2 | Glyctk * | ENSRNOT00000074595 | 3 | 211.5 |
GLYCTK * | ENST00000436784 | 6 | 14.6 | Glyctk | ENSMUST00000162562 | 2 | 21.2 | Glyctk * | ENSRNOT00000074595 | 3 | 211.5 |
GPT2 | ENST00000340124 | 1 | 20.3 | Gpt2 | ENSMUST00000034136 | 1 | 186.2 | Gpt2 | ENSRNOT00000077275 | 3 | 100.1 |
HSDL2 | ENST00000398805 | 4 | 25.2 | Hsdl2 | ENSMUST00000030078 | 1 | 33.6 | Hsdl2 | ENSRNOT00000059458 | 2 | 34.9 |
HSPA4 | ENST00000304858 | 2 | 7.1 | Hspa4 | ENSMUST00000020630 | 6 | 33.5 | Hspa4 | ENSRNOT00000023628 | 7 | 34.8 |
IQGAP2 | ENST00000274364 | 8 | 3.9 | Iqgap2 | ENSMUST00000068603 | 1 | 110.7 | Iqgap2 | ENSRNOT00000035017 | 38 | 101.5 |
MTTP | ENST00000265517 | 1 | 26.3 | Mttp | ENSMUST00000029805 | 21 | 135.4 | Mttp | ENSRNOT00000014631 | 4 | 186.2 |
NAXE | ENST00000368235 | 2 | 25.7 | Naxe | ENSMUST00000029708 | 3 | 72.8 | Naxe | ENSRNOT00000025986 | 2 | 47.8 |
NUDT12 | ENST00000230792 | 2 | 4.5 | Nudt12 | ENSMUST00000025065 | 2 | 15.6 | Nudt12 | ENSRNOT00000066968 | 2 | 9.3 |
PGRMC2 | ENST00000520121 | 3 | 0.2 | Pgrmc2 | ENSMUST00000058578 | 1 | 15.9 | Pgrmc2 | ENSRNOT00000018796 | 2 | 101.6 |
PSMC2 | ENST00000292644 | 4 | 7.8 | Psmc2 * | ENSMUST00000030769 | 2 | 52.4 | Psmc2 * | ENSRNOT00000016450 | 4 | 79.9 |
PSMC2 | ENST00000425206 | 4 | 7 | Psmc2 * | ENSMUST00000030769 | 2 | 52.4 | Psmc2 * | ENSRNOT00000016450 | 4 | 79.9 |
PSMC2 | ENST00000435765 | 4 | 0 | Psmc2 * | ENSMUST00000030769 | 2 | 52.4 | Psmc2 * | ENSRNOT00000016450 | 4 | 79.9 |
PSMD1 | ENST00000308696 | 4 | 13.8 | Psmd1 | ENSMUST00000027432 | 1 | 54.8 | Psmd1 | ENSRNOT00000024306 | 1 | 64.5 |
SCFD1 | ENST00000458591 | 4 | 5.7 | Scfd1 | ENSMUST00000021335 | 3 | 28 | Scfd1 | ENSRNOT00000040548 | 2 | 23.9 |
SEC24D | ENST00000280551 | 1 | 11.6 | Sec24d | ENSMUST00000047923 | 1 | 26.9 | Sec24d | ENSRNOT00000064809 | 8 | 38.5 |
STIP1 | ENST00000305218 | 1 | 31.6 | Stip1 | ENSMUST00000025918 | 1 | 37.3 | Stip1 | ENSRNOT00000028743 | 4 | 56.8 |
UGT1A1 | ENST00000305208 | 5 | 203.8 | Ugt1a1 | ENSMUST00000073049 | 3 | 508.8 | Ugt1a3 | ENSRNOT00000025045 | 3 | 211.2 |
XYLB | ENST00000207870 | 1 | 6.2 | Xylb | ENSMUST00000039610 | 1 | 19.7 | Xylb | ENSRNOT00000019106 | 6 | 56.3 |
YWHAB | ENST00000353703 | 4 | 25.3 | Ywhab | ENSMUST00000018470 | 6 | 27.2 | Ywhab * | ENSRNOT00000016981 | 6 | 30.7 |
YWHAB | ENST00000372839 | 4 | 3.7 | Ywhab | ENSMUST00000131288 | 6 | 0.4 | Ywhab * | ENSRNOT00000016981 | 6 | 30.7 |
Peptide | Primary Score | Human | Mouse | Rat |
---|---|---|---|---|
AVLGMAAAAEELLGQHLVQGVISVPK | 5.76 | X | - | - |
LLAARGATIQELNTIRK | 4.77 | - | X | - |
ADSDPHGPHTCGHVLNVIIGSNSLALAEAQR | 4.88 | - | - | X |
GPVCLLAGGEPTVQLQGSGK | 4.03 | - | X | X |
GPVCLLAGGEPTVQLQGSGK | 4.45 | - | X | X |
GPVCLLAGGEPTVQLQGSGR | 3.72 | X | - | - |
Human Gene Name | Human Ensembl ID | Mouse Gene Name | Mouse Ensembl ID | Rat Gene Name | Rat Ensembl ID |
---|---|---|---|---|---|
PLCB4 | ENST00000492632 | Plcb4 | ENSMUST00000184371 | Plcb4 | ENSRNOT00000049855 |
AP1S2 | ENST00000672063 | Ap1s2 | ENSMUST00000140845 | Ap1s2 | ENSRNOT00000081652 |
FOXP3 | ENST00000651307 | Foxp3 | ENSMUST00000234479 | Foxp3 | ENSRNOT00000091146 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Santos, L.G.C.; Parreira, V.d.S.C.; da Silva, E.M.G.; Santos, M.D.M.; Fernandes, A.d.F.; Neves-Ferreira, A.G.d.C.; Carvalho, P.C.; Freitas, F.C.d.P.; Passetti, F. SpliceProt 2.0: A Sequence Repository of Human, Mouse, and Rat Proteoforms. Int. J. Mol. Sci. 2024, 25, 1183. https://doi.org/10.3390/ijms25021183