Optimizing Scorpion Toxin Processing through Artificial Intelligence
Abstract
1. Introduction
2. Results
2.1. Tapai (Transcriptome Processing by Artificial Intelligence), a Python Neural-Network-Approach Script to Classify Scorpion Toxins
2.2. RNA Sequencing and Transcriptome Assembly, and Toxin Classification
2.3. Toxin Annotation on the Three New Scorpion Venom Transcriptomes
3. Discussion
4. Conclusions
5. Materials and Methods
5.1. Biological Material, RNA Extraction, RNA Sequencing and Transcriptome Assembly
5.2. AI Processing Pipeline
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

| Class | Total | Training Sequences | Validation Sequences |
|---|---|---|---|
| Calcins/DDH (ICKs) | 89 | 66 | 23 |
| Potassium channel toxins (KTxs) | 627 | 150 | 477 |
| Sodium channel toxins (NaTxs) | 706 | 150 | 556 |
| Other venom proteins (venom) | 167 | 125 | 42 |
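
In the table above, the two large classes (KTxs and NaTxs) are capped at 150 training sequences, and the two smaller classes keep roughly three-quarters of their sequences for training. The source does not state the rule it used. The sketch below shows one hypothetical rule, training = min(150, floor(0.75 × total)), that happens to reproduce all four rows. The helper names `training_count` and `split_counts` and their parameters are illustrative only.

```python
# Hypothetical rule (not stated in the source): keep floor(3/4) of each class
# for training, capped at 150 sequences; the remainder goes to validation.


def training_count(total: int, cap: int = 150, num: int = 3, den: int = 4) -> int:
    """Number of training sequences for a class with `total` sequences."""
    # Integer arithmetic gives an exact floor with no floating-point rounding.
    return min(cap, total * num // den)


def split_counts(total: int) -> tuple[int, int]:
    """Return (training, validation) counts for one class."""
    train = training_count(total)
    return train, total - train


# Class totals copied from the table above.
class_totals = {
    "Calcins/DDH (ICKs)": 89,
    "Potassium channel toxins (KTxs)": 627,
    "Sodium channel toxins (NaTxs)": 706,
    "Other venom proteins (venom)": 167,
}

for name, total in class_totals.items():
    train, validation = split_counts(total)
    print(f"{name}: total={total}, training={train}, validation={validation}")
```

Under this rule, a 42-sequence class would get 31 training sequences rather than 32. The sodium-channel subclass table therefore cannot follow the same scheme.
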

| Class | Total | Training Sequences | Validation Sequences |
|---|---|---|---|
| Toxins acting on both insect and mammal sodium channels | 908 | 32 | 876 |
| Toxins acting only on insect sodium channels | 42 | 32 | 10 |
| Toxins acting only on mammal sodium channels | 42 | 32 | 10 |
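
The table above draws the same number of training sequences (32) from every sodium-channel subclass and leaves all remaining sequences for validation. As a result, 876 of the 896 validation sequences belong to the largest class. The sketch below shows a fixed-size, per-class random split of this kind. The source does not say how the 32 sequences were chosen. The helper `split_fixed_per_class`, the placeholder sequence IDs, and the seed are all hypothetical.

```python
import random


def split_fixed_per_class(ids_by_class, n_train=32, seed=0):
    """Randomly assign n_train IDs per class to training and the rest to validation.

    `ids_by_class` maps a class label to a list of unique sequence IDs.
    A seeded random.Random instance makes the split reproducible.
    """
    rng = random.Random(seed)
    train, validation = {}, {}
    for label, ids in ids_by_class.items():
        if len(ids) < n_train:
            raise ValueError(f"class {label!r} has only {len(ids)} sequences")
        chosen = set(rng.sample(ids, n_train))  # sampling without replacement
        train[label] = [i for i in ids if i in chosen]
        validation[label] = [i for i in ids if i not in chosen]
    return train, validation


# Hypothetical placeholder IDs, sized like the classes in the table above.
ids_by_class = {
    "insect_and_mammal": [f"im_{k}" for k in range(908)],
    "insect_only": [f"i_{k}" for k in range(42)],
    "mammal_only": [f"m_{k}" for k in range(42)],
}

train, validation = split_fixed_per_class(ids_by_class)
for label in ids_by_class:
    print(label, len(train[label]), len(validation[label]))
```

Drawing a fixed number per class balances the training set by undersampling. The validation set keeps the original class imbalance, so per-class figures are more informative than a single pooled score.
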

| Class | Split 1 | Split 2 | Split 3 | Split 4 | Split 5 | Split 6 | Split 7 | Split 8 | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|
| Calcin/DDH (ICK) | 100 | 82.61 | 86.95 | 78.26 | 82.61 | 91.30 | 82.61 | 82.60 | 85.87 | 6.87 |
| Potassium (KTx) | 82.18 | 89.52 | 73.79 | 77.78 | 87.63 | 78.41 | 76.73 | 71.07 | 79.64 | 6.43 |
| Sodium (NaTx) | 96.94 | 91.19 | 96.94 | 95.86 | 88.13 | 92.99 | 92.99 | 97.66 | 93.82 | 3.53 |
| Venom | 69.05 | 78.57 | 64.29 | 83.33 | 71.43 | 71.43 | 71.43 | 73.81 | 74.70 | 7.30 |
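
The Mean and SD columns above summarize each class over the eight splits. The sketch below recomputes them with Python's `statistics` module, assuming SD is the sample standard deviation (n - 1 denominator). Under that assumption it matches the reported Mean and SD for the Calcin/DDH and KTx rows to two decimals. The dictionary `split_values` and the helper `summarize` are illustrative names, and the choice of half-up rounding is also an assumption.

```python
from decimal import ROUND_HALF_UP, Decimal
from statistics import mean, stdev

# Per-split values copied from the Calcin/DDH and KTx rows of the table above.
# Parsing the strings as Decimal keeps the two-decimal values exact.
split_values = {
    "Calcin/DDH (ICK)": ["100", "82.61", "86.95", "78.26", "82.61", "91.30", "82.61", "82.60"],
    "Potassium (KTx)": ["82.18", "89.52", "73.79", "77.78", "87.63", "78.41", "76.73", "71.07"],
}

TWO_PLACES = Decimal("0.01")


def summarize(values):
    """Return (mean, sample SD), each rounded half-up to two decimal places."""
    data = [Decimal(v) for v in values]
    # statistics.stdev divides by n - 1 (sample SD); statistics.pstdev would divide by n.
    m = mean(data)
    sd = stdev(data)
    return (m.quantize(TWO_PLACES, rounding=ROUND_HALF_UP),
            sd.quantize(TWO_PLACES, rounding=ROUND_HALF_UP))


for label, values in split_values.items():
    m, sd = summarize(values)
    print(f"{label}: mean = {m}, SD = {sd}")
```

Running it prints mean = 85.87, SD = 6.87 for Calcin/DDH and mean = 79.64, SD = 6.43 for KTx. Decimal arithmetic is used because binary floats can tip an exact tie such as 85.8675 to the wrong side when rounding to two places.
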

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Psenicnik, A.; Ojanguren-Affilastro, A.A.; Graham, M.R.; Hassan, M.K.; Abdel-Rahman, M.A.; Sharma, P.P.; Santibáñez-López, C.E. Optimizing Scorpion Toxin Processing through Artificial Intelligence. Toxins 2024, 16, 437. https://doi.org/10.3390/toxins16100437