Revisiting Five Years of CASMI Contests with EPA Identification Tools
Abstract
1. Introduction
2. Results and Discussion
2.1. Dataset Assembly
2.1.1. CASMI 2012
2.1.2. CASMI 2013
2.1.3. CASMI 2014
2.1.4. CASMI 2016
2.1.5. CASMI 2017
2.1.6. Dataset Assembly Summary and Ongoing Work
2.2. Individual CASMI Contest Year Results
2.2.1. CASMI 2012
2.2.2. CASMI 2013
2.2.3. CASMI 2014
2.2.4. CASMI 2016
2.2.5. CASMI 2017
2.2.6. Summary
2.3. Comparison with CFM-ID Contestants and Post-Contest Evaluations
2.4. Spectral Match Interrogation
3. Materials and Methods
3.1. EPA Tools for Structure Identification
3.2. Compilation of CASMI Spectral Files
3.3. Structure Identification, Scoring, and Ranking
3.4. Assembly and Review of CASMI Input Data Sets
4. Summary and Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Data Availability
Contest Year | Total Number of Challenge Structures | Total Number of Structures in DSSTox (Before) | Total Number of Structures in DSSTox (After) |
---|---|---|---|
CASMI 2012 ¹ | 14 | 9 | 14 |
CASMI 2013 | 18 ² | 8 | 16 |
CASMI 2014 | 42 | 29 | 42 |
CASMI 2016-challenge | 208 | 208 | 208 |
CASMI 2016-training | 312 | 312 | 312 |
CASMI 2017 | 243 | 54 | 227 |

CASMI Year | CFM-ID Only | CFM-ID + DS | Winners’ Results ¹ | Total in DB/Total in Dataset ² |
---|---|---|---|---|
2012 | 36% | 64% | 36% | 14/14 |
2013 | 81% | 88% | 88% | 16/16 |
2014 | 57% | 76% | 71% | 42/42 |
2016-training | 63% | 96% | | 312/312 |
2016-challenge | 66% | 94% | 81% | 208/208 |
2017 | 59% | 53% | 74% ³ | 227/243 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
McEachran, A.D.; Chao, A.; Al-Ghoul, H.; Lowe, C.; Grulke, C.; Sobus, J.R.; Williams, A.J. Revisiting Five Years of CASMI Contests with EPA Identification Tools. Metabolites 2020, 10, 260. https://doi.org/10.3390/metabo10060260