Deep Learning Based Prediction of Gas Chromatographic Retention Indices for a Wide Variety of Polar and Mid-Polar Liquid Stationary Phases
Abstract
1. Introduction
2. Results
2.1. Data Sets and Machine Learning Models
2.2. Polar Stationary Phases
2.3. DB-624 and OV-17 Data Sets
2.4. Prediction of Second Dimension Retention Times
2.5. Further Testing and Applications
3. Methods
3.1. Deep Neural Networks
3.2. Second-Level Models for Mid-Polar Stationary Phases
3.3. Data Sets Preprocessing and Training-Test Split
3.4. Implementation and Software
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Vigdergauz, M.S.; Martynov, A.A. Some applications of the gas chromatographic linear retention index. Chromatographia 1971, 4, 463–467.
2. Tarján, G.; Nyiredy, S.; Györ, M.; Lombosi, E.; Lombosi, T.; Budahegyi, M.; Mészáros, S.; Takács, J. Thirtieth anniversary of the retention index according to Kováts in gas-liquid chromatography. J. Chromatogr. A 1989, 472, 1–92.
3. Khodadadi, M.; Pourfarzam, M. A review of strategies for untargeted urinary metabolomic analysis using gas chromatography–mass spectrometry. Metabolomics 2020, 16, 66.
4. Babushok, V.I.; Linstrom, P.J.; Zenkevich, I.G. Retention Indices for Frequently Reported Compounds of Plant Essential Oils. J. Phys. Chem. Ref. Data 2011, 40, 043101.
5. Zellner, B.D.; Bicchi, C.; Dugo, P.; Rubiolo, P.; Dugo, G.; Mondello, L. Linear retention indices in gas chromatographic analysis: A review. Flavour Fragr. J. 2008, 23, 297–314.
6. Veenaas, C.; Bignert, A.; Liljelind, P.; Haglund, P. Nontarget Screening and Time-Trend Analysis of Sewage Sludge Contaminants via Two-Dimensional Gas Chromatography–High Resolution Mass Spectrometry. Environ. Sci. Technol. 2018, 52, 7813–7822.
7. Matyushin, D.D.; Sholokhova, A.Y.; Karnaeva, A.E.; Buryak, A.K. Various aspects of retention index usage for GC-MS library search: A statistical investigation using a diverse data set. Chemom. Intell. Lab. Syst. 2020, 202, 104042.
8. Zhang, J.; Koo, I.; Wang, B.; Gao, Q.-W.; Zheng, C.-H.; Zhang, X. A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures. J. Chromatogr. A 2012, 1251, 188–193.
9. Ji, H.; Deng, H.; Lu, H.; Zhang, Z. Predicting a Molecular Fingerprint from an Electron Ionization Mass Spectrum with Deep Neural Networks. Anal. Chem. 2020, 92, 8649–8653.
10. Qiu, F.; Lei, Z.; Sumner, L.W. MetExpert: An expert system to enhance gas chromatography‒mass spectrometry-based metabolite identifications. Anal. Chim. Acta 2018, 1037, 316–326.
11. Dossin, E.; Martin, E.; Diana, P.; Castellon, A.; Monge, A.; Pospisil, P.; Bentley, M.; Guy, P.A. Prediction Models of Retention Indices for Increased Confidence in Structural Elucidation during Complex Matrix Analysis: Application to Gas Chromatography Coupled with High-Resolution Mass Spectrometry. Anal. Chem. 2016, 88, 7539–7547.
12. Matsuo, T.; Tsugawa, H.; Miyagawa, H.; Fukusaki, E. Integrated Strategy for Unknown EI–MS Identification Using Quality Control Calibration Curve, Multivariate Analysis, EI–MS Spectral Database, and Retention Index Prediction. Anal. Chem. 2017, 89, 6766–6773.
13. Kumari, S.; Stevens, D.; Kind, T.; Denkert, C.; Fiehn, O. Applying In-Silico Retention Index and Mass Spectra Matching for Identification of Unknown Metabolites in Accurate Mass GC-TOF Mass Spectrometry. Anal. Chem. 2011, 83, 5895–5902.
14. Héberger, K. Quantitative structure–(chromatographic) retention relationships. J. Chromatogr. A 2007, 1158, 273–305.
15. Kaliszan, R. QSRR: Quantitative Structure-(Chromatographic) Retention Relationships. Chem. Rev. 2007, 107, 3212–3246.
16. Zhokhov, A.K.; Loskutov, A.Y.; Rybal’chenko, I.V. Methodological Approaches to the Calculation and Prediction of Retention Indices in Capillary Gas Chromatography. J. Anal. Chem. 2018, 73, 207–220.
17. Matyushin, D.D.; Buryak, A.K. Gas Chromatographic Retention Index Prediction Using Multimodal Machine Learning. IEEE Access 2020, 8, 223140–223155.
18. Vrzal, T.; Malečková, M.; Olšovská, J. DeepReI: Deep learning-based gas chromatographic retention index predictor. Anal. Chim. Acta 2021, 1147, 64–71.
19. Qu, C.; Schneider, B.I.; Kearsley, A.J.; Keyrouz, W.; Allison, T.C. Predicting Kováts Retention Indices Using Graph Neural Networks. J. Chromatogr. A 2021, 1646, 462100.
20. Shrestha, A.; Mahmood, A. Review of Deep Learning Algorithms and Architectures. IEEE Access 2019, 7, 53040–53065.
21. Matyushin, D.D.; Sholokhova, A.; Buryak, A.K. A deep convolutional neural network for the estimation of gas chromatographic retention indices. J. Chromatogr. A 2019, 1607, 460395.
22. Randazzo, G.M.; Bileck, A.; Danani, A.; Vogt, B.; Groessl, M. Steroid identification via deep learning retention time predictions and two-dimensional gas chromatography-high resolution mass spectrometry. J. Chromatogr. A 2020, 1612, 460661.
23. Stein, S.E.; Babushok, V.I.; Brown, A.R.L.; Linstrom, P.J. Estimation of Kováts Retention Indices Using Group Contributions. J. Chem. Inf. Model. 2007, 47, 975–980.
24. Yan, J.; Cao, D.-S.; Guo, F.-Q.; Zhang, L.-X.; He, M.; Huang, J.-H.; Xu, Q.-S.; Liang, Y.-Z. Comparison of quantitative structure–retention relationship models on four stationary phases with different polarity for a diverse set of flavor compounds. J. Chromatogr. A 2012, 1223, 118–125.
25. Qin, L.-T.; Liu, S.-S.; Chen, F.; Wu, Q.-S. Development of validated quantitative structure-retention relationship models for retention indices of plant essential oils. J. Sep. Sci. 2013, 36, 1553–1560.
26. Rojas, C.; Duchowicz, P.R.; Tripaldi, P.; Diez, R.P. Quantitative structure–property relationship analysis for the retention index of fragrance-like compounds on a polar stationary phase. J. Chromatogr. A 2015, 1422, 277–288.
27. Jennings, W. Retention Indices in Increasing Order on Polyethylene Glycol Carbowax 20M. In Qualitative Analysis of Flavor and Fragrance Volatiles by Glass Capillary Gas Chromatography; Elsevier: Amsterdam, The Netherlands, 1980; pp. 86–113.
28. Veenaas, C.; Linusson, A.; Haglund, P. Retention-time prediction in comprehensive two-dimensional gas chromatography to aid identification of unknown contaminants. Anal. Bioanal. Chem. 2018, 410, 7931–7941.
29. D’Archivio, A.A.; Incani, A.; Ruggieri, F. Cross-column prediction of gas-chromatographic retention of polychlorinated biphenyls by artificial neural networks. J. Chromatogr. A 2011, 1218, 8679–8690.
30. D’Archivio, A.A.; Giannitto, A.; Maggi, M.A. Cross-column prediction of gas-chromatographic retention of polybrominated diphenyl ethers. J. Chromatogr. A 2013, 1298, 118–131.
31. Seeley, J.V.; Seeley, S.K. Model for predicting comprehensive two-dimensional gas chromatography retention times. J. Chromatogr. A 2007, 1172, 72–83.
32. Wang, J.; Hang, Y.; Yan, T.; Liang, J.; Xu, H.; Huang, Z. Qualitative analysis of flavors and fragrances added to tea by using GC-MS. J. Sep. Sci. 2018, 41, 648–656.
33. Cuzuel, V.; Sizun, A.; Cognon, G.; Rivals, I.; Heulard, F.; Thiébaut, D.; Vial, J. Human odor and forensics. Optimization of a comprehensive two-dimensional gas chromatography method based on orthogonality: How not to choose between criteria. J. Chromatogr. A 2018, 1536, 58–66.
34. Cabrera, J.F.A.; Moyano, E.; Santos, F. Gas chromatography and liquid chromatography coupled to mass spectrometry for the determination of fluorotelomer olefins, fluorotelomer alcohols, perfluoroalkyl sulfonamides and sulfonamido-ethanols in water. J. Chromatogr. A 2020, 1609, 460463.
35. Poole, C.F. Gas chromatography system constant database for 52 wall-coated, open-tubular columns covering the temperature range 60–140 °C. J. Chromatogr. A 2019, 1604, 460482.
36. Willighagen, E.L.; Mayfield, J.W.; Alvarsson, J.; Berg, A.; Carlsson, L.; Jeliazkova, N.; Kuhn, S.; Pluskal, T.; Rojas-Chertó, M.; Spjuth, O.; et al. The Chemistry Development Kit (CDK) v2.0: Atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. 2017, 9, 1–19.
37. Matyushin, D. Supplementary Data and Code for the Article “Gas Chromatographic Retention Index Prediction Using Multi-modal Machine Learning”. Figshare 2020. p. 57303746 Bytes.

Designation | N | Compounds | Stationary Phase | Reference a |
---|---|---|---|---|
FLAVORS | 1169 | Flavors and fragrances | Carbowax 20M | [26,27] |
ESSOILS | 427 | Essential oil components | Various polar stationary phases | [4,25] |
DB-624 | 545 b | Various aliphatic and aromatic alcohols, esters, ethers, aldehydes, sulfur-containing compounds, heterocycles, nitriles and other compounds | DB-624 | [11] |
OV-17 | 192 | Odorants (the Flavornet database https://www.flavornet.org/, accessed on 22 August 2021) | OV-17 | [24] |
BPX50_2D | 859 c | The diverse set of environmental-related compounds: pesticides, organophosphates, esters, polyaromatic compounds, polychlorinated biphenyl congeners, polychlorinated dioxins, bisphenols, etc. | BPX50 | [28] |
SEDB624 | 130 | Series of homologues of ketones, aldehydes, alcohols, alkylbenzenes, alkenes, chloroalkenes, cycloalkenes, esters, and other compounds | DB-624 | [31] |
DB-1701 | 36 | Flavors and fragrances | DB-1701 | [32] |
DB-210 | 130 | The same compounds as in the SEDB624 data set | DB-210 | [31] |

Model | Metric | NIST | FLAVORS | ESSOILS |
---|---|---|---|---|
CNNPolar | RMSE | 92.0 | 92.3 | 70.1 |
CNNPolar | MAE | 47.3 | 53.8 | 46.9 |
CNNPolar | MdAE | 23.4 | 32.9 | 30.8 |
CNNPolar | MPE | 3.12 | 3.40 | 2.84 |
CNNPolar | MdPE | 1.56 | 2.21 | 1.73 |
MLPPolar | RMSE | 82.5 | 94.2 | 64.6 |
MLPPolar | MAE | 45.8 | 52.3 | 45.6 |
MLPPolar | MdAE | 27.4 | 27.5 | 31.0 |
MLPPolar | MPE | 3.07 | 3.26 | 2.77 |
MLPPolar | MdPE | 1.80 | 1.86 | 1.89 |
Average | RMSE | 80.3 | 86.1 | 58.8 |
Average | MAE | 41.7 | 46.6 | 40.4 |
Average | MdAE | 22.0 | 26.1 | 26.4 |
Average | MPE | 2.77 | 2.93 | 2.47 |
Average | MdPE | 1.45 | 1.74 | 1.59 |
Previous works [23,25,26] | RMSE | 154 | 125.4 | 177.3 |
Previous works [23,25,26] | MAE | 101 | 89.8 | 139.0 |
Previous works [23,25,26] | MdAE | 64 | 68.6 | 123.5 |
Previous works [23,25,26] | MPE | 5.7 | 5.76 | 8.42 |
Previous works [23,25,26] | MdPE | 3.9 | 4.49 | 7.09 |
Linear model [23] | RMSE | 129.1 | 157.6 | 91.8 |
Linear model [23] | MAE | 72.5 | 92.4 | 61.7 |
Linear model [23] | MdAE | 41.2 | 47.4 | 43.2 |
Linear model [23] | MPE | 4.75 | 5.76 | 3.62 |
Linear model [23] | MdPE | 2.77 | 3.07 | 2.57 |
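
In the table above, the Average rows show lower errors than either CNNPolar or MLPPolar alone. That pattern is what one would expect if the row reports the error of averaged predictions rather than the mean of the two error values. The sketch below illustrates the effect under that assumption (the table itself does not define "Average"); all numbers are made-up toy values, not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy retention indices (hypothetical values, for illustration only).
reference = [1000.0, 1200.0, 1500.0, 1800.0]
cnn_pred = [1030.0, 1170.0, 1540.0, 1790.0]  # hypothetical CNN output
mlp_pred = [980.0, 1220.0, 1480.0, 1830.0]   # hypothetical MLP output

# Assumed ensemble: arithmetic mean of the two predictions per compound.
avg_pred = [(c + m) / 2 for c, m in zip(cnn_pred, mlp_pred)]

rmse_cnn = rmse(reference, cnn_pred)
rmse_mlp = rmse(reference, mlp_pred)
rmse_avg = rmse(reference, avg_pred)

print(f"CNN: {rmse_cnn:.2f}  MLP: {rmse_mlp:.2f}  Average: {rmse_avg:.2f}")
```

Averaging helps here because the two toy models err in opposite directions for most compounds, so their errors partly cancel. When two models make strongly correlated errors, the gain is smaller, and the averaged error is not guaranteed to fall below the better of the two models.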

Model | Metric | DB-624 | OV-17 |
---|---|---|---|
This work | RMSE | 36.8 | 43.8 |
This work | MAE | 24.3 | 30.7 |
This work | MdAE | 16.7 | 22.5 |
This work | MPE | 2.33 | 2.52 |
This work | MdPE | 1.66 | 1.73 |
Previous results [11,24] | RMSE | 54.2 | 58.8 |
Previous results [11,24] | MAE | 32.5 | 47.3 |
Previous results [11,24] | MdAE | 19.2 | 42.9 |
Previous results [11,24] | MPE | 2.96 | 4.15 |
Previous results [11,24] | MdPE | 2.10 | 3.61 |

Model | Metric | Test Set | External Set |
---|---|---|---|
This work | RMSE | 0.229 | 0.211 |
This work | MAE | 0.149 | 0.151 |
This work | MdAE | 0.096 | 0.104 |
This work | MPE | 4.34 | 4.17 |
This work | MdPE | 2.89 | 3.07 |
Previous results [28] | RMSE | 0.26 | 0.23 |
Previous results [28] | MAE | 0.17 | 0.16 |
Previous results [28] | MPE | 5 | 4 |

Metric | SEDB624 | DB-1701 | DB-210 |
---|---|---|---|
RMSE | 26.2 | 56.1 | 64.2 |
MAE | 15.9 | 37.1 | 50.1 |
MdAE | 9.95 | 20.0 | 42.3 |
MPE | 2.03 | 2.54 | 5.05 |
MdPE | 1.17 | 1.52 | 4.12 |
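
The tables above all report the same five accuracy metrics without defining them. A minimal sketch of how such metrics are conventionally computed follows, assuming the usual readings of the abbreviations: RMSE (root mean square error), MAE (mean absolute error), MdAE (median absolute error), MPE (mean percentage error), and MdPE (median percentage error), with percentage errors taken relative to the reference value. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper or its code.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return the five accuracy metrics for paired reference/predicted values.

    Percentage errors are computed relative to the reference value
    (an assumption; the source tables do not state the denominator).
    """
    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    pct_err = [100.0 * e / abs(t) for e, t in zip(abs_err, y_true)]
    return {
        "RMSE": math.sqrt(sum(e * e for e in abs_err) / len(abs_err)),
        "MAE": statistics.fmean(abs_err),
        "MdAE": statistics.median(abs_err),
        "MPE": statistics.fmean(pct_err),
        "MdPE": statistics.median(pct_err),
    }


# Hypothetical reference and predicted retention indices.
observed = [1000.0, 1250.0, 1600.0, 2000.0]
predicted = [1010.0, 1225.0, 1640.0, 1980.0]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The median-based metrics (MdAE, MdPE) are less sensitive to a few badly predicted compounds than RMSE and MAE, which is one reason to report both kinds side by side.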
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Matyushin, D.D.; Sholokhova, A.Y.; Buryak, A.K. Deep Learning Based Prediction of Gas Chromatographic Retention Indices for a Wide Variety of Polar and Mid-Polar Liquid Stationary Phases. Int. J. Mol. Sci. 2021, 22, 9194. https://doi.org/10.3390/ijms22179194