Machine Learning Approach for the Estimation of Henry’s Law Constant Based on Molecular Descriptors
Abstract
1. Introduction
2. Methods
2.1. Dataset
2.2. Molecular Descriptor Calculation
2.3. Data Preprocessing and Feature Selection
2.4. Model Development and Evaluation
2.5. Feature Importance and Interpretation
3. Results and Discussion
3.1. Statistical Assessment of the Dataset and Feature Selection
3.2. Model Development and Selection
3.3. Model Prediction
3.4. Model Interpretation
3.5. Model Comparison and Application
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Code Availability
Conflicts of Interest
Descriptor | Mean | SD | Maximum | Minimum | Description |
---|---|---|---|---|---|
MW | 110.34 | 38.92 | 284.76 | 26.04 | molecular weight |
AMW | 7.11 | 3.57 | 30.76 | 3.76 | average molecular weight |
Sv | 9.41 | 3.24 | 22.92 | 2.24 | sum of the atomic van der Waals volumes |
Se | 16.82 | 6.08 | 44.35 | 3.88 | sum of the atomic Sanderson electronegativities |
Sp | 10.16 | 3.69 | 26.42 | 2.21 | sum of the atomic polarizabilities |
Si | 19.22 | 7.28 | 52.72 | 4.41 | sum of the first ionization potentials |
Mv | 0.57 | 0.09 | 1.07 | 0.43 | mean atomic van der Waals volume |
Me | 1.00 | 0.046 | 1.21 | 0.95 | mean atomic Sanderson electronegativity |
Mp | 0.61 | 0.098 | 1.19 | 0.50 | mean atomic polarizability |
Mi | 1.14 | 0.020 | 1.19 | 1.07 | mean first ionization potential |
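As a rough illustration of how the first two constitutional descriptors in this table can be obtained, the sketch below computes MW and AMW from a molecular formula. It assumes AMW is MW divided by the total atom count (hydrogens included), which is the usual definition for this descriptor but is not stated in the table. The atomic weights are rounded standard values, the parser handles only simple formulas without brackets, and C2H2 is used purely as an example input.

```python
import re

# Rounded standard atomic weights in g/mol (assumption: only H, C, N, O are needed here).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C6H15NO' (no brackets)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """MW: sum of atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in parse_formula(formula).items())


def average_molecular_weight(formula):
    """AMW: MW divided by the total number of atoms (assumed definition)."""
    n_atoms = sum(parse_formula(formula).values())
    return molecular_weight(formula) / n_atoms


print(round(molecular_weight("C2H2"), 2))
print(round(average_molecular_weight("C2H2"), 2))
```

The remaining descriptors in the table (Sv, Se, Sp, Si and their means) follow the same sum-then-average pattern with different per-atom properties, which a descriptor package supplies.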
Model | R2_Train | RMSE_Train | R2_Validation | RMSE_Validation |
---|---|---|---|---|
Extra Trees | 0.9993 | 0.0518 | 0.8383 | 0.8137 |
Gradient Boosting | 0.9584 | 0.4901 | 0.8328 | 0.8275 |
CatBoost | 0.9970 | 0.1050 | 0.8283 | 0.8386 |
Random Forest | 0.9720 | 0.3205 | 0.8179 | 0.8635 |
XGBoost | 0.9993 | 0.0520 | 0.8131 | 0.8749 |
LightGBM | 0.9959 | 0.1225 | 0.8090 | 0.8843 |
AdaBoost | 0.8046 | 0.8459 | 0.7191 | 1.0725 |
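The two metrics reported in this table can be computed as in the minimal sketch below, which assumes the standard definitions of the coefficient of determination and the root-mean-square error. The toy `y_true` and `y_pred` values are hypothetical. The `R2_TABLE` dictionary copies the R2 columns above so that the gap between training and validation scores can be inspected.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data, only to exercise the two functions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred))

# (R2_Train, R2_Validation) pairs copied from the table above.
R2_TABLE = {
    "Extra Trees": (0.9993, 0.8383),
    "Gradient Boosting": (0.9584, 0.8328),
    "CatBoost": (0.9970, 0.8283),
    "Random Forest": (0.9720, 0.8179),
    "XGBoost": (0.9993, 0.8131),
    "LightGBM": (0.9959, 0.8090),
    "AdaBoost": (0.8046, 0.7191),
}

# A large train-validation gap is a common sign of overfitting.
gaps = {model: train - val for model, (train, val) in R2_TABLE.items()}
best_validation = max(R2_TABLE, key=lambda m: R2_TABLE[m][1])
smallest_gap = min(gaps, key=gaps.get)
print(best_validation, smallest_gap)
```

Ranking by validation R2 alone and ranking by train-validation gap give different answers here, which is one reason to read both columns together.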
Feature Selection Technique | AdaBoost | CatBoost | Decision Tree | Extra Trees | Gradient Boosting | LightGBM | Random Forest | XGBoost |
---|---|---|---|---|---|---|---|---|
F_regression | 0.764 | 0.622 | 0.888 | 0.590 | 0.659 | 0.697 | 0.702 | 0.735 |
Mutual_info_regression | 0.802 | 0.637 | 0.946 | 0.606 | 0.690 | 0.674 | 0.800 | 0.810 |
SelectfromModel_Lasso | 0.855 | 0.668 | 1.046 | 0.593 | 0.700 | 0.751 | 0.786 | 0.745 |
SelectfromModel_GB | 0.708 | 0.561 | 0.882 | 0.554 | 0.543 | 0.619 | 0.654 | 0.617 |
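The F_regression row refers to univariate filtering. A minimal pure-Python sketch of that idea follows, assuming the scoring matches scikit-learn's `f_regression`: each descriptor is scored by the F-statistic derived from its Pearson correlation with the target, and the top-k descriptors are kept. The descriptor names `d1` to `d3` and all values are hypothetical; the actual descriptor matrix and the value of k are not given in this section.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def f_statistic(x, y):
    """Univariate regression F-statistic: r^2 / (1 - r^2) * (n - 2).

    Undefined for a perfectly correlated feature (r^2 == 1), which is not handled here.
    """
    r = pearson_r(x, y)
    return r * r / (1.0 - r * r) * (len(x) - 2)


def rank_features(columns, y, k):
    """Return the names of the k features with the highest F-statistic."""
    scores = {name: f_statistic(values, y) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical descriptor columns and target values.
columns = {
    "d1": [1, 2, 3, 4, 5],
    "d2": [2, 1, 4, 3, 5],
    "d3": [5, 1, 4, 2, 3],
}
y = [1.1, 2.0, 2.9, 4.2, 5.1]
print(rank_features(columns, y, k=2))
```

Because the score depends on r squared, strongly negatively correlated descriptors rank as highly as positively correlated ones. The model-based rows (SelectfromModel_Lasso, SelectfromModel_GB) instead rank descriptors by fitted coefficients or importances and are not covered by this sketch.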
Parameter | Search Space | Optimum Values |
---|---|---|
learning_rate | (0.005, 1.0) | 0.134 |
n_estimators | (5, 200) | 193 |
min_samples_split | (2, 100) | 44 |
min_samples_leaf | (1, 100) | 59 |
max_depth | (2, 50) | 27 |
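The search space and optimum values above can be written down as plain configuration, as in the sketch below. The parameter names are spelled in lowercase, as scikit-learn expects. The random sampler is only a stand-in for a tuning loop: this section does not say which optimizer produced the tabulated optimum, so `sample_candidate` is an assumption and not the authors' procedure.

```python
import random

# (low, high) bounds copied from the table above.
SEARCH_SPACE = {
    "learning_rate": (0.005, 1.0),
    "n_estimators": (5, 200),
    "min_samples_split": (2, 100),
    "min_samples_leaf": (1, 100),
    "max_depth": (2, 50),
}

# Optimum values copied from the table above.
OPTIMUM = {
    "learning_rate": 0.134,
    "n_estimators": 193,
    "min_samples_split": 44,
    "min_samples_leaf": 59,
    "max_depth": 27,
}


def out_of_bounds(params, space):
    """Return the names of parameters whose value lies outside its (low, high) range."""
    return [name for name, value in params.items()
            if not space[name][0] <= value <= space[name][1]]


def sample_candidate(space, rng):
    """Draw one random candidate: integers for integer bounds, floats otherwise."""
    candidate = {}
    for name, (low, high) in space.items():
        if isinstance(low, int) and isinstance(high, int):
            candidate[name] = rng.randint(low, high)
        else:
            candidate[name] = rng.uniform(low, high)
    return candidate


print(out_of_bounds(OPTIMUM, SEARCH_SPACE))  # [] when every optimum is inside its range
print(sample_candidate(SEARCH_SPACE, random.Random(0)))
```

The five names match keyword arguments of scikit-learn's `GradientBoostingRegressor`, so `GradientBoostingRegressor(**OPTIMUM)` would be one plausible way to instantiate the tuned model. That pairing is inferred from the parameter names and is not confirmed by this section.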
Compound | Formula | H_s^cp |
---|---|---|
2-(Butylamino)ethanol | C6H15NO | 2.66 × 10⁻¹ |
2-Amino-1-butanol | C4H11NO | 3.10 × 10⁻² |
N-(Hydroxyethyl)ethylenediamine | C4H12N2O | 1.33 × 10⁻² |
N-(2-Hydroxyethyl)acetamide | C4H9NO2 | 1.33 × 10⁻² |
1,4-Dimethylpiperazine | C6H14N2 | 2.28 × 10⁻¹ |
N-(2-Hydroxyethyl)formamide | C3H7NO2 | 1.33 × 10⁻² |
1-(2-Hydroxyethyl)imidazole | C5H8N2O | 1.13 × 10⁻² |
2-Hydroxy-N-(2-hydroxyethyl)acetamide | C4H9NO3 | 3.28 × 10⁻³ |
N-(2-Hydroxyethyl)oxamic acid | C4H7NO4 | 6.70 × 10⁻³ |
N,N’-Bis(2-hydroxyethyl)oxamide | C6H12N2O4 | 3.63 × 10⁻³ |
N-Formylpiperazine | C5H10N2O | 3.28 × 10⁻³ |
N-Ethylpiperazine | C6H14N2 | 1.78 × 10⁻¹ |
4-Diethylamino-2-butanol | C8H19NO | 2.63 × 10⁻¹ |
N,N,N’,N’-Tetramethylpropane-1,3-diamine | C7H18N2 | 3.59 × 10⁻¹ |
1-(Dimethylamino)propan-2-ol | C5H13NO | 4.78 × 10⁻² |
N-Methylcyclohexylamine | C7H15N | 1.08 × 10⁰ |
Dodecanedioic acid | C12H22O4 | 1.23 × 10⁻¹ |
Methylmalonic acid | C4H6O4 | 1.23 × 10⁻¹ |
2,2-Dimethylmalonic acid | C5H8O4 | 6.70 × 10⁻³ |
Ethylmalonic acid | C5H8O4 | 1.05 × 10⁻² |
Butylmalonic acid | C7H12O4 | 1.05 × 10⁻² |
Methylsuccinic acid | C5H8O4 | 9.38 × 10⁻² |
2,2-Dimethylsuccinic acid | C6H10O4 | 1.05 × 10⁻² |
2-Methylglutaric acid | C6H10O4 | 3.02 × 10⁻² |
2,2-Dimethylglutaric acid | C7H12O4 | 3.02 × 10⁻² |
3,3-Dimethylglutaric acid | C7H12O4 | 9.38 × 10⁻² |
Aspartic acid | C4H7NO4 | 3.02 × 10⁻² |
2-Oxosuccinic acid | C4H4O5 | 9.38 × 10⁻² |
3-Oxoglutaric acid | C5H6O5 | 6.70 × 10⁻³ |
2-Oxoadipic acid | C6H8O5 | 6.70 × 10⁻³ |
4-Oxopimelic acid | C7H10O5 | 4.41 × 10⁻² |
2-Hydroxymalonic acid | C3H4O5 | 1.07 × 10⁻² |
1,2-Cyclopentanedicarboxylic acid | C7H10O4 | 3.88 × 10⁻² |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ullah, A.; Shaheryar, M.; Lim, H.-J. Machine Learning Approach for the Estimation of Henry’s Law Constant Based on Molecular Descriptors. Atmosphere 2024, 15, 706. https://doi.org/10.3390/atmos15060706