SolPredictor: Predicting Solubility with Residual Gated Graph Neural Network
Abstract
1. Introduction
2. Results and Discussion
2.1. Evaluation of Independent Datasets
2.2. Error Analysis
2.3. Feature Importance
2.4. Hyperparameter Tuning
2.5. Web Server for Solubility
3. Materials and Methods
3.1. Dataset
3.1.1. Data Preprocessing
3.1.2. Ten-Fold Data Split
3.2. Methods
3.2.1. Molecular Feature Extraction
3.2.2. Graph Neural Network (GNN)
3.2.3. Residual Gated Graph Convolution
3.2.4. Implementation Details
3.2.5. Evaluation Metrics
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 2012, 64, 4–17.
- Lipinski, C.A.L.F. Poor aqueous solubility—An industry-wide problem in drug discovery. Am. Pharm. Rev. 2002, 5, 82–85.
- Di, L.; Kerns, E.H. Drug-like Properties: Concepts, Structure Design and Methods from ADME to Toxicity Optimization; Academic Press: Cambridge, MA, USA, 2015.
- Kostewicz, E.S.; Brauns, U.; Becker, R.; Dressman, J.B. Forecasting the oral absorption behavior of poorly soluble weak bases using solubility and dissolution studies in biorelevant media. Pharm. Res. 2002, 19, 345.
- McPherson, S.; Perrier, J.; Dunn, C.; Khadra, I.; Davidson, S.; Ainousah, B.; Wilson, C.G.; Halbert, G. Small scale design of experiment investigation of equilibrium solubility in simulated fasted and fed intestinal fluid. Eur. J. Pharm. Biopharm. 2020, 150, 14–23.
- Chaudhary, A.; Nagaich, U.; Gulati, N.; Sharma, V.; Khosa, R.; Partapur, M. Enhancement of solubilization and bioavailability of poorly soluble drugs by physical and chemical modifications: A recent review. J. Adv. Pharm. Educ. Res. 2012, 2, 32–67.
- Tu, M.; Cheng, S.; Lu, W.; Du, M. Advancement and prospects of bioinformatics analysis for studying bioactive peptides from food-derived protein: Sequence, structure, and functions. TrAC Trends Anal. Chem. 2018, 105, 7–17.
- Jan, B.; Farman, H.; Khan, M.; Imran, M.; Islam, I.U.; Ahmad, A.; Ali, S.; Jeon, G. Deep learning in big data analytics: A comparative study. Comput. Electr. Eng. 2019, 75, 275–287.
- Tang, W.; Chen, J.; Wang, Z.; Xie, H.; Hong, H. Deep learning for predicting toxicity of chemicals: A mini review. J. Environ. Sci. Health Part C 2018, 36, 252–271.
- Wang, X.; Liu, M.; Zhang, L.; Wang, Y.; Li, Y.; Lu, T. Optimizing pharmacokinetic property prediction based on integrated datasets and a deep learning approach. J. Chem. Inf. Model. 2020, 60, 4603–4613.
- Khan, A.; Tayara, H.; Chong, K.T. Prediction of organic material band gaps using graph attention network. Comput. Mater. Sci. 2023, 220, 112063.
- Qin, G.; Wei, Y.; Yu, L.; Xu, J.; Ojih, J.; Rodriguez, A.D.; Wang, H.; Qin, Z.; Hu, M. Predicting lattice thermal conductivity from fundamental material properties using machine learning techniques. J. Mater. Chem. A 2023, 11, 5801–5810.
- Stahl, K.; Graziadei, A.; Dau, T.; Brock, O.; Rappsilber, J. Protein structure prediction with in-cell photo-crosslinking mass spectrometry and deep learning. Nat. Biotechnol. 2023, 41, 1810–1819.
- Boothroyd, S.; Kerridge, A.; Broo, A.; Buttar, D.; Anwar, J. Solubility prediction from first principles: A density of states approach. Phys. Chem. Chem. Phys. 2018, 20, 20981–20987.
- Livingstone, D.J.; Ford, M.G.; Huuskonen, J.J.; Salt, D.W. Simultaneous prediction of aqueous solubility and octanol/water partition coefficient based on descriptors derived from molecular structure. J. Comput. Aided Mol. Des. 2001, 15, 741–752.
- Ma, X.; Li, Z.; Achenie, L.E.; Xin, H. Machine-learning-augmented chemisorption model for CO2 electroreduction catalyst screening. J. Phys. Chem. Lett. 2015, 6, 3528–3533.
- Korotcov, A.; Tkachenko, V.; Russo, D.P.; Ekins, S. Comparison of deep learning with multiple machine learning methods and metrics using diverse drug discovery data sets. Mol. Pharm. 2017, 14, 4462–4475.
- Olivecrona, M.; Blaschke, T.; Engkvist, O.; Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 2017, 9, 48.
- Hirohara, M.; Saito, Y.; Koda, Y.; Sato, K.; Sakakibara, Y. Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinform. 2018, 19, 83–94.
- Arús-Pous, J.; Johansson, S.V.; Prykhodko, O.; Bjerrum, E.J.; Tyrchan, C.; Reymond, J.L.; Chen, H.; Engkvist, O. Randomized SMILES strings improve the quality of molecular generative models. J. Cheminform. 2019, 11, 71.
- Chen, S.; Wulamu, A.; Zou, Q.; Zheng, H.; Wen, L.; Guo, X.; Chen, H.; Zhang, T.; Zhang, Y. MD-GNN: A mechanism-data-driven graph neural network for molecular properties prediction and new material discovery. J. Mol. Graph. Model. 2023, 123, 108506.
- Cremer, J.; Sandonas, L.M.; Tkatchenko, A.; Clevert, D.A.; De Fabritiis, G. Equivariant Graph Neural Networks for Toxicity Prediction; ACS Publications: Washington, DC, USA, 2023.
- Yang, L.; Jin, C.; Yang, G.; Bing, Z.; Huang, L.; Niu, Y.; Yang, L. Transformer-based deep learning method for optimizing ADMET properties of lead compounds. Phys. Chem. Chem. Phys. 2023, 25, 2377–2385.
- Atz, K.; Grisoni, F.; Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell. 2021, 3, 1023–1032.
- Chuang, K.V.; Gunsalus, L.M.; Keiser, M.J. Learning molecular representations for medicinal chemistry: Miniperspective. J. Med. Chem. 2020, 63, 8705–8722.
- Ghasemi, F.; Mehridehnavi, A.; Pérez-Garrido, A.; Pérez-Sánchez, H. Neural network and deep-learning algorithms used in QSAR studies: Merits and drawbacks. Drug Discov. Today 2018, 23, 1784–1790.
- Padula, D.; Simpson, J.D.; Troisi, A. Combining electronic and structural features in machine learning models to predict organic solar cells properties. Mater. Horizons 2019, 6, 343–349.
- Kang, B.; Seok, C.; Lee, J. Prediction of molecular electronic transitions using random forests. J. Chem. Inf. Model. 2020, 60, 5984–5994.
- Fan, Z.; Ma, E. Predicting orientation-dependent plastic susceptibility from static structure in amorphous solids via deep learning. Nat. Commun. 2021, 12, 1506.
- Wu, C.K.; Zhang, X.C.; Yang, Z.J.; Lu, A.P.; Hou, T.J.; Cao, D.S. Learning to SMILES: BAN-based strategies to improve latent representation learning from molecules. Briefings Bioinform. 2021, 22, bbab327.
- Shen, C.; Krenn, M.; Eppel, S.; Aspuru-Guzik, A. Deep molecular dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations. Mach. Learn. Sci. Technol. 2021, 2, 03LT02.
- Capecchi, A.; Reymond, J.L. Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning. J. Cheminform. 2021, 13, 82.
- Gao, P.; Liu, Z.; Tan, Y.; Zhang, J.; Xu, L.; Wang, Y.; Jeong, S.Y. Accurate predictions of drugs aqueous solubility via deep learning tools. J. Mol. Struct. 2022, 1249, 131562.
- Cui, Q.; Lu, S.; Ni, B.; Zeng, X.; Tan, Y.; Chen, Y.D.; Zhao, H. Improved prediction of aqueous solubility of novel compounds by going deeper with deep learning. Front. Oncol. 2020, 10, 121.
- Bae, S.Y.; Lee, J.; Jeong, J.; Lim, C.; Choi, J. Effective data-balancing methods for class-imbalanced genotoxicity datasets using machine learning algorithms and molecular fingerprints. Comput. Toxicol. 2021, 20, 100178.
- Maziarka, Ł.; Danel, T.; Mucha, S.; Rataj, K.; Tabor, J.; Jastrzębski, S. Molecule attention transformer. arXiv 2020, arXiv:2002.08264.
- Francoeur, P.G.; Koes, D.R. SolTranNet—A Machine Learning Tool for Fast Aqueous Solubility Prediction. J. Chem. Inf. Model. 2021, 61, 2530–2536.
- Sorkun, M.C.; Khetan, A.; Er, S. AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds. Sci. Data 2019, 6, 143.
- Boobier, S.; Osbourn, A.; Mitchell, J.B. Can human experts predict solubility better than computers? J. Cheminform. 2017, 9, 63.
- Lovrić, M.; Pavlović, K.; Žuvela, P.; Spataru, A.; Lučić, B.; Kern, R.; Wong, M.W. Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: Generalization, complexity, or predictive ability? J. Chemom. 2021, 35, e3349.
- Llinas, A.; Oprisiu, I.; Avdeef, A. Findings of the second challenge to predict aqueous solubility. J. Chem. Inf. Model. 2020, 60, 4791–4803.
- Amara, K.; Ying, R.; Zhang, Z.; Han, Z.; Shan, Y.; Brandes, U.; Schemm, S.; Zhang, C. Graphframex: Towards systematic evaluation of explainability methods for graph neural networks. arXiv 2022, arXiv:2206.09677.
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
- McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 56–61.
- Landrum, G. RDKit documentation. Release 2013, 1, 4.
- Fey, M.; Lenssen, J.E. Fast graph representation learning with PyTorch Geometric. arXiv 2019, arXiv:1903.02428.
- Walters, W.P.; Barzilay, R. Applications of deep learning in molecule generation and molecular property prediction. Acc. Chem. Res. 2020, 54, 263–270.
- Gori, M.; Monfardini, G.; Scarselli, F. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; Volume 2, pp. 729–734.
- Sukhbaatar, S.; Fergus, R. Learning multiagent communication with backpropagation. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
- Marcheggiani, D.; Titov, I. Encoding sentences with graph convolutional networks for semantic role labeling. arXiv 2017, arXiv:1703.04826.
- Bresson, X.; Laurent, T. Residual gated graph convnets. arXiv 2017, arXiv:1711.07553.
Fold | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
RMSE | 1.00 | 0.93 | 1.02 | 1.03 | 1.04 | 1.08 | 1.02 | 1.01 | 1.06 | 1.09 |
R² | 0.73 | 0.79 | 0.80 | 0.74 | 0.81 | 0.78 | 0.77 | 0.76 | 0.80 | 0.79 |
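The per-fold scores above can be condensed into a mean and spread, which is how cross-validation results are usually summarized. The following is a minimal sketch using only the Python standard library; it assumes the second metric row holds R² for the same ten folds, and the helper name `summarize` is made up for illustration.

```python
from statistics import mean, stdev

# Per-fold values copied from the ten-fold table above.
rmse = [1.00, 0.93, 1.02, 1.03, 1.04, 1.08, 1.02, 1.01, 1.06, 1.09]
# Assumption: the second metric row is R^2 for the same folds.
r2 = [0.73, 0.79, 0.80, 0.74, 0.81, 0.78, 0.77, 0.76, 0.80, 0.79]


def summarize(values):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(values), stdev(values)


rmse_mean, rmse_sd = summarize(rmse)
r2_mean, r2_sd = summarize(r2)

print(f"RMSE: {rmse_mean:.3f} +/- {rmse_sd:.3f}")
print(f"R^2:  {r2_mean:.3f} +/- {r2_sd:.3f}")
```

The means work out to 1.028 for RMSE and 0.777 for the second row; the sample standard deviation is one reasonable choice of spread, not necessarily the one the authors report.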
Datasets | SolTranNet R² | SolTranNet RMSE | SolPredictor R² | SolPredictor RMSE |
---|---|---|---|---|
Cui et al. [34] | 0.611 | 0.624 | 0.547 | 0.597 |
Boobier et al. [39] | 0.724 | 1.010 | 0.814 | 0.743 |
Lovrić et al. [40] | 0.783 | 0.720 | 0.805 | 0.783 |
Llinas et al. [41] set1 | 0.527 | 0.952 | 0.373 | 0.991 |
Llinas et al. [41] set2 | 0.824 | 1.243 | 0.677 | 1.142 |
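To read the comparison table at a glance, one can tally which model has the lower error on each independent dataset. The sketch below is illustrative only: it assumes the second number in each model's pair is RMSE (those columns contain values above 1, which R² cannot), and the names `rows` and `lower_rmse_model` are hypothetical.

```python
# Rows copied from the comparison table above. Each tuple is
# (dataset, SolTranNet first value, SolTranNet second value,
#  SolPredictor first value, SolPredictor second value).
# Assumption: the second value of each pair is RMSE (lower is better).
rows = [
    ("Cui et al. [34]", 0.611, 0.624, 0.547, 0.597),
    ("Boobier et al. [39]", 0.724, 1.010, 0.814, 0.743),
    ("Lovric et al. [40]", 0.783, 0.720, 0.805, 0.783),
    ("Llinas et al. [41] set1", 0.527, 0.952, 0.373, 0.991),
    ("Llinas et al. [41] set2", 0.824, 1.243, 0.677, 1.142),
]


def lower_rmse_model(row):
    """Name the model with the lower assumed-RMSE value for one dataset."""
    _, _, soltrannet_rmse, _, solpredictor_rmse = row
    if solpredictor_rmse < soltrannet_rmse:
        return "SolPredictor"
    if soltrannet_rmse < solpredictor_rmse:
        return "SolTranNet"
    return "tie"


winners = {row[0]: lower_rmse_model(row) for row in rows}
solpredictor_wins = sum(1 for w in winners.values() if w == "SolPredictor")

for dataset, winner in winners.items():
    print(f"{dataset}: {winner}")
print(f"SolPredictor has the lower value on {solpredictor_wins} of {len(rows)} datasets")
```

Under that column assumption, SolPredictor has the lower value on three of the five datasets (Cui, Boobier, and Llinas set2) and SolTranNet on the other two.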
Atom Features | Description |
---|---|
Atomic number | 1 to 119 |
Chirality | Atom chirality |
Degree | 0 to 11 |
Charge | −5 to 7 |
Hydrogens | Connected hydrogens |
Number of radical electrons | 0 to 5 |
Hybridization | s, sp², sp³d², sp, sp³, sp³d, other |
Aromaticity | False or True |
Is in ring | False or True |
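One common way to feed such categorical atom features to a graph neural network is to map each value to its index in a fixed vocabulary. The following is a minimal sketch of that idea, not the authors' pipeline: the `VOCAB` layout, the key names, and `encode_atom` are all assumptions, and chirality and hydrogen count are left out because the table does not list their allowed values.

```python
# Vocabularies built from the ranges listed in the table above.
# Assumption: each feature is encoded as its position in the list;
# the hybridization order simply follows the table as printed.
VOCAB = {
    "atomic_num": list(range(1, 120)),            # 1 to 119
    "degree": list(range(0, 12)),                 # 0 to 11
    "charge": list(range(-5, 8)),                 # -5 to 7
    "num_radical_electrons": list(range(0, 6)),   # 0 to 5
    "hybridization": ["s", "sp2", "sp3d2", "sp", "sp3", "sp3d", "other"],
    "is_aromatic": [False, True],
    "is_in_ring": [False, True],
}


def encode_atom(atom):
    """Map a dict of atom properties to a list of categorical indices.

    Raises ValueError if a value falls outside the listed range.
    """
    return [VOCAB[name].index(atom[name]) for name in VOCAB]


# Hypothetical example: an aromatic ring carbon with two heavy-atom neighbours.
aromatic_carbon = {
    "atomic_num": 6,
    "degree": 2,
    "charge": 0,
    "num_radical_electrons": 0,
    "hybridization": "sp2",
    "is_aromatic": True,
    "is_in_ring": True,
}

features = encode_atom(aromatic_carbon)
print(features)  # [5, 2, 5, 0, 1, 1, 1]
```

In practice the property values would come from a cheminformatics toolkit rather than a hand-written dict, and the resulting index vectors would be stacked into the node feature matrix of the molecular graph.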
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ahmad, W.; Tayara, H.; Shim, H.; Chong, K.T. SolPredictor: Predicting Solubility with Residual Gated Graph Neural Network. Int. J. Mol. Sci. 2024, 25, 715. https://doi.org/10.3390/ijms25020715