A Deep-Learning Approach toward Rational Molecular Docking Protocol Selection
Abstract
1. Introduction
- Given a particular docking protocol, would it be possible to know a priori which protein–ligand pairs will result in the best docking pose?
- Is there a preferable way of choosing the best docking protocol for an arbitrary ligand, rather than selecting the one that yields the best self-docking pose for a particular protein structure?
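The second question can be made concrete with a small sketch. It assumes, hypothetically, that some model returns a predicted best-pose RMSD for each protocol and a given protein–ligand pair; selection is then an argmin over protocols. Applying the same rule to self-docking RMSDs of a co-crystallized ligand gives the conventional baseline, so the two strategies differ only in which numbers are fed in. The function names and values are illustrative, not results from this study, and the 2 Å success cutoff is a common docking convention assumed here.

```python
from typing import Mapping

# Assumed convention: a pose within 2 Å RMSD of the crystal pose counts as a success.
SUCCESS_THRESHOLD_ANGSTROM = 2.0


def select_protocol(rmsd_by_protocol: Mapping[str, float]) -> str:
    """Return the protocol name with the lowest RMSD (predicted or observed)."""
    if not rmsd_by_protocol:
        raise ValueError("no protocols to choose from")
    return min(rmsd_by_protocol, key=lambda name: rmsd_by_protocol[name])


def is_success(rmsd: float, threshold: float = SUCCESS_THRESHOLD_ANGSTROM) -> bool:
    """True if a pose RMSD falls within the (assumed) success cutoff."""
    return rmsd <= threshold


if __name__ == "__main__":
    # Baseline: ranking from self-docking of the co-crystallized ligand (hypothetical values).
    self_docking = {"autodock-ga": 0.9, "gold-asp": 1.3, "vina-std": 1.1}
    # Ligand-specific: per-protocol RMSD predicted for a new ligand (hypothetical values).
    predicted = {"autodock-ga": 2.7, "gold-asp": 1.6, "vina-std": 2.2}

    print(select_protocol(self_docking))  # autodock-ga
    print(select_protocol(predicted))     # gold-asp
    print(is_success(predicted[select_protocol(predicted)]))  # True
```

When the two choices disagree, as in these toy numbers, the ligand-specific prediction is the one the second question is asking about.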
2. Results and Discussion
3. Materials and Methods
3.1. Datasets
3.2. Complex Preparation
3.3. Data Generation
3.4. Descriptor Calculation
3.5. Neural Network Architecture
3.6. Training and Validation
3.7. Implementation and Code Availability
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
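Abbreviation | Definition |
---|---|
RMSD | Root-mean-square deviation |
DL | Deep learning |
NN | Neural network |
CNN | Convolutional neural network |
FNN | Fully connected neural network |
RMSE | Root-mean-square error |
GA | Genetic algorithm |
ACO | Ant colony optimization |
MC | Monte Carlo |
BFGS | Broyden–Fletcher–Goldfarb–Shanno |
SF | Scoring function |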
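RMSD is the quantity by which every docking protocol here is judged. As a minimal sketch, assuming both poses list the same atoms in the same order and are already expressed in the receptor frame (so no superposition is performed, as is usual when a docked pose is compared with the crystal pose), it is $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert \mathbf{x}_i - \mathbf{y}_i \rVert^2}$. Symmetry-equivalent atom mappings, which dedicated tools often account for, are deliberately ignored in this sketch.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coordinate], reference: Sequence[Coordinate]) -> float:
    """RMSD (in the coordinates' units, typically Å) between two poses of one ligand.

    Assumes identical atom ordering and a shared receptor frame:
    no superposition and no symmetry correction are applied.
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pose, reference):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(pose))


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom shifted 1 Å along z
    print(pose_rmsd(docked, crystal))  # 1.0
```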
Protocol | Random RMSE | Random Pearson’s R | Ligand Scaffold RMSE | Ligand Scaffold Pearson’s R | Protein Classes RMSE | Protein Classes Pearson’s R | Protein Classes Balanced RMSE | Protein Classes Balanced Pearson’s R |
---|---|---|---|---|---|---|---|---|
autodock-ga | 1.60 | 0.74 | 1.34 | 0.38 | 1.76 | 0.60 | 1.48 | 0.73 |
autodock-lga | 2.01 | 0.65 | 1.82 | 0.30 | 2.20 | 0.57 | 1.89 | 0.70 |
autodock-ls | 2.04 | 0.50 | 1.79 | 0.50 | 2.02 | 0.41 | 1.93 | 0.46 |
glide-sp | 2.79 | 0.52 | 3.34 | 0.14 | 2.84 | 0.44 | 2.34 | 0.64 |
gold-asp | 2.43 | 0.68 | 2.50 | 0.50 | 2.52 | 0.64 | 2.08 | 0.78 |
gold-chemscore | 2.59 | 0.62 | 2.74 | 0.37 | 2.62 | 0.61 | 2.25 | 0.73 |
gold-goldscore | 2.47 | 0.52 | 2.44 | 0.53 | 2.49 | 0.51 | 2.12 | 0.66 |
gold-plp | 2.49 | 0.66 | 2.53 | 0.32 | 2.57 | 0.62 | 2.14 | 0.76 |
plants-chemplp | 2.55 | 0.44 | 2.68 | −0.02 | 2.55 | 0.56 | 2.23 | 0.58 |
plants-plp95 | 3.04 | 0.42 | 3.16 | −0.12 | 3.08 | 0.40 | 2.58 | 0.57 |
plants-plp | 2.75 | 0.43 | 2.76 | 0.09 | 2.79 | 0.41 | 2.44 | 0.54 |
rdock-solv | 3.95 | 0.35 | 3.58 | 0.09 | 3.73 | 0.42 | 3.33 | 0.54 |
rdock-std | 3.92 | 0.35 | 3.62 | 0.08 | 3.71 | 0.42 | 3.23 | 0.56 |
vina-std | 2.23 | 0.40 | 2.30 | 0.19 | 2.35 | 0.33 | 1.97 | 0.69 |
Average | 2.63 | 0.52 | 2.62 | 0.24 | 2.66 | 0.50 | 2.29 | 0.64 |
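The two metrics in the table above compare a model's predicted RMSD with the RMSD each protocol actually obtained. The plain-Python sketch below uses the textbook definitions (the study's own evaluation code is not reproduced here). It also recomputes the Average row under the assumption that it is the unweighted mean of the 14 protocol rows; for the random split this gives 2.63 and 0.52, matching the table, although averaging the rounded entries can be off in the last digit for other columns.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed values."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Random-split columns copied from the table, in row order (autodock-ga ... vina-std).
RANDOM_RMSE = [1.60, 2.01, 2.04, 2.79, 2.43, 2.59, 2.47, 2.49, 2.55, 3.04, 2.75, 3.95, 3.92, 2.23]
RANDOM_R = [0.74, 0.65, 0.50, 0.52, 0.68, 0.62, 0.52, 0.66, 0.44, 0.42, 0.43, 0.35, 0.35, 0.40]

if __name__ == "__main__":
    print(round(sum(RANDOM_RMSE) / len(RANDOM_RMSE), 2))  # 2.63
    print(round(sum(RANDOM_R) / len(RANDOM_R), 2))        # 0.52
```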
Split Type | Pearson’s R | RMSE |
---|---|---|
random | | |
ligand scaffold | | |
protein classes | | |
protein classes balanced | | |
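The split types differ mainly in what must not be shared between the training and test sets. Assuming each complex carries a group label (a ligand scaffold string, or a protein-class label such as a Pfam family), a leakage-free split assigns whole groups to one side. The sketch below is a generic group split, not this study's exact procedure; a purely random split is the special case in which every complex is its own group, and the "balanced" variant is not modeled. Scaffold labels could, for example, be computed with RDKit's Murcko-scaffold utilities; the IDs and labels in the example are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_split(
    keys: Sequence[str],
    groups: Sequence[Hashable],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Assign whole groups to train or test so that no group spans both sets."""
    if len(keys) != len(groups):
        raise ValueError("keys and groups must have the same length")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")

    members: Dict[Hashable, List[str]] = defaultdict(list)
    for key, group in zip(keys, groups):
        members[group].append(key)

    # Shuffle group identities (not individual complexes) reproducibly.
    group_ids = list(members)
    random.Random(seed).shuffle(group_ids)

    target = test_fraction * len(keys)
    train: List[str] = []
    test: List[str] = []
    for group in group_ids:
        # Fill the test set group by group until it reaches the target size.
        if len(test) < target:
            test.extend(members[group])
        else:
            train.extend(members[group])
    return train, test


if __name__ == "__main__":
    # Hypothetical complex IDs and scaffold labels, for illustration only.
    ids = ["cpx1", "cpx2", "cpx3", "cpx4", "cpx5"]
    scaffolds = ["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1", "C1CCCCC1"]
    train_ids, test_ids = group_split(ids, scaffolds, test_fraction=0.4, seed=1)
    print("train:", train_ids)
    print("test:", test_ids)
    # A random split is the special case where every complex is its own group.
    print(group_split(ids, ids, test_fraction=0.4, seed=1))
```

Because groups move as a unit, the realized test fraction only approximates `test_fraction`; large groups make the overshoot more noticeable.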
Program | Search Algorithm | Scoring Function | Protocol Abbreviation |
---|---|---|---|
AutoDock 4.2 | Local search | AutoDock SF | autodock-ls |
AutoDock 4.2 | Lamarckian GA | AutoDock SF | autodock-lga |
AutoDock 4.2 | GA | AutoDock SF | autodock-ga |
Glide 6.5 | Glide algorithm | Standard precision | glide-sp |
GOLD 5.4.1 | GA | ASP | gold-asp |
GOLD 5.4.1 | GA | ChemScore | gold-chemscore |
GOLD 5.4.1 | GA | GoldScore | gold-goldscore |
GOLD 5.4.1 | GA | PLP | gold-plp |
PLANTS 1.2 | ACO algorithm | ChemPLP | plants-chemplp |
PLANTS 1.2 | ACO algorithm | PLP | plants-plp |
PLANTS 1.2 | ACO algorithm | PLP95 | plants-plp95 |
rDock 2013.1 | GA + MC + Simplex minimization | rDock master SF | rdock-std |
rDock 2013.1 | GA + MC + Simplex minimization | rDock master SF + desolvation | rdock-solv |
Vina 1.1.2 | MC + BFGS local search | Vina SF | vina-std |
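The protocol grid above maps naturally onto a small lookup table, for instance to fix the order of per-protocol outputs of a multi-output model or to check that a results table covers every protocol. Both uses are assumptions for illustration; the names `Protocol`, `PROTOCOLS`, and `by_program` are hypothetical, while the entries are transcribed from the table.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Protocol(NamedTuple):
    program: str
    search: str
    scoring: str


# Transcribed from the protocol table; keys are the protocol abbreviations.
PROTOCOLS: Dict[str, Protocol] = {
    "autodock-ls": Protocol("AutoDock 4.2", "Local search", "AutoDock SF"),
    "autodock-lga": Protocol("AutoDock 4.2", "Lamarckian GA", "AutoDock SF"),
    "autodock-ga": Protocol("AutoDock 4.2", "GA", "AutoDock SF"),
    "glide-sp": Protocol("Glide 6.5", "Glide algorithm", "Standard precision"),
    "gold-asp": Protocol("GOLD 5.4.1", "GA", "ASP"),
    "gold-chemscore": Protocol("GOLD 5.4.1", "GA", "ChemScore"),
    "gold-goldscore": Protocol("GOLD 5.4.1", "GA", "GoldScore"),
    "gold-plp": Protocol("GOLD 5.4.1", "GA", "PLP"),
    "plants-chemplp": Protocol("PLANTS 1.2", "ACO algorithm", "ChemPLP"),
    "plants-plp": Protocol("PLANTS 1.2", "ACO algorithm", "PLP"),
    "plants-plp95": Protocol("PLANTS 1.2", "ACO algorithm", "PLP95"),
    "rdock-std": Protocol("rDock 2013.1", "GA + MC + Simplex minimization", "rDock master SF"),
    "rdock-solv": Protocol("rDock 2013.1", "GA + MC + Simplex minimization", "rDock master SF + desolvation"),
    "vina-std": Protocol("Vina 1.1.2", "MC + BFGS local search", "Vina SF"),
}


def by_program() -> Dict[str, List[str]]:
    """Group protocol abbreviations by docking program, preserving table order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for abbrv, protocol in PROTOCOLS.items():
        groups[protocol.program].append(abbrv)
    return dict(groups)


if __name__ == "__main__":
    for program, abbrvs in by_program().items():
        print(f"{program}: {', '.join(abbrvs)}")
```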
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Jiménez-Luna, J.; Cuzzolin, A.; Bolcato, G.; Sturlese, M.; Moro, S. A Deep-Learning Approach toward Rational Molecular Docking Protocol Selection. Molecules 2020, 25, 2487. https://doi.org/10.3390/molecules25112487