UnbiasedDTI: Mitigating Real-World Bias of Drug-Target Interaction Prediction by Using Deep Ensemble-Balanced Learning
Abstract
1. Introduction
2. Background
3. Materials and Methods
3.1. Data
3.2. Ensemble Methods
3.2.1. Different Training Data
3.2.2. Different Models
3.2.3. Different Combinations of the Outputs
3.3. Methodology
Algorithm 1 Pseudocode for training an ensemble of classifiers with undersampling of the negative set
Input:
  P = Positive set (minority class samples in the dataset D)
  N = Negative set (majority class samples in the dataset D)
  M = Number of base learners in the ensemble model
Output:
  Ensemble = trained ensemble
for do
  for do
    Randomly sample :
    = train classifier using P and
  end for
end for
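The idea behind Algorithm 1 can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. Several details are assumptions: each base learner sees all of P plus a random negative subset of the same size as P, the ensemble output is the plain average of the base-learner scores, and the base learner is a toy nearest-centroid scorer (`train_centroid_classifier`, a hypothetical stand-in for the deep networks). The function names are illustrative only.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], float]  # maps a feature vector to a score in [0, 1]


def train_centroid_classifier(pos: List[Vector], neg: List[Vector]) -> Classifier:
    """Toy base learner (hypothetical stand-in for a deep network)."""
    def centroid(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(pos), centroid(neg)

    def dist2(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def predict(x: Vector) -> float:
        d_pos, d_neg = dist2(x, c_pos), dist2(x, c_neg)
        total = d_pos + d_neg
        # Closer to the positive centroid gives a score nearer to 1.
        return 0.5 if total == 0 else d_neg / total

    return predict


def train_balanced_ensemble(
    P: List[Vector],
    N: List[Vector],
    M: int,
    train: Callable[[List[Vector], List[Vector]], Classifier] = train_centroid_classifier,
    seed: int = 0,
) -> List[Classifier]:
    """Train M base learners, each on P plus a fresh undersampled subset of N."""
    rng = random.Random(seed)
    k = min(len(P), len(N))  # assumption: each negative subset matches |P|
    ensemble = []
    for _ in range(M):
        N_i = rng.sample(N, k)  # sampling without replacement within one subset
        ensemble.append(train(P, N_i))
    return ensemble


def predict_ensemble(ensemble: List[Classifier], x: Vector) -> float:
    """Aggregate by averaging base-learner scores (an assumed aggregation rule)."""
    return sum(h(x) for h in ensemble) / len(ensemble)


if __name__ == "__main__":
    P = [(1.0, 1.0), (1.2, 0.9)]
    N = [(-1.0, -1.0), (-1.1, -0.8), (-0.9, -1.2), (-1.3, -1.0), (-0.7, -1.1), (-1.0, -0.6)]
    ensemble = train_balanced_ensemble(P, N, M=5)
    print(predict_ensemble(ensemble, (1.0, 1.0)) > 0.5)
```

Because every base learner is trained on a balanced set, no single learner is dominated by the majority class, while different negative subsets let the ensemble as a whole use more of N than any one learner does.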
3.4. Drug and Target Vectorization
3.4.1. Protein Vectorization
3.4.2. Compound Vectorization
3.5. Lab Validation
4. Results and Discussion
4.1. Evaluation Metrics
4.2. Comparison of Results between Unbalanced Models vs. Proposed Method
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Sample Availability
Abbreviations
DTI | Drug-target interaction |
SMILES | Simplified Molecular-Input Line-Entry System |
PSC | Protein Sequence Composition |
ErG | Extended reduced graphs |
ESPF | Explainable Substructure Partition Fingerprint |
Appendix A
Dataset | Unique Drugs | Unique Targets | Total Pairs | Positive Pairs | Negative Pairs | Imbalance Ratio |
---|---|---|---|---|---|---|
BindingDB | 679,118 | 5,941 | 1,369,057 | 492,970 | 876,087 | 1.78 |
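The reported imbalance ratio is consistent with the pair counts in the table if it is taken to be the number of negative pairs divided by the number of positive pairs. That definition is an assumption here, chosen because it reproduces the tabulated value:

```latex
\mathrm{IR} \;=\; \frac{|N|}{|P|} \;=\; \frac{876{,}087}{492{,}970} \;\approx\; 1.78,
\qquad
|P| + |N| \;=\; 492{,}970 + 876{,}087 \;=\; 1{,}369{,}057 .
```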
Dataset | Bioactivity Type | IC50 Max | IC50 Min | IC50 Avg | SMILES Length Max | SMILES Length Min | SMILES Length Avg | FASTA Length Max | FASTA Length Min | FASTA Length Avg |
---|---|---|---|---|---|---|---|---|---|---|
BindingDB | IC50 | 1 × 10^7 | 0 | 3.79 × 10^4 | 1.94 × 10^3 | 2.0 × 10^0 | 5.85 × 10^1 | 7.18 × 10^3 | 9.0 × 10^0 | 7.07 × 10^2 |
Layer | ErG | ESPF | PSC | Final FC Network |
---|---|---|---|---|
1st | Linear(315, 1024) | Linear(2586, 1024) | Linear(8420, 1024) | Linear(512, 1024) |
2nd | Linear(1024, 256) | Linear(1024, 256) | Linear(1024, 256) | Linear(1024, 1024) |
3rd | Linear(256, 64) | Linear(256, 64) | Linear(256, 64) | Linear(1024, 512) |
4th | Linear(64, 256) | Linear(64, 256) | Linear(64, 256) | Linear(512, 1) |
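The layer sizes in the table can be checked for internal consistency with a few lines of Python. The sketch below only verifies that consecutive layers chain and counts parameters. It assumes that each `Linear(in, out)` carries a bias term, and that the final fully connected network receives the concatenation of one drug embedding (ErG or ESPF) with the PSC target embedding. The second assumption is suggested by the 512-wide input and by the model names ESPF-PSC and ErG-PSC, but it is not stated in the table. Activation functions and dropout are not shown in the table and are left out.

```python
from typing import Dict, List, Tuple

Layer = Tuple[int, int]  # (in_features, out_features)

# Layer sizes copied from the table above.
ENCODERS: Dict[str, List[Layer]] = {
    "ErG": [(315, 1024), (1024, 256), (256, 64), (64, 256)],
    "ESPF": [(2586, 1024), (1024, 256), (256, 64), (64, 256)],
    "PSC": [(8420, 1024), (1024, 256), (256, 64), (64, 256)],
}
FINAL_FC: List[Layer] = [(512, 1024), (1024, 1024), (1024, 512), (512, 1)]


def output_dim(layers: List[Layer]) -> int:
    """Return the final output width, checking that each layer feeds the next."""
    for (_, out_prev), (in_next, _) in zip(layers, layers[1:]):
        if out_prev != in_next:
            raise ValueError(f"shape mismatch: {out_prev} -> {in_next}")
    return layers[-1][1]


def n_params(layers: List[Layer]) -> int:
    """Weights plus biases, assuming every layer has a bias vector."""
    return sum(i * o + o for i, o in layers)


if __name__ == "__main__":
    for drug in ("ErG", "ESPF"):
        # Assumed wiring: drug embedding and target embedding are concatenated.
        concat = output_dim(ENCODERS[drug]) + output_dim(ENCODERS["PSC"])
        assert concat == FINAL_FC[0][0]
        total = n_params(ENCODERS[drug]) + n_params(ENCODERS["PSC"]) + n_params(FINAL_FC)
        print(f"{drug}-PSC: concatenated width {concat}, parameters {total}")
```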
Name | Description | Size | Feature Group |
---|---|---|---|
PSC | Amino acid composition up to 3-mers | 8420 | Target |
ErG | 2D pharmacophore descriptions for scaffold hopping | 315 | Drug |
ESPF | Explainable Substructure Partition Fingerprint | 2586 | Drug |
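The PSC size of 8420 matches the number of possible 1-mers, 2-mers, and 3-mers over the 20 standard amino acids: 20 + 20² + 20³ = 8420. The sketch below builds such a composition vector in plain Python. It is only an illustration of the idea: normalizing each k-mer count by the number of windows of that length, the ordering of the vocabulary, and skipping windows that contain non-standard residues are all assumptions, and the featurizer used in the paper may differ in these details. The names `kmer_vocabulary` and `psc_vector` are hypothetical.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_vocabulary(max_k: int = 3) -> List[str]:
    """All k-mers for k = 1..max_k, in a fixed (assumed) order."""
    return [
        "".join(p)
        for k in range(1, max_k + 1)
        for p in product(AMINO_ACIDS, repeat=k)
    ]


def psc_vector(sequence: str, max_k: int = 3) -> List[float]:
    """Relative frequency of every k-mer (k <= max_k) in a protein sequence."""
    vocab = kmer_vocabulary(max_k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for k in range(1, max_k + 1):
        n_windows = len(sequence) - k + 1
        if n_windows <= 0:
            continue  # sequence shorter than k
        for start in range(n_windows):
            i = index.get(sequence[start:start + k])
            if i is not None:  # windows with non-standard residues are skipped
                vec[i] += 1.0 / n_windows
    return vec


if __name__ == "__main__":
    v = psc_vector("MKTAYIAKQR")
    print(len(v))  # 8420
```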
Model | AUROC | AUPRC | F1-Score | Recall (TPR) |
---|---|---|---|---|
Unbalanced Model 1 (ESPF-PSC) | 0.924 | 0.879 | 0.809 | 0.797 |
Unbalanced Model 2 (ErG-PSC) | 0.926 | 0.876 | 0.796 | 0.809 |
Proposed Model | 0.952 | 0.920 | 0.838 | 0.903 |
Compound | Lab Results | Unbalanced Model 1 | Unbalanced Model 2 | Proposed Model |
---|---|---|---|---|
darunavir | P | N | N | P |
2-keto-3-deoxynononic | P | N | N | N |
Cytidine-5monophospho-N-acetylneuraminic | P | N | P | P |
N-Glycolylneuraminic | P | N | P | P |
N-acetyl-neuraminic | P | N | P | P |
N-Acetyllactosamine | P | N | N | P |
3,6-Mannopentaose | N | N | N | N |
Recall (TPR) | | 0 | 0.5 | 0.833 |
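The recall row can be reproduced directly from the table, using recall = TP / (TP + FN) with the lab result as ground truth. The short Python check below uses the table's labels verbatim. The variable and function names are illustrative only.

```python
from typing import List, Tuple

# (compound, lab result, unbalanced model 1, unbalanced model 2, proposed model)
ROWS: List[Tuple[str, str, str, str, str]] = [
    ("darunavir", "P", "N", "N", "P"),
    ("2-keto-3-deoxynononic", "P", "N", "N", "N"),
    ("Cytidine-5monophospho-N-acetylneuraminic", "P", "N", "P", "P"),
    ("N-Glycolylneuraminic", "P", "N", "P", "P"),
    ("N-acetyl-neuraminic", "P", "N", "P", "P"),
    ("N-Acetyllactosamine", "P", "N", "N", "P"),
    ("3,6-Mannopentaose", "N", "N", "N", "N"),
]


def recall(truth: List[str], pred: List[str]) -> float:
    """True positive rate: TP / (TP + FN), with 'P' as the positive label."""
    tp = sum(t == "P" and p == "P" for t, p in zip(truth, pred))
    fn = sum(t == "P" and p == "N" for t, p in zip(truth, pred))
    return tp / (tp + fn)


lab = [row[1] for row in ROWS]
recalls = {
    "Unbalanced Model 1": recall(lab, [row[2] for row in ROWS]),
    "Unbalanced Model 2": recall(lab, [row[3] for row in ROWS]),
    "Proposed Model": recall(lab, [row[4] for row in ROWS]),
}

if __name__ == "__main__":
    for name, value in recalls.items():
        print(f"{name}: {value:.3f}")
```

With six lab-confirmed positives, the three models recover 0, 3, and 5 of them respectively, which gives the tabulated values 0, 0.5, and 0.833.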
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tayebi, A.; Yousefi, N.; Yazdani-Jahromi, M.; Kolanthai, E.; Neal, C.J.; Seal, S.; Garibay, O.O. UnbiasedDTI: Mitigating Real-World Bias of Drug-Target Interaction Prediction by Using Deep Ensemble-Balanced Learning. Molecules 2022, 27, 2980. https://doi.org/10.3390/molecules27092980