Predicting Chemical Carcinogens Using a Hybrid Neural Network Deep Learning Method
Abstract
1. Introduction
2. Materials & Methods
2.1. Datasets
- Chemical Exposure Guidelines for Deployed Military Personnel Version 1.3 (MEG). We curated carcinogenic chemicals from the Technical Guide 230 (TG230): “Chemical Exposure Guidelines for Deployed Military Personnel” [18]. TG230 provides military exposure guidelines (MEGs) for chemicals in the air, water, and soil, along with an assigned carcinogenicity group for each chemical. Chemicals are categorized into one of five groups: Group A (human carcinogen), Group B (probable human carcinogen), Group C (possible human carcinogen), Group D (not classifiable), and Group E (no evidence of carcinogenicity).
- Environmental Health Risk Assessment and Chemical Exposure Guidelines for Deployed Military Personnel 2013 Revision (TG230). We curated carcinogenic chemicals listed in the Technical Guide 230 (TG230): “Environmental Health Risk Assessment and Chemical Exposure Guidelines for Deployed Military Personnel” [19], which provides military exposure guidelines (MEGs).
- National Toxicology Program (NTP). Carcinogenic chemicals were curated from the NTP [20]. NTP lists two groups of carcinogenic chemicals: (a) reasonably anticipated to be a human carcinogen and (b) known to be human carcinogens.
- International Agency for Research on Cancer (IARC). Carcinogenic chemicals were curated from IARC [21]. IARC categorizes chemicals into one of five groups: Group 1 (carcinogenic to humans), Group 2A (probably carcinogenic to humans), Group 2B (possibly carcinogenic to humans), Group 3 (not classifiable as to its carcinogenicity to humans), and Group 4 (probably not carcinogenic to humans).
- The Japan Society for Occupational Health (JSOH). Carcinogenic chemicals were curated from the Recommendation of Occupational Exposure Limits published by the JSOH [22], which classifies carcinogens into one of three groups: Group 1 (carcinogenic to humans), Group 2A (probably carcinogenic to humans), and Group 2B (possibly carcinogenic to humans).
- The National Institute for Occupational Safety and Health (NIOSH). Carcinogenic chemicals were curated from the NIOSH carcinogen list [23].
- Carcinogenic Potency Database (CPDB)
- CPDB_CPE (CPDB CarcinoPred-EL) data: CPDB data for rat carcinogenicity were collected from the CarcinoPred-EL developed by Zhang et al. [10]. The list contains 494 carcinogenic and 509 non-carcinogenic chemicals.
- CPDB data: CPDB [24] data were collected and processed to obtain the median toxicity dose (TD50) for rat carcinogenicity. TD50 is the dose-rate in mg/kg body wt/day administered throughout life that induces cancer in half of the test animals. A total of 561 carcinogenic chemicals was obtained with TD50 values for rat carcinogenicity. A total of 605 noncarcinogenic chemicals was obtained for rat carcinogenicity. For 543 carcinogenic chemicals out of 561, the TD50 values in mmol/kg body wt/day were also obtained from the DSSTox database (https://www.epa.gov/chemical-research/distributed-structure-searchable-toxicity-dsstox-database; accessed on 30 September 2017).
- Chemical Carcinogenesis Research Information System (CCRIS). Carcinogenesis data were collected from the CCRIS (ftp://ftp.nlm.nih.gov/nlmdata/.ccrislease/; accessed on 30 September 2017). The carcinogenicity and mutagenicity data were extracted. A total of 6833 chemicals was obtained after eliminating duplicates and conflicting data when compared to data sources 1 to 6, of which 4054 were carcinogenic/mutagenic and 2779 were non-carcinogenic/mutagenic.
- DrugBank 2018. The drug data were collected from DrugBank (www.drugbank.ca; accessed on 31 March 2018). The approved drugs predicted as carcinogenic by Zhang et al. [10] were removed; the remaining 1756 approved drugs were considered non-carcinogenic.
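The CPDB entry above reports TD50 in two units: mg/kg body wt/day and mmol/kg body wt/day. The text states that the molar values were taken from DSSTox rather than computed, so the following is only a minimal sketch of how the two units relate, assuming the molecular weight of the chemical is known. The function name `td50_mg_to_mmol` and its parameters are hypothetical and do not come from the article.

```python
def td50_mg_to_mmol(td50_mg_per_kg_day: float, mol_weight_g_per_mol: float) -> float:
    """Convert a TD50 from mg/kg body wt/day to mmol/kg body wt/day.

    Dividing a mass in mg by a molecular weight in g/mol gives mmol,
    because 1 mg / (1 g/mol) = 1e-3 mol = 1 mmol.
    """
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return td50_mg_per_kg_day / mol_weight_g_per_mol


if __name__ == "__main__":
    # Hypothetical illustration values, not data from the article.
    print(td50_mg_to_mmol(td50_mg_per_kg_day=100.0, mol_weight_g_per_mol=50.0))
```

Working in molar units puts chemicals of very different molecular weights on a comparable per-molecule scale.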
2.1.1. Dataset I: Binary Classification Data
- For binary classification of chemicals into the carcinogenic or non-carcinogenic category, 448 carcinogenic chemicals were obtained from data sources 1 to 6 above.
  - Data 1 (MEG): The chemicals classified into Groups A, B, and C were considered as carcinogens.
  - Data 2 (TG230): The chemicals listed as carcinogens were considered as carcinogens.
  - Data 3 (NTP): The chemicals classified as either “reasonably anticipated to be a human carcinogen” or “known to be human carcinogens” were considered as carcinogens.
  - Data 4 (IARC): The chemicals classified into Groups 1, 2A, and 2B were considered as carcinogens.
  - Data 5 (JSOH): The chemicals classified into Groups 1, 2A, and 2B were considered as carcinogens.
  - Data 6 (NIOSH): The carcinogenic chemicals listed were considered as carcinogens.
- CPDB_CPE chemicals from data source 7a contributed 320 additional carcinogenic and 458 additional non-carcinogenic chemicals after comparing them to the data from data sources 1 to 6 and removing duplicates and conflicting chemicals.
- The CCRIS mutagenicity/carcinogenicity data from data source 8 contributed 3868 mutagenic/carcinogenic data and 2500 non-mutagenic/carcinogenic data.
- A total of 400 non-carcinogenic approved drugs from data source 9 was also used in this classification model.
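The list above describes adding chemicals from later data sources only after removing duplicates and conflicting entries relative to the earlier sources. The article does not say how chemicals were matched or exactly how conflicts were resolved, so the sketch below makes two assumptions: each chemical has a unique key (for example a CAS number or canonical SMILES string), and a candidate is dropped when it already exists in the earlier sources or when the new source itself labels it inconsistently. The name `add_new_chemicals` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def add_new_chemicals(
    existing: Dict[str, int],
    candidates: Iterable[Tuple[str, int]],
) -> Dict[str, int]:
    """Merge candidate (key, label) pairs into an existing label table.

    Labels: 1 = carcinogenic, 0 = non-carcinogenic.
    Assumed rules (not taken from the article):
      * a candidate whose key is already in `existing` is skipped, whether
        it agrees (duplicate) or disagrees (conflict) with the earlier label;
      * a candidate key that appears with two different labels inside
        `candidates` is treated as conflicting and dropped entirely.
    """
    merged = dict(existing)
    first_label: Dict[str, int] = {}
    conflicting = set()
    for key, label in candidates:
        if key in existing:
            continue  # duplicate of, or conflict with, an earlier source
        if key in first_label and first_label[key] != label:
            conflicting.add(key)
        first_label.setdefault(key, label)
    for key, label in first_label.items():
        if key not in conflicting:
            merged[key] = label
    return merged


if __name__ == "__main__":
    # Hypothetical keys; real data would use chemical identifiers.
    earlier = {"chem-A": 1}
    new_source = [("chem-A", 0), ("chem-B", 1), ("chem-C", 0), ("chem-C", 1)]
    print(add_new_chemicals(earlier, new_source))
```

Giving the earlier sources priority mirrors the order in which the datasets are described, but other conflict policies (for example, dropping the chemical from both sets) would be equally consistent with the text.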
2.1.2. Dataset II: Multiclass Classification Data
- For multiclass classification, 882 carcinogenic and 2 non-carcinogenic chemicals were collected from data sources 1, 3, 4, and 5. This dataset contained 2 chemicals in class 0, 604 in class 1, and 278 in class 2.
  - Data 1 (MEG): The chemicals classified into Groups A and B were considered class 2 carcinogens. The chemicals classified into Groups C and D were considered class 1 carcinogens. The chemicals classified into Group E were considered class 0 compounds.
  - Data 3 (NTP): The chemicals classified as either “reasonably anticipated to be a human carcinogen” or “known to be human carcinogens” were considered class 2 carcinogens.
  - Data 4 (IARC): The chemicals classified into Groups 1 and 2A were considered class 2 carcinogens, and those classified into Groups 2B and 3 were considered class 1 carcinogens.
  - Data 5 (JSOH): The chemicals classified into Groups 1 and 2A were considered class 2 carcinogens, and those classified into Group 2B were considered class 1 carcinogens.
  - Treating Group D of the MEG data as class 1 along with Group C, and Group 3 of the IARC data as class 1 along with Group 2B, substantially increased the amount of multiclass data in this dataset. For binary classification, we discarded these groups.
- CPDB chemicals from data source 7b contributed 277 carcinogenic and 457 non-carcinogenic additional data after removing duplicates and conflicting chemicals compared to the data from data sources 1, 3, 4, and 5. The 277 carcinogenic chemicals were categorized into class 2, and 457 noncarcinogenic chemicals were categorized into class 0.
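The class assignments described above can be written as a lookup table. This is a minimal sketch, not the authors' code: the source keys, the group strings, and the function name `multiclass_label` are hypothetical, and groups the text does not assign to a class (for example IARC Group 4) are deliberately left unmapped.

```python
from typing import Optional

# Class 2, 1, and 0 follow the grouping stated in the text for each source.
MULTICLASS_MAP = {
    "MEG": {"A": 2, "B": 2, "C": 1, "D": 1, "E": 0},
    "NTP": {"known": 2, "reasonably anticipated": 2},
    "IARC": {"1": 2, "2A": 2, "2B": 1, "3": 1},
    "JSOH": {"1": 2, "2A": 2, "2B": 1},
    "CPDB": {"carcinogenic": 2, "non-carcinogenic": 0},
}


def multiclass_label(source: str, group: str) -> Optional[int]:
    """Return the class (0, 1, or 2) for a source/group pair.

    Returns None when the pair is not covered by the description,
    so unmapped groups are never silently assigned a class.
    """
    return MULTICLASS_MAP.get(source, {}).get(group)


if __name__ == "__main__":
    for source, group in [("MEG", "D"), ("IARC", "2A"), ("IARC", "4")]:
        print(source, group, multiclass_label(source, group))
```

Returning `None` for unmapped groups keeps the decision about such chemicals explicit in whatever code consumes the labels.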
2.1.3. Dataset III: Regression Data
2.2. Descriptors
2.3. SMILES Preprocessing
3. Machine Learning Models
3.1. Hybrid Neural Network Model
3.2. Other Machine Learning Algorithms
3.3. Model Evaluation
4. Results and Discussion
4.1. Carcinogen Prediction Using Binary Classification
4.2. Carcinogen Prediction Using Multiclass Classification
4.3. Carcinogenicity Prediction Using Regression
5. Conclusions
6. Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Hernández, L.G.; van Steeg, H.; Luijten, M.; van Benthem, J. Mechanisms of non-genotoxic carcinogens and importance of a weight of evidence approach. Mutat. Res. 2009, 682, 94–109.
- Wogan, G.N.; Hecht, S.S.; Felton, J.S.; Conney, A.H.; Loeb, L.A. Environmental and chemical carcinogenesis. Semin. Cancer Biol. 2004, 14, 473–486.
- Ledda, C.; Rapisarda, V. Occupational and Environmental Carcinogenesis. Cancers 2020, 12, 2547.
- Marone, P.A.; Hall, W.C.; Hayes, A.W. Reassessing the two-year rodent carcinogenicity bioassay: A review of the applicability to human risk and current perspectives. Regul. Toxicol. Pharmacol. 2014, 68, 108–118.
- Russell, W.; Burch, R. The Principles of Humane Experimental Technique; Methuen: London, UK, 1959; ISBN 0-900767-78-2.
- Luan, F.; Zhang, R.; Zhao, C.; Yao, X.; Liu, M.; Hu, Z.; Fan, B. Classification of the Carcinogenicity of N-Nitroso Compounds Based on Support Vector Machines and Linear Discriminant Analysis. Chem. Res. Toxicol. 2005, 18, 198–203.
- Ivanciuc, O. Support Vector Machine Classification of the Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons. Internet Electron. J. Mol. Des. 2002, 1, 203–218.
- Fjodorova, N.; Vračko, M.; Tušar, M.; Jezierska, A.; Novič, M.; Kühne, R.; Schüürmann, G. Quantitative and qualitative models for carcinogenicity prediction for non-congeneric chemicals using CP ANN method for regulatory uses. Mol. Divers. 2010, 14, 581–594.
- Tanabe, K.; Kurita, T.; Nishida, K.; Lučić, B.; Amić, D.; Suzuki, T. Improvement of carcinogenicity prediction performances based on sensitivity analysis in variable selection of SVM models. SAR QSAR Environ. Res. 2013, 24, 565–580.
- Zhang, L.; Ai, H.; Chen, W.; Yin, Z.; Hu, H.; Zhu, J.; Zhao, J.; Zhao, Q.; Liu, H. CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods. Sci. Rep. 2017, 7, 2118.
- Li, X.; Du, Z.; Wang, J.; Wu, Z.; Li, W.; Liu, G.; Shen, X.; Tang, Y. In Silico Estimation of Chemical Carcinogenicity with Binary and Ternary Classification Methods. Mol. Inform. 2015, 34, 228–235.
- Toma, C.; Manganaro, A.; Raitano, G.; Marzo, M.; Gadaleta, D.; Baderna, D.; Roncaglioni, A.; Kramer, N.; Benfenati, E. QSAR Models for Human Carcinogenicity: An Assessment Based on Oral and Inhalation Slope Factors. Mol. Basel Switz. 2020, 26, 127.
- Wang, Y.-W.; Huang, L.; Jiang, S.-W.; Li, K.; Zou, J.; Yang, S.-Y. CapsCarcino: A novel sparse data deep learning tool for predicting carcinogens. Food Chem. Toxicol. 2020, 135, 110921.
- Guan, D.; Fan, K.; Spence, I.; Matthews, S. Combining machine learning models of in vitro and in vivo bioassays improves rat carcinogenicity prediction. Regul. Toxicol. Pharmacol. 2018, 94, 8–15.
- Issa, N.T.; Wathieu, H.; Glasgow, E.; Peran, I.; Parasido, E.; Li, T.; Simbulan-Rosenthal, C.M.; Rosenthal, D.; Medvedev, A.V.; Makarov, S.S.; et al. A novel chemo-phenotypic method identifies mixtures of salpn, vitamin D3, and pesticides involved in the development of colorectal and pancreatic cancer. Ecotoxicol. Environ. Saf. 2022, 233, 113330.
- Li, N.; Qi, J.; Wang, P.; Zhang, X.; Zhang, T.; Li, H. Quantitative Structure-Activity Relationship (QSAR) Study of Carcinogenicity of Polycyclic Aromatic Hydrocarbons (PAHs) in Atmospheric Particulate Matter by Random forest (RF). Anal. Methods 2019, 11, 1816–1821.
- Limbu, S.; Zakka, C.; Dakshanamurthy, S. Predicting Environmental Chemical Toxicity Using a New Hybrid Deep Machine Learning Method. ChemRxiv 2021.
- Hauschild, V.D. Chemical exposure guidelines for deployed military personnel. Drug Chem. Toxicol. 2000, 23, 139–153.
- USAPHC TG230 Environmental HRA and Chemical Military Exposure Guidelines (MEGs). Environmental Health Risk Assessment and Chemical Exposure Guidelines for Deployed Military Personnel. 2013 Revision. U.S. Army Public Health Command (USAPHC). Available online: https://phc.amedd.army.mil/PHC%20Resource%20Library/TG230-DeploymentEHRA-and-MEGs-2013-Revision.pdf (accessed on 12 September 2022).
- National Toxicology Program: 14th Report on Carcinogens. Available online: https://ntp.niehs.nih.gov/go/roc14 (accessed on 5 March 2020).
- List of Classifications–IARC Monographs on the Identification of Carcinogenic Hazards to Humans. Available online: https://monographs.iarc.who.int/list-of-classifications (accessed on 2 March 2020).
- Recommendation of Occupational Exposure Limits (2018–2019). J. Occup. Health 2018, 60, 419–542.
- Carcinogen List-Occupational Cancer|NIOSH|CDC. Available online: https://www.cdc.gov/niosh/topics/cancer/npotocca.html (accessed on 28 February 2020).
- Carcinogenic Potency Database. Available online: http://wayback.archive-it.org/org-350/20190628191644/https://toxnet.nlm.nih.gov/cpdb/chemicalsummary.html (accessed on 5 June 2018).
- Moriwaki, H.; Tian, Y.-S.; Kawashita, N.; Takagi, T. Mordred: A molecular descriptor calculator. J. Cheminform. 2018, 10, 4.
- Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- Drucker, H. Improving Regressors Using Boosting Techniques. In Proceedings of the 14th International Conference on Machine Learning (ICML), Nashville, TN, USA, 8–12 July 1997; pp. 107–115.
- Li, T.; Tong, W.; Roberts, R.; Liu, Z.; Thakkar, S. DeepCarc: Deep Learning-Powered Carcinogenicity Prediction Using Model-Level Representation. Front. Artif. Intell. 2021, 4, 757780.
- Li, T.; Tong, W.; Roberts, R.; Liu, Z.; Thakkar, S. DeepDILI: Deep Learning-Powered Drug-Induced Liver Injury Prediction Using Model-Level Representation. Chem. Res. Toxicol. 2021, 34, 550–565.
- Valerio, L.G., Jr.; Arvidson, K.B.; Chanderbhan, R.F.; Contrera, J.F. Prediction of rodent carcinogenic potential of naturally occurring chemicals in the human diet using high-throughput QSAR predictive modeling. Toxicol. Appl. Pharmacol. 2007, 222, 1–16.
- Jiao, Z.; Hu, P.; Xu, H.; Wang, Q. Machine Learning and Deep Learning in Chemical Health and Safety: A Systematic Review of Techniques and Applications. ACS Chem. Health Saf. 2020, 27, 316–334.
- Tan, N.X.; Rao, H.B.; Li, Z.R.; Li, X.Y. Prediction of chemical carcinogenicity by machine learning approaches. SAR QSAR Environ. Res. 2009, 20, 27–75.
- Tanabe, K.; Lučić, B.; Amić, D.; Kurita, T.; Kaihara, M.; Onodera, N.; Suzuki, T. Prediction of carcinogenicity for diverse chemicals based on substructure grouping and SVM modeling. Mol. Divers 2010, 14, 789–802.
- Toropova, A.P.; Toropov, A.A. CORAL: QSAR Models for Carcinogenicity of Organic Compounds for Male and Female Rats. Comput. Biol. Chem. 2018, 72, 26–32.
- Yauk, C.L.; Harrill, A.H.; Ellinger-Ziegelbauer, H.; van der Laan, J.W.; Moggs, J.; Froetschl, R.; Sistare, F.; Pettit, S. A Cross-Sector Call to Improve Carcinogenicity Risk Assessment through Use of Genomic Methodologies. Regul. Toxicol. Pharmacol. 2020, 110, 104526.
- Zhang, H.; Cao, Z.-X.; Li, M.; Li, Y.-Z.; Peng, C. Novel Naïve Bayes Classification Models for Predicting the Carcinogenicity of Chemicals. Food Chem. Toxicol. 2016, 97, 141–149.
- Wathieu, H.; Ojo, A.; Dakshanamurthy, S. Prediction of Chemical Multi-target Profiles and Adverse Outcomes with Systems Toxicology. Curr. Med. Chem. 2017, 24, 1705–1720.
- Issa, N.T.; Wathieu, H.; Ojo, A.; Byers, S.W.; Dakshanamurthy, S. Drug Metabolism in Preclinical Drug Development: A Survey of the Discovery Process, Toxicology, and Computational Tools. Curr. Drug Metab. 2017, 18, 556–565.
- Issa, N.T.; Stathias, V.; Schürer, S.; Dakshanamurthy, S. Machine and deep learning approaches for cancer drug repurposing. Semin. Cancer Biol. 2021, 68, 132–142.
- Glück, J.; Buhrke, T.; Frenzel, F.; Braeuning, A.; Lampen, A. In Silico genotoxicity and Carcinogenicity Prediction for Food-Relevant Secondary Plant Metabolites. Food Chem. Toxicol. 2018, 116, 298–306.
- Singh, K.P.; Gupta, S.; Rai, P. Predicting Carcinogenicity of Diverse Chemicals Using Probabilistic Neural Network Modeling Approaches. Toxicol. Appl. Pharmacol. 2013, 272, 465–475.
- Asha, P.; Natrayan, L.B.; Geetha, B.T.; Beulah, J.R.; Sumathy, R.; Varalakshmi, G.; Neelakandan, S. IoT enabled environmental toxicology for air pollution monitoring using AI techniques. Environ. Res. 2021, 205, 112574.
- Saravanan, D.; Kumar, D.K.S.; Sathya, R.; Palani, U. An iot based air quality monitoring and air pollutant level prediction system using machine learning approach–dlmnn. Int. J. Future Gen. Commun. Netw. 2020, 13, 925–945.
- Satpathy, S.; Mohan, P.; Das, S.; Debbarma, S. A new healthcare diagnosis system using an IoT-based fuzzy classifier with FPGA. J. Supercomput. 2020, 76, 5849–5861.
- Senthilkumar, R.; Venkatakrishnan, P.; Balaji, N. Intelligent based novel embedded system based IoT enabled air pollution monitoring system. Microprocess. Microsyst. 2020, 77, 103172.
- Shukla, S.K.; Kumar, B.M.; Sinha, D.; Nemade, V.; Mussiraliyeva, S.; Sugumar, R.; Jain, R. Apprehending the Effect of Internet of Things (IoT) Enables Big Data Processing through Multinetwork in Supporting High-Quality Food Products to Reduce Breast Cancer. J. Food Qual. 2022, 2022, 2275517.
- Memon, M.H.; Li, J.P.; Haq, A.U.; Memon, M.H.; Zhou, W. Breast Cancer Detection in the IOT Health Environment Using Modified Recursive Feature Selection. Wirel. Commun. Mob. Comput. 2019, 2019, 5176705.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Limbu, S.; Dakshanamurthy, S. Predicting Chemical Carcinogens Using a Hybrid Neural Network Deep Learning Method. Sensors 2022, 22, 8185. https://doi.org/10.3390/s22218185