Next Article in Journal
HR-MAS NMR Applications in Plant Metabolomics
Next Article in Special Issue
iT4SE-EP: Accurate Identification of Bacterial Type IV Secreted Effectors by Exploring Evolutionary Features from Two PSI-BLAST Profiles
Previous Article in Journal
The Azide-Allene Dipolar Cycloaddition: Is DFT Able to Predict Site- and Regio-Selectivity?
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Fast Identification of Adverse Drug Reactions (ADRs) of Digestive and Nervous Systems of Organic Drugs by In Silico Models

1
College of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou 350122, China
2
Fujian Key Laboratory of TCM Health Status Identification, Fujian University of Traditional Chinese Medicine, Fuzhou 350122, China
3
Fujian Engineering Center of Intelligent Diagnosis and Treatment of TCM Four Diagnosis, Fujian University of Traditional Chinese Medicine, Fuzhou 350122, China
4
College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
*
Author to whom correspondence should be addressed.
Molecules 2021, 26(4), 930; https://doi.org/10.3390/molecules26040930
Submission received: 15 January 2021 / Revised: 29 January 2021 / Accepted: 6 February 2021 / Published: 10 February 2021

Abstract

:
This study aimed to discover concurrences of adverse drug reactions (ADRs) and derive models of the most frequent items of ADRs based on the SIDER database, which included 1430 marketed drugs and 5868 ADRs. First, common ADRs of organic drugs were manually reclassified according to side effects in the human system and followed by an association rule analysis, which found ADRs of digestive and nervous systems often occurred at the same time with a good association rule. Then, three algorithms, linear discriminant analysis (LDA), support vector machine (SVM) and deep learning, were used to derive models of ADRs of digestive and nervous systems based on 497 organic monomer drugs and to identify key structural features in defining these ADRs. The statistical results indicated that these kinds of QSAR models were good tools for screening ADRs of digestive and nervous systems, which gave the ROC AUC values of 81.5%, 98.9%, 91.5%, 69.5%, 78.4% and 78.8%, respectively. Then, these models were applied to investigate ADRs of 1536 organic compounds with four phase and zero rule-of-five (RO5) violations from the ChEMBL database. Based on the consensus ADRs’ predictions of models, 58.1% and 42.6% of compounds were predicted to cause these two ADRs, respectively, indicating the significance of initial assessment of ADRs in early drug discovery.

Graphical Abstract

1. Introduction

Adverse drug reactions (ADRs) are inherent features of drug structures that often caused unnecessary suffering and threats to human health and became obstacles to the discovery of new drugs. In general, drugs have multiple side effects even when prescribed at the appropriate doses and used correctly, as recorded in the SIDER database, which includes information on 1430 marketed drugs and 5868 ADRs [1]. For instance, ibuprofen is the most commonly used and prescribed non-steroidal anti-inflammatory drug. It causes commonly reported side effects, including hemorrhage, vomiting, anemia, decreased hemoglobin, eosinophilia and hypertension [1,2]. Severe ADRs can induce drugs to be withdrawn from the market. For instance, troglitazone, a drug for treatment of diabetes that decreases blood glucose significantly without body weight changes, was withdrawn from the market due to it causing the ADR of hepatic failure [3,4]. Valdecoxib, an anti-inflammatory drug, was withdrawn from the market due to ADRs of nervous and cardiovascular systems [2]. Although ADRs may not be avoided, they can be predictable. Hence, fast identification of ADRs of drugs and filtering out unqualified candidate compounds in early drug discovery play a significant role in the development of new drugs. There are many in vitro methods that screen for toxicological effects, including animal and various cell studies, reactive metabolites, human organ microsomes, etc. [4,5]. However, there are usually some gaps in clinical practice. For instance, the hepatotoxic effects of some drugs in vivo of human and in vitro of animals may be contradictory, such as those of furosemide [6,7]. Differences in acidity and transit time in the gastrointestinal tract between animals and humans can make the solubility and permeability of drugs vary greatly [8]. Additionally, ADRs that impair specific human functions (such as digestive and nervous systems) are difficult to evaluate by these methods. Meanwhile, drugs often have multiple side effects, which may involve some concurrences. Compared with experimental methods, predicting ADRs by in silico models is more time-saving, low-cost and effective.
For the above reasons, human in vivo data are preferable for the identification of associated ADRs of drugs and discovery of their structural characteristics by QSAR modeling, and play a crucial role in filtering out unqualified candidate compounds in early drug discovery. QSAR models apply linear and non-linear algorithms to associate chemical structures with specific activities or properties, which have become increasingly popular in many fields for predicting compound properties, e.g., toxicity, physical properties and biological activity [9]. To date, the QSAR models of hepatotoxicity, cardiotoxicity and nephrotoxicity are commonly reported in light of the relatively clear mechanisms of these ADRs. Pan et al. and Huang et al. developed QSAR models of hepatotoxicity to evaluate hepatotoxicity of traditional Chinese medicines based on the Liver Toxicity Knowledge Base [10,11]. Ancuceanu et al. built computational models for predicting drug hepatotoxicity based on the DILIrank dataset by using machine learning algorithms [12]. Cai et al. built a QSAR model of cardiotoxicity by using a deep learning algorithm for risk assessment of hERG-mediated cardiotoxicities in drug discovery and postmarketing surveillance [13]. Satalkar et al. developed (QSAR) models for fatal drug- induced renal toxicity by using three algorithms, including simple K-means clustering, decision tree and linear regression analysis [14]. Sun et al. developed QSAR classification models to predict potential nephrotoxic ingredients in traditional Chinese medicines by using SVM and ANN [15].
There has not yet been an association analysis of ADRs and QSAR models of ADRs whose mechanisms of action are vague and not easily tested by traditional experimental methods, such as ADRs of digestive and nervous systems. Therefore, this study aimed to discover patterns of concurrences of ADRs among marketed organic drugs, derive models of the most frequent items of ADRs, and mine out key structural features in defining these ADRs based on the SIDER database, which includes 1430 marketed drugs and 5868 ADRs. Firstly, we reclassified common ADRs into seven systematic categories according to side effects in the human system and applied an association rule algorithm named a priori to discover patterns of concurrences of seven ADRs, which found that ADRs of digestive and nervous systems often occurred at the same time with good rule support. Second, three algorithms, linear discriminant analysis (LDA), support vector machine (SVM) and deep learning (DL), were used to derive models of ADRs of digestive and nervous systems based on structures of marketed organic monomer drugs and mine out key structural features in defining these ADRs. Then, these models were simultaneously applied to investigate ADRs of digestive and nervous systems of 1536 organic compounds with four phase and zero rule-of-five (RO5) violations from the ChEMBL database.

2. Results and Discussions

2.1. Common ADRs of Marketed Organic Drugs

According to the definition that an ADR is common if occurring at a frequency of greater than 10%, 566 organic drugs were identified as common ADR drugs after removing gold or ion compounds. According to side effects in the human body system, the common ADRs can be grouped into gastrointestinal toxicity, nervous system reaction, allergy, hematopoietic system reaction, circulatory system reaction, hepatotoxicity reaction and urinary system reaction. As listed in Table 1, it was found that 418 drugs had digestive system toxicity, 259 drugs had allergies, and 442 drugs had nervous reactions, 99 drugs caused urinary system reactions, 108 drugs had hematopoietic system reactions, and 75 drugs induced hepatotoxic reactions, respectively.

2.2. Associations between ADRs of Organic Drugs

Three key measures of interest for an association rule (support, confidence and lift) were used to select useful rules for the prediction of ADRs. As shown in Table 2, only one association rule was generated, which met the rule of minimum support of 50% and confidence of 80%. The derived association rule was about ADRs of digestive toxicity and nervous reactions, which has the rule support of 59.89%, confidence of 81.1% and lift of 1.04, respectively. This showed that ADRs of digestive and nervous systems often occurred at the same time with the rule support of about 60% and confidence of about 81%. It can be clearly concluded that ADRs of nervous and digestive system were the most frequent item-sets among common ADRs, and ADRs of digestive system can cause ADRs of nervous system, indicating that these two ADRs have some correlations. Thus, it is of great importance to develop QSAR models for understanding and predicting ADRs of nervous and digestive systems, which may eliminate drug candidates with such ADRs in early drug development and reduce the rate of attrition and decrease the cost of drug discovery.

2.3. Dataset Splitting

To investigate the chemical diversity of the modeling dataset, the Tanimoto similarity index was calculated based on FP2 fingerprints using Openbabel 2.3.0. Figure 1 demonstrated the Tanimoto similarity indices among drugs ranged from 0.00 to 0.99. From the contour graph of Tanimoto similarity indices, it can be clearly noted that all structures of compounds had low similarity. Additionally, the average index of Tanimoto similarity was only 0.17, which together indicated the significant chemical diversity of modeling dataset. Then, the cluster analysis of 497 drugs was performed by using descriptors and principal component analysis, which led to discovery of 348 clusters of structural diversities. This result further indicated the great structural diversity of the drugs in the dataset. To ensure the biggest structural diversity of the training set, the division of the training and test set chemicals was conducted based on the chemical space distribution of drugs in the cluster analysis. Consequently, 380 drugs were included in the training set and 117 drugs fell into the test set. Figure 2 shows the chemical space distribution of the training set and test set, marked by ‘‘black diamond” and “red round” symbols, respectively.

2.4. SW-LDA Model Results of ADRs of Digestive System

After a stepwise method combined with an LDA (SW-LDA) process, 12 molecular descriptors were identified as the QSAR model parameters from the above remaining 130 descriptors. The corresponding LDA model was simultaneously derived by these descriptors. The linear discriminant function was as follows:
y = 0.078 × (PEOE_VSA-4) + 3.495 × PEOE_VSA_FNEG − 3.235 × vsurf_CP +
0.07 × vsurf_DD13 − 1.472 × a_nP + 0.358 × MNDO_LUMO + 0.786 × reactive + 0.768 × a_nI −
0.296 × opr_violation − 2.593 × vsurf_CW1 − 0.007 × SlogP_VSA5 − 0.105 × vsurf_IW7 + 6.105
Table 3 lists the selected descriptors, tolerance, Wilks’ lambda values, variance inflation factor (VIF), F-test values and statistical significance (p-value). The statistical significance of all selected descriptors was less than 0.001, showing that they were obvious features in defining ADRs of the digestive system. The tolerance of descriptors was more than 0.1 and VIF was less than 10, indicating no multicollinearity existed among these variables in the LDA model. Table 4 gives the statistical results of the proposed model. As described in Table 4, the obtained LDA model was successful and of good predictive ability. The training accuracy value of 76.58% revealed that the LDA QSAR model can give 76.58% classification accuracy in the training set. The 10-fold cross-validation value of the training set was 72.89% (more than 50%), showing that the developed QSAR LDA model had acceptable stability and predictive ability. Additionally, the predictive accuracy of the test set also reached 72.65%, indicating the good prediction and generalization ability of the LDA model. The sensitivity and specificity of the derived QSAR model were 80.5% and 67.3%, respectively, implying better ability in predicting positive compounds than negative ones. The AUC value of the ROC curve was 81.5%, showing that the LDA model was acceptable in prediction of ADRs of the digestive system and 12 descriptors can be used as the main structural features in defining ADRs of the digestive system.

2.5. SW-LDA Model Results of ADRs of Nervous System

After SW-LDA was performed, the best QSAR model for ADRs of the nervous system was generated with six molecular descriptors based on the same training set of ADRs of the digestive system. The obtained QSAR model was given as follows:
y = 0.03 × (PEOE_VSA+5) − 0.244 × vsurf_IW7 − 0.697 × std_dim2 + 0.024 × SMR_VSA3 −
3.498 × Q_VSA_FPPOS + 0.153 × MNDO_dipole + 1.927
The selected variables and their chemical meanings, standard coefficients, tolerance and VIF are shown in Table 5. The values of tolerance, VIF and significance showed that these six descriptors were significant features in defining ADRs of the nervous system and each of them were independent. Table 5 lists the statistical results of the proposed model. As described in Table 4, the derived QSAR model was of acceptable predictive ability. The QSAR model can give 69.21% variance in ADRs of the nervous system in the training set. The accuracy value of 10-fold cross-validation was 68.68% (more than 50%) and the prediction accuracy for the external test set was 64.1% too, showing that the developed QSAR model was of acceptable stability and predictive ability.

2.6. Interpretation of the Descriptors

It is possible to define some vital structural features governing ADRs of digestive or nervous systems by interpreting the molecular descriptors in the QSAR models based on the same training set. In the QSAR models of ADRs of digestive and nervous systems, 12 and 6 descriptors for each model were uncovered, respectively. Additionally, vsurf_IW7 was the same descriptor for two QSAR models, which represents the hydrophilic integy moment that belongs to descriptors of surface area, volume and shape dependent on the structure connectivity and conformation. Here, vsurf_IW7 negatively contributed to these two ADRs, showing that a higher vsurf_IW7 may weaken ADRs of digestive and nervous systems. In order to investigate some correlations between descriptors in two QSAR models, the correlation coefficients of their descriptors were calculated and listed in Table 6. Of note is that, except for the common descriptor (vsurf_IW7), no descriptors in two QSAR models were related to each other based on all correlation coefficient values less than 0.75. This result indicated that vsurf_IW7 was the main factor in defining two ADRs of digestive and nervous systems and other descriptors in two ADR models were independent, especially for descriptors of atom counts and bond counts and physical properties only involved in ADRs of digestive system, including a_nI, a_nP, opr_violation and reactive, and can well distinguish ADRs of the two systems.

2.7. Results of SVM Models

The combination of C and γ in RBF kenel function was optimized to derive the best SVM models for ADRs of digestive and nervous systems. The optimum values of C and γ used in the two ADRs models were 150 and 2.5, 150 and 1, together with a maximum 10-fold cross-validation accuracy of 75.79% and 76.05%, respectively. Thereby, the final optimal SVM models for ADRs of digestive and nervous systems were generated as well. As shown in Table 4, the SVM model for ADRs of the digestive system gave quite satisfactory results: accuracytrain of 98.42%, accuracycv of 75.79%, accuracytest of 76.92%, sensitivity of 99.63%, specificity of 95.58% and AUC of 98.9%, respectively, exhibiting significantly high prediction and generalization ability in distinguishing ADRs of digestive system of drugs. Additionally, the final optimal SVM model for ADRs of the nervous system produced satisfactory results: accuracytrain of 80.26%, accuracycv of 76.05%, accuracytest of 83.76% and AUC of 78.4%, respectively. Therefore, compared with the LDA models, SVM models can enhance the ability to predict and generalize the ADRs of digestive and nervous systems.

2.8. Results of Deep Learning Models

Similarly, the QSAR models for ADRs of digestive and nervous systems derived by DL were built using the same input variables as used in the above models. Here, we used RapidMiner studio software to perform DL experiments. Table 2 also lists the prediction ability of final DL models of ADRs of digestive and nervous systems. The final DL model of ADRs of digestive system gave satisfactory results: accuracytrain of 87.89%, accuracycv of 78.42%, accuracytest of 78.63%, sensitivity of 94%, specificity of 73.45% and AUC of 91.5%, respectively. Additionally, the final DL model of ADRs of the nervous system gave satisfactory results: accuracytrain of 82.89%, accuracycv of 73.68%, accuracytest of 81.2%, sensitivity of 91.75%, specificity of 53.93% and AUC of 78.8%, respectively. Obviously, DL enhanced the accuracy of descriptors in prediction of ADRs of digestive and nervous systems compared with LDA.

2.9. Comparison of Different Approaches and Consensus Prediction

From the above discussion, three algorithms performed well in prediction of ADRs of digestive and nervous systems based on the same descriptors. Apparently, the performance of SVM in prediction of ADRs of the digestive system outperformed LDA and DL, but in prediction of ADRs of the nervous system, DL was better than SVM from a comprehensive index of ROC curve, as shown in Figure 3, suggesting different algorithms suit peculiar aspects of some special structures. Thus, it seemed reasonable that a consensus predicted result given by these kinds of QSAR models might be more strict and correct than individual models. Here, a consensus prediction of ADRs of digestive and nervous systems was derived by averaging the predictions for the dataset given by the individual models [15]. Based on the consensus ADR predictions of 1536 organic compounds with four phase and zero RO5 violations from ChEMBL database by three models, we found that 893 and 654 compounds were computationally identified to cause the above two ADRs, respectively. Among them, 433 compounds were predicted to cause these two ADRs. These results can be seen in the supplementary materials, indicating the significance of initial assessment of ADRs in early drug discovery.

3. Methods and Materials

3.1. Association Analysis of Common Side Effects of Drugs

As defined in the SIDER database, ADRs with a frequency of more than 10% were identified as common ADRs and then grouped in seven systematic categories according to side effects in the human body system, including digestive, nervous, hepatotoxic, urinary, allergy, circulatory and hemopoietic systems. Additionally, we removed non-organic compounds that could not be further analyzed by descriptor calculation, which led to an acquisition of 566 organic drugs with common ADRs. Then, to explore the correlations between multiple side effects of organic drugs, an association rule analysis was applied to discover patterns of concurrences of ADRs in the database. Support, confidence and lift are three key measures of interestingness of an association rule. Support is an indicator of rule frequency. Confidence is the probability that consequent B will follow antecedent A. Lift is an indicator of the contribution antecedent A makes to consequent B [16]. Here, the association algorithm named a priori embedded in the clementine 12 software (SPSS, Inc., Chicago, IL, USA) was performed to search for concurrences of ADRs. The maximum frequent set was fixed to no more than 5, the minimum support and confidence of rules were set to 50% and 80%, respectively.

3.2. Molecular Descriptor Calculation

First, the molecular structures of 566 organic marketed drugs with common ADRs were downloaded from the Pubchem database based on their names and checked one by one. After removing multi-compound drugs, a total of 497 organic monomer drugs remained. Then, they were put into the Molecular Operating Environment software (MOE2008.10, Chemical Computing Group Inc., Montreal, Canada), to be subjected to the energy minimization of 3D structures. Subsequently, stochastic conformation search was performed to optimize their conformer structures. Then, a total of 327 diverse descriptors of optimized structures were calculated by utilizing the QSAR module of MOE. These 327 descriptors consisted of 184 2D molecular properties, 86 i3D molecular properties and 10 x3D structural information, which may be redundant and irrelevant for QSAR development. Thus, the constant or almost constant descriptors for all molecules were first deleted and then a pairwise correlation analysis was conducted to remove one of inter-correlated descriptors (with a correlation coefficient value greater than 0.95) [17]. Finally, a total set of 130 descriptors remained and was used for QSAR modeling.

3.3. Data Splitting

To investigate the chemical diversity of the whole dataset, the Tanimoto similarity index was calculated based on FP2 fingerprints using Openbabel 2.3.0 [18]. The Tanimoto similarity coefficient is the atomic pair shared between two molecules divided by all their atomic pairs, defined as c/(a + b + c). The variable c is the number of atomic pairs of the two compounds, and a and b are the numbers of their unique atomic pairs. To obtain reliable QSAR models, the data set was split into a training set and a test set by a range ratio of 3:1~4:1. The training set was used to construct QSAR models, and the test set was used as an external validation of derived models. To ensure the training set spanned the whole descriptor space and kept a balance distribution of the chemicals in two data sets [19], the cluster analysis of dataset was further investigated using the QuaSAR-Cluster module in MOE, which calculated the descriptor average vector x0 and covariance matrix S based on principal component analysis to assign similar molecules to one cluster.

3.4. QSAR Model Approach

3.4.1. Stepwise Linear Discriminant Analysis

LDA is one of the data dimensionality reduction and classification techniques widely used in QSAR modeling. The basic idea of the LDA algorithm is to project the data in low dimensions, so that the projection centers of the same type of data are as close as possible, and the projection centers of different types of data are as far apart as possible [20]. Feature selection is one important step for development of QSAR models. In this study, a stepwise method combined with LDA (SW-LDA) was conducted to derive the QSAR models and mine significant features in defining ADRs of drugs, which used F-test to eliminate redundant variables at each stage of the descriptor selection. Here, we used the default set of F values (Fmax = 3.84 and Fmin = 2.71) in the SW-LDA algorithm embedded in Clementine 12.

3.4.2. Support Vector Machine (SVM)

SVM is a very classic and efficient classification and regression algorithm proposed by Vapnik et al. in 1998 [21]. Compared with the current mainstream deep neural network technology, the SVM algorithm has certain advantages in solving small sample, nonlinear and high-dimensional feature data pattern recognition problems. The core idea of the SVM algorithm is to find the optimal classification surface (also called the hyperplane) between the two classes. SVM uses kernel functions such as the radial basis function (RBF), spline and Bessel for nonlinear transformation of the input space. Here, the RBF kernel in the SVM algorithm embedded in Clementine 12 was performed to derived non-linear models [22].

3.4.3. Deep Learning (DL)

Deep learning is based on a multi-layer feed-forward artificial neural network that is trained with stochastic gradient descent using back-propagation [23]. The network can contain a large number of hidden layers consisting of nervous with tanh, rectifier or maxout activation functions [24]. Advanced features such as adaptive learning rate, rate annealing, momentum training, dropout and L1 or L2 regularization enable high predictive accuracy. Each compute node trains a copy of the global model parameters on its local data with multi-threading (asynchronously), and contributes periodically to the global model via model averaging across the network. Here, the DL algorithm embedded in the RapidMiner studio software (education version, RapidMiner, Inc., Boston, MA, USA) was conducted to derive QSAR models by using the maxout activation function.

3.5. Performance Evaluation

Then, to evaluate the predictive ability and reliability of QSAR models, widely applied internal and external validations, such as the 10-fold cross-validation and the test set validation, were applied. In the 10-fold cross-validation, the training set is randomly divided into ten equal subsets. Each time, one of the ten subsets is used as the validation set and the other nine subsets are put together to build a model. Then the average error across all ten trials is computed. Further, four important evaluation indicators for performance of QSAR models, including accuracy (ACC), balanced accuracy (BACC), sensitivity (SE), and specificity (SP), were calculated as follows [25].
ACC = ( TP + TN ) / ( TP + TN + FP + FN ) , SE = TP / ( TP + FN ) , SP = TN / ( TN + FP ) , BACC = ( SE + SP ) / 2 ,
where TP, TN, FP and FN represent the number of true positive, true negative, false positive and false negative ones, respectively. Additionally, the receiver operating characteristic (ROC) curve was performed to evaluate QSAR models with a more global and unbiased evaluation, which is a comprehensive index reflecting sensitivity and specificity [26]. The ROC curve is a graphical plot of the sensitivity or true positive rate against the false positive rate (1-specificity), which can be quantitatively described by the area under the curve (AUC). The larger the AUC, the higher the diagnostic accuracy.

3.6. Model Application

To investigate ADRs of digestive and nervous systems of other drugs, we applied these models in prediction of 1536 organic compounds with four phase and zero RO5 violations in the ChEMBL database [27], respectively. To obtain more strict and correct results, a consensus prediction of ADRs of digestive and nervous systems was derived by all three models, respectively.

4. Conclusions

In this study, the association rule analysis was initially used to discover patterns of concurrences of ADRs of 566 marketed organic drugs in the SIDER database. Then, three QSAR modeling algorithms, LDA, SVM and DL, were successfully used to derive models of ADRs of digestive and nervous systems and identified key structural features in defining these ADRs of organic drugs based on 497 marketed organic monomer drugs. Satisfactory results were obtained as follows. First, ADRs of digestive and nervous systems often occurred at the same time with the rule support of about 60% and confidence of about 81%, indicating these two ADRs co-occurred very frequently in adverse drug events. Second, QSAR models derived by LDA, SVM and DL were good tools for screening ADRs of digestive and nervous systems, which gave the ROC AUC values of 81.5%, 98.9%, 91.5%, 69.5%, 78.4% and 78.8% in discriminating ADRs of digestive and nervous systems, respectively. The vsurf_IW7 was the same descriptor for two QSAR models, which may be responsible for two ADRs of digestive and nervous systems. Other descriptors in two ADR models were independent, especially for descriptors of atom counts and bond counts and physical properties only involved in ADRs of the digestive system, including a_nI, a_nP, opr_violation and reactive, and can well distinguish these two ADRs. Then, these models were applied to investigate ADRs of digestive and nervous systems of 1536 organic compounds with four phase and zero RO5 violations from the ChEMBL database. Based on the consensus ADR predictions of models, among 1536 organic compounds 58.1% and 42.6% of compounds were computationally identified to cause such two ADRs, respectively, indicating the significance of initial assessment of ADRs in early drug discovery.

Supplementary Materials

The following are available online: ADRs predictions of 1536 organic compounds with four phase and zero rule-of-five (RO5) violations from the ChEMBL database.

Author Contributions

Conceptualization, Z.Y.; methodology, M.C.; software, Y.G.; validation, M.C.; data curation, M.C.; writing—original draft preparation, M.C.; writing—review and editing, C.L. and Z.Y.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by Fujian Provincial Natural Science fund of China (2019J01347) and National Natural Science Foundation programs of China (U1705286, 81230087 and 81973751).

Data Availability Statement

The data presented in this study are available within the article and in supplementary materials.

Acknowledgments

We thank RapidMiner, Inc. for the education version of the RapidMiner studio tool. We also thank all who offered resources used in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Sample Availability

Not applicable.

References

  1. Michael, K.; Ivica, L.; Juhl, J.L.; Peer, B. The SIDER database of drugs and side effects. Nucleic Acids Res. 2016, 44, D1075–D1079. [Google Scholar]
  2. DrugBank. Available online: https://www.drugbank.ca/ (accessed on 10 May 2019).
  3. Watkins, P.B. Insight into hepatotoxicity: The troglitazone experience. Hepatology 2005, 41, 229–230. [Google Scholar] [CrossRef]
  4. Kuna, L.; Bozic, I.; Kizivat, T.; Bojanic, K.; Mrso, M.; Kralj, E.; Smolic, R.; Wu, G.Y.; Smolic, M. Models of drug induced liver injury (DILI)—Current issues and future perspectives. Curr. Drug Metab. 2018, 19, 830–838. [Google Scholar] [CrossRef]
  5. Hosey, C.M.; Benet, L.Z. Experimental ADME and Toxicology. In Comprehensive Medicinal Chemistry III, 3rd ed.; Chackalamannil, S., Rotella, D., Ward, S., Eds.; Elsevier: Amsterdam, The Netherland, 2017. [Google Scholar]
  6. FDA. Guidance for industry integrated summaries of effectiveness and safety: Location within the common technical document; FDA: Rockville, MD, USA, 2009; pp. 532–536. [Google Scholar]
  7. Walker, R.M.; McElligott, T.F. Furosemide induced hepatotoxicity. J. Pathol. 1981, 135, 301–314. [Google Scholar] [CrossRef] [PubMed]
  8. Markovic, M.; Zur, M.; Fine-Shamir, N.; Haimov, E.; González-Álvarez, I.; Dahan, A. Segmental-dependent solubility and permeability as key factors guiding controlled release drug product development. Pharmaceutics 2020, 12, 295. [Google Scholar] [CrossRef] [Green Version]
  9. Lo Piparo, E.; Fratev, F.; Lemke, F.; Mazzatorta, P.; Smiesko, M.; Fritz, J.I.; Benfenati, E. QSAR models for Daphnia magna toxicity prediction of benzoxazinone allelochemicals and their transformation products. J. Agric. Food Chem. 2016, 54, 1111–1115. [Google Scholar] [CrossRef]
  10. Zhao, P.; Liu, B.; Wang, C. Hepatotoxicity evaluation of traditional Chinese medicines using a computational molecular model. Clin. Toxicol. 2017, 55, 996–1000. [Google Scholar] [CrossRef]
  11. Huang, S.H.; Tung, C.W.; Fülöp, F.; Li, J.H. Developing a qsar model for hepatotoxicity screening of the active compounds in traditional Chinese medicines. Food Chem Toxicol. 2015, 78, 71–77. [Google Scholar] [CrossRef]
  12. Ancuceanu, R.; Hovanet, M.V.; Anghel, A.I.; Furtunescu, F.; Dinu, M. Computational models using multiple machine learning algorithms for predicting drug hepatotoxicity with the dilirank dataset. Int. J. Mol. Sci. 2020, 21, 2114. [Google Scholar] [CrossRef] [Green Version]
  13. Cai, C.; Guo, P.; Zhou, Y.; Zhou, J.; Wang, Q.; Zhang, F.; Fang, J.; Cheng, F. Deep learning-based prediction of drug-induced cardiotoxicity. J. Chem. Inf. Model 2019, 59, 1073–1084. [Google Scholar] [CrossRef]
  14. Satalkar, V.; Kulkarni, S.; Joshi, D. QSAR based analysis of fatal drug induced renal toxicity. J. Comput. Methods Mol. Des. 2015, 5, 24–32. [Google Scholar]
  15. Sun, Y.; Shi, S.; Li, Y.; Wang, Q. Development of quantitative structure-activity relationship models to predict potential nephrotoxic ingredients in traditional chinese medicines. Food Chem. Toxicol. 2019, 128, 163–170. [Google Scholar] [CrossRef]
  16. Ordonez, C.; Zhao, K. Evaluating association rules and decision trees to predict multiple target attributes. Intell. Data Anal. 2011, 15, 173–192. [Google Scholar] [CrossRef] [Green Version]
  17. Chen, M.; Yang, X.; Lai, X.; Gao, Y. Structural Investigation for optimization of anthranilic acid derivatives as partial fxr agonists by in silico approaches. Int. J. Mol. Sci. 2016, 17, 536. [Google Scholar] [CrossRef] [Green Version]
  18. O’Boyle, N.M.; Banck, M.; James, C.A.; Morley, C.; Vandermeersch, T.; Hutchison, G.R. Open Babel: An open chemical toolbox. J. Cheminform. 2011, 3, 33. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  19. Chen, M.; Yang, F.; Kang, J.; Gan, H.; Lai, X.; Gao, Y. Discovery of molecular mechanism of a clinical herbal formula upregulating serum HDL-c levels in treatment of metabolic syndrome by in vivo and computational studies. Bioorg. Med. Chem. Lett. 2018, 28, 174–180. [Google Scholar] [CrossRef]
  20. Guo, J. Simultaneous variable selection and class fusion for high-dimensional linear discriminant analysis. Biostatistics 2010, 11, 599–608. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  21. Vapnik, V. The Support Vector Method of Function Estimation. In Nonlinear Modeling; Springer: Boston, MA, USA, 1998; pp. 55–85. [Google Scholar]
  22. Cheng, F.; Guo, T.; Liu, C.; Wang, Y.; Huang, B. Identification of the thief zone using a support vector machine method. Processes 2019, 7, 373. [Google Scholar]
  23. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  24. Khumprom, P.; Yodo, N. A data-driven predictive prognostic model for lithium-ion batteries based on a deep learning algorithm. Energies 2019, 12, 660. [Google Scholar] [CrossRef] [Green Version]
  25. Zhang, Y.; Yang, S.; Liu, Y.; Zhang, Y.; Zhou, F. Integration of 24 feature types to accurately detect and predict seizures using scalp EEG signals. Sensors 2018, 18, 1372. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  26. Linden, A. Measuring diagnostic and predictive accuracy in disease management: An introduction to receiver operating characteristic (ROC) analysis. J. Eval. Clin. Pract. 2006, 12, 132–139. [Google Scholar] [CrossRef] [PubMed]
  27. Bento, A.P.; Gaulton, A.; Hersey, A.; Bellis, L.J.; Chambers, J.; Davies, M.; Krüger, F.A.; Light, Y.; Mak, L.; McGlinchey, S.; et al. The ChEMBL bioactivity database: An update. Nucleic Acids Res. 2014, 42, 1083–1090. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. A contour graph of Tanimoto Similarity Index to show the structural similarity of modeling dataset. The abscissa and ordinate are molecular IDs, respectively.
Figure 1. A contour graph of Tanimoto Similarity Index to show the structural similarity of modeling dataset. The abscissa and ordinate are molecular IDs, respectively.
Molecules 26 00930 g001
Figure 2. The chemical space distribution of training set and test set based on PC1, PC2 and PC3 components of 130 descriptors. Black diamond represents training set; red round symbol represents test set.
Figure 2. The chemical space distribution of training set and test set based on PC1, PC2 and PC3 components of 130 descriptors. Black diamond represents training set; red round symbol represents test set.
Molecules 26 00930 g002
Figure 3. ROC curves for three ADRs of models on digestive (a) and nervous systems (b).
Figure 3. ROC curves for three ADRs of models on digestive (a) and nervous systems (b).
Molecules 26 00930 g003
Table 1. Some common adverse drug reaction (ADR) information on drugs in the SIDER database.
Table 1. Some common adverse drug reaction (ADR) information on drugs in the SIDER database.
IDADR CategoriesMain Items of ADRsNo. of Drugs
1Digestive system reactionsAbdominal pain; diarrhea; abdominal bloating; constipation; gastrointestinal disorder; nausea; anorexia; digestion impaired; gastrointestinal hemorrhage; flatulence; abnormal feces; abdominal pain upper; vomiting; gastroesophageal reflux disease, etc.418
2Nervous reactionsAbnormal involuntary movements; convulsion; headache; balance disorder; dizziness; arthralgia; depression; paresthesia; ataxia; amnesia; disorder sight; feeling abnormal; deafness; abnormal behavior; nervous symptoms; etc.442
3Allergy reactionsPruritus; dermatitis; dyspnea; injection site pain; rash maculo-papular; erythema; acne; application site irritation; anaphylactic shock; injection site reaction; application site erythema; allergic contact dermatitis; seborrheic dermatitis; blisters; etc.259
4Hepatotoxic reactionsAlanine aminotransferase increased; liver function test abnormal; hepatic encephalopathy; hepatic enzyme abnormal; transaminases increased; hepatitis; hepatotoxicity; hepatic failure; etc.75
5Cardiovascular reactionsAtrial fibrillation; cardiac output decreased; bradycardia; angina pectoris; arrhythmia; acute coronary syndrome; arterial insufficiency; cardiac disorder; angiopathy; atrial fibrillation; cardiac murmur; cardiotoxicity; blood triglycerides increased; ventricular arrhythmia; etc.85
6Urinary reactionsUrinary tract infection; dysuria; bladder pain; micturition disorder; urinary hesitation; nephropathy toxic; renal failure; renal tubular acidosis; blood creatinine increased; urinary retention; albuminuria; hematuria; urethral disorder; chronic kidney disease; protein in urine; etc.99
7Hematologic reactionsThrombocytopenia; anemia; coagulopathy; agranulocytosis; eosinophilia; hemoglobin decreased; platelet count decreased; activated partial thromboplastin time prolonged; white blood cell count decreased; etc.108
Table 2. Results of association rule analysis of ADRs of organic drugs.
Table 2. Results of association rule analysis of ADRs of organic drugs.
ConsequentAntecedentInstancesSupport%Confidence%Rule Support%Lift
ADRs of nervous systemADRs of digestive system 41873.8581.159.891.04
Table 3. Molecular descriptors and the standardized coefficient of the linear discriminant analysis (LDA) model for ADRs of digestive system.
Table 3. Molecular descriptors and the standardized coefficient of the linear discriminant analysis (LDA) model for ADRs of digestive system.
DescriptorsChemical MeaningToleranceWilks’
Lambda
VIFF to
Remove
Non-Standardized Coefficientp-Value
PEOE_VSA-4Total negative van der Waals surface area0.7710.8531.29771.4890.0780.00
PEOE_VSA_FNEGFractional negative van der Waals surface area0.7730.7361.29411.2593.4950.00
vsurf_CPCritical packing parameter0.6970.7451.43515.798−3.2350.00
vsurf_DD13Contact distances of lowest hydrophobic energy0.9080.7371.10111.5780.070.00
a_nPNumber of phosphorus atoms0.8980.741.11413.367−1.4720.00
MNDO_LUMOThe energy (eV) of the lowest unoccupied molecular orbital calculated using the MNDO Hamiltonian0.7570.751.32118.1530.3580.00
reactiveIndicator of the presence of reactive groups0.950.7331.0529.7820.7860.00
a_nINumber of iodine atoms0.9720.7281.0297.3170.7680.00
opr_violationThe number of violations of Oprea’s lead-like test0.4180.7392.39212.911−0.2960.00
vsurf_CW1Capacity factor0.4170.742.39213.506−2.5930.00
SlogP_VSA5The subdivided surface areas0.7150.7251.3995.435−0.0070.00
vsurf_IW7Hydrophilic integy moment0.9460.7231.0574.663−0.1050.00
Constant 6.105
Table 4. The classification performance of the derived QSAR models for ADRs of digestive and nervous systems.
Table 4. The classification performance of the derived QSAR models for ADRs of digestive and nervous systems.
QSAR
Models
Accuracy
Training Set
Accuracy
Test Set
Accuracy
10-Fold CV
Total
Accuracy
BACCSensitivitySpecificityROC
AUC
Digestive systemLDA76.58%72.65%72.89%75.65%73.9%80.5%67.3%0.815
SVM98.42%76.92%75.79%93.36%97.61%99.63%95.58%0.989
DL87.89%78.63%78.42%85.71%83.73%94%73.45%0.915
Nervous systemLDA69.21%64.1%68.68%68%67.4%70.8%64%0.695
SVM80.26%83.76%76.05%81.09%62.94%95.53%30.34%0.784
DL82.89%81.2%73.68%82.49%72.84%91.75%53.93%0.788
Table 5. Molecular descriptors and the standardized coefficient of the LDA model for ADRs of nervous system.
Table 5. Molecular descriptors and the standardized coefficient of the LDA model for ADRs of nervous system.
DescriptorsChemical MeaningToleranceWilks’
Lambda
VIFF to
Remove
Non-standardized Coefficientp-Value
PEOE_VSA+5Total positive van der Waals surface area0.530.9321.8877.7120.030.004
vsurf_IW7Hydrophilic integy moment0.9490.9321.0547.875−0.2440.001
std_dim2Standard dimension 20.3110.9393.21510.599−0.6970.00
SMR_VSA3Subdivided surface areas0.3040.9333.2898.2310.0240.00
Q_VSA_FPPOSFractional positive polar van der Waals surface area0.9170.9251.0914.821−3.4980.00
MNDO_dipoleThe dipole moment calculated using the MNDO Hamiltonian0.8880.9231.1264.2610.1530.00
Constant 1.927
Table 6. Correlation coefficients of descriptors in two QSAR models of ADRs of digestive and nervous systems.
Table 6. Correlation coefficients of descriptors in two QSAR models of ADRs of digestive and nervous systems.
DescriptorsPEOE_
VSA+5
Q_VSA_
FPPOS
MNDO_
dipole
SlogP_
VSA3
vsurf_
IW7
std_
dim2
reactive0.127−0.0270.0580.169−0.0780.129
a_nI0.1080.056−0.0730.075−0.0540.049
a_nP−0.0440.166-0.0220.036−0.0210.038
PEOE_VSA-4−0.0470.1730.1430.0140.002−0.018
PEOE_VSA_FNEG−0.071−0.714−0.076−0.2440.131−0.131
opr_violation0.4530.152−0.2050.488−0.1090.698
MNDO_LUMO−0.0010.05−0.1340.014−0.0070.004
SlogP_VSA50.2530.268−0.1310.292−0.0120.412
vsurf_CP−0.263−0.4−0.107−0.2030.086−0.211
vsurf_CW1−0.3080.2470.23−0.373−0.038−0.591
vsurf_DD130.4150.053−0.1630.499−0.0080.47
vsurf_IW7−0.07−0.1850.102−0.081−0.074
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Chen, M.; Yang, Z.; Gao, Y.; Li, C. Fast Identification of Adverse Drug Reactions (ADRs) of Digestive and Nervous Systems of Organic Drugs by In Silico Models. Molecules 2021, 26, 930. https://doi.org/10.3390/molecules26040930

AMA Style

Chen M, Yang Z, Gao Y, Li C. Fast Identification of Adverse Drug Reactions (ADRs) of Digestive and Nervous Systems of Organic Drugs by In Silico Models. Molecules. 2021; 26(4):930. https://doi.org/10.3390/molecules26040930

Chicago/Turabian Style

Chen, Meimei, Zhaoyang Yang, Yuxing Gao, and Candong Li. 2021. "Fast Identification of Adverse Drug Reactions (ADRs) of Digestive and Nervous Systems of Organic Drugs by In Silico Models" Molecules 26, no. 4: 930. https://doi.org/10.3390/molecules26040930

APA Style

Chen, M., Yang, Z., Gao, Y., & Li, C. (2021). Fast Identification of Adverse Drug Reactions (ADRs) of Digestive and Nervous Systems of Organic Drugs by In Silico Models. Molecules, 26(4), 930. https://doi.org/10.3390/molecules26040930

Article Metrics

Back to TopTop