Article

Investigations on Inhibitors of Hedgehog Signal Pathway: A Quantitative Structure-Activity Relationship Study

1 Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China
2 Department of Natural Medicinal Chemistry, School of Pharmacy, Second Military Medical University, Shanghai 200433, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Mol. Sci. 2011, 12(5), 3018-3033; https://doi.org/10.3390/ijms12053018
Submission received: 14 March 2011 / Revised: 20 April 2011 / Accepted: 28 April 2011 / Published: 11 May 2011
(This article belongs to the Section Physical Chemistry, Theoretical and Computational Chemistry)

Abstract

The Hedgehog signaling pathway plays an essential role in developmental patterning, wherein the local concentration of Hedgehog morphogens directs cellular differentiation and expansion. Furthermore, the Hedgehog pathway has been implicated in tumor/stromal interaction and in cancer stem cells. The search for novel inhibitors of the Hedgehog signaling pathway is therefore drawing increasing attention from biological, chemical and pharmacological scientists. In this study, we propose a computational model that incorporates various statistical analysis methods to perform a Quantitative Structure-Activity Relationship (QSAR) study on inhibitors of Hedgehog signaling. The QSAR dataset contains 93 cyclopamine derivatives together with their activities against four different cell lines (NCI-H446, BxPC-3, SW1990 and NCI-H157). Our extensive testing indicated that binary classification is a better choice than the other statistical methods tested for building QSAR models of Hedgehog signaling inhibitors. The corresponding in silico analysis suggests three possible ways to improve inhibitor activity: demethylation, methylation and hydroxylation at specific positions of the compound scaffold. Of these, demethylation is the best choice for inhibitor structure modification. Our investigation also revealed that, among the four cell lines, NCI-H446 served best for testing the activities of inhibitors of the Hedgehog signaling pathway.


1. Introduction

The hedgehog signaling pathway plays a key role in the control of cell differentiation, growth, and proliferation [1]. Briefly, the pathway is composed of four important components: Sonic Hedgehog, Patched, Smoothened and the Gli transcription factors. Sonic Hedgehog is a secreted protein that transduces signals between cells. Patched acts as the receptor protein to which Sonic Hedgehog binds. When Sonic Hedgehog is absent, Patched blocks the function of Smoothened. In contrast, when Sonic Hedgehog binds to Patched, Smoothened is activated and initiates a signaling cascade that results in the activation of the Gli transcription factors. These Gli transcription factors translocate into the nucleus, where they control the transcription of target genes. Recent studies have found that constitutive activation of the pathway can trigger cancer in adult humans, leading to basal cell carcinoma, medulloblastoma, rhabdomyosarcoma, and prostate, pancreatic and breast cancers [2–5].
Owing to the direct relationship between activation of the hedgehog signaling pathway and oncogenesis, cancer researchers have been dedicated to finding specific inhibitors of hedgehog signaling, since these could provide efficient therapies for a wide range of malignancies [6–8]. Until now, only specific Smoothened inhibitors have been identified. Cyclopamine, a steroid alkaloid isolated from the corn lily (Veratrum californicum), is one of the small chemical compounds that specifically inhibit Smoothened in the hedgehog signaling pathway [9]. However, there is still no efficient route to synthesize cyclopamine because of its low solubility in aqueous or polar solvents, and little effort has been devoted to the synthesis of cyclopamine derivatives [10–13]. To develop clinically effective drugs, it is necessary to modify parent lead compounds and generate derivatives with which to study the structure-activity relationship (SAR) [13]. Janardanannair et al. [9,14] have pioneered such investigations on the SAR of cyclopamine derivatives. Their results quantitatively indicated that modification of the secondary amine and oxidation of the 3-hydroxy group to a ketone could influence the activities of cyclopamine derivatives. However, both studies had fewer than 30 samples, which is far from satisfactory for a sound QSAR study.
To better understand the Hedgehog signaling pathway and to design efficient inhibitors for it, 93 cyclopamine derivatives were synthesized and their activities were tested against four different cell lines (BxPC-3, NCI-H446, SW1990 and NCI-H157) [15,16]. Based on these experimental data, we carried out a systematic investigation of the SAR of Hedgehog signaling inhibitors by combining various statistical modeling approaches and comparing different descriptors and data division approaches.

2. Results and Discussion

Based on the computational framework outlined in Material and Methods, the following results were obtained for the QSAR modeling of inhibitors of the Hedgehog signaling pathway.

2.1. The Influence of Descriptors on the QSAR Modeling of Inhibitors of Hedgehog Signal Pathway

As mentioned above, two distinct sets of descriptors were tested to describe the 93 chemical compounds (Table 1 and Table 2). For the self-fitting of training data (highlighted in red), we found that the models derived from physical properties are more efficient than those derived from topological indices for QSAR modeling. It can be seen that almost all the values of σ in this case are negative. However, with regard to independent testing (highlighted in royal blue), the QSAR models derived from the DLI descriptors [17] appear much more robust than those derived from the general descriptors [18], and in this case almost all the values of σ are positive. As an intermediate case, the values of σ derived from cross validation (highlighted in yellow-green) include both negative and positive ones. Taken together, these results indicate that when the connection table information is projected onto physical properties, the general descriptors lose some structural information about a compound. This loss of information differs between the training and testing datasets, since the information is highly dependent on the conformation and structural essence of a molecule.
In conclusion, models derived from DLI are much more stable across both training data and testing data, while the general descriptors cannot guarantee such stability on independent data.

2.2. The Influence of Data Division on the QSAR Modeling of Inhibitors of Hedgehog Signal Pathway

It is well known that QSAR predictions are only reliable within or near the property space used to train the model. Preparing a robust, unbiased and sufficiently large training set is therefore critically important for building a proper statistical model. As mentioned above, two data division methods, i.e., Diverse Subset and Cluster plus Diverse Subset, were applied to divide our dataset into a training set and a testing set.
To test statistically whether the two data divisions lead to different results, a paired t-test was performed on the values in Table 1 and Table 2. The resulting p-value was 0.88 (>0.05), which indicates that there is no statistically significant difference between the two data divisions for QSAR analysis. In other words, clustering the data before calculating the diverse set does not significantly influence the QSAR models. This result can be explained by the algorithm used to calculate the diverse set. The Diverse Subset method in MOE [19] ranks entries based on the diversity of the whole dataset; that is, the calculation of the Diverse Subset is a global diversity comparison. In the Cluster plus Diverse Subset method, the data are first clustered, and Diverse Subset is then applied within every sub-cluster, so the main difference from the former method is that the diverse subset calculation becomes a local procedure within each cluster. Essentially, the two division methods have little influence on the final distribution of training data and testing data. Thus, as expected, the results of the two division methods do not differ significantly.
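As a rough illustration of the paired comparison described above, the following sketch computes the paired t statistic and its degrees of freedom from two matched lists of model scores. The function name and the input numbers are hypothetical and are not taken from Table 1 or Table 2; the p-value would then be read from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for two matched samples.

    Each pair (scores_a[i], scores_b[i]) is assumed to come from the same
    model configuration evaluated under the two data divisions.
    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # t = mean difference divided by the standard error of the differences
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical matched scores, for illustration only.
t_value, dof = paired_t_statistic([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

A small absolute t value relative to the critical value for the given degrees of freedom corresponds to a large p-value, i.e., no detectable difference between the two divisions.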

2.3. Comparison of PLS and SVR for QSAR Data Regression

When building a QSAR model, linear regression methods are normally preferred to advanced non-linear methods, since linear models make it easier to give a physical explanation of the prediction results. The most classical linear model in QSAR is PLS, which has been widely used in popular computer-aided drug design software [19–21]. In our study, PLS (MOE-PLS) was chosen first to derive our QSAR models. However, as indicated in Tables 1 and 2, this linear model failed to achieve satisfactory results. The correlation coefficients from self-fitting and cross validation are all less than 0.65.
Since advanced machine learning methods such as ANN [22], Bayesian inference [23], Random Forest [24] and SVM [25] have been successfully applied in QSAR studies [26–36], our QSAR models were rebuilt using SVR, a regression model derived from SVM with powerful fitting ability as well as excellent prediction accuracy [36–39]. As anticipated, this method performed well in the self-fitting of our training data (R² is nearly 0.9) as well as in cross validation. Nevertheless, it still performed badly on the independent test data, which indicates that such machine learning methods may not generalize well on the cyclopamine data. This is probably because our dataset is structurally diverse. Among the 93 compounds, four different scaffolds were found (Figure 1). In addition, six molecules did not match any of the scaffolds (Figure 2).

2.4. Comparison of Binary Bayesian Inference and SVM for QSAR Data Classification

When the quality of the data or the underlying mechanism is not suitable for regression modeling, binary classification can be applied to estimate the probability that a compound is active or inactive. MOE offers a binary filter for filtering numerical data. Any property that can be represented in a binary (yes/no) way (such as active/inactive, toxic/non-toxic, drug-like/non-drug-like, permeable/non-permeable, etc.) can be mapped onto such a filter. Thus, the binary classification model was used to rebuild the QSAR models and further reveal their intrinsic characteristics. MOE’s binary filters (yes/no) are based on the Bayesian inference technique described in Material and Methods. Continuous (non-binary) activity data can be converted to binary values with a specific threshold criterion. In our study, the IC50 of the drug compound is used as the cut-off.
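The thresholding step can be sketched as follows. This is a minimal illustration, not the MOE implementation: the function name is hypothetical, the IC50 values are invented, and treating a compound whose IC50 equals the cut-off as active is an assumption. The cut-off of 9.13 μg/mL is used here only because it is the cyclopamine IC50 reported for NCI-H446 in this article.

```python
def binarize_activity(ic50_values, threshold):
    """Map IC50 values to binary labels: 1 = active, 0 = inactive.

    A lower IC50 means a more potent compound, so values at or below
    the threshold are labelled active (ties counted as active by assumption).
    """
    return [1 if ic50 <= threshold else 0 for ic50 in ic50_values]


# Hypothetical IC50 values in ug/mL, for illustration only.
labels = binarize_activity([2.5, 9.13, 40.0], threshold=9.13)
```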
As shown in Table 1 and Table 2, the binary model performed well on both the training and the testing datasets. The overall prediction accuracy improved to nearly 0.8 against the NCI-H446 cell line (in some cases up to 0.906). This result indicates that the binary QSAR classification model is more suitable for guiding the design of novel inhibitors of the Hedgehog signaling pathway.
SVM classification was also applied to further validate the efficiency of binary classification models compared with regression models. The results shown in Table 1 and Table 2 reconfirmed that, for our data, the binary classification model is probably more suitable for QSAR analysis.
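The accuracy measures used for the binary models (A, A0 and A1 in the Abbreviations) can be computed as in the sketch below. The function name and the example labels are hypothetical; returning NaN for an empty subset is an implementation choice made here.

```python
def binary_accuracies(y_true, y_pred):
    """Return (A, A0, A1): total, inactive-subset and active-subset accuracy."""
    pairs = list(zip(y_true, y_pred))

    def accuracy(subset):
        # Fraction of correct predictions; NaN when the subset is empty.
        if not subset:
            return float("nan")
        return sum(t == p for t, p in subset) / len(subset)

    total = accuracy(pairs)
    inactive = accuracy([(t, p) for t, p in pairs if t == 0])
    active = accuracy([(t, p) for t, p in pairs if t == 1])
    return total, inactive, active


# Hypothetical labels: 1 = active, 0 = inactive.
a, a0, a1 = binary_accuracies([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

Reporting A0 and A1 alongside A guards against a model that scores well overall simply by predicting the majority class.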

2.5. Cell Line Analysis

Four different cell lines (NCI-H446, NCI-H157, SW1990 and BxPC-3) were used to test the cytotoxicity of the 93 compounds. However, only the NCI-H446 data produced a reasonable model by QSAR analysis; the prediction accuracy of the models against the other cell lines is about 0.6.
Why do some cell lines not fit our QSAR analysis well? We speculate that the most likely reason is a non-specific cytotoxic effect of these compounds on the other three cell lines. For example, NCI-H157 and BxPC-3 do not express the Gli and Smoothened proteins, respectively [40,41]. This means that the cytotoxic effect of these compounds may not result directly from the inhibition of hedgehog signaling. In addition, although sustained hedgehog signaling activity can be detected in SW1990 cells [41], it is very likely that cell lines grown in vitro lose their dependence on hedgehog signaling for survival [42]. For example, the IC50 of the positive control compound (cyclopamine) is 9.13 μg/mL for NCI-H446, 38.11 μg/mL for BxPC-3, 61.05 μg/mL for SW1990 and 58.33 μg/mL for NCI-H157. That is to say, NCI-H446 cells were the most sensitive to the hedgehog signaling inhibitor. In addition, the SW1990 cells had possibly mutated and lost hedgehog signaling in our experiment. In summary, non-specific effects may increase the variance of the cytotoxicity data and ultimately affect the QSAR analysis.

2.6. Structure Activity Report

In our study, SAReport was applied to give direct guidance on how to modify the structure of a compound to make it a better inhibitor of the hedgehog signaling pathway. All the structure modifications are listed in the supplementary material. Here, the top three structures were selected, together with their activity improvements, according to the different modification mechanisms.
The first important finding is that the SAReport validated our earlier observation that only the data for cell line NCI-H446 yield a reasonable QSAR modeling result (Figure 3). Secondly, the SAReport showed that demethylation, methylation and hydroxylation at specific positions of the inhibitor scaffold may greatly improve activity. As indicated in Figure 3, demethylation at position 8, methylation at position 7 and hydroxylation at position 11 provide three possible ways to improve the inhibitor’s activity. In addition, the SAReport shows that demethylation seems to be the most efficient of these approaches. This conclusion provides the first proven set of efficient structure modification methods for improving the activities of these inhibitors. All these results should shed new light on future work on inhibitor synthesis.

3. Material and Methods

A comprehensive computational workflow was designed to perform QSAR analysis on the inhibitors of Hedgehog signaling. This workflow is outlined in Figure 4. Details are given below.
Our analysis started by using two different sets of descriptors, i.e., general descriptors and the drug-like index, to describe the 93 cyclopamine derivatives. To construct the training set and testing set for statistical modeling, two data division methods were tried, i.e., Diverse Subset and Cluster plus Diverse Subset. Then, based on the training data obtained, different statistical modeling approaches, including PLS, SVR, Naive Bayesian classification and SVM classification, were applied to evaluate their abilities for QSAR modeling. It should be noted that the former two methods perform regression on the QSAR data, while the latter two perform classification. These approaches were then applied to the testing data for further validation and to derive useful clues for the design of efficient inhibitors of the Hedgehog signaling pathway. Finally, a SAReport of QSAR modeling of such inhibitors was presented for the first time.

3.1. Dataset and Data Division Methods

A total of 93 cyclopamine derivatives were tested against four different cell lines (BxPC-3, NCI-H446, SW1990 and NCI-H157); the compounds and their activities are listed in the supplementary material.
Two different approaches were applied to divide these experimental data into a training set and a testing set for the subsequent statistical modeling. Details follow.

3.1.1. Diverse Subset

Briefly, the Diverse Subset method in MOE ranks compound entries based on diversity. In the data division procedure, the first entry of the original dataset is taken as a reference and is always viewed as part of the diverse subset. The most “distant” compound from it is assigned rank #2, the compound most distant from these two is assigned rank #3, and so on until the required number of diverse compounds is identified or the whole dataset is ranked in diversity order. To determine which unranked entry is farthest from all already-ranked entries, the distance between each unranked entry and each ranked entry is calculated. For each unranked entry, the minimum of its distances to the ranked entries is found. The entry with the largest such “minimum distance” is deemed to be the farthest. The ranked dataset is then divided into two parts: a training dataset (65% of the original set) and a testing dataset (35% of the original set).
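The ranking procedure described above is a max-min selection, which can be sketched as follows. This is an illustration rather than MOE's code: the function names are hypothetical, Euclidean distance on descriptor vectors is an assumption, ties are broken by the lowest index, and taking the first 65% of the ranked list as the training set is also an assumption, since the text does not state which part of the ranking goes to which set.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diverse_rank(points, distance=euclidean):
    """Return entry indices in max-min diversity order, seeded with entry 0."""
    if not points:
        return []
    ranked = [0]
    # min_dist[i] = distance from entry i to its nearest already-ranked entry
    min_dist = [distance(p, points[0]) for p in points]
    remaining = set(range(1, len(points)))
    while remaining:
        # The farthest entry is the one with the largest "minimum distance".
        nxt = max(sorted(remaining), key=lambda i: min_dist[i])
        ranked.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], distance(points[i], points[nxt]))
    return ranked


def split_ranked(ranked, train_fraction=0.65):
    """Split a ranked index list into (training, testing) parts."""
    n_train = round(len(ranked) * train_fraction)
    return ranked[:n_train], ranked[n_train:]


# Hypothetical one-dimensional descriptors, for illustration only.
order = diverse_rank([(0.0,), (1.0,), (10.0,), (4.0,)])
train_idx, test_idx = split_ranked(order)
```

Keeping a running minimum distance per entry avoids recomputing all pairwise distances at every step, so the ranking costs on the order of N² distance evaluations in total.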

3.1.2. Cluster plus Diverse Subset

In contrast to the above method, a clustering step is performed here before Diverse Subset. Diverse Subset is then applied to each cluster to rank its members. Finally, the training and testing datasets are generated by pooling the sub-training datasets (65% of every sub-cluster) and the sub-testing datasets (35% of every sub-cluster), respectively. It should be noted that MOE can cluster the data based on either descriptors or fingerprints. To save computation time, the descriptor-based clustering in MOE was used in our study because it is a simple 3N algorithm, whereas fingerprint-based clustering uses the N² Jarvis-Patrick algorithm.
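A sketch of the per-cluster variant is given below. It assumes that cluster labels have already been produced by some clustering step (MOE's descriptor-based clustering is not reproduced here), and it uses a compact max-min ordering with Euclidean distance inside each cluster. All names and data are hypothetical.

```python
import math
from collections import defaultdict


def maxmin_order(points):
    """Order indices of a non-empty point list by max-min diversity."""
    order = [0]
    while len(order) < len(points):
        rest = [i for i in range(len(points)) if i not in order]
        # Pick the point whose nearest already-ordered neighbour is farthest.
        order.append(max(
            rest,
            key=lambda i: min(math.dist(points[i], points[j]) for j in order),
        ))
    return order


def cluster_diverse_split(points, cluster_labels, train_fraction=0.65):
    """Rank each cluster separately, then pool the per-cluster splits."""
    members = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)
    train, test = [], []
    for label in sorted(members):
        idxs = members[label]
        local = maxmin_order([points[i] for i in idxs])
        ordered = [idxs[k] for k in local]  # map back to global indices
        n_train = round(len(idxs) * train_fraction)
        train.extend(ordered[:n_train])
        test.extend(ordered[n_train:])
    return train, test


# Hypothetical descriptors and cluster labels, for illustration only.
pts = [(0.0,), (1.0,), (10.0,), (20.0,), (21.0,), (30.0,)]
train_idx, test_idx = cluster_diverse_split(pts, [0, 0, 0, 1, 1, 1])
```

Because the 65/35 split is taken inside each cluster, every cluster contributes to both sets in roughly the same proportion.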

3.2. Structural Descriptors

Many kinds of descriptors can be used to describe a chemical compound, including constitutional descriptors, physicochemical property descriptors, electronic descriptors, topological indices, geometrical descriptors and quantum chemistry descriptors. However, no single set of descriptors performs spectacularly better than the others. Thus, to build our QSAR model, a widely applicable set of descriptors, i.e., the general descriptors, was selected. In addition, the DLI descriptors were adopted for a complementary comparison.
General descriptors include atomic contributions to van der Waals surface area, log P (octanol/water), molar refractivity and partial charge. These descriptors have been applied to the construction of QSAR models for boiling point, vapor pressure, free energy of solvation in water, solubility in water, thrombin/trypsin/factor Xa activity, blood-brain barrier permeability and compound classification. Their wide application suggests their usefulness in QSAR modeling, combinatorial library design and molecular diversity work.
On the other hand, the DLI was first presented by Xu et al. as an approach to measure drug-like compounds. It was later adopted and modified by MOE as a set of descriptors. These descriptors characterize the hierarchy of drug structures in terms of rings, links and molecular frameworks.
Although both sets of descriptors are computable from connection table information, they partly complement each other. Normally, the general descriptors emphasize the physical properties of compounds, while the DLI descriptors favor simple topological indices of compounds.

3.3. Statistic Modeling

In our computational framework, various statistical models were incorporated and their performance in the QSAR analysis of inhibitors of the Hedgehog signaling pathway was evaluated, with the aim of finding the most suitable statistical method for QSAR modeling of such data. Each statistical method is described below.

3.3.1. PLS Method

The PLS QSAR method [43,44], as implemented in the QuaSAR-Model module of MOE 2008, is widely employed in QSAR modeling. It is arguably the most traditional and least sophisticated QSAR approach among those explored in this study. It was explored here to test whether reliable models for the underlying datasets could be built using the simplest approach. In our study, we applied the PLS method provided in MOE, and the number of components was set to no limit on the degree of the fit. The condition limit, i.e., the maximum condition number of the principal component transform of the correlation matrix S, was set at 1.0 × 10⁶, which is a very high setting. The leave-one-out cross validation (LOO-CV) scheme was used to validate the models, and the correlation coefficient (Q²) and root-mean-square error (RMSE) were reported.
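The LOO-CV scheme can be illustrated with the sketch below. To stay self-contained it uses ordinary least squares on a single descriptor as a stand-in for PLS, which is a deliberate simplification and not the MOE procedure. Q² is computed here with one common definition, 1 − PRESS/SS_total; the value reported by MOE may be defined differently. All names and numbers are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for one descriptor (stand-in for PLS)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loo_cv(xs, ys):
    """Return (Q2, RMSE) from leave-one-out cross validation."""
    predictions = []
    for i in range(len(xs)):
        # Refit without sample i, then predict the held-out sample.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total, math.sqrt(press / len(ys))


# Hypothetical, perfectly linear data: held-out predictions are exact.
q2, rmse = loo_cv([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Because each prediction is made for a sample the model has not seen, Q² is usually lower than the self-fitting R² and gives a more honest estimate of predictive ability.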

3.3.2. SVR

SVR, which has proven to be a powerful regression technique in many applications, was used here for comparison with PLS regression. SVR is the regression version of SVM and was proposed in 1996 by Vladimir Vapnik et al. [45]. The resulting model depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction (within a threshold ɛ). SVR retains all the main features that characterize the maximal margin algorithm: a non-linear function is learned by a linear learning machine in a kernel-induced feature space, while the capacity of the system is controlled by a parameter that does not depend on the dimensionality of the space. In summary, the basic idea of SVR is to map the data into a high-dimensional feature space via a nonlinear mapping and to perform linear regression in this space.
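The cost function mentioned above is usually called the ɛ-insensitive loss. The following minimal sketch shows only this loss term, not a full SVR solver; the function name and values are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical values: the first prediction lies inside the tube.
inside = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)
outside = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)
```

Training points whose error stays within ɛ contribute nothing to the cost, which is why the fitted model ends up depending only on a subset of the training data.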

3.3.3. Binary Bayesian Inference

The binary Bayesian QSAR method was applied using the QuaSAR-Model module of MOE 2008. In this modeling, the numerical values of inhibitor activity were converted to binary classification labels, which greatly reduces the noise of the data. That is, the binary model predicts the probability that a given compound is active or inactive rather than its numerical activity value. Since no quantitative estimate of the actual activity is derived, a compound is referred to as “active” if its predicted probability of being active is more than 0.5.
In binary Bayesian inference, the following steps were applied to predict the probability of each compound being active [46]:
  • Two distributions are estimated from the training set: one for the active compounds and one for the inactive ones. The separation of the active and inactive sets is defined manually by a Binary Threshold.
  • The frequency of occurrence of each descriptor value is counted in the active and inactive cases.
  • A histogram of the observed sample values is accumulated over the classes. The distribution is convolved with a Gaussian (σ = 0.25, the smoothing width) to avoid sensitivity to bin boundaries.
  • A histogram of property distributions is derived for each descriptor for the “active” and “inactive” (yes/no) sets. Descriptors that differentiate the two sets have a high impact in the model; those that do not drop out.
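The steps above can be approximated by the sketch below. It is not MOE's implementation: instead of building explicit histograms, it averages Gaussians (σ = 0.25) centred on the observed values, which is the limiting form of a Gaussian-smoothed histogram, and it combines descriptors under a naive independence assumption that is introduced here for simplicity. Descriptors are assumed to be scaled so that a smoothing width of 0.25 is meaningful. All names and data are hypothetical.

```python
import math


def smoothed_density(samples, x, sigma=0.25):
    """Average of Gaussians centred on each observed descriptor value."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(
        norm * math.exp(-0.5 * ((x - s) / sigma) ** 2) for s in samples
    ) / len(samples)


def probability_active(x, active_rows, inactive_rows, sigma=0.25):
    """Posterior probability of 'active' for descriptor vector x."""
    n_act, n_inact = len(active_rows), len(inactive_rows)
    # Start from the class priors estimated from the training set.
    score_act = n_act / (n_act + n_inact)
    score_inact = n_inact / (n_act + n_inact)
    for d, value in enumerate(x):
        score_act *= smoothed_density([r[d] for r in active_rows], value, sigma)
        score_inact *= smoothed_density([r[d] for r in inactive_rows], value, sigma)
    total = score_act + score_inact
    # If both densities underflow to zero, fall back to an uninformative 0.5.
    return score_act / total if total > 0 else 0.5


# Hypothetical one-descriptor training data, for illustration only.
actives = [(1.0,), (1.2,)]
inactives = [(3.0,), (3.1,)]
p_near_actives = probability_active((1.1,), actives, inactives)
p_near_inactives = probability_active((3.0,), actives, inactives)
```

A compound would be called “active” when the returned probability exceeds 0.5, matching the decision rule stated above.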

3.3.4. SVM Classification

For comparison with binary Bayesian classification, SVM classification was also applied to our QSAR data. SVM works by mapping the training data into a feature space with the aid of a so-called kernel function and then separating the data using a large-margin hyperplane. Intuitively, the kernel computes a similarity between two given examples. The most commonly used kernel functions are radial basis function kernels, which were used in our experiments. SVM classifiers are generated by a two-step procedure. First, the sample data vectors are mapped (“projected”) to a very high-dimensional space, whose dimension is significantly larger than that of the original data space. Then, the algorithm finds the hyperplane in this space that separates the classes of data with the largest margin. It has been shown that classification accuracy usually depends only weakly on the specific projection, provided that the target space is sufficiently high dimensional. Sometimes it is not possible to find a separating hyperplane even in a very high-dimensional space. In this case, a tradeoff is introduced between the size of the separating margin and penalties for every vector that lies within the margin.
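Two ingredients of this description can be written down directly: the radial basis function kernel, and the hinge loss that penalizes vectors inside the margin. The sketch below shows only these two pieces, not an SVM solver; the parameter name gamma, the function names and the values are assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel: similarity decays with squared distance."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def hinge_loss(label, score):
    """Margin penalty for a label in {-1, +1} and a signed decision score.

    Zero when the sample is on the correct side and outside the margin.
    """
    return max(0.0, 1.0 - label * score)


# Hypothetical values, for illustration only.
same_point = rbf_kernel((0.0, 0.0), (0.0, 0.0))
unit_apart = rbf_kernel((0.0,), (1.0,))
within_margin = hinge_loss(+1, 0.5)
```

In a soft-margin SVM, the sum of such hinge penalties is weighted against the margin size, which is the tradeoff referred to above.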

3.4. SAReport

SAReport [47] is a tool recently introduced in MOE for the visualization and analysis of project SAR data. SAReport contains sophisticated analysis methods to help scientists identify important groups and make more effective choices for synthesis.
Briefly, the Suggestions table in SAReport consists of a list of hypothetical molecules, constructed from available pieces, which are predicted to have a high probability of activity. The pool of hypothetical molecules is prepared by enumerating all of the input molecules and performing single-point mutations at each of the substituent positions, using each of the R-groups that have been observed in the equivalent position in some other molecule in the dataset. The unique list of chimeric molecules is then rated according to an estimate of probability, scaled and balanced to match the distribution of activities found in the input set. The scores are scaled in such a way that a value of 0 indicates that the hypothetical molecule is as likely to be active as an average molecule in the input set, while positive values indicate a higher likelihood. The chimeric molecules are ranked by their probability of activity multiplied by a weighting factor, which is a measure of cumulative similarity to other molecules in the database. A higher weighting implies that a larger statistical base is available to make the prediction. The most promising candidates are listed first. The molecule from which the candidate was mutated is shown, along with its property information. The new structure is shown to the right, along with the prediction. The percentage value is the increased probability of activity, and the number in brackets is the weighting.
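The enumeration step can be sketched as follows. This covers only the generation of single-point mutants, not SAReport's scoring or weighting. Representing each molecule as a tuple of R-group names on a shared scaffold is a simplification chosen for illustration, and skipping candidates that already exist in the input set is an assumption based on the word “hypothetical”. All names and data are invented.

```python
def enumerate_single_point_mutants(molecules):
    """Map each new chimeric molecule to the first parent that yields it.

    Each molecule is a tuple of R-group names, one per substituent position.
    """
    observed = set(molecules)
    n_positions = len(molecules[0])
    # R-groups seen at each position anywhere in the dataset.
    groups_at = [sorted({m[p] for m in molecules}) for p in range(n_positions)]
    candidates = {}
    for parent in molecules:
        for p in range(n_positions):
            for group in groups_at[p]:
                if group == parent[p]:
                    continue
                child = parent[:p] + (group,) + parent[p + 1:]
                if child not in observed:
                    # Keep the list unique; remember the first parent only.
                    candidates.setdefault(child, parent)
    return candidates


# Hypothetical two-position scaffold, for illustration only.
mutants = enumerate_single_point_mutants(
    [("H", "OH"), ("CH3", "OH"), ("H", "OCH3")]
)
```

In this toy dataset the only unseen combination is the molecule carrying both CH3 and OCH3, so it is the single candidate that would be passed on for scoring.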

4. Conclusions

In this study, different descriptors, data division approaches and statistical methods were used to build QSAR models for inhibitors of the Hedgehog signaling pathway, based on 93 cyclopamine derivatives and their activities against four different cell lines. Our investigation has shown that NCI-H446 may serve as the best cell line for testing the activities of inhibitors of the Hedgehog signaling pathway. Owing to the limited quality of the data, binary classification is a much better choice than regression for building QSAR models. Furthermore, for synthetic and medicinal scientists, our results indicate that demethylation, methylation and hydroxylation at specific positions may greatly improve the activity of inhibitors of the Hedgehog signaling pathway. Demethylation was found to be a better choice than methylation or hydroxylation for compound modification, and work in this direction is currently being actively pursued in our laboratory.

Supplementary Materials

ijms-12-03018-s001.doc

Abbreviations

R² = correlation coefficient in self fitting of training data set
Q² = correlation coefficient in cross validation fitting of training data set
r² = correlation coefficient in fitting of test data set
A = percentage accuracy of binary model = Total accuracy
A0 = percentage accuracy of inactive subset
A1 = percentage accuracy of active subset
At = A in self fitting of training data set
Av = A in cross validation fitting of training data set
Ap = A in fitting of test data set
DLI = Drug-like Index
PLS = Partial Least Squares
SVR = Support Vector Regression
SVM = Support Vector Machine
ANN = Artificial Neural Networks
SAReport = Structure-Activity Report

Acknowledgments

We would like to thank Baowei Zhao at GSK for proofreading the manuscript and for his valuable suggestions. This work was supported in part by grants from the Ministry of Science and Technology of China (2009ZX10004-601), the National Natural Science Foundation of China (30976611), and the Research Fund for the Doctoral Program of Higher Education of China (20100072110008, 20100072120050).

References

  1. Ingham, PW; McMahon, AP. Hedgehog signaling in animal development: Paradigms and principles. Gene. Dev 2001, 15, 3059–3087. [Google Scholar]
  2. Oro, AE; Higgins, KM; Hu, Z; Bonifas, JM; Epstein, EH, Jr; Scott, MP. Basal cell carcinomas in mice overexpressing sonic hedgehog. Science 1997, 276, 817–821. [Google Scholar]
  3. Kinzler, KW; Bigner, SH; Bigner, DD; Trent, JM; Law, ML; O’Brien, SJ; Wong, AJ; Vogelstein, B. Identification of an amplified, highly expressed gene in a human glioma. Science 1987, 236, 70–73. [Google Scholar]
  4. Dahmane, N; Lee, J; Robins, P; Heller, P; Ruiz i Altaba, A. Activation of the transcription factor Gli1 and the Sonic hedgehog signalling pathway in skin tumours. Nature 1997, 389, 876–881. [Google Scholar]
  5. Grachtchouk, M; Mo, R; Yu, S; Zhang, X; Sasaki, H; Hui, CC; Dlugosz, AA. Basal cell carcinomas in mice overexpressing Gli2 in skin. Nat. Genet 2000, 24, 216–217. [Google Scholar]
  6. Reifenberger, J; Wolter, M; Weber, RG; Megahed, M; Ruzicka, T; Lichter, P; Reifenberger, G. Missense mutations in SMOH in sporadic basal cell carcinomas of the skin and primitive neuroectodermal tumors of the central nervous system. Cancer Res 1998, 58, 1798–1803.
  7. Dahmane, N; Sanchez, P; Gitton, Y; Palma, V; Sun, T; Beyna, M; Weiner, H; Ruiz i Altaba, A. The Sonic Hedgehog-Gli pathway regulates dorsal brain growth and tumorigenesis. Development 2001, 128, 5201–5212.
  8. Chen, JK; Taipale, J; Young, KE; Maiti, T; Beachy, PA. Small molecule modulation of Smoothened activity. Proc. Natl. Acad. Sci. USA 2002, 99, 14071–14076.
  9. Chen, JK; Taipale, J; Cooper, MK; Beachy, PA. Inhibition of Hedgehog signaling by direct binding of cyclopamine to Smoothened. Genes Dev 2002, 16, 2743–2748.
  10. Beachy, P; Porter, J. Hedgehog-derived polypeptides. US Patent No. 6911528, 28 June 2005.
  11. Taipale, J; Chen, JK; Cooper, MK; Wang, B; Mann, RK; Milenkovic, L; Scott, MP; Beachy, PA. Effects of oncogenic mutations in Smoothened and Patched can be reversed by cyclopamine. Nature 2000, 406, 1005–1009.
  12. Giannis, A; Heretsch, P; Sarli, V; Stossel, A. Synthesis of cyclopamine using a biomimetic and diastereoselective approach. Angew. Chem. Int. Ed. Engl 2009, 48, 7911–7914.
  13. Zhang, J; Garrossian, M; Gardner, D; Garrossian, A; Chang, YT; Kim, YK; Chang, CW. Synthesis and anticancer activity studies of cyclopamine derivatives. Bioorg. Med. Chem. Lett 2008, 18, 1359–1363.
  14. Janardanannair, S; Adams, J; Ripka, AS. Methods for preparation cyclopamine analogs and use thereof in treating cancers. U.S. Patent 7,407,967 B2, 5 August 2008.
  15. Tang, J; Li, HL; Shen, YH; Jin, HZ; Yan, SK; Liu, RH; Zhang, WD. Antitumor activity of extracts and compounds from the rhizomes of Veratrum dahuricum. Phytother. Res 2008, 22, 1093–1096.
  16. Tang, J; Li, HL; Shen, YH; Jin, HZ; Yan, SK; Liu, XH; Zeng, HW; Liu, RH; Tan, YX; Zhang, WD. Antitumor and antiplatelet activity of alkaloids from Veratrum dahuricum. Phytother. Res 2010, 24, 821–826.
  17. Xu, J; Stevenson, J. Drug-like index: A new approach to measure drug-like compounds and their diversity. J. Chem. Inf. Comput. Sci 2000, 40, 1177–1187.
  18. Labute, P. A widely applicable set of descriptors. J. Mol. Graph. Model 2000, 18, 464–477.
  19. Molecular Operating Environment, version 2008.10; Chemical Computing Group Inc.: Montreal, Quebec, Canada, 2008.
  20. Discovery Studio, version 2.0; Accelrys Software Inc.: San Diego, CA, USA, 2007.
  21. Sybyl, version 6.8; Tripos Inc.: St. Louis, MO, USA, 2001.
  22. Balabin, RM; Lomakina, EI. Neural network approach to quantum-chemistry data: Accurate prediction of density functional theory energies. J. Chem. Phys 2009, 131, 74104.
  23. Gelman, A; Carlin, JB; Stern, HS; Rubin, DB. Bayesian Data Analysis, 2nd ed; Chapman and Hall/CRC: Boca Raton, FL, USA, 2003.
  24. Ho, TK. A data complexity analysis of comparative advantages of decision forest constructors. Pattern Anal. Appl 2002, 5, 102–112.
  25. Cortes, C; Vapnik, V. Support-vector networks. Mach. Learn 1995, 20, 273–297.
  26. Patra, JC; Singh, O. Artificial neural networks-based approach to design ARIs using QSAR for diabetes mellitus. J. Comput. Chem 2009, 30, 2494–2508.
  27. Bucinski, A; Socha, A; Wnuk, M; Baczek, T; Nowaczyk, A; Krysinski, J; Gorynski, K; Koba, M. Artificial neural networks in prediction of antifungal activity of a series of pyridine derivatives against Candida albicans. J. Microbiol. Meth 2009, 76, 25–29.
  28. Kahn, I; Sild, S; Maran, U. Modeling the toxicity of chemicals to Tetrahymena pyriformis using heuristic multilinear regression and heuristic back-propagation neural networks. J. Chem. Inf. Model 2007, 47, 2271–2279.
  29. Vijayan, RS; Bera, I; Prabu, M; Saha, S; Ghoshal, N. Combinatorial library enumeration and lead hopping using comparative interaction fingerprint analysis and classical 2D QSAR methods for seeking novel GABA(A) alpha(3) modulators. J. Chem. Inf. Model 2009, 49, 2498–2511.
  30. Tang, H; Wang, XS; Huang, XP; Roth, BL; Butler, KV; Kozikowski, AP; Jung, M; Tropsha, A. Novel inhibitors of human histone deacetylase (HDAC) identified by QSAR modeling of known inhibitors, virtual screening, and experimental validation. J. Chem. Inf. Model 2009, 49, 461–476.
  31. Burden, FR; Winkler, DA. Optimal sparse descriptor selection for QSAR using Bayesian methods. QSAR Comb. Sci 2009, 28, 645–653.
  32. Abdo, A; Salim, N. Similarity-based virtual screening using Bayesian inference network: Enhanced search using 2D fingerprints and multiple reference structures. QSAR Comb. Sci 2009, 28, 654–663.
  33. Li, Y; Wang, Y; Ding, J; Wang, Y; Chang, YQ; Zhang, SW. In silico prediction of androgenic and nonandrogenic compounds using random forest. QSAR Comb. Sci 2009, 28, 396–405.
  34. Zhu, JX; Lu, WC; Liu, L; Gu, TH; Niu, B. Classification of Src kinase inhibitors based on support vector machine. QSAR Comb. Sci 2009, 28, 719–727.
  35. Polishchuk, PG; Muratov, EN; Artemenko, AG; Kolumbin, OG; Muratov, NN; Kuz’min, VE. Application of random forest approach to QSAR prediction of aquatic toxicity. J. Chem. Inf. Model 2009, 49, 2481–2488.
  36. Sun, M; Zheng, YG; Wei, HT; Chen, JQ; Cai, J; Ji, M. Enhanced replacement method-based quantitative structure-activity relationship modeling and support vector machine classification of 4-anilino-3-quinolinecarbonitriles as Src kinase inhibitors. QSAR Comb. Sci 2009, 28, 312–324.
  37. Darnag, R; Schmitzer, A; Belmiloud, Y; Villemin, D; Jarid, A; Chait, A; Seyagh, M; Cherqaoui, D. QSAR studies of HEPT derivatives using support vector machines. QSAR Comb. Sci 2009, 28, 709–718.
  38. Rao, HB; Yang, GB; Tan, NX; Li, P; Li, ZR; Li, XY. Prediction of HIV-1 protease inhibitors using machine learning approaches. QSAR Comb. Sci 2009, 28, 1346–1357.
  39. Goodarzi, M; Freitas, MP; Jensen, R. Feature selection and linear/nonlinear regression methods for the accurate prediction of glycogen synthase kinase-3beta inhibitory activities. J. Chem. Inf. Model 2009, 49, 824–832.
  40. Watkins, DN; Berman, DM; Burkholder, SG; Wang, B; Beachy, PA; Baylin, SB. Hedgehog signalling within airway epithelial progenitors and in small-cell lung cancer. Nature 2003, 422, 313–317.
  41. Thayer, SP; di Magliano, MP; Heiser, PW; Nielsen, CM; Roberts, DJ; Lauwers, GY; Qi, YP; Gysin, S; Fernandez-del Castillo, C; Yajnik, V; et al. Hedgehog is an early and late mediator of pancreatic cancer tumorigenesis. Nature 2003, 425, 851–856.
  42. Sasai, K; Romer, JT; Lee, Y; Finkelstein, D; Fuller, C; McKinnon, PJ; Curran, T. Shh pathway activity is down-regulated in cultured medulloblastoma cells: Implications for preclinical studies. Cancer Res 2006, 66, 4215–4222.
  43. Helland, IS. On the structure of partial least squares regression. Comm. Stat. Simulat. Comput 1988, 17, 581–607.
  44. Geladi, P; Kowalski, BR. Partial least squares regression: A tutorial. Anal. Chim. Acta 1986, 185, 1–17.
  45. Drucker, H; Burges, CJC; Kaufman, L; Smola, AJ; Vapnik, V. Support Vector Regression Machines; MIT Press: Cambridge, MA, USA, 1996; pp. 155–161.
  46. Watanabe, N; Adachi, H; Takase, Y; Ozaki, H; Matsukura, M; Miyazaki, K; Ishibashi, K; Ishihara, H; Kodama, K; Nishino, M; et al. 4-(3-Chloro-4-methoxybenzyl)aminophthalazines: Synthesis and inhibitory activity toward phosphodiesterase 5. J. Med. Chem 2000, 43, 2523–2529.
  47. Schuffenhauer, A; Ertl, P; Roggo, S; Wetzel, S; Koch, MA; Waldmann, H. The scaffold tree-visualization of the scaffold universe by hierarchical scaffold classification. J. Chem. Inf. Model 2007, 47, 47–58.
Figure 1. Four scaffolds found in our experimental data.
Figure 2. Six molecules that did not match any of the scaffolds, as mentioned above.
Figure 3. SAReport of Hedgehog inhibitors.
Figure 4. General computational workflow used in our study.
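Table 1. QSAR results derived from the data divided by Diverse Subset (σ indicates difference).

| Method | Metric | BxPC-3 General | BxPC-3 Drug-like | BxPC-3 σ | NCI-H446 General | NCI-H446 Drug-like | NCI-H446 σ | SW1990 General | SW1990 Drug-like | SW1990 σ | NCI-H157 General | NCI-H157 Drug-like | NCI-H157 σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PLS | R² | 0.552 | 0.494 | −0.058 | 0.659 | 0.526 | −0.133 | 0.644 | 0.585 | −0.059 | 0.527 | 0.531 | 0.004 |
| PLS | Q² | 0.000 | 0.035 | 0.035 | 0.001 | 0.026 | 0.025 | 0.021 | 0.158 | 0.137 | 0.038 | 0.106 | 0.068 |
| PLS | r² | 0.102 | 0.307 | 0.205 | 0.218 | 0.025 | −0.193 | 0.084 | 0.193 | 0.109 | 0.019 | 0.118 | 0.099 |
| SVR | R² | 0.994 | 0.686 | 0.308 | 0.966 | 0.763 | −0.203 | 0.993 | 0.808 | −0.185 | 0.988 | 0.705 | −0.283 |
| SVR | Q² | 0.994 | 0.000 | −0.994 | 0.962 | 0.002 | −0.96 | 0.992 | 0.069 | −0.923 | 0.987 | 0.001 | −0.986 |
| SVR | r² | 0.000 | 0.396 | 0.396 | 0.088 | 0.110 | 0.022 | 0.025 | 0.258 | 0.233 | 0.023 | 0.077 | 0.054 |
| Bayesian inference | At | 0.883 | 0.917 | 0.034 | 1.000 | 0.967 | −0.033 | 0.900 | 0.933 | 0.033 | 0.967 | 0.933 | −0.034 |
| Bayesian inference | Av | 0.783 | 0.817 | 0.034 | 0.917 | 0.917 | 0 | 0.883 | 0.783 | −0.1 | 0.867 | 0.867 | 0 |
| Bayesian inference | Ap | 0.606 | 0.576 | −0.03 | 0.758 | 0.879 | 0.121 | 0.576 | 0.667 | 0.091 | 0.485 | 0.636 | 0.151 |
| SVM classification | At | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 0 |
| SVM classification | Av | 0.550 | 0.500 | −0.05 | 0.867 | 0.817 | −0.05 | 0.650 | 0.533 | −0.117 | 0.633 | 0.617 | −0.016 |
| SVM classification | Ap | 0.455 | 0.636 | 0.181 | 0.788 | 0.879 | 0.091 | 0.545 | 0.758 | 0.213 | 0.697 | 0.636 | −0.061 |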
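
The caption of Table 1 says only that σ "indicates difference". The tabulated values are consistent with σ being the Drug-like result minus the General result; that sign convention is inferred from the numbers, not stated by the authors. A minimal Python sketch, assuming that convention and using the PLS R² row of Table 1 (the names `pls_r2` and `sigma` are illustrative only):

```python
# (General, Drug-like) values copied from the PLS R² row of Table 1.
pls_r2 = {
    "BxPC-3": (0.552, 0.494),
    "NCI-H446": (0.659, 0.526),
    "SW1990": (0.644, 0.585),
    "NCI-H157": (0.527, 0.531),
}


def sigma(general, drug_like):
    """Assumed convention: sigma = Drug-like minus General, rounded to 3 decimals."""
    return round(drug_like - general, 3)


# One sigma value per cell line, in the same order as the table columns.
sigmas = {cell: sigma(g, d) for cell, (g, d) in pls_r2.items()}

for cell, value in sigmas.items():
    print(f"{cell}: {value:+.3f}")
```

Under this reading, a negative σ means the drug-like descriptor set scored lower than the general descriptor set for that cell line and metric.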
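Table 2. QSAR results derived from the data divided by Cluster plus Diverse Subset (σ indicates difference).

| Method | Metric | BxPC-3 General | BxPC-3 Drug-like | BxPC-3 σ | NCI-H446 General | NCI-H446 Drug-like | NCI-H446 σ | SW1990 General | SW1990 Drug-like | SW1990 σ | NCI-H157 General | NCI-H157 Drug-like | NCI-H157 σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PLS | R² | 0.506 | 0.474 | −0.032 | 0.593 | 0.396 | −0.197 | 0.542 | 0.493 | −0.049 | 0.587 | 0.542 | −0.045 |
| PLS | Q² | 0.011 | 0.007 | −0.004 | 0.015 | 0.019 | 0.004 | 0.005 | 0.002 | −0.003 | 0.006 | 0.040 | 0.034 |
| PLS | r² | 0.178 | 0.215 | 0.037 | 0.055 | 0.201 | 0.146 | 0.000 | 0.222 | 0.222 | 0.087 | 0.056 | −0.031 |
| SVR | R² | 0.997 | 0.716 | −0.281 | 0.965 | 0.756 | −0.209 | 0.993 | 0.839 | −0.154 | 0.987 | 0.655 | −0.332 |
| SVR | Q² | 0.997 | 0.021 | −0.976 | 0.962 | 0.025 | −0.937 | 0.993 | 0.124 | −0.869 | 0.986 | 0.019 | −0.967 |
| SVR | r² | 0.008 | 0.139 | 0.131 | 0.029 | 0.001 | −0.028 | 0.040 | 0.075 | 0.035 | 0.019 | 0.087 | 0.068 |
| Bayesian inference | At | 0.967 | 0.885 | −0.082 | 0.951 | 0.934 | −0.017 | 0.934 | 0.918 | −0.016 | 0.984 | 0.885 | −0.099 |
| Bayesian inference | Av | 0.852 | 0.803 | −0.049 | 0.934 | 0.918 | −0.016 | 0.852 | 0.836 | −0.016 | 0.820 | 0.820 | 0 |
| Bayesian inference | Ap | 0.656 | 0.625 | −0.031 | 0.625 | 0.906 | 0.281 | 0.625 | 0.656 | 0.031 | 0.625 | 0.625 | 0 |
| SVM classification | At | 1.000 | 0.984 | −0.016 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 0 | 1.000 | 0.984 | −0.016 |
| SVM classification | Av | 0.505 | 0.475 | −0.03 | 0.803 | 0.852 | 0.049 | 0.590 | 0.623 | 0.033 | 0.656 | 0.623 | −0.033 |
| SVM classification | Ap | 0.656 | 0.719 | 0.063 | 0.875 | 0.875 | 0 | 0.625 | 0.719 | 0.094 | 0.688 | 0.719 | 0.031 |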

Share and Cite

MDPI and ACS Style

Zhu, R.; Liu, Q.; Tang, J.; Li, H.; Cao, Z. Investigations on Inhibitors of Hedgehog Signal Pathway: A Quantitative Structure-Activity Relationship Study. Int. J. Mol. Sci. 2011, 12, 3018-3033. https://doi.org/10.3390/ijms12053018

