Herein, the selected kinase profiles are rationalized first and the virtual screening results against these panels are discussed. Then, the experimental results for the selected compounds are presented. Finally, the similarity between the kinases of the studied profiles is analyzed with respect to different ligand- and protein-centric measures.
2.1. Kinase Profiles
We focused our analysis on a target panel comprising kinases with medical relevance as well as a typical anti-target, known to be associated with frequent side effects of kinase inhibitors. All kinases in this set have been thoroughly characterized in the literature and are summarized in
Table 1.
The Erythroblastic leukemia viral oncogene homolog (ErbB) subclass of Receptor Tyrosine Kinases (RTKs) consists of four members named from ErbB1 (better known as epidermal growth factor receptor [EGFR]) to ErbB4 and they bind the EGF family of peptides with their extracellular region [
22]. The ErbB family is involved in the regulation of a multitude of signaling pathways associated with cell development. It is thus not surprising that aberrant ErbB signaling occurs in many cancers. Of note, patients with altered EGFR and ErbB2 expression suffer from a more aggressive disease. Especially breast cancer overexpressing ErbB2 is associated with poor patient prognosis [
23]. Unfortunately, therapy is often effective only for a short time and tumors will escape inhibition by activating pathways downstream of ErbB receptors via other kinases. This has been demonstrated for the phosphatidylinositol-3-kinase (PI3K) pathway, which is directly or indirectly activated by most ErbBs [
24]. After initial downregulation of PI3K activity upon inhibition of ErbBs, this pathway often recovers. Combination therapies are used to circumvent this problem, albeit with limited success. There is also evidence that tumor cells escape the negative effects of EGFR inhibition by upregulating tumor angiogenesis-promoting growth factors. A study used two antibodies against EGFR and VEGFR2 (vascular endothelial growth factor receptor 2), respectively, to treat gastric cancer grown in nude mice [
25]. The combination resulted in significantly greater inhibition of tumor growth.
Based on these experimental observations, we aggregated the investigated kinases in “profiles” (
Table 2). Profile 1 combined EGFR and ErbB2 as targets (indicated by a ‘+’) and BRAF (from rapidly accelerated fibrosarcoma isoform B) as a (general) anti-target (designated by a ‘—’). Out of similar considerations, Profile 2 consisted of EGFR and PI3K as targets and BRAF as anti-target. This profile is expected to be more challenging as PI3K is an atypical kinase and thus less similar to EGFR than for example ErbB2 used in Profile 1. Profile 3, comprised of EGFR and VEGFR2 as targets and BRAF as anti-target, was contrasted with the hit rate that we found with a standard docking against the single target VEGFR2 (Profile 4).
To broaden the comparison and obtain an estimate for the promiscuity of each compound, the kinases CDK2 (cyclic-dependent kinase 2), LCK (lymphocyte-specific protein tyrosine kinase), MET (mesenchymal-epithelial transition factor) and p38α (p38 mitogen activated protein kinase α) were included in the experimental assay panel and the structure-based bioinformatics comparison as commonly used anti-targets.
2.2. Virtual Screening against Kinase Profiles
Following our previous approach to identify ligands with tailored selectivity profiles by virtual screening [
6], the aim of this study was to evaluate the possibility to add anti-targets to a kinase profile. We hence modified our previous approach to incorporate profiles with more than two kinases, multiple structures per kinase and the selection of targets and anti-targets (Equation (
1) in Section “Data and Methods”).
Starting from the EGFR/ErbB2 pair, we included BRAF as a promiscuous anti-target, resulting in Profile 1 (see
Section 2.4.1 for a discussion of promiscuity values). We therefore prioritized molecules with high rank (i.e., favorable docking scores) in EGFR and ErbB2 as well as low rank (i.e., unfavorable docking interactions) in BRAF. The ZINC lead-like and ZINC drug-like subsets, containing 4.6 and 10.6 million molecules, respectively, were docked into each of the selected structures of these kinases (cf. “Data and Methods”). After docking the smaller lead-like subset to EGFR, ErbB2 and BRAF, the kinases comprising Profile 1, we identified a high mutual overlap in terms of well-ranked compounds between these three kinases (6982 common compounds in the top-ranked 25,000 compounds for EGFR and ErbB2, 4732 for ErbB2/BRAF and 4675 for EGFR/BRAF, respectively, each number representing the maximum over all pairwise comparisons of all docking runs of the lead-like ZINC subset into the different structures of these kinases). Thus, many promising poses in EGFR/ErbB2 were invalidated by a high-rank in the anti-target BRAF. Therefore, we deemed the docking of the larger drug-like subset necessary to obtain a sufficient number of poses with reasonable binding modes to select from after re-ranking. The re-ranking procedure was devised to prioritize molecules matching the requested profile, i.e., molecules with favorable docking rank in all targets but unfavorable docking ranks in all anti-target structures (see “Data and Methods” for details). Finally, we selected 18 molecules (see
Table 2 and
Table S1) based on visual inspection for this profile (see “Data and Methods” for more detail) from the re-ranked lists of both molecule sets and evaluated these experimentally.
Similarly, for Profile 2, using EGFR and PI3K as targets and again BRAF as an anti-target (
Table 2), we docked both the ZINC lead-like as well as the drug-like subsets. Again, we deemed the drug-like subset to be necessary due to the large overlap of the top-scoring lead-like molecules of the targets with the ones ranked favorably in the anti-target (4683, 4675 and 6591 for EGFR/PI3K, EGFR/BRAF, and PI3K/BRAF, respectively). For this profile, we selected nine molecules (
Table 2 and
Table S1).
The parallel docking calculations for Profiles 3 and 4 (
Table 2) yielded eight and four candidate ligands, respectively (
Table 2 and
Table S1). For Profile 3, the number of common molecules in the top 25,000 was 4610 and 5544 for VEGFR2/EGFR and VEGFR2/BRAF, respectively. As above, the overlap between EGFR and BRAF was 4675.
2.3. Experimental Validation
In total, 24 compounds selected from Profiles 1 and 2 (
Table 2 and
Table S1) were tested in the DiscoverX assay against kinases EGFR, ErbB2, BRAF, VEGFR2, LCK, CDK2, MET, p38α and PI3K (
Table S2), as well as in an additional confirmatory assay by Eurofins against EGFR, ErbB2, BRAF and PI3K (
Table S3). Only one of the 24 compounds, DS39984, showed measurable binding to the desired kinases (Profile 1,
Table 3 and
Tables S1–S3), while binding to neither Profile 1’s anti-target BRAF nor any of the other tested kinases (VEGFR2, CDK2, LCK, MET, p38α and PI3K). This compound DS39984 emerged from the screening campaign against Profile 1 (+EGFR+ErbB2—BRAF) and was picked from the drug-like subset of the ZINC database. We further validated the binding of this ligand and determined binding curves in an independent assay with IC
values of 324 and 220 nM (note that both enantiomers were docked—with the R-enantiomer more favorably ranked, but the racemate was tested) against EGFR and ErbB2insYVMA (a variant of ErbB2 with an insertion of four residues distant from the binding pocket), respectively (
Table 3, Rauh Lab).
As shown in the predicted binding modes in EGFR and ErbB2 (
Figure 1), DS39984 adopts a similar binding orientation in both proteins, with the pyrimidine portion forming a hydrogen bond to the hinge region. The methylester moiety is oriented more towards the back of the binding pocket, where both kinases feature rather voluminous cavities. This predicted binding mode to the hinge region is consistent with the sensitivity of DS39984 towards the T790M mutation: Affinity for the EGFRL858R/T790M double mutant is abolished (IC
µM), whereas the affinity for the EGFRL858R mutant is
nM. In contrast, in both BRAF structures used herein, the predicted poses are flipped and have their methylester moiety pointing towards the solvent (
Figure S1). A similar hinge binding interaction as in EGFR and ErbB2 is only present in one of the two poses (in the docking to BRAF structure 1UWH). This occurs despite the fact that in the 1UWH crystal structure the deep back pocket is open due to the crystallized ligand. Thus, in principle, a binding mode of DS39984 similar to the ones predicted in EGFR and ErbB2 is not per se excluded in BRAF due to steric reasons.
Note that DS39984 is not present in ChEMBL and has low similarity to known kinase ligands in ChEMBL (no ligand with Tanimoto similarity >0.7 as implemented in the ChEMBL web interface as of 18 October 2020). Furthermore, none of the additionally tested kinases (LCK, CDK2, MET and p38α) were inhibited by the molecule, which underlines, together with absence of BRAF inhibition, the potential of DS39984 as a novel, selective nanomolar EGFR and ErbB2 inhibitor.
Eight compounds were selected for Profile 3 (+EGFR+VEGFR2—BRAF,
Table 2 and
Table S1) and tested in the DiscoverX assay against EGFR, VEGFR2, BRAF and ErbB2. However, none of the compounds exhibited a relevant effect against any of these kinases. To crudely estimate the ligandability of VEGFR2, we docked against this target individually (Profile 4). However, we did not observe many poses that passed our visual inspection (see “Data and Methods” for details) and were able to select only four compounds from the docking to VEGFR2. These were tested in the same assay. Again, none of these compounds showed an effect on VEGFR2 activity. While the number of tested compounds is certainly too small to draw clear conclusions, the fact that only few compounds could be considered in the first place and that those few were inactive might indicate that VEGFR2 is more challenging with respect to the identification of ligands by docking than for example EGFR and ErbB2. One explanation for this could be associated with the fact that the vast majority of VEGFR2 structures show DFG-out(like) conformations (ratio DFG-in/out(like) structures in the PDB: 5/34 for VEGFR2 compared to 168/22 for EGFR, as of KLIFS 25 November 2020). Note that several FDA-approved kinase inhibitors bind to DFG-out(like) VEGFR2 conformations, e.g., axitinib, sunitinib and sorafenib [
26]. In contrast, we used DFG-in conformations of VEGFR2 for docking in order to maximize comparability with the other kinase structures used.
Unexpectedly, however, we found that one of these four compounds selected for VEGFR2 inhibition, K001MM011, actually inhibited EGFR and, to a lesser extent, ErbB2 (
Table 3 and
Table S2). While K001MM011 was picked from the docking to VEGFR2 only, we retrospectively inspected the ranking of this compound in the docking to EGFR and ErbB2. In EGFR, K001MM011 was found to be ranked within the best 10,000 compounds (rank 9527) of the lead-like subset in PDB 3POZ, while, in ErbB2, K001MM011 was ranked not as highly (best rank: 123,665 in PDB 3PP0).
In light of these experimental results and the comparative scarcity of ligands with the intended profiles, we decided to better investigate the kinases involved, with a view towards the possibility to predict the sensibility of a particular target combination.
2.4. Kinase Similarities
Designing kinase inhibitors with intended dual target activity that avoid binding to one or several specific anti-targets is a non-trivial task, as evidenced by the docking part of our study. To better understand how difficult it may be to design such inhibitors rationally, five different measures of inter-kinase similarity—each contributing a different level of granularity and a different viewpoint—were investigated (
Figure 2). Such an analysis potentially enables a priori estimations of the success of these endeavors for a given target/anti-target profile.
2.4.1. Ligand Profile Similarity (LigProfSim)
A first glance at the ChEMBL kinase ligand subsets revealed that none of the investigated kinases seems to be overly selective in terms of the ligands it recognizes, which is in accordance with previous kinome-wide profiling studies [
21,
27]. Given that the promiscuity values (
Table 4, diagonal of
Figure 2A and
Table S4) range from 0.55 for CDK2 to 0.82 for BRAF, all nine kinases bind more than half of the compounds tested against them at an affinity cut-off of 500 nM. Accordingly, BRAF is the most promiscuous kinase in the set, justifying its use as a general kinase anti-target in this study.
Second, considering LigProfSim, it becomes evident that EGFR, ErbB2 and BRAF are more similar to each other than the remaining kinases (top-left quarter of
Figure 2A), which renders finding a compound for Profile 1 (
Table 2) a difficult task. With LigProfSim values of 0.53 and 0.55, EGFR is more similar to ErbB2 and BRAF, respectively, than to any other kinase in the set (
Table S4). The same holds true for ErbB2, while BRAF has also higher similarities to other kinases in the set. In contrast, with a mean similarity value of 0.18, PI3K has the lowest mean LigProfSim similarity to all nine kinases. This is not unexpected, given that PI3K is the only atypical kinase in the set, but it underlines how challenging the definition of Profile 2 is. Note that, while 4150 compounds were tested against PI3K (with 2706 being active), PI3K has fewer than five common actives with most kinases, except for EGFR (13 common actives of 180 compounds tested against both targets) and VEGFR2 (32 of 175) (see
Table 5 and
Tables S5 and S6). While all kinases were assayed against at least 1500 compounds, a few other kinase pairs—not including PI3K—exist that have only a low number of tested compounds in common, e.g., CDK2/BRAF (14), CDK2/p38α (8) or ErbB2/p38α (9, see
Table S5), which makes thorough comparison difficult. Finally, with a value of 0.35, EGFR and VEGFR2 do not show high similarity from this ligand-centric perspective, while, as mentioned above, VEGFR2 and BRAF show considerably higher similarity (0.77). These numbers indicate that Profile 3 is very difficult.
2.4.2. Pocket Sequence Similarity (PocSeqSim)
Classically, kinases are clustered based on their full sequence similarity, such as in the well-known phylogenetic human kinome tree by Manning et al. [
11]. The kinome tree is often considered when checking for relationships among kinases, cross-reactivity and anti-targets. Arguably, EGFR and ErbB2 are the most closely related kinases in the set, both belonging to the TK branch and the EGFR family, followed by similarity to VEGFR2 (TK branch, VEGFR family). BRAF is less closely related (tyrosine-kinase-like [TKL] branch, RAF family). Finally, PI3K belongs to the atypical kinases and is only distantly related. Full kinase details are listed in
Table 1.
Here, we refined this sequence-based view of similarity to only consider the 85 residues forming the binding site in each kinase (PocSeqSim). Also in this “pocket sequence” space, the two EGFR family members EGFR and ErbB2 show the highest similarity of 0.89 (
Figure 2B, numbers in
Table S7). All other kinase pairs have similarity values below 0.48, thus less than 50% identical pocket residues. VEGFR2, MET and LCK, three other kinases from the TK class, have PocSeqSim between 0.42 and 0.47 to EGFR and Erb2; BRAF (TKL), p38α and CDK2 (both from the CMGC family) have values in the range of 0.32 to 0.40. Again, PI3K shows the lowest similarity to all other eight kinases. This indicates that, first, the pocket sequence similarities follow a similar trend as the whole-sequence similarities and, second, that—due to the close relationship of EGFR and ErbB2—other less similar kinases of the TK branch such as VEGFR2, MET and LCK, but also BRAF (TKL), p38α and CDK2 (both from the CMGC family), could be easier-to-satisfy anti-targets of +EGFR+ErbB2 ligands (
Figure 2B).
2.4.3. Interaction Fingerprint Similarity (IFPSim)
To take the interplay between the ligand and the protein into account, interaction fingerprint similarities (IFPSim) were investigated. Note that, for each kinase pair,
all available X-ray structures were compared and that only the similarity between the highest-scoring pair is reported (
Figure 2C, numbers in
Table S8). In the IFPSim matrix, the diagonal describes the best match among all pairwise IFP comparisons between different structures from the same kinase. Interestingly, ErbB2 has a self-similarity of only 0.71. This could be a consequence of the relatively low structural coverage of this kinase. In fact, ErbB2 is only represented by two structures, whereas, for EGFR, 150 structures are available (
Table 5).
With mean similarity values between 0.61 (lowest for PI3K) and 0.83 (highest for VEGFR2), the IFPSim values are generally higher than the LigProfSim and PocSeqSim values described above (
Table 4). EGFR has a high mean similarity to all kinases of 0.81, whereas ErbB2 has a lower mean value of 0.64; note again the low structural coverage of ErbB2. While ErbB2 is most similar to EGFR (0.78) with respect to IFPSim (
Figure 2C), it is less similar to BRAF (0.65), which would favor the development of a Profile 1 (+EGFR+ErbB2—BRAF) inhibitor. Interestingly, PI3K shows one of the highest similarities to EGFR (0.65), while it is less similar to BRAF (0.52), which, in contrast to other similarity measures, would support the feasibility of designing +EGFR+PI3K—BRAF compounds (Profile 2). In the case of VEGFR2, although similarity to EGFR is high (0.83), we observe an even higher similarity to BRAF (0.93), giving another indication of how difficult it may be to design-out this anti-target. On the other hand, the comparatively high similarity of VEGFR2 to EGFR might give an indication of why our Profile 4 compound actually inhibited EGFR.
2.4.4. Pocket Structure Similarity (PocStrucSim)
Similarities with respect to structural and physicochemical properties of the binding sites were analyzed using the CavBase fast cavity graph comparison algorithm [
28,
29] (
Figure 2D, numbers in
Table S9). Note that binding sites were automatically detected using LigSite and thus may vary in precision throughout the different structures, even within the same kinase. Pairwise kinase similarities range from 0.16 (PI3K/ErbB2) to 0.61 (BRAF/VEGFR2 and LCK/VEGFR2) and are—with a mean value of 0.46 over all kinase pairs—generally lower than the IFPSim values described above (
Table 4). Interestingly, EGFR and ErbB2 share only moderate similarity in this measure (0.40), while EGFR is more similar to all other kinases (including BRAF; 0.52), except PI3K (0.24). However, it should be noted that the structural coverage for ErbB2 and PI3K is much lower than for the other kinases, with only two structures each (
Table 5). Note that EGFR is most similar to the anti-target BRAF (0.52). Thus, according to PocStrucSim, it appears difficult to develop ligands against all multi-target profiles (1–3,
Table 2).
2.4.5. Docking Rank Similarity (DockRankSim)
Finally, we leveraged the results of our docking experiments to derive a complementary similarity measure based on the rank correlation of the docked lead-like compounds (
Figure 2E). DockRankSim values were calculated using only the top-scoring 25,000 lead-like molecules for each structure (about 0.5% of the ZINC lead-like subset at that time), since control calculations taking into account the entirety of docked molecule sets showed poor discrimination between different kinases. This lack of discrimination is likely due to the fact that the majority of molecules in the lead-like set are not kinase inhibitor-like. Therefore, the docking rank order of molecules past a certain threshold is noisy, i.e., all of them are more or less equally unlikely to bind. However, they will still receive different ranks based on small scoring differences, and these different ranks will lead to rather different—yet meaningless—correlations between the rankings. Only the five kinases that were included in the four docking profiles (
Table 2) were considered, i.e., no values for CDK2, LCK, MET and p38α were determined.
EGFR and ErbB2 have by far the highest mutual similarity of 0.3 within this set of kinases and a DockRankSim below 0.12 to all other kinases. While their higher mutual DockRankSim is not surprising given the close relationship between EGFR and ErbB2, it is encouraging that the docking results capture this.
Interestingly, the second highest DockRankSim observed is between PI3K and BRAF (0.15), followed by BRAF and VEGFR2 (0.13) as well as PI3K and VEGFR2 (0.13). This is surprising as PI3K, as atypical kinase, shares a rather low similarity to the remaining kinases using most other measures employed in this study (
Figure 2A–D). The remaining DockRankSim values are around 0.1, which seems to be the center of the distribution. The smallest DockRankSim was observed between EGFR and PI3K (0.04), an indication that Profile 2 (+EGFR+PI3K—BRAF) inhibitor design might be a challenge, at least computationally.
2.4.6. Comparison of Similarity Analyses
To shed light on the ease of identifying inhibitors for the respective profiles and the possibility to predict the likelihood that multi-target design endeavors will be successful, five different protein similarity measures were calculated (
Figure 2A–E). While the individual relationships between the nine kinases studied differ according to the five measures (which might also be due to missing data or noise in the data, as discussed above), several trends can be observed.
The similarity scores of the PocStrucSim and the IFPSim comparisons are distributed more evenly and clearly correlate with each other (R = 0.78,
,
Figure S2). In addition, the pocket structure- and sequence-based comparisons follow a similar trend (PocStrucSim vs. PocSeqSim R = 0.73,
). All other pairwise comparisons are less correlated, showing values in the range of R = [0.55, 0.59] with
(
Figure S2). While several measurements appeared to be correlated, differences between them are not surprising since the measures capture diverse views and thus complementary information of similarity. Nonetheless, it should be noted that the calculated values highly depend on the amount of available data. The conformational space of a kinase might be underrepresented if few kinase structures are available, which affects the structure-related measurements. Furthermore, since ChEMBL only provides a very sparse kinase-compound matrix of experimental measurements, the basis of compounds considered per kinase pair may differ strongly, affecting the LigProfSim values (as well as the promiscuity as defined here).
Besides PocStrucSim, all other measures imply a high similarity between EGFR and ErbB2, which is in favor of +EGFR+ErbB2 inhibitor design. Furthermore, LigProfSim, PocStrucSim and PocSeqSim suggest BRAF as a relevant and frequent anti-target, while this is less clear-cut for the IFPSim and DockRankSim measures. This fact renders design for all three profiles a challenging task. Furthermore, while PI3K is very dissimilar to EGFR from a sequence point of view (cf. Manning tree annotation), it showed higher similarity based on other measures such as IFPSim, which is encouraging for Profile 2 (+EGFR+PI3K—BRAF) design. In this sense, the fact that our docking results did not yield compounds with such a profile would suggest that similarity to the anti-target (in this case, BRAF) larger than to the intended target could be a key factor complicating the detection of the desired compounds.
Overall, our analyses suggest that ligand-, sequence- and structure-based approaches complement each other and can thus yield consistent insights into kinase similarities. It therefore seems advisable to carry out all of these analyses before a (virtual) screening campaign in order to take appropriate steps, e.g., adaptation of the molecule library to be screened, early on. Our ranking comparisons also suggest that similarity between one of the targets and the anti-target that is higher than the similarity between the two intended targets can be used as a prognostic indicator for difficult multi-target profiles.