Article

Toward Quantitative Models in Safety Assessment: A Case Study to Show Impact of Dose–Response Inference on hERG Inhibition Models

Department of Safety Assessment, Genentech, Inc., South San Francisco, CA 94080, USA
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2023, 24(1), 635; https://doi.org/10.3390/ijms24010635
Submission received: 27 October 2022 / Revised: 23 December 2022 / Accepted: 24 December 2022 / Published: 30 December 2022

Abstract

Due to challenges with historical data and the diversity of assay formats, in silico models for safety-related endpoints are often based on discretized data instead of the data on a natural continuous scale. Models for discretized endpoints have limitations in usage and interpretation that can impact compound design. Here, we present a consistent data inference approach, exemplified on two data sets of Ether-à-go-go-Related Gene (hERG) K+ inhibition data, for dose–response and screening experiments that are generally applicable for in vitro assays. hERG inhibition has been associated with severe cardiac effects and is one of the more prominent safety targets assessed in drug development, using a wide array of in vitro and in silico screening methods. In this study, the IC50 for hERG inhibition is estimated from diverse historical proprietary data. The IC50 derived from a two-point proprietary screening data set demonstrated high correlation (R = 0.98, MAE = 0.08) with IC50s derived from six-point dose–response curves. Similar IC50 estimation accuracy was obtained on a public thallium flux assay data set (R = 0.90, MAE = 0.2). The IC50 data were used to develop a robust quantitative model. The model’s MAE (0.47) and R2 (0.46) were on par with literature statistics and approached assay reproducibility. Using a continuous model has high value for pharmaceutical projects, as it enables rank ordering of compounds and evaluation of compounds against project-specific inhibition thresholds. This data inference approach can be widely applicable to assays with quantitative readouts and has the potential to impact experimental design and improve model performance, interpretation, and acceptance across many standard safety endpoints.

1. Introduction

Drug development typically incorporates a variety of in vitro assays to assess off-target interactions that may ultimately result in toxicity. Often, these assays are run in multiple formats depending on the stage of development, and the historical data for the same endpoint can originate from somewhat variable protocols. Such data lend themselves well to the development of in silico models that can be used to screen virtual molecules for series selection or to prioritize compounds for experimental follow-up within a chemical series. One drawback in this context is that data are often limited, and it is advantageous to combine data from a variety of experimental protocols for the same endpoint to maximize chemical space coverage. This practice typically leads to data discretization and the development of categorical models that can be used as coarse filters but may not necessarily provide more detailed output to the user.
A typical safety concern for pharmaceutical projects is the propensity for cardiac toxicity. Cellular action potential and the electrical activity of the heart are largely regulated by voltage-gated ion channels [1]. The human ether-a-go-go-related gene (hERG or KCNH2) encodes a K+ channel (Kv11.1 or hERG) that is responsible for rapid repolarization. It is essential for a standard electrocardiogram readout and normal heart function [2,3,4]. hERG inhibition may cause long QT syndrome (LQTS), arrhythmia, and Torsade de Pointes (TdP), which can lead to palpitations, fainting, seizures, and, in severe cases, even sudden death [2,5,6,7]. Structurally diverse compounds have been shown to inhibit the hERG ion channel, and a number of drugs from across therapeutic areas have been withdrawn from the market or severely restricted due to hERG-related cardiotoxicity [1,8], for example, astemizole, cisapride, terfenadine, vardenafil, and ziprasidone [9,10,11].
Given the severity of cardiac events related to hERG inhibition, early assessment of hERG-related cardiotoxicity has been an important step in drug development since the early 2000s [12]. The United States Food and Drug Administration (FDA) and the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) added safety criteria guidance for possible or high risk TdP to new drug applications [13,14]. All drug candidates are required to be screened for hERG liabilities prior to clinical trials and regulatory review. Consequently, early hERG inhibition evaluation has been widely adopted across the pharmaceutical industry to eliminate compounds with higher hERG inhibition potential [15,16]. Many in vitro screening methods for hERG inhibition are available, but the patch-clamp electrophysiological assay is the most widely used method for in vitro hERG inhibition assessments [17,18,19]. While the technology is improving, it is relatively costly, laborious, and time-consuming [20]. However, the accumulation of hERG inhibition data enabled the development of in silico screening methods. In silico tools allow for rapid and cost-effective hazard identification, and no chemical matter is required for in silico assessment, allowing chemists to evaluate theoretical compounds before synthesis. Thus, in silico tools for hERG assessment provide valuable insights during the lead-finding and optimization stages of drug development.
Many models for hERG inhibition have been developed over the past twenty years [11,15,16,21,22,23,24,25,26,27,28,29,30,31,32,33]. As in all applications, model quality and utility are largely dependent on the training data [34]. Older hERG modeling efforts were limited by small qualitative or quantitative data sets with limited validation [15]. The majority of the recent models focused on classifying hERG inhibition at 10 μM or at multiple thresholds [11,15,16,21,24,25,27,28,29,30,31]. While categorical models are useful for hazard identification at a specific threshold, they do not allow scientists to rank-order compounds and do not support a project-specific threshold without retraining. Furthermore, toxicologists typically interpret experimental hERG inhibition results in terms of quantitative IC50 values and often prefer models with outputs in the same format. Consequently, it is advisable to develop continuous hERG inhibition models instead of categorical models whenever possible to facilitate model acceptance. However, quantitative model development has been hindered by data challenges. Much hERG inhibition data from early project stages may come from non-ICH experimental protocols. These protocols may differ widely across projects, across industries, and over time, making it more difficult to compare results and develop a comprehensive data set. Many early hERG inhibition screening assays are run at one or two concentrations and do not provide IC50 estimates. Screening compounds at only one or two concentrations is common practice for financial reasons [35]. In addition, when the IC50 values lie outside the range of tested concentrations, the IC50 may be assigned to a constant with a “<” or “>” qualifier and cannot be used directly for quantitative modeling [33].
In this paper, we demonstrate the impact of historical data processing on model performance. More specifically, robust dose–response inference from historical data can enable continuous hERG inhibition models that predict pIC50s instead of activity categories. The model’s training set incorporated historical data from dose–response and two-point screening experiments. We used a robust dose–response inference approach to derive IC50s from the two-point screening data and incorporated the IC50 data into the continuous model development, substantially increasing the size of the data set and enhancing the model performance. We demonstrate the benefits of the quantitative approach and discuss its practical uses in drug development. Finally, we show that the two-point inference can be applied to public data sets such as the thallium flux assay data available in PubChem.

2. Results

2.1. Two-Point Data Inference

Historical data for hERG inhibition and other toxicology endpoints often come from experiments that differ in screening format or protocols. It is often difficult to extract consistent and reliable quantitative potency estimates from different types of experiments. Conventionally, most results from one- and two-point screening are discretized to binary categories. Consequently, models based on these data tend to predict binary activity instead of continuous potency. We used simplified dose–response inference based on a one-parameter Hill equation (Equation (1)) to obtain consistent IC50 estimates from two-point hERG screening experiments. To ensure model quality, we validated the reliability of IC50 estimates derived from two-point screening experiments by comparing IC50s derived from dose–response curves with 4–15 concentrations to IC50s derived from a two-concentration subset of the same curves, as discussed in the Methods section. To increase confidence, the analysis was repeated in two distinct data sets: internal proprietary data from the hERG patch clamp assay [36,37,38] and a public thallium flux assay available in PubChem [11,39]. IC50s derived from two-point subsets of the full dose–response series demonstrated excellent correlation with IC50s derived from the full titration series. The correlation was better for the patch clamp assay data (R = 0.98, MAE = 0.08 log10 units, Figure 1A), and somewhat worse for the thallium flux assay data (R = 0.90, MAE = 0.2 log10 units, Figure 1B). In both cases, the uncertainty derived from mathematical extrapolation outside the tested concentration ranges was below the uncertainty resulting from repeated experiments, which was 0.21 log10 units based on the in-house patch clamp data set. The PubChem thallium flux assays did not contain any biological replicates to assess experimental variability. Larger differences in IC50 estimates were associated with lower response (% response range) captured in the two selected concentrations.
Extreme errors in the thallium flux assay estimates were associated with non-monotonic responses and similar data idiosyncrasies, where the compounds produced no inhibition at the two concentrations selected for two-point screening but inhibited hERG at lower concentrations (see Figure S1 for examples).
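The agreement statistics quoted above (Pearson R and MAE in log10 units) can be computed as in the following minimal sketch. The function names and the example values are hypothetical and are not taken from the study data; the sketch only assumes paired pIC50 estimates from the full curves and from their two-point subsets.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_absolute_error(x, y):
    """Mean absolute difference; in log10 units when the inputs are pIC50s."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical paired estimates (illustrative values only).
full_curve_pic50 = [4.5, 5.0, 5.5, 6.0]
two_point_pic50 = [4.6, 4.9, 5.6, 6.1]

r = pearson_r(full_curve_pic50, two_point_pic50)
mae = mean_absolute_error(full_curve_pic50, two_point_pic50)
```

Reporting both statistics is useful because a high correlation alone does not rule out a systematic offset between the two sets of estimates.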

2.2. hERG Model Performance

We compared four modeling strategies to evaluate the impact of the data processing on model performance. Two models for categorical and two models for continuous outcomes were developed (Table 1). The expanded continuous data (ECD) model included the pIC50s extrapolated from two-point hERG screening data, while the limited continuous data (LCD) model was based on the pIC50s derived from traditional dose–response experiments only. For categorical models, the data were discretized at a 10 μM threshold as discussed in the Methods section. The all binary data (ABD) model included data for all compounds that could be discretized at the 10 μM threshold without ambiguity. The high confidence binary data (HCBD) model only included compounds with % inhibition < 30% or >70% at 10 μM to minimize the influence of experimental variability on the classification.
The expansion of the data set with pIC50s derived from two-point experiments and estimates outside the tested concentration ranges improved the empirical model performance (Figure 2). The ECD model demonstrated lower RMSE, lower MAE, higher R2 and ΔQ2 compared to the LCD model (Table 2). Notably, these results are not consistent with the internal validation. In the five-fold internal cross-validation, the LCD model showed lower MAE (0.25) than the ECD model (MAE = 0.34) (Table 3).
Since the data for development of the categorical models were discretized at the 10 μM threshold, we discretized the predictions made by the ECD and LCD models around the same 10 μM threshold to compare model performance. The ECD model demonstrated higher sensitivity, balanced accuracy, NPV, and PPV (Table 2). Notably, the LCD model showed higher sensitivity and NPV than all other models but had the lowest balanced accuracy. The two categorical models performed comparably on the prospective test set (Table 2). The ABD model showed higher sensitivity, NPV, but lower balanced accuracy, specificity, and NPV compared to the HCBD model. Of the two categorical models, the ABD model showed better balanced accuracy during cross-validation; 0.90 vs. 0.83 for ABD and HCBD models, respectively (Table 3). Importantly, the ECD model showed the highest improvement over random accuracy (ΔQ2 = 0.28), compared to ABD and HCBD models (ΔQ2 = 0.21). Similar trends were observed when assessing model performance on a highest confidence test subset, where observations sufficiently close to the 10 μM threshold were excluded to account for experimental uncertainty (Table 2). Overall, the ECD model demonstrated the highest BA, Q2, ΔQ2, and PPV, as well as higher sensitivity and NPV than the categorical models. The ABD model showed higher sensitivity, NPV, but lower balanced accuracy, specificity, and NPV compared to the HCBD model. Table S3 provides confusion matrix statistics for each of the four models.

2.3. Alternative Decision Thresholds

Continuous models allow users to set different decision thresholds without retraining the model. We evaluated the more robust ECD model on four activity thresholds that are of typical interest to drug development projects (Table 2). As expected, model performance trended with the activity distribution in the data set. The highest balanced accuracy (BA = 0.79) was observed at the 10 μM threshold, where the fraction of active hERG inhibitors approached 0.5 (prevalence = 0.58). While sensitivity varied substantially across thresholds, the NPV, PPV and BA remained more consistent across the threshold range.
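A minimal sketch of how continuous predictions might be re-discretized at several thresholds without retraining is shown below. The threshold values (1, 10, and 30 μM), the example pIC50 values, and the convention that a compound is active when its IC50 is at or below the threshold are illustrative assumptions; pIC50 is assumed to be expressed on a molar scale, so that 10 μM corresponds to pIC50 = 5.

```python
import math

def is_active(pic50, threshold_um):
    """Active when IC50 <= threshold, i.e., pIC50 >= 6 - log10(threshold in uM)."""
    return pic50 >= 6.0 - math.log10(threshold_um)

def classification_stats(observed_pic50, predicted_pic50, threshold_um):
    """Confusion-matrix statistics after discretizing both series at one threshold."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed_pic50, predicted_pic50):
        obs_active = is_active(obs, threshold_um)
        pred_active = is_active(pred, threshold_um)
        if obs_active and pred_active:
            tp += 1
        elif pred_active:
            fp += 1
        elif obs_active:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominators) are reported as NaN.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "prevalence": ratio(tp + fn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2.0,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical observed and predicted pIC50 values (illustrative only).
observed = [4.2, 4.8, 5.3, 5.9, 6.4, 4.5]
predicted = [4.4, 5.1, 5.2, 5.5, 6.1, 4.9]

# The same predictions are evaluated at each threshold; no retraining is involved.
stats_by_threshold = {t: classification_stats(observed, predicted, t)
                      for t in (1.0, 10.0, 30.0)}
```

Because the prevalence changes with the threshold, statistics such as sensitivity and PPV are expected to shift even though the underlying predictions are identical.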

3. Discussion

Learning from imprecise data has been a growing interest across computational sciences [40]. The topic is of particular interest in computational toxicology since model development in the field usually relies on historical data collected in diverse formats for different purposes and, frequently, from different experimental protocols. Furthermore, toxicological data sets are often small, ranging from hundreds to a few thousand observations. Thus, each data point could be critical for in silico model development. For example, to include the most data, many hERG inhibition models have been developed using a combination of diverse public and private historical data sets [11,15,16,21,22,23,24,25,26,27,28,29,30,31,32,33]. The majority of these models treated inhibition as a categorical outcome around a threshold, in part, due to high prevalence of ambiguous IC50s with “>” or “<” qualifiers. These data points are generated typically when the experimental IC50 is outside the range of tested concentrations and the curve-fitting algorithms do not extrapolate beyond the tested concentration range when determining the IC50. Categorical models typically ignored these data or assigned surrogate values. While literature precedents have suggested that the addition of imprecise data improves model performance, careful interpretation of the data uncertainty in the context of experimental protocol and assay details is essential [32,40,41,42]. Previous studies indicated that across secondary pharmacology safety screening targets, IC50s can be reliably estimated from single concentration data if the measured % inhibition falls between 20% and 80% [35]. Based on these hypotheses, we fit dose–response curves to all in-house patch-clamp data regardless of experimental format and extrapolated IC50 estimates beyond tested concentration ranges. 
This approach allowed us to use all data from the faster and cheaper two-point screening assay in a continuous model and expanded the internal data set more than 10-fold. IC50 values from the two-point screening data accurately estimated IC50s from more complete dose–response curves when the measured % inhibition fell between 20% and 90% (Figure 1). The median pIC50 errors due to mathematical extrapolation outside the tested concentration ranges for titration series with measured % inhibition between 20% and 90% ranged from 0.01 to 0.1 in the two data sets and were far below experimental uncertainty in repeated experiments (0.21 log10 units). Estimating IC50s from two-point screening was an effective strategy for obtaining reliable IC50 estimates and expanding modeling data sets for the hERG patch clamp and thallium flux assays. While the two-point inference based on the 1 and 10 μM concentrations worked well for a vast majority of experiments, the exact IC50s for very potent or inactive compounds may be difficult to ascertain when data are collected at these two concentrations only. In the extreme cases, the data may provide very limited information about the shape of the dose–response curve and the IC50 location. Our analysis suggested that IC50s between 0.1 and 100 μM can be reliably estimated from these two concentrations (Figure 1); i.e., the extrapolation is robust within one log unit of the tested concentrations. However, when a more exact IC50 estimate for more potent or less potent compounds is required, additional or different concentrations should be screened. The 1 and 10 μM values were selected based on their relevance to decision boundaries for hERG inhibition. Practically, any compound that is too potent to have its IC50 reliably estimated on these two concentrations is unlikely to be a viable drug candidate due to hERG inhibition concerns. Conversely, compounds completely inactive up to 10 μM pose no concern based on this assay.
Thus, as with most computational tasks, the experimental design still plays a critical role in final data quality and interpretability. While robust inference can help maximize the knowledge gained from data, the upper limit is still dictated by the choices made in the experimental design.
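To make the one-log-unit bound concrete, the one-parameter Hill equation (Equation (1), with asymptotes fixed at 0% and 100% and a slope of 1) can be inverted for a single measurement. The worked numbers below are an illustration under those fixed-parameter assumptions, where $I$ denotes the measured % inhibition at concentration $c$; they are not values from the study.

$$\mathrm{IC}_{50} = c \cdot \frac{100 - I}{I}$$

$$c = 1\ \mu\mathrm{M},\ I = 90\%:\quad \mathrm{IC}_{50} = 1 \cdot \frac{10}{90} \approx 0.11\ \mu\mathrm{M}$$

$$c = 10\ \mu\mathrm{M},\ I = 10\%:\quad \mathrm{IC}_{50} = 10 \cdot \frac{90}{10} = 90\ \mu\mathrm{M}$$

Under these assumptions, responses between 10% and 90% at the 1 and 10 μM test concentrations map to IC50 values of roughly 0.1 to 100 μM. Outside that window the curve is nearly flat, so small measurement errors translate into large IC50 errors.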
Finally, we note that while this approach works well for the two hERG assays discussed in the paper, it should be further validated when applied to new assays. This is particularly critical for endpoints that are more likely to exhibit non-monotonic responses, such as cytotoxicity. However, given the ample and growing interest in predictive models for drug discovery [43] and challenges associated with balancing categorical data sets [44], these data inference approaches can be of high interest for developing more robust models in computational toxicology and drug discovery.
Expanding the quantitative data set with IC50s derived from the two-point data helped develop a more robust, continuous hERG model for internal project use (Table 2). Data set composition has been shown to have large effects on the performance of categorical models [45]. The trend held true for both the categorical and the continuous models in our study. When used to classify compounds at the 10 μM threshold, the ABD model showed higher specificity and balanced accuracy but lower sensitivity as compared with the HCBD model that was built on fewer data points with higher classification confidence. These results were expected, as the infusion of extrapolated IC50s into the ABD model added 2250 compounds with IC50 values above 10 μM and 179 compounds with IC50 values below 10 μM, enhancing the specificity and PPV. Notably, the ECD model showed marginal improvement compared to the two classification models, when evaluated on its ability to classify compounds at the 10 μM threshold, and demonstrated notably higher improvement over random accuracy (Table 2). Finally, we noted that the common practice of removing compounds near the activity classification cutoff during model training had minimal influence on the model performance (Table 2). The practice is typically used to avoid including data points with ambiguous true classification in model training. However, in this case study, this practice increased the apparent model performance during cross-validation only (Table 3) and did not have any notable impact when the performance was assessed on the external test set (Table 2). Our findings suggested that the efforts to be overly selective with the data included in the training set may lead to over-fitting and may inflate model confidence if the model is assessed by cross-validation alone.
More generally, the observations support the ML hypothesis that additional data, even that of somewhat lower quality, may improve model performance when compared to models based on smaller data sets with the highest-quality data only [32,40,41,42].
The training and test sets differed substantially in the prevalence of hERG-inhibiting compounds (0.39 and 0.58, respectively). This difference provided an additional challenge for the model beyond the typically challenging prospective assessment. However, this setup presented a more realistic assessment of the model utility in the evolving pharmacological space. Although the statistics and model performance may be seen as suboptimal, these statistics provide a conservative assessment of model performance on new projects or new chemical series. The continuous model statistics compared favorably with recent large data hERG models. Ma et al. reported R2 values ranging from 0.305 to 0.352 for hERG inhibition models [33]. The ECD model had an R2 of 0.46 on the prospective test set. Radchenko et al. reported Q2 = 0.6 and RMSEcv = 0.55 in a larger study that included cross-validation assessment only [46]. In cross-validation, the ECD model was comparable. During hyperparameter optimization, the models exhibited Q2 values ranging from 0.66 to 0.69, and RMSEcv ranging from 0.32 to 0.35. These differences in performance statistics between internal and external validation once again highlight the importance of external and prospective model assessments. Older studies have reported Q2 values as high as 0.9 on test sets under 14 compounds [15]. While these provide an upward bound of model potential, they are unlikely to perform equally well in changing chemical space. Models trained and tested on small data sets may not be adequate to address the challenges faced in an industrial setting, where the models are expected to make useful predictions on a constantly changing chemical space.
The ECD model that predicted a continuous IC50 value demonstrated marginal improvement in performance statistics when used for classification at the 10 μM threshold and when compared to the two categorical models (Table 2). While it may be inappropriate to generalize findings in a purely statistical sense, models for continuous data present several practical advantages for drug development. Most importantly, continuous IC50 predictions can be used to rank-order compounds. The feature is particularly useful during the early stage of drug development, as it allows chemists to rank-order theoretical compounds before synthesis. The predictions can be used in multi-parameter optimization to select chemical space or series expected to be less likely to produce adverse effects down the road. Second, scientists may evaluate new compounds against different project-specific thresholds using the same model. The flexibility of ranking compounds based on IC50 predictions or choosing an activity threshold is particularly useful for promiscuous targets, which may be evaluated across many projects. Owing to the flexibility of the hERG ligand binding site, hERG is a promiscuous target that may be inhibited by structurally diverse substances [47]. Continuous models allow scientists to choose different thresholds based on the relative potency of the molecules within a specific project or a specific series. Theoretically, a series of binary models can be used to bin compounds more finely; however, practically, the modeling approach would be more difficult to execute and maintain. Model development would be substantially influenced by data set composition, as some bins may have few or no compounds. More conceptually, the models would discretize the data that are inherently continuous in nature, thus reducing the information each model can learn from.
Finally, to maximize model acceptance, it is usually advisable to provide project teams with a single model output, so that outputs from multiple models would have to be further aggregated. IC50 is the standard and the most robust output of the hERG patch clamp and thallium flux assays and is the parameter most commonly used for interpretation by chemists and toxicologists. Consequently, it is advantageous to report predicted IC50s to provide users with familiar and easily comparable values, thus further improving model acceptance.
In silico models that can reliably predict hERG blockade are useful to medicinal chemists and toxicologists during hit finding, hit-to-lead, and lead optimization stages of drug development. They allow projects to screen out potentially hazardous molecules early in the development process, saving time and money [48]. All models presented here showed high PPV and sensitivity at the 10 μM threshold (Table 2). These parameters are particularly useful in early drug development where it is important to avoid unnecessarily eliminating compounds, i.e., model predictions are unlikely to flag an inactive molecule as a hERG inhibitor. The predictions are validated with dose–response in vitro experiments later in the drug development process, and the early in silico profiling can help projects to select leads with the best safety margins, rather than simply selecting those that have efficacy at the lowest plasma concentration. When applied to broader sets of industrial compounds, the in silico hERG models can help fill data gaps and guide risk minimization strategies.
One limitation of extracting IC50s from a one- or two-point screen in functional cell-based assays is the inability to capture partial activity. The models assumed 100% inhibition at some high concentration. While partial activity did not appear to contribute significantly to the types of responses observed in our data set, it may be a greater concern for other assays or in different chemical spaces. Analyzing and modeling different response patterns is a topic for further research.
Finally, hERG inhibition is important but not sufficient for predicting QT prolongation and drug-induced TdP. Some molecules may inhibit multiple ion channels, with the cumulative effect normalizing the standard depolarization–repolarization cycle [49]. Another advantage of continuous modeling for cardiotoxicity prediction is that quantitative outputs from robust channel inhibition models can be combined with in silico reconstructions of cellular cardiac electrophysiological activity to model a more holistic output of overall cardiotoxicity potential based on chemical structure [50].

4. Materials and Methods

4.1. Chemical Data Sets

We compiled a data set of historical small molecules from the Genentech pipeline with patch clamp hERG inhibition data. The compounds spanned 80 projects over the 2007–2021 time frame. The structures for all compounds used in the screening were standardized using OpenEye Scientific Software [51]. The counter ions were removed, and tautomeric forms were standardized as discussed elsewhere [51,52]. The standardized structures were grouped based on InChIKey [53] calculated using the RDKit toolbox [54] in Python v 3.7. All models were based on 2D descriptors; hence, enantiomers and diastereomers appeared identical in the descriptor space, and a single unique 2D structure was retained when enantiomers or diastereomers were present. A weighted average of pIC50 was calculated when multiple experiments were available for a 2D structure. The resulting data set contained 4200 unique 2D structures (Table S2).
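One possible way to collapse stereoisomers onto a single 2D structure, sketched below, is to group records by the first 14-character block of a standard InChIKey, which hashes the connectivity layers, while stereochemistry is hashed into the second block. This is an assumption about how such a grouping could be done, not a description of the procedure used in the study. The keys shown are hypothetical placeholders rather than real InChIKeys, and the helper name is illustrative.

```python
from collections import defaultdict

def group_by_2d_structure(records):
    """Group (inchikey, experiment_id) pairs by the connectivity block of the key.

    Assumes standard InChIKeys of the form 'XXXXXXXXXXXXXX-YYYYYYYYYY-Z', where
    the first 14 characters identify the 2D skeleton.
    """
    groups = defaultdict(list)
    for inchikey, experiment_id in records:
        skeleton_block = inchikey.split("-")[0]
        groups[skeleton_block].append(experiment_id)
    return dict(groups)

# Hypothetical placeholder keys: the first two differ only in the second block,
# as a pair of enantiomers would.
records = [
    ("AAAAAAAAAAAAAA-BBBBBBBBSA-N", "exp_001"),
    ("AAAAAAAAAAAAAA-CCCCCCCCSA-N", "exp_002"),
    ("DDDDDDDDDDDDDD-UHFFFAOYSA-N", "exp_003"),
]

grouped = group_by_2d_structure(records)
```

Each resulting group can then be reduced to a single potency value, for example with the weighted pIC50 average described in the Methods.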

4.2. Chemical Descriptors

We used a combination of molecular fingerprints and physicochemical properties to describe 2D chemical structures. Morgan (Circular), MACCS, Topological (RDKit), Feature Morgan, Pattern, and Topological Torsion fingerprints and RDKit descriptors were calculated using the RDKit toolbox [54] in Python v 3.7. In addition, Moka 2.6.5 [55] from Molecular Discovery was used to calculate pKa values, and internal models were used to produce additional substructure and partitioning descriptors [26,56]. The fingerprints and molecular descriptors were used to featurize molecular structures for model development. The features were selected via regularization and feature selection filtering methods implemented in XGBoost.

4.3. Dose–Response Inference for Genentech Data

We collected 6216 historical experiments from the automated QPatch or SyncroPatch whole cell patch-clamp assay (Table S2) for the 4200 unique 2D structures [36,37,38]. The 6216 hERG patch-clamp experiments differed widely in the number of concentrations screened per titration series. The majority of the data were collected in two-point screening format, where compounds are screened at 1 and 10 μM. In all cases, the experiments are normalized to positive (E-4031) and negative (DMSO) controls. To ensure consistent inference across experiments, we developed an IC50 inference protocol that reliably extrapolates IC50s from two-point patch clamp screening data. In addition, we allowed the algorithm to extrapolate IC50s outside of the tested concentration ranges to support continuous model development. All titration series were fit to a simplified Hill curve by fixing three of the four Hill curve parameters (Equation (1)). The upper asymptote was set to 100% inhibition, i.e., the inhibition observed in the positive control (E-4031). The lower asymptote of the Hill curve was fixed to 0% inhibition, i.e., normalized response observed in DMSO negative controls. The Hill slope was set to 1. The three constant parameters were derived via a combination of heuristic information about the experimental setup and empirical data analysis. The resulting one-parameter Hill curves were fit in the R statistical environment [57] using a likelihood-based optimization routine via the Nelder–Mead optimization method as previously discussed [41,58].
$$\%\ \text{Inhibition} = \frac{100}{1 + \dfrac{\mathrm{IC}_{50}}{[\text{concentration}]}} \qquad (1)$$
Since 85% of all data were generated in two-point-screening format, we ensured that the IC50 estimates from the two-point experiments were reliable. To assess the reliability of two-point IC50 estimates, we compared IC50 estimates from titration series with more than two points to the IC50s derived from two-point data sets in the same experiment (Figure 3). The 1 and 10 μM concentrations were selected as the two-point subsets of dose–response curves because these two concentrations are screened in all two-point experiments and have direct relevance for decision thresholds in drug safety.
In all cases, the curves were fit when at least one response in a titration series exceeded 10% inhibition. When no curve could be fit to a titration series due to lack of response in all tested concentrations, the IC50 was assigned to 1.5 log units above the highest tested concentration. The approach produced IC50 estimates of 316 μM for a typical experiment where a compound was tested up to 10 μM and hERG inhibition did not exceed 10%. The standard value was selected based on empirical evidence and the typical shape of the dose–response curves in the assays.
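The inference rules described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it swaps the likelihood-based Nelder–Mead fit in R for an ordinary least-squares fit of log10(IC50) found by golden-section search, which assumes the error surface has a single minimum. The function names, search bounds, and tolerance are assumptions introduced for the example.

```python
import math

def hill_inhibition(conc_um, ic50_um):
    """Equation (1): asymptotes fixed at 0% and 100%, Hill slope fixed at 1."""
    return 100.0 / (1.0 + ic50_um / conc_um)

def fit_log_ic50(concs_um, inhibitions, lo=-4.0, hi=4.0, tol=1e-6):
    """Least-squares estimate of log10(IC50 in uM) by golden-section search.

    Assumes the sum of squared errors is unimodal on [lo, hi]; strongly
    non-monotonic responses can violate this assumption.
    """
    def sse(log_ic50):
        ic50 = 10.0 ** log_ic50
        return sum((hill_inhibition(c, ic50) - y) ** 2
                   for c, y in zip(concs_um, inhibitions))

    ratio = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    while b - a > tol:
        left = b - ratio * (b - a)
        right = a + ratio * (b - a)
        if sse(left) < sse(right):
            b = right  # minimum lies in [a, right]
        else:
            a = left   # minimum lies in [left, b]
    return (a + b) / 2.0

def infer_ic50_um(concs_um, inhibitions, min_response=10.0, default_offset=1.5):
    """Fit a curve only if some response exceeds 10% inhibition; otherwise
    assign an IC50 1.5 log units above the highest tested concentration."""
    if max(inhibitions) <= min_response:
        return max(concs_um) * 10.0 ** default_offset
    return 10.0 ** fit_log_ic50(concs_um, inhibitions)

# Two-point screen at 1 and 10 uM (hypothetical responses).
ic50_responder = infer_ic50_um([1.0, 10.0], [16.7, 66.7])   # close to 5 uM
ic50_inactive = infer_ic50_um([1.0, 10.0], [2.0, 8.0])      # about 316 uM
```

Fitting on the log10 scale keeps the search well conditioned across several orders of magnitude and matches the pIC50 scale used for modeling.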
As discussed above, the data were combined using 2D chemical structures as unique identifiers. When pIC50 values from multiple experiments were available for a 2D structure, we calculated a weighted average of the pIC50 values. The weights were calculated based on the response range (Equation (2)), i.e., the maximum range of % inhibition response observed across the tested concentrations in each titration series. Finally, the IC50 values were converted to pIC50 for the modeling effort (Equation (3)).
$$\text{Response Range} = \max\left(100 - \min(\%\ \text{Inh}),\ \max(\%\ \text{Inh})\right) \qquad (2)$$
$$\text{pIC}_{50} = -\log_{10}(\text{IC}_{50}) \qquad (3)$$
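The aggregation step can be illustrated with a short Python sketch (not the authors' R code). It makes three assumptions. First, the response range is read as max(100 − min(% Inh), max(% Inh)). Second, weights are taken to be directly proportional to that range, since the exact weighting scheme is not spelled out here. Third, the IC50 is expressed in mol/L before taking the logarithm, so that 10 μM corresponds to a pIC50 of 5. All function names are hypothetical.

```python
import math

def pic50_from_um(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); 1 uM = 1e-6 mol/L."""
    return 6.0 - math.log10(ic50_um)

def response_range(inhibitions) -> float:
    """Assumed reading of the response range: max(100 - min(%Inh), max(%Inh))."""
    return max(100.0 - min(inhibitions), max(inhibitions))

def weighted_pic50(experiments) -> float:
    """Weighted mean pIC50 for one 2D structure.

    experiments: list of (ic50_um, inhibitions) tuples, one per titration series.
    Weights proportional to the response range are an assumption.
    """
    weights = [response_range(inh) for _, inh in experiments]
    values = [pic50_from_um(ic50) for ic50, _ in experiments]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Two hypothetical experiments for the same structure.
print(weighted_pic50([(10.0, [50.0, 50.0]), (1.0, [0.0, 100.0])]))
```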
To demonstrate the applicability of the approach to other data, we applied the same data inference algorithm to public thallium flux assay data. The thallium flux assay detects inhibition of the hERG channel by measuring the flow of a surrogate ion, thallium, in a homogeneous assay format. It is a high-throughput alternative to the more expensive patch-clamp technology and has been extensively validated on small molecules with the potential to block hERG and induce LQTS [59,60]. We extracted data for 5281 dose–response experiments from PubChem (AID 588834) on 1 August 2022. The experiments were run at 7, 14, or 15 concentrations ranging from 0.1 nM to 92 μM. The curves were categorized into several classes based on the fraction of response achieved in each curve and the quality of the dose–response fit [61]. To enable reliable comparison metrics, we excluded curves annotated with poor fits, curves with no response across the tested concentrations (PubChem inactive), and curves designated as “single point of activity”. The remaining data set contained 629 experiments. To assess the utility of the two-point inference, we selected the two concentrations closest to 1 and 10 μM, which were 0.84 and 9.2 μM in this data set. We then fit dose–response curves to the full concentration–response series and to the two-point subsets of the data as discussed above.
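Selecting the two-point subset from a longer titration series can be sketched as follows. This Python snippet is illustrative only. It assumes that "closest" is measured on the log10 concentration scale, and the concentration series and responses shown are hypothetical values, not the actual PubChem dilution scheme.

```python
import math

def closest_concentration(concs_um, target_um):
    """Tested concentration nearest to the target on a log10 scale (assumption)."""
    return min(concs_um,
               key=lambda c: abs(math.log10(c) - math.log10(target_um)))

def two_point_subset(concs_um, inhibitions, targets_um=(1.0, 10.0)):
    """Return (concentration, % inhibition) pairs nearest to each target."""
    picked = [closest_concentration(concs_um, t) for t in targets_um]
    return [(c, inhibitions[concs_um.index(c)]) for c in picked]

# Hypothetical seven-point series (uM) and responses (% inhibition).
concs = [0.0001, 0.084, 0.84, 2.8, 9.2, 30.0, 92.0]
resp = [0.0, 3.0, 18.0, 41.0, 70.0, 88.0, 96.0]
print(two_point_subset(concs, resp))
```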

4.4. hERG Modeling Strategies

To assess the impact of data inference on model performance and utility, we developed four models that reflect common data processing and model development practices in computational toxicology (Table 1). Two continuous models aimed to predict pIC50 (ECD and LCD), and two categorical models discretized the pIC50 values into active and inactive at the 10 μM threshold (ABD and HCBD). The 10 μM threshold was chosen based on common hERG data guidance in safety assessment and literature precedent [16,21,24,31]. The ECD model incorporated all available data. The LCD model only included data points with IC50 values within the tested concentration ranges. The ABD model included data with congruent classification at the 10 μM threshold. Data were excluded from the ABD training set if multiple experiments showed IC50 values above and below the 10 μM threshold for the same compound. The HCBD model included only the data with higher classification confidence at the 10 μM threshold. In addition, compounds with % inhibition > 30% and < 70% at 10 μM were designated as ambiguous for the purposes of classification and were excluded from the HCBD model training set. The HCBD model was included to assess the impact of the common categorical modeling practice in which scientists exclude compounds near the activity threshold from the training set [62].
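The inclusion rules for the two categorical training sets can be sketched as below. This is a hedged Python illustration, not the authors' code. It assumes that a compound is called active when its pIC50 is at least 5 (IC50 of 10 μM or lower), and that the HCBD set applies the 30–70% exclusion on top of the congruence requirement. Both are assumptions, and the function names are hypothetical.

```python
THRESHOLD_PIC50 = 5.0  # pIC50 equivalent of the 10 uM activity threshold

def is_active(pic50: float) -> bool:
    """Assumed convention: IC50 <= 10 uM counts as active."""
    return pic50 >= THRESHOLD_PIC50

def abd_label(pic50_values):
    """ABD rule: 1 or 0 if all experiments agree at the threshold, else None."""
    calls = {is_active(p) for p in pic50_values}
    if len(calls) != 1:
        return None  # conflicting experiments, excluded from training
    return int(calls.pop())

def hcbd_label(pic50_values, inhibition_at_10um: float):
    """HCBD rule: additionally drop compounds with 30% < inhibition < 70% at 10 uM."""
    if 30.0 < inhibition_at_10um < 70.0:
        return None  # ambiguous near the threshold
    return abd_label(pic50_values)

print(abd_label([5.4, 5.1]), abd_label([4.9, 5.2]), hcbd_label([5.1], 55.0))
```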

4.5. Statistical Methods

All four models were trained using the eXtreme Gradient Boosting (XGBoost) implementation in the R package XGBoost [63]. Boosted decision tree approaches have demonstrated good results across cheminformatics and broader modeling tasks [16,64]. This modeling method has several advantages, including strong model performance, automatic feature selection, and interpretable output. Hyperparameters (Table S1) and final model features were selected by cross-validation. The final features for each model were selected via a combination of regularization and feature selection hyperparameters in the XGBoost algorithm.
For all models, the data sets were split into training and prospective test sets before any further data processing. To mimic industrial use, the test set was selected on a prospective basis, i.e., as the set of 115 compounds most recently screened in 2021. Representations of the chemical space for the test and training sets based on common dimension reduction techniques [65,66,67,68] are provided in Figure S2. Linear combinations of features in the training set were removed, and hyperparameters were optimized using 5-fold and 8-fold nested cross-validation. The hyperparameters for each model are available in Table S1. XGBoost models with the optimized hyperparameters were built on the entire training set and evaluated on the prospective test set. The results of 5-fold cross-validation for each model are provided in Table 3.
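A prospective (time-based) split of this kind can be sketched in a few lines. The Python example below is illustrative. The record layout, field names, and dates are hypothetical, and the only rule taken from the text is that the most recently screened compounds form the test set.

```python
from datetime import date

def prospective_split(records, n_test):
    """Hold out the n_test most recently screened compounds as the test set."""
    if not 0 < n_test < len(records):
        raise ValueError("n_test must be between 1 and len(records) - 1")
    ordered = sorted(records, key=lambda r: r["screened"])
    return ordered[:-n_test], ordered[-n_test:]

# Hypothetical records: one entry per unique 2D structure.
records = [
    {"id": "cpd-1", "screened": date(2019, 3, 1)},
    {"id": "cpd-2", "screened": date(2021, 11, 5)},
    {"id": "cpd-3", "screened": date(2020, 7, 9)},
    {"id": "cpd-4", "screened": date(2021, 6, 2)},
]
train, test = prospective_split(records, n_test=2)
print([r["id"] for r in train], [r["id"] for r in test])
```

Splitting before any feature processing, as described above, keeps information from the held-out compounds out of feature selection and hyperparameter tuning.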

4.6. Model Performance

A single prospective test set was used to assess and compare model performance. We used a comprehensive set of evaluation metrics to assess model performance on the prospective test set (N = 115). These included sensitivity (Sens), specificity (Spec), positive predictive value (PPV), negative predictive value (NPV), balanced accuracy (BA), and accuracy (Q2) [16]. In addition, we included random accuracy (Q2,rnd) and model improvement over random accuracy (ΔQ2) as alternative metrics of model performance [69]. Since experimental variability may render classification around a threshold ambiguous, we identified a subset of the test data with higher experimental classification confidence for model evaluation. This subset included compounds with pIC50 values below 4.8 or above 5.2 (N = 94). pIC50 values within 0.2 log units of the 10 μM threshold were removed based on the background assay variability: the median difference between IC50 values across experiments was 0.21 log units.
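The metrics listed above can be computed from a confusion matrix as sketched below. This Python example is illustrative. The random accuracy follows the expression given with Table 2, Q2,rnd = [(TP + FN)(TP + FP) + (TN + FN)(TN + FP)]/N², and the confusion matrix counts are hypothetical values chosen for easy arithmetic, not results from this study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix statistics, including random accuracy and its improvement."""
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    q2 = (tp + tn) / n
    q2_rnd = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "Sens": sens,
        "Spec": spec,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "BA": (sens + spec) / 2,
        "Q2": q2,
        "Q2_rnd": q2_rnd,
        "dQ2": q2 - q2_rnd,
    }

def is_high_confidence(pic50: float, threshold: float = 5.0, margin: float = 0.2) -> bool:
    """True when a measured pIC50 lies more than `margin` from the threshold."""
    return abs(pic50 - threshold) > margin

# Hypothetical counts: 40 TP, 5 FP, 45 TN, 10 FN.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print({name: round(value, 2) for name, value in metrics.items()})
```

Reporting ΔQ2 alongside Q2 matters most for imbalanced sets, where a high raw accuracy can be reached by predicting the majority class.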

5. Conclusions

In this paper, we outlined the impact of historical data processing on model development, using the hERG patch-clamp inhibition assay as a case study. We demonstrated that pIC50s derived from two-point screening data can serve as robust estimates of pIC50s derived from six-point dose–response curves, especially within the concentration ranges that are typically relevant for drug development decisions. Quantitative pIC50 inference from screening data enables the development of a continuous QSAR model that predicts pIC50s for hERG inhibition instead of activity categories. This model performed on par with categorical models and demonstrated higher improvement over random accuracy when the output was discretized to predict activity categories. Furthermore, the continuous model enables scientists to rank order and prioritize compounds in early research or to classify compounds around project-specific thresholds without retraining models. The pIC50 inference approach used here can be applied to expand the development of quantitative models in computational toxicology and drug development. It may be particularly impactful when modeling data that are highly categorically imbalanced. However, we advise scientists to review the pIC50 extrapolation results when applying similar approaches to new assays, as the inference can be complicated by non-monotonic dose–response curves or noisy assay data.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms24010635/s1, Table S1: Optimized hyperparameters used by each of the four models; Table S2: Genentech hERG data; Table S3: Confusion matrix statistics; Figure S1: Dose–response plots for PubChem data; Figure S2: Chemical space representation for compounds in the data set.

Author Contributions

All authors conceptualized the work and contributed to formal analysis and writing. L.T.A. and F.M. performed data curation and review. F.M. performed modeling and validation. C.H. supervised the work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

R, MLR, and XGBoost are freely available from https://cran.r-project.org/ (accessed on 2 August 2022). hERG patch-clamp data and XGBoost model parameter sets for all models are provided as part of the Supporting Information. Chemical structures for internal molecules could not be disclosed. The thallium flux assay data are available from PubChem AID 588834 (https://pubchem.ncbi.nlm.nih.gov/bioassay/588834, accessed on 2 August 2022).

Acknowledgments

The authors would like to acknowledge Kevin P. Clark and Satoko Kiyota for their roles in data management and subject matter commentary, respectively.

Conflicts of Interest

All authors are employees of Genentech, Inc., and own stock in Roche Holding Ltd.

References

  1. Villoutreix, B.O.; Taboureau, O. Computational Investigations of HERG Channel Blockers: New Insights and Current Predictive Models. Adv. Drug Deliv. Rev. 2015, 86, 72–82.
  2. Sanguinetti, M.C.; Tristani-Firouzi, M. HERG Potassium Channels and Cardiac Arrhythmia. Nature 2006, 440, 463–469.
  3. Smith, P.L.; Baukrowitz, T.; Yellen, G. The Inward Rectification Mechanism of the HERG Cardiac Potassium Channel. Nature 1996, 379, 833–836.
  4. Vandenberg, J.I.; Perry, M.D.; Perrin, M.J.; Mann, S.A.; Ke, Y.; Hill, A.P. HERG K+ Channels: Structure, Function, and Clinical Significance. Physiol. Rev. 2012, 92, 1393–1478.
  5. Brugada, R.; Hong, K.; Dumaine, R.; Cordeiro, J.; Gaita, F.; Borggrefe, M.; Menendez, T.M.; Brugada, J.; Pollevick, G.D.; Wolpert, C.; et al. Sudden Death Associated With Short-QT Syndrome Linked to Mutations in HERG. Circulation 2004, 109, 30–35.
  6. Curran, M.E.; Splawski, I.; Timothy, K.W.; Vincen, G.M.; Green, E.D.; Keating, M.T. A Molecular Basis for Cardiac Arrhythmia: HERG Mutations Cause Long QT Syndrome. Cell 1995, 80, 795–803.
  7. Redfern, W.S.; Carlsson, L.; Davis, A.S.; Lynch, W.G.; MacKenzie, I.; Palethorpe, S.; Siegl, P.K.S.; Strang, I.; Sullivan, A.T.; Wallis, R.; et al. Relationships between Preclinical Cardiac Electrophysiology, Clinical QT Interval Prolongation and Torsade de Pointes for a Broad Range of Drugs: Evidence for a Provisional Safety Margin in Drug Development. Cardiovasc. Res. 2003, 58, 32–45.
  8. Brown, A.M. Drugs, HERG and Sudden Death. Cell Calcium 2004, 35, 543–547.
  9. Giacomini, K.M.; Krauss, R.M.; Roden, D.M.; Eichelbaum, M.; Hayden, M.R.; Nakamura, Y. When Good Drugs Go Bad. Nature 2007, 446, 975–977.
  10. Laverty, H.G.; Benson, C.; Cartwright, E.J.; Cross, M.J.; Garland, C.; Hammond, T.; Holloway, C.; McMahon, N.; Milligan, J.; Park, B.K.; et al. How Can We Improve Our Understanding of Cardiovascular Safety Liabilities to Develop Safer Medicines? Br. J. Pharmacol. 2011, 163, 675–693.
  11. Sun, H.; Huang, R.; Xia, M.; Shahane, S.; Southall, N.; Wang, Y. Prediction of HERG Liability Using SVM Classification, Bootstrapping and Jackknifing. Mol. Inform. 2017, 36, 1600126.
  12. Witchel, H.J. The HERG Potassium Channel as a Therapeutic Target. Expert Opin. Ther. Targets 2007, 11, 321–336.
  13. Kratz, J.M.; Schuster, D.; Edtbauer, M.; Saxena, P.; Mair, C.E.; Kirchebner, J.; Matuszczak, B.; Baburin, I.; Hering, S.; Rollinger, J.M. Experimentally Validated HERG Pharmacophore Models as Cardiotoxicity Prediction Tools. J. Chem. Inf. Model. 2014, 54, 2887–2901.
  14. Raschi, E.; Vasina, V.; Poluzzi, E.; De Ponti, F. The HERG K+ Channel: Target and Antitarget Strategies in Drug Development. Pharmacol. Res. 2008, 57, 181–195.
  15. Rodolpho, C.B.; Vinicius, M.A.; Meryck, F.B.S.; Eugene, M.; Denis, F.; Alexander, T.; Carolina, H.A. Tuning HERG Out: Antitarget QSAR Models for Drug Development. Curr. Top. Med. Chem. 2014, 14, 1399–1415.
  16. Siramshetty, V.B.; Nguyen, D.-T.; Martinez, N.J.; Southall, N.T.; Simeonov, A.; Zakharov, A.V. Critical Assessment of Artificial Intelligence Methods for Prediction of HERG Channel Inhibition in the “Big Data” Era. J. Chem. Inf. Model. 2020, 60, 6007–6019.
  17. Kiss, L.; Bennett, P.B.; Uebele, V.N.; Koblan, K.S.; Kane, S.A.; Neagle, B.; Schroeder, K. High Throughput Ion-Channel Pharmacology: Planar-Array-Based Voltage Clamp. ASSAY Drug Dev. Technol. 2003, 1, 127–135.
  18. Polonchuk, L. Toward a New Gold Standard for Early Safety: Automated Temperature-Controlled HERG Test on the PatchLiner®. Front. Pharmacol. 2012, 102–111.
  19. Wen, D.; Liu, A.; Chen, F.; Yang, J.; Dai, R. Validation of Visualized Transgenic Zebrafish as a High Throughput Model to Assay Bradycardia Related Cardio Toxicity Risk Candidates. J. Appl. Toxicol. 2012, 32, 834–842.
  20. Polak, S.; Wiśniowska, B.; Brandys, J. Collation, Assessment and Analysis of Literature in Vitro Data on HERG Receptor Blocking Potency for Subsequent Modeling of Drugs’ Cardiotoxic Properties. J. Appl. Toxicol. 2009, 29, 183–206.
  21. Cai, C.; Guo, P.; Zhou, Y.; Zhou, J.; Wang, Q.; Zhang, F.; Fang, J.; Cheng, F. Deep Learning-Based Prediction of Drug-Induced Cardiotoxicity. J. Chem. Inf. Model. 2019, 59, 1073–1084.
  22. Chavan, S.; Abdelaziz, A.; Wiklander, J.G.; Nicholls, I.A. A K-Nearest Neighbor Classification of HERG K+ Channel Blockers. J. Comput. Aided Mol. Des. 2016, 30, 229–236.
  23. Czodrowski, P. HERG Me Out. J. Chem. Inf. Model. 2013, 53, 2240–2251.
  24. Konda, L.S.K.; Keerthi Praba, S.; Kristam, R. HERG Liability Classification Models Using Machine Learning Techniques. Comput. Toxicol. 2019, 12, 100089.
  25. Korotcov, A.; Tkachenko, V.; Russo, D.P.; Ekins, S. Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets. Mol. Pharm. 2017, 14, 4462–4475.
  26. Lee, M.-L.; Aliagas, I.; Feng, J.A.; Gabriel, T.; O’Donnell, T.J.; Sellers, B.D.; Wiswedel, B.; Gobbi, A. Chemalot and Chemalot_knime: Command Line Programs as Workflow Tools for Drug Discovery. J. Cheminformatics 2017, 9, 38.
  27. Ogura, K.; Sato, T.; Yuki, H.; Honma, T. Support Vector Machine Model for HERG Inhibitory Activities Based on the Integrated HERG Database Using Descriptor Selection by NSGA-II. Sci. Rep. 2019, 9, 12220.
  28. Sharifi, M.; Buzatu, D.; Harris, S.; Wilkes, J. Development of Models for Predicting Torsade de Pointes Cardiac Arrhythmias Using Perceptron Neural Networks. BMC Bioinform. 2017, 18, 497.
  29. Siramshetty, V.B.; Chen, Q.; Devarakonda, P.; Preissner, R. The Catch-22 of Predicting HERG Blockade Using Publicly Accessible Bioactivity Data. J. Chem. Inf. Model. 2018, 58, 1224–1233.
  30. Wang, S.; Sun, H.; Liu, H.; Li, D.; Li, Y.; Hou, T. ADMET Evaluation in Drug Discovery. 16. Predicting HERG Blockers by Combining Multiple Pharmacophores and Machine Learning Approaches. Mol. Pharm. 2016, 13, 2855–2866.
  31. Cianchetta, G.; Li, Y.; Kang, J.; Rampe, D.; Fravolini, A.; Cruciani, G.; Vaz, R.J. Predictive Models for HERG Potassium Channel Blockers. Bioorganic Med. Chem. Lett. 2005, 15, 3637–3642.
  32. Jing, Y.; Easter, A.; Peters, D.; Kim, N.; Enyedy, I.J. In Silico Prediction of HERG Inhibition. Future Med. Chem. 2015, 7, 571–586.
  33. Ma, J.; Sheridan, R.P.; Liaw, A.; Dahl, G.E.; Svetnik, V. Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships. J. Chem. Inf. Model. 2015, 55, 263–274.
  34. Cronin, M.T.D.; Schultz, T.W. Pitfalls in QSAR. J. Mol. Struct. THEOCHEM 2003, 622, 39–51.
  35. Bowes, J.; Brown, A.J.; Hamon, J.; Jarolimek, W.; Sridhar, A.; Waldron, G.; Whitebread, S. Reducing Safety-Related Drug Attrition: The Use of in Vitro Pharmacological Profiling. Nat. Rev. Drug Discov. 2012, 11, 909–922.
  36. Obergrussberger, A.; Brüggemann, A.; Goetze, T.A.; Rapedius, M.; Haarmann, C.; Rinke, I.; Becker, N.; Oka, T.; Ohtsuki, A.; Stengel, T.; et al. Automated Patch Clamp Meets High-Throughput Screening: 384 Cells Recorded in Parallel on a Planar Patch Clamp Module. J. Lab. Autom. 2016, 21, 779–793.
  37. Asmild, M.; Oswald, N.; Krzywkowski, K.M.; Friis, S.; Jacobsen, R.B.; Reuter, D.; Taboryski, R.; Kutchinsky, J.; Vestergaard, R.K.; Schrøder, R.L.; et al. Upscaling and Automation of Electrophysiology: Toward High Throughput Screening in Ion Channel Drug Discovery. Recept. Channels 2003, 9, 49–58.
  38. Stoelzle, S.; Obergrussberger, A.; Brüggemann, A.; Haarmann, C.; George, M.; Kettenhofen, R.; Fertig, N. State-of-the-Art Automated Patch Clamp Devices: Heat Activation, Action Potentials, and High Throughput in Ion Channel Screening. Front. Pharm. 2011, 2, 76.
  39. Titus, S.A.; Beacham, D.; Shahane, S.A.; Southall, N.; Xia, M.; Huang, R.; Hooten, E.; Zhao, Y.; Shou, L.; Austin, C.P.; et al. A New Homogeneous High-Throughput Screening Assay for Profiling Compound Activity on the Human Ether-a-Go-Go-Related Gene Channel. Anal. Biochem. 2009, 394, 30–38.
  40. Couso, I.; Sánchez, L. Harnessing the Information Contained in Low-Quality Data Sources. Int. J. Approx. Reason. 2014, 55, 1485–1486.
  41. Melnikov, F.; Hsieh, J.-H.; Sipes, N.S.; Anastas, P.T. Channel Interactions and Robust Inference for Ratiometric β-Lactamase Assay Data: A Tox21 Library Analysis. ACS Sustain. Chem. Eng. 2018, 6, 3233–3241.
  42. Hüllermeier, E. Learning from Imprecise and Fuzzy Observations: Data Disambiguation through Generalized Loss Minimization. Int. J. Approx. Reason. 2014, 55, 1519–1534.
  43. Bajorath, J.; Chávez-Hernández, A.L.; Duran-Frigola, M.; Fernández-de Gortari, E.; Gasteiger, J.; López-López, E.; Maggiora, G.M.; Medina-Franco, J.L.; Méndez-Lucio, O.; Mestres, J.; et al. Chemoinformatics and Artificial Intelligence Colloquium: Progress and Challenges in Developing Bioactive Compounds. J. Cheminformatics 2022, 14, 82.
  44. López-López, E.; Fernández-de Gortari, E.; Medina-Franco, J.L. Yes SIR! On the Structure–Inactivity Relationships in Drug Discovery. Drug Discov. Today 2022, 27, 2353–2362.
  45. Rodríguez-Pérez, R.; Vogt, M.; Bajorath, J. Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds. J. Chem. Inf. Model. 2017, 57, 710–716.
  46. Radchenko, E.V.; Rulev, Y.A.; Safanyaev, A.Y.; Palyulin, V.A.; Zefirov, N.S. Computer-Aided Estimation of the HERG-Mediated Cardiotoxicity Risk of Potential Drug Components. Dokl. Biochem. Biophys. 2017, 473, 128–131.
  47. Sun, H. An Accurate and Interpretable Bayesian Classification Model for Prediction of HERG Liability. ChemMedChem 2006, 1, 315–322.
  48. Ford, K.A. Refinement, Reduction, and Replacement of Animal Toxicity Tests by Computational Methods. ILAR J. 2016, 57, 226–233.
  49. Yang, T.; Snyders, D.; Roden, D.M. Drug Block of IKr: Model Systems and Relevance to Human Arrhythmias. J. Cardiovasc. Pharmacol. 2001, 38, 737–744.
  50. Park, J.-S.; Jeon, J.-Y.; Yang, J.-H.; Kim, M.-G. Introduction to in Silico Model for Proarrhythmic Risk Assessment under the CiPA Initiative. Transl. Clin. Pharm. 2019, 27, 12–18.
  51. Cruciani, G.; Milletti, F.; Storchi, L.; Sforna, G.; Goracci, L. In Silico PKa Prediction and ADME Profiling. Chem. Biodivers. 2009, 6, 1812–1821.
  52. Gobbi, A.; Lee, M.-L. Handling of Tautomerism and Stereochemistry in Compound Registration. J. Chem. Inf. Model. 2012, 52, 285–292.
  53. Heller, S.; McNaught, A.; Stein, S.; Tchekhovskoi, D.; Pletnev, I. InChI - The Worldwide Chemical Structure Identifier Standard. J. Cheminformatics 2013, 5, 7.
  54. Landrum, G. RDKit: Open-Source Cheminformatics Software. Available online: https://www.rdkit.org/2021 (accessed on 2 August 2022).
  55. Manchester, J.; Walkup, G.; Rivin, O.; You, Z. Evaluation of PKa Estimation Methods on 211 Druglike Compounds. J. Chem. Inf. Model. 2010, 50, 565–571.
  56. Milletti, F.; Vulpetti, A. Tautomer Preference in PDB Complexes and Its Impact on Structure-Based Drug Discovery. J. Chem. Inf. Model. 2010, 50, 1062–1074.
  57. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  58. Filer, D.L.; Kothiya, P.; Setzer, R.W.; Judson, R.S.; Martin, M.T. Tcpl: The ToxCast Pipeline for High-Throughput Screening Data. Bioinformatics 2017, 33, 618–620.
  59. Du, Y.; Days, E.; Romaine, I.; Abney, K.K.; Kaufmann, K.; Sulikowski, G.; Stauffer, S.; Lindsley, C.W.; Weaver, C.D. Development and Validation of a Thallium Flux-Based Functional Assay for the Sodium Channel NaV1.7 and Its Utility for Lead Discovery and Compound Profiling. ACS Chem. Neurosci. 2015, 6, 871–878.
  60. Weaver, C.D.; Harden, D.; Dworetzky, S.I.; Robertson, B.; Knox, R.J. A Thallium-Sensitive, Fluorescence-Based Assay for Detecting and Characterizing Potassium Channel Modulators in Mammalian Cells. J. Biomol. Screen 2004, 9, 671–677.
  61. Huang, R.; Xia, M.; Cho, M.-H.; Sakamuru, S.; Shinn, P.; Houck, K.A.; Dix, D.J.; Judson, R.S.; Witt, K.L.; Kavlock, R.J.; et al. Chemical Genomics Profiling of Environmental Chemical Modulation of Human Nuclear Receptors. Environ. Health Perspect. 2011, 119, 1142–1148.
  62. Doddareddy, M.R.; Klaasse, E.C.; Shagufta; IJzerman, A.P.; Bender, A. Prospective Validation of a Comprehensive In Silico HERG Model and Its Applications to Commercial Compound and Drug Databases. ChemMedChem 2010, 5, 716–729.
  63. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
  64. Ji, X.; Tong, W.; Liu, Z.; Shi, T. Five-Feature Model for Developing the Classifier for Synergistic vs. Antagonistic Drug Combinations Built by XGBoost. Front. Genet. 2019, 10, 600.
  65. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint 2020, arXiv:1802.03426.
  66. Konopka, T. umap: Uniform Manifold Approximation and Projection. R Package Version 0.2.9.0. 2022. Available online: https://CRAN.R-Project.Org/Package=umap (accessed on 2 August 2022).
  67. Maaten, L.V.D.; Hinton, G.E. Visualizing Data Using T-SNE. J. Mach. Learn. Res. 2008, 9, 11.
  68. Krijthe, J. Rtsne: T-Distributed Stochastic Neighbor Embedding Using a Barnes-Hut Implementation. 2015. Available online: https://github.com/jkrijthe/rtsne (accessed on 2 August 2022).
  69. Batista, J.; Vikić-Topić, D.; Lučić, B. The Difference Between the Accuracy of Real and the Corresponding Random Model Is a Useful Parameter for Validation of Two-State Classification Model Quality. Croat. Chem. Acta 2016, 89, 527–534.
Figure 1. Correlation between IC50 estimates derived from full dose–response curves and two-point subsets of the same dose–response curves for in-house patch clamp data (A) and PubChem thallium flux assay data (B).
Figure 2. The relationship between predicted and measured pIC50 hERG inhibition estimates. The predictions are from the ECD model (A) and LCD model (B). The dashed line shows one-to-one correlation and the dotted lines mark 0.3 pIC50 errors around one-to-one correlation.
Figure 3. Sample plot for IC50 derivation from six-point and two-point titration series.
Table 1. Model training set summary.

| Model | Type | N | Data Included |
|---|---|---|---|
| Expanded continuous data (ECD) | Continuous | 4081 | All pIC50s from all experiments, including pIC50s extrapolated outside of the tested concentration ranges |
| Limited continuous data (LCD) | Continuous | 1686 | pIC50s from traditional dose–response experiments, calculated without extrapolation |
| All binary data (ABD) | Categorical | 3903 | All compounds with consistent classification at the 10 μM threshold; prevalence = 40.9% |
| High confidence binary data (HCBD) | Categorical | 2812 | All compounds with % inhibition < 30% or > 70% at 10 μM; prevalence = 35.9% |

N: number of compounds in the model’s training set. Prevalence values are reported for categorical data sets around the 10 μM threshold.
Table 2. Model performance statistics.

| Model | Ac.Thr. (μM) | Test Set | Sens | Spec | PPV | NPV | Prev | BA | Q2 | Q2,rnd | ΔQ2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ECD | 10 | All | 0.69 | 0.90 | 0.90 | 0.67 | 0.58 | 0.79 | 0.77 | 0.49 | 0.28 |
| LCD | 10 | All | 0.96 | 0.29 | 0.65 | 0.82 | 0.58 | 0.62 | 0.68 | 0.56 | 0.12 |
| ABD | 10 | All | 0.58 | 0.83 | 0.83 | 0.59 | 0.58 | 0.71 | 0.69 | 0.48 | 0.21 |
| HCBD | 10 | All | 0.54 | 0.90 | 0.88 | 0.58 | 0.58 | 0.72 | 0.69 | 0.48 | 0.21 |
| ECD | 10 | HC | 0.74 | 0.95 | 0.95 | 0.74 | 0.56 | 0.84 | 0.83 | 0.49 | 0.34 |
| LCD | 10 | HC | 0.96 | 0.32 | 0.65 | 0.87 | 0.56 | 0.64 | 0.68 | 0.54 | 0.14 |
| ABD | 10 | HC | 0.64 | 0.85 | 0.85 | 0.65 | 0.56 | 0.75 | 0.73 | 0.49 | 0.24 |
| HCBD | 10 | HC | 0.58 | 0.93 | 0.91 | 0.63 | 0.56 | 0.76 | 0.73 | 0.48 | 0.25 |
| ECD | 1 | All | 0.08 | 0.98 | 0.33 | 0.90 | 0.10 | 0.53 | 0.89 | 0.88 | 0.01 |
| ECD | 3 | All | 0.40 | 0.98 | 0.88 | 0.79 | 0.30 | 0.69 | 0.80 | 0.64 | 0.16 |
| ECD | 5 | All | 0.51 | 0.88 | 0.75 | 0.72 | 0.41 | 0.70 | 0.73 | 0.54 | 0.19 |
| ECD | 30 | All | 0.88 | 0.62 | 0.87 | 0.64 | 0.75 | 0.75 | 0.82 | 0.63 | 0.19 |

Model: model as described in Table 1. Ac.Thr.: activity threshold for the model assessment (μM). HC: test set with observations near the 10 μM threshold removed; contains data with pIC50 < 4.8 or pIC50 > 5.2. Sens: sensitivity. Spec: specificity. PPV: positive predictive value. NPV: negative predictive value. Prev: prevalence of active compounds (positive classification) at the activity threshold. BA: balanced accuracy. Q2: accuracy. Q2,rnd: random accuracy = [(TP + FN)(TP + FP) + (TN + FN)(TN + FP)]/N². ΔQ2 = Q2 − Q2,rnd.
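Evaluating a continuous model at several activity thresholds, as in the 1, 3, 5, 10, and 30 μM rows of Table 2, only requires re-discretizing the predicted pIC50. The Python sketch below is illustrative. It assumes that a compound counts as active when its predicted IC50 is at or below the threshold, and the predicted value is hypothetical.

```python
import math

def threshold_to_pic50(threshold_um: float) -> float:
    """Convert an IC50 threshold in uM to the pIC50 scale."""
    return 6.0 - math.log10(threshold_um)

def is_predicted_active(pred_pic50: float, threshold_um: float) -> bool:
    """Assumed convention: active when predicted IC50 <= threshold."""
    return pred_pic50 >= threshold_to_pic50(threshold_um)

# One hypothetical prediction classified at each threshold, with no retraining.
pred = 5.3
for thr in (1.0, 3.0, 5.0, 10.0, 30.0):
    print(f"{thr:>4} uM: active={is_predicted_active(pred, thr)}")
```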
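Table 3. Results for the internal cross-validation.

| Model | Metric (1) | Iteration | Value |
|---|---|---|---|
| ECD | MAE | 1 | 0.334 |
| ECD | MAE | 2 | 0.347 |
| ECD | MAE | 3 | 0.353 |
| ECD | MAE | 4 | 0.338 |
| ECD | MAE | 5 | 0.341 |
| ECD | MAE | avg | 0.343 |
| LCD | MAE | 1 | 0.253 |
| LCD | MAE | 2 | 0.243 |
| LCD | MAE | 3 | 0.253 |
| LCD | MAE | 4 | 0.259 |
| LCD | MAE | 5 | 0.257 |
| LCD | MAE | avg | 0.253 |
| ABD | BA | 1 | 0.814 |
| ABD | BA | 2 | 0.824 |
| ABD | BA | 3 | 0.838 |
| ABD | BA | 4 | 0.846 |
| ABD | BA | 5 | 0.835 |
| ABD | BA | avg | 0.831 |
| HCBD | BA | 1 | 0.882 |
| HCBD | BA | 2 | 0.915 |
| HCBD | BA | 3 | 0.881 |
| HCBD | BA | 4 | 0.911 |
| HCBD | BA | 5 | 0.899 |
| HCBD | BA | avg | 0.898 |

(1) Measure used to assess model performance in five-fold CV. MAE: mean absolute error. BA: balanced accuracy.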
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Melnikov, F.; Anger, L.T.; Hasselgren, C. Toward Quantitative Models in Safety Assessment: A Case Study to Show Impact of Dose–Response Inference on hERG Inhibition Models. Int. J. Mol. Sci. 2023, 24, 635. https://doi.org/10.3390/ijms24010635
