3.1. Advantage of QEPPI
Theoretically, the peak value of each physicochemical property represents the ideal value characteristic of that dataset, because the frequency of compounds with that property value is highest there. Therefore, these peak values are expected to reflect the nature of the target proteins. Furthermore, because QED is modeled using FDA-approved oral drugs, it is expected to reflect absorption, distribution, metabolism, excretion, and toxicity. In contrast, the dataset used for QEPPI involves many PPI-targeting compounds and does not involve any such optimization. Hence, the peak values for all physicochemical properties were higher for QEPPI than for QED.
The advantage of QEPPI is that it allows model building using only target data. It does not require appropriate negative samples. The performance of machine learning classifiers is poor in problem settings where positive and negative samples are imbalanced [25]. Therefore, QEPPI may be more effective than machine learning models under conditions in which appropriate negative samples are difficult to obtain from public databases.
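As a rough illustration of how a QED-style index can be built from positive data alone, the sketch below combines per-property desirabilities into a geometric mean. The bell-shaped `desirability` function, the `PARAMS` peaks and widths, and all function names are placeholders assumed for illustration; they are not the fitted functions or parameters of QEPPI.

```python
import math

def desirability(value, peak, width):
    """Hypothetical bell-shaped desirability in (0, 1], maximal at `peak`.

    This is a stand-in for a fitted desirability function, not the published form.
    """
    return math.exp(-0.5 * ((value - peak) / width) ** 2)

def quantitative_estimate(descriptors, params):
    """Unweighted geometric mean of the per-property desirabilities."""
    log_sum = 0.0
    for name, value in descriptors.items():
        peak, width = params[name]
        # Floor the desirability so that log() never receives exactly zero.
        log_sum += math.log(max(desirability(value, peak, width), 1e-300))
    return math.exp(log_sum / len(descriptors))

# Placeholder peaks and widths for illustration only (not fitted values).
PARAMS = {"MW": (500.0, 150.0), "ALogP": (4.5, 1.5), "HBA": (5.0, 2.0)}

at_peak = quantitative_estimate({"MW": 500.0, "ALogP": 4.5, "HBA": 5.0}, PARAMS)
off_peak = quantitative_estimate({"MW": 250.0, "ALogP": 1.0, "HBA": 2.0}, PARAMS)
```

Under these assumptions, a compound sitting at every peak scores highest and the score decays smoothly as properties move away from the peaks, without any negative samples entering the model.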
RO4 is rule-based; therefore, it is nearly impossible to adjust certain threshold values. However, the threshold values of QEPPI developed in this study can be adjusted such that the desired sensitivity and specificity are achieved.
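One way to picture this adjustability is to scan candidate cutoffs on a continuous score and keep the strictest one that still meets a target sensitivity. The following is a minimal sketch; the scores and labels are toy values, not data from this study, and the function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Compute sensitivity and specificity when score >= threshold predicts positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def highest_threshold_for_sensitivity(scores, labels, target):
    """Return the largest observed score whose use as a cutoff keeps sensitivity >= target."""
    for t in sorted(set(scores), reverse=True):
        sens, _ = sensitivity_specificity(scores, labels, t)
        if sens >= target:
            return t
    raise ValueError("target sensitivity cannot be reached")

# Toy scores and labels (1 = PPI-targeting, 0 = other); illustrative only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

cutoff = highest_threshold_for_sensitivity(toy_scores, toy_labels, target=0.75)
sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff)
```

A fixed rule such as RO4 corresponds to a single operating point, whereas a continuous score lets the cutoff move along the whole sensitivity-specificity trade-off.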
QEPPI indices are primarily intended to be used in the early stage of PPI drug discovery, which is the seed compound discovery stage. Hence, better discrimination performance is desirable.
Figure 4 shows that QEPPI has higher Precision at the same Recall and higher Recall at the same Precision than those for RO4 with one, two, and four violations of RO4. The Precision-Recall AUC values for QEPPI, QED (a measure of oral drug-like properties), and QED_inv (a measure of
) were 0.425, 0.134, and 0.242, respectively, indicating that among these measures, QEPPI most accurately identified PPI-targeting compounds. The rules of RO4 are based on only 39 PPI inhibitors and, as with RO5, the strict cutoff for each physicochemical property is controversial. For example, a molecular weight of 401 is a pass, whereas 399 is a violation. In fact,
Table 1 shows that the peak value for MW is approximately 500 and the peak values for ALogP and HBA are slightly higher than 4. This means that many compounds violate the RO4 criteria. Table 4 shows, for the 1007 PPI-targeting compounds used in the QEPPI model, whether each physicochemical property used in RO4 violates its criterion. The violation percentages of MW, ALogP, and HBA were 24.1%, 37.5%, and 34.5%, respectively. For RING, which is a physicochemical property used only for RO4, more than 50% of the compounds violated the criterion.
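The hard-cutoff behavior discussed above can be sketched as a simple violation count. The thresholds below assume the commonly cited form of RO4 (MW > 400, ALogP > 4, number of rings > 4, HBA > 4), and the descriptor dictionaries are hypothetical inputs; in practice the descriptors would be computed with a cheminformatics toolkit.

```python
# Assumed RO4 cutoffs: a property passes only if it is strictly greater than its cutoff.
RO4_THRESHOLDS = {"MW": 400.0, "ALogP": 4.0, "RING": 4, "HBA": 4}

def ro4_violations(descriptors):
    """Return the names of the RO4 properties that fail to exceed their cutoff."""
    return [
        name
        for name, cutoff in RO4_THRESHOLDS.items()
        if not descriptors[name] > cutoff
    ]

# Two hypothetical compounds that differ only slightly in molecular weight.
passes_mw = ro4_violations({"MW": 401.0, "ALogP": 4.5, "RING": 5, "HBA": 5})
fails_mw = ro4_violations({"MW": 399.0, "ALogP": 4.5, "RING": 5, "HBA": 5})
fails_two = ro4_violations({"MW": 399.0, "ALogP": 3.0, "RING": 5, "HBA": 5})
```

The two-unit difference in molecular weight flips the outcome from no violation to one violation, which is the kind of discontinuity that a continuous index avoids.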
The results above suggest that QEPPI is more useful and suitable for PPI-targeting compounds than the conventional drug discovery indices QED and RO4. Hence, our proposal is a useful index for evaluating PPI-targeting compounds in the early stage of PPI drug discovery.
Additionally, in further studies, QEPPI can be used as a reward in sequence-based molecular generation models using reinforcement learning, such as REINVENT [18,19], and as a condition for sequence-based molecular generation models using conditional Wasserstein generative adversarial networks (WGANs) and Variational Autoencoders (VAEs), such as gcWGAN [26] and CVAE [27], which will enable the design of molecules with properties typical of PPI-targeting compounds.
3.2. Application of QEPPI to PPI-Targeting Compounds That Are Approved or in Clinical Trials, and Other Small Compounds
In 2020, Shin et al. reported a review of PPI-targeting drug designs. We applied QEPPI to one dataset in this review [15]. The dataset is described as the non-PPI dataset in the review (Soga dataset) [28]. In 2021, Truong et al. explored which physicochemical parameters are necessary for a PPI modulator to become a clinical drug by analyzing the physicochemical properties of small-molecule PPI modulators that are either on the market, in clinical trials, or have been published. They found that PPI modulators currently on the market have a wide range of values for most physicochemical parameters, whereas PPI modulators in clinical trials conform much more closely to standard drug-like parameters, and therefore, a new PPI-specific screening library could be designed. This suggests that when designing new PPI-specific screening libraries, it is necessary to remain within parameters similar to those of standard drugs to obtain clinical candidates [21]. As suggested by the authors, PPI modulators undergoing clinical trials tend to have physicochemical properties more similar to those of standard drugs than those of PPI modulators currently on the market.
We also applied QEPPI to the above datasets. The distribution of the QEPPI is shown in Figure 5. Our application of QEPPI to the 30 clinical candidates used by Truong et al. showed a median value of approximately 0.59, which is higher than that of commercially available PPI modulators (Figure 5). Although the physicochemical properties of the PPI-targeting compounds registered in iPPI-DB and FDA-approved drugs are different, as shown in Figure 1 and Table 1, the QEPPI modeled from iPPI-DB shows potential to be adapted to more recent PPI modulators.
We also examined when the PPI-targeting compounds included in the Truong approved data were marketed and when the PPI-targeting compounds included in the Truong clinical data entered clinical trials.
Figure 6 shows the QEPPI of PPI-targeting marketed drugs and compounds in clinical trials within the last 30 years (see Supplementary Table S4 for details).
Figure 6a shows the PPI-targeting drugs on the market, year the drug was first marketed (as identified in DrugBank), QEPPI value, and target PPI for each drug. PPI-targeting drugs launched in the 1990s showed lower QEPPI scores, whereas drugs marketed more recently tended to have higher QEPPI scores.
Figure 6b shows the PPI-targeting compounds in clinical trials, year of the first clinical trial (identified in ClinicalTrials.gov, accessed on 15 September 2021; the EU Clinical Trials Register; or the NIPH Clinical Trial Search in Japan), QEPPI value, and target PPI for each compound. Regardless of the year, the QEPPI scores remained high. This is consistent with the fact that the QEPPI scores of the marketed drugs in Figure 6a exhibited a recent trend toward higher values.
In this study, some PPI-targeting compounds with low QEPPI scores had molecular weights smaller than the peak value. A previous study showed that the size and complexity of the binding interface of PPIs vary depending on the target. If the interface is relatively small and less complex, some PPI-targeting compounds with relatively small molecular weights can sufficiently block the binding interface. When the binding interface is more complex, it tends to be wide, and only a PPI-targeting compound with a large molecular weight can sufficiently block it [29].
Figure 7 shows that the distribution of QEPPI differs among PPI families in the iPPI-DB dataset. The QEPPI scores of compounds targeting Bromodomain/Histone [29], XIAP/Smac [29], LFA/ICAM [29], and CD4/gp120 [30], which have primary epitopes (such as a linear peptide), tend to be higher than those of compounds targeting Bcl2/Bax [29], p53/MDM2 [29], and CD80/CD28 [31], which have secondary epitopes (such as a helix structure). As the LEDGF/IN interface area (400 Å²) and the transthyretin (TTR) dimer-dimer interface area are much smaller than the interface areas of other PPIs [15,29,32], the QEPPI scores of compounds targeting these PPIs tended to be low. Thus, the difference in the complexity of the PPI interface may affect the physicochemical properties of PPI-targeting compounds such as molecular weight; furthermore, the complexity of the PPI interface is related to the QEPPI score.
Therefore, evaluating average PPI-targeting compounds using the iPPI-DB as a dataset of various types of PPI-targeting compounds would be advantageous. Our further studies will focus on designing indices that are more specific to PPI-targeting compounds, such as the size of the binding interface or PPI family. This is similar to the proposal of QEX. The approach will become feasible as more data are deposited in the database.
Various types of PPI-targeting compounds have been reviewed, and QEPPI scores were calculated for several types of PPI-targeting compounds mentioned in the review published by Mabonga et al. [33]. For example, nine compounds that target chemokine receptors were included in the Truong approved and Truong clinical datasets used in this study, with an average QEPPI score of 0.641 (see Supplementary Table S6). In addition, compounds targeting MDM2/p53, a cancer-related PPI, were also included in the Truong clinical dataset, with an average QEPPI score of 0.593 (see Supplementary Table S6). The average QEPPI scores of 79-6 (PubChem CID5721353) targeting BCL6/SMRT and FMP-API-1 targeting AKAP18/PKA were 0.468 and 0.410, respectively (see Supplementary Table S7). In addition, LFA/ICAM, a PPI related to T-cell activation, was included in the data downloaded from iPPI-DB, and the average QEPPI score was 0.706 (see Supplementary Table S5). These results suggest that QEPPI is effective for PPI modulators that have been developed to date. However, approved drugs targeting FKBP12 (e.g., pimecrolimus, tacrolimus, everolimus, rapamycin, temsirolimus) and approved drugs targeting microtubules with a molecular weight greater than 800 (cabazitaxel, docetaxel, eribulin mesylate, paclitaxel, vinblastine) have lower QEPPI scores (see Supplementary Table S6). This is because the iPPI-DB does not include macrocyclic compounds that target FKBP12, and only approximately 4% of the compounds in iPPI-DB have molecular weights exceeding 800; because of the nature of the method, compounds that deviate from the average receive low QEPPI scores. Further studies are needed to expand the compound space that is not covered by iPPI-DB.
To date, COVID-19 has claimed the lives of more than 4.7 million people and infected another 230 million, making it a global pandemic. In response to this critical situation, the development of drugs targeting the etiological agent, SARS-CoV-2, is ongoing. Some researchers are focusing on compounds that target PPIs. One of the most promising PPI targets is the interaction between the SARS-CoV-2 S protein and human angiotensin-converting enzyme 2 receptor [34,35,36]. Therefore, we also calculated QEPPI for small molecules targeting PPIs against SARS-CoV/SARS-CoV-2 (see Supplementary Table S8). The median QEPPI score for these compounds was 0.511. Although this is only an example, QEPPI may be effective for host-pathogen PPIs and other PPIs.