1. Introduction
Herpes virus infections induced by viruses of the Herpesviridae family are among the most widespread human diseases. Antibodies to various herpes viruses are identified in about 95% of the world’s population. Eight types of Herpesviridae viruses cause herpesvirus infections [
1,
2,
3]. Herpes simplex viruses of the first and second type (HSV-1 and HSV-2) are the most common [
4]. HSV-1 usually affects the upper body (mouth, eyes, and brain), whereas HSV-2 relates to genital infections [
5]. These viruses are usually latent. However, when immunity is reduced, they are activated, which, in turn, provokes diseases, such as oral herpes, genital herpes, keratitis, conjunctivitis, herpes zoster, etc. It is reported that herpes virus infections induced by HSV-1 and HSV-2 (and other types) increase the possibility to be infected with the human immunodeficiency virus (HIV) and are almost always diagnosed in patients with an HIV infection, which complicates the course of this disease. There is evidence that HSV-1 can participate in the development of multiple sclerosis [
6] and lead to male infertility [
7].
Currently, there are three classes of drugs in active medical practice for the treatment of infectious diseases caused by different types of herpes viruses, including HSV-1 and HSV-2: (1) acyclic guanosine analogues; (2) acyclic nucleotide analogues; and (3) pyrophosphate analogues (foscarnet) [
8].
Acyclovir is known to be an effective inhibitor of viral thymidine kinase (TK). This drug is the gold standard for the prevention and treatment of infections caused by HSV-1 and HSV-2, as this drug combines a pronounced clinical effect and low toxicity [
9]. However, a significant disadvantage of acyclovir is its low oral bioavailability, poor solubility, and short blood circulation time. An increase in the therapeutic dose of this drug is undesirable with long-term use, as it leads to an increase in its toxicity. Another disadvantage of acyclovir is related to drug resistance development. This problem is not significant for patients with good immunity, since the incidence of acyclovir-resistant herpes simplex virus strains among them is ~0.5% of cases. However, among patients with immunodeficiency conditions, it exceeds 30% of cases. In 95% of cases, acyclovir resistance is due to mutations in the thymidine kinase and DNA polymerase genes, which are related to the mechanism of action of this drug [
10,
11,
12,
13,
14,
15]. In addition to acyclovir, the phenomenon of drug resistance was observed for pentacyclovir and its analogues used as HSV-1- and HSV-2-replication inhibitors. The analogues of adenine, adefovir, and tenofovir have found the greatest application in clinical practice among the phosphonate derivatives of guanosine. These drugs are included in therapy to suppress HSV-1 strains resistant to acyclovir (and its analogues) and some similar cases with the deficiency of viral thymidine kinase. The same drugs are used in the treatment of hepatitis B and HIV. However, these drugs have a pronounced nephro- and hepatotoxic effect. Foscarnet, a covalent inhibitor of DNA polymerase, has not been widely used in clinical practice due to the rather high toxicity and the lack of selectivity. The above-mentioned issues demonstrate the necessity for the search for new anti-herpetic drugs [
8].
Additional promising strategies against the HSV-induced herpes infections deal with the development of inhibitors of other enzymes (helicase-primase or ribonucleotide reductase) and inhibitors of the adhesion/penetration of the virus into the cell. Currently, inhibitors of the mentioned enzymes are at various stages of preclinical and clinical trials and out of medical practice.
Thymidine kinase inhibitors occupy a special place in the development of new-generation antiviral drugs. It should be noted that this enzyme plays a key role in the thymidine metabolism both in healthy and virus-infected cells. In healthy cells, this intracellular enzyme catalyzes the conversion of thymidine to thymidine monophosphate (TMP) in the presence of adenosine triphosphate (ATP). Viral thymidine kinase differs from the thymidine kinase of the host cell in its much greater substrate specificity and it is able to catalyze the phosphorylation of thymidine, pyrimidines, and purines. Subsequently, in both healthy and virus-infected cells, the resulting monophosphates of pyrimidines and purines are converted into the corresponding bi- and triphosphates. Triphosphates are then incorporated into deoxyribonucleic acid. Thus, viral thymidine kinase inhibitors cannot be simultaneously used with preparations containing acyclovir and its analogues as an active component [
5,
8].
DNA polymerase inhibitors prevent the replication of the virus after reactivation. In contrast, thymidine kinase inhibitors are aimed at preventing reactivation by lengthening the latent period of the virus. However, when developing antiviral drugs based on thymidine kinase, one should bear in mind that viral thymidine kinase is not a key target in the replication of herpes virus in rapidly dividing cells where the amount of thymidine triphosphate is sufficient for the synthesis of viral DNA due to cellular metabolism. However, this enzyme plays a key role in non-proliferating (non-dividing) nerve cells, in which the synthesis of cellular DNA occurs at a low level (if at all). In this case, the inhibition of viral thymidine kinase leads to the growth of damaged and, therefore, non-viable cells in primary neuronal cultures.
In addition to a potential target in the fight against viral infections, thymidine kinase (TK) is considered a tumor marker, which is used to diagnose and monitor the increased proliferation of tumor cells. It is known that tumor cells have an increased concentration of TK due to their high-intensity division and growth.
Thus, the search for efficient TK inhibitors, including TK of the human herpes viruses HSV-1 and HSV-2, can be considered one of the promising medical treatment options for herpetic infections and cancer diseases of various origins [
16,
17]. However, the rational search for new drugs without involving virtual screening methods is impractical both from an economic point of view and because of the high time costs [
18,
19,
20,
21,
22].
In this regard, scientists use various approaches for computer-aided drug design (CADD) to search for hit compounds at the initial stage of development of new potential drugs [
23]. There are known drugs that have been developed using this approach, such as tirofiban [
24], zanamivir [
25], boceprevir [
26], saquinavir [
27], captopril [
28,
29], and aliskiren [
30]. CADD approaches are classified into structure-based and ligand-based methods. In the first category of methods, computational drug design is carried out by studying the interactions between ligands and target molecules. Accordingly, the aim of this variant of CADD is to optimize the binding structure of the ligand under study and the corresponding receptor in a three-dimensional form. Virtual protein–ligand complexes are modeled using pharmacophore search, molecular docking, and molecular dynamics methods. The most obvious disadvantage of structure-based approaches in CADD is the requirement of correct information on the receptor structure and the high time and computational costs [
31].
In the alternative category of ligand-based CADD approaches, the leading factors contributing to biological activity are the physicochemical, electronic, and conformational features of the ligands. The key advantage of the latter strategy over the former one is mainly that in the latter case, knowledge of the spatial structure and amino acid composition of the target is not required for the design of potential drugs [
23,
31].
Quantitative structure–activity relationship (QSAR) is a valuable method in CADD which aims to build statistically significant mathematical models for predicting different biological activity parameters (pIC
50, pLD
50, pK
i, etc.) based on different physicochemical, electronic, and structural characteristics of organic compounds [
32,
33]. In terms of dimensionality, the type of QSAR models depends on the descriptors used, ranging from 0D-QSAR to 7D-QSAR [
34]. Several descriptors (e.g., atomic properties, number of fragments, and topological descriptors) make up the 0D to 2D-QSAR components. Modeling using 3D-QSAR methods requires the inclusion of 3D descriptors giving an additional dimension in spatial coordinates [
35,
36]. Additional aspects of 3D-QSAR models require the use of multidimensional molecular descriptors based on conformational flexibility, induced fit, solvation function, and target-based receptor models. These supplements generate multidimensional QSAR (i.e., 4D to 7D-QSARs) [
32]. A factor complicating the practical use of 3D-7D QSAR methods is the required knowledge of the bioactive conformation of the ligands that are structural analogues of the compounds being modeled [
37,
38,
39]. Taking into account of all of the above factors in terms of time and computational cost can, in some cases, be far superior to the first category of structure-based CADD methods. In this regard, today there is a growing interest in the use of 2D-QSAR models against the background of a relatively smaller number of studies using multivariate QSAR approaches despite the high predictive power, logical validity, and objectivity of the latter.
The GUSAR program, developed at the V.N. Orekhovich Institute of Biomedical Chemistry of the Russian Academy of Medical Sciences, is modern software for the construction of quantitative and classification models (QSAR and SAR models) and the prediction of various types of biological activity, as well as other properties of organic compounds based on a 2D approach (structural formulae of organic compounds) [
20]. In this software, the chemical structure is described in terms of descriptors called quantitative neighborhoods of atoms (QNAs) and multilevel neighborhoods of atoms (MNAs) developed at the same institute [
40,
41,
42]. The functioning algorithm of this software is based on the method of self-consistent regression previously developed by the same team with the inclusion of additional estimates of the quality of prediction of the target property (based on the method of nearest neighbors and artificial neural network with a radial basis function) and construction of a consensus of the set of models [
20]. It is reported that GUSAR is not inferior to other methods (CoMFA, CoMSIA, HQSAR, etc.) used to build QSAR/QSPR models in terms of accuracy and predictive ability [
43,
44]. As a result, the software can be successfully applied to a variety of QSAR/QSPR tasks [
45,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55,
56,
57,
58]. In particular, the GUSAR software has been used for more than a decade to model various types of biological activity and toxicity of organic compounds [
40,
41,
42,
43,
44,
45,
46,
47,
48,
49]. In addition, the successful application of this software has been demonstrated for the QSPR modeling of several physicochemical properties of organic compounds, including the n-octanol–water partition coefficient (logP) [
45], boiling and melting points, density, thermal conductivity, viscosity, surface tension, water solubility, and gas pressure [
40].
Additionally, our earlier publications demonstrated the successful application of this software for QSPR modeling of antioxidants under conditions of the liquid-phase radical-chain oxidation of organic substrates [
59,
60,
61,
62,
63].
This software has been used for more than a dozen years for modeling different types of biological activity. It was shown by the developers and other researchers, including our research team, that GUSAR software can be successfully applied to multiple QSAR/QSPR problems [
50,
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61,
62,
63].
In this work, we used GUSAR 2019 software to study the quantitative structure–activity relationship for inhibitors of HSV-1 and HSV-2 viral thymidine kinase using the series of 5′-amino-2′,5′-dideoxy-5-ethyluridine (I–III), N2-phenylguanine (IV), and 2-phenylamino-6-oxopurine carboxamide derivatives (V–VI,
Figure 1) and developed coupled and statistically significant QSAR models for screening virtual libraries and databases.
2. Results
Using the consensus approach implemented in the GUSAR 2019 program, we have studied the quantitative relationship between the structure and the efficiency of inhibition of HSV-1 and HSV-2 TK with 5-ethyluridine, N2-guanine, and 6-oxopurine derivatives with general structural formulas I–VI (
Figure 1). These compounds made up the training sets TrS1–TrS4. Depending on the type of descriptors used in the calculations (MNA and/or QNA), three QSAR consensus models have been obtained for each of the training sets. In total, we have built 12 QSAR consensus models for predicting pIC
50 values for HSV-1 and HSV-2 TK inhibitors that included from 20 to 360 partial regression models. The pIC
50 values for inhibitors included in TrS1–TrS4 derived from these QSAR consensus models M1–M12 were compared with the experimental values of pIC
50 (see
Tables S2–S5 in Supplementary Materials).
The regression models were not explicitly displayed, as a clear physical interpretation of the descriptors was absent. Hence, we could not determine the descriptors making the largest/the smallest contributions to the simulated activity [
64,
65]. However, this was beyond the scope of this study. Our goal was to solve two problems:
- (1)
to show that the ideology of descriptor formation and selection implemented in the GUSAR 2019 software is applicable for modeling potential inhibitors of HSV-1 and HSV-2 TK enzymes in the series of 5-ethyluridine, N2-guanine, and 6-oxopurine derivatives;
- (2)
to develop statistically significant QSAR models suitable for the virtual screening of HSV TK inhibitors.
For the internal validation of the QSAR models M1–M12 over the TrS1–TrS4 structures, we used a cross validation procedure with a 20-fold randomized exclusion of 20% of the compounds. Here, the averaged values of determination coefficients
and
for the inhibitors of all training sets were similar (
Table 1); the difference between these two indicators (Δ =
−
) did not exceed 0.1. This assessment indicates the stability of the constructed consensus models.
This is exemplified in
Table 1,
Table 2 and
Table 3, which present the numerical values of the statistical criteria estimated by comparing the experimental and predicted pIC
50 values calculated using models M1–M12 with 95% of the data included in the corresponding training set. Full information about all of these criteria using the twelve developed QSAR models, which enables an objective evaluation of the descriptive and predictive ability of the models, taking into account 95% and 100% of the data included in the training and test sets, respectively, is given in the
Supplementary Materials (Tables S2–S5).
The data of
Table 1,
Table 2 and
Table 3 provide the conclusion that all constructed QSAR models had high descriptive ability. However, the data presented in
Table 1,
Table 2 and
Table 3 clearly demonstrate the discrepancy between the numerical values of determination coefficients (R
2) found while evaluating the descriptive ability of models M1–M12 in the GUSAR 2019 and XternalValidationPlus 1.2 software, due to different ideologies underlying the calculations.
It should be taken into account that in the GUSAR 2019 software, the target parameter (in our case, pIC50) for each chemical structure included in the training or test set is predicted as a result of averaging the numerical values of this parameter calculated using each of the particular models included in a single consensus model. The final statistical parameters are calculated in a similar way.
For example, when predicting pIC50 values for any compound from the training set TrS1 using the consensus model M1, we get a set of 20 predicted pIC50 pred values and 20 sets of different internal validation criteria: R2, Q2, F, and SD. Further, all the same data are averaged, which is displayed as the final results.
Meanwhile, in the XternalValidationPlus 1.2 program, the calculation of statistical parameters for assessing the descriptive and predictive ability of QSAR models is based on comparing the experimental pIC
50 data with the average values previously predicted using the GUSAR 2019 software. This procedure is performed twice without averaging the final results [
66]:
- (1)
for the full dataset in each training and test set (100% of data);
- (2)
for 95% of the data in each training and test set (95% of the data).
In general, a comparison of the data given in
Table 1,
Table 2 and
Table 3 demonstrates that the SCR method of GUSAR 2019 for selecting significant descriptors produces stable regression dependences with acceptable statistical characteristics (R
2TrS > 0.6 and Q
2TrS > 0.5) for simulated HSV-1 and HSV-2 TK inhibitors, regardless of the selected types of descriptors.
The different determination criteria of the descriptive ability of models M1–M12 are similar irrespective of the amount of data in the sets (95 or 100%) and tend to be 1 (
Table 2 and
Table 3). The MAE error values do not exceed 15% of the ΔpIC
50 range of the inhibitory activity of the TrS1–TrS4 structures. The parameter ΔR
2m is in all cases is much lower than 0.2 and does not exceed 0.048. All of these data indicate the rather high simulability of the target properties using the selected algorithms for a calculation of descriptors and construction of regression equations [
67] implemented in the GUSAR 2019 software.
An external validation of the M1–M3 and M7–M9 QSAR models was performed by predicting the pIC50 for HSV-1 TK inhibitors using test sets TS1 and TS3. The validity of the models M4–M6 and M10–M12, meant for the prediction of the pIC5 for HSV-2 TK inhibitors, was evaluated in relation to test sets TS2 and TS4. All estimates of the predictive ability of the M1–M12 models were based on three criteria:
- (1)
numerical values of various coefficients of determination based on R2 (R2, R20, Q2F1, Q2F2, CCC);
- (2)
numerical values of the MAE prediction error;
- (3)
the scatter range of activity prediction data taking into account MAE in the mσ (or mSD) range: MAE + 3·SD. All of these parameters were computed using the XternalValidationPlus 1.2 program. In addition, this program was used to trace the systematic error that can arise in QSAR modeling.
Figure 2,
Figure 3,
Figure 4 and
Figure 5 show the distribution of different determination coefficients and prediction errors for pIC
50 values for 95% of the HSV inhibitors from test sets TS1–TS4 calculated using the XternalValidationPlus 1.2 program. The complete set of all statistical parameters obtained from a comparison of experimental and predicted pIC
50 values for the TS1–TS4 structures determined based on models M1–M12 is given in
Tables S2–S5 (Supplementary Materials).
The more stringent criterion
is relatively high for the external validation of models M1–M12 using the full size of TS1–TS4, being in the range of 0.8273–0.8859 and 0.7587–0.8683 for HSV-1 and HSV-2 inhibitors, respectively. After removing 5% of outliers from TS1–TS4, the ranges of
become 0.8207–0.9294 and 0.8664–0.9294 for HSV-1 and HSV-2 inhibitors, respectively (
Figure 2,
Figure 3,
Figure 4 and
Figure 5,
Tables S2–S5 in Supplementary Materials). The Δ
criterion, proposed by the same authors as an additional parameter for assessing the predictive ability for the external validation of regression models, did not exceed 0.09 in any of the cases. This also indicates the rather high predictive ability of QSAR models M1–M12 (
Figure 2,
Figure 3,
Figure 4 and
Figure 5,
Tables S6–S13 in Supplementary Materials).
Based on a comparison of different determination coefficients obtained during the external validation of models M1–M12, we have found that the parameter pIC50 for 5-ethyluridine, N2-guanine, and 6-oxopurine derivatives with respect to HSV-1 is modeled with higher accuracy than that for the same compounds against HSV-2.
As we noted above, an analysis of different types of determination coefficients is faced with the following contradictory situation: the R
2 and R
20 values for the activity of 5-ethyluridine, N2-guanine, and 6-oxopurine derivatives against HSV-1 are equal to or less than Q
2F1 and Q
2F2. This means that the constructed models M1–M12 predict the activities of TS1–TS4 compounds better than the activities of the training set structures. Note that in practice, the situation is usually opposite. This fact was repeatedly noted by other researchers [
68,
69,
70,
71]. Thus, the use of the metrics based on R
2 and Q
2 alone for assessing the predictive ability of QSAR models seems to be insufficient.
According to these two criteria, the predictive ability of models M1–M3, M7, and M9 has been classified as high for both test sets TS1 and TS3. Since the MAE + 3·SD criterion has been at the boundary of the allowable threshold value equal to 1.1735, the predictive ability of model M8 for 95% of the data of test set TS1 is moderate. The MAE and MAE + 3·SD values in the case of sets TS2 and TS4 do not exceed 0.6250 and 1.5625, respectively. As a result, the predictive ability of model M4 in the sets TS2 and TS4 has also been rated as high. At the same time, considering these two threshold values, the predictive abilities of M5–M12 can be estimated as satisfactory for set TS2 and as high for set TS4.
A comparative analysis of the statistical characteristics and prediction errors of pIC50 indicate that all constructed models have rather high descriptive and predictive ability. However, to solve the problem of searching for new potential inhibitors of HSV-1 and HSV-2 TK enzymes among the title compounds, it is most preferable to use the consensus models M3 and M6 because they include 100 particular regression models and each of them is based on the maximum set of structures and descriptors.
In this regard, we have applied the consensus models M3 and M6 to virtual screening through the CHEMBL database for new HSV TK inhibitors among various lead compounds and active drug components of different pharmacological profiles. Unlike traditional methods of QSAR modeling (multilinear regression (MLR), partial least squares (PLS) method, etc.), the GUSAR software does not specify clear threshold criteria regarding the Tanimoto coefficient, which would limit the search for new potential biologically active substances in virtual databases. However, adhering to concepts of the classical school, we limited the scope of the search for new potential inhibitors of HSV-1 TK and HSV-2 TK in the ChEMBL database by the degree of similarity of at least 70% with respect to the reference compounds.
The virtual screening involved 400 5-ethyluridine, N2-guanine, and 6-oxopurine derivatives with pronounced antitumor and antibacterial properties and no antiviral properties. However, only 192 lead compounds and known pharmaceuticals fitted in the range of applicability of consensus models M3 and M6. For 155 structures of these, the predicted IC
50 values were <1 μmol/L. The most promising hit compounds are presented in
Table 4. The complete list of the structures of the potential HSV TK inhibitors predicted using consensus models M3 and M6 is given in
Table S14 in the Supplementary Materials. We assume that in living systems, these compounds should behave as multi-target drugs. They are promising for further detailed studies.
Additionally, using the GUSAR 2019 program, we carried out a structural analysis of TK inhibitors. Since for 42 compounds presented in
Table 4, there were no experimental data on the inhibitory activity against human herpes viruses HSV-1 and HSV-2, these compounds were not included in the structural analysis. We used the consensus model M3, as it provides more objective and accurate results due to the maximum number of modeled structures and involvement of all types of descriptors implemented in GUSAR 2019 [
20,
21,
22]. However, it should be noted that these compounds have been extensively studied for their inhibitory activity against HPV in previous biological experiments [
17,
72,
73]. Therefore, here, we will briefly discuss this issue.
Figure 6,
Figure 7 and
Figure 8 show the analysis of the contribution of different functional groups to the activity of inhibitors of HSV-1 and HSV-2 thymidine kinase with general structural formulas I–VI. For compounds with the general structural formula I, it was experimentally shown that the replacement of a hydrogen atom in the R
1 position of the benzene ring (
1) increases the inhibitory activity, irrespective of the nature of the acyclic substituent. The results of a structural analysis of the same compounds obtained using the GUSAR 2019 program lead to a similar conclusion. This enhancement is manifested for compounds
2–
7 containing fluoro (
2), chloro (
3), methyl (
4), and trifluoromethyl (
5) substituents in the ortho-positions (
Figure 6a).
In compounds with the general formula II, replacement of the dihydroxanthene moiety (
8) with a xanthene (
9) or thioxanthene dioxide (
12) moiety somewhat increases the activity of the TK inhibitors of HSV-1 and HSV-2. At the same time, replacement by dibenzosuberene, anthracene, or NMe-acridine (
10) has an adverse effect on both target properties. Note that the first two of these groups induce a pronounced decrease in the inhibitory activity, while the third replacement has only a moderate effect. The replacement of the dihydroxanthene moiety in
8 with a thioxanthene moiety (
11) decreases the inhibitory activity against HSV-1 TK by a factor of 1.5 and has almost no effect on the inhibitory activity against HSV-2 TK (
Figure 6b).
In compounds with the general structural formula III containing an oxygen atom in position R
1 (i.e., xanthene ring,
13) (
Figure 7a), the replacement of the hydrogen atom in position R
2 with a methyl group (
14) increases the inhibitory activity against HSV-1 TK and impairs the activity of TK inhibitors against HSV-2. However, the effect is not clearly pronounced in both cases. The introduction of a second methyl group into position R
3 (
15) of the xanthene ring decreases both target properties. The alternative replacement of the hydrogen atom at position R
2 by a chlorine atom (
16) increases the activity of TK inhibitors for HSV-1 almost 2-fold, but barely affects the inhibitory activity against HSV-2 TK. The additional incorporation of a second chlorine atom at position R
3 (
17) is favorable for the activity against HSV-1 TK and almost does not influence the activity against HSV-2 TK. The replacement of the hydrogen atom in position R
2 with a trifluoromethyl group (
18) and unsubstituted phenyl increases the TK inhibitory activity against HSV-1 and has almost no effect on this activity against HSV-2. Meanwhile, the modification of position R
2 by introducing a methoxy group (
19) increases the activity of HSV-1 TK inhibitors and decreases the activity of HSV-2 TK inhibitors. However, the changes caused by a hydrogen atom replacement with the above substituents are moderate.
In compounds with the general structural formula IV (
Figure 7b), the replacement of the hydrogen atom (
20) in the meta-position by chlorine (
21) or a trifluoromethyl group (
22) increases the activity of both TK isoforms quite significantly. Modification of the meta-position in the benzene ring with a hydroxymethyl group (
23) negatively affects both target properties, and the adverse effect is high. At the same time, the alternative replacement of the hydrogen atom with ethyl (
24) or n-propyl (
25) increases the activity of TK inhibition of HSV-1 and decreases the activity of TK inhibition of HSV-2.
The replacement of hydrogen in the para-position with a bromine atom (
26) favorably affects both target properties. In contrast, the alternative replacement of hydrogen with methyl (
29), ethyl (
32), n-butyl (
35), trifluoromethyl (
28), or hydroxyl (
27) markedly decreases the inhibitory activity of compounds with the general structural formula IV against both TK isoforms (
Figure 7b).
The simultaneous substitution of hydrogen atoms in the meta- and para-positions of the benzene ring by a bromine atom (
31) considerably increases the efficiency of inhibitors of HSV-1 TK and almost does not affect the efficiency against HSV-2 TK. However, if we consider this substitution as sequential, the introduction of the second bromine atom in the meta-position of the benzene ring decreases the activity of both TKs compared to the modification of only the para-position by this substituent. The inclusion of fluorine and chlorine atoms in the para- and meta-positions (
30) of the benzene ring, respectively, does not affect the inhibitory efficiency against HSV-1 and markedly decreases that against HSV-2. Similar modifications of para- and meta-positions based on the inclusion of two chlorine (
33) or fluorine atoms (
34) significantly decrease both target properties (
Figure 7b).
The replacement of benzene (
20) with a 2,3-dihydro-1H-indene (
36) or naphthalene (
37) ring and with a number of acyclic groups, including n-butyl (
38), n-hexyl (
39), and 1-hydroxypentyl (
40), in compounds with general structural formula V has the same effect (
Figure 8a).
In addition, in compounds with the general structural formula V, replacement of the benzene ring (
20) in position R
2 with a benzyl moiety (
41) markedly reduces the efficiency of inhibition of HSV-1 TK and has almost no effect on the activity of HSV-2 TK. At the same time, structural analogues of benzyl containing a chlorine atom in the meta- (
43) or para-position (
42) have the opposite effect, which is also markedly pronounced. The replacement of the oxo group (hydroxyl group, if we consider the alternative resonance structure) with a chlorine atom (
44) and a hydroxyl group significantly reduces the inhibitory activity against both TK isoforms (
Figure 8a).
In compounds of general formula VI, the introduction of hydroxyalkyl, aminoalkyl, or carboxyalkyl substituents in position 9 (position R
2) of the purine ring, except for 2-hydroxyethyl (
45) and 3-hydroxypentyl (
46), increases both target properties. The introduction of 4-(piperidinyl)butyl and its derivatives containing a benzene moiety and acyclic substituents at positions 2, 3, and 4 of the pyridine ring has a similar effect. The only exceptions in the latter case are the two oxopurine derivatives with the general structural formula VI containing 4-(4-hydroxypyridyl)butyl and 4-(1,4′-bipyridine)butyl at position R
1. However, these two moieties have a negative effect only for the inhibition of the TK activity of HSV-1. The activities of HSV-2 TK inhibitors are not affected by these two modifications. The modification of the R
2 position in the oxopurine ring by replacing the hydrogen atom with 4-(decahydroquinolyl)butyl or 4-(1,2,3,4-tetrahydroquinolyl)butyl makes a positive contribution to both target properties (
Figure 8b).
In oxopurine derivatives with the general formula VI containing a 4-hydroxyl group at position R2, the replacement of the phenylamine moiety at position R1 with a primary amino group or with a methylamine moiety significantly decreases the inhibitory activity against both TKs. Meanwhile, the introduction of a 2-phenoxyl or 2-phenylthiol moiety instead of 2-phenylaminyl moiety promotes the activity of inhibitors of HSV-1 TK, but negatively affects the inhibition efficiency of HSV-2 TK.
Overall, the comparison of experimental and calculated data indicates that the results of the structural analysis performed in GUSAR-2019 were 80% consistent with the results of previous biological studies.
The discrepancies in predicted estimates of the influence of structural descriptors on the target activities are observed only for the simulated structures containing bulky cyclic moieties, such as dibenzosuberene, NMe-acridine, thioxanthene dioxide, and their structural analogues. The mismatch is explained by the fact that the structural analysis in the GUSAR 2019 program is based on the 2D approach and, therefore, does not take into account steric features of the receptor that the activity of which the simulated compounds are intended to inhibit.
3. Discussion
In the present work, using the GUSAR 2019 program, we have modeled the quantitative structure–activity relationship for 89 TK inhibitors for HSV-1 and HSV-2 in the series of some carboxamide derivatives of 5′-amino-2′,5′-dideoxy-5-ethyluridine, N2-phenylguanine, and 2-phenylamino-6-oxopurine with general structural formulas I–VI. The modeled TK inhibitors differed quite significantly in structure and belonged to different classes of organic compounds. In particular, compounds with general structural formulas I–III had a rather high degree of similarity to thymidine in the backbone structure. Compounds with general structural formulas IV–VI were more diverse and were actually structural analogues of guanine.
The modeling resulted in the construction of 12 valid QSAR consensus models focused on predicting target properties in the form of pIC50. Each of these consensus models contains 20 to 100 partial regression relationships, which differ from each other by a set of descriptors. The validity of the use of structurally diverse TK inhibitors for modeling is confirmed based on the rather high numerical values of statistical criteria of the internal and external validation of QSAR models M1–M12. In particular, the high descriptive ability of the consensus models M1–M12 was confirmed based on the reliable prediction of activities performed for compound structures of four training sets using two categories of metrics: (1) metrics based on R2 coefficients of determination (R2, R20, , CCC); and (2) metrics based on errors in predicting pIC50 values (root mean square error (RMSEP), mean absolute error (MAE), standard deviation (SD)).
The predictive ability of QSAR models M1–M12 was evaluated using similar statistical criteria and prediction errors. Additionally, the criteria Q2F1 and Q2F2, which are also used in the scientific literature to evaluate the predictive ability of QSAR/QSPR models, were determined. All models demonstrated rather high predictive ability in predicting target properties for both internal and external test set structures regardless of their size (95 and 100% of data).
This result is not an exception to the general rule, although it may be rather cautiously perceived by followers of the methodology of Gunch, Hammett, Taft, etc. In this context, note that the GUSAR program has been used for more than ten years since the release of its first version to build (Q)SAR/QSPR models focused on the detection and quantitative prediction of different types of biological activity. The developers of this program have repeatedly demonstrated in their publications that an important and undeniable advantage of their software product is the correct modeling of organic compounds that differ significantly in the structure and type of experimental studies. This important benefit of the GUSAR software is once again confirmed by the results of the present studies and is related to the unique algorithms used to calculate descriptors, as well as methods used to select the most significant descriptors for building the final QSAR models. In particular, the calculation of descriptors, the ideology of which is described in detail in the
Section 4.1 and in the
Supplementary Materials, is performed in the GUSAR program not only on the basis of whole molecules, but also on the basis of their individual structural parts, including individual atoms, as well as their various combinations. The calculation of descriptors based on the nature and properties of all atoms included in the modeled structures and their local environments is dominant in the descriptor-generation methodology, unlike the calculation of properties using whole molecular structures. This approach to the calculation of descriptors allows for common elements to be found among various organic compounds differing in the nature of cyclic and acyclic moieties, and, accordingly, expands the possibilities of QSAR modeling in general.
The ideology of the consensus approach, which actually takes into account the predictions of all partial regression relationships with a focus on their statistical weights, also significantly increases the reliability of adequate prediction of quantitative indicators of biological activity.
In addition, GUSAR program developers have repeatedly reported that QSAR models based on a diverse range of compound structures have a broader applicability in virtual screening than models based on a narrow set of multiple data. Such models allow for the identification and quantification of target properties for a broader class of organic compounds.