2.1. Performance of the Mtc-QSAR-EL Model and Applicability Domain
The best mtc-QSAR-EL model we found has twelve D[LQI]cj descriptors: six based on hydrophobic factors, five containing steric information, and one focused on polar features of the molecules. In terms of atom types, five D[LQI]cj descriptors are based on the effect of halogen atoms, three indicate the influence of heteroatoms (mainly N, O, S, and P), two characterize the presence of methyl groups, and two account for the importance of aromatic carbons. The details of each D[LQI]cj descriptor appear in Table 1, while all the molecular descriptors of the chemicals and the corresponding biological data are reported in Supplementary Material 1.
The most appropriate mtc-QSAR-EL model developed here was an ensemble formed by three MLP networks whose parameters are given in Table 2. These MLP networks have different numbers of neurons in the hidden layer and were trained with different error functions and different numbers of epochs. Two of these MLP networks (the first and third) have the same type of activation function (hyperbolic tangent), while the second has a logistic function. In the output layer, the first and second MLP networks use a softmax function, while the third uses a logistic function. The combination of these aspects resulted in a difference in performance among these MLP networks, particularly in the case of the sensitivities [Sn(%)]ta, [Sn(%)]bs, and [Sn(%)]ap, as well as the specificities [Sp(%)]ta, [Sp(%)]bs, and [Sp(%)]ap. These six local metrics were of paramount importance in assessing the classification performance of the mtc-QSAR-EL model on both seen and unseen data (training and test sets, respectively) when considering the dissimilar experimental conditions cj (see Section 3 for a full explanation) under which the molecules were assayed.
The mtc-QSAR-EL model exhibited accuracies [Ac(%)] of 93.41% and 85.79% in the training and test sets, respectively. Furthermore, the global statistical indices reported in Table 3 support the good general performance of the mtc-QSAR-EL model. For instance, the sensitivity Sn(%) and the specificity Sp(%) have values higher than 90% in the training set and higher than 85% in the test set. Additionally, the MCC values are greater than 0.7, and, given their closeness to the ideal value of one, we can infer that there is very strong agreement between the observed [ATBi(cj)] and predicted [Pred_ATBi(cj)] values of the categorical variable of anti-TB activity against the different Mtb strains. The classification results for all the molecules of our dataset are reported in Supplementary Material 2. The files of the MLP networks are available from the authors upon request.
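The global indices discussed above can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions of sensitivity, specificity, accuracy, and MCC; the function name and the example counts are hypothetical and do not correspond to the values in Table 3.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return Sn(%), Sp(%), Ac(%), and MCC from confusion-matrix counts.

    tp/fn: active cases predicted as active/inactive.
    tn/fp: inactive cases predicted as inactive/active.
    """
    sn = 100.0 * tp / (tp + fn)                   # sensitivity
    sp = 100.0 * tn / (tn + fp)                   # specificity
    ac = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(example)
```

An MCC of one corresponds to perfect agreement between observed and predicted classes, which is why values above 0.7 are read here as strong agreement.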
At the local level, the mtc-QSAR-EL model also performed well. In the training set, the local metrics [Sn(%)]ta, [Sn(%)]bs, [Sn(%)]ap, [Sp(%)]ta, [Sp(%)]bs, and [Sp(%)]ap were in the range 70–100%. The only exception was the assay time of 4 d (four days), which exhibited [Sp(%)]ap = 56.25%. In the test set, a similar result was achieved, since the same six statistical metrics were in the interval 71–100%, except for [Sp(%)]ap = 50% (assay time of four days) and [Sn(%)]ta = 64% (assay time of five days), as well as [Sp(%)]bs and [Sp(%)]ap, with values of 61.54% for the strain Mtb (H37Rv_NRF) and the assay protocol “LORA method”, respectively. The previously mentioned global statistical indices and the local metrics discussed here confirm the internal quality and predictive power of the mtc-QSAR-EL model. All the values of [Sn(%)]ta, [Sn(%)]bs, [Sn(%)]ap, [Sp(%)]ta, [Sp(%)]bs, and [Sp(%)]ap can be found in Supplementary Material 2.
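The local metrics are the same sensitivity and specificity, computed separately for each level of one element of the experimental condition cj. A minimal sketch of that grouping is given below, assuming each case is stored as a record with an observed class, a predicted class, and condition labels; the record layout, the key name "ta", and the toy records are hypothetical.

```python
from collections import defaultdict


def local_sn_sp(records, key):
    """Compute Sn(%) and Sp(%) per level of one condition element.

    records: iterable of dicts with 'observed' and 'predicted' (1 = active,
    0 = inactive) plus the condition field named by `key`.
    """
    counts = defaultdict(lambda: {"tp": 0, "tn": 0, "fp": 0, "fn": 0})
    for r in records:
        c = counts[r[key]]
        if r["observed"] == 1:
            c["tp" if r["predicted"] == 1 else "fn"] += 1
        else:
            c["tn" if r["predicted"] == 0 else "fp"] += 1
    result = {}
    for level, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["tn"] + c["fp"]
        result[level] = {
            # None marks a level with no cases of that class.
            "Sn": 100.0 * c["tp"] / pos if pos else None,
            "Sp": 100.0 * c["tn"] / neg if neg else None,
        }
    return result


# Toy records for illustration only.
records = [
    {"observed": 1, "predicted": 1, "ta": "4 d"},
    {"observed": 0, "predicted": 1, "ta": "4 d"},
    {"observed": 0, "predicted": 0, "ta": "4 d"},
    {"observed": 1, "predicted": 0, "ta": "5 d"},
    {"observed": 1, "predicted": 1, "ta": "5 d"},
]
local = local_sn_sp(records, key="ta")
print(local)
```

Levels with few cases of one class can swing widely with a single misclassification, which is worth keeping in mind when reading isolated low values.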
Last, we would like to highlight that, from a physicochemical and structural point of view, the present mtc-QSAR-EL model classified anti-TB drugs belonging to a wide spectrum of chemical families (Figure 1 and Figure 2) such as polyketides, ethylenediamine derivatives, aminoglycosides, nitroimidazopyrans, fluoroquinolones, and diarylquinolines. Such heterogeneity of chemical structures, together with the different definitions of the D[LQI]cj descriptors, the size of the dataset, and the relatively high values of the global and local statistical indices, points to the capability and appropriateness of the mtc-QSAR-EL model to predict the anti-TB activity of structurally dissimilar chemicals against the different Mtb strains.
Regarding the AD of the mtc-QSAR-EL model, as reported in seminal references (see Section 3.3), we calculated the so-called total score of applicability domain (TSAD), which is derived from the descriptors’ space approach. Since our mtc-QSAR-EL model contained twelve D[LQI]cj descriptors, only those chemicals with TSAD = 12 were considered to be inside the AD (Supplementary Material 3). By carefully inspecting our dataset, we found that only 15 out of 1571 molecules/cases were outside the AD of the mtc-QSAR-EL model, 14 of them with TSAD = 11 and one with TSAD = 10. However, 13 of these 15 seemingly atypical chemicals were correctly classified by the consensus prediction approach of the mtc-QSAR-EL model. We decided to keep these chemicals in the dataset since we gave the consensus predictions priority over the descriptors’ space approach when defining the AD.
2.2. Molecular Descriptors and Their Physicochemical and Structural Meanings
The sensitivity values SV of the D[LQI]cj descriptors are depicted in Figure 3, where the highest values indicate those D[LQI]cj descriptors with the greatest influence and discriminatory power in the mtc-QSAR-EL model. Simultaneously, the comparison between the class-based means for each D[LQI]cj descriptor represented in Table 4 shows us the type of variation that the value of a given D[LQI]cj descriptor should undergo to potentiate the anti-TB activity against the different Mtb strains.
As mentioned before, our mtc-QSAR-EL model has six D[LQI]cj descriptors characterizing information regarding atomic hydrophobic contributions. These contributions are based on the hydrophobicity scale proposed by Ghose and co-workers [35]. According to this scale, aliphatic carbon atoms present negative hydrophobicity values, excluding the fragments of the form CHX3, CR2X2, CRX3, and CX4 (X = O, N, S, P, Se, or halogen). Nitrogenated and oxygenated functional groups also have negative hydrophobic contributions except for the pyrrolic nitrogen (oxygen in furan) atom, Ar–NH–Ar (and its oxygenated counterpart), with Ar being an aromatic (or heteroaromatic) ring, and all the tertiary amines.
With that being said, the results presented in Table 4 indicate that the anti-TB activity through the inhibition of different Mtb strains can be enhanced by increasing the value of D[LASSq3(HYD)G]ta, which describes the augmentation of the product of the hydrophobic contributions of any two atoms (with at least one of them being a halogen) that are placed at a topological distance (number of bonds between two atoms without considering bond multiplicity) of three. Examples of fragments with this characteristic are the 5-(halomethyl)pyrimidines, including those with substitutions in positions 4 and/or 6. Notice that D[LASSq3(HYD)G]ta is the eleventh most important D[LQI]cj descriptor in the mtc-QSAR-EL model. We also have D[LASSq0(HYD)Y]ta, whose increase is directly proportional to the number of heteroatoms in a molecule. In this sense, the presence of fragments such as Ar–NO2 (Ar = aromatic or heteroaromatic ring), primary amines, amides, hydroxyl groups (alcohol), thiols, thioethers, functional groups containing sulfur (with sp2 hybridization) attached to a carbon atom, phosphite, and R3–P=X (R = any group linked through carbon, X = O or S) will considerably increase the value of D[LASSq0(HYD)Y]ta (the fourth most important D[LQI]cj descriptor), favoring the anti-TB activity. The inhibitory activity against the Mtb strains seems to be enhanced by increasing the number of methyl groups in the molecules, and such a structural variation is characterized by D[LASSq0(HYD)M]bs, which is the third most significant D[LQI]cj descriptor in the mtc-QSAR-EL model. We also have D[LASSq2(HYD)Y]bs (ranked the sixth most influential D[LQI]cj descriptor), which indicates the increase in the product of the hydrophobic contributions of any two atoms (with at least one of them being a heteroatom) that are placed at the topological distance of two. Substructures such as pyrimidin-2-amine, 2-alkylpyrimidines, and urea, as well as aliphatic chains (or alicyclic moieties) attached to hydroxyl, amino, amide, and sulfonamide groups, will favorably increase the value of D[LASSq2(HYD)Y]bs, with the subsequent improvement in the anti-TB activity. The presence of halogens seems to be of great importance for increasing the inhibitory activity against the Mtb strains, and, as in the case of D[LASSq3(HYD)G]ta (discussed above), D[LASSq2(HYD)G]ap also contributes positively through the increment in the product of the hydrophobic contributions of any two atoms (with at least one of them being a halogen) that are placed at the topological distance of two. Thus, 5-halopyrimidines and 4,6-disubstituted halobenzene fragments increase the value of D[LASSq2(HYD)G]ap, which is the tenth most significant D[LQI]cj descriptor. Last, we have D[LASSq2(HYD)M]ap as the most important D[LQI]cj descriptor in the mtc-QSAR-EL model. D[LASSq2(HYD)M]ap involves the increase in the hydrophobic contributions of any two atoms (with at least one of them being a carbon from a methyl group) that are placed at the topological distance of two. Moieties such as 2-methylpyrimidines, as well as aliphatic chains (or cycloalkane rings) containing methyl groups, amides, methoxy groups, and toluene fragments, will increase the value of D[LASSq2(HYD)M]ap, and therefore the anti-TB activity against the diverse Mtb strains reported here.
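The structural idea shared by these descriptors, namely summing the products of an atomic property over atom pairs at a fixed topological distance where at least one atom belongs to a chosen atom type, can be sketched as follows. This is only an illustration of the pair-selection logic: it does not reproduce the full D[LQI]cj definition (which also incorporates the experimental conditions cj, as explained in Section 3), and the function names, the toy molecular graph, and the property values are hypothetical rather than taken from the Ghose scale.

```python
from collections import deque


def topological_distances(adjacency, source):
    """Breadth-first search: number of bonds from `source` to each atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def pair_product_sum(adjacency, prop, selected, k):
    """Sum prop[i] * prop[j] over atom pairs at topological distance k
    in which at least one atom is in `selected` (e.g., the halogens)."""
    total = 0.0
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if i < j and d == k and (i in selected or j in selected):
                total += prop[i] * prop[j]
    return total


# Toy linear chain 0-1-2-3 in which atom 3 plays the role of a halogen.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
prop = {0: 0.5, 1: -0.2, 2: 0.3, 3: 0.8}  # invented property values
halogens = {3}

at_distance_3 = pair_product_sum(adjacency, prop, halogens, k=3)  # pair (0, 3)
at_distance_2 = pair_product_sum(adjacency, prop, halogens, k=2)  # pair (1, 3)
print(at_distance_3, at_distance_2)
```

Because bond multiplicity is ignored, the graph is unweighted and a breadth-first search is sufficient to obtain the topological distances.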
Our mtc-QSAR-EL model also has five D[LQI]cj descriptors associated with steric factors. In this context, D[LASSq0(POL)G]ta describes the diminution of the polarizability of the halogens. The value of D[LASSq0(POL)G]ta (having the fifth highest influence) can be decreased either by diminishing the number of halogens or by the presence of fluorine atoms. Halogens are usually important for the activity profiles of molecules, and anti-TB activity is no exception. Thus, functional groups such as trifluoromethyl (and, to a lesser degree, dichloromethyl and bromomethyl), 1,2-difluorobenzene, and 2-chlorobenzene will decrease the D[LASSq0(POL)G]ta value enough to favor the inhibitory potency against any Mtb strain. Two of the five steric D[LQI]cj descriptors characterize the presence of aromatic carbons in the molecules. On the one hand, the increase in the value of D[LASSq0(AW)P]bs is directly proportional to the increase in the number of aromatic carbons (e.g., benzene rings and polycyclic substructures such as naphthalene). On the other hand, D[LASSq2(AW)P]ap, in addition to benefiting from the presence of aromatic carbons, is also favored by the presence of relatively heavy atoms (Cl, Br, and S) placed at the topological distance of two with respect to an aromatic carbon. Therefore, fragments that also increase the value of D[LASSq2(AW)P]ap (and, to a lesser degree, D[LASSq0(AW)P]bs) are halobenzene and benzenethiol. The descriptors D[LASSq0(AW)P]bs and D[LASSq2(AW)P]ap rank eighth and second among the most important D[LQI]cj descriptors, respectively. Two other steric D[LQI]cj descriptors take into account the presence of halogens. One of them is D[LASSq5(AW)G]ap (exhibiting the seventh highest influence), which characterizes the increase in the atomic weight of any two atoms (with at least one of them being a halogen) placed at the topological distance of five or less. Fragments of the type ZCX3 (Z = any atom, X = Cl or Br) and 1,4-dihalobenzene can favorably increase the value of D[LASSq5(AW)G]ap. The other steric D[LQI]cj descriptor is D[LASSq3(KU)G]ap, which expresses the increment in the atomic accessibility (ability to interact with atoms from other molecules) of any two atoms (with at least one of them being a halogen) placed at the topological distance of three. In this case, 1,2-dihalobenzene substructures (Cl and Br favored over F) and moieties such as ZCX3, 5-halopyrimidines, and 4,6-disubstituted halobenzene are examples of fragments that increase the value of D[LASSq3(KU)G]ap. In the mtc-QSAR-EL model, D[LASSq3(KU)G]ap is ranked twelfth among all the D[LQI]cj descriptors.
Finally, we have D[LASSq1(PSA)Y]ta (with the ninth highest significance) characterizing the increase in the polar surface area of any two adjacent heteroatoms, and, therefore, only pyridazine and 1,2,3-triazine rings, as well as the sulfonamide and phosphorus-containing functional groups (phosphorus forming bonds with only oxygen and/or sulfur), will considerably augment the value of D[LASSq1(PSA)Y]ta.
2.3. Computational Drug Repurposing of Agency-Regulated Chemicals as Anti-TB Agents
We performed the virtual screening of a dataset formed by 8898 agency-regulated chemicals (Supplementary Material 4), which included (but was not limited to) investigational and FDA-approved drugs. These were predicted by considering the 24 experimental conditions cj reported in the dataset used to build the mtc-QSAR-EL model, yielding 213,552 predictions (Supplementary Materials 5 and 6). To assess the reliability of the predictions using the AD of the mtc-QSAR-EL model according to the descriptors’ space approach, we generated the TSAD values for each of these 8898 chemicals (Supplementary Material 7). Then, the metrics FA(%) and S(TSAD) were calculated (see Section 3.5 for a full explanation). FA(%) describes the anti-TB potential of a molecule because it expresses the frequency with which a chemical is predicted as active against Mtb. A high FA(%) value (the maximum value is 100%) means that a chemical has a higher probability of having anti-TB activity by inhibiting multiple Mtb strains with MIC90 < 7622.22 nM, which is the highest of the MIC90 cutoffs reported in this work (see Section 3.1). At the same time, a high S(TSAD) value (the ideal value = 12 × 24 experimental conditions cj = 288, with 12 being the maximum TSAD value) indicates the general tendency of a chemical to be inside the AD according to the descriptors’ space approach, which, together with the consensus predictions performed by the mtc-QSAR-EL model, helps to assess the degree of reliability of such predictions.
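The arithmetic behind these two metrics can be sketched as follows, assuming that FA(%) is the percentage of the 24 experimental conditions in which a chemical is predicted as active and that S(TSAD) is the sum of its TSAD values over the same conditions. Both assumptions are consistent with the maxima of 100% and 288 stated above, but the formal definitions are those of Section 3.5; the function names and the example chemical are hypothetical.

```python
N_CHEMICALS = 8898
N_CONDITIONS = 24
MAX_TSAD = 12


def fa_percent(predictions):
    """Percentage of conditions with an 'active' prediction (1 = active)."""
    return 100.0 * sum(predictions) / len(predictions)


def s_tsad(tsad_values):
    """Sum of the TSAD values over all experimental conditions."""
    return sum(tsad_values)


total_predictions = N_CHEMICALS * N_CONDITIONS  # one prediction per pair
ideal_s_tsad = MAX_TSAD * N_CONDITIONS

# Hypothetical chemical: active in 16 of 24 conditions, always inside the AD.
predictions = [1] * 16 + [0] * 8
tsad_values = [MAX_TSAD] * N_CONDITIONS

print(total_predictions)                  # 213552
print(round(fa_percent(predictions), 2))  # 66.67
print(s_tsad(tsad_values), ideal_s_tsad)  # 288 288
```

With 24 conditions, FA(%) can only take values in steps of 100/24 (about 4.17%), which explains figures such as 66.67% and 91.67%.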
The combined use of FA(%) and S(TSAD) (Supplementary Material 8) allowed us to rank the 8898 agency-regulated chemicals, and those with FA(%) > 80% and S(TSAD) ≥ 276 were the top ranked, giving priority to those exhibiting the highest FA(%) values. Notice that there is no “rule of thumb” in terms of the criteria used to prioritize chemicals. Therefore, in the present study, we employed these rigorous cutoff values for FA(%) and S(TSAD) to achieve a virtual screening hit rate of 0.49% (44 out of 8898 chemicals), which is higher than the greatest hit rate value of 0.14% for high-throughput screening but lower than the smallest hit rate value for virtual screening campaigns (1%) [36].
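The prioritization step amounts to a filter on the two metrics followed by a sort. The sketch below applies the cutoffs stated above to a toy library and reproduces the hit-rate arithmetic; the record layout, the chemical names, and the use of S(TSAD) as a tie-breaker are assumptions made for illustration.

```python
def prioritize(chemicals, fa_cutoff=80.0, s_cutoff=276):
    """Keep chemicals with FA(%) > fa_cutoff and S(TSAD) >= s_cutoff,
    ranked by decreasing FA(%) (ties broken by S(TSAD), an assumption)."""
    hits = [c for c in chemicals
            if c["FA"] > fa_cutoff and c["S_TSAD"] >= s_cutoff]
    return sorted(hits, key=lambda c: (-c["FA"], -c["S_TSAD"]))


def hit_rate_percent(n_hits, n_screened):
    """Selected chemicals as a percentage of the screened library."""
    return 100.0 * n_hits / n_screened


# Toy library with hypothetical entries.
library = [
    {"name": "chem_A", "FA": 91.67, "S_TSAD": 270},
    {"name": "chem_B", "FA": 100.0, "S_TSAD": 288},
    {"name": "chem_C", "FA": 83.33, "S_TSAD": 280},
    {"name": "chem_D", "FA": 66.67, "S_TSAD": 288},
]

strict = [c["name"] for c in prioritize(library)]
relaxed = [c["name"] for c in prioritize(library, fa_cutoff=60.0, s_cutoff=270)]
print(strict)   # ['chem_B', 'chem_C']
print(relaxed)  # ['chem_B', 'chem_A', 'chem_C', 'chem_D']
print(round(hit_rate_percent(44, 8898), 2))  # 0.49
```

Exposing the two cutoffs as parameters makes it easy to see how loosening either one enlarges the selected set.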
By using the aforementioned metrics for compound prioritization, the mtc-QSAR-EL model identified three chemicals with experimentally proven anti-TB activity (Figure 4): macozinone (PBTZ-169), BTZ-043, and niclosamide. Notice that macozinone is a remarkably potent piperazine-containing benzothiazinone, being able to inhibit multiple Mtb strains at MIC99 ≤ 1 nM [37]. BTZ-043, although structurally related to macozinone, lacks the 2-(4-(cyclohexylmethyl)piperazin-1-yl) moiety, which decreases its activity. Still, BTZ-043 is a nanomolar inhibitor of several Mtb strains [37,38]. On the other hand, niclosamide offers very attractive opportunities for anti-TB therapies because, in addition to being a recognized anthelmintic drug, it also has a wide-spectrum antimicrobial profile, which includes nanomolar to micromolar potency against diverse viruses (including coronaviruses) [39], as well as anti-TB activity in the low micromolar range [40,41].
We should recall that the cutoff values of the metrics FA(%) and S(TSAD) used by us are very rigorous. If, for instance, we slightly relax these two metrics (e.g., FA(%) > 60% and S(TSAD) ≥ 270), other agency-regulated chemicals with experimentally proven anti-TB activity emerge. These are the cases of biapenem (FA(%) = 66.67% and S(TSAD) = 288) and TBA-354 (FA(%) = 91.67% and S(TSAD) = 270), whose anti-TB profile has been demonstrated in vitro (low micromolar range) and in vivo [42,43]. Notice that further relaxing FA(%) and/or S(TSAD) will allow the mtc-QSAR-EL model to identify more anti-TB agents, but a larger number of false positives may also be predicted. In the end, it is up to the analyst to select adequate values of the metrics FA(%) and S(TSAD), which will depend on the number of drugs available for testing, a key element when planning the expenditure of financial resources for experimental validation of virtually selected chemicals. In any case, we advise the use of FA(%) > 29% and S(TSAD) ≥ 250 since, with these cutoffs, most of the known FDA-approved and investigational anti-TB drugs (not included in the dataset used to build our mtc-QSAR-EL model) were identified in the virtual screening analysis. We would like to highlight that the FA(%) value suggested by us is in the range reported for the hit rate in prospective virtual screening [36,44].
Returning to the top 44 molecules predicted by the mtc-QSAR-EL model from the 8898 agency-regulated chemicals, we performed an additional in silico comparison. To obtain a deeper insight regarding the new molecular patterns with promising anti-TB potential, we used the webserver mycoCSM [45], which is an online tool with the capability to predict the antimycobacterial profile of any molecule, including the anti-TB activity. The results of the top 44 agency-regulated chemicals identified as potential anti-TB agents (multi-strain inhibitors of Mtb) by our mtc-QSAR-EL model, together with the predictions derived from mycoCSM, appear in Table 5. It can be seen that the mtc-QSAR-EL model and the webserver mycoCSM agree on 10 out of 44 chemicals (22.73%). In our opinion and experience, such agreement is very good, taking into account that the mtc-QSAR-EL model and the webserver mycoCSM employ dissimilar approaches to characterize the molecular structure, different machine learning algorithms, and distinct ways to consider experimental information when building the computational models. In the end, given all the experimental and theoretical evidence, we can conclude that our mtc-QSAR-EL model can be efficiently used alone or in combination with other in silico tools for virtual screening of anti-TB molecules, which may inhibit several Mtb strains.