Using the Coefficient of Conformism of a Correlative Prediction in Simulation of Cardiotoxicity

Toropova, Alla P.; Toropov, Andrey A.; Roncaglioni, Alessandra; Benfenati, Emilio

doi:10.3390/toxics13040309

Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri (IRCCS), Via Mario Negri 2, 20156 Milano, Italy

* Author to whom correspondence should be addressed.

Toxics 2025, 13(4), 309; https://doi.org/10.3390/toxics13040309

Submission received: 13 February 2025 / Revised: 10 April 2025 / Accepted: 12 April 2025 / Published: 16 April 2025

Abstract

The optimal descriptors generated by the CORAL software are studied as potential models of cardiotoxicity. Two significantly different cardiotoxicity databases are studied here. Database 1 contains 394 hERG inhibitors (pIC50) and an external set of 200 potential drug substances, which was used to confirm the predictive potential of the approach for Database 1. Database 2 contains cardiotoxicity data for 13,846 different compounds in a binary format, where active is denoted as 1 and inactive as 0. The same model-building algorithms, based on the Monte Carlo method and the Las Vegas algorithm, were applied to all of these datasets. The Las Vegas algorithm was used to rationally distribute the available data into training and validation sets. The Monte Carlo optimization of the correlation weights of different molecular features extracted from SMILES was improved by including the coefficient of conformism of a correlative prediction (CCCP). This improvement provided greater predictive potential in the considered models.

Keywords:

cardiotoxicity; human ether-a-go-go-related gene (hERG) blocker compounds; QSAR; Monte Carlo method; coefficient of conformism of a correlative prediction (CCCP); external validation; CORAL software

1. Introduction

There is no doubt that cardiovascular diseases are some of the most important medical problems globally. Accordingly, drug monitoring for cardiotoxicity should also be considered a very important medical problem. Data on cardiotoxicity are needed for the development of a wide range of new drugs [1]. The human potassium channel gene (hERG) plays an important role in regulating heart rate, and data on cardiotoxicity associated with hERG inhibition by drugs and environmental chemicals provide essential information for medicinal chemistry. Enhancing cardiotoxicity data in direct experiments is large-scale, expensive, and virtually impossible. Therefore, the use of in silico models can help to reach this endpoint. For instance, to define the hierarchy of molecules in the early stages of new drug development and minimize the risks of using new pharmaceutical agents, computational approaches are used to predict the hERG-blocking potential of new drug candidates. Indeed, quantitative structure–property/activity relationships (QSPRs/QSARs) for the cardiac toxicity of organic hERG blockers are reported in the literature [2,3,4,5].

A very complex model requires too much knowledge and too many skills. Convenient models that do not call for significant intellectual effort (i.e., economy of thinking) are therefore desirable, extracting information from data already available, such as effect data associated with chemical structures. However, chemical information can be described and processed in many ways, depending on the chemical structure format. It is interesting to compare the practical applications of InChI (International Chemical Identifier) and SMILES (Simplified Molecular Input Line Entry System), which are common formats used to describe the structure of a substance. According to SCOPUS, citations of works using InChI for QSPR/QSAR analysis are only 2% of those using SMILES in QSPR/QSAR analysis [6,7].

The CORAL software (http://www.insilico.eu/coral, accessed on 11 April 2025) requires only SMILES and numerical data on an endpoint to build a model. Therefore, while the approach used here is fairly convenient, it should be noted that two innovations are applied to the construction of the cardiotoxicity models described here. First, the Conformity Coefficient of Correlative Prediction (CCCP) [8] was used to improve the efficiency of the Monte Carlo method for model generation. Second, the Las Vegas algorithm [8,9] was used to select a prospective split of the available data into training and validation sets. It is likely that this is the first time that these steps have been applied to the construction of a cardiotoxicity model.

The aim of this study is to assess the influence of these new characteristics [10,11] on the predictive potential of cardiotoxicity models, in order to support the development of successful models.

2. Results

2.1. Database 1

Cardiotoxicity models were developed using two databases. The first database contains data on 394 organic molecules with a range of pIC50 values (−3.64, 2.00) and was used for regression models [1].

Two approaches were compared, and their differences lie in the use of the CCCP parameter. Table 1 gives the statistical characteristics of models obtained using target function T₁, and Table 2 shows the results obtained using target function T₂, which includes the CCCP parameter. Comparing the results of the two algorithms, the predictive potential of models using target function T₂ is better than that of models using target function T₁. For instance, the R² of the validation set of the three models (using the three partitions) is always below 0.7 in the case of target function T₁, but always above 0.7 for target function T₂. Similarly, the R² of the calibration set is always below 0.78 for target function T₁ but is always above 0.81 for target function T₂. In general, improvement is clear for all the statistical parameters in Table 1 and Table 2.

Figure 1 compares the correlation coefficients for experimental and predicted pIC50 values. It is important to note the difference between the red/green division in cases A and B (Figure 1). The red/green division in the first case is based on the quality assessed by calculating the difference between the “experimental value and the calculated value” of individual points, whereas in the second case, the red/green division is based on the difference between the correlation coefficients obtained by removing all substances one at a time. It can be seen that the configurations (red/green) of the geometric arrangement of the image points in cases A and B in Figure 1 differ significantly. When considering the set of points forming the geometric image of the correlation, special subsets can be defined. Points for which the forecast is accurate or overestimated are shown in Figure 1A in green. Points for which the forecast is underestimated are shown in Figure 1A in red. Based on this division of points, the index of the ideality of correlation (IIC) is calculated [8]. When calculating the correlation intensity index (CII) and the CCCP, other subsets are considered. Their definition is as follows. If the removal of a point is accompanied by an increase in the correlation coefficient, this point is classified as being an opponent to the correlation. This means that this substance has a detrimental role from a statistical point of view. These points are indicated by red in Figure 1B. If the removal of a point is accompanied by a decrease in the correlation coefficient, this point is classified as a supporter of correlation. In other words, these substances contribute more than the others to the good quality of the model. These points are indicated in green in Figure 1B. CII is the sum of the contributions of all opponents of correlation. CCCP is the ratio of the sum of the effects of all opponents of correlation to the sum of all supporters of correlation.
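The leave-one-out division into opponents and supporters of a correlation can be sketched in a few lines of Python. This is only an illustration of the definition given above, not the CORAL implementation: the function names are hypothetical, the Pearson coefficient R (rather than R²) is assumed as the "correlation coefficient", and the data are invented. The exact formulas for CII and CCCP are given in [8]; the sketch only collects the per-point effects from which such indices are built.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def leave_one_out_roles(observed, predicted):
    """Split points into opponents and supporters of the correlation.

    delta_k = R(without point k) - R(all points)
    delta_k > 0: removing the point raises R, so it is an opponent.
    delta_k < 0: removing the point lowers R, so it is a supporter.
    Returns R for all points and two dicts mapping index -> |delta_k|.
    """
    r_all = pearson_r(observed, predicted)
    opponents, supporters = {}, {}
    for k in range(len(observed)):
        obs_k = observed[:k] + observed[k + 1:]
        pred_k = predicted[:k] + predicted[k + 1:]
        delta = pearson_r(obs_k, pred_k) - r_all
        if delta > 0:
            opponents[k] = delta
        elif delta < 0:
            supporters[k] = -delta
    return r_all, opponents, supporters

# Invented example: the last point is a clear outlier.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 2.0]
r_all, opponents, supporters = leave_one_out_roles(observed, predicted)
print("R (all points):", round(r_all, 3))
print("opponent indices:", sorted(opponents))
print("supporter indices:", sorted(supporters))
print("sum of opponent effects:", sum(opponents.values()))
print("sum of supporter effects:", sum(supporters.values()))
```

The two sums printed at the end are the quantities that the text above describes as the basis of the CII and the CCCP.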

The IIC, CII, and CCCP were tested as values capable of influencing the predictive potential of models [8].

There are differences and relationships in the values obtained for the various parameters applied to different cases. These are presented in Figure 2. For instance, we observe that the concordance correlation coefficient (CCC) and IIC are sensitive to the change in the slope of the regression, and their values are low for the regression presented at the bottom of the figure. Conversely, the decreases in other parameters are similar when there is a relative spread of the values regarding the perfect regression (such as for the regression in the middle of the figure) and when the slope is affected (such as for the regression at the bottom of the figure).
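The sensitivity of the CCC to the regression slope can be illustrated with a small sketch. It uses the standard definition of Lin's concordance correlation coefficient; the function name and the data are invented for illustration and are not taken from the study.

```python
def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    var_o = sum((o - mo) ** 2 for o in observed) / n
    var_p = sum((p - mp) ** 2 for p in predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    return 2 * cov / (var_o + var_p + (mo - mp) ** 2)

observed = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = list(observed)                        # slope 1, intercept 0
flattened = [0.5 * o + 1.5 for o in observed]   # slope 0.5, same mean

print(concordance_cc(observed, perfect))
print(concordance_cc(observed, flattened))
```

In the second case the points still lie exactly on a straight line, so the Pearson correlation coefficient remains 1, while the CCC drops because the slope departs from unity.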

The Las Vegas algorithm [8,9] selects the best Monte Carlo model from a group of attempts to build the model. The selection is based on the statistical quality of the model for the calibration set. The most meaningful statistical parameters are those associated with the model once it is fully developed, represented by the calibration set. The model is ready only when the modeling parameters are optimized, i.e., at the last step of the modeling process, which is carried out with the calibration set. The previous modeling steps, as implemented for the values for the active and passive training sets, are only preliminary, and the model is not mature. Thus, the statistics for the active training and passive training sets are less important for the evaluation of the final model. Table 1 and Table 2 and Figure 3 demonstrate that the CCCP is useful in the process of Monte Carlo optimization. Figure 3 shows the correlation coefficients obtained with the different sets.

Figure 4 illustrates the correlations for the different subsets. It is clear that in the calibration step, the model is mature and there is closer alignment between the calculated and experimental values than in the active and passive training sets, which show a much larger spread of values.

The comparison in Table 3 shows that the model based on T₂ optimization is comparable in statistical quality with other models for the same endpoint. A previously published study reported good results [12], but it used a much smaller dataset. The value reported in [12], R² = 0.81, is closely comparable with the R² of our calibration set, which is always above 0.81, with a mean value of 0.82. Thus, the model presented here gave slightly better results than the best model in Table 3. In addition, our model uses a larger dataset, which provides a more robust basis and a broader applicability domain. Another advantage of our model is that it is quite simple in its general approach and implementation. Indeed, the CORAL software (http://www.insilico.eu/coral, accessed on 11 April 2025) needs only SMILES as input, and there is no need to calculate molecular descriptors. It is useful to compare the results of our model with those in [1], because in this case the comparison examines exactly the same set of substances (the apparently different number of substances indicated in Table 3 is due to the fact that we identified duplicates in the original dataset). The model described in [1] combines several very complex systems: docking simulation and three-dimensional QSAR models. The strategy is complex because it requires more than one algorithm, and its components are demanding, since a docking simulation is needed and the models use three-dimensional descriptors. Conversely, our tool does not require the integration of multiple models, complex algorithms, or the calculation of three-dimensional descriptors. In fact, molecular descriptors are not needed at all, since the software directly uses SMILES. Thus, overall, our model offers several innovative aspects and provides better results when compared on the same dataset.

2.2. Database 2

The second database was used to assess the statistical utility of the approach used in the case of classification models. For this purpose, three classification models were constructed using semi-correlations for the hERG inhibitory capacity of a large array of compounds taken from the literature [17]. From the entire dataset, all active compounds were considered together with an equal number of inactive compounds. The total number of considered compounds was 13,846 (endpoint values 0 for non-toxic compounds and 1 for toxic ones).

Table 4 reports the predictive potential of models based on semi-correlations obtained with target functions T₁ and T₂. Again, T₂ provides better models, although the results obtained with T₁ are also good. As previously highlighted for the regression models, the most meaningful values are those obtained using the calibration set, which relate to the optimized model. These values should then be compared with those obtained using the substances in the validation set, which are not used in the model development. The latter represent the application of the model to new substances, and ideally, the difference in the statistical values for the calibration and validation sets should be small. This is the case in Table 4, indicating that the model has good statistical values regarding both its robustness and its predictivity.

Table 5 contains a comparison of the statistical quality of models presented in the literature and models based on semi-correlations built using the CORAL software [18].

The only model with results comparable to those presented here is the one reported in [22]. However, that model utilized only 7889 compounds, whereas the model suggested in this study used 13,846 compounds and is thus expected to have a broader applicability domain.

The comparison of the different models in Table 5 is quite similar to the comparison discussed previously for the regression models. Our model is built on a larger dataset than the best of the other models in the table and is therefore expected to be more robust. We compare our results with those of the model outlined in [17] because we used the same dataset, which makes the comparison more appropriate. The accuracy of the model in [17] is slightly worse than that which we obtained, while the specificity is the same. However, this representation is misleading. The sensitivity is only 0.67 in the other study, while it is >0.99 in our case. Thus, the statistics are much better in our case. This is also due to our strategy of splitting the set of inactive substances into different sets and repeating the modeling approach, which solves the issue of unbalanced datasets. The model in [17] used a much more complex algorithm, support vector machines, and hundreds of molecular descriptors calculated using different programs. Conversely, our algorithm is much simpler and does not use molecular descriptors at all. Thus, all these points represent useful innovative aspects of the modeling approach presented here.
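The reason why accuracy and specificity alone can be misleading on an unbalanced dataset can be shown with the standard definitions of these metrics. The counts below are invented for illustration and do not come from [17] or from this study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of actives recognized
    specificity = tn / (tn + fp)   # fraction of inactives recognized
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Invented unbalanced example: 10 actives and 90 inactives.
sens, spec, acc = classification_metrics(tp=6, fn=4, tn=88, fp=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Here the accuracy is 0.94 and the specificity is about 0.98, yet four out of ten active compounds are missed, which is why sensitivity has to be inspected separately.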

3. Discussion

The CORAL software has been used to build QSPR/QSAR models for over ten years by many institutes [28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58]. The basic idea used to develop this software is the use of SMILES to represent molecular structures. Most of the previously used QSPR/QSAR models were based on the use of a molecular graph to represent molecular structures, in which the vertices represent different chemical elements and the edges represent covalent bonds [59,60].

Both of the aforementioned options for representing the molecular structure have their advantages and disadvantages; however, the main thing is that these representations of molecular structures are far from identical, and therefore, in principle, can complement each other. The CORAL software makes it possible to construct models based on SMILES or the representation of the molecular structure in the form of a molecular graph, as well as through hybrid representation with the involvement of molecular features expressed by SMILES attributes together with invariants of molecular graphs in the modeling process [61,62,63,64].

In addition to the possibility of constructing hybrid descriptors as above indicated, specific molecular features that are characteristic of the molecular system as a whole were considered for QSPR/QSAR modeling. These are the global attributes of SMILES, such as BOND, NOSP, and HALO. The BOND is a code of covalent bonds. The NOSP is a code that represents configurations of nitrogen, oxygen, sulfur, and phosphorus in a molecular system.

SMILES can serve as a basis for developing other variants of molecular structure information that are capable, in principle, of somehow intersecting with complex biochemical features of molecular structures that determine the biological activity of a substance. These may be combinatorial features, such as contributions of individual atoms or proportions of pairs of atoms [11].

The fragments of local symmetry (FLS) examined in this study are partially related to mathematical symmetry. On the one hand, they have a significant level of symmetry, but in a topological sense, applied to local situations in the molecule, and not to the whole structure. On the other hand, this fact indicated that the molecular fragments represented by FLS are not related to the traditional symmetry concept, which is global. However, FLSs can improve the predictive potential of QSAR models. This is confirmed in the study.

In addition to this kind of representation, there are other possibilities to influence the results of stochastic optimization. These are the statistical criteria of the forecast potential, namely, the correlation ideality index, the correlation intensity index, and the coefficient of conformism for correlative prediction.

The basic principles and expectations when using stochastic modeling with the Monte Carlo approach via the CORAL program (http://www.insilico.eu/coral) are as follows.

(i) Any QSPR/QSAR model can be affected by the presence of certain substances in a given (sub)set, thus it is a random event. Therefore, considering one distribution for the training and the validation sets is not enough for a robust assessment of the predictive potential of the method used; it is necessary to consider several random (non-identical) splits in the training and the validation sets.

(ii) Even when considering a single split into training and validation sets, multiple runs of the stochastic Monte Carlo simulation process will yield different values of the statistical characteristics of the training and validation sets. In this case, the important and necessary information for the correct assessment of the predictive potential of the method obtained is the dispersion of the statistical values for the training and validation sets.

(iii) This dispersion is not necessarily associated with an error (limitation of the predictive potential) of the considered method; there may be cases of special influence of the IIC and CII factors, which divide the correlation cluster into two sub-clusters for the training set, as we showed in Figure 1.

(iv) To strengthen the statistical reliability of the selected divisions in training and validation sets, the Las Vegas algorithm was used to obtain the divisions with minimal statistical defects. The essence of the specified stochastic process (the Las Vegas algorithm) is the construction of structured training and validation sets. The structured training set includes passive and active training sets accompanied by a calibration set. The statistical defect of each of the specified sets is the sum of the statistical defects of the SMILES in the abscissa of the model.

(v) Ideally, a good model yields similar statistical parameters for the training and validation sets.
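The dispersion mentioned in the principles above can be summarized as a mean and a standard deviation over several splits and several Monte Carlo runs per split. The sketch below uses only the Python standard library; the function name is hypothetical and the R² values are invented placeholders, not results of this study.

```python
from statistics import mean, stdev

def summarize_runs(r2_by_split):
    """Mean and sample standard deviation of R2, per split and overall."""
    per_split = {name: (mean(vals), stdev(vals))
                 for name, vals in r2_by_split.items()}
    pooled = [v for vals in r2_by_split.values() for v in vals]
    return per_split, (mean(pooled), stdev(pooled))

# Invented validation-set R2 values: three runs for each of three splits.
r2_validation = {
    "split 1": [0.70, 0.72, 0.74],
    "split 2": [0.71, 0.75, 0.76],
    "split 3": [0.69, 0.73, 0.77],
}
per_split, overall = summarize_runs(r2_validation)
for name, (m, s) in per_split.items():
    print(f"{name}: {m:.2f} ± {s:.2f}")
print(f"overall: {overall[0]:.2f} ± {overall[1]:.2f}")
```

Reporting the spread alongside the mean makes it possible to judge whether a difference between two optimization schemes exceeds the run-to-run variability.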

The results of the computational experiments considered here confirm the relevance of the above points, although this type of study should be replicated for different endpoints.

This study was planned as a means of testing the ability of local symmetry fragments in cooperation with the CCCP to improve the predictive potential of the models. The comparison of Table 1 and Table 2 demonstrated that the CCCP results in significant improvement in the statistical quality of the models. The corresponding computational experiments without the correlation weights of the FLS have shown that without these, the models are inadequate. Additional studies with different endpoints may explore whether this is true in other cases too.

At present, the study of IIC and CII has shown that the potential for using IIC is greater than that for CII, even though the combined use of IIC and CII may be beneficial. It is possible to manage the process of cooperation between IIC and CII by using coefficients similar to F₁–F₃ used in Equations (3) and (4).

Similar studies can be conducted for the new criterion of predictive potential considered here, the CCCP. The advantages of this criterion for improving the stochastic processes used here to construct models, both the Monte Carlo method and the Las Vegas algorithm, may require additional verification. The study of both the cooperative application of the considered criteria of the forecast potential and their individual capabilities is broad in scope. Obviously, this requires not only experimental implementation but also theoretical understanding. From this point of view, it should be noted that the main advantage of the IIC is its ability to take into account both the correlation coefficient and the mean absolute error (MAE) and/or the root mean squared error (RMSE). To assess the correlation intensity (to calculate CII), other parameters of abstract correlation are used that do not depend on the dispersion of correlation clusters in the “experiment forecast” or “experiment calculation” coordinates (i.e., they do not depend on RMSE and MAE). In fact, optimizing the correlation weights can be interpreted as making a “generalized” decision that affects all compounds (SMILES), not just the training and unseen training sets. This is similar to making a generalized decision through a bicameral legislature in a state parliament. A bicameral legislature is used to avoid a biased decision that is preferred by a particular group of representatives in a parliament. Similarly, two groups of substances that have an unequal influence on the final decision, used separately as training sets and unseen training sets, can help to avoid the biased decision that is preferred by visible substances. The “protests” (substances with an opposite behavior) underlying the calculation of the CII allow us to consider the correlation to be a structure similar to the above-mentioned bicameral legislature. This allows us to compare different correlations using this rule: the smaller the sum of protests [8], the higher the correlation value. The calculations of the CII and the CCCP have a similar basis, which is the so-called “protest” [8]. However, the main difference between the CCCP and the CII is that the CCCP takes into account not only “protests” but also the opinion of the supporters of the correlation, that is, the compounds (SMILES) that have a negative protest value. This apparently gives the CCCP an advantage over the CII, because taking into account all opinions is more balanced and yields more information about various phenomena.

In silico simulation, as a method of cognition, functions through feedback, i.e., any model should be verifiable. For the considered models, an interested user can carry out verification conveniently. It is necessary to download the CORAL program and run it using the splits available from the Supplementary Materials of this study. For QSPR/QSAR simulation by means of stochastic variation in model parameters, a necessary condition is the reproducibility of the results (the values of statistical parameters for the considered partitions). As shown, the reproducibility of the forecast potential is observed with good statistical quality (0.73 ± 0.03 for the determination coefficients for the validation samples). Thus, the proposed modeling concept can be accepted as a convenient tool for QSPR/QSAR analysis.

In principle, the important points related to the development and use of models are the universality and the possibility of standardization. Universality is understood as the ability of the approach to serve various classes of compounds in QSPR/QSAR analysis. Standardization is the ability to determine the essential criteria that guarantee a good level of predictive potential. In terms of universality, the proposed approach can be easily transposed into a modeling tool based on eclectic data using the so-called quasi-SMILES [10]. In terms of standardization, the above-mentioned IIC and CII, as well as the new parameter CCCP, can be used.

To evaluate the proposed approach, a validation of the model obtained for split 1 was performed. For this purpose, the corresponding SMILES-based descriptors were calculated for 200 compounds outlined in [15]. This validation confirmed the predictive potential of the model obtained for split 1. The technical details of this validation are presented in the Supplementary Materials (Table S2).

The above demonstrates that the new possibilities for constructing the stochastic models discussed here, namely (i) the CCCP and (ii) the Las Vegas algorithm, look quite promising.

We plan to conduct corresponding studies in the future, expanding the study scope by considering further cases, including traditional SMILES, organic and inorganic substances, and peptides and nanomaterials using quasi-SMILES.

4. Materials and Methods

Regression model

Database 1

The first database is for regression models. The numerical data on cardiotoxicity expressed in logarithmic units (pIC50) were taken from the literature [1]. Twelve duplicates were detected in the database. After removing the duplicates, the total number of compounds was 394. These were randomly divided into three partitions to produce three models. Each partition contained a structured training set containing the active (A), passive (P), and calibration (C) training sets; in addition, a validation (V) set was used to evaluate the results of the model using new substances (invisible during the construction of the model).

Descriptors

The descriptors applied were calculated as follows:

\mathrm{DCW}(T, N) = \sum \{ \mathrm{CW}(S_{k}) + \mathrm{CW}(SS_{k}) + \mathrm{CW}(SSS_{k}) \} + \sum \mathrm{CW}(\mathrm{FLS}) + \sum \mathrm{CW}(\mathrm{APP})

(1)

S_k is a SMILES atom, i.e., a single symbol (‘C’, ‘O’, ‘N’, etc.) or a group of symbols that should be considered a united system (‘Cl’, ‘@@’, ‘%11’, etc.); SS_k and SSS_k are combinations of two and three connected SMILES atoms, respectively. FLS means fragments of local symmetry [10], i.e., fragments of SMILES that can be represented as XYX, XYYX, or XYZYX, where X ≠ Y and Y ≠ Z.

APP is the matrix of atom pair proportions [28]. T and N are parameters of the Monte Carlo optimization that provide numerical data on the correlation weights (CWs) for the SMILES attributes.
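The structure of the descriptor can be illustrated with a simplified sketch. It is not the CORAL implementation: the tokenizer, the key format for the FLS weights ("FLS:XYX", etc.), the per-occurrence counting of FLS, and the correlation weights are all assumptions made for illustration, and the APP term is omitted.

```python
import re
from collections import Counter

# Simplified tokenizer (assumption): two-character elements, '@@' and
# ring closures such as '%11' are single SMILES atoms; otherwise one symbol.
TOKEN = re.compile(r"Cl|Br|@@|%\d{2}|.")

def smiles_atoms(smiles):
    """Split a SMILES string into SMILES atoms S_k."""
    return TOKEN.findall(smiles)

def connected(tokens, n):
    """Combinations of n consecutive SMILES atoms (SS_k for n=2, SSS_k for n=3)."""
    return ["".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def local_symmetry_fragments(tokens):
    """Count XYX, XYYX, and XYZYX patterns with X != Y and Y != Z."""
    counts = Counter()
    for i in range(len(tokens)):
        w = tokens[i:i + 5]
        if len(w) >= 3 and w[0] == w[2] and w[0] != w[1]:
            counts["FLS:XYX"] += 1
        if len(w) >= 4 and w[0] == w[3] and w[1] == w[2] and w[0] != w[1]:
            counts["FLS:XYYX"] += 1
        if (len(w) == 5 and w[0] == w[4] and w[1] == w[3]
                and w[0] != w[1] and w[1] != w[2]):
            counts["FLS:XYZYX"] += 1
    return counts

def dcw(smiles, cw):
    """Sum of correlation weights; attributes without a weight contribute 0."""
    tokens = smiles_atoms(smiles)
    attributes = tokens + connected(tokens, 2) + connected(tokens, 3)
    total = sum(cw.get(a, 0.0) for a in attributes)
    for kind, n in local_symmetry_fragments(tokens).items():
        total += cw.get(kind, 0.0) * n
    return total

# Hypothetical correlation weights, chosen only to make the arithmetic visible.
weights = {"C": 0.5, "O": -0.25, "CO": 1.0, "FLS:XYX": 2.0}
print(smiles_atoms("CC(=O)Cl"))
print(dcw("COC", weights))
```

For the string COC, the sketch adds the weights of C, O, and C, the weight of the pair CO, and the weight of one XYX fragment.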

Models

The model generated by the CORAL software was calculated as

\mathrm{pIC}_{50} = C_{0} + C_{1} \times \mathrm{DCW}(T, N)

(2)

C₀ and C₁ are regression coefficients.

Monte Carlo method

The Monte Carlo optimization provides the numerical data on the correlation weights of the SMILES attributes listed above. The following calculation aims to provide a larger value for target functions:

T_{1} = RA + RP - |RA - RP| \times F_{1} + (\mathrm{IIC} + \mathrm{CII}) \times F_{2}

(3)

T_{2} = RA + RP - |RA - RP| \times F_{1} + (\mathrm{IIC} + \mathrm{CII}) \times F_{2} + \mathrm{CCCP} \times F_{3}

(4)

RA and RP are the correlation coefficients for the active and passive training sets, respectively. IIC is the index of ideality of correlation [11], CII is the correlation intensity index [8], and CCCP is the coefficient of conformism of a correlative prediction [8]. The weighting coefficients used in the Monte Carlo optimization were F₁ = F₂ = 0.5 and F₃ = 0.3.
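Equations (3) and (4) translate directly into code. The sketch below only evaluates the two target functions for given statistics; the function names are hypothetical, the input values are invented, and the Monte Carlo search that varies the correlation weights to increase these values is not shown.

```python
def target_t1(ra, rp, iic, cii, f1=0.5, f2=0.5):
    """Target function T1, Equation (3)."""
    return ra + rp - abs(ra - rp) * f1 + (iic + cii) * f2

def target_t2(ra, rp, iic, cii, cccp, f1=0.5, f2=0.5, f3=0.3):
    """Target function T2, Equation (4): T1 plus the weighted CCCP term."""
    return target_t1(ra, rp, iic, cii, f1, f2) + cccp * f3

# Invented statistics for one candidate set of correlation weights.
t1 = target_t1(ra=0.8, rp=0.7, iic=0.6, cii=0.9)
t2 = target_t2(ra=0.8, rp=0.7, iic=0.6, cii=0.9, cccp=1.0)
print(round(t1, 3), round(t2, 3))
```

The term |RA − RP| × F₁ penalizes candidates that fit the active training set much better than the passive one, and T₂ differs from T₁ only by the additive CCCP × F₃ term.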

Classification models

Database 2

The second database contains data on 13,846 organic molecules for classification models [17]. The descriptors are as we described above.

Models

A unique feature of the approach under consideration is the possibility of constructing so-called semi-correlations [11], which are tools for the representation of binary classifications based on the principle of active versus inactive, represented by 1 and 0 (in principle, the other option, namely active = +1 and inactive = −1, can be used too).

Semi-correlations are built by means of a regression model in which the values of the optimal descriptor calculated by Equation (1) are plotted along one axis (abscissa), while the other axis (ordinate) takes only two values, 0 and 1 (or, as mentioned above, −1 and +1), identifying inactive and active compounds, respectively. This binary classification is carried out according to the following scheme:

y = C_{0} + C_{1} \times \mathrm{DCW}(T, N)

(5)

\mathrm{Classification}(\mathrm{SMILES}) = \begin{cases} 1\ (\text{active}), & \text{if } y \geq 0.5 \\ 0\ (\text{inactive}), & \text{if } y < 0.5 \end{cases}

(6)
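Equations (5) and (6) can be illustrated with a minimal sketch. It assumes that C₀ and C₁ are obtained by ordinary least squares on the 0/1 labels, which is an assumption made for illustration; the function names and the descriptor values are invented.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates of C0 and C1 in y = C0 + C1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    c1 = sxy / sxx
    c0 = my - c1 * mx
    return c0, c1

def classify(dcw_value, c0, c1, threshold=0.5):
    """Equation (6): 1 (active) if y >= threshold, else 0 (inactive)."""
    y = c0 + c1 * dcw_value
    return 1 if y >= threshold else 0

# Invented descriptor values and binary labels (0 = inactive, 1 = active).
dcw_values = [0.0, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 1]
c0, c1 = fit_line(dcw_values, labels)
predicted = [classify(d, c0, c1) for d in dcw_values]
print(c0, c1, predicted)
```

Because the response takes only two values, the regression line serves purely as a decision rule: the threshold of 0.5 on y corresponds to a single cut-off value of the descriptor.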

Applicability domain

The modeling system, through the CORAL program, assumes a stochastic nature in several aspects. First, it is assumed that the model can be built for any random split in training and validation sets. It is expected that some distributions will lead to a good statistical quality of the model and some to a low statistical quality. Secondly, it is expected that even for “successful” models, there will be a scatter in the statistical characteristics (correlation coefficient and standard deviation). Thus, criteria are needed to select suitable distributions for training and validation. Statistical defect values for molecular features extracted from SMILES and statistical defect values for distributions have been proposed [10]. Depending on the statistical defects of the molecular features, as well as their average values, the applicability domain is determined, as shown below.

The defects for SMILES features (which represent molecular features) are calculated as follows:

d_{k} = \frac{|P(A_{k}) - P'(A_{k})|}{N(A_{k}) + N'(A_{k})} + \frac{|P(A_{k}) - P''(A_{k})|}{N(A_{k}) + N''(A_{k})} + \frac{|P'(A_{k}) - P''(A_{k})|}{N'(A_{k}) + N''(A_{k})}

(7)

where P(A_k), P′(A_k), and P″(A_k) are the probabilities of A_k in the active training, passive training, and calibration sets, respectively, and N(A_k), N′(A_k), and N″(A_k) are the frequencies of A_k in the active training set, passive training set, and calibration set. The statistical SMILES defects (D_j) are calculated as follows:

D_{j} = \sum_{k = 1}^{NA} d_{k}

(8)

where NA is the number of non-blocked SMILES attributes in the SMILES.

A SMILES falls into the domain of applicability if

D_{j} < 2 \times \bar{D}

(9)

where \bar{D} is the average of the statistical defects {D_j}.
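Equations (7)–(9) can be sketched as follows. Several details are assumptions made for illustration rather than statements about CORAL: the frequency N(A_k) is taken as the number of SMILES in a set that contain the attribute, the probability P(A_k) as that frequency divided by the set size, a term whose denominator is zero is skipped, and attributes absent from all three sets are ignored. The function names and the attribute lists are invented.

```python
from collections import Counter

def attribute_defects(active, passive, calibration):
    """Defect d_k of every attribute, Equation (7).

    Each argument is a list of SMILES, each given as a list of attributes.
    """
    stats = []
    for group in (active, passive, calibration):
        counts = Counter()
        for attrs in group:
            counts.update(set(attrs))       # count each SMILES once
        stats.append((counts, len(group)))
    all_attrs = set().union(*(counts for counts, _ in stats))
    defects = {}
    for a in all_attrs:
        n = [counts[a] for counts, _ in stats]          # N, N', N''
        p = [counts[a] / size for counts, size in stats]  # P, P', P''
        d = 0.0
        for i, j in ((0, 1), (0, 2), (1, 2)):
            if n[i] + n[j] > 0:             # skip undefined terms (assumption)
                d += abs(p[i] - p[j]) / (n[i] + n[j])
        defects[a] = d
    return defects

def smiles_defect(attrs, defects):
    """Statistical defect D_j of one SMILES, Equation (8)."""
    return sum(defects.get(a, 0.0) for a in set(attrs))

def in_domain(d_j, all_defects):
    """Applicability-domain test, Equation (9)."""
    return d_j < 2 * (sum(all_defects) / len(all_defects))

# Invented attribute lists for three small sets.
active = [["C", "O"], ["C", "N"]]
passive = [["C", "O"], ["C"]]
calibration = [["C"], ["C", "N"]]
defects = attribute_defects(active, passive, calibration)
all_d = [smiles_defect(s, defects) for s in active + passive + calibration]
print(defects)
print([in_domain(d, all_d) for d in all_d])
```

An attribute that is equally frequent in all three sets, such as C in this example, has zero defect, whereas attributes that are unevenly distributed raise the defect of every SMILES that contains them.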

Las Vegas algorithm

The Las Vegas algorithm is a sequence of tests of different splits into active training, passive training, calibration, and validation sets in the course of the Monte Carlo optimizations [8]. The aim of the algorithm used here is to find the split that provides a determination coefficient for the calibration set that is as large as possible, in the expectation that it is accompanied by a large determination coefficient for the external validation set. Table 6 contains an example of how the Las Vegas algorithm functions.
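The selection logic described above can be sketched as a loop over random splits that keeps the split with the best calibration score. This is only an outline under stated assumptions: the four sets are taken to be of roughly equal size, and `score_split` is a hypothetical callable that would run the Monte Carlo optimization for a split and return the determination coefficient of the calibration set. A toy scoring function stands in for it here so that the example runs.

```python
import random

def las_vegas_split(n_items, score_split, attempts=10, seed=0):
    """Try several random splits and keep the one with the highest score."""
    rng = random.Random(seed)
    best_score, best_split = None, None
    quarter = n_items // 4                  # equal-sized sets (assumption)
    for _ in range(attempts):
        idx = list(range(n_items))
        rng.shuffle(idx)
        split = {
            "active": idx[:quarter],
            "passive": idx[quarter:2 * quarter],
            "calibration": idx[2 * quarter:3 * quarter],
            "validation": idx[3 * quarter:],
        }
        score = score_split(split)          # e.g., R2 for the calibration set
        if best_score is None or score > best_score:
            best_score, best_split = score, split
    return best_score, best_split

def toy_calibration_score(split):
    """Stand-in for a real model run: fraction of even indices in calibration."""
    calib = split["calibration"]
    return sum(1 for i in calib if i % 2 == 0) / len(calib)

best_score, best_split = las_vegas_split(20, toy_calibration_score, attempts=5)
print(best_score, best_split["calibration"])
```

The validation set plays no part in the selection, which matches the description above: the split is chosen on the calibration statistics only, and the validation statistics are examined afterwards.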

Mechanistic interpretation

By comparing several runs of the described Monte Carlo optimization that start from different points under the same conditions (same split and same parametrization), one can select the group of SMILES attributes (i.e., molecular features) whose correlation weights are positive in every run. These can be considered promoters of an increase in the endpoint under consideration. Table 7 lists molecular features with positive correlation weights in several runs of the stochastic optimization. The absence of the local symmetry fragments XYYX and XYZYX (attributes [xyyx0] and [xyzyx0]) is a promoter of increased hERG-related cardiotoxicity. The same role is played by certain atoms, such as nitrogen, and aromaticity also plays a significant part (Table 7).
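The selection of promoters can be expressed as a short filter over the correlation weights obtained in several probes. In the sketch below, the first three entries are copied from Table 7; the last entry is a made-up attribute with a weight of mixed sign, included only to show what gets rejected. The function name is hypothetical.

```python
def promoters(cw_by_probe):
    """Return attributes whose correlation weight is positive in every probe.

    cw_by_probe: {attribute: [CW in probe 1, CW in probe 2, ...]}
    """
    return sorted(attr for attr, cws in cw_by_probe.items()
                  if all(cw > 0 for cw in cws))

correlation_weights = {
    "N……": [0.075, 0.137, 0.137, 0.137, 0.137],           # from Table 7
    "[xyyx0]…": [1.760, 0.384, 0.384, 0.384, 0.384],       # from Table 7
    "[xyzyx0]…": [0.085, 0.210, 0.210, 0.210, 0.210],      # from Table 7
    "hypothetical": [0.300, -0.120, 0.250, 0.100, 0.050],  # made-up example
}
print(promoters(correlation_weights))
```

An attribute with even one non-positive weight across the probes is excluded, which matches the requirement that a promoter behave consistently in all runs.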

5. Conclusions

Regression models of cardiotoxicity (hERG blockade expressed as pIC50, Database 1) were built from the biological activity of 394 compounds and return continuous values. A classification model was developed from Database 2, which contains 13864 compounds. Good results were achieved for both databases. Stochastic Monte Carlo algorithms were used with different distributions of the substances into three training-stage sets (active training, passive training, and calibration) and a validation set. Two target functions (T₁ and T₂) were used to optimize the correlation weights of molecular features extracted from SMILES with the CORAL software. Five main principles of model construction with CORAL were formulated to make the considered algorithms as easy as possible to understand and apply. The approach is convenient because it lets the user formulate and test various hypotheses related to cardiotoxicity simulation. In particular, this study tested the possibility of involving the described types of local symmetry in the simulation. It also tested the effectiveness of the new criterion of predictive potential (CCCP) and confirmed the effectiveness of IIC and CII as criteria of the predictive potential of models. This series of optimization phases and approaches has been shown to be effective in producing novel in silico models for analyzing endpoints related to cardiotoxicity. The models provide continuous or classification values with statistical parameters as good as or better than those obtained with other models. An advantage of the CORAL software is that it does not require the calculation of molecular descriptors, which makes it simpler and more convenient.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/toxics13040309/s1. Table S1: Technical details on split 1 (Database 1). Table S2: Technical details on experimental validation (for external 200 compounds). Table S3: Technical details on the classification model of cardiotoxicity for Database 2.

Author Contributions

Conceptualization, A.P.T., A.A.T., A.R. and E.B.; methodology, A.P.T., A.A.T., A.R. and E.B.; software, A.A.T.; validation, A.P.T., A.A.T., A.R. and E.B.; formal analysis, A.P.T.; data curation, A.P.T. and A.A.T.; writing—original draft preparation, A.P.T., A.A.T., A.R. and E.B.; writing—review and editing, A.P.T., A.A.T., A.R. and E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the European Union’s Horizon 2020 research and innovation program (grant #101037090). The content of this manuscript reflects only the author’s view, and the Commission is not responsible for any use that may be made of the information it contains.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in the article and its Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Obiol-Pardo, C.; Gomis-Tena, J.; Sanz, F.; Saiz, J.; Pastor, M. A multiscale simulation system for the prediction of drug-induced cardiotoxicity. J. Chem. Inf. Model. 2011, 51, 483–492.
2. Park, M.J.; Lee, K.-R.; Shin, D.-S.; Chun, H.-S.; Kim, C.-H.; Ahn, S.-H.; Bae, M.A. Predicted drug-induced bradycardia related cardio toxicity using a zebrafish in vivo model is highly correlated with results from in vitro tests. Toxicol. Lett. 2013, 216, 9–15.
3. Sinha, N.; Sen, S. Predicting hERG activities of compounds from their 3D structures: Development and evaluation of a global descriptors based QSAR model. Eur. J. Med. Chem. 2011, 46, 618–630.
4. Kim, T.; Chung, K.-C.; Park, H. Derivation of Highly Predictive 3D-QSAR Models for hERG Channel Blockers Based on the Quantum Artificial Neural Network Algorithm. Pharmaceuticals 2023, 16, 1509.
5. Ku, T.C.; Cao, J.; Won, S.J.; Guo, J.; Camacho-Hernandez, G.A.; Okorom, A.V.; Salomon, K.W.; Lee, K.H.; Loland, C.J.; Duff, H.J.; et al. Series of (([1,1′-Biphenyl]-2-yl)methyl)sulfinylalkyl Alicyclic Amines as Novel and High Affinity Atypical Dopamine Transporter Inhibitors with Reduced hERG Activity. ACS Pharmacol. Transl. Sci. 2024, 7, 515–532.
6. Toropov, A.A.; Toropova, A.P.; Benfenati, E. QSAR-modeling of toxicity of organometallic compounds by means of the balance of correlations for InChI-based optimal descriptors. Mol. Divers. 2010, 14, 183–192.
7. Toropov, A.A.; Toropova, A.P.; Benfenati, E.; Leszczynska, D.; Leszczynski, J. InChI-based optimal descriptors: QSAR analysis of fullerene[C60]-based HIV-1 PR inhibitors by correlation balance. Eur. J. Med. Chem. 2010, 45, 1387–1394.
8. Toropova, A.P.; Toropov, A.A. The coefficient of conformism of a correlative prediction (CCCP): Building up reliable nano-QSPRs/QSARs for endpoints of nanoparticles in different experimental conditions encoded via quasi-SMILES. Sci. Total Environ. 2024, 927, 172119.
9. Tempo, R.; Ishii, H. Monte Carlo and Las Vegas randomized algorithms for systems and control. Eur. J. Control. 2007, 13, 189–203.
10. Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. Does the accounting of the local symmetry fragments in quasi-SMILES improve the predictive potential of the QSAR models of toxicity toward tadpoles? Toxicol. Mech. Methods 2024, 34, 737–742.
11. Toropov, A.A.; Barnes, D.A.; Toropova, A.P.; Roncaglioni, A.; Irvine, A.R.; Masereeuw, R.; Benfenati, E. CORAL models for drug-induced nephrotoxicity. Toxics 2023, 11, 293.
12. Obrezanova, O.; Csanyi, G.; Gola, J.M.; Segall, M.D. Gaussian processes: A method for automatic QSAR modeling of ADME properties. J. Chem. Inf. Model. 2007, 47, 1847–1857.
13. Perry, M.; Sanguinetti, M.; Mitcheson, J. Symposium review: Revealing the structural basis of action of hERG potassium channel activators and blockers. J. Physiol. 2010, 588, 3157–3167.
14. Du-Cuny, L.; Chen, L.; Zhang, S. A critical assessment of combined ligand- and structure-based approaches to HERG channel blocker modeling. J. Chem. Inf. Model. 2011, 51, 2948–2960.
15. Lanevskij, K.; Didziapetris, R.; Sazonovas, A. Physicochemical QSAR analysis of hERG inhibition revisited: Towards a quantitative potency prediction. J. Comput.-Aided Mol. Des. 2022, 36, 837–849.
16. Sanches, I.H.; Braga, R.C.; Alves, V.M.; Andrade, C.H. Enhancing hERG Risk Assessment with Interpretable Classificatory and Regression Models. Chem. Res. Toxicol. 2024, 37, 910–922.
17. Ogura, K.; Sato, T.; Yuki, H.; Honma, T. Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II. Sci. Rep. 2019, 9, 12220.
18. Toropov, A.A.; Toropova, A.P.; Roncaglioni, A.; Benfenati, E. The system of self-consistent semi-correlations as one of the tools of cheminformatics for designing antiviral drugs. New J. Chem. 2021, 45, 20713–20720.
19. Frydrych, A.; Jurowski, K. Toxicity of minoxidil—Comprehensive in silico prediction of main toxicity endpoints: Acute toxicity, irritation of skin and eye, genetic toxicity, health effect, cardiotoxicity and endocrine system disruption. Chem.-Biol. Interact. 2024, 393, 110951.
20. Feng, H.; Jiang, J.; Wei, G.-W. Machine-learning repurposing of DrugBank compounds for opioid use disorder. Comput. Biol. Med. 2023, 160, 106921.
21. Aggarwal, B.; Singla, R.K.; Ali, M.; Singh, V.; Igoli, J.O.; Gundamaraju, R.; Kim, K.H. Triterpenic and monoterpenic esters from stems of Ichnocarpus frutescens and their drug likeness potential. Med. Chem. Res. 2015, 24, 1427–1437.
22. Feng, H.; Wei, G.-W. Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models. Comput. Biol. Med. 2023, 153, 106491.
23. Liu, J.; Khan, M.K.H.; Guo, W.; Dong, F.; Ge, W.; Zhang, C.; Gong, P.; Patterson, T.A.; Hong, H. Machine learning and deep learning approaches for enhanced prediction of hERG blockade: A comprehensive QSAR modeling study. Expert Opin. Drug Metab. Toxicol. 2024, 20, 665–684.
24. Delre, P.; Lavado, G.J.; Lamanna, G.; Saviano, M.; Roncaglioni, A.; Benfenati, E.; Mangiatordi, G.F.; Gadaleta, D. Ligand-based prediction of hERG-mediated cardiotoxicity based on the integration of different machine learning techniques. Front. Pharmacol. 2022, 13, 951083.
25. Boonsom, S.; Chamnansil, P.; Boonseng, S.; Srisongkram, T. ToxSTK: A multi-target toxicity assessment utilizing molecular structure and stacking ensemble learning. Comput. Biol. Med. 2025, 185, 109480.
26. Konda, L.S.K.; Praba, S.K.; Kristam, R. hERG liability classification models using machine learning techniques. Comput. Toxicol. 2019, 12, 100089.
27. Zhang, X.; Mao, J.; Wei, M.; Qi, Y.; Zhang, J.Z.H. HergSPred: Accurate Classification of hERG Blockers/Nonblockers with Machine-Learning Models. J. Chem. Inf. Model. 2022, 62, 1830–1839.
28. Toropova, A.P.; Toropov, A.A.; Benfenati, E. The self-organizing vector of atom-pairs proportions: Use to develop models for melting points. Struct. Chem. 2021, 32, 967–971.
29. Yuan, B.; Wang, Y.; Zong, C.; Sang, L.; Chen, S.; Liu, C.; Pan, Y.; Zhang, H. Modeling study for predicting altered cellular activity induced by nanomaterials based on Dlk1-Dio3 gene expression and structural relationships. Chemosphere 2023, 335, 139090.
30. Ahmadi, S.; Lotfi, S.; Hamzehali, H.; Kumar, P. A simple and reliable QSPR model for prediction of chromatography retention indices of volatile organic compounds in peppers. RSC Adv. 2024, 14, 3186–3201.
31. Goyal, S.; Rani, P.; Chahar, M.; Hussain, K.; Kumar, P.; Sindhu, J. Quantitative structure activity relationship studies of androgen receptor binding affinity of endocrine disruptor chemicals with index of ideality of correlation, their molecular docking, molecular dynamics and ADME studies. J. Biomol. Struct. Dyn. 2023, 41, 13616–13631.
32. Bamdi, F.; Shiri, F.; Ahmadi, S.; Salahinejad, M.; Bazzi-Allahri, F. Optimization of Monte Carlo Method-Based QSPR modeling for lipophilicity in radiopharmaceuticals. Chem. Phys. Lett. 2024, 843, 141239.
33. Gupta, S.; Kashyap, M.; Bansal, Y.; Bansal, G. In silico insights into design of novel VEGFR-2 inhibitors: SMILES-based QSAR modelling, and docking studies on substituted benzo-fused heteronuclear derivatives. SAR QSAR Environ. Res. 2024, 35, 265–284.
34. Soleymani, N.; Ahmadi, S.; Shiri, F.; Almasirad, A. QSAR and molecular docking studies of isatin and indole derivatives as SARS 3CLpro inhibitors. BMC Chem. 2023, 17, 32.
35. Lotfi, S.; Ahmadi, S.; Azimi, A.; Kumar, P. Prediction of second-order rate constants of the sulfate radical anion with aromatic contaminants using the Monte Carlo technique. New J. Chem. 2023, 47, 19504–19515.
36. Ouabane, M.; Tabti, K.; Hajji, H.; Elbouhi, M.; Khaldan, A.; Elkamel, K.; Sbai, A.; Aziz Ajana, M.; Sekkate, C.; Bouachrine, M.; et al. Structure-odor relationship in pyrazines and derivatives: A physicochemical study using 3D-QSPR, HQSPR, Monte Carlo, molecular docking, ADME-Tox and molecular dynamics. Arab. J. Chem. 2023, 16, 105207.
37. Ahmadi, S.; Lotfi, S.; Azimi, A.; Kumar, P. Multicellular target QSAR models for predicting of novel inhibitors against pancreatic cancer by Monte Carlo approach. Results Chem. 2024, 10, 101734.
38. Vukomanović, P.; Stefanović, M.; Stevanović, J.M.; Petrić, A.; Trenkić, M.; Andrejević, L.; Lazarević, M.; Sokolović, D.; Veselinović, A.M. Monte Carlo Optimization Method Based QSAR Modeling of Placental Barrier Permeability. Pharm. Res. 2024, 41, 493–500.
39. Rezaie-keikhaie, N.; Shiri, F.; Ahmadi, S.; Salahinejad, M. QSTR based on Monte Carlo approach using SMILES and graph features for toxicity toward Tetrahymena pyriformis. J. Iran. Chem. Soc. 2023, 20, 2609–2620.
40. Hamidi, E.; Fatemi, M.H.; Jafari, K. Thermal conductivity of carbon-based nanofluids; a theoretical modeling using nano-quantitative structure–property relationships. Chem. Phys. Lett. 2024, 846, 141344.
41. Živadinović, B.; Stamenović, J.; Živadinović, J.; Živadinović, L.; Sokolović, M.; Filipović, S.S.; Sokolović, D.; Veselinović, A.M. QSAR modelling, molecular docking studies and ADMET predictions of polysubstituted pyridinylimidazoles as dual inhibitors of JNK3 and p38α MAPK. J. Mol. Struct. 2022, 1265, 133504.
42. Ouabane, M.; Zaki, K.; Tabti, K.; Alaqarbeh, M.; Sbai, A.; Sekkate, C.; Bouachrine, M.; Lakhlifi, T. Molecular toxicity of nitrobenzene derivatives to tetrahymena pyriformis based on SMILES descriptors using Monte Carlo, docking, and MD simulations. Comput. Biol. Med. 2024, 169, 107880.
43. Šarić, S.; Kostić, T.; Lović, M.; Aleksić, I.; Hristov, D.; Šarac, M.; Veselinović, A.M. In silico development of novel angiotensin-converting-enzyme-I inhibitors by Monte Carlo optimization based QSAR modeling, molecular docking studies and ADMET predictions. Comput. Biol. Chem. 2024, 112, 108167.
44. Živadinović, B.; Stamenović, J.; Živadinović, J.; Živadinović, L.; Živadinović, A.; Stojanović, M.; Lazarević, M.; Sokolović, D.; Veselinović, A.M. Monte Carlo optimization based QSAR modeling, molecular docking studies, and ADMET predictions of compounds with antiMES activity. Struct. Chem. 2023, 34, 2225–2235.
45. Azimi, A.; Ahmadi, S.; Kumar, A.; Qomi, M.; Almasirad, A. SMILES-based QSAR and molecular docking study of oseltamivir derivatives as influenza inhibitors. Polycycl. Aromat. Compd. 2023, 43, 3257–3277.
46. Hamzehali, H.; Lotfi, S.; Ahmadi, S.; Kumar, P. Quantitative structure–activity relationship modeling for predication of inhibition potencies of imatinib derivatives using SMILES attributes. Sci. Rep. 2022, 12, 21708.
47. Zivkovic, M.; Zlatanovic, M.; Zlatanovic, N.; Golubović, M.; Veselinović, A.M. A QSAR model for predicting the corneal permeability of drugs-the application of the Monte Carlo optimization method. New J. Chem. 2022, 47, 224–230.
48. Tajiani, F.; Ahmadi, S.; Lotfi, S.; Kumar, P.; Almasirad, A. In-silico activity prediction and docking studies of some flavonol derivatives as anti-prostate cancer agents based on Monte Carlo optimization. BMC Chem. 2023, 17, 87.
49. Antović, A.; Karadžić, R.; Živković, J.V.; Veselinović, A.M. Development of QSAR model based on Monte Carlo optimization for predicting GABAA receptor binding of newly emerging benzodiazepines. Acta Chim. Slov. 2023, 70, 634–641.
50. Nikolić, N.; Kostić, T.; Golubović, M.; Nikolić, T.; Marinković, M.; Perić, V.; Mladenović, S.; Veselinović, A.M. Monte Carlo optimization based QSAR modeling of angiotensin ii receptor antagonists. Acta Chim. Slov. 2023, 70, 318–326.
51. Kumar, P.; Kumar, A. CORAL: Predictions of quality of rice based on retention index using a combination of correlation intensity index and consensus modelling. In QSPR/QSAR Analysis Using SMILES and Quasi-SMILES; Challenges and Advances in Computational Chemistry and Physics; Toropova, A.P., Toropov, A.A., Eds.; Springer: Cham, Switzerland, 2023; Volume 33, pp. 421–462.
52. Kumar, P.; Kumar, A. CORAL: QSAR models of CB1 cannabinoid receptor inhibitors based on local and global SMILES attributes with the index of ideality of correlation and the correlation contradiction index. Chemom. Intell. Lab. Syst. 2020, 200, 103982.
53. Achary, P.G.R. QSPR modelling of dielectric constants of π-conjugated organic compounds by means of the CORAL software. SAR QSAR Environ. Res. 2014, 25, 507–526.
54. Kumar, P.; Kumar, A.; Singh, D. CORAL: Development of a hybrid descriptor based QSTR model to predict the toxicity of dioxins and dioxin-like compounds with correlation intensity index and consensus modelling. Environ. Toxicol. Pharmacol. 2022, 93, 103893.
55. Ghasemi, G.; Nasiri, N. Using QSAR calculations on benzamide derivatives to inhibit reproduction in endothelial cells by CORAL SEA. Pak. J. Pharm. Sci. 2022, 35, 841–844.
56. Ahmadi, S.; Lotfi, S.; Afshari, S.; Kumar, P.; Ghasemi, E. CORAL: Monte Carlo based global QSAR modelling of Bruton tyrosine kinase inhibitors using hybrid descriptors. SAR QSAR Environ. Res. 2021, 32, 1013–1031.
57. Kumar, P.; Kumar, A.; Lal, S.; Singh, D.; Lotfi, S.; Ahmadi, S. CORAL: Quantitative Structure Retention Relationship (QSRR) of flavors and fragrances compounds studied on the stationary phase methyl silicone OV-101 column in gas chromatography using correlation intensity index and consensus modelling. J. Mol. Struct. 2022, 1265, 133437.
58. Veselinović, J.B.; Đorđević, V.; Bogdanović, M.; Morić, I.; Veselinović, A.M. QSAR modeling of dihydrofolate reductase inhibitors as a therapeutic target for multiresistant bacteria. Struct. Chem. 2018, 29, 541–551.
59. Cvetković, D.; Gutman, I. The Computer System G R A P H: A Useful Tool in Chemical Graph Theory. J. Comput. Chem. 1986, 7, 640–644.
60. Gutman, I. Graph-based molecular structure-descriptors theory and applications: Preface. Indian J. Chem.—Sect. A Inorg. Phys. Theor. Anal. Chem. 2003, 42, 1197–1198.
61. Kumar, A.; Chauhan, S. Monte Carlo method based QSAR modelling of natural lipase inhibitors using hybrid optimal descriptors. SAR QSAR Environ. Res. 2017, 28, 179–197.
62. Toropova, A.P.; Toropov, A.A. Hybrid optimal descriptors as a tool to predict skin sensitization in accordance to OECD principles. Toxicol. Lett. 2017, 275, 57–66.
63. Lotfi, S.; Ahmadi, S.; Kumar, P. The Monte Carlo approach to model and predict the melting point of imidazolium ionic liquids using hybrid optimal descriptors. RSC Adv. 2021, 11, 33849–33857.
64. Lotfi, S.; Ahmadi, S.; Kumar, P. A hybrid descriptor based QSPR model to predict the thermal decomposition temperature of imidazolium ionic liquids using Monte Carlo approach. J. Mol. Liq. 2021, 338, 116465.

Figure 1. The comparison of geometrical interpretations for (A) the division of the total correlation clusters into two sub-clusters caused by the influence of IIC or CII and (B) the graphical interpretation of the opponents and supporters of the correlation.

Figure 2. Comparison of the statistical parameters obtained for three regression cases.

Figure 3. Comparisons of the determination coefficients for experimental and predicted pIC50 in the active (A), passive (P), calibration (C), and validation (V) sets.

Figure 4. Graphical representation of models for random splits obtained using target function T₂.

Table 1. Statistical characteristics of models obtained with target function T₁.

Split	Set	n *	R²	CCC	IIC	CII	Q²	CCCP	RMSE	MAE	F
1	A	101	0.660	0.795	0.765	0.794	0.648	0.198	0.802	0.599	192
	P	96	0.690	0.748	0.655	0.812	0.679	0.331	0.896	0.737	208
	C	101	0.762	0.862	0.873	0.893	0.746	0.673	0.510	0.392	318
	V	96	0.660	-	-	-	-	-	0.60	0.47	-
2	A	100	0.530	0.690	0.594	0.747	0.508	0.008	0.938	0.761	109
	P	99	0.565	0.703	0.523	0.745	0.549	−0.011	0.964	0.804	126
	C	96	0.746	0.859	0.864	0.848	0.732	0.382	0.464	0.350	276
	V	99	0.647	-	-	-	-	-	0.46	0.38	-
3	A	98	0.608	0.756	0.749	0.772	0.590	0.113	0.817	0.673	149
	P	98	0.685	0.762	0.799	0.801	0.674	0.211	0.919	0.798	208
	C	99	0.777	0.879	0.881	0.881	0.764	0.607	0.359	0.292	338
	V	99	0.682	-	-	-	-	-	0.50	0.38	-

(*) n = number of substances; A = active training set; P = passive training set; C = calibration set; V = validation set.

Table 2. Statistical characteristics of models obtained with target function T₂.

Split	Set	n *	R²	CCC	IIC	CII	Q²	CCCP	RMSE	MAE	F
1	A	101	0.562	0.720	0.627	0.760	0.544	0.141	0.909	0.710	127
	P	96	0.552	0.672	0.374	0.778	0.536	0.214	0.994	0.807	116
	C	101	0.828	0.909	0.909	0.933	0.815	0.858	0.387	0.290	475
	V	96	0.773	-	-	-	-	-	0.44	0.33	-
2	A	100	0.536	0.698	0.676	0.755	0.516	0.135	0.929	0.763	113
	P	99	0.532	0.691	0.592	0.750	0.516	0.011	0.995	0.825	110
	C	96	0.824	0.905	0.907	0.904	0.814	0.763	0.379	0.296	439
	V	99	0.706	-	-	-	-	-	0.41	0.33	-
3	A	98	0.526	0.690	0.642	0.753	0.506	0.094	0.899	0.764	107
	P	98	0.603	0.716	0.747	0.750	0.589	0.023	1.00	0.825	146
	C	99	0.817	0.902	0.904	0.923	0.805	0.825	0.324	0.263	434
	V	99	0.716	-	-	-	-	-	0.45	0.35	-

(*) n = number of substances; A = active training set; P = passive training set; C = calibration set; V = validation set.

Table 3. Comparison of the predictive potential for cardiotoxicity models.

n *	R²	Reference
400	0.52	[1]
137	0.81	[12]
400	0.52	[13]
529	0.59	[14]
345	0.41	[15]
840	0.66	[16]
394	0.64	Present study

(*) n = number of substances.

Table 4. Statistical characteristics of the cardiotoxicity models based on semi-correlations, obtained using target functions T₁ and T₂.

Target Function	Split	Set *	Sensitivity	Specificity	Accuracy	Matthews Correlation Coefficient
T₁	1	C	0.990	0.986	0.988	0.976
		V	0.990	0.990	0.990	0.979
	2	C	0.964	0.962	0.963	0.927
		V	0.968	0.967	0.967	0.934
	3	C	0.970	0.967	0.969	0.938
		V	0.980	0.965	0.972	0.945
T₂	1	C	0.999	0.997	0.998	0.996
		V	1.000	0.998	0.999	0.998
	2	C	0.988	0.986	0.987	0.974
		V	0.992	0.990	0.991	0.982
	3	C	0.994	0.988	0.991	0.982
		V	0.996	0.988	0.992	0.983

(*) C = calibration set; V = validation set.
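For readers who want to reproduce the kind of statistics listed in Table 4, the sketch below computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient from the four cells of a confusion matrix, using the standard textbook definitions. The counts in the example are hypothetical and are not taken from Database 2; returning 0.0 for an undefined Matthews coefficient is a convention chosen here, not something stated in the article.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification statistics from a confusion matrix.

    tp, tn, fp, fn: true positives, true negatives, false positives,
    false negatives (active = 1 is the positive class).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC is reported as 0.0 when it is undefined.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}

# Hypothetical counts for illustration only.
print(classification_metrics(tp=90, tn=80, fp=20, fn=10))
```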

Table 5. Comparison of the statistical characteristics of classification models of cardiotoxicity from the literature with those of the semi-correlation models built with the CORAL software.

Accuracy	Sensitivity	Specificity	References
0.984	0.670	0.995	[17]
0.905	0.702	0.912	[19]
0.664	0.865	0.657	[20]
0.568	0.876	0.557	[21]
0.998	0.998	1.000	[22]
0.865	0.858	0.871	[23]
-	0.930	0.900	[24]
0.876	0.871	0.882	[25]
0.930	0.967	0.780	[26]
0.840	0.824	0.858	[27]
0.999	0.999	0.998	Split 1, target function T₂
0.991	0.992	0.990	Split 2, target function T₂
0.992	0.995	0.988	Split 3, target function T₂

Table 6. An example of the Las Vegas algorithm in operation.

Test *	W%	N₁₁₁	N₁₁₀	N₁₀₁	N₁₀₀	N_All	CCCP	R²_A	R²_P	R²_C	Best R²_C	Best Test
1	97	291	5	4	1	301	0.527	0.830	0.813	0.746	0.746	1
2	97	299	5	4	0	308	0.424	0.771	0.748	0.713	0.746	1
3	97	289	6	4	0	299	0.740	0.700	0.667	0.776	0.776	3
4	96	281	9	3	0	293	0.777	0.828	0.751	0.621	0.776	3
5	93	287	18	3	2	310	0.760	0.822	0.807	0.706	0.776	3
6	90	283	28	0	3	314	0.659	0.830	0.830	0.782	0.782	6
7	93	291	18	1	3	313	−0.113	0.480	0.461	0.000	0.782	6
8	95	283	12	2	1	298	0.624	0.765	0.769	0.569	0.782	6
9	96	288	11	1	0	300	0.655	0.796	0.795	0.548	0.782	6
10	91	287	21	4	3	315	0.671	0.830	0.820	0.768	0.782	6

(*) Test = one probe of the Monte Carlo optimization; W% = the percentage of the SMILES attributes taking part in the Monte Carlo optimization; N₁₁₁ = the number of SMILES attributes observed in the active training, passive training, and calibration sets; N₁₁₀ = the number of SMILES attributes observed only in the active and passive training sets; N₁₀₁ = the number of SMILES attributes observed only in the active training and calibration sets; N₁₀₀ = the number of SMILES attributes observed only in the active training set; N_All = N₁₁₁ + N₁₁₀ + N₁₀₁ + N₁₀₀; R²_A, R²_P, and R²_C = determination coefficients for the active training, passive training, and calibration sets, respectively; Best R²_C = the largest R²_C obtained up to and including the current test; Best Test = the number of the test that gave it.
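The tabulated values appear to be internally consistent in two ways: N_All equals the sum of the four attribute counts, and W% matches 100·N₁₁₁/N_All rounded to the nearest integer. The second relation is an observation drawn from the numbers in Table 6, not a definition given in the text. The short check below uses the ten rows of the table.

```python
# Columns: W%, N111, N110, N101, N100, N_All (rows copied from Table 6).
rows = [
    (97, 291, 5, 4, 1, 301),
    (97, 299, 5, 4, 0, 308),
    (97, 289, 6, 4, 0, 299),
    (96, 281, 9, 3, 0, 293),
    (93, 287, 18, 3, 2, 310),
    (90, 283, 28, 0, 3, 314),
    (93, 291, 18, 1, 3, 313),
    (95, 283, 12, 2, 1, 298),
    (96, 288, 11, 1, 0, 300),
    (91, 287, 21, 4, 3, 315),
]

def check_row(w, n111, n110, n101, n100, n_all):
    """Return (sum matches N_All, W% matches the rounded share of N111)."""
    sum_ok = (n111 + n110 + n101 + n100) == n_all
    w_ok = round(100 * n111 / n_all) == w  # assumed relation, inferred from the table
    return sum_ok, w_ok

results = [check_row(*row) for row in rows]
print(all(s and w for s, w in results))
```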

Table 7. A collection of promoters of increased cardiotoxicity (CWs = correlation weights).

SMILES Attribute	CWs, Probe 1	CWs, Probe 2	CWs, Probe 3	CWs, Probe 4	CWs, Probe 5
c……	0.459	0.459	0.459	0.459	0.459
c…c……	0.084	0.146	0.146	0.146	0.146
C…(……	0.122	0.184	0.184	0.184	0.184
[xyyx0]…	1.760	0.384	0.384	0.384	0.384
c…(……	0.117	0.242	0.242	0.242	0.242
c…1……	1.289	0.351	0.351	0.351	0.351
c…c…c…	0.757	0.257	0.257	0.257	0.257
N……	0.075	0.137	0.137	0.137	0.137
c…c…1…	0.299	0.424	0.424	0.424	0.424
[xyzyx0]…	0.085	0.210	0.210	0.210	0.210
C…C……	0.101	0.351	0.351	0.351	0.351
O……	0.186	0.248	0.248	0.248	0.248
N…C……	0.259	0.321	0.321	0.321	0.321
N…(……	0.064	0.127	0.127	0.127	0.127
c…c…(…	0.064	0.439	0.439	0.439	0.439

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. Using the Coefficient of Conformism of a Correlative Prediction in Simulation of Cardiotoxicity. Toxics 2025, 13, 309. https://doi.org/10.3390/toxics13040309
