Article

Toxicity Assessment of the Binary Mixtures of Aquatic Organisms Based on Different Hypothetical Descriptors

1
College of Chemistry and Chemical Engineering, Yantai University, Yantai 264005, China
2
LAQV@REQUIMTE/Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
*
Author to whom correspondence should be addressed.
Molecules 2022, 27(19), 6389; https://doi.org/10.3390/molecules27196389
Submission received: 11 August 2022 / Revised: 7 September 2022 / Accepted: 19 September 2022 / Published: 27 September 2022
(This article belongs to the Section Computational and Theoretical Chemistry)

Abstract

Modern industrialization has led to the creation of a wide range of organic chemicals, especially in the form of multicomponent mixtures, thus making the evaluation of environmental pollution more difficult by normal methods. In this paper, we attempt to use forward stepwise multiple linear regression (MLR) and nonlinear radial basis function neural networks (RBFNN) to establish quantitative structure–activity relationship models (QSARs) to predict the toxicity of 79 binary mixtures of aquatic organisms using different hypothetical descriptors. To search for the proper mixture descriptors, 11 mixture rules were performed and tested based on preliminary modeling results. The statistical parameters of the best derived MLR model were Ntrain = 62, R2 = 0.727, RMS = 0.494, F = 159.537, Q2LOO = 0.727, and Q2pred = 0.725 for the training set; and Ntest = 17, R2 = 0.721, RMS = 0.508, F = 38.773, and q2ext = 0.720 for the external test set. The RBFNN model gave the following statistical results: Ntrain = 62, R2 = 0.956, RMS = 0.199, F = 1279.919, Q2LOO = 0.955, and Q2pred = 0.855 for the training set; and Ntest = 17, R2 = 0.880, RMS = 0.367, F = 110.980, and q2ext = 0.853 for the external test set. The quality of the models was assessed by validating the relevant parameters, and the final results showed that the developed models are predictive and can be used for the toxicity prediction of binary mixtures within their applicability domain.

1. Introduction

It is widely recognized that toxic chemicals in the environment rarely occur individually but rather as mixtures; thus, research on both single and mixed toxic compounds is important [1,2]. Water pollution in particular deserves attention. With modern industrial development, water bodies are damaged by industrial wastewater and by the discharge of pesticides, organic chemicals, and other pollutants from many sources, and this damage poses a great threat to the ecological environment and human health [3,4]. In aquatic systems, pollutants hardly ever occur as single compounds but contaminate water bodies as mixtures. Under existing human risk assessment regulations, the toxicity of chemicals is assessed simply from toxicity information on the individual compounds. However, the toxicity of an individual compound can differ greatly from that of a mixture, because interactions between the components of a mixture can produce synergistic or antagonistic effects and thus significantly change the toxicity of the mixture relative to that of a single compound [5]. A risk assessment of the toxicity of mixtures is therefore necessary.
To evaluate the combined effects of mixture toxicity, one can usually resort to two types of models: concentration addition (CA) and independent action (IA). CA is the most common additive toxicity model. It assumes that the components of a mixture share the same mechanism of action and the same target but do not interact with each other, and it has been endorsed by the US Environmental Protection Agency and the European Commission [6,7,8]. The IA model, also termed the reaction-additive approach, assumes that the mechanisms and targets of action differ among the components of the mixture [9,10]. Both traditional models have limitations, and several approaches based on the CA and IA models, e.g., the generalized concentration addition (GCA) model, have been developed to overcome them. Nevertheless, CA, IA, and their refinements should only be used when there are no interactions between the mixture components and the mechanism of action of each component is known [11].
Other methods are now commonly employed to evaluate the toxicity of mixtures. Quantitative structure–activity relationship (QSAR) modeling is one of the methods used to predict the toxicity of environmental contaminants and is applied to the risk assessment of mixtures in various research fields, such as physical chemistry, medicinal chemistry, and toxicology [12]. As a computational method, its primary purpose is to use statistics, data analysis, and other mathematical methods to correlate the activity, properties, and toxicity of a compound with its structure [13]. Because it uses a relatively small number of compounds to establish mathematical relationships that predict the properties of unknown compounds fitting the relationship, it reduces the burden of experimental studies and provides an alternative to animal studies [14]. Stable and reliable QSAR models rely on the calculation and selection of molecular descriptors. Thus, choosing appropriate mixing rules for the mixture descriptors used in the models is particularly important for model quality, owing to the complexity of the toxicity mechanisms involved [15].
QSAR methods have been used to evaluate the damage to aquatic ecosystems caused by aquatic toxicants. For instance, in the article by Yaqian Wang, seven phenolic and four aliphatic phenolic derivatives in wastewater, including 2,4,6-trihalophenol and 2,6-dihalogenated-4-nitrophenol, were studied by these methods [16]. In the work by Stefano Cassani et al., QSAR models based on forward stepwise multiple linear regression (MLR), partial least squares regression (PLS), and associative neural network (ASNN) methods were developed for triazoles and benzotriazoles, and the mixture toxicity between them was predicted and analyzed [17]. In turn, the mixture effects of drug molecules on aquatic ecosystems were studied by Kabiruddin Khan et al. [18]. Such evaluations of toxic substances are especially useful for mixtures.
The aim of this study was to develop stable QSAR models that can be used to predict the toxicity of aquatic binary mixtures at the EC50 level. The models were developed by a regression-based MLR modeling technique and a nonlinear radial basis function neural network (RBFNN) modeling technique. To obtain the proper mixture descriptors, multiple parameter combinations were used for preliminary modeling: in addition to simple CA, 10 further mixing rules were chosen for modeling comparisons. Finally, the best combination of parameters was selected based on the preliminary modeling results. A schematic diagram of the entire approach is presented in Figure 1, and more details about the methods can be found in Section 3.

2. Results

2.1. Model Development for Individual Compounds

As shown in Figure 1, 35 compounds present in the aqueous environment were used for model development. First, a total of 614 descriptors were obtained after molecular optimization. Then, 348 nonconforming descriptors were eliminated by the heuristic method (HM) in the CODESSA software, and a further 85 descriptors were pruned from the descriptor pool after descriptor relevance screening, leaving 181 descriptors for model building. Through the forward stepwise multiple regression method, five representative descriptors were selected for the construction of both the MLR and the RBFNN models. (The total and the eliminated descriptors in this study are listed in the Supplementary Materials.) To evaluate the models, leave-many-out (LMO) cross-validation and Y-randomization tests were performed. For the individual compounds, the results are listed in Table 1.
As seen from the table, five descriptors were chosen for the construction of the relationship through a forward stepwise multiple linear regression approach: (1) minimum atomic state energy for a C atom (Min-C); (2) relative number of N atoms (Rn-N); (3) total point charge component of the molecular dipole (Tot-pc); (4) maximum e-n attraction for a C–H bond (Max-C-H); and (5) HOMO energy (HOMO). The model is expressed by Equation (1):
pEC50 = 155.63 − 1.2453 × [Min-C] − 7.7850 × [Rn-N] + 0.27723 × [Tot-pc] − 0.33616 × [Max-C-H] + 0.23028 × [HOMO]
The statistical parameters were Ntrain = 28, R2 = 0.887, RMS = 0.398, F = 204.660, Q2LOO = 0.887, and R2pred = 0.938; and Ntest = 7, R2 = 0.987, RMS = 0.297, F = 374.332, and q2ext = 0.933. These results indicate that the model is statistically reliable for the internal training set and has good predictive power for the external test set (the minimum accepted value for Q2LOO, R2pred, and q2ext is 0.5, and for R2 is 0.6; in addition, a smaller RMS value and a larger F value indicate a higher-quality model). Table 2 lists the compound names together with the experimental pEC50 values, predicted pEC50 values, and residuals under each of the two models, and Figure 2 plots the experimental against the predicted values for the training and test sets under each model.
The correlation between the individual descriptors must be assessed before binary mixture descriptors are generated. Pairwise correlation coefficients below 0.8 [19] indicate that the descriptors are sufficiently independent of each other to avoid chance correlation effects due to interdependence. We therefore examined the cross-correlation matrix of the five descriptors, as shown in Table 3.
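The pairwise screening described above can be sketched in a few lines. This is a minimal illustration, not the authors' CODESSA workflow: the function names and the toy descriptor values are hypothetical, and only the 0.8 cutoff comes from the text.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def collinear_pairs(descriptors, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| at or above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(descriptors.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical toy values, NOT data from the paper.
toy = {
    "HOMO": [-9.1, -9.8, -10.2, -8.7, -9.5],
    "Rn-N": [0.10, 0.00, 0.25, 0.05, 0.15],
    "HOMO_shifted": [-9.0, -9.7, -10.1, -8.6, -9.4],  # deliberately redundant
}
flagged = collinear_pairs(toy)
```

Here only the deliberately redundant pair is flagged; a descriptor set passes the screen when the returned list is empty.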

2.2. Model Development for Binary Mixture Compounds

Next, the five descriptors were used to build the mixture models. It has been reported that binary mixture descriptors can be generated simply by addition [20]; because this does not always hold, 11 different mixing rules (Table 1) were applied to generate the binary mixture descriptors. The mixing rules were compared in terms of model quality, and the ninth mixing rule also gave greater weight to the contribution of the descriptors relative to the molar ratio [11]; this mixing rule was therefore used to construct the mixture model. Its equation is
$$D_{\mathrm{MIX}} = X_1 D_1^{3} + X_2 D_2^{3}$$
In addition, this rule allows the component with the larger descriptor value to dominate when the component descriptors differ greatly.
Finally, the MLR equation for the mixtures, constructed from the selected mixing rule and descriptors, is as follows:
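A small numerical sketch makes the dominance effect concrete. It assumes that the ninth rule is the mole-fraction-weighted sum of cubed component descriptors and that X1 and X2 are mole fractions; the function name and the input values are hypothetical.

```python
def mixture_descriptor(x1, d1, x2, d2, power=3):
    """Weighted mixture descriptor: x1 * d1**power + x2 * d2**power.

    power=3 corresponds to the cubic rule as read here (an assumption);
    power=1 gives the simple weighted addition used for comparison.
    """
    return x1 * d1 ** power + x2 * d2 ** power

# Hypothetical equimolar mixture whose component descriptors differ fivefold.
d_cubic = mixture_descriptor(0.5, 2.0, 0.5, 10.0)
d_linear = mixture_descriptor(0.5, 2.0, 0.5, 10.0, power=1)

# Share of the mixture descriptor contributed by the larger component.
share_cubic = (0.5 * 10.0 ** 3) / d_cubic
share_linear = (0.5 * 10.0) / d_linear
```

With these toy inputs the larger component accounts for roughly 99% of the cubic descriptor but only about 83% of the linear one, which is the behavior the text describes.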
pEC50 = 93.276 + 6.874 × [DRn-N] + 0.003 × [DHOMO] − 0.011 × [DTot-pc] − 0.0000779 × [DMin-C] − 0.00001334 × [DMax-C-H]
Ntrain = 62, R2 = 0.727, RMS = 0.494, F = 159.537, Q2LOO = 0.727, and Q2pred = 0.725; and Ntest = 17, R2 = 0.721, RMS = 0.508, F = 38.773, and q2ext = 0.720.
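As a rough consistency check, the reported F values can be compared with the standard relation between F and R2. The sketch below assumes one regression degree of freedom (p = 1), as for a regression of observed on predicted values; this is an assumption made for illustration, not a statement of how the software computed F.

```python
def f_from_r2(r2, n, p=1):
    """F statistic from R^2: (R^2 / p) / ((1 - R^2) / (n - p - 1)).

    p=1 is an assumption made here for illustration only.
    """
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

f_train = f_from_r2(0.727, 62)  # training set of the mixture MLR model
f_test = f_from_r2(0.721, 17)   # external test set
```

Under that assumption the rounded R2 values give F of about 159.8 and 38.8, close to the reported 159.537 and 38.773; the small differences are consistent with R2 having been rounded to three decimals.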
The internal validation parameters indicate that the model is robust and statistically reliable, while the external validation parameters indicate that it also has good predictive power. The additional statistical criteria presented in Table 4 place stricter requirements on the model. Measured against the standard thresholds (R2 > 0.6, q2ext > 0.5, and k ≈ 1, where k is the slope of the regression line through the origin), the quality of the model meets the requirements.
The values predicted for the mixtures under each of the two models are shown in Table 5; the graphs of the experimental and predicted values for the training set and the test set are shown in Figure 3. Moreover, Figure 4 shows the scatter plots of the residuals for all data under both models.

2.3. RBFNN Results Analysis

Nonlinear models often have greater predictive power than linear ones. In the current work, an RBFNN model was constructed using the same descriptors as those used to construct the MLR model, and the quality of the model was evaluated using a random division into a training set and a test set. The three-layer network had a 5-nk-1 structure, denoting the numbers of units in the input, hidden, and output layers. For the radial basis function (RBF), the width (r) was scanned from 0.1 to 4 in increments of 0.1. The optimal width found for the individual compound RBFNN model was r = 4, and the optimal width for the mixture model was r = 1.6.
The prediction data of the RBFNN models for the individual compounds and the binary mixtures are shown in Table 2 and Table 5, respectively, and the plots of the experimental and predicted values for the training set and the test set are shown in Figure 2 and Figure 3, respectively. In addition, the scatter plots of the residuals of the two models are shown in Figure 4. The statistical parameters for the individual compounds were Ntrain = 28, R2 = 0.864, RMS = 0.436, F = 165.309, Q2LOO = 0.864, and R2pred = 0.847 for the training set, and Ntest = 7, R2 = 0.941, RMS = 0.466, F = 79.300, and q2ext = 0.834 for the test set, indicating that the model is reliable and predictive. The statistical parameters of the nonlinear model for the mixtures were Ntrain = 62, R2 = 0.956, RMS = 0.199, F = 1279.919, Q2LOO = 0.955, and Q2pred = 0.855 for the training set; and Ntest = 17, R2 = 0.880, RMS = 0.367, F = 110.980, and q2ext = 0.853 for the test set. These results show that both the robustness and the predictive power of the model are improved compared with the MLR model. The other model quality parameters are given in Table 4.

2.4. Validation of the Models

Y-randomization tests are typically applied to assess the degree of chance correlation in regression models. In the current work, 10 tests were conducted for each of the two models, and the validation parameters for both models are shown in Table 6. As seen from the table, the low R2 and RMS, along with the high MAE values (see below), indicate that chance correlation in the models is negligible [21].
The expression for MAE is
$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \left| y(a) - \hat{y}(b) \right|}{n}$$
where $n$ is the number of examples, $y(a)$ is the experimental value of a single example, and $\hat{y}(b)$ is the predicted value of a single example.
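The Y-randomization procedure and the MAE can be sketched as follows. This is a simplified illustration with a single-descriptor least squares model standing in for the MLR and RBFNN models; all function names and the toy data are hypothetical, and only the count of 10 trials is taken from the text.

```python
import random

def mae(y_obs, y_pred):
    """Mean absolute error: sum(|y_obs - y_pred|) / n."""
    return sum(abs(a - b) for a, b in zip(y_obs, y_pred)) / len(y_obs)

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x

def r_squared(y_obs, y_pred):
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_obs, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_obs)
    return 1.0 - ss_res / ss_tot

def score(x, y):
    """Fit the stand-in model and return (R^2, MAE) on the fitted data."""
    slope, intercept = fit_line(x, y)
    pred = [slope * v + intercept for v in x]
    return r_squared(y, pred), mae(y, pred)

def y_randomization(x, y, n_trials=10, seed=0):
    """Refit the model n_trials times on randomly permuted responses."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)
        results.append(score(x, y_perm))
    return results

# Hypothetical toy data, NOT values from the paper.
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_toy = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 15.8]
real_r2, real_mae = score(x_toy, y_toy)
scrambled = y_randomization(x_toy, y_toy)
mean_scrambled_r2 = sum(r2 for r2, _ in scrambled) / len(scrambled)
```

A model that is not a product of chance correlation should show a clearly lower R2 on the scrambled responses than on the real ones, as in this toy case.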
Next, a fivefold cross-validation algorithm was applied to assess the robustness of the models. The validation parameters R2, F, and RMS were used to evaluate the two models, as shown in Table 7 and Table 8. The average training quality (MLR: R2 = 0.727, F = 163.334, and RMS = 0.496; and RBFNN: R2 = 0.935, F = 881.963, and RMS = 0.237) and the average predictive quality (MLR: R2 = 0.737, F = 41.147, and RMS = 0.494; and RBFNN: R2 = 0.939, F = 233.952, and RMS = 0.243) of both models are good, indicating that both models are relatively robust.
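The fold construction behind a fivefold cross-validation can be sketched as below. The function name and seed are hypothetical, and it is assumed for illustration that all 79 mixtures are split; the text does not state which subset was used.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and return k (train, test) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    splits = []
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        splits.append((train, test))
    return splits

splits = k_fold_indices(79)  # 79 assumed here: the number of binary mixtures
fold_sizes = [len(test) for _, test in splits]
```

Each sample appears in exactly one test fold, so averaging R2, F, and RMS over the five splits uses every sample once for prediction.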

2.5. Model Applicability Domain Analysis

In the present study, the applicability domains of the models were visualized and analyzed using Williams plots, shown in Figure 5, where the horizontal coordinate is the leverage value and the vertical coordinate is the standardized cross-validation residual. The outlier criterion (red line) for the x coordinate is set to 3m/n (m = 5 is the number of descriptors chosen and n = 62 is the number of compounds in the training set), and the outlier criterion (red line) for the y coordinate is ±3σ (σ = 0.967). Figure 5 shows that both the training set compounds and the test set compounds lie within the domain, indicating that a reasonably close relationship can be established between the selected descriptors and the toxicity of the compounds and that, within the domain, the model can reliably predict mixture toxicity.
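The leverage values on the x axis of a Williams plot are the diagonal of the hat matrix, h_i = x_i (XᵀX)⁻¹ x_iᵀ. The sketch below computes them with a small Gauss-Jordan inverse; the helper names and the toy matrix are hypothetical, and the threshold uses the 3m/n criterion with m = 5 and n = 62 quoted in the text.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def leverages(X):
    """Diagonal of the hat matrix X (X^T X)^-1 X^T, one value per row of X."""
    m = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    inv = invert(xtx)
    return [sum(row[a] * inv[a][b] * row[b] for a in range(m) for b in range(m))
            for row in X]

# Hypothetical toy design matrix (constant column plus one descriptor).
X_toy = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
h = leverages(X_toy)

# Warning leverage from the text: 3m/n with m = 5 descriptors, n = 62 samples.
h_star = 3 * 5 / 62
```

A sample whose leverage exceeds h_star, or whose standardized residual lies outside ±3σ, would fall outside the applicability domain.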

2.6. Discussion of Selected Descriptors

In the present work, the built model can be used to predict the toxicity of compounds including aldehydes (AHS), cyanides (CGS), sulfonamides (SAS), and methomyl (TMP). These substances contain C, H, O, N, and S atoms. Five descriptors, namely, HOMO, Rn-N, Tot-pc, Max-C-H, and Min-C, were used to construct the QSAR model. HOMO and Rn-N have a positive effect on the increase in toxicity, and Tot-pc, Max-C-H, and Min-C have a negative effect on the increase in toxicity. Rn-N is a constitutional descriptor that, in the present work, mainly reflects organic molecules containing cyano groups and nitrogen-containing heterocycles, and it is positively correlated with the increase in toxicity. HOMO, Tot-pc, Max-C-H, and Min-C are quantum-chemical descriptors. HOMO characterizes the ability of a molecule to act as an electron donor in a chemical reaction; the higher its energy is, the higher the toxicity value. Tot-pc describes the polarity of a molecule, where the size of the molecular dipole depends on the distribution of the point charges. Max-C-H describes the electron–nuclear attraction between the two atoms in a C–H bond. When a bond is formed, more electronegative atoms participate in the bonding orbitals to gain some of the electrons, and the electron–nuclear attraction decreases, resulting in a decrease in electronegativity, which has a positive effect on the decrease in toxicity. Min-C is the calculated ground-state energy of the C atom; the lower the energy is, the more stable the molecule and the lower the toxicity value.

3. Materials and Methods

3.1. Datasets

In the present work, 35 compounds widely present in aqueous environments, including 13 aldehydes (AHS), 11 cyanides (CGS), 10 sulfonamides (SAS), and 1 methomyl (TMP), were obtained from Mainak Chatterjee et al. [22]. In an aqueous environment, these compounds can damage the environment either as individual compounds or in mixtures, which directly or indirectly affects human health. The names of the individual compounds and the experimental, predicted, and residual pEC50 values from the MLR model and the RBFNN model are presented in Table 2 (pEC50 units are in moles per liter).
Additionally, the compounds in Table 2 were randomly divided into a training set of 28 compounds and a test set of 7 compounds (marked with an asterisk) to assess the performance of the individual compound models. Molecular descriptors highly correlated with single-compound toxicity were selected using the whole dataset.
Seventy-nine binary mixtures are listed in Table 5 along with the values obtained by each of the two modeling techniques. Toxicity ratios for the binary mixtures of two individual compounds and compositional information are also listed in this table. As is usual in QSAR studies, the 79 binary mixtures were randomly divided into a training set of 62 mixtures and a test set of 17 mixtures (marked with an asterisk) to assess the quality of the binary mixture models.

3.2. Molecular Descriptors Generation and Selection

In this study, molecular descriptors were employed as quantitative representations of molecular structural features and were then used to build the relationship between the representative descriptors and the target toxicity or activity. The process was as follows. The structures of the individual compounds were first drawn in ChemDraw (PerkinElmer Informatics, Inc., Massachusetts, MA, USA) [23], followed by preliminary molecular optimization in the HyperChem 6.0 program (Hypercube, Inc., Waterloo, ON, Canada) [24] using the molecular mechanics MM+ force field. The preliminarily optimized structures were then further optimized using the semiempirical AM1 method with the Polak-Ribière algorithm until the root-mean-square gradient reached 0.001 kcal/mol. Last, the molecular structures were optimized in the MOPAC 6.0 software package (Indiana University, Bloomington, IN, USA) [25] to the same root-mean-square gradient, and the structure files derived from HyperChem and MOPAC were used to calculate structural, geometric, electrostatic, quantum chemical, and topological descriptors in the CODESSA 2.63 program (University of Florida, Gainesville, FL, USA) [26]. Furthermore, seven other descriptors obtained from HyperChem (including logP) were added to the descriptor pool.
We then needed to identify the representative descriptors most closely related to the toxicity of the single compounds. The heuristic method in the CODESSA software was therefore employed; it screens a pool of relevant descriptors and then determines the most suitable descriptors for the construction of the model.
Generating descriptors for the toxicity assessment of binary mixtures is more challenging than generating descriptors for individual compounds. Normally, mixture descriptors are generated by the weighted descriptor approach [27]. On the supposition that the hypothetical descriptors do not follow simple addition, other calculation rules were also applied [11,28,29,30,31], and the optimal mixing rule was determined from the preliminary modeling results. For each mixing rule, the selection criteria were the reliability and robustness of the model (R2 (correlation coefficient), R2adj (adjusted correlation coefficient), F (Fisher test), and Q2LOO (leave-one-out correlation coefficient)) and the form of the equation, as shown in Table 1.

3.3. Model Building Technique

3.3.1. Multiple Linear Regressions

As the simplest model-building statistical technique, MLR has been commonly used in quantitative structure–activity relationship models to solve regression problems. It predicts the value of a response variable from two or more explanatory variables and is essentially an extension of ordinary least squares (OLS) regression to several explanatory variables. Typically, multiple linear regression uses molecular descriptors as X variables to establish a mathematical relationship with the desired activity value Y (pEC50), with the overall dataset divided into a training set and a test set. In a regression model, the regression coefficients bn and the intercept b0 are related as follows:
$$Y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$
The reliability and predictiveness of the model are generally assessed using statistical parameters, including R2, RMS, F, Q2LOO, R2pred, and q2ext. The MLR model was developed in the CODESSA 2.63 program (University of Florida, Gainesville, FL, USA).
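For readers who want to see the least squares step explicitly, the following sketch fits the coefficients b0 to bn by solving the normal equations. It is a generic illustration, not the CODESSA implementation; the function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("descriptors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def fit_mlr(X, y):
    """Return [b0, b1, ..., bn] minimizing the squared error of Y = b0 + sum(bi * xi)."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant column for b0
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)

def predict(coefs, x):
    """Evaluate Y = b0 + b1*x1 + ... + bn*xn for one sample x."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))

# Hypothetical noise-free toy data generated from Y = 1 + 2*x1 - 3*x2.
X_toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_toy = [1.0, 3.0, -2.0, 0.0, 2.0]
coefs = fit_mlr(X_toy, y_toy)
```

On noise-free data the fit recovers the generating coefficients; with real descriptors the same routine returns the least squares estimates.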

3.3.2. Radial Basis Function Neural Networks (RBFNN)

When constructing a QSAR, one can relate the molecular descriptors to the desired activity value (pEC50) not only through the best available multivariate linear model but also through nonlinear models such as the RBFNN. The specifics of radial basis function neural networks have been described in several papers [32,33]. Briefly, a radial basis function neural network consists of an input layer, a hidden layer, and an output layer. The input layer simply passes on the input vector and does not process information; the hidden layer consists of k radial basis function (RBF) units; and the output layer is composed of linear neurons (LNS) [32,34]. In general, the radial basis function (RBF) is a Gaussian function defined by a center (cj) and a width (rj). It implements the nonlinear transformation by measuring the Euclidean distance between the input vector (X) and the center of the radial basis function (cj):
$$h_j = \exp\left(-\frac{\lVert X - c_j \rVert^{2}}{r_j^{2}}\right)$$
$$y_k(X) = \sum_{j=1}^{n_h} w_{kj}\, h_j(X) + b_k$$
where yk is the kth output unit for the input vector X, wkj is the weight connecting the kth output unit and the jth hidden-layer unit, nh is the number of hidden-layer units, and bk is the respective bias.
The determination of the centers and widths plays a decisive role in model development. Several methods are available for selecting centers; in the current study, a forward subset selection routine was used to select the centers from the training set samples. The width was scanned from 0.1 to 4 in increments of 0.1, and the best width was ultimately selected. Afterwards, the connection weights between the hidden and output layers were determined using the least squares method. The RBFNN model was developed in MATLAB (available online: https://www.mathworks.com/products/matlab.html, accessed on 25 November 2021).
The RBFNN model was evaluated using the same statistical parameters as the MLR model.
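The forward pass of such a network and the width grid can be sketched as follows. This is a minimal illustration assuming Gaussian hidden units with a single shared width; the function names, centers, and weights are hypothetical, and the center selection and least squares training steps are not reproduced here.

```python
from math import exp

def rbf_activation(x, center, width):
    """Gaussian hidden unit: exp(-||x - c||^2 / r^2)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return exp(-dist_sq / width ** 2)

def rbfnn_output(x, centers, width, weights, bias):
    """Single linear output: weighted sum of hidden activations plus bias."""
    hidden = [rbf_activation(x, c, width) for c in centers]
    return sum(w * h for w, h in zip(weights, hidden)) + bias

# Width grid described in the text: 0.1 to 4 in steps of 0.1.
widths = [round(0.1 * i, 1) for i in range(1, 41)]

# Hypothetical two-unit network with two inputs (the paper uses five descriptors).
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [2.0, -1.0]
y_at_center = rbfnn_output([0.0, 0.0], centers, 1.0, weights, 0.5)
```

In a full workflow, each candidate width in the grid would be used to fit the output weights by least squares, and the width giving the best validation statistics would be kept.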

4. Conclusions

Toxicity estimation of 79 aquatic binary mixtures was performed by quantitative structure–activity relationship modeling through MLR and RBFNN methods. Eleven different mixing rules for the hypothetical descriptors were considered to obtain the proper models. The statistical results show that the developed MLR model is robust and predictive, while the RBFNN model has better model quality than the MLR model. Furthermore, the statistical results show that the selected descriptors perform well for the toxicity of mixtures within the applicability domain. We conclude that the models can be effective for the toxicity prediction of aquatic contaminants and have practical value for ecological assessment.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/molecules27196389/s1. The total and the eliminated descriptors.

Author Contributions

Conceptualization, M.J., F.L. and M.N.D.S.C.; methodology, M.J., F.L., L.Z. and X.Z.; software, M.J., F.L., C.T. and X.Z.; validation, M.J., L.Z., F.L., C.T. and X.Z.; formal analysis, C.T. and F.L.; resources, F.L. and X.Z.; writing: original draft preparation, M.J.; writing: review and editing, F.L., C.T. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Natural Science Foundation of Shandong Province in China (ZR2021MB024) and National Natural Science Foundation of China (21778047, 21675138, 21705139).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

There are no conflicts to declare.

References

  1. Wang, T.; Zhou, X.; Wang, D.; Yin, D.; Lin, Z. Using molecular docking between organic chemicals and lipid membrane to revise the well known octanol–water partition coefficient of the mixture. Environ. Toxicol. Pharmacol. 2012, 34, 59–66.
  2. Escher, B.I.; Baumer, A.; Bittermann, K.; Henneberger, L.; König, M.; Kühnert, C.; Klüver, N. General baseline toxicity QSAR for nonpolar, polar and ionisable chemicals and their mixtures in the bioluminescence inhibition assay with Aliivibrio fischeri. Environ. Sci. Processes Impacts 2017, 19, 414–428.
  3. Schwarzenbach, R.P.; Escher, B.I.; Fenner, K.; Hofstetter, T.B.; Johnson, C.A.; von Gunten, U.; Wehrli, B. The challenge of micropollutants in aquatic systems. Science 2006, 313, 1072–1077.
  4. Rand, G.M.; Wells, P.G.; McCarty, L.S. Introduction to aquatic toxicology. In Fundamentals of Aquatic Toxicology; CRC Press: Boca Raton, FL, USA, 2020; pp. 3–67.
  5. Yang, R.S.; Thomas, R.S.; Gustafson, D.L.; Campain, J.; Benjamin, S.A.; Verhaar, H.J.; Mumtaz, M.M. Approaches to developing alternative and predictive toxicology based on PBPK/PD and QSAR modeling. Environ. Health Perspect. 1998, 106 (Suppl. 6), 1385–1393.
  6. Heys, K.A.; Shore, R.F.; Pereira, M.G.; Jones, K.C.; Martin, F.L. Risk assessment of environmental mixture effects. RSC Adv. 2016, 6, 47844–47857.
  7. USEPA. Guidelines for the health risk assessment of chemical mixtures. Fed. Regist. 1986, 51, 34014–34025.
  8. Kortenkamp, A.; Backhaus, T.; Faust, M. State of the art report on mixture toxicity. Contract 2009, 70307, 94–103.
  9. Altenburger, R.; Backhaus, T.; Boedeker, W.; Faust, M.; Scholze, M.; Grimme, L.H. Predictability of the toxicity of multiple chemical mixtures to Vibrio fischeri: Mixtures composed of similarly acting chemicals. Environ. Toxicol. Chem. Int. J. 2000, 19, 2341–2347.
  10. Bliss, C.I. The toxicity of poisons jointly applied. Ann. Appl. Biol. 1939, 26, 585–615.
  11. Bureš, M.S.; Ukić, Š.; Cvetnić, M.; Prevarić, V.; Markić, M.; Rogošić, M.; Kušić, H.; Bolanča, T. Toxicity of binary mixtures of pesticides and pharmaceuticals toward Vibrio fischeri: Assessment by quantitative structure-activity relationships. Environ. Pollut. 2021, 275, 115885.
  12. Luan, F.; Xu, X.; Liu, H.; Cordeiro, M.N.D.S. Prediction of the baseline toxicity of non-polar narcotic chemical mixtures by QSAR approach. Chemosphere 2013, 90, 1980–1986.
  13. Roy, K.; Kar, S.; Das, R.N. A Primer on QSAR/QSPR Modeling: Fundamental Concepts; Springer: Berlin/Heidelberg, Germany, 2015.
  14. Chatterjee, M.; Roy, K. Prediction of aquatic toxicity of chemical mixtures by the QSAR approach using 2D structural descriptors. J. Hazard. Mater. 2021, 408, 124936.
  15. Khan, P.M.; Rasulev, B.; Roy, K. QSPR modeling of the refractive index for diverse polymers using 2D descriptors. ACS Omega 2018, 3, 13374–13386.
  16. Wang, Y.; Liu, H.; Yang, X.; Wang, L. Aquatic toxicity and aquatic ecological risk assessment of wastewater-derived halogenated phenolic disinfection byproducts. Sci. Total Environ. 2022, 809, 151089.
  17. Cassani, S.; Kovarich, S.; Papa, E.; Roy, P.P.; Rahmberg, M.; Nilsson, S.; Sahlin, U.; Jeliazkova, N.; Kochev, N.; Pukalov, O.; et al. Evaluation of CADASTER QSAR models for the aquatic toxicity of (benzo) triazoles and prioritisation by consensus prediction. Altern. Lab. Anim. 2013, 41, 49–64.
  18. Khan, K.; Benfenati, E.; Roy, K. Consensus QSAR modeling of toxicity of pharmaceuticals to different aquatic organisms: Ranking and prioritization of the DrugBank database compounds. Ecotoxicol. Environ. Saf. 2019, 168, 287–297.
  19. Topliss, J.G.; Edwards, R.P. Chance factors in studies of quantitative structure-activity relationships. J. Med. Chem. 1979, 22, 1238–1244.
  20. Yao, Z.; Lin, Z.; Wang, T.; Tian, D.; Zou, X.; Gao, Y.; Yin, D. Using molecular docking-based binding energy to predict toxicity of binary mixture with different binding sites. Chemosphere 2013, 92, 1169–1176.
  21. Kaneko, H. Estimation of predictive performance for test data in applicability domains using y-randomization. J. Chemom. 2019, 33, e3171.
  22. Roy, K.; Kar, S.; Das, R.N. Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment; Academic Press: Cambridge, MA, USA, 2015.
  23. ChemDraw Professional 20.0.0.41; PerkinElmer Informatics, Inc.: Massachusetts, MA, USA, 2011.
  24. HyperChem 6.01; Hypercube, Inc.: Waterloo, ON, Canada, 2000.
  25. Stewart, J.P.P. MOPAC6.0, Quantum Chemistry Program Exchange, No.455; Indiana University: Bloomington, IN, USA, 1989.
  26. Katritzky, A.R.; Lobanov, V.S.; Karelson, M. CODESSA 2.63: Training Manual; University of Florida: Gainesville, FL, USA, 1995.
  27. Muratov, E.N.; Varlamova, E.V.; Artemenko, A.G.; Polishchuk, P.G.; Kuz'min, V.E. Existing and developing approaches for QSAR analysis of mixtures. Mol. Inform. 2012, 31, 202–221.
  28. Ajmani, S.; Rogers, S.C.; Barley, M.H.; Livingstone, D.J. Application of QSPR to mixtures. J. Chem. Inf. Modeling 2006, 46, 2043–2055.
  29. Gaudin, T.; Rotureau, P.; Fayet, G. Mixture descriptors toward the development of quantitative structure–property relationship models for the flash points of organic mixtures. Ind. Eng. Chem. Res. 2015, 54, 6596–6604.
  30. Qin, L.T.; Chen, Y.H.; Zhang, X.; Mo, L.Y.; Zeng, H.H.; Liang, Y.P. QSAR prediction of additive and non-additive mixture toxicities of antibiotics and pesticide. Chemosphere 2018, 198, 122–129. [Google Scholar] [CrossRef] [PubMed]
  31. Sobati, M.A.; Abooali, D.; Maghbooli, B.; Najafi, H. A new structure-based model for estimation of true critical volume of multi-component mixtures. Chemom. Intell. Lab. Syst. 2016, 155, 109–119. [Google Scholar] [CrossRef]
  32. Zeng, X.; Zhen, Z.; He, J.; Han, L. A feature selection approach based on sensitivity of RBFNNs. Neurocomputing 2018, 275, 2200–2208. [Google Scholar] [CrossRef]
  33. Derks, E.; Pastor, M.S.S.; Buydens, L.M.C. Robustness analysis of radial base function and multi-layered feed-forward neural network models. Chemom. Intell. Lab. Syst. 1995, 28, 49–60. [Google Scholar] [CrossRef]
  34. Wang, T.; Tang, L.; Luan, F.; Cordeiro, M.N.D.S. Prediction of the toxicity of binary mixtures by QSAR approach using the hypothetical descriptors. Int. J. Mol. Sci. 2018, 19, 3423. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Schematic diagram of the entire approach involved in the development of the QSAR models.
Figure 2. Plot of the predicted versus experimental log (EC50), including the training and test sets, from the MLR model and RBFNN model.
Figure 3. Plot of the predicted versus experimental log (EC50), including the training and test sets, from the MLR model and RBFNN model.
Figure 4. Residuals of the training and test sets by MLR and RBFNN.
Figure 5. The Williams plot of the training and external test sets.
Table 1. The 11 different mixing rules and statistical parameters of the individual compound model.
| No. | Equation | R2 | R2adj | F | Q2LOO |
|---|---|---|---|---|---|
| 1 | $D_{MIX} = D_1 + X_2 D_2$ | 0.730 | 0.711 | 39.395 | 0.731 |
| 2 | $D_{MIX} = (X_1 D_1 + X_2 D_2)^2$ | 0.754 | 0.737 | 44.663 | 0.756 |
| 3 | $D_{MIX} = (X_1 D_1 + X_2 D_2)^2$ | 0.735 | 0.716 | 40.419 | 0.805 |
| 4 | $D_{MIX} = X_1 D_1 + X_2 D_2$ | 0.525 | 0.493 | 16.145 | 0.498 |
| 5 | $D_{MIX} = X_1^2 D_1 + X_2^2 D_2$ | 0.569 | 0.539 | 19.257 | 0.546 |
| 6 | $D_{MIX} = X_1^3 D_1 + X_2^3 D_2$ | 0.581 | 0.552 | 20.225 | 0.557 |
| 7 | $D_{MIX} = \sqrt[3]{X_1^3 D_1 + X_2^3 D_2}$ | 0.561 | 0.531 | 18.681 | 0.541 |
| 8 | $D_{MIX} = X_1 D_1^2 + X_2 D_2^2$ | 0.734 | 0.716 | 40.304 | 0.737 |
| 9 | $D_{MIX} = X_1 D_1^3 + X_2 D_2^3$ | 0.726 | 0.707 | 38.665 | 0.727 |
| 10 | $D_{MIX} = \sqrt[3]{X_1 D_1^3 + X_2 D_2^3}$ | 0.700 | 0.679 | 34.016 | 0.704 |
| 11 | $D_{MIX} = \sqrt[2]{X_1 D_1^2 + X_2 D_2^2}$ | 0.715 | 0.696 | 36.689 | 0.719 |
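As a rough illustration of how a mixture descriptor is built from the two component descriptors, the sketch below implements only the rules in Table 1 whose printed form is unambiguous (rules 4, 5, 6, 8 and 9). The function name `mix_descriptor`, the example values, and the reading of X1 and X2 as the fractions of the two components are assumptions made for this example, not details taken from the authors' workflow.

```python
def mix_descriptor(rule, x1, d1, x2, d2):
    """Combine component descriptors d1, d2 with weights x1, x2.

    Only rules 4, 5, 6, 8 and 9 of Table 1 are covered here.
    x1 and x2 are ASSUMED to be component fractions (hypothetical reading).
    """
    rules = {
        4: lambda: x1 * d1 + x2 * d2,            # linear combination
        5: lambda: x1 ** 2 * d1 + x2 ** 2 * d2,  # squared weights
        6: lambda: x1 ** 3 * d1 + x2 ** 3 * d2,  # cubed weights
        8: lambda: x1 * d1 ** 2 + x2 * d2 ** 2,  # squared descriptors
        9: lambda: x1 * d1 ** 3 + x2 * d2 ** 3,  # cubed descriptors
    }
    if rule not in rules:
        raise ValueError(f"rule {rule} is not implemented in this sketch")
    return rules[rule]()


# Hypothetical equal-weight mixture of two descriptor values.
for rule in (4, 5, 6, 8, 9):
    print(rule, mix_descriptor(rule, 0.5, 2.0, 0.5, 4.0))
```

Each rule yields one candidate descriptor per mixture; comparing the resulting model statistics, as Table 1 does, is what selects the rule.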
Table 2. Toxicity data of the individual compounds.
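Toxicity values are pEC50 (mol/L).

| No. | Individual compound | MLR: Experimental | MLR: Predicted | MLR: Residual | RBFNN: Experimental | RBFNN: Predicted | RBFNN: Residual |
|---|---|---|---|---|---|---|---|
| 1 | Acetaldehyde | 2.36 | 2.74 | −0.38 | 2.36 | 2.7 | −0.34 |
| 2 | Propionaldehyde* | 2.72 | 2.81 | −0.09 | 2.72 | 2.74 | −0.02 |
| 3 | Butyraldehyde | 3.25 | 2.82 | 0.43 | 3.25 | 2.76 | 0.49 |
| 4 | Valeraldehyde | 3.27 | 2.81 | 0.46 | 3.27 | 2.75 | 0.52 |
| 5 | Benzaldehyde | 3.43 | 4.27 | −0.84 | 3.43 | 4.17 | −0.74 |
| 6 | p-Nitrobenzaldehyde | 4.28 | 4.17 | 0.11 | 4.28 | 4.17 | 0.11 |
| 7 | p-Terephthaldehyde | 4.31 | 4.43 | −0.12 | 4.31 | 4.27 | 0.04 |
| 8 | p-Chlorobenzaldehyde* | 4.25 | 4.25 | 0 | 4.25 | 4 | 0.25 |
| 9 | p-Bromobenzaldehyde | 4.3 | 4.31 | −0.01 | 4.3 | 4.06 | 0.24 |
| 10 | p-Hydrobenzaldehyde | 4.54 | 4.09 | 0.45 | 4.54 | 3.98 | 0.56 |
| 11 | p-Methylbenzaldehyde | 3.82 | 3.93 | −0.11 | 3.82 | 3.93 | −0.11 |
| 12 | p-Methoxybenzaldehyde | 4.03 | 4.41 | −0.38 | 4.03 | 4.31 | −0.28 |
| 13 | p-Dimethylaminobenzaldehyde | 5.4 | 5.02 | 0.38 | 5.4 | 4.75 | 0.65 |
| 14 | Malononitrile | 2.55 | 2.16 | 0.39 | 2.55 | 2.13 | 0.42 |
| 15 | Glycolonitrile | 2.98 | 2.87 | 0.11 | 2.98 | 3.01 | −0.03 |
| 16 | α-Hydroxyisobutyronitrile | 3.61 | 3.39 | 0.22 | 3.61 | 3.74 | −0.13 |
| 17 | Allyl cyanide* | 1.45 | 2.14 | −0.69 | 1.45 | 2.32 | −0.87 |
| 18 | Benzonitrile* | 3.48 | 3.57 | −0.09 | 3.48 | 3.69 | −0.21 |
| 19 | Benzyl cyanide | 4.23 | 3.18 | 1.05 | 4.23 | 3.09 | 1.14 |
| 20 | Acetonitrile | 0.75 | 0.92 | −0.17 | 0.75 | 0.92 | −0.17 |
| 21 | Acrylonitrile | 1.51 | 1.68 | −0.17 | 1.51 | 1.69 | −0.18 |
| 22 | Succinonitrile | 0.36 | 0.85 | −0.49 | 0.36 | 0.91 | −0.55 |
| 23 | Phthalonitrile | 3.51 | 3.83 | −0.32 | 3.51 | 4.05 | −0.54 |
| 24 | Lactonitrile* | 2.01 | 2.36 | −0.35 | 2.01 | 2.82 | −0.81 |
| 25 | Sulfamethazine | 4.08 | 4.33 | −0.25 | 4.08 | 4.37 | −0.29 |
| 26 | Sulfapyridine | 3.84 | 4.52 | −0.68 | 3.84 | 4.57 | −0.73 |
| 27 | Sulfamethoxazole | 4.45 | 4.69 | −0.24 | 4.45 | 4.7 | −0.25 |
| 28 | Sulfadiazine | 4.5 | 4.32 | 0.18 | 4.5 | 4.39 | 0.11 |
| 29 | Sulfisoxazole | 4.43 | 4.54 | −0.11 | 4.43 | 4.57 | −0.14 |
| 30 | Sulfamonomethoxine | 5.05 | 4.58 | 0.47 | 5.05 | 4.6 | 0.45 |
| 31 | Sulfachloropyridazine | 4.78 | 4.54 | 0.24 | 4.78 | 4.56 | 0.22 |
| 32 | Sulfachinoxalin* | 4.53 | 4.56 | −0.03 | 4.53 | 4.58 | −0.05 |
| 33 | Sulfamethoxydiazine* | 4.41 | 4.37 | 0.04 | 4.41 | 4.43 | −0.02 |
| 34 | Sulfamethoxypyridazine | 4.36 | 4.4 | −0.04 | 4.36 | 4.49 | −0.13 |
| 35 | Trimethoprim | 3.22 | 3.4 | −0.18 | 3.22 | 3.58 | −0.36 |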
* Test set compound.
Table 3. Inter-correlation between the five descriptors.
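| | Rn-N | HOMO | Tot-pc | Min-C | Max-C-H |
|---|---|---|---|---|---|
| Rn-N | +1.000 | | | | |
| HOMO | −0.297 | +1.000 | | | |
| Tot-pc | +0.217 | +0.630 | +1.000 | | |
| Min-C | −0.049 | −0.622 | −0.696 | +1.000 | |
| Max-C-H | −0.423 | +0.311 | +0.197 | −0.145 | +1.000 |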
Table 4. The statistical results of the external test set for the MLR and RBFNN models.
| Parameter | MLR | RBFNN |
|---|---|---|
| R2 | 0.721 | 0.880 |
| F | 38.773 | 110.980 |
| K | 0.999 | 1.030 |
| RMS | 0.508 | 0.367 |
| q2ext | 0.720 | 0.853 |
Table 5. The number of chemicals in the mixtures, ratio of toxic unit, experimental pEC50 mix, predicted pEC50 mix, and their corresponding residuals.
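Predicted values and residuals are pEC50 (mol/L).

| Mixture No. | Chemicals in the mixture | Ratio of toxic unit | Experimental pEC50 mix (mol/L) | MLR: Predicted | MLR: Residual | RBFNN: Predicted | RBFNN: Residual |
|---|---|---|---|---|---|---|---|
| 1 * | 1\14 | 1\1 | 2.44 | 2.1 | 0.34 | 2.75 | −0.31 |
| 2 | 2\14 | 1\1 | 2.63 | 2.1 | 0.53 | 2.71 | −0.08 |
| 3 | 3\14 | 1\1 | 2.77 | 2.1 | 0.67 | 2.72 | 0.05 |
| 4 | 4\14 | 1\1 | 2.78 | 2.09 | 0.69 | 2.72 | 0.06 |
| 5 | 5\14 | 1\1 | 2.8 | 3.1 | −0.3 | 2.9 | −0.1 |
| 6 * | 6\14 | 1\1 | 2.84 | 2.56 | 0.28 | 2.76 | 0.08 |
| 7 | 7\14 | 1\1 | 2.84 | 2.92 | −0.08 | 2.86 | −0.02 |
| 8 | 8\14 | 1\1 | 2.83 | 3.26 | −0.43 | 2.71 | 0.12 |
| 9 | 9\14 | 1\1 | 2.84 | 3.22 | −0.38 | 2.7 | 0.14 |
| 10 * | 10\14 | 1\1 | 2.85 | 3.34 | −0.49 | 2.87 | −0.02 |
| 11 | 11\14 | 1\1 | 2.83 | 3.03 | −0.2 | 2.99 | −0.16 |
| 12 * | 12\14 | 1\1 | 2.84 | 3.15 | −0.31 | 3 | −0.16 |
| 13 | 13\14 | 1\1 | 2.85 | 3.53 | −0.68 | 2.86 | −0.01 |
| 14 | 5\15 | 1\1 | 3.15 | 3.16 | −0.01 | 3.22 | −0.07 |
| 15 | 6\15 | 1\1 | 3.26 | 2.62 | 0.64 | 3.4 | −0.14 |
| 16 | 7\15 | 1\1 | 3.25 | 2.97 | 0.28 | 3.34 | −0.09 |
| 17 | 8\15 | 1\1 | 3.24 | 3.31 | −0.07 | 3.28 | −0.04 |
| 18 | 9\15 | 1\1 | 3.26 | 3.28 | −0.02 | 3.29 | −0.03 |
| 19 | 10\15 | 1\1 | 3.27 | 3.39 | −0.12 | 3.01 | 0.26 |
| 20 | 11\15 | 1\1 | 3.22 | 3.09 | 0.13 | 2.64 | 0.58 |
| 21 * | 13\15 | 1\1 | 3.28 | 3.58 | −0.3 | 2.58 | 0.7 |
| 22 * | 1\16 | 1\1 | 2.64 | 2.82 | −0.18 | 3.34 | −0.7 |
| 23 | 2\16 | 1\1 | 2.97 | 2.82 | 0.15 | 3.23 | −0.26 |
| 24 | 3\16 | 1\1 | 3.39 | 2.82 | 0.57 | 3.22 | 0.17 |
| 25 * | 5\16 | 1\1 | 3.51 | 3.82 | −0.31 | 3.84 | −0.33 |
| 26 | 6\16 | 1\1 | 3.83 | 3.28 | 0.55 | 3.84 | −0.01 |
| 27 | 7\16 | 1\1 | 3.78 | 3.64 | 0.14 | 3.83 | −0.05 |
| 28 | 8\16 | 1\1 | 3.75 | 3.98 | −0.23 | 3.86 | −0.11 |
| 29 | 9\16 | 1\1 | 3.83 | 3.94 | −0.11 | 3.83 | 0 |
| 30 * | 10\16 | 1\1 | 3.86 | 4.06 | −0.2 | 3.91 | −0.05 |
| 31 | 11\16 | 1\1 | 3.7 | 3.75 | −0.05 | 3.73 | −0.03 |
| 32 | 12\16 | 1\1 | 3.77 | 3.87 | −0.1 | 3.61 | 0.16 |
| 33 * | 13\16 | 1\1 | 3.9 | 4.25 | −0.35 | 4.45 | −0.55 |
| 34 | 1\17 | 1\1 | 2.18 | 2.08 | 0.1 | 2.16 | 0.02 |
| 35 | 3\17 | 1\1 | 2.33 | 2.09 | 0.24 | 2.05 | 0.28 |
| 36 * | 4\17 | 1\1 | 2.34 | 2.08 | 0.26 | 2.06 | 0.28 |
| 37 | 5\17 | 1\1 | 2.34 | 3.09 | −0.75 | 2.7 | −0.36 |
| 38 | 6\17 | 1\1 | 2.36 | 2.55 | −0.19 | 2.48 | −0.12 |
| 39 | 7\17 | 1\1 | 2.36 | 2.9 | −0.54 | 2.67 | −0.31 |
| 40 * | 8\17 | 1\1 | 2.36 | 3.25 | −0.89 | 2.95 | −0.59 |
| 41 | 10\17 | 1\1 | 2.36 | 3.32 | −0.96 | 2.82 | −0.46 |
| 42 | 11\17 | 1\1 | 2.35 | 3.02 | −0.67 | 2.32 | 0.03 |
| 43 | 12\17 | 1\1 | 2.35 | 3.14 | −0.79 | 2.37 | −0.02 |
| 44 | 13\17 | 1\1 | 2.36 | 3.51 | −1.15 | 2.45 | −0.09 |
| 45 | 5\18 | 1\1 | 3.45 | 3.87 | −0.42 | 3.7 | −0.25 |
| 46 * | 6\18 | 1\1 | 3.72 | 3.33 | 0.39 | 3.83 | −0.11 |
| 47 | 7\18 | 1\1 | 3.68 | 3.68 | 0 | 3.82 | −0.14 |
| 48 | 8\18 | 1\1 | 3.66 | 4.03 | −0.37 | 3.88 | −0.22 |
| 49 * | 10\18 | 1\1 | 3.74 | 4.1 | −0.36 | 3.66 | 0.08 |
| 50 | 11\18 | 1\1 | 3.62 | 3.8 | −0.18 | 3.15 | 0.47 |
| 51 | 12\18 | 1\1 | 3.67 | 3.92 | −0.25 | 3.51 | 0.16 |
| 52 | 13\18 | 1\1 | 3.78 | 4.29 | −0.51 | 3.84 | −0.06 |
| 53 | 5\19 | 1\1 | 3.67 | 3.56 | 0.11 | 3.91 | −0.24 |
| 54 | 6\19 | 1\1 | 4.25 | 3.02 | 1.23 | 3.88 | 0.37 |
| 55 * | 7\19 | 1\1 | 4.14 | 3.38 | 0.76 | 3.95 | 0.19 |
| 56 | 8\19 | 1\1 | 4.08 | 3.72 | 0.36 | 4.06 | 0.02 |
| 57 | 9\19 | 1\1 | 4.26 | 3.68 | 0.58 | 4.08 | 0.18 |
| 58 | 10\19 | 1\1 | 4.36 | 3.8 | 0.56 | 3.95 | 0.41 |
| 59 | 13\19 | 1\1 | 4.5 | 3.99 | 0.51 | 4.49 | 0.01 |
| 60 | 25\35 | 1\1 | 5.08 | 4.42 | 0.66 | 5.1 | −0.02 |
| 61 | 26\35 | 1\1 | 4.85 | 4.75 | 0.1 | 5.31 | −0.46 |
| 62 | 27\35 | 1\1 | 5.5 | 4.75 | 0.75 | 5.5 | 0 |
| 63 | 28\35 | 1\1 | 5.42 | 4.58 | 0.84 | 5.24 | 0.18 |
| 64 * | 29\35 | 1\1 | 5.45 | 4.85 | 0.6 | 5.96 | −0.51 |
| 65 | 30\35 | 1\1 | 6.01 | 4.75 | 1.26 | 5.69 | 0.32 |
| 66 * | 31\35 | 1\1 | 5.73 | 4.66 | 1.07 | 5.66 | 0.07 |
| 67 | 27\35 | 13396\1 | 3.49 | 3.94 | −0.45 | 3.58 | −0.09 |
| 68 | 27\35 | 8587\1 | 3.49 | 3.94 | −0.45 | 3.58 | −0.09 |
| 69 | 27\35 | 2747\1 | 3.49 | 3.94 | −0.45 | 3.59 | −0.1 |
| 70 * | 27\35 | 858\1 | 3.51 | 3.94 | −0.43 | 3.59 | −0.08 |
| 71 | 27\35 | 274\1 | 3.55 | 3.95 | −0.4 | 3.6 | −0.05 |
| 72 | 27\35 | 85\1 | 3.67 | 3.96 | −0.29 | 3.65 | 0.02 |
| 73 | 27\35 | 27\1 | 3.92 | 4 | −0.08 | 3.77 | 0.15 |
| 74 | 27\35 | 15\1 | 4.08 | 4.04 | 0.04 | 3.91 | 0.17 |
| 75 | 27\35 | 4\1 | 4.52 | 4.26 | 0.26 | 4.57 | −0.05 |
| 76 | 27\35 | 1\6 | 5.34 | 5.32 | 0.02 | 5.58 | −0.24 |
| 77 | 27\35 | 1\21 | 5.43 | 5.48 | −0.05 | 5.42 | 0.01 |
| 78 | 27\35 | 1\37 | 5.45 | 5.51 | −0.06 | 5.38 | 0.07 |
| 79 | 27\35 | 1\116 | 5.46 | 5.54 | −0.08 | 5.34 | 0.12 |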
* Test set compounds. A set: 1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56, 61, 66, 71, 76; B set: 2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77; C set: 3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58, 63, 68, 73, 78; D set: 4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79; T set: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75.
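The five sets listed in the footnote follow a simple rotation over the mixture numbers (1 goes to the first set, 2 to the second, and so on, wrapping around after the fifth). A minimal sketch of that assignment and of the leave-one-set-out splits is given below; the function names are hypothetical, and labelling the fifth set "T" is an assumption taken from the set names used in Tables 7 and 8.

```python
def systematic_folds(n_items, labels="ABCDT"):
    """Assign items 1..n_items to sets in rotation: 1->A, 2->B, ..., 5->T, 6->A."""
    folds = {label: [] for label in labels}
    for number in range(1, n_items + 1):
        folds[labels[(number - 1) % len(labels)]].append(number)
    return folds


def leave_one_set_out(folds):
    """Yield (held-out label, training items, test items) for every set."""
    for held_out, test_items in folds.items():
        train_items = sorted(
            number
            for label, items in folds.items()
            if label != held_out
            for number in items
        )
        yield held_out, train_items, test_items


folds = systematic_folds(79)  # 79 binary mixtures
for held_out, train_items, test_items in leave_one_set_out(folds):
    print(held_out, len(train_items), len(test_items))
```

Each pass trains on four sets and evaluates on the fifth, which matches the row layout of the validation tables.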
Table 6. The R2, RMS, and MAE values of 10 Y-randomization tests.
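| MLR: R2 | MLR: RMS | MLR: MAE | RBFNN: R2 | RBFNN: RMS | RBFNN: MAE |
|---|---|---|---|---|---|
| 0.027 | 1.347 | 1.130 | 0.139 | 1.549 | 1.258 |
| 0.013 | 1.318 | 1.130 | 0.117 | 1.531 | 1.226 |
| 0.077 | 1.410 | 1.212 | 0.120 | 1.533 | 1.226 |
| 0.089 | 1.421 | 1.153 | 0.149 | 1.556 | 1.282 |
| 0.071 | 1.404 | 1.155 | 0.141 | 1.550 | 1.258 |
| 0.001 | 1.263 | 1.101 | 0.139 | 1.549 | 1.247 |
| 0.057 | 1.388 | 1.136 | 0.174 | 1.573 | 1.325 |
| 0.084 | 1.416 | 1.187 | 0.170 | 1.571 | 1.267 |
| 0.061 | 1.392 | 1.146 | 0.145 | 1.553 | 1.258 |
| 0.040 | 1.367 | 1.156 | 0.167 | 1.569 | 1.280 |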
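The Y-randomization idea behind Table 6 can be sketched as follows: the response values are shuffled, the model is refitted, and a consistently low R2 on the shuffled responses suggests that the original fit is not a chance correlation. This is only a minimal sketch under stated assumptions: it uses a one-descriptor least-squares line in place of the paper's MLR and RBFNN models, and the function names and the synthetic data are hypothetical.

```python
import random


def r_squared(x, y):
    """R2 of a one-descriptor least-squares line (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def y_randomization(x, y, n_rounds=10, seed=0):
    """Shuffle the responses n_rounds times and return the R2 of each refit."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)  # copy so the caller's data is left untouched
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))
    return scores


# Synthetic, hypothetical data: a perfectly linear response.
x = [float(i) for i in range(20)]
y = [2.0 * value + 1.0 for value in x]
print("original R2:", r_squared(x, y))
print("shuffled R2:", [round(score, 3) for score in y_randomization(x, y)])
```

Ten rounds are used here to mirror the ten rows of Table 6; the same loop applies to any model that can be refitted on shuffled responses.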
Table 7. Validation of the MLR model.
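| Training set | R2 | F | RMS | Test set | R2 | F | RMS |
|---|---|---|---|---|---|---|---|
| B + C + D + T | 0.711 | 150.419 | 0.506 | A | 0.790 | 52.605 | 0.458 |
| A + C + D + T | 0.712 | 150.482 | 0.513 | B | 0.799 | 55.744 | 0.426 |
| A + B + D + T | 0.723 | 158.899 | 0.506 | C | 0.752 | 42.370 | 0.456 |
| A + B + C + T | 0.743 | 176.638 | 0.478 | D | 0.674 | 28.902 | 0.563 |
| A + B + C + D | 0.744 | 180.233 | 0.479 | T | 0.674 | 26.113 | 0.566 |
| Average | 0.727 | 163.334 | 0.496 | | 0.737 | 41.147 | 0.494 |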
Table 8. Validation of the RBFNN model.
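| Training set | R2 | F | RMS | Test set | R2 | F | RMS |
|---|---|---|---|---|---|---|---|
| B + C + D + T | 0.937 | 912.852 | 0.237 | A | 0.931 | 187.779 | 0.277 |
| A + C + D + T | 0.932 | 837.216 | 0.212 | B | 0.956 | 306.656 | 0.216 |
| A + B + D + T | 0.933 | 845.700 | 0.252 | C | 0.944 | 235.582 | 0.218 |
| A + B + C + T | 0.929 | 799.833 | 0.254 | D | 0.958 | 317.511 | 0.207 |
| A + B + C + D | 0.942 | 1014.217 | 0.231 | T | 0.904 | 122.236 | 0.298 |
| Average | 0.935 | 881.963 | 0.237 | | 0.939 | 233.952 | 0.243 |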
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ji, M.; Zhang, L.; Zhuang, X.; Tian, C.; Luan, F.; Cordeiro, M.N.D.S. Toxicity Assessment of the Binary Mixtures of Aquatic Organisms Based on Different Hypothetical Descriptors. Molecules 2022, 27, 6389. https://doi.org/10.3390/molecules27196389

