1. Introduction
Quantitative structure–property relationships (QSPR) remain the focus of many studies aimed at the modeling and prediction of physicochemical and biological properties of molecules. A powerful tool to help in this task is chemometrics, which uses statistical and mathematical methods to extract maximum information from a data set.
QSPR uses chemometric methods to describe how a given physicochemical property varies as a function of molecular descriptors that encode the chemical structure of the molecule. Thus, it is possible to replace costly biological tests or experimental measurements of a given physicochemical property (especially those involving hazardous, toxic or unstable compounds) with calculated descriptors, which can in turn be used to predict the responses of interest for new compounds. Chemometrics has provided new insight into the philosophy and theory behind QSPR modeling [1,2]. It has been used to estimate properties such as density, boiling point, solubility, n-octanol–water partition coefficient, Henry’s law constant and vapor pressure of chemicals. QSPR has received significant contributions from various research schools [3–8]. Various QSPR models have been proposed for estimating the properties of a series of aliphatic alcohols [9–12].
The basic strategy of QSPR is to find an optimum quantitative relationship that can be used to predict the properties of compounds, including those not yet measured. The performance of a QSPR model depends mostly on the parameters used to describe the molecular structure. Many efforts have been made to develop alternative molecular descriptors that can be derived using only the information encoded in the chemical structure. Much attention has been concentrated on “topological indices” derived from the connectivity and composition of a molecule, which have made significant contributions to QSPR studies. Topological indices are simple and quick to compute, which has attracted the attention of scientists. Topological descriptors can account for most of the property modeled, as shown by some researchers [13].
In order to investigate the quantitative structure–property relationship of aliphatic alcohols, the molecular structure ROH is divided into two parts, R and OH, to generate structural parameters. We propose that the property of an aliphatic alcohol is affected by three main factors: the alkyl group R, the substituent group OH, and the interaction between R and OH. Due to the simplicity and efficiency of graph-theoretical approaches, our group recently introduced a set of novel topological indices to establish quantitative relationships between the physicochemical properties and molecular structure of organic compounds [14–17]. On the basis of the polarizability effect index (PEI) previously developed by Cao, the novel molecular polarizability effect index (MPEI), combined with the odd–even index (OEI) and the sum of eigenvalues of the bond-connecting matrix (SX1CH) previously developed in our team, was used to predict the properties of aliphatic alcohols.
The main goal of the present study was to obtain QSPR models of the boiling point (BP), n-octanol–water partition coefficient (lg POW), water solubility (lg W) and the chromatographic retention indices (RI) for aliphatic alcohols using only calculated descriptors. First, numerical descriptors that encode structural information for the compounds in the data set were calculated. Then, multiple linear regression analysis was used to build the QSPR models. No physical property parameter was used in these models, so prediction could be carried out directly from the molecular structure.
2. Methodology
The QSPR study of these aliphatic alcohols was performed in four fundamental stages: (1) selection of the data set; (2) generation of molecular descriptors; (3) multiple linear regression statistical analysis; and (4) model validation. The descriptive power of each model was characterized by the multiple correlation coefficient (R), the Fisher ratio (F) and the standard deviation (s). Model applicability was further examined by plotting predicted data against experimental data for all the compounds.
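The regression stage and the three statistics named above can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' Matlab/Origin workflow: the function name `fit_mlr`, the synthetic descriptor matrix and the coefficients used to generate it are all hypothetical, and the formulas for R, s and F are the usual ordinary-least-squares conventions, assumed here because the text does not spell them out.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns the coefficients (intercept first), the multiple correlation
    coefficient R, the standard deviation s of the fit and the Fisher ratio F.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                               # n compounds, p descriptors
    A = np.column_stack([np.ones(n), X])         # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)                # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    r = np.sqrt(1.0 - ss_res / ss_tot)
    s = np.sqrt(ss_res / (n - p - 1))
    f = ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1))
    return coef, r, s, f

# Synthetic stand-in data (NOT the paper's data): 58 "compounds", 2 "descriptors".
rng = np.random.default_rng(0)
X = rng.uniform(size=(58, 2))
y = 10.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.05, size=58)

coef, r, s, f = fit_mlr(X, y)
print("coefficients:", coef)
print("R =", r, " s =", s, " F =", f)
```

With real data, `X` would hold the calculated descriptor values (for example OEI and MPEI) and `y` the experimental property.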
All calculations were run on a Pentium IV personal computer with Windows XP as the operating system. The descriptors were computed using Matlab 6.5 programs. The Origin program package was employed for regression analysis [18].
2.1. Data Set
Alcohols are toxic materials and thus represent dangerous environmental pollutants, especially when large quantities are accidentally released into the environment. Alcohols are also technologically important materials and are used in the manufacture of a large number of products. In this work, 58 aliphatic alcohols were studied. The corresponding experimental data (boiling points at 1 atm) were obtained from the literature [19]. The water solubility (lg W) and n-octanol/water partition coefficients (lg POW) of the alcohols were taken from the literature [20]. The Kovats retention indices were taken from the literature [21]; they were obtained on six different stationary phases of low to medium polarity (SE-30, OV-3, OV-7, OV-11, OV-17 and OV-25). All of these data are in agreement with a standard source.
2.2. Definition of the Topological Indices
Descriptors encoding significant structural information are used to represent the physicochemical characteristics of the compounds and to build the relationship between structure and property in this study. According to the basic factors that influence the properties of the aliphatic alcohols, the following molecular descriptors were generated to build the QSPR model: the molecular polarizability effect index (MPEI), related to the polarizability of the molecule and the intramolecular action of the solute; the odd–even index (OEI), which reflects the size of the molecule and the connection of each atom; and the sum of the eigenvalues of every H–C bond adjacency matrix (SX1CH), related to the property of the chemical bond. The OEI and SX1CH indices reflect the property of the apolar R group and represent the R contribution to the physicochemical properties to be predicted. The MPEI index reflects the property of the polar OH group and represents the OH contribution and the R/OH interaction contribution. A complete list of the compound names and the calculated values of the molecular descriptors appearing in the QSPR models is given in Tables 3, 5 and 6.
2.2.1. The Odd–Even Index OEI
The odd–even index was defined for the alkane molecule in our previous paper [14]; it reflects the size of the molecule and the connection of each atom. The index is restated briefly as follows:
where N is the number of vertices in the molecular graph and S is the matrix derived from the distance matrix D. The elements of S are the squares of the reciprocal distances, i.e., $S_{ij} = (D_{ij})^{-2}$ (when i = j, let $S_{ij} = 0$). Taking 3-hexanol as an example to illustrate the calculation of OEI: first, we convert the structure of the molecule into that of the corresponding hexane. Figure 1 shows the hydrogen-suppressed molecular graph of 3-hexanol, where the vertices are numbered arbitrarily. Then, we use the matrix D to represent the distances $D_{ij}$ of the molecule.
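The matrix construction described above can be sketched in Python. The distance matrix D and the reciprocal-square matrix S follow the definitions given in the text. The final summation is an assumption: the defining equation is not reproduced here, so the sketch uses the alternating-sign form OEI = ΣΣ (−1)^(D_ij − 1) S_ij, which should be checked against reference [14] before use. The function names and the 0-based vertex numbering are hypothetical.

```python
import numpy as np

def distance_matrix(n_vertices, bonds):
    """Topological distance matrix D of a connected hydrogen-suppressed graph."""
    D = np.full((n_vertices, n_vertices), np.inf)
    np.fill_diagonal(D, 0.0)
    for i, j in bonds:
        D[i, j] = D[j, i] = 1.0
    for k in range(n_vertices):          # Floyd-Warshall shortest paths
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def s_matrix(D):
    """S_ij = 1 / D_ij**2 off the diagonal, and 0 on the diagonal."""
    S = np.zeros_like(D)
    off_diagonal = D > 0
    S[off_diagonal] = 1.0 / D[off_diagonal] ** 2
    return S

def odd_even_index(D):
    """ASSUMED form: sum over all i, j of (-1)**(D_ij - 1) * S_ij."""
    S = s_matrix(D)
    sign = np.where(D % 2 == 1, 1.0, -1.0)   # +1 for odd distances, -1 for even
    return float((sign * S).sum())

# Six-vertex chain, the carbon skeleton of hexane.
hexane_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
D = distance_matrix(6, hexane_bonds)
S = s_matrix(D)
oei = odd_even_index(D)
print(D)
print(oei)
```

Branched skeletons only need a different bond list; the rest of the calculation is unchanged.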
2.2.2. The Molecular Polarizability Effect Index MPEI
In the preceding paper [16], the polarizability effect index (PEI) for alkyl groups of alkane molecules was developed and calculated. It quantitatively indicates the relative proportion polarizability effect of the alkyl groups. The PEI values of some normal alkyls and the increments ΔPEI are listed in Table 1. As with aliphatic alcohols, the contribution of the property arising from relative proportion polarizability effect of alkyl groups is expressed as:
where ΔPEI is the polarizability effect index increment of the ith essential unit and can be directly taken from Table 1.
For the aliphatic alcohol molecules, the substituent R contains other atoms besides carbon and hydrogen, so $\alpha_i$ is no longer a constant and Equation (2) will not work well. It needs to be modified. Here, we use Equation (3) to evaluate the stabilizing energy caused by the polarizability effect for a substituent R:
where $K_m = -q^2/(2Dl^4)$ and $\alpha_i$ is the polarizability (unit $10^{-24}$ cm$^3$) of the ith atom in substituent R. Some atomic $\alpha_i$ values are listed in Table 2. Because $K_m$ is a constant, this work only calculates the term $\sum \alpha_i(\Delta\mathrm{PEI})$ of Equation (3). The sum of $\sum \alpha_i(\Delta\mathrm{PEI})$ over all groups in a molecule is taken as the molecular polarizability effect index (MPEI), which is expressed as [16]:
The molecule of 2-methyl-1-propanol is taken as an example to illustrate the calculation of the molecular polarizability effect index. Figure 2 shows its hydrogen-suppressed molecular graph, where the carbon atoms are numbered according to their distance from the hydroxyl group. Taking the carbon atom bonded to the hydroxyl group as the starting point, the MPEI index is calculated as below:
2.2.3. Eigenvalues of Bond-Connecting Matrix (SX1CH)
Recently, we introduced the X1CH index to evaluate bond dissociation energy for the alkane molecule [15]. Here, we also convert the structure of the aliphatic alcohol molecule to that of the corresponding alkane. Consider the molecule of 2-methyl-1-propanol, for which the corresponding alkane is 2-methylbutane. If an H atom is connected to the ith carbon atom ($\mathrm{C}_i$), then when the H–$\mathrm{C}_i$ bond is broken, two radicals, H and $\mathrm{R}_i$, are formed (Figure 3).
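According to the calculation method for the PEI of alkyl groups in paper [16] and the values in Table 1, we can calculate the PEI for the two radicals above as follows:
Then, $\mathrm{PEI_H}$ and $\mathrm{PEI}_1$ were used as the main diagonal elements to build the bonding adjacency matrix $B_{\mathrm{CH}}$ of the H–$\mathrm{C}_1$ bond:
$$B_{\mathrm{CH}} = \begin{pmatrix} \mathrm{PEI_H} & 1 \\ 1 & \mathrm{PEI}_1 \end{pmatrix}$$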
The off-diagonal element “1” in the matrix means that the H atom and $\mathrm{C}_1$ are connected with each other, i.e., they are adjacent. Solving the matrix $B_{\mathrm{CH}}$ by computer, we obtained two eigenvalues, X1CH = −0.5518 and X2CH = 1.8121 (let X1CH < X2CH). The eigenvalues of every H–C bond adjacency matrix in a molecule are calculated with the same method. Finally, summing X1CH over all $B_{\mathrm{CH}}$ matrices gives the parameter SX1CH; in other words, SX1CH = ∑X1CH. For 2-methylbutane, there are:
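As a numerical check on the eigenvalue step described above, the following Python sketch builds the 2×2 bond adjacency matrix for the H–C1 bond and solves it with NumPy. The diagonal entries are assumptions: the PEI values of the two radicals are not reproduced in this text, so PEI_H = 0 and PEI_1 = 1.2603 were back-calculated from the reported eigenvalues (their sum gives the trace, 1.2603, and their product, about −1.0000, gives the determinant PEI_H·PEI_1 − 1). The function names are hypothetical.

```python
import numpy as np

def bond_eigenvalues(pei_a, pei_b):
    """Eigenvalues (ascending) of the 2x2 matrix [[pei_a, 1], [1, pei_b]]."""
    B = np.array([[pei_a, 1.0],
                  [1.0, pei_b]])
    x1, x2 = np.linalg.eigvalsh(B)   # symmetric solver, ascending order
    return float(x1), float(x2)

def sx1ch(pei_pairs):
    """Sum of the smaller eigenvalue X1CH over all H-C bonds of a molecule."""
    return sum(bond_eigenvalues(a, b)[0] for a, b in pei_pairs)

# ASSUMED inputs, inferred from the eigenvalues quoted in the text.
PEI_H = 0.0
PEI_1 = 1.2603

x1, x2 = bond_eigenvalues(PEI_H, PEI_1)
print(round(x1, 4), round(x2, 4))
```

For a whole molecule, `sx1ch` would be called with one (PEI_H, PEI_i) pair per H–C bond, each pair obtained as described in the text from Table 1.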
3. Results and Discussion
Multiple linear regression analysis using the novel MPEI, OEI and SX1CH indices was performed to develop the final QSPR models.
3.1. Quantitative Structure-Retention Relationship (QSRR) Model for Alcohols on Stationary Phases of Different Polarity
After calculation of the descriptors (Table 3) of the alcohol molecules, multiple linear regression analysis using the novel MPEI, OEI and SX1CH indices was performed to develop the final QSRR model for each stationary phase separately. Specifications of the best models found for describing the RI values of alcohols on the six stationary phases are given in Table 4. The equations represent excellent QSRR models, judging from the high R and low s values. Also, the F values show a high degree of statistical credibility and indicate an excellent fit of the models to the experimental RI values.
In order to validate the models obtained, the leave-one-out test was performed. The results for the models are shown in Table 4. In all cases, the cross-validated correlation coefficient is only slightly less than the corresponding value of the full model.
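The leave-one-out test mentioned above can be sketched in Python as follows. Each compound is removed in turn, the regression is refitted on the remaining compounds, and the removed compound is predicted. The definitions of Rcv and scv used here (based on the predictive residual sum of squares, PRESS) are common conventions and are assumptions, since the text does not state the exact formulas. The function name `loo_cv` and the synthetic data are hypothetical.

```python
import numpy as np

def loo_cv(X, y):
    """Leave-one-out predictions with a cross-validated R and s."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])     # design matrix with intercept
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i             # drop compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        pred[i] = A[i] @ coef                # predict the left-out compound
    press = float(((y - pred) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r_cv = np.sqrt(max(0.0, 1.0 - press / ss_tot))   # assumed definition
    s_cv = np.sqrt(press / (n - p - 1))              # assumed definition
    return pred, r_cv, s_cv

# Synthetic stand-in data (NOT the paper's data).
rng = np.random.default_rng(1)
X = rng.uniform(size=(58, 2))
y = 5.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.1, size=58)

pred, r_cv, s_cv = loo_cv(X, y)

# Full-model R for comparison.
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r_full = np.sqrt(1.0 - ((y - A @ coef) ** 2).sum() / ((y - y.mean()) ** 2).sum())
print("R =", r_full, " Rcv =", r_cv, " scv =", s_cv)
```

Because each left-out residual is at least as large in magnitude as the corresponding fitted residual, Rcv computed this way cannot exceed the full-model R, which matches the pattern described for Table 4.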
3.2. Quantitative Structure-Property Relationship (QSPR) Model for BP of the Alcohols
Boiling point is important for the characterization and identification of a compound. It also provides an indication of the volatility of a compound. It is intuitively evident that boiling point is critically influenced by two characteristics of a molecule: first, the molecular weight and, second, the intermolecular attractive forces between molecules. Multiple linear regression analysis using the novel MPEI and OEI indices was performed to develop the final two-parameter QSPR model in the form of Equation (5). Of the two parameters in the model, the OEI index addresses the first, and the MPEI addresses the second.
R = 0.9928; s = 4.3; F = 1885.3; n = 58; Rcv = 0.9918; scv = 4.5.
The two-parameter QSPR equation reflects quantitatively the well-known fact that the boiling point of a compound depends on the mass of its molecules and their tendency to stick together. The calculated BP values are shown in Table 5 and plotted against the experimental values in Figure 4.
3.3. Quantitative Structure-Property Relationship (QSPR) Models for Water Solubility (lg W), n-Octanol/Water Partition Coefficients (lg POW) of the Alcohols
Physicochemical properties of micropollutants, such as water solubility (lg W) and n-octanol/water partition coefficient (lg POW), play a major role in determining the distribution and fate of organic contaminants in the global environment and have been used for assessing environmental partition and transport of organic substances. The data set used in this study contains 58 alcohols. With the aid of a computer program, the best models were obtained as follows:
R = 0.9942; s = 0.19; F = 2176.9; Rcv = 0.9932; scv = 0.20; n = 58.
R = 0.9959; s = 0.15; F = 3306.4; Rcv = 0.9954; scv = 0.15; n = 58.
Both models are validated as statistically significant by leave-one-out cross-validation. The calculated and experimental lg W and lg POW of the alcohols, along with the topological descriptors, are listed in Table 6.
The plots of calculated versus observed values of lg W and lg POW are shown in Figure 5 and Figure 6, respectively.
In the three models, the proposed OEI and SX1CH indices were generated on the basis of the aliphatic part of the molecule and represent the R contribution to the physicochemical properties to be predicted. The MPEI index was introduced to take into account not only the presence of the OH group, but also the polar OH contribution and the apolar R group/polar OH interaction contribution to the predicted physicochemical properties. The properties of alcohols are influenced by intermolecular forces, and the MPEI index is related to the polarizability of the molecule and the intramolecular action of the solute. Hence, the MPEI index is significant in all three models.
Most QSPR research investigates the correlation of only one or a few properties with some parameters or descriptors. In this paper, we have obtained good correlations between OEI, MPEI, SX1CH and several properties of alcohols.
From the results above, all of the correlation coefficients (R) are greater than 0.99, and every regression equation has a high F and a low s. The figures show that the calculated values are very close to the experimental ones, with no large deviation in any estimated value, and the statistical validity of the models is verified by the leave-one-out cross-validation technique.
It appears that models based on these properties are simpler, but it is important to remember that the experimental data of these properties are not always available. Furthermore, their predicted data could be subject to high variability due to the selected QSPR calculation method.
4. Conclusion
In this study, the novel topological indices MPEI, OEI and SX1CH, based on graph theory and obtained by dividing the molecular structure into substructures, were correlated with boiling point (BP), octanol–water partition coefficient (lg POW), water solubility (lg W) and the chromatographic retention indices (RI) on stationary phases of different polarity. The descriptors appearing in these models encode the chemical structure effectively and simply, providing enough information related to the molecular structure and molecular properties. The proposed models have good stability and robustness, and the values predicted by the MLR method are close to the experimental values, which demonstrates the predictive ability of these descriptors. The leave-one-out cross-validation technique used in the study supports the stability and reliability of the models. The correlation equations and descriptors are expected to be used for the prediction of physicochemical properties of diverse aliphatic alcohols in cases where the physicochemical indices are not readily available. This paper offers new insight and may lead to the exploration of a novel approach to QSPR studies of alcohols.