1. Introduction
The chemical manufacturing sector relies heavily on predictive methods to estimate the thermodynamic and physical properties of fluid mixtures, as experimental measurements cannot keep pace with the large number of new chemical compounds synthesized or discovered each year. Currently, experimental quantities are available for only a very small fraction of the multicomponent mixtures commonly encountered in industrial processes. Quantitative structure property/activity relationships (QSPRs/QSARs), linear free energy relationships (LFERs), and functional/fragment group contribution methods have been widely used by research scientists and design engineers to predict properties of both neat organic compounds and fluid mixtures based on the molecular structure of each individual compound or mixture component. The simpler predictive methods combine structural information and easily measured property data, which serve as the required input parameters in mathematical correlations, while the more sophisticated predictive approaches utilize input parameters (such as atomic charges, atom–atom polarizabilities, molecular orbital energies, and frontier orbital densities) deduced from more theoretical quantum-mechanical computation methods. Binary interaction terms have even been incorporated in some methods to represent the various types of molecular interactions believed to be present. The introduction of machine learning and deep learning models has increased the applicability and predictive accuracy of the various estimation methods. Physical and thermodynamic properties that have been reasonably predicted by QSPR/QSAR, LFER, and group contribution approaches include vapor pressures [
1,
2], critical temperature [
3], flash point temperatures [
4,
5], water-to-organic solvent partition coefficients [
2,
6,
7,
8], Gibbs energies of solvation and Ostwald solubility coefficients [
9,
10,
11,
12,
13,
14,
15], liquid viscosities [
16,
17,
18,
19], surface tensions [
20], enthalpies of combustion and enthalpies of formation [
21,
22,
23,
24], enthalpies of solvation [
25,
26], isobaric solid, liquid and gas molar heat capacities [
27,
28,
29,
30,
31,
32], enthalpies of vaporization and sublimation [
33,
34,
35,
36,
37], total solid-to-liquid phase change entropies [
38,
39], and activity coefficients [
40,
41,
42,
43,
44].
The focus of the current study is limited in scope to the group contribution and machine learning approaches that have been developed for estimating Abraham model solute descriptors based on inputting the solute molecule’s Canonical SMILES. Our recent experience has indicated that such methods often do not properly account for intramolecular hydrogen bond formation. For example, both the UFZ-LSER and MIT group contribution methods significantly overestimate the Abraham model hydrogen bond acidity solute descriptor. In the case of 1,4-dihydroxyanthraquinone (see
Figure 1 for the molecular structure), the UFZ-LSER [
45] and MIT [
14,
46] predictive software yielded values of
A = 0.82 and
A = 0.862, respectively. The two estimated values are significantly larger than the experiment-based value of
A = 0.00 obtained by fitting expressions derived from the Abraham model to the measured molar solubilities of 1,4-dihydroxyanthraquinone in acetic acid, acetone, acetonitrile, 1-butanol and toluene [
47].
In this study, we identify several molecules in our Abraham model solute descriptor database that are known to engage in intramolecular H-bond formation. We propose a relatively simple modification of the molecule’s Canonical SMILES code to indicate which -OH hydrogen atoms cannot act as H-bond donors in their interactions with surrounding solvent molecules. As part of this study, we also measured the mole fraction solubility of oxybenzone (more formally named 2-hydroxy-4-methoxybenzophenone) dissolved in hexane, heptane, octane, dodecane, cyclohexane, methylcyclohexane, 2,2,4-trimethylpentane, diisopropyl ether, propan-1-ol, butan-1-ol, pentan-1-ol, heptan-1-ol, octan-1-ol, 2-methylpropan-1-ol, butan-2-ol, 2-methylbutan-1-ol, 3-methylbutan-1-ol, 4-methylpentan-2-ol, 2-ethylhexan-1-ol, cyclopentanol, and ethylene glycol at 298.15 K. Oxybenzone was selected because we wished to report experiment-based solute descriptors for one additional organic compound that exhibits intramolecular H-bond formation. Published infrared [
48,
49], X-ray crystallographic [
50,
51], proton nuclear magnetic resonance chemical shift [
48,
52], and computational studies [
53,
54] have revealed that intramolecular H-bond formation occurs between the hydroxyl hydrogen atom in 2-hydroxybenzophenone derivatives and the lone electron pairs on the oxygen atom of the neighboring (>C=O) functional group, as depicted in
Figure 2. Intramolecular H-bond formation does not prevent the second lone electron pair on the >C=O oxygen atom from forming an intermolecular hydrogen bond with protic solvents [
55]. Several amphoteric alcohol solvents were included in the current study because of their ability to act both as a hydrogen bond donor and a hydrogen bond acceptor. Published studies [
55,
56,
57,
58] have suggested the possible formation of bifurcated three-center hydrogen bonds in several organic compounds that exhibit intramolecular H-bond formation, in which the hydroxyl hydrogen atom is simultaneously involved in an internal H-bond and an external H-bond with a surrounding solvent molecule. The amphoteric nature of the alcohol solvents allows us to explore this possibility through the magnitude of oxybenzone’s calculated
A solute descriptor. A very small calculated
A value near zero would suggest the negligible involvement of the hydroxyl hydrogen atom in external H-bond formation with the H-bond acceptor sites on neighboring solvent molecules considered in the current study.
We note that oxybenzone is widely used as a UV filter in a variety of personal sunscreen products to prevent sunburn, premature photoaging and skin cancer, or as a stabilizer in many commercial products (e.g., plastic surface coatings, outdoor building materials) to minimize photodegradation. It is reported that as many as 81% of the personal care products from both the U.S. and China contain oxybenzone [
59,
60]. The increased demand for UV filters has resulted in the substantial release of oxybenzone and other benzophenone-type filter materials into the environment, and a greater concern for the potential damage that these emerging contaminants might cause in aquatic and other living organisms [
61]. To minimize adverse environmental effects, several countries now limit the amount of oxybenzone in sunscreen products [
62], and Hawaii has even gone so far as to pass legislation banning the active chemical ingredients oxybenzone and octinoxate in sunscreens altogether so as to protect coral reefs from bleaching and disruptions of the ecosystem [
63,
64]. Laws banning the sale or use of sunscreen products containing oxybenzone have also been enacted in several Caribbean islands such as the U.S. Virgin Islands, Aruba and Bonaire [
62].
The Abraham model experiment-based solute descriptors that are obtained from the solubility data measured in the current study, in combination with published Abraham model correlations, will allow researchers to evaluate potential biphasic aqueous–organic extraction systems and potential absorbent materials for the removal of oxybenzone from environmental waterways. At present, wastewater treatment facilities have problems removing UV filters [
65].
2. Chemical Materials and Experimental Methodology
Oxybenzone was purchased from Sigma-Aldrich Chemical Company (Milwaukee, WI, USA, 0.98 mass fraction) in the highest purity available and was recrystallized three times from anhydrous methanol to remove any trace impurities that may have been present in the commercial sample. The recrystallized sample was further dried at 313 K for three days to remove any adsorbed methanol. Gas chromatographic analysis (thermal conductivity detector, carbowax stationary phase) indicated a purity of 0.995 mass fraction for the purified sample. The 21 different organic solvents were purchased from commercial chemical suppliers as follows: hexane (Aldrich Chemical Company, Milwaukee, WI, USA, 0.99+ mass fraction purity), heptane (Aldrich Chemical Company, anhydrous, 0.99 mass fraction), octane (Fluka Chemicals, Buchs, Switzerland, prem, 0.99+ mass fraction), dodecane (Aldrich Chemical Company, 0.99+ mass fraction), cyclohexane (Sigma-Aldrich Chemical Company, St. Louis, MO, USA, anhydrous, 0.995 mass fraction), methylcyclohexane (Aldrich Chemical Company, anhydrous, 0.99+ mass fraction), 2,2,4-trimethylpentane (Aldrich Chemical Company, anhydrous, 0.998 mass fraction), diisopropyl ether (Sigma-Aldrich Chemical Company, anhydrous, 0.99 mass fraction), propan-1-ol (Alfa Aesar, Ward Hill, MA, USA, anhydrous, 0.999 mass fraction), butan-1-ol (Aldrich Chemical Company, HPLC grade, 0.998 mass fraction), pentan-1-ol (Sigma-Aldrich Chemical Company, ACS Reagent, 0.99+ mass fraction), heptan-1-ol (Alfa Aesar, 0.99 mass fraction), octan-1-ol (Sigma-Aldrich Chemical Company, anhydrous, 0.99+ mass fraction), 2-methylpropan-1-ol (Sigma-Aldrich Chemical Company, anhydrous, 0.995 mass fraction), butan-2-ol (Aldrich Chemical Company, anhydrous, 0.995 mass fraction), 2-methylbutan-1-ol (Sigma-Aldrich Chemical Company, 0.99+ mass fraction), 3-methylbutan-1-ol (Sigma-Aldrich Chemical Company, anhydrous, 0.99+ mass fraction), 4-methylpentan-2-ol (Acros Organics, Morris Plains, NJ, USA, 0.99+ mass 
fraction), 2-ethylhexan-1-ol (Acros Organics, 0.99 mass fraction), cyclopentanol (Sigma-Aldrich Chemical Company, anhydrous, 0.995 mass fraction), and ethylene glycol (Aldrich Chemical Company, spectrophotometric grade, 99+%). The organic solvents were stored over activated molecular sieves, and distilled prior to use. Gas chromatographic analysis showed the solvent purities to be 0.997 mass fraction or higher.
Mole fraction solubilities were determined using a spectrophotometric method of chemical analysis by placing excess oxybenzone and 20 mL of the respective organic solvent in sealed amber glass bottles. The sealed bottles were allowed to equilibrate in a constant-temperature water bath at 298.15 K for three days with periodic shaking to facilitate dissolution. Known aliquots of the clear saturated solutions were transferred into pre-weighed volumetric flasks. The flasks containing the transferred aliquot were then re-weighed on an analytical balance in order to determine the mass of the saturated solution removed for chemical analysis. By determining the amount of transferred saturated solution in this fashion we can calculate the solubility of oxybenzone as a mass fraction, which can be accurately converted to mole fraction solubilities, XS,organic, without having to measure the density of each saturated solution. Many journals require that solubility data be reported in terms of mole fractions.
The transferred solutions were diluted quantitatively with propan-2-ol and their absorbances were recorded on a Milton Roy Spectronic 1000 Plus spectrophotometer (Milton Roy, Rochester, NY, USA) at an analysis wavelength of 284 nm. The 21 organic solvents were optically transparent at the analysis wavelength. The concentration of each diluted sample was obtained from a Beer–Lambert law graph prepared from the measured absorbances of nine standard solutions of known oxybenzone concentration. A new Beer–Lambert law graph was determined each day using nine freshly prepared standard solutions. The molar absorptivity coefficient, ε ≈ 14,000 L mol⁻¹ cm⁻¹, was found to be nearly constant over the concentration range from 3.29 × 10⁻⁵ mol L⁻¹ to 1.10 × 10⁻⁴ mol L⁻¹.
The measured absorbances of the dilute solutions prepared from saturated solutions were converted to mass fraction solubilities and then mole fraction solubilities using the molar masses of oxybenzone and the respective organic solvent, the mass of the saturated solution taken for analysis, the volume of the volumetric flasks, calculated molar absorption coefficient, and any dilutions that were required to place the measured absorbance within the linear region of the Beer–Lambert law curve. Repetitive experimental measurements were performed on select samples after an additional two days of equilibration to ensure that saturation conditions had been attained. Melting point temperatures were determined on the equilibrated solid phases recovered from each saturated solution to verify that the solid phase did not change during the course of experimental measurements. For each of the 21 oxybenzone–organic solvent combinations studied in the current communication, the melting point temperature of the equilibrated oxybenzone solid phase was within ±0.5 K of the melting point temperature of the recrystallized sample of oxybenzone prior to contact with the organic solvent. There was no evidence of solid-to-solid phase transitions or solvate formation.
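The data reduction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the worksheet actually used in this study: it substitutes a single molar absorptivity for the daily nine-point calibration graph, assumes a 1 cm path length, and uses hypothetical values for the absorbance, flask volume, additional dilution factor, and aliquot mass. The function name and argument names are likewise hypothetical.

```python
# Sketch: convert a measured absorbance into a mole fraction solubility.
# All numerical inputs below are hypothetical; only the approximate molar
# absorptivity (about 14,000 L mol^-1 cm^-1) comes from the text.

M_OXYBENZONE = 228.24  # g/mol, molar mass of C14H12O3


def mole_fraction_solubility(absorbance, epsilon, path_cm, flask_volume_l,
                             dilution_factor, solution_mass_g,
                             solute_molar_mass, solvent_molar_mass):
    """Return the mole fraction of solute in the saturated solution."""
    # Beer-Lambert law: concentration of the measured (diluted) sample, mol/L.
    conc_measured = absorbance / (epsilon * path_cm)
    # Undo any additional dilution, then scale by the flask volume to get
    # the moles of solute contained in the weighed aliquot.
    moles_solute = conc_measured * dilution_factor * flask_volume_l
    # Mass balance: solvent mass is the aliquot mass minus the solute mass.
    mass_solute = moles_solute * solute_molar_mass
    mass_solvent = solution_mass_g - mass_solute
    if mass_solvent <= 0:
        raise ValueError("solute mass exceeds the aliquot mass")
    moles_solvent = mass_solvent / solvent_molar_mass
    return moles_solute / (moles_solute + moles_solvent)


x_s = mole_fraction_solubility(
    absorbance=0.700, epsilon=14000.0, path_cm=1.0,
    flask_volume_l=0.050, dilution_factor=100.0, solution_mass_g=1.000,
    solute_molar_mass=M_OXYBENZONE, solvent_molar_mass=86.18)  # hexane
print(f"x_S = {x_s:.4f}")
```

Because the aliquot is weighed rather than measured by volume, the calculation never needs the density of the saturated solution, which is the point made in the experimental description.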
3. Results and Discussion
The experimental mole fraction solubilities,
XS,organic, of oxybenzone dissolved in seven different linear, cyclic and branched alkanes, in 12 primary and secondary mono-alcohols, in ethylene glycol, and in diisopropyl ether at 298.15 K, are reported in the second and fourth columns of Table 1. Each numerical value represents the average of 6 to 10 independent measurements, including the follow-up determinations made to ensure that the samples had attained equilibrium after the initial three-day equilibration time. The follow-up studies confirmed that equilibrium had been reached in all cases. The tabulated XS,organic values were reproducible to within ±2.5% (relative error). Our search of the published chemical and engineering literature failed to find any experimental solubility data against which our measured XS,organic values could be compared. In fact, we found no experimental XS,organic data for oxybenzone dissolved in organic solvents. We did, however, find a practical water-to-octanol partition coefficient, log P = 3.79 [66], that could be used in computing oxybenzone’s Abraham model solute descriptors.
A major reason for performing experimental solubility measurements is to aid in selecting an organic mono-solvent or solvent mixture to serve as a reaction medium for the synthesis of chemical compounds, for the purification of chemical products through either recrystallization or biphasic liquid–liquid extraction, or for dissolving medicinal drug compounds in controlled time-release drug formulations. The solubility data given in
Table 1 provide useful oxybenzone solubility data for a very limited number of the solvents used in industrial manufacturing processes. The measurements become more useful if one discovers a way to utilize the
XS,organic values given in
Table 1 to predict oxybenzone solubilities in additional organic solvents. Linear free energy relationships and quantitative structure–property relationships provide a convenient means for researchers to extend their experimental studies by developing mathematical expressions to estimate additional physical and thermodynamic properties from a limited number of measured quantities. In the case of the current study, the measured
XS,organic values in
Table 1 can be used to calculate the Abraham model descriptor values of the dissolved oxybenzone solute molecule. The solute descriptors (E, S, A, B, V, and L), once calculated, can be inserted into Equations (1) and (2) [67,68], thus allowing researchers to estimate the logarithm of the solute’s water-to-octanol partition coefficient, log P, the logarithm of the solute’s gas-to-organic solvent partition coefficient, log K, or the logarithm of the solute’s molar solubility ratios, log (CS,organic/CS,water) and log (CS,organic/CS,gas), in the more than 130 different organic solvents for which the solvent’s equation coefficients (cp, ep, sp, ap, bp, vp, ck, ek, sk, ak, bk, and lk) are known [67,68]. The phase to which each molar solute concentration refers is indicated by the subscripts “organic”, “water” and “gas”.
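For reference, the two correlations cited as Equations (1) and (2) take the following standard Abraham model form, written with the descriptor and coefficient symbols defined in the preceding paragraph. This is a reconstruction from those definitions and should be read as such, not as a verbatim copy of the typeset equations.

```latex
\begin{align}
\log\,(P \ \text{or}\ C_{S,\mathrm{organic}}/C_{S,\mathrm{water}})
  &= c_p + e_p E + s_p S + a_p A + b_p B + v_p V \tag{1} \\
\log\,(K \ \text{or}\ C_{S,\mathrm{organic}}/C_{S,\mathrm{gas}})
  &= c_k + e_k E + s_k S + a_k A + b_k B + l_k L \tag{2}
\end{align}
```

The two expressions differ only in the final term: the water-based correlation uses the McGowan volume, V, whereas the gas-based correlation uses L.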
Each term on the right-hand side of Equations (1) and (2) represents a different type of solute–solvent interaction. The magnitude of each molecular interaction is quantified by a solute property multiplied by the complementary solvent property. The solute descriptors provide valuable information regarding how the solute interacts with surrounding molecules. The solute descriptor E measures the molar refraction of the given solute in excess of that of a linear alkane of comparable molecular size, which, when multiplied by the complementary solvent properties, ep and ek, describes the additional dispersion interactions possible for solute and solvent molecules with polarizable π-electrons and lone electron pairs. Intermolecular solute–solvent hydrogen-bonding interactions are quantified by the a × A + b × B terms on the right-hand side of Equations (1) and (2). In the a × A terms, the oxybenzone solute molecule acts as the H-bond donor, while the solubilizing solvent medium acts as the H-bond acceptor. The roles of dissolved oxybenzone and the neighboring solvent molecules are reversed in the case of the two b × B terms.
The three remaining terms on the right-hand side of Equations (1) and (2) refer to contributions resulting from both orientation and induction dipole-type interactions (e.g., the
s ×
S terms), and the breaking of solvent–solvent interactions needed in the creation of the solvent cavity wherein the dissolved oxybenzone solute molecule will reside (e.g., the
vp ×
V and
lk ×
L terms). The size of the created solvent cavity depends on the volume of the solute molecule (e.g., the McGowan V solute descriptor). The various solute descriptors and complementary solvent equation coefficients are described in greater detail in several very informative review and research articles [
69,
70,
71,
72,
73].
The first step in determining oxybenzone’s solute descriptor values is to convert the measured
XS,organic values to molar solubilities,
CS,organic. This conversion is easily accomplished through Equation (3).
The numerical values of XS,organic given in Table 1 are divided by the ideal molar volume of the respective saturated solution. A value of VSolute = 0.1948 L mol⁻¹ was used for the molar volume of the hypothetical subcooled liquid oxybenzone. The mole fraction solubility of oxybenzone is sufficiently small in the studied organic solvents that any errors resulting from our estimation of oxybenzone’s hypothetical subcooled liquid molar volume, VSolute, or from the ideal molar volume approximation should have a negligible effect on each calculated CS,organic value.
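The conversion just described, in which the mole fraction solubility is divided by the ideal molar volume of the saturated solution, X × VSolute + (1 − X) × VSolvent, can be sketched as follows. The mole fraction and solvent molar volume used in the example are hypothetical placeholders, since the Table 1 values are not reproduced here; only VSolute = 0.1948 L mol⁻¹ comes from the text. The function name is an assumption made for illustration.

```python
# Sketch: mole fraction solubility -> molar solubility via the ideal
# molar volume of the saturated solution (the relation described above).

V_SOLUTE = 0.1948  # L/mol, hypothetical subcooled liquid oxybenzone (from text)


def molar_solubility(x_solute, v_solvent, v_solute=V_SOLUTE):
    """Return C_S,organic in mol/L for a mole fraction solubility x_solute."""
    if not 0.0 <= x_solute <= 1.0:
        raise ValueError("mole fraction must lie between 0 and 1")
    # Ideal molar volume of the solution: mole-fraction-weighted average.
    v_solution = x_solute * v_solute + (1.0 - x_solute) * v_solvent
    return x_solute / v_solution


# Hypothetical inputs: x = 0.020 in a solvent with a molar volume of 0.1316 L/mol.
c_s = molar_solubility(0.020, 0.1316)

# Sensitivity check: inflate the estimated solute molar volume by 10%.
c_s_perturbed = molar_solubility(0.020, 0.1316, v_solute=1.10 * V_SOLUTE)
relative_change = abs(c_s_perturbed / c_s - 1.0)
print(f"C_S,organic = {c_s:.4f} mol/L, relative change = {relative_change:.2%}")
```

For these illustrative inputs, a 10% error in VSolute shifts the calculated molar solubility by well under 1%, which is consistent with the statement that the estimate of the subcooled liquid molar volume has a negligible effect at small mole fractions.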
The calculated molar solubilities (listed in
Table 2), combined with the solvent equation coefficients (listed in
Table 3), are substituted into Equations (1) and (2). They give 42 Abraham model expressions to use in our solute descriptor determination. Two additional expressions are obtained from the practical log
P value contained in the published paper by Rodil and Moeder [
66]. The equation coefficients that describe solute partitioning into a water-saturated octanol phase are identified by the word “wet”, which follows the name of the organic solvent. The remaining equation coefficients in Table 3, which are not identified as “wet”, pertain to organic solvents that do not contain enough water to significantly alter their solubilizing character. In other words, it is these equation coefficients that are used in conjunction with measured molar solubility data. The practical log P value for “wet” octan-1-ol is converted into the corresponding gas-to-wet octan-1-ol partition coefficient, log K, as follows:
This uses oxybenzone’s aqueous molar solubility,
CS,water, and molar gas phase concentration,
CS,gas. The low solubility of 1-octanol in water allows “wet” log
P values to be converted to log
K in this fashion.
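The conversion referred to above is usually written as shown below. This is a reconstruction based on the definitions of log P, log K, CS,water and CS,gas used in this section, and the numerical check uses the log CS,water and log CS,gas values reported later in the text; it is offered as an illustration and not as the original typeset equation.

```latex
\begin{align}
\log K &= \log P + \log\,(C_{S,\mathrm{water}}/C_{S,\mathrm{gas}}) \\
       &= 3.79 + \left[-4.558 - (-9.581)\right] = 8.813
\end{align}
```

The bracketed quantity is the logarithm of the gas-to-water partition coefficient, so the sum converts a water-referenced partition coefficient into a gas-referenced one.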
The Abraham model correlations that we have constructed contain the six solute descriptors (
E,
S,
A,
B,
V and
L) that we wish to calculate, as well as the values of
CS,water and
CS,gas. The latter two quantities will be floated and determined during the course of the solute descriptor computation. Our search of the published literature did not find an experimental solubility for oxybenzone in water. There are more than enough Abraham model expressions to calculate the eight desired quantities. The number of solute descriptors to be determined can be reduced to four by remembering that the values of both
V and
E can be obtained entirely from oxybenzone’s molecular structure. For example, the characteristic McGowan volume,
V = 1.7391, is calculated from the number of chemical bonds, as well as the number and sizes of the individual hydrogen, carbon and oxygen atoms in oxybenzone [
74]. An estimated value for the
E solute descriptor,
E = 1.500, is obtained by inputting oxybenzone’s Canonical SMILES code, O=C(C=1C=CC=CC1)C2=CC=C(OC)C=C2O, into the estimation software program available on the UFZ-LSER website [
45]. Our past experience with this software program is that this group contribution method provides a very good estimate of the
E solute descriptor for organic molecules having few functional groups and relatively simple molecular structures. Estimated values of the
S,
A,
B, and
L solute descriptors for large, complex solute molecules containing multiple functional groups, however, can differ significantly from experiment-based solute descriptors deduced from measured molar solubilities and measured partition coefficients.
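The statement that V follows directly from the molecular structure can be checked with a short calculation. The sketch below uses the commonly tabulated McGowan atomic increments and per-bond correction, which are standard literature values and are not listed in this text; the function name and the ring-count shortcut for the number of bonds are choices made for illustration.

```python
# Sketch: McGowan characteristic volume from atom and bond counts.
# Atomic increments (cm^3/mol) and the per-bond correction are the
# standard literature values; they are not listed in the text itself.

ATOM_VOLUME = {"C": 16.35, "H": 8.71, "O": 12.43}
BOND_CORRECTION = 6.56  # subtracted once per bond, regardless of bond order


def mcgowan_volume(atom_counts, n_rings):
    """Return V in units of (cm^3/mol)/100 for a single connected molecule."""
    n_atoms = sum(atom_counts.values())
    # Bonds in a connected molecule: atoms - 1, plus one per ring closure.
    n_bonds = n_atoms - 1 + n_rings
    total = sum(ATOM_VOLUME[element] * count
                for element, count in atom_counts.items())
    return (total - BOND_CORRECTION * n_bonds) / 100.0


# Oxybenzone, C14H12O3, two rings.
v_oxybenzone = mcgowan_volume({"C": 14, "H": 12, "O": 3}, n_rings=2)
# 2-Hydroxybenzophenone, C13H10O2, two rings.
v_2hbp = mcgowan_volume({"C": 13, "H": 10, "O": 2}, n_rings=2)
print(f"V(oxybenzone) = {v_oxybenzone:.4f}, V(2-hydroxybenzophenone) = {v_2hbp:.4f}")
```

With these increments the calculation reproduces V = 1.7391 for oxybenzone, and V = 1.5395 for 2-hydroxybenzophenone.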
Preliminary regression analysis yielded a negative numerical value of
A = −0.068 for oxybenzone’s overall hydrogen bond acidity solute descriptor. A negative numerical value of
A was also obtained when we analyzed only the solubility in the 12 alcohol mono-solvents having a single hydroxyl functional group (
A = −0.126). A negative value of the
A solute descriptor is unrealistic, so we set the value to zero, and performed the regression analysis one more time to give the set of experiment-based descriptor values of
E = 1.500,
S = 1.413,
A = 0.000,
B = 0.617,
V = 1.7391 and
L = 8.660, as well as two logarithms of molar concentrations of oxybenzone—log
CS,water = −4.558 and log
CS,gas = −9.581. Only a very slight increase was noted in the overall standard deviation for the solute descriptor regression analysis, from SD = 0.094 log units to SD = 0.115 log units, by our setting
A = 0.000. Small individual standard deviations of SD = 0.114 log units and SD = 0.118 log units were obtained for the 21 calculated and observed log (
CS,organic/
CS,water) values and the 21 calculated and observed log (
CS,organic/
CS,gas) values, respectively, indicating that either Abraham model correlation could be used in predicting the molar solubilities of oxybenzone in additional organic solvents (see the last two columns of
Table 2 for the back-calculated values of log
CS,organiccalc based on Equations (1) and (2)).
Our private database of experiment-based Abraham model solute descriptors contains numerical values for relatively few hydroxybenzophenone derivatives: 2-hydroxybenzophenone (E = 1.54, S = 1.46, A = 0.00, B = 0.46, V = 1.5395, L = 7.7950); 4-hydroxybenzophenone (E = 1.59, S = 1.89, A = 0.81, B = 0.59, V = 1.5395, L = 8.802); and 2,4-dihydroxybenzophenone (E = 1.73, S = 2.03, A = 0.49, B = 0.70, V = 1.5982, L = 9.062). Oxybenzone and 2-hydroxybenzophenone have a single -OH functional group located in the 2-position on the aromatic phenyl ring, and the calculated A solute descriptor of both compounds is equal to zero, as would be expected for strong intramolecular H-bond formation. 4-Hydroxybenzophenone also has a single -OH functional group; however, its A solute descriptor value is much larger, as the hydroxyl functional group is too far away from the carbonyl group to permit intramolecular H-bond formation. The hydrogen atom on the hydroxyl functional group of 4-hydroxybenzophenone is thus available to form intermolecular H-bonds with surrounding solvent molecules. 2,4-Dihydroxybenzophenone has two hydroxyl functional groups, one that engages in intramolecular H-bond formation (-OH group at the 2-ring position) and one that is capable of forming intermolecular hydrogen bonds (-OH group at the 4-ring position) with lone electron pairs on surrounding solvent molecules.
An Abraham model hydrogen bond acidity descriptor value of zero indicates that the hydrogen atom of the hydroxyl functional group is engaged in strong intramolecular hydrogen bond formation with one of the lone electron pairs on the oxygen atom of the >C=O. This observation would be consistent with published spectroscopic [
48,
49,
50,
51,
52] and computational studies [
53,
54] that suggest the formation of intramolecular H-bonds in 2-hydroxybenzophenone derivatives. Moreover, a value of
A = 0.000 rules out the possibility of a bifurcated H-bond between the intramolecular H-bonded hydrogen atom and a neighboring diisopropyl ether or alcohol molecule. A bifurcated H-bond should result in a small, nonzero
A value.
As a reminder, one of the reasons for selecting oxybenzone was to determine if the compound engaged in intramolecular hydrogen bond formation, and to ascertain if the existing group contribution and machine learning models used for estimating the Abraham model solute descriptors would be able to recognize the possibility that the molecule could form intramolecular H-bonds. In the case of 1,4-dihydroxyanthraquinone, 1,8-dihydroxyanthraquinone, and 4,5-dihydroxyanthraquinone-2-carboxylic acid, the two group contribution methods [
45,
46] significantly overestimated the
A solute descriptor value [
47,
75]. The MIT machine learning method [
46] was found to provide better estimates of the
A solute descriptor value in these three compounds.
Oxybenzone has a very different molecular structure (2-hydroxybenzophenone versus hydroxyanthraquinone motif), and it will be informative to compare the experiment-based solute descriptors to estimated values based on the UFZ-LSER and MIT group contribution methods, as well as the MIT and newly proposed AbraLlama machine learning models. Inputting the Canonical SMILES of oxybenzone, O=C(C=1C=CC=CC1)C2=CC=C(OC)C=C2O, into the respective computational software programs yielded the following four sets of Abraham model solute descriptor values:
Group contribution UFZ-LSER estimation [45]: E = 1.50, S = 1.66, A = 0.41, B = 0.71, V = 1.7391, and L = 9.214;
Group contribution MIT estimation [46]: E = 1.668, S = 1.898, A = 0.431, B = 0.832, V = 1.7391, and L = 9.354;
Machine learning MIT estimation [46]: E = 1.644, S = 1.714, A = 0.153, B = 0.697, V = 1.7391, and L = 8.915;
Machine learning AbraLlama estimation [76]: E = 1.467, S = 1.599, A = 0.175, B = 1.049, and V = 1.767.
Both group contribution methods significantly overestimated oxybenzone’s
A solute descriptor. The two machine learning models performed much better in this regard. One must remember, however, that in comparing the different methods to estimate Abraham model solute descriptors, the goal is not simply to obtain a set of estimated values, but rather to obtain values that are of sufficient “quality” to be used in conjunction with existing Abraham model correlations to predict a solute’s physical and thermodynamic properties. To further assess the capability of the four estimation approaches for solute descriptors, we used the estimated values to predict the solubility of oxybenzone in the 21 organic solvents considered in the current study. As part of the comparison, we redetermined the numerical values of log
CS,water and log
CS,gas that were floated in our initial solute descriptor determination. The values of the two floated quantities need to be the ones that minimize the overall standard deviation for the predicted solute descriptors. The results of our calculations, which are summarized in
Table 4, reveal that the four sets of estimated solute descriptors provide poor predictions for the solubility of oxybenzone when substituted into existing Abraham model log (
CS,organic/
CS,water) and log (
CS,organic/
CS,gas) correlations, as reflected in the much larger overall standard deviations. Slight differences between the experiment-based and estimated solute descriptor values can result in sizeable errors in the predicted physical and thermodynamic properties of the given solute molecule, particularly when the descriptor value is multiplied by a large complementary solvent equation coefficient.
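The sensitivity of a predicted property to the choice of descriptor set can be illustrated with a single correlation. The sketch below evaluates the water-based Abraham model expression for the water-to-“wet” octan-1-ol partition coefficient using the five descriptor sets quoted in this section. The equation coefficients are the widely cited literature values for wet octan-1-ol and are an assumption here, since Table 3 is not reproduced in this text. The example covers one property only and does not reproduce the 21-solvent comparison summarized in Table 4.

```python
# Sketch: sensitivity of a predicted water-to-octanol log P to the choice
# of descriptor set. Descriptor values are those quoted in the text; the
# "wet" octan-1-ol coefficients are assumed literature values.

WET_OCTANOL = {"c": 0.088, "e": 0.562, "s": -1.054,
               "a": 0.034, "b": -3.460, "v": 3.814}

DESCRIPTOR_SETS = {
    "experiment-based": dict(E=1.500, S=1.413, A=0.000, B=0.617, V=1.7391),
    "UFZ-LSER group contribution": dict(E=1.50, S=1.66, A=0.41, B=0.71, V=1.7391),
    "MIT group contribution": dict(E=1.668, S=1.898, A=0.431, B=0.832, V=1.7391),
    "MIT machine learning": dict(E=1.644, S=1.714, A=0.153, B=0.697, V=1.7391),
    "AbraLlama machine learning": dict(E=1.467, S=1.599, A=0.175, B=1.049, V=1.767),
}

LOG_P_OBSERVED = 3.79  # practical log P quoted in the text


def predict_log_p(d, k=WET_OCTANOL):
    """Evaluate c + e*E + s*S + a*A + b*B + v*V for one descriptor set."""
    return (k["c"] + k["e"] * d["E"] + k["s"] * d["S"]
            + k["a"] * d["A"] + k["b"] * d["B"] + k["v"] * d["V"])


predictions = {name: predict_log_p(d) for name, d in DESCRIPTOR_SETS.items()}
for name, value in predictions.items():
    deviation = value - LOG_P_OBSERVED
    print(f"{name:30s} log P = {value:5.2f}  deviation = {deviation:+.2f}")
```

The experiment-based set should be expected to perform best in this comparison because the observed log P value was itself part of the regression that produced those descriptors. The point of the sketch is only that modest shifts in S and B, multiplied by the large s and b coefficients, translate into differences of several tenths of a log unit or more.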
The Canonical SMILES unfortunately does not incorporate all of the salient structural features that govern a molecule’s physical and thermodynamic properties. The SMILES code provides information regarding the numbers and various types of atoms, and how the individual atoms are arranged within the molecule. Some arrangements give rise to special intramolecular interactions and to steric hindrance, which can inhibit a functional group from efficiently interacting with surrounding molecules. We believe that a better set of solute descriptors could be obtained if such information were used in the initial training of the group contribution or machine learning method. For example, if one knows from independent spectroscopic or computational studies that a specific hydroxyl hydrogen atom forms a strong intramolecular H-bond, would it not be better to reduce the contribution of this particular hydrogen atom to the overall H-bond acidity descriptor?
In the case of group contribution methods, it should be fairly easy to introduce a new functional group and associated group value to account for hydrogen atoms engaged in forming an intramolecular H-bond. The Canonical SMILES code of oxybenzone could be modified to include “hb” after the oxygen atom that is covalently bonded to the hydrogen atom involved in intramolecular hydrogen bonding. Our suggested “Informed User-Modified Canonical SMILES code” for oxybenzone would take the form of O=C(C=1C=CC=CC1)C2=CC=C(OC)C=C2Ohb. Modified SMILES codes for other select molecules and their experiment-based Abraham model solute descriptors are provided in
Table 5 and
Table 6, respectively. The tabulated information would provide a good starting point for anyone interested in incorporating intramolecular H-bond formation in the Abraham model solute descriptor predictions. The methodology can be extended to other structural features, such as intramolecular H-bond formation involving the hydrogen atom of a primary and/or secondary amine (Nhb) with lone electron pairs on neighboring oxygen or nitrogen atoms within the molecule [
77,
78]. Steric hindrance around an H-bond donor site (Osh and Nsh) could be introduced into the SMILES code as well. Each modification would permit the user of the predictive software to encode their knowledge of any special structural features that the solute molecule might possess that would affect the estimated Abraham model’s solute descriptors. We note that Gani and coworkers [
79] accounted for intramolecular H-bond formation and other steric factors by introducing second-order and third-order groups into their group contribution approach. The challenge with implementing higher-order groups into existing software programs for estimating solute descriptors is that much of the computational program will need to be completely rewritten. On the other hand, our suggested approach would require the addition of only a few select functional groups and the retraining of the method with a sufficient number of compounds and their modified Canonical SMILES codes in order to obtain group values with good predictive accuracy.
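One way a software tool might read the proposed notation is sketched below. The parsing rule is an assumption: the text specifies only that a tag such as “hb” or “sh” follows the O or N atom symbol, so the sketch strips those tags to recover an ordinary SMILES string and counts how many of each were present. The function and variable names are hypothetical, and atoms written inside square brackets are not handled.

```python
import re

# Sketch: separate the proposed user-added tags ("hb", "sh") from an
# otherwise ordinary SMILES string. The parsing rules are an assumption;
# the text only specifies that the tag follows the O or N atom symbol.

TAG_PATTERN = re.compile(r"([ON])(hb|sh)")


def split_modified_smiles(modified):
    """Return (plain_smiles, tag_counts) for a user-modified SMILES string."""
    tag_counts = {}
    for atom, tag in TAG_PATTERN.findall(modified):
        key = atom + tag
        tag_counts[key] = tag_counts.get(key, 0) + 1
    # Drop the tag but keep the atom symbol so standard tools can parse it.
    plain = TAG_PATTERN.sub(r"\1", modified)
    return plain, tag_counts


plain, tags = split_modified_smiles("O=C(C=1C=CC=CC1)C2=CC=C(OC)C=C2Ohb")
print(plain, tags)
```

A group contribution program could then assign the tagged atoms to a separate functional group with its own fitted group value, while passing the plain string to any existing SMILES parser. Because a lowercase “h” is not a valid atom symbol outside brackets in standard SMILES, the tags should not collide with ordinary notation, although that would need to be confirmed against the specific parser in use.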
Steric factors have been used in the past to explain the dissimilar solubility behaviors of 2-acetyl-1-naphthol and 1-acetyl-2-naphthol in saturated hydrocarbons versus in alcohol solvents [
81,
82]. The measured solubilities of 2-acetyl-1-naphthol are significantly larger in alcohol solvents than in saturated hydrocarbons. In contrast, there is very little difference in the solubility of 1-acetyl-2-naphthol in these two solvent types. Both acetylnaphthols exhibit intramolecular H-bond formation, as indicated by UV-Vis absorption [
83] and dielectric studies [
84]. It is believed that the formation of an intermolecular solute–alcohol H-bond in the case of 1-acetyl-2-naphthol is capable of breaking the weak intramolecular H-bond [
81]. Our suggested Informed User-Modified Canonical SMILES, if incorporated into group contribution methods, will enable researchers to calculate different sets of solute descriptors for 1-acetyl-2-naphthol depending upon whether they wish to include the intramolecular H-bond. In other words, one could use one set of Abraham model solute descriptors to predict the solubility of 1-acetyl-2-naphthol in saturated hydrocarbons and a second set of values to predict solubilities in alcohol solvents.