*Article* **Estimating Equivalent Alkane Carbon Number Using Abraham Solute Parameters**

**William E. Acree, Jr. <sup>1</sup>, Wei-Khiong Chong <sup>2</sup>, Andrew S.I.D. Lang <sup>3,\*</sup> and Hamed Mozafari <sup>3</sup>**

<sup>3</sup> Department of Computing & Mathematics, Oral Roberts University, Tulsa, OK 74136, USA

**\*** Correspondence: alang@oru.edu

**Abstract:** The use of equivalent alkane carbon numbers (EACN) to characterize oils is important in surfactant-oil-water (SOW) systems. However, EACN values are non-trivial to measure, so it is desirable to predict them from structure. In this work, we present a simple linear model that can be used to estimate the EACN value of oils with known Abraham solute parameters. We used linear regression with leave-one-out cross-validation on a dataset of 80 oils with known Abraham solute parameters to derive a general model that reliably estimates EACN values from the Abraham solute parameters E (the measured liquid or gas molar refraction at 20 °C minus that of a hypothetical alkane of identical volume), S (dipolarity/polarizability), A (hydrogen bond acidity), B (hydrogen bond basicity), and V (McGowan characteristic volume). Within the chemical space studied, the model shows good accuracy (N = 80, *R*<sup>2</sup> = 0.92, RMSE = 1.16, MAE = 0.90, *p* < 2.2 × 10<sup>−16</sup>). These parameters are consistent with those in other models found in the literature and are available for a wide range of compounds.

**Keywords:** equivalent alkane carbon number (EACN); Abraham solute parameters; hydrophobicity; oils

#### **1. Introduction**

Considerable attention has been given in recent years to the development of better surfactant-based microemulsions and foam systems for enhanced oil recovery in petroleum processes, for removal of oil from contaminated soil and industrial machinery surfaces, and for the solubilization of fragrances in water-based formulations. Many factors, including the temperature, the electrolyte concentration, and the hydrophobicities of both the surfactant and the oil, contribute to the overall efficiency of the extraction system. Experimental determination of the optimum set of conditions for a given surfactant-oil-water system is both expensive and very time-consuming. Fortunately, empirical equations have been proposed to describe how these factors affect microemulsion formation. One such expression is based on the hydrophilic-lipophilic difference (HLD) framework [1,2]:

$$\text{Ionic surfactant:}\quad \text{HLD} = \ln(S) - K \cdot \text{EACN} + C_c - \alpha \cdot (T - T_{\text{ref}}) \tag{1}$$

$$\text{Nonionic surfactant:}\quad \text{HLD} = b \cdot S - K \cdot \text{EACN} + C_c + C_T \cdot (T - T_{\text{ref}}) \tag{2}$$

where S is the salinity (not to be confused with Abraham's S parameter), and the terms ln(S) and b · S account for the electrolyte concentration of the system (usually expressed in grams per 100 mL); b is specific to the electrolyte and surfactant; K is an empirical constant; EACN is the equivalent alkane carbon number of the oil phase; C<sub>c</sub> represents the hydrophilicity of the surfactant; and the last two terms, α · (T − T<sub>ref</sub>) and C<sub>T</sub> · (T − T<sub>ref</sub>), describe the effect of temperature. The application of HLD to predicting microemulsion type and phase behavior is described in greater detail elsewhere [1–3].
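Equations (1) and (2) can be evaluated directly once the surfactant and oil constants are known. The short Python sketch below is illustrative only: the function names are hypothetical, and every constant (K, C<sub>c</sub>, α, b, C<sub>T</sub>, T<sub>ref</sub>) must be taken from the literature for the specific surfactant system. It also rearranges Equation (1) to find the oil EACN that gives HLD = 0, the condition conventionally associated with an optimum formulation.

```python
import math


def hld_ionic(salinity, K, eacn, cc, alpha, T, T_ref):
    """Equation (1): HLD of an ionic surfactant system (salinity S in g/100 mL)."""
    return math.log(salinity) - K * eacn + cc - alpha * (T - T_ref)


def hld_nonionic(salinity, b, K, eacn, cc, c_T, T, T_ref):
    """Equation (2): HLD of a nonionic surfactant system."""
    return b * salinity - K * eacn + cc + c_T * (T - T_ref)


def optimal_eacn_ionic(salinity, K, cc, alpha, T, T_ref):
    """Solve Equation (1) for the oil EACN at which HLD = 0."""
    return (math.log(salinity) + cc - alpha * (T - T_ref)) / K
```

For example, with placeholder (not measured) values S = 1 g/100 mL, K = 0.17, C<sub>c</sub> = 1.36, and T = T<sub>ref</sub>, `optimal_eacn_ionic` returns approximately 8, i.e., such a hypothetical surfactant would be best matched to an oil behaving like n-octane.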

**Citation:** Acree, W.E., Jr.; Chong, W.-K.; Lang, A.S.I.D.; Mozafari, H. Estimating Equivalent Alkane Carbon Number Using Abraham Solute Parameters. *Liquids* **2022**, *2*, 318–326. https://doi.org/10.3390/liquids2040019

Academic Editor: Slobodan Žumer

Received: 19 August 2022; Accepted: 26 September 2022; Published: 2 October 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

<sup>1</sup> Department of Chemistry, University of North Texas, Denton, TX 76203, USA

<sup>2</sup> Advent Polytech Co., Ltd., Taipao City 61249, Taiwan

Our interest in the current study is in developing a predictive method for the equivalent alkane carbon number (EACN). For n-alkanes, the EACN is numerically equal to the number of carbon atoms (ACN); for other liquids, it is equal to the number of carbons of the n-alkane that exhibits similar phase behavior in a reference surfactant-oil-water (SOW) system. EACNs can be determined by comparing the oil's fish-tail temperature (T\*) in a reference SOW system to standard calibration curves for n-alkanes [4]. Because the experimental determination of EACN values for novel oils can be time-consuming, the ability to predict EACN values is advantageous. Bouton et al. [5] developed a two-descriptor model based upon experimental data for 43 oils using the proprietary Molecular Operating Environment (MOE) software:

$$\text{EACN} = -19.84 + 2.88 \cdot \text{average negative softness} + 0.88 \cdot \text{KierA3} \tag{3}$$

where KierA3 is the third alpha-modified Kier shape index and "average negative softness" is a descriptor related to polarizability. Lukowicz et al. [6] developed a three-descriptor model based upon 70 oils using COSMO-RS σ-moments:

$$\text{EACN} = -4.85 - 0.23 \cdot M_2 - 0.33 \cdot M_{\text{acc}} + 0.06 \cdot M_0 \tag{4}$$

where M<sub>2</sub> is the molecular polarity, M<sub>acc</sub> is the hydrogen bond basicity, and M<sub>0</sub> is the total molecular surface area. Their model extends previous work in this area [7]. Most recently, Delforce et al. [8] developed both a graph machine model using SMILES codes and a neural network model using COSMO-RS-computed σ-moments, based on reliable EACN values for 111 molecules.

This work develops a model for EACN based upon experimental EACN values for 80 liquids, using the five Abraham solute parameters E, S, A, B, and V. These descriptors encode physicochemical properties related to those already found to be important, namely size and shape, polarizability, and hydrogen bond basicity. We therefore propose the following model:

$$\text{EACN} = c + e \cdot E + s \cdot S + a \cdot A + b \cdot B + v \cdot V \tag{5}$$

where E is the solute excess molar refractivity (the measured liquid or gas molar refraction at 20 °C minus that of a hypothetical alkane of identical volume) in units of (cm<sup>3</sup>/mol)/10, S is the solute dipolarity/polarizability, A and B are the overall or summation hydrogen bond acidity and basicity, and V is the McGowan characteristic volume in units of (cm<sup>3</sup>/mol)/100.
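Of the five descriptors, only V can be computed directly from molecular structure: it is a sum of atomic volume increments minus a fixed increment of 6.56 cm<sup>3</sup>/mol for each bond, divided by 100. A minimal Python sketch, assuming the standard McGowan increments for C, H, N, and O only (other elements are deliberately omitted) and that the caller supplies atom counts and the number of rings; `mcgowan_v` is a hypothetical helper name:

```python
# McGowan atomic volume increments in cm^3/mol (assumed standard values;
# only C, H, N and O are covered in this sketch).
ATOM_VOLUMES = {"C": 16.35, "H": 8.71, "N": 14.39, "O": 12.43}
BOND_VOLUME = 6.56  # subtracted once per bond, whatever the bond order


def mcgowan_v(counts, rings=0):
    """Return the McGowan volume V in (cm^3/mol)/100 from atom counts."""
    n_atoms = sum(counts.values())
    # A connected molecule has (atoms - 1) bonds, plus one extra per ring.
    n_bonds = n_atoms - 1 + rings
    volume = sum(ATOM_VOLUMES[el] * n for el, n in counts.items())
    return (volume - BOND_VOLUME * n_bonds) / 100.0
```

For n-hexane, `mcgowan_v({"C": 6, "H": 14})` gives 0.954, and for benzene, `mcgowan_v({"C": 6, "H": 6}, rings=1)` gives 0.7164. The other four descriptors (E, S, A, B) require experimental data or a separate prediction method.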

#### **2. Materials and Methods**

Experimentally measured EACN values, collected by Aubry et al. [4,8], were combined with experimentally determined Abraham solute parameters, taken primarily from the UFZ-LSER database [9] (see Table 1). The values for decylcyclohexane were taken from a paper by Chung et al. [10], and the values for dodecylcyclohexane and bis(2-ethylhexyl) adipate are new to this work; they come from an unpublished database of measured Abraham parameters, original to Professor Abraham and dated December 2020, which was shared with one of the authors before Professor Abraham's passing on 19 January 2021. A modeling dataset was created from these data by:

1. Using the median EACN value for compounds with multiple experimental measurements.
2. Keeping only compounds with all five Abraham parameters available (measured, not predicted) [9].

This dataset of N = 86 compounds with EACN values and Abraham parameters is available under a CC0 license from figshare [11]. Modeling was performed using R v4.2.0 (R Core Team, Vienna, Austria) [12].
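The two curation steps can be expressed compactly. The authors performed their modeling in R; the Python sketch below only illustrates the same logic, with hypothetical function and argument names. It assumes that descriptors which are missing or only predicted have already been marked as `None` upstream, since distinguishing measured from predicted values depends on the source database.

```python
from statistics import median

DESCRIPTORS = ("E", "S", "A", "B", "V")


def build_dataset(eacn_measurements, abraham):
    """Combine repeated EACN measurements with measured Abraham descriptors.

    eacn_measurements: list of (compound, EACN) pairs; a compound may repeat.
    abraham: dict mapping compound -> {descriptor: value}, with None for a
             descriptor that is missing or only predicted.
    """
    grouped = {}
    for name, value in eacn_measurements:
        grouped.setdefault(name, []).append(value)

    dataset = {}
    for name, values in grouped.items():
        params = abraham.get(name, {})
        # Step 2: keep only compounds with all five measured descriptors.
        if all(params.get(d) is not None for d in DESCRIPTORS):
            # Step 1: use the median of repeated EACN measurements.
            row = {"EACN": median(values)}
            row.update({d: params[d] for d in DESCRIPTORS})
            dataset[name] = row
    return dataset
```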


**Table 1.** Measured EACN values with available Abraham Solute Parameters.


#### **3. Results**

The standard Abraham solute descriptor-based model is represented by Equation (6), where c is the intercept, E, S, A, B, and V are the Abraham solute descriptors, and e, s, a, b, v are their coefficients obtained by linear regression:

$$\text{EACN} = c + e \cdot E + s \cdot S + a \cdot A + b \cdot B + v \cdot V \tag{6}$$

When using all the data, we found a moderate correlation (*R*<sup>2</sup> = 0.61) between S and B. Removing the S-parameter outliers (S > 1) bis(2-ethylhexyl) adipate, tristearin, and methyl dihydrojasmonate resulted in a dataset in which all pairwise correlations had coefficients of determination below 0.50, and the coefficient of determination of each parameter regressed against all the others was below 0.80. Removing the S-parameter outliers makes the model more reliable, but may limit its application to a smaller chemical space.
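The collinearity screen described above can be reproduced with ordinary least squares. In the following NumPy sketch (hypothetical helper name), `X` is an N × 5 array of E, S, A, B, V values; the function returns the matrix of pairwise coefficients of determination and, for each descriptor, the *R*<sup>2</sup> obtained by regressing it on the other four with an intercept.

```python
import numpy as np


def collinearity_report(X, names):
    """Pairwise r^2 matrix and R^2 of each column regressed on all others."""
    X = np.asarray(X, dtype=float)
    # Squared Pearson correlations between every pair of descriptors.
    pairwise = np.corrcoef(X, rowvar=False) ** 2
    r2_vs_rest = {}
    for j, name in enumerate(names):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(y)), others])  # intercept + others
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2_vs_rest[name] = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return pairwise, r2_vs_rest
```

The 0.80 threshold on the second quantity corresponds to a variance inflation factor, 1/(1 − *R*<sup>2</sup>), below 5.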

Performing linear regression with leave-one-out (LOO) cross-validation showed that dodecylcyclohexane, decylcyclohexane, and octyl octanoate were clear outliers. After removing these three outliers and repeating the LOO linear regression, we found that EACN can be estimated from Abraham solute parameters (LOO statistics: *N* = 80, *R*<sup>2</sup> = 0.92, RMSE = 1.16, MAE = 0.90, *p* < 2.2 × 10<sup>−16</sup>) with accuracy similar to that of previous models [5–8], at least within the chemical space represented in the study:

$$\text{EACN} = -2.16 - 2.08 \cdot \text{E} - 9.51 \cdot \text{S} - 50.91 \cdot \text{A} - 5.41 \cdot \text{B} + 6.83 \cdot \text{V} \tag{7}$$
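The leave-one-out procedure behind these statistics can be sketched as follows: each compound is left out in turn, the regression is refit on the remaining compounds, and the held-out compound is predicted. This NumPy sketch is an assumption about the procedure rather than the authors' R code; in particular, it reports the LOO *R*<sup>2</sup> as 1 − PRESS/SS<sub>tot</sub> (often called *Q*<sup>2</sup>), which may differ from the definition used in this work.

```python
import numpy as np


def loo_linear_regression(X, y):
    """Leave-one-out predictions and error statistics for OLS with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop compound i from the training set
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        preds[i] = design[i] @ coef  # predict the held-out compound
    resid = preds - y
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    q2 = float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
    return preds, rmse, mae, q2
```

Candidate outliers, such as those identified above, can be listed by sorting compounds by their absolute LOO residual, e.g., `np.argsort(-np.abs(preds - y))[:3]`.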

Analyzing the EACN estimates for each compound type, we see that the model performs best for alkenes, terpenes, alkynes, and aromatics (N = 31, ME = 0.37, MAE = 0.57, RMSE = 0.76). Good performance is seen for the four other types: branched and cyclic alkanes (N = 16, ME = −0.81, MAE = 0.85, RMSE = 1.11); halogenated alkanes (N = 6, ME = 0.99, MAE = 0.99, RMSE = 1.16); fragrances, acrylates, and miscellaneous (N = 9, ME = −0.16, MAE = 0.94, RMSE = 1.18); and ethers, esters, nitriles, and ketones (N = 18, ME = −0.17, MAE = 1.16, RMSE = 1.32). The model consistently over-predicts halogenated alkanes and under-predicts branched and cyclic alkanes (see Figure 1).
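The per-group statistics follow from the signed errors. In the Python sketch below (hypothetical names), ME is taken as the mean of predicted minus measured values, which matches the sign convention implied above: a positive ME for halogenated alkanes corresponds to over-prediction.

```python
import math
from collections import defaultdict


def group_errors(records):
    """records: iterable of (group, measured, predicted) tuples.

    Returns N, ME = mean(predicted - measured), MAE and RMSE for each group.
    """
    by_group = defaultdict(list)
    for group, measured, predicted in records:
        by_group[group].append(predicted - measured)
    stats = {}
    for group, errs in by_group.items():
        n = len(errs)
        stats[group] = {
            "N": n,
            "ME": sum(errs) / n,
            "MAE": sum(abs(e) for e in errs) / n,
            "RMSE": math.sqrt(sum(e * e for e in errs) / n),
        }
    return stats
```

Because ME equals MAE only when no error is negative, the identical ME and MAE reported for halogenated alkanes (0.99) indicate that, to the reported precision, all six were over-predicted.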

**Figure 1.** Estimated vs. Measured EACN values colored by type: alkenes, terpenes, alkynes, and aromatics (dark blue); branched and cyclic alkanes (orange); ethers, esters, nitriles, and ketones (red); fragrances, acrylates, and miscellaneous (light blue); and halogenated alkanes (green).

#### **4. Discussion**

We have demonstrated that EACN can be estimated using the standard Abraham solute parameter model, Equation (7). The first four descriptors (E, S, A, and B) all have negative coefficients, whereas the size descriptor V has a positive coefficient: all else being equal, larger molecules have higher EACN values, while excess molar refractivity, dipolarity/polarizability, and hydrogen bonding lower them. These results align with previous results [5–8]: although those models use different parameter systems, they show similar accuracy and indicate that EACN has contributions from shape (size and branching) [5–8], polarity/polarizability [5–8], and hydrogen bond basicity [6–8]. Our addition of hydrogen bond acidity, represented by the A descriptor, leads to superior estimation of EACN values for alkynes, something not seen in previous models.
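Applying Equation (7) is a single weighted sum. The Python sketch below (hypothetical names) adds a warning for S > 1, since compounds in that range were excluded before fitting. It also illustrates the size term: adding one CH<sub>2</sub> group increases V by 0.1409 and, with all other descriptors held fixed, raises the estimated EACN by 6.83 × 0.1409 ≈ 0.96, close to the one-unit-per-carbon increase that defines EACN for n-alkanes.

```python
import warnings

# Coefficients of Equation (7) as reported in this work.
COEFFS = {"c": -2.16, "E": -2.08, "S": -9.51, "A": -50.91, "B": -5.41, "V": 6.83}


def eacn_eq7(E, S, A, B, V):
    """Estimate EACN from the five Abraham solute descriptors via Equation (7)."""
    if S > 1.0:
        # Compounds with S > 1 were removed before fitting: this is extrapolation.
        warnings.warn("S > 1 lies outside the descriptor range used to fit Equation (7)")
    return (COEFFS["c"] + COEFFS["E"] * E + COEFFS["S"] * S
            + COEFFS["A"] * A + COEFFS["B"] * B + COEFFS["V"] * V)


# Effect of one extra CH2 group (delta V = 0.1409), other descriptors fixed.
ch2_step = eacn_eq7(0, 0, 0, 0, 0.1409) - eacn_eq7(0, 0, 0, 0, 0)
print(f"EACN change per CH2: {ch2_step:.2f}")  # prints "EACN change per CH2: 0.96"
```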

We began with a dataset of N = 86 oils with both measured EACN values and measured Abraham solute descriptors. During modeling, we removed six compounds: bis(2-ethylhexyl) adipate, tristearin, methyl dihydrojasmonate, decylcyclohexane, dodecylcyclohexane, and octyl octanoate.

The first three compounds were removed because their large S values resulted in an artificially high collinearity with the B parameter. The second set of three compounds was removed as outliers from a first LOO cross-validation analysis. Even so, the EACN values predicted by Equation (7) for these compounds are generally of the right order of magnitude (see the predicted values in the open dataset [11]).

The utility of Equation (7) can also be seen by using it to predict EACN values of compounds that have measured EACN values but do not have measured Abraham solute parameters. Using predicted Abraham solute parameters [9], we predicted the EACN values of several such compounds from Table 1 (see Table 2). For the compounds listed in Table 2, Equation (7) performs relatively well, with statistics similar to those found for the estimated EACN results above: N = 11, ME = 0.17, MAE = 1.13, RMSE = 1.61.


**Table 2.** Equation (7)-predicted EACN values using predicted Abraham solute parameters.

The most recent paper by Delforce et al. [8] notes that the EACN of 2.2 previously reported for diisopropyl ether [26] is an outlier for their model. Our model estimates the EACN of diisopropyl ether to be 0.2, which is in line with their newly measured value of 0.6.

Our approach provides a useful tool for estimating equivalent alkane carbon numbers because Abraham solute parameters are available for a large number of compounds [9,10]. While a general model is presented here, models for specific families of compounds can easily be created using our open dataset [11]. We also note that EACN values of individual hydrocarbons estimated with Equation (7) can be used to estimate the EACN of heavy hydrocarbon mixtures, EACN<sub>mix</sub>, using the expression proposed by Cayias et al. [30] and Cash et al. [31]:

$$\text{EACN}_{\text{mix}} = \sum_{i=1}^{N} x_i \, \text{EACN}_i \tag{8}$$

where *x*<sub>*i*</sub> and EACN<sub>*i*</sub> denote the mole fraction and the EACN value of the individual hydrocarbon component *i*, respectively.
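Equation (8) is a mole-fraction-weighted average. A minimal Python sketch (hypothetical function name) that also checks that the mole fractions sum to one:

```python
import math


def eacn_mix(mole_fractions, eacns):
    """Equation (8): mole-fraction-weighted EACN of a hydrocarbon mixture."""
    if len(mole_fractions) != len(eacns):
        raise ValueError("one EACN value is needed per component")
    if not math.isclose(sum(mole_fractions), 1.0, abs_tol=1e-6):
        raise ValueError("mole fractions must sum to 1")
    return sum(x * e for x, e in zip(mole_fractions, eacns))
```

For an equimolar mixture of n-octane (EACN = 8) and n-dodecane (EACN = 12), it gives EACN<sub>mix</sub> = 10. In practice, the component EACN values could be estimates from Equation (7).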

Future research directions include measuring the EACN values of more diverse compounds, especially those with known non-zero A-parameter values.

**Author Contributions:** Conceptualization, W.-K.C.; methodology, W.E.A.J., A.S.I.D.L. and H.M.; data collection and curation, A.S.I.D.L. and H.M.; writing—original draft preparation, W.E.A.J., W.-K.C., A.S.I.D.L. and H.M.; writing—review and editing, W.E.A.J., W.-K.C., A.S.I.D.L. and H.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** Data used in this study is available from figshare under a CC0 license [11].

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

