3.2.5. Modeling of Correlation between pH and Descriptors

In further calculations, it was assumed that the measured pH of a DES can be described as a function of the σ profile of the mixture, expressed by the set of *S*<sup>*i*</sup><sub>mix</sub> descriptors, as in Equation (2):

$$\text{pH} = f\left(S_{\text{mix}}^{1}, S_{\text{mix}}^{2}, S_{\text{mix}}^{3}, S_{\text{mix}}^{4}, S_{\text{mix}}^{5}, S_{\text{mix}}^{6}, S_{\text{mix}}^{7}, S_{\text{mix}}^{8}, S_{\text{mix}}^{9}, S_{\text{mix}}^{10}\right) \tag{2}$$

Multiple linear regression (MLR) with Equation (3), piecewise linear regression (PLR) with Equation (4), and artificial neural network (ANN) models were tested to describe the relationship between the input and output variables. The data set comprised 142 data points (including replicates), of which 126 were used for model development and 16 (randomly selected) for independent model validation:

$$\text{pH} = b_0 + b_1 \cdot S_{\text{mix}}^{1} + b_2 \cdot S_{\text{mix}}^{2} + b_3 \cdot S_{\text{mix}}^{3} + b_4 \cdot S_{\text{mix}}^{4} + b_5 \cdot S_{\text{mix}}^{5} + b_6 \cdot S_{\text{mix}}^{6} + b_7 \cdot S_{\text{mix}}^{7} + b_8 \cdot S_{\text{mix}}^{8} + b_9 \cdot S_{\text{mix}}^{9} + b_{10} \cdot S_{\text{mix}}^{10} \tag{3}$$

$$\text{pH} = \begin{cases} b_{01} + \sum_{i=1}^{10} b_{i1} \cdot S_{\text{mix}}^{i}, & \text{pH} \le b_{n} \\ b_{02} + \sum_{i=1}^{10} b_{i2} \cdot S_{\text{mix}}^{i}, & \text{pH} > b_{n} \end{cases} \tag{4}$$

The PLR technique estimates the parameters of two linear regression equations: one for values of the dependent variable (*y*, here the pH) less than or equal to the breakpoint (*b*<sub>n</sub>), and the other for values of the dependent variable above the breakpoint.
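As a minimal sketch of how Equation (4) can be evaluated, the Python fragment below selects one of the two linear branches according to whether the observed value lies at or below the breakpoint, and sums the squared residuals. The function names (`linear`, `plr_predict`, `plr_sse`), the two-descriptor toy data, and the coefficient values are hypothetical and serve only as an illustration; they are not taken from this study, and the sketch does not reproduce the Statistica implementation.

```python
def linear(b, s):
    """Return b[0] + sum_i b[i] * s[i-1] for one descriptor vector s."""
    return b[0] + sum(bi * si for bi, si in zip(b[1:], s))


def plr_predict(b_low, b_high, b_n, s, y_obs):
    """Evaluate Equation (4): the branch is chosen by comparing the
    observed value y_obs with the breakpoint b_n."""
    return linear(b_low, s) if y_obs <= b_n else linear(b_high, s)


def plr_sse(b_low, b_high, b_n, S, y):
    """Sum of squared residuals over all data points."""
    return sum(
        (yi - plr_predict(b_low, b_high, b_n, si, yi)) ** 2
        for si, yi in zip(S, y)
    )


# Hypothetical toy example with two descriptors instead of ten.
b_low = [1.0, 2.0, 0.0]    # b_01, b_11, b_21
b_high = [5.0, 0.0, 1.0]   # b_02, b_12, b_22
b_n = 4.0                  # assumed breakpoint
S_toy = [[0.5, 0.0], [1.0, 3.0]]
y_toy = [2.0, 8.0]

sse = plr_sse(b_low, b_high, b_n, S_toy, y_toy)
```

Because the branch depends on the dependent variable itself, a rule for choosing the branch is needed when the pH of a new mixture is unknown; that choice is outside the scope of this sketch.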

The MLR parameters in Equation (3) were estimated using least-squares regression, while the PLR parameters in Equation (4) were estimated using the Levenberg–Marquardt algorithm implemented in the software Statistica 13.0 (Tibco Software Inc., Palo Alto, Santa Clara, CA, USA). The algorithm searches for optimal solutions in the function parameter space using the least-squares method. The calculations were performed in 50 repetitions with a convergence parameter of 10<sup>−6</sup> and a confidence interval of 95% [22].
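The least-squares estimation of the coefficients in Equation (3) can be illustrated with the normal equations, (XᵀX)b = Xᵀy. The following is only a sketch in plain Python under stated assumptions: the helper names (`solve`, `fit_mlr`) and the noise-free toy data with two descriptors are hypothetical, and the code is not the routine used in Statistica.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_mlr(S, y):
    """Return [b_0, b_1, ...] minimizing the sum of squared residuals."""
    X = [[1.0] + list(row) for row in S]  # prepend the intercept column
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical toy data generated from y = 1 + 2*s1 - 3*s2 (no noise).
S_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_toy = [1 + 2 * s1 - 3 * s2 for s1, s2 in S_toy]
b_hat = fit_mlr(S_toy, y_toy)
```

For the ten descriptors used here, the same function would return eleven coefficients, provided that the descriptor columns are not collinear.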

In addition, multilayer perceptron (MLP) ANNs were used to predict DES pH values from the *S*<sup>*i*</sup><sub>mix</sub> descriptors. The ANN models comprised an input layer, a hidden layer, and an output layer. The input layer had 10 neurons representing the *S*<sup>*i*</sup><sub>mix</sub> descriptors, the output layer had a single neuron, and the number of neurons in the hidden layer varied between 4 and 13 and was randomly selected by the algorithm. The hidden and output activation functions were selected randomly from the following set: Identity, Logistic, Hyperbolic tangent, and Exponential. The data set for ANN modeling had dimensions of 126 × 11 and was randomly divided into 70% for network training, 15% for network testing, and 15% for model validation. Model training was carried out in Statistica v.13.0 Automated Neural Networks using the error backpropagation algorithm with a sum-of-squares error function. The performance of the developed models was assessed by calculating the *R*<sup>2</sup> and root mean squared error (*RMSE*) values for the training, test, and validation sets.
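The network architecture described above (10 inputs, one hidden layer with 4 to 13 neurons, one output) and the 70/15/15 division can be sketched as follows. This is an illustrative forward pass with random, untrained weights; it does not reproduce the Statistica Automated Neural Networks procedure. The names `mlp_forward` and `split_indices`, the seeds, the descriptor values, and the rounding rule used for the subset sizes are all assumptions.

```python
import math
import random

# The four activation functions named in the text.
ACTIVATIONS = {
    "identity": lambda z: z,
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "exponential": math.exp,
}


def mlp_forward(s, W1, b1, w2, b2, hidden="tanh", output="identity"):
    """Forward pass of a single-hidden-layer perceptron with one output."""
    f, g = ACTIVATIONS[hidden], ACTIVATIONS[output]
    h = [f(bj + sum(w * si for w, si in zip(row, s))) for row, bj in zip(W1, b1)]
    return g(b2 + sum(wj * hj for wj, hj in zip(w2, h)))


def split_indices(n, seed=0):
    """Shuffle indices and split them roughly 70/15/15 (rounding assumed)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(0.70 * n)
    n_test = round(0.15 * n)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


rng = random.Random(1)
n_hidden = rng.randint(4, 13)  # hidden-layer size drawn from 4..13
W1 = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
s = [0.1] * 10                 # hypothetical descriptor vector
y_hat = mlp_forward(s, W1, b1, w2, 0.0, hidden="tanh", output="identity")

train, test, val = split_indices(126)
```

In an actual training run, the weights would be adjusted by backpropagation to minimize the sum-of-squares error on the training subset, with the test subset used to monitor generalization.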

Validation of the developed MLR, PLR, and ANN models was performed on an independent data set comprising the *S*<sup>*i*</sup><sub>mix</sub> descriptors for 16 randomly selected DESs. The validation performance of the developed models was assessed based on the *R*<sup>2</sup> and root mean squared error (*RMSE*).
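For reference, the two performance measures can be computed as in the short sketch below. It assumes that *R*<sup>2</sup> is the coefficient of determination, 1 − SS<sub>res</sub>/SS<sub>tot</sub>; the text does not state which definition the software reports, so this is an assumption, and the function names and toy values are hypothetical.

```python
import math


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def r_squared(y_obs, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted pH values.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 3.0, 4.0, 5.0]
print(rmse(y_obs, y_pred), r_squared(y_obs, y_pred))
```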

#### **4. Conclusions**

The applicability of MLR, PLR, and ANN to predict the pH values of DESs was evaluated. The results indicate that although multiple linear regression can be used for description and prediction, its effectiveness and applicability are limited. On the other hand, PLR and ANN are applicable to predict the pH values of DESs with a very high goodness of fit (*R*<sup>2</sup> > 0.8600). The contribution of this work lies in the development of a user-friendly model to predict pH values over a wide range (from 0.525 to 9.25), indicating that the developed models are suitable for predicting the pH of newly synthesized DESs. Owing to its simplicity, the developed PLR model could be suggested as the model of choice for daily work and screening purposes.

This approach can also be extended to other physicochemical properties, since this study confirms previous findings that the σ profile generated in COSMOtherm is a valuable molecular descriptor of DESs. It could serve as a good basis for evaluating various mathematical models in order to develop a simple and applicable prediction model for everyday laboratory or industrial applications.

It is interesting to comment on the influence of the addition of water to a DES. In our previous article [7], based on a limited set of data, it was noticed that the addition of water to extremely acidic DESs increases their pH values, and the addition of water to highly basic DESs decreases their pH values. Thus, it seemed that the addition of water somehow moderated the pH environment. However, on the larger set of data presented here, this conclusion no longer holds: there are difficult-to-predict exceptions to the rule. The COSMO-RS calculation results in combination with non-presumptive numerical models, such as MLR, PLR, and ANN, are well suited to tackle those difficult-to-predict systems.

**Author Contributions:** Conceptualization, I.R.R., M.P. and A.J.T.; methodology, M.P. and A.J.T.; software, M.P., M.R. (Mia Radović), M.R. (Marko Rogošić), and J.A.P.C.; validation, M.R. (Mia Radović), M.C.B., K.R. and J.A.P.C.; formal analysis, M.P., M.R. (Mia Radović), and M.C.B.; investigation, M.P., M.C.B., K.R. and M.R. (Mia Radović); resources, I.R.R.; data curation, M.P., M.R. (Mia Radović), M.R. (Marko Rogošić), and A.J.T.; writing – original draft preparation, M.P. and A.J.T.; writing – review and editing, M.P., M.R. (Mia Radović), M.C.B., K.R., M.R. (Marko Rogošić), I.R.R., A.J.T. and J.A.P.C.; visualization, M.P., M.R. (Marko Rogošić), and A.J.T.; supervision, I.R.R.; project administration, I.R.R.; funding acquisition, I.R.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partly developed within the scope of the project CICECO-Aveiro Institute of Materials, UIDB/50011/2020 & UIDP/50011/2020, financed by national funds through the Portuguese Foundation for Science and Technology/MCTES. This work was also financed by the Croatian Science Foundation (grant No. 7712).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Sample Availability:** Samples of the compounds are available from the authors.
