*2.2. Stage 2: CAMD Formulation*

#### 2.2.1. Selection of Property Prediction Models

In the second stage, suitable property prediction models were selected to estimate the solvent's target properties via a GC approach. In the GC approach, a property of a compound is defined as a function of structurally dependent parameters, which are estimated by summing the contribution of each structural group according to the number of times it occurs in the solvent molecule. The method is mainly attributed to the work of Joback and Reid [8]. The general GC property estimation model is given in Equation (2).

$$f(X) = \sum_{i} N_{i} C_{i} + w \sum_{j} M_{j} D_{j} + z \sum_{k} O_{k} E_{k} \tag{2}$$

Here, *C<sub>i</sub>* is the contribution of the first-order group of type *i* that occurs *N<sub>i</sub>* times, *D<sub>j</sub>* is the contribution of the second-order group of type *j* that occurs *M<sub>j</sub>* times, and *E<sub>k</sub>* is the contribution of the third-order group of type *k* that occurs *O<sub>k</sub>* times. In addition, mixing rules were applied to determine the final property values of the targeted solvent-oil blend. The selected property prediction models and mixing rules are listed in Appendix A: Table A1. In this study, crude bio-oil derived via the fast pyrolysis of palm kernel shells (PKS) was used as the basis [34]. However, only the organic phase of the pyrolysis bio-oil was considered. The properties and components of the pyrolysis bio-oil used in this study are summarised in Appendix A: Table A2.
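The summation in Equation (2) can be sketched in a few lines of Python. This is only an illustration: the contribution values are placeholders rather than entries from any published GC table, the second-order group name is hypothetical, and treating *w* and *z* as 0/1 switches for the higher-order terms is an assumption about how these weights are used.

```python
from typing import Dict, Optional, Tuple

# Maps a group label to (number of occurrences, contribution value).
Level = Dict[str, Tuple[int, float]]


def level_sum(groups: Optional[Level]) -> float:
    """Sum occurrences * contribution over one order of groups."""
    if not groups:
        return 0.0
    return sum(n * c for n, c in groups.values())


def gc_property(first: Level, second: Optional[Level] = None,
                third: Optional[Level] = None, w: int = 0, z: int = 0) -> float:
    """Evaluate the right-hand side of Equation (2)."""
    return level_sum(first) + w * level_sum(second) + z * level_sum(third)


# 1-pentanol written as CH3 + 4 CH2 + OH; contribution values are placeholders.
first_order: Level = {"CH3": (1, 1.0), "CH2": (4, 0.5), "OH": (1, 2.0)}
# Hypothetical second-order correction with a placeholder value.
second_order: Level = {"second_order_group_A": (1, 0.25)}

f_first = gc_property(first_order)                      # first-order terms only
f_second = gc_property(first_order, second_order, w=1)  # adds the second-order term
print(f_first, f_second)
```

With real contribution tables, only the dictionaries would change; the property-specific transformation *f(X)* and the mixing rules would still have to be applied separately.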

#### 2.2.2. Structural Constraints

Besides the property constraints described above, structural constraints were included in the CAMD model to ensure that only feasible molecules are formed and to restrict the design to structures that favour the targeted properties. The molecular building blocks that form the solvent's structure must also be chosen carefully. Molecular design can draw on many functional group families; in this work, the groups listed in Appendix A: Table A3 were selected, whose group-contribution parameters for property estimation are available from the work of Marrero and Gani [34].

In addition, for the designed molecule to be complete, the final molecular structure must not have any free bonds. In other words, the free bond number (FBN) of the final solvent molecule must equal zero for the molecule to be considered feasible [35]. The FBN is expressed mathematically in Equation (3), where *N<sub>i</sub>* is the number of groups of type *i* in the molecule, *FBN<sub>i</sub>* is the number of free bonds of group *i*, and *N<sub>r</sub>* is the number of rings in the structure. Bounds were placed on *N<sub>r</sub>* and *N<sub>i</sub>* so that enough groups are present to form an aliphatic chain or a ring.

$$FBN = \sum_{i} N_{i} \, FBN_{i} - 2\left(\sum_{i} N_{i} - 1\right) - 2N_{r} = 0 \tag{3}$$

For acyclic structures: *N<sub>i</sub>* ≥ 0. For cyclic structures: *N<sub>i</sub>* ≥ 3. For ring molecules: *N<sub>r</sub>* ≥ 1.
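As a quick check of Equation (3), the FBN can be evaluated for a few small structures. The sketch below is illustrative only; the function name and the small free-bond table are introduced here, with each group's free-bond count taken as its number of open valences.

```python
from typing import Dict

# Number of free bonds (open valences) per group.
FREE_BONDS: Dict[str, int] = {"CH3": 1, "CH2": 2, "CH": 3, "C": 4, "OH": 1}


def free_bond_number(group_counts: Dict[str, int], n_rings: int = 0) -> int:
    """Evaluate the left-hand side of Equation (3) for a set of groups."""
    total_groups = sum(group_counts.values())
    offered = sum(n * FREE_BONDS[g] for g, n in group_counts.items())
    return offered - 2 * (total_groups - 1) - 2 * n_rings


propane = {"CH3": 2, "CH2": 1}      # CH3-CH2-CH3
cyclohexane = {"CH2": 6}            # six CH2 groups closed into one ring
incomplete = {"CH3": 1, "CH2": 2}   # CH3-CH2-CH2- with one dangling bond

print(free_bond_number(propane))                 # 0, feasible
print(free_bond_number(cyclohexane, n_rings=1))  # 0, feasible
print(free_bond_number(incomplete))              # 1, one free bond remains
```

A value of zero is what the constraint requires; any positive value means that bonds are left unsatisfied, and a negative value means that the groups cannot supply enough bonds to connect the structure.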

#### 2.2.3. Formulation of CAMD Model

The CAMD model was formulated with the generalised mathematical expressions shown in Equations (4)–(8) [10]:

$$F_{obj} = \max / \min F(x, p) \tag{4}$$

$$h_1(p, x) \le 0 \tag{5}$$

$$s_1(x) \le 0 \tag{6}$$

$$p_k^{L} \le p_k \le p_k^{U} \; \forall k \tag{7}$$

$$x_g^{L} \le x_g \le x_g^{U} \; \forall g \tag{8}$$

Here, *F<sub>obj</sub>* (Equation (4)) is the objective function, which minimises or maximises one or more parameters, and *F(x, p)* is the vector of objective functions that evaluates the performance of the designed solvent based on its properties *p*. Equation (5) is the general form of the target property constraints, which correspond to the solvent design specifications. As the properties of each solvent molecule depend strongly on the GC building blocks present, this constraint can limit the number of appearances of specific GC groups in the designed molecule. Equation (6) is the general form of the structural constraints, which ensure the structural feasibility of the generated solvent molecule. In Equations (7) and (8), *p<sub>k</sub>* is the value of property *k* and *x<sub>g</sub>* is the number of occurrences of GC group *g*; the two equations set the bounds on *p<sub>k</sub>* and *x<sub>g</sub>*. Here, *p<sub>k</sub><sup>L</sup>* and *x<sub>g</sub><sup>L</sup>* are the lower bounds, and *p<sub>k</sub><sup>U</sup>* and *x<sub>g</sub><sup>U</sup>* are the upper bounds, of *p<sub>k</sub>* and *x<sub>g</sub>*, respectively. With the developed CAMD framework, solvent candidates that were feasible, available, and relatively well established at commercial or industrial scale were identified and carried forward to the next stage.
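To make the structure of Equations (4)–(8) concrete, the following toy problem enumerates every combination of group counts within the bounds of Equation (8), discards structures that violate the FBN condition (standing in for Equation (6)), keeps those whose estimated property lies within the bounds of Equation (7), and then selects the best candidate according to Equation (4). Exhaustive enumeration is used purely for illustration and is not claimed to be the solution method of this study; the group set, contribution values, and bounds are all placeholders.

```python
from itertools import product

GROUPS = ("CH3", "CH2", "OH")
FREE_BONDS = {"CH3": 1, "CH2": 2, "OH": 1}
CONTRIB = {"CH3": 1.0, "CH2": 0.5, "OH": 2.0}            # placeholder GC values
X_BOUNDS = {"CH3": (0, 2), "CH2": (0, 4), "OH": (0, 1)}  # Equation (8), placeholders
P_LOWER, P_UPPER = 3.0, 4.5                              # Equation (7), placeholders


def estimate_property(x):
    """First-order GC estimate of a single property p."""
    return sum(x[g] * CONTRIB[g] for g in GROUPS)


def structure_ok(x):
    """Structural constraint: acyclic molecule with zero free bonds."""
    n_groups = sum(x.values())
    fbn = sum(x[g] * FREE_BONDS[g] for g in GROUPS) - 2 * (n_groups - 1)
    return fbn == 0


def feasible_candidates():
    """Enumerate all group-count vectors satisfying the constraints."""
    ranges = [range(X_BOUNDS[g][0], X_BOUNDS[g][1] + 1) for g in GROUPS]
    found = []
    for counts in product(*ranges):
        x = dict(zip(GROUPS, counts))
        if not structure_ok(x):
            continue
        p = estimate_property(x)
        if P_LOWER <= p <= P_UPPER:
            found.append((p, x))
    return found


candidates = feasible_candidates()
# Equation (4) with a single objective: maximise the estimated property.
best_p, best_x = max(candidates, key=lambda c: c[0])
print(len(candidates), best_p, best_x)
```

For realistic group sets, the number of combinations grows quickly, which is why CAMD problems are usually handed to a mathematical programming solver rather than enumerated.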

#### 2.2.4. Database Verification

All solvent candidates generated from the CAMD model were verified through a search of online databases such as PubChem and ChemSpider. The main purpose of this step was to ensure that the generated solvent candidates were feasible and practical enough for real-life applications. For solvent candidates that could be found in the databases, the property values estimated in the design problem were compared with the database values to validate the CAMD results. For solvent candidates that could not be found in the databases or proved to be infeasible, the previous step was revisited by revising the property attributes and constraints.
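The verification loop can be summarised by the sketch below. It assumes that the database values have already been collected into a dictionary, uses hypothetical candidate names and values, and applies a 10% relative tolerance that is an illustrative choice rather than a criterion stated in this study. Treating a large deviation as grounds for revisiting the design is likewise an assumption.

```python
def verify_candidates(estimated, database, rel_tol=0.10):
    """Split candidates into accepted ones and ones that need a design revisit."""
    accepted, revisit = [], []
    for name, p_est in estimated.items():
        p_ref = database.get(name)
        if p_ref is None:
            # Not found in the database: revisit attributes and constraints.
            revisit.append(name)
            continue
        # Relative deviation; assumes a non-zero reference value.
        deviation = abs(p_est - p_ref) / abs(p_ref)
        if deviation <= rel_tol:
            accepted.append(name)
        else:
            revisit.append(name)
    return accepted, revisit


# Hypothetical estimated and reference values for one property.
estimated = {"candidate_A": 105.0, "candidate_B": 150.0, "candidate_C": 80.0}
database = {"candidate_A": 100.0, "candidate_B": 100.0}

accepted, revisit = verify_candidates(estimated, database)
print(accepted, revisit)
```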
