*3.1. Cross Validation and Model Development*

Cross validation and LASSO fitting were carried out in the statistical software R (via RStudio) using the "glmnet" package written by Friedman et al. [20]. R is commonly used for linear and non-linear regression as well as classification, and many packages and subroutines are available for it, including machine learning-based methods such as the LASSO. In process/chemical engineering, alternative software such as Aspen Plus is a very powerful tool for both simulation and regression of parameters in linear and non-linear expressions but, as far as we know, it does not offer shrinkage-based model reduction (although subroutines could perhaps be written to add this functionality in the future).

An example of the output of cross validation is shown in Figure 3, which demonstrates how the mean square error (MSE) from cross validation varies with the value of the tuning parameter *λ*. This particular graph shows the cross validation results for the prediction of hydrogen mole % in the produced gas based on a linear expression in terms of the 11 inputs. The numbers along the top edge show that the number of inputs included in the model decreases as *λ* increases, with the minimum MSE obtained with 7 of the 11 inputs.
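The shape of such a cross validation curve can be reproduced with a minimal sketch. The analysis here was done with glmnet in R; the block below is not that workflow but a plain-Python illustration of the same idea, assuming a hand-written coordinate-descent LASSO, 5 folds, a hand-picked grid of *λ* values, and synthetic data in which only two of six inputs matter. All function and variable names are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: shrinks z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)*sum((y - b0 - X.beta)^2) + lam*sum(|beta|)
    by cyclic coordinate descent. X is a list of rows; columns are
    assumed to be non-constant and on comparable scales."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    resid = list(y)  # residuals of the initial all-zero model
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        # The intercept is not penalised: absorb the mean residual.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return b0, beta


def cv_mse(X, y, lambdas, k=5, seed=0):
    """k-fold cross validation MSE for each value in lambdas."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    result = []
    for lam in lambdas:
        sq_err = 0.0
        for test in folds:
            held = set(test)
            train = [i for i in idx if i not in held]
            b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            for i in test:
                pred = b0 + sum(b * x for b, x in zip(beta, X[i]))
                sq_err += (y[i] - pred) ** 2
        result.append(sq_err / n)
    return result


# Synthetic data (not from the study): only inputs 0 and 1 affect y.
rng = random.Random(1)
n, p = 60, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.5) for row in X]

lambdas = [10.0, 3.0, 1.0, 0.3, 0.1, 0.03, 0.01]
mse = cv_mse(X, y, lambdas)
# Number of non-zero coefficients at each lambda, fitted on all the data
# (the analogue of the numbers along the top edge of the plot).
counts = [sum(b != 0.0 for b in lasso_fit(X, y, lam)[1]) for lam in lambdas]
best_lam = lambdas[mse.index(min(mse))]
for lam, m, c in zip(lambdas, mse, counts):
    print(f"lambda={lam:<5} cv_mse={m:.3f} nonzero={c}")
print("lambda at minimum CV MSE:", best_lam)
```

Plotting `mse` against the log of `lambdas`, with `counts` annotated along the top, gives a curve of the same kind as Figure 3. The largest *λ* removes every input, and smaller values let more inputs enter the model.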

In particular, the results show that the inputs eliminated are C, H, Fs, and bulk, so the reduced linear expression can be stated as

$$H_2(\%) = \beta_0 + \beta_1 Tgas + \beta_2 ER + \beta_3 MC + \beta_4 O + \beta_5 Ash + \beta_6 Gr + \beta_7 void \tag{6}$$

If starting from a quadratic expression, it might be expected that a larger number of inputs or combinations of inputs would result. However, the cross validation in Figure 4 shows a minimum MSE located near the point where there are only two inputs. Looking at the data, the two remaining terms after this point are TgasER, the product of gasification temperature and equivalence ratio, and MCAsh, the product of moisture content and ash content, suggesting that a very simple expression can be obtained:

$$H_2(\%) = \beta_0 + \beta_1 TgasER + \beta_2 MCAsh \tag{7}$$

At the exact minimum, however, a third product, ERGr (the product of equivalence ratio and grate rotation speed), and a fourth, Ovoid (the product of elemental oxygen content and the void fraction), also appear:

$$H_2(\%) = \beta_0 + \beta_1 TgasER + \beta_2 MCAsh + \beta_3 ERGr + \beta_4 Ovoid \tag{8}$$
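To illustrate how product terms such as TgasER and MCAsh arise as candidate terms, the sketch below expands the 11 inputs into a quadratic set of terms. It assumes that the quadratic expression contains the linear terms, the squares, and all pairwise products; the exact form of Equation (5) may differ. The ordering of the inputs, the helper names, and the numerical values in the example row are hypothetical placeholders, not data from the study.

```python
from itertools import combinations_with_replacement

# The 11 inputs named in the text; the ordering is an assumption.
INPUTS = ["Tgas", "ER", "MC", "C", "H", "O", "Ash", "Fs", "bulk", "Gr", "void"]


def quadratic_terms(names, include_linear=True):
    """Return candidate terms as tuples of input names: optionally the
    linear terms, then all squares and pairwise products."""
    terms = [(name,) for name in names] if include_linear else []
    terms += list(combinations_with_replacement(names, 2))
    return terms


def term_name(term):
    """Concatenate input names, e.g. ('Tgas', 'ER') -> 'TgasER'."""
    return "".join(term)


def expand_row(row, terms):
    """Evaluate every candidate term for one observation (dict name -> value)."""
    expanded = {}
    for term in terms:
        value = 1.0
        for name in term:
            value *= row[name]
        expanded[term_name(term)] = value
    return expanded


terms = quadratic_terms(INPUTS)

# Placeholder observation: every input set to 1.0 except Tgas and ER.
row = {name: 1.0 for name in INPUTS}
row.update(Tgas=800.0, ER=0.25)
expanded = expand_row(row, terms)

print(len(terms), "candidate terms")
print("TgasER =", expanded["TgasER"])
```

Under this assumption the LASSO chooses among 77 candidate terms (11 linear, 11 squares, and 55 cross products), so retaining only two to four of them is a substantial reduction.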

Thus, in the case of hydrogen, a quadratic expression with four terms appears to provide a much simpler model than either the full linear model or the reduced linear model. Applying the same cross validation analysis and fitting the resulting expressions to the training data gives the expressions listed in Table 2.

**Figure 3.** Plot of cross validation MSE against the log of the tuning parameter *λ* from Equation (4) for the prediction of hydrogen % using a linear expression. The numbers above the graph show the corresponding number of inputs with non-zero parameters.

**Figure 4.** Plot of cross validation MSE against the log of the tuning parameter *λ* from Equation (5) for the prediction of hydrogen % using a quadratic expression. The numbers above the graph show the corresponding number of terms with non-zero parameters in the quadratic model.


**Table 2.** Reduced model expressions resulting from cross validation with the LASSO approach.
