#### 3.5.1. Shapley Additive Explanations (SHAP)

Any prediction by the statistical model, *f*, can be explained as a linear combination of the contributions of all input variables:

$$f(\mathbf{x}) = \phi_0 + \sum_{i=1}^{M} \phi_i \tag{9}$$

where *φ*<sub>0</sub> is the contribution when no information about the input variables is present. In practice, this is the mean value of all predictions of the statistical model. This is sensible since the mean value is the optimal choice when no complementary information is available.

Each SHAP value, *φ<sub>i</sub>*, can be calculated using the following expression [9]:

$$\phi_i(f, \mathbf{x}) = \sum_{Z \subseteq \mathcal{Z} \setminus \{i\}} \frac{|Z|!\,(M - |Z| - 1)!}{M!} \left[ f_\mathbf{x}(Z \cup \{i\}) - f_\mathbf{x}(Z) \right] \tag{10}$$

where *Z* is a subset of the set of all input variables, 𝒵, and *M* is the total number of input variables. 𝒵 \ {*i*} is the set of all input variables excluding variable *i*, and *Z* ∪ {*i*} is a subset of the input variables that includes variable *i*. Hence, each SHAP value, *φ<sub>i</sub>*, represents the average marginal contribution of an input variable to the prediction, taken over all combinations of input variables in which that variable is presented to the statistical model, *f*.
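To make the combinatorics of Equation (10) concrete, the following is a minimal sketch of a brute-force calculation of exact SHAP values for a small number of input variables. The model `f` and background data `X_bg` are illustrative stand-ins, and `f_x(Z)` is approximated by fixing the features in *Z* to the values of **x** and averaging the model output over background samples, which implicitly assumes variable independence:

```python
# Brute-force evaluation of Equation (10) for small M. All names here
# (f, X_bg, x) are hypothetical stand-ins, not the EAF model or data.
import itertools
import math
import numpy as np

def f_x(f, x, X_bg, Z):
    """Approximate f_x(Z): expected model output with features in Z fixed to x."""
    X = X_bg.copy()
    X[:, list(Z)] = x[list(Z)]          # fix the features in the subset Z
    return f(X).mean()                  # average over the background samples

def shap_values(f, x, X_bg):
    M = len(x)
    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for r in range(len(others) + 1):
            for Z in itertools.combinations(others, r):
                # Shapley kernel weight |Z|!(M - |Z| - 1)!/M! from Eq. (10)
                w = math.factorial(len(Z)) * math.factorial(M - len(Z) - 1) / math.factorial(M)
                phi[i] += w * (f_x(f, x, X_bg, Z + (i,)) - f_x(f, x, X_bg, Z))
    return phi

# Toy check on a linear model: phi_0 + sum(phi) should reproduce f(x), Eq. (9).
rng = np.random.default_rng(0)
X_bg = rng.normal(size=(200, 3))
f = lambda X: 2.0 * X[:, 0] + 1.0 * X[:, 1] - 0.5 * X[:, 2]
x = np.array([1.0, -1.0, 0.5])
phi = shap_values(f, x, X_bg)
phi_0 = f(X_bg).mean()                  # base value: mean model prediction
print(phi, phi_0 + phi.sum(), f(x[None, :])[0])
```

For this linear toy model, the base value *φ*<sub>0</sub> plus the sum of the SHAP values reproduces the prediction exactly, as in Equation (9); the number of subsets grows as 2<sup>*M*</sup>, which motivates the approximate methods discussed next.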

The number of terms in Equation (10) grows exponentially with the number of input variables, and the cost of evaluating them grows with the number of data points. To reduce the computational burden, one can use an approximate method known as Kernel SHAP, which assumes variable independence and local model linearity. Assuming variable independence means that Kernel SHAP will produce variable-value combinations that are unrealistic for dependent variables; for example, a high amount of burner oil may be combined with a low oxygen flow through the burners. The linearity assumption is analogous to the linear approximation of functions in mathematics, where the space in the vicinity of a data point is assumed to be linear. Furthermore, Kernel SHAP is model agnostic, which means that it can be applied to any supervised statistical model. It has previously been used to analyze an ANN predicting the EE of an EAF [10].
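As an illustration, the following is a minimal, self-contained usage sketch of Kernel SHAP via the *shap* Python package; the random forest and the synthetic data are placeholders for the EAF model and data, which are not reproduced here:

```python
# Kernel SHAP usage sketch. The background sample defines the "missing
# variable" baseline f_x(Z); keeping it small keeps the runtime manageable,
# since Kernel SHAP evaluates the model on many coalitions per explained point.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

background = shap.sample(X, 100)                  # background data for f_x(Z)
explainer = shap.KernelExplainer(model.predict, background)
shap_vals = explainer.shap_values(X[:5])          # rows: points, cols: variables
print(shap_vals.shape)                            # (5, 5)
```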

RF enables the use of the Tree SHAP method, which is adapted to tree-based statistical models. Tree SHAP can calculate the exact SHAP values because it does not assume variable independence. Furthermore, the algorithm is computationally efficient compared to the regular SHAP method [20], which means that the SHAP values can be calculated within a reasonable timeframe. By not assuming variable independence, Tree SHAP adheres to the true behavior of the model as opposed to Kernel SHAP, which uses a simplified representation of the original model. This is important since the aim of SHAP is to explain the behavior of the statistical model as accurately as possible.

Tree SHAP also provides the ability to use SHAP interaction values, which express the interaction effects between the input variables as contributions to the prediction [25]. Instead of one SHAP value per input variable and prediction, the interaction value calculation provides an *M* × *M* matrix per prediction, where *M* is the number of input variables. The diagonal of the matrix contains the main interaction values, *φ<sub>i,i</sub>*, which are the contributions of each input variable without the influence of the other input variables. The upper and lower triangles of the matrix contain the one-to-one interaction values, *φ<sub>i,j</sub>*, which show the contributions to the prediction of each input variable pair. The SHAP interaction value is split equally such that *φ<sub>i,j</sub>* = *φ<sub>j,i</sub>*, which means that the total contribution of the input variable pair (*i*, *j*) is *φ<sub>i,j</sub>* + *φ<sub>j,i</sub>*.

The SHAP interaction values can be calculated by [25]:

$$\phi_{i,j}(f, \mathbf{x}) = \sum_{Z \subseteq \mathcal{Z} \setminus \{i,j\}} \frac{|Z|!\,(M - |Z| - 2)!}{2(M-1)!} \nabla_{i,j}(Z) \tag{11}$$

where

$$\nabla_{i,j}(Z) = f_\mathbf{x}(Z \cup \{i, j\}) - f_\mathbf{x}(Z \cup \{i\}) - f_\mathbf{x}(Z \cup \{j\}) + f_\mathbf{x}(Z) \tag{12}$$

The SHAP interaction values relate to the regular SHAP values as follows:

$$\phi_i = \phi_{i,i} + \sum_{j \neq i}^{M} \phi_{i,j} \tag{13}$$

Hence, the SHAP interaction values relate to the model prediction as:

$$\sum_{i=0}^{M} \sum_{j=0}^{M} \phi_{i,j}(f, \mathbf{x}) = f(\mathbf{x}) \tag{14}$$

This means that the SHAP interaction values are a more granular representation of the contribution of each input variable to the prediction than the regular SHAP values. On the one hand, the main interaction values provide the contribution of each input variable unaffected by the influence of the other input variables. On the other hand, the one-to-one interaction values provide information on how the interaction between pairs of input variables adds to the prediction contribution.
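The following sketch illustrates the interaction values in code and numerically checks the identities in Equations (13) and (14); the model and data are synthetic stand-ins, as before:

```python
# SHAP interaction values with TreeExplainer: an (n, M, M) array, one M x M
# matrix per prediction. Model and data are hypothetical placeholders.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] * X[:, 1] + X[:, 2]                   # an explicit pairwise interaction

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)

phi = explainer.shap_values(X[:5])                # (5, M) regular SHAP values
inter = explainer.shap_interaction_values(X[:5])  # (5, M, M) interaction matrices

# Eq. (13): each row sum of an interaction matrix recovers the regular phi_i.
print(np.allclose(inter.sum(axis=2), phi, atol=1e-4))
# Eq. (14): all interaction values plus the base value recover f(x).
print(np.allclose(inter.sum(axis=(1, 2)) + explainer.expected_value,
                  model.predict(X[:5]), atol=1e-4))
```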

The aim of showing the main interaction effects of the input variables on EE is to investigate whether the models adhere to the underlying relationships between the input variables and the output variable. One must bear in mind that this analysis is univariate with respect to the output variable. Hence, it is not possible to draw any conclusions regarding the interrelationships between input variables that together affect the output variable.

There is also a concept known as SHAP feature importance, which is defined as the mean absolute value of the regular SHAP values, see Equation (15):

$$FI_j = \frac{1}{n} \sum_{i=1}^{n} \left| \phi_j^i \right| \tag{15}$$

where *FI<sub>j</sub>* is the SHAP feature importance for variable *j*, *φ<sub>j</sub><sup>i</sup>* is the SHAP value of variable *j* for data point *i*, and *n* is the number of data points.
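In code, Equation (15) is a one-liner; `shap_vals` is assumed to be an *n* × *M* array of SHAP values, such as the one returned by the explainers above:

```python
# SHAP feature importance per Eq. (15): mean absolute SHAP value per variable.
# The SHAP values below are random placeholders for the real (n, M) array.
import numpy as np

rng = np.random.default_rng(0)
shap_vals = rng.normal(size=(1000, 5))               # placeholder (n, M) SHAP values

feature_importance = np.abs(shap_vals).mean(axis=0)  # FI_j, one value per variable
ranking = np.argsort(feature_importance)[::-1]       # most important variable first
```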

SHAP feature importance measures the global importance of each input variable and is a more trustworthy measure of feature importance than the traditional permutation-based feature importance. The main reason is that SHAP feature importance is rooted in solid mathematical theory while permutation-based feature importance is based on the empirical evidence provided by random permutations.

In the numerical experiments, the Python package *shap* with the TreeExplainer method will be used to calculate the SHAP values. The feature perturbation option 'tree_path_dependent' will be used since it adheres to the dependence among the input variables. The software and hardware used in the numerical experiments can be seen in Appendix A, Tables A1 and A2.
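A minimal configuration sketch of this setup follows; the model and data are stand-ins, as in the earlier examples:

```python
# TreeExplainer with the 'tree_path_dependent' option, which derives f_x(Z)
# from the paths and cover counts of the trees themselves and therefore
# respects dependence among the input variables.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] - X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=500)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")
shap_vals = explainer.shap_values(X)   # one SHAP value per variable and data point
```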

#### 3.5.2. Correlation Metrics

Two different correlation metrics will be used to investigate the intra-correlation between input variables as well as the correlation between the input variables and EE, i.e., the output variable. By studying the resulting correlation values together with domain-specific knowledge, it is possible to verify the connection between the model prediction and the input variables. After all, the intra-correlation between the input variables and their respective correlations with the output variable are what the model learns during the training phase. The two correlation metrics are explained below.

**Pearson correlation:** The Pearson correlation metric can only detect linear relationships between two random variables [26]. It assumes values between −1 and 1, where the former indicates a perfect negative linear relationship between the variables and the latter a perfect positive linear relationship. A value of 0 indicates that the variables have no linear relation.

The Pearson correlation coefficient is defined as follows:

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{16}$$

where cov(*X*, *Y*) is the covariance of the two random variables *X* and *Y*, and *σ<sub>X</sub>* and *σ<sub>Y</sub>* are their respective standard deviations.
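A minimal sketch of Equation (16) using *scipy*; the arrays are synthetic stand-ins for an input variable and the measured EE values:

```python
# Pearson correlation per Eq. (16), via scipy.stats.pearsonr.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 2.0 * x + rng.normal(scale=0.5, size=300)   # noisy linear relationship

r, p_value = pearsonr(x, y)                     # r lies in [-1, 1]; here close to +1
print(r)
```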

**Distance correlation (dCor):** Although dCor cannot distinguish between positive and negative relationships between variables, it is able to detect non-linear relationships [27]. This is important since some variables governing the EAF process have a non-linear relationship to EE. By using dCor and Pearson in tandem, it is possible to get a clearer picture of the relationships between the variables governing the statistical models.

The mathematical expression for dCor has the same form as the Pearson correlation coefficient:

$$\mathrm{dCor}(V_1, V_2) = \frac{\mathrm{dCov}(V_1, V_2)}{\sqrt{\mathrm{dVar}(V_1)\,\mathrm{dVar}(V_2)}} \tag{17}$$

where *dVar*(*V*<sub>1</sub>) and *dVar*(*V*<sub>2</sub>) are the distance variances of the random variables *V*<sub>1</sub> and *V*<sub>2</sub>, respectively, and *dCov*(*V*<sub>1</sub>, *V*<sub>2</sub>) is the corresponding distance covariance. The square roots of *dVar*(*V*<sub>1</sub>) and *dVar*(*V*<sub>2</sub>) give the distance standard deviations.

dCor assumes values between 0 and 1, where the former indicates that the variables are independent, and the latter indicates that one variable is an exact linear transformation of the other. dCor has been used previously when evaluating the variables governing statistical models predicting the EE of the EAF [2,10].
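A from-scratch sketch of Equation (17), using the standard sample definition of distance correlation based on double-centered pairwise distance matrices; the *dcor* Python package provides an equivalent, optimized routine:

```python
# Sample distance correlation per Eq. (17) for two 1-D variables.
import numpy as np

def _centered_dist(v):
    """Double-centered pairwise distance matrix of a 1-D sample."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def distance_correlation(v1, v2):
    A, B = _centered_dist(np.asarray(v1)), _centered_dist(np.asarray(v2))
    dcov2 = (A * B).mean()                        # squared distance covariance
    dvar1, dvar2 = (A * A).mean(), (B * B).mean() # squared distance variances
    return np.sqrt(dcov2 / np.sqrt(dvar1 * dvar2))

# dCor detects the non-linear (quadratic) relationship that Pearson misses.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
print(distance_correlation(x, x ** 2))            # clearly above 0
```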

#### 3.5.3. Charge Types

Three distinctly different charge types, i.e., scrap recipes, with different tramp-element contents will be used in the analysis to exemplify the contribution to EE of the scrap type or scrap category for each of the chosen charge types. The first charge type, denoted A, has the lowest level of tramp elements among the charge types used in the steel plant. It therefore requires higher amounts of scrap types for which the contents of tramp elements such as Cu and Sn are well known. Examples are residual scrap from the forging mill and scrap purchased from other steel mills. The second charge type, denoted B, does not have requirements as strict as those of charge type A. Hence, a higher amount of scrap types of lower quality can be used. A typical lower-quality scrap type is heavy melting scrap (HM), which can have relatively high amounts of tramp elements. The last charge type, denoted C, can have higher contents of tramp elements. Hence, purchased scrap from other steel plants and residual scrap are used to a lesser extent.

It is important to note that alloying elements promoting the desired steel properties, such as Ni, Mo, and Cr, come either from own arising scrap or from purchased scrap with a low level of impurities. In cases where the internal scrap is not used, pure alloying additions such as granulated Ni must be used instead. This is a more expensive route with respect to the total cost of raw materials.

### **4. Results and Discussion**
