#### 3.2.2. Variable Batches

The variable batches take the variables motivated in Section 3.2.1 as their starting point. All 8 variable batches in this group are listed in Table 3, and the variables in each variable group are shown in Table 4.

**Table 3.** The domain-specific variable batches. The variables present in each variable group are shown in Table 4.


The scrap representations are not used together in any of the variable batches, for reasons of physical consistency. The representations are redundant with one another, and each is tied to its own physical logic. A mixture of scrap representations would therefore not indicate which representation is the best one to use. One of the aims of this study is to identify the best scrap representation with respect to the performance of the models on test data.
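The constraint described above can be illustrated with a minimal sketch. The group names below (`scrap_repr_A`, `group_X`, and so on) are hypothetical placeholders, not the groups of Table 4, and the enumeration scheme is an assumption made only for illustration. It does not reproduce the 8 batches of Table 3.

```python
from itertools import combinations

# Hypothetical placeholder names; the actual variable groups are those in Table 4.
SCRAP_REPRESENTATIONS = ["scrap_repr_A", "scrap_repr_B", "scrap_repr_C"]
OTHER_GROUPS = ["group_X", "group_Y"]


def build_batches(scrap_reprs, other_groups):
    """Enumerate batches with at most one scrap representation each.

    Every non-empty subset of the other groups is emitted once without
    scrap variables and once per scrap representation.
    """
    batches = []
    for size in range(1, len(other_groups) + 1):
        for subset in combinations(other_groups, size):
            batches.append(list(subset))
            for scrap in scrap_reprs:
                batches.append(list(subset) + [scrap])
    return batches


def uses_single_scrap_representation(batch, scrap_reprs):
    """Check that a batch never mixes two scrap representations."""
    return sum(group in scrap_reprs for group in batch) <= 1


batches = build_batches(SCRAP_REPRESENTATIONS, OTHER_GROUPS)
for batch in batches:
    print(batch)
```

Because no batch mixes scrap representations, a difference in test performance between two batches that share the same other groups can be attributed to the scrap representation alone.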


**Table 4.** Input variables for each variable group. There is a total of 48 input variables.

#### *3.3. Data Treatment*

#### 3.3.1. Purpose

To ensure the reliability and validity of a statistical model, the data used to create the model must be treated. The reason is that statistical models adapt their coefficients solely on the basis of data, whereas physical models have pre-determined coefficients. If low-quality data is included, the statistical model will inherit that quality when making predictions on new, previously unseen data. In general, data treatment is a double-edged sword. On the one hand, a model should be able to predict well on any future data. On the other hand, all data sets contain data points that represent extreme cases. Any statistical regression model will predict these extreme cases with low accuracy, since the coefficient adaptation algorithm minimizes the error over the entire data set included in the training phase. An extreme case therefore receives a lower priority due to its rarity. Hence, a successful modeling effort strikes a balance between these two opposing effects.
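The trade-off can be made concrete with a small sketch. It assumes a simple linear least-squares model and purely synthetic data: 20 typical points on a line and 2 rare points offset from it. Neither the model nor the data come from this study; they only illustrate the argument.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def mean_abs_error(model, xs, ys):
    """Mean absolute prediction error of a fitted line on the given points."""
    intercept, slope = model
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (assumption): typical cases follow y = 2x + 1 exactly.
typical_x = list(range(20))
typical_y = [2.0 * x + 1.0 for x in typical_x]

# Two rare extreme cases, offset by +30 from the typical relationship.
extreme_x = [5, 15]
extreme_y = [2.0 * x + 1.0 + 30.0 for x in extreme_x]

# Model trained on everything versus model trained on typical cases only.
model_all = fit_line(typical_x + extreme_x, typical_y + extreme_y)
model_typical = fit_line(typical_x, typical_y)

mae_typical_all = mean_abs_error(model_all, typical_x, typical_y)
mae_extreme_all = mean_abs_error(model_all, extreme_x, extreme_y)
mae_typical_clean = mean_abs_error(model_typical, typical_x, typical_y)

print("Trained on all data, error on typical cases:", mae_typical_all)
print("Trained on all data, error on extreme cases:", mae_extreme_all)
print("Trained on typical data only, error on typical cases:", mae_typical_clean)
```

In this sketch the model trained on all data fits the rare cases far worse than the typical ones, because the fit is dominated by the majority of points. At the same time, its error on the typical cases is larger than that of the model trained without the extreme cases. Both effects are the ones that data treatment has to balance.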

Data treatment methods can be divided into two categories: domain-specific methods and statistical methods. Both categories are described further below.
