*3.2. Post-Processing of the Results*

MATLAB's PCA algorithm was applied separately to the two discretized data sets. After subtracting the mean of each variable, the algorithm uses singular value decomposition to compute the coefficient matrix, which gives the weight of each variable in each component. Because each observation is a 10-dimensional vector, it is represented by 10 components, yielding a 10 × 10 coefficient matrix. The algorithm also computes the score matrix, which gives the value of each observation on each component; that is, there is one score vector per sample. Using Equation (1) and adding back the mean of each variable, the observations can be reproduced from the scores and coefficients. This makes it possible to study the variability that each component, or dimension, represents. Figure 5 presents the pattern ranges that the first four principal components represent in the two data sets.
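The steps above (mean-centring, SVD, coefficients, scores and reconstruction) can be sketched in NumPy as follows. This is a minimal illustration, not the MATLAB implementation used in the study: the function names `pca_svd` and `reconstruct` are hypothetical, the data are random stand-ins for the 10 slag-share variables, more samples than variables are assumed (so the coefficient matrix is 10 × 10), and Equation (1) is assumed to express the centred data as scores multiplied by the transposed coefficient matrix. Component signs from an SVD are arbitrary and may differ from MATLAB's output.

```python
import numpy as np

def pca_svd(X):
    """PCA of an n-by-p matrix X (rows = observations, columns = variables).

    Returns (coeff, score, explained, mu): loadings (one column per component),
    scores (one row per observation), percentage of variance per component,
    and the variable means that were subtracted.
    """
    mu = X.mean(axis=0)                       # mean of each variable
    Xc = X - mu                               # mean-centred data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # Xc = U @ diag(S) @ Vt
    coeff = Vt.T                              # p-by-p when n >= p
    score = U * S                             # equals Xc @ coeff
    latent = S**2 / (X.shape[0] - 1)          # variance of each component
    explained = 100.0 * latent / latent.sum()
    return coeff, score, explained, mu

def reconstruct(score, coeff, mu, comps=None):
    """Rebuild observations from selected components (0-based; all if None)."""
    if comps is None:
        comps = list(range(coeff.shape[1]))
    return score[:, comps] @ coeff[:, comps].T + mu   # add the means back

# Usage with hypothetical stand-in data: 200 taps x 10 slag-share variables
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 10))
coeff, score, explained, mu = pca_svd(X)
X_full = reconstruct(score, coeff, mu)               # equals X up to rounding
X_approx = reconstruct(score, coeff, mu, [0, 1, 3])  # e.g. C1, C2 and C4 only
```

Using all 10 components reproduces the observations exactly; passing a subset of component indices gives the lower-dimensional approximation that a component selection implies.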

**Figure 5.** Slag share ranges represented by the principal components (C1 ... C4) for Data set 1 (left panels) and Data set 2 (right panels). Solid lines represent the *mean pattern*, dotted lines the lowest values (labelled A), and dashed lines the highest values (labelled B). The percentage of variance explained by each component is reported in the upper right corner of each subpanel.

After studying the results, it can be hypothesized that the first component (C1, top panels) largely reflects the slag delay, expressed in terms of the slag share in the initial part of the tapping, where zero corresponds to iron-first and one to slag-first drainage. The second component (C2) mainly captures the level of the slag share (excluding variable 1), which falls in the range of 0.1 to 0.3 for both data sets (corresponding to a slag ratio of 110–430 kg/t_hm, where subscript hm denotes hot metal). The third and fourth components (C3 and C4) represent changes in the slag share during tapping. Even though the data sets were treated separately, C1 and C2 give a similar and meaningful representation of both data sets in the principal component space. C4 in Data set 1 and C3 in Data set 2 also show similar pattern ranges. As seen in Figure 4, the outflow patterns of the three tapholes display different trends from variable 2 to variable 10, and these trends can be captured by the slopes described by C4 in Data set 1 and C3 in Data set 2. These slopes therefore reflect how the slag share develops during the part of the tapping when both phases flow out. Figure 5 also shows, as solid lines, the pattern corresponding to the mean values of each data set; the two mean patterns are very similar. In the principal component space, this pattern lies at the origin, where the component axes intersect and all scores are zero; it is referred to as the *mean pattern* in what follows. The dotted lines correspond to the lowest values in the principal component space (denoted by A) and the dashed lines to the highest values (denoted by B).
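The pattern ranges in Figure 5 can be illustrated by moving along one component while holding all other scores at zero: a score of zero gives the mean pattern, and the extreme scores give patterns A and B. The sketch below assumes that A and B correspond to the minimum and maximum observed scores on the component, which the text does not state explicitly; the names `pca_svd` and `pattern_range` are hypothetical, and the data are random stand-ins.

```python
import numpy as np

def pca_svd(X):
    """Minimal mean-centred PCA via SVD; returns (coeff, score, mu)."""
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return Vt.T, U * S, mu

def pattern_range(coeff, score, mu, k):
    """Patterns along component k (0-based), all other scores held at zero.

    Returns (mean_pattern, pattern_A, pattern_B). A and B use the lowest and
    highest observed scores on component k (an assumed definition).
    """
    s_low, s_high = score[:, k].min(), score[:, k].max()
    mean_pattern = mu.copy()                  # origin of PC space: all scores zero
    pattern_A = mu + s_low * coeff[:, k]      # lowest value along component k
    pattern_B = mu + s_high * coeff[:, k]     # highest value along component k
    return mean_pattern, pattern_A, pattern_B

# Usage with hypothetical stand-in data (200 taps x 10 slag-share variables)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 10))
coeff, score, mu = pca_svd(X)
mean_p, p_A, p_B = pattern_range(coeff, score, mu, k=0)   # C1
```

Because the scores of mean-centred data average to zero on every component, the mean pattern coincides with the origin of the principal component space, and A and B lie on opposite sides of it along the chosen component.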

The algorithm also reports the percentage of the total variance explained by each component. In Data set 1, the first four components explain 78.4%, 9.2%, 5.8%, and 4.0%, respectively, while in Data set 2 they explain 77.8%, 9.3%, 5.7%, and 3.6%, respectively. Considering the meaningful information that the components provide, C1, C2, and C4 were selected for Data set 1, which together explain 91.6% of the data variation. To give an equal representation of the two data sets and to allow a comparison, C1, C2, and C3 were selected for Data set 2, which together explain 92.8% of the variation. Retaining more than 90% of the total variance was considered sufficient to represent the data sets and, consequently, to support a meaningful analysis of the results.
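The retained variance for each selection is simply the sum of the explained-variance percentages of the chosen components. The short check below uses only the percentages reported above; the helper name `retained_variance` and the 90% threshold constant are illustrative.

```python
# Explained variance (%) of C1..C4, as reported in the text
explained = {
    "Data set 1": [78.4, 9.2, 5.8, 4.0],
    "Data set 2": [77.8, 9.3, 5.7, 3.6],
}
# Selected components, numbered 1-based as C1..C4
selected = {
    "Data set 1": [1, 2, 4],
    "Data set 2": [1, 2, 3],
}
THRESHOLD = 90.0  # minimum share of total variance considered sufficient

def retained_variance(percentages, components):
    """Sum the explained variance of the chosen components, rounded to 0.1%."""
    return round(sum(percentages[c - 1] for c in components), 1)

for name in explained:
    total = retained_variance(explained[name], selected[name])
    print(f"{name}: {total}% retained, sufficient: {total > THRESHOLD}")
# Data set 1: 91.6% retained, sufficient: True
# Data set 2: 92.8% retained, sufficient: True
```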
