### *5.2. Prediction of dc/dt With All the Features Included in the Dataset*

Initially, we used all the features present in the dataset (including dC/dt) to predict the value of the dc/dt. Figure 3 shows a statistical comparison between the actual values of the dc/dt (Figure 3a) and the predicted ones (Figure 3b). The blue bars show the histogram of the values, the green shaded area shows the cumulative distribution function (CDF) and the purple shaded area shows the probability density function (PDF). As shown in the figure, the distributions of the predicted values closely follow those of the actual values of the dc/dt. Both histograms show a frequency of around 700 for dc/dt values close to 0. In addition, both histograms show bumps with a frequency of around 100 for dc/dt values close to 9.6 and around 17. The CDF of the actual values and that of the predicted values also have a very similar shape. This indicates that the algorithm successfully captured the statistics of the dataset.

**Figure 3.** Histogram (blue), cumulative distribution function (green) and probability density function (purple) for (**a**) the actual values of the dc/dt and (**b**) the predicted values of the dc/dt for the pilot dataset with all the features included.
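The comparison in Figure 3 can be reproduced in outline with a few lines of code. The sketch below is illustrative only: the function names, the bin edges and the sample values are assumptions made for this example and are not taken from the pilot dataset. It counts values in equal-width bins and evaluates the empirical CDF at a chosen point for both the actual and the predicted values.

```python
def histogram(values, n_bins, lo, hi):
    """Count values in n_bins equal-width bins spanning [lo, hi].

    Assumes every value lies within [lo, hi].
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so that v == hi falls in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def empirical_cdf(values, x):
    """Fraction of values that are less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)


# Hypothetical dc/dt samples (not from the pilot dataset).
actual = [0.0, 0.1, 0.2, 9.6, 9.7, 17.0]
predicted = [0.1, 0.1, 0.3, 9.5, 9.8, 17.1]

hist_actual = histogram(actual, n_bins=3, lo=0.0, hi=18.0)
hist_predicted = histogram(predicted, n_bins=3, lo=0.0, hi=18.0)
cdf_actual_at_10 = empirical_cdf(actual, 10.0)
cdf_predicted_at_10 = empirical_cdf(predicted, 10.0)
```

If the two histograms and the two CDF values agree closely, as they do for these toy samples, the predicted values reproduce the distribution of the actual values in the sense discussed above.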

Table 4 compares the statistical metrics of the actual and predicted values of the dc/dt. Although the statistical metrics for the predicted values are slightly higher, it can be concluded that the machine-learning model successfully captured the statistics of the dc/dt values. This is further confirmed by the scatter plot in Figure 4, which shows that the neural network regression model could accurately predict the values of the dc/dt. It should be noted that a scatter plot comparing the predicted values of the test set with the "true" values of the target is one of the main ways to evaluate a model's performance. As shown in Figure 4, the scored label as a function of the "true" value of the dc/dt follows the y = x line, meaning that the model performed very well.

**Table 4.** Statistical comparison of the actual values of the dc/dt with the predicted values for the pilot dataset with all the features included.


**Figure 4.** Scatter plot comparing the predicted values of the dc/dt using the neural network method with the actual values of the dc/dt for the pilot dataset with all the features included.

Figure 5 shows the error histogram of the neural network regression model. Errors with a value of 0.000049 had the highest frequency, confirming the excellent performance of the model. Table 3 (column 3) shows the performance metrics of the neural network regression model: a mean absolute error of 0.029, a root mean squared error of 0.043, a relative absolute error of 0.005 and a relative squared error of 0.0046. The coefficient of determination for the model was calculated to be 0.99, which shows the excellent performance of the neural network regression algorithm.

**Figure 5.** Error histogram of the predicted values of the dc/dt by using all the features included in the pilot plant dataset.
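For reference, the five performance metrics quoted above can be computed as in the following minimal sketch. It assumes the conventional definitions, in which the relative errors are normalized by the error of a baseline that always predicts the mean of the actual values; the text does not spell out the formulas, so this is an assumption. The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, RAE, RSE and R^2 for paired actual/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n

    abs_err = sum(abs(p - t) for t, p in zip(y_true, y_pred))
    sq_err = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))

    # Baseline errors: always predicting the mean of the actual values.
    abs_base = sum(abs(t - mean_true) for t in y_true)
    sq_base = sum((t - mean_true) ** 2 for t in y_true)

    rse = sq_err / sq_base
    return {
        "mae": abs_err / n,              # mean absolute error
        "rmse": math.sqrt(sq_err / n),   # root mean squared error
        "rae": abs_err / abs_base,       # relative absolute error
        "rse": rse,                      # relative squared error
        "r2": 1.0 - rse,                 # coefficient of determination (assumed definition)
    }


# Illustrative values only (not from the pilot dataset).
actual = [0.0, 2.0, 4.0, 6.0]
predicted = [0.5, 2.0, 3.5, 6.0]
metrics = regression_metrics(actual, predicted)
```

For these toy values the sketch gives a mean absolute error of 0.25, a relative absolute error of 0.125, a relative squared error of 0.025 and a coefficient of determination of 0.975.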

We used the permutation method to measure the feature importance for the prediction of the dc/dt. The feature importance values are shown in Table 5. The dC/dt had the highest importance, with a value of 9.06, followed by the dO/dt with a value of 0.16.
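The idea behind the permutation method can be sketched as follows: the values of one feature are shuffled across the samples, and the resulting increase in prediction error is taken as that feature's importance. The sketch below is a generic illustration, not the implementation used in this work. The scoring metric (mean absolute error), the number of repeats, the function names and the toy model are all assumptions made for the example.

```python
import random


def mean_abs_error(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Average increase in MAE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_abs_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j; all other columns stay in place.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_abs_error(targets, [predict(r) for r in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model that depends on feature 0 only."""
    return 3 * row[0]


# Hypothetical data: the target is fully determined by feature 0.
rows = [[i, (i * 7) % 5] for i in range(8)]
targets = [3 * r[0] for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

In this toy case the unused feature receives an importance of exactly zero, while the feature that drives the target receives a positive value. A feature that is computed from the target itself would be expected to dominate such a ranking in the same way.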

**Table 5.** Feature importance for predicting the values of the dc/dt in the pilot dataset with all the features.


### *5.3. Prediction of the dc/dt After Excluding Parameters*

To establish whether the dc/dt could be predicted with reasonable accuracy using fewer parameters, parameters were successively removed. As discussed in the previous section, the dC/dt had a very high predictive power for the values of the dc/dt. However, it was anticipated that the dC/dt might lead to data leakage, as the dC/dt is calculated from the value of the dc/dt. Thus, the feature dC/dt was removed in order to measure the performance of the neural network regression model (Table 3, column 4). It was found that the scored labels (predicted values) had histogram, CDF and PDF graphs very similar to those of the actual values of the dc/dt. As in the previous calculations, the statistical parameters of the predicted values were slightly higher; however, the difference was very small, and it can thus be said that the model was able to capture the statistics of the data when the feature dC/dt was removed.

Then, we attempted to predict the value of the dc/dt using only two features, namely total O2 flow and lance height, because these are the two inputs that are controllable in an industrial reactor. The oxygen blown into the converter is used for decarburization (dO/dt), oxidation of other elements into the slag (dOs/dt), oxygen in the waste gas, etc.; therefore, in the prediction of the dc/dt, the features dO/dt, dOs/dt, etc. were excluded. Figure 6 shows the scatter plot of the dc/dt versus the predicted values. As is evident from the figure, the neural network regression model was able to predict the dc/dt with good accuracy for most values, except for some values of the dc/dt where the predicted values were slightly lower.

**Figure 6.** Scatter plot comparing the predicted values of the dc/dt using the neural network method with the actual values of dc/dt by using the two features of total oxygen flow and lance height in the pilot dataset.

Figure 7 shows the error histogram for this prediction. The most frequent error was 0.0000033. Table 3 (column 5) shows the performance metrics for the prediction of the dc/dt using only two features. For this computation, the mean absolute error was calculated to be 0.034, root mean squared error to be 0.06, relative absolute error to be 0.008 and relative squared error to be 0.0001. The coefficient of determination was computed to be 0.97. These performance metrics showed that we were able to successfully predict the value of the dc/dt using only the two variables of total oxygen flow and lance height.

**Figure 7.** Error histogram of the predicted values of the dc/dt by using the two operating parameters of total oxygen flow and lance height in the pilot plant dataset.
