Article

Prediction of Temperature and Carbon Concentration in Oxygen Steelmaking by Machine Learning: A Comparative Study

Institute of Control and Informatization of Production Processes, Faculty BERG, Technical University of Košice, Němcovej 3, 042 00 Košice, Slovakia
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(15), 7757; https://doi.org/10.3390/app12157757
Submission received: 22 June 2022 / Revised: 27 July 2022 / Accepted: 28 July 2022 / Published: 1 August 2022

Featured Application

In the presented research, machine learning methods were applied to the prediction of melt temperature and carbon concentration in the melt in the basic oxygen furnace (BOF). The significance of the study is that machine learning methods have not yet been applied to this extent, even though the metallurgical industry requires them. The presented results will help in choosing the most powerful modeling method and will improve the prediction of steelmaking process variables. Prediction of temperature and carbon concentration can improve process control and reduce after-blows in melting. Endpoint estimation in the BOF is currently the most intensively researched topic for ensuring the quality of the produced steel.

Abstract

The basic oxygen steelmaking (BOS) process faces the issue of the absence of information about the melt temperature and the carbon concentration in the melt. Although deterministic models for predicting steelmaking process variables are being developed in metallurgical research, machine learning models can capture the nonlinearities of process variables and provide a good estimate of the target process variables. In this paper, five machine learning methods were applied to predict the temperature and carbon concentration in the melt at the endpoint of BOS. Multivariate adaptive regression splines (MARS), support-vector regression (SVR), neural network (NN), k-nearest neighbors (k-NN), and random forest (RF) methods were compared. Machine modeling was based on static and dynamic observations from many melts. For predicting from dynamic melting data, a method of pairing static and dynamic data to create a training set was proposed. This approach was also found to predict the dynamic behavior of temperature and carbon during melting. The results showed that the piecewise-cubic MARS model achieved the best prediction performance for temperature in testing on static data. On the other hand, carbon predictions by machine models trained on merged static and dynamic data were more accurate. In the case of predictions from dynamic data, the best results for carbon were obtained by the k-NN-based model and for temperature by the piecewise-linear MARS model. In contrast, the neural network recorded the lowest prediction performance in most tests.

1. Introduction

When controlling basic oxygen steelmaking (BOS), it is essential to know the state of the melt, i.e., the carbon concentration in the melt and the melt temperature. Due to the high temperatures in the converter, it is problematic to measure these process variables with conventional devices (e.g., thermocouples). Although these process variables are measured at the beginning and end of the melting, determining their dynamic course during the melt and at the melting endpoint remains an open problem. For this reason, indirect measuring techniques based on soft sensors are used. Information about the melt temperature and the carbon concentration in the melt helps the control personnel perform appropriate control interventions, i.e., adjust the amount of blown oxygen or the nozzle height, or add sub-materials. These soft sensors use software models based on various principles and approaches. Mathematical models enable real-time estimation of process variables without a hardware sensor [1].
The problem with basic oxygen steelmaking (BOS) control is the absence of information about the melt temperature and the carbon concentration in the melt. These data are usually measured only at the beginning and end of the melt. Various mathematical models can estimate these variables during melting or at the endpoint.
The main research objective of this study is to compare five machine learning methods for modeling target variables in the steelmaking process. Multivariate adaptive regression splines (MARS), support-vector regression (SVR), neural network (NN), k-nearest neighbors (k-NN), and random forests (RF) will be compared. Two approaches to data-driven converter process modeling will be explored, i.e., modeling from static data and from combined static and dynamic data of many melts.
This study aims to find the most powerful machine model for predicting the melt temperature and carbon concentration at the endpoint of the melt. It also assesses which modeling approach is more powerful for a given target variable. This research further hypothesizes that changing the number of input observations when training models has some effect on the performance of the machine models. For the static case, the amounts of added lime and dolomitic lime were regarded as supplementary input observations, and this assumption was verified by modeling and predicting from static data. This study also envisages the possibility of modeling the dynamic behavior of temperature and carbon concentration based on dynamic melting data. In this case, the concentrations of O₂ and H₂ were regarded as supplementary input observations, and this assumption was verified by modeling and predicting from dynamic data.
It is assumed that based on the continuous measurement of the waste gas composition, the temperature of the flue gases, and the volumetric amount of blown oxygen during melting, it is possible to determine the dynamic course of the temperature of the melt and the concentration of carbon. Then, a learned machine model could obtain an estimate of these target variables.
It should be possible to evaluate the accuracy at the melting endpoint, as well as the quality of the dynamic course, from the simulation of the dynamic behavior of the target variables. In addition, the speed of the machine learning methods in the training phase of the individual models will be evaluated.
The literature review and fundamentals of the BOS are explained in the following subsections. The second section presents the theoretical background of the applied methods. Machine modeling was based on static and dynamic data from many meltings. This section describes the utilization of measurable static and dynamic data from BOS and proposes a method of joining these data for prediction from dynamic data. A scheme of modeling and prediction in BOS is also proposed. The simulation results of prediction based on static and dynamic data are provided in the third section, where the results of the five machine learning methods are compared. At the end of the paper, the achieved results are discussed, and the performance of the individual machine models is evaluated.

1.1. Presented Research Area in the Literature Review

Various approaches to modeling and data prediction in steelmaking have been explored around the world. The latest innovations in converter process modeling and monitoring can be found in [2].
In recent years, many deterministic models of the steelmaking process have been developed and applied. Researchers have developed complex dynamical models to predict the chemical composition of the melt based on balance equations (e.g., [3]) or models to predict the temperature of steel based on thermodynamic reaction balance and gas analysis technology (e.g., [4,5,6,7]). Gas analysis technology provides a way to predict the temperature of converters with long life continuously, and it can be applied to different tonnage converters. Interest has also increased in thermodynamic modeling of the BOS process. The literature reports thermodynamic models to predict the bath composition, temperature as well as decarburization rate (e.g., [8,9,10]).
Nowadays, significantly increased computational resources permit detailed mathematical and data-driven modeling of BOS (e.g., the decarburization model proposed in [11]). On the other hand, the steady-state models are usually developed based on material and heat balance (e.g., [12,13]).
In the endpoint prediction of BOS, researchers also use different regression models. These are primarily linear or nonlinear models capable of predicting melt temperature and carbon concentration in the melt. The basic regression model is usually combined with another type of model (e.g., utilization of the Gray model in [14]) or adaptation level (e.g., a continual adaptation of the regression parameters by the least-squares method proposed in [15]). In addition, other researchers (e.g., Huang et al. [16]) performed regression analysis of BOS data and proposed an oxygen and temperature regression model for target variable prediction.
Recently, interest in machine learning algorithms has increased significantly in controlling the steel production process. The end parameters of the molten steel, such as steel temperature and carbon content, directly affect the quality of the produced steel. In addition, these endpoint process variables are problematic to measure online continuously over time. Applications of various machine learning methods aimed at predicting temperature and carbon in BOS can be found in the literature.
Methods with kernel functions are popular in machine learning and steelmaking. Support-vector regression (SVR), based on a statistical learning algorithm, can be used as a soft sensor to model the complex relationship between input and output data. Some applications of SVR to BOS can be found in [17,18,19,20,21]. These are primarily data-driven models aimed at online or offline prediction of BOS process variables. The proposed black-box models improved the prediction of process variables compared to the analytical models currently used in steel production.
Much research has focused on endpoint prediction in BOS using a neural network-based model. A neural network (NN) is capable of self-learning, approaching any nonlinear function, and processing data quickly. Some applications of NNs to predict process variables in BOS endpoints can be found in [22,23,24] or [25]. These applications were usually combined with various supporting optimization algorithms to improve or adapt the NN. There are also many NN applications and their improvements in the literature to predict stopping temperature and end blow oxygen in the LD converter (e.g., [26,27,28,29]). One of the improvements of the NN applied to BOS is represented by the adaptive neural network fuzzy inference system (ANFIS) [30].
Recently, a data-mining method, case-based reasoning (CBR), has been widely applied in many areas. CBR solves new problems by using the solution of a previous similar case. Han and Cao [31] proposed an improved CBR method to predict carbon content in the melt. The results showed higher prediction accuracy and robustness of the model than SVR. Another data-mining temperature model for BOF endpoint prediction was proposed in [28,32,33,34].
The use of machine learning methods and computer models in managing and monitoring steel production is closely related to optimizing production procedures using the most modern technological knowledge and information technologies to increase production. In short, it is a process known as Industry 4.0 that refers to a higher level of automation for operational productivity and efficiency by connecting virtual and physical worlds in an industry [35].
Reindustrialization in the European Union proves that the manufacturing industry around the world is developing in three directions of digitization, networking, and intelligence through information technologies [36].
One goal of the Industry 4.0 era is to apply data-driven approaches to optimize industrial processes. Steelmaking takes a long time, sometimes months, to produce the final product. A steel manufacturer usually produces steel of different grades for different uses. For the improvement of individual steel production processes and their analysis, engineers have a large amount of data from the production chain at their disposal. The lack of process data reduces the possibilities for critical optimizations in steel production. Through the intelligent management and control of the integrated systems of the production process, the efficiency of steel production can be effectively improved, and the energy consumption in production can be significantly reduced, which is of great importance for the improvement of production. Industry 5.0 also constitutes the reconciliation between humans and machines where artificial intelligence (AI), big datasets, cloud computing, machine learning, and digital transformation are the most often used [35].
Machine models intended for predicting process variables are also used in intelligent factories based on cyber-physical systems (CPS), which are the essence of Industry 4.0. The reason is that these systems connect virtual space with physical reality and operate reliably and efficiently. Among the capabilities of CPS belongs, e.g., self-monitoring and self-diagnosis, implemented by some producers [37].
An example is machine learning-based models that can improve the production monitoring process implemented on human-machine-interface (HMI) devices. The aspect of asset tracking is core in Industry 4.0. The use of AI together with RFID technology to search and track the real-time geo-position of monitored assets can be found, for example, in [38].
Although the steel industry invests significantly in research and automation, there is still much to do in the way of transformation to Industry 4.0. Nowadays, apart from the human factor, there are research gaps in sustainability, responsibility, safety, and others in the Industry 4.0 concept. The critical elements are also high investments and the pace of implementing technological changes [39].

1.2. Understanding of Steelmaking in LD Converter

This process is referred to as LD, BOF, BOP (i.e., basic oxygen process), or BOS (basic oxygen steelmaking) in various places, but all these terms denote top-blown converter steelmaking [40,41]. The BOF process involves refining liquid pig iron into steel in a basic-lined LD converter by blowing oxygen through a water-cooled lance. The oxygen converter is a pear-shaped vessel of sheet steel lined with basic refractory brick into which molten iron from a blast furnace is poured [42]. The typical capacities of oxygen converters are in the range of 50–350 t. The converter can be rotated 360° on pins around the horizontal axis for scrap metal loading, melt casting, and steel and debris removal during operation. The batch melting usually takes 15 to 25 min regardless of the size of the melt. The flow of blown oxygen to the melt through the lance is adjusted according to the weight of the batch [43].
By blowing oxygen at supersonic speed, the oxidation of impurities in the pig iron is accelerated, and the reacting metal surface area increases. A multi-hole copper nozzle blows oxygen into the melt at a pressure of 1.6–2.8 MPa. Scrap metal is added to the LD converter at the beginning of the melting. Iron ore is added during blowing to oxidize silicon, phosphorus, manganese, and carbon; acting as a coolant, it prevents these exothermic reactions from raising the melt temperature excessively. In addition, lime and sometimes other slag-forming agents are added to the melt during blowing to form a slag that retains impurities in the form of complex oxides. The converter slag is formed as a waste product after smelting the pig iron. Lime is used as a slag-forming additive in the melt. Crude magnesite is used to increase the MgO content in the converter slag. The magnesite is also used to cool the steel in the converter and to treat the converter lining before the end of the melt blowing. Crystalline solid sulfur is used for alloying selected steels. The tapping of the steel is usually carried out when the temperature reaches 1620–1660 °C and the carbon concentration is 0.03–0.04% (i.e., in low-carbon steels).
Due to the high temperatures in the converter (e.g., up to 1700 °C at the melting endpoint), it is problematic to measure these process variables with conventional devices (e.g., thermocouples). In practice, various approaches are used to measure melt temperature and carbon content, e.g., sublance and drop probes, the bomb method, or waste gas analysis [44]. In the literature, only a few examples of the dynamic behavior of temperature and carbon obtained during decarburization can be found (e.g., [3,45,46]). These time behaviors are usually obtained from models or from a small number of measurements during melting.
The behavior of temperature and carbon during melting depends on the inputs to BOS and on control interventions. An example of the simulated behavior of temperature and carbon during a selected melting is shown in Figure 1. This dynamic behavior was simulated by a complex model for indirect measurement of temperature and carbon with feedback [4,5], developed by the authors of this paper. The model is based on the assumption that the converter gas flow and its composition of CO, CO₂, and O₂ are measured. The proposed complex model aggregates models of the individual steel production processes (i.e., melting of scrap, decomposition of slag-forming additives, oxidation of the elements C, Si, Fe, Mn, and P in the liquid state, and processes between slag and liquid metal). The input to the model is static data recorded at the start of melting and dynamic data recorded during melting. The temperature rises only slightly in the first half of the blowing time because of scrap melting, which is completed at around 250 s.
Furthermore, the decomposition of the slag-forming additives, completed at around 550 s, also slows the temperature increase. In the second half of the process, the temperature rise is steeper because the scrap and additives have already melted or decomposed. These processes consume heat, and at this time the rate of decarburization is at its maximum. In the case of carbon, the drop in carbon content is initially more moderate because the carbon burning rate is lower and the carbon in the metal is replenished by carbon from the molten scrap. In the middle part of the course, the carbon decrease is uniform because the rate of decarburization is at its maximum. At the end of the process, the carbon course flattens and approaches the final concentration in the liquid metal [4].

2. Theoretical Background of Applied Method

2.1. Observations and Targets in BOS

The process data recorded during melts on a 180 t converter with a converter vessel volume of 300 m³ (without the brick lining) were used to model the converter process. Many process variables are measured in the BOS process, and they can be divided into static and dynamic variables. Static variables include those that do not change continuously during the process, e.g., input quantities at the beginning and end of BOS. Dynamic variables are continuous and are measured throughout the process. The paper focuses on the creation and comparison of machine prediction models that are able to predict process variables from static data (i.e., melt temperature and carbon concentration in the melt at the endpoint of BOS) but also from dynamic data continuously recorded during melting (i.e., the time behavior of the temperature and carbon concentration). Static data have no representation in the time domain (i.e., in the provided dataset). On the other hand, dynamic data were recorded during melting and thus have a time-domain representation.
The raw static and dynamic data provided by the steelmaking plant were processed in data-driven modeling of the BOS process. A correlation analysis was performed to select effective data. Furthermore, only meltings for which a dynamic data record existed were selected, and meltings with missing data were excluded.
A total of 872 melts from the steelmaking plant were obtained and processed for BOS modeling. The static data comprised information on inputs at the beginning of the melt, static information recorded during the melt, and data at the endpoint of the melt. These data were divided into observations and targets. The targets are also the expected model outputs. Static data can be divided into training and testing datasets to train the models and test their prediction performance. Although three possible prediction targets were determined from static data, only the prediction of melt temperature and melt carbon concentration was investigated in this paper (see Table 1).
The steel quality class is a dimensionless number from the quality code list, which specifies the requirements for the steel produced, e.g., its chemical composition, the quality of the blown oxygen, the amount of pig iron and scrap to the converter, the carbon concentration or temperature at the endpoint, etc. For a given batch, the melter orders the weight of pig iron to the converter. Typical steels are casing, transformer steels, or high-alloy steels. Pig iron contains a certain amount of silicon, phosphorus, manganese, sulfur, and titanium. These parameters are known and can be used as input observations to determine the target process variable.
This paper also compares machine regression models in predicting melt temperature or carbon concentration based on four dynamic variables of the BOS. Given that the waste gas flow was not measured in several meltings, it was decided not to use this observation for model training and subsequent prediction. Dynamic data recorded with a sampling period of 1 s were taken from each melting. These are the composition of the waste gas, the waste gas temperature, and the flow rate of the blown oxygen (see Table 2). The accumulated amount of blown oxygen was calculated from the oxygen flow rate.
The following data tensor can be defined when modeling from static observations to transform the inputs of the BOS process to its outputs by the machine model:
$$S = \{\mathbf{x}_i, y_i\}_{i=1}^{N} = \{x_{s1}^{i}, \ldots, x_{sn}^{i}, y_{s}^{i}\}_{i=1}^{N}, \quad (1)$$
where $\mathbf{x}_i \in \mathbb{R}^n$ are vectors of independent static variables of the BOS process according to Table 1. Parameter n represents the number of values in the vectors, given by the total number of meltings intended for training. Parameter N represents the total number of observations used for training and prediction. The dependent variable $y_i \in \mathbb{R}$ represents the target variable (i.e., temperature or carbon) that is modeled by the machine learning method and predicted by the learned machine model.
Tensor (1) should be created for each target variable individually. In this study, 17 static input observations were used to predict the selected target.
In modeling from merged static and dynamic data, and finally prediction from dynamic data, the training data tensor can be defined as follows:
$$D = \{\mathbf{x}_i, y_i\}_{i=1}^{N} = \{x_{d1}^{i}, \ldots, x_{dn}^{i}, y_{s}^{i}\}_{i=1}^{N}, \quad (2)$$
where $x_{d1}^{i}, \ldots, x_{dn}^{i}$ are vectors of independent dynamic variables of the BOS process according to Table 2. Parameter n represents the number of data points, calculated as $n = m \cdot r$, where m is the total number of meltings in the raw dataset and r is the number of repeated hardware-based measurements of the given target variable (i.e., initial, …, endpoint measurement) recorded as static data. Parameter N represents the total number of dynamic observations used for training and prediction (i.e., four in the basic setup). The dependent variable $y_{s}^{i}$ represents the target variable (i.e., melt temperature or carbon concentration in the melt) that is modeled by the machine learning method and predicted by the learned machine model. The values of the variable $y_{s}^{i}$ can be extracted from the static database.
Tensor (2) is created for each target variable individually. In this paper, only two points of each melting are regarded (i.e., initial and endpoint, so r = 2 ). Four input observations and two targets are regarded in this study as basic, and it is possible to consider adding other relevant observations to investigate the performance of the models.
Two large melting datasets were created to fit the machine models of BOS, i.e., a dataset for the temperature model and a dataset for the carbon model. These sets were made in data mining by pairing static and dynamic data. The corresponding start and endpoint data (i.e., CO, CO₂, waste gas temperature) from the dynamic data database were assigned to each initial and endpoint melt temperature measured and stored in the static dataset. Information on the accumulated amount of blown oxygen was taken from the static database. This pairing was done for each melting. Similarly, the corresponding data from the dynamic dataset were assigned to each initial and final actually measured carbon concentration in the melt from the static dataset. The prediction from the learned model can then be realized in the inverse way, where the melt temperature or carbon concentration is calculated by the model from the test or online dynamic data (i.e., CO, CO₂, waste gas temperature, and blown oxygen flow rate). Meltings with missing dynamic data and invalid meltings were removed from the large datasets. In this way, datasets of filtered meltings were obtained. The obtained datasets of merged data can subsequently be divided into training and testing sets (i.e., in offline testing).
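As an illustration of this pairing step, the following Python sketch joins the two databases; the DataFrame layout, the melt_id/time keys, and the column names are hypothetical placeholders for the plant's actual database schema:

```python
import pandas as pd

def build_joined_dataset(static_df: pd.DataFrame, dynamic_df: pd.DataFrame,
                         target: str = "temperature") -> pd.DataFrame:
    """Pair each hardware measurement of the target (initial and endpoint,
    i.e., r = 2 per melting) with the dynamic record closest in time."""
    rows = []
    for melt_id, stat in static_df.groupby("melt_id"):
        dyn = dynamic_df[dynamic_df["melt_id"] == melt_id]
        # skip meltings with missing dynamic data or invalid static records
        if dyn.empty or stat[target].isna().any():
            continue
        for _, meas in stat.iterrows():
            # dynamic sample recorded closest to the measurement time
            nearest = dyn.iloc[(dyn["time_s"] - meas["meas_time_s"]).abs().argmin()]
            rows.append({
                "CO": nearest["CO"],
                "CO2": nearest["CO2"],
                "waste_gas_temp": nearest["waste_gas_temp"],
                "accum_O2": nearest["accum_O2"],
                target: meas[target],
            })
    return pd.DataFrame(rows)
```

The resulting frame corresponds to tensor (2): one row per paired measurement, four dynamic observations plus the statically recorded target.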
Based on this principle, a data-driven soft-sensor can be built to continually adapt the selected machine model with new melts, thus improving the prediction (see Figure 2). Finally, the principle of creating big datasets by merging static and dynamic data is illustrated in Figure 3. The flowchart demonstrates the sequence of steps applied for machine learning and making predictions in offline or online mode.
Soft sensors based on dynamic steelmaking data enable real-time estimation of process variables without a hardware device (e.g., a thermocouple or pyrometer). As a result, they can provide less expensive and quicker process data than slow and costly hardware devices.

2.2. Multivariate Adaptive Regression Splines

Multivariate adaptive regression splines (MARS) is a non-parametric regression technique that looks like an extension of linear models. This method was proposed by Friedman [47] and extensively discussed in [48,49,50,51,52,53]. Endpoint BOS temperature prediction from static data using the MARS model has recently been investigated by Diaz et al. [54].
This method aims to find the dependency of a variable $y_i$ on one or more independent variables $\mathbf{x}_i$. The relationship between $y_i$ and $\mathbf{x}_i$ ($i = 1, \ldots, N$) from the data tensor (1) or (2) can be represented as:
$$y_i = f(x_{i1}, x_{i2}, \ldots, x_{ip}) + \varepsilon = f(\mathbf{x}_i) + \varepsilon, \quad (3)$$
where f is an unknown single-valued deterministic function that captures the joint predictive relationship of $y_i$ on $(x_{i1}, x_{i2}, \ldots, x_{ip})$, and $\varepsilon$ is an error. The additive stochastic component $\varepsilon$, whose expected value is zero, usually reflects the dependence of $y_i$ on quantities other than $(x_{i1}, x_{i2}, \ldots, x_{ip})$ that are neither controlled nor observed.
In the multidimensional case, the goal is to form reflected pairs for each input component $x_j$ of the vector $\mathbf{x} = (x_1, \ldots, x_j, \ldots, x_p)^T$ with knots at each observed value $x_{ij}$ of that input ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, p$).
In MARS, the regression function called basis function (BF) is approximated by smoothing splines for a general representation of data in each subgroup. The BF is unique between any two knots and is shifted to another BF at each knot [47].
The MARS model for the prediction of melt temperature and carbon content can be expressed by the following equation [52]:
$$y_{T/C} = \hat{f}(\mathbf{x}) + \varepsilon = c_0 + \sum_{m=1}^{M} c_m B_m(\mathbf{x}) + \varepsilon, \quad (4)$$
where $y_{T/C}$ represents the predicted output variable (i.e., temperature or carbon), $\mathbf{x}$ is the vector of input variables, i.e., observations from the BOS process, M is the number of BFs in the model (i.e., the number of spline functions), $c_0$ is the coefficient of the constant BF $B_0$, and the sum runs over the BFs $B_m$ produced by an algorithm that implements the stepwise forward part of the MARS strategy by incorporating a modification of recursive partitioning. The coefficients $c_m$ are estimated by minimizing the residual sum of squares. $B_m(\mathbf{x})$ is the m-th function in the set of constructed BFs, or a product of two or more such functions, and $\varepsilon$ is the additive stochastic component.
This paper compared the piecewise-linear and piecewise-cubic types of the MARS model for BOS modeling and prediction. The open-source implementation of MARS proposed by Jekabsons [55] was applied to the BOS process. The piecewise-linear MARS model uses the function $\max(0, x - t)$, where t is the univariate knot selected for each of the factor variables x. The max(·) function represents the positive part of $(x - t)$, which can be formally expressed as:
$$\max(0, x - t) = \begin{cases} x - t, & \text{if } x \geq t, \\ 0, & \text{otherwise.} \end{cases} \quad (5)$$
The cubic BF has the following form [56]:
$$C(x \mid s = +1, t_{-}, t, t_{+}) = \begin{cases} 0, & x \leq t_{-}, \\ \alpha_{+}(x - t_{-})^2 + \beta_{+}(x - t_{-})^3, & t_{-} < x < t_{+}, \\ x - t, & x \geq t_{+}, \end{cases} \quad (6)$$
where
$$\alpha_{+} = \frac{2t_{+} + t_{-} - 3t}{(t_{+} - t_{-})^2}, \qquad \beta_{+} = \frac{2t - t_{+} - t_{-}}{(t_{+} - t_{-})^3}, \quad (7)$$
and
$$C(x \mid s = -1, t_{-}, t, t_{+}) = \begin{cases} t - x, & x \leq t_{-}, \\ \alpha_{-}(x - t_{+})^2 + \beta_{-}(x - t_{+})^3, & t_{-} < x < t_{+}, \\ 0, & x \geq t_{+}, \end{cases} \quad (8)$$
where
$$\alpha_{-} = \frac{3t - 2t_{-} - t_{+}}{(t_{+} - t_{-})^2}, \qquad \beta_{-} = \frac{2t - t_{-} - t_{+}}{(t_{+} - t_{-})^3}. \quad (9)$$
The algorithm builds a model in two phases: forward selection and backward deletion. While the model from the forward phase is typically overfitted, in the backward phase it is simplified by deleting the least important basis functions. In modeling, we considered maximal interaction between input variables without self-interactions. Training data were not normalized in the simulation. The initial number of BFs was set according to the function min(200, max(20, 2d)) + 1, where d represents the number of input variables.
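To make the structure of Equations (4) and (5) concrete, the following Python sketch evaluates a piecewise-linear MARS model; the knots, coefficients, and two-term model are made up for illustration and do not correspond to the models fitted in this study:

```python
import numpy as np

def hinge(x: np.ndarray, t: float, sign: int = +1) -> np.ndarray:
    """Piecewise-linear basis of Eq. (5): max(0, x - t) for sign = +1,
    and the reflected pair max(0, t - x) for sign = -1."""
    return np.maximum(0.0, sign * (x - t))

def mars_predict(X: np.ndarray, c0: float, terms) -> np.ndarray:
    """Evaluate Eq. (4): y = c0 + sum_m c_m * B_m(x), where each B_m is a
    product of hinge functions (pairwise interactions at most here)."""
    y = np.full(X.shape[0], c0)
    for c_m, factors in terms:
        b = np.ones(X.shape[0])
        for j, t, s in factors:           # variable index, knot, sign
            b *= hinge(X[:, j], t, s)
        y += c_m * b
    return y

# illustrative two-term model; knots and coefficients are made up
terms = [( 0.50, [(0, 1350.0, +1)]),                # BF1 = max(0, x1 - 1350)
         (-0.20, [(0, 1350.0, -1), (1, 4.2, +1)])]  # BF2 = max(0, 1350 - x1) * max(0, x2 - 4.2)
y_hat = mars_predict(np.random.rand(5, 2) * 2000.0, c0=1636.6, terms=terms)
```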

2.3. Support-Vector Regression

Support-vector machines (SVM) have successfully been applied to an enormously broad spectrum of application domains. In SVM regression (SVR), the essential idea is to map patterns x to a high-dimensional feature space through non-linear mapping and to perform linear regression in this space [57,58]. Recent applications of SVR to the BOS process can be found in [17,20,21,59].
In regression with one target variable y, the observations of the BOS process can be written as a sequence of pairs $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_N, y_N)$, $\mathbf{x}_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$. The vector $\mathbf{x}_i$ represents one pattern of input observations $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ given in tensor (1) or (2).
The function used for the prediction of melt temperature and carbon content in the BOS process depends only on support vectors and can be expressed in the following form [60]:
$$y_{T/C} = \hat{f}(\mathbf{x}) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*})\, K(\mathbf{x}_i, \mathbf{x}) + b, \quad (10)$$
where $y_{T/C}$ represents the predicted output variable (i.e., temperature or carbon), N represents the number of patterns $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)$, i.e., input observations of the BOS process, and parameter b is the limit value, or so-called threshold. The parameters $\alpha_i$, $\alpha_i^{*}$ have an intuitive interpretation as forces pulling or pushing $f(\mathbf{x}_i)$ toward the measurement $y_i$ [61]. Parameters $\alpha$ and $\alpha^{*}$ are non-negative Lagrange multipliers for each observation $\mathbf{x}$. The threshold b can be determined from the Lagrange multipliers. Observations with nonzero Lagrange multipliers $\alpha_i$ are called support vectors. $\mathbf{K} = (k(\mathbf{x}_i, \mathbf{x}_j))_{i,j=1}^{N}$ represents a symmetric, positive-definite kernel matrix that specifies the scalar products between all pairs of points $\{\mathbf{x}_i\}_{i=1}^{N}$. The kernel matrix can be created directly by the kernel function [62].
In this paper, Gaussian and Polynomial kernel functions were considered [60].
  • Gaussian kernel:
    $$k(\mathbf{x}_i, \mathbf{x}_j) = e^{-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^{2}}, \quad (11)$$
    where $\gamma$ represents the kernel parameter, which controls the kernel function's sensitivity.
  • Polynomial kernel:
    $$k(\mathbf{x}_i, \mathbf{x}_j) = \left(\gamma\,(\mathbf{x}_i^{T}\mathbf{x}_j + 1)\right)^{d}, \quad (12)$$
    where d is an integer.
The optimization task (i.e., minimization of the Lagrange function [63,64]) finds the optimal values of $\alpha$, whereby the Karush–Kuhn–Tucker (KKT) complementary conditions must be met [58,65]. This paper applied a Matlab implementation of SVR available in [63] to BOS data modeling. In the working settings of SVR, the training data were standardized, i.e., the software centers and scales each column of the predictor data x by the weighted column mean and standard deviation, respectively. In addition, the software divides all elements of the predictor matrix x by the appropriate scale factor found using a heuristic procedure. Sequential minimal optimization (SMO), an algorithm widely used for training SVMs, was used to solve the quadratic programming (QP) problem.
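For orientation, an equivalent setup can be sketched in Python with scikit-learn instead of the Matlab implementation used in the study; the regularization constants C and epsilon below are illustrative placeholders, not the values tuned in this work:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# SVR with the Gaussian (RBF) kernel of Eq. (11); inputs are standardized,
# mirroring the working settings described above
svr_gauss = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", gamma="scale", C=10.0, epsilon=0.1))

# SVR with the polynomial kernel of Eq. (12); coef0=1.0 plays the role of the "+1" offset
svr_poly = make_pipeline(StandardScaler(),
                         SVR(kernel="poly", degree=3, gamma="scale", coef0=1.0))

# X_train: N x 17 static observations (Table 1); y_train: endpoint temperature or carbon
# svr_gauss.fit(X_train, y_train)
# y_pred = svr_gauss.predict(X_test)
```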

2.4. Feed-Forward Neural Networks

The nonlinear structure of neural networks (NNs) is suitable for predicting process variables in many industries. Recent applications of NNs to the BOS process can be found in [22,26,27,28].
Neural networks consist of sensory units, so-called input neurons, which form an input layer followed by one or more hidden layers, each with a set number of neurons. Feed-forward neural networks pass a linear combination of inputs from one layer to another. As they do this, the neurons modify their inputs using a given activation function. In an NN, each neuron $v_i$ has activity $x_i$ and threshold $\vartheta_i$. The connections to neighboring neurons are weighted by $w_{ij}$. The learning of the network consists in finding the weights and thresholds that map an input vector of activities $\mathbf{x}$ to the desired output vector of activities $\hat{\mathbf{y}}$ [66].
Depending on the type of approach to modeling the BOS process (i.e., from static or from merged static and dynamic data), tensor (1) or (2) is used and subsequently decomposed into a sequence of pairs of input and desired output vectors:
$$(\mathbf{x}_1 / \hat{\mathbf{y}}_1), (\mathbf{x}_2 / \hat{\mathbf{y}}_2), \ldots, (\mathbf{x}_N / \hat{\mathbf{y}}_N). \quad (13)$$
The goal of learning is to minimize the error E (14) in individual cycles to achieve the required accuracy:
$$E = \sum_{i=1}^{N} E_i = \sum_{i=1}^{N} \frac{1}{2}\,(\mathbf{y}_i - \hat{\mathbf{y}}_i)^2, \quad (14)$$
where $\mathbf{y}_i$ is the output vector of the NN as a response to the input vector $\mathbf{x}_i$, and the desired output vector $\hat{\mathbf{y}}_i$ is assigned to the input $\mathbf{x}_i$.
Network error minimization (14) can be performed by the gradient method, where for multiple pairs of input-output vectors, the total gradient of the objective function is calculated as follows:
$$\operatorname{grad} E = \sum_{i=1}^{N} \operatorname{grad} E_i, \quad (15)$$
where the objective function $E_i$ is defined for the i-th pair $\mathbf{x}/\hat{\mathbf{y}}$ of the training set.
Formally, the trained NN can be expressed as follows:
$$(\bar{\mathbf{w}}, \bar{\vartheta}) = \underset{(\mathbf{w}, \vartheta)}{\operatorname{argmin}}\; E(\mathbf{w}, \vartheta). \quad (16)$$
The following procedure can be used to calculate the outputs for one training sample considering a neural network with multiple inputs and multiple outputs.
The output from the hidden layer can be calculated according to the following equation:
$$L_j = f\left(\sum_{i=1}^{N} w_{ij} x_i - \vartheta_j\right), \quad j = 1, 2, \ldots, M, \quad (17)$$
where $\vartheta_j$ is the threshold of the hidden-layer neuron, f(·) is a nonlinear transfer function, $w_{ij}$ are the input-hidden layer link weights, $w_{jk}$ are the hidden-output layer link weights, N is the number of neurons in the input layer, and parameter M represents the number of neurons in the hidden layer.
The output value of the k-th neuron in the output layer can be calculated as the following:
$$y_k = \sum_{j=1}^{M} L_j w_{jk} - a_k, \quad k = 1, 2, \ldots, l, \quad (18)$$
where $a_k$ is the threshold of the output-layer neuron and parameter l represents the number of neurons in the output layer [22].
Figure 4 shows the general structure of the NN for BOS process modeling and selected output prediction. The back-propagation algorithm for NN training was used in this study [67,68,69].
For the NN used in this study, only one hidden layer was used. The number of neurons in the hidden layer was set to 2N + 1, where N is the number of input neurons. This setup resulted from experimental trials in which the best performance was sought. The maximum number of learning epochs was set to 1000 and the learning rate to 10⁻⁹. A gradient descent method combining an adaptive learning rate with momentum training was used to train the network. The NN can be trained in this way as long as its weight, net-input, and transfer functions have derivatives. The learning rate must be small enough to ensure monotone convergence of the optimization algorithm and, at the same time, large enough to provide a sufficiently high convergence rate. The momentum constant was set to 0.9. This parameter is essential for "skipping" local minima in the initial optimization phase. The sigmoid (logistic) activation function was used as the transfer function. This function is commonly used in models that predict a probability as output; since probabilities exist only between 0 and 1, the sigmoid is a natural choice because of its range. The function is differentiable and provides a smooth gradient, preventing jumps in output values.
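A minimal Python sketch of an analogous network, using scikit-learn's MLPRegressor in place of the Matlab implementation, is given below; it mirrors the described settings (one hidden layer with 2N + 1 neurons, sigmoid activation, adaptive learning rate with momentum 0.9, 1000 epochs) where the library exposes an equivalent parameter:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

n_inputs = 17  # static observations from Table 1

nn = make_pipeline(
    StandardScaler(),
    MLPRegressor(
        hidden_layer_sizes=(2 * n_inputs + 1,),  # one hidden layer, 2N + 1 neurons
        activation="logistic",                   # sigmoid transfer function
        solver="sgd",
        learning_rate="adaptive",                # adaptive learning rate ...
        momentum=0.9,                            # ... with momentum, as described above
        max_iter=1000,                           # maximum number of learning epochs
        random_state=0,
    ),
)
# nn.fit(X_train, y_train); y_pred = nn.predict(X_test)
```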

2.5. k-Nearest Neighbors

The k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method developed by Fix and Hodges in [70] and later extended in [71]. In k-NN, the input consists of the k-closest training examples in a dataset [52,72]. Han and Wang [73] first applied this method for oxygen volume flow prediction at the endpoint of the BOS process.
In classification and regression, the technique of weight assignment is used on the contributions of the neighbors so that the nearer neighbors contribute more to the average than the more distant ones. The algorithm’s training phase consists only of storing the feature vectors and class labels or property values of the training samples. In regression, the label assigned to a query point is computed based on the mean of the labels of its nearest neighbors.
The problem in regression is to predict labels y for new patterns $\mathbf{x}'$ based on a set of N observations, i.e., labeled patterns $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$. In BOS modeling, the patterns can be extracted from data tensor (1) or (2), depending on the chosen modeling approach. The goal is to learn a regression function $\hat{f}$ based on this training set. For an unknown pattern $\mathbf{x}'$, k-NN regression computes the mean of the function values of its k-nearest neighbors [74]:
$$y_{T/C} = \hat{f}(\mathbf{x}') = \frac{1}{k} \sum_{i \in N_k(\mathbf{x}')} y_i, \quad (19)$$
where $y_{T/C}$ represents the predicted output variable (i.e., temperature or carbon content in the melt), and $N_k(\mathbf{x}')$ is the set containing the indices of the k-nearest neighbors of $\mathbf{x}'$.
The averaging is based on an assumption about the locality of the function in the data and label space. In a local neighborhood of $\mathbf{x}_i$, patterns $\mathbf{x}'$ are expected to have continuous labels $\hat{f}(\mathbf{x}')$ similar to $y_i$. Hence, for an unknown pattern, the label should be similar to the labels of the closest patterns, which is modeled by the average of the labels of the k-nearest patterns. The neighborhood size k is an important parameter affecting the regression model's performance. This paper applied an open-source implementation of k-NN proposed by Ferreira [75] to the BOS process. In the working setup of k-NN modeling in this study, the number of neighbors was set to 5, and the Euclidean distance metric, as the most commonly used, was applied. It is a measure of the straight-line distance between two points in Euclidean space.
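A corresponding sketch in Python (scikit-learn's KNeighborsRegressor in place of the open-source Matlab implementation) could look as follows:

```python
from sklearn.neighbors import KNeighborsRegressor

# k = 5 neighbors and the Euclidean distance metric, as in the working setup
knn = KNeighborsRegressor(n_neighbors=5, metric="euclidean")

# knn.fit(X_train, y_train)      # "training" only stores the labeled patterns
# y_pred = knn.predict(X_test)   # mean of the 5 nearest labels, Eq. (19)
```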

2.6. Random Forest

A random forest (RF) is an ensemble machine learning technique for classification and regression that constructs many decision trees during training. In regression tasks, the mean or average prediction of the individual trees is returned [52,72,76,77,78]. The first algorithm for random decision forests was proposed by Ho [76], and an extension of the algorithm was developed by Breiman [79]. Since 2006, "Random Forests" has been a registered trademark owned by Minitab, Inc. [80]. Unfortunately, only limited evidence of RF applications to the BOS process can be found in the literature (e.g., [34,81]).
The RF is trained through bagging or bootstrap aggregating. The bagging method generates predictions using different samples of the training data, and the prediction is made as the average output of the various trees. Predictive accuracy depends on the number of trees. The model is built on a training dataset with n training samples of the input set $\mathbf{x} = \{x_1^{i}, \ldots, x_n^{i}\}_{i=1}^{N}$ with the responses $\mathbf{y} = \{y_1, \ldots, y_n\}$; bagging repeatedly (B times) selects a random sample with replacement from the training set and fits trees to these samples.
For $b = 1, \ldots, B$:
  • Sample, with replacement, n training examples from $\mathbf{x}$, $\mathbf{y}$; call these $\mathbf{x}_b$, $\mathbf{y}_b$.
  • Train a classification or regression tree $f_b$ on $\mathbf{x}_b$, $\mathbf{y}_b$.
In BOS modeling, the training data can be extracted from data tensor (1) or (2), depending on the chosen modeling approach. A trained forest makes a prediction for an unknown observation $\mathbf{x}'$ by averaging the predictions of all trees on $\mathbf{x}'$ according to the following equation:
$$y_{T/C} = \hat{f}(\mathbf{x}') = \frac{1}{B} \sum_{b=1}^{B} f_b(\mathbf{x}'). \quad (20)$$
The number of samples/trees, B, is a free parameter. Typically, a few hundred to several thousand trees are used, depending on the size and nature of the training set [82]. In this paper, a generic RF function proposed by Banerjee [83] was applied to the BOS process. In the working setup of RF modeling, the number of bags in the bootstrapping procedure was set to 5000.
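For illustration, an analogous setup in Python with scikit-learn (in place of the generic Matlab RF function used here) might look as follows:

```python
from sklearn.ensemble import RandomForestRegressor

# B = 5000 bootstrap samples/trees, matching the number of bags reported above
rf = RandomForestRegressor(n_estimators=5000, bootstrap=True,
                           n_jobs=-1, random_state=0)

# rf.fit(X_train, y_train)
# y_pred = rf.predict(X_test)   # average of the individual tree outputs, Eq. (20)
```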

2.7. Advantages and Disadvantages of Machine Learning Methods

Learning algorithms work based on strategies, algorithms, and inferences that have worked well in the past and are likely to work well in the future. Individual machine learning methods differ in algorithms and advantages and disadvantages, which can be decisive when applying the chosen method. Table 3 shows a comparison of individual methods in terms of their general properties.
In general, the advantage of SVM regression is that its efficiency increases with the size of the input data. A disadvantage of the SVR model is that its learning ability may decrease as the number of samples grows. Another disadvantage is that a suitable kernel function must be set to achieve the required model performance. The SVR model is difficult to understand and interpret because the learned model maintains a database or program structure of learned support vectors used for prediction. The risk of overfitting is lower for SVR than for NN.
The advantage of NNs is their ability to learn and make decisions based on similar events. A disadvantage of NNs is their hardware dependency, which ties a network trained for target prediction to one computer. All learned information is maintained inside the NN instead of in a database. Other disadvantages include the need for an experimental setup of the network structure and the unexplained behavior of the network, which reduces confidence in its use. However, these disadvantages are being continually eliminated over time and by scientific research.
The MARS technique is a computationally intensive methodology that fits a nonparametric regression model in the form of an expansion in product spline basis functions of predictor variables. The MARS algorithm produces continuous nonlinear regression models for high-dimensional data using a combination of predictor variable interactions and partitions of the predictor variable space. The MARS models seem well suited for considering the complex interactions among multivariate and cross-correlated predictor variables. The advantage of this technique is the ability to model large datasets more flexibly than linear models do. In addition, MARS automatically models non-linearities and interactions between variables.
The k-NN algorithm is straightforward to understand and just as easy to implement. This algorithm does not explicitly create any model; it simply makes predictions for new data directly from the stored historical data. One of the biggest problems of k-NN is selecting the optimal number of neighbors to be considered for new data, and performance improvements can be found by optimizing this parameter. In general, k-NN does not work well with unbalanced data.
The RF algorithm creates as many trees on the subset of the data and combines the output of all the trees. In this way, it reduces the overfitting problem in decision trees, reduces the variance, and improves accuracy. On the other hand, this algorithm requires much more computational power and resources. Generally, the RF is a very stable algorithm that can automatically handle missing values. The RF algorithm is usually resistant to noise and extreme values.

2.8. Model Performance Indicators

The performance of the individual machine learning methods, and of the created regression models, respectively, after application to the BOS process data, was evaluated using statistical indicators. The performance indicators were calculated for each set $\{y_i, Y_i\}_{i=1}^{N}$, i.e., the values of the measured target y and the simulated target Y. The following measures were calculated for each method [56] (a short computational sketch follows after the error definitions below):
  • Coefficient of correlation ($r_{yY}$)—This coefficient expresses the strength of the linear relationship (i.e., degree of dependence) between two variables. The range of this coefficient is (−1, 1), and its formula is as follows:
    $$r_{yY} = \frac{\sum_{i=1}^{N} (Y_i - Y_{avg})(y_i - y_{avg})}{\sqrt{\sum_{i=1}^{N} (Y_i - Y_{avg})^2 \; \sum_{i=1}^{N} (y_i - y_{avg})^2}}, \quad (21)$$
    where $y_{avg}$ and $Y_{avg}$ are the average values of y and Y.
  • Coefficient of determination ($r_{yY}^2$)—Expresses the degree of causal dependence between two variables. The coefficient gives information about the goodness of fit of the model (e.g., $r_{yY}^2 = 1$ indicates that the model perfectly fits the measured target data, and $r_{yY}^2 < 1$ corresponds to a looser fit between y and Y). The coefficient of determination $r_{yY}^2$ can be calculated by the following formula:
    $$r_{yY}^2 = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N} (y_i - Y_i)^2}{\frac{1}{N}\sum_{i=1}^{N} (y_i - y_{avg})^2}. \quad (22)$$
  • Root-mean-squared error (RMSE)—Represents the square root of the mean square error (MSE). The value of RMSE may vary from 0 to positive infinity. The smaller the MSE or RMSE, the better the model performance. The calculation formula is as follows:
    $$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (Y_i - y_i)^2}. \quad (23)$$
  • Relative root-mean-squared error (RRMSE)—Expresses the RMSE divided by the average of the measured values $y_i$. The value of RRMSE may vary from 0 to positive infinity. The smaller the RRMSE, the better the model performance. The formula to calculate RRMSE is as follows:
    $$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{y_{avg}} \times 100\ (\%). \quad (24)$$
  • Mean absolute percentage error (MAPE)—This error indicates how accurate a prediction method is, expressed as a percentage. If the values of $y_i$ are very low, MAPE can greatly exceed 100%; otherwise, if the values $y_i$ are very high (i.e., above $Y_i$), MAPE will not exceed 100%. The MAPE can be calculated according to the following formula:
    $$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{|y_i - Y_i|}{|y_i|} \times 100\ (\%). \quad (25)$$
  • Performance index (PI)—This index expresses the overall performance of a given prediction method. The value of PI may vary from 0 to positive infinity, and a lower PI value means better model performance. The PI is calculated as follows [84,85]:
    $$\mathrm{PI} = \frac{\mathrm{RRMSE}}{r_{yY} + 1}. \quad (26)$$
An absolute and relative error calculation formula was applied to determine the deviation between the measured and calculated temperature or carbon concentration at the BOS endpoint of selected meltings.
  • The absolute error $\Delta x$ is the difference between the measured value of the process variable y and the value Y calculated by the model. The absolute error can be expressed by the following equation:
    $$\Delta x = Y - y. \quad (27)$$
    The value of the absolute error is given unsigned in the evaluations, always as a positive number.
  • The relative error $\delta x$ is the ratio of the absolute error $\Delta x$ to the actual value of the process variable y. The relative error is usually expressed in percentages and can be expressed by the following equation:
    $$\delta x = \frac{\Delta x}{y} \times 100. \quad (28)$$
Besides these indicators, the time required to train the model was measured to assess the performance of the individual machine learning techniques.

3. Simulation Results

This comparative analysis evaluates the effectiveness of the applied machine learning methods in predicting melt temperature and carbon concentration at the BOS endpoint. The results are divided into two main parts. The first part presents the results of simulation based only on static BOS data. The second part addresses the prediction of the time behavior of the mentioned BOS variables. All five machine learning methods were applied, and the results are arranged in tables. In addition, SVR was used with the Gaussian and polynomial kernels, and MARS with the piecewise-cubic and piecewise-linear types of the model. All machine learning methods were investigated on the same computer (i.e., Intel Core i5-9500 CPU @ 3.00 GHz with 8 GB RAM and SSD) and in the same environment (i.e., Matlab). The numerical results are supported by graphs. Both subsections present offline simulation results. The flow chart for modeling and prediction in steelmaking based on machine learning was proposed in Figure 2. The development of an industrial soft sensor would be based on training with updated historical data and online prediction.

3.1. Prediction Based on Static Data

The machine methods of BOS process modeling are, in this case, based on a set of selected static data from the melts. The set of 872 melts with static data (see Table 1) was divided into training and test data. The test data represented 10% of the data from the end of the whole melt set. A total of 785 melts were used for training and 87 melts for testing. Note that the test data did not enter into the learning of the machine models. When applying such an approach to BOS endpoint modeling in practice, it is necessary to ensure that the set of static melting data is continuously updated or supplemented to improve prediction. The training and testing datasets were divided into input observations and possible prediction targets. Table 1 lists all selected observations and targets from the static dataset; it shows the fixed BOS observations (i.e., x₁, …, x₁₅) and the optional data. Melt temperature and carbon concentration were considered as target process variables. Melting duration as an optional target variable was not investigated in this paper. When the endpoint BOS melt temperature was modeled and predicted, the endpoint carbon concentration in the melt and the melting duration were regarded as additional input observations (i.e., x₁₆ = endpoint carbon concentration in the melt, x₁₇ = melting duration). Otherwise, when the carbon concentration in the melt was modeled and predicted, the melt temperature and melting duration were considered as input observations (i.e., x₁₆ = endpoint melt temperature, x₁₇ = melting duration). The total number of input observations in the training and testing of the machine regression models was 17. In addition, the performance impact of including lime and dolomitic lime as optional observations was also investigated; separate tables show the performance of the models with added lime.
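A minimal sketch of this chronological split is shown below; the arrays X_all and y_all are hypothetical containers for the 872 static observation vectors and the chosen target, ordered as the melts were performed:

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.10):
    """Hold out the last fraction of melts (here 87 of 872) as the test set;
    no shuffling, so the test melts never enter model training."""
    X, y = np.asarray(X), np.asarray(y)
    n_test = int(round(test_fraction * len(X)))
    return X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]

# X_all: 872 x 17 static observations (Table 1); y_all: 872 target values
# X_train, X_test, y_train, y_test = chronological_split(X_all, y_all)
```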
Table 4 shows the performance results of the previously described machine learning models. The endpoint BOS temperature in the melt was considered the target of modeling and prediction. The table shows the performance of the individual models from the training phase (i.e., learning) and the testing phase. In both phases, statistical indicators were calculated (i.e., see Equations (21)–(26)). These indicators express how well the models fit the measured target data. Special indicators were the training time and the performance index (PI), which expresses the overall performance of the applied model.
The obtained results were mixed. Table 4 shows that in the training phase, the machine learning model based on RF achieved the best performance (i.e., PI = 0.33). Table 5 shows the calculated relative and absolute errors of the ten selected melts from the test phase; the calculated average errors confirm the results from Table 4. The Gaussian kernel-based SVR modeling method achieved the second-best results (i.e., PI = 0.47). On the other hand, the NN-based regression was the worst in training (i.e., PI = 0.81). In this case, the NN had one hidden layer with 35 neurons. In the testing phase, however, the MARS regression method with the piecewise-cubic model achieved the best performance (i.e., PI = 0.50), because the MARS model approximated the target data most precisely. Regarding performance, the NN model was the least powerful in both training and testing. In terms of training time, k-NN proved to be the fastest method (i.e., training took 0.001 s), and the regression model based on RF proved to be the slowest (i.e., training took 64.58 s).
Similarly, Table 6 presents the carbon content modeling and prediction results. Unfortunately, the machine learning methods were not as powerful for carbon as they were for temperature modeling and prediction. In training, the most powerful method, indicated by the lowest PI, was SVR with the polynomial kernel (i.e., PI = 4.68). The second-best machine learning method was RF regression (i.e., PI = 7.72). The regression with NN (i.e., PI = 33.89) proved the least efficient method for approximating the training data.
On the other hand, in the testing phase (i.e., prediction from observations that did not enter the training phase), the MARS regression method with the piecewise-cubic model achieved the best performance (i.e., PI = 17.76), as the MARS model was able to approximate the target data most precisely. The carbon content prediction with the NN proved to be the least effective in the testing phase (i.e., PI = 34.67). In this paper, only one configuration of the NN, i.e., with a single hidden layer, was compared; further optimization of the NN could increase its performance. Other reasons for the NN's low performance can be overfitting, the higher variability of the inputs, and the low correlation between the inputs and the target. Table 7 shows the calculated relative and absolute errors on the ten selected melts from the test phase; the calculated average error values confirm the results from Table 6. In terms of training time, the fastest method was k-NN (i.e., 0.0016 s) and the slowest was RF (i.e., 68.77 s).
Trained machine models exist in the PC in the form of many learned coefficients, parameters, and weights in various program structures. A problem with machine learning methods can be the portability of the learned models to different industrial automation devices. However, the exception is the MARS model, which is given by an exact mathematical equation that can be programmed on a PC or a programmable logic controller (PLC). The regression equation of the MARS model is composed of basis functions (BFs), whose form depends on the model type (i.e., piecewise-cubic or piecewise-linear MARS).
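To illustrate the portability argument, the sketch below hard-codes a truncated regression equation of this kind as plain arithmetic that could be transcribed almost verbatim to PLC structured text. The basis functions are hypothetical piecewise-linear stand-ins (the actual BFs of Equation (29) are cubic and listed in Table 8), and only two of the 22 terms are shown:

```python
def bf1(x):
    # hypothetical basis function, e.g., BF1 = max(0, x5 - 812.3)
    return max(0.0, x[4] - 812.3)

def bf2(x):
    # hypothetical interaction term, e.g., BF2 = max(0, 1.52 - x2) * max(0, x13 - 60.0)
    return max(0.0, 1.52 - x[1]) * max(0.0, x[12] - 60.0)

def temperature_endpoint(x):
    """Truncated illustration of an Eq. (29)-style regression equation;
    x is the vector of input observations x1, ..., x17."""
    return 1636.6 + 0.016459 * bf1(x) + 1420.7 * bf2(x)
```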
Because the piecewise-cubic MARS model was the most effective in predicting temperature on unknown data in the testing phase, its equation is presented in this paper (see Equation (29)). The list of corresponding basis functions is shown in Table 8. Variables x₆, x₉, x₁₁, x₁₄, and x₁₇ have the lowest relative importance and were not used in the piecewise-cubic MARS model of melt temperature.
The optimal number of maximal BFs in the final model was estimated by the generalized cross-validation criterion (GCV) and 10-fold cross-validation. In each cross-validation iteration, a new MARS model was created and reduced using the GCV on the in-fold (i.e., training) data. In addition, the MSE criterion was also calculated on the out-of-fold (i.e., testing) data in the reduction phase of each MARS model.
Temperature (°C) = 1636.6 + 0.016459·BF1 + 1420.7·BF2 + 0.024364·BF3 − 0.01286·BF4 − 0.0056618·BF5 + 0.0007718·BF6 + 0.26662·BF7 − 0.18207·BF8 − 57.206·BF9 + 721.43·BF10 − 0.00011921·BF11 − 0.00000028427·BF12 + 0.00000021796·BF13 + 0.000014316·BF14 + 0.14273·BF15 + 2.0557·BF16 + 0.63713·BF17 + 8122.4·BF18 + 35.469·BF19 + 28.721·BF20 − 0.0016565·BF21 + 0.000044732·BF22. (29)
Similarly, the piecewise-cubic MARS model was the most effective in predicting carbon content on unknown data, and its equation is as follows:
$$
\begin{aligned}
\mathrm{Carbon}\ (\%) ={}& 0.056331 - 0.00015156\,\mathrm{BF}_{1} + 0.00035671\,\mathrm{BF}_{2} - 0.026621\,\mathrm{BF}_{3} - 0.030646\,\mathrm{BF}_{4}\\
&+ 0.00012233\,\mathrm{BF}_{5} + 0.0000014438\,\mathrm{BF}_{6} - 0.613\,\mathrm{BF}_{7} - 0.00017452\,\mathrm{BF}_{8}\\
&- 0.0000000019273\,\mathrm{BF}_{9} - 0.000000000000088203\,\mathrm{BF}_{10} - 0.0000023083\,\mathrm{BF}_{11}\\
&- 0.00000017825\,\mathrm{BF}_{12} - 0.000010849\,\mathrm{BF}_{13} - 0.00000080785\,\mathrm{BF}_{14} + 0.00024026\,\mathrm{BF}_{15}\\
&+ 0.00009186\,\mathrm{BF}_{16} + 0.0000025541\,\mathrm{BF}_{17} - 0.00000045047\,\mathrm{BF}_{18} - 0.0000000035607\,\mathrm{BF}_{19}\\
&+ 0.0000000023973\,\mathrm{BF}_{20} + 0.0000000011266\,\mathrm{BF}_{21} - 0.0000000022276\,\mathrm{BF}_{22}\\
&- 0.00011272\,\mathrm{BF}_{23} - 0.00011221\,\mathrm{BF}_{24} + 0.00024054\,\mathrm{BF}_{25}.
\end{aligned}
\tag{30}
$$
Variables x_6, x_7, x_9, and x_10 had the lowest relative importance and were not used in model (30). The list of corresponding basis functions is shown in Table 9.
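Because a MARS model reduces to an explicit regression equation, it can be re-implemented on a PC or PLC in a few lines. The following minimal Python sketch illustrates the principle for a piecewise-linear model; the knots, coefficients, and the two basis functions are purely illustrative and do not reproduce Equations (29) and (30), whose piecewise-cubic BFs additionally smooth each hinge with cubic segments.

```python
def hinge(x, knot, direction=+1):
    """Piecewise-linear MARS basis function: max(0, +/-(x - knot))."""
    return max(0.0, direction * (x - knot))

def mars_predict(x):
    """Hypothetical two-term piecewise-linear MARS model (illustrative only).
    x is a dict of input observations, e.g., x['x1'], x['x2']."""
    bf1 = hinge(x["x1"], 152.3)        # active when x1 exceeds its knot
    bf2 = hinge(x["x2"], 40.1, -1)     # active when x2 falls below its knot
    return 1636.6 + 0.0165 * bf1 - 0.0129 * bf2

print(mars_predict({"x1": 160.0, "x2": 35.0}))
```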
Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 show the sequence of melts as they were performed in the steelmaking plant, i.e., a series of measured and predicted (i.e., simulated) endpoint melt temperatures. The black vertical line in the figures divides the prediction into training and testing. These figures present a graphical comparison of the actual measured temperature with the simulated temperature for each method; the green bottom bars represent the calculated relative error in percent. The graphical comparison shows that the lowest relative errors in the training phase were obtained by the RF model (see Figure 11) and the highest by the NN (see Figure 7). In the test phase, the lowest relative errors were obtained with the piecewise-cubic MARS model (see Figure 9) and the highest with the NN and k-NN (see Figure 7 and Figure 10). If the prediction quality in the testing phase is taken as the key criterion for selecting the most accurate method, the MARS model is the most powerful method for predicting the melt temperature from static data.
Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17 and Figure 18 show the sequence of measured and predicted (i.e., simulated) endpoint carbon concentrations in the melt, comparing the measured carbon content with the simulated one for each method. The lowest relative errors in the training phase were obtained by the SVR model with the polynomial kernel (see Figure 12) and the highest by the NN (see Figure 14). In the test phase, the lowest relative errors were obtained with the piecewise-cubic MARS model (see Figure 16) and the highest with the NN and the SVR with the polynomial kernel (see Figure 14 and Figure 12). If the prediction quality in the testing phase is taken as the key criterion, the MARS model is also the most powerful method for predicting the carbon content from static data.
In the previous analysis, lime was not considered as an input observation. According to the technological documentation, lime and dolomitic lime are added to the melt as slag-forming additives; lime, together with magnesite, is also used to cool the melt and to control MgO in the converter slag. The further analysis therefore investigated the effect of added lime and dolomitic lime as additional input observations. To accommodate the two further observations, the list of variables in Table 1 was updated as follows: x_13 = amount of lime added to the melt (kg), x_14 = amount of dolomitic lime added to the melt (kg), x_15 = amount of magnesite added to the melt (kg), x_16 = amount of Fe in pig iron (%), x_17 = amount of after-blow oxygen (Nm³).
When the endpoint BOS melt temperature was modeled and predicted, the endpoint carbon concentration in the melt and the melting duration were regarded as additional input observations (i.e., x_18 = endpoint carbon concentration in the melt, x_19 = melting duration). Conversely, when the carbon concentration in the melt was modeled and predicted, the endpoint melt temperature and the melting duration were considered as input observations (i.e., x_18 = endpoint melt temperature, x_19 = melting duration). The total number of input observations for model training and target prediction was thus 19.
Table 10 compares the performance of the individual models when lime and dolomitic lime were added as input observations for modeling and predicting the melt temperature. When the simulation results in Table 10 are compared with the results in Table 4, it can be stated that most models improved their performance in the training phase, except for the NN. On the other hand, most models degraded in performance in the test phase, except for the SVR models. In terms of the performance index (PI), the RF model was the best in the training phase, and the piecewise-cubic MARS model was the best in the testing phase.
Similarly, Table 11 compares the performance of the individual models when lime and dolomitic lime were added as additional input observations for modeling and predicting the carbon content in the melt. When the simulation results in Table 11 are compared with the results in Table 6, it can be stated that in the training phase, most models improved their performance, except for the piecewise-cubic MARS model and the NN. In the test phase, only the SVR with a Gaussian kernel, the NN, and the piecewise-linear MARS model improved their performance; the other models showed reduced performance. In terms of the performance index (PI), the SVR model with a polynomial kernel was the best in the training phase, and the piecewise-linear MARS model was the best in the testing phase.
Because the piecewise-cubic MARS model was the most effective model for predicting the temperature on unknown data in the testing phase, its equation is presented in this paper (see Equation (31)); the list of corresponding basis functions is shown in Table 12. This regression model includes lime and dolomitic lime as supplementary observations. The best model for carbon prediction was the piecewise-linear MARS model (see Equation (32)), with the basis functions listed in Table 13. Both models are given in complete form and can be transferred to the monitoring system of the BOS plant.
$$
\begin{aligned}
\mathrm{Temperature}\ (^{\circ}\mathrm{C}) ={}& 1636.1 + 0.038907\,\mathrm{BF}_{1} + 3007.8\,\mathrm{BF}_{2} - 0.015504\,\mathrm{BF}_{3} - 0.0043363\,\mathrm{BF}_{4}\\
&+ 0.001141\,\mathrm{BF}_{5} + 0.30545\,\mathrm{BF}_{6} - 0.20608\,\mathrm{BF}_{7} - 65.993\,\mathrm{BF}_{8} + 721.84\,\mathrm{BF}_{9}\\
&- 1.4239\,\mathrm{BF}_{10} + 0.000007314\,\mathrm{BF}_{11} - 0.00000034225\,\mathrm{BF}_{12} + 0.000000040715\,\mathrm{BF}_{13}\\
&+ 0.000000011786\,\mathrm{BF}_{14} - 0.00020834\,\mathrm{BF}_{15} - 0.04435\,\mathrm{BF}_{16} - 0.0082407\,\mathrm{BF}_{17}\\
&- 0.19175\,\mathrm{BF}_{18} - 0.12989\,\mathrm{BF}_{19} - 0.00000000027129\,\mathrm{BF}_{20} - 0.31203\,\mathrm{BF}_{21}\\
&+ 0.0024637\,\mathrm{BF}_{22} + 0.00057932\,\mathrm{BF}_{23}.
\end{aligned}
\tag{31}
$$
Variables x_6, x_11, x_16, and x_19 have the lowest relative importance and were not used in the MARS model (31).
$$
\begin{aligned}
\mathrm{Carbon}\ (\%) ={}& 0.050403 + 0.00012192\,\mathrm{BF}_{1} - 0.030111\,\mathrm{BF}_{2} - 0.025931\,\mathrm{BF}_{3} + 0.00037055\,\mathrm{BF}_{4}\\
&+ 0.0000013054\,\mathrm{BF}_{5} - 0.00000071876\,\mathrm{BF}_{6} - 0.50589\,\mathrm{BF}_{7} + 0.00020491\,\mathrm{BF}_{8}\\
&+ 0.0000068037\,\mathrm{BF}_{9} + 0.0000014624\,\mathrm{BF}_{10} - 0.0000076596\,\mathrm{BF}_{11} - 0.00000077033\,\mathrm{BF}_{12}\\
&- 0.0000000073973\,\mathrm{BF}_{13} + 0.0000000021832\,\mathrm{BF}_{14} - 0.000000000097109\,\mathrm{BF}_{15}\\
&+ 0.0000012\,\mathrm{BF}_{16} - 0.0000067856\,\mathrm{BF}_{17} - 0.00000068003\,\mathrm{BF}_{18} - 0.00000028498\,\mathrm{BF}_{19}\\
&- 0.00000037196\,\mathrm{BF}_{20} - 0.00000029068\,\mathrm{BF}_{21} - 0.00000013186\,\mathrm{BF}_{22}\\
&+ 0.000000000037285\,\mathrm{BF}_{23} + 0.000000000044636\,\mathrm{BF}_{24} + 0.000000000081557\,\mathrm{BF}_{25}.
\end{aligned}
\tag{32}
$$
Variables x_7, x_9, x_10, and x_14 have the lowest relative importance and were not used in the MARS model (32).

3.2. Prediction Based on Dynamic Data

The prediction of the BOS variables (i.e., melt temperature and carbon concentration in the melt) is based on machine models trained on a large dataset of combined static and dynamic data. The idea of merging, or pairing, static and dynamic data is illustrated in Figure 3. The database from the BOS operation contained 872 meltings (i.e., rows in the table) with static data (i.e., columns in the table). The essential static data for each melting were the initial and final measured melt temperature and carbon concentration in the melt; another essential item was the total amount of oxygen injected in a given melt. The database of dynamic data contained process variables measured continuously during melting: the concentrations of CO and CO2 in the waste gas, the temperature of the waste gas, and the flow rate of the blowing oxygen were provided for each melt. These process data were recorded for every melting with a sampling period of 1 s. An example of the dynamic data behavior from a selected melting is shown in Figure 19. The training datasets for temperature and carbon content modeling were created by data mining, i.e., by merging the static and dynamic data (see Figure 3). For each initial and final measured melt temperature from the static dataset, the initial and final observations (i.e., CO, CO2, waste gas temperature, and the accumulated amount of blown oxygen) from the dynamic dataset were assigned for every melting. The same procedure was applied to the carbon concentration in the melt.
Thus, pairs of inputs (i.e., input observations) and outputs (i.e., targets) were created as large data matrices (i.e., big dataset #1 and big dataset #2). As part of the data processing, meltings whose dynamic data were not satisfactory (e.g., with missing data) were excluded from these datasets, as were the meltings whose dynamic data were reserved for testing the performance of the machine models. In this way, large filtered datasets were obtained and used to train the machine models.
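The pairing procedure can be sketched as follows; the column and variable names are assumptions for illustration, not the actual plant database schema.

```python
import pandas as pd

def build_training_set(static_df, dynamic_records):
    """Pair static endpoint measurements with the first and last dynamic samples.

    static_df       -- one row per melt with columns 'melt_id', 'T_initial',
                       'T_final' (hypothetical names)
    dynamic_records -- dict mapping melt_id to a DataFrame sampled at 1 s with
                       columns 'CO', 'CO2', 'T_waste_gas', 'O2_accumulated'
    """
    rows = []
    for _, melt in static_df.iterrows():
        dyn = dynamic_records.get(melt["melt_id"])
        if dyn is None or dyn.isna().any().any():
            continue  # exclude melts with missing or unsatisfactory dynamic data
        for idx, target in ((0, melt["T_initial"]), (-1, melt["T_final"])):
            s = dyn.iloc[idx]
            rows.append({"CO": s["CO"], "CO2": s["CO2"],
                         "T_waste_gas": s["T_waste_gas"],
                         "O2_accumulated": s["O2_accumulated"],
                         "target": target})
    return pd.DataFrame(rows)  # big dataset: two labeled samples per melt
```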
Dynamic data (see Table 2) from ten selected meltings that were not part of the training were used to test the trained models. For illustration, Figure 19 shows the time behavior of the dynamic data recorded during melting #5. In practice, dynamic data are measured continuously during melting and can be used as the input to the machine model to make online predictions of the temperature or carbon concentration.
The performances of the individual machine models in the training and testing phases are shown in Table 14. In the training phase, statistical measures were calculated to indicate the performance level of the individual machine learning methods; the performance index (PI) was considered the primary indicator for performance assessment. In addition, the time required to train the individual models was determined. Table 15 presents the performance of the models in the testing phase on dynamic data from melts that were not the subject of their training. Since the dynamic course of temperature and carbon was not measured, only the simulated (i.e., predicted) and the actual measured value (i.e., temperature/carbon) at the BOS endpoint were compared. To determine each model's accuracy, the absolute and relative errors at this endpoint were calculated.
The results show that in the training phase, the RF-based model was the best (i.e., PI = 0.76), although this model was the slowest in training. The NN-based model, with nine neurons in one hidden layer, approximated the measured data the least satisfactorily (PI = 1.32). On the other hand, in the test phase on ten meltings, the highest accuracy was achieved by the piecewise-linear MARS model: the average relative error over the ten meltings was 0.73%, and the average absolute error was 12.19 °C. The SVR-based model with the Gaussian kernel achieved the lowest accuracy in endpoint temperature prediction, i.e., the average relative error was 3.68%, and the average absolute error was 61.68 °C (see Table 15).
Table 16 compares the performance of the individual models for the carbon concentration in the melt. In this case, the individual models were also trained on combined static and dynamic data, and the carbon concentration predictions were tested on ten meltings. The results show that in the training phase, the measured data were best approximated by the RF-based model (i.e., PI = 4.40), although, again, this model was the slowest in training. Unfortunately, the MARS-type models failed completely in the training phase due to the nature of the input data: numerical issues occurred because of the low correlation between the input observations and the target variable, so testing of the MARS models could not be performed. The NN-based model showed the lowest performance in training (i.e., PI = 7.77). In the test phase, i.e., the prediction of the BOS endpoint, the best results were achieved by the k-NN model (i.e., the average relative error over the ten meltings was 12.62%, and the average absolute error was approximately 0.01 vol.%) (see Table 17). The SVR model with the Gaussian kernel was, on average, the least accurate (i.e., the average relative error over the ten meltings was up to 1376.47%, and the average absolute error was approximately 0.58 vol.%). In general, the modeling and prediction of carbon were less successful: the models were less accurate than in the case of temperature, and some models failed.
Because the piecewise-linear MARS model was the most precise in predicting the endpoint melt temperature on unknown dynamic data, its equation is presented in this paper (see Equation (33)). This regression model can be programmed into the monitoring system of the BOS plant and used as a soft sensor to estimate the melt temperature from dynamically measured data. The list of corresponding basis functions for this model is shown in Table 18. On the other hand, the most precise prediction of the endpoint carbon content was achieved by the k-NN model; unfortunately, this machine model cannot be expressed as a single regression equation.
$$
\begin{aligned}
\mathrm{Temperature}\ (^{\circ}\mathrm{C}) ={}& 1659 + 0.022098\,\mathrm{BF}_{1} - 0.045215\,\mathrm{BF}_{2} - 0.03343\,\mathrm{BF}_{3} + 0.0011321\,\mathrm{BF}_{4}\\
&- 0.0012658\,\mathrm{BF}_{5} + 0.000012363\,\mathrm{BF}_{6} - 0.016099\,\mathrm{BF}_{7} + 0.013921\,\mathrm{BF}_{8}\\
&- 1.2843\,\mathrm{BF}_{9} - 1.5315\,\mathrm{BF}_{10} + 0.0021835\,\mathrm{BF}_{11} + 0.0016313\,\mathrm{BF}_{12}\\
&- 0.002626\,\mathrm{BF}_{13} - 0.18517\,\mathrm{BF}_{14}.
\end{aligned}
\tag{33}
$$
When predicting from dynamic data, the aim was to determine the dynamic course of these variables in addition to estimating the temperature and carbon at the BOS endpoint. The dynamic course of temperature and carbon may differ for each melting; the shape of the curves depends on the model's ability to approximate the measured data and on the number of actual measurements during melting that can be used to train the model. The standard dynamic behavior (see Figure 1) was used to assess the modeled temperature and carbon.
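Once a model is trained on such endpoint pairs, it can be swept over the full 1-s dynamic record of a melt to obtain the whole predicted trajectory. A sketch is given below, assuming a fitted scikit-learn-style model and the hypothetical DataFrame layout from the pairing sketch above.

```python
def predict_dynamic_course(model, dyn):
    """Sweep a trained model over the 1-s dynamic record of one melt.

    dyn is the hypothetical per-melt DataFrame from the pairing sketch,
    with one row per second of blowing."""
    X = dyn[["CO", "CO2", "T_waste_gas", "O2_accumulated"]].to_numpy()
    course = model.predict(X)       # one temperature/carbon estimate per second
    return course, course[-1]       # full trajectory and endpoint estimate
```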
Figure 20 and Figure 21 show a comparison of temperature and carbon behavior predicted from dynamic BOS data in selected meltings by individual machine learning methods. These figures show the target variables predicted in melting #5. The black vertical line in the individual sub-figures indicates the endpoint of the melt.
Qualitatively, the best temperature behavior compared to Figure 1 was achieved by the SVR model with the polynomial kernel (see Figure 20a) and the piecewise-cubic MARS model (see Figure 20e). These models achieved good accuracy in predicting the temperature at the endpoint of melting #5 (see Table 15); note that for melting #5, the SVR model with the polynomial kernel achieved the lower endpoint error. Less smooth courses were obtained by the RF (see Figure 20g), the k-NN (see Figure 20f), and the SVR model with the Gaussian kernel (see Figure 20b). Although the piecewise-linear MARS model achieved the best average relative and absolute errors in the endpoint prediction, the predicted temperature's dynamic behavior was not satisfactory (see Figure 20d). Similarly, an unstable dynamic temperature profile was produced by the NN-based model (see Figure 20c).
The comparison of the carbon behavior predicted from dynamic BOS data by the machine learning methods is shown in Figure 21; as with the temperature modeling results, this figure shows the prediction for melting #5. The results of the MARS models are missing from the comparison, as these models failed in the training phase. The graphical results show that, qualitatively, the smoothest carbon behavior compared to Figure 1 was achieved by the SVR model with the polynomial kernel (see Figure 21a) and the model based on k-NN (see Figure 21d). These models also achieved the best average relative errors in the BOS endpoint carbon prediction (see Table 17); note that the k-NN model achieved the lower average endpoint error over the ten selected meltings. Carbon also had a relatively smooth course in the simulations with the models based on the NN (see Figure 21c) and the RF (see Figure 21e). However, the SVR with the Gaussian kernel produced a less smooth decarburization course (see Figure 21b).
Furthermore, the effect of increasing the number of input observations on the performance of the individual models was investigated. The oxygen and hydrogen concentrations were used as additional input observations, and three variants of inputs were tested: oxygen only, hydrogen only, and oxygen and hydrogen together. Table 19 shows how the performance of the machine models changed in temperature modeling and prediction, while Table 20 shows how it changed in carbon modeling and prediction.
By comparing the results in Table 19 with the values presented in Table 14 and Table 15, it can be seen that the temperature prediction performance on the training data increased when oxygen was used as an additional observation for the SVR model with the polynomial kernel, the SVR with the Gaussian kernel, the NN, the piecewise-cubic MARS model, and the k-NN. Hydrogen increased the performance only of the NN model and the piecewise-cubic MARS model. When oxygen and hydrogen were added as two additional observations, the training performance increased for the SVR model with a Gaussian kernel, the NN, the piecewise-cubic MARS model, and the k-NN. In the other cases, no increase in prediction performance on the training data was recorded.
In the test phase, the individual methods were applied to ten meltings, where the temperature at the endpoint was predicted; Table 19 gives only the average relative and absolute errors at the endpoints of the ten meltings. An increase in performance with added oxygen was noted for the NN and the piecewise-cubic MARS model. With added hydrogen, the performance increased for the SVR model with the Gaussian kernel and the NN model. With oxygen and hydrogen added as two additional observations, the performance in testing increased only for the NN model.
By comparing the results in Table 20 with the values presented in Table 16 and Table 17, it can be seen that the carbon prediction performance on the training data increased when oxygen was used as an additional observation for the SVR with the polynomial kernel, the SVR with the Gaussian kernel, and the NN. Using hydrogen alone increased the performance of the SVR model with the Gaussian kernel and the RF-based model. Adding oxygen and hydrogen as two additional observations increased the training performance for the SVR models with the polynomial and Gaussian kernels and for the NN. In the other cases, no increase in prediction performance on the training data was noted.
Similarly to temperature, in the test phase the individual methods were applied to ten meltings, where the carbon concentration in the melt at the endpoint was predicted. An increase in performance was noted only for the SVR model with a Gaussian kernel using hydrogen as an additional observation; in the other models, the prediction performance on the test data did not increase.

4. Discussion of Results

This study investigated two approaches to BOS modeling, i.e., based on static observations from a set of meltings and modeling based on joined static and dynamic data. The latter approach has not yet been studied in detail due to the need for more temperature and carbon measurements during meltings. Five machine learning methods were compared, and their performances were examined both in the training phase, i.e., target prediction on training data, and in the testing phase, i.e., prediction on unknown data that were not part of model learning. The performance of machine learning methods was evaluated using statistical indicators and calculated errors in the melting endpoint.
The speed, or duration, of training depends on the complexity of the algorithm. The results showed that in terms of time, the slowest modeling method was based on RF, both when modeling from joined static and dynamic data and when modeling only from static data. On the other hand, k-NN regression proved to be the fastest machine learning method in both BOS modeling approaches. Machine learning speed matters, e.g., in the case of an online model update, where it may affect control interventions in the process. The RF algorithm creates many trees on subsets of the data and combines the outputs of all the trees; this reduces the overfitting problem, but the algorithm requires much more computational power and resources. The fastest method, the k-NN algorithm, does not explicitly create any model: it simply infers predictions for new data directly from the historical data. In this modeling, the number of neighbors k was set to 5, but performance improvements may be found by optimizing this parameter.
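The speed contrast between the two algorithms is easy to reproduce on synthetic data. The following scikit-learn sketch is illustrative only; the study itself used MATLAB implementations [63,75,83], and the synthetic data merely mimic the size of the static dataset.

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(872, 17))   # roughly the size of the static melt dataset
y = 1650 + 20 * X[:, 0] - 15 * X[:, 1] + rng.normal(scale=5, size=872)

for name, model in [("k-NN (k=5)", KNeighborsRegressor(n_neighbors=5)),
                    ("RF (100 trees)", RandomForestRegressor(n_estimators=100))]:
    t0 = time.perf_counter()
    model.fit(X, y)   # k-NN only stores the data; RF grows an ensemble of trees
    print(f"{name}: trained in {time.perf_counter() - t0:.4f} s")
```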
In terms of performance, the MARS method proved to have the best endpoint prediction accuracy: the MARS models most accurately predicted the temperature and carbon concentration when predicting from both static and dynamic data. The NN proved to be the least accurate model for temperature prediction from static data. The advantage of the MARS model is its relatively simple interpretation and portability. In general, the MARS models worked very well on large datasets of meltings; they offer quick computation in prediction and do not require standardization of the predictor variables. On the other hand, the algorithm proved to be more computationally intensive in training.
The RF-based model best approximated the training data in the case of modeling based on combined static and dynamic data and prediction from dynamic melting data. The piecewise-linear MARS model proved to have the most accurate endpoint temperature prediction, and the k-NN model the most accurate carbon prediction. Unfortunately, the MARS models failed in training on the combined static and dynamic data in the case of carbon, due to the higher variability of the inputs and the low correlation between the inputs and the target.
For modeling and prediction of the BOS from static data, it was investigated how the performance of the individual machine learning methods changes after adding lime and dolomitic lime as additional input observations. The results showed that in the case of temperature prediction, the SVR models improved their performance index in testing; although the other models reduced their prediction performance in testing, the piecewise-cubic MARS model remained the most powerful. In carbon prediction testing, the piecewise-linear MARS model, the SVR model with a Gaussian kernel, and the NN improved their performance, while the prediction performance of the other models decreased. Therefore, the optimal input parameters must be selected for each machine learning method.
Regarding the quality of the dynamic temperature prediction, the SVR method with a polynomial kernel and the piecewise-cubic MARS model proved to be the most effective: the simulated temperature in these cases corresponded most closely to the standard temperature profile reported in the literature (e.g., [3,4,5,45,46]). It must be noted, however, that the shape of the curves may vary with each melting, depending on the BOS inputs and control interventions.
It was found that the SVR model with the polynomial kernel and the k-NN model can simulate the likeliest dynamic course of carbon in the melt.
Comparing the two approaches to the prediction of the BOS process variables, we can state that the temperature prediction was more efficient in the case of modeling from static melting data, and the carbon prediction in the case of modeling from combined static and dynamic melting data. This was demonstrated by calculating the relative and absolute errors at the endpoints of the ten selected meltings. Unfortunately, both investigated approaches to BOS process modeling may have weaknesses in practical implementation. In the case of modeling and prediction from static data, a potential target variable was always part of the training: in temperature modeling, the carbon concentration and the melting time were included as input observations, and in carbon modeling, the endpoint temperature and the melting time were included. A similar approach was applied in [27].
On the other hand, in the case of modeling from the combined static and dynamic database, only the measured data from the start and the endpoint of the melting were used, which may be insufficient to estimate the dynamic course of the target process variables between these two points. The quality of the dynamic behavior of the two target variables was only graphically compared with its standard waveforms. Therefore, to improve the prediction of the dynamic behavior of temperature and carbon, we propose performing more checking measurements during multiple meltings to train the models better. Only the melting endpoint, at which the relative and absolute errors were calculated, was considered in assessing the accuracy of the models, and the accuracy was verified on ten selected meltings. Although only two measurements from each melting were used to train the models, the prediction results from the dynamic data are more than promising. Online adaptation of the machine model could further increase the prediction performance; in addition, a continuously adapted machine model can serve as a soft sensor in the steelmaking monitoring system.
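The proposed online adaptation could take the form of a simple periodic refit whenever a new checking measurement becomes available. The following is a conceptual sketch with hypothetical names, not the plant implementation.

```python
import numpy as np

def adapt_soft_sensor(model, X_hist, y_hist, x_new, y_checked):
    """Refit the machine model after a new checking measurement arrives.

    All names are illustrative; in practice the refit could run after
    every melt or on a fixed schedule."""
    X_hist = np.vstack([X_hist, x_new])     # append the verified observation
    y_hist = np.append(y_hist, y_checked)   # and its measured target value
    model.fit(X_hist, y_hist)               # scikit-learn-style retraining
    return model, X_hist, y_hist
```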
Recently, interest in machine learning algorithms for controlling the steel production process has increased significantly. The end parameters of the molten steel, such as the steel temperature and carbon content, directly affect the quality of the produced steel; moreover, these endpoint process variables are problematic to measure online continuously over time. A literature review shows that many studies have focused on developing predictions based on machine models, but these are usually predictions from static data only. Unfortunately, the authors often do not report the relevant outputs, and model settings and results are presented in different ways. Works that address modeling the dynamic course of the target variables of the BOS process are usually based on balance equations, thermodynamics, and various mathematical and physical laws.
In our recent research (see [6]), we compared different approaches to the creation of a mathematical model of the melt temperature in an LD converter. A proposed complex deterministic model for steelmaking was compared with linear and nonlinear regression models, SVR, and ANFIS. The proposed deterministic model, based on mathematical-physical laws, is not as dependent on the parameters of previous meltings as the machine models are. The SVR model achieved a lower average absolute deviation (i.e., 18.5 °C) than the deterministic model (i.e., 19.3 °C) but a higher one than the regression model (i.e., 11.9 °C). The SVR model could better approximate the standard dynamic course of the melt temperature.
Methods with kernel functions are very popular in machine learning and in steelmaking, and the SVR-based models also achieved interesting results in our study when predicting from static and dynamic data. Support-vector regression (SVR), based on a statistical learning algorithm, can be used as a soft sensor to model the complex relationship between input and output data. Schlüter et al. [18] designed a data-driven prediction model for the BOF endpoint based on support-vector machines (SVM) and applied it to an information system for steelmaking. Their model, based on off-gas analysis (i.e., CO, CO2, O2), was implemented on an HMI monitoring device. Offline, their temperature prediction achieved a standard deviation of ±18.8 °C; online, it achieved 16 °C. In offline carbon prediction, they reached a standard deviation of ±0.0061%, and in online prediction 0.0040%; the online prediction was successfully tested on 1400 meltings. Our best SVR temperature model with the basic observations achieved RMSE = 16.60, and the average absolute error over the ten test meltings was 11.90 °C. Our best SVR carbon model achieved RMSE = 25.73, and the average absolute error was 0.004 vol.%. In our study, only the basic SVR algorithm was used, but various improvements to SVR can be found in the literature. For example, Duan et al. [20] proposed twin support-vector regression (TSVR) as a novel prediction model, with the fireworks algorithm (FWA) optimizing the parameters of the TSVR. They investigated 50 meltings; their FWA-TSVR achieved an RMSE of 18.494 in temperature prediction and 2.870 in carbon prediction, improving on the standard SVR algorithm, whose RMSE was 23.952 in temperature prediction and 3.072 in carbon prediction. They used nine static BOS variables as inputs to the training model. Another static control model, based on wavelet transform weighted twin support-vector regression (WTWTSVR), was proposed in [21]; it used nine input observations with 170 training samples and 50 testing samples. The results showed an RMSE of 0.0023 for the carbon model and 4.0210 for the temperature model; the carbon model was learned in 0.4523 s and the temperature model in 0.0977 s.
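For orientation, the two SVR variants compared in this study correspond conceptually to the following scikit-learn configurations. The hyperparameter values are illustrative assumptions, as the study itself used the MATLAB Statistics and Machine Learning Toolbox [63,64].

```python
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# SVR is sensitive to feature scales, so the inputs are standardized first
svr_poly = make_pipeline(StandardScaler(),
                         SVR(kernel="poly", degree=3, C=10.0, epsilon=0.1))
svr_gauss = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", gamma="scale", C=10.0, epsilon=0.1))
# Usage: svr_poly.fit(X_train, y_train); y_hat = svr_poly.predict(X_test)
```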
The best regression model created by the MARS method with the basic observations in our study achieved an MSE of 209.09 and an RMSE of 14.46 when predicting the temperature from static data; the indicators MSE with a value of 0.0001 and RMSE with a value of 24.81 were achieved in carbon prediction. This was a prediction on a test dataset, where the average absolute error over ten melts was 11.34 °C in temperature prediction and 0.0034 vol.% in carbon prediction. Unfortunately, only one application of MARS to the BOS was found in the literature. Díaz et al. [54] applied stochastic forecasting to hot metal temperature prediction based on MARS, using a moving-window approach for training dataset selection to avoid the need for periodic re-tuning of the model. In temperature prediction on 2195 testing meltings, they achieved a mean absolute error of 11.2 °C, which is similar to our result; they used five static inputs to train the MARS model.
Many NN applications and improvements for predicting BOS process variables can also be found in the literature. The best NN in our study with the basic observations achieved MSE = 537.45 and RMSE = 23.18 when predicting the temperature from static data; in carbon prediction, it achieved MSE = 0.0004 and RMSE = 0.0196. This was a prediction on a test dataset, where the average absolute error over ten melts was 13.34 °C in temperature prediction and 0.0073 vol.% in carbon prediction. In our study, we used only a simple NN with one hidden layer, without further optimization of the network and its structure; for that reason, better results from applying NNs to the BOS can be found in the literature. For example, Li et al. [22] applied an NN combined with a particle swarm optimization (PSO) algorithm to predict the BOF's endpoint carbon content and endpoint temperature. Their results show that when the carbon content prediction error is within ±0.02, the prediction accuracy rate of the carbon content is 92.85%; when the endpoint temperature prediction error is within ±10 °C, the prediction accuracy rate of the endpoint temperature is 89.28%. They tested the NN with PSO on 30 melts; the absolute error was in the range of 0.005–0.025 vol.% in carbon prediction and 5–10 °C in temperature prediction. Park et al. [25] compared an artificial NN with a proposed least-squares support-vector machine (LSSVM) model for temperature prediction based on 13 input observations; the best obtained RMSE was 16.36 with the ANN and 13.21 with the LSSVM. These results are better than ours.
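For reproducibility, the single-hidden-layer networks used here correspond conceptually to the following scikit-learn configuration, with 35 neurons for the static temperature model and 9 for the dynamic one. The activation, solver, and iteration limit are illustrative assumptions, as the study used the MATLAB Deep Learning Toolbox [67].

```python
from sklearn.neural_network import MLPRegressor

# One hidden layer, sizes taken from the paper; other settings are assumptions
nn_static = MLPRegressor(hidden_layer_sizes=(35,), activation="tanh",
                         solver="lbfgs", max_iter=2000, random_state=0)
nn_dynamic = MLPRegressor(hidden_layer_sizes=(9,), activation="tanh",
                          solver="lbfgs", max_iter=2000, random_state=0)
# Usage: nn_static.fit(X_train, y_train); y_hat = nn_static.predict(X_test)
```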
In our research, the RF model with the basic observations achieved an RMSE of 17.46 when predicting the temperature from static data and 0.0125 when predicting carbon. For comparison, Sala et al. [34] investigated data-driven models based on ridge regression, random forest (RF), and gradient-boosted regression trees (GBRT) for BOS variable prediction, modeling the melt temperature and its chemical composition at the endpoint. With ridge regression, the obtained RMSE was 0.93 in carbon prediction and 0.73 in temperature prediction; with RF regression, it was 1.05 in carbon prediction and 0.92 in temperature prediction. They obtained the best carbon prediction result with GBRT, where the RMSE was 0.88; in temperature prediction with GBRT, the obtained RMSE was 0.79.
In future research, the selected machine learning-based models and the deterministic model will be improved for practical use in steelmaking plants. As an alternative to the existing machine learning methods, extreme learning machines also have excellent potential for developing efficient and accurate models in steelmaking. We also plan to optimize the NN: it would be interesting to change the network structure and increase the training set to assess the impact on the NN's performance, and to investigate the prediction performance of online models.

5. Conclusions

From the point of view of the automation and control of the BOS process, it is essential to know the controlled process variables. However, some process variables cannot be measured long-term by standard hardware (e.g., because of aggressive environments or high temperatures in the measuring area). This comparative study investigated two approaches to BOS process modeling for predicting the melt temperature and carbon concentration; these approaches were investigated as potential tools for the software sensing of process variables in the BOS.
The application of the MARS method to the BOS process showed interesting results. In addition, the results of MARS have not yet been compared in the literature with other machine learning methods to predict temperature and carbon in BOS. The study also proposed a modeling and prediction approach from combined static and dynamic BOS data, which has not yet been investigated in the literature. The prediction results from the dynamic observations of the BOS process using machine models showed an improvement in carbon prediction compared to the prediction from static data only. In addition, predictions from dynamic melting observations make it possible to simulate the entire dynamic course of the target quantity.
The knowledge gained in this study can be summarized in several main points:
  • The speed of learning depends on the complexity of the algorithms of individual machine learning methods.
  • The k-NN method proved to be the fastest machine learning method for BOS modeling from static and combined static and dynamic data.
  • The RF method proved to be the slowest in training of all the machine models.
  • The MARS method was shown to be the most powerful machine learning method for predicting endpoint temperature and carbon based on static data. This method best approximated nonlinearities between static variables.
  • In general, the prediction of melt carbon concentration from static data is less powerful than the prediction of melt temperature from static data.
  • It was found that changing the number of input observations affects the performance of the machine models in the testing phase; it is therefore necessary to look for the optimal number of relevant observations so that the prediction performance from static data does not decrease.
  • When lime and dolomitic lime were added as additional observations, most models improved their performance in training.
  • It was found that changing the number of input observations in the case of prediction from dynamic data can change the model’s accuracy. The model’s accuracy depends, in addition to the algorithm, on the user inputs and their significance; for example, adding insignificant observations may reduce the accuracy of the prediction.
  • In the case of temperature prediction from static data with an increased number of observations, only the SVR models improved their prediction performance in testing. In the case of carbon prediction from static data with an increased number of observations, only the SVR with a Gaussian kernel, the NN, and the piecewise-linear MARS model improved their prediction performance in testing.
  • The prediction results from the dynamic observations of the BOS process using machine models showed an improvement in carbon prediction compared to the prediction from static data only.
  • Predictions from dynamic melting observations make it possible to simulate the entire dynamic course of the target quantity.
  • In terms of quality, dynamic behavior was best simulated by SVR, MARS, and k-NN-based models.
  • The piecewise-linear MARS model proved to be the most accurate in predicting temperature, and the k-NN model was the most accurate in predicting carbon at the endpoint of melting from dynamic data.

Author Contributions

Conceptualization, J.K. and P.F.; Data curation, J.K.; Formal analysis, J.K. and P.F.; Methodology, J.K. and M.D.; Project administration, J.K.; Resources, M.D. and M.L.; Supervision, M.D. and M.L.; Validation, J.K.; Writing—Original draft preparation, J.K. and P.F.; Writing—Review and editing, M.D. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by U.S. Steel Košice under works contract No. P-101-0030/17 “Research on indirect measurement of temperature and carbon in the process of steelmaking” and by the Cultural and educational grant agency of the Ministry of Education, science, research and sport of the Slovak Republic under grant KEGA 016TUKE-4/2020 “Projects of applied research as a means for development of new models of education in the study program of industrial logistics”. The APC was funded by grant KEGA 016TUKE-4/2020.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

This research was supported by the Cultural and Educational Grant Agency of the Ministry of Education, Science, Research and Sport of the Slovak Republic under grant KEGA 016TUKE-4/2020 “Projects of applied research as a means for development of new models of education in the study program of industrial logistics”, and by U.S. Steel Košice under works contract No. P-101-0030/17, concluded in business case No. AG407HH0501, “Research on indirect measurement of temperature and carbon in the process of steelmaking”.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fortuna, L.; Graziani, S.; Rizzo, A.; Xibilia, G.M. Soft Sensors for Monitoring and Control of Industrial Processes; Springer: London, UK, 2007; p. 271. [Google Scholar] [CrossRef]
  2. Hubmer, R.; Kühböck, H.; Pastucha, K. Latest Innovations in Converter Process Modelling. In Proceedings of the METEC & 2nd ESTAD, Düsseldorf, Germany, 25 June 2015; pp. 1–7. [Google Scholar]
  3. Weeks, R. Dynamic Model of the BOS Process, Mathematical Process Models in Iron and Steel Making; The Metals Society: Amsterdam, The Netherlands, 1973; pp. 103–116. [Google Scholar]
  4. Laciak, M.; Petráš, I.; Terpák, J.; Kačur, J.; Flegner, P.; Durdán, M.; Tréfa, G. Výskum Nepriameho Merania Teploty a Uhlíka v Procese Skujňovania. (Zmluva o Dielo č. P-101-0030/17) (en: Research on Indirect Measurement of Temperature and Carbon in the Process of Steelmaking (Contract for Work No. P-101-0030/17)); Technical Report 2018; Technical University of Košice, Faculty BERG, Institute of Control and Informatization of Production Processes: Košice, Slovakia, 2018. [Google Scholar]
  5. Laciak, M.; Kačur, J.; Flegner, P.; Terpák, J.; Durdán, M.; Tréfa, G. The Mathematical Model for Indirect Measurement of Temperature in the Steel-Making Process. In Proceedings of the 2020 21th International Carpathian Control Conference (ICCC), Kosice, Slovakia, 27–29 October 2020; pp. 1–5. [Google Scholar] [CrossRef]
  6. Laciak, M.; Kačur, J.; Terpák, J.; Durdán, M.; Flegner, P. Comparison of Different Approaches to the Creation of a Mathematical Model of Melt Temperature in an LD Converter. Processes 2022, 10, 1378. [Google Scholar] [CrossRef]
  7. Wu, L.; Yang, N.; You, X.; Xing, K.; Hu, Y. A Temperature Prediction Model of Converters Based on Gas Analysis. Proc. Earth Planet. Sci. 2011, 2, 14–19. [Google Scholar] [CrossRef] [Green Version]
  8. Sarkar, R.; Gupta, P.; Basu, S.; Ballal, N.B. Dynamic Modeling of LD Converter Steelmaking: Reaction Modeling Using Gibbs’ Free Energy Minimization. Metall. Mater. Trans. B 2015, 46, 961–976. [Google Scholar] [CrossRef]
  9. Terpák, J.; Laciak, M.; Kačur, J.; Durdán, M.; Flegner, P.; Trefa, G. Endpoint Prediction of Basic Oxygen Furnace Steelmaking Based on Gradient of Relative Decarburization Rate. In Proceedings of the 2020 21th International Carpathian Control Conference (ICCC), Ostrava, Czech Republic, 27–29 October 2020. [Google Scholar] [CrossRef]
  10. Kumari, V. Mathematical Modeling of Basic Oxygen Steel Making Process; National Institute of Technology: Rourkela, India, 2015. [Google Scholar]
  11. Wang, X.; Xing, J.; Dong, J.; Wang, Z. Data driven based endpoint carbon content real time prediction for BOF steelmaking. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017. [Google Scholar] [CrossRef]
  12. Terpák, J.; Flegner, P.; Kačur, J.; Laciak, M.; Durdán, M.; Trefa, G. Utilization of the Mathematical Model of the Converter Process for the Sensitivity Analysis. In Proceedings of the 2019 20th International Carpathian Control Conference (ICCC), Krakow, Poland, 26–29 May 2019. [Google Scholar] [CrossRef]
  13. Asai, S.; Muchi, I. Theoretical Analysis by the Use of Mathematical Model in LD Converter Operation. Trans. Iron Steel Inst. Jpn. 1970, 10, 250–263. [Google Scholar] [CrossRef] [Green Version]
  14. Xie, S.; Chai, T. Prediction of BOF Endpoint Temperature and Carbon Content. IFAC Proc. Vol. 1999, 32, 7039–7043. [Google Scholar] [CrossRef]
  15. Kostúr, K.; Laciak, M.; Truchlý, M. Systémy Nepriameho Merania (en: Systems of Indirect Measurement), 1st. ed.; Monograph; Reprocentrum: Košice, Slovakia, 2005; p. 172. [Google Scholar]
  16. Huang, W.; Liu, Y.; Dong, Z.; Yang, B. The Regression Equation of Oxygen Content and Temperature to End Point of Bath Based on Exhaust Gas Analysis. In Proceedings of the 2015 International Conference on Automation, Mechanical Control and Computational Engineering, Changsha, China, 24–25 October 2015. [Google Scholar] [CrossRef] [Green Version]
  17. Bouhouche, S.; Mentouri, Z.; Meradi, H.; Yazid, L. Combined Use of Support Vector Regression and Monte Carlo Simulation in Quality and Process Control Calibration. In Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management, Istanbul, Turkey, 3–6 July 2012; pp. 2156–2165. [Google Scholar]
  18. Schlüter, J.; Odenthal, H.J.; Uebber, N.; Blom, H.; Morik, K. A novel data-driven prediction model for BOF endpoint. In Proceedings of the Association for Iron & Steel Technology Conference, Pittsburgh, PA, USA, 6–9 May 2013; pp. 1–6. [Google Scholar]
  19. Schlüter, J.; Uebber, N.; Odenthal, H.J.; Blom, H.; Beckers, T.; Morik, K. Reliable BOF endpoint prediction by novel data-driven modeling. In Proceedings of the Association for Iron & Steel Technology Conference, AISTech 2014 Proceedings, Pittsburgh, PA, USA, 16–18 May 2014; pp. 1159–1165. [Google Scholar]
  20. Duan, J.; Qu, Q.; Gao, C.; Chen, X. BOF steelmaking endpoint prediction based on FWA-TSVR. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 4507–4511. [Google Scholar] [CrossRef]
  21. Gao, C.; Shen, M.; Liu, X.; Wang, L.; Chu, M. End-Point Static Control of Basic Oxygen Furnace (BOF) Steelmaking Based on Wavelet Transform Weighted Twin Support Vector Regression. Complexity 2019, 2019, 7408725. [Google Scholar] [CrossRef] [Green Version]
  22. Li, W.; Wang, X.; Wang, X.; Wang, H. Endpoint Prediction of BOF Steelmaking based on BP Neural Network Combined with Improved PSO. Chem. Eng. Trans. 2016, 51, 475–480. [Google Scholar] [CrossRef]
  23. Cai, B.Y.; Zhao, H.; Yue, Y.J. Research on the BOF steelmaking endpoint temperature prediction. In Proceedings of the 2011 International Conference on Mechatronic Science, Electric Engineering and Computer (MEC), Jilin, China, 19–22 August 2011. [Google Scholar] [CrossRef]
  24. Han, M.; Liu, C. Endpoint prediction model for basic oxygen furnace steel-making based on membrane algorithm evolving extreme learning machine. Appl. Soft Comput. 2014, 19, 430–437. [Google Scholar] [CrossRef]
  25. Park, T.C.; Kim, B.S.; Kim, T.Y.; Jin, I.B.; Yeo, Y.K. Comparative Study of Estimation Methods of the Endpoint Temperature in Basic Oxygen Furnace Steelmaking Process with Selection of Input Parameters. Kor. J. Met. Mater. 2018, 56, 813–821. [Google Scholar] [CrossRef]
  26. Yue, Y.J.; Yao, Y.D.; Zhao, H.; Wang, H.J. BOF Endpoint Prediction Based on Multi-Neural Network Model. Appl. Mech. Mater. 2013, 441, 666–669. [Google Scholar] [CrossRef]
  27. Rajesh, N.; Khare, M.R.; Pabi, S.K. Feed forward neural network for prediction of end blow oxygen in LD converter steel making. Mater. Res. 2010, 13, 15–19. [Google Scholar] [CrossRef] [Green Version]
  28. Fileti, A.F.; Pacianotto, T.; Cunha, A.P. Neural modeling helps the BOS process to achieve aimed end-point conditions in liquid steel. Eng. Appl. Artif. Intell. 2006, 19, 9–17. [Google Scholar] [CrossRef]
  29. Jun, T.; Xin, W.; Tianyou, C.; Shuming, X. Intelligent Control Method and Application for BOF Steelmaking Process. IFAC Proc. Vol. 2002, 35, 439–444. [Google Scholar] [CrossRef] [Green Version]
  30. Huang, X.; Han, M. Greedy Kernel Components Acting on ANFIS to Predict BOF Steelmaking Endpoint. IFAC Proc. Vol. 2008, 41, 11007–11012. [Google Scholar] [CrossRef] [Green Version]
  31. Han, M.; Cao, Z. An improved case-based reasoning method and its application in endpoint prediction of basic oxygen furnace. Neurocomputing 2015, 149, 1245–1252. [Google Scholar] [CrossRef]
  32. Ruuska, J.; Ollila, S.; Leiviskä, K. Temperature Model for LD-KG Converter. IFAC Proc. Vol. 2003, 36, 71–76. [Google Scholar] [CrossRef]
  33. Hu, Y.; Zheng, Z.; Yang, J. Application of Data Mining in BOF Steelmaking Endpoint Control. Adv. Mater. Res. 2011, 402, 96–99. [Google Scholar] [CrossRef]
  34. Sala, D.A.; Jalalvand, A.; Deyne, A.Y.D.; Mannens, E. Multivariate Time Series for Data-Driven Endpoint Prediction in the Basic Oxygen Furnace. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018. [Google Scholar] [CrossRef] [Green Version]
  35. Akundi, A.; Euresti, D.; Luna, S.; Ankobiah, W.; Lopes, A.; Edinbarough, I. State of Industry 5.0—Analysis and Identification of Current Research Trends. Appl. Syst. Innov. 2022, 5, 27. [Google Scholar] [CrossRef]
  36. Wang, X. Made in China 2025: Industrial country from great to powerful. Internet Things Technol. 2015, 5, 3–4. [Google Scholar] [CrossRef]
  37. Ma, H.; Huang, X.; Cui, X.; Wang, P.; Chen, Y.; Hu, Z.; Hua, L. Management Control and Integration Technology of Intelligent Production Line for Multi-Variety and Complex Aerospace Ring Forgings: A Review. Metals 2022, 12, 1079. [Google Scholar] [CrossRef]
  38. Beliatis, M.; Jensen, K.; Ellegaard, L.; Aagaard, A.; Presser, M. Next Generation Industrial IoT Digitalization for Traceability in Metal Manufacturing Industry: A Case Study of Industry 4.0. Electronics 2021, 10, 628. [Google Scholar] [CrossRef]
  39. Grabowska, S.; Saniuk, S.; Gajdzik, B. Industry 5.0: Improving humanization and sustainability of Industry 4.0. Scientometrics 2022, 127, 3117–3144. [Google Scholar] [CrossRef] [PubMed]
  40. Pehlke, R.D. An Overview of Contemporary Steelmaking Processes. JOM 1982, 34, 56–64. [Google Scholar] [CrossRef]
  41. Oeters, F. Metallurgy of Steelmaking; Verlag Stahleisen mbH: Düsseldorf, Germany, 1994; p. 512. [Google Scholar]
  42. Ban, T.E. Basic Oxygen Steel Making Process. U.S. Patent No. 3,301,662, 31 January 1967. [Google Scholar]
  43. Ghosh, A.; Chatterjee, A. Ironmaking and Steelmaking, Theory and Practice; PHI Learning, Private Limited: New Delhi, India, 2008; p. 494. [Google Scholar]
  44. Takemura, Y.; Saito, T.; Fukuda, S.; Kato, K. BOF Dynamic Control Using Sublance System; Nippon Steel Technical Report, No. 11, March 1978. UDC 669. 012.1-52: 669.184. 244. 66: 681. 3; Technical Report 11; Nippon Steel Corporation: Tokyo, Japan, 1978. [Google Scholar]
  45. Krumm, W.; Fett, F.N. Energiemodell eines LD-Stahlwerks. Stahl Und Eisen 1987, 107, 410–416. [Google Scholar]
  46. Takawa, T.; Katayama, K.; Katohgi, K.; Kuribayashi, T. Analysis of Converter Process Variables from Exhaust Gas. Trans. Iron Steel Inst. Jpn. 1988, 28, 59–67. [Google Scholar] [CrossRef]
  47. Friedman, J.H. Multivariate Adaptive Regression Splines. Ann. Stat. 1991, 19, 1–67. [Google Scholar] [CrossRef]
  48. Sephton, P. Forecasting Recessions: Can We Do Better on MARS? Federal Reserve Bank of St. Louis: St. Louis, MO, USA, 2001.
  49. Chugh, M.; Thumsi, S.S.; Keshri, V. A Comparative Study Between Least Square Support Vector Machine(Lssvm) and Multivariate Adaptive Regression Spline(Mars) Methods for the Measurement of Load Storing Capacity of Driven Piles in Cohesion Less Soil. Int. J. Struct. Civ. Eng. Res. 2015, 4, 189–194. [Google Scholar] [CrossRef]
  50. Tselykh, V.R. Multivariate adaptive regression splines. Mach. Learn. Data Anal. 2012, 1, 272–278. [Google Scholar] [CrossRef]
  51. Samui, P.; Kothari, D.P. A Multivariate Adaptive Regression Spline Approach for Prediction of Maximum Shear Modulus and Minimum Damping Ratio. Eng. J. 2012, 16, 69–78. [Google Scholar] [CrossRef] [Green Version]
  52. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning—Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009. [Google Scholar] [CrossRef]
  53. Zhang, W.; Goh, A.T.C. Multivariate adaptive regression splines and neural network models for prediction of pile drivability. Geosci. Front. 2016, 7, 45–52. [Google Scholar] [CrossRef] [Green Version]
  54. Díaz, J.; Fernández, F.J.; Prieto, M.M. Hot Metal Temperature Forecasting at Steel Plant Using Multivariate Adaptive Regression Splines. Metals 2019, 10, 41. [Google Scholar] [CrossRef] [Green Version]
  55. Jekabsons, G. ARESLab: Adaptive Regression Splines Toolbox for Matlab/Octave. 2022. Available online: http://www.cs.rtu.lv/jekabsons/regression.html (accessed on 24 February 2022).
  56. Kačur, J.; Durdán, M.; Laciak, M.; Flegner, P. A Comparative Study of Data-Driven Modeling Methods for Soft-Sensing in Underground Coal Gasification. Acta Polytech. 2019, 59, 322–351. [Google Scholar] [CrossRef] [Green Version]
  57. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual Workshop on Computational Learning Theory—COLT ’92, Pittsburgh, PA, USA, 27–29 July 1992; ACM Press: Pittsburgh, PA, USA, 1992; pp. 144–152. [Google Scholar] [CrossRef]
  58. Vapnik, V.N. Constructing Learning Algorithms. In The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995; pp. 119–166. [Google Scholar] [CrossRef]
  59. Kačur, J.; Laciak, M.; Flegner, P.; Terpák, J.; Durdán, M.; Trefa, G. Application of Support Vector Regression for Data-Driven Modeling of Melt Temperature and Carbon Content in LD Converter. In Proceedings of the 2019 20th International Carpathian Control Conference (ICCC), Krakow, Poland, 26–29 May 2019. [Google Scholar] [CrossRef]
  60. Smola, A.; Schölkopf, B.; Müller, K.R. General cost functions for support vector regression. In Proceedings of the 9th Australian Conference on Neural Networks, Brisbane, Australia, 11–13 February 1999; Downs, T., Frean, M., Gallagher, M., Eds.; University of Queensland: Brisbane, Australia, 1999; pp. 79–83. [Google Scholar]
  61. Burges, C.J.C.; Schölkopf, B. Improving the accuracy and speed of support vector learning machines. In Advances in Neural Information Processing Systems 9; Mozer, M., Jordan, M., Petsche, T., Eds.; MIT Press: Cambridge, MA, USA, 1997; pp. 375–381. [Google Scholar]
  62. Lanckriet, G.; Cristianini, N.; Bartlett, P.; El, G.L.; Jordan, M. Learning the Kernel Matrix with Semidefinite Programming. J. Mach. Learn. Res. 2004, 5, 27–72. [Google Scholar]
  63. MathWorks. Matlab Statistics and Machine Learning Toolbox Release 2016b; MathWorks: Natick, MA, USA, 2016.
  64. MathWorks. Understanding Support Vector Machine Regression. In Statistics and Machine Learning Toolbox User's Guide (R2022a); MathWorks: Natick, MA, USA, 2022. Available online: https://www.mathworks.com/help/stats/understanding-support-vector-machine-regression.html (accessed on 24 February 2022).
  65. Smola, A.J.; Schölkopf, B. On a Kernel-Based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion. Algorithmica 1998, 22, 211–231.
  66. Kvasnička, V.; Beňušková, Ľ.; Pospíchal, J.; Farkaš, I.; Tiňo, P.; Kráľ, A. Úvod do Teórie Neurónových Sietí [Introduction to the Theory of Neural Networks]; IRIS: Bratislava, Slovakia, 1997.
  67. MathWorks. Deep Learning Toolbox; MathWorks: Natick, MA, USA, 2022.
  68. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations; Rumelhart, D.E., McClelland, J.L., PDP Research Group, Eds.; Stanford University: Stanford, CA, USA, 1987.
  69. Sampson, G.; Rumelhart, D.E.; McClelland, J.L.; The PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructures of Cognition. Language 1987, 63, 871.
  70. Fix, E.; Hodges, J.L. Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties; Report, 1951. Available online: https://apps.dtic.mil/sti/pdfs/ADA800276.pdf (accessed on 24 February 2022).
  71. Altman, N.S. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. Am. Stat. 1992, 46, 175–185.
  72. Piryonesi, S.M.; El-Diraby, T.E. Role of Data Analytics in Infrastructure Asset Management: Overcoming Data Size and Quality Problems. J. Transp. Eng. Part B Pavements 2020, 146, 04020022.
  73. Han, M.; Wang, X. BOF Oxygen Control by Mixed Case Retrieve and Reuse CBR. IFAC Proc. Vol. 2011, 44, 3575–3580.
  74. Kramer, O. K-Nearest Neighbors. In Dimensionality Reduction with Unsupervised Nearest Neighbors; Springer: Berlin/Heidelberg, Germany, 2013; pp. 13–23.
  75. Ferreira, D. k-Nearest Neighbors (kNN) Regressor. GitHub, 2020. Available online: https://github.com/ferreirad08/kNNeighborsRegressor/releases/tag/1.0.1 (accessed on 24 February 2022).
  76. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; pp. 278–282.
  77. Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
  78. Piryonesi, S.M.; El-Diraby, T.E. Using Machine Learning to Examine Impact of Type of Performance Indicator on Flexible Pavement Deterioration Modeling. J. Infrastruct. Syst. 2021, 27, 04021005.
  79. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  80. Minitab. Random Forests: Trademark of Health Care Productivity, Inc.; Registration Number 3185828; Serial Number 78642027, 2006. Available online: https://trademarks.justia.com/857/89/randomforests-85789388.html (accessed on 24 February 2022).
  81. Laha, D.; Ren, Y.; Suganthan, P. Modeling of steelmaking process with effective machine learning techniques. Expert Syst. Appl. 2015, 42, 4687–4696.
  82. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: New York, NY, USA, 2013.
  83. Banerjee, S. Generic Example Code and Generic Function for Random Forests. MATLAB Central File Exchange, 2022. Available online: https://www.mathworks.com/matlabcentral/fileexchange/63698-generic-example-code-and-generic-function-for-random-forests (accessed on 24 February 2022).
  84. Gandomi, A.H.; Roke, D.A. Intelligent formulation of structural engineering systems. In Proceedings of the Seventh MIT Conference on Computational Fluid and Solid Mechanics-Focus: Multiphysics and Multiscale, Cambridge, MA, USA, 12–14 June 2013.
  85. Gandomi, A.H.; Roke, D.A. Assessment of artificial neural network and genetic programming as predictive tools. Adv. Eng. Softw. 2015, 88, 63–72.
Figure 1. Model-based simulation of temperature and carbon in melting [4].
Figure 2. Idea scheme of modeling and prediction in steelmaking based on machine learning.
Figure 3. Merging static and dynamic data.
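The pairing shown in Figure 3 attaches the static observations of a melt to every dynamic sample recorded during that melt, so that one training row combines both views. A minimal MATLAB sketch of this join is given below; the variable names (staticObs, dynObs, dynMeltId) are hypothetical placeholders for the plant data summarized in Tables 1 and 2.

```matlab
% Minimal sketch of merging static and dynamic data (cf. Figure 3).
% Hypothetical inputs:
%   staticObs - [nMelts x pStat] static observations, row i = melt i
%   dynObs    - [nSamples x pDyn] dynamic observations during blowing
%   dynMeltId - [nSamples x 1] melt number of each dynamic sample
% Output: joined matrix with one training row per dynamic sample.
function X = mergeStaticDynamic(staticObs, dynObs, dynMeltId)
    % Replicate the static row of the parent melt for every dynamic
    % sample of that melt, then concatenate the two views column-wise.
    X = [staticObs(dynMeltId, :), dynObs];
end
```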
Figure 4. The proposal of a NN for the prediction of process variables in the BOS.
Figure 5. Prediction of endpoint melt temperature by SVR model with the polynomial kernel.
Figure 6. Prediction of endpoint melt temperature by SVR model with the Gaussian kernel.
Figure 7. Prediction of endpoint melt temperature by NN model.
Figure 8. Prediction of endpoint melt temperature by piecewise-linear MARS model.
Figure 9. Prediction of endpoint melt temperature by piecewise-cubic MARS model.
Figure 10. Prediction of endpoint melt temperature by k-NN model.
Figure 11. Prediction of endpoint melt temperature by RF model.
Figure 12. Prediction of endpoint carbon concentration in melt by SVR model with a polynomial kernel.
Figure 13. Prediction of endpoint carbon concentration in melt by SVR model with a Gaussian kernel.
Figure 14. Prediction of endpoint carbon concentration in melt by NN model.
Figure 15. Prediction of endpoint carbon concentration in melt by piecewise-linear MARS model.
Figure 16. Prediction of endpoint carbon concentration in melt by piecewise-cubic MARS model.
Figure 17. Prediction of endpoint carbon concentration in melt by k-NN model.
Figure 18. Prediction of endpoint carbon concentration in melt by RF model.
Figure 19. Dynamic input observations of melting #5.
Figure 20. Prediction of melt temperature from dynamic observations by (a) SVR model with polynomial kernel, (b) SVR model with Gaussian kernel, (c) NN model, (d) piecewise-linear MARS model, (e) piecewise-cubic MARS model, (f) k-NN model, (g) RF model.
Figure 21. Prediction of carbon concentration in the melt from dynamic observations by (a) SVR model with polynomial kernel, (b) SVR model with Gaussian kernel, (c) NN model, (d) k-NN model, (e) RF model.
Table 1. Static observations and targets with measurement ranges.
Observations:
x1 — Steel quality class: 444–959 (-)
x2 — Amount of blown oxygen: 7000–8900 (Nm³)
x3 — Duration of oxygen blowing: 900–1100 (s)
x4 — Pig iron temperature: 1200–1400 (°C)
x5 — Weight of pig iron: 130,000–170,000 (kg)
x6 — Carbon concentration in pig iron: 4.0–4.6 (%)
x7 — Silicon concentration in pig iron: 0.1–1.5 (%)
x8 — Manganese concentration in pig iron: 0.1–0.8 (%)
x9 — Phosphorus concentration in pig iron: 0.04–0.08 (%)
x10 — Sulfur concentration in pig iron: 0.002–0.02 (%)
x11 — Titanium concentration in pig iron: 0.005–0.05 (%)
x12 — Scrap weight added to pig iron: 23,000–55,000 (kg)
x13 — Amount of added magnesite to the melt: 0–2000 (kg)
x14 — Amount of Fe in pig iron: 140,000–170,000 (kg)
x15 — Amount of after-blow oxygen: 0–1000 (Nm³)
Optional observations:
Endpoint melt temperature: 1580–1720 (°C)
Endpoint carbon concentration in melt: 0.02–0.08 (%)
Melting duration: 25–80 (min)
Amount of lime added to the melt: 4500–12,000 (kg)
Amount of dolomitic lime added to the melt: 2100–6000 (kg)
Targets:
Endpoint melt temperature: 1580–1720 (°C)
Endpoint carbon concentration in melt: 0.02–0.08 (%)
Melting duration: 25–80 (min) (optional)
Table 2. Dynamic observations and targets with measurement ranges.
Observations:
x1 — Concentration of CO in waste gas: 0–85 (%)
x2 — Concentration of CO₂ in waste gas: 0–35 (%)
x3 — Temperature of waste gas: 80–1000 (°C)
x4 — Accumulated amount of blown oxygen: 0–8500 (Nm³)
Optional observations:
Concentration of O₂ in waste gas: 0–23 (%)
Concentration of H₂ in waste gas: 0–12 (%)
Volume flow of waste gas: 80,000–100,000 (m³/h)
Targets:
Melt temperature: 1580–1720 (°C)
Carbon concentration in melt: 0.02–0.08 (%)
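For orientation, the sketch below assembles one melt's dynamic observation matrix in the layout of Table 2. All signals are synthetic placeholders; the accumulated amount of blown oxygen x4 is obtained by integrating an assumed oxygen flow over the blowing time.

```matlab
% Sketch of building one melt's dynamic observation matrix (Table 2).
% All raw signals below are synthetic placeholders for illustration.
tSec   = (0:1:1000)';                 % sampling instants (s)
co     = 40 + 10*sin(tSec/200);       % CO in waste gas (%)
co2    = 15 +  5*cos(tSec/300);       % CO2 in waste gas (%)
tGas   = 600 + 0.3*tSec;              % waste-gas temperature (degC)
o2Flow = 28000*ones(size(tSec));      % assumed oxygen flow (Nm^3/h)

% Accumulated blown oxygen: integrate the flow over time (h -> s).
x4   = cumtrapz(tSec, o2Flow)/3600;   % (Nm^3)
Xdyn = [co, co2, tGas, x4];           % dynamic observations x1..x4
```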
Table 3. Comparison of advantages and disadvantages of machine learning methods.

SVR
Advantages: can utilize the predictive power of linear combinations of inputs; works well outside of the training data; a good solution for regression on nonlinear data; not prone to overfitting; robust to noise; low generalization error.
Disadvantages: the structure of the algorithm is difficult to understand; performance depends on the kernel function; requires normalization of the input data.

NN
Advantages: tolerant to noise and missing data; a good solution for regression on nonlinear data; extensive literature; generally good prediction; some tolerance to correlated inputs; incorporates the predictive power of different combinations of inputs.
Disadvantages: the structure of the algorithm is difficult to understand; computationally expensive and prone to overfitting; needs a lot of training data, often much more than standard machine learning algorithms; predictions outside of the training data can be drastically incorrect; unimportant inputs may worsen predictions; requires manual tuning of nodes and layers; computation costs are typically high; depends on the training function; not robust to outliers; susceptible to irrelevant features; difficult to deal with big data using a complex model.

MARS
Advantages: works well with many predictor variables; automatically detects interactions between variables; an efficient and fast algorithm despite its complexity; naturally handles mixed types of predictors (quantitative and qualitative); robust to outliers; models large datasets more flexibly than linear models; the final regression model is portable to various hardware; automatically models nonlinearities and interactions between variables.
Disadvantages: susceptible to overfitting; more difficult to understand and interpret than other methods; not good with missing data; typically slower to train; besides speed, there is also the problem of global versus local optimization; although correlated predictors do not necessarily impede model performance, they can make model interpretation difficult.

k-NN
Advantages: simple and adaptable to the problem; accurate; easy to understand; can use spatial trees to reduce memory requirements; nonparametric; intuitive approach; robust to outliers in the predictors; zero cost in the training process.
Disadvantages: memory intensive; costly, since all training data may be involved in decision making; slow performance due to I/O operations; selection of the optimal number of neighbors can be problematic; choosing the wrong distance measure can produce inaccurate results.

RF
Advantages: not difficult to understand; high accuracy; a good starting point for solving a problem; flexible and fits a variety of data well; fast to execute; easy to use; useful for regression and classification problems; can model missing values; high performing.
Disadvantages: slow in training; prone to overfitting; not suitable for small samples; a small change in training data changes the model; occasionally too simple for a very complex problem.
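Because the study relied on MATLAB toolboxes [63,67], the comparison workflow for the methods of Table 3 can be sketched as follows. Xtrain, ytrain, Xtest and all hyperparameter values are placeholders, not the tuned settings used in the study; MARS is omitted because it is not part of the standard toolboxes (a third-party implementation such as ARESLab would be needed).

```matlab
% Rough sketch of fitting the compared regressors in MATLAB, assuming
% Xtrain/ytrain and Xtest are already prepared; hyperparameters are
% placeholders for illustration only.
svrPoly = fitrsvm(Xtrain, ytrain, 'KernelFunction', 'polynomial', ...
                  'Standardize', true);            % SVR, polynomial kernel
svrRbf  = fitrsvm(Xtrain, ytrain, 'KernelFunction', 'gaussian', ...
                  'Standardize', true);            % SVR, Gaussian kernel
rf      = TreeBagger(100, Xtrain, ytrain, 'Method', 'regression'); % RF

net = feedforwardnet(10);              % one hidden layer, 10 neurons
net = train(net, Xtrain', ytrain');    % backpropagation training
yNN = net(Xtest')';                    % NN predictions

% k-NN regression: average the targets of the k nearest training rows.
k    = 5;                              % placeholder neighbor count
idx  = knnsearch(Xtrain, Xtest, 'K', k);
yKnn = mean(ytrain(idx), 2);

ySvr = predict(svrPoly, Xtest);        % SVR predictions
yRf  = predict(rf, Xtest);             % RF predictions
```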
Table 4. Performance of the machine learning methods in temperature prediction based on static data.
Columns: Method | Training: Time (s), r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI || Testing: r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI
SVR (polynomial kernel) | 0.1006, 0.7384, 0.5452, 244.6054, 15.6399, 0.9432, 0.7008, 0.5426 || 0.5884, 0.3462, 310.0572, 17.6084, 1.0615, 0.8277, 0.6683
SVR (Gaussian kernel) | 0.1075, 0.7985, 0.6376, 200.1204, 14.1464, 0.8532, 0.6167, 0.4744 || 0.6439, 0.4146, 275.6026, 16.6013, 1.0008, 0.7894, 0.6088
NN | 0.4690, 0.4890, 0.2391, 397.3077, 19.9326, 1.2021, 0.9460, 0.8074 || 0.0904, 0.0082, 537.4529, 23.1830, 1.3976, 1.1224, 1.2817
MARS (piecewise-linear) | 8.1688, 0.7324, 0.5364, 241.7318, 15.5477, 0.9377, 0.7463, 0.5413 || 0.7007, 0.4910, 245.1000, 15.6557, 0.9438, 0.7297, 0.5549
MARS (piecewise-cubic) | 8.1951, 0.7053, 0.4974, 262.0617, 16.1883, 0.9763, 0.7747, 0.5725 || 0.7518, 0.5652, 209.0871, 14.4598, 0.8717, 0.6891, 0.4976
k-NN | 0.0010, 0.5584, 0.3118, 361.3029, 19.0080, 1.1464, 0.9070, 0.7356 || 0.3890, 0.1513, 417.6730, 20.4370, 1.2320, 0.9927, 0.8870
RF | 64.5802, 0.9368, 0.8776, 113.0654, 10.6332, 0.6413, 0.4946, 0.3311 || 0.6178, 0.3817, 304.9878, 17.4639, 1.0528, 0.8388, 0.6508
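The metric columns in Tables 4–20 can be reproduced with the definitions sketched below. They are stated as assumptions consistent with the tabulated values: RRMSE normalizes the RMSE by the mean of the measured values, and PI = RRMSE/(1 + r_yY) is the performance index of Gandomi and Roke [84,85].

```matlab
% Sketch of the evaluation metrics used in Tables 4-20; the definitions
% are assumptions consistent with the tabulated values (e.g., the PI
% column equals RRMSE/(1 + r_yY) throughout).
function s = evalMetrics(y, yHat)      % y: measured, yHat: predicted
    r       = corr(y(:), yHat(:));     % correlation coefficient r_yY
    s.r     = r;
    s.r2    = r^2;                     % squared correlation r^2_yY
    s.MSE   = mean((y - yHat).^2);
    s.RMSE  = sqrt(s.MSE);
    s.RRMSE = 100*s.RMSE/mean(y);      % relative RMSE (%)
    s.MAPE  = 100*mean(abs((y - yHat)./y));  % mean abs. percentage error
    s.PI    = s.RRMSE/(1 + r);         % performance index [84,85]
end
```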
Table 5. Absolute and relative errors on selected meltings in temperature prediction based on static data.
Endpoint relative error in testing (%), meltings #1–#10 | average:
SVR (polynomial kernel) | 2.05, 0.58, 1.34, 0.77, 0.33, 0.64, 0.44, 0.93, 0.46, 0.15 | 0.77
SVR (Gaussian kernel) | 1.64, 0.46, 1.45, 0.44, 0.49, 0.48, 0.54, 0.95, 0.36, 0.44 | 0.72
NN | 1.64, 0.05, 0.34, 1.77, 0.88, 0.67, 0.47, 1.55, 0.47, 0.25 | 0.81
MARS (piecewise-linear) | 1.54, 0.42, 1.57, 0.49, 0.31, 0.52, 0.56, 0.72, 0.39, 0.38 | 0.69
MARS (piecewise-cubic) | 1.93, 0.22, 1.57, 0.66, 0.08, 0.56, 0.42, 1.18, 0.23, 0.07 | 0.69
k-NN | 2.36, 0.08, 1.69, 1.14, 0.27, 0.52, 0.47, 1.25, 0.06, 0.01 | 0.78
RF | 1.85, 0.83, 1.13, 0.96, 0.53, 0.25, 0.61, 0.76, 0.44, 0.23 | 0.76
Endpoint absolute error in testing (°C), meltings #1–#10 | average:
SVR (polynomial kernel) | 32.66, 9.74, 21.83, 13.00, 5.53, 10.72, 7.32, 15.30, 7.58, 2.44 | 12.61
SVR (Gaussian kernel) | 26.14, 7.72, 23.70, 7.41, 8.16, 8.13, 8.95, 15.53, 5.98, 7.26 | 11.90
NN | 26.15, 0.87, 5.57, 29.88, 14.60, 11.28, 7.75, 25.41, 7.85, 4.08 | 13.34
MARS (piecewise-linear) | 24.56, 7.09, 25.66, 8.34, 5.14, 8.83, 9.34, 11.79, 6.51, 6.32 | 11.36
MARS (piecewise-cubic) | 30.81, 3.71, 25.63, 11.20, 1.28, 9.50, 6.96, 19.28, 3.79, 1.23 | 11.34
k-NN | 37.66, 1.26, 27.58, 19.18, 4.48, 8.82, 7.84, 20.44, 0.98, 0.14 | 12.84
RF | 29.58, 13.81, 18.41, 16.26, 8.81, 4.20, 10.06, 12.39, 7.32, 3.86 | 12.47
Table 6. Performance of the machine learning methods in carbon prediction based on static data.
Columns: Method | Training: Time (s), r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI || Testing: r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI
SVR (polynomial kernel) | 2.7453, 0.9377, 0.8793, 0.0000, 0.0040, 9.0594, 323.4241, 4.6753 || 0.2722, 0.0741, 0.0004, 0.0190, 39.1046, 27.9480, 30.7378
SVR (Gaussian kernel) | 0.1034, 0.7063, 0.4989, 0.0001, 0.0085, 19.2427, 900.9556, 11.2772 || 0.3970, 0.1576, 0.0002, 0.0125, 25.7343, 18.1771, 18.3893
NN | 0.4909, 0.1056, 0.0112, 0.0003, 0.0166, 37.4686, 1623.7490, 33.8898 || 0.1623, 0.0263, 0.0004, 0.0196, 40.2912, 28.7549, 34.6651
MARS (piecewise-linear) | 9.4032, 0.6296, 0.3964, 0.0001, 0.0090, 20.1894, 634.5756, 12.3893 || 0.3674, 0.1350, 0.0002, 0.0123, 25.4229, 17.7828, 18.5920
MARS (piecewise-cubic) | 10.4514, 0.6038, 0.3646, 0.0001, 0.0092, 20.7141, 705.0538, 12.9155 || 0.3994, 0.1595, 0.0001, 0.0120, 24.8100, 17.4491, 17.7590
k-NN | 0.0016, 0.5466, 0.2988, 0.0001, 0.0097, 21.8553, 1022.7833, 14.1312 || 0.2364, 0.0559, 0.0002, 0.0131, 27.0069, 19.7905, 21.8427
RF | 68.7665, 0.9036, 0.8165, 0.0000, 0.0065, 14.7035, 857.1016, 7.7241 || 0.3368, 0.1134, 0.0002, 0.0125, 25.8377, 18.5165, 19.3283
Table 7. Absolute and relative errors on selected meltings in carbon prediction based on static data.
Endpoint relative error in testing (%), meltings #1–#10 | average:
SVR (polynomial kernel) | 15.12, 29.43, 11.23, 31.32, 17.05, 4.40, 17.12, 4.83, 11.91, 8.88 | 15.13
SVR (Gaussian kernel) | 14.39, 13.41, 16.76, 17.25, 9.81, 6.27, 1.10, 16.21, 12.33, 1.70 | 10.92
NN | 22.52, 5.19, 0.30, 44.00, 33.92, 9.41, 1.63, 30.32, 20.53, 28.89 | 19.67
MARS (piecewise-linear) | 0.07, 18.78, 13.80, 15.32, 14.28, 12.88, 2.07, 17.32, 12.86, 5.56 | 11.29
MARS (piecewise-cubic) | 2.69, 16.99, 15.70, 3.58, 3.61, 15.67, 1.52, 14.32, 9.79, 11.55 | 9.54
k-NN | 6.36, 8.70, 19.88, 16.53, 7.45, 18.44, 8.52, 18.50, 16.42, 2.97 | 12.38
RF | 5.42, 18.74, 15.00, 18.64, 8.39, 10.25, 3.01, 19.77, 7.02, 12.70 | 11.89
Endpoint absolute error in testing (vol.%), meltings #1–#10 | average:
SVR (polynomial kernel) | 0.0067, 0.0109, 0.0036, 0.0094, 0.0080, 0.0014, 0.0079, 0.0014, 0.0057, 0.0029 | 0.0058
SVR (Gaussian kernel) | 0.0063, 0.0050, 0.0054, 0.0052, 0.0046, 0.0020, 0.0005, 0.0045, 0.0059, 0.0006 | 0.0040
NN | 0.0099, 0.0019, 0.0001, 0.0132, 0.0159, 0.0030, 0.0008, 0.0085, 0.0099, 0.0095 | 0.0073
MARS (piecewise-linear) | 0.0000, 0.0069, 0.0044, 0.0046, 0.0067, 0.0041, 0.0010, 0.0049, 0.0062, 0.0018 | 0.0041
MARS (piecewise-cubic) | 0.0012, 0.0063, 0.0050, 0.0011, 0.0017, 0.0050, 0.0007, 0.0040, 0.0047, 0.0038 | 0.0034
k-NN | 0.0028, 0.0032, 0.0064, 0.0050, 0.0035, 0.0059, 0.0039, 0.0052, 0.0079, 0.0010 | 0.0045
RF | 0.0024, 0.0069, 0.0048, 0.0056, 0.0039, 0.0033, 0.0014, 0.0055, 0.0034, 0.0042 | 0.0041
Table 8. Basis functions of the piecewise-cubic MARS model of melt temperature.
BF1 = C(x15 | −1, 367.76, 735.52, 984.31)
BF2 = C(x10 | −1, 0.003, 0.006, 0.052)
BF3 = C(x2 | +1, 5294.8, 7856.6, 8392.6)
BF4 = C(x2 | −1, 5294.8, 7856.6, 8392.6)
BF5 = C(x12 | +1, 18900, 37800, 46450)
BF6 = C(x12 | −1, 18900, 37800, 46450)
BF7 = C(x4 | +1, 1250.3, 1301.5, 1331.5)
BF8 = C(x4 | −1, 1250.3, 1301.5, 1331.5)
BF9 = C(x7 | −1, 0.5185, 0.679, 1.8595)
BF10 = C(x16 | −1, 0.044, 0.054, 0.093)
BF11 = BF3 × C(x1 | −1, 285, 561, 627.5)
BF12 = BF5 × C(x5 | +1, 83650, 147100, 157750)
BF13 = BF5 × C(x5 | −1, 83650, 147100, 157750)
BF14 = BF1 × C(x13 | −1, 597, 1194, 2171.5)
BF15 = BF3 × C(x8 | +1, 0.2045, 0.306, 0.3185)
BF16 = BF10 × C(x1 | −1, 627.5, 694, 826.5)
BF17 = BF1 × C(x16 | +1, 0.017, 0.034, 0.044)
BF18 = BF2 × C(x7 | −1, 0.19371, 0.358, 0.5185)
BF19 = C(x10 | −1, 0.003, 0.006, 0.052) × C(x7 | +1, 0.19371, 0.358, 0.5185) × C(x4 | +1, 1331.5, 1361.6, 1453.3)
BF20 = C(x10 | −1, 0.003, 0.006, 0.052) × C(x7 | +1, 0.19371, 0.358, 0.5185) × C(x4 | −1, 1331.5, 1361.6, 1453.3)
BF21 = C(x15 | −1, 367.76, 735.52, 984.31) × C(x3 | +1, 662.5, 959, 1085) × C(x8 | +1, 0.3185, 0.331, 0.87139)
BF22 = C(x15 | −1, 367.76, 735.52, 984.31) × C(x3 | +1, 662.5, 959, 1085) × C(x8 | −1, 0.3185, 0.331, 0.87139)
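Each entry C(x | s, t−, t, t+) in Tables 8, 9 and 12 denotes a truncated cubic basis function: zero on one side of the knot region, linear (x − t) on the other, with a cubic blend between the side knots t− and t+ that keeps the value and slope continuous. The MATLAB sketch below is one standard construction in the spirit of Friedman's MARS; the blend coefficients follow from the continuity conditions and are not printed in the paper, so treat this as an illustrative assumption.

```matlab
% Truncated cubic basis C(x | s, t-, t, t+) appearing in Tables 8, 9
% and 12: zero on one side, linear (x - t) on the other, with a cubic
% blend between the side knots. Sketch in the spirit of Friedman's
% MARS; coefficients p and q are derived from requiring continuous
% value and slope at both side knots.
function c = cubicBasis(x, s, tm, t, tp)
    if s < 0                                 % mirrored case C(x | -1, ...)
        c = cubicBasis(-x, +1, -tp, -t, -tm);
        return
    end
    p = (2*tp + tm - 3*t)/(tp - tm)^2;       % value match at t+
    q = (2*t - tp - tm)/(tp - tm)^3;         % slope match at t+
    c = zeros(size(x));
    mid    = (x > tm) & (x < tp);
    c(mid) = p*(x(mid) - tm).^2 + q*(x(mid) - tm).^3;
    c(x >= tp) = x(x >= tp) - t;             % linear part beyond t+
end
```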
Table 9. Basis functions of the piecewise-cubic MARS model of carbon concentration in melt.
BF1 = C(x16 | +1, 1632, 1687, 1705)
BF2 = C(x16 | −1, 1632, 1687, 1705)
BF3 = C(x8 | +1, 0.34472, 0.58643, 0.99911)
BF4 = C(x8 | −1, 0.34472, 0.58643, 0.99911)
BF5 = C(x4 | +1, 1266.5, 1333.9, 1439.5)
BF6 = C(x5 | +1, 83100, 146000, 157200)
BF7 = C(x11 | −1, 0.0145, 0.028, 0.046)
BF8 = C(x3 | −1, 644, 922, 927)
BF9 = BF6 × C(x2 | +1, 5618, 8503, 8715.8)
BF10 = C(x5 | +1, 83100, 146000, 157200) × C(x2 | −1, 5618, 8503, 8715.8) × C(x12 | −1, 19400, 38800, 44550)
BF11 = BF5 × C(x1 | +1, 442, 875, 917)
BF12 = BF5 × C(x1 | −1, 442, 875, 917)
BF13 = C(x12 | +1, 44550, 50300, 52700)
BF14 = C(x12 | −1, 44550, 50300, 52700)
BF15 = C(x3 | +1, 644, 922, 927) × C(x7 | +1, 0.63971, 1.25, 2.145)
BF16 = C(x3 | +1, 644, 922, 927) × C(x7 | −1, 0.63971, 1.25, 2.145)
BF17 = BF2 × C(x13 | +1, 1010, 1085, 2117)
BF18 = BF2 × C(x13 | −1, 1010, 1085, 2117)
BF19 = C(x16 | −1, 1632, 1687, 1705) × C(x15 | −1, 577.03, 647.5, 940.3) × C(x13 | +1, 467.5, 935, 1010)
BF20 = C(x16 | −1, 1632, 1687, 1705) × C(x15 | −1, 577.03, 647.5, 940.3) × C(x13 | −1, 467.5, 935, 1010)
BF21 = BF18 × C(x15 | +1, 253.28, 506.55, 577.03)
BF22 = BF18 × C(x15 | −1, 253.28, 506.55, 577.03)
BF23 = C(x4 | −1, 1266.5, 1333.9, 1439.5) × C(x6 | −1, 2.5825, 4.605, 4.63)
BF24 = C(x3 | +1, 927, 932, 1071.5)
BF25 = C(x3 | −1, 927, 932, 1071.5)
Table 10. Performance of the machine learning methods in melt temperature prediction based on static data (lime and dolomitic lime added as supplementary observations).
Columns: Method | Training: Time (s), r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI || Testing: r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI
SVR (polynomial kernel) | 0.1926, 0.7636, 0.5831, 224.9944, 14.9998, 0.9046, 0.6646, 0.5130 || 0.6057, 0.3669, 306.4681, 17.5062, 1.0553, 0.8391, 0.6572
SVR (Gaussian kernel) | 0.1046, 0.8145, 0.6634, 186.7076, 13.6641, 0.8241, 0.5917, 0.4542 || 0.6556, 0.4298, 273.3737, 16.5340, 0.9967, 0.7807, 0.6020
NN | 0.2750, 0.1519, 0.0231, 1445.2063, 38.0159, 2.2927, 1.7939, 1.9904 || 0.1081, 0.0117, 1453.5004, 38.1248, 2.2983, 1.8851, 2.0741
MARS (piecewise-linear) | 10.8102, 0.7484, 0.5601, 229.3880, 15.1456, 0.9134, 0.7204, 0.5224 || 0.6653, 0.4426, 296.5315, 17.2201, 1.0381, 0.8216, 0.6234
MARS (piecewise-cubic) | 10.8143, 0.7071, 0.5000, 260.6863, 16.1458, 0.9737, 0.7684, 0.5704 || 0.6714, 0.4508, 270.3367, 16.4419, 0.9912, 0.7737, 0.5930
k-NN | 0.0012, 0.6098, 0.3719, 331.4534, 18.2059, 1.0980, 0.8646, 0.6821 || 0.1137, 0.0129, 537.7687, 23.1898, 1.3980, 1.1326, 1.2552
RF | 67.6480, 0.9414, 0.8862, 105.0343, 10.2486, 0.6181, 0.4757, 0.3184 || 0.6119, 0.3744, 306.8707, 17.5177, 1.0560, 0.8610, 0.6551
Table 11. Performance of the machine learning methods in carbon prediction based on static data (lime and dolomitic lime added as supplementary observations).
Columns: Method | Training: Time (s), r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI || Testing: r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI
SVR (polynomial kernel) | 2.1406, 0.9710, 0.9428, 0.0000, 0.0028, 6.2736, 169.6273, 3.1830 || 0.0861, 0.0074, 0.0040, 0.0630, 129.6704, 47.7670, 119.3862
SVR (Gaussian kernel) | 0.1017, 0.7426, 0.5515, 0.0001, 0.0081, 18.3250, 899.9417, 10.5157 || 0.3803, 0.1446, 0.0002, 0.0126, 25.9159, 18.0279, 18.3772
NN | 0.6165, 0.1047, 0.0110, 0.0004, 0.0200, 44.9654, 1800.2448, 40.7037 || 0.2020, 0.0408, 0.0003, 0.0165, 33.9825, 25.1977, 28.2707
MARS (piecewise-linear) | 13.3011, 0.6304, 0.3974, 0.0001, 0.0090, 20.1725, 725.0324, 12.3728 || 0.4102, 0.1683, 0.0001, 0.0122, 25.0724, 17.3692, 18.1648
MARS (piecewise-cubic) | 12.7149, 0.5945, 0.3534, 0.0001, 0.0093, 20.8945, 1028.9877, 13.1037 || 0.2757, 0.0760, 0.0002, 0.0128, 26.3574, 18.1886, 20.6616
k-NN | 0.0010, 0.5569, 0.3101, 0.0001, 0.0096, 21.6641, 1022.7053, 13.9148 || 0.1112, 0.0124, 0.0002, 0.0137, 28.1985, 21.6507, 25.3777
RF | 71.9077, 0.9087, 0.8257, 0.0000, 0.0064, 14.4773, 849.7621, 7.5849 || 0.3266, 0.1067, 0.0002, 0.0126, 25.8663, 18.6019, 19.4980
Table 12. Basis functions of the piecewise-cubic MARS model of melt temperature (lime and dolomitic lime added as supplementary observations).
BF1 = C(x17 | −1, 367.76, 735.52, 984.31)
BF2 = C(x10 | −1, 0.0045, 0.006, 0.052)
BF3 = C(x2 | −1, 5294.8, 7856.6, 8179.8)
BF4 = C(x12 | +1, 18900, 37800, 46450)
BF5 = C(x12 | −1, 18900, 37800, 46450)
BF6 = C(x4 | +1, 1250.3, 1301.5, 1313.7)
BF7 = C(x4 | −1, 1250.3, 1301.5, 1313.7)
BF8 = C(x7 | −1, 0.35421, 0.679, 1.8595)
BF9 = C(x18 | −1, 0.027, 0.054, 0.093)
BF10 = BF2 × C(x13 | −1, 2760.5, 5521, 7162)
BF11 = C(x2 | +1, 5294.8, 7856.6, 8179.8) × C(x13 | +1, 7162, 8803, 10922)
BF12 = BF4 × C(x5 | +1, 83650, 147100, 157750)
BF13 = BF4 × C(x5 | −1, 83650, 147100, 157750)
BF14 = C(x2 | +1, 5294.8, 7856.6, 8179.8) × C(x13 | −1, 7162, 8803, 10922) × C(x15 | −1, 362.5, 725, 1937)
BF15 = BF1 × C(x3 | +1, 683, 1000, 1105.5)
BF16 = C(x10 | −1, 0.0045, 0.006, 0.052) × C(x13 | +1, 2760.5, 5521, 7162) × C(x1 | −1, 69.5, 130, 345.5)
BF17 = BF10 × C(x1 | −1, 345.5, 561, 760)
BF18 = C(x2 | +1, 5294.8, 7856.6, 8179.8) × C(x8 | −1, 0.221, 0.339, 0.87539)
BF19 = BF4 × C(x10 | −1, 0.0015, 0.003, 0.0045)
BF20 = BF14 × C(x4 | −1, 1313.7, 1325.9, 1343.3)
BF21 = BF8 × C(x2 | +1, 8179.8, 8503, 8715.8)
BF22 = C(x2 | +1, 5294.8, 7856.6, 8179.8) × C(x4 | +1, 1343.3, 1360.7, 1452.9)
BF23 = C(x2 | +1, 5294.8, 7856.6, 8179.8) × C(x4 | −1, 1343.3, 1360.7, 1452.9)
Table 13. Basis functions of the piecewise-linear MARS model of carbon content (lime and dolomitic lime added as supplementary observations).
BF1 = max(0, 1687 − x18)
BF2 = max(0, x8 − 0.58643)
BF3 = max(0, 0.58643 − x8)
BF4 = max(0, x4 − 1333.9)
BF5 = max(0, x5 − 146000)
BF6 = max(0, 146000 − x5)
BF7 = max(0, 0.028 − x11)
BF8 = max(0, 922 − x3)
BF9 = max(0, x13 − 9722)
BF10 = max(0, 9722 − x13)
BF11 = max(0, x12 − 50300)
BF12 = max(0, 50300 − x12)
BF13 = BF5 × max(0, x2 − 8503)
BF14 = max(0, 1333.9 − x4) × max(0, x15) × max(0, 8151.8 − x2)
BF15 = BF14 × max(0, x3 − 971)
BF16 = BF1 × max(0, x17 − 647.5)
BF17 = BF4 × max(0, x1 − 875)
BF18 = BF4 × max(0, 875 − x1)
BF19 = max(0, 1333.9 − x4) × max(0, x2 − 8044.3)
BF20 = max(0, 1333.9 − x4) × max(0, 8044.3 − x2)
BF21 = max(0, x3 − 922) × max(0, x17 − 406.53)
BF22 = max(0, x3 − 922) × max(0, 406.53 − x17)
BF23 = BF19 × max(0, x5 − 150100)
BF24 = BF19 × max(0, 150100 − x5)
BF25 = max(0, 1333.9 − x4) × max(0, x15) × max(0, 8151.8 − x2) × max(0, 971 − x3) × max(0, 596 − x1)
Table 14. Performance of the machine learning methods in temperature prediction based on dynamic data.
Columns: Method | Training: Time (s), r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI
SVR (polynomial kernel) | 1.0183, 0.9842, 0.9686, 969.8928, 31.1431, 2.0959, 1.5849, 1.0563
SVR (Gaussian kernel) | 0.2528, 0.9883, 0.9767, 723.7140, 26.9019, 1.8104, 1.3153, 0.9105
NN | 1.6471, 0.9757, 0.9520, 1491.0892, 38.6146, 2.5987, 1.6227, 1.6227
MARS (piecewise-linear) | 2.4166, 0.9850, 0.9702, 919.6555, 30.3258, 2.0409, 1.4773, 1.0281
MARS (piecewise-cubic) | 2.7178, 0.9837, 0.9677, 1002.3017, 31.6591, 2.1306, 1.6113, 1.0741
k-NN | 0.0015, 0.9877, 0.9756, 754.7641, 27.4730, 1.8489, 1.4295, 0.9301
RF | 59.0626, 0.9918, 0.9837, 506.5556, 22.5068, 1.5147, 1.1175, 0.7604
Table 15. Absolute and relative errors in temperature prediction based on dynamic data.
Endpoint relative error in testing (%), meltings #1–#10 | average:
SVR (polynomial kernel) | 0.07, 1.15, 4.39, 0.29, 0.01, 1.32, 1.08, 0.89, 1.38, 1.68 | 1.23
SVR (Gaussian kernel) | 1.03, 0.50, 7.41, 6.84, 1.45, 0.39, 0.52, 4.45, 4.88, 9.37 | 3.68
NN | 2.40, 2.49, 1.66, 2.49, 5.97, 0.43, 1.35, 7.98, 3.28, 3.87 | 3.19
MARS (piecewise-linear) | 0.15, 0.59, 1.27, 1.68, 0.29, 0.78, 0.03, 0.85, 0.68, 1.00 | 0.73
MARS (piecewise-cubic) | 0.07, 0.65, 1.28, 1.81, 0.41, 0.89, 0.07, 0.88, 0.76, 0.93 | 0.77
k-NN | 1.29, 1.60, 2.26, 2.77, 1.47, 1.11, 0.17, 0.16, 0.11, 1.04 | 1.20
RF | 0.35, 1.11, 3.13, 0.35, 1.05, 0.58, 1.40, 1.35, 1.53, 2.46 | 1.33
Endpoint absolute error in testing (°C), meltings #1–#10 | average:
SVR (polynomial kernel) | 1.10, 19.03, 74.79, 4.72, 0.21, 21.67, 17.92, 14.91, 22.94, 28.37 | 20.56
SVR (Gaussian kernel) | 16.97, 8.25, 126.18, 112.69, 23.99, 6.33, 8.63, 74.27, 81.28, 158.18 | 61.68
NN | 39.69, 41.00, 28.23, 41.00, 98.81, 7.08, 22.42, 133.07, 54.53, 65.35 | 53.12
MARS (piecewise-linear) | 2.47, 9.69, 21.55, 27.73, 4.78, 12.86, 0.55, 14.20, 11.24, 16.88 | 12.19
MARS (piecewise-cubic) | 1.12, 10.67, 21.79, 29.75, 6.73, 14.66, 1.11, 14.72, 12.63, 15.69 | 12.89
k-NN | 21.40, 26.40, 38.40, 45.60, 24.40, 18.20, 2.80, 2.60, 1.80, 17.60 | 19.92
RF | 5.80, 18.36, 53.33, 5.81, 17.43, 9.47, 23.16, 22.60, 25.53, 41.60 | 22.31
Table 16. Performance of the machine learning methods in carbon prediction based on dynamic data.
Columns: Method | Training: Time (s), r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI
SVR (polynomial kernel) | 3.1040, 0.9948, 0.9896, 0.0702, 0.2650, 12.0035, 864.6450, 6.0174
SVR (Gaussian kernel) | 0.2451, 0.9933, 0.9866, 0.0744, 0.2727, 12.3531, 1350.4329, 6.1973
NN | 0.8467, 0.9876, 0.9754, 0.1164, 0.3412, 15.4534, 2542.8869, 7.7748
MARS (piecewise-linear) | Failed
MARS (piecewise-cubic) | Failed
k-NN | 0.0018, 0.9955, 0.9910, 0.0489, 0.2211, 10.0151, 595.5354, 5.0188
RF | 51.5903, 0.9960, 0.9920, 0.0376, 0.1938, 8.7776, 15.9674, 4.3976
Table 17. Absolute and relative errors in carbon prediction based on dynamic data.
Endpoint relative error in testing (%), meltings #1–#10 | average:
SVR (polynomial kernel) | 116.00, 20.32, 250.43, 142.38, 386.25, 147.20, 134.69, 284.12, 97.67, 329.35 | 190.84
SVR (Gaussian kernel) | 193.88, 168.60, 2279.79, 2899.48, 1521.79, 1.31, 24.51, 2225.60, 1596.32, 2853.40 | 1376.47
NN | 252.69, 393.18, 84.70, 1742.12, 450.99, 28.59, 197.22, 1527.45, 531.81, 2196.87 | 740.56
MARS (piecewise-linear) | Failed
MARS (piecewise-cubic) | Failed
k-NN | 30.18, 26.92, 16.11, 12.68, 15.91, 1.58, 1.63, 14.29, 3.26, 3.64 | 12.62
RF | 16.29, 334.95, 352.63, 335.99, 364.71, 459.68, 135.39, 317.87, 354.29, 435.79 | 310.76
Endpoint absolute error in testing (vol.%), meltings #1–#10 | average:
SVR (polynomial kernel) | 0.0661, 0.0106, 0.0902, 0.0584, 0.1699, 0.0559, 0.0660, 0.1193, 0.0420, 0.1449 | 0.0823
SVR (Gaussian kernel) | 0.1105, 0.0877, 0.8207, 1.1888, 0.6696, 0.0005, 0.0120, 0.9347, 0.6864, 1.2555 | 0.5766
NN | 0.1440, 0.2045, 0.0305, 0.7143, 0.1984, 0.0109, 0.0966, 0.6415, 0.2287, 0.9666 | 0.3236
MARS (piecewise-linear) | Failed
MARS (piecewise-cubic) | Failed
k-NN | 0.0172, 0.0140, 0.0058, 0.0052, 0.0070, 0.0006, 0.0008, 0.0060, 0.0014, 0.0016 | 0.0060
RF | 0.0093, 0.1742, 0.1269, 0.1378, 0.1605, 0.1747, 0.0663, 0.1335, 0.1523, 0.1917 | 0.1327
Table 18. Basis functions of the piecewise-linear MARS model of melt temperature based on dynamic data.
BF1 = max(0, x4 − 7662.7)
BF2 = max(0, 7662.7 − x4)
BF3 = BF2 × max(0, 0.11574 − x1)
BF4 = max(0, 7662.7 − x4) × max(0, x1 − 0.11574) × max(0, x2 − 20.747)
BF5 = max(0, 7662.7 − x4) × max(0, x1 − 0.11574) × max(0, 20.747 − x2)
BF6 = BF5 × max(0, x3 − 717.4)
BF7 = max(0, 5402.7 − x4) × max(0, x1 − 0.54977)
BF8 = max(0, 5402.7 − x4) × max(0, 0.54977 − x1)
BF9 = max(0, x2 − 12.992)
BF10 = max(0, 12.992 − x2)
BF11 = BF7 × max(0, x2 − 30.642)
BF12 = BF7 × max(0, 30.642 − x2)
BF13 = max(0, 7662.7 − x4) × max(0, x1 − 0.11574) × max(0, x2 − 30.642)
BF14 = BF9 × max(0, x1 − 51.765)
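Evaluating the Table 18 model on a dynamic sample reduces to computing the hinge terms and summing them with the fitted coefficients. The sketch below illustrates this; the input values and the intercept/coefficients a0 and a are hypothetical placeholders, since the fitted coefficients are not reproduced in the table.

```matlab
% Sketch of evaluating the piecewise-linear MARS temperature model of
% Table 18 on one dynamic sample. All inputs and coefficients below
% are hypothetical placeholders for illustration only.
x1 = 45; x2 = 18; x3 = 800; x4 = 6000;  % CO (%), CO2 (%), gas T (degC), O2 (Nm^3)
h  = @(z) max(0, z);                    % hinge function

bf     = zeros(14, 1);
bf(1)  = h(x4 - 7662.7);
bf(2)  = h(7662.7 - x4);
bf(3)  = bf(2)*h(0.11574 - x1);
bf(4)  = h(7662.7 - x4)*h(x1 - 0.11574)*h(x2 - 20.747);
bf(5)  = h(7662.7 - x4)*h(x1 - 0.11574)*h(20.747 - x2);
bf(6)  = bf(5)*h(x3 - 717.4);
bf(7)  = h(5402.7 - x4)*h(x1 - 0.54977);
bf(8)  = h(5402.7 - x4)*h(0.54977 - x1);
bf(9)  = h(x2 - 12.992);
bf(10) = h(12.992 - x2);
bf(11) = bf(7)*h(x2 - 30.642);
bf(12) = bf(7)*h(30.642 - x2);
bf(13) = h(7662.7 - x4)*h(x1 - 0.11574)*h(x2 - 30.642);
bf(14) = bf(9)*h(x1 - 51.765);

a0 = 1650; a = 0.01*ones(1, 14);        % hypothetical coefficients
T  = a0 + a*bf;                         % predicted melt temperature (degC)
```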
Table 19. Performance of the machine learning methods in temperature prediction based on dynamic data (concentrations of O₂ and H₂ added as supplementary observations).
Columns: Method | Added observation(s) | Training: Time (s), r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI || Testing: average relative error at endpoint (%), average absolute error at endpoint (°C)
SVR (polynomial kernel) | O₂ (%) | 1.4019, 0.9847, 0.9696, 940.7080, 30.6710, 2.0641, 1.5918, 1.0400 || 1.64, 27.39
SVR (polynomial kernel) | H₂ (%) | 0.0510, 0.9829, 0.9661, 1049.0229, 32.3886, 2.1797, 1.6523, 1.0992 || 1.60, 26.85
SVR (polynomial kernel) | O₂, H₂ (%) | 0.1439, 0.9838, 0.9679, 994.9295, 31.5425, 2.1227, 1.6045, 1.0700 || 2.14, 35.80
SVR (Gaussian kernel) | O₂ (%) | 0.0333, 0.9890, 0.9781, 679.3708, 26.0647, 1.7541, 1.4389, 0.8819 || 4.78, 79.97
SVR (Gaussian kernel) | H₂ (%) | 0.0282, 0.9867, 0.9736, 820.0818, 28.6371, 1.9272, 1.5261, 0.9700 || 2.64, 44.07
SVR (Gaussian kernel) | O₂, H₂ (%) | 0.0330, 0.9890, 0.9781, 685.3073, 26.1784, 1.7617, 1.4452, 0.8858 || 4.01, 67.13
NN | O₂ (%) | 1.2520, 0.9786, 0.9577, 1313.0513, 36.2360, 2.4386, 1.8238, 1.2325 || 2.26, 37.66
NN | H₂ (%) | 0.5802, 0.9786, 0.9577, 1318.2917, 36.3083, 2.4435, 1.7894, 1.2349 || 2.79, 46.36
NN | O₂, H₂ (%) | 0.5887, 0.9788, 0.9580, 1308.9262, 36.1791, 2.4348, 1.7751, 1.2304 || 2.22, 37.06
MARS (piecewise-linear) | O₂ (%) | 1.8643, 0.9850, 0.9702, 921.9392, 30.3635, 2.0434, 1.5745, 1.0294 || 0.77, 12.91
MARS (piecewise-linear) | H₂ (%) | 2.6286, 0.9847, 0.9696, 941.7202, 30.6875, 2.0652, 1.5791, 1.0406 || 0.84, 14.01
MARS (piecewise-linear) | O₂, H₂ (%) | 2.2213, 0.9850, 0.9702, 921.9392, 30.3635, 2.0434, 1.5745, 1.0294 || 0.77, 12.91
MARS (piecewise-cubic) | O₂ (%) | 2.0915, 0.9845, 0.9692, 949.7489, 30.8180, 2.0740, 1.5895, 1.0451 || 0.70, 11.71
MARS (piecewise-cubic) | H₂ (%) | 2.6845, 0.9845, 0.9692, 951.9365, 30.8535, 2.0764, 1.5937, 1.0463 || 0.91, 15.20
MARS (piecewise-cubic) | O₂, H₂ (%) | 2.3626, 0.9845, 0.9692, 949.7489, 30.8180, 2.0740, 1.5895, 1.0451 || 0.70, 11.71
k-NN | O₂ (%) | 0.0015, 0.9878, 0.9757, 750.9376, 27.4032, 1.8442, 1.4299, 0.9277 || 1.20, 19.97
k-NN | H₂ (%) | 0.0015, 0.9877, 0.9756, 755.1083, 27.4792, 1.8493, 1.4290, 0.9304 || 1.20, 19.97
k-NN | O₂, H₂ (%) | 0.0013, 0.9878, 0.9757, 753.6573, 27.4528, 1.8475, 1.4336, 0.9294 || 1.20, 19.97
RF | O₂ (%) | 52.7746, 0.9918, 0.9837, 507.8521, 22.5356, 1.5166, 1.1332, 0.7614 || 1.80, 30.15
RF | H₂ (%) | 52.1179, 0.9917, 0.9835, 513.2911, 22.6559, 1.5247, 1.1285, 0.7655 || 1.77, 29.55
RF | O₂, H₂ (%) | 53.3313, 0.9917, 0.9835, 518.2623, 22.7654, 1.5321, 1.1387, 0.7692 || 2.31, 38.54
Table 20. Performance of the machine learning methods in carbon prediction based on dynamic data (concentrations of O₂ and H₂ added as supplementary observations).
Columns: Method | Added observation(s) | Training: Time (s), r_yY, r²_yY, MSE, RMSE, RRMSE (%), MAPE (%), PI || Testing: average relative error at endpoint (%), average absolute error at endpoint (vol.%)
SVR (polynomial kernel) | O₂ (%) | 9.8700, 0.9935, 0.9870, 0.0617, 0.2484, 11.2522, 1936.6827, 5.6446 || 422.30, 0.1807
SVR (polynomial kernel) | H₂ (%) | 0.7264, 0.9930, 0.9860, 0.0708, 0.2661, 12.0543, 406.0649, 6.0483 || 457.08, 0.2029
SVR (polynomial kernel) | O₂, H₂ (%) | 6.8941, 0.9933, 0.9866, 0.0661, 0.2571, 11.6462, 477.8318, 5.8428 || 449.68, 0.2060
SVR (Gaussian kernel) | O₂ (%) | 0.2335, 0.9971, 0.9942, 0.0585, 0.2419, 10.9565, 802.7950, 5.4861 || 2240.19, 0.9385
SVR (Gaussian kernel) | H₂ (%) | 0.2324, 0.9946, 0.9892, 0.0645, 0.2540, 11.5071, 1021.0557, 5.7692 || 964.71, 0.4144
SVR (Gaussian kernel) | O₂, H₂ (%) | 0.2321, 0.9958, 0.9916, 0.0494, 0.2222, 10.0664, 913.7527, 5.0439 || 1508.12, 0.6346
NN | O₂ (%) | 0.8624, 0.9892, 0.9785, 0.1016, 0.3188, 14.4382, 2395.4597, 7.2582 || 1324.38, 0.5667
NN | H₂ (%) | 1.0127, 0.9860, 0.9722, 0.1320, 0.3633, 16.4536, 2763.2608, 8.2849 || 1361.82, 0.6014
NN | O₂, H₂ (%) | 0.8246, 0.9888, 0.9777, 0.1055, 0.3247, 14.7090, 2875.5355, 7.3959 || 814.47, 0.3571
MARS (piecewise-linear) | O₂ (%) | Failed
MARS (piecewise-linear) | H₂ (%) | Failed
MARS (piecewise-linear) | O₂, H₂ (%) | Failed
MARS (piecewise-cubic) | O₂ (%) | Failed
MARS (piecewise-cubic) | H₂ (%) | Failed
MARS (piecewise-cubic) | O₂, H₂ (%) | Failed
k-NN | O₂ (%) | 0.0014, 0.9948, 0.9896, 0.0494, 0.2222, 10.0633, 486.1327, 5.0448 || 12.62, 0.0060
k-NN | H₂ (%) | 0.0014, 0.9948, 0.9896, 0.0489, 0.2211, 10.0165, 631.1686, 5.0213 || 12.62, 0.0060
k-NN | O₂, H₂ (%) | 0.0014, 0.9948, 0.9896, 0.0494, 0.2222, 10.0655, 492.8121, 5.0460 || 12.62, 0.0060
RF | O₂ (%) | 43.1992, 0.9960, 0.9920, 0.0379, 0.1947, 8.8184, 15.9738, 4.4180 || 592.91, 0.2530
RF | H₂ (%) | 44.9515, 0.9961, 0.9922, 0.0375, 0.1937, 8.7750, 15.9797, 4.3961 || 609.86, 0.2625
RF | O₂, H₂ (%) | 44.0530, 0.9959, 0.9918, 0.0398, 0.1994, 9.0315, 15.9870, 4.5252 || 901.37, 0.3868