*2.3. HyGPR with K-Means Clustering*

In this study, a novel hybrid model, HyGPR, which integrates the parametric MLR model with the non-parametric GPR model, is constructed to forecast oxygen consumption in the converter steelmaking process:

$$f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + g(\mathbf{x}) \tag{22}$$

where *w* is the weight vector of the MLR model defined in Equation (10) and *g*(*x*) ∼ GP(0, *k*(*x*, *x*′)).

Such a hybrid model can bridge the gap between the interpretability of the parametric MLR model and the accuracy of the non-parametric GPR model, with MLR serving as the prior model. Note that the proposed hybrid model can be viewed as a special GPR whose mean function is linear and whose covariance function is the composite kernel defined in Equations (19)–(21).
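The "special GPR" reading can be made explicit from Equation (22). As a short worked step, assuming *w* is treated as a deterministic hyper-parameter rather than a random variable, only *g*(*x*) contributes to the covariance:

$$\mathbb{E}[f(\mathbf{x})] = \mathbf{w}^T \mathbf{x}, \qquad \operatorname{cov}\big(f(\mathbf{x}), f(\mathbf{x}')\big) = k(\mathbf{x}, \mathbf{x}') \quad\Longrightarrow\quad f(\mathbf{x}) \sim \mathcal{GP}\big(\mathbf{w}^T \mathbf{x},\; k(\mathbf{x}, \mathbf{x}')\big)$$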

The hyper-parameters of the proposed hybrid model form the vector θ = [*w*<sub>0</sub>, *w*<sub>1</sub>, ..., *w<sub>m</sub>*, σ<sup>2</sup><sub>*f*1</sub>, *l*<sup>2</sup><sub>*s*1</sub>, ..., *l*<sup>2</sup><sub>*sm*</sub>, *l*<sup>2</sup><sub>*e*1</sub>, ..., *l*<sup>2</sup><sub>*em*</sub>]. To find the optimal hyper-parameters, we maximize the log marginal likelihood [23]:

$$\log p(y|\theta) = -\frac{1}{2}(y-m)^T K^{-1} (y-m) - \frac{1}{2}\log|K| - \frac{n}{2}\log 2\pi\tag{23}$$

where *m* = *w*<sup>T</sup>*X* and *K<sub>ij</sub>* = *k*(*x<sub>i</sub>*, *x<sub>j</sub>*). The most challenging and time-consuming step in evaluating this expression is inverting the *n* × *n* matrix *K*, whose cost grows as O(*n*<sup>3</sup>) with the number of training samples *n*.
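To illustrate where the cost of Equation (23) comes from, the following is a minimal NumPy sketch that evaluates the log marginal likelihood through a Cholesky factorization instead of an explicit inverse. Several pieces are assumptions made only for illustration: the single squared-exponential kernel `rbf_kernel` stands in for the composite kernel of Equations (19)–(21), the `noise_var` term added to the diagonal is a hypothetical jitter/noise level, and a column of ones is prepended to *X* so that `w[0]` plays the role of *w*<sub>0</sub>.

```python
import numpy as np

def rbf_kernel(A, B, signal_var=1.0, length_scale=1.0):
    """Squared-exponential kernel (assumed stand-in for the composite kernel)."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def log_marginal_likelihood(X, y, w, signal_var, length_scale, noise_var=1e-2):
    """Evaluate Equation (23) using K = L L^T rather than forming K^{-1}."""
    n = X.shape[0]
    X1 = np.hstack([np.ones((n, 1)), X])   # ones column so w[0] acts as the intercept
    m = X1 @ w                              # linear (MLR) mean
    K = rbf_kernel(X, X, signal_var, length_scale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)               # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - m))  # K^{-1} (y - m)
    return (-0.5 * (y - m) @ alpha
            - np.sum(np.log(np.diag(L)))    # equals 0.5 * log|K|
            - 0.5 * n * np.log(2.0 * np.pi))
```

In practice the negative of this function would be handed to a numerical optimizer over θ; each evaluation repeats the factorization, which is why a smaller *n* per model matters.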

To make the proposed HyGPR model practical in an actual production environment, we employ K-means clustering to reduce the training sample size seen by each GPR model (as shown in Figure 2). When training the noise function *g*(*x*), we apply K-means clustering, using the same input variables as MLR and GPR, to decompose the training set D = {*X*, *y*} into *Q* subsets, and train one GPR model on each subset. When a new input *x*<sup>∗</sup> arrives, HyGPR first forecasts the linear part of *f*(*x*<sup>∗</sup>) using the MLR model, and then predicts *g*(*x*<sup>∗</sup>) using the *q*th GPR model, which is selected by the K-means clustering model. With this decomposition, GPR training is expected to be faster because each covariance matrix to be inverted is smaller.

**Figure 2.** HyGPR model with K-means clustering. MLR: multiple linear regression; GPR: Gaussian process regression.
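The routing scheme in Figure 2 can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the MLR weights are fitted by ordinary least squares rather than jointly by maximizing Equation (23), each GPR uses a fixed squared-exponential kernel (`rbf`) with a hypothetical `noise_var`, and the helper names `kmeans`, `fit_hygpr`, and `predict_hygpr` are invented here.

```python
import numpy as np

def kmeans(X, Q, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns centroids and a cluster label per row."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=Q, replace=False)]
    assign = lambda: np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for _ in range(n_iter):
        labels = assign()
        for q in range(Q):
            if np.any(labels == q):          # keep the old centroid if a cluster empties
                centroids[q] = X[labels == q].mean(axis=0)
    return centroids, assign()

def rbf(A, B, length_scale=1.0):
    """Squared-exponential kernel (assumed; the paper uses a composite kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def fit_hygpr(X, y, Q=3, noise_var=1e-2):
    X1 = np.hstack([np.ones((len(X), 1)), X])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)   # MLR part (least squares, an assumption)
    resid = y - X1 @ w                            # what the GPR part must explain
    centroids, labels = kmeans(X, Q)
    experts = []
    for q in range(Q):                            # one small GPR per cluster
        Xq, rq = X[labels == q], resid[labels == q]
        Kq = rbf(Xq, Xq) + noise_var * np.eye(len(Xq))
        experts.append((Xq, np.linalg.solve(Kq, rq)))
    return w, centroids, experts

def predict_hygpr(model, x_new):
    w, centroids, experts = model
    q = np.argmin(((centroids - x_new) ** 2).sum(-1))   # route to the nearest centroid
    Xq, alpha = experts[q]
    linear = w[0] + x_new @ w[1:]                        # MLR forecast
    gp_mean = rbf(x_new[None, :], Xq)[0] @ alpha         # posterior mean of g(x_new)
    return linear + gp_mean
```

Each per-cluster solve works on a matrix with roughly *n*/*Q* rows instead of *n*, which is the source of the expected speed-up; the trade-off is that each GPR sees only the samples of its own cluster.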

#### **3. Experiments and Discussion**

#### *3.1. Data Set*

To test the proposed HyGPR model, we collected real-world process data from the converter steelmaking process at an integrated iron and steel works in northern China. The data set contains 1534 samples observed between 1 April 2018 and 30 June 2018. Figure 3 shows the distribution of the observed outputs, which is irregular and fluctuates severely. The selected input variables include:


Descriptive statistics, such as means, standard deviations, and minimum and maximum values, are summarized in Table 1. To evaluate the performance of the proposed model, we divided the data set into two parts using the hold-out method: the first 1381 samples (about 90%) for training HyGPR and the remaining 153 samples (about 10%) for testing.
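The split sizes can be checked with a short sketch. It assumes the samples are kept in their recorded order and not shuffled, which is how "the first 1381 samples" is read here; the helper name `holdout_split` and the placeholder index list are illustrative and do not represent the real process data.

```python
def holdout_split(samples, n_train):
    """Ordered hold-out: first n_train rows for training, the rest for testing."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must leave at least one sample on each side")
    return samples[:n_train], samples[n_train:]

samples = list(range(1534))          # placeholder indices, not the actual measurements
train, test = holdout_split(samples, 1381)
print(len(train), len(test), round(len(train) / len(samples), 3))  # 1381 153 0.9
```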

**Figure 3.** Oxygen consumption of each process in converters.


**Table 1.** Descriptive statistics of data set.
