*Article* **Temperature Prediction Model for a Regenerative Aluminum Smelting Furnace by a Just-in-Time Learning-Based Triple-Weighted Regularized Extreme Learning Machine**

**Xingyu Chen, Jiayang Dai \* and Yasong Luo**

Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University, Nanning 530004, China

**\*** Correspondence: daijiayang@gxu.edu.cn; Tel.: +86-185-7439-5495

**Abstract:** In a regenerative aluminum smelting furnace, real-time liquid aluminum temperature measurements are essential for process control. However, it is often very expensive to achieve accurate temperature measurements. To address this issue, a just-in-time learning-based triple-weighted regularized extreme learning machine (JITL-TWRELM) soft sensor modeling method is proposed for liquid aluminum temperature prediction. In this method, a weighted JITL method (WJITL) is adopted for updating the online local models to deal with the process time-varying problem. Moreover, a regularized extreme learning machine model considering both the sample similarities and the variable correlations is established as the local modeling method. The effectiveness of the proposed method is demonstrated in an industrial aluminum smelting process. The results show that the proposed method can meet the requirements of prediction accuracy of the regenerative aluminum smelting furnace.

**Keywords:** temperature prediction; weighted regularized extreme learning machine; just-in-time learning; sample similarities; variable correlations

#### **1. Introduction**

Aluminum can be made into alloys with various metals; it is widely used in the automotive, aviation, and military industries due to its good ductility, plasticity, recyclability, and oxidation resistance. A regenerative aluminum smelting furnace is important for the aluminum smelting process, in which the real-time measurement and control of the liquid aluminum temperature influence the quality of the aluminum. However, on industrial sites, many factors, such as the aging of temperature-measuring thermocouples and fluctuations in the operating voltage, make real-time measurement of the liquid aluminum temperature difficult. Hence, it is essential to develop a modeling method to predict the liquid aluminum temperature and thereby improve the quality of the aluminum. The aluminum smelting process is a typical complex industrial furnace production process. In recent decades, many studies on the mechanism modeling of industrial furnaces have been performed [1–3]. Although the physical meaning of mechanism models is clear, they have some problems, such as complicated calculations for industrial furnace systems. At the same time, mechanism models may not be reliable enough since they usually make simplifying assumptions. The furnace temperature, airflow rate, etc., fluctuate greatly in different working states due to the intermittent working characteristics of the regenerative aluminum smelting furnace. The real-time update of the model for the regenerative aluminum smelting furnace is also a problem that needs to be considered.

**Citation:** Chen, X.; Dai, J.; Luo, Y. Temperature Prediction Model for a Regenerative Aluminum Smelting Furnace by a Just-in-Time Learning-Based Triple-Weighted Regularized Extreme Learning Machine. *Processes* **2022**, *10*, 1972. https://doi.org/10.3390/pr10101972

Academic Editors: Guoqing Zhang, Zejia Zhao and Wai Sze YIP

Received: 1 September 2022 Accepted: 23 September 2022 Published: 30 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

To overcome the shortcomings of mechanism modeling, a soft sensor that makes full use of the industrial data is proposed [4]. There are many researchers working on the data-driven modeling of industrial furnaces and similar processes, such as partial
least squares (PLS) [5], the kernel principal component regression (KPCR) [6], and kernel partial least squares (KPLS) [7], which have been successfully applied with good results. However, these methods are generally considered to be global modeling (and trained offline). Moreover, after these models are put into application, they will face problems, such as difficulties in model updating. Consequently, to deal with the adaptive update problem of the model, the moving window technique [8,9], recursive models [10,11], and the just-in-time learning (JITL) strategy [12,13] are usually used as online adaptive update strategies. The JITL strategy trains an online local model to predict the query samples by selecting similar samples from historical samples, so it is more suitable for processes such as industrial furnaces with state mutations. For example, Chen et al. [14] proposed a least squares support vector machine temperature prediction model based on JITL to deal with large temperature change lags in roller kilns. Dai et al. [15] combined the moving window technique and the JITL strategy as an update strategy to select similar samples in both time and space dimensions, and they verified the effectiveness of the proposed method on an industrial kiln. In [16], a locally weighted partial least squares regression (LWPLS) model was proposed by JITL-based local modeling. In LWPLS, the samples most similar to the query sample are assigned different weights and selected for local modeling. The current model will be discarded when the next query sample is available. Then, a new local PLS model will be established for the model's online update. However, LWPLS only considers the sample similarities, not the variable correlations. The data of the aluminum smelting process often present high-dimensional characteristics and each input variable has a different degree of influence on the liquid aluminum temperature. 
Hence, except for the sample similarities, it is necessary to consider the variable correlations [17–19]. Furthermore, the accuracy of the JITL strategy depends on the quality of the selected samples. However, the traditional similarity measurement criteria, such as Euclidean distance and Mahalanobis distance, only consider the input information without considering the output information, and often cannot obtain accurate similar samples. Thus, investigating new similarity measurement criteria is important for the JITL strategy.

In recent years, artificial intelligence algorithms, such as long short-term memory (LSTM) networks [20–22] and extreme learning machines (ELM) [23–26], have also been used in soft sensor modeling. The basic assumption for LSTM is that process data are sampled at even and unified frequencies; it is very difficult to meet this condition for process data measurements in industrial processes, especially for quality variables. Hence, LSTM is unsuitable for some processes with irregular sampling frequencies. ELM is a single-hidden-layer neural network with low algorithmic complexity, which does not need iterative backpropagation, and has been used in the temperature prediction of regenerative aluminum smelting processes. Huang et al. [27] proposed an extreme learning machine furnace temperature prediction model based on kernel principal component analysis and showed that ELM performs better than the traditional BP neural network. Liu et al. [28] proposed an ELM model optimized by the restricted Boltzmann machine (RBM) to address the random initialization of the input weights and biases in the ELM. Moreover, ELM has a fast learning speed and is suitable as an online prediction model. For example, Li et al. [29] built a local online ELM model in combination with a JITL strategy, allowing the online prediction of polyethylene terephthalate (PET) viscosity without relying on time-consuming laboratory analysis procedures. However, this ELM-based online prediction model considers neither sample similarities nor variable correlations, which is unreasonable in local modeling. Moreover, the original ELM runs the risk of model overfitting. Hence, a regularized extreme learning machine (RELM) [30] was proposed to solve the model's overfitting problem.

Although some research studies have been carried out on ELM, there are few discussions about sample similarities and variable correlations in RELM, especially in temperature prediction. Based on the above discussions, a soft sensor modeling method, the JITL-based triple-weighted regularized extreme learning machine (JITL-TWRELM), is proposed to solve the above problems. Compared with the traditional data-driven modeling methods described above, the method proposed in this paper not only allows real-time updating of the model but also obtains more accurate local modeling samples due to the use of the WJITL strategy, which uses correlation information between the input and output variables in the sample selection stage. Meanwhile, in the local modeling stage, the proposed method overcomes the shortcoming of traditional local modeling methods, which consider only the sample similarities, by also analyzing the variable correlations and thereby highlighting the influences of different variables on the output. The remainder of this article is structured as follows. Firstly, the regenerative aluminum smelting furnace is briefly introduced. Secondly, the regularized extreme learning machine (RELM), sample weighted regularized extreme learning machine (SWRELM), and variable weighted regularized extreme learning machine (VWRELM) are introduced, respectively. Then, the JITL-based triple-weighted regularized extreme learning machine (JITL-TWRELM) is described. Next, the flexibility and effectiveness of the proposed method are validated in the industrial aluminum smelting process. Finally, we present the conclusions.

#### **2. Related Methods**

Since ELM runs the risk of model overfitting, the regularization method is used to solve the overfitting problem. Considering sample similarities and variable correlations, the sample weighted regularized extreme learning machine (SWRELM) and the variable weighted regularized extreme learning machine (VWRELM) are introduced, respectively. These three related methods are discussed next. To make the derivation of the relevant equations easier to follow, the symbols used in this paper are defined in Table 1.


**Table 1.** Definition of symbols in this paper.

#### *2.1. RELM*

As shown in Figure 1, the structure of ELM consists of three parts: the input layer, the hidden layer, and the output layer [31]. The core idea of ELM is to randomly select the input weights and hidden layer biases of the network. The output weights between the hidden layer and the output layer are obtained by minimizing the loss function, which is solved with the Moore–Penrose generalized inverse. Owing to its single-hidden-layer structure, ELM has a faster learning speed, requires minimal human intervention, and is easier to implement than traditional networks. However, the original ELM model only considers the empirical risk minimization (ERM) principle, which tends to result in an overfitted model. To overcome this deficiency, a regularized extreme learning machine (RELM) was proposed based on both the empirical risk minimization and structural risk minimization (SRM) principles and has been shown to have better generalization performance than ELM.

**Figure 1.** The structure of ELM.

It is assumed that the *n*th historical input variable vector and the output variable are denoted as *x<sub>n</sub>* = [*x<sub>n1</sub>*, *x<sub>n2</sub>*, . . . , *x<sub>nm</sub>*] and *t<sub>n</sub>*, respectively, where *m* is the number of input variables. (*x<sub>n</sub>*, *t<sub>n</sub>*) is the *n*th historical sample composed of *x<sub>n</sub>* and *t<sub>n</sub>*. The output function of the RELM with *L* hidden layer neurons can be represented as

$$\sum\_{i=1}^{L} \beta\_i g(\omega\_i \mathbf{x}\_j^T + b\_i) = t\_j, \quad j = 1, \dots, N \tag{1}$$

where *β<sub>i</sub>* is the output weight of the *i*th hidden layer unit, and *ω<sub>i</sub>* = [*ω<sub>i1</sub>*, . . . , *ω<sub>im</sub>*] and *b<sub>i</sub>* are the input weight vector and the bias connecting the input layer to the *i*th hidden layer unit, respectively. *x<sub>j</sub>* = [*x<sub>j1</sub>*, *x<sub>j2</sub>*, . . . , *x<sub>jm</sub>*] is the input variable vector, *t<sub>j</sub>* denotes the output corresponding to *x<sub>j</sub>*, and *N* is the number of training samples. *g*(·) is the activation function, which is usually set as the sigmoid function. We rewrite Equation (1) in matrix form

$$H\beta = T\tag{2}$$

where

$$H = [h(\mathbf{x}\_1^T)^T, \dots, h(\mathbf{x}\_N^T)^T]^T = \begin{pmatrix} g(\omega\_1 \mathbf{x}\_1^T + b\_1) & \dots & g(\omega\_L \mathbf{x}\_1^T + b\_L) \\ \vdots & \ddots & \vdots \\ g(\omega\_1 \mathbf{x}\_N^T + b\_1) & \dots & g(\omega\_L \mathbf{x}\_N^T + b\_L) \end{pmatrix} \tag{3}$$

$$\boldsymbol{\beta} = [\beta\_1, \dots, \beta\_L]^T \tag{4}$$

$$T = [t\_1, \dots, t\_N]^T \tag{5}$$

Due to *ω<sup>i</sup>* and *b<sup>i</sup>* being randomly given, to obtain the output weight vector *β*, the optimization equation can be represented as

$$\begin{array}{ll} \min & \frac{1}{2} \|\boldsymbol{\beta}\|^2 + \frac{C}{2} \|\boldsymbol{\xi}\|^2 \\ \text{s.t.} & h(\mathbf{x}\_j^T) \boldsymbol{\beta} = t\_j + \xi\_j, \quad j = 1, \ldots, N \end{array} \tag{6}$$

where *C* represents the regularization coefficient, which balances the empirical risk and the structural risk. *ξ* = [*ξ*<sub>1</sub>, . . . , *ξ<sub>N</sub>*]<sup>*T*</sup> is the training error vector. By constructing the Lagrange function, the solution of Equation (6) is

$$\boldsymbol{\beta} = \begin{cases} (H^T H + \frac{I\_L}{C})^{-1} H^T T, & L < N \\ H^T (H H^T + \frac{I\_N}{C})^{-1} T, & L > N \end{cases} \tag{7}$$

where *I<sub>L</sub>* ∈ *R*<sup>*L*×*L*</sup> and *I<sub>N</sub>* ∈ *R*<sup>*N*×*N*</sup> are identity matrices.
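As an illustration of Equations (1)–(7), the following minimal NumPy sketch fits and evaluates an RELM. It is not the authors' implementation: the function names, the uniform range [-1, 1] for the random input weights and biases, and the fixed random seed are assumptions made for the example, and the case *L* > *N* is written with the *N* × *N* matrix *HH<sup>T</sup>*, as the dimension of *I<sub>N</sub>* requires.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation g(.) used for the hidden layer."""
    return 1.0 / (1.0 + np.exp(-z))


def relm_fit(X, T, L=20, C=150.0, seed=0):
    """Fit an RELM with L hidden neurons; returns (omega, b, beta)."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    # Random input weights and biases; the sampling range is an assumption.
    omega = rng.uniform(-1.0, 1.0, size=(L, m))
    b = rng.uniform(-1.0, 1.0, size=L)
    H = sigmoid(X @ omega.T + b)  # N x L hidden-layer output matrix, Eq. (3)
    N = H.shape[0]
    if L < N:
        # First case of Eq. (7): solve an L x L system.
        beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)
    else:
        # Second case of Eq. (7): solve an N x N system.
        beta = H.T @ np.linalg.solve(H @ H.T + np.eye(N) / C, T)
    return omega, b, beta


def relm_predict(X, omega, b, beta):
    """Evaluate Eq. (1) for every row of X."""
    return sigmoid(X @ omega.T + b) @ beta
```

Both cases of Equation (7) give the same output weights in exact arithmetic; the choice between them only decides whether an *L* × *L* or an *N* × *N* linear system is solved.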

#### *2.2. SWRELM*

Not all samples contribute equally to the output; however, the original RELM treats all samples as equally important and does not consider the differences between samples. Thus, to obtain a more realistic result, the sample weighted matrix Ω<sub>*s*</sub> = *diag*(Ω<sub>*s1*</sub>, . . . , Ω<sub>*sN*</sub>) is added to Equation (6), which is expressed as

$$\begin{array}{ll} \min & \frac{1}{2} \left\| \boldsymbol{\beta}^{S} \right\|^{2} + \frac{C}{2} \left\| \Omega\_s \boldsymbol{\xi} \right\|^{2} \\ \text{s.t.} & h(\mathbf{x}\_j^{T}) \boldsymbol{\beta}^{S} = t\_j + \xi\_j, \quad j = 1, \dots, N \end{array} \tag{8}$$

The Lagrange function can be represented as follows:

$$\begin{aligned} &L(\boldsymbol{\beta}^{S}, \boldsymbol{\xi}, \lambda) \\ &= \frac{1}{2} \left\| \boldsymbol{\beta}^{S} \right\|^{2} + \frac{C}{2} \left\| \Omega\_s \boldsymbol{\xi} \right\|^{2} - \sum\_{j=1}^{N} \lambda\_j \left( h(\mathbf{x}\_j^{T}) \boldsymbol{\beta}^{S} - t\_j - \xi\_j \right) \\ &= \frac{1}{2} \left\| \boldsymbol{\beta}^{S} \right\|^{2} + \frac{C}{2} \left\| \Omega\_s \boldsymbol{\xi} \right\|^{2} - \lambda (H \boldsymbol{\beta}^{S} - T - \boldsymbol{\xi}) \end{aligned} \tag{9}$$

where *λ* = [*λ*<sub>1</sub>, . . . , *λ<sub>N</sub>*] denotes the Lagrange multiplier vector. According to the KKT conditions, taking the derivatives of Equation (9) and setting them to zero, we have

$$\frac{\partial L}{\partial \boldsymbol{\beta}^S} = \boldsymbol{0} \rightarrow (\boldsymbol{\beta}^S)^T = \lambda \boldsymbol{H} \tag{10}$$

$$\frac{\partial L}{\partial \boldsymbol{\xi}} = 0 \rightarrow C\boldsymbol{\xi}^T \Omega\_s^2 + \lambda = 0 \tag{11}$$

$$\frac{\partial L}{\partial \lambda} = 0 \rightarrow H\boldsymbol{\beta}^S = T + \boldsymbol{\xi} \tag{12}$$

With Equations (11) and (12), the Lagrange multiplier vector *λ* can be expressed as

$$\lambda = -C(H\boldsymbol{\beta}^S - T)^T \Omega\_s^2 \tag{13}$$

Similarly, with Equations (10) and (13), the expression of the output weight vector of the sample weighted regularized extreme learning machine (SWRELM) is

$$\boldsymbol{\beta}^S = (H^T \Omega\_s^2 H + \frac{I\_L}{C})^{-1} H^T \Omega\_s^2 T, \quad L < N \tag{14}$$

Equation (14) is suitable when the number of modeling samples is greater than the number of hidden neurons. Moreover, in this case, *β<sup>S</sup>* can be calculated faster [32].
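The closed form of Equation (14) can be sketched in a few lines of NumPy. The function name and argument layout below are hypothetical; the sketch assumes the hidden-layer output matrix *H* has already been computed and that the sample weights are passed as a vector holding the diagonal of Ω<sub>*s*</sub>.

```python
import numpy as np


def swrelm_output_weights(H, T, sample_weights, C=150.0):
    """Output weights of Eq. (14): (H^T W^2 H + I_L / C)^(-1) H^T W^2 T."""
    # Diagonal of Omega_s^2, kept as a vector.
    w2 = np.asarray(sample_weights, dtype=float) ** 2
    L = H.shape[1]
    # Broadcasting scales column j of H^T by w2[j], which equals
    # H^T @ diag(w2) without forming the N x N diagonal matrix.
    HtW2 = H.T * w2
    return np.linalg.solve(HtW2 @ H + np.eye(L) / C, HtW2 @ T)
```

With all sample weights equal to one, this reduces to the first case of Equation (7), which is a convenient sanity check.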

#### *2.3. VWRELM*

The original RELM treats all input variables as equally important. However, not all input variables have the same effect on the output variable; some input variables are more strongly correlated with the output variable than others. Thus, to reflect the differences between input variables and obtain better quality-related features, a variable contribution method based on the Pearson correlation coefficient is adopted. On this basis, the variable weighted regularized extreme learning machine (VWRELM) is proposed. The Pearson correlation coefficient is defined as

$$\rho = \frac{E(\mathbf{x}t) - E(\mathbf{x})E(t)}{\sqrt{E(\mathbf{x}^2) - E^2(\mathbf{x})}\sqrt{E(t^2) - E^2(t)}} \tag{15}$$

where *E*(*x*) and *E*(*t*) are the expectations of a single input variable and the output variable, respectively. *ρ* represents the degree of correlation between the two variables; the more strongly two variables are correlated, the larger |*ρ*| is. As a result, the variable contribution can be defined by *ρ*. For the training samples (*x<sub>n</sub>*, *t<sub>n</sub>*), *n* = 1, . . . , *k*, where each input sample *x<sub>n</sub>* has *m* dimensions, the contribution of each variable can be defined as

$$v\_i = \frac{|\rho\_i|}{\sum\_{j=1}^{m} |\rho\_j|}, \quad i = 1, \dots, m \tag{16}$$

where *ρ<sub>i</sub>* represents the Pearson correlation coefficient between the *i*th input variable and the output variable. The variable contribution matrix can be written as

$$V = \operatorname{diag}(v\_1, \dots, v\_m) \tag{17}$$

Hence, taking the variable contribution as the variable weights, and applying the variable weights to the input sample *xn*, the weighted input sample can be expressed as

$$\mathbf{x}\_n^v = \mathbf{x}\_n V = \mathbf{x}\_n \operatorname{diag}(v\_1, \dots, v\_m) = (x\_{n1} v\_1, \dots, x\_{nm} v\_m) \tag{18}$$

where *x<sub>n</sub><sup>v</sup>* represents the input sample weighted by the variable weights. It can be seen from Equation (18) that each dimension of the input sample is given a different weight, reflecting the differences between variables. By variable weighting, Equation (3) can be rewritten as

$$\begin{aligned} H^{V} &= [h((\mathbf{x}\_1^v)^T)^T, \dots, h((\mathbf{x}\_N^v)^T)^T]^T \\ &= \begin{pmatrix} g(\omega\_1 (\mathbf{x}\_1^v)^T + b\_1) & \dots & g(\omega\_L (\mathbf{x}\_1^v)^T + b\_L) \\ \vdots & \ddots & \vdots \\ g(\omega\_1 (\mathbf{x}\_N^v)^T + b\_1) & \dots & g(\omega\_L (\mathbf{x}\_N^v)^T + b\_L) \end{pmatrix} \end{aligned} \tag{19}$$

When *L* < *N*, the output weight vector is

$$\boldsymbol{\beta}^{V} = ((H^{V})^{T}H^{V} + \frac{I\_{L}}{C})^{-1}(H^{V})^{T}T, \quad L < N \tag{20}$$
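The variable weighting of Equations (15)–(18) can be sketched as follows. The function names are illustrative, and the sketch assumes that no input column is constant (a constant column would make the Pearson coefficient undefined).

```python
import numpy as np


def pearson_coefficients(X, t):
    """Pearson correlation of each column of X with t, following Eq. (15)."""
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    cov = (Xc * tc[:, None]).mean(axis=0)    # E(xt) - E(x)E(t)
    std_x = np.sqrt((Xc ** 2).mean(axis=0))  # sqrt(E(x^2) - E^2(x))
    std_t = np.sqrt((tc ** 2).mean())        # sqrt(E(t^2) - E^2(t))
    return cov / (std_x * std_t)


def variable_contributions(X, t):
    """Normalized absolute correlations v_i of Eq. (16); they sum to 1."""
    rho = np.abs(pearson_coefficients(X, t))
    return rho / rho.sum()


def weight_inputs(X, v):
    """Eq. (18): x_n V with V = diag(v), applied to every row of X."""
    return X * v
```

The weighted inputs returned by `weight_inputs` would then replace *X* when the hidden-layer matrix of Equation (19) is built.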

#### **3. The Proposed JITL-TWRELM Model**

In the previous analysis, the RELM, SWRELM, and VWRELM models were established. However, in a prediction model with many samples and variables, different samples and variables contribute differently to the predicted output, especially in the aluminum smelting process. Table 2 shows the shortcomings of the three methods. Both sample similarities and variable correlations should be taken into account in RELM. Hence, to obtain a better model, a JITL-based triple-weighted regularized extreme learning machine, combined with the weighted JITL strategy (WJITL), is proposed.


**Table 2.** Shortcomings of the three methods.

#### *3.1. Weighted Similarity Measurement Criterion*

The original Euclidean distance is usually used as a similarity measurement criterion, expressed as

$$d\_{on} = \sqrt{(\mathbf{x}\_q - \mathbf{x}\_n)(\mathbf{x}\_q - \mathbf{x}\_n)^T} \tag{21}$$

where *x<sub>q</sub>* ∈ *R*<sup>1×*m*</sup> is the current query sample, *x<sub>n</sub>* ∈ *R*<sup>1×*m*</sup> is the *n*th historical sample, and *d<sub>on</sub>* indicates the Euclidean distance between the current query sample and the *n*th historical sample. The more similar the historical sample is to the query sample, the smaller the distance *d<sub>on</sub>*. However, Equation (21) only uses the input information of the historical sample and the query sample, while the output information is not taken into consideration. Moreover, the squared Euclidean distance is the sum of the squared differences over all dimensions of the sample, and the importance of each dimension may differ, with some dimensions contributing more to the distance than others. Hence, inspired by Equation (15), the connections between the input variables and the output variable are established through correlation analysis. We define a weighted Euclidean distance as the weighted similarity measurement criterion, expressed as

$$d\_{on}^{w} = \sqrt{(\mathbf{x}\_{q} - \mathbf{x}\_{n})\Omega\_{v}(\mathbf{x}\_{q} - \mathbf{x}\_{n})^{T}}\tag{22}$$

where Ω<sub>*v*</sub> = *diag*(*ρ*<sub>1</sub>, . . . , *ρ<sub>m</sub>*). Then, the sample weight is expressed as

$$
\Omega\_{sn} = \exp\left(-\frac{d\_{on}^w}{\varphi^2}\right) \tag{23}
$$

where *ϕ* is the adjustment parameter, which controls how quickly the weight decays with the sample distance, so that historical samples closer to the query sample receive larger weights. For brevity, the JITL strategy that applies this weighted similarity measurement criterion is called WJITL.
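A minimal sketch of the weighted similarity measure and the resulting sample weights is given below. Two points are assumptions of the sketch rather than statements from the derivation: the absolute values of the correlation coefficients are used so that the quantity under the square root stays non-negative, and the exponent carries a negative sign so that the weight decreases as the distance grows. The function names are hypothetical.

```python
import numpy as np


def weighted_distances(x_q, X_hist, rho):
    """Weighted Euclidean distance of Eq. (22) from x_q to every row of X_hist."""
    diff = X_hist - x_q
    # Assumption: |rho| is used so the radicand cannot become negative.
    return np.sqrt((diff ** 2) @ np.abs(rho))


def sample_weights(d, phi=0.3):
    """Sample weights following Eq. (23), with a negative exponent (assumed)."""
    return np.exp(-d / phi ** 2)


def select_similar(x_q, X_hist, rho, n_local):
    """Indices and distances of the n_local historical samples closest to x_q."""
    d = weighted_distances(x_q, X_hist, rho)
    idx = np.argsort(d)[:n_local]
    return idx, d[idx]
```

Because the distance is weighted by the input-output correlations, a difference along a weakly correlated variable moves a historical sample away from the query sample less than the same difference along a strongly correlated one.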

#### *3.2. JITL-TWRELM*

A JITL-based triple-weighted regularized extreme learning machine (JITL-TWRELM) soft sensor method, combined with the WJITL strategy, was established to simultaneously incorporate sample weights and variable weights. The detailed derivation steps are as follows.

For each query sample, *N* (*N* < *H*) samples are selected from the *H* historical samples (*x<sub>n</sub>*, *t<sub>n</sub>*), *n* = 1, . . . , *H*, for local modeling. First, the Pearson correlation coefficients between the input variables and the output variable of all historical samples are calculated to obtain the correlation coefficient matrix

$$
\Omega\_v^g = \operatorname{diag}(\rho\_1^g, \dots, \rho\_m^g) \tag{24}
$$

To distinguish it from the subsequent derivation, we call Ω<sub>*v*</sub><sup>*g*</sup> the global correlation coefficient matrix, where *ρ<sub>i</sub><sup>g</sup>*, *i* = 1, . . . , *m*, is the global correlation coefficient. As a result, the weighted Euclidean distance between the query sample and the historical samples can be obtained by Equation (25).

$$d\_{on}^{tw} = \sqrt{(\mathbf{x}\_q - \mathbf{x}\_n)\Omega\_v^g(\mathbf{x}\_q - \mathbf{x}\_n)^T}, \quad n = 1, \ldots, H \tag{25}$$

We sort *d<sub>on</sub><sup>tw</sup>*, *n* = 1, . . . , *H*, from small to large, and the first *N* samples are selected as modeling samples. The sample weighted matrix is obtained as

$$\Omega\_s^t = \operatorname{diag}\left(\exp\left(-\frac{d\_{o1}^{tw}}{\varphi^2}\right), \dots, \exp\left(-\frac{d\_{oN}^{tw}}{\varphi^2}\right)\right) = \operatorname{diag}(\Omega\_{s1}^t, \dots, \Omega\_{sN}^t) \tag{26}$$

Then, the Pearson correlation coefficients of the *N* local modeling samples are calculated, and the local correlation coefficient matrix is obtained as

$$
\Omega\_v^l = \operatorname{diag}(\rho\_1^l, \dots, \rho\_m^l) \tag{27}
$$

where Ω<sub>*v*</sub><sup>*l*</sup> is used as the local variable weighted matrix for the local modeling samples and the query sample

$$X^w = X\Omega\_v^l = \{\mathbf{x}\_n^w\}, \quad n = 1, \dots, N \tag{28}$$

$$\mathbf{x}\_q^w = \mathbf{x}\_q \Omega\_v^l \tag{29}$$

where *X* ∈ *R*<sup>*N*×*m*</sup> consists of the local modeling samples, and *X<sup>w</sup>* and *x<sub>q</sub><sup>w</sup>* are the variable weighted local modeling samples and the variable weighted query sample, respectively. Thus, the new local modeling dataset (*x<sub>n</sub><sup>w</sup>*, *t<sub>n</sub>*), *n* = 1, . . . , *N*, is used to build the local model. The optimization equation for the output weight vector is established as Equation (30)

$$\begin{array}{ll} \min & \frac{1}{2} \left\| \boldsymbol{\beta}^{t} \right\|^{2} + \frac{C}{2} \left\| \Omega\_s^t \boldsymbol{\xi} \right\|^{2} \\ \text{s.t.} & h((\mathbf{x}\_j^w)^T) \boldsymbol{\beta}^{t} = t\_j + \xi\_j, \quad j = 1, \ldots, N \end{array} \tag{30}$$

The output matrix of the hidden layer is

$$H^{t} = \begin{pmatrix} g(\omega\_1(\mathbf{x}\_1^w)^T + b\_1) & \dots & g(\omega\_L(\mathbf{x}\_1^w)^T + b\_L) \\ \vdots & \ddots & \vdots \\ g(\omega\_1(\mathbf{x}\_N^w)^T + b\_1) & \dots & g(\omega\_L(\mathbf{x}\_N^w)^T + b\_L) \end{pmatrix} \tag{31}$$

According to Equations (9)–(14), the output weight vector of JITL-TWRELM is

$$\boldsymbol{\beta}^t = ((H^t)^T(\Omega\_s^t)^2 H^t + \frac{I\_L}{C})^{-1} (H^t)^T (\Omega\_s^t)^2 T, \quad L < N \tag{32}$$

Finally, the prediction output of the query sample is

$$\hat{t}\_q^t = \sum\_{i=1}^L \beta\_i^t g\left(\omega\_i \left(\mathbf{x}\_q^w\right)^T + b\_i\right) \tag{33}$$

#### **4. Industrial Case**

*4.1. Process Description of the Regenerative Aluminum Smelting Furnace*

An industrial regenerative aluminum smelting furnace and its internal structure are shown in Figure 2a and Figure 2b, respectively. The regenerative aluminum smelting furnace consists of a furnace chamber, regenerative burners (each including a burner and a ceramic sphere accumulator), a reversing valve, a flue gas pipe, etc. The regenerative burners are arranged in pairs, and the two opposite burners form a group (A and B). Normal temperature air from the blower enters burner B through the reversing valve and is heated as it flows through the hot ceramic sphere accumulator to a temperature close to that of the furnace chamber (generally 80% to 90% of the furnace chamber temperature). The heated high-temperature air enters the furnace chamber and then rolls up the flue gas around the furnace to form a thin oxygen-poor high-temperature airflow with an oxygen content lower than 21%. Then, the mixture of the oxygen-poor high-temperature air and the injected flue gas is ignited to smelt the aluminum material. At the same time, the high-temperature flue gas passes through burner A, the heat is stored in the cold ceramic sphere accumulator, and then the flue gas is discharged at a temperature lower than 150 °C through the flue gas pipe. When the stored heat reaches saturation, the reversing valve is reversed, and regenerative burners A and B exchange their combustion and heat storage working states, and so on, resulting in energy savings and reduced emissions.

**Figure 2.** (**a**) An industrial regenerative aluminum smelting furnace; (**b**) the internal structure of the regenerative aluminum smelting furnace.

#### *4.2. Model Establishment*

To construct the model for the prediction of the liquid aluminum temperature, 12 secondary variables were chosen as the input variables, which are shown in Table 3. These input variables were measured by sensors. The measurement ranges and errors of the sensors are shown in Table 4. The sampling interval of each sampling point was five minutes. There were 4400 data samples collected for modeling, of which 4000 samples were used as historical data for training and 400 samples for model testing. To better test the effectiveness of the proposed method, two groups of data (D1 and D2) from different periods were used as the testing dataset, with 200 samples in each group.



**Table 4.** Sensor measurement range and error.


The flowchart of JITL-TWRELM model is shown in Figure 3. To validate the performance of JITL-TWRELM, the six methods listed below were employed for comparison.


The detailed step-by-step procedure of the proposed method is as follows.

Step 1: Prepare the input and output variables of the historical samples and perform the standardization.

Step 2: Determine the number *N* of training samples selected from the total historical samples, the parameter *ϕ* for the sample weight calculation, the hidden neuron number *L*, and the regularization coefficient *C* of the regularized extreme learning machine.

Step 3: Analyze the global correlation between the input variables and the output variable of all historical samples. The global correlation coefficient matrix Ω<sub>*v*</sub><sup>*g*</sup> is calculated for the sample similarity measurement.

Step 4: Calculate the weighted Euclidean distances between the current query sample and the historical samples; the *N* samples closest to the current query sample are selected as local modeling samples.

Step 5: Analyze the local correlation between the input variables and the output variable of the local modeling samples. The local correlation coefficient matrix Ω<sub>*v*</sub><sup>*l*</sup> is determined.

Step 6: The JITL-TWRELM model is established, and the output of the current query sample is predicted.

Step 7: Before the next query sample arrives, the previous model is discarded and a new model is constructed based on the next query sample, enabling real-time updating of the model.
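Steps 1 to 7 can be sketched end to end as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' code: the function and helper names are hypothetical, absolute correlation values are used as weights, the sample weights use a negative exponent so that closer samples count more, the random input weights are redrawn from [-1, 1] for every query sample, and the local data are assumed to contain no constant columns. The default parameter values (Step 2) are those reported for the case study.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _abs_pearson(X, t):
    """Absolute Pearson correlation between each column of X and t."""
    Xc, tc = X - X.mean(axis=0), t - t.mean()
    return np.abs(Xc.T @ tc) / np.sqrt((Xc ** 2).sum(axis=0) * (tc ** 2).sum())


def jitl_twrelm_predict(X_hist, t_hist, X_query, n_local=200, phi=0.3,
                        L=20, C=150.0, seed=0):
    """Predict one output per query row, building a fresh local model each time."""
    rng = np.random.default_rng(seed)
    # Step 1: standardize inputs and output with historical statistics.
    mu_x, sd_x = X_hist.mean(axis=0), X_hist.std(axis=0)
    mu_t, sd_t = t_hist.mean(), t_hist.std()
    Xh, th = (X_hist - mu_x) / sd_x, (t_hist - mu_t) / sd_t
    Xq = (X_query - mu_x) / sd_x
    # Step 3: global correlation weights for the similarity measure.
    rho_g = _abs_pearson(Xh, th)
    preds = np.empty(len(Xq))
    for k, x_q in enumerate(Xq):
        # Step 4: weighted distances; keep the n_local nearest samples.
        d = np.sqrt(((Xh - x_q) ** 2) @ rho_g)
        idx = np.argsort(d)[:n_local]
        X_loc, t_loc = Xh[idx], th[idx]
        # Squared sample weights (negative exponent is an assumption).
        w2 = np.exp(-d[idx] / phi ** 2) ** 2
        # Step 5: local correlation weights for local samples and query.
        rho_l = _abs_pearson(X_loc, t_loc)
        Xw, xw = X_loc * rho_l, x_q * rho_l
        # Step 6: random hidden layer, then the weighted ridge solution.
        omega = rng.uniform(-1.0, 1.0, size=(L, Xh.shape[1]))
        b = rng.uniform(-1.0, 1.0, size=L)
        H = _sigmoid(Xw @ omega.T + b)
        HtW2 = H.T * w2
        beta = np.linalg.solve(HtW2 @ H + np.eye(L) / C, HtW2 @ t_loc)
        # Step 7: the local model is discarded after this prediction.
        preds[k] = (_sigmoid(xw @ omega.T + b) @ beta) * sd_t + mu_t
    return preds
```

Since every query sample triggers its own neighbor search and an *L* × *L* linear solve, the cost per prediction grows with the number of historical samples and with *L*, which is one reason a modest *N* and *L* are attractive for online use.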

To evaluate the performance of the proposed method, four indices are used: the mean absolute error (*MAE*), root mean squared error (*RMSE*), mean absolute percentage error (*MAPE*), and coefficient of determination (*R*<sup>2</sup>), which are defined as follows:

$$MAE = \frac{1}{N\_T} \sum\_{i=1}^{N\_T} |y\_i - \hat{y}\_i| \tag{34}$$

$$RMSE = \sqrt{\frac{1}{N\_T} \sum\_{i=1}^{N\_T} (y\_i - \hat{y}\_i)^2} \tag{35}$$

$$MAPE = \frac{1}{N\_T} \sum\_{i=1}^{N\_T} \left|\frac{y\_i - \hat{y}\_i}{y\_i}\right| \tag{36}$$

$$R^2 = 1 - \frac{\sum\_{i=1}^{N\_T} (y\_i - \hat{y}\_i)^2}{\sum\_{i=1}^{N\_T} (y\_i - \bar{y})^2} \tag{37}$$

where *N<sub>T</sub>* denotes the number of samples used for testing, *y<sub>i</sub>* and *ŷ<sub>i</sub>* denote the actual and predicted values of the output variable, respectively, and *ȳ* denotes the mean value of the actual output variable. A good prediction model has small *MAE*, *RMSE*, and *MAPE* and a large *R*<sup>2</sup>.
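For reference, the four indices of Equations (34)–(37) can be computed with a short NumPy helper such as the one below. The function name and the dictionary return format are choices made for this sketch; *MAPE* is returned as a fraction, exactly as Equation (36) is written, and is undefined if any actual value is zero.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE and R^2 as defined in Eqs. (34)-(37)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true))  # undefined if a true value is zero
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

For example, with actual values [700, 710, 720, 730] and predictions [702, 709, 719, 732], the helper gives *MAE* = 1.5 and *R*<sup>2</sup> = 0.98.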

Before establishing the JITL-TWRELM model, four parameters need to be determined. By trial-and-error experiments on dataset D1, *N* was set to 200, which gives good prediction accuracy without increasing the computational burden. Similarly, the parameters *ϕ* and *L* were set to 0.3 and 20, respectively. Table 5 shows the prediction accuracy of the model under different regularization coefficients. It can be seen that the model performs better when *C* = 150.


**Table 5.** Comparison of the modeling accuracy with *C*.
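
A trial-and-error search over the regularization coefficient can be sketched as below. The sketch uses a plain (unweighted) regularized ELM with the standard solution β = (HᵀH + I/C)⁻¹Hᵀy, not the triple-weighted model of this paper. It also assumes that *L* is the number of hidden nodes and uses a sigmoid activation with uniformly drawn hidden parameters; the function names and the candidate list passed to `sweep_C` are hypothetical.

```python
import numpy as np

def fit_relm(X, y, n_hidden, C, rng):
    """Fit a basic regularized ELM: random hidden layer, ridge-type output weights."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid hidden outputs
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ y)
    return W, b, beta

def predict_relm(model, X):
    """Predict with a model returned by fit_relm."""
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

def sweep_C(X_train, y_train, X_val, y_val, candidates, n_hidden=20, seed=0):
    """Return the validation RMSE for each candidate regularization coefficient."""
    results = {}
    for C in candidates:
        rng = np.random.default_rng(seed)  # same random hidden layer for every C
        model = fit_relm(X_train, y_train, n_hidden, C, rng)
        resid = y_val - predict_relm(model, X_val)
        results[C] = float(np.sqrt(np.mean(resid ** 2)))
    return results
```

Re-seeding the generator for each candidate fixes the random hidden layer, so differences in the reported RMSE come from *C* alone rather than from the random initialization.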

#### *4.3. Results and Discussion*

To reduce the effect of randomness on the results, we took the average of ten tests as the final result. The prediction error indices of the six methods on the two groups of testing samples are shown in Table 6. Taking testing dataset D1 as an example, the proposed method (method 6) performed better than the other five methods on all four indices. Despite using the JITL strategy, the original method (method 1) had the worst performance on all indices. Methods 2, 3, 4, and 5 all achieved higher prediction accuracies than method 1, because method 1 uses neither sample weights nor variable weights. Method 2 emphasizes the importance of the samples and introduces sample weights to reflect the effects of different samples on the output. In contrast to the original JITL strategy, method 3 uses a weighted similarity measurement criterion, so samples that are more similar to the query sample are selected to build the local model, resulting in a more accurate prediction. Unlike the previous methods, method 4 considers the local variable weights before establishing the model; the variable weights increase the influence of output-related variables and reduce that of irrelevant variables in feature extraction [33]. Although methods 2 to 4 improve the prediction accuracy, each considers only one type of weighting strategy, such as sample weights or variable weights alone. Method 5 therefore combines the sample weights with the WJITL strategy, and its *R*<sup>2</sup> improves from 0.89453 to 0.97690 compared with method 1. Method 6, which builds on methods 4 and 5, has the smallest *MAE*, *RMSE*, and *MAPE* and the highest *R*<sup>2</sup> among all methods; its *R*<sup>2</sup> improves from 0.97690 to 0.98764 compared with method 5. The proposed method 6 also has good prediction accuracy on D2, where its *R*<sup>2</sup> reached 0.97427, which is 0.072 higher than that of method 1.
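
As a quick check of the *R*<sup>2</sup> figures quoted for D1 and D2, the differences work out as follows. All values are taken from the text; the D2 value for method 1 is only implied by the stated difference of 0.072 and is therefore approximate.

$$\Delta R^2_{1 \to 5} = 0.97690 - 0.89453 = 0.08237, \qquad \Delta R^2_{5 \to 6} = 0.98764 - 0.97690 = 0.01074$$

$$R^2_{\text{method 1, D2}} \approx 0.97427 - 0.072 \approx 0.902$$

In terms of the unexplained fraction 1 − *R*<sup>2</sup>, the step from method 5 to method 6 on D1 lowers it from 0.02310 to 0.01236.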


**Table 6.** The indices of the six methods on the two groups of testing datasets.

To demonstrate the performances of these six methods more intuitively, the detailed prediction results for each method on D1 and D2 are shown in Figures 4 and 5, in which panels (a–f) show the prediction results of the six methods, respectively. The prediction of JITL-TWRELM matches the curve of the measured furnace temperature well, while the prediction curve of JITL-RELM fails to track the actual output curve for some samples. In addition, although the other four methods bring certain improvements, they still do not achieve the desired effect. In summary, the flexibility and effectiveness of the proposed method are validated.

**Figure 4.** The detailed prediction results of six methods on D1; (**a**) method 1; (**b**) method 2; (**c**) method 3; (**d**) method 4; (**e**) method 5; (**f**) method 6.

**Figure 5.** The detailed prediction results of six methods on D2; (**a**) method 1; (**b**) method 2; (**c**) method 3; (**d**) method 4; (**e**) method 5; (**f**) method 6.

The liquid aluminum temperature of the regenerative smelting furnace is generally controlled by feedback. The thermocouple is set in the furnace chamber, and if the detected temperature is lower than the set value, the regenerative burner starts to work. On a real industrial site, the performance of the thermocouple used to measure the temperature of the aluminum liquid is often affected by voltage fluctuations and the aging of the protective jacket. Old thermocouples need to be replaced frequently, which increases costs. The method proposed in this paper only requires a historical database to be established on the industrial site; whenever a new query sample arrives, the modeling samples are selected from the historical database, a local model is built, and the prediction of the aluminum liquid temperature is obtained. As can be seen from Table 6, the *MAE*s of the proposed method 6 are 14.7273 and 14.8733 for the two test sets, respectively. Compared with the temperature range and measurement error of the thermocouple in Table 4, the accuracy of the proposed soft sensor model is close to that of the actual sensor, with an error close to 2% at the maximum temperature measurement range, while its efficiency and cost are more favorable than those of the sensor. Therefore, the method proposed in this paper is significant for reducing production costs and improving product quality.

#### **5. Conclusions**

This paper deals with the estimation of the liquid aluminum temperature in the regenerative aluminum smelting furnace. A JITL-TWRELM soft sensor modeling method is proposed. In this method, both the sample similarities and the variable correlations are considered in RELM to deal with the differences between samples and variables. Each modeling sample is assigned a weight according to the similarity calculation, and each dimension of the sample is assigned a corresponding weight according to the correlation analysis, which improves the modeling accuracy compared with the original RELM. Furthermore, a weighted similarity measurement criterion is proposed for JITL to select similar samples for local modeling. Compared with the original JITL strategy, more similar modeling samples are selected for each query sample, enhancing the accuracy and reliability of the local modeling dataset. The flexibility and effectiveness of JITL-TWRELM were validated on the industrial aluminum smelting process. The industrial application shows that the proposed method can effectively deal with the nonlinear and time-varying problems in the regenerative aluminum smelting process and achieves a higher accuracy of temperature prediction than the other five methods.

In the proposed method, the model is updated once for every query sample, although some adjacent query samples do not require such frequent updates. Selectively updating the model would improve the modeling efficiency. Therefore, developing a selective update strategy will be the focus of future work.

**Author Contributions:** Data curation, Y.L.; methodology, J.D.; supervision, J.D.; validation, X.C.; writing—original draft, X.C.; writing—review and editing, Y.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### **References**

