Prediction of NOx Emissions in Thermal Power Plants Using a Dynamic Soft Sensor Based on Random Forest and Just-in-Time Learning Methods

He, Kaixun; Ding, Haixiao

doi:10.3390/s24144442


by Kaixun He * and Haixiao Ding

College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China

* Author to whom correspondence should be addressed.

Sensors 2024, 24(14), 4442; https://doi.org/10.3390/s24144442

Submission received: 20 June 2024 / Revised: 6 July 2024 / Accepted: 8 July 2024 / Published: 9 July 2024

(This article belongs to the Topic AI and Data-Driven Advancements in Industry 4.0)


Abstract

Combustion optimization is an effective way to improve the efficiency of thermal power generation and reduce carbon and NOx emissions. Real-time and precise NOx emission prediction is the basis for combustion optimization control of thermal power plants. To construct an accurate NOx concentration prediction model, a novel just-in-time learning (JITL) method based on random forest (RF) is proposed in the present work. With this method, first, an improved permutation importance algorithm is proposed to extract important variables. In addition, a similarity index that incorporates temporal and spatial measures is defined to select a local training set representative of the process data. Moreover, considering the influence of model parameters on prediction performance under different working conditions, a process monitoring method based on a moving window (MW) is used to monitor the change in working conditions and guide online updating. The experimental results show that the proposed method has excellent prediction accuracy, with a coefficient of determination of 0.9319, a root-mean-square error of 3.6960 mg/m³, and an average absolute error of 2.7718 mg/m³ on the test set, making it superior to other traditional methods.

Keywords:

just-in-time learning; sample selection; random forest; thermal power plant; soft sensor; NOx

1. Introduction

With the increasing attention paid to environmental protection, reducing carbon and toxic gas emissions in flue gas while improving boiler efficiency has become an important and urgent problem for thermal power plants [1,2,3,4]. Over recent decades, the reliability of power grid supply has predominantly relied on thermal power generation, with coal-fired thermal power plants contributing approximately 60% of national electricity in China [5]. Furthermore, the integration of intermittent renewable energy sources such as wind power into the grid hinges upon the peak-regulating capabilities of thermal power units [6]. Consequently, thermal power generation is poised to maintain its pivotal role over the long term. Coal combustion produces substantial quantities of toxic gases, notably NOx, a primary contributor to atmospheric pollution [7,8]. As environmental regulations tighten, coal-fired power plants face progressively stricter NOx emission standards for flue gases. Currently, China's laws stipulate that NOx emissions in flue gas should not exceed 50 mg/m³ [9], which poses a great challenge to the optimal control of power plant boilers and flue gas systems.

Accurate real-time detection of NOx concentration directly affects the amount of ammonia injected into the denitrification device, which is crucial for improving the efficiency of the selective catalytic reduction (SCR) and is the basis for boiler combustion optimization control [10]. The main chemical reaction equations of the denitration process are shown in Equations (1) and (2) [11].

NO + NO_{2} + 2 NH_{3} \to 2 N_{2} + 3 H_{2}O

(1)

4 NO + 4 NH_{3} + O_{2} \to 4 N_{2} + 6 H_{2}O

(2)

As these reactions indicate, the amount of injected ammonia must match the NOx concentration: insufficient injection allows NOx to escape, whereas excessive injection causes ammonia slip and wastes reagent. To deal with such problems, continuous emission monitoring systems (CEMSs) are widely used to track and monitor NOx concentration. Unfortunately, CEMS measurements are greatly affected by environmental disturbances, resulting in inadequate detection accuracy and considerable lag [12]. In view of this challenge, the construction of a high-precision real-time NOx prediction model is a necessary supplement to existing detection methods [13].

Prediction strategies for NOx concentration mainly include model-based and data-based methods. Model-based methods offer better interpretability but require a large amount of expert knowledge. Moreover, the combustion process of thermal power plant boilers is highly complicated, making it difficult to establish accurate models for describing the process [14]. With the rapid development and application of distributed control systems (DCSs) in recent years, large amounts of industrial process data can be collected. As a result, data-driven methods have gradually emerged as a viable alternative. Compared with model-based methods, data-driven strategies bypass the need to solve complex conservation equations, resulting in faster and more reliable model responses; hence, they have received widespread attention.

For example, Wang et al. [15] employed deep belief networks to extract data features and constructed networks consisting of an extreme learning machine (ELM), a backpropagation neural network (BPNN), and a radial basis function to predict NOx concentration. Jacob et al. [16] developed a combustion optimization system based on neural networks and particle swarm optimization, which effectively reduced the emission of NOx from power plants. Yang et al. [17] established a NOx prediction model based on a long short-term memory (LSTM) neural network, which exhibited better performance than least-squares support vector machine (LSSVM) and recurrent neural network models. Yuan et al. [18] proposed a NOx emission prediction method using linear regression as the metamodel and adopted BPNN, support vector regression, and decision tree as the basic models. The proposed stacked-generalization ensemble method demonstrated strong robustness and generalization capability. Li et al. [19] proposed a novel model architecture composed of a convolutional neural network and an effective channel attention module, which demonstrated good performance in predicting NOx emissions. Korpela et al. [20] compared the performance of three nonlinear methods (multilayer perceptron, support vector regression, and fuzzy inference system) in predicting NOx concentration in natural gas boilers. Xie et al. [21] applied the sequence-to-sequence structure from the field of natural language processing to the LSTM model, achieving simultaneous prediction of NOx emissions at multiple time points.

In addition, to reduce the impact of redundant variables on the performance of the prediction model, many variable selection methods have been incorporated. Xing et al. [22] utilized partial least squares (PLS) for variable selection and established an extreme gradient boosting ensemble model to predict the NOx emission concentration of coal-fired boilers. Tang et al. [23] employed mutual information combined with autoencoders and an ELM to deeply explore the relationship between NOx emission concentration and features. Wang et al. [24] utilized random forest (RF) to calculate the importance of variables and select the important process variables. Based on neural networks, Zhang et al. [25] employed the mean impact value to select variables and achieved multiobjective prediction of boiler thermal efficiency and NOx emission coupling. Tang et al. [12] utilized the LASSO and relief feature selection algorithms to select the important variables. They proposed an error correction strategy and established an ELM model, which can accurately predict NOx concentration at the boiler outlet. LSSVM [26,27], convolutional neural networks [28,29,30,31,32], and ensemble learning methods [33,34,35,36] have also been widely used by scholars in NOx concentration prediction.

Although the above studies achieved good results under certain scenarios, they hardly meet the need for timely updates in online applications [3,37]. To address this issue, Lv et al. [38] improved LSSVM and updated the model based on an incremental strategy. Li et al. [14] proposed a variable exponentially weighted MWPLS method, which can be adaptively updated by adjusting the window size. Lara F. A. [39] used the T² and Q statistics of principal component analysis (PCA) to determine whether a model needs to be recalibrated. Lv et al. [40] proposed an adaptive strategy by updating the operating dataset when the model's performance deteriorates. The above methods have achieved good results in addressing specific problems. However, their model parameters are difficult to adjust, so they are difficult to apply directly to the real-time prediction of NOx concentration in thermal power plants.

Recently, just-in-time learning (JITL) [41,42] has seen widespread adoption due to its intrinsic characteristics favoring online implementation. Instead of building a global model, JITL creates a local model in real time: using the currently measured input data, similar samples are retrieved from the database and used for modeling. Considering the advantages of JITL, an improved scheme based on JITL for online prediction of NOx emissions is proposed in this work. With the proposed strategy, a supervised similarity distance measurement method is defined to adaptively select important training samples from the original dataset. RF regression is adopted to establish a prediction model, and an active updating strategy is proposed to maintain the model online. Moreover, to establish a robust model, a variable selection method is proposed that enables the robust selection of important variables. The industrial application results show that the presented strategy can provide good prediction accuracy and is suitable for long-term industrial applications.

This paper is organized as follows. Preliminaries about JITL and RF are provided in Section 2. Section 3 presents the proposed modeling method. Section 4 describes the target boiler and the data used in this work. Section 5 discusses the experiments and results obtained using real-world data. Finally, the conclusions are provided in Section 6.

2. Preliminaries

The key to predicting NOx concentration is to establish a learner

f (\cdot)

with historical data. Subsequently, the auxiliary variable

X_{q}

is substituted into the formula

\overset{⌢}{y} = f (X_{q}) + ε_{q}

to obtain the prediction value. In this work, JITL and RF are combined to develop a local prediction model. To elucidate the proposed approach, this section outlines the foundational principles of JITL and RF.

2.1. JITL Method

JITL is a dynamic modeling framework in which all historical data are stored in a database, and a model is built in real time by searching the database for the samples most relevant to the query sample

X_{q}

by a certain similarity index. After prediction, the established model is discarded. For the regression problem, the most common metric used to measure the similarity between

X_{q}

and the historical sample

X_{i}

is the Mahalanobis distance, which is defined as follows [43]:

d_{M} (X_{i}, X_{q}) = \sqrt{(X_{i} - X_{q}) Σ^{- 1} {(X_{i} - X_{q})}^{T}}

(3)

where

X = \{X_{1}, \dots, X_{i}, \dots, X_{n}\} \in R^{n \times m}

is the historical database,

i = 1, 2, \dots, n

indexes its samples, and Σ is the covariance matrix of X. Based on the Mahalanobis distance values, the similarity between

X_{q}

and each sample in the historical data can be calculated using the Gaussian kernel function, as follows:

S_{i, q} = \exp (- \frac{d_{M}^{2} (X_{i}, X_{q})}{2 σ^{2}})

(4)

where

σ

is the kernel width. In accordance with

S_{i, q}

, the first l samples with high similarity are selected from the historical data to construct the local training set. Then, the output prediction of the query sample

X_{q}

is given by:

{\overset{⌢}{y}}_{q} = f (X_{q}, Θ)

(5)

where

Θ

is the hyperparameter of the model

f (\cdot)

.
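The selection step in Equations (3)–(5) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`covariance`, `solve`, `mahalanobis`, `select_local_set`), the default kernel width, and the small Gauss-Jordan solver used instead of an explicit matrix inverse are all choices made for this example.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equally long sample rows."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(m)] for a in range(m)]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and non-singular.
    """
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def mahalanobis(x_i, x_q, cov):
    """Equation (3): sqrt((x_i - x_q) cov^{-1} (x_i - x_q)^T)."""
    d = [a - b for a, b in zip(x_i, x_q)]
    z = solve(cov, d)  # z = cov^{-1} d, without forming the inverse
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


def select_local_set(X, x_q, l, sigma=1.0):
    """Return the indices of the l historical samples most similar to x_q."""
    cov = covariance(X)
    sims = []
    for x_i in X:
        d = mahalanobis(x_i, x_q, cov)
        sims.append(math.exp(-d * d / (2 * sigma ** 2)))  # Equation (4)
    order = sorted(range(len(X)), key=lambda i: sims[i], reverse=True)
    return order[:l]
```

The returned indices identify the local training set; a regression model fitted on those samples then gives the prediction in Equation (5) and is discarded afterwards.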

2.2. RF Regression

RF is an ensemble learning algorithm capable of generating numerous decision trees that serve as regression learners. The final prediction result of RF is derived from the mean value of all trees. The structure of RF is shown in Figure 1. Classification and regression tree (CART) is commonly employed as a decision tree for RF. Given a single output dataset

D = \{X_{train}, Y_{train}\}

, where

X_{train} = (X_{1}, X_{2}, \dots, X_{N}) \in R^{N \times M}

,

Y_{train} = (y_{1}, y_{2}, \dots, y_{N}) \in R^{N \times 1}

, and N is the number of samples, along with the decision tree algorithm

Γ

and the number of base learners T, the steps for constructing a regression model using RF are as follows:

Step 1: For each decision tree t, n samples are randomly collected from D to construct a subtraining set

D_{t}

using the bagging method, where

t \in [1, T]

.

Step 2: The tth learner

h_{t} = Γ (D_{t})

is trained using

D_{t}

. During the training process, m features (

m < M

) are randomly selected at each node. The optimal split is then chosen among these m features to divide the node into subtrees.

Step 3: During the formation of the decision tree, each node should be split in accordance with Step 2. The decision tree is trained with this subset until it is no longer possible to split, and the tree is not pruned.

Step 4: In accordance with Steps 1–3, a series of decision trees are established until T trees are trained.

During the application stage, the test samples are sent to each decision tree for regression prediction, and the prediction results of all decision trees for the same sample are counted. The average value of all results is used as the final predicted value, which can be expressed as follows:

H = \frac{1}{T} \sum_{t = 1}^{T} h_{t} (X)

(6)

The RF algorithm is well suited for modeling complex processes due to its effective utilization of randomness, which reduces inter-tree correlations. Therefore, in this work, RF regression is adopted as the basic modeling method.
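The bagging and averaging parts of the procedure (Step 1, Step 4, and Equation (6)) can be illustrated with a small sketch. It is deliberately simplified and is not a full RF: a one-split regression stump on a single input stands in for CART, so the per-node feature subsampling of Step 2 is not shown. The names `fit_stump`, `fit_bagged`, and `predict` are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on a 1-D input (stand-in for CART)."""
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        if not right:  # no samples on the right side: not a valid split
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # all inputs identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr


def fit_bagged(xs, ys, n_trees, seed=0):
    """Train n_trees base learners, each on a bootstrap sample (Step 1)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict(trees, x):
    """Equation (6): the ensemble output is the mean of all tree outputs."""
    return sum(h(x) for h in trees) / len(trees)
```

Because each base learner sees a different bootstrap sample, the individual predictions differ, and averaging them reduces the variance of the final estimate.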

3. Proposed Method

Generally, the representativeness of the auxiliary variables determines the upper limit of model performance. Selecting important variables helps avoid the curse of dimensionality and mitigates overfitting. With RF, important variables are typically identified using permutation importance. The basic idea is as follows: for each variable

x^{j}

,

j \in [1, M]

, the values of

x^{j}

in the out-of-bag (OOB) data are randomly shuffled, which breaks the relationship between this auxiliary variable and the output variable. Subsequently, each CART performs regression predictions on its OOB data both before and after shuffling, and the resulting mean square errors (MSEs) of all decision trees collectively determine the variable importance index of

x^{j}

.

Given an RF with T decision trees

H = {h_{1}, h_{2}, \dots, h_{t}, \dots, h_{T}}

, where

t \in [1, T]

, the variable importance of

x^{j}

is defined as follows:

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\overset{⌢}{y}}_{i})}^{2}

(7)

{\bar{D}}_{j} = \frac{1}{T} \sum_{t = 1}^{T} (MSE_{t, v}^{OOB} - MSE_{t}^{OOB})

(8)

where

MSE_{t}^{OOB}

and

MSE_{t, v}^{OOB}

are the MSEs of the OOB data on tree t before and after permutation, respectively, and

{\bar{D}}_{j}

is the variable importance of

x^{j}

. The permutation importance algorithm of RF is presented in Algorithm 1.

Algorithm 1: The permutation importance algorithm of RF

Based on the above permutation importance algorithm, determining the number of crucial variables to retain is not straightforward. In practice, threshold values are frequently empirically determined to facilitate variable selection, rendering the process notably subjective. To deal with this problem, this work proposes an iterative selection strategy, which determines the variable set to be retained by counting the frequency of selected variables in multiple runs. The specific steps of the improved variable selection method based on permutation importance are as follows:

Step 1: A regression model is established using each decision tree

h_{t}

, the values of the corresponding OOB data are predicted, and

MSE_{t}

is output.

Step 2: For each

OOB_{t}

set, the values of variable

x^{j}, j \in [1, M]

, are shuffled to obtain a new set; then,

MSE_{t}^{j}

is calculated using the model established in Step 1.

Step 3: The prediction influence factor of

x^{j}

on tree

h_{t}

is calculated using the following equation:

vi_{t}^{j} = MSE_{t}^{j} - MSE_{t}

(9)

Step 4: Steps 1–3 are repeated for all decision trees to obtain the influence factors of

x^{j}

on all decision trees.

Step 5: The total influence factor of

x^{j}

on the RF is calculated as follows:

VI_{RF}^{j} = \frac{1}{T} \sum_{t = 1}^{T} vi_{t}^{j}

(10)

Step 6: Steps 1–5 are repeated for all variables in

X_{train}

to acquire the influence factors

VI_{RF}

of all variables.

Step 7: The variables are sorted in descending order of

VI_{RF}^{j}

.

Step 8: Steps 1–7 are repeated ℓ times, and the variables that appear most frequently in the top K are retained.

Step 9: In each round, the 10% of the retained variables with the lowest frequency are eliminated, and the OOB error of the model built on the remaining variables is calculated.

Step 10: The subset of variables corresponding to the minimum OOB error is selected.
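Steps 2–8 can be sketched as follows. This is a simplified illustration under stated assumptions: `predict` is a fixed stand-in for a trained model, whereas the procedure above trains an RF in every run and evaluates each tree on its own OOB data. Only the shuffling and frequency counting are shown, and Steps 9–10 (backward elimination by OOB error) are omitted. All function names are hypothetical.

```python
import random
from collections import Counter


def mse(y_true, y_pred):
    """Mean square error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, rng):
    """MSE increase per variable after shuffling its column, as in Eq. (9)."""
    base = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between variable j and the output
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(mse(y, [predict(row) for row in X_perm]) - base)
    return scores


def top_k_frequency(predict, X, y, runs, k, seed=0):
    """Count how often each variable ranks in the top k over several runs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        scores = permutation_importance(predict, X, y, rng)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j],
                        reverse=True)
        counts.update(ranked[:k])
    return counts
```

Counting top-K appearances over many runs replaces a hand-picked importance threshold with a frequency that is less sensitive to the randomness of any single run.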

3.1. Strategy for Local Training Sample Selection

In JITL, a crucial initial task involves identifying local training samples that closely resemble the query sample. To implement this operation, the distance

d_{M_{i}}

between the query sample

X_{q}

and the ith historical sample

X_{i}

is defined first, and then local training samples are selected based on

d_{M_{i}}

. As mentioned above, the Mahalanobis distance is generally used to define the similarity for regression modeling. The Mahalanobis distance considers the spatial characteristics of auxiliary variables but ignores the relationship between dependent variables.

Due to the strong time sequence characteristics inherent in thermal power plant process data, a more representative similarity index can be constructed by introducing the dependent variable information of the adjacent time-domain samples. Accordingly, we propose an improved similarity index by simultaneously considering the temporal and spatial characteristics of the independent and dependent variables.

Using the proposed scheme, the Mahalanobis distance

d_{M_{i}}

between

X_{i}

and

X_{q}

is calculated for all samples in the historical dataset. The average NOx concentration of the samples in the window of length

Mw

preceding

X_{q}

is then calculated. The distance of the dependent variable is defined as follows:

d_{y} (y_{i}, {\bar{y}}_{local}) = \sqrt{{(y_{i} - {\bar{y}}_{local})}^{2}}

(11)

where

y_{i}

is the NOx concentration of the ith historical sample and

{\bar{y}}_{local}

is the average concentration within the preceding

Mw

window.

In this way, the distance

d_{M}

defined by the independent variable and the distance

d_{y}

defined by the temporal data and dependent variables are obtained. The final similarity can be calculated through the following equations:

d_{sim} (X_{i}, X_{q}) = λ d_{M} + (1 - λ) d_{y}

(12)

S_{i, q} = \exp (- \frac{d_{sim}^{2} (X_{i}, X_{q})}{2 σ^{2}})

(13)

where

0 \leq λ \leq 1

is a conversion factor; when

λ = 1

, the similarity degenerates to a method based solely on the Mahalanobis distance.

In accordance with the above method, L samples with high similarity can be selected from the historical data to construct a local training set

D_{l}

. The local training set selected in this way may still contain high-leverage points, which degrade the performance of the constructed prediction model. Therefore, at the end of sample selection, an F distribution is used to eliminate sample points with low confidence. The specific operations are described below.

The mean value of the independent variables in

D_{l}

is calculated using Equation (14). If

d_{M} (X_{l}, u) > F_{α} (M, L)

, then the sample

{X_{l}, Y_{l}}

is deleted from

D_{l}

, where

α

is the quantile; M and L are the number of independent variables and the number of samples in

D_{l}

, respectively. For ease of description, the final selected local sample set is still represented by

D_{l}

.

u = mean (X_{local})

(14)

The pseudocode for local training sample selection based on the proposed similarity definition method is shown in Algorithm 2.

Algorithm 2: The algorithm for local training sample selection
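The core of the selection in Equations (11)–(13) can be sketched as below. This is a minimal sketch, not the published Algorithm 2: it takes the Mahalanobis distances as a precomputed input, leaves out the F-distribution filter for high-leverage points, and the names `combined_similarity` and `select_local`, along with the default kernel width, are assumptions.

```python
import math


def combined_similarity(d_m, y_hist, y_recent, lam, sigma=1.0):
    """Similarity of each historical sample to the query sample.

    d_m      -- Mahalanobis distances d_M(X_i, X_q), one per historical sample
    y_hist   -- NOx values y_i of the historical samples
    y_recent -- NOx values in the window preceding the query sample
    lam      -- weight between the two distances, 0 <= lam <= 1
    """
    y_local = sum(y_recent) / len(y_recent)  # mean over the preceding window
    sims = []
    for dm, yi in zip(d_m, y_hist):
        d_y = abs(yi - y_local)                  # Equation (11)
        d_sim = lam * dm + (1.0 - lam) * d_y     # Equation (12)
        sims.append(math.exp(-d_sim ** 2 / (2 * sigma ** 2)))  # Equation (13)
    return sims


def select_local(d_m, y_hist, y_recent, lam, n_select, sigma=1.0):
    """Indices of the n_select most similar historical samples."""
    sims = combined_similarity(d_m, y_hist, y_recent, lam, sigma)
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order[:n_select]
```

With `lam = 1.0` the ranking depends on the Mahalanobis distance alone; lowering `lam` favors historical samples whose NOx value is close to the recent local mean.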

3.2. Process Condition Monitoring and Parameter Updating Strategy

The thermal power generation process has typical multicondition characteristics. Under a fixed working condition, a high-performance prediction model can be constructed using JITL. However, after the working condition switches, the local training set constructed according to Equation (12) may no longer represent the current process. To address this problem, moving-window PCA (MWPCA) is adopted to monitor the production process. Then,

λ

is updated promptly in accordance with the monitoring results. The specific process of the algorithm is described below.

The moving window length is denoted as L, and the data matrix within the window at time t is denoted as Matrix I:

X_{t} = {(x_{t - L + 1}, x_{t - L + 2}, \dots, x_{t - 1}, x_{t})}^{T} \in R^{L \times M}

. At time t + 1, the data matrix within the window is denoted as Matrix III:

X_{t + 1} = {(x_{t - L + 2}, x_{t - L + 3}, \dots, x_{t}, x_{t + 1})}^{T} \in R^{L \times M}

, and the data matrix corresponding to the transition window is denoted as Matrix II:

{\tilde{X}}_{t, t + 1} = {(x_{t - L + 2}, x_{t - L + 3}, \dots, x_{t - 1}, x_{t})}^{T} \in R^{(L - 1) \times M}

. The specific steps are as follows:

Step 1: Transition from Matrix I to Matrix II:

μ_{t} \in R^{M \times 1}

and

C_{t} \in R^{M \times M}

are set as the mean vector and covariance matrix of Matrix I, respectively. The mean vector

{\tilde{μ}}_{t, t + 1}

and standard deviation

δ_{t, t + 1}^{i} (i = 1, 2, \dots, M)

of the variables for Matrix II can be determined using Equations (15) and (16), respectively.

{\tilde{μ}}_{t, t + 1} = \frac{L}{L - 1} μ_{t} - \frac{1}{L - 1} x_{t - L + 1}

(15)

{(δ_{t, t + 1}^{i})}^{2} = \frac{L - 1}{L - 2} {(δ_{t}^{i})}^{2} - \frac{L - 1}{L - 2} {(μ_{t}^{i} - {\tilde{μ}}_{t, t + 1}^{i})}^{2} - \frac{1}{L - 2} ‖ x_{t - L + 1}^{i} - μ_{t}^{i} ‖^{2}

(16)

Σ_{t, t + 1} = diag (δ_{t, t + 1}^{1}, δ_{t, t + 1}^{2}, \dots, δ_{t, t + 1}^{M})

(17)

The covariance matrix

{\tilde{C}}_{t, t + 1} \in R^{M \times M}

of Matrix II can be derived from the above formulas and is shown below:

{\tilde{C}}_{t, t + 1} = \frac{L - 1}{L - 2} (C_{t} - \frac{1}{L - 1} x_{t - L + 1} x_{t - L + 1}^{T} - Σ_{t}^{- 1} (μ_{t} - {\tilde{μ}}_{t, t + 1}) {(μ_{t} - {\tilde{μ}}_{t, t + 1})}^{T} Σ_{t}^{- 1})

(18)

where

Σ_{t} = diag (δ_{t}^{1}, δ_{t}^{2}, \dots, δ_{t}^{M})

is the diagonal matrix composed of the standard deviations of each variable in Matrix I.

Step 2: Transition from Matrix II to Matrix III:

Similar to Step 1, here, the mean vector

μ_{t + 1}

of Matrix III and the standard deviation

δ_{t + 1}^{i} (i = 1, 2, \dots, M)

of each variable can be recursively calculated.

μ_{t + 1} = \frac{L - 1}{L} {\tilde{μ}}_{t, t + 1} + \frac{1}{L} x_{t + 1}

(19)

{(δ_{t + 1}^{i})}^{2} = \frac{L - 2}{L - 1} {(δ_{t, t + 1}^{i})}^{2} + {(μ_{t + 1}^{i} - {\tilde{μ}}_{t, t + 1}^{i})}^{2} + \frac{1}{L - 1} ‖ x_{t + 1}^{i} - μ_{t + 1}^{i} ‖^{2}

(20)

Σ_{t + 1} = diag (δ_{t + 1}^{1}, δ_{t + 1}^{2}, \dots, δ_{t + 1}^{M})

(21)

The new samples are standardized as follows:

x_{t + 1} = Σ_{t + 1}^{- 1} (x_{t + 1} - μ_{t + 1})

(22)

The covariance matrix

C_{t + 1} \in R^{M \times M}

of Matrix III can be derived from the above formulas.

C_{t + 1} = \frac{L - 2}{L - 1} {\tilde{C}}_{t, t + 1} + \frac{1}{L - 1} x_{t + 1} x_{t + 1}^{T} + Σ_{t + 1}^{- 1} (μ_{t + 1} - {\tilde{μ}}_{t, t + 1}) {(μ_{t + 1} - {\tilde{μ}}_{t, t + 1})}^{T} Σ_{t + 1}^{- 1}

(23)

Substituting Equation (18) into Equation (23) yields the expression for transitioning from Matrix I to Matrix III as follows:

\begin{matrix} C_{t + 1} = C_{t} - Σ_{t}^{- 1} (μ_{t} - {\tilde{μ}}_{t, t + 1}) {(μ_{t} - {\tilde{μ}}_{t, t + 1})}^{T} Σ_{t}^{- 1} \\ - \frac{1}{L - 1} x_{t - L + 1} x_{t - L + 1}^{T} + \frac{1}{L - 1} x_{t + 1} x_{t + 1}^{T} \\ + Σ_{t + 1}^{- 1} (μ_{t + 1} - {\tilde{μ}}_{t, t + 1}) {(μ_{t + 1} - {\tilde{μ}}_{t, t + 1})}^{T} Σ_{t + 1}^{- 1} \end{matrix}

(24)

After the covariance matrix of the data in the new window is obtained, the corresponding principal component model can be obtained through singular value decomposition of the covariance matrix

C_{t + 1}

. The statistic

T^{2}

and the control limit for

T^{2}

with confidence level

α

can be determined using Equations (25) and (26), respectively.

T^{2} = x^{T} P Λ^{- 1} P^{T} x

(25)

T_{α}^{2} = \frac{A (L^{2} - 1)}{L (L - A)} F_{A, L - A; α}

(26)

where

Λ \in R^{A \times A}

is the diagonal matrix composed of the first A eigenvalues of the covariance matrix

C_{t + 1}

and

F_{A, L - A; α}

is the critical value of the F distribution with A and

L - A

degrees of freedom and a significance level of

α

.

When the

T^{2}

corresponding to the query sample is greater than the monitoring threshold, a working condition switch has occurred. If the monitoring statistic of the query sample does not exceed the monitoring threshold, the historical data in the moving window have characterized the current process well. In this case,

λ

can take a slightly smaller value, and vice versa.
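The recursive window statistics of Steps 1 and 2 can be checked numerically for a single variable. The sketch below follows the mean recursions in Equations (15) and (19) and the variance recursions in Equations (16) and (20). The squared deviation of the incoming sample is added in the last step, which is what a direct derivation of the sample variance gives. The function name `slide_window_stats` is hypothetical, and the covariance recursion and the PCA step are not covered.

```python
import statistics


def slide_window_stats(mu_t, var_t, x_oldest, x_new, L):
    """Update the mean and sample variance of one variable when the window
    of length L drops its oldest sample and takes in a new one."""
    # Matrix I -> Matrix II: remove the oldest sample.
    mu_mid = (L * mu_t - x_oldest) / (L - 1)                      # Eq. (15)
    var_mid = ((L - 1) * var_t - (L - 1) * (mu_t - mu_mid) ** 2
               - (x_oldest - mu_t) ** 2) / (L - 2)                # Eq. (16)
    # Matrix II -> Matrix III: add the newest sample.
    mu_new = ((L - 1) * mu_mid + x_new) / L                       # Eq. (19)
    var_new = ((L - 2) / (L - 1) * var_mid + (mu_new - mu_mid) ** 2
               + (x_new - mu_new) ** 2 / (L - 1))                 # Eq. (20)
    return mu_new, var_new


# Compare the recursion with a direct recomputation on the shifted window.
window = [1.0, 3.0, 5.0, 9.0]
mu, var = slide_window_stats(statistics.fmean(window),
                             statistics.variance(window),
                             window[0], 2.0, len(window))
shifted = window[1:] + [2.0]
print(mu, statistics.fmean(shifted))
print(var, statistics.variance(shifted))
```

The recursion costs a constant number of operations per variable, whereas recomputing the statistics from scratch costs time proportional to the window length.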

3.3. Overall Flow of the Proposed Modeling Strategy

Following the aforementioned variable selection and local training sample selection methods, the overall flow of the NOx prediction model based on JITL is presented in Figure 2, and the corresponding pseudocode is shown in Algorithm 3.

Algorithm 3: Just-in-time RF
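The overall loop can be outlined as follows. This is only a structural sketch under stated assumptions, not the published Algorithm 3: the callables `select_local`, `fit`, and `is_alarm` are hypothetical placeholders for the sample selection, RF training, and MWPCA monitoring steps, and the two λ values are illustrative defaults that do not come from the paper.

```python
def jit_predict_stream(queries, history, fit, select_local, is_alarm,
                       lam_normal=0.5, lam_alarm=0.8):
    """Predict each query with a freshly built local model.

    queries      -- iterable of query samples X_q
    history      -- historical database passed to select_local
    fit          -- callable: local training set -> model (callable on X_q)
    select_local -- callable: (history, X_q, lam) -> local training set
    is_alarm     -- callable: time index -> True if a condition change
                    has been flagged by the monitor
    """
    predictions = []
    for t, x_q in enumerate(queries):
        # A flagged condition change calls for a larger lambda.
        lam = lam_alarm if is_alarm(t) else lam_normal
        local_set = select_local(history, x_q, lam)
        model = fit(local_set)          # local model built on demand
        predictions.append(model(x_q))
        # The model is not kept: the next query builds its own.
    return predictions
```

Passing the three steps in as callables keeps the loop independent of how similarity, regression, and monitoring are implemented.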

4. Boiler System and Data Preparation

4.1. Description of the Boiler System

In this work, the once-through boiler of a 1030 MW ultra-supercritical coal-fired unit is adopted as the target object. The unit is a π-type boiler featuring balanced ventilation, ultra-supercritical parameters, primary reheat, a spiral furnace, solid slag discharge, and an open-air arrangement. The pulverizing system employs a medium-speed coal mill with positive-pressure direct cooling primary air. The combustion system adopts opposed firing on the front and rear walls, with low-NOx double adjustable air swirl burners and nozzles. The furnace dimensions are 64,500 mm in height, 33,128.7 mm in cross-sectional width, and 16,308.7 mm in depth. A flue gas-regulating baffle device is arranged at the bottom of the flue passage to distribute the flue gas, maintaining the reheat steam outlet temperature within the control load range. The flue gas is collected by the regulating baffle and then introduced into the SCR denitration device through the two tail flues. After denitration, the flue gas enters the air preheater. Figure 3 shows the overall structure of the boiler and SCR denitrification.

4.2. Data Description

The historical data used in this work were obtained from the DCS. A total of 5184 samples covering 72 h were collected at a sampling interval of 50 s. Abnormal samples were eliminated using the 3σ rule (Equation (27)), and 5174 samples were retained.

\bar{y} - 3 σ \leq y \leq \bar{y} + 3 σ

(27)

where

σ

and

\bar{y}

represent the standard deviation and mean value of NOx concentration, respectively. Among these data, 30% were randomly selected to construct the training set, about 60% were used as the test set, and the remaining 10% were utilized to construct the verification set. The original data contained 390 auxiliary variables, which were normalized as follows:

x = \frac{x - \bar{x}}{φ}

(28)

where

\bar{x}

and

φ

represent the mean and standard deviation values, respectively.
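The two preprocessing steps in Equations (27) and (28) can be sketched with the standard library. The function names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def three_sigma_filter(y):
    """Indices of samples whose output lies within mean +/- 3 sd (Eq. (27))."""
    mean = statistics.fmean(y)
    sd = statistics.stdev(y)  # sample standard deviation (assumption)
    return [i for i, v in enumerate(y)
            if mean - 3 * sd <= v <= mean + 3 * sd]


def standardize(column):
    """Zero-mean, unit-variance scaling of one auxiliary variable (Eq. (28))."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]
```

In practice, the mean and standard deviation used for scaling would be computed on the training data and reused for new samples, so that query samples are scaled consistently.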

To assess the performance of the mentioned models, three indices—root-mean-square error (RMSE), coefficient of determination (R²), and mean absolute error (MAE)—were adopted. Generally, the smaller the values of the RMSE and MAE, the higher the accuracy of the model. R² describes the explanatory ability; the closer its value to 1, the greater the explanatory ability of the model.

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\overset{⌢}{y}}_{i})}^{2}}

(29)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\overset{⌢}{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(30)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\overset{⌢}{y}}_{i} |

(31)

where n is the number of test samples;

y_{i}

and

{\overset{⌢}{y}}_{i}

represent the actual and predicted values, respectively; and

\bar{y}

denotes the mean of the actual values.
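For reference, the three indices in Equations (29)–(31) can be computed as in the following minimal sketch; the function names are chosen for this example.

```python
import math


def rmse(y, y_hat):
    """Root-mean-square error, Equation (29)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    """Coefficient of determination, Equation (30)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def mae(y, y_hat):
    """Mean absolute error, Equation (31)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)
```

RMSE penalizes large errors more heavily than MAE, so reporting both shows whether a model's error is dominated by a few large deviations.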

5. Case Study

5.1. Parameter Tuning

Two parameters should be adjusted when constructing a model using RF: the number of leaf nodes, nLeaf, and the number of decision trees, nTree. In this work, these parameters are adjusted based on the OOB error on the training set, and the values that minimize the OOB error are regarded as optimal. As illustrated in Figure 4, the MSE curve is lowest when nLeaf is 2. Once the number of decision trees exceeds 50, the MSE of the OOB samples hardly decreases further. Therefore, in this work, nLeaf and nTree are set to 2 and 50, respectively.

5.2. Results and Discussion

In accordance with the variable selection method mentioned above, with ℓ = 100 and K = 50, the RF model was run 100 times independently. Subsequently, the variables that most frequently entered the top 50 based on their importance index were tallied. Finally, a total of 38 variables were selected. Figure 5 shows the selection results.

To improve adaptability to the multicondition process, MWPCA was employed to determine whether the parameter λ needed to be updated. Experimentally, the window size was set to Mw = 200, and the T² statistic was checked over 20 consecutive samples. When the T² control limit was exceeded 15 or more times, an alarm signal was generated, indicating that the process conditions had changed and that λ had to be updated. Figure 6 illustrates the monitoring results of MWPCA on the test set (the thick red line represents the T² control limit for each window, the blue line represents the calculated T², and the red circles mark the update times of the model).
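The alarm rule can be sketched as follows. The sketch makes an assumption the text does not spell out: the 20 checks are read as a sliding window over the most recent samples. The function name `alarm_flags` and its parameter names are hypothetical.

```python
from collections import deque


def alarm_flags(t2_values, limits, horizon=20, min_exceed=15):
    """Flag samples at which at least min_exceed of the last horizon
    T-squared values exceeded their control limit."""
    recent = deque(maxlen=horizon)  # exceedance flags of the latest samples
    flags = []
    for t2, limit in zip(t2_values, limits):
        recent.append(t2 > limit)
        flags.append(len(recent) == horizon and sum(recent) >= min_exceed)
    return flags
```

Requiring a majority of exceedances within the horizon, rather than a single one, keeps isolated spikes in the statistic from triggering a parameter update.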

As illustrated in Figure 6, a total of 228 changes were detected, and the alarm was triggered four times. These alarms were concentrated in periods of significant fluctuation in NOx concentration: the greater the fluctuation, the more alarms occurred. These periods corresponded to transitional conditions before and after process changes. When the process condition was switched,

{\bar{y}}_{local}

in Equation (11) was no longer representative of the output of the query sample

y_{q}

. Therefore,

λ

should take a slightly larger value to ensure that the similarity calculated using Equation (12) maintains good accuracy.

Figure 7a presents the errors between the measured and predicted values of NOx, which are predominantly concentrated within ±10 mg/m³. This indicates that the model demonstrates excellent stability and accuracy. Figure 7b compares the predicted and actual values. The model generally tracked NOx concentration variations well, albeit with large errors when the NOx concentration fluctuated significantly. These moments generally corresponded to transitional stages of changes in the operating conditions. This finding is consistent with Figure 6, where the model exhibited significant errors during changes in the operating conditions. Subsequent updates incorporating the latest operational data resulted in improved and sustained model accuracy.

Figure 8a shows the scatter plot of the predictions. The predicted and measured values are closely distributed near the perfect straight line, with small deviation and variance, demonstrating the excellent predictive performance of the model. Figure 8b indicates that the model prediction errors are primarily concentrated within the range of −5 mg/m³ to 5 mg/m³, with 99.71% of samples exhibiting an absolute error within 15 mg/m³.

To validate the superiority of the proposed method, the traditional PLS and RF models, the MW-PLS and MW-RF models incorporating the MW strategy, and the JIT-PLS and JIT-RF models with the JITL strategy were adopted as comparison methods. Table 1 presents the three evaluation metrics for the proposed method and the six comparison models, together with the proportion of samples with an absolute error within 15 mg/m³.
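The four quantities reported in Table 1 can be computed as in the sketch below. It uses the standard definitions of R², RMSE, and MAE, which are assumed to match those used in the paper; the function name, the dictionary keys, and the sample values are hypothetical and serve only as an illustration.

```python
import math


def evaluate(y_true, y_pred, tol=15.0):
    """Compute R^2, RMSE, MAE, and the share of samples with |error| < tol."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # Strict inequality, following the "less than 15" wording of Table 1.
        "share_within_tol": sum(abs(e) < tol for e in errors) / n,
    }


# Made-up NOx values in mg/m^3, for illustration only (not the plant data).
measured = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 121.0, 150.0]
print(evaluate(measured, predicted))
```

RMSE penalizes large errors more heavily than MAE, so reporting both, along with the share of samples inside a fixed error band, gives a fuller picture of how errors are distributed than any single metric.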

The R², RMSE, and MAE values of the proposed model on the test set were 0.9319, 3.6960 mg/m³, and 2.7718 mg/m³, respectively, surpassing those of the other models in both prediction accuracy and error distribution. The predictive performance of the RF model was superior to that of the PLS model, which indicates that the RF model holds a greater advantage in predicting NOx concentration. The predictive performance of the PLS and RF models with the addition of the MW strategy was inferior to that of the traditional PLS and RF models, possibly due to information loss caused by the fixed window length. Specifically, the R², RMSE, and MAE of the PLS model on the test set were 0.7879, 6.5226 mg/m³, and 4.5792 mg/m³, respectively. The R², RMSE, and MAE of the JIT-PLS model on the test set were 0.8473, 5.5345 mg/m³, and 3.7936 mg/m³, respectively. This comparison indicates that the PLS model incorporating the JITL strategy exhibited better predictive performance than the traditional PLS model, which demonstrates the effectiveness and applicability of the JITL strategy. The R², RMSE, and MAE of the JIT-RF model on the test set were 0.9252, 3.8727 mg/m³, and 2.9018 mg/m³, respectively. Incorporating the model update strategy into the JIT-RF model leads to further improvements in prediction performance.
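To make the size of these differences easier to compare, the relative change in RMSE between pairs of models can be computed directly from the Table 1 values. The short script below is only an arithmetic aid; the helper name `relative_change` and the choice of pairs are illustrative assumptions.

```python
# RMSE values (mg/m^3) copied from Table 1.
rmse = {
    "PLS": 6.5226,
    "JIT-PLS": 5.5345,
    "MW-PLS": 8.7057,
    "RF": 4.0138,
    "JIT-RF": 3.8727,
    "MW-RF": 7.6345,
    "Proposed method": 3.6960,
}


def relative_change(baseline, variant):
    """Percent change in RMSE from `baseline` to `variant` (negative = lower error)."""
    return 100.0 * (rmse[variant] - rmse[baseline]) / rmse[baseline]


comparisons = [
    ("PLS", "JIT-PLS"),
    ("RF", "JIT-RF"),
    ("JIT-RF", "Proposed method"),
    ("PLS", "MW-PLS"),
    ("RF", "MW-RF"),
]
for baseline, variant in comparisons:
    print(f"{baseline} -> {variant}: {relative_change(baseline, variant):+.1f}% RMSE")
```

With these values, the JITL strategy lowers the RMSE of PLS by roughly 15%, whereas the MW strategy raises the RMSE of RF by roughly 90%, which matches the qualitative ranking discussed above.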

Figure 9 and Figure 10 show the error distributions of the six comparative algorithms on the test set. The RF model exhibited more stable prediction performance than the PLS model, and the JIT-RF model achieved the highest prediction accuracy among the six. Incorporating the JITL strategy into both the PLS and RF models significantly reduced the prediction errors and narrowed the range of error fluctuation, indicating the effectiveness of the JITL strategy. Conversely, the inclusion of the MW strategy led to a notable increase in both the prediction errors and their fluctuation range. Thus, adding the MW strategy decreases accuracy and makes the predictions less stable, owing to the loss of input information.

Figure 11, Figure 12 and Figure 13 present the error statistics and scatter plots. In Figure 12b, the NOx concentration predicted by the JIT-RF model and the actual measured values approximately follow the diagonal line. In contrast, Figure 13a demonstrates that the sample points predicted by the MW-PLS model deviate significantly from the perfect line. MW-PLS exhibited the worst and most unstable predictive performance, whereas the JIT-RF model had better predictive accuracy than the other models. After the addition of the MW strategy, the predictive performance of the MW-PLS and MW-RF models decreased (Figure 13), possibly due to the incomplete information in the training set caused by the MW strategy. After the JITL strategy was incorporated into the PLS and RF models, the sample points became closer to the perfect line (Figure 11b and Figure 12b), indicating that the prediction performance of the model improved and demonstrating the advantages of the JITL strategy.

The above analysis indicates that the proposed method outperforms traditional approaches and is particularly suited for the online prediction of the concentration of NOx emissions.

6. Conclusions

In this work, an improved JITL-based prediction method is proposed to predict the concentration of NOx emissions in a coal-fired power plant. A supervised similarity distance measurement method is defined, and local training samples can be effectively selected. To establish a robust model, a variable selection method is also proposed that enables the robust selection of important variables. Several comparative experiments on a real-world industrial dataset are presented. The experiments show that the proposed method has good accuracy. Therefore, it is suitable for the long-term prediction of NOx emissions. With the development of industrial information technology, the acquisition of multisource process data has become increasingly convenient. Based on multisource data, such as audio and images, building a large prediction model will be beneficial to improving the accuracy and real-time performance of NOx emission prediction, which will be our future research focus.

Author Contributions

K.H.: conceptualization, methodology, and writing—original draft preparation; H.D.: software, data curation, visualization, and investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant numbers 62273214 and 62233012, and in part by the Research Fund for the Taishan Scholar Project of Shandong Province of China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data generated in this study are presented in this article. For any clarifications, please contact the corresponding author.

Acknowledgments

The authors would like to thank the editors and anonymous referees for their invaluable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Liu, K.; Su, H. Exploring the current situation of air pollution emissions and flue gas desulfurization and denitration technologies in thermal power plants. China Plant Eng. 2022, 1, 211–212.
Wang, B. Analysis of integrated air pollution control technology in thermal power industry. Energy Conserv. Environ. Prot. 2021, 10, 64–66.
Cao, W. Soft Sensor and Its Application of Flue Gas NOx Concentration in Thermal Power Plant Based on Improved Autoencoder. Master’s Thesis, Zhejiang University, Zhejiang, China, 2022.
Zhang, L.; Lin, D.; Wang, Y.; Ji, G.; Ma, S.; Cao, X.; Liu, Z.; Ma, R.; Wang, B. Review of applications of machine learning in nitrogen oxides reduction in thermal power plants. Therm. Power Gener. 2022, 52, 7–17.
Cai, S. Challenges and prospects for the trends of power structure adjustment under the goal of carbon peak and neutrality. S. Energy Constr. 2021, 8, 8–17.
Qin, S. Analysis of Wind Power Grid Connection Technology and Power Quality Control. Electronic Technology 2022, 51, 110–111.
Pang, S.; Sun, P. Application analysis of flue gas desulfurization and denitration technology in air pollution control of thermal power plant. Leather Manuf. Environ. Technol. 2022, 3, 82–84.
He, L.; Wei, H.; Cui, Y.; He, J. Research on prevention and control technology of boiler air pollution in thermal power plant. Chem. Eng. Manag. 2021, 24, 45–46.
Qian, H.; Zhang, C.; Chai, T. Research on outlet NOx concentration prediction of SCR denitration system based on random forest algorithm. J. Eng. Therm. Energy Power 2021, 36, 122–129+137.
Pan, Y. Intelligent Control of Power Plant SCR Flue Gas Denitration System Based on Mechanism Modeling. Ph.D. Thesis, North China Electric Power University, Beijing, China, 2019.
Li, K. Research on NOx Emission Prediction of Power Plant Boiler Based on Ensemble Learning Method. Ph.D. Thesis, Northeast Electric Power University, Jilin, China, 2021.
Tang, Z.; Zhu, D.; Li, Y. Data driven based dynamic correction prediction model for NOx emission of coal fired boiler. Proc. CSEE 2022, 42, 5182–5194.
Zhuo, J.; Jiao, W.; Song, G.; Xiong, S.; Yao, Q.; Pan, T. A review on nitrogen oxides prediction model in combustion optimization of boilers. J. Combust. Sci. Technol. 2016, 22, 531–540.
Li, Z.; Lee, Y.; Chen, J.; Qian, Y. Developing variable moving window PLS models: Using case of NOx emission prediction of coal-fired power plants. Fuel 2021, 296, 120441.
Wang, F.; Ma, S.; Wang, H.; Li, Y.; Zhang, J. Prediction of NOx emission for coal-fired boilers based on deep belief network. Control. Eng. Pract. 2018, 80, 26–35.
Tuttle, J.; Vesel, R.; Alagarsamy, S.; Blackburn, L.; Powell, K. Sustainable NOx emission reduction at a coal-fired power station through the use of online neural network modeling and particle swarm optimization. Control. Eng. Pract. 2019, 93, 104167.
Yang, G.; Wang, Y.; Li, X. Prediction of the NOx emissions from thermal power plant using long-short term memory neural network. Energy 2020, 192, 116597.
Yuan, Z.; Meng, L.; Gu, X.; Bai, Y.; Cui, H.; Jiang, C. Prediction of NOx emissions for coal-fired power plants with stacked-generalization ensemble method. Fuel 2021, 289, 119748.
Li, N.; Lv, Y.; Hu, Y. Prediction of NOx emissions from a coal-fired boiler based on convolutional neural networks with a channel attention mechanism. Energies 2022, 16, 76.
Korpela, T.; Kumpulainen, P.; Majanne, Y.; Häyrinen, A.; Lautala, P. Indirect NOx emission monitoring in natural gas fired boilers. Control. Eng. Pract. 2017, 65, 11–25.
Xie, P.; Gao, M.; Zhang, H.; Niu, Y.; Wang, X. Dynamic modeling for NOx emission sequence prediction of SCR system outlet based on sequence to sequence long short-term memory network. Energy 2020, 190, 116482.
Xing, H.; Guo, J.; Zhang, Y.; Liu, S.; Liu, B.; Chang, Z. NOx emission prediction based on variable selection and XGBoost combined model. Autom. Instrum. 2021, 7, 21–25.
Tang, Z.; Wang, S.; Chai, X.; Chai, S.; Ouyang, T.; Li, Y. Auto-encoder-extreme learning machine model for boiler NOx emission concentration prediction. Energy 2022, 256, 124552.
Wang, Z.; Peng, X.; Cao, S.; Zhou, H.; Fan, S.; Li, K.; Huang, W. NOx emission prediction using a lightweight convolutional neural network for cleaner production in a down-fired boiler. J. Clean. Prod. 2023, 389, 136060.
Zhang, J.; Gu, C.; Li, R.; Jin, J. Design of Coupling Prediction Model for Boiler Energy Efficiency and NOx Emission. Energy Conserv. Technol. 2020, 38, 407–411+481.
Yang, T.; Ma, K.; Lv, Y.; Bai, Y. Real-time dynamic prediction model of NOx emission of coal-fired boilers under variable load conditions. Fuel 2020, 274, 117811.
Lv, Y.; Liu, J.; Yang, T.; Zeng, D. A novel least squares support vector machine ensemble model for NOx emission prediction of a coal-fired boiler. Energy 2013, 55, 319–329.
Yu, Y.; Han, Z.; Xu, C. NOx concentration prediction based on deep convolution neural network and support vector machine. Proc. CSEE 2022, 42, 238–248.
Wu, S.; Ma, Y. Prediction of NOx emission from power plant boiler based on mixed deep learning network. China Meas. Test 2022, 48, 166–174.
Wen, X.; Li, K.; Wang, J. NOx emission predicting for coal-fired boilers based on ensemble learning methods and optimized base learners. Energy 2023, 264, 126171.
Liu, H.; Wang, Y.; Li, X.; Yang, G. Prediction of NOx emissions of coal-fired power plants based on mutual information-graph convolutional neural network. Proc. CSEE 2021, 42, 1052–1060.
Huang, Z. Prediction of NOx emissions from power plants based on EEMD and convolutional neural network. J. Eng. Therm. Energy Power 2022, 37, 96–103.
Ma, L.; Zhang, L.; Gu, Y.; Wu, W.; Tang, Y. Prediction of NOx concentration at the entrance of denitration system based on stacking generalization ensemble method. Electr. Power Technol. Environ. Prot. 2022, 38, 517–524.
Wang, H.; Zhang, G.; Huang, Y.; Zhang, Y. NOx emission prediction of boiler based on stacking ensemble learning. Ind. Control. Comput. 2021, 34, 92–93+96.
Li, Y.; Huang, W.; Xi, J. NOx Emission Forecasting based on Stacking Ensemble Model. J. Eng. Therm. Energy Power 2021, 36, 73–81.
Wang, W.; Fan, H.; Liang, C.; Zhao, Z.; Shao, Y.; Tan, C.; Zheng, C. Predictive modeling of NOx outlet of hedged boiler based on random forest. Therm. Power Gener. 2022, 51, 91–104.
Wang, Y. Research on NOx Emission Prediction and Control of Coal-Fired Boiler. Ph.D. Dissertation, North China Electric Power University, Beijing, China, 2021.
Lv, Y.; Yang, T.; Liu, J. An adaptive least squares support vector machine model with a novel update for NOx emission prediction. Chemom. Intell. Lab. Syst. 2015, 145, 103–113.
Napier, L.; Aldrich, C. An IsaMill™ soft sensor based on random forests and principal component analysis. IFAC-PapersOnLine 2017, 50, 1175–1180.
Lv, Y.; Lv, X.; Fang, F.; Yang, T.; Romero, C. Adaptive selective catalytic reduction model development using typical operating data in coal-fired power plants. Energy 2020, 192, 116589.
Yang, X.; Zhou, C. Research progress on the application of just-in-time learning in process industry. Comput. Appl. Chem. 2018, 35, 746–758.
Joshi, T.; Goyal, V.; Kodamana, H. A novel dynamic just-in-time learning framework for modeling of batch processes. Ind. Eng. Chem. Res. 2020, 59, 19334–19344.
Ma, X.; Wang, T.; Zhou, H. Fuzzy multiple kernel support vector mechine based on weighted mahalanobis distance. Comput. Sci. 2022, 49 (Suppl. S2), 302–306.

Figure 1. Schematic of RF regression modeling.

Figure 2. Basic concept of the proposed JIT-RF modeling framework.

Figure 3. Process flow of SCR denitrification.

Figure 4. OOB error curves with varying values of nLeaf and nTree.

Figure 5. Variable selection results.

Figure 6. MW-PCA monitoring results.

Figure 7. Prediction and error curves of the proposed method. (a) The prediction error of the proposed method on the test set. (b) The predicted results of the proposed method on the test set.

Figure 8. Scatter plot and error distribution diagram of the proposed model. (a) The scatter plot of actual measured and predicted values. (b) The distribution map of prediction error.

Figure 9. Error curves of the PLS algorithm.

Figure 10. Error curves of the RF algorithm.

Figure 11. Predictive scatter plots for PLS and JIT-PLS models.

Figure 12. Predictive scatter plots for RF and JIT-RF models.

Figure 13. Predictive scatter plots for MW-PLS and MW-RF models.

Table 1. Prediction results of different models.

Method	R²	RMSE (mg/m³)	MAE (mg/m³)	Proportion with Absolute Error Less than 15 mg/m³
JIT-PLS	0.8473	5.5345	3.7936	97.29%
PLS	0.7879	6.5226	4.5792	96.23%
MW-PLS	0.6222	8.7057	6.1037	92.48%
RF	0.9197	4.0138	2.9939	99.58%
MW-RF	0.7095	7.6345	5.1912	93.94%
JIT-RF	0.9252	3.8727	2.9018	99.68%
Proposed method	0.9319	3.6960	2.7718	99.71%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

