Article

Input-Output Selection for LSTM-Based Reduced-Order State Estimator Design

Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(2), 400; https://doi.org/10.3390/math11020400
Submission received: 24 December 2022 / Revised: 10 January 2023 / Accepted: 10 January 2023 / Published: 12 January 2023
(This article belongs to the Section Engineering Mathematics)

Abstract

In this work, we propose a sensitivity-based approach to construct reduced-order state estimators based on recurrent neural networks (RNNs). It is assumed that a mechanistic model is available but is too computationally complex for estimator design and that only some target outputs are of interest and should be estimated. A reduced-order estimator that can estimate the target outputs is sufficient to address such a problem. We introduce an approach based on sensitivity analysis to determine how to select the appropriate inputs and outputs for data collection and data-driven model development so that the desired outputs can be estimated accurately. Specifically, we consider the long short-term memory (LSTM) neural network, a type of RNN, as the tool to train the data-driven model. Based on the LSTM model, an extended Kalman filter is designed to estimate the target outputs. Simulations are carried out to illustrate the effectiveness and applicability of the proposed approach.

1. Introduction

In recent decades, modern processing industries have increasingly employed complex, large-scale chemical processes due to their economic efficiency. A rigorous dynamic model for such a process can consist of hundreds of differential equations to account for the process dynamics. From an operational point of view, state estimation of the essential process variables is critical for achieving better product quality and optimal utilization of available resources. In processing industries, state estimators or observers are commonly used to estimate unknown variables based on a process model and some measurable variables. Using a detailed mechanistic model to perform state estimation is often challenging due to its complexity and high computational cost. In many applications, the number of key variables that should be estimated is indeed much smaller than the number of internal states of the entire system. A reduced-order estimator that can estimate the key variables is sufficient.
In the literature, there are some results on state estimation based on reduced-order models. In [1], a state estimation scheme for wastewater treatment plants was developed based on model approximation, in which a reduced-order model was obtained using the proper orthogonal decomposition (POD) approach. In [2], a state estimation scheme was developed on a reduced-order approximation of the forecast error using a Kalman filter; the reduced-order system was obtained by balanced truncation of the Hankel operator representation of the estimation error. In [3], model reduction was performed using the matched asymptotic expansions method for an implicit two-time-scale system, and distributed state estimation was implemented to demonstrate the improved computational time and accuracy. In [4], a structure-preserving model reduction method using trajectory-based unsupervised machine learning techniques was used to develop an adaptive moving horizon estimation algorithm. A reduced state observer was developed for a linearized reduced system using the balancing model reduction technique [5]. Various data-driven methods, such as sparse regression [6] and the Koopman operator [7], have been used to determine the structure of reduced-order models and to reveal important physical properties. In process modeling, hybrid modeling is another data-based approach, which combines a kinetic model with a data-driven model to improve the model's accuracy and robustness [8,9,10].
In recent years, machine learning techniques, in particular neural networks [8,9,10], have attracted significant attention in reduced-order model and estimator development due to their data-driven nature and ease of implementation. There are many studies that have used machine learning to develop data-driven models that are, in general, reduced-order models. For example, in [11,12,13], machine learning was integrated with traditional observer or estimator frameworks to obtain data-based state estimation schemes. However, a careful examination of these studies reveals that they lack a systematic method for choosing the appropriate inputs and outputs for machine learning-based model development. Well-selected inputs and outputs ensure that the resulting data-driven model captures the dynamics needed to estimate the key target variables and can significantly reduce the model training effort [14,15].
Another relevant topic in the literature is inferential soft sensors, which are built on process data and exploit the correlation between inputs and target outputs. These inferential soft sensors, in general, do not consider the dynamics of the system. There are many applications of soft sensors based on machine learning and statistical techniques, such as neural networks [16,17], principal component regression [18], and partial least squares regression [19]. These soft sensors can provide predictions of unmeasured variables and are typically easy to implement. However, as they do not explicitly take the dynamics of the process into account, their performance may be limited. If a dynamic model can be developed, state estimation provides much better estimation performance.
Motivated by the above considerations, in this work, we propose an approach to find the most appropriate inputs and outputs for data-driven, reduced-order model development for target variable estimation purposes. Specifically, we assume that a mechanistic model of the actual system is available and that we are only interested in estimating a small set of desired outputs instead of the entire state vector. A reduced-order estimator that can estimate the desired outputs is sufficient to address such a problem. The proposed approach consists of three steps. In the first step, a sensitivity matrix of the target outputs with respect to the initial state is evaluated based on process data. The singular value decomposition (SVD) is then applied to this matrix to find the dominant singular values and the state elements that contribute most to them; these elements form the reduced state vector. Once the reduced state vector is determined, the sensitivity of the reduced state vector to the inputs is evaluated, and the most important inputs are determined similarly; these inputs form the reduced input vector. In the second step, process data are collected by simulating the process model, and a data-driven model in the form of a long short-term memory (LSTM) neural network is trained to approximate the dynamics between the reduced input vector and the reduced state vector. In the third step, an extended Kalman filter (EKF) is designed based on the reduced-order LSTM model to estimate the target output. The proposed approach is applied to a chemical process, and extensive simulations are performed to show its applicability and effectiveness.
The main contributions of this work include: (a) a systematic approach for input and output selection for reduced-order model development based on sensitivity analysis; (b) a modified EKF design that can take advantage of the reduced-order model; and (c) detailed simulations illustrating the applicability and effectiveness of the proposed approach.

2. Preliminaries

In this section, we first provide a description of the discrete-time nonlinear system and define the objectives of the work. This section also discusses the formulation of the reduced-order model briefly.

2.1. System Description

We consider a class of discrete-time nonlinear systems described as follows:
$$x(t+1) = f(x(t), u(t)) \tag{1a}$$
$$y(t) = h(x(t)) \tag{1b}$$
$$y_t(t) = h_t(x(t)) \tag{1c}$$
where $x(t) \in \mathbb{R}^{n_x}$ is the vector of state variables at time $t$, $u(t) \in \mathbb{R}^{n_u}$ is the vector of manipulated inputs, and $y(t) \in \mathbb{R}^{n_y}$ is the vector containing all the measured outputs. $f(\cdot)$ and $h(\cdot)$ denote the nonlinear state and measured output equations, respectively. $y_t(t) \in \mathbb{R}^{n_{y_t}}$ represents the vector of target process variables to be estimated, and the function $h_t(\cdot)$ characterizes the relation between the state $x$ and the target output $y_t$. It is assumed that the state vector $x$ is observable based on the measurements of $y$. It is also assumed that the dimension of $y_t$ is smaller than the dimension of $x$ (the number of target outputs $n_{y_t}$ is smaller than the number of states $n_x$). In a process system, the number of states can be large, but there is typically a much smaller number of important states that need to be monitored closely; the above assumption thus reflects the operating scenario rather than a restrictive condition. For convenience, we will refer to the system in (1) as the actual system for the remainder of this work.

2.2. Problem Formulation

The main objective of this work is to develop a reduced-order estimator to estimate the target output $y_t$ based on measurements of $y$. Aiming at $y_t$ estimation, one can develop a full-order estimator using the available measurements $y$ and the process model (1a). Different estimation algorithms can readily be applied to estimate all the states $x$ of the system (1). However, in many applications, it is not necessary to estimate all the states of the system when we are only interested in $y_t$. A full-order estimator can be computationally expensive, and the estimation performance may also be compromised by estimating all the states with a limited number of measured output variables. It is expected that a reduced-order estimator that only estimates a smaller subset of the state variables closely related to the target outputs is sufficient to achieve the objective. Compared to a full-order state estimator, a reduced-order estimator has the potential to decrease the computational cost and improve the estimation performance.
In this work, we consider the development of a reduced-order model for the system (1) and the associated reduced-order estimator design. Specifically, we propose to first use the full-order system model (1) to generate data and then to identify a reduced-order model using an RNN for $y_t$ estimation purposes. Subsequently, a reduced-order estimator is designed based on the RNN to estimate $y_t$ from the measurements of $y$. In particular, we are interested in identifying a reduced-order model of the following form:
$$\tilde{x}(t+1) = \tilde{f}(\tilde{x}(t), \ldots, \tilde{x}(t-n_l), \tilde{u}(t), \ldots, \tilde{u}(t-n_l)) \tag{2}$$
where $\tilde{x} \in \mathbb{R}^{\tilde{n}_x}$ is the vector of reduced (selected) state variables, $n_l$ is the length of the data sequence, $\tilde{u} \in \mathbb{R}^{\tilde{n}_u}$ is the vector of reduced (selected) manipulated inputs, and $\tilde{f}(\cdot)$ describes the dynamics of the reduced-order model. The elements of the reduced variables $\tilde{x}$ and $\tilde{u}$ are the same as the respective elements in the actual system variables $x$ and $u$. The measured output $y$ and the target output $y_t$ are also expected to be expressible in terms of the reduced state $\tilde{x}$. Let us denote these relations as follows:
$$y(t) = \tilde{h}(\tilde{x}(t)) \tag{3}$$
$$y_t(t) = \tilde{h}_t(\tilde{x}(t)) \tag{4}$$
where $\tilde{h}$ and $\tilde{h}_t$ define the measured output equation and the target output equation with respect to the reduced state vector $\tilde{x}$.
We will discuss how $\tilde{x}$ and $\tilde{u}$ should be selected so that a reduced model of the form (2)–(4) can be identified and the target output $y_t$ can be accurately estimated using the measurements of $y$ based on the reduced-order model (2). It is expected that the dimension $\tilde{n}_x$ of the reduced state vector $\tilde{x}$ is much smaller than the dimension $n_x$ of the actual system state vector $x$ ($\tilde{n}_x < n_x$). Similarly, $\tilde{n}_u < n_u$.

3. Proposed Reduced Input and State Vectors Selection Approach

Figure 1 shows a flow chart of the proposed approach. In the first step, we select the reduced-order state $\tilde{x}$ and the corresponding input $\tilde{u}$ for the purpose of estimating $y_t$. Based on the full-order system model, we first construct the sensitivity matrix $\partial y_t / \partial x$. By analyzing $\partial y_t / \partial x$, we determine the elements of the state vector $x$ that are most closely related to $y_t$, and these elements are selected to construct the reduced-order system state $\tilde{x}$. Once $\tilde{x}$ is determined, we further construct the sensitivity matrix $\partial \tilde{x} / \partial u$. Based on $\partial \tilde{x} / \partial u$, we then find a subset $\tilde{u}$ of $u$ that has a significant impact on $\tilde{x}$. In the determination of $\tilde{x}$ and $\tilde{u}$, the singular value decomposition (SVD) method is used. In the second step, based on the full-order system, we vary $\tilde{u}$ and collect the trajectories of $\tilde{x}$. Then, based on the data of $\tilde{u}$ and $\tilde{x}$, an LSTM model is identified. This LSTM model is a reduced-order model. In the third step, an estimator is designed based on the reduced-order model identified in the second step. In this work, we show how an extended Kalman filter (EKF) may be designed based on the reduced-order model to estimate $y_t$ based on measurements of $y$.

3.1. Sensitivity Matrix for Reduced State Selection

In order to select the reduced states, the sensitivity matrix of the target output $y_t$ with respect to the state $x$ is considered. A larger element in the sensitivity matrix indicates that the target output is more sensitive to the corresponding state; that is, a small perturbation of that state generates a larger change in the target output. In the literature, sensitivity matrices have been used to identify relevant connections between model outputs and inputs in developing reduced-order models [20,21,22], since the sensitivity is closely related to both the observability and controllability of a system [23,24]. A practical method to find the sensitivity matrix is to linearize the nonlinear system at different points along its trajectories and find the observability matrix at each point.
Consider $N+1$ sampling points from $t_0$ to $t_N$ along a trajectory of the system (1). Defining $A(t) := \left.\frac{\partial f}{\partial x}\right|_{t}$, $B(t) := \left.\frac{\partial f}{\partial u}\right|_{t}$, $C_1(t) := \left.\frac{\partial h}{\partial x}\right|_{t}$, and $C_2(t) := \left.\frac{\partial h_t}{\partial x}\right|_{t}$, the linearized system at a sampling time $t$ can be obtained as follows:
$$x(t+1) = A(t)x(t) + B(t)u(t) + F \tag{5}$$
$$y(t) = C_1(t)x(t) + H_1 \tag{6}$$
$$y_t(t) = C_2(t)x(t) + H_2 \tag{7}$$
where $F$, $H_1$, and $H_2$ are constant terms resulting from the linearization at the sampling point $(x(t), u(t))$.
The sensitivity of the target output $y_t(t)$ to the initial state $x(t_0)$ is defined as $\partial y_t(t)/\partial x(t_0)$. Defining the sensitivity of the state to the initial condition as $\partial x(t)/\partial x(t_0)$, from (5) and (7), the following two equations can be written:
$$\frac{\partial x(t+1)}{\partial x(t_0)} = A(t)\frac{\partial x(t)}{\partial x(t_0)} \tag{8}$$
$$\frac{\partial y_t(t)}{\partial x(t_0)} = C_2(t)\frac{\partial x(t)}{\partial x(t_0)} \tag{9}$$
with the initial value $\partial x(t_0)/\partial x(t_0) = I$ at $t = t_0$. Using (8) and (9), the sensitivity at sampling time $t$ can be written as:
$$\frac{\partial y_t(t)}{\partial x(t_0)} = C_2(t)A(t-1)A(t-2)\cdots A(0) \tag{10}$$
From sampling time $t_0$ to $t_N$, we can compute the sensitivities $\partial y_t(t)/\partial x(t_0)$, $t = t_0, \ldots, t_N$, and stack them to form a sensitivity matrix $S_O$:
$$S_O(t_0, \ldots, t_N) = \begin{bmatrix} \dfrac{\partial y_t(t_0)}{\partial x_1(t_0)} & \dfrac{\partial y_t(t_0)}{\partial x_2(t_0)} & \cdots & \dfrac{\partial y_t(t_0)}{\partial x_{n_x}(t_0)} \\ \dfrac{\partial y_t(t_1)}{\partial x_1(t_0)} & \dfrac{\partial y_t(t_1)}{\partial x_2(t_0)} & \cdots & \dfrac{\partial y_t(t_1)}{\partial x_{n_x}(t_0)} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y_t(t_N)}{\partial x_1(t_0)} & \dfrac{\partial y_t(t_N)}{\partial x_2(t_0)} & \cdots & \dfrac{\partial y_t(t_N)}{\partial x_{n_x}(t_0)} \end{bmatrix} \tag{11}$$
This matrix is a series of snapshots of the sensitivities stacked vertically over the time span $t_0$ to $t_N$. We can test the rank of $S_O(t_0, \ldots, t_N)$ along a typical trajectory from $t_0$ to $t_N$. The sensitivity matrix $S_O$ will be used to select the reduced state $\tilde{x}$; the SVD analysis used to achieve this goal is discussed in Section 3.2.
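To make the construction concrete, the following minimal Python sketch builds $S_O$ along a nominal trajectory. It assumes the discrete-time functions $f(x, u)$ and $h_t(x)$ of model (1) are available as callables and approximates the Jacobians $A(t)$ and $C_2(t)$ by finite differences; these numerical choices are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def fd_jacobian(fun, z, args=(), eps=1e-6):
    """Finite-difference Jacobian of fun with respect to z (an assumption;
    analytical Jacobians can be used when available)."""
    f0 = np.atleast_1d(fun(z, *args))
    J = np.zeros((f0.size, z.size))
    for j in range(z.size):
        dz = np.zeros_like(z)
        dz[j] = eps
        J[:, j] = (np.atleast_1d(fun(z + dz, *args)) - f0) / eps
    return J

def build_S_O(f, h_t, x_traj, u_traj):
    """Stack the sensitivities dy_t(t)/dx(t0), t = t0..tN, as in Eq. (11)."""
    Phi = np.eye(x_traj[0].size)                 # dx(t0)/dx(t0) = I
    rows = [fd_jacobian(h_t, x_traj[0]) @ Phi]   # C2(t0) * I
    for t in range(1, len(x_traj)):
        A = fd_jacobian(f, x_traj[t-1], args=(u_traj[t-1],))  # A(t-1)
        Phi = A @ Phi                            # propagate via Eq. (8)
        rows.append(fd_jacobian(h_t, x_traj[t]) @ Phi)        # Eq. (9)
    return np.vstack(rows)
```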

3.2. Reduced State Selection via Singular Value Decomposition

Once the sensitivity matrix $S_O$ is constructed, we propose using singular value decomposition (SVD) to analyze $S_O$ and find the subset of state elements in $x$ that are most closely related to the target output $y_t$. The SVD [25] represents the sensitivity matrix $S_O$ as a summation of equally sized matrices of decreasing dominance:
$$S_O = u_1 \sigma_1 v_1^T + u_2 \sigma_2 v_2^T + \cdots + u_{n_x} \sigma_{n_x} v_{n_x}^T \tag{12}$$
where the $\sigma_i$ ($i = 1, \ldots, n_x$) are the singular values of $S_O$, sorted such that $\sigma_1$ is the largest and $\sigma_{n_x}$ is the smallest, and $u_i$, $v_i$ ($i = 1, \ldots, n_x$) are the associated left and right singular vectors, respectively. Equation (12) implies that the sensitivity information contained in $S_O$ can be projected onto $n_x$ directions represented by the singular vectors $v_i$, $i = 1, \ldots, n_x$. The magnitude of a singular value reflects the amount of information contained in the associated direction.
After the SVD, we first examine the singular values and select the dominant ones that contain most of the information in $S_O$ for further analysis. One way to select the dominant singular values is to identify a significant gap in the singular values [26,27].
Suppose that $m$ out of the $n_x$ singular values are selected for further analysis. We next analyze the associated singular vectors $v_i$, $i = 1, \ldots, m$. The absolute value of the $j$-th ($j = 1, \ldots, n_x$) element of $v_i$, denoted $v_{ij}$, reflects the contribution of the $j$-th state element of $x$ to the variance/information of the target variable $y_t$ along the direction of $v_i$ [28]. Therefore, in this work, the following measure is used to reflect the overall effect of the $j$-th state element on the target output:
$$D_j = \frac{\sum_{i=1}^{m} |\sigma_i v_{ij}|}{\sum_{i=1}^{m} |\sigma_i|} \tag{13}$$
where $0 \le D_j \le 1$. A larger value of $D_j$ implies a larger impact on the target output. Based on this measure, the elements of $x$ can be partitioned into two groups, where the first group has a comparatively larger effect on the target variable than the other. The elements belonging to the first group are selected as the elements of $\tilde{x}$.
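The measure in (13) is straightforward to compute from the SVD; a small sketch follows. The choice of $m$ is left to the user here, in line with the gap-based visual inspection used later in Section 5; the function name is illustrative.

```python
import numpy as np

def d_measure(S_O, m):
    """Compute D_j of Eq. (13) for the m dominant singular directions of S_O."""
    _, s, Vt = np.linalg.svd(S_O, full_matrices=False)
    # |sigma_i v_ij| = sigma_i * |v_ij|, since singular values are nonnegative.
    D = (np.abs(Vt[:m, :]).T @ s[:m]) / np.sum(s[:m])
    return D   # D[j] in [0, 1]; large entries mark candidate elements of x~

# Example usage: rank the states by their overall effect on y_t.
# D = d_measure(S_O, m=3)
# ranking = np.argsort(D)[::-1]   # most important state indices first
```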
After the above process, $\tilde{x}$ contains the elements that are most closely related to $y_t$. In order to be able to use the measured output $y$, we should check whether $y$ can be expressed using the elements in the current $\tilde{x}$; that is, whether an expression of the form (3) can be obtained. If the current $\tilde{x}$ is not sufficient to express $y$, the missing state elements should be added to $\tilde{x}$.

3.3. Sensitivity Matrix for Input Selection and Reduced Input Vector Selection

After $\tilde{x}$ is determined, we can proceed to select the reduced input vector $\tilde{u}$. The sensitivity $\partial \tilde{x}/\partial u$ of $\tilde{x}$ with respect to the actual input vector $u$ will be used.
Considering the sampling points from $t_0$ to $t_N$ again along a trajectory of the system (1a) and (1b), from Equation (5), the following equation can be written:
$$\frac{\partial x(t+1)}{\partial u(t_0)} = A(t)\frac{\partial x(t)}{\partial u(t_0)} + B(t)\frac{\partial u(t)}{\partial u(t_0)} \tag{14}$$
where $\partial x(t)/\partial u(t_0)$ is the sensitivity of the state vector $x(t)$ with respect to the input $u(t_0)$, and $\partial u(t)/\partial u(t_0)$ is the sensitivity of the input vector $u$ at $t$ with respect to the input at $t_0$. As the inputs are not dependent on each other, $\partial u(t)/\partial u(t_0) = 0$ for all $t$ except $t_0$, and at $t = t_0$, $\partial u(t)/\partial u(t_0) = I$. From (14), $\partial x(t)/\partial u(t_0)$ is evaluated as follows:
$$\frac{\partial x(t)}{\partial u(t_0)} = A(t-1)A(t-2)\cdots A(1)B(0) \tag{15}$$
Note that the sensitivity $\partial \tilde{x}(t)/\partial u(t_0)$ of the reduced state vector $\tilde{x}$ to the input vector $u$ can be obtained by taking the corresponding rows of $\partial x(t)/\partial u(t_0)$, since $\tilde{x}$ is composed of selected elements of $x$. By calculating $\partial \tilde{x}(t)/\partial u(t_0)$ from $t_0$ to $t_N$ and stacking these sensitivities, we can form the following sensitivity matrix $S_C$:
$$S_C(t_0, \ldots, t_N) = \begin{bmatrix} \dfrac{\partial \tilde{x}_1(t_0)}{\partial u_1(t_0)} & \dfrac{\partial \tilde{x}_1(t_0)}{\partial u_2(t_0)} & \cdots & \dfrac{\partial \tilde{x}_1(t_0)}{\partial u_{n_u}(t_0)} \\ \dfrac{\partial \tilde{x}_2(t_0)}{\partial u_1(t_0)} & \dfrac{\partial \tilde{x}_2(t_0)}{\partial u_2(t_0)} & \cdots & \dfrac{\partial \tilde{x}_2(t_0)}{\partial u_{n_u}(t_0)} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial \tilde{x}_{\tilde{n}_x}(t_0)}{\partial u_1(t_0)} & \dfrac{\partial \tilde{x}_{\tilde{n}_x}(t_0)}{\partial u_2(t_0)} & \cdots & \dfrac{\partial \tilde{x}_{\tilde{n}_x}(t_0)}{\partial u_{n_u}(t_0)} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial \tilde{x}_1(t_N)}{\partial u_1(t_0)} & \dfrac{\partial \tilde{x}_1(t_N)}{\partial u_2(t_0)} & \cdots & \dfrac{\partial \tilde{x}_1(t_N)}{\partial u_{n_u}(t_0)} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial \tilde{x}_{\tilde{n}_x}(t_N)}{\partial u_1(t_0)} & \dfrac{\partial \tilde{x}_{\tilde{n}_x}(t_N)}{\partial u_2(t_0)} & \cdots & \dfrac{\partial \tilde{x}_{\tilde{n}_x}(t_N)}{\partial u_{n_u}(t_0)} \end{bmatrix} \tag{16}$$
To determine the reduced input vector $\tilde{u}$, an approach similar to the selection of $\tilde{x}$ is applied to the sensitivity matrix $S_C$. First, by applying SVD to $S_C$, an equation similar to Equation (12) is obtained. By examining the singular values, we find the dominant ones. Then, we analyze the associated singular vectors to find the elements of $u$ that have a more significant impact on the reduced state $\tilde{x}$. A measure similar to $D_j$ can be used to determine the important elements of $u$, and these elements form the reduced input vector $\tilde{u}$. A sketch of the construction of $S_C$ is given below.
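The following sketch mirrors the construction of $S_O$ above: it propagates Eq. (14) and keeps only the rows of the retained states. It reuses the finite-difference fd_jacobian helper from the earlier sketch, and the argument keep_rows (the indices of the states selected into $\tilde{x}$) is an assumed name for illustration.

```python
import numpy as np  # fd_jacobian as defined in the S_O sketch above

def build_S_C(f, x_traj, u_traj, keep_rows):
    """Stack dx~(t)/du(t0), t = t0..tN, as in Eq. (16)."""
    n_x, n_u = x_traj[0].size, u_traj[0].size
    S = np.zeros((n_x, n_u))                  # dx(t0)/du(t0) = 0
    rows = [S[keep_rows, :]]
    for t in range(1, len(x_traj)):
        if t == 1:
            # dx(t0+1)/du(t0) = B(t0), from Eq. (14) with du(t0)/du(t0) = I.
            S = fd_jacobian(lambda u, x: f(x, u), u_traj[0], args=(x_traj[0],))
        else:
            A = fd_jacobian(f, x_traj[t-1], args=(u_traj[t-1],))  # A(t-1)
            S = A @ S                         # Eq. (14) with du(t)/du(t0) = 0
        rows.append(S[keep_rows, :])
    return np.vstack(rows)

# The same d_measure function can then rank the inputs, e.g.:
# D_u = d_measure(build_S_C(f, x_traj, u_traj, keep_rows), m=6)
```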
Remark 1.
Note that while in this work we use the SVD to find the reduced state $\tilde{x}$ and input $\tilde{u}$, other orthogonalization approaches, such as principal component analysis, may also be used to find the reduced state and input vectors. The detailed procedures for applying other orthogonalization approaches to find $\tilde{x}$ and $\tilde{u}$ need further investigation.

4. Proposed Reduced-Order Estimator Design Approach

In the previous section, we discussed how to select the reduced state and input vectors $\tilde{x}$ and $\tilde{u}$. In this section, we discuss how to develop a reduced-order model using an LSTM to describe the dynamics between $\tilde{u}$ and $\tilde{x}$. We also discuss how to design a reduced-order estimator in the EKF framework based on the reduced-order model.

4.1. Reduced-Order Model Development

We propose to use LSTM neural networks to develop the reduced-order model. LSTM networks are typically used for modeling sequential time-series data, such as the trajectories of dynamical chemical processes. While traditional RNNs tend to lose error information over long data sequences, LSTM models deal with this problem by protecting the error information from decaying using learnable gates [29,30].
As illustrated in Step 2 in Figure 1, to develop the reduced-order model, we need to collect data. The data for model development can be generated from the actual system model (1) through extensive simulations with different initial states and randomly generated multi-step input sequences. The multi-step input sequences ensure that most of the system's dynamics are captured. Random process noise can also be added to the simulations to generate noisy process data. Note that since only the inputs in the selected $\tilde{u}$ are important for the target output $y_t$, only these inputs need to be varied in the data generation, and the other inputs can be kept constant. In the simulations, the trajectories of $\tilde{u}$ and $\tilde{x}$ are collected. The generation of these time-series data is an important step of LSTM modeling. Once the data are collected, the entire data set is divided into training, validation, and testing sets for the LSTM model development. A brief step-by-step procedure for identifying an LSTM model is outlined as follows (a code sketch is given after the list):
  1. Normalize the dataset so that all values are within the range of 0 to 1.
  2. Determine the number of layers of the LSTM model and the number of nodes in each layer; the outputs of the LSTM model should be $\tilde{x}$, and the input to the LSTM model should be $\{\tilde{x}, \tilde{u}\}$.
  3. Train the LSTM model using the training dataset. This may be performed using, for example, the Keras library in Python.
  4. Use the validation and test datasets to validate and evaluate the model performance, respectively. If the model performance (both single-step and multi-step ahead predictions) is not acceptable, go back to Step 2 and retrain the LSTM model. If the performance is good, save the model parameters.
After the LSTM model is trained, the LSTM model parameters can be extracted, and the model can be described in the form of (2)–(4).
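As a concrete illustration of Steps 2 and 3, the following Keras sketch builds an LSTM of the size used later in Section 5 (two LSTM layers of 50 units, a dense output layer of 4 units, $n_l = 2$, four reduced states, and six reduced inputs). The layer sizes and dimensions are taken from that example and should be treated as placeholders for other applications.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_l, n_x_red, n_u_red = 2, 4, 6      # sequence length, dim(x~), dim(u~)

model = keras.Sequential([
    # Each input sample: n_l past steps of the stacked features [x~, u~].
    keras.Input(shape=(n_l, n_x_red + n_u_red)),
    layers.LSTM(50, return_sequences=True),
    layers.LSTM(50),
    layers.Dense(n_x_red),           # predicts the normalized x~(t+1)
])
model.compile(optimizer="adam", loss="mse")

# Training on normalized sequences, with hyperparameters from Remark 4:
# X_train: (N, n_l, n_x_red + n_u_red), Y_train: (N, n_x_red)
# model.fit(X_train, Y_train, epochs=30, batch_size=100,
#           validation_data=(X_val, Y_val))
```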

4.2. Extended Kalman Filter Design

The EKF and its variants are standard methods for state estimation of nonlinear systems based on successively linearizing the nonlinear system [3,31]. Here, a traditional EKF is modified to accommodate the sequence length of the LSTM model. Note that the EKF is based on successive linearization of the original nonlinear system; if an analytical linearization is not available, numerical approaches may be used to find the Jacobian matrix, treating the nonlinear system as a black box. The EKF consists of two steps: the prediction step and the update step.
Prediction step. At a sampling time $t$, using the past $n_l$ estimated reduced states $\hat{\tilde{x}}$ from $t-1$ to $t-n_l$, the EKF predicts the reduced state at time $t$ as follows:
$$\hat{\tilde{x}}(t|t-1) = \tilde{f}(\hat{\tilde{x}}(t-1), \ldots, \hat{\tilde{x}}(t-n_l), \tilde{u}(t-1), \ldots, \tilde{u}(t-n_l)) \tag{17}$$
where $\hat{\tilde{x}}(t|t-1)$ represents the prediction of the reduced state at time instant $t$ based on the past estimated reduced states $\hat{\tilde{x}}$ from $t-n_l$ to $t-1$.
The error covariance is propagated as follows:
$$P(t|t-1) = \sum_{m=t-n_l}^{t-1} A_d(m+1) P(m) A_d(m+1)^T + Q \tag{18}$$
where $P$ and $Q$ are the reduced state error covariance matrix and the process noise covariance matrix, respectively, and the state-transition matrix is $A_d(t) = \left.\frac{\partial \tilde{f}}{\partial \tilde{x}}\right|_{\hat{\tilde{x}}(t-1)}$.
To find the expression of $P$ in (18), let us consider the reduced-order model with additive process noise $w$:
$$\tilde{x}(t) = \tilde{f}(\tilde{x}(t-1), \ldots, \tilde{x}(t-n_l), \tilde{u}(t-1), \ldots, \tilde{u}(t-n_l)) + w(t-1) \tag{19}$$
The error between the actual reduced state $\tilde{x}(t)$ and the predicted value $\hat{\tilde{x}}(t|t-1)$ from (17) is given as:
$$\tilde{x}(t) - \hat{\tilde{x}}(t|t-1) = \tilde{f}(\tilde{x}(t-1), \ldots, \tilde{x}(t-n_l), \tilde{u}(t-1), \ldots, \tilde{u}(t-n_l)) + w(t-1) - \tilde{f}(\hat{\tilde{x}}(t-1), \ldots, \hat{\tilde{x}}(t-n_l), \tilde{u}(t-1), \ldots, \tilde{u}(t-n_l)) \tag{20}$$
The estimation error can be approximated by keeping only the first-order terms of the linearization of the nonlinear system equation:
$$\begin{aligned} \tilde{x}(t) - \hat{\tilde{x}}(t|t-1) &\approx \left.\frac{\partial \tilde{f}}{\partial \tilde{x}}\right|_{\hat{\tilde{x}}(t-1)} (\tilde{x}(t-1) - \hat{\tilde{x}}(t-1)) + \cdots + \left.\frac{\partial \tilde{f}}{\partial \tilde{x}}\right|_{\hat{\tilde{x}}(t-n_l)} (\tilde{x}(t-n_l) - \hat{\tilde{x}}(t-n_l)) + w(t-1) \\ &= \sum_{m=t-n_l}^{t-1} A_d(m+1)(\tilde{x}(m) - \hat{\tilde{x}}(m)) + w(t-1) \end{aligned} \tag{21}$$
The covariance matrix $P(t|t-1)$ can be calculated as follows:
$$P(t|t-1) = E\left[(\tilde{x}(t) - \hat{\tilde{x}}(t|t-1))(\tilde{x}(t) - \hat{\tilde{x}}(t|t-1))^T\right] \tag{22}$$
Based on (21) and (22), the following equation can be written:
$$P(t|t-1) = E\left[\left(\sum_{m=t-n_l}^{t-1} A_d(m+1)(\tilde{x}(m) - \hat{\tilde{x}}(m)) + w(t-1)\right)\left(\sum_{m=t-n_l}^{t-1} A_d(m+1)(\tilde{x}(m) - \hat{\tilde{x}}(m)) + w(t-1)\right)^T\right] \tag{23}$$
Given that the noise $w(t-1)$ is not correlated with the estimates at and before $t-1$, and neglecting the correlation between the estimated reduced states at different time instants, $P(t|t-1)$ can be approximated as follows:
$$P(t|t-1) = \sum_{m=t-n_l}^{t-1} A_d(m+1) P(m) A_d(m+1)^T + Q \tag{24}$$
which is the expression used in (18).
Update step. At each sampling instant $t$, an estimate of the current reduced state $\hat{\tilde{x}}(t)$ is obtained by performing the measurement-update step based on the predicted value $\hat{\tilde{x}}(t|t-1)$ as follows:
$$\hat{\tilde{x}}(t) = \hat{\tilde{x}}(t|t-1) + K_t\left(y(t) - C\hat{\tilde{x}}(t|t-1)\right) \tag{25}$$
where $\hat{\tilde{x}}(t)$ represents the estimate of $\tilde{x}$ at time $t$ given the observations of $y$ up to time $t$, and the observation matrix is $C = \left.\frac{\partial \tilde{h}}{\partial \tilde{x}}\right|_{\hat{\tilde{x}}(t|t-1)}$. The correction gain $K_t$ at time $t$, which minimizes the a posteriori error covariance based on the measurement innovation $y(t) - C\hat{\tilde{x}}(t|t-1)$, is determined as follows:
$$K_t = P(t|t-1)C^T\left(R + CP(t|t-1)C^T\right)^{-1} \tag{26}$$
where $R$ is the covariance matrix of the measurement noise. The covariance matrix is also updated as follows:
$$P(t) = (I - K_tC)P(t|t-1) \tag{27}$$
where $P(t)$ is the a posteriori covariance matrix of the estimation error at $t$, and $I$ is an identity matrix. Note that $P(0)$, $Q$, and $R$ are the three tuning parameters of the EKF.
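A compact sketch of one cycle of this modified EKF follows. It assumes the trained LSTM is wrapped as a callable f_red over the last $n_l$ state estimates and inputs, and that jac_f returns the lag-wise Jacobians $A_d(m+1)$ (e.g., via automatic differentiation or finite differences); all names are illustrative.

```python
import numpy as np

def ekf_step(f_red, jac_f, C, x_hist, u_hist, P_hist, y, Q, R):
    """One prediction/update cycle of Eqs. (17), (18), and (25)-(27).

    x_hist, P_hist: last n_l reduced-state estimates and their covariances
    (most recent first); u_hist: last n_l reduced inputs; C: Jacobian of
    the measurement map h~ with respect to x~.
    """
    # Prediction, Eq. (17): propagate through the reduced-order LSTM model.
    x_pred = f_red(x_hist, u_hist)
    # Covariance propagation, Eq. (18): one Jacobian per lagged state.
    A_list = jac_f(x_hist, u_hist)
    P_pred = sum(A @ P @ A.T for A, P in zip(A_list, P_hist)) + Q
    # Update, Eqs. (25)-(27): correct with the measurement innovation.
    K = P_pred @ C.T @ np.linalg.inv(R + C @ P_pred @ C.T)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(x_pred.size) - K @ C) @ P_pred
    return x_new, P_new
```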
Remark 2.
Note that while in this work we use an LSTM to identify the reduced-order model, other types of data-driven reduced-order models may also be used, as long as $\tilde{x}$ and $\tilde{u}$ are considered in the model. As discussed earlier, $\tilde{x}$ and $\tilde{u}$ are necessary to capture the essential dynamics for target output estimation.

5. Application to a Chemical Process

In this section, we apply the proposed reduced-order estimator design approach to a chemical process to illustrate its applicability and effectiveness.

5.1. Process Description and Simulation Settings

A chemical process consisting of two continuous stirred tank reactors (CSTRs) and a flash separator in series is considered [32]. A process schematic is shown in Figure 2. Pure material A is fed at the rates $F_{10}$ and $F_{20}$, respectively, into the two CSTRs, in which the first-order irreversible exothermic reactions A → B and B → C take place. The reactors are assumed to be perfectly mixed, with constant density, liquid volume, and heat capacity. The outlet of the second CSTR is fed into the flash separator at a flow rate $F_2$. The overhead of the separator is condensed and passed to a downstream unit at flow rate $F_p$, with a recycle to the first reactor at flow rate $F_r$, and the bottom product stream is removed at flow rate $F_3$. Each tank is equipped with a jacket to heat or cool the tank, and $Q_1$, $Q_2$, and $Q_3$ are the heat inputs/removals. A detailed model of the process in the form of ordinary differential equations (ODEs) is described in [32]. The parameter values of the model are shown in Table 1. In total, the model contains nine ODEs corresponding to the dynamics of the concentrations and temperatures of each tank. It is assumed that the temperatures of all the tanks are measurable, so the system output is $y = [T_1, T_2, T_3]^T$. The concentrations of components A and B in each tank and the temperatures of the tanks are the states of the process; that is, $x = [X_{A1}, X_{B1}, T_1, X_{A2}, X_{B2}, T_2, X_{A3}, X_{B3}, T_3]^T$, and the input vector is $u = [F_{10}, F_{20}, Q_1, Q_2, Q_3, F_r, F_p]^T$.
In order to control and monitor the quality of the product, the concentration of component B in the separator, $X_{B3}$, is assumed to be an important process variable and is considered the target variable; that is, $y_t = [X_{B3}]$. It is desired to estimate $X_{B3}$ at each sampling time based on the process information of the input $u$ and the measured output $y$. Since we are mainly concerned with the target output $X_{B3}$, it is not necessary to estimate all the other states of the process.
A first-principles model of the process was built based on physico-chemical phenomena, connecting the different unit operations by mass and heat balances [1]. It is verified that, based on the measurements of $y$ and the first-principles model, the entire state $x$ can be estimated. In this section, we illustrate how a reduced-order estimator may be designed using the proposed approach to estimate $X_{B3}$, and we compare its performance with that of a full-order estimator based on the actual first-principles model to show the benefits of using a reduced-order estimator.

5.2. Selection of the Reduced State and Input Vectors

Following the steps illustrated in Figure 1, to obtain the sensitivity matrices for reduced state and input selection, we first perform open-loop simulations based on the first-principles model of the process to generate data. In the simulations, the values of the process parameters used are shown in Table 1. A random process noise $w(t)$ is added, generated as Gaussian white noise with zero mean and a standard deviation of 0.01. The process model is solved using the fourth-order Runge-Kutta method with a sampling time of 0.01 h. All data are produced using randomly generated inputs within the allowable ranges, with the inputs changing every 2 h. Figure 3 shows the trajectories of one of the inputs and the target output. A sketch of this data generation procedure is given below.
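The following sketch shows how such open-loop data can be generated: fourth-order Runge-Kutta integration with a 0.01 h step, piecewise-constant random inputs held for 2 h, and additive Gaussian noise. The callable ode_rhs stands in for the nine-ODE process model of [32] (not reproduced here), and u_min, u_max denote the allowable input ranges; these names are assumptions for illustration.

```python
import numpy as np

def rk4_step(ode_rhs, x, u, dt=0.01):
    """One fourth-order Runge-Kutta step of the ODE model dx/dt = ode_rhs(x, u)."""
    k1 = ode_rhs(x, u)
    k2 = ode_rhs(x + 0.5 * dt * k1, u)
    k3 = ode_rhs(x + 0.5 * dt * k2, u)
    k4 = ode_rhs(x + dt * k3, u)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def generate_data(ode_rhs, x0, u_min, u_max, hours=20.0, dt=0.01, hold=2.0,
                  noise_std=0.01, seed=0):
    """Simulate with piecewise-constant random inputs and additive noise."""
    rng = np.random.default_rng(seed)
    steps, hold_steps = int(hours / dt), int(hold / dt)
    x = np.asarray(x0, dtype=float)
    X, U = [], []
    for k in range(steps):
        if k % hold_steps == 0:        # draw a new random input level every 2 h
            u = rng.uniform(u_min, u_max)
        x = rk4_step(ode_rhs, x, u, dt) + rng.normal(0.0, noise_std, x.size)
        X.append(x.copy())
        U.append(u.copy())
    return np.array(X), np.array(U)
```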
We compute the sensitivity matrix $S_O$ of the target output with respect to the state following (11). Then, we apply the SVD to the sensitivity matrix to find the singular values and the associated singular vectors. Figure 4A shows the nonzero singular values of the sensitivity matrix in a semi-log plot, in descending order, against their index numbers. There are only eight entries in this plot while the state dimension is nine; that is, one singular value is zero. It can be observed from the plot that there is a clear gap between the third and fourth singular values. Therefore, the first three singular values are considered the dominant ones.
After determining the dominant singular values, we calculate the measure (13) based on the three dominant singular values. Figure 4B shows the $D_j$ values of the state elements, i.e., the contributions of the state elements to the three dominant singular values. From Figure 4B, it can be seen that $T_1$, $X_{B3}$, $T_2$, and $X_{B2}$ have relatively larger $D_j$ values, which implies that these state elements contribute most to the three dominant singular values. Further, $X_{B1}$, $X_{A2}$, $X_{A1}$, and $X_{A3}$ have much smaller $D_j$ values, and $T_3$ has a $D_j$ value equal to 0; these state elements contribute much less to the three dominant singular values. Therefore, the initial reduced state vector includes $T_1$, $X_{B3}$, $T_2$, and $X_{B2}$, which have a relatively larger impact on the three dominant singular values; that is, $\tilde{x} = [T_1, X_{B3}, T_2, X_{B2}]^T$. Note that the current $\tilde{x}$ is determined based on $y_t$. Next, we check whether the measured outputs $y = [y_1, y_2, y_3]^T = [T_1, T_2, T_3]^T$ can be expressed based on $\tilde{x}$. It can be found that $y_3 = T_3$ cannot be expressed based on $\tilde{x}$. However, since $T_3$ is not related to the three dominant singular values ($D_j = 0$ for $T_3$), $y_3$ is not really useful in estimating the target output. Instead of expanding $\tilde{x}$ to include $T_3$, we remove $y_3$ from the measurements used in estimating $y_t$.
Once $\tilde{x} = [T_1, X_{B3}, T_2, X_{B2}]^T$ is determined, we continue to determine the reduced input vector $\tilde{u}$. Similarly, based on open-loop simulation data, we calculate the sensitivity matrix $S_C$ following (16). Then, we apply the SVD to $S_C$ to find its dominant singular values and the closely related inputs. Figure 5A shows the singular values of $S_C$. From the figure, it can be seen that there is a significant gap between the sixth and seventh singular values; therefore, the first six singular values are considered the dominant ones. For these singular values, we further calculate the $D_j$ values, which are shown in Figure 5B. From Figure 5B, it can be seen that six inputs contribute significantly to the singular values. These inputs are included in the reduced input vector; that is, $\tilde{u} = [Q_1, F_{20}, F_r, Q_3, Q_2, F_{10}]^T$.
The elements of the reduced state vector $\tilde{x}$, the reduced input vector $\tilde{u}$, and the measured outputs used for estimating $y_t$ are summarized in Table 2.

5.3. Reduced-Order Model and Estimator

Based on the selected $\tilde{x}$ and $\tilde{u}$, further open-loop simulations of the actual process model are performed, and data are collected for LSTM model development. Note that the data are only collected for $\tilde{x}$ and $\tilde{u}$.
In the training of the LSTM, different $n_l$ values were considered, and it was found that $n_l = 2$ is sufficient to obtain a good LSTM model. The LSTM has two hidden layers of 50 neurons each and one dense output layer of 4 neurons.
Once the LSTM model is developed, the EKF is designed based on the LSTM model. For the EKF estimator, the weighting matrices are diagonal, with $Q = \mathrm{diag}\{[0.005^2, 0.005^2, 0.005^2, 0.005^2]\}$, $R = \mathrm{diag}\{[20^2, 20^2]\}$, and $P(0) = \mathrm{diag}\{[100^2, 100^2, 100^2, 100^2]\}$.
Remark 3.
Although the LSTM was used in this example, the sequence length was small ($n_l = 2$). The real benefits of the LSTM are expected to be observed more clearly when the system dynamics under study require a longer sequence length. For this example, it is also possible to apply different modeling approaches to find the structure of the model.
Remark 4.
In this work, some of the tuning hyperparameters in the LSTM training, such as the number of epochs (30) and the batch size (100), were selected based on a number of simulation experiments. It was also observed that the default learning rate set by the Keras library was suitable for the developed model. Similarly, the LSTM architecture (50 units per layer) was determined through a number of simulation experiments. While this brute-force approach may be suitable for smaller LSTM models, a systematic approach to tuning the hyperparameters should be considered for more complicated systems.

5.4. Results and Discussion

In this subsection, we evaluate the performance of the reduced-order estimator developed above. To evaluate the performance, we use the average normalized estimation error of the target output $X_{B3}$, defined as follows:
$$\sigma_{X_{B3}} = \sqrt{\frac{1}{N_{sim}} \sum_{j=0}^{N_{sim}-1} \left( \frac{\hat{X}_{B3}(t_j) - X_{B3}(t_j)}{X_{B3}(t_j)} \right)^2} \tag{28}$$
where $N_{sim}$ is the total number of simulation steps, $\hat{X}_{B3}$ denotes the estimated value, and $X_{B3}$ denotes the actual value of the target variable. All the simulations were conducted on a desktop computer with an Intel i7 CPU at 3.2 GHz and 16 GB RAM. The LSTM models were trained using Keras and TensorFlow in the Python programming language.
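For reference, the metric in (28) can be computed in a few lines; this sketch reads (28) as a root-mean-square relative error, which is an interpretation of the reconstructed formula.

```python
import numpy as np

def sigma_metric(x_est, x_true):
    """Average normalized estimation error of Eq. (28), in percent."""
    rel = (np.asarray(x_est) - np.asarray(x_true)) / np.asarray(x_true)
    return 100.0 * np.sqrt(np.mean(rel ** 2))
```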
We design various simulation cases to test the performance of the proposed reduced-order estimation. Specifically, we consider four different schemes: (a) Scheme 1, the proposed reduced-order estimator based on the LSTM model with $n_l = 2$; (b) Scheme 2, a soft sensor that exploits the correlation between $y$, $u$, and $y_t$; (c) Scheme 3, a full-order state estimator based on a regular EKF and the actual full-order process model; and (d) Scheme 4, an LSTM model ($n_l = 2$) with all the measured variables ($u$, $y$) as well as the target output $y_t$ as the inputs and the target output $y_t$ as the single output. All schemes are tuned to give their best performance for a fair comparison. Further, we note that the soft sensor in Scheme 2 was developed based on a dense neural network with $y$ and $u$ as the inputs and $y_t$ as the output; it explores the static correlation between $y$, $u$, and $y_t$ but does not consider the dynamics of the system. Scheme 4 explores only the dynamic relation between the measured variables and the target output. In the proposed Scheme 1, by contrast, the LSTM includes more state variables in its inputs and outputs to capture the dynamics that are essential for estimating the target output. Since there are unmeasured state variables in the LSTM model used in Scheme 1, the EKF is used together with the LSTM to estimate the target output based on the measured variables.
First, we show the LSTM modeling and reduced-order estimation performance. Figure 6A,B show the actual target output $X_{B3}$ together with the one-step ahead and multi-step ahead open-loop predictions of the trained LSTM. From these plots, it can be seen that the trained LSTM model performs very well in predicting the evolution of $X_{B3}$. Based on many simulations with different initial conditions and noise realizations, the corresponding $\sigma_{X_{B3}}$ values for the single-step and multi-step ahead predictions are 0.111% and 1.003%, respectively. These numbers further verify that the trained model has good performance. Note that the results in Figure 6A,B are based on initializing the LSTM model from the actual initial $X_{B3}$ value; they only reflect the performance of the trained model.
Figure 6C shows the estimation performance of the proposed reduced-order estimation scheme. The estimator was initialized with a value different from the actual $X_{B3}$ value. From the plot, it can be seen that the estimate of the proposed reduced-order estimation scheme (Scheme 1) converges to the actual value quickly and then follows it closely. The corresponding $\sigma_{X_{B3}}$ of the proposed reduced-order estimator is 1.43%, obtained from extensive simulations with different initial conditions and noise realizations. This demonstrates that the proposed approach is effective and applicable when only the target output needs to be estimated.
Next, we present the performance of the soft sensor in Scheme 2. The inputs of the soft sensor are $T_1$, $T_2$, $T_3$, $Q_1$, $F_{20}$, $F_r$, $Q_3$, $Q_2$, $F_{10}$, and $F_p$, and the output is the target variable $X_{B3}$. A dense neural network is trained. Figure 7 shows the results on the same trajectory of $X_{B3}$ as used in the previous simulation. From the figure, it can be seen that while the soft sensor can track the overall trend of $X_{B3}$, its prediction performance is much poorer than that of the estimator in Scheme 1. The $\sigma_{X_{B3}}$ of the soft sensor (Scheme 2), calculated from various simulations, was 5.36%, which is much larger than the value for the proposed reduced-order estimator. This set of simulations illustrates that the proposed reduced-order estimator gives much better estimation performance than the soft sensor. The improvement of Scheme 1 over Scheme 2 comes from the explicit consideration of the dynamics of the system and the use of the EKF. Note that the reported performance metrics are obtained from many simulations.
Then, we consider the full-order EKF based on the actual nonlinear model of the process. Figure 8 shows the estimation performance of the full-order EKF (Scheme 3). From Figure 8, it can be seen that the full-order EKF can also track the trend of $X_{B3}$, but the estimate is very noisy compared with the estimated values in Scheme 1. The corresponding $\sigma_{X_{B3}}$ value of the full-order EKF is 4.61%, which is higher than the value for Scheme 1.
The degraded performance of the full-order estimator in Scheme 3 compared with Scheme 1 may be explained by examining the degrees of observability of the estimated variables in the two schemes. Let us consider the following criterion for measuring the degree of observability of a system [1,24]:
$$\gamma(D_o) = \frac{\min\{\lambda_i(D_o) \mid i = 1, \ldots, n\}}{\max\{\lambda_i(D_o) \mid i = 1, \ldots, n\}} \tag{29}$$
where $D_o$ is the observability matrix of the system and $\lambda_i$ is the $i$-th singular value of $D_o$. For the nonlinear process and the nonlinear reduced-order model, we use linearization to find the corresponding linear systems and then construct the corresponding observability matrices $D_o$. For the full-order nonlinear system, the degree of observability was found to be about $1.3 \times 10^{-5}$, while for the reduced-order model, it was 0.02. The reduced-order system clearly has a much larger degree of observability than the full-order system. This makes sense since, in the reduced-order model, the measured outputs are used to estimate only the four selected states, whereas in the full-order estimator, the same number of measured outputs is used to estimate the entire state vector $x$, which contains nine elements. The much-improved degree of observability explains why Scheme 1 gives much better estimation performance than Scheme 3. A sketch of this criterion is given below.
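As an illustration, the criterion in (29) can be evaluated as follows for a linearized pair $(A, C)$; building the standard observability matrix from the linearization is an assumption about how $D_o$ is constructed.

```python
import numpy as np

def observability_degree(A, C):
    """Degree of observability, Eq. (29): min/max singular value of D_o."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)       # C, CA, CA^2, ..., CA^(n-1)
    D_o = np.vstack(blocks)
    s = np.linalg.svd(D_o, compute_uv=False)
    return s.min() / s.max()                # in [0, 1]; larger is better
```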
Now, we present the performance of the LSTM model in Scheme 4. Figure 9 shows the predicted target output and the actual trajectory of the target output. From the figure, it can be seen that the LSTM in Scheme 4 performs relatively poorly in predicting the target output compared with the proposed approach in Scheme 1. This can also be seen from the $\sigma_{X_{B3}}$ value, which is 2.00% for Scheme 4 and 1.43% for the proposed Scheme 1. This is expected, since the LSTM in Scheme 4 uses only the measured variables, which cannot appropriately capture the essential dynamics needed to describe the target output $X_{B3}$. The proposed approach keeps all the necessary variables for capturing the dynamics of $X_{B3}$ in the LSTM, and the EKF can be used to estimate the target output and the other unmeasured variables. It is also verified that, thanks to the use of the EKF, the proposed approach is more robust to noise in the measured variables, while the LSTM in Scheme 4 is more sensitive to measurement noise. When the measurement noise level increases to $\sigma_v = 0.1$, the proposed approach gives $\sigma_{X_{B3}} = 2.02\%$, while the LSTM in Scheme 4 leads to a much worse $\sigma_{X_{B3}} = 6.12\%$.
Finally, we consider the computational complexity of the four schemes. Table 3 shows their simulation times. It can be seen that the soft sensor in Scheme 2 and the LSTM in Scheme 4 are the fastest. The proposed reduced-order estimator (Scheme 1) gives the best estimation performance and has a relatively lower computational cost (15 s for the entire simulation) compared with the full-order estimator (20 s for the entire simulation). The smaller computational times of Schemes 2 and 4 are due to the fact that they do not need to evaluate an EKF, whereas Schemes 1 and 3 calculate the Jacobians at every instant to predict the covariance matrix, and the EKF update step requires additional time. Scheme 3 evaluates a larger Jacobian matrix than Scheme 1 and updates all the state variables, so its computational cost is comparatively higher. This further illustrates that the proposed reduced-order estimator can bring much better estimation performance while using fewer computational resources than a full-order estimator.

6. Conclusions

This paper proposed an approach to select the appropriate inputs and outputs for data-driven reduced-order model development in the framework of the LSTM neural network for reduced-order estimator design. A sensitivity-based approach was used to select the reduced state and input vectors. The LSTM neural network was used to develop the reduced-order model, and an EKF was designed to obtain the reduced-order estimator. The application to a chemical process demonstrated the applicability and effectiveness of the proposed approach in achieving good target output estimation. In the simulations, the proposed approach was compared with a soft sensor design that did not consider the dynamics of the process, a full-order EKF, and an LSTM that uses only the measured variables. It was found that the proposed approach gives the best target output estimation performance: its $\sigma_{X_{B3}}$ is about 30% smaller than that of Scheme 4, which uses only measured variables in training the LSTM, and more than 70% smaller than those of the full-order state estimator and the soft sensor, which only uses the static relationship between the measured variables and the target output.

Author Contributions

Conceptualization, S.D. and J.L.; methodology, S.D., S.R.S., B.T.A. and J.L.; software, S.D.; validation, S.D.; investigation, S.D.; data curation, S.D.; writing—original draft preparation, S.D.; writing—review and editing, J.L.; supervision, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

Financial support from the Natural Sciences and Engineering Research Council of Canada and Alberta Innovates is gratefully acknowledged.

Data Availability Statement

The data used in this study are available at Harvard Dataverse [33]. The simulations and calculations were carried out using Python and the code files are also available in the dataverse.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yin, X.; Liu, J. State estimation of wastewater treatment plants based on model approximation. Comput. Chem. Eng. 2018, 111, 79–91.
  2. Farrell, B.F.; Ioannou, P.J. State Estimation Using a Reduced-Order Kalman Filter. J. Atmos. Sci. 2001, 58, 3666–3680.
  3. Debnath, S.; Sahoo, S.R.; Decardi-Nelson, B.; Liu, J. Subsystem decomposition and distributed state estimation of nonlinear processes with implicit time-scale multiplicity. AIChE J. 2022, 68, e17661.
  4. Sahoo, S.R.; Liu, J. Adaptive model reduction and state estimation of agro-hydrological systems. Comput. Electron. Agric. 2022, 195, 106825.
  5. Singh, A.K.; Hahn, J. State estimation for high-dimensional chemical processes. Comput. Chem. Eng. 2005, 29, 2326–2334.
  6. Narasingam, A.; Kwon, J.S.I. Data-driven identification of interpretable reduced-order models using sparse regression. Comput. Chem. Eng. 2018, 119, 101–111.
  7. Son, S.H.; Choi, H.K.; Moon, J.; Kwon, J.S.I. Hybrid Koopman model predictive control of nonlinear systems using multiple EDMD models: An application to a batch pulp digester with feed fluctuation. Control Eng. Pract. 2022, 118, 104956.
  8. Shah, P.; Sheriff, M.Z.; Bangi, M.S.F.; Kravaris, C.; Kwon, J.S.I.; Botre, C.; Hirota, J. Deep neural network-based hybrid modeling and experimental validation for an industry-scale fermentation process: Identification of time-varying dependencies among parameters. Chem. Eng. J. 2022, 441, 135643.
  9. Lee, D.; Jayaraman, A.; Kwon, J.S. Development of a hybrid model for a partially known intracellular signaling pathway through correction term estimation and neural network modeling. PLoS Comput. Biol. 2020, 16, e1008472.
  10. Bangi, M.S.F.; Kwon, J.S.I. Deep hybrid model-based predictive control with guarantees on domain of applicability. AIChE J. 2022, accepted.
  11. Alhajeri, M.S.; Wu, Z.; Rincon, D.; Albalawi, F.; Christofides, P.D. Machine-learning-based state estimation and predictive control of nonlinear processes. Chem. Eng. Res. Des. 2021, 167, 268–280.
  12. Li, D.; Zhou, J.; Liu, Y. Recurrent-neural-network-based unscented Kalman filter for estimating and compensating the random drift of MEMS gyroscopes in real time. Mech. Syst. Signal Process. 2021, 147, 107057.
  13. Xia, Y.; Wang, J. Low-dimensional recurrent neural network-based Kalman filter for speech enhancement. Neural Netw. 2015, 67, 131–139.
  14. Fernando, T.; Maier, H.; Dandy, G. Selection of input variables for data driven models: An average shifted histogram partial mutual information estimator approach. J. Hydrol. 2009, 367, 165–176.
  15. Balachandran, P.V.; Xue, D.; Theiler, J.; Hogden, J.; Gubernatis, J.E.; Lookman, T. Importance of feature selection in machine learning and adaptive design for materials. In Materials Discovery and Design; Springer: Berlin/Heidelberg, Germany, 2018; pp. 59–79.
  16. Vijaya Raghavan, S.; Radhakrishnan, T.; Srinivasan, K. Soft sensor based composition estimation and controller design for an ideal reactive distillation column. ISA Trans. 2011, 50, 61–70.
  17. Ke, W.; Huang, D.; Yang, F.; Jiang, Y. Soft sensor development and applications based on LSTM in deep neural networks. In Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 27 November–1 December 2017; pp. 1–6.
  18. Sharmin, R.; Sundararaj, U.; Shah, S.; Vande Griend, L.; Sun, Y.J. Inferential sensors for estimation of polymer quality parameters: Industrial application of a PLS-based soft sensor for a LDPE plant. Chem. Eng. Sci. 2006, 61, 6372–6384.
  19. Zamprogna, E.; Barolo, M.; Seborg, D.E. Optimal selection of soft sensor inputs for batch distillation columns using principal component analysis. J. Process Control 2005, 15, 39–52.
  20. Zhao, T.; Zheng, Y.; Wu, Z. Improving computational efficiency of machine learning modeling of nonlinear processes using sensitivity analysis and active learning. Digit. Chem. Eng. 2022, 3, 100027.
  21. Zhao, T.; Zheng, Y.; Gong, J.; Wu, Z. Machine learning-based reduced-order modeling and predictive control of nonlinear processes. Chem. Eng. Res. Des. 2022, 179, 435–451.
  22. Zhao, T.; Zheng, Y.; Wu, Z. Feature selection-based machine learning modeling for distributed model predictive control of nonlinear processes. Comput. Chem. Eng. 2023, 169, 108074.
  23. Liu, J.; Gnanasekar, A.; Zhang, Y.; Bo, S.; Liu, J.; Hu, J.; Zou, T. Simultaneous state and parameter estimation: The role of sensitivity analysis. Ind. Eng. Chem. Res. 2021, 60, 2971–2982.
  24. Grubben, N.L.; Keesman, K.J. Controllability and observability of 2D thermal flow in bulk storage facilities using sensitivity fields. Int. J. Control 2018, 91, 1554–1566.
  25. Stigter, J.D.; Joubert, D.; Molenaar, J. Observability of complex systems: Finding the gap. Sci. Rep. 2017, 7, 16566.
  26. Stigter, J.; van Willigenburg, L.; Molenaar, J. An efficient method to assess local controllability and observability for non-linear systems. IFAC-PapersOnLine 2018, 51, 535–540.
  27. Stigter, J.D.; Molenaar, J. A fast algorithm to assess local structural identifiability. Automatica 2015, 58, 118–124.
  28. Li, R.; Henson, M.; Kurtz, M. Selection of model parameters for off-line parameter estimation. IEEE Trans. Control Syst. Technol. 2004, 12, 402–412.
  29. Wang, H.; Ma, H. Optimal investment portfolios for internet money funds based on LSTM and La-VaR: Evidence from China. Mathematics 2022, 10, 2864.
  30. Abuqaddom, I.; Mahafzah, B.A.; Faris, H. Oriented stochastic loss descent algorithm to train very deep multi-layer neural networks without vanishing gradients. Knowl.-Based Syst. 2021, 230, 107391.
  31. Rauh, A.; Wirtensohn, S.; Hoher, P.; Reuter, J.; Jaulin, L. Reliability assessment of an unscented Kalman filter by using ellipsoidal enclosure techniques. Mathematics 2022, 10, 3011.
  32. Yin, X.; Liu, J. Distributed moving horizon state estimation of two-time-scale nonlinear systems. Automatica 2017, 79, 152–161.
  33. Debnath, S.; Sahoo, S.R.; Agyeman, B.T.; Liu, J. Replication Data for: Input-Output Selection for LSTM-Based Reduced-Order State Estimator Design. Harvard Dataverse, 2023. Available online: https://doi.org/10.7910/DVN/7W68ED (accessed on 9 January 2023).
Figure 1. The flow chart of the proposed approach.
Figure 2. Two continuous stirred tank reactors and a flash separator process.
Figure 3. Trajectories of the input $F_{10}$ and the target output $X_{B3}$.
Figure 4. Singular values of $S_O$ and the $D_j$ values associated with the dominant singular values.
Figure 5. Singular values of $S_C$ and the $D_j$ values associated with the dominant singular values.
Figure 6. Trajectories of the actual state $X_{B3}$ and the single-step ahead prediction (A), the multi-step ahead prediction using the LSTM model (B), and the estimated target variable using the proposed reduced-order estimator in Scheme 1 (C).
Figure 7. Trajectories of the actual state $X_{B3}$ and the predicted target variable using the soft sensor in Scheme 2.
Figure 8. Trajectories of the actual state $X_{B3}$ and the estimated target variable using the full-order estimator in Scheme 3.
Figure 9. Trajectories of the actual state $X_{B3}$ and the predicted target variable using the LSTM in Scheme 4.
Table 1. Parameter values.

$T_{01} = 300$ K | $\Delta H_1 = -6.0 \times 10^4$ kJ/kmol
$T_{02} = 300$ K | $\Delta H_2 = -7.0 \times 10^4$ kJ/kmol
$E_1 = 5 \times 10^4$ kJ/kmol | $R = 8.314$ kJ/(kmol·K)
$E_2 = 6 \times 10^4$ kJ/kmol | $k_1 = 9.972 \times 10^6$ h$^{-1}$
$c_p = 4.2$ kJ/(kg·K) | $k_2 = 9.36 \times 10^6$ h$^{-1}$
$\Delta H_{vapA} = 3.57 \times 10^4$ kJ/kmol | $V_1 = 4$ m$^3$
$\Delta H_{vapB} = 1.57 \times 10^4$ kJ/kmol | $V_2 = 4$ m$^3$
$\Delta H_{vapC} = 4.07 \times 10^4$ kJ/kmol | $V_3 = 4$ m$^3$
$\alpha_A = 3.5$ | $\alpha_B = 1.0$
$\alpha_C = 0.5$ | $X_{A10} = 1.0$
$X_{B10} = 0.0$ | $X_{A20} = 1.0$
$X_{B20} = 0.0$ | $\rho = 1000$ kg/m$^3$
$Q_1 = 3 \times 10^6$ kJ/h | $Q_2 = 1 \times 10^6$ kJ/h
$Q_3 = 3 \times 10^6$ kJ/h | $F_p = 0.5$ m$^3$/h
$F_{10} = 12.0$ m$^3$/h | $F_{20} = 3.0$ m$^3$/h
$F_r = 13.4$ m$^3$/h |
Table 2. Elements of the reduced state and input vectors and the measured outputs used.

State: $X_{B3}$, $T_1$, $T_2$, $X_{B2}$
Output: $T_1$, $T_2$
Input: $Q_1$, $F_{20}$, $F_r$, $Q_3$, $Q_2$, $F_{10}$
Table 3. $\sigma_{X_{B3}}$ values for the trained LSTM model and the different schemes.

Method | $\sigma_{X_{B3}}$ (%) | Simulation Time (s)
Single-step ahead prediction (SSAP) | 0.111 | -
Multi-step ahead prediction (MSAP) | 1.003 | -
Scheme 1 | 1.43 | 15
Scheme 2 | 5.36 | 3
Scheme 3 | 4.61 | 20
Scheme 4 | 2.00 | 3

