Article

Economic Model Predictive Control of Nonlinear Systems Using Online Learning of Neural Networks

1 Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore
2 Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095, USA
* Author to whom correspondence should be addressed.
Processes 2023, 11(2), 342; https://doi.org/10.3390/pr11020342
Submission received: 23 November 2022 / Revised: 12 January 2023 / Accepted: 17 January 2023 / Published: 20 January 2023
(This article belongs to the Special Issue Machine Learning in Model Predictive Control and Optimal Control)

Abstract:
This work focuses on the development of a Lyapunov-based economic model predictive control (LEMPC) scheme that utilizes recurrent neural networks (RNNs) with an online update to optimize the economic benefits of switched non-linear systems subject to a prescribed switching schedule. We first develop an initial offline-learning RNN using historical operational data, and then update RNNs with real-time data to improve model prediction accuracy. The generalized error bounds for RNNs updated online with independent and identically distributed (i.i.d.) and non-i.i.d. data samples are derived, respectively. Subsequently, by incorporating online updating RNNs within LEMPC, probabilistic closed-loop stability and economic optimality are achieved simultaneously for switched non-linear systems accounting for the RNN generalized error bound. A chemical process example with scheduled mode transitions is used to demonstrate that the closed-loop economic performance under LEMPC can be improved using an online update of RNNs.

1. Introduction

Economic model predictive control (EMPC), which incorporates economic considerations within process control, has attracted considerable attention in the control community over recent decades. Model predictive control (MPC) is applied in a wide variety of applications due to its ability to handle hard constraints on system states and manipulated inputs. The key idea of MPC is to compute an optimal input sequence using state feedback at the current sampling instant, of which only the first input is fed to the system. Typically, a quadratic cost function is used in tracking MPC schemes to penalize the deviation of predicted system states and manipulated inputs from their steady-state values over a finite prediction horizon, such that the system is driven to its desired steady-state by minimizing the quadratic cost function. Unlike the steady-state operation of conventional tracking MPC schemes, EMPC generally uses a non-quadratic objective function and operates the system in a dynamic fashion (off steady-state) by optimizing process economics. Many works have addressed closed-loop stability, economic optimization considerations, and model uncertainty for non-linear systems under EMPC (e.g., [1,2,3,4,5,6]).
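The receding-horizon mechanism described above can be made concrete with a minimal sketch; the scalar model $x^{+} = 0.9x + 0.5u$, the quadratic cost weights, and the coarse input grid are illustrative assumptions, not taken from the paper:

```python
import itertools

def mpc_input(x, N=3, grid=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Brute-force MPC step: search all input sequences over horizon N, return the first input."""
    best_u0, best_cost = 0.0, float("inf")
    for u_seq in itertools.product(grid, repeat=N):
        xk, cost = x, 0.0
        for u in u_seq:
            xk = 0.9 * xk + 0.5 * u            # prediction with the model x+ = 0.9x + 0.5u
            cost += xk ** 2 + 0.1 * u ** 2     # quadratic penalty on state and input deviations
        if cost < best_cost:
            best_cost, best_u0 = cost, u_seq[0]
    return best_u0                              # only the first input of the sequence is applied

x = 5.0
for _ in range(20):                             # receding-horizon closed loop: re-solve each step
    x = 0.9 * x + 0.5 * mpc_input(x)
```

In a tracking formulation like this one, the closed-loop state is driven toward the steady state at the origin; EMPC replaces the quadratic stage cost with a general economic objective.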
In real life, dynamical processes often involve mode transitions that may arise for various reasons (e.g., actuator/sensor faults, feedback changes, and changes in environmental factors), which gives rise to an important research subject: switched systems. A class of systems with multiple switching modes is termed switched systems, whose active mode is determined by a switching signal. Switched systems have wide applications in engineering practice (e.g., mobile robots [7], electrical circuits [8], and flight control [9]). Control and optimization of switched systems have been extensively explored using methods based on Lyapunov stability theory [10,11], dwell-time [12,13], and linear matrix inequalities [14]. In [15], a Lyapunov-based MPC framework was presented to stabilize switched non-linear systems that execute mode transitions at prescribed switching times. Following this direction, in [16], a Lyapunov-based EMPC method was proposed to address the stabilization and economic optimality of switched non-linear systems subject to a prescribed switching schedule.
An accurate process model is a key requirement for achieving the desired control performance under EMPC. To this end, the above EMPC schemes often assume that a process model with the desired prediction accuracy can be obtained using first-principles modeling approaches. However, capturing the non-linear dynamics of complex and large-scale systems with first-principles models can be cumbersome and inaccurate when the physico-chemical phenomena of the system are not well understood. Machine learning (ML) algorithms have shown great success in a variety of application domains in recent years, e.g., the resolving-power domination number of probabilistic neural networks was investigated in [17], and Gaussian process models were used to capture the dynamics of non-linear processes with unknown dynamics in [18] and with time-varying dynamics in [19]. As a powerful black-box modeling tool among various ML algorithms, RNNs have achieved impressive results in ML-based control for non-linear systems since they can approximate non-linear dynamics based on time-series data [20,21,22]. In [23], an RNN model was constructed offline to predict future states for an EMPC that optimizes economic benefits for non-linear systems while maintaining closed-loop stability. However, since ML models are generally trained offline to model non-linear systems under normal operation (i.e., without model uncertainty) using historical operational data, the resulting offline-trained ML models may not approximate well the real-time non-linear dynamics subject to model uncertainty. Therefore, the presence of model uncertainty could degrade the control performance of real-world non-linear processes under ML-based EMPC with offline-trained ML models.
In [24], online learning with event-triggered and error-triggered mechanisms was applied to update ML models based on real-time data to learn model uncertainty, thus improving the control performance of non-linear systems subject to model uncertainty under ML-based EMPC.
Many works have been developed to integrate online learning models with control designs for non-linear processes (e.g., [25,26,27,28]). Although online learning models have shown their effectiveness in improving the prediction accuracy and control performance of non-linear processes, characterizing their generalization performance on an unseen testing set remains a critical challenge for real-time implementation of online ML-based controllers in practice. The generalized error bound is widely used to evaluate how well an ML model developed on the training set generalizes to the unseen testing set. The generalized error bound for online ML models has been developed in [29,30,31] by assuming that the online learner receives a data sequence generated in an i.i.d. manner. Certain efforts have been made in [32,33] to remove the i.i.d. assumption on training data points, for which the generalized error bound for online ML models updated with a set of non-i.i.d. data points has been derived. In our previous work, the generalized error bounds for RNNs updated online using i.i.d. and non-i.i.d. data points were established in [34,35], respectively, and the error bounds were utilized to derive closed-loop stability properties for switched non-linear systems without and with process disturbances under online updating RNN-based MPC. However, at this stage, it remains unclear how online learning RNNs can be integrated with EMPC to optimize economic benefits for switched non-linear systems while maintaining closed-loop stability.
To fill this research gap, this work aims to incorporate online learning RNNs into LEMPC to address closed-loop stability and economic optimality for switched non-linear systems operating under scheduled mode transitions. Specifically, the notation, class of switched non-linear systems, and the developments of RNNs are presented in Section 2. The generalized error bounds for RNNs updated online using i.i.d. and non-i.i.d. training data points are derived in Section 3. In Section 4, an LEMPC scheme that integrates online updating RNNs is proposed for switched non-linear systems involving process disturbances, under which probabilistic closed-loop stability is proved based on the RNN generalized error bound. In Section 5, a non-linear chemical process example with scheduled mode transitions is presented to demonstrate the efficacy of the proposed LEMPC scheme.

2. Preliminaries

2.1. Notation

The operators $\|\cdot\|_F$ and $|\cdot|$ are used to represent the Frobenius norm of a matrix and the Euclidean norm of a vector, respectively. The function $f(\cdot)$ belongs to class $\mathcal{C}^1$ if $f(\cdot)$ is continuously differentiable. We use the operator $\setminus$ to represent set subtraction, i.e., $M \setminus V := \{x \in \mathbb{R}^n \mid x \in M,\ x \notin V\}$. The continuous function $f : [0, a) \to [0, \infty)$ belongs to class $\mathcal{K}$ if it satisfies $f(0) = 0$ and increases strictly in its domain. $\mathbb{E}[X]$ and $P(A)$ are used to denote the expected value of a random variable $X$ and the probability of an event $A$ occurring, respectively. Let $a_m$ and $b_m$ be two sequences; we write $a_m = O(b_m)$ provided that $\limsup_{m \to \infty} a_m / b_m < \infty$.

2.2. Class of Switched Non-Linear Systems

In this work, a class of switched non-linear systems described by the following first-order ordinary differential equations (ODEs) is considered.
$$\dot{x} = F_{\sigma(t)}(x, u_{\sigma(t)}, w_{\sigma(t)}) := f_{\sigma(t)}(x) + g_{\sigma(t)}(x)\,u_{\sigma(t)} + h_{\sigma(t)}(x)\,w_{\sigma(t)} \qquad (1)$$

where $x \in \mathbb{R}^n$, $u_{\sigma(t)} \in \mathbb{R}^{n_u}$, and $w_{\sigma(t)} \in \mathbb{R}^{n_w}$ denote the vectors of system states, control inputs, and disturbances, respectively. The control input constraint is given by $u_{\sigma(t)} \in U_{\sigma(t)}$, where the set $U_{\sigma(t)} := \{u_{\sigma(t)}^{\min} \le u_{\sigma(t)} \le u_{\sigma(t)}^{\max}\}$ is defined by the vectors of minimum and maximum input values $u_{\sigma(t)}^{\min}$ and $u_{\sigma(t)}^{\max}$. The disturbance vector is subject to the constraint $w_{\sigma(t)} \in W_{\sigma(t)} := \{|w_{\sigma(t)}| \le w_m^{\sigma(t)},\ w_m^{\sigma(t)} \ge 0\}$. The switching function $\sigma(t) : [0, \infty) \to \psi$ takes a value in $\psi := \{1, \dots, p\}$, where $p$ denotes the number of switching modes. Throughout this manuscript, the notations $t_k^{out}$ and $t_k^{in}$ are used to represent the times at which the $k$-th mode (i.e., $k \in \psi$) of Equation (1) is switched out and in, respectively. Therefore, the state-space model of Equation (1) is denoted by $\dot{x} = F_k(x, u_k, w_k)$ with $\sigma(t) = k$ when the system operates under mode $k$ for $t \in [t_k^{in}, t_k^{out})$. For all $k \in \psi$, $f_k(\cdot)$, $g_k(\cdot)$, and $h_k(\cdot)$ are assumed to be sufficiently smooth functions of dimensions $n \times 1$, $n \times n_u$, and $n \times n_w$, respectively. Additionally, for all $k \in \psi$, we assume that $f_k(0) = 0$ and the initial time $t_0$ is zero, indicating that the origin is a steady-state of Equation (1) without disturbances (i.e., the nominal system). All states are assumed to be measurable at each sampling instant $t_q = t_k^{in} + q\Delta$, where $\Delta$ is the sampling period, $q = 0, 1, \dots, N_k$, and $N_k$ is a positive integer denoting the total number of sampling periods within $t \in [t_k^{in}, t_k^{out})$.
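The sample-and-hold application of $u_k$ over sampling periods of length $\Delta$, integrated with a much smaller step (explicit Euler), can be sketched as follows; the two scalar mode dynamics and the feedback law used in the last line are invented for illustration and are not the process model of the paper:

```python
import numpy as np

def simulate_switched(x0, schedule, u_law, Delta=0.1, h_c=1e-3):
    """schedule: list of (mode, t_in, t_out); u_law(mode, x) gives the sample-and-hold input."""
    modes = {1: lambda x, u: -x + u,              # mode 1 dynamics (illustrative)
             2: lambda x, u: -0.5 * x + 2.0 * u}  # mode 2 dynamics (illustrative)
    x, traj = x0, [x0]
    for mode, t_in, t_out in schedule:
        for _ in range(int(round((t_out - t_in) / Delta))):  # q = 0, ..., N_k - 1
            u = u_law(mode, x)                    # input held constant over [t_q, t_{q+1})
            for _ in range(int(round(Delta / h_c))):  # explicit Euler with step h_c << Delta
                x = x + h_c * modes[mode](x, u)
            traj.append(x)
    return np.array(traj)

# mode 1 active on [0, 1), mode 2 on [1, 2), under a simple stabilizing feedback
traj = simulate_switched(1.0, [(1, 0.0, 1.0), (2, 1.0, 2.0)], lambda m, x: -0.5 * x)
```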
For each mode $k \in \psi$, a stabilizing controller $u_k = \Phi_k(x) \in U_k$ (e.g., the universal Sontag control law [36]) is assumed to exist in the sense that the origin of Equation (1) without disturbances is rendered exponentially stable. Following the construction method in [37], a level set of $V_k(x)$ (denoted by $\Omega_{\rho_k} := \{x \in \mathbb{R}^n \mid V_k(x) \le \rho_k\}$, where $\rho_k > 0$ for $k \in \psi$) is used to represent the stability region of Equation (1) operating under mode $k$. Additionally, taking into account the boundedness of $x$, $u_k$, and $w_k$, the smoothness assumed for $f_k(\cdot)$, $g_k(\cdot)$, and $h_k(\cdot)$, and the continuous differentiability of $V_k(x)$, positive constants $L_{w_k}$, $L'_{w_k}$, $L_{x_k}$, $L'_{x_k}$, and $M_k$ are assumed to exist such that the following inequalities hold for all $x, x' \in \Omega_{\rho_k}$, $w_k \in W_k$, $u_k \in U_k$, $k \in \psi$:
$$\left| F_k(x, u_k, 0) - F_k(x', u_k, w_k) \right| \le L_{w_k} |w_k| + L_{x_k} |x - x'| \qquad (2a)$$
$$\left| \frac{\partial V_k(x)}{\partial x} F_k(x, u_k, 0) - \frac{\partial V_k(x')}{\partial x} F_k(x', u_k, w_k) \right| \le L'_{w_k} |w_k| + L'_{x_k} |x - x'| \qquad (2b)$$
$$\left| F_k(x, u_k, w_k) \right| \le M_k \qquad (2c)$$

2.3. Recurrent Neural Networks (RNN)

As opposed to the architecture of a traditional feedforward neural network in which signals are transmitted in only one direction, information in an RNN travels in both directions (i.e., forward and backward) due to the inclusion of recurrent loops as shown in Figure 1. This enables the feedback of signals associated with previous inputs back into the network, and fosters a temporal dynamic behavior that corresponds to the numerical techniques (e.g., the explicit Euler method) for solving an ODE. Therefore, the architecture of the RNN is especially suitable for modeling non-linear dynamic systems governed by ODEs.
In this work, we use a one-hidden-layer RNN described by the following form as a surrogate model for Equation (1):
$$h_{t,\ell} = \sigma_h\!\left(Q\,h_{t,\ell-1} + W\,x_{t,\ell}\right), \qquad y_{t,\ell} = \sigma_y\!\left(V\,h_{t,\ell}\right) \qquad (3)$$
where $y_{t,\ell} \in \mathbb{R}^{d_y}$, $x_{t,\ell} \in \mathbb{R}^{d_x}$, and $h_{t,\ell} \in \mathbb{R}^{d_h}$, with $t = 1, \dots, T$ ($T$ is the number of data sequences) and $\ell = 1, \dots, L_{nn}$ ($L_{nn}$ is the sequence length), are the RNN outputs, the RNN inputs, and the hidden states, respectively. The weight matrices $V \in \mathbb{R}^{d_y \times d_h}$, $Q \in \mathbb{R}^{d_h \times d_h}$, and $W \in \mathbb{R}^{d_h \times d_x}$ are associated with the output layer, the hidden layer, and the input layer, respectively. The non-linear activation functions $\sigma_y$ and $\sigma_h$ are associated with the output and the hidden layers, respectively. In this work, we follow the method in [37] to generate historical data for developing the initial offline-learning RNN. Specifically, numerous open-loop simulations are carried out for Equation (1) without disturbances operating under mode $k$ with various conditions $x \in \Omega_{\rho_k}$ and $u_k \in U_k$, where $u_k$ is applied to the system of Equation (1) in a sample-and-hold fashion at each sampling step (i.e., $u_k(t) = u_k(t_q)$ holds for $t \in [t_q, t_{q+1})$, $t_{q+1} := t_q + \Delta$, where $\Delta$ is the sampling period). Subsequently, the initial RNN model is developed with open-loop simulation data to predict one sampling period forward using $x = [x(t_q)\ \ u_k(t_q)]$ as RNN inputs. The RNN outputs $y$ are the predicted states within one sampling period (i.e., $t \in [t_q, t_{q+1})$) that contain all internal time steps of $L_{nn} = \Delta / \bar{h}_c$, where $\bar{h}_c$ denotes the integration time step with a sufficiently small value used for solving the system of Equation (1) with numerical methods (e.g., the explicit Euler method). Without loss of generality, RNNs are developed under the following assumptions [38]:
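The recursion of Equation (3) can be sketched in a few lines of numpy; taking $\sigma_h = \tanh$ and $\sigma_y$ as the identity, together with all dimensions and the random weights below, are assumptions for illustration:

```python
import numpy as np

def rnn_forward(x_seq, Q, W, V):
    """One-hidden-layer RNN: h_l = tanh(Q h_{l-1} + W x_l), y_l = V h_l, l = 1, ..., L_nn."""
    h = np.zeros(Q.shape[0])                   # hidden state initialized at zero
    ys = []
    for x in x_seq:                            # iterate over the internal time steps
        h = np.tanh(Q @ h + W @ x)             # hidden-layer recursion (sigma_h = tanh)
        ys.append(V @ h)                       # output layer (sigma_y = identity here)
    return np.array(ys)

rng = np.random.default_rng(1)
Q = 0.1 * rng.normal(size=(3, 3))              # d_h = 3 hidden units (illustrative sizes)
W = 0.1 * rng.normal(size=(3, 2))              # d_x = 2 inputs (e.g., state and held input)
V = rng.normal(size=(2, 3))                    # d_y = 2 predicted states
y = rnn_forward(rng.normal(size=(4, 2)), Q, W, V)   # L_nn = 4 internal steps
```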
Assumption 1. 
An upper bound exists for the RNN inputs in the sense that $|x_{t,\ell}| \le B_X$ for all $t = 1, \dots, T$ and $\ell = 1, \dots, L_{nn}$.
Assumption 2. 
There are upper bounds for the weight matrices in the sense that $\|V\|_F \le B_{V,F}$, $\|Q\|_F \le B_{Q,F}$, and $\|W\|_F \le B_{W,F}$.
Assumption 3. 
$\sigma_y$ is a positive-homogeneous and 1-Lipschitz continuous activation function in the sense that $\sigma_y(\beta z) = \beta\,\sigma_y(z)$ for all $\beta \ge 0$ and $z \in \mathbb{R}$.
Let $h(\cdot) \in \mathcal{H}$ be the RNN model mapping the RNN input to the RNN output ($x \mapsto y$), where $\mathcal{H}$ denotes the hypothesis class. In this work, we use the mean squared error (MSE) as the loss function $L(h(x), \bar{y})$ for the development of RNNs, where $\bar{y} \in \mathbb{R}^{d_y}$ represents the true or labeled output vector. Since the training dataset for RNNs is generated for bounded states $x \in \Omega_{\rho_k}$ and inputs $u_k \in U_k$, there is an upper bound (denoted by $r$) for the RNN output $y_\ell$ and the true output $\bar{y}_\ell$, i.e., $|\bar{y}_\ell|, |y_\ell| \le r$ for $\ell = 1, \dots, L_{nn}$, where $r > 0$. Therefore, the MSE loss function satisfies the local Lipschitz property in the sense that for all $|y_\ell|, |\bar{y}_\ell| \le r$, the inequality $|L(y_\ell, \bar{y}_\ell) - L(y'_\ell, \bar{y}_\ell)| \le L_r |y_\ell - y'_\ell|$ holds, where $L_r$ represents the local Lipschitz constant.
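The local Lipschitz property of the MSE loss can be checked numerically on sampled points; the concrete constant $L_r = 4r$ used below follows from bounding the gradient of the squared error on the ball of radius $r$ and is an illustrative choice, not a constant from the paper:

```python
import numpy as np

def mse(y, ybar):
    return float(np.mean((y - ybar) ** 2))

def mse_lipschitz_holds(trials=1000, r=2.0, d=3, seed=0):
    """Check |L(y, ybar) - L(y2, ybar)| <= L_r |y - y2| for sampled points with |.| <= r."""
    rng = np.random.default_rng(seed)
    L_r = 4.0 * r                               # gradient of the squared error is bounded by 4r
    for _ in range(trials):
        # points in [-1, 1]^3 have Euclidean norm <= sqrt(3) <= r = 2
        y, y2, ybar = (rng.uniform(-1.0, 1.0, d) for _ in range(3))
        lhs = abs(mse(y, ybar) - mse(y2, ybar))
        if lhs > L_r * np.linalg.norm(y - y2) + 1e-12:
            return False
    return True

ok = mse_lipschitz_holds()
```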

3. Online Learning of RNNs

Offline-trained RNNs are constructed based on historical data gathered from Equation (1) without disturbances, and may not be capable of capturing the dynamics of Equation (1) in real-time operation involving disturbances. Therefore, online learning is applied to update ML models to approximate the non-linear dynamics of Equation (1) with disturbances using real-time data. To integrate online ML models with EMPC for switched non-linear systems, ML models need to be developed with the desired predictive capacity on unseen testing data, which is commonly measured by generalized error bounds. In this section, the generalized error bounds for RNN models updated online using i.i.d. and non-i.i.d. training data points are developed, respectively.

3.1. Generalized Error of RNNs Updated Online with i.i.d. Training Data

We first consider a special case of switched non-linear systems, where the system dynamics of Equation (1) does not vary over time, and, thus, Equation (1) can be simplified to the following state-space model:
$$\dot{x} = F(x, u) := f(x) + g(x)\,u \qquad (4)$$
It is assumed that there exist multiple steady-states $x_{s_k}$ for the non-linear system of Equation (4) under the stabilizing controller $u = u_s$, where $k \in \psi = \{1, 2, \dots, p\}$ and $p > 1$ is a positive integer that represents the number of steady-states. When Equation (4) operates within the stability region around the steady-state $x_{s_k}$, $k \in \psi$, $t \in [t_k^{in}, t_k^{out})$, we say that the system operates in mode $k$. In this section, we assume that historical data are only available for a portion of the stability region around $x_{s_k}$. The initial RNN developed with the limited historical data around $x_{s_k}$ may not be capable of approximating the system dynamics of Equation (4) when the system operates in the stability region around another steady-state (e.g., $x_{s_f}$, $f \in \psi$). Therefore, it is essential to update the RNN models with improved prediction accuracy through online learning. In this case, online learning RNNs are developed using real-time process data that are drawn i.i.d. from the system of Equation (4).
We consider a sequence of data samples $(X_1, Y_1), \dots, (X_T, Y_T)$ drawn i.i.d. from the same distribution $\mathcal{D}$ (e.g., the system of Equation (4)), where the online update of ML models takes place sequentially by processing these i.i.d. samples. Specifically, given an initial hypothesis $h_1 \in \mathcal{H}$, the online learner $A$ receives an instance $X_t$ and makes a prediction $h_t(X_t)$ on the $t$-th round, where $t = 1, \dots, T$. Subsequently, the learner $A$ receives the true output $Y_t$, incurs the loss $L(h_t(X_t), Y_t)$ (e.g., the MSE loss), and then updates the ML model from $h_t$ to $h_{t+1}$ after processing $(X_t, Y_t)$. Therefore, the learner $A$ yields $h_1, h_2, \dots, h_{T+1}$ (i.e., a sequence of hypotheses) after $T$ rounds. To simplify the notation, the loss $L(h_t(X_t), Y_t)$ is denoted by $L(h_t, Z_t)$ for any data sample $Z_t = (X_t, Y_t)$, and we use the shorthand $Z_n^m$ to represent $Z_n, Z_{n+1}, \dots, Z_m$ (i.e., a sequence of data samples). In general, the goal of the online learner $A$ is to minimize the regret after the end of $T$ rounds, which is defined as follows [32]:
$$\mathrm{Reg}_A(T) = \sum_{t=1}^{T} L(h_t, Z_t) - \sum_{t=1}^{T} L(h^*, Z_t) \qquad (5)$$
where the first term denotes the cumulative loss of the hypotheses $h_1, \dots, h_T$, and the second term represents the minimum cumulative loss, achieved by the best hypothesis $h^*$ in the hypothesis class $\mathcal{H}$, where $h^*$ is defined by $h^* = \arg\min_{h \in \mathcal{H}} \sum_{t=1}^{T} L(h, Z_t)$. Note that $h^*$ can only be obtained in hindsight, after the learner receives all the samples $Z_1^T$. Given a hypothesis $h$, its generalized error is defined as the expected loss at a new data point $(X, Y)$: $R(h) = \mathbb{E}[L(h(X), Y)]$.
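The regret of Equation (5) can be made concrete with a sketch: an online gradient-descent learner over a scalar linear hypothesis class $h_a(x) = a x$, run on a stream generated by $y = 2x$, so that the best fixed hypothesis $a^* = 2$ has zero cumulative loss; the step size, horizon, and data distribution are illustrative assumptions:

```python
import numpy as np

def online_regret(T=500, eta=0.1, seed=0):
    """Cumulative loss of the online hypotheses minus that of the best fixed hypothesis."""
    rng = np.random.default_rng(seed)
    a, cum_loss = 0.0, 0.0                     # hypothesis h_t: x -> a * x, starting from a = 0
    for x in rng.uniform(0.5, 1.0, T):
        y = 2.0 * x                            # stream follows y = 2x exactly
        cum_loss += (a * x - y) ** 2           # loss L(h_t, Z_t) incurred before the update
        a -= eta * 2.0 * (a * x - y) * x       # gradient step: update h_t -> h_{t+1}
    return cum_loss                             # minus zero: a* = 2 fits the stream perfectly

reg = online_regret()
```

Because the per-round loss shrinks as $a_t \to 2$, the average regret $\mathrm{Reg}_A(T)/T$ decays toward zero, which is the sub-linear-regret behavior exploited by the online-to-batch conversion below.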
The following lemma gives the generalized error bound for an ensemble hypothesis of ML models updated online using i.i.d. training set.
Lemma 1 
([34]). Given the training set $Z_1^T$ drawn i.i.d. from the same distribution $\mathcal{D}$, the learner $A$ produces the hypotheses $h_1, \dots, h_T$ by processing the samples $Z_1^T$ sequentially with a loss function $L(\cdot, \cdot)$ that is convex in its first argument and satisfies $0 \le L(\cdot, \cdot) \le M$. Let $\Omega_T := \{\lambda \in \mathbb{R}^T \mid \lambda_t \ge 0,\ t = 1, \dots, T,\ \sum_{t=1}^{T} \lambda_t = 1\}$ be a unit simplex, $\lambda = (\lambda_1, \dots, \lambda_T) \in \Omega_T$ be a weight vector, and $h = \sum_{t=1}^{T} \lambda_t h_t$ be the ensemble hypothesis. Then, with probability no less than $1 - \delta$, the following inequalities are satisfied for any $\delta > 0$:
$$R(h) \le M |\lambda| \sqrt{2 \log \frac{1}{\delta}} + \sum_{t=1}^{T} \lambda_t L(h_t, Z_t) \qquad (6)$$
$$R(h) \le M |\lambda| \sqrt{2 \log \frac{1}{\delta}} + \sum_{t=1}^{T} M \left| \lambda_t - \frac{1}{T} \right| + \sum_{t=1}^{T} \lambda_t L(h^*, Z_t) + \frac{\mathrm{Reg}_A(T)}{T} \qquad (7)$$
The generalized error bound for $h = \sum_{t=1}^{T} \lambda_t h_t$ given by the right-hand side (RHS) of Equation (6) depends on the cumulative loss incurred by the online algorithm after $T$ rounds (the second term) and an error function (the first term) with respect to the parameters $\lambda$, $\delta$, and $M$ that represent the weight vector, the confidence level, and the upper bound for $L(\cdot, \cdot)$, respectively. Additionally, Equation (7) is derived using online-to-batch conversion to establish an important connection between the generalized error in the batch setting and the regret of an online learning algorithm. In detail, the generalized error $R(h)$ is bounded by two error functions based on $T$, $\lambda$, $M$, and $\delta$, the cumulative loss of $h^*$, and the average regret over $T$ rounds. The average regret converges to zero if the online algorithm $A$ achieves a sub-linear regret bound (e.g., $\mathrm{Reg}_A(T) = O(\sqrt{T})$), and the two error functions are known once the parameters $\delta$ and $\lambda$ are chosen. Finally, it is noted that the weight vector $\lambda$ is a dominant factor that affects the calculation of the generalized error bound in Lemma 1. Therefore, to achieve a low generalized error, the weight vector $\lambda$ for the hypotheses $h_1, \dots, h_T$ can be optimized by solving the following optimization problem [34]:
$$\min_{\lambda \in \Omega_T}\ \sum_{t=1}^{T} \lambda_t L(h_t, Z_t) \qquad \text{s.t.}\quad \sum_{t=1}^{T} \left| \lambda_t - \frac{1}{T} \right| \le \alpha \qquad (8)$$
where the objective function is the cumulative loss of the hypotheses $h_1, \dots, h_T$, the inequality constraint $\sum_{t=1}^{T} |\lambda_t - \frac{1}{T}| \le \alpha$ bounds the difference between the weight $\lambda_t$ and $1/T$, and $\alpha \ge 0$ denotes a hyperparameter predetermined by a validation procedure.
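Equation (8) is a linear program in $\lambda$, so its optimum can be sketched with a greedy solver: shift up to $\alpha/2$ of probability mass from the highest-loss rounds onto the lowest-loss round (an exchange argument; this solver is an illustration, not the procedure used in [34]):

```python
def optimize_weights(losses, alpha):
    """Minimize sum_t lambda_t * losses[t] over the simplex with sum_t |lambda_t - 1/T| <= alpha."""
    T = len(losses)
    lam = [1.0 / T] * T
    beta = min(alpha / 2.0, (T - 1) / T)       # movable mass: the L1 budget counts both sides
    order = sorted(range(T), key=lambda t: losses[t])
    lam[order[0]] += beta                      # pile mass onto the cheapest round
    remaining = beta
    for t in reversed(order):                  # strip mass from the most expensive rounds
        if t == order[0] or remaining <= 0.0:
            continue
        take = min(remaining, 1.0 / T)         # each round can give up at most its 1/T share
        lam[t] -= take
        remaining -= take
    return lam

lam = optimize_weights([5.0, 4.0, 3.0, 2.0, 1.0], alpha=0.4)
```

With these illustrative losses, the solver zeroes out the worst round and up-weights the best one while keeping the total deviation from the uniform weights within $\alpha$.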

3.2. Generalized Error of RNNs Updated Online with Non-i.i.d. Training Set

We next consider the switched non-linear system of Equation (1) subject to bounded disturbances, in which the system switches between different modes with time-varying dynamics. As a result, process data collected during real-time operation of the system of Equation (1) are non-i.i.d. samples. The notation and the training procedure of RNNs updated with non-i.i.d. training samples follow those in the i.i.d. case. The only difference is that the new data point $(X_{T+1}, Y_{T+1})$ in the non-i.i.d. setting is conditioned on the past samples $Z_1^T$, and, thus, the generalized error of the hypothesis $h \in \mathcal{H}$ is defined as follows [33]:
$$R_{T+1}(h, Z_1^T) := \mathbb{E}\!\left[ L(h(X_{T+1}), Y_{T+1}) \mid Z_1^T \right] \qquad (9)$$
The following lemma provides the generalized error bound for an ensemble hypothesis of ML models updated online using non-i.i.d. training set.
Lemma 2 
([33]). Given the non-i.i.d. training set $Z_1^T$, the learner $A$ yields hypotheses $h_1, \dots, h_{T+1}$ by processing the samples $Z_1^T$ sequentially. Let the weight vector $\lambda$ and the loss function $L(\cdot, \cdot)$ be defined as in Lemma 1, and $h = \sum_{t=1}^{T} \lambda_t h_{t+1}$ be the ensemble hypothesis. Then, with probability no less than $1 - \delta$, the following inequalities are satisfied for any $\delta > 0$:
$$R_{T+1}(h, Z_1^T) \le M |\lambda| \sqrt{2 \log \frac{1}{\delta}} + \sum_{t=1}^{T} \lambda_t L(h_{t+1}, Z_{t+1}) + \mathrm{disc}(\lambda) \qquad (10)$$
$$R_{T+1}(h, Z_1^T) \le M |\lambda| \sqrt{2 \log \frac{1}{\delta}} + \sum_{t=1}^{T} M \left| \lambda_t - \frac{1}{T} \right| + \sum_{t=1}^{T} \lambda_t L(h^*, Z_{t+1}) + \frac{\mathrm{Reg}_A(T)}{T} + \mathrm{disc}(\lambda) \qquad (11)$$
In contrast to the generalized error bound for ML models updated online using the i.i.d. training set in Lemma 1, Lemma 2 contains the term $\mathrm{disc}(\lambda)$ in the non-i.i.d. case, which quantifies the divergence of the sample and target distributions and is given by [33]:

$$\mathrm{disc}(\lambda) = \sup_{h_t \in \mathcal{H}} \left( \sum_{t=1}^{T} \lambda_t R_{t+1}(h_{t+1}, Z_1^t) - R_{T+1}(h_{t+1}, Z_1^T) \right) \qquad (12)$$
Since the calculation of $\mathrm{disc}(\lambda)$ requires knowledge of the distribution of $Z_{T+1}$, to which we do not have access at the end of the $T$-th round, the discrepancy $\mathrm{disc}(\lambda)$ needs to be estimated based on the given data samples. Based on the results of Theorem 2 in [39] and Lemma 7 in [33], which show that the discrepancy $\mathrm{disc}(\lambda)$ can be bounded using sequential Rademacher complexity, the following lemma presents the generalized error bound for ML models updated online using the non-i.i.d. training set in terms of sequential Rademacher complexity.
Lemma 3 
([35]). Given the non-i.i.d. training set $Z_1^T$, let $h = \sum_{t=1}^{T} \lambda_t h_{t+1}$ be the ensemble hypothesis developed satisfying all the conditions in Lemma 2. Consider a family of loss functions $\mathcal{F}$ defined by $\mathcal{F} := \{(x, \bar{y}) \mapsto L(h(x), \bar{y}) \mid h \in \mathcal{H}\}$. For any $\delta > 0$, the following inequality is satisfied with probability no less than $1 - \delta$:
$$R_{T+1}(h, Z_1^T) \le 2 M |\lambda| \sqrt{2 \log \frac{1}{\delta}} + \sum_{t=1}^{T} \lambda_t L(h_{t+1}, Z_{t+1}) + |\lambda| + \Lambda + \widehat{\mathrm{disc}}_{\mathcal{H}}(\lambda) + 6 \sqrt{\frac{\log T}{\pi}}\, M\, \mathfrak{R}_T^{seq}(\mathcal{F}) \qquad (13)$$
where $\widehat{\mathrm{disc}}_{\mathcal{H}}(\lambda) := \sup_{\bar{h}, h_t \in \mathcal{H}} \sum_{t=1}^{T} \lambda_t \left( L(h_{t+1}, Z_{t+1}) - L\!\left(h_{t+1}(X_{T+1}), \bar{h}(X_{T+1})\right) \right)$ denotes the empirical discrepancy, $\Lambda := \inf_{\bar{h} \in \mathcal{H}} \mathbb{E}\!\left[ L_r \left| \bar{h}(X_{T+1}) - Z_{T+1} \right| \,\middle|\, Z_1^T \right]$, and $\mathfrak{R}_T^{seq}(\mathcal{F})$ denotes the sequential Rademacher complexity of the function class $\mathcal{F}$.
It is noted that $\Lambda$ and $\widehat{\mathrm{disc}}_{\mathcal{H}}(\lambda)$ can be computed and optimized based on the given data samples, and $|\lambda|$ can be obtained once the weight vector $\lambda$ is chosen. As a result, to calculate the generalized error bound of Equation (13), it remains to characterize the upper bound on $\mathfrak{R}_T^{seq}(\mathcal{F})$. The definition of sequential Rademacher complexity is given below.
Definition 1. 
(Sequential Rademacher complexity [33]). Let $\mathbf{z}$ be a $\mathcal{Z}$-valued tree of depth $T$ and $\mathcal{G}$ be a class of functions mapping $\mathcal{Z} \to \mathbb{R}$. The sequential Rademacher complexity of the function class $\mathcal{G}$ on a $\mathcal{Z}$-valued tree $\mathbf{z}$ is given by:

$$\mathfrak{R}_T^{seq}(\mathcal{G}) = \sup_{\mathbf{z}}\ \mathbb{E}\!\left[ \sup_{g \in \mathcal{G}} \sum_{t=1}^{T} \lambda_t\, \epsilon_t\, g(\mathbf{z}_t(\epsilon)) \right] \qquad (14)$$

where $\epsilon = (\epsilon_1, \dots, \epsilon_T)$ represents a set of Rademacher random variables drawn i.i.d. from $\{-1, 1\}$ and $\mathbf{z}_t(\epsilon)$ denotes $\mathbf{z}_t(\epsilon_1, \epsilon_2, \dots, \epsilon_{t-1})$.
Lemma 3 characterizes the generalized error bound for a broad class of online ML models. In the remainder of this section, we will develop the generalized error bound for the RNN model of Equation (3) updated online with a non-i.i.d. training set drawn from the non-linear system of Equation (1). We consider the RNN hypothesis class $\mathcal{H}_\ell$ that maps the first $\ell$-time-step inputs to the $\ell$-th output, $\ell = 1, \dots, L_{nn}$, and a family of loss functions $\mathcal{F}_\ell$ associated with $\mathcal{H}_\ell$ defined by $\mathcal{F}_\ell := \{(x, \bar{y}) \mapsto L(h(x), \bar{y}) \mid h \in \mathcal{H}_\ell\}$. Note that $\mathcal{H}_\ell$ is a family of vector-valued functions since the RNN model of Equation (3) is developed to approximate the dynamics of the multiple-input multiple-output non-linear system of Equation (1). Following the results of Lemma 4 in [32], we have the following upper bound for $\mathfrak{R}_T^{seq}(\mathcal{F}_\ell)$:
$$\mathfrak{R}_T^{seq}(\mathcal{F}_\ell) \le 8 L_r \left( 4\sqrt{2}\, \log^{3/2}\!\left(e T^2\right) + 1 \right) \sum_{j=1}^{d_y} \mathfrak{R}_T^{seq}(\mathcal{H}_{j,\ell}) \qquad (15)$$
where $\mathcal{H}_{j,\ell}$ denotes a family of real-valued functions that corresponds to the $j$-th component of $\mathcal{H}_\ell$, $j = 1, \dots, d_y$, and $d_y$ represents the RNN output dimension. Subsequently, we develop the upper bound on $\mathfrak{R}_T^{seq}(\mathcal{H}_{j,\ell})$ following the proof techniques in [38] that peel off the weight matrices (i.e., $V$, $W$, and $Q$) and the activation functions (i.e., $\sigma_y$ and $\sigma_h$) layer by layer.
Lemma 4 
([35]). Consider a family of real-valued functions $\mathcal{H}_{j,\ell}$ that corresponds to the $j$-th component of the RNN hypothesis class $\mathcal{H}_\ell$, with weight matrices and activation functions that satisfy Assumptions 1–3. The following inequality is satisfied for the RNNs developed with the non-i.i.d. training set $Z_1^T$:
$$\mathfrak{R}_T^{seq}(\mathcal{H}_{j,\ell}) \le \left( 1 + 2\sqrt{(\ell + 1) \log 2} \right) \Gamma_\ell\, B_X\, |\lambda| \qquad (16)$$
where $\Gamma_\ell = \frac{(B_{Q,F})^{\ell} - 1}{B_{Q,F} - 1}\, B_{W,F}\, B_{V,F}$.
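The geometric growth of the factor $\Gamma_\ell$ with the unrolled length $\ell$ (for $B_{Q,F} > 1$), and hence the loosening of the complexity bound for longer sequences, can be illustrated numerically; the square-root placement in the bound follows the reconstruction of Equation (16) in this text, and the numeric values are arbitrary:

```python
import math

def gamma_l(l, b_q, b_w, b_v):
    """Gamma_l = ((B_QF^l - 1) / (B_QF - 1)) * B_WF * B_VF, with the B_QF = 1 limit handled."""
    if b_q == 1.0:
        return l * b_w * b_v                   # geometric series degenerates to l equal terms
    return (b_q ** l - 1.0) / (b_q - 1.0) * b_w * b_v

def rademacher_bound(l, b_q, b_w, b_v, b_x, lam_norm):
    """Right-hand side of Eq. (16): (1 + 2 sqrt((l+1) log 2)) * Gamma_l * B_X * |lambda|."""
    return (1.0 + 2.0 * math.sqrt((l + 1) * math.log(2.0))) \
        * gamma_l(l, b_q, b_w, b_v) * b_x * lam_norm
```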
Based on Equations (13), (15) and (16), the following theorem develops the generalized error bound for RNN models updated online using the non-i.i.d. training set.
Theorem 1  
([35]). Let $\mathcal{H}_\ell$ be the RNN hypothesis class that maps the first $\ell$-time-step inputs to the $\ell$-th output, $\ell = 1, \dots, L_{nn}$, $h_1, \dots, h_{T+1}$ be the hypotheses from $\mathcal{H}_\ell$ developed using the non-i.i.d. training set $Z_1^T$ that meet all the conditions in Lemmas 2–4, and $h = \sum_{t=1}^{T} \lambda_t h_{t+1}$ be the ensemble hypothesis. For any $\delta > 0$, the following inequality holds with probability no less than $1 - \delta$:
$$R_{T+1}(h, Z_1^T) \le 2 M |\lambda| \sqrt{2 \log \frac{1}{\delta}} + \sum_{t=1}^{T} \lambda_t L(h_{t+1}, Z_{t+1}) + |\lambda| + \Lambda + \widehat{\mathrm{disc}}_{\mathcal{H}}(\lambda) + M L_r C_T\, d_y \left( 1 + 2\sqrt{(\ell + 1) \log 2} \right) \Gamma_\ell\, B_X\, |\lambda| \qquad (17)$$
where $C_T = O\!\left( \sqrt{\frac{\log T}{\pi}} \left( 4\sqrt{2}\, \log^{3/2}\!\left(e T^2\right) + 1 \right) \right)$.
The weight vector $\lambda$ for the hypotheses $h_1, \dots, h_{T+1}$ developed with the non-i.i.d. training set can be optimized as follows [35]:
$$\min_{\lambda \in \Omega_T}\ \widehat{\mathrm{disc}}_{\mathcal{H}}(\lambda) + \sum_{t=1}^{T} \lambda_t L(h_{t+1}, Z_{t+1}) \qquad \text{s.t.}\quad \lambda_T = 0,\ \sum_{t=1}^{T} \left| \lambda_t - \frac{1}{T} \right| \le \alpha \qquad (18)$$
Remark 1. 
Compared to the optimization problem of Equation (8) for the i.i.d. case, the objective function of Equation (18) accounts for the empirical discrepancy term $\widehat{\mathrm{disc}}_{\mathcal{H}}(\lambda)$ for the non-i.i.d. training set. Additionally, since the cost function of Equation (18) is based on the sample $Z_{T+1}$ that is unavailable after the $T$-th round, Equation (18) includes an additional equality constraint $\lambda_T = 0$, and, therefore, the last hypothesis $h_{T+1}$ is discarded. This is consistent with the i.i.d. case, where the ensemble hypothesis is developed using the hypotheses $h_1, \dots, h_T$ without $h_{T+1}$. It should be noted that the initial hypothesis $h_1$ is also discarded for the ensemble hypothesis in the non-i.i.d. case, since $h_1$ is trained offline using historical data and cannot predict the system dynamics of Equation (1) well in the presence of disturbances. Therefore, the ensemble hypothesis $h$ is derived using the hypotheses $h_2, \dots, h_T$, that is, $h = \sum_{t=1}^{T-1} \lambda_t h_{t+1}$.

4. RNN-Based LEMPC of Switched Non-Linear Systems

In this section, we develop a framework that integrates online learning RNN models with Lyapunov-based EMPC (RNN-LEMPC) for switched non-linear systems. Specifically, for each switching mode $k \in \psi$, the closed-loop state of Equation (1) is maintained in the prescribed stability region while an economic cost function is maximized to obtain optimal economic performance for the system under RNN-LEMPC. Additionally, due to the switching behavior of Equation (1), an appropriate mode transition constraint is included in the RNN-LEMPC formulation to guarantee the success of scheduled mode transitions. Note that in this section, we will only discuss the case of RNNs updated online with the non-i.i.d. training set for modeling Equation (1) involving process disturbances, since Equation (4), which switches between different steady-states without disturbances (the i.i.d. case), is a special case of Equation (1), and the stability results derived in this section can be easily adapted to the i.i.d. case.

4.1. Lyapunov-Based Control Using RNN Models

To simplify the closed-loop stability analysis for the system of Equation (1) under RNN-LEMPC, we represent the RNN model of Equation (3) in the following continuous-time state-space form:
$$\dot{\hat{x}} = F_{nn}^{k}(\hat{x}, u_k) \qquad (19)$$
where $\hat{x} \in \mathbb{R}^n$ denotes the RNN state vector and $u_k \in \mathbb{R}^{n_u}$ represents the control input vector. For each mode $k \in \psi$, a stabilizing control law $u_k = \Phi_{nn}^{k}(x) \in U_k$ is assumed to exist in the sense that the origin of the RNN model of Equation (19) is rendered exponentially stable. This stabilizability assumption implies that there is a control Lyapunov function $\hat{V}_k(x)$ belonging to class $\mathcal{C}^1$ such that the following inequalities are satisfied for all states $x$ in $\hat{D}_k$:
$$\hat{c}_1^k |x|^2 \le \hat{V}_k(x) \le \hat{c}_2^k |x|^2 \qquad (20a)$$
$$\frac{\partial \hat{V}_k(x)}{\partial x} F_{nn}^{k}\!\left(x, \Phi_{nn}^{k}(x)\right) \le -\hat{c}_3^k |x|^2 \qquad (20b)$$
$$\left| \frac{\partial \hat{V}_k(x)}{\partial x} \right| \le \hat{c}_4^k |x| \qquad (20c)$$
where D̂_k denotes an open neighborhood around the origin, and ĉ_i^k, i = 1, 2, 3, 4, k ∈ ψ, are positive constants. Similarly to the construction procedure of the stability region for Equation (1) without disturbances, the stability region for the RNN model of Equation (19) operating under mode k with u_k = Φ_nn^k(x) ∈ U_k is characterized as a level set of V̂_k(x) as follows: Ω_ρ̂k := {x ∈ Rⁿ | V̂_k(x) ≤ ρ̂_k}, where ρ̂_k > 0 for k ∈ ψ. Historical data are assumed to be available for Equation (1) without disturbances operating under each mode k ∈ ψ, and, thus, the initial RNN can be constructed offline using the corresponding historical data to approximate the nominal system dynamics for each mode, respectively. Subsequently, Φ_nn^k(x) and Ω_ρ̂k for the initial RNN model can be characterized accordingly. Note that although the online update of RNNs is carried out using real-time data in this work, Ω_ρ̂k and Φ_nn^k(x) will not be updated accordingly, due to the excessive computational burden of real-time characterization of Ω_ρ̂k and Φ_nn^k(x) for online learning RNNs. Therefore, Φ_nn^k(x) and Ω_ρ̂k designed using the initial RNN remain unchanged at all times, and we will demonstrate that closed-loop stability for Equation (1), in terms of the boundedness of the state within the stability region, is achieved in probability under LEMPC using online learning RNNs.
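A minimal sketch of the level-set characterization: for a quadratic Lyapunov candidate V̂(x) = xᵀPx, membership in Ω_ρ̂ reduces to a single inequality check. The matrix P and level ρ̂ below are borrowed from the CSTR example in Section 5 (mode 1) for concreteness:

```python
import numpy as np

def V(P, x):
    """Quadratic control Lyapunov function V(x) = x^T P x."""
    return float(x @ P @ x)

def in_stability_region(P, rho, x):
    """Level-set membership test: x in Omega_rho := {x : V(x) <= rho}."""
    return V(P, x) <= rho

# P and rho_hat taken from the CSTR example in Section 5 (mode 1).
P = np.array([[1060.0, 22.0], [22.0, 0.52]])
rho_hat = 368.0

print(in_stability_region(P, rho_hat, np.array([0.0, 0.0])))   # True: the origin
print(in_stability_region(P, rho_hat, np.array([1.0, 0.0])))   # False: V = 1060 > 368
```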

4.2. Lyapunov-Based EMPC Using RNN Models

Before we proceed to the closed-loop stability analysis for the system of Equation (1) under RNN-LEMPC, we need the following propositions, which guarantee closed-loop stability of the system of Equation (1) under the controller u_k = Φ_nn^k(x) ∈ U_k. Specifically, Proposition 1 derives an upper bound on the state error between the RNN predicted state x̂(t) of Equation (19) and the actual state x(t) of Equation (1), taking into account bounded disturbances and model mismatch.
Proposition 1 
([40]). Consider the RNN model of Equation (19) and the system of Equation (1) operating in mode k with the same initial condition x̂_0 = x_0 ∈ Ω_ρ̂k, t_k^out = ∞, and |w_k| ≤ w_m^k. There exist a function f_k(·) belonging to class K and a positive constant κ such that, for all x, x̂ ∈ Ω_ρ̂k, the following inequalities hold with probability no less than 1 − δ:
|x(t) - \hat{x}(t)| \le f_k(t) := \frac{E_I + L_w^k w_m^k}{L_x^k}\left(e^{L_x^k t} - 1\right) \qquad (21a)
\hat{V}_k(x) \le \kappa |x - \hat{x}|^2 + \hat{c}_4^k \sqrt{\frac{\hat{\rho}_k}{\hat{c}_1^k}}\, |x - \hat{x}| + \hat{V}_k(\hat{x}) \qquad (21b)
where E_I denotes an upper bound on the model mismatch between the initial RNN model of Equation (19) and the system of Equation (1) without disturbances (i.e., |F_nn^k(x, u_k) − F_k(x, u_k, 0)| ≤ E_I). The formulation of E_I can be derived using the generalized error bound for offline-trained RNNs (see [38] for details).
Remark 2. 
Since the initial RNN can capture the nominal system dynamics only, Equation (21a) is derived by taking w_k = w_m^k (i.e., the worst-case scenario) into consideration. However, in this work, RNNs are iteratively updated using real-time data to capture the non-linear dynamics of Equation (1) subject to bounded disturbances, such that the modeling error between the online learning RNN models of Equation (19) and the system of Equation (1) is bounded by the modeling error bound E_O with probability no less than 1 − δ, i.e., |F_nn^k(x, u_k) − F_k(x, u_k, w_k)| ≤ E_O. Based on the generalized error bound E_P for RNNs updated online with non-i.i.d. training sets (i.e., E_P is given by the RHS of Equation (17)), the finite difference method can be used to approximate the modeling error bound E_O. Note that the inequality |x − x̂| ≤ E_P holds with probability no less than 1 − δ if the MSE loss function is utilized in this work. Similarly to the derivation of Equation (21a), the following inequality holds with probability no less than 1 − δ:
|x(t) - \hat{x}(t)| \le f_k(t) := \frac{E_O}{L_x^k}\left(e^{L_x^k t} - 1\right) \qquad (22)
In contrast to Equation (21a), it is readily seen from Equation (22) that if the RNNs are updated well, such that the inequality E_O ≤ L_w^k w_m^k + E_I is satisfied, the state error |x(t) − x̂(t)| achieved by the online learning RNN is smaller than that of the initial RNN trained offline.
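The comparison in Remark 2 can be made concrete: both bounds share the factor (e^{L_x^k t} − 1)/L_x^k and differ only in the error constant, so whenever E_O ≤ E_I + L_w^k w_m^k the online bound of Equation (22) is tighter. The constants below are illustrative, not taken from the paper:

```python
import math

def state_error_bound(E, Lx, t):
    """|x(t) - xhat(t)| <= (E / Lx) * (exp(Lx * t) - 1), per Eqs. (21a)/(22)."""
    return (E / Lx) * (math.exp(Lx * t) - 1.0)

# Illustrative constants (hypothetical, not from the paper).
E_I, Lw, wm, Lx = 0.05, 2.0, 0.1, 10.0
E_O = 0.12               # online model error, well below E_I + Lw * wm = 0.25

t = 0.01                 # one sampling period
f_init   = state_error_bound(E_I + Lw * wm, Lx, t)   # initial RNN, Eq. (21a)
f_online = state_error_bound(E_O, Lx, t)             # online RNN,  Eq. (22)
assert f_online < f_init  # tighter bound once E_O <= E_I + Lw * wm
print(f_init, f_online)
```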
Proposition 2 below demonstrates that if the initial RNN is trained to model the nominal system well (i.e., |F_nn^k(x, u_k) − F_k(x, u_k, 0)| is sufficiently small), the closed-loop state of Equation (1) can be driven towards the origin and bounded in the stability region Ω_ρ̂k at all times under u_k = Φ_nn^k(x) ∈ U_k applied to the system of Equation (1) in a sample-and-hold fashion.
Proposition 2  
([40]). Consider Equation (1) operating in mode k with t_k^out = ∞ and |w_k| ≤ w_m^k, under u_k = Φ_nn^k(x) ∈ U_k that is applied in a sample-and-hold fashion and meets the conditions of Equation (20). If the modeling error between the initial RNN model and the system of Equation (1) without disturbances can be bounded by |F_nn^k(x, u_k) − F_k(x, u_k, 0)| ≤ E_I ≤ γ_k|x|, and there exist 0 < ρ_s^k < ρ̂_k, Δ > 0, and ε_k > 0, k ∈ ψ, such that the following inequality is satisfied:
L_x^k M_k \Delta + L_w^k w_m^k - \frac{\tilde{c}_3^k}{\hat{c}_2^k}\, \rho_s^k \le -\epsilon_k \qquad (23)
where c̃_3^k = ĉ_3^k − ĉ_4^k γ_k > 0 for γ_k satisfying 0 < γ_k < ĉ_3^k/ĉ_4^k, k ∈ ψ, then, with probability no less than 1 − δ, the following inequality holds for t ∈ [t_q, t_{q+1}) and x(t_q) ∈ Ω_ρ̂k \ Ω_ρs^k:
\hat{V}_k(x(t)) \le \hat{V}_k(x(t_q)) \qquad (24)
The following proposition ensures that under u_k = Φ_nn^k(x) ∈ U_k, the closed-loop state can be driven into the stability region of mode f when the system of Equation (1) is switched from the current mode k to the subsequent mode f at the prescribed switching time.
Proposition 3 
([34]). Consider Equation (1) operating in mode k for t ∈ [t_k^in, t_k^out), with |w_k| ≤ w_m^k, and under u_k = Φ_nn^k(x) ∈ U_k satisfying the conditions in Propositions 1 and 2. Given t_k^in ≤ t < t_k^out = t_f^in and x(t_k^in) ∈ Ω_ρ̂k for some f, k ∈ ψ, if there exist positive real numbers ρ̂_k, Δ, ε_k, and N_k such that
\hat{c}_2^f \left( |x_s^k - x_s^f| + \sqrt{\frac{\hat{\rho}_k - \epsilon_k N_k \Delta}{\hat{c}_1^k}} \right)^{2} \le \hat{\rho}_f \qquad (25)
then x(t_f^in) ∈ Ω_ρ̂f.
The RNN-LEMPC scheme that optimizes economic benefits while maintaining closed-loop stability for Equation (1) is represented by the optimization problem as follows:
J = \max_{u_k \in S(\Delta)} \int_{t_q}^{t_k^{out}} l_e(\tilde{x}(t), u_k(t))\, dt \qquad (26a)
\text{s.t.}\quad \dot{\tilde{x}}(t) = F_{nn}^{k}(\tilde{x}(t), u_k(t)) \qquad (26b)
\tilde{x}(t_q) = x(t_q) \qquad (26c)
u_k(t) \in U_k, \ \forall\, t \in [t_q, t_k^{out}) \qquad (26d)
\hat{V}_k(\tilde{x}(t)) \le \hat{\rho}_e^k, \ \forall\, t \in [t_q, t_k^{out}), \ \text{if } x(t_q) \in \Omega_{\hat{\rho}_e^k} \qquad (26e)
\dot{\hat{V}}_k(x(t_q), u_k) \le \dot{\hat{V}}_k(x(t_q), \Phi_{nn}^{k}(x(t_q))), \ \text{if } x(t_q) \in \Omega_{\hat{\rho}_k} \setminus \Omega_{\hat{\rho}_e^k} \qquad (26f)
\hat{V}_f(\tilde{x}(t_k^{out})) + f_e(E_P) \le \hat{\rho}_f \qquad (26g)
where x̃(t) denotes the predicted state trajectory, S(Δ) is the class of piecewise constant functions with sampling period Δ, and f_e(E_P) := ĉ_4^k √(ρ̂_k/ĉ_1^k) E_P + κ E_P² is used to evaluate the impact of E_P (i.e., |x − x̂| ≤ E_P) on the Lyapunov function value based on Equation (21b). At the current sampling time t_q, the optimization problem of Equation (26) is solved by maximizing the economic cost function of Equation (26a) over a shrinking prediction horizon t ∈ [t_q, t_k^out), subject to the constraints of Equations (26b)–(26g). In detail, Equation (26b) represents the prediction model, which uses the initial RNN at the beginning and is then iteratively updated using real-time data. This prediction model is utilized to forecast the future states x̃(t) for t ∈ [t_q, t_k^out), given x̃(t_q) obtained from the state measurement x(t_q) in Equation (26c). The constraint of Equation (26d) bounds the control inputs u_k(t) for t ∈ [t_q, t_k^out). Additionally, by designing Ω_ρ̂e^k as a subset of Ω_ρ̂k (i.e., ρ̂_e^k < ρ̂_k), the constraints of Equations (26e) and (26f) ensure that the predicted state x̃(t) moves toward Ω_ρ̂e^k and remains inside Ω_ρ̂k at all times, and it will be demonstrated in Theorem 2 that the actual state x(t) of Equation (1) is maintained in Ω_ρ̂k. Finally, Equation (26g) is the mode transition constraint used to drive the state x(t) into Ω_ρ̂f at t = t_k^out.
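To make the receding-horizon structure of Equation (26) concrete, the sketch below solves a heavily simplified analogue: a scalar surrogate model in place of F_nn^k, a quadratic V̂, a forward-Euler discretization of Equation (26b), input bounds for Equation (26d), and the region constraint of Equation (26e); the contractive constraint (26f) and the mode transition constraint (26g) are omitted for brevity. All functions and constants here are hypothetical stand-ins, not the paper's models:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-ins for the RNN model F_nn^k, Lyapunov function V_hat_k,
# and economic stage cost l_e of Equation (26).
def f_nn(x, u):
    return -x + u

def V(x):
    return 2.0 * x * x

def l_e(x, u):
    return x - 0.1 * u * u      # reward large x, penalize control effort

dt, N = 0.1, 5                  # sampling period and shrinking horizon length
rho_e = 0.5                     # level of Omega_rho_e in Eq. (26e)
u_lo, u_hi = -1.0, 1.0          # input bounds, Eq. (26d)

def rollout(x0, u_seq):
    """Euler prediction of x_tilde under piecewise-constant inputs (Eqs. 26b-c)."""
    xs, x = [x0], x0
    for u in u_seq:
        x = x + dt * f_nn(x, u)
        xs.append(x)
    return xs

def lempc_step(x0):
    """One shrinking-horizon solve; the first input is applied, then re-solve."""
    obj = lambda u: -sum(l_e(x, ui) * dt for x, ui in zip(rollout(x0, u), u))
    cons = [{'type': 'ineq', 'fun': (lambda u, i=i: rho_e - V(rollout(x0, u)[i + 1]))}
            for i in range(N)]  # V_hat(x_tilde(t)) <= rho_e along the horizon
    res = minimize(obj, np.zeros(N), bounds=[(u_lo, u_hi)] * N,
                   constraints=cons, method='SLSQP')
    return float(res.x[0])

u_star = lempc_step(0.1)        # x(t_q) = 0.1 lies inside Omega_rho_e (V = 0.02)
print(u_star)
```

In closed loop, `lempc_step` would be re-solved at every sampling time with the latest measurement, mirroring the receding-horizon implementation of Equation (26).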
It should be pointed out that when the prediction model (i.e., Equation (26b)) is updated online, all the terms in the RNN-LEMPC of Equation (26) associated with the Lyapunov function (i.e., Equation (26e), the LHS of Equation (26f), and Equation (26g)) use the latest RNN model except that the RHS of Equation (26f) utilizes the initial RNN at all times. Since the results of Propositions 1–3 are established under u k = Φ n n k ( x ) U k constructed using the initial RNN, the constraints of Equations (26e)–(26g) under u k = Φ n n k ( x ) U k may not hold due to the model inconsistency. Therefore, Equation (26) may not be guaranteed to be feasible once the prediction model is updated. To remedy this, the controller Φ n n k ( x ) will be utilized for the next sampling period to stabilize the system of Equation (1) as a backup controller in the case of infeasibility of Equation (26) for some sampling steps. Finally, we develop the following theorem to guarantee that the closed-loop state of Equation (1) is maintained in Ω ρ ^ k at all times and is driven into Ω ρ ^ f at the switching moment under the RNN-LEMPC of Equation (26) with updating RNN models.
Theorem 2. 
Consider the system of Equation (1) with |w_k| ≤ w_m^k under the RNN-LEMPC of Equation (26), using Φ_nn^k(x) as the backup controller when Equation (26) is infeasible. Let 0 < ρ_s^k < ρ̂_e^k < ρ̂_k, Δ > 0, and ε_k > 0, k ∈ ψ, satisfy
\hat{\rho}_e^k \le -\kappa (f_k(\Delta))^2 - \hat{c}_4^k \sqrt{\frac{\hat{\rho}_k}{\hat{c}_1^k}}\, f_k(\Delta) + \hat{\rho}_k \qquad (27)
where f_k(·) is defined in Equation (21a) for the initial RNN and in Equation (22) for online updating RNNs, respectively. For some k, f ∈ ψ, if x(t_k^in) ∈ Ω_ρ̂k, all the conditions in Propositions 1–3 are satisfied, and the online updating RNNs are developed such that |F_nn^k(x, u_k) − F_k(x, u_k, w_k)| ≤ E_O ≤ γ_k|x| (i.e., the modeling error constraint) is met, then, for each sampling step, with probability no less than 1 − δ, the state x(t) of Equation (1) is bounded in Ω_ρ̂k for t ∈ [t_k^in, t_k^out) and is driven into Ω_ρ̂f at t = t_k^out = t_f^in.
Proof. 
The proof consists of two parts. We first consider the case where the optimization problem of Equation (26) is infeasible and the control law u_k = Φ_nn^k(x) ∈ U_k is applied. In this case, it is demonstrated in Propositions 1–3 that the controller u_k = Φ_nn^k(x) ∈ U_k guarantees the boundedness of the state x(t) within Ω_ρ̂k and the success of the scheduled mode transitions for the non-linear system of Equation (1).
Subsequently, we prove that when there is a feasible solution u_k(x(t_q)) (i.e., the optimal control action) for the RNN-LEMPC of Equation (26) with online updating RNN models, closed-loop stability of Equation (1) holds as well under u_k(x(t_q)). In detail, if x(t_q) ∈ Ω_ρ̂e^k, the predicted state x̃(t) stays in Ω_ρ̂e^k following the constraint of Equation (26e). Then, it follows from Proposition 1 that, with probability no less than 1 − δ, the actual state x(t) of Equation (1) for t ∈ [t_q, t_{q+1}) can be bounded as follows:
\hat{V}_k(x) \le \kappa |x - \tilde{x}|^2 + \hat{c}_4^k \sqrt{\frac{\hat{\rho}_k}{\hat{c}_1^k}}\, |x - \tilde{x}| + \hat{V}_k(\tilde{x}) \le \kappa (f_k(\Delta))^2 + \hat{c}_4^k \sqrt{\frac{\hat{\rho}_k}{\hat{c}_1^k}}\, f_k(\Delta) + \hat{V}_k(\tilde{x}) \qquad (28)
Since V̂_k(x̃) ≤ ρ̂_e^k, we obtain V̂_k(x) ≤ ρ̂_k if Equation (27) holds, indicating that x(t) ∈ Ω_ρ̂k for all t ∈ [t_q, t_{q+1}) with probability no less than 1 − δ. Following the proof technique in [40], if x(t_q) ∈ Ω_ρ̂k \ Ω_ρ̂e^k, the time derivative of V̂ along Equation (1) for t ∈ [t_q, t_{q+1}) can be bounded as dV̂(x(t))/dt ≤ L_x^k M_k Δ + dV̂(x(t_q))/dt using Equation (2). Note that for any x(t_q) ∈ Ω_ρ̂k \ Ω_ρ̂e^k, Equation (26f) is activated, such that we can further bound dV̂(x(t_q))/dt with probability no less than 1 − δ as follows:
\dot{\hat{V}}(x(t_q)) = \frac{\partial \hat{V}_k(x(t_q))}{\partial x}\Big( F_k\big(x(t_q), u_k(x(t_q)), w_k\big) - F_{nn}^{k}\big(x(t_q), u_k(x(t_q))\big) \Big) + \frac{\partial \hat{V}_k(x(t_q))}{\partial x} F_{nn}^{k}\big(x(t_q), u_k(x(t_q))\big)
\le \hat{c}_4^k |x(t_q)| \cdot \Big| F_{nn}^{k}\big(x(t_q), u_k(x(t_q))\big) - F_k\big(x(t_q), u_k(x(t_q)), w_k\big) \Big| + \frac{\partial \hat{V}_k(x(t_q))}{\partial x} F_{nn}^{k}\big(x(t_q), \Phi_{nn}^{k}(x(t_q))\big)
\le \hat{c}_4^k \gamma_k |x(t_q)|^2 - \hat{c}_3^k |x(t_q)|^2 \le -\frac{\tilde{c}_3^k}{\hat{c}_2^k}\, \hat{\rho}_e^k \qquad (29)
where the first inequality of Equation (29) is obtained under the constraints of Equations (26f) and (20c). The second inequality of Equation (29) follows from Equation (20b) and the inequality |F_nn^k(x, u_k) − F_k(x, u_k, w_k)| ≤ E_O ≤ γ_k|x| for online updating RNNs. Using Equation (20a) for any state x(t_q) ∈ Ω_ρ̂k \ Ω_ρ̂e^k, the last inequality of Equation (29) follows. Therefore, with probability no less than 1 − δ, the following inequality for dV̂(x(t))/dt holds:
\dot{\hat{V}}(x(t)) \le L_x^k M_k \Delta - \frac{\tilde{c}_3^k}{\hat{c}_2^k}\, \hat{\rho}_e^k < L_x^k M_k \Delta + L_w^k w_m^k - \frac{\tilde{c}_3^k}{\hat{c}_2^k}\, \rho_s^k \qquad (30)
Due to 0 < ρ_s^k < ρ̂_e^k and 0 < L_w^k w_m^k, the second inequality of Equation (30) is obtained. Therefore, Equation (30) demonstrates that dV̂(x(t))/dt < 0 holds if the constraint of Equation (23) in Proposition 2 is met. This implies that for any x(t_q) ∈ Ω_ρ̂k \ Ω_ρ̂e^k, the value of V̂_k(x) decreases for t ∈ [t_q, t_{q+1}) with probability no less than 1 − δ under u_k(x(t_q)), and, thus, the closed-loop state of Equation (1) enters Ω_ρ̂e^k within a finite number of sampling steps with a certain probability. Additionally, using Equation (21b) and |x − x̂| ≤ E_P, the value of V̂_f(x(t_k^out)) can be bounded with probability no less than 1 − δ as follows:
\hat{V}_f(x(t_k^{out})) \le \kappa E_P^2 + \hat{c}_4^k \sqrt{\frac{\hat{\rho}_k}{\hat{c}_1^k}}\, E_P + \hat{V}_f(\tilde{x}(t_k^{out})) \qquad (31)
According to Equation (31), we have V̂_f(x(t_k^out)) ≤ ρ̂_f if Equation (26g) is met, which indicates that at t = t_k^out, the closed-loop state x(t) is driven into Ω_ρ̂f in probability.
Therefore, closed-loop stability can be achieved for the system of Equation (1) in probability regardless of the feasibility of Equation (26). This completes the proof of Theorem 2. □
Remark 3. 
The RNN-LEMPC of Equation (26) demonstrates that if the state measurement x(t_q) at t = t_q is in the region Ω_ρ̂e^k, the economic cost function is maximized within Ω_ρ̂e^k; if x(t_q) ∈ Ω_ρ̂k \ Ω_ρ̂e^k, the predicted state x̃(t) is driven towards Ω_ρ̂e^k. Additionally, it has been proven in Theorem 2 that the actual state x(t) of Equation (1) is bounded in the stability region Ω_ρ̂k if x̃(t) is maintained in Ω_ρ̂e^k. Therefore, the region Ω_ρ̂e^k is a "safe" operating region in which the RNN-LEMPC of Equation (26) can maximize economic benefits while maintaining the boundedness of the state x(t) within Ω_ρ̂k. It is noted from Equation (27) that the relation between Ω_ρ̂e^k and Ω_ρ̂k is determined by f_k(Δ), the upper bound on |x(t) − x̂(t)| within one sampling period Δ. As discussed in Remark 2, online learning RNNs are capable of modeling Equation (1) involving disturbances, while the initial RNN can capture the nominal system dynamics only. This implies that, compared to the initial model, online learning RNNs may better approximate Equation (1) such that the state error |x(t) − x̂(t)| is smaller, and, thus, a larger ρ̂_e^k may be chosen for RNN-LEMPC with online updating RNNs. Therefore, while we use the controller Φ_nn^k(x), characterized using the initial offline-trained RNN, as a backup controller to stabilize Equation (1) when RNN-LEMPC is infeasible, the online update of RNNs is performed to improve the closed-loop economic performance of Equation (1), which will be illustrated using a non-linear chemical process in the next section.
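The trade-off discussed in Remark 3 can be quantified directly from Equation (27): the largest admissible ρ̂_e^k shrinks with f_k(Δ), so a better-updated model (smaller f_k(Δ)) enlarges the "safe" operating region. A sketch with illustrative (hypothetical) constants:

```python
import math

def rho_e_max(rho, c1, c4, kappa, f_delta):
    """Largest rho_e satisfying Eq. (27):
    rho_e <= rho - kappa * f_k(Delta)^2 - c4 * sqrt(rho / c1) * f_k(Delta)."""
    return rho - kappa * f_delta**2 - c4 * math.sqrt(rho / c1) * f_delta

# Illustrative constants (hypothetical, not from the paper).
rho, c1, c4, kappa = 368.0, 0.5, 0.3, 1.0
f_initial, f_online = 1.0, 0.4   # per Remark 2, online updates shrink f_k(Delta)

r_init   = rho_e_max(rho, c1, c4, kappa, f_initial)
r_online = rho_e_max(rho, c1, c4, kappa, f_online)
assert r_online > r_init         # smaller model error -> larger "safe" region
print(r_init, r_online)
```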

5. Application to a Chemical Process Example

In this section, a chemical process example is used to illustrate the efficacy of the proposed LEMPC scheme using RNN models updated online. Specifically, a non-isothermal continuous stirred tank reactor (CSTR) is considered, in which a reactant A is transformed into a product B (A → B) via a second-order, irreversible, exothermic reaction. The CSTR is required to switch between two modes corresponding to two available inlet streams with different inlet concentrations C_A0^σ and inlet temperatures T_0^σ for the pure reactant A, where σ ∈ ψ = {1, 2}. Additionally, the CSTR is equipped with a heating jacket with heat rate Q to supply or remove heat from the reactor. In mode σ ∈ {1, 2}, the CSTR dynamic model is described by the following ordinary differential equations:
\frac{dC_A}{dt} = -k_0 e^{-E/(RT)} C_A^2 + \frac{F}{V}\left(C_{A0}^{\sigma} - C_A\right)
\frac{dT}{dt} = \frac{-\Delta H}{\rho_L C_p}\, k_0 e^{-E/(RT)} C_A^2 + \frac{F}{V}\left(T_0^{\sigma} - T\right) + \frac{Q}{\rho_L C_p V} \qquad (32)
where C A is the concentration of the reactant A and T is the reactor temperature. A detailed description of the chemical reaction and the process parameters in Equation (32) can be found in [24]. The process parameter values of the CSTR used in the closed-loop simulations are given in Table 1.
For each mode, a steady-state (C_As^σ, T_s^σ) is considered for the CSTR under (C_A0s^σ, Q_s) (i.e., the steady-state input values). In this example, the two manipulated inputs are the heat input rate Q and the inlet concentration C_A0^σ, denoted in deviation variable form by ΔQ = Q − Q_s and ΔC_A0^σ = C_A0^σ − C_A0s^σ, respectively. The manipulated inputs are bounded by |ΔQ| ≤ 5 × 10⁵ kJ/h and |ΔC_A0^σ| ≤ 3.5 kmol/m³ for both modes. The input and state vectors in deviation form for the CSTR of Equation (32) are uᵀ = [ΔC_A0^σ, ΔQ] and xᵀ = [C_A − C_As^σ, T − T_s^σ], respectively, such that the equilibrium point of the CSTR for each mode is at the origin of the state-space. It is desired to operate the CSTR in Ω_ρ̂σ (i.e., the stability region) around (C_As^σ, T_s^σ) while maximizing the production rate of B, given by:
l_e(x, u) = k_0 e^{-E/(RT)} C_A^2 \qquad (33)
The explicit Euler method with a sufficiently small integration time step of h_c = 10⁻⁴ h is applied to numerically solve the CSTR dynamic model of Equation (32). Additionally, the non-linear optimization problem of Equation (26) is solved using PyIpopt [41] with a sampling period of Δ = 10⁻² h.
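For reference, the explicit Euler integration of Equation (32) can be sketched as follows. Table 1 is not reproduced in this excerpt, so the parameter values below are representative stand-ins (only the k_0 bound quoted in Section 5.1 is taken from the text) and should be checked against the paper:

```python
import math

# Representative CSTR parameters (hypothetical stand-ins for Table 1; k_0
# matches the upper bound quoted in Section 5.1, the rest follow common
# literature values for this example and should be verified against the paper).
k0, E, R     = 8.46e6, 5.0e4, 8.314     # m^3/(kmol h), kJ/kmol, kJ/(kmol K)
F, Vr        = 5.0, 1.0                 # m^3/h, m^3
dH, rhoL, Cp = -1.15e4, 1000.0, 0.231   # kJ/kmol, kg/m^3, kJ/(kg K)
CA0, T0, Q   = 4.0, 300.0, 0.0          # kmol/m^3, K, kJ/h

def cstr_rhs(CA, T):
    """Right-hand side of Eq. (32) for fixed inputs (CA0, Q)."""
    r = k0 * math.exp(-E / (R * T)) * CA**2             # second-order reaction
    dCA = -r + (F / Vr) * (CA0 - CA)
    dT  = (-dH / (rhoL * Cp)) * r + (F / Vr) * (T0 - T) + Q / (rhoL * Cp * Vr)
    return dCA, dT

def euler_integrate(CA, T, t_end, h=1e-4):
    """Explicit Euler with the paper's integration step h_c = 1e-4 h."""
    for _ in range(round(t_end / h)):
        dCA, dT = cstr_rhs(CA, T)
        CA, T = CA + h * dCA, T + h * dT
    return CA, T

CA, T = euler_integrate(1.95, 402.0, t_end=0.01)
print(CA, T)
```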

5.1. The CSTR Switched between Two Modes with Bounded Disturbances

We first consider the CSTR subject to the following disturbances: (1) an upstream disturbance causes the feed flow rate F to be time varying, subject to the constraint 0 ≤ F ≤ 5.5 m³/h; (2) catalyst deactivation during process operation results in a gradual reduction in the pre-exponential factor, constrained by 0 ≤ k_0 ≤ 8.46 × 10⁶ m³/(kmol·h). Additionally, the control Lyapunov functions for both modes are designed in the quadratic form V_σ(x) = xᵀ P_σ x with P_σ = [1060, 22; 22, 0.52] for σ ∈ {1, 2}. As discussed in Section 2.3, we follow the RNN development method in [37] to construct two initial RNNs that model the nominal CSTR system (i.e., with the values of F and k_0 taken from Table 1 at all times) operating in the two modes, using historical data gathered from the entire operating region, respectively. Specifically, the RNN models are trained using Keras, where a hidden recurrent layer of 16 neurons is utilized for both initial RNNs, with tanh as the activation function, MSE as the loss function, and Adam as the optimizer. Based on the two initial RNNs, the stability region Ω_ρ̂σ and a subset Ω_ρ̂e^σ for the CSTR in mode σ ∈ {1, 2} can be characterized accordingly. In this example, ρ̂_1 and ρ̂_e^1 are chosen to be 368 and 280 for mode 1, and ρ̂_2 and ρ̂_e^2 are chosen to be 228 and 170 for mode 2, respectively. Since the CSTR involves disturbances during process operation, we follow the update strategy in [34] to improve the RNN models online so that they capture the uncertain CSTR system subject to bounded disturbances. Specifically, the online update of RNNs is carried out based on the most recent real-time data collected over a fixed time interval (e.g., five sampling periods) and the previous RNN model.
The new RNN is utilized to predict the state evolution in LEMPC only if the modeling error constraint in Theorem 2 is met; otherwise, the new RNN model is discarded and the previous RNN model is used as the prediction model in LEMPC.
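The accept/reject logic of this update strategy can be sketched as a simple guard around each retraining round; the threshold check below is a scalar simplification of the modeling error constraint |F_nn^k − F_k| ≤ E_O ≤ γ_k|x| from Theorem 2, with hypothetical values:

```python
def accept_update(candidate_error, gamma_k, x_norm):
    """Scalar guard for the modeling-error constraint of Theorem 2:
    keep the retrained RNN only if its validation error respects
    E_O <= gamma_k * |x| (hypothetical scalar simplification)."""
    return candidate_error <= gamma_k * x_norm

# After each round (e.g., every five sampling periods), validate the candidate
# model on the most recent data and fall back to the previous model on failure.
model = "rnn_round_t"                 # stand-in for the current Keras model
candidate, err = "rnn_round_t+1", 0.08
if accept_update(err, gamma_k=0.1, x_norm=1.0):
    model = candidate                 # accepted here, since 0.08 <= 0.1
print(model)
```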
We carry out the closed-loop simulations for the CSTR subject to bounded disturbances and with scheduled mode transitions under RNN-LEMPC as follows. Specifically, the CSTR operates in mode 1 for t ∈ [0, 0.25 h), and the mode transition takes place at t = 0.25 h, after which the CSTR operates in mode 2 for t ∈ [0.25 h, ∞). The value of F changes to 2.5 m³/h and 5.5 m³/h, and k_0 reduces to 0.8 k_0 and 0.6 k_0, at t = 0.05 h and 0.25 h, respectively. Starting from an initial condition (C_A, T) = (1.95 kmol/m³, 402 K), the simulation results for the uncertain CSTR system with process disturbances under the LEMPC with the initial offline-learning RNNs and the online updating RNNs are displayed in Figure 2, Figure 3 and Figure 4. In detail, it is observed from Figure 2a that under the LEMPC using the initial RNN at all times, the closed-loop state is bounded in Ω_ρ̂1 and Ω_ρ̂2 (i.e., the stability regions) for both modes, and is driven from the initial condition outside of Ω_ρ̂2 into Ω_ρ̂2 at the switching moment. However, the state trajectories under the initial RNNs show considerable oscillations near the boundaries of Ω_ρ̂e^1 and Ω_ρ̂e^2 for both modes, while those under the online updating RNNs stay smoothly at the boundaries of Ω_ρ̂e^1 and Ω_ρ̂e^2 with much smaller oscillations, as shown in Figure 2b. Additionally, Figure 3 compares the Lyapunov function value V̂(x) under the LEMPC using the initial and the online updating RNN models for both modes, respectively. It is shown in Figure 3 that both V̂_1(x) and V̂_2(x) under the initial RNNs exhibit persistent oscillations around ρ̂_e^1 and ρ̂_e^2, respectively, while those under the online updating RNNs oscillate only for a finite number of sampling steps and ultimately converge to ρ̂_e^1 and ρ̂_e^2 after several rounds of online updates of RNNs, respectively.
This implies that the contractive constraint of Equation (26f) under the LEMPC using the initial RNNs is activated frequently, since the Lyapunov function value V ^ ( x ) exceeds ρ ^ e 1 and ρ ^ e 2 for both modes frequently, while the contractive constraint remains inactive after finite sampling steps under the online updating RNNs. Figure 4 depicts the state profiles (i.e., C A and T) and the input profiles (i.e., C A 0 and Q) in the original state space. Specifically, it is observed from Figure 4a that under the online updating RNNs, the LEMPC drives the states C A and T to the optimal operating points that maximize the production rate of B for both modes. However, the state C A exhibits sustained oscillations under the LEMPC using the initial RNNs. Similarly, it is shown in Figure 4b that the LEMPC using the online updating RNNs shows smoother manipulated input profiles (fewer oscillations) compared to that using the initial RNNs. The above simulation results demonstrate that the initial RNNs trained with historical data cannot predict well the uncertain CSTR system in the presence of process disturbances, which results in sustained oscillations in the state trajectories, the evolution of Lyapunov function value V ^ ( x ) , and the state and input profiles. These oscillations can be effectively mitigated after a more accurate RNN model that approximates the uncertain CSTR system dynamics is derived through an online update of RNNs.
Finally, in the event that the system dynamics of the CSTR remains unchanged for the remaining operation time (i.e., no further mode transition and no process disturbances after t = 0.45 h), the online update of RNNs is deactivated, the final RNN model is derived by solving Equation (18), and the CSTR operates in mode 2 under the LEMPC using this final RNN model after t = 0.45 h. Specifically, when the CSTR operates in mode 2, the RNNs are updated online four times, at t = 0.3, 0.35, 0.4, and 0.45 h (i.e., the number of rounds is T = 4). We use the hypothesis h_1 to denote the initial RNN and the sequence of hypotheses h_2, ..., h_5 to denote the four online updated RNNs for mode 2. To simplify the calculation of the empirical discrepancy disĉ_H(λ) of Equation (18), the hypothesis h̄ is considered to belong to a linear space (denoted by H̄) of the hypotheses h_2, ..., h_5 in this example, that is, h̄ ∈ H̄ := {Σ_{i=1}^4 β_i h_{i+1} : Σ_{i=1}^4 β_i = 1, β_i ≥ 0, i = 1, ..., 4}. It is noted that each round of online learning represents five sampling periods in this example, and, thus, the RNN input for the next round, X_{T+1}, consists of the system states x(t) and the control inputs u(t) for the current and the next four sampling steps, where t = 0.45, 0.46, 0.47, 0.48, and 0.49 h. Therefore, the loss between the RNN outputs predicted by the hypotheses h_{t+1} and h̄ on X_{T+1} (i.e., L(h_{t+1}(X_{T+1}), h̄(X_{T+1})), t = 1, ..., 4) can be obtained. The optimization problem of Equation (18) can then be simplified to the following minimax optimization problem:
\min_{\lambda \in \Omega_T} \max_{\bar{h} \in \bar{H}} \ \sum_{t=1}^{T} \lambda_t \left( L(h_{t+1}, Z_{t+1}) - L\big(h_{t+1}(X_{T+1}), \bar{h}(X_{T+1})\big) \right) + \sum_{t=1}^{T} \lambda_t L(h_{t+1}, Z_{t+1})
\text{s.t.}\quad \lambda_T = 0, \quad \sum_{t=1}^{T} \left| \lambda_t - \frac{1}{T} \right| \le \alpha \qquad (34)
It is noted that an exhaustive search for the hypothesis h̄ is performed over the linear space H̄, thereby converting the minimax optimization problem of Equation (34) into the minimization of the maximum of a set of objective functions, which can be efficiently solved using the MATLAB routine fminimax. By setting the hyperparameter α = 0.8, the optimization problem of Equation (34) is solved to calculate the optimal weight vector λ = (λ_1, ..., λ_4), yielding λ_1 = 0.1, λ_2 = 0.25, and λ_3 = 0.65, where the weight λ_4 is assigned to be zero following the constraints in Equation (34). Subsequently, the final RNN model h is derived using the ensemble of the hypotheses h_2, h_3, h_4 with the corresponding weights λ_1, λ_2, λ_3, that is, h = Σ_{t=1}^3 λ_t h_{t+1}, and the hypothesis h_5 is discarded due to its weight λ_4 = 0. It should be pointed out that when the system dynamics of the CSTR varies over time due to further mode transitions and/or process disturbances at some future operation time, the final model h needs to be updated online again using real-time data if it does not perform well.
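Outside MATLAB, a similar minimax solve can be approximated with SciPy: the inner maximum over h̄ ∈ H̄ is evaluated on a coarse grid of simplex coefficients β (a stand-in for the exhaustive search), and the outer minimization over λ uses SLSQP. The losses and predictions below are synthetic, and Ω_T is assumed to be the probability simplex, consistent with the reported weights summing to one:

```python
import numpy as np
from scipy.optimize import minimize

T, alpha = 4, 0.8
rng = np.random.default_rng(0)

# Synthetic stand-ins: empirical losses L(h_{t+1}, Z_{t+1}) for t = 1..T and
# predictions of h_2..h_{T+1} on the next-round input X_{T+1} (5 samples each).
emp_loss = np.array([0.30, 0.20, 0.10, 0.25])
preds = rng.normal(size=(T, 5))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Coarse grid over h_bar in H_bar = {sum_i beta_i h_{i+1} : beta in simplex},
# a stand-in for the exhaustive search used with fminimax.
betas = [np.array(b) / 4.0 for b in np.ndindex(*(5,) * T) if sum(b) == 4]
h_bars = [b @ preds for b in betas]

def objective(lam):
    disc = max(sum(l * (e - mse(p, hb)) for l, e, p in zip(lam, emp_loss, preds))
               for hb in h_bars)          # discrepancy term of Eq. (34)
    return disc + float(lam @ emp_loss)   # plus the weighted empirical loss

cons = [{'type': 'eq', 'fun': lambda lam: lam[T - 1]},           # lambda_T = 0
        {'type': 'eq', 'fun': lambda lam: lam.sum() - 1.0},      # assumed simplex
        {'type': 'ineq', 'fun': lambda lam: alpha - np.abs(lam - 1.0 / T).sum()}]
res = minimize(objective, np.full(T, 1.0 / T), bounds=[(0.0, 1.0)] * T,
               constraints=cons, method='SLSQP')
print(np.round(res.x, 3))
```

Note that SLSQP handles the nonsmooth inner maximum only approximately; a dedicated minimax solver (as fminimax is) or an epigraph LP reformulation would be more robust choices.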

5.2. The CSTR Switched between Two Steady-States without Disturbances

We next consider a special switching case of the nominal CSTR system, as discussed in Section 3.1, where the CSTR operates with the mode-1 inlet stream defined in Section 5.1 at all times (i.e., σ ≡ 1 in Equation (32)) and does not involve disturbances. In this case, two steady-states (C_As1, T_s1) = (1.95 kmol/m³, 402 K) and (C_As2, T_s2) = (1.22 kmol/m³, 438 K) are considered for the CSTR under (C_A0s, Q_s) = (4 kmol/m³, 0 kJ/h). The CSTR is switched between the two steady-states at the prescribed switching times while maximizing the economic cost of Equation (33) under RNN-LEMPC. The CSTR is said to operate in mode 1 or 2 when it operates in the stability region Ω_ρ̂1 or Ω_ρ̂2 around the steady-state (C_As1, T_s1) or (C_As2, T_s2), respectively. For both modes, the control Lyapunov functions follow those in Section 5.1 with P_1 = P_2 = [1060, 22; 22, 0.52]. In this section, it is assumed that historical operational data are only available for the CSTR operating in a portion of the operating region (denoted by Ω_0) around the steady-state (C_As1, T_s1), where Ω_0 := {1.5 kmol/m³ ≤ C_A ≤ 2.4 kmol/m³, 360 K ≤ T ≤ 440 K} (marked as a rectangle in Figure 5). Based on this limited training dataset, an initial RNN is trained using the same method as in Section 5.1. The stability region Ω_ρ̂1 and the subset Ω_ρ̂e^1 for mode 1 follow those in Section 5.1 with ρ̂_1 = 368 and ρ̂_e^1 = 280, and the stability region Ω_ρ̂2 with ρ̂_2 = 480 and the subset Ω_ρ̂e^2 with ρ̂_e^2 = 380 are chosen for the CSTR in mode 2 in this section. In this example, since the initial RNN is developed with the dataset in Ω_0 around the steady-state (C_As1, T_s1) of mode 1, we operate the CSTR in mode 1 under LEMPC with the initial RNN at all times. However, when the CSTR operates in mode 2, we update the RNN models online, since the initial RNN lacks data around the steady-state (C_As2, T_s2) of mode 2.
Specifically, starting from the initial RNN, the RNNs are updated online using the most recent real-time data (i.e., every five sampling periods for each round) and the previous RNN model.
The simulation results for the nominal CSTR system switched between two steady-states under the RNN-LEMPC of Equation (26) are presented in Figure 5 and Figure 6. Specifically, the CSTR starts from the initial condition (C_A, T) = (1.95 kmol/m³, 402 K) and operates in mode 1 for t ∈ [0, 0.1 h) under the LEMPC with the initial RNN constructed with the dataset in Ω_0. Subsequently, the CSTR operates in mode 2 for t ∈ [0.1 h, ∞) (i.e., the remaining operation time), following a switching schedule from mode 1 to mode 2 at t = 0.1 h, under LEMPC using the initial RNN and the online updating RNNs, respectively. Figure 5 shows that under the initial RNN, the state trajectory closely follows the boundary of Ω_ρ̂e^1 for mode 1. This is consistent with the result in Figure 6a, which shows that the Lyapunov function value V̂_1(x) under the initial RNN converges to ρ̂_e^1 after t = 0.04 h. It should be mentioned that the initial RNN is constructed with the dataset gathered from a portion of the operating region around (C_As1, T_s1), and, thus, the LEMPC with the initial RNN performs well for the CSTR operating in mode 1. Additionally, it is observed in Figure 5 that under the initial RNN, there exists a gap between the closed-loop state trajectory and the boundary of Ω_ρ̂e^2 for mode 2, while the closed-loop state trajectory under the online updating RNNs ultimately operates near the boundary of Ω_ρ̂e^2 after exhibiting oscillations for some sampling steps. It is more apparent in Figure 6b that the Lyapunov function value V̂_2(x) under the initial RNN converges to a value of 340, while it converges to ρ̂_e^2 = 380 after t = 0.26 h under the online updating RNNs. The total economic benefits L_E = ∫_{0.15}^{0.4} l_e(x, u) dt within the operating period t ∈ [0.15 h, 0.4 h) are calculated (note that the first update of RNNs occurs at t = 0.15 h), which yields 4.75 and 4.84 for the LEMPC with the initial RNN and with the online RNNs, respectively.
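The reported benefit figures can be reproduced arithmetically with a rectangle-rule approximation of the integral; the per-sample production rates below are synthetic constants chosen only so the totals match 4.75 and 4.84, not the paper's actual trajectory data:

```python
import numpy as np

# Rectangle-rule evaluation of L_E = integral of l_e(x, u) dt over
# t in [0.15, 0.40) h from sampled closed-loop data (synthetic values).
dt = 1e-2                                  # LEMPC sampling period (h)
n = 25                                     # samples covering t in [0.15, 0.40) h
le_initial = np.full(n, 19.00)             # l_e under the initial RNN
le_online  = np.full(n, 19.36)             # l_e under online-updated RNNs

LE_initial = float(le_initial.sum() * dt)  # 4.75
LE_online  = float(le_online.sum() * dt)   # 4.84
improvement = 100.0 * (LE_online - LE_initial) / LE_initial
print(round(improvement, 1))               # 1.9 (% gain, matching the text)
```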
Therefore, the cumulative economic benefits during t ∈ [0.15 h, 0.4 h) are improved by 1.9% via online learning. Finally, the RNNs are updated online six times, at t = 0.15, 0.2, 0.25, 0.3, 0.35, and 0.4 h (T = 6), to generate a sequence of hypotheses h_2, ..., h_7. Based on the testing error L(h_t, Z_t) for each hypothesis h_t, Equation (8) is solved with the hyperparameter α = 0.6 to calculate the optimal weights λ_1, ..., λ_6 for the hypotheses h_1, ..., h_6, respectively, yielding λ_1 = 0, λ_2 = 0.0333, λ_3 = 0.1666, λ_4 = 0.1667, λ_5 = 0.4667, and λ_6 = 0.1667. Subsequently, the final RNN model h is developed with h = Σ_{t=1}^6 λ_t h_t, and the CSTR operates in mode 2 under the LEMPC with this final RNN model for the remaining operation time. However, when there is a further mode transition for the CSTR, the online update of RNNs must be performed again if the final model does not predict well for the CSTR operating in the new mode.

6. Conclusions

This work proposed an LEMPC scheme using online updating RNNs to optimize the economic benefits of switched non-linear systems. Generalized error bounds were derived for RNN models updated online in the i.i.d. and non-i.i.d. settings, respectively. The LEMPC incorporating online learning of RNNs was then developed to maintain the closed-loop state within the prescribed stability region and to maximize the economic benefits of the uncertain system subject to bounded disturbances. A Lyapunov-based constraint was incorporated into the LEMPC formulation to ensure the success of scheduled mode transitions. Closed-loop stability of the uncertain non-linear system subject to bounded disturbances under the LEMPC was proven in a probabilistic manner, accounting for the generalized error bound. The proposed LEMPC scheme was applied to a chemical process example to demonstrate that economic optimality and closed-loop stability are improved under the LEMPC using online updating RNNs compared with using the initial RNN at all times.

Author Contributions

C.H. developed the main results, performed the simulation studies, and prepared the initial draft of the paper. S.C. contributed to the simulation studies in this manuscript. Z.W. developed the idea of the RNN generalized error, oversaw all aspects of the research, and revised this manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National University of Singapore Start-up Grant, Grant/Award Number: R279-000-656-731 and MOE AcRF Tier 1 FRC Grant, Grant/Award Number: CHBE-22-5367.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. A schematic of a recurrent neural network and its unfolded structure.
Figure 2. Closed-loop state trajectories ( C A , T ) for the uncertain CSTR system operating in mode 1 for t ∈ [ 0 , 0.25 h ) (red solid line) and switching to mode 2 at t = 0.25 h (blue solid line) under the RNN-LEMPC of Equation (26), (a) using the initial offline-learning RNNs at all times, and (b) using the online updating RNNs, for the initial condition ( C A , T ) = ( 1.95 kmol/m 3 , 402 K ) (marked as a red diamond).
Figure 3. Comparisons of V ^ 1 ( x ) for mode 1 and V ^ 2 ( x ) for mode 2 under the initial offline-learning and online updating RNN models.
Figure 4. (a) Closed-loop state ( C A and T) and (b) manipulated input ( C A 0 and Q) profiles for the uncertain CSTR system operating in mode 1 for t ∈ [ 0 , 0.25 h ) and switching to mode 2 at t = 0.25 h under the RNN-LEMPC of Equation (26) using the initial offline-learning RNNs at all times (blue solid line), and using the online updating RNNs (red dashed line), for the initial condition ( C A , T ) = ( 1.95 kmol/m 3 , 402 K ).
Figure 5. Closed-loop state trajectories ( C A , T ) for the nominal CSTR system operating in mode 1 for t ∈ [ 0 , 0.1 h ) using the initial offline-learning RNN (red solid line), and switching to mode 2 at t = 0.1 h using the initial RNN (blue solid line) and online updating RNNs (pink dashed line) under the RNN-LEMPC of Equation (26) with the initial condition ( C A , T ) = ( 1.95 kmol/m 3 , 402 K ) (marked as a red diamond).
Figure 6. Comparisons of V ^ 1 ( x ) for mode 1 using the initial offline-learning RNN (red dashed line), and V ^ 2 ( x ) for mode 2 using the initial RNN (blue solid line) and online updating RNNs (pink dashed line).
Table 1. Parameter values of the CSTR.
E = 5 × 10^4 kJ/kmol; F = 5 m^3/h
R = 8.314 kJ/(kmol·K); T_0^1 = 300 K, T_0^2 = 290 K
V = 1 m^3; Q_s = 0.0 kJ/h
ΔH = −1.15 × 10^4 kJ/kmol; C_A0s^1 = 4 kmol/m^3, C_A0s^2 = 4.55 kmol/m^3
ρ_L = 1000 kg/m^3; T_s^1 = 402 K, T_s^2 = 475 K
C_p = 0.231 kJ/(kg·K); C_As^1 = 1.95 kmol/m^3, C_As^2 = 0.83 kmol/m^3
k_0 = 8.46 × 10^6 m^3/(kmol·h)
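The parameter values in Table 1 can be checked for internal consistency against the mode-1 steady state ( C A s 1 , T s 1 ) = ( 1.95 kmol/m 3 , 402 K ). The sketch below assumes a single second-order reaction A → B with rate r = k 0 exp ( − E / R T ) C A 2 , an assumption inferred from the units of k 0 in the table; the exact model equations are those given earlier in the paper.

```python
import math

# CSTR parameters from Table 1.
E, R = 5.0e4, 8.314          # activation energy kJ/kmol, gas constant kJ/(kmol K)
F, V = 5.0, 1.0              # flow rate m^3/h, reactor volume m^3
k0 = 8.46e6                  # pre-exponential factor, m^3/(kmol h)
dH = -1.15e4                 # heat of reaction kJ/kmol (exothermic)
rho, Cp = 1000.0, 0.231      # liquid density kg/m^3, heat capacity kJ/(kg K)

def cstr_rhs(CA, T, CA0, Q, T0):
    """Mole and energy balances of the CSTR with an assumed
    second-order Arrhenius rate law r = k0 exp(-E/(R T)) CA^2."""
    r = k0 * math.exp(-E / (R * T)) * CA**2
    dCA_dt = F / V * (CA0 - CA) - r
    dT_dt = F / V * (T0 - T) - dH / (rho * Cp) * r + Q / (rho * Cp * V)
    return dCA_dt, dT_dt

# At the mode-1 steady state (inputs CA0s1 = 4 kmol/m^3, Qs = 0,
# feed temperature T0^1 = 300 K) both derivatives should be near zero.
dCA_dt, dT_dt = cstr_rhs(CA=1.95, T=402.0, CA0=4.0, Q=0.0, T0=300.0)
```

The residuals come out close to zero only if ΔH is negative (exothermic), which suggests the minus sign on ΔH was lost in the table's extraction.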
