1. Introduction
In Model Predictive Control (MPC) [1,2], a dynamical model of the controlled process is used to predict its behaviour over a certain time horizon and to optimise the control policy. This problem formulation leads to very good control quality, much better than that of classical control methods. As a result, MPC methods have been used for a great variety of processes, e.g., chemical reactors [3], heating, ventilation and air conditioning systems [4], robotic manipulators [5], electromagnetic mills [6], servomotors [7], electromechanical systems [8] and stochastic systems [9]. It must be pointed out that satisfactory control is only possible if the model used is sufficiently precise. Although there are numerous types of dynamical models, e.g., fuzzy systems, polynomials, and piecewise linear structures [10], neural networks of different kinds [11] are very popular due to their excellent accuracy and simple structure [12]. In particular, Recurrent Neural Networks (RNNs) [13,14,15,16] can serve as a model as they are able to give predictions over the required horizon.
In theory, RNNs can be extremely useful in various machine learning tasks in which the data are time-dependent, such as the modelling of time series, speech synthesis or video analysis. In contrast to classical feedforward neural networks, RNNs can be used to create models and predictions from sequential data. In practice, however, their use is limited by one major drawback: the lack of long-term memory. RNNs have short-term memory capabilities, but they tend to forget the long-term input–output time dependencies during backpropagation training. This problem is caused by the vanishing gradient phenomenon, which was described in great detail in [17,18,19]. Many ways of limiting the influence of the vanishing gradient on the training process have been proposed, such as using different activation functions (e.g., ReLU) or batch normalisation. Another approach is to modify the network architecture in a way that improves the gradient flow during training. Residual Neural Networks (ResNets) proposed in [20], the Long Short-Term Memory (LSTM) network structure first proposed in [18] and its modification, the Gated Recurrent Unit (GRU) architecture proposed in [21], can serve as examples.
The unique long-term memory properties of LSTM and GRU neural networks made them widely popular in a large variety of machine learning tasks. Example applications of the LSTM architecture are: data classification [22], speech recognition [23,24], handwriting recognition [25], speech synthesis [26], text coherence tests [27], biometric authentication and anomaly detection [28], detecting deception from gaze and speech [29] and anomaly detection [30]. Similarly, example applications of the GRU structure are: facial expression recognition [31], human activity recognition [32], cyberbullying detection [33], defect detection [34], human activity surveillance [35], automated classification of cognitive workload tasks [36] and speaker identification [37].
Recently, LSTM networks have also been used to model dynamical processes. Examples are: a benchmark process [38], a pH reactor [39], a reverse osmosis plant [40], temperature control [41] and an autonomous mobility-on-demand system [42]. In all cited publications, it was shown that LSTM models are able to approximate the properties of dynamical processes; the models have very good accuracy. Some of these models have been used for prediction in MPC [40,41,42]; very good control quality has been reported. Although GRU networks are similar to LSTM ones and have many successful applications in classification and detection tasks, as mentioned in the previous paragraph, they are very rarely used as models of dynamical processes; e.g., a tandem-wing quadplane drone model was discussed in [43]. Hence, two important questions should be formulated:
- (a) What is the accuracy of the dynamical models based on the GRU networks, and how do they compare to the LSTM ones?
- (b) How do the GRU dynamical models perform in MPC, and how do they compare to the LSTM-based MPC approach?
Both of these issues are worth considering since the GRU networks have a simpler architecture and a lower number of parameters than the LSTM ones.
This work has three objectives:
- (a) A thorough comparison of LSTM and GRU neural networks as models of two dynamical processes, polymerisation and neutralisation (pH) reactors. An important question is whether the GRU network, although it has a simpler structure than the LSTM one, offers satisfactory modelling accuracy;
- (b) The derivation of MPC prediction equations for the LSTM and GRU models;
- (c) The development of MPC algorithms for the two aforementioned processes with different LSTM and GRU models used for prediction. An important question is whether the GRU network offers control quality comparable to that possible when the more complex LSTM structure is applied.
Unfortunately, to the best of the authors’ knowledge, the efficiency of LSTM and GRU networks as dynamical models and their performance in MPC have not been thoroughly compared in the literature; typically, the LSTM structures are used [40,41,42].
The article is organised in the following way. Section 2 describes the structures of the LSTM and GRU neural networks. Section 3 defines the MPC optimisation task and details how the two discussed types of neural models are used for prediction in MPC. Section 4 thoroughly compares the efficiency of LSTM and GRU neural networks used as models of the two dynamical systems; moreover, the efficiency of both considered model classes is validated in MPC. Finally, Section 5 summarises the whole article.
3. LSTM and GRU Neural Networks in Model Predictive Control
The manipulated variable, i.e., the input of the controlled process, is denoted by u, while the controlled variable, i.e., the process output, is denoted by y. A good control algorithm is expected to calculate the value of the manipulated variable that leads to fast control, i.e., the process output should follow the changes of the set-point. Moreover, since fast control usually requires abrupt changes of the manipulated variable, which may be dangerous for the actuator, such situations should be penalised. Finally, it is necessary to take some constraints into account; they are usually imposed on the magnitude and the rate of change of the manipulated variable. In some cases, constraints can also be imposed on the process output variable.
3.1. The MPC Problem
The vector of decision variables calculated online at each sampling instant of MPC is defined as the increments of the manipulated variable:
where the control horizon is denoted by Nu. The general MPC optimisation problem is:
The cost function can be divided into two parts. The first part describes the control error, defined as the sum of squared differences between the set-point value and the predicted output over the prediction horizon N. The notation should be interpreted as follows: the prediction for a future sampling instant is calculated at the current instant k. The second part of the cost function penalises the changes of the manipulated variable, multiplied by the weighting coefficient λ. When the whole cost function is taken into account, one can observe that it minimises both the control errors and the changes of the control signal. The weighting coefficient λ is used to fine-tune the procedure.
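In its standard quadratic form (a reconstruction consistent with the description above, not a verbatim copy of the paper's equation; the set-point is written as y^sp), the cost function reads:

```latex
J(k) = \sum_{p=1}^{N} \big( y^{\mathrm{sp}}(k+p\,|\,k) - \hat{y}(k+p\,|\,k) \big)^{2}
     + \lambda \sum_{p=0}^{N_{\mathrm{u}}-1} \big( \Delta u(k+p\,|\,k) \big)^{2}
```

The first sum penalises predicted control errors over the prediction horizon N, the second penalises input increments over the control horizon N_u, weighted by λ.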
The constraints of the MPC optimisation problem are as follows:
The magnitude constraints (minimum and maximum values) are enforced on the manipulated variable over the control horizon;
Analogous minimum and maximum constraints are imposed on the increments of the same variable over the control horizon;
Constraints may also be put on the predicted output variable over the prediction horizon N.
When the optimisation procedure calculates the decision vector (Equation (19)) by solving Equation (20), its first element is applied to the process. The most common way of applying it is given by the following equation:
The whole computational scheme is then repeated at subsequent sampling instants.
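The receding-horizon scheme described above can be sketched as follows. This is a minimal illustration only, with hypothetical stand-ins `optimise_increments` (the constrained optimisation of Equation (20)) and `read_output` (the plant measurement); it is not the paper's implementation:

```python
def mpc_loop(optimise_increments, apply_input, read_output, u0, n_steps):
    # Receding-horizon MPC: at every sampling instant, solve the optimisation
    # problem for the whole vector of future input increments, apply only the
    # first increment to the process, then repeat at the next instant.
    u = u0
    trajectory = []
    for k in range(n_steps):
        y = read_output(k)                    # current process measurement
        delta_u = optimise_increments(y, u)   # increments over the control horizon
        u = u + delta_u[0]                    # only the first element is applied
        apply_input(u)
        trajectory.append(u)
    return trajectory
```

Only the first optimised increment is ever applied; the remaining elements are discarded and recomputed at the next instant, which gives MPC its feedback character.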
In MPC [2], the general prediction equation for a future sampling instant k+p is:
where p = 1, …, N. The equation combines the output of the model for the sampling instant k+p calculated at the current instant k with the current estimate of the unmeasured disturbance acting on the process output. Typically, it is assumed that the disturbance is constant over the whole prediction horizon, and its value is determined as the difference between the real (measured) value of the process output and the model output calculated using the process input and output signals up to the sampling instant k:
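In the common notation (a reconstruction consistent with the description above, writing the model output as y_mod and the disturbance estimate as d; the paper's exact equations are not reproduced here), the prediction and disturbance equations take the form:

```latex
\hat{y}(k+p\,|\,k) = y_{\mathrm{mod}}(k+p\,|\,k) + d(k), \qquad p = 1, \ldots, N,
\qquad
d(k) = y(k) - y_{\mathrm{mod}}(k)
```

This constant-disturbance assumption introduces integral action into the MPC scheme, compensating for model–plant mismatch.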
3.2. The LSTM Neural Network in MPC
In the case of the LSTM model, to determine the predicted output, it is necessary to first calculate the predicted values of the cell state given by Equations (6)–(10) in the following way:
Using Equations (6)–(9) and Equation (11), one can then calculate the prediction of the hidden state:
Finally, the prediction of the output signal can be calculated based on Equations (18) and (32) as:
Taking into account the input vector of the network (Equation (4)), for prediction over the prediction horizon, the vector of arguments of the network is:
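As an illustration only, one prediction step of the cell and hidden states can be sketched using the standard textbook LSTM cell equations; the weight dictionary `W` holds hypothetical scalar parameters, and the paper's exact Equations (6)–(11) and trained weights are not reproduced here:

```python
import math

def sigmoid(x):
    # Logistic activation used by the LSTM gates.
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    # One prediction step of a single-unit LSTM cell (scalar weights for clarity).
    f = sigmoid(W["wf"] * x + W["uf"] * h + W["bf"])         # forget gate
    i = sigmoid(W["wi"] * x + W["ui"] * h + W["bi"])         # input gate
    o = sigmoid(W["wo"] * x + W["uo"] * h + W["bo"])         # output gate
    c_cand = math.tanh(W["wc"] * x + W["uc"] * h + W["bc"])  # candidate cell state
    c_new = f * c + i * c_cand                               # cell-state update
    h_new = o * math.tanh(c_new)                             # hidden-state update
    return h_new, c_new
```

For prediction in MPC, this step is iterated over the prediction horizon, feeding in the candidate future inputs from the decision vector.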
3.3. The GRU Neural Network in MPC
There is no cell state in the GRU neural networks, and therefore, to calculate the predicted output signal values
, only the prediction of hidden state
h is necessary to evaluate first. This is performed based on Equations (
14)–(
17) in the following way:
where
is an identity matrix with dimensions
. The prediction of the output signal Equation (
32), as well as the input vector Equation (
35) are the same as in the LSTM neural network model.
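For comparison with the LSTM case, one prediction step of the GRU hidden state can be sketched in its standard textbook form; again, `W` holds hypothetical scalar parameters, not the paper's Equations (14)–(17) or trained weights:

```python
import math

def sigmoid(x):
    # Logistic activation used by the GRU gates.
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, W):
    # One prediction step of a single-unit GRU cell (scalar weights for clarity).
    # There is no cell state: the hidden state alone carries the memory.
    z = sigmoid(W["wz"] * x + W["uz"] * h + W["bz"])              # update gate
    r = sigmoid(W["wr"] * x + W["ur"] * h + W["br"])              # reset gate
    h_cand = math.tanh(W["wh"] * x + W["uh"] * (r * h) + W["bh"]) # candidate state
    return (1.0 - z) * h + z * h_cand                             # convex combination
```

The GRU needs two gates instead of the LSTM's three and maintains a single state vector, which is why it has fewer parameters for the same number of units.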
The proposed MPC control procedure may be summarised as follows:
4. Results of the Simulations
In order to compare the accuracy of the LSTM and GRU networks and their efficiency in MPC, we considered two dynamical systems: a polymerisation reactor and a neutralisation (pH) reactor.
4.1. Description of the Dynamical Systems
First, the two considered processes are briefly described. Moreover, a short description of the data preparation procedure is given.
4.1.1. Benchmark 1: The Polymerisation Reactor
The first considered benchmark was a polymerisation reaction taking place in a jacketed continuous stirred-tank reactor. The reaction was the free-radical polymerisation of methyl methacrylate with azo-bis-isobutyronitrile as the initiator and toluene as the solvent. The process input was the inlet initiator flow rate (m³/h); the output was the Number Average Molecular Weight (NAMW) of the product (kg/kmol). The detailed fundamental model of the process was given in [45]. The process was nonlinear: in particular, its static gain depended on the operating point. The polymerisation reactor is frequently used to evaluate model identification algorithms and advanced nonlinear control methods, e.g., [12,45,46].
The fundamental model of the polymerisation process, comprising four nonlinear differential equations, was solved using the Runge–Kutta 45 method to obtain training, validation and test datasets, each of them having 2000 samples. After every 50 samples, there was a step change of the control signal, the magnitude of which was chosen randomly. Next, since the process input and output signals had different magnitudes, they were scaled in the following way:
where the offsets are the values of the variables at the nominal operating point (20,000 for the output). The sampling time was 1.8 s.
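The data-generation procedure described above can be sketched as follows. The input bounds and the scaling `gain` are hypothetical placeholders (the paper's exact scaling factors are not reproduced here); only the structure mirrors the description: a random step every 50 samples and nominal-point scaling:

```python
import random

def excitation_signal(n_samples, step_period, u_min, u_max, seed=0):
    # Piecewise-constant excitation: a new random magnitude every
    # `step_period` samples, as in the dataset generation described above.
    rng = random.Random(seed)
    u = []
    for i in range(n_samples):
        if i % step_period == 0:
            level = rng.uniform(u_min, u_max)
        u.append(level)
    return u

def scale(signal, nominal, gain=1.0):
    # Nominal-operating-point scaling: shift by the nominal value, then
    # multiply by a (hypothetical) scaling factor.
    return [gain * (s - nominal) for s in signal]
```

Scaling both signals to comparable magnitudes is important because gradient-based training of recurrent networks is sensitive to poorly conditioned inputs.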
4.1.2. Benchmark 2: The Neutralisation Reactor
The second considered benchmark was a neutralisation reactor. The process input was the base stream flow rate (mL/s); the output was the pH of the product. The detailed fundamental model of the process was given in [47]. The process was nonlinear since its static and dynamic properties depended on the operating point. Hence, it is frequently used as a benchmark to evaluate model identification algorithms and advanced nonlinear control methods, e.g., [46,47,48].
The fundamental model of the neutralisation process, comprising two nonlinear differential equations and a nonlinear algebraic equation, was solved using the Runge–Kutta 45 method to obtain training, validation and test datasets, each of them having 2000 samples. After every 50 samples, there was a step change of the control signal, the magnitude of which was chosen randomly. The process signals were scaled in the same way, using the values of the variables at the nominal operating point. The sampling time was 10 s.
4.2. LSTM and GRU Neural Networks for Modelling of Polymerisation and Neutralisation Reactors
A number of LSTM and GRU models were trained for the two considered dynamic processes. All models were trained using the Adam optimisation algorithm. The maximum number of training epochs (iterations) was:
500 for the models with ;
750 for the models with ;
1000 for the models with .
The training procedure was performed as follows:
The order of the dynamics of the LSTM model was set to . The number of neurons in the hidden layer was set to . For the considered configuration, ten models were trained, and the best one was chosen;
The number of neurons was increased to two. Ten models were trained, and the best was chosen. This procedure was repeated until the number of neurons reached ;
The first two steps were repeated with the increased order of the dynamics , .
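The restart-and-select part of the procedure above can be sketched as follows. `train_fn` and `eval_fn` are hypothetical callables standing in for the actual LSTM/GRU training run and validation MSE evaluation:

```python
def select_best_model(train_fn, eval_fn, n_restarts=10):
    # Train `n_restarts` models from independent random initialisations and
    # keep the one with the lowest validation mean squared error, as done for
    # each (number of neurons, order of dynamics) configuration above.
    best_model, best_mse = None, float("inf")
    for _ in range(n_restarts):
        model = train_fn()
        mse = eval_fn(model)
        if mse < best_mse:
            best_model, best_mse = model, mse
    return best_model, best_mse
```

Repeated restarts are needed because recurrent-network training is non-convex: different initialisations can converge to markedly different local minima.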
It is important to stress that setting the order of the dynamics to higher than did not result in any significant increase of the modelling quality. Therefore, further experiments with are not presented.
It is an interesting question if LSTM and GRU models without recurrent input signals can perform well in modelling tasks. In theory, the recurrent nature of hidden state h should be sufficient to ensure good model quality. To verify this expectation, an additional series of models was trained. The training procedure was similar to the one described above, the only difference being that now, the model order of the dynamics was first set to , , then increased to , and, finally, to , .
The quality of all trained models was then validated with the mean squared error chosen as the quality index. The models were validated in the nonrecurrent Autoregressive with eXogenous input (ARX) mode and in the recurrent Output Error (OE) mode. The model input vectors for the two considered cases are:
It is important to stress that in the case of the models without recurrent input signals, the ARX and OE modes were the same.
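The difference between the two validation modes can be sketched as follows; `model_step` is a hypothetical one-step predictor standing in for the trained network, shown here with a first-order input/output structure for clarity:

```python
def simulate_arx(model_step, u, y_meas, order=1):
    # ARX (one-step-ahead) mode: the model is fed the *measured* past outputs,
    # so prediction errors do not accumulate.
    preds = []
    for k in range(order, len(u)):
        preds.append(model_step(u[k - 1], y_meas[k - 1]))
    return preds

def simulate_oe(model_step, u, y0, n_steps):
    # OE (recurrent) mode: the model is fed back its *own* past predictions,
    # which is how the model is actually used over the MPC prediction horizon.
    preds, y_prev = [], y0
    for k in range(1, n_steps):
        y_prev = model_step(u[k - 1], y_prev)
        preds.append(y_prev)
    return preds
```

Good ARX accuracy does not guarantee good OE accuracy, which is why the recurrent mode is the decisive test for models intended for prediction in MPC.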
Taking into account the objective of this work, it is interesting to compare the accuracy of the LSTM and GRU models with different structures, defined by the number of neurons and the order of the model dynamics. For the polymerisation reactor, the results for the chosen networks are given in Table 1 and Table 2, and Figure 4 depicts the model validation errors for all considered numbers of neurons. For the neutralisation reactor, the results for the chosen networks are given in Table 3 and Table 4, and Figure 5 depicts the model validation errors for all considered numbers of neurons. The following notation is used:
is the mean squared error for the training dataset in ARX mode;
is the mean squared error for the validation dataset in ARX mode;
is the mean squared error for the training dataset in recurrent mode;
is the mean squared error for the validation dataset in recurrent mode.
The presented results can be summarised in the following way:
Based on the observations summarised above, it can be concluded that it is good practice to train a model with a medium number of neurons and a low order of the dynamics. This approach may require many training trials, but the resulting model has a relatively low number of parameters; therefore, a lower computational cost can be achieved. A direct comparison of the polymerisation reactor models can be seen in Figure 10 and Figure 11. Both models performed very well, and the modelling errors were minimal. A similar comparison for the pH reactor can be seen in Figure 12 and Figure 13. The modelling quality was again very satisfactory. Here, it is important to stress that in the case of the GRU model, it was necessary to choose one with a higher order of the dynamics to achieve results similar to those ensured by the simpler LSTM models.
4.3. LSTM and GRU Neural Network for the MPC of Polymerisation and Neutralisation Reactors
A few of the best-performing models were chosen to be applied in the MPC control scheme for prediction. First, let us describe the tuning procedure of the MPC controller. It starts with the selection of the prediction horizon, which should be long enough to cover the dynamic behaviour of the process. However, if the horizons are too long, the computational cost of the optimisation task increases. The control horizon cannot be too short, since that gives insufficient control quality, while lengthening it also increases the computational burden. The process of tuning was therefore as follows:
A constant value of the weighting coefficient λ was assumed;
The prediction horizon N and the control horizon Nu were set to the same, arbitrarily chosen length. If the controller did not work properly, both horizons were lengthened;
The prediction horizon was gradually shortened, and its minimal possible length was chosen (with the condition N ≥ Nu);
The effect of changing the length of the control horizon on the resulting control quality was then assessed experimentally by successively shortening it. The shortest possible control horizon was chosen;
Finally, after determining the horizons’ lengths, the weighting coefficient λ was adjusted.
After applying the tuning procedure on both processes under study, the following settings were determined:
, , for the polymerisation process;
, , for the neutralisation process.
Simulations of the MPC algorithms were performed with MATLAB. For optimisation, the fmincon() function was used with the following settings:
MPC performance using the models without the recursive input signals proved to be very satisfactory. In the case of the polymerisation reactor, in Figure 14, minimal overshoot and a short settling time can be observed. Similar control quality was achieved for the neutralisation reactor, as depicted in Figure 15. Interestingly enough, for MPC with the more complex models, the results were comparable, as demonstrated in Figure 16. In the case of the polymerisation system and the LSTM model, small oscillations around the set-point could be observed, as shown in Figure 17, and the overall control quality was slightly worse.
Table 5 and Table 6 compare the simulation results of the MPC algorithms based on the LSTM and GRU models for the polymerisation and neutralisation processes, respectively. The following indicators used in process control performance assessment were considered [49]:
The sum of squared errors (E);
The Huber standard deviation of the control error;
The rational entropy of the control error.
Additionally, the average calculation time (t) over the whole simulation horizon (in seconds) was specified.
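Of the indicators above, the sum of squared errors has a direct definition; it can be computed as sketched below (the Huber standard deviation and rational entropy follow the definitions in [49] and are not reproduced here):

```python
def sum_squared_errors(setpoint, output):
    # E: the sum of squared control errors e(k) = y_sp(k) - y(k)
    # accumulated over the whole simulation horizon.
    return sum((sp - y) ** 2 for sp, y in zip(setpoint, output))
```

Robust indicators such as the Huber standard deviation complement E because E alone is dominated by the large transient errors that follow each set-point change.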
From the performed experiments, we were able to draw the following conclusions:
It is important to stress that the above observations are true for the two considered processes.