1. Introduction
Batch and fed-batch operations are commonplace in industrial fermentation. In a fed-batch process, a feed stream supplies additional substrate to the reactor during the reaction. The feed rate profile can be changed during the process, and the final product is removed at the end of the reaction. Introducing a feed stream improves the control of reactant concentrations and can therefore be used to discourage undesired reactions and to improve product selectivity [1]. These benefits are only realized when a suitable feeding strategy is used. Therefore, determining the feeding strategy that yields the maximum amount of desired product (i.e., the optimal control strategy) is the main design objective for fed-batch reactors [2]. In the case of fermentations, this is usually the strategy in which the rate of substrate addition to the reactor equals the rate at which the organism consumes it [3]. Ever-increasing market competition and stricter regulations on product quality and process efficiency mean that the optimization of fed-batch fermentations is becoming increasingly important [
4]. An optimal control strategy will also help improve profits from the process [
The simplest way of optimizing the fed-batch process is to build an off-line control strategy using a model of the process. However, the presence of process–model mismatches (PMMs) and unknown disturbances renders the control strategy suboptimal when applied to the real fed-batch process [
4]. Therefore, strategies have been proposed to mitigate this issue. Batch-to-batch optimization involves taking data from the previous batch and using them, with historical batch data, to optimize the performance of the following batch.
In order to develop an optimal control strategy, a suitable model of the process is required. There are three different types of models that can be used: mechanistic models, data-driven models, or hybrid models. A mechanistic model uses knowledge of the kinetics, stoichiometry, and mass and energy balances of the reaction system. Data-driven or “black-box” models are constructed from process operational data and used to describe the relationship between the manipulated variables and the output variables [
6]. Hybrid models combine elements of data-driven and mechanistic models, taking advantage of both [10]. Von Stosch et al. [11] give a detailed review of hybrid modeling in process systems engineering.
A common approach to developing a data-driven model is to use neural networks [13]. The most commonly used neural networks are multi-layer feedforward neural networks consisting of an input layer, an output layer, and one or more hidden intermediate layers [
14]. Each layer has several neurons, which are connected to neurons in the adjacent layers via weighted connections. Data are processed in these neurons as they pass through the network until they reach the output layer [
15]. Fed-batch processes are nonlinear systems and are often modeled using neural networks due to their excellent approximation ability [
17]. Xiong et al. [
18] developed a batch-to-batch control strategy based on a control affine feedforward neural network, which used modeling errors from the previous batch to improve the model. The model prediction improved after each batch, and so process–model mismatch was reduced. Zhang et al. [
4] developed a linearized model that was rebuilt after each batch by adding data from the recently completed batch to the pool of historical process data. The optimal control strategy using the updated model was able to respond to disturbances. Jewaratnam et al. [
3] expanded on the updating linearized model by introducing a “sliding window” approach in which only data from a window of the most recent batches were used to update the model. The sliding window improved the convergence rate and stability following a disturbance.
The extreme learning machine (ELM) is a type of neural network developed by Huang et al. [19]. It differs from standard single-hidden-layer feedforward neural networks (SLFNs) in the way the network is trained. In standard SLFNs, hidden-layer weights must be solved iteratively using a training algorithm such as back propagation, and a large number of training iterations are typically required [20]. In ELM, by contrast, the weights of the hidden layer are assigned randomly and are not repeatedly adjusted [21]. This allows the output-layer weights to be solved as a linear system of equations, which makes training the ELM model significantly faster [
22]. ELM also provides very high generalization performance [
23]. Alli and Zhang [
21] proposed using ELM in conjunction with recursive least squares (RLS) for batchwise model updating. Unknown disturbances or PMMs are detrimental to the ELM model prediction. RLS is, therefore, used to address any PMM. After each batch, RLS uses the latest model prediction error to update the weights of the ELM neural network output layer, allowing the ELM model to respond to disturbances. The method was shown to be successful in suppressing the effects of PMM and disturbances. Recursive methods have also been recently used in developing process monitoring methods. Li et al. [
24] presented a recursive principal component analysis method for adaptive process monitoring. Yu and Zhao [
25] developed a recursive exponential slow feature analysis method for adaptive process monitoring. Yu et al. [
26] proposed a recursive cointegration approach for the adaptive monitoring of nonstationary industrial processes.
This paper proposes using the recursively updated ELM model in a batch-to-batch optimization control strategy. An ELM model is initially developed from historical process operation data. After each batch, the ELM model will be updated using RLS and used in an optimization framework to maximize the final biomass of a fed-batch fermentation in the next batch. Using a recursively updated model will allow the model to adapt to disturbances and reduce PMM with each iteration. A mechanistic model of a fed-batch fermentation will be used to simulate a real fed-batch process.
This paper is organized as follows:
Section 2 presents the fed-batch fermentation process, ELM, batch-to-batch ELM model updating using RLS, and batch-to-batch optimization control.
Section 3 presents the results and discussions.
Section 4 concludes this paper.
2. Materials and Methods
2.1. Fed-Batch Fermentation Model
A mechanistic model for the production of baker’s yeast in a fed-batch fermenter is used in this study. The model was taken from Yüzgeç et al. [
27]. Equations (1)–(12) outline the kinetic model.
Specific growth rate limit:
Oxidation glucose metabolism:
Reductive glucose metabolism:
Oxidative ethanol metabolism:
Total specific growth rate:
Carbon dioxide production rate:
In the above equations, Qi is the specific consumption or production rate of the ith component; Ci is the concentration of the ith component (g L−1); Ke is the saturation constant for ethanol (g L−1); Ki is the inhibition constant (g L−1); Ko is the saturation constant for oxygen (g L−1); Ks is the saturation constant for the substrate (glucose) (g L−1); t is time (h); td is the time delay (h); the subscripts or superscripts c, cr, e, lim, max, o, ox, red, s, up, and x represent carbon dioxide, critical, ethanol, limitation, maximum, oxygen, oxidative, reductive, substrate (glucose), uptake, and biomass, respectively; and μ is the specific growth rate (h−1).
The reactor was modeled as isothermal. The dynamic model is outlined in Equations (13)–(18). In these equations, Cs, Co, Ce, and Cx represent, respectively, the concentrations of glucose, oxygen, ethanol, and biomass; F and Fa stand for feed rate and air feed rate, respectively; kLao is the total volumetric mass transfer coefficient (h−1); V is the volume of the content in the reactor; and AR denotes the cross-sectional area of the reactor.
The values of the parameters used in the model are given in Table 1. The initial conditions used in the model are given in Table 2. A simulation based on this model was built in MATLAB [28] and was used to generate historical batch data and to test new feed profiles obtained from the optimization process.
2.2. Extreme Learning Machine
An example of a single-hidden-layer feedforward neural network (SLFN) is shown in Figure 1. An input layer consisting of $n$ input neurons, $\mathbf{x}$, is connected to a hidden layer of $\tilde{N}$ neurons via a series of hidden-layer weights, $\mathbf{w}$, and biases, $b$. The output values of the hidden-layer neurons are computed using these inputs, weights, and biases in an activation function. The value of the output-layer neurons is then calculated using the linear relationship between the hidden-layer outputs and the output-layer weights, $\boldsymbol{\beta}$.
For the SLFN used in this investigation, each input neuron corresponds to the flowrate at the respective feed interval, and the single output neuron represents the final biomass concentration.
The extreme learning machine (ELM), proposed by Huang et al. [23], is a learning algorithm for an SLFN. They showed that the method was several orders of magnitude faster than other methods (such as back propagation) and could achieve improved generalization performance. The main principle of ELM is that the hidden-layer weights and biases are chosen randomly.
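Given $N$ distinct samples, $(\mathbf{x}_j, \mathbf{t}_j)$, where $\mathbf{x}_j$ is the $j$th set of input values and $\mathbf{t}_j$ is the $j$th set of target values, the hidden-layer output matrix of the neural network is given by

$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}} \tag{19}$$

where $\mathbf{w}_i$ is the weight vector connecting the input neurons to the $i$th hidden-layer neuron, and $b_i$ is the bias for the $i$th hidden neuron.

Using ELM training, the hidden-layer weights and biases are chosen randomly and are used in the activation function, $g(\cdot)$, along with the input sample values, to compute $\mathbf{H}$. The output neurons are linear in nature for ELM such that

$$\mathbf{o}_j = \sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \quad j = 1, \ldots, N \tag{20}$$

where $\boldsymbol{\beta}_i$ is the weight vector connecting the $i$th hidden-layer neuron to the output neurons, and $\mathbf{o}_j$ is the network output for the $j$th sample.

So long as the activation function is infinitely differentiable, this SLFN can approximate the $N$ samples with zero error, $\sum_{j=1}^{N} \lVert \mathbf{o}_j - \mathbf{t}_j \rVert = 0$, i.e., a properly trained network with correct $\boldsymbol{\beta}_i$, $\mathbf{w}_i$, and $b_i$ satisfies the following:

$$\sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \quad j = 1, \ldots, N \tag{21}$$

This equation can be written more simply as follows:

$$\mathbf{H} \boldsymbol{\beta} = \mathbf{T} \tag{22}$$

The hidden-layer output matrix, $\mathbf{H}$, can be calculated simply as shown in Equation (19), and the matrix of target output values, $\mathbf{T}$, is already known. Therefore, training the SLFN involves finding the least-squares solution, $\hat{\boldsymbol{\beta}}$, of the linear system given in Equation (22). The matrix of weights connecting the hidden layer to the output layer, $\hat{\boldsymbol{\beta}}$, is calculated as follows:

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger} \mathbf{T} \tag{23}$$

where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of matrix $\mathbf{H}$.

Because the number of hidden neurons, $\tilde{N}$, and the number of distinct training samples, $N$, are usually unequal, $\mathbf{H}$ is rarely a square matrix. Therefore, the hidden-layer output matrix, $\mathbf{H}$, is generally not invertible, and so the Moore–Penrose generalized inverse is used to solve the system in Equation (22). Singular value decomposition (SVD) is the method used to compute the Moore–Penrose generalized inverse.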
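The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the uniform range of the random weights, the synthetic data, and the function names `hidden_layer`, `elm_train`, and `elm_predict` are all hypothetical choices. An output-layer bias is omitted here; one could be included by appending a column of ones to the hidden-layer output matrix.

```python
import numpy as np

def hidden_layer(X, W, b):
    """Hidden-layer output matrix H (N x n_hidden) with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden, rng):
    """Draw random hidden weights once, then solve H beta = T by pseudoinverse."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # never retrained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # never retrained
    H = hidden_layer(X, W, b)
    beta = np.linalg.pinv(H) @ T  # Moore-Penrose inverse, computed via SVD
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Linear output layer applied to the hidden-layer outputs."""
    return hidden_layer(X, W, b) @ beta

# Synthetic stand-in data: 25 samples, 8 inputs, 1 target (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 8))
T = np.tanh(X.sum(axis=1, keepdims=True))

W, b, beta = elm_train(X, T, n_hidden=10, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
print(f"training RMSE: {train_rmse:.3f}")
```

Because only `beta` is fitted, training reduces to a single linear least-squares solve, which is where the speed advantage over iterative training comes from.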
2.3. Recursive Least Squares
Recursive least squares (RLS) is used to update the ELM model parameters following each subsequent batch. It works in such a way that the parameters calculated from
N observations are used in the estimation of parameters for
N + 1 observations. This is more computationally efficient than simply recalculating the least squares after each observation from scratch [
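Consider a system of historical data based on $N$ pairs of input and output data, $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, N$, where $\mathbf{x}_i$ is a column vector of system input values, and $y_i$ is the scalar system output for the $i$th measurement. The system is modeled by the following linear model:

$$\mathbf{y} = \mathbf{X} \boldsymbol{\theta} \tag{24}$$

where $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^{\mathrm{T}}$, $\mathbf{y} = [y_1, \ldots, y_N]^{\mathrm{T}}$, and $\boldsymbol{\theta}$ is the vector of model parameters. The model parameters relating $\mathbf{X}$ and $\mathbf{y}$ can be estimated through least-squares estimation by minimizing the loss function as follows:

$$J = \sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \tag{25}$$

where $\varepsilon_i$ are the estimation errors, and $\hat{y}_i$ are the estimated values. The solution to the minimization is to find an estimation of the parameters, $\hat{\boldsymbol{\theta}}$, that satisfies the following equation:

$$\mathbf{X} \hat{\boldsymbol{\theta}} = \mathbf{y} \tag{26}$$

By left multiplying both sides of the above equation by $\mathbf{X}^{\mathrm{T}}$ and then left multiplying both sides by $(\mathbf{X}^{\mathrm{T}} \mathbf{X})^{-1}$, the least-squares solution is given by the following:

$$\hat{\boldsymbol{\theta}} = (\mathbf{X}^{\mathrm{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{y} \tag{27}$$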
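Upon obtaining an additional measurement of the system, $(\mathbf{x}_{N+1}, y_{N+1})$, the matrix $\mathbf{X}$ is expanded by a row, and the vector $\mathbf{y}$ is expanded by an element as follows:

$$\mathbf{X}_{N+1} = \begin{bmatrix} \mathbf{X}_N \\ \mathbf{x}_{N+1}^{\mathrm{T}} \end{bmatrix}, \quad \mathbf{y}_{N+1} = \begin{bmatrix} \mathbf{y}_N \\ y_{N+1} \end{bmatrix} \tag{28}$$

The model parameter estimation from Equation (27) can then be written as follows:

$$\hat{\boldsymbol{\theta}}_{N+1} = (\mathbf{X}_{N+1}^{\mathrm{T}} \mathbf{X}_{N+1})^{-1} \mathbf{X}_{N+1}^{\mathrm{T}} \mathbf{y}_{N+1}$$

Assuming the matrix $\mathbf{X}_N^{\mathrm{T}} \mathbf{X}_N$ is positive definite, the recursive least squares can be solved from the following equations: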
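A standard textbook form of this recursion is sketched here. It is an assumption rather than a reproduction of the paper's numbered equations, and the symbols $\mathbf{P}_N = (\mathbf{X}_N^{\mathrm{T}} \mathbf{X}_N)^{-1}$, $\mathbf{K}_{N+1}$ (gain vector), and $\varepsilon_{N+1}$ (prediction error) are introduced for illustration, with $\mathbf{x}_{N+1}$ and $y_{N+1}$ denoting the new input vector and output:

```latex
\begin{aligned}
\varepsilon_{N+1} &= y_{N+1} - \mathbf{x}_{N+1}^{\mathrm{T}} \hat{\boldsymbol{\theta}}_{N} \\
\mathbf{K}_{N+1} &= \frac{\mathbf{P}_{N} \mathbf{x}_{N+1}}{1 + \mathbf{x}_{N+1}^{\mathrm{T}} \mathbf{P}_{N} \mathbf{x}_{N+1}} \\
\hat{\boldsymbol{\theta}}_{N+1} &= \hat{\boldsymbol{\theta}}_{N} + \mathbf{K}_{N+1} \varepsilon_{N+1} \\
\mathbf{P}_{N+1} &= \mathbf{P}_{N} - \mathbf{K}_{N+1} \mathbf{x}_{N+1}^{\mathrm{T}} \mathbf{P}_{N}
\end{aligned}
```

In this form, no matrix inversion is needed after initialization, because the denominator of the gain is a scalar.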
For time-varying systems, it is necessary to introduce a forgetting factor, $\lambda$ ($0 < \lambda \le 1$), which works to reduce the influence of old data. A smaller forgetting factor will “forget” the old data faster. Using a forgetting factor, Equations (31) and (32) become
Equations (29), (30), (34) and (35) need to be updated after each subsequent batch. The new ELM parameters (output-layer weights and bias) are taken from $\hat{\boldsymbol{\theta}} = [\boldsymbol{\beta}^{\mathrm{T}}, b_2]^{\mathrm{T}}$, where $b_2$ is the bias for the output layer neuron.
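A single RLS step with a forgetting factor can be sketched as follows. This is a minimal illustration that assumes the common textbook recursion, in which the gain is $\mathbf{K} = \mathbf{P}\mathbf{x} / (\lambda + \mathbf{x}^{\mathrm{T}} \mathbf{P} \mathbf{x})$ and the matrix update is $\mathbf{P} \leftarrow (\mathbf{P} - \mathbf{K} \mathbf{x}^{\mathrm{T}} \mathbf{P}) / \lambda$; it is not necessarily the exact form used in the paper. The function name `rls_update` and the synthetic data are hypothetical. In the ELM setting, `x` would be the hidden-layer output vector for the latest batch (with a trailing 1 for the output bias) and `theta` the output-layer weights and bias.

```python
import numpy as np

def rls_update(theta, P, x, y, lam=1.0):
    """One RLS step for a new pair (x, y); lam is the forgetting factor, 0 < lam <= 1."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    err = y - (x.T @ theta).item()              # prediction error of the current model
    K = (P @ x) / (lam + (x.T @ P @ x).item())  # gain vector
    theta = theta + K * err                     # corrected parameters
    P = (P - K @ x.T @ P) / lam                 # updated inverse information matrix
    return theta, P

# Synthetic linear system (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.01 * rng.normal(size=30)

# Batch least squares on the first 10 samples initializes theta and P.
X0, y0 = X[:10], y[:10]
P = np.linalg.inv(X0.T @ X0)
theta = P @ X0.T @ y0.reshape(-1, 1)

# Fold in the remaining samples one at a time (lam = 1: no forgetting).
for x_new, y_new in zip(X[10:], y[10:]):
    theta, P = rls_update(theta, P, x_new, y_new, lam=1.0)

# With lam = 1, the recursive estimate should match batch least squares on all data.
theta_batch = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(theta.ravel(), theta_batch))
```

Choosing `lam` below 1 weights recent batches more heavily, which is the behavior the forgetting factor is introduced for.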
2.4. Batch-to-Batch Optimization Control
The aim of batch-to-batch optimization control is to maximize the amount of biomass at the end of the batch by adjusting the substrate feed profile. To carry out the optimization, an ELM model linking the batch feed profile and the final biomass concentration at the end of a batch is developed. The ELM model is of the following form:

$$y = f(U) \tag{36}$$

where $y$ is the final biomass concentration at the end of a batch; $U = (u_1, u_2, \ldots, u_8)$ is the feed profile for the batch; and $f(\cdot)$ is a nonlinear function represented by an ELM. Here, we divide the batch duration into 8 equal intervals, and the substrate feed rate is kept constant during each interval. Hence, the feed profile is a vector containing 8 elements.
For the calculation of the optimal feed profile for the current batch (the $k$th batch), $U_k$, the optimization uses the feed profile from the previous batch, $U_{k-1}$, as the initial search value. This feed profile is applied as the input to the ELM model in order to predict the final biomass concentration, which is used in a numerical optimization procedure. Therefore, the ELM model has 8 input neurons and one output neuron. The final volume is also calculated and multiplied by the predicted final biomass concentration to find the final amount of biomass. The optimization problem is solved using the MATLAB function “fmincon” from the MATLAB Optimization Toolbox and is outlined by Equation (37). Note that optimization tools typically perform minimization, so a negative sign is added to the predicted amount of the final biomass to convert the maximization problem into a minimization problem. The solution is subject to the constraints that the feed rate in each interval cannot exceed 2600 L h−1 and that the final volume in the reactor cannot exceed the maximum.
In the above equation, V0 is the initial volume of the content in the reactor at the start of a batch; V is the final volume of the content in the reactor at the end of a batch; U is the feed profile; and ts is the interval length (or sampling time).
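Written out from the description above, the optimization problem has the following general form. This is a sketch rather than a reproduction of Equation (37): the lower bound of zero on the feed rate and the symbol $V_{\max}$ for the maximum reactor volume are assumptions, and the conversion of the normalized ELM output back to a concentration is omitted.

```latex
\begin{aligned}
\min_{U} \quad & J = -f(U)\, V \\
\text{s.t.} \quad & V = V_0 + t_s \sum_{i=1}^{8} u_i \\
& 0 \le u_i \le 2600\ \mathrm{L\,h^{-1}}, \quad i = 1, \ldots, 8 \\
& V \le V_{\max}
\end{aligned}
```

Because the feed rate is constant within each interval, the final volume is a linear function of the feed profile, so the volume limit acts as a single linear inequality constraint on $U$.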
It should be noted that this optimization control strategy is not a conventional feedback control strategy in which a manipulated variable is continuously adjusted by a feedback controller. In this optimization control strategy, an ELM model of the batch process is used in an optimization framework to find the best feeding profile.
2.5. Batch-to-Batch Optimization Control Strategy Integrating ELM and RLS
In this section, the batch-to-batch optimizing control strategy integrating ELM and RLS for the fed-batch fermentation process is outlined. RLS updates the parameters of the ELM model, i.e., the output-layer weights and bias, by recursively solving the least-squares problem. This corrects process–model mismatches and compensates for the effects of unknown disturbances. The output-layer weights and bias of the ELM for the previous batch are used as the initial guess for the RLS procedure, and the parameters are then updated iteratively in the RLS algorithm. Combining ELM with RLS is beneficial on real plant systems because historical process data are often inaccurate or insufficient. The updated ELM model can then be used for the optimization of the next batch, which benefits from the model being up-to-date and accurate.
Figure 2 gives the framework of the proposed approach, and the details are explained below.
To address the situation in which only limited historical process operation data are available, we started with a small number of historical batches. A set of 25 feed profiles, each consisting of 8 feed intervals, was created. Each interval had a duration of 2 h, and the feed rate was kept constant throughout the interval. These feed profiles were generated by adding uniformly distributed random variations (up to ±50 L h−1) to a baseline profile to emulate how different operators might have different batch strategies based on their own experience. Figure 3 shows all 25 feed profiles used. Due to the large number of curves, legends have not been added to the figure. It can be seen from Figure 3 that these feed profiles have a generally increasing trend as the batch progresses. This is consistent with the general feed profiles for this process reported in previous studies [30]. Each feed profile was simulated in the fed-batch fermentation mechanistic model to find the corresponding final biomass concentration. To represent real processes in which measurement noise always exists, normally distributed noise with zero mean and a standard deviation of 0.1 g L−1 was added to the final biomass concentrations. The data in this study were generated through simulation. In practical applications, the substrate feed rate can be measured using a flow meter, and the biomass concentration at the end of a batch can be measured either on-line or through laboratory analysis.
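The data-generation step can be illustrated with the short sketch below. The baseline profile values and the function `toy_final_biomass` are hypothetical placeholders: the paper's baseline is not given in this section, and the placeholder function merely stands in for the mechanistic fermentation simulation so that the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(1)
n_batches, n_intervals = 25, 8

# Hypothetical increasing baseline feed profile in L/h (not the paper's values).
baseline = np.linspace(500.0, 1900.0, n_intervals)

# Uniform random variations of up to +/-50 L/h around the baseline.
profiles = baseline + rng.uniform(-50.0, 50.0, size=(n_batches, n_intervals))

def toy_final_biomass(profile):
    """Placeholder for the mechanistic simulation; NOT the fermentation model."""
    total = profile.sum()
    return 20.0 + 0.01 * total - 1e-7 * total ** 2

clean = np.array([toy_final_biomass(p) for p in profiles])

# Measurement noise: zero mean, standard deviation 0.1 g/L.
noisy = clean + rng.normal(0.0, 0.1, size=n_batches)

print(profiles.shape, noisy.shape)
```

In an actual study, `toy_final_biomass` would be replaced by a call to the process simulation or by measured plant data.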
The historical batch data were normalized by scaling to zero mean and unit variance. They were then randomly partitioned into training, validation, and unseen testing data sets. The number of hidden neurons in the ELM model was determined using cross validation: the ELM model was trained on the training data set with different numbers of hidden neurons, in the range of 1 to 60 in this study, and the number of hidden neurons that resulted in a low combined root-mean-square error (RMSE) on the training and validation data was selected. Figure 4 shows how the modeling error changed with the number of hidden neurons. Based on Figure 4, the final selected ELM model had 35 hidden neurons. Note that, if the selected number of hidden neurons is close to 60 (the upper end of the range), then the range should be extended. A plot of the predicted against the actual final biomass concentration, showing the performance of the trained ELM model on all 3 data sets, is given in Figure 5. Note that the values shown in the plot are normalized values.
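The selection of the number of hidden neurons can be sketched as a simple loop. Several details here are assumptions made for illustration: the data are synthetic, the 15/5/5 split is hypothetical, the activation is tanh, and the "combined" RMSE is taken as the sum of the training and validation RMSE values, which is one plausible reading of the criterion described above.

```python
import numpy as np

def hidden(X, W, b):
    """Hidden-layer outputs with a tanh activation (assumed)."""
    return np.tanh(X @ W + b)

def fit_elm(X, y, n_hidden, rng):
    """Random hidden layer, output weights from the pseudoinverse."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    beta = np.linalg.pinv(hidden(X, W, b)) @ y
    return W, b, beta

def rmse(model, X, y):
    W, b, beta = model
    return float(np.sqrt(np.mean((hidden(X, W, b) @ beta - y) ** 2)))

# Synthetic stand-in for 25 historical batches with 8 feed intervals each.
rng = np.random.default_rng(2)
U = rng.uniform(400.0, 2000.0, size=(25, 8))
y = 1e-3 * U.sum(axis=1) + rng.normal(0.0, 0.1, size=25)

# Scale to zero mean and unit variance, then split at random.
U_s = (U - U.mean(axis=0)) / U.std(axis=0)
y_s = (y - y.mean()) / y.std()
idx = rng.permutation(len(U_s))
train, val, test = idx[:15], idx[15:20], idx[20:]  # hypothetical 15/5/5 split

# Combined training + validation RMSE for 1 to 60 hidden neurons.
scores = {}
for n_hidden in range(1, 61):
    model = fit_elm(U_s[train], y_s[train], n_hidden, rng)
    scores[n_hidden] = rmse(model, U_s[train], y_s[train]) + rmse(model, U_s[val], y_s[val])

best = min(scores, key=scores.get)
print("selected hidden neurons:", best)
```

The `test` indices are deliberately left out of the selection loop so that they remain unseen data for the final model check.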
A given batch feed profile was run in both the fed-batch simulation and the trained ELM model. In order for the feed profile to run in the fed-batch simulation, it was scaled back to the original scale. The prediction error of the ELM model was used in the RLS updating algorithm, i.e., Equations (29), (30), (34) and (35), to update the output-layer weights and bias of the ELM model. The updated model was then used in the optimization procedure to find an optimal feed profile for the next batch. The optimization yields a new feed profile that provides the maximum amount of the final biomass that the current model can achieve, subject to the constraints of the optimization. The process was repeated from batch to batch, using the new optimized feed profile with each batch. The reason for continuously updating the model is that it can then handle unknown disturbances. Also, in the case of limited historical batches for training the model, continuously updating the model will help it to adapt and improve its generalization capabilities.
As ELM models can be trained quickly when the amount of data is not large, redeveloping the model after each batch is feasible in that case. However, when the amount of data is large, as in practical situations in which a batch process can run for years, redeveloping an ELM model after each batch is costly. This issue can be addressed by recursively updating the ELM model from batch to batch using RLS.