*Article* **Metabolic Flux Analysis of VERO Cells under Various Culture Conditions**

**Georges Bastin 1,\*, Véronique Chotteau <sup>2</sup> and Alain Vande Wouwer <sup>3</sup>**

	- alain.vandewouwer@umons.ac.be

**Abstract:** Although the culture of VERO cells in bioreactors is an important industrial bioprocess for the production of viruses and vaccines, surprisingly few reports on the analysis of the flux distribution in the cell metabolism have been published. In this study, an attempt is made to fill this gap by providing an analysis of relatively simple metabolic networks, which are constructed to describe the cell behavior in different culture conditions, e.g., the exponential growth phase (availability of glucose and glutamine), cell growth without glutamine, and cell growth without glucose and glutamine. The metabolic networks are kept as simple as possible in order to avoid underdeterminacy linked to the lack of extracellular measurements, and a unique flux distribution is computed in each case based on a mild assumption that the macromolecular composition of the cell is known. The result of this computation provides some insight into the metabolic changes triggered by the culture conditions, which could support the design of feedback control strategies in fed batch or perfusion bioreactors where the lactate concentration is measured online and regulated by controlling the delivery rates of glucose and, possibly, of some essential amino acids.

**Keywords:** metabolic flux analysis; metabolic network; VERO cells; biotechnology

#### **1. Introduction**

The production of biopharmaceuticals using cultures of genetically modified strains has gained tremendous importance in the drug manufacturing sector. In this context, it is important to understand and assess the influence of the culture conditions, and the impact of metabolic engineering, on the yield of the products of interest. This can be achieved through an analysis of the flux distribution inside the metabolic network of the cells or microorganisms under consideration. Various computational procedures have been proposed for that purpose, including metabolic flux analysis and flux balance analysis [1].

Even though there has been a significant number of reports of the application of these procedures to cultures of CHO cells and hybridoma cells (e.g., [2–8]), there has been surprisingly few reports focusing on the metabolism of VERO cell cultures [9]. However VERO cells are important vectors for the production of viruses (and vaccines) (e.g., [10–18]).

The objective of this study is to apply metabolic flux analysis to small metabolic networks of VERO cells, on the basis of experimental data collected in three different culture conditions. In each case, the network is designed to be fully compatible with the data while being kept as simple as possible to avoid the underdeterminacy that usually prevails when manipulating large metabolic networks. In this study, the considered metabolic networks allow keeping the underdeterminacy at a minimum, and to compute a unique solution based on the only additional mild assumption that the macromolecular (proteins, nucleic acids, membrane lipids) composition of the cell is as reported in the literature [19] (p. 113, Table 7.1).

**Citation:** Bastin, G.; Chotteau, V.; Vande Wouwer, A. Metabolic Flux Analysis of VERO Cells under Various Culture Conditions. *Processes* **2021**, *9*, 2097. https://doi.org/ 10.3390/pr9122097

Academic Editor: Florian M. Wurm

Received: 10 October 2021 Accepted: 15 November 2021 Published: 23 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The paper is organized as follows. The next section presents the experimental data, cell densities, and metabolite concentrations, collected in two batch cultures of VERO cells. From these data, depending on the availability of glucose and glutamine, three types of culture growth are distinguished. The metabolic network and its analysis are then detailed and discussed in Section 3 for the exponential growth phase with glucose and glutamine as carbon and nitrogen sources, respectively. Next, in Section 4, we consider the case where glutamine is replaced by glutamate as the source of nitrogen. Finally, the case where lactate becomes the source of carbon instead of glucose is addressed in Section 5. Final conclusions are presented in the last section.

#### **2. Experimental Data**

In this paper, we use data from two batch cultures of VERO cells, labeled (a1) and (a2), which were simultaneously carried out over a period of eight days in parallel spinner-flasks, using the same culture medium. In particular, the two cultures were seeded from the same pool of cells. The only difference between the two cultures lies in the initial concentration of glucose. A full description of the materials and methods of these experiments can be found in [20] (Section 3).

The experiments were performed in spinner-flasks (paddle impeller type). The culture volume was 250–270 mL. VERO cells (passage 136–146) were grown adherently on Cytodex 1 microcarriers (3.5 g/L). The spinner-flasks were inoculated with approximately 105 cells/mL, which corresponded to eight cells per microcarrier on average. The basic culture medium was M199 supplemented at inoculation with fetal calf serum (10% *v*/*v*) and antibiotic (neomycin sulfate 5% *v*/*v*). The culture was magnetically stirred at 45 RPM. The oxygen supply was provided by transfer via the head space. The atmosphere of the head space was renewed twice a day.

The culture medium was sampled (2 mL) twice a day for analysis (except on day 5, where there was only one sample). Cell counting was done with a hemacytometer using crystal violet staining. Glucose and lactate concentrations were determined with a Yellow–Springer analyzer. Amino acids and ammonia were determined with the HPLC method.

The data collected during these cultures are presented in various figures hereafter. The time evolution of cell densities (counting) is shown in Figure 1. We can readily observe two different successive phases in both cultures. The growth begins with a classical exponential phase during the first four days. Then, from the fourth to the eighth days, there was a shift to a slower quasi-linear growth. An explanation for this behavior can be found in Figure 2 where the time evolution of the concentrations of glucose and glutamine in the culture medium are shown. Indeed, it can be seen that, after the fourth day, both glucose and glutamine are depleted in culture (a1). However the growth proceeds more slowly, with lactate as the carbon source, while alanine and glutamate are alternative nitrogen sources in the central metabolism (see Figure 3). In contrast, culture (a2) is operated with an excess of glucose, so that only glutamine is depleted on the fourth day and replaced by glutamate as a source of nitrogen. These three different situations of the culture conditions are summarized in the Table 1 below. Our purpose, in this paper, is to perform a metabolic flux analysis in order to compute and compare the distributions of the intracellular metabolic fluxes in these three situations.

**Table 1.** Three different culture conditions.


**Figure 1.** Time evolution of cell density in cultures a1 (**left**) and a2 (**right**).

**Figure 2.** Time evolution of substrates and products of the central metabolism in cultures a1 and a2.

**Figure 3.** Time evolution of non-essential amino acids in cultures a1 and a2.

#### *Data of Amino Acids*

Concentration measurements of eighteen amino acids in the culture medium were measured with an HPLC method. In Figure 4, we present the data for nine essential amino acids, which are naturally consumed in correlation with the cell growth: arginine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, tryptophan, and valine. Data are not available for threonine. It can be seen that the shape of the consumption is quite similar for both cultures, (a1) and (a2), with only marginal deviations for leucine and isoleucine.

Moreover, in addition to glutamine in Figure 2, we present in Figure 3 the measurements for eight other non-essential amino acids: alanine, aspartate, cysteine, glutamate, glycine, proline, serine, and tyrosine. It can be seen that these measurements do not always follow the shape of the cellular growth. In particular, glutamate and alanine are accumulated in the culture medium during the exponential growth, but are significantly consumed when glutamine is depleted. We also note that the medium does not contain asparagine at the start of the culture and that asparagine data are not available throughout the culture.

In Tables 2–4, the specific uptake and/or excretion rates of all the species measured in the culture medium are given for the three considered culture conditions (Tables 2 and 4 are complementary and when a rate does not appear in one table (symbol –––––) it appears in the other one). These rates are estimated from the slopes of the solid curves that fit the experimental data in Figures 2–4, at time *t* = 2.90 days for the exponential growth and at time *t* = 6.64 days for the growth without glucose and/or glutamine.


**Table 2.** Specific uptake rates (μM/d <sup>×</sup> <sup>10</sup><sup>7</sup> cell) Glucose, Lactate and non-essential AA.

**Table 3.** Specific uptake rates (μM/d <sup>×</sup> <sup>10</sup><sup>7</sup> cell) for essential AA.


**Table 4.** Specific excretion rates (μM/d <sup>×</sup> <sup>10</sup><sup>7</sup> cell) Lactate, NH3 and non-essential AA.


**Figure 4.** Time evolution of essential amino acids in cultures a1 and a2.

#### **3. Metabolic Flux Analysis of the Exponential Growth Phase**

#### *3.1. Metabolic Network*

The metabolic network considered for the exponential growth is made up of all the biochemical reactions in Figures 5–7. The main motivations behind the set-up of this network are given in the present section.

#### 3.1.1. Central Metabolism

For the growth of mammalian cells, the central metabolism involves glycolysis, TCA cycle, and glutaminolysis, as represented in Figure 5. For simplicity, the pentose phosphate pathway is neglected.

**Figure 5.** Exponential growth: central metabolism involving glycolysis, TCA, and glutaminolysis, nucleotide synthesis, and metabolism of alanine, asparagine, aspartate, histidine, glycine, and serine.

**Figure 6.** Metabolic network for seven essential amino acids (isoleucine, leucine, lysine, methionine, phenylalanine, tryptophan, valine) and two non-essential amino acids (cysteine, tyrosine).

**Figure 7.** Metabolic network for arginine (essential) and proline (non-essential). (**a**) Exponential growth; (**b**) growth without glutamine.

3.1.2. Synthesis of Proteins

Essential amino acids cannot be synthesized and must be provided in the culture medium. Therefore, the maximum possible production rate of proteins is determined by the essential amino acid with the lowest ratio between its external uptake rate (from Table 3) and its frequency in protein composition as given in Table 5.


**Table 5.** Specific consumption rates of amino acids (AA) for protein production [μM/d <sup>×</sup> <sup>10</sup><sup>7</sup> cell]. (Essential AA are in bold).

**<sup>1</sup>**Average of frequencies given in [3,21,22].

In the case of exponential growth, among all essential measured amino acids, phenylalanine is the one with this lower ratio. Hence, assuming a maximization of the biomass production, we suppose that phenylalanine is exclusively used for protein production. Therefore the protein production flux from phenylalanine *wPhe* must be equal to the external uptake rate given in Table 3, i.e., *wPhe* <sup>=</sup> *vPhe* = 0.228 <sup>μ</sup>M/d <sup>×</sup> 107 cell. On this basis, we can then compute the contribution of each amino acid to the production rate of proteins given in Table 5 with the formula:

$$w\_{AA} = w\_{\text{Phc}} \frac{f\_{AA}}{f\_{\text{Phc}}}.\tag{1}$$

where *wAA* is a specific intracellular consumption rate of amino acid *AA* for protein production, *fAA* is the frequency of amino acid *AA* in the protein composition (and *AA* = *Phe* for phenylalanine).

#### 3.1.3. Synthesis of Nucleotides

The synthesis of nucleotides is represented by the following standard overall biochemical reactions:

1 Ribose-5-P + 2 Glutamine + 1 Aspartate + 1 Glycine + 1 CO2 + 5 ATP −→ 2 Glutamate + 1 Fumarate + 4 ADP + 1 AMP + 1 Purine

1 Ribose-5-P + 1 Glutamine + 1 Aspartate + 2 ATP

−→ 1 Glutamate + 2 ADP + 1 Pyrimidine

Furthermore, we assume that DNA and RNA are made up with, approximately, equal shares of purine and pyrimidine nucleotides. It results that, omitting the co-factors ATP, ADP and AMP, nucleotide synthesis is represented in the network of Figure 5 by the single overall reaction:

2 Ribose-5-P + 3 Glutamine + 2 Aspartate + 1 Glycine + 1 CO2 −→ 3 Glutamate + 1 Fumarate + 2 Nucleotides. (2)

#### 3.1.4. Catabolism of Essential Amino Acids

Only catabolic pathways must be considered for essential amino acids since they cannot be synthesized in the cell. We adopt the standard catabolic reactions represented in Figures 6 and 7. It can be verified that this representation is fully consistent with the available data because we have 0 < *wAA vAA* for all essential amino acids (with *vAA* from Table 3 and *wAA* from Table 5).

#### 3.1.5. Metabolism of Non-Essential Amino Acids

For non-essential amino acids, both catabolic and anabolic pathways can be taken into account. From the data of Tables 2, 4 and 5, it appears that the metabolism of non-essential AA may strongly depend on the culture conditions.

In the phase of exponential growth, anabolic pathways must be considered for alanine and glutamate because, as seen in Table 4, they are excreted in the culture medium and, therefore, produced inside the cell at a level that is widely in excess, with respect to the amount needed for protein production. The metabolism of alanine and glutamate is represented in Figure 5.

Moreover, from the data of Table 2, it appears that the uptake rates of extracellular aspartate, glycine, and serine are not sufficient to reach the protein production level given in Table 5, and that an intracellular synthesis must be provided, too. The metabolism of these amino acids is also represented in Figure 5. In Figure 5, a synthesis pathway is provided for asparagine together with an excretion. This assumption will be motivated in the next section.

In contrast, again from Table 2, we see that the uptake rate of cysteine, proline, and tyrosine is large enough for protein production, and that an additional catabolic pathway is needed. The catabolic reactions are represented in Figure 6 for cysteine and tyrosine, and in Figure 7a for proline.

#### 3.1.6. Synthesis of Lipids

Finally, in addition to nucleotides and amino acids, we consider the lipids as the last fundamental building blocks of the biomass, in order to set up a model that is consistent, in terms of mass balance with a sufficient accuracy. For simplicity, we assume however that acetyl-CoA is the only significant contributor to the molecular mass of lipids, with a rate denoted *wLip* as shown in Figure 5. Indeed, the other necessary precursors of membrane lipids (e.g., serine, choline, ethanolamine, or dihydroxyacetone phosphate) are used in such low proportions that they can be neglected without significant loss of accuracy.

#### *3.2. Metabolic Flux Analysis*

#### 3.2.1. Balance Equations

The metabolic fluxes satisfy the following set of balance equations.



#### 3.2.2. Computation of Metabolic Fluxes

The above 39 × 41 system of linear equations is underdetermined. One of the reasons for this indeterminacy is that asparagine data are not available and the rate *vAsn* of asparagine transfer between the cell and the external culture medium is unknown. In order to get a unique solution, we introduce the additional constraint that the macromolecular (proteins, nucleic acids, membrane lipids) composition is as reported, e.g., in [19] (p. 113, Table 7.1) for mammalian cells. From this reference, the mass of proteins is roughly 12-fold larger than the mass of nucleic acids in mammalian cells. Furthermore, we know that the average mass of a nucleotide is about three-fold larger than the average mass of an amino acid. Using molar units as we do in this paper, it follows that the sum ∑ *wAA* (see Table 5) of amino acid rates for protein production must be approximately 36-fold larger than the nucleotide production rate and, consequently, 72-fold larger that the rate *v*<sup>39</sup> of reaction (2), i.e.,

$$\frac{\sum w\_{AA}}{v\_{\Im 9}} \approx 72.\tag{3}$$

Similarly, the mass of protein is roughly 3.5-fold larger than the mass of membrane lipids, while the average mass of a phospholipid is about 7-fold larger than the average mass of an amino acid. Then, since the production of one mole of phospholipids consumes about 18 moles of acetyl-CoA, we deduce that we have approximately:

$$\frac{\sum w\_{AA}}{w\_{Lip}} \approx \frac{3.5 \times 7}{18} = 1.36,\tag{4}$$

where *wLip* denotes the consumption rate of acetyl-CoA for lipid synthesis (see Figure 5).

Under these additional constraints (3) and (4), the system of equations is determined and has the following solution:


These results can be summed up as follows:

Assuming that


Then


#### **4. Metabolic Flux Analysis for the Growth without Glutamine**

*4.1. Metabolic Network*

4.1.1. Central Metabolism and Nucleotide Synthesis

As explained in Section 1, we now consider a case where there is no glutamine in the culture medium while glucose is in excess and not limiting. Glutamate (and aspartate to a lesser extent) become the main nitrogen sources. Obviously, in that case, glutamine must be synthesized inside the cells. The central metabolism is therefore slightly modified as represented in Figure 8 with a pathway for the synthesis of glutamine from glutamate.

Moreover, as shown in Figure 8, the pathway to nucleotide synthesis is assumed to be identical to that of exponential growth (see Section 3.1.3).

**Figure 8.** Growth without glutamine: central metabolism involving glycolysis, TCA and glutamine synthesis, nucleotide synthesis and metabolism of alanine, asparagine, aspartate, histidine, glycine, and serine.

#### 4.1.2. *Synthesis of Proteins*

In this phase of growth without glutamine, histidine turns out to be the essential amino acid with the lowest ratio between its external uptake rate (from Table 3) and its frequency in protein composition (from Table 5). If we assume (as in the previous section) that the cell growth is maximized and that histidine is exclusively used for protein formation, then the total protein production rate can be estimated as

$$\left(\sum w\_{AA}\right)\_{\text{wg}} \approx \frac{v\_{H\text{is}}}{f\_{H\text{is}}} = \frac{0.045}{0.024} = 1.87 \,\frac{\mu\text{M}}{\text{d} \times 10^7 \text{ cells}}\tag{5}$$

where the values of *vHis* and *fHis* are taken from Tables 3 and 5, respectively. We can then compute the contribution of each amino acid to the production rate of proteins. The result is given in Table 5.

From the available experimental data, the assumption of cell growth maximization could however be questionable when glutamine is depleted. The reason is that, as it can be seen in Figure 4, the net uptake rate of essential amino acids is similar in magnitude to the exponential growth while the specific cell growth rate is much smaller. An alternative natural assumption is to suppose that the rate of protein production is proportional to the rate of cell growth. Under this assumption, we have another manner to compute a plausible estimate of the protein production rate, as follows:

$$\left(\sum w\_{AA}\right)\_{\text{wg}} \approx \frac{\mu\_{\text{wg}}}{\mu\_{\text{cg}}} \left(\sum w\_{AA}\right)\_{\text{cg}} = \frac{0.18}{0.6} \times 6.159 = 1.85 \,\frac{\mu\text{M}}{\text{d} \times 10^7 \text{ cells}}.\tag{6}$$

In Equations (5) and (6), the subscripts '*wg*' and '*eg*' refer to the growth without glutamine and to the exponential growth, respectively. The values of the growth rates *μeg* and *μwg* are taken from Table 1.

It is remarkable that these two estimations of the protein production rate are almost equal, although they are obtained under totally different assumptions. In our viewpoint, this certainly gives a strong consistency to the validity of our experimental data and to the relevance of the assumption of growth maximization, which appears to be very plausible, not only for the exponential growth with non-limiting glucose and glutamine resources, but also in the case of growth with glutamine limitation. It will be seen, in the next section, that the situation is very different for the culture without glucose.

#### 4.1.3. Metabolism of Amino Acids

For the catabolism of essential amino acids, we adopt the same standard catabolic reactions represented in Figures 6 and 7.

Moreover anabolic pathways must be provided for glycine and proline, which are excreted into the extracellular medium under the current conditions (see Table 2). The metabolic pathways are given in Figure 7b for proline and in Figure 8 for glycine. Note that the difference between Figure 7a,b lies in the inversion of fluxes *v*17, *v*<sup>18</sup> and *vPro*.

Finally, catabolic pathways are used for aspartate, cysteine, serine, and tyrosine because the uptake rate from the culture medium is larger than their contribution to the flux in protein production given in Table 5. The metabolism of these amino acids is represented in Figure 6 for cysteine and tyrosine, and in Figure 8 for aspartate and serine.

#### *4.2. Metabolic Flux Analysis*

In this case of growth without glutamine, the metabolic fluxes satisfy the following set of balance equations.



Under conditions (3) and (4), this system of linear equations is determined and has the following solution.


We can conclude, as above, that the flux balance analysis based on the considered metabolic network allows computing the entire intracellular flux distribution from the measured extracellular uptake and excretion rates of Tables 2–4.

In this case, closing the overall flux balance requires that the non-essential amino acids that are significantly excreted be asparagine (0.47 <sup>μ</sup>M/d <sup>×</sup> 107 cells) and proline (0.47 <sup>μ</sup>M/d <sup>×</sup> <sup>10</sup><sup>7</sup> cells).

#### **5. Metabolic Flux Analysis for the Growth without Glucose and Glutamine**

*5.1. Metabolic Network*

5.1.1. Central Metabolism and Nucleotide Synthesis

We now consider the case where there is neither glucose nor glutamine in the culture, and where lactate, glutamate, and alanine are the main carbon and nitrogen sources for the central metabolism. This is represented by a metabolic network, shown in Figure 9, which involves a gluconeogenesis pathway for the synthesis of glucose-6-phosphate and a pathway for the synthesis of glutamine from alanine and glutamate. Moreover, the nucleotide synthesis pathway remains unchanged.

**Figure 9.** Growth without glucose and glutamine: central metabolism involving glyconeogenesis, TCA, and glutamine synthesis, nucleotide synthesis and metabolism of alanine, asparagine, aspartate, histidine, glycine, and serine.

#### 5.1.2. *Synthesis of Proteins*

In this phase of growth without glucose and glutamine, histidine is the essential amino acid with the lowest ratio between its external uptake rate (from Table 3) and its frequency in protein composition (from Table 5). From the assumption that the protein production rate is proportional to the cell growth rate, it appears however that maximization of cell growth is not applicable. Indeed, using equations of the form (5) and (6), we have here:

$$\left(\sum w\_{AA}\right)\_{\text{wgg}} \approx \frac{\mu\_{\text{wgg}}}{\mu\_{\text{cg}}} \left(\sum w\_{AA}\right)\_{\text{cg}} = \frac{0.04}{0.6} \times 6.159 = 0.41 \,\frac{\mu\text{M}}{\text{d} \times 10^7 \text{cells}}\tag{7}$$

$$\ll \frac{v\_{His}}{f\_{His}} = \frac{0.051}{0.024} = 2.12 \,\frac{\mu\text{M}}{\text{d} \times 10^7 \text{cells}}.\tag{8}$$

In Equation (7), the subscript '*wgg*' refers to growth without glucose and glutamine. It follows clearly from (7) and (8) that the actual protein production rate must be much smaller than the maximal rate that could be reached from the measured amino acid uptakes. Hence, the contributions of the amino acids are computed, in Table 5, by using the value ∑ *wAA* = 0.41 from Equation (7).

#### 5.1.3. Metabolism of Amino Acids

For the catabolism of essential amino acids, we adopt the same standard catabolic reactions represented in Figures 6 and 7.

Moreover, anabolic pathways are provided for glycine, serine, and proline, which are excreted into the extracellular medium under the current conditions (see Table 2). The metabolic pathways for glycine and serine are given in Figure 9. The metabolism of proline is represented in Figure 7b.

Furthermore, catabolic pathways are used for aspartate, cysteine, and tyrosine, because the uptake rate from the culture medium is larger that their contribution to the flux in protein production. The metabolism of these amino acids is represented in Figure 9 for aspartate and in Figure 6 for cysteine and tyrosine.

#### *5.2. Metabolic Flux Analysis*

**Internal Metabolite Flux Balance Equation** Glucose-6-P *v*<sup>1</sup> − *v*<sup>20</sup> = 0 Glyceraldehyde-3-P −*v*<sup>1</sup> − *v*<sup>2</sup> + *v*<sup>3</sup> = 0 Dihydroxyacetone P −*v*<sup>1</sup> + *v*<sup>2</sup> = 0 3-Phosphoglycerate −*v*<sup>3</sup> + *v*<sup>4</sup> − *v*<sup>23</sup> = 0 Pyruvate *v*<sup>5</sup> + *v*<sup>13</sup> − *v*<sup>27</sup> − *v*<sup>37</sup> = *vLac* = 10.218 Acetyl-coA *v*<sup>5</sup> − *v*<sup>6</sup> + 2*v*<sup>28</sup> + *v*<sup>31</sup> + *v*<sup>32</sup> = 0 Citrate *v*<sup>6</sup> − *v*<sup>7</sup> = 0 *α*-ketoglutarate *v*<sup>7</sup> − *v*<sup>8</sup> + *v*<sup>15</sup> − *v*<sup>19</sup> + *v*<sup>22</sup> + *v*<sup>23</sup> − *v*<sup>30</sup> − *v*<sup>31</sup> − *v*<sup>34</sup> + *v*<sup>37</sup> − <sup>2</sup>*v*<sup>38</sup> = <sup>0</sup> Succinyl-CoA *v*<sup>8</sup> − *v*<sup>9</sup> − *v*<sup>28</sup> + *v*<sup>35</sup> = 0 Succinate *v*<sup>9</sup> − *v*<sup>10</sup> + *v*<sup>28</sup> = 0 Fumarate *v*<sup>10</sup> − *v*<sup>11</sup> + *v*<sup>34</sup> + *v*<sup>39</sup> = 0 Malate *v*<sup>11</sup> − *v*<sup>12</sup> = 0 Oxaloacetate −*v*<sup>4</sup> − *v*<sup>6</sup> + *v*<sup>12</sup> + *v*<sup>13</sup> − *v*<sup>22</sup> = 0 Glutamate-5-semialdehyde *v*<sup>17</sup> − *v*<sup>18</sup> + *v*<sup>19</sup> = 0 *α*-ketobutyrate *v*<sup>26</sup> − *v*<sup>29</sup> = 0 Propionyl-CoA *v*<sup>29</sup> + *v*<sup>30</sup> + *v*<sup>31</sup> − *v*<sup>35</sup> = 0 *α*-ketoadipate −2*v*<sup>21</sup> + *v*<sup>36</sup> + *v*<sup>38</sup> = 0 Acetoacetate *v*<sup>21</sup> − *v*<sup>28</sup> + *v*<sup>32</sup> + *v*<sup>34</sup> = 0 Ribose-5-P *v*<sup>20</sup> − 2*v*<sup>39</sup> = 0

In this case of growth without glucose and without glutamine, the metabolic fluxes satisfy the following set of balance equations.


Under conditions (3) and (4), this system of equations is determined and has the following solution.


Here, the excreted amino acids are proline (0.86 <sup>μ</sup>M/d <sup>×</sup> <sup>10</sup><sup>7</sup> cells), asparagine (0.82 <sup>μ</sup>M/d <sup>×</sup> <sup>10</sup><sup>7</sup> cells), and serine (0.62 <sup>μ</sup>M/d <sup>×</sup> <sup>10</sup><sup>7</sup> cells).

#### **6. Final Remarks and Conclusions**

In this paper, we applied metabolic flux analysis to investigate the behavior of VERO cells in three different culture conditions.

As long as glucose is not limiting, this analysis supports the validity of a maximum growth hypothesis in which the amino acids are primarily used as building blocks for the formation of the biomass, even in case of glutamine deprivation. In the latter case, glutamine is replaced by glutamate as the nitrogen source and it can be observed that the biomass yield is even slightly higher (while the productivity is lower).

When glucose is exhausted, the cell growth does not stop, but continues at a smaller rate with the consumption of lactate as an alternative source of carbon, while using only a small part (about 20%) of available amino acids for biomass synthesis. As represented in the network of Figure 9, lactate is reintroduced into the cell, transformed into pyruvate, and integrated in the TCA cycle in order to provide a part of the required energy, which is no longer given by glycolysis. The rest of the energy is provided by the degradation of that part of amino acids, which are not used as building blocks for the biomass synthesis.

This analysis provides a metabolic foundation for the design of feedback control strategies in fed batch or perfusion bioreactors where the lactate concentration is measured online and regulated by controlling the delivery rates of glucose and, possibly, of some essential amino acids. Applications of such control strategies to CHO cells are discussed, e.g., in [23,24], while, to our knowledge, applications to VERO cell cultures have not been reported in the literature.

Let us finally mention that, in our metabolic analysis, one amino acid, namely *threonine*, was omitted from the model because experimental measurements are (unfortunately) missing. Obviously, this is equivalent to implicitly assume that, in the considered experiments, threonine, which is an essential amino acid, is consumed at the rate required for protein synthesis as given in Table 5. We can however confirm that this assumption is quite plausible on the basis of threonine data, which were obtained for the same cell line grown in similar conditions, but with a slightly different fetal serum (bovine instead of calf), as reported in [20], Chapter 3. This means that including threonine catabolism in the model, if actual data were available, should not significantly alter our results.

**Author Contributions:** Conceptualization, G.B., V.C. and A.V.W.; methodology, G.B., V.C. and A.V.W.; data curation and visualization, G.B. and V.C.; software, G.B.; investigation, G.B., V.C. and A.V.W.; writing—original draft preparation, G.B.; writing—review and editing, G.B. and A.V.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** All the data used in this study are available in the graphs and tables of this article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Set Membership Estimation with Dynamic Flux Balance Models**

**Xin Shen and Hector Budman \***

Department of Chemical Engineering, University of Waterloo, 200 University Ave W, Waterloo, ON N2L3G1, Canada; xin.shen@uwaterloo.ca **\*** Correspondence: hbudman@uwaterloo.ca

**Abstract:** Dynamic flux balance models (DFBM) are used in this study to infer metabolite concentrations that are difficult to measure online. The concentrations are estimated based on few available measurements. To account for uncertainty in initial conditions the DFBM is converted into a variable structure system based on a multiparametric linear programming (mpLP) where different regions of the state space are described by correspondingly different state space models. Using this variable structure system, a special set membership-based estimation approach is proposed to estimate unmeasured concentrations from few available measurements. For unobservable concentrations, upper and lower bounds are estimated. The proposed set membership estimation was applied to batch fermentation of *E. coli* based on DFBM.

**Keywords:** set membership estimation; dynamic flux balance model; multiparametric programming; observability; variable structure system

#### **1. Introduction**

The increasing demand of bio-pharmaceutical products requires continuous improvement in monitoring and control strategies for the fermentation processes. Model-based control and optimization strategies are crucial to boost productivity. Unlike traditional unstructured biochemical models, dynamic flux balance models (DFBM) have gained increasing attention since they contain more detailed information about the distribution of metabolic fluxes [1,2]. The strength of DFBM relies on their use of stoichiometric information about the cell metabolic network. The use of this information often results in models that require a smaller number of parameters as compared to another type of modelling approaches and thus are less prone to over-fitting. However, regardless of the choice of model, monitoring and control of industrial fermentation processes remains challenging because feedback control strategies require many states to be measured online. In reality, most states cannot be measured online either due to the expense of measuring equipment and its maintenance or the lack of online measurement devices [3–5]. Some states, including concentration of amino acids, metals, vitamins, ATP and precursors have great effect on the fermentation process but are either difficult or impossible to measure online.

To address the lack of online measurements, soft sensors have been proposed. Soft sensors are algorithms that estimate the values of the states based on few available online measurements. Data-driven soft sensors are currently very popular, driven by the interest in the artificial intelligence research area. Reported data-driven soft sensors are generally based on artificial neural networks, support vector machines, partial least squares, and fuzzy inference [4]. However, despite their popularity, the main drawback of data-driven soft sensors is their limited applicability to the region of data used for model training and the scarcity of data available for calibration [6]. Moreover, the lack of mechanistic information of these black box models introduces concerns about the safety and reliability of the controllers designed based on these models [6].

Another category of soft sensors are state observers based on mechanistic models such as a Luenberger observer, Kalman filter, and particle filter. These state observers estimate

**Citation:** Shen, X.; Budman, H. Set Membership Estimation with Dynamic Flux Balance Models. *Processes* **2021**, *9*, 1762. https://doi.org/10.3390/pr9101762

Academic Editors: Philippe Bogaerts and Alain Vande Wouwer

Received: 25 August 2021 Accepted: 26 September 2021 Published: 1 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the values of some states based on convergence of state prediction errors and provided that sufficient measurements are available [7]. A key prerequisite of theses state observer designs is that some observability condition is satisfied with respect to the estimated states. It will be shown later in the manuscript that, unless enough states of a DFBM model are measured online, it is difficult to satisfy full observability for all the states.

In the absence of observability of some states, instead of estimating their specific values it is possible to estimate intervals (ranges) of values based on a priori known range of initial conditions, i.e., range of values at time = 0. This type of problem is referred to in the literature as an initial values problem with parameter uncertainty or set-valued ODE integration. The parameter here refers to either uncertain initial states or some model parameter such as a kinetic constant. In the past several decades, different methods have been proposed to find tight bounds containing the reachable sets, including interval analysis [8], Taylor models with different remainder bounds [9], set-base parameter estimation [10], and different relaxation methods [11]. Due to the uncertainty amplification effect, interval analysis can diverge quickly and only suits a small part of the system. Set-based parameter estimation is computationally expensive because the parameter space need to be validated in a piecewise manner and each validation test requires the solution of a semi-definite programming problem. Taylor models can be used to find tight and nonconvex bounds of reachable sets but cannot be easily formulated to take measurements into consideration. To find the reachable sets compatible with available measurements, different relaxation methods and domain reduction are required which are computationally expensive.

When measurements are available, the trajectories that are not compatible with those measurements should be removed from the reachable sets. Estimation algorithms that efficiently deal with reachable sets subject to measurements including interval observers and set membership estimation algorithms [12,13]. An interval observer is usually composed of two classical observers (framers) which estimate the lower and upper bounds of states, respectively. However, sufficient measurements and fulfillment of observability are still required to build the two classical observers [14,15]. Most interval observers exploit the order-preserving properties of cooperative systems to estimate bounds of states [16]. Set membership estimation is an alternative method for estimating the uncertainty of a set of states that has been applied to linear systems [17]. The propagation of uncertainty along time is performed by a series of affine mapping operations over sets. Different shapes of sets have been used to contain the uncertainty, including zonotopes [18], parallelotopes [17] and ellipsoids [19].

In this research, a set membership estimation approach is proposed for nonlinear systems described by DFBM models. The DFBM is converted into a variable structure system composed of several continuous systems in different region of state space by multiparametric linear programming. To address the lack of measurements an Extended Kalman Filter (EKF) is used to estimate nominal values of some states which are important for determining metabolic fluxes. Then, a set membership estimation algorithm is applied for DFBM to estimate bounds of all states. A detector is proposed to detect the switch between different subsystems.

The paper is organized as follows. Section 2.1 introduces background of DFBM. Section 2.2 describes the use of multiparametric linear programming to convert the DFBM into a variable structure system composed of subsystems. Section 2.3 describes the EKF used to estimate some states which are important for determining metabolic fluxes. Section 2.4 presents the main ideas of set propagation and error compensation for calculation of states' bounds. Section 2.5 presents the algorithm for detecting the switch between different subsystems. Section 3 provides the application of the proposed techniques to the batch fermentation of *E. coli*. Section 4 presents a Discussion of the results followed by Conclusions.

#### **2. Materials and Methods**

#### *2.1. Dynamic Flux Balance Models*

Dynamic flux balance models (DFBM) are structured genome-based metabolic models developed from flux balance models. The key assumption of DFBM is that the cells act as agents distributing resources through metabolic reaction networks to boost a biological objective, e.g., growth rate [1]. Accordingly, the DFBM is formulated as an optimization problem. In the literature [20], both dynamic and static optimization approaches are reported. In the dynamic approach, the nonlinear programming problem is solved over a relatively large time period which is computationally expensive and thus less convenient for uncertainty propagation. In this investigation, static optimization approach is adopted for its simplicity. DFBM is interpreted as a local linear programming problem to maximize a biological objective. In terms of dynamics of intracellular metabolites, there are two type of DFBM models in the literature. One type of DFBM differentiates intracellular and extracellular environments and assumes that the intracellular metabolic reactions are fast enough such as it can be assumed at quasi-steady state [2,21]. Accordingly, only the extracellular metabolites and the biomass are described by dynamic state equations. It has been argued that the intracellular metabolite concentrations are not constant and may change over time [22]. Accordingly, there is a second type of DFBM, used in the current study, which does not differentiate between intracellular and extracellular compartments and the dynamics of all the metabolites are considered [20,23]. The governing equations of DFBM are based on discretized mass balances for all metabolites and these are defined by Equations (1a)–(1d).

$$\mathbf{x}\_{k+1} = \underset{\alpha}{\mathbf{B}} \mathbf{x}\_k + \boldsymbol{\Delta t} \mathbf{x}\_{\text{lin}\boldsymbol{\beta}k} \mathbf{A} \mathbf{z}\_k + \boldsymbol{h} \tag{1a}$$

$$y\_k = \mathbb{C}x\_k + r\_k \tag{1b}$$

$$\mathbf{x}\_{\mathbf{0}} \in \mathcal{P}\_{0} \tag{1c}$$

$$r\_k \sim T N(\mathbf{0}, \Sigma, l, \mathfrak{u}) \quad k = 0, 1, 2 \cdots \tag{1d}$$

where *xk* is a vector of *nx* state variables at the time step *<sup>k</sup>*. The state vector *<sup>x</sup>* includes concentrations of metabolites and biomass *xbio*. *<sup>y</sup>* is a vector of *ny* measured variables. *B* <sup>∈</sup> <sup>R</sup>*nx* <sup>×</sup> <sup>R</sup>*nx* is a constant diagonal matrix with diagonal elements *bj*, *<sup>j</sup>* <sup>=</sup> 1, ··· , *nx*. <sup>Δ</sup>*<sup>t</sup>* is a constant discrete time step size. *A* <sup>∈</sup> <sup>R</sup>*nx* <sup>×</sup> <sup>R</sup>*nrct* is a stoichiometry coefficient matrix, where *nrct* is the number of reactions considered in the metabolic network. *<sup>v</sup>* <sup>∈</sup> <sup>R</sup>*nrct* is the metabolic flux vector and its calculation is discussed below. *h* <sup>∈</sup> <sup>R</sup>*nx* is a constant vector. The initial state *<sup>x</sup>***<sup>0</sup>** is assumed to be bounded by a finite polyhedron <sup>P</sup><sup>0</sup> as Equation (1c). The underlying assumption is that in practice the initial concentrations of the culture medium components are known to be within specific ranges of values P0. This assumption is based on the fact that some variation in media formulation occurs due to human factor and variability in raw materials. Hence, this research focuses on the initial uncertainty and we assume all parameters in the state equations to be known accurately. In other words, the method proposed in this research cannot deal with model structure uncertainty like uncertainty in matrix *A*. However, the method can be extended to deal indirectly with uncertainty in parameters *θ* defined in the following paragraphs.

*rk* <sup>∈</sup> <sup>R</sup>*ny* are measurement noise vectors of which the elements follow the truncated multivariate normal distribution (TN) [24,25]. The probability density function *p* for *TN*(*μ*, **<sup>Σ</sup>**, *l*, *u*) are defined as per Equation (2).

$$p(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{I}, \boldsymbol{\mu}) = \frac{\exp\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\}}{\int\_{I}^{\boldsymbol{\mu}} \exp\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\}}\tag{2}$$

For *rk*, the mean vector of TN is **<sup>0</sup>** <sup>∈</sup> <sup>R</sup>*ny* ; the covariance is **<sup>Σ</sup>** <sup>∈</sup> <sup>R</sup>*ny* <sup>×</sup>R*ny* ; the corresponding variance vector is *<sup>σ</sup>*<sup>2</sup> <sup>∈</sup> <sup>R</sup>*ny* ; the lower bound and upper bound are *l* <sup>∈</sup> <sup>R</sup>*ny* and *u* <sup>∈</sup> <sup>R</sup>*ny* , respectively. |·| indicates the absolute value of a vector. It is assumed that <sup>|</sup>*l*| ≤ <sup>3</sup>*<sup>σ</sup>* and <sup>|</sup>*u*| ≤ <sup>3</sup>*σ*, which indicate that the absolute values of the lower bound and upper bound, respectively, are within the range of 3*σ*. For simplicity, the current study assumes the process noise to be zero. Process noise could be included as an additional state but this is beyond the scope of the current work.

Following the assumption that the cell allocates resources optimally, the metabolic flux *v* vector at each time step is obtained by solving a linear programming (LP) problem, defined by Equations (3a) and (3b).

$$\max\_{\mathbf{v}\_k} \tag{3a}$$

$$\hat{\mathbf{s}}\mathbf{u}\mathbf{j}\mathbf{c}\mathbf{t}\mathbf{t}\mathbf{t}\tag{3b}$$

$$\mathbf{G}\mathbf{v}\_{k}\leq\mathbf{F}\boldsymbol{\theta}\_{k}(\mathbf{x}\_{k})+\mathbf{z}\tag{3b}$$

where *<sup>c</sup>* <sup>∈</sup> <sup>R</sup>*nrct* , *<sup>F</sup>* <sup>∈</sup> <sup>R</sup>*nG* <sup>×</sup> <sup>R</sup>*n<sup>θ</sup>* , *<sup>z</sup>* <sup>∈</sup> <sup>R</sup>*nG* , *<sup>G</sup>* <sup>∈</sup> <sup>R</sup>*nG* <sup>×</sup> <sup>R</sup>*nrct* , *<sup>θ</sup>* <sup>∈</sup> **<sup>Θ</sup>** <sup>⊆</sup> <sup>R</sup>*n<sup>θ</sup>* . *nG* is the number of linear constraints. The parameter vector *θ* is a nonlinear vector-valued function of states *<sup>x</sup>*. *<sup>n</sup><sup>θ</sup>* denotes the number of elements in the parameter vector *<sup>θ</sup>*. Usually, each element *θ* is only function of two states at most and one of these two states is biomass concentration. **Θ** denotes the parameter space where the optimal solution of the LP resides. Equation (3a) denotes the objective of the LP that cells are optimizing where the most commonly used objective is the biomass production rate, i.e., growth rate. Thus, cells try to maximize growth rate by allocating limited resources. The LHS (left hand-side) in Equation (3b) describes either the rate of change of metabolite concentrations or the change of metabolite concentrations over a discretization time step <sup>Δ</sup>*t*. Matrices *G* are constant matrices containing the information of the stoichiometry of reactions. RHS in Equation (3b) is a function of *xk*, denoting the metabolic reaction bounds for each step. The matrix *F* is a matrix of which the elements are the part of the right hand side of the constraints that are functions of states at the previous time interval. *z* is a vector containing constant values such as constant uptake rate limits. Therefore, linear constraints of flux *v* in Equation (3b) are reaction rate limits or bounds on available resources (nutrients). Numerical examples of these matrices and vectors are shown for the *E. coli* model in the results section.

#### *2.2. Multiparametric Linear Programming for DFBM*

#### 2.2.1. Multiparametric Linear Programming

While set -based methods are available for uncertainty propagation for linear state space equations, these methods are not directly applicable to DFBM. The reason is that the fluxes used in the state equations are obtained from an LP and thus the problem is nonlinear due to the nonlinear function *<sup>θ</sup>***(***x***)** and the occurrence of different sets of active constraints. To tackle the dependency of the state equations on the LP, the concept of multiparametric linear programming (mpLP) is used to convert the DFBM into a variable structure system which is composed of subsystems. Multiparametric linear programming divides the parameter space (**Θ**) into different regions corresponding to different sets of active constraints and generates explicit expressions for calculating optimal solutions (*v*) for each region [26–28].

Let assume a given optimal solution *v* of the LP (Equation (3)) where subscript <sup>A</sup> and I denote indices of active and inactive constraints, respectively. Using this notation Equation (3b) is decomposed into two parts, equalities *<sup>G</sup>Avk* <sup>=</sup> *<sup>F</sup>Aθk***(***xk***)** <sup>+</sup> *<sup>z</sup><sup>A</sup>* and inequalities *<sup>G</sup><sup>I</sup> vk* <sup>≤</sup> *<sup>F</sup><sup>I</sup> <sup>θ</sup>k***(***xk***)** <sup>+</sup> *<sup>z</sup>I*. Without loss of generality, let us assume that *<sup>G</sup>*<sup>A</sup> is linear independent (linear redundant rows can always been removed by Gaussian elimination). Let *H* <sup>=</sup> *G−***<sup>1</sup>** *<sup>A</sup> <sup>F</sup><sup>A</sup>* and *<sup>g</sup>* <sup>=</sup> *<sup>G</sup>−***<sup>1</sup>** *<sup>A</sup> <sup>z</sup>A*, then the optimal solution can be obtained by Equation (4).

$$w\_k = H\theta\_k(\mathbf{x}\_k) + \mathbf{g} \tag{4}$$

Substituting Equation (4) into the inequality constraints results in Equation (5).

$$(\mathbf{G}\_{\mathcal{T}}\mathbf{H} - \mathbf{F}\_{\mathcal{T}})\theta\_k(\mathbf{x}\_k) < \mathbf{z}\_{\mathcal{T}} - \mathbf{G}\_{\mathcal{T}}\mathbf{g} \tag{5}$$

Equation (5) defines a polyhedral region of *θ* where the existence of the optimal solution is ensured by Equation (4). The region defined by Equation (5) is referred to as a critical region in the multiparametric programming literature. Different critical regions are defined by different combinations of A and I. Then, the entire parameter space **Θ** can be decomposed into connected critical regions denoted by {**Θ***<sup>i</sup>* }, *i* = 1, ··· , *n***Θ**. *n***<sup>Θ</sup>** denotes the total number of critical regions in **Θ**. In practice, critical regions that are very small are ignored and assumed to be covered by the adjacent critical region. Correspondingly, superscript *i* is used to denotes the *<sup>i</sup>*-th critical region. Assume for a specific *<sup>θ</sup>* <sup>∈</sup> **<sup>Θ</sup>***<sup>i</sup>* , the optimal flux *v* vector can be calculated analytically by *v<sup>i</sup> k* <sup>=</sup> *<sup>H</sup><sup>i</sup> <sup>θ</sup>k* <sup>+</sup> *<sup>g</sup><sup>i</sup>* thus bypassing the need for solving the LP. Following the literature and our previous studies, for a given *θ*, multiple optimal solutions can coexist [29,30]. In other words, multiple Equation (4) can coexist which results in different ways to divide the parameter space **Θ**. When such multiplicity issue occurs it results in different time trajectories. For simplicity, multiplicity is not addressed in the current study and it is addressed in a separate work by different methods from the one presented here.

By substituting the optimizer equation *v<sup>i</sup> k* <sup>=</sup> *<sup>H</sup><sup>i</sup> <sup>θ</sup>k* <sup>+</sup> *<sup>g</sup><sup>i</sup>* into Equation (1a), we obtained a set of governing state equations as per Equations (6a)–(6e). Since different *<sup>θ</sup>k* are within different critical regions as Equation (6b), each critical region corresponds to different state equations Equation (6a). Thus the set {**Θ***<sup>i</sup>* } defines a family of state space models and this family is referred to as a variable structure system. A variable structure system is a piecewise continuous system composed of subsystems where each subsystem corresponds to a different region of the state space. Furthermore, the region of the state space corresponding to a specific subsystem is referred to as a critical region. Each subsystem is described by a different set of state equations. Accordingly, the state equations need to be changed as soon as the states enter into a new critical region. Here, the superscript *i* denotes the *i-th* subsystem corresponding to a critical region **Θ***<sup>i</sup>* . Equations (6c)–(6e) remain the same form as Equations (1b)–(1d).

$$\mathbf{x}\_{k+1} = B\mathbf{x}\_k + \Delta t \mathbf{x}\_{\text{bio},k} A (H^t \theta\_k(\mathbf{x}\_k) + \mathbf{g}^t) + h \tag{6a}$$

$$\theta\_k(\mathbf{x}\_k) \in \Theta^i \quad i = 1, \dots, n\_{\Theta} \tag{6b}$$

$$y\_k = \mathbf{C}\mathbf{x}\_k + r\_k \tag{6c}$$

$$\mathbf{x}\_{\emptyset} \in \mathcal{P}\_{0} \tag{6d}$$

$$r\_k \sim TN(\mathbf{0}, \Sigma, l, \mathfrak{u}) \quad k = 0, 1, 2 \cdots \tag{6e}$$

#### 2.2.2. Reaction Rate Estimability

To further simplify the system described by Equations (6a)–(6e) it is possible to exploit the sparseness of the *H* matrix. For instance, to take advantage of zero columns of *H*, Equation (4) can be re-written as shown in Equation (7). For conciseness, the subscript *k* is omitted here because Equation (7) applies for all time steps.

$$\mathbf{v}^{i} = H^{i}\boldsymbol{\theta}(\mathbf{x}) + \mathbf{g}^{i} = \begin{bmatrix} H\_{\text{N}}^{i} & H\_{\text{Z}}^{i} \end{bmatrix} \begin{bmatrix} \theta\_{\text{N}}^{\ell}(\mathbf{x}\_{\text{N}}^{\ell}) \\ \theta\_{\text{Z}}^{\ell}(\mathbf{x}) \end{bmatrix} + \mathbf{g}^{i} = H\_{\text{N}}^{i}\theta\_{\text{N}}^{i}(\mathbf{x}\_{\text{N}}^{i}) + \mathbf{g}^{i} \tag{7}$$

In Equation (7) *<sup>N</sup>* and *<sup>Z</sup>* denote the indices of the nonzero and zero columns of the *H* matrix, respectively. Because *HZ* is a submatrix containing the zero columns of *<sup>H</sup>*, the flux *<sup>v</sup>* is only a function of parameters *<sup>θ</sup>N***(***xN***)** according to Equation (7). Moreover, while the parameters *<sup>θ</sup>* are a function of states *x* (see Equations Equations (1a)–(1d) and (3)), only some elements of *<sup>x</sup>* actually determine the entire flux vector *<sup>v</sup>*. The vector *xN* contains, according to Equation (7), the states that determine the flux vector. Notice that for different critical regions flux-determining vector *xN* contains different states. Therefore, Equation (6a) can be simplified into Equation (8).

$$\mathbf{x}\_{k+1} = \mathbf{B}\mathbf{x}\_k + \Delta t \mathbf{x}\_{\text{div},k} \mathbf{A} \left( H\_N^t \theta\_N^l \left( \mathbf{x}\_{N,k}^l \right) + \mathbf{g}^l \right) + h \tag{8}$$

The biological interpretation of the flux-determining state vector *xN* is that only some resources are limiting the growth of cells, either because they are limited or because the activity of enzymes in the related reactions (fluxes) is limiting. As the fermentation progresses, the states transit into new critical regions from old critical regions. Different critical regions can be interpreted as different metabolic stages where *xN* are different. Similar interpretations have been reported in [26] in the context of steady state flux balance analysis.

In Equation (8), the term <sup>Δ</sup>*txbio*,*kA*(*H<sup>i</sup> N θi N* **(***x<sup>i</sup> N***,***k***)** denotes the change of metabolite concentrations contributed by metabolic reactions. Therefore, the reaction rates are *xbio*,*kA*(*H<sup>i</sup> N θi N* **(***x<sup>i</sup> N***,***k***)**. It is noted that this nonlinear reaction rate term is not only a function of the flux-determining states vector *xN* but also of biomass concentration *xbio*, because the fluxes are defined per unit biomass, i.e., more biomass demands more nutrients to satisfy the requirement of the growth. Once the states that determine the reaction rates, i.e., the states *xN* together with the value of *xbio*, can be estimated, the estimation problem can be simplified greatly. Since in some cases *xN* contains *xbio* but in some cases it does not, we define a reaction-rate-determining state vector *xM* in Equation (9). Hence, the reaction-rate-determining state vector *xM* always contains the flux-determining states *xN* and the biomass state *xbio* without any redundancy.

$$\mathbf{x}\_M = \begin{cases} \mathbf{x}\_{N\prime} & \text{if } \mathbf{x}\_N \text{ contains the biomass state } \mathbf{x}\_{bio} \text{ .}\\ \begin{bmatrix} \mathbf{x}\_N \\ \mathbf{x}\_{bio} \end{bmatrix} & \text{otherwise.} \end{cases} \tag{9}$$

The vector *xM* for critical region **<sup>Θ</sup>***<sup>i</sup>* is denoted by *<sup>x</sup><sup>i</sup> M*. We define reaction rate estimability as the ability to determine the reaction rates *xbio*,*kA*(*H<sup>i</sup> Nθi N***(***x<sup>i</sup> N***,***k***)** in the metabolic networks which is needed for the calculation of Equation (8). Following the above, once reaction-rate-determining state vector *xM* at time step *<sup>k</sup>* can be estimated, the dynamic evolution of the culture at step *k* + 1 as per Equation (8) can be predicted. In addition, it should be noticed that it is not necessary to measure all the reaction-rate-determining states for reaction rate estimability and instead some states can be estimated by an observer from available measurements. However, if an observer is used to estimate *x<sup>i</sup> M*, some particular combination of measurements is necessary for observability of *x<sup>i</sup> M*. Considering different measurement combinations Ω*<sup>i</sup>* <sup>1</sup>, <sup>Ω</sup>*<sup>i</sup>* <sup>2</sup>... for the critical region **Θ***<sup>i</sup>* , only some combinations provide full observability of *x<sup>i</sup> M*. Let <sup>Ω</sup>*<sup>i</sup>* <sup>O</sup> be defined as a family of sets of measurements, which contains all measurement combinations that fulfill observability of *x<sup>i</sup> M*.

Although many different critical regions and corresponding combinations of measurements could be considered, in practice the possibilities will be limited because industrial fermentations usually operate in a narrow range of operating conditions. Thus, the dynamic trajectories of states only pass through a limited set of critical regions. Assume for <sup>∀</sup>*x***<sup>0</sup>** ∈ P0, the set of critical regions that the trajectories traverse are <sup>Γ</sup>. Then, the minimum set of measurements required for the reaction rate estimability of the critical region set Γ is ΩΓ as per Equations (10a)–(10c).

$$
\Omega \Omega\_{\Gamma} = \min\_{j} |\bigcup\_{i} \Omega\_{j}^{i}| \tag{10a}
$$

subject to *i* ∈ Γ (10b)

$$
\Omega^i\_j \in \Omega^i\_{\mathcal{O}} \tag{10c}
$$

where |·| is the cardinality of a finite countable set, i.e., the number of elements of a set. In Equation (10c), Ω*<sup>i</sup> <sup>j</sup>* <sup>∈</sup> <sup>Ω</sup>*<sup>i</sup>* <sup>O</sup> indicates that the measurement combination <sup>Ω</sup>*<sup>i</sup> <sup>j</sup>* can fulfill the observability of reaction-rate-determining states *x<sup>i</sup> M* of critical region **<sup>Θ</sup>***<sup>i</sup>* . If all states in set ΩΓ are measured, the reaction rate term of any trajectory starting from P<sup>0</sup> can be estimated by the observer. In other words, although *x<sup>i</sup> M* in different critical regions may be different, requiring different measurements for observability, *x<sup>i</sup> M* is always observable if the chosen set of measurements satisfy Equation (10c).

#### *2.3. Extended Kalman Filter (EKF)*

Using the minimum required set of measurements, ΩΓ is defined in Equation (10c), *xM* can be estimated by an observer. *xM* corresponds to the observable subspace of the governing equation (Equations (1a)–(1d)) for each critical region. The state equation of the observable subspace for critical region **Θ***<sup>i</sup>* is given by Equations (11a)–(11c).

$$\mathbf{x}\_{M,k+1}^{i} = f^{i}(\mathbf{x}\_{M,k}^{i}) = \mathbf{B}\mathbf{x}\_{M,k}^{i} + \Lambda t \mathbf{x}\_{\text{fib},k} \mathbf{A}\_{M}(H\_{N}^{i}\boldsymbol{\theta}\_{N}^{i}(\mathbf{x}\_{N,k}^{i}) + \mathbf{g}^{i}) + \mathbf{h}\_{M} \tag{11a}$$

$$y\_k = \mathbf{C}\_{M}^{l} \mathbf{x}\_{M,k}^{l} + r\_k \tag{11b}$$

$$r\_k \sim TN(\mathbf{0}, \Sigma, l, \mathfrak{u}) \quad k = 0, 1, 2 \cdots \tag{11c}$$

where *x<sup>i</sup> N***,***k* and *<sup>x</sup><sup>i</sup> M***,***k* are the flux-determining state vector and the reaction-rate-determining state vector for critical region **Θ***<sup>i</sup>* , respectively; *AM* is the stoichiometry submatrix corresponding to *xM*. Similarly *hM* is a sub-vector of *<sup>h</sup>* corresponding to *xM*. It should be noticed that, for different critical regions, *xM* involves different states. Accordingly, each critical region requires the use of a different EKF. In addition, it should be noticed that the *Ci M* matrices are different for each critical region but the measured variables ΩΓ are the same since the same sensors are used for the entire fermentation.

To estimate *x<sup>i</sup> M*, an standard EKF is used due to its effective and simple structure [31]. The estimate *x***ˆ***<sup>i</sup> M***,***k* and covariance *<sup>P</sup><sup>i</sup> k* of *<sup>x</sup><sup>i</sup> M* for critical region **<sup>Θ</sup>***<sup>i</sup>* are described by Equation (12a) and Equation (12b), respectively.

$$\mathfrak{X}\_{M,k}^{i} = f^{i}(\mathfrak{X}\_{M,k-1}^{i}) + \mathbb{K}\_{k}(y\_{k} - \mathbb{C}\_{M}^{i}f^{i}(\mathfrak{X}\_{M,k-1}^{i})) \tag{12a}$$

$$P\_k^{i^{-1}} = \Phi\_{k-1}^i P\_{k-1}^i \Phi\_{k-1}^i \prescript{T}{}{}{\mathcal{C}}\_M^i (\Sigma \Sigma^T)^{-1} \prescript{}{M}{}{\mathcal{C}}\_M^i \tag{12b}$$

where

$$\mathbf{K}\_{k} = \Phi\_{k-1}^{i} \mathbf{P}\_{k-1}^{i} \Phi\_{k-1}^{i} \prescript{T}{}{\mathbf{C}}\_{M}^{i} \prescript{T}{}{\mathbf{C}}\_{M}^{i} \mathbf{O}\_{k-1}^{i} \mathbf{P}\_{k-1} \mathbf{O}\_{k-1}^{i} \prescript{T}{}{\mathbf{C}}\_{M}^{i} \prescript{T}{}{\mathbf{C}}\_{M}^{i} + \Sigma \boldsymbol{\Sigma}^{T})^{-1} \tag{13a}$$

$$\boldsymbol{\Phi}\_{k}^{i} = \frac{\partial f^{i}}{\partial \mathbf{x}\_{M}^{I}} (\boldsymbol{\mathfrak{x}}\_{M,k}^{i}) \tag{13b}$$

The measurement noise is assumed to be a truncated multivariate normal distribution as Equation (11c). This assumption is needed for estimating finite bounds as explained in the following section. Recall in Equation (2) that <sup>|</sup>*l*| ≤ <sup>3</sup>*<sup>σ</sup>* and <sup>|</sup>*u*| ≤ <sup>3</sup>*σ*, the lower and upper bounds are located within the range of 3*σ*. The covariance matrix *Pk* is always overestimated to ensure boundedness. Although the EKF resulting from this assumption is sub-optimal, it is still sufficient to estimate *x<sup>i</sup> M*.

#### *2.4. Set Propagation and Error Compensation*

Since the minimum set of measurements defined by Equations (10a)–(10c) can only ensure the observability of *xM*, the estimation of other states needs different estimation strategies. The idea is to exploit the a priori knowledge of the initial ranges of initial conditions to estimate all states. Instead of predicting specific values of states, set membership estimation (SME) approach is used to predict sets containing all possible states by a series of set operations. These set operations usually include linear mapping, projection, translation, Minkowski addition, intersection, union, and outer approximation. In this research, all sets and multiparametric linear programming operations are performed with the Multi-Parametric Toolbox 3.0 (https://www.mpt3.org/ accessed on 15 July 2021) [32] and MATLAB R2018a. The *E. coli* example can be found online (https://github.com/SetMembershipEstimationDFBM/E.coliExample, accessed on 25 September 2021). For DFBM, SME propagates the initial set P<sup>0</sup> by affine mapping as Equation (14). Affine mapping involves two operations: linear mapping of the previous set and translation.

$$
\hat{\mathcal{X}}\_{k+1} \approx \underbrace{\mathbf{B}\hat{\mathcal{X}}\_{k}}\_{\text{linear mapping}} + \underbrace{\Delta t \hat{\mathfrak{x}}\_{\text{bio},k} A (H\_N^{\bar{t}} \theta\_N^{\bar{t}} (\hat{\mathfrak{x}}\_{N,k}^{\bar{t}}) + \mathbf{g}^{\bar{t}}) + h}\_{\text{translation}} \tag{14}
$$

where <sup>X</sup><sup>ˆ</sup> *<sup>k</sup>* represents the set of states at time step *<sup>k</sup>* and <sup>X</sup><sup>ˆ</sup> <sup>0</sup> = P0, i.e., the set of initial conditions assumed to be known. In Equation (14), the translation term is approximated by using the estimate *x***ˆ***<sup>i</sup> M***,***k* obtained by the EKF. In the application of EKF, the estimate *<sup>x</sup>***ˆ***<sup>i</sup> M***,***k* needs several time steps to converge to the true flux-determining states *x<sup>i</sup> M***,***k*. Thus the SME described by Equation (14) may underestimate bounds while the EKF is converging. To mitigate this problem, a correction is implemented to compensate for the estimate error as described below. Since no extra information is available, the compensation of the estimate error is based on the worst case scenario.

The error in the estimate incurred by the observer for critical region **<sup>Θ</sup>***<sup>i</sup>* is *e<sup>i</sup> M* <sup>=</sup> *xi M***,***k* <sup>−</sup> *<sup>x</sup>***ˆ***<sup>i</sup> M***,***k*. Since *<sup>x</sup><sup>i</sup> M***,***k* always contains biomass *<sup>x</sup><sup>i</sup> bio*,*<sup>k</sup>* and *<sup>x</sup><sup>i</sup> N***,***k*, the corresponding estimate errors are defined as *e<sup>i</sup> N***,***k* <sup>=</sup> *<sup>x</sup><sup>i</sup> N***,***k* <sup>−</sup> *<sup>x</sup>***ˆ***<sup>i</sup> N***,***k* and *<sup>e</sup><sup>i</sup> bio* = *<sup>x</sup><sup>i</sup> bio*,*<sup>k</sup>* <sup>−</sup> *<sup>x</sup>*ˆ*<sup>i</sup> bio*,*k*. Let us assume that the function *θ* is first-order differentiable and define Jacobian matrix *ψ<sup>i</sup> k*.

$$
\Psi^i\_k = \frac{\partial \theta^i\_N}{\partial \mathbf{x}^i\_N} (\pounds^i\_{N,k}) \tag{15}
$$

Substituting the estimate error *e<sup>i</sup> k*, *ei bio* and Jacobian matrix *ψ<sup>i</sup> k* into Equation (8), a corrected state equation that accounts for the estimate error is obtained as Equations (16a) and (16b). Equations (16a) and (16b) uses a first order approximation to account for the state deviation *i k* caused by the estimate error *<sup>e</sup><sup>i</sup> M***,***k*, while the EKF is converging. The error compensation based on linearization provides satisfactory bounds because the error between estimate and measured is small and decreases quickly due to the convergence of EKF.

*N*

$$\mathbf{x}\_{k+1} = \mathbf{B}\mathbf{x}\_k + \Delta t \mathbf{\hat{x}}\_{bio,k} \mathbf{A} (H\_N^t \boldsymbol{\Psi}\_k^t \boldsymbol{\mathfrak{X}}\_{N,k}^t + \mathbf{g}^t) + h + \boldsymbol{\mathfrak{e}}\_k^t \tag{16a}$$

$$
\varepsilon\_k^t = D\_k e\_{N,k}^t + e\_{lio,k} \mathcal{M}\_k e\_{N,k}^t + L\_k \varepsilon\_{lio,k} \tag{16b}
$$

where

$$D\_k = \hat{x}\_{bio,k} \Delta t A H\_N^t \psi\_k^t + h \tag{17a}$$

$$M\_k = \Delta t A H\_N^i \Psi\_k^i$$

$$L\_k = \Delta t A \left( H\_N^t \theta\_{N,k}^t \left( \mathfrak{X}\_{N,k}^t \right) + \mathfrak{g}^t \right) \tag{17c}$$

In this work, the noise was assumed to follow a truncated multivariate Gaussian distribution. The corresponding standard multivariate Gaussian distribution of noise contains the truncated one. As illustrated in Figure 1, when an EKF is used to estimate the states, the distribution of states with a standard Gaussian noise should similarly contain the one with the truncated Gaussian noise, which is the true distribution of states. Moreover, the distribution of states by standard EKF is also a multivariate Gaussian distribution. For Gaussian distribution, 99.7% of the samples are within the interval of 6 standard deviations from both sides of the mean for each state. Thus, an interval set based on 6 standard deviations can contain the distribution by standard EKF and eventually contain the true distribution of states as in Figure 1. Since *P<sup>i</sup> k* is the covariance of a standard EKF, the diagonal elements of matrix *P<sup>i</sup> k* are the variances for each state. Therefore, diagonal elements of *P<sup>i</sup> k* can be used to define the interval set to bound the error *<sup>i</sup> k*.

**Figure 1.** Illustration of the interval set containing the distribution of states.

To formulate an error compensation operation scheme several set operations are introduced first as follows. The *<sup>n</sup>*-dimensional interval set is <sup>S</sup>(*p*, *q*) with lower bound *p* and upper bound *q* as <sup>S</sup>(*p*, *q*) = {*x* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* : *p* <sup>≤</sup> *x* <sup>≤</sup> *q*}. The outer approximation operation Q(·) of a bounded set W is denoted by Q(W), which involves the mapping of the set W to a new interval set. If the infimum and supremum are denoted by inf(·) and sup(·), respectively, the outer approximation of the set W is Q(W) = S(inf(W), sup(W)). The operator ⊕ is the Minkowski addition of two sets. For example, for two sets *α* and *β*, *<sup>α</sup>* <sup>⊕</sup> *<sup>β</sup>* <sup>=</sup> {*a* <sup>+</sup> *b* : <sup>∀</sup>*a* <sup>∈</sup> *<sup>α</sup>*, <sup>∀</sup>*b* <sup>∈</sup> *<sup>β</sup>*}.

Notice that the diagonal elements of *P<sup>i</sup> k* are the variances of each state. Then, if the standard deviation of *e<sup>i</sup> N***,***k* is *<sup>η</sup><sup>i</sup> N***,***k* and of *<sup>e</sup><sup>i</sup> bio*,*<sup>k</sup>* is *<sup>η</sup><sup>i</sup> bio*,*k*, two interval sets E*N*,*<sup>k</sup>* and E*bio*,*<sup>k</sup>* can be defined to bound *η<sup>i</sup> N***,***k* and *<sup>η</sup><sup>i</sup> bio*,*k*, respectively, based on the choice of 3 standard deviation ranges, as *e<sup>i</sup> N***,***k* ∈ E*N*,*<sup>k</sup>* <sup>=</sup> <sup>S</sup>(−3*η<sup>i</sup> N***,***k*, 3*η<sup>i</sup> N***,***k*) and *<sup>e</sup><sup>i</sup> bio*,*<sup>k</sup>* ∈ E*bio*,*<sup>k</sup>* <sup>=</sup> <sup>S</sup>(−3*η<sup>i</sup> bio*,*k*, 3*η<sup>i</sup> bio*,*k*). In Equation (16b), since <sup>|</sup>*e<sup>i</sup> bio*,*k*<sup>|</sup> <sup>&</sup>lt; <sup>3</sup>*η<sup>i</sup> bio*,*k*, we have *ebio*,*kMke<sup>i</sup> N***,***k* <sup>∈</sup> <sup>3</sup>*η<sup>i</sup> bio*,*kMk*E*N*,*k*. Similarly, the other two terms in Equation (16b) can be bounded as *Dke<sup>i</sup> N***,***k* <sup>∈</sup> *Dk*E*N*,*<sup>k</sup>* and *Lkebio*,*<sup>k</sup>* <sup>∈</sup> *Lk*E*bio*,*k*, respectively. Therefore, the state deviation *<sup>i</sup> k* term can be contained within the interval set E,*<sup>k</sup>* according to Equation (18).

$$
\boldsymbol{\varepsilon}\_{\mathbf{k}}^{i} \in \mathcal{E}\_{\mathbf{c},\mathbf{k}} = \mathcal{Q}((\mathbf{D}\_{\mathbf{k}} + 3\eta\_{bio,\mathbf{k}}^{i}\mathbf{M}\_{\mathbf{k}})\mathcal{E}\_{\mathbf{N},\mathbf{k}}) \oplus \mathcal{Q}(\mathbf{L}\_{\mathbf{k}}\mathcal{E}\_{bio,\mathbf{k}}) \tag{18}
$$

where the sets *Dk*E*N*,*<sup>k</sup>* and 3*η<sup>i</sup> bio*,*kMk*E*N*,*<sup>k</sup>* occurring in Equation (18) are combined together. On the other hand, *Lk*E*bio*,*<sup>k</sup>* originates from a different set <sup>E</sup>*bio*,*<sup>k</sup>* and thus Minkowski addition must be used to add the different sets. However, linear mapping of interval sets can lead to irregular convex sets. In computational geometry, traditional algorithms that perform Minkowski addition for two convex irregular high-dimensional polytopes are computationally expensive [33]. On the other hand, Minkowski addition of two interval sets is computationally efficient because intervals are axis-aligned. Thus, the operator Q(·) that converts the irregular set to the axis-aligned set is applied to speed up the computation of the Minkowski addition.

Following the above, the set of states <sup>X</sup><sup>ˆ</sup> *<sup>k</sup>*+<sup>1</sup> is bounded by the prior estimate set P<sup>−</sup> *k*+1 according to Equations (19a) and (19b).

$$\mathcal{P}\_{k+1}^{-} = \mathcal{Q}\{ \underbrace{\mathcal{B}\mathcal{P}\_{k}^{+}}\_{\text{linear mapping}} + \underbrace{\Delta t \pounds\_{\text{bio},k} A \left(H\_{N}^{i} \theta\_{N}^{i} (\mathfrak{A}\_{N,k}^{i}) + \mathfrak{g}^{i}\right) + h}\_{\text{transulation}} \right\} \tag{19a} \tag{19b}$$

$$\mathcal{R}\_{k+1} \subset \mathcal{P}\_{k+1}^{-} \tag{19b}$$

where the set of the posterior estimates is <sup>P</sup><sup>+</sup> *<sup>k</sup>* . *<sup>B</sup>*P<sup>+</sup> *<sup>k</sup>* denotes the scaling of the set <sup>P</sup><sup>+</sup> *<sup>k</sup>* by the diagonal matrix *B*. Then the set *B*P<sup>+</sup> *<sup>k</sup>* is translated by the vector in the big curly brackets. To compensate for the deviation during the convergence of EKF, the interval set E,*<sup>k</sup>* is added by Minkowski addition.

Considering the truncated measurement noise, *rk* <sup>=</sup> *yk* <sup>−</sup> *Cxk* is bounded by the lower *<sup>l</sup>* and upper bounds *<sup>u</sup>*; let us define a set <sup>M</sup>*<sup>k</sup>* <sup>=</sup> {*xk* <sup>∈</sup> <sup>R</sup>*nx* : *<sup>l</sup>* <sup>&</sup>lt; *yk* <sup>−</sup> *Cxk* <sup>&</sup>lt; *<sup>u</sup>*}. Then, the posterior estimate set <sup>P</sup><sup>+</sup> *<sup>k</sup>*+<sup>1</sup> is given by Equations (20a)–(20c). In this study, it is assumed that <sup>P</sup><sup>+</sup> *<sup>k</sup>* and P<sup>−</sup> *<sup>k</sup>*+<sup>1</sup> are much smaller than the volumes of the critical regions.

$$\mathcal{P}\_{k+1}^{+} = \mathcal{P}\_{k+1}^{-} \bigcap \mathcal{M}\_{k+1} \tag{20a}$$

$$
\hat{\mathcal{A}}\_{k+1} \subset \mathcal{P}\_{k+1}^{+} \tag{20b}
$$

$$\mathcal{P}\_0^+ = \mathcal{P}\_0 \tag{20c}$$

Figure 2 illustrates the set propagation using intervals for an example involving two states, e.g., glucose and biomass concentrations. The initial set P<sup>0</sup> contains all possible initial values of glucose and biomass. Then <sup>P</sup><sup>+</sup> <sup>1</sup> is generated through set operations by computational geometry algorithms. Since an interval set is used, it is computationally efficient to project the set <sup>P</sup><sup>+</sup> <sup>1</sup> onto the biomass and glucose axes to obtain the corresponding lower bounds *lglc*, *lbio* and upper bounds *uglc*, *ubio* as shown in the figure for the set <sup>P</sup><sup>+</sup> 1 .

**Figure 2.** Illustration of set propagation of SME by set operations.

#### *2.5. Detecting the Transition between Critical Regions*

The proposed use of multiparametric programming converts the DFBM into a variable structure system composed of subsystems where each critical region corresponds to a subsystem. Along a given time trajectory the states may transit from one critical region to another. When the states estimated by the EKF leave a critical region **Θ***<sup>i</sup>* to enter another critical region **Θ***<sup>j</sup>* , the estimate *<sup>x</sup>***ˆ***M***,***k* and the covariance *Pk* must be reinitialized because *xM* for different critical regions may be different, even though the measured states are the same. Moreover, a criterion is required to detect whether the states are entering into a new critical region.

When the system is traversing from one critical region to another, it needs to cross a boundary between the critical regions. Over time the states may cross over several boundaries along their trajectories and these crossings must be detected. Two neighboring critical regions share a boundary where an active constraint will become inactive or vice versa. The activation of a constraint may require the change of constraints related to *<sup>x</sup>***ˆ***N***,***k*. For a given constraint, *<sup>θ</sup>* is usually only function of two states at most because of commonly used Michaelis-–Menten kinetics [34] or constraints to prevent the depletion of nutrients [23] and one of these two states is biomass. So two special cases should be considered as follows when system switches from one critical region to the next:

Case i—*x<sup>i</sup> N* of the old critical region **<sup>Θ</sup>***<sup>i</sup>* have one more observable state than the *<sup>x</sup> j N* of the new critical region **Θ***<sup>j</sup>* . For this case, the switch between critical regions is determined by Equation (21). Equation (21) calculates the norm of the difference between the flux estimates obtained with Equation (7) in the two neighboring regions. Notice that the flux estimate of **<sup>Θ</sup>***<sup>j</sup>* is based on estimate *x***ˆ***<sup>i</sup> N***,***k* of the old critical region. The value of *<sup>γ</sup>*(*i*, *<sup>j</sup>*, *<sup>k</sup>*) is used to detect the occurrence of a switch. If the system is exactly at the boundary of these two critical regions, the flux equation Equation (7) for these two critical regions should result in the same flux value and *γ*(*i*, *j*, *k*) will be zero. A schematic example is shown in Figure 3. Polygons in different colors represent different critical regions in the parameter space **Θ**. As the state evolves with time, the corresponding *θ* changes along the dash line in the parameter space **Θ**. As the *θ* approaches the boundary between the critical region **Θ<sup>1</sup>** and **Θ2**, *γ*(*i*, *j*, *k*) approaches zero. Correspondingly, a value of *γ*(*i*, *j*, *k*) smaller than a user specified tolerance indicates a switch between critical regions, thus requiring reinitialization of the EKF as follows: *x***<sup>ˆ</sup>** *j N***,***k* is set equal to *<sup>x</sup>***ˆ***<sup>i</sup> N***,***k* and *<sup>P</sup><sup>j</sup> k* is set equal to *<sup>P</sup><sup>i</sup> k*.

$$\gamma(i,j,k) = \left\| \mathfrak{d}\_{k}^{i} - \mathfrak{d}\_{k}^{j} \right\| = \left\| H\_{N}^{i} \mathfrak{d}\_{N}^{i} (\mathfrak{X}\_{N,k}^{i}) + \mathfrak{g}^{i} - (H\_{N}^{j} \mathfrak{G}\_{N}^{j} (\mathfrak{X}\_{N,k}^{i}) - \mathfrak{g}^{j}) \right\| \tag{21}$$

**Figure 3.** Illustration of detecting a critical region switch.

Case ii—*x j N* of the new critical region **<sup>Θ</sup>***<sup>i</sup>* has one more observable state than the *<sup>x</sup> j N* of the old critical region **Θ***<sup>i</sup>* . To reinitialize the EKF, *x***<sup>ˆ</sup>** *j N***,***k* and *<sup>P</sup><sup>j</sup> k* can be set to the old values except for the new observable state that is not observable in the old critical region, and thus it needs to be estimated for calculating *<sup>γ</sup>*(*i*, *<sup>j</sup>*, *<sup>k</sup>*). By projecting the set <sup>P</sup><sup>+</sup> *<sup>k</sup>* , the lower *lun*,*<sup>k</sup>* and upper bounds *uun*,*<sup>k</sup>* can be calculated. Since no extra information is available, the mean value of the upper bound and lower bound is used as the nominal value of the unobservable state as per Equation (22).

$$\mathfrak{X}^{i}\_{\text{un},k} = \frac{1}{2} (\mathfrak{u}\_{\text{un},k} + l\_{\text{un},k}) \tag{22}$$

Equation (23) is used to calculate *γ*(*i*, *j*, *k*). The flux estimate for the new critical region **Θ***<sup>j</sup>* is based on the nominal values of the unobservable state *x*ˆ*<sup>i</sup> un*,*<sup>k</sup>* combined with *<sup>x</sup>*ˆ*<sup>i</sup> <sup>N</sup>*,*<sup>k</sup>* of the old critical region.

$$\gamma(i,j,k) = \left\| \mathfrak{d}\_{\mathbf{k}}^{i} - \mathfrak{d}\_{\mathbf{k}}^{j} \right\| = \left\| H\_{\mathbf{N}}^{i} \mathfrak{d}\_{\mathbf{N}}^{i} (\mathfrak{A}\_{\mathbf{N},k}^{i}) + \mathfrak{g}^{i} - (H\_{\mathbf{N}}^{j} \mathfrak{G}\_{\mathbf{N}}^{j} (\mathfrak{A}\_{\mathbf{un},k'}^{i} \mathfrak{A}\_{\mathbf{N},k}^{i}) - \mathfrak{g}^{j}) \right\| \tag{23}$$

To reinitialize the EKF, the estimate and covariance are used together with the estimation of the new state that is added in the new critical region. Assuming the states are close enough to the boundary between the critical regions, then Equation (24) holds.

$$\left\|{H\_N^i \theta\_N^i(\mathfrak{X}\_{N,k}^i)} + \mathfrak{g}^i - (H\_N^j \theta\_N^j(\mathfrak{X}\_{N,k'}^j \mathfrak{X}\_{un,0}^j) - \mathfrak{g}^j)\right\| = 0 \tag{24}$$

The initial estimate of new observable state *x*ˆ *j un*,0 in the new region can be calculated by solving the Equation (24). Since the new state is between the upper bound and lower bound by SME, the half length between *uun*,*<sup>k</sup>* and *lun*,*<sup>k</sup>* is the worst possible deviation. Then, using a 3 standard deviation range, the initial variance *η*<sup>2</sup> *un*,*<sup>k</sup>* can be estimated according

to Equation (25) and all other covariance terms related to the new state are assumed to be zero.

$$
\eta\_{un,k} = \frac{1}{3} \cdot \frac{1}{2} (u\_{un,k} - l\_{un,k}) \tag{25}
$$

Bounds of states estimated by the SME are rigorously guaranteed in each critical region separately but subject to accurate tuning of the tolerance that is used to switch between the subsystems. The tolerance of *γ*(*i*, *j*, *k*) is the only user specified parameter in this research. If the tolerance is too large or small, the EKF may switch the subsystem too early or too late. Accordingly, if the wrong state equations are used in estimation, the bounds on the states may be violated. To avoid such a situation, exhaustive simulations that are initialized with P<sup>0</sup> are conducted to find the tolerance used to switch between critical regions. As an alternative, an overestimated covariance can also be used to reinitialize the EKF when the state enters a new critical region to avoid bound violations.

#### **3. Results**

#### *3.1. DFBM Model of E. coli*

A DFBM model of *E. coli* reported in [20] is used to illustrate the proposed methodology. The DFBM in batch operation includes four states, glucose concentration *xglc*, oxygen concentration *xoxy*, acetate concentration *xace*, and biomass concentration *xbio* as in Equations (26a)–(26e). Thus, the state vector is *<sup>x</sup>* <sup>=</sup> *xglc xoxy xace xbio<sup>T</sup>* . The substrates are glucose, oxygen, acetate.

$$\mathbf{x}\_{glc,k+1} = \mathbf{x}\_{glc,k} + \Delta t \mathbf{x}\_{bio,k} \mathbf{A}\_{glc} \mathbf{z}\_k \tag{26a}$$

$$\mathbf{x}\_{\rm any}\mathbf{x}\_{k+1} = (1 - k\_L a \Delta t) \mathbf{x}\_{\rm any}\mathbf{x}\_k + \Delta t \mathbf{x}\_{\rm bi} A\_{\rm any} \mathbf{v}\_{\rm k} + 0.21 k\_L a \Delta t \tag{26b}$$

$$\mathbf{x}\_{\text{acc},k+1} = \mathbf{x}\_{\text{acc},k} + \Delta t \mathbf{x}\_{\text{bio},k} \mathbf{A}\_{\text{acc}} \mathbf{v}\_k \tag{26c}$$

$$\mathbf{x}\_{\text{bia},k+1} = \mathbf{x}\_{\text{bia},k} + \Lambda \mathbf{t} \mathbf{x}\_{\text{bia},k} \mathbf{A}\_{\text{bia}} \mathbf{v}\_{\text{k}} \tag{26d}$$

*vk*

$$\mathbf{x\_0} \in \mathcal{P}\_0 = \mathcal{S}\left( \begin{bmatrix} 0.38 & 0.1995 & 0.19 & 0.00095 \end{bmatrix}^T, \begin{bmatrix} 0.42 & 0.2205 & 0.21 & 0.00105 \end{bmatrix}^T \right) \tag{26e}$$

where *kLa* <sup>=</sup> <sup>4</sup> *<sup>h</sup>*−<sup>1</sup> is the oxygen mass transfer coefficient. The initial state vector *<sup>x</sup>***<sup>0</sup>** is defined by the interval set P<sup>0</sup> according to Equation (26e). The matrix *A* contains the stoichiometric coefficients corresponding to four reactions according to Equation (27). Each column of this matrix corresponds to one reaction and each row correspond to one component.

$$A = \begin{bmatrix} A\_{glc} \\ A\_{oxy} \\ A\_{qlc} \\ A\_{bio} \end{bmatrix} = \begin{bmatrix} 0 & -9.46 & -9.84 & -19.23 \\ -35 & -12.92 & -12.73 & 0 \\ -39.43 & 0 & 1.24 & 12.12 \\ 1 & 1 & 1 & 1 \end{bmatrix} \tag{27}$$

The flux vector *vk* is obtained by solving the following linear programming problem as Equations (28a)–(28g):

$$\max\_{v\_k} \qquad \qquad A\_{blo} v\_k \tag{28a}$$

subject to <sup>−</sup> *Aoxyvk* <sup>≤</sup> OUR*max* (28b)

$$A\_{\text{acc}} \upsilon\_k \le 100 \Big| \Big| \Big. \tag{28c}$$

$$-\Delta t A\_{glc} \varpi\_k \le \frac{\chi\_{glc,k}}{\chi\_{bio,k}} = \theta\_{1,k} \tag{28d}$$

$$-\Delta t A\_{\alpha \text{xy}} \sigma\_{\text{k}} \le \frac{(1 - k\_L a \Delta t) \mathbf{x}\_{\alpha \text{xy},k} + 0.21 k\_L a \Delta t}{\mathbf{x}\_{\text{bio},k}} = \theta\_{2,k} \tag{28e}$$

$$-\Delta t A\_{\text{acc}} \sigma\_k \le \frac{\varkappa\_{\text{acc},k}}{\varkappa\_{\text{bio},k}} = \theta\_{\text{3,k}} \tag{286}$$

$$-A\_{glc}v\_k \le \frac{\text{GUR}\_{\text{max}} \underline{\chi}\_{glc,k}}{K\_m + \underline{\chi}\_{glc,k}} = \theta\_{4,k} \tag{28g}$$

where OUR*max* = 12 mM/(g-dw·h) is the maximum oxygen uptake rate and g-dw is grams of dry weight of biomass; GUR*max* = 6.5 mM/(g-dw·h) denotes the maximum glucose uptake rate. Equation (28a) describes that the objective of the cells is to maximize the biomass growth rate. Equation (28b) indicates that the oxygen consumption rate is limited by a maximum uptake limit. Equation (28c) indicates that the acetate generation rate is bounded by 100 mM/(g-dw·h). Equation (28g) indicates that the glucose consumption rate is bounded by an upper limit. All the other constraints are positivity constraints to prevent depletion of metabolites. To express these constraints in Equations (28a)–(28g) compactly, the constraints in (28a)–(28g) can be expressed in the form of Equation (3):

$$Gv\_k \le F\theta\_k(\mathbf{x}\_k) + \mathbf{z} \tag{29a}$$

$$\begin{bmatrix} -A\_{\alpha xy} \\ A\_{\alpha \dots} \end{bmatrix}$$

$$\begin{aligned} G &= \begin{bmatrix} A\_{\text{after}} \\ -\Delta t A\_{\text{glc}} \\ -\Delta t A\_{\text{oxy}} \\ -\Delta t A\_{\text{after}} \\ -A\_{\text{glc}} \end{bmatrix} \\ F &= \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ F &= \begin{bmatrix} \text{OUR}\_{\text{max}} \\ 100 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \end{aligned} \tag{29c}$$

#### *3.2. Determination of Minimum Measurements*

Due to the assumption that the initial state is contained in an interval, the problem in Equations (28a)–(28g) can be formulated as a multiparameteric linear programming (mpLP) problem. The vector *θ* is composed of four parameters which are nonlinear functions of states. Using the Multi-Parametric Toolbox 3.0, it can be found that the entire parameter space **Θ** can be decomposed into a maximum of 24 critical regions. For each critical region, the mpLP solver calculates the constraints that form the boundaries of the region and the equations that generate the optimal solutions. In order to reduce the computational effort, extensive simulations are conducted with randomly chosen initial values in set P<sup>0</sup> to identify which critical regions are relevant for the problem. It is found from these simulations that, for the chosen range of initial conditions, the states only traverse through two neighboring critical regions **Θ**<sup>1</sup> and **Θ**<sup>2</sup> assuming small critical regions are ignored. According to the results of the mpLP solver, the two critical regions can be defined as Equations (30a) and (30b). Critical regions **Θ**<sup>1</sup> and **Θ**<sup>2</sup> share a boundary defined in Equation (30c). Since *<sup>θ</sup>* is a function of *x*, the critical regions are next to each other in the state space.

$$\begin{aligned} \boldsymbol{\Theta}^{1}: \begin{bmatrix} -0.9988 & 0 & 0 & 0.0499 & 0 \\ 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & -0.9971 & -0.0767 & 0 \\ 0 & 0 & 0 & -0.0033 & -1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 \end{bmatrix} \boldsymbol{\theta}(\mathbf{x}) \le \begin{bmatrix} 0 \\ -0.6740 \\ 0.0171 \\ 0.0171 \\ 8.7864 \\ 0 \end{bmatrix} \end{aligned} \tag{30a}$$

$$\boldsymbol{\Theta}^{2}: \begin{bmatrix} -0.9988 & 0 & 0 & 0.0499 & 0 \\ 0 & -0.7469 & 0.6630 & 0.0510 & 0 \\ 0 & 0 & -0.0254 & -0.0053 & -0.9997 \\ 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0.9971 & 0.0767 & 0 \\ 0 & 0 & 0 & -1 & 0 \end{bmatrix} \boldsymbol{\theta}(\mathbf{x}) \le \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0.6740 \\ 0 \end{bmatrix} \tag{30b)$$

$$\boldsymbol{\Theta}^{1} \bigcap \boldsymbol{\Theta}^{2}: \begin{bmatrix} 0 & 0 & 0.9971 & 0.0767 & 0 \end{bmatrix} \boldsymbol{\theta}(\mathbf{x}) = 0.6740 \tag{30c}$$

Accordingly, the mpLP solver also calculates the matrix *H* and *g* used in the flux equation Equation (7) for these two critical regions. By taking advantage of the sparseness of *H* for these two critical regions, *<sup>θ</sup>N* can be determined. The equations to calculate fluxes for these two critical regions can be expressed as Equations (31a) and (31b).

$$\mathbf{w}\_{\mathbf{k}}^{\mathbf{1}} = \begin{bmatrix} -0.039 & 0.1057 & 0 & 0 \end{bmatrix}^{T} \boldsymbol{\theta}\_{4} (\mathbf{x}\_{glc,k}) + \begin{bmatrix} 0.3429 & 0 & 0 & 0 \end{bmatrix}^{T} \tag{31a}$$

$$\mathbf{w}\_{\mathbf{k}}^{2} = \begin{bmatrix} 0.5072 & 0 & 0 & 0 \end{bmatrix}^{T} \boldsymbol{\theta}\_{3}(\mathbf{x}\_{\text{ac},k\prime}, \mathbf{x}\_{\text{bi}\boldsymbol{a},k}) + \begin{bmatrix} 0 & 0.1057 & 0 & 0 \end{bmatrix}^{T} \boldsymbol{\theta}\_{4}(\mathbf{x}\_{\text{g}\boldsymbol{l}\boldsymbol{c},k}) \tag{31b}$$

where *<sup>θ</sup><sup>N</sup>* for critical region **<sup>Θ</sup>**<sup>1</sup> is *<sup>θ</sup>*<sup>4</sup> and *<sup>θ</sup><sup>N</sup>* for critical region **<sup>Θ</sup>**<sup>2</sup> is *<sup>θ</sup>*<sup>3</sup> and *<sup>θ</sup>*4. By substituting the flux equation Equations (31a) and (31b) into Equations (26a)–(26e), the simplified state equations of *E. coli* model can be rewritten compactly as in Equations (32a) and (32b).

*xk***+<sup>1</sup>** <sup>=</sup> *Bxk* <sup>+</sup> <sup>Δ</sup>*txbio*,*kAv***<sup>1</sup>** *k*(*xglc*,*k*) + *<sup>h</sup> <sup>θ</sup>***(***xk***)** <sup>∈</sup> **<sup>Θ</sup>**<sup>1</sup> (32a)

$$\mathbf{x}\_{k+1} = \mathbf{B}\mathbf{x}\_k + \Lambda t \mathbf{x}\_{\text{bio},k} A \sigma\_\mathbf{k}^2 (\mathbf{x}\_{\text{ace},k}, \mathbf{x}\_{\text{bio},k}, \mathbf{x}\_{\text{g1c},k}) + h \mathbf{} \tag{32b} \\ \in \Theta^2 \tag{32b}$$

Following the calculations above, the original *E. coli* model is simplified into an equivalent system comprised of two subsystems of interest. Equations (32a) and (32b) describe subsystem 1 and subsystem 2, respectively. These two subsystems are continuous in the state space and they share the same boundary as per Equation (30c). Once the state crosses the boundary between the two subsystems, the governing equation is switched from Equations (32a) and (32b). Because the initial state is randomly initialized in set P0, <sup>P</sup><sup>0</sup> corresponds to a set in **<sup>Θ</sup>**1. Thus, the state evolves within the region of subsystem 1 and gradually approximates the region of subsystem 2 governed by Equation (32b) until finally crossing the boundary given by Equation (30c). As only part of *θ* is known, a detector is used to detect the crossing of the boundary, thus ensuring that the switch between the regions is performed accurately.

Based on the flux equation Equations (31a) and (31b), the reaction-rate-determining states vector *x<sup>i</sup> M* for **<sup>Θ</sup>**<sup>1</sup> are biomass and glucose and for **<sup>Θ</sup>**<sup>2</sup> are biomass, acetate and glucose. Accordingly, the possible combinations of measurements needed for observing *x*<sup>1</sup> *M* of **<sup>Θ</sup>**<sup>1</sup> include Ω<sup>1</sup> <sup>1</sup> <sup>=</sup> {*Bio*}, <sup>Ω</sup><sup>1</sup> <sup>2</sup> <sup>=</sup> {*Glc*} and <sup>Ω</sup><sup>1</sup> <sup>3</sup> = {*Bio*, *Glc*}. Similarly, there are 7 possible combinations of measurements for observing the vector *x*<sup>2</sup> *M* in **<sup>Θ</sup>**2, namely <sup>Ω</sup><sup>2</sup> <sup>1</sup> = {*Ace*}, Ω<sup>2</sup> <sup>2</sup> <sup>=</sup> {*Bio*}, <sup>Ω</sup><sup>2</sup> <sup>3</sup> <sup>=</sup> {*Glc*}, <sup>Ω</sup><sup>2</sup> <sup>4</sup> <sup>=</sup> {*Ace*, *Bio*}, <sup>Ω</sup><sup>2</sup> <sup>5</sup> <sup>=</sup> {*Bio*, *Glc*}, <sup>Ω</sup><sup>2</sup> <sup>6</sup> = {*Ace*, *Glc*}, and Ω<sup>2</sup> <sup>7</sup> = {*Ace*, *Bio*, *Glc*}. To find a combination of measurements ΩΓ that will be suitable for both critical regions, it is necessary to perform an analysis of observability for these combinations. The Symbolic Toolbox calculation of MATLAB R2018a is used to develop an analytical equation observability rank condition and rank of **Φ***<sup>i</sup> k* of the nonlinear system according to the criterion presented in [31]. Since the symbolic expressions of the rank for each critical regions for Equation (11) are very complex, it is very difficult to infer a

analytical condition of observability for all possible values of the states. Instead, the rank values are calculated for different measurement combinations and rank of **Φ***<sup>i</sup> k* using a Monte Carlo algorithm based on 5 million samples of **Θ**<sup>1</sup> and **Θ**2, respectively. According to these Monte Carlo simulations, the only measurement required for observability of the vectors *x***<sup>1</sup>** *M* in **<sup>Θ</sup>**<sup>1</sup> and *<sup>x</sup>***<sup>2</sup>** *M* in **<sup>Θ</sup>**<sup>2</sup> is the biomass concentration, namely ΩΓ <sup>=</sup> {*Bio*}.

#### *3.3. EKF for the Two Subsystems and Detection of Transition between Subsystems*

Based on the aforementioned observability analysis, the biomass concentration is the only state that needs to be measured online as per Equation (33a) for implementation of the EKF. Measurement noise is assumed as a truncated normal distribution as described by Equation (33b). Since the initial P<sup>0</sup> is assumed to be known, the EKF is initialized at the center of P<sup>0</sup> with a variance based on 3 standard deviations and zero covariance terms. The state of the plant is initialized randomly by sampling a point within the region defined by P0.

$$y\_k = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix} \mathbf{x}\_k + r\_k \tag{33a}$$

$$r\_k \sim TN(0, 0.004^2, -0.0004, 0.0004) \quad k = 0, 1, 2 \cdots \tag{33b}$$

Based on the assumed <sup>P</sup>0, in the batch process the EKF starts in critical region **<sup>Θ</sup>**<sup>1</sup> and later it transitions into critical region **Θ**2. Thus, two EKFs are required in this case study to estimate the *xM* as summarized in Table 1. Based on the biomass measurement *yk*, the glucose and biomass concentrations are estimated by the EKF for **Θ**<sup>1</sup> as *x*ˆ*N*,*bio*,*<sup>k</sup>* and *x*ˆ*N*,*glc*,*k*. With the same biomass measurement, the second critical region **Θ**<sup>2</sup> has one more observable state which is the acetate concentration *x*ˆ*N*,*ace*,*k*.

**Table 1.** Observable and unobservable subpace of two subsystems of the DFBM model of *E. coli*.


Since acetate and oxygen are unobservable in **Θ**1, they need to be estimated by bounds. To find these bounds, SME propagates the initial set P<sup>0</sup> by set operations to obtain a prior estimate set P<sup>−</sup> *<sup>k</sup>* as Equation (19). After obtaining the measurement of biomass, a posterior estimate set <sup>P</sup><sup>+</sup> *<sup>k</sup>* as in Equation (20) is calculated by set operations. The error due to lack of convergence of the EKF is compensated by using Equation (18). By projecting <sup>P</sup><sup>+</sup> *<sup>k</sup>* onto the axis of acetate and oxygen, respectively, the upper bound *uun***,***k* and lower bound *lun***,***k* of these two states are obtained.

Since **Θ**<sup>2</sup> has one more flux-determining state, acetate that is not observable from the measurement of biomass, it must be estimated as explained in Equation (22). Using the mean value of *uun*,*ace*,*<sup>k</sup>* and *lun*,*ace*,*<sup>k</sup>* the nominal values of the unobservable state *x*ˆ*un*,*ace*,*<sup>k</sup>* are obtained. Using the EKF estimates of the observable flux-determining states *<sup>x</sup>***ˆ***N***,***k* together with the nominal value of acetate *x*ˆ*un*,*ace*,*k*, the detection scheme explained in Section 2.5 can be implemented. Accordingly, *γ*(*i*, *j*, *k*) is calculated from Equation (23) to determine the switch from critical region **Θ**<sup>1</sup> to critical region **Θ**2. The tolerance of *γ*(*i*, *j*, *k*) to determine the switch between the critical regions is assumed as 0.08. This tolerance is the only tuning parameter of the proposed method and it is determined by trial and error. After the switch occurs the acetate concentration is initialized by the solution of Equation (24) and the variance of acetate is initialized based on Equation (25). After the switch to critical region **Θ**2, the EKF continues to generate estimates of glucose, acetate and biomass concentrations in **<sup>Θ</sup>**<sup>2</sup> and the SME approach is used to propagate the set <sup>P</sup><sup>+</sup> *<sup>k</sup>* as conducted in critical region 1. Figure <sup>4</sup> presents the posterior estimate sets <sup>P</sup><sup>+</sup> and true plant state *x* at different times. Since the model is 4 dimensional, the posterior estimate sets <sup>P</sup><sup>+</sup> are projected for visualization onto two dimensional spaces: the glucose–oxygen subspace and acetate– biomass subspace. The 8 boxes denote the projected posterior estimate sets between 0 h and 7 h, and each box represents an hour. The arrows in Figure 4 indicate the direction of time evolution. The black dots denote the true plant state. Since biomass is measured, the length of the boxes along the biomass dimension is relatively smaller, as compared to the other dimensions. The switch between the critical regions occurs at around 5 h.

**Figure 4.** Posterior estimate sets projected onto the glucose–oxygen subspace and the acetate–biomass subspace at different times.

#### *3.4. Set Membership Estimation*

To verify the estimate and bounds generated by the proposed algorithm, we use a special Monte Carlo Algorithm (MCA) that takes biomass measurements into account. MCA randomly samples 100,000 different points from P<sup>0</sup> and use them as initial states' values, and then calculates the corresponding trajectories with respect to time. Since, for the measurement of biomass, a truncated normal distribution measurement noise was assumed, some trajectories are not within the confidence interval of measurements. Once a trajectory is found out of the measurement range, the evolution of the trajectory is stopped and the corresponding trajectory is removed while trajectories which are still within the confidence interval of measurements are kept. Accordingly, only a part (2581) out of the trajectories starting from P<sup>0</sup> are used for comparison to the bounds calculated by the proposed method. It should be noticed the fraction of trajectories kept for comparison is small because only a very narrow set of solutions are within the measurement range from the from the beginning to the end. In other words, only a small part of the samples considered in the simulation are compatible with the biomass measured trajectory that is assumed for the calculation of bounds by the set-based approach. Using parallel computation, 4 hour and 4 minutes of CPU time were required to complete all simulations. For comparison, the method proposed in this work can generate bounds with only 41 sec of CPU time without parallel computation. It should be remembered that the MCA was conducted for a specific trajectory of biomass measurements so as to enable a fair comparison with the method proposed in the current study. While it could be argued that MCA could be used to calculate bounds for all possible biomass trajectories, this will be computationally prohibitive. Thus, the proposed technique is a practical and analytical approach to the online estimation problem.

In Figure 5, the grey area denotes the trajectories randomly sampled and the two black lines represent the upper and lower bounds by SME. It is clear that the SME contains all the solutions generated by MCA, especially for the unobservable states. It can be observed that the switch from one critical region to the other occurs at approximately 5 h as shown in Figure 2. Before 5 h, the reactor has enough resources for cell growth and the limiting step is glucose uptake as Equation (31a) shows. Thus, critical region **Θ**<sup>1</sup> corresponds to the logarithmic phase of growth where the latter is driven by glucose consumption. At about 5 h, the simultaneous depletion of acetate and glucose leads to a metabolic switch from the logarithmic phase to the stationary phase. Following this metabolic switch, the culture is also acetate limited and thus acetate become a new flux-determining state. Since the oxygen feed rate is maintained constant in the model, the fact that the growth significantly decreases after the switch explains why the oxygen concentration bounces back up.

**Figure 5.** Comparison between MCA with bounds of 4 components estimated by SME in batch fermentation of *E. coli*.

To further verify the proposed scheme, similar MCA simulations were conducted with a larger initial uncertainty and measurement noise. In Figure 6, the bounds of 4 component concentrations estimated by SME are shown. It is clear that the simulated trajectories contained in the grey color band generated by MCA is within the bounds calculated by the proposed methodology. From comparison of Figures 5 and 6, it can be found that the SME approach copes with the larger noise and initial uncertainty by generating larger bounds.

**Figure 6.** Comparison between MCA with bounds of 4 components estimated by SME with loud noise.

#### **4. Discussion**

DFBM models are advantageous since they contain significant detail about the cell metabolism as compared to classical unstructured models. However, due to this level of detail, DFBM contain many states thus resulting in more difficult state estimation problem. The challenge of dealing with a large number of states is further exacerbated by the fact that online measurements of metabolites are generally difficult to obtain or not available. With limited online measurements, it is often impossible to produce observability for all the states. Noticing that the diagonal matrix *B* in Equation (19) is a linear mapping of states, if the nonlinear term <sup>Δ</sup>*txbio*,*kAvk* can be estimated then it is possible to estimate the other states of the DFBM.

Multiparametric LP is introduced to convert the original system into a series of piecewise continuous subsystems based on the partitioning of the parameter space into critical regions. The availability of an explicit expression for the calculation of the LP optima for each critical region significantly simplifies the solution of the problem. Although many critical regions may be mathematically possible, industrial fermentation is operated in a narrow range of initial operating conditions and as such only a few critical regions need to be considered.

Beyond their computational convenience the critical regions identified by the Multiparametric LP approach can be interpreted as corresponding changes in the cell metabolism. The relative abundance of substrates, i.e., glucose, acetate and oxygen in the *E. coli* model and their consumption towards biomass lead to the occurrence of different resources' limitations at any given time. Within some ranges of concentration, the limiting substrate remains the same corresponding to a specific metabolism strategy.

In the *E. coli* example, four reactions can synthesize the biomass from glucose, acetate and oxygen. However, since the objective is to maximize growth subject to constraints, the cell prioritizes these reactions differently at any given time due to their different efficiency for biomass synthesis. The ratio of the stoichiometry coefficients in each column of matrix *A* indicates the biomass yield of each substrate for each reaction. Reaction 1 is the only reaction that consumes acetate to synthesize biomass. The yield of acetate to biomass is 1 39.43 for reaction 1, which is very low compared with reaction 2 and reaction 3. The biomass yield of reaction 2 and reaction 3 by glucose is <sup>1</sup> 9.46 and <sup>1</sup> 9.84 , respectively. Reaction 4 is the only reaction that do not consume oxygen to generate biomass but it is very inefficient. Because the biomass yield of these reactions are different, reaction 2 is preferred over reaction 1 and reaction 3 when glucose and oxygen are abundant. When oxygen is very low, the cells switch their metabolism from aerobic to anaerobic to generate biomass through reaction 4.

To maximize the biomass growth rate, cells take advantage of reaction 1 and 2 to consume as much acetate and glucose as possible when oxygen is sufficient. However, the glucose amounts that can be consumed by the cells is limited by the glucose uptake rate, which is *θ*4. Similarly, oxygen consumption is limited by a constant oxygen uptake rate as in Equation (28b). The oxygen is consumed first with glucose in reaction 2 to synthesize biomass and the remaining oxygen is consumed for reaction 1. Multiparametric LP captures the relative priority of different reactions towards maximization of growth and identify the key limited resources. In critical region **Θ1**, glucose is the key resource that determines the flux vector according to Equation (31a). As glucose and acetate are consumed by reactions 1 and 2, biomass increases exponentially and the oxygen concentration drops fast due to oxygen demands as in Figures 5 and 6. At some point the concentration of acetate becomes very low but acetate is necessary for reaction 2 to synthesize biomass. At this point, acetate becomes the key limited resource and the system enters into a new critical region **Θ2**. Then in **Θ2**, the metabolism is limited by the available acetate and glucose and as they deplete the growth of cells decreases and ultimately stops. Accordingly, **Θ<sup>1</sup>** corresponds to the logarithmic phase and **Θ<sup>2</sup>** to the stationary phase of growth.

The use of EKF for each subsystem is used to estimate the reaction-rate-determining states thus reducing the need for online measurements. Since biomass is highly correlated with the reaction-rate-determining states, EKF can take advantage of biomass measurement to estimate these states. Because some of these reaction-rate-determining states are common to different critical regions, only are fewer states required to be measured or estimated, which greatly reduce the demand of online measurements of concentration. In the *E. coli* example, only biomass needs to be measured. Once biomass is measured, glucose can be estimated by the EKF in critical region **Θ<sup>1</sup>** and glucose and acetate can be estimated in **Θ2**.

By using the SME upper and lower bounds for all states can be generated including the unobservable ones such as acetate and oxygen in **Θ1**. Using the bounds of the acetate and biomass estimates, it was possible to determine the switch from one critical region to another and to re-initialize the estimates and covariance matrix for the EKF after the switch.

This research is helpful in DFBM-based control in bio-processes when many components cannot be measured online. Using the upper and lower bounds calculated by SME of unobservable states and estimates by EKF of observable states, robust control methods can be applied to achieve optimal operation in the presence of uncertainty. The method developed can also be extended to monitor the bio-processes and differentiate between normal and abnormal operations.

#### **5. Conclusions**

This research proposed a comprehensive DFBM-based approach to estimate the metabolites concentrations with a minimal number of online measurements. The main idea is to convert the DFBM model with uncertainty in initial conditions to an explicit variable structure system that can be analyzed by multiparametric linear programming. A key finding of the proposed work is that only a subset of the states, referred to as reaction-rate-determining states, is needed to calculate the flux vector. Identification of the reaction-rate-determining states for each critical region permitted the determination of the minimum set of measurements required for full state estimation. EKFs were used to estimate the observable states and set propagation by SME was used to identify bounds of both the observable states and unobservable states.

**Author Contributions:** Conceptualization, X.S. and H.B.; methodology, X.S. and H.B.; software, X.S.; validation, X.S.; formal analysis, X.S. and H.B.; investigation, X.S. and H.B.; resources, H.B.; data curation, X.S.; writing—original draft preparation, X.S.; writing—review and editing, X.S. and H.B.; visualization, X.S. and H.B.; supervision, H.B.; project administration, H.B.; funding acquisition, H.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by The Natural Sciences and Engineering Research Council of Canada (NSERC) of grant number RGPIN-04609-2019, Mitacs, and Sanofi Pasteur.

**Conflicts of Interest:** We declare that we have no conflicts of interest to this work.

#### **Abbreviations**

The following abbreviations are used in this paper:


#### **References**

