*Review* **How to Tackle Underdeterminacy in Metabolic Flux Analysis? A Tutorial and Critical Review**

**Philippe Bogaerts 1,\*,† and Alain Vande Wouwer 2,\*,†**


**Abstract:** Metabolic flux analysis is often (not to say almost always) faced with system underdeterminacy. Indeed, the linear algebraic system formed by the steady-state mass balance equations around the intracellular metabolites and the equality constraints related to the measurements of extracellular fluxes do not define a unique solution for the distribution of intracellular fluxes, but instead a set of solutions belonging to a convex polytope. Various methods have been proposed to tackle this underdeterminacy, including flux pathway analysis, flux balance analysis, flux variability analysis and sampling. These approaches are reviewed in this article and a toy example supports the discussion with illustrative numerical results.

**Keywords:** flux variability analysis; flux balance analysis; sampling; metabolic network; elementary flux modes

#### **1. Introduction**

Computational approaches for studying the flux distribution inside metabolic networks of microbial strains or mammalian cell lines have gained a tremendous importance in biotechnology. Indeed, the production of high-added value biochemicals is based on large-scale cultures of genetically engineered strains, and the determination of the flux distribution provides insight into the biosynthesis pathways, the impact of metabolic engineering and the influence of the culture conditions. Different approaches have been developed to compute this flux distribution, which are based on a common assumption that the intracellular metabolites do not accumulate, or in other words, that the cell is in a metabolic pseudo-steady state [1]. This assumption leads to a system of mass-balance equations of the form:

$$N\underline{v} = 0 \qquad v\_i \ge 0 \qquad \forall i \tag{1}$$

where *<sup>N</sup>* <sup>∈</sup> <sup>R</sup>*ns*×*nv* is the stoichiometric matrix (and the incidence matrix of the graph representing the metabolic network), *<sup>v</sup>* <sup>∈</sup> <sup>R</sup>*nv* is the vector of intracellular fluxes (in mmol/gDW/h), which are assumed positive (i.e., to have a net direction), and *ns* is the number of intracellular metabolites. *N* is assumed full-row rank, thus defining *ns* independent mass balance equations. This system of equations expresses the zero balance in each internal node of the metabolic network, and imposes a set of linear equality constraints, which are not sufficient to determine a unique solution for the flux vector *v*. This system of equations is often supplemented by additional mass balance equations expressing the link between the intracellular fluxes and the measurements of external fluxes (uptake or production of extracellular metabolites):

$$N\_m \underline{v} = \underline{v}\_m \tag{2}$$

where *vm* <sup>∈</sup> <sup>R</sup>*nm* . Even though this additional information allows restricting the solution space, it is usually not sufficient to define a unique solution. More precisely, a subset of

**Citation:** Bogaerts, P.; Vande Wouwer, A. How to Tackle Underdeterminacy in Metabolic Flux Analysis? A Tutorial and Critical Review. *Processes* **2021**, *9*, 1577. https://doi.org/10.3390/pr9091577

Academic Editor: Andrzej Pawłowski

Received: 31 July 2021 Accepted: 27 August 2021 Published: 2 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the fluxes might be exactly calculable [2] while only intervals of values for the remaining fluxes can be computed. In general, the system of equations under consideration can be formulated as:

$$A\_{\mathfrak{e}} \underline{v} = \underline{b}\_{\mathfrak{e}} \tag{3}$$

$$A\_i \underline{v} \le \underline{b}\_i \tag{4}$$

where the equality constraints (1) and (2) are put together in (3), and Equation (4) contains the positivity constraints as well as other bound constraints, e.g., upper bounds on some of the fluxes, corresponding to prior knowledge or biological assumptions. The matrices *Ae* <sup>∈</sup> <sup>R</sup>*ne*×*nv* and *Ai* <sup>∈</sup> <sup>R</sup>*ni*×*nv* , correspond to *ne* equality constraints and *ni* inequality constraints, respectively. To tackle the underdeterminacy, several approaches have been proposed in recent years, which can be grouped into two distinct strategies:


In the following, we review some of these methods and their implementation with a toy example, which provides a numerical illustration of the main concepts. The Matlab code of this example is provided in the Supplementary Materials associated to this article.

#### **2. A Toy Example**

Despite its small size, the metabolic network (see Figure 1) that is considered to illustrate the several methods introduced in the previous section presents many representative features, e.g., several intracellular metabolites, extracellular substrates, and intra- and extra-cellular products. This network is described by the following reactions:

$$\begin{aligned} S\_1 &\xrightarrow{\varepsilon\_3} M\_1\\ S\_2 &\xrightarrow{\varepsilon\_3} M\_2\\ S\_2 &\xrightarrow{\varepsilon\_3} M\_3\\ M\_1 &\xrightarrow{\varepsilon\_3} M\_4 + M\_5\\ M\_4 &\xrightarrow{\varepsilon\_3} M\_5\\ M\_2 &\xrightarrow{\varepsilon\_3} M\_5\\ M\_5 &\xrightarrow{\varepsilon\_3} P\_1\\ M\_1 + M\_3 &\xrightarrow{\varepsilon\_3} P\_2 \end{aligned}$$

**Figure 1.** Simple metabolic network. *Mi*, *i* = 1, ... , 5 are the internal metabolites, *Si*, *i* = 1, 2 are the extracellular substrates, and *Pi*, *i* = 1, 2 are the extracellular and intracellular product, respectively.

The quasi-steady state assumption for the internal metabolites *Mi*, *i* = 1, ... , 5, yields a system of algebraic mass-balance equations in the form of Equation (1)

$$
\begin{bmatrix}
1 & 0 & 0 & -1 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & -1 & 0 \\
\end{bmatrix}
\begin{bmatrix}
v\_1 \\ v\_2 \\ v\_3 \\ v\_4 \\ v\_5 \\ v\_6 \\ v\_7 \\ v\_8 \\ v\_9
\end{bmatrix} = N\underline{v} = 0 \qquad v\_i \ge 0 \qquad \forall i \tag{5}
$$

The stoichiometric matrix *N* is full row rank (i.e., 5) and the system of equations has 3 degrees of freedom (5 independent equations for 8 unknown fluxes). In an ideal measurement configuration, we consider that the 3 extracellular fluxes can be measured, for instance:

$$
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} \underline{v} = N\_m \underline{v} = \begin{bmatrix} v\_{m1} \\ v\_{m2} \\ v\_{m3} \end{bmatrix} = \begin{bmatrix} 3.5 \\ 2.7 \\ 1.8 \end{bmatrix} = \underline{v}\_{\mathfrak{m}} \tag{6}
$$

so that a unique solution, i.e., flux distribution, can be found:

$$\underline{v} = \begin{bmatrix} 3.5000 \\ 0.0667 \\ 2.6333 \\ 0.8667 \\ 0.8667 \\ 0.0667 \\ 1.8000 \\ 2.6333 \end{bmatrix} \tag{7}$$

In the sequel, we will consider various situations with less measurements, and alternative methods to deal with, reduce or eliminate the underdeterminacy.

#### **3. Dealing with the Underdeterminacy**

The solution space of Equation (5) can be described using the concept of EFMs [3,27] which represent minimal, non-decomposable pathways connecting substrates to products. Every flux in the metabolic network can be described as a convex combination of EFMs:

$$\underline{\omega} = \sum\_{i=1}^{n\_{FEM}} \mu\_i \underline{\omega}\_i = E \underline{\mu}\_i \quad \mu\_i \ge 0 \tag{8}$$

where *nEFM* is the number of EFMs *ei*.

The EFMs can be computed using readily available software such as Metatool [28], EFMtool [29] or FluxModeCalculator [30]. The main issue associated to this computation is the combinatorial explosion of the number of EFMs with the network size (a network of less than 100 reactions can have tens of thousands of EFMs), and the fact that the computation involves some form of enumeration, which requires large computer memory space and computation time. Alternative procedures have therefore been proposed to compute minimal sets of EFMs without enumerating all of them [31]. For the small network under consideration in this review, this computation is trivial and leads to

$$E = \begin{bmatrix} \ \mathfrak{L}\_1 & \mathfrak{L}\_2 & \mathfrak{L}\_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0.5 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.5 \\ 0 & 0 & 0.5 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \tag{9}$$

The three EFMs define a polyhedral cone in the positive orthant, which contains all possible flux distributions. They correspond to a minimal bioreaction system, which provides an input–output representation of the cell metabolism (this kind of representation can be very useful to derive reduced-order macroscopic representations of the culture system [32–34], but this subject will be addressed later on in this paper; see Section 5.4).

$$\begin{aligned} S\_2 &\rightarrow P\_1\\ S\_1 + S\_2 &\rightarrow P\_2\\ 0.5S\_1 &\rightarrow P\_1 \end{aligned} \tag{10}$$

We already know that if three measurements are available, such as the ones defined in Equation (2), the solution is unique and the coefficient vector *μ*<sup>1</sup> *μ*<sup>2</sup> *μ*<sup>3</sup> *<sup>T</sup>* = 0.0667 2.6333 1.7333 *<sup>T</sup>* . If less measurements are available, for instance only *vm*<sup>1</sup> and *vm*2, then the EFM basis of the linear system

$$
\begin{bmatrix} N & \underline{\Omega} \\ N\_m & -\underline{\omega}\_m \end{bmatrix} \begin{bmatrix} \underline{\omega} \\ 1 \end{bmatrix} = \underline{\Omega} \tag{11}
$$

leads to the so-called extreme rays *f <sup>i</sup>* [35]

$$
\begin{bmatrix}
\int\_1 \underline{f}\_1 & \underline{f}\_2
\end{bmatrix} = \begin{bmatrix}
3.5 & 3.5 \\
0 & 2.7 \\
2.7 & 0 \\
0.8 & 3.5 \\
0.8 & 3.5 \\
0 & 2.7 \\
1.6 & 9.7 \\
2.7 & 0
\end{bmatrix} \tag{12}
$$

which defines a pointed polyhedral cone that is a subspace of the previous solution cone. Moreover, the flux spectrum *F*<sup>0</sup> = *v* : *v*min *<sup>i</sup>* <sup>≤</sup> *vi* <sup>≤</sup> *<sup>v</sup>*max *i* with *v*min *<sup>i</sup>* = min{ *fki*, *<sup>k</sup>* = 1, ··· , *<sup>p</sup>*} and *v*max *<sup>i</sup>* = max{ *fki*, *<sup>k</sup>* = 1, ··· , *<sup>p</sup>*} is easily deduced, giving

$$\begin{bmatrix} v\_1 = 3.5\\ 0 \le v\_2 \le 2.7\\ 0 \le v\_3 \le 2.7\\ 0.8 \le v\_4 \le 3.5\\ 0.8 \le v\_5 \le 3.5\\ 0 \le v\_6 \le 2.7\\ 1.6 \le v\_7 \le 9.7\\ 0 \le v\_8 \le 2.7 \end{bmatrix} \tag{13}$$

which indeed encloses the unique solution found with an additional measurement.

An alternative and straightforward way to compute the flux spectrum is provided by Flux Variability Analysis (FVA) [5], which consists in formulating a double optimization problem (minimization/maximization) of the flux distribution under the constraints provided by the metabolic network stoichiometry, the measurements, and any other additional biological constraints.

$$\begin{cases} \boldsymbol{v}\_{i}^{\min} = \min\_{\underline{\boldsymbol{v}}} \boldsymbol{v}\_{i} \\ \boldsymbol{v}\_{i}^{\max} = \max\_{\underline{\boldsymbol{v}}} \boldsymbol{v}\_{i} \end{cases} \forall i \in [1, n\_{\boldsymbol{v}}] $$
 
$$\text{s.t.} \quad \left\{ \begin{array}{l} A\_{i} \underline{\boldsymbol{v}} = \underline{\boldsymbol{b}}\_{i} \\ A\_{i} \underline{\boldsymbol{v}} \le \underline{\boldsymbol{b}}\_{i} \end{array} \right. \tag{14}$$

The unique solution to this problem, if bounded and feasible, can be computed using linear programming, as available in many software libraries (e.g., *linprog* in *Matlab* or other LP solvers such as CPLEX interfaced in the language of your choice such as Python or Julia) or dedicated environments (e.g., COBRA [36] and CellNetAnalyzer [37]). In our application

example, we use *linprog* to compute the flux spectrum in the situation where only the first measurement is available (*vm*1) and a constraint is imposed in the form *v*<sup>2</sup> + *v*<sup>3</sup> ≤ 5.

$$\begin{aligned} A\_{\varepsilon} &= \left[ \begin{array}{cccc} N & & & & \\ & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right] & \underline{\mathbb{b}}\_{\varepsilon} &= \left[ \begin{array}{cccc} 0 & 0 & 0 & 0 & 0 & 0 & 3.5 \end{array} \right]^T \\ A\_{\hat{\imath}} &= \left[ \begin{array}{cccc} -l\_8 & & & \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \end{array} \right] & \underline{\mathbb{b}}\_{\hat{\imath}} &= \left[ \begin{array}{cccc} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 5 \end{array} \right]^T \end{aligned} \tag{15}$$

Thus, the spectrum *F*<sup>0</sup> is given by

$$\begin{bmatrix} v\_1 = 3.5\\ 0 \le v\_2 \le 5\\ 0 \le v\_3 \le 3.5\\ 0 \le v\_4 \le 3.5\\ 0 \le v\_5 \le 3.5\\ 0 \le v\_6 \le 5\\ 0 \le v\_7 \le 12\\ 0 \le v\_8 \le 3.5 \end{bmatrix} \tag{16}$$

The application of FVA can be delicate when the measurements are corrupted by noise, as the constraints imposed by the metabolic network and/or the bounds on the fluxes could become incompatible with the measurement information. Several approaches take account of the measurement uncertainty, such as Flux Spectrum Analysis [6] or Adaptive Flux Variability Analysis [38], which relax the constraints to allow for a feasible solution. Here, we consider a variation of our simple example where only the first measurement *vm*<sup>1</sup> = 10.5 would be available and 3 constraints would be imposed, i.e., *v*2 ≤ 5, *v*5 ≤ 5 , *v*8 ≤ 5. In this case, the matrices become

$$A\_{\ell} = \begin{bmatrix} & & N & & & \\ & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \underline{h}\_{\ell} = \begin{bmatrix} & 0 & 0 & 0 & 0 & 0 & 0 & 10.5 \end{bmatrix}^{T}$$

$$A\_{\ell} = \begin{bmatrix} & & -I\_{8} & & & \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \quad \underline{h}\_{\ell} = \begin{bmatrix} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 5 & 5 & 5 \end{bmatrix}^{T} \tag{17}$$

but unfortunately the LP solver returns the message that no feasible solution can be found. What is happening? In fact, the solution of the FVA problem without the knowledge of the first measurement *vm*<sup>1</sup> returns the following spectrum:

$$\begin{bmatrix} 0 \le v\_1 \le 10 \\ 0 \le v\_2 \le 5 \\ 0 \le v\_3 \le 5 \\ 0 \le v\_4 \le 5 \\ 0 \le v\_5 \le 5 \\ 0 \le v\_6 \le 5 \\ 0 \le v\_7 \le 15 \\ 0 \le v\_8 \le 5 \end{bmatrix} \tag{18}$$

which shows that the maximum admissible value of *v*<sup>1</sup> is 10. Hence, the noisy measurement *vm*<sup>1</sup> = 10.5 is incompatible with this upper bound, and it is not possible to include it as such in the equality constraints. A way round this issue is to introduce inequality constraints in the form

$$(1 - \varepsilon)v\_{m1} \le v\_1 \le (1 + \varepsilon)v\_{m1} \tag{19}$$

where *e* represents a relative uncertainty. In our example, we could choose *e* = 5%, which is the smallest uncertainty leading to the matrices modification

$$\begin{aligned} A\_{\varepsilon} &= \begin{bmatrix} N & \\ & & \\ \end{bmatrix} & & \underline{\underline{h}} &= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^{T} \\ A\_{i} &= \begin{bmatrix} & -I\_{8} & & & \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} & \underline{h} &= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 5 & 5 & 5 & -9.975 & 11.025 \end{bmatrix}^{T} \end{aligned} \tag{20}$$

and a feasible solution

$$\begin{bmatrix} 9.975 \le v\_1 \le 10 \\ 0 \le v\_2 \le 5 \\ 4.975 \le v\_3 \le 5 \\ 4.975 \le v\_4 \le 5 \\ 4.975 \le v\_5 \le 5 \\ 0 \le v\_6 \le 5 \\ 9.95 \le v\_7 \le 15 \\ 4.975 \le v\_8 \le 5 \end{bmatrix} \tag{21}$$

On another matter, it can be convenient to eliminate the equality constraints and to formulate the problem in terms of inequality constraints only in a space of reduced dimension [39]. To this end, we can use the kernel (or null space) of the matrix *Ae*. If *<sup>A</sup>*<sup>0</sup> <sup>∈</sup> <sup>R</sup>*nv*×(*nv*−*ne*) is a matrix whose columns form a basis of this kernel (*nq* <sup>=</sup> *nv* <sup>−</sup> *ne* is the nullity of the kernel for a full row rank matrix *Ae*), then any solution of *Aev* = *be* (Equation (3)) can be expressed as

$$
\underline{v} = \underline{v}\_0 + A\_0 \underline{q} \tag{22}
$$

where *<sup>v</sup>*<sup>0</sup> is a particular solution to Equation (3) and the vector *<sup>q</sup>* <sup>∈</sup> <sup>R</sup>*nq* allows the reformulation of the inequality constraints as

$$\mathcal{A}^0\_i \underline{\underline{q}} \le \underline{\underline{b}}^0\_i \tag{23}$$

with *A*<sup>0</sup> *<sup>i</sup>* <sup>=</sup> *AiA*<sup>0</sup> and *<sup>b</sup>*<sup>0</sup> *<sup>i</sup>* = *bi* − *Aiv*0.

A particular solution *v*<sup>0</sup> can for instance be obtained by solving the following problem (in this case *v*<sup>0</sup> is the flux vector with minimum Euclidean norm in the set of solutions)

$$\underline{v}\_0 = \min\_{\underline{v}} \underline{v}^T \underline{v}$$

$$\text{s.t.} \quad \left\{ \begin{array}{l} A\_{\varepsilon \underline{v}} = \underline{b}\_{\varepsilon} \\ A\_i \underline{v} \le \underline{b}\_i \end{array} \right. \tag{24}$$

To illustrate this, we return to our original example and consider the situation where only the first measurement *vm*<sup>1</sup> = 3.5 is available and a constraint is imposed in the form *v*<sup>2</sup> + *v*<sup>3</sup> ≤ 5. *v*<sup>0</sup> can be computed using quadratic programming, e.g., *quadprog* in Matlab

$$\mathbf{u}\_{0} = \begin{bmatrix} .3.500 & 0 & .2.625 & 0.875 & 0.875 & 0 & 1.750 & 2.625 \end{bmatrix}^{T} \tag{25}$$

The null space of the equality constraint matrix *Ae* <sup>∈</sup> <sup>R</sup>6×<sup>8</sup> can be computed using *null* in Matlab and is given by

$$A\_0 = \begin{bmatrix} 0 & 0.2019 & -0.2846 & 0.2846 & 0.2846 & 0.2019 & 0.7711 & -0.2846 \\ 0 & 0.5994 & 0.2627 & -0.2627 & -0.2627 & 0.5994 & 0.0740 & 0.2627 \end{bmatrix}^T \tag{26}$$

and, in turn

$$A\_i^0 = \begin{bmatrix} 0 & -0.2019 & 0.2846 & -0.2846 & -0.2846 & -0.2019 & -0.7711 & 0.2846 & -0.0827 \\ 0 & -0.5994 & -0.2627 & 0.2627 & 0.2627 & -0.5994 & -0.0740 & -0.2627 & 0.8621 \\ & & & & & & \end{bmatrix}^T \tag{27}$$
 
$$\text{and}$$

*b*0 *<sup>i</sup>* <sup>=</sup> 3.500 0 2.625 0.875 0.875 0 1.750 2.625 2.375 *<sup>T</sup>* (28)

The application of FVA in the q-space (a reduced space of dimension *nq* = 2), i.e., with inequality constraints only

$$\begin{aligned} q\_i^{\min} &= \min\_{\underline{q}} q\_i \\ q\_i^{\max} &= \max\_{\underline{q}} q\_i \end{aligned} \qquad \forall i \in [1, n\_q] \tag{29}$$
 
$$\text{s.t.} \quad A\_i^0 \underline{q} \le \underline{b}\_i^0$$

gdves a spectrum  $Q\_0$ 

$$
\begin{bmatrix}
\end{bmatrix}
\tag{30}
$$

In the reduced q-space, the system of inequality constraints defines half hyperplanes whose intersection consists of a convex polytope that contains all the admissible flux distributions *q*. Uniformly sampling this convex polytope allows subsequently computing the marginal probability density functions (or marginal distributions) of each flux.

In our toy example, the rejection algorithm [40] can be applied, which boils down to uniformly sample each coordinate *qi* (∀*<sup>i</sup>* <sup>∈</sup> [1, *nq*]) on [*q*min *<sup>i</sup>* , *<sup>q</sup>*max *<sup>i</sup>* ]. The obtained sample *q* is kept if it satisfies the inequality constraints *A*<sup>0</sup> *<sup>i</sup> <sup>q</sup>* <sup>≤</sup> *<sup>b</sup>*<sup>0</sup> *<sup>i</sup>* , otherwise it is rejected. The procedure is repeated until the desired number of samples is reached. Figure 2 shows 104 samples obtained with the rejection algorithm. Despite its simplicity and the genuine uniform distribution that it provides, this algorithm cannot be used with high dimensional spaces and irregular shaped polytopes given that the fraction of rejected samples increases dramatically with the number of metabolic fluxes considered in the network.

**Figure 2.** Rejection algorithm in the q-space (mean = X).

Other algorithms have been (and are still) developed to circumvent this problem [41,42]. Among the oldest and simplest methods, hit-and-run algorithms [43] consist of Markov Chain Monte Carlo methods that sample the convex polytope via some specific random walk. While they can be used with high dimensional spaces, their main drawback is that the samples often get stuck in some part of the polytope when this latter exhibits an irregular shape, which is generally the case. Figure 3 represents the marginal distributions of each flux in the v-space (transforming the *q* samples into *v* samples through Equation (22)), inferred from 10<sup>4</sup> samples obtained with the rejection algorithm and with a specific hit-andrun algorithm (namely, the random direction algorithm). Both results are almost identical.

**Figure 3.** Marginal probability distribution in the original v-space, inferred from 10<sup>4</sup> samples with the rejection algorithm (red) and a hit-and-run algorithm (blue).

#### **4. Reducing or Eliminating the Underdeterminacy**

A straightforward way to reduce or eliminate underdeterminacy is to include additional measurements, either extracellular as shown in Equation (6) for our simple example, or intracellular using <sup>13</sup>*C* tracing [13,14] or the integration of time-course absolute quantitative metabolomics [23,24], which would amount to directly measure some of the internal fluxes *vi*, *i* = 1, ... , 8 in the toy example. However, this implies additional time-consuming, delicate, and costly experiments and equipment. If sufficient additional measurements are not available, a candidate flux distribution can be provided by Flux Balance Analysis (FBA) [21,22], which assumes some optimal behavior of the cell, such as maximum cell growth rate or maximum ATP production rate, and formulates a linear programming problem

$$\begin{aligned} \underline{\boldsymbol{\up}} &= \operatorname\*{arg\,min}\_{\underline{\boldsymbol{v}}} \underline{\boldsymbol{\up}}^T \underline{\boldsymbol{v}}\\ \underline{\boldsymbol{s}}.t. & \left\{ \begin{array}{l} A\_{\underline{\boldsymbol{c}}} \underline{\boldsymbol{v}} = \underline{\boldsymbol{b}}\_{\boldsymbol{c}} \\ A\_{\underline{i}} \underline{\boldsymbol{v}} \le \underline{\boldsymbol{b}}\_{\boldsymbol{i}} \end{array} \right. \end{aligned} \tag{31}$$

If the LP is feasible and bounded, then the cost function *J* = *λTv* has a unique minimum value *J*∗, but the corresponding flux distribution *v*ˆ is not necessarily unique, still implying underdeterminacy. In this latter case, it is possible to combine FBA (31) with FVA by subsequently solving a set of 2*nv* LPs

$$\begin{aligned} \upsilon\_i^{\min} &= \min\_{\underline{\mathbf{z}}} \upsilon\_i \\ \upsilon\_i^{\max} &= \max\_{\underline{\mathbf{z}}} \upsilon\_i \\ s.t. \quad \left\{ \begin{array}{l} A\_{\underline{c}\underline{\mathbf{z}}} = \underline{b}\_{\underline{c}} \\ \underline{\lambda}^T \underline{\mathbf{z}} = J^\* \\ A\_i \underline{\mathbf{z}} &\le \underline{b}\_i \end{array} \right. \end{aligned} \tag{32}$$

This FVA problem includes an additional equality constraint enforcing the value *J*<sup>∗</sup> = *λTv* determined in the first FBA step.

This can be illustrated with our toy example, first by applying FBA with the assumption that *v*<sup>7</sup> is maximum. In this case, *v*ˆ7 = 12 and the corresponding flux distribution is given by

*v*ˆ = 3.50 5.00 0 3.50 3.50 5.00 12.0 0 *<sup>T</sup>* (33)

which is confirmed to be the unique solution by applying FVA (which produces a flux spectrum which reduces to the single value *v*ˆ).

If we repeat this exercise with the maximization of *v*8, we find *v*ˆ8 = 3.5, but the corresponding flux distribution is not unique and belongs to the spectrum

$$\begin{bmatrix} v\_1 = 3.5\\ 0 \le v\_2 \le 1.5\\ v\_3 = 3.5\\ v\_4 = 0\\ v\_5 = 0\\ 0 \le v\_6 \le 1.5\\ 0 \le v\_7 \le 1.5\\ v\_8 = 3.5 \end{bmatrix} \tag{34}$$

#### **5. An Overview of Important Topics**

In this section, we address a few important questions when dealing with the analysis of metabolic networks. Some of them have a direct impact on the underdeterminacy of the metabolic flux analysis, such as the topology and size of the metabolic network or the selection of the extra- or intra-cellular measurements. On the contrary, other issues, such as the dynamic evolution of the extracellular fluxes, might have no particular influence on this underdeterminacy, but will impact the computational procedure and the visualization of the results, and this is why we include them in the global picture.

#### *5.1. How to Select the Size/Detail of the Metabolic Network?*

The structure and size of the metabolic network can be represented by an incidence graph, whose topological properties are important for the solution of Equations (1)–(4). The analysis of these properties, such as determinacy, redundancy, balanceability, and calculability, can be assessed using software tools such as CellNetAnalyser [37] or COPASI [44], which were the first to propose such functionalities. The size of the network will also have a tremendous influence on the number of EFMs. A large network will probably imply that the EFMs are no longer enumerable due to memory and computation limitations. Hence, the importance of procedures to compute only subsets of EFMs such as [31], and of concepts such as the giant strong component (GSC), which represents the largest fully connected part of a metabolic network as introduced originally in [45]. Indeed, the GSC usually contains less than one-third of the nodes of the network, but key metabolites, and is more feasible for analysis of flux distribution and computation of EFMs. Another concept of interest is the introduction of minimal cut sets (MCSs), which represent sets of reactions whose removal will disable certain network functions [46], and which have been shown to be the EFMs of a dual network [47]. MCSs can be used to study the observability of reaction rates in metabolic flux analyses. The selection of the network will therefore be a compromise between describing the metabolic features of interest and the tractability of the computation procedure underlying the flux determination. An open research question is the selection of the right metabolic network for the derivation of low-order dynamic models (as introduced in the following Section 5.4). Should a reduction be operated a priori based on biological assumptions and simplifications or only a posteriori, in the course of the derivation of the dynamic macroscopic model?

#### *5.2. Dynamic Metabolic Flux Interval Analysis*

To date, we have considered that the measured specific extracellular fluxes are constant. This is typically the case in the early exponential growth phase of batch cultures or in

the steady state of continuous cultures. However, there are other situations where the fluxes are time varying following changes in the environmental conditions (substrate excess or depletion, accumulation of byproducts). This can occur in fed-batch cultures, the transient phase between batch and continuous modes, or the transient phase between different setpoints in continuous operation. The previous analysis can be extended to these situations by considering a time-scale separation where the bioreactor environment is the slow subsystem and the cells or micro-organisms the fast subsystem, which can therefore be considered in a pseudo steady-state. The dynamics of the extracellular substrates *S* and products *P* can be described by a set of mass balance equations

$$\begin{aligned} \frac{d\underline{S}}{dt} &= -\underline{\underline{v}}\_{S}X - D(\underline{S} - \underline{S}\_{in})\\ \frac{d\underline{P}}{dt} &= \underline{\underline{v}}\_{P}X - D(\underline{P} - \underline{P}\_{in}) \end{aligned} \tag{35}$$

where *S* and *P* represent the extracellular substrate and product concentrations, respectively, *vS* is the vector of specific uptake rates, *vP* the vector of specific production rates, *D* = *Fin*/*V* is the dilution rate (ratio of the inlet flow rate *Fin*(*t*) to the culture volume *V*(*t*)).

A straightforward approach consists in smoothing the extracellular concentration evolutions, computing the derivatives of the smoothed signals and evaluating the uptake and production rates using Equation (35) and the knowledge about the transportation terms (functions of the dilution rate and inlet concentrations *Sin* and *Pin*). This approach was originally developed for MFA with no underdeterminacy or even overdeterminacy. One of the earlier reports can be found in [48] where the lysine fermentation process by Corynebacterium glutamicum is studied and the cell metabolic state is estimated online based on a small metabolic network with 11 fluxes. This work is extended in [49], where HEK cells are cultivated in perfusion and the metabolic fluxes are estimated online using a medium-size metabolic network of 40 reactions. Other notable accounts include [50], where Escherichia coli cultivations shifting from carbon limitation to nitrogen limitation and vice versa are studied, and [51], where a human cell line is analyzed and the authors compare Dynamic MFA (DMFA) to a flux average approach where the culture is divided in phases over which constant (average) fluxes are considered.

When underdeterminacy prevails, the approaches previously reviewed can be extended to take account of the dynamic evolution of the extracellular fluxes as well.

In [52,53], Dynamic Flux Balance Analysis (DFBA) is introduced and applied to the analysis of diauxic growth of Escherichia coli on glucose and acetate. Two optimization approaches can be considered: (a) a sequence of static LP problems corresponding to the subdivision of the batch time into time intervals over which the fluxes are assumed constant, or (b) a global dynamic optimization formulated over the total batch duration that can be solved using multiple shooting and orthogonal collocation to be converted into a NLP. The latter approach allows the consideration of an integrated (over the system trajectory) optimality objective or an end-of-batch objective, as well as nonlinear constraints on the fluxes resulting from a priori knowledge about kinetic expressions, but results in a significantly more complex problem. DFBA is nowadays a popular approach, which has led to many interesting applications (see for instance [54–58]) and the emergence of software tools such as DFBLAB [59].

In [60], Flux Spectrum Approach (FSA), which is a method based on the formulation of a set of min/max LP problems taking account of a range of measurement errors, is applied to the analysis of the evolution over time of the fluxes in small metabolic network of the CHO metabolism and data borrowed from [35].

In [61], a linear objective function subject to elementary modes as constraints is optimized to determine the fluxes in the metabolic network of Corynebacterium glutamicum at different temporal phases of fermentation. The use of convex analysis and EFMs is investigated in [35] to determine intervals for the metabolic fluxes of CHO cells in batch cultures, which are decomposed into several time periods corresponding to different phases of the

cell life cycle. This work is extended to more detailed metabolic networks in [62] and to the dynamic evolution of the flux spectrum, without assuming a decomposition in phases, in [34] where convex analysis is applied to hybridoma cultures switching from batch to perfusion mode.

#### *5.3. How to Represent the Accumulation of Internal Metabolites?*

Besides the possible time evolution of the extracellular fluxes discussed in the previous subsection, another dynamic phenomenon might need special attention: the accumulation of an intracellular metabolite over time, implying that the traditional assumption of quasi steady-state is no longer valid. This situation is, for instance, observed in cultures of photosynthetic micro-organisms that can accumulate various components, such as carbohydrates and lipids. It is also well-known that yeasts, such as Saccharomyces cerevisiae, are able to accumulate some carbohydrates, e.g., trehalose, that play a role of carbon and energy reserve, as well as of stress protectant against harmful environmental conditions [63]. Abandoning the quasi steady state assumption for some of the intracellular metabolites boils down to removing their corresponding rows in the stoichiometric matrix N involved in Equation (1), hence increasing the number of degrees of freedom that characterizes the system underdeterminacy. To compensate for the mass balance equations withdrawn from (1), the kinetics of accumulation and/or reuse of the intracellular metabolites should be included in mass balance ODEs. However, the lack of intracellular measurements along time usually hampers the identification of these intracellular reaction rates. In [64], the authors propose the DRUM (Dynamic Reduction of Unbalanced Metabolism) modeling framework that consists in defining subsets of balanced intracellular metabolites that are interconnected via linking metabolites. These latter may accumulate or be reused. The subsets of balanced metabolites are reduced to macroscopic reactions, based on Elementary Flux Mode analysis, for which simple kinetic models are derived that allow building mass balance ODEs for the linking metabolites, biomass production, substrate consumption and product excretion. This methodology is applied to the lipid and carbohydrate accumulation in the microalgae Tisochrysis lutea. The modeling approach proposed in [65] can be applied to any metabolic network whose kinetics can be locally linearized. As in [64], the authors also consider subnetworks of fast reactions (involving metabolites that are assumed at quasi steady state) connected via metabolites consumed at low rates. Based on this time scale separation, the methodology allows reducing high dimensional linearized models, while accounting for the accumulation of some metabolites, as well as for the dilution effect due to biomass growth. In [66], a FBA-based simulator of Saccharomyces cerevisiae fed-batch cultures is proposed, using the assumption that the intracellular alpha-ketoglutarate is unbalanced. Its dynamics are described with a linear combination of the glucose and nitrogen specific uptake rates whose models involve the alpha-ketoglutarate concentration.

#### *5.4. Model Reduction to Macroscopic Scale*

Macroscopic models mainly predict biomass growth, the consumption of external substrates and the secretion of external products. Their structural simplicity allows their use for bioprocess optimization, control and online monitoring. To these purposes, it can be worthy to reduce metabolic models to a macroscopic scale. In [67,68], the authors propose a systematic methodology to build macroscopic reaction rate models via the definition of additional constraints whose number corresponds exactly to the number of degrees of freedom, i.e., the difference between the number of metabolic fluxes and the number of available equality constraints, hence removing the system underdeterminacy. These constraints express linear combinations of the metabolic fluxes as nonlinear functions of the extracellular concentrations, which correspond to the macroscopic reaction rate models. Another method that has often been used to define macroscopic reactions is the Elementary Flux Mode analysis [32–34,69]. As discussed above, EFMs are the shortest pathways from substrates to products. However, the main drawback with EFMs is that their total number dramatically increases with the network size. To overcome this problem, different solutions have been proposed. To avoid the exhaustive enumeration of all EFMs, refs [31,70] propose fast algorithms that randomly compute minimal sets of EFMs. In [71], the number of EFMs is reduced through a projection from the flux space to the yield space. Refs. [72,73] select EFMs, via ranking or controlled random search algorithms, using a multi-criteria objective function that combines prediction error, model size and efficiency of the EFMs (investment required to produce the enzymes). Another approach is made of Lumped Hybrid Cybernetic Models in which EFMs are grouped into clusters, each of them being associated to an average EFM [74]. In [75], the column generation method is used to determine a (non-necessarily unique) minimal subset of EFMs, which consists in solving iteratively two levels of optimization problems: the master problem is a quadratic optimization problem that identifies the macroscopic flux values using a subset of EFMs, and the subproblem is a linear problem for identifying EFMs that improve the model prediction in the master problem. Based on extensions of Dynamic Metabolic Flux Analysis introduced in [76], that only uses concentration measurements and avoids any numerical differentiation, refs. [77,78] select reduced sets of EFMs via a geometrical reduction (excluding EFMs with a cosine-similarity algorithm) followed by a multi-objective genetic algorithm that minimizes the prediction error and the size of the EFMs subset. A linear optimization problem has been formulated in [79] for selecting the best subset of EFMs based on a relaxation criterion. The methodology is extended in [80] and includes a more efficient selection procedure for the minimal subset of EFMs. Note that once the macroscopic reactions have been deduced from the metabolic network, it remains to identify their kinetic models. To that purpose, general kinetic models and systematic identification procedures can be very useful [81,82]. Model reduction methodologies based on subsets of balanced metabolites interconnected via linking metabolites that may accumulate within cells [64,65], have also been introduced in the previous section. Finally, independently of any EFM computation, macroscopic models may also consist of ODEs describing the mass balances for the biomass and the extracellular species, in which the reaction rates are computed at each time point by solving a FBA problem [16,17,66]. In [66], the underdeterminacy of the FBA problem is taken into account within the set of mass balance ODEs by computing the minimum and maximum admissible values of the ethanol concentration associated to the respective minimum and maximum admissible values of the specific ethanol production rate that are computed through FVA. Hence, the underdeterminacy at the level of FBA leads to corridors of admissible values along time for the concentrations of some macroscopic components, typically the extracellular products that are secreted.

#### *5.5. How to Handle the Measurement Errors?*

In the toy example, the adverse effect of an error on the measured extracellular flux was alleviated by relaxing an equality constraint and reformulating it as two inequality constraints (see Equation (19)). Such errors are frequent in practice, as it is necessary to compute the specific extracellular fluxes from the measurements of the evolution of the biomass and the extracellular concentrations. If we consider, for simplicity, the exponential growth phase of a batch culture, this can be formulated in the following way:

$$\begin{aligned} \frac{d\underline{S}}{dt} &= -\underline{v}\_S X\\ \frac{d\underline{P}}{dt} &= \underline{v}\_P X\\ \frac{dX}{dt} &= \mu X \end{aligned} \tag{36}$$

The solution of these equations are in the form of a linear regression

$$\begin{aligned} \underline{S}(t) &= -\frac{\underline{\mathcal{Z}s}}{\mu} \mathcal{X}(t) + (\underline{\mathcal{Z}}(0) + \frac{\underline{\mathcal{Z}s}}{\mu} \mathcal{X}(0)) = \underline{a}\_1 \mathcal{X}(t) + \underline{b}\_1 \\ \underline{P}(t) &= \frac{\underline{\mathcal{Z}p}}{\mu} \mathcal{X}(t) + (\underline{P}(0) - \frac{\underline{\mathcal{Z}p}}{\mu} \mathcal{X}(0)) = \underline{a}\_2 \mathcal{X}(t) + \underline{b}\_2 \\ \ln(\mathcal{X}(t)) &= \mu t + \ln(\mathcal{X}(0)) = a\_3 t + b\_3 \end{aligned} \tag{37}$$

where the regressors *ai* give the estimation of the specific fluxes. If the environmental conditions evolve over time, it will be necessary to consider the inflows and outflows of the bioreactor as in dynamic model (35), and the variation of the growth rate of the biomass. This will imply the evaluation of the time derivatives using smoothing and numerical differentiation as explained in Section 5.2, or the formulation of the fluxes as piecewise linear functions without the need for numerical differentiation as proposed by [76]. Whatever the numerical procedure, experimental and numerical errors will always corrupt the information about the specific fluxes, which could in turn become incompatible with the constraints imposed by the metabolic network and prior knowledge about the system biology. This has to be taken into account by some form of constraints relaxation such as developed in Flux Spectrum Analysis [6] or Adaptive Flux Variability Analysis [38], where the uncertainty on the extracellular fluxes is represented as an interval around the measured values. In [38], minimum values for these uncertainties are determined by solving a sequence of optimization problems. The impact of these uncertainties on the solution, i.e., on the extent of the intervals for the intracellular fluxes, will largely depend on the structure and size of the metabolic network. This point is, however, still an open research question.

#### *5.6. Some Further Perspectives on Sampling Algorithms*

The ongoing research on sampling algorithms aims at improving their computational efficiency, convergence properties and ability to extensively explore the convex polytope of admissible flux distributions. To avoid the above mentioned problem of the samples that often become stuck in some part of the polytope when using hit-and-run algorithms [43], other methods have been proposed such as the artificial centering hit-and-run method (ACHR) [83] and the optimized general parallel sampler (OPTGP) [9]. Ref. [84] showed that the ACHR algorithms have convergence problems with high-dimensional polytopes, and introduced rounding methods with better performances. These methods were further improved in [10], leading to the coordinated hit-and-run with rounding (CHRR) method that computes the largest ellipsoid inscribed in the polytope and the rounding transformation of this ellipsoid into a unit ball. This latter transformation is then applied to the convex polytope whose sampling therefore becomes much more efficient in terms of computational time and convergence. Regarding these criteria, refs. [41,42] have shown that CHRR outperforms ACHR and OPTGP. Recently, refs. [39,85] have proposed the DISCOPOLIS (DIscrete Sampling of COnvex POlytopes via Linear program Iterative Sequences) algorithm that, instead of being a Markov Chain Monte Carlo method, provides (subsets of) samples that are independent of each other. This allows obtaining larger ranges of admissible flux values. Given the increasing complexity of the available metabolic networks, involving thousands of fluxes, there is still a need for the development and improvement of sampling algorithms with a reasonably low computational time and extensive exploration abilities for irregular shaped polytopes. Another difficult challenge consists in sampling non-convex solution spaces that are observed when using thermodynamical constraints for preventing infeasible loops in the metabolic network [20,84].

#### **6. Conclusions**

This paper reviews and applies to a toy example (the Matlab code of this example is provided in the Supplementary Materials associated to this article) some methods that can be used with underdetermined problems in metabolic flux analysis. These methods are grouped in two different strategies, the former consisting in simply dealing with the underdeterminacy without trying to reduce it, and the latter consisting in reducing or even eliminating the underdeterminacy. The choice between these two strategies depends on the specific problem to be solved and on the goals of the analysis. On the one hand, one could be interested by a unique solution that would be representative of a specific metabolic behavior in the considered cell line, e.g., by adding additional constraints describing these specific conditions and/or by defining an appropriate optimal behavior through FBA. On the other hand, one could be interested by the diversity of the possible metabolic behaviors resulting from the underdeterminacy, e.g., by analyzing the marginal distributions of each flux obtained from a sampling method. The point of utmost importance is to be aware of the system underdeterminacy. For example, even if one solution is provided by an algorithm used to solve the linear program of a FBA problem, that solution is not necessarily unique and a quick check via FVA could highlight that the system remains underdetermined. Many of the methods that have been presented are actually complementary, as illustrated in the abovementioned coupling of FBA with FVA. System underdeterminacy is a direct consequence of biological complexity and of the limited access to intracellular measurements. Developing methods to analyze underdetermined systems and/or to reduce their underdeterminacy in diverse and complementary ways will therefore remain an important research topic in the future. The interested reader might also consider the recent review [86], which includes an in-depth discussion of kinetic approaches for modeling cell metabolism.

**Supplementary Materials:** The following material is available online at https://www.mdpi.com/ article/10.3390/pr9091577/s1, code S1: toy\_example.zip.

**Author Contributions:** Conceptualization, P.B. and A.V.W.; methodology, P.B. and A.V.W.; software, P.B. and A.V.W.; investigation, P.B. and A.V.W.; writing—original draft preparation, P.B. and A.V.W.; writing—review and editing, P.B. and A.V.W.; visualization, P.B. and A.V.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

