1. Introduction
Microalgae are unicellular photosynthetic organisms. They are capable of capturing gases such as CO₂ from internal combustion engines and industries, and converting it into oxygen [1]. Furthermore, some strains of microalgae can thrive under stress conditions while removing hydrogen peroxide, nitrates, and phosphates present in wastewater [2], making microalgae suitable for several bioremediation strategies. In addition, CO₂ capture by microalgae through photosynthesis, and its transformation into industrial raw materials such as carbohydrates, lipids, proteins, pigments, and aromatic compounds, provides an economical and attractive renewable source of these materials [3,4].
A large number of microalgae strains have been studied, revealing several metabolic pathways involved in the synthesis of many secondary metabolites. However, the production rate of these metabolites varies from one species to another, or even within the same species, according to environmental and metabolic conditions. It depends on many factors, such as the type of microalgae and the culture conditions, including temperature, pH, lighting, and nutrient sources [5].
Metabolic models built from genomic sequences give a quantitative view of the transport and metabolism of compounds within a target organism. In addition, these models have long been used to generate optimized design strategies for improved production processes [6].
Most metabolic models of microalgae focus on a single objective function, e.g., biomass. For the particular case of metabolic networks in steady state, Flux Balance Analysis (FBA) is the most commonly used optimization method [7,8]. Equation (1) defines the associated FBA linear optimization problem [8], where $v$ is the flux vector across the reactions. The stoichiometric matrix $S$ represents the metabolic network, with one metabolite per row and one reaction per column. The value of the cell $S_{ij}$ is the stoichiometric coefficient of metabolite $i$ in reaction $j$ [7], and lower and upper bounds limit the fluxes allowed in the metabolic system. The steady-state assumption is established by $S v = 0$ [9].
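As a hedged sketch of the problem described above, the FBA linear program is commonly written in the following standard form. Here $v$ is the flux vector and $S$ the stoichiometric matrix; the objective weight vector $c$ and the bound vectors $v^{\mathrm{lb}}$ and $v^{\mathrm{ub}}$ are notation assumed for illustration and may differ from the symbols used in Equation (1).

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

Choosing $c$ with a single nonzero entry on the biomass reaction gives the usual single-objective case.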
The solution space for FBA is defined by the constraints in Equation (1); within it, FBA optimizes a single bioproduct of interest, usually biomass. Such is the case for the application of FBA to models of photosynthetic organisms, including
Synechocystis sp. PCC 6803 [
10,
11],
Synechococcus sp. PCC 7002 [
12,
13],
Cyanothece sp. ATCC 51142 [
14],
C. reinhardtii [
15],
Anabaena sp.
UTEX 2576 [
16],
Chlorella vulgaris UTEX 395 [
6],
Chlorella variabilis [
17],
Chlorella protothecoides [
18], and
Arabidopsis thaliana [
19] to estimate fluxes and yields.
However, despite the widespread use of FBA to predict fluxes in large-scale networks, it is not always accurate in predicting fluxes in vivo [20]. Moreover, most metabolic models are underdetermined, meaning that multiple solutions might be found. This situation becomes more complex when several bioproducts are optimized simultaneously. A recent trend in metabolic analysis is to optimize several objectives in order to study more than one bioproduct of interest [21,22,23]. In the past decade, this approach can be traced back to the work of Zomorrodi and Maranas [24], who developed the computational framework OptCom for FBA of microbial communities. The foundation of the framework is multi-level optimization: it solves optimization problems nested within one another in a hierarchical structure in order to reach optimum values for the final chosen bioproduct. Budinich et al. [21] extend FBA for microbial communities by defining a Multi-Objective FBA (MOFBA) to study multiple trade-offs between nutrients and growth rates. More recently, Andrade et al. [22] and Pelt-KleinJan [23] propose a multi-objective formulation of FBA that considers nutrient limitations for metabolic analysis.
Multi-objective optimization has been exploited in a wide variety of fields in science and engineering [25,26]. MOFBA, in particular, appears in medicine, where Zhang and Boley [27] proposed a non-linear MOFBA to explain the impact of cellular objectives on the Warburg effect in different cell types. Moreover, the works [21,24,27] simulate genome-scale metabolic models for microbial ecosystems as a single strain exchanging; they use multi-objective flux balance analysis and flux variability analysis (MO-FVA).
The main goal of multi-objective optimization is a good approximation of the Pareto frontier, cf. [28]; Figure 1 illustrates this in a metabolic context for a bi-objective problem that maximizes carbohydrates and proteins [29]. In this field, Multi-objective Evolutionary Algorithms (MOEAs) are widely recognized. In particular, the algorithm NSGAII (Non-dominated Sorting Genetic Algorithm II) proposed in [30,31] has been quite effective when handling two or three objectives [32,33]. Based on the survey in [34], the only related work that uses NSGAII for FBA optimization is by Costanza et al. [35].
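The notion of a Pareto frontier can be made concrete with a minimal sketch of non-dominated filtering for a maximization problem. The function names `dominates` and `pareto_front` and the sample points are illustrative assumptions, not taken from the cited works, and NSGAII itself adds ranking, crowding distance, and genetic operators on top of this test.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b when every objective is maximized.

    a dominates b when it is no worse in all objectives and strictly
    better in at least one of them.
    """
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (carbohydrate, protein) yields, for illustration only.
candidates = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0), (3.0, 1.0)]
front = pareto_front(candidates)
```

In this toy set, the points that trade one objective against the other survive, while points that are worse in both objectives than some alternative are discarded.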
An overall view of the previous analysis indicates that the motivation for using multi-objective optimization in FBA lies in improving the prediction capability of FBA. However, the reviewed approaches do not adequately exploit the versatility of metaheuristics to approximate the Pareto frontier with a moderate consumption of computational resources. In other words, a metaheuristic can better approximate the Pareto frontier and provide a greater diversity of solutions than the previous approaches [25,26].
Hence, this work proposes a novel implementation of the metaheuristic algorithm NSGAII [30] for microalgae growth optimization. The novelty of the proposed NSGAII lies in an original encoding scheme (genotype) and an original fitness evaluation function. While in [35] NSGAII uses a knockout vector as genotype and OptKnock (cf. [36]) as fitness evaluation, the proposed NSGAII uses an encoding scheme that generalizes the previous one and a fitness evaluation function based on FBA. The proposed encoding scheme is a generalization because its search space includes the knockouts. Using FBA instead of OptKnock as the fitness function might significantly impact the performance of the algorithm because, instead of solving a costly combinatorial optimization problem as in OptKnock, it solves a simpler linear optimization problem.
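To illustrate how an encoding can contain knockout vectors as a special case, the sketch below uses a purely hypothetical genotype: one gene in [0, 1] per reaction that scales that reaction's flux bounds. This is an assumption made for illustration, not the encoding actually proposed in this work; the function name `apply_genotype` and the bounds are likewise invented.

```python
from typing import List, Tuple

Bounds = List[Tuple[float, float]]


def apply_genotype(bounds: Bounds, genotype: List[float]) -> Bounds:
    """Scale each reaction's (lower, upper) flux bounds by its gene.

    Hypothetical encoding: a gene of 1.0 leaves the reaction untouched,
    a gene of 0.0 forces zero flux (a knockout), and intermediate values
    restrict the allowed flux range.
    """
    if len(bounds) != len(genotype):
        raise ValueError("one gene per reaction is required")
    scaled = []
    for (lower, upper), gene in zip(bounds, genotype):
        if not 0.0 <= gene <= 1.0:
            raise ValueError("genes must lie in [0, 1]")
        scaled.append((lower * gene, upper * gene))
    return scaled


# Illustrative bounds for three reactions (not from the paper).
wild_type = [(0.0, 10.0), (-5.0, 5.0), (0.0, 8.0)]

# A binary genotype reproduces a knockout vector: reaction 2 is removed.
knockout = apply_genotype(wild_type, [1.0, 0.0, 1.0])

# A continuous genotype reaches points that a knockout vector cannot.
partial = apply_genotype(wild_type, [1.0, 0.5, 0.25])
```

Under this assumed encoding, each candidate's scaled bounds would then be passed to an FBA solve to obtain its fitness, which is a linear program rather than the bilevel problem solved by OptKnock.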
The experimental design validates the approach on a glycolysis module of a reduced metabolic network for the microalga Chlamydomonas reinhardtii [20]. Moreover, the proposed NSGAII is compared against FBA, and the results show that, while the quality of the solutions is maintained, the proximity to an ideal point is statistically improved and a greater diversity of solutions is achieved. Hence, the main contributions are the novel multi-objective optimization problem for metabolic analysis and the metaheuristic algorithm to solve it.
2. Results
Table 1 summarizes the performance of NSGAII and FBA. Column 1 shows each configuration considered. Column 2 shows the quality of solution
achieved by NSGAII. Columns 3 to 5 show the value of
for FBA considering as objectives each of the bioproducts chosen in the associated configuration. Finally, Column 6 presents the number of solutions produced by NSGAII, that is, the number of non-dominated solutions reported by the algorithm.
The solution quality of NSGAII was statistically compared with that of the distinct solutions reported by FBA for each configuration, taking each objective in turn. The null hypothesis $H_0$ tested was that the median of the differences between the two paired group samples is zero. The Wilcoxon statistical test was applied to each pair of group samples, formed by the NSGAII results and the FBA results for one of the objectives. The obtained p-values mean a rejection of $H_0$, indicating a difference between the quality results of NSGAII and FBA, favoring NSGAII due to its lower values.
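The paired comparison described above can be sketched with a small, dependency-free Wilcoxon signed-rank computation. The helper names and the sample values are assumptions for illustration only and are not results from this study. The exact p-value is obtained by enumerating all sign assignments, which is practical only for small samples.

```python
from itertools import product
from typing import List, Sequence, Tuple


def signed_ranks(x: Sequence[float], y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the non-zero paired differences and the ranks of their magnitudes."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied magnitudes.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return diffs, ranks


def wilcoxon_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test with an exact p-value."""
    diffs, ranks = signed_ranks(x, y)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every difference is positive or negative
    # with probability 1/2, so all 2**n sign patterns are equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** len(ranks)


# Made-up quality values (lower is better) for five configurations.
nsgaii_quality = [1.0, 2.0, 3.0, 4.0, 5.0]
fba_quality = [2.0, 4.0, 6.0, 8.0, 10.0]
statistic, p_value = wilcoxon_exact(nsgaii_quality, fba_quality)
```

With only five pairs, the smallest attainable two-sided p-value is 2/32, so a real comparison needs enough configurations for the chosen significance level to be reachable.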
The results in Table 1 show that NSGAII improved on FBA in terms of quality. Considering multiple objectives, NSGAII obtained solutions closer to the ideal point than those obtained by FBA. Moreover, the statistical test confirms that there is indeed a significant difference between these results. In addition, the good performance of NSGAII with respect to the diversity indicator is also confirmed by the values shown in Column 6, where the number of solutions ranges from a few dozen to several hundred, depending on the configuration, while classical FBA usually offers only one solution when Flux Variability Analysis (FVA) is not used.
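A minimal sketch of measuring proximity to an ideal point follows. It assumes the ideal point is the component-wise best value over a set of reference solutions and that closeness is the Euclidean distance in objective space. The exact indicator, any normalization, and the numbers used in this study are not reproduced here, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def ideal_point(points: List[Sequence[float]]) -> Tuple[float, ...]:
    """Component-wise maximum over the points (all objectives maximized)."""
    return tuple(max(p[k] for p in points) for k in range(len(points[0])))


def distance_to_ideal(point: Sequence[float], ideal: Sequence[float]) -> float:
    """Euclidean distance from a solution to the ideal point; lower is better."""
    return math.dist(point, ideal)


# Hypothetical bi-objective values: two single-objective extremes, as a
# single-objective optimizer would return, and one compromise solution.
extremes = [(10.0, 0.0), (0.0, 8.0)]
compromise = (7.0, 6.0)

ideal = ideal_point(extremes)
closest = min(extremes + [compromise], key=lambda p: distance_to_ideal(p, ideal))
```

In this toy case the compromise solution lies closer to the ideal point than either extreme, which is the kind of situation the comparison above describes.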
Figure 2,
Figure 3,
Figure 4 and
Figure 5 offer a perspective of the behavior of NSGAII concerning the spread indicator
applied to the results achieved in the configuration
. All these figures show the ideal point in purple, the three solutions reported by FBA in green, and all the solutions reported by NSGAII as blue circles. Three main observations can be drawn from these plots: (1) the solutions of NSGAII describe the real contour of the Pareto frontier, while the solutions by FBA are only extreme points; (2) there exist solutions closer to the ideal point, even though FBA guarantees optimal solutions for its single objective; and (3) the solutions of NSGAII are widely spread along the Pareto frontier. Hence, the spread of NSGAII improves on that of FBA; this behavior is repeated in all the remaining configurations.
A closer numerical look at the differences between NSGAII and FBA under configuration
is shown in
Table 2. This table compares the fluxes achieved by FBA in each case against some selected solutions reported by NSGAII.
Some additional insights arise from the previous results. First, the variety of configurations used in the comparison demonstrates the versatility of NSGAII to adapt to different circumstances and its capacity to improve the analysis of the metabolic network, given the larger number of solutions produced for each of them.
Second, evolutionary approaches require fewer resources than FBA when dealing with multiple objectives; for example, they spend less time and memory. Approaches such as NSGAII also allow greater freedom of choice in the decision-making process, owing to the variety and number of solutions, and make it easier to recognize the most important fluxes in a network and their influence.
Finally, by analyzing sets of several tens or hundreds of solutions simultaneously, instead of the single solution given by the classical approach, it is possible to gain a better perspective of what is happening in cell metabolism.