#### *3.1. Implementation*

POPI4SB is implemented in Python, and utilizes PySCeS for running simulations. The code for carrying out experiments is available online: https://github.com/jmtomczak/popi4sb. The list of requirements is provided therein.

#### *3.2. Parameter Identification & the Fitness Function*

We consider the glycolysis process in yeast as a biochemical system with inputs and outputs (see Figure 3). The input to the system is glucose (*glu*), and the outputs are ATP (*atp*), NAD (*nad*), acetaldehyde (*ac*), and external acetaldehyde (*ace*). The other metabolites, i.e., triose phosphates (*triop*), pyruvate (*pyr*), fructose-1,6-biphosphate (*fru*) and triphosphoglycerate (*tp*), are considered to be unobserved quantities. The system is governed by 11 reactions with 18 parameters in total (see Appendix for details). Each reaction is represented by a known ordinary differential equation. We assume that we have measurements of the input and the outputs, i.e., *glu*, *atp*, *nad*, *ac*, and *ace*, and that each quantity is represented as a timecourse of length *T*. We denote these measurements by

$$\mathcal{D} = \{\mathit{glu}, \mathit{atp}, \mathit{nad}, \mathit{ac}, \mathit{ace}\}.$$

Further, following the nomenclature presented in [37], we consider the system of differential equations representing the glycolysis process as the **simulator** that, for given values of parameters and initial conditions, provides timecourses of all metabolites. We denote the parameters by **x** and the simulator by sim : X → ℝ<sup>9×*T*</sup>, i.e., sim takes parameters **x** and simulates timecourses of length *T* for all 9 metabolites, including *glu*, *atp*, *nad*, *ac*, and *ace*. In order to calculate the objective (or the fitness) of the parameter values, we use the following function:

$$f(\mathbf{x}; \mathcal{D}) = \sum_{i=1}^{5} \frac{1}{\gamma \cdot T} \sum_{t=1}^{T} \|\mathbf{y}_{i,t} - \text{sim}_{i,t}(\mathbf{x})\|_{2}^{2}, \tag{31}$$

where **y**<sub>*i*,*t*</sub> corresponds to one of the five observed metabolites at the *t*-th time step, sim<sub>*i*,*t*</sub>(**x**) is the corresponding synthetically generated signal given by the simulator with parameters **x**, and *γ* > 0 specifies the strength of penalizing a mistake. Notice that, up to sign and a constant factor, this is the (unnormalized) logarithm of the product of Gaussian distributions with means given by sim(**x**) and a diagonal covariance matrix with a shared variance proportional to *γ*.
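As an illustration, the fitness in Equation (31) can be sketched in a few lines of NumPy. This is a minimal sketch and not the code from the POPI4SB repository: the function name `fitness`, the argument `observed_idx`, and the assumption that the simulator output is a 9×*T* array whose rows can be indexed to select the five observed metabolites are all hypothetical choices made for the example.

```python
import numpy as np


def fitness(y_obs, sim_out, observed_idx, gamma=1.0):
    """Evaluate Equation (31) for one candidate parameter vector.

    y_obs:        array (5, T), measured timecourses of the observed metabolites
    sim_out:      array (9, T), simulator output for all metabolites
    observed_idx: rows of sim_out matching the rows of y_obs (assumed ordering)
    gamma:        positive constant controlling the penalty strength
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_sim = np.asarray(sim_out, dtype=float)[list(observed_idx)]
    if y_obs.shape != y_sim.shape:
        raise ValueError("observed and simulated timecourses must have the same shape")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    T = y_obs.shape[1]
    # Sum of squared errors over metabolites and time steps, scaled by 1 / (gamma * T).
    return float(np.sum((y_obs - y_sim) ** 2) / (gamma * T))


# Toy check with made-up numbers: every observed entry is off by exactly 1.
T = 4
sim_out = np.ones((9, T))
y_obs = np.zeros((5, T))
value = fitness(y_obs, sim_out, observed_idx=[0, 1, 2, 3, 4], gamma=2.0)
print(value)  # 5 * 4 squared errors of 1, divided by (2.0 * 4) -> 2.5
```

Lower values indicate a better match between simulated and measured timecourses, so an optimizer would minimize this quantity over **x**.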

#### *3.3. Simulated Data*

In the experiments, we assume that *glu*, *atp*, *nad*, *ac*, and *ace* are observed. We generate the observed metabolites by running the simulator with the real parameter values. To mimic real measurements, which are typically noisy, we add Gaussian noise with zero mean and a standard deviation equal to 3% of the generated value of a metabolite at a given time step. Adding noise prevents finding a solution (i.e., values of parameters) for which the error defined in Equation (31) equals zero. We repeat all experiments three times. For each repetition, we set the length of a timecourse to *T* = 30.
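The noise model described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the helper name `add_relative_noise`, the use of the absolute value to keep the standard deviation non-negative, and the constant placeholder timecourses are assumptions made for the example.

```python
import numpy as np


def add_relative_noise(clean, rel_std=0.03, rng=None):
    """Add zero-mean Gaussian noise whose std is rel_std times each clean value."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=float)
    # The standard deviation is set per entry: 3% of the value at that time step.
    scale = rel_std * np.abs(clean)
    return clean + rng.normal(loc=0.0, scale=scale)


# Placeholder "simulated" timecourses: 5 observed metabolites, T = 30 steps.
clean = np.full((5, 30), 2.0)
noisy = add_relative_noise(clean, rel_std=0.03, rng=np.random.default_rng(0))
print(noisy.shape)  # (5, 30)
```

Because the noise scales with the signal, entries that are exactly zero stay zero, while larger concentrations receive proportionally larger perturbations.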
