**1. Introduction**

Nowadays, data-driven modeling is a very popular concept, first of all because of the many examples of its successful application to a wide range of tasks for which data samples sufficient for model training are available. However, the term "modeling" originally assumes a wider meaning than just identifying numerical coefficients in equations. One may say that modeling is the art of creating mathematical models that describe processes, events, and systems in mathematical notation. The current successes of artificial intelligence (AI) give the opportunity to come closer to solving the task of mathematical modeling in this original formulation.

For this purpose, we may use the approach of generative design, which assumes the open-ended automatic synthesis of new digital objects, or digital reflections of material objects, that have desired properties and are aligned with possible restrictions. Open-ended evolution is a term that assumes the ongoing generation of novelty in the form of new adaptations of specimens, new entities, and the evolution of evolvability itself [1]. We assume that new objects are objects with essentially new features that appeared during the adaptation process and that cannot be obtained by simple tuning or recombination of initially known parameters. In other words, it is an approach that aims at the algorithmic "growing" of a population of new objects, each of which is aligned with the restrictions and, to some extent, has the desired properties. However, only the objects that maximize the measure of fitness will be used for their intended purpose. Generative design is a well-known concept for the creation of digital twins of material objects [2]. The same idea can be applied to mathematical models [3]. Indeed, it is known that we may grow mathematical expressions that approximate some initial data with a symbolic (usually polynomial) regression approach. However, if we look at mathematical expressions from a wider perspective, we may admit that expressions can be different and much more complicated. For example, we may try to apply this approach to the problem of searching for an equation of mathematical physics that is able to describe observed phenomena. Or, we may want to create, in an automated way, a complicated data-driven model that consists of many single models and feature processing stages. The tasks in both examples can be formalized as the generative design of computer models.


Both cases (the model as a mathematical equation and complicated data-driven models) have their own spheres of application, but they can also be joined as composite models. In machine learning, the composite model case is often described in terms of multi-model data-driven pipelines. If a single data-driven model cannot provide appropriate results, various ensembling techniques like stacking or blending are applied [4]. To achieve better quality, complex modeling pipelines can be used that include different pre-processing stages and can contain several types of models. A generalization of ensembling approaches is the composite model concept [5]. A composite model has a heterogeneous structure, so it can include models of different natures: machine learning (ML), equation-based, etc. [6].

The design of a composite model can be approached from an automated ML (AutoML) perspective, using a genetic algorithm for learning the structure. The evolutionary learning approach seems to be a natural and justified choice for several reasons. First of all, the idea of generative design refers to the possibility of controlled open-ended evolution under a set of restrictions. Second, genetic algorithms give flexible opportunities for treating mixed problems with combinatorial and real-valued parts of a chromosome.

However, the design of the composite model may depend on different factors: the desired modeling quality, computational constraints, time limits, interpretability requirements, etc. This raises the problem of the co-design [7] of the automatically generated composite models with the specific environment. Generative co-design is an approach that makes it possible to jointly synthesize a set (mostly a pair) of objects that will be compatible with each other; in the context of this article, these are mathematical models and the computational infrastructure. The conceptual difference between generative design (which builds the model on the basis of a dataset only) and generative co-design (which takes into account both data and infrastructure) is illustrated in Figure 1. The structure of composite models can be very complex, so it is complicated to construct the models in an expert way. For this reason, different optimization techniques are used for the structural learning of the model. Usually, the objective function for optimization aims to minimize the error of the predictions obtained from the candidate model [8].

**Figure 1.** The description of the generative co-design concept: the different aspects of the model design (genotype, phenotype, and the identification methods); the pipeline of the data-driven modeling; the difference between classical design approach and co-design approach.

The paper is organized as follows. Section 2 describes the existing approaches to the design of models. Section 3 provides the mathematical formulation of the model design and co-design tasks and the associated optimization problems. Section 4 describes the actual issues of generative co-design for mathematical models. Section 5 provides the results of experimental studies for different applications of generative design (composite models, equation-based models, etc.). The unsolved problems of co-design and potential future works are discussed in Section 6. Section 7 provides the main conclusions.

#### **2. Related Work**

An extensive literature review shows many attempts at the design of mathematical models in different fields [9,10]. In particular, methods for automated model design are a highly valuable part of various research efforts [11]. As an example, equation-free methods allow building models that represent multi-scale processes [12]. Another example is the derivation of physical laws from data in the form of functions [13], ordinary differential equation systems [14], or partial differential equations (PDE) [15]. The automated design of ML models or pipelines (which are algorithmically close notions) is commonly named AutoML [8]; although most AutoML solutions work with models of fixed structure, some of them provide the opportunity to automatically construct relatively simple ML structures. A convenient notation for this purpose is the representation of a model as a directed acyclic graph (DAG) [16]. Another example of a popular AutoML tool for pipeline structure optimization is TPOT [17].

To build the ML model effectively in a complicated high-performance environment [18], the properties of both the algorithms and the infrastructure should be taken into account. It is especially important for non-standard infrastructural setups: embedded [19], distributed [20], and heterogeneous [21] systems. Moreover, the adaptation of the model design to specific hardware is an actual problem for deep learning models [22,23].

However, the application of co-design approaches [24] to generative model identification in distributed or supercomputer environments [25,26] still faces a lot of issues. For example, the temporal characteristics of the designed models should be known. The estimations of the fitting and simulation time of the data-driven models can be obtained in several ways. The first is the application of analytical performance models of the algorithm [27]. The identification of the analytical performance models can be achieved using domain knowledge [28]. However, it can be impossible to build this kind of model for a non-static heterogeneous environment. For this reason, empirical performance models (EPMs) are highly applicable to the different aspects of generative model design [29]. Moreover, the effective estimation of execution time is an important problem for the generation of an optimal computational schedule [30] or the mapping of applications to specific resources [31].

The execution of the complex resource-consuming algorithms in the specific infrastructure with limited resources raises the workflow scheduling problem [32]. It can be solved using an evolutionary algorithm [33] or neural approaches [34].

It can be noted that the existing design and co-design approaches are mostly focused on specific applications and do not consider the design of different types of mathematical models. In this paper, we propose a modified formulation of this problem that allows applying the generative design and co-design approaches to different tasks and models.

#### **3. Problem Statement**

The problem of the generative design of mathematical models requires a model representation with a flexible structure and appropriate optimization methods for maximizing a measure of the quality of the designed model. To solve this optimization problem, different approaches can be applied. The widely used approach is based on evolutionary algorithms (e.g., the genetic optimization implemented in the TPOT [35] and DarwinML [16] frameworks) because it allows solving both the exploration and exploitation tasks in a space of model structure variants. Other optimization approaches like random search or Bayesian optimization can also be used, but the populational character of evolutionary methods makes it possible to solve the generative problems in a multi-objective way and produce several candidate models. Such a formulation can also be successfully treated with evolutionary algorithms or hybrid ones that combine the use of evolutionary operators with additional optimization procedures for increasing robustness and accelerating convergence. In this section, we describe the problem of the generative co-design of mathematical models and computational resources in terms of the genetic programming approach.

A general statement of the numerical simulation problem can be formulated as follows:

$$
\Upsilon = \mathcal{H}(M|Z),
\tag{1}
$$

where $\mathcal{H}$ is an operator of simulation with the model *M* on the data *Z*.

In the context of the problem of computer model generative design, the model *M* should have a flexible structure that can evolve by changing (or adding/eliminating) the properties of a set of atomic parts ("building blocks"). For such a task, the model *M* can be described as a graph (or, more precisely, as a DAG):

$$M = \langle S, E, \{a_{1:|A|}\} \rangle, \tag{2}$$

with edges $E$ that denote the relations between the nodes, $S$ that characterizes the functional properties of the atomic blocks, and $a_{1:|A|}$ that denotes the set of their parameters.
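As a minimal illustration of the representation in Equation (2), the genotype can be stored as a list of parameterized atomic blocks plus a list of directed edges. All names here are hypothetical, a sketch rather than the data structures of any particular framework:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class AtomicBlock:
    """One node of the model graph: a functional property from S
    together with its parameters from a_{1:|A|}."""
    function: str                          # e.g., "ridge", "pca", "diffusion_term"
    params: Dict[str, float] = field(default_factory=dict)

@dataclass
class ModelGraph:
    """M = <S, E, {a}>: parameterized nodes and directed edges (a DAG)."""
    nodes: List[AtomicBlock]
    edges: List[Tuple[int, int]]           # (source node index, target node index)
```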

In terms of evolutionary algorithms, each specimen $d_p$ in the population *D* of computer models can be represented as a tuple that consists of the phenotype *Y*, the genotype *M*, and the fitness function $\phi(M)$:

$$d_p = \langle Y_p, M_p, \phi(M_p) \rangle, \quad D = \{d_p\}, \; p \in [1:|D|]. \tag{3}$$

The genotype *M* should be mapped onto a plain vector as a multi-chromosome that consists of three logical parts: the functional properties, the sets of their parameters, and the relations between blocks:

$$M_p = \langle S_p, E_p, \{A_k\}_p \rangle = \langle \{s_{1:|S_p|}\}_p, \{e_{1:|E_p|}\}_p, \{A_{1:|S_p|}\}_p \rangle, \quad A_k = \{a_{1:|A_k|}\}_k, \; k \in [1:|S_p|]. \tag{4}$$

The genotype is also illustrated in Figure 2.

**Figure 2.** The structure of the genotype during evolutionary optimization: functional properties, set of parameters and relations between atomic blocks.

An important property is that $|S_p|, |A_p|, |E_p| \neq \mathrm{const}$, which means that the overall size of the chromosome (and its structure) varies. This property makes the approach truly open-ended and consistent with the idea of model evolution because it gives the opportunity to synthesize models with truly new features instead of a simple recombination and optimization of existing ones. Technically, open-endedness here refers to the ability of the generative design algorithm to expand or narrow the combinatorial search space in the process of optimization with evolutionary operators. This leads to the need for special realizations of the crossover and mutation operators. As the chromosome $M_p$ is an ordered set with the structure fixed in the tuple $\langle S, a_{1:|A|}, E \rangle$, it is necessary to preserve this structure after crossover and mutation. That is why these operators are written relative to the graph structure and act on the parts of the chromosome that describe a node or a set of nodes with the associated edges (sub-graphs). We may say that each single node can be described as some function with parameters, $y_k = f_k(x, a_{1:|A_k|})$, and a mutation of the function $f_k$ performs symbolic changes in the mathematical expression that extend the range of limits of the initial genes.
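To make the variable-length chromosome concrete, the sketch below (building on the `ModelGraph` sketch above; the primitive set is purely illustrative) shows two assumed mutation operators: a point mutation that rewrites a node and a growth mutation that appends a node, so that $|S_p|$ and $|E_p|$ change during evolution:

```python
import copy
import random

PRIMITIVES = ["ridge", "lasso", "knn", "pca"]  # illustrative set of atomic functions

def mutate_node(graph: "ModelGraph") -> "ModelGraph":
    """Point mutation: replace the function (and parameters) of a random node."""
    child = copy.deepcopy(graph)
    idx = random.randrange(len(child.nodes))
    child.nodes[idx] = AtomicBlock(function=random.choice(PRIMITIVES))
    return child

def grow_mutation(graph: "ModelGraph") -> "ModelGraph":
    """Growth mutation: append a new node feeding an existing one, so the
    chromosome length |S_p| (and |E_p|) is not constant across the population."""
    child = copy.deepcopy(graph)
    target = random.randrange(len(child.nodes))
    child.nodes.append(AtomicBlock(function=random.choice(PRIMITIVES)))
    child.edges.append((len(child.nodes) - 1, target))
    return child
```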

So, the task of the generative design of mathematical models can be formulated as an optimization task:

$$p_Q^{\max}(M^*) = \max_{M} f_Q\left(M \mid I^+, T_{gen} \le \tau_g\right), \quad M = \{M_p\}, \tag{5}$$

where $f_Q$ is a fitness function that characterizes the quality of the generated mathematical model, $p_Q^{\max}$ is the maximal value of the fitness function, the model *M* belongs to the space of possible model structures $\{M_p\}$, $I^+$ denotes the actual computational resources, and $T_{gen}$ is the time for model structure optimization, bounded by the critical threshold $\tau_g$. In such a formulation, we try to design the model with the highest quality, but we have to tie the optimization to a single configuration of computational resources. This factor is a strong limitation for the idea of generative design because this idea assumes the flexibility of the searched solution, including the possibility to find the combination of model structure and computational resources most appropriate for the applied task. The concept is illustrated in Figure 1.

Model and computational infrastructure co-design may be formulated as follows:

$$p^{\max}(M^*, I^*) = \max_{M, I} F(M, I \mid T_{gen} \le \tau_g), \quad I = \{I_q\}, \; M = \{M_p\}, \tag{6}$$

where *I* is a set of infrastructure features and *F* is a vector fitness function that characterizes the trade-off between the goodness of fit and the computational intensity of the model structure. The vector function *F* consists of the quality function $f_Q$ and the time function $f_T$, which is taken with a negative sign so that the maximization is correct:

$$F(M, I) = \left( f_Q(M, I), -f_T(M, I) \right). \tag{7}$$

The time function $f_T$ shows the expected execution time of the model being synthesized with the generative design approach. As the model *M* is still in the process of creation at the moment we want to estimate *F*, the values of $f_T$ may be defined by performance models (e.g., Equation (9)). An example of the model selection from the Pareto frontier on the basis of the $p^{\max}$ and $\tau_c$ constraints is presented in Figure 3. It can be seen that the model $M_4$ has the best quality, but it does not satisfy the execution time constraint $\tau_c$.
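A minimal sketch of this selection step (candidate tuples and names are our own illustrative assumptions): among the Pareto-optimal candidates, those violating the time constraint $\tau_c$ are filtered out, and the best remaining model by quality is taken:

```python
def select_from_pareto(frontier, tau_c):
    """frontier: list of (model, f_q, f_t) tuples from the Pareto frontier.
    Drop the models that violate the execution time constraint tau_c, then
    return the feasible model with the highest quality (M3 rather than M4
    in the notation of Figure 3)."""
    feasible = [item for item in frontier if item[2] <= tau_c]
    if not feasible:
        raise ValueError("no model on the frontier satisfies tau_c")
    return max(feasible, key=lambda item: item[1])
```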

However, in most cases, the correct reflection of the infrastructure properties in the model performance is quite a complicated task. In the described case, when we need to generate a model with appropriate quality under vital limitations on the computation time, we have several issues: (1) we may be unable to estimate the model performance with respect to a certain infrastructure in a straightforward way and, as a consequence, we need performance models; (2) the estimation of the dependency between the model structure and the computational resources reflects only a mean tendency, due to the number of simplifications in the performance models, and the search for minima on such uncertain estimations leads to an unstable convergence to local minima. Due to these issues, the formulation of the optimization for co-design at the model building stage may be simplified to the single-criterion problem $F(M, I \mid T_{gen} \le \tau_g) \approx F(M \mid T_M \le \tau_c, T_{gen} \le \tau_g)$, with the replacement of the direct usage of infrastructure features by the estimated time of model execution obtained via performance models, $T_M \approx \hat{T} = f_T(M, I)$:

$$\hat{p}^{\max}(M^*) = \max_{M} \hat{f}_Q(M \mid T_M \le \tau_c, T_{gen} \le \tau_g), \tag{8}$$

where $\hat{f}_Q$ is a single-criterion fitness function that characterizes the goodness of fit of the model under additional limitations on the expected model execution time $T_M$ and the estimated time for structural optimization $T_{gen}$.

**Figure 3.** The Pareto frontier obtained after the evolutionary learning of the composite model in the "quality-execution time" subspace. The points referred to as $M_1$-$M_4$ represent the different solutions obtained during optimization; $p^{\max}$ and $\tau_c$ represent the quality and time constraints.

In the context of automated model building and its co-design with computational resources, performance models (PM) should be formulated as a prediction of the expected execution time with an explicit approximation of the number of operations as a function of the computer model properties $\langle S, a_{1:|S|} \rangle$ and the infrastructure parameters *I*. However, different classes of computer models lead to different properties of the performance models. In the frame of this paper, we address the following classes of models: ML models, numerical models (based on the numerical solution of symbolic equations), and composite models (that may include both ML and numerical models).

For ML models, the PM can be formulated as follows:

$$T_{ML}^{PM}(Z, M) = \max_{i} \left[ \sum_{it} \frac{OML_{i,it}}{V_i(I) + Const_i(I)} \right] + O(I), \tag{9}$$

where $OML = OML(Z, M)$ is the approximate number of operations for the data-driven model with the data volume *Z* and the parametric model *M*, *it* is the iterator over learning epochs, $V_i(I)$ is the performance of the $i$-th computational node in FLOPS, $Const_i(I)$ stands for the constant overheads of the $i$-th node, and $O(I)$ stands for the sequential part of the model code.
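Equation (9) can be transcribed directly into code; the sketch below is a literal reading of the formula with illustrative argument names (the operation counts and node characteristics are assumed to be given):

```python
def t_pm_ml(oml, v, const, o_seq):
    """Expected fitting time of an ML model per Equation (9).
    oml[i][it] - approximate operation count on node i at epoch it,
    v[i]       - performance of node i in FLOPS,
    const[i]   - constant overhead of node i,
    o_seq      - time of the sequential part of the model code, O(I)."""
    # V_i and Const_i do not depend on the epoch, so the inner sum over
    # epochs can be taken over the operation counts directly.
    per_node = [sum(oml[i]) / (v[i] + const[i]) for i in range(len(v))]
    return max(per_node) + o_seq
```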

According to the structure $M = \langle S, E, a_{1:|S|} \rangle$, in the data-driven model case, the pair $\langle S, E \rangle$ characterizes the structural features of the model (e.g., activation functions and layers in neural networks) and $a_{1:|S|}$ characterizes the hyper-parameters.

For numerical models, the PM can be formulated as follows:

$$T\_{Num}^{PM}(R,M) = \max\_{i} \left[ \frac{ON\_i}{V\_i(I) + \text{Const}\_i(I)} \right] + O(I),\tag{10}$$

where $ON = ON(R, M)$ is the approximate number of operations for the numerical model. In distinction to ML models, numerical models do not require learning epochs and do not have a strong dependency on the volume of the input data. Instead, the internal features of the model *M* matter, and it is worth separately denoting the computational grid parameters *R*. They include the parameters of the grid type and the spatial and temporal resolution. Among the most influential model parameters *M*, there are the type and order of the equations and the features of the numerical solution (e.g., the numerical scheme, integration step, etc.).

For composite models, the total expected time of the PM is the sum of the expected times for the sequential parts of the model chain:

$$T_{Comp}^{PM}(R, Z, M) = \sum_{j} \max_{i} \left[ \frac{OC_{i,j}}{V_i(I) + Const_i(I)} \right] + O(I), \tag{11}$$

where the expected time prediction for each sequential part is based on the properties of the appropriate model class:

$$OC = \begin{cases} OML, & \text{if the model is ML}, \\ ON, & \text{if the model is numerical}. \end{cases} \tag{12}$$
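Under the same illustrative conventions as in the sketch for Equation (9), Equations (10)-(12) for a composite model amount to a sum over the sequential parts of the chain, where each part contributes its maximum per-node time and its operation count *OC* is chosen according to the part's class:

```python
def t_pm_composite(parts, v, const, o_seq):
    """Expected time of a composite model per Equations (10)-(12).
    parts - list of dicts {"kind": "ml" | "numerical", "ops": [OC_i, ...]},
            where ops[i] is the operation count of this part on node i
            (OML for ML parts, ON for numerical parts, per Equation (12)),
    v, const, o_seq - as in t_pm_ml above."""
    total = 0.0
    for part in parts:
        ops = part["ops"]
        total += max(ops[i] / (v[i] + const[i]) for i in range(len(v)))
    return total + o_seq
```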

#### **4. Important Obstacles on the Way of Generative Co-Design Implementation**

It may seem that the problem statement described above gives us a clear vision of an evolutionary learning approach for generative design and co-design. However, several subtle points should be highlighted. This section is devoted to a discussion of the most interesting and challenging points (in the authors' opinion) that affect the efficiency or even the possibility of implementing the generative design (and co-design) approach for growing new mathematical models.

#### *Issue 1. Numerical Methods for Computation of Designed Arbitrary Function*

The open-ended realization of automated symbolic model creation with a generative design approach leads to the possibility of obtaining an unknown function as the resulting model. On the one hand, it opens interesting perspectives for creating new approximations of unknown laws. On the other hand, this possibility leads to the first conceptual problem of the generative design of mathematical models and a serious stumbling block on the way to implementing this idea: the need to calculate an arbitrary function or to obtain the numerical solution of an arbitrary equation.

The choice of the numerical method for a given problem (discovered algebraic, ordinary differential, or partial differential equations) is the crucial point. In most cases, a numerical method is designed to solve only several types of equations. When the numerical method is applied to a problem type for which the convergence theorem is not proved, the result may not be considered a solution.

As an example, consider the solution of partial differential equations using finite-difference schemes. For brevity, we omit the details and particular equations; the reader is referred to [36] for details. The classical one-dimensional diffusion equation has different schemes, in particular, the explicit, implicit, and Crank-Nicolson schemes. Every scheme has a different approximation order and may lead to different solutions depending on the time-spatial grid taken. If the Crank-Nicolson spatial derivative scheme is taken to solve another equation, for example, the one-dimensional string equation, then the solution will also depend on the time-spatial grid taken, however, in another manner. This leads to the general problem that particular finite-difference schemes cannot be used for the solution of a general equation.

The second approach is to approximate the solution with a neural network, which somewhat mimics the finite element method. Neural networks are known as universal approximators. However, their utility for the solution of differential equations is still arguable. The main problem is that a good approximation of the field does not necessarily lead to a good approximation of the derivatives [37]. There are many workarounds to approximate the derivatives together with the initial field; however, this is done with a loss of generality.

A possibly promising solution is to combine optimization methods, local neural network approximation, and the classical approach [38]. However, there are still many "white spots", since an arbitrary equation means a strongly non-linear equation with arbitrary boundary conditions. Such generality cannot be achieved at the current time and requires significant development of differentiation, approximation, and numerical evaluation methods. Illustrative examples of the inverse problem solution are shown in Section 5.1.

### *Issue 2. Effective Parallelization of Evolutionary Learning Algorithm*

The procedure of generative design has a high computational cost; thus, an effective realization of the algorithm is highly demanded. Efficiency can be achieved primarily by parallelizing the algorithm. As discussed, the generative algorithm is implemented on the basis of the evolutionary approach, so the first way is the computation of each specimen $d_p$ in a population in a separate thread. Strictly speaking, these may be not only threads but also separate computational nodes of a cluster; however, not to confuse computer nodes with the nodes of a model graph $M_p$, here and further we use the term "thread" in a wide sense. This way is the easiest to implement but will be effective only in the case of cheap computations of the objective function $\phi(M)$.

The second way is the acceleration of each model $M_p$ at the level of its nodes $\langle S, a_{1:|A|} \rangle$ that allow logical parallelization. However, this way seems to be the most effective only if we have uniform (from the performance point of view) nodes of the models $M_p$ and a computational intensity appropriate for the used infrastructure (in other words, each node should be computed in a separate thread in acceptable time). For the cases of composite models and numerical models, this condition is often violated. Usually, a numerical model consists of differential equations that should be solved on large computational grids, and composite models may include nodes that are significantly more computationally expensive than others. All these features lead us to take into account the possibility of parallelization of the generative algorithm at several levels: (1) the population level, (2) the model $M_p$ level, and (3) the level of each node $\langle S, a_{1:|A|} \rangle$; and to adapt the algorithm with respect to a certain task.

Moreover, for the effective acceleration of the generative algorithm, we may take into account that most of the new composite models are based on nodes that are repeated numerously in the whole population. For such a case, we may provide storage for the computed nodes and use them as the results of pre-built models, as sketched below. The illustration of the ineffective and effective parallelization setups described above is shown in Figure 4.
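A minimal sketch of such node-level storage: a memoization cache keyed by the node's function, parameters, and input data signature. This illustrates the idea only and is not the caching mechanism of any specific framework:

```python
import hashlib
import pickle

class NodeCache:
    """Reuses the results of atomic blocks that repeat across the population,
    so each distinct node is computed only once."""

    def __init__(self):
        self._store = {}

    def _key(self, function, params, data_signature):
        raw = pickle.dumps((function, sorted(params.items()), data_signature))
        return hashlib.sha256(raw).hexdigest()

    def get_or_compute(self, function, params, data_signature, compute_fn):
        key = self._key(function, params, data_signature)
        if key not in self._store:
            self._store[key] = compute_fn()  # fit/evaluate the node only on a miss
        return self._store[key]
```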

The set of experiments that illustrates the problem raised in this issue and proposes the possible solutions is presented in Section 5.2.2.

#### *Issue 3. Co-Design of an Evolutionary Learning Algorithm and Computational Infrastructure*

In the frame of this research, the problem of co-design appears not only in the automatic creation of the computer model but also in the generative optimization algorithm itself. In Equation (8), we described the co-design of the generated model regarding the computational resources using the estimation of the model execution time $T_M$. A separate problem is the adaptation of the generative evolutionary algorithm with regard to the computational resources and the specific traits of a certain task. In the formulation of Equation (8), it was only accounted for by the restriction on the overall time $T_{gen}$ for model generation. However, the task can be formulated as a search for the generative evolutionary algorithm that is able to find the best model structure *M* in a limited time $T_{gen}$. This task can be solved by the optimization of the hyper-parameters and evolutionary operators (and the strategy of their usage) of the generative optimization algorithm and formulated as a meta-optimization problem over a set of possible algorithms *U* that are defined by a set of strategies *B*:

$$U = \{u(b)\}, \; B = \{b_{1:|B|}\}, \; b = \langle H, R \rangle, \tag{13}$$

$$
u^* = u(b^*) = \arg\max_{b} F(u(b) \mid T_{gen} \le \tau_g),
\tag{14}
$$

where *F* is a meta-fitness function and each strategy *b* is defined by the evolutionary operators *R* and the hyper-parameters *H*. The evolutionary operators may also be described as hyper-parameters, but here we separate them into the distinct entity *R*.
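For a small finite set of strategies, the meta-problem of Equations (13) and (14) can be sketched as an exhaustive search; the runner function and its signature are assumptions made for illustration:

```python
def meta_optimize(strategies, run_algorithm, tau_g):
    """Equation (14): choose the strategy b* = <H, R> whose algorithm u(b)
    achieves the best meta-fitness within the time budget tau_g.
    strategies    - iterable of (H, R) pairs: hyper-parameters and operators,
    run_algorithm - callable (H, R, time_limit) -> meta-fitness of u(b)."""
    best_strategy, best_fitness = None, float("-inf")
    for hyper_params, operators in strategies:
        fitness = run_algorithm(hyper_params, operators, time_limit=tau_g)
        if fitness > best_fitness:
            best_strategy, best_fitness = (hyper_params, operators), fitness
    return best_strategy, best_fitness
```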


**Figure 4.** Setup that illustrates inefficiency of the parallel evolution implementation due to fitness function computation complexity.

For the model's generative design task, the most expensive step usually refers to the evaluation of the fitness function value [6]. The calculation of the fitness function for the individuals of the evolutionary algorithm can be parallelized in different ways that are presented in Figure 5.

**Figure 5.** Approaches to the parallel calculation of the fitness function within the evolutionary learning algorithm: (**a**) synchronously, where each element of the population is processed on one node until all are processed; (**b**) asynchronously, where one of the nodes controls the calculations on the other nodes.

The described approaches can be used for different variants of the computational environment used for the generation of the models. The practical application of the generated models with a complex structure is almost always difficult because of the high computational complexity of the numerical model-based simulations.

There are several groups of models that can be separated by the simulation pipeline structure. For the data-driven model, the computational cost of the fitting (identification) stage is higher than for the simulation stage. For the equation-based numerical models with rigid structure, there is no implicit fitting stage, but the simulation can be very expensive. In practice, different parallelization strategies can be applied to improve simulation performance [39].

The set of experiments that provides examples of the problem raised in this issue can be seen in Section 5.3.

#### *Issue 4. Computational Strategies for Identification of Graph M*

The problem of the identification of the DAG $M = \langle S, E, a_{1:|S|} \rangle$ has two sides. First of all, the task of the structural and parametric optimization of the model *M* has exponential computational complexity with the growth of the number of nodes. Even if the functional structure $\langle S, E \rangle$ of the composite model is already identified, there is a computationally expensive problem of tuning the parameters $a_{1:|S|}$ (or hyperparameters in ML terms).

However, besides the computational intensity, there is the problem of searching for the optimal set of values $\langle S^*, E^*, a^*_{1:|S|} \rangle$ in a space of high dimension (when the chromosome has a great length, from tens to hundreds of values). This leads to unstable results of the optimization algorithm because of the exponential growth of the number of possible solutions in the combinatorial space (some parameters could be continuous, but they are discretized, and generally the problem may be treated as combinatorial). One of the obvious ways of dealing with such a problem is local dimensionality reduction (or segmentation of the whole search space). This can be done with the application of various strategies. For example, we may simplify the task and search with the generative algorithm only for the functional parts, while the parameters (hyperparameters) may be optimized at the model execution stage (as discussed in Section 6). Such a way is economically profitable, but we will get a result with lower fitness. An alternative variant is to introduce an approach for the iterative segmentation of the domain space and a greedy-like search on each batch (Section 5.4).

Another point should be taken into account: the structure of the DAG with directed edges and ordered nodes (a composite model with primary and secondary nodes) leads to the necessity of introducing sequential strategies for parameter tuning. Although the tuning can be performed simultaneously with the structural learning, there is a common approach of applying it to the best candidates only [16]. Unlike the tuning of individual models, the tuning of the composite models with a graph-based structure can be performed with different strategies, which are represented in Figure 6.

**Figure 6.** The different strategies of hyper-parameter tuning for the composite models: (**a**) individual tuning for each atomic model; (**b**) tuning of the composite model that uses the secondary models to evaluate the tuning quality for the primary models.
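A sketch of the strategy from Figure 6b under stated assumptions (the helper names are hypothetical): the tuner is applied to the best candidates only, and each node is tuned against the quality of the whole composite model rather than in isolation:

```python
def tune_best_candidates(population, fitness, tune_node, n_best=1):
    """Reduce the search space: structural search runs with default
    hyper-parameters, and only the n_best candidates get a tuning pass.
    fitness(model)             - composite-level quality of a model,
    tune_node(node, objective) - tunes one node's parameters against the
                                 objective (here: the whole model's quality)."""
    ranked = sorted(population, key=fitness, reverse=True)
    for model in ranked[:n_best]:
        for node in model.nodes:  # sequential, node-by-node tuning
            tune_node(node, objective=lambda m=model: fitness(m))
    return ranked
```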

The experiment that demonstrates the reduction of the search space for the composite model design by the application of the modified hyperparameters tuning strategy for the best individual is described in Section 5.4.

#### *Issue 5. Estimation of PM for Designed Models*

Analytic formulations of the PM show the expected execution time based on the relation between the approximate number of operations and the computational performance of a certain infrastructure configuration. The problem is to estimate this relation for all pairs of model structures $M = \langle S, E, a_{1:|A|} \rangle$ and computational resources $I = \{I_q\}$ with respect to the input data *Z*, because we need estimations of *OML*, *ON*, and *OC* (depending on the class of models). Generally, there are two ways: (1) the estimation of the computational complexity (in O notation) for each model *M*; (2) the empirical performance model (EPM) estimation of the execution time for every specimen $(M, I, Z)$. The first option gives us theoretically proved results, but it can hardly be implemented in the case of the models' generative design, when we have too many specimens $(M, I, Z)$. The second option is to make a set of experimental studies with execution time measurements for the specimens $(M, I, Z)$. However, in this case, we would need to make a huge number of experiments before starting the algorithm of generative co-design, and the problem statement would become meaningless. To avoid numerous experiments, we may introduce an estimation of the EPM that consists of two steps. The first one is to estimate the relation between the time $T_M$ and the volume of data *Z*: $T^{PM}_{Num}(M, Z, I) \approx T^{EPM}_{Num}(Z \mid M, I)$. To simplify the identification of $T^{EPM}_{Num}(Z \mid M, I)$, we approximate it with a linear function with non-linear kernels $\psi(Z)$:

$$T_{Num}^{EPM}(Z \mid M, I) = \sum_{w=1}^{W} \omega_w \psi_w(Z), \tag{15}$$

where *W* is the number of components of the linear function. The second step is to use the value of $T^{EPM}_{Num}(Z \mid M, I)$ to estimate the relation between the execution time and the infrastructure *I*: $T^{EPM}_{Num}(Z \mid M, I) \rightarrow T^{EPM}_{Num}(I \mid M, Z)$. For this purpose, we should make at least a raw estimation of the number of operations *OML*, *ON*, and *OC*.
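The first step of Equation (15) is an ordinary least-squares fit; a self-contained sketch with assumed kernel choices (e.g., constant, linear, and $z \log z$ terms) could look as follows:

```python
import numpy as np

def fit_epm(z_measured, t_measured, kernels):
    """Fit the weights omega_w of Equation (15) by least squares on
    measured runtimes.
    z_measured, t_measured - arrays of data volumes and observed times,
    kernels - list of callables psi_w, e.g.:
              [lambda z: 1.0, lambda z: z, lambda z: z * np.log(z)]."""
    design = np.column_stack([[psi(z) for z in z_measured] for psi in kernels])
    omega, *_ = np.linalg.lstsq(design, t_measured, rcond=None)
    # Return a predictor T(z) = sum_w omega_w * psi_w(z).
    return lambda z: sum(w * psi(z) for w, psi in zip(omega, kernels))
```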

Using the example of the EPM for the numerical model (Equation (10)), we can make the following assumptions:

$$O(I) \approx 0, \quad Const_i(I) \approx 0, \quad V = \mathrm{mean}_i(V_i(I)), \quad ON = \mathrm{mean}_i(ON_i), \tag{16}$$

$$\max_{i} \left[ \frac{ON_i}{V_i(I) + Const_i(I)} \right] \approx \frac{ON}{V}, \tag{17}$$

and obtain the following transformation for the raw estimation of the overall number of operations *nON* with respect to *n* computational nodes:

$$nON(M, Z) = n \, T_{Num}^{PM}(M, Z, I) \, V(I), \quad i \in [1:n]. \tag{18}$$

It is worth noting that the obvious way to improve the accuracy of the estimation of *nON* is to use, for the experimental setup, resources with characteristics of computational performance close to $V = \mathrm{mean}_i(V_i(I))$ and a task partitioning close to $ON = \mathrm{mean}_i(ON_i)$. Given the estimation of *nON* and the infrastructure parameters $V_i(I)$, $Const_i(I)$, and $O(I)$, we may go to the raw estimation:

$$T_{Num}^{EPM}(M, Z, I) = \max_i \left[ \frac{\alpha_i \, nON(M, Z)}{V_i(I) + Const_i(I)} \right] + O(I), \tag{19}$$

where $\alpha_i$ is the coefficient of the model partitioning. Similar transformations can be made for the other model classes.

The experiments devoted to the identification of the empirical performance models for both atomic and composite models are provided in Section 5.5.

#### **5. Experimental Studies**

The proposed approaches to the co-design of generative models cannot be recognized as effective without experimental evaluation. To conduct the experiments, we constructed a computational environment that includes a hybrid cluster and several multiprocessor nodes that can be used to evaluate different benchmarks.

A set of experiments was held with the algorithm of data-driven partial differential equation discovery to analyze its performance with different task setups. All these experiments were conducted using the EPDE framework described in detail in [15].

The other set of experiments, devoted to the automated design of ML models, was conducted using the open-source FEDOT framework (https://github.com/nccr-itmo/FEDOT). The framework allows generating composite models using evolutionary approaches. The composite model generated by the framework can include different types of models [6]. The following parameters of the genetic algorithm were used during the experiments: the maximum number of generations is 20, the number of individuals in each population is 32, the probability of mutation is 0.8, the probability of crossover is 0.8, the maximum arity of the composite model is 4, and the maximum depth of the composite model is 3. A more detailed setup is described in [40].

#### *5.1. Choice of the Model Evaluation Algorithm*

The first group of experiments is connected with Issue 1, which describes the different aspects of the numerical computation of the designed models.

For example, the problem of data preprocessing for partial differential equation models, represented by the calculation of the derivatives of the input field, is of top priority for the correct operation of the algorithm: an incorrect selection of tools can lead to increasing magnitudes of the noise present in the input data or to high values of numerical errors. An imprecise evaluation of the equation factors can lead to cases when a wrong structure has a lower equation discrepancy (the difference between the selected right-part term and the constructed left part) and, consequently, higher fitness values than the correct governing equation.

However, the versatility of numerical differentiation brings a second criterion to the board. Finite differences require a lot of expertise to choose, and thus their automatic use is restricted: the choice of the finite-difference scheme is not a trivial task and requires either a fine grid to reduce the error or the choice of a particular scheme for the given problem. Both ways require extended time.

Artificial neural networks (ANNs), used to approximate the initial data field, are an alternative to this approach, and they can have a number of significant advantages. To get the fields of derivatives, we utilize automatic differentiation, which is based on an approach similar to the chain rule of elementary calculus and is able to combine the evaluated values of the derivatives of the functions comprising the neural network to get the "correct" values of the derivatives. In contrast to the previously used method of the analytical differentiation of polynomials, automatic differentiation is able to get mixed derivatives. From the performance point of view, the advantages of artificial neural networks lie in the ease of parallelization of tensor calculations and the use of graphics processing units (GPUs) for computation.

However, the task setup has a number of challenges for the approach to ANN training. First of all, the analyzed function is observed on a grid; therefore, we can have a rather limited set of training data. The interpolation approaches can alter the function defining the field, and the derivatives, in that case, will represent the structure of the interpolating function. Next, the issue of the approximation quality remains unsolved. While an ANN can decently approximate a function of one variable (which is useful for tasks of ordinary differential equation discovery), for the multivariable problem statement, the quality of the approximation is relatively low. An example of the approximation is presented in Figure 7.

In the conducted experiments [41], we used an artificial neural network with the following architecture: the ANN comprised 5 fully connected layers of 256, 512, 256, 128, and 64 neurons with sigmoid activation functions. As the input data, the values of the solution function of a wave equation ($u_{tt} = \alpha u_{xx}$), solved with the implicit finite-difference method, were utilized. Due to the nature of the implemented solution method, the function values were obtained on a uniform grid. The training of the ANN was done for a specified number of epochs (500 for the conducted experiments), where at each epoch the training batch was randomly selected as a proportion of all points (0.8 of the total number of points). To obtain the derivatives, the automatic differentiation methods implemented in the TensorFlow package are applied to the trained neural network, as sketched below.
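The following condensed sketch follows the described pipeline (the layer sizes and the number of epochs follow the text; the data arrays `grid_points` and `u_values` and the exact training call are placeholders, not the original experimental code):

```python
import tensorflow as tf

# Fully connected ANN with the architecture described above.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(n, activation="sigmoid") for n in (256, 512, 256, 128, 64)]
    + [tf.keras.layers.Dense(1)]
)
model.compile(optimizer="adam", loss="mse")
# model.fit(grid_points, u_values, epochs=500)  # grid_points: (N, 2) nodes (t, x)

def field_derivatives(points):
    """Automatic differentiation of the trained ANN: the first derivatives
    and the second derivatives u_tt, u_xx with respect to the (t, x) inputs."""
    pts = tf.convert_to_tensor(points, dtype=tf.float32)
    with tf.GradientTape(persistent=True) as outer:
        outer.watch(pts)
        with tf.GradientTape() as inner:
            inner.watch(pts)
            u = model(pts)
        grad = inner.gradient(u, pts)   # grad[:, 0] = u_t, grad[:, 1] = u_x
        u_t, u_x = grad[:, 0], grad[:, 1]
    u_tt = outer.gradient(u_t, pts)[:, 0]   # second time derivative
    u_xx = outer.gradient(u_x, pts)[:, 1]   # second spatial derivative
    del outer
    return grad, u_tt, u_xx
```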

**Figure 7.** Comparison of the equation solution and its approximation by artificial neural networks (ANNs) for a time slice (**a**) and heatmap of the approximation error ($u_{approx} - u_{true}$) (**b**).

Even with the presented very good approximation of the original field, the first derivatives (Figure 8) are obtained with only decent quality and may serve as the building blocks. However, it can be seen that the derivative field is significantly biased.

**Figure 8.** Comparison of the derivatives obtained by polynomial differentiation and by symbolic regression for the first time derivative $u_t$ (**a**) and the first spatial derivative $u_x$ (**b**) for a time slice ($t = 50$).

Further differentiation amplifies the error. The higher-order derivatives shown in Figure 9 cannot be used as the building blocks of the model and do not represent the derivatives of the initial data field.

**Figure 9.** Comparison of the derivatives obtained by polynomial differentiation and by symbolic regression for the second time derivative $u_{tt}$ (**a**) and the second spatial derivative $u_{xx}$ (**b**) for a time slice ($t = 50$).

Both of the implemented differentiation techniques are affected by the numerical errors inevitable in machine calculations and contain errors linked to the limitations of the method (for example, approximation errors). To evaluate the influence of these errors on the discovered equation structure, the experiments were conducted on the simple ordinary differential equation (ODE) (20) with the solution function (21).

$$x(t)\sin t + \frac{dx}{dt}\cos t = 1, \tag{20}$$

$$x(t) = \sin t + C \cos t.\tag{21}$$

We tried to rediscover the equation based on data obtained via the analytical differentiation of function (21), the application of polynomial differentiation, and with the derivative calculated by the automatic differentiation of the fitted neural network. The series of function values and the derivatives are presented in Figure 10. Here, we can see that the proposed ANN can decently approximate the data; the analytical and polynomial differentiation obtain similar fields, while the automatic differentiation algorithm may result in insignificant errors. Ten independent runs of the equation discovery algorithm were performed for each derivative calculation method, and the results with the lowest errors were compared. For the quality metric, we used the mean squared error (MSE) of the vector representing the discrepancy of the function $\bar{x}(t)$, which is the solution of the discovered data-driven equation $M(t) = 0$ (with the aim $|M(t)| \rightarrow \min$), evaluated at the nodes of the grid.
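For illustration, this discrepancy metric can be reproduced on the analytic solution (21); the grid, the constant *C*, and the use of a simple numerical derivative here are our own assumptions, not the exact experimental setup:

```python
import numpy as np

t = np.linspace(0.0, 4.0 * np.pi, 500)    # grid nodes (assumed)
C = 1.0                                    # one member of the family (21)
x = np.sin(t) + C * np.cos(t)              # analytic solution
dx = np.gradient(x, t)                     # grid derivative of the solution

# Residual of Equation (20): M(t) = x sin(t) + x' cos(t) - 1 should be ~0.
residual = x * np.sin(t) + dx * np.cos(t) - 1.0
print(f"discrepancy MSE: {np.mean(residual ** 2):.3e}")
```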

While all of the runs resulted in the successful discovery of the governing equations, the issues with such equations are in the area of the detection of function parameters and calculating the correct coefficients of the equation. The best result was achieved on the data from the analytical differentiation: $MSE = 1.452 \cdot 10^{-4}$. The polynomial differentiation got a similar quality, $MSE = 1.549 \cdot 10^{-4}$, while the automatic differentiation achieved $MSE = 3.236 \cdot 10^{-4}$. It can be concluded that, in the case of first-order equations, the error of the differentiation is of a lower order than all other errors, and thus the fastest method may be used for the given problem. However, in the PDE case, it is complicated to use only first-order derivatives, whereas arbitrary ordinary differential equations may be represented as a system of first-order equations.

**Figure 10.** The solution of ODE from Equation (20), its approximation by neural network, and derivatives calculated by analytic, polynomial and automatic differentiation.
