#### *2.1. A GENERIC Approach to the Learning Procedure*

Recently, W. E and coworkers established a very useful parallelism between DNNs and dynamical systems [33]. Consider a system governed by some state variables $\mathbf{z}(\mathbf{z}_0, t) : I \to S$, $\mathbf{z} \in C^1((0, T])$, such that its time evolution is established in the form

$$\frac{d\mathbf{z}}{dt} = f(\mathbf{z}, t), \ \mathbf{z}(0) = \mathbf{z}\_0 \tag{1}$$

where *f* is, in general, a nonlinear function—otherwise, the method is of little interest. S represents the phase space of the system (a set of judiciously chosen variables that both describe the energy of the system and can be measured experimentally). I = (0, *T*] represents the considered time interval.

For the selected time horizon *T*, the *flow map*

$$\mathbf{z}_0 \to \mathbf{z}(\mathbf{z}_0, T)$$

is a nonlinear function of $\mathbf{z}_0$. The dynamical systems approach to supervised learning consists in determining *f* so that the resulting flow map is able to reproduce the experimental data. This parallelism offers the advantage of the vast body of knowledge developed so far in the field of dynamical systems, which could help us both to develop theoretical insights about DNNs and to devise alternative routes toward efficient learning strategies. DNNs can be thought of as discretized dynamical systems, such that every time step corresponds to a layer of the DNN.
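This parallelism can be made concrete in a few lines. The sketch below is purely illustrative (the one-hidden-layer network `f_theta` and its random weights are our own arbitrary choices, not the paper's architecture): composing residual updates amounts to integrating Equation (1) with forward Euler, so the network's input-output map plays the role of the flow map, with one layer per time step.

```python
import numpy as np

# A residual update z_{k+1} = z_k + dt * f_theta(z_k) is exactly one
# forward-Euler step of dz/dt = f(z, t): the layer index k plays the
# role of the time step. f_theta is an arbitrary one-hidden-layer net.
rng = np.random.default_rng(0)
W1, b1 = 0.5 * rng.standard_normal((8, 2)), np.zeros(8)
W2, b2 = 0.5 * rng.standard_normal((2, 8)), np.zeros(2)

def f_theta(z):
    return W2 @ np.tanh(W1 @ z + b1) + b2

def flow_map(z0, T=1.0, n_layers=50):
    """Compose n_layers residual blocks: the network's input-output map
    approximates the flow map z0 -> z(z0, T) of Equation (1)."""
    z, dt = np.asarray(z0, dtype=float), T / n_layers
    for _ in range(n_layers):
        z = z + dt * f_theta(z)   # one layer = one explicit Euler step
    return z
```

Training such a network then means adjusting the weights of `f_theta` so that the composed map reproduces the measured pairs $(\mathbf{z}_0, \mathbf{z}(\mathbf{z}_0, T))$.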

In this work, we consider viscous-hyperelastic materials. Therefore, the learned function *f* must satisfy certain well-known principles dictated by thermodynamics. In the hyperelastic case, the right choice for *f* could arise from Hamiltonian mechanics, i.e.,

$$\frac{d\mathbf{z}}{dt} = \mathbf{L}(\mathbf{z})\nabla E(\mathbf{z})\tag{2}$$

where *E* represents the Hamiltonian or, in other words, the energy of the system. *L*(*z*) represents the so-called Poisson matrix, a skew-symmetric matrix that, in general, depends on *z*, but that turns out to be constant in many cases.

Should the system of interest not be conservative (or Hamiltonian), a new potential, the entropy *S*, needs to be introduced in the formulation, giving rise to the GENERIC formalism [30]

$$\dot{\mathbf{z}} = \mathbf{L}(\mathbf{z})\nabla E(\mathbf{z}) + \mathbf{M}(\mathbf{z})\nabla S(\mathbf{z}), \ \mathbf{z}(0) = \mathbf{z}_0 \tag{3}$$

For Equation (3) to represent valid nonequilibrium thermodynamic processes, it must be supplemented with the so-called degeneracy conditions, i.e.,

$$\mathbf{L}(\mathbf{z}) \cdot \nabla S(\mathbf{z}) = \mathbf{0} \tag{4}$$

$$\mathbf{M}(\mathbf{z}) \cdot \nabla E(\mathbf{z}) = \mathbf{0} \tag{5}$$

Since, as stated before, *L* is skew-symmetric, and choosing *M* to be symmetric and positive semidefinite, we obtain

$$\dot{E}(\mathbf{z}) = \nabla E(\mathbf{z}) \cdot \dot{\mathbf{z}} = \nabla E(\mathbf{z}) \cdot \mathbf{L}(\mathbf{z}) \nabla E(\mathbf{z}) + \nabla E(\mathbf{z}) \cdot \mathbf{M}(\mathbf{z}) \nabla S(\mathbf{z}) = 0 \tag{6}$$

i.e., the first term vanishes by the skew-symmetry of *L* and the second by the degeneracy condition (5) together with the symmetry of *M*; one thus ensures the conservation of energy in closed systems.

In turn,

$$\dot{S}(\mathbf{z}) = \nabla S(\mathbf{z}) \cdot \dot{\mathbf{z}} = \nabla S(\mathbf{z}) \cdot \mathbf{L}(\mathbf{z}) \nabla E(\mathbf{z}) + \nabla S(\mathbf{z}) \cdot \mathbf{M}(\mathbf{z}) \nabla S(\mathbf{z}) \ge 0 \tag{7}$$

which guarantees the fulfillment of the second principle of thermodynamics: the first term vanishes by the degeneracy condition (4), while the second is nonnegative because *M* is positive semidefinite. Thus, we notice how, by leveraging the dynamical systems equivalence, we efficiently enforce the conservation of energy and the production of entropy. For an in-depth discussion of the implications of the choice of phase space variables *z*, we refer the interested reader to our previous works in the field [31,32]. In general, it is well known that neither a particular choice of variables nor a particular scale of description of the system at hand is required. The only requirement is that the phase-space variables be able to account for the energy of the system. That is why, in general, a Langevin equation is not suitable for this purpose; see [35] and references therein.
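These structural guarantees can be verified numerically on a toy system. The sketch below is our own illustrative example, not the material model of this paper: it evolves $\mathbf{z} = (q, p, e)$, a harmonic oscillator whose dissipated power is pumped into an internal-energy variable $e$, with $E$, $S$, $\mathbf{L}$, and $\mathbf{M}$ chosen so that the degeneracy conditions (4) and (5) hold exactly. An explicit Euler integration of Equation (3) then conserves $E$ up to discretization error and produces a nondecreasing entropy.

```python
import numpy as np

# Toy GENERIC system (illustrative choices throughout):
#   E(z) = q**2/2 + p**2/2 + e  ->  grad E = (q, p, 1)
#   S(z) = log(e)               ->  grad S = (0, 0, 1/e)
gamma, dt, n_steps = 0.5, 1.0e-3, 2000

L = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])        # skew-symmetric Poisson matrix

def M(z):
    """Friction matrix gamma * w w^T with w = (0, 1, -p): symmetric,
    positive semidefinite, and M @ grad E = 0 by construction (Eq. (5))."""
    w = np.array([0.0, 1.0, -z[1]])
    return gamma * np.outer(w, w)

def grads(z):
    return np.array([z[0], z[1], 1.0]), np.array([0.0, 0.0, 1.0 / z[2]])

def energy(z):
    return 0.5 * (z[0] ** 2 + z[1] ** 2) + z[2]

z = np.array([1.0, 0.0, 1.0])
E0, entropies = energy(z), [np.log(z[2])]
for _ in range(n_steps):               # explicit Euler on Eq. (3)
    gE, gS = grads(z)
    z = z + dt * (L @ gE + M(z) @ gS)
    entropies.append(np.log(z[2]))
```

The damping term $-\gamma p/e$ removes mechanical energy, but exactly that amount reappears in $e$, so the total energy drift is only the $O(\Delta t)$ error of the integrator, while the entropy increases monotonically.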

Following the dynamical systems equivalence, the next step consists in the determination, by regression from data, of the form of the elements of the GENERIC description of the dynamics, i.e., *L*, *M*, ∇*E*(*z*), and ∇*S*(*z*). To do so, assume a standard finite difference discretization of the time derivative,

$$\frac{\mathbf{z}_{n+1} - \mathbf{z}_n}{\Delta t} = \mathsf{L}(\mathbf{z}_{n+1})\mathsf{DE}(\mathbf{z}_{n+1}) + \mathsf{M}(\mathbf{z}_{n+1})\mathsf{DS}(\mathbf{z}_{n+1})\tag{8}$$

where we denote $\mathbf{z}_{n+1} = \mathbf{z}_{t+\Delta t}$, and where L and M are the discrete versions of the Poisson and friction operators, respectively. DE and DS represent the discrete gradients. In general, the matrix L remains constant throughout the process, while the matrix M frequently varies.

While there exist machine learning techniques that are able to provide the precise expressions of the terms involved in a particular PDE from data [2], we pursue a purely numerical route, in which we assume that the elements of the GENERIC formulation have a particular manifold structure which is to be unveiled by our method. This is the concept of *constitutive manifold* that we first stated in our previous works [4,7,36].

The dynamical systems approach to the problem will thus consist of solving the following (possibly constrained) minimization problem within a time interval J ⊆ I:

$$\mu^* = \{\mathsf{L}, \mathsf{M}, \mathsf{DE}, \mathsf{DS}\} = \operatorname*{arg\,min}_{\mu} \|\mathbf{z}(\mu) - \mathbf{z}^{\mathrm{meas}}\| \tag{9}$$

with $\mathbf{z}^{\mathrm{meas}} \subseteq Z$ a subset of the total available experimental results. L, M, DE, and DS will in general depend on $\mathbf{z}$, and therefore the chosen size of $\mathbf{z}^{\mathrm{meas}}$ (in other words, the number of regressions performed along the time interval or, equivalently, the number of layers in the DNN) will affect the accuracy of the method. An analysis of the influence of this choice was made in [31].

We finally approximate $\mathbf{z}$ by employing piecewise polynomials, so that the gradient operators can be cast in matrix form, à la finite elements, as

$$\mathsf{DE} = \mathbf{A}\mathbf{z} \tag{10}$$

$$\mathsf{DS} = \mathbf{B}\mathbf{z} \tag{11}$$

where *A* and *B* represent the discrete, matrix form of the gradient operators leading to DE and DS, respectively.
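A minimal numerical sketch of the regression in Equations (8)-(11) follows. Purely for illustration, we assume L and M are known and fixed, generate the data with a forward (rather than backward) Euler trajectory, and do not enforce the degeneracy conditions; the entries of *A* and *B* are then linear unknowns via $\mathrm{vec}(\mathsf{L}\mathbf{A}\mathbf{z}) = (\mathbf{z}^\top \otimes \mathsf{L})\,\mathrm{vec}(\mathbf{A})$, so a single least-squares solve fits the model.

```python
import numpy as np

# Ground truth (illustrative): z in R^2, fixed skew-symmetric L, rank-1
# PSD M, and linear discrete gradients DE = A z, DS = B z, Eqs. (10)-(11).
L = np.array([[0.0, 1.0], [-1.0, 0.0]])
w = np.array([[1.0], [-1.0]])
M = 0.1 * (w @ w.T)                        # symmetric, PSD
A_true = np.array([[2.0, 0.0], [0.0, 1.0]])
B_true = 0.5 * np.eye(2)

dt, n_steps = 1.0e-3, 2000
z = np.empty((n_steps + 1, 2))
z[0] = [1.0, 0.0]
for n in range(n_steps):                   # forward-Euler data, cf. Eq. (8)
    z[n + 1] = z[n] + dt * (L @ (A_true @ z[n]) + M @ (B_true @ z[n]))

# Eq. (9) as linear least squares: vec(L A z) = (z^T kron L) vec(A),
# and analogously for M and B.
rhs = ((z[1:] - z[:-1]) / dt).ravel()
design = np.vstack([np.hstack([np.kron(zn, L), np.kron(zn, M)])
                    for zn in z[:-1]])
coef, *_ = np.linalg.lstsq(design, rhs, rcond=None)
A_fit = coef[:4].reshape(2, 2, order="F")  # undo the column-major vec
B_fit = coef[4:].reshape(2, 2, order="F")
```

Because M is rank-deficient, *B* itself is only determined up to components in the kernel of M; what the regression pins down, and what matters for Equation (9), is the reconstructed right-hand side of the dynamics.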

The GENERIC description of a hyperelastic material is well known [37]. Indeed, for the Hamiltonian, conservative part of the constitutive equation, we have

$$\mathbf{z}(\mathbf{X},t) = \left[\mathbf{x}(\mathbf{X},t), \mathbf{p}(\mathbf{X},t)\right]^\top \tag{12}$$

where $\mathbf{x} = \boldsymbol{\varphi}(\mathbf{X}) \in \Omega_t \subset \mathbb{R}^3$ represents the position of the material point $\mathbf{X}$ in the deformed configuration of the solid, $\Omega_t$, at time $t$, and $\mathbf{p} \in \mathbb{R}^3$ represents the material momentum density. In this case,

$$\dot{\mathbf{z}} = \begin{bmatrix} \dot{\mathbf{x}} \\ \dot{\mathbf{p}} \end{bmatrix} = \mathbf{L}\nabla E = \begin{bmatrix} \mathbf{0}_{3 \times 3} & \mathbf{I}_{3 \times 3} \\ -\mathbf{I}_{3 \times 3} & \mathbf{0}_{3 \times 3} \end{bmatrix} \begin{bmatrix} \frac{\partial E}{\partial \mathbf{x}} \\ \frac{\partial E}{\partial \mathbf{p}} \end{bmatrix} \tag{13}$$

The total energy of a hyperelastic body is known to be the sum

$$E = \mathcal{W} + \mathcal{K} \tag{14}$$

of elastic and kinetic energies. Here, we assume a strain energy density potential *w* of the form

$$\mathcal{W} = \int\_{\Omega\_0} w(\mathcal{C}) \, d\Omega \tag{15}$$

where $\Omega_0$ represents the undeformed configuration of the solid and $\mathbf{C}$ represents the right Cauchy-Green deformation tensor. In the most general case, the strain energy density would take the form $w = w(\mathbf{X}, \mathbf{C}, S)$. If the material under consideration is isotropic hyperelastic, we simply write $w = w(\mathbf{C})$. In turn, the kinetic energy will be

$$\mathcal{K} = \int_{\Omega_0} \frac{1}{2\rho_0} |\mathbf{p}|^2 \, d\Omega \tag{16}$$

Therefore, by numerically identifying the form of the energy potential, one readily observes that the conservative part of the usual constitutive hyperelastic law is found. If, in addition, the material shows a viscous dissipative behavior, the precise form of the entropy potential is to be found.

Once the learning method has been developed, we describe next how the experimental campaign was accomplished.

#### *2.2. Treatment of Dispersion and Noise in Data*

One of the most important aspects to consider when dealing with soft living tissues is the importance of the dispersion of experimental results. This will be highlighted in Section 3 below. Since the main objective of the proposed technique is the determination of a constitutive manifold for the terms of the GENERIC description of the physical phenomena, this type of dispersion needs to be treated efficiently.

To this end, we suggest to employ Topological Data Analysis (TDA) [38,39]. Behind the proposed method is the assumption of the existence of a *constitutive manifold*, a concept that we first introduced in [4]. Our set of experimental measurements, in the most general case, is assumed to be composed of *D*-tuples (*D* = 9 for this particular case) of the type

$$\mathcal{S} = \{ \mathbf{z} = (\mathbf{U}, \mathbf{P}) \in (\mathbb{R}^3 \times \mathbb{R}^6) \}\tag{17}$$

where *U* represents the (right) stretch tensor and *P* the first Piola–Kirchhoff stress tensor. These experimental values are assumed to form a constitutive manifold in a high-dimensional space, see Figure 1. These values are, however, noisy. TDA is nevertheless able to extract the underlying geometry of this manifold, which will later be embedded onto a low-dimensional space of dimension *d* for interpolation purposes.

**Figure 1.** Hypothesis about the existence of a constitutive manifold in which the experimental results live. Despite the noise in the data, Topological Data Analysis (TDA) techniques will help us in unveiling the true geometry of the manifold in the high-dimensional setting, which will later be embedded onto a low dimensional space for the ease of computations.

TDA seeks to find the underlying topological structure of data. To this end, it employs a distance parameter *R* between experimental points. As we let *R* grow, points at distances lower than *R* will be connected by edges, triangles, tetrahedra—in general, by *k*-simplexes in one, two, three, ..., *k* dimensions, respectively. As simplexes appear, they form *simplicial complexes*. The study of the formation of this simplicial complex structure in the data is precisely the objective of TDA. The *R* parameter that best describes the topology of the data is found by resorting to the so-called *persistence diagrams*.

In essence, simplicial homology reflects the number of holes in a given dimension for the simplicial complex. For instance, a chain of edges may close a hole, while the interior space within a tetrahedron is a void. If we record the precise value of *R* for which a hole or void (in any dimension) appears, and the *R* value for which it disappears—by the appearance of a filled triangle or tetrahedron, for instance—the resulting plots will indicate which topological features *persist* the most. Those indicate the true topology of the data. For a graphical interpretation of this explanation, see Figure 2.

Under the TDA framework, noisy data produce topological structures with small persistence. The true topology (manifold structure) is described by these structures that persist the most. Once unveiled, the topological structure of data or, equivalently, the right shape of the data manifold, will allow us to interpolate data in the right topological space—in the tangent plane to the manifold at each datum—and not in Euclidean space.
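As a minimal illustration of persistence (restricted to dimension 0, i.e., connected components, rather than the holes discussed above), the following union-find sketch computes the H0 barcode of a Vietoris–Rips filtration. For two well-separated clusters, the barcode shows several short-lived bars (noise-scale merges) and one long bar: the feature that persists the most reveals the two-cluster structure.

```python
import numpy as np

def h0_persistence(points):
    """H0 (connected-component) barcode of the Vietoris-Rips filtration:
    every point is born at R = 0; each edge of a Kruskal-style minimum
    spanning tree kills one component. Returns the finite death radii."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = sorted((dist[i, j], i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(i):                     # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # two components merge: one bar dies
            parent[ri] = rj
            deaths.append(r)
    return deaths                    # n - 1 values; one bar lives forever
```

Higher-dimensional features (the holes and voids above) require a full persistent-homology library, but the filtration-and-barcode logic is the same.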

**Figure 2.** Interpretation of TDA. Orange circles have diameter *R*, the topology parameter. Below each simplicial complex, the bar code corresponding to dimension-1 holes is represented. (**a**) For a sufficiently small *R* parameter, say, 0.1, only a collection of data points is visible, with no topology at dimension 1. (**b**) Increasing *R* to 1.0 makes the first circular topology appear. This hole is visible for *R* > 0.8, hence the bar in the diagram from *R* = 0.8 to *R* = 1.0. (**c**) If we increase *R* further, no perceptible changes are observed. Only one hole persists, and it is reflected in the bar code below. (**d**) The hole disappears by the formation of big triangles at about *R* = 2.8. From the observation of the bar code, we notice that the data set has the topology of a circle, with one single interior hole. In general, those holes or voids that persist the most reflect the persistent topology of the data.

## *2.3. Pseudo-Experimental Data—Learning a Visco-Hyperelastic Response*

In this first example, we consider pseudo-experimental (numerical) data coming from a finite element simulation. We consider the same visco-hyperelastic material previously considered in [32], but this time altered with noise. This noise has a standard deviation of 10% of the mean value.

The considered material is assumed to obey a Mooney–Rivlin constitutive law

$$\mathcal{W} = C_1(\bar{I}_1 - 3) + C_2(\bar{I}_2 - 3) + D_1(J - 1)^2 \tag{18}$$

with $\bar{I}_1 = J^{-2/3} I_1$ and $\bar{I}_2 = J^{-4/3} I_2$, where $I_1 = \lambda_1^2 + \lambda_2^2 + \lambda_3^2$ and $I_2 = \lambda_1^2\lambda_2^2 + \lambda_2^2\lambda_3^2 + \lambda_3^2\lambda_1^2$ are the invariants of the right Cauchy-Green tensor $\mathbf{C}$. In turn, $J$ represents the determinant of the deformation gradient tensor. We took $C_1 = 27.56$ MPa, $C_2 = 6.89$ MPa, and $D_1 = 0.0029$ MPa.
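As an illustration, the strain energy of Equation (18) can be evaluated directly from a deformation gradient, taking the equation literally, including its volumetric term (the function name and interface below are our own choices):

```python
import numpy as np

C1, C2, D1 = 27.56, 6.89, 0.0029         # material constants from the text

def mooney_rivlin_energy(F):
    """Strain energy density of Eq. (18) for a deformation gradient F."""
    C = F.T @ F                           # right Cauchy-Green tensor
    J = np.linalg.det(F)
    I1 = np.trace(C)
    I2 = 0.5 * (I1 ** 2 - np.trace(C @ C))
    I1b = J ** (-2.0 / 3.0) * I1          # deviatoric invariants
    I2b = J ** (-4.0 / 3.0) * I2
    return C1 * (I1b - 3.0) + C2 * (I2b - 3.0) + D1 * (J - 1.0) ** 2
```

The energy vanishes in the undeformed state (F equal to the identity) and is positive for, e.g., an isochoric uniaxial stretch.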

The viscous part of the behavior is modeled after a Prony series expansion of the type

$$\frac{G(t)}{G\_0} = 1 - \sum\_{i=1}^{2} \overline{\mathbf{g}}\_i^P \left( 1 - \exp\left(-\frac{t}{\tau\_i}\right) \right) \tag{19}$$

$$\frac{K(t)}{K\_0} = 1 - \sum\_{i=1}^{2} \overline{k}\_i^P \left(1 - \exp\left(-\frac{t}{\tau\_i}\right)\right) \tag{20}$$

with $\bar{g}_i^P = [0.2, 0.1]$ and $\bar{k}_i^P = [0.5, 0.2]$. The relaxation times take the values $\tau_i = [0.1, 0.2]$ seconds, respectively. The initial instantaneous Young's modulus and Poisson's ratio corresponding to these values are $E = 206.7$ MPa and $\nu = 0.45$, respectively.
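Equations (19) and (20) with these coefficients can be evaluated directly; for instance, the long-time shear modulus tends to $1 - 0.2 - 0.1 = 0.7$ of its instantaneous value, and the bulk modulus to $0.3$. A sketch with the values quoted above:

```python
import numpy as np

g_bar = np.array([0.2, 0.1])             # shear Prony coefficients
k_bar = np.array([0.5, 0.2])             # bulk Prony coefficients
tau = np.array([0.1, 0.2])               # relaxation times, in seconds

def prony_relaxation(t, coeffs):
    """Normalized relaxation modulus of Eqs. (19)-(20) at time(s) t,
    e.g. G(t)/G0 for coeffs = g_bar, K(t)/K0 for coeffs = k_bar."""
    t = np.asarray(t, dtype=float)[..., None]
    return 1.0 - np.sum(coeffs * (1.0 - np.exp(-t / tau)), axis=-1)
```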

With the material as just described, a plane-stress, biaxial experiment is reproduced. This experiment consists of two different loading steps, each one followed by a relaxation step. A total of 50 different data sets have been obtained for this experiment.

#### 2.3.1. A mean GENERIC model.

A regression procedure is then accomplished for each one of the 50 different experiments, so as to determine their precise GENERIC expression. With the obtained values, we first compute the mean GENERIC model by simply taking mean values for each one of the GENERIC model components. This "mean" GENERIC model is compared to the noise-free numerical experiment, taken as ground truth.

#### 2.3.2. Extracting the topology of data: GENERIC-TDA model.

Instead of just computing the mean values of each term of the GENERIC model, it seems judicious to employ TDA to unveil the topology of the data and to determine the final GENERIC model by interpolating from the right neighboring experimental results. To this end, we considered a data set composed of 20 different loading states (all consisting of a load-relaxation-load-relaxation sequence) and added noise (10% standard deviation) to each one of these processes, so as to obtain 50 different tests for each one of the 20 loading processes. This makes a total of one thousand different tests.

One of these tests (noise-free) is kept as the reference solution. We employ TDA to find the tests neighboring this reference solution and, by employing Kriging, obtain the final numerical values of the GENERIC model. We would like to emphasize that the interpolation is made not in the Euclidean space, but in the tangent space to the data manifold, as unveiled by TDA techniques. Note that we can speak of a tangent plane to a noisy manifold because TDA unveils the topology that persists the most, thus giving an apparently noise-free topology.

Weights for the interpolation are obtained by different Kriging interpolation techniques: Simple, Ordinary, and Local.
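As an illustration of one of these variants, a minimal Ordinary Kriging predictor can be written in a few lines. The Gaussian covariance model and the length scale below are arbitrary choices for the sketch; in our setting, the interpolated quantities would be the GENERIC model terms of the neighboring tests.

```python
import numpy as np

def ordinary_kriging(X, y, x_star, length=1.0):
    """Ordinary Kriging prediction at x_star from samples (X, y), using
    a Gaussian covariance model (an illustrative choice)."""
    def cov(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return np.exp(-(d / length) ** 2)

    n = len(X)
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = cov(X, X)
    K[:n, n] = K[n, :n] = 1.0        # Lagrange row: weights sum to 1
    rhs = np.append(cov(X, x_star[None, :])[:, 0], 1.0)
    weights = np.linalg.solve(K, rhs)[:n]
    return weights @ y
```

With no nugget term, the predictor interpolates the data exactly at the sample points, which is the property exploited when blending the neighboring tests found by TDA.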
