#### *2.1. Linear Homogenization Procedure*

Due to the microscopic nature of the heterogeneity, a procedure is required for extracting the effective thermal conductivity. In what follows we proceed in the linear case, as in [22], within a representative volume element Ω whose microstructure is perfectly defined at that scale. The microscopic conductivity **k**(**x**) is known at every point **x** ∈ Ω.

The macroscopic temperature gradient **G** is defined from the space average

$$\mathbf{G} = \langle \mathbf{g}(\mathbf{x}) \rangle \equiv \frac{1}{|\Omega|} \int_{\Omega} \mathbf{g}(\mathbf{x}) \, d\mathbf{x}, \tag{5}$$

where **g**(**x**) represents the microscopic temperature gradient, i.e., **g**(**x**) = ∇*T*(**x**).

We define the localization tensor **L**(**x**) such that

$$\mathbf{g}(\mathbf{x}) = \mathbf{L}(\mathbf{x}) \, \mathbf{G}. \tag{6}$$

The microscopic heat flux **q**(**x**) follows the Fourier law

$$\mathbf{q}(\mathbf{x}) = -\mathbf{k}(\mathbf{x}) \, \mathbf{g}(\mathbf{x}), \tag{7}$$

and its macroscopic counterpart **Q** reads

$$\mathbf{Q} = \langle \mathbf{q}(\mathbf{x}) \rangle = \langle -\mathbf{k}(\mathbf{x}) \, \mathbf{g}(\mathbf{x}) \rangle = \langle -\mathbf{k}(\mathbf{x}) \, \mathbf{L}(\mathbf{x}) \rangle \, \mathbf{G} \tag{8}$$

from which the homogenized thermal conductivity reads

$$\mathbf{K} = \langle -\mathbf{k}(\mathbf{x}) \, \mathbf{L}(\mathbf{x}) \rangle. \tag{9}$$

Thus, the calculation of the homogenized thermal conductivity tensor only requires computing the tensor **L**(**x**). The present work considers the simplest procedure, which in the 2D case consists of solving two steady-state thermal problems in Ω

$$\begin{cases} \nabla \cdot \left( \mathbf{k}(\mathbf{x}) \, \nabla T^1(\mathbf{x}) \right) = 0 \\ T^1(\mathbf{x} \in \partial \Omega) = x \end{cases}, \tag{10}$$

and

$$\begin{cases} \nabla \cdot \left( \mathbf{k}(\mathbf{x}) \, \nabla T^2(\mathbf{x}) \right) = 0 \\ T^2(\mathbf{x} \in \partial \Omega) = y \end{cases}, \tag{11}$$

whose solutions verify by construction

$$\begin{cases} \mathbf{G}^1 = \langle \nabla T^1(\mathbf{x}) \rangle^T = (1,0) \\ \mathbf{G}^2 = \langle \nabla T^2(\mathbf{x}) \rangle^T = (0,1) \end{cases} \tag{12}$$

and whose gradients define the localization tensor columns

$$\mathbf{L}(\mathbf{x}) = \begin{pmatrix} \nabla T^1(\mathbf{x}) & \nabla T^2(\mathbf{x}) \end{pmatrix}, \tag{13}$$

which allows calculating the effective thermal conductivity from Equation (9).

**Remark 1.** *In the present work, since we are only interested in the effective thermal conduction along the y-direction, a single problem, problem (11), suffices for calculating the only component of interest, component K<sub>22</sub>.*
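
To make the procedure concrete, the following minimal sketch solves problem (11) by finite differences, assuming the microscopic conductivity is given as a pixelized NumPy array; the function name, grid layout and sign convention (macroscopic Fourier law **Q** = −**K** **G**) are illustrative choices, not the implementation used in this work.

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import spsolve

def effective_k22(k, h=1.0):
    """Solve div(k grad T) = 0 with T = y on the boundary (problem (11))
    and return K22 = <k dT/dy> (see Remark 1). `k` is an (n, n) array of
    nodal conductivities on a uniform grid of spacing `h`."""
    n = k.shape[0]
    idx = lambda i, j: i * n + j          # flatten (row i, column j) -> dof
    y = np.arange(n) * h                  # y-coordinate of each row of nodes

    A = lil_matrix((n * n, n * n))
    b = np.zeros(n * n)
    for i in range(n):
        for j in range(n):
            p = idx(i, j)
            if i == 0 or i == n - 1 or j == 0 or j == n - 1:
                A[p, p] = 1.0             # Dirichlet node: T = y
                b[p] = y[i]
                continue
            # interior node: conservative 5-point stencil with
            # harmonic averaging of the conductivity on each face
            for (ii, jj) in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                kf = 2.0 * k[i, j] * k[ii, jj] / (k[i, j] + k[ii, jj])
                A[p, idx(ii, jj)] = kf
                A[p, p] -= kf

    T = spsolve(csr_matrix(A), b).reshape(n, n)
    dTdy = np.gradient(T, h, axis=0)      # microscopic gradient component g_y
    # with G = (0, 1) and Q = <-k g_y>, the macroscopic law Q = -K G gives
    return np.mean(k * dTdy)

# usage on a hypothetical two-phase microstructure
rng = np.random.default_rng(0)
k_field = np.where(rng.random((64, 64)) < 0.3, 10.0, 1.0)
print(effective_k22(k_field))
```

The harmonic averaging across cell faces is a common choice for conservative schemes; any finite-element solver applied to problems (10) and (11) would serve equally well.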

#### *2.2. Topological Data Analysis*

Topological data analysis, TDA [21], is one of the most promising techniques in high-dimensional data analysis. In essence, TDA is a powerful tool for finding the topology of data: whether there are clusters, a manifold structure, or even noise that is not relevant for the analysis.

For an intuitive description of the method, consider the set of points depicted in Figure 1. In general, these points will live in high-dimensional spaces, such that their intrinsic topology will not be visible at first glance. We then equip the set with a distance parameter *r*. By making *r* grow, different *k*-simplexes will appear. Remember that a 0-simplex is a point, a 1-simplex is an edge, a 2-simplex is a triangle, and so on.

**Figure 1.** Illustrating TDA: (**left**) for *r* < *d* the four points (A, B, C and D) remain disconnected; (**center**) at *r* = *d* the hole ABCD appears from the four edges AB, BC, CD and DA; (**right**) the just-created hole persists until *r* = √2 *d*, the value at which A connects with C and the two resulting triangles ABC and ACD cover the initial hole, which consequently disappears.

As *r* grows, holes appear (such as the one defined by the edges between points A, B, C and D in Figure 1, for instance when *r* = *d*) and disappear for higher values of *r* (when *r* = √2 *d*, the initial hole is covered by triangles ABC and ACD). What is important in this discussion is that the overall structure of the data is the one that *persists* over long ranges of *r*. Holes defined by noisy data are rapidly eliminated from the simplicial complex.
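
The four-point example of Figure 1 can be reproduced with any Rips-filtration package; the sketch below assumes the gudhi library is available (the coordinates and the maximum edge length are illustrative).

```python
import numpy as np
import gudhi  # assumed available; other TDA packages expose similar filtrations

# the four corners of a square of side d, as in Figure 1
d = 1.0
points = np.array([[0.0, 0.0], [d, 0.0], [d, d], [0.0, d]])  # A, B, C, D

# Vietoris-Rips filtration: a simplex enters when all its edges have length <= r
rips = gudhi.RipsComplex(points=points, max_edge_length=2.0 * d)
st = rips.create_simplex_tree(max_dimension=2)

# each entry is (homology dimension, (birth, death))
for dim, (birth, death) in st.persistence():
    print(dim, birth, death)
# the 1-dimensional feature (the hole ABCD) is expected to be born at r = d
# and to die at r = sqrt(2) d, when the diagonal AC closes triangles ABC and ACD
```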

The value of *r* at which a hole appears, and the one at which it disappears, define a bar joining both, which characterizes the hole persistence. When collecting all the bars associated with all the holes appearing and then disappearing as *r* grows, the so-called persistence barcode results, the latter compactly representing a given morphology.

An alternative consists of using a 2D representation, the so-called persistence diagram (PD), reporting on the *x*<sub>1</sub>-axis the value of *r* at which a hole appears, and on the *x*<sub>2</sub>-axis the value at which it disappears. Obviously, with the hole birth preceding its death, all the points are placed in the upper half-plane defined by the bisector *x*<sub>2</sub> = *x*<sub>1</sub>, and any point (*x*<sub>1</sub>, *x*<sub>2</sub>) remaining close to that bisector represents noise, a small scale, with the associated hole death following its birth almost immediately. Points far from the bisector represent the topology that persists.

The persistence barcode and the persistence diagram are two representations with a high physical content; however, neither representation can be used directly for comparison purposes, because they are defined in a non-metric space where computing distances to assess proximity makes no sense.

To move to a more appropriate space in which distances can be computed, we first transform the persistence diagram according to (*x*<sub>1</sub>, *x*<sub>2</sub>) → (*y*<sub>1</sub> = *x*<sub>1</sub>, *y*<sub>2</sub> = *x*<sub>2</sub> − *x*<sub>1</sub>) and then apply to the latter a convolution (usually with a Gaussian kernel), leading to the so-called persistence image (PI) **y**, defined in a vector space, **y** ∈ R*<sup>D</sup>*, on which most AI algorithms can be applied [23].
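
As an illustration, a minimal persistence-image construction following these definitions could read as below; the grid resolution, the kernel width and the persistence-based weighting are illustrative choices, not values prescribed by the text above.

```python
import numpy as np

def persistence_image(diagram, resolution=(20, 20), sigma=0.1, extent=None):
    """Map each (birth, death) pair to (birth, persistence) and spread it
    with a Gaussian kernel on a regular grid, yielding a vector y in R^D."""
    diagram = np.asarray(diagram, dtype=float)
    birth = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]          # y2 = x2 - x1
    if extent is None:
        extent = (birth.min(), birth.max(), 0.0, pers.max())
    b_axis = np.linspace(extent[0], extent[1], resolution[0])
    p_axis = np.linspace(extent[2], extent[3], resolution[1])
    B, P = np.meshgrid(b_axis, p_axis, indexing="ij")

    image = np.zeros(resolution)
    for b, p in zip(birth, pers):
        # weighting by the persistence p is a common (not unique) choice that
        # makes noisy features near the diagonal contribute little
        image += p * np.exp(-((B - b) ** 2 + (P - p) ** 2) / (2.0 * sigma ** 2))
    return image.ravel()                          # the PI vector y

# usage on a hypothetical diagram: one persistent hole and two noisy ones
y = persistence_image([(1.0, 2.4), (0.2, 0.25), (0.8, 0.85)])
print(y.shape)
```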

#### *2.3. Principal Component Analysis*

TDA is able to analyze a complex microstructure through its associated image and to extract its relevant topological features in the form of a persistence image, which can be viewed as a matrix. However, using these matrix components directly is not the most compact and concise way of representing the microstructure, because it contains too many components, which makes it difficult to use for constructing regressions. Thus, in practice, a linear dimensionality reduction such as principal component analysis (PCA) can be applied to extract the most representative modes of the persistence images and then represent the microstructures in a compact and concise way by using the weights associated with the most important extracted modes.

Let us consider a vector **y** ∈ R*<sup>D</sup>* containing the different components of a persistence image. When considering a set of P microstructures, the associated PIs lead to **y**<sub>*i*</sub>, *i* = 1, . . . , P. If they are somehow correlated, there will be a linear transformation **W** defining the vector *ξ* ∈ R*<sup>d</sup>*, with *d* < *D*, which contains the still unknown *latent variables*, such that [4]

$$\mathbf{y} = \mathbf{W} \boldsymbol{\xi}. \tag{14}$$

The transformation matrix **W**, of size *D* × *d*, satisfies the orthogonality condition **W***<sup>T</sup>***W** = **I**<sub>*d*</sub>, where **I**<sub>*d*</sub> represents the *d* × *d* identity matrix.

PCA proceeds by guaranteeing maximal preserved variance and de-correlation in the latent variable set *ξ*. Thus, the covariance matrix of *ξ*,

$$\mathbf{C}_{\xi\xi} = \mathbf{E}\{\boldsymbol{\Xi}\boldsymbol{\Xi}^T\}, \tag{15}$$

will be diagonal. PCA will then extract the *d* uncorrelated latent variables from

$$\mathbf{C}_{yy} = \mathbf{E}\{\mathbf{Y}\mathbf{Y}^T\} = \mathbf{E}\{\mathbf{W}\boldsymbol{\Xi}\boldsymbol{\Xi}^T\mathbf{W}^T\} = \mathbf{W}\,\mathbf{E}\{\boldsymbol{\Xi}\boldsymbol{\Xi}^T\}\,\mathbf{W}^T = \mathbf{W}\mathbf{C}_{\xi\xi}\mathbf{W}^T, \tag{16}$$

that pre- and post-multiplying by **W***<sup>T</sup>* and **W**, respectively, reads

$$\mathbf{C}_{\xi\xi} = \mathbf{W}^T \mathbf{C}_{yy} \mathbf{W}. \tag{17}$$

By factorizing the covariance matrix **C**<sub>*yy*</sub> using the singular value decomposition (SVD),

$$\mathbf{C}_{yy} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^T, \tag{18}$$

and taking into account Equation (17), it results in

$$\mathbf{C}_{\xi\xi} = \mathbf{W}^T \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^T \mathbf{W}, \tag{19}$$

which yields a diagonal **C**<sub>*ξξ*</sub> when the *d* columns of **W** are taken collinear with *d* columns of **V** (those associated with the largest eigenvalues, so as to maximize the preserved variance), i.e.,

$$\mathbf{W} = \mathbf{V} \mathbf{I}_{D \times d}. \tag{20}$$
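
A minimal sketch of this procedure, assuming the P persistence images are stored as columns of a *D* × P matrix (centering the snapshots is a common preprocessing step, although it is not made explicit in the equations above), could read:

```python
import numpy as np

def pca_reduce(Y, d):
    """PCA via SVD of the covariance matrix, following Equations (14)-(20).
    Y: (D, P) matrix whose columns are the persistence images.
    Returns W (D x d) and the latent variables Xi = W^T Y (d x P)."""
    Yc = Y - Y.mean(axis=1, keepdims=True)   # center the snapshots
    C_yy = Yc @ Yc.T / Yc.shape[1]           # covariance matrix, Eq. (16)
    V, lam, _ = np.linalg.svd(C_yy)          # C_yy = V Lambda V^T, Eq. (18)
    W = V[:, :d]                             # W = V I_{Dxd}, Eq. (20)
    Xi = W.T @ Yc                            # latent coordinates, Eq. (17)
    return W, Xi, lam[:d]

# usage with hypothetical data: P = 50 persistence images of dimension D = 400
rng = np.random.default_rng(1)
Y = rng.standard_normal((400, 50))
W, Xi, variances = pca_reduce(Y, d=3)
print(W.shape, Xi.shape)   # (400, 3) (3, 50)
```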
