**2. Preliminaries**

In this paper, we are considering the flow problem in highly heterogeneous media

$$\begin{aligned} -\text{div}\left(\mathbf{x}\,\nabla u\right) &= f \quad \text{in} \quad \Omega, \\ u &= 0 \quad \text{or} \quad \frac{\partial u}{\partial n} = 0 \quad \text{on} \quad \partial\Omega, \end{aligned} \tag{1}$$

where Ω is the computational domain, *κ* is the permeability coefficient in *<sup>L</sup>*<sup>∞</sup>(Ω), and *f* is a source function in *<sup>L</sup>*<sup>2</sup>(Ω). We assume the coefficient *κ* is highly heterogeneous with high contrast. The classical finite element method for solving (1) numerically is given by: find *uh* ∈ *Vh* such that

$$\mathfrak{a}(\mathfrak{u}\_{\hbar\prime}, \mathfrak{v}) = \int\_{\Omega} \mathfrak{x} \nabla \mathfrak{u}\_{\hbar} \cdot \nabla \mathfrak{v} \, d\mathfrak{x} = \int\_{\Omega} f \mathfrak{v} \, d\mathfrak{x} = (f, \mathfrak{v}) \quad \text{for all } \mathfrak{v} \in V\_{\hbar\prime} \tag{2}$$

where *Vh* is a standard conforming finite element space over a partition T*h* of Ω with mesh size *h*.

However, with the highly heterogeneous property of coefficient *κ*, the mesh size *h* has to be taken extremely small to capture the underlying fine-scale features of *κ*. This ends up with a large computational cost. GMsFEM [4,5] serves as a model reduction technique to reduce the number of degrees of freedom and attain both efficiency and accuracy to a considerable extent. GMsFEM has been successfully extended to other formulations and applied to other problems. Here we provide a brief introduction of the main ingredients of GMsFEM. For a more detailed discussion of GMsFEM and related concepts, the reader is referred to [22–26].

In GMsFEM, we define a coarse mesh T *H* over the domain Ω and refine to obtain a fine mesh T *h* with mesh size *h H*, which is fine enough to restore the multiscale properties of the problem. Multiscale basis functions are defined on coarse grid blocks using linear combinations of finite element basis functions on T *h*, and designed to resolve the local multiscale behaviors of the exact solution. The multiscale finite element space *V*ms, which is a principal component subspace of the conforming finite space *Vh* with dim(*<sup>V</sup>*ms) dim(*Vh*), is constructed by the linear span of multiscale basis functions. The multiscale solution *u*ms ∈ *V*ms is then defined by

$$a(u\_{\rm ms}, v) = (f, v) \quad \text{for all } v \in V\_{\rm ms}.\tag{3}$$

We consider the identification of dominant modes for solving (1) by multiscale basis functions, including spectral basis functions and simplified basis functions, in GMsFEM. Here, we present the details of the construction of multiscale basis functions in GMsFEM. Let *Nx* = {*xi* |1 ≤ *i* ≤ *Nv*} be the set of nodes of the coarse mesh T *H*. For each coarse grid node *xi* ∈ *Nx*, the coarse neighborhood *ωi* is defined by

$$\omega\_{i} = \bigcup \{ \mathbb{K}\_{j} \in \mathcal{T}^{H} \colon \quad \mathbf{x}\_{i} \in \mathbb{Z}\_{j} \},\tag{4}$$

that is, the union of the coarse elements *Kj* ∈ T *H* containing the coarse grid node *xi*. An example of the coarse and fine mesh, coarse blocks and a coarse neighborhood is shown in Figure 1. For each coarse neighbourhood *ωi*, we construct multiscale basis functions {*φ<sup>ω</sup><sup>i</sup> j*}*Li j*=1 supported on *ωi*.

For the construction of spectral basis functions, we first construct a snapshot space *V*(*i*) snap spanned by local snapshot basis functions *ψ<sup>i</sup>*,*<sup>k</sup>* snap for each local coarse neighborhood *ωi*. The snapshot basis function *ψ<sup>i</sup>*,*<sup>k</sup>* snap is the solution of a local problem

$$\begin{aligned} -\text{div}(\kappa \nabla \psi^{j,k}\_{\text{snap}}) &= 0, \quad \text{in} \quad \omega\_{\text{i}}\\ \psi^{j,k}\_{\text{snap}} &= \delta\_{i,k}, \quad \text{on} \quad \partial \omega\_{\text{i}}. \end{aligned} \tag{5}$$

**Figure 1.** An illustration of coarse mesh (**left**), a coarse neighborhood and coarse blocks (**right**).

The fine grid function *<sup>δ</sup>i*,*<sup>k</sup>* is a function defined for all *xs* ∈ *∂ωi*, where {*xs*} denote the fine degrees of freedom on the boundary of local coarse region *ωi*. Specifically,

$$\delta\_{i,k}(\mathbf{x}\_s) = \begin{cases} 1 & \text{if } \mathbf{s} = k \\ 0 & \text{if } \mathbf{s} \neq k \end{cases} \tag{6}$$

The linear span of these harmonic extensions forms the local snapshot space *V*(*i*) snap = span*k*{*ψ<sup>i</sup>*,*<sup>k</sup>* snap}. One can also use randomized boundary conditions to reduce the computational cost associated with snapshot calculations [25]. Next, a spectral problem is designed based on our analysis and used to reduce the dimension of local multiscale space. More precisely, we seek for eigenvalues *λim* and corresponding eigenfunctions *φ<sup>ω</sup><sup>i</sup> m* ∈ *V*(*i*) snap satisfying

$$a\_i(\phi\_m^{\omega\_i}, \upsilon) = \lambda\_m^i s\_i(\phi\_m^{\omega\_i}, \upsilon), \quad \forall \upsilon \in V\_{\text{snap}}^{(i)} \tag{7}$$

where the bilinear forms in the spectral problem are defined as

$$\begin{aligned} a\_i(u, v) &= \int\_{\omega\_i} \kappa \nabla u \cdot \nabla v, \\ s\_i(u, v) &= \int\_{\omega\_i} \bar{\kappa} u v, \end{aligned} \tag{8}$$

where *κ*˜ = ∑*j <sup>κ</sup>*|∇*<sup>χ</sup>j*|2, and *χj* denotes the multiscale partition of the unity function. We arrange the eigenvalues *λim* of the spectral problem (7) in ascending order, and select the first *li* eigenfunctions {*φ<sup>ω</sup><sup>i</sup> m* }*lim*=<sup>1</sup> corresponding to the small eigenvalues as the multiscale basis functions.

An alternative way to construct the multiscale basis function is by using the idea of simplified basis functions. This approach assumes the number of channels and position of the channelized permeability field are known. Therefore we can obtain multiscale basis functions {*φ<sup>ω</sup><sup>i</sup> m* }*lim*=<sup>1</sup> using these information and without solving the spectral problem [27].

Once the multiscale basis functions are constructed, the span of the multiscale basis functions will form the offline space

$$\begin{aligned} V\_{\text{ms}}^{(i)} &= \text{span}\{\phi\_m^{\omega\_i}\}\_{m=1'}^{l\_i} \\ V\_{\text{ms}} &= \ominus\_i V\_{\text{ms}}^{(i)}. \end{aligned} \tag{9}$$

We will then seek a multlscale solution *ums* ∈ *Vms* satisfying

$$a(u\_{\rm ms}, v) = (f, v) \quad \text{for all } v \in V\_{\rm ms} \tag{10}$$

which is a Galerkin projection of the (1) onto *Vms*, and can be written as a system of linear equations

$$A\_{\mathfrak{c}}u\_{\mathfrak{c}} = b\_{\mathfrak{c}\prime}$$

where *Ac* and *bc* are the coarse-scale stiffness matrix and load vector. If we collect all the multiscale basis functions and arrange the fine-scale coordinate representation in columns, we obtain the downscaling operator *R*. Then the fine-scale representation of the multiscale solution is given by

$$
u\_{m\mathfrak{s}} = R\mathfrak{u}\_{\mathfrak{c}}.\tag{12}$$

#### **3. Deep Learning for GMsFEM**

In applications, there are uncertainties within some local regions of the permeability field *κ* in the flow problem. Thousands of forward simulations are needed to quantify the uncertainties of the flow solution. GMsFEM provides us with a fast solver to compute the solutions accurately and efficiently. Considering that there is a large amount of simulation data, we are interested in developing a method utilizing the existing offline data and reducing direct computational effort later. In this work, we aim at using DNN to model the relationship between heterogeneous permeability coefficient *κ* and the key ingredients of GMsFEM solver, i.e., coarse scale stiffness matrices and multiscale basis functions. When the relation is built up, we can feed the network any realization of the permeability field and obtain the corresponding GMsFEM ingredients, and further restore fine-grid GMsFEM solution of (1). The general idea of utilizing deep learning in the GMsFEM framework is illustrated in Figure 2.

**Figure 2.** A flow chart in illustrating the idea of using deep learning in the generalized multiscale finite element method (GMsFEM) framework.

Suppose that there are uncertainties for the heterogeneous coefficient in a local coarse block *K*0, which we call the target block, and the permeability outside the target block remains the same. For example, for a channelized permeability field, the position, location and the permeability values of the channels in the target block can vary. The target block *K*0 is inside three coarse neighborhoods, denoted by *ω*1, *ω*2, *ω*3. The union of the 3 neighborhoods, i.e.,

$$
\omega^+(\mathcal{K}\_0) = \omega\_1 \cup \omega\_2 \cup \omega\_{\mathcal{N}} \tag{13}
$$

are constituted of by the target block *K*0 and 12 other coarse blocks, denoted by {*Kl*}<sup>12</sup>*l*=<sup>1</sup> A target block and its surrounding neighborhoods are depicted in Figure 3.

**Figure 3.** An illustration of a target coarse block *K*0 and related neighborhoods.

For a fixed permeability field *κ*, one can compute the multiscale basis functions *φ<sup>ω</sup><sup>i</sup> m* (*κ*) defined by (7), for *i* = 1, 2, 3, and the local coarse-scale stiffness matrices *AKl c* (*κ*), defined by

$$\mathbb{E}\left[A\_{\mathbf{c}}^{K\_l}(\boldsymbol{\kappa})\right]\_{\mathfrak{m},\mathfrak{u}}^{i,j} = \int\_{K\_l} \boldsymbol{\kappa} \nabla \boldsymbol{\phi}\_{\mathfrak{m}}^{\omega\_{\overline{i}}}(\boldsymbol{\kappa}) \cdot \nabla \boldsymbol{\phi}\_{\mathfrak{n}}^{\omega\_{\overline{j}}}(\boldsymbol{\kappa}),\tag{14}$$

for *l* = 0, 1, . . . , 12. We are interested in constructing the maps *g<sup>m</sup>*,*<sup>i</sup> B*and *<sup>g</sup>lM*, where

• *g<sup>m</sup>*,*<sup>i</sup> B* maps the permeability coefficient *κ* to a local multiscale basis function *φ<sup>ω</sup><sup>i</sup> m* , where *i* denotes the index of the coarse block, and *m* denotes the index of the basis in coarse block *ωi*

$$\!\!\!g\_B^{m,i}:\kappa \longmapsto \!\!\!\phi\_m^{\omega\_i}(\kappa),\tag{15}$$

• *glM* maps the permeability coefficient *κ* to the coarse grid parameters *AKl c* (*l* = 0, ··· , 12)

$$\mathbb{S}\_{\mathcal{M}}^{l}: \kappa \mapsto \mathcal{A}\_{\mathfrak{c}}^{\mathcal{K}\_{l}}(\kappa). \tag{16}$$

In this work, our goal is to make use of deep learning to build fast approximations of these quantities associated with the uncertainties in the permeability field *κ*, which can provide fast and accurate solutions to the heterogeneous flow problem (1).

For each realization *κ*, one can compute the images of *κ* under the local multiscale basis maps *g<sup>m</sup>*,*<sup>i</sup> B* and the local coarse-scale matrix maps *<sup>g</sup>lM*. These forward calculations serve as training samples for building a deep neural network for approximation of the corresponding maps, i.e.,

$$\begin{aligned} \mathcal{N}\_B^{m,\imath}(\kappa) &\approx \mathcal{g}\_B^{m,\imath}(\kappa),\\ \mathcal{N}\_M^{l}(\kappa) &\approx \mathcal{g}\_M^{l}(\kappa). \end{aligned} \tag{17}$$

In our networks, the permeability field *κ* is the input, while the multiscale basis functions *φ<sup>ω</sup><sup>i</sup> m* and the coarse-scale matrix *AKl c* are the outputs. Once the neural networks are built, we can use the networks to compute the multiscale basis functions and coarse-scale parameters in the associated region for any permeability field *κ*. Using these local information from the neural networks together with the global information which can be pre-computed, we can form the downscale operator *R* with the multiscale basis functions, form and solve the linear system (11), and obtain the multiscale solution by (12).

## *3.1. Network Architecture*

In general, an *L*-layer neural network N can be written in the form

$$\mathcal{N}(\mathbf{x}; \boldsymbol{\theta}) = \sigma(\mathbb{W}\_{L}\sigma(\cdots \cdot \sigma(\mathbb{W}\_{2}\sigma(\mathbb{W}\_{1}\mathbf{x} + b\_{1}) + b\_{2})\cdots) + b\_{L}),\tag{18}$$

where *θ* := (*<sup>W</sup>*1, *W*2, ··· , *WL*, *b*1, *b*2, ··· , *bL*), *W*'s are the weight matrices and *b*'s are the bias vectors, *σ* is the activation function, *x* is the input. Such a network is used to approximate the output *y*. Our goal is then to find *θ*∗ by solving an optimization problem

$$\theta^\* = \underset{\theta}{\text{argmin}} \quad \mathcal{L}(\theta)\_\prime \tag{19}$$

where L(*θ*) is called loss function, which measures the mismatch between the image of the input *x* under the the neural network N (*<sup>x</sup>*, *y*; *θ*) and the desired output *y* in a set of training samples (*xj*, *yj*). In this paper, we use the mean-squared error metric to be our loss function

$$\mathcal{L}(\theta) = \frac{1}{N} \sum\_{j=1}^{N} \|y\_j - \mathcal{N}(x\_j; \theta)\|\_{2}^{2} \tag{20}$$

where *N* is the number of the training samples. An illustration of a deep neural network is shown in Figure 4.

**Figure 4.** An illustration of a deep neural network.

Suppose we have a set of different realizations of the permeability {*<sup>κ</sup>*1, *κ*2, ··· , *<sup>κ</sup>N*} in the target block. In our network, the input *xj* = *κj* ∈ R*<sup>d</sup>* is a vector containing the permeability image pixels in the target block. The output *yj* is an entry of the local stiffness matrix *AKl c* , or the coordinate representation of a multiscale basis function *φ<sup>ω</sup><sup>i</sup> m* . We will make use of these sample pairs (*xj*, *yj*) to train a deep neural network N *<sup>m</sup>*,*i B* (*x*; *<sup>θ</sup>*<sup>∗</sup>*B*) and N *lM*(*x*; *<sup>θ</sup>*<sup>∗</sup>*M*) by minimizing the loss function with respect to the network parameter *θ*, such that the trained neural networks can approximate the functions *g<sup>m</sup>*,*<sup>i</sup> B* and *<sup>g</sup>lM*, respectively. Once the neural is constructed, for some given new permeability coefficient *<sup>κ</sup>N*+1, we use our trained networks to compute a fast prediction of the outputs, i.e., local multiscale basis functions *φ<sup>ω</sup>i*,pre<sup>d</sup> *m* by

$$\boldsymbol{\Phi}\_{m}^{\omega\_{i}\text{puck}}(\kappa\_{N+1}) = \boldsymbol{\mathcal{N}}\_{B}^{m,i}(\kappa\_{N+1}; \boldsymbol{\theta}\_{B}^{\*}) \approx \boldsymbol{g}\_{B}^{m,i}(\kappa\_{N+1}) = \boldsymbol{\phi}\_{m}^{\omega\_{i}}(\kappa\_{N+1}),\tag{21}$$

and local coarse-scale stiffness matrix *AKl*,pre<sup>d</sup> *c*by

$$A\_{\mathfrak{c}}^{K\_l \text{pred}}(\kappa\_{N+1}) = \mathcal{N}\_M^l(\kappa\_{N+1}; \theta\_M^\*) \approx g\_M^l(\kappa\_{N+1}) = A\_{\mathfrak{c}}^{K\_l}(\kappa\_{N+1}).\tag{22}$$

#### *3.2. Network-Based Multiscale Solver*

Once the neural networks are built, we can assemble the predicted multiscale basis functions to obtain a prediction *R*pre<sup>d</sup> for the downscaling operator, and assemble the predicted local coarse-scale stiffness matrix *AKl*,pre<sup>d</sup> *c* in the global matrix *A*pre<sup>d</sup> *c* . Following (11) and (12), we solve the predicted coarse-scale coefficient vector *u*pre<sup>d</sup> *c* from the following linear system

$$A\_c^{\text{pred}} u\_c^{\text{pred}} = b\_{c\_1} \tag{23}$$

and obtain the predicted multiscale solution *u*pre<sup>d</sup> *ms* by

$$
u\_{\rm ms}^{\rm pred} = R^{\rm pred} u\_c^{\rm pred}.\tag{24}$$

## **4. Numerical Results**

In this section, we present some numerical results for predicting the GMsFEM ingredients and solutions using our proposed method. We considered permeability fields *κ* with high-contrast channels inside the domain Ω = (0, 1)2, which consist of uncertainties in a target cell *K*0. More precisely, we considered a number of random realizations of permeability fields *κ*1, *κ*2, *κ*3, ··· , *<sup>κ</sup>N*+*M*. Each permeability field contained two high-conductivity channels, and the fields differ in the target cell *K*0 by:


In these numerical experiments, we assumed there were uncertainties in only the target block *K*0. The permeability field in Ω \ *K*0 was fixed across all the samples.

We followed the procedures in Section 3 and generated sample pairs using GMsFEM. Local spectral problems were solved to obtain the multiscale basis functions *φ<sup>ω</sup><sup>i</sup> m* . In the neural network, the permeability field *x* = *κ* was considered to be the input, while the local multiscale basis functions *y* = *φ<sup>ω</sup><sup>i</sup> m* and local coarse-scale matrices *y* = *AKl c* were regarded as the output. These sample pairs were divided into the training set and the learning set in a random manner. A large number *N* of realizations, namely *κ*1, *κ*2, ... , *κN*, were used to generate sample pairs in the training set, while the remaining *M* realizations, namely, *<sup>κ</sup>N*+1, *<sup>κ</sup>N*+2, ... , *<sup>κ</sup>N*+*M* are used in testing the predictive power of the trained network. We remark that, for each basis function and each local matrix, we solved an optimization problem in minimizing the loss function defined by the sample pairs in the training set, and build a separate deep neural network. We summarize the network architectures for training local coarse scale stiffness matrix and multiscale basis functions as below:

	- **–** Input: vectorized permeability pixels values *κ*,
	- **–** Output: coefficient vector of multiscale basis *φ<sup>ω</sup><sup>i</sup> m* (*κ*) on coarse neighborhood *ωi*,
	- **–** Loss function: mean squared error 1*N N*∑*j*=1 ||*φ<sup>ω</sup><sup>i</sup> m* (*<sup>κ</sup>j*) − N *<sup>m</sup>*,*i B* (*<sup>κ</sup>j*; *<sup>θ</sup>B*)||22,
	- **–** Activation function: leaky ReLu function,
	- **–** DNN structure: 10–20 hidden layers, each layer have 250–350 neurons,
	- **–**Training optimizer: Adamax.

• For the local coarse scale stiffness matrix *AKl c* , we build a network N *lM* using


For simplicity, the activation functions ReLU function [28] and Leaky ReLU function were used as they have the simplest derivatives among all nonlinear functions. The ReLU function proved to be useful in training deep neural network architectures. The Leaky ReLU function can resolve the vanishing gradient problem which can accelerate the training in some occasions. The optimizers Adamax and Proximal Adagrad are stochastic gradient descent (SGD)-based methods commonly used in neural network training [29]. In both experiments, we trained our network using Python API Tensorflow and Keras [30].

Once a neural network was built on training, it can be used to predict the output given a new input. The accuracy of the predictions is essential in making the network useful. In our experiments, we used *M* sample pairs, which were not used in training the network, to examine the predictive power of our network. On these sample pairs, referred to as the testing set, we compared the prediction and the exact output and computed the mismatch in some suitable metric. Here, we summarize the metric used in our numerical experiment. For the multiscale basis functions, we compute the relative error in *L*<sup>2</sup> and *H*<sup>1</sup> norm, i.e.,

$$c\_{L^2}(\mathbf{x}\_{N+j}) = \left(\frac{\int\_{\Omega} \left|\phi\_m^{\omega\_i}(\mathbf{x}\_{N+j}) - \phi\_m^{\omega\_i, \text{pred}}(\mathbf{x}\_{N+j})\right|^2}{\int\_{\Omega} \left|\phi\_m^{\omega\_i}(\mathbf{x}\_{N+j})\right|^2}\right)^{\frac{1}{2}},$$

$$c\_{H^1}(\mathbf{x}\_{N+j}) = \left(\frac{\int\_{\Omega} \left|\nabla\phi\_m^{\omega\_i}(\mathbf{x}\_{N+j}) - \nabla\phi\_m^{\omega\_i, \text{pred}}(\mathbf{x}\_{N+j})\right|^2}{\int\_{\Omega} \left|\nabla\phi\_m^{\omega\_i}(\mathbf{x}\_{N+j})\right|^2}\right)^{\frac{1}{2}}.\tag{25}$$

For the local stiffness matrices, we computed the relative error in entrywise -2, entrywise -∞ and Frobenius norm, i.e.,

$$\begin{split} e\_{\ell^{2}}(\kappa\_{N+j}) &= \frac{\|A\_{\mathfrak{c}}^{K\_{\mathfrak{c}}}(\kappa\_{N+j}) - A\_{\mathfrak{c}}^{K\_{\mathfrak{l}},\mathsf{pred}}(\kappa\_{N+j})\|\_{2}}{\|A\_{\mathfrak{c}}^{K\_{\mathfrak{l}}}(\kappa\_{N+j})\|\_{2}}, \\ e\_{\ell^{\infty}}(\kappa\_{N+j}) &= \frac{\|A\_{\mathfrak{c}}^{K\_{\mathfrak{l}}}(\kappa\_{N+j}) - A\_{\mathfrak{c}}^{K\_{\mathfrak{l}},\mathsf{pred}}(\kappa\_{N+j})\|\_{\infty}}{\|A\_{\mathfrak{c}}^{K\_{\mathfrak{l}}}(\kappa\_{N+j})\|\_{\infty}}, \\ e\_{\mathcal{F}}(\kappa\_{N+j}) &= \frac{\|A\_{\mathfrak{c}}^{K\_{\mathfrak{l}}}(\kappa\_{N+j}) - A\_{\mathfrak{c}}^{K\_{\mathfrak{l}},\mathsf{pred}}(\kappa\_{N+j})\|\_{F}}{\|A\_{\mathfrak{c}}^{K\_{\mathfrak{l}}}(\kappa\_{N+j})\|\_{F}}. \end{split} \tag{26}$$

A more important measure of the usefulness of the trained neural network is the predicted multiscale solution *u*pre<sup>d</sup> *ms* (*κ*) given by (23) and (24). We compared the predicted solution to *ums* defined by (11) and (12), and computed the relative error in *L*<sup>2</sup> and energy norm, i.e.,

$$\begin{split} e\_{L^{2}}(\kappa\_{N+j}) &= \left( \frac{\int\_{\Omega} \left| u\_{\rm ms}(\kappa\_{N+j}) - u\_{\rm ms}^{\rm pred}(\kappa\_{N+j}) \right|^{2}}{\int\_{\Omega} \left| u\_{\rm ms}(\kappa\_{N+j}) \right|^{2}} \right)^{\frac{1}{2}}, \\ e\_{a}(\kappa\_{N+j}) &= \left( \frac{\int\_{\Omega} \kappa\_{j} \left| \nabla u\_{\rm ms}(\kappa\_{N+j}) - \nabla u\_{\rm ms}^{\rm pred}(\kappa\_{N+j}) \right|^{2}}{\int\_{\Omega} \kappa\_{j} \left| \nabla u\_{\rm ms}(\kappa\_{N+j}) \right|^{2}} \right)^{\frac{1}{2}}. \end{split} \tag{27}$$

## *4.1. Experiment 1*

In this experiment, we considered curved channelized permeability fields. Each permeability field contained a straight channel and a curved channel. The straight channel was fixed and the curved channel struck the boundary of the target cell *K*0 at the same points. The curvature of the sine-shaped channel inside *K*0 varied among all realizations. We generated 2000 realizations of permeability fields, where the permeability coefficients were fixed. Samples of permeability fields are depicted in Figure 5. Among the 2000 realizations, 1980 sample pairs were randomly chosen and used as training samples, and the remaining 20 sample pairs were used as testing samples.

**Figure 5.** Samples of permeability fields in the target block *K*0 in experiment 1.

For each realization, we computed the local multiscale basis functions and local coarse-scale stiffness matrix. In building the local snapshot space, we solved for harmonic extension of all the fine-grid boundary conditions. Local multiscale basis functions were then constructed by solving the spectral problem and multiplied the spectral basis functions with the multiscale partition of unity functions. With the offline space constructed, we computed the coarse-scale stiffness matrix. We used the training samples to build deep neural networks for approximating these GMsFEM quantities, and examined the performance of the approximations on the testing set.

Tables 1–3 record the error of the prediction by the neural networks in each testing sample and the mean error measured in the defined metric. It can be seen that the predictions were of high accuracy. This is vital in ensuring the predicted GMsFEM solver is useful. Table 4 records the error of the multiscale solution in each testing sample and the mean error using our proposed method. It can be observed that using the predicted GMsFEM solver, we obtained a good approximation of the multiscale solution compared with the exact GMsFEM solver.


**Table 1.** Percentage error of multiscale basis functions *φ<sup>ω</sup><sup>i</sup>* 1in experiment 1.

**Table 2.** Percentage error of multiscale basis functions *φ<sup>ω</sup><sup>i</sup>* 2 in experiment 1.


**Table 3.** Percentage error of the local stiffness matrix *AK*0 *c* in experiment 1.



**Table 4.** Percentage error of multiscale solution *ums* in experiment 1.
