Article

Meshfree Variational-Physics-Informed Neural Networks (MF-VPINN): An Adaptive Training Strategy

1 Dipartimento di Scienze Matematiche, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
2 MEGAVOLT Team, Inria, 48 Rue Barrault, 75013 Paris, France
3 Laboratoire Jacques-Louis Lions, Sorbonne Center for Artificial Intelligence, Sorbonne Université, 4 Place Jussieu, 75005 Paris, France
* Author to whom correspondence should be addressed.
Algorithms 2024, 17(9), 415; https://doi.org/10.3390/a17090415
Submission received: 22 July 2024 / Revised: 8 September 2024 / Accepted: 11 September 2024 / Published: 19 September 2024
(This article belongs to the Special Issue Numerical Optimization and Algorithms: 2nd Edition)

Abstract: In this paper, we introduce a Meshfree Variational-Physics-Informed Neural Network. It is a Variational-Physics-Informed Neural Network that does not require the generation of a triangulation of the entire domain and that can be trained with an adaptive set of test functions. In order to generate the test space, we exploit an a posteriori error indicator and add test functions only where the error is higher. Four training strategies are proposed and compared. Numerical results show that the accuracy is higher than that of a Variational-Physics-Informed Neural Network trained with the same number of test functions but defined on a quasi-uniform mesh.
MSC:
65N12; 65N15; 65N50; 68T05; 92B20

1. Introduction

Physics-Informed Neural Networks (PINNs) are a rapidly emerging numerical technique used to solve partial differential equations (PDEs) by means of a deep neural network. The first idea can be traced back to the works of Lagaris et al. [1,2,3] but, thanks to hardware advancements and the availability of deep learning packages like TensorFlow [4], PyTorch [5] and JAX [6], PINNs have recently become popular starting from the works of Raissi et al. [7,8], later published in [9]. In the original formulation, the approximate solution is computed as the output of a neural network trained to minimize the PDE residual on a set of collocation points inside the domain and on its boundary.
The growing interest in PINNs is strictly related to their flexibility. In fact, with minor changes to the implementation, it is possible to solve a huge variety of problems. For example, exploiting the nonlinear nature of the involved neural network, nonlinear [10,11] and high-dimensional PDEs [12] can be solved without the need for globalization methods or additional nonlinear solvers. Moreover, by changing the neural network’s input dimensions or suitably adapting the loss function, it is possible to solve parametric [13,14] or inverse [15,16] problems. When external data are available, they can also be used to guide the optimization phase and improve the PINN accuracy [17].
In order to improve the original PINN proposed in [9] and to adapt it to solve specific problems, several generalizations have been proposed. For example, the deep Ritz method (DRM) [18,19,20] looks for a minimizer of the PDE energy functional and, in the deep Galerkin method (DGM) [21,22,23], an approximation of the $L^2$ norm of the PDE residual is minimized. It is also possible to exploit domain decomposition strategies [24,25] as in the conservative PINN (CPINN) [26], in the parallel PINN [27], in the extended PINN (XPINN) [28], or in the Finite Basis PINN (FBPINN) [29]. Moreover, it is even possible to change the neural network architecture or the training strategy as in [14,30,31,32,33,34,35]; among the methods based on different architectures, we highlight some works based on the novel Kolmogorov–Arnold Network (KAN) [36] architecture [37,38] and on a Large Language Model (LLM) [39]. More extensive overviews of the existing approaches can be found in [40,41,42,43]. In the context of the current work, an important extension is the Variational-Physics-Informed Neural Network (VPINN) [44,45], where the weak formulation of the problem is used to construct the loss function.
In this work, we focus on VPINNs. As discussed in [44,45,46,47], in order to train a VPINN, one needs to choose a suitable space of test functions, compute the variational residuals against all the test functions of a basis of such a space, and minimize a linear combination of these residuals. Since a spatial mesh is required to define the test functions, the VPINN cannot be considered a meshfree method, even though it is an extension of the PINN, which is meshfree. In this work, we present an adaptive Meshfree VPINN (MF-VPINN) that does not require a global triangulation of the domain but is trained with the same loss function and neural network architecture as a standard VPINN. Note that the MF-VPINN and the original VPINN can solve the same differential problems because the neural network is trained with the same loss functions. We also highlight that, thanks to the weak formulation of the PDE, both can solve problems whose solution has low regularity and that cannot be solved with standard PINNs (for example, in the presence of singular forcing terms) without introducing further approximations or regularizations. However, one of the VPINN's limitations is that a triangulation of the entire domain is required to define the test functions. Generating it may be very expensive or even impractical for very complex geometries (like, for example, the ones in [48]) and in moderate- or high-dimensional problems, for which automatic mesh-generation algorithms do not exist. For such domains, it is therefore highly advisable or computationally necessary to use a meshfree method such as the original PINN or the proposed MF-VPINN. Moreover, when dealing with complex geometries for which a mesh can hardly be generated, refining the mesh for adaptive methods can be very difficult. In this paper, we describe an algorithm that overcomes this problem and provides a reliable solution.
The paper is organized as follows. In Section 2, we introduce the problem we are interested in. In particular, we focus on the problem discretization in Section 2.1 and on the MF-VPINN loss function in Section 2.2. Then, an a posteriori error estimator is presented in Section 2.3 and used in Section 2.4 to iteratively generate the required test functions. Numerical results are presented in Section 3. In Section 3.1, we describe the model implementation and some strategies to improve the model efficiency; in Section 3.2, we compare different approaches to generate the test functions and analyze their performance; and, in Section 3.3, we analyze the role of the error estimator introduced in Section 2.3. Similar tests are performed on a different problem in Section 3.4 to describe possible extensions to more complex domains. Finally, we conclude the paper in Section 4 and discuss future perspectives and ideas.

2. Problem Formulation

Let us consider the following second-order elliptic problem, defined on a polygonal or polyhedral domain $\Omega \subset \mathbb{R}^n$ with a Lipschitz boundary $\Gamma = \partial\Omega$:
$$\mathcal{L}u := -\nabla \cdot (\mu \nabla u) + \beta \cdot \nabla u + \sigma u = f \quad \text{in } \Omega, \qquad u = g \quad \text{on } \Gamma, \tag{1}$$
where $\mu, \sigma \in L^\infty(\Omega)$ and $\beta \in (W^{1,\infty}(\Omega))^n$ satisfy $\mu \ge \mu_0$ and $\sigma - \frac{1}{2} \nabla \cdot \beta \ge 0$ in $\Omega$ for some constant $\mu_0 > 0$, whereas $f \in L^2(\Omega)$ and $g = \bar{u}|_{\Gamma}$ for some $\bar{u} \in H^1(\Omega)$.
In order to derive the corresponding variational formulation, we define the bilinear form a and the linear form F as
$$a: V \times V \to \mathbb{R}, \qquad a(w, v) = \int_\Omega \mu \nabla w \cdot \nabla v + \beta \cdot \nabla w \, v + \sigma w v, \tag{2}$$
$$F: V \to \mathbb{R}, \qquad F(v) = \int_\Omega f v; \tag{3}$$
where $V$ is the function space $V = H^1_0(\Omega)$. We denote by $\alpha \ge \mu_0$ the coercivity constant of $a$ and by $\|a\|$ and $\|F\|$ the continuity constants of $a$ and $F$. Then, the variational formulation of Problem (1) reads as follows: Find $u \in \bar{u} + V$ such that
$$a(u, v) = F(v) \qquad \forall v \in V. \tag{4}$$

2.1. Problem Discretization

In order to numerically solve Problem (4), one needs to choose suitable finite-dimensional approximations of the trial space $\bar{u} + V$ and of the test space $V$. A Galerkin formulation is obtained when the trial space is approximated by $\bar{u} + V_h^{\mathrm{trial}}$ and the test space by a finite-dimensional space $V_h^{\mathrm{test}}$ with $V_h^{\mathrm{trial}} = V_h^{\mathrm{test}}$; a Petrov–Galerkin formulation is obtained otherwise. In this work, we consider a Petrov–Galerkin formulation in which the trial space is approximated by a set of functions $V^{NN}$ of the form $V^{NN} = \bar{u} + V_h^{\mathrm{trial}}$, with $V_h^{\mathrm{trial}}$ represented by a neural network suitably modified to enforce the Dirichlet boundary conditions, and the test space is a space $V_h$ of piecewise linear functions.
The neural network considered in the following is a standard fully connected feed-forward neural network. Given the number $L$ of layers and a set of matrices $A^\ell \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$ and vectors $b^\ell \in \mathbb{R}^{N_\ell}$, $\ell = 1, \dots, L$, containing the neural network's trainable weights, the function $w: \mathbb{R}^n \to \mathbb{R}$ associated with the considered neural network architecture is:
$$x^0 = x, \qquad x^\ell = \rho\!\left(A^\ell x^{\ell-1} + b^\ell\right), \quad \ell = 1, \dots, L-1, \qquad w(x) = A^L x^{L-1} + b^L, \tag{5}$$
where $\rho: \mathbb{R} \to \mathbb{R}$ is a nonlinear function applied element-wise to the vector $A^\ell x^{\ell-1} + b^\ell$. In this section, we use $\rho(x) = \tanh(x)$; other common choices include, but are not limited to, $\rho(x) = \mathrm{ReLU}(x) = \max\{0, x\}$, $\rho(x) = \mathrm{RePU}(x) = \max\{0, x^p\}$ for $1 < p \in \mathbb{N}$, $\rho(x) = 1/(1 + e^{-x})$ and $\rho(x) = \log(1 + e^{x})$. Note that, in order to represent a function $w: \mathbb{R}^n \to \mathbb{R}$, the widths of the first and last layers are chosen as $N_0 = n$ and $N_L = 1$. We denote by $W^{NN}$ the set of functions that can be represented as in (5) for any combination of the neural network weights and by $w_{NN}$ the vector containing all the trainable weights of the neural network.
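As a concrete illustration, the following Python/TensorFlow sketch builds the architecture of (5) with the hyperparameters later used in Section 3.1 ($L = 5$, 50 neurons per hidden layer, hyperbolic tangent activations, Glorot normal initialization); the function and argument names are illustrative.

```python
import tensorflow as tf

def build_network(n_input=2, n_hidden=50, n_hidden_layers=4):
    """Fully connected feed-forward network w: R^n -> R as in (5)."""
    inputs = tf.keras.Input(shape=(n_input,))
    x = inputs
    for _ in range(n_hidden_layers):                  # layers 1, ..., L-1
        x = tf.keras.layers.Dense(
            n_hidden, activation="tanh",
            kernel_initializer="glorot_normal")(x)
    w = tf.keras.layers.Dense(1)(x)                   # last layer: affine, no activation
    return tf.keras.Model(inputs, w)
```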
The function $w$ defined in (5) is independent of the differential problem that has to be solved and is, in most papers on PINNs or related models, trained to minimize both the residual of the equation and a term penalizing the discrepancy between $w|_\Gamma$ and $g$. Instead, we add a non-trainable layer $B$ to the neural network architecture in order to automatically enforce the required boundary conditions without the need to learn them during the training. As described in [49], the operator $B$ acts on the neural network output as
$$Bw = \phi \, w + \bar{g}, \tag{6}$$
where $\phi: \Omega \to \mathbb{R}$ is a function vanishing on $\Gamma$ and strictly positive inside $\Omega$, and $\bar{g}: \Omega \to \mathbb{R}$ is a suitable extension of $g: \Gamma \to \mathbb{R}$. The advantages of such an approach are also described in [50]. Then, the discrete trial space approximating $\bar{u} + V$ can be defined as
$$V^{NN} = \{ v^{NN} \in \bar{u} + V : v^{NN} = Bw \text{ for some } w \in W^{NN} \}.$$
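The sketch below shows how the non-trainable layer $B$ of (6) can be applied to the raw network output; the specific choices of $\phi$ (a polynomial bubble on the unit square) and $\bar{g}$ (a zero lifting, i.e., homogeneous data) are illustrative, whereas in the first test of Section 3.1 $\bar{g}$ is itself a small network trained to interpolate the boundary data.

```python
import tensorflow as tf

def phi(x):
    # polynomial bubble vanishing on the boundary of (0,1)^2, positive inside
    return x[:, 0:1] * (1.0 - x[:, 0:1]) * x[:, 1:2] * (1.0 - x[:, 1:2])

def g_bar(x):
    # lifting of the Dirichlet datum; here g = 0 for simplicity
    return tf.zeros_like(x[:, 0:1])

def apply_bc(network, x):
    # u^NN = B w = phi * w + g_bar, so u^NN = g on Gamma by construction
    return phi(x) * network(x) + g_bar(x)
```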
On the other hand, the discrete test space $V_h$ is not associated with the neural network and only contains known test functions. In standard VPINNs, one generates a triangulation $\mathcal{T}$ of the domain $\Omega$ and then defines $V_h$ as the space of functions that coincide with a polynomial of order $p \in \mathbb{N}$ inside each element of $\mathcal{T}$. Instead, we want to construct a discrete space $V_h$ of functions independent of a global triangulation $\mathcal{T}$. Moreover, since it has been proven in [47] that the VPINN convergence rate with respect to mesh refinement decreases when the order of the test functions is increased, we are interested in a space $V_h$ that only contains piecewise linear functions. For the sake of simplicity, we only consider the case $n = 2$; the discussion can be directly generalized to the general case $n \in \mathbb{N}$.
Let $\hat{P} \subset \mathbb{R}^n$ be a reference patch. In the following discussion, $\hat{P}$ can be any arbitrary star-shaped polygon with $N_{\hat{P}}$ vertices and a kernel of dimension strictly greater than zero. Nevertheless, in the numerical experiments, we only consider the reference patch $\hat{P} = [0, 1]^2$ to avoid any unnecessary computational overhead. Let $\mathcal{M} = \{M_i\}_{i=1}^{n_{\mathrm{patches}}}$ be a set of affine mappings such that $M_i: \hat{P} \to P_i \subset \Omega$, where we denote by $P_i$ the patch obtained by transforming the reference patch $\hat{P}$ through the map $M_i$. We assume that $\mathcal{P} = \{P_i\}_{i=1}^{n_{\mathrm{patches}}}$ is a cover of $\Omega$, i.e., $\bigcup_{i=1}^{n_{\mathrm{patches}}} P_i = \Omega$, and we admit overlapping patches.
Let us consider the triangulation $\hat{\mathcal{T}} = \{\hat{T}_j : 1 \le j \le N_{\hat{P}}\}$ of $\hat{P}$ obtained by connecting each vertex with a single point $c_{\hat{P}}$ in its kernel. It is then possible to define a piecewise linear function $\hat{\varphi}$ vanishing on the border of $\hat{P}$ such that $\hat{\varphi}(c_{\hat{P}}) = 1$ and $\hat{\varphi}|_{\hat{T}_j} \in \mathbb{P}_1(\hat{T}_j)$ for any $j = 1, \dots, N_{\hat{P}}$. Then, we define the discrete test space $V_h$ as $V_h = \operatorname{span}\{\varphi_i : i = 1, \dots, n_{\mathrm{patches}}\}$, where $\varphi_i \in V$ is the piecewise linear function:
$$\varphi_i(x) = \begin{cases} \hat{\varphi}\!\left(M_i^{-1}(x)\right), & x \in P_i, \\ 0, & x \notin P_i. \end{cases} \tag{7}$$
We remark that the only required triangulation is $\hat{\mathcal{T}}$, which contains only $N_{\hat{P}}$ triangles (in the numerical tests in this paper, $N_{\hat{P}} = 4$). Instead, there exists no mesh on $\Omega$, and the test functions $\varphi_i$ and their supports $P_i$ are all independent. Therefore, the proposed method is said to be meshfree. A simple example of a set of patches $\mathcal{P}$ with $n_{\mathrm{patches}} = 7$ on the domain $\Omega = [0, 1]^2$ is shown in Figure 1. For the sake of simplicity, in this work, we consider a square reference patch $\hat{P}$ with $c_{\hat{P}}$ coinciding with its center, and we let each mapping $M_i$ represent a combination of scalings and translations.
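Under these simplifying assumptions (square patches obtained from $\hat{P} = [0, 1]^2$ by scaling and translation), the test function $\varphi_i$ of (7) is the "pyramid" function sketched below; the implementation is illustrative and only meant to make the construction concrete.

```python
import numpy as np

def phi_hat(xi):
    # piecewise linear pyramid on [0,1]^2: equal to 1 at the center,
    # linear on each of the 4 triangles of T_hat, and 0 on the boundary
    return np.maximum(0.0, 1.0 - 2.0 * np.maximum(np.abs(xi[..., 0] - 0.5),
                                                  np.abs(xi[..., 1] - 0.5)))

def phi_i(x, center, edge):
    # phi_i(x) = phi_hat(M_i^{-1}(x)) inside P_i and 0 outside, see (7);
    # M_i^{-1}(x) = (x - c_i)/h_i + (1/2, 1/2) for a square patch of edge h_i
    xi = (np.asarray(x) - np.asarray(center)) / edge + 0.5
    inside = np.all((xi >= 0.0) & (xi <= 1.0), axis=-1)
    return np.where(inside, phi_hat(xi), 0.0)
```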
Using the introduced finite-dimensional set of functions $V^{NN}$ and space $V_h$, it is possible to discretize Problem (4) as follows: Find $u^{NN} \in V^{NN}$ such that
$$a(u^{NN}, v) = F(v) \qquad \forall v \in V_h. \tag{8}$$

2.2. Loss Function

In this section, we derive the loss function used to train the neural network. It has to be computable, and its minimizer has to be an approximate solution of Problem (1). We highlight that, when a standard PINN is used, the loss function can be seen as a discrete cost penalizing the residual of (1) directly. In this context, instead, the loss function penalizes the variational residuals of (4), as in standard VPINNs. This is the key feature that differentiates VPINNs (and the extension proposed in this manuscript) from the other generalizations of the original PINN introduced in Section 1.
Let us consider a quadrature rule of order $q \ge 2$ on each triangle $\hat{T}_j \in \hat{\mathcal{T}}$, $j = 1, \dots, N_{\hat{P}}$, uniquely identified by a set of nodes and weights $\{(\tilde{\xi}_j^{\iota}, \tilde{\omega}_j^{\iota}) : \iota \in I_{\hat{T}_j}\}$. The nodes and weights of a composite quadrature formula of order $q$ on $\hat{P}$ can be obtained as
$$\{(\hat{\xi}^{\iota}, \hat{\omega}^{\iota}) : \iota \in I_{\hat{P}}\} = \bigcup_{j=1}^{N_{\hat{P}}} \{(\tilde{\xi}_j^{\iota}, \tilde{\omega}_j^{\iota}) : \iota \in I_{\hat{T}_j}\}.$$
Then, the corresponding quadrature rule of order $q$ on an arbitrary patch $P_i$ is defined as
$$\left\{ (\xi_i^{\iota}, \omega_i^{\iota}) : \iota \in I_{\hat{P}} \ \middle|\ \xi_i^{\iota} = M_i(\hat{\xi}^{\iota}), \ \omega_i^{\iota} = \hat{\omega}^{\iota} \, \frac{\operatorname{area}(P_i)}{\operatorname{area}(\hat{P})} \right\}. \tag{9}$$
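For the square patches used in the experiments, the mapping of the reference nodes and weights in (9) reduces to a scaling and translation, as in the following illustrative sketch.

```python
import numpy as np

def patch_quadrature(xi_hat, w_hat, center, edge):
    """Map a reference rule {(xi_hat, w_hat)} on [0,1]^2 to a square patch P_i,
    rescaling the weights by area(P_i)/area(P_hat) as in (9)."""
    xi_hat = np.asarray(xi_hat)                      # shape (n_q, 2)
    center = np.asarray(center)
    xi_i = center + edge * (xi_hat - 0.5)            # M_i(xi_hat)
    w_i = np.asarray(w_hat) * edge**2                # area(P_i) / area(P_hat) = edge^2
    return xi_i, w_i
```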
Using the quadrature rule in (9), it is possible to define an approximate restriction to each patch of the forms $a$ and $F$ as follows:
$$a_h^i(w, v) = \sum_{\iota \in I_{\hat{P}}} \left[\mu \nabla w \cdot \nabla v + \beta \cdot \nabla w \, v + \sigma w v\right]\!(\xi_i^{\iota}) \, \omega_i^{\iota} \approx a_{P_i}(w, v), \tag{10}$$
$$F_h^i(v) = \sum_{\iota \in I_{\hat{P}}} \left[f v\right]\!(\xi_i^{\iota}) \, \omega_i^{\iota} \approx F_{P_i}(v), \tag{11}$$
where $a_{P_i}(w, v)$ and $F_{P_i}(v)$ are defined as in (2) and (3) but restricting the supports of the integrals to $P_i$. We remark that, since it is not possible to compute integrals involving a neural network exactly, we can only use the forms $a_h^i$ and $F_h^i$ in the loss function. Exploiting the linearity of $a(w, v)$ and $F(v)$ with respect to $v$ to consider only the basis $\{\varphi_i\}_{i=1}^{n_{\mathrm{patches}}}$ of $V_h$ as the set of test functions, we approximate Problem (8) as follows: Find $u^{NN} \in V^{NN}$ such that
$$a_h^i(u^{NN}, \varphi_i) = F_h^i(\varphi_i) \qquad i = 1, \dots, n_{\mathrm{patches}}. \tag{12}$$
Then, in order to cast Problem (12) into an optimization problem, we define the residuals
$$r_{h,i}(w) = F_h^i(\varphi_i) - a_h^i(w, \varphi_i), \qquad i = 1, \dots, n_{\mathrm{patches}}, \tag{13}$$
and the loss function
$$R_h^2(w; \mathcal{P}) = \frac{1}{n_{\mathrm{patches}}} \sum_{i=1}^{n_{\mathrm{patches}}} \gamma_i \, r_{h,i}^2(w), \tag{14}$$
where the $\gamma_i$ are suitable positive scaling coefficients. In this work, we use $\gamma_i = \operatorname{area}(P_i)^{-1}$ to give the same importance to each patch. Note that this is equivalent to normalizing the quadrature rules involved in (10) and (11); this way, each residual $r_{h,i}$ can be regarded as a linear combination of the MF-VPINN values and derivatives that is independent of the size of the support of the patch $P_i$. We also highlight that the loss function depends on the choice of $\mathcal{M}$, since all the used test functions are generated starting from the corresponding mappings $M_i \in \mathcal{M}$. We are now interested in a practical procedure to obtain a set $\tilde{\mathcal{P}}$ such that the approximate solution computed by minimizing $R_h^2(\,\cdot\,; \tilde{\mathcal{P}})$ is as accurate as possible, with $\tilde{\mathcal{P}}$ being as small as possible.
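A minimal sketch of how (13) and (14) can be assembled is reported below for the Poisson special case $-\Delta u = f$ used in Section 3.2; tensor names and shapes are illustrative, and the actual implementation combines precomputed sparse and dense tensors as described in Section 3.1.

```python
import tensorflow as tf

def mf_vpinn_loss(grad_u, quad_w, phi, grad_phi, f, gamma):
    """Loss (14) for -Laplace(u) = f.
    grad_u, grad_phi: (n_patches, n_q, 2); quad_w, phi, f: (n_patches, n_q);
    gamma: (n_patches,) scaling coefficients gamma_i."""
    # a_h^i(u, phi_i) = sum_q w_q grad u . grad phi_i
    a_terms = tf.reduce_sum(quad_w * tf.reduce_sum(grad_u * grad_phi, axis=-1), axis=1)
    # F_h^i(phi_i) = sum_q w_q f phi_i
    F_terms = tf.reduce_sum(quad_w * f * phi, axis=1)
    residuals = F_terms - a_terms                    # r_{h,i}(u^NN), see (13)
    return tf.reduce_mean(gamma * residuals**2)      # R_h^2(u^NN; P), see (14)
```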

2.3. The a Posteriori Error Estimator

The goal of this section is to derive an error estimator associated with an arbitrary patch $P_i$, with $i \in \{1, \dots, n_{\mathrm{patches}}\}$. To do so, we rely on the a posteriori error estimator proposed in [46]. It has been proven to be efficient and reliable; therefore, such an estimator allows us to know where the error is larger without knowing the exact solution of the PDE. Let us consider the patch $P_i$, formed by the triangles $T_{i,1}, \dots, T_{i,N_{\hat{P}}}$, and a triangulation $\mathcal{T}_i$ of $\Omega$ such that $T_{i,j} \in \mathcal{T}_i$ for every $j = 1, \dots, N_{\hat{P}}$. We remark that the triangulation $\mathcal{T}_i$ does not have to be explicitly generated; it is only used to properly define all the quantities introduced in [46] that are required to derive the proposed error estimator.
Let $V_h^i = \operatorname{span}\{\psi_j^i : 1 \le j \le \dim V_h^i\}$ be the space of piecewise linear functions defined on $\mathcal{T}_i$, where $\{\psi_j^i : 1 \le j \le \dim V_h^i\}$ is a Lagrange basis of $V_h^i$. It is then possible to define two constants $c_h^i$ and $C_h^i$, with $0 < c_h^i < C_h^i$, such that
$$c_h^i \, |v|_{1,\Omega} \le \|\mathbf{v}\|_2 \le C_h^i \, |v|_{1,\Omega} \qquad \forall v \in V_h^i, \tag{15}$$
where $v = \sum_{j=1}^{\dim V_h^i} v_j \psi_j^i$ is an arbitrary element of $V_h^i$ associated with the expansion coefficients $\mathbf{v} = \left(v_1, \dots, v_{\dim V_h^i}\right)$ and $\|\mathbf{v}\|_2 = \left( \sum_{j=1}^{\dim V_h^i} v_j^2 \right)^{1/2}$.
Then, given an integer $k \ge 0$, for any element $E \in \mathcal{T}_i$, we define the projection operator $\Pi_{E,k}: L^2(E) \to \mathbb{P}_k(E)$ such that
$$\int_E \Pi_{E,k}\phi = \int_E \phi \qquad \forall \phi \in L^2(E). \tag{16}$$
We also denote by $\{(\xi_E^{\iota}, \omega_E^{\iota}) : \iota \in I_E\}$ a quadrature formula of order $q$ on $E$ and define the quadrature-based discrete seminorm:
$$\|v\|_{0,E,\omega} = \left( \sum_{\iota \in I_E} v^2(\xi_E^{\iota}) \, \omega_E^{\iota} \right)^{1/2}. \tag{17}$$
We require the weights and nodes of this quadrature rule to coincide with the ones introduced in (9) when $E$ is a triangle included in $P_i$ (i.e., when $E \in \{T_{i,1}, \dots, T_{i,N_{\hat{P}}}\}$). We can now introduce all the terms involved in the a posteriori error estimator.
Let $\eta_{\mathrm{rhs},1}(E)$ and $\eta_{\mathrm{rhs},2}(E)$ be the quantities:
$$\eta_{\mathrm{rhs},1}(E) = h_E \left\| f - \Pi_{E,q-1} f \right\|_{0,E}, \qquad \eta_{\mathrm{rhs},2}(E) = h_E \left\| f - \Pi_{E,q-1} f \right\|_{0,E,\omega} + \left\| f - \Pi_{E,q} f \right\|_{0,E,\omega}. \tag{18}$$
They measure the oscillations of the forcing term with respect to its polynomial projections in various norms. Similar oscillations of the diffusion, convection and reaction terms are measured by the terms $\eta_{\mathrm{coef},i}(E)$, $i = 1, \dots, 6$:
$$\begin{aligned}
\eta_{\mathrm{coef},1}(E) &= \left\| \mu \nabla u^{NN} - \Pi_{E,q}(\mu \nabla u^{NN}) \right\|_{0,E}, \\
\eta_{\mathrm{coef},2}(E) &= h_E \left\| \beta \cdot \nabla u^{NN} - \Pi_{E,q-1}(\beta \cdot \nabla u^{NN}) \right\|_{0,E}, \\
\eta_{\mathrm{coef},3}(E) &= h_E \left\| \sigma u^{NN} - \Pi_{E,q-1}(\sigma u^{NN}) \right\|_{0,E}, \\
\eta_{\mathrm{coef},4}(E) &= \left\| \mu \nabla u^{NN} - \Pi_{E,q}(\mu \nabla u^{NN}) \right\|_{0,E,\omega}, \\
\eta_{\mathrm{coef},5}(E) &= h_E \left\| \beta \cdot \nabla u^{NN} - \Pi_{E,q-1}(\beta \cdot \nabla u^{NN}) \right\|_{0,E,\omega} + \left\| \beta \cdot \nabla u^{NN} - \Pi_{E,q}(\beta \cdot \nabla u^{NN}) \right\|_{0,E,\omega}, \\
\eta_{\mathrm{coef},6}(E) &= h_E \left\| \sigma u^{NN} - \Pi_{E,q-1}(\sigma u^{NN}) \right\|_{0,E,\omega} + \left\| \sigma u^{NN} - \Pi_{E,q}(\sigma u^{NN}) \right\|_{0,E,\omega},
\end{aligned} \tag{19}$$
where $u^{NN}$ is the output of the neural network after the enforcement of the Dirichlet boundary conditions through the operator $B$ and $h_E$ is the diameter of $E$. Then, let us define the term $\eta_{\mathrm{res}}(E)$, which measures how well the equation is satisfied, as
$$\eta_{\mathrm{res}}(E) = h_E \left\| \mathrm{bulk}_E(u^{NN}) \right\|_{0,E} + h_E^{1/2} \sum_{e \subset \partial E} \left\| \mathrm{jump}_e(u^{NN}) \right\|_{0,e}, \tag{20}$$
where
$$\mathrm{bulk}_E(u^{NN}) = \Pi_{E,q-1} f + \nabla \cdot \Pi_{E,q}(\mu \nabla u^{NN}) - \Pi_{E,q-1}(\beta \cdot \nabla u^{NN} + \sigma u^{NN}),$$
$$\mathrm{jump}_e(u^{NN}) = \Pi_{E_1,q}(\mu \nabla u^{NN}) \cdot n - \Pi_{E_2,q}(\mu \nabla u^{NN}) \cdot n.$$
Note that $\mathrm{jump}_e(u^{NN})$ measures the interelemental jump of $\Pi_{E,q}(\mu \nabla u^{NN})$ across the edge $e$, with unit normal vector $n$, shared by the elements $E_1$ and $E_2$.
Finally, we introduce the approximate elemental forms:
$$a_h^{i,E}(w, v) = \sum_{\iota \in I_E} \left[\mu \nabla w \cdot \nabla v + \beta \cdot \nabla w \, v + \sigma w v\right]\!(\xi_E^{\iota}) \, \omega_E^{\iota},$$
$$F_h^{i,E}(v) = \sum_{\iota \in I_E} \left[f v\right]\!(\xi_E^{\iota}) \, \omega_E^{\iota},$$
where $\xi_E^{\iota}$ and $\omega_E^{\iota}$, $\iota \in I_E$, are the nodes and weights used in Equation (17). With such forms, it is possible to define the residuals
$$r_{h,i,j}(w) = \sum_{E \in \mathcal{T}_i} \left[ F_h^{i,E}(\psi_j^i) - a_h^{i,E}(w, \psi_j^i) \right], \qquad j = 1, \dots, \dim V_h^i,$$
and the quantity $\eta_{\mathrm{loss}}(E)$ as
$$\eta_{\mathrm{loss}}(E) = C_h \left( \sum_{j \in I_h^E} r_{h,i,j}^2(u^{NN}) \right)^{1/2}. \tag{23}$$
Here, denoting the support of the function $\psi_j^i \in V_h^i$ by $\operatorname{supp} \psi_j^i$, the elemental index set
$$I_h^E = \{ j \in I_h : E \subset \operatorname{supp} \psi_j^i \}$$
is the set containing the indices of the functions whose support contains $E$. It is then possible to estimate the error between the unknown exact solution $u$ and its MF-VPINN approximation $u^{NN}$ by means of the computable quantities in Equations (18)–(20) and (23) as
$$|u - u^{NN}|_{1,E} \lesssim \left( \eta_{\mathrm{res}}^2(E) + \eta_{\mathrm{loss}}^2(E) + \sum_{i=1}^{6} \eta_{\mathrm{coef},i}^2(E) + \sum_{i=1}^{2} \eta_{\mathrm{rhs},i}^2(E) \right)^{1/2}. \tag{24}$$
Once more, we refer to [46] for the proof of such a statement.
We recall that our goal is to obtain a computable error estimator associated with a single patch $P_i$. When evaluated on an element $E \subset P_i$, the quantity on the right-hand side of Equation (24) implicitly depends on several elements of $V_h^i$ that do not belong to $P_i$, because of the presence of $\eta_{\mathrm{res}}^2(E)$ and $\eta_{\mathrm{loss}}^2(E)$. Therefore, such an estimator is not computable without generating the triangulation $\mathcal{T}_i$ and the corresponding space $V_h^i$. Instead, we look for an error estimator that does not control the error on the entire patch but only in a neighborhood $N_i$ of its center $c_{P_i} = M_i(c_{\hat{P}})$. This can be carried out by considering only the terms whose computation involves geometric elements containing $c_{P_i}$ and the only function $\psi_j^i$ that does not vanish at $c_{P_i}$. Note that such a function is the function $\varphi_i$ defined in (7). Therefore, the error estimator $\eta_i$ that controls the error in $N_i$ can be computed as
$$\eta_i = \left( \eta_{\mathrm{res},i}^2 + C_h^2 \, r_{h,i}^2(u^{NN}) + \sum_{j=1}^{N_{\hat{P}}} \left( \sum_{k=1}^{6} \eta_{\mathrm{coef},k}^2(T_{i,j}) + \sum_{k=1}^{2} \eta_{\mathrm{rhs},k}^2(T_{i,j}) \right) \right)^{1/2}, \tag{25}$$
where $\eta_{\mathrm{res},i}$ is defined as
$$\eta_{\mathrm{res},i} = \sum_{j=1}^{N_{\hat{P}}} \left( h_{T_{i,j}} \left\| \mathrm{bulk}_{T_{i,j}}(u^{NN}) \right\|_{0,T_{i,j}} + h_{P_i}^{1/2} \left\| \mathrm{jump}_{e_{i,j}}(u^{NN}) \right\|_{0,e_{i,j}} \right). \tag{26}$$
In (26), we denote by $h_{P_i}$ the diameter of the patch $P_i$ and by $e_{i,j}$, $j = 1, \dots, N_{\hat{P}}$, the edges connecting its vertices with $c_{P_i}$.
Since $\eta_i$ can be seen as an approximation of the right-hand side of (24), we use it as an indicator of the error $|u - u^{NN}|_{1,N_i}$. It is important to remark that $\eta_i$ can be computed without generating $\mathcal{T}_i$ and $V_h^i$. In fact, its computation involves only the function $\varphi_i$, the triangles partitioning $P_i$, and the edges connecting its vertices with its center.
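The following illustrative sketch shows how $\eta_i$ in (25) can be assembled once its individual contributions have been evaluated on the $N_{\hat{P}}$ triangles of the patch; the function and argument names are assumptions of this example, not part of the actual code.

```python
import numpy as np

def patch_indicator(eta_res_i, C_h, r_hi, eta_coef, eta_rhs):
    """eta_i of (25).
    eta_res_i: scalar (26); C_h: constant of (15); r_hi: residual r_{h,i}(u^NN);
    eta_coef: array (N_Phat, 6); eta_rhs: array (N_Phat, 2)."""
    return np.sqrt(eta_res_i**2 + (C_h * r_hi)**2
                   + np.sum(np.asarray(eta_coef)**2)
                   + np.sum(np.asarray(eta_rhs)**2))
```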

2.4. The Choice of M and P

In this section, we describe the procedure adopted to generate the set of test functions used to train the MF-VPINN. We propose an iterative approach in which the MF-VPINN is initially trained with very few test functions, and other test functions are then added in the regions of the domain in which the $H^1$ norm of the error is larger. We anticipate that, as shown in Section 3.3, generating test functions in regions where $r_{h,i}^2$ is large may not lead to accurate solutions, because $r_{h,i}^2$ is not proportional to the $H^1$ error. Therefore, such a choice may increase the density of test functions where they are not required while maintaining only a few test functions in regions in which the error is large. Instead, we use the error indicator $\eta_i$ defined in (25).
Let us initially consider a cover $\mathcal{P}_0 = \{P_i\}_{i=1}^{n_{\mathrm{patches}}}$ of $\Omega$ comprising a few patches (i.e., $n_{\mathrm{patches}}$ is a small integer) and the corresponding set of mappings $\mathcal{M}_0 = \{M_i\}_{i=1}^{n_{\mathrm{patches}}}$ and test functions $\{\varphi_i\}_{i=1}^{n_{\mathrm{patches}}}$. These sets induce a loss function $R_h^2(w; \mathcal{P}_0)$ as defined in (14), which is used to train an MF-VPINN. After this initial training, one computes $\eta_i^\gamma = \gamma_i \eta_i$ for each patch $P_i \in \mathcal{P}_0$ and stores the result in the array $\boldsymbol{\eta} = \left[\eta_1^\gamma, \dots, \eta_{n_{\mathrm{patches}}}^\gamma\right]$. Note that $\eta_i^\gamma$ is a suitable rescaling of $\eta_i$ that removes the dependence on the size of $P_i$. Let us choose a threshold $1 \le \tau_0 \le n_{\mathrm{patches}}$, sort $\boldsymbol{\eta}$ in descending order obtaining $\boldsymbol{\eta}^{\mathrm{sort}} = \left[\eta_{s_1}^\gamma, \dots, \eta_{s_{n_{\mathrm{patches}}}}^\gamma\right]$ (where we denote by $[s_1, \dots, s_{n_{\mathrm{patches}}}]$ the index set corresponding to a suitable permutation of $[1, \dots, n_{\mathrm{patches}}]$), and consider the vector $\bar{\boldsymbol{\eta}}_0 = \left[\eta_{s_1}^\gamma, \dots, \eta_{s_{\tau_0}}^\gamma\right]$. Note that $\bar{\boldsymbol{\eta}}_0$ contains only the $\tau_0$ worst values of the indicator; it thus allows us to understand where the error is higher and where additional test functions are required to increase the model accuracy.
It is then possible to move on to the second iteration of the iterative training. For each patch $P_i$ such that $\eta_i^\gamma \in \bar{\boldsymbol{\eta}}_0$, we generate $k_{\mathrm{new}}$ new patches $P_i^k$, $k = 1, \dots, k_{\mathrm{new}}$, with centers inside $P_i$ and areas such that $\operatorname{area}(P_i) < \sum_{k=1}^{k_{\mathrm{new}}} \operatorname{area}(P_i^k) < c \cdot \operatorname{area}(P_i)$, where $c > 1$ is a tunable parameter. In the numerical experiments, we use $c = 1.25$. There exist different strategies to choose the number, the size, and the positions of the centers of the new patches. Such strategies are described in Section 3, with particular attention to the effects of these choices on the MF-VPINN accuracy.
Let us denote by $\mathcal{P}_1$ the set $\mathcal{P}_1 = \mathcal{P}_0 \cup \{P_{s_1}^k\}_{k=1}^{k_{\mathrm{new}}} \cup \dots \cup \{P_{s_{\tau_0}}^k\}_{k=1}^{k_{\mathrm{new}}}$ and by $\mathcal{M}_1$ the corresponding set of mappings. Then, it is possible to define the loss function $R_h^2(w; \mathcal{P}_1)$, continue the training of the previously trained MF-VPINN, compute the error indicator $\eta_i^\gamma$ for each patch $P_i \in \mathcal{P}_1$, and obtain the vector $\bar{\boldsymbol{\eta}}_1$ used to decide where to insert the new patches to generate $\mathcal{P}_2$. In general, iterating this procedure, it is possible to compute a set of patches $\mathcal{P}_m$ and of mappings $\mathcal{M}_m$ from the previously obtained sets $\mathcal{P}_{m-1}$ and $\mathcal{M}_{m-1}$. Technical optimization details are discussed in Section 3.1.

3. Numerical Results

In this section, we provide several numerical results to show the performance of the training strategy described in Section 2.4. In Section 3.1, we describe the structure of the MF-VPINN implementation and highlight some details that have to be taken into account in order to increase the efficiency of the training phase. Different strategies to choose the positions of the new patches are discussed in Section 3.2. The importance of the error indicator is highlighted in Section 3.3 with additional numerical examples. An example on a more complex domain is shown in Section 3.4, together with some ideas to adapt the proposed strategies to such geometries.

3.1. Implementation Details

The computer code used to perform the experiments is implemented in Python, using the package TensorFlow [4] to generate the neural network architecture and train the MF-VPINN. Using the notation introduced in Section 2.1, the neural network consists of $L = 5$ layers with $N_\ell = 50$ neurons in each hidden layer (i.e., for $\ell = 1, \dots, L - 1$); the activation function is the hyperbolic tangent in each hidden layer. For the first iteration of the iterative training, the neural network weights in the $\ell$-th layer are initialized with a Glorot normal distribution, i.e., a truncated normal distribution with mean 0 and standard deviation equal to $\sqrt{2 / (N_{\ell-1} + N_\ell)}$. Then, for the subsequent iterations, they are initialized with the weights obtained at the end of the previous one.
During the first iteration of the training (i.e., during the minimization of $R_h^2(\,\cdot\,; \mathcal{P}_0)$), the optimization is carried out by exploiting the ADAM optimizer [51], with a learning rate decaying exponentially from $10^{-2}$ to $10^{-4}$, and with the second-order L-BFGS optimizer [52]. Then, from the second training iteration, we only use the L-BFGS optimizer. We remark that L-BFGS converges very quickly, but only if the starting point is close enough to the problem's solution. Therefore, in the first training iteration, we use ADAM to obtain a first approximation of the solution, which is then improved via L-BFGS. Then, since the $m$-th training iteration starts from the solution computed during the $(m-1)$-th one, we assume that the starting point is close enough to the solution of the new optimization problem (associated with a different loss function with more patches), and we only use L-BFGS to increase the training efficiency.
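A possible first-iteration optimizer setup is sketched below; the number of ADAM steps is an illustrative assumption, and a second-order L-BFGS stage can be implemented, for example, with tfp.optimizer.lbfgs_minimize from TensorFlow Probability.

```python
import tensorflow as tf

n_adam_steps = 5000          # illustrative value, not taken from the paper
# learning rate decaying exponentially from 1e-2 to 1e-4 over n_adam_steps steps
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=n_adam_steps,
    decay_rate=1e-4 / 1e-2)  # lr(n_adam_steps) = 1e-2 * 1e-2 = 1e-4
adam = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```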
During the m-th iteration of the training, the training set consists of all the quadrature nodes ξ i , for any I P ^ and for any patch P i P m as defined in (9). The order of the chosen quadrature rule is q = 3 inside each triangle. The Dirichlet boundary conditions are imposed by means of the operator B defined in (6). In this operator, for our first numerical test, the function ϕ is a polynomial bubble vanishing on Γ and g ¯ is the output of a neural network trained to interpolate the boundary data. For the numerical test in Section 3.4, instead, ϕ is computed as in [50] and g = 0 . To decrease the training time, the functions ϕ , ϕ , g ¯ and g ¯ are evaluated only once at the beginning of the m-th training iteration and they are then combined to evaluate B u N N and its gradient (where u N N is the output of the last layer of the neural network). The derivatives of u N N and g ¯ are computed via automatic differentiation [53] due to the complexity of their analytical expressions.
The output of the model is the value of the function B u N N and its gradient evaluated at the input points. Such values are then suitably combined using sparse and dense tensors to compute the quantity R h 2 ( B u N N ; P m ) . The sparse tensors contain the evaluation of φ i and φ i at each input point, whereas the dense ones store the quadrature weights, the vector γ = { γ i } i = 1 n patches and the evaluation of μ , β , σ and f at the input points. We highlight that all these tensors have to be computed once at the beginning of the m-th training iteration (updating the ones of the ( m 1 ) -th iteration) to significantly decrease the training computational cost.
As discussed in Section 2.1, we assume that all the patches and test functions can be generated from a reference patch P ^ . For each patch P i P m , one has to generate all the data structures required to assemble the loss function and the error indicator η i . To do so, it is possible to explicitly construct all the tensors required to assemble the term a ^ h ( w , φ ^ ) and all the terms involved in the computation of the reference error indicator η ^ only once, at the beginning of the first iteration of the training. Then, all these tensors can be suitably rescaled to obtain the ones corresponding to the patches and test functions involved in the loss function and error indicators computations.
To stabilize the MF-VPINN, we introduce the $L^2$ regularization term
$$L_{\mathrm{reg}}(w_{NN}) = \lambda_{\mathrm{reg}} \|w_{NN}\|_2^2, \tag{27}$$
where $w_{NN}$ is the vector of trainable weights of the neural network introduced in Section 2.1. In our numerical experiments, we use $\lambda_{\mathrm{reg}} = 10^{-5}$. During the $m$-th iteration of the training, such a quantity is added to $R_h^2(B u^{NN}; \mathcal{P}_m)$ to obtain the training loss function
$$\mathcal{L}^m(w_{NN}) = R_h^2(B u^{NN}; \mathcal{P}_m) + L_{\mathrm{reg}}(w_{NN}),$$
which has to be minimized accurately enough. Indeed, if $\mathcal{L}^m$ is minimized poorly, the new patches in $\mathcal{P}_{m+1} \setminus \mathcal{P}_m$ may be added in regions where they are not necessary, because the accuracy of $B u^{NN}$ may still improve during the training, and may not be inserted in areas where they are required. Note that, in order to compute the numerical solution, the MF-VPINN has to be trained multiple times with different sets of patches $\mathcal{P}_m$ to minimize the losses $\{\mathcal{L}^m\}$. Since such an iterative training may be expensive, we propose an early stopping strategy [54] based on the discussed error indicator to reduce its computational cost. In its basic version, early stopping consists of evaluating a chosen metric on a validation set in order to know when the neural network accuracy on data that are not present in the training set starts worsening. Interrupting the training at that point prevents overfitting and improves generalization. In our context, instead, we can directly track the behavior of the MF-VPINN $H^1$ error on each patch through the corresponding error indicator to understand when it stops decreasing. Therefore, given the set of patches $\mathcal{P}_m$, the chosen metric is the linear combination $ES_m = \sum_{i=1}^{\dim(\mathcal{P}_m)} \eta_i^\gamma$. Numerical results showing the performance of this strategy are presented in Section 3.2 and Section 3.4.

3.2. Adaptive Training Strategies

Let us consider the Poisson problem:
$$-\Delta u = f \quad \text{in } \Omega, \qquad u = g \quad \text{on } \Gamma, \tag{28}$$
defined on the unit square $\Omega = (0, 1)^2$. The forcing term $f$ and the boundary condition $g$ are chosen such that the exact solution is, in polar coordinates,
$$u(r, \theta) = r^{2/3} \sin\!\left( \frac{2}{3}\left( \theta + \frac{\pi}{2} \right) \right). \tag{29}$$
We use this function, represented in Figure 2, because the solution $u$ is such that $u \in H^{5/3 - \varepsilon}(\Omega)$ but $u \in C^\infty(\Omega \setminus N_0)$, where we denote by $N_0$ a neighborhood of the origin. Therefore, we know that an efficient distribution of patches has to be characterized by a high density only near the origin.
Below, we propose, in order of complexity, three alternatives for constructing the new patches after having marked the ones with the highest error indicators. The first strategy is the simplest and most intuitive one: the new patches are randomly generated with centers inside the marked patches. The second and third strategies, instead, place the new centers on a small local Cartesian grid to ensure a more regular distribution. The difference between the second and the third strategies is that, in the third one, the marked patches are removed to increase the efficiency and a constraint is added to the marking procedure to ensure a more regular distribution of the new patches.
  • Strategy #1: Random patch centers with uniform distribution
To solve Problem (28), as a first strategy, we consider the reference patch $\hat{P} = (0, 1)^2$ and generate a sequence of sets of patches. During the first training iteration, we use $\mathcal{P}_0 = \{\hat{P}\}$, since this is already a cover of $\Omega$. During the second iteration, we enrich the set of patches as $\mathcal{P}_1 = \mathcal{P}_0 \cup \{P_1, P_2, P_3, P_4\}$, where $P_1$, $P_2$, $P_3$ and $P_4$ are square patches with edge $h_i = 0.6$, $i = 1, \dots, 4$, and centers
$$c_{P_1} = (0.3, 0.3), \quad c_{P_2} = (0.7, 0.3), \quad c_{P_3} = (0.3, 0.7), \quad c_{P_4} = (0.7, 0.7).$$
This allows us to start from a homogeneous distribution of patches before using the error indicator to choose the location of the new patches. Then, to decide how many patches have to be added to $\mathcal{P}_{m-1}$ to generate $\mathcal{P}_m$, we choose $\tilde{\tau}_m$ such that
$$\tilde{\tau}_m = \dim\left\{ \tilde{\tau} \in \{1, \dots, \dim(\mathcal{P}_{m-1})\} : \frac{\sum_{i=1}^{\tilde{\tau}} \eta_{s_i}^\gamma}{\sum_{i=1}^{\dim(\mathcal{P}_{m-1})} \eta_i^\gamma} < 0.75 \right\} + 1 \tag{30}$$
and fix
$$\tau_m = \min\!\left( 0.3 \cdot \dim(\mathcal{P}_{m-1}), \, \tilde{\tau}_m \right). \tag{31}$$
Note that (30) allows us to consider the smallest set of patches whose error indicators contribute at least $75\%$ of the global error indicator $ES_{m-1}$, whereas (31) limits, for efficiency reasons, the maximum number of patches that can be added.
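A minimal sketch of (30) and (31) is reported below; the rounding of $0.3 \cdot \dim(\mathcal{P}_{m-1})$ to an integer is an assumption of this example.

```python
import numpy as np

def number_to_refine(eta_gamma):
    """Compute tau_m from the scaled indicators eta_i^gamma, see (30)-(31)."""
    eta_sorted = np.sort(np.asarray(eta_gamma))[::-1]          # descending order
    fractions = np.cumsum(eta_sorted) / np.sum(eta_sorted)
    tau_tilde = int(np.searchsorted(fractions, 0.75)) + 1      # smallest set reaching 75%
    return min(int(0.3 * len(eta_sorted)), tau_tilde)
```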
Then, to generate the generic set of patches $\mathcal{P}_m$, we fix a multiplication factor $C_M$ that decides how many new patches have to be inserted inside each patch $P_i$ such that $\eta_i^\gamma \in \bar{\boldsymbol{\eta}}_{m-1}$. Inside each chosen patch $P_i$, $C_M$ centers $\tilde{c}_{P_i^k} = (\tilde{x}_i^k, \tilde{y}_i^k)$, $k = 1, \dots, C_M$, are randomly generated with a uniform distribution, and the edge lengths of the new patches are chosen as $h_i^k = \lambda \sqrt{A_{\mathrm{ratio}} / C_M} \, h_i$. Here, $\lambda$ is a random real value drawn from the uniform distribution $\mathcal{U}\!\left(\frac{9}{10}, \frac{10}{9}\right)$, and the scaling coefficient $\sqrt{A_{\mathrm{ratio}} / C_M}$ is chosen such that the sum of the areas of the new patches is $A_{\mathrm{ratio}}$ times the area of the original patch $P_i$. In the numerical experiments, we use $A_{\mathrm{ratio}} = 1.25$. This way, it is possible to allow the new patches to overlap and keep the area of the region $P_i \setminus \bigcup_{k=1}^{C_M} P_i^k$ reasonably small.
We remark that, with this strategy, some patches may partially lie outside $\Omega$. In order to avoid this risk, we move the centers $\tilde{c}_{P_i^k}$ to obtain the actual patch centers $c_{P_i^k}$ as follows:
$$c_{P_i^k} = (x_i^k, y_i^k) = \left( \max\!\left( \min\!\left( \tilde{x}_i^k, \, 1 - \frac{h_i^k}{2} \right), \frac{h_i^k}{2} \right), \ \max\!\left( \min\!\left( \tilde{y}_i^k, \, 1 - \frac{h_i^k}{2} \right), \frac{h_i^k}{2} \right) \right). \tag{32}$$
We remark that, when the patch $P_i$ is very close to a vertex of the domain, multiple original centers $\tilde{c}_{P_i^k}$ may be such that the distances of both $\tilde{x}_i^k$ and $\tilde{y}_i^k$ from the $x$ and $y$ coordinates of the domain vertex are smaller than $h_i^k / 2$. In this case, it is important to include the random coefficient $\lambda$ in the definition of $h_i^k$ to avoid updating all these centers to the same point; otherwise, multiple new patches would coincide (because they would share the same center and size).
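A sketch of the refinement step of Strategy #1 for a single marked patch follows; it assumes square patches inside the unit square, and the edge-length formula reflects the reconstruction $h_i^k = \lambda \sqrt{A_{\mathrm{ratio}} / C_M} \, h_i$ discussed above.

```python
import numpy as np

def refine_random(center, edge, C_M=4, A_ratio=1.25, rng=None):
    """Generate C_M new square patches with random centers inside a marked patch."""
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(center, dtype=float)
    lo, hi = center - edge / 2, center + edge / 2
    new_patches = []
    for _ in range(C_M):
        c_tilde = rng.uniform(lo, hi)                         # random center in P_i
        lam = rng.uniform(9 / 10, 10 / 9)
        h_new = lam * np.sqrt(A_ratio / C_M) * edge           # new edge length h_i^k
        c_new = np.clip(c_tilde, h_new / 2, 1.0 - h_new / 2)  # clamping (32): stay in Omega
        new_patches.append((c_new, h_new))
    return new_patches
```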
For the numerical test, we consider $C_M = 4$ and $C_M = 9$. Using significantly more accurate quadrature rules, we compare the approximate solution with the exact one defined in (29) and compute the relative $H^1$ error $\|u - u^{NN}\|_1 / \|u\|_1$ at the end of each training iteration. The obtained errors are shown as blue circles ($C_M = 4$) and red triangles ($C_M = 9$) in Figure 3. It can be noted that, for both values of $C_M$, the error decreases as more patches are used, even though the convergence rate is limited by the low regularity of the solution. It is also interesting to observe the positions and sizes of the used patches; such information is summarized in Figure 4 and Figure 5. In these figures, each dot is located at the center of a patch $P_i$, and its size and color represent the size $h_i^2$ and the scaled indicator $\eta_i^\gamma$ associated with $P_i$. It can be noted that, even if the new centers are chosen randomly in the few selected patches, the final distribution is the expected one. In fact, most of the patches cluster around the origin, whereas the rest of the domain is covered by fewer patches. Nevertheless, we highlight that, when $C_M = 9$, there are more small and medium patches far from the origin, yielding a more uniform covering of the regions far from the singular point and a slightly better accuracy.
  • Strategy #2: Fixed patch centers
From the results discussed in Strategy #1, it can be observed that choosing the positions of the new centers randomly may lead to a non-uniform patch distribution in regions far from the singular point. In order to obtain better distributions, let us fix a priori the positions of the new centers. Let us consider the reference patch $\hat{P} = (0, 1)^2$ and the points
$$\hat{c}_1 = (0.25, 0.25), \quad \hat{c}_2 = (0.75, 0.25), \quad \hat{c}_3 = (0.25, 0.75), \quad \hat{c}_4 = (0.75, 0.75), \tag{33}$$
when $C_M = 4$ and
$$\begin{aligned}
&\hat{c}_1 = (0.2, 0.2), \quad \hat{c}_2 = (0.2, 0.5), \quad \hat{c}_3 = (0.2, 0.8), \\
&\hat{c}_4 = (0.5, 0.2), \quad \hat{c}_5 = (0.5, 0.5), \quad \hat{c}_6 = (0.5, 0.8), \\
&\hat{c}_7 = (0.8, 0.2), \quad \hat{c}_8 = (0.8, 0.5), \quad \hat{c}_9 = (0.8, 0.8),
\end{aligned} \tag{34}$$
when $C_M = 9$. At the end of the $(m-1)$-th training iteration, if $\eta_i^\gamma \in \bar{\boldsymbol{\eta}}_{m-1}$, the $C_M$ centers inside $P_i$ are chosen as $c_{P_i^k} = M_i(\hat{c}_k)$, $k = 1, \dots, C_M$. Once more, to avoid patches partially outside $\Omega$, we update such centers as in (32). We highlight that, defining the new centers as in (33) and (34) and the edge lengths $h_i^k$ of the new patches as in Strategy #1, the new patches with centers inside $P_i$ form a cover of $P_i$, i.e., $P_i \subset \bigcup_{k=1}^{C_M} P_i^k$. Such a property does not hold if the new centers are randomly chosen.
Training an MF-VPINN with such a strategy leads to more accurate results. The error decays are shown in Figure 6, whereas a comparison with the previous ones is presented in Section 3.3. The patch distributions for $C_M = 4$ and $C_M = 9$ are shown in Figure 7 and Figure 8, respectively. Analyzing such distributions, it can be noted that the patches still accumulate near the origin, as expected. However, it is possible to observe that there are regions that are only covered by the largest patches. This phenomenon is more evident when $C_M = 4$. To avoid it, we aim at inserting more patches far from the origin in order to train the MF-VPINN on the entire domain with a more balanced set of patches.
  • Strategy #3: Fixed patch centers and small level gap strategy
In order to ensure better patch distributions, let us consider a new criterion to choose the position and the size of the new patches. We name this strategy the small-level gap strategy because it penalizes patch distributions with large differences between the levels of the smallest patches and the ones of the largest patches.
We call $k$-th level patch any patch $P_i$ such that $P_i \in \mathcal{P}_k$ and $P_i \notin \mathcal{P}_{k'}$ for any $k' < k$. With this notation, it is possible to group all the patches according to their level. To do so, we denote by $\mathcal{L}_\ell$ the set of $k$-th level patches with $k \le \ell$. Let us consider the $m$-th training iteration. We define $\boldsymbol{\eta}^{\mathrm{sort},\ell}$ as the array containing the elements $\eta_i^\gamma$ of $\boldsymbol{\eta}^{\mathrm{sort}}$ (maintaining the same ordering) such that $P_i \in \mathcal{L}_\ell$. We also denote by $\bar{\boldsymbol{\eta}}_{m,\ell}$ the array containing the first $\tau_m^\ell = \min\{\tau_m, \dim(\mathcal{L}_\ell)\}$ elements of $\boldsymbol{\eta}^{\mathrm{sort},\ell}$. Note that $\bar{\boldsymbol{\eta}}_{m,\ell}$ is the equivalent of $\bar{\boldsymbol{\eta}}_m$ for patches in $\mathcal{L}_\ell$.
In order to generate the new patches in $\mathcal{P}_{m+1} \setminus \mathcal{P}_m$, let us add $C_M$ new patches in any patch $P_i$ such that $\eta_i^\gamma \in \bar{\boldsymbol{\eta}}_m \cup \bar{\boldsymbol{\eta}}_{m,\ell}$. The centers and sizes of the new patches are chosen as in Strategy #2. This allows us to exploit the fact that $P_i \subset \bigcup_{k=1}^{C_M} P_i^k$ to remove the patches $P_i$ such that $\eta_i^\gamma \in \bar{\boldsymbol{\eta}}_m \cup \bar{\boldsymbol{\eta}}_{m,\ell}$ from the new set of patches $\mathcal{P}_{m+1}$. We remark that such patches cannot be removed when the centers are randomly chosen as in Strategy #1 because, in that case, $\mathcal{P}_{m+1}$ would no longer be a cover of $\Omega$.
We also highlight that, removing the patches $P_i$ such that $\eta_i^\gamma \in \bar{\boldsymbol{\eta}}_m \cup \bar{\boldsymbol{\eta}}_{m,\ell}$ and choosing $A_{\mathrm{ratio}} = 1$, it is possible to satisfy the inequality
$$\sum_{P_i \in \mathcal{P}_{m+1}} |P_i| \le C \, |\Omega|,$$
for any $m \in \mathbb{N}$ and with $C > 0$ independent of $m$. Such a bound on the sum of the areas of the patches is useful to ensure that there exists a number $N_{\mathrm{patch\_per\_point}}$ such that any point inside $\Omega$ belongs to at most $N_{\mathrm{patch\_per\_point}}$ patches. This property is useful to derive global error indicators. We choose to maintain $A_{\mathrm{ratio}} = 1.25$ to compare the numerical results with the ones obtained using the previous strategies and to consider overlapping patches.
We train an MF-VPINN with $C_M = 4$ and $C_M = 9$ as in the previous tests. The corresponding error decays are shown in Figure 9. It can be observed that the error decreases more smoothly and that, as in the previous tests, choosing $C_M = 4$ or $C_M = 9$ does not lead to significant differences in the error behavior. The patches used during the training are represented in Figure 10 and Figure 11. We highlight that, compared with the patch distributions of Strategy #2, there are many more patches far from the origin and, most importantly, the closer the center of a patch is to the origin, the smaller its size. Even though the error decays with $C_M = 4$ and $C_M = 9$ are qualitatively similar, it should be noted that the patch distribution with $C_M = 9$ is more skewed. In fact, its patches can be clustered into two subgroups: the first one contains larger patches and covers most of the domain; the second one contains only small patches with centers very close to the origin. A similar distribution is obtained with $C_M = 4$, even though it is characterized by a smoother transition between large and small patches.
In both cases, it can be observed that there are no large patches very close to small ones. This is in contrast with the distributions obtained in Strategy #2 and leads to more stable solvers. Indeed, even though the test functions are not related to a global triangulation on the entire domain Ω , the current loss function is very similar to the one used in a standard VPINN with a good-quality mesh, i.e., a mesh in which neighboring elements are similar in size and shape. On the other hand, in Strategy #2, there exist large patches that are very close to small ones; this is equivalent to training a VPINN on a very poor-quality mesh. Such meshes, in the context of FEM, are strictly related to convergence and accuracy issues.

3.3. The Importance of the Error Indicator

As discussed in the previous sections, we use the error indicator described in Section 2.3 to interrupt the training and to decide where the new patches have to be inserted to maximize the accuracy. In this section, the advantages of such a choice are described.
Since each set $\mathcal{P}_m$ is a cover of $\Omega$, the quantity $ES_m = \sum_{i=1}^{\dim(\mathcal{P}_m)} \eta_i$ is an indicator of the global $H^1$ error $\|u - u^{NN}\|_1$ on the entire domain $\Omega$. Therefore, tracking its behavior during the training is equivalent to tracking that of the unknown $H^1$ error. Such information is used to implement an early stopping strategy to reduce the computational cost of the iterative training. At the beginning of the $m$-th training iteration, all the vectors and sparse matrices required to compute $ES_m$ are computed in a preprocessing phase. When such data structures are available, the error indicator can be assembled by suitably combining basic algebraic operations.
We assemble $ES_m$ every $N_{\mathrm{check}}$ epochs and store the best value obtained during the training, together with the corresponding neural network trainable parameters. Then, if no improvement is obtained in $p \cdot N_{\mathrm{check}}$ epochs, the training is interrupted and the neural network parameters associated with the best value of $ES_m$ are restored. Here, $p$ is a tunable parameter named patience. The first $N_{\mathrm{negl}}^m$ epochs are neglected because they are often characterized by strong oscillations due to the optimizer initialization and the different loss functions. In the numerical experiments, we use $N_{\mathrm{check}} = 10$, $p = 10$, $N_{\mathrm{negl}}^m = 100(m + 1)$.
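A minimal sketch of this indicator-based early stopping is reported below; it only assumes that the training loop exposes the current epoch, the value of $ES_m$ and the current network weights (the class and attribute names are illustrative).

```python
import numpy as np

class IndicatorEarlyStopping:
    """Stop the m-th training iteration when ES_m has not improved for
    `patience` consecutive checks performed every `n_check` epochs."""
    def __init__(self, n_check=10, patience=10, n_neglect=100):
        self.n_check, self.patience, self.n_neglect = n_check, patience, n_neglect
        self.best = np.inf
        self.best_weights = None
        self.stalled_checks = 0

    def update(self, epoch, es_value, weights):
        # returns True when the training should be interrupted
        if epoch < self.n_neglect or epoch % self.n_check != 0:
            return False
        if es_value < self.best:
            self.best, self.best_weights = es_value, weights
            self.stalled_checks = 0
        else:
            self.stalled_checks += 1
        return self.stalled_checks >= self.patience
```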
Two typical scenarios are shown in Figure 12. In the top row, the behaviors of $ES_m$ and of $c\|u - u^{NN}\|_1$ are shown. Here, $c$ is a scaling parameter used for visualization purposes, chosen such that $ES_m$ and $c\|u - u^{NN}\|_1$ coincide at the beginning of the training. Indeed, $\|u - u^{NN}\|_1$ is about two orders of magnitude smaller than $ES_m$. Nevertheless, it can be noted that these two quantities display very similar behaviors during the training. In the bottom row, instead, we represent the corresponding loss function decay. The left column is associated with the training performed using the patches in $\mathcal{P}_6$ shown in Figure 5f, and the right column with the one performed using the patches in $\mathcal{P}_2$ in Figure 11a. We remark that the loss function, $ES_m$ and $c\|u - u^{NN}\|_1$ are evaluated at the same epochs and that, in real applications, it is not possible to explicitly compute $c\|u - u^{NN}\|_1$ since $u$ is not known. Moreover, since we use the L-BFGS optimizer, the neural network is evaluated multiple times on the entire training set in each epoch. Therefore, on the x-axis of Figure 12, we show the number of neural network evaluations instead of the number of epochs.
It can be noted that the behavior of the quantities shown in the left column is qualitatively different from that of the ones in the right column. In fact, when the MF-VPINN is trained with the $\mathcal{P}_6$ of Figure 5f, the error, the error indicator, and the loss function decrease in similar ways. Therefore, there is no need to interrupt the training early, since the accuracy keeps improving while the loss function is minimized. On the other hand, when the MF-VPINN is trained with the $\mathcal{P}_2$ of Figure 11a, the loss decreases even when the error and the error indicator increase or remain constant. In this case, it is convenient to interrupt the training, since minimizing the loss function further would lead to more severe overfitting phenomena and a loss in accuracy and efficiency. At the end of the training, the neural network's trainable parameters corresponding to the best value of $ES_2$ are restored. Such a phenomenon, also observed in [46], highlights the fact that the minimization of the loss function generates spurious oscillations that cannot be controlled and that ruin the model accuracy. The issue can be partially alleviated with the adopted regularization or completely removed using inf-sup stable models as in [47].
  • Strategy #4: Adaptive strategy without the error indicator
Let us now analyze the consequences of choosing the positions of the new patches without using the error indicator. To do so, we consider Strategy #1 but, instead of placing the new centers inside the patches $P_i$ with the highest values of $\eta_i^\gamma$, we add them inside the patches with the highest values of $r_{h,i}^2(u^{NN})$. Using the equation residuals is a common choice in PINN adaptivity because the residuals describe how accurately the neural network satisfies the PDE at a given point. The obtained error decay is shown in Figure 13. It can be seen that the accuracy is worse than the ones obtained with the other strategies and that the convergence rate with respect to the number of patches is lower. In such a figure, we also compare the MF-VPINN with a standard VPINN trained with test functions defined on Delaunay meshes. Note that, when Strategy #2 or Strategy #3 is adopted, the MF-VPINN is more accurate than a standard VPINN, even though its main advantage resides in being a meshfree method.
We highlight that, due to the low regularity of the solution, the expected convergence rate with respect to the number of test functions of an FEM solution computed on uniform refinements is −1/3. Note that the convergence rate of the proposed MF-VPINN method is still close to −1/3, even though it is a meshfree method (see Table 1). For completeness, we also remark that, if an adaptive FEM is used, the rate of convergence depends on the FEM order.
Coherently with Figure 13, the best strategies are Strategy #2 and Strategy #3, whereas the worst one is Strategy #4, which does not exploit the error indicator. The poor performance of Strategy #4 can also be explained by analyzing the corresponding patch distributions, shown in Figure 14 for $C_M = 4$ and in Figure 15 for $C_M = 9$. These plots highlight that the patches do not accumulate near the origin because the residuals of the patches closer to it are not significantly higher than the other ones. Note, for example, the different colors in Figure 4 and Figure 14, even though in both cases we randomly choose the positions of $C_M = 4$ centers inside the selected patches. Such a behavior is explained by the fact that, in order to minimize the loss function, the optimizer does not focus on specific regions of the domain. Therefore, the orders of magnitude of the residuals associated with patches of similar size are very close to each other, regardless of the positions of the corresponding patches. As discussed regarding Figure 12, we can conclude that the value of the residuals is not a good indicator of the actual error.

3.4. Extension to a More Complex Domain

In this section, we present some ideas that can be used to apply the method to more complex domains.
Let us consider a domain $\Omega_2$ with some internal holes and boundary $\partial\Omega_2 = \Gamma_2$. In particular, $\Omega_2 = (0, 1)^2 \setminus \bigcup_{i=1}^4 H_i$, where $H_i$, $i = 1, 2, 3, 4$, are rectangular holes with centers $c_{H_i}$ defined as
$$c_{H_1} = \left( \tfrac{9}{26}, \tfrac{9}{34} \right), \quad c_{H_2} = \left( \tfrac{17}{26}, \tfrac{9}{34} \right), \quad c_{H_3} = \left( \tfrac{9}{26}, \tfrac{25}{34} \right), \quad c_{H_4} = \left( \tfrac{17}{26}, \tfrac{25}{34} \right),$$
and base and height equal to $\tfrac{1}{26}$ and $\tfrac{1}{34}$, respectively.
In this domain, we consider the Poisson problem:
$$-\Delta u = f \quad \text{in } \Omega_2, \qquad u = g \quad \text{on } \Gamma_2,$$
with $f$ and $g$ such that the exact solution is
$$u(x, y) = \frac{1}{C_u} \, x (x - 1) \left( x - \tfrac{4}{13} \right) \left( x - \tfrac{5}{13} \right) \left( x - \tfrac{8}{13} \right) \left( x - \tfrac{9}{13} \right) \cdot y (y - 1) \left( y - \tfrac{4}{17} \right) \left( y - \tfrac{5}{17} \right) \left( y - \tfrac{12}{17} \right) \left( y - \tfrac{13}{17} \right),$$
normalized through the constant $\frac{1}{C_u}$ so that it assumes the value 1 at $\left( \tfrac{2}{13}, \tfrac{2}{17} \right)$. This function is represented in Figure 16.
We extend the approaches proposed in Section 3.2 by adding a cutting procedure after the generation of the new patches. Note that, in particular, all the patches are already completely inside the square $[0, 1]^2$ when we apply the cutting procedure, and we can thus focus only on the holes. When a patch intersects more than one hole, we recursively remove it from $\mathcal{P}_m$, subdivide the corresponding region into 4 overlapping patches, and add them to $\mathcal{P}_m$, until all the generated patches intersect at most one hole. Moreover, we observe that the region $P_i \setminus H_j$ inside the patch $P_i \in \mathcal{P}_m$ and outside the hole $H_j$, $j = 1, 2, 3, 4$, can always be covered by the union of at most four rectangles. When a generated patch intersects a hole, we thus remove the patch and generate the minimum number of patches (at most four) that are as large as possible and whose union is the region $P_i \setminus H_j$.
To avoid numerical instabilities, when this cutting procedure generates a patch with an aspect ratio larger than 100 or with an area more than 100 times smaller than that of the original uncut patch, the corresponding new patches are removed from $\mathcal{P}_m$. This implies that it is not possible to remove the patches associated with the highest error indicators as in Strategy #3 because, otherwise, the union of all the patches would not cover the entire domain. We thus present numerical results only for Strategy #1 and Strategy #2.
The obtained error decays are shown in Figure 17 for Strategy #1 and Strategy #2, with $C_M = 4$ and $C_M = 9$. The first and second errors are computed with the patches generated by cutting the patches in $\mathcal{P}_0$ and $\mathcal{P}_1$, respectively, whereas the third and fourth errors are obtained by refining the previous patches with the error indicator as previously described. Note that the first and second errors are very close for all the curves, since the strategy and the value of $C_M$ do not influence this initial part of the training, and that both strategies converge better with $C_M = 4$. The final patch distributions are displayed in Figure 18. Here, we can see that the inner part of the domain is covered by a few large patches, whereas the distribution is denser close to the external boundary of $\Omega_2$, where the solution oscillates more.

4. Conclusions and Discussion

In this work, we presented a Meshfree Variational-Physics-Informed Neural Network (MF-VPINN). It is a PINN trained using the PDE variational formulation that does not require the generation of a global triangulation of the entire domain. In order to generate the test functions involved in the loss computation, we use an a posteriori error estimator based on the one discussed in [46]. Using such an error estimator, it is possible to add test functions only in regions in which the error is higher, thus increasing the efficiency of the method.
We highlight that the main advantages of the method are that it is meshfree, as it requires only a covering of the domain with patches that can be of different shapes, and that it automatically improves the solution by adding local patches without requiring any global mesh manipulation. It can therefore be used in domains where it is expensive or impossible to generate a mesh. On the other hand, if a mesh suitable to describe the solution can be generated, a standard VPINN is preferable, since its implementation is simpler and the convergence rate with respect to the number of test functions is higher.
We discuss several strategies to generate the set of test functions. We observe that adding a few test functions inside the patches associated with higher errors while ensuring a smooth transition between regions with large patches and regions with small patches is the best way to obtain accurate solutions. We also show that, if the a posteriori error indicator is not used, the model’s accuracy decreases and the training is slower.
In this paper, we only focus on second-order elliptic problems, even though VPINNs can be used to solve more complex problems. In a forthcoming paper, we will adapt the a posteriori error estimator and analyze the MF-VPINN performance on other PDEs. Moreover, we are interested in the analysis of the approach in more complex domains (in which the patches have to be suitably deformed) and in high-dimensional problems, where using a standard VPINN is not practical.

Author Contributions

Conceptualization, S.B. and M.P.; methodology, S.B. and M.P.; software, M.P.; validation, M.P.; formal analysis, S.B. and M.P.; investigation, S.B. and M.P.; resources, S.B.; data curation, M.P.; writing—original draft preparation, M.P.; visualization, M.P.; supervision, S.B.; project administration, S.B.; funding acquisition, S.B. All authors have read and agreed to the published version of the manuscript.

Funding

The author S.B. kindly acknowledges partial financial support provided by PRIN project “Advanced polyhedral discretisations of heterogeneous PDEs for multiphysics problems” (No. 20204LN5N5_003) and by PNRR M4C2 project of CN00000013 National center for HPC, Big Data and Quantum Computing (HPC) (CUP: E13C22000990001). The author M.P. kindly acknowledges the financial support provided by the Politecnico di Torino where the research was carried out.

Data Availability Statement

The data are available upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lagaris, I.; Likas, A.; Fotiadis, D. Artificial neural network methods in quantum mechanics. Comput. Phys. Commun. 1997, 104, 1–14.
2. Lagaris, I.; Likas, A.; Fotiadis, D. Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 1998, 9, 987–1000.
3. Lagaris, I.; Likas, A.; Papageorgiou, D. Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 2000, 11, 1041–1049.
4. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org (accessed on 15 September 2024).
5. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035.
6. Bradbury, J.; Frostig, R.; Hawkins, P.; Johnson, M.J.; Leary, C.; Maclaurin, D.; Necula, G.; Paszke, A.; VanderPlas, J.; Wanderman-Milne, S.; et al. JAX: Composable Transformations of Python+NumPy Programs. 2018. Available online: http://github.com/google/jax (accessed on 15 September 2024).
7. Raissi, M.; Perdikaris, P.; Karniadakis, G. Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. arXiv 2017, arXiv:1711.10561.
8. Raissi, M.; Perdikaris, P.; Karniadakis, G. Physics informed deep learning (part ii): Data-driven discovery of nonlinear partial differential equations. arXiv 2017, arXiv:1711.10566.
9. Raissi, M.; Perdikaris, P.; Karniadakis, G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
10. Pu, J.; Li, J.; Chen, Y. Solving localized wave solutions of the derivative nonlinear Schrödinger equation using an improved PINN method. Nonlinear Dyn. 2021, 105, 1723–1739.
11. Yuan, L.; Ni, Y.; Deng, X.; Hao, S. A-PINN: Auxiliary physics informed neural networks for forward and inverse problems of nonlinear integro-differential equations. J. Comput. Phys. 2022, 462, 111260.
12. Guo, Q.; Zhao, Y.; Lu, C.; Luo, J. High-dimensional inverse modeling of hydraulic tomography by physics informed neural network (HT-PINN). J. Hydrol. 2023, 616, 128828.
13. Demo, N.; Strazzullo, M.; Rozza, G. An extended physics informed neural network for preliminary analysis of parametric optimal control problems. Comput. Math. Appl. 2023, 143, 383–396.
14. Gao, H.; Sun, L.; Wang, J. PhyGeoNet: Physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. J. Comput. Phys. 2021, 428, 110079.
15. Yuyao, C.; Lu, L.; Karniadakis, G.; Dal Negro, L. Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Opt. Express 2020, 28, 11618–11633.
16. Tartakovsky, A.; Marrero, C.; Perdikaris, P.; Tartakovsky, G.; Barajas-Solano, D. Learning parameters and constitutive relationships with physics informed deep neural networks. arXiv 2018, arXiv:1808.03398.
17. Chen, Z.; Liu, Y.; Sun, H. Physics-informed learning of governing equations from scarce data. Nat. Commun. 2021, 12, 6136.
18. Weinan, E.; Yu, B. The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 2018, 6, 1–12.
19. Müller, J.; Zeinhofer, M. Error estimates for the deep Ritz method with boundary penalty. In Proceedings of the Mathematical and Scientific Machine Learning, PMLR, Beijing, China, 15–17 August 2022; pp. 215–230.
20. Lu, Y.; Lu, J.; Wang, M. A priori generalization analysis of the deep Ritz method for solving high dimensional elliptic partial differential equations. In Proceedings of the Conference on Learning Theory, PMLR, Boulder, CO, USA, 15–19 August 2021; pp. 3196–3241.
21. Sirignano, J.; Spiliopoulos, K. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 2018, 375, 1339–1364.
22. Al-Aradi, A.; Correia, A.; Jardim, G.; de Freitas Naiff, D.; Saporito, Y. Extensions of the deep Galerkin method. Appl. Math. Comput. 2022, 430, 127287.
23. Li, J.; Zhang, W.; Yue, J. A deep learning Galerkin method for the second-order linear elliptic equations. Int. J. Numer. Anal. Model. 2021, 18, 427–441.
24. Smith, B.F. Domain decomposition methods for partial differential equations. In Parallel Numerical Algorithms; Springer: Berlin/Heidelberg, Germany, 1997; pp. 225–243.
25. Toselli, A.; Widlund, O. Domain Decomposition Methods-Algorithms and Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006; Volume 34.
26. Jagtap, A.; Kharazmi, E.; Karniadakis, G. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 2020, 365, 113028.
27. Shukla, K.; Jagtap, A.D.; Karniadakis, G.E. Parallel physics-informed neural networks via domain decomposition. J. Comput. Phys. 2021, 447, 110683.
28. Jagtap, A.; Karniadakis, G. Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun. Comput. Phys. 2020, 28, 2002–2041.
29. Moseley, B.; Markham, A.; Nissen-Meyer, T. Finite Basis Physics-Informed Neural Networks (FBPINNs): A scalable domain decomposition approach for solving differential equations. Adv. Comput. Math. 2023, 49, 62.
30. Viana, F.; Nascimento, R.; Dourado, A.; Yucesan, Y. Estimating model inadequacy in ordinary differential equations with physics-informed neural networks. Comput. Struct. 2021, 245, 106458.
31. Yang, L.; Meng, X.; Karniadakis, G. B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys. 2021, 425, 109913.
32. Yang, L.; Zhang, D.; Karniadakis, G. Physics-Informed Generative Adversarial Networks for Stochastic Differential Equations. SIAM J. Sci. Comput. 2020, 42, A292–A317.
33. Yucesan, Y.; Viana, F. Hybrid physics-informed neural networks for main bearing fatigue prognosis with visual grease inspection. Comput. Ind. 2021, 125, 103386.
34. Zhu, Y.; Zabaras, N.; Koutsourelakis, P.; Perdikaris, P. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys. 2019, 394, 56–81.
35. Pang, G.; Lu, L.; Karniadakis, G.E. fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 2019, 41, A2603–A2626.
36. Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov-Arnold networks. arXiv 2024, arXiv:2404.19756.
37. Koenig, B.C.; Kim, S.; Deng, S. KAN-ODEs: Kolmogorov-Arnold Network Ordinary Differential Equations for Learning Dynamical Systems and Hidden Physics. arXiv 2024, arXiv:2407.04192.
38. Qian, K.; Kheir, M. Investigating KAN-Based Physics-Informed Neural Networks for EMI/EMC Simulations. arXiv 2024, arXiv:2405.11383.
39. Kumar, V.; Gleyzer, L.; Kahana, A.; Shukla, K.; Karniadakis, G.E. MyCrunchGPT: A LLM assisted framework for scientific machine learning. J. Mach. Learn. Model. Comput. 2023, 4, 41–72.
40. Beck, C.; Hutzenthaler, M.; Jentzen, A.; Kuckuck, B. An overview on deep learning-based approximation methods for partial differential equations. Discret. Contin. Dyn. Syst. B 2022, 28, 3697–3746.
41. Cuomo, S.; Di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific Machine Learning Through Physics-Informed Neural Networks: Where we are and What’s Next. J. Sci. Comput. 2022, 92, 88.
42. Lawal, Z.; Yassin, H.; Lai, D.; Che Idris, A. Physics-Informed Neural Network (PINN) Evolution and Beyond: A Systematic Literature Review and Bibliometric Analysis. Big Data Cogn. Comput. 2022, 6, 140.
43. Viana, F.A.; Subramaniyan, A.K. A survey of Bayesian calibration and physics-informed neural networks in scientific modeling. Arch. Comput. Methods Eng. 2021, 28, 3801–3830.
44. Kharazmi, E.; Zhang, Z.; Karniadakis, G. VPINNs: Variational physics-informed neural networks for solving partial differential equations. arXiv 2019, arXiv:1912.00873.
45. Kharazmi, E.; Zhang, Z.; Karniadakis, G. hp-VPINNs: Variational physics-informed neural networks with domain decomposition. Comput. Methods Appl. Mech. Eng. 2021, 374, 113547.
46. Berrone, S.; Canuto, C.; Pintore, M. Solving PDEs by variational physics-informed neural networks: An a posteriori error analysis. Ann. Univ. Ferrara 2022, 68, 575–595.
47. Berrone, S.; Canuto, C.; Pintore, M. Variational-Physics-Informed Neural Networks: The role of quadratures and test functions. J. Sci. Comput. 2022, 92, 100.
48. Berrone, S.; Pieraccini, S.; Scialò, S. Towards effective flow simulations in realistic discrete fracture networks. J. Comput. Phys. 2016, 310, 181–201.
49. Sukumar, N.; Srivastava, A. Exact imposition of boundary conditions with distance functions in physics-informed deep neural networks. Comput. Methods Appl. Mech. Eng. 2022, 389, 114333.
50. Berrone, S.; Canuto, C.; Pintore, M.; Sukumar, N. Enforcing Dirichlet boundary conditions in physics-informed neural networks and variational physics-informed neural networks. Heliyon 2023, 9, e18820.
51. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
52. Wright, S.; Nocedal, J. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 1999; Volume 35, p. 7.
53. Baydin, A.; Pearlmutter, B.; Radul, A.; Siskind, J. Automatic differentiation in machine learning: A survey. J. Mach. Learn. Res. 2018, 18, 5595–5637.
54. Prechelt, L. Early stopping-but when? In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998; pp. 55–69.
Figure 1. Graphical representation of a set $\{P_i\}_{i=1}^{n}$ of patches, obtained from a square reference patch $\hat{P}$ with $c_{\hat{P}}$ in its center, covering the domain $\Omega = (0,1)^2$.
Figure 2. Graphical representation of the solution $u$ in (29).
Figure 3. Strategy #1: Relative $H^1$ errors obtained at the end of each training iteration for $C_M = 4$ (blue circles) and $C_M = 9$ (red triangles).
Figure 4. Strategy #1: Patches used to train the MF-VPINN with $C_M = 4$. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $\eta_i^\gamma$. (a) Representation of $P_2$; (b) Representation of $P_3$; (c) Representation of $P_4$; (d) Representation of $P_6$; (e) Representation of $P_8$; (f) Representation of $P_9$.
Figure 5. Strategy #1: Patches used to train the MF-VPINN with $C_M = 9$. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $\eta_i^\gamma$. (a) Representation of $P_1$; (b) Representation of $P_2$; (c) Representation of $P_3$; (d) Representation of $P_4$; (e) Representation of $P_5$; (f) Representation of $P_6$.
Figure 6. Strategy #2: Relative $H^1$ errors obtained at the end of each training iteration for $C_M = 4$ (blue circles) and $C_M = 9$ (red triangles).
Figure 7. Strategy #2: Patches used to train the MF-VPINN with $C_M = 4$. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $\eta_i^\gamma$. (a) Representation of $P_3$; (b) Representation of $P_4$; (c) Representation of $P_5$; (d) Representation of $P_6$; (e) Representation of $P_7$; (f) Representation of $P_8$.
Figure 8. Strategy #2: Patches used to train the MF-VPINN with $C_M = 9$. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $\eta_i^\gamma$. (a) Representation of $P_2$; (b) Representation of $P_3$; (c) Representation of $P_4$; (d) Representation of $P_5$; (e) Representation of $P_6$; (f) Representation of $P_7$.
Figure 9. Strategy #3: Relative $H^1$ errors obtained at the end of each training iteration for $C_M = 4$ (blue circles) and $C_M = 9$ (red triangles).
Figure 10. Strategy #3: Patches used to train the MF-VPINN with $C_M = 4$. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $\eta_i^\gamma$. (a) Representation of $P_3$; (b) Representation of $P_5$; (c) Representation of $P_6$; (d) Representation of $P_7$; (e) Representation of $P_8$; (f) Representation of $P_9$.
Figure 11. Strategy #3: Patches used to train the MF-VPINN with $C_M = 9$. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $\eta_i^\gamma$. (a) Representation of $P_2$; (b) Representation of $P_3$; (c) Representation of $P_4$; (d) Representation of $P_5$; (e) Representation of $P_6$; (f) Representation of $P_7$.
Figure 12. Top row: error indicator $E_{S_m}$ and rescaled $H^1$ error $c\,\|u - u_{NN}\|$. Bottom row: loss function. Left column: curves for the training with patches in $P_6$ shown in Figure 5f. Right column: curves for the training with patches in $P_2$ in Figure 11a. (a) $E_{S_6}$ and $c\,\|u - u_{NN}\|$ for the patches in Figure 5f; (b) $E_{S_2}$ and $c\,\|u - u_{NN}\|$ for the patches in Figure 11a; (c) Loss function for the patches in Figure 5f; (d) Loss function for the patches in Figure 11a.
Figure 13. Comparison between the relative $H^1$ errors obtained at the end of each training iteration with different strategies to choose the position of the new patches. (a) $C_M = 4$; (b) $C_M = 9$.
Figure 14. Strategy #4: Patches used to train the MF-VPINN with $C_M = 4$. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $r_{h,i}^2(u_{NN})$. (a) Representation of $P_1$; (b) Representation of $P_2$; (c) Representation of $P_3$; (d) Representation of $P_4$; (e) Representation of $P_5$; (f) Representation of $P_6$.
Figure 15. Strategy #4: Patches used to train the MF-VPINN with $C_M = 9$. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $r_{h,i}^2(u_{NN})$. (a) Representation of $P_1$; (b) Representation of $P_2$; (c) Representation of $P_3$; (d) Representation of $P_4$; (e) Representation of $P_5$; (f) Representation of $P_6$.
Figure 16. Graphical representation of the solution $u$ in (36).
Figure 17. Relative $H^1$ errors obtained by solving problem (35).
Figure 18. Problem (35): Representation of the last set of patches obtained with the different strategies. Each dot represents a patch $P_i$: its position is the center $c_{P_i}$ of the patch, its size is proportional to the patch size $h_i^2$, and its color is associated with the quantity $r_{h,i}^2(u_{NN})$. The black rectangles represent the holes $H_i$, $i = 1, 2, 3, 4$. (a) Strategy #1, $C_M = 4$; (b) Strategy #2, $C_M = 4$; (c) Strategy #1, $C_M = 9$; (d) Strategy #2, $C_M = 9$.
Table 1. Rates of convergence with respect to the number of test functions.

$C_M$    Strategy #1    Strategy #2    Strategy #3    Strategy #4    Reference VPINN
4        −0.213         −0.295         −0.283         −0.105         −0.232
9        −0.294         −0.376         −0.287         −0.182         −0.232