
Bayesian Nonparametric Inference in Elliptic PDEs: Convergence Rates and Implementation †

ESOMAS Department, University of Turin, Corso Unione Sovietica 218/bis, 10137 Turin, Italy
† This article is an extended version of a conference paper presented by the author at JSM 2023, Toronto, ON, Canada, https://doi.org/10.5281/zenodo.10075234.
Foundations 2025, 5(2), 14; https://doi.org/10.3390/foundations5020014
Submission received: 19 March 2025 / Revised: 16 April 2025 / Accepted: 21 April 2025 / Published: 23 April 2025
(This article belongs to the Section Mathematical Sciences)

Abstract

Parameter identification problems in partial differential equations (PDEs) consist in determining one or more functional coefficients in a PDE. In this article, the Bayesian nonparametric approach to such problems is considered. Focusing on the representative example of inferring the diffusivity function in an elliptic PDE from noisy observations of the PDE solution, the performance of Bayesian procedures based on Gaussian process priors is investigated. Building on recent developments in the literature, we derive novel asymptotic theoretical guarantees that establish posterior consistency and convergence rates for the methodologically attractive Gaussian series priors based on the Dirichlet–Laplacian eigenbasis. An implementation of the associated posterior-based inference is provided and illustrated via a numerical simulation study, where excellent agreement with the theory is obtained.

1. Introduction

Partial differential equations (PDEs) are primary mathematical tools to model the behaviour of complex real-world phenomena, with ubiquitous applications across engineering and the sciences. The formulation of a PDE typically involves a number of functional parameters, which are often unknown in applications and not directly accessible to measurements. Employing a PDE model in practice therefore necessitates that the parameters in the equation be determined beforehand from the available data, giving rise to an inverse problem of parameter identification. Such problems have been extensively studied in applied mathematics [1,2,3] and, more recently, in statistics [4,5,6]. See the monographs [7,8,9] and the references therein for an extended overview on this research area.
In the present paper, we shall focus on the following representative example. Consider a physical quantity undergoing diffusion in an inhomogeneous multidimensional convex medium $\mathcal{O} \subset \mathbb{R}^d$, $d \in \mathbb{N}$, with smooth boundary $\partial\mathcal{O}$. At equilibrium, the density $u(x)$ of the diffusing substance at any location $x \in \mathcal{O}$ is governed by the second-order elliptic PDE
$$\begin{cases} \nabla\cdot(f\nabla u) = s, & \text{on } \mathcal{O},\\ u = b, & \text{on } \partial\mathcal{O}, \end{cases}\qquad(1)$$
where $s:\mathcal{O}\to\mathbb{R}$ describes the spatial distribution of local sources or sinks, $b:\partial\mathcal{O}\to\mathbb{R}$ prescribes the density values at the boundary, and the diffusivity function $f:\mathcal{O}\to(0,\infty)$ models spatially varying conductivity throughout the inhomogeneous domain. Under mild regularity conditions on the PDE coefficients, standard elliptic theory implies the existence of a unique twice continuously differentiable classical solution $G(f) \equiv u_f \in C^2(\mathcal{O})$ to (1) (e.g., [10], Chapter 6). Assuming that $s$ and $b$ are known, we are then interested in the problem of estimating $f$ from $n$ noisy point evaluations of $G(f)$ over a grid of (possibly random) design points $X_1,\dots,X_n$ in $\mathcal{O}$,
$$Y_i = G(f)(X_i) + \sigma W_i, \qquad i = 1,\dots,n,\qquad(2)$$
where $W_1,\dots,W_n$ are statistical measurement errors, and $\sigma>0$ is the noise level. In view of the central limit theorem, the Gaussian assumption $W_1,\dots,W_n \overset{iid}{\sim} N(0,1)$ can often be realistically maintained. The inverse problem of recovering the diffusivity in the elliptic PDE (1) from observations of its solution is an important building block in oil reservoir modelling [11], where $u$ is the (accessible to measurements) fluid pressure and $f$ is the (not directly observable) permeability field, which can exhibit drastic spatial variations due to changes in the reservoir composition. This problem has been studied in a large number of articles in applied mathematics, e.g., [12,13,14], and statistics, e.g., [5,15,16]. An illustration on a bi-dimensional domain is given in Figure 1.
While the PDE (1) is linear, the parameter-to-solution map $f\mapsto G(f)$ is not, which poses several methodological and theoretical challenges. In particular, least squares functionals involving $G(f)$ are generally non-convex, so that commonly used optimisation-based methods (such as Tikhonov regularisation, maximum likelihood or maximum a posteriori estimation) cannot reliably be implemented by standard convex optimisation techniques. In this context, the Bayesian approach to inverse problems, popularised by influential work by Stuart [17], offers an attractive alternative. In the Bayesian framework, the unknown parameter $f$ is regarded as a random variable (with values in a function space) and assigned a prior probability distribution $\Pi(\cdot)$ that models the available information about $f$ before collecting the observations. The prior is then combined, through Bayes' formula, with the data $\{(Y_i,X_i)\}_{i=1}^n$ to form the posterior distribution $\Pi(\cdot\,|\,\{(Y_i,X_i)\}_{i=1}^n)$, which represents the updated belief about $f$ and is used to draw the inferences. As the posterior formally involves only evaluations of the prior and the likelihood (cf. Equation (7) below), the approximate computation of $\Pi(\cdot\,|\,\{(Y_i,X_i)\}_{i=1}^n)$ and its associated posterior mean estimator $\bar f_n := E^\Pi[f\,|\,\{(Y_i,X_i)\}_{i=1}^n]$ via sampling methods is feasible as long as the forward map $G$ can be numerically evaluated. For the elliptic PDE (1), this can be performed using efficient PDE solvers based on finite element methods, sidestepping altogether the need for a (possibly non-existent) inversion formula for $G$, as well as the use of optimisation approaches. In particular, for the class of Gaussian process priors, efficient ad hoc Markov chain Monte Carlo (MCMC) algorithms, suited to the present infinite-dimensional setting, have been developed, e.g., [16]. A further decisive advantage of the Bayesian methodology is that, alongside point estimates, it automatically delivers uncertainty quantification for the recovery via the spread of the posterior, used in applications to provide interval-type estimators and to construct hypothesis tests.
The success and popularity of Bayesian methods in applications have led to recent interest in the literature in deriving theoretical performance guarantees for nonparametric Bayesian procedures in PDE models [18,19,20,21,22,23]. This is motivated by the fact that the performance of Bayesian methods depends on a suitable choice of the prior, which in infinite-dimensional statistical models primarily serves as a regularisation tool and whose specification is a delicate task in its own right (cf. Section 1.2 in [24]). Thus, the question arises as to whether Bayesian procedures may provide valid and prior-independent inference, at least in the presence of informative data. The established paradigm under which such an investigation is carried out is the frequentist analysis of Bayesian procedures, assuming that the observations are generated by a fixed ground truth $f_0$ and studying the concentration of the posterior towards $f_0$ in the large sample size limit. We refer the reader to [24] for an introduction to this research area.
The present paper is concerned with the performance of Bayesian nonparametric methods based on Gaussian priors in the elliptic inverse problem (2). In Section 2, we provide a general result that establishes posterior consistency and convergence rates of the conditional mean estimator for a general class of Gaussian priors (Theorem 1). We then apply our general theorem to obtain statistical recovery rates for truncated Gaussian series priors defined on the eigenbasis of the Dirichlet–Laplacian (Example 1), a commonly used basis of practical interest offering a convenient and generally applicable framework for implementation; cf. Section 3.1. This shows that the resulting procedures provide statistically valid estimation of the diffusivity $f$, with explicit estimation error bounds that decay algebraically in the number $n$ of observations. Our results extend the recent investigation of Giordano and Nickl [20], who considered Gaussian priors associated with popular covariance kernels (such as the Matérn or squared-exponential ones), as well as truncated Gaussian wavelet series expansions, but did not explore the methodologically attractive procedures based on the Dirichlet–Laplacian eigenbasis constructed herein.
Furthermore, in Section 3, we complement the theoretical results with a discussion of implementation, devising two different discretisation strategies. The first is tailored to applications where a specific set of basis functions is of interest. In particular, in Section 3.1, we employ the Dirichlet–Laplacian eigenbasis, discretising the parameter space by high-dimensional truncated series expansions in accordance with our theoretical results. In the second approach, described in Section 3.2, we discretise the parameter space via piecewise linear functions defined on the elements of a deterministic triangular mesh, which is naturally suited to implementing Gaussian priors specified via a covariance kernel. Here, the popular Matérn kernel is employed for illustration. A numerical simulation study is provided to investigate the performance of the inferential procedures under the two discretisation strategies. In our numerical experiments, both approaches yielded satisfactory results, comparable in terms of reconstruction quality and running time. The posterior mean estimate (relative to a Matérn process prior), computed via a Metropolis–Hastings-type MCMC algorithm, is shown in Figure 2 for increasing sample sizes, to be compared to the true diffusion coefficient pictured in Figure 1 (left). The reproducible MATLAB code used for the study is available at https://github.com/MattGiord/Bayesian-Elliptic-PDEs/tree/main (accessed on 19 March 2025).

2. Materials and Methods

2.1. Likelihood, Prior and Posterior

Throughout, $\mathcal{O}\subset\mathbb{R}^d$, $d\in\mathbb{N}$, is a given nonempty, open, convex and bounded set with smooth boundary $\partial\mathcal{O}$. For the observation model (2), with $G(f)$ as the solution to the PDE (1), and for fixed constants $\alpha>1+d/2$ and $f_{\min}>0$, we take the parameter space
$$\mathcal{F}_{\alpha,f_{\min}} := \Big\{ f\in H^\alpha(\mathcal{O}) : \inf_{x\in\mathcal{O}} f(x)\ge f_{\min},\ f|_{\partial\mathcal{O}}\equiv 1,\ \tfrac{\partial^j f}{\partial\nu^j}\equiv 0 \ \text{for}\ 1\le j\le \alpha-1 \Big\},\qquad(3)$$
with $H^\alpha(\mathcal{O})$ the usual Sobolev space of regularity $\alpha$ and $\nu(x)$, $x\in\partial\mathcal{O}$, the unit normal vector. Assuming that the source term $s$ in (1) is fixed and smooth, and taking (without loss of generality) homogeneous Dirichlet boundary conditions $b\equiv 0$, the Schauder theory for elliptic PDEs (e.g., Theorem 6.14 in [10]) implies that for each $f\in\mathcal{F}_{\alpha,f_{\min}}$, there exists a unique classical solution $G(f)\in C(\overline{\mathcal{O}})\cap C^{1+\alpha}(\mathcal{O})$ to the elliptic PDE (1). We then assume data $\{(Y_i,X_i)\}_{i=1}^n$ arising as in Equation (2) for some unknown $f\in\mathcal{F}_{\alpha,f_{\min}}$, with independent and identically distributed (i.i.d.) random design variables $X_1,\dots,X_n$ following the uniform distribution on $\mathcal{O}$. Throughout, we regard the noise level $\sigma>0$ in (2) as fixed and known; in practice, it may often be replaced by an estimate (cf. Section 4). In view of the i.i.d. standard normal assumption on the noise variables $W_1,\dots,W_n$ in (2), the random vectors $\{(Y_i,X_i)\}_{i=1}^n\sim P_f^{(n)}$ have joint probability density function in product form,
$$p_f^{(n)}\big(\{(x_i,y_i)\}_{i=1}^n\big) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\sum_{i=1}^n [y_i-G(f)(x_i)]^2/(2\sigma^2)}, \qquad y_i\in\mathbb{R},\ x_i\in\mathcal{O}.$$
Accordingly, the log-likelihood is seen to be equal, up to an additive constant, to the negative least-squares functional
$$\ell_n(f) := -\frac{1}{2\sigma^2}\sum_{i=1}^n \big[Y_i - G(f)(X_i)\big]^2, \qquad f\in\mathcal{F}_{\alpha,f_{\min}}.\qquad(4)$$
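To fix ideas on the computational side, the following minimal Python sketch evaluates (4) given a numerical forward solver; the function `solve_pde` is a hypothetical placeholder for a finite element solver of (1) (the paper's reproducible code is in MATLAB, and all names here are illustrative).

```python
import numpy as np

def log_likelihood(f, X, Y, sigma, solve_pde):
    """Log-likelihood l_n(f) in (4), up to an additive constant.

    f         : representation of the diffusivity (e.g. basis coefficients)
    X         : (n, d) array of design points in the domain O
    Y         : (n,) array of noisy observations
    sigma     : known noise standard deviation
    solve_pde : placeholder callable (f, X) -> G(f)(X), e.g. an FEM solver
    """
    residuals = Y - solve_pde(f, X)   # Y_i - G(f)(X_i)
    return -0.5 * np.sum(residuals ** 2) / sigma ** 2
```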
In a recent paper by Giordano and Nickl [20], posterior consistency and convergence rates for the conditional mean estimator were established for nonparametric Bayesian procedures based on Gaussian process priors. To incorporate the shape constraints in the parameter space $\mathcal{F}_{\alpha,f_{\min}}$, ref. [20] employed the parametrisation
$$f = \Phi\circ F,\qquad(5)$$
where $F\in H_0^\alpha(\mathcal{O})$ (the completion of $C_c^\infty(\mathcal{O})$ with respect to $\|\cdot\|_{H^\alpha}$) and $\Phi:\mathbb{R}\to[f_{\min},\infty)$ is a regular link function, that is, a smooth, strictly increasing and bijective function with bounded derivatives and such that $\Phi(0)=1$. An instance of a regular link function is provided in Example 24 of [20]. In practice, the exponential link $\Phi(\cdot)=\exp(\cdot)$ is often used. In the following, we occasionally switch between the notation $f$ and $F$, implicitly making use of the relation (5).
Under the parametrisation (5), placing a prior probability distribution $\Pi(\cdot)$ on $F$ induces a (push-forward) prior on $f$, which, in slight abuse of notation, we still denote by $\Pi(\cdot)$. Following [20], we consider scaled Gaussian priors, constructed starting from a base (possibly $n$-dependent) centred Gaussian Borel probability measure $\Pi_{W_n}$, which we assume to be supported on a measurable linear subspace of the Hölder space $C^\beta(\mathcal{O})$, for some $\beta\ge 1$, and to have reproducing kernel Hilbert space (RKHS) $\mathcal{H}_{W_n}$ continuously embedded into $H_0^\alpha(\mathcal{O})$. See Chapter 2 in [25] for the background and terminology on Gaussian processes and measures. Given $\Pi_{W_n}$, we then construct the scaled prior $\Pi_n(\cdot)$ as the law of the random function
$$V(x) := \frac{W(x)}{n^{d/(4\alpha+4+2d)}}, \qquad x\in\mathcal{O},\quad W\sim\Pi_{W_n}(\cdot).\qquad(6)$$
By linearity, $\Pi_n(\cdot)$ also defines a centred Gaussian Borel probability measure with the same support as $\Pi_{W_n}$; the scaling enforces the additional regularisation used in the theoretical analysis to deal with the nonlinearity of the inverse problem.
Given a prior $\Pi_n(\cdot)$ as above, by Bayes' formula ([24], p. 7), the posterior distribution $\Pi_n(\cdot\,|\,\{(Y_i,X_i)\}_{i=1}^n)$ of $F\,|\,\{(Y_i,X_i)\}_{i=1}^n$ arising from data in model (2) equals
$$\Pi_n\big(B\,|\,\{(Y_i,X_i)\}_{i=1}^n\big) = \frac{\int_B e^{\ell_n(\Phi\circ F)}\,d\Pi_n(F)}{\int_{C(\mathcal{O})} e^{\ell_n(\Phi\circ F)}\,d\Pi_n(F)}, \qquad B\subseteq C(\mathcal{O})\ \text{measurable},\qquad(7)$$
with $\ell_n(\cdot)$ as the log-likelihood in (4).

2.2. Convergence Rates

We study the asymptotic concentration of the posterior distribution $\Pi_n(\cdot\,|\,\{(Y_i,X_i)\}_{i=1}^n)$ in (7) around the ground truth diffusivity function $f_0=\Phi\circ F_0$, assuming that the data $\{(Y_i,X_i)\}_{i=1}^n\sim P_{f_0}^{(n)}$ have been generated according to the observation model (2) with $f=f_0$. The following theorem extends the main result of [20], making it possible to include in the analysis general sieve-type Gaussian priors; cf. Example 1. The proof proceeds similarly to Section 3.2 in [20]; it is included for completeness and the convenience of the reader in Appendix B.
Theorem 1.
For fixed positive integers $\alpha,\beta\in\mathbb{N}$ such that $\alpha>\beta+d/2$, consider the scaled prior $\Pi_n$ in (6), where $\Pi_{W_n}$ is a centred Gaussian Borel probability measure supported on a measurable linear subspace of $C^\beta(\mathcal{O})$, with RKHS $\mathcal{H}_{W_n}\subseteq H_0^\alpha(\mathcal{O})$ satisfying, for some constant $c>0$ (independent of $n$),
$$\|F\|_{H^\alpha} \le c\|F\|_{\mathcal{H}_{W_n}}, \qquad F\in\mathcal{H}_{W_n}.$$
Further assume that
$$\sup_{n\in\mathbb{N}} E^{\Pi_{W_n}}\|F\|_{C^1} < \infty.$$
For fixed $F_0\in H_0^\alpha(\mathcal{O})$, suppose that there exists a sequence $F_{0,n}\in\mathcal{H}_{W_n}$ such that, as $n\to\infty$,
$$\|F_0-F_{0,n}\|_{(H^1)^*} = O\big(n^{-\frac{\alpha+1}{2\alpha+2+d}}\big); \qquad \max\big\{\|F_{0,n}\|_{C^1},\|F_{0,n}\|_{\mathcal{H}_{W_n}}\big\} = O(1).\qquad(8)$$
Then, there exists $L>0$ large enough such that, as $n\to\infty$,
$$\Pi_n\Big(f : \|G(f)-G(f_0)\|_{L^2} > Ln^{-\frac{\alpha+1}{2\alpha+2+d}} \,\Big|\, \{(Y_i,X_i)\}_{i=1}^n\Big) \to 0,\qquad(9)$$
in $P_{f_0}^{(n)}$-probability. If, in addition, $\beta\ge 2$ and $\inf_{x\in\mathcal{O}} s(x)>0$, then there exist $L>0$ large enough and a constant $\lambda>0$ such that
$$\Pi_n\big(f : \|f-f_0\|_{L^2} > Ln^{-\lambda} \,\big|\, \{(Y_i,X_i)\}_{i=1}^n\big) \to 0,\qquad(10)$$
in $P_{f_0}^{(n)}$-probability as $n\to\infty$, and moreover, the estimator $\bar f_n = \Phi\circ\bar F_n$, with $\bar F_n = E^{\Pi_n}[F\,|\,\{(Y_i,X_i)\}_{i=1}^n]$, satisfies, as $n\to\infty$,
$$P_{f_0}^{(n)}\big(\|\bar f_n-f_0\|_{L^2} > n^{-\lambda}\big) \to 0.\qquad(11)$$
The first statement (Equation (9)) of Theorem 1 establishes posterior consistency in prediction risk: the induced posterior on the PDE solution $G(f)$, $f\sim\Pi_n(\cdot\,|\,\{(Y_i,X_i)\}_{i=1}^n)$, concentrates around the true PDE solution $G(f_0)$ in $L^2$-distance at rate $n^{-(\alpha+1)/(2\alpha+2+d)}$. Since this rate is known to be minimax optimal ([26], Theorem 10), procedures satisfying the assumptions of Theorem 1 are seen to optimally solve the PDE-constrained regression problem of recovering $G(f_0)$ from data $\{(Y_i,X_i)\}_{i=1}^n$.
The second statement shows that the posterior contracts around $f_0$ also in the standard $L^2$-risk, thereby solving the inverse problem of estimating the diffusivity. It follows by combining (9) with the regularisation properties implied by the rescaling in the prior construction (6) and a suitable stability estimate for $G^{-1}$. The latter was proved in [26] (Lemma 24) and requires the slightly stronger assumption on $\beta$ and the strict positivity of the source $s$. The exponent $\lambda>0$ is explicitly computed in the proof of Theorem 1 and equals $\lambda = \frac{(\alpha+1)(\beta-1)}{(2\alpha+2+d)(\beta+1)}$. Note that $\lambda < (\alpha+1)/(2\alpha+2+d)$. While minimax optimal rates for estimating the diffusivity $f$ in model (2) are currently unknown, inspection of the proof shows that when $f_0\in C^\infty(\mathcal{O})$, the prior can be tuned to attain a rate as close as desired to the parametric rate $n^{-1/2}$.
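For a concrete feel of the exponents (with illustrative parameter values, not those used in the simulations below): in dimension $d=2$, the smallest integers compatible with the conditions $\beta\ge 2$ and $\alpha>\beta+d/2$ of Theorem 1 are $\beta=2$ and $\alpha=4$, yielding
$$\lambda = \frac{(\alpha+1)(\beta-1)}{(2\alpha+2+d)(\beta+1)} = \frac{5\cdot 1}{12\cdot 3} = \frac{5}{36}\approx 0.14, \qquad \frac{\alpha+1}{2\alpha+2+d} = \frac{5}{12}\approx 0.42,$$
so that, as expected, the guaranteed rate for the inverse problem is markedly slower than the prediction rate.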
The last statement of Theorem 1 entails that the estimator $\bar f_n$ converges towards $f_0$ in $L^2$-risk at the same rate $n^{-\lambda}$ attained by the whole posterior distribution. It is indeed a corollary of (10), following from uniform integrability arguments for Gaussian measures and the Lipschitz continuity of the composition with the link function $\Phi$.

2.3. Examples of Gaussian Priors

We now provide two concrete instances of Gaussian priors to which Theorem 1 applies. For both examples, an implementation of the associated posterior-based inference is provided in Section 3 below. We maintain the assumption that $f_0=\Phi\circ F_0$ for some $F_0\in H^\alpha(\mathcal{O})$ supported inside a given compact subset $K\subset\mathcal{O}$. This corresponds to the common assumption that $f_0$ is known near the boundary $\partial\mathcal{O}$ (specifically, $f_0\equiv 1$ on $\mathcal{O}\setminus K$).
We first consider high-dimensional Gaussian sieve priors obtained by truncating Karhunen–Loève-type random series expansions, a frequently used approach in computation, e.g., [27]. In particular, via Theorem 1, we study procedures based on the Dirichlet–Laplacian eigenbasis. Such constructions correspond to commonly used Gaussian process priors with covariance kernel given by an inverse power of the Laplacian, e.g., [17] (Section 2.4). The eigenbasis can be numerically computed via efficient finite element methods for elliptic eigenvalue problems, offering a broadly applicable framework for implementation on general domains $\mathcal{O}$. Details on computation and a numerical simulation study are provided in Section 3.1.
Example 1
(Dirichlet–Laplacian eigenbasis). Let $\{e_j,\ j\ge 0\}\subset H_0^1(\mathcal{O})\cap C^\infty(\overline{\mathcal{O}})$ be the orthonormal basis of $L^2(\mathcal{O})$ formed by the eigenfunctions of the (negative) Dirichlet–Laplacian:
$$\begin{cases}\Delta e_j + \lambda_j e_j = 0, & \text{on }\mathcal{O},\\ e_j = 0, & \text{on }\partial\mathcal{O},\end{cases}\qquad j\ge 0,\qquad(12)$$
with associated eigenvalues $0<\lambda_0<\lambda_1\le\lambda_2\le\cdots$ satisfying $\lambda_j\to\infty$ as $j\to\infty$ according to Weyl's asymptotics, $\lambda_j = O(j^{2/d})$ as $j\to\infty$; cf. Figures 3 and 4. See Example 6.3 and Section 7.4 in [28] for details. Define the Hilbert scale
$$\mathcal{H}^s := \Big\{w\in L^2(\mathcal{O}) : \|w\|_{\mathcal{H}^s}^2 := \sum_{j=0}^\infty \lambda_j^s\,|\langle w,e_j\rangle_2|^2 < \infty\Big\}, \qquad s\ge 0.$$
We then have $\mathcal{H}^0 = L^2(\mathcal{O})$ (with equality of norms), $\mathcal{H}^1 = H_0^1(\mathcal{O})$ (with equivalence of norms), $\mathcal{H}^2 = H_0^1(\mathcal{O})\cap H^2(\mathcal{O})$, and the continuous embedding $\mathcal{H}^s\subseteq H^s(\mathcal{O})$ for all $s\ge 0$ (holding generally with strict inclusion); cf. p. 472f. in [29]. In fact, it holds that $\|w\|_{H^s}\lesssim\|w\|_{\mathcal{H}^s}$ for all $w\in\mathcal{H}^s$ and $s\ge 0$ (proved initially for $s=2$, then extended by induction to all larger integers, and finally by interpolation to all positive reals), and if $F\in H^s(\mathcal{O})$ is compactly supported within $\mathcal{O}$, then $F\in\mathcal{H}^s$. Finally, defining $\mathcal{H}^{-1} := (\mathcal{H}^1)^* = (H_0^1(\mathcal{O}))^*$, we have the equivalence (cf. Equation (A15) in [29])
$$\|w\|_{(H_0^1)^*}^2 \simeq \|w\|_{\mathcal{H}^{-1}}^2 := \sum_{j=0}^\infty \lambda_j^{-1}\,|\langle w,e_j\rangle_2|^2.\qquad(13)$$
Now, for fixed $J\in\mathbb{N}$, the Gaussian random sum
$$\bar W_J := \sum_{j\le J}\lambda_j^{-\alpha/2}W_j e_j, \qquad W_j\overset{iid}{\sim} N(0,1),$$
defines a centred Gaussian Borel probability measure supported on (and with RKHS equal to) the finite-dimensional space $\mathcal{H}_{\bar W_J} := \mathrm{span}\{e_j,\ j\le J\}$, with RKHS norm
$$\|\bar w\|_{\mathcal{H}_{\bar W_J}}^2 = \sum_{j\le J}\lambda_j^\alpha \bar w_j^2 = \|\bar w\|_{\mathcal{H}^\alpha}^2 \gtrsim \|\bar w\|_{H^\alpha}^2, \qquad \bar w = \sum_{j\le J}\bar w_j e_j\in\mathcal{H}_{\bar W_J}.$$
Fix any smooth cut-off function $\chi\in C_c^\infty(\mathcal{O})$ such that $\chi = 1$ on $K$, and consider the random function
$$W_n := \chi\bar W_{J_n} = \sum_{j\le J_n}\lambda_j^{-\alpha/2}W_j\,\chi e_j, \qquad W_j\overset{iid}{\sim} N(0,1),\qquad(14)$$
where $J_n\in\mathbb{N}$ is such that $J_n\simeq n^{d/(2\alpha+2+d)}$. By the linearity and boundedness of multiplication by $\chi$, the law $\Pi_{W_n}$ of $W_n$ defines, according to Exercise 2.6.5 in [25], a centred Gaussian prior supported on (and with RKHS equal to)
$$\mathcal{H}_{W_n} = \mathrm{span}\{\chi e_j,\ j\le J_n\}\subset C_c^\infty(\mathcal{O})\subset\bigcap_{s=0}^{\infty}H_0^s(\mathcal{O})\subset\bigcap_{s=0}^{\infty}\mathcal{H}^s,$$
with RKHS norm satisfying, with a multiplicative constant independent of $n$,
$$\|\chi\bar w\|_{\mathcal{H}_{W_n}} \lesssim \|\bar w\|_{\mathcal{H}_{\bar W_{J_n}}} = \|\bar w\|_{\mathcal{H}^\alpha}, \qquad \bar w\in\mathcal{H}_{\bar W_{J_n}}.$$
Arguing as in Example 25 in [20], one further shows that for some constant $c>0$ (independent of $n$),
$$\|w\|_{H^\alpha} \le c\|w\|_{\mathcal{H}_{W_n}}, \qquad w\in\mathcal{H}_{W_n}.$$
Finally, by a Sobolev embedding and the above inequality,
$$E^{\Pi_{W_n}}\|F\|_{C^1}^2 \lesssim E^{\Pi_{W_n}}\|F\|_{H^\alpha}^2 \le c\,E^{\Pi_{W_n}}\|F\|_{\mathcal{H}_{W_n}}^2 \lesssim E\sum_{j\le J_n}\lambda_j^{-\alpha}W_j^2,$$
which is uniformly bounded in $n$, recalling that $W_j\overset{iid}{\sim}N(0,1)$ and the fact that, by Weyl's asymptotics, $\lambda_j^{-\alpha} = O(j^{-2\alpha/d})$ with $\alpha>1+d/2$, so that $\sum_j j^{-2\alpha/d}<\infty$. This shows that the sequence of base priors $\Pi_{W_n}$ satisfies the first two assumptions of Theorem 1. For ground truths $F_0\in H^\alpha(\mathcal{O})$ compactly supported inside $K$, we have $F_0\in\mathcal{H}^\alpha$. Construct the finite-dimensional approximations
$$F_{0,n} = \sum_{j\le J_n}\langle F_0,e_j\rangle_2\,\chi e_j \in \mathcal{H}_{W_n}, \qquad n\in\mathbb{N}.\qquad(15)$$
Then, for all $n\in\mathbb{N}$,
$$\|F_{0,n}\|_{\mathcal{H}_{W_n}} \lesssim \Big\|\sum_{j\le J_n}\langle F_0,e_j\rangle_2\,e_j\Big\|_{\mathcal{H}_{\bar W_{J_n}}} = \Big\|\sum_{j\le J_n}\langle F_0,e_j\rangle_2\,e_j\Big\|_{\mathcal{H}^\alpha} \le \|F_0\|_{\mathcal{H}^\alpha} \lesssim \|F_0\|_{H^\alpha} < \infty.$$
By a Sobolev embedding, we similarly have $\|F_{0,n}\|_{C^1}\lesssim\|F_0\|_{H^\alpha}<\infty$ for all $n\in\mathbb{N}$. Furthermore, since both $F_0$ and $F_{0,n}$ have compact support within $\mathcal{O}$,
$$\|F_0-F_{0,n}\|_{(H^1)^*} = \sup_{\|H\|_{H^1(\mathcal{O})}\le 1}\int_{\mathcal{O}}(F_0(x)-F_{0,n}(x))H(x)\,dx = \sup_{\|H\|_{H_0^1(\mathcal{O})}\le 1}\int_{\mathcal{O}}(F_0(x)-F_{0,n}(x))H(x)\,dx = \|F_0-F_{0,n}\|_{(H_0^1)^*},$$
and recalling (13), Weyl's asymptotics, and the choice of $J_n$,
$$\|F_0-F_{0,n}\|_{(H^1)^*}^2 \simeq \sum_{j>J_n}\lambda_j^{-(1+\alpha)}\lambda_j^{\alpha}\,|\langle F_0,e_j\rangle_2|^2 \le \lambda_{J_n}^{-(1+\alpha)}\|F_0\|_{\mathcal{H}^\alpha}^2 \lesssim (J_n)^{-2(1+\alpha)/d} \simeq n^{-2(\alpha+1)/(2\alpha+2+d)}.$$
We conclude that Theorem 1 applies with the sequence of base Gaussian sieve priors in (14), choosing the approximations $F_{0,n}$ according to (15).
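As an illustration of the constructions (6) and (14), the following Python sketch draws the coefficients of the scaled, truncated series prior; it assumes the Dirichlet–Laplacian eigenvalues have already been computed (e.g., by a finite element eigensolver), and all names are illustrative rather than part of the paper's code.

```python
import numpy as np

def sample_series_prior_coefficients(eigvals, alpha, n, d, rng=np.random.default_rng()):
    """Coefficients of a draw V = W / n^{d/(4a+4+2d)}, with W as in (14)."""
    # truncation level J_n of the order n^{d/(2*alpha+2+d)}
    J_n = max(1, int(np.ceil(n ** (d / (2 * alpha + 2 + d)))))
    lam = np.asarray(eigvals)[:J_n]                       # first J_n eigenvalues
    W = lam ** (-alpha / 2) * rng.standard_normal(lam.size)
    return W / n ** (d / (4 * alpha + 4 + 2 * d))         # scaling from (6)
```

A draw of the corresponding random function is then $\sum_j V_j\,\chi e_j$, with the numerically computed eigenfunctions $e_j$ evaluated on the mesh.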
The second main example, already considered in [20], concerns stationary Gaussian processes specified via a translation invariant covariance kernel. For concreteness, we focus on the popular Matérn kernel. Implementation of the resulting procedures is illustrated in Section 3.2.
Example 2
(Matérn covariance kernel). Consider a Matérn process $W = \{W(x),\ x\in\mathcal{O}\}$ on $\mathcal{O}$ with regularity $\alpha-d/2$, that is, a centred stationary Gaussian process with covariance kernel
$$C(x,y) = \frac{2^{1-\alpha}}{\Gamma(\alpha)}\left(\frac{\sqrt{2\alpha}\,|x-y|}{\ell}\right)^{\alpha} B_\alpha\left(\frac{\sqrt{2\alpha}\,|x-y|}{\ell}\right), \qquad x,y\in\mathcal{O},\ \ell>0,\qquad(16)$$
where $\Gamma$ denotes the gamma function and $B_\alpha$ is the modified Bessel function of the second kind. Fix any smooth cut-off function $\chi\in C_c^\infty(\mathcal{O})$ such that $\chi=1$ on $K$. It can then be shown (cf. Example 25 in [20]) that the law $\Pi_W$ of $\chi W$ defines a centred Gaussian Borel probability measure supported on the separable linear subspace $C^{\beta'}(\mathcal{O})$ of $C^{\beta}(\mathcal{O})$, for any $\beta<\beta'<\alpha-d/2$. Furthermore, its RKHS is given by $\mathcal{H}_W = \{\chi F,\ F\in H^\alpha(\mathcal{O})\}\subset H_0^\alpha(\mathcal{O})$, with the RKHS norm satisfying
$$\|\chi F\|_{H^\alpha} \lesssim \|\chi F\|_{\mathcal{H}_W}, \qquad F\in H^\alpha(\mathcal{O}).$$
For ground truths $F_0\in H^\alpha(\mathcal{O})$ compactly supported inside $K$, we have $\chi F_0 = F_0$, so that $F_0\in\mathcal{H}_W$. We conclude that Theorem 1 applies for the base Matérn process prior $\Pi_{W_n} := \Pi_W$, choosing the trivial approximating sequence $F_{0,n} := F_0$ for all $n\in\mathbb{N}$.

3. Results

We investigate the performance of the procedures based on the Gaussian priors considered in Examples 1 and 2 in a numerical simulation study. For illustration, we fix the working domain $\mathcal{O}$ to the area contained inside the rotated ellipse, with $\theta=\pi/6$,
$$\partial\mathcal{O} = \big\{(\cos t\cos\theta - 0.75\sin t\sin\theta,\ 0.75\sin t\cos\theta + \cos t\sin\theta),\ t\in[0,2\pi)\big\},$$
and take the ground truth conductivity
$$f_0(x,y) = 1 + \sum_{k,m\in\{-1,1\}} e^{-(5x+1.75k)^2-(5y+1.75m)^2}, \qquad (x,y)\in\mathcal{O},\qquad(17)$$
cf. Figure 1 (left). We then generate observations $\{(Y_i,X_i)\}_{i=1}^n$ according to (2). The true PDE solution $G(f_0)$ is numerically computed via finite element methods, using the MATLAB PDE Toolbox. For the experiments, the source function is set to $s(x,y) = e^{-(5x-2.5)^2-(5y)^2} + e^{-(7.5x)^2-(2.5y)^2} + e^{-(5x+2.5)^2-(5y)^2}$, $(x,y)\in\mathcal{O}$, and the noise standard deviation to $\sigma=0.0025$ (with corresponding signal-to-noise ratio $\|G(f_0)\|_{L^2}/\sigma = 68.50$). Further experiments, with differently shaped ground truths, are presented in Appendix A.
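For reference, the ground truth (17) and the source term above transcribe directly into code; a minimal Python sketch follows (the study's reproducible code is in MATLAB, and the function names here are illustrative):

```python
import numpy as np

def f0(x, y):
    """Ground truth conductivity (17): unit background plus four Gaussian bumps."""
    return 1.0 + sum(np.exp(-(5 * x + 1.75 * k) ** 2 - (5 * y + 1.75 * m) ** 2)
                     for k in (-1.0, 1.0) for m in (-1.0, 1.0))

def source(x, y):
    """Source term s(x, y) used in the experiments."""
    return (np.exp(-(5 * x - 2.5) ** 2 - (5 * y) ** 2)
            + np.exp(-(7.5 * x) ** 2 - (2.5 * y) ** 2)
            + np.exp(-(5 * x + 2.5) ** 2 - (5 * y) ** 2))
```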

3.1. Results with Truncated Gaussian Series Priors

To implement the truncated Gaussian series priors from Example 1, we numerically compute the first $J_n \simeq n^{d/(2\alpha+2+d)}$ Dirichlet–Laplacian eigenpairs via finite element methods; see Figures 3 and 4. Identifying the functional parameter $F$ in (5) with the vector of coefficients $F := [F_1,\dots,F_{J_n}]$, where $F_j := \langle F,e_j\rangle_2$, the prior then corresponds to
$$F \sim N(0,\Lambda_n), \qquad \Lambda_n := n^{-d/(2\alpha+2+d)}\,\mathrm{diag}\big(\lambda_1^{-\alpha},\dots,\lambda_{J_n}^{-\alpha}\big) \in \mathbb{R}^{J_n\times J_n}.\qquad(18)$$
While the employed prior is Gaussian, the nonlinearity of the forward map $f\mapsto G(f)$ implies that the log-likelihood $\ell_n(f)$ in (4) depends nonlinearly on $f$. The resulting posterior distribution is therefore generally non-Gaussian and not available in closed form. We then resort to MCMC sampling via the preconditioned Crank–Nicolson (pCN) algorithm [16], a specific instance of the random-walk Metropolis–Hastings method for Gaussian priors that is known to be robust to the discretisation dimension. In the present setting, we employ the pCN algorithm to generate an $\mathbb{R}^{J_n}$-valued Markov chain $(F_h)_{h\ge 1}$ with invariant measure equal to the posterior distribution, starting from an initialisation point $F_0$ and then, for $h=0,1,2,\dots$, repeating the following steps:
  • Draw a prior sample $\xi\sim N(0,\Lambda_n)$, where $\Lambda_n$ is as in (18), and for $\delta>0$ define the proposal $p := \sqrt{1-2\delta}\,F_h + \sqrt{2\delta}\,\xi$;
  • Set
$$F_{h+1} := \begin{cases} p, & \text{with probability } \min\big\{1,\ e^{\ell_n(\Phi\circ p)-\ell_n(\Phi\circ F_h)}\big\},\\ F_h, & \text{otherwise},\end{cases}\qquad(19)$$
    where $\ell_n$ is the log-likelihood function in (4).
For each iteration, step 2 requires the evaluation of the log-likelihood $\ell_n(\Phi\circ p)$, which in turn entails the numerical evaluation of the PDE solution $G(\Phi\circ p)$ at the design points $X_1,\dots,X_n$ (cf. (4)); we perform this via finite element methods. The prior samples $\xi\sim N(0,\Lambda_n)$ required in step 1 are straightforward to draw since $\Lambda_n$ is diagonal.
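Putting steps 1–2 together, a minimal Python sketch of the sampler is given below; `log_lik` is assumed to wrap the link function and the finite element solver (both placeholders, not the paper's MATLAB API), so the sketch isolates the pCN mechanics.

```python
import numpy as np

def pcn(log_lik, Lambda_diag, delta, H, rng=np.random.default_rng(0)):
    """pCN chain (F_h) targeting the posterior (7) under the prior (18).

    log_lik     : callable F -> l_n(Phi(F)); wraps the PDE solve (placeholder)
    Lambda_diag : diagonal of the prior covariance Lambda_n
    delta       : step size with 0 < delta < 1/2
    H           : number of iterations
    """
    F = np.zeros(Lambda_diag.size)          # cold start F_0 = 0
    ll = log_lik(F)
    chain = np.empty((H + 1, F.size))
    chain[0] = F
    accepted = 0
    for h in range(1, H + 1):
        xi = np.sqrt(Lambda_diag) * rng.standard_normal(F.size)  # step 1: xi ~ N(0, Lambda_n)
        prop = np.sqrt(1 - 2 * delta) * F + np.sqrt(2 * delta) * xi
        ll_prop = log_lik(prop)
        # step 2: accept with probability min{1, exp(l_n(Phi o p) - l_n(Phi o F_h))}
        if np.log(rng.uniform()) < ll_prop - ll:
            F, ll = prop, ll_prop
            accepted += 1
        chain[h] = F                        # record the (possibly unchanged) state
    return chain, accepted / H
```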
The algorithm is terminated after $H$ steps, returning approximate samples $\{F_h,\ h=0,\dots,H\}$ from the posterior distribution, where a first batch of iterates is typically discarded as the burn-in. Under a set of assumptions on the forward map that is compatible with the present setting, Hairer, Stuart, and Vollmer [30] derived dimension-free spectral gaps, which imply rapid convergence of the pCN marginal laws towards the invariant measure. As a consequence, the posterior mean $\bar F_n$ can reliably be numerically computed by the MCMC average
$$\bar F := \frac{1}{H+1}\sum_{h=0}^{H} F_h,$$
with non-asymptotic bounds for the numerical approximation error. Posterior credible sets can likewise be reliably computed by considering the empirical quantiles of the pCN samples.
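In code, the MCMC average and the empirical quantiles are one-liners over the stored chain; a sketch under the same assumptions as above (burn-in length as in the experiments below):

```python
import numpy as np

def summarize_chain(chain, burn_in=5000):
    """Posterior mean estimate and pointwise 95% credible bounds from pCN samples."""
    kept = chain[burn_in:]                              # discard the burn-in iterates
    F_bar = kept.mean(axis=0)                           # MCMC average of the samples
    lo, hi = np.quantile(kept, [0.025, 0.975], axis=0)  # empirical quantiles
    return F_bar, lo, hi
```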
Figure 2 shows the posterior mean estimates $\bar f_n = \Phi\circ\bar F_n$ of the conductivity function $f$ obtained for increasing sample sizes $n=100, 250, 1000$. The corresponding $L^2$ estimation errors (and the associated relative errors) are reported in Table 1 (first and second row), displaying a progressive decay as expected from the theory developed in Section 2. Across the experiments, the prior regularity parameter $\alpha$ in Equation (18) is set to $\alpha = 3 = 2 + d/2$. The pCN algorithm is iterated $H = 25{,}000$ times, and the first 5000 samples are discarded as the burn-in. The step size is set to $\delta = 0.0025, 0.001, 0.0005, 0.00025$, tuned depending on the sample size to obtain a stabilisation of the acceptance rate around 30% after the burn-in; see Figure 5 (left). A pCN cold start $F_0 = 0$ is employed. Figure 5 (right) shows, for the experiment with $n=1000$, the log-likelihood $\ell_n(\Phi\circ F_h)$ along the first 3000 pCN iterates (part of the burn-in phase), seen to rapidly increase towards, and then stabilise around, the log-likelihood $\ell_n(f_0)$ attained by the true diffusivity $f_0$. The computation time per experiment is around 50 min on a 2020 M1 MacBook Pro 13 running MATLAB 2024b.

3.2. Results with the Matérn Process Prior

For Gaussian process priors specified via a covariance kernel, such as the Matérn process considered in Example 2, we discretise the parameter space by assuming that $F$ in (5) is given by the finite sum
$$F = \sum_{m=1}^{M} F_m\psi_m, \qquad F_m\in\mathbb{R},\ M\in\mathbb{N},\qquad(20)$$
where $\{\psi_1,\dots,\psi_M\}$ are piecewise linear functions on a deterministic triangular mesh with nodes $\{z_1,\dots,z_M\}\subset\mathcal{O}$, uniquely characterised by the relation $\psi_m(z_{m'}) = 1_{\{m=m'\}}$. Accordingly, $F$ in (20) satisfies $F(z_m)=F_m$, and for any $x\in\mathcal{O}$, the value $F(x)$ is obtained by linear interpolation over the pairs $\{(z_m,F_m),\ m=1,\dots,M\}$. Given the Matérn kernel $C$ in (16), and identifying $F$ with the vector of values $F := [F_1,\dots,F_M]$, the prior then corresponds to
$$F\sim N(0,C), \qquad C = [C_{mm'}]_{m,m'=1}^{M}\in\mathbb{R}^{M\times M}, \qquad C_{mm'} = C(z_m,z_{m'}).\qquad(21)$$
Posterior inference based on the Matérn process prior may be implemented via the pCN algorithm described in Section 3.1 above. For each iteration, the construction of the pCN proposal $p$ in step 1 entails sampling an independent multivariate Gaussian random variable $\xi\sim N(0,C)$. In step 2, the computation of the acceptance probability requires the evaluation of the proposal log-likelihood $\ell_n(\Phi\circ p)$, which can be performed via numerical PDE methods as described above.
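A sketch of the corresponding prior ingredients follows (Python; SciPy's modified Bessel function is used for the kernel (16), and all names are illustrative):

```python
import numpy as np
from scipy.special import gamma, kv

def matern_covariance(nodes, alpha=3.0, ell=0.25):
    """Covariance matrix (21): the Matern kernel (16) evaluated at the mesh nodes."""
    diff = nodes[:, None, :] - nodes[None, :, :]
    r = np.sqrt((diff ** 2).sum(-1))          # pairwise distances |z_m - z_m'|
    C = np.ones_like(r)                       # kernel value is 1 at zero distance
    mask = r > 0
    u = np.sqrt(2 * alpha) * r[mask] / ell
    C[mask] = 2.0 ** (1 - alpha) / gamma(alpha) * u ** alpha * kv(alpha, u)
    return C

def sample_matern_prior(C, rng=np.random.default_rng()):
    """Draw xi ~ N(0, C) via a jittered Cholesky factor (for numerical stability)."""
    L = np.linalg.cholesky(C + 1e-10 * np.eye(C.shape[0]))
    return L @ rng.standard_normal(C.shape[0])
```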
For the ground truth $f_0$ specified in Equation (17), Table 1 (third and fourth row) displays the $L^2$-estimation errors (and the relative errors) associated with the posterior mean estimates $\bar f_n=\Phi\circ\bar F_n$ for increasing sample sizes $n=100,250,500,1000$. Across the experiments, the parameter space is discretised using a uniform triangular mesh with $M=2000$ nodes. The hyperparameters for the Matérn covariance kernel in (16) are set to $\alpha=3$ and $\ell=0.25$. Similarly to the results in Section 3.1, the runs of pCN are stopped after $H=25{,}000$ iterations, with a tuning of the step size (respectively, $\delta=0.0025, 0.001, 0.0005, 0.00025$) to achieve a stabilisation of the acceptance rate at around 30% after the burn-in (corresponding to the first 5000 iterates). A non-informative initialisation point $F_0=0$ is chosen for each run. The computation times range between 50 and 57 min, in line with those obtained for the truncated Gaussian series prior in Section 3.1.

4. Discussion

In this article, we have considered the nonparametric Bayesian approach to inference in elliptic PDEs, focusing on the standard benchmark problem of estimating the diffusivity function from noisy observations of the PDE solution. We have provided a general asymptotic concentration result, Theorem 1, for the posterior distribution and the posterior mean estimator, and showed that it applies to two classes of Gaussian priors of interest, namely truncated Gaussian series priors defined on the Dirichlet–Laplacian eigenbasis (cf. Example 1) and Matérn process priors (cf. Example 2). For both prior models, we have devised strategies for implementing the posterior-based inference, employing efficient and reliable MCMC algorithms. The performance of the considered methods has been investigated in a numerical simulation study, where excellent reconstruction results and a general agreement with the theory have been obtained. Overall, the main advantages of the approach lie in its modelling flexibility, the robustness of the associated computational methods, and the availability of theoretical guarantees on the recovery performance.
Let us conclude by overviewing some related research questions. Firstly, we remark that the noise standard deviation $\sigma$ in (2) is assumed to be known throughout the paper. In the realistic scenario where $\sigma$ is also unknown, the methodology developed here can readily be adapted by replacing it (in an empirical Bayes spirit) with a preliminary (non-likelihood-based) estimate. Several strategies have been proposed in the literature for variance estimation in nonparametric regression models, ranging from residual-based estimators using kernel smoothing [31] and splines [32], to difference-based estimators [33]. Alternatively, a joint Bayesian model for $f$ and $\sigma$ in (2) could be considered by endowing $\sigma$ with a prior distribution (for example, an independent conjugate inverse gamma prior). This leads to the interesting question of extending the theoretical results presented in Section 2 to the setting with unknown variance; see [34] for related results in a direct regression model.
Secondly, we mention the important issue of specifying the hyperparameter values for the considered prior distributions, namely, the regularity parameter $\alpha$ for the truncated Gaussian series prior from Example 1, and the smoothness and length-scale parameters in the Matérn covariance kernel (16). There is a vast literature investigating the methodological and theoretical aspects of empirical and hierarchical Bayesian strategies for fully data-driven hyperparameter selection; see [20,35,36,37,38], where many more references can be found. Investigating the implications and performance of these methods in the context of the elliptic PDE model and the Gaussian prior distributions considered in the present article is an interesting direction for future research.

Funding

This research has been partially supported by MUR, PRIN project 2022CLTYP4.

Data Availability Statement

The (MATLAB) code to fully reproduce the numerical study is available at: https://github.com/MattGiord/Bayesian-Elliptic-PDEs/tree/main (accessed on 19 March 2025).

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Further Numerical Results

We present further empirical investigations in which we consider the recovery of three additional true diffusivity functions, respectively specified (under the parametrisation (5)) by
$$F_0^{(1)}(x,y) = \log\Big(1 + e^{-(4x-1.75)^2-(4y-1)^2} + e^{-(4x+1.75)^2-(4y+1)^2}\Big);$$
$$F_0^{(2)}(x,y) = \log\Big(1 + e^{-(5x-1.75)^2-(2.5y-0.25)^2} + e^{-(5x+1.75)^2-(5y+1)^2}\Big);$$
$$F_0^{(3)}(x,y) = \log\Big(1 + 0.5\,e^{-(5x-1.75)^2-(5y-1.75)^2} - 0.5\,e^{-(5x+1.75)^2-(5y+1.75)^2}\Big),$$
for $(x,y)\in\mathcal{O}$; see Figure A1, top row. For each ground truth, we generate synthetic datasets $\{(Y_i,X_i)\}_{i=1}^n$, with $n=2500$, as described in Section 3, with noise standard deviation $\sigma=0.0025$ (and signal-to-noise ratio respectively equal to 71.02, 73.26, and 63.88). Next, for each set of observations, we implement posterior inference with truncated Gaussian series priors based on the Neumann–Laplacian eigenpairs and with the Matérn process prior, numerically computing the associated posterior mean estimates via the pCN algorithm. For the series priors, the regularity parameter in (18) is set to $\alpha=3$. For the Matérn prior, the covariance hyperparameters in (16) are set to $\alpha=3$ and $\ell=0.25$. Across the three collections of experiments, the pCN algorithm is iterated for 25,000 steps, with a tuning of the step size to achieve a stabilisation of the acceptance rate between 20% and 30% after the burn-in (corresponding to the first 5000 samples). All the runs are initialised with cold starts. The obtained results are visualised in Figure A1 and summarised in Table A1. The computation times are in line with those of the experiments presented in Section 3.
Figure A1. Left column, top to bottom, respectively: the ground truth corresponding to F 0 ( 1 ) , and the posterior mean estimates (computed via the pCN algorithm) for the truncated series and Matérn priors. The step size is set to δ = 0.00035, 0.0001 respectively, and the acceptance ratios are 24.15% and 26.51%. Central column: the ground truth for F 0 ( 2 ) and the posterior mean estimates. For pCN, δ = 0.0001, 0.00005, acceptance ratios: 20.42% and 22.73%. Right column: the ground truth for F 0 ( 3 ) and the posterior mean estimates. For pCN, δ = 0.0001, 0.000035, acceptance ratios: 25.61%, 29.85%.
Table A1. Recovery performance of the posterior mean estimate for the different ground truths.

| | $F_0^{(1)}$ | $F_0^{(2)}$ | $F_0^{(3)}$ |
|---|---|---|---|
| $\|\bar f_n - f_0\|_{L^2}$; series prior | 0.1197 | 0.0974 | 0.07234 |
| $\|\bar f_n - f_0\|_{L^2}/\|f_0\|_{L^2}$; series prior | 7.22% | 5.90% | 4.71% |
| $\|\bar f_n - f_0\|_{L^2}$; Matérn prior | 0.1241 | 0.0941 | 0.08148 |
| $\|\bar f_n - f_0\|_{L^2}/\|f_0\|_{L^2}$; Matérn prior | 7.49% | 5.70% | 5.31% |

Appendix B. Proof of Theorem 1

We follow the proofs of Theorems 4–6 in [20], recalling two key properties of the forward operator $G$, holding for all $\alpha>1+d/2$:
$$\|G(\Phi\circ F_1)-G(\Phi\circ F_2)\|_{L^2} \lesssim \big(1+\|F_1\|_{C^1}^4+\|F_2\|_{C^1}^4\big)\,\|F_1-F_2\|_{(H^1)^*}, \qquad F_1,F_2\in H_0^\alpha(\mathcal{O}),\qquad(A1)$$
and
$$\sup_{F\in H_0^\alpha}\|G(\Phi\circ F)\|_{L^\infty} < \infty.$$
  • Step I: posterior contraction rates in prediction risk.
We start with the derivation of the contraction rate in prediction risk (9). Set $\epsilon_n := n^{-(\alpha+1)/(2\alpha+2+d)}$. By an application of Theorem 13 and Lemmas 22 and 23 in [20], it is enough to show that for some sufficiently large constant $c_1>0$, there exists a constant $c_2>0$ such that
$$\Pi_n\big(F : \|G(\Phi\circ F)-G(\Phi\circ F_0)\|_{L^2}\le c_1\epsilon_n\big) \ge e^{-c_2 n\epsilon_n^2},\qquad(A2)$$
and further that there exist measurable sets $\mathcal{W}_n\subset C^\beta(\mathcal{O})$ satisfying
$$\Pi_n(\mathcal{W}_n^c)\le e^{-c_3 n\epsilon_n^2}; \qquad \log N(\epsilon_n;\mathcal{W}_n,d_G)\lesssim n\epsilon_n^2,\qquad(A3)$$
for some $c_3>0$ large enough, where $d_G(F_1,F_2) := \|G(\Phi\circ F_1)-G(\Phi\circ F_2)\|_{L^2}$, and $N(\epsilon_n;\mathcal{W}_n,d_G)$ is the minimal number of balls of radius $\epsilon_n$ in the metric $d_G$ needed to cover $\mathcal{W}_n$.
The main difference compared to the proofs in [20] lies in the verification of the small ball probability estimate (A2). We proceed by lower bounding this quantity, for sufficiently large $M>0$, by
$$\Pi_n\big(F : \|G(\Phi\circ F)-G(\Phi\circ F_0)\|_{L^2}\le c_1\epsilon_n,\ \|F-F_0\|_{C^1}\le M\big) \ge \Pi_n\big(F : \|F-F_0\|_{(H^1)^*}\le k_1\epsilon_n,\ \|F-F_0\|_{C^1}\le M\big)$$
for some $k_1>0$ (depending on $c_1$), having used (A1). Recalling the assumption (8) on the approximating sequence $F_{0,n}\in\mathcal{H}_{W_n}$ and using (twice) the triangle inequality, the latter prior probability is greater than
$$\Pi_n\big(F : \|F-F_{0,n}\|_{(H^1)^*}\le k_2\epsilon_n,\ \|F-F_{0,n}\|_{C^1}\le M'\big) = \Pi_n\big(F : F-F_{0,n}\in B_1\cap B_2\big)$$
for some $k_2, M'>0$, having defined $B_1 := \{F : \|F\|_{(H^1)^*}\le k_2\epsilon_n\}$ and $B_2 := \{F : \|F\|_{C^1}\le M'\}$. The intersection $B_1\cap B_2$ defines a symmetric set in the ambient Banach space $C^1(\mathcal{O})$; hence, recalling that, by linearity, the RKHS of the scaled Gaussian prior $\Pi_n(\cdot)$ coincides with the RKHS $\mathcal{H}_{W_n}$ of the base prior $\Pi_{W_n}(\cdot)$, with the scaled RKHS norm being equal to $n^{d/(4\alpha+4+2d)}\|\cdot\|_{\mathcal{H}_{W_n}} = \sqrt{n}\,\epsilon_n\|\cdot\|_{\mathcal{H}_{W_n}}$ (e.g., Exercise 2.6.15 in [25]), an application of Corollary 2.6.18 in [25] gives the further lower bound
$$e^{-n\epsilon_n^2\|F_{0,n}\|_{\mathcal{H}_{W_n}}^2}\,\Pi_n(B_1\cap B_2) \ge e^{-k_3 n\epsilon_n^2}\,\Pi_n(B_1\cap B_2),$$
for some $k_3>0$, since $\|F_{0,n}\|_{\mathcal{H}_{W_n}} = O(1)$ by assumption. Now, $B_1$ and $B_2$ are closed, convex, and symmetric subsets of the ambient space $C^1(\mathcal{O})$, and therefore, by the correlation inequality for Gaussian measures (cf. Lemma A.2 in [39]),
$$\Pi_n(B_1\cap B_2) \ge \Pi_n(B_1)\,\Pi_n(B_2).$$
Recalling the definition of the scaled Gaussian priors $\Pi_n(\cdot)$ in (6), we have, for $W\sim\Pi_{W_n}(\cdot)$,
$$\Pi_n(B_2) = \Pr\big(\|W\|_{C^1}\le M'\sqrt{n}\,\epsilon_n\big) \ge \Pr\big(\|W\|_{C^1}\le M'\big) > 0$$
for some $M'>0$, since $\sqrt{n}\,\epsilon_n > 1$ for all $n$. Lastly, recalling the continuous embedding $\mathcal{H}_{W_n}\subseteq H_0^\alpha(\mathcal{O})$, holding by assumption for some integer $\alpha>\beta+d/2$, $\beta\ge 1$, combining the metric entropy estimate
$$\log N\big(\eta;\{F\in\mathcal{H}_{W_n} : \|F\|_{\mathcal{H}_{W_n}}\le 1\},\|\cdot\|_{(H^1)^*}\big) \le \log N\big(\eta;\{F\in H_0^\alpha(\mathcal{O}) : \|F\|_{H^\alpha}\le k_4\},\|\cdot\|_{(H^1)^*}\big) \lesssim \eta^{-\frac{d}{\alpha+1}},$$
cf. Lemma 19 in [26], with Theorem 1.2 in [40] yields
$$\Pi_n(B_1) = \Pr\big(\|W\|_{(H^1)^*}\le k_2\sqrt{n}\,\epsilon_n^2\big) \ge e^{-k_5(\sqrt{n}\,\epsilon_n^2)^{-\frac{2d}{\alpha+1}\big(2-\frac{d}{\alpha+1}\big)^{-1}}} = e^{-k_5 n\epsilon_n^2}.$$
The obtained estimates thus jointly conclude the verification of (A2) for a large enough constant $c_2$.
Next, for $K>0$, define the sieves
$$\mathcal{W}_n := \big\{F : F = F_1+F_2,\ \|F_1\|_{(H^1)^*}\le K\epsilon_n,\ \|F_2\|_{\mathcal{H}_{W_n}}\le M,\ \|F\|_{C^\beta}\le M\big\}.$$
A direct application of Lemma 17 in [20] (with $\kappa=1$ in their notation) gives that for any $Q>0$, there exists $K$ sufficiently large such that $\Pi_n(\mathcal{W}_n^c)\le e^{-Qn\epsilon_n^2}$. We then take $K$ large enough and argue as in the proof of Lemma 18 in [20] to deduce that $\log N(\epsilon_n;\mathcal{W}_n,d_G)\lesssim n\epsilon_n^2$. This concludes the verification of (A3), and then also of the first claim (9) of Theorem 1, since by Theorem 13 in [20] we obtain that
$$\Pi_n\big(F\in\mathcal{W}_n : \|G(\Phi\circ F)-G(\Phi\circ F_0)\|_{L^2}\le L\epsilon_n \,\big|\, \{(Y_i,X_i)\}_{i=1}^n\big) = 1 - O_{P_{f_0}^{(n)}}\big(e^{-Dn\epsilon_n^2}\big)\qquad(A4)$$
as $n\to\infty$, for some $L, D>0$.
  • Step II: remaining claims.
Assume now that $\beta\ge 2$, and recall that $\alpha>\beta+d/2$ (with $\alpha,\beta\in\mathbb{N}$). Since $C^\beta(\mathcal{O})\subset H^\beta(\mathcal{O})$, Lemmas 23 and 29 in [26] imply that for all $F\in\mathcal{W}_n$, with $\mathcal{W}_n$ the above sieve sets, we have $G(\Phi\circ F)\in H^{\beta+1}(\mathcal{O})$ and
$$\|G(\Phi\circ F)\|_{H^{\beta+1}} \lesssim 1+\|\Phi\circ F\|_{H^\beta}^{\beta(\beta+1)} \lesssim 1,$$
with a multiplicative constant independent of $F$. Similarly, since $F_0\in H_0^\alpha(\mathcal{O})\subset H^\beta(\mathcal{O})$, $G(\Phi\circ F_0)\in H^{\beta+1}(\mathcal{O})$ and $\|G(\Phi\circ F_0)\|_{H^{\beta+1}}\lesssim 1$. By the standard interpolation inequality for Sobolev spaces, for all $F\in\mathcal{W}_n$, we then have
$$\|G(\Phi\circ F)-G(\Phi\circ F_0)\|_{H^2} \lesssim \|G(\Phi\circ F)-G(\Phi\circ F_0)\|_{L^2}^{\frac{\beta-1}{\beta+1}}\,\|G(\Phi\circ F)-G(\Phi\circ F_0)\|_{H^{\beta+1}}^{\frac{2}{\beta+1}} \lesssim \|G(\Phi\circ F)-G(\Phi\circ F_0)\|_{L^2}^{\frac{\beta-1}{\beta+1}}.$$
Combined with (A4), this implies that
$$\Pi_n\Big(F\in\mathcal{W}_n : \|G(\Phi\circ F)-G(\Phi\circ F_0)\|_{H^2}\le L'\epsilon_n^{\frac{\beta-1}{\beta+1}} \,\Big|\, \{(Y_i,X_i)\}_{i=1}^n\Big) = 1 - O_{P_{f_0}^{(n)}}\big(e^{-Dn\epsilon_n^2}\big)$$
for some $L'>0$, as $n\to\infty$. The second claim (10) of Theorem 1 is then verified using the following stability estimate for the forward operator $G$:
$$\|f-f_0\|_{L^2} \lesssim \|f\|_{C^1}\,\|G(f)-G(f_0)\|_{H^2},$$
holding for all $f,f_0\in\mathcal{F}_{\alpha,f_{\min}}$ (the parameter space from (3)), provided that $\alpha>2+d/2$ and $\inf_{x\in\mathcal{O}}s(x)>0$; cf. Lemma 24 in [26]. Indeed, combined with the second-to-last display, this gives
$$\Pi_n\Big(F\in\mathcal{W}_n : \|\Phi\circ F-\Phi\circ F_0\|_{L^2}\le L''\epsilon_n^{\frac{\beta-1}{\beta+1}} \,\Big|\, \{(Y_i,X_i)\}_{i=1}^n\Big) = 1 - O_{P_{f_0}^{(n)}}\big(e^{-Dn\epsilon_n^2}\big).$$
Based on the last display, the proof of the last claim (11) now follows by arguing exactly as for Theorem 6 in [20].

References

  1. Engl, H.W.; Hanke, M.; Neubauer, A. Regularization of Inverse Problems; Mathematics and Its Applications; Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1996; Volume 375, p. viii+321.
  2. Kaltenbacher, B.; Neubauer, A.; Scherzer, O. Iterative Regularization Methods for Nonlinear Ill-Posed Problems; Radon Series on Computational and Applied Mathematics; Walter de Gruyter GmbH & Co. KG: Berlin, Germany, 2008; Volume 6, p. viii+194.
  3. Isakov, V. Inverse Problems for Partial Differential Equations, 3rd ed.; Applied Mathematical Sciences; Springer: Cham, Switzerland, 2017; Volume 127, p. xv+406.
  4. Kaipio, J.; Somersalo, E. Statistical and Computational Inverse Problems; Number 160 in Applied Mathematical Sciences; Springer: New York, NY, USA, 2004.
  5. Bissantz, N.; Hohage, T.; Munk, A. Consistency and rates of convergence of nonlinear Tikhonov regularization with random noise. Inverse Probl. 2004, 20, 1773–1789.
  6. Hohage, T.; Pricop, M. Nonlinear Tikhonov regularization in Hilbert scales for inverse boundary value problems with random noise. Inverse Probl. Imaging 2008, 2, 271–290.
  7. Benning, M.; Burger, M. Modern regularization methods for inverse problems. Acta Numer. 2018, 27, 1–111.
  8. Arridge, S.; Maass, P.; Öktem, O.; Schönlieb, C.B. Solving inverse problems using data-driven models. Acta Numer. 2019, 28, 1–174.
  9. Nickl, R. Bayesian Non-Linear Statistical Inverse Problems; Zurich Lectures in Advanced Mathematics; EMS Press: Berlin, Germany, 2023; p. xi+15.
  10. Evans, L.C. Partial Differential Equations, 2nd ed.; Graduate Studies in Mathematics; American Mathematical Society: Providence, RI, USA, 2010; Volume 19, p. xxii+749.
  11. Yeh, W.W.G. Review of Parameter Identification Procedures in Groundwater Hydrology: The Inverse Problem. Water Resour. Res. 1986, 22, 95–108.
  12. Richter, G.R. An inverse problem for the steady state diffusion equation. SIAM J. Appl. Math. 1981, 41, 210–221.
  13. Knowles, I. Parameter identification for elliptic problems. J. Comput. Appl. Math. 2001, 131, 175–194.
  14. Bonito, A.; Cohen, A.; DeVore, R.; Petrova, G.; Welper, G. Diffusion coefficients estimation for elliptic partial differential equations. SIAM J. Math. Anal. 2017, 49, 1570–1592.
  15. Dashti, M.; Stuart, A.M. Uncertainty quantification and weak approximation of an elliptic inverse problem. SIAM J. Numer. Anal. 2011, 49, 2524–2542.
  16. Cotter, S.; Roberts, G.; Stuart, A.; White, D. MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster. Stat. Sci. 2013, 28, 424–446.
  17. Stuart, A.M. Inverse problems: A Bayesian perspective. Acta Numer. 2010, 19, 451–559.
  18. Vollmer, S.J. Posterior consistency for Bayesian inverse problems through stability and regression results. Inverse Probl. 2013, 29, 125011.
  19. Abraham, K.; Nickl, R. On statistical Calderón problems. Math. Stat. Learn. 2019, 2, 165–216.
  20. Giordano, M.; Nickl, R. Consistency of Bayesian inference with Gaussian process priors in an elliptic inverse problem. Inverse Probl. 2020, 36, 85001–85036.
  21. Monard, F.; Nickl, R.; Paternain, G.P. Consistent inversion of noisy non-Abelian X-ray transforms. Comm. Pure Appl. Math. 2021, 74, 1045–1099.
  22. Giordano, M. Asymptotic Theory for Bayesian Nonparametric Inference in Statistical Models Arising from Partial Differential Equations. Doctoral Thesis, University of Cambridge, Cambridge, UK, 2021.
  23. Agapiou, S.; Wang, S. Laplace priors and spatial inhomogeneity in Bayesian inverse problems. Bernoulli 2024, 30, 878–910.
  24. Ghosal, S.; van der Vaart, A.W. Fundamentals of Nonparametric Bayesian Inference; Cambridge University Press: New York, NY, USA, 2017.
  25. Giné, E.; Nickl, R. Mathematical Foundations of Infinite-Dimensional Statistical Models; Cambridge University Press: New York, NY, USA, 2016; p. xiv+690.
  26. Nickl, R.; van de Geer, S.; Wang, S. Convergence rates for penalized least squares estimators in PDE constrained regression problems. SIAM/ASA J. Uncertain. Quantif. 2020, 8, 374–413.
  27. Dashti, M.; Stuart, A.M. The Bayesian approach to inverse problems. In Handbook of Uncertainty Quantification; Springer: Cham, Switzerland, 2017; pp. 311–428.
  28. Haroske, D.D.; Triebel, H. Distributions, Sobolev Spaces, Elliptic Equations; EMS Press: Berlin, Germany, 2007.
  29. Taylor, M.E. Partial Differential Equations I; Springer: New York, NY, USA, 2011.
  30. Hairer, M.; Stuart, A.; Vollmer, S. Spectral Gaps for a Metropolis–Hastings Algorithm in Infinite Dimensions. Ann. Appl. Probab. 2014, 24, 2455–2490.
  31. Hall, P.; Marron, J. On variance estimation in nonparametric regression. Biometrika 1990, 77, 415–419.
  32. Wahba, G. Improper priors, spline smoothing and the problem of guarding against model errors in regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 1978, 40, 364–372.
  33. Rice, J. Bandwidth choice for nonparametric regression. Ann. Statist. 1984, 12, 1215–1230.
  34. Kejzlar, V.; Son, M.; Bhattacharya, S.; Maiti, T. A fast and calibrated computer model emulator: An empirical Bayes approach. Stat. Comput. 2021, 31, 49.
  35. Knapik, B.; Szabó, B.; van der Vaart, A.W.; van Zanten, H. Bayes procedures for adaptive inference in inverse problems for the white noise model. Probab. Theory Relat. Fields 2015, 164, 771–813.
  36. Rousseau, J.; Szabó, B. Asymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimator. Ann. Stat. 2017, 45, 833–865.
  37. Teckentrup, A.L. Convergence of Gaussian process regression with estimated hyper-parameters and applications in Bayesian inverse problems. SIAM/ASA J. Uncertain. Quantif. 2020, 8, 1310–1337.
  38. Agapiou, S.; Bardsley, J.M.; Papaspiliopoulos, O.; Stuart, A.M. Analysis of the Gibbs sampler for hierarchical inverse problems. SIAM/ASA J. Uncertain. Quantif. 2014, 2, 511–544.
  39. Giordano, M.; Ray, K. Nonparametric Bayesian inference for reversible multidimensional diffusions. Ann. Stat. 2022, 50, 2872–2898.
  40. Li, W.V.; Linde, W. Approximation, metric entropy and small ball estimates for Gaussian measures. Ann. Probab. 1999, 27, 1556–1578.
Figure 1. (Left): An example of diffusivity function f with four circular regions of higher conductivity. (Right): n = 4500 noisy observations from the corresponding PDE solution G ( f ) .
Figure 2. Left to right: posterior mean estimates f ¯ n of the diffusivity function f for increasing sample sizes n = 100 , 250 , 1000 .
Figure 3. Left to right: The first, second, and fiftieth Dirichlet–Laplacian eigenfunctions e 0 , e 1 and e 49 , computed via finite element methods.
Figure 4. Dirichlet–Laplacian eigenvalues λ j in the range [ 0 , 500 ] , computed via finite element methods.
Figure 5. (Left): acceptance rate over the first 10,000 pCN samples. The rate stabilises around 30% after the initial burn-in time (first 5000 iterates). (Right): in blue, the log-likelihood $\ell_n(\Phi\circ F_h)$ of the first 3000 iterates; in red, the log-likelihood $\ell_n(f_0)$ of the true diffusion coefficient $f_0$.
Table 1. $L^2$-estimation errors and relative errors for the posterior mean estimates with increasing sample sizes.

| n | 100 | 250 | 500 | 1000 |
|---|---|---|---|---|
| $\|\bar f_n - f_0\|_{L^2}$; series prior | 0.2981 | 0.2232 | 0.2144 | 0.1581 |
| $\|\bar f_n - f_0\|_{L^2}/\|f_0\|_{L^2}$; series prior | 17.67% | 12.23% | 12.71% | 9.36% |
| $\|\bar f_n - f_0\|_{L^2}$; Matérn prior | 0.3289 | 0.2677 | 0.2033 | 0.1647 |
| $\|\bar f_n - f_0\|_{L^2}/\|f_0\|_{L^2}$; Matérn prior | 18.98% | 15.86% | 12.05% | 9.76% |
