ROM-Based Inexact Subdivision Methods for PDE-Constrained Multiobjective Optimization

Stefan Banholzer; Bennet Gebken; Lena Reichle; Stefan Volkwein

doi:10.3390/mca26020032

¹ Department of Mathematics and Statistics, University of Konstanz, 78457 Konstanz, Germany

² Faculty for Computer Science, Electrical Engineering and Mathematics, Paderborn University, 33098 Paderborn, Germany

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Math. Comput. Appl. 2021, 26(2), 32; https://doi.org/10.3390/mca26020032

This article belongs to the Special Issue Set Oriented Numerics 2022


Abstract

The goal in multiobjective optimization is to determine the so-called Pareto set. Our optimization problem is governed by a parameter-dependent semi-linear elliptic partial differential equation (PDE). To solve it, we use a gradient-based set-oriented numerical method. The numerical solution of the PDE by standard discretization methods usually leads to high computational effort. To overcome this difficulty, reduced-order modeling (ROM) is developed utilizing the reduced basis method. These model simplifications cause inexactness in the gradients. For that reason, an additional descent condition is proposed. Applying a modified subdivision algorithm, numerical experiments illustrate the efficiency of our solution approach.

Keywords:

multiobjective optimization; PDE-constrained optimization; reduced-order modeling; set-oriented methods; inexact optimization

1. Introduction

Multiobjective optimization plays an important role in many applications, e.g., in industry, medicine, or engineering. Examples include the minimization of costs with simultaneous quality optimization in production, or the minimization of CO₂ emissions in energy generation with simultaneous cost minimization. Such problems lead to multiobjective optimization problems (MOPs), in which we seek an optimal compromise with respect to all given objectives at the same time. Typically, the objectives conflict, so that there are infinitely many optimal compromises. The set of these compromises is called the Pareto set. The goal is to approximate the Pareto set efficiently, which is more expensive than solving a single-objective optimization problem.

As multiobjective optimization problems are of great importance, there exist several algorithms to solve them. Among the most popular methods are scalarization methods, which transform MOPs into scalar problems. For example, in the weighted sum method [1,2,3,4], convex combinations of the original objectives are optimized. Another popular approach is to use non-deterministic methods like evolutionary algorithms, cf., e.g., [5]. Furthermore, as multiobjective problems are generalizations of scalar problems, some solution methods can be generalized from the scalar to the multiobjective case [6,7,8].

In addition to the classical methods above, there are set-based strategies for the solution of MOPs. Continuation methods [9,10,11] use the fact that the Pareto set is typically (the projection of) a smooth manifold. Subdivision methods [12,13,14,15] use tools from the area of dynamical systems to generate a covering of the Pareto set via hypercubes. However, especially when the objective functions and their gradients are expensive to evaluate, e.g., because an underlying PDE has to be solved for every evaluation, the computational time of these methods can quickly become very large. In the presence of PDE constraints, surrogate models offer a promising tool to reduce the computational effort significantly [16]. Examples are dimension reduction techniques such as the Reduced Basis (RB) Method [17,18]. In an offline phase, a low-dimensional surrogate model of the PDE is constructed by using, e.g., the greedy algorithm, cf. [17]. In the online phase, only the RB model is used to solve the PDE, which saves considerable computing time.

In this article, we combine an extension of the set-oriented method presented in [12] based on inexact gradient evaluations of the objective functions with an RB approach and a discrete empirical interpolation method (DEIM) [19,20] for semi-linear elliptic PDEs. In order to deal with the inexactness introduced by the surrogate model, we combine the first-order optimality conditions for multiobjective optimization problems with error estimates for the RB-DEIM method and derive an additional condition for the descent direction [9] to get a tight superset of the Pareto set. This approach allows us to better control the quality of the result by controlling the errors for the objective functions independently. In order to obtain an even tighter superset of the Pareto set, we update these error estimates in our subdivision algorithm after each iteration step.

The article is organized as follows. In Section 2, we recall the basic concepts of multiobjective optimization problems and review results on descent directions with exact and inexact gradients. Furthermore, we develop a set-oriented method to solve these problems, where only inexact gradient information is utilized. In Section 3, the PDE-constrained multiobjective optimization problem and the underlying semi-linear PDE are introduced. Subsequently, we show how reduced-order modeling can be applied efficiently. In Section 4, numerical results concerning both the subdivision and the modified algorithm are presented. Finally, we give a conclusion and discuss possible future work in Section 5.

2. A Set-Oriented Method for Multiobjective Optimization with Inexact Objective Gradients

In this section, we briefly recall the basic concepts of multiobjective optimization. Furthermore, we develop a set-oriented method to solve these problems, where only inexact gradient information is utilized.

2.1. Multiobjective Optimization

Let $μ_{a}, μ_{b} \in R^{m}$ with $μ_{a} \leq μ_{b}$ be arbitrary. We define the convex and compact parameter set $P_{ad} = [μ_{a}, μ_{b}] \subset R^{m}$. Now, the goal is to solve the constrained multiobjective optimization problem

\min \hat{J}(μ) \quad \text{subject to (s.t.)} \quad μ \in P_{ad}

(1)

with a given objective $\hat{J} = (\hat{J}_{1}, \dots, \hat{J}_{k}) : P_{ad} \to R^{k}$ and $k > 1$.

Compared to scalar optimization, there is no natural total order on $R^{k}$ for $k > 1$. Therefore, we cannot expect a single point in $P_{ad}$ that minimizes all objectives $\hat{J}_{i}$ simultaneously. For this reason, we make use of the following definition.

Definition 1.

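(a): A point $\bar{μ} \in P_{ad}$ is called (globally) Pareto optimal if there is no $μ \in P_{ad}$ satisfying $\hat{J}(μ) ≨ \hat{J}(\bar{μ})$, i.e., $\hat{J}_{i}(μ) \leq \hat{J}_{i}(\bar{μ})$ for all $i \in \{1, \dots, k\}$ and $\hat{J}(μ) \neq \hat{J}(\bar{μ})$. In that case, we call $\bar{μ}$ a Pareto point.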
(b): The set of all Pareto points in $P_{ad}$ is called the Pareto set and is denoted by

$P = \{μ \in P_{ad} \mid μ \text{ is Pareto optimal}\}.$
(c): The image $\hat{J} (P) \subset R^{k}$ of the Pareto set under $\hat{J}$ is called the Pareto front.
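For a finite sample of parameters, Definition 1 can be mimicked by discarding dominated objective vectors. The following Python sketch is illustrative only: the function names and the toy objectives $\hat{J}_{1}(μ) = μ^{2}$, $\hat{J}_{2}(μ) = (μ - 1)^{2}$ are our own choices. It keeps the sampled points whose images are not dominated by any other sampled image; this only approximates $P$ on the sample, whereas Definition 1 compares against all of $P_{ad}$.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a dominates b: a <= b componentwise and a != b."""
    return bool(np.all(a <= b) and np.any(a < b))

def nondominated(F):
    """Indices of the rows of F (shape n x k) that no other row dominates."""
    n = F.shape[0]
    return [i for i in range(n)
            if not any(dominates(F[j], F[i]) for j in range(n) if j != i)]

# Toy problem (hypothetical): two conflicting quadratics on a 1-D parameter grid.
mu = np.linspace(-1.0, 2.0, 31)
F = np.column_stack([mu**2, (mu - 1.0)**2])
idx = nondominated(F)
print(mu[idx].min(), mu[idx].max())
```

For this toy problem, the Pareto set is the interval $[0, 1]$, and the filter keeps exactly the grid points in that interval.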

If $\hat{J}$ is continuously differentiable on an open set containing $P_{ad}$, then there exists a first-order necessary condition for Pareto optimality. To formulate this condition, we define the convex, closed, and bounded set

∆_{k} = \{α \in R^{k} \mid α_{i} \geq 0 \text{ and } \sum_{i = 1}^{k} α_{i} = 1\}.

Further, the row vector

\nabla \hat{J}_{i}(μ) = \Big(\frac{\partial \hat{J}_{i}}{\partial μ_{j}}(μ)\Big)_{1 \leq j \leq m} \in R^{1 \times m}

stands for the gradient of the $i$-th objective $\hat{J}_{i}$ with $i \in \{1, \dots, k\}$.

Definition 2.

If for a given $\bar{μ} \in P_{ad}$ there exists an $\bar{α} = (\bar{α}_{i})_{1 \leq i \leq k} \in ∆_{k}$ with

\sum_{i = 1}^{k} \bar{α}_{i} \nabla \hat{J}_{i}(\bar{μ})(μ - \bar{μ}) = (μ - \bar{μ})^{⊤} D\hat{J}(\bar{μ})^{⊤} \bar{α} \geq 0 \quad \text{for all } μ \in P_{ad},

(2)

then we call $\bar{μ}$ Pareto critical, where

D\hat{J}(μ) = \begin{pmatrix} \nabla \hat{J}_{1}(μ) \\ \vdots \\ \nabla \hat{J}_{k}(μ) \end{pmatrix} \in R^{k \times m}

denotes the Jacobian matrix of $\hat{J}$ at $μ$. The set of all Pareto critical points is called the Pareto critical set and is denoted by $P_{c}$.

Now, we recall the first-order necessary optimality conditions for (1).

Theorem 1.

Let $\hat{J}$ be continuously differentiable and $\bar{μ} \in P_{ad}$ Pareto optimal. Then $\bar{μ}$ is Pareto critical, i.e., $P \subset P_{c}$ holds. Condition (2) is called the Karush–Kuhn–Tucker (KKT) condition for multiobjective optimization problems.

Proof.

The claim follows from ([1] Theorem 3.25) and the specific choice of $P_{ad}$. □

Remark 1.

(a): Let $\bar{μ}$ belong to the interior of $P_{ad}$, i.e., $\bar{μ} \in int(P_{ad}) = (μ_{a}, μ_{b})$. Since $μ - \bar{μ}$ can then point in every direction of $R^{m}$, (2) is equivalent to

$\sum_{i = 1}^{k} \bar{α}_{i} \nabla \hat{J}_{i}(\bar{μ}) = \bar{α}^{⊤} D\hat{J}(\bar{μ}) = 0 \text{ in } R^{1 \times m}$

(3)

or, equivalently,

$D\hat{J}(\bar{μ})^{⊤} \bar{α} = 0 \text{ in } R^{m} = R^{m \times 1};$

see also [21].
(b): Throughout the paper, we only calculate the Pareto critical points in the interior of $P_{ad}$ and make use of (3). The idea is to choose $P_{ad}$ sufficiently large so that $P_{c} \subset int(P_{ad})$ holds.
(c): Due to Theorem 1, we have

$P \subset P_{c} = \{μ \in P_{ad} \mid \exists\, α = α(μ) \in ∆_{k} : D\hat{J}(μ)^{⊤} α = 0 \text{ in } R^{m}\} \subset int(P_{ad}),$

provided $P_{c} \subset int(P_{ad})$ holds true. ◊

2.2. Descent Direction with Exact Gradients

Next, we introduce the notion of a descent direction for the vector-valued objective function $\hat{J}$ at a point $μ \notin P_{c}$ that is not Pareto critical. From now on, we assume that $\hat{J} : P_{ad} \to R^{k}$ is continuously differentiable (on an open set containing $P_{ad}$).

Definition 3.

The vector $v \in R^{m}$ is a descent direction for $\hat{J}$ in $μ \in P_{ad}$ if

\nabla \hat{J}_{i}(μ) v \leq 0 \quad \text{for all } i \in \{1, \dots, k\}

and if there is at least one $i \in \{1, \dots, k\}$ with $\nabla \hat{J}_{i}(μ) v < 0$.

One way to compute a descent direction is to solve a constrained quadratic optimization problem in $R^{k}$, as shown in the following theorem. For a proof, we refer to [8]. A similar result was shown in [22].

Theorem 2.

For given $μ \in int(P_{ad})$, let $\bar{α} = \bar{α}(μ) \in ∆_{k}$ be a (global) solution of the convex constrained quadratic minimization problem

\min \frac{1}{2} \|D\hat{J}(μ)^{⊤} α\|_{2}^{2} \quad \text{s.t.} \quad α \in ∆_{k}.

(4)

Then, either $D\hat{J}(μ)^{⊤} \bar{α} = 0$ holds or $v = -D\hat{J}(μ)^{⊤} \bar{α} \in R^{m}$ is a descent direction for $\hat{J}$ in $μ$.
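Problem (4) is a small convex quadratic program over the simplex. As one possible way to solve it numerically (our assumption, not necessarily the solver used for the experiments), the following Python sketch applies projected gradient steps with step size $1/L$, where $L$ is the largest eigenvalue of the Gram matrix $D\hat{J}(μ) D\hat{J}(μ)^{⊤}$, combined with the standard sort-based Euclidean projection onto $∆_{k}$. The names `project_to_simplex` and `common_descent_direction` are hypothetical.

```python
import numpy as np

def project_to_simplex(a):
    """Euclidean projection of a onto the unit simplex Delta_k (sort-based algorithm)."""
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, a.size + 1)
    rho = np.nonzero(u * j > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(a - theta, 0.0)

def common_descent_direction(DJ, iters=2000):
    """Approximately solve (4) by projected gradient; DJ is the k x m Jacobian."""
    k = DJ.shape[0]
    Q = DJ @ DJ.T                             # Gram matrix of the gradients
    L = max(np.linalg.norm(Q, 2), 1e-14)      # Lipschitz constant of alpha -> Q alpha
    alpha = np.full(k, 1.0 / k)               # start at the barycenter of Delta_k
    for _ in range(iters):
        alpha = project_to_simplex(alpha - (Q @ alpha) / L)
    v = -DJ.T @ alpha                         # v = -DJ(mu)^T alpha_bar
    return alpha, v
```

For the gradients $(1, 0)$ and $(1, 1)$, the minimizer is $\bar{α} = (1, 0)$ and $v = (-1, 0)$, which decreases both objectives; for the opposing gradients $(1, 0)$ and $(-1, 0)$, the sketch returns $v = 0$, i.e., the point is Pareto critical.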

Combined with a backtracking Armijo line search, the descent direction from Theorem 2 can be used to construct the steepest descent method in Algorithm 1.
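Since (4) has a closed-form solution for $k = 2$, such a steepest descent loop can be sketched compactly. The Python code below is a minimal illustration and not a reproduction of Algorithm 1: the Armijo parameters `sigma` and `beta`, the stopping tolerance `tol`, and the requirement that the Armijo condition hold for every objective simultaneously are our assumptions, and the box constraint is ignored (iterates are assumed to stay in $int(P_{ad})$).

```python
import numpy as np

def min_norm_two(g1, g2):
    """Closed-form solution of (4) for k = 2: minimize |a g1 + (1 - a) g2| over a in [0, 1]."""
    d = g1 - g2
    dd = d @ d
    a = 0.5 if dd == 0.0 else float(np.clip(-(d @ g2) / dd, 0.0, 1.0))
    return np.array([a, 1.0 - a])

def steepest_descent(J, DJ, mu, tol=1e-8, sigma=1e-4, beta=0.5, max_iter=500):
    """Multiobjective steepest descent with Armijo backtracking (k = 2)."""
    for _ in range(max_iter):
        G = DJ(mu)                              # 2 x m Jacobian
        v = -G.T @ min_norm_two(G[0], G[1])     # descent direction of Theorem 2
        if np.linalg.norm(v) <= tol:            # (approximately) Pareto critical
            break
        t, Jmu, slope = 1.0, J(mu), G @ v       # slope_i = grad J_i(mu) v < 0
        # Backtrack until the Armijo condition holds for both objectives.
        while np.any(J(mu + t * v) > Jmu + sigma * t * slope) and t > 1e-12:
            t *= beta
        mu = mu + t * v
    return mu
```

For $\hat{J}_{1}(μ) = \frac{1}{2}\|μ\|_{2}^{2}$ and $\hat{J}_{2}(μ) = \frac{1}{2}\|μ - (1, 0)^{⊤}\|_{2}^{2}$, whose Pareto set is the segment between $(0, 0)$ and $(1, 0)$, the loop started at $(0.5, 1)$ reaches the Pareto critical point $(0.5, 0)$.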

Algorithm 1: Steepest descent method.

Remark 2.

(a): If Algorithm 1 terminates after a finite number l of iterations, then $μ^{l}$ is a Pareto critical point.
(b): Assume that Algorithm 1 does not stop after a finite number of iterations. Then, every accumulation point $\bar{μ}$ of the sequence $\{μ^{l}\}_{l \in N}$ generated by Algorithm 1 is a Pareto critical point. A proof based on ([7] Theorem 1) can be found in ([23] Theorem 5.2.5). ◊

2.3. Descent Direction with Inexact Gradients

Suppose that we have continuously differentiable approximations $\hat{J}^{ℓ} = (\hat{J}_{i}^{ℓ})_{1 \leq i \leq k}$ of the objective function $\hat{J}$ satisfying

\max_{μ \in P_{ad}} \|\nabla \hat{J}_{i}(μ) - \nabla \hat{J}_{i}^{ℓ}(μ)\|_{2} \leq ε_{i} \quad \text{for } i \in \{1, \dots, k\},

(5)

for given tolerances $ε_{i} \geq 0$. By $P \subset P_{ad}$, we denote the Pareto set for $\hat{J}$, and by $P^{ℓ} \subset P_{ad}$ the Pareto set for $\hat{J}^{ℓ}$. Analogously, $P_{c}$ denotes the Pareto-critical set for $\hat{J}$, and $P_{c}^{ℓ}$ the Pareto-critical set for $\hat{J}^{ℓ}$. Note that, in general, we have neither $P^{ℓ} \subset P$ nor $P \subset P^{ℓ}$.

In this section, our goal is to compute an approximation of $P$ based on the approximation $\hat{J}^{ℓ}$ of the objective function and the error bounds $ε_{i}$. We begin by investigating the relationship between the KKT conditions of the original objective function and those of its approximation.

Lemma 1.

Let (5) be satisfied and $\bar{μ} \in int(P_{ad})$ be Pareto critical for $\hat{J}$ with the KKT-condition vector $\bar{α} \in ∆_{k}$. Then, it holds that

\|D\hat{J}^{ℓ}(\bar{μ})^{⊤} \bar{α}\|_{2} \leq \sum_{i = 1}^{k} \bar{α}_{i} ε_{i} = ⟨\bar{α}, ε⟩_{2} \leq \|ε\|_{\infty}

(6)

with $ε = (ε_{i})_{1 \leq i \leq k}$, where we set $⟨\bar{α}, ε⟩_{2} = \bar{α}^{⊤} ε$.

Proof.

From $\bar{μ} \in int(P_{ad})$ and Remark 1(a), we infer that $D\hat{J}(\bar{μ})^{⊤} \bar{α} = 0$ holds. Therefore,

\begin{aligned} \|D\hat{J}^{ℓ}(\bar{μ})^{⊤} \bar{α}\|_{2} &= \|D\hat{J}^{ℓ}(\bar{μ})^{⊤} \bar{α} - D\hat{J}(\bar{μ})^{⊤} \bar{α}\|_{2} = \Big\|\sum_{j = 1}^{k} \big(\nabla \hat{J}_{j}^{ℓ}(\bar{μ}) - \nabla \hat{J}_{j}(\bar{μ})\big)^{⊤} \bar{α}_{j}\Big\|_{2} \\ &\leq \sum_{j = 1}^{k} \|\nabla \hat{J}_{j}^{ℓ}(\bar{μ}) - \nabla \hat{J}_{j}(\bar{μ})\|_{2} \bar{α}_{j} \leq \sum_{j = 1}^{k} ε_{j} \bar{α}_{j} = ⟨\bar{α}, ε⟩_{2} \leq \|ε\|_{\infty}, \end{aligned}

where the second-to-last estimate uses (5) and the last one uses $\bar{α} \in ∆_{k}$. This gives the desired results. □
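The chain of inequalities (6) is easy to check numerically. The following Python sketch uses synthetic random data (dimensions, seed, and tolerances are arbitrary choices of ours): it builds an exact Jacobian for which $\bar{α}$ satisfies the KKT condition (3), perturbs each gradient by a vector of norm exactly $ε_{i}$, so that the error bound (5) holds at $\bar{μ}$, and compares the three quantities in (6).

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 3, 5
eps = np.array([0.1, 0.02, 0.05])                  # tolerances eps_i as in (5)

# Exact Jacobian DJ with DJ^T alpha_bar = 0, i.e. mu_bar is Pareto critical.
alpha_bar = rng.random(k)
alpha_bar /= alpha_bar.sum()
DJ = rng.standard_normal((k, m))
DJ[-1] = -(alpha_bar[:-1] @ DJ[:-1]) / alpha_bar[-1]

# Inexact Jacobian: perturb gradient i by a random vector of norm eps_i.
E = rng.standard_normal((k, m))
E *= (eps / np.linalg.norm(E, axis=1))[:, None]
DJ_l = DJ + E

lhs = np.linalg.norm(DJ_l.T @ alpha_bar)
print(lhs, alpha_bar @ eps, eps.max())             # lhs <= <alpha_bar, eps> <= ||eps||_inf
```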

Based on estimate (6), we define two approximation sets for the Pareto-critical set $P_{c}$ of $\hat{J}$.

Definition 4.

Let us introduce the two sets

P_{1}^{ℓ} = \{μ \in int(P_{ad}) \mid \min_{α \in ∆_{k}} \|D\hat{J}^{ℓ}(μ)^{⊤} α\|_{2}^{2} \leq \|ε\|_{\infty}^{2}\} \subset P_{ad}

and

P_{2}^{ℓ} = \{μ \in int(P_{ad}) \mid \min_{α \in ∆_{k}} (\|D\hat{J}^{ℓ}(μ)^{⊤} α\|_{2}^{2} - ⟨α, ε⟩_{2}^{2}) \leq 0\} \subset P_{ad}.

Lemma 2.

It holds that

P_{c} \subset P_{2}^{ℓ}, \quad P_{c}^{ℓ} \subset P_{2}^{ℓ}, \quad \text{and} \quad P_{2}^{ℓ} \subset P_{1}^{ℓ}.

Proof.

Let $\bar{μ} \in P_{c}$ be a Pareto-critical point of $\hat{J}$. Then there exists $\bar{α} \in ∆_{k}$ with $D\hat{J}(\bar{μ})^{⊤} \bar{α} = 0$. From Lemma 1, it follows that $\|D\hat{J}^{ℓ}(\bar{μ})^{⊤} \bar{α}\|_{2}^{2} - ⟨\bar{α}, ε⟩_{2}^{2} \leq 0$, and hence $P_{c} \subset P_{2}^{ℓ}$. Next, we assume that $\bar{μ} \in P_{c}^{ℓ}$ is a Pareto-critical point of $\hat{J}^{ℓ}$. Then there exists $\bar{α} \in ∆_{k}$ with

D\hat{J}^{ℓ}(\bar{μ})^{⊤} \bar{α} = 0.

This implies

\min_{α \in ∆_{k}} (\|D\hat{J}^{ℓ}(\bar{μ})^{⊤} α\|_{2}^{2} - ⟨α, ε⟩_{2}^{2}) \leq \|D\hat{J}^{ℓ}(\bar{μ})^{⊤} \bar{α}\|_{2}^{2} - ⟨\bar{α}, ε⟩_{2}^{2} = -⟨\bar{α}, ε⟩_{2}^{2} \leq 0.

Therefore, we have $\bar{μ} \in P_{2}^{ℓ}$. Finally, let $\bar{μ} \in P_{2}^{ℓ}$. Then there exists $\bar{α} \in ∆_{k}$ with

\|D\hat{J}^{ℓ}(\bar{μ})^{⊤} \bar{α}\|_{2}^{2} - ⟨\bar{α}, ε⟩_{2}^{2} \leq 0.

Thus, we get

\|D\hat{J}^{ℓ}(\bar{μ})^{⊤} \bar{α}\|_{2}^{2} \leq ⟨\bar{α}, ε⟩_{2}^{2} \leq \|ε\|_{\infty}^{2},

which implies $P_{2}^{ℓ} \subset P_{1}^{ℓ}$. □

Our goal is to compute the set $P_{2}^{ℓ}$ via a descent method like Algorithm 1. To this end, the following theorem presents a modified version of the quadratic problem (4) and of the resulting descent direction, which additionally takes the error bounds $ε_{i}$ into account.

Theorem 3.

Let $ε = (ε_{j})_{1 \leq j \leq k}$ with $ε \geq 0$ and $μ \in int(P_{ad})$ be given. Assume that $α_{ε}$ is a minimizer of the quadratic problem

\min_{α \in ∆_{k}} (\|D\hat{J}^{ℓ}(μ)^{⊤} α\|_{2}^{2} - ⟨α, ε⟩_{2}^{2}).

(7)

Then, either $μ \in P_{2}^{ℓ}$ holds or $v_{ε} = -D\hat{J}^{ℓ}(μ)^{⊤} α_{ε} \in R^{m}$ is a descent direction for $\hat{J}^{ℓ}$ in $μ$.

Proof.

The Lagrange function for (7) is given by

L : R^{k} \times R \times R^{k} \to R, \quad (α, λ, ϱ) \mapsto \|D\hat{J}^{ℓ}(μ)^{⊤} α\|_{2}^{2} - ⟨α, ε⟩_{2}^{2} + λ\Big(1 - \sum_{j = 1}^{k} α_{j}\Big) - \sum_{j = 1}^{k} ϱ_{j} α_{j}.

As $α_{ε}$ is a minimizer of (7) and the constraints are affine, there exist Lagrange multipliers $λ \in R$ and $ϱ \in R_{\geq 0}^{k}$ with

\begin{aligned} 2 D\hat{J}^{ℓ}(μ) D\hat{J}^{ℓ}(μ)^{⊤} α_{ε} - 2 ε ⟨ε, α_{ε}⟩_{2} - λ (1, \dots, 1)^{⊤} - ϱ &= 0, \\ ϱ_{i} \geq 0 \text{ and } (α_{ε})_{i} ϱ_{i} &= 0 \quad \text{for } i = 1, \dots, k. \end{aligned}

(8)

If we multiply (8) by $α_{ε}^{⊤}$ from the left, we get

2 α_{ε}^{⊤} D\hat{J}^{ℓ}(μ) D\hat{J}^{ℓ}(μ)^{⊤} α_{ε} - 2 ⟨ε, α_{ε}⟩_{2}^{2} - λ \sum_{i = 1}^{k} (α_{ε})_{i} - \sum_{i = 1}^{k} (α_{ε})_{i} ϱ_{i} = 0.

Since $\sum_{i = 1}^{k} (α_{ε})_{i} = 1$ and $(α_{ε})_{i} ϱ_{i} = 0$, this implies

λ = 2 (\|D\hat{J}^{ℓ}(μ)^{⊤} α_{ε}\|_{2}^{2} - ⟨α_{ε}, ε⟩_{2}^{2}).

First case: $λ \leq 0$. Since $α_{ε}$ is a minimizer of (7), the minimal value of (7) equals $λ/2 \leq 0$, so $μ \in P_{2}^{ℓ}$ holds and we are done.

Second case: $λ > 0$. Then $μ \notin P_{2}^{ℓ}$ holds. In this case, we show that $v_{ε} = -D\hat{J}^{ℓ}(μ)^{⊤} α_{ε}$ is a descent direction in $μ$ for every objective function $\hat{J}_{j}^{ℓ}$ with $j = 1, \dots, k$. Define

K(μ) = \{D\hat{J}^{ℓ}(μ)^{⊤} α \mid α \in R^{k}, α_{i} \geq 0 \text{ and } \sum_{i = 1}^{k} α_{i} = 1\}.

Choosing $α$ as the $i$-th unit vector shows that $\nabla \hat{J}_{i}^{ℓ}(μ)^{⊤} \in K(μ)$ for every $i$. Hence, if we can show that $w^{⊤} v_{ε} < 0$ holds for every $w \in K(μ)$, we know that $v_{ε}$ is a descent direction for every objective function $\hat{J}_{i}^{ℓ}$ in $μ$.

Choose $w \in K(μ)$. Then there exists an $α_{w} \in ∆_{k}$ with $w = D\hat{J}^{ℓ}(μ)^{⊤} α_{w}$, and using (8), we obtain

\begin{aligned} w^{⊤} v_{ε} &= (D\hat{J}^{ℓ}(μ)^{⊤} α_{w})^{⊤} v_{ε} = -α_{w}^{⊤} D\hat{J}^{ℓ}(μ) D\hat{J}^{ℓ}(μ)^{⊤} α_{ε} \\ &= -α_{w}^{⊤}\Big(ε ⟨ε, α_{ε}⟩_{2} + \frac{1}{2} λ (1, \dots, 1)^{⊤} + \frac{1}{2} ϱ\Big) = -\Big(⟨α_{w}, ε⟩_{2} ⟨α_{ε}, ε⟩_{2} + \frac{1}{2} λ + \frac{1}{2} ⟨α_{w}, ϱ⟩_{2}\Big) \\ &\leq -\frac{1}{2}(λ + ⟨α_{w}, ϱ⟩_{2}) \leq -\frac{λ}{2} < 0, \end{aligned}

where we used $α_{w} \geq 0$, $α_{ε} \geq 0$, $ε \geq 0$, and $ϱ \geq 0$. Therefore, $v_{ε}$ is a descent direction in $μ$ for every objective function $\hat{J}_{j}^{ℓ}$, $j = 1, \dots, k$. □
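Problem (7) differs from (4) by the term $-⟨α, ε⟩_{2}^{2}$, so its objective $α^{⊤}(D\hat{J}^{ℓ}(μ) D\hat{J}^{ℓ}(μ)^{⊤} - εε^{⊤})α$ may be nonconvex. The Python sketch below (a hypothetical helper using projected gradient steps onto the simplex; not necessarily the solver used in the paper) therefore only targets a KKT point of (7). This suffices for the descent argument, because the proof of Theorem 3 uses only the KKT system (8): the returned value $λ = 2(\|D\hat{J}^{ℓ}(μ)^{⊤} α_{ε}\|_{2}^{2} - ⟨α_{ε}, ε⟩_{2}^{2})$ certifies $μ \in P_{2}^{ℓ}$ if $λ \leq 0$, and otherwise $v_{ε}$ is a descent direction for all $\hat{J}_{i}^{ℓ}$. If the iteration stops at a KKT point that is not a global minimizer, $λ > 0$ does not exclude $μ \in P_{2}^{ℓ}$.

```python
import numpy as np

def project_to_simplex(a):
    """Euclidean projection onto the unit simplex Delta_k (sort-based algorithm)."""
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, a.size + 1)
    rho = np.nonzero(u * j > css - 1.0)[0][-1]
    return np.maximum(a - (css[rho] - 1.0) / (rho + 1.0), 0.0)

def modified_direction(DJ_l, eps, iters=5000):
    """Projected-gradient sketch for (7); returns (alpha_eps, v_eps, lam).

    The objective alpha^T H alpha with H = DJ_l DJ_l^T - eps eps^T may be
    nonconvex, so only a KKT point of (7) is targeted.
    """
    k = DJ_l.shape[0]
    H = DJ_l @ DJ_l.T - np.outer(eps, eps)       # (7) equals alpha^T H alpha
    L = max(2.0 * np.linalg.norm(H, 2), 1e-14)   # Lipschitz constant of the gradient 2 H alpha
    alpha = np.full(k, 1.0 / k)
    for _ in range(iters):
        alpha = project_to_simplex(alpha - 2.0 * (H @ alpha) / L)
    lam = 2.0 * alpha @ H @ alpha                # lambda from the proof of Theorem 3
    v = -DJ_l.T @ alpha                          # modified descent direction v_eps
    return alpha, v, lam
```

With $ε = 0$, (7) reduces to (4) up to the factor $\frac{1}{2}$, and the sketch recovers the direction of Theorem 2. With $ε = (0.2, 0.2)$ and the nearly opposing gradients $(1, 0)$ and $(-1, 0.1)$, it returns $λ < 0$, i.e., the point is accepted as an element of $P_{2}^{ℓ}$.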

The descent direction $v_{ε}$ from the previous theorem will be referred to as the modified descent direction. Based on this direction, we can now construct a descent method for the computation of $P_{2}^{ℓ}$, which is shown in Algorithm 2.

Algorithm 2: Descent method with inexact gradients.

Remark 3.

(a): If Algorithm 2 terminates after l iteration steps, then $μ^{l}$ is contained in $P_{2}^{ℓ}$ .
(b): Assume that Algorithm 2 does not terminate after a finite number of iteration steps. Then, every accumulation point $\bar{μ}$ of the sequence $\{μ^{l}\}_{l \in N}$ generated by Algorithm 2 is in the set $P_{2}^{ℓ}$. A proof based on ([7] Theorem 1) can be found in ([23] Theorem 5.3.5).
(c): Note that the tolerance ε is constant for all l throughout Algorithm 2. In Section 2.5, we will adapt ε in each iteration. ◊

2.4. Subdivision Algorithm

As mentioned in the introduction, there exist set-based solution methods for MOPs which globally approximate the Pareto set via sets (instead of a finite number of points). Here, we will consider the subdivision algorithm [12,13,15], which computes an approximation of the Pareto set as a covering of hypercubes (or boxes). The idea is to start with a large box containing the Pareto set which is then iteratively subdivided into smaller boxes, while eliminating boxes that do not contain part of the Pareto set.

There are essentially two versions of the subdivision scheme. One is gradient-free and is thus particularly useful when the evaluation of gradients is computationally expensive; we refer to [12], where this variant is utilized to numerically realize a reduced-order approach for a PDE-constrained multiobjective optimization problem. The other one is directly based on a dynamical systems approach and utilizes gradient information in a similar way to memetic algorithms; see [8]. Here, we generalize the latter to the case of inexact gradients.

For a step size $t_{l} > 0$, let us formulate a descent step of the optimization procedure by

μ^{l + 1} = a(μ^{l}) = μ^{l} + t_{l} v^{l} \quad \text{or} \quad μ^{l + 1} = a_{ε}(μ^{l}) = μ^{l} + t_{l} v_{ε}^{l},

where $v^{l}$ and $v_{ε}^{l}$ are the descent directions given by Theorems 2 and 3, respectively, with the choice $μ = μ^{l}$. Depending on the descent step that we use, we want to compute either the Pareto-critical set $P_{c}$ or one of its supersets $P_{2}^{ℓ}$ and $P_{1}^{ℓ}$. As these sets are the sets of fixed points of their respective descent steps, we want to find the subset $A_{P} \subset int(P_{ad})$ satisfying $a(A_{P}) = A_{P}$ or $a_{ε}(A_{P}) = A_{P}$.

To generate the set $A_{P}$, we use a subdivision method. This method produces an outer approximation of $A_{P}$ in the form of a nested sequence of collections $B_{0}, B_{1}, \dots \subset \mathcal{P}(P_{ad})$, where $\mathcal{P}(P_{ad})$ denotes the power set of $P_{ad}$. Each $B_{l}$ consists of finitely many subsets $B$ covering $A_{P}$, and $B_{l}$ is nested in $B_{l - 1}$ in the sense that

⋃_{B \in B_{l}} B \subset ⋃_{B \in B_{l - 1}} B

holds for all $l \in N$. For each collection $B_{l}$, we define its diameter through $diam(B_{l}) = \max_{B \in B_{l}} diam(B)$. Algorithm 3 shows the subdivision method, both in its classical form (based on Theorem 2) and with our modified descent direction (based on Theorem 3).

Algorithm 3: Subdivision algorithm.

Remark 4.

In order to realize the subdivision algorithm numerically, we proceed similarly to ([13] Remark 2.4). Instead of storing the centers and radii of the boxes explicitly, the boxes are stored in a binary tree during the subdivision step, which noticeably reduces the memory requirement. The selection step is implemented using a certain number of sample points in each box. These sample points are chosen either on an a priori defined grid or randomly within the boxes. Afterwards, $a$ or $a_{ε}$ is evaluated at these points. For more details, we refer the reader to ([24] Section 5). ◊
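The subdivision and selection steps can be sketched as follows. This Python code is a simplified illustration under our own assumptions: all boxes are bisected cyclically along one coordinate per iteration (so that they share one radius vector), boxes are stored as a flat array of centers instead of a binary tree, sample points are drawn uniformly at random, and a box is kept if it contains the image of at least one sample point under the supplied map `step`, which stands in for $a$ or $a_{ε}$.

```python
import numpy as np

def subdivision(step, lower, upper, n_iter=12, n_samples=10, seed=0):
    """Sketch of the subdivision scheme on the box [lower, upper].

    step(mu) performs one descent step a(mu) or a_eps(mu). Returns the centers
    of the remaining boxes and their common radius vector.
    """
    rng = np.random.default_rng(seed)
    m = lower.size
    centers = np.array([(lower + upper) / 2.0])
    radius = (upper - lower) / 2.0
    for l in range(n_iter):
        # Subdivision step: bisect every box along coordinate l mod m.
        s = l % m
        radius[s] /= 2.0
        offset = np.zeros(m)
        offset[s] = radius[s]
        centers = np.concatenate([centers - offset, centers + offset])
        # Selection step: map random sample points of every box and keep
        # only the boxes that contain at least one image point.
        u = rng.uniform(-1.0, 1.0, size=(len(centers), n_samples, m))
        samples = (centers[:, None, :] + radius * u).reshape(-1, m)
        keep = np.zeros(len(centers), dtype=bool)
        for z in samples:
            keep |= np.all(np.abs(centers - step(z)) <= radius, axis=1)
        centers = centers[keep]
    return centers, radius
```

As a toy check, take $\hat{J}_{1}(μ) = \frac{1}{2}\|μ\|_{2}^{2}$ and $\hat{J}_{2}(μ) = \frac{1}{2}\|μ - (1, 0)^{⊤}\|_{2}^{2}$. Here the direction of Theorem 2 is $v = p(μ) - μ$, where $p(μ)$ is the projection of $μ$ onto the segment between $(0, 0)$ and $(1, 0)$; with step size $t_{l} = \frac{1}{2}$ and the initial box $[-1, 2] \times [-1, 1]$, the returned boxes cover this segment and concentrate around it.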

2.5. Modified Subdivision Algorithm for Inexact Gradients

In Algorithm 2, we use the same error bounds $ε = (ε_{1}, \dots, ε_{k})$ with $ε_{i} \geq 0$, $i = 1, \dots, k$, in each iteration step $l$. Note that the larger $ε$ is, the greater the difference between $P_{c}$ and $P_{2}^{ℓ}$ or $P_{1}^{ℓ}$. In the subdivision algorithm, we produce an outer approximation of the set $A_{P}$ by a nested sequence of collections $\{B_{l}\}_{l \in N}$ and denote the union of the boxes in $B_{l}$ by

\tilde{B}_{l} = ⋃_{B \in B_{l}} B \subseteq P_{ad}.

As $\tilde{B}_{l} \subset P_{ad}$ holds, we have

\max\{\|\nabla \hat{J}_{i}(μ) - \nabla \hat{J}_{i}^{ℓ}(μ)\|_{2} \mid μ \in \tilde{B}_{l}\} \leq \max\{\|\nabla \hat{J}_{i}(μ) - \nabla \hat{J}_{i}^{ℓ}(μ)\|_{2} \mid μ \in P_{ad}\} \quad \text{for } 1 \leq i \leq k.

Now, we modify Algorithm 3 by using the descent directions introduced in Theorem 3 and updating $ε$ after every iteration step $l$ to generate a better approximation of the set $A_{P}$. For updating $ε$, we use the formula

ε_{i}^{l} = \sup\{\|\nabla \hat{J}_{i}(μ) - \nabla \hat{J}_{i}^{ℓ}(μ)\|_{2} \mid μ \in \tilde{B}_{l}\} \quad \text{for } 1 \leq i \leq k

(9)

and set $ε^{l} = (ε_{i}^{l})_{1 \leq i \leq k}$. Due to the nested choice of the box coverings, we have $ε_{i}^{l + 1} \leq ε_{i}^{l}$ for $i = 1, \dots, k$ and $l \in N$. In iteration step $l$, we generate the descent direction by computing

α_{ε}^{l} \in \arg\min\{\|D\hat{J}^{ℓ}(μ^{l})^{⊤} α\|_{2}^{2} - ⟨α, ε^{l}⟩_{2}^{2} \mid α \in ∆_{k}\}.

(10)

Then, we set

v_{ε}^{l} = -D\hat{J}^{ℓ}(μ^{l})^{⊤} α_{ε}^{l} \quad \text{and} \quad μ^{l + 1} = a_{ε}^{l}(μ^{l}) = μ^{l} + t_{l} v_{ε}^{l}.

Typically, the set $P_{2}^{ℓ}$ is far smaller than the admissible set $P_{ad}$. Therefore, we expect the error bounds in (9) to become significantly smaller than the ones in (5), and hence better results with the modified map $a_{ε}^{l}$ than with $a_{ε}$.
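In practice, the supremum in (9) has to be approximated. One simple possibility, sketched below in Python under our own assumptions, is to take the maximum of the gradient errors over random sample points in the current boxes. Here `grad_full` and `grad_rom` are hypothetical callables returning the $k \times m$ Jacobians of $\hat{J}$ and $\hat{J}^{ℓ}$ (in the setting of Section 3, the FE model would play the role of $\hat{J}$). A sampled maximum can underestimate the true supremum, so this is a heuristic rather than a guaranteed bound.

```python
import numpy as np

def update_tolerances(grad_full, grad_rom, centers, radius, n_samples=20, seed=0):
    """Sampled estimate of eps^l in (9) over the union of the boxes.

    Boxes have the given centers and a common radius vector. grad_full(mu) and
    grad_rom(mu) return k x m Jacobians whose rows are the gradients.
    """
    rng = np.random.default_rng(seed)
    eps = None
    for c in centers:
        for z in c + radius * rng.uniform(-1.0, 1.0, size=(n_samples, radius.size)):
            err = np.linalg.norm(grad_full(z) - grad_rom(z), axis=1)   # one entry per objective
            eps = err if eps is None else np.maximum(eps, err)
    return eps
```

Because sampled maxima need not decrease monotonically in $l$, one may additionally take the componentwise minimum with $ε^{l - 1}$, in line with $ε_{i}^{l + 1} \leq ε_{i}^{l}$.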

3. Multiobjective Optimization of a Semi-Linear Elliptic PDE

In this section, we introduce a multiobjective parameter optimization problem governed by a semi-linear elliptic PDE. Further, we show how reduced-order modeling can be applied efficiently.

3.1. Problem Formulation

Let $Ω \subset R^{d}$, $d \in \{1, 2, 3\}$, be a bounded domain with Lipschitz-continuous boundary $Γ = \partial Ω$. Then, we consider the problem

\min_{(y, μ)} J(y, μ) = \begin{pmatrix} J_{1}(y, μ) \\ \vdots \\ J_{k - 1}(y, μ) \\ J_{k}(y, μ) \end{pmatrix} = \frac{1}{2} \begin{pmatrix} \int_{Ω} |y - y_{1}^{d}|^{2} \, dx \\ \vdots \\ \int_{Ω} |y - y_{k - 1}^{d}|^{2} \, dx \\ \sum_{j = 1}^{m} |μ_{j} - μ_{j}^{d}|^{2} \end{pmatrix}

(11a)

subject to the elliptic boundary value problem

-c ∆y + b y + d y^{3} = f + \sum_{i = 1}^{m} μ_{i} ξ_{i} \quad \text{in } Ω, \qquad c \frac{\partial y}{\partial n} + y = g \quad \text{on } Γ,

(11b)

where $y \in V = H^{1}(Ω)$ is the state variable and $μ \in P_{ad} = [μ_{a}, μ_{b}]$ the parameter. We suppose that $g \in L^{r}(Γ)$ with $r > d - 1$, $ξ_{1}, \dots, ξ_{m}, f \in H = L^{2}(Ω)$, $μ^{d} = (μ_{j}^{d}) \in R^{m}$, and $y_{1}^{d}, \dots, y_{k - 1}^{d} \in H$. Moreover, $b$, $c$, and $d$ are non-negative constants with $c > 0$.

As $Ω$ is a bounded connected open set with smooth boundary, it is known that $V$ is a Hilbert space endowed with the inner product

⟨φ, ψ⟩_{V} = \int_{Ω} \nabla φ \cdot \nabla ψ \, dx + \int_{Γ} φ ψ \, ds \quad \text{for } φ, ψ \in V

and the induced norm $\|φ\|_{V} = ⟨φ, φ⟩_{V}^{1/2}$ for $φ \in V$; see, for instance, ([25], p. 133).

3.2. The Parameter Dependent Semi-Linear Elliptic PDE

In this subsection, we study the state equation (11b). First, we define the nonlinear operator $A : V \to V'$ by

⟨A(y), φ⟩_{V', V} = \int_{Ω} c \nabla y \cdot \nabla φ + (b y + d y^{3}) φ \, dx + \int_{Γ} y φ \, ds \quad \text{for } y, φ \in V.

Recall that $d \in \{1, 2, 3\}$ implies $V \hookrightarrow L^{6}(Ω)$, cf. ([26] Section 7). Therefore, the operator $A$ is well defined. Moreover, for $μ \in P_{ad}$, the functional $b_{μ} \in V'$ is given by

⟨b_{μ}, φ⟩_{V', V} = \int_{Ω} \Big(f + \sum_{i = 1}^{m} μ_{i} ξ_{i}\Big) φ \, dx + \int_{Γ} g φ \, ds \quad \text{for } φ \in V.

Now, we define a weak solution to the state Equation (11b).

Definition 5.

A weak solution of (11b) is a function $y \in V$ satisfying

⟨A(y), φ⟩_{V', V} = ⟨b_{μ}, φ⟩_{V', V} \quad \text{for all } φ \in V.

(12)

The following result is proved, e.g., in ([26] Section 4.2.3).

Proposition 1.

For a fixed parameter $μ \in R^{m}$, there exists a unique solution $y \in V$ to (11b). This solution is even continuous on $\bar{Ω}$, and for a constant $c_{\infty}$ it holds that

\|y\|_{V} + \|y\|_{C(\bar{Ω})} \leq c_{\infty} \Big(\|g\|_{L^{r}(Γ)} + \Big\|f + \sum_{i = 1}^{m} μ_{i} ξ_{i}\Big\|_{H}\Big).

Remark 5.

We define the state space $Y = V \cap C(\bar{Ω})$, which is a Banach space endowed with the natural norm

\|φ\|_{Y} = \|φ\|_{V} + \|φ\|_{C(\bar{Ω})} \quad \text{for } φ \in Y.

Motivated by Proposition 1, we define the parameter-to-state mapping $S : P_{ad} \to Y$ as follows: for a given parameter $μ \in P_{ad}$, the function $y = S(μ) \in Y$ is the solution to (11b). It follows by standard arguments that $S$ is continuously Fréchet-differentiable; see ([23] Sections 2 and 4). ◊

3.3. Reduced Formulation and Adjoint Approach

Utilizing the parameter-to-state mapping $S$, we define the reduced cost functional

\hat{J}(μ) = J(S(μ), μ) = \begin{pmatrix} \hat{J}_{1}(μ) \\ \vdots \\ \hat{J}_{k}(μ) \end{pmatrix} \quad \text{for } μ \in P_{ad}

with

\hat{J}_{i}(μ) = \frac{1}{2} \int_{Ω} |(S(μ))(x) - y_{i}^{d}(x)|^{2} \, dx \quad \text{for } 1 \leq i \leq k - 1

and $\hat{J}_{k}(μ) = \frac{1}{2} \sum_{j = 1}^{m} |μ_{j} - μ_{j}^{d}|^{2}$. Now, the reduced problem is given as

\min \hat{J}(μ) \quad \text{s.t.} \quad μ \in P_{ad}.

(13)

If $\bar{μ} \in P_{ad}$ is a locally optimal solution to (13), then the pair $(S(\bar{μ}), \bar{μ})$ is a locally optimal solution to (11). Conversely, if $(S(\bar{μ}), \bar{μ}) \in Y \times P_{ad}$ solves (11) locally, the parameter $\bar{μ}$ is a locally optimal solution to (13).

To apply the subdivision algorithm, the reduced objective function $\hat{J}$ has to be Fréchet-differentiable; this follows immediately from the Fréchet-differentiability of the parameter-to-state operator $S$, cf. Remark 5. The gradient of $\hat{J}$ can be expressed by introducing adjoint variables. For that purpose, we define the operators $B_{i} : V \times V \to V'$, $1 \leq i \leq k - 1$, as

⟨B_{i}(p, y), φ⟩_{V', V} = \int_{Ω} c \nabla p \cdot \nabla φ + (b + 3 d y^{2}) p φ \, dx + \int_{Γ} p φ \, ds - \int_{Ω} (y - y_{i}^{d}) φ \, dx

for $p, y, φ \in V$. Next, we define the adjoint variables.

Definition 6.

Let a parameter $μ \in P_{ad}$ be given and $y(μ) = S(μ) \in Y$. For every $i \in \{1, \dots, k - 1\}$, we call the solution $p_{i}(μ) \in V$ to

⟨B_{i}(p_{i}(μ), y(μ)), φ⟩_{V', V} = 0 \quad \text{for all } φ \in V

(14)

the adjoint variable associated with the objective $\hat{J}_{i}$.

Remark 6.

(a): Notice that (14) is the weak formulation of the elliptic problem

$\begin{matrix} - c ∆ p_{i} (μ) + (b + 3 d y {(μ)}^{2}) p_{i} (μ) & = y (μ) - y_{i}^{d} & in Ω, \\ c \frac{\partial p_{i}}{\partial n} (μ) + p_{i} (μ) & = 0 & on Γ \end{matrix}$

(15)

for $i = 1, \dots, k - 1$ .
(b): Applying the Lax–Milgram lemma (see, e.g., ([26] Section 2)) one can show that (14) has a unique solution $p_{i} (μ) \in V$ for all $μ \in P_{ad}$ and $i = 1, \dots, k - 1$ . ◊

Now, we can express the gradient (a row vector, as in Section 2.1) of the reduced cost functional as follows:

\nabla \hat{J}_{i}(μ) = \Big(\int_{Ω} ξ_{1} p_{i}(μ) \, dx, \dots, \int_{Ω} ξ_{m} p_{i}(μ) \, dx\Big) \quad \text{for } 1 \leq i \leq k - 1, \qquad \nabla \hat{J}_{k}(μ) = (μ - μ^{d})^{⊤},

where $y(μ) \in Y$ and $p_{i}(μ) \in V$, $i \in \{1, \dots, k - 1\}$, solve (12) and (14), respectively.

3.4. Finite Element (FE) Galerkin Discretization

Let us briefly introduce a standard finite element (FE) method based on a triangulation of the spatial domain $Ω$. Here, we utilize linearly independent, piecewise linear FE ansatz functions $φ_{1}, \dots, φ_{N} \in V$. We define the finite-dimensional subspace $V^{N} = \mathrm{span}\{φ_{1}, \dots, φ_{N}\} \subset V$, supplied with the same topology as $V$.

Next, we replace (12) by an FE Galerkin scheme: for each $μ \in P_{ad}$, the FE solution $y^{N}(μ) \in V^{N}$ solves

⟨A(y^{N}(μ)), φ_{i}⟩_{V', V} = ⟨b_{μ}, φ_{i}⟩_{V', V} \quad \text{for } i = 1, \dots, N.

(16)

It follows by the same arguments as for (12) that the FE problem (16) has a unique solution $y^{N} = y^{N}(μ)$ for every $μ \in P_{ad}$. Therefore, the parameter-to-state FE mapping $S^{N} : P_{ad} \to V^{N}$, $μ \mapsto S^{N}(μ) = y^{N}(μ)$, is well defined. For $y^{N} \in V^{N}$, there exist coefficients $y_{i}^{N}$, $i = 1, \dots, N$, satisfying

y^{N}(x) = \sum_{i = 1}^{N} y_{i}^{N} φ_{i}(x) \quad \text{for } x \in Ω.

(17)

Inserting (17) into (16), we can express (16) as a nonlinear algebraic system. For that purpose, we introduce the $N \times N$ matrices

K = \Big(\int_{Ω} \nabla φ_{j} \cdot \nabla φ_{i} \, dx\Big)_{i,j}, \quad M = \Big(\int_{Ω} φ_{j} φ_{i} \, dx\Big)_{i,j}, \quad Q = \Big(\int_{Γ} φ_{j} φ_{i} \, ds\Big)_{i,j},

the $N$-vectors

y^{N} = (y_{i}^{N})_{i}, \quad F_{μ} = \Big(\int_{Ω} \Big(f + \sum_{j = 1}^{m} μ_{j} ξ_{j}\Big) φ_{i} \, dx\Big)_{i}, \quad G = \Big(\int_{Γ} g φ_{i} \, ds\Big)_{i},

and the nonlinearity $H : R^{N} \to R^{N}$ given as

H(v) = \Big(\int_{Ω} \Big(\sum_{j = 1}^{N} v_{j} φ_{j}\Big)^{3} φ_{i} \, dx\Big)_{i} \quad \text{for } v = (v_{1}, \dots, v_{N}) \in R^{N}.

With this notation, (16) becomes the nonlinear system

(cK + bM + Q) y^{N} + d H(y^{N}) = F_{μ} + G \in R^{N}.

(18)

Remark 7.

The difficulty of (18) lies in the fact that $H(y^{N})$ cannot be assembled efficiently by a matrix-vector multiplication. Therefore, we use mass lumping to compute $H(y^{N})$ approximately. For an introduction to mass lumping, see, e.g., ([27] Section 15) or ([28] Section 17.2). With that, (18) becomes the root-finding problem

(c K + b M + Q) y^{N} + d \tilde{M} {(y^{N})}^{3} = F_{μ} + G \in R^{N}

(19)

with ${(y^{N})}^{3} = {({(y_{i}^{N})}^{3})}_{1 \leq i \leq N}$ and the lumped mass matrix $\tilde{M}$ defined by

{\tilde{M}}_{k j} := δ_{k j} \sum_{l = 1}^{N} M_{k l} .

Further details can be found in [23,29]. We also refer to [30], where mass lumping is utilized in optimal control. ◊
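The following Python/NumPy sketch illustrates Remark 7 on a deliberately simple model: piecewise linear elements on the interval (0, 1) instead of the paper's 2D triangulation (an assumption made only to keep the example short). It assembles K, M and Q, forms the lumped mass matrix from the row sums of M, and solves the lumped system (19) by Newton's method. All function names are hypothetical and not taken from the authors' Matlab code. The data are manufactured so that y ≡ 1 is the exact discrete solution, because the consistent and the lumped mass matrix act identically on constant vectors.

```python
import numpy as np

def assemble_p1_1d(n_el):
    """Stiffness K, mass M and Robin matrix Q for P1 elements on (0, 1)."""
    n = n_el + 1                                   # number of nodes (N)
    h = 1.0 / n_el
    K = np.zeros((n, n))
    M = np.zeros((n, n))
    k_loc = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    m_loc = np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6.0
    for e in range(n_el):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += k_loc
        M[np.ix_(idx, idx)] += m_loc
    Q = np.zeros((n, n))
    Q[0, 0] = Q[-1, -1] = 1.0                      # boundary "integrals" are point values in 1D
    return K, M, Q

def lumped(M):
    """Lumped mass matrix: M_tilde_kj = delta_kj * sum_l M_kl."""
    return np.diag(M.sum(axis=1))

def solve_lumped(K, M, Q, rhs, b=1.0, c=1.0, d=1.0, tol=1e-12, max_iter=50):
    """Newton's method for (c K + b M + Q) y + d M_tilde y**3 = rhs, cf. (19)."""
    A = c * K + b * M + Q
    Mt = lumped(M)
    y = np.zeros(len(rhs))
    for _ in range(max_iter):
        r = A @ y + d * (Mt @ y**3) - rhs
        if np.linalg.norm(r) <= tol * (1.0 + np.linalg.norm(rhs)):
            break
        jac = A + 3.0 * d * Mt @ np.diag(y**2)     # Jacobian of the residual
        y = y - np.linalg.solve(jac, r)
    return y

# Manufactured test: y = 1 solves -y'' + y + y^3 = 2 in (0, 1), y' n + y = 1 on the boundary.
K, M, Q = assemble_p1_1d(50)
n = K.shape[0]
F = M @ np.full(n, 2.0)                            # F_i = int f phi_i with f = 2
G = np.zeros(n)
G[0] = G[-1] = 1.0                                 # G_i = int_Gamma g phi_i with g = 1
y = solve_lumped(K, M, Q, F + G)
print(np.max(np.abs(y - 1.0)))
```

Because the lumped matrix is diagonal, the nonlinear term reduces to a componentwise cube followed by a diagonal scaling, which is exactly the efficiency gain Remark 7 aims at.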

Now, the FE objectives are given as

{\hat{J}}_{i}^{N} (μ) = \frac{1}{2} \int_{Ω} | S^{N} (μ) - y_{i}^{d} |^{2} d x for i = 1, \dots, k - 1, \quad {\hat{J}}_{k}^{N} (μ) = \sum_{j = 1}^{m} | μ_{j} - μ_{j}^{d} |^{2} .
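To illustrate how an FE objective of this type and its gradient can be evaluated, the sketch below uses small random stand-in matrices in place of the FE matrices (a hypothetical setup) and computes the gradient of $\hat{J}_1^N(μ) = \frac{1}{2}(y^N(μ) - y^d)^⊤ M (y^N(μ) - y^d)$ with a single adjoint solve. We assume that the adjoint is defined by the transposed Jacobian of the lumped state equation with right-hand side $M(y - y^d)$; with this sign convention the gradient entries are $p^⊤ \bar{F}_i$, matching the formula used in Section 4, but the exact form of (15) is not reproduced here. A central finite-difference quotient serves as a check.

```python
import numpy as np

# Hypothetical small stand-in data (not the paper's FE matrices).
rng = np.random.default_rng(0)
N, m, d = 20, 2, 1.0
B = rng.standard_normal((N, N))
A = B @ B.T / N + np.eye(N)                        # plays the role of c K + b M + Q
M = (4.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)) / (6.0 * N)  # mass-like SPD matrix
Mt = np.diag(M.sum(axis=1))                        # lumped mass matrix
F0 = rng.standard_normal(N)                        # contribution of f and g
Fbar = rng.standard_normal((N, m))                 # column i: (int xi_i phi_j dx)_j

def state(mu):
    """Solve A y + d Mt y**3 = F0 + Fbar mu by Newton's method."""
    rhs = F0 + Fbar @ mu
    y = np.zeros(N)
    for _ in range(50):
        r = A @ y + d * (Mt @ y**3) - rhs
        if np.linalg.norm(r) <= 1e-13 * (1.0 + np.linalg.norm(rhs)):
            break
        y = y - np.linalg.solve(A + 3.0 * d * Mt @ np.diag(y**2), r)
    return y

yd = state(np.zeros(m))                            # desired state y^d = y^N(0)

def J1(mu):
    e = state(mu) - yd
    return 0.5 * e @ M @ e

def grad_J1(mu):
    """Adjoint-based gradient: solve jac^T p = M (y - y^d), then dJ1/dmu_i = p^T Fbar_i."""
    y = state(mu)
    jac = A + 3.0 * d * Mt @ np.diag(y**2)
    p = np.linalg.solve(jac.T, M @ (y - yd))
    return Fbar.T @ p

mu = np.array([0.3, -0.7])
h = 1e-5
fd = np.array([(J1(mu + h * e) - J1(mu - h * e)) / (2 * h) for e in np.eye(m)])
print(grad_J1(mu), fd)                             # the two vectors should agree closely
```

The adjoint approach needs one linear solve per gradient, independent of the number m of parameters, whereas finite differences need 2m nonlinear state solves.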

3.5. Reduced-Order Modelling (ROM)

To generate the Pareto-critical set of the MOP (11), we need to evaluate the reduced objectives ${\hat{J}}_{i}$, $i = 1, \dots, k$, and their gradients many times. Consequently, the state and adjoint equations have to be solved numerically very often, which makes ROM a suitable option. In this paper, we use the Reduced Basis (RB) method. The main idea is to construct a low-dimensional (i.e., $ℓ ≪ N$) subspace $V^{ℓ}$ of the FE space $V^{N}$ that is spanned by FE solutions of the state and adjoint equations for appropriately chosen parameters $μ \in P_{ad}$. Here, this strategy is realized by a greedy algorithm, i.e., an iterative procedure in which, in each iteration, the FE solutions of the state and adjoint equations at a selected parameter value are added to the basis. We refer to the works in [17,18,31] for a general explanation and to ([23] Section 3.2) for our specific problem (11). An essential ingredient of greedy algorithms is the choice of an error indicator $η_{ℓ}(μ)$. Here, we use the maximal true error between the FE and the ROM gradients, i.e., we set

η_{ℓ} (μ) : = max_{1 \leq i \leq k} {∥ \nabla {\hat{J}}_{i}^{N} (μ) - \nabla {\hat{J}}_{i}^{ℓ} (μ) ∥}_{2} .

(20)

Our subdivision scheme is based on gradient information. To be able to generate $P_{1}^{ℓ}$ or $P_{2}^{ℓ}$ for the approximation of $P$, we have to ensure that the approximated objective function ${\hat{J}}^{ℓ}$ based on the reduced-order model satisfies the inequality in (5). The idea is to add a new basis element in every step until the maximum of the error indicator $η_{ℓ}(μ)$ on a discrete training set $S_{train}$, which approximates $P_{ad}$ sufficiently well, is smaller than a tolerance $ε_{tol}$. For more details, see Algorithm 4.

Algorithm 4: Greedy algorithm.
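The following Python sketch shows a generic greedy loop of the kind described above: starting from one training parameter, it repeatedly evaluates an error indicator on the training set and enriches the basis (orthonormalized by Gram–Schmidt in the Euclidean inner product) at the worst-approximated parameter until the maximal indicator falls below the tolerance. As simplifying assumptions, the toy problem is linear and affine in μ, only state snapshots are added, and the indicator is the relative true state error rather than the gradient error (20). The interface with `snapshots` and `indicator` callables is hypothetical and not the authors' Algorithm 4 verbatim.

```python
import numpy as np

def add_to_basis(Psi, v, drop_tol=1e-12):
    """Gram-Schmidt step (with one re-orthogonalization) appending v to Psi."""
    for _ in range(2):
        v = v - Psi @ (Psi.T @ v)
    nrm = np.linalg.norm(v)
    return Psi if nrm < drop_tol else np.column_stack([Psi, v / nrm])

def greedy(snapshots, indicator, train, tol, max_iter=100):
    """Greedy basis construction (sketch).

    snapshots(mu)      -> list of FE vectors added at mu (e.g. state and adjoint)
    indicator(mu, Psi) -> error indicator eta_l(mu) for the current basis Psi
    """
    new = snapshots(train[0])                      # arbitrary starting parameter
    Psi = np.zeros((new[0].size, 0))
    for _ in range(max_iter):
        for v in new:
            Psi = add_to_basis(Psi, v)
        errors = np.array([indicator(mu, Psi) for mu in train])
        worst = int(np.argmax(errors))
        if errors[worst] < tol:
            break
        new = snapshots(train[worst])              # enrich at the worst parameter
    return Psi, errors[worst]

# Toy affine problem (hypothetical): (A0 + mu_1 A1 + mu_2 A2) y = rhs.
N = 60
A0 = 2.1 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
A1 = np.diag((np.arange(N) < N // 2).astype(float))
A2 = np.diag((np.arange(N) >= N // 2).astype(float))
rhs = np.ones(N)

def fe_state(mu):
    return np.linalg.solve(A0 + mu[0] * A1 + mu[1] * A2, rhs)

def rb_state_error(mu, Psi):
    """Relative true error of the RB Galerkin solution (state only)."""
    A = A0 + mu[0] * A1 + mu[1] * A2
    c = np.linalg.solve(Psi.T @ A @ Psi, Psi.T @ rhs)
    y = fe_state(mu)
    return np.linalg.norm(y - Psi @ c) / np.linalg.norm(y)

train = [np.array([a, b]) for a in np.linspace(0, 1, 8) for b in np.linspace(0, 1, 8)]
Psi, err = greedy(lambda mu: [fe_state(mu)], rb_state_error, train, tol=1e-8)
print(Psi.shape[1], err)
```

Using the true error as indicator, as in (20), requires an FE solve for every training point in every iteration; this is affordable for the small training set used here, and an a posteriori estimator would remove that cost.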

As $V^{ℓ}$ is a subspace of $V$, we endow $V^{ℓ}$ with the $V$-topology. Since the RB basis functions satisfy $ψ_{j} \in V^{N}$ ($1 \leq j \leq ℓ$), there exists a coefficient matrix $Ψ \in R^{N \times ℓ}$ such that

ψ_{j} (x) = \sum_{i = 1}^{N} Ψ_{i j} φ_{i} (x) for x \in Ω .

(21)

Now, we replace (16) by an RB Galerkin scheme: For each $μ \in P_{ad}$, the RB solution $y^{ℓ}(μ) \in V^{ℓ}$ solves

{⟨ A (y^{ℓ} (μ)), ψ_{i} ⟩}_{V^{'}, V} = {⟨ b_{μ}, ψ_{i} ⟩}_{V^{'}, V} for i = 1, \dots, ℓ .

(22)

We suppose that (22) has a unique solution $y^{ℓ} = y^{ℓ}(μ)$ for every $μ \in P_{ad}$. Therefore, the parameter-to-state RB mapping $S^{ℓ} : P_{ad} \to V^{ℓ}$, $μ \mapsto S^{ℓ}(μ) = y^{ℓ}(μ)$, is well defined.

Inserting (21) and $y^{ℓ}(μ) = \sum_{i = 1}^{ℓ} y_{i}^{ℓ}(μ) ψ_{i}$ into (22), we derive the nonlinear algebraic system (cf. (18))

(c K^{ℓ} + b M^{ℓ} + Q^{ℓ}) y^{ℓ} + d Ψ^{⊤} H (Ψ y^{ℓ}) = F_{μ}^{ℓ} + G^{ℓ} \in R^{ℓ} \quad (ℓ ≪ N)

(23)

with the $ℓ \times ℓ$ matrices $K^{ℓ} = Ψ^{⊤} K Ψ$, $M^{ℓ} = Ψ^{⊤} M Ψ$, and $Q^{ℓ} = Ψ^{⊤} Q Ψ$ and the $ℓ$-vectors $F_{μ}^{ℓ} = Ψ^{⊤} F_{μ}$ and $G^{ℓ} = Ψ^{⊤} G$.
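A minimal sketch of this RB Galerkin projection: the FE matrices and vectors are projected with Ψ, and the reduced system is solved by Newton's method, where H is replaced by the mass-lumped term $Ψ^⊤ \tilde{M} (Ψ y^ℓ)^3$ from Remark 7. The 1D test matrices and the function name `solve_rb` are hypothetical. Choosing Ψ = I recovers the FE system (19), and if the FE solution lies in the span of Ψ, the RB solution reproduces it, which serves as a check.

```python
import numpy as np

def solve_rb(Psi, K, M, Q, F, G, b=1.0, c=1.0, d=1.0, max_iter=50, tol=1e-12):
    """Newton's method for the projected system with lumped nonlinearity.

    Solves (c K^l + b M^l + Q^l) y + d Psi^T M_tilde (Psi y)**3 = Psi^T (F + G)
    with K^l = Psi^T K Psi etc.; Psi = identity recovers the FE system (19).
    """
    Al = Psi.T @ (c * K + b * M + Q) @ Psi          # c K^l + b M^l + Q^l
    rhs = Psi.T @ (F + G)                           # F^l + G^l
    Mt_l = Psi.T * M.sum(axis=1)                    # l x N matrix Psi^T M_tilde
    yl = np.zeros(Psi.shape[1])
    for _ in range(max_iter):
        z = Psi @ yl                                # lifting to R^N costs O(N l)
        r = Al @ yl + d * (Mt_l @ z**3) - rhs
        if np.linalg.norm(r) <= tol * (1.0 + np.linalg.norm(rhs)):
            break
        jac = Al + 3.0 * d * Mt_l @ (z[:, None] ** 2 * Psi)
        yl = yl - np.linalg.solve(jac, r)
    return yl

# 1D P1 matrices on (0, 1) as hypothetical test data.
n = 41
h = 1.0 / (n - 1)
K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
K[0, 0] = K[-1, -1] = 1.0 / h
M = (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) * h / 6
M[0, 0] = M[-1, -1] = 2 * h / 6
Q = np.zeros((n, n))
Q[0, 0] = Q[-1, -1] = 1.0
x = np.linspace(0, 1, n)
F = M @ (3.0 + np.sin(np.pi * x))
G = np.zeros(n)
G[0] = G[-1] = 1.0

yN = solve_rb(np.eye(n), K, M, Q, F, G)            # FE solution of (19)
rng = np.random.default_rng(0)
Psi, _ = np.linalg.qr(np.column_stack([yN, rng.standard_normal((n, 2))]))
yl = solve_rb(Psi, K, M, Q, F, G)                  # three-dimensional RB solution
print(np.max(np.abs(Psi @ yl - yN)))
```

Note that the lifting `Psi @ yl` in every Newton step still costs O(Nℓ); this is the remaining dependence on N discussed in Remark 8.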

Remark 8.

As mentioned in Remark 7, we apply mass lumping to evaluate the nonlinear function $H$ more efficiently. With that, (23) becomes the root-finding problem

(c K^{ℓ} + b M^{ℓ} + Q^{ℓ}) y^{ℓ} + d {\tilde{M}}^{ℓ} {(Ψ y^{ℓ})}^{3} = F_{μ}^{ℓ} + G^{ℓ} \in R^{ℓ} \quad (ℓ ≪ N)

(24)

with ${(Ψ y^{ℓ})}^{3} = {({(Ψ y^{ℓ})}_{i}^{3})}_{1 \leq i \leq N}$ and the $ℓ \times N$ matrix ${\tilde{M}}^{ℓ} = Ψ^{⊤} \tilde{M}$. However, the evaluation of the nonlinearity in the RB Galerkin scheme is still as costly as in the FE case, because the vector $Ψ y^{ℓ} \in R^{N}$ has to be formed and cubed in all $N$ components. To remove this dependence on $N$, the discrete empirical interpolation method (DEIM) is applied, cf. [19,20]. We skip the detailed description here and refer the reader to ([23] Section 3.2). ◊
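The DEIM step referred to in Remark 8 can be sketched as follows, assuming the standard greedy index selection of [19] and a basis U of snapshots of the nonlinearity (here obtained by an SVD of hypothetical snapshots). After the offline precomputation of the ℓ × m matrix $Ψ^⊤ \tilde{M} U (P^⊤ U)^{-1}$, the online evaluation only needs the m rows of Ψ selected by the DEIM indices, so its cost no longer depends on N. This is an illustration under these assumptions, not the implementation of ([23] Section 3.2).

```python
import numpy as np

def deim_indices(U):
    """DEIM interpolation indices (greedy selection of Chaturantabut and Sorensen)."""
    p = [int(np.argmax(np.abs(U[:, 0])))]
    for l in range(1, U.shape[1]):
        c = np.linalg.solve(U[p, :l], U[p, l])      # interpolate column l at current indices
        r = U[:, l] - U[:, :l] @ c                  # interpolation residual
        p.append(int(np.argmax(np.abs(r))))
    return np.array(p)

def deim_offline(Psi, mt_diag, U):
    """Precompute W = Psi^T M_tilde U (P^T U)^{-1} (size l x m) and the indices p."""
    p = deim_indices(U)
    W = (Psi.T * mt_diag) @ U @ np.linalg.inv(U[p, :])
    return W, p

def nonlinear_term_deim(W, Psi_p, yl):
    """DEIM approximation of Psi^T M_tilde (Psi yl)**3 using only len(p) rows of Psi."""
    return W @ (Psi_p @ yl) ** 3

# Hypothetical test: Psi spans polynomials of degree <= 2, so (Psi y)**3 lies in a
# 7-dimensional space (degree <= 6) and DEIM with a basis U of that space is exact.
n = 200
x = np.linspace(0.0, 1.0, n)
mt_diag = np.full(n, 1.0 / (n - 1))
mt_diag[[0, -1]] *= 0.5                             # lumped 1D mass weights
Psi, _ = np.linalg.qr(np.column_stack([np.ones(n), x, x**2]))
rng = np.random.default_rng(0)
snaps = np.column_stack([(Psi @ rng.standard_normal(3)) ** 3 for _ in range(50)])
U_full, s, _ = np.linalg.svd(snaps, full_matrices=False)
U = U_full[:, s > 1e-10 * s[0]]                     # POD basis of the nonlinearity snapshots
W, p = deim_offline(Psi, mt_diag, U)

y = np.array([0.3, -1.2, 0.5])
exact = Psi.T @ (mt_diag * (Psi @ y) ** 3)
approx = nonlinear_term_deim(W, Psi[p, :], y)
```

In general, the nonlinearity does not lie exactly in span(U), and the DEIM error is then controlled by the best-approximation error in span(U) times the norm of $(P^⊤ U)^{-1}$, which is why the DEIM tolerance enters the accuracy of the reduced gradients.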

3.6. Convergence Analysis

We prove the convergence of the RB solution with mass lumping to the weak solutions of the state and adjoint equations.

Remark 9.

Since the FE space is finite-dimensional, the RB error for the state and adjoint equations converges to zero as the dimension of the RB space increases; at the latest, it vanishes once $V^{ℓ} = V^{N}$. Therefore, the RB solution converges to the FE solution as the accuracy of the greedy algorithm is increased. We skip a detailed description of the proof here and refer the reader to ([23], Section 3.2). ◊

Theorem 4.

Let ${(S_{j})}_{j} \subset P_{ad}$ with $S_{j} \subseteq S_{j + 1}$ be a growing sequence of training sets, and choose a monotone sequence ${(ε_{j})}_{j} \subset R_{> 0}$ satisfying

ε_{j} \to 0 for j \to \infty and ε_{j} \geq ε_{j + 1} .

Then, we get

\limsup_{j \to \infty} \sup_{μ \in P_{ad}} {∥ y (μ) - y^{ℓ} (μ) ∥}_{V} = 0, \quad \limsup_{j \to \infty} \sup_{μ \in P_{ad}} {∥ p (μ) - p^{ℓ} (μ) ∥}_{V} = 0 .

Proof.

Let $μ \in P_{ad}$ be an arbitrary parameter. It holds that

{∥ y (μ) - y^{ℓ} (μ) ∥}_{V} \leq {∥ y (μ) - y^{N} (μ) ∥}_{V} + {∥ y^{N} (μ) - y^{ℓ} (μ) ∥}_{V}

with $N > ℓ$. From Theorem A1 (see Appendix A), we infer that the first summand converges to zero for increasing $N$. For the second summand, we refer the reader to ([23] Section 3.2.5), where we proved the convergence of the RB solution to the FE solution. The same arguments apply to the adjoint equation, and the claim follows. □

4. Numerical Experiments

In this section, we use our algorithm to solve multiobjective optimization problems with PDE constraints and interpret the numerical results. All computations were executed on a computer with a 2.9 GHz Intel Core i7 CPU, 8 GB of RAM, and an Intel HD Graphics 4000 GPU with 1536 MB. The algorithms were implemented in Matlab R2017b. For the subdivision method, we used the implementation from https://math.uni-paderborn.de/en/ag/chair-of-applied-mathematics/research/software.

In the following example, we numerically investigate the application of the modified subdivision algorithm presented in Section 2.5 to the PDE-constrained multiobjective optimization problem using the RB-DEIM solver from Section 3.5. For the underlying PDE, we set the spatial dimension to $d = 2$, $Ω = {(0, 1)}^{2}$ with points $x = (x_{1}, x_{2})$, $P_{ad} = {[- 2, 2]}^{2}$, and the coefficients $b = c = d = 1$; the right-hand side $f(x) = x_{1}^{2} + x_{2}^{2} - 4 + {(x_{1}^{2} + x_{2}^{2})}^{3}$, $m = 2$, $ξ_{1}(x) = - 25 \cdot 1_{x_{1} > 0.5}(x)$, and $ξ_{2}(x) = 25 \cdot 1_{x_{1} \leq 0.5}(x)$; and the boundary condition $g(x) = 2 \cdot 1_{x_{1} = 1}(x) + 2 \cdot 1_{x_{2} = 1}(x) + x_{1}^{2} + x_{2}^{2}$. This leads to the following PDE:

- ∆ y + y + y^{3} = f + μ_{1} ξ_{1} + μ_{2} ξ_{2} in Ω, \frac{\partial y}{\partial n} + y = g on \partial Ω .

(25)
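As a quick consistency check of the data in (25), the sketch below verifies numerically that $y(x) = x_1^2 + x_2^2$ satisfies the PDE and the Robin boundary condition for μ = (0, 0), as stated further on in the text. The interior residual uses a five-point finite-difference Laplacian, and the boundary residual uses the exact gradient $(2x_1, 2x_2)$ with the outer normal of each side. The helper names are hypothetical.

```python
import numpy as np

def y_exact(x1, x2):
    return x1**2 + x2**2

def f(x1, x2):
    r = x1**2 + x2**2
    return r - 4.0 + r**3

def g(x1, x2):
    return 2.0 * (x1 == 1.0) + 2.0 * (x2 == 1.0) + x1**2 + x2**2

def laplacian(u, x1, x2, h=1e-3):
    """Five-point finite-difference Laplacian (exact for quadratics up to rounding)."""
    return (u(x1 + h, x2) + u(x1 - h, x2) + u(x1, x2 + h) + u(x1, x2 - h)
            - 4.0 * u(x1, x2)) / h**2

# Interior: -Laplace(y) + y + y^3 - f with mu = (0, 0), so the xi-terms vanish.
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(0.1, 0.9, size=(2, 100))
y = y_exact(x1, x2)
pde_residual = -laplacian(y_exact, x1, x2) + y + y**3 - f(x1, x2)

# Boundary: dy/dn + y - g on each side, with grad y = (2 x1, 2 x2).
# Corners are excluded since the outer normal is not defined there.
t = np.linspace(0.0, 1.0, 11)[1:-1]
zero, one = np.zeros_like(t), np.ones_like(t)
sides = [(zero, t, (-1.0, 0.0)), (one, t, (1.0, 0.0)),
         (t, zero, (0.0, -1.0)), (t, one, (0.0, 1.0))]
bc_residual = np.concatenate([
    2 * s1 * n1 + 2 * s2 * n2 + y_exact(s1, s2) - g(s1, s2)
    for s1, s2, (n1, n2) in sides
])
print(np.abs(pde_residual).max(), np.abs(bc_residual).max())
```

The indicator terms in g exactly compensate the normal derivative, which equals 2 on the sides $x_1 = 1$ and $x_2 = 1$ and vanishes on the other two sides.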

In Figure 1, the corresponding solutions of the state equation are shown for three values of $μ$.

Figure 1. FE solutions of (25) for parameters (a) $μ = (0, 0)$, (b) $μ = (0, 1)$, and (c) $μ = (1, 0)$.

In [23], we have already observed that the error between the FE and RB-DEIM solutions of the state and adjoint equations decreases as the FE grids become finer. We skip the detailed description here and refer the reader to ([23] Section 5).

Notice that $y(x) = x_{1}^{2} + x_{2}^{2}$ solves (25) for $μ = (0, 0)$. For the FE solver, we used linear finite elements with maximal mesh size $∆H_{max} = 0.04$, which yields 762 degrees of freedom. We choose the following two objective functions:

\begin{matrix} {\hat{J}}_{1} (μ) = \frac{1}{2} \int_{Ω} | y (μ) - y^{d} |^{2} d x, {\hat{J}}_{2} (μ) = \frac{1}{2} \sum_{j = 1}^{2} {| μ_{j} - μ_{j}^{d} |}^{2} \end{matrix}

with $μ^{d} = (1, 1)$. For the desired state $y^{d}$, we take the FE solution for $μ = 0$, i.e., $y^{d} = y^{N}(0)$. Thus, $y^{d} = \sum_{i = 1}^{N} y_{i}^{d} φ_{i}$ is a piecewise linear approximation of $x_{1}^{2} + x_{2}^{2}$. The associated FE objectives are now given as

\begin{matrix} {\hat{J}}_{1}^{N} (μ) = \frac{1}{2} {(y^{N} (μ) - y^{d})}^{⊤} M (y^{N} (μ) - y^{d}), {\hat{J}}_{2}^{N} (μ) = \frac{1}{2} \sum_{j = 1}^{2} {| μ_{j} - μ_{j}^{d} |}^{2} \end{matrix}

with $y^{d} = {(y_{i}^{d})}_{1 \leq i \leq N}$. The gradients are

\begin{matrix} \nabla {\hat{J}}_{1}^{N} (μ) = (\begin{matrix} p^{N} {(μ)}^{⊤} {\bar{F}}_{1} \\ p^{N} {(μ)}^{⊤} {\bar{F}}_{2} \end{matrix}), \nabla {\hat{J}}_{2}^{N} (μ) = μ - μ^{d} \end{matrix}

where $p^{N}(μ) = \sum_{j = 1}^{N} p_{j}^{N}(μ) φ_{j}$ is the FE solution to (15), $p^{N}(μ) = {(p_{j}^{N}(μ))}_{1 \leq j \leq N}$ denotes its coefficient vector, and ${\bar{F}}_{i} = {(\int_{Ω} φ_{j} ξ_{i} d x)}_{1 \leq j \leq N}$. The associated RB objective functions have the form

\begin{matrix} {\hat{J}}_{1}^{ℓ} (μ) = \frac{1}{2} {(y^{ℓ} (μ) - y^{d, ℓ})}^{⊤} M^{ℓ} (y^{ℓ} (μ) - y^{d, ℓ}), {\hat{J}}_{2}^{ℓ} (μ) = \frac{1}{2} \sum_{j = 1}^{m} {| μ_{j} - μ_{j}^{d} |}^{2} \end{matrix}

with $y^{d, ℓ} = Ψ^{⊤} y^{d}$. In [23], we have already observed good agreement between the Pareto-critical sets approximated with the FE and the RB-DEIM solver if the error between the gradients is sufficiently small. However, this agreement cannot always be guaranteed. Thus, we will instead use the supersets $P_{1}^{ℓ}$ and $P_{2}^{ℓ}$ from Section 2.3 for the approximation of $P$ (and $P_{c}$), which are based only on the reduced objective function ${\hat{J}}^{ℓ}$ and the error bounds $ε$. To generate $P_{1}^{ℓ}$, we compute the common steepest descent direction for all components of ${\hat{J}}^{ℓ}$. Similar to Algorithm 1, we first calculate $α_{k}$ as the solution of (4) for $μ_{k}$. Then, the descent direction is $v_{k} := - D {\hat{J}}^{ℓ}{(μ_{k})}^{⊤} α_{k}$. As a stopping condition, we choose $σ = {∥ ε ∥}_{\infty}$ and set $a(μ_{k}) = μ_{k}$ if

{∥ D {\hat{J}}^{ℓ} {(μ_{k})}^{⊤} α_{k} ∥}_{2} = {∥ v_{k} ∥}_{2} < σ = {∥ ε ∥}_{\infty} .

To generate $P_{2}^{ℓ}$, we use Algorithm 2.
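For two objectives, the subproblem for $α_k$ can be solved in closed form. The sketch below assumes that (4) is the usual quadratic problem $\min_{α \geq 0,\, α_1 + α_2 = 1} \|\sum_i α_i \nabla \hat{J}_i^ℓ(μ_k)\|_2^2$ (this form is an assumption here; in the experiments, fmincon is used) and combines it with the stopping test $\|v_k\|_2 < σ = \|ε\|_\infty$. The function names are hypothetical.

```python
import numpy as np

def descent_direction(grads):
    """Common steepest descent direction for k = 2 objectives.

    grads has shape (2, n) with rows grad J_1(mu) and grad J_2(mu).  The subproblem
    min_{alpha in simplex} ||alpha_1 g_1 + alpha_2 g_2||_2^2 is solved in closed form.
    """
    g1, g2 = grads
    diff = g1 - g2
    denom = diff @ diff
    a1 = 0.5 if denom == 0.0 else float(np.clip(-(g2 @ diff) / denom, 0.0, 1.0))
    alpha = np.array([a1, 1.0 - a1])
    v = -(grads.T @ alpha)                         # v = -D J(mu)^T alpha
    return v, alpha

def stop(v, eps):
    """Stopping test ||v||_2 < sigma with sigma = ||eps||_inf."""
    return np.linalg.norm(v) < np.linalg.norm(eps, np.inf)

eps0 = np.array([0.0491, 0.0])
v, alpha = descent_direction(np.array([[1.0, 0.0], [0.0, 1.0]]))
print(v, alpha, stop(v, eps0))
```

The closed form follows from minimizing $\|g_2 + α_1 (g_1 - g_2)\|_2^2$ over $α_1$ and clipping to [0, 1]; for k > 2 objectives, a small quadratic program has to be solved instead.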

To save computational time during the modified subdivision algorithm, we calculate

\max_{1 \leq i \leq k} {∥ \nabla {\hat{J}}_{i}^{N} (μ) - \nabla {\hat{J}}_{i}^{ℓ} (μ) ∥}_{2}

on a training set $P_{train}$, which approximates $P_{ad}$, before the algorithm starts (i.e., in an offline phase), and we store these errors. During the subdivision algorithm, we use them to generate the new $ε_{i + 1}^{l}$ faster and without computing the FE solution again.
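The offline/online split described above can be sketched as follows. The errors are computed once per training point; during the subdivision, a new bound is taken as the componentwise maximum of the stored errors over the training points that lie in the remaining boxes. This update rule, the representation of boxes by centers and half side lengths, and the safeguard of never increasing the bound are assumptions made for illustration; the update actually used is the one of the modified subdivision algorithm in Section 2.5.

```python
import numpy as np

def offline_errors(train, grad_fe, grad_rb):
    """E[t, i] = ||grad J_i^N(mu_t) - grad J_i^l(mu_t)||_2 for every training point mu_t."""
    return np.array([np.linalg.norm(grad_fe(mu) - grad_rb(mu), axis=1) for mu in train])

def updated_bounds(train, E, centers, radii, eps_prev):
    """Componentwise max of the stored errors over training points in the current boxes.

    Boxes are given by centers and radii (half side lengths).  The bound is never
    increased, and eps_prev is kept if no training point lies in a box.
    """
    inside = np.zeros(len(train), dtype=bool)
    for c, r in zip(centers, radii):
        inside |= np.all(np.abs(train - c) <= r, axis=1)
    if not inside.any():
        return eps_prev
    return np.minimum(eps_prev, E[inside].max(axis=0))

# Hypothetical gradients: the "RB" gradient of J_1 is off by 10 percent, J_2 is exact.
grad_fe = lambda mu: np.array([mu, -mu])
grad_rb = lambda mu: np.array([1.1 * mu, -mu])
grid = np.linspace(-2.0, 2.0, 9)
train = np.array([[a, b] for a in grid for b in grid])
E = offline_errors(train, grad_fe, grad_rb)        # offline phase
eps0 = E.max(axis=0)                               # initial bound on the whole grid
eps1 = updated_bounds(train, E, centers=np.array([[0.0, 0.0]]),
                      radii=np.array([[0.5, 0.5]]), eps_prev=eps0)
print(eps0, eps1)
```

Since the bound is estimated from finitely many training points, it is only as reliable as the training set is fine, which is why a dense set $P_{train}$ is used for this purpose.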

To observe a significant difference between the modified subdivision algorithm and the subdivision algorithm, as well as between $P_{2}^{ℓ}$ and $P_{1}^{ℓ}$, the error between the gradients has to be large. To achieve this, we choose a coarse training set

S_{train} = \{{(- 2 + 0.51 k, - 2 + 0.427 j)}^{⊤} | (k, j) \in \{0, \dots, 7\} \times \{0, \dots, 9\}\}

with $|S_{train}| = 80$. For the greedy algorithm, we choose the tolerance $ε_{tol} = 0.06$ and the true error

η_{ℓ} (μ) := \max_{1 \leq i \leq k} {∥ \nabla {\hat{J}}_{i}^{N} (μ) - \nabla {\hat{J}}_{i}^{ℓ} (μ) ∥}_{2}

as error indicator; for the DEIM algorithm, we choose the tolerance $\bar{ε} = 0.5$.
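For reference, the coarse training set can be generated as below (NumPy, a direct transcription of the set definition). Note that the grid does not reach the upper bound 2 in either coordinate: its largest coordinates are $-2 + 0.51 \cdot 7 = 1.57$ and $-2 + 0.427 \cdot 9 = 1.843$.

```python
import numpy as np

# All pairs (k, j) in {0, ..., 7} x {0, ..., 9}.
k, j = np.meshgrid(np.arange(8), np.arange(10), indexing="ij")
S_train = np.column_stack([-2.0 + 0.51 * k.ravel(), -2.0 + 0.427 * j.ravel()])
print(S_train.shape, S_train.min(axis=0), S_train.max(axis=0))
```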

With these settings, we generate an RB basis with six elements and a DEIM basis with 18 elements. This leads to the following estimates for the gradients of ${\hat{J}}^{N}$ and ${\hat{J}}^{ℓ}$:

\begin{matrix} max_{μ \in P_{ad}} {∥ \nabla {\hat{J}}_{1}^{N} (μ) - \nabla {\hat{J}}_{1}^{ℓ} (μ) ∥}_{2} & \approx max_{μ \in P_{train}} {∥ \nabla {\hat{J}}_{1}^{N} (μ) - \nabla {\hat{J}}_{1}^{ℓ} (μ) ∥}_{2} \approx 0.0491 = ε_{1}, \\ max_{μ \in P_{ad}} {∥ \nabla {\hat{J}}_{2}^{N} (μ) - \nabla {\hat{J}}_{2}^{ℓ} (μ) ∥}_{2} & = max_{μ \in P_{ad}} {∥ \nabla {\hat{J}}_{2} (μ) - \nabla {\hat{J}}_{2}^{ℓ} (μ) ∥}_{2} = 0 = ε_{2} . \end{matrix}

To generate these estimates, we chose a training set $P_{train} \subset P_{ad}$ with 3105 equally distributed test points and computed the error at these points. Thus, we have

ε^{0} = {(0.0491, 0)}^{⊤} .

Now, we test the subdivision and the modified subdivision algorithm with the following settings. To compute the descent direction, we use the Matlab function fmincon to solve (4) or (7). The algorithm stops when the boxes are small enough, which is after 25 iteration steps in our case. In every step, we halve the boxes. We choose five sample points in each box during the first five iteration steps, four sample points in the next five iteration steps, three sample points for iteration steps 10 to 14, two sample points for the next five steps, and one sample point for the last five iteration steps; see Remark 4. Figure 2 shows the approximated Pareto sets generated with the FE solver after 20 and 25 iteration steps.

Figure 2. Pareto-critical set $P_{c}$ in iteration step (a) 20 and (b) 25 for the FE solver.

The results with the subdivision algorithm and RB-DEIM-solver are shown in Figure 3.

Figure 3. (a) $P_{1}^{ℓ}$ and (b) $P_{2}^{ℓ}$ generated with the subdivision algorithm.

The modified subdivision algorithm and RB-DEIM-solver lead to the results plotted in Figure 4.

Figure 4. (a) $P_{1}^{ℓ}$ and (b) $P_{2}^{ℓ}$ generated with the modified subdivision algorithm.

The runtime, the number of boxes, and the number of function and gradient evaluations needed in iteration steps 10, 15, 20, and 25 are shown in Table 1, Table 2, Table 3, Table 4 and Table 5. The total runtime, the number of function and gradient evaluations for the different methods, and the speed-up are shown in Table 6.

Table 1. The performance of the subdivision algorithm with the FE-solver and the steepest descent method.

Table 2. Subdivision algorithm with the steepest descent method.

Table 3. Subdivision algorithm with the modified descent direction.

Table 4. Modified subdivision algorithm with the steepest descent method.

Table 5. Modified subdivision algorithm with the modified descent direction.

Table 6. Total runtime, number of function, and gradient evaluations for the different methods and the speed-up.

The greedy algorithm and DEIM together take 14.06 s in the offline phase. Computing the error bounds $ε$ (cf. (5)) for the gradients on the training set $P_{train}$ takes 151.13 s. It follows from Figure 3 and Figure 4 that $P_{2}^{ℓ}$ is significantly smaller than $P_{1}^{ℓ}$ for the subdivision algorithm as well as for the modified subdivision algorithm. Therefore, we obtain a much better approximation of $P_{c}$ if we choose $P_{2}^{ℓ}$ instead of $P_{1}^{ℓ}$. Comparing the two subdivision algorithms, we notice that the modified subdivision algorithm yields a better approximation of $P_{c}$ than the subdivision algorithm.

The reason for this is that the modified subdivision algorithm produces a monotonically decreasing sequence $ε^{l}$. In Figure 5, the error between $\nabla {\hat{J}}_{1}^{N}$ and $\nabla {\hat{J}}_{1}^{ℓ}$ is plotted on the parameter set $P_{ad}$. The black markers are some points on the Pareto-critical set $P_{c}$. It turns out that the difference between the two gradients is significantly smaller near $P_{c}$ than in other regions of $P_{ad}$. Because of this, the sequence $ε^{l}$ decreases noticeably in the modified subdivision algorithm, so it is useful to update $ε$ after each iteration step. As already mentioned, $P_{2}^{ℓ}$ is a better approximation of $P_{c}$. If $P_{2}^{ℓ}$ is generated by the modified subdivision algorithm, the result is even better: comparing Figure 2b and Figure 4b, there is no significant difference between the two sets. Therefore, the modified subdivision algorithm yields a good approximation $P_{2}^{ℓ}$ of $P_{c}$, although the error $ε_{1}$ is not small.

Figure 5. Difference ${∥ \nabla {\hat{J}}_{1}^{N} (μ) - \nabla {\hat{J}}_{1}^{ℓ} (μ) ∥}_{2}$ for $μ \in P_{ad}$.

Regarding the computational time (cf. Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6), we notice that in the first 20 iterations the four RB-based methods take almost the same time. In these iteration steps, the FE solver takes between 19 and 30 times as much time per iteration step as the four RB-based methods. Only in the subsequent iteration steps does a significant difference between the four RB-based methods appear. For these iteration steps, the FE solver takes between 3.9 and 30.8 times as much time per iteration step as one of the four methods. We notice that, with the subdivision algorithm, the computation of $P_{1}^{ℓ}$ takes around two times as much time as the computation of $P_{2}^{ℓ}$. The main reason for this is the much larger number of function and gradient evaluations needed for $P_{1}^{ℓ}$, which in turn can be attributed to the significantly larger number of boxes for $P_{1}^{ℓ}$. If we use the modified subdivision algorithm, the computation of $P_{2}^{ℓ}$ takes slightly more time than the computation of $P_{1}^{ℓ}$. The main reason for this is again the larger number of function and gradient evaluations. Unlike in the previous case, this cannot be attributed to the number of boxes, but probably to the computation of the modified descent direction, which requires more function and gradient evaluations. Nevertheless, the difference in computational time is marginal (a factor of about 1.04) for the modified subdivision algorithm. For $P_{1}^{ℓ}$, we notice a different behavior: here, the generation of $P_{1}^{ℓ}$ with the modified subdivision algorithm is two times faster than with the subdivision algorithm. This is mainly because the subdivision algorithm requires more function evaluations, which is due to its larger number of boxes. The FE solver takes approximately 23.6 times as much time as the computation of $P_{1}^{ℓ}$ and approximately 22.6 times as much time as the computation of $P_{2}^{ℓ}$ with the modified subdivision algorithm. This is another advantage of the modified subdivision algorithm: as it yields a better approximation, fewer boxes remain in the iteration steps, and therefore fewer function and gradient evaluations are needed than in the subdivision algorithm. Finally, we see that the modified subdivision algorithm works better and faster than the subdivision algorithm. As $P_{2}^{ℓ}$ is a tighter approximation of $P_{c}$, it is better to generate $P_{2}^{ℓ}$ rather than $P_{1}^{ℓ}$.

5. Conclusions

In this article, we present a way to solve multiobjective parameter optimization problems for semilinear elliptic PDEs by combining an RB approach and DEIM with a set-oriented method based on gradient evaluations. To deal with the error introduced by the surrogate model, we derived an additional condition for the descent direction, which allows us to consider the errors for the objective functions independently and to derive a superset $P_{2}^{ℓ}$ of the Pareto-critical set $P_{c}$. To obtain an even tighter superset, we update these error bounds in our subdivision algorithm after each iteration step. Regarding the numerical results, we first investigated the influence of the error bounds for the gradients of the objective functions. By individually adapting the components of the error bounds, we obtained a tighter covering of the Pareto-critical set. When we additionally adjusted the error bounds in each iteration step, the result became even tighter and almost coincided with the exact solution of the MOP (the solution with the FE solver and the steepest descent method). Furthermore, we compared the computational time of each method. The FE solver needed between 11.9 and 23.6 times as much time as the four RB-based methods presented in this work. For future work, it could be interesting to improve the results in ([23] Section 3.2.4) and to develop an efficient a posteriori error estimator for the error in the objectives and their gradients, cf., e.g., [32,33,34]. Such estimates could then be used in a weak greedy algorithm and, beyond that, to provide the error bounds needed in the subdivision algorithm.

Author Contributions

The authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded partially by Deutsche Forschungsgemeinschaft grant number Priority Programme 1962.

Acknowledgments

The authors gratefully acknowledge partial support by the German DFG-Priority Program 1962. Furthermore, S. Banholzer acknowledges his partial funding by the Landesgraduiertenförderung of Baden-Württemberg.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MOP	Multiobjective optimization problem
DEIM	Discrete Empirical Interpolation Method
FE	Finite Element
GS	Gram–Schmidt
KKT	Karush–Kuhn–Tucker
PDE	Partial differential equation
POD	Proper Orthogonal Decomposition
RB	Reduced Basis
ROM	Reduced Order Modeling

Appendix A. Mass Lumping

We follow the work in [35]. Let $Ω \subset R^{d}$, $d \in \{1, 2, 3\}$, be an open, bounded Lipschitz domain, e.g., $Ω = {(0, 1)}^{d}$. We set $H = L^{2}(Ω)$ and $V = H^{1}(Ω)$, where $V$ is endowed with the inner product

{⟨ φ, ϕ ⟩}_{V} = \int_{Ω} \nabla φ \cdot \nabla ϕ d x + \int_{Γ} φ ϕ d s for φ, ϕ \in V

and its induced norm. Let $T^{N}$ denote an underlying triangulation of $Ω$, let $V(T^{N}) = \{z^{1}, \dots, z^{N}\}$ denote the set of interior vertices of $T^{N}$, let $V(τ)$ be the set of vertices of $τ \in T^{N}$, and let $V^{N} = span\{φ^{1}, \dots, φ^{N}\}$ be the FE space generated by first-order Lagrange elements. It holds that $φ^{j}(z^{l}) = δ_{j l}$.

For $z^{j} \in V(T^{N})$, define a lumped mass region $Ω^{j}$ by joining the centroids of the triangles that have $z^{j}$ as a common vertex to the midpoints of the edges that have $z^{j}$ as a common endpoint. We define the two sets

V^{N} = \{v \in C (\bar{Ω}) : {v |}_{τ} \in P_{1} for all τ \in T^{N}\} \subset V, \quad H^{N} = \{v \in L^{\infty} (Ω) : v = \sum_{j = 1}^{N} 1_{Ω^{j}} v^{j}, v^{j} \in R\} \subset H

and the operator

R^{N} : C (\bar{Ω}) \to H^{N}, \quad R^{N} (ω) = \sum_{j = 1}^{N} 1_{Ω^{j}} ω (z^{j}) .

We consider the following semi-linear elliptic partial differential equation:

- ∆ y + f (y) = h in Ω, \frac{\partial y}{\partial n} + y = g on Γ = \partial Ω .

(A1)

A weak solution of (A1) is a function $y \in Y = V \cap L^{\infty}(Ω)$ such that

\int_{Ω} \nabla y \cdot \nabla φ + f (y) φ d x + \int_{Γ} y φ d s = \int_{Ω} h φ d x + \int_{Γ} g φ d s for all φ \in V .

(A2)

The function $f$ is supposed to be sufficiently smooth, bounded, and measurable for fixed $y$, monotonically increasing, and to satisfy $f \geq 0$. Moreover, $h \in H$ and $g \in L^{r}(Γ)$ with $r > d - 1$. Then, there exists a unique solution $y \in Y$ of (A1); cf., for instance, [26]. This solution is even continuous, and there is a constant $c_{\infty}$ such that

{∥ y ∥}_{V} + {∥ y ∥}_{C (\bar{Ω})} \leq c_{\infty} ({∥ h ∥}_{H} + {∥ g ∥}_{L^{r} (Γ)}) .

The lumped mass finite element approximation of (A1) is to find $y^{N} \in V^{N}$ such that

\int_{Ω} \nabla y^{N} \cdot \nabla φ + f (R^{N} (y^{N})) φ^{N} d x + \int_{Γ} y^{N} φ d s = \int_{Ω} h φ d x + \int_{Γ} g φ d s

(A3)

holds for all $φ \in V^{N}$, where $φ^{N} = R^{N}(φ)$.

Remark A1.

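For any $v \in C(\bar{Ω})$, it holds that $R^{N}(v) = \sum_{j = 1}^{N} 1_{Ω^{j}} v(z^{j})$. Since $v$ is uniformly continuous on the compact set $\bar{Ω}$ and the diameters of the regions $Ω^{j}$ tend to zero as the mesh is refined, we obtain $R^{N}(v) \to v$ for $N \to \infty$, which implies

{∥ R^{N} (v) - v ∥}_{H} \to 0 for N \to \infty .

◊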
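The convergence in Remark A1 can be observed numerically in a 1D setting (an assumption for brevity), where the lumped mass regions are the intervals $[z_j - h/2, z_j + h/2] \cap [0, 1]$. The sketch computes $\|R^N(v) - v\|_H$ for $v(x) = \sin(2πx)$ with a fine midpoint rule; for smooth v, the error is expected to decrease like O(h). The function name is hypothetical, and all vertices (including the boundary ones) are used.

```python
import numpy as np

def lumped_interpolant_error(v, n_el, n_quad=20000):
    """L2(0,1) error of R^N(v) = sum_j 1_{Omega^j} v(z^j) on a uniform 1D mesh.

    In 1D, Omega^j = [z_j - h/2, z_j + h/2] intersected with (0, 1), i.e. the set of
    points whose nearest vertex is z_j.
    """
    z = np.linspace(0.0, 1.0, n_el + 1)
    xq = (np.arange(n_quad) + 0.5) / n_quad                  # midpoint quadrature nodes
    nearest = np.clip(np.rint(xq * n_el).astype(int), 0, n_el)
    diff = v(xq) - v(z[nearest])                             # v - R^N(v) at the nodes
    return np.sqrt(np.mean(diff**2))                         # midpoint rule on (0, 1)

v = lambda x: np.sin(2.0 * np.pi * x)
errors = [lumped_interpolant_error(v, n) for n in (10, 20, 40, 80)]
print(errors)   # roughly halves with each mesh refinement
```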

Theorem A1.

Assume that $y \in V$ and $y^{N} \in V^{N}$ are solutions of (A2) and (A3), respectively. Then, it holds that

{∥ y - y^{N} ∥}_{V} \to 0 for N \to \infty .

Proof.

Because $y$ solves (A2), it holds that $y \in C(\bar{Ω})$. Let $y^{p} \in V^{N}$ be the nodal interpolant of $y$ in $V^{N}$. Since $y^{p}$ and $y$ coincide at the vertices, it follows that $R^{N}(y^{p}) = R^{N}(y)$. Utilizing (A3), we derive

- \int_{Ω} \nabla y^{N} \cdot \nabla y^{p} - \int_{Γ} y^{N} y^{p} d s = \int_{Ω} f (R^{N} (y^{N})) R^{N} (y) - h y^{p} d x - \int_{Γ} g y^{p} d s .

(A4)

Let $ε > 0$ be an arbitrary tolerance. For $N$ large enough, we have ${∥ y - y^{p} ∥}_{V} < ε$. Furthermore, we have ${∥ R^{N}(y) - y ∥}_{H} < ε$ and ${∥ R^{N}(y^{p} - y^{N}) - (y^{p} - y^{N}) ∥}_{H} < ε$. From (A4), we conclude

\begin{matrix} {∥ y^{N} - y ∥}_{V}^{2} & = \int_{Ω} | \nabla (y^{N} - y) |^{2} d x + \int_{Γ} {| y^{N} - y |}^{2} d s \\ = \int_{Ω} (\nabla (y^{N} - y) \cdot \nabla (y^{p} - y) + \nabla (y^{N} - y) \cdot \nabla y^{N} + \nabla (y - y^{N}) \cdot \nabla y^{p}) d x \\ + \int_{Γ} ((y^{N} - y) (y^{p} - y) + (y^{N} - y) y^{N} + (y - y^{N}) y^{p}) d s \\ = \int_{Ω} (\nabla (y^{N} - y) \cdot \nabla (y^{p} - y) + \nabla (y^{N} - y) \cdot \nabla y^{N} + \nabla y \cdot \nabla y^{p}) d x \\ + \int_{Γ} ((y^{N} - y) (y^{p} - y) + (y^{N} - y) y^{N} + y y^{p}) d s \\ + \int_{Ω} (f (R^{N} (y^{N})) R^{N} (y) - h y^{p}) d x - \int_{Γ} g y^{p} d s . \end{matrix}

Utilizing ${∥ y - y^{p} ∥}_{V} < ε$, ${∥ R^{N}(y) - y ∥}_{H} < ε$, and ${∥ R^{N}(y^{p} - y^{N}) - (y^{p} - y^{N}) ∥}_{H} < ε$, it follows that

\begin{matrix} \int_{Ω} (f (R^{N} (y^{N})) & R^{N} (y)) d x = {⟨ f (R^{N} (y^{N})), R^{N} (y) ⟩}_{H} \\ = {⟨ f (R^{N} (y^{N})) - f (R^{N} (y)), R^{N} (y) - R^{N} (y^{N}) ⟩}_{H} \\ + {⟨ f (R^{N} (y^{N})), R^{N} (y^{N}) ⟩}_{H} + {⟨ f (R^{N} (y)), R^{N} (y) - R^{N} (y^{N}) ⟩}_{H} \\ \leq {⟨ f (R^{N} (y^{N})), R^{N} (y^{N}) ⟩}_{H} + {⟨ f (R^{N} (y)), R^{N} (y) - R^{N} (y^{N}) ⟩}_{H} \\ = {⟨ f (R^{N} (y)), R^{N} (y - y^{N}) - (y - y^{N}) ⟩}_{H} + {⟨ f (y), y - y^{N} ⟩}_{H} \\ + {⟨ f (R^{N} (y)) - f (y), (y - y^{N}) ⟩}_{H} + {⟨ f (R^{N} (y^{N})), R^{N} (y^{N}) ⟩}_{H} \\ \leq c ({∥ y - y^{p} ∥}_{H} + {∥ R^{N} (y^{p} - y^{N}) - (y^{p} - y^{N}) ∥}_{H}) + {⟨ f (y), y - y^{N} ⟩}_{H} \\ + c {∥ R^{N} (y) - y ∥}_{H} {∥ y - y^{N} ∥}_{H} + {⟨ f (R^{N} (y^{N})), R^{N} (y^{N}) ⟩}_{H} \\ \leq c (2 ε + ε {∥ y - y^{N} ∥}_{H}) + {⟨ f (y), y - y^{N} ⟩}_{H} + {⟨ f (R^{N} (y^{N})), R^{N} (y^{N}) ⟩}_{H} . \end{matrix}

Inserting this estimate into the preceding identity yields

\begin{matrix} {∥ y - y^{N} ∥}_{V}^{2} & \leq ε {∥ y - y^{N} ∥}_{V}^{2} + \frac{{∥ y^{p} - y ∥}_{V}^{2}}{4 ε} + {⟨ \nabla y^{N}, \nabla (y^{N} - y) ⟩}_{H^{d}} + {⟨ \nabla y^{p}, \nabla y ⟩}_{H^{d}} \\ + {⟨ y^{N} - y, y^{p} - y ⟩}_{L^{2} (Γ)} + {⟨ y^{N}, y^{N} - y ⟩}_{L^{2} (Γ)} + {⟨ y^{p}, y ⟩}_{L^{2} (Γ)} - {⟨ y^{p}, h ⟩}_{H} \\ - {⟨ y^{p}, g ⟩}_{L^{2} (Γ)} + c ε + c ε {∥ y - y^{N} ∥}_{H} + {⟨ g, y - y^{N} ⟩}_{L^{2} (Γ)} - {⟨ y, y - y^{N} ⟩}_{L^{2} (Γ)} \\ + {⟨ h, y - y^{N} ⟩}_{H} - {⟨ \nabla y, \nabla (y - y^{N}) ⟩}_{H^{d}} + {⟨ f (R^{N} (y^{N})), R^{N} (y^{N}) ⟩}_{H} \\ \leq c ε {∥ y - y^{N} ∥}_{V}^{2} + \frac{ε^{2}}{4 ε} + c ε {⟨ \nabla y^{N}, \nabla y ⟩}_{H^{d}} + {⟨ \nabla y^{p}, \nabla y ⟩}_{H^{d}} \\ + {⟨ g, y - y^{p} ⟩}_{L^{2} (Γ)} + {⟨ h, y - y^{p} ⟩}_{H} + {⟨ y^{N}, y^{p} - y ⟩}_{L^{2} (Γ)} - {⟨ \nabla y, \nabla (y - y^{N}) ⟩}_{H^{d}} \\ \leq c ε + c ε {∥ y - y^{N} ∥}_{V}^{2} \end{matrix}

with $H^{d} = \otimes_{i = 1}^{d} H$. In the last inequality, we used the fact that $y^{p} \to y$ for $N \to \infty$. Therefore, for $ε$ small enough that $c ε < 1$, we get

{∥ y - y^{N} ∥}_{V}^{2} \leq c_{0} ε ,

and it follows that ${∥ y - y^{N} ∥}_{V} \to 0$ for $N \to \infty$. □

Remark A2.

For $y^{N} \in V^{N}$ and $j = 1, \dots, N$, we find

\int_{Ω} f (R^{N} (y^{N})) R^{N} (φ^{j}) d x = \sum_{i = 1}^{N} \int_{Ω^{i}} f (\sum_{l = 1}^{N} y_{l}^{N} 1_{Ω^{l}}) 1_{Ω^{j}} d x = \int_{Ω^{j}} f (\sum_{l = 1}^{N} y_{l}^{N} 1_{Ω^{l}}) d x = f (y_{j}^{N}) \int_{Ω^{j}} 1 d x = f (y_{j}^{N}) \sum_{τ \in T^{N} : z^{j} \in V (τ)} \frac{| τ |}{d + 1} = {\tilde{M}}_{j j} f (y_{j}^{N}) .

For the last step, we use

{\tilde{M}}_{j j} = \sum_{k = 1}^{N} M_{j k} = \sum_{τ \in T^{N} : z^{j} \in V (τ)} \int_{τ} φ_{j} d x = \sum_{τ \in T^{N} : z^{j} \in V (τ)} \frac{| τ |}{d + 1} .

A detailed proof of this equality can be found in ([36], Appendix A). ◊
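The identity at the end of Remark A2 can be checked numerically: on a structured triangulation of $(0, 1)^2$ (a hypothetical test mesh that uses all vertices), the row sums of the consistent P1 mass matrix coincide with $\sum_{τ : z^j \in V(τ)} |τ|/3$, i.e., $|τ|/(d + 1)$ with d = 2. The function names are hypothetical.

```python
import numpy as np

def unit_square_mesh(n):
    """Structured triangulation of (0,1)^2 with 2 n^2 triangles."""
    x = np.linspace(0.0, 1.0, n + 1)
    X, Y = np.meshgrid(x, x, indexing="ij")
    nodes = np.column_stack([X.ravel(), Y.ravel()])
    idx = lambda i, j: i * (n + 1) + j
    tris = []
    for i in range(n):
        for j in range(n):
            a, b, c, d = idx(i, j), idx(i + 1, j), idx(i + 1, j + 1), idx(i, j + 1)
            tris += [[a, b, c], [a, c, d]]
    return nodes, np.array(tris)

def p1_mass_matrix(nodes, tris):
    """Consistent P1 mass matrix; element matrix |tau|/12 * [[2,1,1],[1,2,1],[1,1,2]]."""
    M = np.zeros((len(nodes), len(nodes)))
    loc = (np.ones((3, 3)) + np.eye(3)) / 12.0
    areas = np.empty(len(tris))
    for t, tri in enumerate(tris):
        p0, p1, p2 = nodes[tri]
        areas[t] = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                             - (p2[0] - p0[0]) * (p1[1] - p0[1]))
        M[np.ix_(tri, tri)] += areas[t] * loc
    return M, areas

nodes, tris = unit_square_mesh(8)
M, areas = p1_mass_matrix(nodes, tris)
# Right-hand side of Remark A2 for d = 2: sum over triangles containing z^j of |tau|/3.
expected = np.zeros(len(nodes))
np.add.at(expected, tris.ravel(), np.repeat(areas, 3) / 3.0)
lumped_diag = M.sum(axis=1)                        # M_tilde_jj
print(np.max(np.abs(lumped_diag - expected)))
```

Each row of the element matrix sums to $|τ|/3$, which is exactly $\int_τ φ_j \, dx$ for a linear hat function on a triangle.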

References

Ehrgott, M. Multicriteria Optimization; Springer: Berlin/Heidelberg, Germany, 2005.
Iapichino, L.; Ulbrich, S.; Volkwein, S. Multiobjective PDE-constrained optimization using the reduced-basis method. Adv. Comput. Math. 2017, 43, 945–972.
Miettinen, K. Nonlinear Multiobjective Optimization; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1999.
Zadeh, L. Optimality and non-scalar-valued performance criteria. IEEE Trans. Autom. Control 1963, 8, 59–60.
Deb, K. Multi-Objective Optimization Using Evolutionary Algorithms; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2001.
Fliege, J.; Graña Drummond, L.M.; Svaiter, B.F. Newton’s Method for multiobjective optimization. SIAM J. Optim. 2008, 20, 602–626.
Fliege, J.; Svaiter, B.F. Steepest descent methods for multicriteria optimization. Math. Methods Oper. Res. 2000, 51, 479–494.
Schäffler, S.; Schultz, R.; Weinzierl, K. Stochastic method for the solution of unconstrained vector optimization problems. J. Optim. Theory Appl. 2002, 114, 209–222.
Banholzer, S.; Gebken, B.; Dellnitz, M.; Peitz, S.; Volkwein, S. ROM-based multiobjective optimization of elliptic PDEs via numerical continuation. arXiv 2019, arXiv:1906.09075v1.
Hillermeier, C. Nonlinear Multiobjective Optimization: A Generalized Homotopy Approach; Birkhäuser: Cambridge, MA, USA, 2001.
Schütze, O.; Dell’Aere, A.; Dellnitz, M. On continuation methods for the numerical treatment of multi-objective optimization problems. In Practical Approaches to Multi-Objective Optimization; Branke, J., Deb, K., Miettinen, K., Steuer, R.E., Eds.; Dagstuhl Seminar Proceedings: Dagstuhl, Deutschland, 2005.
Beermann, D.; Dellnitz, M.; Peitz, S.; Volkwein, S. Set-oriented multiobjective optimal control of PDEs using proper orthogonal decomposition. In Reduced-Order Modeling (ROM) for Simulation and Optimization: Powerful Algorithms as Key Enablers for Scientific Computing; Springer International Publishing: Cham, Switzerland, 2018; pp. 47–72.
Dellnitz, M.; Schütze, O.; Hestermeyer, T. Covering Pareto sets by multilevel subdivision techniques. J. Optim. Theory Appl. 2005, 124, 113–136.
Jahn, J. Multiobjective search algorithm with subdivision technique. Comput. Optim. Appl. 2006, 35, 161–175.
Schütze, O.; Witting, K.; Ober-Blöbaum, S.; Dellnitz, M. Set oriented methods for the numerical treatment of multiobjective optimization problems. In EVOLVE—A Bridge between Probability, Set Oriented Numerics and Evolutionary Computation; Tantar, E., Tantar, A.-A., Bouvry, P., Del Moral, P., Legrand, P., Coello Coello, C.A., Schütze, O., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 447, pp. 187–219.
Schilders, W.H.; Van der Vorst, H.A.; Rommes, J. Model Order Reduction; Springer: Berlin/Heidelberg, Germany, 2008.
Hesthaven, J.S.; Rozza, G.; Stamm, B. Certified Reduced Basis Methods for Parametrized Partial Differential Equations; SpringerBriefs in Mathematics: Heidelberg, Germany, 2016.
Patera, A.T.; Rozza, G. Reduced Basis Approximation and A Posteriori Error Estimation for Parametrized Partial Differential Equations; MIT Pappalardo Graduate Monographs in Mechanical Engineering: Cambridge, MA, USA, 2007.
Chaturantabut, S.; Sorensen, D.C. Nonlinear model reduction via discrete empirical interpolation. SIAM J. Sci. Comput. 2010, 32, 2737–2764.
Chaturantabut, S.; Sorensen, D.C. A state space estimate for POD-DEIM nonlinear model reduction. SIAM J. Numer. Anal. 2012, 50, 46–63.
Gebken, B.; Peitz, S.; Dellnitz, M. A descent method for equality and inequality constrained multiobjective optimization problems. In Numerical and Evolutionary Optimization—NEO 2017; Trujillo, L., Schütze, O., Maldonado, Y., Valle, P., Eds.; Springer: Cham, Switzerland, 2017.
Graña Drummond, L.M.; Svaiter, B.F. A steepest descent method for vector optimization. J. Comput. Appl. Math. 2005, 175, 395–414.
Reichle, L. Set-Oriented Multiobjective Optimal Control of Elliptic Non-Linear Partial Differential Equations Using POD Objectives and Gradient. Master’s Thesis, University of Konstanz, Konstanz, Germany, 2020. Available online: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-1h6pp1cxptbap6 (accessed on 15 April 2021).
Dellnitz, M.; Hohmann, A. A subdivision algorithm for the computation of unstable manifolds and global attractors. Numer. Math. 1997, 75, 293–316.
Dautray, R.; Lions, J.-L. Mathematical Analysis and Numerical Methods for Science and Technology. Volume 2: Functional and Variational Methods; Springer: Berlin/Heidelberg, Germany, 2000.
Tröltzsch, F. Optimal Control of Partial Differential Equations: Theory, Methods and Applications; American Mathematical Society: Providence, RI, USA, 2010; Volume 112.
Thomée, V. Galerkin Finite Element Methods for Parabolic Problems; Springer: Berlin/Heidelberg, Germany, 2006.
Zienkiewicz, O.C.; Taylor, R.L. The Finite Element Method, 5th ed.; Butterworth-Heinemann: Oxford, UK, 2000; Volume 3.
Brenner, S.; Scott, R. The Mathematical Theory of Finite Element Methods; Springer: Berlin/Heidelberg, Germany, 2008.
Rösch, A.; Wachsmuth, G. Mass lumping for the optimal control of elliptic partial differential equations. SIAM J. Numer. Anal. 2017, 55, 1412–1436.
Quarteroni, A.; Manzoni, A.; Negri, F. Reduced Basis Methods for Partial Differential Equations; Springer: Berlin/Heidelberg, Germany, 2016.
Grepl, M.A.; Maday, Y.; Nguyen, N.C.; Patera, A.T. Efficient reduced-basis treatment of nonaffine and nonlinear partial differential equations. ESAIM Math. Model. Numer. Anal. 2007, 41, 575–605.
Hinze, M.; Korolev, D. Reduced basis methods for quasilinear elliptic PDEs with applications to permanent magnet synchronous motors. arXiv 2020, arXiv:2002.04288.
Rogg, S.; Trenz, S.; Volkwein, S. Trust-region POD using a-posteriori error estimation for semilinear parabolic optimal control problems. Konstanz. Schr. Math. 2017, 359. Available online: https://kops.uni-konstanz.de/handle/123456789/38240 (accessed on 15 April 2021).
Zeng, J.; Yu, H. Error estimates of the lumped mass finite element method for semilinear elliptic problems. J. Comput. Appl. Math. 2012, 236, 993–1004.
Bernreuther, M. RB-Based PDE-Constrained Non-Smooth Optimization. Master’s Thesis, University of Konstanz, Konstanz, Germany, 2019. Available online: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-t4k1djyj77yn3 (accessed on 15 April 2021).

Figure 1. FE solutions of (25) for parameters (a) $\mu = (0, 0)$, (b) $\mu = (0, 1)$ and (c) $\mu = (1, 0)$.

Figure 2. Pareto-critical set $P_c$ in iteration steps (a) 20 and (b) 25 for the FE solver.

Figure 3. (a) $P_1^{\ell}$ and (b) $P_2^{\ell}$ generated with the subdivision algorithm.

Figure 4. (a) $P_1^{\ell}$ and (b) $P_2^{\ell}$ generated with the modified subdivision algorithm.

Figure 5. Difference $\|\nabla \hat{J}_1^{N}(\mu) - \nabla \hat{J}_1^{\ell}(\mu)\|_2$ for $\mu \in P_{ad}$.

Table 1. Performance of the subdivision algorithm with the FE solver and the steepest descent method.

FE
It. Step	Time [s]	# Grad. Solves	# Del. Boxes	# Boxes
10	370.7	6872	30	64
15	1077.1	16786	166	322
20	1239.1	20092	938	1388
25	3364.2	64721	2567	4749
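Tables 1, 2 and 4 use the steepest descent method for multiobjective problems (cf. Graña Drummond and Svaiter in the reference list). For two objectives, assuming exact gradients and no constraints, this direction has a closed form: it is the negative of the minimal-norm element of the convex hull of the two gradients, and it vanishes exactly at Pareto-critical points. The NumPy sketch below illustrates this on a hypothetical toy problem; the function name `common_descent_direction` and the quadratic objectives are illustrative assumptions, and the sketch does not implement the modified descent direction for inexact gradients used in Tables 3 and 5.

```python
import numpy as np


def common_descent_direction(g1: np.ndarray, g2: np.ndarray) -> np.ndarray:
    """Steepest descent direction for two objectives with exact gradients.

    Hypothetical helper (not from the paper). Returns
    q = -(alpha * g1 + (1 - alpha) * g2), where alpha in [0, 1] minimizes
    ||alpha * g1 + (1 - alpha) * g2||_2, i.e. the vector in parentheses is
    the minimal-norm element of the convex hull of {g1, g2}.
    """
    d = g1 - g2
    denom = float(d @ d)
    if denom == 0.0:
        # Identical gradients: every convex combination is the same vector.
        alpha = 0.5
    else:
        # ||g2 + alpha * d||^2 is a convex quadratic in alpha; take its
        # unconstrained minimizer and project it onto [0, 1].
        alpha = float(np.clip(-(g2 @ d) / denom, 0.0, 1.0))
    return -(alpha * g1 + (1.0 - alpha) * g2)


# Hypothetical toy problem (not from the paper):
# J1(mu) = ||mu - a||^2 and J2(mu) = ||mu - b||^2 with a = (0, 0), b = (1, 0).
a = np.array([0.0, 0.0])
b = np.array([1.0, 0.0])


def grads(mu):
    """Exact gradients of the two toy objectives at mu."""
    return 2.0 * (mu - a), 2.0 * (mu - b)


for mu in (np.array([0.5, 0.0]), np.array([0.5, 1.0])):
    q = common_descent_direction(*grads(mu))
    print(mu, q)
```

At $\mu = (0.5, 0)$, which lies on the Pareto set of the toy problem (the segment between $a$ and $b$), the direction is zero; at $\mu = (0.5, 1)$ it equals $(0, -2)$ and decreases both objectives.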

Table 2. Subdivision algorithm with the steepest descent method.

RB-DEIM
It. Step	Time [s]	# Grad. Solves	# Del. Boxes	# Boxes	ε
10	18.8	6809	32	64	(0.0491, 0)
15	42.4	14279	168	326	(0.0491, 0)
20	44.3	13781	622	3412	(0.0491, 0)
25	848.6	196405	149	97551	(0.0491, 0)

Table 3. Subdivision algorithm with the modified descent direction.

RB-DEIM
It. Step	Time [s]	# Grad. Solves	# Del. Boxes	# Boxes	ε
10	21.0	6792	31	63	(0.0491, 0)
15	43.1	15546	167	315	(0.0491, 0)
20	38.0	12655	878	1438	(0.0491, 0)
25	289.5	64606	233	30607	(0.0491, 0)

Table 4. Modified subdivision algorithm with the steepest descent method.

RB-DEIM
It. Step	Time [s]	# Grad. Solves	# Del. Boxes	# Boxes	ε
10	18.5	6829	32	64	(0.041, 0)
15	44.1	16307	170	318	(0.0084, 0)
20	45.6	16677	919	1357	(0.0072, 0)
25	121.2	29811	560	12230	(0.0072, 0)

Table 5. Modified subdivision algorithm with the modified descent direction.

RB-DEIM
It. Step	Time [s]	# Grad. Solves	# Del. Boxes	# Boxes	ε
10	18.7	6805	31	263	(0.041, 0)
15	44.1	16459	168	318	(0.0084, 0)
20	49.7	18266	937	1329	(0.0072, 0)
25	109.7	37170	1615	4129	(0.0043, 0)

Table 6. Total runtime, speed-up relative to the FE solver, and number of gradient solves for the different methods.

Algorithm	Solver	Method	Time [min]	Speed-Up	# Grad. Solves
subdivision	FE	steep. desc.	495.83	-	610728
subdivision	RB-DEIM	steep. desc.	41.77	11.9	622034
subdivision	RB-DEIM	mod. desc.	24.77	20.0	390362
mod. subdivision	RB-DEIM	steep. desc.	21.00	23.6	372551
mod. subdivision	RB-DEIM	mod. desc.	21.91	22.6	412667
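The speed-up column in Table 6 is the ratio of the total FE runtime to the total runtime of each RB-DEIM variant. The short Python check below reproduces it from the Time [min] column; it uses only the numbers printed in the table, and the dictionary layout is merely an illustrative choice.

```python
# Total runtimes in minutes, copied from Table 6, keyed by (algorithm, solver, method).
runtimes_min = {
    ("subdivision", "FE", "steep. desc."): 495.83,
    ("subdivision", "RB-DEIM", "steep. desc."): 41.77,
    ("subdivision", "RB-DEIM", "mod. desc."): 24.77,
    ("mod. subdivision", "RB-DEIM", "steep. desc."): 21.00,
    ("mod. subdivision", "RB-DEIM", "mod. desc."): 21.91,
}

# The FE run is the reference: speed-up = T_FE / T_method.
t_fe = runtimes_min[("subdivision", "FE", "steep. desc.")]
speedups = {key: t_fe / t for key, t in runtimes_min.items() if key[1] != "FE"}

for (algorithm, solver, method), s in speedups.items():
    print(f"{algorithm:<17} {solver:<8} {method:<13} {s:5.1f}")
```

Note that the plain subdivision algorithm with RB-DEIM needs slightly more gradient solves than the FE run (622034 vs. 610728) and is still about twelve times faster, so this gain comes from the cheaper reduced-order solves rather than from fewer solves.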

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
