Article

Gaussian Process Regression Based Multi-Objective Bayesian Optimization for Power System Design

Systems Engineering Laboratory, University of Applied Sciences, Lothstrasse 64, 80335 München, Germany
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(19), 12777; https://doi.org/10.3390/su141912777
Submission received: 25 August 2022 / Revised: 22 September 2022 / Accepted: 29 September 2022 / Published: 7 October 2022
(This article belongs to the Special Issue Intelligent Mechatronic and Renewable Energy Systems)

Abstract

Within a disruptively changing environment, design of power systems becomes a complex task. Meeting multi-criteria requirements with increasing degrees of freedom in design and simultaneously decreasing technical expertise strengthens the need for multi-objective optimization (MOO) making use of algorithms and virtual prototyping. In this context, we present Gaussian Process Regression based Multi-Objective Bayesian Optimization (GPR-MOBO) with special emphasis on its profound theoretical background. A detailed mathematical framework is provided to derive a computer-implementable GPR-MOBO algorithm. We quantify GPR-MOBO effectiveness and efficiency by the hypervolume and the number of computationally expensive simulations required to identify Pareto-optimal design solutions, respectively. For validation purposes, we benchmark our GPR-MOBO implementation on a mathematical test function with analytically known Pareto front and compare results to those of the well-known algorithms NSGA-II and pure Latin Hypercube Sampling. To rule out effects of randomness, we include statistical evaluations. GPR-MOBO turns out to be an effective and efficient approach that outperforms state-of-the-art approaches, with increasing value-add when simulations are computationally expensive and the number of design degrees of freedom is high. Finally, we provide an example of GPR-MOBO based power system design and optimization that demonstrates both the methodology itself and its performance benefits.

1. Design of Complex Power Systems

Design of power systems is facing a triple disruptive upheaval: (i) primary energy sources are being converted from fossil to renewable, (ii) grid topologies are moving from centralized to decentralized dominance, and (iii) previously separate sectors (heat, electricity, and mobility) are merging. Driven by these changes, emerging technologies will replace established ones, while at the same time the number of degrees of freedom in facility design will increase. The resulting VUCA (volatile, uncertain, complex and ambiguous) [1] environment poses significant risks for all involved stakeholders throughout the value chain, from product owners through system architects to operators and consumers. Achieving simultaneous ecological and economic success of power plants requires the ability to identify system design alternatives that are multi-objective optimal in terms of the cost-benefit trade-off for individual use cases.
The lack of technical experience around emerging or new technologies significantly increases the challenge of optimization. Virtual prototyping (VP) guided by engineering experience, therefore, may be considered the standard approach for today’s energy facility design. However, with an increasing number of technology alternatives and decreasing system-level technical experience, the dependence between objectives and design parameters is unknown in many cases. Power system design, thereby, becomes a black box multi-objective optimization (MOO) problem.
VP-based MOO guided by engineering intuition again fails when engineering experience is lacking and full or fractional factorial coverage of the design space is not a viable option due to (i) the dimensionality (i.e., number of degrees of freedom) of the design space and (ii) the time-consuming and costly effort required to simulate power systems. Therefore, gaining engineering knowledge about the relationship between system design objectives as a function of design parameters based on as few simulations as possible is of paramount importance.
VUCA driven power system design, accordingly, requires MOO algorithms being
(i)
effective, i.e., knowledgeable of multi-objective (MO) related optimal cost-benefit (later on called Pareto-optimal) trade-offs and
(ii)
efficient, i.e., requiring only a limited number of (computationally expensive) simulations for quantifying these trade-offs.
While a broad variety of MOO approaches in black box environments deals with effective algorithms, only few of them meet the efficiency criterion [2]. Specifically, Bayesian Optimization (BO) [3,4,5,6] algorithms based on Gaussian Process Regression (GPR) [3,7,8,9] appear as promising effective and efficient MOO candidates.
GPR [10,11] provides a regression model that (in contrast to alternative methodical approaches) (a) does not suffer from the “curse of dimensionality” [12] ([Section 3.1]), and (b) inherently provides a quantification of regression uncertainty of the model. This uncertainty is exploited by the BO approach [13] which in turn is used to optimize the surrogate. For a more detailed and fundamental comparison of MOO approaches, we recommend [14].
In this paper, we present the mathematical background of GPR-based Bayesian Optimization (GPR-MOBO) in detail. This includes statements and selected proofs of key results. With that theoretical foundation on hand, we derive a computer implementation of the introduced GPR-MOBO algorithm, quantify its effectiveness and efficiency, and demonstrate the superiority of GPR-MOBO over state-of-the-art MOO algorithms, including a GPR-MOBO application to a power system design example. The paper is structured as follows: Section 2 restates the introductory problem in mathematical terms, including definitions such as “Pareto-optimality” or “hypervolume”. Section 3 explains Gaussian Process Regression (GPR) and a Bayesian Optimization (BO) based on it before a bipartite validation follows in Section 4: First, the superiority of the presented approach is validated via a mathematical test function before the proposed GPR-MOBO approach is applied to the design and optimization of a real-life power system. Section 5 discusses our results before Section 6 gives a brief summary and highlights ideas for future work.

2. Problem Statement in Mathematical Terms

Within this section, we phrase the task of effectively and efficiently identifying objective trade-offs in mathematical terms. For this purpose, we consider an (unknown) $t$-dimensional black box function
$f : X \to \mathbb{R}^t.$
We refer to $X \subseteq \mathbb{R}^d$ as the design space with target space $\mathbb{R}^t$ for $d, t \in \mathbb{N}$. Assume further that we can sample f at a finite number of points, i.e., choosing $x_1, \ldots, x_N \in X$ we obtain $f(x_1), \ldots, f(x_N)$. We translate the MOO problem
$d_{\mathrm{opt}} = \operatorname{argmin}_{d \in X} f(d)$
of identifying an optimal design $d_{\mathrm{opt}} \in X$ into finding solutions of f which represent non-dominated (Pareto-optimal) trade-offs of f, i.e., we are looking for Pareto points of f:
Definition 1 (Pareto point and front).
We write
$t \preceq s \;:\Leftrightarrow\; \forall i: t_i \le s_i \qquad \text{and} \qquad t \prec s \;:\Leftrightarrow\; \forall i: t_i < s_i.$
Then, given a set $S \subseteq \mathbb{R}^t$, a point $s \in S$ is called a Pareto point if there exists no other point $t \in S$ satisfying $t \preceq s$ and $t_i < s_i$ for some component i. The set of all Pareto (optimal) points is called the Pareto front of S.
More generally, an $x \in X$ is a Pareto point of f if $f(x)$ is a Pareto point of $f(X) = \{ f(x) : x \in X \}$. The set of all such points is called the Pareto front of f.
Based on this definition, we call a MOO algorithm effective whenever it is capable of identifying the set of non-dominated (Pareto-optimal) trade-offs for the (unknown) black box function. As a measure of effectiveness, we now define the so-called hypervolume.
Definition 2 (Hypervolume).
Denote by
$I : \mathbb{R}^t \times \mathbb{R}^t \to \mathcal{P}(\mathbb{R}^t), \quad (x, r) \mapsto \{ z \in \mathbb{R}^t : x \preceq z \preceq r \}$
the function sending two t-dimensional real vectors to the cube bounded by them, where $\mathcal{P}$ denotes the power set. The hypervolume of some (finite) set $Y \subseteq \mathbb{R}^t$ with respect to a reference point $r \in \mathbb{R}^t$ is given by the Lebesgue measure
$\mathrm{HV}_r(Y) = \lambda_t\Big( \bigcup_{y \in Y} I(y, r) \Big)$
of the union over all cubes bounded by the reference point and by some point in Y.
Figure 1 illustrates the hypervolume by an example in two dimensions.
The Hypervolume is closely related to Pareto points in the following sense.
Proposition 1.
Let $S \subseteq \mathbb{R}^t$ be quasi-compact (i.e., bounded and closed) and $U \subseteq S$ a finite subset. Let $r \in \mathbb{R}^t$ be such that
$\mathrm{HV}_r(U \cup \{s\}) > \mathrm{HV}_r(U)$
for some $s \in S$. Then,
$p^* = \operatorname{argmax}_{p \in S} \mathrm{HV}_r(U \cup \{p\})$
is a Pareto point of S.
Proof. 
See Appendix A.  □
In simple words, Proposition 1 states that maximizing the hypervolume by adding an image (black box function) point implies that this point is a Pareto point. Accordingly, the hypervolume is a suitable indicator of MOO effectiveness, while the number of (simulation) samples required to find such Pareto points is a suitable measure of efficiency.
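To make Definition 2 and Proposition 1 tangible, the following minimal Python sketch computes the hypervolume of a small point set by inclusion-exclusion over the boxes $I(y, r)$ (the additivity argument also used in Appendix C). The function name, the example points, and the reference point are illustrative assumptions of ours, not part of the paper's implementation, and the inclusion-exclusion approach is only practical for small sets.

import numpy as np
from itertools import combinations

def hypervolume(points, ref):
    # Exact hypervolume HV_r(Y) of a finite set Y (minimization) via
    # inclusion-exclusion over the boxes I(y, r); exponential in |Y|,
    # intended for small illustrative sets only.
    pts = [np.asarray(p, dtype=float) for p in points]
    r = np.asarray(ref, dtype=float)
    hv = 0.0
    for k in range(1, len(pts) + 1):
        for subset in combinations(pts, k):
            lower = np.max(np.vstack(subset), axis=0)   # I(y1,r) ∩ ... ∩ I(yk,r) = I(max, r)
            hv += (-1.0) ** (k + 1) * np.prod(np.clip(r - lower, 0.0, None))
    return hv

# Three non-dominated points in R^2 as in Figure 1, reference point r = (1, 1).
Y = [(0.1, 0.8), (0.4, 0.4), (0.8, 0.1)]
r = (1.0, 1.0)
print(hypervolume(Y, r))                                      # HV_r(Y)
print(hypervolume(Y + [(0.3, 0.3)], r) - hypervolume(Y, r))   # hypervolume gain of a new point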

3. Bayesian Optimization Based on Gaussian Process Regression

Gaussian Process Regression (GPR) based Bayesian Optimization (BO) using Expected Hypervolume Improvement (EHVI, see Section 3.3) as acquisition function is a promising algorithmic approach to meet the simultaneous goals of effectiveness and efficiency. The following sub-sections introduce the general mathematical GPR background (Section 3.1), the choice of GPR related hyperparameters (Section 3.2) and GPR based multi-objective BO (GPR-MOBO, Section 3.3) before summarizing these sub-sections as a mathematical basis for the subsequent algorithmic implementation.

3.1. Gaussian Process Regression

We summarize and recall the definition, statements and formulas needed in order to properly apply Gaussian Process Regression (GPR).

3.1.1. Multivariate Normal Distribution

Let $n \in \mathbb{N}$ be a positive integer and $C \in M_n(\mathbb{R})$ be a real, positive definite matrix of dimension n, with $M_n(\mathbb{R})$ being the space of $n \times n$ matrices with values in $\mathbb{R}$. Let $m \in \mathbb{R}^n$ be an n-dimensional real vector. Recall the multivariate normal distribution $\mathcal{N}(m, C)$ to be the probability measure on $\mathbb{R}^n$ induced by the density function
$f_{C,m} : \mathbb{R}^n \to \mathbb{R}, \quad x \mapsto \frac{\exp\big( -0.5\, (x - m)^\top C^{-1} (x - m) \big)}{\sqrt{\det(C)\, (2\pi)^n}}.$
The vector m is called mean(-vector) and the matrix C is called covariance matrix of $\mathcal{N}(m, C)$.
Multivariate normal distributions are stable under conditioning in the following sense.
Theorem 2.
Let $X_1, X_2$ be two random variables such that $(X_1, X_2)$ is multivariate normal $\mathcal{N}(m, C)$-distributed with mean $m = (m_1, m_2) \in \mathbb{R}^p \times \mathbb{R}^q$ and covariance matrix
$C = \begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix} \in M_{p+q}(\mathbb{R}), \quad C_1 \in M_p(\mathbb{R}).$
Then, given some $x \in \mathbb{R}^q$ in the co-domain of $X_2$, the conditional density function of $X_1$ given $X_2 = x$ is given by $f_{C', m'}$ with
$m' = m_1 + C_2 C_4^{-1} (x - m_2)$
and
$C' = C_1 - C_2 C_4^{-1} C_3.$
Proof. 
Theorem 2.5.1 in [15].  □
In particular, the conditional distribution of a multivariate normal distribution turns out to be multivariate normal as well.
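As a small illustration of Theorem 2, the following Python sketch conditions a jointly normal vector on its last components; the function name and the toy numbers are our own and merely demonstrate the closed-form conditioning formulas above.

import numpy as np

def condition_gaussian(m, C, p, x2):
    # Conditional distribution of X1 given X2 = x2 for (X1, X2) ~ N(m, C),
    # with X1 of dimension p (Theorem 2): returns conditional mean and covariance.
    m, C, x2 = np.asarray(m, float), np.asarray(C, float), np.asarray(x2, float)
    m1, m2 = m[:p], m[p:]
    C1, C2, C3, C4 = C[:p, :p], C[:p, p:], C[p:, :p], C[p:, p:]
    C4_inv = np.linalg.inv(C4)
    return m1 + C2 @ C4_inv @ (x2 - m2), C1 - C2 @ C4_inv @ C3

# Bivariate example (p = q = 1): conditioning shifts the mean and shrinks the variance.
m = [0.0, 0.0]
C = [[1.0, 0.8],
     [0.8, 1.0]]
print(condition_gaussian(m, C, p=1, x2=[1.0]))   # conditional mean 0.8, conditional variance 0.36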

3.1.2. Stochastic Processes

Let $(\Omega, \mathcal{F}, P)$ be a probability space, I a set and $(S, \mathcal{S})$ a measurable space.
Definition 3.
A stochastic process with state space S and index set I is a collection of random variables $X_i : \Omega \to S$, $i \in I$.
Remark 1.
Recall that arbitrary products of measurable spaces exist and their underlying sets are given by the Cartesian products of sets. By the universal property of the product, a stochastic process $X_i$, $i \in I$, therefore, consists of the same data as a measurable function $X : \Omega \to S^I$.
Given a stochastic process $X : \Omega \to S^I$, in practice, we are mostly interested in the induced measure $P_X = P(X^{-1}(\cdot))$ on $S^I$. On the other hand, given such a probability measure $P_I$ on $S^I$, we obtain a stochastic process $pr_i : S^I \to S$ given by the canonical projections. In that sense, a stochastic process may be seen as a proper construction of a probability measure on the product space $S^I$.

3.1.3. Gaussian Process

With the definition of stochastic processes on hand, we can generalize the multivariate normal distribution (defined on finite products of real numbers) to possibly infinite products of real numbers in the following sense.
Definition 4 (Gaussian Process).
Let X be a set. A Gaussian Process with index set X is a family of real valued random variables $(r_x)_{x \in X}$ such that for every finite subset $U \subseteq X$, the random variable $(r_u)_{u \in U}$ is multivariate normal distributed.
Recall that by the above, this induces a probability measure on $\mathbb{R}^X$. We can “construct” Gaussian Processes in the following way:
Theorem 3.
Let X be a set, $C : X \times X \to \mathbb{R}$ be a positive quadratic form in the sense that for every finite subset $U \subseteq X$ the induced matrix
$C_U = \big( C(u_1, u_2) \big)_{u_1, u_2 \in U}$
is positive definite, and let $m : X \to \mathbb{R}$ be a function. Given a subset $U \subseteq X$, denote by
$m_U = \big( m(u) \big)_{u \in U}$
the induced vector.
Then, there exists a unique probability measure P on $\mathbb{R}^X$ satisfying
$P_{pr_U} = P \circ pr_U^{-1} = \mathcal{N}(m_U, C_U)$
for all finite $U \subseteq X$, where $pr_U : \mathbb{R}^X \to \mathbb{R}^U$ denotes the canonical projection.
The function C is called covariance function and m is called mean function of P.
Proof. 
See Appendix B.  □
In other words, we construct Gaussian Processes by choosing a positive quadratic form C further referred to as covariance function and a mean function m.
Example 4 (Squared exponential kernel).
The squared exponential kernel
$k_{l, \sigma_c^2} : X \times X \to \mathbb{R}, \quad (x, y) \mapsto \sigma_c^2 \exp\Big( -\frac{1}{2 l^2} \| x - y \|_2^2 \Big)$
is a covariance function (i.e., a positive quadratic form; see [16]) for every $l, \sigma_c > 0$. The parameter l is called lengthscale and the parameter $\sigma_c^2$ is called output variance. Other covariance functions may also be found in [16].
Example 5 (Covariance with white Gaussian noise).
Let m be a function and C be a covariance function. Given $U = \{ x_1, \ldots, x_n \} \subseteq X$, the reader may convince himself that the function
$C_{\mathrm{wgn}} : X \times X \to \mathbb{R}, \quad (x, y) \mapsto \begin{cases} C(x, x) + \sigma^2 & x = y \in U \\ C(x, y) & \text{else} \end{cases}$
is a positive quadratic form for each $\sigma^2 > 0$. Note that $\sigma^2$ may be considered as a hyperparameter.
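A minimal Python sketch of Examples 4 and 5 is given below; the helper names squared_exponential_kernel and covariance_matrix, as well as the sample points, are illustrative assumptions and not part of the paper's Matlab implementation.

import numpy as np

def squared_exponential_kernel(x, y, lengthscale=1.0, output_variance=1.0):
    # k_{l, sigma_c^2}(x, y) from Example 4.
    x, y = np.atleast_1d(x).astype(float), np.atleast_1d(y).astype(float)
    return output_variance * np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale ** 2))

def covariance_matrix(U, lengthscale=1.0, output_variance=1.0, noise=0.0):
    # Induced matrix C_U over a finite set U (Theorem 3), optionally with
    # white Gaussian noise sigma^2 on the diagonal as in Example 5.
    U = np.atleast_2d(U).astype(float)
    K = np.array([[squared_exponential_kernel(a, b, lengthscale, output_variance) for b in U] for a in U])
    return K + noise * np.eye(len(U))

print(covariance_matrix([[0.0], [0.5], [1.0]], lengthscale=0.5, noise=1e-6))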
Combining Theorems 2 and 3, we derive an appropriate “conditioned” Gaussian Process.
Corollary 6.
Let $(r_x)_{x \in X}$ be a Gaussian Process with index set X,
$C : X \times X \to \mathbb{R}$
its covariance function and
$m : X \to \mathbb{R}$
its mean function. Let $U \subseteq X$ be a finite subset consisting of n elements and $y \in \mathbb{R}^n$.
Then, there exists a unique probability measure $P'$ on $\mathbb{R}^X$ such that for every finite subset $J \subseteq X \setminus U$ the density function of $P' \circ pr_J^{-1}$ is given by the conditional density function of $\big( (r_j)_{j \in J}, (r_u)_{u \in U} \big)$ given $(r_u)_{u \in U} = y$.
Its mean function $m'$ and covariance function $C'$ are constructed as follows: For every $x \in X$ define
$C_{x, U} = \big( C(x, u) \big)_{u \in U}.$
Then,
$m'(x) = m(x) + C_{x, U}\, C_U^{-1} ( y - m_U )$
and
$C'(x_1, x_2) = C(x_1, x_2) - C_{x_1, U}\, C_U^{-1}\, C_{x_2, U}^{\top}.$
Proof. 
See Appendix B.  □

3.1.4. Gaussian Process Regression

Consider a supervised learning problem
$f : X \to \mathbb{R}$
with training points
$T = \{ (x_1, y_1), \ldots, (x_N, y_N) \} \subseteq X \times \mathbb{R}.$
The task is to find an appropriate approximation of the unknown (“black box”) function f. To solve this task, we may use Gaussian Process regression, the idea of which is to
(i)
define a Gaussian Process on X by defining a mean and covariance function on X (Theorem 3),
(ii)
condition that Gaussian Process in the sense of Corollary 6 with $U = \{ x_1, \ldots, x_N \}$ and $y = (y_1, \ldots, y_N)$ and
(iii)
use $m'$ from Equation (2) as approximation of f.
A GPR for f and T is then the data of a Gaussian Process on X conditioned to $U = \{ x_1, \ldots, x_N \}$ and $y = (y_1, \ldots, y_N)$.
Remark 2.
By its very nature, a GPR is equipped with a natural measure of prediction uncertainty. Instead of a single point prediction y for $f(x)$ with $x \in X$, we obtain a probability distribution
$\mathcal{N}\big( m'(x), C'(x, x) \big).$
We interpret
$\mathcal{N}\big( 0, C'(x, x) \big)$
as the uncertainty in the prediction $m'(x)$ at x.
Figure 2 illustrates the conditioning of a GPR to some new evaluations.
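For illustration, a compact Python sketch of steps (i)–(iii), i.e., conditioning a zero-mean Gaussian Process on training data according to Corollary 6, could look as follows. The function gpr_predict and its arguments are our own naming, the kernel and data are toy choices, and the Cholesky-based solve anticipates the efficiency remark in Section 4.1.

import numpy as np

def gpr_predict(x_star, X_train, y_train, kernel, mean=lambda x: 0.0, noise=1e-8):
    # Posterior mean m'(x*) and variance C'(x*, x*) of a GPR conditioned on
    # the training data (Corollary 6), for a single output dimension.
    X = np.atleast_2d(X_train).astype(float)
    y = np.asarray(y_train, float)
    x_star = np.atleast_1d(x_star).astype(float)
    C_U = np.array([[kernel(a, b) for b in X] for a in X]) + noise * np.eye(len(X))
    c_xU = np.array([kernel(x_star, b) for b in X])
    m_U = np.array([mean(a) for a in X])
    L = np.linalg.cholesky(C_U)                 # Cholesky solve instead of explicit inverse (cf. Section 4.1)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - m_U))
    v = np.linalg.solve(L, c_xU)
    return mean(x_star) + c_xU @ alpha, kernel(x_star, x_star) - v @ v

# Conditioning on a few samples of sin(x), as illustrated in Figure 2.
k = lambda a, b: np.exp(-np.sum((np.atleast_1d(a) - np.atleast_1d(b)) ** 2) / (2.0 * 0.5 ** 2))
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = np.sin(np.ravel(X_train))
print(gpr_predict([1.5], X_train, y_train, k))   # posterior mean and variance at x = 1.5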

3.2. GPR Hyperparameter Adaption

Using a GPR for supervised learning problems requires the choice of some (initial) mean function and covariance function (Theorem 3). Most examples of covariance functions involve the choice of hyperparameters. Example 4 involves the choice of a lengthscale and output variance.
Consider a supervised learning problem $f : X \to \mathbb{R}$ with training points $T = \{ (x_1, y_1), \ldots, (x_N, y_N) \} \subseteq X \times \mathbb{R}$. Given a mean function m and a family of covariance functions $C_\theta$ with $\theta$ an element of some index set $\vartheta$, we choose a hyperparameter $\theta$ by following the maximum likelihood principle.
Denote by $f_{C, m, \theta}$ the density function of the multivariate normal distribution
$\mathcal{N}\big( m_{\{x_1, \ldots, x_N\}},\; C_{\theta, \{x_1, \ldots, x_N\}} \big).$
We choose $\theta$ by solving
$\theta^* = \operatorname{argmax}_{\theta \in \vartheta} \big( f_{C, m, \theta}( (y_i)_{i = 1, \ldots, N} ) \big).$
Remark 3.
In practice, one often replaces $f_{C, m, \theta}$ with $\ln f_{C, m, \theta}$ and solves
$\theta^* = \operatorname{argmax}_{\theta \in \vartheta} \big( \ln f_{C, m, \theta}( (y_i)_{i = 1, \ldots, N} ) \big),$
resulting in identical parameters. However,
$\ln f_{C, m, \theta}(y) = -\tfrac{1}{2}\, y^\top C_{\theta, \{x_1, \ldots, x_N\}}^{-1}\, y - \tfrac{1}{2} \ln\big( \det( C_{\theta, \{x_1, \ldots, x_N\}} ) \big) - \tfrac{N}{2} \ln( 2\pi )$
is more convenient to work with.
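A hedged Python sketch of this hyperparameter adaption for the squared exponential kernel (Example 4) with a zero mean function is given below; scipy's general-purpose minimize stands in for the multi-start fmincon/GlobalSearch setup the paper uses later in Section 4.1, and all names and toy data are ours.

import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(theta, X, y):
    # Negative of the log marginal likelihood in Remark 3 for a zero mean and the
    # squared exponential kernel, theta = (log lengthscale, log output variance).
    l, var = np.exp(theta)
    X = np.atleast_2d(X).astype(float)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    C = var * np.exp(-0.5 * d2 / l ** 2) + 1e-8 * np.eye(len(X))   # jitter for numerical stability
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * y @ alpha + 0.5 * log_det + 0.5 * len(y) * np.log(2.0 * np.pi)

# Maximum likelihood hyperparameters for a toy data set; a multi-start strategy
# (cf. GlobalSearch in Section 4.1) is advisable for this generally non-convex problem.
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(4.0 * X).ravel()
res = minimize(neg_log_marginal_likelihood, x0=np.log([0.3, 1.0]), args=(X, y))
print(np.exp(res.x))   # fitted (lengthscale, output variance)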

3.3. Bayesian Optimization

We define the hypervolume improvement as the gain in hypervolume when adding new points. In the literature, the underlying function used for calculating new sample (or infill) points is called the acquisition function.
Definition 5 (Hypervolume Improvement).
Given a reference point $r \in \mathbb{R}^t$ and a finite set of vectors $F \subseteq \mathbb{R}^t$, the hypervolume improvement of some $y \in \mathbb{R}^t$ is defined as
$\mathrm{HVI}_r(F, y) = \mathrm{HV}_r( F \cup \{y\} ) - \mathrm{HV}_r(F).$
We denote by
$\mathrm{HVI}_{r, F} : \mathbb{R}^t \to \mathbb{R}, \quad y \mapsto \mathrm{HVI}_r(F, y)$
the resulting function. We often write $\mathrm{HVI}_r$ instead of $\mathrm{HVI}_{r, F}$ whenever F is clear from context. Observe that $\mathrm{HVI}_{r, F}$ is continuous (see Appendix C), hence integrable, on a bounded subset.
Remark 4.
Maximizing the Hypervolume improvement results in Pareto points (see Proposition 1).
Consider a black box function $f : X \to \mathbb{R}^t$ with evaluations $T = \{ (x_1, y_1), \ldots, (x_N, y_N) \} \subseteq X \times \mathbb{R}^t$. Given an approximation $\hat{f}$ of f (such as the mean of a GPR) and a suitable reference point $r \in \mathbb{R}^t$, we strive to calculate
$\operatorname{argmax}_{x \in X} \mathrm{HVI}_r\big( \{ y_1, \ldots, y_N \}, \hat{f}(x) \big)$
in order to find a preimage of a Pareto point of f.
Recall that GPRs include a prediction uncertainty measure (Remark 2). We can take this additional information into account when maximizing the Hypervolume improvement in the following ways.
Definition 6 (Expected Hypervolume Improvement).
Let mean functions $m_i : X \to \mathbb{R}$ and covariance functions $C_i : X \times X \to \mathbb{R}$ on X for $i = 1, \ldots, t$ be given. Denote by
$m(x) = \big( m_i(x) \big)_{i = 1, \ldots, t}$
the induced mean vector and by $C(x)$ the diagonal matrix with
$C(x)_{ii} = C_i(x, x).$
Then, the expected hypervolume improvement at $x \in X$ is given by the expected value
$\mathrm{EHVI}(x) = \int_{y \in \mathbb{R}^t} \mathrm{HVI}_r(y)\, \mathrm{d}P_x = \int_{y \in \mathbb{R}^t} \mathrm{HVI}_r(y)\, f_{C(x), m(x)}(y)\, \mathrm{d}y$
of HVI with respect to the probability measure
$P_x = \bigotimes_{i = 1}^{t} \mathcal{N}\big( m_i(x), C_i(x, x) \big) = \mathcal{N}\big( m(x), C(x) \big)$
on $\mathbb{R}^t$.
In many situations, the training data (more precisely, the evaluations) are manipulated (i.e., pre-processed) before training (i.e., before calculating the hyperparameters of the mean and covariance functions). By the very definition, we obtain the following corollary:
Corollary 7.
Let $S \subseteq \mathbb{R}^t$ and $g : S \to \mathbb{R}^t$ be a function. Assume g satisfies $a_i < b_i$ if and only if $g(a)_i < g(b)_i$ for all $i = 1, \ldots, t$ and $a, b \in S$. Then, $s \in S$ is a Pareto point if and only if $g(s) \in g(S)$ is a Pareto point.
Therefore, any function satisfying the above assumptions may be used for pre-processing of data points in the context of multicriterial optimization.
Remark 5.
The expected hypervolume improvement involves the choice of some reference point. By construction, this choice affects the (expected) hypervolume contribution of any point in the target space. Notice that the reference point must be strictly greater in every component than every Pareto-optimal solution in order to ensure a hypervolume greater than zero for every such point. For example, if the black box function f factorizes through $[0, 1]^t \subseteq \mathbb{R}^t$, then such a reference point may be chosen as
$r = ( 1 + \varepsilon, \ldots, 1 + \varepsilon ) \in \mathbb{R}^t$
for some $\varepsilon > 0$.
Further discussion for the reference point selection may be found in the literature, e.g., under [17].
Roughly speaking, the expected hypervolume improvement is an extension of the hypervolume improvement incorporating the uncertainty information encapsulated in the GPRs. One hopes that maximizing the expected hypervolume improvement (of the GPRs) maximizes the hypervolume improvement of the black box function more efficiently than simply maximizing the hypervolume of the underlying mean function of the GPRs. Note that incorporating the uncertainty (of the model) allows reflecting a trade-off between exploration and exploitation.
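As a simple illustration of Definition 6, the EHVI integral can be approximated by Monte Carlo sampling from $\mathcal{N}(m(x), C(x))$. The sketch below reuses the hypervolume helper from the Section 2 sketch; its names and toy values are assumptions of ours, and the exact box-decomposition algorithms of [18,19] used in the actual implementation are preferable in practice.

import numpy as np

def ehvi_monte_carlo(mean, var, front, ref, n_samples=2000, rng=None):
    # Monte Carlo estimate of EHVI(x) for independent objectives with GPR means
    # `mean` and variances `var`; reuses hypervolume() from the Section 2 sketch.
    rng = np.random.default_rng(rng)
    mean, std = np.asarray(mean, float), np.sqrt(np.asarray(var, float))
    samples = rng.normal(mean, std, size=(n_samples, len(mean)))
    base = hypervolume(front, ref)
    return float(np.mean([hypervolume(list(front) + [s], ref) - base for s in samples]))

# EHVI of a candidate predicted near the current front vs. one predicted in a dominated region.
front = [(0.1, 0.8), (0.4, 0.4), (0.8, 0.1)]
print(ehvi_monte_carlo([0.2, 0.3], [0.01, 0.01], front, ref=(1.0, 1.0), rng=0))   # noticeably positive
print(ehvi_monte_carlo([0.9, 0.9], [0.01, 0.01], front, ref=(1.0, 1.0), rng=0))   # close to zero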

3.4. Summary—Base for an Algorithmic Implementation

We shortly summarize the necessary steps in order to apply GPR and EHVI based multicriterial optimization to a black box function
$f : X \to \mathbb{R}^t.$
Let
$T = \{ (x_1, y_1), \ldots, (x_N, y_N) \} \subseteq X \times \mathbb{R}^t$
be evaluations of f. We define $X_0 = \{ x_1, \ldots, x_N \}$ and $Y_0 = \{ y_1, \ldots, y_N \}$.

3.4.1. Setting up the GPRs

For each $i = 1, \ldots, t$ we choose a mean function $m_i$ and a covariance function (i.e., a positive quadratic form) $C_i$ on X. Examples of covariance functions can be found in [16]. By Theorem 3, we obtain a Gaussian Process for each i. In case the covariance function involves the choice of some hyperparameter, we determine that parameter by solving Equation (5). Next, we condition each mean $m_i$ and covariance function $C_i$ to T using Equations (2) and (3), respectively. We obtain GPRs, defined by their conditioned mean $m_i'$ and covariance function $C_i'$ for each $i = 1, \ldots, t$ (i.e., for each output component).

3.4.2. Maximizing the EHVI

We maximize the expected hypervolume improvement, i.e., we solve
$p = \operatorname{argmax}_{x \in X} \mathrm{EHVI}(x)$
according to Equation (8) with respect to the mean functions $m_i'$ and covariance functions $C_i'$. Algorithms for the calculation of the expected hypervolume improvement may be found in [18,19]. Lastly, we evaluate the black box function f at the found p.
We close this section by a couple of practical remarks:
  • GPRs form a rich class of regression models. However, evaluating a GPR involves the inversion of an $N \times N$ matrix with N being the number of training points (see Equation (3)). Accordingly, evaluating a GPR tends to get slow with an increasing number of training points.
  • In addition, GPRs (as any regression model) require careful pre-processing of the data in order to produce reasonable results. At the very least, the input and output of the training data should be normalized (e.g., to $(0, 1)$).
To help the reader follow the GPR-MOBO algorithm described below, we present here a pre-processing example for the training data: Denote by
$x_{\min} = \big( \inf_{z \in X} (z_i) \big)_{i = 1, \ldots, d} \quad \text{and} \quad x_{\max} = \big( \sup_{z \in X} (z_i) \big)_{i = 1, \ldots, d}$
the componentwise minimum resp. maximum within the design space X. Then, define
$n_X : X \to [0, 1]^d, \quad x \mapsto \frac{x - x_{\min}}{x_{\max} - x_{\min}}$
where the division is performed componentwise, and $X_n = n_X(X_0)$. Assuming X to be bounded and $x_{\max, i} \neq x_{\min, i}$ for all components i, this is well defined. Furthermore, we define
$y_{\min} = \big( \min_{y \in Y_0} (y_i) \big)_{i = 1, \ldots, t} \quad \text{and} \quad y_{\max} = \big( \max_{y \in Y_0} (y_i) \big)_{i = 1, \ldots, t}$
by the componentwise minimum resp. maximum of the evaluations. Without loss of generality, we may assume $y_{\max, i} \neq y_{\min, i}$ for all i. Define
$n_Y : \mathbb{R}^t \to \mathbb{R}^t, \quad y \mapsto \frac{y - y_{\min}}{y_{\max} - y_{\min}}$
and $Y_n = n_Y(Y_0)$. It is straightforward to check that $n_Y$ satisfies the assumptions of Corollary 7.
We obtain normalized training data
$T_n = \{ ( n_X(x_1), n_Y(y_1) ), \ldots, ( n_X(x_N), n_Y(y_N) ) \} \subseteq [0, 1]^d \times [0, 1]^t.$
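A short Python sketch of the maps $n_X$ and $n_Y$ might look as follows; the helper names and the hypothetical design points and (CO2, CAPEX) values are ours and only illustrate the componentwise scaling.

import numpy as np

def normalize_inputs(X0, x_min, x_max):
    # n_X of Equation (11): componentwise mapping of design samples onto [0, 1]^d.
    X0 = np.atleast_2d(X0).astype(float)
    x_min, x_max = np.asarray(x_min, float), np.asarray(x_max, float)
    return (X0 - x_min) / (x_max - x_min)

def normalize_outputs(Y0, y_min=None, y_max=None):
    # n_Y of Equation (12): componentwise mapping of evaluations onto [0, 1]^t.
    # y_min/y_max may be fixed from domain knowledge, e.g., y_min = (0, 0) in Section 4.
    Y0 = np.atleast_2d(Y0).astype(float)
    y_min = Y0.min(axis=0) if y_min is None else np.asarray(y_min, float)
    y_max = Y0.max(axis=0) if y_max is None else np.asarray(y_max, float)
    return (Y0 - y_min) / (y_max - y_min)

# Normalizing two hypothetical design points from the Table 1 design space and
# two hypothetical (CO2, CAPEX) evaluations.
print(normalize_inputs([[20.0, 5.0, 0.6], [41.0, 0.0, 1.21]],
                       x_min=[0.0, 0.0, 0.0], x_max=[41.0, 17.1, 1.21]))
print(normalize_outputs([[4.2, 30.0], [2.5, 55.0]], y_min=[0.0, 0.0]))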

4. Algorithmic Implementation and Validation

In this section, we first derive an algorithmic implementation of an adaptive Bayesian Optimization (BO) algorithm based on GPR infill making use of the expected hypervolume improvement as acquisition function. In Section 4.2, we apply this algorithm to a mathematical test function with known Pareto front before validating it in Section 4.3 in the context of a real-world power system application.

4.1. GPR-MOBO Algorithmic Implementation

The proposed GPR-MOBO workflow presented in Algorithm 1 strictly follows the sequence of steps for applying GPR with EHVI-based MOO to a black box function as described in Section 3.4.
Algorithm 1 Structural MO Bayesian Optimization workflow.
$X_0 \leftarrow \{ x_1, \ldots, x_{N_0} \}$ ▷ Set initial DoE
$Y_0 \leftarrow f(X_0)$ ▷ Simulate at $X_0$ (expensive)
$m \leftarrow 0$ map ▷ Choose zero map as mean function
$C \leftarrow$ Equation (1) ▷ Choose covariance function
$r \leftarrow$ Equation (9) ▷ Choose reference point
while $|X_0| < N_{\mathrm{tot}}$ do ▷ unless abortion criterion is met
    $X_n, Y_n \leftarrow$ Equations (11), (12) ▷ Pre-process data
    for $i = 1 \ldots t$ do ▷ for each dimension of target space
        $\theta_i^* \leftarrow$ Equation (5) ▷ Calculate hyperparameters
        $m_i' \leftarrow$ Equation (2) ▷ Condition mean
        $C_i' \leftarrow$ Equation (3) ▷ Condition covariance
    end for
    $p \leftarrow$ Equation (10) ▷ Calculate optimal infill sample point
    $X_0 \leftarrow X_0 \cup n_X^{-1}(p)$ ▷ Add infill to $X_0$
    $Y_0 \leftarrow Y_0 \cup f(n_X^{-1}(p))$ ▷ Add simulation at $n_X^{-1}(p)$ to $Y_0$ (expensive)
end while
$Y_P \leftarrow \{ y_1, \ldots, y_k \}$ ▷ Find non-dominated points
$X_P \leftarrow f^{-1}(Y_P)$ ▷ Acquire according design space points
return $X_P, Y_P$
Wherever applicable, we refer to equations as referenced in Section 3. As an initial design of experiments (DoE) $X_0$, we propose Latin Hypercube Sampling (LHCS) according to [20] as the basis of the initial computationally expensive black box function evaluation $f(X_0)$. We assume the initial sampling with $N_0$ expensive evaluations to cover the full range of parameter values within the target space, guaranteeing that its image lies within the unit cube after normalization.
$N_{\mathrm{tot}}$ (the maximum number of samples available for computationally expensive black box function evaluations) and $N_0$ (the fraction of $N_{\mathrm{tot}}$ used as the number of initial samples) may be considered the GPR-MOBO hyperparameters.
Algorithm 1 has been implemented in Matlab 2021b making use of the minimizer fmincon based on the sqp option (called multiple times with different initial values by GlobalSearch) to find the minimum of the negative log marginal likelihood function for hyperparameter adaption (see Section 3.2) and for BO (see Section 3.3) with application to the negative EHVI function, which is provided by [21]. At this point it seems worthwhile noting that GlobalSearch does not yield deterministic results, i.e., multiple runs with identical input values may vary in their output values.
To make the computation more efficient, the Cholesky decomposition is used for the numerical inversion of $C_F$.
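To connect the individual building blocks, the following condensed Python sketch mirrors the structure of Algorithm 1 on a cheap toy problem. It reuses the hypothetical helpers gpr_predict (Section 3.1) and ehvi_monte_carlo (Section 3.3) sketched above, maximizes the EHVI over random candidate points instead of fmincon/GlobalSearch, and uses deliberately tiny sample budgets; it is an illustration of the workflow, not the paper's Matlab implementation.

import numpy as np
from scipy.stats import qmc

def gpr_mobo(f, lo, hi, n_init=5, n_total=8, n_candidates=64, seed=0):
    # Condensed sketch of Algorithm 1 for a minimization problem with t objectives.
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    d = len(lo)
    X = lo + qmc.LatinHypercube(d=d, seed=seed).random(n_init) * (hi - lo)   # initial DoE [20]
    Y = np.array([f(x) for x in X])                                          # expensive evaluations
    k = lambda a, b: np.exp(-np.sum((np.atleast_1d(a) - np.atleast_1d(b)) ** 2) / (2.0 * 0.2 ** 2))
    t = Y.shape[1]
    while len(X) < n_total:
        Xn = (X - lo) / (hi - lo)                                    # n_X, Equation (11)
        Yn = (Y - Y.min(0)) / (Y.max(0) - Y.min(0))                  # n_Y, Equation (12)
        front = [y for y in Yn if not any(np.all(z <= y) and np.any(z < y) for z in Yn)]
        ref = np.full(t, 1.02)                                       # reference point, Equation (9)
        best_e, best_x = -np.inf, None
        for c in rng.random((n_candidates, d)):                      # candidate infill points in [0, 1]^d
            mv = [gpr_predict(c, Xn, Yn[:, i], k) for i in range(t)] # one GPR per objective
            m = [a for a, _ in mv]
            v = [max(b, 1e-12) for _, b in mv]
            e = ehvi_monte_carlo(m, v, front, ref, n_samples=32, rng=rng)
            if e > best_e:
                best_e, best_x = e, c
        x_new = lo + best_x * (hi - lo)                              # back-transform n_X^{-1}(p)
        X, Y = np.vstack([X, x_new]), np.vstack([Y, f(x_new)])       # expensive infill evaluation
    keep = [i for i, y in enumerate(Y) if not any(np.all(z <= y) and np.any(z < y) for z in Y)]
    return X[keep], Y[keep]

# Cheap two-objective toy problem standing in for an expensive power system simulation.
f_toy = lambda x: np.array([x[0], 1.0 - np.sqrt(x[0]) + x[1]])
X_p, Y_p = gpr_mobo(f_toy, lo=[0.0, 0.0], hi=[1.0, 1.0])
print(Y_p)   # non-dominated evaluations found with the small budget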

4.2. Test Function Based Validation

In this subsection, we aim for an effectiveness and efficiency comparison of Algorithm 1 versus the state-of-the-art alternatives LHCS [20] and NSGA-II [22]. We present results of their application to a well-defined black box function with analytically known Pareto front. In our case, we picked the test function ZDT1 according to [22], which is to be minimized:
$f : [0, 1]^d \to \mathbb{R}^2, \quad x \mapsto \Big( x_1,\; g(x)\Big( 1 - \sqrt{\tfrac{x_1}{g(x)}} \Big) \Big) \quad \text{with} \quad g(x) = 1 + \frac{9}{d - 1} \sum_{i = 2}^{d} x_i$
and $2 \le d \le 30$ indicating the design space dimension. Note that ZDT1 exhibits a convex Pareto front independent of d.
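For reference, a direct Python transcription of ZDT1 and its analytical Pareto front ($g = 1$, i.e., $f_2 = 1 - \sqrt{f_1}$) might look as follows; the function names are ours and the snippet is only meant to make the test function concrete.

import numpy as np

def zdt1(x):
    # ZDT1 test function [22] on [0, 1]^d, d >= 2; both objectives are to be minimized.
    x = np.asarray(x, float)
    g = 1.0 + 9.0 * np.sum(x[1:]) / (len(x) - 1)
    return np.array([x[0], g * (1.0 - np.sqrt(x[0] / g))])

def zdt1_pareto_front(n=200):
    # Analytical Pareto front (x_2 = ... = x_d = 0): f_2 = 1 - sqrt(f_1).
    f1 = np.linspace(0.0, 1.0, n)
    return np.column_stack([f1, 1.0 - np.sqrt(f1)])

print(zdt1(np.zeros(30)))       # a Pareto-optimal evaluation: (0, 1)
print(zdt1(np.full(30, 0.5)))   # a dominated evaluation in the interior of the target space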
We compare results for the alternative Pareto front search algorithms granting $N_{\mathrm{tot}} = 50$ black box evaluations, plus an additional NSGA-II analysis with $N_{\mathrm{tot}} = 200$ black box evaluations. For LHCS, all samples were spent within the initial run, while Algorithm 1 started with $N_0 = 30$ initial LHCS samples. In the case of $N_{\mathrm{tot}} = 50$ total samples, the NSGA-II algorithm was applied in $n_{\mathrm{gen}} = 5$ generations with a population of $n_{\mathrm{pop}} = 10$ samples, while in the case of $N_{\mathrm{tot}} = 200$ total samples, $n_{\mathrm{gen}} = 8$ generations with a population of $n_{\mathrm{pop}} = 25$ samples were run. For statistical purposes, the algorithm evaluations were repeated fifty times with different random starting values. The design space dimension was chosen as $d \in \{2, 3, 5, 10, 15, 20, 25, 30\}$. The reference point during optimization was chosen according to Equation (9) with $\varepsilon = 2\%$. Making use of knowledge about the ZDT1 target domain, we fixed $y_{\min} = (0, 0)$ for pre-processing. To quantify the search algorithms’ performance, the hypervolume with respect to the reference point $(2, 10)$ in relation to the (known) maximal hypervolume is evaluated. The results are plotted in Figure 3.
Box plots in Figure 3 indicate statistical evaluations of the repeated search runs. GPR-MOBO results are drawn in red, LHCS results in blue, and NSGA-II results for $N_{\mathrm{tot}} = 50$ in brown and for $N_{\mathrm{tot}} = 200$ in green.

4.3. Power System Design Based Validation

Within the last subsection of this section, we apply the GPR-MOBO Algorithm 1 to the design and optimization of a power system example. Figure 4 illustrates both the toolchain and the generic power system model as used for GPR-MOBO power system design and optimization validation.
For the energy domain specific “Power System Modeling and Simulation Environment” of the toolchain in our example, we use the commercial tool PSS®DE [23]. Connected to the “Script Control” (implemented in Python code) through an “xml interface”, adjustable design space parameters of the power system model receive dedicated parameter values as computed within the “Algorithmic Workflow” (execution of the GPR-MOBO algorithm in Matlab 2021b code). Data stored within the “Data Base” are accessible for “Statistical Evaluation” and “Visualisation” (all implemented in Python code). The generic power system model for our validation example is defined by a star topology connecting a standard $H_0$-profile [24] electric load with 20.2 GWh annual demand to three aggregate components (“Wind turbine”, “PV”, “Battery”) and a “Public Grid”. The “Power System Modeling and Simulation Environment” simulates power system results in terms of well-defined key performance indicators (KPI): CAPEX (capital expenditures for installation of the according power system) and CO2 (amount of carbon dioxide emitted for a given configuration to provide the total amount of energy). The KPI behavior of the “Power System Modeling and Simulation Environment” can therefore be viewed as a black box function.
Parameter value ranges defining the design space for the system of interest are listed in Table 1.
As mentioned in Section 3.3, for the GPR-MOBO, the design and target space samples (design parameters and KPIs, respectively) are normalized according to Equations (11) and (12).
The results for the experiment are shown in Figure 5. $N_0 = 30$ initial samples and $N_{\mathrm{tot}} = 50$ samples in total were chosen as GPR-MOBO hyperparameters, while the total emission of carbon dioxide (CO2 in kilotons) and the capital expenditures for acquisition and installation of aggregate components (CAPEX in million euros) were selected as the trade-off KPIs to be evaluated. Making use of knowledge about the target domain, we fixed $y_{\min} = (0, 0)$ for pre-processing. The reference point during optimization was chosen according to Equation (9) with $\varepsilon = 2\%$.
Figure 5 shows the subspace of the target space acquired by the experiment. All sample points are marked by crosses, with the Pareto-optimal solutions forming the Pareto front highlighted by red crosses and the remaining portion of the initial (30 LHCS) results marked by cyan crosses. In addition, an approximate Pareto front is plotted, resulting from a full-factorial design space latticing based on 1200 sample points. Its non-dominated Pareto points are indicated by blue dots, with the Pareto front completed by linear interpolation (marked by a blue dotted line).

5. Discussion

According to Section 2, we may interpret the identified hypervolume of a black box function as an indicator of effectiveness for a given Pareto front search algorithm. Putting that identified hypervolume in relation to the number of computationally expensive evaluations required to identify it, in turn, may be considered a suitable indicator of the algorithm’s efficiency. Applying these definitions to the results obtained in the previous Section 4, Figure 3 clearly indicates:
(i)
Given the low number of $N_{\mathrm{tot}} = 50$ samples, all selected algorithms show decreasing effectiveness with an increasing number of design space dimensions (i.e., degrees of design freedom, DoF).
(ii)
GPR-MOBO outperforms the other algorithms in effectiveness for all DoF d, showing at the median more than 97% effectiveness even for $d = 30$ (with $N_{\mathrm{tot}} = 50$), versus less than 71% for the other algorithms.
(iii)
For $N_{\mathrm{tot}} = 50$, the effectiveness of NSGA-II and LHCS appears comparable within statistical significance for all evaluated DoF d.
(iv)
With $N_{\mathrm{tot}} = 50$ samples, GPR-MOBO even outperforms NSGA-II applied with $N_{\mathrm{tot}} = 200$ samples.
(v)
The efficiency of GPR-MOBO increases significantly with increasing DoF d when compared to the other algorithms.
On the other hand, it is worth mentioning that the GPR-MOBO algorithm requires more computation time than the standard LHCS or NSGA-II algorithms. Depending on the test function dimension d and the hardware environment, GPR-MOBO runs (i.e., 20 iterations) took between 3 min and 11 min, wherein the fraction spent evaluating the black box function (ZDT1) can be treated as negligible, while NSGA-II or LHCS runs took less than a tenth of this time. This indicates that GPR-MOBO is advisable if the black box function is expensive to evaluate in terms of capital, time, or other resources.
The experimental validation using an unknown black box function, whose result is shown in Figure 5, again confirms the effectiveness and efficiency of GPR-MOBO. The Pareto front identified by $N_{\mathrm{tot}} = 50$ black box evaluations is already very close to the front approximated by $N_{\mathrm{tot}} = 1200$ full-factorial lattice based black box evaluations. Some GPR-MOBO Pareto points even dominate those identified by the full-factorial lattice. The example thereby demonstrates the effectiveness and efficiency of GPR-MOBO based power system design and optimization.
The results shown indicate a general superiority of GPR-MOBO over the state-of-the-art algorithms. However, this has been demonstrated only on an exemplary basis. We therefore point out that a generalization of this superiority is not admissible. A general superiority of GPR-MOBO cannot and should not be derived from single, individual test functions or application examples. Such a fundamental superiority would have to be proven mathematically and would presumably require fundamental knowledge of the black box function itself or of the Pareto front spanned by this black box function.

6. Conclusions and Outlook

In this paper, we tackled the challenge of power system design and optimization in a VUCA environment. We proposed a Multi-Objective Bayesian Optimization based on Gaussian Process Regression (GPR-MOBO) in the context of power system virtual prototyping. After a mathematical reformulation of the challenge, we presented the background of Gaussian Process Regression, including hyperparameter adaption and its use in the context of a Bayesian Optimization approach, focusing on the expected hypervolume improvement. For validation purposes, we benchmarked our GPR-MOBO implementation statistically on a mathematical test function with analytically known Pareto front and compared results to those of the well-known algorithms NSGA-II and pure Latin Hypercube Sampling. We demonstrated the superiority of the GPR-MOBO approach over the compared algorithms, especially for high dimensional design spaces. Finally, we applied the GPR-MOBO algorithm to the planning and optimization of a power system (energy park) in terms of selected performance indicators of exemplary character.
Concluding, GPR-MOBO turned out to be an effective and efficient approach for power system design and optimization in a VUCA environment, with superior character when simulations are computationally expensive and the number of design degrees of freedom is high.
Some topics remain open for future investigation. Besides a performance comparison with algorithms other than those already selected, some detailed questions within the GPR-MOBO family are worth considering. This includes topics such as the choice of acquisition function, pre-processing, selection of the reference points when the (expected) hypervolume (improvement) is put in focus, and the application of various global optimizers, to name just a few. One level above, not yet satisfactorily answered questions address the extension to mixed-integer design spaces and issues related to constraint handling.

Author Contributions

Conceptualization, H.P.; methodology, H.P.; software, N.P. and M.L.; validation, N.P. and M.L.; writing—original draft preparation, H.P.; writing—review and editing, N.P., M.L. and H.P.; visualization, M.L. and H.P.; supervision, H.P.; project administration, H.P.; mathematical theory, N.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by Siemens AG under the “Pareto optimal design of Decentralized Energy Systems (ProDES)” project framework.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank Siemens AG for providing a free PSS®DE license for the power system design based validation part.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Maximization of Hypervolume and Pareto Points

Lemma A1.
Let x , x , r t with x , x r . Define
c i : = ( r 1 , , r i 1 , max ( x i , x i ) , r i + 1 , , r t )
as vector r with i-th component replaced with the maximum of x i and x i . Then,
I ( x , r ) = ( I ( x , r ) I ( x , r ) ) i = 1 t I ( x , c i ) .
Proof. 
First, we prove “⊇”:
Since c i r for all i, we obtain I ( x , c i ) I ( x , r ) , hence, “⊇”.
Secondly, we prove “⊆”:
Given some z I ( x , r ) I ( x , r ) there exists an i with x i z i < x i . Since z r , we obtain z I ( x , c i ) . This proves “⊆”.
Moreover, if max ( x i , x i ) = x i , then, I ( x , c i ) = , hence, z I ( x , c i ) implies x i z i < x i and thus z I ( x , r ) . This proves that ( I ( x , r ) I ( x , r ) ) and i = 1 t I ( x , c i ) are disjoint. □
Corollary A2.
For x , x , r t with x r and x i < x i , there exists x d r with
I ( x , d ) I ( x , r ) I ( x , r ) .
Proof. 
We may assume x r , since otherwise I ( x , r ) = and the claim follows for d = r . Using Lemma A1, we obtain
I ( x , r ) I ( x , r ) = j = 1 t I ( x , c j ) I ( x , c i ) .
Furthermore, x c i r since x r and x i < x i . Thus, d = c i satisfies the claim. □
Corollary A3.
Let S t be a finite subset and x , r t with x r . Assume further for every s S there exists i s such that x i s < s i s . Then, there exists c t with x c r and
I ( x , c ) I ( x , r ) s S I ( s , r ) .
In particular, the Lebesgue measure λ ( I ( x , c ) ) > 0 of I ( x , c ) is greater than zero.
Proof. 
We prove the claim by induction over the size n of S. For n = 1 , this is the Corollary A2. Assume the claim holds for all such S with size equal to n. Given S with cardinality n + 1 , there exists such c with
I ( x , c ) I ( x , r ) s S , s s 1 I ( s , r ) .
Furthermore, we obtain
I ( x , r ) s S I ( s , r ) c r I ( x , c ) I ( s 1 , c ) I ( x , c )
for some x c c where the last subset is obtained by applying Corollary A2 to r = c , x = s 1 .
Lastly, the Lebesgue measure λ ( I ( x , c ) ) is given by the product i = 1 t ( c i x i ) > 0 which is greater than zero since x c . □
Corollary A4.
Let S t be a finite subset and x , r t with x , s r for all s S . Then,
HVI r ( S , x ) = 0
if and only if there exists s S with s x .
Proof. 
Given s x , I ( x , r ) s S I ( s , r ) by the very construction. Thus, “⇐” holds. Given HVI r ( S , x ) = 0 assume that no s S satisfies s x . This is for every s S there exists i s such that x i s < s i s . Since
s S I ( s , r ) I ( x , r ) = s S I ( s , r ) ( I ( x , r ) s S I ( s , r ) )
and the additivity of the Lebesgue measure, it suffices to prove
λ ( I ( x , r ) s S I ( s , r ) ) > 0 .
This follows by the Corollary A3. □
Corollary A5.
Let S t be a finite subset and x , x , r t with x , s r for all s S , x x and x i < x i for some i. Assume further for every s S there exists i s such that x i s < s i s . Then,
HV r ( S { x } ) > HV r ( S { x } ) HV r ( S ) .
Proof. 
Clearly, the last inequality holds. Due to
s S I ( s , r ) I ( x , r ) = s S I ( s , r ) ( I ( x , r ) s S I ( s , r ) )
(resp. for x ) and the additivity of the Lebesgue measure, it suffices to prove
λ ( I ( x , r ) s S I ( s , r ) ) > λ ( I ( x , r ) s S I ( s , r ) ) .
By Corollary A3, there exists x c r such that
I ( x , c ) I ( x , r ) s S I ( s , r ) .
Observe
I ( x , c ) I ( x , r ) = I ( x , c ) I ( x , c )
since c r . Applying Corollary A2, we obtain some x c c such that
I ( x , c ) I ( x , c ) I ( x , r ) = I ( x , c ) I ( x , c ) .
Observe
I ( x , r ) I ( x , r )
since x x . Together we obtain,
( I ( x , r ) s S I ( s , r ) ) I ( x , c ) I ( x , r ) s S I ( s , r ) .
By the additivity of the Lebesgue measure, we obtain
λ ( I ( x , r ) s S I ( s , r ) ) λ ( I ( x , r ) s S I ( s , r ) ) + λ ( I ( x , c ) ) .
Since x c , we argue as in the proof of the previous Corollary that λ ( I ( x , c ) ) > 0 . This proves the claim. □
Proof of proposition 1.
We may without loss of generality assume s r for all s S . Indeed, let T S be the points in S satisfying s r for all s T . Given some pareto point t T , assume there exists some s S T with s t . Then, s t r which contradicts s S T .
Since S is bounded and closed and HVI r ( F , ) is continuous, we deduce the existence of some p S which maximizes HVI r ( F , ) . Since
HVI r ( F , p ) = max x S HVI r ( F , x ) HVI r ( F , s ) > 0 ,
we obtain there exists no f F with f p by Corollary A4. i.e., for all f F there exists i f with p i f < f i f . Assume there exists x S with x p and x i < p i . Then, there exists no f F with f x since x p . By applying Corollary A5, we deduce
HV r ( F { x } ) > HV r ( F { p } ) HV r ( F )
which contradicts HVI r ( F , p ) = max x S HVI r ( F , x ) . □

Appendix B. Probability Measure for Multivariate Normal Distribution

Theorem A6.
Let I be a set and for every finite $J \subseteq I$ let an inner regular probability measure $P_J$ on $\mathbb{R}^J$ be given. Given two finite $J \subseteq J' \subseteq I$, denote by $pr_{J' J} : \mathbb{R}^{J'} \to \mathbb{R}^{J}$ the canonical projection. Assume that for all finite $J \subseteq J' \subseteq I$
$P_{J'} \circ pr_{J' J}^{-1} = P_J$
holds. Then, there exists a unique measure $P_I$ on $\mathbb{R}^I$ satisfying
$P_I \circ pr_J^{-1} = P_J$
for all finite $J \subseteq I$.
Before giving the proof, recall:
Theorem A7 (Hahn-Kolmogorov).
Let S be a set and $R \subseteq \mathcal{P}(S)$ be a ring, i.e., $\emptyset \in R$ and R is stable under finite unions and binary complements. Let
$P_0 : R \to [0, \infty)$
be a pre-measure, i.e.,
$P_0(\emptyset) = 0$
and
$P_0\Big( \bigcup_{n} A_n \Big) = \sum_{n} P_0(A_n)$
for pairwise disjoint
$A_1, A_2, \ldots \in R$
with
$\bigcup_{n} A_n \in R.$
Then, P 0 extends to a measure P on the sigma algebra σ ( R ) generated by R. Furthermore, if P 0 is σ-finite, then, the extension is unique.
Proof of Theorem A6.
The sigma algebra of the product I is generated by
R = { A × I J | J I finite , A B ( J ) } .
The reader convinces himself that R is a ring. Define a function
P 0 : R [ 0 , )
by P 0 ( A × I J ) = P J ( A ) . Observe that given A × I J = A × I J , then, without loss of generality J J and A = A × J J . Thus,
P 0 ( A × I J ) = P J ( A ) by definition = P J ( A × J J ) = P J p r J J 1 ( A ) = P J ( A ) P J p r J J 1 = P J = P 0 ( A × I J )
and P 0 is well defined. Then, the reader convinces himself that
P 0 ( ) = P J ( ) = 0
and P 0 is finite additive. We prove that P 0 is σ -additive, i.e., given A 1 , R pairwise disjoint with n A n R , then,
P 0 ( n A n ) = n P 0 ( A n ) .
Then, the Hahn-Kolmogorov theorem above proves the claim. Notice that every probability measure is σ -finite and, thus, P 0 is.
It is well known that it suffices to prove
lim n P 0 ( A n ) = 0
for all
A 1 A 2 R
with n A n = since P 0 is finite additive. We prove that if there exists an ϵ > 0 such that
P 0 ( A n ) > 2 ϵ
for all n , then, n A n . Write A i = B i × I J i such that J i J i + 1 for every i. Then,
P 0 ( A i ) = P J i ( B i ) = sup K B i compact P J i ( K )
since P J is inner regular. For all i choose some K i B i compact such that
P J i ( B i ) P J i ( K i ) < 2 i ϵ .
Write C i = K i × I J i for every i. We first prove
P 0 ( i = 1 n C i ) > ϵ
for all n. Notice
P 0 ( i = 1 n C i ) = P 0 ( i = 1 n A i ) P 0 ( i = 1 n A i i = 1 n C i )
since P 0 is finite additive and
( i = 1 n C i ) ( i = 1 n A i i = 1 n C i ) = i = 1 n A i .
Furthermore, i = 1 n A i = A n since A i + 1 A i for all i. Thus,
P 0 ( i = 1 n A i ) = P 0 ( A n ) > 2 ϵ
by (A2) and it suffices to prove
P 0 ( i = 1 n A n C i ) = P 0 ( i = 1 n A i i = 1 n C i ) < ϵ
for all n. It holds
P 0 ( i = 1 n A n C i ) P 0 ( i = 1 n A i C i ) A n A i i = 1 n ( P 0 ( A i ) P 0 ( C i ) ) P 0 fin . add . = i = 1 n ( P J i ( B i ) P J i ( K i ) ) by definition i = 1 n 2 i ϵ ( A 2 ) < ϵ
In particular, i = 1 n C i and, hence, D n : = i = 1 n K i for all n. We consider the descending sequence
D 1 D 2 R .
Since all D i are compact, t is Hausdorff (hence, D 1 is) and all D i are non-empty, the below claim ensures that
i K i i A i
and, hence, the claim. □
Lemma A8. 
Let
$E \supseteq K_0 \supseteq K_1 \supseteq \cdots$
be a descending sequence of compact topological spaces with E being a Hausdorff space.
Then $K = \bigcap_{i} K_i = \emptyset$ implies that there exists some n with $K_n = \emptyset$.
Proof. 
If K = , then, its complement
E = K c = E K = n ( E K i )
is E. Recall every compact subset of a Hausdorff space to be closed. Hence, E K i is open for every i. Since E is compact, there exist i 1 , , i k such that E = j = 1 , , k ( E K i j ) . Hence, j = 1 , , k K i j = is empty. Thus,
K n j = 1 , , k K i j =
for some n greater all i j . □
Proof of Theorem 3.
In view of Theorem A6, it suffices to prove
N ( m F , C F ) = N ( m F , C F ) p r F F 1
for any finite subsets F F X . Observe that p r F F : F F is given by left multiplication with the matrix A = ( δ i j ) i F , j F and that A has full rank (since the projection is an epimorphism). Furthermore, by construction we obtain equalities
C F = A C F A T
and
m F = A m F .
Then, the claim is precisely “(9.5) Satz” in [25] for A = A and a = 0 . □
Proof of Corollary 6.
Denote by
C J , F = ( C ( j , f ) ) j J , f F .
Given some J X finite, a straightforward computation proves
m J : = ( m ( j ) ) j J = m J + C J , F C F 1 ( y m F ) C J : = ( C ( j 1 , j 2 ) ) j 1 , j 2 J = C J + C J , F C F 1 C F , J .
Then, m and C are functions such that for every J X finite the induced matrix C J = ( C ( j 1 , j 2 ) ) j 1 , j 2 J is positive definite and the density function of the induced normal distribution
N ( m J , C J )
is the conditional density function induced by ( ( r j ) j J , ( r f ) f F ) given ( r f ) f F = y by Theorem 2. Applying Theorem 3 with mean function m and covariance function C finishes the proof. □

Appendix C. Integrability of Hypervolume Improvement

Lemma A9. 
The hypervolume improvement function
$\mathrm{HVI}_{r, F} : S \to \mathbb{R}, \quad y \mapsto \mathrm{HVI}_r(F, y)$
is continuous, hence integrable, for F finite and $S \subseteq \mathbb{R}^t$ such that $s, f \preceq r$ for all $s \in S$, $f \in F$.
Proof. 
It suffices to prove that
h :ℝ t , y HV r ( F { y } )
is continuous. First, recall that given sets F 1 , , F n , the Lebesgue measure of their union is given by
λ ( i = 1 n F i ) = i = 1 n λ ( F i ) i < j n λ ( F i F j ) + + i < j < k n λ ( F i F j F k ) + + ( 1 ) n 1 λ ( i = 1 n F i )
since λ is additive. Given s 1 , , s n t , we define
i ( s 1 , , s n ) = ( max ( s 1 i , , s n i ) ) i = 1 , , n .
Clearly, I ( a , r ) I ( b , r ) = I ( i ( a , b ) , r ) for all a , b r . Using i ( s 1 , , s n ) = i ( i ( s 1 , , s n 1 ) , s n ) , we obtain
i = 1 n I ( s i , r ) = I ( i ( s 1 , , s n ) , r )
for s i r by induction. Writing
h ( y ) = λ ( f F I ( f , r ) I ( y , r ) )
as sum as above and using (A4), it suffices to prove
g : S , y λ ( I ( i ( s 1 , , s n , y ) , r ) )
to be continuous for all s 1 , , s n S . Therefore, it suffices to prove
(i)
i ( s 1 , s n , ) : t t , y i ( s 1 , s n , y )
and
(ii)
λ ( I ( , r ) ) : S , y λ ( I ( y , r ) )
to be continuous.
To (i):
We observe
i ( s , y ) = 1 2 ( s + y + | s x | )
(where everything is taken componentwise) and, thus, i ( s , ) is continuous for all s. Using
i ( s 1 , , s n , y ) = i ( i ( s 1 , , s n ) , y ) ,
we obtain the claim.
To (ii):
We calculate
λ ( I ( y , r ) ) = i = 1 t ( r i y i )
which is continuous in y. □

References

  1. Elkington, R. Leadership Decision-Making Leveraging Big Data in Vuca Contexts. J. Leadersh. Stud. 2018, 12, 66–70. [Google Scholar] [CrossRef]
  2. Afshari, H.; Hare, W.; Tesfamariam, S. Constrained multi-objective optimization algorithms: Review and comparison with application in reinforced concrete structures. Appl. Soft Comput. 2019, 83, 105631. [Google Scholar] [CrossRef]
  3. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 2016, 104, 148–175. [Google Scholar] [CrossRef] [Green Version]
  4. Lyu, W.; Xue, P.; Yang, F.; Yan, C.; Hong, Z.; Zeng, X.; Zhou, D. An Efficient Bayesian Optimization Approach for Automated Optimization of Analog Circuits. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 65, 1954–1967. [Google Scholar] [CrossRef]
  5. Zhang, S.; Yang, F.; Yan, C.; Zhou, D.; Zeng, X. An Efficient Batch-Constrained Bayesian Optimization Approach for Analog Circuit Synthesis via Multiobjective Acquisition Ensemble. IEEE Trans.-Comput.-Aided Des. Integr. Circuits Syst. 2022, 41, 1–14. [Google Scholar] [CrossRef]
  6. Guo, J.; Crupi, G.; Cai, J. A Novel Design Methodology for a Multioctave GaN-HEMT Power Amplifier Using Clustering Guided Bayesian Optimization. IEEE Access 2022, 10, 52771–52781. [Google Scholar] [CrossRef]
  7. Sawant, M.M.; Bhurchandi, K. Hierarchical Facial Age Estimation Using Gaussian Process Regression. IEEE Access 2019, 7, 9142–9152. [Google Scholar] [CrossRef]
  8. Huang, H.; Song, Y.; Peng, X.; Ding, S.X.; Zhong, W.; Du, W. A Sparse Nonstationary Trigonometric Gaussian Process Regression and Its Application on Nitrogen Oxide Prediction of the Diesel Engine. IEEE Trans. Ind. Inform. 2021, 17, 8367–8377. [Google Scholar] [CrossRef]
  9. Koriyama, T.; Nose, T.; Kobayashi, T. Statistical Parametric Speech Synthesis Based on Gaussian Process Regression. IEEE J. Sel. Top. Signal Process. 2014, 8, 173–183. [Google Scholar] [CrossRef]
  10. Schulz, E.; Speekenbrink, M.; Krause, A. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. J. Math. Psychol. 2018, 85, 1–16. [Google Scholar] [CrossRef]
  11. Lewis-Beck, C.; Lewis-Beck, M. Applied Regression: An Introduction; Sage Publications: Thousand Oaks, CA, USA, 2015; Volume 22. [Google Scholar]
  12. Verleysen, M.; François, D. The curse of dimensionality in data mining and time series prediction. In Proceedings of the International Work-Conference on Artificial Neural Networks, Barcelona, Spain, 8–10 June 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 758–770. [Google Scholar]
  13. Frazier, P.I. Bayesian optimization. In Recent Advances in Optimization and Modeling of Contemporary Problems; Informs: Phoenix, AZ, USA, 2018; pp. 255–278. [Google Scholar]
  14. Emmerich, M.; Deutz, A.H. A tutorial on multiobjective optimization: Fundamentals and evolutionary methods. Nat. Comput. 2018, 17, 585–609. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Anderson, T.W. An Introduction to Multivariate Statistical Analysis, 3rd ed.; Wiley: New York, NY, USA, 2003. [Google Scholar]
  16. Duvenaud, D. Automatic Model Construction with Gaussian Processes. Ph.D. Thesis, University of Cambridge, Cambridge, UK, 2014. [Google Scholar]
  17. Ishibuchi, H.; Imada, R.; Setoguchi, Y.; Nojima, Y. Reference point specification in hypervolume calculation for fair comparison and efficient search. In Proceedings of the Genetic and Evolutionary Computation Conference, Berlin, Germany, 15–19 July 2017; pp. 585–592. [Google Scholar]
  18. Emmerich, M.; Deutz, A.; Klinkenberg, J. Hypervolume-based expected improvement: Monotonicity properties and exact computation. In Proceedings of the 2011 IEEE Congress of Evolutionary Computation (CEC), Ritz Carlton, New Orleans, LA, USA, 5–8 June 2011; pp. 2147–2154. [Google Scholar]
  19. Yang, K.; Emmerich, M.; Deutz, A.; Bäck, T. Efficient computation of expected hypervolume improvement using box decomposition algorithms. J. Glob. Optim. 2019, 75, 3–34. [Google Scholar] [CrossRef] [Green Version]
  20. McKay, M.D.; Beckman, R.J.; Conover, W.J. A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics 1979, 21, 239. [Google Scholar]
  21. Emmerich, M. KMAC V1.0 - The efficient O(n log n) implementation of 2D and 3D Expected Hypervolume Improvement (EHVI). Available online: https://liacs.leidenuniv.nl/~csmoda/index.php?page=code (accessed on 22 June 2022).
  22. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef] [Green Version]
  23. Siemens, Data Sheet PSS®DE. Available online: https://new.siemens.com/global/en/products/energy/energy-automation-and-smart-grid/grid-edge-software/pssde.html (accessed on 24 August 2022).
  24. Proedrou, E. A comprehensive review of residential electricity load profile models. IEEE Access 2021, 9, 12114–12133. [Google Scholar] [CrossRef]
  25. Georgii, H.O. Stochastik: Einführung in die Wahrscheinlichkeitstheorie und Statistik, 5. Auflage; Walter de Gruyter: Berlin, Germany, 2007. [Google Scholar]
Figure 1. Hypervolume (blue area) in $\mathbb{R}^2$. The dotted rectangles illustrate the volumes, i.e., $I(u, r)$, spanned by the respective point $u \in U$ (×) and the reference point r (×).
Figure 2. Top: real function $f = \sin$ and a GPR with mean m and covariance function C, where the shaded region symbolizes the mean plus/minus the standard deviation (i.e., $m(x) \pm \sqrt{C(x, x)}$) at each point. Bottom: same as top with m and C conditioned to the vertical cyan dashes.
Figure 3. Statistical evaluation of the relative hypervolume from 50 repeated runs comparing the performance of three MOO algorithms for a selected set of ZDT1 design space dimensions d. The top three evaluations for GPR-MOBO ($N_0 = 30$ initial samples, red), LHCS (blue) and NSGA-II ($n_{\mathrm{pop}} = 10$ samples in $n_{\mathrm{gen}} = 5$ generations, brown) are based on $N_{\mathrm{tot}} = 50$ samples. The bottom evaluation for NSGA-II ($n_{\mathrm{pop}} = 25$ samples in $n_{\mathrm{gen}} = 8$ generations, green) is based on $N_{\mathrm{tot}} = 200$ samples.
Figure 4. Toolchain and generic power system model as used for the GPR-MOBO power system design based validation.
Figure 5. CAPEX vs. CO2 evaluations (marked by crosses) as obtained by power system simulation. Initial samples are marked by cyan crosses, Pareto-optimal samples by red crosses. An approximate Pareto front based on 1200 full-factorial design space latticing sample points is indicated by blue dots, with interpolation marked by a blue dotted line.
Table 1. Design space limits for the experiment.
Design Parameter | Min | Max
PV peak power / MWp | 0 | 41
Wind turbine peak power / MWp | 0 | 17.1
Battery rack capacity / MAh | 0 | 1.21
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
