A Novel Gaussian Process Surrogate Model with Expected Prediction Error for Optimization under Constraints

Cong, Hongri; Wang, Bo; Wang, Zhe

doi:10.3390/math12071115

Open AccessArticle

A Novel Gaussian Process Surrogate Model with Expected Prediction Error for Optimization under Constraints

by

Hongri Cong

¹

,

Bo Wang

^1,*

and

Zhe Wang

²

¹

School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK

²

Covestro (Netherlands), 6167 RD Geleen, The Netherlands

^*

Author to whom correspondence should be addressed.

Mathematics 2024, 12(7), 1115; https://doi.org/10.3390/math12071115

Submission received: 6 February 2024 / Revised: 1 April 2024 / Accepted: 6 April 2024 / Published: 8 April 2024

(This article belongs to the Special Issue AI for Brain Science and Brain-Inspired Computing)

Download

Browse Figures

Versions Notes

Abstract

:

Optimization, particularly constrained optimization problems (COPs), is fundamental in engineering, influencing various sectors with its critical role in enhancing design efficiency, reducing experimental costs, and shortening testing cycles. This study explores the challenges inherent in COPs, with a focus on developing efficient solution methodologies under stringent constraints. Surrogate models, especially Gaussian Process Regression (GPR), are pivotal in our approach, enabling the approximation of complex systems with reduced computational demand. We evaluate the efficacy of the Efficient Global Optimization (EGO) algorithm, which synergizes GPR with the Expected Improvement (EI) function, and further extend this framework to Constrained Expected Improvement (CEI) and our novel methodology Constrained Expected Prediction Error (CEPE). We demonstrate the effectiveness of these methodologies by numerical benchmark simulations and the real-world application of optimizing a Three-Bar Truss Design. In essence, the innovative CEPE approach promises a potent balance between solution accuracy and computational prowess, offering significant potential in the broader engineering field.

Keywords:

constrained optimization problems; Gaussian process regression; constrained expected improvement; constrained expected prediction error; Three-Bar Truss Design

MSC:

62P30

1. Introduction

In the fields of science and engineering, optimization problems, especially those with constraints, play a crucial role. Constrained optimization problems (COPs) are vital in a wide range of applications, including process control [1,2], reactor design [3], and production scheduling [4]. Among these, specific challenges such as tension/compression coil spring design [5] highlight the intricate nature of these problems. The aim of addressing COPs is to enhance design efficiency, minimize experimental costs, and reduce testing cycles. These problems hold intrinsic value across various engineering disciplines, including chemical, mechanical, and electrical engineering [6,7,8], as they seek optimal solutions within the boundaries set by constraints, be they equalities or inequalities [9]. To solve these complex issues, a range of methodologies have been developed, from gradient-based methods [10], known for their efficiency in differentiable problems, to population-based strategies like Genetic Algorithms [11], which are capable of addressing non-differentiable and multi-modal challenges, albeit often at a higher computational cost [12]. COPs are typically represented through mathematical formulations, underscoring their quantitative nature and the systematic approach required for their resolution. We define the formulation

\begin{matrix} m i n & y = f (x), \\ s t : & l \leq g (x) = (g_{1} (x), g_{2} (x), \dots, g_{m} (x)) \leq u, \\ w h e r e & l = (l_{1}, l_{2}, \dots, l_{m}), u = (u_{1}, u_{2}, \dots, u_{m}), \\ x = (x_{1}, x_{2}, \dots, x_{n}) \in X, \\ X = {x | x_{l} \leq x \leq x_{u}}, \\ x_{l} = (x_{l_{1}}, x_{l_{2}}, \dots, x_{l_{n}}), x_{u} = (x_{u_{1}}, x_{u_{2}}, \dots, x_{u_{n}}), \end{matrix}

(1)

where

x

is a solution vector within the solution space

X

,

g (x)

represents the constraints, and

l

and

u

are the lower and upper bounds of the constraints, respectively. There are two types of scenarios for the solution to this formulation: feasible and infeasible. A solution is considered feasible if the solution vector

x \in X

satisfies all the constraints

g (x)

. Conversely, a solution is deemed infeasible if

x \in X

but does not satisfy all the constraints

g (x)

. At times, the feasibility ratio (the percentage of feasible solutions out of the total number of feasible and infeasible solutions) can be very low, which is equivalent to having very stringent constraints. In such cases, it is imperative to find a solution that satisfies the constraints as efficiently as possible.

For solving complex optimization problems, especially those with constraints, the use of surrogate models has emerged as a highly effective strategy. Surrogate models, pivotal in the realm of optimization, provide simplified yet powerful approximations of complex systems, enabling efficient exploration and optimization of design spaces where direct evaluations are prohibitively expensive. These models, also referred to as metamodels, span a diverse array of methodologies, each tailored to capture the intricacies of different problem domains [13]. To encapsulate these varying surrogate model methodologies and their application in constrained optimization problems, Table 1 provides a comprehensive summary of the key literature, highlighting techniques, tested datasets, and their respective advantages and disadvantages.

Despite these advancements, a comprehensive review of the literature underscores certain limitations in the current surrogate modelling approaches, particularly when applied to constrained optimization problems [18]. A notable challenge is the difficulty in accurately capturing and adapting to the complex constraint boundaries that often define feasible regions in optimization landscapes [19]. Many existing surrogate models excel in unconstrained scenarios or situations with simple constraints but may fall short when faced with complex, nonlinear, or high-dimensional constraints [20]. Additionally, there is a paucity of methodologies that effectively incorporate constraint handling mechanisms within the surrogate model itself, often relying on external penalty functions or constraint relaxation techniques that may not always yield optimal solutions. This gap in the literature highlights the need for more sophisticated surrogate models that can inherently deal with complex constraints while maintaining the balance between exploration and exploitation in the search space. These insights into the limitations of current surrogate models in handling constrained optimization problems have motivated the development of our proposed methodologies. By addressing these identified gaps, we aim to contribute to the advancement of surrogate-assisted optimization strategies, providing more robust and efficient solutions for complex constrained optimization challenges.

Among the diverse array of surrogate modelling techniques, Gaussian Process Regression (GPR) stands out for its efficiency and effectiveness [17]. GPR is renowned for its robustness in capturing the underlying trends of the data with a quantifiable measure of uncertainty, making it particularly suitable for optimization problems where uncertainty plays a critical role [21]. The historical development of surrogate models in optimization was notably advanced in 1998 when Jones et al. [22] introduced the Efficient Global Optimization (EGO) algorithm. This algorithm integrates GPR with the Expected Improvement (EI) function, a pivotal concept in Bayesian optimization. The EI function is designed to systematically identify the global optimum of computationally expensive black-box functions, which are often subject to inherent uncertainties [23]. This approach prioritizes sampling in regions where the anticipated improvement in the function’s value is maximized, thereby enhancing the model’s accuracy in critical areas. This methodology exemplifies an active learning strategy, dynamically refining the model with each new data point to efficiently navigate the complex landscape of the optimization problem. However, when dealing with optimization problems that include stringent constraints, the standard EI approach may require modifications to accommodate these limitations. In response to this challenge, recent developments have introduced variants such as the Constrained Expected Improvement (CEI), which adapt the EI principle to handle constraints effectively [24,25].

In addressing the challenges inherent in constrained optimization, this study introduces significant innovations that extend the current state-of-the-art methodologies. Firstly, we propose a novel approach to compute the Expected Prediction Error (EPE) [26] at an untested point by leveraging the cross-validation error from a nearby tested point. This method enhances the prediction accuracy of our surrogate model, particularly in the sparse regions of the design space where traditional interpolation methods might falter. Secondly, we introduce the Constrained Expected Prediction Error (CEPE) criterion, a pioneering metric designed to navigate the intricate landscape of constrained optimization problems efficiently. By integrating CEPE within a Differential Evolution (DE) [27] framework augmented with Gaussian Process (GP) surrogates, our approach not only capitalizes on the strengths of evolutionary algorithms in exploring complex solution spaces but also harnesses the predictive prowess of GP models to make informed decisions during the optimization process. The synergy between these innovations presents a robust framework that promises improved optimization performance, especially in scenarios characterized by stringent constraints and expensive function evaluations.

The remainder of this paper is structured as follows. In Section 2, we introduce the theoretical knowledge about Gaussian Process, Expected Improvement and Expected Prediction Error. In Section 3, we discuss methodologies for Constrained Expected Improvement, introduce our novel method Constrained Expected Prediction Error, and provide the GP surrogate-assisted Differential Evolution (DE) algorithm. In Section 4, we evaluate the efficacy of CEI and CEPE using benchmark problems, and in Section 5, we illustrate their application in a real-world problem. Finally, we conclude this paper with discussions and remarks in Section 6.

2. Background and Concepts

In this section, we provide a brief theoretical overview of Gaussian Process Regression (GPR), Expected Improvement (EI), and Expected Prediction Error (EPE).

2.1. Gaussian Process Regression

We aim to use the Gaussian Process Regression (GPR) framework to model an unknown function

f (x)

, by assuming that

f (x)

is a sample from a Gaussian process denoted as

y (x)

. Within this context, for any input point

x

, the function value

f (x)

is viewed as a sample from a Gaussian random variable

y (x)

and

y (x)

is distributed as

N (μ, σ^{2})

, where

μ

and

σ^{2}

are constants independent of

x

. The covariance of

y (x)

with another random variable

y (x^{'})

, where

x^{'}

represents another point in the input space, is given by

cov [y (x), y (x^{'})] = σ^{2} Corr (y (x), y (x^{'})) .

(2)

The correlation function

Corr (y (x), y (x^{'}))

is defined as

Corr (y (x), y (x^{'})) = exp (- \sum_{k = 1}^{p} θ_{k} {| x_{k} - x_{k}^{'} |}^{2}),

(3)

where p is the dimension of the input space and

θ_{k}

are the hyperparameters.

Considering a set of N input points

x_{1}, x_{2}, \dots, x_{N}

, let

f^{N} = {(f (x_{1}), f (x_{2}), \dots, f (x_{N}))}^{T}

and

y^{N} = {(y (x_{1}), y (x_{2}), \dots, y (x_{N}))}^{T}

. The parameters

μ, σ^{2}, θ_{1}, \dots, θ_{p}

can be estimated by maximizing the joint Gaussian probability density function of

y^{N}

at

f^{N}

p (f^{N} | μ, σ^{2}, R) = \frac{1}{{(2 π σ^{2})}^{N / 2} \sqrt{det (R)}} exp [- \frac{{(f^{N} - μ 1)}^{T} R^{- 1} (f^{N} - μ 1)}{2 σ^{2}}],

(4)

where

R

is an

N \times N

matrix with elements

R_{i, j} = Corr (y (x_{i}), y (x_{j}))

, and

1

is an N-dimensional column vector of ones.

The log-likelihood function is then

L (μ, σ^{2}, θ_{1}, \dots, θ_{p} | f^{N}) = - \frac{N}{2} log (2 π σ^{2}) - \frac{1}{2} log (det (R)) - \frac{{(f^{N} - μ 1)}^{T} R^{- 1} (f^{N} - μ 1)}{2 σ^{2}} .

(5)

Now, let us differentiate the log-likelihood with respect to

μ

:

\frac{\partial L}{\partial μ} = \frac{1}{σ^{2}} (1^{T} R^{- 1} (f^{N} - μ 1)) .

(6)

Setting this derivative to zero and solving for

μ

gives the maximum likelihood estimate

\hat{μ} = \frac{1^{T} R^{- 1} f^{N}}{1^{T} R^{- 1} 1} .

(7)

Next, let us differentiate the log-likelihood with respect to

σ^{2}

\frac{\partial L}{\partial σ^{2}} = - \frac{N}{2 σ^{2}} + \frac{{(f^{N} - 1 \hat{μ})}^{T} R^{- 1} (f^{N} - 1 \hat{μ})}{2 σ^{4}} .

(8)

Setting this derivative to zero and solving for

σ^{2}

gives the maximum likelihood estimate

{\hat{σ}}^{2} = \frac{{(f^{N} - 1 \hat{μ})}^{T} R^{- 1} (f^{N} - 1 \hat{μ})}{N} .

(9)

There is no analytical form for the estimates of

θ_{1}, \dots, θ_{p}

, so the maximization of the log-likelihood function with respect to

θ_{1}, \dots, θ_{p}

is usually completed by conjugate descent.

With the maximum likelihood estimates of the parameters, we can now derive expressions for predicting

f (x)

at a new point

x

. Let

r

be the vector of correlations between the new point and the N data points, where the i-th element of

r

is

C o r r (y (x), y (x_{i}))

. The best linear unbiased predictor (BLUP) of

f (x)

can be written as:

\hat{f} (x) = \hat{μ} + r^{T} R^{- 1} (f^{N} - 1 \hat{μ}),

(10)

where

\hat{μ}

is the estimate of the mean in (7) and

R

is the correlation matrix calculated using the estimates of

θ_{i}

.

Similarly, the prediction variance can be derived as:

s^{2} (x) = {\hat{σ}}^{2} (1 - r^{T} R^{- 1} r + \frac{{(1 - 1^{T} R^{- 1} r)}^{2}}{1^{T} R^{- 1} 1}) .

(11)

This forms the basis for Gaussian Process Regression, where we model the unknown function

f (x)

as a Gaussian process and make predictions at new points by combining information from the observed data.

2.2. Expected Improvement

In this section, we introduce a widely-adopted infill sampling method known as Expected Improvement (EI), developed by Jones et al. [22] for the optimization of expensive black-box functions. Expected Improvement is particularly useful once a surrogate model, typically Gaussian Process Regression, is constructed.

As previously defined in Equation (1), let

f (x)

denote the objective function with a surrogate model

y (x)

that follows a Gaussian process. Consider a set of test inputs

{x_{1}, x_{2}, \dots, x_{N}}

with corresponding observed function values

f^{N} = {(f (x_{1}), f (x_{2}), \dots, f (x_{N}))}^{T}

. We define

f_{\min}

as the best observed value of the function among the test inputs, i.e.,

f_{\min} = min (f^{N})

. The improvement of

f (x)

at a new input point

x

is defined as

I (x) = max {f_{\min} - f (x), 0} .

(12)

Expected Improvement is then defined as the expected value of this improvement, given the observed data

f^{N}

E [I (x) | f^{N}] = E [max {f_{\min} - f (x), 0} | f^{N}] .

(13)

Through some non-trivial mathematical manipulations, we can express the Expected Improvement in terms of the predictive mean and standard deviation of the surrogate model as follows:

E [I (x) | f^{N}] = (f_{\min} - \hat{f} (x)) Φ (\frac{f_{\min} - \hat{f} (x)}{s (x)}) + s (x) ϕ (\frac{f_{\min} - \hat{f} (x)}{s (x)}),

(14)

where

\hat{f} (x)

and

s (x)

are the predictive mean and standard deviation of the surrogate model at

x

, computed according to Equations (10) and (11), and

ϕ (\cdot)

and

Φ (\cdot)

denote the probability density function (PDF) and cumulative distribution function (CDF) of the standard normal distribution, respectively.

2.3. Expected Prediction Error

In this section, we introduce the Expected Prediction Error (EPE), an active learning strategy employed to evaluate the accuracy of a predictive model. EPE quantifies the discrepancy between the predicted and actual values. Specifically, it is the expected value of the squared prediction error over a sample of data. EPE is sometimes referred to as the mean squared error (MSE).

Within the context of the GPR methodology, the posterior distribution of the response at a given location

x

is denoted as

\hat{y} (x)

, which follows a Gaussian distribution expressed as

\hat{y} (x) \sim N (μ_{x}, σ_{x}^{2})

. In this setting, the mean

μ_{x}

symbolizes the predictive estimate given by

\hat{f} (x)

, while the variance

σ_{x}^{2}

reflects the uncertainty associated with the prediction, denoted as

s_{x}^{2}

. Given this framework, we can define the prediction error, often referred to as the loss function,

L

, for a point

x

as

L (x) = {(y (x) - \hat{y} (x))}^{2} .

(15)

Here,

y (x)

embodies the true underlying response, perturbed by inherent observational noise.

Then, the overall generalization error of the GPR-based surrogate model is given by

e = \int_{D} E [L (x)] d x,

(16)

where

E [L (x)]

denotes the expected value of the prediction error, and

D

is a subset of

R^{p}

. This expectation can be decomposed into

\begin{matrix} E [L (x)] = & {(E [y (x)] - E [\hat{y} (x)])}^{2} + E {[(\hat{y} (x) - E [\hat{y} (x)])]}^{2} + E [{(y (x) - E [y (x)])}^{2}], \end{matrix}

(17)

where the first term represents the squared bias, capturing the average difference between the predicted and observed responses; the second term is the prediction variance of the surrogate model; and the third term is the variance of the noise

V a r (ϵ)

, which is often negligible.

The EPE quantifies the discrepancy between the GPR model’s predictions and the true function values. In this context,

f (x)

denotes the true function we aim to approximate, and

\hat{f} (x)

represents the GPR model’s prediction at

x

. Thus, EPE at a point

x

is given by

E P E (x) = \underset{{bias}^{2}}{\underset{︸}{{(f (x) - \hat{f} (x))}^{2}}} + \underset{variance}{\underset{︸}{s^{2} (x)}} .

(18)

However, the true response

f (x)

is unknown in the bias term. Following the approach by Liu et al. [26], we employ leave-one-out cross-validation for estimation. Initially, we estimate the cross-validation errors at all training sample locations:

\begin{matrix} e_{CV}^{2} (x_{i}) = {(f (x_{i}) - {\hat{f}}^{- i} (x_{i}))}^{2}, i = 1, 2, \dots, N, \end{matrix}

(19)

where

x_{i}

represents the i-th training sample, and

{\hat{f}}^{- i}

denotes the GP model trained using all training samples except

(x_{i}, f (x_{i}))

. Next, for any point

x

, we locate the nearest training sample to

x

, denoted as

x_{i}

, and assign its associated cross-validation error to

e_{CV}^{2} (x)

:

\begin{matrix} e_{CV}^{2} (x) = e_{CV}^{2} (x_{i}), where i = arg min_{i} | x - x_{i} |, i = 1, 2, \dots, N . \end{matrix}

(20)

Finally, incorporating the cross-validation error, we obtain a refined expression for the EPE:

\begin{matrix} E P E (x) = e_{CV}^{2} (x) + s^{2} (x) . \end{matrix}

(21)

The Expected Prediction Error (EPE) plays a crucial role in assessing the accuracy and reliability of our surrogate model, particularly in constrained optimization problems. By calculating the EPE, we can identify regions in the design space where the model’s predictions are less reliable, informing us where additional sampling might be beneficial. Moreover, as EPE comprises both bias and variance, it aids in striking a balance between the two, ensuring that the model is neither too simplistic nor too complex. This information is vital for efficient exploration of the search space and making informed decisions in the optimization process.

3. Methodology

In this section, we first present the CEI method proposed by Jiao et al. [25], then we introduce a novel method CEPE for solving constrained optimization problems (COP). To solve a COP, the objective is to minimize the objective function subject to several constraints. Before delving into the details of our proposed method, we will define some notations in the following subsection.

In this work, we treat

x

as an untested point, while

x_{1}, x_{2}, \dots, x_{N}

are considered tested points. Let

y (x)

denote the objective function, and

G_{i} (x)

represent the constraint functions for

i = 1, 2, \dots, m

. For a given point

x

, we define a Gaussian random vector

G (x)

as

\begin{matrix} G (x) = (G_{1} (x), G_{2} (x), \dots, G_{m} (x)) . \end{matrix}

(22)

We introduce

G (x)

as a Gaussian random vector to represent the constraints’ evaluations at the point

x

. This probabilistic modelling approach allows us to account for and manage the uncertainties associated with constraint evaluations, especially when exact determinations of these constraints are not feasible or when they are subject to variability due to measurement errors, model inaccuracies, or other sources of uncertainty.

For a set of N tested points

x_{1}, x_{2}, \dots, x_{N}

, we define the following Gaussian random vectors:

\begin{matrix} y^{N} & = {(y (x_{1}), y (x_{2}), \dots, y (x_{N}))}^{T}, \\ G_{1}^{N} & = {(G_{1} (x_{1}), G_{1} (x_{2}), \dots, G_{1} (x_{N}))}^{T}, \\ G_{2}^{N} & = {(G_{2} (x_{1}), G_{2} (x_{2}), \dots, G_{2} (x_{N}))}^{T}, \\ ⋮ \\ G_{m}^{N} & = {(G_{m} (x_{1}), G_{m} (x_{2}), \dots, G_{m} (x_{N}))}^{T}, \\ G^{m N} & = {(G_{1} (x_{1}), G_{2} (x_{1}), \dots, G_{m} (x_{1}), \dots, G_{1} (x_{N}), G_{2} (x_{N}), \dots, G_{m} (x_{N}))}^{T}, \end{matrix}

where

y^{N}

,

G_{1}^{N}

,

G_{2}^{N}

, …,

G_{m}^{N}

, and

G^{m N}

represent the function values at the tested points. In contrast,

y (x)

and

G (x)

are the quantities to be predicted. The evaluated function values are

\begin{matrix} f^{N} & = {(f (x_{1}), f (x_{2}), \dots, f (x_{N}))}^{T}, \\ g_{1}^{N} & = {(g_{1} (x_{1}), g_{1} (x_{2}), \dots, g_{1} (x_{N}))}^{T}, \\ g_{2}^{N} & = {(g_{2} (x_{1}), g_{2} (x_{2}), \dots, g_{2} (x_{N}))}^{T}, \\ ⋮ \\ g_{m}^{N} & = {(g_{m} (x_{1}), g_{m} (x_{2}), \dots, g_{m} (x_{N}))}^{T}, \\ g^{m N} & = {(g_{1} (x_{1}), g_{2} (x_{1}), \dots, g_{m} (x_{1}), \dots, g_{1} (x_{N}), g_{2} (x_{N}), \dots, g_{m} (x_{N}))}^{T}, \end{matrix}

which are considered as samples of

y^{N}

,

G_{1}^{N}

,

G_{2}^{N}

, …,

G_{m}^{N}

, and

G^{m N}

. Essentially, by using the known function values, we can predict

y (x)

and

G_{i} (x)

for

i = 1, 2, \dots, m

at the tested points

x_{1}, x_{2}, \dots, x_{N}

without additional function evaluations.

3.1. Constrained Expected Improvement

In the exploration of constrained optimization problems, we often encounter a combination of both feasible and infeasible solutions, as categorized by Jiao et al. [25]. The feasibility of a solution

x \in X

is determined by whether it satisfies all constraints

g (x)

. An essential aspect of optimization lies in efficiently transitioning from an infeasible to a feasible situation to reduce computational costs. In our methodology, we initially identify infeasible solutions by evaluating them against all defined constraints

G_{i} (x)

.

In the following, we introduce the concept of Constrained Expected Improvement to aid this process. For a solution

x

, we measure the extent of constraint violation by defining

\begin{matrix} G_{i}^{+} (x) = max {0, l_{i} - G_{i} (x), G_{i} (x) - u_{i}}, \end{matrix}

(23)

where

l_{i}

and

u_{i}

denote the lower and upper constraint boundaries, respectively. We represent the constraint violation of a solution as the maximum violation across all constraints,

\begin{matrix} G^{+} (x) = max_{i = 1, 2, \dots, m} {G_{i}^{+} (x)} . \end{matrix}

(24)

This value is zero for feasible solutions and positive for infeasible ones. A solution

x

is deemed infeasible if it violates any of these constraints, i.e., if

G_{i}^{+} (x) > 0

for at least one constraint. This preliminary step is crucial for distinguishing between solutions that require improvement to meet feasibility criteria and those that are already feasible.

Let

g_{min}^{+ N}

denote the smallest observed constraint violation among all evaluated points up to the current iteration, effectively representing the ‘best’ constraint violation scenario. This value is crucial for assessing the relative improvement of constraint satisfaction for new candidate solutions compared to the existing ones. To quantify the improvement in constraint violation at an untested point

x

relative to the current best solution, we define the constrained improvement for infeasible solutions as

\begin{matrix} I_{c, N} (x) = \{\begin{matrix} g_{min}^{+ N} - G^{+} (x), & if G^{+} (x) \leq g_{min}^{+ N}, \\ 0, & otherwise . \end{matrix} \end{matrix}

(25)

Here,

I_{c, N} (x)

quantifies how much the constraint violation at an untested point

x

improves relative to the current best solution.

We can now derive the solution

x

as follows:

\begin{matrix} E [I_{c, N} (x) | g^{m N}] & = \int_{0}^{g_{m i n}^{+ N}} (g_{m i n}^{+ N} - z) \times p_{G^{+} (x) | g^{m N}} (z) d z \\ = \int_{0}^{g_{m i n}^{+ N}} (g_{m i n}^{+ N} - z) \times d P_{G^{+} (x) | g^{m N}} (z) \\ = \int_{0}^{g_{m i n}^{+ N}} P_{G^{+} (x) | g^{m N}} (z) d z - g_{m i n}^{+ N} \times P_{G^{+} (x) | g^{m N}} (0), \end{matrix}

(26)

where

p_{G^{+} (x) | g^{m N}} (\cdot)

represents the probability density function of the random variable conditional on

G^{m N} = g^{m N}

, and

P_{G^{+} (x) | g^{m N}} (\cdot)

denotes the cumulative distribution function under the same condition.

For

z \leq 0

,

\begin{matrix} P (G^{+} (x) \leq z | g^{m N}) = 0; \end{matrix}

(27)

and for

z > 0

, the computation is as follows:

\begin{matrix} P_{G^{+} (x) | g^{m N}} (z) \\ = & P {G^{+} (x) \leq z | g^{m N}} \\ = & P {[l_{1} - G_{1} (x) \leq z | g^{m N}] \cap [G_{1} (x) - u_{1} \leq z | g^{m N}] \cap \cdot \cdot \cdot \\ \cap [l_{m} - G_{m} (x) \leq z | g^{m N}] \cap [G_{m} (x) - u_{m} \leq z | g^{m N}]} \\ = & P {[l_{1} - z \leq G_{1} (x) \leq u_{1} + z | g^{m N}] \cap \cdot \cdot \cdot \cap [l_{m} - z \leq G_{m} (x) \leq u_{m} + z | g^{m N}] \\ = & \int_{l_{1} - z}^{u_{1} + z} d g_{1} \cdot \cdot \cdot \int_{l_{m} - z}^{u_{m} + z} d g_{m} \times p_{G (x) | g^{m N}} (g_{1}, \dots, g_{m}), \end{matrix}

(28)

where

p_{G (x) | g^{m N}} (g_{1}, \dots, g_{m})

is the joint Gaussian probability function of

G (x)

conditioned on

g^{m N}

. Since the objective and all constraints are mutually independent,

p_{G (x) | g^{m N}} (g_{1}, \dots, g_{m})

can be computed by

\begin{matrix} p_{G (x) | g^{m N}} (g_{1}, \dots, g_{m}) \\ = & \prod_{i = 1}^{m} p_{G_{i} (x) | g_{i}^{N}} (g_{i}) \\ = & \prod_{i = 1}^{m} \frac{1}{s_{g_{i}} (x)} ϕ (\frac{g_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}), \end{matrix}

(29)

where

{\hat{g}}_{i} (x)

and

s_{g_{i}} (x)

(

i = 1, \dots, m

) are the expectations and variances of the constraints respectively.

By substituting the result

p_{G (x) | g^{m N}} (g_{1}, \dots, g_{m})

obtained from Equation (29) into Equation (28), we obtain the cumulative distribution function for

P_{G^{+} (x) | g^{m N}} (\cdot)

under the condition

G^{m N} = g^{m N}

as follows

\begin{matrix} P_{G^{+} (x) | g^{m N}} (z) \\ = & \int_{l_{1} - z}^{u_{1} + z} d g_{1} \cdot \cdot \cdot \int_{l_{m} - z}^{u_{m} + z} d g_{m} \times \prod_{i = 1}^{m} \frac{1}{s_{g_{i}} (x)} ϕ (\frac{g_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) \\ = & \prod_{i = 1}^{m} \int_{l_{i} - z}^{u_{i} + z} \frac{1}{s_{g_{i}} (x)} ϕ (\frac{g_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) d g_{i} \\ = & \prod_{i = 1}^{m} [Φ (\frac{u_{i} + z - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) - Φ (\frac{l_{i} - z - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)})] . \end{matrix}

(30)

Finally, the constrained expected improvement for a solution

x

is given by

\begin{matrix} E [I_{c, N} (x) | g^{m N}] = \int_{0}^{g_{m i n}^{+ N}} P_{G^{+} (x) | g^{m N}} (z) d z - g_{m i n}^{+ N} \times P_{G^{+} (x) | g^{m N}} (0), \end{matrix}

(31)

with

P_{G^{+} (x) | g^{m N}} (z) = \prod_{i = 1}^{m} [Φ (\frac{u_{i} + z - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) - Φ (\frac{l_{i} - z - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)})]

calculated as mentioned above.

In situations where a feasible point has already been identified, it is desired to maximize the Expected Improvement (EI) of the objective, ensuring that the feasibility conditions are met. This is essentially a quest for a feasible solution that provides the best possible objective value. This strategy is denoted as the Constrained Expected Improvement (CEI), represented by

E [I_{c, N} (x) | f^{N}, g^{m N}]

. We define the improvement in the objective function subject to the satisfaction of constraints as

I_{c, N} (x)

. Assuming that

y (x)

and

G_{i} (x)

for

i = 1, 2, \dots, m

are mutually independent, the constrained expected improvement based on

y^{N} = f^{N}

and

G^{m N} = g^{m N}

is given by

\begin{matrix} E [I_{c, N} (x) | f^{N}, g^{m N}] = E [I (x) | f^{N}] \times \prod_{i = 1}^{m} P {{l_{i} \leq G_{i} (x) \leq u_{i}} | g_{i}^{N}}, \end{matrix}

(32)

where

\prod_{i = 1}^{m} P \{l_{i} \leq G_{i} (x) \leq u_{i} | g_{i}^{N}\}

represents the probability of feasibility (PF), and

E [I (x) | f^{N}]

denotes the expected improvement at the point

x

.

In the CEI method, the constrained improvement can be defined as

\begin{matrix} I_{c, N} (x) = \{\begin{matrix} f_{m i n} - y (x), & if y (x) \leq f_{m i n} and l_{i} \leq G_{i} (x) \leq u_{i}, \\ 0, & otherwise . \end{matrix} \end{matrix}

(33)

Here,

f_{m i n}

is the best value of

y (x)

over all the test values

f^{N}

, and

\begin{matrix} \prod_{i = 1}^{m} P {{l_{i} \leq G_{i} (x) \leq u_{i}} | g_{i}^{N}} \\ = & P {{l_{1} \leq G_{1} (x) \leq u_{1}} | g_{1}^{N}} \times P {{l_{2} \leq G_{2} (x) \leq u_{2}} | g_{2}^{N}} \times \cdot \cdot \cdot \times P {{l_{m} \leq G_{m} (x) \leq u_{m}} | g_{m}^{N}} \\ = & \int_{l_{1}}^{u_{1}} p_{G_{1} (x) | g_{1}^{N}} (g_{1}) d g_{1} \int_{l_{2}}^{u_{2}} p_{G_{2} (x) | g_{2}^{N}} (g_{2}) d g_{2} \cdot \cdot \cdot \int_{l_{m}}^{u_{m}} p_{G_{m} (x) | g_{m}^{N}} (g_{m}) d g_{m} \\ = & \prod_{i = 1}^{m} [Φ (\frac{u_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) - Φ (\frac{l_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)})] . \end{matrix}

(34)

Using this, we can now define the Constrained Expected Improvement as

\begin{matrix} E [I_{c, N} (x) | f^{N}, g^{m N}] = & E [I (x) | f^{N}] \times \prod_{i = 1}^{m} [Φ (\frac{u_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) - Φ (\frac{l_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)})] . \end{matrix}

(35)

Here,

E [I (x) | f^{N}]

is the same with Equation (14), and the CEI of a solution

x

is given by

\begin{matrix} E [I_{c, N} (x) | f^{N}, g^{m N}] = & [(f_{m i n} - \hat{f} (x)) Φ (\frac{f_{m i n} - \hat{f} (x)}{s (x)}) + s (x) ϕ (\frac{f_{m i n} - \hat{f} (x)}{s (x)})] \\ \times \prod_{i = 1}^{m} [Φ (\frac{u_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) - Φ (\frac{l_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)})] . \end{matrix}

(36)

As previously defined in Equation (1), the special case for COP formula has a lower constraint bound of

l = - \infty

and an upper constraint bound of

u = 0

. Equations (31) and (36) for CEI can be summarized as follows:

For an infeasible situation:

\begin{matrix} E [I_{c, N} (x) | g^{m N}] = \int_{0}^{g_{m i n}^{+ N}} \prod_{i = 1}^{m} Φ (\frac{z - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) d z - g_{m i n}^{+ N} \times \prod_{i = 1}^{m} Φ (\frac{- {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) . \end{matrix}

(37)

For a feasible situation:

\begin{matrix} E [I_{c, N} (x) | f^{N}, g^{m N}] = & [(f_{m i n} - \hat{f} (x)) Φ (\frac{f_{m i n} - \hat{f} (x)}{s (x)}) + s (x) ϕ (\frac{f_{m i n} - \hat{f} (x)}{s (x)})] \\ \times \prod_{i = 1}^{m} Φ (\frac{- {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) . \end{matrix}

(38)

3.2. Constrained Expected Prediction Error

Building upon the preceding subsection, which discussed the workings of Constrained Expected Improvement (CEI) under both infeasible and feasible situations, we now move our focus towards the Constrained Expected Prediction Error (CEPE). It is our novel method for constrained optimization problems.

In terms of infeasible situations, our approach with CEPE aligns with the methods used in the preceding CEI subsection. In terms of feasible situations of the CEPE method, we introduce the concept of prediction error, denoted as

P E_{c, N} (x)

, which is defined as follows:

\begin{matrix} P E_{c, N} (x) = \{\begin{matrix} {(f (x) - f (x_{*}))}^{2}, & if l_{i} \leq G_{i} (x) \leq u_{i}, \\ 0, & otherwise . \end{matrix} \end{matrix}

(39)

Here,

f (x_{*})

is the observed output sample at the sample point

x_{*}

, which is the nearest evaluated point to the candidate point

x

. This approach assumes that

f (x)

exhibits moderate continuity in the vicinity of

x_{*}

, an assumption that is generally valid for smooth functions commonly encountered in engineering and optimization applications.

\begin{matrix} E [P E_{c, N} (x) | f^{N}] = (\hat{f} (x_{*}) - \hat{f} {(x))}^{2} + s^{2} (x) . \end{matrix}

(40)

In this formulation,

\hat{f} (x_{*})

is the surrogate model’s prediction at the point

x_{*}

, and

\hat{f} (x)

is the model’s prediction at the candidate point

x

. The term

s^{2} (x)

represents the prediction variance at

x

, reflecting the model’s uncertainty.

Then we can obtain the formulation of the CEPE of a solution

x

below,

\begin{matrix} E [P E_{c, N} (x) | f^{N}, g^{m N}] = & E [P E_{c, N} (x) | f^{N}] \times \prod_{i = 1}^{m} [Φ (\frac{u_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) - Φ (\frac{l_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)})] \\ = & [{(\hat{f} (x_{*}) - \hat{f} (x))}^{2} + s^{2} (x)] \times \prod_{i = 1}^{m} [Φ (\frac{u_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) - Φ (\frac{l_{i} - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)})] . \end{matrix}

(41)

This formulation takes into account the constraints by incorporating the product of the cumulative distribution functions for the constraints. It effectively combines the expected prediction error with the feasibility of the solution, guiding the optimization process towards regions of the design space where the model uncertainty is high and potential improvements in the objective function are likely, while still adhering to the constraints.

In summary, the CEPE method is underpinned by the assumption of moderate function continuity near

x_{*}

, allowing the use of surrogate model predictions at nearby points to estimate the prediction error at untested points. This approach enables the efficient exploration of the design space, especially in the context of expensive function evaluations and stringent constraints.

Similarly, as previously defined in Equation (1), the special case for COP formula has a lower constraint bound of

l = - \infty

and an upper constraint bound of

u = 0

. Equations (31) and (41) for CEPE can be summarized as follows:

For an infeasible situation:

\begin{matrix} E [P E_{c, N} (x) | g^{m N}] = \int_{0}^{g_{m i n}^{+ N}} \prod_{i = 1}^{m} Φ (\frac{z - {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) d z - g_{m i n}^{+ N} \times \prod_{i = 1}^{m} Φ (\frac{- {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) . \end{matrix}

(42)

For a feasible situation:

\begin{matrix} E [P E_{c, N} (x) | f^{N}, g^{m N}] = [{(\hat{f} (x_{*}) - \hat{f} (x))}^{2} + s^{2} (x)]] \prod_{i = 1}^{m} Φ (\frac{- {\hat{g}}_{i} (x)}{s_{g_{i}} (x)}) . \end{matrix}

(43)

3.3. The GP Surrogate-Assisted DE Algorithm

In this section, we outline and describe the steps of the GP surrogate-assisted Differential Evolution (DE) algorithm. The algorithm applies to problems with any number of dimensions n and any number of constraints m. The steps are as follows:

Step 1: Design of Experiments (DOE)
Initialize the algorithm by assigning samples using the Latin Hypercube Design (LHD). The steps are as follows:
- Divide each dimension into $p o p s i z e$ intervals of equal probability.
- Randomly assign one value to each interval within each dimension.
- For each dimension, permute the vector of assigned values randomly.
- Combine the permuted vectors of assigned values across dimensions into a $p o p s i z e \times n$ matrix.
Step 2: Identify the Best Point and Assess Feasibility
Extract the objective function values and constraint violation values from the initial samples. Identify the feasible solutions (those with no constraint violations) and select the one with the optimal objective function value as the best feasible solution. If no feasible solutions exist, select the solution with the least constraint violation as the best infeasible solution. Update the data structure with information about the best solution discovered thus far, and return this information along with a feasibility flag.
Step 3: Clustering—Organize Tested Points into Clusters
Given N tested points $x_{1}, x_{2}, \dots, x_{N}$ , if $N \leq L_{1}$ (where $L_{1}$ is the maximum number of points a local model can contain), use all the points to build a single local model. If $N > L_{1}$ , apply fuzzy clustering to divide the points into $c_{s i z e}$ clusters, where

$\begin{matrix} c_{s i z e} = 1 + ⌈\frac{N - L_{1}}{L_{2}}⌉, \end{matrix}$

and $L_{1} > L_{2}$ (where $L_{2}$ is the number of points for adding one more local model). The clustering minimizes the following function J:

$\begin{matrix} J = \sum_{i = 1}^{N} \sum_{j = 1}^{c_{s i z e}} u_{i j}^{α} {∥ x_{i} - v_{j} ∥}^{2}, \end{matrix}$

where $α$ is a constant greater than 1, $v_{j}$ is the centre of the cluster j, $u_{i j}$ represents the membership degree of $x_{i}$ in cluster j, and $∥ \cdot ∥$ denotes the Euclidean norm. $v_{j}$ can be calculated as

$\begin{matrix} v_{j} = \frac{\sum_{i = 1}^{N} {(u_{i j}^{t})}^{α} x_{i}}{\sum_{i = 1}^{N} {(u_{i j}^{t})}^{α}} . \end{matrix}$

Initialize $u_{i j}^{0}$ for $i = 1, 2, \dots, N$ and $j = 1, 2, \dots, c_{s i z e}$ , and set $t = 0$ . Compute $u_{i j}^{t + 1}$ using

$\begin{matrix} u_{i j}^{t + 1} = \frac{1}{\sum_{k = 1}^{c_{s i z e}} {(\frac{∥ x_{i} - v_{j} ∥}{∥ x_{i} - v_{k} ∥})}^{\frac{2}{α - 1}}} . \end{matrix}$

If ${max}_{1 \leq i \leq N, 1 \leq j \leq c_{s i z e}} | u_{i j}^{t + 1} - u_{i j}^{t} | < ϵ$ , terminate and output $v_{j}$ and $u_{i j} = u_{i j}^{t + 1}$ ; otherwise, increment t and recalculate $v_{j}$ and $u_{i j}^{t + 1}$ until the stopping criterion $ϵ$ is satisfied.
Step 4: Modelling—Build Local GP Surrogate Models
For each cluster, construct local Gaussian Process (GP) surrogate models for the objective function and the constraint functions separately.
Step 5: Differential Evolution–Generate and Evaluate Candidate Points
Use Differential Evolution (DE) [27] to generate $N P$ candidate points and evaluate them using the surrogate models. DE is a population-based optimization algorithm that aims to find the global minimum of a given objective function within a high-dimensional search space. The algorithm employs mutation, crossover, and selection strategies to evolve the population towards the optimal solution. During the mutation step, new candidate solutions are generated by combining existing solutions with a weighted difference vector, where the scaling factor is denoted by F. In the crossover step, these candidates are combined with the original population to form a trial population, with a defined crossover probability $C R$ .
Incorporate the Constrained Expected Improvement (CEI) and Constrained Expected Prediction Error (CEPE) criteria, derived from the Gaussian Process (GP) surrogate models, into the evaluation process of candidate points. By computing the CEI and CEPE values for each candidate point, the algorithm leverages these metrics to guide the selection process within DE. Candidate points are prioritized based on higher CEI values, indicating a greater expected improvement, or lower CEPE values, signifying a lower prediction error. This strategic prioritization ensures the exploration of regions in the search space with the highest potential for improvement while adhering to the constraints. This iterative process continues until a termination criterion is met, such as reaching the maximum number of generations $M a x t$ or achieving a predetermined level of convergence. The flowchart of the DE algorithm, highlighting the integration of CEI and CEPE, is depicted in Figure 1.
Step 6: Evaluation—Use Original Objective and Constraint Functions
Evaluate the $N P$ candidate points using the original objective and constraint functions. This step can also be considered as the final selection step of Differential Evolution.
Step 7: Iteration—Update Data and Repeat Steps 3 to 6
Incorporate new data from the observations into the model, and repeat Steps 3 to 6 until the termination condition is reached. The termination condition is typically based on the number of function evaluations, denoted as $F E s$ .

4. Simulation

In this section, we illustrate the efficacy of the proposed CEI (Constrained Expected Improvement) and CEPE (Constrained Expected Prediction Error) methods for addressing constrained optimization problems. We begin by introducing traditional benchmark problems, and then undertake a comparison and evaluation of the two methods by analyzing statistics for each problem.

4.1. Benchmark Problem Description

We employ the classic set of problems presented in CEC 2006 by Liang et al. [28], which was proposed as part of the IEEE Congress on Evolutionary Computation in 2006, and all problems we solve including the modified parts can be found by Chaiyotha and Krityakierne [24]. This suite encompasses 25 real-parameter optimization problems, representing a diverse array of characteristics. GP (Gaussian Process) is advantageous in low-dimensional settings (i.e., when the number of dimensions

n \leq 10

) [29]. Consequently, we select problems with suitable dimensions n and numbers of constraints m, as higher dimensions would entail significantly extended computation times with our GP surrogate-assisted DE algorithm. Nine problems are chosen, including two that we modify. The characteristics of these problems can be found in Table 2, where

P r o b

denotes the name of the problems,

D i m

represents the number of decision variables or dimensions,

f (x^{*})

signifies the best-known results, and

ρ

is the approximate ratio of feasible solutions within the search space (inclusive of both infeasible and feasible solutions).

4.2. Experimental Settings

As discussed in Section 3.3 regarding the GP surrogate-assisted DE algorithm, we will specify certain parameters for solving these numerical simulations, follows closely the methodologies and values reported in existing literature, particularly the work by Jiao et al. [25]. This reference provided a foundational basis for our initial parameter settings, ensuring consistency and comparability with established practices in the field. The problems have n dimensions and m constraints. In the Latin Hypercube Design (LHD), we generate

(11 n - 1)

initial samples, which serve as our population size (

p o p s i z e

). The parameters utilized in the fuzzy clustering are

L_{1} = 80

,

L_{2} = 20

,

α = 2

, and

ϵ = 0.5

. In the Differential Evolution (DE) process, we set the population size (

N P

) to 30, the number of generations to 500, the crossover probability (

C R

) to 0.9, and the scaling factor (F) to 0.5. In the CEPE method, the number of candidates is set to

50 \times n

. The number of function evaluations (

F E s

) is also set to

50 \times n

. Each benchmark problem is repeated 100 times using both methodologies (CEI and CEPE) with identical parameters to ensure fairness in the comparison.

The experiments were conducted on a personal computer with an Intel(R) Core(TM) i5-8265U CPU @ 1.60 GHz (turbo up to 1.80 GHz). The Intel Corporation is based in Santa Clara, California, USA. The computational analysis and simulations were performed using MATLAB version 2020b. This setup provided a consistent and controlled environment for all computational experiments, ensuring the reliability and reproducibility of our results.

4.3. Results

Figure 2 showcases box plots for nine different benchmark problems. Each box plot displays the distribution of values obtained from 100 independent experiments using two methods: CEI and CEPE. Additionally, Table 3 lists the mean, standard deviation (sd), best, and worst values of the objective function obtained by these methods after 100 iterations. The mean and standard deviation values indicate the quality of the best feasible solutions across the 100 independent experiments. The best and worst entries signify the lowest and highest values of the best feasible solution obtained from the 100 independent experiments for each method, respectively.

Examining Table 3, we observe that in six of the benchmark problems (G02mod, G04, G06, G11, G12, and G24), the CEPE method yields lower mean values compared to the CEI method, indicating better performance. Conversely, the CEI method performs better in the other three problems. Moreover, the standard deviation values mostly align with the mean values, with seven benchmark problems (G02mod, G03mod, G04, G06, G08, G12, and G24) exhibiting lower standard deviations for the method that has the lower mean. Regarding the best values, CEPE outperforms or matches CEI in eight out of nine problems, with the exception being G11, where CEI has a slight edge. For the worst values, CEPE is superior in seven out of nine problems, with CEI prevailing only in G08 and G11.

To statistically compare the performance differences between CEI and CEPE, a non-parametric Wilcoxon rank-sum test was employed. This test is particularly suitable when the data is not normally distributed, and it assumes equal variances between groups. The null hypothesis posits that the two methods yield the same distribution of values, whereas the alternative hypothesis contends that the distributions differ. A test statistic representing the sum of the ranks of the values is computed. A significant test statistic indicates a notable difference between the two methods, and a p-value below 0.05 is deemed statistically significant. The notation

X < Y

indicates that method X is significantly better than method Y at the 5% significance level, whereas

X \approx Y

signifies that there is no significant difference between the two methods. The results are compiled in Table 4. The table reveals that for most benchmark problems, CEI and CEPE exhibit similar performance, with CEPE being significantly better than CEI only for problem G02.

5. Application

In this section, we will apply the GP surrogate-assisted DE algorithm to the Three-Bar Truss Design problem. The Three-Bar Truss Design problem [30,31] is a classical optimization problem that is widely studied within the engineering and mathematical communities. The objective is to design a truss structure, composed of three bars, capable of withstanding a specified load while minimizing weight. This problem is particularly significant in the design of lightweight and efficient structures in fields such as aerospace and civil engineering. Moreover, it serves as a quintessential example of how optimization algorithms can address complex societal and civilizational challenges. The problem can be formulated as follows:

\begin{matrix} minimize f (x) = (2 \sqrt{2} x_{1} + x_{2}) & \times l, \\ subject to : \frac{\sqrt{2} x_{1} + x_{2}}{\sqrt{2} x_{1}^{2} + 2 x_{1} x_{2}} P - σ & \leq 0, \\ \frac{x_{2}}{\sqrt{2} x_{1}^{2} + 2 x_{1} x_{2}} P - σ & \leq 0, \\ \frac{1}{x_{1} + \sqrt{2} x_{2}} P - σ & \leq 0, \end{matrix}

(44)

where

0 \leq x_{1} \leq 1

and

0 \leq x_{2} \leq 1

;

l = 100

cm,

P = 2

KN/

{cm}^{2}

, and

σ = 2

KN/cm

{cm}^{2}

.

For the experiment, we employ the same parameter settings as in the numerical simulations. The problem has two dimensions and three constraints. Using Latin hypercube design (LHD), we generate 21 initial samples for our population size. The parameters in the fuzzy cluster are set to

L_{1} = 80

,

L_{2} = 20

,

α = 2

, and

ϵ = 0.5

. In Differential Evolution (DE), the population size is assumed to be

N P = 30

, the number of generations is set to 500, the crossover probability is

C R = 0.9

, and the scaling factor is

F = 0.5

. In the CEPE method, we set the candidate number to 100. The number of function evaluations (FEs) is 100. The experiment is repeated 100 times using two different methodologies (CEI and CEPE), and the parameters are kept the same to ensure a fair comparison.

Figure 3 depicts box plots that display the distribution of the objective function values obtained using the CEI and CEPE methods across 100 independent experiments. Table 5 lists the mean, standard deviation (sd), best, and worst of the objective function values obtained by both methods over 100 iterations. The best-known result for this problem is approximately

263.8958

. From the table, it is evident that CEPE performs better in terms of mean, standard deviation, and worst objective function values, while CEI marginally outperforms CEPE in the best objective function value.

Additionally, we employ the Wilcoxon rank-sum test to statistically compare the performance of the CEI and CEPE methods. With a p-value of 0.0538%, the test indicates that the CEPE method is significantly superior to the CEI method.

6. Discussion and Remarks

In this paper, we introduced the formulation of constrained optimization problems, the methodologies of Constrained Expected Improvement (CEI) and Constrained Expected Prediction Error (CEPE) in conjunction with the Gaussian Process (GP) surrogate-assisted Differential Evolution (DE) algorithm, and presented numerical benchmark simulations and applications. Additionally, we would like to emphasize the potential application of these methodologies in the field of engineering, where optimization plays a crucial role.

The implementation of CEI and CEPE has showcased their capacity for the efficient utilization of computational resources, making them highly advantageous in engineering contexts where simulation models can be computationally demanding. Their adeptness in managing multiple constraints, robustness against uncertainties, and flexibility to adapt to various problem domains highlight the methodologies’ applicability in real-world engineering problems. These attributes facilitate the optimization of processes, enhancement of product yields, reduction of waste, and improvement in safety and environmental performance.

Despite these advantages, the methodologies do present limitations, such as potential inflexibility in highly specialized engineering challenges and sensitivity to parameter configuration, which could impact their effectiveness. Moreover, the computational intensity required by CEI and CEPE, especially for large-scale problems, warrants consideration. These limitations suggest a cautious approach to their application, emphasizing the need for tailoring or integrating them with other techniques to meet specific engineering requirements.

In summary, the findings from this study affirm the promise of CEI and CEPE in advancing surrogate-assisted optimization strategies for engineering applications. While addressing the identified gaps in constrained optimization, this research also paves the way for future investigations into more sophisticated surrogate models and optimization techniques. Further exploration in this domain could yield even more robust and efficient solutions for complex optimization challenges, contributing to the continuous advancement of engineering optimization practices.

Author Contributions

Conceptualization, H.C. and B.W.; methodology, H.C. and B.W.; software, H.C. and Z.W.; validation, H.C.; formal analysis, H.C. and B.W.; investigation, H.C.; resources, H.C.; writing—original draft preparation, H.C.; writing—review and editing, H.C. and B.W.; visualization, H.C.; supervision, B.W.; project administration, H.C. and B.W.; funding, H.C. and B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by Chinese Scholarship Council (No.202008060052) and in part by University of Leicester (No. CSE-MA5-WANG).

Data Availability Statement

The data generated for this study are derived from formulae detailed in the referenced literature and further processed using the author’s methodologies. They are available from the corresponding author upon reasonable request. The data are not publicly available as they are unique to the study’s bespoke computational analysis.

Acknowledgments

The authors thank the editor and the reviewers for their very helpful and constructive comments.

Conflicts of Interest

Author Zhe Wang was employed by the company Covestro (Netherlands). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The Covestro (Netherlands) had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CEI	Constrained Expected Improvement
CEPE	Constrained Expected Prediction Error
COPs	Constrained Optimization Problems
DE	Differential Evolution
EGO	Efficient Global Optimization
EI	Expected Improvement
EPE	Expected Prediction Error
GPR	Gaussian Process Regression

References

Banks, A.; Vincent, J.; Anyakoha, C. A review of particle swarm optimization. Part II: Hybridisation, combinatorial, multicriteria and constrained optimization, and indicative applications. Nat. Comput. 2008, 7, 109–124. [Google Scholar] [CrossRef]
Schueller, G.I.; Jensen, H.A. Computational methods in optimization considering uncertainties—An overview. Comput. Methods Appl. Mech. Eng. 2008, 198, 2–13. [Google Scholar] [CrossRef]
Kumar, A.; Wu, G.; Ali, M.Z.; Mallipeddi, R.; Suganthan, P.N.; Das, S. A test-suite of non-convex constrained optimization problems from the real-world and some baseline results. Swarm Evol. Comput. 2020, 56, 100693. [Google Scholar] [CrossRef]
Akyol, D.E.; Bayhan, G.M. A review on evolution of production scheduling with neural networks. Comput. Ind. Eng. 2007, 53, 95–122. [Google Scholar] [CrossRef]
Kim, T.H.; Maruta, I.; Sugie, T. A simple and efficient constrained particle swarm optimization and its application to engineering design problems. Proc. Inst. Mech. Eng. Part J. Mech. Eng. Sci. 2010, 224, 389–400. [Google Scholar] [CrossRef]
Martelli, E.; Amaldi, E. PGS-COM: A hybrid method for constrained non-smooth black-box optimization problems: Brief review, novel algorithm and comparative evaluation. Comput. Chem. Eng. 2014, 63, 108–139. [Google Scholar] [CrossRef]
Rao, R. Review of applications of TLBO algorithm and a tutorial for beginners to solve the unconstrained and constrained optimization problems. Decis. Sci. Lett. 2016, 5, 1–30. [Google Scholar]
Guedria, N.B. Improved accelerated PSO algorithm for mechanical engineering optimization problems. Appl. Soft Comput. 2016, 40, 455–467. [Google Scholar] [CrossRef]
Agarwal, H.; Renaud, J.E.; Preston, E.L.; Padmanabhan, D. Uncertainty quantification using evidence theory in multidisciplinary design optimization. Reliab. Eng. Syst. Saf. 2004, 85, 281–294. [Google Scholar] [CrossRef]
LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
Maier, H.R.; Kapelan, Z.; Kasprzyk, J.; Kollat, J.; Matott, L.S.; Cunha, M.C.; Dandy, G.C.; Gibbs, M.S.; Keedwell, E.; Marchi, A.; et al. Evolutionary algorithms and other metaheuristics in water resources: Current status, research challenges and future directions. Environ. Model. Softw. 2014, 62, 271–299. [Google Scholar] [CrossRef]
Kashani, A.R.; Camp, C.V.; Rostamian, M.; Azizi, K.; Gandomi, A.H. Population-based optimization in structural engineering: A review. Artif. Intell. Rev. 2022, 345–452. [Google Scholar]
Forrester, A.; Sobester, A.; Keane, A. Engineering Design via Surrogate Modelling: A Practical Guide; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
Box, G.E.P.; Wilson, K.B. On the Experimental Attainment of Optimum Conditions. In Breakthroughs in Statistics: Methodology and Distribution; Springer: New York, NY, USA, 1992; pp. 270–310. [Google Scholar]
Buhmann, M.D. Radial Basis Functions. Acta Numer. 2000, 9, 1–38. [Google Scholar] [CrossRef]
Oliver, M.A.; Webster, R. Kriging: A Method of Interpolation for Geographical Information Systems. Int. J. Geogr. Inf. Syst. 1990, 4, 313–332. [Google Scholar] [CrossRef]
Gramacy, R.B. Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences; CRC Press: Boca Raton, FL, USA, 2020. [Google Scholar]
Queipo, N.V.; Haftka, R.T.; Shyy, W.; Goel, T.; Vaidyanathan, R.; Tucker, P.K. Surrogate-based analysis and optimization. Prog. Aerosp. Sci. 2005, 41, 1–28. [Google Scholar] [CrossRef]
Tabatabaei, M.; Hakanen, J.; Hartikainen, M.; Miettinen, K.; Sindhya, K. A survey on handling computationally expensive multiobjective optimization problems using surrogates: Non-nature inspired methods. Struct. Multidiscip. Optim. 2015, 52, 1–25. [Google Scholar] [CrossRef]
Lataniotis, C.; Marelli, S.; Sudret, B. Extending classical surrogate modeling to high dimensions through supervised dimensionality reduction: A data-driven approach. Int. J. Uncertain. Quantif. 2020, 10, 55–82. [Google Scholar] [CrossRef]
Zhou, Z.; Ong, Y.S.; Nguyen, M.H.; Lim, D. A study on polynomial regression and Gaussian process global surrogate model in hierarchical surrogate-assisted evolutionary algorithm. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, Edinburgh, UK, 2–5 September 2005; pp. 2832–2839. [Google Scholar]
Jones, D.R.; Schonlau, M.; Welch, W.J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 1998, 13, 455. [Google Scholar] [CrossRef]
Hawe, G.; Sykulski, J. Considerations of accuracy and uncertainty with kriging surrogate models in single-objective electromagnetic design optimisation. IET Sci. Meas. Technol. 2007, 1, 37–47. [Google Scholar] [CrossRef]
Chaiyotha, K.; Krityakierne, T. A comparative study of infill sampling criteria for computationally expensive constrained optimization problems. Symmetry 2020, 12, 1631. [Google Scholar] [CrossRef]
Jiao, R.; Zeng, S.; Li, C.; Jiang, Y.; Jin, Y. A complete expected improvement criterion for Gaussian process assisted highly constrained expensive optimization. Inf. Sci. 2019, 471, 80–96. [Google Scholar] [CrossRef]
Liu, H.; Cai, J.; Ong, Y.-S. An adaptive sampling approach for Kriging metamodeling by maximizing expected prediction error. Comput. Chem. Eng. 2017, 106, 171–182. [Google Scholar] [CrossRef]
Das, S.; Suganthan, P.N. Differential evolution: A survey of the state-of-the-art. IEEE Trans. Evol. Comput. 2010, 15, 4–31. [Google Scholar] [CrossRef]
Liang, J.J.; Runarsson, T.P.; Mezura-Montes, E.; Clerc, M.; Suganthan, P.N.; Coello, C.A.C.; Deb, K. Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. J. Appl. Mech. 2006, 41, 8–31. [Google Scholar]
Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, USA, 2006. [Google Scholar]
Ray, T.; Liew, K.-M. Society and civilization: An optimization algorithm based on the simulation of social behavior. IEEE Trans. Evol. Comput. 2003, 7, 386–396. [Google Scholar] [CrossRef]
Yıldırım, A.E.; Karci, A. Application of three bar truss problem among engineering design optimization problems using artificial atom algorithm. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 28–30 September 2018; pp. 1–5. [Google Scholar]

Figure 1. Flowchart of the Differential Evolution Algorithm.

Figure 2. Box plots of the best feasible solution after 100 repetitions by CEI and CEPE in nine benchmark problems.

Figure 3. Box plots representing the distribution of the best feasible solutions obtained after 100 repetitions using the CEI and CEPE methods in the Three-Bar Truss Design problem.

Table 1. Summary of surrogate model methodologies in constrained optimization problems.

Methodology	Reference	Approach	Advantages	Limitations
Polynomial Regression	Box and Wilson, 1992 [14]	Utilizes polynomial equations to model relationships between variables	Straightforward approach for problems with smooth relationships	May not capture complex, nonlinear behaviour effectively
Radial Basis Function (RBF) Networks	Buhmann, 2000 [15]	Employs radial basis functions to approximate unknown functions	Effective for scattered multidimensional data	Computationally intensive for large datasets
Kriging	Oliver and Webster, 1990 [16]	A best-unbiased predictor based on geostatistical methods	Provides an estimate of prediction error; well-suited for surrogate modelling	Requires significant computational resources for large problems
Gaussian Process Regression (GPR)	Gramacy, 2020 [17]	Models unknown functions as samples from a Gaussian process	Robust in capturing trends and uncertainty quantification	May be challenging to scale to very high-dimensional problems

Table 2. Summary of benchmark characteristics.

$Prob$	$Dim$	Type of f	$f (x^{*})$	$ρ$
G02mod	2	Nonlinear	-	-
G03mod	2	Polynomial	-	-
G04	5	Quadratic	$- 30665.539$	52.123%
G06	2	Cubic	−6961.8139	0.0066%
G08	2	Nonlinear	−0.095825	0.8560%
G09	7	Polynomial	680.630	0.5121%
G11	2	Quadratic	0.7499	0.0000%
G12	3	Quadratic	−1.0	4.7713%
G24	2	Linear	−5.508	79.6556%

Table 3. Function values (mean, standard deviation, best, and worst) achieved by CEI and CEPE methods after 100 independent repetitions. Values in bold indicate the superior results across comparisons.

		CEI	CEPE
G02	mean	−0.3360827	−0.3510043
	sd	0.0398702	0.02986116
	best	−0.364923	−0.364946
	worst	−0.261689	−0.262792
G03	mean	−0.999943	−0.9999153
	sd	0.000125786	0.000146517
	best	−1.000086	−1.000097
	worst	−0.999447	−0.999467
G04	mean	−30663.44	−30663.82
	sd	2.397651	2.18125
	best	−30665.54	−30665.54
	worst	−30656.25	−30656.61
G06	mean	−6673.657	−6761.299
	sd	280.4894	172.4326
	best	−6942.869	−6949.9
	worst	−5990.58	−6282.121
G08	mean	−0.0958143	−0.09581347
	sd	1.36 × 10⁻⁵	1.51 × 10⁻⁵
	best	−0.095825	−0.095825
	worst	−0.095764	−0.095762
G09	mean	988.2063	995.5317
	sd	128.8924	110.2954
	best	744.6612	740.9385
	worst	1282.99	1204.271
G11	mean	0.7501684	0.7501476
	sd	0.000233982	0.000241632
	best	0.749901	0.749902
	worst	0.750818	0.751034
G12	mean	−0.9999771	−0.9999772
	sd	1.89 × 10⁻⁵	1.85 × 10⁻⁵
	best	−1	−1
	worst	−0.999935	−0.999937
G24	mean	−5.505601	−5.505723
	sd	0.002281037	0.00178044
	best	−5.50792	−5.507943
	worst	−5.498976	−5.499741

Table 4. Wilcoxon signed-ranks test results at the 5% significance level.

Problem	p-Value	Results
G02mod	0.001996	CEPE < CEI
G03mod	0.1895	CEPE ≈ CEI
G04	0.2478	CEPE ≈ CEI
G06	0.1301	CEPE ≈ CEI
G08	0.7026	CEPE ≈ CEI
G09	0.5071	CEPE ≈ CEI
G11	0.3431	CEPE ≈ CEI
G12	0.9248	CEPE ≈ CEI
G14	0.4606	CEPE ≈ CEI

Table 5. Function values (mean, standard deviation, best and worst) achieved by CEI and CEPE methods after 100 independent repetitions. Values in bold indicate the superior results across comparisons.

		CEI	CEPE
Three-Bar Truss Design	mean	264.2152	264.0456
	sd	0.3605	0.2208
	best	263.8962	263.8972
	worst	265.3764	265.0683

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cong, H.; Wang, B.; Wang, Z. A Novel Gaussian Process Surrogate Model with Expected Prediction Error for Optimization under Constraints. Mathematics 2024, 12, 1115. https://doi.org/10.3390/math12071115

AMA Style

Cong H, Wang B, Wang Z. A Novel Gaussian Process Surrogate Model with Expected Prediction Error for Optimization under Constraints. Mathematics. 2024; 12(7):1115. https://doi.org/10.3390/math12071115

Chicago/Turabian Style

Cong, Hongri, Bo Wang, and Zhe Wang. 2024. "A Novel Gaussian Process Surrogate Model with Expected Prediction Error for Optimization under Constraints" Mathematics 12, no. 7: 1115. https://doi.org/10.3390/math12071115

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Novel Gaussian Process Surrogate Model with Expected Prediction Error for Optimization under Constraints

Abstract

1. Introduction

2. Background and Concepts

2.1. Gaussian Process Regression

2.2. Expected Improvement

2.3. Expected Prediction Error

3. Methodology

3.1. Constrained Expected Improvement

3.2. Constrained Expected Prediction Error

3.3. The GP Surrogate-Assisted DE Algorithm

4. Simulation

4.1. Benchmark Problem Description

4.2. Experimental Settings

4.3. Results

5. Application

6. Discussion and Remarks

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI