#### 2.1.1. Kriging Surrogate

The concept of Kriging originates from geostatistics, inspired by the work of the South African engineer Daniel Krige [20]; it was further formalized by Matheron [21] and eventually adapted to deterministic numerical simulations by Sacks et al. [22] and Jones et al. [23]. The Kriging-based surrogate is founded on the hypothesis that any black-box simulation response function *y* = *f*(*x*) can be expressed as a realization of a Gaussian random process *Y*(*x*).

The Kriging surrogate is composed of a global trend function *μ*(*x*) and a centered Gaussian process (GP) *Z*(*x*) with zero mean and non-zero covariance (Equation (1)).

$$Y(\mathbf{x}) = \mu(\mathbf{x}) + Z(\mathbf{x}) \tag{1}$$

The trend function is a regression model that captures a general tendency in the observed data. It is expressed as a linear combination of *n* deterministic basis functions *φ*(*x*), based on regression of *N* response function evaluations (*n* ≤ *N*):

$$\mu(\mathbf{x}) = \sum\_{k=0}^{n} \phi\_k(\mathbf{x}) \beta\_k \tag{2}$$

Depending on the trend formulation, the Kriging metamodel can be classified into three categories: Simple Kriging (SK), for which the trend is a known constant; Ordinary Kriging (OK), for which the trend is constant but unknown; and Universal Kriging (UK), for which the basis functions *φ*(*x*) are known and fixed, but the coefficients *β<sub>k</sub>* ∈ ℝ\{0} are unknown and estimated during surrogate construction. In this study, we use the most general formulation, i.e., UK, due to the anticipated high curvature of the objective function. The trend is defined as a *p*-dimensional full first-order polynomial (Equation (3)).

$$\mu(\mathbf{x}) = \sum\_{j=0}^{p} \beta\_j \mathbf{x}\_j \tag{3}$$

The stochastic component *Z*(*x*) in Equation (1) models the local deviation of the true objective function from the global trend. It is characterized by a constant but unknown variance *σ*<sup>2</sup> and non-zero covariance (Equation (4)).

$$\text{Cov}\left(Z\_i, Z\_{i'}\right) = \sigma^2 R(\mathbf{x}\_i - \mathbf{x}\_{i'}, \boldsymbol{\psi}) \quad i, i' = 1, \ldots, N \tag{4}$$

Here, *R*(*x<sub>i</sub>* − *x<sub>i′</sub>*, *ψ*) denotes the spatial correlation function between any two samples *x<sub>i</sub>* and *x<sub>i′</sub>* and is often referred to as the covariance kernel. In multi-dimensional problems, the kernel becomes a tensor product of *p* one-dimensional correlation functions *r*(*x<sub>i</sub>* − *x<sub>i′</sub>*, *ψ*):

$$R(\mathbf{x}\_i - \mathbf{x}\_{i'}, \boldsymbol{\psi}) = \prod\_{j=1}^{p} r\left( (\mathbf{x}\_i - \mathbf{x}\_{i'})\_j^{\delta\_j}, \psi\_j \right) \tag{5}$$

The impact of correlations along the *j*-th dimension on the Kriging prediction is controlled by the covariance kernel hyperparameters grouped in the vector *ψ*. The smoothness coefficient *δ* governs the differentiability of the metamodel surface. In the present study, we assume that the true objective function is smooth, which justifies using the infinitely differentiable Gaussian-type kernel (*δ<sub>j</sub>* = 2 for all *j* = 1, ..., *p*). In one dimension, the Gaussian kernel is formulated as:

$$r\_j(\mathbf{x}\_i, \mathbf{x}\_{i'}, \boldsymbol{\psi}) = \exp\left(-\frac{(\mathbf{x}\_i - \mathbf{x}\_{i'})\_j^2}{2\psi\_j^2}\right) \tag{6}$$
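For illustration, the tensor-product Gaussian kernel of Equations (5) and (6) reduces to a few lines of NumPy. The sketch below is our own and is not part of the cited packages; it builds the full *N* × *N* correlation matrix for a sample set.

```python
import numpy as np

def gaussian_corr_matrix(X, psi):
    """Tensor-product Gaussian correlation matrix (Equations (5) and (6)).

    X   : (N, p) array of sampled locations
    psi : (p,) array of per-dimension length-scale hyperparameters
    """
    # Pairwise per-dimension differences, shape (N, N, p)
    diff = X[:, None, :] - X[None, :, :]
    # Product of 1-D Gaussian kernels = exponential of the summed exponents
    return np.exp(-np.sum(diff**2 / (2.0 * psi**2), axis=-1))

# Example: correlation matrix for 5 samples in p = 2 dimensions
X = np.random.default_rng(0).random((5, 2))
R = gaussian_corr_matrix(X, psi=np.array([0.3, 0.5]))
```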

The values of the unknown hyperparameters *β*, *σ*<sup>2</sup>, and *ψ* are estimated in a process called "fitting" the metamodel to the available data samples. This procedure maximizes a likelihood function L, which expresses the probability of reproducing the evaluated objective values at the sampled locations. The method is called Maximum Likelihood Estimation (MLE) and constitutes a non-trivial optimization problem, since the likelihood function is multi-modal. For the purpose of this study, we used an algorithm relying on the quasi-Newton method [47], available in the R language package DiceKriging [48].
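The study performs this fitting in R with DiceKriging; as a rough Python analogue (our assumption, not the authors' toolchain), scikit-learn's GP regressor fits an anisotropic Gaussian kernel by MLE with a quasi-Newton (L-BFGS-B) optimizer and random restarts to mitigate the multi-modality. Note that it uses a constant trend rather than the UK first-order polynomial.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF as GaussianKernel

rng = np.random.default_rng(1)
X = rng.random((20, 2))                     # sampled design locations
y = np.sin(6.0 * X[:, 0]) + X[:, 1] ** 2    # toy objective evaluations

# Anisotropic Gaussian kernel: one length scale (psi_j) per dimension.
# Hyperparameters are estimated by MLE; restarts reduce the risk of
# terminating in a local optimum of the multi-modal likelihood.
gp = GaussianProcessRegressor(
    kernel=GaussianKernel(length_scale=[0.5, 0.5]),
    n_restarts_optimizer=10,
    normalize_y=True,
)
gp.fit(X, y)
mean, std = gp.predict(np.array([[0.4, 0.6]]), return_std=True)
```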

The Kriging predictor *Ŷ*(*x*) at an arbitrary unobserved location *x*<sub>0</sub> is defined using random function evaluations at the sampled positions *Y*(*x<sub>i</sub>*). The estimator holds the properties of the Best Linear Unbiased Predictor (BLUP), i.e.:

• Linearity—linear combination of random functions

$$\hat{Y}(\mathbf{x}\_0) = \sum\_{i=1}^{N} \lambda\_i(\mathbf{x}\_0) Y(\mathbf{x}\_i) \tag{7}$$

• Unbiasedness—absence of systematic bias

$$\mathbb{E}[\hat{Y}(\mathbf{x}\_0)] = \mathbb{E}[Y(\mathbf{x}\_0)] \tag{8}$$

• Minimal prediction variance—minimizing the mean squared error

$$\text{MSE}\left[\hat{Y}(\mathbf{x}\_0)\right] = \mathbb{E}\left[\left(\hat{Y}(\mathbf{x}\_0) - Y(\mathbf{x}\_0)\right)^2\right] \tag{9}$$

Solving the above system of equations for the optimal weights *λ* ∈ ℝ<sup>N</sup> allows computation of the GP predictor's mean and variance, which define it entirely.

The Kriging surrogate requires input data from objective function evaluations at a set of discrete locations, and the spatial distribution of these observations across the design space may profoundly influence the metamodel prediction quality [49]. Historically, various sampling strategies were developed for planning empirical experiments, so the selection of an observation pattern is often called the Design of Experiment (DoE). The original DoE methods locate the majority of samples close to the design space boundaries and only a few observations in its interior [10]. Deterministic numerical simulations, however, are prone to systematic rather than random error, so using such classic DoE methods in computer experiments might be highly inefficient [22]. For this reason, space-filling strategies, which distribute samples over the entire domain, are more suitable for such applications.

Although a variety of space-filling techniques well-suited to deterministic simulations is available, the Latin Hypercube Sampling (LHS) [50] strategy has attracted the most attention in recent years [49,51]. The LHS algorithm distributes the observations in a *p*-dimensional hypercube so that their projections onto each axis do not share the same interval [52]. In a two-dimensional simplification, this criterion results in samples being distributed on a grid in which only one sample is located in each row and each column. The designer can arbitrarily select the number of observations, which makes the LHS method attractive for studies based on expensive aerodynamic simulations.

In this paper, we use the Optimal Latin Hypercube (OLH) technique, which improves the space-filling properties of LHS by employing a columnwise-pairwise algorithm [53] to optimize the sample distribution with respect to the S-optimality criterion [54]. The algorithm is executed using the lhs R package [55].
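A comparable (plain, non-optimal) Latin hypercube can be drawn in Python with scipy.stats.qmc; the sketch below, our illustration rather than the study's workflow, does not apply the columnwise-pairwise S-optimality refinement of the lhs R package. It uses the 12-dimensional, 126-point design size discussed next.

```python
from scipy.stats import qmc

# Plain LHS in a 12-dimensional unit hypercube
sampler = qmc.LatinHypercube(d=12, seed=42)
sample = sampler.random(n=126)        # shape (126, 12), values in [0, 1)

# Centered discrepancy as a simple space-filling diagnostic
print(qmc.discrepancy(sample))
```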

For the sake of the computational efficiency of the optimization process, the initial dataset is limited to a size that ensures sufficient initial prediction quality to initiate the subsequent adaptive sampling stage. The optimal number of samples is unknown a priori, although some respected studies suggest approximating it with the "10*p*" rule-of-thumb [23,56], where *p* is the number of design variables. Here, the design space has 12 dimensions, which would result in 120 initial samples; however, we follow the suggestion of a finite-decimal spacing between observations [23]. Ultimately, the DoE contains 126 elements with an inter-distance of 1/(126 − 1) = 0.008.

The computational expense required to generate the samples that construct the Kriging surrogate increases with the problem dimensionality, which restricts this method's applicability to a moderate number of design variables. The limiting problem size reported in the literature is not definitive (e.g., *p* < 50 in [11], *p* < 20 in [14]), as it depends on the available resources. Nevertheless, the Kriging technique is prone to the curse of dimensionality and, as such, is not suited for extensively large optimization problems.

Adaptive sampling, subsequent to the initial DoE, sequentially adds new observations to the dataset using current information about the predicted properties of the objective landscape. The process locates new samples with respect to a preselected infill criterion. For this purpose, we use the Expected Improvement (EI) function, proposed by Jones et al. [23], which takes advantage of the Kriging model's ability to estimate its own prediction uncertainty.

The EI criterion uses the Kriging variance to assess the possibility of the objective function value being lower than the current best observed value *y*<sub>min</sub> at any unobserved location. The addition of a new sample may bring an improvement *I*(*x*) equal to max(*y*<sub>min</sub> − *y*(*x*), 0). The value of *y*(*x*) is yet to be discovered at unevaluated points and is modeled using the Kriging predictor *Ŷ*(*x*). As a result, the improvement becomes a random variable whose expected value constitutes the EI function (Equation (10)).

$$EI(\mathbf{x}) = \mathbb{E}[I(\mathbf{x})] = \mathbb{E}\left[\max\left(y\_{\min} - \hat{Y}(\mathbf{x}),\, 0\right)\right] \tag{10}$$

Integration by parts of Equation (10) gives the following analytical formulation of the EI:

$$EI(\mathbf{x}) = \begin{cases} \left(y\_{\min} - \hat{m}(\mathbf{x})\right) \text{CDF}\left(\frac{y\_{\min} - \hat{m}(\mathbf{x})}{\hat{s}(\mathbf{x})}\right) + \hat{s}(\mathbf{x})\, \text{PDF}\left(\frac{y\_{\min} - \hat{m}(\mathbf{x})}{\hat{s}(\mathbf{x})}\right) & \text{if } \hat{s}(\mathbf{x}) > 0 \\ 0 & \text{if } \hat{s}(\mathbf{x}) = 0 \end{cases} \tag{11}$$

Here, *m̂*(*x*) and *ŝ*(*x*) are the Kriging prediction mean and standard deviation, *CDF* is the cumulative distribution function, and *PDF* is the probability density function of the standard normal distribution.
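Equation (11) translates directly into code. The NumPy/SciPy sketch below is our illustration, assuming the mean and standard deviation are supplied by the fitted Kriging model.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, y_min):
    """Analytical EI (Equation (11)).

    mean : Kriging prediction mean m(x)
    std  : Kriging prediction standard deviation s(x)
    y_min: best objective value observed so far
    """
    mean, std = np.asarray(mean, float), np.asarray(std, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        u = (y_min - mean) / std
        ei = (y_min - mean) * norm.cdf(u) + std * norm.pdf(u)
    # EI vanishes where the prediction is certain (sampled locations)
    return np.where(std > 0.0, ei, 0.0)
```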

The iterative introduction of new samples *x*\* at locations maximizing the EI function constitutes the Efficient Global Optimization (EGO) algorithm (Equation (12)).

$$\mathbf{x}^\* = \underset{\mathbf{x} \in X}{\operatorname{argmax}}\; EI(\mathbf{x}) \tag{12}$$

Maximization of the EI criterion might not be trivial because of its strong multi-modality. For this purpose, this study uses the derivative-supported genetic algorithm (GA) genoud [57], available in the R package DiceOptim [48].
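As a simpler stand-in for genoud (our sketch, not the authors' method), a multi-start quasi-Newton search over the negative EI illustrates one common way of coping with this multi-modality.

```python
import numpy as np
from scipy.optimize import minimize

def maximize_ei(neg_ei, bounds, n_starts=50, seed=0):
    """Multi-start L-BFGS-B maximization of EI (passed as its negative)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    best = None
    for x0 in rng.uniform(lo, hi, size=(n_starts, len(bounds))):
        res = minimize(neg_ei, x0, bounds=bounds, method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best.x

# Usage: wrap the model predictions in neg_ei(x) = -EI(x), then call
# x_star = maximize_ei(neg_ei, bounds=[(0.0, 1.0)] * 12)
```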

#### 2.1.2. RBF-Based Mesh Morphing

Out of the variety of mesh morphing methods, this study employs the RBF-based technique. Radial basis functions (RBFs) are mathematical functions originating from the multivariate approximation of scattered data. Their value depends only on the distance of the function argument from a fixed origin, called a control point. The RBF interpolation reproduces the prescribed values at the control points and interpolates between them according to the properties of the specific basis function.

A deliberate displacement of the control points introduces the desired deformation of the initial mesh. The motion is interpolated onto the surrounding grid nodes' positions according to the characteristics of the selected RBF. The interpolation function h(*x*): ℝ<sup>3</sup> → ℝ (Equation (13)), applied componentwise, defines each node's location after deformation, *x*′ = *x* + h(*x*).

$$h(\mathbf{x}) = \sum\_{i=1}^{N\_C} \gamma\_i\, \phi\left(\|\mathbf{x} - \mathbf{x}\_{C\_i}\|\right) + p(\mathbf{x}) \tag{13}$$

Here, *x* ∈ ℝ<sup>3</sup>: *x* = (*x*, *y*, *z*) is an arbitrary spatial position, *x*<sub>Ci</sub> is the known position of the *i*-th control point from a set of *N*<sub>C</sub> elements, and *γ<sub>i</sub>* is the corresponding weight coefficient describing the strength of the *i*-th interaction. The interpolation function comprises a linear combination of radial contributions described by the RBF *ϕ*(·), which depends solely on the Euclidean distance ‖*x* − *x*<sub>Ci</sub>‖, and a linear polynomial:

$$p(\mathbf{x}) = \alpha\_0 + \alpha\_1 x + \alpha\_2 y + \alpha\_3 z \tag{14}$$

The introduction of the polynomial *p*(*x*) ensures the uniqueness of the fit and allows for the affine motion.

From among the several RBFs suited for multivariate problems (see, e.g., [58–62]), we select the smooth Gaussian RBF to prioritize the preservation of the mesh quality (Equation (15)).

$$\phi\left(\|\mathbf{x} - \mathbf{x}\_{C\_i}\|\right) = e^{-\|\mathbf{x} - \mathbf{x}\_{C\_i}\|^2 / c^2} \tag{15}$$

The function's shape is tuned by the parameter *c*, whose value was selected for this study in a trial-and-error process as *c* = 150.

The unknown values of the weights *γ<sub>i</sub>* (Equation (13)) and the polynomial coefficients *α<sub>i</sub>* (Equation (14)) are computed so as to satisfy the known displacement *g<sub>i</sub>* of each control point (Equation (16)).

$$h(\mathbf{x}\_{C\_i}) = g\_i \quad \text{for } i = 1, \ldots, N\_C \tag{16}$$

The given displacements are stored in an *N*<sub>C</sub> × 3 matrix *g*. Moreover, additional orthogonality requirements (Equation (17)) are introduced to equate the number of equations with the number of degrees of freedom, which is increased by the presence of the polynomial *p*(*x*).

$$\begin{aligned} \sum\_{i=1}^{N\_C} \gamma\_i &= 0 \\ \sum\_{i=1}^{N\_C} \gamma\_i x\_{C\_i} &= 0 \\ \sum\_{i=1}^{N\_C} \gamma\_i y\_{C\_i} &= 0 \\ \sum\_{i=1}^{N\_C} \gamma\_i z\_{C\_i} &= 0 \end{aligned} \tag{17}$$

A linear system of equations (Equation (18)) is constructed from the above conditions and subsequently solved for the *N*<sub>C</sub> × 3 matrix *γ* of interpolation function weights and the 4 × 3 matrix *α* of polynomial coefficients.

$$\begin{bmatrix} M & P \\ P^T & 0 \end{bmatrix} \begin{bmatrix} \gamma \\ \alpha \end{bmatrix} = \begin{bmatrix} g \\ 0 \end{bmatrix} \tag{18}$$

The *N*<sub>C</sub> × *N*<sub>C</sub> interpolation matrix *M* gathers the RBF evaluations of the distances between all considered control points (Equation (19)).

$$M = \begin{bmatrix} \phi\left(\|\mathbf{x}\_{C\_1} - \mathbf{x}\_{C\_1}\|\right) & \cdots & \phi\left(\|\mathbf{x}\_{C\_1} - \mathbf{x}\_{C\_{N\_C}}\|\right) \\ \vdots & \ddots & \vdots \\ \phi\left(\|\mathbf{x}\_{C\_{N\_C}} - \mathbf{x}\_{C\_1}\|\right) & \cdots & \phi\left(\|\mathbf{x}\_{C\_{N\_C}} - \mathbf{x}\_{C\_{N\_C}}\|\right) \end{bmatrix} \tag{19}$$

The orthogonality conditions (see Equation (17)) result in the *N*<sub>C</sub> × 4 matrix *P*, which contains a column of ones and all control points' positions (Equation (20)).

$$P = \begin{bmatrix} 1 & x\_{C\_1} & y\_{C\_1} & z\_{C\_1} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x\_{C\_{N\_C}} & y\_{C\_{N\_C}} & z\_{C\_{N\_C}} \end{bmatrix} \tag{20}$$

After solving the system above, the post-deformation spatial position *x*′ = (*x*′, *y*′, *z*′) of an arbitrary location is computed as an incremental displacement (Equation (21)).

$$\begin{cases} x' = x + h^x(\mathbf{x}) \\ y' = y + h^y(\mathbf{x}) \\ z' = z + h^z(\mathbf{x}) \end{cases} \tag{21}$$
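To make the procedure concrete, the following self-contained NumPy sketch (ours; the study itself relies on the PyGeM implementation referenced below) assembles and solves Equation (18) and applies the resulting displacement field as in Equation (21).

```python
import numpy as np

def rbf_morph(nodes, ctrl, disp, c=150.0):
    """Gaussian-RBF mesh morphing (Equations (13)-(21)).

    nodes : (N, 3) mesh node positions
    ctrl  : (Nc, 3) control point positions x_Ci
    disp  : (Nc, 3) prescribed control point displacements g
    c     : Gaussian RBF shape parameter (150 in this study)
    """
    phi = lambda r2: np.exp(-r2 / c**2)  # Gaussian RBF, Equation (15)

    # Interpolation matrix M (Equation (19)) and polynomial block P (Equation (20))
    d2 = ((ctrl[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    M = phi(d2)
    P = np.hstack([np.ones((len(ctrl), 1)), ctrl])

    # Assemble and solve the linear system of Equation (18)
    A = np.block([[M, P], [P.T, np.zeros((4, 4))]])
    rhs = np.vstack([disp, np.zeros((4, 3))])
    sol = np.linalg.solve(A, rhs)
    gamma, alpha = sol[: len(ctrl)], sol[len(ctrl):]

    # Evaluate h(x) at every node and displace incrementally (Equation (21))
    r2 = ((nodes[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    h = phi(r2) @ gamma + np.hstack([np.ones((len(nodes), 1)), nodes]) @ alpha
    return nodes + h
```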

In the definition of the optimization problem considered in the present paper, the Cartesian components of control points' translation vectors serve as design variables.

This study implements the mesh morphing algorithms using the RBF module of PyGeM (Python Geometrical Morphing) [63], an open-source Python programming language library for parameterization and deformation of complex geometries.

A vital aspect of the functional mesh morphing process is maintaining the mesh quality, which usually comes down to controlling an appropriate quality metric; selecting a measure essential for the error-free operation of the specific flow solver is fundamental. From the various metrics available in the literature [64,65], we select the mesh element orthogonality angle as appropriate for the employed fluid dynamics solver. This measure compares the angle between adjoining mesh cell faces to the ideal level (Equation (22)).

$$90^\circ - \cos^{-1}(\mathbf{d} \cdot \mathbf{n}) \tag{22}$$

The vector *d* connects two adjacent element centroids, and *n* is a face normal vector. The orthogonality metric correlates positively with the mesh quality: for high values, the grid element shape approaches the idealized form. An area-weighted average over the relevant faces gives a value for each control volume. Quality control involves monitoring the minimum of the metric among all mesh elements and comparing it to the value for the initial mesh as a reference.
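A minimal NumPy sketch of the per-face diagnostic (our illustration; *d* and *n* are assumed to be unit vectors extracted from the mesh) is:

```python
import numpy as np

def orthogonality_angles(d, n):
    """Per-face orthogonality angle in degrees (Equation (22)).

    d : (F, 3) unit vectors joining adjacent element centroids
    n : (F, 3) unit face normal vectors, consistently oriented
    """
    cos_t = np.clip(np.einsum("ij,ij->i", d, n), -1.0, 1.0)
    return 90.0 - np.degrees(np.arccos(cos_t))

# Quality control: the minimum over all elements of the morphed mesh is
# compared against the same minimum computed once for the initial mesh.
```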

#### 2.1.3. Achievement Scalarizing Function

Multiple objectives and flight conditions considered in this study are combined using the augmented Chebyshev-type achievement scalarizing function (ASF) [38]. This technique belongs to the a priori methods, in which the designer articulates their preferences about the relative importance of the objectives and conditions before the optimization process starts. Such an approach is particularly attractive for studies involving computationally expensive simulations, as only a fraction of the objective space needs to be resolved.

Combining preferences into one function, such that the problem can be solved with a single-objective optimizer, is usually called scalarization; consequently, the resulting function is referred to as a scalarizing function. Although the weighted sum method [66] is the most intuitive representative of this class, it has an inherent drawback in its inability to find solutions on a non-convex portion of a Pareto front (see Figure 2a). This issue is not present in methods minimizing a metric of distance to a reference point arbitrarily located in the objective space (see Figure 2b). The ASF selected for this study employs the Chebyshev metric as the measure of distance and the optimization target *f*<sup>T</sup>, expressing the designer's aspirations about the particular objectives, as the reference point. The scalarizing function in a multi-point formulation is defined as follows:

$$y\_{\text{ASF}}^{\text{MP}}(\mathbf{x}) = \max\_{j,k} \left\{ \lambda\_{jk} \left( f\_{jk}(\mathbf{x}) - f\_{jk}^{T} \right) \right\} + \rho \sum\_{j} \sum\_{k} \lambda\_{jk} \left( f\_{jk}(\mathbf{x}) - f\_{jk}^{T} \right) \tag{23}$$

Above, the subscripts *j* and *k* refer to particular design conditions and objectives, respectively. The augmentation coefficient *ρ*, together with the sum term it multiplies, guarantees proper Pareto optimality and takes an arbitrarily small positive value, here 0.05.

**Figure 2.** Non-convex Pareto front in different methods: (**a**) weighted sum, (**b**) reference point.

The coefficients *λ<sub>jk</sub>* serve to normalize the objective values to a similar range (Equation (24)).

$$\lambda\_{jk} = \frac{1}{f\_{jk}^N - f\_{jk}^T} \tag{24}$$

The point *f*<sup>N</sup> is called the *nadir* point and represents the upper bound of an objective value. The designer usually estimates it based on knowledge about the objective landscape, supported by educated judgment.
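A compact Python sketch of Equations (23) and (24) (our illustration; the objective values are arranged as a conditions-by-objectives array) reads:

```python
import numpy as np

def asf_multipoint(f, f_target, f_nadir, rho=0.05):
    """Augmented Chebyshev-type ASF (Equations (23) and (24)).

    f        : (J, K) objective values, J design conditions x K objectives
    f_target : (J, K) target (aspiration) point f^T
    f_nadir  : (J, K) nadir point f^N, upper bounds of the objectives
    rho      : small positive augmentation coefficient (0.05 here)
    """
    lam = 1.0 / (f_nadir - f_target)           # Equation (24)
    scaled = lam * (f - f_target)
    return scaled.max() + rho * scaled.sum()   # Equation (23)
```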

Besides the advantages referred to above and its proven efficiency, the ASF is characterized by the intuitive concept of setting a target and advancing toward it.
