#### **1. Introduction**

Conducting experiments to better understand manufacturing processes is crucial, and real physical experiments are considered the gold standard. However, conducting real physical experiments for each new experimental setting is impractical because of expensive materials, production stoppages and the labor hours needed for monitoring and evaluation. One good alternative is conducting experiments via simulation, where numerical methods such as the Finite Element Method (FEM) are well established in the field of structural analysis. However, solving complex problems with FEM is time-consuming and computationally expensive. To reduce the computational effort, surrogate modeling may offer a promising solution [1]. Surrogate models are trained in a supervised manner and are designed to learn the function mapping between inputs and outputs. With a sufficient amount of training data for the observed use case, a customized surrogate model is able to substitute for a FEM simulation up to a certain accuracy. When only specific dimensions of a simulation result are desired and a controlled reduction in accuracy is acceptable, reduced-order surrogate modeling is an established technique. Reduced-order surrogate modeling aims to substitute the high-resolution simulation domain with some carefully selected dimensions of importance, e.g., selected displacement measures of a deformed part can be predicted by a reduced-order surrogate model with low computational effort, instead of performing a computationally intensive FEM simulation that predicts the displacement of each node representing the deformed part.

**Citation:** Hoffer, J.G.; Geiger, B.C.; Kern, R. Gaussian Process Surrogates for Modeling Uncertainties in a Use Case of Forging Superalloys. *Appl. Sci.* **2022**, *12*, 1089. https://doi.org/10.3390/app12031089

Academic Editors: Jin-Gyun Kim, Jae Hyuk Lim and Peter Persson

Received: 16 December 2021 Accepted: 18 January 2022 Published: 20 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Meanwhile, Gaussian process regression (GP) has been successfully used as a surrogate model in the past. In literature, GP regression is also called "kriging" after the statistician and mining engineer Danie G. Krige [2]. However, for consistency, we use only the term GP regression or plain GP in this paper. Regarding GP regression, one of the biggest advantages is that it predicts a distribution (described by mean and standard deviation) rather than just a point estimate. The predicted standard deviation can be seen as a quality criterion related to the corresponding predicted mean value. In the following, we will refer to that standard deviation of a prediction as epistemic uncertainty, i.e., how certain the model is with respect to its prediction. Considering real manufacturing processes, another source of uncertainty can be observed with regard to the lack of complete control over all influence parameters. These deviations occurring during repeated process iterations under the same conditions are referred to as aleatoric uncertainty.

Recapitulating, we want to shed light on two types of uncertainty in surrogate modeling: (1) epistemic uncertainty, which refers to the lack of knowledge with respect to a simulation model and can be reduced by adding additional sources of information (for machine learning models, mainly by increasing the number of training instances at new locations in the feature space), and (2) aleatoric uncertainty, which refers to deviations of an observed manufacturing process itself and cannot be reduced even if more data is generated. Since epistemic and aleatoric uncertainties describe different properties, it seems natural to treat them separately when making predictions or performing optimization. However, in certain circumstances it may be useful to consider uncertainty as a whole rather than dividing it into aleatoric and epistemic uncertainty. In such cases, heteroskedastic GP regression represents a common approach for optimization with surrogate models [3–5]. For our problem definition, especially in solving inverse problems, we argue that the distinction between epistemic and aleatoric uncertainty shows clear advantages.

There is a wealth of literature on surrogate models, reduced-order surrogate models, and optimization with GP regression. We present in the following the main related works to our research field organized in (1) GP regression and FEM simulations, (2) GP regression trained with pure sensor data and (3) optimization with GP regression.

Roberts et al. [6] predict damage development in forged brake discs reinforced with Al-SiC particles using damage maps. In addition to a Multilayer Perceptron (MLP), they utilize GP regression as a surrogate model. Loghin and Ismonov [7] predict stress intensity factors using GP regression trained with FEM results of a classical bolt-nut assembly use case. Ming et al. [8] model an electrical discharge machining process with GP regression. Su et al. [9] utilize GP regression as a surrogate in a structural reliability analysis of a large suspension bridge. In the work of Guo and Hesthaven [10], GP regression is used as a reduced-order model for nonlinear structural analysis in a 1D and a 3D use case, where data generation was performed with active learning. Hu et al. [11] use GP regression to estimate the residual stress field of machined parts from two-dimensional numerical simulations. Yue et al. [12] propose two active learning approaches using GP regression for a composite fuselage use case. In the work of Ortali et al. [13], GP regression is used as a reduced-order surrogate model for fluid dynamics use cases. Venkatraman et al. [14] use GP regression as a surrogate model of texture in micro-springs. GP regression can also be used on data with multiple fidelity levels: Lee et al. [15] investigate GP regression surrogate modeling with uncertain material properties of soft tissues and multi-fidelity data, and Brevault et al. [16] provide an overview of multi-fidelity GP regression techniques in the field of aerospace systems. GP regression can also be extended by methods that stack GPs or use them in a tree model. Civera et al. [17] predict imperfections in pultruded glass fiber reinforced polymers with a treed method of GP regression trained with experimental data and FEM simulation results. Abdelfatah et al. [18] propose a stacked GP regression to integrate different datasets and propagate uncertainties through the stacked model. GP regression can also be used for calibrating simulations: Mao et al. [19] use GP regression as a surrogate model in a Bayesian model updating method to calibrate the FEM simulation of a long-span suspension bridge.

In addition to FEM data, GP regression is also applied to pure sensor data, which we discuss in the following. Tapia et al. [20] use a GP regression based surrogate model of a laser powder-bed fusion process to predict melt pool depth. Yu et al. [21] utilize, among other methods, GP regression to model the relationship between geological variables and the broken rock zone thickness. Lee [22] uses GP regression trained with experimental data to optimize the deposition parameters of a wire arc additive manufacturing process. Saul et al. [23] propose chained GP regression models based on a non-linear combination of latent functions. Binois et al. [24] provide a heteroskedastic GP regression approach and results for two use cases, namely manufacturing and management of epidemics.

In the context of function maximization with GP regression surrogate models, Dai Nguyen et al. [25] propose a robust optimization approach based on Upper Confidence Bound (UCB) Bayesian Optimization (BO). In another field of optimization, namely solving inverse problems, BO with a generalized chi-squared distribution is investigated by Huang et al. [26] and by Uhrenholt and Jensen [27], who, besides standard GP regression, utilize warped GP regression from the work of Snelson et al. [28]. An extension of standard BO can be found in the work of Plock et al. [29], who combine BO with the Levenberg-Marquardt method. While in maximization and minimization problems aleatoric and epistemic uncertainties can often be treated in the same way, in most cases robust results can be obtained by distinguishing between these two sources of uncertainty [30]. We call results robust when mean predictions are associated with low aleatoric uncertainty.

There is already considerable related work in reduced-order surrogate modeling and optimization using GP regression surrogates. However, to the best of our knowledge, we could not identify related work for solving inverse problems in which aleatoric and epistemic uncertainties are treated differently. Optimization approaches for solving inverse problems usually use only epistemic uncertainty. When epistemic and aleatoric uncertainties are taken into account, they are often simply combined, resulting in the potential loss of important information.

To sum up, we identify the following drawbacks:


As a response, we present the following main contributions of our research to tackle the identified drawbacks of related work:


With the proposed surrogate model and novel multi-objective optimization strategy, we pave the way for surrogate modeling and inverse problem-solving for practical applications that make use of explicit modeling of sources of uncertainties. Our findings are validated on a typical hot metal forming manufacturing process: preforming an Inconel 625 superalloy billet on a forging press.

This paper is structured as follows. In Section 2, we present the proposed surrogate model, providing an introduction to GP regression in Section 2.1 and describe the GP based parts of our surrogate model in Sections 2.2 and 2.3. The data generation of aleatoric uncertainty for our surrogate model approach is presented in Section 2.4. Section 3 deals with optimization, where we outline active learning in Section 3.1 and solving inverse problems in Section 3.2. In Section 4 we present the studied use case, preforming an Inconel 625 superalloy billet on a forging press, where we give insights on the design of the forging aggregate characteristics in Section 4.1 and all information regarding the corresponding FEM simulation in Section 4.2. Section 5 shows the results, which are discussed in Section 6. In Section 7, we present the conclusion of our work and an outlook for the future.

#### **2. GP based Surrogate Model**

In this section, we first briefly introduce the general idea behind our surrogate modeling approach. In Section 2.1, we familiarize the reader with the general functionality of GP regression to provide an appropriate foundation for the content that follows. In Sections 2.2 and 2.3, we provide more detailed descriptions of each individual GP of our surrogate model. After describing our surrogate model, we move on to uncertainty propagation analysis with FEM simulation in Section 2.4, where we present the procedure for obtaining aleatoric uncertainties.

GP regression is already well researched for surrogate modeling, replacing expensive target labellers (e.g., numerical simulations, expensive manual labelling, real physical experiments, etc.). One reason is its ability to work with low-dimensional data. Another big advantage of using GP regression is that predictions are made in a probabilistic way, i.e., a prediction is represented by a posterior distribution. Thus, a prediction of GP regression is described by a mean and a covariance. The covariance of a prediction can be used as a metric of prediction confidence, i.e., epistemic uncertainty. We specify that outputs of GP regression describe a distribution with mean *m* and epistemic uncertainty *σ*.

The proposed surrogate model consists of two individual GPs. It takes manufacturing process-specific parameters $x_m$, part-specific parameters $x_p$ and the aggregated aleatoric process uncertainty $\bar{\Sigma}_{al}(Z)$ as input and predicts the mean manufacturing result $\mu$ and the aleatoric uncertainty $\sigma_{al}$ of the manufacturing result, see Figures 1b, 2 and 3. A similar simulation approach using FEM is shown in Figure 1a. We define $Z$ as a parameter that describes the manufacturing process characteristics, e.g., the velocity profile of a forming tool. Our model assumes that $\bar{\Sigma}_{al}(Z)$ can be efficiently obtained for every $x_m$. This assumption is justified in our running example, where we focus on the first of two directly successive forging strokes: measurements of the manufacturing process are available (i.e., the velocity profile of the forging tool), but measurements of the forged part are not possible due to the short time span between the first and second stroke.

**Figure 1.** Simulation of Manufacturing Processes with Uncertainties: (**a**) FEM simulation scenario and (**b**) GP based surrogate model with manufacturing process-specific parameters $x_m$, part-specific parameters $x_p$ and distribution $Z$ that describes a manufacturing process-specific characteristic by mean $m(Z)$ and aleatoric manufacturing process uncertainty $\Sigma_{al}(Z)$, where $\bar{\Sigma}_{al}(Z)$ is an aggregated form of $\Sigma_{al}(Z)$. Outputs are the mean of the manufacturing process result $m(\mu)$ and the mean of the aleatoric uncertainty $m(\sigma_{al})$, each with corresponding epistemic uncertainties $\sigma(\mu)$ and $\sigma(\sigma_{al})$ in the GP based surrogate model.

**Figure 2.** GP takes manufacturing process-specific parameters $x_m$, part-specific parameters $x_p$ and aleatoric manufacturing process uncertainty $\bar{\Sigma}_{al}(Z)$ as input and predicts the mean $m(\sigma_{al})$ and epistemic uncertainty $\sigma(\sigma_{al})$ of the aleatoric uncertainty of the manufacturing process result.

**Figure 3.** GP takes manufacturing process-specific parameters *xm* and part specific parameters *xp* as input and predicts the mean *m*(*μ*) and epistemic uncertainty *σ*(*μ*) of the manufacturing process result.

#### *2.1. Gaussian Process*

A GP is a generalization of the Gaussian distribution. The Gaussian distribution describes random variables or random vectors, while a GP describes functions $f(x)$. In general, a GP is completely specified by its mean function $m(x)$ and covariance function $k(x, x')$, also called kernel. If the function $f(x)$ under consideration is modeled by a GP, we have

$$\mathbb{E}[f(\mathbf{x})] = m(\mathbf{x}) \tag{1}$$

$$\mathbb{E}[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x'}) - m(\mathbf{x'}))] = k(\mathbf{x}, \mathbf{x'}) \tag{2}$$

for all $x$ and $x'$, where $x$ refers to training and $x'$ to test data. Thus, we can define the GP by

$$f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')). \tag{3}$$

We use the following notation for explanatory purposes only in this section. Matrix $D_{train} = (X, Y)$ contains the training data with input data matrix $X = (x_1, \ldots, x_n)$ and output data matrix $Y = (y_1, \ldots, y_n)$, and test data matrix $D_{test} = (X', Y')$ contains the test data with $X' = (x'_{n+1}, \ldots, x'_{n+m})$ as input and $Y' = (y'_{n+1}, \ldots, y'_{n+m})$ as output. We define that $Y$ and $Y'$ are jointly Gaussian with zero mean under the prior distribution. Further, we assume additive independent identically distributed Gaussian noise with variance $\sigma_n^2$ on the observations, with $I$ denoting the identity matrix:

$$
\begin{bmatrix}
\mathbf{Y} \\
\mathbf{Y}'
\end{bmatrix} \sim \mathcal{N} \left( \mathbf{0}, \begin{bmatrix}
k(\mathbf{X}, \mathbf{X}) + \sigma_n^2 \mathbf{I} & k(\mathbf{X}, \mathbf{X}') \\
k(\mathbf{X}', \mathbf{X}) & k(\mathbf{X}', \mathbf{X}')
\end{bmatrix} \right) \tag{4}
$$

The GP predicts the function values $Y'$ at positions $X'$ in a probabilistic way, where the posterior distribution is fully described by the mean and the covariance:

$$\begin{aligned} Y' \mid X', X, Y \sim \mathcal{N}\big(&k(X',X)[k(X,X) + \sigma_n^2 I]^{-1}Y, \\ &k(X',X') - k(X',X)[k(X,X) + \sigma_n^2 I]^{-1}k(X,X')\big) \end{aligned} \tag{5}$$

This results in the mean

$$m(Y') = \mathbb{E}[Y' \mid X, Y, X'] = k(X', X)[k(X, X) + \sigma_n^2 I]^{-1}Y \tag{6}$$

the covariance

$$\mathcal{COV}(Y') = k(X',X') - k(X',X)[k(X,X) + \sigma_n^2 I]^{-1}k(X,X') \tag{7}$$

and the epistemic standard deviation $\sigma$

$$
\sigma(Y') = \sqrt{\text{diag}(\mathcal{COV}(Y'))} \tag{8}
$$

where the diagonal of the covariance matrix *COV* is extracted as a vector and the square root is calculated for each element to determine the epistemic standard deviation *σ*. It can be observed that the selection or design of the covariance function is the main ingredient when using GP regression. In the following, we describe the two covariance functions we use in our approach: the popular Radial Basis Function (RBF) (also called squared exponential covariance function)

$$k_{\rm RBF}(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{||\mathbf{x} - \mathbf{x}'||^2}{2l^2}\right) \tag{9}$$

with characteristic length-scale parameter $l$ and $\|\cdot\|$ denoting the Euclidean norm, and the Matérn covariance function

$$k_{\text{Matérn}}(\mathbf{x}, \mathbf{x}') = \frac{1}{\Gamma(\nu) 2^{\nu - 1}} \left( \frac{\sqrt{2\nu}}{l} ||\mathbf{x} - \mathbf{x}'|| \right)^{\nu} K_{\nu} \left( \frac{\sqrt{2\nu}}{l} ||\mathbf{x} - \mathbf{x}'|| \right) \tag{10}$$

with gamma function Γ, modified Bessel function *K<sup>ν</sup>* and parameter *ν* that controls the smoothness of the resulting function. For more information on GP regression and covariance functions, we refer the reader to the book of Williams and Rasmussen [31].
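As a minimal numerical sketch of Equations (6) to (8), the posterior mean, covariance and epistemic standard deviation can be computed with a few lines of NumPy. The function names `rbf_kernel` and `gp_posterior`, the toy sine data, the noise variance and the length scale are illustrative assumptions and not part of the original work; the kernel is written in the common squared exponential form $\exp(-r^2/(2l^2))$.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared exponential kernel for row-wise inputs A (n, d) and B (m, d)."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale**2))


def gp_posterior(X, Y, X_new, noise_var=1e-6, length_scale=1.0):
    """Posterior mean, covariance and standard deviation, Eqs. (6)-(8)."""
    K = rbf_kernel(X, X, length_scale) + noise_var * np.eye(len(X))
    K_s = rbf_kernel(X_new, X, length_scale)       # k(X', X)
    K_ss = rbf_kernel(X_new, X_new, length_scale)  # k(X', X')
    # Solve linear systems instead of forming the explicit inverse.
    mean = K_s @ np.linalg.solve(K, Y)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # guard tiny negative values
    return mean, cov, std


# Toy data (assumption): three noiseless samples of sin(x).
X = np.array([[0.0], [1.0], [2.0]])
Y = np.sin(X).ravel()
X_new = np.array([[1.0], [5.0]])  # one training location, one far-away location
mean, cov, std = gp_posterior(X, Y, X_new)
```

In this sketch the standard deviation is close to zero at the training location and close to the prior value of one far away from the data, which is the behavior interpreted as epistemic uncertainty in the text.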

#### *2.2. Aleatoric Uncertainty GP*

A GP is used to predict the manufacturing process related aleatoric uncertainty $\sigma_{al} = \sigma_{al}(x_m, x_p, \bar{\Sigma}_{al}(Z))$ of the manufactured part. Aleatoric uncertainty data are generated by uncertainty propagation analysis with FEM simulation. The inputs are the setting parameters from a real physical manufacturing process $x_m$, properties of the part to be manufactured $x_p$ and the aleatoric manufacturing process uncertainty $\bar{\Sigma}_{al}(Z)$ obtained from, e.g., sensor data of the real physical manufacturing process, see Figure 2. Here, $Z$ describes a characteristic of the manufacturing process, e.g., the velocity profile of a forming tool. The output $\sigma_{al}$ is predicted by a GP regression, such that $\sigma_{al} \sim \mathcal{GP}(m(x_m, x_p, \bar{\Sigma}_{al}(Z)), k((x_m, x_p, \bar{\Sigma}_{al}(Z)), (x_m, x_p, \bar{\Sigma}_{al}(Z))'))$ with mean $m(x_m, x_p, \bar{\Sigma}_{al}(Z))$ and covariance function $k((x_m, x_p, \bar{\Sigma}_{al}(Z)), (x_m, x_p, \bar{\Sigma}_{al}(Z))')$.

Of course, a wide variety of manufacturing process characteristics can be implemented, e.g., rolling speeds, cutting forces, heating times etc. As a running example, we choose hot metal forming on a friction screw press as the manufacturing process, where $x_m$ contains the input features that control the forging aggregate (clutch pressure between flywheels and rotation speed of the electric motor), $x_p$ describes the part to be forged by different dimensions and part temperature, and $Z$ is the resulting velocity profile of the forging tool over time for a given input $x_m$, where $\bar{\Sigma}_{al}(Z)$ represents aggregated aleatoric deviations of the forging velocity. $\sigma_{al}$ then describes the deviations of the final forged part, i.e., deviations from important final part geometries. All relevant details of our running example can be found in Section 4.

#### *2.3. Mean Result GP*

Besides the GP that predicts the aleatoric uncertainty of a manufactured part, a second GP is used to predict the mean result $\mu$ of the manufactured part. The inputs for the second GP are the setting parameters from the real physical manufacturing process $x_m$ and properties of the part to be manufactured $x_p$. The output $\mu$ is predicted by the GP regression, such that $\mu \sim \mathcal{GP}(m(x_m, x_p), k((x_m, x_p), (x_m, x_p)'))$. With respect to our running example, $\mu$ describes the final forged part by important final part geometries.

#### *2.4. Uncertainty Propagation Analysis*

In uncertainty propagation analysis, the effect of uncertainties related to an input on uncertainties of the corresponding output is investigated. In our case, $\Sigma_{al}(Z)$ refers to the aleatoric deviations of a manufacturing process characteristic (i.e., deviations in velocity profile data) due to different input settings. We refer to uncertainties with respect to a manufacturing process output obtained by uncertainty propagation analysis as aleatoric uncertainty $\sigma_{al}$.

We vary input values $x^{(j)} = (x_m^{(j)}, x_p^{(j)})$ with $j \in \{1, \ldots, N\}$, where $N$ is the number of different input setting scenarios. For each case of process-specific input parameters $x_m^{(j)}$, we obtain a process-specific characteristic $Z^{(j)}$ that is a distribution with mean $m(Z^{(j)})$ and standard deviation $\Sigma_{al}(Z^{(j)})$. Such distributions occur because, with identical input parameters, process characteristics in reality can show deviations when repeated. We simulate that behavior with a separate GP; thus, the random variable $Z^{(j)}$ is assumed to be normally distributed, such that $Z^{(j)} \sim \mathcal{N}(m(Z^{(j)}), \Sigma_{al}(Z^{(j)}))$. From the posterior, we randomly draw $M$ predictions $z^{(i)(j)}$ with $i \in \{1, \ldots, M\}$ (i.e., different curves characterizing the manufacturing process), and with each $z^{(i)(j)}$ and $x_p^{(j)}$ we execute FEM simulations to obtain targets $y^{(i)(j)}$. We collect the individual targets $y^{(i)(j)}$, such that we obtain for each input setting $j$ a distribution with mean $\mu^{(j)}$ and aleatoric standard deviation (i.e., aleatoric uncertainty) $\sigma_{al}^{(j)}$. With that, we are able to describe each target by its distribution.

Thus, we obtain a dataset $D = \{D^{(1)}, \ldots, D^{(N)}\}$, where each datapoint $D^{(j)} = (X^{(j)}, Y^{(j)})$ can be separated into input $X^{(j)} = (x_m^{(j)}, x_p^{(j)}, \bar{\Sigma}_{al}(Z^{(j)}))$ and output $Y^{(j)} = (\mu^{(j)}, \sigma_{al}^{(j)})$. Here, $\bar{\Sigma}_{al}(Z^{(j)})$ is an aggregated manufacturing process uncertainty obtained from data. We model each output with a GP regression; thus, the outputs are again described by a distribution with mean $m$ and epistemic standard deviation $\sigma$ (i.e., epistemic uncertainty), such that $\mu^{(j)} \sim \mathcal{N}(m(\mu^{(j)}), \sigma(\mu^{(j)}))$ and $\sigma_{al}^{(j)} \sim \mathcal{N}(m(\sigma_{al}^{(j)}), \sigma(\sigma_{al}^{(j)}))$.
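The sampling procedure described above can be sketched in a strongly simplified form. In this illustration, the process characteristic is reduced to a single scalar instead of a velocity curve, and `toy_process_model` is a hypothetical stand-in for one FEM run; both, as well as all numerical values, are assumptions made only to show how the pair $(\mu^{(j)}, \sigma_{al}^{(j)})$ is collected per input setting.

```python
import random
import statistics


def toy_process_model(z, x_p):
    """Hypothetical stand-in for one FEM run: maps a process draw z and a part parameter x_p to a target y."""
    return x_p - 0.1 * z


def propagate_uncertainty(mean_z, std_z, x_p, n_draws, rng):
    """Draw M process realisations, evaluate the model and summarise the targets."""
    targets = [toy_process_model(rng.gauss(mean_z, std_z), x_p) for _ in range(n_draws)]
    mu = statistics.fmean(targets)        # mean result mu^(j)
    sigma_al = statistics.stdev(targets)  # aleatoric uncertainty sigma_al^(j)
    return mu, sigma_al


rng = random.Random(0)
# Two input settings (j = 1, 2) that differ only in the process deviation Sigma_al(Z^(j)).
settings = [
    {"mean_z": 100.0, "std_z": 1.0, "x_p": 50.0},
    {"mean_z": 100.0, "std_z": 10.0, "x_p": 50.0},
]
dataset = [propagate_uncertainty(n_draws=500, rng=rng, **s) for s in settings]
```

Each entry of `dataset` then plays the role of one output $Y^{(j)}$; a larger process deviation at the input leads to a larger aleatoric standard deviation at the output.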

#### **3. Active Learning and Solving Inverse Problems**

For optimization, we evaluate our surrogate model in two different areas: (1) active learning and (2) solving multi-objective inverse problems. We refer to active learning as a method to find the most informative data points in the feature space for the best overall performance of the surrogate model, i.e., predicting the mean result of a manufacturing process *μ* and corresponding aleatoric uncertainty of the manufacturing result *σal*. When solving multi-objective inverse problems, we try to find inputs where the error between a given target vector and a prediction as well as the aleatoric uncertainty is minimal, leading to robust optimization results.

#### *3.1. Active Learning*

Active learning is already well researched in terms of optimal use of resources for parameter optimization of a model, i.e., generating training data, see [12,32–34]. The process of generating training data means obtaining labels *Ytrain* for an input *Xtrain*, such that a dataset *Dtrain* = (*Xtrain*,*Ytrain*) can be used to fit or optimize parameters of a model. Labels *Ytrain* are obtained by an oracle, where an oracle can be a domain expert, results of real physical experiments or like in our case results of expensive numerical FEM simulations. In the following, we present the idea behind the researched optimization approach and highlight the applicability of active learning with our proposed surrogate model.

In active learning, a number of $n_{AL}$ datapoints with maximum epistemic uncertainty $\sigma_{ep}$ are queried from a pool of candidates $X_{pool}$ to build a training dataset $D_{train} = (X_{train}, Y_{train})$ that is used for training a surrogate model. Thus, we select ideal training data, i.e., we use a minimum amount of training data such that the overall epistemic uncertainty of predictions on $X_{pool}$ is minimized. We define in (11) the active learning query strategy with loss function $\mathcal{L}_{AL} = \mathcal{L}_{AL}(\sigma_{ep}) = \sigma_{ep}(x)$ to select a new query datapoint $d_q^{AL} = (x_q^{AL}, y_q^{AL})$ with input $x_q^{AL}$ and output $y_q^{AL}$.

$$d_q^{AL} = \underset{\mathbf{x} \in X_{pool}}{\text{argmax}} \, \sigma_{ep}(\mathbf{x}) \tag{11}$$

A query datapoint $d_q^{AL}$ is then moved to the training dataset $D_{train}$, the surrogate model is refitted and the iterative generation of training data starts again. With our proposed surrogate model, we can directly utilize the epistemic uncertainty predictions of the two GPs, i.e., $\sigma(\mu)$ and $\sigma(\sigma_{al})$. Thus, we define $\sigma_{ep}(x) = \sigma(\mu(x)) + \sigma(\sigma_{al}(x))$ and select training data by utilizing (11).
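The query loop around (11) might look as follows. This is only a sketch: the two functions `sigma_mu` and `sigma_sigma_al` are hypothetical stand-ins for the epistemic standard deviations of the two GPs (here simply growing with the distance to the nearest training input), and the oracle is a toy function rather than a FEM simulation.

```python
def sigma_mu(x, known):
    """Hypothetical stand-in for the epistemic std of the mean-result GP."""
    return min(abs(x - t) for t in known)


def sigma_sigma_al(x, known):
    """Hypothetical stand-in for the epistemic std of the aleatoric-uncertainty GP."""
    return 0.5 * min(abs(x - t) for t in known)


def sigma_ep(x, known):
    """Summed epistemic uncertainty used as the query criterion."""
    return sigma_mu(x, known) + sigma_sigma_al(x, known)


def active_learning(pool, initial_inputs, oracle, n_queries):
    """Iteratively move the most uncertain pool candidate to the training set, Eq. (11)."""
    pool = list(pool)
    train = [(x, oracle(x)) for x in initial_inputs]
    queried = []
    for _ in range(n_queries):
        known = [x for x, _ in train]
        x_q = max(pool, key=lambda x: sigma_ep(x, known))  # argmax over the pool
        pool.remove(x_q)
        train.append((x_q, oracle(x_q)))  # label from the oracle (FEM in the paper)
        queried.append(x_q)
        # A real surrogate would be refitted here before the next query.
    return train, queried


pool = [float(v) for v in range(1, 11)]
train, queried = active_learning(pool, [0.0], oracle=lambda x: x**2, n_queries=3)
```

With these stand-ins, the queries spread across the input range, which mirrors the intended effect of selecting points where the surrogate is least certain.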

#### *3.2. Inverse Problem*

In real physical manufacturing processes, it is commonly required that the result of the manufacturing process lies within a given tolerance range. Therefore, the parameters that control the manufacturing process and the properties of the part must be carefully selected. Moreover, the process of finding inputs to obtain a given target can be formulated as an inverse problem, i.e., finding causal factors for a required effect. In our work, we define that a basic solution of an inverse problem is to find an input *xinv*, minimizing a distance *d* = *d*(*yinv*, *ytarget*) between prediction *yinv* and target vector *ytarget*. However, such solutions neglect the existence of process variations, i.e., aleatoric uncertainty. With no consideration of aleatoric uncertainty, the found ideal inputs can lead to quite good results regarding mean values but very high deviations, such that no robustness assertions can be made.

Therefore, we present a novel multi-objective optimization approach in (12) based on BO with a modified UCB acquisition function, where we make a clear separation of uncertainties, such that a loss function $\mathcal{L}_{inv}$, dependent on a distance function $d$ and on the aleatoric and epistemic uncertainties $\sigma_{al}$ and $\sigma_{ep}$, respectively, is minimized.

$$x_{inv} = \underset{X_{pool}}{\text{argmin }} \mathcal{L}_{inv}(d, \sigma_{al}, \sigma_{ep}) \tag{12}$$

As a distance function *d*, we select the absolute error between mean target *μtarget* and mean manufacturing process result *m*(*μ*) as the metric. However, our approach is not limited to a specific distance metric, so any can be used.

$$d = d(\mu_{\text{target}}, m(\mu)) = |\mu_{\text{target}} - m(\mu)| \tag{13}$$

Utilizing *m*(*σal*), *σ*(*σal*), *m*(*μ*) and *σ*(*μ*) from our proposed surrogate model, we define epistemic uncertainty *σep* = *σep*(*σ*(*σal*), *σ*(*μ*)) = *σ*(*σal*) + *σ*(*μ*) and construct a loss function L*inv* with tuning parameters *α* and *β*, where *α* controls the influence of the aleatoric uncertainty and *β* controls exploration vs. exploitation, i.e., the influence of the epistemic uncertainty.

$$\begin{aligned} \mathcal{L}_{inv}(d(\mu_{target}, m(\mu)), m(\sigma_{al}), \sigma_{ep}(\sigma(\sigma_{al}), \sigma(\mu))) &= \\ &\quad d(\mu_{target}, m(\mu)) + \alpha \cdot m(\sigma_{al}) - \beta \cdot \sigma_{ep}(\sigma(\sigma_{al}), \sigma(\mu)) \end{aligned} \tag{14}$$

Thus, with our approach, we find inputs that provide robust outputs close to a given target while keeping aleatoric uncertainty low. As a result, we obtain robust optimization outcomes when solving multi-objective inverse problems with our approach. In the work of Dai Nguyen et al. [25] we found a similar handling of uncertainty in the observation of the acquisition function, however, the authors only focus on maximizing black box functions, while we present an extension that solves multi-objective inverse problems.
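To illustrate how the loss in (14) trades off target distance against aleatoric uncertainty, the following sketch evaluates it on a small candidate set. The candidate names, the tabulated surrogate outputs and the chosen values of `alpha` and `beta` are purely hypothetical; in practice, the four predicted quantities would come from the two GPs.

```python
def inverse_loss(mu_target, m_mu, m_sigma_al, s_mu, s_sigma_al, alpha, beta):
    """Loss of Eq. (14) for a single candidate."""
    d = abs(mu_target - m_mu)     # distance, Eq. (13)
    sigma_ep = s_sigma_al + s_mu  # summed epistemic uncertainty
    return d + alpha * m_sigma_al - beta * sigma_ep


def solve_inverse(predictions, mu_target, alpha, beta):
    """Return the candidate with minimal loss, Eq. (12)."""
    return min(
        predictions,
        key=lambda name: inverse_loss(mu_target, *predictions[name], alpha=alpha, beta=beta),
    )


# Hypothetical surrogate outputs per candidate:
# (m(mu), m(sigma_al), sigma(mu), sigma(sigma_al)).
predictions = {
    "A": (10.0, 2.0, 0.1, 0.1),  # hits the target, but with high aleatoric uncertainty
    "B": (10.2, 0.2, 0.1, 0.1),  # slightly off target, low aleatoric uncertainty
    "C": (12.0, 0.1, 0.1, 0.1),  # far off target
}
best_ignoring_aleatoric = solve_inverse(predictions, mu_target=10.0, alpha=0.0, beta=0.0)
best_robust = solve_inverse(predictions, mu_target=10.0, alpha=1.0, beta=0.0)
```

With `alpha = 0` the candidate closest to the target is selected regardless of its deviations, whereas a positive `alpha` shifts the choice to the candidate that is nearly as close but far less variable.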

#### **4. Case Study on Forging Superalloys**

We evaluate the proposed surrogate model and novel optimization method with a classic use case from the field of hot metal forming, preforming an Inconel 625 superalloy billet on an artificially designed forging press. First, we design the forging press characteristics with a parameterized curve and a GP and second, we design the forming process itself in a FEM simulation environment where we provide all the relevant information so that it is possible for researchers to link directly to our work.

#### *4.1. Forging Aggregate Characteristic*

We calculate the mean forming velocity values of an artificially designed forging process, using a forging screw press as an example, by a self-designed parameterized curve in (15) that models the die velocity $v_{die}$ in mm/s as a function of the process time $t$ in seconds, clutch pressure $x_1$ in bar and rotation speed of the electric motor $x_2$ in rpm, such that $v_{die} = v_{die}(x_1, x_2, t)$. Here, $x_1$ and $x_2$ are the two process-specific setting parameters, i.e., $x_m = (x_1, x_2)$.

$$
v_{die}(x_1, x_2, t) = \kappa_1 \cdot x_1 \cdot x_2 \cdot t^2 - \kappa_2 \cdot x_1 \cdot x_2 \cdot t^3 \tag{15}
$$

where $\kappa_1 = \frac{5}{3}$ mm²/kg and $\kappa_2 = \frac{5}{3}$ mm²/(kg·s) are constants. We utilize a purpose-designed, forging press specific GP with data generated by using (15) to model the mean and input-dependent deviations of the manufacturing process characteristic $Z$ (i.e., $Z$ represents the velocity profile of the forging die $v_{die}$). $Z$ is defined by a distribution with mean $m(Z)$ and aleatoric standard deviation $\Sigma_{al}(Z)$. With respect to our use case, the forging press specific GP with output $Z^{(j)}$ is at the very beginning of the uncertainty propagation analysis, see Figure 4. The inputs for the forging press GP are $x_m = (x_1, x_2)$ and time increments $t = \{0, \ldots, T\}$, where $T$ represents the duration of the manufacturing process. The output of the forging press GP is $Z$, such that $Z \sim \mathcal{GP}(m(x_m, t), k((x_m, t), (x_m, t)'))$ with mean $m(x_m, t)$ and covariance function $k((x_m, t), (x_m, t)')$. Thus, we obtain for each time increment a distribution describing the velocity at time $t$. The principal GP design for the forging press can be seen in Figure 5. As covariance function $k$, we found an RBF kernel to be appropriate.

**Figure 4.** Uncertainty Propagation Analysis: the characteristic $Z^{(j)}$ of the manufacturing process is described by a distribution, since deviations occur when the process is repeated with identical $x_m^{(j)}$. With $M$ draws $z^{(i)(j)}$ from the distribution $Z^{(j)}$ as manufacturing process characteristic and the parameters $x_p^{(j)}$ of the part to be manufactured, FEM simulations are executed to obtain targets $y^{(i)(j)}$ that describe a distribution $Y^{(j)} = (\mu^{(j)}, \sigma_{al}^{(j)})$ with mean $\mu^{(j)}$ and aleatoric standard deviation, i.e., uncertainty, $\sigma_{al}^{(j)}$ for given inputs $x_m^{(j)}$ and $x_p^{(j)}$.

**Figure 5.** GP takes manufacturing process-specific parameters *xm* and manufacturing process time steps *t* as input and predicts a manufacturing process-specific characteristic (i.e., velocity profile of the forging die) *Z* with mean *m*(*Z*) and uncertainty Σ*al*(*Z*).

We utilize (15) and different input parameter combinations to generate training data for the forging press GP, see Table 1. In terms of time step size *t*, we assume that each forging stroke lasts one second, and we model each stroke with a resolution of 100 time steps.

**Table 1.** Input parameter combinations to generate training data for the forging press GP.


To obtain different deviations for different combinations of $x_1$ and $x_2$, we exploit the inference properties of GP regression and vary between interpolation and extrapolation tasks with respect to the input values of the forging process representation, see Table 2.



We define interpolation such that a value is within the training range (i.e., $x_1$ equals 14 or 18 and $x_2$ equals 55 or 65) and extrapolation such that a value is outside the training range (i.e., $x_1$ equals 10 or 22 and $x_2$ equals 45 or 75).

Exemplary forging press characteristics can be seen in Figure 6: Figure 6a shows low deviation because $x_1$ and $x_2$ both lie within the range of the training data; Figure 6b,c show moderate deviation because one of the process parameters lies within and the other outside the range of the training data; and Figure 6d shows the highest deviation because both process parameters lie outside the range of the training data. Thus, our forging press GP represents the characteristics of a forging aggregate with input-dependent uncertainties. In our approach, we intentionally generate deviations that depend on the input parameters and treat the resulting uncertainty as aleatoric to approximate reality, i.e., we deliberately reinterpret the epistemic uncertainty of the GP as aleatoric uncertainty. When working with sensor data from a real manufacturing process, the deviations, i.e., the aleatoric process uncertainties $\Sigma_{al}$, can be measured directly from the data.

**Figure 6.** Exemplary forging press characteristics *Z*(*j*) represented by mean and 95% credibility interval of *vdie* over *t* with (**a**) low deviation, (**b**,**c**) moderate deviation and (**d**) high deviation.
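The relation between test inputs and expected deviation can be sketched as follows. The sketch assumes that the 16 test combinations are the Cartesian product of the $x_1$ and $x_2$ values named in the text; the function and variable names are hypothetical, and the labels only mirror the qualitative description of Figure 6.

```python
from itertools import product

# Test values named in the text: inside vs. outside the training range.
X1_INSIDE, X1_OUTSIDE = {14, 18}, {10, 22}
X2_INSIDE, X2_OUTSIDE = {55, 65}, {45, 75}


def expected_deviation(x1, x2):
    """Qualitative deviation level, following the description of Figure 6."""
    n_outside = (x1 in X1_OUTSIDE) + (x2 in X2_OUTSIDE)
    return ("low", "moderate", "high")[n_outside]


# Assumption: all pairings of the listed values are used as test inputs.
combinations = list(product(sorted(X1_INSIDE | X1_OUTSIDE),
                            sorted(X2_INSIDE | X2_OUTSIDE)))
levels = {combo: expected_deviation(*combo) for combo in combinations}
```

Under this assumption, four combinations are pure interpolation, eight mix interpolation and extrapolation, and four are pure extrapolation.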

#### *4.2. FEM Simulation*

The considered use case, preforming an Inconel 625 superalloy billet on a forging press machine, is studied by means of a corresponding FEM simulation. Manufacturing-process-related FEM inputs $Z^{(j)}$, i.e., different velocity profiles of the upper die over time, are modeled by the forging press GP. Inputs for the forging press GP are $x_1$, $x_2$ and $t$, such that $Z^{(j)}(t) = Z^{(j)}(t)(x_1^{(j)}, x_2^{(j)}, t)$. All 16 possible combinations of manufacturing-process-related FEM inputs are shown in Table 2. Billet-related inputs $x_p^{(j)}$ that are shared by our proposed surrogate model and the FEM simulation are diameter, height and temperature, such that $x_p^{(j)} = (d^{(j)}, h^{(j)}, \theta^{(j)})$. One possible billet configuration is shown in Figure 7, and possible billet parameters for different configurations are shown in Table 3. We define the radius of the rounded edges to be a constant 10 mm across all configurations.

**Table 3.** Key parameters for billet configurations $x_p^{(j)}$, values in mm.


We consider 27 different billets in total. Combining the manufacturing-process-related combinations with the different billets, we obtain 432 combinations, i.e., $j \in \{1, \dots, 432\}$. For the evaluation of the uncertainty propagation, we randomly draw $z^{(i)(j)}$ with $i \in \{1, \dots, 20\}$ from each distribution $Z^{(j)}$, i.e., 20 FEM simulations are performed for each input setting. Thus, a total of 8640 FEM simulation results are generated for our experiments. The selected FEM output variables for our surrogate model are the final diameter and height of the preformed billet, such that $y^{(i)(j)} = (d_{final}^{(i)(j)}, h_{final}^{(i)(j)})$ and $Y^{(j)} = (\mu(d_{final}^{(j)}), \mu(h_{final}^{(j)}), \sigma_{al}(d_{final}^{(j)}), \sigma_{al}(h_{final}^{(j)}))$. With respect to the final diameter $d_{final}$, we calculate the empirical mean by $\mu(d_{final}^{(j)}) = \frac{1}{20} \sum_{i=1}^{20} d_{final}^{(i)(j)}$ and the aleatoric standard deviation by $\sigma_{al}(d_{final}^{(j)})^2 = \frac{1}{20} \sum_{i=1}^{20} (d_{final}^{(i)(j)} - \mu(d_{final}^{(j)}))^2$. The calculations are analogous for $h_{final}$. Thus, we obtain a dataset with 432 instances described by six input features and four output features. For our running example, the input features are clutch pressure, rotation speed, initial billet diameter, initial billet height, initial billet temperature and the aggregated manufacturing process uncertainty obtained from data, i.e., the aggregated output of the forging press GP, $\bar{\Sigma}_{al}(Z^{(j)}) = \sum_{t=1}^{T} \Sigma_{al}(Z)^{(j)}(t)$. The output features are the mean of the final billet diameter, the mean of the final billet height, the aleatoric uncertainty of the final billet diameter and the aleatoric uncertainty of the final billet height.
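The aggregation of the FEM draws into one dataset row can be sketched as follows. The sketch uses the 1/20 normalization stated above (a population standard deviation); the function names and the synthetic diameters are assumptions for illustration and do not reproduce simulation results.

```python
import math
import random


def empirical_mean_and_std(samples):
    """Mean and population standard deviation (1/M normalization)."""
    m = len(samples)
    mean = sum(samples) / m
    variance = sum((s - mean) ** 2 for s in samples) / m
    return mean, math.sqrt(variance)


def aggregated_uncertainty(sigma_al_over_time):
    """Sum of the forging press GP standard deviations over all time steps."""
    return sum(sigma_al_over_time)


# Synthetic stand-in for the 20 final diameters of one input setting j (mm).
rng = random.Random(0)
d_final = [288.0 + rng.gauss(0.0, 0.5) for _ in range(20)]
mu_d, sigma_al_d = empirical_mean_and_std(d_final)

# Dataset size: 16 process combinations x 27 billets, 20 draws each.
n_settings = 16 * 27
n_simulations = n_settings * 20
```

The same two functions would be applied to the final heights, giving the four output features per input setting.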

**Figure 7.** Billet configuration with Diameter *d* = 220 mm, Height *h* = 200 mm and rounded edges with Radius = 10 mm.

The problem is defined as a 2D axisymmetric simulation task to utilize symmetries and make efficient use of computational resources. We utilize isotropic elasto-plastic Inconel 625 material behavior from literature. The Young's modulus is temperature-dependent and the yield stress depends on plastic strain, strain-rate and temperature. We set contact properties to tangential behavior with isotropic directionality and a friction coefficient of 0.3 between the billet and upper and lower forging tool, which means that we assume lubricated hot forging conditions. The lower tool is encastred and the upper tool's boundary conditions are set so that the vertical movement *z*(*i*)(*j*) is drawn from distribution *Z*(*j*) and there is no horizontal movement. An exemplary simulation definition can be seen in Figure 8, where (a) shows the initial state of the billet loaded with a randomly drawn screwpress velocity profile *z*(*i*)(*j*) and (b) the end result of the simulation with selected FEM output variables *y*(*i*)(*j*), i.e., the final diameter of 288 mm and the final height of 92.83 mm.

All billets are meshed with an approximate global element size of 7 mm, using 4-node bilinear axisymmetric quadrilateral elements with reduced integration and hourglass control. We obtain our FEM simulation results in the context of general static simulations. Details of the simulation steps are shown in Table 4. Simulation control parameters that are not listed are left at their default values.


**Figure 8.** Preforming an Inconel 625 superalloy billet: (**a**) initial billet and randomly drawn velocity profile *z*(*i*)(*j*), (**b**) FEM simulation result with graphical presentation of the horizontal displacement *U*, *U*1 and selected output variables *y*(*i*)(*j*), i.e., final diameter of 288 mm and final height of 92.83 mm.

**Table 4.** Abaqus FEM simulation control parameters for our use case.


#### **5. Results**

#### *5.1. GPs*

Before utilizing optimization methods, we evaluate each individual GP, see Table 5. The screwpress GP is trained on data generated by using inputs from Table 1 with (15) and tested on data generated by using inputs from Table 2 with (15). For the covariance function $k$ of this GP, we found an RBF kernel to be appropriate. The GPs of our proposed surrogate model both use a Matérn kernel with $\nu = 2.5$ and are independently evaluated by 10-fold cross-validation with inputs from Table 2 and $Z^{(j)}$ obtained from the screwpress GP. Outputs are obtained from FEM simulations, see Section 4.2. In each cross-validation step, we split the respective data randomly such that 10 percent is in the test dataset and the remaining 90 percent is used for model training.
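The two ingredients of this evaluation, a Matérn kernel with $\nu = 2.5$ and a random 10-fold split, can be sketched in plain Python. This is an illustrative sketch and not the GPflow-based implementation; the function names, the shared lengthscale and the seed are assumptions.

```python
import math
import random


def matern52_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Matern covariance with nu = 2.5 between two input vectors."""
    r = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    s = math.sqrt(5.0) * r / lengthscale
    # (1 + sqrt(5) r / l + 5 r^2 / (3 l^2)) * exp(-sqrt(5) r / l)
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists for a random k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, folds[f]


# 432 instances: each fold holds out roughly 10 percent for testing.
splits = list(k_fold_indices(432, n_folds=10))
```

With 432 instances, each test fold contains 43 or 44 samples, which corresponds to the 10/90 split described above.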

**Table 5.** Evaluation of individual GPs by average R2-Scores over 10 folds.


In addition, we calculate mean Pearson kurtosis

$$kurt_{\text{Pearson}} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{M} \sum_{i=1}^{M} \left( \frac{y^{(i)(j)} - \mu^{(j)}}{\sigma_{al}^{(j)}} \right)^{4} \tag{16}$$

and mean Fisher-Pearson coefficient of skewness

$$skew_{\text{Fisher-Pearson}} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{M} \sum_{i=1}^{M} \left( \frac{y^{(i)(j)} - \mu^{(j)}}{\sigma_{al}^{(j)}} \right)^{3} \tag{17}$$

to describe the distribution shapes obtained from uncertainty propagation analysis, see Table 6.

**Table 6.** Mean values of Pearson kurtosis and Fisher-Pearson coefficient of skewness calculated from uncertainty propagation analysis results.
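Equations (16) and (17) average a standardized fourth and third moment over all $N$ input settings. A minimal sketch, assuming the population standard deviation (1/M normalization) as $\sigma_{al}^{(j)}$, is given below; the function names and toy samples are illustrative only.

```python
import math


def standardized_moment(samples, order):
    """Mean of ((y - mu) / sigma)^order with population sigma.

    Assumes the samples are not all identical (sigma > 0).
    """
    m = len(samples)
    mean = sum(samples) / m
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / m)
    return sum(((s - mean) / std) ** order for s in samples) / m


def mean_pearson_kurtosis(groups):
    """Eq. (16): fourth standardized moment averaged over all groups."""
    return sum(standardized_moment(g, 4) for g in groups) / len(groups)


def mean_skewness(groups):
    """Eq. (17): third standardized moment averaged over all groups."""
    return sum(standardized_moment(g, 3) for g in groups) / len(groups)


# Two toy groups standing in for the FEM outputs of two input settings.
groups = [[-1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
kurt = mean_pearson_kurtosis(groups)
skew = mean_skewness(groups)
```

Both toy groups are symmetric, so their mean skewness is zero, while their kurtosis lies well below the value of 3.0 of a normal distribution.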


GP models were implemented with the GPflow library version 2.2.1 and Python 3.8.10. Inference was run on a machine with 16 GB RAM, 8 CPUs and an Intel(R) i7-8565 2.0 GHz processor. We utilized the L-BFGS-B algorithm to train the models. Training our surrogate model on all available data took 1.36 s on average, based on 10 measurements. One prediction takes our model 0.046 s on average. A FEM simulation took 149.78 s on average.
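The reported timings imply the following per-evaluation speedup. This is a simple worked calculation from the averages above; it ignores training time and the cost of generating training data, and the variable names are illustrative.

```python
# Average timings reported above, in seconds.
fem_seconds = 149.78
surrogate_predict_seconds = 0.046

# Ratio of one FEM simulation to one surrogate prediction.
speedup = fem_seconds / surrogate_predict_seconds
```

The ratio is roughly 3256, i.e., a single prediction is more than 3000 times faster than a single FEM simulation.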

#### *5.2. Active Learning*

We evaluate our proposed surrogate model by using active learning and compare it with an approach based on random training data selection. Evaluation is based on 10-fold cross-validation. In each cross-validation step, models are initially trained on two randomly selected datapoints out of the pool dataset containing 432 instances. Evaluation metrics are R2-Score and mean-squared-error (MSE) and are computed on a 20 percent hold-out test set that is randomly generated in each cross-validation step. Results for the mean of reduced-order predictions and corresponding aleatoric uncertainties regarding final diameter and height are shown respectively in Figures 9 and 10, where solid lines depict the mean R2-Score values and shaded areas are obtained by adding and subtracting one standard deviation. Mean values and standard deviations are calculated from the 10 cross-validation results.
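The evaluation metrics and the shaded bands can be sketched as follows. The sketch assumes the usual definitions of MSE and R2-Score and a population standard deviation across folds; the function names and the per-fold scores are placeholders and not results from our experiments.

```python
import math


def mse(y_true, y_pred):
    """Mean-squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def band(scores):
    """Return (mean - std, mean, mean + std) over cross-validation folds."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return mean - std, mean, mean + std


# Placeholder per-fold scores for one number of drawn training points.
fold_scores = [0.80, 0.85, 0.90]
lower, center, upper = band(fold_scores)
```

Repeating `band` for every number of drawn training points gives one solid line (the mean) and one shaded area (mean plus and minus one standard deviation) per method.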

#### *5.3. Solving Inverse Problem*

We evaluate our proposed multi-objective optimization strategy by solving inverse problems, i.e., we try to find input settings that lead to an output as close as possible to an initially defined target vector. In addition to minimizing the distance between a target vector $y_{target}$ and the random mean vector $m(\mu^{(j)})$, we try to achieve results that also keep the mean aleatoric uncertainty $m(\sigma_{al}^{(j)})$ low. We utilize 10-fold cross-validation, where in each cross-validation step the target vector is randomly drawn from the pool dataset and the best prediction found after drawing 50 datapoints from the pool dataset is used for evaluation. This means that, for each method, a dataset of 50 datapoints is generated, and each best prediction is found by evaluating the respective acquisition function on the corresponding generated dataset. We compare our approach with two baselines, namely:

**Figure 9.** R2-Scores of 10-fold cross-validation over number of drawn training data *N*. In each cross-validation step, models are initially trained on two randomly selected datapoints drawn from the pool dataset. Solid lines depict the mean R2-Score values and shaded areas the upper and lower confidence bounds obtained by adding and subtracting the standard deviations, calculated from the 10 obtained results. (**a**) *m*(*μDiameter*); (**b**) *m*(*σal*,*Diameter*); (**c**) *m*(*μHeight*); (**d**) *m*(*σal*,*Height*).

**Figure 10.** MSEs of 10-fold cross-validation over number of drawn training data *N*. In each cross-validation step, models are initially trained on two randomly selected datapoints drawn from the pool dataset. Solid lines depict the mean MSE values and shaded areas the upper and lower confidence bounds obtained by adding and subtracting the standard deviations, calculated from the 10 obtained results. (**a**) *m*(*μDiameter*); (**b**) *m*(*σal*,*Diameter*); (**c**) *m*(*μHeight*); (**d**) *m*(*σal*,*Height*).

1. Combined: no distinction of uncertainties in UCB-based BO, i.e., aleatoric and epistemic uncertainty are simply added (this baseline can be considered an approximation to the use of a heteroskedastic GP in UCB BO), with loss:

$$\begin{aligned} \mathcal{L}_{\text{combined}}(d(\mu_{\text{target}}, m(\mu)), m(\sigma_{al}), \sigma_{ep}(\sigma(\sigma_{al}), \sigma(\mu))) &= \\ d(\mu_{\text{target}}, m(\mu)) - [\alpha \cdot m(\sigma_{al}) + \beta \cdot \sigma_{ep}(\sigma(\sigma_{al}), \sigma(\mu))] \end{aligned} \tag{18}$$

2. Epistemic: neglecting aleatoric uncertainty in UCB based BO with loss:

$$\begin{split} \mathcal{L}_{\text{epistemic}}(d(\mu_{\text{target}}, m(\mu)), m(\sigma_{al}), \sigma_{ep}(\sigma(\sigma_{al}), \sigma(\mu))) &= \\ d(\mu_{\text{target}}, m(\mu)) - \beta \cdot \sigma_{ep}(\sigma(\sigma_{al}), \sigma(\mu)). \end{split} \tag{19}$$
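The two baseline losses (18) and (19) can be sketched as plain functions. The sketch makes several assumptions that are not fixed by the equations themselves: the distance $d$ is taken as a squared Euclidean distance, the epistemic term is passed in as a single precomputed number, the candidate with the smallest loss is selected, and all names and candidate values are hypothetical.

```python
def squared_distance(target, prediction):
    """Assumed form of d: squared Euclidean distance."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction))


def loss_combined(distance, mean_sigma_al, sigma_ep, alpha, beta):
    """Eq. (18): both uncertainties enter as one exploration bonus."""
    return distance - (alpha * mean_sigma_al + beta * sigma_ep)


def loss_epistemic(distance, mean_sigma_al, sigma_ep, alpha, beta):
    """Eq. (19): aleatoric uncertainty is ignored, so alpha has no effect."""
    return distance - beta * sigma_ep


def select_best(candidates, target, loss_fn, alpha, beta):
    """Index of the candidate with the smallest loss (assumed selection rule)."""
    def loss(c):
        d = squared_distance(target, c["mean"])
        return loss_fn(d, c["sigma_al"], c["sigma_ep"], alpha, beta)
    return min(range(len(candidates)), key=lambda i: loss(candidates[i]))


# Hypothetical predictions: (final diameter, final height) in mm.
target = (288.0, 92.83)
candidates = [
    {"mean": (287.0, 93.0), "sigma_al": 0.2, "sigma_ep": 0.1},
    {"mean": (288.5, 92.5), "sigma_al": 1.5, "sigma_ep": 0.1},
]
best_combined = select_best(candidates, target, loss_combined, 1.0, 1.0)
best_epistemic = select_best(candidates, target, loss_epistemic, 1.0, 1.0)
```

In this toy setting both baselines select the second candidate even though its aleatoric uncertainty is much higher: the epistemic loss never sees $m(\sigma_{al})$, and the combined loss rewards it. For $\alpha = 0$ the two losses coincide.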

Figure 11 shows representative plots of optimization results for one random target vector (i.e., one cross-validation step) over 50 draws of *xinv*, where solid lines depict squared errors and dotted lines show mean aleatoric uncertainty *m*(*σal*). Figures 12 and 13 show the distributions of optimization results obtained by 10-fold cross-validation with respect to squared errors and mean aleatoric uncertainty *m*(*σal*). Distributions are visualized by kernel density estimation.

**Figure 11.** Representative plots of multi-objective optimization results for different hyperparameter settings *α* and *β* over number of optimization steps *N*. Solid lines depict squared error values, and dotted lines represent corresponding mean aleatoric uncertainty *m*(*σal*). The plots for *α* = 0 show only blue lines, because the results of the different methods are the same and the lines are on top of each other.

**Figure 12.** Kernel density estimate plots of squared errors for different hyperparameter settings *α* and *β*, distributions are obtained by 10-fold cross-validation, where in each fold a target vector is randomly selected. The plots for *α* = 0 show only blue lines, because the results of the different methods are the same and the lines are on top of each other.

**Figure 13.** Kernel density estimate plots of mean aleatoric uncertainties *m*(*σal*) for different hyperparameter settings *α* and *β*, distributions are obtained by 10-fold cross-validation, where in each fold a target vector is randomly selected. The plots for *α* = 0 show only blue lines, because the results of the different methods are the same and the lines are on top of each other.

#### **6. Discussion**

Evaluation of the individual GPs with 10-fold cross-validation shows promising R2-Scores (lowest: 0.8146, mean: 0.89355, highest: 0.9586), i.e., the hyperparameters appear to be appropriate for further evaluations. The generated manufacturing process uncertainties, i.e., $\bar{\Sigma}_{al}(Z^{(j)})$, show a diverse data landscape; thus, we assume that further uncertainty propagation analysis is meaningful.

We characterize the distributions obtained from the uncertainty propagation analysis by calculating the Pearson kurtosis and the Fisher-Pearson coefficient of skewness (a Pearson kurtosis of 3.0 and a Fisher-Pearson coefficient of skewness of 0.0 describe a normal distribution). Regarding kurtosis, the results show that the distributions are close to normal distributions: the distribution of $h_{final}$ is slightly platykurtic ($kurt_{\text{Pearson}}(h_{final}) = 2.685 < 3.0$), i.e., it is less peaked than a normal distribution, and the distribution of $d_{final}$ is marginally leptokurtic ($kurt_{\text{Pearson}}(d_{final}) = 3.003 > 3.0$), i.e., it is more peaked than a normal distribution. In terms of skewness, the distribution of $d_{final}$ is more skewed than that of $h_{final}$; however, both values are less than 0.5, so approximate symmetry can be assumed.

We evaluate the impact of data selection for model training using two metrics, R2-Score and MSE, with a 10-fold cross-validation comparing active learning with random sample selection. With respect to the mean values $\mu$, active learning shows an overall improvement compared to random sample selection. In terms of the aleatoric uncertainties $\sigma_{al}$, random sample selection is superior to active learning up to the selection of about 20 samples, but after that active learning performs better than random sample selection. The initially worse performance of active learning with respect to $\sigma_{al}$ is due to a trade-off in the active learning cost function between $\sigma(\mu)$ and $\sigma(\sigma_{al})$, with the influence of $\sigma(\mu)$ dominating. A possible solution would be the introduction of appropriate tuning parameters that regulate the influence of the respective epistemic uncertainties $\sigma(\mu)$ and $\sigma(\sigma_{al})$. Moreover, it should be noted that random sample selection performs better only at a stage where the tuning of parameters is far from complete, so this advantage is not relevant in practice.

With regard to solving inverse problems, we compare our novel robust UCB based BO multi-objective optimization algorithm with two baselines: (1) combined: no distinction of uncertainties in UCB based BO and (2) epistemic: neglecting aleatoric uncertainty in UCB based BO. We show that over different values of tuning parameters *α* and *β* there are clear tendencies of the different approaches. By disabling the influence of aleatoric uncertainty (*α* = 0), all three approaches show the same results as expected: low squared errors and neglected aleatoric uncertainty. For all approaches, slight differences can be seen over different *β* values while *α* = 0, regulating the exploration vs. exploitation trade-off.

By construction of the epistemic approach, the optimization result does not change when $\alpha$ is varied for a constant $\beta$, see Figure 11. Differences in the kernel density estimate plots over varying $\alpha$ values stem from the random target vector selection. Overall, the epistemic approach yields the best optimization results in terms of squared errors, see Figures 11 and 12; however, as expected, aleatoric uncertainty is ignored and thus high, see Figure 13. The combined approach, where aleatoric and epistemic uncertainties are simply added and handled as quasi-epistemic, shows the overall worst results. At low $\alpha$ values, the squared errors are acceptable, but the aleatoric uncertainty is high due to inappropriate handling of information, see Figures 11–13. With our approach, once aleatoric uncertainty is considered (i.e., $\alpha > 0.0$), the results for the inverse problem show low squared errors and low aleatoric uncertainty, which we regard as robust results. Moreover, as $\alpha$ increases, our approach leads to results where lowering the aleatoric uncertainty $\sigma_{al}$ is preferred over lowering squared errors, see Figure 11 for $\alpha = 1.0$ and $\alpha = 10.0$. Kernel density estimate plots generated from the 10-fold cross-validation results confirm these findings, showing clear tendencies of the optimization results with respect to the tuning parameters $\alpha$ and $\beta$. While an approach considering only epistemic uncertainties delivers the overall best results with respect to squared errors, it leaves aleatoric uncertainties out of scope, and thus its optimization results are less robust. An approach that combines aleatoric and epistemic uncertainties by summing them shows the overall worst results and cannot compete with the others. Our approach, in which aleatoric and epistemic uncertainties are considered to carry different information, achieves overall good results with respect to squared errors while keeping aleatoric uncertainty low, and thus provides robust solutions for multi-objective inverse problems.

Moreover, our model is directly applicable in an industrial framework where the forging press characteristics are represented by measured sensor data of the aggregate (e.g., velocity over time, forging force over time, forging force over the forming path, etc.), which can be used in an appropriately designed FEM simulation for uncertainty propagation analysis and, moreover, for surrogate model training.

#### **7. Conclusions**

In this work, we present a GP-based reduced-order surrogate model approach with a novel multi-objective target vector optimization strategy that obtains more robust optimization results by considering aleatoric and epistemic uncertainties. Evaluation on a classic hot metal forming use case, preforming an Inconel 625 forging billet on a self-designed forging press, demonstrates the advantages of our approach compared to baselines. Our major findings include that our surrogate model produces results much faster than the FEM simulation (over 3000 times faster), with a calculated loss of accuracy and information. Moreover, active learning can be used directly with our model to make optimal use of computational resources, and solving inverse problems leads to robust optimization results, i.e., finding results close to a defined objective while keeping aleatoric uncertainty low. With our work, we pave one promising way for faster and more realistic simulation and optimization methods.

In future work, we will evaluate our GP based surrogate model and multi-objective optimization strategy on manufacturing process use cases concerning other domains, with real sensor data describing the characteristics of a manufacturing process. Additionally, we will research other Bayesian machine learning and deep learning models as components instead of GP in our surrogate model approach. Moreover, we will experiment with further active learning approaches.

**Author Contributions:** Conceptualization, J.G.H. and B.C.G.; methodology, J.G.H. and B.C.G.; software, J.G.H.; validation, J.G.H., B.C.G. and R.K.; formal analysis, J.G.H.; investigation, J.G.H., B.C.G. and R.K.; resources, J.G.H. ; data curation, J.G.H.; writing—original draft preparation, J.G.H.; writing review and editing, J.G.H., B.C.G. and R.K.; visualization, J.G.H.; supervision, B.C.G. and R.K.; project administration, J.G.H., B.C.G. and R.K.; funding acquisition, J.G.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Österreichische Forschungsförderungsgesellschaft (FFG) Grant No. 881039 and Open Access Funding by the Graz University of Technology.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The project BrAIN–Brownfield Artificial Intelligence Network for Forging of High Quality Aerospace Components (FFG Grant No. 881039) is funded in the framework of the program 'TAKE OFF', which is a research and technology program of the Austrian Federal Ministry of Transport, Innovation and Technology. The Know-Center is funded within the Austrian COMET Program—Competence Centers for Excellent Technologies—under the auspices of the Austrian Federal Ministry of Transport, Innovation and Technology, the Austrian Federal Ministry of Economy, Family and Youth and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG. We would like to thank our colleagues at voestalpine Böhler Aerospace GmbH for the fruitful discussions.

**Conflicts of Interest:** The authors declare no conflict of interest.
