**1. Introduction**

With the increase in demand for high-performance and highly efficient systems, the complexity of industrial design and manufacturing processes is increasing proportionally, exposing many opportunities for novel technologies as well as many associated technical challenges. For example, advancements in the design of composite structures allow us to reduce weight, while advancements in additive manufacturing enable us to reduce cost. Introducing a new technology typically happens at the lowest level of the systems hierarchy (e.g., at the parts or sub-component levels). Extending new technologies to the system level requires rigorous testing for many years. For example, in the eighties, composite material was only used for limited components of an aircraft (i.e., the wing and tail [1]). Recently, however, about 50% of the material used in the Boeing 787 Dreamliner is composite [2]. In the industrial setting, the process of adopting a new technology can be accelerated by proper assessment of uncertainty at various stages of the product's life cycle. For example, at the design stage of an aircraft wing rib, it is crucial to consider the effect of uncertainty in the material and operating conditions on the aeroelastic dynamics of the wing [3]. At the manufacturing stage, it is important to consider the impact of material uncertainty on quality control [4,5]. The maintenance stage requires a holistic assessment of the effect of measurement uncertainty on the static and dynamic responses of the wing during structural health monitoring [6].

Quantifying uncertainty at the system level often requires a physics-based computational model for the entire structure. However, in structures such as an aircraft wing, traditional computational models may become too complex and costly for simulating the multi-scale dynamical response due to material heterogeneity at the sub-component level. The effect of a sub-component on the entire structure depends on the size, location and loading conditions of the part. It is, therefore, necessary to consider a different level of fidelity for the analysis of the sub-components in order to reduce the cost and complexity of uncertainty quantification. To this end, the concept of localized uncertainty propagation for dynamical systems having multiple spatio-temporal scales can be utilized to address such issues [7–9].

In this work, we consider assessing the effect of localized uncertainty in a region of interest within the entire structure. The framework is based on two uncoupled steps: (1) identification of the region of interest, and (2) quantification of the effect of localized uncertainty. For structures composed of distinct parts, the localized region of interest for uncertainty propagation can be easily identified. Alternatively, measurement data of the system response can be used to identify the localized region of interest. The Bayesian paradigm integrates computational models and observational data in one framework to update the current state of knowledge [10,11]. Estimating the posterior probability density function in the Bayesian method requires solving the forward model many times, which may become challenging for a limited computational budget. This issue is often addressed by building a surrogate model such as a Gaussian Process (GP) regression model [12]. GP models are non-parametric and Bayesian in nature, and they provide uncertainty bounds on their predictions. Once the region of interest is identified, a Polynomial Chaos (PC) expansion [13,14] is used to propagate the uncertain material properties of the localized region through the entire domain. In contrast to References [7,8], the contributions of this work are: (1) the partitioning of the domain is inferred from measurements of the system response; (2) a Gaussian process model is used as a surrogate in the Bayesian framework; and (3) a non-intrusive polynomial chaos approach is used for localized uncertainty propagation. The rest of this work is organized as follows: in Section 2, we provide the problem statement and the associated mathematical formulations; our numerical demonstrations are provided in Section 3; and we provide the conclusions of the current work in Section 4.

#### **2. Methodology**

In this section, we present the mathematical framework of our data-driven partitioning scheme for localized uncertainty quantification. In particular, in Section 2.1, we introduce the problem statement in the Bayesian setting. For problems where the localized region of interest is not defined explicitly, we rely on measurement data of the system response to infer the localized region of interest using the Bayesian framework. The Bayesian framework requires a computational model (the forward problem) to estimate the response of the system for a given set of input parameters. Consequently, in Section 2.2, we discuss the stochastic elastodynamic problem and its finite element discretization. Estimating the localized region of interest in the Bayesian setting necessitates many solutions of the stochastic elastodynamic problem, which can become computationally demanding. A surrogate model for the system response can be used to reduce the computational cost of the Bayesian framework, as will be presented in Section 2.3. Once the localized region of interest is estimated, uncertainty representation of the material properties within the region of interest can be performed. The localized uncertainty is propagated forward through the entire computational domain in order to estimate the effect of the material variability on the response. For this task, we use the polynomial chaos expansion for efficient assessment of uncertainty at a reduced computational cost. The polynomial chaos expansion is reviewed in Section 2.4.

#### *2.1. Bayesian Inference*

In Bayesian inference, the prior knowledge is updated to the posterior using noisy measurements and the response of a physical model [10,11]. The update is based on Bayes' rule, defined as

$$p(\theta|\mathbf{d}) = \frac{p(\theta)p(\mathbf{d}|\theta)}{p(\mathbf{d})},\tag{1}$$

where *θ* is the unknown parameter to be estimated, **d** is the measurement of an observable quantity, *p*(*θ*|**d**) is the posterior probability density function, *p*(*θ*) is the prior probability density function, and *p*(**d**|*θ*) denotes the likelihood of the observations given the parameter. We assume that the measured data, **d**, are generated from a statistical model represented as

$$\mathbf{d} = \mathcal{M}(\theta) + \varepsilon,\tag{2}$$

where $\mathcal{M}(\theta)$ denotes a physical model and $\varepsilon$ is a measurement noise represented as a zero-mean Gaussian random variable with unknown variance, $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$. For a Gaussian noise, the likelihood function becomes

$$p(\mathbf{d}|\theta) = \frac{1}{\left(2\pi\sigma\_n^2\right)^{N/2}} \exp\left(-\sum\_{i=1}^{N} \frac{[d\_i - \mathcal{M}\_i(\theta)]^2}{2\sigma\_n^2}\right). \tag{3}$$

The task at hand is to utilize the measurement **d** and the physical model $\mathcal{M}(\theta)$ to estimate the system parameters *θ*. The process requires many executions of the physical model $\mathcal{M}(\theta)$, which can be computationally expensive. The computational model is often approximated by a simpler, easy-to-evaluate model as:

$$\mathcal{M}(\theta) \simeq \widetilde{\mathcal{M}}(\theta),\tag{4}$$

where $\widetilde{\mathcal{M}}(\theta)$ denotes the surrogate model that is constructed using a limited number of runs of the physical model $\mathcal{M}(\theta)$. In our work, we represent $\widetilde{\mathcal{M}}(\theta)$ by a Gaussian process surrogate model [15]. Once we construct the surrogate model, the localized features parameterized by *θ* are estimated using a Markov Chain Monte Carlo (MCMC) sampling technique [16,17]. Having identified the region of interest, a localized uncertainty quantification of the material properties can be performed efficiently using the polynomial chaos expansion [13].
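As an illustration of the inference loop in Equations (1)–(4), the following sketch combines a synthetic measurement model with a random-walk Metropolis sampler (a simple member of the MCMC family [16,17]). The surrogate here is a hypothetical closed-form stand-in for a trained GP emulator, and all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical surrogate model; in practice this would be a trained
# Gaussian process emulating the elastodynamics solver.
def surrogate(theta):
    return theta**3 / 10.0 + theta

# Synthetic noisy measurements d = M(theta_true) + eps, Eq. (2)
theta_true, sigma_n = 1.2, 0.05
d = surrogate(theta_true) + sigma_n * rng.normal(size=20)

def log_posterior(theta):
    # Flat prior on [0, 3]; Gaussian likelihood, Eq. (3)
    if not 0.0 <= theta <= 3.0:
        return -np.inf
    return -np.sum((d - surrogate(theta))**2) / (2.0 * sigma_n**2)

# Random-walk Metropolis sampling of p(theta | d)
theta, chain = 0.5, []
for _ in range(5000):
    prop = theta + 0.1 * rng.normal()
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(theta):
        theta = prop
    chain.append(theta)

posterior_mean = np.mean(chain[1000:])  # discard burn-in
```

The posterior mean of the retained samples concentrates near the true parameter; in a realistic setting, the proposal scale and burn-in length would be tuned to the problem.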

#### *2.2. The Forward Problem*

We consider an arbitrary physical domain $\Omega \subset \mathbb{R}^d$ with $\partial\Omega$ being its boundary, as shown in Figure 1a, and define the following problem:

Find a random function $\mathbf{u}(\mathbf{x}, t, \xi) : \Omega \times [0, T_f] \times \Xi \to \mathbb{R}$, such that the following equations hold

$$\begin{array}{rclcl}\rho(\xi)\,\ddot{\mathbf{u}}(\mathbf{x},t,\xi) &=& \nabla \cdot \boldsymbol{\sigma} + \mathbf{b} & \text{in} & \Omega \times [0,T\_f] \times \Xi, \\ \mathbf{u}(\mathbf{x},t,\xi) &=& \bar{\mathbf{u}} & \text{on} & \partial\Omega\_{\mathbf{u}} \times [0,T\_f] \times \Xi, \\ \boldsymbol{\sigma} \cdot \mathbf{n} &=& \bar{\mathbf{t}} & \text{on} & \partial\Omega\_{t} \times [0,T\_f] \times \Xi, \\ \mathbf{u}(\mathbf{x},0,\xi) &=& \mathbf{u}\_{0} & \text{in} & \Omega \times \Xi, \\ \dot{\mathbf{u}}(\mathbf{x},0,\xi) &=& \dot{\mathbf{u}}\_{0} & \text{in} & \Omega \times \Xi, \end{array} \tag{5}$$

where $\rho(\xi)$ is the mass density, $\boldsymbol{\sigma}$ is the stress tensor, **u** is the displacement field, **b** is the body force per unit volume, $\bar{\mathbf{u}}$ is the prescribed displacement on $\partial\Omega_u$, $\bar{\mathbf{t}}$ is the prescribed traction on $\partial\Omega_t$, **n** is a unit normal to the surface, and $\mathbf{u}_0$ and $\dot{\mathbf{u}}_0$ are the initial displacement and velocity, respectively. Here, we define the stochastic space by $(\Theta, \Sigma, P)$, where $\Theta$ denotes the sample space, $\Sigma$ is the $\sigma$-algebra of $\Theta$, and $P$ is an appropriate probability measure. The stochastic space is parameterized by a finite set of standardized identically distributed random variables $\xi = \{\xi_i(\theta)\}_{i=1}^{M}$, where $\theta \in \Theta$. The support of the random variables is defined as $\Xi = \Xi_1 \times \Xi_2 \times \cdots \times \Xi_M \subset \mathbb{R}^M$ with a joint probability density function given as $p(\xi) = p_1(\xi_1)\, p_2(\xi_2) \cdots p_M(\xi_M)$.

For a linear isotropic elastic material, the constitutive relation between the stress and strain tensors is given by:

$$
\boldsymbol{\sigma} = \lambda(\xi)\,\mathrm{tr}(\boldsymbol{\varepsilon})\,\mathbf{I} + 2\mu(\xi)\,\boldsymbol{\varepsilon}, \tag{6}
$$

where $\lambda(\xi)$ and $\mu(\xi)$ are the Lamé parameters, **I** is an identity tensor and $\boldsymbol{\varepsilon}$ is the symmetric strain tensor defined as

$$\boldsymbol{\varepsilon} = \frac{1}{2} \left( \boldsymbol{\nabla} \mathbf{u} + \boldsymbol{\nabla} \mathbf{u}^T \right). \tag{7}$$

For a random Young's modulus $E(\mathbf{x}, \xi)$ and a deterministic Poisson's ratio $\nu$, the Lamé parameters can be expressed as

$$\lambda(\xi) = \frac{E(\mathbf{x}, \xi)\,\nu}{(1 + \nu)(1 - 2\nu)}, \quad \mu(\xi) = \frac{E(\mathbf{x}, \xi)}{2(1 + \nu)}.\tag{8}$$
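Equation (8) vectorizes directly over samples of the random Young's modulus; a minimal sketch, assuming the samples are given as a NumPy array (the lognormal model and its parameters are illustrative, not from the text):

```python
import numpy as np

def lame_parameters(E, nu):
    """Convert Young's modulus samples and a deterministic Poisson's
    ratio into the Lame parameters of Eq. (8)."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = E / (2.0 * (1.0 + nu))
    return lam, mu

# Illustrative lognormal samples of E (in Pa) at one spatial point
rng = np.random.default_rng(1)
E_samples = 200e9 * np.exp(0.05 * rng.normal(size=1000))
lam, mu = lame_parameters(E_samples, nu=0.3)
```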

We consider the case in which uncertainty stems from a localized variability in a confined region within the physical domain. For example, as shown in Figure 1b, the variability in the quantity of interest can be attributed to random material properties within the subdomain Ω2. The artificial material boundaries shown in Figure 1b for subdomain Ω2 are estimated using Bayesian inference. Localizing random variability in the neighborhood of the quantity of interest reduces the computational cost of uncertainty propagation in problems where a region of interest can be specified. Depending on the interest in the region, each subdomain can have its local uncertainty representation and the corresponding mesh and time resolution. As a result, the Asynchronous Space-Time Domain Decomposition Method with Localized Uncertainty Quantification (PASTA-DDM-UQ) [7–9] can be utilized. In PASTA-DDM-UQ, spatial, temporal and material decompositions are considered. In this work, however, we only consider material decomposition and apply a non-intrusive approach for uncertainty propagation.

Consequently, let the physical domain Ω be partitioned based on the material variability into $n_s$ non-overlapping subdomains $\Omega_s$, $1 \le s \le n_s$, as shown in Figure 1b, such that:

$$
\Omega = \bigcup\_{s=1}^{n\_s} \Omega\_s, \quad \Omega\_s \cap \Omega\_r = \emptyset \ \text{ for } s \neq r, \quad \Gamma = \bigcup\_{s=1}^{n\_s} \Gamma\_s, \quad \Gamma\_s = \partial \Omega\_s \setminus \partial \Omega. \tag{9}
$$

(**a**) Spatial domain (**b**) Domain decomposition.

**Figure 1.** An arbitrary computational domain Ω with a random material property (i.e., *E*p**x**, *ξ*q) and its partitioning into non-overlapping subdomains. The partitioning is based on material variability.

Note that the partitioning boundaries are not set *a priori*, as they will be estimated using noisy measurements of the system response. According to the decomposition in Equation (9), the stochastic dynamical problem in Equation (5) can be transformed into the following minimization problem:

Find a random function $\mathbf{u}(\mathbf{x}, t, \xi) : \Omega \times [0, T_f] \times \Xi \to \mathbb{R}$, such that

$$\mathcal{L}(\mathbf{u}, \dot{\mathbf{u}}) = \sum\_{s=1}^{n\_s} (\mathcal{T}\_s(\dot{\mathbf{u}}) - \mathcal{V}\_s(\mathbf{u})) \to \min, \quad s = 1, \cdots, n\_s. \tag{10}$$

where $\mathcal{L}(\mathbf{u}, \dot{\mathbf{u}})$ is the Lagrangian of the system, $\mathcal{T}_s(\dot{\mathbf{u}})$ denotes the subdomain kinetic energy and $\mathcal{V}_s(\mathbf{u})$ is the subdomain potential energy, defined as:

$$\mathcal{T}\_s(\dot{\mathbf{u}}) = \int\_{\Xi} \int\_{\Omega\_s} \frac{1}{2} \rho\_s(\xi)\, \dot{\mathbf{u}} \cdot \dot{\mathbf{u}} \, d\Omega \, d\Xi, \tag{11}$$

$$\mathcal{V}\_{s}(\mathbf{u}) = \int\_{\Xi} \left( \int\_{\Omega\_{s}} \frac{1}{2} \boldsymbol{\varepsilon} : \boldsymbol{\sigma}\_{s} \, d\Omega - \int\_{\Omega\_{s}} \mathbf{u} \cdot \mathbf{b}\_{s} \, d\Omega - \int\_{\partial\Omega\_t \cap \partial\Omega\_s} \mathbf{u} \cdot \bar{\mathbf{t}}\_{s} \, d\Gamma \right) d\Xi, \tag{12}$$

where $\mathbf{b}_s$ and $\bar{\mathbf{t}}_s$ are the subdomain body force and prescribed traction, respectively. Hamilton's principle with a dissipation term reads

$$\int\_{0}^{T\_f} \left( \delta \mathcal{L} - \frac{\partial \mathcal{Q}}{\partial \dot{\boldsymbol{\varepsilon}}} : \delta \boldsymbol{\varepsilon} \right) dt = 0,\tag{13}$$

where *δ*L is the first variation of the augmented Lagrangian defined as

$$
\begin{aligned}
\delta \mathcal{L} = \sum\_{s=1}^{n\_s} \int\_{\Xi} \bigg( &\int\_{\Omega\_s} \rho\_s(\xi)\, \delta \dot{\mathbf{u}} \cdot \dot{\mathbf{u}} \, d\Omega - \int\_{\Omega\_s} \delta \boldsymbol{\varepsilon} : \mathbf{D}\_s(\xi) : \boldsymbol{\varepsilon} \, d\Omega \\
&+ \int\_{\Omega\_s} \delta \mathbf{u} \cdot \mathbf{b}\_s \, d\Omega + \int\_{\partial \Omega\_t} \delta \mathbf{u} \cdot \bar{\mathbf{t}}\_s \, d\Gamma \bigg)\, d\Xi, 
\end{aligned} \tag{14}
$$

where we define $\mathbf{D}_s(\xi)$ as the uncertain linear elasticity tensor. The dissipation function $\mathcal{Q}(\dot{\mathbf{u}})$ in Hamilton's principle is defined as

$$\mathcal{Q}(\dot{\mathbf{u}}) = \sum\_{s=1}^{n\_s} \frac{1}{2} \int\_{\Xi} \int\_{\Omega\_s} \dot{\boldsymbol{\varepsilon}} : \hat{\mathbf{D}}\_s : \dot{\boldsymbol{\varepsilon}} \, d\Omega \, d\Xi, \tag{15}$$

where $\hat{\mathbf{D}}_s$ is the damping tensor, assumed to be deterministic. Substituting Equations (15) and (14) into Hamilton's principle, Equation (13), gives the following stochastic equation of motion for a typical subdomain $\Omega_s$

$$\begin{aligned} \int\_{\Xi} \int\_{\Omega\_{s}} \rho\_{s}(\xi)\, \ddot{\mathbf{u}} \cdot \delta \mathbf{u} \, d\Omega \, d\Xi + \int\_{\Xi} \int\_{\Omega\_{s}} \dot{\boldsymbol{\varepsilon}} : \hat{\mathbf{D}}\_{s} : \delta \boldsymbol{\varepsilon} \, d\Omega \, d\Xi + \int\_{\Xi} \int\_{\Omega\_{s}} \boldsymbol{\varepsilon} : \mathbf{D}\_{s}(\xi) : \delta \boldsymbol{\varepsilon} \, d\Omega \, d\Xi \\ = \int\_{\Xi} \int\_{\Omega\_{s}} \delta \mathbf{u} \cdot \mathbf{b}\_{s} \, d\Omega \, d\Xi + \int\_{\Xi} \int\_{\partial\Omega\_{t}} \delta \mathbf{u} \cdot \bar{\mathbf{t}}\_{s} \, d\Gamma \, d\Xi. \end{aligned} \tag{16}$$

In the next section, we describe the finite element discretization of the weak form defined in Equation (16).

#### Spatial and Temporal Discretizations

Let the spatial domain Ω be triangulated with finite elements of size *h* and let the associated finite element subspace be defined as $\mathcal{X}^h \subset H_0^1(\Omega)$; the spatial component of the solution to the stochastic problem is then sought in the tensor product function space $\mathcal{W} = H_0^1(\Omega) \otimes L^2(\Theta)$ defined as [14,18]

$$\mathcal{W} = \{ w(\mathbf{x}, \theta) : \Omega \times \Theta \to \mathbb{R} \mid \|w\|\_{\mathcal{W}}^2 < \infty \} \subset H\_0^1(\Omega) \otimes L^2(\Theta), \tag{17}$$

where the energy norm $\|\cdot\|_{\mathcal{W}}^2$ is defined as

$$\|w(\mathbf{x},\theta)\|\_{\mathcal{W}}^{2} = \int\_{\Theta} \left(\int\_{\Omega} \kappa(\mathbf{x},\theta)\, |\nabla w(\mathbf{x},\theta)|^{2}\, d\mathbf{x}\right) dP(\theta). \tag{18}$$

The tensor product space $\mathcal{W}$ can be viewed as a stochastic space consisting of random functions satisfying the Dirichlet boundary condition and having a finite second-order moment. For a given realization of the underlying random variables of the stochastic space, an approximate finite element solution to the deterministic part can be expressed as

$$\mathbf{u}^h = \sum\_{i=1}^{n\_i} \mathbf{N}\_i(\mathbf{x})\, \tilde{\mathbf{u}}^i(t),\tag{19}$$

where $\mathbf{N}_i(\mathbf{x})$ are traditional spatial finite element basis functions and $\tilde{\mathbf{u}}^i(t)$ are the nodal values of the solution as a function of time [19]. Substituting the discrete field, Equation (19), into the weak form, Equation (16), gives the following semi-discretized stochastic equation of motion:

$$\int\_{\Xi} (\mathbf{M}\ddot{\mathbf{u}}(t) + \mathbf{C}\dot{\mathbf{u}}(t) + \mathbf{K}\mathbf{u}(t))d\Xi = \int\_{\Xi} \mathbf{F}(t)d\Xi. \tag{20}$$

We drop the tilde marking the nodal finite element values for brevity of the representation and define the following matrices:

$$\begin{aligned} \mathbf{M} &= \sum\_{s=1}^{n\_s} \int\_{\Omega\_s} \rho\_s \mathbf{N}^T \mathbf{N}\, d\Omega, \quad \mathbf{C} = \sum\_{s=1}^{n\_s} \int\_{\Omega\_s} \mathbf{B}^T \hat{\mathbf{D}}\_s \mathbf{B}\, d\Omega, \\ \mathbf{K} &= \sum\_{s=1}^{n\_s} \int\_{\Omega\_s} \mathbf{B}^T \mathbf{D}\_s \mathbf{B}\, d\Omega, \quad \mathbf{F}(t) = \sum\_{s=1}^{n\_s} \left( \int\_{\Omega\_s} \mathbf{b}\_s^T \mathbf{N}\, d\Omega + \int\_{\partial \Omega\_t} \bar{\mathbf{t}}\_s^T \mathbf{N}\, d\Gamma \right). \end{aligned}$$

Here, **B** is the displacement-strain matrix. For time discretization, we use the Newmark time integration scheme to advance the stochastic system one time step as

$$\dot{\mathbf{u}}^{k+1} = \dot{\mathbf{u}}^k + (1 - \gamma)\Delta t \ddot{\mathbf{u}}^k + \gamma \Delta t \ddot{\mathbf{u}}^{k+1},\tag{21}$$

$$\mathbf{u}^{k+1} = \mathbf{u}^k + \Delta t \dot{\mathbf{u}}^k + \left(\frac{1}{2} - \beta\right) \Delta t^2 \ddot{\mathbf{u}}^k + \beta \Delta t^2 \ddot{\mathbf{u}}^{k+1},\tag{22}$$

where $\gamma$ and $\beta$ are the integration parameters, and $\Delta t = (T_f - T_0)/n_t$. Substituting the Newmark scheme into the semi-discretized stochastic equation of motion, Equation (20), gives the following fully discretized linear system for a given realization of the random vector $\xi$:

$$\mathbf{A}(\xi)\,\mathbf{U}^{k+1}(\xi) = \mathbf{F}^{k+1} - \mathbf{G}\,\mathbf{U}^{k}(\xi),\tag{23}$$

where for compact representation, we define

$$\begin{aligned} \mathbf{A}(\xi) &= \begin{bmatrix} \mathbf{M}(\xi) & \mathbf{C} & \mathbf{K}(\xi) \\ -\gamma\Delta t\, \mathbf{I} & \mathbf{I} & \mathbf{0} \\ -\beta\Delta t^{2}\, \mathbf{I} & \mathbf{0} & \mathbf{I} \end{bmatrix}, \quad \mathbf{G} = \begin{bmatrix} \mathbf{0} & \mathbf{0} & \mathbf{0} \\ -(1-\gamma)\Delta t\, \mathbf{I} & -\mathbf{I} & \mathbf{0} \\ -(\frac{1}{2}-\beta)\Delta t^{2}\, \mathbf{I} & -\Delta t\, \mathbf{I} & -\mathbf{I} \end{bmatrix}, \\ \mathbf{U}(\xi) &= \begin{Bmatrix} \ddot{\mathbf{u}}(\xi) \\ \dot{\mathbf{u}}(\xi) \\ \mathbf{u}(\xi) \end{Bmatrix}, \quad \mathbf{F} = \begin{Bmatrix} \mathbf{f} \\ \mathbf{0} \\ \mathbf{0} \end{Bmatrix}. \end{aligned}$$
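For a single degree of freedom, the Newmark update of Equations (21) and (22) reduces to a scalar recursion; the sketch below solves for the new acceleration directly rather than assembling the block system of Equation (23), and the oscillator parameters are illustrative.

```python
import numpy as np

def newmark(M, C, K, F, u0, v0, dt, nt, gamma=0.5, beta=0.25):
    """Advance M*u'' + C*u' + K*u = F(t) with Newmark's scheme,
    Eqs. (21)-(22), for a scalar (single-DOF) system."""
    u, v = u0, v0
    a = (F(0.0) - C * v - K * u) / M          # consistent initial acceleration
    us = [u]
    for k in range(nt):
        t = (k + 1) * dt
        # Predictors: the terms of Eqs. (21)-(22) that depend on step k only
        u_pred = u + dt * v + (0.5 - beta) * dt**2 * a
        v_pred = v + (1.0 - gamma) * dt * a
        # Solve the scalar linear equation for the new acceleration
        a_new = (F(t) - C * v_pred - K * u_pred) / (
            M + gamma * dt * C + beta * dt**2 * K)
        u = u_pred + beta * dt**2 * a_new
        v = v_pred + gamma * dt * a_new
        a = a_new
        us.append(u)
    return np.array(us)

# Free vibration of an undamped oscillator with natural frequency 1 Hz,
# released from u(0) = 1: the exact solution is u(t) = cos(2*pi*t)
M_, K_ = 1.0, (2.0 * np.pi)**2
u_hist = newmark(M_, 0.0, K_, lambda t: 0.0, 1.0, 0.0, dt=1e-3, nt=1000)
```

With the average-acceleration parameters γ = 1/2, β = 1/4, the scheme is unconditionally stable, so after one full period the computed displacement returns very close to its initial value.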

For the data-driven decomposition approach, many solutions to the forward problem Equation (23) are required to estimate the appropriate decomposition for localized uncertainty propagation. To mitigate the computational cost involved with identifying the underlying localized region of interest, a Gaussian Process (GP) surrogate model is utilized as explained in the next section.

#### *2.3. Surrogate Modeling*

The Gaussian Process (GP) surrogate model is widely used in engineering problems as a cost-effective alternative to a costly computer simulator [20,21]. In GP modeling for dynamical systems, we consider $\mathcal{D} = \{(\mathbf{x}_i, \mathbf{y}_i) \mid i = 1, 2, \cdots, N\}$ to be a set of training data consisting of *N* samples, where $\mathbf{x}_i \in \mathbb{R}^d$ represents input sample *i*, and $\mathbf{y}_i$ is the corresponding output vector of size $n_T$. For time-series data, the output is observed at a sequence of time steps $t_j \in [t_1, t_2, \cdots, t_{n_T}]$. We concatenate all the inputs and outputs into the design matrix $\mathcal{X}$ and the corresponding observation matrix $\mathcal{Y}$, respectively, as:

$$\mathcal{X} = \begin{bmatrix} t\_1 & \mathbf{x}\_1 \\ \vdots & \vdots \\ t\_{n\_T} & \mathbf{x}\_1 \\ \vdots & \vdots \\ t\_1 & \mathbf{x}\_N \\ \vdots & \vdots \\ t\_{n\_T} & \mathbf{x}\_N \end{bmatrix}, \qquad \mathcal{Y} = \begin{bmatrix} y\_1^1 \\ \vdots \\ y\_{n\_T}^1 \\ \vdots \\ y\_1^N \\ \vdots \\ y\_{n\_T}^N \end{bmatrix}, \tag{24}$$

where $y_j^i$ is the response at time $t_j$ for the input parameters $\mathbf{x}_i$. The sizes of the design matrix $\mathcal{X}$ and the observation matrix $\mathcal{Y}$ are $(N \cdot n_T) \times (d+1)$ and $(N \cdot n_T) \times 1$, respectively. In compact form, the training data set $(\mathcal{X}, \mathcal{Y})$ can be rewritten as:

$$\mathcal{X} = \begin{bmatrix} \mathbf{1}\_N \otimes \mathbf{T} & \mathbf{X} \otimes \mathbf{1}\_{n\_T} \end{bmatrix}, \qquad \mathcal{Y} = \text{vec}(\mathbf{Y}), \tag{25}$$

where $\mathbf{1}_N$ is a vector of ones of size *N*, $\mathbf{X} = [\mathbf{x}_1, \cdots, \mathbf{x}_N]^T$, $\mathbf{T} = [t_1, \cdots, t_{n_T}]^T$, $\mathbf{1}_{n_T}$ is a vector of ones of size $n_T$, $\mathbf{Y} = [\mathbf{y}^1 \cdots \mathbf{y}^N]$ and $\mathbf{y}^i = [y_1^i, \cdots, y_{n_T}^i]^T$. Here the symbols $\otimes$ and $\text{vec}(\cdot)$ represent the Kronecker product and vectorization operators, respectively. Consequently, a general regression model for time-dependent data can be expressed as a function $f(\mathcal{X})$ that maps the input $\mathcal{X}$ to the time-series observation $\mathcal{Y}$. In GP regression, the goal is to infer the function $f(\mathcal{X})$ from noisy observations of the output $\mathcal{Y}$. To this end, the function $f(\mathcal{X})$ is viewed as a random realization of a GP, $f(\mathcal{X}) \sim \mathcal{GP}(\mu(\mathcal{X}), \mathbf{K}(\mathcal{X}, \mathcal{X}'))$, where $\mu(\mathcal{X})$ and $\mathbf{K}(\mathcal{X}, \mathcal{X}')$ are the mean vector and covariance matrix of the process, respectively. Training the GP model can be performed by finding the optimal values of the covariance parameters. Systematically, this is done by maximizing the evidence, or marginal likelihood, with respect to the hyperparameters of the kernel. For a zero mean and covariance matrix $\mathbf{K}(\mathcal{X}, \mathcal{X}')$, the prediction of the GP based on noisy observations for a new input $\mathbf{x}_*$ is a Gaussian process with the following posterior mean and covariance [12]

$$
\mu(\mathbf{x}\_\*) = \mathbf{k}(\mathbf{x}\_\*, \mathcal{X}) \left[\mathbf{K}(\mathcal{X}, \mathcal{X}) + \sigma\_n^2 \mathbf{I}\right]^{-1} \mathcal{Y}, \tag{26}
$$

$$
\sigma^2(\mathbf{x}\_\*) = \mathbf{k}(\mathbf{x}\_\*, \mathbf{x}\_\*) - \mathbf{k}(\mathbf{x}\_\*, \mathcal{X}) \left[\mathbf{K}(\mathcal{X}, \mathcal{X}) + \sigma\_n^2 \mathbf{I}\right]^{-1} \mathbf{k}(\mathcal{X}, \mathbf{x}\_\*).\tag{27}
$$
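The posterior formulas in Equations (26) and (27) translate into a few lines of linear algebra; a minimal sketch for scalar inputs with a squared-exponential kernel, where the lengthscale, variance and noise values are illustrative (in practice they would be found by maximizing the marginal likelihood):

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (A[:, None] - B[None, :])**2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_star, X, y, sigma_n=0.1):
    """Posterior mean and variance, Eqs. (26)-(27), at test points x_star."""
    K = sq_exp_kernel(X, X) + sigma_n**2 * np.eye(len(X))
    k_star = sq_exp_kernel(x_star, X)
    mean = k_star @ np.linalg.solve(K, y)
    var = sq_exp_kernel(x_star, x_star).diagonal() - \
          np.einsum('ij,ji->i', k_star, np.linalg.solve(K, k_star.T))
    return mean, var

# Noisy observations of a smooth function standing in for the simulator
rng = np.random.default_rng(2)
X_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2.0 * np.pi * X_train) + 0.1 * rng.normal(size=10)
mu, var = gp_posterior(np.array([0.25]), X_train, y_train)
```

Near the training data the posterior mean tracks the underlying function and the posterior variance shrinks well below the prior variance.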

The covariance function in the GP framework encodes the smoothness, and it measures the similarity of the process between two points. The covariance function also encodes the prior belief over the regression function used to model the measurements. The prior belief can be on the level of the function's smoothness, or on its behavior and trend, such as periodicity. Selecting the right covariance kernel can be challenging for time-dependent data and may require composing several covariance functions together to model the right behavior of the data. On the other hand, for problems where the training data are given in the form of Equation (24), the size of the data may grow rapidly, demanding a large computational budget. In such a scenario, a scalable framework for GP regression of large data sets can be exploited to efficiently address the computational cost [22].

In this work, the ultimate goal of the GP model is to serve as a surrogate for the costly simulation code. Thus, we follow a simplified approach to reduce the computational cost of building the surrogate [23]. For the case when the time index of the measurement is set *a priori* and prediction at intermediate time instants is not required, the inter-correlation between the time steps can be relaxed. Specifically, the prediction of the model in this case is always set at the location of the measured data, and the model only considers the correlation between the input variables $\mathbf{x}_i$. Thus, the GP can be constructed on the subset of the data $(\mathbf{X}, \mathbf{Y})$ instead of $(\mathcal{X}, \mathcal{Y})$ as $\mathcal{GP}(\mu(\mathbf{X}), \mathbf{K}(\mathbf{X}, \mathbf{X}'))$, where

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}\_1 \\ \vdots \\ \mathbf{x}\_N \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} y\_{1}^1 & \dots & y\_{n\_T}^1 \\ & \vdots & \\ y\_{1}^N & \dots & y\_{n\_T}^N \end{bmatrix}. \tag{28}$$

## *2.4. Polynomial Chaos*

The Polynomial Chaos (PC) expansion is based on the decomposition of a stochastic process into deterministic coefficients scaling random functions. In particular, the PC approximates a stochastic process as a linear combination of stochastic orthogonal basis functions as

$$\mathbf{u}(t,\xi) = \sum\_{j=0}^{N} \Psi\_j(\xi)\mathbf{u}\_j(t),\tag{29}$$

where **Ψ***j*p*ξ*q are a set of multivariate orthogonal random polynomials and **u***j*p*t*q are the deterministic projection coefficients. The PC coefficients can be estimated non-intrusively as

$$\mathbf{u}\_{j}(t) = \frac{\int\_{\Xi} \mathbf{u}(t, \xi)\, \Psi\_{j}(\xi)\, d\Xi}{\int\_{\Xi} \Psi\_{j}^2(\xi)\, d\Xi},\tag{30}$$

where $\int_{\Xi}(\cdot)\, d\Xi$ denotes the expectation operator with respect to the probability density function of the underlying random variables. The expectation integral can be estimated using random sampling or a deterministic quadrature rule [24].
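For a single standard normal variable ξ and a probabilists' Hermite basis, the projection in Equation (30) can be evaluated with a Gauss-Hermite quadrature rule; a sketch for a scalar response, where the model u(ξ) = exp(ξ) is illustrative and has the known coefficients u_j = √e / j!:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def pce_coefficients(model, order, n_quad=20):
    """Non-intrusive PC projection, Eq. (30), for a standard normal xi
    using probabilists' Hermite polynomials He_j."""
    nodes, weights = hermegauss(n_quad)        # weight exp(-x^2/2), unnormalized
    weights = weights / np.sqrt(2.0 * np.pi)   # normalize to the N(0,1) density
    coeffs = []
    for j in range(order + 1):
        cj = np.zeros(j + 1)
        cj[j] = 1.0                            # select the basis polynomial He_j
        psi = hermeval(nodes, cj)
        num = np.sum(weights * model(nodes) * psi)   # numerator of Eq. (30)
        den = np.sum(weights * psi**2)               # denominator, equal to j!
        coeffs.append(num / den)
    return np.array(coeffs)

# Illustrative model u(xi) = exp(xi); its exact PC coefficients are sqrt(e)/j!
u_j = pce_coefficients(np.exp, order=4)
```

The same quadrature loop carries over to vector-valued responses by replacing the scalar model evaluations with solver outputs at the quadrature nodes.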
