An Adaptive Multi-Fidelity Surrogate Model for Uncertainty Propagation Analysis

Xiao, Wei; Shen, Yingying; Zhao, Jiao; Lv, Luogeng; Chen, Jiangtao; Zhao, Wei

doi:10.3390/app15063359

Open AccessArticle

An Adaptive Multi-Fidelity Surrogate Model for Uncertainty Propagation Analysis

by

Wei Xiao

,

Yingying Shen

,

Jiao Zhao

,

Luogeng Lv

,

Jiangtao Chen

^* and

Wei Zhao

^*

China Aerodynamics Research and Development Center, Mianyang 621000, China

^*

Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(6), 3359; https://doi.org/10.3390/app15063359

Submission received: 23 January 2025 / Revised: 1 March 2025 / Accepted: 11 March 2025 / Published: 19 March 2025

(This article belongs to the Section Aerospace Science and Engineering)

Download

Browse Figures

Versions Notes

Abstract

:

To quantify the uncertainties in multi-dimensional flow field correlated responses caused by uncertain model parameters, this paper presents an adaptive multi-fidelity model based on gappy proper orthogonal decomposition (Gappy-POD), which integrates the two conventional approaches for enhancing the efficiency of surrogate modeling, namely, multi-fidelity modeling and adaptive sampling algorithms. The challenges surrounding the selection of initial high-fidelity samples and the subsequent incremental augmentation of these samples are addressed. The k-means clustering algorithm is employed to identify locations within the parameter space for conducting high-fidelity simulations, leveraging insights gained from low-fidelity responses. An adaptive sampling criterion, leveraging the low-fidelity projection error derived from the Gappy-POD method, is implemented to progressively augment high-fidelity samples. The results demonstrate that the adaptive model consistently outperforms random sampling methods, highlighting its superiority in terms of accuracy and reliability, providing an efficient and reliable prediction model for uncertainty quantification.

Keywords:

adaptive sampling; multi-fidelity model; multi-dimensional correlated responses; machine learning; flow field reduction

1. Introduction

The stochastic nature of fluid systems, coupled with the limitations in human knowledge representation, inevitably introduces uncertainties in computational fluid dynamics (CFDs). Establishing rigorous quantification frameworks for these uncertainties has become paramount for assessing predictive reliability in CFD applications, particularly when considering physical model parameterization as a primary uncertainty source. Contemporary methodologies for parameter-induced uncertainty quantification span from conventional Monte Carlo sampling techniques to advanced spectral approaches like polynomial chaos expansions [1]. Notably, the past decade has witnessed paradigm shifts through machine learning integration, where surrogate modeling techniques synergized with strategic experimental design dramatically curtail computational overhead while maintaining accuracy [2,3,4,5,6,7]. Wang et al. [8] provided a comprehensive review of surrogate-assisted uncertainty propagation methods.

The precision of surrogate model-based uncertainty quantification inherently relies on the model’s predictive fidelity, whereas computational efficiency is determined by the cumulative duration of data generation, model training, and inference processes. Current methodologies for enhancing surrogate model performance primarily focus on two strategic directions:

Multi-Fidelity Integration Framework: This strategy synergizes heterogeneous simulation models by combining abundant low-fidelity data (computationally economical) with selective high-fidelity samples (accuracy-assured). Established implementations include co-Kriging architectures [9,10,11] and hierarchical neural networks [12,13], which have demonstrated enhanced cost-accuracy equilibria in complex engineering optimizations.

Sequential Sampling Methodology: Diverging from conventional static designs that predefine sample sets, this approach dynamically allocates computational resources through iterative sample selection. Guided by information entropy criteria or error minimization principles (e.g., MSE, IMSE [14,15,16]), it strategically identifies critical regions in parameter spaces for supplemental sampling.

In CFD, quantities of interest extend beyond scalar outputs to include multi-dimensional, correlated flow field responses with temporal/spatial variations, such as spatially distributed wall pressure coefficients or time-dependent aerodynamic loads. These field variables exhibit inherent cross-correlations across discrete spatial nodes or time steps, with output dimensionality frequently reaching O(10²)–O(10⁴) magnitudes in practical applications. When modeling multi-dimensional correlated responses, although surrogate models such as multi-output Kriging models [17] and multi-output neural networks exist, these models require the estimation of a large number of parameters, and the training time for the models themselves is relatively long. Consequently, scholars have developed surrogate model methods that combine with flow field reduction techniques. Guo and Hesthaven [18] developed a non-intrusive reduced basis framework synergizing proper orthogonal decomposition (POD) with Gaussian process regression, specifically targeting non-linear structural response prediction. Demo et al. [19] proposed the construction of active subspaces for POD modal coefficients and built a surrogate model with active variables. Zhan et al. [20,21] proposed the non-intrusive POD-based approach enhanced by multivariate interpolation for the parametric analyses of aero-icing problems. However, within the framework of surrogate modeling combined with the reduced order method, enhancing modeling efficiency through the utilization of the aforementioned multi-fidelity modeling and adaptive sampling remains a significant challenge. Existing multi-fidelity models and adaptive sampling algorithms, while effective in single-output scenarios, often fall short when dealing with the multi-dimensional correlated flow field responses encountered in CFD simulations.

In the context of multi-fidelity modeling for multi-dimensional correlated responses, the Gappy-POD approach [22,23] has been recognized as a promising strategy. This approach combines high- and low-fidelity responses from training samples to generate multi-fidelity responses. It identifies a set of key orthogonal basis functions and then predicts the high-fidelity responses of the new sample from its corresponding low-fidelity responses, thereby eliminating the need for extensive high-fidelity simulations. However, a common challenge encountered in the Gappy-POD approach is the selection of high-fidelity training samples, namely, where to conduct high-fidelity simulations within the parameter space. Existing literature on the Gappy-POD model has not specifically addressed this issue. Tariq Benamara et al. [23] employed Latinized centroidal Voronoi tessellation to acquire multi-fidelity samples for constructing a Gappy-POD surrogate model predicting RAE2822 airfoil flow fields. This sampling method is fundamentally a random one. Poethke et al. [24] implemented Latin hypercube sampling to obtain high-/low-fidelity samples to construct the Gappy-POD model during gas turbine second-stage vane optimization. Toal [25] proposed boundary-intermediate sampling for NACA0012 airfoil optimization. However, this strategy only applies to low-dimensional problems, as the required number of samples increases exponentially for high-dimensional problems. Basically, these methods are all a priori sampling methods, meaning they do not utilize feedback information from prediction results. From the experience of many single-output modeling efforts, designing sampling strategies based on model prediction results can significantly enhance modeling efficiency [14,15,16]. This is exactly the motivation for this paper.

To address the need to quantify uncertainty in multi-dimensional correlated flow field responses, we propose an adaptive multi-fidelity modeling approach that uses the Gappy-POD algorithm. The primary focus of the paper is on the design of experiments, specifically exploring how to select samples for initial high-fidelity CFD simulations from numerous low-fidelity samples, as well as how to incrementally incorporate high-fidelity sample data based on model prediction feedback. The second section of this paper introduces the Gappy-POD method, which is the framework of the multi-fidelity model. The third section delves deeper into how to select high-fidelity samples, including the method for selecting the initial samples and the adaptive sampling criterion. The fourth section presents the implementation process of the adaptive multi-fidelity model and uncertainty propagation model. The fifth and sixth sections present the two test cases and the analysis results, highlighting the prediction error reduction achieved through the application of the experimental design methods compared to the traditional random sampling algorithm. Finally, conclusions are drawn, and a discussion of future research directions concludes the paper.

2. The Multi-Fidelity Modeling Framework

In this paper, we develop an adaptive multi-fidelity modeling approach based on the Gappy-POD algorithm for predicting multi-dimensional corrected flow field responses. The Gappy-POD method facilitates high-fidelity responses from low-fidelity ones, thereby circumventing computationally expensive high-fidelity CFD simulations.

Assuming that all training samples are divided into two parts, one part constitutes the complete sample set, where both high- and low-fidelity responses are obtained through CFD simulations. The remaining samples are referred to as incomplete samples, with only low-fidelity responses obtained via CFD simulations, and their high-fidelity responses are unknown and predicted by the surrogate model.

For the i-th sample in the complete sample set, we define its input vector as θ_i, its low-fidelity response vector as

s_{i}^{L} = {(s_{1, i}^{L}, s_{2, i}^{L}, \dots, s_{n_{L}, i}^{L})}^{T}

with a dimension of n_L, and its high-fidelity response vector as

s_{i}^{H} = {(s_{1, i}^{H}, s_{2, i}^{H}, \dots, s_{n_{H}, i}^{H})}^{T}

with a dimension of n_H. The high- and low-fidelity vectors are combined into a multi-fidelity snapshot denoted as

s_{i} = (\frac{s_{i}^{L}}{s_{i}^{H}})

with a dimension of n_L + n_H.

The conventional POD methodology is applied to multi-fidelity snapshots

{\{s_{i}\}}_{i = 1}^{m}

. Based on the generalized energy criterion, truncation is performed to obtain an orthogonal space

Φ = [φ_{1}, φ_{2}, \dots, φ_{n}]

, which is composed of n orthogonal basis functions. Therefore, any multi-fidelity sample snapshot can be represented as follows:

Employing a generalized energy threshold criterion, we perform modal truncation to construct a reduced-order orthogonal subspace

Φ = [φ_{1}, φ_{2}, \dots, φ_{n}]

. Consequently, any multi-fidelity snapshot can be expressed as follows:

s \approx \bar{s} + \sum_{j = 1}^{n} α_{j} φ_{j}

(1)

where

\bar{s} = \frac{1}{m} \sum_{i = 1}^{m} s_{i}

. The basis function coefficients vector

\vec{α} = {(α_{1}, α_{2}, \dots, α_{n})}^{T}

is obtained by the projection method. Each orthogonal basis function

φ_{i} (1 \leq i \leq n)

is a (n_L + n_H)-dimensional vector. The first n_L elements of each orthogonal basis function, denoted as

φ_{i}^{L}

, represent features of low-fidelity responses, while the last n_H elements, denoted as

φ_{i}^{H}

, represent features of high-fidelity responses. Therefore, the orthogonal space can be expressed as follows:

Φ = [\frac{φ_{1}^{L}}{φ_{1}^{H}}, \frac{φ_{2}^{L}}{φ_{2}^{H}}, \dots \frac{φ_{n}^{L}}{φ_{n}^{H}}] = [\frac{Φ^{L}}{Φ^{H}}]

(2)

For the j-th sample in the incomplete sample set, we denote its input parameter vector as θ_j and its multi-fidelity response vector as

s_{j} = (\frac{s_{j}^{L}}{s_{j}^{H}})

. The low-fidelity part

s_{j}^{L} = {(s_{1, j}^{L}, s_{2, j}^{L}, \dots, s_{n_{L}, j}^{L})}^{T}

is obtained with CFD simulation, while its high-fidelity part

s_{j}^{H}

is unknown. Projecting the low-fidelity part

s_{j}^{L}

onto the low-fidelity part of orthogonal space

Φ^{L}

, we obtain the basis function coefficient vector

\vec{α}

with the least-squares method:

\vec{α} = {({(Φ^{L})}^{T} Φ^{L})}^{- 1} {(Φ^{L})}^{T} s_{j}^{L}

(3)

Applying the basis function coefficient vector to the high-fidelity part of the orthogonal basis function space

Φ^{H}

, we obtain the prediction of high-fidelity responses:

s_{j, p r e d i c t i o n}^{H} = Φ^{H} \vec{α}

(4)

In this way, for an incomplete sample, only the low-fidelity response

s_{j}^{L}

needs to be obtained through CFD simulations; then, its corresponding high-fidelity response

s_{j}^{H}

can be predicted. Considering that the computational cost of low-fidelity simulations is significantly lower than that of high-fidelity simulations, this approach will result in substantial savings in simulation costs.

3. The Strategies for Selecting High-Fidelity Samples

In the Gappy-POD method, it is assumed that there are a relatively large number of low-fidelity samples. A portion of these samples needs to be selected for high-fidelity CFD simulations to obtain their high-fidelity responses, while the high-fidelity responses of the remaining samples will be predicted using Equation (4). The key challenge lies in determining which samples to select for high-fidelity CFD simulations. From the perspective of dynamic sampling, this can be divided into two problems: how to select initial high-fidelity samples and how to incrementally add high-fidelity samples based on model feedback. This paper addresses these two issues separately.

(1): Selection of the initial high-fidelity samples

To construct the multi-fidelity surrogate model, it is necessary to determine, through design-of-experiment, where to conduct high- and low-fidelity CFD simulations.

During the initial modeling process, Latin hypercube sampling is used to obtain training samples for low-fidelity CFD simulations. Latin hypercube sampling (LHS) synthesizes the space-filling advantages of Monte Carlo methods with the stratification principles of experimental design, establishing itself as an optimal sparse sampling technique for high-dimensional parameter spaces.

After obtaining the low-fidelity responses of all training samples, a subset of samples needs to be selected from all training samples for high-fidelity CFD simulations. The objective is to ensure that the selected samples comprehensively represent the distinct characteristics of the output responses while exhibiting a certain discrepancy between each other. Previously, Cook and Nachtsheim [26] developed an exchange algorithm employing the Morris-Mitchell criterion [27] for nested sample selection. However, this algorithm relies on the distance measured within the input parameter space, which may not adequately capture the similarity in output responses for non-linear CFD problems. Therefore, we shifted the focus to an approach based on the output responses. To achieve this, we utilize the k-means clustering algorithm [28] to partition the n_train training samples into n_complete clusters based on their low-fidelity responses. Subsequently, we select the sample closest to the centroid of each cluster as the representative of that cluster and incorporate it into the complete sample set. The k-means clustering algorithm assesses the similarity between different sample points by quantifying their Euclidean distance, and samples that are grouped into the same cluster exhibit higher similarity. The clustering algorithm is implemented using the scikit-learn library.

(2): Adaptive refinement

From the perspective of dynamic sampling, if the prediction capability of the model constructed with the initial complete sample set is inadequate, more training samples need to be incrementally added. This process involves three key steps: (a) selecting samples from the incomplete set based on specific criteria, (b) obtaining their high-fidelity responses through CFD simulations, and (c) incorporating them into the complete sample set.

We define the projection error of low-fidelity response as the deviation between the original low-fidelity response

s_{j}^{L}

and the projected response,

Φ^{L} \vec{α}

, which is as follows:

{\vec{ε}}_{j}^{p r o j e c t i o n} = Φ^{L} \vec{α} - s_{j}^{L}

(5)

The dimensionless two-norm projection error is defined as follows:

δ_{j}^{p r o j e c t i o n} = \frac{{‖{\vec{ε}}_{j}^{p r o j e c t i o n}‖}_{2}}{{‖s_{j}^{L}‖}_{2}}

(6)

The prediction error is defined as

{\vec{ε}}_{j}^{p r e d i c t i o n} = Φ^{H} \vec{α} - s_{j}^{H}

, and the dimensionless two-norm prediction error is as follows:

δ_{j}^{p r e d i c t i o n} = \frac{{‖{\vec{ε}}_{j}^{p r e d i c t i o n}‖}_{2}}{{‖s_{j}^{H}‖}_{2}}

(7)

For an incomplete sample, the high-fidelity response

s_{j}^{H}

is unknown; therefore, it is impossible to evaluate the prediction error of the Gappy-POD method. However, considering that in a CFD field, for the same set of model parameters, there is usually a certain correlation between high- and low-fidelity responses. Tariq Benamara et al. [23] posited a strong correlation between prediction errors on high-fidelity responses and projection errors on low-fidelity responses. That is, when

δ_{j}^{p r o j e c t i o n}

is large,

δ_{j}^{p r e d i c t i o n}

is also likely to be large. Although this cannot be strictly proven mathematically, it was confirmed in the subsequent test cases. Therefore, in this paper, we utilize projection error

δ_{j}^{p r o j e c t i o n}

as an estimation of prediction error.

In the adaptive refinement process, we traverse incomplete samples and compare the maximum projection error with a pre-set value to determine whether the existing complete sample set is sufficient; if the maximum projection error is higher than the pre-set value, we add the sample with the largest projection error to the complete sample set.

4. Implementation Process

(1): Implementation process of the Gappy-POD model

With the Gappy-POD multi-fidelity model framework and high-fidelity samples selection methods, the implementation process of the adaptive multi-fidelity model is introduced in the following.

The modeling process, as shown in Figure 1, begins with the space of the uncertain model parameters and is carried out in six steps. The first four steps involve generating initial samples and constructing the initial model, while the last two steps focus on incrementally adding samples to improve the model.

(a): The inputs for n_train training samples are generated using Latin hypercube sampling in the input parameter space.
(b): The low-fidelity responses for all training samples are obtained through low-fidelity CFD calculations.
(c): The k-means clustering algorithm is used to select n_complete samples from all training samples for subsequent high-fidelity CFD simulations. This divides the training samples into two sets, one consisting of n_complete samples, denoted as the complete sample set, and the other consisting of the remaining (n_train ₋ n_complete) samples, denoted as the incomplete sample set.
(d): High-fidelity responses are acquired via CFD simulations across the complete sample set. At this point, for the complete samples, there are both high- and low-fidelity responses. For the incomplete samples, only low-fidelity responses are available.
(e): The Gappy-POD method is employed, based on the high- and low-fidelity responses of the complete samples and the low-fidelity responses of the incomplete samples, to predict the high-fidelity responses and the projection error of low-fidelity responses for incomplete samples. Traversing through incomplete samples, if the maximum projection error is less than the preset value, it can be considered that the Gappy-POD method based on the existing complete sample set can well predict the high-fidelity responses of incomplete samples; therefore, the multi-fidelity modeling process is finished. Otherwise, it is considered that the prediction ability is insufficient, and additional complete sample data are selected through the sixth step.
(f): Traversing through the incomplete samples, the one with the maximum projection error is selected as the next additional sample. Its high-fidelity responses are obtained through high-fidelity CFD calculations, and added to the complete sample set together with the previously obtained low-fidelity responses.

The fifth and sixth steps are repeated until the maximum projection error is less than the present value.

At this point, high-fidelity responses for all training samples are available, obtained by either the prediction model or CFD simulations.

(2): Uncertainty propagation model

After obtaining the high-fidelity responses of all training samples, we can establish a predictive model that maps the uncertain model parameters to multi-dimensional flow field responses. Through Monte Carlo sampling on this surrogate, we quantify input uncertainty propagation.

The procedure is consistent with the method of combining POD and surrogate modeling proposed in the literature [17,18,19,20,21]. The dominant orthogonal basis functions are obtained by performing POD [29,30] on the high-fidelity responses of all training samples. The reduced-dimensional response for each training sample is obtained by the projection method. The surrogate model (the Kriging model [31,32] used in this paper) between uncertain model parameters and the reduced-dimensional response is then constructed. When given new model parameters, the corresponding reduced-dimensional response can be predicted, and the complete response is recovered via POD.

5. Test Cases

To evaluate the proposed method’s effectiveness, we analyze the impact of SA turbulence model coefficient uncertainties on flow field predictions. The SA model (widely used in aerospace [33]) assumes fully turbulent flow with the transition term excluded, yielding nine uncertain coefficients:

c_{b 1}, σ, c_{b 2}, κ, c_{w 2}, c_{w 3}, c_{v 1}, c_{t 3}, c_{t 4}

. These coefficients exhibit epistemic uncertainty, modeled probabilistically via uniform distributions (Table 1) with ranges from the literature [34]. Note that uncertainty parameter characterization (distributions/types) requires consultation with domain experts. The present study primarily focuses on the propagation of uncertainty given a mathematical description of the uncertain parameters, rather than on the precise quantification of uncertainty for the model parameters.

The analysis examines two numerical cases: wall friction coefficient distribution in low-speed NACA0012 airfoil flow and wall pressure coefficient distribution in transonic M6 wing flow. Detailed presentations follow for each configuration.

(1): Low-speed flow around the NACA0012 airfoil

The first case under investigation is wall friction coefficient distribution in low-speed NACA0012 airfoil flow under the following computational conditions:

M_{\infty} = 0.15, α = {5.0}^{\circ}, T_{\infty} = 288.15 K, {Re}_{\infty} = 6 \times 10^{6}

Two distinct grid resolutions (Figure 2) were employed for multi-fidelity sampling: a coarse mesh comprising 3584 cells and a refined mesh of 57,344 elements. The coarse grid consists of 25% of the cell count in each direction relative to the fine grid.

Figure 3 displays the wall friction coefficient distribution predicted with standard model parameters. While high- and low-fidelity results show similar trend patterns—particularly the sharp leading-edge variations on both airfoil surfaces followed by stabilization—significant magnitude discrepancies emerge. These abrupt local variations present critical modeling challenges.

(2): Transonic flow around the M6 wing

The transonic flow over the ONERA M6 wing—a standard validation case for compressible flow solvers—was analyzed under the following conditions:

M_{\infty} = 0.8395, α = {3.06}^{\circ}, T_{\infty} = 255.56 K, {Re}_{\infty} = 1.172 \times 10^{7}

Multi-fidelity simulations employed two computational meshes (illustrated in Figure 4): a coarse grid comprising 990,360 cells (13,638 surface nodes, yielding pressure coefficient vector in ℝ^13,638) and a refined grid containing 3,594,863 cells (29,684 surface nodes, corresponding response vector in ℝ^29,684).

At transonic conditions, a characteristic λ-shaped shock structure forms on the wing’s upper surface (Figure 5), presenting significant modeling challenges due to its complex pressure gradients. While both grid resolutions produce qualitatively similar pressure distributions, the fine-grid solution demonstrates improved fidelity in capturing the suction peak and shock wave position, as validated against reference experimental measurements in Figure 6.

The sample data in this paper were generated by Flowstar, an in-house unstructured grid solver [35], which is based on a cell-centered finite volume methodology and is adept at handling diverse element types, including hexahedra, tetrahedra, prisms, pyramids, and other polyhedral generated by multi-grid geometric techniques.

The Kriging model employs a constant regression basis paired with a Gaussian covariance kernel to characterize spatial correlations. Concurrently, POD dimensionality reduction retains modes capturing 99.9% of the system’s cumulative energy content.

6. Results and Discussion

The predictive capability of the adaptive Gappy-POD method was investigated first. To achieve this, we evaluated the prediction error of high-fidelity responses on incomplete sample sets.

After obtaining high-fidelity responses for all training samples, we obtained the prediction model for uncertainty analysis. The predictive capability of the uncertainty propagation model was then investigated with additional n_test test samples.

(1): Low-speed flow around the NACA0012 airfoil

In the analysis, n_train was set to 81, the initial n_complete to 6, and n_test to 20.

The method used in this paper for selecting high-fidelity samples is divided into two components: initial sample selection and adaptive sampling. Consequently, the results comparison is also conducted separately.

Firstly, a comparison of the methods for selecting initial high-fidelity samples was performed without involving adaptive sampling. Here, we primarily compare our clustering-based method using k-means with the classical random sampling algorithm. The results are presented in section (a).

Secondly, a comparison of adaptive sampling algorithms is carried out. We evaluated the adaptive sampling algorithm based on the projection error against the random refinement method. Both methods utilize the same initial complete sample set, which was obtained through the k-means clustering-based approach. The results are presented in section (b).

Finally, a comprehensive comparison of the entire high-fidelity sample selection method was conducted, encompassing both initial sample selection and adaptive sampling. This comparison was made against an entirely random sampling method that includes both initial and adaptive steps. The results are presented in section (c).

(a): Comparison of the methods for selecting initial high-fidelity samples

After acquiring the low-fidelity responses for all training samples, we utilized the k-means clustering method to classify them into six distinct categories. Figure 7 presents the distribution of friction coefficients and the clustering results for all samples. Notably, significant variations in magnitude can be observed between clusters on the upper and lower surfaces of the airfoil, while the distributions tend to be more concentrated within individual clusters. Consequently, it is reasonable to select a single sample from each cluster to represent the distribution of that particular cluster.

Based on the clustering results, we constructed an initial complete sample set and an incomplete sample set. For the complete sample set, high- and low-fidelity responses were obtained through CFD simulations for each sample. For the incomplete sample set, only the low-fidelity responses were computed via CFD simulations (although the high-fidelity responses were also calculated, they were not used in model training but solely for assessing prediction errors). To validate the advantages of our method for selecting initial high-fidelity samples, we compared it with the commonly used random sampling method. In this comparison, the same number of high-fidelity samples was randomly selected from low-fidelity samples. Both methods predicted the high-fidelity responses of incomplete samples using the Gappy-POD method. To eliminate the randomness inherent in random sampling, this comparison was repeated 10 times. The L2 norm errors for both methods are presented in Figure 8. It is evident that the clustering-based approach outperforms random selection, with a median error reduction of over 50% (from 0.0126 to 0.0062) and fewer outliers. The representativeness of initial samples plays a crucial role in model predictive capabilities, and the k-means clustering method provides a robust foundation for subsequent model refinement by selecting representative initial samples.

(b): Comparison of high-fidelity sample refinement algorithms

Based on the high- and low-fidelity responses of the initial complete samples and the low-fidelity responses of the incomplete samples, we used the Gappy-POD method to obtain the prediction error of the high-fidelity responses on the incomplete samples and the projection error of the low-fidelity responses, as shown in Figure 9. The correlation coefficient between the two exceeds 0.8. Therefore, it is reasonable to assume that there is a strong correlation between them. The projection error of the low-fidelity responses can serve as an indicator for the prediction error of the high-fidelity responses and as an adaptive criterion.

According to the adaptive criterion, a sample was selected from the incomplete sample set at each iteration and added to the complete sample set, to gradually improve the prediction ability of the Gappy-POD method. Figure 10 presents the evolution of the maximum projection error as the number of adaptive iterations increases. After four iterations, the maximum projection error has decreased to a level below the set threshold of 0.5%, indicating rapid improvement in the accuracy of the adaptive method.

Next, we compared the prediction errors of the Gappy-POD model using the adaptive sampling algorithm based on projection errors and the random sampling method. Both methods are based on the same initial high-fidelity sample set, which was selected through the k-means clustering method. Figure 11 presents a box plot of the prediction error for two methods, showing the average error and the 99% confidence interval. With the same number of complete samples, the mean prediction error of the adaptive method is consistently smaller, and its 0.995 quantile value also significantly decreases. The advantage of the adaptive sampling algorithm based on low-fidelity projection errors lies in its ability to more selectively incorporate samples that are poorly predicted by the Gappy-POD method into the training set, thereby effectively improving the model’s overall predictive capability.

(c): Comparison of the entire experimental design algorithm

The methods for selecting initial high-fidelity samples and adaptive sampling of high-fidelity samples constitute the complete high-fidelity sample experimental design process. To demonstrate the advantages of the entire method, the proposed approach was compared with completely random sampling. The latter used a static sampling method to generate an equal number of high-fidelity samples at once, and was repeated multiple times to eliminate the influence of randomness. As the training sample size increases, the statistical results of prediction errors are shown in Figure 12, representing the average error and the 99% confidence interval. Under the same number of complete samples, the proposed high-fidelity sample experimental design method consistently achieves a smaller mean error than the random sampling method. For example, after 10 adaptive refinement iterations, the mean error of the proposed method is approximately 30% lower than that of the random method (0.00288 vs. 0.00400). Additionally, the 0.995 quantile value of the error significantly decreases. This indicates that the experimental design method not only improves overall prediction accuracy, but also reduces the occurrence of larger errors.

Figure 13 presents a comparison between the wall friction coefficient distribution predicted by the Gappy-POD method and the high-fidelity CFD simulation on a randomly selected incomplete sample. It is evident that the two predictions are in close agreement, indicating minimal differences. Given that low-fidelity CFD simulations are significantly less costly than high-fidelity simulations, we can eliminate the need for time-consuming high-fidelity CFD simulations.

(d): Prediction capability of the uncertainty propagation model and uncertainty results

After obtaining the high-fidelity responses for all training samples, we constructed a prediction model to compare model parameters and friction coefficients. For the entire model, only the model parameters of the test samples are required as input. We assessed the prediction error of the entire model on 20 test samples. A comparison of the predicted wall friction coefficient distribution with that obtained from the high-fidelity CFD simulation using a randomly selected sample from the test dataset is illustrated in Figure 14. It is evident that the two distributions align well, indicating a strong agreement.

Figure 15 presents the prediction errors on all test samples. Notably, the prediction errors are less than 1% across all test samples, indicating the accurate prediction capability of the entire model. This finding demonstrates the utility of the entire model to support uncertainty quantification for large-scale random sampling.

Finally, the Latin hypercube method was employed to sample 10⁶ inputs within the nine-dimensional space of uncertain parameters. The samples were processed through the uncertainty propagation model to generate wall friction coefficient distributions. Subsequent statistical evaluation yielded mean values with 99% confidence intervals (Figure 16), revealing substantial uncertainty propagation across the entire airfoil surface.

(2): Transonic flow around the M6 wing

In this case, the entire high-fidelity sampling method (encompassing both initial sample selection and adaptive refinement) was benchmarked against traditional random sampling.

As the training sample size increases, Figure 17 shows the statistical results of prediction errors, including the average error and the 99% confidence interval. With the same number of complete samples, the proposed high-fidelity experimental design method consistently achieves a smaller mean error than random sampling. For example, after 20 adaptive refinement iterations, the proposed method demonstrates a 27% reduction in mean error compared to the random method (0.000344 vs. 0.000471). Additionally, the 0.995 error quantile shows a significant reduction. These results indicate that our experimental design not only enhances overall accuracy, but also mitigates large-error occurrences, aligning with findings from the NACA0012 benchmark case.

Based on the input of all training samples and their high-fidelity responses, we employed the POD method and Kriging surrogate model to establish a predictive model for flow fields under different input parameters. For a randomly selected test sample, pressure coefficients predicted by the model and CFD simulation were compared along the six cross-sections spanning from the root to the tip of the wing (cross-section locations illustrated in Figure 5), as shown in Figure 18. The displayed results indicate a high degree of concordance between the model prediction and CFD simulation, even in the complicated shock wave region. Consequently, this prediction model is suitable for large-scale MC sampling to investigate the uncertainty of wall pressures.

By performing extensive random sampling on the prediction model, we obtained the uncertainty distribution of the flow field. Figure 19 illustrates the spatial distribution of pressure coefficient standard deviation across the wing surface. Significant uncertainty concentrations are localized within the λ-shaped shock structure on the upper surface, contrasting with negligible variability in attached flow regions. This behavior aligns with well-documented challenges in Reynolds-Averaged Navier-Stokes (RANS) simulations, where shock position predictability exhibits strong parametric sensitivity—particularly to turbulence model closure coefficients—due to their direct influence on eddy viscosity modulation near strong pressure gradients.

7. Conclusions

This paper addresses the need for uncertainty quantification of multi-dimensional correlated flow field responses in CFD. Building upon the Gappy-POD framework, we develop a systematic experimental design methodology for constructing adaptive multi-fidelity surrogate models. This approach comprises two strategically integrated phases:

(a): Initial sampling through k-means clustering of low-fidelity solution snapshots.
(b): Adaptive refinement driven by projection error quantification

Based on the adaptive multi-fidelity method, the impact of uncertainty in the coefficients of the SA turbulence model on the distribution of wall friction coefficients for the NACA0012 airfoil and on the distribution of wall pressure coefficients for the M6 wing was analyzed. The results show that compared with the commonly used random sampling method, the high-fidelity sample experimental design method consistently achieves a smaller mean error. Additionally, the 0.995 quantile value of the error significantly decreases. This indicates that the experimental design method not only improves overall prediction accuracy, but also reduces the occurrence of larger errors. The entire model can accurately predict the distribution of wall friction or pressure, and is applicable to support uncertainty quantification in large-scale random sampling processes.

However, it should be noted that the multi-fidelity model used in this study is based on the Gappy-POD framework. The fundamental principle of this method involves a linear POD on multi-dimensional responses. As such, it inherits the limitations associated with linear decomposition. As pointed out by many scholars, in processing highly non-linear datasets, the linear nature of POD decomposition often necessitates numerous POD modes for flow field reconstruction, and numerical stability cannot be guaranteed. To address the limitations of POD in handling non-linear data, kernel methods have been introduced, forming Kernel Proper Orthogonal Decomposition (Kernel POD or KPOD). In the context of a non-linear decomposition framework, how to construct a multi-fidelity model will be the focus of our future research.

Author Contributions

Conceptualization, W.X. and J.C.; methodology, W.X.; validation, Y.S. and J.Z.; investigation, W.X.; data curation, L.L.; writing—original draft preparation, W.X.; writing—review and editing, W.Z. and J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSAF (Grant No. U2230208), National Natural Science Foundation of China (Grant No. 52175214), and the National Numerical Wind Tunnel Project. The APC was funded by the three aforementioned foundations.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Xiu, D.; Karniadakis, G. The Wiener-Askey Polynomial Chaos for Stochastic Differential Equations. SIAM J. Sci. Comput. 2002, 24, 619–644. [Google Scholar] [CrossRef]
Li, M.; Wang, Z. Surrogate Model Uncertainty Quantification for Reliability-based Design Optimization. Reliab. Eng. Syst. Saf. 2019, 189, 106571. [Google Scholar] [CrossRef]
Bhattacharyya, B. Uncertainty quantification of dynamical systems by a POD–Kriging surrogate model. J. Comput. Sci. 2022, 60, 101602. [Google Scholar] [CrossRef]
Xia, L.; Zou, Z.-J.; Wang, Z.-H.; Zou, L.; Gao, H. Surrogate model based uncertainty quantification of CFD simulations of the viscous flow around a ship advancing in shallow water. Ocean Eng. 2021, 234, 109206. [Google Scholar] [CrossRef]
Tripathy, R.K.; Bilionis, I. Deep UQ: Learning deep neural network surrogate models for high dimensional uncertainty quantification. J. Comput. Phys. 2018, 375, 565–588. [Google Scholar] [CrossRef]
Asouti, V.; Kontou, M.; Giannakoglou, K. Radial basis function surrogates for uncertainty quantification and aerodynamic shape optimization under uncertainties. Fluids 2023, 8, 292. [Google Scholar] [CrossRef]
Garbo, A.; Parekh, J.; Rischmann, T.; Bekemeyer, P. Multi-fidelity adaptive sampling for surrogate-based optimization and uncertainty quantification. Aerospace 2024, 11, 448. [Google Scholar] [CrossRef]
Wang, C.; Qiang, X.; Xu, M.; Wu, T. Recent advances in surrogate modeling methods for uncertainty quantification and propagation. Symmetry 2022, 14, 1219. [Google Scholar] [CrossRef]
Kennedy, M.C.; O’Hagan, A. Predicting the Output from a Complex Computer Code When Fast Approximations Are Available. Biometrika 2000, 87, 1–13. [Google Scholar] [CrossRef]
Forrester, A.I.J.; Sóbester, A.; Keane, A.J. Multi-fidelity optimization via surrogate modelling. Proc. R. Soc. A 2007, 463, 3251–3269. [Google Scholar] [CrossRef]
Han, Z.H.; Zimmermann, R.; Goretz, S. A new cokriging method for variable-fidelity surrogate modeling of aerodynamic data. In Proceedings of the 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, Orlando, FL, USA, 4–7 January 2010. AIAA Paper 2010-1225. [Google Scholar]
Meng, X.; Karniadakis, G.E. A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. J. Comput. Phys. 2019, 401, 109020. [Google Scholar] [CrossRef]
Motamed, M. A multi-fidelity neural network surrogate sampling method for uncertainty quantification. J. Comput. Phys. 2021, 426, 109923. [Google Scholar] [CrossRef]
Garbo, A.; German, B.J. Comparison of adaptive design space exploration methods applied to S-duct CFD simulation. In Proceedings of the 57th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, San Diego, CA, USA, 4–8 January 2016. AIAA Paper 2016-0416. [Google Scholar]
Garbo, A.; German, B.J. Adaptive sampling with adaptive surrogate model selection for computer experiment applications. In Proceedings of the 18th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Denver, CO, USA, 5–9 June 2017. AIAA Paper 2017-4430. [Google Scholar]
Jin, R.; Chen, W.; Sudjianto, A. On sequential sampling for global metamodeling in engineering design. In Proceedings of the ASME Design Engineering Technical Conference, Montreal, Quebec, Canada, 29 September–2 October 2002. ASME paper DETC2002/DAC-34092. [Google Scholar]
Conti, S.; O’Hagan, A. Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Infer. 2010, 140, 640–651. [Google Scholar] [CrossRef]
Guo, M.; Hesthaven, J.S. Reduced order modeling for nonlinear structural analysis using Gaussian process regression. Comput. Methods Appl. Mech. Eng. 2018, 342, 193–215. [Google Scholar] [CrossRef]
Demo, N.; Tezzele, M.; Rozza, G. A non-intrusive approach for the reconstruction of POD modal coefficients through active subspaces. Comptes Rendus Mec. 2019, 347, 873–881. [Google Scholar] [CrossRef]
Zhan, Z.; Habashi, W.G.; Fossati, M. Local reduced-order modeling and iterative sampling for parametric analyses of aero-icing problems. AIAA J. 2015, 53, 2174–2185. [Google Scholar] [CrossRef]
Zhan, Z.; Habashi, W.G.; Fossati, M. Real-time regional jet comprehensive aeroicing analysis via reduced-order modeling. AIAA J. 2016, 54, 3787–3802. [Google Scholar] [CrossRef]
Everson, R.; Sirovich, L. Karhunen-Loeve procedure for gappy data. J. Opt. Soc. Am. A 1995, 12, 1657–1664. [Google Scholar] [CrossRef]
Benamara, T.; Breitkopf, P.; Lepot, I.; Sainvitu, C. Adaptive infill sampling criterion for multi-fidelity optimization based on Gappy-POD Application to the flight domain study of a transonic airfoil. Struct. Multidiscip. Optim. 2016, 54, 843–855. [Google Scholar] [CrossRef]
Poethke, B.; Völker, S.; Vogeler, K. Uncertainty Based Optimization Strategy for the Gappy-POD Multi-Fidelity Method. In Proceedings of the Turbo Expo: Power for Land, Sea, and Air, Virtual, 21–25 September 2020. ASME Paper GT2020-15754. [Google Scholar]
Toal, D.J. On the Potential of a Multi-Fidelity G-POD Based Approach for Optimization and Uncertainty Quantification. In Proceedings of the Turbo Expo: Power for Land, Sea, and Air, Düsseldorf, Germany, 16–20 June 2014. ASME Paper GT2014-26753. [Google Scholar]
Cook, R.D.; Nachtsheim, C.J. A comparison of algorithms for constructing exact D-optimal designs. Technometrics 1980, 22, 315–324. [Google Scholar] [CrossRef]
Morris, M.D.; Mitchell, T.J. Exploratory designs for computer experiments. J. Stat. Plan. Infer. 1995, 43, 381–402. [Google Scholar] [CrossRef]
Arthur, D.; Vassilvitskii, S. K-means++: The Advantages of Careful Seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035. [Google Scholar]
Lumley, J.L. The Structure of Inhomogeneous Turbulent Flows. Commun. Pure Appl. Math. 1967, 20, 453–488. [Google Scholar] [CrossRef]
Sirovich, L.; Kirby, M. Low-dimensional procedure for the characterization of human faces. J. Opt. Soc. Am. A 1987, 4, 519–524. [Google Scholar] [CrossRef] [PubMed]
Krige, D.G. A statistical approach to some basic mine valuation problems on the Witwatersrand. J. S. Afr. Inst. Min. Metall. 1951, 52, 119–139. [Google Scholar]
Sacks, J.; Welch, W.J.; Mitchell, T.J.; Wynn, H.P. Design and analysis of computer experiments. Stat. Sci. 1989, 4, 409–423. [Google Scholar] [CrossRef]
Spalart, P.R.; Allmaras, S.R. A One-Equation Turbulence Model for Aerodynamic Flows. AIAA J. 1992, 30, 5–12. [Google Scholar] [CrossRef]
Schaefer, J.A.; Cary, A.W.; Mani, M.; Spalart, P.R. Uncertainty Quantification and Sensitivity Analysis of SA Turbulence Model Coefficients in Two and Three Dimensions. In Proceedings of the 55th AIAA Aerospace Sciences Meeting, Grapevine, TX, USA, 9–13 January 2017. AIAA Paper 2017-1710. [Google Scholar]
Chen, J.Q.; Wu, X.J.; Zhang, J.; Li, B.; Jia, H.; Zhou, N. Flowstar: General Unstructured-grid CFD Software for National Numerical Wind Tunnel (NNW) Project. Acta Aeronaut. Astronaut. Sin. 2021, 42, 625739. (In Chinese) [Google Scholar]

Figure 1. The framework of the adaptive multi-fidelity model.

Figure 2. Computational grids for the NACA0012 airfoil: (a) coarse grid, (b) dense grid.

Figure 3. The computed wall friction coefficient distribution for the NACA0012 airfoil using the standard model parameters.

Figure 4. Computational grid for the M6 wing: (a) coarse grid, (b) dense grid.

Figure 5. Pressure contours of the M6 wing under standard model parameters using the fine grid.

Figure 6. The computed wall pressure coefficient distribution for the M6 wing using the standard model parameters at a wingspan of 65%.

Figure 7. Clustering results for all training samples.

Figure 8. Comparison of the methods for selecting initial high-fidelity samples to the prediction errors of the Gappy-POD model.

Figure 9. The correlation between the projection error and the prediction error.

Figure 10. Evolution of the maximum projection error with adaptive iterations.

Figure 11. Box plot of the prediction error with adaptive and random sampling methods.

Figure 12. Box plot of the prediction error with the entire experimental design algorithm and random sampling method (NACA0012 case).

Figure 13. Comparison between the wall friction coefficient distribution predicted by the Gappy-POD method and high-fidelity CFD simulation.

Figure 14. Comparison between the wall friction coefficient distribution predicted by the entire model and high-fidelity CFD simulation.

Figure 15. Prediction errors of the entire model on all test samples.

Figure 16. Mean values and 99% confidence intervals for the wall friction coefficient.

Figure 17. Box plot of the prediction error with the entire experimental design algorithm and random sampling method (M6 case).

Figure 18. Comparison of the wall pressure distribution between model prediction and CFD simulation.

Figure 19. The standard deviation distribution of the pressure coefficient on the wing surface.

Table 1. Intervals and standard values of SA model parameters.

	Minimum Value	Maximum Value	Standard Value
c_b₁	0.12893	0.137	0.1355
σ	0.6	1.0	2/3
c_b₂	0.60983	0.6875	0.622
κ	0.38	0.42	0.41
c_w₂	0.055	0.3525	0.3
c_w₃	1.75	2.5	2.0
c_v₁	6.9	7.3	7.1
c_t₃	1.0	2.0	1.2
c_t₄	0.3	0.7	0.5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xiao, W.; Shen, Y.; Zhao, J.; Lv, L.; Chen, J.; Zhao, W. An Adaptive Multi-Fidelity Surrogate Model for Uncertainty Propagation Analysis. Appl. Sci. 2025, 15, 3359. https://doi.org/10.3390/app15063359

AMA Style

Xiao W, Shen Y, Zhao J, Lv L, Chen J, Zhao W. An Adaptive Multi-Fidelity Surrogate Model for Uncertainty Propagation Analysis. Applied Sciences. 2025; 15(6):3359. https://doi.org/10.3390/app15063359

Chicago/Turabian Style

Xiao, Wei, Yingying Shen, Jiao Zhao, Luogeng Lv, Jiangtao Chen, and Wei Zhao. 2025. "An Adaptive Multi-Fidelity Surrogate Model for Uncertainty Propagation Analysis" Applied Sciences 15, no. 6: 3359. https://doi.org/10.3390/app15063359

APA Style

Xiao, W., Shen, Y., Zhao, J., Lv, L., Chen, J., & Zhao, W. (2025). An Adaptive Multi-Fidelity Surrogate Model for Uncertainty Propagation Analysis. Applied Sciences, 15(6), 3359. https://doi.org/10.3390/app15063359

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Adaptive Multi-Fidelity Surrogate Model for Uncertainty Propagation Analysis

Abstract

1. Introduction

2. The Multi-Fidelity Modeling Framework

3. The Strategies for Selecting High-Fidelity Samples

4. Implementation Process

5. Test Cases

6. Results and Discussion

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI