1. Introduction
The stochastic nature of fluid systems, coupled with the limitations of human knowledge representation, inevitably introduces uncertainties into computational fluid dynamics (CFD). Establishing rigorous quantification frameworks for these uncertainties has become paramount for assessing predictive reliability in CFD applications, particularly when physical model parameterization is considered a primary uncertainty source. Contemporary methodologies for parameter-induced uncertainty quantification span from conventional Monte Carlo sampling techniques to advanced spectral approaches such as polynomial chaos expansions [1]. Notably, the past decade has witnessed paradigm shifts through machine learning integration, where surrogate modeling techniques synergized with strategic experimental design dramatically curtail computational overhead while maintaining accuracy [2,3,4,5,6,7]. Wang et al. [8] provided a comprehensive review of surrogate-assisted uncertainty propagation methods.
The precision of surrogate model-based uncertainty quantification inherently relies on the model’s predictive fidelity, whereas computational efficiency is determined by the cumulative duration of data generation, model training, and inference processes. Current methodologies for enhancing surrogate model performance primarily focus on two strategic directions:
Multi-Fidelity Integration Framework: This strategy synergizes heterogeneous simulation models by combining abundant low-fidelity data (computationally economical) with selective high-fidelity samples (accuracy-assured). Established implementations include co-Kriging architectures [9,10,11] and hierarchical neural networks [12,13], which have demonstrated enhanced cost-accuracy equilibria in complex engineering optimizations.
Sequential Sampling Methodology: Diverging from conventional static designs that predefine sample sets, this approach dynamically allocates computational resources through iterative sample selection. Guided by information entropy criteria or error minimization principles (e.g., MSE, IMSE [14,15,16]), it strategically identifies critical regions in parameter spaces for supplemental sampling.
In CFD, quantities of interest extend beyond scalar outputs to include multi-dimensional, correlated flow field responses with temporal/spatial variations, such as spatially distributed wall pressure coefficients or time-dependent aerodynamic loads. These field variables exhibit inherent cross-correlations across discrete spatial nodes or time steps, with output dimensionality frequently reaching O(10^2)–O(10^4) in practical applications. Although surrogate models for multi-dimensional correlated responses exist, such as multi-output Kriging models [17] and multi-output neural networks, these models require the estimation of a large number of parameters, and their training times are relatively long. Consequently, scholars have developed surrogate modeling methods combined with flow field reduction techniques. Guo and Hesthaven [18] developed a non-intrusive reduced basis framework synergizing proper orthogonal decomposition (POD) with Gaussian process regression, specifically targeting non-linear structural response prediction. Demo et al. [19] proposed constructing active subspaces for POD modal coefficients and built a surrogate model with the active variables. Zhan et al. [20,21] proposed a non-intrusive POD-based approach enhanced by multivariate interpolation for parametric analyses of aero-icing problems. However, within the framework of surrogate modeling combined with reduced-order methods, enhancing modeling efficiency through the aforementioned multi-fidelity modeling and adaptive sampling remains a significant challenge. Existing multi-fidelity models and adaptive sampling algorithms, while effective in single-output scenarios, often fall short when dealing with the multi-dimensional correlated flow field responses encountered in CFD simulations.
In the context of multi-fidelity modeling for multi-dimensional correlated responses, the Gappy-POD approach [22,23] has been recognized as a promising strategy. This approach combines high- and low-fidelity responses from training samples to generate multi-fidelity responses. It identifies a set of key orthogonal basis functions and then predicts the high-fidelity responses of a new sample from its corresponding low-fidelity responses, thereby eliminating the need for extensive high-fidelity simulations. However, a common challenge in the Gappy-POD approach is the selection of high-fidelity training samples, namely, where to conduct high-fidelity simulations within the parameter space. The existing literature on the Gappy-POD model has not specifically addressed this issue. Benamara et al. [23] employed Latinized centroidal Voronoi tessellation to acquire multi-fidelity samples for constructing a Gappy-POD surrogate model predicting RAE2822 airfoil flow fields; this sampling method is fundamentally random. Poethke et al. [24] implemented Latin hypercube sampling to obtain high-/low-fidelity samples for constructing the Gappy-POD model during gas turbine second-stage vane optimization. Toal [25] proposed boundary-intermediate sampling for NACA0012 airfoil optimization; however, this strategy only applies to low-dimensional problems, as the required number of samples increases exponentially with dimension. All of these are a priori sampling methods, meaning they do not utilize feedback information from prediction results. Experience from many single-output modeling efforts shows that designing sampling strategies based on model prediction results can significantly enhance modeling efficiency [14,15,16]. This is exactly the motivation for this paper.
To address the need to quantify uncertainty in multi-dimensional correlated flow field responses, we propose an adaptive multi-fidelity modeling approach that uses the Gappy-POD algorithm. The primary focus of the paper is on the design of experiments, specifically exploring how to select samples for initial high-fidelity CFD simulations from numerous low-fidelity samples, as well as how to incrementally incorporate high-fidelity sample data based on model prediction feedback. The second section of this paper introduces the Gappy-POD method, which is the framework of the multi-fidelity model. The third section delves deeper into how to select high-fidelity samples, including the method for selecting the initial samples and the adaptive sampling criterion. The fourth section presents the implementation process of the adaptive multi-fidelity model and uncertainty propagation model. The fifth and sixth sections present the two test cases and the analysis results, highlighting the prediction error reduction achieved through the application of the experimental design methods compared to the traditional random sampling algorithm. Finally, conclusions are drawn, and a discussion of future research directions concludes the paper.
2. The Multi-Fidelity Modeling Framework
In this paper, we develop an adaptive multi-fidelity modeling approach based on the Gappy-POD algorithm for predicting multi-dimensional correlated flow field responses. The Gappy-POD method predicts high-fidelity responses from low-fidelity ones, thereby circumventing computationally expensive high-fidelity CFD simulations.
All training samples are divided into two parts. One part constitutes the complete sample set, for which both high- and low-fidelity responses are obtained through CFD simulations. The remaining samples are referred to as incomplete samples; only their low-fidelity responses are obtained via CFD simulations, while their high-fidelity responses are unknown and are predicted by the surrogate model.
For the $i$-th sample in the complete sample set, we define its input vector as $\boldsymbol{\theta}_i$, its low-fidelity response vector as $\mathbf{y}_L^{(i)}$ with dimension $n_L$, and its high-fidelity response vector as $\mathbf{y}_H^{(i)}$ with dimension $n_H$. The high- and low-fidelity vectors are combined into a multi-fidelity snapshot $\mathbf{s}^{(i)} = [\mathbf{y}_L^{(i)}; \mathbf{y}_H^{(i)}]$ with dimension $n_L + n_H$.
The conventional POD methodology is applied to the multi-fidelity snapshots. Employing a generalized energy threshold criterion, we perform modal truncation to obtain a reduced-order orthogonal subspace $\mathbf{\Phi}$ composed of $n$ orthogonal basis functions. Consequently, any multi-fidelity snapshot can be expressed as follows:

$$ \mathbf{s} \approx \sum_{k=1}^{n} a_k \boldsymbol{\phi}_k = \mathbf{\Phi} \mathbf{a} \quad (1) $$
where $\mathbf{\Phi} = [\boldsymbol{\phi}_1, \dots, \boldsymbol{\phi}_n]$ and the basis function coefficient vector $\mathbf{a} = [a_1, \dots, a_n]^T$ is obtained by the projection method. Each orthogonal basis function $\boldsymbol{\phi}_k$ is an $(n_L + n_H)$-dimensional vector. The first $n_L$ elements of each basis function, denoted $\boldsymbol{\phi}_k^L$, represent features of the low-fidelity responses, while the last $n_H$ elements, denoted $\boldsymbol{\phi}_k^H$, represent features of the high-fidelity responses. Therefore, the orthogonal space can be expressed as follows:

$$ \mathbf{\Phi} = \begin{bmatrix} \mathbf{\Phi}_L \\ \mathbf{\Phi}_H \end{bmatrix}, \qquad \mathbf{\Phi}_L = [\boldsymbol{\phi}_1^L, \dots, \boldsymbol{\phi}_n^L], \qquad \mathbf{\Phi}_H = [\boldsymbol{\phi}_1^H, \dots, \boldsymbol{\phi}_n^H] \quad (2) $$
For the $j$-th sample in the incomplete sample set, we denote its input parameter vector as $\boldsymbol{\theta}_j$ and its multi-fidelity response vector as $\mathbf{s}^{(j)} = [\mathbf{y}_L^{(j)}; \mathbf{y}_H^{(j)}]$. The low-fidelity part $\mathbf{y}_L^{(j)}$ is obtained with a CFD simulation, while the high-fidelity part $\mathbf{y}_H^{(j)}$ is unknown. Projecting the low-fidelity part onto the low-fidelity part of the orthogonal space, $\mathbf{\Phi}_L$, we obtain the basis function coefficient vector $\mathbf{a}_j$ with the least-squares method:

$$ \mathbf{a}_j = \arg\min_{\mathbf{a}} \left\| \mathbf{y}_L^{(j)} - \mathbf{\Phi}_L \mathbf{a} \right\|_2^2 = \left( \mathbf{\Phi}_L^T \mathbf{\Phi}_L \right)^{-1} \mathbf{\Phi}_L^T \mathbf{y}_L^{(j)} \quad (3) $$

Applying the coefficient vector to the high-fidelity part of the orthogonal basis, $\mathbf{\Phi}_H$, we obtain the prediction of the high-fidelity responses:

$$ \hat{\mathbf{y}}_H^{(j)} = \mathbf{\Phi}_H \mathbf{a}_j \quad (4) $$
In this way, for an incomplete sample, only the low-fidelity response needs to be obtained through CFD simulations; then, its corresponding high-fidelity response can be predicted. Considering that the computational cost of low-fidelity simulations is significantly lower than that of high-fidelity simulations, this approach will result in substantial savings in simulation costs.
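For concreteness, the two steps above, basis extraction and least-squares prediction, can be sketched in NumPy. This is an illustrative reimplementation under our own naming, not the authors' code; the snapshot layout and the energy threshold follow the paper's description.

```python
import numpy as np

def build_pod_basis(snapshots, energy=0.999):
    """POD basis of column-stacked multi-fidelity snapshots via thin SVD.

    snapshots: (n_L + n_H, n_complete) array whose columns are the
    snapshots [y_L; y_H]. Modes are truncated at the given cumulative
    energy fraction (99.9% is the threshold quoted later in the paper).
    """
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    n = int(np.searchsorted(frac, energy)) + 1
    return U[:, :n]

def gappy_pod_predict(Phi, y_low, n_low):
    """Gappy-POD prediction for an incomplete sample.

    Solves the least-squares problem min_a ||y_low - Phi_L a||_2 on the
    low-fidelity rows of the basis, then evaluates the high-fidelity rows,
    y_high_hat = Phi_H a (Equation (4) in the text).
    """
    Phi_L, Phi_H = Phi[:n_low], Phi[n_low:]
    a, *_ = np.linalg.lstsq(Phi_L, y_low, rcond=None)
    return Phi_H @ a, Phi_L @ a  # high-fidelity prediction, low-fidelity projection
```

Note that solving the least-squares problem directly (here via `lstsq`) is numerically preferable to explicitly forming the normal equations of Equation (3).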
3. The Strategies for Selecting High-Fidelity Samples
In the Gappy-POD method, it is assumed that there are a relatively large number of low-fidelity samples. A portion of these samples needs to be selected for high-fidelity CFD simulations to obtain their high-fidelity responses, while the high-fidelity responses of the remaining samples will be predicted using Equation (4). The key challenge lies in determining which samples to select for high-fidelity CFD simulations. From the perspective of dynamic sampling, this can be divided into two problems: how to select initial high-fidelity samples and how to incrementally add high-fidelity samples based on model feedback. This paper addresses these two issues separately.
- (1)
Selection of the initial high-fidelity samples
To construct the multi-fidelity surrogate model, it is necessary to determine, through design of experiments, where to conduct high- and low-fidelity CFD simulations.
During the initial modeling process, Latin hypercube sampling is used to obtain training samples for low-fidelity CFD simulations. Latin hypercube sampling (LHS) synthesizes the space-filling advantages of Monte Carlo methods with the stratification principles of experimental design, establishing itself as an optimal sparse sampling technique for high-dimensional parameter spaces.
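As a concrete illustration of this step, SciPy's quasi-Monte Carlo module provides Latin hypercube sampling. The dimension below matches the nine uncertain coefficients considered later, but the bounds are placeholders, not the ranges of Table 1.

```python
import numpy as np
from scipy.stats import qmc

# LHS places exactly one point in each of the n equal-probability strata
# per dimension, combining space-filling with stratification.
sampler = qmc.LatinHypercube(d=9, seed=0)
unit = sampler.random(n=81)              # (81, 9) points in the unit hypercube
lower = np.full(9, 0.8)                  # placeholder lower bounds
upper = np.full(9, 1.2)                  # placeholder upper bounds
samples = qmc.scale(unit, lower, upper)  # rescaled to the parameter ranges
```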
After obtaining the low-fidelity responses of all training samples, a subset needs to be selected from all training samples for high-fidelity CFD simulations. The objective is to ensure that the selected samples comprehensively represent the distinct characteristics of the output responses while exhibiting a certain discrepancy between each other. Previously, Cook and Nachtsheim [26] developed an exchange algorithm employing the Morris-Mitchell criterion [27] for nested sample selection. However, this algorithm relies on distances measured in the input parameter space, which may not adequately capture the similarity of output responses for non-linear CFD problems. Therefore, we shift the focus to an approach based on the output responses. To achieve this, we utilize the k-means clustering algorithm [28] to partition the ntrain training samples into ncomplete clusters based on their low-fidelity responses. Subsequently, we select the sample closest to the centroid of each cluster as the representative of that cluster and incorporate it into the complete sample set. The k-means clustering algorithm assesses the similarity between different sample points by quantifying their Euclidean distance, and samples grouped into the same cluster exhibit higher similarity. The clustering algorithm is implemented using the scikit-learn library.
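A minimal sketch of this clustering-based selection with scikit-learn follows; the function and variable names are ours, not from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_initial_samples(y_low_all, n_complete, seed=0):
    """Cluster low-fidelity responses and pick the sample nearest each
    centroid as the initial high-fidelity (complete) set.

    y_low_all: (n_train, n_L) low-fidelity response vectors.
    Returns the indices of the n_complete representative samples.
    """
    km = KMeans(n_clusters=n_complete, n_init=10, random_state=seed)
    labels = km.fit_predict(y_low_all)
    chosen = []
    for c in range(n_complete):
        members = np.flatnonzero(labels == c)
        # Euclidean distance of each cluster member to its centroid.
        d = np.linalg.norm(y_low_all[members] - km.cluster_centers_[c], axis=1)
        chosen.append(members[np.argmin(d)])
    return np.array(chosen)
```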
- (2)
Adaptive refinement
From the perspective of dynamic sampling, if the prediction capability of the model constructed with the initial complete sample set is inadequate, more training samples need to be incrementally added. This process involves three key steps: (a) selecting samples from the incomplete set based on specific criteria, (b) obtaining their high-fidelity responses through CFD simulations, and (c) incorporating them into the complete sample set.
We define the projection error of the low-fidelity response as the deviation between the original low-fidelity response $\mathbf{y}_L$ and its projection $\mathbf{\Phi}_L \mathbf{a}$:

$$ \mathbf{e}_L = \mathbf{y}_L - \mathbf{\Phi}_L \mathbf{a} \quad (5) $$

The dimensionless two-norm projection error is defined as follows:

$$ \varepsilon_L = \frac{\| \mathbf{e}_L \|_2}{\| \mathbf{y}_L \|_2} \quad (6) $$

The prediction error is defined as $\mathbf{e}_H = \mathbf{y}_H - \hat{\mathbf{y}}_H$, and the dimensionless two-norm prediction error is as follows:

$$ \varepsilon_H = \frac{\| \mathbf{e}_H \|_2}{\| \mathbf{y}_H \|_2} \quad (7) $$
For an incomplete sample, the high-fidelity response $\mathbf{y}_H$ is unknown; therefore, the prediction error of the Gappy-POD method cannot be evaluated directly. However, in a CFD field, for the same set of model parameters, there is usually a certain correlation between the high- and low-fidelity responses. Benamara et al. [23] posited a strong correlation between prediction errors on high-fidelity responses and projection errors on low-fidelity responses; that is, when $\varepsilon_L$ is large, $\varepsilon_H$ is also likely to be large. Although this cannot be strictly proven mathematically, it is confirmed in the subsequent test cases. Therefore, in this paper, we utilize the projection error $\varepsilon_L$ as an estimate of the prediction error.
In the adaptive refinement process, we traverse incomplete samples and compare the maximum projection error with a pre-set value to determine whether the existing complete sample set is sufficient; if the maximum projection error is higher than the pre-set value, we add the sample with the largest projection error to the complete sample set.
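The adaptive criterion can be sketched as follows. This is an illustrative implementation under our own naming; the 0.5% tolerance matches the threshold reported later for the NACA0012 case.

```python
import numpy as np

def projection_error(Phi_L, y_low):
    """Dimensionless two-norm projection error of a low-fidelity response
    onto the low-fidelity part of the POD basis (Equation (6))."""
    a, *_ = np.linalg.lstsq(Phi_L, y_low, rcond=None)
    return np.linalg.norm(y_low - Phi_L @ a) / np.linalg.norm(y_low)

def pick_next_sample(Phi_L, y_low_incomplete, tol=5e-3):
    """Return (index, error) of the incomplete sample with the largest
    projection error, or (None, error) if all errors are below tol."""
    errs = np.array([projection_error(Phi_L, y) for y in y_low_incomplete])
    worst = int(np.argmax(errs))
    if errs[worst] <= tol:
        return None, errs[worst]   # complete set is sufficient; stop refining
    return worst, errs[worst]      # run high-fidelity CFD for this sample
```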
4. Implementation Process
- (1)
Implementation process of the Gappy-POD model
With the Gappy-POD multi-fidelity model framework and high-fidelity samples selection methods, the implementation process of the adaptive multi-fidelity model is introduced in the following.
The modeling process, as shown in Figure 1, begins with the space of the uncertain model parameters and is carried out in six steps. The first four steps involve generating initial samples and constructing the initial model, while the last two steps focus on incrementally adding samples to improve the model.
- (a)
The inputs for ntrain training samples are generated using Latin hypercube sampling in the input parameter space.
- (b)
The low-fidelity responses for all training samples are obtained through low-fidelity CFD calculations.
- (c)
The k-means clustering algorithm is used to select ncomplete samples from all training samples for subsequent high-fidelity CFD simulations. This divides the training samples into two sets, one consisting of ncomplete samples, denoted as the complete sample set, and the other consisting of the remaining (ntrain − ncomplete) samples, denoted as the incomplete sample set.
- (d)
High-fidelity responses are acquired via CFD simulations across the complete sample set. At this point, for the complete samples, there are both high- and low-fidelity responses. For the incomplete samples, only low-fidelity responses are available.
- (e)
The Gappy-POD method is employed, based on the high- and low-fidelity responses of the complete samples and the low-fidelity responses of the incomplete samples, to predict the high-fidelity responses of the incomplete samples and the projection errors of their low-fidelity responses. Traversing the incomplete samples, if the maximum projection error is less than the preset value, the Gappy-POD method based on the existing complete sample set is considered able to predict the high-fidelity responses of the incomplete samples sufficiently well, and the multi-fidelity modeling process is finished. Otherwise, the prediction ability is considered insufficient, and additional complete sample data are selected through the sixth step.
- (f)
Traversing through the incomplete samples, the one with the maximum projection error is selected as the next additional sample. Its high-fidelity responses are obtained through high-fidelity CFD calculations, and added to the complete sample set together with the previously obtained low-fidelity responses.
The fifth and sixth steps are repeated until the maximum projection error is less than the preset value.
At this point, high-fidelity responses for all training samples are available, obtained by either the prediction model or CFD simulations.
- (2)
Uncertainty propagation model
After obtaining the high-fidelity responses of all training samples, we can establish a predictive model that maps the uncertain model parameters to multi-dimensional flow field responses. Through Monte Carlo sampling on this surrogate, we quantify input uncertainty propagation.
The procedure is consistent with the method of combining POD and surrogate modeling proposed in the literature [17,18,19,20,21]. The dominant orthogonal basis functions are obtained by performing POD [29,30] on the high-fidelity responses of all training samples. The reduced-dimensional response for each training sample is obtained by the projection method. The surrogate model (the Kriging model [31,32] in this paper) between the uncertain model parameters and the reduced-dimensional response is then constructed. When given new model parameters, the corresponding reduced-dimensional response can be predicted, and the complete response is recovered via POD.
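The uncertainty propagation surrogate can be sketched as below. As an assumption on our part, scikit-learn's Gaussian process regressor stands in for the Kriging model (a Gaussian/RBF kernel with a constant scale approximates the constant-regression, Gaussian-kernel configuration described later); all names are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def fit_uq_surrogate(theta, y_high, energy=0.999):
    """POD-reduce high-fidelity responses, then fit one GP per POD mode.

    theta: (n_train, d) uncertain parameters; y_high: (n_train, n_H)
    high-fidelity responses. Returns a callable mapping new parameters
    to the full reconstructed response.
    """
    mean = y_high.mean(axis=0)
    U, s, _ = np.linalg.svd((y_high - mean).T, full_matrices=False)
    n = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    Phi = U[:, :n]                      # (n_H, n) dominant POD modes
    A = (y_high - mean) @ Phi           # (n_train, n) modal coefficients
    kernel = ConstantKernel() * RBF()   # Gaussian covariance, Kriging stand-in
    gps = [GaussianProcessRegressor(kernel=kernel, normalize_y=True)
           .fit(theta, A[:, k]) for k in range(n)]

    def predict(theta_new):
        theta_new = np.atleast_2d(theta_new)
        a = np.column_stack([gp.predict(theta_new) for gp in gps])
        return mean + a @ Phi.T         # recover the full response via POD
    return predict
```

Monte Carlo propagation then amounts to evaluating `predict` on a large number of sampled parameter vectors and computing statistics of the returned fields.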
5. Test Cases
To evaluate the proposed method's effectiveness, we analyze the impact of SA turbulence model coefficient uncertainties on flow field predictions. The SA model (widely used in aerospace [33]) assumes fully turbulent flow with the transition term excluded, yielding nine uncertain coefficients: $c_{b1}$, $c_{b2}$, $\sigma$, $\kappa$, $c_{w2}$, $c_{w3}$, $c_{v1}$, $c_{t3}$, and $c_{t4}$. These coefficients exhibit epistemic uncertainty, modeled probabilistically via uniform distributions (Table 1) with ranges taken from the literature [34]. Note that characterizing the uncertain parameters (their distributions and types) requires consultation with domain experts. The present study primarily focuses on the propagation of uncertainty given a mathematical description of the uncertain parameters, rather than on the precise quantification of uncertainty in the model parameters.
The analysis examines two numerical cases: wall friction coefficient distribution in low-speed NACA0012 airfoil flow and wall pressure coefficient distribution in transonic M6 wing flow. Detailed presentations follow for each configuration.
- (1)
Low-speed flow around the NACA0012 airfoil
The first case under investigation is wall friction coefficient distribution in low-speed NACA0012 airfoil flow under the following computational conditions:
Two distinct grid resolutions (Figure 2) were employed for multi-fidelity sampling: a coarse mesh comprising 3584 cells and a refined mesh of 57,344 cells. The coarse grid has 25% of the cell count of the fine grid in each direction.
Figure 3 displays the wall friction coefficient distribution predicted with standard model parameters. While high- and low-fidelity results show similar trend patterns—particularly the sharp leading-edge variations on both airfoil surfaces followed by stabilization—significant magnitude discrepancies emerge. These abrupt local variations present critical modeling challenges.
- (2)
Transonic flow around the M6 wing
The transonic flow over the ONERA M6 wing—a standard validation case for compressible flow solvers—was analyzed under the following conditions:
Multi-fidelity simulations employed two computational meshes (illustrated in Figure 4): a coarse grid comprising 990,360 cells (13,638 surface nodes, yielding a pressure coefficient vector in ℝ^13,638) and a refined grid containing 3,594,863 cells (29,684 surface nodes, with a corresponding response vector in ℝ^29,684).
At transonic conditions, a characteristic λ-shaped shock structure forms on the wing's upper surface (Figure 5), presenting significant modeling challenges due to its complex pressure gradients. While both grid resolutions produce qualitatively similar pressure distributions, the fine-grid solution demonstrates improved fidelity in capturing the suction peak and shock wave position, as validated against reference experimental measurements in Figure 6.
The sample data in this paper were generated by Flowstar, an in-house unstructured grid solver [35] based on a cell-centered finite volume methodology that handles diverse element types, including hexahedra, tetrahedra, prisms, pyramids, and other polyhedra generated by multi-grid geometric techniques.
The Kriging model employs a constant regression basis paired with a Gaussian covariance kernel to characterize spatial correlations. Concurrently, POD dimensionality reduction retains modes capturing 99.9% of the system’s cumulative energy content.
6. Results and Discussion
The predictive capability of the adaptive Gappy-POD method was investigated first. To achieve this, we evaluated the prediction error of high-fidelity responses on incomplete sample sets.
After obtaining high-fidelity responses for all training samples, we obtained the prediction model for uncertainty analysis. The predictive capability of the uncertainty propagation model was then investigated with additional ntest test samples.
- (1)
Low-speed flow around the NACA0012 airfoil
In the analysis, ntrain was set to 81, the initial ncomplete to 6, and ntest to 20.
The method used in this paper for selecting high-fidelity samples is divided into two components: initial sample selection and adaptive sampling. Consequently, the results comparison is also conducted separately.
Firstly, a comparison of the methods for selecting initial high-fidelity samples was performed without involving adaptive sampling. Here, we primarily compare our clustering-based method using k-means with the classical random sampling algorithm. The results are presented in section (a).
Secondly, a comparison of adaptive sampling algorithms is carried out. We evaluated the adaptive sampling algorithm based on the projection error against the random refinement method. Both methods utilize the same initial complete sample set, which was obtained through the k-means clustering-based approach. The results are presented in section (b).
Finally, a comprehensive comparison of the entire high-fidelity sample selection method was conducted, encompassing both initial sample selection and adaptive sampling. This comparison was made against an entirely random sampling method that includes both initial and adaptive steps. The results are presented in section (c).
- (a)
Comparison of the methods for selecting initial high-fidelity samples
After acquiring the low-fidelity responses for all training samples, we utilized the k-means clustering method to classify them into six distinct categories. Figure 7 presents the distribution of friction coefficients and the clustering results for all samples. Notably, significant variations in magnitude can be observed between clusters on the upper and lower surfaces of the airfoil, while the distributions tend to be more concentrated within individual clusters. Consequently, it is reasonable to select a single sample from each cluster to represent the distribution of that particular cluster.
Based on the clustering results, we constructed an initial complete sample set and an incomplete sample set. For the complete sample set, high- and low-fidelity responses were obtained through CFD simulations for each sample. For the incomplete sample set, only the low-fidelity responses were computed via CFD simulations (although the high-fidelity responses were also calculated, they were not used in model training but solely for assessing prediction errors). To validate the advantages of our method for selecting initial high-fidelity samples, we compared it with the commonly used random sampling method. In this comparison, the same number of high-fidelity samples was randomly selected from the low-fidelity samples. Both methods predicted the high-fidelity responses of incomplete samples using the Gappy-POD method. To eliminate the randomness inherent in random sampling, this comparison was repeated 10 times. The L2 norm errors for both methods are presented in Figure 8. It is evident that the clustering-based approach outperforms random selection, with a median error reduction of over 50% (from 0.0126 to 0.0062) and fewer outliers. The representativeness of the initial samples plays a crucial role in model predictive capability, and the k-means clustering method provides a robust foundation for subsequent model refinement by selecting representative initial samples.
- (b)
Comparison of high-fidelity sample refinement algorithms
Based on the high- and low-fidelity responses of the initial complete samples and the low-fidelity responses of the incomplete samples, we used the Gappy-POD method to obtain the prediction error of the high-fidelity responses on the incomplete samples and the projection error of the low-fidelity responses, as shown in Figure 9. The correlation coefficient between the two exceeds 0.8; therefore, it is reasonable to assume a strong correlation between them. The projection error of the low-fidelity responses can serve as an indicator of the prediction error of the high-fidelity responses and as an adaptive criterion.
According to the adaptive criterion, a sample was selected from the incomplete sample set at each iteration and added to the complete sample set to gradually improve the prediction ability of the Gappy-POD method. Figure 10 presents the evolution of the maximum projection error as the number of adaptive iterations increases. After four iterations, the maximum projection error decreased below the set threshold of 0.5%, indicating the rapid improvement in accuracy achieved by the adaptive method.
Next, we compared the prediction errors of the Gappy-POD model using the adaptive sampling algorithm based on projection errors and the random sampling method. Both methods start from the same initial high-fidelity sample set, selected through the k-means clustering method. Figure 11 presents a box plot of the prediction error for the two methods, showing the average error and the 99% confidence interval. With the same number of complete samples, the mean prediction error of the adaptive method is consistently smaller, and its 0.995 quantile also decreases significantly. The advantage of the adaptive sampling algorithm based on low-fidelity projection errors lies in its ability to more selectively incorporate samples that are poorly predicted by the Gappy-POD method into the training set, thereby effectively improving the model's overall predictive capability.
- (c)
Comparison of the entire experimental design algorithm
The methods for selecting initial high-fidelity samples and adaptively refining them constitute the complete high-fidelity sample experimental design process. To demonstrate the advantages of the entire method, the proposed approach was compared with completely random sampling; the latter uses a static sampling method to generate an equal number of high-fidelity samples at once and was repeated multiple times to eliminate the influence of randomness. The statistical results of the prediction errors as the training sample size increases are shown in Figure 12, representing the average error and the 99% confidence interval. Under the same number of complete samples, the proposed high-fidelity sample experimental design method consistently achieves a smaller mean error than the random sampling method. For example, after 10 adaptive refinement iterations, the mean error of the proposed method is approximately 30% lower than that of the random method (0.00288 vs. 0.00400). Additionally, the 0.995 quantile of the error decreases significantly. This indicates that the experimental design method not only improves overall prediction accuracy but also reduces the occurrence of large errors.
Figure 13 presents a comparison between the wall friction coefficient distribution predicted by the Gappy-POD method and the high-fidelity CFD simulation on a randomly selected incomplete sample. It is evident that the two predictions are in close agreement, indicating minimal differences. Given that low-fidelity CFD simulations are significantly less costly than high-fidelity simulations, we can eliminate the need for time-consuming high-fidelity CFD simulations.
- (d)
Prediction capability of the uncertainty propagation model and uncertainty results
After obtaining the high-fidelity responses for all training samples, we constructed a prediction model mapping the model parameters to the friction coefficients. For the entire model, only the model parameters of the test samples are required as input. We assessed the prediction error of the entire model on 20 test samples. A comparison of the predicted wall friction coefficient distribution with that obtained from the high-fidelity CFD simulation, using a randomly selected sample from the test dataset, is illustrated in Figure 14. It is evident that the two distributions align well, indicating strong agreement. Figure 15 presents the prediction errors on all test samples. Notably, the prediction errors are less than 1% across all test samples, demonstrating the accurate prediction capability of the entire model and its utility for supporting uncertainty quantification with large-scale random sampling.
Finally, the Latin hypercube method was employed to sample 10^6 inputs within the nine-dimensional space of uncertain parameters. The samples were processed through the uncertainty propagation model to generate wall friction coefficient distributions. Subsequent statistical evaluation yielded mean values with 99% confidence intervals (Figure 16), revealing substantial uncertainty propagation across the entire airfoil surface.
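The statistics behind such plots reduce to pointwise moments over the Monte Carlo ensemble; a small sketch follows (array and function names are ours).

```python
import numpy as np

def mc_statistics(cf_samples):
    """Pointwise mean and 99% interval over a Monte Carlo ensemble.

    cf_samples: (n_samples, n_points) array of surrogate-predicted wall
    distributions, one row per sampled parameter set. The 99% interval
    is taken between the 0.5% and 99.5% empirical quantiles.
    """
    mean = cf_samples.mean(axis=0)
    lo, hi = np.percentile(cf_samples, [0.5, 99.5], axis=0)
    return mean, lo, hi
```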
- (2)
Transonic flow around the M6 wing
In this case, the entire high-fidelity sampling method (encompassing both initial sample selection and adaptive refinement) was benchmarked against traditional random sampling.
Figure 17 shows the statistical results of the prediction errors as the training sample size increases, including the average error and the 99% confidence interval. With the same number of complete samples, the proposed high-fidelity experimental design method consistently achieves a smaller mean error than random sampling. For example, after 20 adaptive refinement iterations, the proposed method demonstrates a 27% reduction in mean error compared to the random method (0.000344 vs. 0.000471). Additionally, the 0.995 error quantile shows a significant reduction. These results indicate that our experimental design not only enhances overall accuracy but also mitigates large-error occurrences, consistent with the findings from the NACA0012 benchmark case.
Based on the inputs of all training samples and their high-fidelity responses, we employed the POD method and a Kriging surrogate model to establish a predictive model for flow fields under different input parameters. For a randomly selected test sample, the pressure coefficients predicted by the model and by CFD simulation were compared along six cross-sections spanning from the root to the tip of the wing (cross-section locations illustrated in Figure 5), as shown in Figure 18. The results indicate a high degree of concordance between the model prediction and the CFD simulation, even in the complicated shock wave region. Consequently, this prediction model is suitable for large-scale MC sampling to investigate the uncertainty of wall pressures.
By performing extensive random sampling on the prediction model, we obtained the uncertainty distribution of the flow field. Figure 19 illustrates the spatial distribution of the pressure coefficient standard deviation across the wing surface. Significant uncertainty concentrations are localized within the λ-shaped shock structure on the upper surface, contrasting with negligible variability in attached flow regions. This behavior aligns with well-documented challenges in Reynolds-averaged Navier-Stokes (RANS) simulations, where shock position predictability exhibits strong parametric sensitivity, particularly to turbulence model closure coefficients, due to their direct influence on eddy viscosity near strong pressure gradients.
7. Conclusions
This paper addresses the need for uncertainty quantification of multi-dimensional correlated flow field responses in CFD. Building upon the Gappy-POD framework, we develop a systematic experimental design methodology for constructing adaptive multi-fidelity surrogate models. This approach comprises two strategically integrated phases:
- (a)
Initial sampling through k-means clustering of low-fidelity solution snapshots.
- (b)
Adaptive refinement driven by projection error quantification.
Based on the adaptive multi-fidelity method, the impact of uncertainty in the coefficients of the SA turbulence model on the distribution of wall friction coefficients for the NACA0012 airfoil and on the distribution of wall pressure coefficients for the M6 wing was analyzed. The results show that compared with the commonly used random sampling method, the high-fidelity sample experimental design method consistently achieves a smaller mean error. Additionally, the 0.995 quantile value of the error significantly decreases. This indicates that the experimental design method not only improves overall prediction accuracy, but also reduces the occurrence of larger errors. The entire model can accurately predict the distribution of wall friction or pressure, and is applicable to support uncertainty quantification in large-scale random sampling processes.
However, it should be noted that the multi-fidelity model used in this study is based on the Gappy-POD framework. The fundamental principle of this method involves a linear POD on multi-dimensional responses. As such, it inherits the limitations associated with linear decomposition. As pointed out by many scholars, in processing highly non-linear datasets, the linear nature of POD decomposition often necessitates numerous POD modes for flow field reconstruction, and numerical stability cannot be guaranteed. To address the limitations of POD in handling non-linear data, kernel methods have been introduced, forming Kernel Proper Orthogonal Decomposition (Kernel POD or KPOD). In the context of a non-linear decomposition framework, how to construct a multi-fidelity model will be the focus of our future research.