Article

Surrogate-Based Uncertainty Analysis for Groundwater Contaminant Transport in a Chromium Residue Site Located in Southern China

1 Key Laboratory of Metallogenic Prediction of Nonferrous Metals & Geological Environment Monitoring, Ministry of Education, Central South University, Changsha 410083, China
2 School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.
Water 2024, 16(5), 638; https://doi.org/10.3390/w16050638
Submission received: 24 January 2024 / Revised: 15 February 2024 / Accepted: 18 February 2024 / Published: 21 February 2024
(This article belongs to the Special Issue Contaminant Transport Modeling in Aquatic Environments)

Abstract

Numerical modeling is widely acknowledged as a highly precise method for understanding the dynamics of contaminant transport in groundwater. However, owing to the intricate characteristics of environmental systems and the lack of accurate information, its results are subject to a significant degree of uncertainty. Numerical models must explicitly account for parameter uncertainty to support robust decision-making. At a chromium residue site in southern China (the study area), this study employed Monte Carlo simulation to assess the impact of uncertainty in key parameters on the simulation outcomes. Variogram Analysis of Response Surfaces (VARS) global sensitivity analysis and an XGBoost (version 2.0.0)-based surrogate model were employed to overcome the substantial computational cost of Monte Carlo simulation. The results of the numerical simulation indicate that the contaminant is spreading downstream towards the northern boundary of the contaminated site near the Lianshui River, threatening water quality. Furthermore, migration patterns are complex because of both downstream convection and upstream diffusion. Sensitivity analysis identified hydraulic conductivity, recharge rate, and porosity as the most influential model parameters, and these were selected as the key parameters. Uncertainty analysis indicated that variability in the key parameters has a minimal impact on the simulation outcomes at monitoring wells near the contaminant source, whereas at wells far from the source it significantly influences the simulation outcomes. The surrogate model markedly reduced computational workload and calculation time while demonstrating superior precision and effectively capturing the non-linear relationships between the inputs and outputs of the simulation model.

1. Introduction

Transport of contaminants through groundwater raises substantial environmental concerns, degrading freshwater reservoirs and lowering drinking water quality. The quantity and quality of freshwater play pivotal roles in shaping the surrounding ecosystem, notably affecting aquifer integrity [1]. Contaminants commonly include elevated concentrations of substances such as heavy metals and toxins, which threaten water reservoirs [2]. Rapid industrialization in China has catalyzed socioeconomic advancement but, at the same time, has led to the contamination of groundwater with heavy metals [3]. The transport of heavy-metal contaminants in groundwater is an imperceptible, prolonged, challenging-to-detect, and reversible phenomenon [4]. Among these contaminants, chromium (Cr) contamination of groundwater often stems from the unregulated disposal of chromium slag and the seepage of leachate from slag piles. Chromium is typically present in trivalent (Cr(III)) and hexavalent (Cr(VI)) states [5], of which Cr(VI) poses a significant environmental concern, adversely affecting ecosystems and water quality [6,7]. Owing to its high toxicity [8], mobility, and solubility [9], Cr(VI) poses substantial threats to sustainable development [10,11]. Recognizing the hazardous nature of Cr, the United States Environmental Protection Agency (US EPA) added it to its priority pollutant list. A comprehensive understanding of the dynamics governing Cr(VI) transport is therefore crucial for accurate contamination assessment and effective remediation, contributing to the overall goal of sustainable development.
With advances in computational techniques, numerical simulation has become a predominant approach for studying contaminant behavior in groundwater systems [12,13]. In recent years, several numerical groundwater studies have simulated the transport of Cr(VI) [10,14,15,16,17].
The hydrodynamic processes governing groundwater flow and contaminant transport in an aquifer are very complex, introducing numerous uncertainties into numerical modeling outcomes. These uncertainties relate to model parameters [18], the conceptual model [19,20], and observational data and measurements [21,22]. Obtaining accurate values for groundwater model parameters is hindered by high testing costs, measurement errors from uncalibrated devices, and the complex behavior of water movement in aquifers. Consequently, specifying precise parameter values is impractical, and a reasonable level of uncertainty must be incorporated [23]. Inaccuracies in input parameters arising from measurement error can further affect model outputs [24]. To enhance model reliability, it is crucial to perform uncertainty analysis on simulation model parameters to reduce the risk of model failure [25].
In parameter uncertainty analysis, the most frequently used methods are sensitivity analysis and the Monte Carlo method [26]. Sensitivity analysis is an essential evaluation technique that provides insight into how variations in input factors, such as model parameters, affect model responses [27]. It comprises two main approaches: local and global sensitivity analysis [28]. Local sensitivity analysis examines model sensitivity within a specific parameter range, whereas global sensitivity analysis examines sensitivity across the entire parameter space [29]. Methods for local sensitivity analysis include the one-factor-at-a-time technique [30] and regional sensitivity analysis [31]. Global sensitivity analysis employs various approaches, including derivative-based [32] and variance-based [33] methods and the recently developed Variogram Analysis of Response Surfaces (VARS) framework [34,35]. Whereas the ultimate aim of uncertainty analysis is to explore the full range of potential outcomes and their associated probabilities, sensitivity analysis only assesses the impact of input variations on output values [36].
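The difference between the local and global views can be illustrated with a hypothetical two-parameter response, $f(x_1, x_2) = x_1^2 + 0.5\,x_2$, around a hypothetical base point $(0, 0.5)$. A one-factor-at-a-time derivative at the base point reports no sensitivity to $x_1$. Sweeping each parameter over its whole (assumed) range shows instead that $x_1$ drives most of the output variance. The function, base point and ranges are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def toy_model(x1, x2):
    # Hypothetical response: strongly non-linear in x1, linear in x2
    return x1**2 + 0.5 * x2

nominal = np.array([0.0, 0.5])          # hypothetical base point
bounds = [(-1.0, 1.0), (0.0, 1.0)]      # hypothetical parameter ranges

# Local (one-factor-at-a-time) sensitivity: central-difference derivative at the base point
delta = 1e-3
local = []
for i in range(2):
    up, down = nominal.copy(), nominal.copy()
    up[i] += delta
    down[i] -= delta
    local.append((toy_model(*up) - toy_model(*down)) / (2 * delta))

# Global view: output variance when each parameter sweeps its full range.
# Because this model is additive, this equals each parameter's first-order
# (variance-based) contribution.
rng = np.random.default_rng(42)
global_var = []
for i, (lo, hi) in enumerate(bounds):
    x = np.tile(nominal, (100_000, 1))
    x[:, i] = rng.uniform(lo, hi, size=100_000)
    global_var.append(np.var(toy_model(x[:, 0], x[:, 1])))

print("local derivatives:", np.round(local, 3))
print("global variances:", np.round(global_var, 4))
```

Analytically, the global variances are $4/45 \approx 0.089$ for $x_1$ and $1/48 \approx 0.021$ for $x_2$. The local ranking is therefore reversed once the full parameter range is considered.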
The Monte Carlo method is a commonly employed technique for evaluating model uncertainty [37]. Monte Carlo simulation is a versatile and straightforward technique that translates the uncertainty in model parameters into corresponding uncertainty in simulation outcomes, which makes it well suited to assessing uncertainty in multivariable models [38]. However, it requires running the numerical model many times, which is time-consuming, especially when a complex model is invoked directly. A surrogate model approximates the simulation model, reducing the computational burden and processing time without compromising accuracy [39]. It emulates the simulation model by approximating the underlying relationship between groundwater parameters and contaminant distribution at minimal computational cost [40].
In recent years, several surrogate modeling approaches have been proposed to approximate simulation models for uncertainty analysis. These include Polynomial Chaos Expansion (PCE); Gaussian Process Emulation (GPE) [41]; Gaussian processes [42]; Kriging [26,43,44]; Radial Basis Functions (RBF) [45]; Support Vector Regression (SVR) [46]; Artificial Neural Networks (ANN) [47]; Multi-Gene Genetic Programming (MGGP) [48]; Kernel Extreme Learning Machines (KELM) [49]; a hybrid approach combining the Multilevel Monte Carlo method (MLMC), a graph convolutional neural network, and a feed-forward neural network [50]; and Deep Belief Neural Networks (DBNN) [51]. Although earlier studies have achieved some success, surrogate models face challenges of scalability and accuracy, particularly when contaminant-related relationships are strongly non-linear or high-dimensional [52]. Ensemble learning surrogate models combine multiple surrogate models into a single prediction, so that the collective knowledge of the ensemble can compensate for the errors of individual surrogates [53]. Extreme Gradient Boosting (XGBoost) is an example of an ensemble learning model that combines the predictions of individual decision trees to address regression tasks, and it has shown significant potential for surrogate modeling [54]. However, introducing multiple surrogate models substantially increases computational cost, which is a major concern when applying ensemble learning surrogate models to groundwater contaminant transport modeling. Additionally, existing research has paid limited attention to global sensitivity analysis and has focused primarily on local sensitivity analysis methods. This limits our understanding of how variations in input parameters affect model responses.
This study employs the novel and robust Variogram Analysis of Response Surfaces (VARS) global sensitivity analysis to identify the parameters that most influence the model output, which are taken as the key parameters. We further propose an XGBoost surrogate model to reduce the computational cost of Monte Carlo simulation. Monte Carlo simulation was then used to assess the impact of uncertainty in the key parameters on the results of the groundwater contaminant transport model. A chromium residue site in southern China was chosen as the study area. This research addresses the following challenges: (1) application of numerical modeling to simulate and analyze the behavior of a contaminated site in terms of groundwater contaminant transport; (2) VARS sensitivity analysis to comprehensively assess the impact of input parameter variations on model outputs; and (3) uncertainty analysis using the proposed XGBoost surrogate model to approximate the outcomes of the groundwater contaminant transport model, enabling a more computationally efficient analysis of parameter uncertainty.

2. Materials and Methods

2.1. Overview of Study Area

This paper examines a chromium-contaminated urban site in southern China that was formerly occupied by a ferroalloy plant, as illustrated in Figure 1. The plant was constructed in 1958, officially began production in 1962, ceased all operations in 2006, and was completely dismantled in 2010. The ferroalloy plant occupied an area of around 590,141 m² and primarily produced chromium metal, carbon-manganese, and pig iron, using raw materials such as silicon, manganese, chromium, and iron. Situated close to the Lianshui River, the plant produced chromium metal for over five decades, accumulating a substantial quantity of chromium-bearing slag and tailings during this period. The slag was improperly stored in open yards near the plant. Rainfall infiltrated it, and the resulting leachate polluted the soil and groundwater.
According to drilling data from the geological investigation, the subsurface is heterogeneous and comprises miscellaneous fill, silty clay, medium sandsilt-round gravel, weathered mudstone, and moderately weathered mudstone. The miscellaneous fill has high porosity, whereas the silty clay layer has limited permeability and acts as an aquiclude. The medium sandsilt-round gravel layer has moderate porosity and forms the primary confined aquifer, while the less permeable mudstone forms its base.
The technical roadmap of the proposed methodology is illustrated in Figure 2. A conceptual model was developed by converting the 3D geological model into the input format of the Groundwater Modeling System (GMS). A numerical model of Cr(VI) transport in groundwater was then constructed on the basis of the calibrated groundwater flow model to analyze spatial and temporal changes in Cr(VI) concentration. However, uncertainty in hydrogeological parameters is inherent in numerical models. It is difficult to quantify and propagates into the model outcomes. Therefore, the VARS sensitivity analysis method was employed to identify the most sensitive parameters in the system, which were then used as the key input parameters. Latin hypercube sampling was applied within the probability distributions and value ranges of the key parameters to generate input datasets. The groundwater Cr(VI) transport numerical model was run on these inputs to generate an input–output dataset. An XGBoost surrogate of the numerical model was built from this input–output dataset to reduce the computational burden of repeatedly invoking the simulation model, thereby enhancing overall computational efficiency. Finally, Monte Carlo simulation was performed with the surrogate model to evaluate the uncertainty caused by variability in the key parameters and its impact on the model output.

2.2. Numerical Modeling

Numerical modeling is a robust tool for assessing, simulating, and predicting the movement of contaminants in groundwater flow systems. The process involves the following steps [10]: (1) development of a conceptual model, (2) development and calibration of a groundwater flow simulation model, and (3) development of a Cr(VI) transport simulation model.
In this study, groundwater flow and contaminant transport were simulated using the MODFLOW-2005 and MT3DMS modules integrated within the Groundwater Modeling System (GMS) software (version 10.6) [55]. The calibrated MODFLOW model supplied the flow field for the MT3DMS model. In addition, eight monitoring wells were used to observe Cr(VI) transport during the simulation and for further analysis (Figure 3).

2.2.1. Conceptual Model

Previous studies [56,57] have confirmed that hydrogeological model surfaces can be extracted from a GOCAD (version 2017) geological model in either TIN or ASCII format for use in Visual MODFLOW or GMS. A three-dimensional model of the study area (Figure 4) was developed through a systematic process: (1) construction of a geological model using GOCAD’s structural and stratigraphic workflow, incorporating data from 30 boreholes; (2) export of the corrected strata as ‘ASCII’ point files; and (3) interpolation of the ‘ASCII’ point files in GMS using the Inverse Distance Weighting (IDW) method. A finite-difference grid was used as the solution framework for the conceptual model. The study area was discretized into a cell-centered finite-difference grid of 100 rows and 90 columns, with each cell 5 m in size (Figure 4). The model domain consisted of 45,000 cells, of which 22,325 were active and 22,675 were inactive.
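The grid figures can be cross-checked with simple arithmetic, assuming uniform 5 m × 5 m cells in plan. The number of model layers is not stated in this paragraph, so it is inferred from the cell counts:

```latex
\begin{aligned}
\text{cells per layer} &= 100 \times 90 = 9{,}000\\
\text{plan extent} &= (100 \times 5\,\text{m}) \times (90 \times 5\,\text{m}) = 500\,\text{m} \times 450\,\text{m}\\
\text{implied number of layers} &= 45{,}000 / 9{,}000 = 5\\
\text{active} + \text{inactive} &= 22{,}325 + 22{,}675 = 45{,}000
\end{aligned}
```

Five layers would match the five strata listed in Section 2.1. However, this is an inference from the stated counts and not a value reported in the text.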
In the numerical simulation, we focused exclusively on the sandsilt-round gravel layer. This layer contains confined pore water and is treated as the confined aquifer. The layers above and below it have low permeability and are regarded as aquicludes. Because the study area has no natural boundary of its own, the model domain was bounded by the Wuming and Lianshui Rivers in the northwest and southeast, respectively, which serve as constant-head boundaries (Figure 3). Because the aquifer is buried at depth, the effect of evaporation was disregarded [14]. Groundwater in the area is recharged mainly by rainfall and surface water, and it flows from northwest to southeast towards the Lianshui River. According to the hydrological survey of the site, the head is 51 m at the northwest boundary and 47 m at the southeast boundary.

2.2.2. Numerical Simulation Model

Groundwater flow models determine hydraulic heads, and thereby velocities, within an aquifer. This information is then passed to a transport model to simulate the migration of contaminants carried by the groundwater [58]. The governing partial differential equation for three-dimensional groundwater flow in saturated porous media, based on Darcy’s law, can be written as [59]:
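$$\frac{\partial}{\partial x}\left(k_x \frac{\partial h}{\partial x}\right) + \frac{\partial}{\partial y}\left(k_y \frac{\partial h}{\partial y}\right) + \frac{\partial}{\partial z}\left(k_z \frac{\partial h}{\partial z}\right) - w = S_s \frac{\partial h}{\partial t} \tag{1}$$
where $k_x$, $k_y$, and $k_z$ denote the hydraulic conductivities in the $x$, $y$, and $z$ directions, respectively; $h$ denotes the hydraulic head; $w$ represents the sink/source term; $S_s$ is the specific storage coefficient; and $t$ represents time.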
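To make the flow equation concrete, the sketch below solves its steady-state, one-dimensional form ($S_s\,\partial h/\partial t = 0$) with a cell-centred finite-difference scheme, which is the same discretization family MODFLOW uses. Constant-head end cells stand in for the river boundaries (51 m and 47 m). The function name, the 1D simplification, the uniform cell width and the harmonic-mean inter-cell conductivity are assumptions of this illustration and do not describe the authors' 3D model.

```python
import numpy as np

def steady_head_1d(k, dx, h_left, h_right, w=None):
    """Steady 1D form of the flow equation: d/dx(K dh/dx) - w = 0.

    k       : hydraulic conductivity of each cell (m/d)
    dx      : cell width (m)
    h_left  : constant head in the first cell (m)
    h_right : constant head in the last cell (m)
    w       : sink/source term per cell (1/d); with the '- w' sign used here,
              positive w removes water, so recharge would enter as negative w
    """
    k = np.asarray(k, dtype=float)
    n = k.size
    w = np.zeros(n) if w is None else np.asarray(w, dtype=float)

    # Inter-cell conductivity: harmonic mean of neighbouring cells
    k_face = 2.0 * k[:-1] * k[1:] / (k[:-1] + k[1:])

    a = np.zeros((n, n))
    b = np.zeros(n)
    # Constant-head boundary cells
    a[0, 0] = a[-1, -1] = 1.0
    b[0], b[-1] = h_left, h_right
    # Interior cells: (K_e (h[i+1] - h[i]) - K_w (h[i] - h[i-1])) / dx^2 = w[i]
    for i in range(1, n - 1):
        k_w, k_e = k_face[i - 1], k_face[i]
        a[i, i - 1] = k_w / dx**2
        a[i, i + 1] = k_e / dx**2
        a[i, i] = -(k_w + k_e) / dx**2
        b[i] = w[i]
    return np.linalg.solve(a, b)

# Homogeneous aquifer: the head falls linearly between the two boundaries
h = steady_head_1d(np.full(10, 5.0), dx=5.0, h_left=51.0, h_right=47.0)
print(np.round(h, 3))
```

If the uniform conductivity is replaced with alternating values, the flux between cells remains constant. This is expected, because the harmonic mean is the exact equivalent conductivity of two equal-width cells in series.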
A contaminant transport model is a simplified representation of the geochemical conceptual model. It focuses on predicting the concentration of a dissolved chemical, such as Cr(VI), in an aquifer at a given location and time. This is achieved with the convection–dispersion equation for contaminant transport in three dimensions, including internal sources or sinks, which can be written as [60]:
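$$\frac{\partial C}{\partial t} = \frac{\partial}{\partial x_i}\left(D_{ij} \frac{\partial C}{\partial x_j}\right) - \frac{\partial}{\partial x_i}\left(q_i C\right) + \frac{q_s}{\theta} C_s \tag{2}$$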
where $D_{ij}$ is the dispersion coefficient, $q_i$ is the Darcy velocity, $C$ is the concentration, $\theta$ denotes effective porosity, $C_s$ represents the source or sink concentration, and $q_s$ represents the source or sink flow rate per unit volume.
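MT3DMS solves the three-dimensional transport equation numerically. For intuition, or as a quick sanity check on a numerical run, the classical one-dimensional Ogata–Banks analytical solution describes the same convection–dispersion process for a conservative solute fed by a constant-concentration source. This is the situation assumed for the 120 mg/L contaminant sources in this study. The Ogata–Banks solution is a swapped-in textbook result, not the authors' model. Homogeneity, one-dimensionality, a uniform seepage (pore-water) velocity and zero initial concentration are all assumptions here, and the velocity and dispersivity values are hypothetical.

```python
import math

def ogata_banks(x, t, v, d_l, c0):
    """1-D Ogata-Banks solution of the advection-dispersion equation.

    Assumes a semi-infinite homogeneous column, zero initial concentration,
    a constant concentration c0 at x = 0, uniform seepage velocity v (m/d),
    longitudinal dispersion coefficient d_l (m^2/d), and no sorption or decay.
    """
    if t <= 0:
        return 0.0 if x > 0 else c0
    root = 2.0 * math.sqrt(d_l * t)
    a = (x - v * t) / root
    b = (x + v * t) / root
    peclet = v * x / d_l
    if peclet < 700.0:
        tail = math.exp(peclet) * math.erfc(b)
    else:
        # exp(peclet) would overflow; use erfc(b) ~ exp(-b^2) / (b sqrt(pi)),
        # which is accurate here because b^2 >= peclet > 700
        tail = math.exp(peclet - b * b) / (b * math.sqrt(math.pi))
    return 0.5 * c0 * (math.erfc(a) + tail)

c0 = 120.0          # mg/L, constant source concentration used in the study
v = 0.3             # m/d, hypothetical seepage velocity
d_l = 10.0 * v      # m^2/d, hypothetical 10 m dispersivity times v
t = 3 * 365.0       # d, roughly the 2017-2020 simulation period
for x in (0.0, 100.0, 300.0, 500.0):
    print(f"x = {x:5.0f} m  C = {ogata_banks(x, t, v, d_l, c0):7.2f} mg/L")
```

At the source (x = 0) the function returns c0 for any t > 0, and concentrations decrease monotonically with distance. Both properties provide simple checks on numerical runs.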
We used the dynamic models formulated in Equations (1) and (2) to simulate groundwater contaminant transport and solved them with GMS software (version 10.6). During the simulation, the boundary conditions outlined in Figure 3 were applied, and the initial concentrations for contaminant transport were based on concentrations measured in 2017, as shown in Figure 5 [61]. In addition, a constant concentration of 120 mg/L was assumed at the contaminant sources, based on observations from wells close to these sources. No adsorption was included in the model, so Cr(VI) was treated as a conservative species. The transport model was run for the period from 2017 to 2020. All parameters input to the numerical simulation are summarized in Table 1, together with their respective reasonable ranges [49]. These include the calibrated hydraulic conductivity and the other parameter values of the confined aquifer (i.e., the medium sandsilt-round gravel layer).

2.3. Variogram-Based Global Sensitivity Analysis (VARS)

In 2016, Razavi and Gupta [34,35] proposed a novel approach to global sensitivity analysis called Variogram Analysis of Response Surfaces (VARS), which builds on the concept of the variogram. Variograms are robust tools for describing the spatial or spatiotemporal structure and variance of a target variable, such as a model response, across various perturbation scales in a multi-parameter space [62]. By modeling this variation with the variogram, VARS bridges the two most widely used types of global sensitivity analysis (GSA), variance-based [33] and derivative-based [32]. It is significantly more efficient than both while producing nearly consistent results [34,35]. In addition, compared with other popular techniques such as Morris [32] or Sobol [33], VARS performs GSA with considerably fewer model evaluations [63].
VARS characterizes the spatial variance and correlation structure of the model output over the domain defined by a set of parameters. Consider a model output $Z$ that is a function of $n$ parameters $y_1, y_2, \ldots, y_n$. Let $\mathbf{y} = (y_1, \ldots, y_n)$ denote a position in the $n$-dimensional parameter space, and let $\mathbf{p} = (p_1, p_2, \ldots, p_n)$ be the distance vector between any two points in that space. Under the intrinsic stationarity assumption, the multidimensional variogram $\gamma(\mathbf{p})$ can be computed as:
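$$\gamma(\mathbf{p}) = \frac{1}{2}\,\mathrm{Var}\left[Z(\mathbf{y}+\mathbf{p}) - Z(\mathbf{y})\right] = \frac{1}{2}\,E\left[\left(Z(\mathbf{y}+\mathbf{p}) - Z(\mathbf{y})\right)^2\right] \tag{3}$$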
When the variogram is computed along a single dimension (parameter) $i$, such that $p_j = 0$ for all $j \neq i$, the result is a one-dimensional directional variogram, $\gamma(p_i)$. Directional variograms depend on the distance between pairs of points along a specific parameter direction rather than on their locations in the parameter space.
Directional variograms are the foundational elements of VARS. They describe the sensitivity of the model output to each parameter across the full range of perturbation scales $p_i$. Integrating the directional variogram of the $i$th parameter up to a perturbation scale of interest $P_i$ (e.g., 10%, 30%, or 50% of the parameter range) yields a single sensitivity index, the Integrated Variogram Across a Range of Scales (IVARS):
$$\Gamma(P_i) = \int_0^{P_i} \gamma(p_i)\, dp_i \tag{4}$$
The value of $\gamma(p_i)$ indicates how sensitive the model output is to parameter $i$ at perturbation scale $p_i$. IVARS50 ($P_i$ = 50% of the parameter range) is known as the total variogram effect. It can be regarded as the most comprehensive index within the VARS framework for evaluating the global sensitivity of a response surface to each parameter.
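A short worked case may help with interpreting IVARS. Assume, purely for illustration, that the response is linear in parameter $i$, $Z(\mathbf{y}) = a\,y_i + g(\mathbf{y}_{\sim i})$, where $g$ does not depend on $y_i$ and the parameters are scaled to $[0, 1]$. The variogram is estimated as half the mean squared difference (the second form in the variogram definition). Along direction $i$ every increment equals $a\,p_i$, so:

```latex
\gamma(p_i) = \tfrac{1}{2}\,E\!\left[(a\,p_i)^2\right] = \tfrac{1}{2}\,a^2 p_i^2,
\qquad
\Gamma(P_i) = \int_0^{P_i} \tfrac{1}{2}\,a^2 p_i^2 \, dp_i = \frac{a^2 P_i^3}{6},
\qquad
\mathrm{IVARS}_{50} = \Gamma(0.5) = \frac{a^2}{48}.
```

Doubling the slope $a$ therefore quadruples IVARS50, and a parameter with $a = 0$ receives an index of zero. A non-linear response would instead produce a variogram that departs from this quadratic shape.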
A novel and robust “star-based” sampling approach, denoted STAR-VARS, was developed for the numerical implementation of VARS to improve computational efficiency [35]. First, a sampling technique such as Latin hypercube or progressive Latin hypercube sampling [64] is used to select N star centers from the parameter space. Next, equally spaced points are generated along each parameter dimension through each star center, forming a ‘star’. These points are used to compute sensitivity metrics such as IVARS at a defined resolution ($\Delta p$). The total number of samples (i.e., the computational cost) required by STAR-VARS is given by Equation (5).
$$\text{Total number of samples} = N\left[D\left(\frac{1}{\Delta p} - 1\right) + 1\right] \tag{5}$$
where $N$, $D$, and $\Delta p$ represent the number of star centers, the number of input parameters, and the perturbation resolution (i.e., the minimum spacing between pairs of points in the parameter space), respectively.
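The star-based sampling and the IVARS computation can be sketched in NumPy. This is a simplified re-implementation for illustration and does not reproduce the VARS-tool package the authors used. The following are assumptions of the sketch: parameters are scaled to $[0, 1]$ (a ±20% range maps linearly onto this interval), the function names are hypothetical, bootstrap confidence intervals are omitted, and IVARS is integrated with the trapezoidal rule.

```python
import numpy as np

def lhs_unit(n, d, rng):
    """Latin hypercube sample of n points in [0, 1)^d."""
    u = np.empty((n, d))
    for j in range(d):
        # one point in each of n equal strata, strata shuffled per dimension
        u[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return u

def star_vars(model, n_params, n_stars=5, dh=0.1, scales=(0.1, 0.3, 0.5), seed=0):
    """Directional variograms and IVARS from star-based sampling.

    `model` maps a point in the scaled parameter space [0, 1)^n_params to a scalar.
    Returns (ivars, n_evals), where ivars[s][i] is IVARS at scale s for parameter i.
    """
    rng = np.random.default_rng(seed)
    m = int(round(1.0 / dh))                 # points per cross-section
    centers = lhs_unit(n_stars, n_params, rng)
    sq_sum = np.zeros((n_params, m))         # sums of 0.5 * (dZ)^2 per lag
    counts = np.zeros((n_params, m))
    n_evals = 0
    for c in centers:
        z_c = model(c)
        n_evals += 1
        for i in range(n_params):
            offset = c[i] % dh
            k0 = int(round((c[i] - offset) / dh))    # index of the star centre
            pts = np.tile(c, (m, 1))
            pts[:, i] = offset + dh * np.arange(m)   # equally spaced along dim i
            z = np.empty(m)
            for k in range(m):
                if k == k0:
                    z[k] = z_c                       # reuse the centre evaluation
                else:
                    z[k] = model(pts[k])
                    n_evals += 1
            for lag in range(1, m):
                diff = z[lag:] - z[:-lag]
                sq_sum[i, lag] += 0.5 * np.sum(diff ** 2)
                counts[i, lag] += diff.size
    gamma = np.zeros((n_params, m))                  # gamma(0) = 0
    gamma[:, 1:] = sq_sum[:, 1:] / counts[:, 1:]
    ivars = {}
    for s in scales:
        n_lag = int(round(s / dh))
        g = gamma[:, : n_lag + 1]
        # trapezoidal integration of gamma(h) from h = 0 to h = s
        ivars[s] = dh * (g[:, 1:-1].sum(axis=1) + 0.5 * (g[:, 0] + g[:, -1]))
    return ivars, n_evals

def linear_model(y):
    # hypothetical response: strong in y0, weak in y1, no effect of y2
    return 3.0 * y[0] + 1.0 * y[1] + 0.0 * y[2]

ivars, n_evals = star_vars(linear_model, n_params=3)
print(n_evals, {s: np.round(v, 4) for s, v in ivars.items()})
```

With N = 5 stars and Δp = 0.1 (Table 2), each star costs D(1/Δp − 1) + 1 = 9D + 1 model runs. The run count therefore grows linearly with the number of parameters.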
In our study, we utilized a comprehensive version of the recently developed VARS-tool package in Python to implement the STAR-VARS. Our experimental setup of VARS involves following steps:
(1) The parameters listed in Table 1 were selected for sensitivity analysis, and a ±20% variation range was set for each parameter.
(2) A total of 300 sample points were generated within the defined parameter ranges using the star-based sampling method. The sampling attributes used in STAR-VARS are detailed in Table 2.
(3) The sample points were input into the numerical simulation model, which output Cr(VI) concentrations at three test points. The locations of the three test points are illustrated in Figure 3.
(4) The output Cr(VI) concentrations were then processed in the VARS model. Specifically, the IVARS50 index was computed to obtain a set of sensitivity indices for each test point.
Table 2. Sampling attributes for STAR-VARS.
Number of stars | Sampling resolution (Δp) | IVARS perturbation scales (fraction of parameter range) | Sampler | Bootstrap size | Confidence interval (%)
5 | 0.1 | 0.1, 0.3, 0.5 | Latin hypercube | 1000 | 90

2.4. Surrogate Model Based on Extreme Gradient Boosting

Chen and Guestrin [65] introduced Extreme Gradient Boosting (XGBoost) in 2016 as an advanced ensemble learning algorithm within the boosting framework. Because it is built on decision trees, it has strong predictive modeling capabilities. Boosting algorithms build base estimators sequentially, with each correcting the errors of the previous ones, so the estimators are strongly interdependent. Bagging, in contrast, builds independent base estimators to promote diversity. This strong dependence in boosting reflects cooperative model improvement [65]. Unlike conventional gradient boosting, XGBoost adds a regularization term to the loss in its objective function, which yields smoother learned weights and reduces the risk of overfitting [66]. The XGBoost algorithm iteratively refines the model’s structure and parameters by minimizing this objective function, which improves predictive accuracy [67].
This approach predicts the results with an ensemble of $M$ trees. Consider a dataset of samples $(x_i, y_i)$, where $x_i$ denotes the feature values of sample $i$ and $y_i$ the actual value to be predicted, and let $E$ denote the space of regression trees. The final prediction $\hat{y}_i$ for a given sample is then the sum of the predictions of the individual trees:
$$\hat{y}_i = \sum_{m=1}^{M} f_m(x_i), \quad f_m \in E \tag{6}$$
where each $f_m$ corresponds to an independent tree. The leaf weights $w_p$ are the values associated with each leaf node $p$ of a regression tree, where $p \in \{1, 2, 3, \ldots, P\}$ and $P$ is the total number of leaves in the tree. The objective function can then be written as:
$$Obj(\Theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \gamma P + \frac{1}{2}\lambda \sum_{p=1}^{P} w_p^2 \tag{7}$$
where $l(y_i, \hat{y}_i)$ denotes the loss function, which measures the difference between the predicted value $\hat{y}_i$ and the actual value $y_i$, and $n$ is the total number of samples in the dataset used to train or evaluate the model. Gamma ($\gamma$) and lambda ($\lambda$) regularize the model. $\gamma$ penalizes the number of leaves and so controls the tree structure, while $\lambda$ penalizes the magnitude of the leaf weights. The regularization terms ($\gamma P + \frac{1}{2}\lambda \sum_{p=1}^{P} w_p^2$) keep the model from becoming overly complex and fitting the training data too closely. This improves its ability to generalize to new, unseen data and helps avoid overfitting.
During XGBoost training, trees are constructed sequentially. Each new tree is fitted to the residuals left by the predictions of the preceding trees. Equation (8) defines the optimum weight $w_p^*$ for a specific leaf $p$ of a fixed tree structure:
$$w_p^* = -\frac{\sum_{i \in I_p} g_i}{\sum_{i \in I_p} h_i + \lambda} \tag{8}$$
where $I_p$ is the set of samples assigned to leaf $p$, and $g_i = \partial_{\hat{y}^{(d-1)}}\, l\left(y_i, \hat{y}_i^{(d-1)}\right)$ and $h_i = \partial^2_{\hat{y}^{(d-1)}}\, l\left(y_i, \hat{y}_i^{(d-1)}\right)$ are the first- and second-order derivatives of the loss function at the $d$th iteration. With the optimum leaf weights $w_p^*$, the optimal objective function can be formulated as:
$$obj(\Theta)^* = -\frac{1}{2}\sum_{p=1}^{P} \frac{\left(\sum_{i \in I_p} g_i\right)^2}{\sum_{i \in I_p} h_i + \lambda} + \gamma P \tag{9}$$
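The sketch below shows how the leaf weight of Equation (8) and the optimal objective that follows it drive training in the simplest case: depth-1 trees (stumps) on a single feature with the squared-error loss $l = \tfrac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$. The shrinkage factor `eta`, the depth-1 trees, the exact greedy split search and all function names are simplifying assumptions. This is a teaching sketch, not the XGBoost library.

```python
import numpy as np

def leaf_weight(g, h, lam):
    """Optimal leaf weight of Eq. (8): w* = -sum(g) / (sum(h) + lambda)."""
    return -g.sum() / (h.sum() + lam)

def leaf_score(g, h, lam):
    """One leaf's term of the optimal objective, excluding the gamma penalty."""
    return -0.5 * g.sum() ** 2 / (h.sum() + lam)

def fit_stump(x, g, h, lam, gamma):
    """Depth-1 tree on one feature; split chosen by exact greedy search."""
    order = np.argsort(x)
    xs, gs, hs = x[order], g[order], h[order]
    one_leaf = leaf_score(gs, hs, lam) + gamma            # objective with P = 1
    best_gain, best_thr = 0.0, None
    for j in range(1, len(xs)):
        if xs[j] == xs[j - 1]:
            continue                                      # no threshold between ties
        two_leaves = (leaf_score(gs[:j], hs[:j], lam)
                      + leaf_score(gs[j:], hs[j:], lam) + 2 * gamma)  # P = 2
        gain = one_leaf - two_leaves
        if gain > best_gain:
            best_gain, best_thr = gain, 0.5 * (xs[j - 1] + xs[j])
    if best_thr is None:                                  # splitting does not help
        w = leaf_weight(g, h, lam)
        return lambda z: np.full(np.shape(z), w)
    left = x <= best_thr
    w_left = leaf_weight(g[left], h[left], lam)
    w_right = leaf_weight(g[~left], h[~left], lam)
    return lambda z: np.where(z <= best_thr, w_left, w_right)

def boost(x, y, n_trees=50, lam=1.0, gamma=0.0, eta=0.3):
    """Squared-error boosting: l = 0.5 (y - y_hat)^2, so g = y_hat - y and h = 1."""
    y_hat = np.zeros(len(y))
    trees = []
    for _ in range(n_trees):
        g = y_hat - y              # first derivatives at the previous prediction
        h = np.ones(len(y))        # second derivatives
        tree = fit_stump(x, g, h, lam, gamma)
        y_hat = y_hat + eta * tree(x)   # each new tree corrects the current residuals
        trees.append(tree)

    def predict(z):
        return eta * sum(t(z) for t in trees)

    return predict, y_hat

x = np.linspace(0.0, 1.0, 50)
y = np.where(x > 0.5, 2.0, 0.0)    # toy step response
predict, fitted = boost(x, y)
print("training MSE:", np.mean((fitted - y) ** 2))
```

For the toy step, every stump splits at the step, and the residual on the upper side shrinks by a constant factor per iteration. Larger `lam` or `gamma` values slow this fitting down, which is the regularizing effect described above.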
The surrogate model is a black box that effectively captures the non-linear relationships between the inputs and outputs of the simulation model while reducing computational workload and calculation time. The XGBoost surrogate model is therefore implemented to alleviate the computational burden of repeatedly invoking the simulation model, thereby enhancing overall computational efficiency. To benchmark its performance, we also developed a Random Forest (RF) [68] surrogate model. Random forest is an ensemble learning method in which numerous decision trees are constructed with limited interdependence. It has been used to build surrogate models for uncertainty analysis in groundwater numerical simulation with good results [69]. Building the surrogate model involves the following steps:
(1) Key parameters were selected through VARS sensitivity analysis and treated as stochastic variables to assess the influence of their uncertainty on the simulation outcomes.
(2) The key parameters were sampled according to their probability distributions and value ranges using the Latin hypercube sampling (LHS) method [70] to generate 140 input datasets (different combinations of the three key parameters). The remaining model parameters were kept constant at their previously calibrated values.
(3) These datasets were input into the simulation model, which was solved with GMS software to produce the corresponding output datasets (Cr(VI) concentrations at the 8 monitoring wells), forming the input–output dataset.
(4) Two methods (XGBoost and RF) were employed to build the surrogate model. The input–output dataset was partitioned, with 75% allocated to training the model and 25% reserved for validating the accuracy of the surrogate model. This study used the Python programming language (version 3.11.5) with the XGBoost library (version 2.0.0) and the scikit-learn library (version 1.3.0) for random forests.
(5) To obtain optimal surrogate model architectures, hyperparameters were tuned to prevent overfitting during training and thereby improve model accuracy. Their lower and upper bounds are listed in Table 3. Optuna Hyperparameter Optimization (OHPO) [71] was employed to fine-tune the hyperparameters of the surrogate models automatically.
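Steps (2) and (4) can be sketched as follows. The nominal values, the uniform distributions and the ±20% bounds (borrowed from the VARS setup) are hypothetical placeholders for Table 1, and the function name `latin_hypercube` is illustrative. The expensive GMS runs of step (3) are only indicated by a comment.

```python
import numpy as np

# Hypothetical calibrated values of the three key parameters (placeholders, not Table 1)
nominal = {"hydraulic_conductivity": 10.0,   # m/d
           "recharge_rate": 5e-4,            # m/d
           "porosity": 0.30}                 # dimensionless
lower = np.array([0.8 * v for v in nominal.values()])   # assumed -20 %
upper = np.array([1.2 * v for v in nominal.values()])   # assumed +20 %

def latin_hypercube(n, lower, upper, rng):
    """n LHS samples, uniform within [lower, upper] for each parameter."""
    d = len(lower)
    u = np.empty((n, d))
    for j in range(d):
        # one point per stratum, strata shuffled independently per parameter
        u[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return lower + u * (upper - lower)

rng = np.random.default_rng(2024)
X = latin_hypercube(140, lower, upper, rng)      # 140 parameter sets, as in step (2)

# Step (3) would write each row to the MODFLOW/MT3DMS inputs, run GMS, and
# collect the Cr(VI) concentrations at the 8 wells into Y (shape 140 x 8).

# Step (4): 75 % / 25 % split of run indices for training and validation
idx = rng.permutation(len(X))
n_train = int(0.75 * len(X))
train_idx, valid_idx = idx[:n_train], idx[n_train:]
print(X.shape, len(train_idx), len(valid_idx))
```

The training and validation rows could then be passed to a regressor such as `xgboost.XGBRegressor` or `sklearn.ensemble.RandomForestRegressor`. For non-uniform parameter distributions, the unit-interval samples would first be mapped through the inverse cumulative distribution function of each parameter.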
Evaluating the prediction accuracy and efficiency of the surrogate model is essential, because inaccurate models can waste resources and impair optimization, predictions, and feasibility analysis [72]. In this study, the accuracy with which the surrogate model approximates the simulation model is evaluated using four metrics: the coefficient of determination (R²), mean relative error (MRE), mean squared error (MSE), and root mean squared error (RMSE). They are computed as follows:
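$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{10}$$
$$MRE = \frac{1}{n}\sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100 \tag{11}$$
$$MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{12}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{13}$$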
where $n$ represents the number of samples, $y_i$ and $\hat{y}_i$ denote the outcome of the simulation model and the prediction of the surrogate model for the $i$th sample, respectively, and $\bar{y}$ is the mean of the $n$ simulation model results. An R² of 1 indicates that the surrogate model explains all the variability in the simulated values. An R² of 0 indicates that it explains none of it, doing no better than predicting $\bar{y}$; on validation data, R² can even be negative. Lower MRE, MSE, and RMSE values indicate better accuracy, whereas higher values indicate larger discrepancies between predicted and actual values. In particular, an MRE of 100% means that, on average, the prediction errors are as large as the actual values themselves. Overall, R² and MRE provide reference metrics for evaluating the surrogate model. The surrogate model is more accurate the closer R² is to 1 and the smaller its MRE, MSE, and RMSE.
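A direct NumPy transcription of the four metrics defined above is given below. The absolute value inside MRE is an assumption that prevents over- and under-predictions from cancelling. The function name is hypothetical.

```python
import numpy as np

def surrogate_metrics(y_sim, y_sur):
    """R2, MRE (%), MSE and RMSE of surrogate predictions against simulator outputs.

    Assumes y_sim contains no zeros, because MRE divides by the simulated value.
    """
    y_sim = np.asarray(y_sim, dtype=float)
    y_sur = np.asarray(y_sur, dtype=float)
    resid = y_sim - y_sur
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_sim - y_sim.mean()) ** 2)
    mre = np.mean(np.abs(resid / y_sim)) * 100.0
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    return {"R2": r2, "MRE": mre, "MSE": mse, "RMSE": rmse}

# Hypothetical simulated vs. surrogate concentrations (mg/L)
print(surrogate_metrics([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 30.0, 42.0]))
```

Because MSE and RMSE are in squared and original concentration units, respectively, they are easiest to compare across surrogates for the same well. R² and MRE are scale-free and can be compared across wells.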

2.5. Monte Carlo Simulation

Based on statistical probability theory, Monte Carlo simulation is the most widely used technique for analyzing the uncertainty of complex mathematical models [73]. It is conceptually simple, versatile across applications, and capable of thoroughly quantifying the uncertainty in model outputs [74]. Monte Carlo simulation consists of the following steps [60]:
(1)
Identify the random (key) parameters by means of sensitivity analysis.
(2)
Generate random samples within the feasible range of the key parameters using Latin hypercube sampling.
(3)
Run the simulation model for each sampled parameter set and extract the corresponding model output.
(4)
After all simulations are complete, construct a histogram of the results for the uncertain quantity of interest. From this frequency plot, the probability associated with any given level can be estimated, and the mean, variance, confidence limits, and other statistics can be determined.
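The four steps can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions. `latin_hypercube` uses the basic stratify-and-shuffle construction. `toy_model` and any parameter bounds are hypothetical stand-ins for the GMS transport model (or the trained surrogate) and for the actual ranges of K, R, and n.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def latin_hypercube(n: int, bounds: Sequence[Tuple[float, float]],
                    rng: random.Random) -> List[List[float]]:
    """Draw n Latin hypercube samples: each parameter range is cut into n
    equal-width strata and every stratum is sampled exactly once."""
    columns = []
    for low, high in bounds:
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across parameters
        columns.append([low + (high - low) * (k + rng.random()) / n for k in strata])
    return [list(row) for row in zip(*columns)]


def monte_carlo(model: Callable[[Sequence[float]], float],
                samples: Sequence[Sequence[float]]) -> Dict[str, object]:
    """Run the model (or its surrogate) once per sample and summarize the outputs."""
    outputs = [model(s) for s in samples]
    return {
        "outputs": outputs,
        "mean": statistics.fmean(outputs),
        "variance": statistics.variance(outputs),  # sample variance
        "stdev": statistics.stdev(outputs),
    }


def histogram(values: Sequence[float], n_bins: int = 10) -> List[int]:
    """Counts in equal-width bins between min and max, for a frequency plot."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-identical values
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def toy_model(params: Sequence[float]) -> float:
    """Hypothetical stand-in for the transport model (NOT the GMS model)."""
    k, r, n = params
    return 100.0 * k * r / n
```

Calling `monte_carlo(toy_model, latin_hypercube(1000, bounds, random.Random(42)))` with three (low, high) bounds returns 1000 outputs together with their mean, variance, and standard deviation. `histogram` then bins those outputs for the frequency plot of step (4).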

3. Results and Discussion

We carried out experiments to validate the efficiency of the proposed method. The experiments were conducted on an ordinary PC with an Intel Core i7-6700 processor (3.4 GHz) and 8 GB of RAM, running 64-bit Windows 10.

3.1. Numerical Simulation of Flow Field and Cr(VI) Contaminant Transport

Figure 6 illustrates the east-to-west flow of groundwater within the study area. Hydraulic head is higher in the northwest and lower in the southeast, following the topography of the area. Groundwater flows mainly laterally towards the Lianshui River, with a gradually declining vertical hydraulic gradient. The groundwater flow domain extends to a depth of 15 m, corresponding to the depth of the impermeable base layer of the aquifer (moderately weathered mudstone). The head values in Figure 6 represent the water table of the aquifer, which is treated as having a simple, single-layer structure [11].
Observation data from ten groundwater-level monitoring wells in 2019 (Figure 6) were used to calibrate the numerical groundwater model. The calibration results show close agreement between simulated and observed hydraulic heads (Figure 7a), with a correlation coefficient of 0.973. Residual analysis shows small differences between simulated and observed heads, with a mean residual of 0.0023, a mean absolute residual of 0.14, and a root mean squared residual of 0.17, all within the acceptable range. Figure 7b compares simulated and observed hydraulic heads at the ten monitoring wells. The two curves follow a similar trend, confirming that the calibrated model fits the observed hydraulic heads well.
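The calibration statistics quoted above can be computed as follows. This is a sketch assuming the residual is defined as observed minus simulated head (sign conventions vary between tools) and that the correlation coefficient is Pearson's r. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def calibration_stats(observed: Sequence[float], simulated: Sequence[float]) -> Dict[str, float]:
    """Residual statistics and Pearson correlation for head calibration.

    Residual = observed - simulated (assumed sign convention).
    """
    n = len(observed)
    res = [o - s for o, s in zip(observed, simulated)]
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return {
        "mean_residual": sum(res) / n,                        # bias; sign matters
        "mean_abs_residual": sum(abs(r) for r in res) / n,    # typical misfit size
        "rms_residual": math.sqrt(sum(r * r for r in res) / n),
        "correlation": cov / math.sqrt(var_o * var_s),
    }
```

A mean residual close to zero with a small mean absolute residual, as reported here, indicates little systematic bias in the calibrated heads.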
The spatial and temporal evolution of Cr(VI) after 6, 12, 18, 24, 30, and 36 months is illustrated in Figure 8. As Figure 8a–f shows, the Cr(VI)-contaminated area expanded progressively with simulation time under natural conditions, and the plume consistently moved in the direction of groundwater flow. Over the 36-month simulation period, the plume spread continuously towards the northern boundary of the contaminated site near the Lianshui River, raising concern for the river's water quality. Figure 9 shows that Cr(VI) concentrations rose steadily over the 36 months in all monitoring wells (O1 to O8). The elevated Cr(VI) levels at wells O3, O5, O6, and O7 indicate a significant threat to groundwater near the site and in the downstream areas towards the Lianshui River.
These results indicate that Cr(VI) moves primarily downstream by convection (transport with the flowing groundwater). However, Cr(VI) also diffuses in the upstream direction, which makes the overall migration pattern more complex. The behavior of Cr(VI) in groundwater thus reflects two driving factors: hydraulic gradients and concentration gradients [11]. Zones with steep hydraulic gradients promote rapid downstream transport, whereas steep concentration gradients drive diffusion from areas of high concentration to areas of low concentration.
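To illustrate how a single source can spread both downstream by convection and upstream by diffusion, the sketch below solves a one-dimensional advection-diffusion problem with explicit finite differences. It is a toy model with hypothetical parameters (grid, velocity `v`, diffusion coefficient `d`), not the three-dimensional site model. The upwind treatment of the convective term also introduces some numerical dispersion.

```python
from typing import List


def advect_diffuse_1d(n_cells: int = 101, source: int = 50, v: float = 0.5,
                      d: float = 0.2, dx: float = 1.0, dt: float = 0.5,
                      n_steps: int = 100) -> List[float]:
    """Explicit scheme: upwind convection (flow toward higher index) plus central
    diffusion, with a constant unit concentration at the source cell and zero
    concentration held at both boundary cells."""
    courant = v * dt / dx        # convective displacement per step, in cells
    fourier = d * dt / dx ** 2   # diffusive number
    if v < 0 or courant + 2 * fourier > 1.0:
        raise ValueError("scheme requires v >= 0 and courant + 2*fourier <= 1")
    c = [0.0] * n_cells
    for _ in range(n_steps):
        c[source] = 1.0          # constant-concentration source
        new = c[:]
        for i in range(1, n_cells - 1):
            convection = -courant * (c[i] - c[i - 1])
            diffusion = fourier * (c[i + 1] - 2.0 * c[i] + c[i - 1])
            new[i] = c[i] + convection + diffusion
        c = new
    c[source] = 1.0
    return c
```

With the default values, the cell five steps upstream of the source receives a small but nonzero concentration through diffusion alone, while the cell five steps downstream carries a much higher concentration delivered mainly by convection.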
As the contaminant plume continues to expand and the high-concentration zones grow with the migration of Cr(VI), remediation facilities such as pumping wells and permeable reactive barriers (PRBs) should be installed in the downstream part of the site to effectively reduce the risk of chromium pollution in the downstream area.

3.2. Sensitivity Analysis

3.2.1. Directional Variograms

Figure 10 presents the directional variograms of hydraulic conductivity, recharge, and porosity. The results clearly show that these three parameters have the greatest influence on the numerical simulation model output (Cr(VI) concentrations at three test points at the end of the simulation period), whereas the remaining parameters are less sensitive. Sensitivity also depends on the perturbation scale: parameters exhibit stronger sensitivity when perturbed over larger scales and weaker sensitivity over smaller scales. The less sensitive parameters show little variation in their variograms across the perturbation scale, indicating their comparatively minor influence on the model outputs.
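A directional variogram measures how much the response changes, on average, when one parameter is perturbed by a distance h: γ(h) = ½E[(y(x + h) − y(x))²]. The sketch below estimates it from random pairs of points on the unit hypercube, with each parameter rescaled to its range. This simple pair-based Monte Carlo estimator is a stand-in for the star-based (STAR-VARS) sampling normally used with VARS. `toy_response` is a hypothetical model.

```python
import random
from typing import Callable, Sequence


def directional_variogram(model: Callable[[Sequence[float]], float], dim: int,
                          n_params: int, h: float, n_pairs: int = 2000,
                          seed: int = 0) -> float:
    """gamma_dim(h) = 0.5 * E[(y(x + h e_dim) - y(x))^2] on the unit hypercube,
    estimated from random point pairs separated by h along parameter `dim`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        x = [rng.random() for _ in range(n_params)]
        x[dim] = rng.uniform(0.0, 1.0 - h)  # keep the shifted point inside [0, 1]
        x_shift = list(x)
        x_shift[dim] += h
        total += (model(x_shift) - model(x)) ** 2
    return 0.5 * total / n_pairs


def toy_response(p: Sequence[float]) -> float:
    """Hypothetical response: strong linear, weak quadratic, very weak linear term."""
    return 5.0 * p[0] + p[1] ** 2 + 0.1 * p[2]
```

For `toy_response`, parameter 0 enters linearly with slope 5, so γ₀(0.1) = ½(5 × 0.1)² = 0.125 exactly, far above the values for the two weaker parameters.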

3.2.2. Sensitivity Index IVARS-50

Among the VARS-based sensitivity indices, the most comprehensive one, IVARS-50 (the 'total-variogram effect'), was selected. The results are shown in Figure 11, with the parameters ranked by sensitivity. Overall, hydraulic conductivity, recharge, and porosity are the most influential parameters, followed by longitudinal dispersivity. The remaining parameters have minimal effect on the model outputs.
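IVARS-50 integrates the directional variogram from h = 0 to h = 0.5, that is, up to half of each parameter's range (Razavi and Gupta [34]). Given variogram values at a few perturbation scales, the integral can be approximated with the trapezoidal rule. The sketch assumes γ(0) = 0 and scales sorted in increasing order, and the function name is hypothetical.

```python
from typing import Sequence


def ivars(h_values: Sequence[float], gamma_values: Sequence[float],
          h_max: float = 0.5) -> float:
    """Trapezoidal approximation of the integral of gamma(h) from 0 to h_max.

    h is the perturbation scale as a fraction of the parameter range, so
    h_max = 0.5 gives IVARS-50. Assumes gamma(0) = 0 and sorted h_values.
    """
    pts = [(0.0, 0.0)] + [(h, g) for h, g in zip(h_values, gamma_values)
                          if 0.0 < h <= h_max]
    if pts[-1][0] < h_max:
        raise ValueError("variogram must be evaluated up to h_max")
    return sum(0.5 * (g0 + g1) * (h1 - h0)
               for (h0, g0), (h1, g1) in zip(pts, pts[1:]))
```

For a variogram that grows linearly, γ(h) = h, IVARS-50 = ∫₀^0.5 h dh = 0.125, which the trapezoidal rule reproduces exactly. Parameters are then ranked by their IVARS-50 values, as in Figure 11.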
Based on the IVARS-50 values in Figure 11, hydraulic conductivity (K), recharge (R), and porosity (n) were identified as the key parameters for assessing the impact of uncertainty on the model's predictions.
The variogram-based global sensitivity results (the directional variograms and the IVARS-50 index at the three test points) thus show a substantial influence of hydraulic conductivity, recharge, and porosity on the numerical simulation model output (i.e., Cr(VI) concentration). They show only a minimal effect of longitudinal dispersivity, specific storage, and specific yield on Cr(VI) migration.
Hydraulic conductivity directly shapes the convective movement of contaminants by controlling the pore-fluid velocity. Higher hydraulic conductivity allows faster contaminant movement, which strongly affects groundwater transport and results in higher contaminant concentrations. Recharge strongly affects contaminant concentrations because it repeatedly introduces contaminants into groundwater through surface infiltration and water-table fluctuations, so variations in recharge rate directly change contaminant influx and concentration. Porosity affects groundwater transport in two ways. It controls the seepage velocity, which governs convective transport, and it determines the void volume available to store groundwater and contaminants. Variations in porosity can therefore change the distribution and retention of contaminants, altering their transport behavior and, ultimately, their concentrations in groundwater.
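The porosity argument rests on the seepage (average linear) velocity, v = K·i/n, which is the Darcy flux K·i divided by porosity. A short sketch makes the dependence on K and n explicit. The hydraulic gradient i and all numbers used with it are hypothetical, and in practice n here is the effective porosity.

```python
def seepage_velocity(k: float, gradient: float, porosity: float) -> float:
    """Average linear velocity v = K * i / n (Darcy flux divided by porosity)."""
    if not 0.0 < porosity <= 1.0:
        raise ValueError("porosity must be in (0, 1]")
    darcy_flux = k * gradient
    return darcy_flux / porosity


def advective_travel_time(distance: float, k: float, gradient: float,
                          porosity: float) -> float:
    """Time for purely convective transport over `distance`; units follow the inputs."""
    return distance / seepage_velocity(k, gradient, porosity)
```

For K = 1 m/d, i = 0.01, and n = 0.25, v = 0.04 m/d, so convection alone would carry Cr(VI) 100 m in 2500 days. Halving the porosity doubles v and halves that time, which is why n ranks among the key parameters.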
Conversely, longitudinal dispersivity, specific storage, and specific yield show relatively low sensitivity. Although longitudinal dispersivity characterizes the spreading of contaminants, it has little influence on contaminant migration here, because dispersion is minor compared with the dominant convective transport. Specific storage and specific yield regulate groundwater flow dynamics by controlling the storage and release of water within the aquifer. However, their impact on contaminant transport, particularly on concentration, may be overshadowed by parameters such as hydraulic conductivity, recharge, and porosity.
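A rough way to check whether convection dominates dispersion is the Péclet number, Pe = vL/D_L. Here D_L = α_L·v + D*, the longitudinal dispersivity times the seepage velocity plus effective molecular diffusion. This is a supplementary scale check rather than part of the VARS analysis, and all input values are hypothetical.

```python
def dispersion_coefficient(alpha_l: float, v: float, d_molecular: float = 0.0) -> float:
    """Longitudinal hydrodynamic dispersion D_L = alpha_L * v + D* (common simplification)."""
    return alpha_l * v + d_molecular


def peclet_number(v: float, length: float, d_l: float) -> float:
    """Pe = v * L / D_L; Pe >> 1 means convective transport dominates dispersion."""
    return v * length / d_l
```

With v = 0.04 m/d, L = 100 m, α_L = 1 m, and D* neglected, Pe = 100, which indicates strongly convection-dominated transport. In that case Pe reduces to L/α_L, independent of velocity.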
Combining Figures 10 and 11, the variogram magnitudes across the perturbation scale and the IVARS-50 values are comparatively low near the contaminant source and increase gradually at test points farther from it. This pattern suggests that the influence of parameter uncertainty grows with distance from the source.

3.3. Evaluation and Comparative Analysis of Surrogate Model Performances

We constructed surrogate models for the uncertainty analysis of the simulation model. To obtain robust surrogates, 300 hyperparameter-optimization trials were run with OHPO to systematically search for the optimal combination of hyperparameters [75]. The number of trials was determined by the convergence of the searched hyperparameters. As the results in Section 3.4 show, this number of trials is sufficient to build a reliable surrogate model within the scope of this study. MSE and R² were used to assess model accuracy during hyperparameter optimization. Regularization parameters such as reg_lambda and reg_alpha were also tuned through OHPO to minimize MSE and maximize R² for the XGBoost surrogate model. Figure 12 shows the evolution of MSE and R² during tuning with OHPO. The optimal hyperparameters, characterized by the minimum MSE and a high R², were found at the 201st trial for XGBoost and the 265th trial for RF. OHPO improved the performance of the XGBoost surrogate considerably more than that of the RF surrogate.
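The statement that the number of trials follows from convergence can be made concrete by tracking the best objective value found so far. The sketch below returns the best trial and applies a simple patience-based convergence test. The `patience` and `tol` rule is a hypothetical criterion, since the exact stopping rule used with OHPO is not specified.

```python
import math
from typing import List, Sequence, Tuple


def best_so_far(scores: Sequence[float]) -> List[float]:
    """Running minimum of the objective (e.g. validation MSE) over trials."""
    curve, best = [], math.inf
    for s in scores:
        best = min(best, s)
        curve.append(best)
    return curve


def best_trial(scores: Sequence[float]) -> Tuple[int, float]:
    """1-based number of the first trial attaining the minimum score, and that score."""
    idx = min(range(len(scores)), key=scores.__getitem__)
    return idx + 1, scores[idx]


def has_converged(scores: Sequence[float], patience: int = 50, tol: float = 1e-6) -> bool:
    """True if the best score improved by less than tol over the last `patience` trials."""
    if len(scores) <= patience:
        return False
    curve = best_so_far(scores)
    return curve[-patience - 1] - curve[-1] < tol
```

For per-trial MSE values [5, 3, 4, 2, 2, 6], `best_trial` returns trial 4 and the running minimum is [5, 3, 3, 2, 2, 2].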
Table 4 provides a summary of the optimal hyperparameter values identified through OHPO.
Table 5 presents the accuracy metrics of the surrogate models. Figure 13 compares the outputs of the XGBoost and Random Forest (RF) surrogate models with those of the simulation model. We also compared the accuracy metrics (R², MSE, RMSE, and MRE) of the surrogate models for individual monitoring wells. The results in Figure 14 show that the XGBoost surrogate model outperforms the RF surrogate model. The R², MSE, RMSE, and MRE (%) values of the surrogate models built for the seventh and eighth Cr(VI) monitoring wells were higher than those for the other wells. This difference can be attributed to the greater uncertainty in the simulated concentrations at these wells.
These results indicate that the XGBoost surrogate model achieved the highest accuracy and therefore has the potential to replace the simulation model in uncertainty analysis.

3.4. Uncertainty Analysis of Groundwater Contaminant Transport

We used XGBoost as the surrogate model for the uncertainty analysis of our simulation model. In GMS 10.6, a single run of the simulation model took about 50 s, and the 1000 runs required for the uncertainty analysis took 13.8 h. Building the surrogate model required 140 simulation runs (1.94 h), after which the Monte Carlo simulation with the surrogate took only about 4 s. Counting the surrogate construction time, using the surrogate reduced the total calculation time by 85.94%.
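The 85.94% figure follows from the reported times if the surrogate-construction runs are counted against the surrogate: (13.8 − 1.94)/13.8 ≈ 85.94%. The one-line sketch below reproduces this arithmetic. Its optional term adds the time spent evaluating the surrogate (about 4 s), which changes the result only slightly. The function name is hypothetical.

```python
def time_reduction_percent(t_direct_h: float, t_build_h: float,
                           t_surrogate_h: float = 0.0) -> float:
    """Percentage time saved by (surrogate training runs + surrogate evaluations)
    relative to running the simulation model directly for every sample."""
    return 100.0 * (t_direct_h - t_build_h - t_surrogate_h) / t_direct_h
```

`time_reduction_percent(13.8, 1.94)` gives ≈ 85.94, and including the 4 s of surrogate evaluations (4/3600 h) gives ≈ 85.93. Most of the remaining cost is the 140 training runs (140 × 50 s ≈ 1.94 h).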
Using the simulation and surrogate models described above, we generated 1000 parameter sets with Latin hypercube sampling within the ranges of the key parameters identified by global sensitivity analysis. These samples were fed into the trained surrogate model to perform the Monte Carlo simulation. The resulting Cr(VI) concentrations at each monitoring well were then analyzed statistically with SPSS software (version 22). Frequency histograms of Cr(VI) concentration at each monitoring well are shown in Figure 15.
Figure 15a,b,d show that the uncertainty arising from variability in the key parameters had a smaller impact on wells O1, O2, and O4, which are close to the contaminant source. In contrast, Figure 15g,h indicate a significant impact on wells O7 and O8, which are located far from the contaminant source.
In uncertainty analysis, the standard deviation quantifies the variability of a set of measurements, observations, or model predictions. The standard deviations are presented in Table 6. A higher standard deviation implies greater dispersion of the outputs and thus greater uncertainty. Owing to the stochastic nature of the key parameters, the Cr(VI) concentration in each well varied considerably.
Among the monitoring wells, O4, O1, and O2 showed the least dispersion in their outputs, whereas O7 and O8 showed the greatest. This suggests that the latter wells were more strongly influenced by the variability of the key parameters.
In the SPSS software, we also calculated the confidence intervals for Cr(VI) ion concentrations in each well at confidence levels of 60% and 90%. In the context of uncertainty analysis, confidence intervals are useful for conveying the precision or uncertainty associated with estimates, predictions, or model outcomes. The width of the confidence interval reflects the level of uncertainty: a wider interval suggests greater uncertainty, while a narrower interval implies more precision in the estimate.
As shown in Table 7, well O7 had the widest confidence interval of all wells at the same confidence level. At the 90% confidence level, its interval ranged from 56.31 to 57.51 mg/L, highlighting its greater susceptibility to uncertainty in the three key parameters. Conversely, the interval for O4 was the narrowest, indicating that this well is less susceptible to uncertainty in these parameters. For every well, the interval widens as the confidence level increases and narrows as it decreases, so that lower confidence levels give intervals more tightly concentrated around the mean.
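The procedure SPSS used to form the intervals in Table 7 is not stated, so the sketch below shows two common options applied to a set of Monte Carlo outputs. The first is a central percentile interval that contains the given fraction of the outputs. The second is a normal-approximation confidence interval for the mean. Both widen as the confidence level rises. Function names are hypothetical, and which interpretation matches Table 7 is an assumption.

```python
import math
import statistics
from typing import Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 1]) of already-sorted data."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(math.ceil(pos), len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def percentile_interval(outputs: Sequence[float], level: float) -> Tuple[float, float]:
    """Central interval containing the fraction `level` of the Monte Carlo outputs."""
    data = sorted(outputs)
    tail = (1.0 - level) / 2.0
    return percentile(data, tail), percentile(data, 1.0 - tail)


def mean_interval(outputs: Sequence[float], level: float) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the mean output."""
    mean = statistics.fmean(outputs)
    sem = statistics.stdev(outputs) / math.sqrt(len(outputs))  # standard error
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sem, mean + z * sem
```

The percentile interval describes the spread of the predicted concentrations themselves. The mean interval describes only how precisely their average is estimated, and it shrinks as more Monte Carlo samples are drawn.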
In summary, the uncertainty analysis shows how the variability of the key parameters affects the simulation results. This helps identify ways to improve simulation accuracy and ensure the reliability of the model's predictions.
These findings highlight the need to strategically place remediation measures, such as a permeable reactive barrier (PRB) and pumping wells, to reduce Cr(VI) contamination downstream. Wells closer to the contamination source show less variability, indicating that they are less influenced by the key parameters. Hence, prioritizing remediation efforts near the source, specifically at monitoring wells O1, O2, and O4, is crucial. Installing PRBs at these locations holds significant promise for intercepting and treating the contaminant plume. Continuous monitoring of these wells is vital to assess the effectiveness of the remediation interventions.
Given the wider confidence intervals at monitoring wells farther from the contamination source, such as O7, addressing simulation uncertainty becomes crucial for PRB placement. An uncertain PRB location could greatly reduce the barrier's effectiveness in intercepting and treating the contaminant plume. Reducing simulation uncertainty is therefore essential to ensure the precision and reliability of remediation strategies targeting Cr(VI) contamination, which in turn protects groundwater quality and public health.

4. Conclusions

In this study, variogram analysis of response surfaces (VARS), a global sensitivity analysis method, was used to identify the parameters with the greatest influence on the numerical simulation model output, which then served as key parameters. Based on these key parameters, an XGBoost surrogate model was established to approximate the relationship between groundwater parameters and contaminant distribution. This approach reduced the computational burden of Monte Carlo simulation for uncertainty analysis while maintaining the accuracy of the model predictions.
The following conclusions can be drawn:
(1)
Global sensitivity analysis identified hydraulic conductivity (K), recharge (R), and porosity (n) as the key parameters for a comprehensive evaluation of the impact of uncertainty on the numerical simulation model results. Sensitivity analysis serves a dual purpose: it reduces the input dimensionality of the surrogate model, thereby improving its precision, and it provides guidance for the investigation of contaminated sites.
(2)
In the uncertainty analysis, the XGBoost-based surrogate model not only effectively captured the non-linear relationships between the inputs and outputs of the numerical simulation model, but also markedly reduced the computational workload and calculation time. Using the XGBoost-based surrogate model instead of directly calling the numerical simulation model reduced computation time by 85.94%, making Monte Carlo simulation with this surrogate model viable and efficient.
(3)
Monte Carlo simulation of the effect of random variations in the key parameters on the numerical simulation model showed that wells farther from the contaminant source were more strongly influenced by this variability, which provides insights for improving the accuracy of groundwater simulation.
(4)
The numerical simulation showed that the Cr(VI) contaminant plume migrates downstream. To effectively reduce the risk of chromium pollution in the downstream area, remediation measures such as PRBs and pumping wells are recommended in the downstream part of the site.
In future studies, we will develop automated machine learning (AutoML) methods to automate model selection and hyperparameter optimization, reducing human intervention and saving time in the simulation process. In addition, the current simulation has not yet been validated against independently measured concentrations. Such validation is indispensable for ensuring the reliability and accuracy of the model's predictions by benchmarking them against real-world data.

Author Contributions

Conceptualization, Y.Z., H.D. and M.S.Y.; methodology, Y.Z., M.S.Y. and F.Y.; software, F.Y. and M.S.Y.; validation, Y.Z., H.D. and F.Y.; data curation, M.S.Y., F.Y. and Y.H.; writing—original draft preparation, Y.Z. and M.S.Y.; writing—review and editing, H.D. and Y.H.; funding acquisition, Y.Z. and H.D. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by grants from the National Key Research and Development Program of China (Grant No. 2019YFC1805905) and the National Natural Science Foundation of China (Grant Nos. 42372346, 41872249, and 42272344).

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Banaei, S.M.A.; Javid, A.H.; Hassani, A.H. Numerical Simulation of Groundwater Contaminant Transport in Porous Media. Int. J. Environ. Sci. Technol. 2021, 18, 151–162. [Google Scholar] [CrossRef]
  2. Rahman, S.H.; Khanam, D.; Adyel, T.M.; Islam, M.S.; Ahsan, M.A.; Akbor, M.A. Assessment of Heavy Metal Contamination of Agricultural Soil around Dhaka Export Processing Zone (DEPZ), Bangladesh: Implication of Seasonal Variation and Indices. Appl. Sci. 2012, 2, 584. [Google Scholar] [CrossRef]
  3. Chen, R.; Teng, Y.; Chen, H.; Yue, W.; Su, X.; Liu, Y.; Zhang, Q. A Coupled Optimization of Groundwater Remediation Alternatives Screening under Health Risk Assessment: An Application to a Petroleum-Contaminated Site in a Typical Cold Industrial Region in Northeastern China. J. Hazard. Mater. 2021, 407, 124796. [Google Scholar] [CrossRef]
  4. He, Q.; He, Y.; Hu, H.; Lou, W.; Zhang, Z.; Zhang, K.; Chen, Y.; Ye, W.; Sun, J. Laboratory Investigation on the Retention Performance of a Soil–Bentonite Mixture Used as an Engineered Barrier: Insight into the Effects of Ionic Strength and Associated Heavy Metal Ions. Environ. Sci. Pollut. Res. 2023, 30, 50162–50173. [Google Scholar] [CrossRef]
  5. Zhu, F.; Liu, T.; Zhang, Z.; Liang, W. Remediation of Hexavalent Chromium in Column by Green Synthesized Nanoscale Zero-Valent Iron/Nickel: Factors, Migration Model and Numerical Simulation. Ecotoxicol. Environ. Saf. 2021, 207, 111572. [Google Scholar] [CrossRef]
  6. Xu, X.; Huang, H.; Zhang, Y.; Xu, Z.; Cao, X. Biochar as Both Electron Donor and Electron Shuttle for the Reduction Transformation of Cr(VI) during Its Sorption. Environ. Pollut. 2019, 244, 423–430. [Google Scholar] [CrossRef]
  7. Xu, Z.; Xu, X.; Tao, X.; Yao, C.; Tsang, D.C.W.; Cao, X. Interaction with Low Molecular Weight Organic Acids Affects the Electron Shuttling of Biochar for Cr(VI) Reduction. J. Hazard. Mater. 2019, 378, 120705. [Google Scholar] [CrossRef] [PubMed]
  8. DesMarias, T.L.; Costa, M. Mechanisms of Chromium-Induced Toxicity. Curr. Opin. Toxicol. 2019, 14, 1–7. [Google Scholar] [CrossRef] [PubMed]
  9. Zhang, N.; Fang, Z.; Zhang, R. Comparison of Several Amendments for In-Site Remediating Chromium-Contaminated Farmland Soil. Water Air Soil Pollut. 2017, 228, 400. [Google Scholar] [CrossRef]
  10. He, Q.; He, Y.; Zhang, Z.; Ou, G.; Zhu, K.; Lou, W.; Zhang, K.; Chen, Y.; Ye, W. Spatiotemporal Distribution and Pollution Control of Pollutants in a Cr(VI)-Contaminated Site Located in Southern China. Chemosphere 2023, 340, 139897. [Google Scholar] [CrossRef]
  11. He, Y.; Hu, G.; Wu, D.; Zhu, K.; Zhang, K. Contaminant Migration and the Retention Behavior of a Laterite–Bentonite Mixture Engineered Barrier in a Landfill. J. Environ. Manag. 2022, 304, 114338. [Google Scholar] [CrossRef]
  12. Deng, H.; Zhou, S.; He, Y.; Lan, Z.; Zou, Y.; Mao, X. Efficient Calibration of Groundwater Contaminant Transport Models Using Bayesian Optimization. Toxics 2023, 11, 438. [Google Scholar] [CrossRef]
  13. Liu, L.; Liang, S.; Liu, H.; Tan, W.; Zhu, G. Migration of C r 2 O 7 2 and Butanone in Soil and Groundwater System After the Tianjin Port 8·12 Explosion. Trans. Tianjin Univ. 2018, 24, 522–531. [Google Scholar] [CrossRef]
  14. He, Y.; Hu, G.; Zhang, Z.; Lou, W.; Zou, Y.; Li, X.; Zhang, K. Experimental Study and Numerical Simulation on the Migration and Transformation Mechanism of Hexavalent Chromium in Contaminated Site. Rock Soil Mech. 2022, 43, 528–538. [Google Scholar] [CrossRef]
  15. Wu, X.; Ye, T.; Xie, C.; Li, K.; Liu, C.; Yang, Z.; Han, R.; Wu, H.; Wang, Z. Experimental and Modeling Study on Cr(VI) Migration from Slag into Soil and Groundwater. Processes 2022, 10, 2235. [Google Scholar] [CrossRef]
  16. Guo, S.; Xu, Y.; Yang, J. Simulating the Migration and Species Distribution of Cr and Inorganic Ions from Tanneries in the Vadose Zone. J. Environ. Manag. 2021, 288, 112441. [Google Scholar] [CrossRef] [PubMed]
  17. Liu, L.; Xiao, L.; Li, X. Simulation of Migration of Hexavalent Chromium in Groundwater. Xitong Fangzhen Xuebao/J. Syst. Simul. 2018, 30, 560–568. [Google Scholar] [CrossRef]
  18. Xue, L.; Zhang, D.; Guadagnini, A.; Neuman, S.P. Multimodel B Ayesian Analysis of Groundwater Data Worth. Water Resour. Res. 2014, 50, 8481–8496. [Google Scholar] [CrossRef]
  19. Chitsazan, N.; Nadiri, A.A.; Tsai, F.T.-C. Prediction and Structural Uncertainty Analyses of Artificial Neural Networks Using Hierarchical Bayesian Model Averaging. J. Hydrol. 2015, 528, 52–62. [Google Scholar] [CrossRef]
  20. Moazamnia, M.; Hassanzadeh, Y.; Nadiri, A.A.; Khatibi, R.; Sadeghfam, S. Formulating a Strategy to Combine Artificial Intelligence Models Using Bayesian Model Averaging to Study a Distressed Aquifer with Sparse Data Availability. J. Hydrol. 2019, 571, 765–781. [Google Scholar] [CrossRef]
  21. Troldborg, M.; Nowak, W.; Tuxen, N.; Bjerg, P.L.; Helmig, R.; Binning, P.J. Uncertainty Evaluation of Mass Discharge Estimates from a Contaminated Site Using a Fully Bayesian Framework. Water Resour. Res. 2010, 46, 2010WR009227. [Google Scholar] [CrossRef]
  22. Delottier, H.; Pryet, A.; Dupuy, A. Why Should Practitioners Be Concerned about Predictive Uncertainty of Groundwater Management Models? Water Resour. Manag. 2017, 31, 61–73. [Google Scholar] [CrossRef]
  23. Nemati, M.; Tabari, M.M.R.; Hosseini, S.A.; Javadi, S. A Novel Approach Using Hybrid Fuzzy Vertex Method-MATLAB Framework Based on GMS Model for Quantifying Predictive Uncertainty Associated with Groundwater Flow and Transport Models. Water Resour. Manag. 2021, 35, 4189–4215. [Google Scholar] [CrossRef]
  24. Troldborg, L.; Ondracek, M.; Koch, J.; Kidmose, J.; Refsgaard, J.C. Quantifying Stratigraphic Uncertainty in Groundwater Modelling for Infrastructure Design. Hydrogeol. J. 2021, 29, 1075–1089. [Google Scholar] [CrossRef]
  25. Bordbar, M.; Neshat, A.; Javadi, S. A New Hybrid Framework for Optimization and Modification of Groundwater Vulnerability in Coastal Aquifer. Environ. Sci. Pollut. Res. 2019, 26, 21808–21827. [Google Scholar] [CrossRef] [PubMed]
  26. Yan, X.; Lu, W.; An, Y.; Chang, Z. Uncertainty Analysis of Parameters in Non-Point Source Pollution Simulation: Case Study of the Application of the Soil and Water Assessment Tool Model to Yitong River Watershed in Northeast China. Water Environ. J. 2019, 33, 390–400. [Google Scholar] [CrossRef]
  27. Xing, Y.; Shao, D.; Yang, Y.; Ma, X.; Zhang, S. Influence and Interactions of Input Factors in Urban Flood Inundation Modeling: An Examination with Variance-Based Global Sensitivity Analysis. J. Hydrol. 2021, 600, 126524. [Google Scholar] [CrossRef]
  28. van Griensven, A.; Meixner, T.; Grunwald, S.; Bishop, T.; Diluzio, M.; Srinivasan, R. A Global Sensitivity Analysis Tool for the Parameters of Multi-Variable Catchment Models. J. Hydrol. 2006, 324, 10–23. [Google Scholar] [CrossRef]
  29. Tansar, H.; Duan, H.-F.; Mark, O. Global Sensitivity Analysis of Bioretention Cell Design for Stormwater System: A Comparison of VARS Framework and Sobol Method. J. Hydrol. 2023, 617, 128895. [Google Scholar] [CrossRef]
  30. Nolan, B.T.; Healy, R.W.; Taber, P.E.; Perkins, K.; Hitt, K.J.; Wolock, D.M. Factors Influencing Ground-Water Recharge in the Eastern United States. J. Hydrol. 2007, 332, 187–205. [Google Scholar] [CrossRef]
  31. Hornberger, G.M.; Spear, R.C. Approach to the Preliminary Analysis of Environmental Systems. J. Environ. Manag. 1981, 12, 7–18. [Google Scholar]
  32. Morris, M.D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174. [Google Scholar] [CrossRef]
  33. Sobolprime, I.M. Sensitivity Analysis for Non-Linear Mathematical Models. Math. Model. Comput. Exp. 1993, 1, 407–414. [Google Scholar]
  34. Razavi, S.; Gupta, H.V. A New Framework for Comprehensive, Robust, and Efficient Global Sensitivity Analysis: 1. Theory. Water Resour. Res. 2016, 52, 423–439. [Google Scholar] [CrossRef]
  35. Razavi, S.; Gupta, H.V. A New Framework for Comprehensive, Robust, and Efficient Global Sensitivity Analysis: 2. Application. Water Resour. Res. 2016, 52, 440–455. [Google Scholar] [CrossRef]
  36. Loucks, D.P.; van Beek, E. System Sensitivity and Uncertainty Analysis. In Water Resource Systems Planning and Management; Springer: Cham, Switzerland, 2017; pp. 331–374. ISBN 978-3-319-44234-1. [Google Scholar]
  37. Jiang, X.; Na, J.; Lu, W.; Zhang, Y. Coupled Monte Carlo Simulation and Copula Theory for Uncertainty Analysis of Multiphase Flow Simulation Models. Environ. Sci. Pollut. Res. 2017, 24, 24284–24296. [Google Scholar] [CrossRef] [PubMed]
  38. Zhang, Z.; Wang, X.; Li, M. Uncertainty analysis of WASP based on global sensitivity analysis method. China Environ. Sci. 2014, 34, 1336–1346. [Google Scholar]
  39. Luo, J.; Liu, Y.; Li, X.; Xin, X.; Lu, W. Inversion of Groundwater Contamination Source Based on a Two-Stage Adaptive Surrogate Model-Assisted Trust Region Genetic Algorithm Framework. Appl. Math. Model. 2022, 112, 262–281. [Google Scholar] [CrossRef]
  40. Li, S.; Farrar, C.; Yang, Y. Efficient Regional Seismic Risk Assessment via Deep Generative Learning of Surrogate Models. Earthq. Eng. Struct. Dyn. 2023, 52, 3435–3454. [Google Scholar] [CrossRef]
  41. Rajabi, M.M. Review and Comparison of Two Meta-Model-Based Uncertainty Propagation Analysis Methods in Groundwater Applications: Polynomial Chaos Expansion and Gaussian Process Emulation. Stoch Environ. Res. Risk Assess. 2019, 33, 607–631. [Google Scholar] [CrossRef]
  42. Stone, N. Gaussian Process Emulators for Uncertainty Analysis in Groundwater Flow. Ph.D. Thesis, University of Nottingham, Nottingham, UK, 2011.
  43. Miao, T.; Lu, W.; Lin, J.; Guo, J.; Liu, T. Modeling and Uncertainty Analysis of Seawater Intrusion in Coastal Aquifers Using a Surrogate Model: A Case Study in Longkou, China. Arab. J. Geosci. 2018, 12, 1.
  44. Miao, T.; Lu, W.; Luo, J.; Guo, J. Application of Set Pair Analysis and Uncertainty Analysis in Groundwater Pollution Assessment and Prediction: A Case Study of a Typical Molybdenum Mining Area in Central Jilin Province, China. Environ. Earth Sci. 2019, 78, 323.
  45. Miao, T.; Lu, W.; Guo, J.; Lin, J.; Fan, Y. Modeling and Uncertainty Analysis of Seawater Intrusion Based on Surrogate Models. Environ. Sci. Pollut. Res. 2019, 26, 26015–26025.
  46. Fan, Y.; Wu, Q.; Cui, H.; Lu, W.; Ren, W. Stochastic Simulation of Seawater Intrusion in the Longkou Area of China Based on the Monte Carlo Method. Environ. Sci. Pollut. Res. 2023, 30, 22063–22077.
  47. Thiros, N.E.; Gardner, W.P.; Maneta, M.P.; Brinkerhoff, D.J. Quantifying Subsurface Parameter and Transport Uncertainty Using Surrogate Modelling and Environmental Tracers. Hydrol. Process. 2022, 36, e14743.
  48. Han, Z.; Lu, W.; Lin, J. Uncertainty Analysis for Precipitation and Sea-Level Rise of a Variable-Density Groundwater Simulation Model Based on Surrogate Models. Environ. Sci. Pollut. Res. 2020, 27, 28077–28090.
  49. Fan, Y.; Lu, W.; Miao, T.; Li, J.; Lin, J. Optimum Design of a Seawater Intrusion Monitoring Scheme Based on the Image Quality Assessment Method. Water Resour. Manag. 2020, 34, 2485–2502.
  50. Špetlík, M.; Březina, J. Groundwater Contaminant Transport Solved by Monte Carlo Methods Accelerated by Deep Learning Meta-Model. Appl. Sci. 2022, 12, 7382.
  51. Miao, T.; Huang, H.; Guo, J.; Li, G.; Zhang, Y.; Chen, N. Uncertainty Analysis of Numerical Simulation of Seawater Intrusion Using Deep Learning-Based Surrogate Model. Water 2022, 14, 2933.
  52. Yu, X.; Cui, T.; Sreekanth, J.; Mangeon, S.; Doble, R.; Xin, P.; Rassam, D.; Gilfedder, M. Deep Learning Emulators for Groundwater Contaminant Transport Modelling. J. Hydrol. 2020, 590, 125351.
  53. Wang, R.-Z.; Gu, H.-H.; Liu, Y.; Miura, H.; Zhang, X.-C.; Tu, S.-T. Surrogate-Modeling-Assisted Creep-Fatigue Reliability Assessment in a Low-Pressure Turbine Disc Considering Multi-Source Uncertainty. Reliab. Eng. Syst. Saf. 2023, 240, 109550.
  54. Zhang, L.; Lin, P. Multi-Objective Optimization for Limiting Tunnel-Induced Damages Considering Uncertainties. Reliab. Eng. Syst. Saf. 2021, 216, 107945.
  55. Aquaveo, LLC. Groundwater Modeling System (GMS, v.10.6). Available online: https://www.aquaveo.com/downloads-gms (accessed on 10 November 2023).
  56. He, H.; Shan, H.; Mo, D.; Liu, Y.; Peng, S.; Cheng, Y.; Chen, M.; Yan, Z. Simulation Study on the Environmental Impact of Rare Earth Ore Development on Groundwater in Hilly Areas: A Case Study in Nuodong, China. Water 2023, 15, 263.
  57. Rabemaharitra, T.P.; Zou, Y.; Yi, Z.; He, Y.; Khan, U. Optimized Pilot Point Emplacement Based Groundwater Flow Calibration Method for Heterogeneous Small-Scale Area. Appl. Sci. 2022, 12, 4648.
  58. Konikow, L.F. The Secret to Successful Solute-Transport Modeling. Ground Water 2011, 49, 144–159.
  59. Kresic, N. Hydrogeology and Groundwater Modeling, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2006; ISBN 978-0-429-12210-1.
  60. Zheng, C.; Bennett, G. Applied Contaminant Transport Modeling; Wiley-Interscience: New York, NY, USA, 2002; Volume 34, ISBN 978-0-471-38477-9.
  61. Liu, L.; Chen, J.; Niu, H.; Li, L.; Yin, L.; Wei, Y. Numerical simulation of three-dimensional soil-groundwater coupled chromium contamination based on FEFLOW. Hydrogeol. Eng. Geol. 2022, 49, 164–174.
  62. Razavi, S.; Gupta, H.V. A Multi-Method Generalized Global Sensitivity Matrix Approach to Accounting for the Dynamical Nature of Earth and Environmental Systems Models. Environ. Model. Softw. 2019, 114, 1–11.
  63. Haghnegahdar, A.; Razavi, S. Insights into Sensitivity Analysis of Earth and Environmental Systems Models: On the Impact of Parameter Perturbation Scale. Environ. Model. Softw. 2017, 95, 115–131.
  64. Sheikholeslami, R.; Razavi, S. Progressive Latin Hypercube Sampling: An Efficient Approach for Robust Sampling-Based Analysis of Environmental Models. Environ. Model. Softw. 2017, 93, 109–126.
  65. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  66. Yan, Y.; Li, X.; Sun, W.; Fang, X.; He, F.; Tu, J. Semi-Surrogate Modelling of Droplets Evaporation Process via XGBoost Integrated CFD Simulations. Sci. Total Environ. 2023, 895, 164968.
  67. Chen, Y.; Li, F.; Zhou, S.; Zhang, X.; Zhang, S.; Zhang, Q.; Su, Y. Bayesian Optimization Based Random Forest and Extreme Gradient Boosting for the Pavement Density Prediction in GPR Detection. Constr. Build. Mater. 2023, 387, 131564.
  68. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  69. Tian, L.; Hu, L.; Wang, D.; Cao, X. Site-Scale Groundwater Pollution Risk Assessment Using Surrogate Models and Statistical Analysis. J. Contam. Hydrol. 2024, 261, 104288.
  70. Iman, R.L. Latin Hypercube Sampling. In Encyclopedia of Quantitative Risk Analysis and Assessment; Melnick, E.L., Everitt, B.S., Eds.; Wiley: Hoboken, NJ, USA, 2008; ISBN 978-0-470-03549-8.
  71. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 2623–2631.
  72. Luo, J.; Ma, X.; Ji, Y.; Li, X.; Song, Z.; Lu, W. Review of Machine Learning-Based Surrogate Models of Groundwater Contaminant Modeling. Environ. Res. 2023, 238, 117268.
  73. Xie, W.; Nelson, B.L.; Barton, R.R. Statistical Uncertainty Analysis for Stochastic Simulation. arXiv 2020, arXiv:2011.04207.
  74. Zhang, J. Modern Monte Carlo Methods for Efficient Uncertainty Quantification and Propagation: A Survey. WIREs Comput. Stats 2021, 13, e1539.
  75. Pan, Z.; Lu, W.; Wang, H.; Bai, Y. Groundwater Contaminant Source Identification Based on an Ensemble Learning Search Framework Associated with an Auto Xgboost Surrogate. Environ. Model. Softw. 2023, 159, 105588.
Figure 1. Location of study area.
Figure 2. Technical roadmap of proposed methodology.
Figure 3. The boundary of study area, distribution of monitoring wells, and locations of sensitivity analysis test points.
Figure 4. Three-dimensional cell-centered finite-difference grid of the study area.
Figure 5. Initial contaminant concentration in groundwater [61].
Figure 6. Groundwater flow patterns in steady state.
Figure 7. Results of hydraulic head calibration: (a) scatter plot of simulated versus observed hydraulic heads; (b) comparison of observed and simulated hydraulic heads at ten monitoring wells.
Figure 8. Transport of Cr(VI) in the study area: (a) 6 months; (b) 12 months; (c) 18 months; (d) 24 months; (e) 30 months; (f) 36 months.
Figure 9. Cr(VI) concentration curves over time at the monitoring wells.
Figure 10. Directional variogram results for points 01–03 (a–c).
Figure 11. IVARS-50 sensitivity index results for points 01–03 (a–c).
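Figures 10 and 11 summarize the VARS analysis. Under the standard VARS definitions (Razavi and Gupta), the directional variogram of a response y along factor i is γ_i(h) = ½·E[(y(x + h·e_i) − y(x))²], and IVARS-50 integrates γ_i(h) over perturbation scales h from 0 to 0.5 of the normalized factor range. The sketch below estimates both with plain random sampling and trapezoidal integration; it is a simplified estimator, not a full STAR-VARS implementation, and `toy_model`, the sample size, and the h-grid are hypothetical choices for illustration only.

```python
import random


def directional_variogram(model, dim, i, h, n_samples=2000, seed=0):
    """Monte Carlo estimate of gamma_i(h) = 0.5 * E[(y(x + h*e_i) - y(x))**2].

    All factors are assumed to be scaled to [0, 1].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]
        # Draw x_i from [0, 1 - h] so the perturbed point stays inside the unit range.
        x[i] = rng.random() * (1.0 - h)
        x_shifted = list(x)
        x_shifted[i] += h
        diff = model(x_shifted) - model(x)
        total += diff * diff
    return 0.5 * total / n_samples


def ivars(model, dim, i, h_max=0.5, n_steps=10, **kwargs):
    """Trapezoidal integral of gamma_i(h) over [0, h_max]; h_max=0.5 gives IVARS-50."""
    step = h_max / n_steps
    gammas = [
        directional_variogram(model, dim, i, k * step, **kwargs)
        for k in range(n_steps + 1)
    ]
    return step * (0.5 * gammas[0] + sum(gammas[1:-1]) + 0.5 * gammas[-1])


def toy_model(x):
    """Hypothetical response: factor 0 is strongest, factor 2 is weakest."""
    return 2.0 * x[0] + x[1] ** 2 + 0.1 * x[2]


if __name__ == "__main__":
    for i in range(3):
        print(f"factor {i}: IVARS-50 = {ivars(toy_model, dim=3, i=i):.4f}")
```

As a check, a linear response y = 2x₀ has γ₀(h) = 2h² exactly, so IVARS-50 = 2(0.5)³/3 ≈ 0.0833; the 10-step trapezoidal grid above returns 0.08375. Larger IVARS-50 values indicate factors whose perturbation changes the output more across scales, which is how the index ranks parameters.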
Figure 12. MSE and R² across 300 trials of XGBoost and RF during hyperparameter optimization.
Figure 13. Fitting results of the surrogate models: (a) XGBoost; (b) RF.
Figure 14. Accuracy of surrogate models for each monitoring well: (a) R²; (b) MSE; (c) RMSE; (d) MRE (%).
Figure 15. Frequency histograms of Cr(VI) concentration at monitoring wells O1–O8 (a–h).
Table 1. Values and ranges of parameters of the medium sand–silt-round gravel confined aquifer.

| Parameter | Value | Range |
|---|---|---|
| Hydraulic Conductivity (m/s) | 5.32 × 10⁻⁵ | 4.25 × 10⁻⁵–6.38 × 10⁻⁵ |
| Recharge Rate (m/s) | 4.3 × 10⁻⁹ | 3.49 × 10⁻⁹–5.25 × 10⁻⁹ |
| Specific Storage (m⁻¹) | 1.0 × 10⁻⁴ | 8.0 × 10⁻⁵–1.2 × 10⁻⁴ |
| Specific Yield | 0.20 | 0.16–0.24 |
| Porosity | 0.40 | 0.32–0.48 |
| Longitudinal Dispersivity (m) | 13 | 10.4–15.6 |
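Each range in Table 1 spans roughly ±20% of the listed value. The reference list includes Latin hypercube sampling sources [64,70], so a minimal LHS sketch over these ranges is shown below. The plain stratified-uniform scheme, the dictionary keys, the sample size, and the seed are illustrative assumptions, not the authors' exact sampling design.

```python
import random

# Lower and upper bounds taken from Table 1 (keys are hypothetical names).
RANGES = {
    "hydraulic_conductivity_m_per_s": (4.25e-5, 6.38e-5),
    "recharge_rate_m_per_s": (3.49e-9, 5.25e-9),
    "specific_storage_per_m": (8.0e-5, 1.2e-4),
    "specific_yield": (0.16, 0.24),
    "porosity": (0.32, 0.48),
    "longitudinal_dispersivity_m": (10.4, 15.6),
}


def latin_hypercube(ranges, n, seed=None):
    """Draw n parameter sets so that each parameter's range is split into
    n equal-width strata and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across parameters
        width = (high - low) / n
        # One uniform draw inside each assigned stratum.
        columns[name] = [low + (k + rng.random()) * width for k in strata]
    return [{name: columns[name][j] for name in ranges} for j in range(n)]


if __name__ == "__main__":
    for row in latin_hypercube(RANGES, n=5, seed=42):
        print({name: f"{value:.3g}" for name, value in row.items()})
```

Each sampled parameter set would then be run through the numerical model (or a trained surrogate) to generate training data or Monte Carlo realizations. Compared with independent uniform draws, LHS guarantees that every part of each range is covered even for modest sample sizes.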
Table 3. Hyperparameters optimized with Optuna and their search ranges.

| Surrogate Model | Hyperparameter | Description | Lower Value | Upper Value |
|---|---|---|---|---|
| XGBoost | n_estimators | Total number of boosting trees | 50 | 1000 |
| XGBoost | max_depth | Maximum tree depth | 1.0 | 10 |
| XGBoost | learning_rate | Boosting learning rate | 0.01 | 0.2 |
| XGBoost | gamma | Minimum loss reduction required to split a node (tree growth control) | 1.0 × 10⁻⁵ | 1.0 |
| XGBoost | reg_lambda | L2 regularization term weight | 1.0 × 10⁻⁵ | 1.0 |
| XGBoost | reg_alpha | L1 regularization term weight | 1.0 × 10⁻⁵ | 1.0 |
| XGBoost | subsample | Random sampling fraction | 0.5 | 1.0 |
| XGBoost | colsample_bytree | Feature selection fraction | 0.5 | 1.0 |
| XGBoost | scale_pos_weight | Class imbalance correction factor | 1.0 | 10.0 |
| RF | n_estimators | Total number of trees in the forest | 50 | 1000 |
| RF | max_depth | Maximum tree depth | 1.0 | 10 |
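Table 3 defines the space explored in the 300 Optuna trials shown in Figure 12. A minimal sketch of how such a space can be written with Optuna's `Trial.suggest_int` and `Trial.suggest_float` methods follows. Treating `max_depth` as an integer and sampling `gamma`, `reg_lambda`, and `reg_alpha` on a log scale are assumptions suggested by the table's ranges, not settings confirmed by the source; `MidpointTrial` is a hypothetical stand-in so the snippet runs without Optuna installed.

```python
import math


def suggest_xgboost_params(trial):
    """Sample XGBoost hyperparameters within the Table 3 ranges.

    `trial` must expose Optuna-style suggest_int/suggest_float methods.
    Integer max_depth and log-scale sampling are assumptions.
    """
    return {
        "n_estimators": trial.suggest_int("n_estimators", 50, 1000),
        "max_depth": trial.suggest_int("max_depth", 1, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.2),
        "gamma": trial.suggest_float("gamma", 1e-5, 1.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-5, 1.0, log=True),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-5, 1.0, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "scale_pos_weight": trial.suggest_float("scale_pos_weight", 1.0, 10.0),
    }


def suggest_rf_params(trial):
    """Sample random forest hyperparameters within the Table 3 ranges."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 50, 1000),
        "max_depth": trial.suggest_int("max_depth", 1, 10),
    }


class MidpointTrial:
    """Hypothetical stand-in for optuna.trial.Trial that returns range midpoints."""

    def suggest_int(self, name, low, high, **kwargs):
        return (low + high) // 2

    def suggest_float(self, name, low, high, *, log=False, **kwargs):
        # Geometric midpoint for log-scaled ranges, arithmetic midpoint otherwise.
        return math.sqrt(low * high) if log else (low + high) / 2.0


if __name__ == "__main__":
    print(suggest_xgboost_params(MidpointTrial()))
    print(suggest_rf_params(MidpointTrial()))
```

In an actual study, the returned dictionary would be unpacked into `xgboost.XGBRegressor(**params)` or `sklearn.ensemble.RandomForestRegressor(**params)` inside an objective that returns validation MSE, which is then minimized with `optuna.create_study(direction="minimize")` and `study.optimize(objective, n_trials=300)`.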
Table 4. Optimal hyperparameter values identified by Optuna hyperparameter optimization.

| Surrogate Model | Hyperparameter | Optimal Value |
|---|---|---|
| XGBoost | n_estimators | 984 |
| XGBoost | max_depth | 2.00 |
| XGBoost | learning_rate | 0.069 |
| XGBoost | gamma | 0.012 |
| XGBoost | reg_lambda | 0.929 |
| XGBoost | reg_alpha | 0.105 |
| XGBoost | subsample | 0.584 |
| XGBoost | colsample_bytree | 0.872 |
| XGBoost | scale_pos_weight | 2.877 |
| RF | n_estimators | 318 |
| RF | max_depth | 2.00 |
Table 5. Comparison of accuracy of surrogate models.

| Surrogate Model | R² | Mean Relative Error (%) | MSE | RMSE |
|---|---|---|---|---|
| XGBoost | 0.976 | 1.554 | 0.475 | 0.689 |
| RF | 0.934 | 2.711 | 1.238 | 1.113 |
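Table 5 compares the surrogates with four metrics. A minimal sketch of these metrics on paired simulator and surrogate outputs follows. The MRE formula used here, mean(|predicted − observed| / |observed|) × 100, is an assumed definition because it is not spelled out in this section. RMSE is simply √MSE, which the table satisfies (√0.475 ≈ 0.689 and √1.238 ≈ 1.113).

```python
import math


def surrogate_metrics(observed, predicted):
    """Return R^2, mean relative error (%), MSE and RMSE for paired outputs.

    Assumes observed values are non-zero (for MRE) and not all equal (for R^2).
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MRE_percent": 100.0 * sum(abs(r) / abs(o) for o, r in zip(observed, residuals)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


if __name__ == "__main__":
    # Hypothetical simulator outputs and surrogate predictions (mg/L).
    print(surrogate_metrics([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 31.0, 39.0]))
```

R² and MSE are dominated by the high-concentration wells, whereas MRE weights every well by its own magnitude, which is why reporting both kinds of metric is informative.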
Table 6. Statistical summary of the output concentration values of the contaminant at the monitoring wells.

| Well Number | Maximum Value (mg/L) | Minimum Value (mg/L) | Mean Value (mg/L) | Standard Deviation (mg/L) |
|---|---|---|---|---|
| O1 | 22.62 | 16.58 | 20.28 | 1.15 |
| O2 | 38.99 | 29.70 | 35.11 | 1.76 |
| O3 | 109.75 | 92.48 | 101.09 | 3.49 |
| O4 | 6.74 | 4.11 | 5.51 | 0.56 |
| O5 | 84.97 | 70.83 | 79.27 | 2.93 |
| O6 | 105.93 | 89.36 | 99.46 | 3.29 |
| O7 | 82.72 | 28.35 | 56.90 | 11.53 |
| O8 | 36.14 | 1.04 | 14.12 | 6.87 |
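Table 6 summarizes the Monte Carlo outputs at each well, and Figure 15 shows their frequency histograms. A minimal sketch of both summaries for a single well follows. The demo input is synthetic, drawn from a normal distribution with O1's mean and standard deviation purely for illustration, and using the sample (n − 1) standard deviation is an assumption.

```python
import random
import statistics


def summarize_realizations(concentrations):
    """Max, min, mean and sample standard deviation of one well's outputs."""
    return {
        "max": max(concentrations),
        "min": min(concentrations),
        "mean": statistics.mean(concentrations),
        "std": statistics.stdev(concentrations),  # sample (n - 1) SD assumed
    }


def frequency_histogram(values, bins=20):
    """Counts of values in equal-width bins spanning [min, max]."""
    low, high = min(values), max(values)
    if high == low:
        return [len(values)] + [0] * (bins - 1)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the upper edge, so clamp it into the last bin.
        counts[min(int((v - low) / width), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic_o1 = [rng.gauss(20.28, 1.15) for _ in range(1000)]  # synthetic data
    print(summarize_realizations(synthetic_o1))
    print(frequency_histogram(synthetic_o1, bins=10))
```

A wide spread relative to the mean, as at O7 and O8, is what indicates that parameter uncertainty strongly affects the simulated concentration at that well.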
Table 7. Interval estimation for each contaminant monitoring well.

| Monitoring Well | Confidence Interval at 90% Confidence Level (mg/L) | Confidence Interval at 60% Confidence Level (mg/L) |
|---|---|---|
| O1 | 20.22–20.34 | 20.25–20.31 |
| O2 | 35.01–35.20 | 35.06–35.15 |
| O3 | 100.91–101.27 | 101.00–101.18 |
| O4 | 5.48–5.54 | 5.49–5.53 |
| O5 | 79.12–79.42 | 79.19–79.35 |
| O6 | 99.29–99.63 | 99.37–99.55 |
| O7 | 56.31–57.51 | 56.59–57.21 |
| O8 | 13.77–14.48 | 13.94–14.31 |
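The intervals in Table 7 are narrow and centered on the Table 6 means, which is consistent with a confidence interval for the mean output. A minimal sketch assuming the normal-approximation interval mean ± z·s/√n follows. The number of realizations n is not stated in this section; 1000 is used only as an assumed value. Under that assumption the O1 statistics from Table 6 give 20.22–20.34 mg/L at 90% and 20.25–20.31 mg/L at 60%, matching the O1 row, although this does not confirm the authors' actual procedure.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, std, n, level):
    """Normal-approximation interval mean +/- z * std / sqrt(n) for the mean output."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # O1 mean and SD from Table 6; n = 1000 realizations is a hypothetical value.
    for level in (0.90, 0.60):
        low, high = mean_confidence_interval(20.28, 1.15, 1000, level)
        print(f"{level:.0%}: {low:.2f}-{high:.2f} mg/L")
```

Because the half-width shrinks as 1/√n, these intervals describe how precisely the Monte Carlo mean is estimated; the spread of individual realizations is better read from the standard deviations in Table 6.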

Share and Cite

MDPI and ACS Style

Zou, Y.; Yousaf, M.S.; Yang, F.; Deng, H.; He, Y. Surrogate-Based Uncertainty Analysis for Groundwater Contaminant Transport in a Chromium Residue Site Located in Southern China. Water 2024, 16, 638. https://doi.org/10.3390/w16050638

Chicago/Turabian Style

Zou, Yanhong, Muhammad Shahzad Yousaf, Fuqiang Yang, Hao Deng, and Yong He. 2024. "Surrogate-Based Uncertainty Analysis for Groundwater Contaminant Transport in a Chromium Residue Site Located in Southern China" Water 16, no. 5: 638. https://doi.org/10.3390/w16050638

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
