Article

Empirical Bayesian Kriging, a Robust Method for Spatial Data Interpolation of a Large Groundwater Quality Dataset from the Western Netherlands

by Mojtaba Zaresefat 1,*, Reza Derakhshani 2,3 and Jasper Griffioen 1,4
1 Copernicus Institute of Sustainable Development, Utrecht University, Princetonlaan 8a, 3584 CB Utrecht, The Netherlands
2 Department of Geology, Shahid Bahonar University of Kerman, Kerman 76169-13439, Iran
3 Department of Earth Sciences, Utrecht University, Princetonlaan 8a, 3584 CB Utrecht, The Netherlands
4 TNO Geological Survey of the Netherlands, Princetonlaan 6, 3584 CB Utrecht, The Netherlands
* Author to whom correspondence should be addressed.
Water 2024, 16(18), 2581; https://doi.org/10.3390/w16182581
Submission received: 17 July 2024 / Revised: 3 September 2024 / Accepted: 9 September 2024 / Published: 12 September 2024
(This article belongs to the Special Issue Water, Geohazards, and Artificial Intelligence, 2nd Edition)

Abstract

No single spatial interpolation method reigns supreme for modelling the precise spatial distribution of groundwater quality data. This study addresses this challenge by evaluating and comparing several commonly used interpolation methods: Local Polynomial Interpolation (LPI), Ordinary Kriging (OK), Simple Kriging (SK), Universal Kriging (UK), and Empirical Bayesian Kriging (EBK). We applied these methods to a large dataset of 3033 groundwater records covering a substantial area (11,100 km²) in the coastal lowlands of the western Netherlands. To our knowledge, no prior research has investigated these interpolation methods in this specific hydrogeological setting, which exhibits a range of groundwater qualities, from fresh to saline, often anoxic, with high natural concentrations of PO4 and NH4. The prediction performance of the interpolation methods was assessed through statistical indicators such as the root mean square error. The findings indicated that EBK outperforms the other methods in predicting groundwater quality for the five variables considered: Cl, SO4, Fe, PO4, and NH4. In contrast, SK performed worst for all species except SO4. We recommend not using SK to interpolate groundwater quality species unless the data exhibit low spatial variation, high sample density, or evenly distributed sampling.

1. Introduction

Spatial and temporal information about groundwater’s hydrochemical properties is essential for managing groundwater resources. Obtaining reliable groundwater analyses for a region can be costly and laborious, and samples cannot be collected without monitoring wells, usually resulting in a limited spatial data density. Therefore, the ability to predict water quality in unsampled areas is essential.
Geosciences employ GIS and geostatistical analysis to derive predicted values at unsampled sites [1]. Developing efficient interpolation methods has been a long-standing tradition in GIS. The methods for interpolating spatial data are generally divided into two main categories: deterministic and geostatistical methods. The deterministic approach is a method that uses a mathematical function to compute values at locations that have not been sampled. Distance-weighted smoothing in this approach considers the spatial proximity of variables. Locations closer together are more likely to have similar values than those farther apart. In contrast, geostatistical approaches account for the spatial relationship between data points. These approaches create a surface with inherent spatial dependence, leading to potentially more accurate predictions in unsampled areas [2]. The geostatistics approach analyses the spatial pattern of a parameter across sampled locations and uses this information to build a statistical model that predicts the parameter’s value at unsampled points, considering the distance between them [3,4,5].
Several factors affect interpolation accuracy, such as the sampling design, population size, boundary demarcation, and normality of the dataset. These factors also affect how interpolation techniques can be used [6,7,8,9]. Thus, choosing the most accurate interpolation method is important. In the geospatial analysis of groundwater quality, the most commonly used interpolation techniques are Ordinary Kriging (OK), Simple Kriging (SK), and Universal Kriging (UK) as classical kriging methods, Local Polynomial Interpolation (LPI) as a deterministic method, and Empirical Bayesian Kriging (EBK) [10,11,12].
No single interpolation method has been accepted for groundwater quality studies. For example, ref. [13] investigated the performance of UK and OK, along with EBK, across the 2145 km² Mahvelat plain in Khorasan Razavi, Iran. They found that EBK was the most accurate at predicting salinity, with UK coming second. Seyedmohammadi [14] reported that OK provided the most precise interpolation results for groundwater electrical conductivity (EC) in Guilan Province, Iran. Xiao et al. [15] investigated different interpolation methods across the Yangtze River Estuary (China; 550 km²). OK was a more accurate predictor of total phosphorus concentrations in their analysis. The best methods for interpolating groundwater quality species in the Rumuola Community (135 km²) in Obio-Akpor, Nigeria, were found to be EBK for pH, TDS for sulphate and nitrate, and OK for nickel and hardness based on the relative performance of four interpolation methods [16]. Kumari et al. [17] evaluated the ability of various interpolation methods, including LPI, SK, and EBK, to predict various groundwater quality species over the Ulagalla cascade (51 km²) in Sri Lanka and concluded that EBK was the best method due to its low error. They, however, recommended EBK for smaller datasets and LPI for less variable datasets that do not contain extreme values near boundaries. Overall, the EBK model, as a solid non-stationary algorithm for spatiotemporal interpolation, often performs best among the geostatistical models for interpolating groundwater data [10,13,18].
To our knowledge, no study has evaluated these interpolation methods in the quasi-3D coastal lowlands of the western Netherlands. This study addresses this gap by evaluating and comparing several interpolation methods for a large dataset of groundwater samples (spanning 11,100 km²). This region faces critical challenges related to geological saltwater intrusions (reflected by high Cl and SO4 concentrations), the most recent of which occurred from 50 BC to 400 AD [19], and naturally elevated nutrient levels (NH4 and PO4) [20,21], originating from degradation of sedimentary organic matter and peat. Ferrous Fe (Fe2+) is also included due to its importance in understanding the area’s redox state (hereafter, Fe2+ is presented as Fe, considering negligible Fe3+). We aim to create high-resolution visualisations of these critical groundwater quality species by pinpointing the most effective interpolation method. This will offer valuable insights for managing groundwater resources in this intricate coastal environment, allowing for informed decision-making regarding salinity intrusion, nutrient levels, and redox conditions.

2. Study Area

The low-lying western part of The Netherlands lies in the southeastern North Sea sedimentary basin. A dune belt with a length of almost 150 km and a width of c. 8 km is found along the coast of the North Sea. Polders dominate the landscape behind this dune belt.
The geology is characterised by Holocene and Pleistocene deposits (Figure 1). The surface is predominantly covered by a complex Holocene layer that serves mostly as a confining unit. In the eastern riverine region, this layer is composed mainly of fluvial deposits and peat, while in the western coastal region, it comprises a combination of fluvial, marine deposits, and peat [20]. The thickness of this confining layer increases westward from less than 1 m to over 50 m and is mostly between 5 and 20 m [22]. Below the polder area and the coastal dunes, the geological setting is composed of Late and Middle Pleistocene and some periglacial deposits. Significant fluvial activity occurred during the Middle Pleistocene epochs, with large rivers depositing sand and gravel along their banks [23]. These deposits are recorded in a sequence of geological formations that reflect river dynamics and sea level changes. These formations, listed from oldest to youngest, include the Peize and Waalre (PZWA), Sterksel (ST), and Urk (UR) Formations. The Saalian ice age profoundly impacted the area, with advancing ice sheets from the north lowering sea levels and creating extensive fluvial systems. As the ice retreated, it left behind glacial deposits such as the Drente Formation (DR) and ice-pushed sediments (DT), including till, outwash, and moraines. These glacial deposits are now found both buried and exposed at the surface in the eastern part of the study area as ice-pushed ridges.
In the Late Pleistocene interglacial period, marked by high sea levels, marine deposits formed, stratigraphically recognised as the Eem Formation (EE). During the Weichselian ice age, periglacial conditions prevailed, leading to the deposition of aeolian cover sands from the Boxtel Formation (BX) and fluvial sediments from the Kreftenheye Formation (KR), which blanket much of the older Pleistocene deposits. Figure A1 in Appendix A presents three hydrogeological cross-sections across the study area, illustrating the distribution of aquifers, aquitards, and the complex layer, together with the associated geological formations.
Around 60% of the study area lies near or slightly below sea level (BSL) [24], known as lowland areas. One prominent feature in the northeastern part of the study area is Lake IJsselmeer, which spans around 1100 km2 and has average and maximum water depths of 4.5 and 7 m, respectively. Historically a marine bay, this lake contains fresh water primarily sourced from the River IJssel since the embankment in 1932 [25]. The relatively shallow depths of the lake may influence interactions with underlying groundwater, which depends on specific locations and geological conditions. Water level management in the lowland region involves pumping surface water directly into major rivers or through a network of lakes and canals. In the deepest polders, regional groundwater exfiltration occurs [20]. Additionally, groundwater recharge results from multiple sources, including rainwater infiltration, lake infiltration, seawater intrusion, and the infiltration of inlet water from large rivers. The variations in surface water levels and the hydraulic resistance of the confining layer significantly impact the extent of groundwater recharge. The region’s groundwater quality exhibits variability in several parameters, including salinity, redox state, pH, saturation state for carbonate minerals, and natural nutrients [20]. This variability is influenced by the complex interactions between surface water and groundwater, which are governed by the region’s unique topographical and geological conditions.

3. Materials and Methods

3.1. Data Selection

A dataset of 16,457 groundwater analyses from the Netherlands Geological Survey (TNO) database, spanning from 1970 to 2010 and including depths down to 50 m below sea level (MBLS), was used to identify the optimal interpolation method for visualising spatiotemporal changes in groundwater quality. Groundwater in the western Netherlands has mainly remained stagnant for centuries, with carbon-14 dating indicating ages between 4000 and 5000 years [26]. This long-term stability suggests that paleohydrogeological conditions significantly influence the salinity of Dutch groundwater. In agricultural areas, shallow rainwater infiltration results in rainwater lenses within the confining top layer, while groundwater wells typically access the first aquifer below. Generally, changes in groundwater composition over the past 40 years have been minimal, indicating stable salinity levels over this geological time scale (Figure A2).
During quality control, each analysis underwent checks for duplicates, illogical compound combinations (e.g., alkalinity lower than pH 4), and adherence to electroneutrality principles in water composition. Wells with multiple analyses were represented by the median values calculated using SPSS (Statistical Package for the Social Sciences). This resulted in a dataset of 3033 groundwater records from 1875 wells (some with multiple screen depths). Figure A3 shows the distribution of samples across various depth intervals. The study area was divided into eight horizontal layers, reflecting the overall aquifer stratification. While acknowledging the influence of vertical features on groundwater flow through infiltration and exfiltration processes, this layered approach was deemed more suitable. However, it is important to clarify that groundwater samples from depths of less than 5 m are primarily from coastal dunes and ice-pushed areas. To account for this, we limited the interpolation to these regions for this layer. Furthermore, since the groundwater level is generally within 3 m of the surface [20], the map we created accurately represents the deeper layers being mapped. Section 4.3 discusses the factors influencing 3D map accuracy in this context and justifies the selection of the layered approach.

3.2. Methodology

To achieve the research goals, the work was divided into two stages (Figure 2): (1) data collection, processing, and analysis; and (2) spatial interpolation model comparison and selection. ArcMap’s Geostatistical Analyst (GA) tool was crucial for data analysis. GA offers both deterministic and geostatistical methods for surface mapping. By employing these methods, we could validate the models and determine the optimal interpolation technique for each situation. We created interpolation maps, and the resulting surfaces were converted from GA layers into raster layers using the raster tools.

3.3. Interpolation Methods

This study compared the performance of five interpolation techniques for groundwater quality mapping: Ordinary Kriging (OK), Simple Kriging (SK), Universal Kriging (UK), Local Polynomial Interpolation (LPI), and Empirical Bayesian Kriging (EBK). We begin by providing a concise overview of the core steps involved in each method. Subsequently, we highlight the key strengths, limitations, and parameters influencing the effectiveness of each technique.

3.3.1. LPI Method

The LPI method is a linear regression model with varying regression coefficients [27]. It assesses the desired variable’s dependence on data locations and calculates the values of unknown points by fitting the local polynomial using point regression coefficients only within the specified neighbourhood instead of all [27,28]. The term “neighbourhood” refers to sample points that are close together. The sample points in a neighbourhood can be geographically weighted by their distance from the prediction location [17]. Neighbourhoods can overlap or be used in the next local polynomial. This interpolation method focuses on surface uniformity with a variable relief form and produces surfaces that capture short-range variation [29]. Thus, it may be a good candidate for automatically mapping the data regularly collected from the groundwater monitoring networks, especially in heterogeneous areas. The general Equation (1) used in the LPI method is the following [27]:
$$\hat{Z}(x_0) = Z(X_i) + \varepsilon_i = X(s_i)\,\beta(s) + \varepsilon_i \qquad (1)$$
where $i = 1$ to $n$, $n$ is the number of data locations, $z_i$ is the average of $q_i$ observations made at the $i$th measurement point, $s_i$ are the coordinates of the $i$th measurement point, $Z(X_i)$ is the real value at the location $X_i$, $\varepsilon_i$ is the averaged error of the local area around the $i$th measurement point, $\beta(s)$ are the regression coefficients in the local area around $Z(X_i)$, and $X(s_i)$ are the explanatory variables in the case of geographically weighted regression, including the x, y coordinates.
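The local fitting idea behind LPI can be illustrated with a minimal sketch. It assumes a first-order polynomial, a circular neighbourhood of radius `bandwidth`, and a quadratic distance kernel; these choices and all names are illustrative assumptions, not the settings used in ArcMap's Geostatistical Analyst.

```python
import numpy as np

def local_polynomial_predict(coords, values, target, bandwidth):
    """First-order local polynomial prediction at `target`.

    coords: (n, 2) array of x, y locations; values: (n,) observations.
    bandwidth: hypothetical neighbourhood radius (same units as coords).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)

    # Keep only the sample points inside the neighbourhood.
    dist = np.linalg.norm(coords - target, axis=1)
    inside = dist < bandwidth
    if inside.sum() < 3:
        raise ValueError("need at least 3 neighbours for a first-order fit")

    # Assumed kernel: weights decrease with distance from the target.
    w = 1.0 - (dist[inside] / bandwidth) ** 2

    # Centre the design matrix on the target so the intercept is the prediction.
    d = coords[inside] - target
    X = np.column_stack([np.ones(len(d)), d[:, 0], d[:, 1]])

    # Weighted least squares, solved by scaling rows with sqrt(weight).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], values[inside] * sw, rcond=None)
    return beta[0]
```

Because the fit is repeated for every prediction location with its own neighbourhood, the regression coefficients vary over the map, which is what lets the surface follow short-range variation.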

3.3.2. Classical Kriging Methods

Classical Kriging is a powerful geostatistical tool that estimates property values at unsampled locations based on neighbouring observations. The method requires a model of the semivariogram (Equation (2)), an essential tool for characterising the spatial variability of a variable of interest.
$$\gamma(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \{Z(X_i) - Z(X_i + h)\}^2 \qquad (2)$$
In the equation, $Z(X_i)$ is the value of the variable of interest at location $X_i$, and $Z(X_i + h)$ is the value of the variable at a location $X_i + h$, which is a distance $h$ away from $X_i$. $n(h)$ is the number of data point pairs separated by the distance $h$, and $\{Z(X_i) - Z(X_i + h)\}^2$ is the squared difference between the values at two locations separated by distance $h$.
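As an illustration of Equation (2), the following sketch computes an empirical semivariogram by grouping all point pairs into distance bins. The binning scheme and the function name are assumptions made for this example; in practice, a semivariogram model is then fitted to these binned values.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Return (gamma, counts) per lag bin defined by consecutive bin_edges."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # All unique pairs (i < j), their separation distance and squared difference.
    i, j = np.triu_indices(len(values), k=1)
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)   # NaN marks bins without any pairs
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (h >= bin_edges[b]) & (h < bin_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            # Half the mean squared difference of the pairs in this lag bin.
            gamma[b] = sq_diff[in_bin].sum() / (2.0 * counts[b])
    return gamma, counts
```

For four points on a line at unit spacing with values 0, 1, 2, 3 and bins centred on lags 1, 2, and 3, the function returns semivariances 0.5, 2.0, and 4.5 from 3, 2, and 1 pairs, respectively.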
Classical Kriging views regionalised variables as spatially defined. This perspective allows us to treat regionalised variables probabilistically, even with a single observation. A key application of kriging is the generation of regular grids of estimates for creating contour maps with statistically optimal properties. The estimates are unbiased, meaning the forecast’s expected value aligns with the observations’ expected value. Another significant advantage of kriging is that it provides error variances for any linear estimation method. These can be computed at any location where a kriging estimate is made, allowing for the visualisation of uncertainty on a curved surface. A robust suite of kriging techniques, including Simple Kriging (SK), Ordinary Kriging (OK), Universal Kriging (UK), and others, are employed, each with its own strengths [30].

Simple Kriging (SK)

SK is the most basic kriging method, relying on three key assumptions. Firstly, it assumes that the observations are a partial realisation of a random function, denoted as Z(x), where x represents the spatial location. Secondly, it assumes that this random function is second-order stationary, meaning that the mean, spatial covariance, and semivariance are not dependent on x. Lastly, it assumes that the mean is known. The Simple Kriging estimate is given by Equation (3):
$$\hat{Z}(x_0) = m + \sum_{i=1}^{k} \lambda_i \left[ Z(x_i) - m \right] \qquad (3)$$
where $\hat{Z}(x_0)$ is the predicted value of the function at the location $x_0$, $m$ is the mean value of the function in the neighbourhood, $k$ is the number of measured values, and $\lambda_i$ are the unknown weights for each measured value $Z(x_i)$.
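A minimal sketch of Equation (3) follows. It assumes an exponential covariance function with a hypothetical sill and correlation range; in a real application these would come from the fitted semivariogram model, and the function names are illustrative only.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Assumed exponential covariance; sill and corr_range are placeholders."""
    return sill * np.exp(-np.asarray(h, dtype=float) / corr_range)

def simple_kriging(coords, values, target, mean, cov=exp_cov):
    """Simple Kriging prediction and kriging variance at `target`."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)

    # Covariances among the observations and between observations and target.
    C = cov(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2))
    c0 = cov(np.linalg.norm(coords - target, axis=1))

    # The weights lambda_i solve the linear system C @ lam = c0.
    lam = np.linalg.solve(C, c0)

    prediction = mean + lam @ (values - mean)
    variance = cov(0.0) - lam @ c0
    return prediction, variance
```

Two properties are easy to check with this sketch: at a sampled location the prediction reproduces the observation with zero variance, and far from all observations the prediction falls back to the known mean.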

Ordinary Kriging (OK)

OK relaxes the requirement for a known mean in SK. It assumes the mean is constant but unknown across the area of interest. This allows for wider use of kriging. Like SK, OK relies on the spatial dependence of the data to estimate values at unsampled locations. The OK estimator (Equation (4)) can be written as
$$\hat{Z}(x_0) = m \left( 1 - \sum_{i=1}^{k} \lambda_i \right) + \sum_{i=1}^{k} \lambda_i Z(x_i) \qquad (4)$$
Because the weights are constrained to sum to one, the term containing the unknown mean $m$ vanishes. The estimate requires a vector containing the $k$ observations near the location at which the estimate of the regionalised variable $Z(x)$ is needed.
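The unit-sum constraint on the OK weights is commonly enforced with a Lagrange multiplier, as in the sketch below. The exponential covariance and its parameters are assumptions for illustration, not values from this study.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Assumed exponential covariance; sill and corr_range are placeholders."""
    return sill * np.exp(-np.asarray(h, dtype=float) / corr_range)

def ordinary_kriging(coords, values, target, cov=exp_cov):
    """Ordinary Kriging prediction at `target`; returns (prediction, weights)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    C = cov(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2))
    c0 = cov(np.linalg.norm(coords - target, axis=1))

    # Augment the system with a row and column of ones so that the
    # weights are forced to sum to one (last unknown is the multiplier).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = C
    A[n, n] = 0.0
    b = np.append(c0, 1.0)

    solution = np.linalg.solve(A, b)
    weights = solution[:n]
    return weights @ values, weights
```

Since the weights sum to one, a dataset with a single constant value is reproduced exactly at any location, without the mean ever being specified.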

Universal Kriging (UK)

UK tackles a limitation of Ordinary Kriging (OK). While OK works well for data with a constant average, it struggles with trends. UK addresses this by separating the data into the drift and the residuals.
UK works in three steps: (1) estimate and remove the trend, (2) krige the residuals, and (3) combine both for the final estimate. The usual drift models are first- or second-degree polynomials (Equations (5) and (6)) [31].
$$m(x_0) = \alpha_0 + \sum_{i=1}^{k} \left( \alpha_1 z_{1,i} + \alpha_2 z_{2,i} \right) \qquad (5)$$
$$m(x_0) = \alpha_0 + \sum_{i=1}^{k} \left( \alpha_1 z_{1,i} + \alpha_2 z_{2,i} + \alpha_3 z_{1,i}^2 + \alpha_4 z_{2,i}^2 + \alpha_5 z_{1,i} z_{2,i} \right) \qquad (6)$$
where, in two dimensions, $z_{1,i}$ represents the easting coordinate of observation $i$ and $z_{2,i}$ represents the northing coordinate at the same location. The $\alpha_j$ are the unknown drift coefficients.
The model must also comprise the residuals from the regionalised variable, which adds complexity. Solving this extended system of equations generates a set of weights; for a linear model with two coefficients, at least five observations are required for each estimate. In practice, many more observations are typically used for each estimated location, often 16 to 32 control points.
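The three-step description of UK can be sketched as a detrend, krige, and recombine procedure. Note that this is a simplified stand-in: it fits a first-degree drift by least squares and kriges the residuals with an assumed zero-mean exponential covariance, rather than solving the extended kriging system. All names and parameters are illustrative assumptions.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Assumed exponential covariance; sill and corr_range are placeholders."""
    return sill * np.exp(-np.asarray(h, dtype=float) / corr_range)

def detrend_krige_recombine(coords, values, target, cov=exp_cov):
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)

    # Step 1: fit the drift a0 + a1 * easting + a2 * northing and remove it.
    X = np.column_stack([np.ones(len(coords)), coords])
    alpha, *_ = np.linalg.lstsq(X, values, rcond=None)
    residuals = values - X @ alpha

    # Step 2: krige the residuals (zero-mean simple kriging).
    C = cov(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2))
    c0 = cov(np.linalg.norm(coords - target, axis=1))
    weights = np.linalg.solve(C, c0)
    residual_prediction = weights @ residuals

    # Step 3: add the drift evaluated at the target back to the residual estimate.
    drift = alpha[0] + alpha[1] * target[0] + alpha[2] * target[1]
    return drift + residual_prediction
```

When the data lie exactly on a plane, the residuals are zero and the prediction equals the drift alone, which makes the role of each step easy to verify.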
More details on Classical Kriging methods and deterministic interpolation methods, such as LPI, can be found in Krivoruchko [32] and Li and Heap [33].

3.3.3. Empirical Bayesian Kriging (EBK)

A significant challenge lies in automatically estimating all model parameters, including data transformation and regression coefficients, especially for large datasets spanning vast areas. Bayesian approaches, known for their ability to account for model parameter uncertainty, have emerged as a promising solution. EBK addresses the limitations of Classical Kriging, which relies on a single semivariogram and manual parameter adjustments. EBK utilises a geostatistical interpolation technique to predict values at specific locations using nearby observations [34,35,36]. It achieves this through self-optimisation using an ensemble of semivariogram models automatically generated via subsetting and simulation [37,38]. EBK also tackles data scarcity by leveraging local trends [35] and can extrapolate when necessary [39,40].
The final thematic map is generated by combining the results of these localised models. For large datasets, EBK fits models for data subsets and predicts using a weighted sum of nearby subset models, which may overlap. Multiple subsets, including those with varying trends, can contribute to predictions. Subsets can be user-defined or automatic. Refer to Krivoruchko et al. [41] for details on subsets and overlap.
Semivariance increases with separation distance up to a certain point, beyond which the variation is no longer related to distance. Classical Kriging assumes a Gaussian process, but this assumption is violated for most real data. While EBK transforms data to a near-Gaussian distribution, the residuals can still be non-Gaussian, so a transformation option is available. Figure 3 summarises the EBK interpolation process used in this study. The general EBK Equation (7) is [42]:
$$Z(x) = \sum_{i=0}^{n} \lambda_i Z(x_{ij}) \qquad (7)$$
where $Z(x)$ is the predicted value, $(ij)$ is the coordinate of the known points, and $\lambda_i$ is the weight coefficient.
The EBK model is useful for connecting data from different points in time and space, especially groundwater data (e.g., [10,13,18,44]), even if the data shows unusual patterns or cannot be obtained from the same sources across the study area. Despite challenges in estimating all model parameters, EBK’s ability to handle uncertainties makes it a promising solution for automatic data interpolation.

3.4. Method Strengths and Weaknesses

Table 1 summarises the advantages, disadvantages, and influencing parameters of these common interpolation methods used to analyse spatial data. While each method offers unique benefits, it also comes with limitations. For instance, kriging techniques excel at incorporating spatial dependence but require more complex parameterisation and can be computationally expensive for large datasets. The outcomes generated by distinct interpolation methods can vary depending on the algorithms employed, the underlying assumptions, and the properties of the data to which they are applied. Additionally, the maximum concentration in a map produced through interpolation may surpass the maximum value present in the original data. As a result, the extrapolation accuracy depends on the method used, and some methods may be better suited to handle extrapolation than others. For example, the EBK method reduces the impact of outliers by borrowing strength from the ensemble, and standard errors are spatially stabilised [45,46].

3.5. Interpolation and Validation

All tasks were carried out in ArcMap 10.8. The ArcMap Geostatistical Wizard was used to adjust the parameters, including transformation, order of trend removal, and declustering, to obtain the optimal interpolation methods. If necessary, these steps were performed during the pre-processing stage to enhance data quality before spatial interpolation. The reasons for applying each step and the conditions under which they were used are explained below. The spatial groundwater data were analysed with descriptive statistics tools in SPSS and showed a half-normal distribution (a truncated normal). Figure A4 shows the frequency histogram of groundwater species. It is important to note that an additional argument against using log-normal transformation is that it can exaggerate the importance of low-concentration values for substances like NO3 and Fe. Since the detection limits for these substances vary (0.1 mg/L, 0.01 mg/L, etc.), “low” or “below detection limit” values can become disproportionately spread when a log-normal transformation is applied.
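The argument about detection limits can be made concrete with a small numerical sketch. The concentrations are arbitrary illustrative values, not data from this study, and the function name is hypothetical.

```python
import numpy as np

def linear_and_log_gaps(concentrations):
    """Gaps between consecutive sorted values, on linear and log10 scales."""
    c = np.sort(np.asarray(concentrations, dtype=float))
    return np.diff(c), np.diff(np.log10(c))

# Two detection limits (0.01 and 0.1 mg/L) and two measurable values (1 and 10 mg/L).
linear_gaps, log_gaps = linear_and_log_gaps([0.01, 0.1, 1.0, 10.0])
```

On the linear scale the two detection limits differ by only 0.09 mg/L, compared with 9 mg/L between the two measurable values. After a log10 transformation, both gaps are exactly one unit, so the difference between two detection limits carries as much weight as the difference between 1 and 10 mg/L.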
Also, no complete regional trend was observed in the data. Trend detection and removal were nevertheless performed as a pre-processing step. This involved fitting a polynomial regression model to the spatial data to identify systematic spatial variations. The identified trend was then subtracted from the data, resulting in a detrended dataset. This pre-processing step ensures that the subsequent interpolation methods, such as Ordinary Kriging (OK) or Empirical Bayesian Kriging (EBK), are applied to data that accurately reflect local variations, thereby improving the reliability of the spatial predictions. UK and OK would yield similar results if the data did not exhibit a 100% regional trend. When the effectiveness of trend removal or transformation is uncertain, both options should be applied, the outcomes compared, and the best one selected for the data. Furthermore, we used the declustering function wherever necessary for the SK interpolation, given the distribution of the groundwater samples commensurate with the known depth. Note that kriging assigns equal weight to all measurements within a certain distance from an unsampled location, regardless of their spatial arrangement. This could lead to overestimating or underestimating values at the unsampled location, mainly if anomalies or pollution sources exist. Declustering reduces this bias by separating nearby points into clusters and giving each one a weight based on its features and how the data are spread out.
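One common way to assign declustering weights is cell declustering, sketched below. The text does not state which declustering scheme was applied, so this is an assumed illustration; the cell size and function name are hypothetical.

```python
import numpy as np

def cell_declustering_weights(coords, cell_size):
    """Weights inversely proportional to the number of samples sharing a grid cell."""
    coords = np.asarray(coords, dtype=float)

    # Assign every sample to a square grid cell.
    cells = np.floor(coords / cell_size).astype(int)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()

    # Each occupied cell gets the same total weight, shared among its samples.
    weights = 1.0 / counts[inverse]
    return weights / weights.sum()
```

For three clustered samples with value 10 and one isolated sample with value 2, the unweighted mean is 8, whereas the declustered mean `weights @ values` is 6, since the cluster no longer dominates the estimate.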
Furthermore, we used the Optimal model function to optimise the chosen interpolation model automatically. For EBK, several settings in General properties, such as the overlap factor, data partitioning, and the number of samples per subset, were adjusted to obtain the best results. For example, the number of samples can range from 20 to 1000 (with a default of 100), and each sample is used Q times, where the overlap factor Q can range between 0.01 and 5, allowing for both overlapping and disjoint subsets; a non-overlapping data subsetting option is also provided [37]. The EBK subsetting option was modified to achieve the smallest Root Mean Square Error (RMSE). In addition, there are numerous semivariogram models, such as Linear, Thin Plate Spline, Exponential, Exponential Detrended, Whittle Detrended, K-Bessel, and K-Bessel Detrended, each of which has its own benefits and drawbacks. We used a Power semivariogram because of its balance between accuracy and processing speed, following the recommendations in [52]. It is important to note that data transformation aims to enhance the accuracy and validity of interpolation results by reducing the impact of outliers, non-normality, and other sources of data variability. Nevertheless, selecting the appropriate data transformation type depends on the characteristics of the input data and the desired interpolation results. Finally, all thematic maps were created and transformed using the Raster tool. Note that although EBK often manages data variability effectively and provides more realistic estimates, negative values can still occur. To address this, negative values produced by EBK were replaced with zero using the Raster Calculator’s thresholding function as a post-processing adjustment, since concentration values are bounded and cannot be negative. This adjustment, however, may affect error estimates and conditional simulations. Therefore, while the initial Gaussian assumption simplifies the analysis, its application to concentration data should be approached with caution.
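The post-processing step for negative predictions amounts to a simple threshold. The sketch below uses NumPy on an in-memory array as a stand-in for the Raster Calculator operation; the function name is hypothetical and NaN is assumed to mark cells without data.

```python
import numpy as np

def clamp_negative_to_zero(raster):
    """Replace negative predicted concentrations with zero; NaN cells stay NaN."""
    raster = np.asarray(raster, dtype=float)
    return np.maximum(raster, 0.0)
```

Because only the negative cells are raised, the adjusted surface has a mean at least as high as the original one, which is one way in which this step can shift error estimates.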
Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Error (ME) were used as statistical measures of interpolation accuracy. They were computed with the leave-one-out cross-validation (LOOCV) technique, which is accessible in the Geostatistical Wizard. This technique has previously been used in hydrogeological studies [18,53]. Through LOOCV, the program systematically removes each point in turn, predicts its value by interpolating the remaining points, and compares the predicted value to the measured value [18,54]. This validation method lets us determine which interpolation models provide the most accurate dataset representation. The most appropriate interpolation method should have the lowest RMSE, which is the most important index for evaluating accuracy (Equation (8)):
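$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_{\mathrm{observed},i} - y_{\mathrm{predicted},i} \right)^2} \qquad (8)$$
where $y_{\mathrm{observed},i}$ is the observed value, $y_{\mathrm{predicted},i}$ is the predicted value, and $n$ is the number of samples.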
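The LOOCV procedure and the three error measures can be sketched generically for any interpolator. The nearest-neighbour predictor below is only a placeholder so that the example runs; it is not one of the methods compared in this study, and the function names are assumptions.

```python
import numpy as np

def loocv_errors(coords, values, predict):
    """Leave-one-out cross-validation for a predictor predict(coords, values, target)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    errors = np.empty(len(values))
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        # Error convention assumed here: predicted minus measured.
        errors[i] = predict(coords[keep], values[keep], coords[i]) - values[i]
    return {
        "ME": errors.mean(),
        "MAE": np.abs(errors).mean(),
        "RMSE": np.sqrt((errors ** 2).mean()),
    }

def nearest_neighbour(coords, values, target):
    """Placeholder interpolator: value of the closest remaining sample."""
    return values[np.argmin(np.linalg.norm(coords - target, axis=1))]
```

Any of the interpolators discussed above can be passed as `predict`, provided it accepts the remaining coordinates, the remaining values, and the held-out location.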
This study used the cross-validation results to compare the interpolation methods. The RMSE values were calculated per combination of species and depth intervals for all interpolation methods investigated. These RMSE values derived from the integrated interpolation model and cross-validation results were used to evaluate the different interpolation methods.
Choosing the most appropriate interpolation method from a wide range of RMSE values is complex. Summary statistics of the RMSE values, such as the median (MD), standard deviation (SD), and maximum (Max), can be used to evaluate the different interpolation methods. SD is one of the most commonly used indicators of dispersion, while Max refers to the largest error. Small MD, SD, and Max values indicate less uncertainty for that interpolation method.
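Summarising a set of RMSE values per method could look like the sketch below. It assumes the sample standard deviation (`ddof=1`); the text does not state which SD definition was used, and the function name is hypothetical.

```python
import numpy as np

def summarise_rmse(rmse_values):
    """Median (MD), sample standard deviation (SD) and maximum (Max) of RMSE values."""
    r = np.asarray(rmse_values, dtype=float)
    return {
        "MD": float(np.median(r)),
        "SD": float(np.std(r, ddof=1)),  # assumption: sample SD
        "Max": float(r.max()),
    }
```

Applying the function to the RMSE values of each interpolation method, one value per combination of species and depth interval, gives three numbers per method that can be compared directly.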

4. Results and Discussion

Table 2 presents sample counts, medians, means, and interquartile ranges (Equation (9)) of the chosen groundwater constituents for each depth interval.
IQR = Q3 − Q1
where Q1 is the 25th percentile of the data and Q3 is the 75th percentile of the data.
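Equation (9) can be evaluated as in the following sketch. Percentile definitions differ between software packages; the example uses NumPy's default linear interpolation, which is an assumption and may not match the definition used by SPSS.

```python
import numpy as np

def interquartile_range(values):
    """IQR = Q3 - Q1, using NumPy's default (linear) percentile definition."""
    q1, q3 = np.percentile(np.asarray(values, dtype=float), [25, 75])
    return q3 - q1
```

For the integers 1 to 9, Q1 is 3 and Q3 is 7 under this definition, giving an IQR of 4.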

4.1. Maps

We looked at kriging and LPI interpolation maps for five important solutes. They showed that both small- and large-scale heterogeneity complicated the hydrogeochemical pattern at different depths. This is illustrated in Figure 4 for the EBK and LPI methods. High SO4 concentrations (more than 150 mg/L) have been linked to old, brackish-to-saline groundwater from Holocene transgressions [55] or a combination of pyrite (FeS2) oxidation in reclaimed land and peat mineralisation exacerbated by acid, SOx-rich rain until the 1980s [56,57]. Fresh (Cl < 100 mg/L) to saline (Cl > 5000 mg/L) groundwater is found in Pleistocene and Holocene sediments. The Cl concentration is lowest in the dunes and the area with fluvial deposits at the surface, reflecting rainwater infiltration [19]. Chloride is highest in Pleistocene aquifers under polders, reflecting the palaeohydrological conditions and mixing between marine and freshwater [20], especially in the southwestern and northern parts of the study area. Iron has low concentrations in the dunes and east. There is a belt with high concentrations in the polders eastwards of the dunes, and the northern part also contains high Fe concentrations. Extreme Fe concentrations above 40 mg/L are uncommon since high Fe concentrations will induce siderite (i.e., FeCO3) precipitation at the neutral to slightly alkaline pH found [20]. Phosphate shows the lowest concentrations in the east, while areas with high concentrations are in the southwest and north. The NH4 map is not displayed because of its similarities to the PO4 map.
There are substantial differences between the hydrogeochemical patterns established by the various interpolation techniques. After evaluating all thematic maps at all depth intervals, the following was observed. Unlike LPI, kriging interpolation methods tend to eliminate local anomalies from the interpolation grid to represent a general regional pattern. Except for the PO4 and Fe maps, which have a spotted pattern in some areas, the produced maps have a smooth pattern that gradually alters. Some groundwater wells may have significantly higher or lower concentrations of PO4 and Fe than the nearby wells, which can result from several factors relating to the hydrogeology and geochemistry of the subsurface. One possible explanation for a spotted pattern in PO4 is the presence of localised sources of phosphorus.
In the same way, spots on Fe maps can show differences in the geochemistry of the ground due to iron-rich minerals or changes in solubility controls and redox conditions. For example, sulphate reduction may affect the solubility control of Fe by pyrite versus siderite equilibrium. Additional investigation, including sediment and groundwater sampling and analysis, is needed to determine the precise controls of the sediment matrix on the groundwater composition, including the Fe concentrations and their spotted nature.
Various interpolation techniques also provide varying estimates of the maximum values. Here, the predicted values for Cl range from 0 to 17,230 mg/L using the geostatistical EBK method and 0 to 17,350 mg/L using the deterministic LPI method. SO4 ranged between 0 and 1450 mg/L for both interpolation methods (Figure 4(a2,b2)). The Fe concentration ranged between 0 and 94 mg/L using EBK (Figure 4(a3)), whereas Fe interpolation using LPI ranged between 0 and 101 mg/L (Figure 4(b3)). The greatest difference in maximum values is seen for PO4, where the highest value predicted by EBK is 19.9 mg/L, and the highest value predicted by LPI is 42.25 mg/L (Figure 4(a4,b4)).

4.2. Simulation Accuracy

Table 3 summarises the RMSE values calculated for each interpolation method across all groundwater species and depth intervals. In this study, the interpolation method with the lowest RMSE value is considered the best fit for representing the input data. Generally, the first layer exhibits the lowest RMSE, and the RMSE tends to increase with depth. This suggests that the complexity of geological formations influences the accuracy of the chosen interpolation method. Interestingly, accuracy improves between 20 and 30 m NAP, which coincides with the highest lateral sampling density. This highlights the significant impact of sampling density on interpolation accuracy. The RMSE differs for each species per depth layer, suggesting that the species being mapped also affects the accuracy of the interpolation method. This implies that a single interpolation method may not be optimal for mapping all species simultaneously.
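To make the RMSE criterion concrete, the sketch below shows how a cross-validated RMSE can be obtained by leaving out one sample at a time and predicting it from the remaining samples. It is an illustration only: leave-one-out validation is assumed, a simple inverse-distance-weighting function stands in for the kriging and LPI methods compared here, and all coordinates and values are hypothetical.

```python
import numpy as np

def idw_predict(xy_known, z_known, xy_target, power=2.0):
    """Inverse-distance-weighted estimate at one target point (stand-in interpolator)."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0.0):
        # Target coincides with a sample: return that sample's value
        return float(z_known[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * z_known) / np.sum(w))

def loo_rmse(xy, z, predict=idw_predict):
    """Leave-one-out cross-validation: predict each sample from all the others."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    errors = np.empty(len(z))
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        errors[i] = predict(xy[keep], z[keep], xy[i]) - z[i]
    return float(np.sqrt(np.mean(errors**2)))

# Hypothetical sample locations and concentrations
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z_flat = np.array([5.0, 5.0, 5.0, 5.0])     # no spatial variation
z_varied = np.array([1.0, 2.0, 3.0, 10.0])  # strong local variation
rmse_flat = loo_rmse(xy, z_flat)
rmse_varied = loo_rmse(xy, z_varied)
print(rmse_flat, rmse_varied)
```

Any interpolator with the same call signature can be passed as `predict`, so the same loop can compare several methods on identical held-out samples.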
To enhance clarity, the results are also presented in Figure 5, using the following statistical indicators. The statistical median of the RMSE values is represented by the middle line within each box, while the threshold indicates the lowest recorded error. The length of each box illustrates the range of RMSE values for each interpolation method. Therefore, the best method is characterised by the lowest median, the smallest RMSE values, and the narrowest range of RMSE.
Our results show that the EBK method performed better than the other interpolation methods, while it also has a better smoothing effect. Figures A5–A9 present maps of the selected groundwater species generated using the optimal interpolation method. The worst interpolation method was SK for Cl, LPI for Fe and NH4, OK for PO4, and UK for SO4. Furthermore, the next-best interpolation methods rank as OK > UK > LPI for Cl and UK > OK > SK for Fe. For PO4, UK, SK, and LPI have nearly identical MDs and SDs of the RMSE, but they can be sorted by their MAX values as SK > UK > LPI. For SO4, the second-lowest MD and SD are found for LPI, so the interpolation methods rank as LPI > OK > SK. For NH4, the interpolation methods rank as OK > UK > SK.
However, it is crucial to remember that RMSE offers a single metric of accuracy and does not provide information about the distribution or patterns of errors. Therefore, employing additional metrics and visualisation techniques, along with incorporating domain expertise, data characteristics, prior knowledge, and validation data, is recommended. This is precisely why we delve deeper into exploring the distribution of errors in the next step.
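The limitation of RMSE as a single number can be seen in a small example. The sketch below, with hypothetical values and function names, compares two sets of predictions that share the same RMSE but differ in their mean error (bias).

```python
import numpy as np

def error_summary(predicted, observed):
    """Return RMSE together with metrics that describe the error distribution."""
    residuals = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return {
        "RMSE": float(np.sqrt(np.mean(residuals**2))),
        "ME": float(np.mean(residuals)),            # mean error (bias)
        "MAE": float(np.mean(np.abs(residuals))),   # mean absolute error
        "max_abs": float(np.max(np.abs(residuals))),
    }

# Hypothetical observations and two sets of predictions
observed = np.array([10.0, 10.0, 10.0, 10.0])
biased = observed + 2.0                                  # every prediction is 2 too high
scattered = observed + np.array([2.0, -2.0, 2.0, -2.0])  # errors cancel on average
summary_biased = error_summary(biased, observed)
summary_scattered = error_summary(scattered, observed)
print(summary_biased)
print(summary_scattered)
```

Both cases give an RMSE of 2, yet the first overestimates systematically while the second does not, which only the mean error reveals.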
The prediction error and associated uncertainty of the resulting maps vary with the interpolation method applied (e.g., EBK, Figure 6). Uncertainty about the interpolation method is related to many factors, including the adequacy of the number of samples and the sampling distance [58]. This may be related to the presence of impermeable versus permeable sedimentary layers. A suitable interpolation method should also give results with low interpolation smoothing, which preserves the gradual change in species concentrations [59]. However, the hexagonal patterns observed in the prediction error map at 40–50 m in the southwest can be linked to how the data were subsetted in our model. These shapes, appearing in areas without samples, could also result from extrapolation based on nearby data points. While this might occasionally yield less accurate predictions in these regions, it is a typical hurdle in spatial analysis and does not necessarily undermine our results. Interestingly, despite the presence of these shapes, the model adjustments that led to their formation also produced the lowest RMSE. This suggests a more accurate data fit overall. Moreover, minor data irregularities are not expected to significantly influence our conclusions, especially if the analysis aims to identify overarching trends rather than pinpoint precise values.
According to our findings, the mean errors and coefficients of determination of the SK models were relatively higher than those of the other four methods. This implies that SK is more sensitive to extreme points and to the relatively low sample density in some areas because it uses the global mean of the entire dataset rather than a local mean. These conditions are prevalent in the southern and southeastern parts of the study area at depths of more than 30 m. In addition, the sampling distribution is less appropriate in the first 10 m.
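The role of the global mean can be illustrated with a one-dimensional toy example. The sketch below is not the ArcGIS implementation used in this study: it assumes an exponential covariance model without a nugget, and the sample positions, values, and global mean are hypothetical. Simple kriging adds weighted residuals to a fixed global mean, whereas ordinary kriging constrains the weights to sum to one and thereby estimates the mean from the samples themselves.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=1.0):
    """Exponential covariance model (assumed here; no nugget effect)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / corr_range)

def simple_kriging(x, z, x0, global_mean):
    """SK: weighted residuals from a fixed, known global mean."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    C = exp_cov(np.abs(x[:, None] - x[None, :]))
    c0 = exp_cov(np.abs(x - x0))
    weights = np.linalg.solve(C, c0)
    return float(global_mean + weights @ (z - global_mean))

def ordinary_kriging(x, z, x0):
    """OK: weights sum to one, so the mean is estimated from the samples used."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(x)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(np.abs(x[:, None] - x[None, :]))
    A[n, n] = 0.0  # Lagrange multiplier block
    b = np.append(exp_cov(np.abs(x - x0)), 1.0)
    weights = np.linalg.solve(A, b)[:n]
    return float(weights @ z), weights

# Hypothetical samples along a transect; the global mean is set far below the local values
x = [0.0, 1.0, 2.0]
z = [8.0, 10.0, 12.0]
global_mean = 2.0

sk_far = simple_kriging(x, z, 50.0, global_mean)  # far from any sample
ok_far, ok_weights = ordinary_kriging(x, z, 50.0)
sk_at_sample = simple_kriging(x, z, 1.0, global_mean)
ok_at_sample, _ = ordinary_kriging(x, z, 1.0)
print(sk_far, ok_far, sk_at_sample, ok_at_sample)
```

Far from the samples, the SK estimate falls back to the global mean, while the OK estimate stays near the level of the samples. This is one way to see why SK can perform poorly where sampling is sparse and the global mean is unrepresentative.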

4.3. Three-Dimensional Aspect

We opted to create maps for depth intervals of 5 or 10 m, resulting in a quasi-3D model of groundwater quality rather than a full 3D model. The Dutch coastal areas pose several challenges when creating a full 3D model of groundwater quality. The region’s geological system is complex, with subsurface units and strata with distinct hydraulic properties [60]. Human activities such as land reclamation, peat extraction, and construction of dikes and canals have also significantly impacted complex hydrological patterns [61]. Additionally, various natural processes in the area impact the region’s sedimentation history, creating a complex stratigraphic record. As aquitards, local clay layers restrict vertical flow, making it difficult to estimate groundwater quality accurately [62].
The absence or limited use of well screens in clay-rich layers, which are prevalent in large portions of the Holocene top layer, especially in the first interval layer, poses a significant challenge for collecting observations. This is primarily attributed to clay’s low permeability, which hampers the movement of groundwater and makes it difficult to gather accurate data, leading to a decrease in the spatial density of information and further complicating groundwater assessments in these areas. As a result, interpolating groundwater data becomes notably more complicated in clay-rich areas, adding to the uncertainty in understanding subsurface dynamics. We recommend not using SK to interpolate the spatial variation in groundwater quality species unless the data show low spatial variation and a higher sample density with evenly distributed sampling.
Appelo and Willemsen [63] were the first to note that the development of the cation composition of groundwater may also differ between diffusion and advection as the controlling transport mechanisms during the displacement of saline by fresh groundwater or vice versa. Advective transport can sustain sharp, persistent concentration gradients, whereas dispersive or diffusive transport leads to gradual gradients. These different transport mechanisms can lead to complex and unpredictable patterns in groundwater chemistry, making it challenging to model contaminant behaviour accurately. In addition, Post and Kooi [64] found that density-driven groundwater flow is present in the study area. This flow type can cause water and rock to interact, especially through cation-exchange processes [65]. Given these complex hydrogeochemical patterns and factors, such as advective transport and cation-exchange processes, creating a full 3D model is challenging, and such a model may capture only some of the relevant features of the system. Moreover, geostatistical interpolation methods tend to smooth concentration gradients, making it difficult to reproduce vertical hydrogeochemical gradients [50,66]. As a result, a 3D model built without taking the data features, prior knowledge, and hydrogeochemical complexity into account, especially in the vertical dimension, becomes less accurate, and its predictions become less certain [67,68,69].
To our knowledge, no precise interpolation method involving barriers (e.g., aquitards or impermeable layers) can create a 3D hydrogeochemical model in an ArcGIS environment. Hence, we used a layered approach for our 3D hydrogeochemical model in ArcGIS, which does not incorporate geostatistical relationships in the vertical direction to account for the presence of barriers such as aquitards or impermeable layers. This approach is likely to reduce uncertainty in the vertical dimension of the model.
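The layered approach amounts to splitting the samples by depth interval and interpolating each interval independently in two dimensions. The sketch below shows only the splitting step. The function name, the interval boundaries, and the sample depths are hypothetical, and intervals are treated as half-open (top included, bottom excluded), which is an assumption.

```python
import numpy as np

def split_by_depth(depths, edges):
    """Map each depth interval (top, bottom) to the indices of the samples inside it.

    Intervals are half-open: top <= depth < bottom.
    """
    depths = np.asarray(depths, dtype=float)
    layer_index = np.digitize(depths, edges) - 1  # 0-based interval index per sample
    layers = {}
    for k in range(len(edges) - 1):
        layers[(edges[k], edges[k + 1])] = np.flatnonzero(layer_index == k)
    return layers

# Hypothetical sample depths (m) and interval boundaries of 5 or 10 m thickness
depths = [2.0, 4.5, 7.0, 12.0, 18.0, 25.0]
edges = [0.0, 5.0, 10.0, 20.0, 30.0]
layers = split_by_depth(depths, edges)

for (top, bottom), idx in layers.items():
    # Each subset would be passed to a separate 2D interpolation run
    print(f"{top}-{bottom} m: samples {idx.tolist()}")
```

Because each interval is handled separately, no spatial correlation is assumed across interval boundaries, which mirrors the absence of vertical geostatistical relationships described above.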

5. Conclusions

Groundwater monitoring is laborious and expensive, making it worthwhile to identify the optimum interpolation method and correctly estimate concentrations in unmonitored areas. For the first time, a large dataset of 3030 samples was used to compare five geostatistical interpolation methods (EBK, UK, OK, SK, and LPI) for estimating groundwater composition in the coastal lowlands of the western Netherlands. The comparison was conducted for five species that reflect salinity, redox state, and natural nutrients. EBK outperforms the other geostatistical methods based on root mean square error values obtained by cross-validation. UK is the second-best interpolation method for two out of five species: Fe and PO4. For Cl and NH4, the OK interpolation method is the second-best. Furthermore, LPI is the second-best method for SO4. The SK method is not recommended because of its higher mean errors. Based on this information, EBK should be selected as the best option for determining the composition of groundwater when it is not possible to test the methods before interpolation. However, cross-validation remains crucial for confirming the chosen method’s effectiveness in any specific application.

Author Contributions

Conceptualisation, M.Z. and J.G.; methodology, M.Z. and R.D.; software, M.Z.; validation, M.Z. and J.G.; formal analysis, M.Z.; investigation, M.Z. and J.G.; resources, J.G.; data curation, M.Z. and R.D.; writing—original draft preparation, M.Z.; writing—review and editing, M.Z., R.D. and J.G.; visualisation, M.Z.; supervision, J.G.; project administration, J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are stored by TNO Geological Survey of the Netherlands, which archives geoscientific data as part of its national task.

Acknowledgments

This research is part of a project by the first author, Mojtaba Zaresefat, who would like to express his sincere gratitude for the financial support from the I.R. Iran government’s scholarship programme, funded by the Ministry of Science, Research and Technology (MSRT). Finally, the paper’s three anonymous reviewers are thanked for their constructive feedback.

Conflicts of Interest

Author Jasper Griffioen is employed by the company TNO Geological Survey of the Netherlands. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Figure A1. Two north-south and one east-west hydrogeological cross-sections of the study area from BRO REGIS II v2.2.1 (www.dinoloket.nl, accessed on 28 August 2024). A NAP (Normaal Amsterdams Peil) height of 0 m is about the average North Sea level. CM means complex, S means sand sediment (i.e., aquifers), and C means clay sediment layers (i.e., aquitards). The names refer to the geological formations.
Figure A2. The time series of the selected locations presenting the overall groundwater quality changes over 40 years. Time series plots (A–O) correspond to map sample locations.
Figure A3. The distribution of samples across various depth intervals (in meters relative to NAP).
Figure A4. The frequency histogram distribution of groundwater species suggesting a truncated normal distribution. It is important to note that each histogram visualises the overall species frequency, not the distribution within each depth interval.
Figure A5. Chloride groundwater maps created by the EBK method for all depth intervals in the western Netherlands. Few groundwater observation wells at 0–5 m NAP outside the dunes and ice-pushed ridge exist because most of the study area is below sea level and the top layer is often clayey in the polder area in which few groundwater wells have been installed.
Figure A6. Sulphate groundwater maps created by the EBK method for all depth intervals in the western Netherlands.
Figure A7. Iron (Fe(II)) groundwater maps created by the EBK method for all depth intervals in the western Netherlands.
Figure A8. Phosphate groundwater maps created by the EBK method for all depth intervals in the western Netherlands.
Figure A9. Ammonium groundwater maps created by the EBK method for all depth intervals in the western Netherlands.

References

  1. Zaryab, A.; Nassery, H.R.; Alijani, F. Identifying Sources of Groundwater Salinity and Major Hydrogeochemical Processes in the Lower Kabul Basin Aquifer, Afghanistan. Environ. Sci. Process. Impacts 2021, 23, 1589–1599.
  2. Gunnink, J.L.; Burrough, P.A. Interactive Spatial Analysis of Soil Attribute Patterns Using Exploratory Data Analysis (EDA) and GIS. In Spatial Analytical Perspectives on GIS; Routledge: Oxford, UK, 2019; pp. 87–100.
  3. De Smith, M.J.; Goodchild, M.F.; Longley, P. Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools, 6th ed.; Winchelsea Press: London, UK, 2021; ISBN 9781912556038.
  4. Webster, R.; Oliver, M.A. Geostatistics for Environmental Scientists; John Wiley & Sons: Hoboken, NJ, USA, 2007; ISBN 0470517263.
  5. Smith, J.E.; von Winterfeldt, D. Decision Analysis in Management Science. Manag. Sci. 2004, 50, 561–574.
  6. Güler, M.; Kara, T. Comparison of Different Interpolation Techniques for Modelling Temperatures in Middle Black Sea Region. Agric. Fac. Gaziosmanpasa Univ. 2014, 31, 61–71.
  7. Stahl, K.; Moore, R.D.; Floyer, J.A.; Asplin, M.G.; McKendry, I.G. Comparison of Approaches for Spatial Interpolation of Daily Air Temperature in a Large Region with Complex Topography and Highly Variable Station Density. Agric. For. Meteorol. 2006, 139, 224–236.
  8. Wu, W.; Tang, X.-P.; Ma, X.-Q.; Liu, H.-B. A Comparison of Spatial Interpolation Methods for Soil Temperature over a Complex Topographical Region. Theor. Appl. Clim. 2016, 125, 657–667.
  9. Hengl, T. A Practical Guide to Geostatistical Mapping, EUR 22904 EN Scientific and Technical Research Series, 2nd ed.; Office for Official Publications of the European Communities: Luxembourg, 2009.
  10. Bhunia, G.S.; Shit, P.K.; Maiti, R. Comparison of GIS-Based Interpolation Methods for Spatial Distribution of Soil Organic Carbon (SOC). J. Saudi Soc. Agric. Sci. 2018, 17, 114–126.
  11. Murphy, R.R.; Curriero, F.C.; Ball, W.P. Comparison of Spatial Interpolation Methods for Water Quality Evaluation in the Chesapeake Bay. J. Environ. Eng. 2010, 136, 160–171.
  12. Varouchakis, E.A.; Hristopulos, D.T. Comparison of Stochastic and Deterministic Methods for Mapping Groundwater Level Spatial Variability in Sparsely Monitored Basins. Environ. Monit. Assess. 2013, 185, 1–19.
  13. Jovein, E.B.; Hosseini, S.M. A Systematic Comparison of Geostatistical Methods for Estimation of Groundwater Salinity in Desert Areas. Iran Water Resour. Res. 2016, 11, 1–15.
  14. Seyedmohammadi, J.; Esmaeelnejad, L.; Shabanpour, M. Spatial Variation Modelling of Groundwater Electrical Conductivity Using Geostatistics and GIS. Model Earth Syst. Environ. 2016, 2, 1–10.
  15. Xiao, Y.; Gu, X.; Yin, S.; Shao, J.; Cui, Y.; Zhang, Q.; Niu, Y. Geostatistical Interpolation Model Selection Based on ArcGIS and Spatio-Temporal Variability Analysis of Groundwater Level in Piedmont Plains, Northwest China. SpringerPlus 2016, 5, 425.
  16. Amah, V.E.; Agu, F.A. Geostatistical Modelling of Groundwater Quality at Rumuola Community, Port Harcourt, Nigeria. Asian J. Environ. Ecol. 2020, 12, 37–47.
  17. Kumari, M.K.N.; Sakai, K.; Kimura, S.; Nakamura, S.; Yuge, K.; Gunarathna, M.H.J.P.; Ranagalage, M.; Duminda, D.M.S. Interpolation Methods for Groundwater Quality Assessment in Tank Cascade Landscape: A Study of Ulagalla Cascade, Sri Lanka. Appl. Ecol. Environ. Res. 2018, 16, 5359–5380.
  18. Mirzaei, R.; Sakizadeh, M. Comparison of Interpolation Methods for the Estimation of Groundwater Contamination in Andimeshk-Shush Plain, Southwest of Iran. Environ. Sci. Pollut. Res. 2016, 23, 2758–2769.
  19. Delsman, J.R.; Hu-a-ng, K.R.M.; Vos, P.C.; de Louw, P.G.B.; Oude Essink, G.H.P.; Stuyfzand, P.J.; Bierkens, M.F.P. Paleo-Modeling of Coastal Saltwater Intrusion during the Holocene: An Application to the Netherlands. Hydrol. Earth Syst. Sci. 2014, 18, 3891–3905.
  20. Griffioen, J.; Vermooten, S.; Janssen, G. Geochemical and Palaeohydrological Controls on the Composition of Shallow Groundwater in the Netherlands. Appl. Geochem. 2013, 39, 129–149.
  21. Griffioen, J.; Passier, H.F.; Klein, J. Comparison of Selection Methods to Deduce Natural Background Levels for Groundwater Units. Environ. Sci. Technol. 2008, 42, 4863–4869.
  22. Dufour, F. Groundwater in the Netherlands: Facts and Figures; Netherlands Institute of Applied Geoscience TNO: Delft, The Netherlands, 2000.
  23. Gouw, M.J.P.; Erkens, G. Architecture of the Holocene Rhine-Meuse Delta (The Netherlands)—A Result of Changing External Controls. Neth. J. Geosci. 2007, 86, 23–54.
  24. De Mulder, E.F.J. Landscapes. In The Netherlands and the Dutch: A Physical and Human Geography; De Mulder, E.F.J., De Pater, B.C., Droogleever Fortuijn, J.C., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 35–58. ISBN 978-3-319-75073-6.
  25. Verschuuren, J. Restoration of Protected Lakes Under Climate Change: What Legal Measures Are Needed to Help Biodiversity Adapt to the Changing Climate? The Case of Lake IJssel, Netherlands. Tilburg Law School Research Paper Forthcoming. Electron. J. 2019.
  26. Post, V.E.A.; Van der Plicht, H.; Meijer, H.A.J. The Origin of Brackish and Saline Groundwater in the Coastal Area of the Netherlands. Neth. J. Geosci. 2003, 82, 133–147.
  27. Gribov, A.; Krivoruchko, K. Local Polynomials for Data Detrending and Interpolation in the Presence of Barriers. Stoch. Environ. Res. Risk Assess. 2011, 25, 1057–1063.
  28. Hani, A.; Abari, S.A.H. Determination of Cd, Zn, K, pH, TNV, Organic Material and Electrical Conductivity (EC) Distribution in Agricultural Soils Using Geostatistics and GIS (Case Study: South-Western of Natanz-Iran). Int. J. Biol. Life Agric. Sci. 2011, 5, 264.
  29. Esri ArcGIS Geostatistical Analyst|Model Spatial Data & Uncertainty. Available online: https://www.esri.com/en-us/arcgis/products/geostatistical-analyst/overview (accessed on 9 November 2021).
  30. McKillup, S.; Dyar, M.D. Geostatistics Explained: An Introductory Guide for Earth Scientists; Cambridge University Press: Cambridge, UK, 2010.
  31. Abdulmanov, R.; Miftakhov, I.; Ishbulatov, M.; Galeev, E.; Shafeeva, E. Comparison of the Effectiveness of GIS-Based Interpolation Methods for Estimating the Spatial Distribution of Agrochemical Soil Properties. Environ. Technol. Innov. 2021, 24, 101970.
  32. Krivoruchko, K. Spatial Statistical Data Analysis for GIS Users, 1st ed.; Esri Press: Redlands, CA, USA, 2011; ISBN 978-1-58948-161-9.
  33. Li, J.; Heap, A.D. Spatial Interpolation Methods Applied in the Environmental Sciences: A Review. Environ. Model. Softw. 2014, 53, 173–189.
  34. ESRI ArcGIS Desktop 10.8 Guide. Available online: https://desktop.arcgis.com/en/arcmap/latest/get-started/setup/arcgis-desktop-quick-start-guide.htm (accessed on 27 November 2021).
  35. Krivoruchko, K. Empirical Bayesian Kriging. ArcUser Fall 2012, 6, 1145.
  36. Krivoruchko, K.; Fraczek, W. Interpolation of Data Collected along Lines; Esri Press: Redlands, CA, USA, 2015.
  37. Gribov, A.; Krivoruchko, K. Empirical Bayesian Kriging Implementation and Usage. Sci. Total Environ. 2020, 722, 137290.
  38. Krivoruchko, K.; Gribov, A. Distance Metrics for Data Interpolation over Large Areas on Earth’s Surface. Spat. Stat. 2020, 35, 100396.
  39. Knotters, M.; Heuvelink, G.B.M. A Disposition of Interpolation Techniques; Wettelijke Onderzoekstaken Natuur & Milieu: Wageningen, The Netherlands, 2010.
  40. Krivoruchko, K.; Butler, K. Unequal Probability-Based Spatial Mapping; Esri Press: Redlands, CA, USA, 2013.
  41. Krivoruchko, K.; Gribov, A. Evaluation of Empirical Bayesian Kriging. Spat. Stat. 2019, 32, 100368.
  42. Li, Y.; Zhang, M.; Mi, W.; Ji, L.; He, Q.; Xie, S.; Xiao, C.; Bi, Y. Spatial Distribution of Groundwater Fluoride and Arsenic and Its Related Disease in Typical Drinking Endemic Regions. Sci. Total Environ. 2024, 906, 167716.
  43. Zou, L.; Kent, J.; Lam, N.S.N.; Cai, H.; Qiang, Y.; Li, K. Evaluating Land Subsidence Rates and Their Implications for Land Loss in the Lower Mississippi River Basin. Water 2015, 8, 10.
  44. Zaresefat, M.; Hosseini, S.; Roudi, M.A. Addressing Nitrate Contamination in Groundwater: The Importance of Spatial and Temporal Understandings and Interpolation Methods. Water 2023, 15, 4220.
  45. Morris, C.N. Parametric Empirical Bayes Inference: Theory and Applications. J. Am. Stat. Assoc. 1983, 78, 47–55.
  46. Maritz, J.S.; Lwin, T. Empirical Bayes Methods; CRC Press: Boca Raton, FL, USA, 2018; ISBN 1351071661.
  47. Kumar, N.; Sinha, N.K. Geostatistics: Principles and Applications in Spatial Mapping of Soil Properties. In Geospatial Technologies in Land Resources Mapping, Monitoring and Management; Reddy, G.P.O., Singh, S.K., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 143–159. ISBN 978-3-319-78711-4.
  48. Sahu, S.K. Bayesian Modeling of Spatio-Temporal Data with R; CRC: New York, NY, USA, 2022.
  49. van Lieshout, M.N.M. Theory of Spatial Statistics: A Concise Introduction; CRC: New York, NY, USA, 2019.
  50. Li, J.; Heap, A.D. A Review of Spatial Interpolation Methods for Environmental Scientists. Geosci. Aust. 2008, 23, 137.
  51. Boumpoulis, V.; Michalopoulou, M.; Depountis, N. Comparison between Different Spatial Interpolation Methods for the Development of Sediment Distribution Maps in Coastal Areas. Earth Sci. Inform. 2023, 16, 2069–2087.
  52. Tomlinson, K.M. A Spatial Evaluation of Groundwater Quality Salinity and Underground Injection Controlled Well Activity in Texas. Ph.D. Thesis, The University of Texas, Dallas, TX, USA, 2019.
  53. Ahmad, A.Y.; Saleh, I.A.; Balakrishnan, P.; Al-Ghouti, M.A. Comparison GIS-Based Interpolation Methods for Mapping Groundwater Quality in the State of Qatar. Groundw. Sustain. Dev. 2021, 13, 100573.
  54. Xie, Y.; Chen, T.; Lei, M.; Yang, J.; Guo, Q.; Song, B.; Zhou, X. Spatial Distribution of Soil Heavy Metal Pollution Estimated by Different Interpolation Methods: Accuracy and Uncertainty Analysis. Chemosphere 2011, 82, 468–476.
  55. van den Brink, C.; Frapporti, G.; Griffioen, J.; Zaadnoordijk, W.J. Statistical Analysis of Anthropogenic versus Geochemical-Controlled Differences in Groundwater Composition in The Netherlands. J. Hydrol. 2007, 336, 470–480.
  56. Van Dam, H. Evaluatie Basismeetnet Waterkwaliteit Hollands Noorderkwartier: Trendanalyse Hydrobiologie, Temperatuur En Waterchemie 1982–2007; Water en Natuur: Amsterdam, The Netherlands, 2009.
  57. Buijsman, E.; Aben, J.M.M.; Hetteling, J.P.; Van Hinsberg, A.; Koelemeijer, R.B.A.; Maas, R.J.M. Zure Regen, Een Analyse van Dertig Jaar Verzuringsproblematiek in Nederland; Planbureau voor de Leefomgeving (PBL): The Hague, The Netherlands, 2010.
  58. Liu, R.; Chen, Y.; Sun, C.; Zhang, P.; Wang, J.; Yu, W.; Shen, Z. Uncertainty Analysis of Total Phosphorus Spatial–Temporal Variations in the Yangtze River Estuary Using Different Interpolation Methods. Mar. Pollut. Bull. 2014, 86, 68–75.
  59. Falivene, O.; Cabrera, L.; Tolosana-Delgado, R.; Sáez, A. Interpolation Algorithm Ranking Using Cross-Validation and the Role of Smoothing Effect. A Coal Zone Example. Comput. Geosci. 2010, 36, 512–519.
  60. REGIS model Subsurface Models|DINO Counter. Available online: https://www.dinoloket.nl/ondergrondmodellen (accessed on 28 November 2022).
  61. Jirner, E.; Johansson, P.-O.; McConnachie, D.; Burt, A.; Peter, P.M.; Tomlinson, J.; Lawson, A.K.; Vernes, R.W.; Dabekaussen, W.; Gunnink, J.L.; et al. Application Theme 2—Groundwater Evaluations. In Applied Multidimensional Geological Modeling: Informing Sustainable Human Interactions with the Shallow Subsurface; John Wiley & Sons: Hoboken, NJ, USA, 2021; pp. 457–477.
  62. Arora, B.; Mohanty, B.P.; McGuire, J.T. An Integrated Markov Chain Monte Carlo Algorithm for Upscaling Hydrological and Geochemical Parameters from Column to Field Scale. Sci. Total Environ. 2015, 512–513, 428–443.
  63. Appelo, C.A.J.; Willemsen, A. Geochemical Calculations and Observations on Salt Water Intrusions, I. A Combined Geochemical/Mixing Cell Model. J. Hydrol. 1987, 94, 313–330.
  64. Post, V.E.A.; Kooi, H. Rates of Salinization by Free Convection in High-Permeability Sediments: Insights from Numerical Modeling and Application to the Dutch Coastal Area. Hydrogeol. J. 2003, 11, 549–559.
  65. Yu, X.; Michael, H.A. Impacts of the Scale of Representation of Heterogeneity on Simulated Salinity and Saltwater Circulation in Coastal Aquifers. Water Resour. Res. 2022, 58, e2020WR029523.
  66. Rata, M.; Douaoui, A.; Larid, M.; Douaik, A. Comparison of Geostatistical Interpolation Methods to Map Annual Rainfall in the Chéliff Watershed, Algeria. Theor. Appl. Clim. 2020, 141, 1009–1024.
  67. Allen, D.M.; Schuurman, N.; Zhang, Q. Using Fuzzy Logic for Modeling Aquifer Architecture. J. Geogr. Syst. 2007, 9, 289–310.
  68. Burke, H.F.; Ford, J.R.; Hughes, L.; Thorpe, S.; Lee, J.R. A 3D Geological Model of the Superficial Deposits in the Selby Area; CR/17/112N; British Geological Survey: Nottingham, UK, 2017.
  69. Turner, A.K.; Kessler, H.; van der Meulen, M.J. Applied Multidimensional Geological Modeling: Informing Sustainable Human Interactions with the Shallow Subsurface; John Wiley & Sons: Hoboken, NJ, USA, 2021; ISBN 1119163102.
Figure 1. Overview of the surface geology, classified by the sedimentological origin of formations in the western Netherlands. The thin red line shows the general boundary of the area lying below average sea level.
Figure 2. Flowchart of common interpolation method evaluation.
Figure 3. The customised EBK procedure, based on Zou et al. [43].
Figure 4. Spatial distribution of Cl, SO4, Fe, and PO4 in the study area for 25–30 m below sea level according to the EBK (a1–a4) and LPI (b1–b4) methods.
Figure 5. Comparison of the selected geostatistical interpolation methods based on RMSE values.
Figure 6. The prediction error maps for Cl and PO4 concentrations for three depth intervals using EBK.
Table 1. Comparison of spatial interpolation methods (based on Kumar and Sinha [47]; Sahu [48]; van Lieshout [49]; Heap [33,50]; Boumpoulis et al. [51]).

| Method | Advantages | Disadvantages | Influencing parameters | Differences in estimated results |
| --- | --- | --- | --- | --- |
| LPI | Adapts to local patterns, no stationarity assumptions, efficient | No uncertainty quantification, overfitting risk | Polynomial degree, smoothing, neighbourhood | Lower error in areas with local variations, higher in smoother regions |
| SK | Simple, unbiased predictions | Assumes constant mean, no uncertainty | Semivariogram, data quality | Higher RMSE when constant mean assumption fails |
| OK | Spatial dependence, minimises error, predicts uncertainty | Requires stationarity, complex semivariogram model | Nugget effect (1), sill (2), range (3), data quality | Lower error with local mean adaptation, affected by semivariogram choice |
| UK | Handles trends, improves non-stationary data, predicts uncertainty | Complex, trend requirements, overfitting risk | Semivariogram, trend model, data quality | Lower error with trends, higher if trends are misidentified |
| EBK | Automates parameter estimation, handles non-stationarity | Potential bias, limited control, intensive | Simulations, data quality, distribution | Lower error with variability, higher with lower data density |

Notes: (1) The value at which the semivariogram (almost) intercepts the y-axis. (2) The value at which the semivariogram first flattens out. (3) The distance at which the semivariogram first flattens out.
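
The three semivariogram parameters defined in the notes to Table 1 can be made concrete with a small sketch. The Python function below evaluates a spherical semivariogram, which is only one of several common model shapes and is chosen here as an assumption; the study does not state that this model was used. The function name and the example values (nugget 1, sill 5, range 100) are hypothetical, and the sill is taken as the total plateau value, nugget included.

```python
def spherical_semivariogram(h, nugget, sill, range_):
    """Semivariance at lag distance h for a spherical model (illustrative choice).

    nugget: value the curve approaches as h -> 0+ (near-intercept on the y-axis)
    sill:   plateau value where the curve flattens out (total sill, nugget included)
    range_: lag distance at which the plateau is first reached
    """
    if h < 0 or range_ <= 0:
        raise ValueError("h must be >= 0 and range_ must be > 0")
    if h == 0:
        return 0.0  # semivariance of a point with itself is zero by definition
    if h >= range_:
        return sill  # beyond the range, samples are treated as uncorrelated
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r**3)


# Hypothetical parameter values, for illustration only
for lag in (0, 1, 50, 100, 200):
    value = spherical_semivariogram(lag, nugget=1.0, sill=5.0, range_=100.0)
    print(lag, round(value, 3))
```

For small positive lags the curve starts just above the nugget, rises with distance, and stays at the sill for every lag at or beyond the range.
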
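Table 2. Characteristics of the selected groundwater species per depth interval. Each species cell gives median / mean / interquartile range in mg/L.

| Depth interval (m-NAP) | Count | Cl | SO4 | Fe | PO4 | NH4 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 to 5 | 249 | 78 / 355 / 113 | 35 / 64 / 57 | 1.9 / 4.4 / 5.6 | 0.5 / 2.6 / 2.0 | 1.0 / 4.0 / 3.1 |
| 5 to 10 | 360 | 121 / 844 / 286 | 37 / 104 / 85 | 3.4 / 6.8 / 8.1 | 0.9 / 3.1 / 3.0 | 3.1 / 10.0 / 12.2 |
| 10 to 15 | 455 | 147 / 919 / 667 | 17 / 81 / 68 | 4.6 / 9.2 / 10.0 | 1.2 / 3.7 / 3.2 | 5.0 / 13.0 / 15.9 |
| 15 to 20 | 462 | 191 / 1000 / 1002 | 11 / 67 / 40 | 5.6 / 10.9 / 14.1 | 1.5 / 3.8 / 4.2 | 8.3 / 15.4 / 18.9 |
| 20 to 25 | 519 | 175 / 1159 / 1170 | 11 / 82 / 43 | 6.1 / 11.5 / 13.1 | 1.3 / 3.5 / 4.2 | 8.8 / 15.0 / 21.3 |
| 25 to 30 | 485 | 257 / 1224 / 1330 | 8 / 73 / 41 | 6.9 / 11.2 / 12.5 | 1.2 / 3.2 / 3.7 | 8.8 / 14.0 / 17.0 |
| 30 to 40 | 503 | 544 / 1617 / 1926 | 16 / 108 / 55 | 6.9 / 11.9 / 12.6 | 0.9 / 2.4 / 2.4 | 6.7 / 12.9 / 14.1 |
| 40 to 50 | 317 | 623 / 2180 / 2863 | 17 / 137 / 69 | 5.4 / 9.6 / 11.3 | 0.6 / 2.0 / 1.5 | 5.5 / 11.4 / 10.2 |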
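Table 3. RMSE values across various depths and interpolation methods (SK, OK, UK, EBK, and IDW) for the groundwater quality species, including Cl, SO4, NH4, Fe, and PO4. Lower RMSE values indicate better model performance.

Chloride

| Method / Depth | 0 to 5 | 5 to 10 | 10 to 15 | 15 to 20 | 20 to 25 | 25 to 30 | 30 to 40 | 40 to 50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SK | 899 | 1698 | 1659 | 1610 | 1963 | 1367 | 1810 | 1684 |
| OK | 1067 | 1816 | 1506 | 1511 | 1477 | 1207 | 1767 | 1367 |
| UK | 1116 | 1816 | 1690 | 1473 | 1477 | 1250 | 1846 | 1464 |
| EBK | 986 | 1537 | 1427 | 1470 | 1430 | 1169 | 1487 | 1351 |
| IDW | 1063 | 1596 | 1420 | 1481 | 1504 | 1224 | 1778 | 1427 |

Sulphate

| Method / Depth | 0 to 5 | 5 to 10 | 10 to 15 | 15 to 20 | 20 to 25 | 25 to 30 | 30 to 40 | 40 to 50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SK | 119 | 216 | 165 | 184 | 169 | 180 | 206 | 165 |
| OK | 115 | 225 | 150 | 167 | 173 | 133 | 212 | 215 |
| UK | 122 | 218 | 162 | 173 | 180 | 135 | 203 | 217 |
| EBK | 128 | 210 | 129 | 167 | 171 | 142 | 209 | 163 |
| IDW | 124 | 216 | 130 | 163 | 172 | 141 | 206 | 172 |

Ammonium

| Method / Depth | 0 to 5 | 5 to 10 | 10 to 15 | 15 to 20 | 20 to 25 | 25 to 30 | 30 to 40 | 40 to 50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SK | 8.1 | 15.6 | 13.2 | 13.1 | 10.5 | 13.5 | 14.9 | 25 |
| OK | 7.07 | 12.3 | 13.57 | 13.1 | 10.1 | 14.4 | 15.4 | 21.7 |
| UK | 7.19 | 12.4 | 13.6 | 13.1 | 10.8 | 13.7 | 15.7 | 21.8 |
| EBK | 7.1 | 12.2 | 13.1 | 12.9 | 9.1 | 13.4 | 14.8 | 21 |
| IDW | 7.5 | 12.5 | 13.4 | 13.1 | 9.4 | 13.5 | 15.4 | 22 |

Iron

| Method / Depth | 0 to 5 | 5 to 10 | 10 to 15 | 15 to 20 | 20 to 25 | 25 to 30 | 30 to 40 | 40 to 50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SK | 6.71 | 9.1 | 12.2 | 12.5 | 10.4 | 11.2 | 14.5 | 7.2 |
| OK | 7.1 | 9.5 | 12.4 | 12.4 | 10.58 | 10.1 | 15.1 | 7.8 |
| UK | 7.1 | 9.3 | 11.5 | 12.4 | 10.5 | 9.49 | 14.6 | 7.88 |
| EBK | 6.3 | 9.1 | 11.1 | 12.3 | 10.2 | 9.1 | 14.4 | 7 |
| IDW | 6.8 | 9.7 | 11.6 | 12.2 | 9.9 | 9.5 | 14.9 | 7 |
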
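Phosphate

| Method / Depth | 0 to 5 | 5 to 10 | 10 to 15 | 15 to 20 | 20 to 25 | 25 to 30 | 30 to 40 | 40 to 50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SK | 4.3 | 6 | 5.16 | 5.95 | 3.9 | 4.3 | 3.61 | 4.3 |
| OK | 4.4 | 6.2 | 5.45 | 4.7 | 3.9 | 4.3 | 3.7 | 5 |
| UK | 4.5 | 6.2 | 5.45 | 4.8 | 3.8 | 4.38 | 3.7 | 4.3 |
| EBK | 4 | 6.05 | 4.9 | 4.6 | 3.8 | 4.2 | 3.5 | 4.3 |
| IDW | 4.43 | 6.16 | 5.44 | 4.9 | 4.08 | 4.49 | 3.8 | 4.78 |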
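
Table 3 ranks the methods by RMSE, so a short sketch of how that indicator is computed and compared may help. The Python snippet below is a minimal illustration, not the workflow used in the study: the function name `rmse` and the variable names are hypothetical, and the only data used are the chloride RMSE values for the 25 to 30 m interval listed in Table 3.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Chloride RMSE values for the 25 to 30 m interval, as listed in Table 3
chloride_25_30 = {"SK": 1367, "OK": 1207, "UK": 1250, "EBK": 1169, "IDW": 1224}

# The method with the smallest RMSE is ranked best for this species and depth
best_method = min(chloride_25_30, key=chloride_25_30.get)
print(best_method)
```

In a cross-validation setting, `observed` would hold the measured concentrations at the validation points and `predicted` the interpolated values at the same locations; the same comparison can then be repeated per species and per depth interval.
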
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Citation: Zaresefat, M.; Derakhshani, R.; Griffioen, J. Empirical Bayesian Kriging, a Robust Method for Spatial Data Interpolation of a Large Groundwater Quality Dataset from the Western Netherlands. Water 2024, 16, 2581. https://doi.org/10.3390/w16182581