*Article* **Minimizing Errors in the Prediction of Water Levels Using Kriging Technique in Residuals of the Groundwater Model**

**Alireza Asadi <sup>1</sup> and Kushal Adhikari 2,\***


**Abstract:** Groundwater monitoring and water level predictions have been a challenging issue due to the complexity of groundwater movement. Simplified numerical simulation models have been used to represent the groundwater system; these models however only provide the conservative approximation of the system and may not always capture the local variations. Several other efforts such as coupling groundwater models with hydrological models and using geostatistical methods are being practiced to accurately predict the groundwater levels. In this study, we present a novel application of a geostatistical tool on residuals of the groundwater model. The kriging method was applied on the residuals of the numerical model (MODFLOW) generated by the TWDB (Texas Water Development Board) for the Edwards–Trinity (Plateau) aquifer. The study was done for the years 1995 through 2000 where 90% of the observation data was used for model simulation followed by cross-validation with the remaining 10% of the observations. The kriging method reduced the average absolute error of approximately 31 m (for MODFLOW simulation) to less than 5 m. Furthermore, the residuals' average standard error was reduced from 9.7 to 4.7. This implies that the mean value of residuals over the entire period can be a good estimation for each year separately. The use of the kriging technique thus can provide improved monitoring of groundwater levels resulting in more accurate potentiometric surface maps.

**Keywords:** groundwater monitoring; modeling; MODFLOW; kriging; residuals; water levels; Edwards–Trinity aquifer

#### **1. Introduction**

The exponential growth of population, rapid socio-economic development, increasing food demand, and changing climatic factors have led to a decline in both the quality and quantity of freshwater resources. The decreasing available resources have posed serious challenges in the agricultural sector with limited water available for irrigation. As an alternative resource, the reuse of wastewater in irrigation has been increasingly recognized as an essential, and economical strategy [1–3]. However, only a small fraction of wastewater with less than 6% in the US [4] and less than 3% globally [5] is reclaimed; and the irrigation largely relies on groundwater sources. Groundwater is one of the primary sustainable water resources, especially during the high demand seasons, due to its lower susceptibility to sudden changes. About 70% of groundwater withdrawal worldwide is used for agriculture while irrigating nearly 38% of irrigated lands [6]. Likewise, nearly 50% of irrigated lands in the United States are based on groundwater sources [6].

Over the past centuries, extreme drought events have significantly affected both surface and groundwater resources [7,8]. While low surface water levels might be an immediate indicator of drought, changes to groundwater levels indicate long-term water scarcity. Further, it is straightforward to monitor and assess surface water changes while

**Citation:** Asadi, A.; Adhikari, K. Minimizing Errors in the Prediction of Water Levels Using Kriging Technique in Residuals of the Groundwater Model. *Water* **2022**, *14*, 426. https://doi.org/10.3390/ w14030426

Academic Editor: Cristina Di Salvo

Received: 22 December 2021 Accepted: 27 January 2022 Published: 29 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

measuring variations in groundwater resources is very challenging and time-consuming. Water movement in porous media is sufficiently complex to simulate and even the most elaborate models cannot estimate all details of this complexity, even if attempted, it ends with a parsimonious estimation of the reality.

The most common approach which has been used for decades is to estimate the groundwater head variable, mostly by using numerical models [9–11]. The numerical models use mathematical equations to describe the physics of the groundwater flow; the model accuracy thus depends on how precisely the conceptual models describe the realworld system. These models largely rely on the available water-level data [12,13]. However, groundwater-level data are often irregularly sampled, leading to temporal gaps in the record, and are not adequately distributed spatially across an aquifer [14–16]. The spatial sparseness of data presents challenges when spatially interpolating potentiometric surfaces and creating groundwater maps [14,17,18].

In addition to numerical models, statistical-based or regression-based approaches like kriging, spline interpolation, and neural networks have also been used in predicting water levels [19,20]. The most vivid method applied in the spatially auto-correlated variable like groundwater level is the kriging method namely, the ordinary kriging technique. Aboufirassi et al. [21] employed the universal kriging to estimate the water table for the Souss aquifer in central Morocco. Pucci and Murashige [22] used kriging for optimizing data collection and utility in a regional groundwater investigation in central New Jersey and confirmed that kriging is a useful tool especially in areas lacking enough data for developing a water table management network. Hoeksema et al. [23] applied the co-kriging method to estimate the groundwater level at unknown points. A similar study was done by other researchers where the kriging method was used to estimate the water levels at wells [24,25]. Theodossiou and Latinopoulos [13] used the kriging method on 31 wells in evaluating and optimizing the groundwater level observation networks. Ahmadi and Sedghamiz [26] evaluated the spatial and temporal variations of groundwater level of 39 observation wells using kriging. In their later article [27], the kriging and co-kriging methods were applied for groundwater depth mapping in southern Iran. Tapoglou et al. [12] used Artificial Neural Networks (ANNs) to estimate the temporal prediction of the water level and applied the kriging method to spatial parts. Ruybal et al. [17] used spatiotemporal kriging in evaluating groundwater levels in the Arapahoe aquifer. These approaches come with a major limitation in that they fail to explain the physics of the groundwater flow [28].

This study presents an integrated approach where the groundwater model is coupled with a geo-spatial kriging tool. This way, we expect to integrate the strong aspects of both approaches while reducing their demerits. The numerical model will explain the physics of groundwater flow. The statistical interpolation method, kriging will then be used on the numerical model's errors with an expectation of improving our estimation by considering the complexity not detected by a numerical model. In this study, the kriging method was applied on the residuals of the numerical model (MODFLOW) generated by the TWDB (Texas Water Development Board) for the Edwards–Trinity (Plateau) aquifer to improve the estimation of the water table spatially. The study was done for the years 1995 through 2000 where 90% of the observation data was used for model simulation followed by crossvalidation with the remaining 10% of the observation data. To the authors' knowledge, no prior efforts have been done using the technique adopted in this paper. Most importantly, the significant improvement in groundwater level predictions makes this study a promising approach for the sustainable management of water resources.

#### **2. Methodology**

#### *2.1. Study Area*

The Edwards–Trinity (Plateau) Aquifer, as shown in Figure 1, expands over westcentral Texas between 97◦ and 105◦ west longitudes and between 29◦ and 33◦ north latitudes. The topology of the aquifer is known as a plateau, lightly leveling from about 610 m (2000 ft) above sea level in the southeast to about 915 m (3000 ft) in the northwest. The precipitation

in the region ranges between 86 cm (34 in) in the east to 30 cm (12 in) in the west. The maximum average annual temperature for the study area ranges between 23 ◦C (73◦ F) in the Trans-Pecos uplands to 26 ◦C (79◦ F) in southern Val Verde County. The study area had numerous drought events during the past hundred years [29]. Yet, the drought events are expected to have minimal impacts for the study period (1995 through 2000) as the last drought was during the 1950s.

**Figure 1.** Location of the study area—Edwards–Trinity (Plateau) Aquifer.

#### *2.2. Data*

Data on groundwater levels, climate data, boundaries shapefiles, data on hydraulic properties of the aquifer, elevation data, and MODFLOW simulated heads were required. All these data were obtained from the TWDB website [30]. The water levels were obtained from the Groundwater Database (GWDB:1, accessed on 1 April 2020 ) consisting of shapefiles with geospatial information on water level and quality for management, monitoring, and characterization of the water in the Edwards–Trinity aquifers. The precipitation data was gathered using raster data for the years 1995 through 2001. The boundaries and hydraulic properties were gathered from separate shapefiles consisting of aquifer boundaries, model boundaries, Texas county boundaries, hydraulic conductivity for each one sq. km. The elevation data was obtained from the raster files consisting of DEM (Digital Elevation Model), top and bottom elevation of the Edwards and Trinity aquifers. Lastly, MODFLOW generated heads were obtained as publicly accessible binary files produced from the MOD-FLOW model developed by TWDB. Figure 2 shows the location of observation wells for all the years (1995–2000). Not all years have observation data available for the same locations, thus the locations for each year were selected randomly as can be seen in Figure 2.

**Figure 2.** Location of observation wells in the Edwards–Trinity aquifer.

#### *2.3. Modeling*

Figure 3 represents a schematic diagram of the research methodology adopted. As shown in the figure, the general scheme of this study consists of three major components— (1) Data Preparation, (2) Calibration, and (3) Cross-Validation and includes a series of steps used interactively as listed below:


**Figure 3.** Schematic diagram of the MODFLOW + kriging model.

#### 2.3.1. Data Preparation

To minimize the distortion of the area and appropriate evaluation of the data, the NAD 1983 Texas Centric Mapping System Albers was used. First, the data were categorized into two groups—observed data and MODFLOW generated output. The observed data consists of the mean values for the water tables during the winter season (winter months were selected as the water tables are expected to be more stable during winter and thus minimize the anthropogenic effects due to uncertainties in demand seasons). The observed data were then provided with coordinates and transformed into a shapefile. For the MODFLOW generated data, the model grid shapefile produced by the MODFLOW program was employed in the study area and saved based on the needed attributes for future analysis. The binary files consisting of groundwater head data were converted to a text file using python programming. The extracted head data were matched with the corresponding observed data by locating the observation data in the grid shapefile. Finally, the shapefile of observation data was overlaid on the MODFLOW grid shapefile to determine the value of the MODFLOW generated head for each observation point. The MODFLOW generated heads were compared with the observed data for the corresponding locations thus obtaining the residuals.

#### 2.3.2. Kriging Method

Kriging is one of the well known methods of predicting spatial characteristics and it has been used in a variety of fields (e.g., soil science, ecology, mining, and water resources) to provide a robust unbiased estimation of geo-distributed variables from small scales like Xray scattering experiments [31] to large ones like traffic behavior pattern [32], soil properties' profile [33], and anticipating the metrological variables [34]. The main advantage of the kriging method over the other spatial interpolation techniques is that the method is driven based on the statistical theory, comparing others that are mostly deterministic and they have the lack of ability to use for prediction. Furthermore, studies over geo-dependent

variables showed that kriging has outstanding performance compared with its similar existing interpolation methods [35].

Equation (1) shows the stochastic function that is considered in the simple kriging method:

$$Z(s) = \mu + \varepsilon(s) \tag{1}$$

where *Z*(*s*) is the estimating function—which is simulated groundwater head elevation in our study, *μ* is a constant value representing the mean value of the groundwater head and *ε*(*s*) is a random error function regarding the model error from the observed records.

Estimating the value of *Z*(*s*) relies on two primary assumptions of considering that variables are randomly distributed, and they preserve second-order stationary condition respect to the location [36], which means:

$$E[Z(s+h)] = E[Z(s)]\tag{2}$$

$$\text{cov}[Z(\mathbf{s} + h), Z(\mathbf{s})] = \mathbb{C}[h] \tag{3}$$

where *h* is a vector that connects point s to point s + h. Equation (2) implies that the expected value *E*[*Z*(*s*)] is constant in all domains as represented in Equation (1) by the constant *μ*. However, in reality, this value fluctuates from place to place due to inherent trends and variability in data. To deal with this problem, it is usually assumed that the estimating variable (groundwater head) comprises two components

$$Z(s) = m(s) + \varepsilon(s) \tag{4}$$

where *m*(*s*) presents the deterministic part and *e*(*s*) is the statistical component of the estimating variable which includes the spatially correlated random variable. Since *m*(*s*) is treated as a deterministic part, it can be determined in a separate process and summed up with residuals random function component *e*(*s*). In this study, the deterministic part of the head variable is estimated by existing numerical models. We then applied the ordinary kriging (OK) to the residuals of the numerical model for the statistical part assuming the mean value of error is not known and needs to be obtained over optimization process by minimizing the variance of errors.

#### **3. Results and Discussion**

#### *3.1. Data Investigation*

In standard statistical problems, the first step before starting to model the phenomena is to examine the data. In this study, a commonly used statistical tool, histogram, was used to see if the MODFLOW residuals are normally distributed. As shown in Figure 4, the residuals in general, follow the normality pattern. However, some level of negative skewness and distortion of normality was observed in the data which could be attributed to the complexity of nature.

Next, the residuals were checked for spatial correlation as the data tends to share some information with their neighbors. A variogram method was used to estimate how strongly data are related to each other. The variogram in space is usually implemented to check two major assumptions in the application of the kriging method—stationary in space (variance is independent of location) and isotropy (variance is independent of direction). Thus, a semi-variogram cloud or plot was used to provide a better visual understanding of the data distribution and to detect any possible trends or geometric anisotropic behavior.

**Figure 4.** Histogram presenting the water level residuals for Edwards–Trinity aquifer (years 1995–2000).

Figure 5 shows the semi-variogram clouds of water level residuals calculated from subtraction between observation points and the MODFLOW outputs for the years 1995 to 2000. The difference squared (*γ*) in Figure 5 represents the MODFLOW residuals dissimilarity which is defined as:

$$\gamma = \frac{1}{2} \left( Z(s\_i) - Z(s\_j) \right)^2 \tag{5}$$

where *s* is sample location and *Z* is the residual value.

The reddish circles in the figure indicate that there is a strong gradient observed at short distances, as the value of the semi-variogram for each pair in these areas is significantly high. This significant change can be an indication of non-stationarity in higher ranks caused by neighboring drainage areas or rivers. However, to get a proper conclusion over the sources that resulted in non-stationary, more investigation needs to be considered which is beyond the scope of this study.

**Figure 5.** Semi-variogram cloud for water level residuals during the years 1995 through 2000.

The semi-variogram was further used for interpolation where semi-variance was used to represent the expected value of the residuals' dissimilarity. A theoretical model was fit into the sample data. In this study, a commonly used Spherical function was used. The spherical function shows a progressive decrease of spatial autocorrelation until some distance (radius of influence), beyond which autocorrelation is zero [37]. As observed from the sample variogram in Figure 6, the Nugget variance of 50 and range of 50,000 was defined for fitted theoretical semi-variogram.

**Figure 6.** Sample variogram and fitted model applied on Edwards aquifer dataset for years 1995 through 2000.

#### *3.2. Model Simulation*

The ordinary Kriging method was applied to the MODFLOW residuals. In the process, the existing trend arising from uncertainties of the simulated model was removed using first-order trend and the exponential Kernel function was applied to weight the values of the neighbors closed to sampled values [38].

The spherical model was chosen to fit on a semi-variogram, and the maximum number of neighbors affecting the predicted data was limited to ten points. The kriging then applied as a single model resulted in a higher mean root square error (not presented in this paper), thus the area was divided into four quadrants with 45 offsets to minimize the distortion due to anisotropy as observed in Figure 5.

Figure 7 shows the prediction for OK applied to Edwards–Trinity aquifer between the years 1995 to 2000. High residuals (as shown by dark red and blue color) were observed for the areas outside the boundary while the residual values are minimal inside the study area. Furthermore, similar observations were observed in deviations as shown in Figure 8. The high residuals or larger deviation (as shown by purple color) outside the boundary is attributed to the missing observation data. Furthermore, some areas inside the study area

showed relatively higher residuals with larger deviations which could also be due to the missing observation data.

**Figure 7.** Predicted residuals (in meters) after application of kriging on MODFLOW residuals for years 1995 through 2000.

**Figure 8.** Standard Deviation (in meters) for ordinary kriging applied to Edwards–Trinity aquifer for the years 1995 through 2000 (the black dots in the figure represent the observation points used during cross-validation).

#### *3.3. Model Validation*

As mentioned before, the entire dataset was divided into two sets—a simulation dataset (90% data) and a validation dataset (remaining 10% data). First, kriging was applied on 90% of the data where the validation was performed automatically using the ArcGIS Geostatistical Analyst toolbar [38]. The Geostatistical Analyst toolbar provides the measured and predicted values and the standard error for each point. During the process, the software keeps any individual points separate from other data (referred to as dataset) for estimating the spherical model parameters. The parameters are then estimated using every such dataset and the process continues until optimal parameters that best fit the entire datasets are obtained. Table 1 presents the average water head for observation points along with predicted water heads using MODFLOW and kriging, while Table 2 provides a detailed summary of kriging application on the MODFLOW residuals. As observed in Table 1, the residuals obtained from the MODFLOW simulation are much higher as compared to the residuals obtained after the application of kriging. The residuals significantly reduced from an average value of approximately 37 m to less than 1 m, with a standard error of approximately 0.5 m.


**Table 1.** Average water heads using MODFLOW and kriging method (simulation dataset).

<sup>1</sup> Observed average value for water level for the given year (winter season). <sup>2</sup> MODFLOW simulated average value for water level for the given year (winter season). <sup>3</sup> Difference between observed values and the MODFLOW simulated values (Observations—Mean a). <sup>4</sup> Simulated average value for water level for the given year after application of kriging (Mean a + Predicted Residuals). <sup>5</sup> Residuals obtained after application of kriging on the observed residuals. <sup>6</sup> Difference between predicted and observed residuals for each point.



As a next step, a manual cross-validation was adopted where the calibrated model was applied to the remaining 10% of the data. Table 3 presents the average water heads for observation points along with predicted water heads using MODFLOW and kriging. Similar to observations in Table 1, higher residuals were obtained for the MODFLOW simulation for all the years. However, after the application of kriging on MODFLOW residuals, the average absolute error of approximately 31 m (from MODFLOW simulation) was reduced to less than 5 m. A similar reduction was observed in the residuals' average standard error where the values reduced from 9.7 to 4.7. The observed reduction in standard error thus indicates that the average value of residuals over the entire period can be a good estimation for each year separately. Table 4 provides a detailed summary of the kriging application on the residuals.


**Table 3.** Average water heads using MODFLOW and kriging method (validation dataset).

**Table 4.** Detailed summary of kriging application on MODFLOW residuals (validation dataset).


For all the years, the water head for observation points obtained after kriging is much closer to the observed values. However, for the year 2000, three observation points (11, 12, and 15) showed higher residuals after the application of kriging as shown in Table 5. This could possibly be due to the higher prediction accuracy of the MODFLOW simulation in

those points. The table with water heads for all the years (1995 through 2000) has been provided as a supplementary file in Table S1.


**Table 5.** Predicted water heads for observation points for the year 2000.

#### *3.4. Comparison to Other Studies*

The integrated approach used in this study satisfactorily described the general pattern of residuals generated by numerical model MODFLOW, thereby improving the prediction of the groundwater level at ungauged areas. Results showed that the integrated kriging and MODFLOW method can be used as an alternative approach in improving the existing numerical models whilst reducing the underlying model uncertainties. Most of the previous studies focused on decreasing the uncertainties by improving the resolution, or/and including local hydraulic variables like pumping and recharge/discharge to the largescaled models [39].

However, the uncertainties in residuals can have sources of randomness that the improvements in the modeling process may not address, and rather could be represented by statistical methods such as kriging. The applied ordinary kriging in this study showed that the MODFLOW model's residuals are locally correlated noises and can be estimated from their neighbors to some extent (with the range of 50 km) by using the spherical method as a correlation function. The numerical model used in this study was developed and improved by Anaya & Jones [29] and is the MODFLOW model approved by TWDB for the Edwards–Trinity Aquifer. In this study, MODFLOW was used as a deterministic part and the model uncertainties were presented as trends during the kriging model. The use of kriging reduced the average absolute error from approximately 31 m (from MODFLOW simulation) to less than 5 m after the application of kriging, which aligns with the findings from a previous study by Liu et al., [40]. They calibrated the MODFLOW model and were able to reduce the average absolute error from 7.7 m to 3.44 m through updates in the numerical modeling process. During the calibration process, they revised the recharge and used the actual value for pumping. However, the improvement was only performed for a smaller region (only the San Antonio segment of the aquifer) in contrast to improvement over the entire Edwards–Trinity as in the study presented by the authors.

#### **4. Conclusions**

The kriging method was applied to improve spatial confidence in groundwater-level predictions at unsampled locations. Kriging was applied on the MODFLOW residuals for the groundwater levels in the Edwards–Trinity aquifer in Texas. The study was done for the years 1995 through 2000 where 90% of the observation data was used for model simulation followed by validation with the remaining 10% of the observations. The kriging method significantly improved water level predictions. The average absolute error of approximately 31 m (from MODFLOW simulation) was reduced to less than 5 m after the application of kriging on MODFLOW residuals. Furthermore, the average residuals' standard error decreased from 9.7 to 4.7, which indicates that the average value of residuals over the entire period can be a good estimation for each year separately. With improved water level predictions, geostatistical tools such as kriging can be used to produce more accurate potentiometric surface maps. Such improved results and accurate monitoring of groundwater resources will lead to the sustainable use of groundwater resources while also aiding in efficient and effective conjunctive management of surface and groundwater resources.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10 .3390/w14030426/s1, Table S1: Predicted water head for observation points for the the years 1995 through 2000.

**Author Contributions:** Conceptualization, A.A. and K.A.; methodology, A.A.; validation, K.A.; formal analysis, A.A. and K.A.; investigation, A.A. and K.A.; data curation, A.A.; writing—original draft preparation, A.A. and K.A.; writing—review and editing, A.A. and K.A.; visualization, A.A. and K.A.; funding acquisition, K.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** The manuscript hasn't receive external founding. Publishing fee for this article has been covered by College of Natural Sciences at Cal Poly Humboldt.

**Data Availability Statement:** Data available on request.

**Acknowledgments:** The authors would like to thank the Department of Civil, Environmental and Construction Engineering at Texas Tech University for their support in conducting this research with assistantship support. And, sincere thanks to Guofeng Cao fromthe Department of Geography at University of Colorado, Boulder, and Adjunct Faculty of the Department of Geography at Texas Tech University, for his guidance with the research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

