A New Method for Estimating Groundwater Changes Based on Optimized Deep Learning Models—A Case Study of Baiquan Spring Domain in China

Zhang, Jialun; Dong, Donglin; Zhang, Longqiang

doi:10.3390/w15234129


by Jialun Zhang ¹,², Donglin Dong ¹,²,* and Longqiang Zhang ¹,²

¹ College of Geoscience and Surveying Engineering, China University of Mining & Technology-Beijing, Beijing 100083, China

² National Engineering Research Center of Coal Mine Water Hazard Controlling, Beijing 100083, China

* Author to whom correspondence should be addressed.

Water 2023, 15(23), 4129; https://doi.org/10.3390/w15234129

Submission received: 26 October 2023 / Revised: 20 November 2023 / Accepted: 24 November 2023 / Published: 28 November 2023


Abstract

Estimating groundwater level (GWL) changes is crucial for the sustainable management of water resources in the face of urbanization and population growth. Existing prediction methods for GWL variations have limitations due to their inability to account for the diverse and irregular patterns of change. This paper introduces an innovative approach to GWL prediction that leverages multisource data and offers a comprehensive analysis of influencing factors. Our methodology goes beyond conventional approaches by incorporating historical GWL data, examining the impacts of precipitation and extraction, as well as considering policy-driven influences, especially in nations like China. The main contribution of this study is the development of a novel hierarchical framework (HGP) for GWL prediction, which progressively integrates correlations among different hierarchical information sources. In our experimental analysis, we make a significant discovery: extraction has a more substantial impact on GWL changes compared to precipitation. Building on this insight, our HGP model demonstrates superior predictive performance when evaluated on real-world datasets. The results show that HGP can increase NSE and R² scores by 2.8% during the test period compared to the current more accurate deep learning method: ANFIS. This innovative model not only enhances GWL prediction accuracy but also provides valuable insight for effective water resource management. By incorporating multisource data and a novel hierarchical framework, our approach advances the state of the art in GWL prediction, contributing to more sustainable and informed decision making in the context of groundwater resource management.

Keywords:

machine learning models; deep learning; groundwater; hierarchical recurrent neural network; CNN

1. Introduction

Groundwater is an essential natural resource widely used in society. Effectively managing this water resource is critical to ensure an adequate and stable supply for future production and consumption. Analyzing groundwater levels (GWLs) and estimating changes in GWLs are essential to sustainable groundwater management. Weather, groundwater extraction, and land use influence GWLs; accurate data availability, funding, policy structure, and application are critical to groundwater management. An accurate and stable monitoring system is required for adequate groundwater storage, establishing long-term and short-term storage plans, and optimizing infrastructure operations. Optimizing groundwater distribution and supply can mitigate and prevent environmental problems, such as droughts, floods, famines, and landslides [1,2].

Over the years, numerous scholars have proposed many GWL prediction techniques to assist in managing groundwater resources. However, the complexity and dynamics of groundwater flow make accurate and comprehensive simulations challenging [3]. The long period and wide spatial span of groundwater data make the selection of the best method to analyze these data complex.

In recent years, physically based, or traditional, GWL prediction models have been used frequently. Sahoo and Jha studied multiple linear regression (MLR) to predict groundwater levels in an unconfined aquifer system [4]. Their research showed that the MLR model developed for predicting GWLs had reasonable accuracy and could be used as a simple GWL modeling tool when data are limited. However, a limitation of this traditional method is that MLR cannot represent nonlinear relationships between input and output variables. In a recent study, Yousefi predicted the GWL in Karaj, Iran, over a decade using MATLAB [5]. The method used MODFLOW2005-NWT, an independent program that improves the solution of unconfined groundwater flow problems. The GWL modeling focused on three scenarios: positive, negative, and sustained. Regarding this method, Yadav pointed out that predicting the GWL using this approach is complex because of the large number of physical operations in the groundwater system that need to be described [6].

Hydrology, geology, topography, meteorology, and climate contribute to data uncertainty, complicating the calibration and validation of physically based models [7]. The nonlinear relationships among variables in groundwater and other hydrological systems require a large amount of data for modeling, making groundwater level prediction challenging [8]. Many researchers have recently adopted machine learning techniques to overcome the limitations of physical models, which are increasingly important because they can independently adapt to new data and learn from previous computations to make reliable and even accurate predictions [9,10].

In the domain of groundwater level (GWL) prediction, machine learning advancements have progressively tackled the complexities inherent in hydrogeological systems, yet they often grapple with the nuanced interplay of exceptional events, such as droughts, and the critical role of anthropogenic factors [11,12]. Artificial neural networks (ANNs) marked a significant shift from traditional methods, adeptly handling complex scenarios but occasionally constrained by overfitting and computational intensity [13]. Subsequent developments, such as feed-forward neural networks (FFNNs) and recurrent neural networks (RNNs), offered improvements in accuracy and time series management, although they too faced specific limitations such as gradient vanishing [14,15]. Innovations like long short-term memory (LSTM) and gated recurrent unit (GRU) models addressed some of these issues, showing enhanced stability and accuracy in predictions [16,17,18,19,20]. Concurrently, support vector machines (SVMs) and adaptive neuro-fuzzy inference systems (ANFIS) enriched the predictive landscape, especially in handling nonlinear and multivariable scenarios [21]. The recent advent of nonlinear autoregressive networks with exogenous inputs (NARX) further exemplifies the field’s evolution, particularly in challenging environments like urbanized and arid aquifers [22]. Despite these technological strides, a holistic approach that encompasses both prediction accuracy and a comprehensive understanding of the underlying physical processes, including the impact of human activities, remains a vital consideration in sustainable groundwater management [23,24,25,26,27,28,29,30].

This study presents an innovative approach to groundwater level (GWL) prediction by integrating data from the micro-, meso-, and macrolevels, an endeavor driven by the aspiration to achieve unparalleled accuracy and a holistic understanding of GWL dynamics. This multitiered strategy, drawing on lessons from prior research [31], harmonizes data across different scales to decode the complexities of groundwater behavior.

This study meticulously examines groundwater level (GWL) prediction by harnessing data across three scales: micro, meso, and macro. At the microlevel, historical GWL records from monitoring wells provide insight into local fluctuations and temporal autocorrelation, offering detailed but narrowly scoped data. The mesolevel extends this analysis by including meteorological and groundwater extraction data from the Baiquan spring domain, shedding light on broader influences such as precipitation’s effect on recharge and the impact of extraction practices on GWL. The macrolevel further broadens the perspective by integrating government policies using binary indicators to assess the effects of water resource management and groundwater utilization on GWL.

This holistic approach to data integration sets the stage for the main objective: the development and validation of a hierarchical groundwater level prediction (HGP) model, a novel approach that integrates multisource data across micro-, meso-, and macrolevels to enhance the accuracy of groundwater level (GWL) predictions. This model aims to overcome the limitations of traditional methods by providing a more comprehensive understanding of the factors influencing GWL, including environmental variables, human activities, and policy impacts. Through this innovative approach, we seek to advance the field of groundwater management by offering a nuanced and effective tool for predicting GWL.

2. Materials and Methods

2.1. Data

2.1.1. Dataset Description

The multisource dataset was constructed by collecting and integrating various data streams, including precipitation measurements obtained from satellite observations, historical groundwater level records from well stations, and data on groundwater extraction rates from local water authorities. Additionally, we incorporated policy-related variables, such as regulatory measures, conservation policies, and groundwater management initiatives, to capture the influence of governance on groundwater dynamics (pertinent water management policies were obtained from governmental repositories).

Before delving into the interpretation of the multisource data, it is essential to provide a brief introduction to the hydrogeological conditions of the study area. This will enable us to explain the impact of multisource data on groundwater levels (GWLs) from a hydrogeological perspective. The Baiquan karst water system is located in the plain area of the eastern foot of the Taihang Mountains in the west of Xingtai and Handan [32]. It is an independent watershed in which the water supply is a vital function. Figure 1 shows an overview of the study area. The system covers an area of 3843 km², with a significant difference in terrain height. The Baiquan karst water system is a complete drainage type, mainly recharged by atmospheric precipitation and supplied to Xingtai city through underground runoff.

The study area, primarily situated in the eastern foothills of the southern Taihang Mountains in Hebei Province, China, features a topography of low mountains and hills with elevations ranging from 40 to 1200 m. Influenced by river and valley flood activities, a series of alluvial–proluvial fans of varying sizes have formed along the mountain front. The stratigraphy of the region spans from the Archean to the Cenozoic era, encompassing a diverse range of geological formations.

The aquifer systems in this spring area are categorized into three major types: porous water-bearing rock systems in unconsolidated rocks, water-bearing systems in carbonate rock fractures and karst, and those in bedrock fractures. Karst water in the area is further classified based on burial conditions into exposed, covered, and buried types. The development and distribution of primary karst features, such as solution pores, fissures, and caves, are influenced by lithology, structural geology, and topography, as well as hydrological and hydrodynamic conditions, with rock type and structure being key factors.

Hydrogeologically, the area is bounded by the Inner Hill–Xingtai Arcuate major fault and the Xingtai–Fengfeng fault in the east, forming a water-blocking boundary. The southern boundary’s western segment is demarcated by the groundwater divide of the North Ming River (with the Fengfeng Heilongdong spring area of Handan to the southwest), while its eastern segment is defined by coal strata and igneous rock formations. The western boundary aligns with the surface water divide of the Taihang Mountains, and the northern boundary is marked by the groundwater divide in the area of the Inner Hill Northwest Ridge (adjacent to the Shigu spring area of Lincheng, Xingtai, China). These boundaries delineate a largely independent and closed hydrogeological unit, predominantly characterized by karst water.

The system has the advantages of fast recharge, short cycle time, and excellent water quality, but changes in rainfall and large-scale extraction and drainage can affect the flow of the spring group. The causes of spring flow interruption are groundwater overdraft, defective planning and construction of water supply sources, increased mining drainage, and reduced groundwater recharge. Therefore, an intelligent prediction mechanism is needed to manage groundwater resources.

To ensure data consistency and compatibility, we conducted rigorous preprocessing procedures, including data cleansing, normalization, and temporal alignment. The integrated dataset facilitated a unified framework for conducting in-depth analyses.
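The text does not specify how normalization and temporal alignment were carried out. As a minimal sketch, assuming min-max scaling and alignment on the dates shared by all series, the two steps could look as follows. The function names and the sample values are hypothetical and are not taken from the study.

```python
from datetime import date


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def align_by_date(*series):
    """Keep only the dates present in every series (each a dict: date -> value)."""
    common = sorted(set.intersection(*(set(s) for s in series)))
    return common, [[s[d] for d in common] for s in series]


# Hypothetical daily records with partially overlapping dates.
gwl = {date(2018, 1, 1): 52.1, date(2018, 1, 2): 52.0, date(2018, 1, 3): 51.8}
rain = {date(2018, 1, 2): 0.0, date(2018, 1, 3): 4.5, date(2018, 1, 4): 1.2}

dates, (gwl_aligned, rain_aligned) = align_by_date(gwl, rain)
gwl_scaled = min_max_normalize(gwl_aligned)
```

Only the two dates common to both records survive the alignment, so every model input refers to the same time axis before scaling is applied.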

2.1.2. Empirical Observation

Groundwater level prediction is a complex task that requires a comprehensive understanding of the diverse factors influencing groundwater dynamics. To overcome the limitations of traditional single-source data-driven methods, we present a novel approach that leverages multisource data to forecast groundwater levels with enhanced accuracy and scientific rigor. The dataset comprises crucial components, including precipitation data, historical groundwater level records, groundwater extraction, and pertinent policy variables impacting groundwater management. Integrating this diverse dataset enables a more holistic analysis of groundwater behavior and its response to various environmental and anthropogenic influences. The associated observational findings and insight are presented in this section.

Microlevel: Historical GWL Observation

In Figure 2 (left), we analyzed historical groundwater levels from seven observation wells, focusing on well #7 in urban Xingtai city. The groundwater levels in the first half of 2018 decreased, followed by a gradual rise from July 2018 to a peak in February 2019. A continuous decline until June 2020 marked the lowest levels, followed by a gradual rise stabilizing in October 2020. A sharp increase occurred in July 2021, followed by stability in November 2021. These trends align closely with data from the Xingtai City Water Resources Bureau, suggesting groundwater levels as a reliable indicator of groundwater resources.

Figure 2 (right) shows the strong autocorrelation in the daily observation data from 2018 to 2022, with the autocorrelation coefficients exceeding 0.77 within a 30-day lag period. This suggests significant short-term memory effects on groundwater levels, influenced by factors like groundwater flow rates, aquifer properties, and external drivers.
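To illustrate how lag-wise autocorrelation coefficients of the kind shown in Figure 2 (right) can be obtained, the following sketch computes the sample autocorrelation for lags of 1 to 30 days. The series is synthetic and the function name `autocorr` is an assumption made for illustration; it is not the code used in this study.

```python
import math


def autocorr(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


# Synthetic, slowly varying daily "groundwater level" in metres (hypothetical).
levels = [50.0 + 2.0 * math.sin(2 * math.pi * t / 365) for t in range(730)]

# Autocorrelation coefficients for lags of 1 to 30 days.
acf = [autocorr(levels, k) for k in range(1, 31)]
```

A slowly varying series such as this one keeps high coefficients across the whole 30-day window, which is the pattern the text describes as short-term memory.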

While short-term autocorrelation is evident, relying solely on historical groundwater data may not provide precise predictions due to complex interactions involving meteorological patterns, hydrogeological properties, and human activities. To improve predictive models, integrating diverse data sources, including precipitation and groundwater extraction rates, is essential. Precipitation affects aquifer recharge, while groundwater extraction adds an anthropogenic dimension. This comprehensive approach enhances our understanding of groundwater dynamics and supports informed water resource management.

Mesolevel: Precipitation and Extraction Observation

Figure 3 highlights seasonal variations in precipitation, with a rainy season from July to November and a dry season from December to June. Peak rainfall, increasing yearly, was particularly pronounced in July 2021 due to unprecedented heavy rainfall. This July 2021 rainfall significantly contributed to a rapid rise in groundwater levels.

Granger causality testing is a statistical method used to assess whether there exists a causal relationship between two time series datasets. This approach relies on the concept of lagged values and employs a vector autoregression (VAR) model [33]. In the context of Granger causality testing, consider two time series: X and Y. The VAR model takes the following general form for each time series:

\begin{aligned} X_{t} &= \sum_{i = 1}^{p} \alpha_{X i} X_{t - i} + \sum_{i = 1}^{p} \alpha_{Y i} Y_{t - i} + \varepsilon_{X t} \\ Y_{t} &= \sum_{i = 1}^{p} \beta_{X i} X_{t - i} + \sum_{i = 1}^{p} \beta_{Y i} Y_{t - i} + \varepsilon_{Y t} \end{aligned}

(1)

where $X_{t}$ and $Y_{t}$ represent the observations of time series X and Y at time t; p is the chosen number of lags; $\alpha$ and $\beta$ are the coefficients in the model; and $\varepsilon$ represents the white noise error terms. Granger causality testing involves formulating null and alternative hypotheses. The null hypothesis (H0) assumes that time series X does not Granger cause time series Y, that is, all coefficients $\beta_{X i}$ are zero, while the alternative hypothesis (H1) posits that time series X Granger causes time series Y, indicating at least one nonzero $\beta_{X i}$ coefficient. The statistical test uses the F-statistic to examine the null hypothesis, with the resulting p-value giving the probability of observing a test statistic at least as extreme as the computed one if the null hypothesis were true. If the p-value is less than a predetermined significance level (typically 0.05), we reject the null hypothesis, suggesting that time series X does indeed Granger cause time series Y.
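The F-statistic behind this test can be sketched by fitting the Y equation of Equation (1) twice, once without and once with the lagged X terms, and comparing the residual sums of squares. The example below is an illustrative sketch on synthetic data rather than the analysis performed in this study. It adds an intercept, which Equation (1) omits, and it stops at the F-statistic, which would then be compared with an F(p, dof) critical value to obtain the decision. The function names are hypothetical.

```python
import random


def ols_rss(rows, y):
    """Residual sum of squares of a least-squares fit, via the normal equations."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def granger_f(x, y, p):
    """F-statistic for H0: the p lags of x do not help predict y."""
    n = len(y)
    target = y[p:]
    # Restricted model: intercept (an assumption) plus p lags of y.
    restricted = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in range(p, n)]
    # Unrestricted model: additionally the p lags of x.
    unrestricted = [r + [x[t - i] for i in range(1, p + 1)]
                    for r, t in zip(restricted, range(p, n))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Synthetic data in which x drives y with a 2-step lag (hypothetical).
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0, 0.0]
for t in range(2, n):
    y.append(0.6 * y[t - 1] + 0.8 * x[t - 2] + random.gauss(0, 0.5))

f_xy = granger_f(x, y, p=2)  # x -> y: expected to be large
f_yx = granger_f(y, x, p=2)  # y -> x: expected to be much smaller
```

A large F-statistic for x → y alongside a small one for y → x is the asymmetry that the test interprets as X Granger causing Y.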

Past observations confirm that increased precipitation leads to higher groundwater levels, as rainwater replenishes the underground aquifer. Notably, groundwater levels also respond to shorter-term fluctuations related to precipitation, even during an overall declining trend.

To explore the causal link between precipitation and groundwater levels, we conducted a Granger causality test. The results show a strong causal effect, with precipitation at a 2-day lag significantly impacting current groundwater levels.

However, it is important to consider that while statistically significant, the magnitude of this effect may be relatively small, and other factors like aquifer characteristics and human activities could also influence groundwater levels.

In Figure 4, we observe a general decline in groundwater extraction, attributed to efforts to combat excessive extraction in Hebei Province. Notably, groundwater extraction and groundwater levels show an inverse relationship. When extraction decreased, levels began to recover, indicating a correlation.

The correlation analysis reveals a moderate negative correlation of −0.272 between the extraction and groundwater levels, suggesting that as the extraction increases, the levels tend to decrease. The Granger causality tests further confirmed a significant causal effect, with extraction impacting levels at a 1-day lag.
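A correlation coefficient of this kind, optionally evaluated at a lag, can be computed as in the following sketch. The numbers are invented so that the response mirrors the driver exactly one step later; they are not the Baiquan data, and the helper names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_pearson(driver, response, lag):
    """Correlate driver[t - lag] with response[t] for a non-negative lag."""
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])


# Hypothetical series: the level falls one day after extraction rises.
extraction = [5.0, 4.0, 6.0, 3.0, 7.0, 2.0]
levels = [55.0, 55.0, 56.0, 54.0, 57.0, 53.0]

r_lag1 = lagged_pearson(extraction, levels, lag=1)
```

In this constructed case the lag-1 coefficient is strongly negative because each level equals a constant minus the previous day's extraction; real data would yield weaker values such as the one reported above.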

It is important to note that while precipitation also impacts groundwater levels with a 2-day lag, these two factors alone cannot fully explain the fluctuations. For example, during the low precipitation and high extraction from August 2018 to February 2019, the groundwater levels continued to rise, challenging simple predictions based on precipitation and extraction alone.

Macrolevel: Policy Observation

Figure 5 clearly demonstrates that the implementation of water management policies has had a tangible impact on groundwater levels, with a noticeable increasing trend and consistently high positions postimplementation. This observation highlights the criticality of incorporating policy effects into the analysis and prediction of groundwater levels, alongside other key factors like precipitation and groundwater extraction. The grey background in the figure denotes the range of policy influence, marking the periods corresponding with the rising groundwater levels and reduced extraction rates. Notably, significant policy interventions, such as the groundwater replenishment pilot project, initiated by the Provincial Water Resources Department, in Xingtai city in September 2018, and their subsequent conference in May 2021, emphasizing the reduction of the groundwater extraction, have been instrumental in shaping these trends. These policy measures, marked by key dates and actions in the figure, correspond with the periods of rising groundwater levels and reduced extraction rates, underscoring the profound influence that policy decisions exert on groundwater dynamics. This correlation between policy initiatives and groundwater levels, particularly the sustained rise following measures to curb extraction, affirms the necessity of integrating policy considerations into groundwater management strategies.
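Because policy enters the analysis as a binary indicator, a daily policy series can be derived from a list of intervention windows. The sketch below shows one possible encoding. The start months follow the two interventions named above, but the exact start days and the end dates of each window are placeholders chosen only for illustration, since the text does not state them.

```python
from datetime import date, timedelta

# Hypothetical policy windows: only the start months come from the text;
# the start days and all end dates are illustrative placeholders.
POLICY_WINDOWS = [
    (date(2018, 9, 1), date(2019, 2, 28)),
    (date(2021, 5, 1), date(2021, 11, 30)),
]


def policy_indicator(day, windows=POLICY_WINDOWS):
    """Return 1 if `day` falls inside any policy window, else 0."""
    return 1 if any(start <= day <= end for start, end in windows) else 0


def daily_range(start, end):
    """Yield every calendar day from `start` to `end`, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


# Binary policy series over the observation period.
x_p = [policy_indicator(d) for d in daily_range(date(2018, 1, 1), date(2021, 12, 31))]
```

The resulting 0/1 sequence has the same daily resolution as the groundwater level, precipitation, and extraction series, so it can be aligned with them directly.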

On the basis of the observations above, we conclude that a comprehensive analysis of historical groundwater levels, precipitation, groundwater extraction, and policy implications yields valuable insight into the variations of groundwater levels. This integrated approach provides a strong impetus and valuable guidance for predicting groundwater levels, and it motivates us to consider the interconnections among these four factors, which span three hierarchical levels, when forecasting groundwater levels.

2.2. Model

In this segment, we endeavor to synthesize the rich insight obtained from empirical data, an exercise extensively elaborated upon in Section 2.1.2. This synthesis forms the crux of a novel and comprehensive framework dubbed “hierarchical groundwater level prediction” (HGP). This framework is designed to intricately decipher the multifaceted patterns of variations in groundwater levels, drawing heavily from the in-depth analysis of human behavior and data tiers presented in Section 2.1.2.

At the microlevel, our focus narrows to historical groundwater level (GWL) data, denoted as $X_{g}$. This layer of the model delves into the temporal intricacies of GWL fluctuations, capturing the nuanced ebb and flow patterns inherent in the historical data. The microlevel represents the bedrock of our model, providing a granular view of groundwater dynamics over time.

Ascending to the mesolevel, we integrate two pivotal datasets: precipitation $X_{m}$ and groundwater extraction $X_{e}$. These elements serve as critical indicators, elucidating the interplay between meteorological conditions and anthropogenic influences on GWL. At this juncture, $X_{m}$ and $X_{e}$ collectively inform the model about the external factors that directly or indirectly sway the groundwater levels, thereby acknowledging the significant role of environmental and human activities in shaping GWL trends.

The macrolevel of our model, represented by $X_{p}$, encapsulates the overarching impact of government policies. This dimension extends beyond the immediate physical influences on GWL, offering insight into how policy decisions and regulatory frameworks contribute to the broader groundwater environment. Here, $X_{p}$ stands as a testament to the far-reaching implications of policy interventions on groundwater dynamics, underpinning the necessity to incorporate these broader, often indirect, factors into our predictive model.

Our research demonstrates that data from the macro- and mesolevels significantly impact microlevel behavior. To achieve a holistic understanding of groundwater level (GWL) patterns and the interplay of diverse influences, we developed a hierarchical framework. This structured framework extracts nuanced features from different data sources and integrates them progressively.

Traditionally, concatenating data at each time point and subjecting them to a single latent representation is a basic approach in modeling multisource sequences, like multivariate recurrent neural networks (MRNNs). However, this method inflates feature dimensions and may overlook critical interrelationships among data from various hierarchical levels.

Our approach differs from this by processing well data separately at each hierarchical level. Individual factors are processed in dedicated recursive layers, where their latent representations interact and fuse in a harmonious manner, culminating in the multivariate fusion of data. This approach avoids the limitations of simple concatenation, ensuring the extraction of unique features from each source and resulting in the harmonious fusion of information for predicting future GWL changes.

In Figure 6, we present the structured architecture of the hierarchical groundwater level (GWL) prediction (HGP) framework, a design that is both intricate and insightful. The HGP framework is characterized by its multilevel approach to feature extraction and hierarchical fusion, enabling the comprehensive analysis of data from diverse sources across three distinct levels.

At Level 1, the primary focus is on extracting temporal patterns from various observational sequences independently. These sequences include historical GWL data ($h_{g}$), precipitation ($h_{m}$), extraction ($h_{e}$), and policy impact factors ($h_{p}$), each with its unique temporal dynamics. This level is dedicated to isolating and then integrating these temporal patterns, creating pairwise fusions. The objective here is to accurately model the impacts of macro- and mesolevel factors on historical GWL data, ensuring each factor is appropriately represented in the overall analysis.

Moving to Level 2, the approach shifts from independent temporal pattern analysis to exploring the interplay among these factors. The temporal patterns, initially combined at Level 1, are now integrated to form more cohesive units. This level focuses on the interactions between GWL and other factors: GWL-precipitation ($h_{g m}$), GWL-extraction ($h_{g e}$), and GWL-policy impact factors ($h_{g p}$). The aim is to capture the combined effects of mesolevel factors on GWL, ensuring a comprehensive representation of these influences in relation to the historical data.

Level 3 represents the culmination of the framework, in which the overarching temporal patterns that span across multiple data sources are fully integrated. This final level, designated as $h_{g m e p}$, embodies the complete hierarchical fusion process. It synthesizes the insight gathered from the previous levels, offering a detailed and holistic understanding of GWL behavior. This comprehensive fusion of data allows for an accurate prediction of future GWL trends.

Each level of the HGP framework is designed to progressively build upon the previous one, ensuring a thorough and detailed analysis of GWL. The framework utilizes sequential input, $I_{t}^{k}$, and another source, $\tilde{I}_{t}^{k}$, from the k-th level for feature extraction and fusion, following a systematic and rigorous methodology. This structured approach allows the HGP to provide a nuanced and comprehensive understanding of GWL dynamics.

h_{t}^{k} = F_{\mathrm{recurrent}} (W_{I h} \cdot I_{t}^{k}, h_{t - 1}^{k})

(2)

\tilde{h}_{t}^{k} = F_{\mathrm{recurrent}} (\tilde{W}_{I h} \cdot \tilde{I}_{t}^{k}, \tilde{h}_{t - 1}^{k})

(3)

I_{t}^{k + 1} = F_{\mathrm{act}} (F_{\mathrm{fuse}} (h_{t}^{k}, W_{\tilde{h} h} \cdot \tilde{h}_{t}^{k}))

(4)

In this context, $h_{t}^{k}$ represents the latent representation of $I_{t}^{k}$ in the k-th layer at time point t. It is updated by the function $F_{\mathrm{recurrent}}$ based on its previous memory $h_{t - 1}^{k}$ and the current input $I_{t}^{k}$; the tilde symbol (~) marks the corresponding quantities for the sequence from another source. $W_{*}$ is a trainable weight matrix, and the purpose of the function $F_{\mathrm{fuse}}$ is to merge the information from $\tilde{I}_{t}^{k}$ into the current sequence $I_{t}^{k}$. This is followed by an activation function, $F_{\mathrm{act}}$, to produce an intermediate representation. Next, we provide detailed insight into the model inference and learning for GWL prediction.
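The recursion in Equations (2)–(4) can be sketched for a single level as follows. This is an illustrative sketch only: a plain tanh recurrent cell stands in for the recurrent layer, the fusion step is a simple element-wise product used as a placeholder, tanh is assumed for the activation, and the weights are random rather than trained. All function and variable names are hypothetical.

```python
import math
import random


def matvec(w, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    """Random weight matrix; a trained model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def recurrent_step(w_in, w_rec, inp, h_prev):
    """Stand-in for F_recurrent: a plain tanh recurrent cell (an assumption)."""
    return [math.tanh(a + b)
            for a, b in zip(matvec(w_in, inp), matvec(w_rec, h_prev))]


def run_level(seq, seq_tilde, hidden, rng):
    """Apply Eqs. (2)-(4) to two aligned sequences; return next-level inputs."""
    d, d_tilde = len(seq[0]), len(seq_tilde[0])
    w_ih, w_hh = rand_matrix(hidden, d, rng), rand_matrix(hidden, hidden, rng)
    w_ih_t = rand_matrix(hidden, d_tilde, rng)
    w_hh_t = rand_matrix(hidden, hidden, rng)
    w_fuse = rand_matrix(hidden, hidden, rng)  # plays the role of W_{h~ h}
    h, h_tilde = [0.0] * hidden, [0.0] * hidden
    next_inputs = []
    for inp, inp_tilde in zip(seq, seq_tilde):
        h = recurrent_step(w_ih, w_hh, inp, h)                          # Eq. (2)
        h_tilde = recurrent_step(w_ih_t, w_hh_t, inp_tilde, h_tilde)    # Eq. (3)
        # Placeholder fusion: element-wise product of h and the projected h~.
        fused = [a * b for a, b in zip(h, matvec(w_fuse, h_tilde))]
        next_inputs.append([math.tanh(v) for v in fused])               # Eq. (4)
    return next_inputs


# Hypothetical one-dimensional GWL and precipitation sequences.
rng = random.Random(0)
gwl = [[math.sin(t / 5)] for t in range(10)]
rain = [[float(t % 3)] for t in range(10)]
level2_inputs = run_level(gwl, rain, hidden=4, rng=rng)
```

Each time step yields one fused vector, and the sequence of these vectors serves as the input to the next level, which is how the hierarchy builds pairwise and then full fusions.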

Here, we implement HGP using a neural network and learn its parameters by minimizing specific losses. To capture temporal patterns, we can employ LSTM for the recurrent layer, $F_{\mathrm{recurrent}}$. In previous research [34], LSTM has shown better performance compared to GRU, linear RNN, and average pooling for capturing time patterns. Regarding the fusion function, $F_{\mathrm{fuse}}$, we introduced a novel hierarchical fusion mechanism to effectively integrate information from different sources. More details are discussed in the next paragraph.

In our hierarchical fusion mechanism, we developed a multitiered approach to analyze the interplay of data across different hierarchical levels. This framework is based on the premise that interactions among data sequences from various levels reveal complex temporal relationships. Our analysis in Section 2.1.2 has shown that factors such as extraction and precipitation have a time-lagged impact on groundwater levels (GWLs). This finding suggests the need for a fusion approach that not only combines current data but also considers historical data to capture these evolving dynamics.

To model these temporal interactions, we employed a specific formula, illustrated in Figure 7. In this schematic, red arrows represent the computational steps that update the state of our model, incorporating new information at each timestep. The blue arrows trace the propagation of the hidden state $h_{t}$, maintaining the temporal continuity essential for our analysis. In this approach, the current latent representation, denoted as $h_{t}$, is integrated with another latent representation, $\tilde{h}_{t}$, derived from a different data source. This method allows us to effectively combine information from various points in time, providing a more comprehensive understanding of the factors influencing GWL. The orange and blue nodes, marked as $h_{t}$, illustrate the iterative nature of the state across time, crucial for capturing dynamic changes in GWL. By integrating these diverse data sequences, the model aims to offer a more accurate and dynamic representation of groundwater level behavior, considering both present conditions and historical influences.

F_{\mathrm{fuse}} (h_{t}, \tilde{h}_{t}) = (h_{t} \odot W_{\tilde{h} h} \tilde{h}_{t - 1}) \oplus (h_{t} \odot W_{\tilde{h} h} \tilde{h}_{t})

(5)

In this context, $\oplus$ represents the concatenation operator, and $\odot$ denotes the element-wise product. For the t-th time step, $h_{t} \odot W_{\tilde{h} h} \tilde{h}_{t - 1}$ and $h_{t} \odot W_{\tilde{h} h} \tilde{h}_{t}$ capture the influence of $\tilde{h}_{t - 1}$ and $\tilde{h}_{t}$ on $h_{t}$, respectively.
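As a small worked example of Equation (5), the sketch below evaluates the fusion for two-dimensional states, assuming the ⊙ symbol denotes element-wise multiplication. The weight matrix is set to the identity so that the arithmetic can be followed by hand; in the model it would be trained. The function names and numeric values are hypothetical.

```python
def matvec(w, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def fuse(h_t, h_tilde_prev, h_tilde_t, w):
    """Eq. (5): concatenate h_t * (W h~_{t-1}) with h_t * (W h~_t)."""
    lagged = [a * b for a, b in zip(h_t, matvec(w, h_tilde_prev))]
    current = [a * b for a, b in zip(h_t, matvec(w, h_tilde_t))]
    return lagged + current  # list concatenation implements the ⊕ operator


# Hypothetical two-dimensional states; identity weights for readability.
w_identity = [[1.0, 0.0], [0.0, 1.0]]
h_t = [1.0, 2.0]
h_tilde_prev = [0.5, 0.5]   # other source, previous time step
h_tilde_t = [2.0, -1.0]     # other source, current time step

fused = fuse(h_t, h_tilde_prev, h_tilde_t, w_identity)
```

The first half of the output carries the lagged interaction and the second half the same-step interaction, so the fused vector has twice the dimension of $h_{t}$.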

For predicting future groundwater levels (GWLs), our model outputs an expected value. This is achieved by applying a linear transformation, depending on the nature of the data, to the combined features $X_{g}$, $X_{m}$, $X_{e}$, and $X_{p}$. The model is optimized using the Adam optimizer, a popular choice for its efficiency in handling large datasets and variable parameters. The objective function, in this case, is the mean square error (MSE), which is well suited to regression tasks, as it directly corresponds to the model’s prediction accuracy for continuous variables like the GWL.

For achieving optimal model performance, we conducted a rigorous hyperparameter fine-tuning procedure. The learning rate was set at 0.001, which is a commonly used value for stable convergence in many models. The batch size was determined to be 128, balancing the computational load and the model’s ability to generalize from the training data. Additionally, the model was trained for 1000 epochs to ensure thorough learning from the data without overfitting. This combination of learning rate, batch size, and epochs is crucial for the model’s ability to accurately predict future GWLs, striking a balance between complexity, computational efficiency, and prediction accuracy [35,36].

2.3. Model Evaluation

In this study, the performance of the HGP model was evaluated using the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) [37], which were calculated as follows.

NSE = 1 - \frac{\sum_{t = 1}^{n} {(R_{t} - P_{t})}^{2}}{\sum_{t = 1}^{n} {(R_{t} - {\bar{R}}_{t})}^{2}},

(6)

RMSE = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} {(R_{t} - P_{t})}^{2}},

(7)

MAE = \frac{100\%}{n} \sum_{t = 1}^{n} |R_{t} - P_{t}|,

(8)

R^{2} = \frac{{[\sum_{t = 1}^{n} (R_{t} - {\bar{R}}_{t}) (P_{t} - {\bar{P}}_{t})]}^{2}}{\sum_{t = 1}^{n} {(R_{t} - {\bar{R}}_{t})}^{2} \sum_{t = 1}^{n} {(P_{t} - {\bar{P}}_{t})}^{2}},

(9)

where n is the total number of data points, $R_{t}$ and $P_{t}$ are the measured and predicted values of GWL for each model, ${\bar{R}}_{t}$ and ${\bar{P}}_{t}$ are the average GWL measurements and prediction values, and $P_{t,\max}$ and $P_{t,\min}$ are the maximum and minimum values of the GWL predictions, respectively.

In the model performance evaluation, key metrics such as the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and Nash–Sutcliffe efficiency (NSE) are pivotal. R², with values nearing 1, indicates a model’s high predictive accuracy and alignment with the observed data. The RMSE, ideally close to 0, measures the standard deviation of prediction errors, effectively quantifying the average magnitude of errors in the model’s predictions. The MAE, also best when near 0, provides a direct average of error magnitudes, useful for uniformly distributed errors [38]. Finally, the NSE, ranging from −∞ to 1, assesses the model’s predictive skill relative to the mean of the observed data: values closer to 1 signify better performance, a value of 0 corresponds to a prediction no better than the observed mean, and values below 0 indicate poorer performance than a simple mean prediction. These metrics collectively offer a comprehensive evaluation of a model’s accuracy and reliability.
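The four metrics can be sketched in a few lines of Python. The functions below are an illustrative implementation rather than the authors’ code: NSE is computed with the mean of the observations in the denominator (the standard Nash–Sutcliffe definition), R² is the squared Pearson correlation as in Equation (9), and the `scale` argument of `mae` is a hypothetical parameter that reproduces the 100% factor of Equation (8) when set to 100. The sample values are invented for demonstration.

```python
import math


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((r - p) ** 2 for r, p in zip(obs, pred))
    den = sum((r - mean_obs) ** 2 for r in obs)
    return 1.0 - num / den


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(obs, pred)) / len(obs))


def mae(obs, pred, scale=1.0):
    """Mean absolute error; scale=100.0 mirrors the 100% factor in Equation (8)."""
    return scale * sum(abs(r - p) for r, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination as the squared Pearson correlation."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((r - mean_obs) * (p - mean_pred) for r, p in zip(obs, pred))
    var_obs = sum((r - mean_obs) ** 2 for r in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Invented sample series, for demonstration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(nse(observed, predicted), rmse(observed, predicted),
      mae(observed, predicted), r2(observed, predicted))

# Predicting the observed mean everywhere gives NSE = 0, the reference level.
mean_only = [sum(observed) / len(observed)] * len(observed)
print(nse(observed, mean_only))
```

Because R² as defined here depends only on correlation, a prediction with a constant offset can still score close to 1, whereas NSE, RMSE, and MAE penalize the offset; reporting all four therefore guards against that blind spot.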

3. Results

For GWL prediction, we first evaluated the HGP model against a baseline of three established conventional machine learning techniques: random forest, XGBoost, and support vector machines (SVMs). Random forest is an ensemble method that combines multiple decision trees into a robust predictive model; it is effective in time series forecasting because it can discern complex patterns while limiting overfitting [39]. XGBoost, or extreme gradient boosting, employs an iterative boosting approach that enhances prediction precision on temporal data [40]. SVM, a staple in supervised learning, handles time series forecasting by fitting an optimized hyperplane, which allows it to uncover patterns in sequential data [41]. These established methods are widely used in predictive analytics and provide a solid reference point for complex forecasting tasks such as GWL prediction.

To evaluate the capabilities of the hierarchical groundwater level prediction (HGP) framework, we conducted a rigorous comparison with a suite of robust deep learning benchmarks. This comparative analysis included a range of established models such as artificial neural networks (ANNs), feed-forward neural networks (FFNNs), long short-term memory (LSTM) networks, gated recurrent unit (GRU) networks, adaptive neuro-fuzzy inference system (ANFIS), and nonlinear autoregressive networks with exogenous inputs (NARX). This approach allowed us to assess the performance of HGP in the context of these well-established deep learning methodologies, thereby validating its efficacy in groundwater level prediction.

For the evaluation of this framework, we adopted a comprehensive suite of metrics, focusing on the assessment of its predictive performance. Our evaluation criteria included the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). These metrics are fundamental in hydrological modeling, providing a robust quantitative assessment of a model’s accuracy and predictive capabilities. The NSE offers insight into the predictive skill of a model relative to the mean observed data, while the RMSE and MAE provide measures of the average model prediction error. The R², on the other hand, quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables, offering a gauge of the model’s explanatory power. Together, these metrics form a comprehensive framework for evaluating the HGP model’s effectiveness in groundwater level prediction.

We compared the experimental results of the HGP model with the traditional machine learning baselines, as shown in Table 1.

The results presented in Table 1 show that the models differ in their effectiveness at groundwater level (GWL) prediction. Random forest showed a robust training performance (Train NSE: 0.9891, R²: 0.9890), but its effectiveness was reduced in the testing phase (Test NSE: 0.9160, R²: 0.9137). XGBoost, on the other hand, demonstrated improved accuracy and generalization capabilities, with a Train NSE of 0.9977 and a Test NSE of 0.9573. Similarly, SVM displayed high training accuracy (Train NSE: 0.9987, R²: 0.9987) but a slight decline in testing performance (Test NSE: 0.9560, R²: 0.9557).

The hierarchical groundwater level prediction (HGP) model, however, outshone these traditional methods, particularly in the testing phase, which is critical for assessing a model’s generalization ability. HGP achieved the highest NSE (0.9933) and R² (0.9933) in the test set, coupled with the lowest RMSE (0.0259) and MAE (0.3758), signaling its superior predictive accuracy and reliability. This performance underscores the HGP framework’s adeptness at comprehensively capturing the complexities inherent in GWL data, leading to more precise predictions.

We compared the experimental results of the HGP model with the deep learning baselines, as shown in Table 2.

The results from Table 2 provide a comprehensive comparison of the various machine learning models for groundwater level (GWL) prediction, highlighting the hierarchical groundwater level prediction (HGP) model’s superiority. The ANN, FFNN, LSTM, GRU, ANFIS, and NARX models showed commendable performances, particularly in the training phase, with high NSE and R² values close to 1, indicating strong predictive accuracy. However, in the testing phase, these models exhibited varying degrees of decreased effectiveness, as evidenced by the lower NSE and R² values compared to their training performance. This decrease is particularly noticeable in the RMSE and MAE values, which increased in the testing phase, indicating reduced accuracy in real-world scenarios.

In contrast, the HGP model demonstrated exceptional performance, not only maintaining high NSE (0.9933) and R² (0.9933) values in the testing phase but also achieving the lowest RMSE (0.0259) and MAE (0.3758). This indicates that the HGP model not only fits the training data well but also generalizes more effectively to new data. The HGP model’s superior performance is attributed to its multitiered approach, which captures the complexities inherent in GWL data more effectively than traditional models.

In summary, while traditional models, like random forest, XGBoost, and SVM, and deep learning models, like ANN, FFNN, LSTM, GRU, ANFIS, and NARX, have shown proficiency in GWL prediction, the HGP model outperforms these methods, particularly in terms of generalization capabilities, as evidenced by its testing phase metrics, which are shown in Figure 8. Its ability to accurately predict GWL under varying conditions demonstrates its potential as a robust tool for groundwater level forecasting and highlights its capacity to handle complex hydrological data.
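To make the train-to-test drop concrete, the short sketch below computes the NSE gap (Train NSE minus Test NSE) for each model using the values reported in Tables 1 and 2. The dictionary and variable names are illustrative only; the numbers are copied from the tables.

```python
# (Train NSE, Test NSE) pairs as reported in Tables 1 and 2.
nse_scores = {
    "Random Forest": (0.9891, 0.9160),
    "XGBoost": (0.9977, 0.9573),
    "SVM": (0.9987, 0.9560),
    "ANN": (0.9995, 0.9572),
    "FFNN": (0.9996, 0.9692),
    "LSTM": (0.9986, 0.9652),
    "GRU": (0.9996, 0.9626),
    "ANFIS": (0.9996, 0.9790),
    "NARX": (0.9994, 0.9647),
    "HGP": (0.9994, 0.9933),
}

# Generalization gap: how much NSE is lost when moving from training to testing.
gaps = {name: train - test for name, (train, test) in nse_scores.items()}

# List the models from the smallest gap to the largest.
for name in sorted(gaps, key=gaps.get):
    print(f"{name:14s} gap = {gaps[name]:.4f}")

smallest_gap_model = min(gaps, key=gaps.get)
```

A small gap on its own does not prove good generalization (a model could be uniformly poor), so it is best read together with the absolute test scores in the tables.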

In the Taylor diagram (Figure 9), each model’s performance is denoted by a point, where the angular position represents the correlation coefficient between the model and observed data, the radial distance from the origin indicates the standard deviation, and the distance from the observed data point reflects the RMSE. A model that perfectly predicts the observed data would lie on the point marked “Observation”.
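The geometry of a Taylor diagram follows from the law of cosines: with correlation ρ and standard deviations σ_obs and σ_pred, the distance between a model point and the observation point equals the centered root mean square difference, sqrt(σ_obs² + σ_pred² − 2 σ_obs σ_pred ρ). The sketch below checks this identity numerically on invented data; the function name and sample values are hypothetical. Note that the centered difference removes any constant bias, so it coincides with the ordinary RMSE only when the mean error is zero.

```python
import math


def taylor_statistics(obs, pred):
    """Return (std_obs, std_pred, correlation, centered RMS difference)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    std_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in pred) / n)
    corr = sum((o - mean_obs) * (p - mean_pred)
               for o, p in zip(obs, pred)) / (n * std_obs * std_pred)
    crmsd = math.sqrt(sum(((p - mean_pred) - (o - mean_obs)) ** 2
                          for o, p in zip(obs, pred)) / n)
    return std_obs, std_pred, corr, crmsd


# Invented series; the predictions carry a constant bias of +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.6, 2.4, 3.7, 4.3]

std_obs, std_pred, corr, crmsd = taylor_statistics(observed, predicted)

# Polar coordinates of the model point: radius = std_pred, angle = arccos(corr).
corr_clamped = max(-1.0, min(1.0, corr))
x = std_pred * corr_clamped
y = std_pred * math.sqrt(1.0 - corr_clamped ** 2)

# The observation point sits on the horizontal axis at radius std_obs.
distance_to_observation = math.hypot(x - std_obs, y)

# Ordinary RMSE, which still includes the constant bias.
plain_rmse = math.sqrt(sum((o - p) ** 2
                           for o, p in zip(observed, predicted)) / len(observed))

print(distance_to_observation, crmsd, plain_rmse)
```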

The Taylor diagram analysis reveals that the hierarchical groundwater level prediction (HGP) model exhibits an exceptional performance in predicting groundwater levels (GWLs). Positioned in close proximity to the “Observation” point, which indicates a perfect prediction with no error, the HGP model demonstrates both a high correlation with the observed data and a notably low root mean square error (RMSE). This proximity suggests that the HGP model accurately captures not only the pattern of the observed data but also its variance, outperforming other models such as GRU, NARX, and ANFIS, which are positioned further from the “Observation” point and, hence, are less accurate. The ANFIS model also shows a close correlation with the observed data, yet it is the HGP model that stands out for its superior predictive capabilities. These findings underscore the HGP model’s robustness and its significant potential in hydrological modeling applications.

The results summarized in Table 3 provide a detailed evaluation of the hierarchical groundwater level prediction (HGP) model’s performance when individual data components are removed. The comparison illustrates the impact of specific inputs on the model’s predictive accuracy for groundwater levels (GWLs).

When precipitation data are omitted, the model maintains high performance during training (Train NSE: 0.9993, R²: 0.9993) but experiences a reduction in testing accuracy (Test NSE: 0.9620, R²: 0.9616). The removal of extraction data results in a similar trend, with a small decrease in the testing performance (Test NSE: 0.9687, R²: 0.9685). Excluding policy-related data also slightly reduces the testing performance (Test NSE: 0.9726, R²: 0.9725), underlining the importance of governance factors in GWL prediction. Notably, removing all three components together results in a more pronounced drop in the model performance (Test NSE: 0.9577, R²: 0.9574), suggesting that each data stream contributes valuable information for accurate forecasting.

In contrast, the HGP model, utilizing the full multisource dataset, significantly outperforms the models with removed components, particularly in the testing phase, achieving a Test NSE of 0.9933 and a Test R² of 0.9933. This superior performance is further evidenced by the lowest Test RMSE (0.0259) and Test MAE (0.3758), highlighting the model’s robustness and accuracy.

These results underscore the importance of integrating multiple data sources for enhancing GWL prediction. The multisource approach of the HGP model effectively captures the complex interactions and influences on GWL, leading to significantly improved predictive performance. This comprehensive integration of various data streams, such as precipitation, extraction rates, and policy measures, allows the HGP model to provide a nuanced and accurate forecast of GWL, demonstrating the advantage of multisource data in hydrological modeling.
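The ablation comparison can be summarized as the loss in Test NSE relative to the full HGP model. The sketch below performs this arithmetic on the values reported in Table 3; the variable names are illustrative, and the row labels follow the table.

```python
# Test NSE after removing each component, as reported in Table 3.
test_nse_without = {
    "Precipitation": 0.9620,
    "Extraction": 0.9687,
    "Policy": 0.9726,
    "Above all": 0.9577,   # all three components removed together
}
full_model_test_nse = 0.9933  # HGP with the complete multisource dataset

# Drop in Test NSE caused by each removal.
drops = {name: full_model_test_nse - score
         for name, score in test_nse_without.items()}

# Rank the removals from the largest drop to the smallest.
ranking = sorted(drops, key=drops.get, reverse=True)
for name in ranking:
    print(f"{name:14s} drop in Test NSE = {drops[name]:.4f}")
```

This is a single-split comparison, so small differences between rows should be interpreted cautiously; repeating the ablation over several train and test splits would indicate how stable the ordering is.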

4. Discussion

We critically evaluated the hierarchical groundwater level prediction (HGP) model’s strengths and potential areas for improvement, placing particular emphasis on its performance relative to other machine learning models. The HGP model’s superior performance is evident in its high Nash–Sutcliffe efficiency (NSE) and coefficient of determination (R²) values in the testing phase, demonstrating robust generalization capabilities and predictive accuracy. This was further corroborated by the lowest root mean square error (RMSE) and mean absolute error (MAE) observed in testing, indicating its precision in forecasting groundwater levels (GWLs).

The broader perspective introduced by policy considerations provides invaluable insight into how large-scale policies and regulations sculpt the groundwater environment, giving context to more localized data. For example, significant policy actions by the Provincial Water Resources Department have had measurable effects on groundwater levels [42], such as the groundwater replenishment project initiated in Xingtai city on 12 September 2018, which led to over 100 million cubic meters of groundwater replenishment by 16 October 2018 [43]. Additionally, a conference on water management held on 17 May 2021, highlighted the imperative for Xingtai city to curb excessive groundwater extraction [44]. These policy actions are reflected in the HGP model’s data, underscoring the model’s sensitivity to governance influences on groundwater dynamics.

While other models, like GRU [27], NARX [21], and ANFIS [22], showed a commendable performance, they fell short of capturing these policy-driven variations due to their lack of policy integration, often resulting in suboptimal predictive outcomes [45,46,47,48]. In contrast, the HGP model’s multitiered architecture and progressive feature integration from each level effectively fuse disparate data sources without exaggerating feature dimensions. The hierarchical approach not only maintains the distinct characteristics of each data source but also encapsulates their collective impact on GWL, as demonstrated through rigorous testing and validation of the model’s reliability.

Despite the strengths of the HGP model, we acknowledge the limitations of the current dataset, including constraints related to observational completeness, spatial and temporal resolution, and potential biases. These limitations must be considered as they may affect the model’s generalization capabilities beyond the training data’s scope. The HGP model’s complexity also poses challenges for computational efficiency and interpretability, highlighting areas for future refinement.

In summary, this paper explores GWL variations using a multisource dataset and the HGP model. The research underscores the importance of integrating diverse environmental, hydrological, and policy-related factors in groundwater management. We emphasize the use of policy as a predictive variable within the multisource dataset, which enhances the model’s accuracy and provides a more comprehensive understanding of GWL dynamics. Future efforts will aim to further refine the model, exploring real-time data incorporation and advanced machine learning techniques to improve responsiveness and address current limitations, ultimately optimizing the model’s utility in groundwater management.

5. Conclusions

In conclusion, this study has meticulously developed the hierarchical groundwater level prediction (HGP) model, which stands as a testament to the power of integrating multisource data for the accurate prediction of groundwater levels (GWLs). Our comprehensive analysis has detailed the impact of precipitation, extraction volumes, and policy changes on groundwater level fluctuations within the region. The HGP model, through rigorous evaluations, has outperformed established machine learning benchmarks such as GRU, NARX, and ANFIS, as evidenced by its superior performance metrics—high Nash–Sutcliffe efficiency (NSE = 0.9933) and coefficient of determination (R² = 0.9933), and notably lower root mean square error (RMSE = 0.0259) and mean absolute error (MAE = 0.3758) in the testing phase.

The innovative inclusion of policy variables, alongside traditional hydrological data, has allowed the HGP model to capture the subtleties of GWL dynamics more accurately. This approach has provided a nuanced understanding of how governance, alongside environmental factors, contributes to the fluctuating nature of groundwater levels. The HGP model’s ability to process and analyze these diverse data streams sets a new benchmark for hydrological modeling.

The insights derived from this research not only enhance our scientific understanding of hydrological systems but also equip decision makers with a robust predictive tool for effective groundwater management. By demonstrating methodological sophistication, this study contributes to the advancement of environmental modeling and underscores the critical role of comprehensive data analysis in the realm of water resource management.

Author Contributions

Conceptualization, D.D. and J.Z.; methodology, D.D. and J.Z.; software, J.Z.; validation, D.D., J.Z. and L.Z.; formal analysis, J.Z.; investigation, J.Z.; resources, D.D. and J.Z.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, D.D. and J.Z.; visualization, J.Z. and L.Z.; supervision, D.D.; project administration, D.D.; funding acquisition, D.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the “Study on the Analysis of the Causes of Groundwater Leakage in Hebei Province and Research on Key Technologies for Accurate Control” (grant number: 3110101) and “Leaching Failure Characteristics and Groundwater (Fluid) Mixing Mechanism of Abandoned Mine Slurry Modification Complex” (grant number: 41972255).

Data Availability Statement

If interested in the data used in the research work, contact [email protected] for the original dataset.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Liu, J.T.; Gao, Z.J.; Wang, M.; Li, Y.Z.; Ma, Y.Y.; Shi, M.J.; Zhang, H.Y. Study on the dynamic characteristics of groundwater in the valley plain of Lhasa City. Environ. Earth Sci. 2018, 77, 646.
2. Yang, J.; Yu, Z.; Yi, P.; Aldahan, A. Assessment of groundwater quality and Rn-222 distribution in the Xuzhou region, China. Environ. Monit. Assess. 2018, 190, 1–12.
3. Lo, W.; Purnomo, S.N.; Sarah, D.; Aghnia, S.; Hardini, P. Groundwater Modelling in Urban Development to Achieve Sustainability of Groundwater Resources: A Case Study of Semarang City, Indonesia. Water 2021, 13, 1395.
4. Sahoo, S.; Jha, M.K.; Kumar, N.; Chowdary, V.M. Evaluation of GIS-based multicriteria decision analysis and probabilistic modeling for exploring groundwater prospects. Environ. Earth Sci. 2015, 74, 2223–2246.
5. Yousefi, H.; Zahedi, S.; Niksokhan, M.H.; Momeni, M. Ten-year prediction of groundwater level in Karaj plain (Iran) using MODFLOW2005-NWT in MATLAB. Environ. Earth Sci. 2019, 78, 14.
6. Yadav, B.; Gupta, P.K.; Patidar, N.; Himanshu, S.K. Ensemble modelling framework for groundwater level prediction in urban areas of India. Sci. Total Environ. 2020, 712, 135539.
7. Barzegar, R.; Fijani, E.; Moghaddam, A.A.; Tziritis, E. Forecasting of groundwater level fluctuations using ensemble hybrid multi-wavelet neural network-based models. Sci. Total Environ. 2017, 599, 20–31.
8. Osman, A.I.A.; Ahmed, A.N.; Chow, M.F.; Huang, Y.F.; El-Shafie, A. Extreme gradient boosting (Xgboost) model to predict the groundwater levels in Selangor Malaysia. Ain Shams Eng. J. 2021, 12, 1545–1556.
9. Alizamir, M.; Kim, S.; Kisi, O.; Zounemat-Kermani, M. A comparative study of several machine learning based non-linear regression methods in estimating solar radiation: Case studies of the USA and Turkey regions. Energy 2020, 197, 117239.
10. Lai, Z.-H.; Kiang, J.-F.; IEEE. Water-Table Detection in a Hyper-Arid Region. In Proceedings of the USNC-URSI Radio Science Meeting/IEEE International Symposium on Antennas and Propagation (AP-S), Atlanta, GA, USA, 7–12 July 2019; pp. 107–108.
11. Huang, M.; Tian, Y. Prediction of Groundwater Level for Sustainable Water Management in an Arid Basin Using Data-driven Models. In Proceedings of the 2015 International Conference on Sustainable Energy and Environmental Engineering (IEEE), Bangkok, Thailand, 25–26 October 2015; pp. 134–137.
12. Lu, Y.; Ke, C.-Q.; Zhou, X.; Wang, M.; Lin, H.; Chen, D.; Jiang, H. Monitoring land deformation in Changzhou city (China) with multi-band InSAR data sets from 2006 to 2012. Int. J. Remote Sens. 2017, 39, 1151–1174.
13. Khaki, M.; Yusoff, I.; Islami, N.; Hussin, N.H. Artificial Neural Network Technique for Modeling of Groundwater Level in Langat Basin, Malaysia. Sains Malays 2016, 45, 19–28.
14. Ahmadi, A.; Olyaei, M.; Heydari, Z.; Emami, M.; Zeynolabedin, A.; Ghomlaghi, A.; Daccache, A.; Fogg, G.E.; Sadegh, M. Groundwater Level Modeling with Machine Learning: A Systematic Review and Meta-Analysis. Water 2022, 14, 949.
15. Pham, Q.B.; Kumar, M.; Di Nunno, F.; Elbeltagi, A.; Granata, F.; Islam, A.R.M.T.; Talukdar, S.; Nguyen, X.C.; Ahmed, A.N.; Anh, D.T. Groundwater Level Prediction Using Machine Learning Algorithms in a Drought-Prone Area. Neural Comput. Appl. 2022, 34, 10751–10773.
16. Samani, S.; Vadiati, M.; Nejatijahromi, Z.; Etebari, B.; Kisi, O. Groundwater Level Response Identification by Hybrid Wavelet–Machine Learning Conjunction Models Using Meteorological Data. Environ. Sci. Pollut. Res. 2022, 30, 22863–22884.
17. Cai, H.; Liu, S.; Shi, H.; Zhou, Z.; Jiang, S.; Babovic, V. Toward Improved Lumped Groundwater Level Predictions at Catchment Scale: Mutual Integration of Water Balance Mechanism and Deep Learning Method. J. Hydrol. 2022, 613, 128495.
18. Shamsuddin, M.K.N.; Kusin, F.M.; Sulaiman, W.N.A.; Ramli, M.F.; Baharuddin, M.F.T.; Adnan, M.S. Forecasting of Groundwater Level using Artificial Neural Network by incorporating river recharge and river bank infiltration. In Proceedings of the International Symposium on Civil and Environmental Engineering (ISCEE), Melaka, Malaysia, 5–6 December 2016.
19. Chitsazan, M.; Rahmani, G.; Neyamadpour, A. Forecasting groundwater level by artificial neural networks as an alternative approach to groundwater modeling. J. Geol. Soc. India 2015, 85, 98–106.
20. Natarajan, N.; Sudheer, C. Groundwater level forecasting using soft computing techniques. Neural Comput. Appl. 2019, 32, 7691–7708.
21. Kayhomayoon, Z.; Babaeian, F.; Milan, S.G.; Azar, N.A.; Berndtsson, R. A Combination of Metaheuristic Optimization Algorithms and Machine Learning Methods Improves the Prediction of Groundwater Level. Water 2022, 14, 751.
22. Di Nunno, F.; Granata, F. Groundwater level prediction in Apulia region (Southern Italy) using NARX neural network. Environ. Res. 2020, 190, 110062.
23. Takeuchi, D.; Yatabe, K.; Koizumi, Y.; Oikawa, Y.; Harada, N.; IEEE. Real-Time Speech Enhancement Using Equilibriated Rnn. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Barcelona, Spain, 4–8 May 2020; pp. 851–855.
24. Baek, Y.; Kim, H.Y. ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Syst. Appl. 2018, 113, 457–480.
25. Bryngelson, S.H.; Charalampopoulos, A.; Sapsis, T.P.; Colonius, T. A Gaussian moment method and its augmentation via LSTM recurrent neural networks for the statistics of cavitating bubble populations. Int. J. Multiph. Flow 2020, 127, 103262.
26. Massaoudi, M.; Abu-Rub, H.; Refaat, S.S.; Chihi, I.; Oueslati, F.S. Deep Learning in Smart Grid Technology: A Review of Recent Advancements and Future Prospects. IEEE Access 2021, 9, 54558–54578.
27. Elsayed, N.; Maida, A.S.; Bayoumi, M.; IEEE. Gated Recurrent Neural Networks Empirical Utilization for Time Series Classification. In Proceedings of the IEEE International Conference on Cybermat/12th IEEE International Conference on Cyber, Physical and Social Comp (CPSCom)/15th IEEE International Conference on Green Computing and Communications (GreenCom)/12th IEEE Int Conf on Internet of Things (iThings)/5th IEEE Int Conf on Smart Data, Atlanta, GA, USA, 14–17 July 2019; pp. 1207–1210.
28. Ao, C.; Zeng, W.; Wu, L.; Qian, L.; Srivastava, A.K.; Gaiser, T. Time-delayed machine learning models for estimating groundwater depth in the Hetao Irrigation District, China. Agric. Water Manag. 2021, 255.
29. Baek, J.M.; Shibuya, S.; Furumiya, M.; Saito, M.; Lohani, T.N.; Hur, J.S. Case study on evaluating the groundwater seepage flow by using a 3D ground model prepared from soil borehole and GIS data. In Proceedings of the Computer Methods and Recent Advances in Geomechanics, Kyoto, Japan, 22–25 September 2014; pp. 841–845.
30. Li, P.; Tian, R.; Xue, C.; Wu, J. Progress, opportunities, and key fields for groundwater quality research under the impacts of human activities in China with a special focus on western China. Environ. Sci. Pollut. Res. 2017, 24, 13224–13234.
31. Hu, W.; Yang, Y.; Wang, J.; Huang, X.; Cheng, Z. Understanding Electricity-Theft Behavior via Multi-Source Data. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2264–2274.
32. Wang, R.; Li, X.; Wei, A. Hydrogeochemical characteristics and gradual changes of groundwater in the Baiquan karst spring region, northern China. Carbonates Evaporites 2022, 37, 47.
33. Maziarz, M. A review of the Granger-causality fallacy. J. Philos. Econ. 2015, VIII, 10676.
34. Cai, X.; Sun, H.; Zhang, Q.; Huang, Y. A Grid Weighted Sum Pareto Local Search for Combinatorial Multi and Many-Objective Optimization. IEEE Trans. Cybern. 2019, 49, 3586–3598.
35. Gao, M.; Yang, F.; Wei, H.; Liu, X. Automatic Monitoring of Maize Seedling Growth Using Unmanned Aerial Vehicle-Based RGB Imagery. Remote Sens. 2023, 15, 3671.
36. He, S.; Wu, J.; Wang, D.; He, X. Predictive Modeling of Groundwater Nitrate Pollution and Evaluating Its Main Impact Factors Using Random Forest. Chemosphere 2021, 290, 133388.
37. Gao, M.; Yang, F.; Wei, H.; Liu, X. Individual Maize Location and Height Estimation in Field from UAV-Borne LiDAR and RGB Images. Remote Sens. 2022, 14, 2292.
38. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
39. Iqbal, N.; Khan, A.-N.; Rizwan, A.; Ahmad, R.; Kim, B.W.; Kim, K.; Kim, D.-H. Groundwater Level Prediction Model Using Correlation and Difference Mechanisms Based on Boreholes Data for Sustainable Hydraulic Resource Management. IEEE Access 2021, 9, 96092–96113.
40. Naghibi, S.A.; Ahmadi, K.; Daneshi, A. Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping. Water Resour. Manag. 2017, 31, 2761–2775.
41. Lallahem, S.; Mania, J.; Hani, A.; Najjar, Y. On the use of neural networks to evaluate groundwater levels in fractured media. J. Hydrol. 2005, 307, 92–111.
42. Provincial Water Resources Department Held a Pilot Conference on Groundwater Recharge of River and Lake in Hebei Province. Available online: http://slt.hebei.gov.cn/a/2018/09/12/2018091236969.html (accessed on 12 September 2018).
43. The Total Volume of Water Replenished by the Hebei Groundwater Recharge Pilot Project Has Surpassed 100 Million Cubic Meters. Available online: http://slt.hebei.gov.cn/a/2018/10/16/2018101637396.html (accessed on 16 October 2018).
44. The Provincial Department of Water Resources Convenes a Provincial Water Administration Work Conference. Available online: http://slt.hebei.gov.cn/a/2021/05/17/216A7F2DA703410299D478D8846DF438.html (accessed on 17 May 2021).
45. Mohanty, S.; Jha, M.K.; Raul, S.K.; Panda, R.K.; Sudheer, K.P. Using Artificial Neural Network Approach for Simultaneous Forecasting of Weekly Groundwater Levels at Multiple Sites. Water Resour. Manag. 2015, 29, 5521–5532.
46. Rajaee, T.; Ebrahimi, H.; Nourani, V. A review of the artificial intelligence methods in groundwater level modeling. J. Hydrol. 2019, 572, 336–351.
47. Mukherjee, A.; Ramachandran, P. Prediction of GWL with the help of GRACE TWS for unevenly spaced time series data in India: Analysis of comparative performances of SVR, ANN and LRM. J. Hydrol. 2018, 558, 647–658.
48. Liu, F.; Zhang, Z.; Zhou, R. Automatic modulation recognition based on CNN and GRU. Tsinghua Sci. Technol. 2022, 27, 422–431.

Figure 1. Study area.

Figure 2. Historical GWLs (left) and autocorrelation for historical GWLs (right).

Figure 3. Precipitation and Historical GWLs (left) and Granger causality test result (right).

Figure 4. Extraction and historical GWLs (left) and Granger causality test results (right).

Figure 5. Precipitation, extraction, and historical GWLs with policy.

Figure 6. Architecture of HGP.

Figure 7. The architecture of recurrent and fusion layers on multisource sequences.

Figure 8. Comparative performance analysis of the different models.

Figure 9. Taylor’s diagram for the different models.

Table 1. Statistical results of the different ML models during the training and testing period.

Method	NSE (Train)	NSE (Test)	RMSE (Train)	RMSE (Test)	MAE (Train)	MAE (Test)	R² (Train)	R² (Test)
Random Forest	0.9891	0.9160	0.0292	0.1192	0.3376	1.3097	0.9890	0.9137
XGBoost	0.9977	0.9573	0.0130	0.0775	0.1494	0.8094	0.9977	0.9568
SVM	0.9987	0.9560	0.0100	0.0801	0.1068	0.7737	0.9987	0.9557
HGP	0.9994	0.9933	0.0069	0.0259	0.0838	0.3758	0.9994	0.9933

Table 2. Statistical results of the different DL models during the training and testing periods.

Method	NSE (Train)	NSE (Test)	RMSE (Train)	RMSE (Test)	MAE (Train)	MAE (Test)	R² (Train)	R² (Test)
ANN	0.9995	0.9572	0.0060	0.0777	0.0636	0.7684	0.9995	0.9568
FFNN	0.9996	0.9692	0.0053	0.0638	0.0540	0.7219	0.9996	0.9691
LSTM	0.9986	0.9652	0.0105	0.0691	0.1099	0.8848	0.9986	0.9651
GRU	0.9996	0.9626	0.0053	0.0684	0.0554	0.6824	0.9996	0.9654
ANFIS	0.9996	0.9790	0.0055	0.0486	0.0557	0.6021	0.9996	0.9789
NARX	0.9994	0.9647	0.0069	0.0701	0.762	0.6782	0.9994	0.9646
HGP	0.9994	0.9933	0.0069	0.0259	0.0838	0.3758	0.9994	0.9933

Table 3. Effect of multisource information.

Removed Component	NSE (Train)	NSE (Test)	RMSE (Train)	RMSE (Test)	MAE (Train)	MAE (Test)	R² (Train)	R² (Test)
Precipitation	0.9993	0.9620	0.0071	0.0071	0.0732	0.8426	0.9993	0.9616
Extraction	0.9991	0.9687	0.0082	0.0632	0.0939	0.7042	0.9991	0.9685
Policy	0.9995	0.9726	0.0058	0.0599	0.0613	0.5702	0.9995	0.9725
Above all	0.9985	0.9577	0.0104	0.0777	0.1211	0.7597	0.9985	0.9574
HGP	0.9994	0.9933	0.0069	0.0259	0.0838	0.3758	0.9994	0.9933

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

