*Article* **Predicting Groundwater Level Based on Machine Learning: A Case Study of the Hebei Plain**

**Zhenjiang Wu, Chuiyu Lu, Qingyan Sun, Wen Lu, Xin He , Tao Qin, Lingjia Yan and Chu Wu \***

State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research, Beijing 100038, China

**\*** Correspondence: wuchu@iwhr.com

**Abstract:** In recent years, the groundwater level (GWL) and its dynamic changes in the Hebei Plain have gained increasing interest. The GWL serves as a crucial indicator of the health of groundwater resources, and accurately predicting the GWL is vital to prevent its overexploitation and the loss of water quality and land subsidence. Here, we utilized data-driven models, such as the support vector machine, long short-term memory, multi-layer perceptron, and gated recurrent unit models, to predict GWL. Additionally, data from six GWL monitoring stations from 2018 to 2020, covering dynamical fluctuations, increases, and decreases in GWL, were used. Further, the first 70% and remaining 30% of the time-series data were used to train and test the model, respectively. Each model was quantitatively evaluated using the root mean square error (RMSE), coefficient of determination (R<sup>2</sup>), and Nash–Sutcliffe efficiency (NSE), and they were qualitatively evaluated using time-series line plots, scatter plots, and Taylor diagrams. A comparison of the models revealed that the RMSE, R<sup>2</sup>, and NSE of the GRU model in the training and testing periods were better than those of the other models at most groundwater monitoring stations. In conclusion, the GRU model performed best and could support dynamic predictions of GWL in the Hebei Plain.

**Keywords:** groundwater level prediction; data-driven models; gated recurrent units; model performance; Hebei Plain

### **1. Introduction**

The Hebei Plain is one of the most water-sensitive areas of China. Its per capita water resources amount to less than 12.5% of the national total [1], and 70% of the water consumption depends on groundwater [2]. Consequently, the decline in GWL has precipitated various ecological and environmental issues in the Hebei Plain, such as land subsidence, soil salinization, the expansion of cones of depression, and aquifer dewatering [3].

Physical and statistical models are the main tools used to predict GWL. Physical models can describe the groundwater system and reflect changes in groundwater, but their practical applications are hindered by heavy computational loads and the need for large volumes of hydrogeological data [4,5]. Conversely, statistical models, such as machine learning and deep learning models, are an effective alternative that do not require the specific characterization of physical properties, accurate physical parameters, or the modeling of the physical processes of a groundwater system [6–9]. Widely used statistical models include the support vector machine (SVM), long short-term memory (LSTM), multi-layer perceptron (MLP), and gated recurrent unit (GRU) models. Moreover, the use of the Gravity Recovery and Climate Experiment (GRACE) gravity satellite and global hydrological models has been identified as a promising alternative method for predicting groundwater levels. By utilizing remote sensing data and numerical models, this method provides valuable insights into the distribution of groundwater resources, allowing for more informed decision-making and effective management of these vital resources [9–11].

**Citation:** Wu, Z.; Lu, C.; Sun, Q.; Lu, W.; He, X.; Qin, T.; Yan, L.; Wu, C. Predicting Groundwater Level Based on Machine Learning: A Case Study of the Hebei Plain. *Water* **2023**, *15*, 823. https://doi.org/10.3390/w15040823

Academic Editors: Qiting Zuo, Fuqiang Wang, Jiaqi Zhai, Xiuyu Zhang, Dunxian She, Lei Zou, Rong Gan and Zengliang Luo

Received: 1 February 2023 Revised: 16 February 2023 Accepted: 16 February 2023 Published: 20 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

SVMs are a type of generalized nonlinear model for classification and regression analysis [12], and their solutions adopt a macro-perspective to solve quadratic constraint optimization [13]. Asefa et al. [14] proposed a solution for groundwater monitoring and prediction networks using the SVM method. Yoon et al. [15] compared time-series models based on artificial neural networks (ANNs) and SVM to predict GWL. Moreover, Tapak et al. [16] used SVM to predict the GWL of the Hamadan–Bahar Plain in West Iran. SVMs have also been used to predict hydrological factors, such as river flow [17,18].

LSTM is an improved recurrent neural network (RNN) developed to address the vanishing and exploding gradient problems by using forget and update gates to regulate gradient flow [19], and it has been extensively tested for GWL prediction. Vu et al. [20] used LSTM to reconstruct, fill gaps in, and extend existing time-series of GWL data in Normandy, France. Further, Wunsch et al. [21] compared LSTM with nonlinear autoregressive networks with exogenous input and proved the proficiency and accuracy of LSTM for GWL predictions.

An MLP is a type of feedforward ANN consisting of an input layer, single or multiple hidden layers, and an output layer, and each node (neuron) in a layer is connected to every node in the following layer [22]. MLPs have been widely used in hydrological models [23,24] and agricultural pollution models [25–27]. Sahoo et al. [7] combined an MLP and genetic algorithm to predict GWL changes in agricultural areas of the United States.

GRUs are an optimized version of an LSTM that shorten the model training time and simplify the gated structure [28]. Jeong et al. [29] predicted GWL sequences using the LSTM, GRU, and autoregressive with exogenous input models. Zhang et al. [30] used various models to compare the simulations of the water level of the middle route of the South-to-North Water Transfer Project and proved that the performance of GRU and LSTM is similar but GRU has a comparatively faster learning curve. Additionally, Chen et al. [31] automatically calibrated groundwater parameters by combining the GRU model with particle swarm optimization.

The GWL in the Hebei Plain exhibits highly nonlinear variability due to various factors such as precipitation, evapotranspiration, and human activities. This variability may result in poor model predictions. Additionally, some nonlinear machine learning models may not accurately process the noise and features present in the real situation of the study area. Therefore, this paper aims to explore the mathematical relationships within the GWL time-scale data themselves and to perform dynamic prediction of the GWL in the Hebei Plain by comparing support vector machine (SVM), long short-term memory (LSTM), multi-layer perceptron (MLP), and gated recurrent unit (GRU) models. We evaluate each model qualitatively and quantitatively using sites with dynamic fluctuation, dynamic rise, and dynamic fall, respectively, in order to obtain a dynamic prediction model applicable to the GWL in the Hebei Plain. The remainder of this paper is organized as follows: Section 2 introduces the research area, data sources, model methods, and evaluation indicators, along with the technical approach of this study. Section 3 presents the main findings and discusses the performance indicators of each model. Section 4 sets out the main conclusions of this study.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The Hebei Plain is located in North China (114°33′–119°42′ E, 36°05′–39°93′ N) (Figure 1). Per capita water resources in the region are approximately 386 m<sup>3</sup>, roughly one-eighth of China's national average, making it an extremely water-scarce region. It lies in the semi-humid, semi-arid climate zone and experiences a temperate continental monsoon climate. This region has four distinct seasons, with rainy and hot periods coinciding. Annual precipitation is unevenly distributed but mainly falls in summer, with an annual average of 450–550 mm. Precipitation from June to July accounts for approximately 75% of the annual precipitation, and surface water resources are relatively scarce. The main water source for irrigation is groundwater, and agricultural groundwater consumption accounts for 74.5–76.6% of groundwater exploitation.

**Figure 1.** Location and hydrogeological profile of the study area in the Hebei Plain.

The North China Plain has complex hydrogeological conditions, but it mainly comprises basins of Quaternary (Cenozoic) loose sediment. Quaternary aquifer media are divided into four groups (from top to bottom): Quaternary Holocene Q4, Upper Pleistocene Q3, Middle Pleistocene Q2, and Lower Pleistocene Q1. The buried depths of the bottom boundaries of these aquifer groups are 40–60 m, 120–170 m, 250–350 m, and 350–550 m, respectively. Overexploitation of groundwater for many years has resulted in many groundwater cones of depression in the first, second, and third aquifer groups in the Hebei Plain, causing nonlinear changes in the groundwater flow direction and GWL height in the region.


#### *2.2. Data Sources*

GWL data collected during 2018–2020, with a time interval of 4 h, were obtained from China's National Groundwater Monitoring Project. The trends in GWL in the Hebei Plain during the study period consisted of dynamic fluctuations, increases, and decreases. In total, six monitoring stations covering these three change types were selected as the study objects, which provided 32,880 datapoints. The time series of each station was divided, and any missing series was added so that the data could be converted into a format recognized by each model. The station data sample formats are shown in Table 1.


**Table 1.** Data samples in the study area.
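A minimal preprocessing sketch of the gap-filling step described above, assuming a pandas workflow; the readings below are hypothetical stand-ins, while the real series come from the monitoring project at a 4 h interval:

```python
import numpy as np
import pandas as pd

# Hypothetical raw records for one station (the real data are 4 h GWL
# readings from China's National Groundwater Monitoring Project).
idx = pd.date_range("2018-01-01", periods=6, freq="4h")
raw = pd.Series([34.2, 34.3, np.nan, 34.1, np.nan, 34.0], index=idx)

# Re-index onto the regular 4 h grid and fill missing readings by
# linear interpolation so every model sees a complete series.
filled = raw.resample("4h").mean().interpolate(method="linear")
print(filled.isna().sum())  # 0
```

In practice, each station's filled series would then be exported in the tabular format shown in Table 1.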

To intuitively reflect the daily changes in the GWL of the three types of selected monitoring stations, line graphs were drawn showing the time scale of GWL for 2018–2020. Figure 2a shows a station with dynamic fluctuations, where the GWL was the same at the beginning and end of the research period. Figure 2b shows a station with a dynamic increase. Although there were fluctuations during the study period, the GWL at the end of the study period was higher than at the beginning. Further, Figure 2c shows a station with a dynamic decrease, wherein the GWL at the end of the study period was lower than at the beginning.

**Figure 2.** Three types of GWL data samples: (**a**) dynamic fluctuations; (**b**) dynamic increase; (**c**) dynamic decrease.


#### *2.3. Methods*

Data-driven models, such as ANNs, can easily approximate the complex behavior and responses of physical systems; additionally, they can quickly optimize many case scenarios with different constraints. Compared with the multiple assumptions, complex input variables, and parameter calibration of physical models, the input variables of data-driven models are easier to measure and quantify. In particular, machine learning can help to predict the GWL in areas that lack hydrogeological survey data. In this study, SVM, LSTM, GRU, and MLP models were used to predict GWL at the selected monitoring stations. Each of these models is described below.

#### 2.3.1. Support Vector Machine

SVM is a linear discriminant classification method based on the maximum margin. SVMs are among the most widely used machine learning models for predicting GWL. They can use the linear kernel function, polynomial kernel function, Gaussian radial basis function (RBF), and sigmoid kernel function, which greatly improve the nonlinear prediction capability of the model. Moreover, SVM is well suited to small-sample statistical estimation and predictive learning, and can predict GWL precisely.

RBF has a strong nonlinear mapping capability and is suitable for predicting moment-to-moment changes in GWL as follows:

$$k(x_i, x_j) = \exp\left(-\gamma (x_i - x_j)^2\right) \tag{1}$$


where *γ* is an artificially determined positive real parameter and (*x<sup>i</sup>*, *x<sup>j</sup>*) is a pair of training samples.
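As a minimal NumPy sketch of Equation (1) (the function name and the value of *γ* are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def rbf_kernel(x_i, x_j, gamma=0.1):
    """Gaussian (RBF) kernel of Equation (1): exp(-gamma * ||x_i - x_j||^2)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

# Identical samples give a kernel value of 1; distant samples decay toward 0.
x = np.array([34.2, 34.3, 34.1])
print(rbf_kernel(x, x))        # 1.0
print(rbf_kernel(x, x + 5.0))  # small value, exp(-gamma * 75)
```

The choice of *γ* controls how quickly similarity decays with distance between samples.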

#### 2.3.2. Long Short-Term Memory

The LSTM model processes and analyzes time-series data by selectively extracting saved information and combining the selected information with subsequently input timeseries data. The network can locally predict each fragmented sequence of GWL data, and the prediction deviations are passed back, to dynamically predict a GWL sequence.

LSTM is an improved recurrent neural network (RNN). The cycle of an ordinary RNN passes through the hidden state (H), but the LSTM output has two states: H and the memory state (C). Further, three "gates" are added to the LSTM to process the input information differently. Figure 3 shows the internal structure of an LSTM cell.

**Figure 3.** Structure of the LSTM cell.


At the core of LSTM is the objective of controlling the internal state to allow it to retain and filter information from previous moments. LSTM has three gates (forget gate *Ft* , input gate *I<sup>t</sup>* , and output gate *Ot*) in the hidden layer to control the signal input and output. The forget gate uses *Ht*−<sup>1</sup> and *X<sup>t</sup>* as inputs to control how much information needs to be forgotten in the internal state of the previous moment *Ct*−1; further, the input gate selectively memorizes input information and determines the quantity saved to the cell state *Ct* in the current moment *X<sup>t</sup>* . The output gate controls how much information the internal state *C<sup>t</sup>* needs to output to the external state in the current moment. The entire network equation of the LSTM cell can be given as follows:

$$
\begin{bmatrix}
\widetilde{C}_t \\
O_t \\
I_t \\
F_t
\end{bmatrix} = \begin{bmatrix}
\tanh \\
\sigma \\
\sigma \\
\sigma
\end{bmatrix} \left( W \begin{bmatrix}
X_t \\
H_{t-1}
\end{bmatrix} + b \right) \tag{2}
$$

$$C_t = F_t \times C_{t-1} + I_t \times \widetilde{C}_t \tag{3}$$


$$H_t = O_t \times \tanh(C_t) \tag{4}$$

where *σ* is the sigmoid activation function, which limits the output to between 0 and 1. Since each element of the output matrix (or vector) after the sigmoid layer is a real number between 0 and 1, which is then dot-multiplied with other information, it effectively controls the passage of information. In this range, "0" indicates that the information does not pass at all, and "1" indicates that the information passes entirely. This allows the network to regulate the flow of information through the "gate".
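As a minimal illustration of Equations (2)–(4), the following NumPy sketch performs one LSTM step. The single stacked weight matrix `W`, its row ordering, and all variable names are our assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Equations (2)-(4).

    W has shape (4*n_hidden, n_input + n_hidden): stacked weights for the
    candidate state, output, input, and forget gates; b has shape (4*n_hidden,).
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b  # pre-activations, Equation (2)
    c_tilde = np.tanh(z[0:n])                  # candidate state
    o_t = sigmoid(z[n:2*n])                    # output gate
    i_t = sigmoid(z[2*n:3*n])                  # input gate
    f_t = sigmoid(z[3*n:4*n])                  # forget gate
    c_t = f_t * c_prev + i_t * c_tilde         # Equation (3)
    h_t = o_t * np.tanh(c_t)                   # Equation (4)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 1, 8                             # univariate GWL input, 8 hidden units
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in [0.3, 0.5, 0.4]:                      # a tiny normalized GWL fragment
    h, c = lstm_step(np.array([x]), h, c, W, b)
print(h.shape)  # (8,)
```

Because the output gate and tanh both bound their factors, every component of the hidden state stays in (−1, 1).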

#### 2.3.3. Gated Recurrent Units

GRU optimizes the gated structure of LSTM (Figure 4), and its training process converges more easily. Compared with LSTM, GRU has only two gates: the update and reset gates. The update gate is constructed from the forget and input gates in the LSTM. The reset gate is recomposed from memory cells and hidden layer states. The GRU network controls the change in state of hidden units over time through its special gated structure. This avoids inaccurate parameter training due to the vanishing or exploding gradient problem during long-term propagation.


**Figure 4.** Structure of the GRU cell.


In Figure 4, the update and reset gates are denoted by *Z<sup>t</sup>* and *R<sup>t</sup>*, respectively, where *Z<sup>t</sup>* controls the extent to which *H<sup>t</sup>*<sup>−</sup><sup>1</sup> is retained in *H<sup>t</sup>*, and *R<sup>t</sup>* controls the extent to which *H<sup>t</sup>*<sup>−</sup><sup>1</sup> is written into the current candidate state *H̃<sup>t</sup>*.

First, the gating state is obtained from the state in the previous moment *H<sup>t</sup>*<sup>−</sup><sup>1</sup> and the input *X<sup>t</sup>* in the current moment. Subsequently, using a sigmoid nonlinear transformation, the data are mapped to [0,1], which acts as the gating signal. Once the gating signal has been obtained, the reset gate "resets" it as a coefficient on the previous moment *H<sup>t</sup>*<sup>−</sup><sup>1</sup> to obtain the candidate hidden state *H̃<sup>t</sup>*. The equations are as follows:

$$R_t = \sigma(W_r \cdot [H_{t-1}, X_t]) \tag{5}$$


$$Z_t = \sigma(W_z \cdot [H_{t-1}, X_t]) \tag{6}$$

$$\widetilde{H}_t = \tanh\left(W_{\widetilde{h}} \cdot \left[R_t \times H_{t-1}, X_t\right]\right) \tag{7}$$

where *σ* is the sigmoid activation function. While calculating a gate's state, the sigmoid function is used to obtain a result between 0 and 1, and while calculating the candidate hidden state, the tanh activation function is used. *W<sup>r</sup>*, *W<sup>z</sup>*, and *W<sup>h̃</sup>* are the weight matrices of the reset gate, the update gate, and the candidate hidden state, respectively.
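A NumPy sketch of one GRU step following Equations (5)–(7). Biases are omitted for brevity, the final combination of old state and candidate is the standard GRU update (not printed in the text above), and all names and sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU step following Equations (5)-(7); biases omitted for brevity."""
    hx = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ hx)                                       # reset gate, Eq. (5)
    z_t = sigmoid(W_z @ hx)                                       # update gate, Eq. (6)
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate, Eq. (7)
    # Standard GRU combination of previous state and candidate:
    return (1.0 - z_t) * h_prev + z_t * h_tilde

rng = np.random.default_rng(1)
n_in, n_hid = 1, 8
W_r = rng.standard_normal((n_hid, n_hid + n_in)) * 0.1
W_z = rng.standard_normal((n_hid, n_hid + n_in)) * 0.1
W_h = rng.standard_normal((n_hid, n_hid + n_in)) * 0.1
h = np.zeros(n_hid)
for x in [0.3, 0.5, 0.4]:  # a tiny normalized GWL fragment
    h = gru_step(np.array([x]), h, W_r, W_z, W_h)
print(h.shape)  # (8,)
```

Note how the GRU uses two gates and a single state where the LSTM cell above needs three gates and two states, which is why its training converges faster.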

#### 2.3.4. Multi-Layer Perceptron


MLP is an ANN used for predictions. It learns the relationships between inputs and outputs using a large volume of data and can be used for nonlinear modeling. An MLP is composed of multiple layers of neurons, and each node (neuron) in a layer is connected with a certain weight to every node in the following layer (Figure 5). The main disadvantage of this method is that when the number of layers or the nodes in each layer increases, overfitting and model training issues arise.

**Figure 5.** Structure of the MLP model.


In Figure 5, the output of node *j* in the MLP hidden layer is given as

$$H_j = g\left(\sum_{i=1}^{n} \omega_{ij} x_i + b_j\right) \tag{8}$$


where *ω* and *b* are the weight and deviation, respectively, and *g* is the activation function, commonly the sigmoid, tanh, or rectifier linear unit (ReLU) activation function. Because the ReLU function avoids the vanishing gradient problem and its convergence speed is faster than that of the sigmoid and tanh activation functions, ReLU was used as the activation function in this study.
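A minimal NumPy sketch of Equation (8) with the ReLU activation used in this study; the weight and bias values below are arbitrary illustrative numbers:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_hidden(x, W, b):
    """Hidden-layer output of Equation (8): H_j = g(sum_i w_ij * x_i + b_j), g = ReLU."""
    return relu(W @ x + b)

# Two inputs, two hidden nodes, with arbitrary illustrative weights.
W = np.array([[1.0, -1.0],
              [0.0,  2.0]])
b = np.array([0.0, -1.0])
x = np.array([3.0, 1.0])
print(mlp_hidden(x, W, b))  # [2. 1.]
```

Stacking several such layers, with the last layer producing a single value, yields the GWL predictor; ReLU's non-saturating gradient is what gives the faster convergence mentioned above.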

#### *2.4. Performance Evaluation of Models*

The performance of the SVM, LSTM, GRU, and MLP models can be evaluated using the root mean square error (RMSE), coefficient of determination (R<sup>2</sup>), and Nash–Sutcliffe efficiency (NSE), which are calculated as follows:

$$R^2 = \left[ \frac{\sum Q_o Q_p - \frac{\sum Q_o \sum Q_p}{N}}{\sqrt{\left[\sum Q_p^2 - \frac{\left(\sum Q_p\right)^2}{N}\right] \left[\sum Q_o^2 - \frac{\left(\sum Q_o\right)^2}{N}\right]}} \right]^2 \tag{9}$$

$$RMSE = \sqrt{\frac{1}{N} \sum\_{i=1}^{N} \left(Q\_o - Q\_p\right)^2} \tag{10}$$

$$NSE = 1 - \left[\frac{\sum\_{i=1}^{N} \left(Q\_o - Q\_p\right)^2}{\sum\_{i=1}^{N} \left(Q\_o - \overline{Q}\_o\right)^2}\right] \tag{11}$$

where *Q<sup>o</sup>* is the observed GWL, *Q<sup>p</sup>* is the predicted GWL, *N* is the length of the groundwater series, and *Qo* is the mean value of the observed GWL.
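Equations (9)–(11) can be sketched directly in NumPy; the function names and the tiny observed/predicted series below are our illustrative assumptions:

```python
import numpy as np

def rmse(q_o, q_p):
    """Root mean square error, Equation (10)."""
    return np.sqrt(np.mean((q_o - q_p) ** 2))

def nse(q_o, q_p):
    """Nash-Sutcliffe efficiency, Equation (11)."""
    return 1.0 - np.sum((q_o - q_p) ** 2) / np.sum((q_o - q_o.mean()) ** 2)

def r_squared(q_o, q_p):
    """Coefficient of determination, Equation (9): squared Pearson correlation."""
    n = len(q_o)
    num = np.sum(q_o * q_p) - np.sum(q_o) * np.sum(q_p) / n
    den = np.sqrt((np.sum(q_p ** 2) - np.sum(q_p) ** 2 / n)
                  * (np.sum(q_o ** 2) - np.sum(q_o) ** 2 / n))
    return (num / den) ** 2

# Illustrative observed and predicted GWL values (m), not from the paper.
obs = np.array([34.1, 34.4, 34.2, 34.6, 34.3])
pred = np.array([34.0, 34.5, 34.3, 34.5, 34.4])
print(rmse(obs, pred), r_squared(obs, pred), nse(obs, pred))
```

An NSE of 1 (and R<sup>2</sup> of 1, RMSE of 0) indicates a perfect fit; NSE below 0 means the model is worse than simply predicting the observed mean.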

#### *2.5. Groundwater Level Prediction Methodology*

As shown in Figure 6, we first imputed missing values and processed outliers in the original dataset and converted the format into a data type recognized by the models. Subsequently, we split the dataset, with the first 70% of the time-scale GWL measured sequence used as the training set. We converted the one-dimensional training data into a two-dimensional matrix containing input data and target data, with a window of 10; i.e., GWL<sub>0</sub>–GWL<sub>9</sub> was used to predict GWL<sub>10</sub>, GWL<sub>1</sub>–GWL<sub>10</sub> was used to predict GWL<sub>11</sub>, and so on. The GWL data of each station were converted into this format. We then used the SVM, LSTM, GRU, and MLP models to build and train on the dataset, calculated the loss function of each model, observed the consistency between predicted and actual values, and judged the robustness of each model based on its dynamic changes. The degree of convergence of the loss function was used as the basis for the training results of each model to continuously train the model. After training was completed, the remaining 30% of the time-scale GWL measured sequence for each station was used as the testing set. The testing set data were also converted into a two-dimensional matrix as described above and input into the trained models. The R<sup>2</sup>, RMSE, and NSE evaluation indicators were used to evaluate the performance of the models, to select the model most suitable for dynamically predicting the GWL in the Hebei Plain.

**Figure 6.** Running process of the SVM, LSTM, GRU, and MLP models.
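The windowing and 70/30 split described above can be sketched as follows; the function name and the stand-in series are illustrative assumptions:

```python
import numpy as np

def make_windows(series, window=10):
    """Convert a 1-D GWL series into (inputs, targets) pairs with a window of 10:
    GWL_0..GWL_9 predicts GWL_10, GWL_1..GWL_10 predicts GWL_11, and so on."""
    x, y = [], []
    for i in range(len(series) - window):
        x.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(x), np.array(y)

gwl = np.arange(100, dtype=float)    # stand-in for one station's GWL series
split = int(len(gwl) * 0.7)          # first 70% for training, last 30% for testing
x_train, y_train = make_windows(gwl[:split])
x_test, y_test = make_windows(gwl[split:])
print(x_train.shape, y_train.shape)  # (60, 10) (60,)
```

The same two-dimensional matrix format feeds all four models, so their evaluation scores are directly comparable.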

### **3. Results and Discussion**

The results of the GWL predictions for each station in Hebei Province acquired through the machine learning techniques are presented in this section. The Huimazhai and Hongmiao stations were the dynamically fluctuating stations. The Huimazhai station is located in the urban area of Qinhuangdao, adjacent to the Shi River and Liaodong Bay, and it showed an average GWL of 34.36 m. During the study period, its GWL in summer was mostly higher than in spring and winter, resulting in dynamic fluctuations. This may be due to a reduced need to irrigate crops in the rainy summer season across the entire Qinhuangdao area. Moreover, changes in GWL were consistent with changes in precipitation. The Hongmiao station is located in Xingtai City, where groundwater is exploited for crop irrigation. However, Xingtai has conducted a groundwater restoration pilot project and promoted the cultivation of drought-resistant winter wheat; additionally, groundwater replenishment work has also been conducted. The GWL in this area fluctuated due to the influence of changes in human activities, but it had a mean GWL of −65.09 m.

The dynamically increasing stations were Xiliangdian and Yanmeidong. The Xiliangdian station is located in Gaoyang County, Baoding, and had a mean GWL of −19.41 m. The GWL at this station was lower in spring and summer than in other seasons, which was likely due to the expansive area of water-intensive crops in spring and summer. Nevertheless, because this area is located along the middle route of the South-to-North Water Diversion Project, its GWL has been increasing. Further, the Yanmeidong station is located in Laiyuan County, Baoding, showed an average GWL of 1235.67 m, and its GWL increased overall during the study period. This was likely due to the station being located in a mountainous area that receives relatively high levels of precipitation; moreover, because the area under irrigated farmland in the region was small, agricultural water consumption was low.

The dynamically decreasing stations were Wangduxiancheng and XincunIIIzu. The Wangduxiancheng station is located in Wangdu County, Baoding, and had a mean GWL of 16.43 m, and XincunIIIzu is located in Huanghua and had a mean GWL of −43.13 m. Despite the implementation of groundwater extraction projects at the two stations, their water levels decreased gradually during the study period due to increases in the area under winter wheat.

In this study, GWL data were divided into training and validation sets, with the GWL of the validation set predicted using mathematical relationships discovered in the training set data. The SVM, LSTM, GRU, and MLP models were used to develop a GWL prediction model, and time-series scatter plots, relative error, and Taylor diagrams were used to qualitatively evaluate the performance of the models, while statistical and hydrological model indicators, such as RMSE, R<sup>2</sup>, and NSE, were used to quantitatively evaluate their performance.

#### *3.1. GWL Prediction Using SVM Model*

Gaussian and linear kernel functions were used for SVM-based runoff modeling. The former was more effective than the latter; therefore, we established an SVM model with the Gaussian function as the kernel function to predict the GWL of the six monitoring stations with the three types of dynamic changes (fluctuations, increases, and decreases). As shown in Figure 7, the SVM model overestimated the GWL of the dynamically fluctuating stations and slightly underestimated the GWL of the stations with dynamic increases and decreases. former was more effective than the latter; therefore, we established an SVM model with the Gaussian function as the kernel function to predict the GWL of the six monitoring stations with the three types of dynamic changes (fluctuations, increases, and decreases). As shown in Figure 7, the SVM model overestimated the GWL of the dynamically fluctuating stations and slightly underestimated the GWL of the stations with dynamic increases and decreases.

**Figure 7.** Results of hourly GWL simulation (m) in six stations using the SVM model during 2018–2020 in the training and testing periods: (**a**) Huimazhai station, (**b**) Hongmiao station, (**c**) Xiliangdian station, (**d**) Yanmeidong station, (**e**) Wangduxiancheng station, and (**f**) XincunIIIzu station.

Table 2 shows the simulation results. Generally, the simulation results were better for the dynamically increasing and decreasing stations. For example, the RMSE, R<sup>2</sup> , and NSE values of the Yanmeidong station in the testing period were 0.193 m, 0.998, and 0.984, respectively, indicating that SVM was well suited to these two types of monitoring stations. Considering the dynamically fluctuating stations, the RMSE, R<sup>2</sup> , and NSE values of the Huimazhai station in the training period were 0.253 m, 0.953, and 0.921, respectively, while the accuracy in the testing period was markedly lower, with NSE being 24.9% lower in the testing period than in the training period, indicating that SVM was not effective at capturing the nonlinear relationship of dynamically fluctuating stations. Overall, SVM was not suitable for predicting GWL in the Hebei Plain.

**Table 2.** Results of different performance indicators of the SVM model during the training and testing periods at each site.

| Station | Training RMSE (m) | Training R<sup>2</sup> | Training NSE | Testing RMSE (m) | Testing R<sup>2</sup> | Testing NSE |
|---|---|---|---|---|---|---|
| Huimazhai | 0.253 | 0.953 | 0.921 | 0.396 | 0.757 | 0.691 |
| Hongmiao | 2.299 | 0.98 | 0.967 | 3.823 | 0.867 | 0.804 |
| Xiliangdian | 0.298 | 0.995 | 0.994 | 0.511 | 0.915 | 0.908 |
| Yanmeidong | 0.204 | 0.998 | 0.909 | 0.193 | 0.998 | 0.984 |
| Wangduxiancheng | 0.076 | 0.992 | 0.985 | 0.071 | 0.929 | 0.808 |
| XincunIIIzu | 0.052 | 0.999 | 0.998 | 0.045 | 0.990 | 0.940 |
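For reference, the three indicators used throughout this section can be computed as follows. This is a sketch under one common reading: R<sup>2</sup> is taken as the squared Pearson correlation, and the sample series is synthetic, not station data.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error, in the same units as the GWL (metres)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def r2(obs, pred):
    """Coefficient of determination, read here as the squared Pearson correlation."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - sum((o - p)^2) / sum((o - mean(o))^2)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Synthetic GWL series (metres); values are illustrative only.
obs = np.array([34.1, 34.4, 34.8, 34.5, 34.2])
pred = np.array([34.0, 34.5, 34.7, 34.6, 34.1])
print(rmse(obs, pred), r2(obs, pred), nse(obs, pred))
```

NSE is bounded above by 1 (perfect fit) and equals 0 when the model is no better than always predicting the observed mean, which is why it complements the unit-free R<sup>2</sup> and the unit-carrying RMSE.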


#### *3.2. GWL Prediction Using the LSTM Model*

During LSTM modeling, the model filtered the GWL features through the output gate, saved the useful features, and discarded the useless ones to obtain the current-moment contextual information, which greatly enriched the information in the vector. This also implies that the current-moment contextual information H<sub>t</sub> is only one part of the global information C<sub>t</sub>. Further, we used the LSTM model to simulate the six selected stations. The GWL in the training and testing periods is shown in Figure 8. The GWL prediction results were suboptimal for the dynamically fluctuating stations, including another overestimation of the peak value in the testing period (Figure 8a). However, compared with the SVM model, the results for the dynamically increasing station (Figure 8d) improved, with the predicted GWL closer to the observed GWL, although the peak value was overestimated.
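The gate arithmetic described above can be sketched as a single LSTM cell step in plain NumPy; the weights here are random placeholders rather than a trained model, and the variable names mirror the H<sub>t</sub>/C<sub>t</sub> notation of the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step; the stacked parameters hold, in order, the
    input, forget, output, and candidate blocks (4*n rows for hidden size n)."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:n])           # input gate
    f = sigmoid(z[n:2 * n])      # forget gate: retains part of C_{t-1}
    o = sigmoid(z[2 * n:3 * n])  # output gate: filters what reaches H_t
    g = np.tanh(z[3 * n:])       # candidate update
    c = f * c_prev + i * g       # global (cell) information C_t
    h = o * np.tanh(c)           # contextual information H_t: a filtered part of C_t
    return h, c

rng = np.random.default_rng(1)
n, m = 4, 3                      # hidden size and input size (illustrative)
W = rng.normal(size=(4 * n, m))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = lstm_step(rng.normal(size=m), np.zeros(n), np.zeros(n), W, U, b)
```

The last two lines are the point of the paragraph: H<sub>t</sub> is C<sub>t</sub> passed through `tanh` and then masked by the output gate, so the hidden state exposes only part of the cell's global information.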

Table 3 shows the simulation results. The RMSE, R<sup>2</sup>, and NSE values of the Yanmeidong station in the testing period were 0.116 m, 0.996, and 0.994, respectively. Notably, the results from the testing period were better than those from the training period for this station, indicating that the observed and predicted values in the testing period were highly consistent, thus proving the effectiveness of the LSTM model for the dynamically increasing stations. The RMSE, R<sup>2</sup>, and NSE values of the Huimazhai station in the training period were 0.263 m, 0.868, and 0.868, respectively. Compared to the R<sup>2</sup> value of the SVM model, the R<sup>2</sup> value of the LSTM model was 12.7% higher. Nevertheless, for the dynamically fluctuating stations, both the peaks and troughs of the predicted GWL overshot the observed GWL; therefore, the prediction results were suboptimal. The overall R<sup>2</sup> value of the LSTM model was >0.85, indicating that it was a good model for predicting the GWL in the Hebei Plain.

**Figure 8.** Results of the hourly GWL simulation (m) in six stations using the LSTM model during 2018–2020 in the training and testing periods: (**a**) Huimazhai station, (**b**) Hongmiao station, (**c**) Xiliangdian station, (**d**) Yanmeidong station, (**e**) Wangduxiancheng station, and (**f**) XincunIIIzu station.


**Table 3.** Results of the different performance indicators of the LSTM model during the training and testing periods at each monitoring station.


#### *3.3. GWL Prediction Using the MLP Model*

The MLP model includes the input, output, and hidden layers, and each node (neuron) in a layer is connected to every node in the following layer. We improved the basic three-layer MLP by adding two linear hidden layers and the ReLU activation function after the first hidden layer. Subsequently, we simulated the selected stations in the Hebei Plain. Notably, this model overcame the shortcomings of the SVM and LSTM models at the dynamically fluctuating stations (Figure 9a,b). The predicted trend was satisfactory and higher in the testing period than in the training period. However, at the stations with dynamically increasing trends (Figure 9c), further improvement in the prediction of peak and trough values was not possible.
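The architecture described above can be sketched as a forward pass. This is a minimal sketch with random placeholder weights; the layer widths (6 → 16 → 16 → 1) are our assumption for illustration, not the paper's configuration.

```python
import numpy as np

def mlp_forward(x, params):
    """Forward pass of the MLP variant described above: two linear hidden
    layers with a ReLU after the first, followed by a linear output layer."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.maximum(0.0, W1 @ x + b1)  # first hidden layer + ReLU
    h2 = W2 @ h1 + b2                  # second (linear) hidden layer
    return W3 @ h2 + b3                # linear output: predicted GWL

# Random placeholder weights; layer widths are illustrative only.
rng = np.random.default_rng(2)
sizes = (6, 16, 16, 1)
params = []
for fan_in, fan_out in zip(sizes, sizes[1:]):
    params += [rng.normal(scale=0.1, size=(fan_out, fan_in)), np.zeros(fan_out)]
y = mlp_forward(rng.normal(size=6), params)
```

Placing the ReLU only after the first hidden layer keeps one source of nonlinearity while the remaining layers stay linear, which matches the modification the text describes.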

**Figure 9.** Results of hourly GWL simulation (m) in six stations using the MLP model during 2018–2020 in the training and testing periods: (**a**) Huimazhai station, (**b**) Hongmiao station, (**c**) Xiliangdian station, (**d**) Yanmeidong station, (**e**) Wangduxiancheng station, and (**f**) XincunIIIzu station.

Table 4 shows the simulation results. The RMSE, R<sup>2</sup>, and NSE values of the Yanmeidong station in the testing period were 0.08 m, 0.998, and 0.997, respectively, which were the best simulation results. The results for the dynamically fluctuating stations (Huimazhai and Hongmiao) were also satisfactory, especially for the Huimazhai station, with RMSE, R<sup>2</sup>, and NSE values of 0.201 m, 0.959, and 0.95 in the training period and 0.128 m, 0.979, and 0.968 in the testing period, respectively. Thus, the results in the testing period were better than those in the training period. Overall, the RMSE values of the various station types were <0.6 m, and the NSE and R<sup>2</sup> values were >0.96, thus making this a highly suitable model for predicting the GWL in the Hebei Plain.

**Table 4.** Results of different performance indicators of the MLP model during the training and testing periods at each station.


#### *3.4. GWL Prediction Using the GRU Model*

Compared with LSTM, GRU has a simpler gated structure, fewer parameters, and faster convergence. The C<sub>t</sub> in GRU already contains H<sub>t</sub>, and there is a trade-off between the current unit information and the previous global information while generating the current-moment contextual information. Therefore, replacing I<sub>t</sub> with 1 − Z<sub>t</sub> can expose all information globally. The GRU model for each of the selected stations in the training and testing periods is shown in Figure 10. The model retained the testing results of the MLP model for the dynamically fluctuating stations, but the testing period results for the dynamically increasing station (Figure 10c) were better than the results of the other models.
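One common formulation of the GRU step makes the role of 1 − Z<sub>t</sub> concrete (sign conventions for the update gate vary across references, and the weights below are random placeholders, not the trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: the update gate Z_t blends the previous state and the
    candidate, so 1 - Z_t plays the role the input gate I_t plays in the LSTM."""
    z = sigmoid(Wz @ x + Uz @ h_prev)             # update gate Z_t
    r = sigmoid(Wr @ x + Ur @ h_prev)             # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_cand        # new state: old vs. new trade-off

rng = np.random.default_rng(3)
n, m = 4, 3                                       # hidden and input sizes (illustrative)
Wz, Wr, Wh = rng.normal(size=(3, n, m))
Uz, Ur, Uh = rng.normal(size=(3, n, n))
h = gru_step(rng.normal(size=m), np.zeros(n), Wz, Uz, Wr, Ur, Wh, Uh)
```

Because a single gate controls both how much of the previous state is kept and how much of the candidate enters, the GRU needs three weight pairs instead of the LSTM's four, which is where its parameter and training-time savings come from.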



**Figure 10.** Results of hourly GWL simulation (m) in six stations using the GRU model during the 2018–2020 training and testing periods: (**a**) Huimazhai station, (**b**) Hongmiao station, (**c**) Xiliangdian station, (**d**) Yanmeidong station, (**e**) Wangduxiancheng station, and (**f**) XincunIIIzu station.

Table 5 shows the corresponding simulation results. The best training period results were observed at the XincunIIIzu station, with RMSE, R<sup>2</sup>, and NSE values of 0.081 m, 0.999, and 0.996, respectively, while the best testing period results were observed at the Yanmeidong station, with RMSE, R<sup>2</sup>, and NSE values of 0.098 m, 0.998, and 0.996, respectively. The GRU model results were notably better for the dynamically fluctuating stations than those of the first three models. Furthermore, the model maintained the training and testing results for the other station types, thus making it the best model for predicting the GWL in the Hebei Plain.



**Table 5.** Results of different performance indicators of the GRU model during the training and testing periods at each station.

#### *3.5. Model Comparison*

The SVM model had the lowest simulation accuracy for the selected stations, which may be due to the choice among the four kernel functions of the SVM. The RBF kernel function selected in this study could be tuned further: when σ is too small, the RBF kernel can overfit, and when σ is too large, the relationship between X<sub>i</sub> and X<sub>j</sub> has less overall influence on the model, causing inaccurate predictions. The LSTM model was the third-best model. It includes a forget gate, which makes the partial derivative of the current memory unit with respect to the previous memory unit a constant, thereby solving the vanishing gradient problem of RNNs. Nevertheless, it has many input parameters, which increases the likelihood of overfitting. The second-best model was MLP. Its highly nonlinear global effect resulted in good accuracy in the training and testing periods, with an NSE value of 0.997 for the Yanmeidong station, but the model had the slowest learning speed, which was suboptimal in terms of time consumption for moment-to-moment predictions. Moreover, the GRU model was the most accurate for the dynamically fluctuating and dynamically increasing stations. As it had a simple gated structure and fewer input parameters, which reduced the risk of overfitting, it had a shorter training time (Table 6). Thus, it was the most suitable model for predicting the GWL in the Hebei Plain.


**Table 6.** Training time comparison of four models with 500 epochs.

Figure 11 presents the scatter plots for each station during the study period. The GRU model had the best correlation between the observed and predicted GWL of the four models, with predicted values near the regression function for almost every station. Although the MLP model was accurate for most stations, it did not capture the extreme values of the dynamically increasing sites. Moreover, the SVM and LSTM models only reflected the GWL trend for dynamically fluctuating stations, and they did not accurately predict the specific GWL values.

Finally, we evaluated the performance of the models for each of the stations using Taylor diagrams (Figure 12). The results showed that the GRU model had the best accuracy for both dynamically fluctuating and dynamically increasing stations, followed by the MLP model, which had the best accuracy for dynamically decreasing stations. Comparatively, the SVM model had the poorest performance, as it was the furthest from the "Ref" point for most stations. It was followed by the LSTM model, which performed reasonably.
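The quantities a Taylor diagram displays can be computed directly from the observed and predicted series. The series below are synthetic and purely illustrative; the law-of-cosines identity linking the statistics is what places each model as a single point relative to the "Ref" point.

```python
import numpy as np

def taylor_stats(obs, pred):
    """Statistics a Taylor diagram displays: the two standard deviations,
    the correlation coefficient, and the centered RMS difference."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    s_o, s_p = obs.std(), pred.std()
    r = np.corrcoef(obs, pred)[0, 1]
    crmse = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return s_o, s_p, r, crmse

# Synthetic observed/predicted GWL series (metres); illustrative only.
obs = np.array([34.1, 34.4, 34.8, 34.5, 34.2])
pred = np.array([34.0, 34.5, 34.7, 34.6, 34.1])
s_o, s_p, r, crmse = taylor_stats(obs, pred)
# Law of cosines behind the diagram: crmse^2 = s_o^2 + s_p^2 - 2*s_o*s_p*r.
```

A perfect model sits exactly on the "Ref" point (crmse = 0, r = 1, s_p = s_o), so distance from that point summarizes all three statistics at once.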

**Figure 11.** Scatter diagrams of GWL simulations for each site by the SVM, LSTM, MLP, and GRU models: (**a**) Huimazhai station, (**b**) Hongmiao station, (**c**) Xiliangdian station, (**d**) Yanmeidong station, (**e**) Wangduxiancheng station, and (**f**) XincunIIIzu station.

**Figure 12.** Taylor diagrams of GWL simulations for each site by the SVM, LSTM, MLP, and GRU models: (**a**) Huimazhai station, (**b**) Hongmiao station, (**c**) Xiliangdian station, (**d**) Yanmeidong station, (**e**) Wangduxiancheng station, and (**f**) XincunIIIzu station.

### **4. Conclusions**

GWL is a crucial indicator for evaluating the health of groundwater resources in the Hebei Plain. This study attempted to model the GWL in the Hebei Plain and predict dynamic changes in the GWL using SVM, LSTM, MLP, and GRU models, and it qualitatively and quantitatively analyzed the training and testing datasets in the modeling process. The main conclusions are as follows:

(1) By comparing the RMSE, R<sup>2</sup>, and NSE indicators, we discovered that the GRU model performed the best for dynamically fluctuating and dynamically increasing stations, while the MLP model performed the best for dynamically decreasing stations. The update gate in the GRU model acquired previous moment state information in the current state, which assisted in capturing long-term dependencies in the time series and solved the problem of overfitting to some extent. Moreover, the GRU model not only showed good performance in predicting trends, but it was also better than the other models regarding training time and capturing extreme values, thus making it the most suitable model for predicting the GWL in the Hebei Plain.

(2) Apart from the different principles of each model, the differences in the simulation results can be attributed to factors such as data segmentation during the modeling process, the length of subsequences, and the uncertainty of model parameters. Moreover, the influence of the different activation functions on the GWL in the different models should also be considered. Furthermore, the training frequency of each model in this study was the same, and adaptive improvements should be made for each model in subsequent studies.

**Author Contributions:** Conceptualization, Z.W. and C.W.; methodology, Z.W.; software, C.L.; validation, Q.S., W.L. and X.H.; investigation, Z.W., C.W. and T.Q.; resources, Q.S.; data curation, C.W.; writing—original draft preparation, Q.S. and Z.W.; writing—review and editing, C.W.; visualization, Q.S.; supervision, W.L.; project administration, L.Y.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.
