1. Introduction
Groundwater, a vital freshwater resource on Earth, plays a critical role in supporting human life and economic development. The flow and solute transport processes in groundwater are significantly influenced by complex and highly nonlinear changes in water density [1]. To address the challenges posed by the complexity and variability of groundwater, establishing a reliable and highly accurate long-term groundwater model based on physical information is deemed effective [2,3,4,5]. However, the development of such models necessitates extensive detailed information regarding aquifer characteristics. Even with the requisite data for a physics-based model, challenges persist, such as lengthy calibration times and low computational efficiency.
Hydrologists are increasingly utilizing machine learning methods to address challenges associated with physics-based models [6,7,8,9]. The data-driven modeling approach offers an advantage by eliminating the need to explicitly define the physical relationships and parameters required to describe the physical environment. Machine learning algorithms approximate the relationship between model inputs and outputs through an iterative learning process [10], significantly enhancing the efficiency of model operation. Neural networks (NNs) have proven effective in modeling and predicting nonlinear time series data, such as groundwater levels, and have demonstrated comparable or superior performance to physics-based models in certain cases [11]. This study introduces a novel coupled neural network model as a surrogate for existing unsteady groundwater flow numerical models, aiming to enhance model efficiency for nonlinear water head time series while ensuring prediction accuracy.
Convolutional neural networks (CNNs) are essential for image recognition and natural language processing due to their powerful feature extraction capabilities [12]. Accordingly, 3D convolutional neural networks can be used to predict the head field at a specific time point [13]. Wunsch et al. [14] applied a CNN-based model to predict the highly variable discharge behavior of the heterogeneous Gottesacker karst system in the Northern Alps. Elmorsy et al. [15] indicated that increasing dataset size and diversity, utilizing multi-scale feature aggregation, and optimizing architectures can surpass the accuracy of existing state-of-the-art CNN permeability prediction models. However, in studies of groundwater flow, the flow field must be predicted at each time point over long periods, and a single CNN model cannot effectively predict long-term sequences. To address the limitations of CNNs, which prioritize local feature detection and struggle with long-term temporal dependencies [16], recurrent neural networks (RNNs) are often combined with CNNs to leverage their superior memory capabilities. In previous studies, Lei et al. [17] accurately predicted streambed water flux (SWF) by integrating CNNs with a Bayesian data-worth analysis (DWA) framework. Similarly, Tian et al. [18] effectively evaluated groundwater recharge in the North China Plain using attention-based gated recurrent units (Attention-GRU) and CNNs. In addition, Ali et al. [19] combined CNN layers with bidirectional long short-term memory (Bi-LSTM) models for long-term groundwater level prediction at each time point. Recently, Zhang et al. [20] showed that the convolutional neural network gated recurrent unit (CNN-GRU) model can extract hidden features of the coupling relationship between groundwater depth and time series, allowing further prediction of groundwater depth across different time series.
This study proposes a new architecture to address the computational and representational limitations of CNNs. The method combines a depthwise separable convolutional neural network (DSCNN) with a GRU framework to construct an efficient surrogate model specifically for predicting groundwater flow. The hybrid model leverages the CNN's feature extraction capabilities and uses GRUs to capture long-term temporal dependencies in time series data. DSCNNs are recognized for their lower computational load and reduced parameter requirements, providing a computationally efficient surrogate without sacrificing representational capability [21]. Additionally, this study conducted experiments on the groundwater conditions in the Penola region of southeastern Australia. Using only a CNN model to predict groundwater head fields resulted in lower accuracy than a CNN combined with long short-term memory networks (LSTMs), and replacing the CNN and LSTM with a DSCNN and GRU achieved better results still. Compared to a single CNN model, the DSCNN-GRU model is more efficient in predicting groundwater head fields over continuous time series in this region.
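The parameter saving behind this choice can be illustrated with a simple count (a minimal sketch; the layer sizes below are hypothetical and bias terms are ignored). A depthwise separable convolution replaces the full kernel with a per-channel spatial filter followed by a 1 × 1 pointwise convolution:

```python
# Weight counts for a single 3x3 convolution with 64 input and 128 output
# channels (illustrative sizes only; bias terms ignored).
k, c_in, c_out = 3, 64, 128

standard  = k * k * c_in * c_out         # full convolution: 73,728 weights
separable = k * k * c_in + c_in * c_out  # depthwise + pointwise: 8,768 weights

print(f"standard:  {standard:,}")
print(f"separable: {separable:,}")
print(f"reduction: {1 - separable / standard:.1%}")  # about 88% fewer weights
```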
The hybrid DSCNN-GRU model is designed to offer a balanced solution, leveraging the strengths of CNNs and RNNs in feature representation and temporal sequence learning, while addressing the associated high computational demands, providing an efficient approach for groundwater flow prediction. The subsequent sections of this study are organized as follows:
Section 2 presents an overview of the hydrogeological conditions in the study area and details the groundwater model layout for three cases. Section 3 briefly introduces the standard CNN and LSTM models, followed by a description of the proposed DSCNN-GRU model. Section 4 outlines the predictive results of the various surrogate models and compares their performance through numerical evaluations. Finally, the main conclusions are summarized in Section 5.
3. Results
The mean squared error (MSE) is employed as the loss function to evaluate the performance of surrogate models. A lower MSE value indicates reduced loss during the training process, signifying a heightened understanding of the relevant parameters and characteristics of the training samples generated using the physics-based model. Furthermore, MSE is used to evaluate the similarity between the surrogate model’s predicted outputs and the original outputs from the forward model.
However, MSE simply measures the pointwise gap between outputs at each grid cell, making it highly sensitive to outliers and blind to structural information. The calculation formula for MSE is given as follows:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( x_{ij} - y_{ij} \right)^2$$

In the above formula, $x$ represents the predicted water head field data, $y$ represents the input water head field data, and $m$ and $n$ represent the grid dimensions of the predicted and input water head fields, respectively.
To address this limitation, the coefficient of determination (R²) is introduced for a more comprehensive evaluation of prediction accuracy. A higher R² value, closer to 1, indicates a stronger correlation between the outputs from the surrogate and the forward model, reflecting superior prediction accuracy. The calculation formula for R² is given as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( \hat{y}_i - \bar{\hat{y}} \right)^2}$$

In this equation, $\hat{y}_i$ is the predicted value, $\bar{\hat{y}}$ is the average of the predicted values, and $y_i$ is the observed value.
Since the CNN surrogate experiments span Cases 1, 2, and 3, whose datasets differ in size, R² is particularly useful here: it is less affected by data size than MSE, making it more appropriate for cross-case comparisons. To further evaluate prediction accuracy, the structural similarity index (SSIM) is also introduced, which comprehensively evaluates the overall similarity of two images based on brightness, contrast, and structure. The formula for SSIM is as follows:

$$\mathrm{SSIM}(x, y) = \frac{\left( 2\mu_x \mu_y + C_1 \right)\left( 2\sigma_{xy} + C_2 \right)}{\left( \mu_x^2 + \mu_y^2 + C_1 \right)\left( \sigma_x^2 + \sigma_y^2 + C_2 \right)}$$

In this formula, $\mu_x$ is the average value of $x$, $\mu_y$ is the average value of $y$, $\sigma_x^2$ is the variance of $x$, $\sigma_y^2$ is the variance of $y$, $\sigma_{xy}$ is the covariance between $x$ and $y$, and $C_1$ and $C_2$ are constants used to maintain stability.
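A minimal NumPy sketch of the three metrics as defined above follows. Note two hedges: the R² normalization mirrors the definition stated in the text, which uses the mean of the predicted values (standard implementations use the mean of the observations), and the SSIM uses a single global window with placeholder constants, whereas practical implementations average over local sliding windows.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between predicted (x) and input (y) head fields."""
    return np.mean((x - y) ** 2)

def r2(y_pred, y_obs):
    """Coefficient of determination as defined in the text; the denominator
    uses the mean of the *predicted* values, per the paper's definition."""
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_pred - y_pred.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM; c1 and c2 are placeholder stability constants."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))
```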
In addition, considering that the generation time for each set of predicted images (six images) by a trained model is less than 5.625 ms, which is negligible, the training time is used to evaluate model efficiency.
3.1. CNN Surrogate Model Results
To evaluate the effectiveness of the CNN surrogate model, experiments are conducted on three initial models: Case 1, Case 2, and Case 3. The initial model designs vary from Case 1 to Case 3 in the complexity of their hydraulic conductivity fields, which increases from Case 1 to Case 3 as described in Section 2.1.2. Case 1 involves 10 randomly generated values for each hydraulic conductivity volume within a field. Case 2 divides the hydraulic conductivity field into 20 blocks, each with randomly generated values in the range of 10 to 100 m/month. In Case 3, Kriging interpolation is employed to obtain the hydraulic conductivity field from 40 sampling points, whose values are determined from available materials of the study area. Case 3 also incorporates pumping rate data from two pumping wells that are not considered in Cases 1 and 2. Consequently, the overall complexity of the initial model increases progressively from Case 1 to Case 3, accompanied by a sequential rise in the volume of data the surrogate model must process.
The training and testing sample sets are modified for each case, while the network architecture and parameters remain constant.
Figure 10 illustrates that the MSE value consistently decreases as the training period extends. Generally, the increasing model complexity from Case 1 to Case 3 leads to a gradual rise in MSE during training. However, an intriguing exception is observed at the 30th iteration: the MSE value of model c_3 is lower than that of model c_2. This may be attributed to the more complex Case 3 scenario, where the surrogate model processes a larger number of input features and nonlinear relationships, leading it to focus on global data patterns rather than overfitting to local noise and thus achieving better prediction performance.
To capture the temporal characteristics of the head data, an additional fully connected layer is incorporated into the network structure. In the test sample set, as depicted in Figure 11, the MSE values of the surrogate model's predictions for Cases 1, 2, and 3 increase sequentially while the R² values decrease. This indicates that as the complexity of the initial model rises, the prediction accuracy of the same surrogate model declines, accompanied by an increase in training time. To further validate this finding, the SSIM is introduced as an additional metric to assess the structural similarity between the predicted and original image data in the test set, as shown in Figure 11b. Notably, the SSIM also decreases, from 0.969 to 0.917, as the complexity of the initial model increases. Despite this gradual decrease in accuracy from Case 1 to Case 3, the CNN surrogate model still performs well in Case 3, with an average R² of 0.902 and an SSIM of 0.917 when comparing predicted and original head field data.
3.2. Coupled Surrogate Model Results
To enhance the accuracy and effectiveness of the surrogate model, this study evaluates three coupled models: CNN-LSTM, DSCNN-LSTM, and DSCNN-GRU. All three models employ the same coupling scheme as the DSCNN-GRU, described in Section 2.2.4. The optimal coupled model is identified by assessing the relevant parameters. Given that the training and test sample sets for all three coupled models are identical to those of c_3, identical evaluation criteria can be used for performance comparison.
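For concreteness, the Keras sketch below shows one plausible realization of this coupling, assuming illustrative dimensions and layer widths rather than the exact configuration of Section 2.2.4: depthwise separable convolutions extract spatial features independently at each time step via TimeDistributed wrappers, and a GRU then propagates information across time steps.

```python
from tensorflow.keras import layers, models

# Illustrative dimensions (hypothetical, not the study's actual grid):
# six time steps of a 32 x 32 single-channel head field.
T, H, W, C = 6, 32, 32, 1

model = models.Sequential([
    layers.Input(shape=(T, H, W, C)),
    # Depthwise separable convolutions extract spatial features per time step.
    layers.TimeDistributed(layers.SeparableConv2D(32, 3, padding="same", activation="tanh")),
    layers.TimeDistributed(layers.MaxPooling2D(2)),
    layers.TimeDistributed(layers.SeparableConv2D(64, 3, padding="same", activation="tanh")),
    layers.TimeDistributed(layers.MaxPooling2D(2)),
    layers.TimeDistributed(layers.Flatten()),
    # The GRU captures temporal dependencies across the six time steps.
    layers.GRU(128, return_sequences=True),
    # Map each time step's hidden state back to a full head field.
    layers.TimeDistributed(layers.Dense(H * W)),
    layers.Reshape((T, H, W, C)),
])
model.compile(optimizer="adam", loss="mse")
```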
As depicted in Figure 12 and Figure 13, the CNN-LSTM model exhibits a significant improvement in accuracy compared to the traditional CNN model: the MSE value decreases from 0.0018 to 0.0012 during training and from 0.009 to 0.0034 during testing. However, this improvement comes at the expense of increased computational load, with the training time rising 4.9%, from 3348 s to 3514 s. Furthermore, the MSE value of the CNN-LSTM model fluctuates significantly throughout the training cycle, with an absolute difference of up to 0.0008 between consecutive cycles. These fluctuations can primarily be attributed to the nonlinear characteristics of the transient groundwater flow time series.
After integrating DSCNN-LSTM, the training time decreased to 2892 s. However, the limited sample set of only 540 samples impedes the LSTM's capability to effectively leverage its memory for acquiring long-term sequence information. Consequently, the MSE value increased to 0.0042 during training and rose to 0.059 during testing.
The integration of DSCNN and GRU reduced the training time to 2858 s, which is 490 s lower than that of the CNN, 656 s lower than the CNN-LSTM, and 34 s lower than the DSCNN-LSTM, a significant improvement in computational efficiency. The MSE value of the proposed DSCNN-GRU model remained consistently lower than that of the traditional CNN model throughout the entire training period. Additionally, the MSE values of the DSCNN-GRU model showed smaller fluctuations than those of the CNN-LSTM model during the initial 100 to 200 cycles.
The proposed DSCNN-GRU model has a slightly higher test MSE (0.0096) than the CNN-LSTM model (0.0034), but its training time is 18.6% shorter. The R² value of the DSCNN-GRU model is 0.949, which is 0.84% lower than the 0.957 achieved by the CNN-LSTM model. Additionally, the SSIM values for the DSCNN-GRU and CNN-LSTM are 0.968 and 0.971, respectively, indicating that the average prediction error of both models is within 0.003 m, a marked improvement over the CNN model's SSIM of 0.923. Although the proposed DSCNN-GRU model shows a slight reduction in prediction accuracy compared to the CNN-LSTM, it significantly enhances computational efficiency. Considering the trade-off between accuracy and efficiency, the proposed DSCNN-GRU model outperforms the other three models in overall performance.
3.3. Further Optimization of the DSCNN-GRU Surrogate Model
The previous analysis compared the various coupled models, demonstrating that the DSCNN-GRU not only maintained the prediction accuracy of the surrogate model but also significantly improved its efficiency. To further improve the proposed DSCNN-GRU model, the impact of network parameters was examined, including variations in the number of convolutional layers, the optimizer, and the activation function. Initially, the experiment assessed three, four, and five convolutional layers to identify the optimal configuration. Subsequently, Adam, Nadam, and RMSprop were employed as optimizers to determine the most effective option. Finally, ReLU, Softmax, and Tanh were evaluated as activation functions to ascertain the best choice.
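This sweep can be organized as a one-factor-at-a-time loop, sketched below. The `build_model` factory is a hypothetical stand-in with illustrative sizes, not the study's exact network constructor.

```python
from tensorflow.keras import layers, models

def build_model(n_conv, optimizer, activation, T=6, H=32, W=32):
    """Hypothetical factory for DSCNN-GRU variants (illustrative sizes only)."""
    m = models.Sequential([layers.Input(shape=(T, H, W, 1))])
    for i in range(n_conv):
        # Depthwise separable convolution applied to each time step.
        m.add(layers.TimeDistributed(
            layers.SeparableConv2D(16 * (i + 1), 3, padding="same",
                                   activation=activation)))
    m.add(layers.TimeDistributed(layers.Flatten()))
    m.add(layers.GRU(64, return_sequences=True))
    m.add(layers.TimeDistributed(layers.Dense(H * W)))
    m.add(layers.Reshape((T, H, W, 1)))
    m.compile(optimizer=optimizer, loss="mse")
    return m

# One-factor-at-a-time sweep mirroring Sections 3.3.1-3.3.3:
layer_variants      = [build_model(n, "adam", "relu") for n in (3, 4, 5)]
optimizer_variants  = [build_model(5, opt, "relu") for opt in ("adam", "nadam", "rmsprop")]
activation_variants = [build_model(5, "adam", act) for act in ("relu", "softmax", "tanh")]
# Each variant would then be fit on the training samples and scored with
# MSE, R^2, and SSIM on the held-out test set.
```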
The Nadam optimizer combines the principles of the Adam optimizer with Nesterov momentum. Its primary objective is to accelerate convergence and enhance training stability, making it particularly effective for handling large datasets with high similarity. In contrast, RMSprop is an adaptive learning rate method that facilitates timely adjustments to the learning rate, thereby reducing potential data loss caused by inappropriate learning rates, especially when the complexity of the water head field data in the training samples varies.
Among the various activation functions, the Softmax function is frequently employed for multi-class classification problems. However, it is important to note that the exponential calculations inherent in the Softmax function can cause the output value to approach 1 when the input value is large and to approach 0 when the input value is small. When there is a substantial disparity between the input values, this can lead to saturation effects and potential vanishing gradient issues. Given the highly complex head field data present in the training samples, the differences between individual samples may be significant, resulting in increased training time and reduced prediction accuracy for models utilizing this function. Conversely, the Tanh (hyperbolic tangent) function is an S-shaped function that maps real number inputs to a range between −1 and 1. It exhibits symmetry in handling both positive and negative inputs; however, the data in the training samples of this study are highly nonlinear and do not demonstrate symmetry across a wide range. Consequently, the prediction accuracy of models employing this function may not be optimal.
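These saturation and symmetry properties are easy to verify numerically (illustrative values only, unrelated to the study's actual head data):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# A large disparity between inputs drives softmax outputs toward 1 and 0,
# leaving near-zero gradients for the smaller entries (saturation).
z = np.array([12.0, 1.0, 0.5])
print(softmax(z))    # approx [1.0, 1.7e-05, 1.0e-05]

# Tanh maps inputs symmetrically into (-1, 1); strongly asymmetric,
# highly nonlinear head data gain little from this symmetry.
x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(np.tanh(x))    # approx [-0.995, -0.462, 0.0, 0.462, 0.995]
```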
3.3.1. Optimization of Convolution Layer
To determine the optimal number of convolutional layers for the proposed DSCNN-GRU model, this study experimented with three, four, and five layers. Because excessive pooling layers proved unnecessary as the number of convolutional layers changed, the number of paired convolutional and pooling units (as shown in Figure 6) was correspondingly reduced to three or four when only three or four convolutional layers were used.
Table 1 demonstrates minimal changes in MSE values across the three models during extended training, likely due to the slight differences in the number of convolutional layers. However, reducing the number of convolutional layers increased the training time, from 2858 s with five layers to 3064 s with three layers. Despite constant regularization parameters, batch size, and other model parameter settings, several possible reasons could explain this abnormal behavior: (1) insufficient parallel optimization of the GPU, possibly due to an inappropriate model scale, limiting GPU utilization and slowing training; (2) the reduction in the number of layers may have decreased the caching efficiency of data and model parameters in memory, lengthening training; (3) with fewer convolutional layers, the feature extraction capability is diminished, so the model needs more time to learn the same feature representation.
The test-set MSE likewise rose as the number of layers decreased: 0.0062 for the five-layer model versus 0.0096 for the three-layer model, indicating a decline in prediction accuracy. The R² value exhibited a similar trend, decreasing from 0.954 for the five-layer model to 0.948 for the three-layer model. These findings suggest that increasing the number of convolutional layers enhances both the computational efficiency and the accuracy of the model. The models in Section 3.3.2 and Section 3.3.3 therefore both use five convolutional layers.
3.3.2. Optimization of Optimizers
Following the decision to use five convolutional layers, this study further investigates the impact of different optimizers on model efficiency and accuracy. The results, presented in Table 1, reveal that with Adam the model's training time is 2858 s, with the lowest test MSE recorded at 0.0062. In contrast, with RMSprop and Nadam the MSE values are 0.043 and 0.068, respectively, an order of magnitude higher than with Adam. This difference in performance may be attributed to RMSprop's high learning rate, which causes the optimization process to oscillate around the optimal solution without converging. It is noteworthy, however, that the training time is 2598 s for RMSprop and 3212 s for Nadam. Overall, Nadam yields a stable MSE value of 0.0031 after the 50th cycle during training, yet the MSE on the test set increases to 0.068 and the R² value reaches only 0.853, inferior to RMSprop. This implies that Nadam, with its accelerated convergence and no appropriate regularization or early stopping strategy, may lead to overfitting: even though the training MSE is low, the model assimilates a significant amount of noise, diminishing prediction accuracy.
The experimental data clearly demonstrate that the choice of optimizer significantly influences the model's training efficiency and prediction accuracy. Although Adam prolongs the training time by 260 s compared to RMSprop, the substantial disparity in prediction accuracy favors Adam for this study, while Nadam's performance in this experiment was unsatisfactory.
3.3.3. Optimization of Activation Functions
After selecting Adam as the optimizer, this study examined the impact of different activation functions on model efficiency and accuracy. Specifically, only the activation function of the convolutional layers was modified, while the other layers remained unchanged. The results in Table 1 show that with Softmax as the activation function, the model's training time is 3548 s, the prediction MSE is 0.057, and the R² value is 0.826. Since Softmax yielded less favorable results than the other two activation functions in both training time and prediction accuracy, it is not discussed further.
On the test set, the MSE with Tanh as the activation function is 0.0052, a 16% decrease compared to the MSE of 0.0062 with ReLU. Additionally, the R² values for Tanh and ReLU are 0.974 and 0.954, respectively. These results indicate that using Tanh as the activation function significantly improves the prediction accuracy of the model in this study. However, the training time for the Tanh model is 2876 s, whereas it is only 2598 s for the ReLU model. Despite ReLU's 278 s reduction in training time, the improvement in prediction accuracy was deemed more important for this study, given that the difference in computational efficiency is not substantial. Therefore, the Tanh activation function is considered more suitable here: it yields higher prediction accuracy than ReLU, and the reduction in training efficiency is acceptable.
3.4. Illustrative Example
As shown in Figure 14, the rows y, y₁, and y₂ each consist of six images showing the head field of the study area at the 10th, 20th, 30th, 40th, 50th, and 60th time steps. Row y is a randomly selected set of head field data obtained from the physical model run of Case 3. Rows y₁ and y₂ are the head fields predicted by the CNN and DSCNN-GRU models, respectively, corresponding to the data in row y. The rows |y − y₁| and |y − y₂| are contour maps showing the absolute differences between the head fields at the corresponding time steps of rows y, y₁, and y₂.
Comparing the water level differences at the same time step in rows |y − y₁| and |y − y₂|, it is clear that the prediction accuracy of the CNN is lower than that of the DSCNN-GRU. Nevertheless, the head fields predicted by both models generally agree with the physical model's head distribution across the entire study area. In addition, the two models exhibit noticeable prediction errors in the southeast and northeast regions of the study area, respectively. This may be because the water level changes in the eastern part of the area during the sampling period were larger than those in the western part, making the time-varying head field data in the east more complex and reducing the model's learning accuracy.
Comparing the DSCNN-GRU model's predictions across time steps shows that the model achieves high accuracy over continuous time series. However, the prediction error of the head field is not uniform across time steps, an inconsistency that may stem from the input samples varying unevenly over time.
4. Discussion
This study presents an innovative DSCNN-GRU surrogate modeling framework for simulating transient groundwater head fields through continuous time series prediction. A comprehensive evaluation of surrogate modeling approaches, including CNN-LSTM, DSCNN-LSTM, and the proposed DSCNN-GRU framework, reveals that the proposed DSCNN-GRU model achieves an optimal balance between computational efficiency and predictive accuracy.
The comparative analysis demonstrates that while the CNN-LSTM architecture attains comparable predictive accuracy, the DSCNN-GRU surrogate achieves a significant reduction in training time. This enhanced efficiency corroborates the theoretical advantages of GRU architectures over conventional LSTM networks, as originally theorized by Chung et al. (2014) [16], particularly regarding GRU's simplified gating mechanism, which effectively mitigates the characteristic "memory saturation" phenomenon observed in LSTM networks. Furthermore, the DSCNN-GRU exhibits superior training stability compared to CNN-LSTM, with reduced error fluctuations that align with previous findings regarding LSTM instability in large-scale hydrological modeling applications. This study extends the application of GRU networks in groundwater modeling beyond the pure time series prediction demonstrated by Gharehbaghi et al. [26] through a novel integration with DSCNN, which unites spatial feature extraction with temporal sequence modeling. This hybrid architecture provides a robust solution for simultaneous spatiotemporal analysis of transient groundwater flow dynamics, achieving enhanced predictive capability while maintaining computational efficiency comparable to conventional approaches.
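For reference, the GRU collapses the LSTM's three gates and separate cell state into two gates acting on a single hidden state (standard formulation following Chung et al. [16]; bias terms omitted):

$$
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1}\right), \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right), \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right)\right), \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t,
\end{aligned}
$$

where $z_t$ and $r_t$ are the update and reset gates and $\odot$ denotes element-wise multiplication. Because the update gate directly interpolates between the previous state $h_{t-1}$ and the candidate $\tilde{h}_t$, the GRU needs only three recurrent weight blocks to the LSTM's four and avoids maintaining a separate cell state, which underlies the training-time savings reported above.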
Systematic parameter optimization analysis demonstrates that the DSCNN-GRU model exhibits pronounced sensitivity to three critical hyperparameters: (1) the number of convolutional layers, (2) optimizer selection, and (3) activation function choice. Increasing the convolutional layers from three to five significantly enhances prediction accuracy, corroborating Ali et al.'s [19] finding that deeper networks better capture complex relationships in groundwater systems. Notably, contrary to conventional patterns, the five-layer architecture demonstrates higher computational efficiency than the three-layer version, likely due to optimized memory caching that reduces data fragmentation and an improved feature extraction hierarchy that accelerates convergence. In optimizer comparisons, Adam exhibits superior performance despite marginally longer training times than RMSprop. The suboptimal performance of Nadam contrasts with some conclusions of Kannan [27], suggesting that its accelerated convergence may adversely affect groundwater modeling applications requiring smooth loss landscapes and highlighting the importance of problem-specific optimizer selection. The activation function evaluation reveals that Tanh provides superior performance for transient hydraulic head prediction, particularly in capturing nonlinear aquifer responses, while Softmax proves fundamentally unsuitable for this surrogate modeling task. These results quantitatively confirm Chen et al.'s [28] hypothesis about activation function specialization, demonstrating that hydrological data characteristics should drive function selection rather than default choices from other domains.
While the simplified 2D heterogeneous groundwater flow model validates the feasibility of the DSCNN-GRU approach, several limitations merit discussion. First, the current implementation simplifies the complex hydrogeological characteristics of karst limestone aquifers (e.g., the Gambier Limestone aquifer system) by representing hydraulic conductivity through range parameters rather than explicitly modeling dominant flow pathways through karst conduits and fracture networks. Future integration of discrete fracture network (DFN) modeling with conditional geostatistical simulation techniques could better capture the hierarchical organization of karst systems and preferential flow dynamics. Second, the current fixed optimization strategy for the network architecture may limit performance in complex hydrogeological settings. Subsequent studies could systematically optimize critical parameter combinations in hybrid neural architectures using advanced optimization algorithms while incorporating attention mechanisms to capture the multiscale flow characteristics unique to karst systems. Third, and equally important, future development of surrogate models should explicitly account for conceptual model uncertainty, particularly the impact of geological conceptualizations on flow dynamics [2]. This would involve generating multiple realizations of aquifer heterogeneity to train and test surrogate models under different geological scenarios, thereby enhancing their robustness and applicability in challenging hydrogeological environments.