In this study, both MODFLOW and Long Short-Term Memory (LSTM) models were employed to predict groundwater levels in Chaoyang City. The effect of two optimization algorithms on the predictive performance of the LSTM network was also examined: the firefly algorithm (FA-LSTM) and the grey wolf optimization algorithm (GWO-LSTM). Compared to the traditional LSTM model, these optimization algorithms significantly enhanced model accuracy in time-series prediction tasks. The goal was to identify the best of these enhanced models in order to achieve more accurate water level predictions.
3.1. MODFLOW Prediction Results
Based on the collected hydrogeological data, the hydrogeological conditions of the study area were conceptualized, and key parameters such as permeability coefficient, specific yield, rainfall, recharge, and evaporation were determined. A groundwater-flow model for Chaoyang City was then established using these parameters to predict water levels. The generalization of the hydrogeological model takes into account the internal structure of the aquifer, the hydraulic characteristics of the aquifer, the treatment of the boundaries of the study area, and the treatment of the source and sink terms, as shown below:
- (1) Generalization of the internal structure of the aquifer
According to the hydrogeological conditions of the study area, the aquifer is treated as a single unconfined (phreatic) aquifer. The base of the Upper Pleistocene strata is taken as the base of the aquifer. Based on the type, lithology, thickness, and hydraulic conductivity of the aquifer, the model is generalized as a heterogeneous, isotropic aquifer that can be regarded as homogeneous locally.
- (2) Generalization of hydraulic characteristics of the aquifer
The groundwater level in the study area varies between dry and wet periods, so the flow is unsteady. Overall, however, regional groundwater movement is laminar and seepage conforms to Darcy’s law, so the system can be treated as unsteady two-dimensional planar flow.
- (3) Boundary processing of the study area
Depending on the distribution of observation wells (boreholes) along the boundary of the study area, the boundary can be defined with the time-series function provided by MODFLOW as a time-varying ‘given head boundary’, or it can be approximated as a no-flow boundary of the shallow groundwater system (controlled by topography, with short runoff paths and rapid turnover). The choice of boundary condition depends on the specific hydrogeological conditions, and the water level at each boundary point is determined from the hydrogeological conditions of the study area and the data from long-term groundwater-level observation wells at the boundary.
- (4) Treatment of source and sink items
The source term mainly comprises recharge from atmospheric precipitation infiltration. This recharge is zoned according to precipitation infiltration intensity (the riverbed is excluded from the zoning), and the recharge per unit area is calculated from the precipitation infiltration coefficient and the precipitation measured in the survey.
The sink terms mainly comprise evaporation, agricultural water extraction, industrial water extraction, and groundwater discharge to rivers. Evaporation is calculated from evaporation intensity. Agricultural water extraction includes domestic water use in villages, estimated from population and water-use quotas, and agricultural irrigation water, zoned by extraction intensity calculated from crop area and irrigation quotas. Industrial water extraction is taken from the surveyed extraction volumes, and groundwater discharge to rivers is treated in the same way as river infiltration recharge.
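The precipitation-recharge calculation described in item (4) can be sketched as follows. This is a minimal illustration that assumes recharge depth equals the infiltration coefficient multiplied by the precipitation depth; the zone names, coefficients, areas, and precipitation value are hypothetical and are not taken from the study.

```python
def precipitation_recharge(precip_mm, infiltration_coeff):
    """Recharge depth per unit area (mm) = infiltration coefficient x precipitation."""
    if not 0.0 <= infiltration_coeff <= 1.0:
        raise ValueError("infiltration coefficient must lie in [0, 1]")
    return infiltration_coeff * precip_mm


def zone_recharge_volumes(zones, precip_mm):
    """Return the recharge volume of each zone in cubic metres."""
    volumes = {}
    for name, (coeff, area_km2) in zones.items():
        depth_m = precipitation_recharge(precip_mm, coeff) / 1000.0  # mm -> m
        volumes[name] = depth_m * area_km2 * 1.0e6                   # km^2 -> m^2
    return volumes


# Hypothetical zoning: name -> (infiltration coefficient, area in km^2).
zones = {"zone_A": (0.20, 12.0), "zone_B": (0.10, 30.0)}
precip_mm = 500.0  # hypothetical precipitation depth for the period

volumes = zone_recharge_volumes(zones, precip_mm)
for name, volume in volumes.items():
    print(f"{name}: {volume:.0f} m^3")
```

In a zoned model, each zone's recharge would then be assigned to the grid cells inside that zone, with riverbed cells excluded as stated above.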
The hydrogeological parameters selected for the model are the permeability coefficient, specific yield, and porosity; their initial values are mainly taken from the hydrogeological tests conducted in the survey. The initial and calibrated parameter values are shown in Table 4.
The period from 20 March 2017 to 22 August 2018 was selected as the model-validation period. Representative observation wells were chosen for comparing the calculated water levels with the measured values. The comparison between the calculated and observed water levels is presented in Figure 5 and Figure 6. To make the model-identification results more intuitive, two quantitative metrics were used to evaluate model accuracy, i.e., the root-mean-square error (RMSE) and the correlation coefficient (R²) between the simulated and measured values, as shown in Table 5.
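The two metrics can be computed as in the sketch below. The source does not give the exact formula for R², so this example assumes it is the square of the Pearson correlation coefficient between measured and simulated levels; the sample water levels are hypothetical.

```python
import math


def rmse(observed, simulated):
    """Root-mean-square error between measured and simulated water levels."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of R^2 here)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


# Hypothetical water levels (m) at one observation well.
observed = [160.2, 160.8, 161.5, 161.1, 160.4]
simulated = [160.0, 160.9, 161.2, 161.3, 160.6]

print(f"RMSE = {rmse(observed, simulated):.3f} m")
print(f"R^2  = {r_squared(observed, simulated):.3f}")
```

Note that a constant offset between the two series leaves this R² at 1 while RMSE grows, which is one reason to report both metrics together.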
As shown in the graphs above, in the March validation results the maximum error occurs at point G4 (9.13 m) and the minimum error at point G6 (6.68 m). In the August validation results, the maximum error occurs at point G6 (0.35 m) and the minimum error at point G2 (0.07 m). Over the validation period, the correlation coefficient (R²) of the model ranges from 0.78 to 0.95, indicating that the simulated water levels at the monitoring wells correlate well with the measured values.
3.2. Prediction Results of Deep Learning Models
The groundwater-level predictions of the three models, namely the Long Short-Term Memory network (LSTM), the firefly-optimized LSTM (FA-LSTM), and the grey wolf-optimized LSTM (GWO-LSTM), are shown in Figure 7. The average performance indicators of the models are shown in Table 6.
Figure 7 clearly demonstrates that the LSTM model is unable to effectively capture the rapidly fluctuating water level information, resulting in a significant discrepancy in the predicted high water level. The LSTM model optimized by the firefly algorithm demonstrates enhanced capability in capturing rapidly fluctuating water level information; however, a lag persists, and the prediction of the high water level remains suboptimal. The LSTM model optimized by the grey wolf optimization algorithm demonstrates enhanced precision in both the prediction of high water levels and the capture of rapidly fluctuating water levels.
A comparison of the data in Table 6 shows that the optimized models perform significantly better than the baseline LSTM. The FA-LSTM model uses the brightness-attraction principle of the firefly algorithm to guide parameter optimization and achieves good prediction accuracy: its R² value is 0.9810, and its MAE, RMSE, and MAPE are reduced to 7.8515, 11.7396, and 0.0208, respectively. The GWO-LSTM model, whose parameters are tuned through the social hierarchy and hunting strategy of the grey wolf optimization algorithm, achieved the best performance of the three models, with an R² value of 0.9891 and the best values for the other indicators as well.
Using the FA-LSTM model for water level prediction in the study area can effectively capture the nonlinear characteristics and long-term dependencies of water level changes. The global optimization capability of FA significantly improves the optimization efficiency of LSTM parameters, thereby enhancing the accuracy and reliability of water level prediction.
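The brightness-attraction search can be illustrated with the minimal sketch below. It is a generic textbook-style firefly algorithm, not the authors' implementation. The objective `toy_validation_error` is a hypothetical stand-in for the validation error of an LSTM trained with a given parameter vector, and the search is assumed to run on parameters normalized to [0, 1].

```python
import math
import random


def toy_validation_error(params):
    """Hypothetical stand-in for LSTM validation error; minimum at (0.3, 0.7)."""
    return (params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2


def firefly_minimize(objective, dim, n_fireflies=15, n_iter=40,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize objective over [0, 1]^dim with a basic firefly algorithm."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower error)
                    dist2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * dist2)  # attractiveness
                    moved = []
                    for xi, xj in zip(pos[i], pos[j]):
                        step = beta * (xj - xi) + alpha * (rng.random() - 0.5)
                        moved.append(min(1.0, max(0.0, xi + step)))
                    pos[i] = moved
                    cost[i] = objective(moved)
        alpha *= 0.97  # gradually reduce the random step size
    best = min(range(n_fireflies), key=cost.__getitem__)
    return list(pos[best]), cost[best]


best_pos, best_cost = firefly_minimize(toy_validation_error, dim=2,
                                       n_fireflies=15, n_iter=40, seed=0)
print(best_pos, best_cost)
```

In an actual FA-LSTM workflow, each objective evaluation would involve training and validating an LSTM, so the population size and iteration count directly determine the computational cost.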
The GWO algorithm can guide the LSTM network parameters to gradually approach the global optimal solution, thereby improving the accuracy of the model’s prediction of time-series data. In the application case of water level prediction in the research area, the GWO-LSTM model effectively captures the complex nonlinear patterns and long-term dependencies of water level changes by precisely adjusting the LSTM parameters, thereby significantly improving the prediction performance.
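The leader-guided search of GWO can be sketched in the same spirit. The code below follows the commonly published form of the grey wolf optimizer (three leaders and a coefficient that decreases linearly from 2 towards 0) and is not the authors' code. The objective is again a hypothetical stand-in for LSTM validation error, evaluated on parameters normalized to [0, 1].

```python
import random


def toy_validation_error(params):
    """Hypothetical stand-in for LSTM validation error; minimum at (0.3, 0.7)."""
    return (params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2


def gwo_minimize(objective, dim, n_wolves=12, n_iter=60, seed=0):
    """Minimize objective over [0, 1]^dim with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_cost = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta, delta wolves
        lead_cost = objective(leaders[0])
        if lead_cost < best_cost:                # remember the best solution seen
            best_pos, best_cost = list(leaders[0]), lead_cost
        a = 2.0 * (1.0 - t / n_iter)             # exploration factor shrinks over time
        new_wolves = []
        for w in wolves:
            new_w = []
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])  # distance to this leader
                    total += leader[d] - A * D
                new_w.append(min(1.0, max(0.0, total / 3.0)))
            new_wolves.append(new_w)
        wolves = new_wolves
    for w in wolves:                             # check the final pack as well
        c = objective(w)
        if c < best_cost:
            best_pos, best_cost = list(w), c
    return best_pos, best_cost


best_pos, best_cost = gwo_minimize(toy_validation_error, dim=2,
                                   n_wolves=12, n_iter=60, seed=0)
print(best_pos, best_cost)
```

Keeping a separate record of the best solution seen is a design choice made here so that the returned result never gets worse as the pack moves.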
3.3. Discussion
When comparing the traditional groundwater numerical-simulation model (MODFLOW) with the deep learning model (LSTM), we note that repeatedly calibrating the groundwater-flow numerical model against the hydrogeological data yields predictions of higher accuracy. In that case, the prediction accuracy of the traditional groundwater-flow numerical model is slightly better than that of the LSTM model; although there is still room for improvement, the traditional model retains good application prospects in some cases. Optimizing the LSTM model with the firefly algorithm (FA-LSTM) and the grey wolf optimization algorithm (GWO-LSTM) significantly improves the prediction accuracy. The percentage improvement in model accuracy after optimization is shown in Table 7.
Table 7 shows that the firefly-optimized LSTM model improves R² by 4.25% relative to the unoptimized model and reduces MAE, RMSE, and MAPE by 18.50%, 43.75%, and 22.04%, respectively; the grey wolf-optimized LSTM model improves R² by 4.99% and reduces MAE, RMSE, and MAPE by 47.55%, 56.27%, and 44.89%, respectively. The deep learning method thus achieves high accuracy in groundwater level prediction. The anomalously high water levels in the prediction results may be caused by heavy rainfall and may bring about environmental problems, so the optimized LSTM model can subsequently be used to flag and predict extreme events, helping to prevent the adverse impacts of high water levels on the ecological environment and providing a scientific basis for sustainable development and human health.
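Percentage improvements of this kind are typically computed as the relative change against the unoptimized baseline. The sketch below assumes that definition, which the source does not state explicitly; the metric values in the example are hypothetical and are not taken from the study's tables.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def percent_improvement(before, after, higher_is_better):
    """Positive result means the optimized model is better than the baseline."""
    change = percent_change(before, after)
    return change if higher_is_better else -change


# Hypothetical baseline and optimized metrics (illustration only).
r2_gain = percent_improvement(0.90, 0.945, higher_is_better=True)
rmse_gain = percent_improvement(20.0, 10.0, higher_is_better=False)

print(f"R^2 improved by {r2_gain:.2f}%")
print(f"RMSE reduced by {rmse_gain:.2f}%")
```

Error metrics such as MAE, RMSE, and MAPE are treated as lower-is-better, so a reduction is reported as a positive improvement.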