Coupling SWAT and Transformer Models for Enhanced Monthly Streamflow Prediction

Tao, Jiahui; Gu, Yicheng; Yin, Xin; Chen, Junlai; Ao, Tianqi; Zhang, Jianyun

doi:10.3390/su16198699


Jiahui Tao¹,², Yicheng Gu², Xin Yin², Junlai Chen³, Tianqi Ao² and Jianyun Zhang¹,²,*

¹ State Key Laboratory of Hydraulics and Mountain River Engineering, College of Water Resource & Hydropower, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, China

² Institute of Hydrology and Water Resources, Nanjing Hydraulic Research Institute, No. 223, Guangzhou Road, Nanjing 210029, China

³ College of Water Resources and Architectural Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling 712100, China

* Author to whom correspondence should be addressed.

Sustainability 2024, 16(19), 8699; https://doi.org/10.3390/su16198699

Submission received: 14 August 2024 / Revised: 26 September 2024 / Accepted: 28 September 2024 / Published: 9 October 2024


Abstract

The establishment of an accurate and reliable predictive model is essential for water resources planning and management. Standalone models, such as physics-based or data-driven hydrological models, each have specific applications, strengths, and limitations. In this study, a hybrid model (SWAT-Transformer) was developed by coupling the physics-based Soil and Water Assessment Tool (SWAT) with the data-driven Transformer to enhance monthly streamflow prediction accuracy. SWAT is first constructed and calibrated, and its outputs are then used as part of the inputs to Transformer; the two models are coupled by using Transformer to correct SWAT’s prediction errors. Monthly runoff data at Yan’an and Ganguyi stations on the Yan River, a first-order tributary of the Yellow River, were used to evaluate the proposed model’s performance. The results indicated that SWAT performed well in predicting high flows but poorly for low flows. In contrast, Transformer captured low-flow information more accurately and outperformed SWAT overall. SWAT-Transformer corrected the errors of SWAT predictions and overcame the limitations of a single model. By integrating SWAT’s detailed representation of physical processes with Transformer’s powerful time-series analysis, the coupled model significantly improved streamflow prediction accuracy. The proposed model offers more accurate and reliable predictions for optimal water resource management, which is crucial for sustainable economic and societal development.

Keywords:

streamflow; prediction; SWAT; transformer; coupled modeling

1. Introduction

With the increasing demand for water resources and the occurrence of extreme weather events, effective water resource planning and management have become critical issues over the past few decades [1]. Accurate and reliable streamflow prediction is a key step in ensuring water supply and preventing natural disasters such as floods and droughts [2]. However, due to the nonlinear and nonstationary nature of streamflow, accurate prediction is a challenging task, particularly when dealing with extreme flows (high and low flows) [3,4]. Therefore, improving streamflow prediction accuracy has received significant attention.

A variety of hydrological models have been developed, and they can be classified into two types: physics-based models and data-driven models [5]. Physics-based models can effectively capture the mass and energy balance processes of the hydrological cycle. They provide a generalized representation of physical processes and describe watershed hydrological processes through complex mathematical equations [6,7]. However, their incomplete representation of physical processes may lead to prediction inaccuracies [8,9]. Data-driven hydrological models extract features from extensive historical data to identify underlying nonlinear relationships and then establish complex input–output mappings to simulate hydrological processes [10]. Data-driven models often achieve higher prediction accuracy than physics-based hydrological models, as they can capture high-dimensional nonlinear features and complex relationships among variables that traditional physics-based models struggle to describe [11,12,13]. However, because they lack physical meaning, data-driven models are prone to overfitting, and their long-term predictions under changing environments may be inaccurate [14]. Given these respective strengths and weaknesses, relying on a single model type may not be sufficient for streamflow prediction. Physics-based hydrological models can therefore be coupled with data-driven models so that each compensates for the other’s limitations [15]. Coupled modeling has proven more effective than individual modeling in hydrological research [16,17,18]. For example, Konapala et al. coupled physics-based models with LSTM for streamflow prediction in 531 U.S. watersheds; the coupled model significantly improved multiple evaluation metrics compared with either model alone, especially in watersheds where the physics-based models failed completely (NSE = 0) [19].

Among physics-based hydrological models, SWAT is a semi-distributed model widely used for streamflow prediction. However, it may exhibit systematic biases in low-flow predictions [20], and its performance can fluctuate between dry and rainy seasons [21]. This issue can be attributed mainly to two factors. First, SWAT calculates surface runoff using the Soil Conservation Service (SCS) Curve Number method, which has certain limitations in reproducing low-flow conditions [22,23]. Second, the objective functions or performance metrics used for calibration tend to emphasize flood characteristics [24,25,26], which makes the calibrated model more sensitive to high-flow features and less effective for low flows. Gebremariam evaluated SWAT’s performance in the Maumee River Basin, the largest watershed in the Great Lakes region of North America, and found that SWAT predicted average flow well but had limitations in predicting low flows, as well as the frequency and magnitude of flood events [20].

Data-driven hydrological models are often implemented with statistical methods or machine learning algorithms [27]. Statistical methods include regression analysis and time series analysis, while machine learning algorithms encompass traditional machine learning algorithms (such as support vector machines, random forests, and decision trees) and deep learning algorithms (such as artificial neural networks, convolutional neural networks, recurrent neural networks, and long short-term memory networks). These methods have been successfully applied in water environment research for over two decades [28,29,30]. In 2017, Google introduced Transformer, a novel deep learning architecture initially designed for natural language processing [31] and later extended to computer vision, speech recognition, and knowledge graphs [32]. Unlike sequence-aligned recurrent neural networks (such as LSTM), Transformer relies entirely on the attention mechanism to generate internal representations of input and output data. By directly connecting any two positions in a time series, self-attention modules improve the efficiency of information transmission and capture long-range dependencies [33,34]. In 2022, Yin et al. introduced Transformer into rainfall–runoff modeling for the first time and compared it with the LSTM model [35]. The results indicated that Transformer outperformed LSTM in terms of accuracy and flexibility and performed better on large datasets. As a current focus of deep learning research, Transformer has been widely applied in various fields [36,37]. However, its application in hydrology is still at an exploratory stage, especially in arid and semi-arid regions. Exploring the applicability of Transformer to streamflow prediction and its performance when coupled with physics-based hydrological models therefore remains a meaningful topic.

The objectives of this study are (1) to explore the applicability of Transformer and compare it with SWAT and (2) to construct a hybrid SWAT-Transformer model and test its performance. To the best of our knowledge, no previous study has combined these two models for streamflow prediction in arid and semi-arid regions. The remainder of this paper is organized as follows. Section 2 describes the materials and methods. Section 3 presents the results and discussion, including a comparison of the proposed models and their strengths and weaknesses in predicting high and low flows. Finally, Section 4 summarizes the study.

2. Material and Methodology

This section briefly introduces the study area and data, the SWAT and Transformer models, the construction of SWAT-Transformer, and the performance measures.

2.1. Study Area and Data

The Yanhe River is a first-order tributary on the right bank of the Yellow River, located at 36°21′–37°19′ N and 108°38′–110°29′ E. It has a total length of 286.9 km and a basin area of 7725 km². The Yanhe River basin has a warm temperate continental semi-arid climate, with an average annual temperature of 9 °C and an average annual precipitation of 540 mm. Precipitation is unevenly distributed throughout the year, with more than 70% of the annual total falling during the wet season (June to September). This study involves two hydrological stations, Yan’an Station and Ganguyi Station, which are the control stations of the middle and lower reaches of the basin, respectively. Figure 1 shows the two stations, and Table 1 lists basic statistics of the streamflow data at these sites.

The data required for this study include digital elevation model (DEM) data (30 m resolution, Geospatial Data Cloud Platform, https://www.gscloud.cn/home, accessed on 20 January 2024), land use data for 2005 (100 m resolution, Geo-information Monitoring Cloud Platform, http://www.dsac.cn/, accessed on 20 January 2024), soil data (1 km resolution, Harmonized World Soil Database (HWSD), https://gaez.fao.org/pages/hwsd, accessed on 20 January 2024), meteorological data (China National Meteorological Information Center, http://data.cma.cn/, accessed on 20 January 2024), and streamflow data (Yan’an Station and Ganguyi Station, China’s Third Water Resources Survey and Evaluation Data, the Bureau of Hydrology, Ministry of Water Resources, China).

2.2. SWAT Model

SWAT is a semi-distributed hydrological model that simulates precipitation, evapotranspiration, sediment transport, and crop growth within a watershed based on user inputs of geography, soil, vegetation, and meteorology. It divides the watershed into a number of sub-watersheds, which are further subdivided into Hydrological Response Units (HRUs) [22]. The water balance is the driving force behind all hydrological processes in each HRU and can be written as follows:

SW_{t} = SW_{0} + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - E_{a} - W_{seep} - Q_{gw} \right) \qquad (1)

where SW_t is the soil water content at the end of day t, SW₀ is the initial soil water content, R_day is the amount of precipitation on day i, Q_surf is the amount of surface runoff, E_a is the amount of evapotranspiration, W_seep is the amount of water percolating from the soil profile into the unsaturated zone, and Q_gw is the amount of groundwater runoff. All of these variables are expressed as depths of water (mm H₂O).
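Eq. (1) is a running sum: each day’s inputs and losses are added to the previous soil water content. A minimal Python sketch of that bookkeeping follows. The flux values are invented for illustration, and in SWAT itself each term is produced by its own sub-model rather than supplied as a list.

```python
def soil_water_series(sw0, rain, q_surf, e_a, w_seep, q_gw):
    """Return [SW_1, ..., SW_t] following Eq. (1).

    All arguments are daily amounts in one consistent unit; each list holds
    one value per day i = 1..t.
    """
    n = len(rain)
    if not all(len(x) == n for x in (q_surf, e_a, w_seep, q_gw)):
        raise ValueError("all flux series must have the same length")
    sw, series = sw0, []
    for r, q, e, w, g in zip(rain, q_surf, e_a, w_seep, q_gw):
        # Daily balance: precipitation minus surface runoff, evapotranspiration,
        # seepage out of the soil profile, and groundwater runoff.
        sw += r - q - e - w - g
        series.append(sw)
    return series


# Hypothetical three-day example.
sw = soil_water_series(
    sw0=100.0,
    rain=[10.0, 0.0, 5.0],
    q_surf=[2.0, 0.0, 0.0],
    e_a=[3.0, 2.0, 1.0],
    w_seep=[1.0, 1.0, 0.5],
    q_gw=[0.5, 0.5, 0.5],
)
print(sw)  # [103.5, 100.0, 103.0]
```

The second day has no rain, so the soil water content falls by the sum of the losses (3.5) before rising again on day three.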

2.2.1. SWAT Model Database

The SWAT database mainly comprises four components: DEM data, land use data, soil data, and meteorological data. The DEM data were transformed and clipped at a spatial resolution of 30 m × 30 m for sub-watershed delineation and extraction of the watershed hydrological system. The 2005 land use data were reclassified and coded into six categories to meet the input requirements of SWAT. Based on the HWSD, the soils were categorized into eight classes to establish the soil database for the study area. Daily meteorological data from seven stations for 1979–2020 were imported, including maximum temperature, minimum temperature, average barometric pressure, sunshine hours, average wind speed, relative humidity, precipitation, and evaporation. Table 2 lists basic statistics of the meteorological data.

2.2.2. SWAT Spatial Discretization

SWAT spatial discretization mainly consists of the delineation of sub-watersheds and HRUs. The DEM is analyzed spatially to generate sub-watershed boundaries and flow path networks, which form the basis for simulating runoff generation and losses in the hydrological cycle. A threshold area of 15,095 hm² was set for sub-watershed definition, which resulted in 32 sub-watersheds. HRUs are the basic units of water, energy, and material cycles in SWAT; within each HRU, land use, soil type, and slope characteristics are uniform. The study area was divided into 176 HRUs.
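Because each HRU is a unique combination of land use, soil type, and slope class within a sub-watershed, HRU definition amounts to grouping cells by that key. The sketch below illustrates the idea only: the slope-class breaks (8% and 25%), the land use and soil codes, and the cells are hypothetical, and it omits any area thresholds for eliminating minor land use, soil, or slope classes.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical slope-class break points (percent); not the classes used in this study.
SLOPE_BREAKS = (8.0, 25.0)


def slope_class(slope_pct):
    """0 for slope < 8%, 1 for 8-25%, 2 for >= 25% (with the breaks above)."""
    return bisect_right(SLOPE_BREAKS, slope_pct)


def define_hrus(cells, cell_area_ha):
    """Group raster cells into HRUs.

    cells: iterable of (subbasin, land_use, soil, slope_pct) per DEM cell.
    Returns {(subbasin, land_use, soil, slope_class): area_ha}.
    """
    counts = Counter((sb, lu, soil, slope_class(s)) for sb, lu, soil, s in cells)
    return {key: n * cell_area_ha for key, n in counts.items()}


cells = [
    (1, "FRST", "S1", 5.0),
    (1, "FRST", "S1", 7.9),
    (1, "AGRL", "S1", 12.0),
    (2, "FRST", "S2", 30.0),
]
hrus = define_hrus(cells, cell_area_ha=0.09)  # a 30 m x 30 m cell covers 0.09 ha
```

The two forested cells in sub-watershed 1 share soil and slope class, so they merge into one HRU, giving three HRUs in total.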

2.2.3. SWAT Calibration

We calibrated SWAT by using the SWAT-CUP (SWAT Calibration and Uncertainty Procedures) tool. Based on the SUFI-2 algorithm, we performed a sensitivity analysis to identify key parameters that significantly impact the prediction results. These key parameters were iteratively adjusted to minimize the error between the predicted results and the actual observed data. The optimal parameters are shown in Table 3.
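SUFI-2 samples parameter sets by Latin hypercube sampling and updates the parameter ranges over successive iterations. The sketch below imitates only that outer loop: it samples with a hand-written Latin hypercube, keeps the best parameter set, and narrows each range around it. The range-update rule, the toy objective, and its optimum are assumptions for illustration; SUFI-2 additionally tracks prediction uncertainty (the 95PPU band) and uses its own range-update rule, neither of which is reproduced here.

```python
import random


def latin_hypercube(ranges, n, rng):
    """Return n parameter sets; each range is cut into n equal strata and
    every stratum is sampled exactly once (strata shuffled per parameter)."""
    columns = []
    for lo, hi in ranges:
        width = (hi - lo) / n
        column = [lo + (k + rng.random()) * width for k in range(n)]
        rng.shuffle(column)
        columns.append(column)
    return [list(sample) for sample in zip(*columns)]


def iterative_lhs_calibration(objective, ranges, n=100, iterations=4, shrink=0.5, seed=0):
    """Maximize `objective` (e.g. NSE of a model run) by repeated LHS sampling,
    re-centring each range on the best set and scaling its width by `shrink`
    (clipped to the previous range)."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(iterations):
        for params in latin_hypercube(ranges, n, rng):
            score = objective(params)
            if score > best_score:
                best, best_score = params, score
        ranges = [
            (max(lo, b - shrink * (hi - lo) / 2), min(hi, b + shrink * (hi - lo) / 2))
            for (lo, hi), b in zip(ranges, best)
        ]
    return best, best_score


# Toy objective standing in for a SWAT run scored by NSE; the optimum (0.3, 0.6)
# and both parameter ranges are made up for illustration.
def toy_objective(p):
    return 1.0 - ((p[0] - 0.3) ** 2 + (p[1] - 0.6) ** 2)


best, score = iterative_lhs_calibration(toy_objective, [(0.0, 1.0), (0.0, 1.0)])
```

In a real calibration each objective call would run SWAT with the sampled parameters and compare the output with observed streamflow, which is why the number of samples per iteration matters far more than in this toy case.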

2.3. Transformer Model

Transformer is a deep learning model based on the attention mechanism, encapsulated by the core idea that ‘attention is all you need’. Unlike LSTM, it does not rely on the sequential, recurrent structure of recurrent neural networks (RNNs). Instead, it uses self-attention to connect any two positions within a time series directly, which helps avoid problems common to RNNs and LSTMs, such as vanishing gradients and difficulty in learning long-range dependencies [38,39]. This structure enhances the model’s representational capacity and improves its efficiency in handling long-range dependencies. Figure 2 presents the structure of Transformer used for streamflow prediction in this study.
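The core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)V, where the queries Q, keys K, and values V are linear projections of the input sequence. A minimal single-head sketch in plain Python follows. The identity projection matrices are placeholders for learned weights, and the toy input (three time steps, two features) is hypothetical. Every time step receives a weight for every other step, which is how self-attention links distant positions in a single operation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x (T x d)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = [
        [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features
out, weights = self_attention(x, identity, identity, identity)
```

Multi-head attention runs several such heads with separate projection matrices and concatenates their outputs; a single head is kept here for readability.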

In terms of structural design, Transformer builds complex dependencies between the encoder and decoder by combining self-attention, cross-attention, and multi-head attention mechanisms [40]. In the encoder, the self-attention layer and the position-wise feed-forward network work together to capture long-range dependencies among input features. In the decoder, masking is applied to the self-attention and encoder–decoder attention layers to minimize error accumulation, ensuring that the model uses only known streamflow observations during multi-step prediction. In addition, the decoder improves computational efficiency and reduces error accumulation by employing a non-autoregressive decoding strategy, which produces outputs for all positions in a single forward pass [39]. Pre-training and fine-tuning enable the model to extract features from long series of hydrometeorological data, which significantly improves its ability to characterize rainfall–runoff processes.
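The masking described above can be illustrated with a causal (look-ahead) mask in decoder self-attention: scores for future positions are set to −∞ before the softmax, so their weights become exactly zero. The sketch below computes only the attention weights, with made-up query and key vectors; it is a generic illustration, not the exact masking configuration used in this study.

```python
import math


def causal_attention_weights(q, k):
    """Attention weights in which position i may attend only to positions j <= i."""
    d_k = len(k[0])
    weights = []
    for i, q_row in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
            if j <= i
            else float("-inf")  # future step: masked out
            for j, k_row in enumerate(k)
        ]
        m = max(scores)  # finite, because position j = 0 is never masked
        exps = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


q = k = [[0.5, 1.0], [1.0, 0.0], [0.2, 0.3]]
w = causal_attention_weights(q, k)
```

The first time step can attend only to itself, so its weight row is [1.0, 0.0, 0.0]; later rows spread weight over the current and earlier steps only.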

Because Transformer has no inherent notion of temporal order, an input transformation layer and a positional embedding are introduced at the start of both the encoder and the decoder to handle time series data. The input transformation layer converts raw inputs into vectors that match the model’s internal dimension, while the positional embedding adds each data point’s position to its representation, so that the sequential characteristics of the data are preserved. This design enables Transformer to perform time series analysis tasks such as streamflow prediction.
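The exact form of the positional embedding is not specified here. A common choice, assumed in the sketch below, is the fixed sinusoidal encoding of the original Transformer: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The input transformation layer is assumed to be a linear projection, and the weights and the two input features are hypothetical.

```python
import math


def sinusoidal_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encoding (assumed form, see lead-in)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def embed(inputs, w_in, b_in):
    """Linear input transformation (T x n_features -> T x d_model) plus positional encoding."""
    d_model = len(b_in)
    projected = [
        [sum(x * w for x, w in zip(row, col)) + b for col, b in zip(zip(*w_in), b_in)]
        for row in inputs
    ]
    pe = sinusoidal_encoding(len(inputs), d_model)
    return [[p + e for p, e in zip(p_row, e_row)] for p_row, e_row in zip(projected, pe)]


# Three monthly time steps with two hypothetical features each
# (e.g. precipitation in mm, mean temperature in degrees C), projected to d_model = 4.
inputs = [[35.0, 2.1], [80.5, 12.4], [120.0, 21.7]]
w_in = [[0.01, 0.0, 0.02, 0.0], [0.0, 0.05, 0.0, 0.01]]  # n_features x d_model
b_in = [0.0, 0.0, 0.0, 0.0]
z = embed(inputs, w_in, b_in)
```

A learned positional embedding (a trainable vector per position) is an equally common alternative; it would replace `sinusoidal_encoding` with a lookup table while leaving `embed` unchanged.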

2.4. SWAT-Transformer Model

SWAT’s results may deviate from observations because of limitations in its structure and objective function, and these deviations often show a degree of autocorrelation [41]. To improve prediction accuracy, we developed the SWAT-Transformer model. In this approach, the observed flow is treated as the sum of SWAT’s prediction and an error term. The modeling process of SWAT-Transformer is outlined as follows, and Figure 3 presents its flowchart.

➀ Construction of SWAT: SWAT is used to generate the initial results. As a physics-based model, SWAT accounts for various hydrological processes within the watershed, such as precipitation, evaporation, runoff, and groundwater flow. Although its results may be biased, they provide a solid basis for subsequent bias analysis and correction.

➁ Construction of SWAT-Transformer: A bias sequence is obtained by comparing the SWAT predictions with the observed values. The initial SWAT results and biases, along with other relevant data (such as meteorological data and watershed attributes), are fed into Transformer, which extracts features and corrects the errors to generate more accurate predictions.
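The error-correction idea can be written as Q_obs,t = Q_SWAT,t + e_t: a second model learns to predict e_t, and the corrected forecast is Q_SWAT,t + ê_t. In the sketch below, a first-order autoregressive fit of the residuals stands in for Transformer, purely to show the data flow and to exploit the autocorrelation of SWAT errors noted above. It is not the SWAT-Transformer model, which also uses meteorological and watershed inputs, and the flow values are invented.

```python
def residuals(q_obs, q_swat):
    """Error term e_t = observed flow - SWAT-simulated flow."""
    return [o - s for o, s in zip(q_obs, q_swat)]


def fit_ar1(errors):
    """Least-squares phi in e_t ~ phi * e_(t-1) (no intercept).

    A deliberately simple stand-in for the learned corrector.
    """
    num = sum(errors[t] * errors[t - 1] for t in range(1, len(errors)))
    den = sum(errors[t - 1] ** 2 for t in range(1, len(errors)))
    return num / den if den else 0.0


def correct(q_swat, errors, phi):
    """Corrected flow for months 2..T: SWAT output plus the predicted error,
    using only the previous month's (already observed) error."""
    return [q_swat[t] + phi * errors[t - 1] for t in range(1, len(q_swat))]


# Hypothetical monthly flows (m^3/s); SWAT underestimates with a decaying bias.
q_swat = [10.0, 12.0, 8.0, 15.0]
q_obs = [12.0, 13.6, 9.28, 16.024]
e = residuals(q_obs, q_swat)
phi = fit_ar1(e)
q_corrected = correct(q_swat, e, phi)
```

Here the corrector is fitted and applied on the same four months only to keep the example short; in practice it would be fitted on the training period and applied to the testing period.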

To reduce the impact of initial conditions, 1979–1980 was used as the warm-up period. To compare the three models consistently and facilitate their combination, 1981–2010 was used as the training period and 2011–2020 as the testing period for all models. Additionally, an ‘early stopping’ strategy was implemented to avoid overfitting: after each training iteration, performance during the testing period was evaluated, and if no significant improvement was observed over several consecutive assessments, training was terminated.
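The early stopping rule can be sketched as a patience counter: training stops once the monitored score has failed to improve by more than a small margin for a set number of consecutive evaluations. The names `patience` and `min_delta` and their default values are hypothetical, since the study does not report them; `train_step` and `evaluate` are placeholders for one training iteration and for scoring (e.g., NSE) on the monitored period.

```python
def train_with_early_stopping(train_step, evaluate, max_epochs=500, patience=20, min_delta=1e-4):
    """Run training until the score stops improving; return (best_epoch, best_score).

    `evaluate` must return a score where higher is better (e.g. NSE).
    """
    best_epoch, best_score, wait = -1, float("-inf"), 0
    for epoch in range(max_epochs):
        train_step(epoch)
        score = evaluate()
        if score > best_score + min_delta:
            # Significant improvement: remember it (a real trainer would also
            # checkpoint the weights here) and reset the counter.
            best_epoch, best_score, wait = epoch, score, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch, best_score


# Scripted scores standing in for successive evaluations.
scores = iter([0.50, 0.60, 0.65, 0.64, 0.63, 0.62, 0.70])
epochs_run = []
result = train_with_early_stopping(epochs_run.append, lambda: next(scores), patience=3)
```

With `patience=3`, training stops after the sixth evaluation and reports epoch 2 (score 0.65); the later score of 0.70 is never reached, which illustrates the trade-off between stopping early and waiting for a delayed improvement.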

2.5. Performance Measures

The following three evaluation metrics were used to measure model performance: the Nash–Sutcliffe efficiency coefficient (NSE), the percent bias (PBIAS), and the coefficient of determination (R²), expressed, respectively, as

NSE = 1 - \frac{\sum_{i=1}^{n} (Y_{i}^{obs} - Y_{i}^{sim})^{2}}{\sum_{i=1}^{n} (Y_{i}^{obs} - \bar{Y}^{obs})^{2}} \qquad (2)

PBIAS = \frac{\sum_{i=1}^{n} (Y_{i}^{obs} - Y_{i}^{sim})}{\sum_{i=1}^{n} Y_{i}^{obs}} \times 100\% \qquad (3)

R^{2} = \left( \frac{\sum_{i=1}^{n} (Y_{i}^{obs} - \bar{Y}^{obs})(Y_{i}^{sim} - \bar{Y}^{sim})}{\sqrt{\sum_{i=1}^{n} (Y_{i}^{obs} - \bar{Y}^{obs})^{2} \sum_{i=1}^{n} (Y_{i}^{sim} - \bar{Y}^{sim})^{2}}} \right)^{2} \qquad (4)

where Y_i^obs and Y_i^sim represent the i-th observed and simulated values, respectively, Ȳ^obs and Ȳ^sim represent the mean observed and simulated values, respectively, and n represents the total number of values in the series.
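Eqs. (2)–(4) translate directly into code. The following plain-Python sketch, with hypothetical sample values, can be used to check scores of this kind. With the sign convention of Eq. (3), a positive PBIAS indicates that the model underestimates the observations on average.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Eq. (2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(obs, sim):
    """Percent bias, Eq. (3); positive means underestimation on average."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Coefficient of determination, Eq. (4): squared Pearson correlation."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
print(round(nse(obs, sim), 3), round(pbias(obs, sim), 2), round(r_squared(obs, sim), 4))
# 0.948 0.0 0.9507
```

The example shows why the three metrics are complementary: the over- and underestimates cancel, so PBIAS is zero even though NSE and R² are below 1.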

3. Results and Discussion

3.1. Results

Table 4 presents the statistical indicators of the prediction results of SWAT, Transformer, and SWAT-Transformer for the training and testing periods. There were significant differences in accuracy among the three models. The proposed SWAT-Transformer model showed satisfactory prediction performance and outperformed the other two models in both cases (Yan’an and Ganguyi stations).

For Yan’an station during the testing period, SWAT showed some bias and relatively low prediction accuracy (NSE = 0.65, PBIAS = 35.01%, and R² = 0.70). Transformer was more accurate than SWAT, with NSE and R² increasing by 9.2% and 7.1%, respectively, and PBIAS decreasing by 40.0%. SWAT-Transformer performed best, with an NSE of 0.84, a PBIAS of 10.11%, and an R² of 0.86. Compared with Transformer, the coupled model further increased NSE and R² by 18.3% and 14.7%, respectively, and decreased PBIAS by 51.9%.

For the Ganguyi station during the testing period, the performance of SWAT was also not ideal in terms of NSE (0.66), PBIAS (37.01%), and R² (0.71). Transformer significantly improved prediction accuracy compared to SWAT, with NSE and R² increasing by 7.6% and 7.1%, respectively, while PBIAS decreased by 73.4%. SWAT-Transformer performed very well at Ganguyi station, with NSE and R² increasing to 0.88 and 0.89, respectively, while PBIAS decreased to 2.61%. Compared to Transformer, the coupled model also showed significant improvement, with NSE and R² increasing by 24.2% and 8.5%, respectively, while PBIAS decreased by 73.4%.
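The percentage changes quoted in this section appear to be relative changes, (new − old)/|old| × 100%. The helper below applies that formula to the only pairs of values stated explicitly in the text, the SWAT and SWAT-Transformer testing-period scores. It is a reader’s cross-check under that assumption, not the authors’ calculation.

```python
def relative_change(old, new):
    """Percentage change from `old` to `new`, relative to |old|."""
    return 100.0 * (new - old) / abs(old)


# Testing-period values stated in the text: (SWAT, SWAT-Transformer).
reported = {
    "Yan'an NSE": (0.65, 0.84),
    "Yan'an PBIAS": (35.01, 10.11),
    "Yan'an R2": (0.70, 0.86),
    "Ganguyi NSE": (0.66, 0.88),
    "Ganguyi PBIAS": (37.01, 2.61),
    "Ganguyi R2": (0.71, 0.89),
}
for name, (swat, coupled) in reported.items():
    print(f"{name}: {relative_change(swat, coupled):+.1f}%")
```

This prints +29.2%, -71.1%, and +22.9% for Yan’an and +33.3%, -92.9%, and +25.4% for Ganguyi; negative values for PBIAS correspond to the reported reductions in bias.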

While the statistical indicators above provide a quantitative basis for evaluating the three models, hydrographs and scatterplots also help assess the consistency between predicted and observed values. To compare model performance visually, we plotted hydrographs and scatterplots of predicted and observed values for the training and testing periods, which allowed a clearer view of the differences among the models, as shown in Figures 4 and 5.

For Yan’an station (Figure 4), the hydrographs showed that the predicted streamflow of all three models generally followed the trend of the observed streamflow, but with differences in accuracy. SWAT performed poorly during low-flow periods, as evidenced by the scatterplots, where the predicted values tended to fall below the 45-degree line in the low-value range. Transformer performed better overall than SWAT but significantly underestimated high values. Both the hydrographs and scatterplots indicated that SWAT-Transformer performed best among the three models, consistent with the statistical indicators above.

For Ganguyi station (Figure 5), the three models performed similarly to Yan’an station, and SWAT-Transformer again performed best. In the scatterplots of Figure 5f, the points of the three models at Ganguyi station lay closer to the 45-degree reference line and were less dispersed than at Yan’an station, indicating that all three models performed better at Ganguyi station. This may be because Ganguyi station is located downstream in the watershed and collects flow from the upstream and midstream reaches; therefore, the streamflow conditions are more stable and the streamflow response to precipitation events is more pronounced [42].

To assess the robustness of the three models, Figure 6 presents their error distributions as boxplots, from which the skewness of the errors was analyzed. At Yan’an station, the prediction errors of the three models were generally within ±10 m³/s. In most cases, SWAT had larger errors than the other two models; it tended to underestimate the observed values and had the widest error distribution and the most outliers. The errors of Transformer were generally smaller and more concentrated than those of SWAT, but there were still a number of outliers, especially under high-flow conditions. SWAT-Transformer outperformed both standalone models, with the smallest prediction errors and the most concentrated error distribution. The errors at Ganguyi station fluctuated somewhat more than those at Yan’an station, but the overall pattern was similar, and the coupled model again performed best.
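Assuming Figure 6 uses the conventional Tukey boxplot (whiskers at 1.5 times the interquartile range, points beyond them drawn as outliers), the summary behind each box can be computed as below. The error values are hypothetical, and the quartile method (`"inclusive"` in Python’s `statistics.quantiles`) is one of several conventions that plotting libraries use.

```python
import statistics


def box_stats(errors, whis=1.5):
    """Quartiles, whisker ends, and outliers of a prediction-error sample."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [e for e in errors if lo_fence <= e <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [e for e in errors if e < lo_fence or e > hi_fence],
    }


# Hypothetical monthly errors (m^3/s) with one large high-flow miss.
stats = box_stats([-3.0, -1.0, 0.0, 1.0, 2.0, 3.0, 25.0])
```

Here the box spans -0.5 to 2.5 with a median of 1.0, the whiskers end at -3.0 and 3.0, and the 25.0 error is flagged as an outlier, the kind of isolated high-flow miss described above for Transformer.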

Overall, according to the performance measures at both stations, the NSE of the three models was generally close to or above 0.7, PBIAS was generally within 30%, and R² was generally above 0.7. The agreement between predicted and observed values suggests that all three models are applicable to monthly flow prediction. Transformer outperformed SWAT with smaller errors at both stations, as confirmed by the NSE, R², and PBIAS values. Moreover, SWAT-Transformer had the highest NSE and R² values (higher than 0.80 and 0.85, respectively) and the lowest PBIAS values at both stations, indicating that the coupled model has significant advantages in handling complex hydrological processes.

According to the hydrographs and scatterplots, SWAT exhibited significant bias in low-flow predictions and generally underestimated them, yet it was more accurate for high flows. Transformer generally outperformed SWAT but tended to underestimate high flows, and its scatterplots were more dispersed in the high-flow range, indicating difficulty in capturing high-flow characteristics. Overall, the coupled model combined the strengths of both SWAT and Transformer and significantly enhanced prediction accuracy. The performance of the three models across different flow levels is explored further in the following discussion.

3.2. Discussion

Reliable prediction of extreme flows (low and high flows) is crucial for water resource management and timely warnings, such as in the case of droughts and floods. In this study, high flows are defined as flows one-third above the average flow, while low flows are defined as flows falling below one-third of the average flow [43]. The statistical metrics for high- and low-flow predictions during the testing period are shown in Figure 7.
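The flow-level metrics can be reproduced by computing NSE separately on the high- and low-flow subsets. The sketch below assumes one literal reading of the definition above (high flow: observed flow more than one-third above the mean, i.e., above 4/3 of the mean; low flow: observed flow below one-third of the mean) and selects months by observed, not simulated, flow. Both choices, and the sample data, are assumptions.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Eq. (2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def regime_nse(obs, sim, high_factor=4 / 3, low_factor=1 / 3):
    """NSE computed separately on high- and low-flow months.

    Assumed reading of the definition: high flow if the observed flow exceeds
    high_factor * mean, low flow if it is below low_factor * mean.
    """
    mean_q = sum(obs) / len(obs)
    high = [(o, s) for o, s in zip(obs, sim) if o > high_factor * mean_q]
    low = [(o, s) for o, s in zip(obs, sim) if o < low_factor * mean_q]
    result = {}
    for name, pairs in (("high", high), ("low", low)):
        if len(pairs) < 2:
            result[name] = float("nan")  # too few months to compute NSE
            continue
        o_sub = [o for o, _ in pairs]
        s_sub = [s for _, s in pairs]
        result[name] = nse(o_sub, s_sub)
    return result


obs = [1.0, 3.0, 10.0, 10.0, 10.0, 10.0, 30.0, 46.0]  # mean = 15
sim = [2.0, 2.0, 9.0, 11.0, 10.0, 10.0, 28.0, 44.0]
r = regime_nse(obs, sim)
```

Here the high-flow NSE is 0.9375, while the low-flow NSE is 0.0 even though every low-flow error is only 1. Because NSE is normalized by the variance of the subset, and low flows vary little, small absolute errors can yield low or negative low-flow NSE values.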

In high-flow predictions, SWAT performed well: its NSE at Yan’an and Ganguyi stations reached 0.78 and 0.74, which were 30% and 15% higher than those of Transformer, respectively. This may be because SWAT relies on the SCS Curve Number method to simulate the rainfall–runoff process. This method is sensitive to the runoff response after rainfall events, particularly heavy rainfall, and effectively simulates the rapid conversion of rainfall into runoff, which allows SWAT to predict high-flow events more accurately. In addition, existing research has shown that machine learning models tend to underfit extreme flow events [44,45], especially when the training data contain too few high-flow samples.
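For reference, the SCS Curve Number relation used in SWAT’s surface runoff calculation is Q_surf = (R_day − I_a)² / (R_day − I_a + S), with retention parameter S = 25.4(1000/CN − 10) in mm and initial abstraction I_a = 0.2S; runoff is zero when R_day ≤ I_a. The sketch below evaluates it for single days with a hypothetical curve number of 75. It ignores SWAT’s daily adjustment of the curve number for soil moisture, so it only shows the shape of the response.

```python
def scs_cn_runoff(rain_mm, cn, ia_ratio=0.2):
    """Daily surface runoff (mm) from the SCS Curve Number equation."""
    s = 25.4 * (1000.0 / cn - 10.0)  # retention parameter S (mm)
    ia = ia_ratio * s  # initial abstraction I_a (mm)
    if rain_mm <= ia:
        return 0.0  # all rainfall absorbed before runoff starts
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


for rain in (10.0, 20.0, 50.0, 100.0):
    print(rain, round(scs_cn_runoff(rain, cn=75), 2))
```

With CN = 75, S ≈ 84.7 mm and I_a ≈ 16.9 mm, so a 10 mm day produces no runoff, 20 mm gives about 0.1 mm, 50 mm about 9.3 mm, and 100 mm about 41.1 mm. This threshold-then-steep response is consistent with the method reproducing heavy-rainfall runoff well while responding weakly to small events.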

In low-flow predictions, SWAT performed poorly at both Yan’an and Ganguyi stations, with NSE values of −0.11 and −0.08, respectively. In contrast, Transformer significantly outperformed SWAT, with NSE values of 0.28 and 0.31, respectively. The poor performance of SWAT is likely due to its insensitivity to surface runoff responses during small precipitation events. The study area is located in an arid and semi-arid region, where most dry-season precipitation is absorbed by the soil or replenishes groundwater, making it difficult for SWAT to capture low-flow variations. In addition, SWAT calibration mainly focuses on flow peaks and gives insufficient weight to low-flow periods [21,23].

SWAT-Transformer showed the best accuracy in both high- and low-flow predictions. At Yan’an station, its NSE was 15% higher than SWAT in high-flow prediction and 64% higher than Transformer in low-flow prediction. At Ganguyi station, its NSE was 19% higher than SWAT in high-flow prediction and 74% higher than Transformer in low-flow prediction. By compensating for the shortcomings of the standalone models, the coupled model can capture the characteristics of different flow events and improve the overall prediction performance.

3.3. Simulation Uncertainties

In this study, the proposed SWAT-Transformer model improved performance for low flows and for the overall series; however, its general applicability has not yet been established. The simulation uncertainties associated with SWAT-Transformer may stem from three primary sources. First, the limitations of SWAT relate primarily to model parameters, model structure, and meteorological inputs. Parameters and structure were discussed above; regarding input data, the limited number of meteorological stations in the study area may make the spatial interpolation of meteorological predictors relatively uncertain, which may increase SWAT’s simulation errors. Second, the uncertainties in Transformer arise mainly from parameter estimation, nonlinearities in long-term dependencies, and overfitting. Although an early stopping strategy was implemented in this study to mitigate overfitting, future research could explore parameter optimization and extend the study period to verify the model’s predictive capability over longer sequences. Finally, the impact of human activities on natural runoff is also a crucial factor. Activities such as ecological water replenishment and river water extraction complicate the restoration of natural runoff sequences and alter the precipitation–runoff relationship. Future research could integrate multi-source observational data (e.g., remote sensing) to improve the input accuracy of SWAT, optimize the architecture and parameter settings of Transformer, and investigate the specific effects of human activities on hydrological processes in the watershed to better understand the role of these factors in simulations.

4. Conclusions

The nonlinear and non-stationary nature of streamflow makes prediction a consistently challenging scientific task. Standalone models such as SWAT (physics-based) and Transformer (data-driven) each have specific applications, strengths, and limitations, while coupled modeling effectively enhances hydrological prediction accuracy. In this study, we developed and evaluated a coupled model (SWAT-Transformer) based on SWAT and Transformer. This model was built on an error correction framework, where SWAT’s output and other relevant variables were used as inputs to Transformer.

The proposed model was applied to predict monthly streamflow at Yan’an and Ganguyi stations on the Yan River. The results showed the following. (1) The NSE of all three models was generally close to or above 0.7, PBIAS was generally within 30%, and R² was generally above 0.7. In addition, the predicted values followed the trends of the observed values, indicating that these models are suitable for monthly flow prediction. (2) SWAT and Transformer each have strengths and weaknesses in streamflow prediction: SWAT performed better in high-flow predictions, while Transformer excelled in low-flow scenarios. (3) SWAT-Transformer significantly outperformed both standalone SWAT and Transformer. The coupled model effectively integrated SWAT’s physical process mechanisms with Transformer’s data processing capabilities, providing an effective approach to enhancing the accuracy and reliability of predictions.

Author Contributions

J.T. and Y.G. conceived and designed the study; J.T., J.C., X.Y. and T.A. collected the data and carried out the investigation; J.T. analyzed the data; J.T. wrote the paper, with the assistance of J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (52394234), the Basic Research Foundation of National Public Research Institutes of China (No. Y523003), and the National Key Research and Development Program of China (2023YFC3206804).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Locations of Yan’an and Ganguyi.

Figure 2. Structure of Transformer for streamflow prediction.

Figure 3. Flowchart of the SWAT-Transformer predicting model.

Figure 4. Hydrographs and scatterplots comparing observed and predicted streamflow at Yan’an station: (a,b) SWAT results, (c,d) Transformer results, and (e,f) SWAT-Transformer results.

Figure 5. Hydrographs and scatterplots comparing observed and predicted streamflow at Ganguyi station: (a,b) SWAT results, (c,d) Transformer results, and (e,f) SWAT-Transformer results.

Figure 6. Distribution of the prediction errors of SWAT, Transformer, and SWAT-Transformer during the testing period: (a,c) Yan’an station; (b,d) Ganguyi station.

Figure 7. Performance measures of SWAT, Transformer, and SWAT-Transformer for predicting low and high flows during the testing period: (a,b) high-flow prediction results for Yan’an and Ganguyi stations; (c,d) low-flow prediction results for Yan’an and Ganguyi stations.
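Figure 7 evaluates the models separately for high and low flows. The article’s exact definition of the two regimes is not given in this part of the text, so the sketch below assumes, purely for illustration, that months with observed flow at or above the median count as high flows and the rest as low flows, and then computes NSE within each subset. The median threshold, the toy data, and the function names `split_by_regime` and `nse_pairs` are hypothetical.

```python
from math import fsum
from statistics import median
from typing import List, Sequence, Tuple

Pairs = List[Tuple[float, float]]


def split_by_regime(
    obs: Sequence[float], sim: Sequence[float], threshold: float
) -> Tuple[Pairs, Pairs]:
    """Split (observed, simulated) pairs into high (obs >= threshold) and low flows."""
    pairs = list(zip(obs, sim))
    high = [(o, s) for o, s in pairs if o >= threshold]
    low = [(o, s) for o, s in pairs if o < threshold]
    return high, low


def nse_pairs(pairs: Pairs) -> float:
    """NSE computed within one subset, using that subset's own observed mean."""
    obs = [o for o, _ in pairs]
    mean_obs = fsum(obs) / len(obs)
    sse = fsum((o - s) ** 2 for o, s in pairs)
    sst = fsum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


if __name__ == "__main__":
    observed = [0.1, 0.2, 0.3, 5.0, 8.0, 12.0]  # toy monthly flows (m³/s)
    simulated = [0.5, 0.1, 0.4, 5.5, 7.0, 12.5]
    high, low = split_by_regime(observed, simulated, median(observed))
    print(round(nse_pairs(high), 3))  # 0.939
    print(round(nse_pairs(low), 3))   # -8.0
```

In this toy example the high-flow NSE is about 0.94, while the low-flow NSE is −8.0 even though the absolute low-flow errors are small. Because NSE is normalised by the spread of the subset being scored, low-flow months with little variability are penalised heavily, which is one reason low-flow skill is worth reporting separately.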

Table 1. Summary of the basic statistics of streamflow data at Yan’an and Ganguyi stations.

Station	Time Period	Units	Mean	Minimum	Maximum
Yan’an	1981–2020	m³/s	3.21	0.04	43.30
Ganguyi	1981–2020	m³/s	5.67	0.30	69.96

Table 2. Summary of the basic statistics of meteorological data.

Station	Location	Maximum Temperature (°C)	Minimum Temperature (°C)	Annual Precipitation (mm)
Ansai	36°53′ N; 109°19′ E	40.1	−18.8	454.7
Jingbian	37°37′ N; 108°48′ E	38.5	−20.1	388.1
Suide	37°55′ N; 108°10′ E	38.4	−17.7	438.5
Wuqi	36°30′ N; 110°13′ E	38.0	−20.8	459.8
Xixian	36°42′ N; 110°57′ E	38.1	−17.4	446.7
Zhidan	36°46′ N; 108°46′ E	38.3	−19.6	498.3
Zichang	37°11′ N; 109°42′ E	36.4	−17.8	513.5

Table 3. SWAT parameters calibrated with SWAT-CUP: variable names, initial ranges, optimal values, and definitions.

Number	Variable Name	Initial Range	Optimal Value	Definition
1	r__CN2.mgt	(−0.25, 0.25)	0.13	Moisture condition II curve number
2	r__SOL_K().sol	(−0.25, 0.25)	0.21	Saturated hydraulic conductivity (mm/h)
3	r__SOL_AWC().sol	(−0.25, 0.25)	0.24	Available water capacity of the soil layer (mm H₂O/mm soil)
4	r__SOL_BD().sol	(−0.25, 0.25)	−0.18	Moist bulk density (g/cm³)
5	v__SFTMP.bsn	(−5, 5)	−4.51	Snowfall temperature threshold (°C)
6	v__ALPHA_BF.gw	(0, 1)	0.45	Baseflow alpha factor (1/day)
7	v__GW_DELAY.gw	(30, 450)	434.67	Groundwater delay time (days)
8	v__GWQMN.gw	(100, 300)	1283.80	Threshold depth of water in the shallow aquifer required for return flow (mm H₂O)
9	v__ESCO.hru	(0, 1)	0.30	Soil evaporation compensation factor
10	v__EPCO.hru	(0, 1)	0.82	Plant uptake compensation factor
11	v__CH_N2.rte	(0.01, 3)	0.03	Manning’s n value for the main channel
12	v__CH_K2.rte	(0.01, 500)	158.26	Effective hydraulic conductivity in main channel alluvium (mm/h)
13	v__ALPHA_BNK.rte	(0.3, 1)	0.86	Baseflow alpha factor for bank storage
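The `r__` and `v__` prefixes in Table 3 follow SWAT-CUP’s parameter-change convention: `v__` replaces the default value, `a__` adds the fitted value to it, and `r__` multiplies it by (1 + fitted value). The sketch below applies that convention to a single value; the function name `apply_swatcup_change` and the default CN2 of 75 used in the example are hypothetical.

```python
def apply_swatcup_change(name: str, default: float, fitted: float) -> float:
    """Apply a SWAT-CUP-style change to one parameter value.

    The prefix of `name` selects the change type:
      v__  replace the default with `fitted`
      a__  add `fitted` to the default
      r__  relative change: default * (1 + fitted)
    """
    if name.startswith("v__"):
        return fitted
    if name.startswith("a__"):
        return default + fitted
    if name.startswith("r__"):
        return default * (1.0 + fitted)
    raise ValueError(f"unrecognised change prefix in {name!r}")


if __name__ == "__main__":
    # Hypothetical default CN2 of 75 for one HRU, with the fitted value from Table 3.
    print(apply_swatcup_change("r__CN2.mgt", 75.0, 0.13))   # about 84.75
    print(apply_swatcup_change("v__ESCO.hru", 0.95, 0.30))  # 0.3
```

Relative changes are typically used for spatially distributed parameters such as CN2 and the soil properties, because they shift every HRU’s value by the same proportion and so preserve the original spatial pattern, whereas replacement suits basin-wide or groundwater parameters.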

Table 4. Performance measures of SWAT, Transformer, and SWAT-Transformer for streamflow prediction during the training and testing periods.

Station	Models	Training NSE (%)	Training PBIAS (%)	Training R²	Testing NSE (%)	Testing PBIAS (%)	Testing R²
Yan’an	SWAT	73	21.58	0.69	65	35.01	0.70
	Transformer	75	19.94	0.79	71	21.02	0.75
	SWAT-Transformer	88	17.16	0.89	84	10.11	0.86
Ganguyi	SWAT	70	30.64	0.74	66	37.01	0.71
	Transformer	82	21.30	0.85	71	9.82	0.82
	SWAT-Transformer	90	0.30	0.91	88	2.61	0.89


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tao, J.; Gu, Y.; Yin, X.; Chen, J.; Ao, T.; Zhang, J. Coupling SWAT and Transformer Models for Enhanced Monthly Streamflow Prediction. Sustainability 2024, 16, 8699. https://doi.org/10.3390/su16198699

