Article

A DeepAR-Based Modeling Framework for Probabilistic Mid–Long-Term Streamflow Prediction

1
Water Resources Department, Changjiang River Scientific Research Institute, Wuhan 430010, China
2
Hubei Key Laboratory of Water Resources & Eco-Environmental Sciences, Changjiang River Scientific Research Institute, Wuhan 430010, China
3
Research Center on the Yangtze River Economic Belt Protection and Development Strategy, Wuhan 430010, China
4
Hubei Key Laboratory of Intelligent Yangtze and Hydroelectric Science, China Yangtze Power Co., Ltd., Yichang 443300, China
*
Authors to whom correspondence should be addressed.
Water 2025, 17(17), 2506; https://doi.org/10.3390/w17172506
Submission received: 14 July 2025 / Revised: 4 August 2025 / Accepted: 21 August 2025 / Published: 22 August 2025
(This article belongs to the Section Hydrology)

Abstract

Mid–long-term streamflow prediction (MLSP) plays a critical role in water resource planning amid growing hydroclimatic and anthropogenic uncertainties. Although AI-based models have demonstrated strong performance in MLSP, their capacity to quantify predictive uncertainty remains limited. To address this challenge, a DeepAR-based probabilistic modeling framework is developed, enabling direct estimation of streamflow distribution parameters and flexible selection of output distributions. The framework is applied to two case studies with distinct hydrological characteristics, where combinations of recurrent model structures (GRU and LSTM) and output distributions (Normal, Student’s t, and Gamma) are systematically evaluated. The results indicate that the choice of output distribution is the most critical factor for predictive performance. Models using the Gamma distribution consistently outperform those using the Normal and Student’s t distributions, owing to the Gamma distribution’s ability to better capture the skewed, non-negative nature of streamflow data. Notably, the magnitude of performance gain from using the Gamma distribution is itself region-dependent, proving more significant in the basin with higher streamflow skewness. For instance, in the more skewed Upper Wudongde Reservoir area, the model using the LSTM structure and Gamma distribution reduces RMSE by over 27% compared to its Normal-distribution counterpart (from 1407.77 m3/s to 1016.54 m3/s). Furthermore, the Gamma-based models yield superior probabilistic forecasts, achieving not only lower CRPS values but also a more effective balance between high reliability (PICP) and forecast sharpness (MPIW). In contrast, the relative performance between GRU and LSTM architectures is found to be less significant and inconsistent across the different basins. These findings highlight that the DeepAR-based framework delivers consistent enhancement in forecasting accuracy by prioritizing the selection of a physically plausible output distribution, thereby providing stronger and more reliable support for practical applications.

1. Introduction

The combined impacts of climate change and human activities are profoundly altering the processes of runoff generation and confluence, resulting in increasing uncertainty in the evolution of water resources [1,2,3,4]. At the same time, socio-economic drivers such as population growth and urbanization are intensifying the demand for water supply [5,6,7,8]. Against this backdrop, mid–long-term streamflow prediction (MLSP), which refers to forecasts at ten-day, monthly, seasonal, and annual scales with lead times ranging from three days to one year, has become increasingly vital for effective water resource management and integrated utilization, as it provides valuable insights into future runoff patterns [9,10,11,12,13,14]. Consequently, MLSP is attracting growing attention in both research and practical applications [11,12,13,14].
Many models have been developed and applied in MLSP to improve the predictive performance and provide information for the comprehensive utilization of water resources [11,12,13,14,15,16,17,18]. These models can be broadly divided into physical-based models, which simulate streamflow based on runoff generation and confluence equations, and data-driven models, which directly simulate the relationship between streamflow and predictors including precipitation, temperature, and other factors [13,14,19,20,21,22]. Along with the development of artificial intelligence (AI) methods, AI-based data-driven models, including support vector regression (SVR), artificial neural network (ANN), gated recurrent unit neural network (GRU), long short-term memory network (LSTM), and others, can obtain better predictive performance than traditional models and have become predominant in MLSP [10,20,23,24,25,26,27,28]. For instance, many studies have applied SVR models in MLSP, and the results demonstrate that SVR models can generate more accurate forecasts than linear models and ANN models [29,30,31,32,33]. Xie et al. (2024) compared five AI-based models, and the results demonstrate that the LSTM model outperformed the other models in forecasting monthly streamflow in 37 basins [14]. In addition, some hybrid models, which incorporate preprocessing methods into the AI methods, have been developed to integrate the advantages of different base models and have obtained better forecasting skill [19,34,35,36,37].
The proposed AI-based models in MLSP demonstrate significant improvements in predictive performance, but their limited capacity to characterize future water resource uncertainties constrains practical applications [38,39,40,41]. To overcome this issue, many post-processing methods are adopted to produce ensemble forecasting results capable of characterizing predictive uncertainties [40,41,42,43]. For example, Liang et al. (2018) proposes the hydrological uncertainty processor to post-process the deterministic outputs from the SVR model to quantify prediction uncertainties [39]. Mo et al. (2023) applies the generalized autoregressive conditional heteroskedasticity model to identify time-varying forecasting errors to improve the predictive performance [44].
Although the post-processing approach in MLSP demonstrates predictive capability, it exhibits two fundamental limitations: (1) an inability to directly generate probability distributions and (2) a failure to preserve the inherent statistical characteristics of streamflow. To overcome these constraints in MLSP, this study adopts a DeepAR-based modeling framework. The DeepAR architecture exhibits two key capabilities: (1) direct prediction of the predictand’s distribution parameters and (2) flexible selection of probability distributions for the target variable [45,46]. It has been applied in many studies, and empirical validation across five time series datasets demonstrates its superior performance over other existing state-of-the-art methods [45,47,48,49]. However, DeepAR’s robustness under heavy-tailed streamflow distributions requires further verification, and the selection criteria for appropriate probability distributions lack systematic guidance. Therefore, the objectives of this study are (1) to develop a DeepAR-based probabilistic forecasting framework for MLSP, (2) to validate the modeling framework’s applicability through implementation in two case studies, and (3) to systematically evaluate the impacts of base model architecture and distribution type selection on predictive skill.

2. Materials and Methods

2.1. DeepAR Model

DeepAR is a probabilistic forecasting framework developed by Amazon Research, which combines recurrent neural networks (RNNs) with parametric probability distributions to generate time-series predictions [45]. Unlike traditional point-forecasting models, DeepAR directly outputs the parameters of user-specified distributions (e.g., Gaussian for real-valued data, Negative Binomial for positive count data, and Beta for data in the unit interval), enabling native uncertainty quantification.
Let $z_t$ denote the streamflow value at time t, and $x_t$ represent the vector of predictor variables at time t. The DeepAR model estimates the conditional distribution of future streamflow values:
$$P\left(z_{t_0+1:t_0+L} \mid z_{t_0-H+1:t_0},\, x_{t_0-H+1:t_0+L}\right) \quad (1)$$
where $t_0$ is the forecast initialization time, H is the length of the conditioning (historical) window, and L is the prediction length.
For a trained DeepAR model, the distribution parameters $\theta_t$ at time t are computed as a function of the hidden state $h_t$ and model parameters $\Theta$:
$$\theta_t = f(h_t, \Theta) \quad (2)$$
where $f(\cdot)$ is a function used to map the hidden state to distribution parameters, and the hidden state $h_t$ evolves recursively via
$$h_t = g(h_{t-1}, z_{t-1}, x_t, \Theta) \quad (3)$$
where $g(\cdot)$ is a nonlinear transition function implemented as a multi-layer RNN (LSTM or GRU) parametrized by $\Theta$, and $h_{t-1}$ is the hidden state from the previous time step.
Then, the simulated or forecasted streamflow value at time t can be sampled by
$$\tilde{z}_t \sim p\left(\cdot \mid f(h_t, \Theta)\right) \quad (4)$$

2.1.1. Training

Given a streamflow time series $\{z_t\}_{t=1,2,\dots,T}$ and associated predictor variables $x_t$, the DeepAR model parameters $\Theta$, including the parameters of both $f(\cdot)$ and $g(\cdot)$, can be learned by maximizing the log-likelihood, as shown below:
$$\mathcal{L} = \sum_{t_0} \sum_{t=t_0+1}^{t_0+L} \log p\left(z_t \mid f(h_t, \Theta)\right) \quad (5)$$
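To make the training objective concrete, a minimal PyTorch-style sketch is given below. The class name, layer sizes, feature layout, and the use of a Gaussian output head are illustrative assumptions of this sketch rather than the authors' exact implementation; the Gamma or Student's t heads introduced later can be substituted in the same place.

```python
# Minimal sketch of the DeepAR training objective in Eq. (5).
# Network sizes, feature layout, and the Gaussian head are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepARNet(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 40, num_layers: int = 2):
        super().__init__()
        # g(.): recurrent transition over per-step inputs [z_{t-1}, x_t]
        self.rnn = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        # f(.): projection from hidden state to distribution parameters
        self.mu_head = nn.Linear(hidden_size, 1)
        self.sigma_head = nn.Linear(hidden_size, 1)

    def forward(self, inputs):
        # inputs: (batch, time, n_features)
        h, _ = self.rnn(inputs)
        mu = self.mu_head(h).squeeze(-1)
        sigma = F.softplus(self.sigma_head(h)).squeeze(-1)  # enforce sigma > 0
        return mu, sigma

def negative_log_likelihood(net, inputs, targets):
    """Negative of Eq. (5): summed log-density of the observed z_t under the
    predicted distribution, to be minimized with a gradient optimizer."""
    mu, sigma = net(inputs)
    dist = torch.distributions.Normal(mu, sigma)
    return -dist.log_prob(targets).sum()

# usage with illustrative shapes: 32 windows, 36 ten-day steps, 4 features
net = DeepARNet(n_features=4)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
inputs, targets = torch.randn(32, 36, 4), torch.randn(32, 36)
optimizer.zero_grad()
loss = negative_log_likelihood(net, inputs, targets)
loss.backward()
optimizer.step()
```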

2.1.2. Prediction

Given an observed streamflow sequence $z_{t_0-H+1:t_0}$ and corresponding predictor variables $x_{t_0-H+1:t_0+L}$, the trained DeepAR model generates probabilistic forecasts for future streamflow values $z_{t_0+1:t_0+L}$ through the following procedure (a code sketch of the sampling loop follows the list):
(1)
The hidden state $h_{t_0}$ is obtained by recursively processing the historical streamflow $z_{t_0-H+1:t_0}$ and predictors $x_{t_0-H+1:t_0}$ through the RNN transition function in Equation (3);
(2)
Initial conditions are set as $\tilde{h}_{t_0} = h_{t_0}$ and $\tilde{z}_{t_0} = z_{t_0}$;
(3)
For each subsequent time step $t = t_0+1$ to $t_0+L$, the hidden state $\tilde{h}_t$ is updated using $g(\tilde{h}_{t-1}, \tilde{z}_{t-1}, x_t, \Theta)$, and the forecast $\tilde{z}_t$ is sampled from $p(\cdot \mid f(\tilde{h}_t, \Theta))$;
(4)
Step (3) is repeated N times to produce ensemble forecasts $\{\tilde{z}_{i,\,t_0+1:t_0+L}\}_{i=1,2,\dots,N}$, providing a Monte Carlo approximation of the predictive distribution.
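The numbered procedure above corresponds to an ancestral-sampling loop, sketched below under stated assumptions: it reuses the hypothetical DeepARNet from the training sketch, assumes each decoder input step holds [z_{t-1}, x_t], and passes the last observation z_{t0} explicitly.

```python
# Illustrative ancestral sampling implementing steps (1)-(4); not the authors'
# exact code. net is the hypothetical DeepARNet from the training sketch.
import torch

@torch.no_grad()
def sample_forecast(net, history, last_obs, future_x, n_samples: int = 100):
    """history: (T_hist, n_features) encoder inputs ending at t0.
    last_obs: scalar tensor holding z_{t0}.
    future_x: (L, n_features - 1) known covariates for t0+1 .. t0+L.
    Returns an ensemble of shape (n_samples, L)."""
    L = future_x.shape[0]
    # step (1): run the RNN over the history window to obtain the state at t0
    _, enc_state = net.rnn(history.unsqueeze(0))
    samples = torch.zeros(n_samples, L)
    for i in range(n_samples):
        state, z_prev = enc_state, last_obs      # step (2): initial conditions
        for t in range(L):
            # step (3): update the state with [z_{t-1}, x_t], then sample z_t
            step_in = torch.cat([z_prev.view(1), future_x[t]]).view(1, 1, -1)
            h, state = net.rnn(step_in, state)
            mu = net.mu_head(h).squeeze()
            sigma = torch.nn.functional.softplus(net.sigma_head(h)).squeeze()
            z_prev = torch.distributions.Normal(mu, sigma).sample()
            samples[i, t] = z_prev
    # step (4): the n_samples trajectories approximate the predictive distribution
    return samples
```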

2.1.3. Likelihood Model

The likelihood $p(\cdot \mid f(h_t, \Theta))$, which defines the target distribution, must be carefully selected to match the statistical characteristics of the predictand. While Salinas et al. (2020) [45] recommended Gaussian distributions for real-valued data and Negative Binomial distributions for positive count data, these choices prove suboptimal for streamflow forecasting due to (1) the heavy-tailed characteristics of streamflow and (2) the strictly positive nature of streamflow values. Therefore, to better characterize the statistical properties of streamflow, this study employs the Gamma distribution. This choice is substantiated by its strong physical basis, as the Gamma distribution can be derived from conceptual watershed runoff models, distinguishing it from alternative distributions, which are applied primarily for their statistical fitting capabilities [50]. Its proven ability to accurately represent diverse streamflow regimes has also led to its widespread acceptance and recommendation in hydrological frequency analysis [51]. In addition to the physically grounded Gamma distribution, this study also employs Student’s t-distribution, which is suitable for extreme events, and the conventional Gaussian distribution, which has been widely applied.
For the Student’s t-distribution, the likelihood and the mapping function $f(\cdot)$ are as below:
$$p_{ST}(z \mid \mu, \sigma, \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}\,\sigma}\left(1 + \frac{(z-\mu)^2}{\nu\sigma^2}\right)^{-\frac{\nu+1}{2}}, \quad \mu(h_t) = w_{\mu}^{T} h_t + b_{\mu}, \quad \sigma(h_t) = \mathrm{softplus}\left(w_{\sigma}^{T} h_t + b_{\sigma}\right), \quad \nu(h_t) = 2 + \mathrm{softplus}\left(w_{\nu}^{T} h_t + b_{\nu}\right) \quad (6)$$
where the softplus activation function $\mathrm{softplus}(x) = \log(1 + e^{x})$ is applied to enforce positivity constraints on the distribution parameters.
For the Gamma distribution, the likelihood and the mapping function $f(\cdot)$ are as below:
$$p_{Gamma}(z \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, z^{\alpha-1} e^{-\beta z}, \quad \alpha(h_t) = \mathrm{softplus}\left(w_{\alpha}^{T} h_t + b_{\alpha}\right), \quad \beta(h_t) = \mathrm{softplus}\left(w_{\beta}^{T} h_t + b_{\beta}\right) \quad (7)$$
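As a concrete illustration of Equations (6) and (7), the sketch below maps a hidden state to distribution objects with affine layers followed by softplus transforms. The class and layer names are hypothetical and only mirror the parameterizations above.

```python
# Output heads corresponding to Eqs. (6) and (7); illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GammaHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.alpha = nn.Linear(hidden_size, 1)   # shape parameter alpha
        self.beta = nn.Linear(hidden_size, 1)    # rate parameter beta

    def forward(self, h):
        alpha = F.softplus(self.alpha(h))
        beta = F.softplus(self.beta(h))
        # torch parametrizes Gamma by concentration (alpha) and rate (beta)
        return torch.distributions.Gamma(alpha, beta)

class StudentTHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.mu = nn.Linear(hidden_size, 1)
        self.sigma = nn.Linear(hidden_size, 1)
        self.nu = nn.Linear(hidden_size, 1)

    def forward(self, h):
        mu = self.mu(h)
        sigma = F.softplus(self.sigma(h))
        nu = 2.0 + F.softplus(self.nu(h))        # nu > 2 keeps the variance finite
        return torch.distributions.StudentT(nu, mu, sigma)

# usage: log-likelihood of an observed flow value under the Gamma head
h_t = torch.randn(1, 40)
dist = GammaHead(40)(h_t)
log_p = dist.log_prob(torch.tensor([[1500.0]]))
```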

2.2. DeepAR-Based Modeling Framework

This study develops a DeepAR-based probabilistic forecasting framework for MLSP (Figure 1), which consists of four main steps: (1) data preparation, involving data collection and optimal distribution selection; (2) data splitting, which splits the data into training, validation, and test datasets; (3) model calibration, involving model training and selection; and (4) model evaluation, involving model performance assessment [52,53].

2.2.1. Data Preparation

Data preparation and analysis are conducted to collect streamflow and the corresponding predictors (precipitation) and to select the optimal distribution for the streamflow. To select the most appropriate distribution from the candidate distributions, both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are employed to evaluate the goodness-of-fit between the observed data and the theoretical distributions. Then, the optimal distribution is selected by minimizing both the AIC and BIC values, which are computed as
$$AIC = -2\ln(L_i) + 2k, \quad BIC = -2\ln(L_i) + k\ln(n) \quad (8)$$
where $L_i$ is the maximized likelihood value, k denotes the number of parameters, and n is the length of the streamflow time series.
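A possible implementation of this selection step, assuming maximum-likelihood fits with SciPy and the parameter counts listed in the comments, is sketched below.

```python
# Illustrative AIC/BIC comparison of candidate distributions; an assumption of
# this sketch, not the authors' exact procedure.
import numpy as np
from scipy import stats

def aic_bic(loglik: float, k: int, n: int):
    aic = -2.0 * loglik + 2 * k
    bic = -2.0 * loglik + k * np.log(n)
    return aic, bic

def compare_distributions(streamflow: np.ndarray):
    n = len(streamflow)
    candidates = {
        "Normal": (stats.norm, 2),       # mu, sigma
        "Student's t": (stats.t, 3),     # nu, loc, scale
        "Gamma": (stats.gamma, 3),       # shape, loc, scale
    }
    results = {}
    for name, (dist, k) in candidates.items():
        params = dist.fit(streamflow)                      # maximum-likelihood fit
        loglik = np.sum(dist.logpdf(streamflow, *params))  # maximized log-likelihood
        results[name] = aic_bic(loglik, k, n)
    return results

# usage with synthetic positive "streamflow" values
flows = stats.gamma.rvs(a=2.0, scale=1800.0, size=1500, random_state=0)
for name, (aic, bic) in compare_distributions(flows).items():
    print(f"{name}: AIC={aic:.1f}, BIC={bic:.1f}")
```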

2.2.2. Data Splitting

Data splitting is an important step in the data-driven modeling process, through which the available data is divided into training, validation, and test datasets [52,53,54]. In this study, the data after a specific time point is first separated as the test set to ensure no overlap between the test data and other data. Subsequently, the remaining data is randomly shuffled and split into training and validation datasets in a specific ratio.
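A minimal sketch of this splitting scheme is shown below; the column name, cutoff date, and random seed are placeholders for illustration.

```python
# Fixed time cutoff for the test set, then a shuffled 8:2 train/validation
# split of the remaining data. Column name and cutoff are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

def split_dataset(df: pd.DataFrame, test_start: str = "2018-01-01",
                  val_fraction: float = 0.2, seed: int = 42):
    # "date" is assumed to be a datetime column; rows on or after the cutoff
    # are held out as the test set so it never overlaps the other data
    test = df[df["date"] >= test_start]
    remaining = df[df["date"] < test_start]
    # the remaining samples are shuffled and split 8:2 into training/validation
    train, val = train_test_split(remaining, test_size=val_fraction,
                                  shuffle=True, random_state=seed)
    return train, val, test
```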

2.2.3. Model Calibration

Model calibration is used to optimize the model architecture and parameters so that the model represents the underlying relationships between the predictors and the predictand. First, the optimal streamflow distribution obtained by minimizing the AIC/BIC metrics is adopted to define the model’s output distribution type. Furthermore, taking into account the specific architecture of the DeepAR model, this study utilizes two types of RNN structures (i.e., LSTM and GRU), the detailed structures and gating equations of which can be found in Demir (2025) [55]. Then, multiple model variants are generated by varying the input conditions, and their parameters are optimized on the training dataset with the Adam optimizer, which is selected for its proven efficiency and robust performance in handling complex models through adaptive learning rates [56,57]. Finally, the model variants are evaluated on the validation dataset and compared to select the best model in terms of predictive performance.

2.2.4. Model Evaluation

Model evaluation is used to assess the predictive performance of the selected model over an independent dataset (i.e., test dataset) [58]. The deterministic and probabilistic predictions are generated, and two types of metrics are proposed to evaluate the predictive performance [22,53,59,60]. The performance of the deterministic predictions is evaluated using the following three metrics: the root mean square error (RMSE) and mean absolute error (MAE), which reflect the overall prediction accuracy, and the Nash–Sutcliffe efficiency (NSE) coefficient, which indicates the overall model effectiveness. The performance of the probabilistic predictions is evaluated using the following three metrics: (1) continuous ranked probability score (CRPS), which measures the overall probabilistic predictive performance; (2) prediction interval coverage probability (PICP), which measures the proportion of observations that fall within their predicted probability intervals; and (3) mean prediction interval width (MPIW), which indicates the average width of the prediction intervals and thus the sharpness of the forecast. These six metrics can be calculated according to the following equations:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Q}_i - Q_i\right)^2}, \quad MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{Q}_i - Q_i\right|, \quad NSE = 1 - \frac{\sum_{i=1}^{n}\left(\hat{Q}_i - Q_i\right)^2}{\sum_{i=1}^{n}\left(Q_i - \frac{1}{n}\sum_{i=1}^{n} Q_i\right)^2}$$
$$CRPS = \frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{+\infty}\left[F_i(x) - H(x - Q_i)\right]^2 dx, \quad H(x - Q_i) = \begin{cases} 0, & x < Q_i \\ 1, & x \ge Q_i \end{cases}$$
$$PICP = \frac{1}{n}\sum_{i=1}^{n} c_i \times 100\%, \quad c_i = \begin{cases} 1, & L_i \le Q_i \le U_i \\ 0, & \text{otherwise} \end{cases}, \quad MPIW = \frac{1}{n}\sum_{i=1}^{n}\left(U_i - L_i\right) \quad (9)$$
where n is the sample size in the test dataset, i is the sample index, $Q_i$ is the observed streamflow, $\hat{Q}_i$ is the ensemble mean prediction, $L_i$ and $U_i$ are the lower and upper bounds of the prediction interval, and $F_i(\cdot)$ is the cumulative distribution function (CDF) of the probabilistic forecast.
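The sketch below computes these metrics from an ensemble forecast. It uses the common sample-based CRPS estimator instead of the integral form and takes the 2.5% and 97.5% ensemble quantiles as the interval bounds; both choices are assumptions of this illustration.

```python
# Evaluation metrics of Eq. (9) computed from an ensemble of shape
# (n_members, n_samples) against observations of shape (n_samples,).
import numpy as np

def evaluate(ensemble: np.ndarray, obs: np.ndarray):
    mean_pred = ensemble.mean(axis=0)
    rmse = np.sqrt(np.mean((mean_pred - obs) ** 2))
    mae = np.mean(np.abs(mean_pred - obs))
    nse = 1.0 - np.sum((mean_pred - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

    # CRPS via the sample estimator: E|X - y| - 0.5 * E|X - X'|
    abs_err = np.mean(np.abs(ensemble - obs), axis=0)
    spread = np.mean(np.abs(ensemble[:, None, :] - ensemble[None, :, :]), axis=(0, 1))
    crps = np.mean(abs_err - 0.5 * spread)

    # 95% prediction interval from ensemble quantiles (an assumed choice)
    lower, upper = np.quantile(ensemble, [0.025, 0.975], axis=0)
    picp = np.mean((obs >= lower) & (obs <= upper)) * 100.0
    mpiw = np.mean(upper - lower)
    return {"RMSE": rmse, "MAE": mae, "NSE": nse,
            "CRPS": crps, "PICP": picp, "MPIW": mpiw}
```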

2.3. Case Study and Data

The performance of the DeepAR-based modeling framework is examined in two basins upstream of the Wudongde (WDDR) and Sanxia (SXR) reservoirs, as illustrated in Figure 2. The available data and statistical characteristics for the two basins are presented in Table 1, including ten-day naturalized streamflow (Figure 3) and areal mean precipitation records spanning January 1980 to September 2022. To ensure data compatibility with neural network requirements, all variables are first normalized before model processing and then inversely transformed to their original scales. The streamflow data are standardized using the Z-score method and then shifted by a constant so that all values are above 0, as required by the Gamma distribution. The precipitation data are rescaled into the range 0–1 using min–max normalization.
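A short sketch of this preprocessing is given below; the size of the positive shift is an assumed placeholder.

```python
# Z-score standardization plus a constant shift for streamflow (keeping values
# positive for the Gamma likelihood), and min-max scaling for precipitation.
import numpy as np

def transform_streamflow(q: np.ndarray, shift: float = 5.0):
    mu, sd = q.mean(), q.std()
    q_scaled = (q - mu) / sd + shift          # standardize, then shift above zero
    return q_scaled, (mu, sd, shift)

def inverse_transform_streamflow(q_scaled: np.ndarray, params):
    mu, sd, shift = params
    return (q_scaled - shift) * sd + mu       # back to the original scale

def transform_precipitation(p: np.ndarray):
    p_min, p_max = p.min(), p.max()
    return (p - p_min) / (p_max - p_min), (p_min, p_max)
```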

2.4. Experiment Setup

The overall experimental setup, based on the DeepAR-based modeling framework, is illustrated in Figure 4. Three candidate distribution types—Gaussian, Gamma, Student’s t-distribution—are provided to account for streamflow characteristics. Then, the data after January 2018 is separated as the test dataset, while the other data is randomly split into training and validation datasets at an 8:2 ratio, following the recommendations in previous studies [52,53]. This splitting method effectively prevents data leakage between the test dataset and the training/validation datasets. The resulting training/validation and test datasets are visualized in Figure 3. As shown in Figure 3, the characteristics of the streamflow in the test dataset are largely consistent with those in the training/validation datasets for both the Upper WDDR and Upper SXR areas. Additionally, it is evident that the streamflow in neither basin exhibits a statistically significant increasing or decreasing linear trend over time. The coefficients of determination R2 with time are 0.000 for both areas, with corresponding p-values of 0.876 and 0.681, respectively.
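The trend check reported above can be reproduced with a simple linear regression of streamflow on its time index, as in the following sketch.

```python
# Linear-trend significance check: R^2 and the p-value of the slope.
import numpy as np
from scipy.stats import linregress

def trend_test(streamflow: np.ndarray):
    t = np.arange(len(streamflow))            # time index (ten-day steps)
    result = linregress(t, streamflow)
    return result.rvalue ** 2, result.pvalue  # R^2 and slope p-value
```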
During the model calibration phase, different models for predicting streamflow over the next 18 ten-day periods are trained based on varying input conditions that combine three precipitation input scenarios (temporal lags: [0], [0, 1], or [0, 1, 2]) and two streamflow input scenarios (temporal lags: [1] or [1, 2]), with the model training hyperparameters detailed in Table A1. The optimal input combination is then selected based on a comparative evaluation of the predictive performance (i.e., RMSE) of all model variants on the validation dataset.
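For illustration, the lagged input scenarios can be assembled as in the sketch below; the array layout and function name are assumptions of this example.

```python
# Build feature rows from precipitation lags [0], [0, 1] or [0, 1, 2] and
# streamflow lags [1] or [1, 2]; illustrative layout only.
import numpy as np

def build_inputs(precip: np.ndarray, flow: np.ndarray,
                 p_lags=(0, 1, 2), q_lags=(1, 2)):
    max_lag = max(max(p_lags), max(q_lags))
    rows = []
    for t in range(max_lag, len(flow)):
        features = [precip[t - lag] for lag in p_lags] + \
                   [flow[t - lag] for lag in q_lags]
        rows.append(features)
    # targets are the streamflow values aligned with each feature row
    return np.array(rows), flow[max_lag:]
```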
After data preparation and model calibration, the final model combining the optimal input configuration and the probability distribution output is established and applied to produce deterministic predictions and probabilistic predictions with 100 members in this study. To evaluate the impact of output distribution and RNN structure, alternative models with different probability distribution outputs and different RNN structures (LSTM and GRU), named GRU-N, GRU-S, GRU-G, LSTM-N, LSTM-S, and LSTM-G, are established and compared in terms of their predictive performance on the test dataset. In this naming convention, the prefix denotes the RNN structure (GRU or LSTM) and the suffix denotes the output distribution (N for the Normal distribution, S for Student’s t-distribution, and G for the Gamma distribution).
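The six variants can be enumerated as a simple grid over RNN structure and output distribution, as sketched below; build_model stands in for a hypothetical constructor wrapping the calibrated DeepAR configuration.

```python
# Enumerate the six model variants GRU-N, ..., LSTM-G; illustrative only.
from itertools import product

STRUCTURES = ["GRU", "LSTM"]
DISTRIBUTIONS = {"N": "Normal", "S": "StudentT", "G": "Gamma"}

variants = {}
for structure, (code, dist) in product(STRUCTURES, DISTRIBUTIONS.items()):
    name = f"{structure}-{code}"              # e.g., "LSTM-G"
    variants[name] = {"rnn": structure, "output_distribution": dist}
    # variants[name]["model"] = build_model(rnn=structure, distribution=dist)
```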

3. Results

Following the modeling framework, the results are presented in three sections: optimal probability distribution selection (Section 3.1), input configuration optimization (Section 3.2), and testing performance evaluation (Section 3.3).

3.1. Optimal Probability Distribution Selection

The distribution fitting results for both study areas over the data excluding the test dataset are presented in Figure 5, while the corresponding AIC and BIC values are listed in Table 2. The results reveal that the Gamma distribution provides superior statistical performance compared to the Normal and Student’s t-distributions. For the Upper WDDR area, the Gamma distribution achieves the lowest AIC (27,321.09) and BIC (27,337.03) values, substantially outperforming the Normal distribution (AIC: 28,574.36, BIC: 28,584.99) and Student’s t-distribution (AIC: 28,455.62, BIC: 28,471.57). Similarly, in the Upper SXR area, the Gamma distribution exhibits superior performance, with AIC (30,848.71) and BIC (30,864.66) values considerably lower than the alternative distributions. The probability density plots also illustrate that the Gamma distribution more accurately captures the right-skewed characteristics and tail behavior of the streamflow data in both study areas.

3.2. Input Configuration Optimization

The comparative analysis of different input configurations reveals consistent performance patterns across both study areas, as shown in Table 3. For the Upper WDDR area, the precipitation input scenario with temporal lags [0, 1, 2] achieves the lowest RMSE (1199.21 m3/s), indicating superior predictive accuracy when incorporating precipitation data from the current time step and the two previous time steps. In contrast, the precipitation scenario using only the current time step ([0]) shows the poorest performance, with the highest RMSE (1277.07 m3/s). Similarly, in the Upper SXR area, the [0, 1, 2] precipitation configuration demonstrates optimal performance, with an RMSE value of 3481.18 m3/s, while the [0] configuration shows the highest RMSE (3634.35 m3/s). The performance gradient follows a consistent pattern across both areas, where increased temporal lag information progressively improves model accuracy.
Regarding streamflow input configurations, the comparison between temporal lags [1] and [1, 2] shows that incorporating additional historical streamflow information ([1, 2]) yields only marginal improvements, with RMSE decreasing modestly from 1245.33 to 1230.19 m3/s in the Upper WDDR area and from 3577.72 to 3577.31 m3/s in the Upper SXR area. This suggests that the contribution of additional streamflow lag information is relatively limited compared to the substantial performance gains observed with precipitation temporal lags.

3.3. Testing Performance Evaluation

The predictive performance metrics of the six models (i.e., GRU-N, GRU-S, GRU-G, LSTM-N, LSTM-S, and LSTM-G) are presented in Table 4, and the Taylor diagrams are presented in Figure A1. The results consistently demonstrate that models utilizing a Gamma distribution output (GRU-G, LSTM-G) generally outperform their counterparts that use Gaussian or Student’s t-distributions. In the Upper WDDR area, LSTM-G demonstrates exceptional performance, achieving the lowest RMSE (1016.54 m3/s) and MAE (643.99 m3/s), and the highest NSE (0.89). For probabilistic forecasts, it also proves highly reliable, with a PICP of 93.15% and the lowest CRPS (473.26 m3/s). For the Upper SXR area, the superiority is shared between the Gamma-based models. The GRU-G model achieves the lowest MAE (2222.94 m3/s) and CRPS (1654.24 m3/s). Meanwhile, the LSTM-G model achieves the best RMSE (4047.15 m3/s) and the highest reliability in its probabilistic forecast, with a PICP of 96.54%. The differences between GRU and LSTM architectures are relatively small and show no consistent patterns. In the Upper WDDR area, LSTM-G slightly outperforms GRU-G, while in the Upper SXR area, the differences are marginal. This suggests that the RNN architecture choice has minimal impact compared to output distribution selection.

4. Discussion

The predictive performance of the six models (i.e., GRU-N, GRU-S, GRU-G, LSTM-N, LSTM-S, and LSTM-G) is discussed in this section. First, Section 4.1 discusses the deterministic prediction performance of models with different distribution outputs and different RNN structures across various forecast horizons. Then, Section 4.2 provides a comparative analysis of their probabilistic prediction performance. Section 4.3 examines the overall predictive performance of the models across different forecast horizons. Finally, Section 4.4 discusses the limitations of this study and future research directions.

4.1. Deterministic Prediction Performance of Different Models

The deterministic predictive performance of six models (i.e., GRU-N, GRU-S, GRU-G, LSTM-N, LSTM-S, and LSTM-G) across 18 forecast horizons (10-day periods) for two study areas is presented in Table 5 and Figure 6, with the results for RMSE shown as a representative example. Forecasting accuracy generally declines as the forecast horizon increases, regardless of model architecture or output distribution. Among the three distributions, the Gamma distribution consistently results in lower RMSE values, particularly at longer lead times. This suggests a better capacity of the Gamma-based models to capture the positively skewed or heteroscedastic characteristics often found in hydrological data. In the Upper WDDR area, the LSTM-G model achieves the lowest RMSE across most forecast horizons, with values ranging from 859.8 m3/s at the 1st forecast period to 1096.7 m3/s at the 18th. A similar pattern is observed in the Upper SXR area, where LSTM-G yields the minimum RMSE of 3687.7 m3/s at the 1st period and 3998.1 m3/s at the 18th.
Differences between LSTM and GRU structures are also observed but are found to be less consistent and less influential than those resulting from output distribution selection. In the Upper WDDR area, GRU-N and GRU-S outperform their LSTM counterparts, while the advantage shifts to LSTM only when the Gamma distribution is employed. In the Upper SXR area, the superiority of LSTM-G is observed only at shorter forecast horizons, with GRU-G performing better as the forecast horizon increases. Across all configurations, no consistent advantage is associated with either architecture, suggesting that model structure plays a secondary role relative to the output distribution in determining predictive performance.
Notably, the magnitude of performance gains brought by the Gamma distribution varies across regions. In the Upper WDDR area, the use of Gamma distribution yields substantial improvements compared to Normal and Student’s t assumptions, particularly when paired with LSTM. In contrast, in the Upper SXR area, although Gamma-based models still outperform others, the improvement is relatively marginal. For instance, the reduction in RMSE from LSTM-N to LSTM-G at the 18th horizon is only modest (from 4203.0 m3/s to 4070.5 m3/s), whereas the gain is more pronounced in the Upper WDDR area (from 1559.5 m3/s to 1096.7 m3/s). These findings demonstrate that the benefit of applying flexible, non-Gaussian output distributions such as Gamma is region-dependent.
A primary reason for this discrepancy likely lies in the different statistical characteristics of the streamflow in the two basins, as detailed in Table 1. The streamflow in the Upper WDDR area exhibits a significantly higher degree of positive skewness (1.35) and kurtosis (1.23) compared to the Upper SXR area (skewness of 1.21, kurtosis of 0.98). This indicates that the Upper WDDR streamflow distribution has a longer right tail and is more “peaked,” deviating more substantially from a Normal distribution. Consequently, the flexible, skewed shape of the Gamma distribution provides a much better fit for the more asymmetric data of the Upper WDDR area, leading to more significant performance gains. Conversely, the streamflow in the Upper SXR area, while still positively skewed, is statistically closer to a symmetrical distribution. Therefore, the advantage of the Gamma distribution over simpler assumptions like the Gaussian distribution is less pronounced, resulting in only marginal improvements.
In summary, the results highlight the critical role of output distribution in deterministic streamflow forecasting, with the Gamma distribution consistently offering performance benefits, though to varying extents across regions. While model structure has some impact, it exerts less influence than distributional assumptions.

4.2. Probabilistic Prediction Performance of Different Models

The probabilistic predictive performance (i.e., CRPS, PICP, and MPIW) of six models (i.e., GRU-N, GRU-S, GRU-G, LSTM-N, LSTM-S, and LSTM-G) across 18 forecast horizons (10-day periods) for two study areas is presented in Figure 7. Similar to the deterministic results, probabilistic forecast accuracy generally declines with increasing lead time, as reflected by rising CRPS values across all model configurations. Among the output distributions, the Gamma distribution again provides the most substantial improvements in forecast skill. This consistency between deterministic and probabilistic evaluations reinforces the robustness of the Gamma assumption in capturing the inherent characteristics of streamflow data. In the Upper WDDR area, LSTM-G yields the lowest CRPS across nearly all horizons. In the Upper SXR area, LSTM-G yields the lowest CRPS during the first three horizons, while GRU-G shows superior accuracy at longer horizons. This pattern echoes deterministic results, where GRU-G also performed better in long-term forecasting under complex conditions.
Further analysis of the PICP and MPIW reveals deeper insights into the models’ reliability and sharpness (Figure 7c–f). As expected, the MPIW for all models generally increases with the forecast horizon, reflecting the growing uncertainty of longer-term predictions. However, the models based on the Gamma distribution (GRU-G and LSTM-G) consistently achieve higher PICP values, indicating that their prediction intervals more reliably encompass the observed values. Notably, these Gamma-based models manage to maintain relatively narrower interval widths (lower MPIW) for most lead times in the Upper WDDR area and for shorter lead times in the Upper SXR area. In contrast, while models using the Student’s t-distribution (GRU-S, LSTM-S) often produce the narrowest intervals (lowest MPIW), this comes at the cost of poor reliability, as their PICP values are frequently the lowest. This demonstrates their inability to effectively capture the true uncertainty of the streamflow.
Compared to output distribution selection, differences due to model structure are less consistent and generally less influential. While LSTM tends to perform better in the Upper WDDR area, GRU offers competitive or better results in the Upper SXR area when combined with the Gamma distribution. Regional differences are again observed, with CRPS values in the Upper SXR area consistently higher than those in the Upper WDDR area, reflecting greater predictive uncertainty. However, the relative benefit of using the Gamma distribution persists in both regions, though with different magnitudes.
In summary, probabilistic forecasting results confirm key findings from deterministic evaluations—particularly the consistent and robust advantage of using the Gamma distribution across regions and horizons. This advantage is not only reflected in superior accuracy (lower CRPS) but also in a better balance between forecast reliability (PICP) and sharpness (MPIW). However, nuanced differences are observed in the relative contributions of model structure, especially under probabilistic metrics. These findings suggest that while distributional assumptions remain the most critical component for improving streamflow forecast quality, model structure and regional hydrological characteristics jointly shape both deterministic and probabilistic forecasting performance.

4.3. Overall Predictive Performance

The predicted and observed streamflow at the first forecast horizon are shown in Figure 8. It is evident that all models produce narrow prediction intervals that closely follow the observations during non-flood seasons, when streamflow is low and uncertainty is limited. In contrast, flood seasons are characterized by increased variability, leading to wider forecast intervals and larger deviations, particularly around peak flows. The choice of output distribution also significantly affects forecast reliability. For example, models using the Normal distribution often generate overly wide intervals due to their poor fit to the skewed nature of streamflow, which is reflected in their generally higher MPIW values (Figure 7e,f). In contrast, Gamma-based models produce tighter and more stable intervals while maintaining high coverage probability, as evidenced by their superior balance between PICP and MPIW across most horizons (Figure 7c–f). While Gamma-based models generally provide better predictions in most cases, their advantage may not hold in every situation—for example, the GRU-G model performs worst in the Upper WDDR area. Nonetheless, their ability to constrain uncertainty remains evident, and such model-specific variability also underscores the importance of model fusion strategies for improving forecast robustness under diverse application scenarios [21,61,62,63,64].
The predicted and observed streamflow across 18 forecast periods for models using Gamma distribution in both the Upper WDDR and Upper SXR areas are presented in Figure 9. As the forecast horizon increases, streamflow uncertainty becomes more pronounced, and the width of the predictive intervals correspondingly expands, which is quantitatively confirmed by the rising MPIW values shown in Figure 7. This widening of intervals effectively reflects the growing uncertainty associated with longer lead times, particularly during high-flow periods. The use of the Gamma distribution enables the models to better capture the skewed nature of streamflow, resulting in predictive intervals that not only reflect the asymmetry of the data but also reliably encompass the observed hydrographs across seasons and forecast periods, which is validated by the high PICP values (Figure 7). This demonstrates the capability of using suitable output distribution to improve both the reliability and calibration in probabilistic streamflow prediction.

4.4. Limitations and Future Research Directions

Despite the promising performance of the proposed framework in the two selected study areas, there are still several limitations that need to be further studied. On one hand, given that different models exhibit varying performance under different scenarios, future work could explore model fusion or ensemble approaches. This is motivated by the finding that the LSTM-G model performs best at all lead times in the Upper WDDR area and at shorter lead times in the Upper SXR area, while the GRU-G model excels at longer lead times in the Upper SXR area. A model fusion method could leverage complementary model strengths to enhance overall forecast robustness.
On the other hand, the applicability of our modeling framework needs to be validated in basins with different hydrological regimes. The areas used in this study are humid, monsoon-influenced regions, for which precipitation is the primary input driver and the non-negative Gamma distribution is suitable. However, its effectiveness in other basins requires further investigation. For example, in semi-arid regions, streamflow is often intermittent, and evapotranspiration becomes a dominant influencing factor. Therefore, the model would need to incorporate evapotranspiration as an input, and the Gamma distribution, which cannot handle zero values, would likely need to be replaced with more suitable alternatives, such as Zero-Inflated or mixture distribution models. Similarly, in snowmelt-dominated regions, the framework would need to be adapted to include key input variables such as air temperature and snow water equivalent to capture the essential physics of snow accumulation and melt processes.

5. Conclusions

In this study, a DeepAR-based modeling framework is developed to generate probabilistic streamflow forecasts by integrating distribution selection with the DeepAR model. The framework is applied to two case studies to evaluate both deterministic and probabilistic forecasting performance. The influence of model structure and output distribution choice on prediction accuracy is also examined. The main conclusions are as follows.
(1) The proposed DeepAR-based modeling framework effectively identifies the most appropriate output distribution and input configuration, leading to superior deterministic and probabilistic forecasting performance in both case studies. By integrating distribution selection and input optimization, the framework ensures that the final models are well-calibrated to the characteristics of local streamflow dynamics.
(2) Multiple factors influence predictive performance, among which the selection of output distribution has the most significant impact. Notably, the advantage of the Gamma distribution is more pronounced in basins with higher skewness, providing a data-driven explanation for the observed regional performance differences. Meanwhile, differences between RNN model structures (i.e., GRU vs. LSTM) are relatively minor and less consistent.
(3) As forecast lead time increases and during flood seasons, streamflow uncertainty grows substantially. The developed probabilistic forecasting framework is capable of capturing this variation in uncertainty, and models using the Gamma distribution demonstrate superior performance by better representing the skewed nature of streamflow. This contributes to more reliable and better-calibrated forecasts across both typical and high-variability conditions.

Author Contributions

Conceptualization, S.X. and J.W.; methodology, S.X.; software, S.X. and K.S.; validation, S.X., D.W. and B.J.; formal analysis, S.X., C.Y. and H.C.; investigation, D.W. and C.Y.; resources, S.X. and K.S.; data curation, J.W. and B.J.; writing—original draft preparation, S.X., J.W. and B.J.; writing—review and editing, S.X., D.W., K.S. and H.C.; visualization, S.X., C.Y. and H.C.; supervision, D.W. and H.C.; project administration, S.X.; funding acquisition, D.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (grant numbers: 2023YFC3206002, 2024YFE0100100), the Natural Science Foundation of Hubei Province (grant numbers: 2023AFB039, 2022CFD027), the National Natural Science Foundation of China (grant number: U2340211), the Key Project of the Chinese Water Resources Ministry (grant number: SKS-2022120), and the Fundamental Research Project for Central Public Welfare Research Institutes (CKSF20241021/SZ). The authors declare that this study received funding from China Yangtze Power Co., Ltd. (contract no. Z242302057 and project no. 2423020055). The funder was not involved in the study design, collection, analysis, or interpretation of data, the writing of this article, or the decision to submit it for publication. Shuai Xie is supported by a program of the China Scholarship Council (No. 202303340001) during his visit to the University of Regina, where the research is conducted.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

Various Python 3.6.1 open-source frameworks were used in this study. We express our gratitude to all contributors. We also give special thanks to the anonymous reviewers and editors for their constructive comments.

Conflicts of Interest

Authors Jin Wang, Keyan Shen, Benjun Jia, and Hui Cao were employed by the company China Yangtze Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MLSP: mid–long-term streamflow prediction
AI: artificial intelligence
SVR: support vector regression
ANN: artificial neural network
LSTM: long short-term memory network
GRU: gated recurrent unit neural network
RNN: recurrent neural network
WDDR: Wudongde Reservoir
SXR: Sanxia Reservoir
RMSE: root mean square error
CRPS: continuous ranked probability score

Appendix A

Table A1. The hyperparameters.
Hyperparameter | Value
Number of layers | 2
Number of cells | 40
Dropout rate | 0.2
Time features | True
Batch size | 32
Epochs | 200
Learning rate | 0.001
Figure A1. Taylor diagram of six models: (a) Upper WDDR area; (b) Upper SXR area.

References

  1. Nguyen, Q.H.; Tran, V.N. Temporal Changes in Water and Sediment Discharges: Impacts of Climate Change and Human Activities in the Red River Basin (1958–2021) with Projections up to 2100. Water 2024, 16, 1155. [Google Scholar] [CrossRef]
  2. Jia, L.; Niu, Z.; Zhang, R.; Ma, Y. Sensitivity of Runoff to Climatic Factors and the Attribution of Runoff Variation in the Upper Shule River, North-West China. Water 2024, 16, 1272. [Google Scholar] [CrossRef]
  3. Xu, H.; Liu, L.; Wang, Y.; Wang, S.; Hao, Y.; Ma, J.; Jiang, T. Assessment of climate change impact and difference on the river runoff in four basins in China under 1.5 and 2.0 °C global warming. Hydrol. Earth Syst. Sci. 2019, 23, 4219–4231. [Google Scholar] [CrossRef]
  4. Zou, L.; Zhou, T. Near future (2016-40) summer precipitation changes over China as projected by a regional climate model (RCM) under the RCP8.5 emissions scenario: Comparison between RCM downscaling and the driving GCM. Adv. Atmos. Sci. 2013, 30, 806–818. [Google Scholar] [CrossRef]
  5. Shukla, P.; Skeg, J.; Buendia, E.C.; Masson-Delmotte, V.; Pörtner, H.-O.; Roberts, D.; Zhai, P.; Slade, R.; Connors, S.; van Diemen, S. Climate Change and Land: An IPCC Special Report on Climate Change, Desertification, Land Degradation, Sustainable Land Management, Food Security, and Greenhouse Gas Fluxes in Terrestrial Ecosystems. 2019. Available online: https://www.ipcc.ch/srccl/ (accessed on 4 August 2025).
  6. Haj-Amor, Z.; Acharjee, T.K.; Dhaouadi, L.; Bouri, S. Impacts of climate change on irrigation water requirement of date palms under future salinity trend in coastal aquifer of Tunisian oasis. Agric. Water Manag. 2020, 228, 105843. [Google Scholar] [CrossRef]
  7. Piao, S.; Ciais, P.; Huang, Y.; Shen, Z.; Peng, S.; Li, J.; Zhou, L.; Liu, H.; Ma, Y.; Ding, Y. The impacts of climate change on water resources and agriculture in China. Nature 2010, 467, 43–51. [Google Scholar] [CrossRef] [PubMed]
  8. Larraz, B.; García-Rubio, N.; Gámez, M.; Sauvage, S.; Cakir, R.; Raimonet, M.; Pérez, J.M.S. Socio-Economic Indicators for Water Management in the South-West Europe Territory: Sectorial Water Productivity and Intensity in Employment. Water 2024, 16, 959. [Google Scholar] [CrossRef]
  9. Gong, G.; Wang, L.; Condon, L.; Shearman, A.; Lall, U. A simple framework for incorporating seasonal streamflow forecasts into existing water resource management practices 1. JAWRA J. Am. Water Resour. Assoc. 2010, 46, 574–585. [Google Scholar] [CrossRef]
  10. Sunday, R.; Masih, I.; Werner, M.; van der Zaag, P. Streamflow forecasting for operational water management in the Incomati River Basin, Southern Africa. Phys. Chem. Earth Parts A/B/C 2014, 72, 1–12. [Google Scholar] [CrossRef]
  11. Bărbulescu, A.; Zhen, L. Forecasting the River Water Discharge by Artificial Intelligence Methods. Water 2024, 16, 1248. [Google Scholar] [CrossRef]
  12. Chu, H.; Wei, J.; Wu, W. Streamflow prediction using LASSO-FCM-DBN approach based on hydro-meteorological condition classification. J. Hydrol. 2020, 580, 124253. [Google Scholar] [CrossRef]
  13. Feng, Z.-K.; Niu, W.-J.; Tang, Z.-Y.; Jiang, Z.-Q.; Xu, Y.; Liu, Y.; Zhang, H.-R. Monthly runoff time series prediction by variational mode decomposition and support vector machine based on quantum-behaved particle swarm optimization. J. Hydrol. 2020, 583, 124627. [Google Scholar] [CrossRef]
  14. Xie, S.; Xiang, Z.; Wang, Y.; Wu, B.; Shen, K.; Wang, J. An Index Used to Evaluate the Applicability of Mid-to-Long-Term Runoff Prediction in a Basin Based on Mutual Information. Water 2024, 16, 1619. [Google Scholar] [CrossRef]
  15. Xie, S.; Huang, Y.; Li, T.; Liu, Z.; Wang, J. Mid-long term runoff prediction based on a Lasso and SVR hybrid method. J. Basic Sci. Eng. 2018, 26, 709–722. [Google Scholar]
  16. Shamir, E. The value and skill of seasonal forecasts for water resources management in the Upper Santa Cruz River basin, southern Arizona. J. Arid. Environ. 2017, 137, 35–45. [Google Scholar] [CrossRef]
  17. Zhao, H.; Li, H.; Xuan, Y.; Bao, S.; Cidan, Y.; Liu, Y.; Li, C.; Yao, M. Investigating the critical influencing factors of snowmelt runoff and development of a mid-long term snowmelt runoff forecasting. J. Geogr. Sci. 2023, 33, 1313–1333. [Google Scholar] [CrossRef]
  18. Nguyen, D.H.; Elshorbagy, A.; Khaliq, M.N.; Shen, C.; Akhtar, M.K.; Moghairib, M.; Unduche, F.; Razavi, S.; Lamontagne, P. Advancing Sub-Seasonal to Seasonal Streamflow Forecasting in Canada: A Review of Conventional and Emerging Approaches for Operational Applications. Results Eng. 2025, 27, 106345. [Google Scholar] [CrossRef]
  19. He, C.; Chen, F.; Long, A.; Qian, Y.; Tang, H. Improving the precision of monthly runoff prediction using the combined non-stationary methods in an oasis irrigation area. Agric. Water Manag. 2023, 279, 108161. [Google Scholar] [CrossRef]
  20. Samsudin, R.; Saad, P.; Shabri, A. River flow time series using least squares support vector machines. Hydrol. Earth Syst. Sci. 2011, 15, 1835–1852. [Google Scholar] [CrossRef]
  21. Bennett, J.C.; Wang, Q.; Li, M.; Robertson, D.E.; Schepen, A. Reliable long-range ensemble streamflow forecasts: Combining calibrated climate forecasts with a conceptual runoff model and a staged error model. Water Resour. Res. 2016, 52, 8238–8259. [Google Scholar] [CrossRef]
  22. Crochemore, L.; Ramos, M.-H.; Pappenberger, F. Bias correcting precipitation forecasts to improve the skill of seasonal streamflow forecasts. Hydrol. Earth Syst. Sci. 2016, 20, 3601–3618. [Google Scholar] [CrossRef]
  23. Le, X.-H.; Ho, H.V.; Lee, G.; Jung, S. Application of long short-term memory (LSTM) neural network for flood forecasting. Water 2019, 11, 1387. [Google Scholar] [CrossRef]
  24. Choi, J.; Won, J.; Jang, S.; Kim, S. Learning Enhancement Method of Long Short-Term Memory Network and Its Applicability in Hydrological Time Series Prediction. Water 2022, 14, 2910. [Google Scholar] [CrossRef]
  25. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N. Deep learning and process understanding for data-driven Earth system science. Nature. 2019, 566, 195–204. [Google Scholar] [CrossRef] [PubMed]
  26. Yaseen, Z.M.; El-Shafie, A.; Jaafar, O.; Afan, H.A.; Sayl, K.N. Artificial intelligence based models for stream-flow forecasting: 2000–2015. J. Hydrol. 2015, 530, 829–844. [Google Scholar] [CrossRef]
  27. Liu, Y.; Yin, Z.; Zhang, Y.; Wang, Q. Mid and long-term hydrological classification forecasting model based on KDE-BDA and its application research. IOP Conf. Ser. Earth Environ. Sci. 2019, 330, 032010. [Google Scholar] [CrossRef]
  28. Kratzert, F.; Klotz, D.; Shalev, G.; Klambauer, G.; Hochreiter, S.; Nearing, G. Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets. Hydrol. Earth Syst. Sci. 2019, 23, 5089–5110. [Google Scholar] [CrossRef]
  29. Asefa, T.; Kemblowski, M.; Mckee, M.; Khalil, A. Multi-time scale stream flow predictions: The support vector machines approach. J. Hydrol. 2006, 318, 7–16. [Google Scholar] [CrossRef]
  30. Kalteh, A.M. Monthly river flow forecasting using artificial neural network and support vector regression models coupled with wavelet transform. Comput. Geosci. 2013, 54, 1–8. [Google Scholar] [CrossRef]
  31. Maity, R.; Bhagwat, P.P.; Bhatnagar, A. Potential of support vector regression for prediction of monthly streamflow using endogenous property. Hydrol. Process. 2010, 24, 917–923. [Google Scholar] [CrossRef]
  32. Noori, R.; Karbassi, A.R.; Moghaddamnia, A.; Han, D.; Zokaei-Ashtiani, M.H.; Farokhnia, A.; Gousheh, M.G. Assessment of input variables determination on the SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction. J. Hydrol. 2011, 401, 177–189. [Google Scholar] [CrossRef]
  33. Lin, J.; Cheng, C.; Chau, K. Using support vector machines for long-term discharge prediction. Hydrol. Sci. J. 2006, 51, 599–612. [Google Scholar] [CrossRef]
  34. Tao, H.; Abba, S.I.; Al-Areeq, A.M.; Tangang, F.; Samantaray, S.; Sahoo, A.; Siqueira, H.V.; Maroufpoor, S.; Demir, V.; Dhanraj Bokde, N.; et al. Hybridized artificial intelligence models with nature-inspired algorithms for river flow modeling: A comprehensive review, assessment, and possible future research directions. Eng. Appl. Artif. Intell. 2024, 129, 107559. [Google Scholar] [CrossRef]
  35. Kelly, R.A.; Jakeman, A.J.; Barreteau, O.; Borsuk, M.E.; ElSawah, S.; Hamilton, S.H.; Henriksen, H.J.; Kuikka, S.; Maier, H.R.; Rizzoli, A.E. Selecting among five common modelling approaches for integrated environmental assessment and management. Environ. Model. Softw. 2013, 47, 159–181. [Google Scholar] [CrossRef]
  36. Mount, N.J.; Maier, H.R.; Toth, E.; Elshorbagy, A.; Solomatine, D.; Chang, F.-J.; Abrahart, R. Data-driven modelling approaches for socio-hydrology: Opportunities and challenges within the Panta Rhei Science Plan. Hydrol. Sci. J. 2016, 61, 1192–1208. [Google Scholar] [CrossRef]
  37. Roy, A.; Kasiviswanathan, K.; Patidar, S.; Adeloye, A.J.; Soundharajan, B.S.; Ojha, C.S.P. A novel physics-aware machine learning-based dynamic error correction model for improving streamflow forecast accuracy. Water Resour. Res. 2023, 59, e2022WR033318. [Google Scholar] [CrossRef]
  38. Sujay, R.N.; Deka, P.C. Support vector machine applications in the field of hydrology: A review. Appl. Soft. Comput. J. 2014, 19, 372–386. [Google Scholar]
  39. Liang, Z.; Li, Y.; Hu, Y.; Li, B.; Wang, J. A data-driven SVR model for long-term runoff prediction and uncertainty analysis based on the Bayesian framework. Theor. Appl. Climatol. 2018, 133, 137–149. [Google Scholar] [CrossRef]
  40. Wang, Q.; Zhao, T.; Yang, Q.; Robertson, D. A Seasonally Coherent Calibration (SCC) Model for Postprocessing Numerical Weather Predictions. Mon. Weather. Rev. 2019, 147, 3633–3647. [Google Scholar] [CrossRef]
  41. May, R.; Dandy, G.; Maier, H. Review of Input Variable Selection Methods for Artificial Neural Networks. InTech 2011, 10, 19–45. [Google Scholar]
  42. Mo, R.; Xu, B.; Zhong, P.-A.; Zhu, F.; Huang, X.; Liu, W.; Xu, S.; Wang, G.; Zhang, J. Dynamic long-term streamflow probabilistic forecasting model for a multisite system considering real-time forecast updating through spatio-temporal dependent error correction. J. Hydrol. 2021, 601, 126666. [Google Scholar] [CrossRef]
  43. Qian, X.; Wang, B.; Chen, J.; Fan, Y.; Mo, R.; Xu, C.; Liu, W.; Liu, J.; Zhong, P. An explainable ensemble deep learning model for long-term streamflow forecasting under multiple uncertainties. J. Hydrol. 2025, 662, 133968. [Google Scholar] [CrossRef]
  44. Mo, R.; Xu, B.; Zhong, P.-A.; Dong, Y.; Wang, H.; Yue, H.; Zhu, J.; Wang, H.; Wang, G.; Zhang, J. Long-term probabilistic streamflow forecast model with “inputs–structure–parameters” hierarchical optimization framework. J. Hydrol. 2023, 622, 129736. [Google Scholar] [CrossRef]
  45. Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191. [Google Scholar] [CrossRef]
  46. Salinas, D.; Bohlke-Schneider, M.; Callot, L.; Medico, R.; Gasthaus, J. High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes. arXiv 2019, arXiv:1910.03002. [Google Scholar]
  47. Schaduangrat, N.; Anuwongcharoen, N.; Charoenkwan, P.; Shoombuatong, W. DeepAR: A novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J. Cheminform. 2023, 15, 50. [Google Scholar] [CrossRef]
  48. Li, J.; Chen, W.; Zhou, Z.; Yang, J.; Zeng, D. DeepAR-Attention probabilistic prediction for stock price series. Neural Comput. Appl. 2024, 36, 15389–15406. [Google Scholar] [CrossRef]
  49. Liao, Y.; Liang, C. A Temperature Time Series Forecasting Model Based on DeepAR. In Proceedings of the 2021 7th International Conference on Computer and Communications (ICCC), Chengdu, China, 10–13 December 2021. [Google Scholar]
  50. Chow, V. Applied Hydrology; McGraw-Hill: Columbus, OH, USA, 1971. [Google Scholar]
  51. England Jr, J.F.; Cohn, T.A.; Faber, B.A.; Stedinger, J.R.; Thomas, W.O., Jr.; Veilleux, A.G.; Kiang, J.E.; Mason, R.R., Jr. Guidelines for Determining Flood Flow Frequency—Bulletin 17C; US Geological Survey: Reston, VA, USA, 2018.
  52. Wu, W.; Dandy, G.C.; Maier, H.R. Protocol for developing ANN models and its application to the assessment of the quality of the ANN model development process in drinking water quality modelling. Environ. Model. Softw. 2014, 54, 108–127. [Google Scholar] [CrossRef]
  53. Xie, S.; Wu, W.; Mooser, S.; Wang, Q.; Nathan, R.; Huang, Y. Artificial neural network based hybrid modeling approach for flood inundation modeling. J. Hydrol. 2021, 592, 125605. [Google Scholar] [CrossRef]
  54. Maier, H.R.; Jain, A.; Dandy, G.C.; Sudheer, K.P. Methods used for the development of neural networks for the prediction of water resource variables in river systems: Current status and future directions. Environ. Model. Softw. 2010, 25, 891–909. [Google Scholar] [CrossRef]
  55. Demir, V. Evaluation of Solar Radiation Prediction Models Using AI: A Performance Comparison in the High-Potential Region of Konya, Türkiye. Atmosphere 2025, 16, 398. [Google Scholar] [CrossRef]
  56. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  57. De Faria, V.; De Queiroz, A.; Lima, L.; Lima, J.; Da Silva, B. An assessment of multi-layer perceptron networks for streamflow forecasting in large-scale interconnected hydrosystems. Int. J. Environ. Sci. Technol. 2022, 19, 5819–5838. [Google Scholar] [CrossRef]
  58. Humphrey, G.B.; Maier, H.R.; Wu, W.; Mount, N.J.; Dandy, G.C.; Abrahart, R.J.; Dawson, C.W. Improved validation framework and R-package for artificial neural network models. Environ. Model. Softw. 2017, 92, 82–106. [Google Scholar] [CrossRef]
  59. Gneiting, T.; Raftery, A.E.; Westveld, A.H., III; Goldman, T. Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather. Rev. 2005, 133, 1098–1118. [Google Scholar] [CrossRef]
  60. Renard, B.; Kavetski, D.; Kuczera, G.; Thyer, M.; Franks, S.W. Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors. Water Resour. Res. 2010, 46, W05521. [Google Scholar] [CrossRef]
  61. See, L.; Abrahartand, R.J. Multi-model data fusion for hydrological forecasting. Comput. GeoSci. 2001, 27, 987–994. [Google Scholar] [CrossRef]
  62. Azmi, M.; Araghinejad, S.; Kholghi, M. Multi model data fusion for hydrological forecasting using k-nearest neighbour method. Iran. J. Sci. Technol. 2010, 34, 81. [Google Scholar]
  63. Wang, Q.; Schepen, A.; Robertson, D.E. Merging seasonal rainfall forecasts from multiple statistical models through Bayesian model averaging. J. Clim. 2012, 25, 5524–5537. [Google Scholar] [CrossRef]
  64. Schepen, A.; Wang, Q.; Everingham, Y. Calibration, bridging, and merging to improve GCM seasonal temperature forecasts in Australia. Mon. Weather. Rev. 2016, 144, 2421–2441. [Google Scholar] [CrossRef]
Figure 1. DeepAR-based modeling framework.
Figure 1. DeepAR-based modeling framework.
Water 17 02506 g001
Figure 2. Study area.
Figure 2. Study area.
Water 17 02506 g002
Figure 3. Observed 10-day streamflow, linear trend, and data splitting for two study areas: (a) Upper WDDR area; (b) Upper SXR area.
Figure 4. The workflow diagram.
Figure 5. Distribution fitting results: (a) Upper WDDR area; (b) Upper SXR area.
Figure 6. RMSE of six models across various forecast horizons: (a) Upper WDDR area; (b) Upper SXR area.
Figure 7. Probabilistic predictive performance metrics of six models across various forecast horizons. Subfigures (a,c,e) show the CRPS, PICP, and MPIW of the six models in the Upper WDDR area, and subfigures (b,d,f) show the corresponding metrics in the Upper SXR area.
Figure 8. Predicted streamflow by different models at the 1st forecast lead time: (a) Upper WDDR area; (b) Upper SXR area.
Figure 9. Predicted streamflow at different forecast lead times: (a) Upper WDDR area; (b) Upper SXR area.
Table 1. Available data.

| Study Area | Upper WDDR Area | | Upper SXR Area | |
| Variable | Naturalized Streamflow | Areal Mean Precipitation | Naturalized Streamflow | Areal Mean Precipitation |
| Temporal coverage and scale | January 1980 to September 2022, 10-day time scale (all variables) | | | |
| Min | 557 m3/s | 0.00 mm | 2910 m3/s | 0.10 mm |
| Max | 18,600 m3/s | 90.83 mm | 56,500 m3/s | 91.41 mm |
| Average | 3819.97 m3/s | 17.59 mm | 13,432.22 m3/s | 22.69 mm |
| Standard deviation | 3247.06 m3/s | 18.98 mm | 10,016.31 m3/s | 20.36 mm |
| Coefficient of variation | 0.85 | 1.07 | 0.75 | 0.84 |
| Skewness | 1.35 | 0.22 | 1.21 | −0.26 |
| Kurtosis | 1.23 | 1.08 | 0.98 | 0.89 |
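The summary statistics in Table 1 (mean, standard deviation, coefficient of variation, skewness, and kurtosis) can be reproduced with standard tools. The minimal pandas/SciPy sketch below is illustrative only; the file name wddr_10day.csv and column name streamflow are placeholders, not the authors' data files.

```python
import pandas as pd
from scipy import stats

# Placeholder file/column names; any 10-day naturalized streamflow series (m3/s) will do.
df = pd.read_csv("wddr_10day.csv", parse_dates=["date"])
q = df["streamflow"].dropna()

summary = {
    "Min": q.min(),
    "Max": q.max(),
    "Average": q.mean(),
    "Standard deviation": q.std(),
    "Coefficient of variation": q.std() / q.mean(),
    "Skewness": stats.skew(q),      # Fisher-Pearson skewness
    "Kurtosis": stats.kurtosis(q),  # excess kurtosis (Normal distribution = 0)
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```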
Table 2. AIC and BIC values of three distributions for two study areas.

| Study Area | Distribution | AIC | BIC |
| Upper WDDR area | Normal | 25,335.35 | 25,345.73 |
| Upper WDDR area | Student's t | 25,223.80 | 25,239.38 |
| Upper WDDR area | Gamma | 24,044.36 | 24,059.95 |
| Upper SXR area | Normal | 28,283.95 | 28,294.34 |
| Upper SXR area | Student's t | 28,285.95 | 28,301.54 |
| Upper SXR area | Gamma | 27,308.55 | 27,324.14 |
Note: The bold values indicate the minimum AIC/BIC.
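The AIC/BIC comparison in Table 2 follows the standard maximum-likelihood recipe. The SciPy sketch below shows one way such values can be obtained; it is an illustrative reconstruction under the assumption of unconstrained maximum-likelihood fits (two parameters for the Normal, three each for the Student's t and Gamma), not the authors' exact implementation.

```python
import numpy as np
from scipy import stats

def aic_bic(log_lik, k, n):
    """AIC and BIC from a maximized log-likelihood with k parameters and n observations."""
    return 2 * k - 2 * log_lik, k * np.log(n) - 2 * log_lik

# Placeholder input: a 1-D array of observed 10-day streamflow (m3/s).
streamflow = np.loadtxt("streamflow.txt")

candidates = {
    "Normal": (stats.norm, 2),    # loc, scale
    "Student's t": (stats.t, 3),  # df, loc, scale
    "Gamma": (stats.gamma, 3),    # shape, loc, scale
}
for name, (dist, k) in candidates.items():
    params = dist.fit(streamflow)                        # maximum-likelihood fit
    log_lik = np.sum(dist.logpdf(streamflow, *params))   # maximized log-likelihood
    aic, bic = aic_bic(log_lik, k, len(streamflow))
    print(f"{name}: AIC = {aic:.2f}, BIC = {bic:.2f}")
```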
Table 3. Average RMSE on the validation dataset of different models.

| Input Configuration | Lags | Average RMSE, Upper WDDR Area (m3/s) | Average RMSE, Upper SXR Area (m3/s) |
| Lags of precipitation | 0 | 1277.07 | 3634.55 |
| Lags of precipitation | 0, 1 | 1237.00 | 3616.81 |
| Lags of precipitation | 0, 1, 2 | 1199.21 | 3481.18 |
| Lags of streamflow | 1 | 1245.33 | 3577.72 |
| Lags of streamflow | 1, 2 | 1230.19 | 3577.31 |
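Table 3 compares input configurations that differ only in which lags of precipitation and streamflow are supplied as covariates. A minimal pandas sketch of building such lagged inputs is shown below; the column names precipitation and streamflow and the specific lag sets are assumptions for illustration, not the authors' preprocessing code.

```python
import pandas as pd

def add_lags(df: pd.DataFrame, column: str, lags: list[int]) -> pd.DataFrame:
    """Append lagged copies of `column`; lag 0 is the current 10-day period."""
    for lag in lags:
        df[f"{column}_lag{lag}"] = df[column].shift(lag)
    return df

# Placeholder 10-day series with precipitation (mm) and streamflow (m3/s) columns.
df = pd.read_csv("wddr_10day.csv", parse_dates=["date"])

# Example configuration consistent with Table 3: precipitation lags 0-2, streamflow lags 1-2.
df = add_lags(df, "precipitation", lags=[0, 1, 2])
df = add_lags(df, "streamflow", lags=[1, 2])
df = df.dropna()  # shifting introduces NaNs at the start of the record
```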
Table 4. Predictive performance on the testing dataset of six models with different RNN structures and different probability distribution outputs.

| Study Area | Evaluation Metric | GRU-N | GRU-S | GRU-G | LSTM-N | LSTM-S | LSTM-G |
| Upper WDDR area | RMSE (m3/s) | 1356.74 | 1282.09 | 1098.98 | 1407.77 | 1331.08 | 1016.54 |
| Upper WDDR area | MAE (m3/s) | 817.92 | 775.23 | 699.40 | 844.78 | 816.79 | 643.99 |
| Upper WDDR area | NSE | 0.82 | 0.84 | 0.88 | 0.81 | 0.83 | 0.89 |
| Upper WDDR area | CRPS (m3/s) | 608.89 | 578.95 | 517.54 | 637.95 | 620.08 | 473.26 |
| Upper WDDR area | PICP (%) | 90.08 | 84.81 | 92.93 | 87.09 | 76.84 | 93.15 |
| Upper WDDR area | MPIW (m3/s) | 3980.52 | 3585.95 | 3330.88 | 3778.34 | 3228.71 | 3485.17 |
| Upper SXR area | RMSE (m3/s) | 4217.12 | 4143.33 | 4057.33 | 4091.22 | 4296.16 | 4047.15 |
| Upper SXR area | MAE (m3/s) | 2346.57 | 2289.88 | 2222.94 | 2370.06 | 2487.38 | 2312.61 |
| Upper SXR area | NSE | 0.86 | 0.87 | 0.87 | 0.87 | 0.86 | 0.87 |
| Upper SXR area | CRPS (m3/s) | 1776.92 | 1716.52 | 1654.24 | 1771.43 | 1870.94 | 1717.93 |
| Upper SXR area | PICP (%) | 91.77 | 91.45 | 96.46 | 93.11 | 87.27 | 96.54 |
| Upper SXR area | MPIW (m3/s) | 10,015.73 | 10,669.20 | 11,048.15 | 11,218.44 | 9156.87 | 11,464.91 |
Note: The bold values indicate the best predictive performance in terms of the specific evaluation metric.
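The deterministic and probabilistic metrics in Table 4 can be computed directly from an ensemble of predictive samples, the natural output of a DeepAR-style model. The NumPy sketch below is a hedged illustration: the point forecast is taken here as the predictive median and the prediction interval is assumed to be a central 95% band, since the nominal interval level is not restated in this table.

```python
import numpy as np

def deterministic_metrics(obs, point):
    """RMSE, MAE, and NSE for a point forecast (e.g., the predictive median)."""
    rmse = np.sqrt(np.mean((obs - point) ** 2))
    mae = np.mean(np.abs(obs - point))
    nse = 1.0 - np.sum((obs - point) ** 2) / np.sum((obs - obs.mean()) ** 2)
    return rmse, mae, nse

def crps_from_samples(obs, samples):
    """Sample-based CRPS averaged over time steps; samples has shape (n_samples, n_times)."""
    term1 = np.mean(np.abs(samples - obs[None, :]), axis=0)  # E|X - y|
    term2 = 0.5 * np.mean(
        np.abs(samples[:, None, :] - samples[None, :, :]), axis=(0, 1)
    )  # 0.5 * E|X - X'|
    return np.mean(term1 - term2)

def interval_metrics(obs, samples, level=0.95):
    """PICP (%) and MPIW of the central prediction interval at the given level."""
    lower = np.quantile(samples, (1 - level) / 2, axis=0)
    upper = np.quantile(samples, 1 - (1 - level) / 2, axis=0)
    picp = 100.0 * np.mean((obs >= lower) & (obs <= upper))
    mpiw = np.mean(upper - lower)
    return picp, mpiw

# Usage sketch with placeholder arrays: obs has shape (n_times,), samples has shape (n_samples, n_times).
# point = np.median(samples, axis=0)
# print(deterministic_metrics(obs, point), crps_from_samples(obs, samples), interval_metrics(obs, samples))
```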
Table 5. RMSE of six models across various forecast horizons.

| Forecast Horizon (10-Day Periods) | Upper WDDR Area | | | | | | Upper SXR Area | | | | | |
| | GRU-N | GRU-S | GRU-G | LSTM-N | LSTM-S | LSTM-G | GRU-N | GRU-S | GRU-G | LSTM-N | LSTM-S | LSTM-G |
| 1 | 876.2 | 878.3 | 940.6 | 892.9 | 881.5 | 859.8 | 3897.6 | 4024.8 | 4126.8 | 3863.2 | 4022.0 | 3687.2 |
| 2 | 1035.7 | 1007.6 | 1017.9 | 1083.1 | 1054.9 | 958.9 | 4031.0 | 4091.6 | 4104.6 | 3880.4 | 4254.2 | 3841.5 |
| 3 | 1124.9 | 1085.5 | 995.7 | 1164.5 | 1129.9 | 913.6 | 4167.4 | 4109.5 | 4221.6 | 4045.0 | 4260.8 | 3938.4 |
| 4 | 1201.3 | 1157.9 | 1032.1 | 1235.9 | 1190.4 | 924.8 | 3982.7 | 3978.0 | 3947.0 | 4011.7 | 4176.2 | 4011.2 |
| 5 | 1239.9 | 1159.2 | 1006.4 | 1264.5 | 1178.6 | 902.7 | 4080.1 | 4002.4 | 4118.1 | 4106.4 | 4272.7 | 3977.0 |
| 6 | 1270.4 | 1215.3 | 1020.5 | 1312.1 | 1219.7 | 912.3 | 4098.8 | 4148.2 | 4043.1 | 4024.7 | 4213.8 | 4068.3 |
| 7 | 1330.8 | 1247.1 | 1082.2 | 1351.1 | 1263.2 | 986.2 | 4145.4 | 4102.0 | 4044.4 | 4104.7 | 4285.8 | 4020.6 |
| 8 | 1367.0 | 1303.7 | 1119.5 | 1406.9 | 1350.6 | 1051.4 | 4190.1 | 4031.1 | 3977.6 | 4065.0 | 4249.4 | 3936.1 |
| 9 | 1408.0 | 1340.6 | 1102.5 | 1467.4 | 1390.4 | 1093.2 | 4297.1 | 4161.7 | 4030.8 | 4063.2 | 4283.7 | 4023.0 |
| 10 | 1453.8 | 1350.4 | 1119.3 | 1492.2 | 1430.3 | 1089.7 | 4312.4 | 4183.7 | 4002.9 | 4155.7 | 4348.2 | 4117.6 |
| 11 | 1451.8 | 1362.3 | 1116.0 | 1518.1 | 1441.8 | 1068.9 | 4343.3 | 4163.7 | 4020.1 | 4116.1 | 4329.2 | 4064.6 |
| 12 | 1487.0 | 1394.6 | 1153.5 | 1532.0 | 1458.4 | 1073.3 | 4374.3 | 4184.0 | 4009.1 | 4158.2 | 4295.4 | 4095.5 |
| 13 | 1483.9 | 1394.1 | 1152.8 | 1545.5 | 1469.7 | 1048.0 | 4328.6 | 4229.5 | 4107.5 | 4140.8 | 4311.1 | 4103.9 |
| 14 | 1485.5 | 1394.9 | 1167.4 | 1559.4 | 1449.5 | 1055.8 | 4358.9 | 4277.4 | 4034.3 | 4176.3 | 4406.5 | 4258.0 |
| 15 | 1484.5 | 1390.3 | 1174.9 | 1574.5 | 1454.6 | 1071.8 | 4339.0 | 4217.5 | 4096.7 | 4170.8 | 4371.7 | 4235.7 |
| 16 | 1494.8 | 1391.5 | 1164.6 | 1565.6 | 1456.0 | 1069.7 | 4331.0 | 4219.1 | 4042.1 | 4119.7 | 4391.1 | 4180.6 |
| 17 | 1507.1 | 1407.0 | 1177.8 | 1574.5 | 1474.5 | 1068.0 | 4336.2 | 4245.0 | 4098.1 | 4216.8 | 4428.7 | 4178.2 |
| 18 | 1503.5 | 1430.7 | 1192.9 | 1559.5 | 1468.4 | 1096.7 | 4251.0 | 4194.8 | 3998.1 | 4203.0 | 4411.2 | 4070.5 |
Note: The bold values indicate the minimum RMSE value among the six models.