Coupling Machine Learning and Physically Based Hydrological Models for Reservoir-Based Streamflow Forecasting

Jia, Benjun; Fang, Wei

doi:10.3390/rs17132314

Open Access Article


by Benjun Jia ¹ and Wei Fang ²,³,*

¹ Hubei Key Laboratory of Intelligent Yangtze and Hydroelectric Science, China Yangtze Power Co., Ltd., Yichang 443000, China

² School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

³ College of Civil Engineering, Fuzhou University, Fuzhou 350108, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(13), 2314; https://doi.org/10.3390/rs17132314

Submission received: 25 May 2025 / Revised: 27 June 2025 / Accepted: 4 July 2025 / Published: 5 July 2025

(This article belongs to the Special Issue Machine Learning and Automation in Remote Sensing Applied in Hydrological Processes)


Abstract

High-accuracy streamflow forecasting with long lead times can promote the efficient utilization of water resources. However, the construction of cascade reservoirs has transformed natural continuous rivers into multi-block rivers, and existing streamflow forecasting methods fail to consider the impact of reservoir operation. Thus, a novel short-term streamflow forecasting method for multi-block watersheds was proposed by integrating machine learning and hydrological models. Firstly, the errors of a forecast precipitation product were corrected against IMERG precipitation using a long short-term memory neural network (LSTM). Secondly, operation rules for cascade reservoirs were extracted by coupling convolutional LSTM (ConvLSTM) and LSTM. Thirdly, a short-term deterministic streamflow forecasting model was built for multi-block watersheds. Finally, according to the sources of forecasting errors, probabilistic streamflow forecasting models based on the Gaussian mixture model (GMM) were proposed and their performances compared. Taking the Yalong River as an example, the main results are as follows: (1) The deep learning models (ConvLSTM and LSTM) performed well in correcting forecast precipitation and extracting reservoir operation rules, contributing to streamflow forecasting accuracy. (2) The proposed deterministic streamflow forecasting method performed well, with NSE above 0.83 for lead times of 1–5 days. (3) The GMM model using upstream propagated forecasted streamflow, interval forecasted streamflow, and downstream forecasted streamflow as the input–output combination has good probabilistic forecasting performance and can adequately characterize the "non-normality" and "heteroskedasticity" of forecasting uncertainty.

Keywords:

short-term streamflow forecasting; machine learning; reservoir operation; meteo-hydrological coupling; probabilistic forecasting

1. Introduction

Water resources are fundamental to human survival, ecosystem functions, and sustainable socio-economic development [1]. In recent years, the combined effects of climate change and human activities have significantly altered the spatiotemporal distribution of the water cycle, increasing the probability of extreme hydrological events and triggering frequent sudden floods and droughts [2,3]. For example, an extreme drought in the Yangtze River Basin left tens of millions of people with difficulties accessing water and electricity, while floods in Liaoning Province in summer 2022 severely damaged agriculture and infrastructure [4,5]. Such events seriously threaten lives and property, social stability, and economic development. Faced with these growing risks, enhancing early warning capabilities has become imperative. Accurate short-term streamflow forecasting is therefore a foundational tool for improving disaster prevention and mitigation, optimizing water resource utilization efficiency, and ultimately promoting sustainable societal development.

Short-term streamflow forecasting can be divided into single-value deterministic and multi-value probabilistic forecasting according to its outputs. Deterministic forecasting combines hydrological models with forecast precipitation to obtain single-value forecasted streamflow at different lead times, so its accuracy depends on both the hydrological model's performance and the accuracy of the forecast precipitation. It can be implemented with unidirectional or bidirectional land–atmosphere coupling models [6]. A unidirectional land–atmosphere coupling model first obtains forecast precipitation from a numerical weather prediction model and then uses it to drive a hydrological model to forecast streamflow. For example, Wang et al. proposed a unidirectional land–atmosphere coupled urban flood forecasting model based on the Storm Water Management Model and the Weather Research and Forecasting (WRF) model, and successfully extended the effective lead time of urban flood forecasting in Zhengzhou to 6 h [7]. Liu et al. used forecast precipitation from the European Centre for Medium-Range Weather Forecasts to drive the Variable Infiltration Capacity model in the Yarlung Zangbo River, effectively improving flood forecasting accuracy and extending the lead time to 5 days [8]. These studies indicate that unidirectional land–atmosphere coupling models are structurally simple, easy to build, and can significantly improve streamflow forecasting accuracy and extend lead times [9,10]. However, such a model runs the numerical prediction model and the hydrological model separately, without sharing the real-time states of atmospheric and land surface variables, so it cannot correct the boundary conditions of the numerical prediction model or the inputs of the hydrological model in real time [11]. Its forecast accuracy therefore leaves room for further improvement.
In contrast, a bidirectional land–atmosphere coupling model embeds the hydrological model into the numerical weather prediction model through a shared two-way feedback channel, enabling a refined description of the rainfall–runoff formation process [12]. For example, Larsen et al. built an open modeling interface to bidirectionally couple a regional climate model and the MIKE SHE model and applied it to the Skjern River with good results [13]. Gu et al. coupled the WRF and WRF-Hydro models to establish a bidirectional land–atmosphere coupling flood forecasting model, which accurately characterized the flood formation process and reduced flood forecasting errors [14]. Although bidirectional coupling theoretically portrays the water cycle more accurately by correcting meteorological variables and streamflow in real time, its complex internal structure and heavy computational cost make it difficult to apply in practice. Moreover, a single-value forecast of either type cannot express its own uncertainty. Multi-value probabilistic streamflow forecasting has therefore emerged to quantify forecast error uncertainty and provide a range of possible outcomes.

According to how the results are generated, probabilistic streamflow forecasting methods can be roughly divided into three categories. The first generates probabilistic forecasts from deterministic forecast results through the Bayesian forecasting system [15]. It can comprehensively analyze the uncertainty of model inputs, parameters, or structures, is easy to couple with hydrological models, and is widely used in practice [16,17,18]. However, the prior distribution is chosen subjectively, which affects forecast performance. The second category modifies the parameters, structure, or outputs of deterministic streamflow forecasting models to generate multi-value forecasts for different periods; common methods include generalized likelihood uncertainty estimation and upper and lower bound estimation [19,20]. These methods are structurally simple and perform well, but they are highly data-dependent, and model training becomes increasingly difficult as the number of objectives and parameters grows. The last category builds probabilistic streamflow forecasting models directly from the correlation between input factors and the forecast variable, for example, Gaussian process regression and the Gaussian mixture model. Sun et al. applied Gaussian process regression to streamflow forecasting in 438 regions of the United States and found that it outperformed linear regression models and artificial neural networks [21]. Liu et al. used the Gaussian mixture model for probabilistic streamflow forecasting in the Jinsha River, which reasonably quantified forecast uncertainty and outperformed traditional machine learning models [22].
These models are theoretically clear, easy to operate, and fully reflect the distribution of streamflow forecasting errors, so their application prospects in probabilistic streamflow forecasting are broad.

In practice, many reservoirs have been built and operated worldwide to allocate water resources rationally [23]. The continuous construction and operation of large reservoirs inevitably transform natural continuous rivers into multi-block rivers, which increases the uncertainty of interval and downstream inflow. Several problems in short-term streamflow forecasting therefore need further exploration: (1) most previous methods focus on the rainfall–runoff relationship to forecast river streamflow in the natural state, neglecting the influence of reservoir operation; (2) although precipitation forecasts continue to improve, how to use them to effectively extend forecast lead times under multi-block conditions still needs to be studied; and (3) how uncertainty in precipitation forecasts and reservoir operation affects the accuracy of short-term streamflow forecasting requires further analysis.

Therefore, the objective of this study was to develop a high-accuracy streamflow forecasting method with long lead times and to analyze its uncertainty by coupling precipitation forecasts, reservoir operation, and hydrological models. The main contributions are as follows:

(1): A precipitation forecast correction model was developed based on deep learning to obtain high-accuracy forecast precipitation for the entire basin as input.
(2): An operation rule extraction model for cascade reservoirs was built by considering the hydraulic correlation between reservoirs and hydro-meteorological spatiotemporal information.
(3): A novel short-term streamflow forecasting method was proposed with meteo-hydrological coupling under the influence of reservoir regulation.
(4): A probabilistic streamflow forecasting method was proposed based on the Gaussian mixture model, and the influence of different input–output combinations on the results was evaluated to reasonably characterize forecasting uncertainty.

The remainder of this paper is organized as follows: Section 2 describes the methodology for short-term streamflow forecasting; Section 3 introduces the study area and data; Section 4 presents the results and discussion; and Section 5 draws the conclusions.

2. Methodology

The primary focus of this study was on exploring a novel streamflow forecasting method under the influence of reservoir regulation. The relevant methodology and evaluation indices are described in detail below.

2.1. Framework

In this study, a novel framework for short-term streamflow forecasting was proposed using machine learning and physically based hydrological models (Figure 1). The steps of this framework are as follows.

(1) Correction of forecast precipitation:

• First, the original forecast precipitation at different lead times and the observed precipitation were unified to the same spatiotemporal resolution;
• Second, for each lead time, the errors of the original forecast precipitation at each grid point were corrected using a long short-term memory (LSTM) neural network;
• Finally, the corrected forecast precipitation was used to drive the hydrological model to forecast restored streamflow, verifying the correction accuracy and determining the effective forecast information.

(2) Operation rule extraction for cascade reservoirs:

• First, historical operation data for cascade reservoirs and multi-step forecast information were collected to build different input–output datasets;
• Second, operation rule extraction models were built by coupling convolutional LSTM (ConvLSTM), LSTM, and the multiple-input multiple-output (MIMO) strategy;
• Finally, the extraction models were driven with the different input datasets, and the best one was selected by evaluating extraction accuracy.

(3) Deterministic streamflow forecasting:

The short-term deterministic streamflow forecasting model was built by coupling forecast precipitation, the reservoir operation rule extraction model, and the hydrological model; streamflow at each station was then forecasted at different lead times to evaluate model performance and analyze the sources of forecasting errors.

(4) Probabilistic streamflow forecasting:

• First, different input–output combinations were constructed from the deterministic streamflow forecasting results;
• Second, probabilistic forecasting models were developed based on the Gaussian mixture model (GMM);
• Finally, probabilistic streamflow forecasts were obtained at each station to evaluate the models' probabilistic forecasting performance and analyze how different input–output combinations influence probabilistic forecasting accuracy.

2.2. LSTM for Forecast Precipitation Correction

Because atmospheric precipitation is highly complex, numerical weather forecast models exhibit large errors when applied locally [24], so their outputs must be corrected to meet practical needs. The general idea is to use a time-series model to learn the relationship between historical observed precipitation and each forecast precipitation product, and then to use this relationship to correct the forecast precipitation errors.

The long short-term memory (LSTM) neural network is a representative time-series model with a powerful capability to learn from sequential data [25]. To address exploding and vanishing gradients in recurrent neural networks, LSTM adds memory units to the hidden layer, each containing an input gate, a forget gate, and an output gate [26]. The three gates work together to selectively add, retain, or discard current and historical information, giving LSTM a strong capability to capture long-term dependencies. The internal computations of LSTM are as follows:

\begin{cases} F_{t} = \sigma(w_{XF} X_{t} + w_{HF} H_{t-1} + b_{F}) \\ I_{t} = \sigma(w_{XI} X_{t} + w_{HI} H_{t-1} + b_{I}) \\ C_{t} = F_{t} \odot C_{t-1} + I_{t} \odot \tanh(w_{XC} X_{t} + w_{HC} H_{t-1} + b_{C}) \\ O_{t} = \sigma(w_{XO} X_{t} + w_{HO} H_{t-1} + b_{O}) \\ H_{t} = O_{t} \odot \tanh(C_{t}) \end{cases}

(1)

where X_t denotes the input factors at time t; F_t, I_t, and O_t denote the activation values of the forget gate, input gate, and output gate at time t; C_t and H_t denote the cell state and the hidden-layer output at time t; w and b denote the weight matrices and bias vectors; ⊙ denotes element-wise multiplication; and tanh and σ denote the hyperbolic tangent and sigmoid functions.
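The gate arithmetic of Equation (1) can be traced in a few lines of NumPy. The sketch below is a forward pass only, assuming randomly initialised weights and a hypothetical input of 7 antecedent time steps with 3 features each; it does not reproduce the authors' trained correction model, and training by backpropagation is omitted. Products between gate activations are element-wise, while the weights act on the input and the previous hidden state as matrix products.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid used by the forget, input and output gates."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Hypothetical random weights for the four affine maps of Eq. (1).

    Keys mirror the subscripts in Eq. (1): 'F' (forget gate), 'I' (input gate),
    'C' (candidate cell state) and 'O' (output gate). Each value is
    (w_x, w_h, b) with shapes (n_hidden, n_in), (n_hidden, n_hidden), (n_hidden,).
    """
    rng = np.random.default_rng(seed)
    return {
        gate: (
            rng.normal(0.0, 0.1, size=(n_hidden, n_in)),
            rng.normal(0.0, 0.1, size=(n_hidden, n_hidden)),
            np.zeros(n_hidden),
        )
        for gate in "FICO"
    }


def lstm_step(x_t, h_prev, c_prev, params):
    """Advance the cell by one time step following Eq. (1)."""

    def affine(gate):
        w_x, w_h, b = params[gate]
        return w_x @ x_t + w_h @ h_prev + b

    f_t = sigmoid(affine("F"))                        # forget gate F_t
    i_t = sigmoid(affine("I"))                        # input gate I_t
    c_t = f_t * c_prev + i_t * np.tanh(affine("C"))   # cell state C_t (element-wise)
    o_t = sigmoid(affine("O"))                        # output gate O_t
    h_t = o_t * np.tanh(c_t)                          # hidden output H_t
    return h_t, c_t


def run_lstm(xs, params, n_hidden):
    """Run the cell over a sequence of input vectors; return the final (H, C)."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
    return h, c


# Usage: 7 antecedent time steps with 3 hypothetical features each.
xs = np.random.default_rng(1).random((7, 3))
params = init_params(n_in=3, n_hidden=8)
h_final, c_final = run_lstm(xs, params, n_hidden=8)
```

In a correction model, the final hidden state would typically feed a dense output layer that returns the corrected precipitation for one grid point and lead time; that layer is not shown here.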

For each LSTM-based precipitation forecast correction model, the hyperparameters to be determined include the learning rate, batch size, hidden layer dimension, dropout rate, number of antecedent input time steps, and so on. All of them are tuned using the Bayesian optimization algorithm.
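Bayesian optimization treats the validation error as an unknown function of the hyperparameters, fits a surrogate model to the trials run so far, and chooses the next trial with an acquisition rule. The sketch below is one common variant, assuming a Gaussian-process surrogate with a fixed squared-exponential kernel, the expected-improvement acquisition, and hyperparameters rescaled to [0, 1]; the paper does not state which surrogate or acquisition function was used, and `toy_validation_loss` is a hypothetical stand-in for training an LSTM and scoring it on validation data.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between the rows of a and b (inputs scaled to [0, 1])."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement for minimisation over the current best value."""
    imp = best - mean - xi
    z = imp / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + std * pdf


def bayes_opt(objective, n_dims, n_init=5, n_iter=20, n_candidates=500, seed=0):
    """Minimise objective over [0, 1]^n_dims with a GP surrogate and EI acquisition."""
    rng = np.random.default_rng(seed)
    xs = rng.random((n_init, n_dims))                      # initial random trials
    ys = np.array([objective(x) for x in xs])
    for _ in range(n_iter):
        y_norm = (ys - ys.mean()) / (ys.std() + 1e-12)     # standardise targets for the GP
        candidates = rng.random((n_candidates, n_dims))
        mean, std = gp_posterior(xs, y_norm, candidates)
        ei = expected_improvement(mean, std, y_norm.min())
        x_next = candidates[np.argmax(ei)]                 # most promising candidate
        xs = np.vstack([xs, x_next])
        ys = np.append(ys, objective(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best]


def toy_validation_loss(x):
    """Hypothetical stand-in: x[0] ~ scaled learning rate, x[1] ~ scaled dropout rate."""
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


best_x, best_y = bayes_opt(toy_validation_loss, n_dims=2)
```

In a real search each call to the objective would train and validate one LSTM, so the surrogate's job is to keep the number of such expensive calls small; libraries such as scikit-optimize or Optuna package the same loop with more robust defaults.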

2.3. Operation Rule Extraction for Cascade Reservoirs

The joint operation of cascade reservoirs is a complicated task involving multiple objectives, multiple constraints, and mutual feedback between reservoirs [25,27]. It is also affected by uncertainty in precipitation and streamflow forecasts. Thus, operation rule extraction for cascade reservoirs must consider the connectivity between reservoirs and multi-step effective forecast information.

In this study, considering the spatiotemporal information of precipitation and the state of the reservoirs, a reservoir operation rule extraction model (ConvLSTM-LSTM) was developed by coupling convolutional LSTM (ConvLSTM) and LSTM. ConvLSTM extends LSTM by replacing the matrix multiplications in its input-to-state and state-to-state transitions with convolution operations, which gives it a powerful capability to capture spatiotemporal information; it is widely used in precipitation and streamflow forecasting [28,29]. Accordingly, the computation of ConvLSTM-LSTM is divided into two stages. First, ConvLSTM captures the spatiotemporal information of precipitation between reservoirs to simulate interval streamflow, similar to a rainfall–runoff process. Then, LSTM aggregates the captured precipitation information with all other variables affecting reservoir operation to obtain the outflow.
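The ConvLSTM stage can be sketched by keeping the LSTM gate equations and swapping every matrix product for a 2-D convolution, so the states become stacks of maps over the precipitation grid. The NumPy sketch below is a simplified forward pass under stated assumptions: 3 × 3 kernels with zero padding, no peephole terms (which the original ConvLSTM formulation includes), random weights, and hypothetical 10-day precipitation grids. The spatially averaged hidden state stands for the summary that would be combined with reservoir-state variables and passed to an ordinary LSTM.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """'Same'-padded 2-D cross-correlation.

    x : (c_in, H, W) stack of maps;  w : (c_out, c_in, k, k) kernels with odd k.
    Returns a (c_out, H, W) stack.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, height, width = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))    # zero padding keeps H and W
    out = np.zeros((c_out, height, width))
    for di in range(k):
        for dj in range(k):
            patch = xp[:, di:di + height, dj:dj + width]
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], patch)
    return out


def convlstm_step(x_t, h_prev, c_prev, w_x, w_h, b):
    """One ConvLSTM step: the gates of Eq. (1) with convolutions replacing matrix products."""
    z = conv2d_same(x_t, w_x) + conv2d_same(h_prev, w_h) + b[:, None, None]
    f, i, g, o = np.split(z, 4, axis=0)                 # forget, input, candidate, output
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t


# Usage: 10 days of hypothetical interval-precipitation grids (1 channel, 6 x 5 cells).
rng = np.random.default_rng(0)
n_hidden, grid = 4, (6, 5)
w_x = rng.normal(0.0, 0.1, (4 * n_hidden, 1, 3, 3))
w_h = rng.normal(0.0, 0.1, (4 * n_hidden, n_hidden, 3, 3))
b = np.zeros(4 * n_hidden)
h = np.zeros((n_hidden, *grid))
c = np.zeros((n_hidden, *grid))
for x_t in rng.random((10, 1, *grid)):
    h, c = convlstm_step(x_t, h, c, w_x, w_h, b)
spatial_summary = h.mean(axis=(1, 2))   # one value per hidden channel
```

The order in which the four gate blocks are stacked along the output channels is an arbitrary convention of this sketch.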

By combining the ConvLSTM-LSTM model with the MIMO strategy, the operation rules for cascade reservoirs can be extracted. Treating the cascade reservoirs as a whole, the inputs are the inflow of the first reservoir, the previous outflow and initial water level of all reservoirs, the interval precipitation, and the current time, and the outputs are the current outflows of all reservoirs. The expression is as follows:

\begin{array}{l} [Q_{\mathrm{out},t}^{f,1}, Q_{\mathrm{out},t}^{f,2}, \dots, Q_{\mathrm{out},t}^{f,N}] = \\ F_{\mathrm{ConvLSTM\text{-}LSTM}}\Big(t,\ \bigcup_{j=t}^{t+L-1} Q_{\mathrm{in},j}^{f,1},\ \bigcup_{j=t-10+L}^{t-1} Q_{\mathrm{in},j}^{o,1},\ \bigcup_{i=1}^{N} Z_{\mathrm{initial},t}^{o,i},\ \Big\{\bigcup_{j=t-10}^{t-1} Q_{\mathrm{out},j}^{o,i}\Big\}_{i=1}^{N},\ \Big\{\bigcup_{j=t}^{t+L-1} P_{\mathrm{interval},j}^{f,i},\ \bigcup_{j=t-10+L}^{t-1} P_{\mathrm{interval},j}^{o,i}\Big\}_{i=1}^{N}\Big) \end{array}

(2)

where t denotes the current time; N denotes the number of reservoirs; L denotes the number of lead times; Q_{in}^{o} and Q_{in}^{f} denote the observed and forecasted inflow; Q_{out}^{o} and Q_{out}^{f} denote the observed and forecasted outflow; Z_{initial} denotes the initial water level; P_{interval} denotes the interval precipitation between reservoirs; and the superscript i indexes the reservoirs.
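Equation (2) mixes observed and forecast values inside fixed-length windows: for the first reservoir's inflow and for each interval's precipitation, observed values for days t − 10 + L to t − 1 are followed by forecasts for days t to t + L − 1, so each such window holds 10 values whatever the lead L. The sketch below assembles one flattened MIMO input sample under that reading. It is illustrative only: each interval's precipitation is reduced to a scalar series (the ConvLSTM stage would instead receive gridded fields), the current time is encoded as a plain day index, and all array names are hypothetical.

```python
import numpy as np

WINDOW = 10  # length of each antecedent window in Eq. (2)


def mixed_window(observed, forecast, t, lead):
    """Observed values for days t-10+L .. t-1 followed by forecasts for days t .. t+L-1."""
    if not 0 <= lead <= WINDOW:
        raise ValueError("lead must lie between 0 and the window length")
    past = np.asarray(observed, dtype=float)[t - WINDOW + lead:t]
    future = np.asarray(forecast, dtype=float)[t:t + lead]
    return np.concatenate([past, future])


def build_inputs(t, lead, q_in_obs, q_in_fc, z_init, q_out_obs, p_obs, p_fc):
    """Flatten one MIMO input sample of Eq. (2).

    q_in_obs, q_in_fc : (T,) observed / forecast inflow of the first reservoir
    z_init            : (N,) initial water levels at time t
    q_out_obs         : (N, T) observed outflow of every reservoir
    p_obs, p_fc       : (N, T) observed / forecast interval precipitation (one scalar per interval)
    """
    parts = [
        np.array([float(t)]),                                          # current time (day index)
        mixed_window(q_in_obs, q_in_fc, t, lead),                      # first-reservoir inflow
        np.asarray(z_init, dtype=float),                               # initial water levels
        np.asarray(q_out_obs, dtype=float)[:, t - WINDOW:t].ravel(),   # 10 past outflows each
    ]
    parts += [mixed_window(p_obs[i], p_fc[i], t, lead) for i in range(len(p_obs))]
    return np.concatenate(parts)


# Usage: 5 reservoirs, a 30-day record, forecast issued on day 20 with L = 3.
rng = np.random.default_rng(0)
T, N, t, lead = 30, 5, 20, 3
sample = build_inputs(t, lead,
                      q_in_obs=rng.random(T), q_in_fc=rng.random(T),
                      z_init=rng.random(N), q_out_obs=rng.random((N, T)),
                      p_obs=rng.random((N, T)), p_fc=rng.random((N, T)))
```

With these sizes the sample has 1 + 10 + 5 + 5 × 10 + 5 × 10 = 116 entries; the output side of the MIMO model would be the N current outflows.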

2.4. Short-Term Deterministic Streamflow Forecasting

By coupling forecast precipitation, the reservoir operation rule extraction model, and the hydrological model, a short-term deterministic streamflow forecasting method under the multi-block condition was proposed, as shown in Figure 2. The hydrological model adopted is the lumped Xin'anjiang (XAJ) model. Based on the theory of runoff generation under saturated conditions (saturation excess), XAJ sequentially calculates evapotranspiration, runoff generation, and flow routing (convergence) through sub-unit division, soil layering, runoff source separation, and division of the runoff generation and routing stages to obtain the outlet streamflow [30].
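The saturation-excess idea behind XAJ can be illustrated with its runoff generation component. The function below follows the widely used form of the XAJ tension water capacity curve, in which point storage capacity is distributed with exponent B; it covers runoff generation only (no evapotranspiration, runoff source separation, or routing), and the parameter values in the usage line are hypothetical rather than the calibrated values for the Yalong River.

```python
def xaj_runoff(pe, w0, wm, b):
    """Saturation-excess runoff (mm) for one time step of the Xin'anjiang model.

    pe : net rainfall P - E over the step (mm)
    w0 : areal mean tension water storage at the start of the step (mm)
    wm : areal mean tension water capacity (mm)
    b  : exponent of the tension water capacity distribution curve
    """
    if pe <= 0.0:
        return 0.0                                   # no net rainfall, no runoff
    w0 = min(max(w0, 0.0), wm)                       # keep storage physically valid
    wmm = wm * (1.0 + b)                             # maximum point capacity
    # ordinate on the capacity curve that corresponds to the current storage w0
    a = wmm * (1.0 - (1.0 - w0 / wm) ** (1.0 / (1.0 + b)))
    if pe + a < wmm:
        # only part of the area saturates
        return pe - (wm - w0) + wm * (1.0 - (pe + a) / wmm) ** (1.0 + b)
    return pe - (wm - w0)                            # the whole area saturates


# Usage with hypothetical parameters: 20 mm net rainfall on a partly wet basin.
runoff_mm = xaj_runoff(pe=20.0, w0=80.0, wm=120.0, b=0.3)
```

Wetter antecedent conditions (larger w0) yield more runoff from the same rainfall, and a fully saturated basin converts all net rainfall into runoff, which is the behaviour the saturation-excess assumption implies.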

The main steps of the short-term deterministic streamflow forecasting method under the multi-block condition are as follows:

Step 1: The basin is divided into forecast sub-intervals according to the spatial topological relationships, and the short-term streamflow forecasting system is constructed.

Step 2: The hydrological model is driven with forecast precipitation to forecast, at each lead time, the streamflow at the 1st station and the streamflow of each sub-interval. The streamflow at the 2nd station is then forecasted by superimposing the 1st station's forecasted streamflow, propagated along the river, and the interval forecasted streamflow. The streamflow at each hydrologic station is forecasted and propagated downstream in this way until the forecasted inflow of the first reservoir is obtained.

Step 3: The outflow of each reservoir at the 1st lead time is forecasted by the operation rule extraction model, and the final water level of each reservoir is obtained through the water balance equation and the water level–capacity curve. The forecasted outflow and final water level are then simply corrected according to the reservoir operation constraints:

\begin{cases} Q_{\mathrm{out},t}^{\min} \le Q_{\mathrm{out},t} \le Q_{\mathrm{out},t}^{\max} \\ Z_{t}^{\min} \le Z_{t} \le Z_{t}^{\max} \\ |Z_{\mathrm{final},t} - Z_{\mathrm{initial},t}| \le \Delta Z_{t} \end{cases}

(3)

where Q_{out}^{min} and Q_{out}^{max} denote the minimum and maximum outflow; Z^{min} and Z^{max} denote the minimum and maximum water level; and ΔZ denotes the maximum allowed change in water level within one time step.
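A single-reservoir, single-step version of Step 3 can be sketched as follows: clip the extracted outflow to its bounds, update storage with the water balance, convert storage to level with the water level–capacity curve, and, if the level violates Equation (3), move it to the nearest admissible value and back-calculate the outflow that produces it. The tie-breaking rule (outflow bounds take precedence when the constraints conflict), the linear interpolation of the capacity curve, and all numbers in the usage lines are assumptions for illustration, since the paper only states that the outflow and level are simply corrected.

```python
import numpy as np


def correct_step(q_in, q_out, z_init, curve_z, curve_v, dt,
                 q_min, q_max, z_min, z_max, dz_max):
    """Water balance plus the constraint correction of Eq. (3) for one reservoir.

    q_in, q_out      : forecast inflow and extracted outflow (m^3/s)
    z_init           : initial water level (m)
    curve_z, curve_v : water level-capacity curve (m, m^3), both ascending
    dt               : step length (s)
    Returns the corrected outflow (m^3/s) and final water level (m).
    """
    q_out = float(np.clip(q_out, q_min, q_max))             # outflow bounds
    v_init = np.interp(z_init, curve_z, curve_v)            # level -> storage
    z_final = float(np.interp(v_init + (q_in - q_out) * dt, curve_v, curve_z))
    # admissible level range: absolute bounds intersected with the allowed change
    z_low = max(z_min, z_init - dz_max)
    z_high = min(z_max, z_init + dz_max)
    z_target = float(np.clip(z_final, z_low, z_high))
    if z_target != z_final:
        # back-calculate the outflow that reaches z_target, then re-apply outflow bounds
        v_target = np.interp(z_target, curve_z, curve_v)
        q_out = float(np.clip(q_in - (v_target - v_init) / dt, q_min, q_max))
        z_final = float(np.interp(v_init + (q_in - q_out) * dt, curve_v, curve_z))
    return q_out, z_final


# Usage with a hypothetical linear capacity curve and a daily step.
curve_z = np.array([1100.0, 1200.0])
curve_v = np.array([0.0, 1.0e10])
q_corr, z_corr = correct_step(q_in=2000.0, q_out=500.0, z_init=1150.0,
                              curve_z=curve_z, curve_v=curve_v, dt=86400.0,
                              q_min=0.0, q_max=5000.0, z_min=1120.0, z_max=1180.0,
                              dz_max=1.0)
```

Here the uncorrected rise of about 1.3 m exceeds the 1 m limit, so the outflow is raised to roughly 843 m³/s and the level stops at 1151 m.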

Step 4: For the 1st lead time, the inflow of the remaining reservoirs and the streamflow at the remaining stations downstream of the 1st reservoir are forecasted by superimposing the corrected outflow, propagated along the river, and the interval forecasted streamflow.

Step 5: The initial water level of each reservoir at the 2nd lead time equals its final water level at the 1st lead time; these levels, together with the forecasted inflow and outflow at the 1st lead time, are used as inputs to drive the operation rule extraction model and forecast each reservoir's outflow at the 2nd lead time.

Step 6: Steps 3–5 are repeated until the streamflow at all hydrologic stations and the inflow of all reservoirs have been forecasted for every lead time.
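The recursion in Steps 3–6 amounts to a loop over lead times in which each step's final levels become the next step's initial levels. The sketch below shows that bookkeeping for a small cascade under strong simplifying assumptions: `toy_rule` is a hypothetical placeholder for the ConvLSTM-LSTM extraction model, river propagation between reservoirs is treated as instantaneous, each reservoir is prismatic (constant surface area, so the level change equals net volume divided by area), and only the outflow bounds and the level-change limit of Equation (3) are enforced.

```python
import numpy as np


def forecast_cascade(rule, q_in_first, q_interval, z0, area, q_min, q_max, dz_max, dt=86400.0):
    """Recursive multi-lead forecast for a cascade, following Steps 3-6.

    rule(q_first, z_init, q_prev) -> outflows of all N reservoirs (Step 3)
    q_in_first : (L,) forecast inflow of the first reservoir
    q_interval : (L, N) interval forecast streamflow entering each reservoir
                 (column 0 is unused because it is already in q_in_first)
    """
    n_lead, n_res = q_interval.shape
    z = np.array(z0, dtype=float)                 # current initial levels (copied)
    q_prev = np.zeros(n_res)
    in_q = np.zeros((n_lead, n_res))
    out_q = np.zeros((n_lead, n_res))
    out_z = np.zeros((n_lead, n_res))
    for l in range(n_lead):
        q_rule = np.asarray(rule(q_in_first[l], z.copy(), q_prev), dtype=float)
        for i in range(n_res):
            # Step 4: inflow = upstream corrected outflow (instantaneous routing) + interval flow
            q_in = q_in_first[l] if i == 0 else out_q[l, i - 1] + q_interval[l, i]
            q = float(np.clip(q_rule[i], q_min[i], q_max[i]))
            dz = (q_in - q) * dt / area[i]        # water balance for a prismatic reservoir
            if abs(dz) > dz_max[i]:               # Eq. (3): limit the level change
                dz = np.sign(dz) * dz_max[i]
                q = float(np.clip(q_in - dz * area[i] / dt, q_min[i], q_max[i]))
                dz = (q_in - q) * dt / area[i]
            in_q[l, i], out_q[l, i] = q_in, q
            z[i] += dz                            # Step 5: final level -> next initial level
            out_z[l, i] = z[i]
        q_prev = out_q[l].copy()
    return in_q, out_q, out_z


def toy_rule(q_first, z_init, q_prev):
    """Hypothetical placeholder: every reservoir releases 90% of the first reservoir's inflow."""
    return np.full(len(z_init), 0.9 * q_first)


# Usage: two reservoirs, five daily lead times (all values hypothetical).
z0 = np.array([1850.0, 1640.0])
area = np.array([8.0e7, 5.0e6])       # surface area, m^2
dz_max = np.array([2.0, 1.0])         # allowed level change per step, m
q_interval = np.zeros((5, 2))
q_interval[:, 1] = 100.0
in_q, out_q, out_z = forecast_cascade(
    toy_rule, np.array([1000.0, 1200.0, 1500.0, 1300.0, 1100.0]), q_interval,
    z0, area, q_min=np.zeros(2), q_max=np.full(2, 1.0e4), dz_max=dz_max)
```

In this example the small second reservoir would rise by about 1.7 m on the first day under the rule's release, so the loop raises its outflow until the rise equals the 1 m limit.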

2.5. GMM for Probabilistic Streamflow Forecasting

The Gaussian mixture model (GMM) is a probabilistic model formed by a linear combination of multiple Gaussian sub-models (components) [31]. Its basic assumption is that the data are generated from a mixture of several Gaussian distributions. A GMM with K sub-models is therefore trained by assigning each data point to the sub-models according to its membership probabilities and estimating the parameters of each sub-model [32]. The final density is obtained by linearly combining the sub-models. The expressions for GMM are as follows:

\begin{cases} p(X \mid \theta) = \sum_{k=1}^{K} \pi_{k} N(X \mid \mu_{k}, \Sigma_{k}) \\ N(X \mid \mu_{k}, \Sigma_{k}) = \frac{1}{(2\pi)^{D/2} |\Sigma_{k}|^{1/2}} \exp\left(-\frac{(X - \mu_{k})^{T} \Sigma_{k}^{-1} (X - \mu_{k})}{2}\right) \end{cases}

(4)

where X denotes the D-dimensional model input; p(X | θ) denotes the probability density function; N(X | μ_k, Σ_k) denotes the kth Gaussian component of the GMM; θ = {π, μ, Σ} denotes the GMM parameters; π_k denotes the weight of the kth sub-model, subject to \sum_{k=1}^{K} π_k = 1 and 0 ≤ π_k ≤ 1; μ_k and Σ_k denote the mean vector and covariance matrix of the kth sub-model; and K denotes the number of sub-models.
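Equation (4) can be evaluated directly. The NumPy sketch below computes the mixture density for a batch of points and checks that the weights satisfy the constraints stated above; the two-component parameters, describing hypothetical (forecasted, observed) streamflow pairs in m³/s, are illustrative and not fitted values from this study.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x | mean, cov) for the rows of x, shape (n, D)."""
    x = np.atleast_2d(x)
    d = mean.shape[0]
    diff = x - mean
    cov_inv = np.linalg.inv(cov)
    mahal = np.einsum("ni,ij,nj->n", diff, cov_inv, diff)   # squared Mahalanobis distance
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * mahal) / norm


def gmm_pdf(x, weights, means, covs):
    """Mixture density p(x | theta) = sum_k pi_k N(x | mu_k, Sigma_k), Eq. (4)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


# Usage: two hypothetical components for (forecasted, observed) streamflow pairs.
weights = [0.6, 0.4]
means = [np.array([800.0, 820.0]), np.array([3000.0, 2950.0])]
covs = [np.array([[4.0e4, 3.6e4], [3.6e4, 4.4e4]]),
        np.array([[2.5e5, 2.2e5], [2.2e5, 2.8e5]])]
density = gmm_pdf(np.array([[900.0, 950.0], [2800.0, 2700.0]]), weights, means, covs)
```

For larger problems, `scipy.stats.multivariate_normal` offers a numerically more careful density, but the explicit form above mirrors the second line of Equation (4) term by term.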

The parameters θ = {π, μ, Σ} and the number of sub-models K need to be optimized. In this study, they were optimized by combining the Akaike information criterion (AIC), the K-Means algorithm, and the Expectation–Maximization (EM) algorithm. First, a candidate range of K was specified and a value of K was assigned; second, the initial value of θ was obtained with the K-Means algorithm, θ was optimized by EM, and the AIC of the optimized GMM was calculated; finally, this process was repeated for every candidate K, and the model with the smallest AIC was selected as the final model.
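The K-Means/EM/AIC loop maps directly onto scikit-learn's `GaussianMixture`, whose `init_params="kmeans"` option (the default) seeds EM with K-Means and whose `aic` method scores a fitted model. The sketch below assumes scikit-learn is available, a candidate range of K from 1 to 6, and full covariance matrices; the two-cluster sample of (forecasted, observed) streamflow pairs is synthetic, not the Yalong River data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm(data, k_range=range(1, 7), seed=0):
    """Fit one GMM per candidate K (K-Means initialisation, EM refinement); keep the lowest AIC."""
    best_model, best_aic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              init_params="kmeans", random_state=seed)
        gmm.fit(data)                      # EM starting from K-Means assignments
        aic = gmm.aic(data)
        if aic < best_aic:
            best_model, best_aic = gmm, aic
    return best_model, best_aic


# Usage with synthetic low-flow and high-flow clusters (m^3/s).
rng = np.random.default_rng(42)
low_flow = rng.multivariate_normal([800.0, 820.0], [[4.0e4, 3.6e4], [3.6e4, 4.4e4]], size=300)
high_flow = rng.multivariate_normal([3000.0, 2950.0], [[2.5e5, 2.2e5], [2.2e5, 2.8e5]], size=300)
pairs = np.vstack([low_flow, high_flow])      # columns: forecasted, observed streamflow
model, aic = select_gmm(pairs)
```

The fitted `weights_`, `means_`, and `covariances_` attributes then supply π, μ, and Σ for the conditional distribution in Equation (5).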

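If X is split into two subsets [X^{1}, X^{2}], where X^{1} contains the input factors and X^{2} the decision (forecast) variable, the conditional density of X^{2} given X^{1} follows from Equation (4):

\begin{cases} p(X^{2} \mid X^{1}) = \frac{p(X^{1}, X^{2})}{p(X^{1})} = \sum_{k=1}^{K} w_{k} N(X^{2} \mid \mu_{k}^{2|1}, \Sigma_{k}^{2|1}) \\ w_{k} = \frac{\pi_{k} N(X^{1} \mid \mu_{k}^{1}, \Sigma_{k}^{11})}{\sum_{j=1}^{K} \pi_{j} N(X^{1} \mid \mu_{j}^{1}, \Sigma_{j}^{11})} \end{cases}

(5)

where μ_k^{1} and Σ_k^{11} are the mean and covariance of the X^{1} block of the kth component, and the conditional moments follow from standard Gaussian conditioning: μ_k^{2|1} = μ_k^{2} + Σ_k^{21}(Σ_k^{11})^{-1}(X^{1} − μ_k^{1}) and Σ_k^{2|1} = Σ_k^{22} − Σ_k^{21}(Σ_k^{11})^{-1}Σ_k^{12}.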
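Equation (5) turns a fitted joint GMM into a predictive distribution: condition every component on the observed inputs, reweight the components by how well they explain those inputs, and read prediction intervals off the resulting one-dimensional mixture. The sketch below assumes a single output variable (X² is scalar), uses standard Gaussian conditioning for each component's moments, and finds quantiles by bisection on the mixture CDF; the component parameters in the usage lines are hypothetical.

```python
import math

import numpy as np


def normal_pdf(x, mean, cov):
    """Multivariate normal density at a single point x."""
    diff = np.asarray(x, dtype=float) - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** len(mean) * np.linalg.det(cov))


def condition_gmm(x1, weights, means, covs, n1):
    """Weights, means and covariances of p(X2 | X1 = x1), Eq. (5).

    The first n1 coordinates of each component are X1 (inputs), the rest X2.
    """
    x1 = np.asarray(x1, dtype=float)
    w, mu_c, cov_c = [], [], []
    for pi_k, mu, cov in zip(weights, means, covs):
        mu1, mu2 = mu[:n1], mu[n1:]
        s11, s12 = cov[:n1, :n1], cov[:n1, n1:]
        s21, s22 = cov[n1:, :n1], cov[n1:, n1:]
        gain = s21 @ np.linalg.inv(s11)                  # Sigma21 Sigma11^-1
        mu_c.append(mu2 + gain @ (x1 - mu1))             # conditional mean
        cov_c.append(s22 - gain @ s12)                   # conditional covariance
        w.append(pi_k * normal_pdf(x1, mu1, s11))        # numerator of w_k
    w = np.array(w)
    return w / w.sum(), mu_c, cov_c


def mixture_quantile(p, w, mu_c, cov_c, lo=-1.0e7, hi=1.0e7, tol=1.0e-6):
    """p-quantile of a one-dimensional conditional mixture, by bisection on its CDF."""
    sds = [math.sqrt(c[0, 0]) for c in cov_c]

    def cdf(y):
        return sum(wk * 0.5 * (1.0 + math.erf((y - m[0]) / (s * math.sqrt(2.0))))
                   for wk, m, s in zip(w, mu_c, sds))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage: 90% interval for the observed flow given a forecast of 1000 m^3/s.
weights = [0.6, 0.4]
means = [np.array([800.0, 820.0]), np.array([3000.0, 2950.0])]
covs = [np.array([[4.0e4, 3.6e4], [3.6e4, 4.4e4]]),
        np.array([[2.5e5, 2.2e5], [2.2e5, 2.8e5]])]
w, mu_c, cov_c = condition_gmm([1000.0], weights, means, covs, n1=1)
lower_90 = mixture_quantile(0.05, w, mu_c, cov_c)
upper_90 = mixture_quantile(0.95, w, mu_c, cov_c)
```

Because both the component weights and the conditional spreads change with the input, intervals obtained this way can be asymmetric and widen or narrow with flow magnitude, which is how a GMM can represent non-normal, heteroskedastic forecast uncertainty.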

Furthermore, according to the composition of the deterministic forecasted streamflow, four GMM-based probabilistic streamflow forecasting models (GMM-FO, GMM-Fe, GMM-RPO, and GMM-RPe) can be developed. GMM-FO uses the forecasted streamflow as input and the observed streamflow as output, whose expression is as follows:

Y_{i}^{l} = p_{GMM - FO} (Q_{i, O}^{l} | Q_{i, F}^{l}), i = 1, 2, \dots, n, l = 1, 2, \dots, L

(6)

where n denotes the number of forecasting stations, l denotes the lead time, and L denotes the number of lead times.

GMM-Fe uses the forecasted streamflow as input and the forecasting error as output, whose expression is as follows:

\{\begin{cases} e_{i}^{l} = Q_{i, O}^{l} - Q_{i, F}^{l} \\ Y_{i}^{l} = p_{GMM - Fe} (e_{i}^{l} | Q_{i, F}^{l}) + Q_{i, F}^{l} \end{cases} i = 1, 2, \dots, n, l = 1, 2, \dots, L

(7)

where e_{i}^{l} denotes the forecasting error.

GMM-RPO uses the forecasted streamflow propagated from upstream and the interval forecasted streamflow as inputs and the observed streamflow as output, whose expression is as follows:

Y_{i}^{l} = p_{GMM - RPO} [Q_{i, O}^{l} | (Q_{i, R}^{l}, Q_{i, P}^{l})], i = 1, 2, \dots, n, l = 1, 2, \dots, L

(8)

where Q_{i,R}^{l} denotes the forecasted streamflow propagated from upstream, and Q_{i,P}^{l} denotes the interval forecasted streamflow obtained from the hydrological model driven by forecast precipitation.

GMM-RPe uses the forecasted streamflow propagated from upstream and the interval forecasted streamflow as inputs and the forecasting error as output, whose expression is as follows:

\{\begin{cases} e_{i}^{l} = Q_{i, O}^{l} - Q_{i, F}^{l} \\ Y_{i}^{l} = p_{GMM - RPe} [e_{i}^{l} | (Q_{i, R}^{l}, Q_{i, P}^{l})] + Q_{i, F}^{l} \end{cases} i = 1, 2, \dots, n, l = 1, 2, \dots, L

(9)
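The four variants differ only in which columns form the joint sample on which the GMM is fitted. The sketch below builds those samples for one station and lead time, assuming, as in the superposition used in Steps 2 and 4, that the deterministic forecast is the sum of the upstream propagated forecast and the interval forecast; for the error-based variants, quantiles of the conditional error distribution are shifted back by Q_F, as in Equations (7) and (9). The function names and sample values are hypothetical.

```python
import numpy as np


def build_samples(q_obs, q_routed, q_interval):
    """Joint [inputs..., output] samples for the four GMM variants, Eqs. (6)-(9).

    q_obs      : observed streamflow Q_O
    q_routed   : forecasted streamflow propagated from upstream, Q_R
    q_interval : interval forecasted streamflow, Q_P
    """
    q_obs = np.asarray(q_obs, dtype=float)
    q_routed = np.asarray(q_routed, dtype=float)
    q_interval = np.asarray(q_interval, dtype=float)
    q_fc = q_routed + q_interval                  # deterministic forecast Q_F (superposition)
    err = q_obs - q_fc                            # forecasting error e = Q_O - Q_F
    return {
        "GMM-FO": np.column_stack([q_fc, q_obs]),
        "GMM-Fe": np.column_stack([q_fc, err]),
        "GMM-RPO": np.column_stack([q_routed, q_interval, q_obs]),
        "GMM-RPe": np.column_stack([q_routed, q_interval, err]),
    }


def errors_to_streamflow(error_quantiles, q_fc):
    """Shift quantiles of the conditional error distribution back to streamflow (Eqs. (7), (9))."""
    return np.asarray(error_quantiles, dtype=float) + q_fc


# Usage with a few hypothetical daily values (m^3/s).
samples = build_samples(q_obs=[1210.0, 1480.0, 1350.0],
                        q_routed=[950.0, 1200.0, 1100.0],
                        q_interval=[220.0, 260.0, 240.0])
band = errors_to_streamflow([-90.0, 110.0], q_fc=1170.0)
```

Each array in `samples` would be passed to the AIC-based fitting procedure, with the last column treated as X² in Equation (5).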

2.6. Evaluation Indices

The accuracy of streamflow forecasts can be evaluated from deterministic and probabilistic perspectives. The Nash–Sutcliffe efficiency (NSE), mean relative error (MRE), and root mean square error (RMSE) were selected as deterministic forecast evaluation indices. Their expressions are as follows:

NSE = 1 - \frac{\sum_{i=1}^{n} (Q_{i,O} - Q_{i,F})^{2}}{\sum_{i=1}^{n} (Q_{i,O} - \bar{Q}_{O})^{2}}

(10)

MRE = \frac{1}{n} \sum_{i=1}^{n} \frac{|Q_{i,F} - Q_{i,O}|}{Q_{i,O}} \times 100\%

(11)

RMSE = \sqrt{\frac{\sum_{i=1}^{n} (Q_{i,F} - Q_{i,O})^{2}}{n}}

(12)

where n denotes the length of the series, Q_{i,O} and Q_{i,F} denote the observed and forecasted streamflow, and \bar{Q}_{O} denotes the mean of the observed streamflow.
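Equations (10)–(12) translate directly into NumPy. The sketch below assumes paired, gap-free series and strictly positive observations (MRE divides by them); the four-day values in the usage lines are hypothetical.

```python
import numpy as np


def nse(q_obs, q_fc):
    """Nash-Sutcliffe efficiency, Eq. (10)."""
    q_obs, q_fc = np.asarray(q_obs, dtype=float), np.asarray(q_fc, dtype=float)
    return 1.0 - np.sum((q_obs - q_fc) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)


def mre(q_obs, q_fc):
    """Mean relative error in percent, Eq. (11); assumes all observations are positive."""
    q_obs, q_fc = np.asarray(q_obs, dtype=float), np.asarray(q_fc, dtype=float)
    return np.mean(np.abs(q_fc - q_obs) / q_obs) * 100.0


def rmse(q_obs, q_fc):
    """Root mean square error, Eq. (12), in the units of the streamflow (m^3/s)."""
    q_obs, q_fc = np.asarray(q_obs, dtype=float), np.asarray(q_fc, dtype=float)
    return np.sqrt(np.mean((q_fc - q_obs) ** 2))


# Usage with hypothetical values (m^3/s).
obs = [1000.0, 1500.0, 2000.0, 2500.0]
fc = [1100.0, 1400.0, 2100.0, 2400.0]
scores = (nse(obs, fc), mre(obs, fc), rmse(obs, fc))
```

Every forecast here misses by 100 m³/s, so RMSE = 100 m³/s and NSE = 0.968, while MRE (about 6.42%) weights the same absolute error more heavily on low-flow days.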

Probabilistic forecasting is evaluated by the interval coverage probability (ICP), interval normalized average width (INAW), and coverage width criterion (CWC). Their expressions are as follows:

ICP = \frac{1}{n} \sum_{i=1}^{n} \kappa_{i}, \quad \kappa_{i} = \begin{cases} 1, & Q_{i,O} \in [Q_{i,F}^{Low}, Q_{i,F}^{Up}] \\ 0, & Q_{i,O} \notin [Q_{i,F}^{Low}, Q_{i,F}^{Up}] \end{cases}

(13)

INAW = \frac{1}{nR} \sum_{i=1}^{n} (Q_{i,F}^{Up} - Q_{i,F}^{Low})

(14)

CWC = INAW \times [1 + \gamma \, e^{-\eta (ICP - \alpha)}], \quad \gamma = \begin{cases} 0, & ICP \geq \alpha \\ 1, & ICP < \alpha \end{cases}

(15)

where Q_{i,F}^{Low} and Q_{i,F}^{Up} denote the lower and upper limits of the forecasted streamflow interval; κ_i is a Boolean variable indicating whether the observed streamflow falls within the interval; R denotes the range of the observed streamflow; α denotes the confidence level of the forecasted streamflow interval; γ is a switch that activates the penalty term only when ICP falls below α; and η is a penalty factor that weights the coverage shortfall relative to INAW (η = 1 in this study).
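Equations (13)–(15) can be computed with a few vectorised NumPy lines. The sketch below counts coverage against the observed streamflow, normalises the mean interval width by the observed range R, and adds the exponential penalty only when coverage falls short of the nominal level α; the default α = 0.9 is an assumption for illustration, while η = 1 follows this study. The interval values in the usage lines are hypothetical.

```python
import numpy as np


def icp(q_obs, lower, upper):
    """Interval coverage probability, Eq. (13): share of observations inside [lower, upper]."""
    q_obs = np.asarray(q_obs, dtype=float)
    return np.mean((q_obs >= np.asarray(lower, dtype=float)) & (q_obs <= np.asarray(upper, dtype=float)))


def inaw(q_obs, lower, upper):
    """Interval normalised average width, Eq. (14): mean width divided by the observed range R."""
    q_obs = np.asarray(q_obs, dtype=float)
    r = q_obs.max() - q_obs.min()
    return np.mean(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)) / r


def cwc(q_obs, lower, upper, alpha=0.9, eta=1.0):
    """Coverage width criterion, Eq. (15); the penalty applies only when ICP < alpha."""
    p = icp(q_obs, lower, upper)
    gamma = 1.0 if p < alpha else 0.0
    return inaw(q_obs, lower, upper) * (1.0 + gamma * np.exp(-eta * (p - alpha)))


# Usage with hypothetical intervals (m^3/s); the second observation falls below its interval.
obs = [100.0, 200.0, 300.0, 400.0]
lower = [90.0, 210.0, 250.0, 350.0]
upper = [120.0, 260.0, 320.0, 450.0]
metrics = (icp(obs, lower, upper), inaw(obs, lower, upper), cwc(obs, lower, upper))
```

Here ICP = 0.75 and INAW = 62.5 / 300 ≈ 0.208; because coverage is below the assumed α = 0.9, CWC is inflated by the factor 1 + e^{0.15}.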

3. Study Area and Data

Taking the Yalong River as an example, the basic information of the study area and data are briefly introduced as follows.

3.1. Study Area

The Yalong River (Figure 3), the largest tributary of the Jinsha River, has a total length of 1571 km and a drainage area of approximately 136,000 km² [33,34]. Its water resources are abundant, with annual precipitation concentrated from June to October and a mean annual streamflow of 1800 m³/s. To measure streamflow in real time, control hydrologic stations, including Ganzi, Yajiang, and Maidilong, are set up from upstream to downstream. Furthermore, the large natural drop makes its hydropower resources very rich, accounting for about 14% of those of the Yangtze River. Accordingly, a 22-level cascade of hydropower plants has been planned on the main stream. Five cascade reservoirs, Jinping–I, Jinping–II, Guandi, Ertan, and Tongzilin, have been built and have been in operation for a long time. Among them, Jinping–I and Ertan have seasonal or higher regulation capacity, while the others have daily regulation capacity. The joint operation of the cascade reservoirs effectively enhances water resource utilization efficiency in the Yalong River, but it also brings new challenges to streamflow forecasting.

3.2. Data Used

In this study, observed precipitation, forecast precipitation for the following 1–7 days, observed and restored streamflow at hydrologic stations, and historical reservoir operation data from 2016 to 2020 were used. Their basic information is presented in Table 1.

(1): IMERG

The Integrated Multi-satellite Retrievals for the Global Precipitation Measurement (GPM) mission (IMERG) is a typical satellite-based precipitation product. It provides precipitation data with a finest temporal resolution of 0.5 h and a spatial resolution of 0.1°. In this study, the GPM IMERG Final Precipitation V06 product was selected.

(2): Forecast precipitation

Forecast precipitation for the following 1–7 days was derived from the ECMWF (European Centre for Medium-Range Weather Forecasts) product. The product is the control forecast precipitation interpolated to a spatial resolution of 0.1°, downloaded from the TIGGE (THORPEX Interactive Grand Global Ensemble) system.

(3): Reservoir operation data

The daily inflow, outflow, and water level of the cascade reservoirs were collected. These data are accurately recorded by Yalong River Hydropower Development Co., Ltd. (Chengdu, China) and cover in detail the various operation scenarios of the cascade reservoirs in recent years.

(4): Streamflow of hydrologic station

In this study, daily observed streamflow was obtained from daily monitoring at the hydrologic stations. The restored streamflow, calculated with a restoration method that removes the influence of reservoir operation, was used to verify precipitation forecast accuracy directly. These data are managed and provided by Yalong River Hydropower Development Co., Ltd. (Chengdu, China) and are complete and detailed.

4. Results and Discussion

In this study, the error of the forecast precipitation product was corrected, the operation rule for cascade reservoirs was extracted, streamflow of each section under the multi-block condition was forecasted, and forecast uncertainty was analyzed. Detailed results and discussion are shown below.

4.1. Task I: Correction of Forecast Precipitation Based on IMERG

Precipitation is a crucial factor in streamflow formation [35]. In this study, precipitation forecast errors were corrected to provide high-accuracy input for streamflow forecasting, so the effectiveness of the correction should be judged by streamflow forecasting accuracy. To evaluate the results directly through the rainfall–runoff relationship, restored streamflow unaffected by reservoir operation must be used.

Because Tongzilin is the final station of the Yalong River, its restored streamflow forecasting accuracy can represent the streamflow forecasting accuracy of the whole river and can therefore be used to verify the accuracy of precipitation forecasts over the whole basin. Thus, the XAJ model was driven by IMERG precipitation to simulate the restored streamflow at Tongzilin. With 2019–2020 as the validation period and the remaining years as the calibration period, the results are shown in Table 2.

The restored streamflow at Tongzilin simulated with IMERG is accurate. During both the calibration and validation periods, NSE is above 0.96, MRE is below 9%, and RMSE is below 250 m³/s. This indicates that the simulation error of restored streamflow is small and that the calibrated model can be used to verify the accuracy of forecast precipitation.

Afterward, LSTM was used to correct the error of ECMWF forecast precipitation based on IMERG, with 2019–2020 as the test period and the remaining years as the training period. The XAJ model was then driven by the corrected forecast precipitation to forecast restored streamflow at Tongzilin for the following 1–7 days. The results are compared with those obtained from the raw ECMWF forecast precipitation for 2019–2020 in Figure 4.
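Before an LSTM can be trained for a given lead time, each raw forecast must be paired with the IMERG value on its valid day, and the samples split by year. The sketch below shows one way to prepare such samples, assuming a sliding window of recent raw forecasts as the input sequence and splitting by the valid year. The window length, the data layout, and the name `build_samples` are hypothetical, and the LSTM itself is omitted.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Sample = Tuple[List[float], float]


def build_samples(
    issue_dates: List[date],      # consecutive daily issue dates (assumed gap-free)
    raw_forecast: List[float],    # raw forecast issued on issue_dates[i] for lead_days ahead
    imerg: Dict[date, float],     # IMERG precipitation keyed by valid date
    lead_days: int,
    window: int = 5,              # hypothetical LSTM sequence length
    test_years: Tuple[int, ...] = (2019, 2020),
) -> Tuple[List[Sample], List[Sample]]:
    """Pair a window of recent raw forecasts with the IMERG value on the valid day,
    splitting samples into training and test sets by the valid year."""
    train: List[Sample] = []
    test: List[Sample] = []
    for i in range(window - 1, len(issue_dates)):
        valid_date = issue_dates[i] + timedelta(days=lead_days)
        if valid_date not in imerg:
            continue  # no IMERG target available for this day
        sample = (raw_forecast[i - window + 1 : i + 1], imerg[valid_date])
        if valid_date.year in test_years:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Toy example: 14 issue days spanning the 2018/2019 boundary, lead time of 2 days.
days = [date(2018, 12, 25) + timedelta(days=k) for k in range(14)]
fc = [float(k) for k in range(14)]
obs = {d + timedelta(days=2): 10.0 + k for k, d in enumerate(days)}
train_set, test_set = build_samples(days, fc, obs, lead_days=2, window=3)
print(len(train_set), len(test_set))  # 3 9
```

One such sample set would be built per lead time (1–7 days), so that each lead time gets its own correction.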

As shown in Figure 4, restored streamflow forecasts based on corrected forecast precipitation are significantly more accurate than those based on ECMWF, with NSE above 0.85, MRE below 20%, and RMSE below 600 m³/s at all lead times. The Yalong River basin is vast and its terrain varies sharply, which makes the mechanisms of precipitation and streamflow generation exceptionally complex. Although ECMWF is widely applicable, it targets precipitation forecasting at the global scale and cannot finely resolve the effects of complex topography and weather systems on precipitation in the Yalong River. Consequently, forecast accuracy based on ECMWF is poor, with NSE below 0.4, MRE above 60%, and RMSE below 1300 m³/s at different lead times. In contrast, LSTM, with its strong capability for processing time series, can effectively capture the relationship between forecast precipitation and IMERG and correct errors at each lead time, which substantially improves forecast accuracy. Furthermore, an NSE above 0.7 indicates that restored streamflow is forecasted effectively. Therefore, the corrected forecast precipitation for the following 1–7 days is sufficiently accurate for subsequent analyses.

4.2. Task II: Extraction of Cascade Reservoir Operation Rules

Based on Equation (2), operation rule extraction models for cascade reservoirs were constructed with different numbers of future time steps. Using 2019–2020 as the test period and the remaining years as the training period, each model was trained for 200 iterations with the Bayesian optimization algorithm, taking IMERG precipitation, inflow, and time period as inputs and outflow as output. Here, ‘F1’ denotes an extraction model that considers future precipitation and streamflow information for the following day, ‘F2’ denotes one that considers such information for the next 2 days, and the other labels have similar meanings. The results are shown in Figure 5.
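To make the F1–F7 labels concrete, the sketch below assembles one training sample per day from current-day inputs plus k days of future precipitation and inflow, with a time-period code as an extra feature and outflow as the target. It is a hypothetical flattening for illustration only: the paper's models (ConvLSTM for gridded precipitation coupled with LSTM) take spatiotemporal tensors rather than flat vectors, and the basin-average precipitation, the month code, and the name `rule_samples` are assumptions.

```python
from typing import List, Tuple


def rule_samples(
    precip: List[float],   # basin-average IMERG precipitation per day (the paper uses grids)
    inflow: List[float],   # cascade inflow per day
    period: List[int],     # time-period code per day (assumed, e.g. month of year)
    outflow: List[float],  # cascade outflow per day, the training target
    k: int,                # future horizon: k=1 corresponds to 'F1', k=3 to 'F3'
) -> List[Tuple[List[float], float]]:
    """Build (features, target) pairs with k days of future precipitation and inflow."""
    samples = []
    for t in range(len(outflow) - k):
        future_p = precip[t + 1 : t + 1 + k]
        future_q = inflow[t + 1 : t + 1 + k]
        x = [precip[t], inflow[t], float(period[t])] + future_p + future_q
        samples.append((x, outflow[t]))
    return samples


p = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
q = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
months = [6] * 6
out = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
f3 = rule_samples(p, q, months, out, k=3)
print(len(f3), len(f3[0][0]))  # 3 9
```

The input size grows by two features per extra future day, which illustrates how a larger horizon adds model complexity.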

As can be seen in Figure 5, all seven models (F1–F7) exhibit good extraction performance. Notably, extraction accuracy does not always improve as more future information is added. Following the actual operation mode of the cascade, the proposed method treats the cascade reservoirs as a whole and extracts their operation rules in a single model. When a limited amount of accurate future information is used as input, the regulation capacity of each reservoir can be fully mobilized and operational needs can be met, so that the comprehensive benefits of the cascade are fully realized. Thus, adding future information in moderation can improve extraction accuracy. In practice, however, only near-term future information strongly affects current operational decisions. Including excessive non-critical future information does not improve extraction performance but increases model complexity, which lowers accuracy. In terms of overall accuracy, adding precipitation and streamflow information for the following 3 days (F3) is the most suitable choice for extracting operation rules for the cascade reservoirs.

4.3. Task III: Simulation of Interval Streamflow

According to the geographic distribution of the three controlling hydrological stations and five reservoirs (Figure 3), the Yalong River is divided into eight forecast sub-intervals. For each sub-interval, the XAJ model parameters were calibrated with IMERG precipitation as input and streamflow as output. Because observed interval streamflow is unavailable, observed streamflow or reservoir inflow was used for verification. The results were evaluated by NSE, MRE, and RMSE, as shown in Table 3.
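For readers unfamiliar with the XAJ model, its core runoff-generation step (saturation excess over a parabolic tension-water capacity curve) can be sketched as follows. This covers only that one component: evapotranspiration is assumed to be already subtracted, and the three-layer evaporation, runoff separation, and routing modules are omitted. The parameter values in the example are illustrative, not the calibrated values behind Table 3.

```python
from typing import Tuple


def xaj_runoff(pe: float, w: float, wm: float, b: float) -> Tuple[float, float]:
    """Saturation-excess runoff of the XAJ model for one time step.

    pe: net rainfall P - E (mm); w: current areal tension-water storage (mm);
    wm: areal mean tension-water capacity (mm); b: exponent of the capacity curve.
    Returns (runoff R in mm, updated storage W in mm).
    """
    if pe <= 0.0:
        # Simplification: no runoff; the deficit is taken directly from storage.
        return 0.0, max(w + pe, 0.0)
    wmm = wm * (1.0 + b)  # maximum point tension-water capacity
    a = wmm * (1.0 - (1.0 - w / wm) ** (1.0 / (1.0 + b)))  # capacity-curve ordinate for w
    if pe + a < wmm:
        r = pe - (wm - w) + wm * (1.0 - (pe + a) / wmm) ** (1.0 + b)
    else:
        r = pe - (wm - w)  # the whole area is saturated
    return r, min(w + pe - r, wm)


# Illustrative parameters only.
r, w_new = xaj_runoff(pe=50.0, w=60.0, wm=120.0, b=0.3)
print(round(r, 1), round(w_new, 1))  # 11.5 98.5
```

The calibration described above would tune WM, B, and the model's other parameters so that the simulated section flows match the observations.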

As can be seen from Table 3, the XAJ model performs well in simulating streamflow for all sub-intervals. Except at Ganzi hydrologic station, NSE exceeds 0.95, MRE is below 9%, and RMSE is below 200 m³/s for every sub-interval, indicating small simulation errors and close agreement with observed streamflow. The area upstream of Ganzi lies at high altitude and is snow-covered, so part of its streamflow is generated by snowmelt [36]. The lumped XAJ model cannot fully represent snowmelt runoff, which introduces some error into the simulation at Ganzi. Even so, its accuracy remains high, with NSE above 0.86 and RMSE below 120 m³/s, and it reproduces the main trends of the observed streamflow at Ganzi Station well. Therefore, the streamflow simulation of each sub-interval is sufficiently accurate, and the calibrated XAJ parameters can be used in subsequent analyses.

4.4. Task IV: Deterministic Streamflow Forecasting Under Different Lead Times

Following Figure 2, short-term deterministic streamflow forecasts for the multi-block basin were obtained at different lead times by coupling the corrected forecast precipitation, the XAJ model of each sub-interval, and the operation rule extraction model for the cascade reservoirs. Although the corrected forecast precipitation remains effective for 7 days, the operation rule extraction model requires forecast precipitation and inflow for the following 3 days as inputs. Taking the input requirements of both models into account, the lead time of short-term streamflow forecasting was set to 5 days. NSE, MRE, and RMSE of the deterministic forecasts for each section at different lead times from 2019 to 2020 are shown in Figure 6.

At every forecast section and lead time, NSE of the deterministic streamflow forecasts exceeds 0.83 and MRE is below 25%. This indicates that the proposed method achieves good accuracy at different lead times and that forecasted streamflow is highly correlated with observed streamflow at each section, accurately capturing its main trends. Notably, forecasting accuracy decreases as lead time increases at all sections, whereas no comparable pattern appears along the river. This may be related to the climate characteristics of the Yalong River and the streamflow composition of each forecast section.

Ganzi, the first section of the short-term streamflow forecasting system in this study, is located in the high-altitude upper reaches of the Yalong River, where streamflow originates primarily from rainfall and snowmelt. However, because the corrected forecast precipitation has limited skill for snowfall and the lumped XAJ model cannot adequately describe snowmelt runoff, forecasting errors at Ganzi station are relatively large, with MRE ranging from 20% to 30% at different lead times. In addition, uncertainty in forecast precipitation grows with lead time, causing a steady decline in forecasting accuracy at Ganzi station.

The forecasted streamflow of each downstream section is the sum of the river propagation streamflow of the previous section's forecasted streamflow and the interval forecasted streamflow, so it is inevitably affected by the forecasting errors at Ganzi. However, streamflow in the downstream sections is generated mainly by rainfall. The XAJ model simulates downstream interval streamflow with high accuracy, its parameters accurately represent downstream runoff generation, and the corrected forecast precipitation is highly accurate in the downstream intervals. Together, these factors partly offset the effect of Ganzi's forecasting errors on the downstream sections, keeping the NSE of Yajiang, Maidilong, and Jinping–I between 0.87 and 0.91 at different lead times.

The forecasted inflow of the four downstream reservoirs (Jinping–II, Guandi, Ertan, and Tongzilin) is formed by superimposing the river propagation streamflow of the upstream reservoir's forecasted outflow and the interval forecasted streamflow. For these reservoirs, the propagated upstream outflow constitutes a very large proportion of the forecasted inflow. When the operation rule extraction model is sufficiently accurate, it can effectively limit the propagation of upstream forecasting errors to downstream sections. Thus, the inflow forecasting accuracy of the four reservoirs is high, with NSE above 0.9 at lead times of 1–3 days. This indicates that, once the inflow forecast of Jinping–I and the interval precipitation forecast reach a certain accuracy, the proposed operation rule extraction model can effectively improve inflow forecasting accuracy for the downstream reservoirs. However, as forecasting errors accumulate with lead time, the accuracy of the forecasted reservoir outflow decreases, resulting in relatively large inflow forecasting errors at a lead time of 5 days.
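The superposition described above (downstream inflow = propagated upstream outflow + interval streamflow) can be sketched as a loop over the cascade. The river propagation method is not specified in this section, so Muskingum routing is used here purely as an assumed stand-in. The reservoir outflows would come from the operation rule extraction model, which is not reproduced. The reach parameters, flows, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Series = List[float]

# Reservoirs from upstream to downstream, as listed in the text (ASCII names).
CASCADE = ["Jinping-I", "Jinping-II", "Guandi", "Ertan", "Tongzilin"]


def muskingum(inflow: Series, k: float, x: float, dt: float = 1.0) -> Series:
    """Route a hydrograph with the Muskingum method; initial outflow = first inflow.

    Coefficients stay non-negative when 2*k*x <= dt <= 2*k*(1 - x).
    """
    denom = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / denom
    c1 = (dt + 2.0 * k * x) / denom
    c2 = (2.0 * k * (1.0 - x) - dt) / denom  # c0 + c1 + c2 == 1
    out = [inflow[0]]
    for t in range(1, len(inflow)):
        out.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * out[-1])
    return out


def downstream_inflows(
    outflow: Dict[str, Series],             # forecasted outflow of each reservoir
    interval: Dict[str, Series],            # interval forecasted streamflow per reservoir
    reach: Dict[str, Tuple[float, float]],  # hypothetical (k, x) of the reach above each reservoir
) -> Dict[str, Series]:
    """Inflow of each downstream reservoir = routed upstream outflow + interval streamflow."""
    inflow: Dict[str, Series] = {}
    for up, down in zip(CASCADE, CASCADE[1:]):
        k, x = reach[down]
        routed = muskingum(outflow[up], k, x)
        inflow[down] = [r + q for r, q in zip(routed, interval[down])]
    return inflow


# Toy example with made-up flows (m³/s) and reach parameters.
outflow = {n: [1000.0, 3000.0, 2000.0, 1000.0, 1000.0] for n in CASCADE}
interval = {n: [100.0] * 5 for n in CASCADE[1:]}
reach = {n: (1.0, 0.2) for n in CASCADE[1:]}
print([round(v) for v in downstream_inflows(outflow, interval, reach)["Guandi"]])
# [1100, 1562, 2514, 1965, 1300]
```

Because the routed upstream term dominates the sum, any error in the forecasted outflow passes almost directly into the downstream inflow. This is why the accuracy of the rule extraction model matters so much for the downstream reservoirs.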

In summary, although streamflow forecasts at each section contain some errors, NSE exceeds 0.83 at all lead times, indicating high forecast accuracy that meets practical streamflow forecasting needs.

4.5. Task V: Probabilistic Streamflow Forecasting Under Different Lead Times

Based on the deterministic forecasting results at each section, GMM-based probabilistic streamflow forecasts were obtained at different lead times for 2019–2020, using single-value forecasted streamflow, interval forecasted streamflow, forecasted outflow, observed streamflow, and single-value forecasting errors in different input–output combinations. To verify the performance of the GMM models, a normal distribution model of single-value forecasting errors (N(e)) was developed for comparison. A 90% confidence level was selected, and accuracy was evaluated by ICP, INAW, and CWC; the results are shown in Figure 7.
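The three interval indices can be computed as in the sketch below. The definitions are assumptions based on common usage in the prediction-interval literature: coverage probability, mean width normalized by the observed range, and a coverage width-based criterion that penalizes INAW only when coverage falls below the nominal level μ. Here μ = 0.9 matches the 90% confidence level, while the penalty factor η = 50 is a hypothetical choice.

```python
import math
from typing import Sequence


def icp(obs: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Interval coverage probability: share of observations inside their intervals."""
    hits = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return hits / len(obs)


def inaw(obs: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Interval normalized average width: mean width over the observed range (assumed normalizer)."""
    value_range = max(obs) - min(obs)
    return sum(up - lo for lo, up in zip(lower, upper)) / (len(obs) * value_range)


def cwc(obs: Sequence[float], lower: Sequence[float], upper: Sequence[float],
        mu: float = 0.9, eta: float = 50.0) -> float:
    """Coverage width-based criterion: INAW, penalized only when ICP falls below mu."""
    p = icp(obs, lower, upper)
    width = inaw(obs, lower, upper)
    gamma = 1.0 if p < mu else 0.0
    return width * (1.0 + gamma * math.exp(-eta * (p - mu)))


obs = [100.0, 200.0, 300.0, 400.0, 500.0]
lower = [90.0, 180.0, 310.0, 380.0, 470.0]
upper = [110.0, 220.0, 330.0, 420.0, 530.0]
print(icp(obs, lower, upper), inaw(obs, lower, upper))  # 0.8 0.09
```

Under this form, a model that reaches the nominal coverage has CWC equal to its INAW, so narrower intervals are rewarded only once coverage is adequate.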

Figure 7 shows that GMM-RPO has the best overall performance at the 90% confidence level, and that using the interval forecasted streamflow and the river propagation streamflow of the previous section's forecasted streamflow as inputs effectively improves probabilistic forecasting accuracy. A wide interval at a given confidence level covers many observed streamflow points, yielding a large ICP but also a large INAW, which reflects high forecasting uncertainty. Thus, the best probabilistic forecast covers as many observed points as possible with as narrow an interval as possible. On this basis, the possible reasons for the performance differences among the models are as follows.

The normal distribution model N(e) derives intervals at different confidence levels from a normal distribution fitted to streamflow forecasting errors. It assigns the same interval width to all forecasted streamflow points regardless of magnitude and therefore covers many observed points, giving it a larger ICP than the other models. However, its intervals are too wide, resulting in a larger INAW than the other models; this degrades its overall performance and makes its CWC larger than that of the other models.
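A minimal sketch of the N(e) benchmark, assuming errors are defined as observed minus forecasted streamflow. A normal distribution is fitted to the training-period errors, and the same two-sided quantiles are added to every new forecast. The function name and toy data are hypothetical. Because the offsets do not depend on forecast magnitude, every interval has the same width, which is the limitation discussed above.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Sequence, Tuple


def normal_error_intervals(
    train_obs: Sequence[float],
    train_fc: Sequence[float],
    new_fc: Sequence[float],
    confidence: float = 0.9,
) -> List[Tuple[float, float]]:
    """Intervals from a normal distribution fitted to errors e = observed - forecasted."""
    errors = [o - f for o, f in zip(train_obs, train_fc)]
    dist = NormalDist(mean(errors), stdev(errors))
    alpha = 1.0 - confidence
    lo_off = dist.inv_cdf(alpha / 2.0)        # the same offsets apply to every forecast,
    hi_off = dist.inv_cdf(1.0 - alpha / 2.0)  # so all intervals share one width
    return [(f + lo_off, f + hi_off) for f in new_fc]


intervals = normal_error_intervals([110.0, 90.0, 120.0, 80.0], [100.0] * 4, [500.0, 5000.0])
print([round(hi - lo, 2) for lo, hi in intervals])  # [60.06, 60.06]
```

A low flow of 500 m³/s and a high flow of 5000 m³/s receive identical interval widths here, illustrating why N(e) cannot represent heteroskedastic uncertainty.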

GMM-Fe constructs Gaussian mixture distributions between forecasted streamflow and forecasting errors at different magnitudes, which avoids assigning the same interval width to all forecasted points. Thus, GMM-Fe has a small INAW at each forecast section. However, because forecasting errors are highly random, its narrow intervals fail to cover many observed streamflow points. This makes its ICP small and limits its forecasting performance. GMM-RPe further distinguishes the sources of streamflow forecasting errors, which improves interval coverage and makes it superior to GMM-Fe in terms of ICP. Nevertheless, the accuracy of GMM-RPe still leaves room for improvement.

GMM-FO builds the Gaussian mixture distribution between forecasted and observed streamflow at different magnitudes based on their correlation. This makes GMM-FO highly accurate and superior to the other three models in most cases. GMM-RPO further decomposes the forecasted streamflow at each section into its components, which yields higher interval coverage, narrower intervals, and better accuracy. Therefore, the overall performance of GMM-RPO is better than that of the other four models.

Comparing the four GMM-based models shows that using the interval forecasted streamflow and the river propagation streamflow of the previous section's forecasted streamflow as inputs helps improve forecasting accuracy. In general, the sources and degree of forecasting uncertainty differ across sections and flow magnitudes. GMM-RPO and GMM-RPe take these two components as inputs, so they can account for the sources of forecasting errors in each period in detail and obtain a more refined Gaussian mixture distribution. Thus, their forecasting accuracy is higher than that of their counterparts GMM-FO and GMM-Fe, respectively.

Lower deterministic forecasting accuracy implies higher forecasting uncertainty, and accuracy decreases as lead time increases. Therefore, the probabilistic forecasting results at Tongzilin for a lead time of 5 days are plotted at 80%, 90%, and 95% confidence levels to compare the models' performance visually, as shown in Figure 8.

Among the five models, GMM-RPO describes streamflow forecasting uncertainty most reasonably and accurately. The right panel of Figure 8 shows that the points are concentrated in the lower left corner (low-value region) and dispersed in the upper right corner (high-value region), with smaller forecasting errors in the lower left than in the upper right. In addition, the relationship between forecasted and observed inflow is irregularly distributed. This suggests that inflow forecasting uncertainty is characterized by “non-normality” and “heteroskedasticity”. Specifically, high flows are harder to forecast than low flows, and their uncertainty is correspondingly larger; large forecasting errors likewise indicate large uncertainty. The normal distribution model N(e) assigns the same interval width to all forecasted inflow points regardless of magnitude. Such uniform intervals contain too much redundant information for decision-makers to act quickly and rationally, and they cannot characterize how forecast uncertainty varies. GMM instead constructs different probability distributions for inflow forecasting errors at different magnitudes by linearly combining K Gaussian sub-models, so that a forecast interval can be established for each forecasted inflow value. In the right panel of Figure 8, the GMM forecast intervals show an irregular shape at each confidence level, indicating that GMM can reasonably describe the “non-normality” of forecasting uncertainty. Furthermore, Figure 8 shows that the GMM forecast intervals are wider in the flood season than in the non-flood season and widen as inflow magnitude or forecasting error increases, indicating that GMM can reasonably describe the “heteroskedasticity” of forecasting uncertainty. Thus, GMM characterizes streamflow forecasting uncertainty well.
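The mechanism described above, K Gaussian components whose conditional mixture changes with forecast magnitude, can be illustrated with Gaussian mixture regression on a bivariate (forecasted, observed) mixture, in the spirit of GMM-FO. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the mixture has already been fitted (for example by EM), uses invented two-component parameters, and finds quantiles by bisection on the conditional mixture CDF.

```python
import math
from statistics import NormalDist
from typing import List, NamedTuple, Sequence, Tuple


class Component(NamedTuple):
    """One bivariate Gaussian component over (forecasted f, observed o) streamflow."""
    weight: float
    mean_f: float
    mean_o: float
    var_f: float
    var_o: float
    cov_fo: float  # assumed to give a valid covariance (cov_fo**2 < var_f * var_o)


def conditional_mixture(comps: Sequence[Component], f: float) -> List[Tuple[float, float, float]]:
    """Return (weight, mean, std) of each component of p(o | f) (Gaussian mixture regression)."""
    dens = [c.weight * NormalDist(c.mean_f, math.sqrt(c.var_f)).pdf(f) for c in comps]
    total = sum(dens)
    if total == 0.0:
        raise ValueError("forecast lies outside the support of the fitted mixture")
    mix = []
    for c, d in zip(comps, dens):
        cond_mean = c.mean_o + c.cov_fo / c.var_f * (f - c.mean_f)
        cond_var = c.var_o - c.cov_fo ** 2 / c.var_f
        mix.append((d / total, cond_mean, math.sqrt(cond_var)))
    return mix


def mixture_quantile(mix: Sequence[Tuple[float, float, float]], q: float, tol: float = 1e-6) -> float:
    """Invert the mixture CDF by bisection."""
    def cdf(y: float) -> float:
        return sum(w * NormalDist(m, s).cdf(y) for w, m, s in mix)

    lo = min(m - 10.0 * s for _, m, s in mix)
    hi = max(m + 10.0 * s for _, m, s in mix)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gmm_interval(comps: Sequence[Component], f: float, confidence: float = 0.9) -> Tuple[float, float]:
    """Central forecast interval of observed streamflow given forecast f."""
    mix = conditional_mixture(comps, f)
    alpha = 1.0 - confidence
    return mixture_quantile(mix, alpha / 2.0), mixture_quantile(mix, 1.0 - alpha / 2.0)


# Invented two-component mixture: a tight low-flow regime and a looser high-flow regime (m³/s).
comps = [
    Component(0.6, 1000.0, 1000.0, 200.0 ** 2, 200.0 ** 2, 0.95 * 200.0 ** 2),
    Component(0.4, 4000.0, 4000.0, 1000.0 ** 2, 1000.0 ** 2, 0.8 * 1000.0 ** 2),
]
for f in (1000.0, 4000.0):
    lo_q, hi_q = gmm_interval(comps, f)
    print(f, round(lo_q), round(hi_q))
```

With these invented parameters, the interval at a forecast of 4000 m³/s is several times wider than at 1000 m³/s, a simple illustration of magnitude-dependent (heteroskedastic) intervals.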

Among the four GMM-based models, GMM-RPO has the best forecasting performance. Although GMM-Fe has narrow intervals that vary uniformly, it fits high-value inflow points and large forecasting error points poorly. GMM-RPe widens the intervals appropriately, improves coverage, and fits high-value inflow points to some extent, but its fit for large forecasting error points is still unsatisfactory. Thus, the randomness of forecasting errors is an obstacle to improving model performance. GMM-FO derives confidence intervals from the correlation between forecasted and observed inflow, giving a more irregular interval distribution and higher accuracy than GMM-Fe and GMM-RPe. However, its intervals are positioned low overall and fit large forecasting error points poorly. Building on GMM-FO, GMM-RPO further distinguishes the sources of inflow forecasting error to obtain more accurate and reasonable intervals: its intervals for low-value inflow points are narrower than those of GMM-FO, and its fit for high-value inflow points is better than that of GMM-FO and GMM-RPe. Therefore, GMM-RPO achieves the best overall forecast accuracy.

5. Conclusions

To accurately forecast streamflow for multi-block basins, a novel short-term streamflow forecasting method was proposed in this study by comprehensively considering the orderly joint calculation of interval streamflow forecasting, reservoir operation rule simulation, and river propagation confluence based on machine learning and physically based hydrological models. Firstly, the forecast precipitation product was corrected by LSTM based on IMERG precipitation. Secondly, the operation rule for cascade reservoirs was extracted by fully considering the reservoirs’ hydraulic connectivity and multi-step hydro-meteorological spatiotemporal future information. Thirdly, coupling forecast precipitation information, reservoir operation rule extraction model, and hydrological model, a short-term deterministic streamflow forecasting system for the basin was built. Finally, based on the deterministic streamflow forecasting composition, the Gaussian mixture model was introduced to establish four probabilistic streamflow forecasting models with different input–output combinations. Taking the Yalong River as an example, the main conclusions are as follows:

(1): LSTM can effectively correct the error of the forecast precipitation product to meet the demand for precipitation forecast accuracy in streamflow forecasting.
(2): Appropriate addition of multi-step future information can effectively improve the extraction accuracy of the operation rule for cascade reservoirs, with NSE all above 0.91 and MRE below 13%.
(3): The NSE of the proposed deterministic streamflow forecasting method for the following 1–5 days at eight forecast sections is above 0.83, indicating that the proposed method can effectively improve the forecasting accuracy and extend lead times under the multi-block condition.
(4): GMM-RPO’s ICP is above 0.9, INAW and CWC are all below 0.15 at all stations under different lead times, which indicates that it can adequately reflect the impact of uncertainty in interval streamflow forecasting and upstream reservoir operation on downstream streamflow forecasting accuracy, and characterize the “non-normality” and “heteroskedasticity” of forecasting uncertainty.

The proposed method can help improve streamflow forecasting accuracy in multi-block watersheds and support the scientific allocation of water resources. However, the lumped XAJ model does not represent snowmelt runoff, which leads to underestimated streamflow at Ganzi station and reduces forecasting accuracy to some extent. Future studies could therefore investigate snowmelt runoff processes and quantify in more detail how uncertainty in model inputs, parameters, and structures affects streamflow forecasting, providing more accurate guidance for water resource allocation.

Author Contributions

All authors contributed significantly to this manuscript. B.J.: writing—original draft preparation, software, funding acquisition; W.F.: writing review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Hubei Provincial Natural Science Foundation Program (2024AFD367).

Data Availability Statement

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

The authors thank Yalong River Hydropower Development Co., Ltd. and TIGGE for providing the basic data.

Conflicts of Interest

Author Benjun Jia is employed by China Yangtze Power Co., Ltd. The authors declare no conflicts of interest.

References

1. Genova, P.; Wei, Y. A Socio-Hydrological Model for Assessing Water Resource Allocation and Water Environmental Regulations in the Maipo River Basin. J. Hydrol. 2023, 617, 129159.
2. Wang, R.; Li, X.; Zhang, Q.; Cheng, J.; Li, J.; Zhang, D.; Liu, Y. Projection of Drought-Flood Abrupt Alternation in a Humid Subtropical Region under Changing Climate. J. Hydrol. 2023, 624, 129875.
3. Fang, W.; Zhou, J.; Jia, B.; Gu, L.; Xu, Z. Study on the Evolution Law of Performance of Mid- to Long-Term Streamflow Forecasting Based on Data-Driven Models. Sustain. Cities Soc. 2023, 88, 104277.
4. Li, R.; Wang, Z.; Sun, H.; Zhou, S.; Liu, Y.; Liu, J. Automatic Identification of Earth Rock Embankment Piping Hazards in Small and Medium Rivers Based on UAV Thermal Infrared and Visible Images. Remote Sens. 2023, 15, 4492.
5. Liu, Y.; Yuan, S.; Zhu, Y.; Ren, L.; Chen, R.; Zhu, X.; Xia, R. The Patterns, Magnitude, and Drivers of Unprecedented 2022 Mega-Drought in the Yangtze River Basin, China. Environ. Res. Lett. 2023, 18.
6. Vogelbacher, A.; Aminzadeh, M.; Madani, K.; Shokri, N. An Analytical Framework to Investigate Groundwater-Atmosphere Interactions Influenced by Soil Properties. Water Resour. Res. 2024, 60, e2023WR036643.
7. Wang, H.; Hu, Y.; Guo, Y.; Wu, Z.; Yan, D. Urban Flood Forecasting Based on the Coupling of Numerical Weather Model and Stormwater Model: A Case Study of Zhengzhou City. J. Hydrol. Reg. Stud. 2022, 39, 100985.
8. Liu, L.; Ping Xu, Y.; Li Pan, S.; Xu Bai, Z. Potential Application of Hydrological Ensemble Prediction in Forecasting Floods and Its Components over the Yarlung Zangbo River Basin, China. Hydrol. Earth Syst. Sci. 2019, 23, 3335–3352.
9. Li, X.; Rankin, C.; Gangrade, S.; Zhao, G.; Lander, K.; Voisin, N.; Shao, M.; Morales-Hernández, M.; Kao, S.C.; Gao, H. Evaluating Precipitation, Streamflow, and Inundation Forecasting Skills during Extreme Weather Events: A Case Study for an Urban Watershed. J. Hydrol. 2021, 603, 127126.
10. Jabbari, A.; So, J.M.; Bae, D.H. Precipitation Forecast Contribution Assessment in the Coupled Meteo-Hydrological Models. Atmosphere 2020, 11, 34.
11. Wang, Y.; Liu, W.; Li, X.; Xu, J. Research Progresses of Rainfall-runoff Simulation Based on Land-atmosphere Coupling Model. J. Chang. River Sci. Res. Inst. 2024, 41, 26–35.
12. Liu, S.; Han, Y.; Wang, P.; Zhang, G.J.; Wang, B.; Wang, Y. More Heavy Precipitation in World Urban Regions Captured through a Two-Way Subgrid Land-Atmosphere Coupling Framework in the NCAR CESM2. Geophys. Res. Lett. 2024, 51, e2024GL108747.
13. Larsen, M.A.D.; Refsgaard, J.C.; Drews, M.; Butts, M.B.; Jensen, K.H.; Christensen, J.H.; Christensen, O.B. Results from a Full Coupling of the HIRHAM Regional Climate Model and the MIKE SHE Hydrological Model for a Danish Catchment. Hydrol. Earth Syst. Sci. 2014, 18, 4733–4749.
14. Gu, T.; Chen, Y.; Gao, Y.; Qin, L.; Wu, Y.; Wu, Y. Improved Streamflow Forecast in a Small-Medium Sized River Basin with Coupled WRF and WRF-Hydro: Effects of Radar Data Assimilation. Remote Sens. 2021, 13, 3251.
15. Han, S.; Coulibaly, P. Bayesian Flood Forecasting Methods: A Review. J. Hydrol. 2017, 551, 340–351.
16. Feng, K.; Zhou, J.; Liu, Y.; Lu, C.; He, Z. Hydrological Uncertainty Processor (HUP) with Estimation of the Marginal Distribution by a Gaussian Mixture Model. Water Resour. Manag. 2019, 33, 2975–2990.
17. Darbandsari, P.; Coulibaly, P. HUP-BMA: An Integration of Hydrologic Uncertainty Processor and Bayesian Model Averaging for Streamflow Forecasting. Water Resour. Res. 2021, 57, 2020WR029433.
18. Cui, Z.; Guo, S.; Chen, H.; Liu, D.; Zhou, Y.; Xu, C.Y. Quantifying and Reducing Flood Forecast Uncertainty by the CHUP-BMA Method. Hydrol. Earth Syst. Sci. 2024, 28, 2809–2829.
19. Babamiri, O.; Dinpashoh, Y. Uncertainty Analysis of River Water Quality Based on Stochastic Optimization of Waste Load Allocation Using the Generalized Likelihood Uncertainty Estimation Method. Water Resour. Manag. 2024, 38, 967–989.
20. Nourani, V.; Jabbarian Paknezhad, N.; Sharghi, E.; Khosravi, A. Estimation of Prediction Interval in ANN-Based Multi-GCMs Downscaling of Hydro-Climatologic Parameters. J. Hydrol. 2019, 579, 124226.
21. Sun, A.Y.; Wang, D.; Xu, X. Monthly Streamflow Forecasting Using Gaussian Process Regression. J. Hydrol. 2014, 511, 72–81.
22. Liu, Y.; Ye, L.; Qin, H.; Ouyang, S.; Zhang, Z.; Zhou, J. Middle and Long-Term Runoff Probabilistic Forecasting Based on Gaussian Mixture Regression. Water Resour. Manag. 2019, 33, 1785–1799.
23. Null, S.E.; Zeff, H.; Mount, J.; Gray, B.; Sturrock, A.M.; Sencan, G.; Dybala, K.; Thompson, B. Storing and Managing Water for the Environment Is More Efficient than Mimicking Natural Flows. Nat. Commun. 2024, 15, 5462.
24. Mayer, M.J.; Yang, D. Calibration of Deterministic NWP Forecasts and Its Impact on Verification. Int. J. Forecast. 2023, 39, 981–991.
25. Fang, W.; Qin, H.; Shen, K.; Yang, X.; Yang, Y.; Jia, B. Extracting Operation Rule of Cascade Reservoirs Using a Novel Framework Considering Hydrometeorological Spatiotemporal Information Based on Artificial Intelligence Models. J. Clean. Prod. 2024, 437, 140608.
26. Cho, K.; Kim, Y. Improving Streamflow Prediction in the WRF-Hydro Model with LSTM Networks. J. Hydrol. 2022, 605, 127297.
27. Zhang, X.; Liu, P.; Feng, M.; Xu, C.Y.; Cheng, L.; Gong, Y. A New Joint Optimization Method for Design and Operation of Multi-Reservoir System Considering the Conditional Value-at-Risk. J. Hydrol. 2022, 610, 127946.
28. Lu, M.; Li, Y.; Yu, M.; Zhang, Q.; Zhang, Y.; Liu, B.; Wang, M. Spatiotemporal Prediction of Radar Echoes Based on Convlstm and Multisource Data. Remote Sens. 2023, 15, 1279.
29. Dehghani, A.; Moazam, H.M.Z.H.; Mortazavizadeh, F.; Ranjbar, V.; Mirzaei, M.; Mortezavi, S.; Ng, J.L.; Dehghani, A. Comparative Evaluation of LSTM, CNN, and ConvLSTM for Hourly Short-Term Streamflow Forecasting Using Deep Learning Approaches. Ecol. Inform. 2023, 75, 102119.
30. Gong, J.; Xu, J.; Gong, J.; Yao, C.; Li, Z.; Weerts, A.H.; Weerts, A.H.; Wang, X.; Huang, Y. State Updating in Xin’anjiang Model by Asynchronous Ensemble Kalman Filtering with Enhanced Error Models. J. Hydrol. 2024, 640, 131726.
31. Guan, H.; Huang, J.; Li, L.; Li, X.; Miao, S.; Su, W.; Ma, Y.; Niu, Q.; Huang, H. Improved Gaussian Mixture Model to Map the Flooded Crops of VV and VH Polarization Data. Remote Sens. Environ. 2023, 295, 113714.
32. Jia, B.; Zhou, J.; Tang, Z.; Xu, Z.; Chen, X.; Fang, W. Effective Stochastic Streamflow Simulation Method Based on Gaussian Mixture Model. J. Hydrol. 2022, 605, 127366.
33. Zhao, Y.; Xu, K.; Dong, N.; Wang, H. Optimally Integrating Multi-Source Products for Improving Long Series Precipitation Precision by Using Machine Learning Methods. J. Hydrol. 2022, 609, 127707.
34. Fang, W.; Qin, H.; Liu, G.; Yang, X.; Xu, Z.; Jia, B.; Zhang, Q. A Method for Spatiotemporally Merging Multi-Source Precipitation Based on Deep Learning. Remote Sens. 2023, 15, 4160.
35. Lazo, P.X.; Mosquera, G.M.; McDonnell, J.J.; Crespo, P. The Role of Vegetation, Soils, and Precipitation on Water Storage and Hydrological Services in Andean Páramo Catchments. J. Hydrol. 2019, 572, 805–819.
36. Wu, N.; Zhang, K.; Chao, L.; Ning, Z.; Wang, S.; Jarsjö, J. Snow Cover Expansion with Contrasting Depth Thinning in the Recent 40 Years: Evidence from the Yalong River Basin, South-Eastern Tibetan Plateau. J. Hydrol. Reg. Stud. 2024, 53, 101786.

Figure 1. Framework diagram of the study methodology.

Figure 2. Short-term deterministic streamflow forecasting method with meteo-hydrological coupling under the multi-block condition.

Figure 3. Location and station distribution of the Yalong River.

Figure 4. Forecast accuracy of restored streamflow in Tongzilin under different lead times from 2019 to 2020.

Figure 5. Accuracy of different operation rule extraction models for cascade reservoirs during the test period.

Figure 6. Each section’s short-term deterministic streamflow forecasting accuracy under different lead times from 2019 to 2020 (‘L1’ represents the lead time of 1 day, and the other symbols have similar meanings).

Figure 7. Probabilistic streamflow forecasting accuracy for eight forecast sections with different lead times at a 90% confidence level.

Figure 8. Probabilistic streamflow forecasting results of different models for Tongzilin at the lead time of the fifth day.

Table 1. Basic information of the used data.

Name	Spatial Resolution	Temporal Resolution	Source
IMERG	0.1°	1 d	https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDF_06/summary (1 March 2024)
ECMWF	0.1°	1 d	https://apps.ecmwf.int/datasets/data/tigge/levtype=sfc/type=cf/ (12 March 2024)
Reservoir	\	1 d	Yalong River Hydropower Development Co., Ltd. Chengdu, China
Hydrologic station	\	1 d	Yalong River Hydropower Development Co., Ltd. Chengdu, China

Table 2. Restored streamflow simulation accuracy of Tongzilin.

Evaluation Indices	Calibration Period	Validation Period
NSE	0.973	0.968
MRE	7.2%	8.8%
RMSE	239 m³/s	248 m³/s

Table 3. Calibration and validation results of the XAJ model for each forecast sub-interval.

Forecast Section	Calibration NSE	Calibration MRE	Calibration RMSE	Validation NSE	Validation MRE	Validation RMSE
Ganzi	0.870	15.1%	103 m³/s	0.865	17.5%	107 m³/s
Yajiang	0.962	7.2%	149 m³/s	0.958	8.5%	161 m³/s
Maidilong	0.979	4.5%	136 m³/s	0.974	4.8%	142 m³/s
Jinping–I	0.982	5.0%	143 m³/s	0.978	5.5%	151 m³/s
Jinping–II	0.995	1.0%	52 m³/s	0.992	2.2%	71 m³/s
Guandi	0.986	3.1%	92 m³/s	0.974	4.5%	161 m³/s
Ertan	0.981	4.7%	112 m³/s	0.973	5.2%	168 m³/s
Tongzilin	0.973	4.1%	163 m³/s	0.969	4.8%	186 m³/s


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Jia, B.; Fang, W. Coupling Machine Learning and Physically Based Hydrological Models for Reservoir-Based Streamflow Forecasting. Remote Sens. 2025, 17, 2314. https://doi.org/10.3390/rs17132314
