1. Introduction
The expected population growth in the Syr Darya region will increase water demand in the forthcoming years [1], while climate change will further exacerbate water stress [2]. Climate change-induced effects mostly consist of temperature rises, which will increase irrigation water requirements through higher evapotranspiration rates [3,4] and may reduce streamflow in the Syr Darya River through glacier loss in the Tien Shan Mountains, whose glacier and snow melt largely sustains the river [5,6,7]. In addition, anthropogenic activities such as the construction of dams and irrigation canals have already significantly reduced river streamflow, especially after the 1970s, and have consequently altered water-supply and water-demand patterns in the Syr Darya region. Together, these factors call for effective river streamflow forecasting tools to improve planning and water management efficiency in the Syr Darya region, for the benefit of human well-being and environmental sustainability in light of the known and expected climate change impacts.
River streamflow prediction has gained significant research attention over the past few years. Streamflow modeling has been applied in a wide range of fields, including, but not limited to, agriculture, water supply, recreation, flood defense, and energy production. Streamflow forecasting can be classified as short-term or long-term. Short-term forecasts correspond to real-time predictions (i.e., hourly or daily outputs) and are more suitable for flood-management applications. Long-term forecasts may include weekly, monthly, or annual predictions, and are more relevant for reservoir operation, irrigation-system management, and hydropower generation [8].
Additionally, three types of models have been adopted for forecasting river streamflow, namely conceptual, physical, and data-driven models [9]. Conceptual models are based on correlations between various observable hydrological and meteorological variables. Their dependency on observed variables diminishes the practicality of this approach in catchments where data availability is limited. Physically based models have been widely used for river-flow predictions, mostly owing to better process understanding, increased computing power, and improved observation systems, which include flow data for model calibration and remote sensing [10]. Despite these substantial improvements, physical models require a large amount of data, including, but not limited to, land use, slopes, soils, and climate [11], and the limited process understanding of the subsurface and inadequate grid resolution represent considerable constraints [10].
The increasingly wide availability of hydrological and meteorological data, in addition to their rapid temporal evolution, often leads to a preference for data-driven models over physically based or conceptual forecasting models. Data-driven models overcome the limitations described above and can produce reliable predictions of physical systems, even without prior knowledge of the underlying physical relationships or catchment information. Data-driven models, often incorporating machine learning algorithms, have been widely applied to streamflow forecasting in recent decades. Thanks to the considerable development in computational capabilities and data availability [12], machine learning models, particularly those based on artificial neural networks (ANNs), have been widely used to capture non-linear input–output relationships. Specifically, a particular type of ANN, the long short-term memory (LSTM) network, has gained increasing attention in hydrology, given its strong ability to learn from sequential time-series data.
In river streamflow forecasting, LSTM-based networks have been successfully used and have yielded promising outcomes (e.g., [13,14,15]). Cheng et al. compared the long lead-time forecasting capability of ANNs and LSTM in the Nan River Basin and Ping River Basin and demonstrated that the LSTM model outperformed the ANN model in long lead-time daily forecasting [16]. Xu et al. assessed the performance of LSTM networks in 10-day average flow predictions and daily flow predictions for the Hun River and Upper Yangtze River basins, respectively [17]. Their study evidenced the superiority of LSTM over the traditional hydrological and data-driven models tested, which included the Soil and Water Assessment Tool (SWAT), the Xinanjiang model (XAJ), the multiple linear regression model (MLR), and back-propagation neural networks (BP). Mehedi et al. used an LSTM neural network to forecast river discharge and demonstrated its higher performance compared to other neural network regression models, including for longer lead periods [18].
The performance of LSTM networks has also proved satisfactory when compared to other advanced machine learning algorithms [19]. Dehghani et al. compared LSTM, convolutional neural network (CNN), and convolutional long short-term memory (ConvLSTM) models in hourly streamflow prediction in two rivers in Malaysia, namely the Kelantan and Muda River basins [20]. They showed that all three deep learning methods predicted streamflow with high accuracy, but LSTM outperformed CNN and ConvLSTM in small basins with spatially well-distributed rainfall stations, while it underperformed for moderate to high streamflow and in large river basins. Le et al. demonstrated the superiority of LSTM-based models over a feed-forward neural network (FFNN) and a CNN model from a performance and stability standpoint [21]. While their study showed that LSTM-based models could achieve remarkable forecasts even in the presence of upstream dams and reservoirs, they also evidenced that the added complexity of the recurrent neural network (RNN) family models (Stacked LSTM and BiLSTM) is not accompanied by performance improvement.
In a more recent study, Le et al. also showed that a single hidden layer is sufficient for LSTM/GRU models to deliver highly accurate predictions while minimizing the data processing time [22]. Their study also demonstrated that the use of multiple input data types (rainfall and streamflow data) does not necessarily yield an increase in prediction accuracy compared to using only one data type (streamflow). In other words, they concluded that the inclusion of rainfall data does not improve streamflow forecasting performance. However, it is important to examine the contribution of rainfall data to model performance while considering the geographical locations of the weather stations relative to the streamflow-measuring station, as some may have more impact than others depending on whether they lie upstream or downstream.
This study aims to examine the contribution of rainfall data to the model’s forecasting accuracy by considering different scenarios, which include rainfall data from different weather stations grouped by their geographical location with respect to the flow monitoring station. Specifically, the streamflow data were used as an input feature in all the scenarios, whereas rainfall data (RF) were used in only four scenarios: All-RF included rainfall data collected at all 11 stations; Up-RF and Down-RF included only the rainfall data measured upstream and downstream of the streamflow-measuring station, respectively; and P-RF included only the rainfall data exhibiting the highest level of correlation with the streamflow data (Pearson coefficient > 0.3), which was the case only for the Naryn and Tien-Shan stations. In addition, two LSTM-based models were tested, namely LSTM and BILSTM, and their forecasting performance was compared quantitatively and qualitatively. The outcomes of this study provide a first insight into the potential applicability of LSTM-based models for forecasting streamflow in the Syr Darya River and a better understanding of the influence of hydrological data properties on the model’s forecasting performance.
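The scenario definitions above can be illustrated with a short selection routine. This is only a sketch under stated assumptions: the station names `station_A` and `station_B`, the upstream flags, and the synthetic series are hypothetical placeholders, and only the 0.3 Pearson threshold is taken from the text.

```python
import numpy as np

PEARSON_THRESHOLD = 0.3  # threshold quoted in the text for the P-RF scenario


def build_scenarios(flow, rainfall, upstream):
    """Return the rainfall stations used as inputs in each scenario.

    flow     : 1-D array of monthly streamflow
    rainfall : dict mapping station name -> 1-D array of monthly rainfall
    upstream : dict mapping station name -> True if upstream of the flow gauge
    """
    stations = list(rainfall)
    # Pearson correlation between each rainfall series and the streamflow
    corr = {s: float(np.corrcoef(rainfall[s], flow)[0, 1]) for s in stations}
    return {
        "FO": [],  # flow only: no rainfall input
        "All-RF": stations,
        "Up-RF": [s for s in stations if upstream[s]],
        "Down-RF": [s for s in stations if not upstream[s]],
        "P-RF": [s for s in stations if corr[s] > PEARSON_THRESHOLD],
    }


# Synthetic illustration only (hypothetical stations, not the study's data).
rng = np.random.default_rng(0)
flow = rng.gamma(shape=2.0, scale=100.0, size=360)
rainfall = {
    "station_A": 0.1 * flow + rng.normal(0.0, 5.0, size=360),  # correlated with flow
    "station_B": rng.gamma(shape=2.0, scale=10.0, size=360),   # independent of flow
}
upstream = {"station_A": True, "station_B": False}
scenarios = build_scenarios(flow, rainfall, upstream)
```

Streamflow itself is an input in every scenario, so each list only names the rainfall series added on top of it.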
2. Study Area and Data Acquisition
The Syr Darya River is the second major river in Central Asia and is a transboundary river shared by four countries, namely Kyrgyzstan, Kazakhstan, Tajikistan, and Uzbekistan. The catchment area of the Syr Darya extends over around 219,000 km². After the Amu Darya, which originates in Tajikistan, the Syr Darya is the second major tributary of the Aral Sea and is one of the most anthropogenically altered water systems in the world.
The Syr Darya is formed by two tributaries, namely the Naryn and the Kara Darya. While the former originates in the high mountainous areas of the Central Tien Shan, the latter originates in the Fergana/Alai Range. In the Naryn catchment, the total annual precipitation ranges between 280 and 450 mm, most of which falls during spring and summer. In the Kara Darya catchment, the total annual precipitation ranges between 350 and 1050 mm, the largest portion of which falls in winter and spring, while very little falls during the summer [23]. These two tributaries eventually merge into the Syr Darya in the Fergana Valley of Uzbekistan.
The coupled impact of climate change and water governance contributes to exacerbating tensions over the already limited water resources within and across borders, particularly given the conflicting nature of the water demands. While water is mainly needed for energy production (hydropower) in winter in Kyrgyzstan, it is mostly abstracted for agriculture (water-intensive crops) during the vegetation season (spring/summer) in Uzbekistan. These conflicting interests in water use, combined with river streamflow reduction, will further strain the relationship between the two riparian states in the future.
Figure 1 provides a map showing the demonstration catchments investigated herein, as well as the location of the hydrological and meteorological stations. The mean flow data from the Syr Darya River were provided by the World Meteorological Organization (WMO) and were collected as part of the framework of the First GARP Global Experiment (FGGE).
The flow data were collected at the Tomenaryk hydrological station, located in Kazakhstan (Figure 1a), and were recorded daily from January 1936 to December 1986. Data collected before 1957 were discarded because of recurring gaps of missing values. The resulting dataset comprised 30 years of streamflow data spanning from January 1957 to December 1986. The retained flow record contained no missing values; thus, no deletion or imputation was required. The mean flow data were then converted into monthly data for the forecasting task. The flow data of the Syr Darya River are shown in Figure 2.
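The preprocessing described above (discarding the pre-1957 record and converting daily flows to monthly values) can be sketched with pandas. The sketch assumes calendar-month averaging, which the text does not state explicitly; the series name `flow_m3s` and the constant placeholder values are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic daily discharge over the full recording period
# (placeholder values, not the WMO data).
days = pd.date_range("1936-01-01", "1986-12-31", freq="D")
daily_flow = pd.Series(np.full(len(days), 500.0), index=days, name="flow_m3s")

# Discard the record before January 1957, as described in the text.
daily_flow = daily_flow.loc["1957-01-01":]

# Assumed aggregation: average the daily values within each calendar month.
monthly_flow = daily_flow.groupby(daily_flow.index.to_period("M")).mean()

print(len(monthly_flow))  # 30 years x 12 months
```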
The rainfall data were sourced from the Global Historical Climatology Network (GHCN) database. They were collected at 11 stations located throughout the catchment area, on the upstream and downstream sides of the Tomenaryk station. The locations of the meteorological stations are highlighted by the green dots in Figure 1b. Missing values were recorded as zeros (i.e., no precipitation). The rainfall data for the Naryn and Tien-Shan stations are shown in Figure 3 (other rainfall graphs are not shown for brevity).
4. Results and Discussion
The analysis of the effects of rainfall on the model’s forecasting performance required the comparison of the outputs achieved in the various scenarios including rainfall data to the results of the scenario without rainfall (FO), which was thus considered as the base case. The forecasting accuracy of the LSTM and BILSTM models in the FO scenario was examined by comparing the performance metrics at the validation stage.
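The metrics compared in this section are the RMSE, the MAE, and the coefficient of determination R². Their formulas are not restated in this section, so the sketch below assumes the standard definitions, with R² computed as 1 − SS_res/SS_tot; the observed and predicted values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted monthly flows (m3/s).
obs = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 310.0, 390.0]
print(rmse(obs, pred), mae(obs, pred), r2(obs, pred))
```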
Figure 5 shows the comparison between the predicted and observed curves in the FO scenario for both the LSTM and BILSTM models. Overall, the streamflow was relatively well forecasted by the two models during the validation phase. The agreement between observed and forecasted streamflow was similar for the two models, with comparable R² values. In both cases, the low flows were better predicted than the peak flows, which were slightly overestimated, apart from the first peak flow in the BILSTM case, which was slightly underestimated.
Figure 6 shows the comparison between observation and prediction in the All-RF, Up-RF, Down-RF, and P-RF scenarios in the LSTM case. The prediction curves appear less smooth and present intermittent fluctuations compared to the FO scenario. The P-RF scenario visibly yielded a slightly better fit than the other scenarios, which is also reflected in a slightly greater R² value. None of the scenarios could reproduce the peak values accurately, all peaks being slightly shifted forward in time and slightly misestimated in magnitude. The first peak was overestimated in the All-RF scenario but underestimated in the Up-RF scenario. The closest fit to the first peak was observed in the P-RF scenario. The second peak was overestimated in the Up-RF and P-RF scenarios and was visibly better reproduced in the All-RF scenario. The third peak was also overestimated in the Up-RF and All-RF scenarios, but slightly less so in the latter. The worst prediction–observation fit was exhibited in the Down-RF scenario, where all peaks were noticeably overestimated. In addition, all scenarios presented some odd fluctuations during the low-flow periods, especially the All-RF, Up-RF, and Down-RF scenarios. The P-RF scenario exhibited similar behavior, but only in the second low-flow period (t = 335–350 months), while a relatively smoother match was observed in the first low-flow period.
Figure 7 shows the comparison between the observation and prediction in the All-RF, Up-RF, Down-RF, and P-RF scenarios in the BILSTM case. Similarly, the predicted curves exhibited some odd fluctuation patterns compared to the FO scenario. However, the forecasted curve of the P-RF scenario did not present such fluctuations, but rather a smoother behavior, as observed in the FO scenario. Nonetheless, the All-RF scenario produced the best overall fit with an R² value of 0.8598, while the worst overall prediction–observation fit was observed in the Down-RF scenario, with an R² of 0.6425.
Again, all multivariate scenarios presented slight discrepancies in predicting the peaks, which were slightly shifted forward in time and slightly misestimated in magnitude. The first peak was overestimated in the Up-RF and P-RF scenarios but underestimated in the All-RF and Down-RF scenarios. The second peak was overestimated in the Up-RF and P-RF scenarios, albeit slightly less in the former. The Down-RF and All-RF scenarios both presented underestimated peak values. The third peak was overestimated in the Up-RF and Down-RF scenarios, albeit more in the latter than in the former. The P-RF scenario reproduced a magnitude similar to the observed values, but shifted forward in time. The All-RF scenario exhibited a relatively good match throughout the peak in terms of both timing and magnitude, although the tip of the peak was slightly sharper.
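The forward shift of the predicted peaks is described qualitatively here. One possible way to quantify such a shift, offered only as an illustrative sketch and not as part of the study's method, is to search for the lag that maximizes the correlation between the observed and predicted series. The function name `best_lag`, the `max_lag` window, and the synthetic sinusoidal series are assumptions.

```python
import numpy as np


def best_lag(obs, pred, max_lag=6):
    """Lag (in time steps) at which pred correlates best with obs.

    A positive value means the predicted series lags behind the observations.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Compare obs[t] with pred[t + lag]
            a, b = obs[: len(obs) - lag], pred[lag:]
        else:
            # Compare obs[t - lag] with pred[t]
            a, b = obs[-lag:], pred[: len(pred) + lag]
        scores[lag] = np.corrcoef(a, b)[0, 1]
    return max(scores, key=scores.get)


# Synthetic monthly series: an annual cycle and the same cycle one month late.
t = np.arange(120)
obs = 300.0 + 200.0 * np.sin(2 * np.pi * t / 12)
pred = 300.0 + 200.0 * np.sin(2 * np.pi * (t - 1) / 12)
lag = best_lag(obs, pred)
print(lag)
```

The search window is kept shorter than the seasonal period so that a one-year offset is not mistaken for a perfect match.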
Figure 8 shows the performance metrics recorded in all the scenarios in the LSTM and BILSTM cases during the validation phase. In the LSTM case, the FO scenario exhibited the best forecasting performance among all the scenarios, while the Down-RF scenario yielded the least accurate predictions. Specifically, the RMSE values were 75.59 m³/s, 77.14 m³/s, 87.29 m³/s, 73.73 m³/s, and 58.97 m³/s in the All-RF, Up-RF, Down-RF, P-RF, and FO scenarios, respectively. In other words, the RMSE increased by about 28%, 31%, 48%, and 25% relative to the FO scenario in the All-RF, Up-RF, Down-RF, and P-RF scenarios, respectively. The MAE values were 50.72 m³/s, 52.46 m³/s, 57.09 m³/s, 48.21 m³/s, and 40.88 m³/s in the All-RF, Up-RF, Down-RF, P-RF, and FO scenarios, respectively. This means that the MAE increased by 24%, 28%, 40%, and 18% relative to the FO scenario in the All-RF, Up-RF, Down-RF, and P-RF scenarios, respectively. Regarding the overall fit, the highest and lowest R² values were recorded in the FO and Down-RF scenarios, at 0.8535 and 0.6791, respectively. The All-RF, Up-RF, and P-RF scenarios yielded comparable fits.
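The percentage increases quoted for the LSTM model follow directly from the reported RMSE and MAE values. The short check below reproduces them; the helper name `increase_vs_baseline` is introduced here for illustration only.

```python
# Validation metrics for the LSTM model as reported in the text (m3/s).
rmse = {"All-RF": 75.59, "Up-RF": 77.14, "Down-RF": 87.29, "P-RF": 73.73, "FO": 58.97}
mae = {"All-RF": 50.72, "Up-RF": 52.46, "Down-RF": 57.09, "P-RF": 48.21, "FO": 40.88}


def increase_vs_baseline(values, baseline="FO"):
    """Percentage increase of each scenario's error relative to the baseline."""
    base = values[baseline]
    return {k: 100.0 * (v - base) / base for k, v in values.items() if k != baseline}


rmse_increase = increase_vs_baseline(rmse)
mae_increase = increase_vs_baseline(mae)
for name in rmse_increase:
    print(f"{name}: RMSE +{rmse_increase[name]:.0f}%, MAE +{mae_increase[name]:.0f}%")
```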
In the BILSTM case, the All-RF and P-RF scenarios achieved performance metrics more comparable to the FO scenario, although the MAE was still smaller in the latter. The worst performance was recorded in the Down-RF scenario, as observed in the LSTM case. The RMSE values were 57.70 m³/s, 71.35 m³/s, 92.13 m³/s, 57.53 m³/s, and 59.74 m³/s in the All-RF, Up-RF, Down-RF, P-RF, and FO scenarios, respectively. Hence, the All-RF and P-RF scenarios yielded RMSE values very comparable to that of the FO scenario, whereas the RMSE was 19% greater in the Up-RF scenario and 54% greater in the Down-RF scenario. Regarding the MAE, the BILSTM achieved similar values in the All-RF and P-RF scenarios, with 39.40 m³/s and 39.80 m³/s, which are 12% and 13% greater than in the FO scenario, respectively. The Up-RF and Down-RF scenarios yielded higher MAE values than the FO scenario, with respective increases of 33% and 59.40%. The overall fit was very comparable in the All-RF, P-RF, and FO scenarios, with R² values of 0.8598, 0.8606, and 0.8497, respectively. The Up-RF and Down-RF scenarios yielded a lower fit, with R² values of 0.7857 and 0.6426, respectively.
The results show that the multivariate scenarios, which contain multiple input features, did not improve the predictive accuracy of either model relative to the univariate scenario containing a single data type (flow), regardless of the positions of the weather stations. In other words, feeding an ML model with a wide variety of input variables (hydrological and meteorological data) does not necessarily lead to a gain in the model’s forecasting performance. Thus, these findings suggest that rainfall may not necessarily be among the controlling input variables for streamflow forecasting in the Syr Darya River.
These findings suggest that selecting only the rainfall data from upstream of the flow monitoring station tends to make a positive contribution to the model’s forecasting performance, whereas additionally including data from downstream rain gauges complicates training and may thus reduce the model’s predictive accuracy. This is particularly important in the context of data scarcity, as is the case in most developing countries.
Also, in comparison to the LSTM case, the BILSTM model achieved RMSE values that were 24%, 7.5%, and 22% lower in the All-RF, Up-RF, and P-RF scenarios, respectively. However, the RMSE in the Down-RF scenario was 5.5% higher than with the LSTM model, while the FO scenario yielded relatively similar outputs in both models. The BILSTM also achieved 22%, 11%, 17%, and 14% smaller MAE values than the LSTM model in the All-RF, Up-RF, P-RF, and FO scenarios, respectively, whereas the Down-RF scenario yielded similar MAE values in the two models. The coefficient of determination, R², recorded in the BILSTM case was higher in the All-RF and P-RF scenarios. In the Up-RF and FO scenarios, the R² values were comparable, albeit slightly higher in the Up-RF scenario. However, the Down-RF scenario yielded a slightly worse fit with the BILSTM.
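The relative differences between the two models can be checked against the RMSE values reported for each scenario. In the sketch below, negative values mean that the BILSTM error is lower than the LSTM error; the variable names are illustrative.

```python
# Validation RMSE values reported in the text for each model (m3/s).
lstm_rmse = {"All-RF": 75.59, "Up-RF": 77.14, "Down-RF": 87.29, "P-RF": 73.73, "FO": 58.97}
bilstm_rmse = {"All-RF": 57.70, "Up-RF": 71.35, "Down-RF": 92.13, "P-RF": 57.53, "FO": 59.74}

# Percentage change of the BILSTM RMSE relative to the LSTM RMSE.
change = {
    name: 100.0 * (bilstm_rmse[name] - lstm_rmse[name]) / lstm_rmse[name]
    for name in lstm_rmse
}
for name, pct in change.items():
    print(f"{name}: {pct:+.1f}%")
```

The FO scenario changes by only about 1%, consistent with the statement that both models yielded relatively similar outputs in that scenario.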
Amongst all the multivariate scenarios, the results show that the models performed the least well in the Down-RF scenario. In other words, the forecasting performance was substantially reduced when the models were fed with rainfall data from stations located downstream of the streamflow monitoring station. In the LSTM case, the results also show that the model exhibited comparable performances in the All-RF, Up-RF, and P-RF scenarios, although it performed slightly better in the latter. This means that the LSTM model trained with rainfall data from all the stations yielded forecasts comparable to those obtained when it was trained with rainfall data from upstream stations only, which is particularly important in the context of data scarcity, as is the case in most developing countries.
In the BILSTM case, the performance achieved by the model was equivalent in the All-RF and P-RF scenarios but worsened in the Up-RF scenario. Furthermore, the results suggest that training the two models with only the rainfall data showing some level of correlation with the streamflow data (P-RF) helped improve the forecasting performance compared to feeding them with all available rainfall data. The results also show that the forecast was the most satisfactory in the FO scenario, regardless of the rainfall datasets considered. In other words, none of the multivariate LSTM-based models could outperform the univariate LSTM-based models, i.e., those considering only streamflow data. While Le et al. showed that incorporating rainfall data into their LSTM forecasting model did not significantly improve forecasting performance [22], our results further show that the multivariate scenarios containing multiple input features, as opposed to the univariate scenario containing a single data type (flow), did not noticeably improve the predictive accuracy of either forecasting model, regardless of the positions of the weather stations with respect to the flow monitoring station. Hence, these results call into question the relevance of including rainfall data as a predictor in streamflow-forecasting applications in the Syr Darya River basin.