### **1. Introduction**

Most Asian countries depend on agriculture, and Pakistan is among the most agriculture-dependent of them. A large share of Pakistan's gross domestic product (GDP) comes from agriculture, and the country's agriculture sector relies mostly on irrigation water generated in the Upper Indus Basin. The water scarcity report of the International Monetary Fund (IMF) ranked Pakistan third among water-scarce countries in 2018. The Intergovernmental Panel on Climate Change (IPCC) estimated that the global mean temperature increased by 0.72 °C from 1951 to 2012. Temperatures are expected to rise by 1–3 °C by 2050 and by 2–5 °C by 2100, depending on the greenhouse gas emission scenario (IPCC, 2013). Water reserves are at the core of most crises in countries such as Pakistan, where the economy, culture, and textiles are intimately connected to irrigation water. The effects of this warming trend on the region are particularly distinctive and complex [1].

Throughout history, floods have been recorded as one of the most devastating natural disasters, capable of causing severe loss of life as well as destroying property. In recent years, the increasing number of flood events has caused a large number of deaths every year. A further factor is that, as the human population grows, communities are settling closer to water bodies. Floods have severely damaged the infrastructure and disrupted the lives of affected people [2].

**Citation:** Imran, M.; Majeed, M.D.; Zaman, M.; Shahid, M.A.; Zhang, D.; Zahra, S.M.; Sabir, R.M.; Safdar, M.; Maqbool, Z. Artificial Neural Networks and Regression Modeling for Water Resources Management in the Upper Indus Basin. *Environ. Sci. Proc.* **2023**, *25*, 53. https://doi.org/10.3390/ECWS-7-14199

Academic Editor: Athanasios Loukas

Published: 14 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this study, data-driven approaches based on the statistical relationship between input and output data were used to forecast flood events in the Upper Indus Basin (UIB). Data-driven approaches such as artificial neural networks (ANNs) may offer a viable alternative to current approaches for hydrological streamflow forecasting [3].

### **2. Materials and Methods**

### *2.1. Study Area*

The study area in the Indus Basin lies between 33–75° N and 72–78° E. It is bordered by the Swat hills and the Mohamad and Mangla complexes to the north, the Charsadda District to the southwest, and the Kotli, Mangla, and Kalam complexes to the west, as shown in Figure 1. Kalam Station lies at a high elevation of 5821 m above mean sea level, and Munda Dam at a low elevation of 376 m. The region is characterized by tropical and wet temperate climatic zones, with thunderstorms and snowfall. Summers are hot (41.9 °C) and winters are cold (0.8 °C). The drainage pattern is a key factor in determining water potential. The runoff of an area is governed by its drainage density, which reflects the relative amount of rainfall that infiltrates the ground. As a result, the lower the runoff, the higher the chance of groundwater recharge.

**Figure 1.** Digital elevation model of the Upper Indus Basin with gauge stations.

### *2.2. Data Collection and Model Description*

Pakistan is one of the most climatically varied countries owing to its wide temperature range, and the UIB experiences a high incidence of rainfall, including extreme flood events. Extreme climatic events, particularly floods, have caused loss of life and financial losses in Pakistan over the last three decades (e.g., in 2011, 2020, and 2022). Streamflow data, including extreme events, were collected for the period 1971–2009 from the Water and Power Development Authority (WAPDA) and the Pakistan Meteorological Department (PMD) [4].

### 2.2.1. LSTM Model

Deep learning uses neural networks with a larger number of layers and layer types to model complex systems and interactions. Because traditional feedforward neural networks cannot retain temporal information, recurrent neural networks (RNNs) were developed to use information from previous time steps. Long short-term memory (LSTM) networks are a deep learning variant of RNNs that can retain information over longer periods. LSTM cells use gates, together with element-wise vector addition and multiplication, to remove information from or add information to the cell state.
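The gating described above can be made concrete with a single time step of a standard LSTM cell. The following is a minimal NumPy sketch of the textbook formulation, not the TensorFlow implementation used in this study; the weight layout (gate blocks stacked as forget, input, candidate, output), the helper names, and the random weights are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """Advance a standard LSTM cell by one time step (illustrative sketch).

    Shapes: x_t (D,), h_prev and c_prev (H,), W (4H, D), U (4H, H), b (4H,).
    Gate blocks are assumed to be stacked as [forget, input, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # pre-activations for all four blocks
    f = sigmoid(z[0:H])                # forget gate: fraction of c_prev to keep
    i = sigmoid(z[H:2 * H])            # input gate: fraction of candidate to add
    g = np.tanh(z[2 * H:3 * H])        # candidate cell values
    o = sigmoid(z[3 * H:4 * H])        # output gate
    c_t = f * c_prev + i * g           # element-wise multiply, then vector add
    h_t = o * np.tanh(c_t)             # new hidden state
    return h_t, c_t


# Usage: run the cell over a short, hypothetical scaled discharge sequence.
rng = np.random.default_rng(0)
D, H = 1, 4                            # one input feature, four hidden units
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in np.array([[0.2], [0.5], [0.9]]):
    h, c = lstm_cell_step(x, h, c, W, U, b)
```

Because the forget gate multiplies the previous cell state element-wise, a forget gate near 1 combined with an input gate near 0 carries the cell state forward almost unchanged, which is what allows LSTMs to retain information over long sequences.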

### 2.2.2. SARIMA Model

SARIMA (seasonal autoregressive integrated moving average) is a regression-based model. Ordinary regression models assume that the values in a dataset are independent of one another, which does not hold for time series. When regression is used to predict a time series, it is therefore critical to ensure that the data are stationary, meaning that statistical properties such as the mean and variance do not change over time. In ARIMA, "AR" denotes the autoregressive component, which uses lags of the stationary series; "MA" denotes the moving-average component, which uses lags of the forecast errors; and "I" denotes the order of differencing needed to make the series stationary. The SARIMA(1, 0, 1) × (0, 1, 1)$_{12}$ model was used in this study to forecast flow in the UIB. The Dickey–Fuller test was used to check that the time series data were stationary. The resulting *p*-value was less than 0.03 and the test statistic was −3.739768, allowing us to reject the null hypothesis of a unit root and conclude that the data were stationary. The seasonal period was 12 months, and the minimum AIC score was 138.065 at 29 time steps.
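The seasonal differencing implied by the (0, 1, 1)$_{12}$ term and the logic of the Dickey–Fuller check can be illustrated with a short NumPy sketch. It computes the basic (non-augmented) Dickey–Fuller t-statistic with a constant and compares it with the approximate large-sample 5% critical value of −2.86; it does not compute a *p*-value and is not the exact test configuration used in this study. The helper names and the synthetic series are assumptions for illustration only.

```python
import numpy as np


def seasonal_difference(y, period=12):
    """Seasonal differencing of order 1: y'_t = y_t - y_{t-period}."""
    y = np.asarray(y, dtype=float)
    return y[period:] - y[:-period]


def dickey_fuller_stat(y):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t.

    Basic Dickey-Fuller regression with a constant and no lagged differences.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(dy.size), y[:-1]])    # [constant, lagged level]
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (dy.size - X.shape[1])     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])


# Usage with a synthetic monthly series (39 years, mirroring 1971-2009).
rng = np.random.default_rng(42)
months = np.arange(12 * 39)
flow = 100 + 80 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)
diffed = seasonal_difference(flow, period=12)
stat = dickey_fuller_stat(diffed)
CRITICAL_5PCT = -2.86        # approximate asymptotic 5% value (constant, no trend)
is_stationary = stat < CRITICAL_5PCT
```

A statistic below the critical value rejects the unit-root null hypothesis. On real discharge data, a library routine such as `adfuller` in statsmodels, which adds lagged differences and reports a *p*-value, would normally be preferred.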

### 2.2.3. Model Evaluation Criteria

Root mean square error (RMSE) is a statistical metric commonly used in hydrology to evaluate the performance of forecasting models by comparing predicted values with observed values. Interpreted relative to the range of the data, the RMSE indicates how closely the predicted values match the observed values.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2} \tag{1}$$

In Equation (1), $Y_i$ and $\hat{Y}_i$ are the actual and predicted discharges at time step $i$, respectively, and *n* is the total number of observations.
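Equation (1) translates directly into a few lines of NumPy. This is a minimal sketch; the function name is an assumption, and the example values are arbitrary rather than data from this study.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean square error, Equation (1): sqrt(mean((Y_i - Y_hat_i)^2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Usage with arbitrary example discharges: the errors are 1, 0 and -2,
# so RMSE = sqrt((1 + 0 + 4) / 3).
example = rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
```

Because errors are squared before averaging, the RMSE penalizes large misses, such as an under-predicted flood peak, more heavily than small ones, and it is expressed in the same units as the discharge.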

### 2.2.4. Model Structure

This study relies on open-source software libraries. Following the literature, Python [5] was chosen as the programming language, and the NumPy [6], Pandas [7], and Matplotlib [8] libraries were imported for data processing, management, and visualization. We built the LSTM model with TensorFlow [9], an open-source software library developed by Google. TensorFlow was originally designed for machine learning, deep learning, and numerical computation research using data-flow graphs; however, the framework is general enough to be applied to a wide range of domains.
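Before a discharge series can be passed to an LSTM layer in TensorFlow/Keras, it is commonly scaled and reshaped into overlapping input windows with the three-dimensional layout (samples, time steps, features) that such layers expect. The NumPy sketch below shows one way to do this; the 12-month look-back, the min–max scaling fitted on training data only, and the helper names are assumptions rather than details reported in this study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def minmax_fit(train):
    """Return (min, max) of the training data for later scaling."""
    return float(np.min(train)), float(np.max(train))


def minmax_apply(x, lo, hi):
    """Scale x to [0, 1] using bounds taken from the training set."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)


def make_windows(series, lookback):
    """Turn a 1-D series into LSTM inputs X (samples, lookback, 1) and targets y.

    Each sample uses `lookback` consecutive values to predict the next value.
    """
    series = np.asarray(series, dtype=float)
    windows = sliding_window_view(series, lookback + 1)   # (n - lookback, lookback + 1)
    X = windows[:, :-1][..., np.newaxis]                    # add the feature axis
    y = windows[:, -1]
    return X, y


# Usage with a hypothetical monthly series and a 12-month look-back.
rng = np.random.default_rng(1)
flow = 100 + rng.normal(0, 10, 120)
train_part = flow[:96]
lo, hi = minmax_fit(train_part)
X_train, y_train = make_windows(minmax_apply(train_part, lo, hi), lookback=12)
```

The scaling bounds come from the training period only, so information from the test period does not leak into the model inputs.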

### **3. Results and Discussions**

The forecasting models were validated and tested using independent data. The RMSE between the actual data and the values forecast during validation and testing was used to evaluate the performance of each scenario quantitatively. The size of the training dataset varies depending on the scenario. A thirty-four-year dataset (1971–2004) was used for training, a five-year dataset (2005–2009) for testing, and a five-year dataset (2010–2014) for forecasting.
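The chronological split described above can be expressed with pandas date-based slicing on a monthly series. In this sketch the discharge values are synthetic placeholders, and the monthly time step is an assumption inferred from the 12-month seasonal period of the SARIMA model; only the period boundaries come from the text.

```python
import numpy as np
import pandas as pd

# Placeholder monthly discharge series (synthetic values, not WAPDA/PMD data).
index = pd.date_range("1971-01-01", "2009-12-01", freq="MS")
rng = np.random.default_rng(0)
flow = pd.Series(100 + rng.normal(0, 10, len(index)), index=index, name="discharge")

train = flow.loc["1971":"2004"]   # 34 years used for training
test = flow.loc["2005":"2009"]    # 5 years held out for testing
# 5-year forecast horizon beyond the observed record.
forecast_index = pd.date_range("2010-01-01", "2014-12-01", freq="MS")
```

Splitting by date rather than at random preserves temporal order, so each model is evaluated only on years it has not seen during training.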

Table 1 shows the RMSE of the two prediction models for training and testing. According to these RMSE results, the LSTM model outperforms the SARIMA model in both training and testing.


**Table 1.** Model Evaluation Results.

### *Training, Testing, and Forecast Results*

Figure 2a–c compares the actual and forecasted flow data at Kalam Station. Figure 2a shows the results of the SARIMA model: the blue line shows the training data, the green line shows the testing data, and the red line shows the forecasted data produced by the model. Figure 2b,c shows the results of the LSTM model. In Figure 2b, the green line shows the training data (actual values) and the red line shows the testing data (predicted values); in Figure 2c, the green line shows the observed values and the red line shows the values forecast by the model.

**Figure 2.** *Cont*.
