*3.1. Study Area and Datasets*

The proposed approach is applied to forecast monthly streamflow in the Xiangxi River basin in China, which is located in western Hubei province and is part of the Three Gorges Reservoir region, with a basin area of about 3100 km<sup>2</sup> (between 30°57′–31°34′ N and 110°25′–111°06′ E, shown in Figure 3). The Xiangxi River, originating in the Shennongjia Mountain area, is a tributary of the Yangtze River with a main stream length of 94 km [51,52]. Under the influence of a typical subtropical continental monsoon climate, the annual precipitation in this basin is between 670 and 1700 mm [53]. The annual average temperature of this region is 15.6 °C and ranges between 12 °C and 20 °C.

The amount of streamflow is affected by many factors, a large part of which involve geographical and climatic conditions. Specifically, the climatic conditions comprise meteorological variables such as the air temperature (°C) and the precipitation (mm). Previous studies have shown that precipitation has a significant effect on both short- and long-term streamflow [54,55]. Therefore, the total monthly precipitation is used as a predictor in this study. The initial catchment conditions are also non-negligible factors affecting streamflow generation and confluence. Moreover, the monthly average temperature is applied as a predictor for streamflow forecasting [56]. It is noted that observations of hydrological processes tend to vary with time [57]. The occurrence of rainfall events is closely related to fluctuations in streamflow; in particular, the distribution of a rainfall event strongly influences the peak discharge (i.e., flood events). In addition, considering the climatic characteristics of the watershed, snowmelt runoff (mainly in winter) is relatively small, so its influence is ignored. The available hydrological (streamflow, unit: m<sup>3</sup>/s) and meteorological data (temperature and precipitation) from 1962 to 2009 were obtained from the Xingshan Hydrometric Station (located at 110°45′00″ E, 31°13′00″ N, as shown in Figure 3) and were provided by the Hydrological Bureau of Xingshan County. Because Xingshan Hydrometric Station is the largest hydrological control station in the Xiangxi watershed (the representative station of the Three Gorges Hydrological Zone in the 1000–3000 km<sup>2</sup> range), the hydrometeorological data of Xingshan Station were used for the streamflow forecasting. Moreover, good results have also been achieved with a lumped hydrological model for streamflow simulation in the earlier study of Kong [51].

In this study, considering that the current streamflow at month *t* is correlated with the streamflow and precipitation of previous months, the monthly streamflow (*S<sub>t</sub>*) and precipitation (*P<sub>t</sub>*) data sets were separated into multiple lead time (lagged) factors, namely *P<sub>t−1</sub>*, *S<sub>t−1</sub>*, *S<sub>t−2</sub>*, and *S<sub>t−12</sub>*, where *S<sub>t−1</sub>*, *S<sub>t−2</sub>*, and *S<sub>t−12</sub>* represent the streamflow 1, 2, and 12 months before the forecast month *t*, respectively [58,59]. These factors, together with the monthly average temperature (*T<sub>t</sub>*), are the potential prediction factors (inputs) used to predict the monthly streamflow *S<sub>t</sub>* (response variable). In the out-of-sample test of this study, the data set was divided at a specific time point into a training data set (38 years) for model calibration and a test data set (10 years) for validation of the model performance. The predictions were then repeated five times using different training and test data sets. The specific data set division, namely the 5-fold cross-validation scheme, is shown in Table 1 and Figure 4.

*Sustainability* **2021**, *13*, x FOR PEER REVIEW 11 of 24

**Figure 3.** The study area.

**Table 1.** Cross-validation models with different sets of calibration and validation data.


**Figure 4.** Selection of calibration and validation dataset with the 5-fold cross-validation method.
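The predictor construction and data division described in this section can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the function names `build_lagged_samples` and `year_block_folds` are hypothetical, and the placement of the 10-year test windows (evenly spaced start years, which makes neighbouring windows overlap slightly) is an assumption made here, since the actual division is the one given in Table 1.

```python
LAGS_S = (1, 2, 12)  # streamflow lags in months, as listed in the text
LAG_P = 1            # precipitation lag in months


def build_lagged_samples(streamflow, precipitation, temperature):
    """Build (inputs, targets) from three equally long monthly series.

    Each input row is [P_{t-1}, S_{t-1}, S_{t-2}, S_{t-12}, T_t] and the
    matching target is S_t. The first 12 months are skipped because
    S_{t-12} is not available for them.
    """
    n = len(streamflow)
    if not (n == len(precipitation) == len(temperature)):
        raise ValueError("all series must have the same length")
    first_t = max(max(LAGS_S), LAG_P)
    inputs, targets = [], []
    for t in range(first_t, n):
        row = [precipitation[t - LAG_P]]
        row += [streamflow[t - lag] for lag in LAGS_S]
        row.append(temperature[t])
        inputs.append(row)
        targets.append(streamflow[t])
    return inputs, targets


def year_block_folds(first_year, last_year, test_len=10, n_folds=5):
    """Return n_folds (train_years, test_years) pairs.

    ASSUMPTION: each test set is a block of consecutive years whose start
    years are spread evenly over the record; the paper's real division is
    the one in its Table 1.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    years = list(range(first_year, last_year + 1))
    max_start = len(years) - test_len
    folds = []
    for k in range(n_folds):
        start = k * max_start // (n_folds - 1)
        test = years[start:start + test_len]
        train = [y for y in years if y not in test]
        folds.append((train, test))
    return folds
```

With the 1962 to 2009 record, `year_block_folds(1962, 2009)` yields five pairs, each with 38 calibration years and 10 validation years, matching the sizes stated above.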

#### *3.2. Evaluation Measures*

To evaluate the performance of the developed models, four commonly used statistical measures are selected in this study: the coefficient of determination (R<sup>2</sup>), the root mean square error (RMSE), the Nash–Sutcliffe efficiency coefficient (NSE), and the mean absolute error (MAE). The formulae for R<sup>2</sup>, RMSE, MAE, and NSE can be written as follows:

$$R^2 = \frac{1}{K} \sum_{j=1}^{K} \left[ \left( \frac{\sum_{i=1}^{n} \left( Q_i - Q_{avg} \right) \left( P_i - P_{avg} \right)}{\sqrt{\sum_{i=1}^{n} \left( Q_i - Q_{avg} \right)^2} \sqrt{\sum_{i=1}^{n} \left( P_i - P_{avg} \right)^2}} \right)^2 \right] \tag{16}$$

$$RMSE = \frac{1}{K} \sum_{j=1}^{K} \left[ \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Q_i - P_i \right)^2} \right] \tag{17}$$

$$MAE = \frac{1}{K} \sum_{j=1}^{K} \left[ \frac{1}{n} \sum_{i=1}^{n} \left| P_i - Q_i \right| \right] \tag{18}$$

$$NSE = \frac{1}{K} \sum_{j=1}^{K} \left[ 1 - \frac{\sum_{i=1}^{n} \left( Q_i - P_i \right)^2}{\sum_{i=1}^{n} \left( Q_i - Q_{avg} \right)^2} \right] \tag{19}$$
 
where *n* indicates the total number of observations (or predictions); *K* is the number of repeated forecasting periods (*K* = 5), over which the bracketed terms are averaged; *Q<sub>i</sub>* and *P<sub>i</sub>* are the observed and simulated values; and *Q<sub>avg</sub>* and *P<sub>avg</sub>* are the averages of all of the observed and simulated values, respectively.
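As a minimal sketch of how Equations (16)–(19) might be evaluated, the snippet below computes each measure for one forecasting period and then averages it over the *K* periods. The function names and the `periods` structure (a list of observed/simulated pairs, one per period) are assumptions introduced here for illustration and are not taken from the paper.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    q_avg, p_avg = _mean(obs), _mean(sim)
    cov = sum((q - q_avg) * (p - p_avg) for q, p in zip(obs, sim))
    ss_q = sum((q - q_avg) ** 2 for q in obs)
    ss_p = sum((p - p_avg) ** 2 for p in sim)
    return (cov / (math.sqrt(ss_q) * math.sqrt(ss_p))) ** 2


def rmse(obs, sim):
    """Root mean square error for one forecasting period."""
    return math.sqrt(_mean([(q - p) ** 2 for q, p in zip(obs, sim)]))


def mae(obs, sim):
    """Mean absolute error for one forecasting period."""
    return _mean([abs(p - q) for q, p in zip(obs, sim)])


def nse(obs, sim):
    """Nash-Sutcliffe efficiency for one forecasting period."""
    q_avg = _mean(obs)
    num = sum((q - p) ** 2 for q, p in zip(obs, sim))
    den = sum((q - q_avg) ** 2 for q in obs)
    return 1.0 - num / den


def average_over_periods(metric, periods):
    """Average a per-period metric over the K (obs, sim) pairs in `periods`."""
    return _mean([metric(obs, sim) for obs, sim in periods])
```

For instance, `average_over_periods(nse, periods)` with five (observed, simulated) pairs would correspond to the outer average over *j* in Equation (19).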

The 90% confidence interval containing ratio (CR90) and its dispersion index (DI) are also used to evaluate the reliability and sharpness of the probabilistic predictions, respectively. CR90 is the ratio of observations covered by the 90% prediction interval; it ranges between 0 and 1, and the ideal value is 0.90. DI is the average ratio of the width of the 90% prediction interval to the observed value; the lower the value, the better the prediction [60].

$$\begin{cases} \mathrm{CR90} = \dfrac{\sum_{i=1}^{N} k_i}{N}, \quad k_i = \begin{cases} 1, & s_l(i) \le o_i \le s_u(i) \\ 0, & o_i < s_l(i) \ \text{or} \ o_i > s_u(i) \end{cases} \\[2ex] \mathrm{DI} = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{s_u(i) - s_l(i)}{o_i} \end{cases} \tag{20}$$

where *k<sub>i</sub>* indicates whether the *i*th observation *o<sub>i</sub>* falls within the 90% confidence interval with the bounds [*s<sub>l</sub>*(*i*), *s<sub>u</sub>*(*i*)], and *N* is the number of observations. Notably, from the perspective of flood forecasting, a high CR90 alone is insufficient to demonstrate a good prediction, since a correspondingly high DI indicates an overestimation of the uncertainty bounds.
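The two interval measures can be illustrated with a short sketch. The function names `cr90` and `dispersion_index` are hypothetical, and the inputs are assumed to be equally long sequences of observations and of lower and upper bounds of the 90% interval; observations equal to zero would need separate handling because DI divides by the observed value.

```python
def cr90(obs, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    hits = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return hits / len(obs)


def dispersion_index(obs, lower, upper):
    """Mean of interval width divided by the observed value.

    Raises ZeroDivisionError if any observation is exactly zero.
    """
    ratios = [(up - lo) / o for o, lo, up in zip(obs, lower, upper)]
    return sum(ratios) / len(ratios)
```

Reading the two values together reflects the point made above: an interval can reach a CR90 close to 0.90 simply by being very wide, which would show up as a large DI.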

To further illustrate the applicability of the CVQR model in streamflow forecasting, the relative root mean square error (RRMSE) and the relative mean absolute error (RMAE) are used to compare the CVQR, ANN, and MLR models at different quantiles [61]:

$$\begin{cases} \begin{array}{c} \text{RRMSE} = \frac{RMSE^{\text{model}}}{RMSE^{\text{CVQR}}}\\ \text{RMAE} = \frac{MAE^{\text{model}}}{MAE^{\text{CVQR}}} \end{array} \end{cases} \tag{21}$$

in which the RMSE and MAE of the three models are acquired from Equations (17) and (18). RRMSE and RMAE express the error of a given model relative to that of the proposed model (CVQR); values greater than one indicate that the given model performs worse than the proposed model.
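A small sketch of Equation (21) is given below for a single set of predictions. The helper `relative_scores` and its argument names are assumptions for illustration only; in the study, the comparison is made at different quantiles and the errors are those of Equations (17) and (18).

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((q - p) ** 2 for q, p in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error between observed and simulated values."""
    return sum(abs(p - q) for q, p in zip(obs, sim)) / len(obs)


def relative_scores(obs, sim_model, sim_cvqr):
    """Errors of a competing model divided by the errors of CVQR.

    Values above 1 mean the competing model has larger errors than CVQR.
    """
    return {
        "RRMSE": rmse(obs, sim_model) / rmse(obs, sim_cvqr),
        "RMAE": mae(obs, sim_model) / mae(obs, sim_cvqr),
    }
```

Here `sim_model` would hold the ANN or MLR predictions and `sim_cvqr` the CVQR predictions for the same observations.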
