**2. Methods**

Figure 1 illustrates the probabilistic forecast architecture that separately integrates the NARX (Figure 1a) with the UKF approach (Figure 1b). The related methods are briefly described below.

**Figure 1.** Probabilistic forecast architecture. (**a**) The non-linear auto-regressive with exogenous inputs (NARX) neural network model. (**b**) The unscented Kalman filter (UKF) post-processing approach.

### *2.1. Deterministic Flood Forecast Models Based on Artificial Neural Network (ANN)*

The NARX is a recurrent neural network suitable for time-series prediction [42,43]. This study uses it to carry out deterministic flood forecasting because its recurrent mechanism can effectively integrate rainfall and discharge data with the latest outputs of the model to alleviate the time shift phenomenon and improve model reliability. The NARX network produces recurrent connections from

the outputs, which could delay several unit times to produce new inputs (Figure 1a). This nonlinear system for M-step-ahead forecasting (M ≥ 1) can be described mathematically below:

$$\mathbf{\hat{y}}(\mathbf{t}+\mathbf{M})=\mathbf{f}(\mathbf{\hat{y}}(\mathbf{t}+\mathbf{M}-1),\ldots,\mathbf{\hat{y}}(\mathbf{t}+\mathbf{M}-\mathbf{q}),\mathbf{y}(\mathbf{t}),\mathbf{r}(\mathbf{t}),\ldots,\mathbf{r}(\mathbf{t}-\mathbf{p})) \tag{1}$$

where *R*(*t*) = [*y*(t),*r*(*t*), ... ,*r*(*t* − *p*)] and *y*ˆ(*t* + *M*) denote the input vector and the output value at the time step t and t + M, respectively. *f*(·) is the nonlinear function. p and q are the input-memory and output memory orders. Two inputs: *y*ˆ(*t* + *M* − *i*) (*i* = 1, 2, ... , *q*) serves as an autoregressive variable (e.g., forecasted runoff), *R*(*t*), serves as an implicit exogenous variable (e.g., observed rainfall and runoff) in a time series.

The BPNN, a static neural network, is implemented for comparison purposes. Both BPNN and NARX have the same model architecture of one input layer, one hidden layer and one output layer. The NARX model is a dynamic ANN model with the recurrent mechanism whereas the BPNN is a non-recurrent ANN model. The mathematical equation of the BPNN model is described as follows:

$$\circ(\mathbf{t} + \mathbf{M}) = \mathbf{f}(y(\mathbf{t}), \mathbf{r}(\mathbf{t}), \dots, \mathbf{r}(\mathbf{t} - \mathbf{p})) \tag{2}$$

In this study, the Levenberg–Marquardt back-propagation algorithm is used to train both ANN models [44]. The transfer functions of hidden and output layers are of a sigmoid type and a linear type, respectively, owing to their practicability and good performance in flood forecasting [45].

The parameters of both models consist of: the maximal epoch = 1000, the initial learning rate (mu) = 0.001, the increasing factor of mu = 10, the decreasing factor of mu = 0.1 and the maximal value of mu = 1000 respectively. Besides, the two ANN models are configured to have one hidden layer with 8–10 nodes in the perspective of different forecast horizons. The value of output memory *q* is set as one whereas the value of input memory p should be determined by using correlation analysis methods (presented in the section of study area and materials).

#### *2.2. Probabilistic Forecasting Based on the Unscented Kalman Filter (UKF)*

The UKF is an optimal recursive data processing algorithm and can be introduced to find the optimal estimation error for each state in multi-step-ahead flood forecasts [17]. The unscented Kalman filter is applied to model a nonlinear flood forecasting system and described as below:

$$\mathbf{x}(t+1) = \mathcal{g}(\mathbf{x}(t), \mathbf{u}(t), \mathbf{v}(t), t) \tag{3}$$

$$\mathbf{w}(t) = h(\mathbf{x}(t), \mathbf{u}(t), t) + \mathbf{w}(t) \tag{4}$$

where *x*(*t*) is the n-dimensional state of the flood forecasting system at time-step t, *u*(*t*) is the input vector at time-step t, *v*(*t*) is the k-dimensional state noise process vector owing to disturbances and model errors, *z*(*t*) and *w*(*t*) are the observation vector and noise, *g*(·) and *h*(·) are the distributions of forecasted and observed datasets, both are assumed following Gaussian distribution.

The implementation procedures for the UKF (Figure 1b) contain the following four steps [17].

Step 1: Computation of set of sigma points and assignment of weights to all sigma points. The *n*-dimensional random variable *x* with mean *x* and covariance *Pxx* is transformed to 2*n* + 1 weighted points described below:

$$\mathbf{x}\_0 = \overline{\mathbf{x}}\_\prime \,\, \mathcal{W}\_0 = \lambda / (\mathfrak{n} + \lambda) \tag{5a}$$

$$\mathbf{x}\_{i} = \overline{\mathbf{x}} + \sqrt{(n+1)\mathbf{P}\_{\mathbf{x}\mathbf{x}\prime}} \text{ } \mathcal{W}\_{i} = 1/2(n+\lambda) \tag{5b}$$

$$\chi\_{i+n} = \overline{\mathfrak{x}} - \sqrt{(n+1)\mathfrak{P}\_{\text{xx}}} \; \mathcal{W}\_{i+n} = 1/2(n+\lambda) \tag{5c}$$

where χ0, χ*<sup>i</sup>* and χ*i*+*<sup>n</sup>* are the 2*n* + 1 sigma points, *W*0, *Wi* and *Wi*+*<sup>n</sup>* are the 2*n* + 1 weight coefficients, λ is the spreading parameter.

*Water* **2020**, *12*, 578

Step 2: Transformation of the points through non-linear function. The transformed set is produced by using a nonlinear function and the predicted mean and covariance are described below:

$$\mathfrak{X}\_{i}(t+1|t) = \mathfrak{g}\left(\mathfrak{x}\_{i}^{\mathfrak{n}+\mathbf{k}}(t|t), \mathfrak{u}(t), t\right) \tag{6}$$

$$\mathfrak{X}(t+1|t) = \sum\_{i=0}^{2(n+k)} \mathcal{W}\_i \mathfrak{X}\_i^{n+k}(t+1|t) \tag{7a}$$

$$\mathbf{P}(t+1|t) = \sum\_{i=0}^{2(n+k)} \mathcal{W}\_{i} \cdot \left[ \mathbf{x}\_{i}(t+1|t) - \hat{\mathbf{x}}(t+1|t) \right] \cdot \left[ \mathbf{x}\_{i}(t+1|t) - \hat{\mathbf{x}}(t+1|t) \right]^{\mathrm{T}} \tag{7b}$$

where χ*i*(*t* + 1|*t*) is the transformed set. *x*ˆ(*t* + 1|*t*) and **P**(*t* + 1|*t*) are the predicted mean and covariance of the transformed set.

Step 3: Computation of Gaussian function from weighted and transformed points.

$$\mathbf{Z}\_{i}(t+1|t) = h(\mathbf{x}\_{i}(t+1|t), \boldsymbol{\mu}(t), t) \tag{8}$$

where *Zi*(*t* + 1|*t*) is the observed dataset computed by using the transformed point.

Step 4: Probabilistic forecasts. A Monte Carlo simulation is conducted to create probabilistic forecasts. A realization of observation *h*m at the horizon m can be simulated according to the Gaussian function (Equation (7)) and the Monte Carlo simulation would be repeated for K times. K is the number of Monte Carlo samples and set as 1000 in this study; 90% confidence intervals are employed to reveal the uncertainty of probabilistic flood forecasts. Then, both observed and forecasted datasets are transformed into the real space for evaluating the performance of UKF probabilistic forecasts.

#### *2.3. Evaluation Indicators*

To evaluate the deterministic forecast accuracy and predictability of flood peak and flood volume, the mean absolute error (MAE), the peak percent threshold statistics (PPTS) [46] and the Nash–Sutcliffe efficiency (NSE) coefficient [47] were introduced accordingly. Their mathematical expressions are described below:

$$\text{MAE} = \frac{1}{N} \sum\_{\mathbf{t}=1}^{N} [\hat{Z}(\mathbf{t}) - Z(\mathbf{t})]\_{\mathbf{t}} \text{ MAE} \geq 0 \tag{9}$$

$$\text{PPTS}(\mathbf{l}, \mathbf{u}) = \frac{1}{(\mathbf{k}\_{\mathbf{l}} - \mathbf{k}\_{\mathbf{u}} + 1)} \sum\_{t=1}^{N} \left( \left| \frac{\mathbf{Z}(\mathbf{t}) - \mathbf{Z}(\mathbf{t})}{Z(\mathbf{t})} \right| \right) . \text{PPTS} \ge 0 \tag{10}$$

$$\text{NSE} = 1 - \frac{\sum\_{\mathbf{t}=1}^{N} \left( \hat{Z}(\mathbf{t}) - Z(\mathbf{t}) \right)^{2}}{\sum\_{\mathbf{t}=1}^{N} \left( Z(\mathbf{t}) - \overline{Z}(\mathbf{t}) \right)^{2}}, \text{NSE} \le 1 \tag{11}$$

where *kl* = *<sup>l</sup>*×*<sup>N</sup>* <sup>100</sup> and *ku* <sup>=</sup> *<sup>u</sup>*×*<sup>N</sup>* <sup>100</sup> . N is the number of observed data while l and u are the lower and higher limits in percentage, respectively. For instance, PPTS(l, 10%) denotes the flood percentage threshold statistic of larger than 10% data whereas PPTS(90%, u) denotes the flood percentage threshold statistic of smaller than 90% data. *Z*ˆ(*t*), *Z*(*t*) and *Z*(*t*) are the forecasted data (i.e., model output), observed data and the average of observed data at the *t* time, respectively.

The containing ratio (CR), the average relative bandwidth (RB) and the continuous ranked probability score (CRPS) were used for assessing the performance of probabilistic forecasts [48,49]. Their mathematical formulas are described below.

$$\mathbf{N}(\mathbf{t}) = \begin{cases} 1, & \text{if}\Big(\mathbf{q}\_{\mathrm{l}}(\mathbf{t}) \le \hat{\mathbf{Z}}(\mathbf{t}) \le \mathbf{q}\_{\mathrm{u}}(\mathbf{t})\Big) \\ & \text{0, else} \end{cases} \tag{12a}$$

$$\text{CR} = \frac{\sum\_{\mathbf{t}=1}^{\text{N}} \text{N}(\mathbf{t})}{\text{N}} \times 100\% \tag{12b}$$

$$\text{RB} = \frac{1}{\text{N}} \sum\_{\mathbf{t}=1}^{\text{N}} \left( \frac{\mathbf{q}\_{\text{u}}(\mathbf{t}) - \mathbf{q}\_{\text{l}}(\mathbf{t})}{Z(\mathbf{t})} \right) \tag{13}$$

$$\text{CRPS} = \int\_{-\infty}^{+\infty} \left[ \mathbf{F}^{\mathbf{f}}(\mathbf{x}) - \mathbf{F}^{\mathbf{o}}(\mathbf{x}) \right]^{2} \mathbf{dx} \tag{14}$$

where *ql*(*t*) and *qu*(*t*) are the lower and upper limitation of the forecasted value at the t time, *Ff*(*x*) and *Fo*(*x*) are the cumulative distribution functions (CDF) of the forecast and observation distributions, respectively, x is the variable of the CDF.

From the standpoint of model performance, the indicators of MAE, PPTS and NSE are employed to evaluate the accuracy of deterministic flood forecasts while the indicators of CR, RB and CRPS are employed to evaluate the correctness and sharpness of probabilistic flood forecasts. The general implementation programming of the NARX and UKF can be downloaded from the Statistics and Machine Learning Toolbox of the Matlab software (website: https://ww2.mathworks.cn/products/ statistics.html#machine-learning).

#### **3. Study Area and Materials**

Figure 2a illustrates the study area. The Yangtze River is famous for the longest river of China, and its length and drainage area are 6300 km and 1.80 million km2 accordingly. The Three Gorges Reservoir (TGR) is a pivotal hydraulic facility that serves many purposes consisting of flood control, hydropower production and navigation etc. We notice that the upper Yangtze River' tributaries have notoriously complex hydro-geological characteristics. Plenty of studies were devoted to developing reliable and accurate short-term (less than 24-h ahead) flood forecast models for the Yangtze River due to the extremely non-linear relationship between rainfall and runoff over this basin during storm events [50,51]. Besides, a small improvement in the reliability and accuracy of short-term flood forecasts could be critical and beneficial to flood prevention as well as the dynamic management of the TGR. Heavy rainfalls usually cause floods and further induce downstream flooding within one day. In consequence, rational reservoir operation as well as river basin management require reliable and accurate flood forecasting so that they can adequately handle the high uncertainty of river discharge ranging from 8000 up to 70,000 m3/s, according to the observed reservoir inflows (Figure 2b).

The inflow of the TGR contains three parts: the upstream inflow from the discharge of the XJB reservoir, the inflows from 8 tributaries (flow stations F1–F8), and rainfall (aggregated into two variables: Rainfall-I and Rainfall-II) monitored by 67 rain gauge stations spreading over the TGR intervening basin. Inflow and rainfall data for use in this study were gathered from 2003 to 2018 at a temporal scale of 6 h. When model training completes (2003–2010, 8 years), two ANN models (BPNN and NARX) are constructed. Then the trained ANN that creates the best forecast accuracy in the validation period (2011–2014, 4 years) is identified as the final model to be evaluated upon model reliability using test datasets (2015–2018, 4 years). The Kendall tau coefficient analysis [38,52] is employed to extract the highest correlation of lag-times between model input and output values. According to the highest correlation coefficients, lag-times between the inflow of the TGR and flow/rainfall at various gauge stations are set as 6 h (TGR), 48 h (XJB reservoir), 48 h (F1), 48 h (F2), 42 h (F3), 42 h (F4), 24 h (F5), 18 h (F6), 18 h (F7), 12 h (F8), 42 h (Rainfall-I) and12 h (Rainfal-II) [36]. To reduce the adverse effect of the distinct scales of input data on model performance, all 12 input variables (one autoregressive variable plus 11 exogenous variables) were transformed into the same scale during data preprocessing.

**Figure 2.** Study area and the statistical characteristics of reservoir inflows. (**a**) Locations of the Xiang-Jia-Ba (XJB) Reservoir and the Three Gorges Reservoir (TGR), river flow as well as rain gauging stations; (**b**) The boxplot of TGR inflows collected in flood seasons (from 1 June to 30 September) during 2003 and 2018 at a temporal scale of 6 h.

#### **4. Results and Discussion**

This study intends to promote the predictability of the deterministic flood forecast model that is integrated with a UKF for probabilistic forecasting at different time horizons.

#### *4.1. Performance of Deterministic Flood Forecasts*

The short-term (one-day-ahead) forecast not only provides a crucial guideline in reservoir operation but also offers a warning to residents at inundation prone areas. Consequently, a 24 h lead time at a temporal scale of 6 h is suggested to evaluate the performance of two deterministic flood forecast models (NARX and BPNN). Four horizons (t + 1 to t + 4) are specified that 6 h (t + 1) is the first prediction, 12 h (t + 2) is the second prediction, 18 h (t + 3) is the third prediction and 24 h (t + 4) is the fourth prediction.

Table 1 summarizes the results of the NARX and BPNN models. It indicates that the BPNN model creates much lower NSE values but much higher PPTS and MAE values than the NARX model at all three stages. The PPTS indicator is able to show model forecast accuracy in different flood magnitudes. For instance, PPTS (l, 10%) denotes the flood percentage threshold statistic of larger than 10% data. The results in Table 1 demonstrate that the PPTS indicator raises gradually if the lead time increases from t + 1 to t + 4. The NARX model makes the smallest increments in PPTS and MAE values along the lead time while the BPNN model makes the largest ones. In addition, the NARX model displays its superiority compared with the BPNN model in predicting high-magnitude floods regarding frequencies of 2% and 5%. Given a lead time of one day (24 h), the NARX model can increase the NSE by 14.5% while decreasing the PPTS (l, 2%) by 35.8% and the MAE by 20.1% at the testing stage, in comparison to the BPNN model. The results demonstrate that the NARX model performs much better than the BPNN model as the forecast horizon increases. Therefore, it is obvious that the NARX model incorporated with a recurrent mechanism can improve forecast accuracy at longer horizons by feeding itself with the forecasted inflows attained from previous horizons.


**Table 1.** Performances (peak percent threshold statistics (PPTS), mean absolute error (MAE) and Nash–Sutcliffe efficiency (NSE)) of NARX and back-propagation neural network (BPNN) models in three stages.

To distinguish between the BPNN and NARX models on deterministic forecasting capabilities in the testing stage, three flood events with maximal peak-flows reaching 35,000 m3/s (small), 50,000 m3/s (medium) and 60,000 m3/s (large) are specified to test both models by evaluating the goodness-of-fit between observed and forecasted values at time-step t + 3 and t + 4 (Figure 3). The results appear to show that the NARX model is able to significantly mitigate the time gap between the observed and forecasted flood peaks. Furthermore, the NARX model can produce good forecast results at time-step t + 3 and t + 4, whereas the BPNN model creates a noticeable time-lag problem and fairly big gaps between observed and forecasted values at time-step t + 3 and t + 4. This demonstrates that the NARX model is able to effectively reduce time-lag phenomena and provide accurate deterministic flood forecasting results.

**Figure 3.** TGR inflow forecasts using the NARX and the BPNN models. Three flood events with maximal peak-flow exceeding (**a**) 35,000 m3/s (small-scaled), (**b**) 50,000 m3/s (medium-scaled) and (**c**) 60,000 m3/s (large-scaled).

Although the NARX model provides substantial evidence of a good performance in flood forecasting, it is easy to produce systematic under-prediction results for extreme flood events (Figure 3). In addition, hydrologic uncertainties raised in model inputs (e.g., precipitation), as well as the structure and parameters, can be the drivers and causes of time-lag problems occurring in flood forecasting. Therefore, we next adopt a processing approach (i.e., UKF) to decrease the hydrological uncertainty based on the presumption that no uncertainty encounters in model input data.

### *4.2. Probabilistic Flood Forecasting Performance*

Table 2 illustrates the results of the CR, RB and CRPS corresponding to the UKF plus NARX and the UKF plus BPNN approaches in all three stages at forecast horizons (t + 1 to t + 4). It supports the superiority of the UKF plus NARX approach in all three stages, whereas the UKF plus BPNN approach performs almost as well as the UKF plus NARX one only in the training stages at forecast horizons up to t + 2 (RB is lower than 0.15, CRPS is lower than 1200 m3/s and CR is higher than 90%). The results demonstrate that the UKF plus NARX approach achieves higher reliability in probabilistic flood forecasting than the UKF plus BPNN one. For horizon t + 4 (one day ahead) in the testing stage, the UKF plus NARX approach is able to increase the CR value by 8.48% while reducing the RB and the CRPS values by 22.73% and 20.31%, respectively, in comparison to the UKF plus BPNN one. In other words, the UKF plus NARX approach is able to significantly improve probabilistic forecast accuracy by producing the narrower prediction bound.


**Table 2.** Probabilistic forecasting performance (containing ratio (CR), average relative bandwidth (RB) and continuous ranked probability score (CRPS)) regarding TGR reservoir inflow.

For obviously differentiating the capabilities of the UKF plus NARX and the UKF plus BPNN approaches in the testing stage, the three aforementioned flood events are still specified to test both approaches by evaluating whether the observations fall in the 90% confidence interval at lead time t + 4 (Figure 4). It appears that most of the observations lie within the 90% confidence intervals created by both probabilistic forecasting approaches while the UKF plus NARX approach offers a narrower prediction range. Gneiting et al. [1] advocated that the maximization of the sharpness of the predictive distribution is the goal of probabilistic flood forecasts. Therefore, the UKF plus NARX approach is considered superior to the UKF plus BPNN one.

**Figure 4.** Probabilistic flood forecasts for TGR of the testing stage at lead time t + 4. Three flood events with maximal peak-flow exceeding (**a**) 35,000 m3/s (small-scaled); (**b**) 50,000 m3/s (medium-scaled); and (**c**) 60,000 m3/s (large-scaled). The 90% confidence bound is corresponding to the lower (5%) and upper (95%) limitation of the forecasted value at time t.

For median forecasts at horizons from t + 1 up to t + 4 in three stages, the indicators of MAE and CRPS closely associated with the deterministic forecast model (NARX) and the UKF plus NARX approach are listed in Table 3. It is revealed that the median forecasts of the UKF plus NARX approach would output lower values of MAE and CRPS indicators than those of the deterministic forecast model NARX after the horizon t + 2. That is to say, from the perspective of the median forecast, the probabilistic forecast approach also can mitigate the drawback of under-prediction.


**Table 3.** Results of median forecasts regarding inflows of Three Gorges Reservoir.

In summary, the UKF plus NARX approach not only can create more accurate and robust probabilistic forecasting results but can also mitigate the problems of systematic under-prediction of extreme flood events. Despite UKF indeed correcting for bias, it also produces a probabilistic forecast which we believe is more important. Furthermore, the reasons that we do not conduct a comparative analysis between LKF (including the autoregressive model, AR(1)) and NKF consist of: first, the AR(1) is a simple formation of LKF. The LKF approach can only identify the linear error estimation whereas the NKF approach can quantify the non-linear error estimation. Second, the rainfall-runoff process is intrinsically complex and non-linear and demands non-linear techniques for quantifying predictive uncertainty.

We want to mention that the probabilistic forecasting approach only spent around 60-s calculation temporal cost to provide the deterministic (within 50 s) and the probabilistic (within 10 s) flood forecasting with respect to the inflows of the TGR. A DELL computer conducted the computation (Intel® CoreTM i5, 7th Generation CPU @ 2.50 GHz, RAM 8 GB and 1 TB Hard Disk).

#### **5. Conclusions**

We adopt a probabilistic forecasting methodology that hybridises UKF and ANN to generate posterior distributions from observed and forecasted inflows for effectively reducing the predictive distributions occurring in data-driven flood forecasting to small ranges. The contribution of the UKF approach depends on modeling the non-linear correlation among hydrologic variables and on reducing the uncertainty arosing in flood forecasting. The results demonstrated that the recurrent neural network (NARX model) produced more accurate and stable deterministic flood forecasting on the inflows as longer lead time and effectively mitigated time-lag effects as compared with the static neural network (BPNN model). The reason could be due to a key strategy: the recurrent mechanism drives the integration of the antecedent observations and forecasts of input variables into the next forecasting step for alleviating the accumulation and propagation of multi-step-ahead forecast errors.

Nevertheless, the NARX model also suffered the technical barrier of systematic under-prediction of extreme flood events. Therefore, the UKF was applied to the post-processing of deterministic forecasts obtained from the NARX model. The results demonstrated that the UKF plus NARX approach not only can provide a narrower predictive distribution on the inflow series at longer forecast horizons but also can significantly alleviate under-prediction phenomena. The reason can be due to the key strategy: the effective quantification of the non-linear correlation among variables for lessening hydrologic uncertainty. Finally, it is worth noting that the computational time of the UKF plus NARX approach is extremely short (less than 60 s); therefore, it can be applied with success to real-time flood forecasting. **Author Contributions:** Y.Z. carried out the analysis and wrote the article; S.G., C.-Y.X. and F.-J.C. developed the methodology; C.-Y.X., F.-J.C. and J.Y. provided technical assistance and contributed in writing the article. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (No. 51539009 and No. U1865201), the National Key Research and Development Program of China (2018YFC0407904) and the Research Council of Norway (FRINATEK Project 274310).

**Acknowledgments:** We thank the Changjiang Water Resources Commission of China for providing the monitoring data, and the data can be downloaded from this website (http://www.cjh.com.cn, Chinese). The authors would like to thank the editors and anonymous reviewers for their constructive comments that greatly contributed to improving the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
