#### Historical flood event 2006

According to Table 2, 83% of the grids have an RMSE smaller than 0.2 m in the 3 h prediction, and the rest of the grids have an RMSE around zero. For the 6 h prediction, 79% of the grids have an RMSE less than 0.2 m. In the 9 h and 12 h predictions, the area with large errors grows slightly (see Table 2). Figure 6 shows that the inundation maps from the 3 h and 6 h predictions match the hydraulic inundation maps well. The 9 h and 12 h predictions are less accurate, especially in the southwest of the study area, which is further away from the location of the discharge inflows and is thus likely less sensitive to changes in the discharge inputs. In brief, all predictions for flood event 2006 are accurate, with more than 82% of the grids having an RMSE less than 0.3 m.

#### Historical flood event 2013

For the historical flood event 2013, the discharge forecast threshold for the forecast start was reached later, signaling that the discharge forecast threshold is indeed effective in starting and stopping the forecast. The forecast is therefore picked up at a later moment, once one discharge crosses the forecast threshold of 10 m<sup>3</sup>/s for a second time. The red line in Figure 7 marks this new start of the forecast. For this event, it lies nine hours after the first forecast start signaled by the ANN. Table 3 shows that the ANN model achieved high accuracy for the flood event in 2013. For the 3 h prediction, 96% of the grids have an RMSE less than 0.2 m, and for the 6 h prediction, 82% of the grids do. In the 3 h and 6 h predictions, the ANN thus performs better for event 2013 than for event 2006. Overall, the event of 2013 is also well predicted, with over 78% of the grids having an RMSE less than 0.3 m. As for event 2006, the predicted flood inundation maps for the 3 h and 6 h intervals are similar to the hydraulic inundation simulations (see Figure 8).

#### Historical flood event 2005

The general model performance for this flood event is poorer. Figure 10 compares the inundation maps predicted by the ANN with those of the hydraulic model; the ANN underestimates the inundated area (see the dark blue area in Figure 10b,d,f,h). However, Table 4 shows that the prediction accuracy for the first intervals of flood event 2005 is still sufficient (above 65%) if an error within 0.3 m is considered acceptable. The comparison can be observed in the water depth maps in Figures 6, 8 and 10, particularly when comparing subplots (e) with (f) and (g) with (h). The ANN results differ more from the hydraulic results at points located closer to the southwest end of the study area, because those points are further away from the inflow points (Figure 4).
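The accuracy measure quoted throughout this section, the share of grids whose RMSE stays below an error threshold, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `fraction_of_grids_below`, the array layout and the toy depths are hypothetical, and it assumes that the RMSE of a grid is taken over the time steps of one forecast interval.

```python
import numpy as np


def fraction_of_grids_below(pred, ref, threshold_m=0.3):
    """Share of grid cells whose RMSE is below threshold_m.

    pred, ref: arrays of shape (n_steps, n_cells), water depths in metres.
    Assumption: the RMSE of a cell is computed over the n_steps samples.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # One RMSE value per grid cell (column).
    rmse_per_cell = np.sqrt(np.mean((pred - ref) ** 2, axis=0))
    return float(np.mean(rmse_per_cell < threshold_m))


# Toy depths, invented for illustration: 2 time steps, 4 grid cells.
ref = np.array([[0.0, 0.5, 1.0, 2.0],
                [0.0, 0.6, 1.2, 2.5]])
pred = np.array([[0.0, 0.4, 1.5, 2.0],
                 [0.1, 0.6, 1.6, 2.4]])
share = fraction_of_grids_below(pred, ref, threshold_m=0.3)
print(f"{share:.0%} of grids have RMSE < 0.3 m")
```

In this toy case, three of the four cells stay below the threshold. The third cell, with errors of 0.5 m and 0.4 m, does not.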

## *5.2. Assessment of Real-time Forecasting of Water Depths for Multistep Flood Forecast Intervals, 1–5 h*

In this section, we test the hypothesis of this study: that different ANNs trained for the first interval of the forecast (from time 0) can also be used to issue the multistep forecast. We test the ANNs for four forecast intervals, namely the next 3, 6, 9 and 12 hours (X), and for five forecast starts, 1–5 hours after time 0 (S). Each forecast is denoted by the format "X h + S h".
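The notation can be made concrete by enumerating all combinations of forecast interval and forecast start. The sketch below is only an illustration under one reading of the notation: it assumes that a forecast labelled "X h + S h" is issued S hours after time 0 and refers to the water depths X hours later. The names `forecast_schedule`, `issue_hour` and `valid_hour` are hypothetical, and S = 0 is included to represent the training start.

```python
from itertools import product

INTERVALS_H = (3, 6, 9, 12)  # forecast intervals X, one ANN per interval
SHIFTS_H = range(0, 6)       # forecast starts S; S = 0 is the training start


def forecast_schedule(intervals=INTERVALS_H, shifts=SHIFTS_H):
    """List (label, issue_hour, valid_hour) for every X h + S h forecast."""
    schedule = []
    for x, s in product(intervals, shifts):
        # Assumed reading: issued at hour s, valid at hour s + x.
        schedule.append((f"{x} h + {s} h", s, s + x))
    return schedule


schedule = forecast_schedule()
for label, issue_hour, valid_hour in schedule[:3]:
    print(label, "issued at hour", issue_hour, "valid at hour", valid_hour)
```

With four intervals and six starts, the schedule contains 24 forecasts per event.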

#### Historical flood event 2006

Table 5 shows the forecast accuracy of the historical flood event in 2006. The forecast of the 2006 flood event shows good accuracy for the 3 h, 6 h and most of the 9 h forecasts (over 70% of the grids with RMSE < 0.3 m). As the multistep forecast shifts further away from the original start used for the model training (X h + 5 h forecasts in Table 5), the ANN model performance decreases.

#### Historical flood event 2013

In the flood event of 2013, the discharge exceeds the forecast threshold only briefly at the beginning. Hence, the forecast is deactivated and then reactivated when the discharge exceeds the forecast threshold of 10 m<sup>3</sup>/s for the second time, namely 9 hours after time 0 (the first time the forecast window was activated). From this second starting point, the forecasts are issued for X h + 1 h to X h + 5 h. Table 6 shows the forecast accuracy of the historical flood event of 2013. According to this table, the ANN model performs similarly to the flood event in 2006. The forecast of the 2013 flood event has good results for the 3 h, 6 h and most of the 9 h forecasts (over 70% of the grids with RMSE < 0.3 m).
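The activation logic described here, in which the forecast restarts at the second upward crossing of the discharge threshold, can be sketched as below. Only the threshold of 10 m<sup>3</sup>/s comes from the text. The function name `upward_crossings`, the use of a single hourly inflow series and the discharge values are assumptions made for illustration.

```python
import numpy as np

THRESHOLD_M3S = 10.0  # discharge forecast threshold stated in the text


def upward_crossings(discharge, threshold=THRESHOLD_M3S):
    """Indices where the series rises from <= threshold to > threshold."""
    q = np.asarray(discharge, dtype=float)
    above = q > threshold
    # A crossing at index i: step i-1 is not above, step i is above.
    return np.flatnonzero(~above[:-1] & above[1:]) + 1


# Hourly discharge in m^3/s, invented for illustration only.
q = [4, 11, 9, 8, 7, 7, 8, 9, 9, 9, 12, 20, 35, 30, 18]
starts = upward_crossings(q)
print("forecast activations at hours:", starts.tolist())
```

For this invented series the activations fall at hours 1 and 10, so the second start lies 9 hours after the first. A series that already begins above the threshold would not register a crossing at index 0, so a production version would need to handle that case explicitly.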

#### Historical flood event 2005

Table 7 shows the percentage of grids with an RMSE less than 0.3 m. As the forecast starting point changes, the forecasts for the different intervals have an RMSE similar to that of the forecasts made for the first intervals. The model provides a good forecast (over 70% of the grids with RMSE < 0.3 m) for the 3 h interval at all starting points. This shows that the 3 h ANN trained for the first interval could be used to forecast subsequent intervals with a slight drop in the overall accuracy. However, the forecasts of 6 h, 9 h and 12 h show a poor performance (Tables 5–7). As in the other events, most of the errors occur in the southwest of the study area (Figure 10). In this particular event, however, the errors are substantially larger in the southwest than in the city center, hence the overall poor performance of the ANNs.

In all three historical events, the forecast accuracy decreases as the forecast interval increases from 3 h to 12 h (see Tables 5–7). One exception occurs in event 2006 between 9 h and 12 h, where the 12 h forecast has a higher accuracy than the 9 h forecast. According to the discharge curve (see Figure 5), the two major discharges in this event are, unlike in the other events, falling after the peak value, which could be the reason for the higher accuracy at 12 h in this case.

#### *5.3. Forecast of the Inundation Extent*

The forecasts of the growth of the flood inundation extent are examined through the statistical analysis proposed by Li et al. [33]. Figure 11 shows three indices, POD, FAR and CSI, that measure the forecast performance for the flood inundation extent. The POD index (see Figure 11a,d,g) shows that, for the 3 h ANN forecast, the accuracy decreases slightly as the forecast proceeds from 3 h + 0 h to 3 h + 5 h. In other words, the accuracy of the 3 h ANN is more sensitive to the shift of the forecast intervals than that of the 6 h, 9 h and 12 h ANNs. The 3 h network achieves the best forecast performance for the first interval (where the training interval is the same as the forecast interval). When moving forward in the multistep forecast, each one-hour shift decreases the POD by a value between 0.08 and 0.1. This means that an additional 8% to 10% of the inundation extent displayed by the hydraulic model is missing in the ANN forecast. In any case, and except for the event of 2005, the POD exhibits values above 70% for the first 2 hours of the forecast.

The FAR index (see Figure 11b,e,h) indicates the false-alarm percentage of the flood inundation extents forecasted by the ANN. In all three events, the percentage of the area with false alarms decreases for all forecast networks as the forecast interval moves forward. The ANN forecast thus produces a higher percentage of false alarms at the early stage of a flood event. This is because the flood inundation is relatively small at the beginning and the number of outer pixels is larger than that of inner pixels, causing a higher number of false alarms. Moreover, the decreasing trend of both POD and FAR indicates that the ANN model tends to change from overestimation to underestimation as the forecast start shifts from 0 h to 5 h.

The CSI index (see Figure 11c,f,i) shows the percentage of agreement between the flood inundation extent forecasted by the ANN and that of the hydraulic model. The ANN predicts the flood inundation extents better for the two events of 2006 and 2013 than for the event of 2005, with the CSIs for 2006 and 2013 close to 0.6. For the event of 2005, the CSI is around 0.4, indicating a poor forecast of the flood inundation extent.
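The three indices can be computed from a pixel-wise comparison of wet and dry cells. The sketch below uses the standard contingency-table definitions of POD, FAR and CSI, which are assumed here to correspond to the analysis of Li et al. [33]. The function name `extent_scores`, the wet/dry threshold `wet_threshold_m` and the toy depth maps are hypothetical.

```python
import numpy as np


def extent_scores(pred_depth, ref_depth, wet_threshold_m=0.0):
    """Return (POD, FAR, CSI) for two water depth maps of equal shape.

    A cell counts as inundated if its depth exceeds wet_threshold_m
    (an assumed criterion, not taken from the paper).
    """
    pred_wet = np.asarray(pred_depth, dtype=float) > wet_threshold_m
    ref_wet = np.asarray(ref_depth, dtype=float) > wet_threshold_m
    hits = np.sum(pred_wet & ref_wet)           # wet in both maps
    misses = np.sum(~pred_wet & ref_wet)        # wet only in the reference
    false_alarms = np.sum(pred_wet & ~ref_wet)  # wet only in the forecast
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return float(pod), float(far), float(csi)


# Toy depth maps in metres, invented for illustration.
ref = np.array([[0.0, 0.5, 1.0],
                [0.0, 0.2, 0.0]])
pred = np.array([[0.1, 0.4, 0.9],
                 [0.0, 0.0, 0.0]])
pod, far, csi = extent_scores(pred, ref)
print(f"POD={pod:.2f} FAR={far:.2f} CSI={csi:.2f}")
```

With these definitions, a falling POD means that more of the reference extent is missed, while a falling FAR means that less of the forecasted extent lies outside the reference extent. The sketch does not guard against maps without any wet cells, for which the ratios are undefined.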

It is noteworthy that the ANN shows the expected decrease in water depth prediction performance with lead time (see Tables 5–7). For the inundation extent, however, the results seem contradictory, as the 12 h prediction shows better performance (see CSI and POD in Figure 11). This apparent contradiction can be explained by the topography, which limits the size of the inundation and thereby makes the extent easier for the ANN to predict.

#### **6. Conclusions**

The aim of this study was to perform multiple subsequent forecasts for 1–5 h after the flooding event has started. It was shown that it is possible to use different ANNs trained for the first interval of the forecast (time 0) to issue the multistep forecasts. However, a distinction should be made between the quality of the forecast of the water depths and that of the flood inundation extent. The overall forecast performance for the water depths was found to be slightly better than for the flood inundation extents. The performance was most adversely affected by the flood event of 2005, in particular close to the southwest end, far away from the location of the inflows.

The ANN model was first applied to the forecast of the first intervals of 3 h, 6 h, 9 h and 12 h. For the 60 synthetic flood events in the testing dataset, the model produced good results, with over 81% of the grids having an RMSE less than 0.3 m. For the historical events of 2006 and 2013, the model predicted the water depths well, with accuracies of over 82% and 78%, respectively, evaluated as the share of grids with an RMSE smaller than the error threshold of 0.3 m. For flood event 2005, the performance is sufficient, with an accuracy of over 65% under the same criterion. The inundation maps forecasted by the ANN for all three historical events have a similar shape to the inundation maps from the hydrodynamic model (HEC-RAS). For the far-end area away from the inflows, the long distance may be responsible for a decrease in the forecast performance; it is therefore likely that the model requires information other than discharge to enhance the forecast accuracy in those areas.

The ANN model was then applied to the real-time forecast of the historical events in 2006, 2013 and 2005. For this purpose, the same ANN model was used, and the discharge inputs were replaced by those of the intervals shifted by 1–5 h after the event's beginning. The real-time forecast shows good results for flood events 2006 and 2013, with over 70% of the grids having an RMSE less than 0.3 m. It shows worse results for flood event 2005, with only over 58% of the grids having an RMSE less than 0.3 m. Overall, the forecast accuracy drops as the forecast interval increases from 3 h to 12 h. The forecast accuracy also decreases as the forecast progresses from X h + 1 h to X h + 5 h. For all three historical flood events, the 3 h forecast is classified as good, with more than 70% of the grids accurately forecasted. However, the quality of the forecasts for intervals of 6 h or longer was more event dependent.

Based on the analysis of the POD, FAR and CSI indices, the multistep ANN flood forecast provides good results at the beginning, and its performance decreases as the forecast progresses. The forecasts of the ANN model switch from an overestimation to an underestimation when the forecast proceeds from 0 h to 5 h. In our case, except for the event 2005, the 3 h ANN trained by the first interval improved the performances slightly with the multistep forecast; the accuracies of the 6 h, 9 h and 12 h ANNs trained by the first interval and used for multistep interval forecasts depend on the specific flood event.

Future research could include recurrent neural networks with long short-term memory to incorporate the water depth information acquired from previously forecasted steps into the multistep forecast. Reducing the forecast time interval to obtain finer temporal multisteps could be another possibility to enhance the accuracy of the forecast.

**Author Contributions:** Conceptualization, J.L.; methodology, Q.L.; data curation, S.G.; software, S.G.; investigation, Q.L.; visualization, Q.L.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L., J.L. and M.D.; formal analysis, Q.L.; validation, Q.L. and S.G.; supervision, M.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Bavarian State Ministry of the Environment and Consumer Protection (StMUV) under grant number 69-0270-50433/2017. The APC was funded by the Technical University of Munich.

**Acknowledgments:** The research presented in this paper has been carried out as part of the HiOS project (Hinweiskarte Oberflächenabfluss und Sturzflut) funded by the Bavarian State Ministry of the Environment and Consumer Protection (StMUV) and supervised by the Bavarian Environment Agency (LfU).

**Conflicts of Interest:** The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
