**Flood Early Warning and Risk Modelling**

Editors

**Marina Iosub Andrei Enea**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Marina Iosub
Integrated Center of Environmental Science Studies in the North Eastern Region—CERNESIM, Sciences Department, Interdisciplinary Research Institute; Faculty of Geography and Geology
Alexandru Ioan Cuza University of Iasi
Iasi, Romania

Andrei Enea
Faculty of Geography and Geology
Alexandru Ioan Cuza University of Iasi
Iasi, Romania

*Editorial Office*
MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Hydrology* (ISSN 2306-5338) (available at: www.mdpi.com/journal/hydrology/special_issues/flood_risk_modelling).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-3778-8 (Hbk) ISBN 978-3-0365-3777-1 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

### **Contents**


### **About the Editors**

### **Marina Iosub**

Dr. Marina Iosub is a Scientific Researcher at L4-CERNESIM-ICI and an Associate Assistant at the Faculty of Geography and Geology, Alexandru Ioan Cuza University of Iasi. She obtained her Doctorate in Geography in 2017 at the "Alexandru Ioan Cuza" University of Iasi, Romania. Her research interests relate to hydrological risk analysis and flood simulation. Her research directions focus on interdisciplinary studies of natural hazards; GIS-based analysis of hydrological risks, applying different spatial modeling methods and calculating the probabilities of their occurrence; mathematical modeling of floods using specific software; statistical analysis of climate change; and the creation of risk maps. A further research direction concerns remote sensing analyses of extreme hydrological phenomena, as well as of the response of vegetation to various extreme climatic phenomena.

### **Andrei Enea**

Dr. Andrei Enea is an Assistant Professor at the Faculty of Geography and Geology, Alexandru Ioan Cuza University of Iasi, and he obtained his PhD in Geography in 2017 at the "Alexandru Ioan Cuza" University of Iasi, Romania. His fields of research interest relate to GIS (Geographic Information Systems), hydrology, remote sensing, drone imagery, and land use analysis. His research directions focus on performing vector and raster analyses at different spatial scales; flood modeling; quantifying flood risk (and other phenomena); satellite image analysis; and using aerial drone photogrammetry to model areas through SfM (Structure from Motion) techniques, generating terrain models and performing analyses of land use, land cover, and their dynamics. A further field of interest is multi-criteria analysis, on small- and large-scale areas, for identifying spatial dependencies in both isolated and more complex phenomena from a temporal resolution perspective. Lastly, Dr. Enea studies the creation of spatial models using GIS functions to perform natural and anthropic analyses.

### *Editorial* **Flood Early Warning and Risk Modelling**

**Marina Iosub 1,2,\* and Andrei Enea 2,\***


### **1. Introduction**

The evolution of mankind over the last two centuries has generated an ever-growing drive for increased production, for novel means of generating energy, and for a more consumerism-oriented society. All of these pressures induce repeated and frequently irreversible repercussions on nature and, through constant alteration of the environment, significant global climate changes have been observed, especially in recent times. Over the last half-century, these climatic changes have directly induced an exponential rise in the number of natural disasters, an increase further emphasized by the ever-growing intensification of these phenomena, with devastating effects [1]. Among these extreme natural phenomena, floods are statistically among the most damaging, with an ever-shorter recurrence interval, and can originate from fluvial sources, extreme torrential rainfall, coastal surges, or even underground causes. On a global scale, floods account for 25% of the events registered since 1900, 38% of those registered since 1960, and 45% of those since 2000 [2]. The population affected by this type of risk has dramatically decreased since the 1960s, but economic flood damage, as a share of global GDP, has risen from 0.02% (1960–1969) to 0.05% (2010–2019) [3]. The economic implications (infrastructure costs, insurance, and even social issues) for local communities are expected to increase further in the coming years, due to the expansion of urban built-up areas, economic growth, and especially climate change [4,5].

The fact that natural phenomena such as floods cause tremendous economic and social damage has led scientists to address this issue and search for solutions to identify areas exposed to flood risk. To help mitigate it, multiple mathematical modelling algorithms have been developed, with a variety of functions, such as calculating flow rates, estimating recurrence flow rates, mapping flood risk, generating flood extent scenarios, and estimating warning and delay times for flash floods [6,7]. Most of these were developed during the last five decades, driven by increases in computational capability and in interest in mitigating the negative impact of future events [2,3]. Their main, general purpose is to describe how a drainage basin responds to sudden changes in flow rate and to intense torrential rainfall, and to estimate as precisely as possible the runoff, or even groundwater-related parameters if required. Flood models also address the probable extent of floods along the main channel and the floodplain of the main collector river.

Mathematically modelling a flood event or simulation demands significant time and labor for generating and validating the final results, raising numerous technical challenges associated with the complete characterization and cartographic representation of several physical and hydrodynamic parameters, such as the rainfall–runoff relationship, the heterogeneous distribution of precipitation, and the morphometric and pedological characteristics of the study area. The better these parameters are calibrated, the more closely the final hydrological risk results match the in-field situation [8,9]. Recent progress in modern techniques based on long, representative datasets has increased the availability of such methods, in comparison with classic hydrodynamic analysis methodologies, aiding the real-time forecasting of flood events. Novel detection systems (such as remote sensing techniques), combined with modern data acquisition means and backed by experimental techniques, provide new opportunities for calibrating, validating, testing, and improving flood models [10–12].

**Citation:** Iosub, M.; Enea, A. Flood Early Warning and Risk Modelling. *Hydrology* **2022**, *9*, 57. https://doi.org/10.3390/hydrology9040057

Received: 29 March 2022 Accepted: 29 March 2022 Published: 31 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The heterogeneous manner in which flood risk is addressed at the scientific, quantifiable level determined us to compile a Special Issue promoting publications that address flood analysis and apply some of the most novel inundation prediction models, as well as various flood-related hydrological risk simulations, enhancing the current state of knowledge in the field and leading toward a better understanding of flood risk modeling. Furthermore, the current Special Issue addresses the temporal aspect of flood propagation, including alert times, warning systems, cartographic material on flood time distribution, and the numerous parameters involved in flood risk modeling.

### **2. Contributed Papers**

The current Special Issue has compiled a total of six published papers, with topics varying significantly: from the application of hydrological models specific to river valleys to the study of entire drainage basins using 1D or 2D methods of analysis, with the help of virtual machines; statistical processing of flood event data; and, last but not least, flood early warning systems.

The article "*Towards Coupling of 1D and 2D Models for Flood Simulation—A Case Study of Nilwala River Basin, Sri Lanka*" by Dhanapala, L.; Gunarathna, M.H.J.P.; Kumari, M.K.N.; Ranagalage, M.; Sakai, K.; Meegastenna, T.J. describes the coupling of two models to mathematically simulate floods in the valley of the Nilwala River, Sri Lanka: HEC-HMS for the 1D hydrological analysis and iRIC for the 2D hydraulic modelling. The model was calibrated and validated using data from three selected events that took place during the 2017–2019 time frame, and the generated results have an overall accuracy of 81.5%. Considering the novelty of the applied methodology, the authors propose that the method be applied to more recorded hydrological events, to help emphasize the forecasting capabilities of the model [13].

One of the most significant large-scale hydrotechnical issues of current times relates to dams and their potential for catastrophic failure. Considering that water resources are an ever-increasing global issue, governments have looked for different types of solutions to identify and prevent dam failure. The severity of the topic is emphasized by the fact that many current dams were built at the beginning of the 20th century, and the generalized state of degradation of most dams reflects the continuous aging of these constructions. The study "*Flood Mapping from Dam Break Due to Peak Inflow: A Coupled Rainfall–Runoff and Hydraulic Models Approach*" by Tedla, M.G.; Cho, Y.; Jun, K. addresses the dam break simulation of Kesem Dam, Ethiopia, coupling the 1D rainfall-runoff model HEC-HMS, used to predict flood volume, peak rates, and the runoff hydrograph, with the 2D hydraulic model HEC-RAS, used to simulate a piping dam break. The final results indicate the predisposition of the dam to partial or complete breaking for flow rates corresponding to recurrence periods of 100–200 years. The resulting maps from the 2D simulations aid in identifying risk-exposed areas downstream of dams [14].

In the study "*Hydrological and Hydraulic Flood Hazard Modeling in Poorly Gauged Catchments: An Analysis in Northern Italy*" by Aureli, F.; Mignosa, P.; Prost, F.; Dazzi, S., a 2D hydraulic simulation method was applied, through the PARFLOOD algorithm, to generate a flood hazard map for poorly gauged drainage basins, with the study area located along the Chiavenna river valley in Italy. The article is divided into two methodological sections: in the first, a regional flood frequency analysis is used to derive the synthetic design hydrographs that are imposed as upstream boundary conditions for 2D modeling. In the second, 2D flood simulations are performed for two scenarios: a real one, in which 20 bridges are taken into account as cross sections, and a hypothetical one in which the bridges are omitted. The 20 sections indicate whether the bridges influence the flood, depending on how they are dimensioned and located across the riverbed; it was concluded that many of them influence the flood depth and the extent of flooded areas upstream of the bridges. The final results are all the more valuable because they can readily be incorporated into the design of road infrastructure and hydrotechnical works for civil protection [15].

Understanding how floods manifest introduced the need for models that aid efficient decision-making during the early stages of flood occurrence, or even before floods reach inhabited areas. The study "*Flood Early Warning Systems Using Machine Learning Techniques: The Case of the Tomebamba Catchment at the Southern Andes of Ecuador*" by Muñoz, P.; Orellana-Alvear, J.; Bendix, J.; Feyen, J.; Célleri, R. presents comparative processes and results for five machine learning algorithms designed to simulate the flood water quantity associated with runoff from short-term floods in small and medium mountainous drainage basins. The models were tested to present the results in a clear, easily understandable form for any non-expert, in three categories (No-Alert, Pre-Alert, and Alert). One limitation of the applied methodology is that it offers very good performance only for forecasting lead times of up to 6 h [16].

In the context of ever more frequent catastrophes and the exposure of society to their impact, the need to differentiate between forecasting and nowcasting gathers more and more attention. Furthermore, the need to quickly and efficiently disseminate information on flood manifestation has led scientists to develop and apply increasingly complex simulation models. In the article "*Real-Time Flood Mapping on Client-Side Web Systems Using HAND Model*" by Hu, A. and Demir, I., the HAND model was applied in representative areas of Iowa, proving its high dependability and efficiency, especially when paired with web platforms. Local authorities can use this approach to identify areas exposed to flood risk during torrential events resulting in floods, without requiring trained personnel to interpret scientific data. Finally, one of the main advantages of this method is its minimal use of datasets originating from external sources [17].

Catastrophic events have always involved two sides, researchers on one and the affected population on the other, implying the need to quickly disseminate useful information via an easily accessible platform. ESRI StoryMaps provides such a possibility, as shown by the authors Oubennaceur, K.; Chokmani, K.; El Alem, A.; Gauthier, Y. of the study "*Flood Risk Communication Using ArcGIS StoryMaps*". For this study, a StoryMap was generated for the drainage basin of the Petite-Nation River, which suffers frequent flood damage. Both vector and raster data have been integrated into the StoryMap to represent the on-site situation more accurately and at a high level of detail. As a case study, the manuscript emphasizes the current flood-related situation regarding risk management, but also includes predictive data related to future climate change [18].

### **3. Conclusions**

The current Special Issue on Flood Early Warning and Risk Modelling has successfully addressed different approaches to studying the way flash floods manifest in drainage basins of different sizes. These diverse papers have used 1D and 2D modelling methods, comparative analyses of several machine learning algorithms, and hydraulic models such as the HAND model, which relies more on morphometric parameters. Emphasis has also been placed on disseminating information on web-based virtual platforms, such as StoryMaps.

**Author Contributions:** The two Guest Editors (M.I. and A.E.) equally shared the work of soliciting, editing, and handling the contributions made to this Special Issue. All authors have read and agreed to the published version of the manuscript.

**Funding:** The creation of this Special Issue did not receive external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We would like to acknowledge the efforts of all authors that contributed to the Special Issue.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Towards Coupling of 1D and 2D Models for Flood Simulation—A Case Study of** *Nilwala* **River Basin, Sri Lanka**

**Lanthika Dhanapala 1 , M. H. J. P. Gunarathna 1, \* , M. K. N. Kumari 1 , Manjula Ranagalage 2 , Kazuhito Sakai 3 and T. J. Meegastenna 4**


**Abstract:** The *Nilwala* river basin is prone to frequent flooding during the southwest monsoon and second intermonsoon periods. Several studies have recommended coupling 1D and 2D models for flood modelling as they provide sufficient descriptive information of floodplains with greater computational efficiency. This study aims to couple a 1D hydrological model (HEC-HMS) with a 2D hydraulic model (iRIC) to simulate flooding in the *Nilwala* river basin. Hourly rainfall and streamflow data of three flood events were used for calibration and validation of HEC-HMS. The model performed exceptionally well considering the Nash–Sutcliffe coefficient, percent bias, and root mean square error. The flood event of May 2017 was simulated on iRIC using the streamflow hydrographs modelled by HEC-HMS. An overall accuracy of 81.5% was attained when the simulated extent was compared with the surveyed flood extent. The accuracy of the simulated flood depth was assessed using the observed water level at *Tudawa* gauging station, which yielded an NSE of 0.94, PBIAS of −4.28, RMSE of 0.18 and R<sup>2</sup> of 0.95. Thus, the coupled model provided an accurate estimate of the flood extent and depth and can be further developed for hydrological flood forecasting on a regional scale.

**Keywords:** *Nilwala* river basin; coupled flood modelling; HEC-HMS; iRIC

### **1. Introduction**

A significant consequence of climate change is the increased frequency and intensity of extreme weather events [1]. Germanwatch described Sri Lanka as being among the top ten most affected countries due to climate change in 2018, with a global climate risk index of 19.0 [2]. Regional studies conducted on rainfall trends in Sri Lanka have indicated both increasing and decreasing patterns [3–5]. It was found that there was a higher tendency for short-duration high-intensity rainfall events to occur within the island [4], which often resulted in water-related natural hazards. Floods are destructive and occur more frequently than other natural disasters [6], with Asia being one of the most affected regions [7]. The Fifth Assessment Report of the IPCC states with medium confidence that the magnitude and frequency of recent floods are comparable to or surpass historical floods [1].

Sri Lanka was recently devastated by floods and landslides due to heavy southwest monsoonal rainfall in May 2017. This natural disaster led to 219 deaths, affecting approximately 230,000 families [8]. The districts most adversely affected by this disaster were Kalutara, Galle, Matara, Hambantota, and Ratnapura. Flooding occurred in towns, villages, and agricultural land bordering the *Kalu*, *Nilwala*, and *Gin* rivers. Matara city is located in the floodplain of the *Nilwala* river basin, an area prone to flooding at the onset of the southwest monsoonal rains. Matara is the administrative capital of the Matara district and is considered a commercial hub. The risk of flooding is significantly greater in urban areas such as Matara city, where impervious surfaces prevent infiltration and cause runoff to generate quickly [9].

**Citation:** Dhanapala, L.; Gunarathna, M.H.J.P.; Kumari, M.K.N.; Ranagalage, M.; Sakai, K.; Meegastenna, T.J. Towards Coupling of 1D and 2D Models for Flood Simulation—A Case Study of *Nilwala* River Basin, Sri Lanka. *Hydrology* **2022**, *9*, 17. https://doi.org/10.3390/hydrology9020017

Academic Editors: Marina Iosub and Andrei Enea

Received: 10 December 2021 Accepted: 21 January 2022 Published: 25 January 2022

A past study identified that precipitation within the basin increases from south to north with increasing elevation [10]. Thus, a significant proportion of the rainfall that contributes to flooding in the lower coastal plains occurs in the basin's upper region, and there is a considerable lag before extreme precipitation in upper *Nilwala* accumulates into floods in the lower coastal floodplains. Given recent advances in information and communication technology, this study aims to assess the feasibility of applying a 1D–2D coupled model to simulate flooding in real time, which could provide sufficient lead time for evacuation and disaster preparation.

The Post Disaster Needs Assessment (PDNA) conducted after the 2017 disaster identified several vital interventions, focusing on strengthening the country's early warning system and conducting hazard risk and vulnerability assessments. Moreover, studies have shown that mortality caused by floods depends on the scale or severity of the flood, the possibility of an early warning mechanism, and evacuation [11]. Computational models can be used to understand the nature of a flood and simulate hydrological responses to various control measures [12]. They can also generate flood risk maps for planning and developing the resiliency of high-risk regions [13]. The existing early warning system of the *Nilwala* river basin consists of a network of water gauging stations with extreme water levels categorized as alert, minor flood, and major flood. Flood warnings are issued based on the water level and its trend. There remains room for improvement of the system through flood modeling.

Hydraulic models can be classified dimensionally as 1D, 2D, and 3D models. Three-dimensional models were developed relatively recently and require more descriptive parameters that remain unavailable for most basins [14]. The simplest representation of flow in a stream is as a one-dimensional flow in the longitudinal direction. Although these models are computationally efficient, they have certain limitations, namely the discretization of topography as cross-sections of the floodplain [15], the subjectivity of the location and orientation of cross-sections, and the inability to simulate lateral diffusion of the flood wave [16]. A 2D model solves the water flow equations in both the longitudinal and lateral planes using either finite element or finite volume methods [17]. It can provide additional detailed information on the floodplain and allows the visualization of flood extent, but is computationally more demanding, requiring greater processing power and storage capacity.

In this study, we coupled a 1D hydrological model with a 2D hydraulic model. The upper portion of the basin is modelled one-dimensionally to carry the flood wave to the two-dimensionally modelled coastal floodplain, an urban area with a high risk of flooding. The coupled model generates comprehensive information on the floodplain with greater computational efficiency [13]. It allows one to alter the roughness coefficient of the simulation extent, provides a more detailed description of the topography, simulates overland flow, and can be easily visualized as two-dimensional maps [12]. Teng et al. [18] describe several methods to couple a 1D model with a 2D model. These include modelling a section of the stream in 1D and the rest in 2D, coupling a 1D drainage model with a 2D flood model [19], and coupling a 1D river model with a 2D floodplain model [14,20,21]. This study uses the second approach, where the Hydrologic Engineering Center—Hydrologic Modelling System (HEC-HMS) is used to simulate drainage in upper *Nilwala* and is externally coupled with the 2D hydraulic model, the Nays2DFlood solver, in the International River Interface Cooperative (iRIC).

This approach has been successfully used for flood forecasting and urban flood management. A study conducted in the Brahmani and Baitarani river delta in India, coupling SWAT and SWMM with iRIC, yielded satisfactory results and was able to identify the depth and extent of inundation [19]. A similar coupled-model study conducted in China was likewise successful and capable of assessing the effects of water conservancy structures within the simulation extent [22].

HEC-HMS is a rainfall-runoff hydrological model developed by the United States Army Corps of Engineers and used extensively to simulate the hydrological response of catchments [23–25]. This model is used to estimate the runoff generated upstream of the 2D flood modeling extent and to provide the boundary conditions required for flood simulation. HEC-HMS was selected in this study for its versatility in modeling hydrological processes and its ability to model storm events. The Nays2DFlood solver in iRIC is a 2D hydraulic model used for flood inundation mapping [19,26–29]. The iRIC is a recent hydraulic model that can model large flood events [30]. Owing to its recency, it has not been applied widely, but it has been able to simulate flooding in ungauged catchments where data are scarce. It is free open-source software developed by Yasuyuki Shimizu and Toshiki Iwasaki of Hokkaido University, Japan.

### **2. Materials and Methods**

### *2.1. Study Area*

Figure 1 depicts the *Nilwala* river basin, located along the southern coast of Sri Lanka. The river is 72 km long and originates at *Panilkanda*, north-east of *Deniyaya*, at 1050 m above mean sea level [10]. The total drainage area of the basin is 1025 km<sup>2</sup> [31]. Steep longitudinal slopes characterize the basin's upper region, while the lower basin consists of gentle slopes [31,32]. The river passes several small towns, such as *Deniyaya*, *Morawaka*, and *Akuressa*, and flows into the ocean at *Thotamuna*, close to Matara city. The catchment is located in the wet zone, with most of the upper catchment covered by rainforests. A significant portion of the basin is used for cultivating agricultural crops such as paddy, coconut, tea, rubber, cinnamon, and citronella [32,33].

**Figure 1.** *Nilwala* river basin.

### *2.2. Data*

Hourly rainfall and streamflow data required for the simulation were obtained from the Irrigation Department of Sri Lanka. The *Nilwala* river has three streamflow gauging stations, *Urawa*, *Pitabeddara*, and *Panadugama*, from which hourly streamflow data were collected. Continuous data for the 2017 flood event were not available for the *Urawa* and *Panadugama* stations; thus, calibration and validation of HEC-HMS were undertaken solely using data obtained from *Pitabeddara*. Hourly precipitation data from four stations in upper *Nilwala* (*Deniyaya*, *Panadugama*, *Pitabeddara*, and *Urawa*) were additionally used. The data used in this study are summarized in Table 1.

**Table 1.** Summary of data used in study.


### *2.3. Rainfall-Runoff Modelling Using HEC-HMS*

The Hydrologic Engineering Center—Hydrologic Modelling System (HEC-HMS) model was designed by the US Army Corps of Engineers as a rainfall-runoff model to determine the hydrological response of watersheds [34]. It was developed for dendritic watersheds and has a wide array of applications, including urban drainage, flood-loss reduction, flood early warning system planning, reservoir spillway design, stream restoration, flow forecasting, surface erosion, sediment routing, and systems operation [35].

The *Nilwala* basin was delineated and divided into 20 subbasins using the HEC-GeoHMS extension in ArcGIS. Terrain preprocessing and basin processing tools were used to generate the basin model file containing the drainage network and delineated basin. Next, precipitation was estimated using the Thiessen polygon method [36], and suitable weights for each subbasin were defined to create a meteorological model. The basin and meteorological models were imported into HEC-HMS. Hourly precipitation data from the *Deniyaya*, *Panadugama*, *Pitabeddara,* and *Urawa* gauging stations, and observed discharge data for the selected events were imported into the HEC-HMS interface.
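The Thiessen-weighted subbasin rainfall described above reduces to an area-weighted mean of the gauge readings. A minimal sketch follows; the weights and rainfall depths are hypothetical, not the study's values:

```python
def thiessen_precip(gauge_mm, weights):
    """Area-weighted mean precipitation for one subbasin.

    gauge_mm : dict mapping gauge name -> rainfall depth (mm) for a time step
    weights  : dict mapping gauge name -> Thiessen area fraction (sums to 1.0)
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-6, "weights must sum to 1"
    return sum(weights[g] * gauge_mm[g] for g in weights)

# Hypothetical weights for one subbasin (illustrative only):
p = thiessen_precip(
    {"Deniyaya": 12.0, "Panadugama": 8.0, "Pitabeddara": 10.0, "Urawa": 6.0},
    {"Deniyaya": 0.4, "Panadugama": 0.2, "Pitabeddara": 0.3, "Urawa": 0.1},
)  # -> 10.0 mm
```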

HEC-HMS allows for the separate modelling of hydrological processes; loss, transformation, baseflow, and routing with several models for each process. The selection of the model for each process should be based on the catchment characteristics, data availability, and whether the simulation is event-based or continuous [25].

The Snyder unit hydrograph method [37] was selected to simulate the transform, in conjunction with the deficit and constant loss method and the recession baseflow model [38]. The lag time *t<sub>p</sub>* used in the Snyder unit hydrograph is defined as the time between the centroid of excess rainfall and the peak discharge. It was calculated for each subbasin using the following relationship:

$$t_p = C\,C_t(L\,L_c)^{0.3},\tag{1}$$

where *C<sub>t</sub>* = basin coefficient, *L* = length of the mainstream from the outlet to the basin divide, *L<sub>c</sub>* = length along the mainstream from the outlet to the point nearest the watershed centroid, and *C* = a conversion factor (0.75 for SI units).
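Equation (1) can be evaluated per subbasin as in the sketch below; the parameter values are illustrative only, not taken from the paper:

```python
def snyder_lag_time(Ct, L, Lc, C=0.75):
    """Snyder standard lag t_p = C * Ct * (L * Lc)**0.3 (Equation (1)).

    Ct : basin coefficient (calibrated per subbasin)
    L  : mainstream length from the outlet to the basin divide (km)
    Lc : mainstream length from the outlet to the point nearest the centroid (km)
    C  : conversion factor (0.75 for SI units); result is the lag time in hours
    """
    return C * Ct * (L * Lc) ** 0.3

# Illustrative subbasin (hypothetical Ct, L, Lc):
tp = snyder_lag_time(Ct=1.8, L=25.0, Lc=12.0)
```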

The Muskingum–Cunge method [39] was used to model routing in all reaches with a Manning's n coefficient of 0.040 [40,41]. Parameters such as reach length and slope were estimated using topographical maps. A deterministic optimization approach was followed with NSE set as the objective function. Optimization trials were carried out for the deficit and constant loss model, and the recession baseflow model to determine proper parameters. Table 2 lists the parameters optimized in HEC-HMS.

**Table 2.** Parameters optimized in HEC-HMS model.
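The reach-routing step above can be sketched with the classical Muskingum recursion; the Muskingum–Cunge variant used in the study additionally derives the storage constant *K* and weighting factor *X* from channel geometry and Manning's n, whereas here they are illustrative inputs:

```python
def muskingum_route(inflow, K, X, dt, outflow0=None):
    """Classical Muskingum routing: O[j] = C0*I[j] + C1*I[j-1] + C2*O[j-1].

    inflow : inflow hydrograph (m3/s) sampled at interval dt
    K      : storage time constant (same time units as dt)
    X      : weighting factor (0 <= X <= 0.5)
    """
    denom = 2 * K * (1 - X) + dt
    c0 = (dt - 2 * K * X) / denom
    c1 = (dt + 2 * K * X) / denom
    c2 = (2 * K * (1 - X) - dt) / denom  # c0 + c1 + c2 == 1
    out = [inflow[0] if outflow0 is None else outflow0]
    for j in range(1, len(inflow)):
        out.append(c0 * inflow[j] + c1 * inflow[j - 1] + c2 * out[-1])
    return out

# Route a simple triangular flood wave (illustrative K, X, dt):
q = muskingum_route([10, 50, 90, 60, 30, 15, 10], K=2.0, X=0.2, dt=1.0)
```

A useful sanity check of the recursion is that a steady inflow passes through unchanged, while a flood peak is attenuated and delayed.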


The model was initially applied to the basin's upper region, reaching up to *Pitabeddara*, due to the lack of reliable streamflow data below this point. Once calibrated and validated, the model was extended to the entire basin. Three flood events were selected for the simulation: May 2017, May 2018, and September 2019. The May 2018 and September 2019 events were used for calibration, while the May 2017 event was used to validate the model. Both calibration events involved reported flooding, although not at the magnitude of the May 2017 event.
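The goodness-of-fit statistics reported for this calibration (NSE, which also serves as the optimization objective, plus PBIAS and RMSE) follow their standard definitions; the sketch below uses illustrative series, not the study's data:

```python
import math

def fit_metrics(obs, sim):
    """Standard goodness-of-fit statistics between observed and simulated flows.

    NSE   = 1 - sum((o - s)^2) / sum((o - mean(o))^2)
    PBIAS = 100 * sum(o - s) / sum(o)   (percent)
    RMSE  = sqrt(mean((o - s)^2))
    """
    n = len(obs)
    mean_obs = sum(obs) / n
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    nse = 1 - sse / sum((o - mean_obs) ** 2 for o in obs)
    pbias = 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)
    rmse = math.sqrt(sse / n)
    return nse, pbias, rmse

# A perfect simulation gives NSE = 1, PBIAS = 0, RMSE = 0:
nse, pbias, rmse = fit_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```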

### *2.4. Hydrodynamic Modelling Using Nays2DFlood Solver in iRIC*

The International River Interface Cooperative (iRIC) is a free numerical simulation software package comprising various computational solvers for hydrological problems. Nays2DFlood is one such solver, used for flood simulation. The iRIC software platform was initiated by Professor Yasuyuki Shimizu (Hokkaido University) and Dr. Jonathan Nelson (USGS). Nays2DFlood performs unsteady two-dimensional plane flow simulations using boundary-fitted general curvilinear coordinates [42]. The solver can be applied to river basins with sparse data, especially in developing regions, as it requires only a DEM without specific topographical dimensions of the river.

The equation of continuity (Equation (2)) and equations of motion (Equations (3) and (4)) of two-dimensional unsteady flows in a rectangular coordinate system are expressed as:

$$\frac{\partial h}{\partial t} + \frac{\partial (hu)}{\partial x} + \frac{\partial (hv)}{\partial y} = q + r,\tag{2}$$

$$\frac{\partial(hu)}{\partial t} + \frac{\partial(hu^2)}{\partial x} + \frac{\partial(huv)}{\partial y} = -hg\frac{\partial H}{\partial x} - \frac{\tau\_x}{\rho} + D^x,\tag{3}$$

$$\frac{\partial(hv)}{\partial t} + \frac{\partial(huv)}{\partial x} + \frac{\partial(hv^2)}{\partial y} = -hg\frac{\partial H}{\partial y} - \frac{\tau\_y}{\rho} + D^y,\tag{4}$$

where,

$$\frac{\tau\_x}{\rho} = C\_f\,u\sqrt{u^2 + v^2},\tag{5}$$

$$\frac{\tau\_y}{\rho} = C\_f\,v\sqrt{u^2 + v^2},\tag{6}$$

$$D^x = \frac{\partial}{\partial x}\left[\nu\_t \frac{\partial(hu)}{\partial x}\right] + \frac{\partial}{\partial y}\left[\nu\_t \frac{\partial(hu)}{\partial y}\right],\tag{7}$$

$$D^y = \frac{\partial}{\partial x}\left[\nu\_t \frac{\partial(hv)}{\partial x}\right] + \frac{\partial}{\partial y}\left[\nu\_t \frac{\partial(hv)}{\partial y}\right],\tag{8}$$

where *h* = water depth, *t* = time, *u* = flow velocity in the *x* direction, *v* = flow velocity in the *y* direction, *g* = gravitational acceleration, *H* = water surface elevation, *τ<sub>x</sub>* and *τ<sub>y</sub>* = riverbed shear stress in the *x* and *y* directions, respectively, *C<sub>f</sub>* = riverbed friction coefficient, *ν<sub>t</sub>* = eddy viscosity coefficient, *ρ* = density of water, *q* = inflow through a box culvert, a sluice pipe, or a pump per unit area, and *r* = rainfall [42]. These basic equations, expressed in a rectangular coordinate system, are mapped to a generalized curvilinear coordinate system; this allows the shape of the simulation mesh to be adapted to the boundary conditions [42].
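For intuition only, the continuity equation (2) can be advanced with a naive explicit step on a plain rectangular grid; this sketch is ours and is not the Nays2DFlood scheme, which solves the full equation set on a generalized curvilinear grid:

```python
import numpy as np

def continuity_step(h, u, v, q, r, dx, dy, dt):
    """One explicit Euler step of Equation (2):
    dh/dt + d(hu)/dx + d(hv)/dy = q + r.
    h, u, v are 2D arrays; q (lateral inflow) and r (rainfall) may be
    scalars or arrays. Boundary cells are held fixed for simplicity.
    """
    hu, hv = h * u, h * v
    dhu_dx = np.zeros_like(h)
    dhv_dy = np.zeros_like(h)
    # central differences in the interior
    dhu_dx[1:-1, :] = (hu[2:, :] - hu[:-2, :]) / (2.0 * dx)
    dhv_dy[:, 1:-1] = (hv[:, 2:] - hv[:, :-2]) / (2.0 * dy)
    return h + dt * (q + r - dhu_dx - dhv_dy)
```

With zero velocities and a uniform rainfall rate, each step simply raises the water depth by *r*·Δ*t*, a convenient sanity check of the mass balance.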

The inputs required for iRIC are the DEM, satellite images, and streamflow hydrographs at the inlet points. The 30 m SRTM DEM from the United States Geological Survey (USGS) was used for the simulation, while the required background satellite images were obtained from Google Earth. Next, a calculation grid with an 80 m grid size was created by defining a grid centerline representing the river's flow. Values for *n<sub>I</sub>*, *n<sub>J</sub>*, and *W* were defined, where *n<sub>I</sub>* = number of divisions in the longitudinal direction, *n<sub>J</sub>* = number of divisions in the transverse direction, and *W* = grid width in the transverse direction. Subsequently, calculation conditions were set, including boundary conditions, initial conditions, roughness conditions, and an appropriate time step for computation. The inflow streams identified earlier were then located on the grid. The discharge hydrographs obtained from the HEC-HMS model were imported into iRIC manually and set as boundary conditions.

Impermeable objects such as roads, banks, embankments, and other flood control structures were located on the grid as obstacles. Matara city was outlined on the grid, and the areal fraction of the city occupied by buildings was defined. Most calculation conditions of iRIC were left at their defaults, with an upwind scheme used as the finite-difference method. The northern, eastern, and western boundaries were set as inflow boundaries, while the southern boundary was modelled as having free outflow. The accuracy of the resulting inundation map was determined by comparison with a survey of the flood extent carried out by the Irrigation Department. The depth of flooding was validated using observed water level data at the *Tudawa* gauging station.

### *2.5. Statistical Evaluation*

Both models were statistically evaluated using Nash–Sutcliffe efficiency (NSE) [43], percent bias (PBIAS) [44], and root mean square error (RMSE).

The NSE indicates the goodness of fit of simulated data to observed data and is calculated using the following equation, given by Nash and Sutcliffe (1970):

$$\text{NSE} = 1 - \frac{\sum\_{i=1}^{n} \left(O\_i - S\_i\right)^2}{\sum\_{i=1}^{n} \left(O\_i - \overline{O}\right)^2},\tag{9}$$

where *O<sub>i</sub>* = value of the *i*th observation, *S<sub>i</sub>* = value of the *i*th simulation, *O* = mean of the observations, and *n* = total number of observations. NSE values range from −∞ to 1, with an NSE of 1 indicating a perfect fit.

The PBIAS measures the tendency of the simulated data to be greater or smaller than the observed values [44]. A negative PBIAS indicates a model overestimation bias, while a positive value indicates an underestimation bias. A PBIAS value of 0 is optimal.

$$\text{PBIAS} = \frac{100\sum\_{i=1}^{n}(O\_i - S\_i)}{\sum\_{i=1}^{n} O\_i},\tag{10}$$

The RMSE quantifies the prediction error in the units of the variable calculated by the model [45].

$$\text{RMSE} = \sqrt{\frac{\sum\_{i=1}^{n} \left(O\_i - S\_i\right)^2}{n}},\tag{11}$$
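The three metrics are straightforward to implement; the helper functions below follow Equations (9)–(11), with naming of our own choosing:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency (Equation (9)); 1.0 is a perfect fit."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias (Equation (10)); negative means overestimation."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def rmse(obs, sim):
    """Root mean square error (Equation (11)), in the variable's units."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))
```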

### **3. Results and Discussion**

### *3.1. Calibration and Validation of HEC-HMS*

The Irrigation Department has only recently begun collecting hourly streamflow and rainfall data from gauging stations at *Panadugama*, *Pitabeddara*, and *Urawa*. Thus, only three storm events were found with significant precipitation that resulted in downstream flooding. Of these, the May 2018 and September 2019 events were selected for model calibration. The flooding event of May 2017, considerably the most intense of the three, was selected for validation and subsequently modelled using iRIC.

The Snyder unit hydrograph method for transformation produced excellent results during both the calibration and validation processes. A past study conducted within the *Nilwala* river basin confirmed that the Snyder transformation method provided the most accurate results [46]. Another study conducted in the *Attanagalu Oya* basin using HEC-HMS concluded that the Snyder unit hydrograph simulated flow with greater reliability than the Clark unit hydrograph and SCS Curve Number methods [23]. Moreover, that study recommended applying this method in conjunction with the deficit and constant loss model to simulate river flow in the country's wet zone [23].

The recession baseflow model in HEC-HMS simulates catchment behavior in which flow recedes exponentially and was designed primarily for event simulations [34]. Recession baseflow and the Muskingum–Cunge routing method were successfully used to predict flow in the *Nilwala* river basin [46]. The downstream regions of the basin display a characteristically gentle slope, which is modelled more accurately by the Muskingum–Cunge method [47]. Detailed river cross-sectional data could not be obtained for this study, so the parameters of the routing model were estimated using the basin model generated by HEC-GeoHMS and satellite imagery. Obtaining 8-point cross-sectional data could significantly improve model performance; in its absence, a trapezoidal shape was specified for the routing channels.

The optimization trials showed that the loss parameters influenced the peak discharge, while the baseflow parameters affected the initial flow and the recession curve; the transformation and routing processes influenced the timing of the peak. After several trials, the model was adequately calibrated for the selected flood events.

Figure 2a depicts the observed and simulated hydrographs for the May 2018 event, while Figure 2b shows the September 2019 flood event. The May 2018 event yielded an NSE of 0.93 and an RMSE of 0.3 (Table 3), categorized as excellent according to the criteria suggested by Moriasi et al. [48] for monthly hydrological simulations. The September 2019 event produced an NSE of 0.746 after calibration, categorized as useful, with an RMSE of 0.5. The PBIAS for both events was negative, −10.95% and −4.82%, respectively, indicating an overestimation bias.

**Figure 2.** HEC-HMS calibration at *Pitabeddara* station: (**a**) May 2018 storm event; (**b**) September 2019 storm event.



The observed and simulated hydrographs at the *Pitabeddara* gauging station for the May 2017 event are shown in Figure 3. The calibrated parameters were averaged and validated using the May 2017 event. As mentioned earlier, this event was a comparatively more severe flood, and rainfall frequency analysis has assigned it a return interval of 75–100 years [49]. Owing to its severity, the gauging station at *Pitabeddara* was completely flooded during the event's peak, and the peak discharge had to be estimated from flood marks; considerable uncertainty therefore remains in the peak discharge of this event. Nevertheless, the simulation using averaged parameters showed a good fit with the observed flow, with an NSE of 0.927, a PBIAS of −8.33%, and an RMSE of 0.3. As with calibration, the PBIAS indicates a model overestimation bias.

**Figure 3.** HEC-HMS Validation using May 2017 flood event at *Pitabeddara*.

### *3.2. Hydraulic Model Simulation*

Initial simulation runs showed that the model could not correctly identify the flow path of the mainstream in certain regions because of the coarse 30 m DEM. As a result, the elevation file was modified to indicate the river's correct flow path. In a flood simulation study of Kabul using iRIC, Shokory et al. [28] modified the DEM similarly because the flood would not pass over the original DEM. Initially, the model was applied to a smaller region comprising Matara and the area immediately north of the city. This overestimated the city's flooding, as the model failed to account for flooding that occurred above this region. The simulation extent was therefore extended upstream to *Akuressa* for a more accurate simulation.

Six inlets were located within the simulation extent (Figure 4a). The hydrographs in Figure 4b, which were obtained using the HEC-HMS model, were set as the boundary conditions for Nays2DFlood at each inlet. The highest contribution was from inlet 1, the main river, which carried flow generated from the larger proportion of the upper catchment. Inlet 2, the second-largest contributor, represents a tributary of the *Nilwala* river, *Kirama Oya*.

Paddy cultivation is the primary land use of the region surrounding Matara city. The 30 m DEM could not resolve the complicated network of interconnected minor canals that characterizes this region. Based on land cover and a review of past studies [41,46], the roughness coefficient was taken as 0.15 for forested areas and 0.09 for other areas. Moreover, a section of Matara city was incorrectly simulated as flooded when flooding did not in fact occur. We attribute this to deficiencies in the DEM, which, owing to its low resolution and to topographical changes since its production, could not capture flood control structures such as the bunds on either side of the river.

**Figure 4.** (**a**) 2D flood modeling extent and inflow locations. (**b**) Input hydrographs for flood modeling.

The extent of flooding was verified using a survey of the flood extent carried out by the Department of Irrigation for the May 2017 flood event (Figure 5). The survey was conducted in the months shortly after the event, from flood marks and accounts of residents. An accuracy assessment (Table 4) using 10,000 random points was carried out to compare the simulated flood extent with the surveyed extent, with a maximum simulated depth greater than 0.5 m considered a flood, based on the classification of Hamdan et al. [50]. The producer's accuracy is the ratio of reference points (i.e., surveyed points) classified correctly to the total number of reference points for that class. The user's accuracy is the ratio of correctly simulated points to the total number of points simulated for the respective class. The assessment indicated an overall accuracy of 81.5%. The lower producer's accuracy for the inundated class (70.1%) relative to its user's accuracy (81.5%) indicates that a considerable number of observed inundated points were simulated as unaffected.
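The producer's, user's, and overall accuracies follow directly from the counts of agreement between surveyed and simulated points; a minimal sketch with hypothetical counts (not the study's 10,000-point data):

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Accuracy measures for a binary flooded/unaffected comparison.

    tp -- surveyed flooded, simulated flooded
    fn -- surveyed flooded, simulated unaffected (omission error)
    fp -- surveyed unaffected, simulated flooded (commission error)
    tn -- surveyed unaffected, simulated unaffected
    Returns (producer's, user's, overall) accuracy for the flooded class.
    """
    producers = tp / (tp + fn)  # share of surveyed flood points captured
    users = tp / (tp + fp)      # share of simulated flood points confirmed
    overall = (tp + tn) / (tp + fn + fp + tn)
    return producers, users, overall
```

A producer's accuracy below the user's accuracy, as observed here, signals omission errors: surveyed flood points simulated as unaffected.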

**Figure 5.** The surveyed flood extent of May 2017 flood event (Source: Department of Irrigation, Sri Lanka).


**Table 4.** Accuracy assessment of flood extent simulation using iRIC.

The simulated flood depth was verified using observed water levels at *Tudawa* station, located north of Matara city, yielding NSE = 0.94, PBIAS = −4.28, RMSE = 0.18, and R<sup>2</sup> = 0.95 (Figure 6).

**Figure 6.** Comparison of observed water level and simulated flood depth of May 2017 flood at *Tudawa*.

In Figure 7a, depths over 7 m were observed towards the northern part of the simulation extent, close to the inflow points of the main river and its tributary *Kirama Oya* (i.e., inlets 1 and 2 in Figure 4). Uncertainty remains in the flood depths of the upper regions, as the model does not consider any inundation that may have occurred upstream of the simulation extent, which was modelled as purely one-dimensional. Furthermore, the conservation of momentum of the flow and the return flow of water at the linkage sites were neglected [18]. Closer to Matara, a maximum depth of 5 m was simulated. Comparing the velocity magnitude map (Figure 7b) with the background image confirms that the model correctly identified the river's flow path.

Shokory et al. [28] studied the influence of rainfall and streamflow on flooding and found that rainfall did not significantly impact flooding in the Kabul case study, producing only the formation of "puddles". The *Nilwala* basin shows similar characteristics, as flooding occurs mainly due to heavy precipitation in the upper catchment [10]. This can be clearly seen by comparing the input hydrographs generated using HEC-HMS for the May 2017 event (Figure 4). In addition, no rainfall gauging station capable of measuring hourly precipitation was available within the simulated extent. For these reasons, rainfall within the basin was omitted from the simulation.

As this study seeks to develop a tool that could be applied to real-time flood forecasting, the time required to simulate must be considered. The grid size of the generated mesh affects the simulation time: a higher-resolution grid produces more accurate results but requires significantly longer to complete the simulation. An 80 m grid size was therefore selected to produce results quickly with acceptable accuracy.

**Figure 7.** iRIC simulation results of 2017 May flood event: (**a**) maximum flood depth and (**b**) maximum flood velocity at *Nilwala* river downstream area (Matara City and surrounding area).

During the May 2017 flood event, the flood peak took approximately 36 h to reach *Tudawa*. The average time taken to run the hydraulic model was close to 5 h, which could be further reduced by using a machine with greater computational power. The model thus simulated within an acceptable period, providing sufficient lead time for taking necessary precautions based on the generated flood risk maps.

A flood risk map (Figure 8) was generated based on the May 2017 flood event. Regions were classified as high, medium and low risk according to the flood depth [50]. Such maps generated for different return periods can aid in the design and planning of infrastructure, flood control structures, and provide guidance in community resiliency development projects. In addition, maps generated in real-time during a storm event can be used to issue flood warnings and provide specific instructions for evacuation where needed.

**Figure 8.** Flood risk map of *Nilwala* River downstream (Matara City and surrounding area) for May 2017 flood event.

### **4. Conclusions**

The *Nilwala* river basin's upper region was modelled in HEC-HMS using the Snyder unit hydrograph method to simulate transformation, while the Muskingum–Cunge routing model, the deficit and constant loss model, and the recession baseflow model were used to model routing, loss, and baseflow, respectively. The model was calibrated and validated using three flood events: the May 2018 and September 2019 events for calibration and the May 2017 event for validation. The model performed well in terms of the Nash–Sutcliffe efficiency, percent bias, and root mean square error. Simulated stream hydrographs from HEC-HMS were externally coupled with iRIC to simulate inundation in Matara for the May 2017 flood event. The neglect of upstream flooding, momentum transfer, and return flow at the linkage site presented the major limitations of the model, and the accuracy of the DEM used greatly impacts the performance of iRIC. Even so, the extent of flooding was estimated with an overall accuracy of 81.5%. The accuracy of the simulated flood depth was verified using water levels at the *Tudawa* gauging station, which resulted in an NSE of 0.94, a PBIAS of −4.28, an RMSE of 0.18, and an R<sup>2</sup> of 0.95. Based on these values, it can be concluded that the 1D–2D coupled model can accurately identify the extent of flooding in addition to the flood depth. Nevertheless, the model's accuracy should be further improved by simulating more events with available observed flood data, and the model should be tested in the field using real-time flood events to evaluate its feasibility as a tool for flood forecasting and early warning systems. Accordingly, a flood risk map was developed for the May 2017 flood event. Further studies should develop such maps for storms with different return periods and for various climate change scenarios.

**Author Contributions:** Conceptualization, M.H.J.P.G. and K.S.; methodology, L.D. and M.R., M.K.N.K.; software, L.D. and M.R.; validation, M.H.J.P.G., M.K.N.K. and M.R.; data curation, T.J.M.; writing original draft preparation, L.D.; writing—review and editing, M.H.J.P.G. and K.S.; supervision, M.H.J.P.G. and M.K.N.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Flood Mapping from Dam Break Due to Peak Inflow: A Coupled Rainfall–Runoff and Hydraulic Models Approach**

**Mihretab G. Tedla 1 , Younghyun Cho 1,2, \* and Kyungsoo Jun 1**


**Abstract:** In this study, we conducted flood mapping of a hypothetical dam break by coupling the Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS) and River Analysis System (HEC-RAS) models under different return periods of flood inflow. This study is presented as a case study on the Kesem embankment dam in Ethiopia. Hourly hydrological and meteorological data and high-resolution land surface datasets were used to simulate the design floods for piping dam failure with empirical dam breach methods. Based on the extreme inflows and the dam physical characteristics, the dam failure was simulated by a two-dimensional, unsteady flow hydrodynamic model. As a result, the dam will remain safe for up to 50-year return-period inflows, but it breaks for 100- and 200-year return periods and floods the downstream area. For the 100-year peak inflow, a 208 km<sup>2</sup> area will be inundated by a maximum depth of 20 m and for a maximum duration of 46 h. The 200-year inflow will inundate a 240 km<sup>2</sup> area with a maximum depth of 31 m for a maximum duration of 93 h. The 2D flood map provides satisfactory spatial and temporal resolution of the inundated area for evaluation of the affected facilities.

**Keywords:** design floods; HEC-HMS; HEC-RAS; dam break; unsteady; flood mapping; Kesem

### **1. Introduction**

Dam failures have had disastrous effects on downstream areas in many countries in recent years. Between 2000 and 2009, more than 200 notable dam failures occurred worldwide [1,2]; recent examples include Oroville, USA, in 2017 (dam failure) [3]; Brumadinho, Brazil, in 2019 (dam break) [4]; Xepian Xe Nam Noy Dam, Laos, in 2018 (dam break) [5]; and Hidroituango, Colombia, in 2018 (dam failure) [6]. In the east Africa region, water burst through the banks of the Patel Dam in Kenya's Rift Valley, washing away almost an entire village [7]. Public concern has resulted in an increasing focus on dam safety and has imposed responsibility on decision makers. Dam-break-induced disasters may occur more frequently as infrastructure ages. Emergency planning, as a non-structural measure to minimize flood impacts, plays an important role in crisis management. One of the keys to preventing and reducing losses from flood disasters is obtaining reliable information about flood risk through flood inundation maps [8]. If a disaster cannot be avoided, individual and social preparedness may help in risk reduction [9]. As the cost of hazard assessment is negligible compared with the total cost of disasters, every dam should be analyzed for safety, with an emergency plan in place [10].

Numerical flood simulation models are useful tools for quantitatively evaluating flooding and enable researchers to evaluate hazard and other risks caused by flooding; however, these models have limitations related to insufficient datasets [11]. Recently, owing to advances in hydraulic models, research on the application of 2D models for flood risk management, including dam breaks [12,13], has been increasing. Despite the rapid development of dam-break risk analysis, research is relatively lacking on integrating the impact of peak inflows, which cause dam breaks. Heavy rainfall and peak flood inflows have the potential to trigger dam failures. As a result, determining their impacts on dam failure and the associated disaster is instrumental in understanding complicated dam break problems. Moreover, the flood risk analysis, the inundation extent, and the environmental impact of dam breaks are also affected by peak flood inflows.

**Citation:** Tedla, M.G.; Cho, Y.; Jun, K. Flood Mapping from Dam Break Due to Peak Inflow: A Coupled Rainfall–Runoff and Hydraulic Models Approach. *Hydrology* **2021**, *8*, 89. https://doi.org/10.3390/hydrology8020089

Academic Editors: Marina Iosub and Andrei Enea

Received: 28 April 2021 Accepted: 31 May 2021 Published: 6 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this study, the Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS), which can simulate a wide range of hydro-meteorological and physical processes including flooding, and the Hydrologic Engineering Center's River Analysis System (HEC-RAS) 2D model were used to evaluate the impact of peak inflows on dam break and flood inundation. HEC-GeoHMS is typically used to import all the required inputs into HEC-HMS [14]. The watershed is physically represented with the HEC-HMS semi-distributed basin model [15], in which various hydrologic elements are connected in a dendritic network to simulate runoff processes [16,17]. In this study, the inflow hydrographs for different return periods were generated from hourly precipitation and streamflow data using the curve number method. RAS Mapper, an extension of HEC-RAS, provides a set of procedures, tools, and utilities to display the results explicitly [18]. HEC-RAS 2D has a GIS interface and applies a finite-volume method to solve the unsteady flow equations describing the two dimensions of the river and floodplain [19,20]. A detailed animation showing flood wave progression in multiple directions on a local scale is best represented using a 2D model.
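The curve number method mentioned above follows the standard SCS relation, with potential retention S = 25,400/CN − 254 (in mm) and direct runoff Q = (P − 0.2S)² / (P + 0.8S) for P > 0.2S. A minimal sketch of the per-storm depth calculation is given below; note that HEC-HMS applies this per sub-basin and time step, which this illustration does not reproduce:

```python
def scs_runoff_mm(p_mm, cn):
    """SCS curve-number direct runoff depth (mm) for a storm depth p_mm."""
    s = 25400.0 / cn - 254.0  # potential maximum retention (mm)
    ia = 0.2 * s              # initial abstraction
    if p_mm <= ia:
        return 0.0            # all rainfall abstracted, no runoff
    return (p_mm - ia) ** 2 / (p_mm + 0.8 * s)
```

For example, a 50 mm storm over CN = 80 soil yields roughly 14 mm of direct runoff, while the same storm over CN = 100 (fully impervious) runs off entirely.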

The novel objective of this study was to develop a flood inundation map by integrating hydrological and hydraulic models for a hypothetical dam break. The peak inflow for various return periods was used to initiate the dam break. The inputs derived from the physical condition, material properties, dimensions of the dam, and maximum bounding breach width and height were used in the empirical breach method. A stepwise approach was implemented to execute the model. The first step was developing a physically based hydrological model to simulate the designed rainfall for flood discharges with HEC-HMS after calibrating and validating using historical storm events. The output of this step was used to predict flood volume, peak rates, and the runoff hydrograph. The next step was to develop a hydraulic model of the watershed for a piping dam break with an HEC-RAS model. The last step was to simulate the flood hydrograph and analyze the inundation map of the return periods using the 2D flood map.

This study is presented as a case study of the Kesem embankment dam. The aim was to understand the interaction of hydro-meteorological, topographic, and physical dam settings to determine dam-break-associated flooding. The Afar region, located in the East African Rift System, has been affected by progressive stages in the development of the largest active rift in the world [21]. Given its volcanically and seismically active setting, the Kesem dam is subjected to high seismicity due to regular strain release occurrences [22]. Previously, dam breach simulation and analysis of the Kesem dam were conducted by combining the physical breaching model HR BREACH, a simple geotechnical failure model, to simulate any breaches in the embankments, with HEC-RAS 1-D, to perform the flood routing [23]. In our method, peak inflows drive the dam break in the piping dam failure method, demonstrating the inundation area on a 2D flood map.

### *Study Area*

Kesem River is a tributary of the Awash River, whose basin is the most heavily used among the twelve basins in Ethiopia (Figure 1a). Annual rainfall falls in the four rainy months of June, July, August, and September, and most of the river courses become full and flood their surroundings during these months [11]. The Kesem dam was built in 2015 by the then Ethiopian Water Works Construction Enterprise (WWCE), mainly for sugarcane irrigation, with the capacity to develop 20,000 ha of land. The dam is a rockfill dam, categorized as a large dam, with an embankment volume of 3.15 hm<sup>3</sup> and a crest length of 685 m. The capacity of the reservoir is estimated at 770 hm<sup>3</sup>, of which 360–480 hm<sup>3</sup> (47–62%) is assumed to be live storage, and the spillway capacity is about 6180 m<sup>3</sup>/s. The dam is located 237 km northeast of Addis Ababa, at 800 m above sea level.

**Figure 1.** Locations and characteristics of the study area: (**a**) Ethiopian river basins and the Kesem watershed in the Awash river basin; (**b**) elevation, Hydro-Met observation stations, and Kesem dam location; (**c**) hydrologic soil group; and (**d**) land-use map of the Kesem watershed.

### **2. Materials and Methods**

We combined a rainfall–runoff model and a hydrodynamic model to map floods due to dam break. High spatio-temporal-resolution land surface data and observed hydro-climatic datasets were employed to simulate the probabilistic rainfall events of various return periods that trigger piping dam failure. We also investigated the performance of the coupled rainfall–runoff and hydraulic models in flood hazard and inundation extent analysis. The topographic data (i.e., DEM, soil, land use, and terrain data) and the hydro-meteorological data used in this study are presented in the Data Acquisition section. Figure 2 summarizes the methodological workflow followed in this study.

**Figure 2.** Overall procedures of our modeling work.

### *2.1. Data Acquisition*

### 2.1.1. Land Surface Data

The global digital elevation model (DEM) data with a 30 m × 30 m resolution were acquired from the Shuttle Radar Topographic Mission (SRTM) of the United States Geological Survey (USGS) to extract stream networks and prepare inputs to the hydrologic model. ArcGIS (version 10.4) was used to prepare the spatial datasets for use in the HEC-HMS 4.2.1 [24] model. Shapefiles collected from the Ministry of Water, Irrigation and Energy (MoWIE), Ethiopia, reservoir geometry in the form of control levels, and the revised elevation–area–capacity curve from a dam design report [25] were used to set up the breaching simulation at different water levels.

Although a wide range of soil types exists in the Awash river basin, two main soil classes, shown in Figure 1c, can be distinguished in the Kesem sub-basin, based on the global hydrologic soil groups (HYSOGs250m) for curve number-based runoff modeling [26]. The most abundant hydrologic soil group (HSG) is D soils, which have the highest runoff potential and very low infiltration rates when thoroughly wetted. They mainly consist of clay soils with a high swelling potential, soils with a permanent high-water table, soils with a claypan or clay layer at or near the surface, and shallow soils over nearly impervious material. The other HSG that exists in the study area is Group C. Group C has low infiltration rates when thoroughly wetted and mainly consists of soils with a layer that impedes downward movement of water and soils with a moderately fine structure [27].

For the land-use classification, the GlobeLand30 dataset, a global surface-coverage dataset with 30 m resolution developed in China, was acquired from the National Geographic Information Resource Directory Service [28]. The dataset provides rich, detailed information on the distribution of global surface coverage, portraying most human land-use activities and landscape patterns. The land use was reclassified broadly into five groups: agriculture, bare land, forest, built-up, and water. Table 1 shows the major land-use areas in km<sup>2</sup> in the Kesem watershed.

**Table 1.** Land-use classifications and corresponding areas in the Kesem watershed.


### 2.1.2. Meteorological and Hydrological Data

In this study, hourly precipitation and discharge data were used for the model simulation. The rainfall data were collected from the Ethiopia National Meteorological Agency (NMA) at three observation stations. Similarly, hourly streamflow data from MoWIE were collected at the automated Kesem Arerti gauge station. The locations and descriptions of these stations are provided in Figure 1 and Table 2. Both the precipitation and streamflow data were originally collected at 15 min intervals and were converted to hourly intervals. An automated real-time telemetry observation system was used to collect the streamflow data; the precipitation data were collected from first-class meteorological stations. In Ethiopia, only first-class meteorological stations can collect and store high-temporal-resolution data, down to 15 min intervals, for many climate parameters. For the hydrologic model set-up, five storm events (16–19 August 2012, 22–24 August 2013, 7–9 August 2014, 18–20 August 2015, and 22–24 August 2016) were used independently to calibrate and validate the model.

**Table 2.** List of selected meteorological and hydrological gauge stations.


### *2.2. Estimation of the Designed Storm*

In this study, the rainfall distributions (i.e., designed storms) for the different return periods were estimated using the Gumbel distribution. The Gumbel distribution method is well suited to identifying and predicting future rainfall occurrences and to flood analysis [29]. The equation for fitting the Gumbel distribution using the observed series of annual peak storm events (i.e., maximum precipitation) for the return period T is

$$P\_{T} = P\_{\mathrm{av}} + K\_{T}\,\sigma \tag{1}$$

where P<sub>T</sub> denotes the precipitation depth of the T-year storm event; K<sub>T</sub> is the frequency factor; and P<sub>av</sub> and σ are the mean and the standard deviation of the annual maximum precipitation, respectively. The frequency factor K<sub>T</sub> is expressed as

$$K\_{T} = -\frac{\sqrt{6}}{\pi} \left\{ \lambda + \ln \left[ \ln \left( \frac{T}{T - 1} \right) \right] \right\} \tag{2}$$

where λ is the Euler–Mascheroni constant (0.5772) and ln is the natural logarithm.

From this, the rainfall intensity, I<sub>T</sub>, is calculated as the precipitation depth of the T-year return period divided by the rainfall duration:

$$I\_{T} = P\_{T} / T\_{d} \tag{3}$$

where T<sub>d</sub> is the rainfall duration.
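As a quick illustration, the Gumbel design-storm calculation of Equations (1)–(3) can be sketched in a few lines of Python (the sample depths below are placeholders, not the Kesem records):

```python
import math

def gumbel_design_depth(annual_max, T):
    """T-year design precipitation depth P_T = P_av + K_T * sigma (Eqs. 1-2)."""
    n = len(annual_max)
    p_av = sum(annual_max) / n
    sigma = math.sqrt(sum((p - p_av) ** 2 for p in annual_max) / (n - 1))
    # Frequency factor K_T; 0.5772 is the Euler-Mascheroni constant
    k_t = -(math.sqrt(6) / math.pi) * (0.5772 + math.log(math.log(T / (T - 1))))
    return p_av + k_t * sigma

def rainfall_intensity(p_t, duration_h):
    """Average intensity I_T = P_T / T_d (Eq. 3)."""
    return p_t / duration_h

# Illustrative annual maximum daily precipitation series (mm)
depths = [52.0, 61.5, 48.3, 70.2, 55.9, 66.1, 58.4, 74.0, 49.7, 63.3]
p100 = gumbel_design_depth(depths, 100)   # 100-year design depth (mm)
i100 = rainfall_intensity(p100, 24)       # average intensity over 24 h (mm/h)
```

Note that K<sub>T</sub> grows with T (roughly 3.14 for T = 100), so the 100-year depth exceeds the largest observed annual maximum in this short sample.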

### *2.3. Hydrologic Modeling (HEC-HMS)*

The HEC-HMS rainfall–runoff model [30] was used to simulate inflows into the dam. HEC-HMS is capable of simulating runoff based on hourly to daily rainfall [24]. All the necessary processes to produce HEC-HMS projects were prepared using ArcHydro [31] and HEC-GeoHMS [32] tools with input files from the DEM, stream network, sub-basin boundaries, and connectivity of various hydrologic elements. HEC-GeoHMS in the ArcGIS environment can create HEC-HMS projects using terrain pre-processing and basin processing tools [14,33]. ArcHydro is a geospatial and temporal data model for water resources, which operates within ArcGIS. The attribute tables of the terrain were populated with consecutive calculations of basin slope, river length, river slope, basin centroid, centroid elevation, and centroid longest flow path under the Characteristics tool. The land use and soil map were employed to generate the CN grid map through the HEC-GeoHMS tool Generate CN grid.

In the hydrologic modeling, different methods were used to create the rainfall–runoff relationship: the Soil Conservation Service curve number (SCS-CN) method for losses, the SCS unit hydrograph for transformation, the monthly constant method for base flow, and the Muskingum–Cunge method for channel routing. A curve number grid was generated with the HEC-GeoHMS tool, and the most common parameters in the flood simulation were entered to develop the HMS schematic for import into HEC-HMS. After performing all the necessary processes in HEC-GeoHMS, the HMS model was exported for hydrologic simulation.

The rainfall–runoff model parameters have to be derived from calibration against one or more observed variables, e.g., streamflow. Comparison of these parameters with modeled results provides physical meaning and acceptable ranges for their values [34]. Exact values for the parameters usually cannot be fixed in advance. Therefore, five hourly historical storm events between 2012 and 2018, recorded at three precipitation gauging stations and one hydrological gauging station, were used to set up the model through calibration and validation. The model estimates flow output and compares it with the observed flow at the Kesem Arerti gauge station. Statistical measures were used to assess the model performance (i.e., the strengths and weaknesses of the model). This measures-oriented approach to performance assessment focuses on several different aspects of the overall accuracy or skill of the streamflow model. The model was evaluated with several statistical coefficients to provide a range of information [35].

The coefficient of determination (R<sup>2</sup>) describes the degree of correlation between the simulated and measured data. It ranges from 0 to 1, where values close to 1 indicate the least error. The percent bias (PBIAS) measures the tendency of the simulated values to be larger or smaller than the observed values on average. The Nash–Sutcliffe efficiency (NSE) is a normalized statistic based on the residual variance, where an NSE > 0.5 is considered to indicate a good fit. The mean absolute error (MAE) and root mean square error (RMSE) are absolute measures of fit. The equations employed in this study are shown below:

$$R^2 = \left[ \frac{\sum\_{i=1}^{n} (O\_i - O\_{\mathrm{av}})(S\_i - S\_{\mathrm{av}})}{\sqrt{\sum\_{i=1}^{n} (O\_i - O\_{\mathrm{av}})^2}\,\sqrt{\sum\_{i=1}^{n} (S\_i - S\_{\mathrm{av}})^2}} \right]^2,\tag{4}$$

$$\text{PBIAS} = \left[ \frac{\sum\_{i=1}^{n} (O\_i - S\_i)}{\sum\_{i=1}^{n} O\_i} \right] \times 100,\tag{5}$$

$$\text{NSE} = 1 - \frac{\sum\_{i=1}^{n} (O\_i - S\_i)^2}{\sum\_{i=1}^{n} (O\_i - O\_{\mathrm{av}})^2},\tag{6}$$

$$\text{MAE} = \frac{1}{n} \sum\_{i=1}^{n} |O\_i - S\_i|,\tag{7}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (O\_i - S\_i)^2},\tag{8}$$

where O<sub>i</sub> is the measured flow (m<sup>3</sup>/s); S<sub>i</sub> is the simulated flow (m<sup>3</sup>/s); and O<sub>av</sub> and S<sub>av</sub> are the mean measured and simulated flows (m<sup>3</sup>/s), respectively. Afterward, the frequency method was selected to run the HEC-HMS model for specified durations under the inbuilt meteorological model.
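A minimal sketch of the performance statistics of Equations (4)–(8), computed directly from paired observed and simulated flow series (the function and variable names are illustrative):

```python
import math

def performance_metrics(obs, sim):
    """Goodness-of-fit statistics of Eqs. (4)-(8) for observed vs. simulated flow."""
    n = len(obs)
    o_av = sum(obs) / n
    s_av = sum(sim) / n
    cov = sum((o - o_av) * (s - s_av) for o, s in zip(obs, sim))
    var_o = sum((o - o_av) ** 2 for o in obs)   # observed variance (unnormalized)
    var_s = sum((s - s_av) ** 2 for s in sim)   # simulated variance (unnormalized)
    r2 = (cov / math.sqrt(var_o * var_s)) ** 2                          # Eq. (4)
    pbias = 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)       # Eq. (5)
    nse = 1 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / var_o       # Eq. (6)
    mae = sum(abs(o - s) for o, s in zip(obs, sim)) / n                 # Eq. (7)
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)   # Eq. (8)
    return {"R2": r2, "PBIAS": pbias, "NSE": nse, "MAE": mae, "RMSE": rmse}
```

A perfect simulation (sim identical to obs) yields R<sup>2</sup> = NSE = 1 and PBIAS = MAE = RMSE = 0, which is a quick sanity check for the implementation.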

### *2.4. Hydraulic Modeling (HEC-RAS)*

HEC-RAS was designed to perform one- and two-dimensional hydraulic calculations for a full network of natural and constructed channels. HEC-RAS can be used to route an inflowing flood hydrograph through a reservoir with any of the following three methods: one-dimensional (1D) unsteady flow routing (full Saint-Venant equations), two-dimensional (2D) unsteady flow routing (full Saint-Venant equations or diffusion wave equations), or level pool routing. In general, full unsteady flow routing (1D or 2D) is more accurate for both the with- and without-breach scenarios. Because HEC-RAS solves the full Saint-Venant equations, it is well suited for computing floods [36].

In a previous study, the Kesem dam was simulated using HR BREACH for both the overtopping and piping failure modes, employing Hershfield's technique to estimate the probable maximum flood. The result showed overtopping failure is not expected as the spillway can safely evacuate this flood [23]. For piping failure, the water seeps in at a significant rate through the dam; as the material is eroded, a large hole forms. Hence, during the piping process, erosion and head cutting begin to occur on the downstream side of the dam, which eventually widen the breach. Depending on the volume of water in the reservoir, the widening continues until the natural channel bed is reached. HEC-RAS has two types of methods available for assessing the dam breach characteristics, namely, the user-entered and the simplified physical methods.

Breach formation model parameters that need to be estimated consist of the reservoir water-surface elevation at which breach formation begins (Y<sub>f</sub>), the height (H<sub>b</sub>), average width (B<sub>av</sub>), and average side slope ratio z of the final trapezoidal breach, and the breach formation time (t<sub>f</sub>, the time needed for complete development of the breach following the initiation phase), as shown in Figure 3. For this study, the Froehlich simplified physical method regression equations [37] were used for the piping dam breach simulation. The relationships for both the expected values and their variances are reported with coefficients of determination (R<sup>2</sup>) of 0.68, 0.86, and 0.96 for the breach width, peak outflow, and breach formation time parameters, respectively [38]. The prediction equations for the parameters B<sub>av</sub>, z, and t<sub>f</sub> were determined from multiple regression analysis of the assembled data. Logarithmic transformations of all dependent and independent variables were found to provide the best linear relationships [39]. As a result, the Froehlich method was found to be the most appropriate for the simulation of the Kesem dam.
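For illustration, the commonly quoted form of the Froehlich (2008) breach regressions can be sketched as below; the coefficients are reproduced from the secondary literature, not from this study's data, and should be verified against the cited source [37] before use:

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def froehlich_2008_breach(v_w, h_b, failure_mode="piping"):
    """Expected breach parameters from the Froehlich (2008) regressions,
    as commonly quoted (verify against the original reference).
    v_w: reservoir volume above the breach bottom (m^3); h_b: breach height (m).
    Returns (average width B_av in m, side slope z, formation time t_f in s)."""
    k_o = 1.3 if failure_mode == "overtopping" else 1.0   # failure-mode factor
    b_av = 0.27 * k_o * v_w ** (1 / 3)                    # average breach width
    z = 1.0 if failure_mode == "overtopping" else 0.7     # side slope (H:V)
    t_f = 63.2 * math.sqrt(v_w / (G * h_b ** 2))          # breach formation time
    return b_av, z, t_f
```

With the 550 million m<sup>3</sup> reservoir volume mentioned later in this paper and an assumed breach height of a few tens of metres, these expressions give an average breach width of a few hundred metres and a formation time of a few hours, which is the order of magnitude typical of large embankment dam breaches.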

**Figure 3.** Dimensions of a trapezoidal dam breach approximation (Froehlich, 2008).

### *2.5. 2-D Flood Mapping*

The modeling of a dam break flood wave is one of the most difficult unsteady flow problems to solve. Within HEC-RAS, the downstream area can be modeled as a combination of one-dimensional streams and storage areas; as a combination of one-dimensional streams, storage areas, and two-dimensional flow areas; or as a single two-dimensional flow area. Many other factors must be considered to obtain an accurate estimate of the downstream flood stages and flows. In unsteady flow model development for a dam break, stability and numerical accuracy can be improved by selecting a time step that satisfies the Courant condition. Too large a time step causes numerical diffusion (attenuation of the peak) and possible model instability, whereas too small a time step can lead to long computation times, as well as possible model instability.

The best method to estimate a computational time step for HEC-RAS is to use the Courant condition. This is especially important for dam-break flood studies. The water depth, velocity, and the water surface elevation were analyzed, and the result was displayed in HEC-RAS Mapper view. The Courant condition is shown below:

$$C = \frac{V\_{w}\,\Delta T}{\Delta X} \le 1 \quad \text{and} \quad \Delta T \le \frac{\Delta X}{V\_{w}} \tag{9}$$

where C denotes the Courant number; ∆T is the time step in seconds; ∆X is the distance step in feet (average cross-section spacing or two-dimensional cell size); and V<sub>w</sub> is the wave speed in feet per second. The flood wave speed can be calculated by

$$V\_{w} = \frac{dQ}{dA} \tag{10}$$

where dQ is the change in discharge over a short time interval (Q<sub>2</sub> − Q<sub>1</sub>); and dA is the change in cross-sectional area over the same interval (A<sub>2</sub> − A<sub>1</sub>).

For our application of the Courant condition, we used the maximum average velocity from HEC-RAS and multiplied it by 1.5 to obtain a rough estimate of the flood wave speed in natural cross-sections, a practical multiplier recommended in the HEC-RAS manual. The contour lines of velocity and their flow directions can also be displayed in the 2D flood map.
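The resulting time-step check can be sketched as follows (a simplified illustration of Equation (9) combined with the 1.5× wave-speed rule of thumb; the function name is an assumption, and the sketch works in metres rather than feet since the ratio ∆X/V<sub>w</sub> is unit-consistent):

```python
def max_time_step(cell_size, max_avg_velocity):
    """Largest time step (s) satisfying the Courant condition dT <= dX / Vw
    (Eq. 9), with the flood wave speed Vw approximated as 1.5x the maximum
    average velocity, per the rule of thumb for natural cross-sections.
    cell_size: 2D cell size or cross-section spacing (m);
    max_avg_velocity: maximum average flow velocity (m/s)."""
    v_w = 1.5 * max_avg_velocity  # rough flood wave speed estimate
    return cell_size / v_w
```

For example, with the 30 m cells used in this study and a 10 m/s maximum average velocity, the estimated wave speed is 15 m/s and the largest stable time step is 2 s; a larger chosen time step would risk numerical diffusion and instability.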

Afterward, a new set of layers was created and imported into RAS Mapper to visualize the outputs generated in a specified pre-processing step for performing the floodplain delineation. For the model to be accurate, small cell sizes can be used where the computational requirements remain manageable and the simulation still runs quickly [12]. Previous studies have addressed the effects of mesh size and resolution on 2D flood map modeling, as well as the sensitivity of model predictions to the Manning surface roughness coefficient [13,40]. In this study, different mesh sizes were tested considering the terrain resolution and the computational time. The maximum possible cell size, i.e., 30 m × 30 m, was employed to simulate the flood flow in a reasonable amount of computational time. In addition, a constant Manning roughness value (0.05) was used, per Chow's (1959) recommendation for vegetated floodplains [41].

Stream centerlines, cross-section cut lines, bank points, velocity points, and bounding polygons were created in ArcGIS. The software creates bounding polygons and a spatial limit for floods based on the water surface elevation of the floodplain for each return period. Finally, inundation mapping was performed in two steps: water surface generation and floodplain delineation. The water surface was created from the altitude of the water surface in the 2D flow area. Floodplain delineation was achieved using the SRTM terrain model. The flood damage and inundated infrastructure were analyzed by overlaying the Google Earth satellite map. Thus, the flood inundation boundaries and their depths were calculated. The flood inundation areas for different flood values are represented by polygons of the contour lines generated in RAS Mapper.

### **3. Results**

### *3.1. Designed Storm*

The designed storm was computed by applying the Gumbel distribution to each set of annual daily maximum precipitation corresponding to the return periods, as shown in Figure 4. The Gumbel distribution represents the variability of the Kesem watershed precipitation reasonably well, as shown by the fitted curve. Following the frequency result, an intensity duration frequency (IDF) curve was developed to estimate the designed storm for various return periods. The IDF values were converted into a depth duration frequency (DDF) curve to estimate the design precipitation depth, as shown in Figure 5, which was then used for the design inflow calculation.

**Figure 4.** Frequency analysis using Gumbel frequency factors (Kesem, 2006 to 2015).

**Figure 5.** Depth duration frequency (DDF) for the Kesem watershed with rainfall estimation of seven return periods.

### *3.2. Hydrologic Simulation (Flood Hydrograph)*

The HEC-HMS model was set up by simulating the historical observed data and tuning the sensitive parameters within the model. The calibration, validation, and physical properties analysis of the HEC-HMS model significantly enhanced the performance of the hydrologic model. Moreover, the base flow, calculated from the annual minimum monthly streamflow, was crucial in determining the peak flow and maintaining the hydrograph consistency during low flows. Figure 6 shows the monthly average flow of the Kesem watershed.

**Figure 6.** Annual monthly average flow at the Kesem Arerti station from 2006 to 2015.

Regardless of historical changes in the basin, the model calibration increased the efficiency of the model, yielding good agreement between the simulated and observed hydrographs. Furthermore, the model was verified using peak hourly storm events in August 2014 and 2015. Both calibration and validation hydrographs are shown in Figure 7. The results of the statistical evaluations after calibration and validation are within the acceptable range, as shown in Table 3. Thus, the model was well set and ready for the simulation of flood hydrographs using the return periods of the designed storm.

**Figure 7.** Model calibration and validation results: hydrographs of storm events after calibration in (**a**) 2012, (**b**) 2013, and (**c**) 2016; and validation in (**d**) 2014 and (**e**) 2015.



**Table 3.** Major parameter indices after calibration and validation with optimized parameter values.

Afterward, the model simulated a 48 h storm event. The hydrographs from the HEC-HMS model were used as a boundary condition for the hydraulic model. Figure 8 depicts the flood hydrographs of the designed storm in the Kesem reservoir. The reservoir inflow hydrograph peak discharges for the return periods of 2, 5, 10, 25, 50, 100, and 200 years were 318.2, 968.1, 1672.6, 2866.9, 3892.3, 4997.5, and 6161.1 m<sup>3</sup>/s, respectively. The use of hourly peak observed rainfall events for the flood estimation yields a relatively higher flood hydrograph, which increased the estimation accuracy of the Kesem watershed hydrographs.

**Figure 8.** Reservoir flood inflow hydrographs for different return periods of the designed storm.

### *3.3. Hydraulic Simulation (Flood Mapping)*

The HEC-RAS simulation was then performed using the peak inflow hydrographs of the hydrologic model outputs as the boundary conditions for a piping dam breach. Consequently, we found that inflows up to the 50-year return period cannot cause a Kesem dam break. However, the 100- and 200-year return period inflows show dam failure and flooding of the downstream area. The dam breaching begins when the water level of the reservoir reaches an elevation of 930 m and the volume of water stored in the reservoir is 550 million m<sup>3</sup>. The breach is estimated to begin as triangular and then stretch to the bottom of the breach, at which point the shape changes to trapezoidal. A standardized dam-breach scenario examines a range of possibilities to estimate the expected annual damages, which vary with the volume of water in the reservoir. For general planning purposes, the most useful scenario is the worst case, in which the dam fails while the reservoir is full [42]. Therefore, in this study, the stored water in the dam and the inflow were routed to the floodplain downstream of the dam, and the flood hazard maps for the 100- and 200-year return periods were generated. Figure 9 represents the model results for the return periods that show dam break and those that remain safe.

**Figure 9.** Hydraulic model outputs of inundation areas for various return periods: (**a**) the main infrastructure and flooded area locations; and the (**b**) 50-year, (**c**) 100-year, and (**d**) 200-year return periods.

The inundation area coverage and depth in the floodplain are displayed for both the 100- and 200-year return periods. The 2D flood simulation video in RAS mapper provides a detailed view of the flood extent, propagation, and depth. In this case, the velocity of the flooding shows variation in space and time, increasing to 15 m/s in the immediate vicinity of the dam and slowing as it flows to the wider area toward the confluence with the Awash River, as shown in Figure 10.

**Figure 10.** Maximum velocity of flood flow from the dam in the cases of (**a**) 100-year and (**b**) 200-year return period inflows.

The inundation depth and duration vary depending on the topography and proximity to the dam. Table 4 provides the main information about the flood inundation. The maximum water depths for the 100- and 200-year return periods are 20 and 31 m, respectively. The inundation time also increases to 46 and 93 h (10 and 39 h on average) for the 100- and 200-year return periods, respectively. The flood inundates everything in the downstream area of the dam. The devastating effect extends to the life, livelihood, and property of the community as well as the national economy, including major infrastructure such as Awash National Park, the irrigation control area, the Kesem sugar factory, and two small villages downstream of the dam.


**Table 4.** Main information on inundated areas from the results.

### **4. Discussion**

In previous studies, the Kesem watershed and dam flood flows were estimated using monthly to daily time-series data for flood simulation. In this study, flood flow estimation using hourly precipitation data demonstrated more reasonable results, as short-duration peak events are included in the flow calculation. That is, the higher the temporal resolution of the record, the higher the accuracy of the flow estimation. Moreover, the statistical results of the simulated flow hydrograph against streamflow observations were within an acceptable range after model calibration and validation. Although the application of short-term rainfall data to the evolution of long-term trends might not be representative and conclusive, as the trend results are influenced by data uncertainty [43], the observed annual maximum storm data of the Kesem watershed fit the estimated storm curve produced using the Gumbel method well, as shown in Figure 4. In addition, the accuracy of estimation was enhanced by using a longer period of observed data, and the IDF and DDF curves based on the Gumbel precipitation distribution were found to be indispensable for return-period flow simulation.

Dam breaches are often simply modeled in the shape of a trapezoid that is defined by its final height, base width or average width, and side slopes, along with the time needed for the opening to form completely [39]. In this study, the dimensions and occurrence of the dam breach were estimated with the Froehlich mathematical expressions for the expected values of the final width and side slope of a trapezoidal breach, along with its formation time. Although data on historical embankment dam failures are limited and were generally not directly and accurately recorded, Froehlich collected data from 74 embankment dam failures to determine the breach characteristics [37]. Additionally, an investigation into the Mosul embankment dam failure with HEC-RAS, embedded with piping dam breach mathematical equations, showed that Froehlich is the most suitable method for estimating breach parameters for an embankment dam [44]. Therefore, the Froehlich method was considered the most appropriate for estimating the breach dimensions and formation time of the Kesem dam.

The use of a high-spatial-resolution DEM for 2D flood map generation and inundation extent determination was essential for this area, which is scarce in topographic and cross-sectional data. River cross-sectional dimension data at specified intervals are scarcely available for many rivers of the world. Furthermore, for flood patterns around discrete features, it is important to use a 2D model to determine the direction of flow [45]. However, terrain analysis for implementing large-scale hydraulic simulation is still a challenging and active research issue. A comparison of detailed digital surface models (DSMs) and DEMs shows that detailed DSMs may more accurately represent the surface relief and existing natural obstacles, such as vegetation, buildings, and greenhouses, enabling more realistic hydraulic simulation results [46]. Consequently, the 2D map can be enhanced by incorporating 1D simulations with the necessary topographic and cross-sectional data as well as detailed digital surface models. Additionally, in this study, the existing structures downstream of the dam, except for the flood-control levee on the side of the Awash River, could not be included due to a lack of data. However, the accuracy of the results can be improved with the addition of existing inline and lateral structures, such as weirs, bridges, and dykes. The effect of land use on both the watershed and the floodplain can be demonstrated to depict scenario-based flow change and the expected damage. The depth of inundation and the velocity can be synchronized to assist in practical impact-based flood forecasting and damage calculation.

Although there is no unified definition, the environmental impact of a dam break refers to the changes in the natural environment and ecological conditions around the reservoir, mainly concerning water, soil, the atmosphere, noise, and solid waste [47]. Flood maps of anticipated dam breaks cannot be drawn with complete accuracy, and it is practically impossible to exactly validate the inundation area. Flood maps can be enhanced by incorporating a validation sample specific to the flood-inundated area. We used the volume–area–elevation curve between the reservoir storage time and the breach occurrence time to validate the inundation area. The validated model could be useful for reservoir operation and dam safety management.

### **5. Conclusions**

In this study, we coupled HEC-HMS and HEC-RAS for application to the flood mapping of a Kesem dam break in Ethiopia, with satisfactory results for both the rainfall–runoff and hydraulic simulations. The use of 2D flood mapping, particularly in data-scarce areas, greatly contributed to the identification of risk zones and the extent of the hazard. From this study, the following conclusions were drawn:


**Author Contributions:** Conceptualization, methodology, investigation, resources, data curation, writing original draft preparation, validation, software and formal analysis: M.G.T.; writing—review and editing, supervision and project administration: Y.C.; conceptualization and funding acquisition: K.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was done under the scholarship funded by the Korea International Cooperation Agency (KOICA).

**Data Availability Statement:** The data presented in this study are available on request from the first author. The data are not publicly available due to restrictions from the raw data providers.

**Acknowledgments:** We would like to acknowledge all the support from KOICA and the Graduate School of Water Resources at Sungkyunkwan University. The authors thank K-water and the USACE for training in HEC-RAS 2-D flood modeling.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Hydrological and Hydraulic Flood Hazard Modeling in Poorly Gauged Catchments: An Analysis in Northern Italy**

**Francesca Aureli \* , Paolo Mignosa , Federico Prost and Susanna Dazzi**

Department of Engineering and Architecture, University of Parma, Parco Area delle Scienze 181/A, 43124 Parma, Italy; paolo.mignosa@unipr.it (P.M.); federico.prost@unipr.it (F.P.); susanna.dazzi@unipr.it (S.D.) **\*** Correspondence: francesca.aureli@unipr.it

**Abstract:** Flood hazard is assessed for a watershed with scarce hydrological data in the lower plain of Northern Italy, where the current defense system is inadequate to protect a highly populated urban area located at a river confluence and crossed by numerous bridges. An integrated approach is adopted. Firstly, to overcome the scarcity of data, a regional flood frequency analysis is performed to derive synthetic design hydrographs, with an original approach to obtain the flow reduction curve from recorded water stages. The hydrographs are then imposed as upstream boundary conditions for hydraulic modeling using the fully 2D shallow water model PARFLOOD with the recently proposed inclusion of bridges. High-resolution simulations of the potential flooding in the urban center and surrounding areas are, therefore, performed as a novel extensive application of a truly 2D framework for bridge modeling. Moreover, simulated flooded areas and water levels, with and without bridges, are compared to quantify the interference of the crossing structures and to assess the effectiveness of a structural measure for flood hazard reduction, i.e., bridge adaptation. This work shows how the use of an integrated hydrological–hydraulic approach can be useful for infrastructure design and civil protection purposes in a poorly gauged watershed.

**Keywords:** flood risk; poorly gauged watersheds; regional flood frequency; flood modeling; GPU-parallel numerical scheme; bridges

### **1. Introduction**

Among the most frequent and destructive disasters, floods annually hit people in many countries all over the world. Flooding is the cause of up to 40% of natural disasters in the world, entailing almost half of the fatalities related to natural hazards, with a rising trend [1–4]. Flood risk management, therefore, has to be implemented on the basis of a proper and informed estimation of flood hazards, in the awareness that, due to global warming, the occurrence of extreme flood events will probably increase in the future, even if some recent studies show a situation that, at least in Europe, seems very patchy [5–8]. This task can be challenging, as it requires the careful consideration of many factors related to catchment properties, hydrological inputs, and characteristics of rural, urban, and productive areas that are potentially floodable [9]. Inaccurate hazard and risk assessments may result in poor risk management interventions, ranging from insufficient protection to squandering of public resources. Accurate flood hazard and risk assessments, on the contrary, can valuably support decisions on land use, design of infrastructure, and drafting and organization of emergency response for civil protection purposes. Inundation mapping captures the extent of the flooded area and can also represent a tool of primary importance to support first responders [10,11]. Whatever type of model is used for the purpose, a key element in flood risk assessment is the identification of the hydrological stresses to be adopted as input. Unfortunately, the definition of the hydrological inputs is strongly hindered by the fact that, very often, the watersheds of interest are devoid of reliable field observations. Adopting the definition of Sivapalan et al. [12], *ungauged* or

**Citation:** Aureli, F.; Mignosa, P.; Prost, F.; Dazzi, S. Hydrological and Hydraulic Flood Hazard Modeling in Poorly Gauged Catchments: An Analysis in Northern Italy. *Hydrology* **2021**, *8*, 149. https://doi.org/ 10.3390/hydrology8040149

Academic Editor: Andrea Petroselli

Received: 21 July 2021 Accepted: 30 September 2021 Published: 5 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

*poorly gauged* basins are those in which the quality and quantity of hydrological observations are inadequate to allow a reliable evaluation of streamflow. The prediction or forecasting of the hydrological responses of these basins, and the evaluation of the associated uncertainty, were the subject of a significant research activity (Prediction in Ungauged Basins—PUB) by the scientific community through the decade 2003–2012 [12]. Rather disheartening results emerged from that analysis, according to which the majority of rivers, stream reaches, and tributaries in the world are ungauged or poorly gauged [13]. Indeed, the knowledge of continuous water stages and discharges in rivers is a crucial factor in watershed analyses and water resource management, due to the need to evaluate flood peaks, flow reduction factors, and duration curves at the basis of the design of engineering structures, such as defense systems, flood detention reservoirs, etc. Very often, unfortunately, even for watersheds with a good consistency of water stage observations, both as regards the length of the measured series and the density of the gauging stations along the river network, the poor reliability or even the unavailability of the stage–discharge relationships limits the potential of the observed database. This could be obviated by developing methodologies capable of extracting useful information for the construction of flow hydrographs directly from observed water-stage time series.

A possible solution to overcome the lack of direct observations is to resort to regionalization techniques that allow the necessary information to be derived from the knowledge of quantitative data in nearby gauged watersheds or with reference to a region homogeneous with the ungauged or poorly gauged catchment under investigation [14]. Behind the concept of a homogeneous region, there is the assumption that the similitude in climatic, geologic, vegetative, topographic, etc., characteristics would give origin to similar responses in terms of runoff. This does not necessarily imply the neighborhood of the basins [15]. Regionalization can be intended as the process of transferring hydrological information from gauged to poorly gauged watersheds, also, in terms of frequency [16–18]. It is obviously advisable to use any direct information available for the area of interest, even if scarce and fragmentary, to validate, to some extent, its belonging to the identified homogeneous region [19,20]. Many sources of uncertainty can affect the prediction reliability due to data and to the regionalization procedure itself, and this issue is constantly subject to attention given the importance that regionalization procedures play in hydrology [21–24].

In general, once the hydrological inputs of assigned probable frequency have been identified, inundation mapping can be performed through distinct approaches [25]: simplified conceptual procedures, empirical methods, and physically based models. The latter are often run at low resolution to allow reasonable computational times, with the aim of creating flooding maps sometimes even at a continental scale [26–29]. Even if a line of research [30–32] argued that inundation mapping performed at high spatial resolution can lead to useless and even counterproductive detail, there is evidence that high-resolution meshes derived from LiDAR-based DTMs allow a detailed description of the relevant terrain features typical of man-made landscapes (e.g., roads, railways, channels, and embankments) that dominate flow patterns and that would be undetectable at a coarse resolution [33–36]. The use of GPU-accelerated 2D shallow water numerical models and the adoption of nonuniform grids represent a powerful analysis tool, allowing a drastic reduction in computational costs [11,37]. Care must nevertheless be devoted to an in-depth analysis of the terrain model, which, even at high spatial resolution, could be affected by non-negligible descriptive deficiencies, such as those occurring during vegetation filtering along streams with densely vegetated banks [38]. In any case, the most accurate results for flood modeling are obtained by adopting a fine spatial resolution, thanks also to the accurate description of local terrain features and hydraulic structures that can thus be achieved [34,39].

Flood propagation in rivers and channels is significantly affected by the presence of bridges and other crossing structures, which obstruct the flow through lateral and vertical constrictions and increase the hazard in upstream areas as a consequence of the backwater effect. Moreover, during high return period flood events, bridge decks may even be overtopped; it is, therefore, appropriate that all the elements interfering with the flow be properly considered in numerical simulations aimed at flood management [40]. Different approaches can be adopted to include the presence of bridges in 2D shallow water models [41,42], and the choice of the modeling approach should also consider the possible additional computational burden introduced by the necessary solution algorithms. A possible way to simulate the presence of bridges and other hydraulic structures in 2D domains is the implementation of internal boundary conditions [43]. Recent studies investigated the possibility of implementing this approach in GPU-accelerated numerical codes as well [44], and the validations performed provided good results for bridges in both low- and high-flow conditions, without affecting the performance of the calculation tools. Preserving the computational efficiency of the numerical codes is, in fact, very important, both to effectively simulate flooding phenomena over very extensive computational domains and to achieve efficient and accurate 2D simulations in ever shorter calculation times.

With the aim of assessing flood hazard in a poorly gauged watershed in Northern Italy, an integrated hydrologic–hydraulic approach is adopted here. A regional flood frequency analysis is performed to derive synthetic design hydrographs (SDH) to be imposed as upstream boundary conditions for fully 2D high-resolution hydraulic modeling in a domain in which an urban center is located at the confluence of two streams. An original approach is studied to obtain indispensable information for the construction of SDHs by exploiting the knowledge of the behavior of water stage hydrographs instead of discharge ones. Water stage hydrographs are, indeed, usually more accessible and reliable than discharge hydrographs due to the unavailability or uncertainty of the stage–discharge relationships.

The area of interest is characterized by the presence of over 20 crossing structures. This then leads to an extensive application of bridge modeling in a truly 2D framework for each different scenario. Thanks to the high efficiency of the code architecture and bridge computational approach, this entails only a negligible increase in the calculation times compared to the simulation of the same scenario in the absence of the crossing structures. Therefore, the computed flooded areas and water levels, in the actual state and in the hypothetical absence of bridges, are compared for each return period to quantify the influence exerted by the interfering structures upon flooding extent and to assess the effectiveness of a structural measure for flood hazard reduction, i.e., bridge rebuilding.

The paper is organized as follows. The study area is introduced in Section 2, while the hydrological analysis is presented in Section 3. In Section 4, the characteristics and set-up of the model and of the different flooding scenarios considered are illustrated. The results thus obtained are presented in detail in Section 5. The discussion and conclusions are then drawn in Section 6.

### **2. Study Area**

The Chiavenna is a left tributary of the Po River (the main Italian watercourse). It is about 52 km long and flows in the northwest of the Emilia-Romagna region (Northern Italy); its narrow and elongated catchment, which also collects the waters of the Riglio and Chero streams, has a total area of 340 km<sup>2</sup> (Figure 1 and Table 1), about 40% of which is hilly and mountainous. The character of the river system is typically torrential, and not infrequently, in summer, the riverbeds are totally dry.

**Figure 1.** (**a**) Chiavenna watershed, (**b**) location of the Po and Chiavenna basins in Italy, and (**c**) river network in the urban area of Roveleto.

**Table 1.** Watershed characteristics and mean annual rainfall for contributing areas at the gauging stations of interest for the period 1997–2018.


<sup>1</sup> Above the gauging station.

### **3. Hydrological Analysis**

With the aim of assessing the hydrological inputs for the hydraulic simulations, design hydrographs of an assigned return period have to be evaluated. They can be completely determined once the peak discharge frequency distribution and the shapes of the hydrographs are identified at the sites of interest. Of course, these features should be inferred from the analysis of historical flood hydrographs at the chosen boundary condition sections, if available. However, the fragmentary or not totally reliable knowledge of the historical flood waves makes it necessary to partly resort to regional techniques.

### *3.1. Available Data*

The historical rainfall data available for the Chiavenna basin refer to the gauging stations of Castellana Groppo (1983–2001) and Riglio (2003–2018). The available records at the two sites were joined into a unique sample of 35 years due to the proximity and similar properties. The gauging sites are, in fact, only about 5 km away (Figure 1) and at the same altitude above sea level (434 m and 432 m for Castellana Groppo and Riglio, respectively), without any noteworthy difference in topographic parameters, such as slope gradient and exposition, capable of significantly influencing the local rainfall patterns. The main characteristics of the rainfall data samples confirm the possibility of merging them in a unique sample (Table 2).


**Table 2.** Characteristics of the rainfall data samples.

With reference to the analysis of the merged historical rainfall sample, it is worth stressing two significant values, namely the average values of the 1-h, *m*(*h*1), and daily, *m*(*h<sup>d</sup>*), maximum annual rainfall, equal to 24.1 mm and 65.9 mm, respectively (Table 2). These values may be of some interest if regionalization techniques at the outlets of contributing areas along the watercourses have to be adopted in the absence of reliable direct flow records.

Three hydrometric stations are present on the three main streams (Figure 1). For these outlets, the main watershed characteristics, together with the mean annual rainfall, are reported in Table 1.

For the gauging station on the Chero at Ciriano, frequent gaps and anomalies in the available water stage records (since 2002) and the absence of a reliable stage–discharge relationship made the data hardly usable.

The water stage records on the Riglio at Montanaro, instead, provide sufficient accuracy and completeness to determine average values. In particular, a series of 18 historical flood events can be identified. Unfortunately, for this station, the official stage–discharge relationships show excessive variability over the years, probably due to the difficulty in acquiring direct discharge measurements. It was, therefore, decided not to rely on the flow records derived through the stage–discharge conversions, but, rather, only on the water stages.

Moreover, the water stages recorded on the Chiavenna at Saliceto appear to be of fairly good quality, and this encourages their analysis with the aim of determining SDHs. As for the previous gauging station, however, the stage–discharge relationships show discordant and sometimes anomalous trends over the years (Figure 2). For this reason, a rating curve was derived through 1D hydraulic modeling, thanks to the availability of river cross-sectional surveys made specifically for this study. Values of Strickler's roughness coefficient equal to 30 and 15 m<sup>1/3</sup> s<sup>−1</sup> were assumed for the main channel and the floodplains, respectively, and a monomial function of the form *Q* = *a h<sup>b</sup>* was fitted to the numerical stage–discharge relation computed at the gauging site (*a* = 5.579 m<sup>3−*b*</sup> s<sup>−1</sup>, and *b* = 2.267). From Figure 2, it can be observed that this numerical rating curve is in fairly good agreement with the 2009 official stage–discharge relationship. From the conversion of the recorded hydrographs using the interpolated stage–discharge relationship, the average value of the annual maximum series (AMS) of *N* flood peaks (*N* = 18 events) was evaluated, resulting in 94.8 m<sup>3</sup> s<sup>−1</sup>. It is worth keeping this value in mind for the considerations that follow.
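As an illustration, the stage-to-discharge conversion with the fitted monomial rating curve can be sketched as follows; this is a minimal example, not the authors' processing code, and the stage hydrograph values are hypothetical:

```python
# Sketch of the stage-to-discharge conversion with the monomial rating
# curve fitted above, Q = a * h**b (a = 5.579 m^(3-b) s^-1, b = 2.267).
# The stage hydrograph below is hypothetical.

def stage_to_discharge(h, a=5.579, b=2.267):
    """Discharge (m^3/s) for water stage h (m) above the gauge datum."""
    return a * h ** b

stages = [0.8, 1.5, 2.4, 3.1, 2.6, 1.9, 1.2]      # hourly stages (m)
discharges = [stage_to_discharge(h) for h in stages]
peak_q = max(discharges)  # flood peak of this event
```

Applying this conversion event by event yields the annual maximum peak series from which the AMS average is computed.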

Given the size of the available sample, the observed flood hydrographs can be trusted for the evaluation of reliable average values and for the identification of a hydrograph shape typical of the site of interest. On the other hand, such a reduced sample size is not enough to allow a reliable estimate of maximum values for the high return periods to be investigated in flood hazard assessment. It is, therefore, advisable to resort to alternative methodologies. As a consequence, the peak flow quantiles were estimated through a regional analysis, which requires the evaluation of an index flood and of a growth factor, a dimensionless function increasing with the return period that, once multiplied by the index flood, yields the peak flow values for the return periods of interest.

**Figure 2.** Stage–discharge relationships for the Chiavenna at Saliceto.

### *3.2. Peak Discharge Estimation*

### 3.2.1. Index Flood

Regionalization procedures allow the exploitation of the observations available for a group of watersheds sharing similar behavior and gathered in a homogeneous region to which the catchment of interest is believed to belong, due to its geographic location or characteristics. In general, a regionalization procedure is accomplished, first of all, through the determination of a flow scale factor proper to the site of interest and independent of the return period, namely the index flood [45,46]. The index flood is often assumed to be equal to the average of the annual maximum peak flows at the site of interest. Alternatively, it is possible to resort to indirect methods. Among others, multi-regressive models that express the index flood as a function of some significant features of the basin and of the watercourse can be adopted [45–48].

With reference to [45–48], two multi-regressive expressions for the index flood at the river sections of interest were evaluated. In the first, the area *A* of the catchment, the length *Lmb* of the main branch, and the hourly rain value *m*(*h*1) are taken into consideration:

$$Q\_{I1} = 2.797 \cdot 10^{-5} \, A^{1.235} \, m(h\_1)^{3.513} \, L\_{mb}^{-0.72} \tag{1}$$

while, in the second, the index flood is calculated with reference to the area *A* of the catchment, the average altitude *Hmed* of the basin at the section of interest, and the average value of the daily maximum rainfall *m*(*h<sup>d</sup>*):

$$Q\_{I2} = 2.1 \cdot 10^{-4} \, A^{0.082} \, m(h\_d)^{2.416} \, H\_{med}^{-0.469} \tag{2}$$

For the Chiavenna basin (from [45,49]), it is possible to estimate the values of 26 mm and 66 mm, respectively, for *m*(*h*1) and *m*(*h<sup>d</sup>* ). These values are in very good agreement with the corresponding values of 24.1 mm and 65.9 mm obtained from the rain sample introduced in Section 3.1.

The application of Equations (1) and (2) leads to the values shown in Table 3. For the Chiavenna at Saliceto, the index flood obtained with the two multi-regressive relationships is in good agreement with the value of 94.8 m<sup>3</sup> s<sup>−1</sup> obtained from the analysis of the at-site flood waves in Section 3.1. As a precautionary choice, the maximum values provided by the regressive relations at the sections of interest, which correspond to the outcome of Equation (1), will be adopted (bolded in Table 3).
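A minimal sketch of the two multi-regressive estimates of Equations (1) and (2) is given below. The catchment descriptors used in the example (*A*, *Lmb*, *Hmed*) are hypothetical placeholders, not the values of Table 1; only *m*(*h*1) = 26 mm and *m*(*h<sup>d</sup>*) = 66 mm are taken from the text:

```python
# Sketch of the two multi-regressive index-flood estimates, Eqs. (1)-(2),
# with the coefficients as printed in the text. A (km^2), L_mb (km), and
# H_med (m a.s.l.) below are hypothetical placeholders, not the values of
# Table 1; m(h1) = 26 mm and m(hd) = 66 mm are the quoted basin values.

def index_flood_1(A, m_h1, L_mb):
    """Eq. (1): index flood in m^3/s."""
    return 2.797e-5 * A**1.235 * m_h1**3.513 * L_mb**-0.72

def index_flood_2(A, m_hd, H_med):
    """Eq. (2): index flood in m^3/s."""
    return 2.1e-4 * A**0.082 * m_hd**2.416 * H_med**-0.469

q1 = index_flood_1(A=150.0, m_h1=26.0, L_mb=40.0)
q2 = index_flood_2(A=150.0, m_hd=66.0, H_med=430.0)
q_index = max(q1, q2)  # precautionary choice: retain the larger estimate
```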


**Table 3.** Index floods for the gauging stations of interest (selected values in bold type).

### 3.2.2. Growth Factor

The VA.PI. project [48] and the following in-depth studies divided the Italian territory into different macro-regions that were, in turn, subdivided into homogeneous zones, each characterized by a different growth factor. For the different areas of the macro-regions, the growth curves were obtained from the analysis of the multiple gauging stations present in the considered territory and provided with reliable stage–discharge relationships. The growth factor is identified on the basis of a GEV distribution having expression:

$$\chi\_T = \xi + \frac{\alpha}{\kappa} \left[ 1 - e^{-\kappa y} \right] \tag{3}$$

in which *y* is the Gumbel reduced variate. With regard to Zone C, the closest to the Chiavenna basin, the three parameters *ξ*, *α*, and *κ* in Equation (3) assume the values 0.643, 0.377, and −0.276, respectively.

Other studies have been based on an updated version of the original database of the VA.PI. project and have further divided the Emilia-Romagna–Marche region into sub-regions, each with its own growth curve [45–47,50].

The growth curve relating to the Zone C region can be compared with that deduced from the analysis of the series of maximum annual peak flows in Saliceto. The sample is well interpreted by a GEV distribution from which a growth factor close to that of Zone C is derived, as shown in Figure 3.

**Figure 3.** Growth factors as a function of the return period *T*.

From the index floods and growth factors derived from the previous analyses, the peak discharges were obtained at the sites of interest simply by multiplying the maximum index flood by the growth factor for the chosen return periods of 20, 50, 200, and 500 years (Table 4).
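This multiplication can be sketched as follows, combining the GEV growth curve of Equation (3) with an index flood. The Gumbel reduced variate is computed here as *y* = −ln(−ln(1 − 1/*T*)), a standard assumption; the index flood value in the example is hypothetical:

```python
import math

# Sketch of the peak discharge quantile estimation: the growth factor from
# the GEV growth curve of Eq. (3) (Zone C parameters from the text) is
# multiplied by the index flood. The 100 m^3/s index flood is hypothetical.

def growth_factor(T, xi=0.643, alpha=0.377, kappa=-0.276):
    y = -math.log(-math.log(1.0 - 1.0 / T))  # Gumbel reduced variate
    return xi + alpha / kappa * (1.0 - math.exp(-kappa * y))

def peak_discharge(q_index, T):
    """Peak discharge quantile (m^3/s) for return period T (years)."""
    return q_index * growth_factor(T)

quantiles = {T: peak_discharge(100.0, T) for T in (20, 50, 200, 500)}
```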


**Table 4.** Peak discharges for the chosen return periods at the sites of interest.

### *3.3. Definition of the Design Hydrographs*

3.3.1. Flow Reduction Curve and Peak Position Ratio

In order to evaluate hazard and risk maps, it is also necessary to estimate the characteristic volumes of the floods and their shape at the site of interest. These features are well represented by the reduction factor *ε<sup>D</sup>*(*T*):

$$
\varepsilon\_D(T) = \frac{\overline{Q}\_D(T)}{Q\_0(T)}\tag{4}
$$

where *Q*0(*T*) is the peak discharge of *T* years return period and *Q<sup>D</sup>*(*T*) is the average discharge over the duration *D* for the same return period, and by the position *r<sup>D</sup>* (0 ≤ *r<sup>D</sup>* ≤ 1) of the peak of the hydrographs in the same intervals (Figure 4). Some methodologies for identifying the reduction factor and peak position are available for instrumented river sections, where a series of historical floods has been recorded and for which a reliable stage–discharge relationship is available. In the absence of the second condition, the reduction curve can be inferred, after appropriate processing, on the basis of the trend of the water stages instead of the discharges. If water stages are also not available, as for the majority of non-instrumented river sections, some indirect methods still allow the estimation of the reduction factor and peak position through empirical regional relations, which express them as a function of some characteristics of the basin under consideration [51].

**Figure 4.** Data sampling of *Q<sup>D</sup>* and *r<sup>D</sup>* from a historical hydrograph (*D* = 60 h).

In the present case, the water stage series recorded at Saliceto was analyzed both directly and after its conversion into discharges through the rating curve previously derived. By analyzing the *N* available flow hydrographs with *N<sup>D</sup>* moving windows, it was then possible to extract *N<sup>D</sup>* samples of size *N* for the value of the maximum annual average flow and (*N<sup>D</sup>* − 1) samples for the peak position ratio for the durations of interest (for the duration 0, the position of the peak is, in fact, not defined). The choice of the maximum window duration *D<sup>f</sup>* must derive from a preliminary examination of the characteristic durations of the most significant historical flood hydrographs. For the Chiavenna stream at the gauging site of Saliceto, durations from 0 to 96 h on an hourly basis were considered (*D<sup>f</sup>* = 96 h, *N<sup>D</sup>* = 97).
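The moving-window sampling described above can be sketched, for a single hydrograph, as follows; this is an illustrative sketch, not the processing code actually used, and the hourly hydrograph is hypothetical:

```python
# Illustrative sketch of the moving-window sampling of Figure 4: for a
# duration D, find the D-hour window maximizing the average discharge,
# then record that average and the peak position ratio r_D within the
# window. The hourly hydrograph below is hypothetical.

def max_avg_and_peak_position(q, D):
    """q: hourly discharges (m^3/s); D: window length in hours (1 <= D <= len(q))."""
    best_avg, best_start = float("-inf"), 0
    for s in range(len(q) - D + 1):
        avg = sum(q[s:s + D]) / D
        if avg > best_avg:
            best_avg, best_start = avg, s
    window = q[best_start:best_start + D]
    r = window.index(max(window)) / D  # peak position ratio, 0 <= r_D < 1
    return best_avg, r

hydrograph = [5, 20, 60, 110, 150, 120, 80, 40, 15, 5]  # hypothetical flood wave
avg_D, r_D = max_avg_and_peak_position(hydrograph, D=6)
eps_sample = avg_D / max(hydrograph)  # one sample of the reduction ratio
```

Repeating this over the *N* annual maximum events and all durations 0–96 h produces the samples from which the empirical reduction factor and peak position are derived.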

Although, in the general case, the reduction ratio *ε<sup>D</sup>* is a function of duration *D* and return period *T*, in many practical cases it can be assumed independent of the latter. This is strictly true only if, neglecting the influence of the statistical moments higher than the second, the coefficient of variation of *Q<sup>D</sup>* and the probability distribution type of *Q<sup>D</sup>* can be considered independent of *D*. Under these assumptions, *ε<sup>D</sup>* becomes independent of *T* and reduces to the ratio of the averages of *Q<sup>D</sup>* and *Q*0:

$$
\varepsilon\_D = \frac{\mu(\overline{Q}\_D)}{\mu(Q\_0)}\tag{5}
$$

The expression of the reduction curve through a continuous and differentiable function of *D*, depending on the fewest possible parameters, can be advantageous. Based on the crossing properties of a given threshold value by continuous Gaussian stationary stochastic processes, [52] derived the following theoretical formulation:

$$\varepsilon\_{D} = \sqrt{\frac{\theta}{2D} \left[ 2 + e^{-\frac{4D}{\theta}} - \frac{3\theta}{4D} \left( 1 - e^{-\frac{4D}{\theta}} \right) \right]} \tag{6}$$

Besides the advantages coming from a theoretical basis and from the presence of a unique time parameter *θ*, Equation (6) is particularly suitable to fit the empirical reduction ratios of large catchments. Moreover, [53] showed that *θ* can be strictly correlated to the time of concentration of the catchment. 

For medium-size catchments (area between 100 and 1000 km<sup>2</sup>) [54], like the one considered here, Equation (6) does not accurately fit the empirical reduction factor (Figure 5). It was, therefore, decided to adopt the following generalized form:

$$
\varepsilon\_D = \left[\frac{\theta}{2D} \left[2 + e^{-\frac{4D}{\theta}} - \frac{3\theta}{4D} \left(1 - e^{-\frac{4D}{\theta}}\right)\right] \right]^\beta \tag{7}
$$


The higher flexibility given by the two parameters (*θ* and *β*) allows a better fit of the sample values (here, *β* = 0.7 and *θ* = 7.95 h) (Figure 5). It is worth noting that other commonly adopted two-parameter functions, such as the one suggested by [16]:

$$
\varepsilon\_D = \left(1 + \gamma D\right)^{-\delta} \tag{8}
$$

where *γ* and *δ* are positive parameters, do not fit the empirical data as well in this case (Figure 5).
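For illustration, the generalized reduction curve of Equation (7) with the fitted parameters can be evaluated as in the sketch below; setting *β* = 0.5 would recover the theoretical square-root form of Equation (6):

```python
import math

# Sketch of the generalized reduction curve of Eq. (7) with the fitted
# parameters reported in the text (theta = 7.95 h, beta = 0.7); beta = 0.5
# recovers the theoretical square-root form of Eq. (6).

def reduction_factor(D, theta=7.95, beta=0.7):
    if D == 0:
        return 1.0  # by definition the reduction ratio is 1 at zero duration
    e = math.exp(-4.0 * D / theta)
    core = theta / (2.0 * D) * (2.0 + e - 3.0 * theta / (4.0 * D) * (1.0 - e))
    return core ** beta

eps = [reduction_factor(D) for D in (0, 6, 24, 96)]  # durations in hours
```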

**Figure 5.** Reduction factor at Saliceto gauging station.

An interesting result can be obtained applying the same procedure to the water stage time series instead of to the discharge time series. If the stage–discharge relationship can be expressed by means of a monomial function of the type adopted in Section 3.1, a very good agreement between the reduction curve determined from the discharges and the one determined on the basis of the flood stages is achieved if the following transformation is adopted:

$$
\varepsilon\_{D|Q} \cong \varepsilon\_{D|h}{}^{\lambda b} \tag{9}
$$

where *b* is the exponent of the stage–discharge relationship expressed in monomial form and *λ* is a corrective coefficient close to 1. For the Chiavenna at Saliceto, Equation (9) is satisfied very well when *λ* is equal to 0.91 (Figure 6).
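The transformation of Equation (9) can be sketched as follows; the stage reduction factors in the example are hypothetical, while *b* = 2.267 and *λ* = 0.91 are the values quoted for Saliceto:

```python
# Sketch of the transformation of Eq. (9): the discharge reduction factor
# is approximated by the stage reduction factor raised to lambda * b, with
# b the rating-curve exponent (2.267 at Saliceto) and lambda = 0.91.
# The stage reduction factors below are hypothetical.

def discharge_reduction_from_stage(eps_h, b=2.267, lam=0.91):
    return eps_h ** (lam * b)

eps_h = [1.0, 0.95, 0.80, 0.60]  # hypothetical stage reduction factors
eps_q = [discharge_reduction_from_stage(e) for e in eps_h]
```

Since *λb* > 1 and the stage reduction factors do not exceed 1, the discharge reduction curve always lies below the stage one, consistent with Figure 6.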


This result, even if it must be confirmed by further studies, allowed the analysis to be extended to the station on the Riglio stream as well, for which a reliable stage–discharge relationship was not available.

Given the almost total lack of useful observations for the Chero stream, and considering the similarity among the watercourses under investigation, it was decided to adopt for the Chero stream the same flow reduction factor and peak position ratio relations (i.e., the same waveform) obtained for the Saliceto gauging station.

With regard to the expression of the peak position ratio introduced above, a continuous and differentiable function of *D* can be useful. *N<sup>D</sup>* time series of observed peak position ratios are available for the Saliceto gauging station, one for each duration considered. For the purpose of the evaluation of the SDH, the average value of each series was calculated and the *N<sup>D</sup>* values thus achieved were interpolated with the expression:

$$r\_D(D) = a\_1 + \frac{a\_2}{a\_3 + (D)^{a\_4}} \tag{10}$$


imposing *rD*(0) = 0.5 (Figure 7).
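A sketch of the interpolant of Equation (10) is given below, with the constraint *rD*(0) = 0.5 imposed analytically; the parameter values are hypothetical, not the fitted ones:

```python
# Sketch of the peak position ratio interpolant of Eq. (10). The constraint
# r_D(0) = 0.5 is imposed analytically by setting a1 = 0.5 - a2/a3; the
# parameter values below are hypothetical, not the fitted ones.

def make_peak_position(a2, a3, a4):
    a1 = 0.5 - a2 / a3  # enforces r_D(0) = a1 + a2/a3 = 0.5
    def r_D(D):
        return a1 + a2 / (a3 + D ** a4)
    return r_D

r_D = make_peak_position(a2=3.0, a3=10.0, a4=1.2)
values = [r_D(D) for D in (0, 12, 48, 96)]  # decreasing from 0.5 toward a1
```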

**Figure 6.** Reduction factor expressed through the stage reduction factor at Saliceto.

**Figure 7.** Observed and interpolated peak position ratio.

### 3.3.2. Evaluation of the SDHs

Following [55], the evaluation of the synthetic design hydrographs (SDHs) is carried out by imposing that the maximum average discharge in each duration coincides with the one predicted by the reduction curve. The shape of the hydrographs is then determined by the peak position ratio behavior.

The synthetic hydrograph is, therefore, defined by the conditions:

$$\int\_{-r\_D D}^{0} Q(\tau; T) d\tau = r\_D \overline{Q}\_D(T) D; \quad \int\_{0}^{(1-r\_D)D} Q(\tau; T) d\tau = (1-r\_D) \overline{Q}\_D(T) D \tag{11}$$

The evaluation of the two branches of the hydrograph *Q*(*t*; *T*), before and after the peak, is obtained by differentiating Equation (11) with respect to the duration *D* [55], once the expression for the quantiles of the average maximum flow rate as a function of the reduction factor is introduced:

$$\overline{Q}\_D(T) = \varepsilon\_D Q\_0(T) \tag{12}$$

Once the index flood, the growth factor, the reduction factor, and the peak position are evaluated, it is possible to identify the set of SDHs for each section of interest and for the chosen return periods. As an example, Figure 8 shows the SDHs obtained through the adoption of the described procedure for the gauging station of Saliceto.
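A simplified numerical sketch of this construction is given below. For brevity it assumes a constant peak position ratio (under which both limbs carry the same discharge value, reached at different times) and reuses the reduction curve of Equation (7); it illustrates the principle rather than reproducing the authors' procedure, and the peak discharge is hypothetical:

```python
import math

# Simplified numerical sketch of the SDH construction: each limb discharge
# equals Q0 * d(eps_D * D)/dD, obtained by differentiating Eq. (11) with a
# constant peak position ratio r (an assumption made here for brevity).
# The reduction curve of Eq. (7) is reused; the peak value is hypothetical.

def reduction_factor(D, theta=7.95, beta=0.7):
    if D == 0:
        return 1.0
    e = math.exp(-4.0 * D / theta)
    core = theta / (2.0 * D) * (2.0 + e - 3.0 * theta / (4.0 * D) * (1.0 - e))
    return core ** beta

def sdh(q_peak, r=0.35, D_max=96, dt=1.0):
    """Return (times, discharges): times < 0 on the rising limb, > 0 after the peak."""
    times, flows = [], []
    D = 0.0
    while D < D_max:
        v1 = reduction_factor(D) * D            # eps_D * D at duration D
        v2 = reduction_factor(D + dt) * (D + dt)
        q = q_peak * (v2 - v1) / dt             # finite-difference derivative
        times += [-r * (D + dt), (1.0 - r) * (D + dt)]
        flows += [q, q]                         # same value on both limbs
        D += dt
    return times, flows

t, q = sdh(q_peak=300.0)  # hypothetical 300 m^3/s peak discharge
```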

**Figure 8.** Design hydrographs obtained for the Chiavenna stream at Saliceto gauging station.

### 3.3.3. Water Stage Record Behavior

Riglio, Chero, and Chiavenna streams are usually subject to simultaneous floods. Negligible time lags between the peaks of the flood hydrographs, in the order of one to three hours, were, in fact, inferred by observing the historical water stage records (Figure 9). In the setting up of flooding scenarios, a synchronous behavior of the hydrological inputs will, therefore, be assumed.

**Figure 9.** Water stages at the gauging stations of interest for two semesters in (**a**) 2009 and (**b**) 2016.

### **4. Hydraulic Analysis**

Hydraulic analyses focused on the hydrographic network, including the main reach of the Chiavenna stream and of its Chero and Riglio tributaries (Figure 10). The Riglio stream flows in a rather incised riverbed and is provided with discontinuous embankments of modest height (1–2 m). The Chero stream has almost negligible embankments (<1 m) and runs through an alluvial plain characterized by a rather modest incision, unlike the Chiavenna, which upstream of Roveleto flows in a rather deep valley. The levee system of the Chiavenna stream originates at Roveleto and develops for about 10 km, with increasing height, up to the confluence with the main Po River. All watercourses are crossed by several bridges.

**Figure 10.** Modeled river network.

### *4.1. Numerical Model*

### 4.1.1. Hydraulic Model

The 2D hydraulic simulations were performed using the PARFLOOD code [37,39], developed at the University of Parma. This numerical model is based on an explicit finite volume discretization of the fully 2D shallow water equations (SWEs) [56], expressed using the well-balanced formulation proposed by [57]. The model is second-order accurate in both time and space, thanks to the adoption of the second-order Runge–Kutta method and of a depth-positive MUSCL extrapolation, while fluxes are evaluated using the HLLC approximate Riemann solver [56]. The model is compatible with both Cartesian grids and structured nonuniform grids, named Block-Uniform Quadtree (BUQ), as detailed in [37]. The adoption of an unevenly distributed spatial resolution is useful for reducing the number of computational cells in the domain and, consequently, the computational burden, while ensuring high accuracy in the areas of greatest interest (e.g., rivers, channels, buildings, hydraulic structures, embankments). Moreover, the code is developed in the compute unified device architecture (CUDA) environment, which enables parallel computing on graphics processing units (GPUs), leading to a drastic reduction in runtimes (of about two orders of magnitude) compared to serial codes, even for domains of several million cells [39]. The good performances of the PARFLOOD model in both simulations of theoretical cases and practical applications over complex bathymetries are well documented in previous works [11,34,58–62], to which the reader is referred for further details.

### 4.1.2. Bridge Modeling

Among the approaches available for the inclusion of bridges and other hydraulic structures in the 2D numerical models, internal boundary conditions (IBC) were favored in PARFLOOD thanks not only to their applicability to different flow conditions and structure types, but also to their suitability to predict the field-scale backwater effects induced by the fluid/structure interaction in case of high flows.

The IBC implementation is described in detail by [44] and, here, only very briefly recalled. In summary, each bridge is represented as a segment in the plane, which separates "upstream" and "downstream" IBC cells. The average water levels upstream and downstream of the bridge are computed over these cells, and then used to identify the current flow condition (free-surface flow, partially or fully pressurized flow, overtopping, Figure 11). If levels are below the bridge deck, the IBCs are not activated and only the bridge pier obstructions are considered. Otherwise, the discharge flowing through the structure is evaluated using available formulae from the literature [63] and imposed as specific discharge values at upstream IBC cells (redistributed according to the area available to flow). The same values are imposed in the corresponding downstream IBC cells, while water surface elevations are not modified, so that mass conservation is not impaired. Finally, the special flux treatment adopted at the edges between upstream and downstream IBC cells allows the bridge discontinuity to be naturally taken into account. An example of the application of IBCs to bridges in urban flood modeling is reported in [61].
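The flow-condition classification underlying this logic can be rendered schematically as follows; this is an illustrative sketch, not the PARFLOOD implementation, and all names and thresholds are hypothetical:

```python
# Schematic sketch (not the PARFLOOD implementation) of the flow-condition
# classification behind the internal boundary conditions: average water
# levels upstream/downstream of the bridge are compared with the soffit
# (low chord) and deck-top elevations. All names here are illustrative.

def bridge_flow_condition(z_up, z_down, z_low_chord, z_deck_top):
    """All elevations in m a.s.l.; returns the flow regime at the bridge."""
    if z_up < z_low_chord and z_down < z_low_chord:
        return "free-surface"        # IBCs not activated; only pier obstruction
    if z_up > z_deck_top:
        return "overtopping"         # additional weir-like flow over the deck
    if z_up >= z_low_chord and z_down >= z_low_chord:
        return "fully pressurized"   # orifice-type flow through the opening
    return "partially pressurized"   # only the upstream side reaches the deck
```

Once the regime is identified, the corresponding literature formula [63] provides the discharge, which is then redistributed over the upstream and downstream IBC cells as described above.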

**Figure 11.** Sketch of a bridge and flow conditions.

### *4.2. Model Setup*

4.2.1. DTM and Building Treatment

The study area covers over 180 km<sup>2</sup>. A recent LiDAR survey provided a digital terrain model (DTM) with a resolution of 1 m × 1 m, which was converted to a raster map with a grid size of 2 m × 2 m, considered adequate for this test case. In the grid coarsening process, particular attention was paid to preserving the elevation of retaining structures along the streams and other thin linear topographic features in the domain. A further preprocessing of the DTM was necessary to restore the embankment crest elevations that were not correctly described due to the removal of the bank vegetation cover from the raw LiDAR data, and to integrate in the DTM the bathymetric portion of the riverbed not correctly detected due to the presence of water at the time of the survey.

The presence of buildings in urban areas was included by adopting the building hole (BH) approach [64]. This strategy requires that the computational cells falling within the footprints of buildings are removed from the mesh, which may be achieved by superimposing the shapefile containing the outlines of buildings on the DTM. Such a detailed building treatment is possible thanks to the high mesh resolution. Moreover, a total number of 21 bridges (Figure 12) were introduced in the hydraulic model.

**Figure 12.** DTM of the hydraulic model and position of boundary conditions and modeled bridges.

### 4.2.2. Mesh

Starting from the DTM at 2 m × 2 m, a multiresolution grid with cells of variable size, from a minimum value of 2 m to the maximum of 2<sup>4</sup> = 16 m, was built. In order to model the flood propagation accurately, all the main waterways, urban areas, and road and rail communication routes were described with the highest resolution (Figure 13). Elsewhere, the spatial resolution is automatically relaxed by the preprocessing algorithm, as described in [37]. The calculation grid thus identified is made up of a total of 6.8 million cells.

**Figure 13.** Multiresolution grid and main infrastructures present in the modeled domain; 2 m × 2 m cells in black, 4 m × 4 m cells in blue, 8 m × 8 m cells in green, and 16 m × 16 m cells in red.

### 4.2.3. Domain Roughness

For the river beds, in the absence of data for calibration, a value of Strickler's roughness coefficient of 25 m<sup>1/3</sup> s<sup>−1</sup> was assumed based on local inspections, literature suggestions [65], and previous studies conducted by the authors on neighboring watersheds with similar characteristics [62]. Moreover, this value allowed the numerical rating curve previously obtained at Saliceto through 1D hydraulic modeling to be reproduced at best.

For the flooded areas, data for the calibration of roughness are usually unavailable. In literature applications, values for rural areas are highly variable, covering a range between 10 and 40 m<sup>1/3</sup> s<sup>−1</sup> [66–69]. For the low plain areas of Emilia-Romagna, indications can be obtained from the work by [34], where a Strickler's coefficient equal to 20 m<sup>1/3</sup> s<sup>−1</sup> was calibrated to reproduce the well-known dynamics of the inundation caused by an embankment breach. Preliminary simulations of the present study indicated no significant differences in maximum water depths when values of 20 or 25 m<sup>1/3</sup> s<sup>−1</sup> were adopted outside the riverbeds, while flooding propagation times were slightly more influenced. While the time of arrival of the flooding is important in the emergency management phase during a real event, it is irrelevant for the identification of potentially floodable areas. In the absence of further information, it was, therefore, decided to adopt a uniform Strickler's coefficient equal to 25 m<sup>1/3</sup> s<sup>−1</sup> over the whole domain.

### 4.2.4. Boundary and Initial Conditions

The SDHs obtained through the hydrological analysis were imposed as upstream boundary conditions at the sections indicated in Figure 12. At the confluence of the Chiavenna with the Po River, a constant water level was imposed as a downstream boundary condition, due to the presence of a power plant on the Po River just two kilometers downstream of the confluence. The water level upstream of the barrage of the power plant is constantly kept at 41.00 m a.s.l. [70], except for very high floods on the Po River, which never occur simultaneously with the floods of the Chiavenna stream, due to the enormous difference in size of the two river basins.

A steady flow of 10, 15, and 20 m<sup>3</sup> s<sup>−1</sup> for the Chero, Chiavenna, and Riglio streams, respectively, was assumed as the initial condition.

### **5. Results**

Four flooding scenarios were considered, adopting as inflow boundary conditions the synthetic design hydrographs estimated for *T* = 20, 50, 200, and 500 years. For the sake of space, only the main results of the study related to the *T* = 200 years reference scenario (the current flood protection standard in Italy) are presented here and discussed in detail. For this reference scenario, an additional corresponding simulation in the absence of all bridges was performed, in order to evaluate the obstruction effect exerted by these structures. The comparison between the numerical results in the presence and absence of bridges allows an immediate assessment of their influence on the flooding dynamics and on the extent of the flooding upstream and downstream of the urban area of Roveleto.

For the *T* = 200 years scenario in the current state, the extension of the flooded areas is remarkable (Figure 14a). Flooding due to the insufficient conveyance of the Chero extends downstream, reaching Roveleto and partly reflowing into the Chiavenna stream (A in Figure 14a). The flooding involves almost the entire town of Roveleto, where pressurized flow is observed at all the bridges, some of which are also overtopped, albeit to a modest extent. The flooded waters also lean against the railway embankment, which is largely overtopped. Proceeding northwards, the flooding extends further downstream, reaching the motorway and the high-speed railway (B in Figure 14a), with depths higher than 2 m. Both infrastructures are overtopped, and the flooding spreads in the northeast direction towards San Pietro in Cerro, where it is retained by other motorway embankments. Additional inundations due to outflows from the Riglio stream are also observed, and the water depths are particularly high in the rural area bounded by the two streams, upstream of Caorso. Overall, the scenario is extremely severe in terms of the flooded urban and suburban areas and of the crucial transport infrastructures involved.

The same hydrological scenario has been simulated in the hypothetical situation of the absence of all bridges. In this case, a minor extension of the flooding and lower water depths occurs in the areas upstream of the removed crossing structures. At the same time, larger flooding areas are estimated downstream due to the higher discharge released, which exceeds, to a greater extent, the conveyance of the downstream river branches (Figure 14b).

The differences in the flooding extent are even more evident if we focus on the urban area. Figure 15a shows the detail inside Roveleto in the current state, showing that urban areas are totally flooded due to the backwater effect induced by the presence of crossing structures with insufficient conveyance. With reference to the northernmost portion of the inhabited area between the two streams, close to the confluence, the water depths on the ground are still quite low (less than 0.6 m, waterways and underpasses excluded). However, it should be noted that the presence of even a few tens of centimeters on the ground necessarily implies the total flooding of the underground floors, a circumstance that is not immediately evident due to the lack of description of these elements in the DTM.

**Figure 14.** Maximum water depths for *T* = 200 years: (**a**) current condition; (**b**) hypothetical scenario of absence of bridges.

**Figure 15.** Maximum water depths at Roveleto for *T* = 200 years: (**a**) current condition; (**b**) hypothetical scenario of absence of bridges.

Even in the hypothetical scenario of absence of all the bridges, the majority of the urban area of Roveleto is flooded, but with almost halved water depths compared to the current situation. The reduction in the maximum levels, of the order of several tens of centimeters, would result in a significant reduction in the areas involved, preserving more than half of the urban area from flooding. From Figure 15b, it can be appreciated that the portion of Roveleto bounded by the Chero and the Chiavenna immediately upstream of their confluence would be almost entirely unaffected by the flooding. It follows that, even if the railway is still overtopped, the flooding spreading towards the northeast involves a smaller area. This is due to the lower level of the flooding originating upstream of the highway as a result of the free outflow at the section of the bridge (here removed). The overtopping of the embankments of the highway and of the high-speed railway would be, in this case, minimal. In the absence of bridges, the decrease in maximum water levels ranges from a few tens of centimeters up to one meter or more, with a substantial benefit in terms of the population and assets involved. However, the solution of rebuilding the majority of the bridges, raising their deck level and removing abutments (a situation similar to this hypothetical scenario), does not appear to be easily achievable due to the elevation constraints imposed by the existing main infrastructures, such as roads, motorways, and railways. Moreover, the removal of the crossing structures, owing to the consequent absence of the obstruction effect, allows greater discharges to propagate downstream, giving rise to an increase in water levels in the lower stretch of the stream compared to the current situation, possibly inducing additional outflows. For this reason, the rebuilding of the bridges, besides being difficult to implement, does not even seem appropriate to secure the entire territory from the risk of flooding.

### *Bridges' Conveyance*

Figure 16 shows a few examples of the maximum water levels upstream of six of the more than twenty bridges present in the computational domain (see Figures 12 and 14 for their location).

**Figure 16.** Maximum water levels for some of the crossing structures of interest.

It is worth noting that each profile does not refer to a particular instant, but represents the envelope of all the results obtained during each transient simulation, possibly including flooding outside of the river (see bridge C4 for the 200- and 500-year scenarios). For the most upstream bridges, incipient (C1) or weak (C3) pressurization is already estimated for the *T* = 20 years scenario. The conditions become slightly less severe further downstream (C4) due to the reduced discharge caused by the upstream flooding. For *T* = 50 years, complete pressurization conditions are estimated at the bridges inside Roveleto (C1, C3, and C4). Free outflow instead occurs from the railway crossing onwards (C5, C7), with few exceptions (C6). For the scenario with return period *T* = 200 years, pressurization and incipient overflow conditions occur at all bridges inside Roveleto. Again, free outflow is observed along the Chiavenna from the railway onwards, with the exception of the motorway bridge, which is slightly overtopped on the left side (C6). With reference to the catastrophic scenario for *T* = 500 years, a general worsening of the outflow conditions inside Roveleto is observed. Free outflow is still observed along the Chiavenna at the railway crossing, while the outflow conditions at the bridges further downstream worsen appreciably.

### **6. Discussion and Conclusions**

The case study analyzed here can be considered a reference example of the many challenges faced in hydrologic/hydraulic flood hazard modeling when dealing with poorly gauged watersheds. In this work, some improvements in already consolidated literature methodologies are introduced as regards the hydrologic analysis, while a recently proposed approach for bridge inclusion in fully 2D hydrodynamic models solving the complete SWEs is applied for the first time, with the purpose of assessing the adequacy of bridge conveyance in a complex urban environment.

As regards the hydrologic analysis, due to the scarcity of reliable field observations, the flooding scenarios simulated for different return periods were conducted with reference to synthetic design hydrographs, the peak discharges of which were estimated through a regional approach. Of course, this procedure can lead to results affected by an uncertainty that is difficult to quantify. Nevertheless, the choice to rely on estimates based on regional procedures seems appropriate and promising. Recent studies have, in fact, shown that, in this region, the methodologies adopted for the estimation of the discharge quantiles are characterized by a good reliability [47]. It will be very important in the future to deepen the analyses at a regional scale by expanding the database of direct observations of the reference hydrological variables as much as possible. Some difficulties arising in the fitting of the empirical reduction ratios through the well-known Equation (6) suggested adopting a generalized form of the same relation, in which the presence of an additional parameter allowed a better fit to the available observed values. Moreover, the possibility of deriving the reduction factor on the basis of the analysis of water stages (therefore, not relying on the stage–discharge conversion) could dramatically increase the set of observations available for regional analyses. This would make it possible to exploit historical information available at gauging stations devoid of reliable rating curves, but sometimes rich in several decades of reliable hydrometric water stage records instead. It would also enable large-scale analyses that cannot be undertaken at present, since, in the vast majority of watersheds, only a small percentage of gauging stations is equipped with a reliable stage–discharge relationship. Both of these aspects seem to be worthy of further investigation.

As regards hydrodynamic modeling, all flooding scenarios are simulated here adopting a GPU-parallelized model based on an explicit shock-capturing finite volume method for the solution of the fully 2D shallow water equations. The river and the flood-prone areas belong to a unique computational domain, and the flow is modeled avoiding the special treatments required by other simplified numerical schemes (quasi-2D, 1D–2D, etc. [71]). Moreover, the hydraulic modeling of the many bridges present in the flow field is here conducted following a truly 2D approach. This prevents the significant flow field distortions that can occur when contiguous crossing structures are modeled with overly simplified approaches. Thanks to an efficient parallelization, the issues related to long computational times are overcome, even in the case of high-resolution grids and even when a very large number of crossing structures are introduced. This is of great importance, since preserving computational efficiency without affecting the accuracy of the description of the interfering elements is a fundamental objective in laying the foundations for increasingly advanced ultra-fast modeling that will allow detailed real-time simulation in the near future.

The strategy of comparing flooding scenarios with and without bridges is used here to investigate possible structural measures to reduce the flood hazard in the study area, e.g., the rebuilding of bridges with increased deck elevation and in the absence of abutments. Indeed, it is evident that most of the bridges are inadequate to convey floods with large return periods. In the absence of bridges, the reduced backwater upstream of the crossing structures would result in a significant decrease in the inundated areas, preserving from flooding at least half of the inhabited center of Roveleto. However, this solution does not appear to be easily practicable due to the elevation constraints imposed by the existing infrastructures, roads, and railways. Furthermore, this intervention would not be sufficient to prevent the urban inundation completely, and does not even appear entirely desirable, given the proven worsening of the downstream conditions due to the greater discharge released. Therefore, further structural interventions have to be designed, and the high-resolution 2D model set up for this study can be an extremely powerful tool for assessing the effectiveness of any adaptation measure to reduce the flood exposure of Roveleto without increasing the flood hazard in downstream territories. Examples may include local adjustments of the embankments' elevations or the identification of temporarily floodable areas upstream of the urban center.

In summary, the flood hazard assessment conducted here with reference to a poorly instrumented watershed has highlighted how an integrated approach combining hydrological and hydraulic methodologies can lead to results useful for infrastructure design and civil protection purposes. The methodology proposes a series of good practices that can also be applied in circumstances in which the essential assessment of the flood hazard in highly urbanized areas may, at first glance, appear strongly discouraged by the scarcity of reliable local hydrological information.

**Author Contributions:** Conceptualization and methodology, F.A., P.M. and F.P.; software, F.P., F.A. and S.D.; validation, F.P. and F.A.; investigation, F.A. and F.P.; writing—original draft preparation, F.A.; writing—review and editing, F.A., P.M., F.P. and S.D.; visualization, F.A. and F.P.; supervision, F.A., S.D. and P.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the first author. The data are not publicly available due to restrictions from the raw data providers.

**Acknowledgments:** The authors would like to thank Regional Agency for Territorial Security and Civil Protection of Emilia-Romagna, Italy, for supporting this study. This research benefits from the HPC (High-Performance Computing) facility of the University of Parma.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Flood Early Warning Systems Using Machine Learning Techniques: The Case of the Tomebamba Catchment at the Southern Andes of Ecuador**

**Paul Muñoz 1,2,\* , Johanna Orellana-Alvear 1,2 , Jörg Bendix 3 , Jan Feyen <sup>4</sup> and Rolando Célleri 1,2**


**Abstract:** Worldwide, machine learning (ML) is increasingly being used for developing flood early warning systems (FEWSs). However, previous studies have not focused on establishing a methodology for determining the most efficient ML technique. We assessed FEWSs with three river states, *No-alert*, *Pre-alert*, and *Alert* for flooding, for lead times from 1 to 12 h, using the most common ML techniques: multi-layer perceptron (MLP), logistic regression (LR), K-nearest neighbors (KNN), naive Bayes (NB), and random forest (RF). The Tomebamba catchment in the tropical Andes of Ecuador was selected as a case study. For all lead times, MLP models achieve the highest performance, followed by LR, with *f*1-macro (*log-loss*) scores of 0.82 (0.09) and 0.46 (0.20) for the 1 h and 12 h cases, respectively. The ranking was highly variable for the remaining ML techniques. According to the g-mean, LR models forecast correctly and show more stability across all states, while the MLP models perform better in the *Pre-alert* and *Alert* states. The proposed methodology for selecting the optimal ML technique for a FEWS can be extrapolated to other case studies. Future efforts are recommended to enhance the input data representation and to develop communication applications that raise public awareness of floods.

**Keywords:** flood early warning; forecasting; hydrological extremes; machine learning; Andes

### **1. Introduction**

Flooding is the most common natural hazard and results worldwide in the most damaging disasters [1–4]. Recent studies associate the increasing frequency and severity of flood events with a change in land use (e.g., deforestation and urbanization) and climate [2,5–7]. This particularly holds for the tropical Andes region, where complex hydro-meteorological conditions result in the occurrence of intense and patchy rainfall events [8–10].

According to the flood generation mechanism, floods can be classified into long- and short-rain floods [11,12]. A key to building resilience to short-rain floods is to anticipate the event in a timely manner, in order to gain time for better preparedness. The response time between a rainfall event and its associated flood depends on the catchment properties and might vary from minutes to hours [13]. In this study, special attention is given to flash floods, which are floods that develop less than 6 h after a heavy rainfall, with little or no forecast lead time [14].

Flood anticipation can be achieved through the development of a flood early warning system (FEWS). FEWSs have proved to be cost-efficient solutions for life preservation, damage mitigation, and resilience enhancement [15–18]. However, although crucial, flood forecasting remains a major challenge in mountainous regions due to the difficulty to

**Citation:** Muñoz, P.; Orellana-Alvear, J.; Bendix, J.; Feyen, J.; Célleri, R. Flood Early Warning Systems Using Machine Learning Techniques: The Case of the Tomebamba Catchment at the Southern Andes of Ecuador. *Hydrology* **2021**, *8*, 183. https://doi.org/10.3390/hydrology8040183

Academic Editors: Marina Iosub and Andrei Enea

Received: 25 November 2021 Accepted: 13 December 2021 Published: 16 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

effectively record the areal distribution of precipitation, due to the sparse density of the monitoring network and the absence of high-tech equipment caused by budget constraints [8,9].

To date, there has been no report of any operational FEWS in the Andean region for scales other than continental [17,19,20]. An alternative attempt in Peru aimed to derive daily maps of potential floods based on the spatial cumulated precipitation in past days [21]. Other endeavors in Ecuador and Bolivia focused on the monitoring of the runoff in the upper parts of the catchment to predict the likelihood of flood events in the downstream basin area [19,22]. However, such attempts are unsatisfactory as countermeasures against floods and especially flash-floods, where it is required to have reliable and accurate forecasts with lead times shorter than the response time between the farthest precipitation station and runoff control point.

There are two paradigms that drive the modeling of the precipitation–runoff response. First, the physically based paradigm incorporates knowledge of the physical processes by using physical process equations [23]. This approach requires extensive ground data and, in consequence, intensive computation that limits the temporal forecast window [24]. Moreover, it is argued that physically based models are inappropriate for real-time or short-term flood forecasting, due to the inherent uncertainty of river–catchment dynamics and the over-parametrization of this type of model [25]. The second, data-driven, paradigm treats floods as stochastic processes with an occurrence probability distribution derived from historical data. Here, the idea is to exploit relevant input information (e.g., precipitation, past runoff) to find relations to the target variable (i.e., runoff) without requiring knowledge of the underlying physical processes. Among the traditional data-driven approaches, statistical modeling has proven to be unsuitable for short-term prediction due to a lack of accuracy, complexity, limited model robustness, and even computational costs [24]. Previous studies encouraged the use of advanced data-driven models, e.g., machine learning (ML), to overcome the aforementioned shortcomings [7,24,26,27]. Particularly during the last decade, ML approaches have gained increasing popularity among hydrologists [24].

Different ML strategies for flood forecasting have been implemented, generating either quantitative or qualitative runoff forecasts [18,28–38]. Qualitative forecasting consists of classifying floods into distinct categories or river states according to their severity (i.e., runoff magnitude), and using this as a basis for flood class prediction [30,37,39]. The advantage of developing a FEWS is the possibility of generating a semaphore-like warning system that is easy to understand by decision-makers and the public (non-hydrologists). The challenge of FEWSs is the selection of the optimal ML technique to obtain reliable and accurate forecasts with sufficient lead time for decision making. To date, the problem has received scant attention in the research literature, and, to the best of our knowledge, no previous work has examined and compared the potential and efficacy of different ML techniques for flood forecasting.

The present study compares the performance of five ML classification techniques for short-rain flood forecasting with special attention to flash floods. ML models were developed for a medium-size mountain catchment, the Tomebamba basin located in the tropical Andes of Ecuador. The ML models were tested with respect to their capacity to forecast three flood warning stages *(No-alert*, *Pre-alert* and *Alert*) for varying forecast lead times of 1, 4, and 6 h (flash-floods), but also 8 and 12 h to further test whether the lead time can be satisfactorily extended without losing the models' operational value.

This paper is organized into four further sections. The first establishes the methodological framework for developing a FEWS using ML techniques and describes the performance metrics used for a proper efficiency assessment. The second presents the findings of the research, following the same structure as the methodological section. Finally, the third and fourth sections present the discussion and a summary of the main conclusions of the study, respectively.

### **2. Materials and Methods**

### *2.1. Study Area and Dataset*

The study area comprises the Tomebamba catchment delineated upstream of the Matadero-Sayausí hydrological station on the Tomebamba river (Figure 1), where the river enters the city. The Tomebamba is a tropical mountain catchment located on the southeastern flank of the Western Andean Cordillera, draining to the Amazon River. The drainage area of the catchment is approximately 300 km<sup>2</sup>, spanning from 2800 to 4100 m above sea level (m a.s.l.). Like many other mountain catchments of the region, it is primarily covered by a páramo ecosystem, which is known for its important water regulation function [8].

**Figure 1.** The Tomebamba catchment located at the Tropical Andean Cordillera of Ecuador, South America (UTM coordinates).

The Tomebamba river plays a crucial role as a drinking water source for the city of Cuenca (between 25% and 30% of the demand). Other important water users are agricultural and industrial activities. Cuenca, the third-largest city of Ecuador (around 0.6 million inhabitants), is crossed by four rivers that annually flood parts of the city, causing human losses and significant economic damage.

The local water utility, the Municipal Public Company of Telecommunications, Water, Sewerage and Sanitation of Cuenca (ETAPA-EP), defined three flood alert levels associated with the Matadero-Sayausí station for floods originating in the Tomebamba catchment: (i) *No-alert* of flood occurs when the measured runoff is less than 30 m<sup>3</sup>/s, (ii) *Pre-alert* when runoff is between 30 and 50 m<sup>3</sup>/s, and (iii) the flood *Alert* is triggered when discharge exceeds 50 m<sup>3</sup>/s. With these definitions, and as shown in Figure 2, the discharge label for the *No-alert* class represents the majority of the data, whereas the *Pre-alert* and *Alert* classes comprise the minority yet the most dangerous classes.
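The ETAPA-EP alert levels amount to a simple thresholding rule on discharge. A minimal sketch (the handling of values exactly at 30 and 50 m<sup>3</sup>/s is an assumption, and the discharge series is invented):

```python
# Sketch: labelling hourly discharge records with the three ETAPA-EP alert
# classes used in the study (thresholds 30 and 50 m^3/s). The treatment of
# the exact boundary values is an assumption; the data are made up.

def alert_class(discharge_m3s):
    """Map a discharge value (m^3/s) to its flood warning class."""
    if discharge_m3s < 30.0:
        return "No-alert"
    elif discharge_m3s <= 50.0:
        return "Pre-alert"
    return "Alert"

series = [12.4, 31.0, 55.2, 48.9, 8.7]
labels = [alert_class(q) for q in series]
print(labels)  # ['No-alert', 'Pre-alert', 'Alert', 'Pre-alert', 'No-alert']
```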

To develop and operate forecasting models, we use data of two variables: precipitation in the catchment area and river discharge at a river gauge. For both variables, the available dataset comprises 4 years of concurrent hourly time series, from Jan/2015 to Jan/2019 (Figure 2). Precipitation information was derived from three tipping-bucket rain gauges, Toreadora (3955 m a.s.l.), Virgen (3626 m a.s.l.), and Chirimachay (3298 m a.s.l.), installed within the catchment and along its altitudinal gradient. For discharge, we used data from the Matadero-Sayausí station (2693 m a.s.l., Figure 1). To develop the ML models, we split the dataset into training and test subsets. The training period ran from 2015 to 2017, whereas 2018 was used as the model testing phase.

**Figure 2.** Time series of precipitation (Toreadora) and discharge (Matadero-Sayausí). Horizontal dashed lines indicate the mean runoff and the currently employed flood alert levels for labeling the *Pre-alert* and *Alert* flood warnings classes.

### *2.2. Machine Learning (ML) Methods for Classification of Flood Alert Levels*

ML classification algorithms can be grouped in terms of their functionality. According to Mosavi et al. (2018), five of the most popular groups of statistical methods worldwide are commonly used for short-term flood (extreme runoff) prediction, and include:


For this study, we selected five ML algorithms, one from each group: logistic regression, K-nearest neighbors, random forest, naive Bayes, and multi-layer perceptron.

### 2.2.1. Logistic Regression

Logistic Regression (LR) is a discriminative model that learns the decision boundary between classes. First, a linear combination of the model features is computed. The conditional probability of belonging to a class is then obtained with a logistic (sigmoid) function, which effectively deals with outliers (binary classification). From these probabilities, the LR classifies, with regularization, the dependent variable into one of the created classes. For multiclass classification problems, all binary classification possibilities are considered, i.e., *No-alert* vs. *Pre-alert*, *No-alert* vs. *Alert*, and *Pre-alert* vs. *Alert*. Finally, the solution is the class with the maximum probability (multinomial LR), computed using the *softmax* function of Equation (1), which defines the predicted probability of each class [41]. The calculated probability of each class is positive and normalized across all classes.

$$
\sigma(z)_i = \frac{e^{z_i}}{\sum_{l=1}^k e^{z_l}}\tag{1}
$$

where *z<sub>i</sub>* is the *i*th input of the *softmax* function, corresponding to class *i* of the *k* classes.
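Equation (1) can be sketched directly; the class scores below are illustrative, not model outputs:

```python
import math

# Sketch of the softmax function of Equation (1): each class score z_i is
# mapped to a positive value e^{z_i}, normalised so that the k class
# probabilities sum to one. The input scores are made-up examples.

def softmax(z):
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]   # e.g. scores for No-alert, Pre-alert, Alert
probs = softmax(scores)
print([round(p, 3) for p in probs])   # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))           # 1.0
```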

### 2.2.2. K-Nearest Neighbors

K-Nearest Neighbors (KNN) is a non-parametric statistical pattern recognition algorithm for which no theoretical or analytical background exists, but rather an intuitive statistical procedure (memory-based learning) for classification. KNN classifies unseen data based on a similarity measure such as a distance function (e.g., Euclidean, Manhattan, Chebyshev, Hamming, etc.). The use of multiple neighbors instead of only one is recommended to avoid the wrong delineation of class boundaries caused by noisy features. In the end, the majority vote of the nearest neighbors (see the formulation in [41]) determines the classification decision. The number of nearest neighbors can be optimized to reach a global minimum while avoiding long computation times and the influence of class size. The major advantage of KNN is its simplicity. The drawback is that KNN is memory-intensive: all training data must be stored and compared whenever new information is to be evaluated.
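The majority-vote scheme described above can be sketched in a few lines (Euclidean distance; the two-feature training points are toy data, not the study's implementation):

```python
import math
from collections import Counter

# Minimal KNN classifier sketch: rank training points by Euclidean distance
# to the query and take the majority vote of the k nearest labels.
# Feature vectors and labels below are invented toy data.

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (1.0, 1.0), (0.8, 0.9)]
train_y = ["No-alert", "No-alert", "Alert", "Alert", "Alert"]
print(knn_predict(train_X, train_y, (0.85, 0.95), k=3))  # Alert
```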

### 2.2.3. Random Forest

Random Forest (RF) is a supervised ML algorithm that ensembles a multitude of decorrelated decision trees (DTs) voting for the most popular class (classification). In practice, a DT (particular model) is a hierarchical analysis based on a set of conditions consecutively applied to a dataset. To ensure decorrelation, the RF algorithm applies a bagging technique, growing each DT from a different randomly resampled training subset obtained from the original dataset. Each DT provides an independent output (class) for the phenomenon of interest (i.e., runoff), as opposed to the numerical labels of regression applications. The popularity of RF is due to the possibility of performing random subsampling and bootstrapping, which minimizes biased classification [42]. An extended description of the functioning of RF is available in [43,44].

The predicted class probabilities of an input sample are calculated as the mean predicted class probabilities of the trees in the forest. For a single tree, the class probability is computed as the fraction of samples of the same class in a leaf. However, it is well known that the calculated training frequencies are not accurate conditional probability estimates, due to the high bias and variance of the frequencies [45]. This deficiency can be resolved by controlling the minimum number of samples required at a leaf node, with the objective of inducing a smoothing effect and obtaining statistically reliable probability estimates.
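The leaf-fraction averaging described above can be illustrated as follows (the leaf contents are invented; a real RF would obtain them by passing the sample down each fitted tree):

```python
# Sketch: how an RF-style class probability is obtained, per the text -
# each tree reports the fraction of training samples of each class in the
# leaf reached by the input, and the forest averages these fractions.
# The leaf contents are made-up illustrations, not fitted trees.

def leaf_probability(leaf_labels, cls):
    """Fraction of samples of class `cls` in a single tree's leaf."""
    return leaf_labels.count(cls) / len(leaf_labels)

# Leaves reached by one input sample in a 3-tree forest; enforcing a
# minimum leaf size (here 4 samples) is what smooths these frequencies.
leaves = [
    ["Alert", "Alert", "Pre-alert", "Alert"],
    ["Alert", "Pre-alert", "Pre-alert", "Alert"],
    ["Alert", "Alert", "Alert", "Pre-alert"],
]
p_alert = sum(leaf_probability(l, "Alert") for l in leaves) / len(leaves)
print(round(p_alert, 3))  # 0.667
```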

### 2.2.4. Naïve Bayes

Naïve Bayes (NB) is a classification method based on Bayes' theorem, with the "naive" assumption that the features in a class are mutually independent, even when dependence exists [46]. Bayes' theorem can be expressed as:

$$P(y|X) = \frac{P(X|y)\ P(y)}{P(X)}\tag{2}$$

where *P*(*y*|*X*) is the probability of the hypothesis *y* given the occurrence of the features *X*, with *X* = (*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x<sub>n</sub>*). Assuming conditional independence of the features, Bayes' theorem can be written as:

$$P(y|x_1, x_2, \dots, x_n) = \frac{P(x_1|y)\ P(x_2|y) \dots P(x_n|y)\ P(y)}{P(x_1)\ P(x_2) \dots P(x_n)} \tag{3}$$

There are different NB classifiers depending on the assumption of the distribution of *P*(*x<sup>i</sup>* |*y*). In this matter, the study of Zhang [46] proved the optimality of NB under the Gaussian distribution even when the assumption of conditional independence is violated (real application cases). Additionally, for multiclass problems, the outcome of the algorithm is the class with the maximum probability. For the Gaussian NB algorithm no parameters have to be tuned.
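A minimal Gaussian NB sketch under the assumptions above: per class, fit a Gaussian (mean and variance) per feature, then pick the class maximizing the prior times the likelihood product. The one-feature toy data are invented:

```python
import math

# Gaussian naive Bayes sketch: fit per-class, per-feature mean/variance,
# then score P(y) * prod_i N(x_i; mu_i, var_i) and return the argmax class.
# Toy one-feature discharge-like data; not the study's dataset.

def fit(X, y):
    stats = {}
    for cls in set(y):
        rows = [x for x, yi in zip(X, y) if yi == cls]
        n = len(rows)
        mus = [sum(col) / n for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / n + 1e-9
                 for col, m in zip(zip(*rows), mus)]
        stats[cls] = (n / len(y), mus, vars_)   # prior, means, variances
    return stats

def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(stats, x):
    def score(cls):
        prior, mus, vars_ = stats[cls]
        p = prior
        for xi, m, v in zip(x, mus, vars_):
            p *= gauss(xi, m, v)
        return p
    return max(stats, key=score)

X = [(10,), (12,), (11,), (55,), (60,), (58,)]
y = ["No-alert", "No-alert", "No-alert", "Alert", "Alert", "Alert"]
model = fit(X, y)
print(predict(model, (57,)))  # Alert
print(predict(model, (9,)))   # No-alert
```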

### 2.2.5. Multi-Layer Perceptron

The Multi-Layer Perceptron (MLP) is a class of feedforward artificial neural network (ANN). A perceptron is a linear classifier that separates an input into two categories with a straight line and produces a single outcome. The input is a feature vector multiplied by specific weights and added to a bias. Contrary to the single-layer case, the MLP can approximate non-linear functions by using additional, so-called hidden, layers. The probability of belonging to each class is computed through the *softmax* function. The MLP consists of multiple neurons in fully connected multiple layers. Determining the number of neurons in the layers with a trial-and-error approach remains widely used [47]. Neurons in the first layer correspond to the input data, whereas all other nodes relate inputs to outputs by using linear combinations with certain weights and biases, together with an activation function. The MLP is trained by minimizing the logistic loss function, with the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method as the optimizer. A detailed and comprehensive description of ANNs can be found in [48].
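The forward pass of a one-hidden-layer MLP can be sketched as follows (the ReLU activation and all weights are illustrative assumptions, not the trained model of the study):

```python
import math

# Forward pass of a tiny one-hidden-layer MLP classifier: a linear
# combination plus an activation in the hidden layer, softmax on the
# output layer. All weights/biases are made-up, untrained values.

def relu(v):
    return [max(0.0, x) for x in v]

def linear(W, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def softmax(z):
    exps = [math.exp(v) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

# 2 inputs -> 3 hidden units -> 3 classes
W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]; b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, 0.2, -0.5], [0.3, 0.9, 0.1], [-0.4, 0.2, 0.8]]; b2 = [0.0, 0.0, 0.0]

x = [0.6, 0.4]                  # e.g. two standardized input features
h = relu(linear(W1, b1, x))     # hidden-layer activations
probs = softmax(linear(W2, b2, h))
print(round(sum(probs), 6))     # 1.0
print(probs.index(max(probs)))  # 1 (index of the predicted class)
```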

### *2.3. Methodology*

Figure 3 schematically depicts the methodology followed in this study. As mentioned before, the complete dataset consists of precipitation and labeled discharge time series (see Figure 2). The dataset was split into two groups for training and testing purposes, and training and test feature spaces were composed for each lead time for the tasks of model hyperparameterization and model assessment. This procedure was repeated for each of the ML techniques studied. Finally, the performance of all ML methods was ranked for every lead time, based on performance metrics and a statistical significance test.

### 2.3.1. Feature Space Composition

For each lead time, we used single training and testing feature spaces for all ML techniques. A feature space is composed of features (predictors) coming from two variables: precipitation and discharge. The process of feature space composition starts by defining a specific number of precipitation and discharge features (present time and past hourly lags) according to statistical analyses relying on Pearson's cross-, auto-, and partial-autocorrelation functions [49]. The number of lags from each station was selected by setting a correlation threshold of 0.2 [28].

Similarly, for discharge, we used several features coming from the past time slots of discharge selected for the analysis. It is worth noting that the number of discharge features triples, since each discharge feature is replaced with three binary features (one per flood warning class) in a process known as one-hot encoding or binary encoding. Each created feature denotes 0 or 1 when the corresponding alarm stage is false or true, respectively. Finally, we performed a feature standardization before the computation stage of the KNN, LR, NB, and NN algorithms. Standardization was achieved by subtracting the mean and scaling to unit variance, resulting in a distribution with a mean equal to 0 and a standard deviation equal to 1.
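The one-hot encoding and standardization steps described above can be sketched as follows; the tiny arrays are hypothetical stand-ins for the real lagged features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical discharge features: 2 samples x 3 past lags, each lag labeled
# with a warning class (0 = No-alert, 1 = Pre-alert, 2 = Alert).
discharge_lags = np.array([[0, 0, 1],
                           [0, 1, 2]])
# One-hot encoding triples the count: every lag becomes three 0/1 features.
onehot = np.eye(3)[discharge_lags].reshape(len(discharge_lags), -1)

# Precipitation features are standardized to zero mean and unit variance.
precip = np.array([[1.2, 0.0],
                   [3.4, 5.1],
                   [2.0, 1.5]])
scaled = StandardScaler().fit_transform(precip)
```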

**Figure 3.** Work schedule for the development and evaluation of the machine learning (ML) flood forecasting models.

### 2.3.2. Model Hyperparameterization

After the composition of the feature space, the optimal architecture for each ML forecasting model and each lead time was set up. The optimal architectures were defined by the combination of hyperparameters that balances accuracy, computational cost, and speed. However, finding optimal architectures would require an exhaustive search over all combinations of hyperparameters. To overcome this issue, we relied on a randomized grid search (RGS) with a 10-fold cross-validation scheme. The RGS procedure randomly explores the search space of discretized continuous hyperparameters based on a cross-validation evaluation. We selected the f1-macro score (see Section 2.3.4) as the objective function.
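The RGS procedure can be sketched with scikit-learn's `RandomizedSearchCV` (10-fold cross-validation, f1-macro objective); the estimator, the `C` search space, and the synthetic data are illustrative, not the study's actual configuration:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic 3-class stand-in for the flood-warning feature space
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Randomized grid search: sample hyperparameter combinations at random and
# score each with 10-fold cross-validation on the f1-macro objective.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=2000),
    param_distributions={'C': loguniform(1e-3, 1e3)},  # continuous search space
    n_iter=10, cv=10, scoring='f1_macro', random_state=0)
search.fit(X, y)
best_C = search.best_params_['C']
```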

### 2.3.3. Principal Component Analysis

ML applications generally require the analysis of high-dimensional and complex data, involving substantial amounts of memory and computational cost. Dimensionality reduction was realized through the application of principal component analysis (PCA), enabling the exclusion of correlated features that do not add information to the model. PCA was applied after feature scaling and normalization.

This method finds the directions of maximum variance and reduces the feature space to those dimensions so that model performance remains as intact as possible when compared to performance with the full feature space. However, considering that each ML technique assimilates data differently, we did not fix the number of principal components with a threshold of variance explanation (e.g., 80–90%), but performed an exploratory analysis to evaluate its influence on each model. As such, the number of principal components was treated as an additional hyperparameter and optimized for each specific model (lead time and ML technique), with the objective of finding the best possible model for each case.
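Treating the number of principal components as one more tunable hyperparameter can be sketched with a scikit-learn pipeline; the candidate component counts and the synthetic data are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=12, n_informative=6,
                           random_state=0)

# Optimize the number of components directly, rather than fixing a
# variance-explained threshold up front.
pipe = Pipeline([('pca', PCA()), ('clf', LogisticRegression(max_iter=2000))])
grid = GridSearchCV(pipe, {'pca__n_components': [2, 4, 8, 12]},
                    cv=5, scoring='f1_macro')
grid.fit(X, y)
n_kept = grid.best_params_['pca__n_components']
```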

All ML techniques and the RGS procedure were implemented through the scikit-learn package for ML in Python® [50]. Table 1 presents the relevant hyperparameters for each ML technique and their search spaces for tuning [38]. Default values were employed for the remaining hyperparameters not depicted in Table 1.


**Table 1.** Model hyperparameters and their ranges/possibilities for tuning.

### 2.3.4. Model Performance Evaluation

Forecasting hydrological extremes such as floods is an imbalanced classification problem, and it becomes even more complex when the interest lies in the minority class of the data (flood alert). This is because most ML classification algorithms focus on minimizing the overall error rate, i.e., the incorrect classification of the majority class [51]. One solution is to resample the class distribution of the data to obtain an equal number of samples per class. In this study, we used another approach, which trains the ML models on the imbalanced data directly and penalizes mistakes in samples belonging to the minority classes rather than under-sampling or over-sampling the data. In practice, this implies that for a given efficiency metric, the overall score is the result of averaging the performance metric for each class multiplied by its corresponding weight factor. The weight factors for each class were calculated inversely proportional to the class frequencies, using Equation (4).

$$w\_i = \frac{N}{C \; n\_i} \tag{4}$$

where *w<sup>i</sup>* is the weight of class *i*, *N* is the total number of observations, *C* is the number of classes, and *n<sup>i</sup>* is the number of observations in class *i*. This implies that higher weights are obtained for the minority classes.
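Equation (4) can be computed in a few lines; the class counts below are hypothetical:

```python
import numpy as np

def class_weights(y):
    """w_i = N / (C * n_i): weights inversely proportional to class frequency."""
    classes, counts = np.unique(y, return_counts=True)
    return {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}

# Imbalanced toy labels: 95% No-alert (0), 3% Pre-alert (1), 2% Alert (2)
y = np.array([0] * 95 + [1] * 3 + [2] * 2)
w = class_weights(y)   # minority classes receive the largest weights
```

This matches scikit-learn's "balanced" class-weight heuristic, `n_samples / (n_classes * bincount(y))`.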

### Performance Metrics

The metrics for the performance assessment were derived from the well-known confusion matrix and are especially suitable for imbalanced datasets and multiclass problems: the f1 score, the geometric mean, and the logistic regression loss score [51–56]. Since no single metric is adequate on its own, it is suggested to use a compendium of metrics, which complement each other, to properly explain the performance of the model.

### f1 Score

The *f* score is a metric that relies on precision and recall, which makes it effective for imbalanced problems. When the f score is computed as the harmonic mean of precision and recall (i.e., with equal weights), it is named the f1 score. The latter can be calculated with Equation (5).

$$\text{f1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{(\text{Precision} + \text{Recall})} \tag{5}$$

where precision and recall are defined with the following equations:

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \tag{6}$$

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{7}$$

where *TP* stands for true positives, *FP* for false positives, and *FN* for false negatives.

The f1 score ranges from 0 to 1, where 1 indicates perfect precision and recall. The advantage of the f1 score compared to the arithmetic or geometric mean is that it penalizes models most when either precision or recall is low. However, classifying a *No-alert* flood warning as *Alert* might have a different impact on decision-making than the opposite error. This limitation scales up when there is an additional state, e.g., *Pre-alert*. Thus, the interpretation of the f1 score must be taken with care. For multiclass problems, the f1 score is commonly averaged across all classes, yielding the f1-macro score as an indicator of overall model performance.
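A worked example of the per-class f1 scores and their unweighted average (the f1-macro score) with scikit-learn; the toy labels below use three classes, as in the flood-warning problem:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2]   # 0 = No-alert, 1 = Pre-alert, 2 = Alert
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

# Per-class f1 (Eq. 5-7), then the unweighted mean across classes (f1-macro)
per_class = f1_score(y_true, y_pred, average=None)
macro = f1_score(y_true, y_pred, average='macro')
```

For class 0, precision = recall = 3/4, so f1 = 0.75; averaging the three per-class f1 values gives the f1-macro score.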

### Geometric Mean

The geometric mean (g-mean) simultaneously measures the balanced performance of the TP and TN rates. This metric gives equal importance to the classification of both the majority (*No-alert*) and minority (*Pre-alert* and *Alert*) classes. The g-mean is an evaluation measure that maximizes accuracy while balancing TP and TN examples with a good trade-off [53]. It can be calculated using Equation (8):

$$\text{G-mean}=\sqrt{\text{TP}\_{\text{rate}}\times\text{TN}\_{\text{rate}}}\tag{8}$$

where TP<sub>rate</sub> and TN<sub>rate</sub> are defined by:

$$\text{TP}\_{\text{rate}} = \text{Recall} \tag{9}$$

$$\text{TN}\_{\text{rate}} = \frac{\text{TN}}{\text{TN} + \text{FP}} \tag{10}$$

The g-mean metric ranges from 0 to 1, where low values indicate deficient performance in the classification of the majority class, even if the minority classes are correctly classified.
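Equations (8)–(10) can be computed directly from a confusion matrix; the binary alert/no-alert toy labels below are illustrative:

```python
from sklearn.metrics import confusion_matrix

def g_mean(y_true, y_pred):
    """sqrt(TP_rate * TN_rate), Equations (8)-(10), for a binary alert split."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tp_rate = tp / (tp + fn)   # recall / sensitivity
    tn_rate = tn / (tn + fp)   # specificity
    return (tp_rate * tn_rate) ** 0.5

y_true = [0, 0, 0, 0, 1, 1]     # 0 = No-alert, 1 = Alert
y_pred = [0, 0, 0, 1, 1, 0]
score = g_mean(y_true, y_pred)  # sqrt(0.5 * 0.75)
```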

### Logistic Regression Loss

The logistic regression loss (log-loss) metric measures the performance of a classification model whose input is a probability value between 0 and 1. It accounts for the uncertainty of the forecast based on how much it varies from the actual label. For multiclass classification, a separate log-loss is calculated for each class label (per observation), and the results are summed up. The log-loss score for multiclass problems is defined as:

$$\log \text{loss} = -\frac{1}{N} \sum\_{i=1}^{N} \sum\_{j=1}^{M} y\_{ij} \log(p\_{ij}) \tag{11}$$

where *N* is the number of samples, *M* the number of classes, *yij* equals 1 when observation *i* belongs to class *j* and 0 otherwise, and *pij* is the predicted probability that observation *i* belongs to class *j*. Starting from 0 (best score), the log-loss increases as the predicted probability diverges from the actual label. It punishes severe errors more harshly to promote conservative predictions. For probabilities close to 1, the log-loss decreases slowly; however, as the predicted probability decreases, the log-loss increases rapidly.
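A worked example of Equation (11) using scikit-learn's `log_loss`, with hypothetical predicted probabilities for three samples:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = [0, 1, 2]                        # actual warning classes
probs = [[0.9, 0.05, 0.05],               # predicted class probabilities
         [0.1, 0.8, 0.1],
         [0.2, 0.2, 0.6]]

score = log_loss(y_true, probs)
# Eq. (11) reduces to the average of -log(p) for the true class of each sample
manual = -np.mean(np.log([0.9, 0.8, 0.6]))
```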

### Statistical Significance Test for Comparing Machine-Learning (ML) Algorithms

Although we can directly compare performance metrics of ML alternatives and claim to have found the best one based on the score, it is not certain whether the difference in metrics is real or the result of statistical chance. Different statistical frameworks are available allowing us to compare the performance of classification models (e.g., a difference of proportions, paired comparison, binomial test, etc.).

Among them, Raschka [57] recommends the chi-squared test to quantify the likelihood of the observed samples of skill scores under the assumption that they come from the same distribution. This assumption is known as the null hypothesis, and the test aims to establish whether there is a statistically significant difference between two models (error rates). If the null hypothesis is rejected, it can be concluded that any observed difference in performance metrics is due to a difference in the models and not to statistical chance. In our study, we used the chi-squared test to assess whether the difference in the observed proportions in the contingency tables of a pair of ML algorithms (for a given lead time) is significant.
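The contingency-table test can be sketched with SciPy's `chi2_contingency`; the counts below are hypothetical and only illustrate comparing the correct/incorrect forecasts of two classifiers:

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table of correct/incorrect forecasts
# for two models evaluated on the same test samples.
table = [[560, 40],    # model A: correct, incorrect
         [520, 80]]    # model B: correct, incorrect

chi2, p, dof, expected = chi2_contingency(table)
significant = p < 0.05  # reject the null hypothesis of equal error rates
```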

For the model comparison, we assessed the statistical significance of improvements/degradations for all lead times (training and test subsets) at a significance level of 0.05 (chi-squared test). In all cases, the MLP model was used as the base model against which the other models were compared.

### **3. Results**

This section presents the results of the flood forecasting models developed with the LR, KNN, RF, NB, and MLP techniques, and for lead times of 1, 4, 6, 8, and 12 h. For each model, we addressed the forecast of three flood warnings, *No-alert*, *Pre-alert* and *Alert*. First, we present the results of the feature space composition process, taking the 1 h lead time case as an example. Then, we show the results of the hyperparameterization for all models, followed by an evaluation and ranking of the performance of the ML techniques.

### *3.1. Feature Space Composition*

Figures 4 and 5 show the results of the discharge and precipitation lag analyses for the flood forecasting model 1 h before a flood would occur. Figure 4a depicts the discharge autocorrelation function (ACF) and the corresponding 95% confidence interval from lag 1 up to 600 h. We found a significant correlation up to a lag of 280 h (maximum correlation at the first lag); thereafter, the correlation fell within the confidence band. On the other hand, Figure 4b presents the discharge partial-autocorrelation function (PACF) and its 95% confidence band from lag 1 to 30 h. We found a significant correlation up to lag 8 h (first lags outside the confidence band). As a result, based on the interpretation of the ACF and PACF analyses, and following Muñoz et al. [28], we decided to include 8 discharge lags (hours) for the case of 1 h flood forecasting in the Tomebamba catchment.
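The lag-selection logic (keep lags until the ACF falls inside the 95% white-noise confidence band) can be sketched with NumPy; the AR(1) series below is a synthetic stand-in for the discharge record, not the study's data:

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)])

# AR(1) series mimicking a persistent discharge record
rng = np.random.default_rng(1)
n = 2000
q = np.zeros(n)
for t in range(1, n):
    q[t] = 0.9 * q[t - 1] + rng.normal()

band = 1.96 / np.sqrt(n)        # 95% confidence band for white noise
r = autocorr(q, 50)
# Number of significant lags = index of the first lag inside the band
inside = np.abs(r) < band
n_lags = int(np.argmax(inside)) if inside.any() else 50
```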

**Figure 4.** (**a**) Autocorrelation function (ACF) and (**b**) partial-autocorrelation function (PACF) of the Matadero-Sayausí (Tomebamba catchment) discharge series. The blue hatch indicates in each case the correspondent 95% confidence interval.

**Figure 5.** Pearson's cross-correlation comparison between the Toreadora (3955 m a.s.l.), Virgen (3626 m a.s.l.), and Chirimachay (3298 m a.s.l.) precipitation stations and the Matadero-Sayausí discharge series. Note the blue horizontal line at a fixed correlation of 0.2 for determining past lags.

Figure 5 plots the Pearson's cross-correlation between the precipitation at each rainfall station and the discharge at the Matadero-Sayausí stream gauging station. For all stations, we found a maximum correlation at lag 4 (maximum 0.32 for Chirimachay). With the fixed correlation threshold of 0.2, we included 11, 14, and 15 lags for Virgen, Chirimachay, and Toreadora stations, respectively.

Similarly, the same procedure was applied for the remaining lead times (i.e., 4, 6, 8, and 12 h). In Table 2, we present the input data composition and the resulting total number of features obtained from the lag analyses for each forecasting model. For instance, for the 1 h case, the total number of features in the feature space equals 67, from which 43 are derived from precipitation (past lags and one feature from present time for each station), and 24 from discharge (one-hot-encoding).


**Table 2.** Input data composition (number of features) for all ML models of the Tomebamba catchment.

\* Note that each discharge feature triples (three flood warning classes) after a one-hot-encoding process.

### *3.2. Model Hyperparameterization*

The results of the hyperparameterization, including the number of PCA components employed for achieving the best model efficiencies, are presented in Table 3. No evident relation was found between the number of principal components and either the ML technique or the lead time. In fact, for some models we found differences in the f1-macro score lower than 0.01 between a low and a high number of principal components. See, for instance, the case of the KNN models, where the optimal number of components decayed significantly for lead times greater than 4 h: for the 1 h lead time, 96% of the components were used, whereas for the remaining lead times less than 8% were used.

**Table 3.** Model hyperparameters and number of principal components used for each specific model (ML technique and lead time).


\* From the total number of features: 1 h = 67, 4 h = 88, 6 h = 100, 8 h = 112, 12 h = 136 features.

Turning to the evolution of model complexity with lead time (Table 3), more complex ML architectures are needed to forecast at greater lead times. This is underpinned by the fact that the corresponding optimal models for greater lead times require stronger regularization (lower values of *C*) for LR, a greater number of neighbors (n\_neighbors) for KNN, more specific trees (lower values of min\_samples\_split) for RF, and more hidden layers (hidden\_layers) for MLP.

### *3.3. Model Performance Evaluation*

As mentioned before, model performances calculated with the f1-score, g-mean, and log-loss score were weighted according to class frequencies. Table 4 presents the frequency distribution of the complete dataset and of the training and test subsets. The dominance of the *No-alert* flood class is evident, with more than 95% of the samples in both subsets. With this information, the class weights for the training period were calculated as *w*<sub>No-alert</sub> = 0.01, *w*<sub>Pre-alert</sub> = 0.55 and *w*<sub>Alert</sub> = 0.51.

**Table 4.** The number of samples and relative percentage for the entire dataset and the training and test subsets.


The results of the model performance evaluation for all ML models and lead times (test subset) are summarized in Table 5. We proved for all models that the differences in performance metrics for a given lead time were due to the difference in the ML techniques rather than to statistical chance. As expected, the models' ability to forecast floods decreased for longer lead times. For instance, for the case of 1 h forecasting, we found a maximum f1-macro score of 0.88 (MLP) for the training and 0.82 (LR) for the test subset, whereas for the 12 h case, the maximum f1-macro score was 0.71 (MLP) for the training and 0.46 (MLP) for the test subset.



Note: All improvements and degradations are statistically significant.

The extensive hyperparameterization (RGS scheme) powered by 10-fold cross-validation served to assure robustness in all ML models and reduced overfitting. We found only a small difference between the performance values by using the training and the test subsets. For all models, maximum differences in performances were lower than 0.27 for the f1-macro score and 0.19 for the g-mean.

In general, for all lead times, the MLP technique obtained the highest f1-macro score, followed by the LR algorithm. This performance dominance was confirmed by the ranking of the models according to the log-loss score. The ranking of the remaining models was highly variable and, therefore, not conclusive. For instance, the KNN models obtained the second-highest score for the training subset, but the lowest for the test subset, especially for longer lead times. This is because KNN is a memory-based algorithm and is therefore more sensitive than the remaining ML techniques to the inclusion of information different from the training subset. This can be noted in Table 4, where the training and test frequency distributions differ for the *Pre-alert* and *Alert* classes.

On the other hand, for the g-mean score, we obtained a different ranking of the methods. We found the highest scores for the LR algorithm, followed by the RF and the MLP models. Despite this behavior, the g-mean values were superior to the f1-macro scores for all lead times and subsets. This is because the f1 score relies on the harmonic mean and therefore penalizes a low precision or recall more than a metric based on a geometric or arithmetic mean. The g-mean results served to identify LR as the most stable method in terms of correctly classifying both the majority (*No-alert*) and minority (*Pre-alert* and *Alert*) flood warning classes, while the MLP technique could be used to focus on the minority (flood alert) classes.

To extend the last idea, we analyzed the individual f1 scores of each flood warning class. This unveils the ability of the model to forecast the main classes of interest, i.e., *Pre-alert* and *Alert*. Figure 6 presents the evolution of the f1-score of each ML algorithm at the corresponding lead time. We found that for all ML techniques, the *Alert* class is clearly the most difficult to forecast when the f1-macro score was selected as the metric for the hyperparameterization task. An additional exercise consisted of choosing the individual f1-score of the *Alert* class as the target for the hyperparameterization of all models. However, although we obtained comparable results for the *Alert* class, the scores of the *Pre-alert* class deteriorated significantly, even reaching scores near zero. The most interesting aspect of Figure 6 is that the most efficient and stable models across lead times (test subset) were those based on the MLP and LR techniques. It is also evident that all forecasting models lack robustness for the *Pre-alert* warning class, with major differences between the f1-scores of the training and test subsets. An explanation for this might be that the *Alert* class implies a *Pre-alert* warning class, but not the opposite. Consequently, this might mislead the learning process, causing overfitting during training and leading to poor performance on unseen data during the test phase.


**Figure 6.** f1 scores per flood warning state (*No-alert*, *Pre-alert* and *Alert*) for all combinations of ML techniques across lead times: (**a**) Logistic Regression, (**b**) K-Nearest Neighbors, (**c**) Random Forest, (**d**) Naïve Bayes, and (**e**) Multi-layer Perceptron. The brightest and dashed lines in each subfigure (color coding) represent the scores for the test subset.

Moreover, although we added a notion of the class frequency distribution (weights) to the performance evaluation task, it can be noted that for all models the majority class is almost perfectly classified. This is because the *No-alert* class arises from low-to-medium discharge magnitudes, which eases and simplifies the learning process of the ML techniques, since these magnitudes can be related to normal conditions (present time and past lags) of precipitation and discharge.


### **4. Discussion**

In this study, we developed and evaluated five different FEWSs relying on the most common ML techniques for flood forecasting, for short-term lead times of 1, 4, and 6 h for flash floods, and of 8 and 12 h to assess the models' operational value for longer lead times. Historical runoff data were used to define and label the three flood warning scenarios to be forecasted (*No-alert*, *Pre-alert* and *Alert*). We constructed the feature space for the models according to the statistical analyses of precipitation and discharge data, followed by a PCA analysis embedded in the hyperparameterization.

This aimed to better exploit the learning algorithm of each ML technique. In terms of model assessment, we proposed an integral scheme based on the f1-score, the geometric mean, and the log-loss score to deal with the imbalanced and multiclass characteristics of the data. Finally, the assessment was complemented with a statistical analysis to provide a performance ranking of the ML techniques. For all lead times, we obtained the best forecasts for both the majority and minority classes from the models based on the LR, RF, and MLP techniques (g-mean). The two most suitable models for the dangerous warning classes (*Pre-alert* and *Alert*) were the MLP and LR (f1 and log-loss scores). This finding has important implications for developing FEWSs, since real-time applications must be capable of dealing with both the majority and minority classes. Therefore, it can be suggested that the most appropriate forecasting models are based on the MLP technique.

The results on the evolution of model performances across lead times suggest that the models are acceptable for lead times up to 6 h, i.e., the models are suitable for flash-flood applications in the Tomebamba catchment. For lead times greater than 6 h, we found a strong decay in model performance. In other words, the utility of the 8 and 12 h forecasting models is limited by the models' operational value. This is because, in the absence of rainfall forecasts, the assumption of future rain is solely based on runoff measurements at past and present times. This generates forecasts that are not accurate enough for horizons greater than the concentration time of the catchment. The concentration time of the Tomebamba catchment was estimated at between 2 and 6 h according to the equations of Kirpich, Giandotti, Ven Te Chow, and Temez; a summary of these equations can be found in Almeida et al. [58]. This results in an additional performance decay for the 8 and 12 h cases, on top of the modeling error.

The study of Furquim et al. [31] is comparable. These authors analyzed the performance of different ML classification algorithms for flash-flood nowcasting (3 h) in a river located in an urban area of Brazil. They found that models based on neural networks and decision trees outperformed those based on the NB technique. In addition, the study of Razali et al. [30] proved that decision-tree-based algorithms perform better than KNN models, which agrees with our findings. However, such studies only evaluated the percentage of correctly classified instances, which is a simplistic evaluation. Thus, we recommend a more integral assessment of model performance, like the one in the current study, which allows for better support in decision making.

Other studies related to quantitative forecasting revealed that neural-network-based models usually outperform the remaining techniques proposed in our study [32–34]. Similarly, the study of Khalaf et al. [37] proved the superiority of the RF algorithm when compared to the bagging decision trees and HyperPipes classification algorithms. Thus, in certain cases, the use of computationally less expensive techniques produces comparable results, as in [36]; this is also the case in our short-rain and flash-flood classification problem.

As a further step, we propose the development of ensemble models for improving the performance of the individual models. This can be accomplished by combining the outcomes of the ML models with weights obtained, for instance, from the log-loss scores. Another alternative that is becoming popular is the construction of hybrid models as a combination of ML algorithms for more accurate and efficient models [24,35,36]. Moreover, as stated by Solomatine and Xue [36], inaccuracies in forecasting floods are mainly due to data-related problems. In this regard, Muñoz et al. [9] reported a deficiency in precipitation-driven models due to rainfall heterogeneity in mountainous areas, where orographic rainfall formation occurs. In most cases, rainfall events are only partially captured by point measurements, and even the entire storm coverage can be missed.

In general, precipitation-runoff models will at a certain point reach an effectiveness threshold that cannot be exceeded without incorporating new types of data, such as soil moisture [59,60]. In humid areas, the rainfall–runoff relationship also depends on other variables, such as evapotranspiration, soil moisture, and land use, which leads to significant spatial variations in water storage. However, these variables are difficult to measure or estimate.

### **5. Conclusions**


A more detailed model assessment (individual f1 scores) demonstrated the difficulties of forecasting the *Pre-alert* and *Alert* flood warnings. This was evidenced when the hyperparameterization was driven by the optimization of the forecast for the *Alert* class, which, however, did not improve the model performance for this specific class. This study can be extended with a deeper exploration of the effect of input data composition, precipitation forecasting, and feature engineering strategies for both the MLP and LR techniques. Feature engineering pursues the use of data representation strategies that could, for example, provide spatial and temporal information on the precipitation in the study area. This can be done by spatially discretizing precipitation in the catchments with the use of remotely sensed imagery. With this additional knowledge, it would be possible to improve the performance of the models hereby developed at longer lead times.

We recommend that future efforts should be put into applying the methodology and assessment framework proposed here in other tropical Andean catchments, and/or benchmarking the results obtained in this study against the outputs of physically based forecasting models. This was not possible in this study due to a lack of data.

Finally, for FEWSs, the effectiveness of the models is strongly linked to the speed of communication to the public after a flood warning is triggered. Therefore, future efforts should focus on the development of a web portal and/or mobile application as a tool to boost the preparedness of society against floods that currently threaten people's lives, possessions, and environment in Cuenca and other comparable tropical Andean cities.

**Author Contributions:** Conceptualization, P.M. and R.C.; formal analysis, P.M.; funding acquisition, R.C.; investigation, P.M.; methodology, P.M. and J.O.-A.; project administration, R.C.; supervision, J.O.-A., J.B., J.F. and R.C.; writing—original draft, P.M.; writing—review and editing, J.O.-A., J.B., J.F. and R.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the project: "Desarrollo de modelos para pronóstico hidrológico a partir de datos de radar meteorológico en cuencas de montaña", funded by the Research Office of the University of Cuenca (DIUC), and the Empresa Pública Municipal de Telecomunicaciones, Agua Potable, Alcantarillado y Saneamiento de Cuenca (ETAPA-EP). Our thanks go to these institutions for their generous funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data, models, and code that support the findings of this study are available from the corresponding author upon reasonable request.

**Acknowledgments:** We acknowledge the Ministry of Environment of Ecuador (MAAE) for providing research permissions, and are grateful to the staff and students that contributed to the hydrometeorological monitoring.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


## **Real-Time Flood Mapping on Client-Side Web Systems Using HAND Model**

**Anson Hu <sup>1</sup> and Ibrahim Demir <sup>2,</sup>\***


**Abstract:** The height above nearest drainage (HAND) model is frequently used to calculate properties of the soil and predict flood inundation extents. HAND is extremely useful due to its lack of reliance on prior data, as only the digital elevation model (DEM) is needed. It is close to optimal, running in linear or linearithmic time in the number of cells depending on the values of the heights. It can predict watersheds and flood extent to a high degree of accuracy. We applied a client-side HAND model on the web to determine the extent of flood inundation in several flood-prone areas in Iowa, including the cities of Cedar Rapids and Ames. We demonstrated that the HAND model was able to achieve inundation maps comparable to advanced hydrodynamic models (i.e., Federal Emergency Management Agency approved flood insurance rate maps) in Iowa, and would be helpful in the absence of detailed hydrological data. The HAND model is applicable in situations where a combination of accuracy and short runtime is needed, for example, in interactive flood mapping and supporting mitigation decisions, where users can add features to the landscape and see the predicted inundation.

**Keywords:** flood maps; flood risk management; HAND model; WebAssembly; flood risk mapping; web systems; floods; urban flooding; flood analysis

**Citation:** Hu, A.; Demir, I. Real-Time Flood Mapping on Client-Side Web Systems Using HAND Model. *Hydrology* **2021**, *8*, 65. https://doi.org/10.3390/hydrology8020065

Academic Editors: Evangelos Baltas, Andrea Petroselli and Marina Iosub

Received: 26 January 2021; Accepted: 3 April 2021; Published: 11 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

### **1. Introduction**

Flooding is a global problem, and floods around the world are becoming more significant and severe [1]. Floods account for 40% of natural disasters worldwide [2]. Modeling and predicting the inundation extent of floods is important, as it provides critical information for flood mitigation [3,4], preparedness [5], and planning and response efforts [6]. Real-time, accurate predictions of flood inundation extent can facilitate understanding of potential flood risk and damage [7,8] and support flood mitigation and planning [9]. Flood inundation maps can be used for flood risk communication through intelligent systems [10,11] and novel communication systems [12].

Floods are caused by a multitude of factors, which makes prediction difficult [13]. Moisture flows and the resulting rainfall [14], runoff, and stream behavior [15], among other factors, significantly affect flooding. For these reasons, flood prediction is highly complex and non-linear in nature [16]. Various models exist to predict flood extent. Empirical models draw on observations [17], remote sensing [18], and benchmark datasets [19] of past events, while hydrodynamic models simulate the physical movement of water [20]. Data-driven methods are used extensively in hydrology and water resources modeling [21,22]. One of the simplest and earliest elevation-based data-driven methods is flood-fill, which produces results in optimal time but provides less accurate predictions. Another simple method overlays an observed flood level on the topography [23]. At the other end of the spectrum, there are complex hydrological models that take into account numerous distinct factors, such as three-dimensional models [24]. Generally, models that strike a good balance between speed and accuracy are used for operational needs [25].

Existing hydrodynamic models face several challenges in real-time and operational decision-making situations [26]. One of the main issues is that these models often require a large amount of detailed hydrological data [27]. For example, advanced hydrodynamic models for flood map prediction often take into account a plethora of factors, including riverbed dynamics, hydraulic features, and structures [28]. The second main challenge in utilizing these models for decision making is their computational complexity. Some of the methods that provide a balance between speed and accuracy [29] also rely on data that are costly to gather, such as topographical indices [30].

The height above nearest drainage (HAND) method was originally designed to categorize soil environments and find water tables, and it has since found a wide variety of applications. Renno et al. [31] first outlined the algorithm and applied it to distinguish swampy forests from terra-firme forests in the Amazonian jungle. Nobre et al. [32] applied HAND to categorize water tables in a large area of the central Amazon rainforest. The most recent application area for HAND methods is flood inundation. Nobre et al. [33] applied the HAND model to predict flooding in Southern Brazil. The key advantage here is the ability to predict flooding in areas that have little to no prior data, owing to the HAND model's reliance on the DEM alone. Previous studies applying the HAND model to flood map prediction [33,34] achieved accuracies of around 86-98% when compared to reference flood extent maps.

Recent advancements in web technologies have enabled desktop-level computation and analysis of large-scale datasets in the browser. Web systems have been used effectively in environmental data analysis [35,36], distributed computing [37], and geospatial data processing [38]. WebAssembly, released in March 2017, is a low-level assembly-like language for the web with near-native performance that allows languages such as C/C++, C#, and Rust to run on the web; it is designed to run code both safely and efficiently in browsers on the client side [39]. These technologies have allowed previously unobtainable speeds and computations on local users' computers.

Here, we utilize the HAND method to provide flood inundation extent predictions in real time on client-side web systems. We discuss the applicability of the algorithm in comparison with other prediction algorithms for flood-prone regions of Iowa and the applications of such real-time prediction. Our objective is to outline the HAND algorithm, demonstrate its potential to predict both accurately and quickly on the web, and support flood-related decision-making activities [40,41]. HAND offers a variety of benefits for flood inundation prediction. First, due to its low complexity, HAND runs quickly both in theory and in practice. This means that stakeholders involved or interested in flood planning (for example, a government official or a concerned citizen) can easily make changes to a DEM to simulate adding man-made barriers and then run the algorithm to determine how these changes affect flood inundation extents. This can lead to more effective flood protection planning; previously, users had to inspect a standalone flood map generated once by an advanced but computationally expensive algorithm and then make educated guesses about the design and utilization of man-made flood prevention measures.

In the following sections, we will outline the algorithm, details of implementations on the web systems, and several evaluation case-studies for comparison to real-world flooding scenarios.

### **2. Materials and Methods**

Our goal with this study is to demonstrate the implementation of the HAND model on client-side web systems and to evaluate its accuracy and performance for real-time decision-making scenarios. The system provides an implementation of the HAND algorithm that runs in the user's local browser for fast flood extent prediction and can work offline and in operational settings with minimal processing on the web server. The system allows users to modify aspects of the terrain for mitigation purposes and to see the predicted flood inundation extents, in order to facilitate the use of measures to control potential flood damage.

### *2.1. Web Implementation of HAND*

Figure 1 presents the architecture of the real-time web-based HAND flood map generation system. It includes several steps to compute and generate flood maps using the HAND algorithm. The system is designed so that the computationally heavy, algorithmic portion of HAND is written entirely in C++ and run as WebAssembly to enable a client-side implementation. Emscripten was used to compile the C++ code to WebAssembly, which allows JavaScript (JS) to pass input data and receive model output. The only inputs are the DEM data and the drainage threshold value provided by the user. The WebAssembly-compiled implementation of HAND calculates the heights above nearest drainages and passes that information to JavaScript for visualizing the flood extent on Google Maps.

**Figure 1.** Architecture and components of the real-time web-based height above nearest drainage (HAND) mapping system.

The C++ algorithmic implementation is precompiled into WebAssembly. The resulting code supplies an efficient way to run the HAND algorithm. Most prediction features, which can be applied either pre- or post-computation, are left out of the WebAssembly component. The WebAssembly code runs on an in-browser virtual machine (VM), and JavaScript glue code generated by Emscripten provides a means to interface with it. The HTML/JS frontend loads images from a source, either static or online, and applies user-selected parameters before and after computation. This piece of the architecture calls the API provided by the glue code to interface with the WebAssembly VM and obtain prediction results. The JS manages the frontend of the application, taking user inputs, integrating external terrain data, and visualizing map output.

The web implementation is meant to make the system as flexible as possible for all sorts of applications that can utilize the HAND algorithm implementation, and apply their own features (changing terrain, allowing for multiple parameter changes in local areas, etc.). For example, one application might scrape DEM data from various online sources and allow users to generate flood maps in any region in the world. A different application might generate DEMs with another algorithm and allow users to explore the generated flood maps.

Many of the features relevant for flood prediction are nearly independent of the HAND computation itself. For example, two of the three relevant parameters, the HAND threshold and the absolute height threshold, can be applied after the actual HAND model is computed. In addition, other features such as applying HAND thresholds only to specific areas, dynamically loading DEMs, and adding terrain features (dikes, reservoirs, etc.) are better suited to client-side processing using JavaScript. Since these features are highly application-specific but computationally inexpensive, we leave them out of the C++ and WebAssembly implementation. For example, one application might require dynamic loading of DEMs from state resources, while another generates DEMs with an algorithm and passes them in. This keeps the C++ implementation highly portable and versatile; a wide variety of applications requiring the HAND algorithm can utilize it.

### *2.2. HAND Algorithm*

In our study, we applied the HAND methodology (as described in [32,33]) to predict the extent of flood inundation. By calculating the vertical distance from each cell to its nearest drainage cell and applying several heuristic steps, we can estimate the extent of flood inundation with comparable accuracy. A brief description of the steps of the HAND algorithm (Figure 2) is given here:

**Figure 2.** Steps of HAND algorithm procedure to generate flood maps (adapted from [42]).

Determining Flow Direction: We define the direction of the flow of cell a in the DEM as the adjacent cell to which the water on a will flow. Water can only flow in one direction, which means that it can only flow to one cell. We define the set of cells adjacent to the cell at coordinates (x0, y0) as shown in Equation (1):

$$\{\left(x, y\right) \mid \max\left(\left|x - x_0\right|, \left|y - y_0\right|\right) = 1\}\tag{1}$$

We define the child of cell a to be cell b such that the direction of the flow of cell a is towards cell b. We define a flow chain of cell a as an ordered list of cells such that the first cell in the list is a, and each subsequent cell is a child of the previous cell.
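The implementation described in the paper is written in C++ and compiled to WebAssembly; purely for illustration, the flow-chain definition above can be sketched in Python. The function name and the `(dr, dc)`-offset encoding of flow directions are our own assumptions, not part of the published implementation:

```python
def flow_chain(flow, start):
    """Follow each cell's flow direction from `start` until a cell
    with no child (an outlet) is reached, returning the ordered list
    of visited cells -- the "flow chain" defined in the text."""
    chain = [start]
    r, c = start
    while flow[r][c] is not None:   # None marks a cell with no child
        dr, dc = flow[r][c]         # offset to the child cell
        r, c = r + dr, c + dc
        chain.append((r, c))
    return chain

# A 2x2 grid whose flow ends at the bottom-right outlet cell:
flow = [[(0, 1), (1, 0)],
        [None,   None]]
flow_chain(flow, (0, 0))   # [(0, 0), (0, 1), (1, 1)]
```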

Depression Removal: First, the graph must be normalized to remove undrainable depressions. An undrainable depression is a set of cells in which any pair of cells can be connected by a path of mutually adjacent cells lying entirely within the set, and no cell in the set has a continuous flow to the border of the DEM (i.e., no cell has a border cell in its flow chain). We used priority-flood depression filling [43] in this case. This ensures that all water can drain to the outside of the map and that there are no cycles or other anomalies that could disrupt subsequent algorithms. This step runs in O(N) time for integer elevations or O(N log N) for floating-point elevations, where N is the number of cells.
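The priority-flood idea can be sketched as follows. This is a minimal Python illustration only (the paper's implementation is C++); it uses a binary heap, which gives the O(N log N) variant rather than the O(N) integer-elevation variant, and the function name is our own:

```python
import heapq

def priority_flood_fill(dem):
    """Priority-flood depression filling (sketch).

    Seeds a min-heap with all border cells, then grows inward,
    raising any interior cell below the spill elevation of its
    lowest already-processed neighbour so every cell can drain
    off the edge of the map."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]                 # copy; filled in place
    seen = [[False] * cols for _ in range(rows)]
    heap = []

    # Seed the heap with every border cell.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                seen[r][c] = True

    # Process cells lowest-first; raise cells trapped in pits.
    while heap:
        h, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                    seen[nr][nc] = True
                    # Fill the pit up to the spill level of the path in.
                    filled[nr][nc] = max(filled[nr][nc], h)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled

# A pit at the centre of a 3x3 bowl is raised to the spill level:
# priority_flood_fill([[5,5,5],[5,1,5],[5,5,5]])[1][1] -> 5
```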

Computing Flow Directions: Next, the direction of water flow for each cell must be resolved. We used the D8 algorithm, which assigns each cell's flow direction to the adjacent cell offering the steepest downward gradient. A flat-resolving algorithm [44] was used to determine the direction of flow in flat areas (where all cells adjacent to a cell have the same height) and guide water away from high areas and towards lower areas. This step runs in O(N) time.
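A minimal sketch of the D8 step, in Python for illustration (the published implementation is C++, and the flat-resolving pass [44] is omitted here; the names are our own):

```python
import math

# The eight neighbour offsets from Equation (1): cells (x, y) with
# max(|x - x0|, |y - y0|) = 1.
D8_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def d8_flow_directions(dem):
    """For each cell, pick the adjacent cell with the steepest
    downward gradient (D8). Returns a grid of (dr, dc) offsets,
    or None for cells with no lower neighbour (outlets/flats)."""
    rows, cols = len(dem), len(dem[0])
    flow = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best, best_off = 0.0, None
            for dr, dc in D8_OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dist = math.hypot(dr, dc)   # diagonal steps are longer
                    grad = (dem[r][c] - dem[nr][nc]) / dist
                    if grad > best:             # keep the steepest descent
                        best, best_off = grad, (dr, dc)
            flow[r][c] = best_off
    return flow

# On a plane tilted toward the bottom-right corner, water at the
# top-left flows diagonally: d8_flow_directions([[3,2],[2,1]])[0][0] -> (1, 1)
```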

HAND Model Generation: Next, drainages are established on the graph by setting an accumulated area threshold. If more units of area of water drain through a cell than the area threshold, the cell becomes a drainage cell; in other words, if the number of flow chains that the cell is part of exceeds the accumulated area threshold, it is a drainage cell. The nearest drainage for a cell is then the first cell in its flow chain that is a drainage cell. This step runs in O(N) time. Finally, the height above nearest drainage is calculated by subtracting the nearest drainage cell's height in the original DEM from the cell's own height in the original DEM.
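Combining flow accumulation, drainage identification, and the final subtraction, the HAND generation step can be sketched as follows. Again, this is an illustrative Python sketch with assumed names, not the paper's C++ implementation; walking every flow chain explicitly, as done here, is slower than the O(N) accumulation used in practice, and it treats a cell whose accumulation equals the threshold as a drainage cell:

```python
def hand(dem, flow, drainage_threshold):
    """Compute height above nearest drainage for each cell.

    `flow` is a grid of (dr, dc) offsets from the D8 step
    (None = outlet). A cell is treated as a drainage cell when at
    least `drainage_threshold` flow chains pass through it."""
    rows, cols = len(dem), len(dem[0])
    acc = [[0] * cols for _ in range(rows)]

    # Flow accumulation: walk each cell's flow chain to the map edge,
    # counting how many chains pass through every cell.
    for r in range(rows):
        for c in range(cols):
            cr, cc = r, c
            while True:
                acc[cr][cc] += 1
                if flow[cr][cc] is None:
                    break
                dr, dc = flow[cr][cc]
                cr, cc = cr + dr, cc + dc

    # HAND value: cell height minus the height of the first drainage
    # cell on its flow chain (falls back to the chain's outlet cell
    # if no drainage cell is reached).
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cr, cc = r, c
            while acc[cr][cc] < drainage_threshold and flow[cr][cc] is not None:
                dr, dc = flow[cr][cc]
                cr, cc = cr + dr, cc + dc
            result[r][c] = dem[r][c] - dem[cr][cc]
    return result
```

On a 2×2 grid where all three upper cells drain into the bottom-right cell, a threshold of 4 makes only that cell a drainage, and each cell's HAND value is simply its height above it.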

### *2.3. Implementation Challenges*

Several issues were encountered while implementing the HAND model on the web, as the basic HAND model does not account for parameters such as the height of water and absolute height.

Height of Water: The HAND algorithm as it currently exists views each drainage stream as only one pixel in width and does not take the height of water into account. This means that the algorithm treats a wide river as if it were a canyon: it assigns the banks of the river extremely high HAND values because the stream's elevation is taken at the bottom of the river. In reality, the banks of the river have HAND values near zero, because water reaches the top of the river bank, but the algorithm does not reflect this. To overcome this issue, we added two parameters to the implementation: a HAND threshold and a drainage threshold.


These parameters affect the model outcome in several ways. As the HAND threshold increases, the extent of areas marked as flooded increases; if the HAND threshold is set too high, the entire area will be flooded. The drainage threshold controls how many drainage points are available in the selected region. If the drainage threshold is set too high, the algorithm degenerates into a slow flood-fill; if it is too low, the flooding predictions become exceedingly coarse, with large, inaccurate blotches scattered around. By having a high HAND threshold and enough drainage streams, we allow the areas adjacent to the rivers to be easily flooded, despite being viewed as canyons by the algorithm.

Absolute Height: If a large plateau drains into a stream going downhill, the algorithm will view the plateau as flooded, even when it might be far above local rivers and other water bodies and have a low chance of flooding. To overcome this issue, we added a third parameter for the maximum height threshold of flooding (hereafter referred to as the "absolute height threshold"). If a cell's absolute height is above this threshold, it is not considered flooded. The absolute height threshold controls the extent of flooding: if it is set too high, areas that would be dry are predicted as flooded; if it is too low, the algorithm misses much of the flooding in higher areas. A medium absolute height threshold allows sloping rivers to still have effective flooding predictions within them and on their borders but prevents adjacent flat-but-elevated landforms from being counted as flooded. A combination of a medium absolute height threshold and a high HAND threshold effectively counters the two issues listed.
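Because both the HAND threshold and the absolute height threshold are applied after the HAND grid is computed, interactive re-thresholding is cheap. A minimal sketch of this post-processing step (Python for illustration; the names are ours):

```python
def flood_mask(hand_grid, dem, hand_threshold, abs_height_threshold):
    """Post-process a computed HAND grid into a flood mask.

    A cell is marked flooded when its HAND value is at or below the
    HAND threshold AND its absolute elevation is at or below the
    absolute height threshold. Both checks run after the (expensive)
    HAND computation, so they can be re-applied interactively."""
    return [
        [hand_grid[r][c] <= hand_threshold and dem[r][c] <= abs_height_threshold
         for c in range(len(dem[0]))]
        for r in range(len(dem))
    ]
```

Because this mask is cheap to recompute, a frontend can re-apply it instantly whenever a threshold changes, without re-running the HAND computation.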

### **3. Results and Discussion**

We compared the results of the HAND output with two other flood inundation predictors: a basic flood-fill implementation and Federal Emergency Management Agency (FEMA) approved Flood Insurance Rate Maps (100-yr, 500-yr) developed [45] at the Iowa Flood Center (IFC). We used two datasets for the analysis. The first was the Cedar Rapids metro area in Iowa (Figure 3), where we utilized a DEM at a resolution of 1 m for a 5 sq km area (2863 × 1746 grid). The second was for Ames, Iowa (Figure 3), where we used a DEM at a resolution of 2 m for a 53.8 sq km area (4220 × 3189 grid). All DEMs were normalized to 255 units, corresponding to 1–3 ft of real-world vertical elevation depending on vertical resolution.

**Figure 3.** FEMA-approved flood maps for Cedar Rapids (**left**) and Ames (**right**), Iowa by the Iowa Flood Center (IFC).

For the Cedar Rapids area, the HAND algorithm ran in roughly 3 s on a mid-range personal laptop and produced an output differing from the reference map predictions by 10%. The flood-fill implementation ran in less than one second and produced an output differing by roughly 15%. For the Ames area, the HAND algorithm (Figure 4) ran in roughly 9 s and produced an output differing from the reference map predictions by 6%. The flood-fill implementation (Figure 4) ran in roughly one second and produced an output differing by roughly 30%. The HAND model's accuracy was notably improved, both in absolute error and in error relative to the flood-fill implementation; this was most likely due to the larger scale of the Ames DEM. Comparisons were drawn between appropriate thresholds in the FEMA reference maps and the results of the HAND implementation. Several predictions with varying parameter values were generated with the algorithm and are presented here to show how adjusting the parameters of the prediction algorithm affects the output.

**Figure 4.** Maps generated for Ames, Iowa using basic flood-fill (**left**) and HAND model (**right**).

### *3.1. Adjustment of Parameters*

To adjust flood predictions for areas without prior predictions or flood maps, we must take into account the scale of the area we are modeling and a sense of an absolute maximum for good area control. The Ames area DEM measures 65 square km, and the most effective drainage threshold measured was 1 million cells, corresponding to a 4 square km area. This is most likely the absolute maximum, as it is strikingly similar to the suggested effective threshold value of 4.05 square km in the literature [33]. The HAND threshold is highly correlated with the level of flooding on the terrain. The best way to analyze this is to determine the relative heights of the cells, compare them to real-world heights, and then scale appropriately. For example, if one unit of relative cell height corresponds to 2 ft, and information exists that areas less than 4 ft above the nearest drainage will be flooded, then the HAND threshold can be set to two units, corresponding to 4 ft.
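The unit scaling described above amounts to a single division; as a hypothetical helper (the function name is ours, for illustration only):

```python
def hand_threshold_units(flood_depth_ft, ft_per_unit):
    """Convert a real-world flood depth into HAND threshold units
    for a DEM normalized so that one unit spans `ft_per_unit` feet."""
    return flood_depth_ft / ft_per_unit

# e.g., 4 ft of expected flooding on a DEM where one unit is 2 ft:
# hand_threshold_units(4, 2) -> 2.0 units
```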

The absolute threshold must be determined by the maximum height to which the user predicts the flood will extend, similar to the HAND threshold but in absolute terms. To generalize parameters to areas for which no prior prediction is available, we recommend examining areas with similar topography and known flood inundation extents; the user can then find HAND parameters that match those known predictions and apply the same parameters to the unknown area.

### *3.2. Evaluation of Parameter Sensitivity*

We evaluated the sensitivity of the HAND parameters in generating the flood extent. Tables 1 and 2 present the resulting flood extents for changes in the model parameters described in the HAND algorithm section above. We tested several drainage threshold (100 K, 500 K, 1 M, and 2 M cells), HAND threshold (2, 3, 4, and 6 units), and absolute elevation threshold (220 and 225 ft) values. As seen from the resulting flood extents, higher HAND threshold values cause larger areas around drainage points to be flooded. A higher drainage threshold, on the other hand, reduces the number of drainage points, which causes less branching in the predictions. A higher absolute threshold allows more high-elevation areas to be flooded. These parameters have a significant impact on the resulting map details and accuracy. The drainage threshold has an optimal value of 4 square kilometers, as supported by the literature [33]; however, it can be further calibrated if reference maps are available for the selected region. The HAND threshold represents the water level at the outlet of the selected watershed; its value can be assigned using visual inspection, a water level gauge, forecast models, or historical data from a nearby gauge.


**Table 1.** Sensitivity of model parameters (HAND threshold, drainage threshold) on the resulting flood extent (absolute threshold 220 ft).


**Table 2.** Sensitivity of model parameters (HAND threshold and drainage threshold) on the resulting flood extent (absolute threshold 225 ft).


*3.3. Real-Time HAND Generation Performance*

In this study, we evaluated the computational performance of client-side flood map generation using the HAND model and WebAssembly. We selected different DEM resolutions (1, 5, 10, 25, 50, and 100 m) and grid sizes (1, 4, 9, 16, and 25 million cells) for evaluation. Based on the selected elevation data resolution and grid size, the generated maps represent city, county, and state scales in coverage (Table 3). Map generation took roughly 0.25–6.25 s on a moderate personal computer. Higher DEM resolutions could be used for any scale based on data availability and computational resources. The client-side WebAssembly implementation could be further sped up by using Web Workers to distribute the computation across multiple CPU cores on the same machine.


**Table 3.** Computing time for generating flood extent in different scales and resolutions.

Terrain Modification and Mitigation Scenarios: The computational speed and the ability to start the mapping process from a bare elevation dataset are critical benefits of the presented system for what-if scenario analysis in flood mitigation. Users can modify the terrain for potential flood preparedness and mitigation applications (i.e., adding flood walls, elevating structures, and sandbagging critical infrastructure) and generate the resulting map within seconds to evaluate the mitigation measure. The system can use base or modified terrain data as input and generate the corresponding flood maps. Users can evaluate mitigation options based on the resulting flood maps and how much area is saved from flooding.

A sample case study was generated for Cedar Rapids, Iowa, and the resulting maps before and after terrain modification can be seen in Figure 5. The mitigation impact of the levee on the resulting flood is visible in the figure: a significant area of the urban setting could be saved from flooding by installing a levee on the west side of the river. The system can thus be used for what-if scenario evaluation through simple terrain modifications. While this analysis could take hours to days on traditional flood mapping applications, modifying the terrain and generating the resulting maps takes under a minute in the proposed system.

**Figure 5.** Maps generated for Cedar Rapids, Iowa before (**left**) and after (**right**) mitigation using levees.

Model Generalization: The operational use and generalization capability of any flood map generation method depend heavily on data availability and quality. While high-resolution DEM data (e.g., 1 m or 3 m) are not generally available around the world, many global elevation datasets [46] exist with moderate resolution (i.e., 30 m) and data quality. We tested the HAND model at Cedar Rapids, Iowa (Figure 6) with lower-resolution elevation datasets (i.e., 10 m, 25 m, and 50 m) to understand the performance of the proposed system under limited data availability and quality. The resulting maps at 25 m or 50 m resolution were relatively similar in flood map prediction quality to maps based on the 10 m resolution dataset.

**Figure 6.** Maps generated for Cedar Rapids, Iowa with HAND model using 10 m, 25 m, and 50 m (left to right).

The low-resolution elevation datasets caused a small percentage (<5%) of areas to be over- or underestimated compared to the high-resolution maps, due to changes in terrain connectivity affecting the flow direction calculations. This issue could be further reduced during resampling or surveying when the different-resolution elevation datasets are created. The proposed HAND model includes terrain conditioning and clean-up (i.e., pit removal and flat resolution) during flow direction generation. Low-resolution elevation datasets can be further improved externally before being connected to the real-time system for more accurate results.

### **4. Conclusions**

Real-time flood map generation is critical for operational activities in flood preparedness, mitigation, and response. Traditional flood mapping models require extensive data and computational resources. In this study, we presented real-time flood map generation in client-side web systems using the HAND model, with results comparable to FEMA-based reference maps. The height above nearest drainage model computes the vertical distance from each cell to its nearest drainage. Testing the HAND model in two key regions of Iowa demonstrated that it can generate flood inundation predictions that are both accurate and fast: the generated predictions matched precomputed FEMA-approved flood insurance rate maps with acceptable accuracy within several seconds. Previous papers and applications of HAND focused on its minimal use of external datasets, which allows prediction of soil conditions and flood inundation extent in areas with little to no data aside from elevation datasets. The main contribution of our development is the ability of floodplain managers and end-users to add features to the terrain, such as levees and reservoirs, and examine how they affect the extent of flooding. Since users can modify the terrain and see the result in real time, they can better understand how flood mitigation measures will affect flood inundation extents and can prepare for flooding in time with more accurate and relevant information. Future work can include an interactive user interface for parameter selection, with potential automation and algorithms to select parameters that reflect real-world scenarios.

**Author Contributions:** Conceptualization, I.D.; methodology, A.H. and I.D.; software, A.H. and I.D.; validation, A.H.; formal analysis, A.H.; investigation, A.H. and I.D.; resources, A.H. and I.D.; data curation, A.H.; writing—original draft preparation, A.H. and I.D.; writing—review and editing, A.H. and I.D.; visualization, A.H. and I.D.; supervision, I.D.; project administration, I.D.; funding acquisition, I.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by Secondary Student Training Program at the University of Iowa.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data used in this study is publicly available from Federal Emergency Management Agency, Iowa Flood Center and Iowa Department of Natural Resources.

**Acknowledgments:** The floodplain maps used in the analysis were acquired from the statewide mapping project at the Iowa Flood Center.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Flood Risk Communication Using ArcGIS StoryMaps**

**Khalid Oubennaceur \* , Karem Chokmani, Anas El Alem and Yves Gauthier**

Institut national de la recherche scientifique-Centre Eau Terre Environnement, 490 De la Couronne Street, Quebec City, QC G1K 9A9, Canada; Karem.chokmani@inrs.ca (K.C.); Anas.ELAlem@inrs.ca (A.E.A.); Yves.Gauthier@inrs.ca (Y.G.)

**\*** Correspondence: khalid.oubennaceur@inrs.ca; Tel.: +1-418-999-0861

**Abstract:** In Canada, flooding is the most common and costly natural hazard. Flooding events significantly impact communities, damage infrastructure, and threaten public security. Communication, as part of a flood risk management strategy, is an essential means of countering these threats. It is therefore important to develop new and innovative tools to communicate flood risk to citizens. From this perspective, story maps can be implemented very effectively for a broad audience, and particularly for stakeholders. This paper details how an interactive web-based story map was set up to communicate current and future flood risks in the Petite-Nation River watershed, Quebec (Canada). This web application combines informative texts and interactive maps on current and future flood risks in the Petite-Nation River watershed. Flood risk and climate maps were generated using the GARI tool, implemented using a geographic information system (GIS) supported by ArcGIS Online (Esri). Three climate change scenarios developed by the Hydroclimatic Atlas of Southern Quebec were used to visualize potential future impacts. This study concluded that our story map is an efficient flood hazard communication tool. The assets of this interactive web mapping tool are numerous, namely user-friendly mapping, ease of use and interaction, and customizable displays.

**Keywords:** story maps; disaster risk reduction; flood risk; slide; GARI tool; risk communication; climate change

### **1. Introduction**

Flooding is the most common and costly natural hazard affecting Canadians today [1]. Floods occur throughout the country, but most of the worst flooding events have happened in southern regions, affecting hundreds of thousands of people and costing billions of dollars in damage [2]. In addition, scientists expect an increase in flooding events and damage costs in riparian, coastal, and urban areas of the country [3]. Flooding poses a serious risk to households and communities across Canada. In the spring of 2019, for instance, flooding in eastern Canada cost more than CAD 200 million in insured losses and damaged nearly 20,000 properties [4]. Despite such impacts, Canadians are largely unaware of the flood risk they face. According to a survey of 2300 households published in 2016 by the Interdisciplinary Centre on Climate Change and Partners for Action at the University of Waterloo [4], only 6% of Canadians living in a designated flood risk area are aware of the risk, and only 21% believe that the risk of flooding will increase over the next 25 years. Furthermore, Canadians need and want more information on how to be actively engaged in flood management and how to improve their resilience. In this context, efforts are being made to reduce the impacts of flooding events by increasing public awareness through communication of flood hazards and popularization of flood risk reduction measures.

**Citation:** Oubennaceur, K.; Chokmani, K.; El Alem, A.; Gauthier, Y. Flood Risk Communication Using ArcGIS StoryMaps. *Hydrology* **2021**, *8*, 152. https://doi.org/10.3390/hydrology8040152

Academic Editors: Marina Iosub and Andrei Enea

Received: 8 September 2021 Accepted: 7 October 2021 Published: 11 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Communication plays an important role in flood risk management, since it has the power to raise awareness of potential flood hazards, to educate and motivate the population at risk to take preventive actions, and to foster preparedness in case of an emergency. Information tools in flood risk communication aim to inform people about the risks in their residential zone. Various research projects have been conducted to better understand the mechanisms of effective flood hazard communication [5–8]. Mees et al. [9], for example, have demonstrated the need for accessible communication tools to raise awareness of flood and climate change impacts. The use of flood risk maps to show the potential severity of floods and locations at risk has been shown to improve citizen response to floods in the United Kingdom [10]. Flood maps can serve different ends, such as hazard and risk identification and land use planning, and can be used by various users, for instance, to identify which properties should adopt flood-proofing measures and to educate the public about local flood risks. Different forms of flood maps exist, and it is important to distinguish flood hazard maps from flood risk maps. Flood hazard maps [11] show flood depth and water extent for given events, whereas flood risk maps [12] indicate the potential consequences of floods for properties and social, economic, environmental, and cultural assets, along with the overall risk faced by a community under certain flood scenarios. De Moel et al. [13] highlighted that flood hazard maps are the most common type of flood map, followed by flood risk maps.

The use of online interactive communication tools such as story maps to disseminate information is becoming increasingly popular. Such web applications facilitate communication by illustrating complex scientific data in an accessible format for specific target groups. Besides being informative for target users, story maps should also be easy to understand and use. Web-based story maps can be static or dynamic. While static webmaps are based on fixed images, dynamic webmaps can be adjusted to the user's preferences: users can, for example, select different scenarios or zoom in on a desired location. Both map types offer the possibility of choosing the information displayed on the map (e.g., flood hazard maps, flood risk maps, etc.). Recent advances in internet accessibility and speed have led to increased interest in using webmaps as a mode of visual communication.

Story mapping represents a novel mixed-media approach that combines different elements such as maps, videos, images, graphs, and text into a simple online interface. New and innovative GIS tools such as ArcGIS Online and ArcGIS StoryMaps (Esri) are much more than thematic maps; they are a complete and promising means of communication. A major advantage is that story maps can be used by non-GIS users to showcase and popularize the results of scientific research, which is partly why they represent a good option for flood hazard and risk communication. Individuals can be informed of their flood risk level while actually visualizing their houses [14]. Story maps allow mapped flood hazard and risk data to be combined with other information, enhancing both the map's content and the quality of flood hazard and risk communication. The tools available within ArcGIS StoryMaps help create interactions with flood risk information that go beyond what can be achieved with flood risk maps alone. Story maps can illustrate the spatiality of flood risks supplemented with critical information, informing users on numerous flooding-related topics. In Canada, the city of Calgary has created a story map for flood risk and resilience (https://maps.calgary.ca/RiverFlooding/, accessed on 1 September 2021), which contains images, text, flood maps, and testimonies from flood victims. This application allows users to look up their individual address and see whether it lies within the floodway zone. Furthermore, story maps can be used in climate change studies to communicate potential flooding impacts [15–17].

Flood risk maps are one of the visual components of a story map and should be developed with easy-to-understand content, a user-friendly interface, and links to further information [18]. Story map design plays a crucial role in interpretation, so elements such as legends, colours, embedded pop-ups, and technical terminology should be handled carefully to enhance the user experience. For example, flood risk map legends should draw the user's attention by being clear, prominently displayed, simple to understand (low–medium–high risk levels, shades of blue for flood extents, and event labels such as frequent–medium–extreme instead of return-period terminology), and should contain a conservative amount of information [19]. The category classes also have to be comprehensible and readable at first sight. Furthermore, the contrast between the map's background and its informative elements should be optimized to prevent information overload [19].


Given the benefits of story maps to crisis managers and to the public in risk communication, we decided to develop one as a case study for the Petite-Nation River. The main elements of this interactive story map include the flood risk of previously flooded areas, the buildings and streets affected during different flooding events, climate projection maps, and information about past flooding events in the Petite-Nation River watershed. This initiative aimed to assess the potential of story mapping to disseminate scientific data and relevant information on flood management to the general public. For this case study, ArcGIS software by Esri was used alongside ArcGIS Online and the ArcGIS StoryMaps story builder.

### **2. Materials, Methods, and Application Case**

### *2.1. Study Area*

The study area is located in the Petite-Nation River watershed in Quebec, Canada (Figure 1). The watershed covers an area of 2248 km<sup>2</sup>. This sector regularly experiences flood events owing to its location on the Petite-Nation River floodplain. The most recent event occurred in 2019 and affected the municipalities of St-Andre-Avellin and Duhamel; many roads were flooded, and the municipality of St-Andre-Avellin subsequently declared a state of emergency, resulting in the evacuation of approximately 100 people.

**Figure 1.** The Petite-Nation watershed study area.

This study is part of a larger project named FRAPNR (flood risk assessment on the Petite-Nation River watershed), which is carried out by the National Institute of Scientific Research (INRS), in partnership with the local watershed organization (Organisme de bassins versants des rivières Rouge, Petite-Nation et Saumon; OBVRPNS) and six municipalities located within the watershed (Duhamel, Ripon, St-Andre-Avellin, Lac-Simon, Papineauville, and Plaisance). This project aims to improve community response to flooding by increasing the understanding of the impacts of climate change, promoting upstream−downstream solidarity among municipalities, and helping vulnerable communities to be better equipped to deal with climate change-related flooding [20]. For this case study, we designed a story map in collaboration with the partners to address flooding and climate change. To this end, we also produced and included a series of flood maps for the Petite-Nation River watershed.

Meetings with different stakeholders of the Petite-Nation River watershed were held to better identify user needs and to address the best ways in which they could utilize the developed story maps as part of their regular responsibilities. The consulted stakeholders included municipal representatives such as the town manager, community planners, public works manager, emergency planning team, officers of the environment department, and citizens. The aim of these meetings was to take stock of their opinions on current flooding-related issues, assess suitable communication methods, gather their views in regard to the use of story maps, and to assess their potential as flood hazard and risk communication resources.

### *2.2. The GARI Tool*

In order to visualize the extent of flooding events under different climate change scenarios, we developed a series of flood risk maps using GARI [21,22]. The GARI tool contains three modules described as follows:

**Flood mapping:** shows the flooding extent and depth delineation on a basemap. The flood hazard maps used in this case study were produced using the HAND (height above nearest drainage) method. The HAND technique is a low-complexity, terrain-based approach for inundation mapping using elevation data, discharge–height relationships, and stream discharge flows. Producing a HAND map requires a digital elevation model (DEM) and a spatial representation of a region's river network. The HAND technique computes the elevation difference between each land surface cell and the stream bed cell to which it drains. The resulting HAND map contains the vertical distance between a location and its nearest stream [23].
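The HAND principle described above can be illustrated with a minimal sketch; the toy DEM and drainage assignments below are hypothetical values, whereas a production workflow would derive the drainage targets from a flow-direction analysis of the river network:

```python
import numpy as np

def hand(dem, drain_row, drain_col):
    """Height above nearest drainage: the elevation of each cell
    minus the elevation of the stream-bed cell it drains to."""
    return dem - dem[drain_row, drain_col]

# Toy 3x3 DEM (metres); here every cell is assumed to drain to the
# stream cell at (2, 2). Real drainage targets come from a D8
# flow-direction analysis of the DEM and river network.
dem = np.array([[12.0, 11.0, 10.5],
                [11.5, 10.0,  9.5],
                [10.5,  9.0,  8.0]])
drain_row = np.full((3, 3), 2)
drain_col = np.full((3, 3), 2)

hand_map = hand(dem, drain_row, drain_col)

# A cell is mapped as inundated when the water stage above the nearest
# stream bed (from a discharge-height relationship) exceeds its HAND value.
stage = 2.5  # metres above the stream bed for a given discharge
flooded = hand_map <= stage
```

Because HAND only needs a DEM and a drainage assignment, water depth grids for a whole range of discharge scenarios can be produced by re-thresholding the same HAND map with different stage values.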

**Vulnerability of the population:** indicates the vulnerability and potential adverse consequences resulting from flooding events, including social and environmental aspects. Three different shapefiles can be produced using this module: the first presents the level of risk to the population per building (flood exposure), the second contains the flooded and nonflooded critical infrastructure, and the third illustrates the flooded road sections. The vulnerability of the population is estimated and mapped for every building. This module also includes water height at flooded critical infrastructure and flooded road portions. The first layer, which is the exposure of the population to a flood hazard, can be determined by using the characteristics of the buildings in which the citizens are located at the time of the flood. The second layer, which contains information on the building, includes the following information: registration number, civic address, state of the building, maximum water level at building level, socio-economic vulnerability of the residents, exposure to flood hazard, and the estimated final risk level for the residents (low, serious, severe, and very severe). The attributes of the third layer are the road number, name, condition with regard to water level (nonflooded, passable (<25 cm), emergency vehicles only (25 to 50 cm), impassable), and the maximum water level on each flooded road section.
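The road-condition thresholds above amount to a simple classification rule, sketched here as a hypothetical helper (not part of the GARI code):

```python
def road_status(depth_cm: float) -> str:
    """Classify a road section by its maximum water depth (cm),
    following the thresholds described in the text."""
    if depth_cm <= 0:
        return "nonflooded"
    if depth_cm < 25:
        return "passable"
    if depth_cm <= 50:
        return "emergency vehicles only"
    return "impassable"
```

Applying this rule to the maximum water level of each flooded road section yields the condition attribute stored in the third layer.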

**Damage to buildings and infrastructure:** this module generates a map of the estimated damage to each building during a given flood event. The damage is estimated and mapped for each building and expressed in absolute (CAD) or relative (%) terms of the building value. For this purpose, four damage curves, one for each type of residence, covering an area encompassing approximately one thousand buildings, were used. A damage curve indicates the level of damage to a residence as a function of the water height at the main floor level. For each residential building, the software estimates the damage based on the water height in the building, using the curve adapted to its structural characteristics. The total damage for the study area is the sum of each residential building's damage value.
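A minimal sketch of this damage-curve logic follows; the curve values and building data below are hypothetical, whereas the actual GARI curves are calibrated per structural type:

```python
import numpy as np

# Hypothetical depth-damage curve for one residence type: water height
# at the main floor (m) versus relative damage (% of building value).
curve_depth = np.array([0.0, 0.3, 0.6, 1.0, 1.5, 2.0])      # m
curve_damage = np.array([0.0, 8.0, 18.0, 32.0, 45.0, 55.0])  # %

def building_damage(depth_m: float, value_cad: float) -> float:
    """Estimate absolute damage (CAD) for one building by linear
    interpolation on its depth-damage curve."""
    pct = np.interp(depth_m, curve_depth, curve_damage)
    return value_cad * pct / 100.0

# Total damage for the study area is the sum over residential buildings:
# (water height at main floor in m, building value in CAD).
buildings = [(0.45, 250_000.0), (1.2, 300_000.0), (0.0, 200_000.0)]
total = sum(building_damage(d, v) for d, v in buildings)
```

In practice, each building would be matched to the curve corresponding to its structural characteristics before interpolating.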

The outputs of each module of the GARI tool are summarized in Figure 2. The GARI modelling outputs were used to develop the story map presented in this case study: they were exported as GIS layers in ArcMap, uploaded to ArcGIS Online as image services, and finally incorporated into the story map.

**Figure 2.** Output information for all three modules of the GARI tool.

### *2.3. Available Data*

Different types of datasets were selected to set up the flood and climate risk story map of the Petite-Nation River watershed (Table 1). The data falls under two main components of flood risk: hazard and vulnerability (Figure 3). Flood risk is defined as a function of flood hazard and the vulnerability of communities to a flood with a particular return period.

**Figure 3.** Flood risk as the interaction of hazard and vulnerability, source:(http://www.prim.net, accessed on 1 September 2021).

The field values (attributes) and the corresponding variable format of the vector data used to map the flood risk and climate change are shown in Table 2. The raster data corresponds to the depth of water in the pixel.


**Table 1.** Data used for the construction of the story map.


**Table 2.** Field values and variable format used to develop the story map.

### *2.4. Generation of Flood Risk and Climate Change Maps*

### 2.4.1. Flood Risk Maps

Nine flood risk maps were developed based on the scenarios shown in Table 3. These maps were generated by combining the flood hazard maps with population vulnerability maps. The flood hazard was obtained through the HAND method and consists of raster files (water depth grids) associated with different discharges. The estimated discharge values for the Petite-Nation River watershed correspond to the discharge at the hydrometric station located in Ripon, and the threshold discharge values are defined by Quebec's Ministère de la Sécurité Publique (department of public security). The water depth grids were gradually transformed into a format suitable for ArcGIS Online (feature or image service) and included in the story map. The population vulnerability was based on the flood exposure vector layer and the flooded road network generated by the vulnerability-of-the-population module of the GARI tool. Combining hazard and vulnerability allowed a complete and relevant estimate of the flood risk to be obtained; the flood risk was estimated and mapped for each residential building in the study area.
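The hazard-vulnerability combination behind these maps can be sketched as follows; the exposure classes, vulnerability scores, and combination rule here are illustrative assumptions, not GARI's actual rules, although the output levels match those used in the story map:

```python
RISK_LEVELS = ["not at risk", "low", "serious", "severe", "very severe"]

def exposure_class(depth_m: float) -> int:
    """Hazard exposure at a building, sampled from the water depth grid
    (0 = outside the flood extent, 1 = shallow, 2 = deep)."""
    if depth_m <= 0.0:
        return 0
    return 1 if depth_m < 0.5 else 2

def risk_level(depth_m: float, vulnerability: int) -> str:
    """Combine hazard exposure (0-2) with a socio-economic
    vulnerability score (0-2) into a per-building risk level."""
    exp = exposure_class(depth_m)
    if exp == 0:
        return RISK_LEVELS[0]
    return RISK_LEVELS[min(exp + vulnerability, 4)]
```

Evaluating such a rule for every residential building, using the water depth grid of a given discharge scenario, produces the per-building risk attribute shown on the corresponding flood risk map.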

### 2.4.2. Climate Change Maps

In addition to the flood risk maps, three maps based on climate change projections were created. The climate change projections were derived from the Hydro-Climatic Atlas of Southern Quebec, based on the Coupled Model Intercomparison Project phase 5 (CMIP5) [24], under the pessimistic Representative Concentration Pathway scenario RCP 8.5. This scenario was selected because it is the one most commonly employed for impact assessments. The details of these climate projections are summarized in Table 4. The projections are based on estimated conditions for the 2050 and 2080 horizons. Comparing present and future flood hazard and vulnerability will help stakeholders better understand the potential impact of climate change on flood risk in the Petite-Nation River watershed. These climate change maps were created for the two most populated municipalities of the Petite-Nation River watershed (Ripon and St-Andre-Avellin) and included projected flood hazard and monetary damage layers for the baseline, 2050, and 2080 horizons. The monetary damage was estimated using the GARI software (damage to buildings and infrastructure module).


**Table 3.** The nine scenarios for the Petite-Nation River watershed with their corresponding discharges expressed in m<sup>3</sup>/s and return periods in years.

**Table 4.** Climate change projections for the Petite-Nation River watershed using a return period of 20 years for the baseline and the 2050 and 2080 horizons (RCP 8.5).


### *2.5. Story Map Template*

The geospatial web technology selected for this study is the ArcGIS StoryMaps application. Stories can be created using several ready-made designs that require no coding, or creators can develop their own StoryMap designs [25]. A story map contains a slide for each location in the story; a location can be inserted by performing a text search for its name, address, etc. When viewing the story map slides, the sidebar on the left helps the user follow the narrative, while the majority of the screen is taken up by a map that can be panned and zoomed. Several story mapping templates are included in the ArcGIS StoryMaps application, namely Story Map Tour, Journal, Cascade, Shortlist, Swipe, Basic, and Series [26]. The template used in this study is from the StoryMap Series (Figure 4).

**Figure 4.** Example of a story map developed with ArcGIS StoryMap Series—Title: Calgary's River Flood Story (https://maps.calgary.ca/RiverFlooding/, accessed on 1 September 2021).

The StoryMap Series allows a series of maps to be assembled via tabs, numbered bullets, or an expandable side accordion using an interactive builder. The pages of the story (called map sheets) can be designed using a template in which each page shows a particular map. In addition to maps, images, and videos, web content can also be added to create a narrative and engage the audience. For the Petite-Nation River case study, we selected the tabs template (https://arcg.is/OvP98, accessed on 1 September 2021).

### *2.6. Story Map Construction*

The first step of the story map construction process was to structure the geospatial data and perform their spatial and descriptive analyses. The ArcGIS platform was used to collect, analyze, and visualize the data. The vector data are mainly shapefiles of flood risk exposure (polygons), damage to buildings (polygons), and damage to the road network (polylines), while the raster data represent water depth. A geodatabase was created in ArcMap 10.7.1 (Esri) to store the information layers that would also appear on the online story map. Each information layer hosts one type of spatial information along with the fields necessary for the descriptive information (Table 5). The collected files (vector and raster) were then uploaded to the ArcGIS Online platform in order to construct the online map (webmap). These files were gradually transformed into a format suitable for online use (feature or image services): vector data were zipped and imported directly into ArcGIS Online, while raster data were converted into image services and then imported. Images were uploaded to Flickr in order to be compatible with the accepted template formats. Layer parameters (e.g., symbols, transparency, classes, tag display, and pop-up menus) were then defined, and specific legends and symbols were created for every data type. The legend used for flood risk mapping includes three components: water depth, risk exposure of buildings, and risk exposure of the road network. For water depth, a blue color palette was used. For the building and road risk data, different colors were used to differentiate building status (from green for unaffected buildings to red for buildings with a flooded basement and first floor), and a similar categorization and symbolization was applied to roads (from green for nonflooded segments to red for flooded roads open to emergency vehicles only).


**Table 5.** Story map slide design.



The transparency of the raster data was set at 50%. The narrative text combined with the maps was kept as brief as possible, especially for the map slides. For each webmap, we included a short text explaining the concepts of discharge value, return period, and the thresholds set by the public security department. Only the most important information and attributes (Table 2) are displayed to the user to avoid clutter and information overload. Finally, two basemaps available from ArcGIS Online (from Esri's basemap gallery) were assigned as backgrounds: Google Earth webmaps were used for the flood risk scenarios and topographic webmaps for the climate change scenarios. Thematic maps were created in ArcGIS Online based on the collected data.

The story map presented in this paper is composed of 17 slides (or tabs) through which users can browse. Each slide is divided in two: the left part contains the narrative text, photos, and tables, while the right part, the main frame, displays the webmaps, scenes, or multimedia content. The first four slides provide a general description of the project and the study area, the following nine slides contain the flood risk maps, and the next three slides present the climate change projections (Table 5). Layout and content varied minimally between slides. Table 5 illustrates the content and design of the seventeen story map slides.

### **3. Results**

### *3.1. Online Flood Risk Story Maps*

The information contained in this story map was organized to present the flood risk over the Petite-Nation River watershed based on increasingly severe discharge scenarios (slides 6−14). As the discharge increases, the risk, hazard, and number of flooded buildings and roads also increase. A total of nine scenarios were defined, corresponding to nine discharge values ranging from 96 m<sup>3</sup>/s (2-year return period) to 213 m<sup>3</sup>/s (the maximum flow recorded in the spring of 2019; Table 3). The selected discharge values cover the full range of flood conditions in the study watershed (https://arcg.is/OvP98, accessed on 1 September 2021).

Figure 5 provides representative screenshots of slides 6, 9, and 14. These slides correspond to minor, medium, and major discharges: 96 m<sup>3</sup>/s (2 years), 139 m<sup>3</sup>/s (20 years), and 213 m<sup>3</sup>/s (up to 200 years), respectively. On the left side of each of these slides, users can find information about the discharge value and return period used in the scenario, with its threshold, and, just below, the legend used for the flood risk map. This legend contains the intensity of the water depth, the building exposure status, and the road status in terms of flooding. On the right side of the slide, users can explore and visualize the web flood risk map corresponding to the given scenario. With simple clicks, users can zoom in and out, pan the map, and identify the status (through the displayed color) and address of a building, or the status and length of a flooded road segment. The colors on this map show areas at risk of river flooding. The high resolution of the maps would allow decision makers and emergency responders to take action at specific locations by knowing the intervention context.

**Figure 5.** Screenshot of slides 6 (**A**: Scenario 1), 9 (**B**: Scenario 4) and 14 (**C**: Scenario 9).

### *3.2. Climate Projection Online Story Maps*

By adjusting the flood frequency analysis according to climate change projections and combining the associated inundation depth maps with the damage curves for residential buildings and roads, we estimated the change in flood damage, expressed as a percentage of the property value, for the 2050 and 2080 horizons. Figure 6 shows representative screenshots of slides 14, 15, and 16, which address the projected flood hazard and damage, based on 20-year return periods, for the municipalities of St-Andre-Avellin and Ripon under the current and future horizons, assuming no improvements or modifications to the existing flood protection network.

**Figure 6.** Screenshot of slides 14 (**A**: current, 20-year return period), 15 (**B**: 2050, 20-year return period) and 16 (**C**: 2080, 20-year return period) (https://arcg.is/OvP98, accessed on 1 September 2021).

The resulting webmaps illustrate flood damage at the building scale, combined with information on water depth. When navigating the slide of a given climate scenario, the user can explore and visualize the static risk map corresponding to that scenario by zooming in on a desired location and clicking on a grid cell to view the projected change in flood hazard and damage. Information about the climate change scenario, the return period, and the RCP used in the slide, as well as the legend (water depth, building damage expressed in percentage, and road status), is displayed on the same screen to the left of the map.

### **4. Discussion**

In this work, we developed a story map application for the Petite-Nation River watershed based on flood risk and climate maps combined with informative texts. The study has been instructive in assessing the potential of this web technology, its limitations, and the design choices that critically influence its effectiveness.

The decision to use the StoryMaps application for this work was based on its numerous advantages compared with traditional methods:


However, the present state of this story map is affected by some limitations. At present, the application is a simple mapping tool with limited interactivity and functionality, but with good readability and adequate explanations for citizens. In general, the information communicated to citizens by flood-related authorities has to be adapted to the needs and requirements of the users to be as effective as possible. Currently existing flood risk maps still lack a good balance between simplicity and complexity. The best way to communicate flood risk to the public is to use real-time information, such as gauge levels: gauge levels are easier to understand than return periods and allow the population at risk to compare a current flood situation with past events. Real flood events have a much stronger influence on flood risk awareness and perception than flood events derived solely from hydraulic modelling. In addition, the information communicated via these maps includes water depths for different return periods but lacks information about floodplain extents, even though, in the majority of webmapping tools around the world, the extent of floods or a floodplain map is the basis and the most widely used information. Moreover, risk communication has to be easily comprehensible to the people at risk, which means avoiding technical terminology and statistical terms such as return periods.

### **5. Conclusions**

In this study, a story map is proposed as a new method for communicating current flood risks and flood risk projections based on climate change scenarios. This proposition stems from the need to better popularize and communicate flood-related information to citizens. The method was applied to the Petite-Nation River watershed, which is regularly flooded. Geospatial data (vector and raster) were used and thematic maps were created depicting the flood risk under current conditions and climate change projections. These maps were the foundation of the story mapping process. The final story map was composed of slides addressing the study area, background information on flooding, interactive risk maps, and some context on the project and the development team.

New enhancements could further improve the StoryMap application developed in this study. For example, the addition of dynamic functions such as query formulation, search buttons, or data selection would improve the user experience. Information about floodplain extents, up-to-date water levels and forecasts (gauge levels or web cameras), and pictures of past events could also be added. Displaying the uncertainties related to the displayed values and the projections used would increase transparency, allowing users to better interpret the data. Statistics such as the number of flooded areas and the affected streets and buildings could also be relevant information to include. Finally, the story map could be enriched by embedding hyperlinks to relevant articles and websites that would support and provide opportunities for people to investigate and learn about their own flood risk situation.

If story maps were to be optimized for decision makers, such users could benefit from the addition of more indicators, such as socio-economic vulnerability. The quality of the information displayed in the story map could be enhanced by enabling dynamic flood forecasting using real-time data from the hydrometric station. A warning system could even be added, leading to faster response times. Webmaps showing flood forecasts are similar in form to the present webmaps, but their emphasis is on the future, and they also need to explain the analysis on which the predictions are based. The combination of flood hazard maps with real-time information on gauge levels is made possible by directly linking gauge stations with the flood forecast center.

Since the objective of this case study was to test the potential of story maps as a means of communication to citizens and decision makers residing in flood-prone areas, this story map focused on illustrating flood risk and future projections, while providing the information necessary to understand and interpret the maps. The use of specialized software developed by Esri, namely ArcMap, ArcGIS Online, and ArcGIS StoryMaps, has proven to be an efficient and effective way to develop the flood-related story map.

**Author Contributions:** Conceptualization, K.O.; methodology, K.O. and K.C.; writing—original draft preparation, K.O.; writing—review and editing, K.O., A.E.A.; visualization, Y.G., K.C.; validation, K.C., Y.G.; supervision, K.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Municipalities for Climate Innovation Program led by the Federation of Canadian Municipalities, grant number MIC-15557.

**Data Availability Statement:** The data presented in this study are available on request from the first author Khalid Oubennaceur.

**Acknowledgments:** This study was carried out by the National Institute of Scientific Research (INRS), in partnership with the local watershed organization (Organisme de bassins versants des rivières Rouge, Petite-Nation et Saumon; OBVRPNS) and six municipalities located within the watershed (Duhamel, Ripon, St-Andre-Avellin, Lac-Simon, Papineauville, and Plaisance), under the project entitled "Tool: Assess flood risk and develop sustainable management plans".

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Hydrology* Editorial Office E-mail: hydrology@mdpi.com www.mdpi.com/journal/hydrology
