**GIS Based Hybrid Computational Approaches for Flash Flood Susceptibility Assessment**

**Binh Thai Pham <sup>1,\*</sup>, Mohammadtaghi Avand <sup>2,\*</sup>, Saeid Janizadeh <sup>2</sup>, Tran Van Phong <sup>3</sup>, Nadhir Al-Ansari <sup>4,\*</sup>, Lanh Si Ho <sup>5,\*</sup>, Sumit Das <sup>6</sup>, Hiep Van Le <sup>1</sup>, Ata Amini <sup>7</sup>, Saeid Khosrobeigi Bozchaloei <sup>8</sup>, Faeze Jafari <sup>2</sup> and Indra Prakash <sup>9</sup>**


Received: 11 January 2020; Accepted: 27 February 2020; Published: 2 March 2020

**Abstract:** Flash floods are one of the most devastating natural hazards; they occur within a catchment (region) where the response time of the drainage basin is short. Identification of probable flash flood locations and development of accurate flash flood susceptibility maps are important for proper flash flood management of a region. With this objective, we proposed and compared several novel hybrid machine learning approaches for flash flood susceptibility mapping, namely AdaBoostM1 based Credal Decision Tree (ABM-CDT), Bagging based Credal Decision Tree (Bag-CDT), Dagging based Credal Decision Tree (Dag-CDT), MultiBoostAB based Credal Decision Tree (MBAB-CDT), and a single Credal Decision Tree (CDT). These models were applied to a catchment in Markazi Province, Iran. About 320 past flash flood events and nine flash flood influencing factors, namely distance from rivers, aspect, elevation, slope, rainfall, distance from faults, soil, land use, and lithology, were considered and analyzed for the development of flash flood susceptibility maps. A correlation based feature selection method was used to validate and select the important factors for flash flood modeling. Based on this feature selection analysis, only eight factors (distance from rivers, aspect, elevation, slope, rainfall, soil, land use, and lithology) were selected for the modeling, of which distance from rivers is the most important factor for flash flood modeling in this area. Performance of the models was validated and compared using several robust metrics, such as statistical measures and the Area Under the Receiver Operating Characteristic curve (AUC). The results of this study suggest that ABM-CDT (AUC = 0.957) has the best predictive capability in terms of accuracy, followed by Dag-CDT (AUC = 0.947), MBAB-CDT (AUC = 0.933), Bag-CDT (AUC = 0.932), and CDT (AUC = 0.900), respectively. The methods proposed in this study would help in the development of accurate flash flood susceptibility maps of watershed areas not only in Iran but also in other parts of the world.

**Keywords:** machine learning; flash flood; GIS; Iran; decision trees; ensemble techniques

#### **1. Introduction**

Flash floods are events in which the water level rises rapidly within a few hours of heavy rainfall. They are among the most common and severely devastating natural hazards, causing significant damage to infrastructure and the socioeconomy and, most importantly, loss of lives [1–5]. Globally, more than 5000 people die each year due to flash flood events, which is about four times greater than any other category of flood event [6]. The destructive nature of these events is generally related to an extreme amount of torrential rainfall within a short duration, which results in high surface runoff [4,7]. Flash floods occur within catchments where the response time of the drainage basin is short. According to the American Meteorological Society, flash flood events generally do not give advance warning; they therefore cause significant risk and destruction due to their complex and dynamic environmental settings and nature [8,9].

Flash flood occurrence is affected by various watershed characteristics (type of basin and drainage), anthropogenic activities (land use, deforestation, and civil engineering construction), and meteorological conditions such as the amount, intensity, spatial distribution, and timing of rainfall. Climate change has recently been altering meteorological conditions, which may lead to flash floods in one place and drought in another. Therefore, the past may no longer be a reliable guide to the future. Thus, in the planning of flood management, especially for flash floods in urban areas, the effects of climate change must be properly considered to avoid future damage to property and loss of life [10,11].

Geomorphological changes due to natural and anthropogenic causes can modify the flood pattern of different areas [12]. Urbanization is one of the important factors in the occurrence of flash floods in cities. Construction of roads and buildings reduces permeable areas and increases sealed (impermeable) surfaces, so that the same amount of rainfall produces less infiltration and more runoff, causing pluvial flash floods [10]. Therefore, it is essential to accurately identify and map flash flood susceptible areas within a basin, considering appropriate factors, in order to develop suitable models for proper planning, management, and mitigation of flash flood events in an area [13].

Many natural and anthropogenic factors affect flood occurrence. Among these factors, topography (land surface slope, river longitudinal profile, and river cross section) is one of the important elements that affect natural floods [14]. Flood parameters are very sensitive to changes in topography. Low areas adjacent to rivers and streams have the highest risk of flooding. However, flash floods can also occur on hill slopes. A Digital Elevation Model (DEM) represents the earth's surface and contains information about its elevation. Flood depth and velocity are the most important parameters used in vulnerability assessment and in the estimation of casualties and financial losses based on the land record [14]. Therefore, careful consideration of the topography of the area is desirable to avoid overestimation or underestimation of financial losses and casualties, and thus of the overall vulnerability of an area [15,16].

Nowadays, multidisciplinary approaches including remote sensing, Geographic Information System (GIS), and machine learning methods are used for effective prediction and management of floods [5,6,12,17–19]. DEMs and other remote sensing satellite images have become popular and useful tools for recognizing and delineating flash flood susceptible areas [20,21]. Bui and Hoang [22] grouped flash flood studies into three major classes, namely rainfall-runoff models, traditional methods, and pattern classification. Rainfall-runoff models generally focus on establishing the relationship between rainfall and runoff in order to determine the spatiotemporal distribution of floods at a local scale [23]. The traditional methods include analysis of long-term time series data and various statistical models [22]. The main difficulty in predicting flash flood probability with these two classes of methods is the lack of reliable long-term time series discharge records. The third class, based on pattern classification, is relatively new; it uses data monitored at gauging stations, together with data on flooded and nonflooded locations, to assess the flash flood probability of a region and to demarcate the areas where flash floods can occur [24,25].

Independent simplified decision-making techniques such as Analytical Hierarchy Process (AHP) [5,26–29], Fuzzy Logic (FL) [30,31], and Frequency Ratio (FR) [32,33] are some of the pattern classification-based methods which have been used to generate the flash flood maps around the world. Though these methods are simple, they do not provide a great level of accuracy in flash flood prediction in comparison to modern and advanced machine learning methods such as Support Vector Machine (SVM) [34,35], Artificial Neural Network (ANN) [36–38], Logistic Regression (LR) [39], GARP and QUEST [40], and Random Forest (RF) [41]. In recent years, some hybrid and ensemble machine learning methods such as Hybrid Bayesian Framework [24], Logistic Model Tree with Bagging Ensembles [42], Ensemble Weight-of-Evidence and Support Vector Machines [43], and Neuro-Fuzzy system integrated with Meta-Heuristic Algorithms [44] have been developed which provide better accuracy in comparison to single machine learning methods.

The main objective of the present study is to use GIS based hybrid computational approaches to develop ensemble models for accurate flash flood susceptibility assessment. To this end, four hybrid ensemble models for flash flood prediction were developed with the Credal Decision Tree (CDT) as the base classifier. These ensemble models are AdaBoostM1 based CDT (ABM-CDT), Bagging based CDT (Bag-CDT), Dagging based CDT (Dag-CDT), and MultiBoostAB based CDT (MBAB-CDT). A small watershed in Tafresh County in the Markazi Province of Iran, which experiences many flash floods every year, was selected as the study area for collecting and generating the datasets for the modeling process. To validate and compare the performance of the models, various methods such as statistical measures and the Area Under the Receiver Operating Characteristic curve (AUC) were used.
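As an illustration of the AUC metric used to compare the models, the following minimal Python sketch computes AUC from its rank interpretation: the probability that a randomly chosen flood location receives a higher susceptibility score than a randomly chosen non-flood location, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration only; they are not data or code from this study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC for binary labels (1 = flood, 0 = non-flood).

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    flood/non-flood pairs in which the flood point has the higher
    score, counting tied scores as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flood and one non-flood sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two flood points and two non-flood points.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of flood and non-flood locations. The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred points; a sort-based variant would be preferable for larger datasets.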

#### **2. Materials and Methods**

#### *Description of the Research Area*

The watershed of Tafresh County is one of the flash flood-prone areas of Markazi Province, Iran. The county covers an area of 1605 km<sup>2</sup> and lies between 34°31′ N and 35°5′ N and between 49°30′ E and 50°9′ E (Figure 1). The topography of the Tafresh watershed is hilly, with elevation ranging from 1296 to 3101 m. The area experiences cold winters and relatively moderate summers. The average temperature is 19.2 °C in summer and 6.4 °C in winter. Average annual rainfall in this region is 254.3 mm. Major water supply sources in the Tafresh watershed include springs, the perennial GharehChay River, the seasonal Ab Kamar River, and semi-deep wells. The GharehChay River, with a discharge of 3000 L s<sup>−1</sup>, is one of the most important rivers in the area and provides water for irrigation in the Tafresh area, but due to droughts in recent years its discharge has fallen below 2000 L s<sup>−1</sup>. Nevertheless, several severe flash floods occur in the Tafresh watershed every winter due to sudden heavy rainfall within a short period.

**Figure 1.** Location of study area.
