*Article* **Flash-Flood Potential Mapping Using Deep Learning, Alternating Decision Trees and Data Provided by Remote Sensing Sensors**

**Romulus Costache 1,2, Alireza Arabameri 3,\*, Thomas Blaschke 4, Quoc Bao Pham 5,6,\*, Binh Thai Pham 7, Manish Pandey 8,9, Aman Arora 10, Nguyen Thi Thuy Linh 11,12 and Iulia Costache <sup>13</sup>**


**Abstract:** Remote sensing sensors play an increasingly important role in the monitoring and evaluation of natural hazard susceptibility and risk. The present study aims to assess the flash-flood potential in a small catchment in Romania, using information provided by remote sensing sensors and Geographic Information Systems (GIS) databases as input data for four ensemble models. In a first phase, with the help of high-resolution satellite images from the Google Earth application, 481 points affected by torrential processes were acquired, and another 481 points were randomly positioned in areas without torrential processes. Seventy percent of the dataset was kept as training data, while the other 30% was assigned to the validating sample. Further, in order to train the machine learning models, information regarding the 10 flash-flood predictors was extracted at the training sample locations. Finally, the following four ensembles were used to calculate the Flash-Flood Potential Index across the Bâsca Chiojdului river basin: Deep Learning Neural Network–Frequency Ratio (DLNN-FR), Deep Learning Neural Network–Weights of Evidence (DLNN-WOE), Alternating Decision Trees–Frequency Ratio (ADT-FR) and Alternating Decision Trees–Weights of Evidence (ADT-WOE). The models' performances were assessed using several statistical metrics. In terms of sensitivity, the highest value (0.985) was achieved by the DLNN-FR model, while the lowest (0.866) was obtained by the ADT-FR ensemble. Moreover, the specificity analysis shows that the highest value (0.991) was attributed to the DLNN-WOE algorithm, while the lowest value (0.892) was achieved by ADT-FR. During the training procedure, the models achieved overall accuracies between 0.878 (ADT-FR) and 0.985 (DLNN-WOE). The K-index shows again that the best-performing model was DLNN-WOE (0.97).
The Flash-Flood Potential Index (FFPI) values revealed that the surfaces with high and very high flash-flood susceptibility cover between 46.57% (DLNN-FR) and 59.38% (ADT-FR) of the study zone. The use of the Receiver Operating Characteristic (ROC) curve for results validation highlights the fact that FFPI<sub>DLNN-WOE</sub> is characterized by the most precise results with an Area Under Curve of 0.96.

**Keywords:** flash-flood potential index; remote sensing sensors; bivariate statistics; deep learning neural network; alternating decision trees; ensemble models

**Citation:** Costache, R.; Arabameri, A.; Blaschke, T.; Pham, Q.B.; Pham, B.T.; Pandey, M.; Arora, A.; Linh, N.T.T.; Costache, I. Flash-Flood Potential Mapping Using Deep Learning, Alternating Decision Trees and Data Provided by Remote Sensing Sensors. *Sensors* **2021**, *21*, 280. https://doi.org/10.3390/s21010280

Received: 6 November 2020 Accepted: 22 December 2020 Published: 4 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

In recent decades, climate change and its related phenomena, e.g., flash floods, have had significant negative effects worldwide on both human society and the environment [1]. Extreme rainfall, extreme river discharge values, and therefore flash-flood risk, show a continuously increasing trend [2]. This trend is also confirmed by the large amount of damage that flash floods cause worldwide. Consequently, an increasing number of studies addressing flash-flood susceptibility can be observed in the literature [3–6]. Moreover, the estimation of flood risk and vulnerability has become an essential and mandatory procedure that should be included in the Flood Risk Management strategy [7]. In this regard, Geographic Information Systems (GIS) and Remote Sensing (RS) techniques are the necessary tools that facilitate the spatial modelling and mapping of flash-flood susceptible areas. It is worth emphasizing the crucial role of remote sensing sensors in the observation campaigns conducted to identify areas already affected by flash-flood processes [8]. Without RS sensors, a correct inventory of the torrential areas that favor the occurrence of flash floods would be impossible. Taking the previously affected areas into account and using them as input data in more advanced techniques, such as machine learning or bivariate statistics, helps to estimate the flash-flood susceptibility within a specific catchment as accurately as possible [9].

In recent years, new techniques and models have been developed by researchers worldwide [10–35]. During the last 6 years, several studies have investigated flash-flood susceptibility by integrating GIS techniques with bivariate statistical models such as frequency ratio [36], weights of evidence [37], statistical index [38], evidential belief function [39], certainty factor [40], or index of entropy [41]. Another category of methods successfully used in this type of study is Multicriteria Decision Making, which includes the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) [42], Analytical Hierarchy Process (AHP) [43], Analytical Network Process (ANP) [44] and VIseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR) [45]. Promising results in terms of flash-flood susceptibility were also provided by machine learning models such as logistic regression [46], naïve Bayes [47], artificial neural network [48], random forest [49,50], support vector machine [51], neuro-fuzzy inference system [52], *k*-nearest neighbor [53] or deep learning neural network [54]. Researchers have also attempted to combine models from the same category or from different categories to generate ensemble algorithms, which are considered much more accurate than the stand-alone ones [55]. Examples include the Fuzzy Unordered Rules Induction Algorithm (FURIA) [3], Bayesian-based machine learning models [9], machine learning and multicriteria decision making ensembles [7], and machine learning and bivariate statistics ensembles [56].

Taking into account the aspects presented above, the main purpose of this research is to estimate the susceptibility to flash floods in the Bâsca Chiojdului river basin, Romania. The estimation of flash-flood exposure is based on data collected using remote sensing sensors and a GIS database, which are used in four ensemble models generated by combining bivariate statistics with deep learning neural networks and alternating decision trees. Specifically, the Frequency Ratio and Weights of Evidence bivariate statistical models are each combined with a deep learning neural network and with alternating decision trees. The construction of the Receiver Operating Characteristic (ROC) curve and the calculation of several statistical metrics ensure the validation of the results and the evaluation of the models' performances. The present study is intended to enrich the scientific literature on flash-flood susceptibility assessment by proposing, for the first time, the above-mentioned combination of four machine learning ensemble models with GIS and remote sensing techniques.

#### **2. Study Area**

The Bâsca Chiojdului river basin in Romania, on which the present research is focused, has a total area of 340 km<sup>2</sup>. The elevation of the basin varies from 242 m to 1463 m, and the average slope angle is 12.3°. It should be noted that 79% of the total area is characterized by slope angles higher than 7° [57]. The circularity ratio, another important feature with a strong influence on flash-flood susceptibility, has a value of 0.46, while the concentration time of the river basin is 7.27 h [36]. The low concentration time indicates a high predisposition of the study area to flash-flood events. Forest vegetation covers 50% of the basin, while in terms of soils, hydrological group B accounts for approximately 41% of the total research area.

The lithology consists mainly of sedimentary rocks belonging to the Paleogene and Cretaceous flysch. The climate has a high degree of continentality and, especially in the warm season, heavy rainfall often leads to severe flash floods. Due to the geographical characteristics of the Bâsca Chiojdului river basin, the socio-economic elements located across its territory have suffered material losses following flash-flood propagation. The most important flash-flood event occurred in 1975, when the discharge of the Bâsca Chiojdului river reached its historical maximum (300 m<sup>3</sup>/s) [57]. More information regarding the main flash floods that occurred across the study area, as well as the damage caused by these phenomena, can be found in the research works of Costache and Zaharia [10], Prăvălie and Costache [57], Costache et al. [38], Zarea and Gheorghe [58], and Prăvălie and Costache [59].

#### **3. Data**

In order to carry out the present study, data consisting of torrential areas polygons and flash-flood predictors were gathered.

#### *3.1. Torrential Area Inventory and Sampling*

The inventory of surfaces previously affected by a specific process is essential for an accurate prediction of the areas where that phenomenon can occur in the future [60]. In the present research work, we consider the torrential surfaces as the spatial indicator of areas with a high susceptibility to flash-flood genesis. In order to identify the areas affected by torrential phenomena as accurately as possible, analysis of the images provided by remote sensing sensors was mandatory. This fact highlights the crucial role that this type of sensor has in the analysis of natural hazard susceptibility. Using the Google Earth imagery, torrential surfaces covering a total of 34 km<sup>2</sup> were delimited. These surfaces were created by the accelerated surface runoff occurring on the slopes. The manner in which they are delineated is described in the study carried out by Costache [61]. According to Costache and Zaharia [8], torrential areas are defined as areas characterized by the unified presence of torrential relief microforms, such as ravines and gullies, which are generated by surface runoff. They are located in the upper part of the river basin, where the absence of vegetation and the steep slopes favor the production of such phenomena. For the present study, a sample of 481 points representing locations where torrential runoff took place was extracted from the entire delimited area. Moreover, another sample of 481 points was placed within the study area, representing points without torrential processes (Figure 1). Both torrential pixels and non-torrential pixels were divided into training (70%) and validating (30%) samples. This division was necessary in order to train the models and then to validate the results regarding the susceptibility to flash floods.
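The 70/30 division described above can be illustrated with a minimal Python sketch. The point identifiers, the `split_points` helper and the random seed are hypothetical and only serve the illustration; the source does not state how the random division was implemented.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a list of sample points and split it into training and validating parts."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for reproducibility
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers: 481 torrential points (label 1) and 481 non-torrential points (label 0).
torrential = [(f"T{k}", 1) for k in range(481)]
non_torrential = [(f"N{k}", 0) for k in range(481)]

# Each class is split separately so that both samples remain balanced.
train_t, valid_t = split_points(torrential)
train_n, valid_n = split_points(non_torrential)
training = train_t + train_n
validating = valid_t + valid_n
```

Splitting each class on its own is a design choice of this sketch: it keeps the proportion of torrential and non-torrential points identical in the two samples.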

**Figure 1.** Study area location.

#### *3.2. Flash-Flood Predictors*

For this study, 10 flash-flood conditioning factors were taken into account. Their main properties are described below.

**Slope angle** was calculated using the Digital Elevation Model (DEM) taken from the Shuttle Radar Topography Mission (SRTM) 30 m database and processed in ArcGIS 10 software. High slope angles increase the water runoff velocity, while low values restrict the occurrence of surface runoff [56]. For the study area, the slope angle map was produced by splitting the range of values into five classes, as follows [12]: <3°; 3°–7°; 7.1°–15°; 15.1°–25°; >25° (Figure 2a).

Another surface runoff predictor is the **Topographic Wetness Index (TWI)**, calculated by DEM processing in SAGA GIS 2.1.0. The algorithm used to calculate this index requires the upslope area of each pixel and the tangent of the slope recorded in the same pixel [53]. The TWI map was generated by partitioning its values into five classes using the *Natural Breaks* method: 3.15–6.1, 6.11–7.78, 7.79–10.21, 10.22–14.5, 14.51–24.59 (Figure 2b).

**Topographic Position Index (TPI)** is a flash-flood predictor that should be included in susceptibility studies because its values express the altitude difference between a specific point and its neighboring area [62]. This morphometric indicator was derived at a spatial resolution of 30 m, and its values, ranging from −20 to 20, were divided into five classes using the *Natural Breaks* method: (−20)–(−3.8), (−3.7)–(−1.1), (−1.1)–1.3, 1.4–4.5, 4.6–20 (Figure 2c).

**Profile curvature** is mainly used to separate the surfaces on which surface runoff accelerates from those on which it decelerates [63]. According to the literature [38], positive profile curvature is characteristic of areas with decelerated water runoff, while negative values indicate surfaces that increase the water runoff velocity. Across the study area, the profile curvature was classified into the following three intervals: (−3)–0, 0.1–0.9, 1–2 (Figure 2d).

The **convergence index** morphometric factor differentiates the areas belonging to the river valleys from those situated along the interfluvial lines. This index, obtained by DEM processing in SAGA GIS 2.1.0, was classified according to the literature: (−99)–(−3), (−2.9)–(−2), (−1.9)–(−1), (−0.9)–0, 0–99 (Figure 2e).

**Stream Power Index (SPI)** is another morphometric factor generated in SAGA GIS 2.1.0, based on the upslope region that drains into a pixel and the tangent of the slope angle [64]. This predictor, which shows the capacity of the river for sediment transport, was mapped using the following classes: <50, 50–500, 501–2000, 2001–5000, >5000 (Figure 2f).

**Slope aspect** (Figure 3a) is the seventh morphometric index taken into account in the present research. The slope orientation has a strong influence on the surface runoff process because the humidity conditions vary with the quantity of solar radiation received [65]. The slope aspect predictor was derived from the DEM.
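As an illustration of the morphometric predictors above, the following Python sketch assigns a slope value to one of the five classes listed in the text and evaluates TWI and SPI for a single pixel. It assumes the commonly used formulations TWI = ln(a / tan β) and SPI = a · tan β, where a is the upslope contributing area and β the slope angle; the source names these two inputs but does not print the formulas, and the exact SAGA GIS implementation may differ. The `min_slope_deg` floor is a hypothetical safeguard, not part of the source.

```python
import math


def slope_class(slope_deg):
    """Return the slope class (1 to 5) using the class limits listed in the text."""
    if slope_deg < 3:
        return 1
    if slope_deg <= 7:
        return 2
    if slope_deg <= 15:
        return 3
    if slope_deg <= 25:
        return 4
    return 5


def twi(upslope_area, slope_deg, min_slope_deg=0.1):
    """Topographic Wetness Index, ln(a / tan(beta)).

    min_slope_deg is an assumed lower bound that avoids division by zero on flat pixels.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))


def spi(upslope_area, slope_deg):
    """Stream Power Index, a * tan(beta)."""
    return upslope_area * math.tan(math.radians(slope_deg))
```

For a fixed upslope area, a steeper pixel gets a lower TWI and a higher SPI, which matches the opposite roles the two indices play in the text (water accumulation versus erosive power).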

**Figure 2.** Flash-flood predictors: (**a**) Slope; (**b**) Topographic Wetness Index (TWI); (**c**) Topographic Position Index (TPI); (**d**) Profile curvature; (**e**) Convergence index; (**f**) Stream Power Index (SPI).

**Figure 3.** Flash-flood predictors: (**a**) Aspect; (**b**) Land use; (**c**) Hydrological soil groups; (**d**) Lithology.

**Land use**, which is the main interface between the torrential rainfalls and the ground surface, has an important influence on the runoff velocity [66]. For the present study, the land use layer was taken from the Corine Land Cover 2018 database. According to Figure 3b, eight land use categories were delineated within the study area. **Hydrological soil group** was considered as a flash-flood predictor in the present research due to its undeniable influence on the vertical infiltration of water into the ground [67]. All four hydrological soil groups are present within the Bâsca Chiojdului catchment (Figure 3c). The **lithological groups** contribute to flash-flood genesis in a similar way to the soil groups. A total of 10 lithological groups can be found in the Bâsca Chiojdului catchment (Figure 3d).

#### **4. Methods**

The main steps of the methodological workflow are summarized in Figure 4.

**Figure 4.** Flowchart of the methodological steps applied in this research.

#### *4.1. Linear Support Vector Machine (LSVM) for Feature Selection*

In a study that aims to estimate the qualitative flash-flood susceptibility, it is imperative to analyze the predictive ability of flash-flood conditioning factors in order to see if they all manage to contribute to some extent to the genesis of flash floods. In the present research paper, the evaluation of the prediction ability of flash-flood predictors was determined using Linear Support Vector Machine (LSVM). This method is widely used because it is able to remove redundant and irrelevant information from input data [68]. The following equation is used to compute the predictive ability through LSVM algorithm [69]:

$$f(\mathbf{i}) = \text{sign}\left(\mathbf{C}^{T}\mathbf{i} + j\right) \tag{1}$$

where *C<sup>T</sup>* is the transpose of the weight matrix attributed to the flash-flood predictors, *i* = (*i*<sub>1</sub>, *i*<sub>2</sub>, ... , *i*<sub>10</sub>) is the vector containing the ten flash-flood predictors, and *j* is the offset of the hyper-plane from the origin [5].

This algorithm was applied with the help of Weka 9.3 software.
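A minimal Python sketch of Equation (1) is given below. The function names, the example weights and the idea of ranking predictors by the absolute value of their weight are assumptions made for illustration; the source does not detail how Weka derives the predictive ability from the fitted hyper-plane.

```python
def lsvm_decision(weights, predictors, offset):
    """Equation (1): sign of the weighted sum of the predictor values plus the offset.

    A score of exactly zero is mapped to +1 here, which is an arbitrary convention.
    """
    score = sum(w * x for w, x in zip(weights, predictors)) + offset
    return 1 if score >= 0 else -1


def rank_predictors(names, weights):
    """Order predictors by absolute weight, largest first (assumed proxy for predictive ability)."""
    return sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical weights for three of the ten predictors.
example_ranking = rank_predictors(["slope", "aspect", "TWI"], [0.8, -0.1, -0.9])
```

Under this assumption, a predictor whose weight is close to zero contributes little to the decision function and would be a candidate for removal.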

#### *4.2. Weights of Evidence (WOE)*

The bivariate statistics model represented by Weights of Evidence (WOE) is a very frequently used algorithm involved in the studies focused on natural hazards predisposition evaluation [40]. In this study, the WOE model is used to calculate the weight that each factor class/category has in relation to the genesis of the flash-flood process. In order to derive the WOE coefficients, first, computing the positive (*W*+) and negative (*W*−) weights is required. The positive weight highlights the association between a factor class/category and the torrential points, while the negative weight indicates the absence of this spatial association [36]. The following relations should be employed in the weights computation [70]:

$$\mathcal{W}^+ = \ln \frac{P\{B|S\}}{P\{B|\overline{S}\}} \tag{2}$$

$$\mathcal{W}^- = \ln \frac{P\{\overline{B} \mid S\}}{P\{\overline{B} \mid \overline{S}\}} \tag{3}$$

where *W*<sup>+</sup> is the positive weight, *W*<sup>−</sup> is the negative weight, *P* is the probability, *B* is the presence of the flash-flood predictor, $\overline{B}$ is the absence of the flash-flood predictor, *S* is the presence of torrential pixels, and $\overline{S}$ is the absence of torrential pixels.

The final WOE coefficients can be derived using the next equation [71]:

$$W_{f} = W_{plus} + W_{mintotal} - W_{min} \tag{4}$$

where *W<sub>f</sub>* is the final WOE coefficient, *W<sub>plus</sub>* is the positive weight of a factor class, *W<sub>min</sub>* is the negative weight of a factor class, and *W<sub>mintotal</sub>* is the total of all negative weights in a multiclass map.

The final WOE values will be used as input data into the Deep Learning and Alternating Decision Tree models through which the flash-flood susceptibility will be determined.
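The WOE computation in Equations (2)–(4) can be sketched in Python as follows. The function name, the input layout (torrential and total pixel counts per class) and the example counts are hypothetical. The conditional probabilities are estimated as pixel-count ratios, which is an assumption about how the probabilities are obtained in practice.

```python
import math


def weights_of_evidence(class_counts):
    """Compute WOE coefficients for the classes of one predictor.

    class_counts maps a class name to (torrential_pixels, total_pixels).
    Returns a dict: class name -> (W_plus, W_minus, W_final).
    Classes with zero torrential or zero non-torrential pixels are not handled
    and would raise an error (logarithm of zero or division by zero).
    """
    total_torr = sum(t for t, _ in class_counts.values())
    total_non = sum(n - t for t, n in class_counts.values())

    partial = {}
    for name, (torr, total) in class_counts.items():
        non = total - torr
        # Equation (2): predictor class present, torrential vs. non-torrential pixels.
        w_plus = math.log((torr / total_torr) / (non / total_non))
        # Equation (3): predictor class absent, torrential vs. non-torrential pixels.
        w_minus = math.log(((total_torr - torr) / total_torr)
                           / ((total_non - non) / total_non))
        partial[name] = (w_plus, w_minus)

    # Equation (4): W_f = W_plus + W_mintotal - W_min.
    w_min_total = sum(w_minus for _, w_minus in partial.values())
    return {name: (wp, wm, wp + w_min_total - wm)
            for name, (wp, wm) in partial.items()}


# Hypothetical counts for a predictor with two classes.
example_woe = weights_of_evidence({"A": (30, 100), "B": (10, 100)})
```

In this example class A holds most of the torrential pixels, so its final coefficient is positive, while class B receives a negative coefficient.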

#### *4.3. Frequency Ratio (FR)*

Frequency Ratio (FR) is the second bivariate statistical model employed to prepare the input data for the Deep Learning and Alternating Decision Tree algorithms. For each class of a predictor, the FR model calculates the ratio between the share of torrential pixels falling within that class and the share of the study area occupied by the class. The following relation can be used to estimate the FR coefficients [72]:

$$FR = \frac{\dfrac{Np(LX_i)}{\sum_{i=1}^{m} Np(LX_i)}}{\dfrac{Np(X_j)}{\sum_{j=1}^{n} Np(X_j)}} \tag{5}$$

where *FR* is the frequency ratio of class *i* of factor *j*; *Np*(*LX<sub>i</sub>*) is the number of pixels with torrentiality within class *i* of factor variable *X*; *Np*(*X<sub>j</sub>*) is the number of pixels within factor variable *X<sub>j</sub>*; *m* is the number of classes in the factor variable *X<sub>i</sub>*; and *n* is the number of factors in the study area.
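The following Python sketch shows one way to evaluate Equation (5) from pixel counts. It assumes that the denominator is read as the share of study-area pixels falling in the class, which is the usual interpretation of the frequency ratio; the function name and the example counts are hypothetical.

```python
def frequency_ratio(class_counts):
    """Compute the FR coefficient for each class of one predictor.

    class_counts maps a class name to (torrential_pixels, total_pixels).
    """
    total_torr = sum(t for t, _ in class_counts.values())
    total_pixels = sum(n for _, n in class_counts.values())
    return {
        # Share of torrential pixels in the class divided by the share of area in the class.
        name: (torr / total_torr) / (total / total_pixels)
        for name, (torr, total) in class_counts.items()
    }


# Hypothetical counts for a predictor with two classes of equal area.
example_fr = frequency_ratio({"A": (30, 100), "B": (10, 100)})
```

A coefficient above 1 indicates that torrential pixels are over-represented in the class relative to its area, while a coefficient below 1 indicates the opposite.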

#### *4.4. Deep Learning Neural Network (DLNN)*

Unlike neural networks with a single hidden layer, the Deep Learning Neural Network (DLNN) is characterized by a feed-forward architecture that contains more than one hidden layer [73]. Due to this fact, the DLNN model is considered better than a simple neural network for complex classification problems [74]. In the DLNN structure, the information from the input layer is transmitted to the hidden layers, where it is processed and then forwarded to the output layer. The backpropagation algorithm is then employed to send the error back from the output layer towards the input layer [75]. The training procedure of the DLNN relies on the Rectified Linear Unit (ReLU) activation function [76]. This function, which mitigates the vanishing gradient problem, is expressed as follows:

$$r(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} = \max(0, x) \tag{6}$$

where *x* is the input signal transmitted to neuron, while *r* is the ReLU function.

The derivative of the ReLU function, which is required by the back-propagation algorithm, can be calculated using the following relation:

$$r'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \tag{7}$$
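The ReLU function and its derivative can be written as a short Python sketch; the function names are chosen here for illustration only.

```python
def relu(x):
    """ReLU activation: returns x for positive inputs and 0 otherwise."""
    return max(0.0, x)


def relu_derivative(x):
    """Derivative of ReLU: 1 for positive inputs and 0 otherwise."""
    return 1.0 if x > 0 else 0.0
```

Because the derivative is exactly 1 for every positive input, the error signal is not shrunk when it passes through active neurons, which is why ReLU helps against vanishing gradients.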

It should be remarked that the cross-entropy function is also involved in the training procedure because it helps the DLNN to achieve a higher degree of accuracy [77]. The cross-entropy is mathematically described using the next equation:

$$E = -\frac{1}{N} \sum_{n=1}^{N} \left[ M_n \ln(P_n) + (1 - M_n) \ln(1 - P_n) \right] \tag{8}$$

where *N* is the total number of records in the training sample, *M<sub>n</sub>* is the observed (target) value of record *n*, and *P<sub>n</sub>* is the corresponding predicted value.
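A minimal Python sketch of the cross-entropy in Equation (8) is shown below, reading *M* as the observed class (1 for torrential, 0 for non-torrential) and *P* as the predicted probability. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero and is not part of the source.

```python
import math


def cross_entropy(targets, predictions, eps=1e-12):
    """Mean binary cross-entropy between observed targets (0 or 1) and predicted probabilities."""
    total = 0.0
    for m, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += m * math.log(p) + (1 - m) * math.log(1 - p)
    return -total / len(targets)
```

Confident and correct predictions give a value close to zero, whereas confident but wrong predictions are penalized heavily.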

The adaptive moment estimation (Adam) algorithm, a stochastic optimization method, is used to complete the training process of the DLNN. In Adam, the first and second moments of the gradient are estimated via the exponential moving averages given by the following relations [78]:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{9}$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \tag{10}$$

where *m* and *v* are the moving averages, *g* is the gradient of the current mini-batch, and *β*<sub>1</sub> and *β*<sub>2</sub> are the hyper-parameters of the algorithm that control the decay rates of the two averages.
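The moving averages in Equations (9) and (10) can be sketched in Python as a single update step for one scalar parameter. The bias correction and the parameter update shown after the two averages follow the standard formulation of Adam and are not printed in the source; the default values for `lr`, `beta1`, `beta2` and `eps` are the commonly used ones and are assumptions here, since the source does not report the settings used.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform one Adam update for a single scalar parameter.

    t is the 1-based step counter. Returns the updated (param, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad        # Equation (9): first-moment average
    v = beta2 * v + (1 - beta2) * grad ** 2   # Equation (10): second-moment average
    m_hat = m / (1 - beta1 ** t)              # bias correction (standard Adam, not in the source)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

In practice the same step is applied element-wise to every weight of the network, with `m` and `v` initialized to zero before the first mini-batch.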

To apply the DLNN-FR and DLNN-WOE ensembles, dedicated code was written in the R programming language. More specifically, the Keras and Lime packages were used in RStudio.

#### *4.5. Alternating Decision Tree*

The Alternating Decision Tree (ADT) model is an ensemble of the decision tree and boosting methods [79]. The ADT structure has a lower complexity than decision tree models such as Rotation Forest, Classification and Regression Tree or Random Forest [80]. The ADT model is a natural extension of decision trees and voted stumps, and is formed by alternating layers of prediction nodes and decision nodes [81]. Within the ADT algorithm, the decision nodes specify a predicate condition, while each prediction node contains a single number [80].

Let *c*<sub>1</sub> be a precondition, *c*<sub>2</sub> a base condition, and *a* and *b* two real numbers; then *a* and *b* are computed using the relations [82]:

$$a = 0.5 \ln \frac{W_{+}(c_1 \cap c_2)}{W_{-}(c_1 \cap c_2)}, \quad b = 0.5 \ln \frac{W_{+}(c_1 \cap \overline{c_2})}{W_{-}(c_1 \cap \overline{c_2})} \tag{11}$$

where *W* denotes the sum of the values from any prediction node, and the best *c*<sub>1</sub> and *c*<sub>2</sub> are estimated by minimizing *Z<sub>t</sub>*(*c*<sub>1</sub>, *c*<sub>2</sub>), determined as follows:

$$Z_t(c_1, c_2) = 2\sqrt{W_{+}(c_1 \cap c_2)\, W_{-}(c_1 \cap c_2)} + \sqrt{W_{+}(c_1 \cap \overline{c_2})\, W_{-}(c_1 \cap \overline{c_2})} + W(\overline{c_2}) \tag{12}$$

The ADT-FR and ADT-WOE ensembles were run and implemented in Weka software.
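The prediction values *a* and *b* of Equation (11) can be illustrated with the short Python sketch below. The function and argument names are hypothetical, and the four weights are assumed to be strictly positive; practical implementations often add a small smoothing constant to avoid division by zero, which is not shown in the source and is omitted here.

```python
import math


def adt_prediction_values(w_pos_c2, w_neg_c2, w_pos_not_c2, w_neg_not_c2):
    """Equation (11): prediction values for the two branches of a new decision node.

    w_pos_c2 / w_neg_c2: W+ and W- where the precondition and the base condition both hold.
    w_pos_not_c2 / w_neg_not_c2: W+ and W- where the precondition holds but the base condition does not.
    """
    a = 0.5 * math.log(w_pos_c2 / w_neg_c2)
    b = 0.5 * math.log(w_pos_not_c2 / w_neg_not_c2)
    return a, b
```

A branch dominated by positive weight receives a positive prediction value, and a branch dominated by negative weight receives a negative one; the final ADT score of a sample is the sum of the prediction values along all paths it satisfies.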
