Article

A Comparison of Machine Learning Models for Predicting Flood Susceptibility Based on the Enhanced NHAND Method

School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo 454003, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(20), 14928; https://doi.org/10.3390/su152014928
Submission received: 31 July 2023 / Revised: 8 October 2023 / Accepted: 9 October 2023 / Published: 16 October 2023
(This article belongs to the Section Hazards and Sustainability)

Abstract

A flood is a common and highly destructive natural disaster. Recently, machine learning methods have been widely used in flood susceptibility analysis. This paper proposes the NHAND (New Height Above the Nearest Drainage) model as a framework to evaluate the effectiveness of both individual learners and ensemble models in addressing intricate flood-related challenges. The evaluation process encompasses critical dimensions such as prediction accuracy, model training duration, and stability. Research findings reveal that, compared to Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Lasso, Random Forest (RF), and Extreme Gradient Boosting (XGBoost), Stacked Generalization (Stacking) outperforms in terms of predictive accuracy and stability. Meanwhile, XGBoost exhibits notable efficiency in terms of training duration. Additionally, the Shapley Additive Explanations (SHAP) method is employed to explain the predictions made by the XGBoost model.

1. Introduction

Floods are a type of natural disaster that can cause significant destruction. Throughout history, humans have faced the urgent problem of effectively countering and minimizing flood damage. To gain insight into the topic of flooding, a comprehensive analysis based on CiteSpace was conducted using 2000 research articles and reviews from the Web of Science core database. The analysis focused on keywords related to flooding and utilized latent semantic clustering techniques. The results, presented in Figure 1, reveal the following findings:
  • Climate change is closely linked to the occurrence of floods. Most floods are caused by heavy rainfall, and increasing global temperatures are contributing to the rise in rainfall intensity and frequency, indirectly increasing the likelihood of flood events [1];
  • Urban flooding has become a growing concern due to rapid urbanization. While urbanization brings about economic and social development, it also leads to increased flood volumes and peak heights due to the expansion of built-up areas [2,3]. Additionally, the concentration of population in flood-prone areas further exacerbates the problem of urban flooding [4];
  • Machine learning (ML) models, such as Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN), have been increasingly applied in flood analysis. Researchers have utilized historical flood data and various factors such as geography, meteorology, and hydrology to generate flood susceptibility maps and assess sensitivity to flash flood hazards [5,6].
Using a comprehensive dataset of surface reflectance data from Landsat satellites, Liu et al. [7] constructed dynamic global annual maps spanning from 1985 to 2015. Their findings revealed that during this 30-year period, the global urban land area expanded at a rate of 9687 square kilometers per year. Additionally, forthcoming growth trends in diverse regions and countries will be shaped by socio-economic advancements, resulting in varying rates of expansion [8]. This urban land expansion not only transforms the environment but also disrupts river flow [9]. In the context of heavy rainfall, inadequacies in urban surfaces, drainage systems, and the concentration of urban populations can contribute to severe flooding incidents [10]. Therefore, gaining a comprehensive understanding of floods is crucial for mitigating flood-induced losses and ensuring the safety of human lives and societal assets.
Flood susceptibility pertains to the potential occurrence of floods in a particular area, contingent on diverse environmental factors [11]. By means of flood analysis and simulation, the evaluation of risks and informed decision-making regarding appropriate responses can be attained. This assessment is of importance for the implementation of measures aimed at flood prevention, mitigation, and overall disaster risk reduction.
Over time, flood analysis methods have changed with advancements in technology. One method, called burst analysis, uses tools such as CiteSpace to identify keywords that suddenly become more frequently cited during specific time periods. This helps researchers track and understand trends in research over time.
The citation of keywords in the selected literature (Table 1) demonstrates the frequent use of rainfall-runoff models for flood sensitivity assessment (FSA) in the past. These models typically rely on the physical principles of hydrological processes and are formulated using mathematical equations. Models such as HEC-HMS/RAS, MIKE FLOOD, SWMM, and TUFLOW have the capability to simulate diverse water movements, encompassing infiltration, evapotranspiration, overland flow, pipe network flow, and river flow. These simulations span from rainfall input to runoff output [12,13,14]. However, these models often require intricate parameter calibration. Furthermore, their substantial computational and storage requirements might make them less suitable for extensive study areas or regions with limited observational data [15,16].
Besides rainfall-runoff models, the application of multi-criteria decision analysis (MCDM) based on expert knowledge is a frequently employed approach in flood sensitivity assessment (FSA). While rainfall-runoff models primarily focus on simulating flood processes, MCDM is utilized to scrutinize and appraise factors contributing to flooding. This aids in estimating flood risks and potential losses. However, MCDM has certain limitations [17]. For instance, intricate decision problems can complicate subjective judgments. Discrepancies in decision-makers’ backgrounds and perspectives, alongside limited and distinct criteria, can lead to diminished decision consistency. Although strides have been taken to address these concerns through approaches such as probability hesitant fuzzy sets, intuitionistic fuzzy sets, AHP-group decision, and adaptive AHP methods, the computation of indicator weights in multi-criteria evaluation analysis heavily relies on expert knowledge and experience, often yielding subjective assessments. Ensuring decision consistency remains a challenge [18]. In the domain of flood research, the MCDM approach is commonly favored to evaluate flood losses and risks, often in conjunction with factors such as flood hazard, exposure, vulnerability, and prevention and mitigation capacity. However, it is less frequently utilized for susceptibility analysis [19].
With the advancement of artificial intelligence technology, machine learning (ML) models that are based on data have become widely popular in the field of disaster research. Comparative experiments have revealed distinct advantages of ML models over rainfall-runoff models and Multi-Criteria Decision Analysis (MCDM) in flood sensitivity assessment (FSA). For instance, Yaseen [20] contends that physical models necessitate an abundance of hydrological variables to precisely simulate flood processes in specific watersheds, yet acquiring data for these variables can prove challenging. In contrast, ML models operate independently of hydrological parameters and are not confined by the physical principles governing water flow. Flood prediction results presented by Tehrany [21] in Kelantan demonstrate that decision tree (DT) models are well-suited for flood susceptibility analysis, given their speed and absence of necessity for statistical assumptions. This holds true despite slightly lower prediction accuracy when compared to traditional statistical methods. Khosravi [22] juxtaposed the flood prediction capabilities of MCDM with two prevalent ML algorithms, Naive Bayes (NB) and Naive Bayes Tree (NBT). ROC curve analysis unveiled that both ML models exhibited superior prediction accuracy than MCDM methods.
Moreover, there have been studies that have identified limitations in the data analysis and mining capabilities of individual learning models (ILMs) [23], referring to independently trained and assessed learning algorithms or models such as Support Vector Machine (SVM) and the least absolute shrinkage and selection operator (Lasso) algorithm. Conversely, ensemble machine learning models (EMLMs) such as Random Forest (RF), AdaBoost, Gradient Boosting Decision Trees (GBDT), and XGBoost, synthesized from multiple ILMs, have emerged as more prevalent and successful in FSA [24,25]. However, there is limited research on the predictive accuracy of diverse EMLMs and the impact of influencing factors on their performance. Therefore, this paper primarily undertakes a comprehensive analysis of the performance of SVM, KNN, and Lasso as ILMs, in conjunction with EMLMs such as RF, XGBoost, and Stacked Generalization (Stacking). Additionally, the paper delves into the repercussions of influencing factor datasets on model performance.
Additionally, a contention has arisen that flood inventory maps featuring point features, often used in machine learning-based flood sensitivity assessment (FSA), may not accurately represent floods due to their regional nature [26]. Therefore, it is imperative to incorporate values characterized by regional continuity, portraying flood-related attributes (such as inundation properties and hazard levels), as inputs to the ML models. This inclusion furnishes a more robust foundation for the applicability of machine learning in FSA. The Height Above Nearest Drainage (HAND) model, introduced by Rennó [27] and computed from a Digital Elevation Model (DEM), effectively encapsulates regional flood hazard levels [28]. This efficacy stems from the fact that rainfall-induced floods predominantly occur when surface runoff does not reach the local drainage network situated in lower terrain sections, or when water flows back into the drainage system [29]. The accuracy of HAND results relies heavily on the quality of the DEM. Given the complexities tied to obtaining high-precision DEMs, the establishment of accurate HAND models remains challenging.
Consequently, this research paper introduces a new approach called NHAND that overcomes the limitations of the existing HAND method. NHAND is employed as the dependent variable for three distinct individual learning models (ILMs), namely SVM, KNN, and Lasso, as well as three distinct categories of ensemble machine learning models (EMLMs), namely RF, XGBoost, and Stacking. To account for the direct influence of elevation on NHAND values, two distinct datasets of flood-inducing factors were constructed. One dataset encompasses the digital elevation model (DEM), along with derived terrain and hydrological factors such as slope, aspect, curvature, and sediment transport index (STI), for a total of 12 factors (Factor-Group 1). The other dataset excludes the DEM, comprising the remaining 11 factors (Factor-Group 2). Subsequently, a comparative analysis was carried out among the six ML models, focusing on prediction accuracy, stability, and training time.

2. Materials and Methods

2.1. Study Area

This paper focuses on the main urban region of Zhengzhou City in Henan Province, China, as depicted in Figure 2. Over the period from 1999 to 2021, significant changes in the proportion of urban and cultivated land within this urban area have been observed, reflecting alterations in the urban land ratio. Through a comprehensive investigation, various factors including shifts in population dynamics, levels of urbanization, regional gross domestic product (GDP), and the composition of the three key industries (agriculture, industry, and services) have been examined. This analysis yields crucial insights into the underlying drivers that have led to modifications in land use characteristics within the central urban zone during this period. The data utilized in this analysis come from the Zhengzhou Municipal Bureau of Statistics (http://jj.zhengzhou.gov.cn, accessed on 14 August 2023).
The findings presented in Table 2 show significant correlations between changes in the proportion of urban and cultivated land within the main urban area and several key factors. These factors include population dynamics, urbanization rate, regional Gross Domestic Product (GDP), and the proportion of the agricultural sector among the three major industries. Notably, the historical trajectory of Henan Province has been predominantly centered around agricultural development, constituting the primary industry. However, the initiation of economic reforms in 1978 triggered a transformation in the distribution of the three major sectors—agriculture, industry, and services—shifting away from agriculture and towards the secondary and tertiary sectors, namely industry and services. This structural change in industries has resulted in an increase in population and urbanization rate within the main urban area. As a consequence, there has been a gradual conversion of cultivated land into urban land. While the expansion of urban territory symbolizes regional progress, it also presents challenges related to drainage management and elevated flood risk due to the escalated presence of impermeable surfaces.

2.2. Establishment of NHAND Model

The Height Above the Nearest Drainage (HAND) model introduces a quantitative terrain approach that normalizes a Digital Elevation Model (DEM) by considering its vertical distance relative to the drainage network [30]. This model serves a dual purpose: it not only depicts the likelihood of water drainage for each cell [31], but also offers efficiency and reliability in its application [32]. Consequently, it has gained widespread utilization in flood modeling. However, the utilization of this model is not devoid of limitations. To begin with, the input DEM can contain inaccuracies and biases due to factors such as vegetation and constraints in data acquisition [33,34]. Additionally, publicly accessible DEM datasets might be outdated and fail to accurately portray alterations in terrain relief. Moreover, even with high-resolution data, errors can emerge in delineating drainage patterns within depressions and level areas [35]. Therefore, it becomes imperative for the HAND model to rectify DEM inaccuracies in instances where detailed DEMs are inaccessible or flow pathways are inaccurately represented [36].
Since 2000, there has been a consistent increase in the amount of developed land in the main urban area. This transformation has rendered the publicly available DEM inadequate for effectively characterizing hydrological conditions in the region [37]. To address this issue, modifications were made to the DEM before using it in the HAND model to generate flood hazard assessments. Additionally, river data obtained from map imagery was incorporated into the HAND model to highlight areas at high risk of flooding within drainage areas. This improved model is referred to as the New-HAND (NHAND). The effectiveness of this approach relies on accurate terrain data and precise positioning of watercourses. For terrain data, the SRTM1 image dataset with a spatial resolution of 30 m was chosen. This dataset not only encompasses a substantial portion of the global landmass but also provides a reliable depiction of topographical variations, rendering it widely applicable in hydrological research [38,39]. As for map data, it was obtained through Esri’s geographical information platform. The NHAND model adheres to a specific procedural sequence, which can be segmented into three distinct steps (as depicted in Figure 3).
The precision of water body positioning plays a pivotal role in influencing NHAND outcomes, as it directly establishes the reference position for terrain normalization. Hence, ensuring the accurate identification of water body locations is of paramount importance for establishing NHAND. In this study, an Esri topographic map with R, G, and B band information is processed in MATLAB. To pinpoint water bodies effectively, the RGB threshold segmentation technique [40] is employed. Following this, elevation values within grid cells containing water bodies are adjusted to better mirror the actual topography. The scale of topographic adjustment is determined through hydrological extraction experiments. The primary consideration is to identify the revised scale that aligns optimally with the actual drainage network and minimally alters the flow direction.
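As a minimal illustration of this step (the study itself performs it in MATLAB), the following Python sketch derives a water-body mask from an RGB basemap tile by simple channel thresholding; the file name and threshold values are placeholders rather than the settings used in the paper.

```python
import numpy as np
from PIL import Image

def extract_water_mask(map_path, r_max=180, g_max=220, b_min=190):
    """Return a boolean water mask from an RGB basemap via channel thresholds.

    The thresholds are illustrative placeholders; in practice they are chosen
    so that the extracted mask matches the basemap's water rendering.
    """
    rgb = np.asarray(Image.open(map_path).convert("RGB"), dtype=np.int32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Water on many basemaps is drawn as light blue: subdued red, strong blue.
    return (r < r_max) & (g < g_max) & (b > b_min)

# Example usage (hypothetical file name):
# water = extract_water_mask("esri_topographic_tile.png")
```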
Next, the nearest drainage map for non-water areas is computed based on the flow direction and water boundary coding. Flow direction is established with the D8 algorithm, introduced by O’Callaghan and Mark in 1984 [41], which accounts for eight possible flow directions. This commonly used terrain-analysis technique assigns each grid cell the direction of its steepest downslope neighbor by evaluating the slope between the cell and its eight adjacent cells. It is crucial to highlight that, when encoding water body boundaries, special attention must be given to zigzag-shaped boundaries. Precise encoding of these cells (illustrated by the orange grid cells in Figure 3c) is essential to ensure correct flow directions for non-water units and to prevent referencing undefined drainage points.
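The sketch below illustrates the D8 rule described above: each interior cell of a NumPy DEM array receives the ESRI-style code of its steepest downslope neighbor, with a unit cell size assumed so that diagonal distances are scaled by the square root of two.

```python
import numpy as np

# D8 neighbour offsets (row, col) and the matching ESRI-style direction codes.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
CODES = [64, 128, 1, 2, 4, 8, 16, 32]
DIAG = 2 ** 0.5

def d8_flow_direction(dem):
    """Assign each interior cell the code of its steepest downslope neighbour."""
    rows, cols = dem.shape
    direction = np.zeros((rows, cols), dtype=np.int32)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            best_slope, best_code = 0.0, 0
            for (di, dj), code in zip(OFFSETS, CODES):
                dist = DIAG if di and dj else 1.0
                slope = (dem[i, j] - dem[i + di, j + dj]) / dist
                if slope > best_slope:
                    best_slope, best_code = slope, code
            direction[i, j] = best_code  # 0 marks a pit or flat cell
    return direction
```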
Lastly, the nearest drainage map is employed to assign elevation values corresponding to the nearest drainage positions for areas without water bodies, leading to the generation of the nearest drainage DEM map (depicted in Figure 3g) that encompasses the entire extent of the study area. Furthermore, the original terrain undergoes depression filling to eliminate sink areas. This process was executed using the Fill tool within the ArcGIS 10.8.1 software. By subtracting the nearest drainage DEM map from the initial filled DEM, NHAND results are obtained, where water body areas are assigned a value of 0. A higher NHAND value indicates a greater elevation relative to its corresponding drainage point, implying a lower risk of inundation. Conversely, a lower NHAND value suggests a lower elevation relative to the drainage point, consequently elevating the risk of inundation.
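Given the filled DEM, the nearest-drainage DEM (Figure 3g), and the water mask on the same grid, the final NHAND raster reduces to a cell-wise difference, as in the sketch below (the array names are assumptions for illustration, not variables from the study's workflow).

```python
import numpy as np

def compute_nhand(filled_dem, nearest_drainage_dem, water_mask):
    """NHAND = filled DEM minus the elevation of each cell's nearest drainage cell."""
    nhand = filled_dem.astype(float) - nearest_drainage_dem
    nhand[water_mask] = 0.0  # water bodies serve as the zero reference level
    return nhand
```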
Through the meticulous execution of this comprehensive process, the NHAND approach successfully overcomes the limitations stemming from the temporal and spatial resolution of DEM inherent in the HAND method. As a result, it introduces novel insights for hydrological simulation and various other applications.

2.3. Flood Inducing Factors

Flood occurrences are influenced by a complex interplay of various factors. While rainfall is a direct trigger for flood events, other elements such as terrain topography, land cover types, and soil characteristics also play pivotal roles in flood causation [5]. Within this study, a selection of 12 terrain and hydrology-related factors that contribute to the occurrence of floods has been made. These factors encompass elevation, slope, aspect, planar curvature, profile curvature, sediment transport index (STI), stream power index (SPI), topographic wetness index (TWI), distance to the nearest river (DTR), direction to the river (DOR), terrain position index (TPI), and terrain roughness index (TRI). By incorporating these factors, a more comprehensive comprehension of flood formation and the spatial distribution of flood risks can be attained, subsequently yielding significant insights. This integrated analysis enhances the precision of flood prediction and management, facilitating the formulation of suitable strategies for risk management and emergency responses.
The correlation coefficients between the considered 12 factors and NHAND results are depicted in Figure 4. It’s notable that factors such as slope, aspect, plan curvature, profile curvature, and STI, derived directly or indirectly from the DEM, display relatively low correlation coefficients with NHAND. In contrast, a strong correlation coefficient of 0.91 is evident between NHAND and elevation. This observation signifies that elevation potentially wields a substantial influence on the predictive efficacy of ML models. To conduct a more comprehensive assessment of the performance of the six machine learning models, two distinct sets of flood-inducing factors were taken into account during the selection process. The initial set, termed Factor-Group 1, encompassed the DEM and 11 terrain and hydrological factors derived from it. The second set, designated as Factor-Group 2, excluded the DEM and comprised the remaining 11 factors.

2.4. Individual Learning Models (ILMs)

2.4.1. Support Vector Machine

The Support Vector Machine (SVM) model is a classic ILM in the field of machine learning and is widely used in FSA. The objective of the regression model is to find a regression function, represented by $f(x)$ in Equation (1), that minimizes prediction errors. This is achieved by adjusting the regularization parameters and kernel functions.
$$ f(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x) + b \tag{1} $$
$\alpha_i$ in Equation (1) represents the Lagrange multipliers in the dual problem of support vector regression, which is established from the constrained objective function in Equations (2) and (3).
$$ \min \; \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \varepsilon_i \tag{2} $$
$$ \left| y_i - f(x_i) \right| \le \varepsilon_i + \xi_i, \quad \xi_i \ge 0 \tag{3} $$
where $\|\omega\|$ denotes the L2 norm of the weight vector of $f(x)$, $C$ is the regularization parameter that balances model complexity and error tolerance, $\xi_i$ represents the degree of relaxation for data points, signifying allowable prediction errors, and $\varepsilon_i$ is the tolerance that defines the permissible range of errors between predicted and actual values.
The Gaussian Radial Basis Function (RBF) is the most commonly used kernel function for handling non-linear data. The penalty parameter C and the γ value, which determines the mapping dimension, are important parameters for the RBF [42].
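For illustration only, an RBF-kernel support vector regressor of this kind can be assembled in Scikit-learn as sketched below; the toy data stand in for the factor matrix and NHAND targets, the 70/30 split mirrors the one used later in this study, and C, gamma, and epsilon are placeholders rather than the grid-searched values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((500, 11))                 # toy stand-in for the flood-inducing factors
y = 60 * rng.random(500)                  # toy stand-in for NHAND values (m)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# C, gamma and epsilon are illustrative; the study selects them by grid search.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.1))
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
```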

2.4.2. K-Nearest Neighbor

The K-Nearest Neighbor (KNN) algorithm is based on the distance measure between training and testing data. It identifies the K training samples closest to a given testing data point and predicts the output based on these K samples [43]. The predicted result, $y_{\mathrm{pred}}$, for the unknown variable can be expressed as shown in Equation (4).
$$ y_{\mathrm{pred}} = \frac{1}{K} \sum_{i=1}^{K} y_i \tag{4} $$
where $y_i$ represents the target value of the $i$-th nearest neighbor.
K is a critical parameter for this estimator. A smaller K value means that the model uses a smaller neighborhood of training samples for prediction, making the model more susceptible to noise data interference. On the other hand, a larger K value can lead to higher model bias and increased deviation from the true data.
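A corresponding Scikit-learn sketch, reusing the X_train/y_train naming from the SVR example above, might look as follows; n_neighbors is the K discussed here, and 5 is only a starting point rather than the tuned value.

```python
from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=5)  # K, the number of neighbours averaged
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)              # mean of the K nearest training targets, Equation (4)
```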

2.4.3. Lasso Regression

Lasso regression is a type of linear regression model that introduces the L1 norm as a regularization term on top of the general linear regression. This approach enables the model to achieve an optimal fitting error while keeping the parameters as “sparse” as possible, thereby enhancing the generalization ability of the model [44]. The objective of the Lasso regression model is to solve for the parameter $\theta$ that minimizes the loss function $J(\theta)$, which is calculated as follows:
$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right)^2 + \alpha \sum_{j=1}^{n} \left| \theta_j \right| \tag{5} $$
$h_\theta\big(x^{(i)}\big)$ is a linear function, and its calculation formula is:
$$ h_\theta\big(x^{(i)}\big) = \theta_0 + \theta_1 x_1^{(i)} + \cdots + \theta_n x_n^{(i)} \tag{6} $$
where $m$ is the number of samples in the training set, $n$ is the dimension of the samples, $y^{(i)}$ is the true value of the $i$-th sample, and $\alpha$ is a user-defined regularization parameter that has a significant impact on the prediction accuracy of the model.
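A minimal Scikit-learn counterpart is sketched below; alpha corresponds to the regularization weight α described above, and the value 0.01 is purely illustrative.

```python
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.01)  # user-defined regularization strength
lasso.fit(X_train, y_train)
print(lasso.coef_)         # the L1 penalty drives uninformative coefficients to zero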

2.5. Ensemble Machine Learning Models (EMLMs)

The goal of EMLMs is to combine multiple independent estimators to complete a learning task. Common ensemble methods include Bagging, Boosting, and Stacking [45]. In FSA, Bagging (represented by RF) is commonly used. XGBoost, belonging to the Boosting category, is known for its ability to achieve high predictive accuracy and operational efficiency [46]. The Stacking method allows for the flexible combination of different machine learning models with unique strengths and has great potential for various applications [47].

2.5.1. Random Forest

Both the RF and XGBoost models are based on decision trees (DT). The main difference lies in the training process of the trees: in RF, all trees can be computed simultaneously, while in XGBoost, each subsequent tree is trained based on the previous one, except for the first. The prediction process of RF is as follows. First, a new training dataset is created by randomly drawing M samples with replacement from the original training dataset. Then, N features are selected from the feature vector, and a decision tree is trained using the new training dataset. These two steps are repeated K times to obtain K decision trees and form a random forest. Finally, for new data to be predicted, the final output, as shown in Equations (7) and (8), is determined by taking the majority vote or the average value of the results from each decision tree.
$$ \text{Final Prediction} = \operatorname{mode}\big( f_1(x), f_2(x), \ldots, f_K(x) \big) \tag{7} $$
or,
$$ \text{Final Prediction} = \frac{1}{K} \sum_{k=1}^{K} f_k(x) \tag{8} $$
In this context, “mode” refers to the most frequently occurring category among the predictions of the individual trees, and $f_k(x)$ represents the prediction of the $k$-th tree.
The performance and quantity of decision trees largely determine the effectiveness of a random forest. Factors such as the maximum depth, minimum samples for splitting, minimum samples per leaf, maximum features, and splitting criterion play a crucial role in shaping the structure, complexity, and generalization ability of decision trees [48]. Therefore, during the hyper-parameter optimization process in a random forest, careful consideration should be given to these parameters [49].
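A hedged Scikit-learn sketch of such a random forest regressor is given below; the hyper-parameter values are placeholders, not the grid-searched settings reported in Table 4.

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=200,      # K: number of trees grown on bootstrap samples
    max_depth=None,        # maximum depth of each tree
    min_samples_split=2,   # minimum samples required to split a node
    min_samples_leaf=1,    # minimum samples required at a leaf
    max_features="sqrt",   # N: number of features considered at each split
    random_state=42,
)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)  # average of the per-tree predictions, Equation (8)
```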

2.5.2. Extreme Gradient Boosting

XGBoost, short for Extreme Gradient Boosting, is an efficient algorithm for gradient boosting decision trees. This technique progressively refines the model’s fit to the training data by sequentially adding decision trees [50]. However, this iterative process can potentially lead to overfitting and consequently diminish the model’s accuracy when applied to validation data. To counteract the risk of overfitting, the appropriate configuration of parameters during the construction of the XGBoost model is vital.
Several parameters play a pivotal role in preventing overfitting. Among these, the maximum depth of the trees and the subsampling ratios for both samples and features are particularly significant, as they effectively constrain the model’s complexity. Furthermore, the number of decision trees incorporated into the model strongly affects its predictive performance. While increasing the number of trees can lead to overfitting and extended training times, the introduction of a learning rate (η) helps regulate the contribution of each tree and thus the number of trees required, thereby achieving a trade-off between performance and efficiency.
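These parameters map directly onto the XGBoost API, as in the illustrative sketch below; the values shown are common defaults, not the tuned settings from this study.

```python
from xgboost import XGBRegressor

xgb_model = XGBRegressor(
    n_estimators=300,      # number of boosted trees
    learning_rate=0.1,     # eta: shrinks each tree's contribution
    max_depth=6,           # limits the complexity of each tree
    subsample=0.8,         # fraction of samples drawn for each tree
    colsample_bytree=0.8,  # fraction of features drawn for each tree
)
xgb_model.fit(X_train, y_train)
y_pred = xgb_model.predict(X_test)
```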

2.5.3. Stacking Approach

The primary objective of Stacking is to amalgamate the predictive outputs of multiple foundational models to create an ensemble model of enhanced accuracy [51]. The underlying principle of Stacking entails harnessing the predictive outcomes from diverse base models, also termed primary learners, and employing a meta-model, also referred to as a secondary learner, to engender the final ensemble prediction. Various base learners exhibit distinct performance attributes. Stacking capitalizes on the synergy of these diverse learners to mitigate the limitations of individual models and augment overall predictive efficacy. A common construction framework for stacking algorithms is shown in Figure 5.
Nevertheless, it is not a certainty that Stacking will unfailingly surpass the predictive accuracy of the best-performing base learner [52]. The prediction accuracy of the model hinges on numerous factors. On the one hand, the heterogeneity and relatively modest correlations among base learners confer benefits in various aspects, potentially empowering Stacking to enhance predictive performance. Conversely, the selection and meticulous tuning of the meta-model within the Stacking framework also exert an influence on the ultimate prediction accuracy. Furthermore, if the leading base model already exhibits high performance, Stacking may not yield incremental enhancements [53]. Consequently, it becomes imperative to judiciously assess diverse facets and conduct extensive experimentation to determine whether Stacking can yield improvements in the given problem domain.
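As a sketch only, a Scikit-learn StackingRegressor mirroring the Factor-Group 2 configuration reported later in Section 3.2 (KNN, RF, and XGBoost as base learners, Lasso as the meta-learner) could be assembled as follows; all hyper-parameter values are illustrative.

```python
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor

stack = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor(n_neighbors=5)),
        ("rf", RandomForestRegressor(n_estimators=200, random_state=42)),
        ("xgb", XGBRegressor(n_estimators=300, learning_rate=0.1)),
    ],
    final_estimator=Lasso(alpha=0.01),  # meta-learner combining the base predictions
    cv=5,                               # out-of-fold base predictions feed the meta-learner
)
stack.fit(X_train, y_train)
```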

2.6. Hyper-Parameter Tuning

Hyper-parameters represent the parameters of an estimator that cannot be directly learned from the training process. In the context of training a machine learning model, hyper-parameter optimization holds significant importance due to its potential impact on the model’s performance [54]. The process of selecting appropriate hyper-parameters is vital for attaining optimal model efficacy.
In this study, the hyper-parameter optimization technique employed is grid search. This systematic approach explores a range of user-defined hyper-parameter combinations to identify potentially optimal configurations for model hyperparameters. To mitigate the risk of model overfitting resulting from improper hyperparameter settings, the grid search procedure integrates the k-fold cross-validation methodology.
K-fold cross-validation is based on the principle of partitioning the dataset into k equal parts or folds. In each iteration, one fold is chosen as the validation set, while the remaining k-1 folds are combined to form the training set. This process is repeated k times, with each fold serving as the validation set once and the others as the training set. The purpose is to allow both training and validation across different subsets of the data, aiding in assessing the model’s performance robustly. The entire process is outlined in Figure 6, detailing the steps and workflow of the k-fold cross-validation method.
For this study, a specific value of k = 5 has been selected, meaning that the dataset is divided into 5 folds. This 5-fold cross-validation approach is applied alongside grid search to optimize hyperparameters for each model.
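A condensed example of this procedure, combining GridSearchCV with 5-fold cross-validation for the random forest, is shown below; the grid is deliberately small and illustrative, whereas the study starts from wider ranges and narrows them over successive passes.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,            # 5-fold cross-validation, as adopted in this study
    scoring="r2",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```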

3. Results

3.1. NHAND Results Analysis

For the development of the NHAND model in the main urban area, a range of vertical drops from 5 m to 100 m were applied for topographic corrections. The impact of each correction on the regional flow direction was carefully evaluated, and the results are summarized in Table 3.
To quantify the effect of each correction, the change rate of flow direction was calculated. This was achieved by dividing the amount of flow change induced by the correction by the total number of raster cells. Furthermore, the difference in change rate was computed by subtracting the flow direction change rate at the smallest correction scale of 5 m from each flow direction change rate. It’s important to highlight that extensive corrections did not notably alter the flow direction within the main urban area. This observation led to the conclusion that a correction scale of 5 m is sufficient for accurately modeling NHAND in this particular urban region.
The NHAND model’s representation of the study area can be observed in Figure 7a. In comparison to models that do not incorporate corrected terrain flow directions (Figure 7b), the complete NHAND model presents a notably larger portion of high-risk areas within the primary urban zone, particularly downstream regions. By integrating corrected terrain flow directions, the NHAND method predicts elevated flood risks for the downstream areas, aligning with observations made in [55]. Furthermore, the NHAND model, unlike the HAND method that relies on a stream network derived from corrected topography as a reference (as depicted in Figure 7c), underscores the importance of ‘nearest to drainage’ through the incorporation of topographic corrections and actual water locations. This approach helps compensate for inaccuracies arising from insufficient temporal or spatial precision of DEM data.

3.2. Hyperparameters of Six Models

Using the NHAND outcomes acquired from the primary urban area along with two separate collections of flood-influencing factors (Factor-Group 1 and Factor-Group 2), a dataset was formed to create and assess six distinct machine learning models, as shown in Figure 7d. In this dataset, 70% of the entries were utilized for training the models, leaving the remaining 30% for the testing set.
Grid search was employed to explore optimal hyperparameters for each model using the training dataset. Additionally, it was used to determine the base learners, meta-learner, and their respective hyperparameters for the Stacking model. This iterative process was conducted using the Scikit-learn library in Python. It’s essential to highlight that the selection of hyperparameters is a stepwise procedure. Initially, during the early stages of the iteration, the parameter ranges are set wide, allowing for a broader grid search to expedite the process. Subsequently, as guided by actual results from prior iterations, these ranges and step sizes are systematically narrowed down to identify the best combination of hyperparameters.
Following a rigorous 5-fold cross-validation process on the training set, the results indicated that Stacking exhibited superior performance when Lasso was utilized as the meta-learner, as opposed to SVM and KNN. Specifically, when based on Factor-Group 1 data, the Stacking model featuring six base learners yielded the best results, while Factor-Group 2 data showcased optimal performance for the Stacking model incorporating KNN, RF, and XGBoost as base learners. Detailed hyperparameter settings for each model and the constructed Stacking models can be found in Table 4 and Table 5.

3.3. Comparison of the ML Methods

In this section, the analysis centers on the predictive abilities and stability of the machine learning models that were created. This evaluation is carried out using two different sets of influencing factor datasets. To quantify the predictive performance, evaluation metrics including the coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE) are utilized. The calculation formulas for these metrics are presented in Equations (9) to (11).
$$ R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \in [0, 1] \tag{9} $$
$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{10} $$
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 } \tag{11} $$
where $y_i$ and $\hat{y}_i$ are the target and predicted values, respectively, $\bar{y}$ is the mean of the target values, and $n$ is the number of data points.
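These three metrics can be computed directly with Scikit-learn, as in the short sketch below; y_test and y_pred denote the held-out NHAND values and a model's predictions from the earlier examples.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # Equations (9)-(11)
print(f"R2 = {r2:.3f}, MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```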

3.3.1. Model Prediction Results

Table 6 and Table 7 display the performance metrics of the six models when predicting NHAND using Factor-Group 1 and Factor-Group 2, respectively. A distinct disparity in NHAND prediction performance is evident between the two sets of datasets. The utilization of features from Factor-Group 2 leads to a notable reduction in the predictive performance of all models. Among them, individual learners such as SVM, KNN, and Lasso exhibit relatively inferior predictive metric outcomes compared to ensemble methods.
Across both datasets, KNN attains the highest predictive performance for NHAND results on the training set, achieving an R2 value of 1. However, its performance on the test set is comparatively lower, with R2 values of 0.868 and 0.459 for Factor-Group 1 and Factor-Group 2, respectively. Regarding test set predictions, Stacking consistently exhibited the most robust predictive performance, irrespective of the dataset employed for influencing factors. Notably, this superiority was especially prominent in Factor-Group 2, where Stacking showcased a substantial enhancement in prediction accuracy in comparison to other models.
The different prediction outcomes obtained from the two datasets underscore the importance of feature selection in the development of machine learning models. This significance becomes particularly evident when the dataset lacks strongly correlated factors, resulting in a substantial reduction in the predictive capabilities of machine learning models. However, when evaluating the predictive accuracy of different model categories, ensemble techniques consistently maintain a high level of predictive performance. This phenomenon is particularly pronounced in the case of methods such as Stacking, which provide the flexibility to amalgamate various types of models.
Figure 8 illustrates scatter plots depicting the correlation between predicted values and actual values for all models in Factor-Group 1. It’s evident that models such as SVM, KNN, and Lasso exhibit poorer predictions of the actual NHAND values. Notably, as the actual NHAND values increase, the predictive performance of KNN and Lasso becomes increasingly inconsistent. SVM follows a fitting curve with a slope approximately equal to 1, although it makes unrealistic predictions for instances with higher NHAND values. In contrast, ensemble methods are less affected by the variations in actual NHAND values, particularly XGBoost and Stacking, which maintain good predictive performance even when faced with higher NHAND values.
Table 8 provides an overview of the training durations for all models using the two distinct feature datasets. The results indicate that KNN and Lasso exhibit faster training speeds, but their predictive performance is relatively weaker. The Stacking model with the best predictive performance has the longest training time among all models. Among all ensemble methods, XGBoost has the fastest training speed, with average training times of 0.32 s and 0.39 s, and its predictive performance metrics are only slightly behind those of Stacking.

3.3.2. Stability of the Models

A model achieves stability when it produces consistent predictive outcomes across diverse datasets. To evaluate the stability of all models, a 5-fold cross-validation technique was employed on the complete dataset.
Figure 9 illustrates that each model exhibits a similar pattern of fluctuations in both the training and testing data, indicating that data quality significantly impacts model predictions. Within each iteration of cross-validation, each model maintains relatively consistent predictive performance on the training data, suggesting a relatively strong learning capability from the training data. However, different models yield notably different predictions on different testing datasets. In Factor-Group 1, RF, XGBoost, and Stacking show relatively small variations in predictive results across different groups of data, indicating better prediction stability for these three methods on this dataset. Even in Factor-Group 2, ensemble methods exhibit lower fluctuations while ensuring predictive accuracy.

3.4. Model Explainability Using SHAP Approach

Shapley Additive Explanations (SHAP) is grounded in the concept of Shapley values derived from game theory. This technique is employed to expound on the influence of individual features on the prediction of each sample within a ML model [56]. Among the six models introduced in Section 3.2, the Stacking model excels in both prediction accuracy and model stability, surpassing the others. However, it does require more time for training. The XGBoost model, on the other hand, performs exceptionally well as a boosting ensemble method, with prediction capabilities just slightly below those of Stacking, but it operates with high efficiency. Moreover, it’s worth noting that there are significant variations in prediction accuracy among the different impact factor datasets, underscoring the pivotal role of elevation in enhancing model predictions.
In this section, the SHAP method is employed to elucidate the contributions of each feature in Factor-Group 2 (such as slope, aspect, and curvature) to the prediction outcomes of the XGBoost model.
Single-sample prediction analysis serves as a valuable tool in comprehending the impact of each feature on the final prediction made by the model. In Figure 10, we illustrate an actual NHAND value of 21 for a specific sample, accompanied by the influence of its corresponding feature values on the ultimate prediction. The chart’s bar lengths clearly convey that, within this particular sample, DOR (direction to the river) and slope exert the most substantial effects on the prediction outcome. Notably, DOR exhibits a positive effect, contributing to an increase in the prediction value, while slope exerts a negative effect. The cumulative impact of varying degrees of influence from all features culminates in a final prediction value of 21.13.
Figure 11a illustrates the SHAP value distributions for each factor within Factor-Group 2 across the entire dataset, along with the mean absolute values for each factor. In this plot, the y-axis corresponds to the analyzed feature, with each dot representing a sample. The color of the dots corresponds to the feature’s value, and the horizontal position of the dots indicates the magnitude of the SHAP value for the given sample. A larger absolute SHAP value for a sample indicates a more significant influence of that feature on the prediction outcome. Positive SHAP values signify a positive correlation between the feature’s sample value and the prediction result, while negative values indicate a negative relationship. For instance, larger distance to the nearest river (DTR) and higher terrain roughness index (TRI) are associated with increased NHAND values, indicating a reduced flood risk.
Figure 11b illustrates the mean of the absolute SHAP values for each factor across the entire dataset. A larger average value indicates a more pronounced overall influence of that feature on the prediction results. The results in the figure suggest that factors such as Distance to the Nearest River (DTR), Direction to the River (DOR), and Terrain Roughness Index (TRI) have the most significant impact on the model’s prediction results, and these factors collectively exhibit a negative influence. Among all of the factors, only Aspect and Sediment Transport Index (STI) show a positive impact on the predictions.
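The summary plots in Figure 11 correspond to the standard SHAP workflow for tree models, sketched below; xgb_model is assumed to be the fitted XGBoost regressor from the earlier example, and X_group2 a DataFrame of the Factor-Group 2 features (named columns give labelled plots).

```python
import shap

explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_group2)

shap.summary_plot(shap_values, X_group2)                   # beeswarm, as in Figure 11a
shap.summary_plot(shap_values, X_group2, plot_type="bar")  # mean |SHAP|, as in Figure 11b
```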
Feature interaction analysis aids in comprehending how the interplay between different features affects prediction outcomes. As portrayed in Figure 12, the contributions of features such as DTR and DOR to the prediction outcomes reveal intricate nonlinear associations. Nevertheless, regularity is observed in the SHAP values of these two features when they interact. For instance, when the direction of the nearest river falls between 200 and 360 degrees, or under circumstances of a low roughness index, the impact of river distance on the prediction result remains relatively modest (refer to Figure 12a,b). Lower DTR values give rise to fluctuations in DOR’s SHAP values within the approximate range of −10 to +10 (see Figure 12c).

4. Discussion

The NHAND methodology, which involves normalizing terrain using nearest drainage points, is designed to highlight the elevation difference between any given location on land and its nearest drainage point. By incorporating actual water systems as references, this approach aims to mitigate errors in stream extraction due to inaccuracies in terrain data. It proves especially useful in urban areas lacking high temporal and spatial resolution data. Compared to the original HAND method, the enhanced NHAND approach better captures the distribution characteristics of flood risk within the region [28].
Addressing natural disasters such as floods has always been a complex challenge. The rise of artificial intelligence technology has led to the adoption of data-driven machine learning models for flood susceptibility analysis and risk prediction. However, previous research has predominantly focused on prediction accuracy, often overlooking factors such as model training time and stability. Additionally, limited historical flood records have constrained the comprehensive evaluation of various models [57]. This paper leverages the NHAND model to construct an evaluation framework, assessing individual learners such as SVM, KNN, Lasso, along with ensemble models such as RF, XGBoost, and Stacking.
The correlation analysis results highlight a substantial correlation between elevation and NHAND outcomes, underscoring elevation’s significant influence on model predictions. This hypothesis is further corroborated in Section 3.2. The composition of the feature dataset significantly impacts model predictive capacity [58]. This impact is particularly pronounced in the case of individual learners such as SVM, KNN, and Lasso [59]. For example, SVM’s R2 result on the test set displays a variation rate of 38.2%. In contrast, ensemble methods such as RF, XGBoost, and Stacking demonstrate lower sensitivity to changes in the feature set, consistently outperforming individual learners in overall predictive accuracy [23].
Among the three ensemble methods, Stacking stands out as having the best performance. It not only achieved the highest prediction results on the test set but also demonstrated higher average prediction results in the 5-fold cross-validation across all data. However, it is crucial to acknowledge that this model is relatively time-consuming, not only during training but also in the model construction process, which involves selecting base learners, meta-learners, and their respective hyperparameters. XGBoost, on the other hand, boasts prediction accuracy just slightly below that of Stacking. It exhibits stability comparable to Stacking in Factor-Group 1. What is even more impressive is that it accomplishes this while maintaining optimal operational efficiency. The exceptional predictive capabilities of XGBoost for natural disasters such as floods have been corroborated in previous studies [6,60].
Different datasets possess distinct features and relationships, resulting in variable model performances. This highlights the importance of selecting the appropriate model based on the specific problem and data characteristics. When employing machine learning models, it is essential to consider not only their predictive abilities on unknown data but also their stability and efficiency. The insights provided in this paper offer valuable guidance for the application of machine learning methods in flood-related research.

5. Conclusions

In the face of persistent and extreme climate change, flood prevention has become an absolutely vital task. To effectively address flood risks and minimize potential losses, it is imperative to conduct a comprehensive analysis and prediction of flood susceptibility and risks.
In this study, the NHAND (New Height Above the Nearest Drainage) method has been employed to generate flood risk assessments for the main urban area of Zhengzhou City in Henan Province. Subsequently, the flood risk assessments generated through the NHAND method were used as the foundation for constructing an evaluation framework. This framework incorporated a range of individual learning models including Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Lasso, along with ensemble models such as Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Stacking. Through comprehensive comparative analysis, the effectiveness of ensemble models in flood risk analysis has been clearly demonstrated. Among these models, the Stacking ensemble model not only excels in terms of prediction accuracy but also stands out for its stability. Meanwhile, XGBoost, while slightly below Stacking in prediction accuracy, showcases outstanding computational efficiency.
Nevertheless, it is important to acknowledge that this study has certain limitations. Although we have improved the NHAND model, we have not yet established a graphical user interface to allow readers to utilize the NHAND model more directly. Additionally, the application of machine learning methods in the field of floods still faces challenges. While machine learning models have proven effective in understanding and explaining complex flood-related issues, whether based on model simulations or historical flood records, each flood event occurs within specific environmental conditions, with influencing factors extending beyond direct factors such as rainfall or topographical hydrological conditions. Continuous exploration and validation of additional methods are necessary to better understand and predict flood risks.

Author Contributions

Conceptualization, C.M. and H.J.; methodology, C.M.; software, C.M.; validation, C.M.; formal analysis, C.M.; investigation, C.M.; resources, C.M.; data curation, C.M.; writing—original draft preparation, C.M.; writing—review and editing, C.M. and H.J.; visualization, C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Thanks to H.J. for providing the lab environment.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. IPCC. AR6 Synthesis Report: Climate Change 2023; IPCC: Interlaken, Switzerland, 2023. [Google Scholar]
  2. Gao, Y.Q.; Chen, J.H.; Luo, H.; Wang, H.Z. Prediction of hydrological responses to land use change. Sci. Total Environ. 2020, 708, 134998. [Google Scholar] [CrossRef] [PubMed]
  3. Gashaw, T.; Tulu, T.; Argaw, M.; Worqlul, A.W. Modeling the hydrological impacts of land use/land cover changes in the Andassa watershed, Blue Nile Basin, Ethiopia. Sci. Total Environ. 2018, 619–620, 1394–1408. [Google Scholar] [CrossRef] [PubMed]
  4. Tellman, B.; Sullivan, J.A.; Kuhn, C.; Kettner, A.J.; Doyle, C.S.; Brakenridge, G.R.; Erickson, T.A.; Slayback, D.A. Satellite imaging reveals increased proportion of population exposed to floods. Nature 2021, 596, 80. [Google Scholar] [CrossRef]
  5. Khosravi, K.; Pham, B.T.; Chapi, K.; Shirzadi, A.; Shahabi, H.; Revhaug, I.; Prakash, I.; Bui, D.T. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 2018, 627, 744–755. [Google Scholar] [CrossRef]
  6. Seydi, S.T.; Kanani-Sadat, Y.; Hasanlou, M.; Sahraei, R.; Chanussot, J.; Amani, M. Comparison of Machine Learning Algorithms for Flood Susceptibility Mapping. Remote Sens. 2023, 15, 192. [Google Scholar] [CrossRef]
  7. Liu, X.; Huang, Y.; Xu, X.; Li, X.; Li, X.; Ciais, P.; Lin, P.; Gong, K.; Ziegler, A.D.; Chen, A.; et al. High-spatiotemporal-resolution mapping of global urban change from 1985 to 2015. Nat. Sustain. 2020, 3, 564–570. [Google Scholar] [CrossRef]
  8. He, W.R.; Li, X.C.; Zhou, Y.Y.; Shi, Z.T.; Yu, G.J.; Hu, T.Y.; Wang, Y.X.; Huang, J.X.; Bai, T.C.; Sun, Z.C.; et al. Global urban fractional changes at a 1 km resolution throughout 2100 under eight scenarios of Shared SocioeconomicPathways (SSPs) and Representative Concentration Pathways (RCPs). Earth Syst. Sci. Data 2023, 15, 3623–3639. [Google Scholar] [CrossRef]
  9. Grill, G.; Lehner, B.; Thieme, M.; Geenen, B.; Tickner, D.; Antonelli, F.; Babu, S.; Borrelli, P.; Cheng, L.; Crochetiere, H.; et al. Mapping the world's free-flowing rivers. Nature 2019, 569, 215. [Google Scholar] [CrossRef]
  10. Liu, B.B.; Xu, C.W.; Yang, J.S.; Lin, S.; Wang, X. Effect of Land Use and Drainage System Changes on Urban Flood Spatial Distribution in Handan City: A Case Study. Sustainability 2022, 14, 14610. [Google Scholar] [CrossRef]
  11. Fang, L.; Huang, J.L.; Cai, J.T.; Nitivattananon, V. Hybrid approach for flood susceptibility assessment in a flood-prone mountainous catchment in China. J. Hydrol. 2022, 612, 128091. [Google Scholar] [CrossRef]
  12. Teng, J.; Jakeman, A.J.; Vaze, J.; Croke, B.F.W.; Dutta, D.; Kim, S. Flood inundation modelling: A review of methods, recent advances and uncertainty analysis. Environ. Modell. Softw. 2017, 90, 201–216. [Google Scholar] [CrossRef]
13. Henonin, J.; Russo, B.; Mark, O.; Gourbesville, P. Real-time urban flood forecasting and modelling—A state of the art. J. Hydroinf. 2013, 15, 717–736.
14. Yin, D.K.; Evans, B.; Wang, Q.; Chen, Z.X.; Jia, H.F.; Chen, A.S.; Fu, G.T.; Ahmad, S.; Leng, L.Y. Integrated 1D and 2D model for better assessing runoff quantity control of low impact development facilities on community scale. Sci. Total Environ. 2020, 720, 137630.
15. Leskens, J.G.; Brugnach, M.; Hoekstra, A.Y.; Schuurmans, W. Why are decisions in flood disaster management so poorly supported by information from flood models? Environ. Modell. Softw. 2014, 53, 53–61.
16. Thrysoe, C.; Balstrom, T.; Borup, M.; Lowe, R.; Jamali, B.; Arnbjerg-Nielsen, K. FloodStroem: A fast dynamic GIS-based urban flood and damage model. J. Hydrol. 2021, 600, 126521.
17. He, Y.; Xu, Z.S. Multi-attribute decision making methods based on reference ideal theory with probabilistic hesitant information. Expert Syst. Appl. 2019, 118, 459–469.
18. Niu, L.L.; Li, J.; Li, F.L.; Wang, Z.X. Multi-criteria decision-making method with double risk parameters in interval-valued intuitionistic fuzzy environments. Complex Intell. Syst. 2020, 6, 669–679.
19. Li, C.L.; Sun, N.; Lu, Y.H.; Guo, B.Y.; Wang, Y.; Sun, X.K.; Yao, Y.K. Review on Urban Flood Risk Assessment. Sustainability 2023, 15, 765.
20. Yaseen, Z.M.; Sulaiman, S.O.; Deo, R.C.; Chau, K. An enhanced extreme learning machine model for river flow forecasting: State-of-the-art, practical applications in water resource engineering area and future research direction. J. Hydrol. 2019, 569, 387–408.
21. Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrol. 2013, 504, 69–79.
22. Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.B.; Grof, G.; Ho, H.L.; et al. A comparative assessment of flood susceptibility modeling using Multi-Criteria Decision-Making Analysis and Machine Learning Methods. J. Hydrol. 2019, 573, 311–323.
23. Tyralis, H.; Papacharalampous, G.; Langousis, A. Super ensemble learning for daily streamflow forecasting: Large-scale demonstration and comparison with multiple machine learning algorithms. Neural Comput. Appl. 2021, 33, 3053–3068.
24. Zounemat-Kermani, M.; Batelaan, O.; Fadaee, M.; Hinkelmann, R. Ensemble machine learning paradigms in hydrology: A review. J. Hydrol. 2021, 598, 126266.
25. Zhao, G.; Pang, B.; Xu, Z.X.; Yue, J.J.; Tu, T.B. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. 2018, 615, 1133–1142.
26. Al-Abadi, A.M.; Pradhan, B. In flood susceptibility assessment, is it scientifically correct to represent flood events as a point vector format and create flood inventory map? J. Hydrol. 2020, 590, 125475.
27. Renno, C.D.; Nobre, A.D.; Cuartas, L.A.; Soares, J.V.; Hodnett, M.G.; Tomasella, J.; Waterloo, M.J. HAND, a new terrain descriptor using SRTM-DEM: Mapping terra-firme rainforest environments in Amazonia. Remote Sens. Environ. 2008, 112, 3469–3481.
28. Ali, S.A.; Parvin, F.; Pham, Q.B.; Vojtek, M.; Vojtekova, J.; Costache, R.; Linh, N.; Nguyen, H.Q.; Ahmad, A.; Ghorbani, M.A. GIS-based comparative assessment of flood susceptibility mapping using hybrid multi-criteria decision-making approach, naive Bayes tree, bivariate statistics and logistic regression: A case of Topla basin, Slovakia. Ecol. Indic. 2020, 117, 106620.
29. Lin, T.; Shi, P.; Ma, C.; Shi, F.; Nie, J.; Chen, B. Height above nearest drainage and application in flood inundation mapping in China. J. Beijing Norm. Univ. 2022, 58, 300–309.
30. Nobre, A.D.; Cuartas, L.A.; Hodnett, M.; Rennó, C.D.; Rodrigues, G.; Silveira, A.; Waterloo, M.; Saleska, S. Height Above the Nearest Drainage—A hydrologically relevant new terrain model. J. Hydrol. 2011, 404, 13–29.
31. Li, Z.; Duque, F.Q.; Grout, T.; Bates, B.; Demir, I. Comparative analysis of performance and mechanisms of flood inundation map generation using Height Above Nearest Drainage. Environ. Modell. Softw. 2023, 159, 105565.
32. Li, Z.; Mount, J.; Demir, I. Accounting for uncertainty in real-time flood inundation mapping using HAND model: Iowa case study. Nat. Hazards 2022, 112, 977–1004.
33. Garousi-Nejad, I.; Tarboton, D.G.; Aboutalebi, M.; Torres-Rua, A.F. Terrain Analysis Enhancements to the Height above Nearest Drainage Flood Inundation Mapping Method. Water Resour. Res. 2019, 55, 7983–8009.
34. Wu, T.; Li, J.Y.; Li, T.J.; Sivakumar, B.; Zhang, G.; Wang, G.Q. High-efficient extraction of drainage networks from digital elevation models constrained by enhanced flow enforcement from known river maps. Geomorphology 2019, 340, 184–201.
35. Ye, S.; Zhang, Q.W.; Yan, F.; Ren, B.; Shen, D.T. A novel approach for high-quality drainage network extraction in flat terrains by using a priori knowledge of hydrogeomorphic features to extend DEMs: A case study in the Hoh Xil region of the Qinghai-Tibetan Plateau. Geomorphology 2022, 403, 108138.
36. Teng, J.; Penton, D.J.; Ticehurst, C.; Sengupta, A.; Freebairn, A.; Marvanek, S.; Vaze, J.; Gibbs, M.; Streeton, N.; Karim, F.; et al. A Comprehensive Assessment of Floodwater Depth Estimation Models in Semiarid Regions. Water Resour. Res. 2022, 58, e2022WR032031.
37. Yu, M. Spatial and temporal evolution patterns of land use in provincial capital cities and their driving factors: Taking Zhengzhou city as an example. Rural. Econ. Technol. 2022, 33, 24–26.
38. Chen, H.L.; Liang, Q.H.; Liu, Y.; Xie, S.G. Hydraulic correction method (HCM) to enhance the efficiency of SRTM DEM in flood modeling. J. Hydrol. 2018, 559, 56–70.
39. Kim, D.E.; Liong, S.Y.; Gourbesville, P.; Andres, L.; Liu, J.D. Simple-Yet-Effective SRTM DEM Improvement Scheme for Dense Urban Cities Using ANN and Remote Sensing Data: Application to Flood Modeling. Water 2020, 12, 816.
40. Hassanein, M.; Lari, Z.; El-Sheimy, N. A New Vegetation Segmentation Approach for Cropped Fields Based on Threshold Detection from Hue Histograms. Sensors 2018, 18, 1253.
41. O'Callaghan, J.F.; Mark, D.M. The extraction of drainage networks from digital elevation data. Comput. Vis. Graph. Image Process. 1984, 28, 323–344.
42. Hsieh, M.H.; Hsieh, M.J.; Chen, C.M.; Hsieh, C.C.; Chao, C.M.; Lai, C.C. Comparison of machine learning models for the prediction of mortality of patients with unplanned extubation in intensive care units. Sci. Rep. 2018, 8, 17116.
43. Qian, Y.G.; Zhou, W.Q.; Yan, J.L.; Li, W.F.; Han, L.J. Comparing Machine Learning Classifiers for Object-Based Land Cover Classification Using Very High Resolution Imagery. Remote Sens. 2015, 7, 153–168.
44. Kang, M.C.; Yoo, D.Y.; Gupta, R. Machine learning-based prediction for compressive and flexural strengths of steel fiber-reinforced concrete. Constr. Build. Mater. 2021, 266, 121117.
45. Yao, J.; Zhang, X.X.; Luo, W.C.; Liu, C.J.; Ren, L.L. Applications of Stacking/Blending ensemble learning approaches for evaluating flash flood susceptibility. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102932.
46. Fan, J.L.; Wang, X.K.; Wu, L.F.; Zhou, H.M.; Zhang, F.C.; Yu, X.; Lu, X.H.; Xiang, Y.Z. Comparison of Support Vector Machine and Extreme Gradient Boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: A case study in China. Energy Conv. Manag. 2018, 164, 102–111.
47. Gu, J.Y.; Liu, S.G.; Zhou, Z.Z.; Chalov, S.R.; Zhuang, Q. A Stacking Ensemble Learning Model for Monthly Rainfall Prediction in the Taihu Basin, China. Water 2022, 14, 492.
48. Kutty, A.A.; Wakjira, T.G.; Kucukvar, M.; Abdella, G.M.; Onat, N.C. Urban resilience and livability performance of European smart cities: A novel machine learning approach. J. Clean. Prod. 2022, 378, 134203.
49. Al-Aizari, A.R.; Al-Masnay, Y.A.; Aydda, A.; Zhang, J.Q.; Ullah, K.; Islam, A.; Habib, T.; Kaku, D.U.; Nizeyimana, J.C.; Al-Shaibah, B.; et al. Assessment Analysis of Flood Susceptibility in Tropical Desert Area: A Case Study of Yemen. Remote Sens. 2022, 14, 4050.
50. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
51. Zhang, H.C.; Zhu, T.T. Stacking Model for Photovoltaic-Power-Generation Prediction. Sustainability 2022, 14, 5669.
52. Dumancas, G.; Adrianto, I. A stacked regression ensemble approach for the quantitative determination of biomass feedstock compositions using near infrared spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 276, 121231.
53. Zitlau, R.; Hoyle, B.; Paech, K.; Weller, J.; Rau, M.M.; Seitz, S. Stacking for machine learning redshifts applied to SDSS galaxies. Mon. Not. R. Astron. Soc. 2016, 460, 3152–3162.
54. Wakjira, T.G.; Alam, M.S.; Ebead, U. Plastic hinge length of rectangular RC columns using ensemble machine learning model. Eng. Struct. 2021, 244, 112808.
55. Ouma, Y.O.; Tateishi, R. Urban Flood Vulnerability and Risk Mapping Using Integrated Multi-Parametric AHP and GIS: Methodological Overview and Case Study Assessment. Water 2014, 6, 1515–1545.
56. Temenos, A.; Tzortzis, I.N.; Kaselimi, M.; Rallis, I.; Doulamis, A.; Doulamis, N. Novel Insights in Spatial Epidemiology Utilizing Explainable AI (XAI) and Remote Sensing. Remote Sens. 2022, 14, 3074.
57. Chen, W.; Li, Y.; Xue, W.F.; Shahabi, H.; Li, S.J.; Hong, H.Y.; Wang, X.J.; Bian, H.Y.; Zhang, S.; Pradhan, B.; et al. Modeling flood susceptibility using data-driven approaches of naive Bayes tree, alternating decision tree, and random forest methods. Sci. Total Environ. 2020, 701, 134979.
58. Chen, R.C.; Dewi, C.; Huang, S.W.; Caraka, R.E. Selecting critical features for data classification based on machine learning methods. J. Big Data 2020, 7, 52.
59. Fan, J.; Wang, X.; Zhang, F.; Ma, X.; Wu, L. Predicting daily diffuse horizontal solar radiation in various climatic regions of China using support vector machine and tree-based soft computing models with local and extrinsic climatic data. J. Clean. Prod. 2020, 248, 119264.
60. Lin, N.; Zhang, D.; Feng, S.S.; Ding, K.; Tan, L.B.; Wang, B.; Chen, T.; Li, W.L.; Dai, X.A.; Pan, J.P.; et al. Rapid Landslide Extraction from High-Resolution Remote Sensing Images Using SHAP-OPT-XGBoost. Remote Sens. 2023, 15, 3901.
Figure 1. Keyword Clustering and Co-occurrence Diagram.
Figure 2. Location and terrain of the study area. (a) China map. (b) Zhengzhou City. (c) The main urban area.
Figure 3. NHAND establishment steps.
Figure 4. Correlation matrix of flood-inducing factors and NHAND.
Figure 5. Schematic diagram of the Stacking approach. Parameters refer to the optimal parameters for each model; y1, y2, …, yn denote the base learners' predictions on the original training data.
Figure 6. Conceptual schematic of k-fold cross-validation.
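For readers who want to reproduce the scheme sketched in Figure 6 (and the fold-wise scores summarized later in Figure 9), a minimal 5-fold cross-validation sketch is given below. The feature matrix `X`, target `y`, and the random placeholder data are illustrative assumptions, not the study's dataset; the XGBoost settings are taken from the Factor-Group 1 column of Table 4.

```python
# Minimal sketch of 5-fold cross-validation with fold-wise R2 (illustrative only).
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

X, y = np.random.rand(500, 8), np.random.rand(500)  # placeholder data, not the study's dataset

model = XGBRegressor(n_estimators=130, learning_rate=0.08, max_depth=7,
                     subsample=0.8, colsample_bytree=1.0)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
r2_per_fold = cross_val_score(model, X, y, cv=cv, scoring="r2")

print("R2 per fold:", np.round(r2_per_fold, 3))
print("Mean R2 = %.3f, Std = %.3f" % (r2_per_fold.mean(), r2_per_fold.std()))
```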
Figure 7. Flood risk distribution results. (a) The complete NHAND model proposed in this paper. (b) NHAND results without topographic corrections. (c) The existing HAND model. (d) Training and testing datasets.
Figure 8. Predictive performance comparison of all models based on Factor-Group 1.
Figure 9. R2 variation curves of each model under 5-fold cross-validation. Std denotes the standard deviation of each model's scores across the five folds, reflecting its stability.
Figure 10. Single-sample prediction analysis.
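A local explanation of the kind shown in Figure 10 can be produced with the SHAP library; the sketch below assumes a fitted XGBoost regressor `model` and a pandas DataFrame `X_train` of conditioning factors, both placeholders for the study's actual objects.

```python
# Minimal sketch of a single-sample (local) SHAP explanation (illustrative only).
import shap

explainer = shap.TreeExplainer(model)          # 'model' is a fitted XGBRegressor (assumed)
shap_values = explainer.shap_values(X_train)   # 'X_train' is a DataFrame of factors (assumed)

i = 0  # index of the sample to explain
shap.force_plot(explainer.expected_value, shap_values[i], X_train.iloc[i], matplotlib=True)
```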
Figure 11. Global explanatory analysis. (a) SHAP summary plot. (b) Feature Importance Ranking.
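Continuing the sketch above, global views like those in Figure 11 correspond to SHAP's beeswarm summary plot and its mean-|SHAP| importance ranking.

```python
# Global SHAP views (illustrative only), reusing 'shap_values' and 'X_train' from the sketch above.
import shap

shap.summary_plot(shap_values, X_train)                    # beeswarm summary (cf. Figure 11a)
shap.summary_plot(shap_values, X_train, plot_type="bar")   # mean |SHAP| importance (cf. Figure 11b)
```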
Figure 12. SHAP dependence and interaction plots. (a,b) The effect of DTR on model predictions under its interaction with DOR and TRI, respectively. (c,d) The effect of DOR on model predictions under its interaction with DTR and TRI, respectively.
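Dependence and interaction views of this kind can be drawn with SHAP's dependence plot; the factor names "DTR", "DOR", and "TRI" are taken from the caption and assumed here to be column names in `X_train` from the earlier sketches.

```python
# SHAP dependence plots with a chosen interaction feature (illustrative only).
import shap

shap.dependence_plot("DTR", shap_values, X_train, interaction_index="DOR")  # cf. Figure 12a
shap.dependence_plot("DOR", shap_values, X_train, interaction_index="TRI")  # cf. Figure 12d
```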
Table 1. Top five keywords with the strongest citation bursts from 2003 to 2023.
Keywords | Year | Begin | End | Strength | 2003–2023 *
flood forecasting | 2003 | 2003 | 2012 | 5.807 | [burst timeline]
rainfall runoff model | 2003 | 2005 | 2015 | 11.623 | [burst timeline]
mathematical model | 2003 | 2007 | 2017 | 7.813 | [burst timeline]
machine learning | 2003 | 2020 | 2023 | 7.574 | [burst timeline]
random forest | 2003 | 2020 | 2023 | 5.829 | [burst timeline]
* In the burst-timeline column, light blue represents the years 2003 to 2023 during which the keyword first appeared; blue represents the period when the keyword was at the forefront; red indicates the period when the keyword was most frequently cited.
Table 2. Land Use Type Changes and Correlation Analysis of Driving Factors in the Main Urban Area from 1999 to 2021.
Land-use type | Cropland | Forest | Grassland | Water | Barren | Impervious
Change, rate (%) | −51.31 | 746.02 | −41.00 | −3.64 | −63.64 | 81.86
Change, area (km2) | −316.65 | 3.37 | −0.04 | −0.87 | −0.01 | 314.19
Correlation with urbanization | −0.95 | 0.74 | −0.20 | −0.47 | −0.07 | 0.95
Correlation with population | −0.95 | 0.73 | −0.37 | −0.62 | −0.09 | 0.95
Correlation with GDP | −0.96 | 0.76 | −0.35 | −0.64 | −0.05 | 0.97
Correlation with primary industry | −0.97 | 0.90 | −0.10 | −0.53 | 0.22 | 0.97
Correlation with secondary industry | −0.99 | 0.87 | −0.27 | −0.64 | 0.09 | 0.99
Correlation with tertiary industry | −0.91 | 0.65 | −0.40 | −0.63 | −0.15 | 0.92
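The correlation block of Table 2 is a set of pairwise correlations between each driving factor's annual series and the area of each land-use type. A minimal sketch of such a computation is shown below; the DataFrames `land_use` and `drivers` (years as rows, land-use areas and socio-economic indicators as columns) are assumed placeholders, not the study's data.

```python
# Sketch of the driving-factor correlation analysis in Table 2 (illustrative only).
import pandas as pd

def driver_correlations(land_use: pd.DataFrame, drivers: pd.DataFrame) -> pd.DataFrame:
    """Pairwise correlation of every driving factor (rows) with every land-use type (columns)."""
    return pd.DataFrame(
        {lu: drivers.corrwith(land_use[lu]) for lu in land_use.columns}
    ).round(2)
```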
Table 3. Change in the flow direction.
Vertical drop (m) | 5 | 10 | 15 | 20 | 50 | 100
Change rate of flow direction (%) | 7.85 | 8.74 | 8.94 | 9.01 | 9.55 | 9.75
Difference in change rate (%) | ~ | 0.89 | 1.09 | 1.16 | 1.70 | 1.90
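The last row of Table 3 is simply the change rate at each vertical drop minus the change rate at the 5 m baseline, as the short check below illustrates.

```python
# Differences in flow-direction change rate relative to the 5 m vertical drop (cf. Table 3).
import numpy as np

change_rate = np.array([7.85, 8.74, 8.94, 9.01, 9.55, 9.75])  # % at 5, 10, 15, 20, 50, 100 m
print(np.round(change_rate[1:] - change_rate[0], 2))           # -> [0.89 1.09 1.16 1.7  1.9 ]
```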
Table 4. Parameter values of the models.
Model | Hyper-parameter | Optimal value (Factor-Group 1) | Optimal value (Factor-Group 2)
SVM | Kernel | RBF | RBF
SVM | C | 700 | 900
SVM | γ | 0.012 | 0.034
KNN | K | 5 | 23
Lasso | α | 0.004 | 0.003
RF | Number of estimators | 1000 | 1000
RF | Maximum depth | 16 | 18
RF | Minimum sample split | 2 | 2
RF | Minimum sample leaf | 3 | 1
XGBoost | Number of estimators | 130 | 140
XGBoost | Learning rate | 0.08 | 0.05
XGBoost | Maximum depth | 7 | 13
XGBoost | Subsample | 0.8 | 0.8
XGBoost | Colsample by tree | 1.0 | 1.0
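For convenience, the Factor-Group 1 settings in Table 4 could be instantiated roughly as below. The regression classes from scikit-learn and the xgboost package are an assumption consistent with the R2/MSE/RMSE metrics reported later; anything not listed in the table (e.g., random seeds) is left at its default.

```python
# Sketch: instantiating the tuned models for Factor-Group 1 (values from Table 4).
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

models = {
    "SVM": SVR(kernel="rbf", C=700, gamma=0.012),   # gamma corresponds to γ in Table 4
    "KNN": KNeighborsRegressor(n_neighbors=5),      # n_neighbors corresponds to K
    "Lasso": Lasso(alpha=0.004),                    # alpha corresponds to α
    "RF": RandomForestRegressor(n_estimators=1000, max_depth=16,
                                min_samples_split=2, min_samples_leaf=3),
    "XGBoost": XGBRegressor(n_estimators=130, learning_rate=0.08, max_depth=7,
                            subsample=0.8, colsample_bytree=1.0),
}
```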
Table 5. Stacking model configuration: optimal learners and hyper-parameters for each factor group.
Factor-Group 1
  Base learners:
    SVM: kernel = RBF, C = 1, γ = 0.089
    KNN: K = 11
    Lasso: α = 0.1
    RF: number of estimators = 1900, maximum depth = 16, minimum sample split = 2, minimum sample leaf = 2
    XGBoost: number of estimators = 80, learning rate = 0.14, maximum depth = 6, subsample = 0.8, colsample by tree = 1.0
  Meta-learner:
    Lasso: α = 0.01
Factor-Group 2
  Base learners:
    KNN: K = 17
    RF: number of estimators = 700, maximum depth = 12, minimum sample split = 2, minimum sample leaf = 1
    XGBoost: number of estimators = 190, learning rate = 0.08, maximum depth = 4, subsample = 0.8, colsample by tree = 1.0
  Meta-learner:
    Lasso: α = 0.1
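If the ensemble is built along the lines of scikit-learn's StackingRegressor, the Factor-Group 1 configuration of Table 5 might look roughly as follows; this is a structural sketch under that assumption, not the authors' exact implementation.

```python
# Sketch: Stacking ensemble for Factor-Group 1 (hyper-parameters from Table 5).
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor

base_learners = [
    ("svm", SVR(kernel="rbf", C=1, gamma=0.089)),
    ("knn", KNeighborsRegressor(n_neighbors=11)),
    ("lasso", Lasso(alpha=0.1)),
    ("rf", RandomForestRegressor(n_estimators=1900, max_depth=16,
                                 min_samples_split=2, min_samples_leaf=2)),
    ("xgb", XGBRegressor(n_estimators=80, learning_rate=0.14, max_depth=6,
                         subsample=0.8, colsample_bytree=1.0)),
]

# The base learners' cross-validated predictions (y1, y2, ..., yn in Figure 5)
# become the inputs of a Lasso meta-learner.
stacking = StackingRegressor(estimators=base_learners,
                             final_estimator=Lasso(alpha=0.01), cv=5)
```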
Table 6. Performance indices of different ML models based on Factor-Group 1.
Model | Training R2 | Training MSE | Training RMSE | Testing R2 | Testing MSE | Testing RMSE
SVM | 0.935 | 35.740 | 5.978 | 0.949 | 34.339 | 5.860
KNN | 1.000 | 0.000 | 0.000 | 0.868 | 89.491 | 9.460
Lasso | 0.843 | 86.004 | 9.274 | 0.855 | 98.407 | 9.920
RF | 0.990 | 9.045 | 3.001 | 0.959 | 27.599 | 5.254
XGBoost | 0.997 | 1.848 | 1.362 | 0.964 | 24.707 | 4.971
Stacking | 0.992 | 6.448 | 2.533 | 0.966 | 23.360 | 4.833
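The indices in Tables 6 and 7 are the coefficient of determination (R2), mean squared error (MSE), and root mean squared error (RMSE). A minimal way to compute them for any fitted model is sketched below; `y_true` and `y_pred` stand in for the observed and predicted values.

```python
# Sketch: computing the performance indices used in Tables 6 and 7 (illustrative only).
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {"R2": r2_score(y_true, y_pred), "MSE": mse, "RMSE": float(np.sqrt(mse))}
```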
Table 7. Performance indices of different ML models based on Factor-Group 2.
Model | Training R2 | Training MSE | Training RMSE | Testing R2 | Testing MSE | Testing RMSE
SVM | 0.698 | 165.771 | 12.875 | 0.567 | 294.017 | 17.147
KNN | 1.000 | 0.000 | 0.000 | 0.459 | 366.637 | 19.148
Lasso | 0.295 | 387.026 | 19.673 | 0.343 | 445.663 | 21.111
RF | 0.960 | 22.226 | 4.714 | 0.730 | 183.232 | 13.536
XGBoost | 0.994 | 3.479 | 1.865 | 0.739 | 176.932 | 13.302
Stacking | 0.958 | 23.310 | 4.828 | 0.759 | 168.659 | 12.987
Table 8. Comparison of model training time.
Model | Training time, Factor-Group 1 (s) * | Training time, Factor-Group 2 (s) *
SVM | 0.97 | 0.47
KNN | 0.01 | 0.01
Lasso | 0.01 | 0.01
RF | 5.11 | 4.75
XGBoost | 0.32 | 0.39
Stacking | 31.02 | 29.44
* The average of training times across 5 runs.
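According to the footnote of Table 8, each entry is an average over five runs; one simple way to obtain such a figure is sketched below, assuming a scikit-learn style model object and placeholder training arrays `X_train` and `y_train`.

```python
# Sketch: average wall-clock training time over 5 runs (illustrative only).
import time

def mean_training_time(model, X_train, y_train, runs=5):
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        model.fit(X_train, y_train)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / runs
```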
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
