Flash Flood Regionalization for the Hengduan Mountains Region, China, Combining GNN and SHAP Methods

Li, Yifan; Zhang, Chendi; Cui, Peng; Hassan, Marwan; Duan, Zhongjie; Bhattacharyya, Suman; Yao, Shunyu; Zhao, Yang

doi:10.3390/rs17060946


by Yifan Li ¹,², Chendi Zhang ¹,*, Peng Cui ¹,²,³, Marwan Hassan ⁴, Zhongjie Duan ⁵, Suman Bhattacharyya ⁴, Shunyu Yao ⁶ and Yang Zhao ⁷

¹ Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

² University of Chinese Academy of Sciences, Beijing 100049, China

³ China-Pakistan Joint Research Center on Earth Sciences, CAS-HEC, Islamabad 45320, Pakistan

⁴ Department of Geography, University of British Columbia, Vancouver, BC V6T1Z2, Canada

⁵ Department of Data Science and Engineering, East China Normal University, Shanghai 200062, China

⁶ China Institute of Water Resources and Hydropower Research, Beijing 100038, China

⁷ Sichuan Zipingpu Development Co., Ltd., Chengdu 610091, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(6), 946; https://doi.org/10.3390/rs17060946

Submission received: 20 January 2025 / Revised: 19 February 2025 / Accepted: 5 March 2025 / Published: 7 March 2025

(This article belongs to the Special Issue Advancing Water System with Satellite Observations and Deep Learning)


Abstract:

The Hengduan Mountains region (HMR) is vulnerable to flash flood disasters, which account for the largest proportion of flood-related fatalities in China. Flash flood regionalization, which divides a region into homogeneous subdivisions based on flash flood-inducing factors, provides insights for the spatial distribution patterns of flash flood risk, especially in ungauged areas. However, existing methods for flash flood regionalization have not fully reflected the spatial topology structure of the inputted geographical data. To address this issue, this study proposed a novel framework combining a state-of-the-art unsupervised Graph Neural Network (GNN) method, Dink-Net, and Shapley Additive exPlanations (SHAP) for flash flood regionalization in the HMR. A comprehensive dataset of flash flood inducing factors was first established, covering geomorphology, climate, meteorology, hydrology, and surface conditions. The performances of two classic machine learning methods (K-means and Self-organizing feature map) and three GNN methods (Deep Graph Infomax (DGI), Deep Modularity Networks (DMoN), and Dilation shrink Network (Dink-Net)) were compared for flash-flood regionalization, and the Dink-Net model outperformed the others. The SHAP model was then applied to quantify the impact of all the inducing factors on the regionalization results by Dink-Net. The newly developed framework captured the spatial interactions of the inducing factors and characterized the spatial distribution patterns of the factors. The unsupervised Dink-Net model allowed the framework to be independent from historical flash flood data, which would facilitate its application in ungauged mountainous areas. The impact analysis highlights the significant positive influence of extreme rainfall on flash floods across the entire HMR. 
The pronounced positive impact of soil moisture and saturated hydraulic conductivity in the areas with a concentration of historical flash flood events, together with the positive impact of topography (elevation) in the transition zone from the Qinghai–Tibet Plateau to the Sichuan Basin, have also been revealed. The results of this study provide technical support and a scientific basis for flood control and disaster reduction measures in mountain areas according to local inducing conditions.

Keywords:

flash flood; Graph Neural Network (GNN); regionalization; the Hengduan Mountains region (HMR); interpretation model

1. Introduction

Floods are among the most severe and destructive natural disasters worldwide [1,2]. Among them, flash floods, the rapid flooding processes in small mountainous watersheds (<200 km², as suggested by the Ministry of Water Resources of China) triggered by heavy rainstorms [3], stand out as particularly devastating and deadly. Between 1949 and 2015, over 60,000 flash floods resulted in approximately 280,000 deaths in China [4,5]. On average, flash floods accounted for 66.4% of the dead and missing from flood-related disasters in China during 2000–2021 [6]. The Hengduan Mountains region (HMR) in Southwest China is particularly susceptible to flash flood disasters, accounting for 42.3% of the total fatalities in China during 2011–2015 [4,7]. In 2014, deaths from flash floods in the HMR even exceeded the total for the rest of China [4]. The HMR also hosts many national key engineering projects, e.g., the hydro–wind–solar multi-energy complementation bases in the Jinsha River and Yalong River [8,9] and the Sichuan–Tibet Railway [10]. There is therefore a pressing need to address the threats that flash floods pose to the safety of local people and infrastructure in this region.

Understanding the spatial distribution of flash floods is crucial for risk analysis and prevention. Traditional methods have advanced by integrating high-resolution ground-based hydrological data with numerical models that capture hydraulic dynamics and morphological features of river networks [11]. However, their reliance on ground-based networks results in limited coverage, especially in mountainous regions. In contrast, satellite monitoring provides broader and near-real-time data on flood extents, channel morphology, and post-flood topographic changes [12]. Moreover, integrating satellite data with GIS and machine learning techniques in flash flood regionalization refines our ability to characterize the spatial distribution patterns [13]. This is particularly valuable for mountainous areas in the HMR, where monitoring facilities are sparse [14]. Although regionalization analysis on climate, vegetation [15], and hydrology [16] in the HMR has been conducted, these studies apply traditional indicator thresholds or geographical analysis methods without considering the spatial relationship between geographical data dimensions and were not designed specifically for flash floods in this region.

Flash flood regionalization utilizes the inducing factors as input parameters and partitions the geographical space into homogeneous regions considering both the intra-regional similarity and inter-regional differences of the input parameters [17]. The inducing factors are closely related to the generation processes of flash floods, which involve complex interactions between various aspects of geomorphology, climate, meteorology, hydrology, vegetation, and soil conditions [11]. In each homogeneous region, the influence of inducing factors on flash floods is similar in space, and hence, the risk management strategies and hazard mitigation measures can be consistent for the same region [18]. Additionally, flash flood regionalization facilitates the extrapolation of monitored data from gauged sites to ungauged areas within the same region [14,18], which would be of great help to address the challenges of limited data access in mountainous areas due to measuring difficulty.

Classic machine learning techniques, including K-means, tree-based ensemble models [19], support vector machines (SVM, e.g., [20]), and self-organizing-map-based methods (e.g., [18]), have been widely employed in flash flood regionalization. These methods delineate regions as the spatial extent of clusters obtained by integrating the impacts of the input inducing factors (e.g., [18]). Zhang et al. [18] applied a two-stage hybrid self-organizing map-based clustering algorithm to delineate 18 homogeneous flash flood regions according to 13 key factors in Jiangxi Province. Deep learning techniques, such as convolutional neural networks (CNN, e.g., [2]) and LSTM neural networks (e.g., [1]), have also been applied in flash flood regionalization and showed superior computational efficiency and feature extraction capabilities. Flash flood regionalization utilizes multi-dimensional geographical data that relate to the formation of flash floods and show significant spatial correlations [18]. The factors considered in regionalization typically include topographic characteristics (e.g., slope, elevation [1]), climatic conditions (e.g., precipitation [2]), hydrological properties (e.g., drainage density [19]), soil attributes (e.g., type, depth [21]), and land cover types (e.g., vegetation index [2]). Therefore, it is crucial to account for both the geographical attributes at individual map grids and the spatial connections between these grids during the clustering process. Despite their effectiveness, classic machine learning approaches primarily focus on geographical attributes and often fail to capture the spatial correlations crucial for flash flood regionalization [22].

In contrast, graph neural networks (GNNs) take both attribute characteristics and spatial topology structures into account in classification tasks in geoscience [23] and hence, show great potential in flash flood regionalization. Recent advancements have enhanced the performance of GNNs by jointly optimizing embedding learning and graph clustering [24], as well as enhancing the scalability of GNNs for better classification performance in processing large-scale data. Notable examples, such as GraphSAGE [25], S3GC [26], and Dink-Net [27], are capable of dealing with large-scale input data at the level of millions. These advancements in GNN would facilitate flash flood regionalization for large areas with multi-dimensional input attributes [28]. However, since these advanced GNN methods have only been developed very recently, their performance in flash flood regionalization still needs to be tested and evaluated [28].

An inherent issue with the predictions or classifications of deep learning (DL) models (including GNNs) is that these models act as “black boxes”. Fortunately, post-hoc interpretation models are capable of quantifying and visualizing the complex interactions and non-linear relationships that DL models have captured from the input data [29]. In particular, the Shapley Additive exPlanations (SHAP) model examines the sensitivity of DL models by perturbing the original inputs [29,30]. This model can not only examine the influence of each input feature on the entire region, parts of the region, or even each grid but also indicate whether the influence is positive or negative. Therefore, the SHAP model is promising for quantifying the contributions of the inducing factors to the flash flood regionalization results obtained by the GNN methods.

The objectives of this study are: (i) to establish an automatic flash flood regionalization framework by integrating GNNs with interpretation models for the HMR; and (ii) to evaluate the framework’s effectiveness in revealing spatial distribution patterns of flash floods and elucidating the impacts of inducing factors within the HMR. To achieve these research goals, we first compiled and analyzed a comprehensive set of attributes related to flash flood generation for the entire HMR. We tested three GNN methods (Deep Graph Infomax (DGI), Deep Modularity Networks (DMoN), and Dilation shrink Network (Dink-Net)) and two classical machine learning methods (K-means and the self-organizing feature map (SOFM)) for flash flood regionalization. The SHAP model was then employed to quantify the impact of inducing factors on the optimal regionalization result. Last, the spatial distribution features of flash floods and the impacts of inducing factors were analyzed and discussed.

2. Study Area and Data

2.1. Study Area

The area of the Hengduan Mountains region (HMR) is about 430,000 km², covering 5 provinces of China (Figure 1). The terrain elevation in the HMR decreases from northwest to southeast, with altitudes typically exceeding 4000 m in the northwest and falling to less than 1500 m in the southeast. This region features seven prominent north–south oriented mountain ranges (i.e., the Boshulaling Mountain, Taniantaweng Mountain, Mangkang Mountain, Shaluli Mountain, Daxueshan Mountain, Qionglai Mountain, and Min Mountains) and six large rivers (i.e., the Nu (Salween) River, Lancang (Mekong) River, Jinsha River, Yalong River, Dadu River, and Min River). This topography results in the alternation of high mountains and deep valleys in the east–west direction [31,32].

The HMR spans the middle subtropical zone and the plateau temperate zone, with an average annual temperature and precipitation of 11.33 °C and 816.20 mm, respectively [33]. Influenced by the East Asian monsoon and South Asian monsoon and blocked by north–south mountain ranges, rainfall diminishes from the southeast and southwest towards the northwest hinterland (Figure S5f). The wet season, generally from May to October, contributes approximately 90% of the annual precipitation [34]. In summer, the occurrence rate of nocturnal precipitation generally exceeds 50% here [35]. Over the past 50 years, an increase in both the frequency and intensity of rainfall has been observed, which in turn has elevated flash flood risks in vulnerable areas [36]. Both the frequency and intensity of extreme rainfall events are higher in the southern and western fringe areas of the HMR but lower in the northern part (Figure S5d,e) [37]. As for soil characteristics, brown soil, red soil, and alpine meadow and steppe soil are the dominant soil types and sandy loam soil is the main soil texture in the region [38]. The high spatial heterogeneity of soil characteristics and landscapes results in high diversity of plant communities, with distinct zonation in both vertical and horizontal gradients [32]. The most widely distributed vegetation types in the HMR are evergreen coniferous forests and shrub grass [32,39]. Because of the abundant rainfall and steep terrain in the mountainous areas, flood-related disasters are common in the HMR (Figure 1a). In general, historical flash flood events appear more frequently in the south of the HMR than in the north [38] and are mainly concentrated along the valleys of the large rivers flowing in the HMR (Figure 1a).

2.2. Data

To comprehensively capture the factors influencing flood formation, an input database comprising 36 inducing factors was established (see details in Section 3.1 and Table 1). This dataset was selected from an initial set of 43 factors. To reduce redundancy and ensure the robustness of the analysis, 7 factors with correlation coefficients > 0.7 and high variance inflation were eliminated through Spearman correlation analysis, multicollinearity analysis, and flood mechanism analysis [40]: residual soil moisture, saturated soil moisture, soil bulk density, topsoil sand content, topsoil silt content, topsoil clay content, and stream power index (see details in Supplementary Section S1). The selected input data covered five key aspects, i.e., geomorphology, climate, meteorology, hydrology, and underlying surface conditions.

The original dataset included both raster and vector formats. Runoff data, originally in vector form, were converted to raster format, while the remaining data were already in raster form. Among the 36 variables, geomorphic type, vegetation type, soil type, soil texture, and soil depth were categorical data, whereas the others were continuous data. Additionally, factors listed in Table 1 without specified time scales or durations represent average states or static indicators. For factors spanning extended time periods, annual averages were calculated first, followed by multi-year averages (e.g., annual mean rainfall).

Extreme rainfall is an important direct trigger of flash floods [18]. Several key extreme rainfall factors were therefore included in the input data, calculated as follows. Initially, 25 extreme rainfall factors under different statistical frequencies were calculated from the 3-h resolution dataset covering 1979 to 2018. After testing the clustering results under different data combinations, the 12 factors listed in Table 1 were selected. To quantify these factors, the maximum rainfall values at the scales of 3 h, 6 h, 12 h, and 24 h in each year were calculated and ranked in descending order. The empirical frequency of the maximum values for each temporal scale was computed by fitting Pearson III-type empirical frequency curves with the method of moments [28]. From these curves, the maximum rainfall values for each temporal scale with frequencies of 1% and 50% were extracted, representing rainfall with recurrence intervals of 100 years and 2 years, respectively. These values were denoted as (P_3h)_1%, (P_3h)_50%, (P_6h)_1%, (P_6h)_50%, (P_12h)_1%, (P_12h)_50%, (P_24h)_1%, and (P_24h)_50%. Additionally, the annual mean of the maximum values for each temporal scale was calculated and added to the input dataset as (P_3h)_mean, (P_6h)_mean, (P_12h)_mean, and (P_24h)_mean. Since flash floods due to snow- or ice-melting only occur in very limited areas in the HMR, data on snow or ice were not considered in this work [41].
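The frequency calculation described above can be illustrated with a minimal, dependency-free sketch. It estimates the mean, standard deviation, and skewness of a series of annual maxima (method of moments) and converts them into Pearson III quantiles. The Wilson-Hilferty approximation of the frequency factor, the function names, and the sample values are assumptions made for illustration and are not taken from the study.

```python
from statistics import NormalDist, mean, stdev


def sample_skew(values):
    """Bias-corrected sample skewness coefficient (Cs)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return n * sum((v - m) ** 3 for v in values) / ((n - 1) * (n - 2) * s ** 3)


def pearson3_quantile(annual_maxima, exceedance_prob):
    """Rainfall depth with the given annual exceedance probability.

    Moments are estimated from the sample; the frequency factor uses the
    Wilson-Hilferty approximation (an assumption of this sketch).
    """
    m = mean(annual_maxima)
    s = stdev(annual_maxima)
    cs = sample_skew(annual_maxima)
    z = NormalDist().inv_cdf(1.0 - exceedance_prob)  # standard normal quantile
    if abs(cs) < 1e-6:
        k = z  # zero skew: Pearson III reduces to the normal distribution
    else:
        k = (2.0 / cs) * ((1.0 + cs * z / 6.0 - cs ** 2 / 36.0) ** 3 - 1.0)
    return m + k * s


# Hypothetical annual maxima of 3-h rainfall (mm) at a single grid cell.
p3h_annual_max = [18.2, 25.4, 21.0, 30.7, 16.9, 41.3, 23.8, 27.5, 19.6, 35.1]
p3h_1pct = pearson3_quantile(p3h_annual_max, 0.01)   # 100-year recurrence
p3h_50pct = pearson3_quantile(p3h_annual_max, 0.50)  # 2-year recurrence
p3h_mean = mean(p3h_annual_max)                      # annual mean of the maxima
```

Repeating the same calculation for the 6-h, 12-h, and 24-h maxima at every grid cell would yield the twelve meteorological factors listed in Table 1.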

The point-based records of 1680 historical flash flood events cover the period during 1950–2015 in the HMR (Figure 1), as documented by the National Flash Flood Investigation and Evaluation Project of China [6]. This dataset primarily consists of events caused by heavy rainfall. Flash floods caused by snowmelt or ice melt were not prominently reported during this period. The data were used in the validation of the regionalization results in Section 3.3, and the analysis was shown in Section 4.2. Each event was georeferenced with latitude and longitude coordinates, allowing for spatial analysis of flash flood occurrence patterns.

Table 1. Influencing factors for flash floods in the HMR.

Aspect	Factors	Abbreviation	Spatial Resolution	Temporal Resolution or Span	Geographical Meanings	Data Source
Geomorphology	Elevation	Dem	30 × 30 m²	Average states or static indicator	Height above sea.	Geospatial data cloud (www.gscloud.cn (accessed on 28 February 2025))
	Slope	Slope	1 × 1 km²		The ratio of elevation increment to horizontal increment.	Calculated from DEM data
	Aspect	Aspect	1 × 1 km²		The topographic slope orientation.
	Elevation difference	ED	1 × 1 km²		The difference between the maximum and minimum of elevation.
	Topographic wetness index (TWI)	TWI	1 × 1 km²		TWI = ln[sink flow per unit area/tan(slope)]; reflects the influence of topography on runoff direction and accumulation.
	Geomorphic type	GT	—		Classification based on elevation and topographic relief.	Resource and Environment Science and Data Center (https://www.resdc.cn/ (accessed on 28 February 2025))
Climate	Annual mean temperature	Temp	0.1° × 0.1°	2000~2023 (yearly data)	The annual average value of temperature.	European Centre for Medium-Range Weather Forecasts (ECMWF), ERA5-Land Dataset (https://www.ecmwf.int/ (accessed on 28 February 2025))
	Annual mean rainfall	Rainfall	1 × 1 km²	2000~2022 (yearly data)	The annual average value of rainfall.	National Tibetan Plateau Data Center (http://data.tpdc.ac.cn/ (accessed on 28 February 2025))
	Annual mean potential evapotranspiration	PET	1 × 1 km²	2000~2022 (yearly data)	The annual average value of potential evapotranspiration.
	Terrestrial actual evapotranspiration	AET	1 × 1 km²	2001~2019 (yearly data)	The annual average value of actual evapotranspiration.	National Tibetan Plateau Data Center (http://data.tpdc.ac.cn/ (accessed on 28 February 2025)), ETMonitor Global Actual Evapotranspiration Dataset with 1-km Resolution [42]
Meteorology	Maximum of 3-h/6-h/12-h/24-h with frequency of 1% and 50% rainfall, annual average maximum rainfall for 3 h/6-h/12-h/24-h	(P_3h)_1%, (P_3h)_50%, (P_3h)_mean, (P_6h)_1%, (P_6h)_50%, (P_6h)_mean, (P_12h)_1%, (P_12h)_50%, (P_12h)_mean, (P_24h)_1%, (P_24h)_50%, (P_24h)_mean	0.1° × 0.1°	1979~2018	Flash floods are primarily influenced by short-term meteorological indices within a day. Therefore, extreme rainfall data across four time scales within 24 h were selected. The 1% and 50% percentiles correspond to event frequencies occurring once every 100 years and once every two years, respectively.	A Big Earth Data Platform for Three Poles (https://poles.tpdc.ac.cn/ (accessed on 28 February 2025)), China meteorological forcing dataset [43].
Hydrology	Runoff	Runoff	1 × 1 km²	1979–2013 (yearly data)	Index of the annual daily runoff for 35 years for each river segment, reflecting both the structure of the river network and the flux in the river network, obtained from VIC simulations.	National Tibetan Plateau Data Center (http://data.tpdc.ac.cn/ (accessed on 28 February 2025)), Global Reconstruction of Naturalized River Discharge at 2.94 Million River Reaches (GRADES) [44,45].
	Vegetation transpiration	Ec	500 × 500 m²	2000–2020 (yearly data)	Water lost in plants to the atmosphere as water vapor.	PML-V2(China): evapotranspiration and gross primary production dataset. A Big Earth Data Platform for Three Poles (https://poles.tpdc.ac.cn/ (accessed on 28 February 2025)) [46,47].
	Vaporization of intercepted rainfall	Ei	500 × 500 m²	2000–2020 (yearly data)	Water from vegetation canopy intercepting rainfall evaporates into the atmosphere.
	Soil evaporation	Es	500 × 500 m²	2000–2020 (yearly data)	Water in soil that evaporates into the atmosphere.
Underlying surface condition	Vegetation type	Vege	1 × 1 km²	Average states or static indicator	Vegetation type.	Resource and Environment Science and Data Center (https://www.resdc.cn/ (accessed on 28 February 2025))
	NDVI	NDVI	1 × 1 km²	2001~2020 (yearly data)	Normalized difference vegetation index.	Resource and Environment Science and Data Center (https://www.resdc.cn/ (accessed on 28 February 2025))
	LAI	LAI	1 × 1 km²	2001~2023 (yearly data)	Leaf area index.	Google Earth Engine (https://developers.google.cn/earth-engine/datasets/catalog/MODIS_061_MOD15A2H (accessed on 28 February 2025)), MOD15A2H LAI
	Soil type	Stype	1 × 1 km²	Average states or static indicator	Soil type.	HWSD v1.1 (http://www.ncdc.ac.cn (accessed on 28 February 2025))
	Soil texture	Stexture	1 × 1 km²		Assemblage of mineral particles of different sizes in soil.
	Soil depth	Sdepth	1 × 1 km²		The depth between the surface and the bedrock.
	Soil gravel content	Sgravel	1 × 1 km²		Proportion of gravel volume in surface soil.
	Soil saturated hydraulic conductivity	Ks	1 × 1 km²		In water-saturated soil, the amount of water passing through the soil per unit area in a unit time under a unit water potential gradient.	A High-Resolution Global Map of Soil Hydraulic Properties (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/UI5LCE (accessed on 28 February 2025)) [48].
	Soil moisture	Smoisture	1 × 1 km²	2000~2020 (yearly data)	The amount of water contained in the soil.	A Big Earth Data Platform for Three Poles(https://poles.tpdc.ac.cn/ (accessed on 28 February 2025)), China Soil Moisture Dataset [49,50].
	Soil erodibility	Uslek	250 × 250 m²	Average states or static indicator	Soil erodibility reflects the sensitivity of soil to erosion and the transport capacity of soil particles.	National Tibetan Plateau Data Center (http://data.tpdc.ac.cn/ (accessed on 28 February 2025)), Soil Erodibility Dataset of Pan-Third Pole 20 Countries [51,52].

3. Methods

The framework for flash flood regionalization is presented in Figure 2. The framework begins with the collection and preprocessing of the input data on inducing factors of flash floods. Clustering analysis was then conducted using five methods and different cluster numbers. After postprocessing and mapping of the flash flood regionalization, the optimal method and regionalization map for the HMR were determined. Last, the regionalization result was interpreted with the SHAP model, revealing the impact of each input factor on the regionalization result.

3.1. Data Preprocessing

The raw database of 43 inducing factors for flash floods was re-rasterized to a 2 × 2 km² grid in ArcGIS (v10.6). Because correlation among the inputs would affect the subsequent analysis of factor influence, 7 factors were eliminated through correlation analysis, multicollinearity analysis, and flood mechanism analysis: the Stream Power Index, soil saturated water content, soil residual water content, soil bulk density, sand content, silt content, and clay content. Spearman correlation analysis and multicollinearity analysis were performed with the pandas library in Python 3.8. The remaining 36 attributes were used for clustering analysis (Table 1 and Section 2.2).
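As a rough illustration of the correlation screening, the sketch below computes Spearman rank correlations between factor columns and flags pairs above a 0.7 threshold. The study used pandas for this step. The pure-Python helpers, the use of the absolute value of the coefficient, and the toy factor values are assumptions of this example. Deciding which member of a flagged pair to drop would still rely on the multicollinearity and flood mechanism analyses.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def flag_correlated_pairs(factors, threshold=0.7):
    """Return (name_a, name_b, rho) for every pair with |rho| > threshold."""
    names = list(factors)
    flagged = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            rho = spearman(factors[names[a]], factors[names[b]])
            if abs(rho) > threshold:
                flagged.append((names[a], names[b], round(rho, 3)))
    return flagged


# Hypothetical values of three factors at five grid cells.
toy_factors = {
    "sand": [10, 20, 30, 40, 50],
    "clay": [50, 38, 31, 22, 9],
    "slope": [5, 1, 4, 2, 3],
}
flagged = flag_correlated_pairs(toy_factors)
```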

For the GNN methods, the attributes and locations of all the grids first needed to be transformed into a graph structure $G = (V, E, X)$ [22]. Here, $V = \{v_{1}, \dots, v_{N}\}$ represents a set of nodes, where each node is associated with corresponding $d$-dimensional attributes, and $E = \{e_{ij}\}$ denotes a set of edges connecting nodes. The adjacency matrix of $G$ is denoted as $A \in \{0, 1\}^{N \times N}$, where $A_{ij} = 1$ if $(v_{i}, v_{j}) \in E$, else $A_{ij} = 0$. The node attribute matrix of $G$ is denoted as $X = \{x_{1}, \dots, x_{N}\} \in \mathbb{R}^{N \times d}$, where the $i$-th row $x_{i} \in \mathbb{R}^{d}$ represents the $d$-dimensional attribute vector of node $i$ [22].

The GNN methods aim to partition the nodes in graph $G$ into $k$ disjoint partitions or subgraphs $\{G_{1}, G_{2}, G_{3}, \dots, G_{k}\}$ such that the nodes within the same cluster are similar or close in terms of both topological structure and attribute characteristics [27].

To convert the grid data into graph-structured data, each grid was first regarded as a node in the graph, and the node set V (|V| = 108,503) was constructed. The distances between the nodes in V were calculated, and edges were then established between each node and its 8 neighboring nodes, according to the given number of nearest nodes and the upper limit on node-to-node distance. Because the nodes represent uniformly distributed grids, these edges were undirected and unweighted. This setting allowed information to flow in both directions along each connection, so that all connections were equal in importance or influence in graph G [53].

The 36 filtered factors were input as the node attributes, which formed the node attribute matrix $X \in \mathbb{R}^{108503 \times 36}$. The adjacency matrix $A \in \{0, 1\}^{108503 \times 108503}$ was constructed to represent the edges connecting nodes in $V$.
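The grid-to-graph conversion can be sketched as follows, assuming the study area is given as a boolean mask on a regular grid and that two cells are connected when they touch by an edge or a corner (the 8-neighbourhood). The function names and the mask layout are hypothetical. Because a dense adjacency matrix for 108,503 nodes would hold on the order of 10^10 entries, the sketch stores the graph as an edge list and adjacency lists instead.

```python
def build_grid_graph(valid_mask):
    """Return (node_index, edges) for an undirected 8-neighbour grid graph.

    valid_mask[r][c] is True where the cell belongs to the study area.
    """
    node_index = {}
    for r, row in enumerate(valid_mask):
        for c, inside in enumerate(row):
            if inside:
                node_index[(r, c)] = len(node_index)

    # Look only "forward" so that each undirected edge is recorded once.
    forward_offsets = [(0, 1), (1, -1), (1, 0), (1, 1)]
    edges = []
    for (r, c), i in node_index.items():
        for dr, dc in forward_offsets:
            j = node_index.get((r + dr, c + dc))
            if j is not None:
                edges.append((i, j))
    return node_index, edges


def to_adjacency_lists(n_nodes, edges):
    """Sparse stand-in for the adjacency matrix A (unweighted, symmetric)."""
    neighbours = [[] for _ in range(n_nodes)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    return neighbours


# Hypothetical 3 x 3 study area in which every cell is valid.
mask = [[True, True, True] for _ in range(3)]
node_index, edges = build_grid_graph(mask)
neighbours = to_adjacency_lists(len(node_index), edges)
```

In this toy grid the centre cell has eight neighbours and each corner cell has three, which mirrors how boundary grids of the real study area end up with fewer connections than interior grids.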

3.2. Clustering Analysis

3.2.1. Classic Machine Learning Methods

We employed two classic machine learning methods for flash flood regionalization: K-means and self-organizing feature map (SOFM). The K-means method repeatedly updates the cluster centers to segment all the data points into K clusters by minimizing the sum of squared distances between each data point and its closest cluster center [54]. SOFM efficiently maps high-dimensional data onto a lower-dimensional space based on the idea of competitive learning and self-organization [55]. The grid data of all 36 inducing factors were transformed into a text data format for these two classic machine learning methods. K-means was implemented with the scikit-learn library and SOFM with the MiniSom library in Python 3.8.
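To make the K-means update rule concrete, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is not the scikit-learn implementation used in the study; the fixed initial centers, the iteration cap, and the toy points are assumptions chosen to keep the example deterministic.

```python
def kmeans(points, initial_centers, n_iter=100):
    """Lloyd's algorithm: alternate assignment and center updates."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(
                range(len(centers)),
                key=lambda k: sum((p - q) ** 2 for p, q in zip(pt, centers[k])),
            )
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical two-factor attribute vectors for six grid cells.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(points, initial_centers=[points[0], points[3]])
```

Note that the clustering uses only the attribute vectors: the positions of the grids never enter the distance calculation, which is the limitation of classic methods that motivates the graph-based approaches.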

3.2.2. GNN Methods

Graph Neural Networks (GNNs) embed the input data into a latent space in a self-supervised manner and separate the embeddings into several disjoint clusters [53]. Three recently developed GNN methods with the capacity to process large-scale input data, i.e., Deep Graph Infomax (DGI), Deep Modularity Networks (DMoN), and Dilation shrink Network (Dink-Net), were chosen and implemented in Python 3.8.

DGI utilizes the principle of contrastive learning and trains the Graph Convolutional Network by optimizing local mutual information between patch node representations and corresponding high-level summaries of graphs. This method employs a binary cross-entropy loss function, leading to an embedding representation that encapsulates the global information content of the entire graph [56]. The DGI method is a two-step process: (i) generating the embedding representation of the graph data, and (ii) applying K-means to produce the final clustering results.

DMoN and Dink-Net are one-step approaches that integrate the clustering process within GNNs to yield the ultimate clustering results. Inspired by the modularity quality function and spectral optimization, DMoN applies activation functions to derive soft cluster assignments and utilizes a fully differentiable unsupervised clustering objective to optimize soft cluster assignments, with a null model to manage graph inhomogeneities [57].

Dink-Net first learns representations in a self-supervised manner by discriminating nodes. The K-means method is then used to initialize the clustering centers of the node representations, and the clustering centers are assigned as learnable parameters of the GNN. Motivated by the dilation and shrink of galaxies in the universe, Dink-Net optimizes the final clustering distribution by minimizing both cluster dilation loss and cluster shrink loss in an adversarial manner [27]. Unlike DGI and DMoN, Dink-Net unifies representation learning and clustering optimization within a cohesive end-to-end framework. This integration facilitates the extraction of clustering-friendly features and improves the overall clustering performance.

3.3. Evaluation of Clustering Performance

The clustering performance by the classic machine learning and GNN methods was evaluated by two clustering validity indices and the optimal regionalization was identified.

First, both the clustering quality index (CQI) and Davies–Bouldin Index (DBI) were calculated (Equations (1) and (2)) across all methods and all cluster numbers to quantify the overall performance of each regionalization result. CQI is the sum of the average coefficient of variation of the inducing factors within the clustering results and quantifies the dispersion level of these characteristics [58]. A lower CQI value indicates that the variation in inducing factors within each cluster is smaller, meaning that the flash flood conditioning factors exhibit higher internal consistency within clusters. This suggests that the clustering method effectively differentiates regions based on similar flash flood conditioning environments. DBI assesses the compactness within the clusters and separation between the clusters by the distance between cluster centers [59]. A lower DBI value signifies that the average distance of all grid points to their cluster centroid is smaller, while the distance between different cluster centroids is larger. This indicates that the clusters are spatially compact internally and well-separated from each other. Therefore, lower values of both CQI and DBI indicate better clustering performance. The optimal regionalization method and result were achieved by simultaneously considering the performance in CQI and DBI of each regionalization result.

$$CQI = \frac{1}{N} \sum_{i = 1}^{K} n_{i} \sum_{j = 1}^{P} CV_{i}^{j} \qquad (1)$$

where $N$ is the total number of grids; $K$ is the cluster number; $n_{i}$ is the number of grids in cluster $i$; $P$ is the number of inducing factors; and $CV_{i}^{j}$ is the coefficient of variation of factor $j$ in cluster $i$.
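Equation (1) can be evaluated with a few lines of code. The sketch below assumes that the coefficient of variation is the population standard deviation divided by the absolute mean and that a factor with zero mean in a cluster contributes zero. Both conventions, along with the function names and toy data, are assumptions of this example.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    return pstdev(values) / abs(m) if m != 0 else 0.0  # zero-mean convention


def clustering_quality_index(features, labels):
    """CQI of Equation (1); features holds one row of P factors per grid."""
    n_total = len(features)
    clusters = {}
    for row, lab in zip(features, labels):
        clusters.setdefault(lab, []).append(row)
    cqi = 0.0
    for rows in clusters.values():
        # Sum of CV over the P factors (columns) within this cluster.
        cv_sum = sum(coefficient_of_variation(col) for col in zip(*rows))
        cqi += len(rows) * cv_sum
    return cqi / n_total


# Hypothetical grids with two factors each.
features = [[1.0, 10.0], [1.0, 10.0], [2.0, 20.0], [2.0, 20.0]]
cqi_good = clustering_quality_index(features, [0, 0, 1, 1])  # homogeneous clusters
cqi_bad = clustering_quality_index(features, [0, 1, 0, 1])   # mixed clusters
```

The homogeneous partition gives a CQI of zero, while mixing the two kinds of grids raises it, matching the interpretation that lower values indicate more internally consistent clusters.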

$$\mathrm{DBI} = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{\bar{d}_{i} + \bar{d}_{j}}{d_{ij}} \qquad (2)$$

where $\bar{d}_{i}$ is the average distance of all grids in the $i$-th cluster to the centroid of the cluster; and $d_{ij}$ is the distance between the centroid of cluster $i$ and the centroid of cluster $j$.
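As a hedged illustration of Equation (2), the sketch below computes the DBI in its standard form (for each cluster, the worst-case ratio over all other clusters, averaged over clusters) using Euclidean distance. The function names and toy coordinates are hypothetical, and the sketch assumes at least two clusters with distinct centroids.

```python
from math import dist


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def davies_bouldin_index(features, labels):
    """DBI = (1/K) * sum_i max_{j != i} (d_i + d_j) / d_ij."""
    clusters = {}
    for row, label in zip(features, labels):
        clusters.setdefault(label, []).append(row)
    keys = list(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Mean distance of the members of each cluster to its own centroid.
    scatter = {
        k: sum(dist(r, centers[k]) for r in clusters[k]) / len(clusters[k])
        for k in keys
    }
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


# Hypothetical toy data: two tight groups that are far apart.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
compact = davies_bouldin_index(points, [0, 0, 1, 1])
mixed = davies_bouldin_index(points, [0, 1, 0, 1])
print(compact, mixed)
```

The compact, well-separated labeling gives the smaller value, consistent with lower DBI indicating better clustering.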

The point data of historical flash flood events were used to assess the reasonability of the optimal regionalization result and how well the regionalization captured the spatial distribution patterns of flash flood occurrences. First, the quantity and density of historical flash flood events in each subdivision of the optimal regionalization result were calculated. Then, we applied the hotspot analysis to examine the statistically significant spatial structure of the historical flash flood events [60]. Hotspot analysis calculates the Z-score, which represents the heterogeneity degree of geographical element (i.e., historical flash flood events in this study) distributions relative to the uniform distribution. The Z-score can be used to identify the spatial structure features of the geographical elements in the map [60]. A positive Z-score suggests spatial clustering of the geographical elements, while a negative Z-score suggests a sparser distribution. A larger magnitude of the Z-score means that the distribution is increasingly concentrated (for positive Z-scores) or increasingly sparse (for negative Z-scores) [18]. In this work, the distribution of the Z-score was calculated for the entire HMR and compared with the regionalization result (see details in Supplementary Section S2). If subdivisions from the regionalization overlap with the densely or sparsely distributed areas identified by the hotspot analysis, the regionalization is considered to capture the spatial distribution structure of historical flash flood events well.
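The Z-score described above can be sketched with the Getis-Ord Gi* statistic, which is a common choice for hotspot analysis. That the study used Gi* specifically is an assumption here, as are the binary fixed-distance weights, the aggregation of events into per-cell counts, and all names and values in the example.

```python
from math import dist, sqrt


def getis_ord_gi_star(coords, values, radius):
    """Return one Gi* Z-score per cell, using binary weights within `radius`.

    Each cell counts as its own neighbor, as in the Gi* (star) variant.
    """
    n = len(values)
    x_bar = sum(values) / n
    variance = max(sum(v * v for v in values) / n - x_bar ** 2, 0.0)
    s = sqrt(variance)
    z_scores = []
    for ci in coords:
        w = [1.0 if dist(ci, cj) <= radius else 0.0 for cj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wi * wi for wi in w)
        numerator = sum(wi * v for wi, v in zip(w, values)) - x_bar * sum_w
        denominator = s * sqrt(max(n * sum_w2 - sum_w ** 2, 0.0) / (n - 1))
        # A zero denominator means no spatial contrast can be measured.
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


# Hypothetical event counts along a row of six cells: dense on the left, sparse on the right.
coords = [(float(k), 0.0) for k in range(6)]
counts = [9, 8, 7, 0, 1, 0]
z = getis_ord_gi_star(coords, counts, radius=1.0)
print([round(v, 2) for v in z])
```

Cells surrounded by high counts receive positive Z-scores and cells surrounded by low counts receive negative ones, which mirrors the clustered versus sparse interpretation given in the text.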

3.4. Data Postprocessing and Mapping

To enhance the regional continuity and integrity of the results, we refined them through a post-processing step. Each polygon smaller than 400 km² was merged with the adjacent larger polygon sharing the longest common boundary, using the Eliminate tool in ArcGIS (v10.6). The polished regionalization result in raster format was then converted into shapefile format to generate the regionalization map for each regionalization result.
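The merging rule can be sketched in plain Python. This is a simplified stand-in for illustration only and not the ArcGIS Eliminate algorithm: it makes a single pass, does not update boundary lengths after a merge, and all identifiers, areas, and boundary lengths are hypothetical.

```python
def eliminate_small_polygons(areas, labels, shared_boundary, min_area=400.0):
    """Relabel each polygon below `min_area` (km2) with the label of the
    neighboring non-small polygon that shares the longest boundary with it.

    areas: {polygon_id: area_km2}
    labels: {polygon_id: cluster_label}
    shared_boundary: {(id_a, id_b): shared_length_km}
    """
    small = {pid for pid, area in areas.items() if area < min_area}
    merged_into = {}
    for pid in sorted(small):
        candidates = []
        for (a, b), length in shared_boundary.items():
            if pid in (a, b):
                other = b if a == pid else a
                if other not in small:  # only merge into larger polygons
                    candidates.append((length, other))
        if candidates:
            _, target = max(candidates, key=lambda c: c[0])
            merged_into[pid] = target
    new_labels = {pid: labels[merged_into.get(pid, pid)] for pid in labels}
    return new_labels, merged_into


# Hypothetical example: polygon "C" is below the 400 km2 threshold.
areas = {"A": 5000.0, "B": 3000.0, "C": 120.0}
labels = {"A": 1, "B": 2, "C": 3}
shared = {("A", "C"): 12.0, ("B", "C"): 30.0, ("A", "B"): 80.0}
new_labels, merged_into = eliminate_small_polygons(areas, labels, shared)
print(new_labels, merged_into)
```

Here polygon "C" takes the label of "B" because the two share the longer boundary, even though "A" is the larger polygon.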

3.5. Interpretation of Regionalization Results

The Shapley Additive Explanations (SHAP) interpretation model was used to quantify the contribution of 36 inducing factors to the overall regionalization result and to each subdivision (Figure 2). The SHAP model is based on the Shapley value concept in cooperative game theory and calculates the marginal contribution of the input factors to the model output by perturbing the original inputs for the tested model [29,30].
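To make the Shapley value concept concrete, the following sketch computes exact Shapley values for a small toy model by enumerating all coalitions of input factors. It illustrates the game-theoretic definition only. Practical SHAP implementations rely on faster approximations and a background dataset rather than the single baseline vector assumed here, and the toy model is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each input of `model` at point `x`.

    Factors outside a coalition are replaced by their `baseline` value.
    The cost grows exponentially with the number of factors.
    """
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        contribution = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_model(z):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The additive factor receives its full effect, the two interacting factors split their joint effect equally, and the values sum to the difference between the model output at `x` and at the baseline.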

Given that the SHAP model can only be applied to supervised models, a Random Forest (RF) model with 100 decision trees was trained using the 36 input inducing factors as predictors and the optimal regionalization result obtained by Dink-Net with 12 clusters as the target, and the SHAP model was then applied to the trained RF model. The accuracy of the RF model reached 0.9, indicating that it represented the regionalization result of the original GNN model well. The SHAP values of all the input factors were calculated for each grid in the regionalization result as the local interpretation result to quantify the influence of each inducing factor at the grid scale. The average absolute SHAP values of all the grids in a subdivision and in the entire HMR were calculated for each input factor as the global interpretation results. The RF and SHAP models were implemented with the scikit-learn and shap libraries, respectively, in Python 3.8.
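The aggregation from local to global interpretation can be sketched as follows. The sketch assumes that one SHAP value per grid and factor is already available (for example, for the cluster assigned to that grid), which is an assumption about how the multi-class output is reduced. The factor names, region labels, and SHAP values are made up for illustration and are not results from the study.

```python
def mean_abs_shap(shap_rows, factor_names):
    """Mean absolute SHAP value of each factor over a set of grids."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[k]) for row in shap_rows) / n
        for k, name in enumerate(factor_names)
    }


def global_importance_by_region(shap_rows, regions, factor_names):
    """Mean absolute SHAP value of each factor within each subdivision."""
    grouped = {}
    for row, region in zip(shap_rows, regions):
        grouped.setdefault(region, []).append(row)
    return {
        region: mean_abs_shap(rows, factor_names)
        for region, rows in grouped.items()
    }


def percentage_share(importance):
    """Share of each factor in the sum of all factors' importance, in percent."""
    total = sum(importance.values())
    return {name: 100.0 * v / total for name, v in importance.items()}


# Hypothetical local SHAP values for three grids and two factors.
factor_names = ["temperature", "soil_moisture"]
shap_rows = [[0.4, -0.1], [-0.2, 0.3], [0.6, 0.0]]
regions = ["SW-1", "SW-1", "SE-8"]

overall = mean_abs_shap(shap_rows, factor_names)
print(percentage_share(overall))
print(global_importance_by_region(shap_rows, regions, factor_names))
```

Taking absolute values before averaging keeps positive and negative local effects from cancelling out, so the global result measures the strength of influence rather than its direction.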

4. Results

4.1. Clustering Performance of the Five Methods

Figure 3 presents the variations in the DBI and CQI values with the cluster number (K) from 2 to 20 for all five tested clustering methods. DBI values generally increase with K for each tested clustering method (Figure 3a). The DBI values of the Dink-Net and DGI methods increase for K = 2–10 and then stabilize around 2.8 at K = 11–20. The K-means method shows a continuous increase in DBI values over the tested range of K. For the DMoN method, the DBI values fluctuate around 4.5 at K = 3–14 and remain around 4 at K > 14. The DBI values for the SOFM method keep increasing at K ≤ 12 and fluctuate around 7 at higher cluster numbers. The SOFM has DBI values significantly higher than the other four methods after K reaches 7. Overall, the Dink-Net and DGI methods outperform the other methods, as lower DBI values indicate higher clustering quality.

The CQI values of all five methods show an overall decreasing trend as the cluster number increases (Figure 3b). The CQI values of the five methods show no clear trend at K < 6. The CQI values of the Dink-Net method are the lowest of the five methods at K ≥ 6. Although the CQI values of DGI also remain at a low level, their fluctuation is much more pronounced than that of Dink-Net. In some cases, the CQI values of DGI even become the highest among the five methods at the peaks (e.g., K = 13). After K reaches 6, the CQI values of DMoN generally exceed those of K-means, which are in turn higher than those of SOFM. Overall, the values of both CQI and DBI for the Dink-Net method remain the lowest among all the tested methods, and this method shows higher potential as the optimal method for regionalization.

Figure 4 presents the regionalization maps for the HMR generated by all five tested methods with 12 clusters. The regionalization maps by both K-means and SOFM methods generally show much more fragmented subdivisions than those generated by GNN methods. In the regionalization map by K-means, the fragmented subdivisions are mainly located in the western and northeastern parts of the HMR. These areas correspond to the Three Parallel Rivers region (Nu River, Lancang River, and Jinsha River; Figure 4 and Figure 1a) and the transition area from the western Sichuan Plateau to the Chengdu Plain, respectively, both characterized by the alternation of deep valleys and high mountains. Similarly, in the regionalization map by SOFM, the subdivisions are severely fragmented along the areas with intense river incision, especially the Jinsha River and Yalong River in the central HMR. This indicates that both K-means and SOFM methods are unsuitable for flash flood regionalization in areas with deeply incised rivers.

The DMoN method generates flawed clustering results due to a significant imbalance in the number of grids across classes in the initial clustering output. In contrast, the regionalization maps generated by the Dink-Net and DGI methods feature fewer subdivisions, well-defined and continuous boundaries, and a better reflection of the spatial characteristics of geographical elements. For instance, subdivisions in the northwestern part of the regionalization maps by these two methods can reflect the drainage structure of the Nu, Lancang, and Jinsha Rivers (Figure 4c,d). Additionally, the Dink-Net method requires less computation time compared to the other methods. Overall, Dink-Net achieves the lowest DBI and CQI values while demonstrating the highest computational efficiency, making it the optimal clustering method among those tested in this study.

We then narrowed the range of cluster numbers to 10–16 for the regionalization results by Dink-Net based on the two following considerations. First, lower DBI and CQI values are preferred, as they refer to higher clustering quality. Second, a smaller cluster number facilitates the visualization and interpretation of the regionalization results. These regionalization results at K = 10–16 were evaluated against the geographic characteristics of the HMR and consequently, the cluster number of 12 was selected to generate the optimal regionalization map (see Supplementary S3 for details). The optimal regionalization result reflected the spatial distribution patterns of inducing factors for flash floods, and the subdivisions matched with the heterogeneous spatial distributions of these factors (see Supplementary S4 for details). Through the above analysis, the optimal clustering method, number of clusters and regionalization result are determined.

4.2. Regionalization Result vs. Historic Flash Flood Events

The regionalization result was compared with the spatial distribution of historical flash flood events (see details in Supplementary Section S2), with the quantity, density, and Z-score of the historical flash flood events for the subdivisions shown in Figure 5. Regions SW-1, SW-7, and SE-8 show relatively high quantity and density of the historical flash flood events (Figure 5a,b). The larger positive mean values of the Z-score in these three subdivisions indicate that these areas are spatially clustered areas for historical flash flood events (Figure 5c). In contrast, Regions NW-3, NW-4, M-6, and M-12 show a relatively low quantity and density of the historical events. Correspondingly, the mean values of the Z-score in these subdivisions are negative, indicating that these subdivisions are sparse areas of historical flash flood events. On balance, these findings suggest that the regionalization subdivisions are capable of characterizing the densely and sparsely distributed areas of the historical flash flood events.

To further analyze the characteristics of these regions, we examined their geographical and environmental attributes. Regions SW-1 and SW-7 are located in the southwest of the HMR, encompassing the lower basins of the Lancang River and Jinsha River (Figure 6a,b). Flash flood events are relatively evenly distributed in Region SW-1 but concentrated in the lower basins of the Lancang River in Region SW-7. Region SE-8, situated in the Pan-xi area (southwest of Sichuan Province), experiences widespread flash flood events throughout the region (Figure 6a,d). Region SW-1 is relatively flat, while Regions SW-7 and SE-8 exhibit slight undulations. Additionally, the soil moisture conditions vary among these regions: Region SW-1 has a relatively low soil moisture, whereas Region SE-8 has a higher soil moisture content (Figure S5g). Despite these differences, all three regions share a common characteristic of low soil saturated hydraulic conductivity (Figure S5h).

As the sparsely distributed areas of the historical flash flood events, Regions NW-3 and NW-4 are located in the northwest of the HMR and the southeast of Tibet, characterized by high elevation and significant topographic relief (Figure 6a,f), and Regions M-6 and M-12 are located in the plateau mountains of western Sichuan Province (Figure 6a,h). Regions NW-3 and NW-4 differ significantly in vegetation and geomorphology types from Regions M-6 and M-12. The former regions are characterized by dry-hot valleys with strong downcutting and lateral constraints, steep topography, and exposed surface soils. Sparse shrub is the dominating vegetation. The rare flash flood events in these regions may be related to the low rainfall (Figures S5d–f). In contrast, Regions M-6 and M-12 are located in the broad, flat river valleys on the plateau, where the river valleys are generally ‘U’ shaped with gentler slopes (Figure 6h,i). Unlike the steep terrain of Regions NW-3 and NW-4, these regions experience weaker incision dynamics and benefit from better vegetation coverage, primarily consisting of alpine meadows and shrublands. Consequently, the improved vegetation conditions and reduced geomorphic potential energy conditions might lead to rare occurrence of flash flood events even though the rainfall in Regions M-6 and M-12 is higher than that in Regions NW-3 and NW-4.

4.3. Influence of Inducing Factors on Flash Flood Regionalization

The impact of inducing factors for the entire HMR based on the global interpretation of the SHAP model is presented in Figure 7. Notably, the five most influential factors on flash flood regionalization are temperature, (P_12h)_mean, (P_24h)_mean, soil moisture, and (P_24h)_50%, which account for 8.29%, 7.41%, 6.49%, 5.71%, and 5.69% of the sum of the 36 factors’ absolute SHAP values, respectively. Importantly, six of the top ten factors are related to extreme rainfall and account for 55.03% of the impact from all the inducing factors, highlighting the significant influence of extreme rainfall in driving flash floods within the HMR.

Furthermore, Figure 7 shows the influence of the inducing factors and their variability across the 12 subdivisions. For the flash flood-prone areas (e.g., Regions SW-1, SW-7, and SE-8), soil moisture, temperature, and extreme rainfall-related factors (e.g., (P_12h)_mean, (P_24h)_mean, and (P_24h)_50%) generally emerge as the most influential factors. Additionally, soil saturated hydraulic conductivity (Ks) plays an important role in Region SE-8, ranking as the second most important factor for this subdivision. In contrast, flash flood sparse areas, such as Regions NW-3, NW-4, M-6, and M-12, exhibit a different set of dominant factors. Here, temperature, potential evapotranspiration, and (P_12h)_mean are the three most influential factors. Moreover, in Region NW-3, the geomorphic type is found to be particularly critical, ranking second among all inducing factors.

Taking the three clustered areas for flash floods (i.e., Regions SW-1, SW-7, and SE-8) as examples, Figure 8 illustrates the local impact of the inducing factors and the direction of their influence. In both Regions SW-1 and SW-7, soil moisture and temperature are the top two influential factors, with reversed rankings in these two regions. These are followed by extreme rainfall-related factors. Notably, soil moisture and temperature influence flash floods in opposite ways (Figure 8a,b): higher soil moisture negatively impacts flash flood occurrence, reducing the likelihood of an event, while higher temperatures show a positive impact, suggesting that elevated temperatures promote flash flood occurrence. In Region SE-8, soil moisture is the most influential factor, followed by soil saturated hydraulic conductivity (Figure 8c). Higher soil moisture and lower soil saturated hydraulic conductivity (Ks) lead to a stronger positive effect on flash floods. As for the extreme rainfall-related factors, higher values generally have a greater positive influence on the occurrence of flash floods in all three regions. Specifically, the annual mean values of the maximum rainfall at 6 h-, 12 h-, and 24 h-scales showed a more significant positive influence than the remaining rainfall-related factors in the three regions, except that (P_24h)_50% had a higher positive effect in Regions SW-7 and SE-8 (Figure 8b,c).

Figure 9 illustrates the spatial distributions of the subdivision-scale global impacts from several representative inducing factors. Overall, the impacts of these inducing factors on flash flood regionalization exhibit significant spatial heterogeneity at the subdivision scale. Temperature shows a pronounced positive impact along the river valleys in the HMR, particularly in Region W-2, which encompasses the Three Parallel Rivers region (Figure 9a). The two key extreme rainfall factors, (P_12h)_mean and (P_24h)_mean, exhibit significantly positive contributions in the south and east of the HMR (i.e., Regions E-11, SW-1, and SE-8), with the strongest impact emerging along the eastern edge of the HMR (Figure 9b,c). Elevation exhibits a highly positive contribution in the central and northeast parts of the HMR, generally along the Yalong, Dadu, and Min Rivers, which are located in the transition zone between the Qinghai-Tibet Plateau and the Sichuan Basin (Figure 9d). Soil moisture contributes positively in the southern part of the HMR, where the historical flash flood events are densely distributed (Figure 6a and Figure 9e). Lastly, soil saturated hydraulic conductivity exhibits a pronounced influence mainly in the eastern and western parts of the HMR, while showing less impact in the northwestern areas (Figure 9f).

5. Discussion

5.1. Strengths of the New Regionalization Framework

The newly developed framework that combines GNN and SHAP models for flash flood regionalization effectively applied the spatial distribution information within the input data and provided a quantitative assessment of the influence of inducing factors. The framework has shown the following advantages over the existing methods for flash flood regionalization through the comparisons among the tested methods and with historical flash flood events.

First, the GNN methods accounted for both the attribute characteristics and spatial topology structure of the input geographic data in regionalization, which resulted in better performance than the classic methods. This was reflected in the lower values of both DBI and CQI for the Dink-Net and DGI methods than for the K-means and SOFM methods across the tested cluster numbers (see details in Section 3.3 and Figure 3). In contrast, the other GNN method, DMoN, showed higher or similar DBI values compared to K-means (Figure 3a) and the highest CQI values among all tested methods at K > 8 (Figure 3b). The DMoN method produced clustering results in which some clusters had very few grids, leading to significant discrepancies with other clusters. The subdivisions corresponding to these smaller clusters were further merged into adjacent larger subdivisions during postprocessing, resulting in an imbalanced regionalization result (Figure 4e). These results indicate that not all GNN methods are suitable for flash flood regionalization, and their suitability should be assessed before application.

Compared to previous studies that applied K-means, hierarchical clustering [14], tree-based ensemble models [20], or self-organizing-map-based models [18], the proposed GNN approach effectively captures spatial dependencies and preserves geographical coherence in clustered regions. For example, in the regionalization results obtained by both K-means and SOFM methods, highly fragmented subdivisions were observed in the western region of the HMR (Figure 4a,b). This region corresponded to the Three Parallel Rivers region, where the alternation of high mountains and deep valleys leads to abrupt changes in topography, landforms, and related meteorological and climatic conditions [38]. In contrast, the regionalization results obtained by Dink-Net and DGI methods characterized the drainage structure and topographic differences in this region well (Figure 4c,d). These results suggest that the two traditional clustering methods faced challenges in handling regions with significant and successive topographic relief.

Second, the Dink-Net model in the framework was unsupervised, meaning that the historical flash flood event records were not needed in the regionalization process, in contrast to supervised approaches [2,21,61]. This makes it easier to implement our framework in flash flood-prone regions, where historical data are normally required to train supervised models but are often limited, especially in mountainous regions [7]. Even where historical flash flood event records exist, their quality and coverage are difficult to guarantee, and hence sparse data points in certain areas do not necessarily indicate a lower flash flood risk. Such historical data would introduce biases into a supervised model through the training process. In contrast, because the historical records are not used in training, the regionalization results obtained by the unsupervised Dink-Net model are not affected by the biases in the historical data. For example, in Region SW-7, the eastern part had fewer historical flash flood events than the west (Figure 5a). If historical records were used together with a supervised neural network, the subdivision might be divided into two or more subdivisions, although the inducing factors in this area showed similar features.

Third, our framework quantified the impact of all the input data on regionalization at various spatial scales. It not only revealed the general impact of all the input factors on the entire study area, but also quantified the impact on each subdivision and even each grid (Figure 7, Figure 8 and Figure 9). Owing to this multi-scale quantification, much more detail of the spatial heterogeneity of the impact from the input inducing factors was captured. For instance, our results visualized the different impacts of soil moisture across various subdivisions (Figure 8 and Figure 9). In Regions SW-1 and SE-8, soil moisture showed the most significant influence at the subdivision scale, but its influence operated in opposite directions at the grid scale. In Region SW-1, lower soil moisture levels corresponded to a stronger positive effect, whereas in Region SE-8, higher soil moisture levels exerted a greater positive impact. In comparison, the probabilistic model (e.g., [62,63]) or the information gain ratio method (e.g., [1]) in previous studies was only able to illustrate the effects of the inducing factors for the whole study area, without further information at the subdivision or grid scale.

5.2. Impact of Inducing Factors on Flash Floods in the HMR

Based on the interpretation of the regionalization results through the SHAP model, the influence of the inducing factors was quantified and their spatial distribution was illuminated. These quantitative results provided insights into the formation mechanisms of flash floods in the HMR. The four most influential factors for flash floods in the HMR were identified and discussed as follows, i.e., temperature, extreme rainfall, soil moisture, and elevation.

Our results showed that the extreme rainfall-related factors had the most significant influence on the regionalization (Figure 7). The average cumulative influence of extreme rainfall-related factors across all the subdivisions reached 46.64% (Figure 7). In all the three clustered flash flood regions (i.e., Regions SW-1, SW-7, and SE-8), the cumulative impact of these factors exceeded 50% of the entire impact from all factors (Figure 8). The significant influence of extreme rainfall-related factors in this study is consistent with the finding in Tan et al. [64] that extreme rainfall-related factors contributed 42.24–58.42% of flash flood occurrences. The intense and frequent extreme rainfalls in the HMR mainly resulted from the strong impacts of the monsoons, e.g., East Asian monsoon and South Asian monsoon [38]. These monsoonal systems bring substantial moisture, resulting in extreme rainfall predominantly concentrated in the southern, eastern, and western fringe regions of the HMR. This spatial pattern exhibits high heterogeneity, as shown in Figure S5d–f. The pronounced spatial variability of extreme rainfall-related factors consequently amplifies their influence in the clustering process, increasing their relative contribution to the classification.

The role of topography, such as elevations and slopes, was also identified in this study. The local elevations may affect the formation of orographic precipitation, and steep slopes might facilitate the rapid rise in streamflow [65]. The significant influence of topographic factors was particularly pronounced in the regions with strong valley downcutting, steep bank slopes, and active geological structures, such as Regions M-5, NE-10, and M-12 (Figure 7 and Figure 9). These regions are mainly located in the transition zone between the Qinghai–Tibet Plateau and the Sichuan Basin [38]. However, in areas with lower elevations and relief (e.g., Regions SW-1, SW-7, and SE-8), topography played a secondary role compared to soil characteristics (Figure 6, Figure 7 and Figure 8). These results suggest that a large topographic relief affects flash floods more significantly than elevation does. Previous studies have also reported the important effect of topography on the formation of flash floods, contributing up to 35% to the general flash flood risk [13,66]. Our results are consistent with previous research but further quantify the different roles of various topographic factors.

Our results revealed the significant effects of temperature and evapotranspiration on flash floods, which had not been reported frequently in previous studies (e.g., [19,21]). Temperature had the most significant impact on flash floods across the entire HMR, and it was also the top influential factor for Region SW-7 with densely distributed historical flash flood events (Figure 7 and Figure 8b). For flash flood sparse Regions NW-3 and M-12, the potential evapotranspiration showed the greatest impact (Figure 7) because the attribute value was significantly lower than that of the surrounding regions (Figure S5i). Both temperature and potential evapotranspiration serve as the indicators of heat and influence runoff generation through complex interactions with soil moisture and erosion, vegetation growth, and bedrock weathering [67,68]. Additionally, temperature is also closely related to local elevation, rainfall, and potential evapotranspiration. The broad interactions with other inducing factors might help explain the overwhelming effect of temperature to some degree.

The impact of soil moisture ranked fourth among all the input factors for the entire HMR and showed significant spatial heterogeneity, in terms of both magnitude and influence direction, in the clustered flash flood areas in the southern part of the HMR (Figure 7, Figure 8c, Figure 9e and Figure S5g). For example, in terms of influence magnitude, the influence of soil saturated hydraulic conductivity is prominent in Region SE-8 but much less prominent in Regions SW-1 and SW-7. The influence direction of soil moisture is the same in Regions SW-1 and SW-7 but opposite to that in Region SE-8 (Figure 8 and Figure 9e,f). In Region SE-8, the influence of soil moisture and saturated hydraulic conductivity indicates that soil conditions with high water content and poor water conductivity are favorable to flash flood formation. Under such humid and poorly permeable soil conditions, infiltration would be limited during rainfall [69], making surface runoff more likely. Thus, we infer that excess infiltration runoff is the dominant runoff generation mechanism in this area. This aligns with the finding by Liu et al. [70] that surface runoff was primarily controlled by the saturated excess infiltration mechanism under saturated soil conditions in the purple soil in this region. In contrast, in Regions SW-1 and SW-7, although dry soil has a high initial infiltration capacity under certain conditions, long-term low water content will gradually make the soil hydrophobic, thereby reducing the actual infiltration rate under extreme rainfall conditions and converting more rainfall into surface runoff [11]. In addition, dry soils tend to be associated with lower vegetation cover and organic matter content, which also weakens the soil’s water retention capacity [38], resulting in rapid surface runoff and an increased risk of flash floods. This explains the favorable conditions for a higher occurrence of flash floods under lower soil moisture conditions in Regions SW-1 and SW-7 (Figure 8a,b) and highlights the crucial role of the water transport characteristics of surface soil in flash flood generation.

5.3. Limitations and Future Work

Although the presented framework shows satisfying performance in flash flood regionalization in the HMR and effectively quantifies the impact of the inducing factors, this study also has several limitations, as discussed below.

First, the input dataset might not cover all the factors that affect the physical formation processes of flash floods (e.g., soil porosity, maximum consecutive days of rainfall, and accumulated temperature in different seasons), despite our efforts to make the dataset as comprehensive as possible. Because of the limited sources, low resolution, incomplete coverage, and low accuracy of the remote sensing data in the mountainous areas [12], some data dimensions cannot meet the spatial resolution requirement of this work. In the future, the input data for flash flood regionalization can be expected to expand as the hydrometeorological monitoring system in the HMR improves.

Second, the selection of the inducing factors is inevitably subjective when building the input dataset. When the dimension number of the input data for a given aspect is large, the weight of this aspect within the entire input dataset will increase and might affect its influence on flash flood regionalization. For instance, there are 12 factors related to extreme rainfall in the input data, and this might intensify the contribution of extreme rainfall in the regionalization result, although the proportion of extreme rainfall-related data in our dataset has already been lower than in previous research (e.g., [18]). A more comprehensive input dataset as mentioned in the first limitation might address this issue in the future. Additionally, an automatic method to determine the optimal combination of input factors [21] will help to reduce the subjectivity in data selection, which will be the objective of our next research work.

Third, a Random Forest model was trained specifically so that the SHAP model could interpret the regionalization results, as the SHAP model was not designed for unsupervised neural networks [30]. Despite this limitation of the SHAP model, we still chose SHAP for its outstanding ability to quantify the impact of input factors at various spatial scales [29,71]. Should new interpretation models be invented specifically for unsupervised GNNs in the future, we will consider replacing the SHAP model.

Last, we applied the framework only in the HMR to address the significant threat of flash floods in this area. Our framework may also work in other mountainous areas prone to flash floods as long as the dataset for the inducing factors is available. Further applications of our framework in other flash flood-prone areas are required in future research to assess its generality and to make any necessary modifications.

6. Conclusions

This study presented a newly developed framework for flash flood regionalization that combined Dink-Net Graph Neural Network (GNN) and SHAP model for a more accurate and interpretable analysis of flash floods in the Hengduan Mountains Region. The comparisons between the traditional and GNN clustering methods on the regionalization performance, together with the visualization of impacts from various inducing factors, showed great promise in our new framework. First, our framework considers both the attribute characteristics and spatial topology structures of the input data and the regionalization result effectively captured the spatial distribution patterns of the inducing factors compared to previous methods. Second, since the GNN model is unsupervised, the regionalization would not be limited by the access to historical flash flood event data, which is normally scarce in mountainous areas. Last, our framework quantifies the influence of inducing factors on flash floods and visualizes the spatial distributions of these impacts in the HMR. The interpretation results of the new framework reveals the dominant impact of extreme rainfall-related factors and their integrated impact, which accounted for 46.64% of all the tested inducing factors in the HMR. The influence of elevation on flash floods becomes significant mainly in the transition zone between the Qinghai–Tibet Plateau and the Sichuan Basin. In the clustered flash flood areas, the impact of soil moisture and conductivity water conduction were particularly prominent and spatially heterogeneous. In conclusion, this study advances flash flood regionalization by integrating cutting-edge deep learning techniques with explainable AI, thereby enhancing both the accuracy and interpretability of the regionalization results. 
The proposed framework not only provides a powerful tool to reveal the spatial distribution properties of flash flood risk but also offers critical insights into the occurrence mechanism of flash floods. These findings provide a scientific basis for improving flood control and disaster reduction strategies in mountainous regions. The framework has shown potential for application in establishing early warning systems and in guiding land-use planning and infrastructure development. The distributions and interpretations of key inducing factors can also provide policy-makers with key information to launch mitigation efforts efficiently.
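As an illustration of how a grouped share such as the 46.64% reported for extreme rainfall-related factors could be derived, the following minimal sketch computes the mean absolute SHAP value per factor and then the percentage contributed by a chosen group of factors. It is not the authors' code: the SHAP values are invented, the factor names are borrowed loosely from the figure captions, and the functions `mean_abs_shap` and `group_share` are hypothetical helpers. It also assumes a single SHAP matrix (grids by factors), which simplifies the multi-cluster case.

```python
# Illustrative sketch only: synthetic SHAP values and hypothetical helper names.

def mean_abs_shap(shap_rows):
    """Return the mean absolute SHAP value of each factor (one per column)."""
    n_rows = len(shap_rows)
    n_cols = len(shap_rows[0])
    return [sum(abs(row[j]) for row in shap_rows) / n_rows for j in range(n_cols)]


def group_share(shap_rows, factor_names, group):
    """Percentage of the total mean |SHAP| contributed by the factors in `group`."""
    importance = dict(zip(factor_names, mean_abs_shap(shap_rows)))
    total = sum(importance.values())
    if total == 0:
        raise ValueError("all SHAP values are zero; the share is undefined")
    return 100.0 * sum(importance[name] for name in group) / total


# Each row is one grid cell, each column one inducing factor (values are made up).
factor_names = ["P_12h_mean", "P_24h_mean", "Dem", "Smoisture"]
shap_rows = [
    [0.4, -0.2, 0.1, -0.3],
    [-0.2, 0.4, -0.3, 0.1],
]

# Assumed grouping of the rainfall-related factors for this toy example.
rainfall_group = ["P_12h_mean", "P_24h_mean"]
share = group_share(shap_rows, factor_names, rainfall_group)
print(f"Rainfall-related share of mean |SHAP|: {share:.2f}%")
```

Absolute values are averaged before grouping so that positive and negative contributions of a factor do not cancel, which matches the "average impact on model output magnitude" reading of mean absolute SHAP values.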

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17060946/s1. The Supplementary Materials mainly contain the results of the Spearman correlation analysis and multicollinearity analysis (Table S1, Figures S1 and S2), the hotspot analysis (Figure S3), the determination of the optimal regionalization (Figure S4), and the comparison between the optimal regionalization result and key inducing factors (Figure S5).

Author Contributions

Conceptualization, Y.L. and C.Z.; methodology, Y.L. and C.Z.; software, Y.L., Z.D. and S.Y.; validation, C.Z., P.C., M.H. and S.B.; formal analysis, Y.L. and C.Z.; data curation, Y.L. and C.Z.; writing (original draft preparation), Y.L. and C.Z.; writing (review and editing), all authors; visualization, Y.L.; funding acquisition, C.Z. and P.C.; supervision, P.C. and M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2023YFC3006701, 2024YFC3012704), the National Natural Science Foundation of China (42471086), the International Partnership Program of the Chinese Academy of Sciences (177GJHZ2022064FN), and Sichuan Province Zipingpu Development Co., Ltd. (ZPPC2024-05).

Data Availability Statement

The sources of the data used in this study can be found in Table 1.

Acknowledgments

The authors gratefully thank Yifan Liu from Peking University for his assistance in model construction. Guotao Zhang and Yaqiao Wu from the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, are kindly acknowledged for their suggestions on manuscript preparation. We sincerely thank the four anonymous reviewers for their valuable comments, which have greatly improved this paper.

Conflicts of Interest

Author Yang Zhao was employed by the company Sichuan Zipingpu Development Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

HMR	Hengduan Mountains region
GNN	Graph Neural Network
SHAP	Shapley Additive exPlanations
SOFM	self-organizing feature map
DGI	Deep Graph Infomax
DMoN	Deep Modularity Networks
Dink-Net	Dilation shrink Network
CQI	clustering quality index
DBI	Davies–Bouldin Index
K	cluster number

Figure 1. (a) The digital elevation model of the Hengduan Mountains region (HMR) in China, with the historical flash flood events from 1950 to 2015 marked as black dots; (b) the location of the HMR with the provinces involved marked in grey.

Figure 2. The framework of flash flood regionalization.

Figure 3. Variations of (a) DBI and (b) CQI with the cluster number (K) for K-means, SOFM, DGI, DMoN, and Dink-Net.

Figure 4. The regionalization maps by (a) K-means, (b) SOFM, (c) DGI, (d) DMoN, and (e) Dink-Net with 12 clusters. In the clustering result obtained by the DMoN method with 12 clusters, the number of grids is extremely unbalanced among the clusters; therefore, only 7 clusters appear in (d).

Figure 5. (a) The regionalization map by Dink-Net with 12 subdivisions; (b) quantity and density, and (c) Z-score of the historical flash flood events in each subdivision of the regionalization map. The locations of historical flash flood events are marked with black dots in (a). The event number/event density has been marked for several subdivisions in panel (b), and the mean Z-score values have been presented in panel (c).

Figure 6. (a) The locations of the Regions SW-1, SW-7, SE-8, NW-3, NW-4, M-6, and M-12 in the HMR; (b,c): topography and vegetation for Regions SW-1 and SW-7, with low relief and sub-high latitude mountain and mixed coniferous broad-leaved forest; (d,e): topography and vegetation for Region SE-8, with low relief and sub-high latitude mountain and the vegetation of shrub or farmland; (f,g) show the deep dry, hot valley canyon with sparse shrubs in Regions NW-3 and NW-4; (h,i) refer to the wide river valley on Western Sichuan plateau and the vegetation of alpine meadows and shrubland in Regions M-6 and M-12. All photos were taken by Yifan Li.

Figure 7. The average of SHAP absolute values (average impact on model output magnitude) of inducing factors on the regionalization result from the SHAP model for the entire Hengduan Mountains region (HMR).

Figure 8. The SHAP value of the top 20 important inducing factors for each 2 × 2 km² grid in (a) Region SW-1, (b) Region SW-7, and (c) Region SE-8. Each dot in a panel represents the data for a grid. The inducing factors in each panel are ranked in descending order according to the factors’ local impact in each subdivision. The dot color indicates the attribute values of the corresponding factors, with red referring to the high attribute values of a factor. The x-axis value represents the SHAP values that quantify the factors’ impact on the classification tendency for that grid.

Figure 9. The SHAP value spatial distribution of inducing factors for each subdivision: (a) temperature; (b) mean of maximum 12 h rainfall ((P_12h)_mean); (c) mean of maximum 24 h rainfall ((P_24h)_mean); (d) elevation (Dem); (e) soil moisture (Smoisture); and (f) soil saturated hydraulic conductivity (Ks).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
