**1. Introduction**

Urban drainage systems (UDSs) are infrastructure built to provide conveyance and storage capacity for mitigating drainage overflows, reducing surface inundation, and removing pollutants. However, existing UDSs have a limited service life, and their functionality degrades over time [1]. In recent years, retrofitting traditional UDSs with water-level sensors, velocity meters, and flow sensors has been widely adopted as an adaptive and cost-effective response to flooding challenges [2,3]. The deployed sensors measure water quantity and quality in real time, which makes it feasible for decision-makers and stakeholders to anticipate potential flood events and locate vulnerable sites, thereby supporting decision making. Understanding the emerging data is crucial for forecasting flash floods, reducing sewer overflows, and detecting flooded sites [4–6]. Interpreting big water data for flood detection is attracting increasing attention from researchers [7–10] and can be employed to reduce potential flood damage.

In the last decade, scholars have applied a range of machine learning techniques to water resources and hydrological datasets [11–13]. The major machine learning algorithms employed for flood detection are support vector machines [14,15], neuro-fuzzy [16], adaptive neuro-fuzzy inference systems, multilayer perceptron [17], random forest [18], and classification and regression trees [19]. Bowes et al. compared long short-term memory and recurrent neural networks using time-series groundwater table data from the city of Norfolk, Virginia [20]. They reported that the long short-term memory network predicted groundwater level better than the recurrent neural network but took about three times longer to train. Hu et al. applied a boosted decision regression tree to detect drainage floods with over 90% accuracy in the combined sewer systems of Detroit, Michigan [21]. Li proposed a data-driven fuzzy neural method for reducing downstream urban flooding volume and showed that, with an enhanced genetic algorithm optimization, the regression deviations could be reduced from 0.22 to 0.07 [22]. However, the majority of these studies have focused on supervised learning (i.e., a known outcome is used to train the model), and unsupervised machine learning algorithms (UMLA) are not commonly used in stormwater UDSs.

Clustering algorithms are data-driven techniques that do not rely on a predefined classification standard for risk levels and can thereby provide more objective and reasonable results [23]. Cluster analysis, one of the key unsupervised machine learning methods, has been applied in many fields, including pattern recognition, image analysis, data compression, and anomaly detection [24]. However, its applicability to urban flood detection is yet to be fully investigated. In general, cluster analysis is based on identifying similarities between observations. If a water quantity or quality event happens in the water system, the affected observations are likely to be highly dissimilar to other observations [25]. This increased dissimilarity leads to these observations being treated as outliers and thus detected as anomalies. Although cluster analysis has been extensively discussed in municipal topology classification and water distribution network simplification [26,27], the ability of UMLA methods to group time-series data at UDSs is still unknown, and the most appropriate methods to assess these algorithms are unclear. Keogh and Lin concluded that clustering time-series data is meaningless, but this argument does not cover similarity-based clustering algorithms such as K-means and agglomerative clustering [28]. In contrast, Chen demonstrated that similarity-based cluster analysis could be successfully applied to sequence datasets by using different distance measures [29,30]. Wu et al. adopted the clustering algorithm [24] developed by Rodriguez and Laio [31] to detect short-duration pipe bursts in water distribution systems with a false-positive rate of 0.61%. Xing and Sela selected SCI (silhouette coefficient index) and CHI (Calinski–Harabasz index) as the metrics to evaluate the performance of K-means clustering (KC) on time-series water pressure data and used them to identify the number of clusters for pressure sensor placement [32]. However, it is unclear why these two indices were chosen as the UMLA performance metrics. Previous studies from the computer science field have demonstrated the differences and similarities among popular performance evaluation indices such as SCI, CHI, and DBI (Davies–Bouldin index) [33–35]. However, there is no systematic study of how these indices apply to time-series data from UDSs.
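As a rough illustration of how the three indices behave, the following sketch computes SCI, CHI, and DBI from their standard textbook definitions using only the Python standard library. The water-depth values, the two label assignments, and all function names are hypothetical and are not taken from the case study; the sketch is meant only to show the direction in which each index moves when a partition improves.

```python
from math import dist
from statistics import fmean

def group(X, labels):
    """Collect the rows of X by cluster label."""
    clusters = {}
    for x, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(x)
    return clusters

def centroid(points):
    """Column-wise mean of a list of equal-length rows."""
    return [fmean(col) for col in zip(*points)]

def silhouette(X, labels):
    """Mean silhouette coefficient (SCI); higher is better, range [-1, 1]."""
    clusters = group(X, labels)
    scores = []
    for x, lab in zip(X, labels):
        own = clusters[lab]
        if len(own) == 1:          # convention: singleton clusters score 0
            scores.append(0.0)
            continue
        a = sum(dist(x, y) for y in own) / (len(own) - 1)
        b = min(fmean(dist(x, y) for y in pts)
                for other, pts in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return fmean(scores)

def calinski_harabasz(X, labels):
    """Between- to within-cluster dispersion ratio (CHI); higher is better."""
    clusters = group(X, labels)
    n, k = len(X), len(clusters)
    overall = centroid(X)
    between = sum(len(p) * dist(centroid(p), overall) ** 2
                  for p in clusters.values())
    within = sum(dist(x, centroid(p)) ** 2
                 for p in clusters.values() for x in p)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(X, labels):
    """Mean worst-case cluster similarity (DBI); lower is better."""
    clusters = group(X, labels)
    cents = {lab: centroid(p) for lab, p in clusters.items()}
    scatter = {lab: fmean(dist(x, cents[lab]) for x in p)
               for lab, p in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in cents if j != i) for i in cents]
    return fmean(worst)

# Hypothetical depth profiles (m): three shallow nodes, three surcharged nodes.
X = [[0.1, 0.2, 0.1], [0.2, 0.2, 0.1], [0.1, 0.3, 0.2],
     [1.5, 1.8, 1.6], [1.6, 1.9, 1.5], [1.4, 1.7, 1.6]]
good = [0, 0, 0, 1, 1, 1]   # separates shallow from surcharged nodes
bad = [0, 1, 0, 1, 0, 1]    # mixes the two groups

for name, labels in (("good", good), ("bad", bad)):
    print(name, silhouette(X, labels),
          calinski_harabasz(X, labels), davies_bouldin(X, labels))
```

Under these definitions, a more compact and better separated partition should raise SCI and CHI and lower DBI. Libraries such as scikit-learn provide equivalent functions (`silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score`), which would normally be preferred over a hand-written version.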

Floods are among the most hazardous natural events in the world. The short response time available during flood events makes them challenging for hydrologists, and floods cause loss of life and damage to economies, infrastructure, and property worldwide every year [36,37]. Researchers are seeking flooding indicators that can identify flooding locations ahead of extreme storm events. Several hydro-meteorological indicators, such as temperature, humidity, and precipitation, are related to flood events. The most widely used indicator is the hydraulic water level, since it can be efficiently and continuously monitored and forecasted to facilitate early flood detection and warning [38]. To capture flood events efficiently, the flooding water level should be well investigated.

In this study, clustering algorithms, including KC, agglomerative clustering (AC), and spectral clustering (SC), are applied to urban flood tracking. A Storm Water Management Model (SWMM) is established to represent the real-world stormwater urban drainage system located in the Sugar House neighborhood, Salt Lake City, UT, USA. Three evaluation indices, namely SCI, CHI, and DBI, are used to evaluate the performance of the clustering algorithms. The research is driven by the hypothesis that clustering time-series water level data has the potential to facilitate flooding location detection in the Sugar House area. The investigation addresses three inter-related research questions: (1) What is the performance of different clustering algorithms in capturing the floods? (2) Which metrics are the most suitable for assessing cluster model performance based on hydraulic-hydrologic data in UDSs? (3) Which features of flood time-series data (length, volume, and variability) are the most influential for flooding detection, and how does the choice of data feature affect the clustering performance in localizing the flooding sites?
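To make the hypothesis concrete, the sketch below groups a handful of water-depth series with a minimal K-means written in plain Python. It is not the study's implementation: the node names, depth values, deterministic farthest-first seeding, and the rule that flags the cluster with the highest peak centre depth are all assumptions chosen for illustration.

```python
from math import dist
from statistics import fmean

def farthest_first(X, k):
    """Deterministic seeding (an assumption): start at X[0], then repeatedly
    add the point farthest from the centres chosen so far."""
    centres = [X[0]]
    while len(centres) < k:
        centres.append(max(X, key=lambda x: min(dist(x, c) for c in centres)))
    return centres

def kmeans(X, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centre update
    until the labels stop changing."""
    centres = farthest_first(X, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(x, centres[j]))
                      for x in X]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [fmean(col) for col in zip(*members)]
    return labels, centres

# Hypothetical water depths (m) at five nodes over four time steps.
depths = {
    "J1": [0.2, 0.3, 0.3, 0.2],
    "J2": [0.2, 0.4, 0.3, 0.2],
    "J3": [0.3, 1.6, 1.9, 1.2],
    "J4": [0.1, 0.3, 0.2, 0.2],
    "J5": [0.3, 1.7, 2.0, 1.1],
}
nodes = list(depths)
X = [depths[n] for n in nodes]

labels, centres = kmeans(X, k=2)

# Assumed flagging rule: the cluster whose centre reaches the highest depth.
deep = max(range(len(centres)), key=lambda j: max(centres[j]))
flagged = [n for n, lab in zip(nodes, labels) if lab == deep]
print(flagged)
```

In this toy setting the two surcharged nodes end up in their own cluster, which is the behaviour the hypothesis relies on. Whether the same separation holds for simulated SWMM depths is exactly what the research questions are meant to test.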

To answer these questions, it is necessary to explore how UMLA groups time-series water depth data and which assessment score best represents UMLA performance. However, challenges in implementing UMLA with time-series data remain. First, the time-series water depth datasets must be re-formatted to make them suitable for a clustering problem. This difficulty is associated with the second research question above, since the features of the datasets determine how the data frame is re-structured [39]. Second, the connection between the number of clusters and clustering model performance is another obstacle. As it is still unknown how clustering performance correlates with the number of clusters in stormwater systems, it is necessary to establish such a theoretical relationship for practical applications such as the flooding detection considered here [40]. Therefore, the study aims to improve the understanding of how UMLA facilitates the detection of hydraulic anomalies according to the characteristics of water depth datasets in urban drainage networks.
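One possible way to re-format the data, sketched here under stated assumptions, is to place each node's depth series in one row of a matrix, optionally trimmed to a common time window, and then standardise each row. The function names, node names, and values below are hypothetical, and the study may structure its data frame differently.

```python
from statistics import fmean, pstdev

def to_matrix(depths, start=0, stop=None):
    """Return (nodes, rows): one row per node, one column per time step.
    `start` and `stop` select a common time window (series length)."""
    nodes = list(depths)
    rows = [list(depths[n][start:stop]) for n in nodes]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all series must cover the same time steps")
    return nodes, rows

def zscore_rows(rows):
    """Standardise each row to zero mean and unit variance so that
    clustering compares the shape of the series, not their magnitude."""
    out = []
    for r in rows:
        mu, sd = fmean(r), pstdev(r)
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in r])
    return out

# Hypothetical water depths (m) keyed by node name.
depths = {
    "J1": [0.2, 0.3, 0.3, 0.2, 0.2],
    "J3": [0.3, 1.6, 1.9, 1.2, 0.6],
}

nodes, rows = to_matrix(depths, start=1, stop=4)  # keep three time steps
scaled = zscore_rows(rows)
print(nodes, rows, scaled)
```

Row standardisation is a design choice rather than a requirement: it removes absolute depth, which may be the very signal of interest for flood detection, so the raw matrix returned by `to_matrix` may be the better input in some scenarios.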

The layout of the study is as follows: (1) build KC, AC, and SC algorithms to group the time-series water depth data; (2) use UMLA metrics, including SCI, CHI, and DBI, to evaluate these algorithms; (3) compare the best number of clusters obtained by each method; (4) investigate the relationship between model performance of flooding detection and water depth data characteristics (see Figure 1 for details). We start by describing the implementation of different UMLA methods, followed by the research methodology with an overview of the real-world case study, performance metrics, and simulation scenarios for cluster analysis. Then, we present the results, discussion, and finally, the conclusions.

**Figure 1.** Workflow of the study.
