*2.2. Clustering Model Implementation*

The SWMM model was run six times, once with each of the rainfall scenarios described above. We collected the simulated water-depth time series at each node of the stormwater drainage network for cluster analysis. Because the SWMM model contains 60 junctions, each scenario yields a matrix in which each of the 60 rows represents a junction (node) in the network and each column represents a single 5-min time step. We then used principal component analysis (PCA) to reduce the dimensionality of this matrix. PCA uses the eigendecomposition of the correlation matrix to identify a small set of principal components that represent the majority of the variance in the original data [50]. Here, we used the correlations between the time series at different nodes to reduce the number of columns to 2; that is, the time steps are compressed into 2 principal components. The resulting dataset matrix therefore has 60 rows and 2 columns for each modeling scenario. The datasets used in this work are not large, so computational costs are limited. While other techniques for data reduction exist (e.g., correspondence analysis, factor analysis, or non-metric multi-dimensional scaling), we used PCA because of the assumed linear response of the water depth values. Although reducing the dimensionality might cause information loss or an undesirable relationship between score axes, PCA helps reduce computation time and remove redundant data features in the subsequent cluster analysis.
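The reduction described above can be illustrated with a minimal sketch. It assumes plain NumPy rather than whatever software was actually used in this study, and the function name `pca_scores`, the synthetic depth matrix, and the choice to drop zero-variance time steps are all hypothetical choices made only for illustration.

```python
import numpy as np


def pca_scores(depths, n_components=2):
    """Project rows (nodes) of a nodes x time-steps matrix onto leading PCs.

    Hypothetical helper: PCA via eigendecomposition of the correlation
    matrix between time-step columns.
    """
    X = np.asarray(depths, dtype=float)
    # Time steps with zero variance across nodes cannot be standardized,
    # so they are dropped (an assumption of this sketch).
    std = X.std(axis=0, ddof=1)
    keep = std > 0
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / std[keep]
    # Correlation matrix of the standardized columns.
    corr = (Z.T @ Z) / (Z.shape[0] - 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = Z @ eigvecs[:, order]  # one row per node, one column per PC
    explained = eigvals[order] / eigvals.sum()  # share of total variance
    return scores, explained


# Synthetic stand-in for SWMM output: 60 junctions x 36 five-minute steps.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 36)
amplitude = rng.uniform(0.2, 1.5, size=(60, 1))
depths = amplitude * np.sin(np.pi * t) + 0.01 * rng.normal(size=(60, 36))

scores, explained = pca_scores(depths, n_components=2)
print(scores.shape)  # one (x_pca, y_pca) pair per junction
```

The `explained` vector gives the fraction of total variance carried by each retained component, which is one way to check how much information the two-component compression discards.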

All clustering algorithms were then run on this set of two principal components, shown in Figure 2, with the following setup:

(1) K-means: We initially set the number of clusters (*k*) to 2 for each modeling scenario. The algorithm was repeated ten times with different random initializations, and a maximum of 5 iterations was allowed for the algorithm to converge.
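The K-means setup in item (1) can be sketched as follows. This is a minimal NumPy version of Lloyd's algorithm written for illustration, not the implementation used in this study. The function name `kmeans`, the seed, the choice of random samples as starting centroids, and the synthetic two-column input are all assumptions.

```python
import numpy as np


def kmeans(points, k=2, n_init=10, max_iter=5, seed=0):
    """Lloyd's K-means with several random restarts (hypothetical helper).

    Returns the labels, centroids, and within-cluster sum of squares of the
    restart with the lowest within-cluster sum of squares.
    """
    X = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best = (None, None, np.inf)
    for _ in range(n_init):
        # Random initialization: k distinct samples as starting centroids.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Keep the old centroid if a cluster happens to be empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break  # converged before reaching max_iter
            centers = new_centers
        # Final assignment and within-cluster sum of squares for this restart.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        inertia = float((dists[np.arange(len(X)), labels] ** 2).sum())
        if inertia < best[2]:
            best = (labels, centers, inertia)
    return best


# Synthetic stand-in for a 60 x 2 matrix of principal component scores.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=(-3.0, 0.0), scale=0.5, size=(30, 2))
group_b = rng.normal(loc=(3.0, 0.0), scale=0.5, size=(30, 2))
pc_scores = np.vstack([group_a, group_b])

labels, centers, inertia = kmeans(pc_scores, k=2, n_init=10, max_iter=5)
print(np.bincount(labels))  # number of junctions assigned to each cluster
```

Keeping the restart with the smallest within-cluster sum of squares is a common way to reduce the sensitivity of K-means to its random starting centroids, which matters when the iteration cap is as low as 5.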


**Figure 2.** Principal component scores for the two components (x\_pca is the first component score; y\_pca is the second component score), clustered by K-means under varying rainfall scenarios: (**a**) 3 h duration rainfall, (**b**) 12 h duration rainfall, (**c**) 48 h duration rainfall. The principal component scores are used to examine whether the two clusters are reasonably distinguished from each other (gray circles enclose the blue and red dots assigned to the closest cluster).

In Figure 2, the samples of the two clusters do not overlap at the cluster margins, which indicates that the cluster classification is reasonable for grouping the time-series water level data. Additionally, the isolated dots in the subplots of Figure 2 reflect the dissimilarity of the water depth data at those junctions under each event. These isolated dots may correspond to potentially flooded junctions, which can help decision-makers pre-screen vulnerable sites in the drainage network.
