**1. Introduction**

Land use modifications accompanied by urbanization, including the decrease of vegetation cover, increase of impervious surfaces, and drainage channel modifications, result in changes in the characteristics of the surface runoff hydrograph [1,2]. Urban flooding and waterlogging are severe global environmental issues. Urban hydrological models play an important role in the planning and construction of urban drainage and flood control systems [3,4]. The Storm Water Management Model (SWMM), which was developed by the United States Environmental Protection Agency (EPA), has become one of the most popular urban hydrological models [5,6] in urban stormwater simulation.

Urban hydrological models typically require a large number of parameters, which are difficult or impossible to measure with sufficient accuracy, and must generally be estimated or evaluated from secondary information sources [7,8]. Hence, the obtained simulation results are typically laden with notable degrees of uncertainty [9]. Thus, various methods have been developed to assess the uncertainty in urban hydrological models, such as the Monte Carlo Markov chain [10,11], grey box model [12], Bayesian approach [13], and Generalized Likelihood Uncertainty Estimation (GLUE) method [14,15]. The selection of the objective function is critical for many of these uncertainty analysis methods. According to GLUE method, the objective function and acceptability threshold exert substantial influence on the final assessment results. The Nash–Sutcliffe Efficiency index (NSE) is widely used as the objective function in the GLUE method for urban hydrological models [16,17].

In 1998, Gupta et al. [18] proposed that hydrological model calibration inherently comprises multiple objectives, and that converting it into a single-objective model must involve some degree of subjectivity. The complexity of urban stormwater management comes from flood characteristics such as the flood volume, peak flood value, lag time, and overflow characteristics such as the overflow volume and duration. These characteristics concern different management system components [15,19] and are difficult to be captured by a single-objective function during model calibration and validation. Moreover, the adoption of the single-objective function introduces additional uncertainties in applications because it may consider unacceptable parameter sets [20,21]. The controversy can also be found in the use of informal likelihood measures based on the NSE, which involves subjective decisions [22].

This paper proposes a new framework for analyzing the uncertainty of an urban hydrological model. TOPSIS (evaluation technique for order performance by similarity to ideal solution), which is a multiple-criteria decision analysis (MCDA) method developed by Hwang and Yoon [23,24], was adopted as the likelihood criterion of the GLUE method using four important objective criteria in urban flood modelling. The proposed method was used to quantify the parameter uncertainties in the SWMM model and was applied to the Dahongmen (DHM) catchment located in Beijing, China. We attempted to improve the uncertainty assessment of an urban hydrological model by comprehensively investigating the performance of the parameter sets during simulation.

#### **2. Methods**

The proposed uncertainty analysis method is based on the GLUE framework, which was developed by Beven and Binley [25]. In the GLUE method, a likelihood measure is selected and calculated to reflect the goodness of fit between the model simulation and the observations. The model simulations are considered as non-behavioral when the values of the likelihood measures are lower than the cut-off threshold. The selection of the likelihood measure and cut-off threshold has been discussed in various papers, owing to the subjective decisions involved [22,26–28]. Because the single objective always involves some degree of subjectivity [18], the proposed method uses MCDA to carry out the likelihood measure; the critical points are summarized as follows:


The uncertainty assessment process can be described as follows:


modeler [31]. A behavior parameter set should be able to achieve these objectives. The criteria typically used in flood prediction are as follows:

$$\text{NSE} = 1 - \frac{\sum\_{i=1}^{n} \left( \mathbf{y}\_{\text{cbss},i} - \mathbf{y}\_{\text{sim},i} \right)^{2}}{\sum\_{i=1}^{n} \left( \mathbf{y}\_{\text{cbss},i} - \overline{\mathbf{y}\_{\text{cbss},i}} \right)^{2}} \tag{1}$$

$$\text{VB} = \frac{\left| \sum\_{i=1}^{n} \mathbf{y}\_{\text{obs},i} - \sum\_{i=1}^{n} \mathbf{y}\_{\text{sim},i} \right|}{\sum\_{i=1}^{n} \mathbf{y}\_{\text{obs},i}} \tag{2}$$

$$\text{PB} = \frac{\left| \max\_{1 \le i \le n} \langle \mathbf{y}\_{\text{obs},i} \rangle - \max\_{1 \le i \le n} \langle \mathbf{y}\_{\text{sim},i} \rangle \right|}{\max\_{1 \le i \le n} \langle \mathbf{y}\_{\text{obs},i} \rangle} \tag{3}$$

and

$$\mathbf{R} = \frac{\sum \left( \mathbf{y}\_{\text{obs}, \text{i}} - \overline{\mathbf{y}\_{\text{obs}}} \right) \left( \mathbf{y}\_{\text{sim}, \text{i}} - \overline{\mathbf{y}\_{\text{sim}}} \right)}{\sqrt{\sum \left( \mathbf{y}\_{\text{obs}, \text{i}} - \overline{\mathbf{y}\_{\text{obs}}} \right)^2 \left( \mathbf{y}\_{\text{sim}, \text{i}} - \overline{\mathbf{y}\_{\text{sim}}} \right)^2}} \tag{4}$$

where yobs is the observed flow, ysim is the simulated flow, yobs is the average measured flow, ysim is the average simulated flow, and n is the number of the observed flow points. In the above formula, NSE is the widely used Nash–Sutcliffe efficiency index [32]. The flood volume bias (VB) and flood peak bias (PB) are the modified expressions for the flood volume and flood peak deviation [18,33], respectively. The parameter R represents the consistency between the observed flow and the simulated flow [34]. The threshold of these criteria can be defined with reference to practical demand [35]. The reasonable range of the four criteria is between 0 to 1. The optimum value of NSE and R is 1, and optimum value of VB and PB is 0, which means the simulation results of the model completely fit the measured results. In this study, the behavioral parameter sets whose likelihood values of the four criteria were greater than the corresponding thresholds were chosen for further analysis.

3. TOPSIS, which is a well-known MCDA method and can provide the ranking order of all alternatives [36,37], was employed in the calculation of the aggregate likelihood value L(θi) of the behavioral parameter set θi. In the TOPSIS method, the four criteria of all parameter sets should be normalized by the classification of the benefit and cost criterion, where the benefit criterion means that a larger value is more valuable, and vice-versa for the cost criterion [37]. In this study, NSE and R are benefit criteria, while VB and PB are cost criteria; xij is the ith criterion of the jth parameter set. For the benefit criteria, the normalized value (rij) is calculated as follows:

$$\mathbf{r}\_{\mathbf{i}\rangle} = \frac{\mathbf{x}\_{\mathbf{i}\rangle}}{\sqrt{\sum\_{\mathbf{i}=1}^{n} \mathbf{x}\_{\mathbf{i}}}^{2}} \cdot \tag{5}$$

For the cost criteria, the normalized value (rij) is calculated as follows:

$$\mathbf{r}\_{\mathbf{i}\mathbf{j}} = \frac{\frac{1}{\mathbf{x}\_{\mathbf{i}\mathbf{j}}}}{\sqrt{\sum\_{\mathbf{i}=1}^{\mathbf{n}} \frac{1}{\mathbf{x}\_{\mathbf{i}\mathbf{j}}}^2}} \cdot \tag{6}$$

In many situations, the criterion should be weighted according its importance [36]. Because it is difficult to identify which criterion is more important, we assume that all criteria are equally important. The ideal solution Rj <sup>+</sup> and negative-ideal solution Rj − can be calculated as follows:

$$\mathbf{R}\_{\mathbb{I}}^{+} = \max \{ \mathbf{x}\_{1\circ}, \mathbf{x}\_{2\circ}, \dots, \mathbf{x}\_{n\circ} \} \tag{7}$$

$$\mathbf{R}\_{\mathbf{i}}^{+} = \min \{ \mathbf{x}\_{1\circ}, \mathbf{x}\_{2\circ}, \dots, \mathbf{x}\_{\text{r}\dot{\mathbf{i}}} \}. \tag{8}$$

Then, the separation of each alternative from the ideal and negative-ideal solutions are expressed, respectively, as follows:

$$\mathbf{D}\_{\mathbf{i}}^{+} = \sqrt{\sum\_{\mathbf{j}=1}^{n} \left(\mathbf{r}\_{\mathbf{i}\mathbf{j}} - \mathbf{R}\_{\mathbf{j}}^{+}\right)^{2}} \tag{9}$$

$$\mathbf{D}\_{\rm i}^{-} = \sqrt{\sum\_{\mathbf{j}=1}^{n} \left(\mathbf{r}\_{\rm ij} - \mathbf{R}\_{\rm j}^{-}\right)^{2}} \tag{10}$$

The aggregate likelihood value L(θi) is evaluated by comparing the distance from the ideal solution and the distance from the negative-ideal solution:

$$\mathcal{L}(\theta\_{\rm i}) = \frac{\mathcal{D}\_{\rm i}^{+}}{\mathcal{D}\_{\rm i}^{+} + \mathcal{D}\_{\rm i}^{-}} \tag{11}$$

4. Finally, the predictions from the behavioral parameter sets are ranked in the order of the likelihood weights W(i), which is defined as follows:

$$\mathcal{W}(\mathbf{i}) = \frac{\mathcal{L}(\boldsymbol{\theta}\_{\mathbf{i}})}{\sum\_{\mathbf{i}=1}^{N} \mathcal{L}(\boldsymbol{\theta}\_{\mathbf{i}})} \tag{12}$$

where N is the number of behavioral parameter sets. Additionally, the cumulative probability distribution for the ranked discharge predictions can be obtained as follows:

$$P(\mathbf{Q} < \mathbf{Q}\_i) = \frac{\sum\_{\mathbf{j=1}}^{\mathbf{i}} W(\mathbf{j})}{\sum\_{\mathbf{j=1}}^{\mathbf{n}} W(\mathbf{j})} \tag{13}$$

where Q denotes the discharge, and Qi is the discharge prediction ranked at the ith position; n has the same meaning as in Equation (2). According to the cumulative probability distribution, the uncertainty bound can be obtained for a given certainty level.

#### **3. Study Area and SWMM Model**

#### *3.1. Study Area*

The Dahongmen catchment is a typical urbanized area in Beijing, PRC, with a high population density and heavily built-up underlying surface. The catchment covers an area of approximately 131.49 km<sup>2</sup> and is located upstream of the Liangshui River basin in Beijing, between 39◦48'–39◦55' N and 116◦90'–116◦24' E. In the catchment, the terrain exhibits a downward trend from the western mountains to the eastern plains. The annual average precipitation is 522.4 mm, and 80% of the precipitation occurs during the period from June to September. The river systems and hydro-meteorological stations of the Dahongmen catchment are shown in Figure 1.

**Figure 1.** Station distribution at Dahongmen catchment.

#### *3.2. SWMM Model*

The SWMM is a dynamic hydrological simulation model used in single-event or long-term (continuous) simulations of runoff quantity and quality from primarily urban areas. The runoff component of SWMM operates on a collection of subcatchment areas that receive precipitation and generate runoff and pollutant loads. The routing portion of the SWMM transports this runoff through a system of pipes, channels, storage/treatment devices, pumps, and regulators [38,39]. Additionally, the SWMM tracks the quantity and quality of the runoff generated within each subcatchment, and the flow rate, flow depth, and quality of the water in each pipe and channel during a simulation period with multiple time steps. In this study, SWMM version 5.1 was used; its technical details have been reported by Rossman et al. [40]. The best performance of SWMM before and after urbanization in the Dahonmen catchment was addressed by Xu and Zhao (2017) [41].

In this study, five heavy storms (accumulated precipitation > 50 mm) occurred between 2011 and 2012 in the Dahongmen catchment were used for model calibration and verification. Precipitation on 23 June 2011, 26 July 2011, and 14 August 2011 were used to calibrate the model, and the other two on 24 June 2012 and 21 July 2012 were used for validation. The accumulated precipitation of the five events was 110.6, 66.8, 66.9, 63.9, and 197.4 mm, respectively. The rainfall duration of the five storms was14, 5, 3, 16, and 18 h, respectively. As shown in Figure 1, the hourly series from three precipitation stations and a DHM gauge station were obtained from the Hydrographic Station of Beijing, CHINA. The hourly inflow data of the Youanmen station were obtained from the Liangshui River Basin Authority. The floodwater from this gate only account for very small amount of streamflow thus will not have great impact on the results of uncertainly analysis. The Digital Elevation Model (DEM) and sewer system map were provided by the National Aeronautics and Space Administration (NASA, ASTER GDEM) and Beijing Municipal Institute of City Planning and Design. These datasets include the locations, section shapes, and conveyance capacities of the river and sewer system in the study area. The catchment was divided into 13 subcatchments, jointly controlled by the river and pipe network, wherein there existed a total of 351 drainage channels (282 watercourses and 69 road drainage channels), as shown in Figure 2.

**Figure 2.** Structure of Storm Water Management Model (SWMM) model in Dahongmen catchment.

Table 1 lists the parameters used in the uncertainty analysis of this study, along with their units and distribution. The minimum and maximum values were obtained from the SWMM user's manual [40] and relevant literature [42,43]. The values for the selected model parameters were randomly selected from uniform probability distributions.


**Table 1.** Distribution of SWMM parameters.

#### *3.3. Interval Evaluation Index*

To analyze the effectiveness of the range of uncertainty, we selected three evaluation indices, namely, the average band width (B), coverage (CR), and average relative deviation (RD) [41,44], which are defined as follows:

$$\mathbf{B} = \frac{1}{\mathbf{n}} \sum\_{i=1}^{n} (\mathbf{Q}\_{\text{s,upper}}^{\text{i}} - \mathbf{Q}\_{\text{s,lower}}^{\text{i}}) \tag{14}$$

CR <sup>=</sup>nQin <sup>n</sup> <sup>×</sup> 100% (15)

$$\text{RD} = \frac{1}{\text{n}} \sum\_{i=1}^{n} \frac{\left| \frac{1}{2} \left( \mathbf{Q}\_{\text{s,upper}}^{\text{i}} + \mathbf{Q}\_{\text{s,lower}}^{\text{i}} \right) - \mathbf{Q}\_{\text{o}}^{\text{i}} \right|}{\mathbf{Q}\_{\text{o}}^{\text{i}}} \tag{16}$$

Here, Qi s,upper and Q<sup>i</sup> s,lower are the upper and lower boundary values of the confidence interval; Qi <sup>o</sup> is the observed flow; and nQin is the number of observed values in the confidence interval.

#### **4. Results**

#### *4.1. Comparison of Di*ff*erent Acceptability Thresholds*

By the application of Monte Carlo method, 10,000 parameter sets were generated within the ranges listed in Table 1. The four objective criteria illustrated in Equations (1)–(4) were calculated by running the SWMM with these sampled parameter sets.

We set the threshold of each objective criterion according to the catchment characteristics and time scale, which are list in Table 2. The number of parameter sets which met the threshold requirements are also list in the table.


**Table 2.** Numbers of behavior parameter sets under different criteria.

As shown in Table 2, 2506 parameter sets could satisfy the requirement of NSE threshold, which is the least in the 4 criteria and indicates NSE is the strictest criterion among them. Nevertheless, R threshold is the most flexible one, 6918 parameter sets can satisfy the criterion. Additionally, all parameter sets that satisfied other criteria thresholds can satisfy R threshold simultaneously.

From Table 2 we can also observe that the number of behavioral parameter sets decreased when more criteria were considered. There are 2226 parameter sets that satisfied the NSE and VB thresholds simultaneously, and 1772 parameter sets that satisfied the NSE and PB thresholds. When we considered all criteria thresholds in GLUE-TOPSIS, the number of behavioral parameter sets reduced to 1598 finally.

#### *4.2. Comparison of Posterior Distribution*

We can obtain the posterior probability distributions of the SWMM parameters from the behavioral parameter sets listed in Table 2. The posterior probability distributions from the single criterion (NSE) and GLUE-TOPSIS are shown in Figure 3.

Obvious difference can be observed between the posterior probability distributions from single criterion (NSE) and those from GLUE-TOPSIS. In general, we can find the obvious areas with high frequency distributions in the posterior probability distribution curves from GLUE-TOPSIS. However, those from single criterion (NSE) are relative flat.

Overall, the posterior probability distributions of Imperv, N-perv, D-perv, MaxRate, Decay, and N-conduit are similar for the two methods. Nevertheless, N-imperv, D-imperv, MinRate, and N-river exhibit a significantly higher frequency interval under GLUE-TOPSIS conditions, which results in higher sensitivity. Notably, there are obvious high-frequency intervals in the parameters characterizing the impervious area, owing to the large impervious area in the DHM catchments. Additionally, the parameter spatial distribution has more than one high frequency interval, which may reflect the "equifinality" of parameter sets proposed by Beven and Binley [25].

**Figure 3.** Parameter posterior distribution for single criterion and multiple criteria.

#### *4.3. Uncertainty Estimation of Discharge Simulation*

We simulated the SWMM discharge with parameter sets from the single-criterion and GLUE-TOPSIS method. The normalized simulations were sorted by the likelihood value to determine the 95% and 5% uncertainty bounds (90 confidence interval). Figures 4 and 5 present the plotted simulation results obtained by the single criterion and GLUE-TOPSIS methods for five rainfall events during the calibration and validation periods. To analyze the effectiveness of the uncertainty ranges, the evaluation results of the uncertainty indices under the two methods are listed in Table 3.

**Figure 4.** MCDA analysis results for rainstorm events 20110623, 20110726, and 20110814 in calibration period.

**Figure 5.** Multiple-criteria decision analysis (MCDA) analysis results for rainstorm event 20120624, 20120721 in validation period.


**Table 3.** Uncertainty interval evaluation results.

As shown in Figures 4 and 5, the widths of the uncertainty bound from GLUE-TOPSIS method obviously narrower than those from single criterion method, particularly for the flood peak area and drop section of the discharge curve. Most of the observations fell within the 90% confidence interval, except rainstorm 20120624, which is the smallest storm and may influenced by the strobe operation. However, SWMM model show high performance in rainstorm 20120721, which is one of the most disastrous rainstorms in history.

As presented in Table 3, the band width (B), coverage (CR), and relative deviation (RD) were 46.233, 41.5%, and 24.685, respectively, for single criterion method in the calibration period, and 38.934, 52.3%, and 23.211, respectively, for GLUE-TOPSIS method. In the verification period, B, CR, and RD were 35.526, 45.9%, and 17.120, respectively, for the single criterion method, and 27.970, 47.4%, and 13.754, respectively, for GLUE-TOPSIS method. The results indicate that the GLUE-TOPSIS method show superiority in uncertainty bounds assessment over the single criterion methods, with lower values in average band width (B), average relative deviation (RD) and higher values in coverage (CR).

From the simulation results of the median GLUE estimates in Table 4, it can be seen that under the GLUE-TOPSIS method, the simulation results of NSE, VB are significantly improved compared with the single criterion method, although the VB of 20110623 has a slight increase. The median estimates improved, with an increase of the NSE by 1.6% in the calibration period, and by 10.0% in the validation period. However, the cost criteria PB increased by 0.010 during the calibration period, from the perspective of single field precipitation, the PB of the 20110814 precipitation and the 20120624 precipitation have increased, indicating that the flood peak flow simulation effect is poor. And the benefit criteria R decreased by 0.078.


**Table 4.** Simulation results of themedian Generalized Likelihood Uncertainty Estimation (GLUE) estimates.

#### **5. Discussion**

The uncertainties of urban hydrological models have been addressed by several researches [45–48]. Most researches used single objective function such as NSE or VB as evaluation index [45,46]. As shown in the Figure 6a,b, we found that VB and PB still involved large uncertainties when the NSE was higher than 0.8. This means that a high NSE value can still lead to the poor simulation of the flow volume and flood peak.

**Figure 6.** Relationship between different criteria.

Some attempts were made to construct one likelihood function [47,49] considering different objectives. This likelihood function will not work when the objective change. Other important objectives such as overflow and flow in sewer system should be considered in uncertainty analysis of urban hydrological model [15]. The proposed method avoids the process of function construction and is feasible to be applied considering different objectives.

Considering the completeness and consistent of data, five heavy storms between 2011 to 2012 were selected for uncertainty analysis. These are representative enough to test the model reliability for heavy storms. However, small storms are easily affected by local Blue-Green Infrastructure in urban areas that make it difficult to predict [50,51]. Further work needs to test the reliability of the propose method for different magnitude of flood.

#### **6. Conclusions**

Urban hydrological models are extensively used in flood forecasting, sponge cities design, and pollution management, etc. The uncertainty assessment of an urban hydrological model is important when evaluating the model's reliability. Because of the various applications of urban hydrological models, a single criterion is hard to evaluate their performance. In this research, we proposed a multiple-criteria uncertainty analysis method, namely GLUE-TOPSIS, and applies it to the uncertainty assessment of SWMM model in Dahongmen catchment, Beijing. Five typical rainstorm events that occurred during 2011–2012 were investigated and used to test the performance of the proposed method. The conclusions can be summarized as follows:


The GLUE-TOPSIS method provide a feasible way to assess the uncertainty of urban hydrological models, which have been used various aspects in urban water resources management. The users can select objective criteria flexibly and combine them in the GLUE-TOPSIS framework according to their actual needs. We will proceed our research and apply GLUE-TOPSIS to a wider area in water resource management.

**Author Contributions:** Conceptualization, B.P., G.Z. and R.S.; methodology, B.P., D.P. and Z.Z.; validation, S.S.; writing—original draft preparation, B.P. and G.Z.; writing—review and editing, S.S. and G.Z.; visualization, S.S.; supervision, B.P. and G.Z.; project administration, B.P.; funding acquisition, B.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by three research programs: (1) National Natural Science Funds of China (Grant No. 51879008); (2) China Scholarship Council (Grant No. 201906045024); (3) China Scholarship Council-University of Bristol Joint PhD Scholarships Programme (Grant No. 201700260088)

**Acknowledgments:** We thank Liwen Bianji, Edanz Editing China (www.liwenbianji.cn/ac), for editing the English text of a draft of this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.
