**2. Methodology**

The proposed framework for analyzing urban flooding from short-record remotely sensed rainfall and a hydrologic model has three parts. (i) Generating extreme rainfall events. The rainfall generator RainyDay is driven by a short (nine-year), gridded (0.1° × 0.1°), hourly record of remotely sensed rainfall to generate extreme rainfall events, with 20 realizations at the 5-, 10-, 20-, 50-, and 100-yr return periods for durations of 2 h, 6 h, 12 h, and 24 h. These events are compared with traditional design rainfall (i.e., intensity-duration-frequency (IDF) formula-based estimates) to check their reasonableness. (ii) Simulating runoff under different rainfall return periods and durations. We use SWMM to construct a rainfall-runoff model, and the temporal distribution of the design rainfall follows the Chicago rainfall pattern. (iii) Analyzing urban flooding. The flood indicators (i.e., flood time, maximum rate, and total inundation volume) are analyzed under different rainfall return periods and durations, and their comprehensive characteristics are then analyzed with the projection pursuit method.

#### *2.1. Stochastic Storm Transposition*

Traditional methods for estimating design rainfall in urban areas have several drawbacks, such as requiring long rainfall series and having a limited scope of application [27]. Many of them cannot meet the requirements of urban flood analysis in data-scarce areas [11]. To overcome these drawbacks, this study uses the RainyDay software, whose core technique is stochastic storm transposition (SST), to estimate the design rainfall at different return periods in a data-scarce area.

RainyDay was developed in Python by Wright et al. [27]. Its core idea is to combine SST with remote sensing rainfall products in order to transpose the spatial location of observed rainfall events, which effectively lengthens the rainfall record and expands the sample of observed rainfall events. Figure 1 shows an example of transposing two observed rainfall events to the study area with RainyDay. Note that RainyDay changes only the spatial location of the observed rainfall events, not their temporal distribution. The reader is directed to Zhu et al. [11], Wright et al. [27], Yu et al. [47], and Franchini et al. [48] for more details. The following is a brief introduction to RainyDay.

**Figure 1.** Schematic diagram of the spatial transposition of rainfall in RainyDay. *R*<sub>Obs1</sub> and *R*<sub>Obs2</sub> are the observed rainfall events in the transposition domain; *R*<sub>Tran1</sub> and *R*<sub>Tran2</sub> are the corresponding rainfall events after transposition.

Step 1. Selecting the transposition domain. RainyDay requires that the selected transposition domain (i) contains the study area; (ii) has the same climatic conditions and similar rainfall characteristics as the study area; and (iii) covers an area more than 10 times larger than the study area. We selected a typical residential district in Guangzhou as the case-study area. Following these requirements, Guangdong Province, which belongs to the same administrative region as the case-study area, is selected as the transposition domain.

Step 2. Identifying the "parent storms". RainyDay selects the *m* largest *t*-hour rainfall events that occurred in the transposition domain over the *n*-year record of the gridded rainfall dataset, ranked by rainfall accumulation over an area the same size as the study area (a single grid cell in this study). The selected events must be temporally non-overlapping, i.e., they may not occur within the same 24 h: when two or more of the top *m* *t*-hour events fall within the same 24 h, RainyDay keeps only one of them. The selected rainfall events are defined as "parent storms".

Step 3. Calculating the probability distribution of extreme rainfall events. The occurrence probability of extreme rainfall events is spatially non-uniform in the transposition domain. RainyDay calculates this probability with a two-dimensional Gaussian kernel based on the storm centers of the "parent storms". The probabilities of all grid cells in the transposition domain sum to one.
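As an illustration of Step 3, the following minimal sketch places a two-dimensional Gaussian kernel on each parent-storm center and normalizes the result so that the grid sums to one. It is not RainyDay's own code: the function name `transposition_probability`, the grid-index coordinates, and the `bandwidth` value are assumptions made only for this example.

```python
import math

def transposition_probability(centers, nx, ny, bandwidth=1.0):
    """Return a ny-by-nx grid of probabilities that sums to one.

    centers   : list of (x, y) grid indices of parent-storm centers
    bandwidth : kernel standard deviation in grid cells (hypothetical value)
    """
    density = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for cx, cy in centers:
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                # Unnormalized isotropic Gaussian kernel centered on the storm
                density[y][x] += math.exp(-d2 / (2.0 * bandwidth ** 2))
    total = sum(sum(row) for row in density)
    # Normalize so the probabilities of all grid cells sum to one
    return [[v / total for v in row] for row in density]

# Two storm centers on a 5 x 4 grid; cells near a center get a higher probability
prob = transposition_probability([(1, 1), (3, 2)], nx=5, ny=4)
```

Cells close to a storm center receive more probability mass than distant cells, which is the spatial non-uniformity described in the step.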

Step 4. Transposing rainfall events. RainyDay randomly selects *k* rainfall events from the "parent storms" to generate rainfall events, where *k* is an integer denoting the "number of storms per year". RainyDay assumes that *k* follows a Poisson distribution with annual occurrence rate *λ*, where *λ* is the ratio of the number of selected parent storms *m* to the length of the rainfall record in years *n*, i.e., *λ* = *m*/*n*. More details about Poisson-distributed storm occurrences can be found in Wilson and Foufoula-Georgiou [49]. The selected rainfall events can be transposed to any position in the transposition domain according to the probability distribution of extreme rainfall events, but only the rainfall that falls over the study area is calculated. From the *k* transposed events, RainyDay extracts the maximum *t*-hour rainfall, which is regarded as the annual maximum *t*-hour rainfall.
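The logic of Step 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not RainyDay's implementation: `transposed_depth` is a hypothetical callable standing in for the spatial transposition and the extraction of rainfall over the study area, and the Poisson draw uses Knuth's multiplication method because the Python standard library has no Poisson sampler.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def synthetic_annual_maximum(parent_storms, n_years, transposed_depth, rng):
    """One synthetic year: draw k storms, transpose each, keep the largest depth."""
    lam = len(parent_storms) / n_years      # lambda = m / n
    k = sample_poisson(lam, rng)
    depths = [transposed_depth(rng.choice(parent_storms), rng) for _ in range(k)]
    # A year with k = 0 receives no rainfall from the parent storms
    return max(depths, default=0.0)

# Usage with placeholder storms (depths in mm) and an identity "transposition"
rng = random.Random(42)
annual_max = synthetic_annual_maximum([10.0, 20.0, 30.0], 9, lambda s, r: s, rng)
```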

Step 5. Generating *T*max annual maximum rainfall. *T*max annual rainfall maxima are generated by repeating Step 4 *T*max times. To obtain the intensity-duration-frequency relationships, the maxima are ranked *i* = 1 ... *T*max from largest to smallest based on rainfall accumulation. The return period *P* of each rank is then calculated as *Pi* = 1/(*i*/*T*max), so that the largest maximum (*i* = 1) has a return period of *T*max years. Each return period includes *N* realizations, obtained by repeating Step 4 and this step *N* times; that is, RainyDay provides the ensemble spread of rainfall accumulation rather than a single estimated value at each return period.
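The rank-to-return-period conversion in Step 5 can be illustrated with a short sketch. It assumes that rank *i* = 1 is assigned to the largest accumulation, so that the largest synthetic maximum receives the longest return period; the function name and the sample depths are hypothetical.

```python
def empirical_return_periods(annual_maxima):
    """Pair each annual maximum with its return period P_i = 1 / (i / T_max)."""
    ranked = sorted(annual_maxima, reverse=True)   # rank 1 = largest accumulation
    t_max = len(ranked)
    return [(t_max / i, depth) for i, depth in enumerate(ranked, start=1)]

# Four synthetic annual maxima (mm); the largest gets a return period of T_max years
pairs = empirical_return_periods([30.0, 80.0, 50.0, 120.0])
```

Repeating this for *N* independent sets of *T*max synthetic years gives *N* depth estimates per return period, i.e., the ensemble spread mentioned above.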

In this study, RainyDay is used to generate 5- to 100-yr design rainfall events with durations of 2 h, 6 h, 12 h, and 24 h. Each return period includes 20 realizations for each duration. For simplicity, we analyze only the mean, minimum, and maximum of the 20 realizations, since these values capture the ensemble spread of all realizations. In addition, we compare these results (i.e., RainyDay-based estimates) with IDF formula-based estimates to assess the reasonableness of the proposed framework.

#### *2.2. Constructing Different Rainfall Scenarios*

The design rainfall used in urban drainage systems and flood control is often calculated by coupling the IDF formula with the Chicago rainfall pattern [50]. To be consistent with this practice, the Chicago rainfall pattern is also used to distribute the RainyDay-based estimates over time. The differences between the IDF formula-based estimates and the minimum, maximum, and mean of the 20 realizations of RainyDay-based estimates are compared. The IDF formula is the empirical formula $q = \frac{167A(1 + C\lg P)}{(t + b)^{n}}$, where *q* (L/(s·hm²)) is the design rainstorm intensity for a duration of *t* minutes at return period *P* (years), and *A*, *C*, *b*, and *n* are constant parameters that are derived and modified from long-term rainfall records using the Gauss–Newton iterative algorithm [46]. For the case-study area, the IDF formula is shown in Equation (1).

$$q = \frac{3618.27(1 + 0.438 \text{lg}P)}{(t + 11.259)^{0.750}} \tag{1}$$

To analyze how the difference between the IDF formula-based and RainyDay-based estimates affects urban flood analysis, we combine the return periods (5-, 10-, 20-, 50-, and 100-yr), durations (2 h, 6 h, 12 h, and 24 h), and estimates (the IDF formula-based estimates and the minimum, maximum, and mean of the 20 realizations of RainyDay-based estimates) to generate 5 × 4 × 4 = 80 rainfall scenarios for urban flood analysis. For all rainfall scenarios, the rain peak coefficient is set to 0.375, consistent with the design specification for outdoor drainage in China [46].
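To make the coupling of the IDF formula and the Chicago rainfall pattern concrete, the sketch below evaluates Equation (1) and builds a Chicago-type hyetograph with a rain peak coefficient of 0.375. It is an illustrative sketch, not the code used in this study: the instantaneous intensity is taken as the derivative of the IDF depth-duration curve (the usual Keifer–Chu construction), the 5 min time step is an arbitrary choice, and the factor 0.36 converts L/(s·hm²) to mm/h.

```python
import math

A167, C, B, N = 3618.27, 0.438, 11.259, 0.750   # coefficients of Equation (1)

def idf_intensity(t_min, return_period_yr):
    """Equation (1): average intensity q in L/(s*hm^2) over a t-minute duration."""
    return A167 * (1 + C * math.log10(return_period_yr)) / (t_min + B) ** N

def chicago_hyetograph(duration_min, return_period_yr, r=0.375, step_min=5):
    """Chicago-pattern intensities (mm/h) at the midpoint of each time step."""
    a = A167 * (1 + C * math.log10(return_period_yr))
    t_peak = r * duration_min                    # peak position set by the coefficient r
    series = []
    for k in range(int(duration_min / step_min)):
        t = (k + 0.5) * step_min
        # Time from the peak, rescaled by the share of the storm on that side
        tau = (t_peak - t) / r if t < t_peak else (t - t_peak) / (1 - r)
        # Derivative of the depth curve a * tau / (tau + B)**N with respect to tau
        q = a * ((1 - N) * tau + B) / (tau + B) ** (N + 1)
        series.append((t, q * 0.36))             # 1 L/(s*hm^2) = 0.36 mm/h
    return series

# 2 h, 5-yr design storm at 5 min resolution
hyetograph = chicago_hyetograph(120, 5)
```

By construction, the rainfall depth accumulated over the whole hyetograph approximates the depth implied by Equation (1) for the same duration, so a RainyDay-based depth can be distributed in the same way by rescaling the series.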

#### *2.3. Urban Hydrologic Model*

In this study, the urban hydrologic model SWMM is used to simulate the relationship between rainfall and runoff. SWMM is widely used in urban flood analysis and hydraulic practice, and it performs well in both urban and natural basins [51,52]. Since the theory of SWMM is described in detail by Gironás et al. [53], we do not repeat it here.

This study uses the calibrated and verified hydrological model of Zhu et al. [54], to which the reader is directed for more information about the case-study area and the performance of the model. In this model, the nonlinear reservoir method is used to calculate the surface runoff, the Saint-Venant equations are used to calculate the flow, the Horton model is used to calculate the infiltration process, the Manning formula and the approximate continuity equation are used to convert the runoff of each subbasin into the outflow process, and the Newton-Raphson method and the finite difference method are used to calculate the time-varying process of runoff. Zhu et al. [54] calibrated and verified the model against observed rainfall and runoff data, using the Nash-Sutcliffe efficiency (NSE) index to assess the model's performance.
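For reference, the NSE index compares the squared simulation errors with the variance of the observations; a value of 1 indicates a perfect match, and a value of 0 means the model is no better than the observed mean. A minimal sketch follows, in which the function name and the sample discharge values are hypothetical.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

# Placeholder runoff series (m^3/s) at matching time steps
nse = nash_sutcliffe_efficiency([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```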

To assess the performance of the RainyDay-based estimates in runoff simulation, we use the temporal distributions of the RainyDay-based and IDF formula-based estimates as inputs to the urban hydrologic model and compare the resulting differences. The model used in this study is the same as that in Zhu et al. [54], whose calibration and verification results show that the model can be used to simulate the runoff process of the case-study area, demonstrating its applicability and rationality. More details about the model can be found in Zhu et al. [54].

#### *2.4. Projection Pursuit Algorithm*

The projection pursuit algorithm is a robust and powerful algorithm for the exploratory analysis of multivariate, high-dimensional data. It is widely used to reduce dimensionality for feature extraction, especially in flood and environmental analysis. For instance, Zhi et al. [55] coupled a drainage model, a 2D flood simulation model, and the projection pursuit algorithm to assess urban flood risk, and Guo et al. [56] used the projection pursuit algorithm to reduce dimensionality in an evaluation framework that assesses atmospheric environment carrying capacity from an index system of 20 indicators. The basic idea of the projection pursuit algorithm is to project the data into a low-dimensional subspace via projection vectors. Its advantages are strong resistance to interference and independence from subjective evaluation criteria. In this study, the projection pursuit algorithm is adopted to analyze the comprehensive characteristics of urban flooding by constructing an evaluation index system. The system includes three indicators, i.e., flood time, maximum rate, and total inundation volume. Zhu et al. [40] demonstrated that flood characteristics can be estimated well from these indicators. The general steps are summarized as follows; more details are provided in Kruskal and Shepard [57] and Zhu et al. [40].

Step 1: Constructing and normalizing the evaluation indicator set. Flood time, maximum rate, and total inundation volume are selected as the evaluation indicator set (*X* = {*Xij* | *i* = 1, 2, 3; *j* = 1, 2, ... , *p*}), where *Xij* is the value of the *i*th evaluation indicator of the *j*th sample, the number of evaluation indicators is 3, and *p* is the sample size. The normalized set *xij* is calculated as follows:

$$x_{ij} = \frac{X_{ij} - X_{i\text{min}}}{X_{i\text{max}} - X_{i\text{min}}} \tag{2}$$

where $X_{i\text{max}}$ and $X_{i\text{min}}$ denote the maximum and minimum of the *i*th evaluation indicator over all samples.

Step 2: Establishing the projection indicator function *Q*(*a*). The evaluation indicator set is synthesized with a 1 × 3 vector (i.e., *a* = {*ai* | *i* = 1, 2, 3}) as the projection direction. The projection value of the *j*th sample is then calculated as follows:

$$Z_j = \sum_{i=1}^{3} a_i x_{ij} \quad (j = 1, 2, \dots, p) \tag{3}$$

Then, *Q*(*a*) can be expressed as:

$$Q(a) = S_Z D_Z \tag{4}$$

$$S_Z = \sqrt{\frac{1}{p-1} \sum_{j=1}^{p} \left( Z_j - \overline{Z} \right)^2} \tag{5}$$

$$D_Z = \sum_{i=1}^{3} \sum_{j=1}^{p} \left( R - r(i,j) \right) u\left( R - r(i,j) \right) \tag{6}$$

where $S_Z$ and $D_Z$ denote the interclass distance and local density of $Z_j$, respectively; $\overline{Z}$ is the mean of $Z_j$; $R$ ($R = 0.1 S_Z$) is the cutoff radius; and $u(R - r(i,j))$ is the unit step function: if $R - r(i,j) \geq 0$, then $u(R - r(i,j)) = 1$; otherwise, $u(R - r(i,j)) = 0$.
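Steps 1 and 2 can be sketched in a few lines. The sketch rests on two assumptions that are not stated in the text: *r*(*i*, *j*) is taken as the absolute difference between the projection values of two samples, and the double sum in the local density runs over all pairs of the *p* samples, which is the convention common in projection pursuit applications. Function names and data are hypothetical, and each indicator is assumed to vary across samples (otherwise the normalization divides by zero).

```python
import math

def normalize(indicators):
    """Min-max scaling of Equation (2); rows are indicators, columns are samples."""
    scaled = []
    for row in indicators:
        lo, hi = min(row), max(row)
        scaled.append([(v - lo) / (hi - lo) for v in row])
    return scaled

def projection_values(a, x):
    """Equation (3): Z_j = sum over indicators i of a_i * x_ij."""
    return [sum(a[i] * x[i][j] for i in range(len(a))) for j in range(len(x[0]))]

def projection_index(a, x):
    """Q(a) = S_Z * D_Z with cutoff radius R = 0.1 * S_Z."""
    z = projection_values(a, x)
    p = len(z)
    mean_z = sum(z) / p
    s_z = math.sqrt(sum((v - mean_z) ** 2 for v in z) / (p - 1))
    radius = 0.1 * s_z
    d_z = 0.0
    for zi in z:
        for zj in z:
            r = abs(zi - zj)        # assumed definition of r(i, j)
            if radius - r >= 0:     # unit step function u(R - r)
                d_z += radius - r
    return s_z * d_z

# Three indicators (flood time, maximum rate, total volume) for four placeholder samples
x = normalize([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 40.0, 30.0], [0.5, 0.1, 0.9, 0.3]])
q = projection_index([0.6, 0.0, 0.8], x)
```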

Step 3: Calculating the best projection direction. For a given set of evaluation indicator values, *Q*(*a*) depends only on the projection direction *a*. The higher the value of *Q*(*a*), the better the projection direction, and the direction at which *Q*(*a*) reaches its maximum is the best one. To find the best projection direction, the objective function is constructed as max *Q*(*a*) = $S_Z D_Z$, subject to the constraint $\sum_{i=1}^{3} a_i^2 = 1$. Finding the best projection direction is a nonlinear global optimization problem, and the particle swarm optimization (PSO) technique is widely used to solve such problems. We also adopt it in this study; more details can be found in Kennedy and Eberhart [58].
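A minimal PSO sketch for maximizing an objective over unit-length direction vectors is given below. It is an illustration under stated assumptions rather than the configuration used in this study: the swarm size, iteration count, inertia weight, and acceleration coefficients are hypothetical values, particles are initialized in the non-negative octant as an arbitrary choice, and the unit-norm constraint is enforced by rescaling each particle after every move.

```python
import math
import random

def unit(v):
    """Rescale a vector to unit length so that the sum of squares equals one."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def pso_maximize(objective, dim=3, particles=30, iterations=200, seed=0,
                 inertia=0.7, c1=1.5, c2=1.5):
    rng = random.Random(seed)
    pos = [unit([rng.uniform(0.01, 1.0) for _ in range(dim)]) for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]                  # personal best positions
    best_val = [objective(p) for p in pos]
    g = max(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]      # global best
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
            moved = [pos[i][d] + vel[i][d] for d in range(dim)]
            if all(c == 0.0 for c in moved):
                continue                            # skip a degenerate zero vector
            pos[i] = unit(moved)                    # re-impose the unit-norm constraint
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

# Toy objective: alignment with a known direction (stands in for Q(a))
target = unit([1.0, 2.0, 2.0])
best_a, best_q = pso_maximize(lambda a: sum(x * y for x, y in zip(a, target)))
```

In an application, the toy objective would be replaced by a function that evaluates *Q*(*a*) for the normalized indicator set.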

Step 4: Analyzing the comprehensive characteristics of urban flooding. The best projection values are obtained by substituting the best projection direction into Equation (3). The best projection values represent the comprehensive characteristics of urban flooding: the larger the value, the more severe the urban flood.

Based on analyzing the runoff processes at the outlet of the case-study area, we focus on the flood characteristics under RainyDay-based and IDF formula-based estimates at the manholes (i.e., junctions) for the case-study area drainage system in this section. Three flood indicators (i.e., flood time, maximum rate, total inundation volume), which are demonstrated to reflect the urban flood characteristics by Zhu et al. [40], are selected to analyze the flood characteristics at each manhole. The comprehensive flood characteristics are analyzed by combining these three indicators with the projection pursuit algorithm.

**3. Data and Case-Study Area**

#### *3.1. Data*

The hourly, 0.1° gauge-adjusted remotely sensed rainfall data (http://www.cma.gov.cn/2011qxfw/2011qsjgx/, accessed on 15 November 2020) from the China Meteorological Administration merge CMORPH (the Climate Prediction Center Morphing algorithm) estimates with the observations of 30,000 automatic rain gauges. This rainfall product is optimized and verified by the probability density function matching technique and the optimal interpolation method. The temporal resolution is coarsened to one hour. Its total error is less than 10%, and the errors for heavy rainfall in areas with sparse ground gauge networks are less than 20%. Its accuracy is higher than that of similar rainfall products, and the product has been widely used in precipitation studies [59]. To verify the feasibility of estimating design rainfall from short-record remote sensing rainfall data, the rainfall data from 2008 to 2016 are selected in this study, where 2008 is the earliest year for which data are available.

The rainfall and runoff data used for calibration and verification were observed in the case-study area. The rainfall data were recorded by a RainLogger™ rain gauge (RainWise Inc.; USA), and the runoff data were recorded by a Stingray open channel gauge (Greyline Instruments Inc.; Germany). Both were recorded at 10 min time steps.
