**2. Methods**

#### *2.1. Radar-Rain Gauge Merging Methods*

The potential of high spatial and temporal resolution precipitation based on weather radar is known. Hence, di fferent merging methods have been proposed, and are generally classified as bias reduction and error variance minimization [33]. Identifying the spatial correlation in the error structure model is the most important step in the merging process. A categorization similar to a starting point and refined based upon a theoretical categorization was adopted by Wang et al. (2013), who also proposed the following (Table 1): First, a radar bias adjustment methods focusing on bias adjustment of radar estimates; second, radar-rain gauge integration methods, undertaking a true integration of both radar precipitation and rain gauge data; and, third, rain gauge interpolation methods using di fferent interpolation methods with the radar spatial association as additional information [21].


**Table 1.** Categories of radar-rain gauge merging, the merging methods, and their abbreviations used to derive the data.

#### 2.1.1. Radar Bias Adjustment Category

The methods in this category attempt to predict the ungauged location value by the bias that computes radar accumulation and the rain gauge accumulation. In this category, the rain gauge observation is assumed as the true rainfall value, the radar precipitation is used as the entire background, and the radar values at the gauged locations are used to compare with the bias adjustment [34,35]. The ungauged locations value is adjusted by multiplying or adding the gauge-radar comparison correction factor, which is given over a long or short time period [36].

(1) Mean field bias QPE (MFB)

Mean field bias (MFB) adjustment is the simplest method in this category and assumes that the radar estimates are affected by a spatially uniform multiplicative error, which can be a bad electronic calibration or an offset in the Z-R relation used to convert radar reflectivity to rainfall value [13,34]. It is acknowledged that MFB adjustment is the most common and simplest technique in radar meteorology and the correction factor is simply obtained by comparing a spatially averaged ratio of rain gauge with the radar accumulations at gauged locations over the given time period. In this method, a simple multiplicative factor is used to correct the radar domain uniformly. In the current study, the adjustment factor is estimated as

$$\mathbf{C}\_{MFB} = \frac{\sum\_{i=1}^{N} \mathbf{G}\_i}{\sum\_{i=1}^{N} \mathbf{R}\_i} \tag{1}$$

where *N* is the number of valid rain gauge, *Gi* is the rain gauge observations, and *Ri* is the radar estimated values at the rain gauge located pixel.

(2) Regression inverse distance weighting (RIDW)

In this method, the rain gauge observations are used as the true rainfall value to correct the entire radar background field by multiplying a dynamic adjustment correction factor. In geostatistics, a random process *R*(*<sup>s</sup>*, *t*) consists of two parts, where the first deterministic part *D*(*<sup>s</sup>*, *t*) corresponds to the trend component, and the other stochastic residual part <sup>ε</sup>(*<sup>s</sup>*, *t*) corresponds to local fluctuations of the trend. In this study, in addition to the observation vector *g*(*<sup>s</sup>*, *t*) measured by the rain gauge, the radar-based QPE at rain gauge locations over the whole period were also considered and used at each interpolation point. In this context, the radar external variable was used as a linear function to model the trend *D*(*<sup>s</sup>*, *t*) [23], so that

$$R(s,t) = D(s,t) + \varepsilon(s,t) \tag{2}$$

$$\mathbf{D}(\mathbf{s},t) = a(t)r(\mathbf{s},t) \tag{3}$$

where *s* is the location of a given point at time *t*. *<sup>r</sup>*(*<sup>s</sup>*, *t*) is the radar value at location s and time *t*. The coefficient is computed as the slope of a linear regression of all pairs of points composed of the gauge values on the y-axis and the values of the radar pixel on the x-axis. *a*(*t*) is assumed to be constant spatially.

$$\varepsilon(\mathbf{s},t) = \mathbf{g}(\mathbf{s},t) - D(\mathbf{s},t) = \mathbf{g}(\mathbf{s},t) - a(t)r(\mathbf{s},t) \tag{4}$$

$$\mathcal{R}\_{RIDW}(\mathbf{s}, t) = D\_p(\mathbf{s}, t) + \mathcal{E}\_{RIDW} \tag{5}$$

#### 2.1.2. Radar-Rain Gauge Integration Category

The methods in this category aim to minimize the estimation uncertainty by conducting an actual integration of both rain gauge and radar data. As well as differing with some local bias adjustment aiming at reducing local biases between radar and rain gauge observation, the integration category also attempts to minimize overall estimation uncertainty [22]. The merging methods in this category do not simply retain the radar as background or impact the local magnitude at the rain gauge location, but depend on their relative uncertainties and estimate the rainfall value at given location grid in the weighted average of both resources [37]. Two main methods in this category are applied to obtain a minimum uncertainty estimation in different ways, which are:

(1) Collocated co-kriging (CCoK)

To achieve the final aim of reducing uncertainty as much as possible, this method integrates both data instead of using only radar or rain gauge precipitation as the background. The co-kriging (CoK) belongs to this category because it minimizes the variance of estimation by solving a single kriging system, including both radar and rain gauge data. Although CoK can be seen as a radar-based interpolation, it is a liner combination of a multivariate variant, merging radar and rain gauge data [38]. These results, however, from several full forms of CoK with different secondary variables, show a significantly larger kriging system and always lead to a numerically unstable co-kriging matrix, with a significant difference between the primary and secondary data. To avoid these uncertainties, collocated

co-kriging (CCoK) has been proposed as a reduced form of CoK and applied in merging rainfall with this variation [39,40]. Compared with CoK, CCoK employs the radar value at the rain gauge location and only incorporates the cross-covariance between the radar value and observation at the rain gauge location. Consequently, the kriging system has been efficiently reduced.

The CCoK estimates rainfall and can be defined as

$$R\_{\rm CCoK}(\mathbf{s}\_0, t) = \sum\_{i=1}^{n} \lambda\_i^{\rm CCoK} R\_{\rm \mathcal{J}}(\mathbf{s}\_i, t) + \lambda\_2' R\_r(\mathbf{s}\_0, t) \tag{6}$$

where *Rg*(*si*, *t*) are the rainfall values of the *n* known rain gauges at time *t*, λ*CCoK i* are the weights of the rain gauges, *Rr*(*<sup>s</sup>*0, *t*) is the radar QPE value at the target point, and λ2 is the weight of related to the radar field.

In this method, the constraint of both data weight can be defined as

$$\sum\_{i=1}^{n} \lambda\_i^{\text{CCoK}} + \lambda\_2' = 1 \tag{7}$$

The full radar covariance is, hence, not required. Instead, the covariance of rain gauges and the cross-covariance between radar and gauges are necessary. In this paper, we used rain gauge data and radar data as the primary variable and the secondary variable, respectively, to integrate the precipitation estimation, and approximate the cross-correlation from the radar spatial correlation using the alternative Markov approach [41]. Instead of the multiple (cross-) spatial correlation and large equations using the full COK, for this study, the CCoK used a simplified approximation with its advantage easily applied [40].

$$\sum\_{i=1}^{n} \lambda\_i^{\text{CCaK}} \mathbf{C}\_{\text{GG}}(\mathbf{x}\_i, \mathbf{x}\_0) + \lambda\_2^{\prime} \mathbf{C}\_{\text{GR}}(\mathbf{x}\_i, \mathbf{x}\_o) = \mathbf{C}\_{\text{GG}}(\mathbf{x}\_i, \mathbf{x}\_o) \tag{8}$$

$$\sum\_{i=1}^{n} \lambda\_{ki}^{\text{CCoK}} \mathbb{C}\_{\text{RG}}(\mathbf{x}\_0, \mathbf{x}\_i) + \lambda\_2' \mathbb{C}\_{\text{RR}}(\mathbf{0}) = \mathbb{C}\_{\text{RG}}(\mathbf{0}) \tag{9}$$

where *CGG*(*h*) is the covariance of the rain gauges, *CRG*(*h*) is the cross-covariance between radar and rain gauges, and *CRR*(0) is the radar covariance at h = 0.

(2) Fast Bayesian regression kriging (FBRK)

In this category, we integrated both rainfall data with the purpose of obtaining the estimation at the minimum uncertainty. For this purpose, methods in a Bayesian framework are widely used, and we applied the fast Bayesian regression kriging (FBRK), method, as proposed by Yang and Ng [42], to merge different data types [42]. We explicitly considered the difference in the errors from the raw input data and aimed to estimate an accurate rainfall field and obtain better precipitation data. Unlike most other existing kriging-based merging methods, the likelihood function in the FBRK is modified. Further accounting for the differences, Yang and Ng [42] applied the FBRK in three different data types, i.e., the residuals of the regression model were used to regress radar estimations, and rain gauge observations were interpolated by the ordinary kriging.

$$
\vec{I}\_0 = a\_r \mathcal{R}\_0 + \mathcal{J}\_r + \mathcal{e}\_0 \tag{10}
$$

where ˆ *I*0 is the FBRK interpolated rainfall intensity at *x*0, *R*0 is the radar measured intensity at the same location, *ar* and β*r* are the regression coefficients, and *e*0 is a random error term whose mean and standard deviation are computed following the kriging equations below:

$$
\mu\_{c0} = \sum\_{i=1}^{M} \lambda\_i \varepsilon\_i \tag{11}
$$

$$
\sigma\_o^2 = \left(1 - \sum\_{i=1}^M \lambda\_i\right) \mathbf{y}\_{\infty} + \sum\_{i=1}^M \lambda\_i \mathbf{r}\_{0,i} \tag{12}
$$

$$r\_{i,j} = \tau^2 + s^2 \left[ 1 - \exp\left( -\frac{|\mathbf{x}\_{i\prime} \mathbf{x}\_j|}{d\_{\rm ox}} \right) \right] \tag{13}$$

$$
\begin{pmatrix} r\_{1,1} & \dots & r\_{M,1} \\ \vdots & \ddots & \vdots \\ r\_{1,M} & \dots & r\_{M,M} \end{pmatrix} \begin{pmatrix} \lambda\_1 \\ \vdots \\ \lambda\_M \end{pmatrix} = \begin{pmatrix} r\_{0,1} \\ \vdots \\ r\_{0,M} \end{pmatrix} \tag{14}
$$

where *ue*0 and σ*o* are the expected value and standard deviation of *e*0, respectively, *ei* is the residual of the *i*th of the *M* rain gauge and crowdsourced observations, and λ*i* is its associated weight.

#### 2.1.3. Rain Gauge Interpolation Category Using the Spatial Association of Radar as an Addition

Unlike the bias adjustment methods using the entire radar field as background with these integration methods integrating both data sources, the methods in this category simply used the spatial association as an external drift to interpolate the rain gauge values. Ochoa-Rodriguez et al. proposed that the merging methods in this category are all geostatistical and kriging-based [33]. The kriging-based interpolation approaches predict the ungauged located values with the linear weights of observations at gauged locations by minimizing the variance of the error. As such, the main component of the methods is that the rainfall filed can be characterized as a Gaussian random variable, and because of this, the methods predict ungauged values with the liner combination of gauged value by deriving the weights through minimizing the variance. This di ffers to the classical geostatistics by assuming a Gaussian distribution and stationarity of the spatial covariance, with the distribution of precipitation skewed over the domain [43]. The transformation of applying both rain gauge and radar data into a more Gaussian distribution is termed trans-gaussian kriging [44]. It is based on the quantitative spatial variability of both data, and a more Gaussian distribution always has a better Gaussianity in the residuals. Two widely used methods of this category were applied in this study, which are:

(1) Regression kriging (RK)

As one of the kriging family hybrid interpolation methods, regression kriging (RK) is a spatial interpolation technique that integrates a linear regression and the regression residuals with simple kriging. The advantage of this method is that all points are used to interpolate the residual with a global neighborhood, which can extend to a broader range. RK uses arbitrarily complex regression on auxiliary information (radar data) and chooses simple kriging to interpolate the residuals acquired from the regression model. In this study, we defined the following successive steps to implement this method: (a) The linear regression step, (b) the residuals variogram computation step, and (c) the kriging-based interpolation of the residual steps.

For steps a and b, the trend and residuals computation based on Equation (3) are also valid for RK. At the beginning of step c, the covariance matrix *Caa* of computing the covariance of the residuals at the target location is:

$$\mathbf{C}\_{\text{aa}} = \begin{pmatrix} \mathbf{C}\_{Z}^{2} & \mathbf{C}\_{12} & \cdots & \mathbf{C}\_{1N} \\ \mathbf{C}\_{21} & \mathbf{C}\_{Z}^{2} & \cdots & \mathbf{C}\_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{C}\_{N2} & \mathbf{C}\_{N2} & \cdots & \mathbf{C}\_{Z}^{2} \end{pmatrix} \tag{15}$$

where element *Caa* of the matrix is computed by the covariance between the observation locations I and j, where *C*<sup>2</sup> *Z* is the variance of the observations. In this method, the square-root transformation of the data is used in the process of applying kriging. This transformation shows a trending increase on the Gaussianity of the overall residuals, although some analysis of the e ffect sometimes show a few that are limited. Based on the linear kriging and linear combination, the weights used to compute residuals at the target location could thus be given as:

$$\mathfrak{k}\_{RK}(\mathfrak{s}\_0, t) = \sum\_{i=1}^{N} \lambda\_i \in (\mathfrak{s}, t) \tag{16}$$

which is then added to the trend *mp*(*<sup>S</sup>*0, *t*) to obtain the expected value of the precipitation depth.

Finally, the expected precipitation value at the interpolation location can be computed as:

$$
\phi\_{\mathbb{R}K}(\mathbf{s}\_0, t) = m\_p(\mathbf{s}\_0, t) + \mathfrak{e}\_{\mathbb{R}K}(\mathbf{s}\_0, t) \tag{17}
$$

#### (2) Kriging with external drift (KED)

Kriging with external drift (KED) is an extension of universal kriging interpolation, in which the interpolated variable, in this case, is computed as the sum of stochastic term and a deterministic term. Kriging with external drifts allows the incorporation of several additional variables that are used as background information to interpolate the primary variable [45]. In this study, we focused on merging rain gauge and radar data, and therefore, radar data were considered as the only additional information in this method. The basic assumption of KED is that the expected value of the estimated variable <sup>G</sup>(*x*) has a linear relationship with an additional variable *<sup>R</sup>*(*x*):

$$\mathbf{G}(\mathbf{x}) = \mathbf{a} + \mathbf{b} \cdot \mathbf{R}(\mathbf{x}) \tag{18}$$

where <sup>G</sup>(*x*) is the rain gauge value at location *x*, *<sup>R</sup>*(*x*) is the radar rainfall estimate at the gauged location *x*, and *a* and *b* are linear coefficients that are determined.

The external drift can clearly indicate the full spatial variability of the radar QPE data, especially in the events that the rain gauge-radar consistency is high. Thus, the estimation at given location x0 is derived from a linear estimator, and the weights are computed as follows:

$$\sum\_{i=1}^{n} \lambda\_i^{KED} = 1\tag{19}$$

$$\sum\_{i=1}^{n} \lambda\_i^{KED} R(\mathbf{x}\_i) = R(\mathbf{x}\_0) \tag{20}$$

As mentioned above, data transformation is used to deal with the rainfall showing non-Gaussian features and the problematic cases that lack enough rain gauges to obtain a reliable variogram [46]. In this method, normal score transformation, which can associate every given probability quantile to the corresponding quantiles of a standard normal probability distribution, is used to transform the data to obtain a continuous, strictly cumulative distribution [44].

## *2.2. Meteorological Evaluation*

#### 2.2.1. Leave-One-Out Cross Validation (LOOCV)

No independent precipitation observations exist at a high resolution. Hence, to validate the merging methods, a leave-one-out cross validation (LOOCV) was used to assess the performance of the rainfall-merged techniques. In this method, the rain gauge point observations are assumed to be directly measured as true values, which are used to access the performance by a comparison with other given grid point rainfall products that are computed by different rainfall-merged methods. For the scale mismatch between rain gauge and merging grid cell, the gauged location value is substituted by merged rainfall data from the nearest grid center. It is notable that LOOCV only assesses the accuracy of estimations at the rain gauge locations, but the LOOCV statistics allow us to compare the performance of different merging methods systematically.

In this study, we undertook the evaluation on an hourly basis for each rain gauge location and for each of the merging methods.

The following indicators were used to quantitatively compare the different radar-rain gauge merged products and the rain gauge observations for each time step:

Bias: The systematic errors assessment is calculated from the mean of difference between the observed and predicted rainfall values.

$$\text{Bias} = \frac{1}{n} \sum\_{i=1}^{n} \left( R\_i - \hat{R}\_i \right) \tag{21}$$

RMSE: The root mean square (RMSE) represents the standard deviation of the differences between the observed and predicted rainfall values and is widely used in verification.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} \left(\mathbb{R}\_{i} - \mathbb{R}\_{i}\right)^{2}}\tag{22}$$

MRTE: The mean-root-transformed error (MRTE) can mitigate the dominant of errors from large precipitation amounts for the given lower weight:

$$\text{MRTE} = \frac{1}{n} \sum\_{i=1}^{n} \left( \sqrt{\mathcal{R}\_i} - \sqrt{\hat{\mathcal{R}}\_i} \right)^2 \tag{23}$$

where *Ri* is the rain gauge observed value, n is the number of the rain gauges, and *R* ˆ *i* is the estimated value at the rain gauge location. For each whole event, the bias was computed as an average of the entire period and entire spatial range. The bias value can range from −∞ to +∞, with optimal value equal to 0. The RMSE and MRTE can range from 0 to +∞, with optimal value equal to 0.

Finally, the assessment of individual rain gauges only provides a performance at the point scale. For the catchment that has fewer rain gauges to validate the performance, the evaluation over the whole catchment is necessary. Thus, in order to assess the performance of the merging methods at larger areas, a hydrological application approach was implemented.

#### 2.2.2. Hybrid Hydrological Model (Hybrid-Hebei Model)

In this study, all the rainfall runoff forecasts were produced with the semi-distributed rainfall-runoff Hybrid-Hebei model. This model is a semi-distributed model with a lumped conceptual from the Hebei model and a spatially distributed feature based on GIS [47]. The model is in operational use in semi-arid and semi-humid regions. The runoff of the semi-distributed Hebei model is divided into surface runoff and underground runoff in each 1-km<sup>2</sup> grid cell. When the precipitation intensity is greater than the infiltration intensity, the landmark runoff confluence is generated and, conversely, the infiltration component generates underground runoff after considering the soil water demand. Finally the confluence generates the outlet flow of the basin. The structure of the Hebei model in each 1-km<sup>2</sup> grid cell is shown in Figure 1.

**Figure 1.** Structure of the Hybrid-Hebei model in each grid cell.

In semi-humid and semi-arid areas, the middle zone of the soil vadose zone is relatively thick, the infiltration rate generated by precipitation often fails to reach the diving surface, and the infiltration rate gradually decreases during the infiltration process. Therefore, the main factor affecting the infiltration rate is the water content of the surface soil. In the semi-distributed Hebei model, considering the complex changes of the underlying surface and the significant difference in infiltration capacity in the semi-arid and semi-humid areas, the infiltration curve in the model is a parabolic infiltration curve controlled by surface soil moisture. This is based on measured data of the Tuanshan gully in northern Shaanxi, China.

The model's infiltration curve within the grid is:

$$f\_{\mathcal{S}} = \left(p\_{\mathcal{S}} - \frac{i^{(1+n)}}{(1+n)f\_m^n}\right)e^{-\mu m} + f\_{\mathcal{E}}\tag{24}$$

where *fg* is infiltration rate within the grid, unit: mm/h; *pg* is rainfall intensity within the grid, mm/h; n is the index; *fc* is stable infiltration rate, mm/h; u is the index; m is the surface soil moisture, mm; and *f nm* is the infiltration capacity within the grid, mm/h.

The structure of the lumped Hebei model was described in a previous study [47]. Compared to the original lumped Hebei model, the Hybrid-Hebei model provides improved hydrograph simulations. It can be coupled with high-resolution precipitation to achieve a superior runoff simulation. In this study, the Nash efficiency coefficient (NSE) was used to evaluate the ensemble runoff simulation.

$$\text{NSE} = 1 - \frac{\sum\_{t=1}^{T} \left(Q\_0^t - Q\_m^t\right)^2}{\sum\_{t=1}^{T} \left(Q\_0^t - \overline{Q}\_o\right)^2} \tag{25}$$

where *Qt*0 is the observation at time *t*, *Qtm* is the estimated value at time *t*, and *Q*ˆ *o* is the mean value of the whole time T.

For floods in mountainous catchments, the peak flow is an important index, and the relative error (RE) of the peak flow is adopted:

$$\text{RE} = \frac{Q\_m - Q\_p}{Q\_p} \tag{26}$$

where *Qp* is the observed peak flow and *Qm* is the estimated peak flow.

#### **3. Study Area and Data**

#### *3.1. Study Area and Events*

The two river catchments of Fuping and Zijinguan were selected as the study areas. These catchments belong to the south and north reaches of the Daqinghe catchment located in Northern China and have semi-humid and semi-arid climatic conditions. The drainage area of Fuping (from latitude 39◦22 to latitude 38◦47N and from longitude 113◦40 to longitude 114◦18E) is 2210 km2, and the area of Zijingguan (from latitude 39◦13 to latitude 39◦40N and from longitude 114◦28 to longitude 115◦11E) is 1760 km2. The elevation above sea level in the Fuping and Zijingguan catchments varies from 254 m to approximately 2456 m and 519 m to 2105 m, respectively (Figure 2). Both catchments react very specifically to intense rainfall for the steep terrain and the low vegetation coverage, and the rivers of the two catchments flow from west to east. The river flow is measured at the catchment outlets. There are eight rain gauges in the Fuping catchment and eleven rain gauges in the Zijingguan catchment. An S-band Doppler weather radar with a scan radius of 250 km is located in Shijiazhuang city, which is approximately 100 km to the southeast of the two catchments and both catchments can be completely covered by the radar. In this study, four storm events from the Zijingguan catchment and four storm events from the Fuping catchment were selected to assess the performance of different merging methods. When we chose the storm events, a 24-h time window was used. Within the 24-h window, the storms which showed representative rainfall evenness in space and time were chosen,

which then formed the eight storm events. The start times, durations, and peak flows of these events are shown in Table 2.

**Figure 2.** The information of the Zijingguan and Fuping catchments.


**Table2.**Durationsandrainfalltotals fortheeightselectedstormevents.

#### *3.2. Weather Radar and Data*

The radar data used in this study were retrieved from the S-band single-polarization Doppler weather radar located at Shijiazhuang city in Northern China (Figure 1). The detailed information of this radar can be seen in Table 3. The radar is operated by the China Meteorological Administration (CMA) and has an optimal detection range of 230 km. The radar obtains per base reflectivity data on every six-minute volume scan and completely covers the two study areas [48]. A radar QPE Group System (QPEGS) was developed by the CMA, providing hourly QPE data at a high spatial resolution. For different sources of errors, such as radar calibration, variation of the vertical reflectivity profile, attenuation, and anomalies, quality control of radar data, such as the removal of ground clutter, is necessary and carried out by the OPEGS. A maximum-pixel-value method was also used to generate the "mixed height radar reflectivity" in each volume gridded value, as in the hybrid scan reflectivity proposed by the National Severe Storms Laboratory (NSSL). For the location of the selected radar in this study, a threshold value (38 dBZ) of radar reflectivity was set to differentiate between the convective and stratiform rainfall types considering the rainfall features in eastern China [49]. The accumulated clutter was always magnified during the hourly accumulated rainfall collection period, so a simple clutter filter was used to remove the static clutter. The power law Marshall–Palmer relationship

converted the radar reflectivity (Z) to rain rata (R), and a Z-R relationship calculated the accumulated rainfall for convective and stratiform rainfall types [50]. In the QPEGS, the common definition of "a" and "b" is shown in Equations (25) and (26) [48]. To ensure that the spatial resolution of radar data can reflect the precipitation, a higher radar precipitation, rather than a low threshold of 0.1 mm, was considered for the estimation of a 1 km × 1 km radar grid.

For convective:

$$Z = 300 \text{R}^{1.4} \tag{27}$$

For stratiform:

$$Z = 200 \text{R}^{1.6} \tag{28}$$



<sup>\*</sup> The antenna diameter includes the radome.

#### **4. Results and Discussion**

We produced a series of sets of radar-rain gauge merged data by combining the radar and rain gauge precipitation using different merging methods. During the computing period with these merging methods, some applications were defined. For example, a value of 2 (common default value) was given in the RIDW method, and a minimum of three stations were used in the kriging-based methods. Spherical-model semivariograms defined the variograms that were ill defined, with an insufficient number of points. The last previously computed valid variogram was instead used when the present condition was not valid, and the regression and integration were computed on stations only located in the basins [51,52]. The evaluation of the quality and the reliability of the merged data in this study was assessed in two stages: The quantitative evaluation of radar-rain gauge merging methods based on LOOCV and the hydrological model performance driven from the merged data as Hybrid-Hebei model input.

#### *4.1. Evaluation of Radar-Rain Gauge Merging Methods*

To assess the performance of the different merging methods, we computed the present performance indicators for the eight events. For each indicator, hourly rainfall values were averaged for each event (Table 4, Figure 3), and the calculated weight of each rain gauge was determined by the area weight divided by the Thiessen polygon. As indicated, the QPE-only-based radar data (OR) clearly showed the weakest performance in all performance indicators, which confirms the necessity of bias correction using rain gauge observations. For the BIAS indicator, the median of BIAS should be close to zero with minimum dispersion because the cross-validation errors of a good estimator should be unbiased. The results shown in Figure 3 indicate that all merging methods were relatively unbiased (median errors were close to zero in all events), but the range of BIAS varied between the different methods. The main difference between these methods can be seen in the range of the indicators, whereby the integration category has the smallest range of BIAS rather than the interpolation category and bias adjustment category [53]. The results show that the kriging-based methods all performed well, with the bias reduced. A comparison of the results of the three categories on the RMSE and MRTE (Figure 3) is noticeable. Irrespective of the two merging methods chosen in the integration or interpolation category, one of these had a clear improvement over the bias adjustment and another has a slight improvement. The performance and added value of the associated merging methods generally increase with the improvement of correlation between rain gauge and radar estimation for medium or large geographic domains, but is invisible for the smaller domains for the limited rain gauges observation and the uncertainty of precipitation under rapid spatial transformation [33].

**Figure 3.** Boxplots of the indicators BIAS, root mean square error (RMSE), and mean-root-transformed error (MRTE) of the chosen two basins. For each method, the central bar is the median, the bounds of the box are the first and third quartiles, and the whiskers include 1.5-times the interquartile range from the box. Note that only the hourly rainfall values in the domain >0.1 mm are provided in this figure.

**Table 4.** The indicators performance based on leave-one-out cross validation (LOOCV) in the two catchments.


Figure 3 provides a graphical representation of the results of the different merging methods on the two basins and indicates that the FBRK results performed best, followed by the KED, CCoK, RK, RIDW, and MFB. The improvement of the FBRK data over the KED, however, was relatively minor compared to their improvements over other merging methods. The two methods that performed best were the FBRK and KED, which belong to the integration category and interpolation category, respectively. It has been pointed out in previous studies that the methods in the bias adjustment category were not found to have a better performance in any inter-comparison studies [32,54]. The results of these methods make us consider that correction can be used to estimate all points inside the study area and can deal with anisotropy and the spatial evolution of precipitation with assuming translation invariance and small basins. The results are not clear, however, for partial methods and indicators. For example, the RIDW performed better overall than the CCoK and RK when applied in Fuping. In terms of the BIAS, the RIDW showed a similar performance with the FBRK or KED and had a better performance than CCoK or RK in terms of RMSE and MTRE indicators. This is due to the fact that the quality of the kriging-based methods and bias adjustment methods depend significantly on the geometry of the rain gauge network distribution and, in particular, on the low-density rain gauge network [32].

Through the overall comparison of the performance of the two basins using di fferent merging methods, we found that di fferent merging methods not only produced similar results, but also showed di fferent performances. This is because there is an information gap, such as the area of the basin, the distribution information of the rain gauges, and the distance from the radar. Therefore, we need to analyze and discuss each basin in order to determine suitable merging methods that suit di fferent basins and the reasons for their suitability, as well as the reasons why other methods do not perform well.

In Table 5, the indicator performances of the merging methods for the four storm events of Zijingguan are shown. Figures 4 and 5 show the boxplot of the three indicators for the four events in di fferent merging methods and the scatterplot of the rain gauge observations with the predicted rainfall values from the six merging methods. The merging methods generally outperformed the radar-only estimations and the quality of radar data determined the quality of the merging products. The BIAS shows a clear underestimation when comparing the radar QPE with the rain gauge observation and confirms the need for correction of the radar QPE using rain gauges. The results for the BIAS show that for the performance of the rainfall data compared with the gauge observation, OR and MFB showed a strong negative value and the RIDW had a strong positive value. The other methods had a bias value of approximately 0. Furthermore, the values of the RMSE and MRTE strengthen the observation. It can also be seen that the interpolation methods and integration methods performed better than the bias adjustment methods. This could be due to the simplicity of bias adjustment methods and the complex formulation of the other two methods. It seems that a more complex implementation of merging methods always achieves a better result.

The FBRK method and the KED method provided the best performance and second-best performance, respectively, for the three indicators, whereby they provided the best and second-best values over all four events in the Zijingguan catchment. The scatter diagram (Figure 5) demonstrates that the FBRK and KED had a significant relationship between the merging data and rain gauge data, thereby indicating the high potential of merging skills in the applications. To the authors' knowledge, in all merging methods studied, the BAY-based method and the KED are the most popular merging methods that generally perform best [32], and this is in concordance with our results. The overall indicators and events show that the FBRK has a slightly better performance than KED. The third and fourth best performing methods were the CCoK and the RK methods. It is noted that the BIAS value-based RK had a slightly negative bias. In terms of all indicators, CCoK outperformed for three events, but for the Z3 event, the RK performed better than CCoK. The RK preserved the relative spatial rainfall resolution of the radar data, but its value estimates tended to be under the range of the gauge observations. We highlight that the RIDW method was applied with the default value of 2 [52]. The parameter, however, is often applied across large areas (particularly including a degree of spatial varying of rain gauge-radar biases), along with small-scale features that are spatially variable. This means that a default value may fail to quantitatively correct the rainfall data. An adjustment of the parameter may achieve a better performance of the RIDW method. The MFB performed worst in all methods and events. The MFB is, however, the most commonly used and investigated method among all merging methods, and it scales the original radar data to match the rainfall accumulations recorded by rain gauges. Considering this, the MFB can potentially provide a better representation of small-rainfall features compared with the other five merging methods [33]. The simple use and way in which the radar QPE is employed throughout the merging process, however, suggests that the MFB may fail to satisfactorily correct the rainfall features.


**Table 5.** The indicator performance based on LOOCV in Zijingguan.

**Figure 4.** Boxplot for the radar only and six merging methods values of three indicators in four events. For each method, the central bar is the median, the bounds of the box are the first and third quartiles, and the whiskers include 1.5-times the interquartile range from the box. Note that only the hourly rainfall values in the domain >0.1 mm are provided in this figure.

**Figure 5.** Scatterplots of data between the rain gauge observations and the radar-rain gauge merged products (the mean field bias (MFB), regression inverse distance weighting (RIDW), collocated co-kriging (CCok), fast Bayesian regression kriging (FBRK), regression kriging (RK), and kriging with external drift (KED)) of the Zijingguan basin. Continuous 1/1 slope lines are shown for the purpose of visualization when comparing different merging methods. In this figure, (**a**) the scatter distribution of the chosen merging methods in Z1 event; (**b**) the scatter distribution of the chosen merging methods in Z2 event; (**c**) the scatter distribution of the chosen merging methods in Z3 event; (**d**) the scatter distribution of the chosen merging methods in Z4 event.

Table 6 shows the indicator performances of the merging methods for the four storm events of Fuping. Figures 6 and 7 show detailed information regarding the six merging methods applied in Fuping. The results of the performances for the three categories are the same as those of the Zijingguan events. When comparing the indicator performances of the two basins, the OR performed better in Fuping, inferring that there was a better QPE for close to the radar station. As expected, the methods belonging to the bias adjustment category had a better performance than their applications in Zijingguan. In contrast to the results of the chosen methods applied in Zijingguan, it is clear that the RIDW belonging to the bias adjustment category outperformed the CCoK belonging to the integration category and the RK belonging to the interpolation category. It is well known that the smaller the value in the indicators and the smaller the range in the boxplot, the better the scatter correction. The better QPE is the main reason leading to the superior performance of the RIDW, and with a better QPE, the RIDW can preserve the original structure of the radar rainfall, especially the small-scale features. The RK method is highly reliant upon rain gauge numbers because it simply utilizes the spatial information of the radar field at the rain gauge locations to interpolate. As well as the ability of RK to reproduce rainfall features, it is highly dependent upon the density of rain gauges at the small scale. The RK method is likely to be used more in the limited rain gauges, with the variogram generation based on the point rain gauge value. The relatively poor performance of the CCoK and RK results (Figure 6) were likely due to the assumption that the Gaussian distribution in dynamic merging methods would

compute results that may be limited in simulating rainfall values [43]. For CCoK, the method takes spatial information from the radar in the basin. According to the performance results (Table 6), the better QPE and limited rain gauges combined to produce a satisfactory performance.

**Table 6.** The indicator performance based on LOOCV in Fuping.


**Figure 6.** Boxplot for the radar only and six merging methods values of three indicators in four events. For each method, the central bar is the median, the bounds of the box are the first and third quartiles, and the whiskers include 1.5-times the interquartile range from the box. Note that only the hourly rainfall values in the domain >0.1 mm are shown in this figure.

**Figure 7.** Scatterplots of data between the rain gauge observations and the radar-rain gauge merged products (the MFB, RIDW, CCoK, FBRK, RK, and KED) of the Fuping basin. Continuous 1/1 slope lines are shown for the purpose of visualization when comparing different merging methods. In this figure, (**a**) the scatter distribution of the chosen merging methods in F1 event; (**b**) the scatter distribution of the chosen merging methods in F2 event; (**c**) the scatter distribution of the chosen merging methods in F3 event; (**d**) the scatter distribution of the chosen merging methods in F4 event.

Through the comparison of the three evaluation indicators with the two basins, the two classical approaches (FBRK and KED), based on inotropic and variograms, are the best suited of the merging methods, which work well with the high spatial and temporal variability of precipitation. The performance of the different merging methods is clearly shown in this section. Multiple factors, which affect the application of radar-rain gauge merging methods used in the basins, were identified in this study. Obviously, the most important single factor affecting the performance is the quality of QPE that indicates the ability of radar to sample the rainfall conditions [55]. As the application in the Fuping shows, the RIDW, based a better QPE, outperformed the CCoK and RK, although the limited rain gauges significantly affected the performances of these methods. The lack of rain gauges created a lack of consistency provided by radar and rain gauges. As some studies have found, merging performance generally improves with increasing consistency between radar and rain gauge measurements, particularly in the integration and interpolation categories [19].

For small-scale basins, the preservation of small-scale rainfall features is critical to apply the methods. Different merging methods mean different choices regarding how the radar and rain gauge data are treated and applied during the merging process [56]. In this study, methodological choices focused on improving the quality of radar and rain gauge estimates through different methods. For small-scale basins, the radar QPE was the main data source, providing spatial details of the rainfall. Thus, the ability of these merging methods to preserve local rainfall features was highly dependent on the proportion of, and way of, employing radar data through the merging process. As described in Section 2.1.1, the bias adjustment category scaled the original radar estimation by multiplying the

QPE accumulations with factors to match the rain gauge record. As such, the original structure of the radar rainfall field was essentially preserved. However, the MFB did not achieve a satisfactory performance in the rainfall rates correction associated with small-scale features. Because MFB is usually applied uniformly to large areas, it often ignores the spatial variability in radar QPEs when applied at the small scale [57]. As Table 6 demonstrates, RIDW achieved a satisfactory performance with a high-quality radar QPE. As discussed in the previous section, a local regression of the radar data on the rain gauge data could contribute positively to the RIDW performance. The methods in the integration category combine the two datasets based on their relative uncertainty (see Section 2.1.2). As such, with limited rain gauges or high spatial rainfall variability, integration methods take more information from the radar data. This means that it is possible for integration methods to obtain a satisfactory reproduction of small-scale features, as in the assessment result of FBRK (Table 5). To a certain extent, however, CCoK prefers the interpolation category, in both the performance of comparison from the results and the Gaussian assumption. Unlike the other two categories, the methods in the interpolation category are highly reliant on rain gauge observations, simply considering the gauged radar data in the interpolation process. As per the result comparison discussed in the previous sections, the performance of interpolation-based methods is clearly associated with the density of rain gauges, which decide the ability to capture the small-scale rainfall features. Conversely, interpolation methods with Gaussian assumptions always smooth the high nonlinearities in small-scale features [58]. This is why the error of the RK prediction value is more obvious when the precipitation is large. For KED, the spatial details are reconstructed after the merging process. The density of rain gauges employed in radar-rain gauge merging has an impact on the performance of the merging methods. The impact of limited rain gauge availability on merging performance is closely linked to the reliance of a given merging method upon rain gauge data, as well as to the way in which radar data is employed in the merging process. Having a su fficient number of rain gauges in the study area may increase the ability of the rain gauges to capture the relevant precipitation features. However, the two catchments in this manuscript had a limited number of rain gauges due to the lack of monitoring network. With the increasing of monitoring stations, a further work should be implemented to study the influence of the rain gauge density on the merging performance.

In the six chosen merging method performances, we conclude the strengths and weaknesses of the three categories. The bias adjustment category has the advantage of ease of use, which leads to its wide application. For the methods in this category, small-scale rainfall features are generally preserved, although the correction may fail to correct the rain rates. The integration category allows for the consideration of rain gauge and radar uncertainties. The complex computation of the integration category, such as requiring solving matrix systems, leads to it being applied and tested the least. The interpolation category ranks between the other two categories in both complexity and performance, which is why the methods in the interpolation category are becoming increasingly popular [19].

#### *4.2. Hydrological Model Performance Evaluation*

The LOOCV analysis result does not allow a direct comparison because the left rain gauge is just used to compare and not used to compute the product [52]. Some authors have proposed that a higher quality of the merging products can be indicated from agreements between the simulated and observed runo ff using the merging products as the input [53]. All authors, however, pointed out that the calibration of a hydrological model is a demanding task and subject to various uncertainties, particularly for mountain flood simulations, whereby it is not easy to find the adequate parameters for considering uncertainties in the model structure and parameterization. In this study, a set of calibrated and validated model parameters were, therefore, used with the Hybrid-Hebei rainfall-runo ff model in the same two basins [47].

Table 7 shows the performance of merging methods in the runo ff simulation. As expected, the merging methods performed a di fferent goodness-of-fit with the runo ff simulation. For the NSE, all merging methods performed normally, and the extent of failure was revealed more clearly in RE. The values of NSE were increased from 0.21 (the MFB in F4) to 0.62 (the KED in F2). Concerning peak flows, the values of MFB and RK were negative overall, indicating a general tendency of underestimation together with the performance of rainfall estimates in the previous section. All the other methods showed a higher estimate. It can be noticed that the RK method seemed to have a better performance for the peak flows than FBRK and KED methods for the Z3, F2, F3, and F4 storm events. In this study, the purpose of inputting the QPEs to the hydrological model was not to rank the merging methods, but to test the applicability of the merging rainfall products for flood forecasting. The performance of the rainfall-runo ff model is subject to its parameter calibration. Since only one certain set of parameters was used for the Hybrid-Hebei model, the di fferences of QPEs from di fferent merging methods may be obscured in the runo ff simulations. As indicated in Table 7, the MFB performed worse than any of the methods chosen. The best results were the FBRK and KED methods. The performance of the FBRK and KED methods were almost identical when considering the simulation of NSE and RE in the entire events. This may be because the model calibration seems to have compensated for the di fferences in the rainfall garnering in the merging process. Compared with the CCoK and RK, the CCoK showed a better performance in most events, and therefore, the result indicates that the performance of the hydrological output is highly dependent on the accuracy of the rainfall product. The RIDW method ranked between the RK and MFB in both Fuping and Zijingguan. With one exception (Z3), however, the performances of these merging methods were di fferent from other events.


**Table 7.** The indicator performance based on the Hybrid-Hebei model in two catchments.

Figure 8 shows the 24-h accumulated rainfall in the Zijingguan catchment based on di fferent merging methods in the Z3 event. Figure 9 shows the hourly precipitation distribution and simulated stream flow hydrograph of all merging methods in this event, and the hydrograph flood process was extended to 36 h. As indicated in Figure 9, the entire runo ff simulation process lines of merging methods had the same trend as the measured flow process line. The runo ff hydrograph indicates that the catchment in the study area was relatively fast. The occurrence time of flood peak was the same in all simulation processes, and the peak staggering time was no more than a maximum of one hour. This indicates that the Hybrid-Hebei model can successfully simulate runo ff under di fferent precipitation products. For all merging methods, it is can be seen from Table 7 and Figure 9 that FBRK continued to rank highest in the hydrological verification. The RIDW outperformed at NSE because the runo ff simulation hydrograph of RIDW was the only one with a double-peak flow as the measured runo ff streamline. RIDW, however, was evaluated as only better than the MFB in LOOCV. KED performed the

third best in the NSE, however, the extent of failure was found in the peak flow. The CCoK performed worse, and a one-hour peak flow lag was found (Figure 9). RK and MFB both showed an average performance in NSE and RE, and both showed a clear underestimation in peak flow.

**Figure 8.** Twenty-four-hour accumulated rainfall based on different merging methods (mm).

**Figure 9.** Hydrographs showing simulated stream discharge at the discharge station in Zijingguan.

Based on the result of NSE (Table 7), the results of runoff simulations were normal. This is particularly because these merging methods do not have a large difference in accumulated precipitation and temporal distribution in Z3 (Figure 9). When considering the peak flow, however, the discrepancy was pronounced in both value and occurrence time. Furthermore, for RIDW and CCoK, the peak flow occurrence time of the other four methods was consistent with the measurement, and the difference between them was the value of peak flow. The lagged phase of the double peak flows in RIDW and peak flow in CCoK was approximately one hour. Compared to line A and line B, clear rainfall random errors in RIDW and CCoK were found, and the response of the runoff model performed differently. It is known that the transformation of precipitation to runoff is a smoothing operation in both space and time. For this catchment, a quick response time leads to a failure in the "smoothing effect." If the rainfall accumulation is larger than the response time of catchment, the performance of these methods will be improved [53].

In this section, the fitness of the simulated flow driven by different merging products was used to assess the performance of these products. The hydrological performance of the merging methods was not as good as expected. It is notable that, in runoff simulations, these methods all significantly ranked the lowest in the bias adjustment category. This may be because a spatially differentiated correction was not adopted in this category, and the transformation of rain to runoff is a temporal and spatial process. If the correlation length of random errors is close to the catchment's response time, random errors will be averaged out to a lower extent [53]. For the chosen methods, both FBRK and KED exhibited a low variation of approximately 0.6. The median ranking of RIDW in LOOCV was different to its performance in runoff simulations than CCoK and RK. It should be noted that the potential of giving the merging products as hydrological input is also a function of further multiple factors, including the methodological choices in the merging process, the climatological conditions in the basins, structural model errors, and the cross uncertainties in the entire merging and hydrological application [59]. Moreover, in order to assess the performance of hydrological variables while considering spatial observations, temporal observations must be taken into consideration because products generated by a distributed hydrological model or semi-distributed model, such as stream flow, are found to be sensitive to different high-resolution precipitation [60].
