#### *4.1. Uncertainty Quantification*

In hydrologic simulation, uncertainty arises from several sources, including the forcing data, the model structure, the model parameters, and the initial and boundary conditions [37]. Each source can be investigated and reduced through a different treatment, for example by providing more accurate forcing data to reduce the forcing uncertainty or by calibrating the model parameters to limit the parameter uncertainty. Data assimilation techniques are also commonly used to reduce the uncertainty due to the initial and boundary conditions and, in some cases, to address the parameter uncertainty. Although the model structure uncertainty could be investigated via the multi-parameterization scheme of NoahMP, the land surface model used in WRF-Hydro, as well as through the different routing options available in WRF-Hydro, such a study was beyond the scope of this paper. We investigated two of these sources, the forcing uncertainty and the parameter uncertainty; the findings are summarized in the following subsections.

#### 4.1.1. Forcing Uncertainty

Errors in, or a poor representation of, the meteorological forcing may propagate into the hydrologic simulation and affect the result in a nonlinear way. Similar to previous WRF-Hydro studies [20,21], the NLDAS-2 dataset was used in this study as meteorological forcing. As explained in Section 2.4, in order to better represent the rainfall intensity, the NLDAS-2 rainfall field was replaced with interpolated rain gauge observations. However, how well the model performs when driven by the original NLDAS-2 dataset still needs to be investigated. We therefore performed a sensitivity experiment on the calibration event: the streamflow was simulated with default (uncalibrated) parameters, driven first by the original NLDAS-2 forcing (1/8 degree resolution) and then by the forcing updated with rain gauge data (1 km resolution). The comparison between simulated and observed hydrographs is shown in Figure 8.

**Figure 8.** Measured (dashed line) and simulated hydrographs (solid line for updated NLDAS-2, dash-dotted line for original NLDAS-2) at MSGC01 for the calibration event.

Without calibration, the hydrograph simulated with the original NLDAS-2 forcing data (Figure 8, dash-dotted line) fails to reproduce the observed response to the rainfall event. Once the rainfall field is updated with rain gauge data and used to drive the model, the amplitude of the simulated hydrograph improves significantly (Figure 8, solid line). A timing error remains, but it is largely removed through calibration, as shown in Figure 4. A similar improvement has been reported previously [37]. Thus, rainfall data with a compatible resolution are recommended when applying WRF-Hydro-Sed to a relatively small watershed under local storm events in order to generate satisfactory results. This comparison illustrates how sensitive the model response is to the forcing dataset, in this case to precipitation. Ideally, one would force the model with an ensemble of different forcing datasets to cover a range of forcing uncertainty in the model simulations.

#### 4.1.2. Parameter Uncertainty

Model parameters are another source of uncertainty in the model simulation. This uncertainty is reduced to some degree through the calibration process. In this part, based on the single event calibration, we assess the goodness of calibration and the prediction uncertainty using the P-factor adapted from the Sequential Uncertainty Fitting method (SUFI-2), following previous studies [38–40]. The P-factor is defined as the percentage of the measured data bracketed by the 95% prediction uncertainty (95 PPU), which represents the degree of uncertainty accounted for by the model parameters. The 95 PPU is calculated from the cumulative distribution of the model outputs from different experiments corresponding to different model parameters. Here, the model output is the streamflow simulation and the model experiments are the calibration iterations, each with a different set of model parameters. It is believed that the streamflow measurements reflect all the uncertainty in the model and inputs [41]. A P-factor of 100% indicates full coverage of the observations by the 95 PPU, meaning that all uncertainty is explained by the model parameter uncertainty [38].
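As an illustration of the definition above, the following minimal sketch computes a prediction band and the corresponding P-factor from an ensemble of simulated hydrographs. It is not the calibration tool used in this study. The function names, the toy ensemble, and the use of the 2.5th and 97.5th percentiles as the limits of the 95 PPU are assumptions made for the example; the text only states that the band is derived from the cumulative distribution of the model outputs.

```python
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty sample."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def ppu_band(
    ensemble: Sequence[Sequence[float]],
    lower_q: float = 2.5,   # assumed lower limit of the 95 PPU
    upper_q: float = 97.5,  # assumed upper limit of the 95 PPU
) -> List[Tuple[float, float]]:
    """Return (lower, upper) bounds per time step.

    ensemble[i][t] is the simulated streamflow of run i at time step t.
    """
    n_steps = len(ensemble[0])
    band = []
    for t in range(n_steps):
        column = [run[t] for run in ensemble]
        band.append((percentile(column, lower_q), percentile(column, upper_q)))
    return band


def p_factor(observed: Sequence[float], band: Sequence[Tuple[float, float]]) -> float:
    """Percentage of observations that fall inside the band (bounds inclusive)."""
    inside = sum(1 for obs, (lo, hi) in zip(observed, band) if lo <= obs <= hi)
    return 100.0 * inside / len(observed)


# Hypothetical toy data: three calibration runs, four time steps.
toy_ensemble = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0],
]
toy_observed = [2.0, 3.0, 4.0, 10.0]

toy_band = ppu_band(toy_ensemble)
print(f"P-factor: {p_factor(toy_observed, toy_band):.0f}%")
```

In this toy case the last observation lies above every ensemble member, so three of the four observations are bracketed by the band.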

Based on the single event calibration we conducted, the P-factor is 95%, indicating that most of the measurements were bracketed by the model parameter uncertainty. Figure 9 shows the ensemble of simulated hydrographs (from the calibration process), with the 95 PPU band (red), against the observation and the best simulation during the calibration event. In this case, the simulations based on the single event calibration produce a band that covers the observation except at the beginning of the rising limb, where the streamflow is overestimated.

**Figure 9.** The best-simulated streamflow with 95 PPU and observations during the calibration event of 17–18 October 1981.

#### *4.2. Applicability of Calibrated Parameters*

Fine scale, grid- and process-based sediment models are usually used to simulate the sediment processes for a single rainfall event (e.g., [13,22,35,42]), instead of continuously simulating soil erosion over a long time scale. Part of the reason for this is that soil erosion at the watershed scale is thought to be controlled mainly by a few rainfall events [3]. In addition, continuous calculation of soil erosion with process-based models requires a large quantity of computational time as well as huge amounts of observation data, which are usually not available. In this case, such models are usually calibrated on one event and the calibrated parameters are then applied to another event.

In this study, WRF-Hydro-Sed was calibrated for the rainfall event of 17 October 1981 and then verified on the validation event using the calibrated parameters. As mentioned in Section 3.3, in spite of the difference in initial conditions, with a reasonable spin-up period the calibrated hydro-parameters can be transferred to the validation event and generate a satisfactory hydrograph with a high NSE value (0.86). For the sediment simulation, the simulated sediment concentration and sediment flux exhibit a larger bias (Table 6), which is widely acknowledged as a challenge in sediment modeling. Nevertheless, the simulated sediment yield at the outlet is acceptable, which supports the model's satisfactory performance after calibration.

However, variability in land use characteristics and soil conditions, as well as in the temporal and spatial distribution of rainfall between different rainfall events, has not been, or cannot be, fully considered in our model. This can restrict the application of parameters calibrated on a single event to other events. In order to evaluate how well the model performs over different rainfall events with parameters calibrated on one single event, we applied the calibrated hydro- and sediment parameters to the year 1982 in a one-year simulation. The year 1982 was selected mainly because it covers various rainfall events with different intensities and rainfall totals (Figure 10).

**Figure 10.** (**a**) Daily rainfall (black) of 1982. (**b**) Measured (dark orange) and simulated (blue, based on calibrated hydro-parameters) hydrographs at MSGC01 for all the rainfall events of 1982. (**c**) Same as (**b**) but based on recalibrated hydro-parameters. Several rainfall events are annotated in red, indicating underestimation, or in black, indicating overestimation.

#### 4.2.1. Hydro-Parameters

Figure 10b shows the simulated and observed streamflow during the 30 rainfall events of 1982, obtained with the calibrated hydro-parameters from Section 3.3. Overall, with the calibrated hydro-parameters based on one single event (calibrated hydro-parameters hereafter), the model reproduces all the streamflow events in the year with an NSE value of 0.43. However, the simulation underestimates the streamflow mainly during the heaviest rainfall events of the year, i.e., the events of 19 April, 6 October, and 3 and 25 December, while overestimation can be found during less intense rainfall events such as that of 1 July. This indicates that the calibrated hydro-parameters based on one single event might favor the calibration event itself, while they are less suitable for reproducing hydrographs of events with markedly different rainfall characteristics. We therefore recalibrated the model against the observed streamflow of the 30 rainfall events of 1982 (recalibrated hydro-parameters hereafter) to investigate how much the model performance can be improved through the multiple events recalibration. In addition, the calibrated hydro-parameters can be better evaluated by comparing them with the recalibrated hydro-parameters in terms of model performance improvement.
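For reference, the NSE score quoted above can be computed from paired observed and simulated series as sketched below. The sketch assumes that NSE denotes the standard Nash–Sutcliffe efficiency; the function name and the toy series are hypothetical and are not data from this study.

```python
from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean.

    A value of 1 is a perfect fit; 0 means the simulation is no better than
    always predicting the observed mean; negative values are worse than that.
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_variability = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / obs_variability


# Hypothetical hydrograph ordinates, for illustration only.
toy_observed = [1.0, 4.0, 9.0, 5.0, 2.0]
toy_simulated = [1.5, 3.0, 7.0, 6.0, 2.5]
print(f"NSE = {nash_sutcliffe(toy_observed, toy_simulated):.2f}")
```

Because the squared errors are summed over all time steps, a few poorly simulated high-flow events can dominate the score, which is consistent with the underestimated heavy events lowering the annual NSE.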

The multiple events recalibration was conducted automatically with a 150-iteration run using the NCAR-developed calibration tool. The recalibrated hydrograph is shown against the observation in Figure 10c. The NSE value is 0.51, which is 1.19 times that obtained with the calibrated hydro-parameters. However, the multiple events recalibration consumed more than 21 times the computational hours of the single event calibration (6840 versus 320 computational hours) to achieve this improvement. In addition, streamflow during three rainfall events (19 April, 3 and 25 December) is underestimated with the calibrated hydro-parameters (Figure 10b) and remains underestimated after recalibration (Figure 10c). Streamflow during the rainfall event of 1 July is overestimated both before (Figure 10b) and after recalibration (Figure 10c). Meanwhile, simulated streamflow during the events of 27 August and 6 October changed from being underestimated with the single event calibration to being overestimated after recalibration. This implies that, for event-based simulation, it might not be practical to find one set of parameters that suits all events. Multiple events calibration can improve the model's performance to a certain degree, yet it requires a substantially higher computational cost than the single event calibration. In this regard, intensive calibration over a long time scale might not be an optimal strategy if computational cost is a major concern and model performance based on a single event calibration is acceptable.
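The trade-off described above follows directly from the four numbers quoted in the text. The short sketch below reproduces the two ratios; the variable names are illustrative only.

```python
# Values quoted in the text.
nse_single_event = 0.43     # NSE over 1982 with single event calibration
nse_multi_event = 0.51      # NSE over 1982 after multiple events recalibration
hours_single_event = 320    # computational hours, single event calibration
hours_multi_event = 6840    # computational hours, multiple events recalibration

nse_ratio = nse_multi_event / nse_single_event
cost_ratio = hours_multi_event / hours_single_event
extra_hours = hours_multi_event - hours_single_event

print(f"NSE ratio:   {nse_ratio:.2f}")    # 1.19
print(f"Cost ratio:  {cost_ratio:.1f}")   # 21.4
print(f"Extra hours: {extra_hours}")      # 6520
```

An NSE gain of 0.08 therefore comes at the price of 6520 additional computational hours.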

#### 4.2.2. Sediment Parameters

To evaluate the applicability of sediment parameters calibrated on one single event to other events, we applied them to simulate the sediment processes for the year 1982. With 20 processors, the simulation took 168 h to complete. Based on the available sediment observations, the simulated sediment yield is compared against the observed yield for 17 sediment events. The characteristics of the rainfall events, together with the simulated and observed sediment yields during those events, are shown in Table 7. Note that the sediment events of 3–4 June, 3–4 December, and 24–28 December include 3, 2, and 3 rainfall events, respectively, as the sedimentary processes are correlated during such rainfall events.

For all of the 17 sediment events simulated, the minimum and maximum ratios between the observed and the simulated sediment yield are 0.13 and 5.47, respectively (Table 7). This suggests that, with sediment parameters calibrated on one single event, the simulated sediment yield for other events can be expected to be roughly within the same order of magnitude as the measured one. Furthermore, for 8 out of 17 sediment events, the simulated sediment yield is within 50–150% of the measurements, which corresponds to at most 50% under- or over-estimation. For 12 out of 17 events, the simulated sediment yield is within 33–300% of the measured sediment yield, that is, within a factor of about three of the measurements, which is generally acceptable in sediment simulations. The coefficient of determination R² between the simulated and the measured sediment yield is 0.57, which also indicates acceptable model performance [43]. In addition, the simulated total sediment yield (228,698 t) of all the events is only 11% higher than that of the observation (203,387 t), which implies that it is also promising to use the model to estimate annual soil erosion on a watershed scale.
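The event-wise statistics used above (the number of events whose simulated yield falls within a given percentage band of the measurement, and the coefficient of determination) can be sketched as follows. The function names and the four-event dataset are hypothetical and do not reproduce Table 7. The coefficient of determination is computed here as the squared Pearson correlation, which is an assumption about how it was obtained in this study.

```python
from typing import Sequence


def count_within(
    simulated: Sequence[float],
    observed: Sequence[float],
    low: float,
    high: float,
) -> int:
    """Number of events whose simulated/observed ratio lies in [low, high]."""
    return sum(1 for s, o in zip(simulated, observed) if low <= s / o <= high)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical sediment yields in tonnes for four events (not from Table 7).
toy_observed = [100.0, 200.0, 400.0, 50.0]
toy_simulated = [120.0, 90.0, 1000.0, 60.0]

n_events = len(toy_observed)
n_50_150 = count_within(toy_simulated, toy_observed, 0.50, 1.50)
n_33_300 = count_within(toy_simulated, toy_observed, 0.33, 3.00)

print(f"{n_50_150} of {n_events} events within 50-150% of the measurement")
print(f"{n_33_300} of {n_events} events within 33-300% of the measurement")
print(f"R^2 = {r_squared(toy_observed, toy_simulated):.2f}")
```

The percentage bands are asymmetric on a linear scale: the 33–300% band allows the simulated yield to be up to three times larger or three times smaller than the measurement.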

In spite of the overall acceptable performance of the model in simulating sediment yield, substantial over- and under-estimation can be found for the events of 17 April, 25 May, 3–4 June, 11 August and 10–11 December. Considering the exponential relationship between overland runoff and sediment transport capacity, the bias of the simulated sediment yield can partly be attributed to the under- or over-estimation of the streamflow. Model bias can also be attributed to the absence of a channel and bank erosion algorithm in the current model. Because sediment yield may be sourced not only from upland erosion but also from channel and bank erosion, bias may arise from the model's failure to account for the sediment contribution of the channel bed and banks. In this regard, future development of the model should include bank and channel erosion to further improve the model performance.


**Table 7.** Rainfall intensity, duration, and return period, and simulated and observed sediment yield for 17 sediment events during 1982. The sediment events of 3–4 June, 3–4 December, and 24–28 December include 3, 2, and 3 rainfall events, respectively. A return period < 1 represents a normal rainfall event.
