#### *3.2. Validation of Runoff Discharge*

Figure 7a presents the event-based comparison between the measured runoff discharge and the runoff discharge predicted under the different cases. In most events, the simulated runoff discharge was comparable to the measured values. For the five events in the 1980s, the peak values of R1 were lower than those of O1. For the three events in the 2010s, the peak values ranked from lowest to highest as R3 < R1 < R2 < O1 (Figure 7a). This shows that the revised simulation with vegetation can reduce the runoff peak significantly compared with the simulation with terraces, and that the simulation considering both terraces and vegetation was closest to the measured value.

The simulated and measured runoff are compared in Figure 7b–e. Figure 7b shows the validation of O1, in which the five events in the 1980s achieved an average NSE of 0.46, while the three events in the 2010s achieved an average NSE of −15.29. The data points of the 1980s lie close to the 1:1 line, while those of the 2010s lie farther from it. This shows that the original model did not perform well for recent years, because the land surface had changed significantly. Figure 7c shows that the average NSE of both the 1980s and the 2010s increased in R1, with the 2010s showing the greater increase (reaching −1.32), and that the data points of the 2010s lie closer to the 1:1 line than in the O1 validation (Figure 7b). Figure 7d shows the R2 validation and indicates that the average NSE of the 2010s also increased relative to O1, although by less than in R1. Figure 7e shows the R3 validation and indicates that the average NSE of the 2010s is 0.39, the first positive value among the cases, and that the data points of the 2010s are closely distributed around the 1:1 line. Overall, the simulation accuracy for the 2010s is highest in R3, followed by R1. These results show that the revised model simulates runoff in recent years better than the original model and can reflect the effect of significant surface change (i.e., slope terracing and revegetation) on runoff.
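The NSE values quoted above can be read more easily with the standard Nash–Sutcliffe definition in hand: NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))². The following is a minimal sketch, not the authors' code. It assumes that NSE is computed per event and then averaged over the events of each period, which is how the text reports it. The function names and the hydrograph values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency of one event (1 = perfect fit)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    obs_mean = mean(observed)
    # Variance-like term: spread of the observations around their mean.
    denom = sum((o - obs_mean) ** 2 for o in observed)
    if denom == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    # Squared model error.
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - num / denom


def mean_event_nse(events):
    """Average the per-event NSE over a list of (observed, simulated) pairs."""
    return mean([nse(obs, sim) for obs, sim in events])


# Hypothetical hourly discharge values for a single event (illustration only).
obs = [1.0, 4.0, 9.0, 4.0, 1.0]
sim_overestimated = [2.0 * q for q in obs]  # peak twice as high as observed

print(nse(obs, obs))                # perfect simulation
print(nse(obs, sim_overestimated))  # negative: worse than using the mean
```

A negative NSE, such as the 2010s averages for O1, means the squared error of the simulation exceeds the variance of the observations, so the mean of the measured series would be a better predictor than the model for those events.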

**Figure 7.** Comparison of observed and simulated runoff for eight events in the Pianguanhe Basin. (**a**) Event rainfall-runoff simulation; (**b**) Runoff in O1 simulation; (**c**) Runoff in R1 simulation; (**d**) Runoff in R2 simulation; (**e**) Runoff in R3 simulation.

#### *3.3. Validation of Sediment Discharge*

The hourly measured sediment concentration (kg/m³) was combined with the measured runoff discharge to obtain the sediment discharge (t/h). Figure 8a shows the event-based comparison between the measured and predicted sediment discharge under the different cases. In most events, the simulated sediment discharge was in good agreement with the measured values. For recent years, the gap between O1 and the measured values is smaller for sediment discharge than for runoff discharge, because part of the sediment reduction is achieved in the sediment yield process through the C and P factors in MUSLE. For the five events in the 1980s, the peak values of R1 were lower than those of O1. For the three events in the 2010s, the peak values ranked from lowest to highest as R3 < R1 < R2 < O1 (Figure 8a). This shows that the revised simulation with vegetation can reduce the sediment peak significantly compared with the simulation with terraces, and that the simulation considering both terraces and vegetation was closest to the measured value.
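The conversion from sediment concentration to sediment discharge mentioned at the start of this subsection can be sketched as follows. The unit of the runoff discharge is not stated here, so the sketch assumes m³/s. Under that assumption, concentration (kg/m³) times discharge (m³/s) gives kg/s, and multiplying by 3600 s/h and dividing by 1000 kg/t gives t/h. The function names are hypothetical.

```python
def sediment_discharge_t_per_h(concentration_kg_m3, discharge_m3_s):
    """Convert concentration (kg/m^3) and discharge (m^3/s, assumed) to t/h."""
    kg_per_s = concentration_kg_m3 * discharge_m3_s
    # 3600 seconds per hour, 1000 kilograms per tonne.
    return kg_per_s * 3600.0 / 1000.0


def sediment_series_t_per_h(concentrations_kg_m3, discharges_m3_s):
    """Apply the conversion to paired hourly series of equal length."""
    if len(concentrations_kg_m3) != len(discharges_m3_s):
        raise ValueError("series must have the same length")
    return [
        sediment_discharge_t_per_h(c, q)
        for c, q in zip(concentrations_kg_m3, discharges_m3_s)
    ]


# Illustrative values only: 100 kg/m^3 at 10 m^3/s.
print(sediment_discharge_t_per_h(100.0, 10.0))
```

If the discharge were recorded in another unit, only the constant factor would change.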

**Figure 8.** Comparison of observed and simulated sediment discharge for eight events in the Pianguanhe Basin. (**a**) Event rainfall-sediment simulation; (**b**) Sediment in O1 simulation; (**c**) Sediment in R1 simulation; (**d**) Sediment in R2 simulation; (**e**) Sediment in R3 simulation.

NSE was also used to evaluate the simulated sediment discharge, as shown in Figure 8b–e. Figure 8b shows the O1 validation, in which the five events in the 1980s achieved an average NSE of 0.08, while the three events in the 2010s achieved an average NSE of −2.47. This shows that the original model did not perform well in recent years. Figure 8c shows the R1 validation, in which the average NSE of both the 1980s and the 2010s increased, and the data points of both periods lie closer to the 1:1 line than in Figure 8b. Figure 8d shows the R2 validation, in which the average NSE of the 2010s also increased and is greater than that of R1, which indicates that terraces can significantly reduce sediment. Figure 8e shows the R3 validation, in which the average NSE of the 2010s is 0.37 and the data points of the 2010s are closely distributed around the 1:1 line. Overall, the simulation accuracy for the 2010s is highest in R3, followed by R2. These results show that the revised model performs better in recent years and can reflect the effect of significant surface change (i.e., slope terracing and revegetation) on sediment yield.
