2.4.4. Evaluation Criteria


#### **3. Results**

## *3.1. Parameter Sensitivity*

The sensitivity analysis results of seven selected parameters under different scenarios (see scenario code in Table 2) are listed in Table 3, and the ranking (1 to 7) is based on the absolute value of the *t*-stat for each approach. CH\_K2 was the most sensitive parameter in this study with the largest *t*-stat for most of the scenarios. SOL\_AWC ranked last among the seven parameters in this sensitivity analysis. In the initial iteration, all parameters showed relatively higher sensitivity index. As the iteration increased, however, the parameter range narrowed and the sensitivity index declined. This showed that the parameter sensitivity was dependent on the allowable range of a parameter, and could be lower when it approached the optimal value.


**Table 3.** Ranks of parameter sensitivity in different approaches.

#### *3.2. Model Performance*

Model performance in simulating streamflow and ET was evaluated in this study. Table 4 shows the model complexity of the calibration procedure, NSE, and uncertainty indexes of the six approaches involved in our study. The impacts of different approaches on model efficiency were further evaluated in terms of implementation complexity. There was an upward trend of calibration complexity from No.1 to No.6 due to the manual parameter range adjustments after each SUFI-2 iteration. NSE of single-objective (streamflow only) calibration (NSE = 0.86) was larger than that of bi-objective

calibration (streamflow plus ET, NSE = 0.82), indicating the SWAT model showed different abilities for the simulation of streamflow and ET.


**Table 4.** Complexity, model performance, and uncertainty of different scenarios.

Note: each "\*" indicates manually adjusting the parameter ranges once.

Figures 3–5 graphically compare the simulated monthly streamflow and basin-average ET against those observed from gaging stations and remote sensing technology during the 13 year (1998–2010) calibration and 5 year (2011–2015) validation periods.

#### 3.2.1. Streamflow

Figures 3 and 4 illustrate the optimal simulation of monthly streamflow at stations at Zhaoping (July 2007–October 2011) and Jingnan (January 1998–October 2011, excluding December 2013–June 2007 due to data non-availability). Throughout the simulation period, the average monthly streamflows at Zhaoping and Jingnan stations were 367.5 and 485.2 m<sup>3</sup> /s, respectively. Overall, the streamflow at Zhaoping station (375.4–397.4 m<sup>3</sup> /s) was overestimated by 2% to 8%, whereas the streamflow at Jingnan station (423.7–449.8 m<sup>3</sup> /s) was underestimated by 7% to 13%. Though values in winter (December to February) at both stations showed better consistency with simulations than peaks. A large bias mostly appeared in peak flows, among which the most dramatic overestimation and underestimation were found in 2011 (about 500% overestimation) and 2013 (about 40% underestimation). Compared to the calibrations, NSE and R<sup>2</sup> decreased significantly in validations (see Figures 3 and 4) due to the substantial overestimation of peak flow (in 2011) mentioned above.

Different approaches (i.e., No.1 to No.6 as listed in Table 2) had little impact on simulation accuracy, with very close NSE values under the same calibration category (single-objective or bi-objective calibration) (Figures 3 and 4). Similarly, additional constraints had a slight impact on NSE. At Zhaoping, for example, the difference in NSE and R<sup>2</sup> was approximately 0.01 to 0.03 between single-objective and bio-objective calibrations, and was less than 0.01 at Jingnan station.

## 3.2.2. ET

In this study, monthly MODIS ET data was corrected using a simple water balance model (i.e., ET = Precipitation — Water yield — 4Soil water storage [61]) before being used as an extra constraint for model calibration. 4Soil was set as zero because the changes in soil storage are of smaller magnitude as those of other components, and were thus considered negligible over a multi-annual period. We found there was a substantial bias—20% higher during August to December—compared to the amount derived from the long-term water balance, and thus we corrected ET data accordingly before using it for model calibration. Similar to the results for streamflow, there was little difference in the performance of ET simulation between different approaches (Figure 5). However, the additional constraints improved the modeling accuracy (an increase of 0.1 in NSE). NSE of single-objective (streamflow only) calibration was about 0.7 (the left column of Figure 5) and 0.8 for bi-objective

(streamflow plus ET) calibration (the right column of Figure 5). In addition, there was no significant bias during the simulation with all PBIAS values being less than 3% for single-objective calibration and less than 8% for bi-objective calibration. The overestimation mostly occurred in June and July for both single-objective and bi-objective calibration, whereas the differences for the latter were slightly little lower than those of the former (i.e., single-objective calibration), indicating the additional constraints of ET decreased the bias of ET's peak value simulation and improved the simulation accuracy in terms of ET without compromising streamflow simulation. Overall, model performance in ET simulation (NSE < 0.8) was not as good as streamflow (NSE ≈ 0.85), but it is still satisfactory, particularly considering the uncertainty of remotely sensed ET [62].

In addition, we analyzed the tendency of NSE and compared the proportion of behavioral simulations (PBS) with the increase in simulation times (see Figure 6). With the exception of the random distribution from 0.15 to 0.85 of No.1, NSE of other approaches (as listed in Table 4) exhibited an obvious upward tendency because of the iteration and optimization throughout the simulations, and finally stood at about 85%. In terms of PBS, we mark it at each step of different approaches on the bottom of each plot with alternate background colors. Overall, PBS had a tendency similar to that of NSE in all approaches. In approaches No.4 to No.6, PBS increased in each step, among which the most significant increments occurred after the first two manual adjustments of the parameter ranges, and the increase in the last step was relatively small in approaches No.5 and No.6. The PBS of R (streamflow calibration only) decreased from 96.8% to 93.8% in the fourth step of approach No.5, and all other PBS in approaches No.4 and No.5 reached 100%, suggesting 2000 model runs was enough for the calibration. In addition, it is worth noting that the PBS of bi-objective calibration increased more rapidly than that of single-objective calibration. Prior to the parameter optimization and range adjustment (i.e., approach No.1 and the first step of approaches No.3 to No.6), PBS of R (about 16%) was about twice that of RE (less than 8%), whereas the gap narrowed with the reduction of the parameter range, and finally the latter became higher than the former. This indicated bi-objective calibration was more efficient for accuracy improvement than single-objective calibration.

#### *3.3. Modeling Uncertainty Analysis*

Figures 3–5 provide a visual representation of the prediction uncertainty through the width of 95PPU. The uncertainty band narrows from No.1 to No.6, indicating the combined approaches had a positive influence on modeling uncertainty. In addition, from *R*-factor and *P*-factor listed in Table 4, we found the uncertainty bands of No.1 were the widest (*R*-factor ≈ 0.9, *P*-factor ≈ 0.8) among all approaches, followed by No.2, 3, 4, and 6, and the bands of No.5 were the narrowest (*R*-factor ≈ 0.1, *P*-factor ≈ 0.4). This suggests approach No.5 (SUFI-2 500\*3 + PSO 500) has the lowest modeling uncertainty.

Scatter-box plots (Figure 7) show the distributions and tendencies of the seven parameters. Clearly the parameter distributions were contingent on the running times. The parameter values using approach No.1 were randomly distributed in the original ranges, while parameter values using No.2 to 6 were clustered with the running of the model. However, it is clear that the tendencies of CN2, ESCO, and SURLAG were distinct between single-objective and bi-objective calibrations. Moreover, parameters might be outside of the threshold set initially with the iterations.

**Figure 3.** Calibration and validation of SWAT in simulating streamflow at Zhaoping gaging station using observed streamflow data only (the left panel) and observed streamflow plus remotely-sensed ET (the right panel). The definition of scenario codes (e.g., 1-R, 1-RE, etc.) marked on the upper left corner of each panel can be referenced to Table 2. (**a**–**l**) were the codes of 12 scenarios. **Figure 3.** Calibration and validation of SWAT in simulating streamflow at Zhaoping gaging station using observed streamflow data only (the left panel) and observed streamflow plus remotely-sensed ET (the right panel). The definition of scenario codes (e.g., 1-R, 1-RE, etc.) marked on the upper left corner of each panel can be referenced to Table 2. (**a**–**l**) were the codes of 12 scenarios.

Remote Sens. *2020,* 12*, x FOR PEER REVIEW 11 of 26*

**Figure 4.** Calibration and validation of SWAT in simulating streamflow at Jingnan gaging station using observed streamflow data only (the left panel) and observed streamflow plus remotely-sensed ET (the right panel). The definition of scenario codes (e.g., 1-R, 1-RE, etc.) marked on the upper left corner of each panel can be referenced to Table 2. (**a**–**l**) were the codes of 12 scenarios. **Figure 4.** Calibration and validation of SWAT in simulating streamflow at Jingnan gaging station using observed streamflow data only (the left panel) and observed streamflow plus remotely-sensed ET (the right panel). The definition of scenario codes (e.g., 1-R, 1-RE, etc.) marked on the upper left corner of each panel can be referenced to Table 2. (**a**–**l**) were the codes of 12 scenarios.

*Remote Sens.* **2020**, *12*, 4069

Remote Sens. *2020,* 12*, x FOR PEER REVIEW 12 of 26*

Remote Sens. *2020,* 12*, x FOR PEER REVIEW 13 of 26*

**Figure 5.** Model performance in simulating the spatially averaged evapotranspiration using different methods. The left panel was the simulated ET using the best parameter sets of single-variable (streamflow) calibration. The right was the output of multi-variable (streamflow plus ET) calibration. The definition of scenario codes (e.g., 1-R, 1-RE, etc.) marked on the upper left corner of each panel can be referenced to Table 2. (**a**–**l**) were the codes of 12 scenarios. **Figure 5.** Model performance in simulating the spatially averaged evapotranspiration using different methods. The left panel was the simulated ET using the best parameter sets of single-variable (streamflow) calibration. The right was the output of multi-variable (streamflow plus ET) calibration. The definition of scenario codes (e.g., 1-R, 1-RE, etc.) marked on the upper left corner of each panel can be referenced to Table 2. (**a**–**l**) were the codes of 12 scenarios.

Remote Sens. *2020,* 12*, x FOR PEER REVIEW 16 of 26*

**Figure 6.** Dot-plots of NSE against different methods and running times under different approaches (i.e., approaches No.1 to 6 as listed in Table 2). Values denoted as PR or PRE in each panel indicate the percentage of the NSE > 0.8. X axis means the running time, and Y axis indicates the NSE value. The red dots represent use of observed streamflow only, and teal dots represent use of observed streamflow and remotely-sensed ET. (**a**–**f**) were the codes of 6 approaches. **Figure 6.** Dot-plots of NSE against different methods and running times under different approaches (i.e., approaches No.1 to 6 as listed in Table 2). Values denoted as PR or PRE in each panel indicate the percentage of the NSE > 0.8. X axis means the running time, and Y axis indicates the NSE value. The red dots represent use of observed streamflow only, and teal dots represent use of observed streamflow and remotely-sensed ET. (**a**–**f**) were the codes of 6 approaches.

Remote Sens. *2020,* 12*, x FOR PEER REVIEW 18 of 26*

**Figure 7.** Uncertainty ranges of calibrated parameters using different methods and running times during the calibration period. The back line in each panel indicates the initial range of the parameters (listed in Table 1). The definition of scenario codes (e.g., 1-R, 1-RE, etc.) on X axis can be referenced to Table 2. **Figure 7.** Uncertainty ranges of calibrated parameters using different methods and running times during the calibration period. The back line in each panel indicates the initial range of the parameters (listed in Table 1). The definition of scenario codes (e.g., 1-R, 1-RE, etc.) on X axis can be referenced to Table 2.

#### **4. Discussion**

#### *4.1. Approaches for Model Calibration and Uncertainty Analysis*

#### 4.1.1. Effects on Simulations

We compared different combinations (the approaches listed in Table 2), which showed multifaceted impacts, such as complexity of calibration procedure, performance, and uncertainty, on the modeling. However, approaches with more SUFI-2 iterations were more complex to implement because of the manual manipulation of the parameter range. In terms of NSE, all approaches performed well, particularly during the calibration of streamflow at two sites. However, the different approaches had little impact on NSE, for which the difference was less than 0.02 at the same gaging station. Similar results were reported in previous studies [18,56]. In addition, Figure 6 shows NSE of scenarios 2–5-R (all used PSO) were bi-modal at the end of the calibration cycle with values above and below NSE = 0.8. The PSO method sets the particles randomly before the optimization begins, followed by iterations with pre-set times. Thus, it is possible that some particles may result in relatively poor simulation even after several rounds of iteration.

To evaluate the modeling uncertainty of different approaches, we present Figures 3–5 to show the change in uncertainty, and also provide the quantitative evaluation using *R*- and *P*-factors in Table 4. Approach No.1 used SUFI-2 with only 2000 model runs, and had the widest 95PPU due to the random prior distribution of parameter sets generated by LHS [18,63], a stratified sampling method. Approach No.2 showed relatively low uncertainty because PSO is an adaptive optimization algorithm, in which particles aggregate gradually around the supposed optimum. Although manual adjustment increased the complexity when combining SUFI-2 and PSO in approaches No.3 to 5, it reduced the uncertainty because the ranges of parameters were narrower. Approach No.6 also exhibited a substantial reduction of uncertainty because it iterated four times and reduced parameter ranges (each time with 500 model runs).

#### 4.1.2. Effects on Parameter Ranges

Parameter boundaries can be divided into two categories: empirical and compulsory boundaries. The former was set according to previous experience, while the latter was considered to be the maximum allowable range of a specific parameter in all cases. Thus, a parameter, which is beyond the empirical boundary but still within the compulsory boundary, may be physically rational. In this study, parameter may overflow (beyond the defined boundary) when using approaches No.2 to No.5 with the PSO algorithm (see Figure 7 and Table 5). This is because PSO is a statistical algorithm that does not consider the physical meaning of the parameters [64], and the iteration of each step was automatic without user's manipulation. In particular, in scenario 2-RE (PSO 2000 with bi-objective), SURLAG initially ranged from 0 to 4 (a compulsory boundary of zero), but a final value of −0.3 (the optimal value) was derived, which represents a typical parameter overflow using PSO. Therefore, optimization without considering the rationality of parameter may lead to incorrect results.

In contrast to PSO, SUFI-2 can overcome the problem of boundary overflow because each iteration allows users to change parameter ranges to ensure the physical suitability for the next iteration [52]. Nonetheless, parameter overflow can still occur. In approach No.6 (SUFI-2 500 × 4, see Figure 7), for example, ALPHA\_BF was set to be beyond the empirical upper boundary but within the compulsory boundary in our manual adjustment to ensure better model performance. This indicates that updating the parameter ranges can be flexible in the SUFI-2 algorithm.



**Table 5.** Final optimal parameter values and range.

The above discussion, which focuses on effects of different approaches to simulation and parameter ranges, suggests PSO was more efficient for parameter optimization and uncertainty reduction, but SUFI-2 can ensure the rationality of parameter selection because it allows users to adjust and control the parameter boundary. Therefore, this study provided a new method (a combined approach) for efficient parameter optimization and uncertainty analysis, which can be valuable for other study areas.

#### *4.2. Single-Objective and Bi-Objective Model Calibration and Uncertainty Analysis*

One parameter tended to be sensitive for some specific hydrological components, but showed non-sensitivity for others [65]. Hence, introducing more objectives (hydrological components) to the calibration can be used to consider the influences of more parameters that may not initially be sensitive, and thus result in parameters that are more suitable [30,66].

In this study, we attempted to quantitatively analyze the consequences (e.g., parameter distribution) of introducing an additional calibration objective (remotely-sensed ET). Simulation results showed that some parameters (i.e., CN2, ESCO, and SURLAG) when using single-objective and bi-objective calibration showed different trends with the iteration (see Figure 7). These parameters were all related to water balance processes, including ESCO, which controls soil evaporative demand. There was a substantial increase in ESCO when adding ET as an additional constraint (optimum value increased from around 0.03 to 0.66). Clearly, the parameter value derived from the additional constraint of ET can be more accurate or close to its true value, with NSE increasing from 0.7 to 0.78, but PBIAS increased slightly using bi-objective calibration (see Figure 5). Moreover, CN2 controls the surface streamflow; the decrease in CN2 (from 0.05 to 0.004) indicated the decrease in surface streamflow and increase in infiltration potential [25]. The simulation results showed a decrease in surface streamflow (from 563.6 to 539.2 mm), and increases in subsurface lateral flow (from 102.1 to 116.8mm) and groundwater flow (from 130.3 to 201.2 mm). However, because of the lack of observed soil moisture and groundwater storage, such processes cannot be calibrated in this study, which could be a suitable subject for future study.

As mentioned in the discussion of results in Section 3.2, the PBS growth rate of RE was much faster than that of R via iteration. One explanation for this phenomenon is that the iteration can consider more processes, leading to simultaneous optimization of more sensitive parameters and reducing the amount of trial-and-error. This deserves further investigation.

Determining whether the value of a parameter can reflect real hydrological processes is a significant challenge for researchers and practitioners to judge, particularly when a model is calibrated by a single constraint (e.g., streamflow). Introducing more constraints (e.g., ET, soil water) into the model calibration (i.e., multi-objective calibration) may compromise simulation performance of a specific component (e.g., streamflow), but multiple variables can help constrain the model behavior and thus improve its reliability and appropriateness. Therefore, we believe a reasonable representation and simulation of hydrological processes is more significant than a so-called "accurate" simulation with a high rating numeric value. Thus, we developed this study to investigate the effects of introducing one more constraint (i.e., ET) on model performance and modeling uncertainty. Nevertheless, we acknowledge that two variables can constrain a small portion of the complex hydrological system, and the modeling uncertainties of other specific hydrological processes may still be high. It is worth investigating the effects of involving more constraints in model calibration, and caution should be used, particularly considering the limitations of the observations we adopted.

In addition, we acknowledge that there were also some limitations in this study: (i) NES was the only objective function for parameter optimization, although it has been verified that different optimal parameter sets may be obtained by different objective functions [19,67]. Investigation of the potential effects of different objective functions on model calibration and uncertainty reduction can be a suitable subject for future studies. (ii) The case study was implemented in the Guijiang River Basin using SWAT, however, the combined approach recommended in this study deserves further evaluation in other regions and using other models in future. (iii) Significant bias and uncertainties in ET simulation remained during high value periods, although the simulation accuracy of ET improved by adding additional constraints. The over/underestimations that result from inaccurate observations or unsuitable simulation could be an issue that deserves further investigation. (iv) The approach we proposed in this study (i.e., combined optimization scheme and multi-objective calibration using streamflow plus ET) could help improve modeling accuracy and reduce modeling uncertainty in the simulation of the water cycle. The combined optimization scheme may be a good reference and help in the calibration of similar models. However, using remotely sensed ET as an additional constraint may not help in sediment and nutrient simulations considering the weak relationship between ET and sediment/nutrient transport.

#### **5. Conclusions**

This study, for the first time, employed SUFI-2 and PSO in six different ways (i.e., singlealgorithm approaches and combined-algorithm approaches) for efficient parameter calibration, using SWAT as an example. We also investigated the impacts of single-objective (streamflow only) and bi-objective (streamflow plus remotely sensed ET) model calibration on uncertainty and simulation accuracy. Our results showed that combined-algorithm approaches can well control the physical rationality of a parameter, whereas manual involvement resulted in complex calibration procedures. From the width of the 95PPU bands, we found that the combined-algorithm approaches can help reduce uncertainty due to the reduction of parameter ranges and the aggregation of parameters. After several rounds of reducing the parameter ranges, the effect of uncertainty was also reduced. Although little difference in streamflow simulation accuracy exists between these two approaches, the latter approach can substantially improve the simulation of ET, indicating that parameter ranges constrained by an additional variable can help improve the accuracy of the whole hydrological cycle. In addition, accuracy improvement was more efficient when there was an additional calibration objective.

In brief, the combined approaches identified in this study and the effect of an additional constraint on model calibration and uncertainty reduction can be useful for the hydrological modeling community. Nevertheless, considering the many processes involved in the hydrological cycle in the real world, the effect of introducing additional variables (i.e., three or more objectives) on the improvement of model performance and reduction of uncertainty deserves further investigation in future studies.

**Author Contributions:** J.H. and Y.W. designed and performed the study, interpreted the results, and wrote the paper. F.Z., X.L., P.S., S.K.S., W.L., L.Q., and J.L. contributed to data collection, results presentation, and draft revision. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was funded by the National Key Research and Development Program of China (2019YFC0507403), the National Science Foundation of China (31961143011), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB40000000), the project of Hydrological Modeling of a Typical Watershed in Southern China (WR1003A102016D1701); the Key laboratory of Degraded and Unused Land Consolidation Engineering of Natural Resources Ministry of China (SXDJ2019-5), the National Thousand Youth Talent Program of China, Shaanxi Key Research and Development Program of China (2018ZDXM-GY-030), Young Talent Support Plan of Xi'an Jiaotong University. We also thank the HPCC Platform in Xi'an Jiaotong University for computing equipment and computer maintenance.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Measurements for Model Performance Evaluation**

#### *Appendix A.1. NSE*

The Nash–Sutcliffe coefficient (NSE) was chosen as the objective function in this study. A NSE close to 1 indicated the model was of high accuracy.

$$NSE = 1 - \frac{\sum\_{1}^{K} \left(Q\_{O} - Q\_{S}\right)^{2}}{\sum\_{1}^{K} \left(Q\_{O} - \overline{Q\_{O}}\right)^{2}}\tag{A1}$$

where *Q* is a hydrological variable, *K* is the number of observations, and the subscripts *O* and *S* indicate observation and simulation, respectively.
