1. Introduction
Hydrological systems are often complex owing to high temporal variability and diverse topography, land use, and anthropogenic conditions in the catchments. Hydrological models have become indispensable for understanding these complex human–ecosystem interactions and investigating the effects of human activities on watershed systems [1]. Over the past few decades, intensive efforts have been made to develop process-based catchment models operating on different temporal and spatial scales. Such models include, for example, the Topographic Hydrological Model (TOPMODEL) [2], the Système Hydrologique Européen (MIKE-SHE) [3], and the Soil and Water Assessment Tool (SWAT) [4]. Various process-based models have been extensively applied worldwide to improve the understanding of hydrological processes and provide scientifically credible solutions. Among them, SWAT has gained wide popularity and was chosen in this study due to its open-access nature, compatibility with geospatial tools, spatial and temporal flexibility, and incorporation of optimization algorithms [1,5].
SWAT requires detailed information on soil, land use, topography, and weather to set up and execute the model and to interpret its results [6]. Such spatial datasets need to be of high quality and reliable to produce trustworthy model responses. However, many required soil properties, such as hydraulic conductivity or bulk density, cannot be measured everywhere and are therefore modeled or otherwise derived to create spatial coverage. Consequently, these data are often subject to various levels of error associated with data sources, resolution, interpolation, and resampling techniques [6,7]. Such errors, combined with an inaccurate model structure, can lead to uncertainties in the modeling outputs [8,9]. Model uncertainty analysis plays a key role in identifying the magnitude and sources of errors and enables better-informed decision-making [8,10]. Failure to understand and interpret the effects of these uncertainties on model performance may result in model outputs that cannot consistently represent the observations.
In recent years, the sensitivity of the SWAT model to spatial input data has attracted the attention of researchers [6,8,11,12,13]. Previous studies show somewhat contradictory results, and there is no clear pattern indicating that high-resolution and local data outperform low-resolution data. For example, Camargos et al. [6] evaluated the effect of the spatial resolution of the input data on river discharge simulation and found that regional land use data reduced the bias of discharge simulation by 50%, while global soil data performed better than regional soil data. In contrast, Geza and McCray [14] evaluated the performance of SWAT with two U.S. soil datasets (i.e., high-resolution SSURGO and low-resolution STATSGO) and reported better model performance with SSURGO. Al-Khafaji et al. [12] investigated the effect of DEM and land use data quality on the accuracy of SWAT model predictions and reported that high-resolution datasets did not provide better predictive reliability. Similar results were obtained by Asante et al. [11], who evaluated the impact of land use data quality on the predictive capacity of the SWAT model and found slightly better performance with low-resolution land use data. Chaplot [15] also found that land use quality had only a small impact on SWAT model results, while soil data with lower resolution greatly degraded the prediction accuracy. In general, the existing studies on the effect of spatial input data resolution (especially soil and land use) on SWAT estimates have yielded contradictory conclusions. Such contradictions mainly arise from variations in the environmental characteristics of the investigated watersheds [15]. It is therefore essential to evaluate the sensitivity of the SWAT model to the accuracy of these datasets in catchments with differing physiographic conditions.
Moreover, previous studies have focused on evaluating the effect of input data uncertainty on SWAT model predictions using low-resolution global or regional datasets, while no attention has been given to assessing the effect of high-resolution local datasets. In this paper, we examine the effect of high-resolution local soil and land use data on the predictive capacity of the SWAT model in the Porijõgi catchment of Estonia. We hypothesized that local datasets provide more detailed information and therefore yield a narrower range of parameter uncertainty and better simulation performance than global or regional lower-resolution datasets.
4. Discussion
In this study, we compared low- to medium-resolution datasets of land use (CORINE) and soil (HWSD) with local high-resolution datasets (ETAK, EstSoil-EH) in all four cross-wise combinations to answer the question of whether local datasets yield better simulation performance and can reduce parameter uncertainty when predicting streamflow with SWAT. The results indicate a mixed response. The models with the high-resolution local soils performed worse; however, among models with the same soil, those with the high-resolution Estonian land use data performed marginally better. The soil datasets had the stronger impact on model uncertainty.
Overall, all models captured low flows better than peak flows, many of which were underestimated (positive PBIAS). This might be due to the highly mosaic nature of the catchment and the abundance of floodplains with alluvial soils in the lower catchment, which delay the flow peak. The values obtained for the P-factor and R-factor (Table 4) indicate low uncertainties for all individual models. Higher uncertainties are observed only for validation, which is expected. All models exhibit a decline in NSE of 0.2 to 0.3 in the validation period; the best model, ELHS (Estonian land use, HWSD soil), also has the best scores during validation (cf. Figure 3). This decline could indicate over-fitting during calibration. One reason for the lower validation scores could be that the three rain gauges do not fully capture the variability of rainfall within the catchment, which increases the chance of over-fitting in the calibration period. However, the overall ranking of the models remains broadly comparable. Among the models with the same soil, those with the high-resolution Estonian land use data had a smaller decline in NSE during the validation period, indicating that the land use parameters were more reliable. Modeled and reanalyzed rainfall data are available [13], but we decided to refrain from introducing additional large-scale data.
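The performance and uncertainty measures discussed above can be made concrete with a minimal sketch. It assumes the conventions commonly used in SWAT calibration: NSE as the Nash-Sutcliffe efficiency, PBIAS defined so that positive values mean underestimation, the P-factor as the share of observations inside the prediction uncertainty band, and the R-factor as the mean band width divided by the standard deviation of the observations. The function names are illustrative, the use of the sample standard deviation is an assumption, and none of this is taken from the study's own code.

```python
from statistics import mean, stdev


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    obs_mean = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - residual / variance


def pbias(obs, sim):
    """Percent bias; with this sign convention, positive means the simulation underestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def p_factor(obs, lower, upper):
    """Fraction of observations that fall inside the uncertainty band [lower, upper]."""
    inside = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return inside / len(obs)


def r_factor(obs, lower, upper):
    """Mean width of the uncertainty band relative to the spread of the observations."""
    mean_width = mean(up - lo for lo, up in zip(lower, upper))
    return mean_width / stdev(obs)  # assumption: sample standard deviation


# Hypothetical daily flows: the simulation misses the peak on day 3.
observed = [1.0, 2.0, 6.0, 3.0]
simulated = [1.0, 2.0, 4.0, 3.0]
band_lower = [s - 0.5 for s in simulated]
band_upper = [s + 0.5 for s in simulated]

print("NSE:", nse(observed, simulated))
print("PBIAS (%):", pbias(observed, simulated))
print("P-factor:", p_factor(observed, band_lower, band_upper))
print("R-factor:", r_factor(observed, band_lower, band_upper))
```

In this toy series the underestimated peak alone produces a positive PBIAS and leaves one observation outside the band, which mirrors the pattern described for the peak flows.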
Table 5 shows the main differences in parameter values between HWSD and EstSoil-EH. EstSoil-EH shows a considerably larger share of very sandy soils in the catchment. This might be one of the reasons for the large differences in streamflow performance between the models with EstSoil-EH and those with HWSD. The high sand content in EstSoil-EH might in turn have led to the much lower curve numbers for CLES and ELES and to the differing hydrologic soil group configurations. As can be seen, the EstSoil-EH models have large areas with soil hydrologic groups “A” and “C”, whereas HWSD only shows type “D”. The soil hydrologic group is a parameter that is not used by the SWAT model during simulation, but it is used by the ArcSWAT and QSWAT packages that create the initial SWAT model files from the spatial and tabular input data. The SWAT documentation describes four hydrologic soil groups, from “A” to “D”, which are based partly on infiltration rates and soil textures [28]. However, the designation is subjective and includes guidelines to acknowledge the existence of impermeable layers such as clay horizons, shrink-swell potential, and depth to bedrock. For EstSoil-EH, the labeling of the soil hydrologic groups seems to be based mostly on the textures and fine earth fractions, and this might have caused the overestimation of sandy soils in the catchment.
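To illustrate why lower curve numbers matter for simulated streamflow, the following minimal sketch evaluates the standard SCS curve number runoff equation for a single rainfall event. It assumes the common initial abstraction of 0.2 times the retention parameter and ignores the daily soil-moisture adjustment of the curve number that SWAT applies. The two curve numbers are hypothetical stand-ins for a sandy (group “A”-like) and a poorly draining (group “D”-like) soil and are not values from Table 5.

```python
def scs_runoff_mm(precip_mm, cn):
    """Surface runoff (mm) for one event from the SCS curve number equation."""
    retention = 25400.0 / cn - 254.0          # retention parameter S in mm
    initial_abstraction = 0.2 * retention     # assumption: Ia = 0.2 * S
    if precip_mm <= initial_abstraction:
        return 0.0                            # all rainfall is abstracted, no runoff
    effective = precip_mm - initial_abstraction
    return effective ** 2 / (effective + retention)


# Hypothetical curve numbers, for illustration only.
CN_SANDY = 45.0   # stands in for a well-draining, group "A"-like soil
CN_CLAYEY = 80.0  # stands in for a poorly draining, group "D"-like soil

for label, cn in (("sandy", CN_SANDY), ("clayey", CN_CLAYEY)):
    print(label, "soil runoff for a 40 mm event:", scs_runoff_mm(40.0, cn), "mm")
```

Under these assumptions the sandy soil produces no surface runoff for the 40 mm event, while the clayey soil does, which is the kind of difference that calibration then has to compensate for through other parameters.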
The low SOL_BD values additionally decrease the runoff potential, which has to be compensated for during calibration. The authors of EstSoil-EH describe that SOL_BD was derived from the soil organic carbon content (SOL_CBN) with an inversely proportional pedotransfer function [23]. The low SOL_BD values may therefore be a consequence of the higher soil organic carbon values in EstSoil-EH. However, the large areas of histosols and peatlands in the catchment naturally have high carbon contents.
There are also differences in land use distribution: ETAK has 38% agriculture and 45.8% forest, whereas CORINE labels 49.8% agriculture and 37.6% forest, almost an inverted relationship with a shift of roughly 10 percentage points. Furthermore, ETAK has more pasture (8.7%) and correctly indicates the existence of wetlands (3.2%), whereas CORINE labels 6.8% of the catchment with a range of shrublands and grasslands and less pasture (4.9%). However, the differences between ETAK and CORINE regarding forests and general vegetation patterns are rather minor. With CANMX being a sensitive parameter, we attribute ETAK’s better performance to the larger fraction of forest relative to agricultural areas. The larger forest areas in ETAK tend to retain water and reduce runoff, whereas the larger agricultural areas in CORINE tend to allow increased surface runoff. The larger forest areas in ETAK can also store more water in the canopy, which is visible as a tendency in the parameter distributions for CANMX (Figure A6 and Figure 4).
Lastly, we want to reflect on the methodology. SWAT sensitivity analysis can be performed locally or globally. Local sensitivity analysis changes parameter values one at a time, whereas global sensitivity analysis changes all parameter values simultaneously. The problem with the one-at-a-time analysis is that the sensitivity of a parameter often depends on the values of the other parameters, and it is unknown whether the parameters held fixed have their best values. The strength of global sensitivity analysis, on the other hand, is its much more robust depiction of model uncertainty, because it comprehensively accounts for parameter interactions. Its disadvantage is the high number of simulations needed, which can become computationally expensive. We extended the global sensitivity and uncertainty analysis approach to account not only for multiple parameters but also for several models, which are at least supposed to share the same modeling domain (the same catchment and the same spatial and observed data to fit and force). By developing the common parameter list from the individual model calibrations and applying it in the joint analysis in step 3, we can assess the tendencies of preferential parameter values by extracting only the top 5% performing simulations. Conversely, looking only at the individual models’ parameter ranges does not yield additional information: Figure A3 and Figure A5 in the appendix show the results of this Mann–Whitney-U analysis for the initial calibration and validation steps of the models. Almost all parameter distributions differ from each other, which understandably is rooted in their individual initial calibration and optimization. One possible future direction to improve the understanding of the effect of the input data on hydrological models is to use a spatially explicit distributed hydrologic model, e.g., the mesoscale Hydrologic Model mHM [35], which would also enable the use of spatial metrics, such as SPAEF [36], to evaluate the results.
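The joint analysis described above, extracting the top 5% of simulations and comparing parameter distributions between models with a Mann–Whitney U test, can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow used in the study: the run records, the toy scoring function, and the parameter optima are synthetic, and the test uses a plain normal approximation without tie or continuity correction. A library routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math
import random


def top_fraction(runs, fraction=0.05, score_key="nse"):
    """Return the best-scoring share of runs (higher score is better)."""
    ranked = sorted(runs, key=lambda run: run[score_key], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    Returns (U statistic of x, approximate p-value).
    """
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    return u_x, math.erfc(abs(z) / math.sqrt(2.0))


def synthetic_runs(optimum, n, seed):
    """Hypothetical runs: CANMX sampled uniformly, score peaks near `optimum`."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n):
        canmx = rng.uniform(0.0, 10.0)
        score = 0.8 - 0.01 * (canmx - optimum) ** 2 + rng.gauss(0.0, 0.02)
        runs.append({"CANMX": canmx, "nse": score})
    return runs


# Two hypothetical models sharing one parameter range but preferring different values.
best_a = [run["CANMX"] for run in top_fraction(synthetic_runs(7.0, 1000, seed=1))]
best_b = [run["CANMX"] for run in top_fraction(synthetic_runs(3.0, 1000, seed=2))]
u_stat, p_value = mann_whitney_u(best_a, best_b)
print("U =", u_stat, "approximate p =", p_value)
```

Because all models are sampled from the same parameter ranges in such a joint analysis, a difference between the top-performing subsets reflects a preference induced by the input data rather than an artifact of separately tuned ranges.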