3.1. Study Area
The study area was the Chungju Dam Basin on the Han River in the Korean peninsula. The basin covers approximately 6648 km², and the related river is approximately 280 km long. The altitude of the basin, calculated on a 50 m × 50 m grid, averages 610 m, with a maximum of 1560 m, a minimum of 71 m, and a standard deviation of 261 m. We selected five weather-gauging sites (the red circles in Figure 2) with five years of data (2008–2012) from the Korean Meteorological Agency. Table 1 shows the geographic information for the stations; the daily weather data for the collection period comprise minimum temperature, maximum temperature, precipitation, relative humidity, wind speed, and solar radiation. There are 21 water-level gauging sites in the basin (the black and pink triangles in Figure 2). However, only eight discharge-gauging sites (the pink triangles in Figure 2) had discharge data for the period from 2008 to 2012, as most gauging stations either were recently installed or do not yet have an established relationship between water level and discharge.
Figure 2.
Study area (Chungju Dam Basin).
Table 1.
Weather and discharge-gauging stations.
Stations | Code | Station Name | Latitude (°) | Longitude (°) | Elevation (m) | Period of Record (Year) |
---|---|---|---|---|---|---|
Weather gauging stations | 100 | Daegoanrung | 37.68 | 128.82 | 772.4 | 2008–2012 |
Weather gauging stations | 114 | Wonju | 37.34 | 127.95 | 150.7 | 2008–2012 |
Weather gauging stations | 216 | Taebaek | 37.17 | 128.99 | 714.2 | 2008–2012 |
Weather gauging stations | 221 | Jecheon | 37.16 | 128.19 | 263.1 | 2008–2012 |
Weather gauging stations | 272 | Youngju | 36.87 | 128.52 | 210.5 | 2008–2012 |
Discharge-gauging stations | 1 | Chungju Dam | 37.00 | 128.00 | 80.0 | 2008–2012 |
Discharge-gauging stations | 2 | Youngchun | 37.10 | 128.51 | 190.0 | 2008–2012 |
Discharge-gauging stations | 3 | Youngwol 1 | 37.18 | 128.48 | 200.0 | 2008–2012 |
Discharge-gauging stations | 4 | Geowun | 37.23 | 128.51 | 221.0 | 2008–2012 |
Discharge-gauging stations | 5 | Youngwol 2 | 37.19 | 128.41 | 383.0 | 2008–2012 |
Discharge-gauging stations | 6 | Panwoon | 37.30 | 128.38 | 722.0 | 2008–2012 |
Discharge-gauging stations | 7 | Pyeongchang | 37.37 | 128.41 | 762.0 | 2008–2012 |
Discharge-gauging stations | 8 | Jucheon | 37.27 | 128.27 | 720.0 | 2008–2012 |
3.2. Entropy Estimation
The concept of entropy has been applied in several fields of study: for example, Jaynes [48] in statistical mechanics, Molgedey and Ebeling [49] in finance, Ulanowicz [50] in ecology, Mormarco et al. [51] in hydraulics, and Mogheir et al. [52] in groundwater. In hydrology, entropy has mostly been applied as a tool for modeling and decision-making (Singh [53,54]), including the evaluation of sampling networks. Yoo et al. [55] evaluated a rain gauge network by comparing mixed and continuous distribution function applications. This study applied the entropy method to find calibration sites for hydrological modeling. The number of class intervals was set to 500 for all sites. Mutual information was calculated using the same number of class intervals, although the class interval width Δx differs from site to site. First, the goodness-of-fit of the observed data to the log-normal distribution was tested. The Quantile-Quantile (QQ) plot, a useful heuristic for assessing how closely a data set follows a particular distribution, was used to visually compare the theoretical quantiles of the log-normal distribution with the quantiles of the observations. The two sets of quantiles agree comparatively well at each site, as shown in Figure 3.
Figure 3.
Quantile-Quantile (QQ) plot of observation versus log-normal distribution.
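The class-interval approach described above can be illustrated with a minimal sketch. It assumes a plain histogram (discrete) estimator with equal-width intervals per site and base-2 logarithms; the function names are hypothetical, and the study itself may have relied on a log-normal formulation rather than this estimator.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width class interval index in [0, n_bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value falls in the last interval


def discretize(series, n_bins):
    """Discretize one site's series over its own range (site-specific width)."""
    lo, hi = min(series), max(series)
    return [bin_index(v, lo, hi, n_bins) for v in series]


def entropy(labels):
    """Shannon entropy (bits, an assumed unit) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def transinformation(x, y, n_bins=500):
    """T(X, Y) = H(X) + H(Y) - H(X, Y) for two equally long series."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))
```

With this estimator, two identical series share all of their information (the transinformation equals the marginal entropy), while two series whose class intervals occur independently share none.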
Table 2 shows the information matrix of the discharge-gauging sites obtained with the entropy method. The matrix summarizes the marginal entropy of each site (diagonal), the transinformation between sites (off-diagonal), and the total information for a selected site, given as the “Sum”. For example, if we select Discharge-Gauging Site 1 in Table 2, the total information from Gauging Site 1 is its marginal entropy (7.667) plus the sum of its transinformations with the other sites. Table 3 summarizes the optimal sites for each total number of selected sites. At the beginning of the selection, the sum of the marginal entropy of the selected sites and their transinformations with the other sites increases. This increasing trend holds up to a threshold number of sites for a given basin. Beyond the threshold, the sum of transinformation between the selected sites and the remaining unselected sites decreases more rapidly than the marginal entropy gained from a newly selected site, so the total information decreases as the number of selected sites increases. In the study area, the maximum total information is about 66, obtained when five sites (Sites 1, 2, 3, 5, and 6) are selected.
Table 2.
Information matrix.
Discharge-Gauging Site | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Sum |
---|---|---|---|---|---|---|---|---|---|
1 | 7.667 | 2.708 | 2.643 | 2.543 | 2.353 | 1.745 | 2.128 | 1.609 | 23.396 |
2 | 2.708 | 7.079 | 2.599 | 2.530 | 2.337 | 1.452 | 2.070 | 1.641 | 22.415 |
3 | 2.643 | 2.599 | 6.783 | 2.505 | 2.771 | 1.867 | 2.354 | 1.857 | 23.378 |
4 | 2.543 | 2.530 | 2.505 | 6.007 | 2.755 | 2.049 | 2.462 | 1.948 | 22.798 |
5 | 2.353 | 2.337 | 2.771 | 2.755 | 6.457 | 1.938 | 2.353 | 1.808 | 22.772 |
6 | 1.745 | 1.452 | 1.867 | 2.049 | 1.938 | 4.288 | 3.075 | 2.474 | 18.888 |
7 | 2.128 | 2.070 | 2.354 | 2.462 | 2.353 | 3.075 | 5.124 | 2.187 | 21.755 |
8 | 1.609 | 1.641 | 1.857 | 1.948 | 1.808 | 2.474 | 2.187 | 4.140 | 17.665 |
Table 3.
Changes in the total information depending on the selected sites.
Number of Sites | Selected Sites | Total Information | Change of Total Information |
---|
#1 | 1 | 23.4 | |
#2 | 1, 3 | 41.5 |
#3 | 1, 3, 7 | 54.3 |
#4 | 1, 2, 5, 7 | 62.4 |
#5 | 1, 2, 3, 5, 6 | 66.03 |
#6 | 1, 2, 3, 5, 6, 8 | 64.9 |
#7 | 1, 2, 3, 4, 5, 6, 8 | 59.1 |
#8 | 1, 2, 3, 4, 5, 6, 7, 8 | 47.5 |
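The totals in Table 3 can be reproduced from Table 2 under one reading of the text: the total information of a selection is the marginal entropy of the selected sites plus their transinformation with every unselected site. The sketch below encodes that reading; the function names and the exhaustive search are illustrative assumptions, not necessarily the procedure used in the study.

```python
from itertools import combinations

# Symmetric information matrix from Table 2
# (diagonal = marginal entropy, off-diagonal = transinformation).
INFO = [
    [7.667, 2.708, 2.643, 2.543, 2.353, 1.745, 2.128, 1.609],
    [2.708, 7.079, 2.599, 2.530, 2.337, 1.452, 2.070, 1.641],
    [2.643, 2.599, 6.783, 2.505, 2.771, 1.867, 2.354, 1.857],
    [2.543, 2.530, 2.505, 6.007, 2.755, 2.049, 2.462, 1.948],
    [2.353, 2.337, 2.771, 2.755, 6.457, 1.938, 2.353, 1.808],
    [1.745, 1.452, 1.867, 2.049, 1.938, 4.288, 3.075, 2.474],
    [2.128, 2.070, 2.354, 2.462, 2.353, 3.075, 5.124, 2.187],
    [1.609, 1.641, 1.857, 1.948, 1.808, 2.474, 2.187, 4.140],
]


def total_information(selected, info=INFO):
    """Marginal entropy of the selected sites plus their transinformation
    with every unselected site (sites are numbered from 1)."""
    chosen = {s - 1 for s in selected}
    others = set(range(len(info))) - chosen
    marginal = sum(info[i][i] for i in chosen)
    shared = sum(info[i][j] for i in chosen for j in others)
    return marginal + shared


def best_subset(k, info=INFO):
    """Exhaustive search for the k-site selection with the largest total."""
    sites = range(1, len(info) + 1)
    return max(combinations(sites, k), key=lambda s: total_information(s, info))
```

Under this reading, `total_information([1, 2, 3, 5, 6])` evaluates to about 66.0 and `total_information(range(1, 9))` to about 47.5, in line with rows #5 and #8 of Table 3. With eight sites the exhaustive search covers only 255 selections, so it is cheap here.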
The sensitivity of each site was analyzed by calculating the information loss, which is the difference between the maximum information in each case and the maximum information from all eight sites. Here, each case is the combination of sites with one specific site removed. For example, Case 1 estimates the maximum information using the other seven sites, without Site 1. Figure 4a shows the information loss as a function of the number of selected sites in each case. Figure 4b shows the resulting sensitivity ranking of the sites. The ranking of the sites is: 1, 3, 2, 5, 7, 6, and 8.
Figure 4.
Sensitivity analysis. (a) Losing information in each case; (b) Sensitivity ranking.
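The information loss for each case can be sketched as follows, assuming that removing a site drops its row and column from the Table 2 matrix and that the maximum information is searched over all selections of the remaining sites. The helper names are hypothetical and the exhaustive search is an assumption about the procedure.

```python
from itertools import combinations

# Information matrix from Table 2 (0-based indices: row 0 is Site 1).
INFO = [
    [7.667, 2.708, 2.643, 2.543, 2.353, 1.745, 2.128, 1.609],
    [2.708, 7.079, 2.599, 2.530, 2.337, 1.452, 2.070, 1.641],
    [2.643, 2.599, 6.783, 2.505, 2.771, 1.867, 2.354, 1.857],
    [2.543, 2.530, 2.505, 6.007, 2.755, 2.049, 2.462, 1.948],
    [2.353, 2.337, 2.771, 2.755, 6.457, 1.938, 2.353, 1.808],
    [1.745, 1.452, 1.867, 2.049, 1.938, 4.288, 3.075, 2.474],
    [2.128, 2.070, 2.354, 2.462, 2.353, 3.075, 5.124, 2.187],
    [1.609, 1.641, 1.857, 1.948, 1.808, 2.474, 2.187, 4.140],
]


def total_information(selected, active, info=INFO):
    """Total information of `selected`, counting only sites in `active`."""
    others = [j for j in active if j not in selected]
    marginal = sum(info[i][i] for i in selected)
    shared = sum(info[i][j] for i in selected for j in others)
    return marginal + shared


def max_information(active, info=INFO):
    """Largest total information over every non-empty selection of `active`."""
    return max(
        total_information(subset, active, info)
        for k in range(1, len(active) + 1)
        for subset in combinations(active, k)
    )


def information_loss(removed_site, info=INFO):
    """Drop in maximum information when one site (numbered from 1) is removed."""
    all_sites = list(range(len(info)))
    remaining = [j for j in all_sites if j != removed_site - 1]
    return max_information(all_sites, info) - max_information(remaining, info)


# Sites ordered from largest to smallest information loss.
ranking = sorted(range(1, 9), key=information_loss, reverse=True)
```

Because transinformation is non-negative, removing a candidate site can never raise the maximum, so every loss is zero or positive. The ranking from this sketch need not match Figure 4b exactly, since the figure also resolves the loss by the number of selected sites.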
3.3. Model Setup and Calibration
A rainfall-runoff model was built for the study basin using SWAT. Maps at 1:25,000 scale were collected to generate a 50 m × 50 m Digital Elevation Model (DEM) and a stream network. In addition, a land cover map (Figure 5b) and a soil map (Figure 5c) from the National Water Resources Management Information System (WAMIS; http://www.wamis.go.kr/) were used. The basin was classified into eight land-use classes, among which forests (82.2%) and rice paddies (10.3%) accounted for 92.5% of the land use. The soil map, which distinguishes 141 soil types, showed that apb (17.8%) and ana (15.5%) were the most prevalent soil types in the area. To build the model, GIS data were prepared to generate hydrological response units from the above data. Terrain analyses were conducted on the DEM of the Chungju Dam basin to delineate the channel network. The basin was divided into ten sub-basins, as shown in Figure 5a. The extracted geomorphological characteristics of each sub-basin are shown in Table 4.
Figure 5.
GIS data as SWAT input. (a) Stream network and sub-basins map; (b) Land use map; (c) Soil type map.
Table 4.
Geomorphological characteristics in each sub-basin.
Sub-Basin | Basin Area (km²) | Basin Slope (%) | Basin Altitude (El. m) | Upstream Area (km²) | Stream Length (km) | Stream Slope (%) | Stream Min. Alt. (El. m) | Stream Max. Alt. (El. m) | Remark |
---|---|---|---|---|---|---|---|---|---|
B-1 | 1905.1 | 27.2 | 391.0 | 6631.4 | 101.5 | 10.2 | 71.0 | 174.0 | Site 1 |
B-2 | 553.6 | 35.0 | 487.0 | 4726.2 | 16.4 | 29.3 | 157.0 | 205.0 | Site 2 |
B-3 | 164.6 | 33.7 | 243.0 | 2398.5 | 12.0 | 70.9 | 136.0 | 221.0 | Site 3 |
B-4 | 2233.9 | 32.2 | 667.0 | 2233.9 | 128.9 | 30.2 | 216.0 | 605.0 | Site 4 |
B-5 | 276.9 | 22.6 | 328.0 | 1774.2 | 26.8 | 17.9 | 193.0 | 241.0 | Site 5 |
B-6 | 88.5 | 28.6 | 421.0 | 896.1 | 19.3 | 56.9 | 215.0 | 325.0 | – |
B-7 | 110.1 | 31.1 | 476.0 | 807.5 | 23.7 | 33.0 | 257.0 | 335.0 | Site 6 |
B-8 | 697.4 | 28.5 | 636.0 | 697.4 | 52.4 | 47.0 | 291.0 | 537.0 | Site 7 |
B-9 | 67.3 | 21.1 | 351.0 | 601.2 | 14.0 | 28.5 | 211.0 | 251.0 | – |
B-10 | 533.9 | 26.0 | 548.0 | 533.9 | 43.2 | 44.0 | 251.0 | 441.0 | Site 8 |
In this study, surface runoff was estimated using the Soil Conservation Service Curve Number (SCS-CN) method, which can predict direct runoff or infiltration from excess rainfall in an ungauged area using daily precipitation and GIS data such as soil type and land-use maps. Water that does not become surface runoff enters the soil column, where it is removed through evapotranspiration, percolates into the deep aquifer, or moves laterally in the soil column as a streamflow contribution. The groundwater contribution to streamflow is generated from both shallow and deep aquifers and is based on the groundwater balance. SWAT offers three methods for estimating evapotranspiration: Priestley-Taylor, Penman-Monteith, and Hargreaves. The Penman-Monteith method [56] was used here to estimate evapotranspiration from weather variables such as mean temperature, wind speed, relative humidity, and solar radiation.
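To make the curve number idea concrete, the following minimal sketch uses the standard textbook SCS-CN relation in metric units, with the conventional assumption that the initial abstraction is 0.2 times the retention parameter. It is not SWAT's implementation, which additionally adjusts the retention with soil moisture; the function name is hypothetical.

```python
def scs_runoff_mm(precip_mm, cn, ia_ratio=0.2):
    """Daily direct runoff depth (mm) from the textbook SCS-CN relation.

    precip_mm : daily rainfall depth (mm)
    cn        : curve number, 0 < cn <= 100
    ia_ratio  : initial abstraction as a fraction of retention (assumed 0.2)
    """
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    retention = 25400.0 / cn - 254.0   # potential maximum retention S (mm)
    initial_abs = ia_ratio * retention  # losses before runoff begins (mm)
    if precip_mm <= initial_abs:
        return 0.0                      # all rainfall is abstracted
    excess = precip_mm - initial_abs
    return excess ** 2 / (excess + retention)
```

For example, a curve number of 100 (fully impervious) turns all rainfall into runoff, whereas a low curve number such as 48 yields no direct runoff from a 50 mm event because the initial abstraction exceeds the rainfall.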
SWAT contains several parameters that describe the spatially distributed movement of water through the watershed system. Some of these parameters, such as the Curve Number (CN), cannot be measured directly and must be estimated through calibration. Because SWAT is a distributed hydrological model, it potentially has thousands of parameters. As it is impossible to calibrate all of them, the number of parameters to estimate must be reduced. In this study, seven parameters that govern the surface water response and the subsurface water response of SWAT were used in the calibration. Table 5 gives a general description of the seven parameters [57]. The default parameters were determined by the methods introduced by Neitsch et al. [58]. More detailed descriptions of the primary parameters and sensitivity tests can be found in many studies [57,58,59,60,61].
There are several automatic calibration algorithms. Zhang et al. [32] compared the efficacy of five global optimization algorithms for calibrating SWAT, namely the shuffled complex evolution method developed at The University of Arizona (SCE-UA), Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Artificial Immune Systems (AIS), and Differential Evolution (DE), and found that GA is a promising single-objective optimization method. This study used GA to estimate the optimized parameters of SWAT. In GA, a roulette wheel algorithm is used to select chromosomes for the crossover and mutation operations [62]. A two-point crossover method with a probability of 0.8 was selected to make the search shorter and more robust, together with a mutation probability of 0.01. The RMSE fitness function (Fs) [25] was used in this study. This performance index was defined to minimize the RMSE, as shown in Equation (13):
$$F_s = \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{sim,i} - Q_{obs,i}\right)^{2}} \qquad (13)$$
where $Q_{sim,i}$ is simulated daily discharge, $Q_{obs,i}$ is observed daily discharge at the calibration site, and n is the number of days with observations.
Table 5.
Parameters for the calibration of SWAT.
Num. | Parameter | Description | Range |
---|
Parameters governing surface water response |
1 | CN2 | Curve number 2 | 35–98 |
2 | ESCO | Soil evaporation compensation factor | 0–1 |
3 | SOL_AWC | Available soil water capacity | 0–1 |
Parameters governing subsurface water response |
4 | GWQMN | Threshold depth of water in the shallow aquifer for return flow to occur (mm) | 0–5000 |
5 | REVAPMN | Threshold depth of water in the shallow aquifer for reevaporation to occur (mm) | 0–500 |
6 | GW_REVAP | Groundwater reevaporation coefficient | 0.02–0.2 |
7 | ALPHA_BF | Base flow recession constant | 0–1 |
The size of the initial population was set to 50, and the number of generations was set to 1000. The sites were selected according to the entropy method. The algorithm was configured so that optimization was implemented sequentially, starting with the discharge-gauging site that was the furthest upstream. For example, if calibration is conducted for the case where there are three observation station sites (Sites 1, 3, and 7), then Site 7, which is the furthest upstream, would be the first to be calibrated, followed by Site 3 and Site 1.
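The GA settings described in this section (roulette wheel selection, two-point crossover with probability 0.8, mutation with probability 0.01, population of 50) can be sketched as below. This is an illustration under stated assumptions: `toy_objective` is a hypothetical stand-in for a SWAT run that returns an RMSE, `TARGET` is an arbitrary optimum borrowed from the GA column of Table 6, the inverse-RMSE selection weight is one possible choice, and the default number of generations is reduced for speed.

```python
import random

# Parameter ranges from Table 5 (CN2, ESCO, SOL_AWC, GWQMN, REVAPMN,
# GW_REVAP, ALPHA_BF).
BOUNDS = [(35, 98), (0, 1), (0, 1), (0, 5000), (0, 500), (0.02, 0.2), (0, 1)]
TARGET = [48, 0.73, 0.32, 1694, 132, 0.08, 0.6]  # hypothetical optimum


def toy_objective(params):
    """Hypothetical stand-in for a SWAT run returning RMSE (lower is better)."""
    return sum(((p - t) / (hi - lo)) ** 2
               for p, t, (lo, hi) in zip(params, TARGET, BOUNDS)) ** 0.5


def roulette_select(population, weights, rng):
    """Pick one individual with probability proportional to its weight."""
    return rng.choices(population, weights=weights, k=1)[0]


def two_point_crossover(a, b, rng):
    """Swap the gene segment between two random cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(ind, rng, p_mut):
    """Resample each gene within its range with probability p_mut."""
    return [rng.uniform(lo, hi) if rng.random() < p_mut else g
            for g, (lo, hi) in zip(ind, BOUNDS)]


def run_ga(objective, pop_size=50, generations=100,
           p_cross=0.8, p_mut=0.01, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS]
                  for _ in range(pop_size)]
    best = min(population, key=objective)
    history = [objective(best)]
    for _ in range(generations):
        # Lower RMSE gives a larger selection weight (assumed transform).
        weights = [1.0 / (1e-9 + objective(ind)) for ind in population]
        children = []
        while len(children) < pop_size:
            a = roulette_select(population, weights, rng)
            b = roulette_select(population, weights, rng)
            if rng.random() < p_cross:
                a, b = two_point_crossover(a, b, rng)
            children += [mutate(a, rng, p_mut), mutate(b, rng, p_mut)]
        population = children[:pop_size]
        candidate = min(population, key=objective)
        if objective(candidate) < objective(best):
            best = candidate  # keep the best individual seen so far
        history.append(objective(best))
    return best, history
```

In a multi-site setting, such a routine would be run once per calibration site in upstream-to-downstream order, with the objective evaluated against the observed discharge at that site.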
The GA used for the parameter optimization of SWAT in this study was checked for applicability by comparing it with a simple Brute-force Search Algorithm (BSA). For this test, the calibration was performed only at the outlet site of the whole basin. The optimized parameters from each algorithm are shown in Table 6. The parameters were very similar, and the RMSE between the discharges simulated with the two methods (Figure 6) was about 0.07 m³/s. This shows both the applicability of the GA and its usefulness in solving the problem of complex combinations in this study.
Table 6.
Optimized parameters by GA and BSA.
Parameter | GA | BSA |
---|
CN2 | 48 | 48 |
ESCO | 0.73 | 0.8 |
SOL_AWC | 0.32 | 0.3 |
GWQMN | 1694 | 1600 |
REVAPMN | 132 | 150 |
GW_REVAP | 0.08 | 0.1 |
ALPHA_BF | 0.6 | 0.5 |
Figure 6.
Discharge comparison of GA vs. BSA.
Using the above method, calibration was conducted at the respective sites. Table 3 was referenced for all of the cases, in which the number of selected sites ranged from one to eight. After each case was calibrated, the relation between observed and simulated daily discharge at the eight discharge-gauging sites in the study basin was plotted, as shown in Figure 7. Figure 7a shows the case where calibration was conducted at only one site (Case 1), whereas Figure 7b shows the case where calibration was performed at five sites (Case 5). Both cases were compared with the case where no calibration was conducted (blue circles). Case 5 is included in the comparison because the maximum amount of information was obtained when five sites (Sites 1, 2, 3, 5, and 6) were selected in the study basin (see Table 3). The simulated discharge without calibration was underestimated (blue circles), and Case 5 (five sites selected) produced a better result than Case 1 (one site selected).
Figure 7.
Scatter plot for the relation between observation and simulation. (a) Calibration at one site (Case 1); (b) Calibration at five sites (Case 5).
3.4. Calibration Results and Discussion
The calibration results for the respective cases in Table 3 (from #1 to #8) were compared with one another. Three evaluation functions were applied to the observed and simulated discharges: the coefficient of correlation (CC), RMSE, and Nash-Sutcliffe efficiency (NSE) [63]. The results of the case evaluations (with one to eight sites selected) are shown in Table 7 and Figure 8. The evaluation was conducted for all of the sites and for the outlet. First, the calibration results were evaluated at all of the sites. Even when only one site was selected for calibration, the simulated discharge at all eight sites was extracted and compared with the observed discharge at the respective sites. Next, the results were evaluated at the outlet, the most important site (as determined in Table 3). The SWAT model was applicable to the study basin even without calibration (#0), with CC, RMSE, and NSE of 0.782, 147.4, and 0.482, respectively; however, the results improved slightly when the model was calibrated. In particular, the results with more sites selected were better than the result with only one site selected. Nevertheless, the calibration result did not improve any further once the number of selected sites exceeded a certain number. This characteristic can be seen in Figure 8: the best calibration result was produced by Case 5 when all of the sites were evaluated (five sites selected: CC, 0.813; RMSE, 138.8; NSE, 0.540) and by Case 4 when the basin outlet was evaluated (four sites selected: CC, 0.799; RMSE, 324.0; NSE, 0.575). If the evaluation at all of the discharge-gauging sites in the basin is deemed more representative than the evaluation at the outlet alone, then Case 5, in which five sites (Sites 1, 2, 3, 5, and 6) were selected for calibration, produces the best result.
Table 7.
Calibration results at all sites and the outlet site.
Site Number | CC (All Sites) | RMSE (m³/s, All Sites) | NSE (All Sites) | CC (Outlet Site) | RMSE (m³/s, Outlet Site) | NSE (Outlet Site) |
---|---|---|---|---|---|---|
#0 (Non-C.) | 0.782 | 147.4 | 0.482 | 0.763 | 351.3 | 0.501 |
#1 | 0.800 | 142.8 | 0.516 | 0.784 | 330.4 | 0.554 |
#2 | 0.805 | 141.1 | 0.530 | 0.798 | 325.2 | 0.568 |
#3 | 0.810 | 140.0 | 0.538 | 0.798 | 325.1 | 0.573 |
#4 | 0.812 | 139.7 | 0.539 | 0.799 | 324.0 | 0.575 |
#5 | 0.813 | 138.8 | 0.540 | 0.798 | 325.1 | 0.573 |
#6 | 0.809 | 140.3 | 0.536 | 0.797 | 325.5 | 0.572 |
#7 | 0.809 | 142.1 | 0.524 | 0.794 | 329.0 | 0.562 |
#8 | 0.809 | 142.1 | 0.524 | 0.794 | 329.2 | 0.562 |
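The three evaluation functions reported in Table 7 follow standard definitions, sketched below for paired observed and simulated daily discharge series. The function names are illustrative, and the standard Pearson and Nash-Sutcliffe forms are assumed to match those used in the study.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    performance of always predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def cc(obs, sim):
    """Pearson coefficient of correlation."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)
```

A constant bias illustrates why the metrics are reported together: a simulation that is offset from the observations by a fixed amount keeps a CC of 1 while its RMSE and NSE degrade.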
The total information can increase if more candidate sites are available. For example, the maximum information was about 66 when the selection was made among eight sites in this study (see Table 3), whereas it was about 56.6 when only seven candidate sites were available in Case 8 (shown in Figure 9). Here, Case 8 means that Site 8 was removed from the eight sites and the maximum information was calculated using the other seven sites. There is a small difference between using seven sites among seven sites and using seven sites among eight sites: the maximum information was 59.1 when seven sites were selected among eight sites. However, the overall maximum was obtained when five sites (Sites 1, 2, 3, 5, and 6) were selected among the eight sites. As a result, if more observation sites become available in the future, it will be possible to obtain more information. However, the maximum information is not obtained by using all of the observation sites.
Figure 8.
Calibration results using evaluation functions. (a) Coefficient of correlation; (b) RMSE; (c) NSE; (d) RMSE range in case of all sites.
Figure 9.
Maximum information using seven total sites.
The entropy method may identify the number of calibration sites beyond which the model efficiency in representing the observed runoff no longer increases significantly. Choi et al. [31] stated that additional calibration sites can benefit model performance. However, the results of this study showed that model performance instead decreased when more than four or five sites were selected (Table 7). There may be two reasons for this. First, the exclusion of one of the sites worsened the simulation result of the other sites. The sites that caused this response could be Site 4 and Site 8, because model performance decreases with those sites. However, explaining the model performance by a single problem at each site has its limits, and this does not clearly explain the results for Site 7. In fact, the effect of Site 7 on model performance is positive in Case 3 and Case 4 and negative in Case 8 (see Table 3 and Table 7). Here, error compensation is a very important point for multi-site calibration. When many sites are considered for calibration, error compensation affects model performance. In this study, error compensation can be prevented if all of the sites are used for calibration, and model performance therefore decreases. Model performance will increase solely because of error compensation if fewer than the maximum number of sites are used for calibration. Therefore, the entropy method should not be preferred over an approach where all available sites are used for calibration, if time allows. The entropy method is useful in cases where computational requirements do not allow the use of all sites for calibration. In other words, the entropy method is useful only for reducing the time and effort of model calibration, not for increasing model performance.
The growing importance of water resource management, along with the development of observation techniques, has recently led to the installation of significantly more water level observation stations in basins. Currently, there are 21 water level observation stations in the study basin, and the observed discharge information is expected to accumulate continuously. Observation data from more sites will be a great advantage to hydrological modeling. However, assuming that all 21 sites in the study basin can be utilized, the number of possible selections of calibration sites is 2,097,151 ($\sum_{n=1}^{21} {}_{21}C_{n} = 2^{21} - 1 = 2{,}097{,}151$). While this count does not consider the importance of the sites, the number of cases for the hydrological model calibration is still very high. If a brute-force search were used to select calibration sites in this area, it would require excessive time and effort. Moreover, a modeling result sometimes does not improve any further despite additional calibration effort. This study confirmed that selecting more calibration sites did not lead to improved calibration results. Therefore, the entropy method attempted in this study is expected to provide a useful guideline for calibrating hydrological models. In addition, the method will be even more applicable when a certain number of sites must be selected for a given modeling purpose, because it also indicates which sites should be selected.
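The count of possible site selections quoted above can be checked directly: it is the number of non-empty subsets of 21 sites. The variable names below are illustrative.

```python
from math import comb

n_sites = 21

# Sum the number of ways to choose k calibration sites for k = 1..21.
n_cases = sum(comb(n_sites, k) for k in range(1, n_sites + 1))

# Equivalent closed form: every non-empty subset of the 21 sites.
closed_form = 2 ** n_sites - 1
```

Both expressions give 2,097,151. The same count for the eight sites used in this study is only 255, which is why an exhaustive comparison remains feasible there but not for the full network.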