3.1. Descriptive Statistics and Spatial Distributions
Ecological conditions measured by the biological indicators (TDI, KSI, and IBI) varied greatly among sampling sites (Table 2). Some sites had very low indicator values (near zero: very poor ecological condition), while others had very high values (near 100: very good ecological condition). The mean values of the TDI, KSI, and IBI within the study area were 43.27, 63.19, and 56.89, respectively. Overall, the condition of macroinvertebrate assemblages in the study area was slightly better than that of benthic diatom or fish assemblages. The standard deviation of the KSI was larger than that of the TDI or IBI, indicating greater variability in KSI values. All three indicators reflected poor ecological conditions along main streams and in downstream areas in the southeastern part of the study region (Figure 2).
The mean values of BOD, T-N, and T-P were 1.0, 2.24, and 0.04, respectively, indicating relatively good water quality in the areas investigated. However, the large standard deviations of elevation and slope indicate complex topographic characteristics in the study area.
The relative proportions of each land use type in watersheds also varied greatly across the study area. The dominant land use type was forest (mean: 63.64%). Developed areas were relatively small and concentrated at several sites along the main river, particularly near the central and downstream regions (Figure 3). As shown in Figure 3, several cities of various sizes were located near or along the main stream. Daegu and Busan were the two largest cities, with populations of 2.5 million and 3.5 million, respectively.
Before undertaking a detailed analysis, we examined how the biological indicators changed over the five years from 2008 to 2012 to understand the nature of the dataset (Figure 4). The indicators shifted in 2011: KSI and IBI values were slightly higher, and TDI values slightly lower, than in the other years. Despite this fluctuation, we used the 2011 monitoring data so that they matched the most up-to-date land use/land cover data released by the Korean Ministry of Environment. Models estimated using datasets for other years might therefore differ slightly from the model estimated using the 2011 dataset.
3.2. Selecting the Best Predictors for Biological Indicators
Before estimating the OLS and GWR models for each biological indicator, we conducted a preliminary regression analysis to select the best predictive variables from the water quality parameters (i.e., BOD, T-N, T-P), topographic variables (i.e., elevation and slope), and land use parameters (i.e., developed areas, forested areas, agricultural areas, grass, wetland, and bare soils). To select the best-fit model for each biological indicator, we used the stepwise option in SPSS for Windows (SPSS Inc., Chicago, IL, USA), which reports the R², F-statistics, and t values (p < 0.05) of each variable.
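The selection step can be illustrated with a minimal sketch. SPSS's stepwise option enters and removes variables using F-test probabilities; the sketch below swaps in a simpler forward selection on adjusted R², so it is not the procedure used in the study. The function names, the `min_gain` threshold, and the toy data are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_adjusted_r2(y, columns):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


def forward_select(y, predictors, min_gain=0.01):
    """Add one predictor at a time while adjusted R-squared rises by at least min_gain."""
    selected, best = [], 0.0
    while len(selected) < len(predictors):
        scores = {
            name: fit_adjusted_r2(y, [predictors[s] for s in selected + [name]])
            for name in predictors if name not in selected
        }
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break
        selected.append(name)
        best = scores[name]
    return selected, best


# Hypothetical data: the index depends on forest cover and T-P but not on slope.
predictors = {
    "forest": [10, 20, 30, 40, 50, 60, 70, 80, 90, 35, 65, 55],
    "tp": [0.09, 0.02, 0.07, 0.03, 0.08, 0.01, 0.06, 0.04, 0.05, 0.02, 0.09, 0.05],
    "slope": [5, 12, 7, 15, 9, 11, 6, 14, 8, 13, 10, 9],
}
index = [20 + 0.5 * f - 200 * t for f, t in zip(predictors["forest"], predictors["tp"])]
selected, best = forward_select(index, predictors)
print(selected, round(best, 3))
```

Because the toy index is built only from forest cover and T-P, the slope variable adds no explanatory power and is left out.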
As shown in Table 3, a model with %forest, T-N, T-P, %bare soil, %wetland, and elevation had the highest adjusted R² (0.41) for the TDI. For the KSI, a model with %developed areas and the concentration of T-P had the highest R² value (0.32, F = 22.50, p < 0.01). The percentage of forest, the concentrations of T-P and BOD, and elevation were the most significant variables for explaining the variance of the IBI in the study region (adjusted R² = 0.42, F = 26.01, p < 0.01). From a land use perspective, the proportions of forest, developed areas, wetland, and grass in the watershed seemed to be significant variables for explaining the variances of the TDI, KSI, and IBI. Among the water quality parameters, the concentration of T-P was the most significant determinant for all three indicators, whereas the concentrations of T-N and BOD were significant only for the TDI and IBI, respectively.
In most previous studies dealing with the relationships between land use and water quality, the proportion of forest in watersheds had a strong positive relationship with water quality parameters, while the percentage of developed areas had a strong negative relationship with water quality [1,2,3,4] and biological indicators [5,6,7,8,9,10,11,12]. However, both variables were rarely significant in the same regression model because they are negatively correlated with each other. In our preliminary analysis, there was a strong negative correlation between %forest and %developed areas (r = −0.61) in the study area. Although other variables also affected the biological indicators, we focused on only these two contrasting land use types (i.e., forest and developed areas). In particular, we investigated the spatial pattern of the coefficients of these two variables in the GWR models for the TDI, KSI, and IBI while holding the other significant determinants constant.
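The overlap between the two land use variables can be made concrete with a short sketch. It computes a Pearson correlation for hypothetical watershed percentages (the values are illustrative, not taken from the study) and then the share of variance that two predictors have in common when the correlation has the reported magnitude of 0.61.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical land use percentages for eight watersheds.
forest = [85, 78, 70, 66, 60, 52, 45, 30]
developed = [2, 1, 8, 3, 10, 6, 20, 35]
r = pearson_r(forest, developed)

# Shared variance implied by the correlation magnitude reported in the text.
reported_r_magnitude = 0.61
shared_variance = reported_r_magnitude ** 2

print(f"toy r = {r:.2f}, shared variance at |r| = 0.61: {shared_variance:.2f}")
```

A correlation of this size means roughly 37% of the variation in one land use share is mirrored in the other, which is why the two variables tend to compete for the same explanatory role.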
3.3. Comparison between OLS and GWR Models
We compared the performance of the general OLS (global) and GWR (local) models for the TDI (Table 4). In the global model, forest land had a positive effect on the TDI (b = 0.25, β = 0.18), while the concentrations of T-P and T-N had negative effects (T-P: b = −111.38, β = −0.32; T-N: b = −4.98, β = −0.21). Thus, a higher proportion of forest in watersheds may enhance the benthic diatom communities of streams, whereas higher concentrations of T-P and T-N had adverse effects on them. The percentage of bare soil in the watershed appeared to have a negative impact on the TDI (b = −3.44, β = −0.24), while the percentage of wetland had a positive influence (b = 5.27, β = 0.19). Elevation also had a positive effect on the TDI of streams (b = 0.02, β = 0.15). In the global (OLS) model, the intercept, land uses in watersheds (i.e., proportions of forest, bare soil, and wetland), water quality parameters (i.e., concentrations of T-N and T-P), and the topographic characteristic (i.e., elevation) were significant at p < 0.01, and the global model was also significant overall (F = 16.71, p < 0.01).
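The contrast between the unstandardized coefficients (b) and standardized coefficients (β) reported above follows from the usual rescaling β = b · s_x / s_y, where s_x and s_y are the standard deviations of the predictor and the response. A minimal sketch, with hypothetical standard deviations chosen only to show the effect of scale:

```python
def standardized_beta(b, sd_x, sd_y):
    """Rescale an unstandardized slope b into a standardized coefficient."""
    return b * sd_x / sd_y


# Hypothetical spreads: a nutrient concentration varies over a narrow numeric
# range, so a very large b can still correspond to a moderate beta.
beta_nutrient = standardized_beta(b=-100.0, sd_x=0.02, sd_y=10.0)

# A land use percentage varies over a wide range, so a small b can
# correspond to a comparable or larger beta.
beta_land_use = standardized_beta(b=0.25, sd_x=20.0, sd_y=10.0)

print(beta_nutrient, beta_land_use)
```

This is why the β values, not the b values, are the appropriate basis for comparing the relative influence of predictors measured in different units.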
The adjusted R² of the global model was 0.41, indicating that 41% of the variance in the TDI across streams in the study area could be explained by three land use variables, two water quality parameters, and one topographic variable, while the remaining 59% could not be explained by these six variables. The R² value of the GWR model was 0.44, slightly higher than that of the OLS model, suggesting that the GWR model explained the variance of the TDI somewhat better. Similarly, the AICc values of the global and local models were 1165.43 and 1159.81, respectively. The lower AICc value of the GWR model suggested that it more closely approximated the actual relationships between the independent variables and the TDI. The Moran’s I value of the residuals in the local model was −0.10, slightly larger in magnitude than that in the global model (−0.09). However, the difference between the two Moran’s I values was very small (0.01).
As discussed previously, the relative performance of the OLS and GWR models can be assessed based on the R², the AICc, and the Moran’s I of the model residuals. Comparison of these criteria suggested that the local model (GWR) explained the variance of the TDI better than the global model, and indicated non-stationarity in the relationships between the TDI and the independent variables, including the proportion of forest. This non-stationarity suggested that the influence of forest on the TDI might vary over the study area.
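Two of these criteria can be written down compactly. The sketch below gives the standard adjusted R² formula and one commonly used form of the AICc for GWR, in which the trace of the hat matrix stands in for the number of parameters. The section does not state which AICc formula the software used, so that form is an assumption, and the sample size and residual sums of squares are hypothetical.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def aicc(rss, n, trace_s):
    """Corrected AIC; trace_s is the trace of the hat matrix (effective parameters).

    Assumed form: n*ln(sigma^2) + n*ln(2*pi) + n*(n + tr(S)) / (n - 2 - tr(S)),
    with sigma^2 = RSS / n.
    """
    sigma2 = rss / n
    return (n * math.log(sigma2) + n * math.log(2 * math.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))


# Hypothetical comparison: the local model fits slightly better (lower RSS)
# but uses many more effective parameters than the global model.
n = 150
global_aicc = aicc(rss=1000.0, n=n, trace_s=7)
local_aicc = aicc(rss=980.0, n=n, trace_s=20)
print(round(global_aicc, 2), round(local_aicc, 2))
```

In this toy case the small gain in fit does not offset the complexity penalty, so the local model has the higher AICc. A lower AICc for GWR, as reported for the TDI, therefore indicates an improvement beyond what the extra flexibility alone would buy.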
For the KSI, the proportion of the variation explained by the OLS model was modest (R² = 0.33) (Table 5). About 32% of the KSI variance could be explained by the percentage of developed areas in the watershed (b = −2.87, β = −0.33), the concentration of T-P (b = −100.96, β = −0.25), and the percentage of grass areas (b = −4.77, β = −0.19), while the remaining 68% could not be explained by these variables. The high F-statistic (23.5, p < 0.01) suggests that the OLS model was significant for the KSI. The results of this model further suggested an inverse relationship between the proportion of developed land in watersheds and the KSI values of streams. Among the significant independent variables, the proportion of developed land had the largest standardized coefficient in absolute value (β = −0.33), followed by the concentration of T-P (β = −0.25) and the proportion of grass areas (β = −0.19).
The GWR model had the same R² value (0.32) as the OLS model, suggesting that it explained almost the same amount of variance in the KSI. Furthermore, the similar AICc values of the OLS (1215.70) and GWR (1217.88) models indicated that both described the relationship between the independent variables and the KSI to a similar degree of accuracy. The spatial autocorrelation indexes, measured by Moran’s I, of the OLS (−0.04) and GWR (−0.06) residuals were very similar, suggesting that there was no significant spatial dependency of the residuals in either model. The comparison between the two KSI models also indicated that non-stationarity was not present in the relationships between the KSI and the independent variables, including the proportion of developed areas, the concentration of T-P, and the percentage of grass areas in watersheds.
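For readers unfamiliar with the index, a minimal sketch of Moran's I for model residuals follows. It assumes a simple binary neighbour matrix for four sites along a line; the spatial weights used in the study are not described in this section, and the residual values are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


# Four hypothetical sites in a row; each site neighbours the adjacent ones.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 1.0, -1.0, -1.0], w)    # similar residuals sit together
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # neighbours have opposite signs
print(round(clustered, 3), round(alternating, 3))
```

Clustered residuals give a positive index and alternating residuals a negative one. Values close to zero, such as those reported for both KSI models, indicate little spatial structure left in the residuals.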
For the IBI, the results of the global (OLS) model indicated that IBI values increased significantly with the proportion of forest within watersheds (b = 0.47, β = 0.32, p < 0.01) and with elevation (b = 0.03, β = 0.18, p < 0.01). Conversely, IBI values were inversely related to the concentration of T-P (b = −81.91, β = −0.23, p < 0.01) and BOD (b = −5.99, β = −0.22, p < 0.01). The adjusted R² of the global model was 0.42, indicating that ~42% of the variance in the IBI among streams can be explained by the four variables of forest, T-P, BOD, and elevation. The F-value (26.57) of the OLS model was significant (p < 0.01) (Table 6).
The R² value of the GWR model (0.49) was considerably higher than that of the OLS model, and suggested that ~49% of the variance in the IBI among study sites could be explained by the proportion of forest, T-P, BOD, and elevation. The GWR model also had a lower AICc value (1165.13) than the OLS model (1171.27). Both the higher R² value and the lower AICc value strongly indicate that the GWR model performed better in terms of explaining the IBI variance and approximating reality. Furthermore, the Moran’s I value of the GWR model (−0.01) was closer to zero than that of the OLS model (0.07), indicating that the residuals of the former exhibited less spatial dependency. The higher R², lower AICc, and smaller Moran’s I of the GWR model strongly suggest non-stationarity in the relationships between the independent variables and the IBI in the study area. This non-stationarity suggests that the influence of the proportion of forest, along with the other independent variables, might vary from stream to stream.
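The three comparisons can be lined up side by side using only the values quoted in the text. The sketch below is plain arithmetic on those reported numbers; the dictionary layout and function name are illustrative.

```python
# (R2, information criterion, Moran's I of residuals) as quoted in the text.
# The OLS R2 values for the TDI and IBI are adjusted values; the KSI OLS value
# uses the 0.32 quoted in the OLS-versus-GWR comparison.
reported = {
    "TDI": {"ols": (0.41, 1165.43, -0.09), "gwr": (0.44, 1159.81, -0.10)},
    "KSI": {"ols": (0.32, 1215.70, -0.04), "gwr": (0.32, 1217.88, -0.06)},
    "IBI": {"ols": (0.42, 1171.27, 0.07), "gwr": (0.49, 1165.13, -0.01)},
}


def compare(ols, gwr):
    """Return (R2 gain, criterion drop, change in |Moran's I|) going from OLS to GWR."""
    r2_gain = round(gwr[0] - ols[0], 2)
    criterion_drop = round(ols[1] - gwr[1], 2)  # positive favours GWR
    abs_moran_change = round(abs(gwr[2]) - abs(ols[2]), 2)
    return r2_gain, criterion_drop, abs_moran_change


summary = {name: compare(m["ols"], m["gwr"]) for name, m in reported.items()}
for name, (r2_gain, drop, moran_change) in summary.items():
    print(f"{name}: R2 gain {r2_gain:+.2f}, criterion drop {drop:+.2f}, "
          f"|Moran's I| change {moran_change:+.2f}")
```

The GWR model lowers the information criterion for the TDI and IBI but raises it for the KSI, and only the IBI shows a clear reduction in residual autocorrelation.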
Overall, the results of the OLS models (Table 4, Table 5 and Table 6) indicated that the selected variables for each biological indicator could explain ~41%, 32%, and 42% of the variance in the TDI, KSI, and IBI, respectively. No considerable differences between the OLS and GWR models were observed for the KSI. The estimated OLS and GWR models of the KSI had almost the same R², AICc, and Moran’s I values, suggesting that land use, water quality, and topographic variables might have no non-stationary effects on the KSI. Compared to the OLS model, the considerably higher R² value and β-value of the GWR model for the TDI and IBI indicated that, according to this model, the TDI and IBI were more sensitive to the heterogeneity of forest coverage than the KSI. Therefore, a higher percentage of forest land in watersheds may substantially enhance the ecological conditions as measured by the TDI and IBI, and the relationships between forest and these two indicators (i.e., TDI and IBI) might vary over space (i.e., non-stationary effects). Interestingly, negative impacts of developed areas were found only in the OLS model for the KSI, implying that a higher proportion of developed areas in watersheds can adversely affect the KSI to a greater extent than the other biological indicators.
Positive influences of forests were found in the TDI and IBI models, suggesting that a higher proportion of forests in watersheds may enhance the TDI and IBI in streams. Interestingly, land use (developed areas or forests) in watersheds appeared to be a more significant variable than water quality parameters or topographic variables in the KSI and IBI models. In the TDI model, a water quality parameter (i.e., T-P) was more significant than land use or topographic variables.
In contrast to the OLS model, the GWR model assumes non-stationarity in the relationship between a dependent variable (i.e., a biological indicator) and an independent variable (i.e., the proportion of forest or developed land in watersheds). In the comparison of the OLS and GWR models, the GWR model of the KSI did not explain the variance of the KSI any better than the OLS model. However, the GWR models of the TDI and IBI clearly performed better than the corresponding OLS models in terms of the R², AICc, and spatial autocorrelation index (i.e., Moran’s I) values. The GWR model is based on non-stationary effects (Equation (2)), while the OLS model is based on stationary effects (Equation (1)). Therefore, the superior performance of the GWR models strongly suggests non-stationarity in the effects of land use (i.e., forest) on biological indicators (i.e., the TDI and IBI). The OLS model might be an effective tool for understanding regionally averaged effects of land use on ecological conditions, but this global model cannot capture local variations in such effects. For some watersheds and indicator types, the OLS model might overestimate or underestimate the effects of land use.
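The difference between a single global slope and site-specific slopes can be sketched in a few lines. The example below fits a distance-weighted simple regression at every site, which is the core idea of GWR. It assumes one predictor, a Gaussian kernel, and a fixed bandwidth; the kernel and bandwidth used in the study are not given in this section, and the coordinates and values are hypothetical.

```python
import math


def gwr_local_fits(coords, x, y, bandwidth):
    """Return an (intercept, slope) pair estimated separately at every site."""
    fits = []
    for (ui, vi) in coords:
        # Gaussian kernel: nearby sites get weights near 1, distant sites near 0.
        w = [math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
             for (uj, vj) in coords]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        slope = sxy / sxx
        fits.append((my - slope * mx, slope))
    return fits


# Ten hypothetical sites from west to east. Forest cover has a strong effect
# on the index in the west (slope 0.6) and a weak one in the east (slope 0.1).
coords = [(float(i), 0.0) for i in range(10)]
forest = [20, 60, 30, 70, 40, 80, 25, 65, 35, 75]
true_slope = [0.6] * 5 + [0.1] * 5
index = [30 + s * f for s, f in zip(true_slope, forest)]

fits = gwr_local_fits(coords, forest, index, bandwidth=1.5)
for (u, _), (_, slope) in zip(coords, fits):
    print(f"site at x = {u:.0f}: local forest slope = {slope:.2f}")
```

A global OLS fit to the same data would return one intermediate slope, overstating the forest effect in the east and understating it in the west, which is the kind of averaging the paragraph above describes.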
3.4. Description of Local Estimated Land Use Effects in GWR Models
In the GWR models, the local R² values and land use coefficients for the TDI, KSI, and IBI varied greatly among watersheds (Table 7). For example, the proportion of forest together with the other variables in the local (GWR) TDI model explained about 38% (minimum) of the variance in the TDI among streams in some watersheds and 48% (maximum) in others. Similarly, the coefficient of the proportion of forest varied considerably among watersheds, ranging from 0.08 to 0.31. Despite this variation, TDI values always increased with the proportion of forest land. The mean R² and the mean forest coefficient in the local TDI model were 0.44 and 0.20, respectively.
We also found large variation in the R² values and the coefficients of developed land in the local KSI model. In particular, the R² values of the KSI model varied markedly from watershed to watershed. In some watersheds, the proportion of developed areas and the other variables explained only a small proportion of the KSI variance (15%), while in others they explained up to 41%. The coefficient of the proportion of developed areas was negative in all watersheds, ranging from −3.12 to −2.31.
The R² of the GWR model for the IBI also varied considerably among the study areas, ranging from 0.27 to 0.54. This variation indicates that the proportion of forest, T-P, BOD, and elevation were not consistently able to explain IBI values over space. The coefficient of the proportion of forest varied widely (−0.03 to 0.98), with a mean of 0.39. Thus, while in most watersheds a higher proportion of forest was associated with increased IBI values in streams, in some watersheds forest had a negligible (or even negative) relationship with IBI values (Table 7). The standard deviations of the %developed coefficient in the KSI local model (SD = 0.25) and the %forest coefficient in the IBI local model (SD = 0.30) were relatively high.
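A summary of local coefficients like the one in Table 7 can be produced with a small helper, and the reported ranges can be checked for a change of sign. The sketch below uses the minimum and maximum values quoted in the text; the helper names and the short example list are hypothetical.

```python
import statistics


def summarize(coefs):
    """Minimum, maximum, mean, and sample standard deviation of local coefficients."""
    return {
        "min": min(coefs),
        "max": max(coefs),
        "mean": statistics.mean(coefs),
        "sd": statistics.stdev(coefs),
    }


def sign_consistent(lo, hi):
    """True when every coefficient in the range [lo, hi] has the same sign."""
    return lo > 0 or hi < 0


# Ranges of local land use coefficients as quoted in the text.
reported_ranges = {
    "TDI %forest": (0.08, 0.31),
    "KSI %developed": (-3.12, -2.31),
    "IBI %forest": (-0.03, 0.98),
}
consistent = {name: sign_consistent(lo, hi) for name, (lo, hi) in reported_ranges.items()}

# Hypothetical local coefficients, only to show the summary output.
example = summarize([0.1, 0.2, 0.3])
print(consistent)
print(example)
```

Only the IBI forest coefficient crosses zero, which matches the observation that forest had a negligible or slightly negative association with the IBI in a few watersheds.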
Scatterplots of observed against predicted values for the TDI local model indicate that most sites fell within the 95% confidence range of the estimated GWR model. Observed values in the low range of the TDI appeared to be underestimated by the GWR model (Figure 5a). The KSI-GWR model showed a clear relationship in the middle to high range of KSI values. It overestimated observed KSI values between 20 and 40 and underestimated those between 0 and 20. Seven watersheds fell outside the 95% confidence interval (Figure 5b). The IBI-GWR model produced an even more complex estimation pattern. In the high range, three watersheds were overestimated and one was underestimated. In the middle range, one was overestimated and one was underestimated. In the low range of IBI values, two watersheds were underestimated by the GWR model (Figure 5c).
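The over- and underestimation described above can be expressed as a simple rule on residuals. The section does not say how the 95% band in Figure 5 was constructed, so the sketch below assumes a band of ±1.96 standard deviations of the residuals, which is only a rough stand-in. The observed and predicted values are hypothetical.

```python
import statistics


def classify_sites(observed, predicted, z=1.96):
    """Label each site by whether its residual falls outside an assumed +/- z*SD band."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    limit = z * statistics.stdev(residuals)
    labels = []
    for r in residuals:
        if r > limit:
            labels.append("underestimated")  # observed well above predicted
        elif r < -limit:
            labels.append("overestimated")   # observed well below predicted
        else:
            labels.append("within band")
    return labels


# Hypothetical indicator values for ten sites; the last site is poorly predicted.
predicted = [40, 45, 50, 55, 60, 65, 70, 75, 80, 30]
observed = [41, 44, 52, 53, 61, 64, 70, 75, 81, 42]
labels = classify_sites(observed, predicted)
print(labels)
```

Here a site is called underestimated when the model predicts a value well below the observed one, and overestimated in the opposite case.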