**Recognizing Crucial Aquatic Factors Influencing Greenhouse Gas Emissions in the Eutrophication Zone of Taihu Lake, China**

### **Yulin Wang 1, Liang Wang 2,\*,**<sup>+</sup>**, Jilin Cheng 2, Chengda He 1,\*,**† **and Haomiao Cheng <sup>1</sup>**


Received: 14 August 2019; Accepted: 17 September 2019; Published: 20 September 2019

**Abstract:** Greenhouse gas (GHG) emissions, which are closely related to climate change and serious ecological instability, have attracted global attention. The estimation of crucial aquatic factors for the flux of GHGs in lakes is a key step in controlling and reducing GHG emissions. The importance of 14 aquatic factors for GHG emissions was estimated in Meiliang Bay, which is an eutrophication shallow bay in Taihu Lake in eastern China. The random forest (RF) method, which is an improved version of the classified and regression tree (CART) model, was employed. No distribution assumption on variables was required in this method and it could include nonlinear actions and interactions among factors. The results show significant positive correlations among the fluxes of CO2, CH4, and N2O. The most crucial factor influencing CO2 emissions is the water temperature (WT) followed by sulfate (SO4 <sup>2</sup>−), alkalinity (Alk), dissolved oxygen (DO), and nitrate (NO3 −–N). The important factors for CH4 emissions are WT, SO4 <sup>2</sup><sup>−</sup>, DO, Alk, and NO2 <sup>−</sup>–N. The outcome for N2O, in which the key factor is NO2 <sup>−</sup>–N, was slightly different from those of CO2 and CH4. A comprehensive ranking index (CRI) for the fluxes of all three GHGs was also calculated and showed that WT, NO2 <sup>−</sup>–N, SO4 <sup>2</sup>−, DO, and Alk are the most crucial aquatic factors. These results indicate that increasing DO might be the most effective means of controlling GHG emissions in eutrophication lake bays. The role of SO4 <sup>2</sup><sup>−</sup> in GHG emissions, which has previously been ignored, is also worth paying attention to. This study provides a useful basis for controlling GHG emissions in eutrophication shallow lake bays.

**Keywords:** GHGs; aquatic factors; random forest; water temperature; nitrogen; sulfate

### **1. Introduction**

The emission of greenhouse gases (GHGs) to the atmosphere is closely related to climate change [1], resulting in significant disruption in biological living conditions and ecosystem instability [2,3]. Natural lakes, though representing only about 2% of the land surface area, are important sources of GHGs such as carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O) [4–6], and the emission of GHGs in lakes has therefore attracted the interest of many researchers.

According to previous assessments, lakes contribute about 71.6 TgC CH4 and 1943 TgC CO2 to the atmosphere per year [5,7–9]. However, these data remain largely uncertain due to the spatial heterogeneity of emissions in waterbodies [9,10]. Furthermore, it is even harder to estimate the contribution of N2O from global lakes [11]. In large developing countries, such as China, the problem may be more serious as data are lacking [12]. The flux of GHGs in lakes is also drastically different according to the distinct nutrient level zones. The CH4 flux in the East Plain Lakes zone is about two times more than in the Tibetan Plateau and Inner Mongolia–Xinjiang Lakes zone [13]. GHG emissions may be different in the same lake. Previous observations have shown a one order of magnitude larger CO2 flux in the overeutrophication zone compared with the eutrophication and mesotrophic zones in Taihu Lake in China [14]. The flux of N2O in the emergent macrophyte-type area was also about 1.5 and 30 times larger than in the algae-type and submerged macrophyte areas, respectively, during summer [15], while in winter, the flux of the algae-type area was the largest [16].

Many factors affect the flux of GHGs. Inorganic nitrogen compounds such as nitrate [15,17,18] and ammonia nitrogen [19] are the factors controlling N2O production, while total phosphorus and chlorophyll A promote CO2 [14,20] and CH4 [20–22] production in waterbodies. The water temperature, wind speed, water velocity, and turbulence are common factors influencing the three kinds of GHGs [23–25]. Other factors, for instance, pH, dissolved oxygen, and chloride ions may also affect the release of GHGs [16,26]. However, the roles of aquatic factors in the control of GHG emissions remain controversial because of their complex effects [15,27,28]. Identifying the main controlling factors and their roles is critical for further understanding the mechanisms of GHG emissions. Recognizing the aquatic variables affecting GHG emissions, especially under the nonlinear action and interaction effects of aquatic factors, is still an urgent problem to be solved.

China has 2700 lakes with a total area of 81,414.6 km<sup>2</sup> [29]. The carbon emission of lakes in China is larger than the mean of the world lakes in the temperate zone [7]. An initial assessment showed that the lakes in China release 3.0 TgC CH4 per year [13]. Controlling the GHG flux from lakes, especially from the eutrophication lake bay, will play a key role for China in meeting their United Nations Framework Convention on Climate Change (UNFCCC) commitments. It is also very important for the sustainability of lake water in a social and environmental dimension [30]. Hence, it is necessary to recognize the crucial aquatic factors influencing the GHG flux at different nutrient level zones in China, especially the eutrophication zone.

Here, we provide a combination approach to identify important variables for GHG flux in Meiliang Bay, which is an eutrophication zone of Taihu Lake in eastern China. The statistic and seasonal characteristics of 14 aquatic factors and GHG emissions in this lake bay are performed and Pearson coefficients among them are also shown. The random forest (RF) method, which can take into account the nonlinear effects and interaction effects of factors, is employed to identify the most important factors influencing the flux of the three types of GHGs. A comprehensive ranking of the GHGs is also given.

The results showed dissolved oxygen, water temperature, alkalinity and nitrite are very important for the flux of GHGs. Sulfate, which may have been ignored by previous studies, also play a crucial roles in GHG emissions. Although this assessment is based on a specific shallow lake bay, it is a useful method and its result could easily be popularized to clarify the vital factors and their roles in GHG flux in other large shallow fresh water lakes, such as Chaohu Lake [31,32] and Hongze Lake [33], in eastern of China.

### **2. Methods**

### *2.1. Data Sources*

Taihu Lake, which is located in the north subtropical monsoon climate region, is the third largest freshwater lake in China. It has an area of approximately 2445 km2 with a mean depth of about only 1.9 m [15]. There are about 100 million people living around the lake, contributing over 5 trillion dollars to the GDP in the year 2018. Shanghai, which is the most developed city in China, is also close to the lake. Hence, any research conducted on Taihu Lake could have potentially significant implications for China.

In recent decades, Taihu Lake has suffered from a eutrophication problem. The water quality in the north and east of the lake has improved, while it has deteriorated in other regions, especially from the 1990s to the 2010s [34,35]. The west eastern corner of the lake is Meiliang Bay, which is eutrophicated, and algae blooms occur frequently in spring and summer. Figure 1 shows the location of Taihu Lake and Meiliang Bay.

**Figure 1.** Location of Taihu Lake.

The research on GHG emissions carried out in this bay shows that there is considerable flux in GHGs that is significantly different from other regions in the lake [14–16]. However, only relationships between a few aquatic factors and the flux of GHGs were simultaneously considered in Meiliang Bay. The nonlinear relationships were also ignored because linear regressions and Pearson correlation analysis were applied. In this paper, the observation data of CO2, CH4, and N2O in this bay (Figure 1 shows the observation site) are analyzed using 14 aquatic factors, including inorganic nitrogen (nitrate, NO3 <sup>−</sup>–N; ammonia, NH4 <sup>+</sup>–N; and nitrite, NO2 <sup>−</sup>–N), phosphorus (phosphate, PO4 <sup>3</sup>−; dissolved total phosphorus (dTP)), the response of nutrients (chlorophyll A ,Chl-a), physical indices of water (water temperature (WT); water depth (WD); and Secchi depth (SD)) and other chemical factors (dissolved oxygen (DO); sulfate, SO4 <sup>2</sup><sup>−</sup>; O2 demand (CODMn); pH; and alkalinity, Alk). The data were sampled once per month from January 2004 to December 2004, and all data have been previously published [20,36].

### *2.2. Statistical Analyses*

The RF method is an improved and robust version of the classified and regression tree (CART) method. It introduces the bootstrapping aggregation approach into CART [37–39] and calculates the predicted values by averaging the results of CART trees on bootstrap samples [40]. The variables used in the RF method should have an independent identical distribution [40]. However, different from the common bagging tree method, the RF method resamples prediction features at every split node to ensure independence among the selected features [41], and the scaled observations would reduce the difference in their distributions. Similar to many other statistical methods, the violation of the property of the identical distribution may not lead to serious consequences [42].

The biggest advantage of RF is that the nonlinear and interaction effects of independent variables can be included. RF has been demonstrated, through practice, to be a successful machine learning method for forecasting [43,44].

Although the RF model was proposed for prediction, it can be used for other purposes. The RF model can determine the quantitative importance of predictors using some indices. The most popular index, which is also used in this paper, is an increase in node purity. The node purity was measured by the Gini index [38,40]. All computations were completed using the R (3.6.1) language [45] with the randomForest package.

After calculating the importance of all aquatic factors for CO2, CH4, and N2O, a comprehensive importance index was developed to further investigate, because the significant correlations between GHG emissions were shown. The comprehensive index for all three GHGs can be given by the following equation:

$$\begin{array}{l} \mathcal{R}\_k = \sum\_{i=1}^3 \frac{1}{r\_{ik}}\\ \mathcal{CRI} = rank(\mathcal{R}\_k) \end{array} \tag{1}$$

where *Rk* is the index whose rank is determined as the comprehensive ranking index (CRI) for the k-th factor and *ri,k* is the importance rank of the k-th factor for the i-th GHG. CRI is a simple, helpful, and widely used index to measure comprehensive importance [46].

### **3. Results and Discussion**

As seen in Table 1, the flux of GHGs showed no significant differences to the observations in previous years for Meiliang Bay [14–16]. The minimum values of the three GHGs were lower than 0, indicating that, at some stages, the lake acts as a sink for these GHGs, which is also in agreement with previous studies [14,15,27]. This induced large fluctuations of GHG emissions compared with the aquatic factors. Based on the observations of CO2, CH4, and N2O emissions, the coefficients of variation (CVs) were respectively 1.64, 1.50, and 1.50. These values imply that the fluxes were strongly influenced by the water environment.


**Table 1.** Statistics of Observations (n = 12).

\* dTP: dissolved total phosphorus, Chl-a: chlorophyll A, WT: water temperature, WD: water depth, SD: Secchi depth, DO: dissolved oxygen, Alk: alkalinity, CODMn: O2 demand.

The values of aquatic factors showed that the water quality was not very poor, while the high mean concentration of Chl-a and the very low SD showed that eutrophication at the site was serious. The large CVs of NH4 <sup>+</sup>–N and NO2 −–N, which were 1.14 and 0.87, respectively, indicated that the release of GHGs might be affected by nitrogen. It was also observed that the concentration of SO4 <sup>2</sup><sup>−</sup> in the lake, which has previously been ignored, was very high. The minimum value of pH was larger than 8.0, meaning the water was weakly alkaline, which would also have affected the production of N2O [47].

Figure 2 shows the scaled time series of the concentrations of the 14 aquatic factors and the flux in the three GHGs in the year 2004. All values were scaled by their mean and standard deviations.

**Figure 2.** Scaled time series in the year 2004.

As shown in Figure 2, the GHG fluxes showed strong seasonal changes. Combined with the results of Table 1, they progressed from one extreme to the other from winter to summer. The maximum values of flux occurred in July, and the data for CH4 and CO2 in September also showed high values, while for N2O, a second peak emerged in October. November was the only month that Meiliang Bay appeared to be a sink for all three GHGs. The observed values of N2O showed slight differences to the data collected from Meiliang Bay in 2017 [15,16], but the characteristics of CH4 were in agreement with those observed in Donghu Lake in China, which is similar to Taihu Lake [28]. This difference might be because N2O is controlled by nitrogen, while CH4 emissions are not. In addition, the relatively high concentration of NO2 <sup>−</sup>–N in October might be the reason for the high concentration of N2O observed in this month.

It was clearly observed that the water quality was better in winter, i.e., from November to January, and this can be explained by the low level of agricultural activity. The high concentrations of NH4 <sup>+</sup>–N, NO3 <sup>2</sup><sup>−</sup>–N, and SO4 <sup>2</sup><sup>−</sup> in December were notable exceptions. Laboratory experiments showed that the low temperature would decrease the activity of nitrifiers and denitrifiers [48,49], and so both NH4 <sup>+</sup>–N and NO3 <sup>−</sup>–N accumulated. The reduction in SO4 <sup>2</sup><sup>−</sup> was also weakened as the sulfate-reducing bacteria (SRB) were also influenced by low WT. References indicate that SO4 <sup>2</sup><sup>−</sup> and SRB are closely linked to nitrogen cycling [50]; thus, the variation of NH4 <sup>+</sup>–N and NO3 <sup>2</sup>−–N in the time series showed the same pattern as that of SO4 2−.

In addition to the time series, strong correlations were also observed between CO2 and CH4 and between CH4 and N2O, with Pearson coefficients being significant at *p* < 0.01 in Figure 3. Denitrification, acetate fermentation, and CO2 reduction, which connect the production of CO2, CH4, and N2O [51] could explain this outcome. The relationship between N2O and CO2 was a little less significant (0.01 < *p* < 0.05). This might be because the production pathways of N2O and CO2 were not involved with each other directly. SO4 <sup>2</sup><sup>−</sup> showed a negative correlation with the fluxes of CO2 and CH4 at *p* < 0.1. There have been few studies on the effects of SO4 <sup>2</sup><sup>−</sup> on GHG emissions, but those that have been done have suggested that SRB could take part in reactions with CH4 [52,53]. SO4 <sup>2</sup><sup>−</sup> would also impact the nitrogen cycle [51], which may be another reason for the significant correlation among the three GHGs.

The crucial aquatic factors were similar for CH4 and CO2, with WT, DO, and Alk being recognized as key factors, which is in agreement with previous studies [20,23,26]. Chl-a was shown to have little effect on GHG emissions, which differs from previous research [14,20]. Having noticed that Chl-a also showed strong correlations with WT, DO, CODMn, pH, and Alk, the weak impact that was observed might be the result of complex interactions amongst the different factors.

Few factors show a correlation with the flux of N2O, with the exception of NO2 −–N. This should not be surprising considering the fact that NO2 - -N is the intermediate product of the denitrification reaction of nitrifying bacteria [54], which produces N2O [55]. However, the effects of NO3 <sup>−</sup> and NH4 + shown by some studies [15–17] may be masked by the effects of NO2 - -N under linear relationships. The influences of SO4 <sup>2</sup><sup>−</sup> on NO3 - -N and NH4 <sup>+</sup>-N were very complex. The strong correlations could be explained by the effects of sulfur and sulfate on NO3 - -N reduction and NH4 <sup>+</sup>-N oxidation [51]. We will describe and summarize these reaction details after the results of RF have been shown.

Figure 4 shows the importance of aquatic factors on the three GHGs, measured by an increase in the Gini index in the RF models. The explained variance of the three models was 80.4%, 86.2%, and 75.1%, respectively, for CO2, CH4, and N2O, and these results imply that the RF models were adequate for exploring the crucial factors.

**Figure 4.** The importance order of aquatic factors for greenhouse gas (GHG) emissions using the random forest (RF) method.

As seen in Figure 4, some RF results were in agreement with the Pearson coefficients, while others were not. The results of RF showed that the five most important aquatic factors for the three GHGs were similar. WT was the first key variable implicated in the flux of CO2 and CH4, while it was also second most important for N2O emissions. This is because methane bacteria choose different methanogen metabolic pathways [56,57] under different temperatures. The nitrifiers and denitrifiers are also sensitive to temperature [49]. This outcome was in agreement with the results of the Pearson coefficients (Figure 3) and were also analogous with the results of other field studies [23,24,58].

DO also played an important role in the emissions of the three GHGs. Methane bacteria are a diverse group of strict anaerobes [59] and are, therefore, greatly influenced by DO. The two main pathways for producing CH4, acetate fermentation and CO2 reduction, are both associated with CO2 [51,57]. This may be part of the reason why DO also impacts CO2 production. The results of the linear correlations and field observations also confirmed the effects of DO [20,56]. In addition, the observations showed that Alk would impact the carbon dioxide partial pressure [60] and anaerobic digestion [61], so Alk greatly influences the flux of CO2 and CH4. For N2O emissions, although both nitrification and denitrification would produce N2O, DO has a dominant influence in determining the pathway. This can explain the importance of DO for N2O [15,16].

Nitrogen compounds, including NO3 <sup>−</sup>–N, NO2 <sup>−</sup>–N, and NH4 <sup>+</sup>–N, were shown to be important for all three GHGs. It is easy to understand the effects of them on N2O, while it should be noted that NO2 −–N was shown to be more important. This may be because, in all four main pathways, the producing NO2 <sup>−</sup>–N are closer to N2O than NO3 −–N [62]. Both denitrification and dissimilatory nitrogen reduction to ammonium (DNRA) oxidize organic matter and then produce CO2 [51,59], so the importance of nitrogen for CO2 can be rationalized. Methanotrophs outcompete nitrifiers for O2 when CH4 is sufficiently abundant, as more energy can be released from oxidizing methane than from oxidizing NH4 <sup>+</sup> [51]. This is a good explanation for the negative relationship between NH4 <sup>+</sup> and CH4 shown by the Pearson coefficients, as well as the importance of nitrogen in the results of the RF method and field observations [21].

SO4 <sup>2</sup><sup>−</sup> was crucial in determining the flux of all three GHGs (Figure 4). This seemed a little strange as SO4 <sup>2</sup><sup>−</sup> has been often taken for granted when assessing GHG emissions from the lake during field studies. However, the result should not be surprising. On the one hand, NO3 − may be used in the oxidation of reduced sulfur (S<sup>0</sup> or S2−) and the production of SO4 <sup>2</sup>−. These processes may occur in preference to DNRA and denitrification [51,59]. On the other hand, SO4 <sup>2</sup><sup>−</sup> reduction by SRB could also produce CO2 [51]. Additionally, observations in freshwater wetlands indicated that SO4 <sup>2</sup><sup>−</sup> input would suppress CH4 flux because of the higher energy alternative provided by SO4 <sup>2</sup><sup>−</sup> reduction [63,64]. In summary, SO4 <sup>2</sup><sup>−</sup> plays an important role in CO2, CH4, and N2O production.

Generally speaking, the effects of nitrogen compounds, SO4 <sup>2</sup><sup>−</sup> and DO on GHG emissions is very complex and is summarized as concept models as follows.

As seen in Figure 5, the DO, WT, and ALK are conditions that affect the reactions. For example, DO determines if anaerobic or aerobic oxidation can take place, and it also chooses what type of reduction will happen. WT is also very important and it can affect the activities of SRB, nitrifiers, and denitrifiers. Different from these conditions, the NO3 <sup>−</sup>–N, NO2 <sup>−</sup>–N, NH4 <sup>+</sup>–N, and SO4 <sup>2</sup><sup>−</sup> are involved with the production of GHGs directly. From the models, SO4 <sup>2</sup><sup>−</sup> participates in the production of all three GHGs simultaneously, which highlights its importance and its complex effects on GHG emissions.

**Figure 5.** Concept models for GHG emissions. (**a**) Concept model for CH4 and CO2; (**b**) Concept model for N2O.

The significant Pearson coefficients among CO2, CH4, and N2O highlight the necessity for working out a comprehensive importance index for the flux of all three GHGs. The CRI values of the five most important aquatic factors calculated by Equation 1 are shown in Table 2.


**Table 2.** The comprehensive importance ranking index for GHG emissions.

\* CRI: comprehensive ranking index defined in equation 1.

The results of CRI showed that WT, NO2 <sup>−</sup>–N, SO4 <sup>2</sup>−, DO, and Alk are the five most crucial aquatic factors influencing the flux of GHGs. The importance and positive relationships between WT and GHG emissions remind us that the largest flux should appear in summer. Perhaps the emissions will become larger and larger as global warming progresses [1]. NO2 <sup>−</sup>–N is not very important for CO2 and CH4, while it is still the second crucial factors for GHG emissions because of its importance for N2O production. Compared with WT and NO2 <sup>−</sup>–N, SO4 <sup>2</sup><sup>−</sup> is the third key factors for the flux of GHGs because of its importance in all three. Maybe this is an indication that it is feasible to control GHG emissions by increasing the concentration of SO4 <sup>2</sup><sup>−</sup> in lake bays.

### **4. Conclusions**

GHG emissions, which lead to serious ecological problems, have attracted widespread attention. The estimation of crucial aquatic factors in the flux of GHGs in lakes has played a key role in reducing GHG emissions. In this paper, RF methods, taking into account nonlinear effects and interaction effects of factors, were employed to identify the crucial factors among 14 aquatic variables in the flux of GHGs in a eutrophicated lake bay.

The results showed significant positive correlations between the fluxes of CO2 and CH4, which were shown to be affected by similar factors, while there was little difference for N2O. WT, SO4 <sup>2</sup>−, Alk, DO, and NO3 <sup>−</sup>–N were identified as the five key factors in CO2 emissions, while for CH4, the key factors were WT, SO4 <sup>2</sup><sup>−</sup>, DO, Alk, and NO2 <sup>−</sup>–N. The outcome that NO2 −–N is the most crucial factor for N2O emissions while NO3 <sup>−</sup>–N is the fifth showed the importance of nitrogen in the flux of N2O. Apart from these common factors, SO4 <sup>2</sup>−, which has been previously ignored, was also shown to play an important role in GHG emissions. It is the second most influential factor for CO2 and CH4, and the fourth factor for N2O. The concept models showed that SO4 <sup>2</sup><sup>−</sup> had very complex effects on the production of CO2 and CH4, as well as on the nitrogen cycle.

The outcomes of the comprehensive ranking index for the flux of all three GHGs have also been shown. WT, NO2 <sup>−</sup>–N, SO4 <sup>2</sup>−, DO, and Alk were found to be the five most crucial aquatic factors. Compared with WT and Alk, the remaining factors are easier to manage by engineered measures. A comprehensive analysis of the results show that increasing the DO might be the most effective way of controlling GHG emissions in eutrophication lakes. Apart from the direct benefits of increasing DO, such as reducing the fluxes of CO2 and CH4, N2O emissions should also reduce, led by the decrease in the concentration of NO2 <sup>−</sup>–N. It seems that a higher SO4 <sup>2</sup><sup>−</sup> concentration would also be good for decreasing GHG emissions, but this can be a dilemma for water quality managers because there is evidence that excess SO4 <sup>2</sup><sup>−</sup> can lead to black blooms in shallow lakes [65].

This study provides useful information for controlling GHG emissions in eutrophicated shallow lake bays. However, there is still work to be done. The quantitative mechanism model for water factors and GHG emissions in shallow lake bays is a very important topic for GHG emission reduction. This model will become more detailed as research continues. The smooth linear regression in Figure 3 suggests that there are threshold points for these relationships. The existence of threshold points indicates the necessity of investigating these crucial factors in GHG emissions using advanced methods. The role of SO4 <sup>2</sup><sup>−</sup> should also receive more attention in future studies.

**Author Contributions:** Yulin Wang conducted the research and completed the first manuscript of the paper. Liang Wang and Chengda He guided the research and improved the manuscript. Haomiao Cheng and Jilin Cheng collected the data and created the figures.

**Funding:** This work was financially supported by the National Key R & D Program of China (Grant No.2017YFC0403205), the National Natural Science Foundation of China (Grant Nos. 51909230 and 51809226), the China Postdoctoral Science Foundation Funded Project (Grant No. 2018M632390), the Jiangsu Planned Projects for Postdoctoral Research Funds (Grant No.2018K124C), and the Jiangsu Funded Recruitment of Postdoctoral Project (Grant No. 2018Z051).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **A Multivariate Geomorphometric Approach to Prioritize Erosion-Prone Watersheds**

**Jesús A. Prieto-Amparán 1, Alfredo Pinedo-Alvarez 1, Griselda Vázquez-Quintero 2, María C. Valles-Aragón 2, Argelia E. Rascón-Ramos 1, Martin Martinez-Salvador <sup>1</sup> and Federico Villarreal-Guerrero 1,\***


Received: 15 August 2019; Accepted: 15 September 2019; Published: 19 September 2019

**Abstract:** Soil erosion is considered one of the main degradation processes in ecosystems located in developing countries. In northern Mexico, one of the most important hydrological regions is the Conchos River Basin (CRB) due to its utilization as a runoff source. However, the CRB is subjected to significant erosion processes due to natural and anthropogenic causes. Thus, classifying the CRB's watersheds based on their erosion susceptibility is of great importance. This study classified and then prioritized the 31 watersheds composing the CRB. For that, multivariate techniques such as principal component analysis (PCA), group analysis (GA), and the ranking methodology known as compound parameter (*Cp*) were used. After a correlation analysis, the values of 26 from 33 geomorphometric parameters estimated from each watershed served for the evaluation. The PCA defined linear-type parameters as the main source of variability among the watersheds. The GA and the *Cp* were effective for grouping the watersheds in five groups, and provided the information for the spatial analysis. The GA methodology best classified the watersheds based on the variance of their parameters. The group with the highest prioritization and erosion susceptibility included watersheds RH24Lf, RH24Lb, RH24Nc, and RH24Jb. These watersheds are potential candidates for the implementation of soil conservation practices.

**Keywords:** prioritization; geomorphometric parameters; compound parameter; geospatial distribution; GIS

### **1. Introduction**

Soil erosion is considered one of the most important degradation processes in the world [1,2]. The soil resource is limited and its wide use is of utmost importance; it sustains biogeochemical processes and is the habitat for a great diversity of microorganisms [3]. Sustained soil development, conservation, and restoration is one of today's main challenges for humankind.

Hydric erosive processes affect the fertile soil layer, which is a key factor in the primary production of ecosystems [4]. The production of goods and satisfiers for the population such as wood, food, fiber, fodder, water, and recreational areas, among others, in addition to industrial expansion and the need for infrastructure facilities, have increased land-use/land-cover changes, increasing the pressure over the soil [5]. This has caused experts to pay more attention to the growing trend of soil erosion and the importance of water and soil conservation for achieving sustainable development.

Integrated watershed management is an alternative for soil management [6–8]. Watersheds are one of the spatial units that are used for the planning and management of soil resources [9]. Management implies the characterization of the ecosystems inside the watershed and the understanding of the relationships between uplands, lowlands, land use/land cover, geomorphic processes, slope, and soil [10, 11]. In watershed management, erosion control is one of the main components [3]. Thus, the hydrological planning and monitoring of a watershed is important for the development of environmental policies [12]. In this sense, the quantification of the watersheds' characteristics is fundamental to understanding their dynamics and degradation levels. This knowledge serves to define and implement strategies to prevent soil erosive processes and promote the conservation and restoration of watersheds [13].

Morphometry is used in the analysis of the watershed configuration [14]. Such methodology was developed by Horton [15] and then modified by Strahler [16], and provides information on the behavior of the basin [17]. It is an important tool for identifying and prioritizing eroded watersheds [18].

Nevertheless, monitoring soil erosion *in situ* is costly and time consuming in large watersheds. Thus, the analysis of geomorphometry is often carried out based on geographic information systems (GIS) [19–23]. On a spatial scale, geomorphometric parameters, e.g., the Gravelius compactness coefficient [24] and elongation ratio [25], among others, are important to know the hydrological configuration of watersheds. The relationships among these parameters are useful for developing hydrological models, which allow prioritizing watersheds based on their condition, such as erosion susceptibility. To determine the aforementioned relationships, statistical methods, such as multivariate techniques, have been widely used worldwide [26,27]. For instance, Gavit et al. [28] characterized 13 geomorphometric parameters in 11 watersheds located in the Godavari river in Maharashta, India. Youssef et al. [29] estimated the erosion risk by using remote sensing technology, GIS, and geomorphometric parameters in 11 watersheds located in Sinai, Egypt. Makwana and Tiwari [30] used seven geomorphometric parameters to characterize 19 watersheds in the region of Gujarat, India. Sharma et al. [31] applied the multivariate technique of principal component analysis (PCA) to 13 geomorphometric parameters from eight watersheds located in the Madhya district, India. Meshram and Sharma [32] and Farhan et al. [33] used PCA to analyze the geomorphometric parameters of a group of watersheds located in the Shakkar Catchment River in India and Jordan.

A large number of shape, relief, and hydrological parameters are associated with watershed geomorphometry [34,35]. The statistical techniques of PCA and group analysis (GA) [36–41], as well as multivariate analysis of variance (MANOVA) [41] and the ranking methodology known as compound parameter (*Cp*) [42] have been widely used in recent years for the analysis of environmental data from watersheds. These techniques assist with analyzing the spatial variability of watersheds, their structure, as well as the relationships existing among them.

The most important basin in the state of Chihuahua, as a runoff source [43], is the Conchos River Basin (CRB). Yet, this basin has experienced water stress conditions during the past years. According to Ordoñez [44], approximately 8000 km2 (11.8%) of the basin high lands present strong erosion problems, which could impact waterflow and water quality. In these high lands, deforestation and land-use/land-cover changes had contributed to reduce the amount of infiltrated water, impacting on groundwater availability [43]. In the low lands of the basin, agriculture consumes 90% of the water harvested in the basin. Other consumers include the industrial and domestic sectors [43,45]. In addition, the international water trade between Mexico and the U.S. from 1944 commands Mexico to deliver annually from this basin a total of 432 <sup>×</sup> 106 m3 of water to the U.S. [46]. Therefore, specific knowledge about water management and the status of the basin's soil erosion is required to implement strategies to solve water-related problems and to promote the sustainable development of the region.

The objective of this study was: (1) to describe the behavior of the 31 watersheds located along the CRB in the state of Chihuahua, Mexico, based on the values of their geomorphometric parameters; (2) to spatially classify the 31 watersheds into groups; and (3) to prioritize the groups of basins according to their erosion susceptibility. For the grouping, multivariate techniques and the compound parameter (*Cp*) were used and their efficacy compared.

### **2. Materials and Methods**

### *2.1. Study Area*

The CRB is located in the state of Chihuahua and Durango, Mexico, and is part of the 24th Rio Bravo-Conchos Hydrological Region [47] (Figure 1). The basin has an area of 67,800 km2 [48], with an altitudinal distribution ranging from 841 m to 2348 m [49]. It presents a diversity of climates ranging from temperate in the upper, semi-arid in the middle, and arid in the lower parts of the basin [50]. The physiography of the upper basin belongs to the mountainous zone made up of temperate forests dominated by species of pines and oaks. The middle part of the Altiplano or central valleys is made up of transition zones where oaks and bushes are present. Regarding the lower part, it belongs to the arid region and is made up of shrublands and grasslands [51]. The basin has a precipitation regime from June to September, with July and August being the wettest months, and fluctuating from 200 mm to 700 mm [48].

**Figure 1.** (**a**) Land-use/land-cover types of the Conchos River Basin (CRB), (**b**) Delimitation of the 31 watersheds of the CRB, (**c**) Location of the CRB in Mexico.

### *2.2. Data*

Data of the CRB was obtained from the online GIS source of CONABIO [52]. Likewise, the data of the 31 watersheds composing the CRB (Figure 2) were obtained from the Watershed Water Flows Simulator [53]. The relief and hydrology type parameters were estimated by processing the necessary data from a Digital Terrain Model (DTM), with a resolution of 15 × 15 m, downloaded from INEGI [54]. The values of the basic parameters from the watersheds were obtained by using the Hydrology tool [55] of ArcMap© 10.3 (ESRI, Redlands, CA, USA; https://wwwesri.com/en-us/home). The values of the shape, relief, and linear type parameters were calculated from the equations listed in Table 1.


**Table 1.** Geomorphometric parameters.

<sup>1</sup> The basic parameters were calculated with the GIS software.

Basic parameters include the area (*A*), perimeter (*P*), watershed length (*Lb*2), stream order (*u*), main channel length (*Lc*), all channel lengths (*Lu*), and contour length (*Li*). The area and perimeter are the most important parameters of the watersheds to understand their hydrological design and reflect the volume of water that can be discharged in a rainfall event [61].

Shape parameters include the Gravelius compactness coefficient (*Cc*), elongation ratio (*Re*), shape factor (*Rf*), elongation index (*Ia*), unit shape factor (*Ru*), and circularity ratio (*Rc*). The Gravelius compactness coefficient is the relationship between the perimeter of the watershed and the perimeter

of a circle of area equal to that of the watershed [24]. Low values of the Gravelius compactness coefficient, shape factor, and elongation index indicate a more elongated watershed, while high values correspond to a less elongated watershed. Watersheds with less elongated shapes have a shorter maximum flow duration, while elongated watersheds correspond to watersheds with longer flow duration [24]. The elongation ratio is the diameter of a circle with an area equal to that of the watershed divided by the length of the watershed [32]. Those watersheds with values close to or greater than the unit correspond to flat watersheds, while values that move away from the unit are watersheds with pronounced relief [25]. The shape factor is the relationship between the area and the length of the watershed [15]. The elongation index is also a relationship between the length and width of the watershed [15]. The unit shape factor is the relationship between the length and the area of the watershed. Values less than 2 indicate that they have weak flood discharge periods, while values greater than 2 indicate that their flood discharge is strong [30]. The circularity ratio is defined as the relationship between the area and the perimeter of the watershed. If the circularity of the watershed is low, then discharge will be slow, and the susceptibility to erosion will be low [61].

**Figure 2.** Classification of 31 watersheds based on the values of five principal components resulting from a principal component analysis. Conchos River Basin, Chihuahua, Mexico. (**a**) PC1, (**b**) PC2, (**c**) PC3, (**d**) PC4, (**e**) PC5. PC1 = Principal component 1, PC2 = Principal component 2, PC3 = Principal component 3, PC4 = Principal component 4, PC5 = Principal component 5.

Relief parameters include the mean watershed slope (*J*), massivity coefficient (*tg*α), relief ratio (*Rh*), relative relief (*Rr*), and orographic coefficient (*Co*). The mean watershed slope indicates the degree of the terrain roughness. As the slope increases, the watershed will be prone to erosion. The massivity coefficient represents the relationship between the mean watershed height and its surface area, which is expressed as a percentage. Small values of the massivity coefficient correspond to very mountainous watersheds, while large values correspond to watersheds from moderately mountainous

to flat. The relief ratio is directly related to the slope of the currents and the watershed surface, being an indicator of hydrological processes and erosion. The relief ratio, similar to the relative relief, has a direct correlation with the watershed erosion processes [25].

Linear parameters include drainage density (*Dd*), mean slope of the main channel (*j*), mean distance (*Am*), sinuosity of the main channel (*Scp*), Kirpich concentration time (*Tc*), average peak flow (*Qp*), texture ratio (*T*), rivers frequency (*Fu*), resistance number (*Rn*), general flow length (*Lo*), and drainage intensity (*Di*). High drainage density values indicate a high current density, and therefore a rapid response to precipitation events [62]. The mean slope of the main channel indicates the slope of the longest channel in the watershed. The high values of this parameter indicate that sediment flow and entrainment will quickly exit the watershed [15]. The sinuosity of the main channel indicates the velocity of flow movement in the channels. The lowest values of sinuosity correspond to channels where the flow travels at greater speed, whereas the channels with the highest values of sinuosity indicate the slow movement of the flow [57]. The Kirpich concentration time is the time when a drop of water falls at the furthest point until it reaches the exit point [58]. Average peak flow is defined as the mean maximum amount of water passing through a specific section [59]. The texture ratio corresponds to the relationship between the total number of streams and the watershed perimeter. Rivers frequency represents the total number of streams of all orders per unit area [15]. The resistance number is used to measure the flood potential of rivers. It has a direct relationship with erosion, where increasing its value would represent an increment in erosion susceptibility [15].

### *2.3. Watershed's Description and Classification*

Prior to the watersheds' classification, their geomorphometric parameters were correlated [6,63]. Correlation indicates when part of the information contained in a set of geomorphometric parameters is also contained in the remaining ones [32]. That served to reduce the number of parameters included in the subsequent analysis.

To describe the variability of the geomorphometric parameters, principal component analysis (PCA) was performed. The technique of PCA reduces the data dimensionality, simplifies the dataset, and makes it easier to explain through a set of new principal components (PCs) [64,65]. The first principal component is the linear combination of the original geomorphometric parameters that contributes the most to the total variance; the second principal component, not correlated to the first, contributes the most to the residual variance, and so on until the total variance is analyzed. The PCs are orthogonal variables that could be obtained by multiplying the original variables, which are correlated, with coefficients similar to the ones described in Equation (1):

$$z\_{i\rangle} = a\_{i1}\mathbf{x}\_{1\rangle} + a\_{i2}\mathbf{x}\_{2\rangle} + \dots + a\_{im}\mathbf{x}\_{mj} \tag{1}$$

where *z* represents the coefficient of the component, *a* represents the weight of the component, *x* represents the measured value of the variable, *i* corresponds to the component number, *j* represents the sample number, and *m* represents the total number of variables.

The PCA was applied to the values of geomorphometric parameters to calculate the correlation matrix and to obtain the PCs [66]. Both the correlation analysis and the PCA were performed in the SAS© 9.1.3 software (The SAS Institute Inc., Cary, NC, USA, https://www.sas.com/en\_us/home.html). Then, the PCs were mapped for interpretation.

For the classification, the first method used for the watersheds was a hierarchical group analysis, which was based on the Ward criterion [67]. The Ward criterion was applied to the GA by using the square Euclidean distance to explore the clustering of the 31 watersheds. The definition of the watershed groups was performed based on the coefficient of determination (R2) [68]. Finally, a multivariate analysis of variance (MANOVA) was used to analyze whether significant multivariate differences exist between the groups based on the values of their geomorphometric parameters [69].

The second classification method considered in this work was the compound parameter (*Cp*). Previous research has employed this approach for sustainable watershed planning and management [42]. Linear and shape parameters have been commonly used for this method, whereas the relief and basic parameters were additionally included in this study. Linear geomorphometric parameters have a direct relationship with erosion susceptibility, where high values are more likely to result where high erosion probabilities are present [61,70]. Thus, for watershed classification, the highest value of linear parameters was ranked as 1, the second highest was ranked as 2, and so on. Conversely, shape parameters have an inverse relationship with erosion [61,70], and low values are related to high susceptibility to erosion and vice versa. Then, the lowest value of the shape parameters was ranked as 1, the next lowest value was ranked as 2, and so on. Regarding the relief and basic parameters, the highest value was ranked as 1, the second highest value was ranked as 2, and so on. After this procedure was completed, the ranked values from each watershed were summed and then averaged to produce the *Cp* of each watershed. This average represents the collective impact of all the parameters, and is calculated according to Equation (2) [42]:

$$C\_P = \frac{1}{n} \sum\_{i=1}^{n} R\_i \tag{2}$$

where *Cp* is the compound parameter, *R* is the ranked value of each parameter, and *n* is the number of parameters.

Based on the *Cp*, the highest priority was assigned to the watersheds with the lowest *Cp* value, the second priority was assigned to the next higher *Cp* value, and so on. Then, the *Cp* was classified into five categories or groups of erosion susceptibility. The categories were defined as: Very High (Group 5), High (Group 4), Moderate (Group 3), Low (Group 2), and Very Low priority (Group 1), similar to classifications made in previous studies [71].

### *2.4. Comparison of the Classification Methods*

To compare the grouping methods, the following procedure was followed.

First, an ANOVA was carried out for each geomorphometric parameter (independent variable), and separated for each grouping method. The source of variation or class was considered to be the group. Such analyses served to determine possible significant differences among the groups within each grouping method. After the ANOVA analyses were completed, the grouping method that achieved the highest number of significant *p*-values (α < 0.05) was considered the most effective for grouping the watersheds of the CRB.

A list of abbreviations can be found in Appendix B (Table 3).

### **3. Results**

### *3.1. Characterization of Conchos River Basin Watersheds*

The values of the basic, shape, relief, and linear-type parameters from the 31 watersheds used in this study are shown in Table A1. Watershed RH24Ia has the smallest area and perimeter with 78.62 km2 and 50.59 km, respectively. Meanwhile, the watershed with the largest area and perimeter is RH24Lb, with 5428.29 km<sup>2</sup> and 640.77 km, respectively. The watershed with the highest stream order is RH24Kb, while watershed RH24Lb has the longest main channel. First-order streams do not have tributaries, and their flow depends on the secondary surface contributions that converge to them [61]. The watershed with the highest number of channels of order one is RH24Ia, whereas the watershed with the lowest number of channels is RH24Jb. The watershed lengths vary from 11.17 km to 133 km. Watershed RH24Kb presents the largest elongation ratio value, indicating that it is the flattest, while watershed RH24Ib has the lowest value for this parameter, indicating that it has the steepest slope [25]. The values of sinuosity of the main channel vary from 0.05 km to 4.1 km. The watershed

with the shortest Kirpich concentration time is RH24Kb, while watershed RH24Lb has the longest. The lowest average peak flow value corresponds to watershed RH24Ia, whereas watershed RH24Lb presents the highest. The texture ratio values are between 4.98–75.1, which are considered as moderate to high values; the low values correspond to watershed RH24Na, while the highest value belongs to watershed RH24Lg, so the former is not susceptible to erosion, while the latter is. The watershed with the lowest resistance value is RH24Na, while the watershed with the highest value is RH24Lg.

### *3.2. Correlations and Principal Component Analyses*

The data matrix of the 31 watersheds and the 33 parameters, which included the basic shape, linear, and relief type parameters, were analyzed through a correlation analysis. A set of parameters showing high correlations were identified. From each pair of highly correlated parameters, only one parameter was chosen, and the rest were eliminated to reduce the data dimensionality. After the reduction, the final number of geomorphometric parameters was 26, as listed in Table 2.

The PCA was performed on the 26 parameters and showed that the first five principal components were the most important for explaining the data variance (Figure 2). The most important parameter was selected according to its contribution to the principal component, as it is shown by the values of the eigenvectors (in bold) of the correlation matrix (Table 2). The first five principal components accounted for 88.44% of the total variance of the dataset. The linear parameters (hydrology) are the ones mainly explaining the CRB behavior (Table 2).

**Table 2.** Principal components and geomorphometric parameters explaining the greatest portion of variance in the watersheds of the Conchos River Basin, Chihuahua, Mexico.


PC = Principal component, First = Primary importance, Second = Secondary importance.

### *3.3. Watershed's Classification Based on Group Analysis*

Figure 3 shows the dendrogram, resulting from the group analysis (GA) of the 31 watersheds. Five groups were identified based on their basic, relief, shape, and hydrology-type parameters, and considering a value of R<sup>2</sup> = 0.84. Group 1, with seven watersheds, has the largest amount of low values for the shape, relief, and linear-type parameters, as can be verified in Table 3. Group 2, with eight watersheds, presents the lowest average values of drainage density, sinuosity of the main channel, and general flow length. Group 3, also with eight watersheds, showed the highest values of elongation ratio, drainage density, mean slope of the main channel, and general flow length. Group 4, with four watersheds, has the highest values in maximum and minimum height, mean slope of the watershed, mean distance, unit shape factor, resistance number, and drainage intensity. In this group, the lowest values correspond to the elongation ratio. Group 5, also with four watersheds, presents the largest amount of high values for the basic, shape, relief, and linear-type parameters. The multivariate analysis of variance (MANOVA) showed significant differences among the groups of watersheds, showing a value of Wilks' lambda equal to 0.0025, with a value of *p* < 0.0001.

**Figure 3.** Dendrogram classifying 31 watersheds by group analysis. Conchos River Basin, Chihuahua, Mexico. The red line was drawn to define the groups.


**Table 3.** Average values of the geomorphometric parameters by group.

Gid = Group identification, 1 = Group 1 (Very low erosion susceptibility), 2 = Group 2 (Low erosion susceptibility), 3 = Group 3 (Moderate erosion susceptibility), 4 = Group 4 (High erosion susceptibility), 5 = Group 5 (Very high erosion susceptibility)*, Lc* = Length of main channel, *Lb*<sup>2</sup> = Length of watershed, *Li* = Length of contour lines, *Lu* = Length of channels, *Hmin* = Minimum height, *Hmax* = Maximum height, *Cc* = Gravelius compactness coefficient, *Re* = Elongation elation, *Rf* = Form factor, *Ia* = Elongation index, *J* = Mean slope of watershed, *tg*α = Mass coefficient, *Dd* = Drainage density, *j* = Mean slope of main channel, *a* = Medium distance, *TcK* = Kirpich concentration time, *Scp* = Sinuosity of main channel, *Qp* = Average peak flow, *T* = Texture ratio, *Ru* = Unitary shape factor, *Fu* = River frequency, *Rn* = Resistance number, *Rh* = Relief ratio, *Rr* = Relative relief, *Lo* = Length of general flow, *Di* = Drainage intensity.

The geospatial distribution of the groups is shown in Figure 4. Group 1 shows a homogeneous pattern in its distribution, which is concentrated in the central part of the study area. In contrast, Group 2 shows a dispersed distribution, mainly at the edges of the CRB. Group 3 is distributed in the northern, central, and southern parts, and is represented by small clusters of two or three watersheds. Group 4 corresponds to watersheds spatially dispersed over the basin. The watersheds of these groups are isolated from the other watersheds of the same group. Group 5 shows watersheds clustered in the southern part of the basin, except for the watershed RH24Jb, which is located in the northern region. The GA showed a clustered geospatial pattern for Groups 1, 3, and 5, who share characteristics in their parameters and space.

**Figure 4.** Geospatial distribution of watershed groups classified by group analysis. Very High (-), High (-), Moderate (-), Low (-), Very Low (-). Conchos River Basin, Chihuahua, Mexico.

### *3.4. Watershed's Classification Based on the Compound Parameter (Cp)*

Considering the 26 geomorphometric parameters selected after the correlation analysis was performed, the value of the compound parameter (*Cp*) was calculated for the 31 watersheds of the CRB (Figure 5a). The watershed RH24Lg received the highest priority (1), followed by the watershed RH24Le (2). The watershed with the lowest priority (31) was watershed RH24Kg. A high priority is an indicator of a high degree of erosion susceptibility in the watershed.

**Figure 5.** (**a**) Watersheds and their compound parameter (*Cp*). (**b**) Geospatial distribution of watershed groups by *Cp* reclassification. Very High 1 (-), High 2 (-), Moderate 3 (-), Low 4 (-), Very Low 5 (-). Conchos River Basin, Chihuahua, Mexico.

The resulting *Cp* map (Figure 5a) was reclassified into the following five categories: Very High, High, Moderate, Low, and Very Low (Figure 5b). The spatial distribution of the groups was reclassified by natural breaks [71]. The watersheds classified as Very High show a homogeneous pattern in their distribution in the southwestern part of the watershed. Meanwhile, the watershed RH24Jb is isolated in the northwestern part. The High, Moderate, and Low classes show a dispersed distribution, with at least two of their watersheds clustered in space. The Very Low class shows a homogeneous distribution in space in the southeastern part of the study area, with only one dispersed watershed (RH24Hf).

### *3.5. Comparison of the Classification Methods*

Regarding the GA classification method, the results from the ANOVA analyses performed on 14 geomorphometric parameters, out of 26, detected significant differences among the groups defined by the method. In the case of the *Cp* classification method, the ANOVA analysis of only two parameters detected significant differences among the groups defined by this classification method (Table 4).

**Table 4.** *p*-values resulting from the ANOVA analyses performed to detect differences among the groups defined by each classification method.


GA = Group analysis, *Cp* = Compound parameter, *Lc* = Length of main channel, *Lb*<sup>2</sup> = Length of watershed, *Li* = Length of contour lines, *Lu* = Length of channels, *Hmin* = Minimum height, *Hmax* = Maximum height, *Cc* = Gravelius compactness coefficient, *Re* = Elongation elation, *Rf* = Form factor, *Ia* = Elongation index, *J* = Mean slope of watershed, *tg*α = Mass coefficient, *Dd* = Drainage density, *j* = mean slope of main channel, *a* = medium distance, *TcK* = Kirpich concentration time, *Scp* = Sinuosity of main channel, *Qp* = Average peak flow, *T* = Texture ratio, *Ru* = Unitary shape factor, *Fu* = River frequency, *Rn* = Resistance number, *Rh* = Relief ratio, *Rr* = Relative relief, *Lo* = Length of general flow, *Di* = Drainage intensity.

### **4. Discussion**

The prioritization of watersheds, based on susceptibility to erosion, has been carried out in different regions of the world [72,73], using different prioritization methods [74]. This study contributes to the lack of knowledge regarding the susceptibility to erosion in northern Mexico. This was assessed by implementing two methods for prioritization based on the analysis of a set of 33 parameters, which differ from other studies [5,7,75]. The inclusion of several geomorphometric parameters and their relationships within several connected watersheds enriched the study of their erosion susceptibility. In this sense, multivariate techniques have proved to be appropriate methods for establishing priorities, reducing the dimensionality of the dataset by losing the least amount of information [76].

This study integrated a multivariate analysis of several geomorphometric parameters that served to identify those watersheds, which may be prone to erosion. That was possible by evaluating the behavior of such geomorphometric parameters and representing them in a geospatial basis [77,78]. Their relationships provided significant information about the main sources of variability among the studied watersheds [74]. Previous research studies have reported that topography, geomorphology, and land use/land cover are the most important factors in the watershed susceptibility to erosion [79–82]. In this study, the factors with the greatest influence on the hydrological behavior of watersheds and their erosion susceptibility were the average peak flow and the all channel lengths, as it has also been found in previous studies [6,83].

The PCA is considered a statistical exploratory technique, whose results have helped explain the distribution of environmental attributes [69]. Results from the PCA were useful to identify the sources of variance, which were mainly represented by the dominant parameters influencing the data structure. Then, the basin's hydrological configuration was explained by those geomorphometric parameters explaining the greatest portion of the variance among the watersheds. The PCA results from this study are consistent with the observations made by Meshram and Sharma [32] and Farhan et al. [33].

From the PCA analysis, PC1 and PC2 are mainly influenced by linear geomorphometric parameters. Some of the linear parameters with an influence on PC1 are the average peak flow. This is shown in Figure 3, where the lowest PC1 coefficients correspond to the watersheds with the lowest mean slope values of the small channels. Regarding PC2, drainage density is one of the linear parameters with an influence. Watersheds with low drainage density indicate the presence of permeable surface material, good vegetation cover, and low relief [84,85]. The map of PC2 (Figure 3) was highly influenced by drainage density, since the watersheds with low values of this parameter are located in the south–central part of the study area and grouped in such a map.

PC3 and PC4 are influenced by linear parameters such as mean distance and shape parameters such as elongation ratio, as well as topographic parameters such as minimum height and mean slope. These factors are associated with the main channel, relief, and slopes, among others. In Figure 3, the watersheds with the greatest heights and slopes correspond to the watersheds located in a mountainous zone, while the watersheds with the lowest elevations and slopes correspond to the arid and semi-arid zones of the state of Chihuahua [86]. Regarding PC5, it is mainly influenced by the elongation index (shape parameter) and the general flow length (linear parameter). The high values of the elongation index correspond to enlarged watersheds, which are related to high drainage densities. A watershed with a high drainage density implies a quick hydrological response to rainfall events, while non-enlarged watersheds correspond to fan-shaped watersheds, which are characterized by short channels [87].

Group analysis (GA) was one of the methodologies used in this study to group and then prioritize the watersheds. It was useful to relate watersheds that share the same characteristics based on their geomorphometric parameters. The groups delineated by the analysis have a unique combination in terms of their geomorphometric attributes [40]. The groups of watersheds follow a territorial pattern. Group 1 includes watersheds located in the zones with the least slopes, where the predominant economic activity is agriculture. Group 2 and 3 belong to watersheds located in transition zones because of their moderate slopes. Groups 4 and 5 are watersheds with rugged topography, with vegetation of shrublands and oak forests.

The compound parameter (*Cp*) was the second methodology employed to prioritize the watersheds. The value of *Cp* was calculated for each of the 31 watersheds composing the CRB (Figure 5a). Based on the value of *Cp,* watershed RH24Hf received the highest priority (1), followed by the watershed RH24Le (2). By comparing the results from GA and *Cp*, Group 4 was identically integrated by the two methodologies. This group is characterized by watersheds having the highest average values of maximum and minimum height, elongation ratio, elongation index, mean watershed slope, slope of the main channel and unit shape factor. The high values of these parameters correspond to watersheds with a high erosion susceptibility [88]. Conversely, Group 5 was formed by watersheds having the highest values of main channel length, watershed length, contour length, all channel lengths, Gravelius compactness coefficient, shape factor, massivity coefficient, mean distance, Kirpich concentration time, sinuosity of the main channel, average peak flow, texture ratio, river frequency and resistance number. This coincide with high values of *Cp*, which correspond to watersheds distributed in the southwestern zone of the study area and may have a low erosion susceptibility [61].

The two prioritization schemes used in this study gave similar results according to the spatial distribution of watershed groups. The prioritization of watersheds, obtained through GA and *Cp*, highlighted those watersheds with potential for the implementation of soil and water conservation practices. Based on the ANOVA analyses performed to statistically compare the GA and *Cp* methodologies, the former resulted in more effectively classifying the watersheds, since it permitted better differentiating the watershed groups.

Results from the GA show that erosion susceptibility is strongly related to linear parameters (surface hydrology) for southwestern watersheds, where steep slopes of both the watershed and the main channel influence soil erosion [89]. Watersheds RH24Lg, RH24Le, RH24Lf, RH24Mc, RH24Lb, RH24Nc, and RH24Ne have the steepest slopes, making them more prone to erosion [29].

One of the advantages of using the watershed as a territorial unit is the analysis of multiple geomorphometric parameters, which are related to the watershed's hydrological configuration, topography, and shape. Most of the watershed surface attributes depend on local topographic conditions [5]. In this study, the Basin's altitudinal gradient, a surface attribute, assists in exhibiting the contrasts among watersheds groups, while showing a homogeneous geographic distribution within them. The linear and shape-type parameters are important because of their influence on soil erosion.

The description and spatial grouping of the 31 watersheds through their 26 parameters using multivariate techniques proved to be useful to understand the main factors that control the variance in the CRB. Prioritization through the two types of grouping was also effective in detecting those watersheds susceptible to erosion. The proposed methodology for prioritizing watersheds on a geospatial basis is a feasible approach for identifying watersheds that are susceptible to erosion. However, prioritization with parameters that are based on shape, linear, and relief of the watersheds may not be sufficient. Thus, the incorporation of information regarding the activities on the territory of the CRB would help to improve the efficacy of the classification of watersheds based on their erosion susceptibility. Therefore, future research could include socioeconomic attributes that contribute to soil loss, such as agriculture [39]. Despite the limitations of this study, the contribution of this work represents an advance in the identification of the watersheds that are most susceptible to erosion in the CRB. This in turn contributes to land-use planning, which may help mitigate soil degradation processes.

### **5. Conclusions**

The application of GA and *Cp* methodologies allowed integrating a large set of geomorphometric parameters, which served to classify watersheds according to their characteristics.

GA more effectively clustered the watersheds of the Conchos River Basin than *Cp*, since the groups formed by GA were more differentiated based on the analysis of the watersheds' geomorphometric parameters. The results of GA show that watersheds RH24Lf, RH24Lb, RH24Nc, and RH24Jb might be subjected to strong erosion processes, and are potential candidates to be subjected to soil conservation practices.

The present study demonstrates the usefulness of integrating GIS and multivariate techniques to prioritize watersheds based on their erosion susceptibility. Such an integration approach showed the spatial relationships of the different geomorphometric parameters analyzed. Although the present study permitted a definition of watershed groups according to the values of their geomorphometric parameters and their relation with erosion susceptibility, the integration of additional variables in the analysis may provide a more insightful classification and thus a more reliable watershed prioritization. Such variables could include land use/land cover, soil type, lithology, geomorphology, and socioeconomic activities, among others.

**Author Contributions:** Conceptualization, A.P.-A. and F.V.-G.; Data curation, G.V.-Q.; Formal analysis, J.A.P.-A., A.P.-A., M.C.V.-A., A.E.R.-R., and M.M.-S.; Investigation, M.C.V.-A.; Methodology, M.M.-S.; Project administration, A.P.-A.; Software, J.A.P.-A. and G.V.-Q.; Supervision, F.V.-G.; Writing—original draft, J.A.P.-A.; Writing—review and editing, M.C.V.-A., A.E.R.-R., and F.V.-G.

**Funding:** This research received no external funding.

**Acknowledgments:** We deeply thank Consejo Nacional de Ciencia y Tecnología (CONACYT-Mexico) for supporting the doctoral graduate studies of the first author of this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.



#### *Sustainability* **2019** , *11*, 5140

 4655.43

 2761.21

 2077.04

 315.23

 128.60

 93.97

 3020.71

 5385.38

 363.46

 166.11

 118.17

 4289.87

 5719.63

 494.37

 207.40

 110.90

 7684.79

 10,752.04

RH24Nc

RH24Nd

RH24Ne

 23,253

 8250

 9996

 4964

 1930

 2500

 1360

 3.90

 4146

 2080

 2800

 1360

 3.90

 11,600

 2020

 2780

 1260

 4.09

0.37 41.98 0.36 23.37 0.40 22.10

 2.64

 5.06

 4.25

 14.54

 1.08

 15.54

 1.33

 16.51

 2.30


**Table A1.** *Cont.*

Unitary shape factor, *Fu* = River frequency, *Rn* =

Resistance number, *Rh* = Relief ratio, *Rr* = Relative relief, *Lo* = Length of general flow, *Rc* =

Circularity ratio, *Di* = Drainage intensity.

#### *Sustainability* **2019** , *11*, 5140



GP = Geomorphometric parameter, PC1 = Principal component 1, PC2 = Principal component 2, PC3 = Principal component 3, PC4 = Principal component 4, PC5 = Principal component 5, *Lc* = Length of main channel, *Lb*<sup>2</sup> = Length of watershed, *Li* = Length of contour lines, *Lu* = Length of channels, *Hmin* = Minimum height, *Hmax* = Maximum height, *Cc* = Gravelius compactness coefficient, *Re* = Elongation ratio, *Rf* = Form factor, *Ia* = Elongation index, *J* = Mean slope of watershed, *tg*α = Mass coefficient, *Dd* = Drainage density, *j* = mean slope of the main channel, *a* = Medium distance, *TcK* = Kirpich concentration time, *Scp* = Sinuosity of the main channel, *Qp* = Average peak flow, *T* = Texture ratio, *Ru* = Unit shape factor, *Fu* = River frequency, *Rn* = Resistance number, *Rh* = Relief ratio, *Rr* = Relative relief, *Lo* = General flow length, *Di* = Drainage intensity. Bold letters indicate the dominant coefficient.

### **B.**

### **Table 3.** Abbreviations.


\* Acronyms in Spanish.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Water Quality Sustainability Evaluation under Uncertainty: A Multi-Scenario Analysis Based on Bayesian Networks**

### **Anna Sperotto 1,2, Josè Luis Molina 3, Silvia Torresan 1, Andrea Critto 1,2,\*, Manuel Pulido-Velazquez <sup>4</sup> and Antonio Marcomini 1,2**


Received: 1 August 2019; Accepted: 23 August 2019; Published: 31 August 2019

**Abstract:** With increasing evidence of climate change affecting the quality of water resources, there is the need to assess the potential impacts of future climate change scenarios on water systems to ensure their long-term sustainability. The study assesses the uncertainty in the hydrological responses of the Zero river basin (northern Italy) generated by the adoption of an ensemble of climate projections from 10 different combinations of a global climate model (GCM)–regional climate model (RCM) under two emission scenarios (representative concentration pathways (RCPs) 4.5 and 8.5). Bayesian networks (BNs) are used to analyze the projected changes in nutrient loadings (NO3, NH4, PO4) in mid- (2041–2070) and long-term (2071–2100) periods with respect to the baseline (1983–2012). BN outputs show good confidence that, across considered scenarios and periods, nutrient loadings will increase, especially during autumn and winter seasons. Most models agree in projecting a high probability of an increase in nutrient loadings with respect to current conditions. In summer and spring, instead, the large variability between different GCM–RCM results makes it impossible to identify a univocal direction of change. Results suggest that adaptive water resource planning should be based on multi-model ensemble approaches as they are particularly useful for narrowing the spectrum of plausible impacts and uncertainties on water resources.

**Keywords:** water quality; climate change; Bayesian networks; uncertainty; multi-models

### **1. Introduction**

The maintenance of good water quality resources is essential to protect both ecosystems and human health, and they represent one of the main targets of both the European Water Framework Directive (2000/60/CE) and Sustainable Development Goals (i.e., SDG6) [1].

Changes in the global climate system are expected to have major consequences on the qualitative aspect of available water resources [2–7]; thus, assessing the impacts of future climate change scenarios on water systems is necessary to ensure a sustainable management of water supply for multiple purposes.

The inherent complexity, variability, and randomness of water systems, their interaction with socio-economic factors including the land use and population growth, and the high degree of uncertainty stemming from climate change make the assessment of climate change impacts on water resources particularly challenging [8].

Uncertainty plays a prominent role in climate change science and climate change impact science, with hydrology and water resources research in particular [8–11]. According to Parker et al. [12] and Hawkins et al. [13], it can be attributed to a number of reasons including (i) scenario uncertainty, arising from our limited understanding about the path of greenhouse gasses emissions and socio-economic development; (ii) internal climate variability, due to the inherent variability of the climate system components, processes, and their interaction; (iii) model uncertainty, caused by the different formulations used to represent climatic processes in climate and impact models.

A proper understanding of the type, sources, and effects of uncertainty is needed to achieve the goals of reliability and sustainability in water system management and planning under changing conditions [14,15]. Uncertainty quantification is vital to facilitate a risk-based approach to decision-making, where the range of possible futures is considered [16,17], and costs and benefits of adaptation are estimated accordingly. For this reason, uncertainties should be communicated as an inevitable component of each climate impact assessment study in a form which is also understandable by a non-scientific community to avoid misjudged information and to prevent overconfidence in impact projections [18].

A promising way to evaluate and deal with uncertainty is represented by ensemble modeling approaches [19]. Multi-model ensembles are commonly used to investigate structural model uncertainty, employing more than one climate model to perform multiple simulations and analyzing how climate change projections differ. The development of ensembles in both climate and impact studies is strongly encouraged by the Intergovernmental Panel on Climate Change (IPCC) since the Fourth Assessment Report (AR4, 2007) [20], which suggests the use of multiple climate models and scenarios to cover different sources of uncertainty [20]. The variability among ensemble components, in fact, can be used as a measure of the state of the knowledge. Furthermore, it could be useful to describe the confidence about the impact of climate change on the system modeled, supporting more robust decisions. In other words, if most ensemble members give comparable results, high confidence in projected climate change impacts is obtained, while, by contrast, if a large spread between components exist, there is less confidence in the forecasted impacts. Furthermore, it was shown that ensembles often give a more accurate prediction of future climate impacts than even the best individual model [21–23].

Relying on the extensive experience acquired in climate modeling, the use of ensembles was also transferred to the water resources field where attempts to build ensembles of impact models and scenarios (i.e., hydrological, water quality) are becoming increasingly common [24–26] to support water system management and adaptation.

In this respect, the paper proposes a Bayesian network (BN)-based approach to develop an ensemble of impact scenarios simulating the effect of different climate change projections on the quality of water of the Zero river basin (Italy). Accordingly, BNs are used as a modeling framework to evaluate the uncertainty due to global climate model (GCM)–regional climate model (RCM) structure and representative concentration pathways (RCPs), helping in determining and communicating the level of confidence of projected water quality alterations between baseline and future climate regimes.

BN outcomes (i.e., multiple impact scenarios) can be used to inform the spectrum of plausible effects of expected climate change on the Zero river basin and, thus, support the choice of effective adaptation strategies for a sustainable management of water resource quality at the local scale.

After a brief introduction to the study area (Section 1), this paper describes the methodology and input data employed (Section 2) and, finally, discusses the scenarios developed for the Zero river basin case study (Section 3), together with their uncertainty analysis.

### *Study Area*

The Zero river basin (ZRB) (latitudes 45◦28' north (N)–45◦48 N, longitudes 11◦54' east (E)–12◦25' E) (Figure 1) is located within the Venetian floodplain (northern Italy) and it is a sub-basin of the Venice

lagoon watershed (Figure 1a), covering an area of 140 km2. The Zero river (Figure 1b), which is 47 km long, together with the Dese rivers, provides the greatest contribution of freshwater (21% of the total) to the lagoon of Venice [27].

**Figure 1.** The Zero river basin case study (**a**) and input data location (**b**).

The basin features a Mediterranean climate but typical traits of more continental climates [28], with an average annual precipitation of around 1000 mm (period 2007–2012) and an average annual temperature of 14 ◦C (period 2004–2013). It is characterized by marked inter-annual climate variability, which can originate years climatologically very different from each other. The land use of the ZRB is mainly characterized by agricultural areas, representing 73% of the total surface, while the remaining surface of the basin is covered by artificial (24%), semi-natural, and forested areas (4%).

Agricultural areas are dominated by industrial crop typologies, including corn (45%) (i.e., *Zea mays* L.), soy (9%) (i.e., *Glycine max* L.), and autumn–winter cereals (13%) such as winter wheat (i.e., *Triticum aestivum* L.) and barley (i.e., *Hordeum vulgare* L.). A negligible percentage of the agricultural land is also used for the cultivation of beets and other permanent horticultural crops.

Artificial surfaces are mainly represented by housing areas (54%), industrial businesses (32%), and transportation and services (14%). Accordingly, several industrial and residential activities exist on the basin. Three wastewater treatment plans (i.e., Morgano, Zero-Branco, and Castelfranco Veneto) (Figure 1b) with capacities ranging from 2500 to 32,000 of population equivalents (PE) directly discharge into the Zero river.

The intensive agriculture, characterized by an elevated level of fertilization, and the dense urbanization represent significant pollution sources for the area and the main factor responsible for the excessive nutrient loadings in the reaching bodies of the Venice lagoon. Nutrient pollution is a major concern in the area considering the risk of eutrophication and toxic algae blooms which can threaten the good qualitative status of waters with consequent implications for environmental and human health [29]. Climate change, inducing extreme changes in temperature and precipitation trends, could exacerbate such nutrient pollution, altering those hydrological processes (e.g., runoff, river flow, water retention time, evapotranspiration) that regulate the mobilization of nutrients from land to inland and coastal water bodies.

### **2. Materials and Methods**

To understand the complexity of the interactions between climate, hydrology and water quality parameters in the Zero river basin, we adopted the BN-based integrated approach proposed in Figure 2.

**Figure 2.** The Bayesian network (BN)-based integrated approach.

The core of the proposed approach is represented by a Bayesian network, which is employed as a modeling tool for the simulation of multiple nutrient loadings scenarios, and for the analysis of their uncertainty. Different data coming from hydrological models, historical observations, and expert judgment (Table A1, Appendix A) were structured and combined into a probabilistic form to develop and train the BN at different levels of implementation.

Qualitative information elicited from experts was used to develop the conceptual model of the network and to train socio-economic and agronomic variables (i.e., irrigation, fertilizer application) of the model for which quantitative data were not available. Observations regarding the main climatic parameters (i.e., precipitation, temperature, and evapotranspiration) and point-source pollution sources (i.e., wastewater treatment plants (WWTPs) and industrial discharges), together with Soil and Water Assessment Tool (SWAT) model simulations (i.e., runoff, river discharge (Q)) for the current conditions developed by Reference [30], were used for the training of the network for the period 2004–2013. Additional observed hydrologic data (i.e., river discharge (Q), nutrient concentrations (i.e., NO3 <sup>−</sup>, NH4 +, PO4 <sup>3</sup>−)) coming from the water quality monitoring station were used to evaluate the performance of the model under current conditions.

After the training and validation, an ensemble of climate change projections generated by coupling different combinations of Global Climate Models (GCMs) with regional climate models (RCMs) was used as input for scenarios analysis to assess the effect (and uncertainty) of future climate change on nutrient loadings.

### *2.1. Climate Change Projections*

To assess the effect of climate change on nutrient loadings (i.e., NO3 <sup>−</sup>, NH4 <sup>+</sup>, PO4 <sup>3</sup>−), changes in temperature and precipitation over future scenarios were selected as climate change indicators and used as input for the development of alternative nutrient loading scenarios using the BN model. The main aim of the study, however, was to capture uncertainties across a range of available GCM–RCM structures and representative concentration pathways (RCPs); thus, in order to represent the widest range of temperature and precipitation changes projected for the case study area, different climate change model outputs were considered (Table 1). This allowed including both the "worst" and "best" case scenarios in the BN, giving the users substantial flexibility in exploring and understanding the possible implications of climate change in the future. GCM–RCM combinations were selected among those available considering different kinds of features, including (i) their representativeness for the case study area and for the selected time periods (i.e., 2041–2070 and 2071–2100); (ii) their ability to perform at high spatial resolution; (iii) their availability in an open-source format.

**Table 1.** Global climate model (GCM)–regional climate model (RCM) projections selected and implemented in the Bayesian network (BN). SMHI—Swedish Meteorological and Hydrological Institute; DMI—Danish Meteorological Institute; CMCC—Centro Euro-Mediterraneo sui Cambiamenti Climatici.


Ensembles of 10 climate change models were selected (Table 1), including the CMCC-CM/COSMO-CLM GCM–RCM and nine GCM–RCM model combinations from the EURO-CORDEX project [31].

The CMCC-CM global model [32] is a coupled atmosphere–ocean general circulation model, while the COSMO-CLM (CCLM) [33] is a high-resolution (between 1 and 50 km) climate regional model; both were developed by the Centro Euro-Mediterraneo sui Cambiamenti Climatici (CMCC), and, when coupled, they allow a spatial resolution of 8 km for the selected region.

EURO-CORDEX is the European branch of the CORDEX initiative sponsored by the World Climate Research Program (WRCP) with the aim of organizing an internationally coordinated framework to produce improved regional climate change projections for all land regions worldwide based on dynamical statistical downscaling models forced by multiple GCMs. CORDEX results were used as input for climate change impact and adaptation studies within the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC). In this study, nine climate change scenarios resulting from different combinations of GCMs and RCMs at 12-km spatial resolution were selected (Table 1). Different GCMs and RCMs were developed by different research groups including the Danish Meteorological Institute (DMI), the Swedish Meteorological and Hydrological Institute (SMHI), and the Met Office Hadley Centre (MOHC) (Table 1).

Based on the outputs of the selected GCM–RCMs (Table 1), different climate change scenarios were developed for the Zero river basin case study by extrapolating the mean temperature (◦C) and the cumulative precipitation (mm) calculated on a monthly basis. Specifically, for each GCM–RCM, five different 30-year scenarios were developed for a control period (i.e., 1983–2012), a mid-term period (i.e., 2041–2070) and a long-term period (i.e., 2071–2100) under two different representative concentration pathways (i.e., RCP4.5 and RCP8.5). RCP4.5 represents the moderate emission scenario which predicts an increase in radiative forcing up to 4.5 W·m−<sup>2</sup> by 2100 and a stabilization of the emissions (i.e., 650 ppm) shortly after 2100 [34], while RCP8.5 was chosen as representative of the extreme emission scenario, in which the greenhouse gas (GHG) emissions and concentrations increase considerably over the 21st century, leading to a radiative forcing of 8.5 W·m−<sup>2</sup> by 2100 [35], thus describing a future without any specific climate mitigation target.

To make the outputs of GCM–RCMs suitable to be implemented at the spatial scale of impact assessment models, a bias correction was applied [30]. GCMs, in fact, have a spatial resolution too coarse for local-scale assessments and, for this reason, they are generally coupled with RCMs to consider the effects of orography, land–sea surface contrast, and land surface characteristics. However, RCMs also often show significant biases due to imperfect conceptualization, discretization, and spatial averaging within grid cells; therefore, a bias correction is required.

For the data used in this study, the linear scaling (LS) method was applied to correct the biases in the monthly values of temperature and precipitation based on observed ones. The LS method was applied using the software CLIME, a geographic information system (GIS) software for climate data analysis developed by the Regional Models and Geo-Hydrogeological Impacts (REMHI) division of CMCC, as extensively described in Reference [30]. Specifically, the method was implemented for all 10 climate scenarios for every weather station of the case study (Figure 1) using the rainfall and temperature observations for the period 1993–2012 as a correction factor. Once corrected, outputs of the GCM–RCMs for each of the 10 climate scenarios and for each of the three weather stations of the case study (i.e. Castelfranco Veneto, Zero-Branco, Mogliano Veneto) were elaborated to obtain suitable inputs for the BN model.

### *2.2. Bayesian Network Model*

A BN model was employed to assess and compare the impacts of different climate change scenarios on nutrient loadings (i.e., NO3 <sup>−</sup>, NH4 <sup>+</sup>, PO4 <sup>3</sup>−) in the transitional waters of the Zero river basin, thus generating an ensemble of impact scenarios supporting the identification of climate change effect on water quality.

The BN was implemented by building on a BN model previously developed and validated in a case study [36] which was extended to allow the incorporation of multiple GCM–RCM inputs. The BN for the Zero river basin was developed and run using the software HUGIN Expert, version 8 [37,38]. For additional details about the methodology and data used to develop the BN, please refer to Reference [36].

### 2.2.1. BN Development and Training

The BN structure was designed following the DPSIR (Drivers-Pressures-State-Impacts-Responses) framework, starting from the conceptual model described in Reference [36]. An influence (i.e., "box and arrow") diagram was developed including the most relevant systems variables (i.e., nodes), as well as the links between them (i.e., directed arcs), allowing the identification of the main cause–effect pathways between input variables, represented by climatic changes and land use, and output variables, represented by the increase in nutrient loadings (i.e., NO3 <sup>−</sup>, NH4 <sup>+</sup>, PO4 <sup>3</sup>−) discharged by the Zero river basin into the Venice lagoon. Successively, the BN was trained, assigning states, prior information and conditional probabilities to all nodes of the network, translating the conceptual model into an operative probabilistic form.

The training was performed using a heterogeneous set of information for the period 2004–2013 (Tables A1 and A2, Appendix A) at seasonal time steps including historical observations, hydrological model simulations (i.e., SWAT), and expert opinions. Specifically, for nodes associated with climatic variables (i.e., temperature, precipitation, evapotranspiration), probabilities were learned directly from the frequency of observations of weather monitoring stations available in the case study (Figure 1). Nodes related with point pollution sources were trained using the nutrient loadings measured in the outflow from three different WWTPs (i.e., Morgano, Zero-Branco, Castelfranco Veneto) (Figure 1) located in the basin, summing up their respective contribution.

Probability distributions of hydrological variables (i.e., runoff, river flow, nutrients loadings, N and P in the runoff) were instead calculated based on the frequencies of results of hydrological simulations performed with the Soil and Water Assessment Tool (SWAT) [39]. Finally, nodes describing agronomic practices (i.e., water needs, irrigation, P and N fertilizer application) were trained through expert elicitation or by applying empirical equations due to the lack of quantitative information and experiences in the case study. An exhaustive description of assumptions and information used to train

the BN can be found in Reference [36]. Figure 3 shows the configuration of the BN for the Zero river basin once states, prior information, and conditional probabilities of each node were parametrized.

**Figure 3.** Configuration of the Bayesian network for the Zero river basin trained with the information for the period 2004–2013.

### 2.2.2. BN Evaluation

The developed BN was evaluated with the main aim to assess if it was able to correctly represent nutrient loadings and dynamics in the case study. Specifically, two main forms of model evaluation were performed, namely, model predictive accuracy and sensitivity analysis [36].

### Predictive Accuracy

Predictive accuracy assessment was performed by comparing BN simulations with observations from water quality monitoring stations available for the case study. Specifically, observed nutrient loadings were calculated, multiplying the observed water flow (Q) and nutrient concentrations measured at two different water quality monitoring stations located in the case study (Figure 1) managed by ARPAV Servizio Acque Interne and the former MAV (Magistrato alle Acque di Venezia). The manual station 122 (45◦33' N and longitude 12◦15' E), provided seasonal data, and the automatic station B2q (45◦34' N and longitude 12◦17' E) provided daily data. Both stations were specifically identified and they are routinely used for (i) the assessment of the good environmental status of the Zero river according to the requirements of the Water Framework Directive (WFD); (ii) the assessment of the compliance with the maximum admissible load of nutrients discharged into the Venice lagoon from the drainage basin fixed by the national competent law (DM 09/02/1999).

For this reason, the two stations were considered particularly representative to measure the condition of the Zero river and its basin, thus providing reliable data for the evaluation of BN performance.

Specifically, loadings of nitrogen nitrate (NO3 <sup>−</sup>) and ammonium (NH4 <sup>+</sup>) were obtained using data from station B2q, while loadings of phosphate (PO4 <sup>3</sup>−) were calculated using data from station 122. A consistent set of observations was available only from year 2007 to 2012 and, therefore, the evaluation was conducted only for this period.

For each output node (i.e., NO3 <sup>−</sup>, NH4 <sup>+</sup>, and PO4 <sup>3</sup><sup>−</sup> loadings), correctly classified instances (CCIs) were assessed as the percentage of cases correctly predicted divided by the total number of cases, providing the measure of how many instances the model predicted correctly when tested against

known case outcomes (i.e., observations). Error rates, used as evaluation criteria, were then computed and depicted in confusion matrices as suggested in Reference [40].

### Sensitivity Analysis

Another form of evaluating the developed model entails sensitivity analysis, which allows testing the sensitivity of model outcomes to variations of model parameters [41]. Sensitivity analysis in BNs can measure the sensitivity of outcome probabilities to changes in input nodes or other model parameters, such as changes in node type of state and their coarseness; therefore, it useful to detect the most relevant variables within the network. Sensitivity analysis was performed using two types of measures: entropy and Shannon's measure of mutual information [42].

The entropy (H(*x*)) of the probability distribution of a variable (*x*) expresses the measure of the associated uncertainty of the random process with a particular probability distribution (P(*x*)) [42]; it is calculated using the following function:

$$\mathcal{H}(\mathbf{x}) = -\sum \mathcal{P}(\mathbf{x}) \log \mathcal{P}(\mathbf{x}).\tag{1}$$

Reducing entropy by collecting information, in addition to the current knowledge about the variable *x*, is interpreted as reducing the uncertainty about the true state of *x*. Accordingly, the entropy function enables an assessment of the level of uncertainty/certainty about the state of the output node and of the additional information required to specify a particular alternative.

Entropy can be seen as a score of a variable richness (i.e., how much information is within the data for that particular variable) [43,44], and it was used to rank nodes from the most uncertain to the least uncertain, where the most uncertain variables are the least informative within the network.

In addition, the sensitivity of one node to multiple other nodes was evaluated using Shannon's measure of mutual information (MI) as follows:

$$\text{MI}\,\,\text{MI}\,\,(\mathbf{Y},\mathbf{X}) = \mathbf{H}(\mathbf{Y}) - \mathbf{H}(\mathbf{Y}|\mathbf{X}).\tag{2}$$

MI enables assessing the effect of collecting information about one variable (*Y*) on reducing the total uncertainty about variable *X*. When MI is equal to zero, the condition of one node does not affect the state of the other and, therefore, the nodes can be defined as mutually independent [43].

### 2.2.3. Scenarios and Uncertainty Analysis

The model developed as above was used in this study to perform scenario analysis, allowing the assessment of the relative change in outcome probabilities of nutrients under different climate change conditions (Section 2.1), thus obtaining an ensemble of multiple impact scenarios. For each GCM–RCM combination (Table 1) and climate change scenario (i.e., 2041–2070 and 2071–2100 under two different representative concentration pathways, RCP4.5 and RCP8.5), the probability distributions of temperature and precipitation were calculated based on the frequency in the respective model simulations (Section 2.1). The BN was then run, alternatively fixing the evidence of being in a particular scenario by assigning 100% probability to the related state in the "climate change scenario node", letting the information propagate through nodes linked by conditional probability (Figure 3) and calculating the change in the posteriori probabilities of output variables (i.e., NO3 <sup>−</sup>, NH4 <sup>+</sup>, PO4 <sup>3</sup><sup>−</sup> loadings).

Moreover, uncertainties in projected loadings due to the application of the ensemble of 10 GCM–RCM couples was performed by comparing outputs obtained with each of the different GCM–RCM combinations across scenarios and seasons. Specifically, the changes in the probability of each loading class in the mid-term (i.e., 2041–2070) and long-term (i.e., 2071–2100) simulated periods were compared against the respective baseline scenario (i.e., 1983–2012) for each combination of GCM–RCM.

### **3. Results**

### *3.1. BN Evaluation*

### 3.1.1. Accuracy

As described above, a data-based evaluation was performed to assess the ability of the model to correctly predict instances in an independent dataset. Accordingly, BN predictions were tested against observations from water quality monitoring stations available from ARPAV for the case study (Figure 1), generating confusion matrices representing the percentage of CCIs and, consequently, the error rates (Figure 4). Observations were available only for 2007–2012 and, therefore, the evaluation was conducted only for this period.



**Figure 4.** Confusion matrices for output nodes of the BN model tested against the observed dataset (2007–2012). The cells lying on the leading diagonal of the matrices represent the correctly predicted instances, while those off the diagonal are incorrect predictions. Adapted from Sperotto et al., 2019 [35].

In addition, the expected values of the probability distributions of nutrient loadings (i.e., NO3 −, NH4 <sup>+</sup>, PO4 <sup>3</sup>−) for observed data were compared with those obtained through the Bayesian network outputs (Figure 5).

Overall, the BN was able to reproduce the observed nutrient dynamics with loadings closely replicated for most seasons. The evaluation produced very good results for phosphate (PO4 <sup>3</sup>−), while, for ammonium (NH4 <sup>+</sup>) and nitrate (NO3 −), the correlation between observed and predicted nutrient loadings was slightly worse. Indeed, overall, the BN was able to correctly classify 87.50% of instances for PO4 <sup>3</sup>−, 63.64% for NH4 <sup>+</sup>, and the 66.67% for NO3 −, when tested against the observed dataset (Figure 4). The BN overpredicted the decrease in ammonium and nitrate loadings between spring and summer, while it underestimated the autumn loadings (Figure 5) for all three nutrient species (i.e., PO4 <sup>3</sup><sup>−</sup>, NH4 <sup>+</sup>, NO3 <sup>−</sup>) and the winter loadings of NH4 <sup>+</sup> and NO3 −.

**Figure 5.** Expected values of the probability distributions of nutrient loadings (NO3 <sup>−</sup>, NH4 <sup>+</sup>, P04 <sup>3</sup>−) from observed data (black) and Bayesian network outputs (red) for the period 2007–2012.

### 3.1.2. Sensitivity Analysis

Entropy was calculated for each node of the network (Table 2), allowing us to rank all variables from the most uncertain to the least uncertain, where the most uncertain variables were those characterized by high entropy and were, thus, the least informative probability distributions.

Results showed that the variable characterized by the least informative probability distribution and a particularly high value of entropy was "effective rainfall" (3.26); however, variables characterized by intermediate values of uncertainty were nodes directly influencing "loading NO3 −", including "N point sources" (1.36), "river flow" (1.34), "N runoff"(1.33), "N diffuse sources" (1.31), and "total N Loading" (1.31). These uncertainties propagated through the network and, as a result, "loading NO3 −" was the output node characterized by the highest value of entropy (1.24), while, for others, the uncertainty was moderate (0.98 and 0.97).


**Table 2.** Node ranking according to entropy score.

Table 3 provides a ranking of the top five most influential variables on output nodes based on the mutual information analysis. The output nodes "NO3 <sup>−</sup> loadings" and "NH4 <sup>+</sup> loadings" were both highly sensitive to "river flow" (MI = 0.54 and 0.37, respectively). "PO4 <sup>3</sup><sup>−</sup> loadings" resulted highly sensitive to "total P Loading" (M = 0.53), "P runoff" (M = 0.39), and "P diffuse sources" (M = 0.39).


**Table 3.** Summary of the mutual information (MI) analysis presenting the top five most influential variables on output nodes.

In general, hydrological variables, which in turn were strongly influenced by climatic ones (i.e., precipitation, temperature), were those most influential on other network variables. By contrast, variables related to agronomic practices and land use had a mild effect (MI < 0.1) on other variables with the exception of "N fertilizer application", which moderately affected "NO3 − loadings" (MI = 0.22). In particular, point sources had negligible effects on all output nodes (MI < 0.04) with respect to diffuse sources.

### *3.2. Climate Change Scenarios for the Zero River Basin*

Different climate change scenarios were developed for the Zero river basin case study by extrapolating the mean temperature (◦C) and the cumulative precipitation (mm) calculated on a monthly basis, based on the outputs of the selected GCM–RCMs (Table 1). Specifically, for each GCM–RCM, five different 30-year scenarios were developed for a control period (i.e., 1983–2012), a mid-term period (i.e., 2041–2070), and a long-term period (i.e., 2071–2100) under two different representative concentration pathways (i.e., RCP4.5 and RCP8.5).

Figure 6 shows the variability of temperature for different time periods and RCPs across different climate change scenarios used to inform the BN. It is possible to observe that the temperature variability across the future projection was quite narrow.

**Figure 6.** Variability in mean seasonal temperature within the global climate model (GCM)–regional climate model (RCM) ensembles for the Zero river basin.

All climate scenarios agreed on projected temperature during the control period (i.e., 1983–2012). Greater variability, instead, was depicted for RCP8.5 where one model in particular (i.e., MPI-ESM-LR/RCA4, Model 5) of the ensemble projected lower temperatures in spring and higher temperatures in autumn. In general, all models predicted an increase in mean seasonal

temperature with respect to baseline across the different considered scenarios (Figure A3, Appendix A). MPI-ESM-LR/RCA4 (Model 5) represented the only exception, predicting a decrease in temperature in spring for RCP8.5 (Figure A3, Appendix A). A greater increase in temperature with respect to baseline was predicted by RCP8.5 for the period 2071–2100.

By contrast, precipitation featured marked variability in all scenarios, as shown in Figure 7. All 10 GCM/RCMs of the ensemble generated quite similar statistics for the control period (i.e., 1983–2012) with a narrow range between maximum and minimum values for all seasons. By contrast, the variability increased consistently along the century, especially for RCP8.5. Greater variability can be seen in summer, autumn, and winter, where the range between maximum and minimum values projected by different GCM/RCMs became quite wide. However, while for winter and autumn most models agreed on an increase in the cumulative precipitation (Figure A4, Appendix A), for spring and summer, models gave the opposite results, making it impossible to agree on the direction of change (i.e., decrease or increase).

**Figure 7.** Variability in cumulative seasonal precipitation within the GCM/RCM ensembles for the Zero river basin.

### *3.3. Hydrological Responses to Climate Change*

The BN model was run, alternatively fixing the probability distribution of precipitation and temperature according to the medium- and long-term projections (i.e., 2041–2070, 2071–2100) provided by the different available combinations of GCM–RCMs under two different representative concentration pathways (i.e., RCP4.5 and RCP8.5). Accordingly, the network was used to develop multiple impact scenarios linking the effect of future climate change projections on nutrient loadings. The developed scenarios represent the probability of different classes of nutrient loadings (i.e., low, medium, high, very high) calculated by the BN model as a result of changes in the probability distribution of input variables (i.e., temperature and precipitation).

Figure 8 gives a concise overview of the probabilistic results obtained through the BN for each season and scenario across the different GCM–RCM models considered (Table 1). Specifically, each triangular portion of the graph represents one of the different climate change scenarios considered (i.e., RCP4.5 2041–2070, RCP8.5 2041–2070, RCP4.5 2071–2100, RCP8.5 2071–2100), while, inside them, each slice represents the results of different GCM–RCMs arranged in a clockwise direction (i.e., from 1–10, in Table 1). Each slice, in turn, is divided into the four different classes of loadings with an amplitude corresponding to the value of the associate probability (i.e., from 0–100).

With regard to NO3 − (Figure 8a), the impact scenarios reported that higher loadings will take place in autumn and winter, while the lowest loadings are predicted for summer. Across different models, in fact, in autumn, higher probabilities were associated with high (i.e., 48,615-69,182 kg/season, orange) and very high loading classes (i.e., >69,182 kg/season, red). The highest loading was predicted by the MPI-ESM-LR/RCA4 (Model 5) under the RCP8.5 2071–2100 scenario with 70% probability associated with the high loading class (Figure A5, Appendix A).

In summer, by contrast, a higher probability was associated with low (i.e., 0–28,047 kg/season, green) loading classes, with the CMCC-CM/COSMO-CLM (Model 7) predicting the highest probability (77%) under the long-term RCP8.5 scenario (Figure A5, Appendix A).

For ammonium (i.e., NH4 <sup>+</sup>), results across different models predicted high probabilities of low loading during summer and spring (Figure 8b). The lowest loading was predicted by the CMCC-CM/COSMO-CLM (Model 7) under RCP8.5 2041–2070 with a 97% probability of the low loading class (i.e., 0–3224 kg/season, green) (Figure A6, Appendix A). In autumn, the probability of low loading states decreased gradually across the scenarios, followed by an increase in the probability of medium (i.e., 3224–5009 kg/season, yellow) and very high loadings (i.e., >6794 kg/season, red), respectively reaching 38% and 24% under RCP8.5 2071–2100 in the simulation with IPSL-CM5A-MR/RCA4 (Model 2).

Results for PO4 <sup>3</sup><sup>−</sup> showed a marked seasonality with high autumn loads and low loads in spring and summer across different scenarios (Figure 8c). In summer, in fact, higher probabilities were associated with the low loading state (i.e., 0–1978 kg/season, green). Specifically, the lowest loadings were predicted by the CMCC-CM/COSMO-CLM (Model 7) under the medium- and long-term RCP8.5 scenarios with a probability of 98% (Figure A5, Appendix A). High loadings were instead predicted for autumn with probabilities of high (i.e., 2954–3929 kg/season, orange) and very high classes (i.e., >3929 kg/season, red) increasing across scenarios. The IPSL-CM5A-MR/RCA4 (Model 2), which described the most extreme loadings for the season, predicted probabilities of 34% and 16% of being in very high and high classes under the long-term RCP8.5 scenario (Figure A5, Appendix A).

### Uncertainty Analysis

The variability of results was also analyzed by comparing outputs obtained with each of the 10 GCM–RCM combinations across scenarios and seasons. To make the results comparable, the change in the probability of each loading class with respect to the baseline scenario (i.e., 1983–2012) was calculated for each combination of GCM–RCM.

Accordingly, in Figure 9, which provides an example for PO4 <sup>3</sup><sup>−</sup> loadings, negative values describe a decrease in probability of specific loading classes (i.e., colored bars) with respect to baseline, while positive values indicate an increase. Orthophosphate (i.e., PO4 <sup>3</sup>−) loadings showed clear variability during spring and summer. During these periods, in fact, half of considered models predicted an increase in loading, while others predicted a strong decrease. Less marked variability, however, was depicted under RCP8.5 2071–2100, where most models agreed on a reduction in loadings in the summer–spring period and an increase in probabilities associated with the low class. A good agreement among models, instead, can be depicted in autumn especially under RCP8.5 2071–2100, where most models predicted an increase in probabilities of very high and high loadings. Despite the good agreement on the increase in loading, a moderate variability in the magnitude of the change with respect to baseline remained. For RCP8.5 2041–2070, for instance, the maximum variation was related to MPI-ESM-LR/RCA4 (Model 5) (i.e., +20%), while EC-EARTH/RCA4 (Model 4) predicted an increase of +1.5%. In RCP8.5 2071–2100, the increase in probability ranged from +28% for IPSL-CM5A-MR/RCA4 (Model 2) to 2% for EC-EARTH/RCA4 and EC-EARTH/HIRHAM5 (Models 4 and 9). Also, in winter, a general increase in loading was predicted with an increase in probabilities associated with higher classes and a consequent decrease in probabilities of lower classes. The maximum increase (i.e., +10%) was depicted with EC-EARTH/RACMO22E (Model 10) under RCP4.5 2071–2100.

Results for NO3 <sup>−</sup> and NH4 <sup>+</sup> loadings presented a similar tendency (Figure A6, Appendix A). The best agreement among models resulted for the autumn season, where an increase in loading was predicted across all scenarios and for all GCM–RCM combinations. Specifically, for NO3 −, an increase in probability of the high loading class was depicted, while, for NH4 <sup>+</sup>, the increase was associated with the highest loading class (i.e., very high). Also, for winter, the variability of results was quite low, with most models agreeing on an increase in probability of high and very high classes across different scenarios. By contrast, two models (i.e., CNRM-CM5/CCLM and CMCC-CM/COSMO-CLM (Model 6 and 7)) predicted a decrease in loadings for both NO3 <sup>−</sup> and NH4 <sup>+</sup>. Large variability resulted for both summer and spring seasons; hence, it was not possible to identify a clear direction of change.

Overall, the results for different nutrient species highlighted that, in general, the best agreement between models resulted for autumn and winter, especially for RCP8.5 scenarios. In summer and spring, instead, variability was high and, thus, there was less confidence in the changes projected. This seasonal pattern of variability to some extent reflects that of precipitation (Section 3.1, Figures 6 and 7), suggesting that this variable could play a major role in the model in determining both the direction and the magnitude of changes in nutrient loadings. Comparing the results (Figures 9, A3 and A4, Appendix A) with the changes in precipitation across the different models (Figure 9 and Tables A1 and A2, Appendix A), a strong correlation between the increase in precipitation and increase in the probability of high loading can be found.

In summer and spring, in fact, those models which predicted the highest increase in probability of high loadings were also those showing a positive variation (i.e., increase) in precipitation with respect to baseline.

In this context, it is interesting to notice how, in spring (Figures A6 and A7, Appendix A) some models (e.g., Models 8–10) contemporarily predicted an increase in probability of the two most extreme classes (i.e., low and very high). The same models were also those predicting the highest increase in precipitation with respect to baseline (Figure A7, Appendix A), suggesting that the unexpected high probabilities of very high loading classes could be related to the projections of extreme precipitation events in the considered scenarios.

### **4. Discussion and Conclusions**

A BN was used to develop an ensemble of impact scenarios to investigate the potential mid- and long-term impacts of climate change on the nutrient loadings in the waters of the Zero river basin in northern Italy, one of the main tributaries of the Venice lagoon. Moreover, the uncertainty related to the implementation of 10 GCM–RCM combinations forced with RCP4.5 and RCP8.5 emission scenarios was analyzed.

The BN used was implemented by building on a model previously developed and tested in a case study [36], integrating a heterogeneous set of data coming from multiple information sources (i.e., observations, hydrological model simulations, climate change projections). The BN was evaluated through a cross comparison between predicted and observed loadings, providing satisfactory results on a seasonal time step; therefore, it is considered suitable for projecting future climate change scenarios.

According to the analysis of future climate for the Zero river basin, all projections agreed on an increase in the mean seasonal temperature with respect to baseline for both RCPs.

By contrast, precipitation featured marked variability across scenarios; while for winter and autumn most models agreed on an increase in the cumulative precipitation, for spring and summer, some models gave opposite results, making it impossible to agree on the direction of change (i.e., decrease or increase). The variability increased consistently along the century, especially for RCP8.5.

The impact scenarios developed showed that seasonal changes in precipitation and temperature are likely to affect nutrient loadings and, consequently, the water quality of the Zero river. Results suggest with good confidence that, across the considered scenarios, nutrient loadings will increase, especially during the autumn and winter seasons. Most models, in fact, agreed in projecting a high probability of an increase in nutrient loadings with respect to the current conditions. In summer and spring, instead, the large variability between different GCM–RCM results made it impossible to identify a clear direction of change.

The results were consistent with those obtained by Reference [45] applying the SWAT model for simulating the effect of climate change on hydrological and ecological parameters in the same case study. However, while conclusions for autumn are similar to those reached by other authors [26,46,47] for similar catchments in Europe and the United States (US), for spring and summer, the results differ. Xu et al., in particular, found that, in the Lake Erie region, spring loading of P will increase under RCP8.5 scenarios driven by an anticipation of snow-melting processes. Such discrepancies can be attributed to local and regional climatic characteristics which should, therefore, be taken into account carefully, as they can have a significant role in governing nutrient transport dynamics.

In the Zero river basin, nutrient loadings were found to be particularly sensitive to hydrological variables (i.e., river flow, runoff, N and P in runoff) directly correlated with climate variables (i.e., precipitation, temperature) and diffuse pollution, especially considering that most dramatic changes (e.g., increases in precipitation and runoff) will happen during seasons characterized by intensive agricultural activities (e.g., manure application, irrigation).

In spring and summer, in fact, NO3 <sup>−</sup> and NH4 <sup>+</sup> are commonly applied as fertilizers. In dry and warm conditions, NH4 <sup>+</sup> is readily adsorbed to clay mineral and is, therefore, scarcely prone to movements; however, it becomes easily available in autumn, driven by runoff and extreme precipitation events. NO3 −, on the other hand, is highly soluble and, thus, suitable to be transported by hydrological flow. In autumn, the elevated temperature and wet conditions projected will enhance the nitrification process, making NO3 − highly available. This, combined with the seasonal increase in the river flow, could explain the great increase in NO3 − loading during autumn. In the soil, the soluble form of phosphorus (PO4 <sup>3</sup>−) is mobile, and it can be transported by diffusion or by surface water flow. At elevated temperatures and in dry conditions, however, PO4 <sup>3</sup><sup>−</sup> is easily adsorbed to clay particles or immobilized by organic matter accumulating in the upper soil layer. In autumn, an increase in runoff, following the enrichment of the topsoil of phosphorus occurring during the summer, increases PO4 3− transport and, thus, its loading in the river. In addition, the projected increase of dry prolonged conditions in summer might speed up soil erosion phenomena and, consequently, enhance the runoff of adsorbed mineral forms of phosphorus through the basin, leading to peaks of PO4 <sup>3</sup><sup>−</sup> loading in autumn as soon as the drought breaks.

At the same time, the large uncertainty in spring and summer loadings makes it difficult to predict the possible implications for the trophic state of the Venice lagoon. A significant increase in spring and summer nutrients delivered by the Zero river, during the season of growth for most phytoplankton species, would significantly increase the risk of harmful algae blooms and eutrophication phenomena.

The BN was revealed to be a suitable tool to characterize and communicate uncertainty on the effect of climate change and land use on water quality attributes in a policy-relevant manner; however, it is important to also acknowledge some limitations. Some uncertainty exists mainly due to the availability and quality of input data, especially regarding agronomic practices. Due to data constraints, in fact, fertilizer application and irrigation were considered uniform across the whole catchment, while they could vary considerably, both spatially and temporally. Furthermore, due to scarce information regarding point pollution sources, nutrient (N and P) loading was considered while neglecting to take into account the type of WWTPs and how they work in cases with a large amount of inflow water, for instance, during extreme precipitation events.

Improving the accuracy of input data throughout the catchment and involving a higher number of experts in the model development would improve its calibration, validation, and results.

Finally, changes in land use (i.e., agricultural land extension, crop typology distribution) and agricultural management practices (i.e., amount of fertilizer application), which were kept constant over future scenarios in this BN version, should be accounted for in future model improvements to provide a realistic picture of future risks threatening water quality sustainability. Accordingly, further improvements of the proposed approach will consider the implementation of a dynamic version of the BN [48] to better handle temporal dynamics over future scenarios, while also integrating land-use change projections.

Overall, the results obtained from this study show that the selection of climate change information to feed impact studies should be considered carefully as it strongly affects the outcome and the conclusions of the assessment. Studies based on only one GCM–RCM combination should be interpreted with caution, as results are highly dependent on the assumptions of the selected combination. Adaptation and management decisions are taken based on this information with the consequence that societies may underprepare for real risks affecting water systems, increasing the likelihood of severe impacts, or, by contrast, they may overreact, wasting resources and efforts targeting irrelevant threats.

Accordingly, an adaptive water resource planning method should be based on ensembles and multi-model probabilistic approaches rather than on an individual scenario and a single-value projection for the future. Through the identification of worst- or best-case scenarios, it is possible to bound the spectrum of plausible climate change impacts into an uncertainty space, inside which a set of optimal adaptation strategies can be defined and tested for the sustainable and climate-proof management of the water system.

**Author Contributions:** Conceptualization, A.S., J.L.M., A.C., M.P.-V. and A.M.; Data curation, A.S.; Formal analysis, A.S. and J.L.M.; Investigation, A.S.; Methodology, A.S., J.L.M. and S.T.; Supervision, A.C. and A.M.; Validation, A.S., J.L.M. and M.P.-V.; Visualization, A.S., J.L.M., S.T. and A.C.; Writing – original draft, A.S., J.L.M. and S.T.; Writing – review & editing, S.T., A.C., M.P.-V. and A.M.

**Funding:** This research was funded.

**Acknowledgments:** The authors would like to thank all the public authorities and local experts that provided territorial data and information supporting the implementation of the methodology; we would also like to thank Giampietro Basei who helped in the graphical design of the results.

**Conflicts of Interest:** The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

### **Appendix A**


**Table A1.** List of input data used for the implementation of the BN in the Zero river basin.


intheBayesiannetworkmodelfortheZeroriverbasin.


**Table A2.** *Cont.*


**Table A2.** *Cont.*


**Table A2.**

*Cont.*

#### *Sustainability* **2019** , *11*, 4764





**Figure A1.** Variation in mean seasonal temperature with respect to baseline (i.e., 1983–2012) within GCM–RCM ensemble.




**Figure A2.** Variation in cumulative seasonal precipitation with respect to baseline (i.e., 1983–2012) within GCM–RCM ensemble.


**Figure A3.** Probability of different classes of NO3− loadings associated with different seasons and scenarios across the GCM–RCM combinations considered.



6.CNRM-CM5/CCLM;7.CMCC-CM/COSMO-CLM;8.HadGEM2-ES/RACMO22E;

 EC-EARTH/HIRHAM5;10.EC-EARTH/RACMO22E.

**Figure A4.** Probability of different classes of NH4+ loadings associated with different seasons and scenarios across the GCM–RCM combinations considered.


**Figure A5.** Probability of different classes of PO43− loadings associated with different seasons and scenarios across the GCM–RCM combinations considered.

**Figure A6.** Variations in the probability of each NO3 − loading class with respect to baseline (i.e., 1983–2012) under different scenarios and GCM–RCM combinations.

**Figure A7.** *Cont.*

**Figure A7.** Variations in the probability of each NH4 <sup>+</sup> loading class with respect to baseline (i.e., 1983–2012) under different scenarios and GCM–RCM combinations.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Challenges for Sustainable Water Use in the Urban Industry of Korea Based on the Global Non-Radial Directional Distance Function Model**

**Na Wang 1,\* and Yongrok Choi 2,\***


**\*** Correspondence: 5sunnywang@live.cn (N.W.); yrchoi@inha.ac.kr (Y.C.)

### Received: 17 June 2019; Accepted: 9 July 2019; Published: 17 July 2019

**Abstract:** Since water stress and industrial water pollution pose a huge threat to South Korea's sustainable water use, it is an urgent task to assess industrial water green use efficiency (GUEIW). Based on the global non-radial directional distance function (GNDDF) model, this paper calculated GUEIW in 16 Korean local governments from 2006 to 2015 using two decomposition indicators: Economic efficiency of industrial water use (ECEIW) and environmental efficiency of industrial water use (ENEIW). The growth of GUEIW is mainly driven by ECEIW, and subsequent environmental problems are obstacles to achieving green use of Korean industrial water. The regional heterogeneity of GUEIW is so important that the downstream region outperformed the upstream region in all three indicators. The government's efforts to ensure water quality inhibits industrial development in upstream areas, where incomes are much lower than in downstream areas, and downstream industrial areas have to pay upstream industrial areas extra for water. However, regarding upstream industrial areas, low prices easily promote water waste. Because of relatively high water use costs, downstream producers are encouraged to save water. To improve the economic efficiency of industrial water use in upstream areas, advanced water technology should be developed or introduced to make full use of water resources in industrial production.

**Keywords:** water resource; South Korean urban industry; green use efficiency of industrial water (GUEIW); global non-radial directional distance function model (GNDDF); economic efficiency of industrial water use (ECEIW); environmental efficiency of industrial water use (ENEIW)

### **1. Introduction**

### *1.1. Background of the Research*

In the 20th century, oil was referred to as black gold, the "core of the world market economy," but in the 21st century, "blue gold," or water, has become an increasingly valuable resource due to population explosion, climate change, and the current world economic model [1]. Water systems are vital to human well-being and offer diverse benefits to society. Water policies have long aimed at optimal water use due to the ever-increasing demand for limited water resources in many parts of the world [2,3].

However, many countries around the world still face significant challenges in managing their scarce water resources due to industrialization, urbanization, and climate change. Industrialization and urbanization, driven by population pressure, are the main factors of economic growth; meanwhile, intensive agricultural spread-out use of high pesticides has also become dominant, resulting in water shortage and degradation in many parts of the world. In addition, climate change has increased characteristics of severe spatial and temporal variations in water resources [4].

South Korea ("Korea") is no exception. Over the past few decades, the Korean economic growth rate has been astonishingly high [4,5]. Unfortunately, economic growth has come at the expense of the environment [6,7], indicated by water shortage and deteriorating water quality [8] which became serious problems in the 1990s [9]. Rapid economic development has led to urban, industrial, and agricultural pollution that has severely disrupted water supplies and ecosystems. Population and industrial growth have increased pressure on limited available water resources, resulting in water conflicts among stakeholders [10]. Sewage contains a large number of organic pollutants which affect the ecosystem in the form of toxicity, reduce dissolved oxygen in water, and endanger human health. Specific organic pollutants refer to organic compounds with high toxicity, strong accumulation, and delayed degradation that are listed as priority pollutants [11]. Organic pollutants have different characteristics. The hazardous substances are a minor fraction of wastewater, often measured in μg/L, while the less harmful organic substances may be given in hundreds or even thousands of mg/L. They are not toxic or pose a strong risk to human health but are responsible for oxygen consumption in receiving water bodies and for this reason should be reduced by treatment. The water resources management information system of the Korean government advises that the concentration of organic pollutants (BOD) needs to be kept below 10 mg/L [12]. Regarding water regulation policies, an intermediate law concerning the flow of river water was introduced in 1999, which mandates lowland water users to pay upland residents in an effort to reduce agricultural intensification (Agricultural intensification refers to a kind of agricultural operation mode in which more labor, capital, and technology are invested in a certain area of land in order to obtain more output per unit area and simultaneously reduce the labor cost per unit of product.) in the upland regions and decrease the need for more water pollution treatment facilities. However, water pollution has still occurred [13,14].

More than direct regulatory policies, therefore, water prices may play an important role in determining the efficiency of resource utilization, as suggested by Kumbhakar and Bhattacharyya [15]. Specifically, relatively low prices can easily promote water waste, and relatively high prices may lead to higher production costs, thus encouraging water consumers to save water [16]. Therefore, the price should be set within a reasonable range for sustainable use of water. Uncontrolled water wastage and numerous water pollution incidents have made the problems of water shortage and degradation more serious, which has led to a huge negative impact on social and economic development [17–19]. Therefore, improving water efficiency and reducing water pollution are crucial for sustainable water use [20–23].

As for the estimation of resource efficiency, many recent studies prefer the distance function method, which simultaneously takes multiple input and output factors into account [24–29]. There are two methods for estimating the distance function: The nonparametric data envelope analysis (DEA) method and the parametric method. The DEA method was proposed by Charnes et al. [30] and has been widely applied in recent environmental and energy-related studies [31]. A major advantage of the DEA method over the parametric method is that it does not require the underlying technology to have a specific functional form [32–35].

As for the evaluation of water use efficiency in China, Hu et al. [36] developed the pioneer empirical analytic framework DEA to evaluate water use efficiency at the national level. The concept adjustment quantity they proposed was used to determine the optimal scale of water use. Liao and Dong [37] adopted a similar method to evaluate the efficiency of provincial water resources utilization. However, these studies only took economic benefits into account and may be considered only a part of a comprehensive analysis because they ignored pollutants (such as water waste and water pollution) from industrial production. Therefore, the expected output should also be considered along with the undesirable output. By utilizing this undesirable output in our model, "green use efficiency" will be achieved [38–41]. In addition, most previous studies tended to use the radial DEA method, which aims to increase the good output and reduce the bad output at the same rate. This is inconsistent with actual production activities and often leads to the same efficiency value of 1 for many evaluated observations, making it very difficult to evaluate the observed results in an appropriate manner [42]. In addition, as pointed out by Zhang et al. [32], many relevant studies have used time series data only,

but studies on cross-sectional data can be regarded as yielding more efficient and effective evaluations over the same period. Obviously, production technology varies from year to year, and results based on production technology at the same time may not be reasonable. To overcome these spatial and dynamic issues, Zhang et al. [32] and Zhang et al. [33] proposed a global non-radial directional distance function (GNDDF) method, which increased the good output and reduced the bad output at different rates and covered all contemporary technologies during the research period.

Although there are many papers on resource utilization efficiency, to our knowledge, there is no research on the analysis of the green use efficiency of industrial water (GUEIW) in Korea. Therefore, we applied a relatively advanced GNDDF model, which included undesirable output, to calculate the GUEIW in Korea from 2006 to 2015. We further analyzed water use efficiency using the economic efficiency of industrial water use (ECEIW) and the environmental efficiency of industrial water use (ENEIW) components to illuminate the major contributing factors of GUEIW. Moreover, we calculated local government-level ECEIW and ENEIW to find out which local government is more effectively promoting the efficiency of Korean industrial water use.

### *1.2. Geographic Features*

Korea is located in East Asia, south of the Korean peninsula, and is surrounded by the East Sea and Yellow (West) Sea. The country lies between 124◦ and 132◦ longitude and between 33◦ and 42◦ latitude. This geographical location has a major influence on the country's climate, which is divided into four distinct seasons and is characterized by continental and temperate monsoon climates, depending on the region [43].

The country's land area is 99,596 square kilometers, with mountainous topography covering 70 percent of the country. Most of the mountains are in the eastern part, characterized by sharp declines, while the height of the western and southern regions drops slowly. This is why four major rivers flow from the eastern mountain areas into the West Sea (see Figure 1) [43]. The Han river is 481.7 km long, with a catchment area of 26,018 sq km, and is the largest river in South Korea; it flows through the most populous Seoul metro area, including Incheon (the third largest city), and into the West Sea. The Naktong river (506.17 km in length), with a catchment area of 23,384 sq km, is the longest river flowing into the southern sea, passing through the two metropolises of Busan (second) and Daegu (fourth), as well as several industrial cities. The Geum river, which is 394.79 km in length and covers 9912.15 sq km, begins in the central part of the country and ends in the West Sea. The cities of Daejeon and Sejong (fifth) reside on its banks. The Yeongsan, which is 115.5 km in length and 3371 sq km in basin area, is a river in southwestern Korea. It passes through Gwangju (sixth) and flows into the West Sea [43].

Despite heavy rainfall, water supplies in South Korea are limited. The Organization for Economic Co-operation and Development (OECD) classifies Korea as a water-stressed country (Figure 2), for, although the concept of water stress is relatively new, access to freshwater resources has become much more difficult over time and could lead to further depletion and deterioration of available water resources. Water shortage can be caused by climate change, droughts or floods, increased pollution, increased human demand, and excessive use of water resources. A water crisis can occur when the amount of potable, uncontaminated water in an area is less than what the area needs. Shortage of water resources is caused by two converging phenomena: The ever-increasing fresh water use and the exhaustion of available fresh water resources. Water stress can be the result of two mechanisms: Material (absolute) water stress and economic water stress. Material water stress is the result of water resources being naturally inadequate in meeting regional needs, while economic water stress is the result of mismanagement of adequate available water resources (OECD Environmental Outlook to 2050).

**Figure 1.** The four major river basins in South Korea.

**Figure 2.** Water stress in Organization for Economic Co-operation and Development (OECD) countries. Note: Water stress levels: Below 10% = no stress; 10%–20% = low stress; 20%–40% = medium stress; above 40%: Severe stress.

As shown in Figure 2, compared with other OECD countries, Korea is coming close to facing a water crisis. As of 2015, Korea had used up to 33 percent of its total available water, putting the country's water balance at risk. Korea's total water resources amounted to 132.3 billion m3 in 2014, as shown in Figure 3. However, available water resources are estimated to be slightly more than half (76 billion m3, or 57%). During the rainy season between June and September, 42.5 percent of water (56.3 billion m3) was discharged. In particular, heavy rain brought by summer monsoons and typhoons resulted in flooding in the downriver areas of the four major river basins (see Figure 3). Due to steep mountain slopes and covered soils, the surface runoff is fast and half of it goes directly into the Sea. Furthermore, almost half of the precipitation is lost due to evaporation and transpiration [44].

**Figure 3.** Distribution of water resources in South Korea, 100 million m3/year. Note: Flood runoff is measured between June and September. Source: MoLIT (2017), The 4th Long-term Comprehensive Plan of Water Resources (2001–2020), 3rd revision.

Moreover, the proportion of water utility bills that covers production costs is far lower than total costs, with a 0.7 per cent decrease per year on average, from 84.4 percent in 2007 to 77.5 percent in 2015 (see Table 1). This continuously increasing disparity between the production costs of drinkable water and the price for which it is supplied has magnified the mismatch between water demand and supply. According to Kim [13], Korea's per capita daily water consumption of 395 liters is the highest among the 29 member nations of the OECD. The amount stands at 197.5 liters in France, 197.5 liters in Germany, and 158 liters in Denmark. Korea's low-priced water bills are likely to lead to this overuse. Therefore, in order to delineate the sustainable performance of this water policy, we will evaluate the quantitative and qualitative efficiency of water in the following section.


**Table 1.** Water tariff over 2007–2015.

Source: Ministry of the Interior and Safety, https://www.mois.go.kr/.

### **2. Methodology**

In this section, the global non-radial directional distance function (GNDDF) is presented to evaluate water stress in terms of its efficiency. This model shows how to estimate the green use efficiency of industrial water (GUEIW), the "economic" efficiency of industrial water (ECEIW), and the "environmental" efficiency of industrial water (ENEIW).

### *2.1. GNDDF (Global Non-Radial Directional Distance Function)*

Chambers et al. [30] first introduced the traditional directional distance function (DDF). After Chung et al. [45] extended this approach to evaluate environmental efficiency, DDF became a widely accepted approach to resource and environmental efficiency assessment. As shown in Equation (1), the conventional radial directional distance function approach always assumes that all inputs and outputs are introduced at the same proportionate rate, which does not conform to real production activities. Moreover, this radial DDF approach may overestimate efficiency due to non-zero slacks existing in inputs or outputs [46,47]. To overcome these problems, this paper utilized a non-radial DDF approach, which is widely used in research on resource efficiency evaluations [48], as shown in Equation (2)

$$\begin{array}{l}D \stackrel{\rightarrow}{\longrightarrow} (\mathbf{x}, \mathbf{y}, \mathbf{b}; \mathbf{g}) = \sup \{ \varnothing \colon ((\mathbf{x}, \mathbf{y}, \mathbf{b}) + \mathbf{g} \times \varnothing) \in T \}\tag{1}$$

$$\begin{array}{ll} \rightarrow (\text{x}, \text{y}, \text{b}; \text{g}) & = \sup \{ \mathsf{W}^{T} \boldsymbol{\beta} ; ((\mathsf{x}, \mathsf{y}, \mathsf{b}) + \text{g} \times \text{diag}(\boldsymbol{\beta})) \in T \} \\ \top & & \prime \end{array} \tag{2}$$

where *W<sup>T</sup>* = (*x*, *y*, *b*) *<sup>T</sup>* denotes a normalized weight vector with respect to inputs and outputs, *g* = −*gx*, *gy*,−*gb* is an explicit directional vector, and β = (β*x*, β*y*, β*b*) *<sup>T</sup>* <sup>≥</sup> 0 represents the vector of scaling factors. Thus, input and output may have different adjustment ratios, conforming to real production activities. The symbol *diag* denotes diagonal matrices and the environmental technology possibility set.

In this paper, let us assume that there are N local governments being evaluated. Each local government has inputs (x) to produce desirable outputs (y) and undesirable outputs (b). The regulated environmental technology *T*<sup>1</sup> for N decision making units (DMUs) can be expressed as follows:

$$\begin{cases} T\_1(\mathbf{x}) = \left\{ (\mathbf{x}, y, b) \right\} \ge \quad \text{can} \quad \text{product} \ (y, b): \mathbf{j} \\ \begin{cases} \sum\limits\_{n=1}^{N} \lambda\_n \mathbf{x}\_n \le \mathbf{x}, \\ \sum\limits\_{n=1}^{N} \lambda\_n \mathbf{y}\_n \ge \mathbf{y}, \\ \sum\limits\_{n=1}^{N} \lambda\_n \mathbf{b}\_n = b, \\ \sum\limits\_{n=1}^{N} \lambda\_n = 1, \\ \sum\limits\_{n=1}^{N} \lambda\_n = 1, \dots, N, \end{cases} \tag{3}$$

where *T*1(*x*) satisfies the production function theory, implying that limited inputs can produce limited outputs only. Inactivity is assumed as well, implying that there is no production for no undesirable outputs [49]. In addition, we imposed a weak disposability hypothesis on *T*1(*x*), implying that the undesirable output cannot be reduced freely. Considering the production technical change during the experimental period, we applied a function that imposed the constraints *<sup>N</sup> n*=1 λ*<sup>n</sup>* = 1 into variable returns to scale (VRS). Because the value is estimated based on the benchmark technology of the same period, the technology is obviously different over the years. The values over different years cannot be compared with each other; therefore, we utilized a global technology benchmark to include all of the contemporary technologies over the research period. Global technology integration, as proposed by Pastor [50], is denoted by *TG* = *T*<sup>1</sup> ∪ *T*<sup>2</sup> ∪ ... ∪ *TN*, and the global benchmark technology possibility set T2(x) can be defined as follows:

$$\begin{array}{lcl} GT\_{2(x)} = \left\{ (\mathbf{x}, y, b) \mid \mathbf{x} \quad \text{can} \quad \text{product} (y, b) \right\} \\ \begin{cases} \begin{array}{ll} T & \sum\limits\_{t=1}^{T} \mathbf{x}\_{n}^{t} \boldsymbol{\lambda}\_{n}^{t} \leq \mathbf{x}, \\ \sum\limits\_{t=1}^{T} \sum\limits\_{n=1}^{N} \mathbf{y}\_{n}^{t} \boldsymbol{\lambda}\_{n}^{t} = \mathbf{y}, \\ \sum\limits\_{t=1}^{T} \sum\limits\_{n=1}^{N} \mathbf{b}\_{n}^{t} \boldsymbol{\lambda}\_{n}^{t} = \mathbf{b}, \\ \sum\limits\_{t=1}^{T} \sum\limits\_{n=1}^{N} \mathbf{b}\_{n}^{t} \boldsymbol{\lambda}\_{n}^{t} = \mathbf{b}, \\ \boldsymbol{\lambda}\_{n} > \mathbf{0}, n = 1, \dots, N, t = 1, 2, \dots, T \end{array} \tag{4} \\ \boldsymbol{\lambda}\_{n} > \mathbf{0}, n = 1, \dots, N, t = 1, 2, \dots, T \end{array} \tag{5}$$

In this research, in accordance with previous research and data availability, input variables mainly included three indicators: Industrial water use (WA), industrial labor force (L), and industrial capital (K), referring to the water amount used for industrial production, the strength of the labor force in the industrial sector, and annual net industrial fixed assets, respectively. The desirable output was industrial GDP (Y), while the undesirable output was the two major water stress factors in industries: Water waste and organic pollution (OP). In order to eliminate the dilution effect of industrial labor and industrial capital, we removed it from the objective function and constrained it with the "ceteris paribus" condition. Therefore, we can calculate the GUEIW through the GNDDF model as follows.

$$\begin{aligned} \stackrel{\circ}{D} &= \text{max}\,\eta\_{\text{W}}\beta\_{\text{I}W} + w\_{Y}\beta\_{Y} + w\_{\text{WW}}\beta\_{\text{W}W} + w\_{\text{OP}}\beta\_{\text{OP}}\\ &\quad \left\{ \sum\_{t=1}^{T}\sum\_{n=1}^{N}\lambda\_{t}^{t}IW\_{n} \le (1-\beta\_{\text{I}W})IW\_{0}, \sum\_{t=1}^{T}\sum\_{n=1}^{N}\lambda\_{n}^{t}L\_{n} \le L\_{o}, \sum\_{t=1}^{T}\sum\_{n=1}^{N}\lambda\_{n}^{t}K\_{n} \le K\_{0},\\ \text{s.t.} &\quad \sum\_{t=1}^{T}\sum\_{n=1}^{N}\lambda\_{n}^{t}Y\_{n} \ge (1+\beta\_{Y})Y\_{o},\\ &\quad \sum\_{t=1}^{T}\sum\_{n=1}^{N}\lambda\_{n}^{t}WW\_{n} = (1-\beta\_{\text{W}W})VW\_{o}, \sum\_{t=1}^{T}\sum\_{n=1}^{N}\lambda\_{n}^{t}OP\_{n} = (1+\beta\_{\text{OP}})OP\_{o},\\ &\quad \beta\_{\text{IF}} \ge 0, \beta\_{\text{Y}} \ge 0, \beta\_{\text{WW}} \ge 0, \beta\_{\text{OP}} \ge 0, \\\ n = 1,\ldots,N; t=1,2,\ldots,T; \lambda\_{n}^{t} \ge 0, \sum\_{n=1}^{N}\lambda\_{n}^{t} = 1 \end{aligned} \tag{5}$$

where subscript 0 refers to the evaluation of the local government. β*IW* is the input adjustment ratio of industrial water, β*<sup>Y</sup>* is the adjustment radio of desirable output of industrial GDP's, and β*WW* and β*OP* are the adjustment ratios of undesirable outputs of industrial water waste and organic pollutants. Subscript n refers to the number of local governments in the sample; superscript t is the year. When β is zero, it means the local government is located on the production frontier.

### *2.2. GUEIW (Green Use E*ffi*ciency of Industrial Water)*

In accordance with Wang et al. [51], we set the weight vector as (1/3, 1/3, 1/6, 1/6) since we included two undesirable outputs. By solving the following equations, we can estimate the GUEIW and its two decompositions ECEIW and ENEIW [51]. The estimated DMU is valid in industrial water when the value of the GUEIW is equal to 1 and is ineffective when the GUEIW is less than 1. ECEIW and ENEIW are the same case.

$$GUEIN = \frac{(1 - \beta\_{IN}) + (1 - \beta\_Y)\frac{1}{2}[(1 - \beta\_{WW}) + (1 - \beta\_{OP})]}{3} = 1 - \frac{\beta\_{IN} + \beta\_Y + \frac{1}{2}(\beta\_{WW} + \beta\_{OP})}{3}$$

$$ECEIN = \frac{(1 - \beta\_{IN}) + (1 - \beta\_Y)}{2} = 1 - \frac{\beta\_{IN} + \beta\_Y}{2} \tag{6}$$

$$ENEIW = \frac{(1 - \beta\_{IV}) + \frac{1}{2}[(1 - \beta\_{WW}) + (1 - \beta\_{OP})]}{2} = 1 - \frac{\beta\_{IV} + \frac{1}{2}(\beta\_{WW} + \beta\_{OP})}{2}$$

### **3. Empirical Results**

In this section, we present the data including the inputs and outputs of the GNDDF framework. We collected annual industry data for all 16 Korean local government levels from 2006–2015. In this chapter, we calculated GUEIW based on the non-radial directional distance function (NDDF) and GNDDF models for comparison. In order to examine the main catalyst for GUEIW growth in Korea, we calculated the two forms of decomposition of the GUEIW: ECEIW and ENEIW.

### *3.1. Data Collection*

In the DEA model, production inputs include two basic types: Industrial labor (L) and industrial capital (K); we added industrial water (IW) as the third input. We selected industrial GDP as the desirable output and selected water waste (WW) and organic pollutants (OP) {Organic pollutant (kg/year) <sup>=</sup> BOD (mg/l) <sup>×</sup> water waste (m3/year) <sup>×</sup> 103} as two undesirable outputs. Table <sup>2</sup> shows the descriptive statistics for each variable. The data were derived from the Korean Statistical Information Service (KOSIS). The industrial GDP and capital were converted based on 2010 prices.


**Table 2.** Descriptive statistics for variables.

Sources: Korean Statistical Information Service (KOSIS) (http://kosis.kr/).

### *3.2. Results and Discussion*

As demonstrated in Figure 4, the GUEIW calculated from the GNDDF model is always lower than the value based on the NDDF model during the study period, with average values of 0.929 and 0.864. This indicates that the GUEIW based on the NDDF model might be over-estimated, as it is based on contemporary environmental production technology. The NDDF measures relative efficiency and its evaluation criteria are determined by the evaluated DMUs. The efficiency scores obtained under different evaluation criteria are not comparable. To compare efficiency intertemporally, they must be carried out on the same benchmark (frontier). For example, when evaluating the efficiency of 16 local governments in 2006, relevant data constituted a frontier surface A, and the efficiency of each DMU was relative to the evaluation benchmark A. When evaluating the efficiency of 16 local governments in 2007, relevant data constituted a new frontier surface B, and the efficiency of each DMU was relative to the evaluation benchmark B. Generally, A and B are not the same, so the efficiency values in 2006 and 2007 were not comparable. If intertemporal comparisons are to be made, the efficiency of the two years must be based on the same frontier. The GNDDF in this paper refers to the data of the same region sampled in the ten years from 2006 to 2015 as different DMUs. The 160-sample data are then combined to create a common frontier on which the efficiency may be measured. However, the GNDDF model encompasses all concurrent technologies during the sample period. The calculated value is more reasonable because the most efficient node 1 is based on global environmental production technology. Therefore, the following analysis was based on the GNDDF model. In Figure 4, the GUEIW trend

on GNDDF exhibits an m-shaped curve, which can be divided into three phases. The first phase is 2006–2007, the second 2008–2012, and the third 2013–2015. The first phase shows a rising trend from 2006–2007, indicating that industrial water use efficiency had achieved obvious improvement during the two years, but decreased in 2008 because of the global economic crisis. A rising trend can be seen again after 2010, following strong support for President Lee's "green growth policies." However, this upward trend only lasted two years; in 2013, a sharp decline occurred again following less emphasis on the environment by President Park's administration. It is clear that the GUEIW for Korea experienced two notable increase–decrease phases over the sample period. Measurements during 2007 and 2012 show that water use was more efficient; this may stem from the impacts of ENEIW and ECEIW. Therefore, in order to further understand the dynamic changes in GUEIW, it is necessary to analyze the trends of its contributing factors.

**Figure 4.** The green use efficiency of industrial water (GUEIW) for Korea based on the non-radial directional distance function (NDDF) and global non-radial directional distance function (GNDDF) models, 2006–2015.

Based on Equation (6) and as shown in Figure 5, we were able to outline the trends in ECEIW and ENEIW for Korea. It is clear that they exhibited similar trends as the GUEIW. There was no large gap during the study period, but the value of ECEIW was higher than that of ENEIW, with an average value of 0.870, while that of ENEIW was 0.864. This implies that environmental efforts may present some opportunity costs that are expressed in the gap between ECEIW and ENEIW. The year 2010 exhibited the biggest gap between ECEIW and ENEIW, with average values of 0.845 and 0.826. This implies that the growth of GUEIW is mainly driven by ECEIW and that environmental issues function as constraints in the achievement of green use efficiency of industrial water in Korea. This is consistent with the argument of Wang et al. [51], who stated that environmental protection of industrial sectors needs to be improved in China too. In order to improve the economic efficiency of industrial water use, advanced water technology should be developed or introduced to make full use of water resources in industrial production. For example, in Denmark, through technology integration, "wastewater treatment plants" could be turned into "energy factories", biomass power could be produced by extraction of organic matter in sewage, and enough electricity could be supplied not only to run the motor, but also to pump water and have a surplus. In addition, waste water treatment plants use the money earned from waste power generation and heating to support the operation of the sewage treatment plant and even to achieve a break-even point [52]. In addition, it is noteworthy that ENEIW showed a significant upward trend during 2006–2007 and 2010–2012, indicating a significant environmental improvement in Korea.

This can be attributed to a series of laws and regulations. As the River Basin Law was amended in 2007 under the Long-term Comprehensive Water Resources Plan, the Korean government worked to ensure a stable supply of water in the event of climate change. The Korean government implemented the river-oriented national land improvement project (four-river restoration project) and the new national land development paradigm (low-carbon green growth) [53]. Nevertheless, the unanticipated international financial crisis in 2008 had a significant negative impact on Korea's industrial exports, which caused many factories to close down and workers to lose their jobs, resulting in sharp decreases in both ECEIW and ENEIW. This forced the central government to introduce a series of economic stimulus plans, the components of which included expanding investment in small- and medium-sized industries to ensure industrial growth, thus boosting the recovery of the ECEIW [54]. Under the stimulus plan, some small- and medium-sized industries in Korea have overemphasized temporary doubts about the monitoring and implementation of environmental policies; they take chances on not being regulated, causing serious industrial pollution to many rivers. This is consistent with our findings. Figure 5 demonstrates the rapid upward trend of the ECEIW after 2008, as well as a large gap in the ENEIW in 2010. As Water Vision 2020 was revised for the second time in 2011 [55], both the ECEIW and ENEIW exhibit upward trends. The period between 2008–2012 exhibits these stricter environmental regulations that can stimulate technological innovation and improve production efficiency, leading to resource savings and environmentally friendly industrial production. This provides evidence for the Porter hypothesis [56]. Unfortunately, since 2013, President Park Geun-hye changed the paradigm from "green growth" to "creative economy." Environmental policies became passive and loose compared with those during Lee Myung-bak's administration. These changes led to the decline of the GUEIW.

**Figure 5.** The GUEIW, economic efficiency of industrial water use (ECEIW), and environmental efficiency of industrial water use (ENEIW) for Korea, 2006–2015.

Figure 6 shows the average values of the GUEIW, ECEIW, and ENEIW at the local government levels. All three values are close to 1 in many governments; this is because Korea's water management system is centralized. In other words, all of these local governments share the same management experience and are under the same laws and regulations, so their efficiency values are extremely close. Therefore, there is no competition among local governments to enhance water management, resulting in similar performances. During the study period, Ulsan showed the highest value of 1 and Gangwon showed the lowest value of 0.351. It is worth noting that 10 local governments with a GUEIW value of over 0.9 are located on the south and west coasts of South Korea (Chungnam, Busan, Jeonnam, Gyeonggi, Gyeongnam, Seoul, Gyeongbuk, Jeju, Gwangju, Ulsan). These local

governments are situated downstream of rivers. Moreover, Gangwon, Daejeon, Daegu, Chungbuk, and Jeonnbuk are situated upstream, in the water source protection area. The results are not surprising, as the Han River Law was established in 1999 [57] to improve water quality, manage drinking water sources, and support the compensation of local governments in the upper reaches of the river basin as an economic incentive to reduce chemical use in factories and farms. In the same year, the River Basin Law envisaged the comprehensive improvement of national development plans [56]. This has led downstream industries to pay a kind of water environmental tax to upstream industries. Water users in downstream areas, such as Seoul, Gyeonggi, Chungnam, Busan, Jeonnam, Gwangju, Ulsan, and Gyeongnam pay for upstream water protection compensation in conservation areas that supply them with more environmentally friendly water. Gangwon, Chungbuk, Daejeon, Daegu, and Jeonnbuk provinces are the areas that protect upstream areas and their people by regulating economic activity (such as environmentally friendly housing and industries). It is worth mentioning that Incheon, although in the downstream.

**Figure 6.** Average values of GUEIW, ECEIW, and ENEIW for Korea local governments.

m region, has a GUEIW value of 0.752, lower than other downstream regions. This result is not surprising. Most of Incheon's cities have been developed as industrial zones for Seoul; thus, many of these zones, such as Namdong or Gajwa, consist of small- and medium-sized enterprises in the country. Compared with large enterprises in other cities such as Ulsan, industries in Incheon do not have sufficient technologies and facilities for environmentally friendly production processes, resulting in low ECEIW and ENEIW values. This situation also indicates that small- and medium-sized enterprises in the area can learn more from big companies by a spill-over effect.

Except for Incheon, all downstream regions outperformed the upstream region in all three indicators, particularly ECEIW. The government's efforts to ensure water quality inhibits industrial development in upstream areas, which negatively impacts the incomes of upstream areas; the relatively high water use costs in downstream areas encourage producers to save water. While the government focuses on compensating the upstream areas for water quality management with the water tax from downstream areas, such supplies cannot be sustainable because the government is only responsible for the initial stage of installation and does not provide any incentives for industries to reduce emissions. Regulations should be strengthened and effective incentives should be implemented to allow upstream industries to provide their own environmental protection activities. Therefore, the upstream region should focus on environmental protection as a valuable activity for sustainable performance and develop suitable industries with lower pollution and higher output. To develop

such an industry, the water emission trading system (ETS) could be a good option for sustainable performance because the industry may reduce water use (quantitative efficiency) as well as produce less pollution (qualitative efficiency) for enhanced GUEIW. Here, ETS is the market-oriented carbon emission regulatory mechanism for the participating unit to reduce its carbon emission for better compensation in the ETS market. A water ETS could be constructed in a similar direction with caps and a trading system based on water management of industries.

### **4. Conclusions**

Since water stress—which mainly comes from industrial water waste and water pollution—poses a significant threat to sustainable water management in Korea, it is urgent to assess GUEIW and make useful policy recommendations. Based on the GNDDF model, this paper calculated GUEIW in 16 Korean local governments from 2006 to 2015 based on two decomposition indicators: ECEIW and ENEIW.

Empirical results showed that GUEIW exhibited fluctuations during the study and that the growth of GUEIW was mainly driven by ECEIW. In order to improve the economic efficiency of industrial water use, advanced water technology should be developed or introduced to make full use of water resources in industrial production. Local governments should employ advanced technology and help less developed local governments achieve enhanced industrial water use.

For the ENEIW, environmental issues might be constraints for achieving green use of industrial water. Stricter environment regulations can stimulate technological innovation and improve water management efficiency, leading to water resource savings and environmentally friendly industrial production. This result supports the arguments related to the Porter hypothesis. It also demonstrates that environmental deregulation leads to a decline in GUEIW.

Regarding local government heterogeneity, the downstream region outperformed the upstream region in all three indicators, particularly in ECEIW. The government's efforts to ensure water quality inhibits industrial development in upstream areas. Upstream area incomes are much lower than in downstream areas, as downstream industries have to pay upstream industries for upstream water quality management. With relatively high water use costs, producers in downstream regions promote conservation of water. Upstream industries, however, may easily create water waste due to relatively low prices of water use. In order to improve the economic efficiency of industrial water use in upstream areas, an advanced water technology should be developed and introduced to make full use of water resources in industrial production. The local government in upstream regions should help local industries acquire advanced technology. In addition, the government can provide incentives for producers to save water to encourage industrial water saving. Water ETS could be one of the best alternatives for these performance-oriented policies.

**Author Contributions:** The authors are contributed each part of a paper by Conceptualization, Y.C.; Methodology, N.W.; Software, N.W.; Validation, Y.C.; Formal Analysis, N.W.; Investigation, Y.C.; Resources & Data Curation, N.W.; Writing—Original Draft Preparation, N.W.; Writing—Review & Editing, Y.C.; Visualization, N.W.; Supervision, Y.C.; Project Administration, Y.C.; Funding Acquisition, Y.C.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
