**Deriving a Bayesian Network to Assess the Retention E**ffi**cacy of Riparian Bu**ff**er Zones**

#### **Andreas Gericke 1,\*, Hong Hanh Nguyen <sup>1</sup> , Peter Fischer 1,2, Jochem Kail <sup>3</sup> and Markus Venohr 1,4**


Received: 7 January 2020; Accepted: 11 February 2020; Published: 25 February 2020

**Abstract:** Bayesian networks (BN) have increasingly been applied in water management but not to estimate the efficacy of riparian buffer zones (RBZ). Our methodical study aims at evaluating the first BN to predict the RBZ efficacy to retain sediment and nutrients (dissolved, total, and particulate nitrogen and phosphorus) from widely available variables (width, vegetation, slope, soil texture, flow pathway, nutrient form). To evaluate the influence of parent nodes and how the number of states affects prediction errors, we used a predefined general BN structure, collected 580 published datasets from North America and Europe, and performed classification tree analyses and multiple 10-fold cross-validations of different BNs. These errors ranged from 0.31 (two output states) to 0.66 (five states). The outcome remained unchanged without the least influential nodes (flow pathway, vegetation). Lower errors were achieved when parent nodes had more than two states. The number of efficacy states influenced most strongly the prediction error as its lowest and highest states were better predicted than intermediate states. While the derived BNs could support or replace simple design guidelines, they are limited for more detailed predictions. More representative data on vegetation or additional nodes like preferential flow will probably improve the predictive power.

**Keywords:** model evaluation; nitrogen; nutrient retention; phosphorus; sediment

#### **1. Introduction**

The eutrophication of lakes and rivers remain a global concern despite considerable efforts in the past to reduce the input of nutrients from terrestrial environments [1]. After targeting the industrial and domestic sources, the diffuse agricultural inputs of nitrogen (N), phosphorus (P), sediments, and also pesticides gained importance [2–5]. Particularly, the legacy of long-term excess applications of fertilizers and manure became eminent [6,7]. In the European Union (EU), for instance, diffuse sources affect now more water bodies than point sources [8].

Many processes and factors control the mobilization and the diffuse transport of agricultural pollutants. Accordingly, various "best management practices" (BMP) can contribute to reduce the risk of pollutants entering water bodies. Riparian buffers zones (RBZ), also termed vegetated buffer strips, are vegetated areas along water bodies which are either too difficult to work or intentionally set aside [9]. RBZ stand among the most widely applied BMP to reduce the diffuse pollution of surface waters but also to improve their hydro-morphological and thermal conditions [9–11]. This is reflected by the rich literature on the efficacy of RBZ to retain substances from adjacent upslope areas (henceforth catchments).

The RBZ efficacy is expressed as the ratio of the amount of nutrients, pesticides, or sediment leaving the buffer towards the water body and the amount of material entering the RBZ from the catchment. It depends on RBZ properties and external factors like land management which, in turn, determine the runoff and incoming nutrients [12,13]. The large variability in RBZ efficacy reported in the literature reflects the variability of those factors.

Therefore, modelling approaches to predict RBZ efficacy in large-scale applications tend to be (too) data-demanding, e.g., process-based models like VFSMOD [14] and REMM [15], or (too) simplistic [16]. For instance, Weissteiner et al. [17] used RBZ width as single variable to estimate the efficacy of N and P retention in their model for the EU. This approach is in line with regulations and recommendations on RBZ design which specify adequate widths of RBZ as simple, easily administrable rules to reduce diffuse emissions of nutrients and sediments [18–20]. However, the use of RBZ width as the only design variable is not commonly considered sufficient [21,22].

Some studies used sub-datasets and applied multivariate regression to improve the model performance. For example, Zhang et al. [13] derived separate models for different vegetation types (grass and mixed, only trees), again only with RBZ width as the single explanatory variable for N and P retention. For sediment retention, they considered RBZ slope as the second variable which was supported by a previous study that identified RBZ width and slope to be best correlated with sediment retention [23]. In contrast, Yuan et al. [24] found only a weak relationship between sediment retention and slope. In their meta-analysis on nitrate retention, Mayer et al. [25] also tested RBZ width as the only predictor in regression models for different width classes, vegetation types, and flow pathways. The explained variability (r<sup>2</sup> ) in these subsets was partly higher than the r<sup>2</sup> values for the full dataset. They concluded that more factors need to be considered. In accordance, Hassanzadeh et al. [16] achieved even higher r<sup>2</sup> values by considering various RBZ and external variables in their regression model.

Nonetheless, such regression models have limitations which can be addressed by Bayesian networks (BN, also termed Bayesian belief networks, [26]). Firstly, BN explicitly deal with the uncertainty in data or conditions because they represent the relationships (links or edges) between system variables (nodes) probabilistically rather than deterministically [27]. Secondly, they can integrate even incomplete quantitative and qualitative information (e.g., "sandy soil") from different domains such as expert knowledge, field data, and other BN. Thirdly, BN can provide information on causes from evidence of observed effects.

Although BN have increasingly been applied for water resource management, decision-making, predictions, and assessments of ecosystem services during the last decades [26,28–30], they have only implicitly considered RBZ retention so far [31]. Therefore, the aims of this study are (i) to develop the first BN to predict RBZ efficacy to retain N and P from RBZ and external factors, (ii) to train and cross-validate the BN based on empirical data compiled from an extensive literature review, and (iii) to evaluate how node selection and the number of their states, i.e., data discretization, affect model accuracy. As our BN aims at medium- to large-scale applications, we focused on commonly available input data and grouped nutrient fractions of N and P.

#### **2. Materials and Methods**

#### *2.1. Data Collection and Preprocessing*

We compiled the measured RBZ efficacy for N, P, and sediment and information on generally available RBZ and external variables (first column in Table 1) based on 23 reviews published between 1994 and 2019 [12,13,22–25,32–48] (Supplementary Table S1). These variables were selected based on key processes of nutrient and sediment retention discussed in the literature, e.g.,

• RBZ width and nutrient form: after entering the RBZ, the velocity of surface runoff decreases due to surface roughness and soil infiltration [22,49,50]. This results in a lower transport capacity and a deposition of sediment and particulate nutrients, which is an important process for their

retention [40]. Positive yet variable relationships of RBZ width and efficacy have widely been acknowledged in reviews and guidelines (e.g., [19,33,48]) with broader RBZ being needed to efficiently retain dissolved nutrients compared to sediment [46].


**Table 1.** List of riparian buffer zones (RBZ) and external variables used as nodes in the Bayesian networks (BN) and their discrete states. State definitions are shown as comma-separated lists. The variables (nodes) RBZ width, soil texture, RBZ slope, and RBZ efficacy were discretized using different numbers of states (given in separate lines, cf. section on BN design for details on the state definition).


<sup>1</sup> Loam (L), clay (C), sand (S), silt (Si), fine (f), coarse (c), <sup>2</sup> including orthophosphate, <sup>3</sup> i.e., nitrate.

As far as possible, the measured RBZ efficacy values—preferably from mass (load or yield)—were extracted from the original studies to minimize compilation errors and to avoid double counting. From 139 studies, published between 1977 and 2017 (Figure 1b), we gathered 341 datasets with at least one efficacy value mainly located in North America and Western Europe (Figure 1a). Data from other continents as well as model outcomes, (constructed) wetlands, and ponds were not considered.

**Figure 1.** Overview of collected data; (**a**) complete/incomplete datasets per country *(n* = 341, cf. Table 2), (**b**) number of studies per publication year (*n* = 139).


**Table 2.** Completeness of literature values.

<sup>1</sup> One study with partly incomplete soil data, <sup>2</sup> three countries without complete data.

A total of 269 of the collected datasets were complete, i.e., with values for all variables shown in Table 1 (Table 2). The others missed RBZ slope (*n* = 52) and soil texture (*n* = 61). Slope was missing for 39 groundwater studies, i.e., 27% of all, thus shifting the dataset towards studies on surface flow. To increase the sample size, we calculated missing average values from ranges, e.g., the minimum and maximum efficacy given for RBZ of 10–15 m width was converted to an average value of 12.5 m width, and applied no quality criteria apart from data completeness.

Two thirds of the 580 efficacy values with complete data originated from plot studies and one third from field and watershed studies, with different experimental designs and time scales. Rainfall data was particularly inconsistently reported, e.g., as average values or intensities, and was therefore excluded from the data collection.

#### *2.2. Aggregating the Nutrient Forms*

We collected the efficacies for five nutrient forms and sediment. In order to reduce the number of states and increase the number of data sets for each state, we applied preliminary regression analyses to quantify the similarity of RBZ efficacy for particulates (mostly sediment) and total N and P (TN, TP) as well as the dissolved forms (DN, DP) before setting up the BN.

#### *2.3. BN Design—Nodes and States*

BN are directed acyclic graphs which link variables (called parent and child nodes) without feedback loops. The nodes and links of the BN were based on the literature listed above: the RBZ and external variables in the first column of Table 1 were considered as independent parent nodes directly influencing RBZ efficacy as the final child (output) node of the BN (Figure 2).

**Figure 2.** BN structure with nodes and links. We assumed that the reported soil textures are similar in the RBZ and their catchments.

In a BN, discrete states have to be defined for each node. They have to be exhaustive and mutually exclusive. Apart from flow pathway, the states of the nominal variables were groups of the reported values (see columns three and four in Table 1). The original nutrient forms were aggregated to the three states "particulate", "mixed", and "dissolved" based on the preliminary regression analyses. The various values of soil texture were grouped into (i) the three states "fine" (clayey soils), "medium" (loamy and silty soils), and "coarse" (sandy soils), and (ii) only two states "fine" and "coarse" (comprising loamy, silty, and sandy soils). The plant species composition, structure, and management of RBZ varied widely in the literature and were simplified to "grass" (herbaceous) and "woody" (with shrubs and trees).

In contrast to the nominal variables, the three continuous variables had to be discretized. Different discretization approaches exist but seemingly no optimal one [59,60]. Given the uneven value distribution and to acknowledge the limited data availability for parent nodes, we used the common equal-frequency approach. For RBZ width, however, we slightly adjusted the quantiles to match legal regulations. Since one aim of this study was to assess the effect of state definition on model accuracy, different numbers of quantiles were used (see Table 1).

#### *2.4. BN Training and Evaluation—Importance of Nodes and State Definition*

≤ BNs with different number of states were trained using efficacy values with data for all parent nodes (*n* = 580) and the Maximum Likelihood parameter estimation. The training filled the conditional probability tables (CPT) which describe the probability of each state of the child node considering all possible states of the parent nodes, based on Bayes' theorem. The different state definitions of four nodes (see Table 1) resulted in 72 different BNs. A 10-fold cross-validation—i.e., 90% of the data used for training and 10% for validation—was repeated 10 times for each of these BNs to compare prediction (classification) errors [59,61], i.e., the share of wrongly predicted output states. Ten different probabilities and prediction errors were thus obtained for each BN, ranging from 0 (all predictions correct) to 1 (all predictions wrong).

In order to assess how the state definition (state number) of the parent and output nodes affected the model performance, we used the non-parametric Wilcoxon Signed Rank test to determine whether the prediction errors differed significantly for parent nodes with different numbers of states (*p* < 0.05).

We also compared the complete BNs with all parent nodes to simplified BNs where the least important nodes were removed. The node importance was evaluated in two ways. Firstly, we applied the non-parametric Kruskal–Wallis test to each parent node to assess whether the RBZ efficacy for at least one state significantly differs from the other states (*p* < 0.05). Following Moe et al. [62], we also performed a classification tree analysis for each of the 72 possible state combinations. For all parent nodes we compared their relative importance and their occurrence in the classification trees to a random binary variable. In the simplified BN, we did not consider nodes with an influence close to the random variable and with a low occurrence in the classification trees (≤33%). As the values of the random variable vary with each repetition, we averaged the outcomes of 10 analyses to avoid misclassifications. The cross-validation and state evaluation were repeated with the 72 simplified BNs. We discarded undefined output from validation data outside the training data.

#### *2.5. Implementation*

All statistical analyses were conducted with the software R [63]. For the training and cross-validation of the BNs we used the R package bnlearn [64], and rpart [65] for the classification tree analysis. To exemplarily show the influence of individual parent nodes on the output, we trained an "optimal" BN—based on the prediction errors and the CPT size—with the full dataset. For the node of interest, we determined the BN output for each possible state by setting its probability to 1 ("hard evidence") while assuming an equal probability of all states of the other parent nodes ("soft evidence"). These calculations were conducted using the R package gRain [66].

#### **3. Results**

#### *3.1. Aggregating the Nutrient Forms*

The retention of TN and TP were well correlated (r<sup>2</sup> = 0.64). Both were also more significantly correlated to sediment retention values (Figure 3, r<sup>2</sup> = 0.40–0.48, *p* < 0.0001) than the DN and DP retentions (r<sup>2</sup> = 0.09, *p* = 0.003). Based on these results, we considered TN and TP together as a separate "mixed" state of the node "nutrient form" and distinguished it from the "particulate" state (mainly sediment) and the "dissolved" nutrient forms, for which sedimentation is less important.

**Figure 3.** RBZ efficacy to retain total phosphorus (TP) (**a**), total nitrogen (TN) (**b**), as well as dissolved N and P (DP, DN) (**c**) compared to particulates as collected from the literature with (dashed) lines of best fit. Higher state numbers correspond to broader RBZ.

#### *3.2. Variability of RBZ E*ffi*cacy and Importance of Nodes*

≤ The reported RBZ efficacy ranged from complete retention to negative values, i.e., where the RBZ acted as sources. The distribution of all values was left-skewed with a median of 68% and a mode of 87%. This variability also resulted in a strong overlap between the states of the parent nodes (Figure 4). Therefore, even wide buffers can have a rather low efficacy (Figure 4a). However, the Kruskal–Wallis test revealed significant differences in RBZ efficacy among the states of all parent nodes (*p* ≤ 0.0003) except the flow pathway (*p* = 0.07) and RBZ vegetation (*p* = 0.8). For instance, RBZ are generally more effective with coarse soils than with fine and medium soils (Figure 4c) except for DP (not shown).

The classification tree analysis confirmed the strong influence of nutrient form, RBZ width, and soil texture on the RBZ efficacy (Figure 5). Despite its comparatively low importance (9%), RBZ slope occurred in 60% of the classification trees. The least important variables were RBZ vegetation and flow pathway (6%). Both variables rarely occurred in the trees, although they were more influential than a random variable (1.5%). This was especially the case for the flow pathway.

**Figure 4.** RBZ efficacy for the states of the parent nodes using the maximum number of states as given in Table 1. Higher state numbers correspond to broader RBZ (**a**) and steeper RBZ (**f**).

**Figure 5.** Relative importance (**a**) and occurrence of BN nodes (**b**) in the classification trees compared to a random binary variable. Note the changed order of (RBZ) vegetation and (flow) pathway.

#### *3.3. Cross-Validation and Evaluation of BNs*

Fewer output states resulted in lower prediction errors, with average values over all BNs ranging from 0.31 (two states) to 0.66 (five states). However, the BNs predicted the lowest and highest output states significantly better than the intermediate states (Figure 6). These errors correspond to 62% of the expected error of a random predictor (E) for the two-state output (E = 0.5) up to 82% for the five-state output (E = 0.8). The values for the minimum and maximum states equaled on average 63% of E.

− − The prediction errors for RBZ efficacy were significantly lower when the nutrient form was "particulate" or "mixed" rather than "dissolved", on average by 0.05 (−0.05–0.15, Figure 7). The impact of the flow pathway and the RBZ vegetation on the prediction error was negligible. The error even decreased on average by 0.01 after the BNs were simplified, i.e., by excluding the RBZ vegetation and the flow pathway. Limiting the database to the respective dominant state of these two nodes, i.e., to grassed RBZ and surface flow (*n* = 391), had no additional effect on the error.

**Figure 6.** Cross-validation results for the different output states based on the 72 complete BNs with all nodes. Higher state numbers correspond to more effective RBZ.

The prediction errors were generally lowest when the parent nodes had three states (Table 3). However, for some BNs with all six parent nodes ("complete BN" in Table 3) choosing four states for the node "RBZ width" resulted in better predictions. Choosing two states, on the other hand, increased the predictions errors by up to 0.05 for RBZ width.


**Table 3.** Effect of the state number of parent nodes on prediction errors for different BNs. Mean errors and *p* values based on the outcomes of six BNs (RBZ width, RBZ slope) and nine BNs (soil texture). The nodes "flow pathway" and "RBZ vegetation" in complete BNs have only two states and are not shown.

<sup>1</sup> One-sided Mann–Whitney U test, symbols: *p* < 0.001 (\*\*\*), < 0.01 (\*\*), < 0.05 (\*), <sup>2</sup> prediction error smaller with two states than with three states.

**Figure 7.** Cross-validation results for disaggregated (dis), particulate (par), and mixed (total, mix) nutrients and different numbers of output states.

#### **4. Discussion**

#### *4.1. BN Cross-Validation and Evaluation*

We tested the influence of nodes and states on RBZ efficacy in order to identify the minimum data requirement while ensuring no significant decrease in model performance and reliability. In this sense, the simplified BN with four out of the six considered parent nodes and with three (slope, soil texture) or four states (width) for the parent nodes as well as two to three output states can be considered as "optimal" among the tested alternatives (Figure 8A). The CPT size, i.e., the product of the state numbers, is 216 or 324 elements—which is manageable given the available input data [59]. Nonetheless, 0.3% or 0.9% of the predicted values during the cross-validation were undefined (compared to 0.2% for the smallest CPT size).

In accordance to the reviewed literature (cf. Methods), the BN reflects that a high RBZ efficacy was less probable for narrow than for broad RBZ (Figure 8B, a), for fine sediments than for coarse sediment (Figure 8B, b), and for dissolved nutrients than for particulate nutrients (Figure 8B, c). The more complex pattern for RBZ slope (Figure 8B, d) depended on the nutrient form (not shown). The higher probability of low RBZ efficacies for medium slopes occurred only for TN and TP. For particulate and dissolved nutrients, the efficacy of RBZ declined with increasing slope. Considering the dominance of studies on surface flow, this pattern can be explained by higher flow velocity and transport capacity on steeper slopes (cf. Methods). Nonetheless, it differs from findings of Zhang et al. [13], based on a smaller dataset, that sediment retention decreases with slopes steeper than 10% but increases with slopes below this threshold. At the plot scale—which dominates the collected data—flow resistance and other flow characteristics varies with slope gradient but also depends on the fragmentation of riparian vegetation [67] which is usually not available from RBZ studies. Following Zhang et al., more research on slope effects on RBZ efficacy is recommended to revise the inconsistent probabilities.

The use of these commonly available variables as nodes facilitates the envisaged application and validation of this first BN to estimate the retention efficacy of RBZ. This contributes to a better evaluation at medium to large scales of how RBZ can help to improve the water quality. The BN was significantly better in distinguishing effective (high state) from ineffective RBZ (low state) than between intermediate states (Figure 6). With two output states, the BN can for instance support or replace guidelines and design aids for effective RBZ based on single factors. For quantitative assessments, the discrete output states could be replaced by the average values of 45% and 90% efficacy in our database, or 30%, 70%, and 90% if a third state is needed. The high prediction errors hamper the use of more intermediate efficacy states at least in combination with our heterogeneous empirical dataset. Note that the BN is not intended to replace but rather complement other modelling approaches. For instance, Figure 8b illustrates that the BN can deal with uncertain evidence and provides probabilities of RBZ efficacy. Nonetheless, the high influence of RBZ width, soil texture, and slope emphasize the need of multivariate approaches in RBZ analyses.

**Figure 8.** The "optimal" BN with three output states (**A**) influence diagram (Supplementary File S2, par. = particulate, diss. = dissolved), (**B**) probability of the output states (RBZ efficacy) given hard evidence of the parent node mentioned below the subfigures a–d (state with probability = 1 on x axes) and soft evidence (here equal probability of all states) of the other parent nodes. Higher state numbers correspond to broader RBZ (RBZ width), steeper RBZ (RBZ slope), and more efficient RBZ.

Although the loss of information is a potential limitation of discrete BN [59], defining more than two states for parent nodes was only slightly helpful to improve the model accuracy. The prediction errors as well as the variability and the significant inconsistency among the reported efficacy values may indicate the need for supplementing BN nodes. Even with more input data than in previous studies, our analyses indicate that additional parent nodes are not necessarily useful to improve predictions if the data is biased and does not cover the variability of site conditions. For instance, the low influence of RBZ vegetation is in line with the similarity of woody and grassed RBZ as observed in previous studies (cf. Methods). However, the dominant grassed RBZ were on average narrower (9 m) than the woody RBZ (22.6 m). Moreover, the common distinction between grassed and woody RBZ does not consider the influence of different plant groups and species on the nutrient retention in RBZ [68,69]. Vegetation and root density seem to be promising to refine the state definition and to improve model predictions (cf. Methods), but such data is unavailable in existing RBZ studies.

#### *4.2. Further BN Development*

The current BN can be improved in different ways. The collection of training and validation data could be limited to specific experimental designs, scales, or site conditions. This simple approach would minimize the impact of some factors not explicitly considered in the BN at the expense of restricting our intended applicability at large scales. More nodes and states, on the other hand, require enough representative training data to fill the rapidly growing CPT. Expert knowledge can reduce the data demand (e.g., [70]) but its validation can be challenging [71]. We exemplarily discuss below constraints of the available data regarding the design and application of the BN (and other empirical models).

Firstly, only less than 20% of the datasets were on subsurface flow. The underrepresentation was most striking for DP (*n* = 10). Although limiting the dataset to surface flow did not reduce the prediction errors, the observed low influence of the flow pathway on RBZ efficacy and the (slightly) higher errors for dissolved nutrients (Figure 7) should be scrutinized with more (DP) data. However, revising the role of flow pathway may also require considering the contrasting effects of external factors like redox conditions on the retention of P (oxidation/reduction of iron oxides) and N (denitrification) either as separate node or as additional DN and DP states of the nutrient node.

Secondly, the current BN may overestimate the efficacy of "real-world" RBZ because most literature values were measured in rather short-term studies. However, aging buffer strips may turn from sinks to sources [34,53,72] and foster eutrophication if poorly maintained [72–74]. Negative values occurred in 26 datasets and comprised mostly DP (*n* = 16) and DN (*n* = 9), followed by TN (*n* = 3), TP (*n* = 2), and PP (*n* = 1). Given the variety of underlying processes and the small number of cases, they could not be considered as a separate state in the output node. Apart from the remobilization of P from saturated RBZ soils as DP [34,45], P-rich colluvium in RBZ can be remobilized during heavy rainfall or flood events. Detailed maps on water-soluble P, degree of P saturation, and residual soil P in agricultural areas (e.g., [7]) might be helpful to overcome this limitation in future BN. The data availability is, however, more challenging for other processes such as the decay of plants [74,75].

Thirdly, other below-average retention efficacies e.g., for sediment retention in our data collection were explained by conservation tillage (CT) [76,77] and preferential flow [78] in the catchment of RBZ. CT, i.e., minimum- and no-till, has increasingly been used worldwide during the last decade [79], and might do so in the future [80], to reduce sediment and PP losses from agricultural land. However, the accumulation of labile P pools in top soils and the development of preferential flow paths belowground can unintendedly impair the RBZ efficacy to retain soluble, immediately bioavailable P [34,81]. No-till can also increase the water and nitrate flux entering RBZ from adjacent fields [82]. While more water flux can reduce the nitrate retention in subsurface flow [46], higher (initial) concentrations favor its retention [47].

In the few available quantitative studies on CT, no-till reduced significantly the RBZ efficacy compared to conventional till, notably except for P (Figure 9a). However, most of the values already belonged to the lowest efficacy state (RBZ efficacy < 55%). Therefore, the influence of CT on the BN output would currently be rather low. It might also not be representative for other output states. By separating the soil texture in RBZ and their catchments, the BN could consider some of the complex impacts of CT together with nutrient form.

**Figure 9.** Examples of non-representative and inconsistent effects on RBZ efficacy, (**a**) tillage (data from references [74,76,83,84], paired *n* = 30), (**b**) preferential flow (left: data from references [78,85], right: data from [86], without switchgrass barrier, paired *n* = 6).

As a side effect, these two BN nodes can generally integrate the effects of hydrology [46,87], land use, and land management [16,21]. These variables affect the concentration [47,88], timing, amount, and composition of nutrient and sediment entering RBZ and have been used in other empirical RBZ models (e.g., [16,46]).

Fourthly, most published RBZ studies were plot studies which are typically constructed with uniform slopes to favor sheet-flow conditions under which RBZ work best [48,89]. However, they neglect the tendency of water to converge and develop preferential flow paths at larger scales and within natural terrain [90–92]. According to the very few available quantitative studies, preferential flow can reduce the efficiency of RBZ [78,85,86,93] (Figure 9b, left). Nonetheless, RBZ may perform equally well depending on RBZ width and structure as well as the runoff conditions [86,91] (Figure 9b, right). According to our data collection, plot studies reported on average less effective RBZ than fieldand watershed studies which can be explained by the narrower RBZ in these plot studies. Therefore, more field studies are needed to reliably quantify how preferential flow affects the efficacy of RBZ.

In an extended BN, the current output state could be reduced based on the probability of preferential flow. The likelihood of preferential surface flow is often estimated from digital elevation models by applying area thresholds or variants thereof [94] or from linear anthropogenic features [95]. Similar slope–area relationships are known for e.g., the topographic index [96,97] and soil erosion [98] which also influence the variability and composition of nutrients.

#### **5. Conclusions**

The discrete, multivariate BNs to predict RBZ efficacy developed and evaluated in our study are flexible tools which e.g., provide probabilities instead of exact outcomes. They predict sufficiently well the RBZ efficacies collected from previous studies. The "optimal" BNs have four parent nodes (width, slope, nutrient form, and soil texture) with more than two states and up to three states for RBZ efficacy. The resulting CPTs are manageable with the available input data. RBZ vegetation and flow pathway are found to be not relevant.

The model performance is better for the highest and lowest states of RBZ efficacy compared to the intermediate states. The BNs could thus complement or replace existing design guidelines for RBZ. The expected limitations for more detailed predictions arise from missing data for other variables as discussed in the literature as well as the heterogeneous database which does not adequately represent all site conditions. This is relevant for the vegetation type which typical states "grass" and "woody" comprise different species. Additionally, most data are obtained from plot-scale studies which typically do not consider preferential flow. More field-scale studies are needed to better represent preferential flow in BNs. Future research should focus on the database to better cover the different state combinations and to provide new data to train and validate the relationships.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4441/12/3/617/s1, Table S1: Data table, File S2: BN in Figure 8a as Hugin. net file.

**Author Contributions:** Conceptualization, A.G. and M.V.; methodology, A.G. and M.V.; software, A.G.; validation, A.G.; formal analysis, A.G.; investigation, A.G., P.F., and M.V.; resources, A.G. and P.F.; data curation, A.G. and M.V.; writing—original draft preparation, A.G.; writing—review and editing, A.G., H.H.N., P.F., J.K., and M.V.; visualization, A.G.; supervision, M.V.; project administration, J.K. and M.V.; funding acquisition, J.K. and M.V. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the German "Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research)", grant number 01LC1618A within the 2015–2016 BiodivERsA COFUND call. Peter Fischer was financed by the "Ministerium für Umwelt, Klima und Energiewirtschaft (Ministry of the Environment, Climate Protection and the Energy Sector) Baden-Württemberg" within the project "Retention of sediments, nutrients, and pesticides in riparian buffer zones".

**Acknowledgments:** We thank Daniela Pietras (University of Rennes) for assisting the data collection. We are thankful to three anonymous reviewers for their detailed and valuable comments.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
