5.2.3. Validation

Although the unit stream power formula was derived from laboratory data only, Yang intended his dimensionless unit stream power equation, as stated in [80], to be used by engineers for estimating the total sediment concentration in both laboratory flumes and natural rivers. To confirm the applicability of the formula to natural streams, Yang validated it against total sediment concentrations and total suspended sediment loads from several natural rivers and streams [81–85]. The results showed that Equation (4) predicts total sediment load, or total bed-material load, in the sand size range fairly accurately in natural rivers, as it does in laboratory flumes [55,80].

To test the applicability of the crisp and fuzzy regression formulas presented in this study, data from three different sandy-bed rivers in Wisconsin, USA, taken from a US Geological Survey [86], were used. More specifically, total sediment concentration (bed load and suspended load) measurements from the Wisconsin River at Muscoda, the Black River near Galesville, the Chippewa River at Durand and the Chippewa River near Pepin were used for the validation of Equations (24) and (26). The median particle diameters (*d*<sub>50</sub>) were obtained from granulometric curves constructed from sieve analysis data, and range between 0.38 mm and 0.88 mm. Along with the sediment data, basic hydraulic parameters, such as flow velocity, flow depth, energy slope and water temperature, were available in the same survey. These data were used to compute the independent variables in Equations (24) and (26). The independent variables in any of Equations (4), (24) or (26) represent the geometric and flow characteristics of the stream to which the equations are applied. A total of 55 data sets were used for the validation of Equations (24) and (26).

Several well-known metrics, namely the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE), the Mean Bias Error (MBE), the Index of Agreement (d) and the Nash-Sutcliffe Efficiency (NSE), were used to test the validity of the crisp multiple regression Equation (24). Although the comparison between observations and computations yielded low statistical errors (RMSE = 0, MAE = 0.3, MBE = −0.124) and a fair Index of Agreement (d = 0.483), the negative Nash-Sutcliffe Efficiency (NSE = −1.207) (see Appendix A) indicates that Equation (24) cannot be applied to the selected data sets. However, this did not come entirely as a surprise. Despite the validated suitability of Yang's formula for both laboratory flumes and natural rivers in the sand range [55,80], Yang and Stall, in their report "Unit stream power for sediment transport in natural rivers" [80], also stress the constraints of the unit stream power theory for natural rivers. According to them, these constraints come down to particle size, temperature and water depth. Adding to these the uncertainties of stream sediment transport mentioned in Section 3, it becomes clear that the success of the crisp regression, Equation (24), is not guaranteed for natural streams.
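The five metrics named above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes the common textbook definitions (Willmott's form for the Index of Agreement, and an error defined as computed minus observed for the MBE), whereas the definitions actually used in the paper are those given in Appendix A. The function name `validation_metrics` and the sample arrays are hypothetical.

```python
import numpy as np

def validation_metrics(observed, computed):
    """Return RMSE, MAE, MBE, Index of Agreement (d) and NSE.

    Assumed conventions (not taken from the paper):
    error = computed - observed; d follows Willmott's definition.
    """
    obs = np.asarray(observed, dtype=float)
    com = np.asarray(computed, dtype=float)
    err = com - obs
    obs_mean = obs.mean()

    rmse = np.sqrt(np.mean(err ** 2))      # root mean square error
    mae = np.mean(np.abs(err))             # mean absolute error
    mbe = np.mean(err)                     # mean bias error (signed)

    # Index of Agreement: 1 - SSE / potential error
    potential = np.sum((np.abs(com - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    d = 1.0 - np.sum(err ** 2) / potential

    # Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations about their mean
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs_mean) ** 2)

    return {"RMSE": float(rmse), "MAE": float(mae), "MBE": float(mbe),
            "d": float(d), "NSE": float(nse)}

# Made-up values: a constant, biased prediction of three observations.
metrics = validation_metrics([1.0, 2.0, 3.0], [3.0, 3.0, 3.0])
print(metrics)
```

For these made-up values the NSE is −1.5 while d is about 0.44. This shows how a model can have a moderate Index of Agreement and still perform worse than simply predicting the mean of the observations, which is what a negative NSE means.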

This deficiency of the crisp regression is overcome by the multiple fuzzy regression Equation (26), whose fuzzy band contains 96.36% of the observed data. This can be clearly seen in Figure 4: 53 of the 55 observations fall within the fuzzy band produced by Equation (26).

**Figure 4.** Multiple fuzzy regression of total sediment concentration in natural streams.

To check the performance of the fuzzy curve against the observations, the following validation measures are proposed:

$$\begin{aligned} E_1 &= \left[ \sum_{j=1}^{m} a_{R_j} \left( \log C_{t_j} - \log C_{F_j}^{+} \right)^2 + \sum_{j=1}^{m} a_{L_j} \left( \log C_{t_j} - \log C_{F_j}^{-} \right)^2 \right]^{\frac{1}{2}} \\ a_{R_j} &= \begin{cases} 0 & \text{if } \log C_{F_j}^{+} \geq \log C_{t_j} \\ 1 & \text{if } \log C_{F_j}^{+} < \log C_{t_j} \end{cases} \qquad a_{L_j} = \begin{cases} 0 & \text{if } \log C_{F_j}^{-} \leq \log C_{t_j} \\ 1 & \text{if } \log C_{F_j}^{-} > \log C_{t_j} \end{cases} \end{aligned} \tag{31}$$

*Water* **2020**, *12*, 257

This validation measure quantifies how far the produced fuzzy band falls short of enclosing all the data. In other words, the squared penalty term in *E*<sub>1</sub> is activated if, and only if, an observation lies outside the produced fuzzy band. This measure was initially proposed by Ishibuchi et al. [87] as a cost function to be minimized in the learning process of a neural network with interval weights.
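The penalty described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes that the first sum of Equation (31) measures the distance to the upper bound of the band and the second sum the distance to the lower bound, so that a point inside the band contributes nothing. The function name `e1_measure`, the argument names and the sample values are hypothetical.

```python
import numpy as np

def e1_measure(log_c_obs, log_c_lower, log_c_upper):
    """Square root of the summed squared distances of out-of-band points.

    Assumption: the right-hand indicator switches on when the observation
    exceeds the upper bound, the left-hand one when it falls below the
    lower bound.
    """
    obs = np.asarray(log_c_obs, dtype=float)
    lo = np.asarray(log_c_lower, dtype=float)
    hi = np.asarray(log_c_upper, dtype=float)

    # Distance above the upper bound; zero for points at or below it.
    above = np.clip(obs - hi, 0.0, None)
    # Distance below the lower bound; zero for points at or above it.
    below = np.clip(lo - obs, 0.0, None)

    return float(np.sqrt(np.sum(above ** 2) + np.sum(below ** 2)))

# Made-up logarithmic values: first point inside the band,
# second point below it, third point above it.
obs = [2.0, 2.5, 3.0]
lower = [1.8, 2.6, 2.5]
upper = [2.2, 3.0, 2.9]
print(e1_measure(obs, lower, upper))
```

Clipping at zero plays the role of the indicators: it removes the contribution of every point that the band already encloses, so the measure is zero when all observations are inside the band.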

The second validation measure counts the points (observed data) that lie outside the produced fuzzy band:

$$E_2 = \sum_{j=1}^{m} a_{R_j} + \sum_{j=1}^{m} a_{L_j} \tag{32}$$
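A minimal Python sketch of this count is given below. It assumes, as an interpretation of the indicators, that a point is counted when it lies strictly above the upper bound or strictly below the lower bound of the band. The names `e2_measure` and `band_coverage` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def e2_measure(log_c_obs, log_c_lower, log_c_upper):
    """Number of observations that fall outside the fuzzy band."""
    obs = np.asarray(log_c_obs, dtype=float)
    lo = np.asarray(log_c_lower, dtype=float)
    hi = np.asarray(log_c_upper, dtype=float)

    a_right = obs > hi   # observation above the upper bound
    a_left = obs < lo    # observation below the lower bound
    return int(np.count_nonzero(a_right) + np.count_nonzero(a_left))

def band_coverage(e2, m):
    """Share of the m observations enclosed by the band."""
    return 1.0 - e2 / m

# With 2 of 55 points outside the band, the enclosed share is 53/55.
print(round(100 * band_coverage(2, 55), 2))  # prints 96.36
```

The count and the percentage of enclosed observations carry the same information: for 55 observations, two points outside the band correspond to 96.36% of the data inside it.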

By construction, applying the above validation measures to the training data (laboratory data) gives *E*<sub>1</sub> = *E*<sub>2</sub> = 0. Applying them to the data for natural streams gives *E*<sub>1</sub> = 0.1868 and *E*<sub>2</sub> = 2. This means that only two points lie outside the fuzzy band (*E*<sub>2</sub> = 2), and that these points are not far from it, as suggested by the low value of the *E*<sub>1</sub> measure.

Ultimately, the success of the crisp regression Equation (24) is not guaranteed when it is applied to a dataset different from the one it was derived from (in this case a dataset from natural rivers). Indeed, the measurement data are widely dispersed around the crisp curve of Equation (24), and hence the Nash-Sutcliffe Efficiency takes a negative value (Figure 4). In contrast, when the fuzzy curve of Equation (26) is applied to a dataset different from the one it was derived from, nearly all the data are enclosed by the produced fuzzy band, and the few points outside it diverge only slightly (Figure 4). Therefore, the fuzzy curve can be used to obtain a fuzzy estimate of the logarithm of the total sediment concentration.
