*4.2. Fuzzy Regression*

A fuzzy set can be seen as a mapping from a general set *X* to the closed interval [0, 1]. It is expressed by a membership function, which shows to what degree an element belongs to the examined fuzzy set. The membership function takes values in the interval [0, 1], with a membership degree of 0 indicating that the element does not belong to the set and a membership degree of 1 indicating that the element fully belongs to the set. Consequently, an element with a membership degree between 0 and 1 belongs to the set only to some degree [37].

A fuzzy number is a fuzzy set that additionally satisfies the properties of convexity and normality. It is defined on the real axis and its membership function is piecewise continuous [70].

The (soft) α-cut set of the fuzzy number *A*, with 0 < α ≤ 1, is defined as follows [71]:

$$\left[A\right]\_{\alpha} = \left\{\mathbf{x} \mid \mu\_A(\mathbf{x}) \ge \alpha, \quad \mathbf{x} \in \mathbb{R}\right\}\tag{7}$$

where μ*A*(*x*) is the membership function of the fuzzy number *A*; and ℝ is the set of real numbers.

An interesting point is that the crisp set containing all the elements with a non-zero membership degree is the 0-strongcut, which is defined as follows [72]:

$$A\_{0^{+}} = \left\{ \mathbf{x} \mid \mu\_A(\mathbf{x}) > 0, \quad \mathbf{x} \in \mathbb{R} \right\} \tag{8}$$

More precisely, according to Equation (8) above, the 0-strongcut is an open interval that does not contain its boundaries. For this reason, and in order to have a closed interval that contains the boundaries, Hanss [73] suggested the term worst-case interval *W*, which is the union of the 0-strongcut and its boundaries [74].
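As a minimal sketch of Equations (7) and (8), the α-cut and the worst-case interval can be written in closed form if the fuzzy number is assumed to be a symmetric triangular number with central value `a` and semi-width `c`, which is the shape adopted for the regression coefficients in this section. The function names are illustrative and are not taken from the cited works.

```python
def membership(x, a, c):
    """Membership degree of x in the symmetric triangular number (a, c), with c > 0."""
    return max(0.0, 1.0 - abs(x - a) / c)


def alpha_cut(a, c, alpha):
    """Closed interval of all x with membership >= alpha, for 0 < alpha <= 1 (Equation (7))."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    half_width = (1 - alpha) * c
    return (a - half_width, a + half_width)


def worst_case_interval(a, c):
    """Closed interval formed by the 0-strongcut (a - c, a + c) together with its boundaries."""
    return (a - c, a + c)


# The boundary point a + c has membership 0, so it lies outside the
# 0-strongcut of Equation (8) but inside the worst-case interval W.
print(membership(7, a=5, c=2))       # 0.0
print(alpha_cut(5, 2, alpha=0.5))    # (4.0, 6.0)
print(worst_case_interval(5, 2))     # (3, 7)
```

For other membership shapes the endpoints of the α-cut would generally have to be found numerically rather than from this closed form.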

Linear regression analysis is used to model the linear relationship between the independent variables and the dependent variable. Most of the data collected in the present study constitute independent variables, and the derived regression model should approximate the measurements of the dependent variable according to the criteria specified by the analyst. In the fuzzy linear regression model, the difference between the computed values and the actual values (measurements) is assumed to be due to the structure of the system. The model carries this uncertainty back to its coefficients; in other words, our inability to construct a precise relationship is introduced directly into the model through the fuzzy parameters [75,76]. Based on this reasoning, the coefficients of the independent variables are chosen to be fuzzy numbers. This study deals with cases where both the input data (independent variables) and the output data (dependent variable) are crisp numbers. The problem of fuzzy linear regression is reduced to a linear programming problem according to the following steps [77]:

*Water* **2020**, *12*, 257

1. The model is as follows:

$$
\overline{Y}\_{j} = \overline{A}\_0 + \overline{A}\_1 \mathbf{x}\_{1j} + \overline{A}\_2 \mathbf{x}\_{2j} + \dots + \overline{A}\_n \mathbf{x}\_{nj} \tag{9}
$$

where *Ȳj* is the fuzzy dependent variable, *j* = 1, ... , *m*; *Āi* = (*ai*, *ci*), *i* = 0, 1, ... , *n*, are the symmetric triangular fuzzy numbers selected as coefficients (Figure 1); and *xij*, *i* = 1, ... , *n*, are the independent variables. In addition, *n* is the number of independent variables; *m* is the number of data points; *ai* is the central value (where μ = 1); and *ci* is the semi-width.

2. Determination of the degree *h* to which each data point [(*x*1*j*, *x*2*j*, ... , *xnj*), *yj*] must be included in the estimated fuzzy number *Yj*:

$$
\mu\_{Y\_j}(y\_j) \ge h, \quad j = 1, \dots, m \tag{10}
$$

The constraints express the concept of inclusion when the output data are crisp numbers. In the examined case of the widely used model of Tanaka [47], a softer definition of fuzzy subsethood is used than the definition of Zadeh [42]. Hence, the inclusion of a fuzzy set *A* in the fuzzy set *B* with the associated degree 0 ≤ *h* ≤ 1 is defined as follows:

$$\left[A\right]\_{h} \subseteq \left[B\right]\_{h} \tag{11}$$

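In our case, since each data point is crisp, the set *A* reduces to a single crisp value (a data point that must be included in the produced fuzzy band) and the fuzzy set *B* is a triangular fuzzy number. The *h*-cut of a symmetric triangular fuzzy number with central value *a* and semi-width *c* is the interval [*a* − (1 − *h*)*c*, *a* + (1 − *h*)*c*]. Hence, with *x*0*j* = 1 for the constant term, Equation (11) is equivalent to: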

$$\sum\_{i=0}^{n} a\_i \mathbf{x}\_{ij} - (1 - h) \sum\_{i=0}^{n} c\_i |\mathbf{x}\_{ij}| \le y\_j \le \sum\_{i=0}^{n} a\_i \mathbf{x}\_{ij} + (1 - h) \sum\_{i=0}^{n} c\_i |\mathbf{x}\_{ij}|, \quad j = 1, \dots, m \tag{12}$$

It must be clarified that the above inequalities hold for a specified *h*-cut and not for every α-cut. Normally, the 0-strongcut is used, since greater levels of *h* lead to greater uncertainty (a wider fuzzy band).

3. Determination of the objective function *J* to be minimized. In the conventional fuzzy linear regression model, *J* is the sum of the semi-widths of the fuzzy outputs produced for the data:

$$J = \left\{mc\_0 + \sum\_{j=1}^{m} \sum\_{i=1}^{n} c\_i |\mathbf{x}\_{ij}| \right\} \tag{13}$$

where *c*0 is the semi-width of the constant term; and *ci* is the semi-width of each of the other fuzzy coefficients.

Since symmetric triangular fuzzy numbers are selected as fuzzy coefficients, it can be proved that the objective function is the sum of the semi-widths of the produced fuzzy band at the available data points:

$$J = \left\{ mc\_0 + \sum\_{j=1}^{m} \sum\_{i=1}^{n} c\_i |\mathbf{x}\_{ij}| \right\} = \frac{1}{2} \sum\_{j=1}^{m} \left( Y\_j^+ - Y\_j^- \right) \tag{14}$$

where *Yj*<sup>+</sup> and *Yj*<sup>−</sup> are the right and left endpoints of the 0-strongcut of the estimated fuzzy number *Yj*, respectively.

4. The problem results in the following linear programming problem:

$$\begin{array}{ll} \min & J = mc\_0 + \sum\_{j=1}^{m}\sum\_{i=1}^{n} c\_i |\mathbf{x}\_{ij}| \\ \text{subject to} & \sum\_{i=0}^{n} a\_i \mathbf{x}\_{ij} - (1-h)\sum\_{i=0}^{n} c\_i |\mathbf{x}\_{ij}| = y\_h^L \le y\_j \\ & \sum\_{i=0}^{n} a\_i \mathbf{x}\_{ij} + (1-h)\sum\_{i=0}^{n} c\_i |\mathbf{x}\_{ij}| = y\_h^R \ge y\_j, \quad j = 1, \dots, m \end{array} \tag{15}$$

where *ci* ≥ 0, for *i* = 0, 1, ... , *n*.
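The four steps above can be sketched in code as follows. This is an illustration under stated assumptions and not the implementation used in the study: the function names `solve_square` and `fit_fuzzy_regression` are hypothetical, and the linear program is solved by brute-force enumeration of the vertices of the feasible region with exact rational arithmetic, in place of a dedicated linear programming solver. The enumeration is only practical for a handful of data points and assumes that the data contain at least as many distinct points as coefficients, so that an optimal vertex exists.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(A, b):
    """Solve A z = b by Gauss-Jordan elimination; return None if A is singular."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fit_fuzzy_regression(X, y, h=0):
    """Return (a, c, J) minimizing Equation (13) subject to Equation (12).

    X is a list of m rows (x_1j, ..., x_nj); y is a list of m crisp outputs.
    """
    h = Fraction(h)
    rows = [[Fraction(1)] + [Fraction(v) for v in row] for row in X]  # x_0j = 1
    k = len(rows[0])  # number of fuzzy coefficients, n + 1
    G, g = [], []     # all constraints in the form G z <= g, with z = (a, c)
    for xr, yj in zip(rows, y):
        spread = [(1 - h) * abs(v) for v in xr]
        # lower end of the h-cut must not exceed y_j
        G.append(xr + [-s for s in spread])
        g.append(Fraction(yj))
        # upper end of the h-cut must not fall below y_j
        G.append([-v for v in xr] + [-s for s in spread])
        g.append(-Fraction(yj))
    for i in range(k):  # c_i >= 0
        G.append([Fraction(0)] * k + [Fraction(-1 if t == i else 0) for t in range(k)])
        g.append(Fraction(0))
    # objective: m * c_0 + sum_j sum_i c_i |x_ij|  (note that sum_j |x_0j| = m)
    cost = [Fraction(0)] * k + [sum(abs(xr[i]) for xr in rows) for i in range(k)]
    best = None
    for idx in combinations(range(len(G)), 2 * k):
        z = solve_square([G[i] for i in idx], [g[i] for i in idx])
        if z is None:
            continue
        if all(sum(gi * zi for gi, zi in zip(row, z)) <= rhs for row, rhs in zip(G, g)):
            J = sum(ci * zi for ci, zi in zip(cost, z))
            if best is None or J < best[2]:
                best = (z[:k], z[k:], J)
    return best


a, c, J = fit_fuzzy_regression([[1], [2], [3], [4]], [3, 6, 7, 9], h=0)
print("central values:", [float(v) for v in a])
print("semi-widths:", [float(v) for v in c])
print("J:", float(J))
```

When the data lie exactly on a straight line, all semi-widths returned by this sketch are zero and the central values reproduce the line, which is a convenient sanity check.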

In addition, when the data are crisp numbers, non-linear cases can often be approximated with the fuzzy linear regression model with the help of auxiliary variables. In this case, a large total uncertainty (cumulative width) indicates that the model is not complex enough, whereas non-physical behavior is an indicator of overtraining [77], caused by the adoption of excessive complexity in non-linear models.
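A minimal sketch of the auxiliary-variable idea is given below. It assumes, purely for illustration, that the auxiliary variables are integer powers of a single measured variable; the text does not prescribe this choice, and the function name `with_power_terms` is hypothetical. Each returned row can then play the role of (*x*1*j*, ... , *xnj*) in Equation (9), so the model stays linear in its fuzzy coefficients.

```python
def with_power_terms(x_values, degree=2):
    """Return one row [x, x**2, ..., x**degree] per measured value x."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return [[x ** p for p in range(1, degree + 1)] for x in x_values]


rows = with_power_terms([1, 2, 3], degree=2)
print(rows)  # [[1, 1], [2, 4], [3, 9]]
```

Raising `degree` adds complexity to the model, which is the trade-off between cumulative width and overtraining described above.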

**Figure 1.** Fuzzy symmetric triangular number.
