2.3.4. Multiple Linear Regression (MLR)

A linear regression analysis that involves more than one independent variable is called multiple linear regression (MLR). Its advantage is simplicity: it shows directly how the dependent variable relates to the independent variables. The general form of the MLR model is:

$$y = c\_0 + c\_1 \mathbf{x}\_1 + c\_2 \mathbf{x}\_2 + \dots + c\_n \mathbf{x}\_n \tag{7}$$

where *y* is the dependent variable; *x*<sub>1</sub>, *x*<sub>2</sub>, ... , *x*<sub>*n*</sub> are the independent variables; *c*<sub>1</sub>, *c*<sub>2</sub>, ... , *c*<sub>*n*</sub> are the regression coefficients; and *c*<sub>0</sub> is the intercept. The coefficients are calculated using the least-squares rule or another regression method [27].
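As an illustration of Equation (7), the following minimal sketch estimates the intercept and coefficients by ordinary least squares with NumPy. The function names `fit_mlr` and `predict_mlr` and the synthetic data are assumptions made for this example; they are not taken from the study, which may have used a different fitting procedure.

```python
import numpy as np


def fit_mlr(X, y):
    """Return (c0, c) minimizing the sum of squared residuals of y = c0 + X @ c."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept c0 is estimated with the slopes.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def predict_mlr(X, c0, c):
    """Evaluate Equation (7) for each row of X."""
    return c0 + np.asarray(X, dtype=float) @ c


# Synthetic, noise-free data generated from y = 1 + 2*x1 - 3*x2 (illustrative only).
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
c0_demo, c_demo = fit_mlr(X_demo, y_demo)
```

Because the synthetic data contain no noise, the fit recovers the generating intercept and coefficients up to floating-point error. With real climatic inputs, each column of `X` would hold one input variable.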

#### *2.4. Modeling Methodology*

In the present study, the daily pan evaporation (EPan) was estimated from six input climatic variables (Tmax, Tmin, RH-1, RH-2, W.S., and S.S.H.). Five techniques were used for the estimation: the artificial neural network (ANN), wavelet-based artificial neural network (WANN), radial function-based support vector machine (SVM-RF), linear function-based support vector machine (SVM-LF), and multiple linear regression (MLR) models. The climatic parameters were collected from 2013 to 2017 and split into three scenarios based on the percentage of training and testing datasets used for model development (Table 3).


**Table 3.** Different scenarios of training and testing datasets used in this study.

Scenario 1 uses 60% of the data (2013–2015) for training and 40% (2016–2017) for testing. Scenario 2 uses 70% of the data for training and 30% for testing from 2016. Scenario 3 uses 80% of the data (2013–2016) for training and 20% (2017) for testing. The training datasets were used for calibration, and the testing datasets were used for validation.
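A chronological split of this kind can be sketched as follows. This is a simplified illustration, not the study's procedure: the helper `chronological_split`, the scenario keys, and the placeholder series `days` are hypothetical, and the split is made by fraction of records rather than by calendar year.

```python
def chronological_split(records, train_fraction):
    """Split time-ordered records into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    # The earliest records go to training, the remainder to testing.
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Hypothetical stand-in for a daily series: one integer per day.
days = list(range(1000))
fractions = {"scenario_1": 0.6, "scenario_2": 0.7, "scenario_3": 0.8}
splits = {name: chronological_split(days, frac) for name, frac in fractions.items()}
```

Keeping the records in time order means every testing observation comes after every training observation, which matches the year-based layout described above.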

The results of the applied models in the three scenarios were evaluated using the performance criteria described in Section 2.5.

#### *2.5. Performance Evaluation Criteria*

Model performance in the scenarios above was evaluated quantitatively using four criteria, namely root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), Pearson's correlation coefficient (PCC), and the Willmott index (WI), and qualitatively through graphical interpretation (time-series plot, scatter plot, and Taylor diagram). The RMSE ranges from zero to infinity (0 ≤ RMSE < ∞); the lower the RMSE, the better the model's performance. The NSE ranges from minus infinity to one (−∞ < NSE ≤ 1). An NSE of zero indicates that the model predictions are only as accurate as the mean of the observed data, whereas negative values indicate that the observed mean is a better predictor than the model [48]. The PCC, also known as the correlation coefficient, measures the degree of collinearity between observed and estimated values and varies from minus one to plus one (−1 ≤ PCC ≤ 1) [39]. The WI, also known as the index of agreement, ranges from zero to one (0 ≤ WI ≤ 1); a value close to 1 indicates ideal agreement/fit [3]. The most accurate models were selected as those with the highest values of PCC, NSE, and WI and the lowest values of RMSE among all developed models.

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{N} \left(E\_{p\_{obs,i}} - E\_{p\_{pre,i}}\right)^2}{N}};\tag{8}$$

$$NSE = 1 - \left[\frac{\sum\_{i=1}^{N} \left(E\_{p\_{obs,i}} - E\_{p\_{pre,i}}\right)^2}{\sum\_{i=1}^{N} \left(E\_{p\_{obs,i}} - \overline{E}\_{p\_{obs}}\right)^2}\right];\tag{9}$$

$$\text{PCC} = \frac{\sum\_{i=1}^{N} \left(E\_{p\_{obs,i}} - \overline{E}\_{p\_{obs}}\right)\left(E\_{p\_{pre,i}} - \overline{E}\_{p\_{pre}}\right)}{\sqrt{\sum\_{i=1}^{N} \left(E\_{p\_{obs,i}} - \overline{E}\_{p\_{obs}}\right)^2 \sum\_{i=1}^{N} \left(E\_{p\_{pre,i}} - \overline{E}\_{p\_{pre}}\right)^2}}; \tag{10}$$

$$\text{WI} = 1 - \frac{\sum\_{i=1}^{N} \left( E\_{p\_{obs,i}} - E\_{p\_{pre,i}} \right)^2}{\sum\_{i=1}^{N} \left( \left| E\_{p\_{pre,i}} - \overline{E}\_{p\_{obs}} \right| + \left| E\_{p\_{obs,i}} - \overline{E}\_{p\_{obs}} \right| \right)^2}, \tag{11}$$

where $E\_{p\_{obs,i}}$ and $E\_{p\_{pre,i}}$ are the observed and predicted pan evaporation values on the *i*th day, $\overline{E}\_{p\_{obs}}$ and $\overline{E}\_{p\_{pre}}$ are the averages of the observed and predicted values, respectively, and *N* is the number of observations.
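The four criteria in Equations (8)–(11) can be sketched in NumPy as shown below. The sketch follows the standard definitions, in which the mean of the observed values is the reference in the NSE and WI denominators. The function names and sample values are assumptions for illustration and are not data from the study.

```python
import numpy as np


def rmse(obs, pre):
    """Root mean square error, Equation (8)."""
    obs, pre = np.asarray(obs, dtype=float), np.asarray(pre, dtype=float)
    return float(np.sqrt(np.mean((obs - pre) ** 2)))


def nse(obs, pre):
    """Nash-Sutcliffe efficiency, Equation (9)."""
    obs, pre = np.asarray(obs, dtype=float), np.asarray(pre, dtype=float)
    return float(1.0 - np.sum((obs - pre) ** 2) / np.sum((obs - obs.mean()) ** 2))


def pcc(obs, pre):
    """Pearson's correlation coefficient, Equation (10)."""
    obs, pre = np.asarray(obs, dtype=float), np.asarray(pre, dtype=float)
    d_obs, d_pre = obs - obs.mean(), pre - pre.mean()
    return float(np.sum(d_obs * d_pre) / np.sqrt(np.sum(d_obs ** 2) * np.sum(d_pre ** 2)))


def willmott_index(obs, pre):
    """Willmott index of agreement, Equation (11)."""
    obs, pre = np.asarray(obs, dtype=float), np.asarray(pre, dtype=float)
    denom = np.sum((np.abs(pre - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return float(1.0 - np.sum((obs - pre) ** 2) / denom)


# Illustrative values only (not measurements from the study).
obs_demo = [2.0, 4.0, 6.0, 8.0]
mean_only = [5.0, 5.0, 5.0, 5.0]  # a "model" that always predicts the observed mean
```

Predicting the observed mean for every day gives an NSE of zero, which is the benchmark referred to in the interpretation of NSE. Note that `pcc` is undefined (division by zero) when either series is constant, so it is not evaluated for `mean_only`.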
