*3.4. Comparison of Training and Testing Datasets for Scenario 3*

In Scenario 3, 80% of the total dataset was used for training and the remaining 20% for testing the models. The results obtained with the ANN, WANN, SVM, and MLR models are shown in Table 6.
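The 80–20 partition can be illustrated with a minimal sketch. It assumes a chronological split, in which the first 80% of the ordered daily records form the training set; the text does not state whether the split was chronological or random, and the function name `split_train_test` and the sample values are hypothetical.

```python
def split_train_test(records, train_fraction=0.8):
    """Split an ordered sequence into a leading training part and a trailing testing part.

    Assumption: the split is chronological (no shuffling).
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(records) * train_fraction)  # index of the first testing record
    return records[:cut], records[cut:]


# Hypothetical daily Epan values (mm/day); these are not data from the study.
epan = [3.1, 2.8, 4.0, 5.2, 4.7, 3.9, 2.5, 3.3, 4.4, 5.0]
train, test = split_train_test(epan, train_fraction=0.8)
```

Changing `train_fraction` reproduces other training–testing proportions in the same way.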

As shown in Table 6, among the developed ANN models, ANN-3 had the highest PCC (0.520), with an RMSE of 1.333 and a WI of 0.688. Among the WANN models, WANN-1 performed best, with a PCC of 0.725, the lowest RMSE (1.213), the highest NSE (0.519), and the highest WI (0.812). SVM-RF-3 outperformed all other developed models during training, with the highest PCC (0.893), the lowest RMSE (0.858), the highest NSE (0.760), and the highest WI (0.913). The corresponding PCC, RMSE, NSE, and WI values for MLR were 0.688, 1.269, 0.474, and 0.795, respectively. Thus, SVM-RF modeled Epan most accurately among all the machine learning algorithms during training.

For the testing dataset, ANN-3 had the highest PCC among the developed ANN models (0.520), with an RMSE of 1.333 and the highest WI (0.688). Among the WANN models, WANN-1 performed best, with a PCC of 0.467, an RMSE of 1.447, and a WI of 0.639. Among the SVM-RF and SVM-LF models, SVM-RF-1 performed best, with the highest PCC (0.528), the lowest RMSE (1.411), and the highest WI (0.665) during testing.


**Table 6.** Results for ANN, WANN, SVM-RF, SVM-LF, and MLR during the training and testing period for Scenario 3 (80–20: Training–Testing).

For MLR, the testing values of PCC, RMSE, NSE, and WI were 0.506, 1.363, −0.227, and 0.665, respectively. The scatter plots and line diagrams for the testing period are shown in Figure 8. The line diagrams show that all models both under-predicted and over-predicted the observed values. In the scatter plots, the highest coefficient of determination (R<sup>2</sup> = 0.2791) was obtained for the SVM-RF model. Thus, SVM-RF modeled daily Epan most accurately among all the machine learning algorithms during testing.
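The four indices reported above can be computed as in the following sketch. It assumes the standard textbook definitions of the Pearson correlation coefficient, root mean square error, Nash–Sutcliffe efficiency, and Willmott index of agreement; the equations used in this manuscript are given outside this section and may differ in detail. The function names and sample values are hypothetical. The example also shows how a model can have a positive PCC and a negative NSE at the same time, as reported for MLR during testing: a systematic bias leaves the correlation unchanged but inflates the squared errors beyond the variance of the observations.

```python
import math


def pcc(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency; values below 0 mean the observed mean predicts better."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def willmott_index(obs, pred):
    """Willmott index of agreement (original 1981 form, assumed here)."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / potential


# Hypothetical values: predictions perfectly correlated with observations but biased by +3.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [4.0, 5.0, 6.0, 7.0, 8.0]
scores = (pcc(obs, biased), rmse(obs, biased), nse(obs, biased), willmott_index(obs, biased))
# PCC is 1.0 (up to rounding), RMSE is 3.0, and NSE is -3.5 for this biased series.
```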

The comparative training and testing results are shown in Table 7. The table suggests that, for both training and testing data, Epan can be modeled more accurately with the SVM-RF model than with ANN or WANN.

Ranked from best to worst, the models perform in the order SVM > ANN > MLR > WANN in all three scenarios. Table 7 also shows that the WANN model performed poorly compared with the other models. This is because, for this dataset, the wavelet transformation did not reveal the hidden information present in the primary time-series data through the different sub-series. It was also observed that, as the share of data used for training increased, the modeled results for the testing data became less accurate.

**Figure 8.** Line and scatter plots between observed and predicted data for Scenario 3 for (**a**) ANN, (**b**) WANN, (**c**) SVM-RF, (**d**) SVM-LF, and (**e**) MLR, for the study area.

The results of all developed models for the three scenarios are also compared using Taylor diagrams [50] in Figure 9a–c, which summarize performance in terms of the correlation coefficient, standard deviation, and root mean square difference [27]. Figure 9a–c indicates that, in all three scenarios, the SVM-RF predictions lie closest to the observed point on the abscissa, that is, closest to the observed daily values of Epan. Its correlation coefficient, standard deviation, and root mean square difference are also superior to those of the other models. Therefore, the SVM-RF model with Tmax, Tmin, RH-1, RH-2, WS, and SSH as climate input variables can be used for daily Epan estimation at the Pusa station.
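The three quantities shown on a Taylor diagram are linked by a law-of-cosines relation, which is what allows them to be drawn on a single plot. The sketch below computes them and checks that relation numerically. It assumes the usual construction with population standard deviations and a centered root mean square difference, meaning that the means are removed first, so this quantity is not the same as the RMSE reported in the tables. The function name and sample values are hypothetical.

```python
import math


def taylor_statistics(obs, pred):
    """Return (sigma_obs, sigma_pred, correlation, centered RMS difference)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    # Population standard deviations (divide by n), as assumed for the diagram.
    sigma_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sigma_p = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    corr = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * sigma_o * sigma_p)
    # Centered RMS difference: both series have their means removed before differencing.
    crmsd = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)) / n)
    return sigma_o, sigma_p, corr, crmsd


# Hypothetical observed and predicted Epan values (mm/day); not data from the study.
observed = [2.0, 3.5, 5.0, 4.0, 6.5]
predicted = [2.5, 3.0, 4.5, 5.0, 5.5]
sigma_o, sigma_p, corr, crmsd = taylor_statistics(observed, predicted)

# Law-of-cosines relation: E'^2 = sigma_o^2 + sigma_p^2 - 2 * sigma_o * sigma_p * R
identity_gap = crmsd ** 2 - (sigma_o ** 2 + sigma_p ** 2 - 2.0 * sigma_o * sigma_p * corr)
```

A model plots close to the observed point when its standard deviation is close to that of the observations and its correlation is high, which together imply a small centered RMS difference.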

**Figure 9.** Taylor diagrams of ANN, WANN, SVM-RF, SVM-LF, and MLR corresponding to (**a**) Scenario 1, (**b**) Scenario 2, (**c**) Scenario 3 during the testing period at the study site.


**Table 7.** Results for best ANN, WANN, SVM-RF, and MLR during the training and testing period for all scenarios.

#### **4. Discussion**

Our results are similar to those of [17,39], who modeled pan evaporation and found that the ANN and SVR models achieved high correlation coefficients ranging from 0.81 to 0.90. Our findings also agree with Cobaner [15], who observed that the ANN model with the Bayesian Regularization (BR) algorithm generated values of 0.76, 0.67, and 0.72 during training, validation, and testing, respectively; with the Levenberg–Marquardt (LM) algorithm, the corresponding values were 0.77, 0.69, and 0.71. For SVR, our findings are close to those of Tezel and Buyukyildiz [51], who concluded that SVR gave high correlations, ranging from 0.86 to 0.90, for evaporation forecasting. The SVR results are also in line with Pammar and Deka [52], who stated that the correlation coefficients and RMSE ranged from 0.79 to 0.84 and from 0.90 to 1.03, respectively, under different kernels. The RMSE values reported by Alizamir et al. [17] were 0.836 and 0.882 for the ANN 4-6-6-1 model and 1.028 and 1.106 for the MLR model during the training and testing periods, respectively. Their results showed that ANN estimated evaporation better than MLR, which agrees with the present study. The ANN model of pan evaporation proposed by Rahimi Khoob [21], with all available variables as inputs, was the most accurate, delivering an R<sup>2</sup> of 0.717 and an RMSE of 1.11 mm on an independent evaluation data set, which is consistent with our outcomes. As reported by Keskin and Terzi [25], the R<sup>2</sup> values of the ANN 3, 6, 1, ANN 6, 2, 1, and ANN 7, 2, 1 models for modeling Epan were 0.770, 0.787, and 0.788, respectively, which are also acceptable and agree with our results. These developed models produced a more acceptable outcome than those of Kim et al. [53], who stated that ANN and MLR generated R<sup>2</sup> values ranging from 0.69 to 0.74 and from 0.61 to 0.64, with RMSE values varying from 1.38 to 1.48 and from 1.56 to 1.60, respectively.
However, none of the models developed in this manuscript could capture the variability of the extreme values present in the input and output parameters at the given study location. The models' efficiency might be improved if the extreme values were removed. This is one of the limitations of the study outlined in this paper.

#### **5. Conclusions**

Evaporation is a strongly non-linear and stochastic process affected by relative humidity, temperature, vapor pressure deficit, and wind speed. In the present study, daily pan evaporation (Epan) estimation was evaluated using ANN, WANN, SVM-RF, SVM-LF, and MLR models. The input climatic variables for the estimation of daily Epan were maximum and minimum temperatures (Tmax and Tmin), relative humidity (RH-1 and RH-2), wind speed (WS), and bright sunshine hours (SSH). The limited free availability of these meteorological parameters for other stations in Bihar, India, is a significant concern and a limitation of this research. The proposed models were trained and tested in three separate scenarios (Scenario 1, Scenario 2, and Scenario 3) that used different percentages of the data points for training and testing. The models were evaluated using statistical indices, namely PCC, RMSE, NSE, and WI, and through visual inspection of line diagrams, scatter plots, and Taylor diagrams. The results demonstrated the ability of the SVM-RF model to estimate daily Epan from all weather inputs (Tmax, Tmin, RH-1, RH-2, WS, and SSH). The SVM-RF model performed best at the Pusa station in all scenarios investigated. The results also show that, as the share of data used for training increased, the modeled results for the testing data became less accurate. Because the Pusa dataset contains many extreme values, the developed models could not capture them efficiently; this is one of the limitations of this paper. Overall, the current research showed the viability of the SVM-RF model as a newly established data-intelligent method to simulate pan evaporation in the Indian region. It can be extended to many water resource engineering applications. It is also recommended that SVM-RF models be applied under the same climatic conditions and where the same meteorological parameters are available.

**Author Contributions:** Conceptualization, M.K., A.K. (Anuradha Kumari), D.K. and A.K. (Ambrish Kumar); methodology, M.K. and D.K.; software, M.K., A.K. (Anuradha Kumari) and R.K.; validation, M.K., A.K. (Anuradha Kumari), D.K. and A.K. (Ambrish Kumar); formal analysis, M.K., D.K. and A.K. (Alban Kuriqi); investigation, M.K.; resources, M.K., D.K. and A.K. (Ambrish Kumar); data curation, M.K. and A.K. (Anuradha Kumari); writing—original draft preparation, M.K., A.K. (Anuradha Kumari), R.A. and R.K.; writing—review and editing, M.K., D.K., R.A., A.E. and A.K. (Alban Kuriqi); visualization, D.K., N.A.-A., R.A., A.E. and A.K. (Alban Kuriqi); supervision, D.K., N.A.-A., A.K. (Ambrish Kumar), A.E. and A.K. (Alban Kuriqi); project administration, A.K. (Alban Kuriqi); funding acquisition, N.A.-A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not available.

**Acknowledgments:** The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve this manuscript further.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

