Article

Using Ensembles of Machine Learning Techniques to Predict Reference Evapotranspiration (ET0) Using Limited Meteorological Data

1 Department of Agricultural Engineering, Bahauddin Zakariya University, Multan 60000, Pakistan
2 School of Engineering, University of Basilicata, 85100 Potenza, Italy
3 Department of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China
4 School of Science and the Environment, Grenfell Campus, Memorial University, St. John’s, NL A1C 5S7, Canada
* Authors to whom correspondence should be addressed.
Hydrology 2023, 10(8), 169; https://doi.org/10.3390/hydrology10080169
Submission received: 22 June 2023 / Revised: 27 July 2023 / Accepted: 9 August 2023 / Published: 11 August 2023

Abstract

Estimating reference evapotranspiration (ET0) is crucial for managing water resources and planning crop water needs in order to maximize crop production. The FAO-PM56 method is recommended globally for estimating ET0 and for evaluating alternative methods because of its extensive theoretical foundation. However, many of the meteorological parameters it requires are difficult to obtain in developing countries, so alternative ways of estimating ET0 from fewer climatic data are of critical importance. In this study, ET0 was estimated with alternative methods using different climatic parameters (temperature, maximum and minimum relative humidity, sunshine hours, and wind speed) covering a period of 20 years, from 1996 to 2015. The data were recorded by 11 meteorological observatories situated in various climatic regions of Pakistan. The significance of the climatic parameters was evaluated through sensitivity analysis, performed with three machine learning (ML) techniques: single decision tree (SDT), tree boost (TB), and decision tree forest (DTF). The outcomes indicated that DTF-based models estimated ET0 with higher accuracy and fewer climatic variables than the other ML techniques used in the study. The DTF technique, with Model 15 as input, outperformed the other techniques on most performance metrics (NSE = 0.93, R2 = 0.96, and RMSE = 0.48 mm/month). The results indicated that DTF could estimate ET0 accurately from only three climatic variables: mean relative humidity, wind speed, and minimum temperature. Additionally, a non-linear ensemble (NLE) of the ML techniques was used to estimate ET0 with the best input combination (Model 15). The NLE approach improved modelling accuracy compared with stand-alone application of the ML techniques (e.g., R2 = 0.97 at Multan, 0.99 at Skardu, 0.98 at Islamabad (ISB), and 0.98 at Bahawalpur). The study results affirm the use of an ensemble model for ET0 estimation and suggest applying it in other parts of the world to validate model performance.

1. Introduction

Estimation of reference evapotranspiration (ET0) has become essential. It is a crucial parameter because of its wide range of applications in hydrological studies, which are used to estimate the amount of water crops will need, schedule irrigation, improve crop yield, and enable better planning and management of water resources [1]. Accurate estimation of ET0 has gained importance in agro-meteorological, hydrological, and water-balance studies. ET0 is an important variable in the interaction between flora, atmosphere, and soil. It also supports accurate quantification of cropland water consumption for planning [2] and effective irrigation [3].
Evapotranspiration is the overall loss of moisture (water) caused by evaporation from surfaces such as soil and plants [4]. The moisture loss from a well-irrigated grassy surface is referred to as “reference evapotranspiration” (ET0) [5]. ET0 can be measured directly using in situ experimental techniques such as the lysimeter method, the Bowen ratio-energy balance method, or eddy covariance devices [6]. The author of [7] used a weighing lysimetric method, based on water gain and loss, to estimate ET0 directly. This method gained significant importance among direct methods (eddy covariance system, Bowen ratio) and was widely used in scientific studies [8]. However, the high capital, operating, and maintenance expenses of these techniques may restrict their practical application, so indirect techniques are the more practical option for quantification. One indirect strategy is to use empirical models, such as those based on temperature, radiation, or mass transfer. Nearly all empirical models determine ET0 rather than ET for each crop, because determining ET for each crop is difficult. Instead, ET0 is first determined using indirect methods, and crop coefficients are then used to estimate the crop evapotranspiration (ETc) of each desired crop. Several attempts have been made to estimate ET0 accurately. Among them, the Penman–Monteith (FAO PM56) approach was developed by Allen in 1998 and validated in a variety of climatic conditions. The FAO PM56 method is recommended by the United Nations Food and Agriculture Organization (FAO) as the primary reference approach for determining ET0 and validating other techniques [9,10,11]. However, many locations throughout the world do not have the entire set of meteorological data needed to calculate ET0 using the FAO PM56 method. Substantial obstacles to estimating ET0 with the FAO PM56 approach are the lack of access to all required information, uncertainty about the reliability of climatic data, and the unavailability of climatic data for many locations [12,13].
Recent research [14,15] reproduced FAO PM56 ET0 using machine learning (ML) with a comprehensive set of climatic data, revealing the links and interrelationships among the variables. Similarly, the results of a deep learning neural network model using solar radiation as its only predictor were compared with FAO PM56 for estimating ET0 [16]. In a semi-arid location, the authors of [14] reported that solar radiation (Rs) was the most important meteorological variable in determining ET0. It is possible to substitute Rs with the number of sunshine hours (n); however, this is not always a viable option. This is also supported by the authors of [17], who investigated the potential of a deep factorization machine, gradient boosting techniques, and three tree-based ML models for modeling daily ET0 in the context of a daily time series. According to previous studies [18,19,20], sunshine hours have a stronger relationship with net radiation (Rn) than any other meteorological variable. As a result, this study chose n as an alternative to Rs [21,22]. Net radiation has significant implications for the thermal characteristics of the Earth’s surface and is therefore a crucial variable in the study of land-surface phenomena and of global climate change more broadly. Rn is the difference between incoming and outgoing radiation (e.g., reflected shortwave radiation) at the Earth’s surface.
In practical applications, stand-alone AI models can show inadequate predictive capability when processing complex datasets, because an individual model cannot learn the full range of intricate patterns in the data. The outcome can be suboptimal predictions. An ensemble of stand-alone prediction models can overcome this problem and yield results that surpass the performance of an individual model [23]. Ensembles are used to reduce the bias and variance of single ML models [24,25]. The author of [26] stated that an ensemble of different ML models yields better results than an individual model. The authors of [27] likewise recommended ensembles of ML models after finding better results than with an individual model. Recent studies that applied an ensemble approach are summarized below.
The findings of the authors of [28] showed that ensembles of ML models can increase the efficiency of individual models. By applying the ensemble approach, they improved the efficiency of artificial intelligence (AI) models and empirical models by up to 22% and 55%, respectively. They also found AI ensemble modeling superior to the empirical models. The authors of [29] used an ensemble-based genetic programming model to measure the degree of uncertainty related to the model architecture. Their findings support the idea that the structural uncertainty of a model can be quantified thoroughly, objectively, and realistically using the projections of such ensemble models. To forecast model reliability, the authors of [30] examined three linear ensembles and one non-linear ensemble technique. According to that research, the non-linear ensemble surpassed all the other ensembles as well as the individual statistical and intelligent methods. The ensembles created in that study can also be used as effective replacements for current techniques.
Although weather data have recently become easier to access, many locations still lack reliable and consistent weather information. Too few weather observatories have been established in Pakistan (the subject of our study), and the climatic information for various sites was found to be inadequate for calculating crop water needs based on ET0. Consequently, conventional techniques (such as PM56) are unsuitable because of their extensive input requirements or the absence of weather-related variables such as Rs. Developing approaches that depend on fewer weather-related inputs, and advancing ML algorithms for estimating ET0 with limited climate data, are therefore tasks of great significance. ML is among the best solutions for developing an ET0 model for this purpose. However, building an ML model that can be tested against a target variable using an established set of input parameters is a key issue, which is addressed in this work. The ML models constructed with fewer climate data were tested at several sites to confirm their accuracy in predicting ET0. Moreover, in stand-alone applications, the ML models were prone to poor performance because they could not capture trends and abruptly changing elements, which often reduced modelling performance. The objective of the ensemble technique is for the component models to have distinct characteristics so that together they capture the varied patterns present in the dataset [31]. In addition, an ensemble of various ML models increases the model’s ability to represent input–output relations [32,33]. The selection of models for an ensemble depends on a comparative analysis of their stand-alone performance: the better-performing ML models are selected and combined to leverage the strengths of each. The current study unifies the three tree-based techniques (TB, SDT, and DTF) using an MLP-based non-linear ensemble through a parallel combination of the machine learning models. This design enables the ensemble model to leverage the strength of each technique to enhance the final modelling accuracy.
In view of the literature discussed above, a reliable ensemble of machine learning models can significantly decrease the parametric requirements for accurately predicting reference evapotranspiration. Most of the existing literature focuses on hydrological predictions or forecasts made by modeling directly from input space to output space; ensemble modelling is therefore still a growing research direction in hydrology. In the context of evapotranspiration estimation, an ensemble of tree-based machine learning techniques is a novel application. Therefore, this study applies a tree-based ML ensemble approach to ET0 estimation with the following objectives: (i) apply sensitivity analysis using tree-based ML techniques to identify the best indicators of ET0 and so reduce parametric requirements; (ii) develop an ensemble model to improve ET0 estimation; and (iii) investigate ensemble model performance at various climatic stations. In addition, the studies conducted on ET0 estimation using ML techniques have been limited to the analysis of only one climatic station or region. For example, the authors of [34] investigated a hybrid neural network approach at a semi-arid station only, and recommended using at least one climatic station from each of the arid, semi-arid, and humid regions before drawing a generalized conclusion about the developed ensemble approach. Thus, the current study includes climatic stations from each selected region to investigate the performance of the developed tree-based ensemble ML approach.

2. Materials and Methods

2.1. Study Area and Datasets

In this study, 11 climatic stations located in different climatic regions of Pakistan were studied. The input climatic parameters and daily average values of ET0 were recorded on a monthly basis at Bhakkar, Jhang, Toba Tek (T.T) Singh, Sahiwal, D.G Khan, Bahawalpur, Rahim Yar (R.Y) Khan, and Jacobabad, which are arid regions, while Multan, Islamabad, and Skardu were considered hyper-arid, semi-arid, and humid regions, respectively [35]. The duration of the monthly dataset for each climatic station and the climatic conditions of each region are given in Table 1. Figure 1 shows the geographic position of all the selected climatic stations. Blue dots represent climatic stations near Multan station (purple dot), while red dots represent distant climatic stations.

2.2. Methodology

First, the climatic data of Multan station (1996–2015) were divided into 70% training and 30% testing sets, and SDT, TB, and DTF were applied to estimate ET0. This division of data into training and testing sets is practiced by most researchers in hydrology [36,37] and is also regarded as a simplified form of the V-fold rule of data partition [38]. Different input combinations of the meteorological parameters, i.e., Tmin, Tmax, Tmean, RHmean, u(x), and n, were formed and used as input to the selected tree-based ML models.
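The partition and input-combination steps described above can be sketched in Python. This is a minimal illustration and not the authors' software: the chronological 70/30 cut, the helper names, and the exhaustive enumeration of subsets are assumptions. The study evaluated 17 selected combinations, whereas six candidate variables admit 63 non-empty subsets.

```python
from itertools import combinations

# Candidate predictors named in the text (the string labels are illustrative).
VARIABLES = ["Tmin", "Tmax", "Tmean", "RHmean", "u", "n"]

def split_70_30(records):
    """Split records into 70% training and 30% testing, preserving order.

    Assumption: a chronological cut. The text does not state whether the
    split was chronological or random.
    """
    cut = int(round(0.7 * len(records)))
    return records[:cut], records[cut:]

def input_combinations(variables, min_size=1):
    """Enumerate every subset of the predictors with at least min_size members."""
    combos = []
    for size in range(min_size, len(variables) + 1):
        combos.extend(combinations(variables, size))
    return combos

# Twenty years of monthly records (1996-2015) give 240 samples.
months = [(year, month) for year in range(1996, 2016) for month in range(1, 13)]
train, test = split_70_30(months)
all_combos = input_combinations(VARIABLES)
```

With 240 monthly records, the split yields 168 training and 72 testing samples. The three-variable subset (Tmin, RHmean, u) reported as Model 15 is one of the enumerated candidates.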
Afterward, an effective input parameter combination for ET0 estimation was selected by developing, training, and testing the tree-based ML models (i.e., SDT, TB, and DTF) at Multan station with these input combinations. Linear and non-linear ensembles of the tree-based models were then developed, the latter using the multi-layer perceptron (MLP) technique. Lastly, the performance of the developed tree-based ensemble model was tested at weather stations located in various climatic regions (arid, semi-arid, humid) to validate the ensemble model’s results (for details, see Sections 3.3 and 3.4). For this purpose, monthly climate data for the selected stations were used as input to estimate ET0 with the tree-based ensemble model. The FAO-PM56 method, described in Section 2.2.1, was used to calculate the ET0 values, and Table 2 gives the statistical summary of the dataset for all selected climatic stations. The remainder of this section discusses the FAO-PM56 method and the machine learning techniques used to estimate ET0, and then explains the development of the non-linear ensemble models based on the best-performing machine learning technique.

2.2.1. FAO-PM56 Method

Using Allen’s [5] FAO-56 PM approach, the ET0 values for the Multan Station during the course of the research period were calculated using the meteorological variables:
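$$ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma \times \dfrac{900}{T_{mean} + 273} \times U_2 \times (e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,U_2)} \tag{1}$$
$$e_s = \frac{e_{min} + e_{max}}{2} \tag{2}$$
$$e_a = \frac{e_{min} \times \dfrac{RH_{max}}{100} + e_{max} \times \dfrac{RH_{min}}{100}}{2} \tag{3}$$
$$U_2 = ws \times \frac{1000}{3600} \times \frac{4.87}{\ln(67.8 \times 3 - 5.42)} \tag{4}$$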
where ET0 is in mm/day; Rn is the net radiation at the crop surface (MJ/m2 day); G is the soil heat flux density (MJ/m2 day); Tmean is the mean air temperature (°C); U2 is the wind speed (m/s); es, ea, emin, and emax are the saturation, actual, minimum, and maximum vapor pressures (kPa), respectively; and Δ and γ are the slope of the vapor pressure curve (kPa/°C) and the psychrometric constant (kPa/°C), respectively.
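The FAO-PM56 relations for ET0, es, and ea can be written as a few small Python functions. This is a hedged sketch: the function names and the numerical inputs in the example are illustrative assumptions, not values from the study, and Δ and γ are taken as given inputs rather than derived from temperature and pressure.

```python
def mean_saturation_vapor_pressure(e_min, e_max):
    """Saturation vapor pressure es as the mean of emin and emax (kPa)."""
    return (e_min + e_max) / 2.0

def actual_vapor_pressure(e_min, e_max, rh_min, rh_max):
    """Actual vapor pressure ea from the relative-humidity extremes (kPa)."""
    return (e_min * rh_max / 100.0 + e_max * rh_min / 100.0) / 2.0

def fao_pm56_et0(delta, gamma, rn, g, t_mean, u2, es, ea):
    """Reference evapotranspiration ET0 in mm/day (Penman-Monteith form)."""
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))

# Illustrative inputs only (hypothetical, not a station in this study).
et0 = fao_pm56_et0(delta=0.122, gamma=0.0666, rn=13.28, g=0.14,
                   t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
```

For these illustrative inputs the function evaluates to roughly 3.85 mm/day. Keeping the radiation and aerodynamic terms separate makes it easy to see which inputs (Rn, or wind speed and vapor pressure deficit) dominate a given estimate.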

2.2.2. Tree-Based Machine Learning Techniques

Tree-based machine learning approaches have a tree-like structure with numerous nodes that examine and categorize the given dataset [39,40]. The objective of this work was to identify the most useful climatic parameters for ET0 estimation using the TB, SDT, and DTF techniques. SDT consists of a single decision tree, while TB and DTF are built from multiple trees. TB and DTF differ in how the trees are combined: in TB, the error of each tree is passed on to the next (a series combination), whereas in DTF the trees are combined in parallel. The background and applied procedure of these techniques can be found in [41]. In addition to finding optimal parameter values, the choice of algorithm underlying each ML technique is of critical importance. The selected ML algorithms corresponding to the applied tree-based techniques are given in Table 3.
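The difference between a single tree, a series combination (as in TB), and a parallel combination (as in DTF) can be illustrated with a deliberately small pure-Python sketch. Everything here is an assumption made for illustration: one-split "stumps" stand in for full decision trees, the data are synthetic, and the function names and hyperparameters are hypothetical. It is not the software or configuration used in the study.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a 'stump') on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_boosted(xs, ys, n_trees=50, rate=0.5):
    """Series combination: each stump is fitted to the residuals left so far."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, rate, stumps

def predict_boosted(model, x):
    base, rate, stumps = model
    return base + rate * sum(predict_stump(s, x) for s in stumps)

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Parallel combination: stumps fitted independently on bootstrap resamples."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_forest(stumps, x):
    """Average the independent stump predictions."""
    return mean(predict_stump(s, x) for s in stumps)

# Synthetic single-feature data (not from the study): a parabola.
xs = list(range(21))
ys = [(x - 10) ** 2 for x in xs]

def sse(predict):
    """Sum of squared errors of a predictor on the synthetic data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

stump = fit_stump(xs, ys)
boosted = fit_boosted(xs, ys)
forest = fit_forest(xs, ys)
```

On this toy data the boosted sequence reaches a lower training error than a single stump, because each round corrects what the previous rounds missed. The forest instead averages independently fitted stumps, which smooths out the dependence on any one resample.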

2.2.3. Development of Ensemble Models

An ensemble process was employed that unites the single outputs of the individual ML models by means of an arbitration process in order to approximate the target value more accurately [30]. The author of [42] explained the arbitration process, while ensemble modeling, including its diversity and size, is described in detail in [43]. Ensemble modeling is divided into two types: (a) linear ensembles and (b) non-linear ensembles. Linear ensembles (LEs) include stacked regression [44], weighted average [45], and simple average methods [46], while a combination of ML techniques, in which the model outputs are fed to another ML model, is called a non-linear ensemble (NLE). According to recent studies, the non-linear ensemble method is preferred over the linear ensemble method. Linear ensemble methods have the advantage of computational simplicity, whereas NLEs are regarded as having greater predictive accuracy. In addition, the authors of [28] found NLE modelling superior to LE modelling for ET0 estimation using pan evaporation data. They also elaborated on the superior characteristics of NLEs over the LE approach and recommended applying NLEs to obtain significant results.
The ensemble modeling in this study used one linear method (simple averaging) and one non-linear method (combined ML techniques) so that the two could be compared and a stronger case made. NLE methods combine the predictions of the individual tree-based models using a non-linear function, e.g., bagging or boosting. The combining function can be a weighted sum of the individual model predictions, or it can involve more complex operations such as decision trees, neural networks, or kernel methods. Non-linear ensembles can capture more complex relationships between the input attributes and the target variable, which can result in higher predictive accuracy than linear ensembles. In the linear ensemble (LE), simple averaging is carried out as:
$$ET_{LE} = \frac{1}{N}\sum_{i=1}^{N} ET_i \tag{5}$$
Here, ETLE, ETi, and N denote the output of the ensemble model, the output of the i-th single model, and the total number of selected models, respectively.
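The simple-average linear ensemble amounts to a single line of arithmetic. A minimal sketch follows, in which the function name and the three ET0 values are hypothetical:

```python
def linear_ensemble(predictions):
    """Simple-average ensemble: the mean of the individual model outputs."""
    if not predictions:
        raise ValueError("at least one model prediction is required")
    return sum(predictions) / len(predictions)

# Hypothetical ET0 estimates from SDT, TB and DTF for one month.
et_le = linear_ensemble([4.0, 5.0, 6.0])
```

Every model receives the same weight 1/N, so a poorly performing member pulls the average as strongly as a well-performing one.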
In the non-linear ensemble, on the other hand, the output of each selected ML model is used as a predictor (input) to another ML model, which produces the final ensemble result. In this study, a multi-layer perceptron (MLP) was employed as the ensemble model. The NLE-ET0 is estimated from the ET0 outputs of the ML models (SDT, TB, DTF) as:
$$ET_{NLE} = f(ET_{SDT}, ET_{TB}, ET_{DTF}) \tag{6}$$
Here, ETSDT, ETTB, and ETDTF denote the ET0 predicted by the SDT, TB, and DTF models, respectively, while ETNLE is the ensemble ET0 obtained by the non-linear ensemble (NLE) technique. The process continued until each subset had been analyzed once during validation. The general ensemble procedure is shown in Figure 2.
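To make the role of the function f concrete, the sketch below shows the forward pass of a one-hidden-layer perceptron that maps the three model outputs to a single ET0 value. It is an illustration only: the weights are hypothetical and untrained, the tanh activation and two hidden units are assumptions, and training is omitted. The study's actual MLP settings are those referred to in its Table 4.

```python
import math

def mlp_ensemble(et_sdt, et_tb, et_dtf,
                 hidden_weights, hidden_biases, output_weights, output_bias):
    """One-hidden-layer perceptron mapping three model outputs to one ET0 value."""
    inputs = (et_sdt, et_tb, et_dtf)
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical, untrained parameters chosen only to keep the arithmetic simple.
HIDDEN_WEIGHTS = [(0.1, 0.1, 0.1), (0.2, -0.1, -0.1)]
HIDDEN_BIASES = [-1.5, 0.0]
OUTPUT_WEIGHTS = [2.0, 1.0]
OUTPUT_BIAS = 5.0

et_nle = mlp_ensemble(4.0, 5.0, 6.0,
                      HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
```

Unlike simple averaging, the hidden-layer activation lets the combiner weight the three models differently in different parts of the ET0 range, which is what makes the ensemble non-linear.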
Researchers [35,47] have confirmed the superior performance of the MLP model (a type of ANN) over other AI models when selecting a non-linear ensemble approach. For each ML model, the main parameters of the training algorithm, the number of iterations, the convergence value, and the execution time play a critical role [28]. Thus, this study employed an MLP as the ensemble model to obtain the overall ensemble results. The parameter values for the selected ensemble model, as recommended by the authors of [35], are given in Table 4.
The performance of these models was examined by calculating the Nash–Sutcliffe efficiency (NSE), coefficient of determination (R2), and root mean squared error (RMSE). The error values indicate the deviation from the mean ET0 value. In addition, the lowest deviation error from the mean and the highest effectiveness of the climatic parameters on ET0 were observed [28]. The RMSE for each model was calculated using Equation (7), while Equations (8) and (9) were used to determine NSE and R2. The RMSE, NSE, and R2 values for both the training and testing datasets are summarized in Tables 5, 6, and 7, respectively.
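$$RMSE = \sqrt{\frac{\sum_{i=1}^{N}\left(ET_{obs} - ET_{est}\right)^2}{N}} \tag{7}$$
$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(ET_{obs} - ET_{est}\right)^2}{\sum_{i=1}^{n}\left(ET_{obs} - \overline{ET_{obs}}\right)^2} \tag{8}$$
$$R^2 = \frac{\left(n\sum_{i=1}^{n} ET_{obs}\,ET_{est} - \sum_{i=1}^{n} ET_{obs}\sum_{i=1}^{n} ET_{est}\right)^2}{\left(n\sum_{i=1}^{n} ET_{obs}^2 - \left(\sum_{i=1}^{n} ET_{obs}\right)^2\right)\left(n\sum_{i=1}^{n} ET_{est}^2 - \left(\sum_{i=1}^{n} ET_{est}\right)^2\right)} \tag{9}$$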
The RMSE is always non-negative because of the squaring in its formula. As the divergence between observations and predictions grows, the RMSE increases. Model results with a high RMSE are therefore rejected, whereas a model with a low RMSE is preferred; an RMSE approaching 0 indicates a perfect fit. Figure 3 shows the flow chart for selecting the best input combination and building a non-linear ensemble of tree-based techniques for ET0 estimation.
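The three performance metrics can be computed with a few lines of Python. The sketch below uses the standard definitions of RMSE, NSE, and the squared Pearson correlation for R2. The function names and the three-point example are illustrative assumptions, not data from the study.

```python
import math

def rmse(obs, est):
    """Root mean squared error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def nse(obs, est):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - e) ** 2 for o, e in zip(obs, est))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def r_squared(obs, est):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    sum_o, sum_e = sum(obs), sum(est)
    sum_oe = sum(o * e for o, e in zip(obs, est))
    sum_oo = sum(o * o for o in obs)
    sum_ee = sum(e * e for e in est)
    numerator = (n * sum_oe - sum_o * sum_e) ** 2
    denominator = (n * sum_oo - sum_o ** 2) * (n * sum_ee - sum_e ** 2)
    return numerator / denominator

# Toy example: every estimate is exactly 1 unit too high.
obs = [1.0, 2.0, 3.0]
est = [2.0, 3.0, 4.0]
```

In the toy example the estimates carry a constant bias of 1, so RMSE is 1.0 and NSE is -0.5, yet R2 is still 1.0 because the two series are perfectly correlated. This is one reason to report R2 alongside RMSE and NSE rather than on its own.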

3. Results

3.1. Determination of Effective Climatic Parameters

A total of 17 models based on different meteorological input datasets were tried with the selected ML techniques for ET0 estimation at Multan station. Table 5 shows that Model 15, with Tmin, RHmean, and u(x) as inputs, had the lowest RMSE among all the models, indicating the smallest deviation from the mean ET0 values. In terms of testing RMSE, TB performed best with Model 15 at 0.42 mm/month, compared with 0.48 mm/month for DTF and 0.58 mm/month for SDT. The testing NSE values for TB, DTF, and SDT with Model 15 were 0.91, 0.93, and 0.90, respectively. Hence, judged by NSE, DTF performed best in estimating ET0 with the selected input combination. Similarly, Table 6 summarizes the NSE values of the SDT, TB, and DTF models for the 17 input combinations, whereas Table 7 presents the results in terms of R2.
To validate the ET0 estimation results at Multan station (summarized in Tables 5, 6, and 7), the RMSE results obtained with the tree-based techniques (SDT, TB, and DTF) are compared graphically in Figure 4 to determine which meteorological input combinations are effective for ET0 estimation. Figure 4a shows that the testing RMSEs for Models 1, 5, 10, 11, 13, and 15 under the SDT technique were less than 0.7 mm/month. On the other hand, deviations of more than 50% from the mean ET0 value were observed when the input combinations of the other models were used in SDT. Under TB, only Models 1, 5, 13, and 15 had testing RMSEs below 0.7 mm/month, as shown in Figure 4b. For DTF, only Models 1, 5, and 15 generated testing RMSEs below 0.7 mm/month, as observed in Figure 4c. Similarly, Figure 5 graphically presents the performance of the SDT, TB, and DTF models in terms of NSE. Model 1 is based on the maximum number of climatic variables, including Tmin, Tmax, RHmean, u(x), and n, while Model 5 uses Tmax, Tmin, n, and u(x) as input variables. Therefore, Model 15, which has the fewest variables of these, is regarded as the best input combination.
The higher errors of the other models arise because some of them did not include temperature as an input parameter, which generated larger residuals and hence the highest recorded errors. For Multan station, which has an arid climate, changes in temperature affected ET0, so temperature was considered an effective parameter for ET0 estimation.
In summary, the tree-based ML techniques with Model 15, which has only three input parameters (Tmin, RHmean, u(x)), outperformed the other models and generated the best results for ET0 estimation. Because the FAO-PM56 method relies not only on meteorological and aerodynamic parameters but also requires local calibration, tree-based techniques that depend only on meteorological parameters are the best alternative for estimating ET0 in this situation. A scatter plot of the testing-phase performance of the SDT, TB, and DTF techniques with the Model 15 input combination against the FAO-PM56 method is presented in Figure 6. The results indicated that Model 15, with only three climatic parameters (Tmin, RHmean, u(x)), generated less variance and a higher R2. For the TB-based model with input combination 15, an R2 value of 0.93 was observed during the testing phase; for SDT the value was 0.94, and for DTF it was 0.96. These R2 values also support the results above, which indicated that the RMSE increased as more non-effective climatic parameters were added as inputs to the tree-based techniques.
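The selection logic of this subsection, which keeps the input combinations whose testing RMSE stays below 0.7 mm/month and then prefers the one needing the fewest variables, can be expressed as a short sketch. The dictionary below lists only the three combinations reported to pass the threshold under DTF, with the variable lists taken from the text. The function name is hypothetical.

```python
# Input combinations whose DTF testing RMSE was below 0.7 mm/month,
# with their input variables as listed in the text.
CANDIDATES = {
    "Model 1": ["Tmin", "Tmax", "RHmean", "u", "n"],
    "Model 5": ["Tmax", "Tmin", "n", "u"],
    "Model 15": ["Tmin", "RHmean", "u"],
}

def most_parsimonious(candidates):
    """Return the name of the candidate that needs the fewest input variables."""
    return min(candidates, key=lambda name: len(candidates[name]))

best = most_parsimonious(CANDIDATES)
```

Here `best` is "Model 15", matching the combination chosen in the text. A tie-breaking rule (for example, lowest RMSE among equally small combinations) would be needed if two candidates had the same number of variables.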

3.2. Ensemble Model Results

After the comparative analysis of DTF, TB, and SDT performance for all seventeen input combinations, the best technique (DTF) with Model 15 as the input combination was selected for the ensemble. Combining the individual techniques in an ensemble improved the estimates and brought them closer to the actual values. In addition, the output of the ensemble approach captured seasonal variations well and agreed closely with the target values. The current study applied one linear (simple averaging) and one non-linear (combined ML techniques) ensemble approach to estimate ET0. The results are shown in Figure 7. The simple linear ensemble-based ET0 (LE-ET0) is less accurate (R2 = 0.89) than the non-linear ensemble-based ET0 (NLE-ET0) (R2 = 0.97) with respect to PM-ET0. Similarly, the RMSE of LE-ET0 (0.38 mm/month) is higher than that of NLE-ET0 (0.18 mm/month).
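As a quick check on the size of this improvement, the two reported RMSE values imply the following relative reduction. The ratio is derived here from the figures above and is not a number stated in the source:

```latex
\frac{RMSE_{LE} - RMSE_{NLE}}{RMSE_{LE}} = \frac{0.38 - 0.18}{0.38} \approx 0.53
```

That is, the non-linear ensemble roughly halves the RMSE of the simple-average ensemble at Multan station.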

3.3. Testing of the NLE Method at Nearby Climate Stations

In this section, comparison of NLE and FAO-PM56 is presented by considering climatic data from adjacent stations in southern Punjab. These climatic stations include Bhakkar, DG Khan, Jhang, RY Khan, Sahiwal, TT Singh and Bahawalpur. The selected Model 15, with an input combination of Tmin, RHmean and u(x), was used as input to estimate ET0 by applying an NLE approach and compared with the FAO-PM56 method. The obtained results for selected climatic stations are shown in Figure 8. At Bhakkar station, an MLP-based NLE model was able to reproduce the PM-method ET0 with a small estimation error (i.e., RMSE = 0.34 mm/month) and high similarity (i.e., R2 = 0.96). Similarly, values of RMSE at DG Khan, Jhang, RY Khan, Sahiwal, TT Singh, and Bahawalpur stations were 0.38, 0.36, 0.36, 0.25, 0.32, and 0.33 (mm/month), respectively, whereas R2 values were observed to be above 0.96 at all stations.
Figure 8 shows that the ET0 obtained through the NLE approach compared well with the FAO-PM56 method. The shape of the trend for each climatic station in Figure 8 reflects (1) the available data duration of the climatic station and (2) the winter and summer seasons. The higher and lower peaks of ET0 in the results indicate climatic variation over the selected periods. Random data periods were selected for the adjacent climatic stations to investigate seasonal changes over the selected period. At each climatic station, the NLE estimates closely overlapped the FAO-PM56 results.

3.4. Testing of NLE Approaches in Faraway Climatic Stations

To investigate NLE performance in other climatic regions, three climatic stations were analyzed: Jacobabad (arid region), Islamabad (semi-arid region), and Skardu (humid region). Only the effective meteorological parameters Tmin, RHmean, and u(x) were used as input (Table 5) to estimate ET0 with the NLE approach, and the estimates were compared with the FAO-PM56 method. Figure 9 shows that the ET0 estimated by the NLE approach compared well with the FAO-PM56 method. The higher and lower peaks of ET0 at each selected station from the NLE and FAO-PM56 methods closely overlapped. This indicates that the ET0 obtained through the NLE approach is reliable and acceptable with limited climatic data. As at the adjacent stations, NLE-based ET0 showed an excellent resemblance to PM-based ET0. The RMSE values for Jacobabad, Islamabad, and Skardu were 0.37, 0.32, and 0.19 mm/month, respectively. The R2 scores ranged between 0.96 at Jacobabad and 0.99 at Skardu.
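The station-level RMSE values reported for the nearby and faraway stations can be summarized with a few lines of Python. The numbers are those quoted in the text. The grouping helper and its name are illustrative assumptions, and the group means are derived here rather than reported in the source.

```python
from statistics import mean

# Testing RMSE (mm/month) of NLE-based ET0 against FAO-PM56, as quoted in the text.
NEARBY_RMSE = {
    "Bhakkar": 0.34, "DG Khan": 0.38, "Jhang": 0.36, "RY Khan": 0.36,
    "Sahiwal": 0.25, "TT Singh": 0.32, "Bahawalpur": 0.33,
}
FARAWAY_RMSE = {"Jacobabad": 0.37, "Islamabad": 0.32, "Skardu": 0.19}

def summarize(rmse_by_station):
    """Return (mean RMSE, best station, worst station) for one group."""
    best = min(rmse_by_station, key=rmse_by_station.get)
    worst = max(rmse_by_station, key=rmse_by_station.get)
    return mean(rmse_by_station.values()), best, worst

nearby_mean, nearby_best, nearby_worst = summarize(NEARBY_RMSE)
faraway_mean, faraway_best, faraway_worst = summarize(FARAWAY_RMSE)
```

The nearby group averages about 0.33 mm/month (best at Sahiwal, worst at DG Khan) and the faraway group about 0.29 mm/month (best at Skardu, worst at Jacobabad). On these figures the ensemble did not degrade when moved to climates different from the training station.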

3.5. Discussion

First, the findings indicated that DTF outperformed TB and SDT in estimating ET0 when combinations of climatic parameters were used as input to the machine learning models. Climatic data from weather stations across diverse climate zones of Pakistan were used. Earlier, TB was found to outperform SDT and DTF for ET0 estimation in Pakistan and in other countries including the USA, New Zealand, and China [35,47,48,49,50]. However, DTF has been found to be an effective machine learning technique in other hydrological applications, including rainfall-runoff modelling [51,52]. This result is inconsistent with past investigations in the case of ET0 and with hydrological applications generally. This contradiction is possibly due to the greater number of climatic variables involved in the estimation of ET0 as compared to other hydrological applications.
Secondly, an ensemble of machine learning models has been found to enhance modelling performance and accuracy. The ensemble of SDT, TB and DTF by using MLP enhanced the accuracy of ET0 estimation with minimum parametric requirements. Earlier studies have also shown that an adequate ensemble of machine learning techniques can increase modelling performance as compared to stand-alone applications. Therefore, this finding of the current study is consistent with those of other researchers [28,29,30,32,33].
Thirdly, mean relative humidity, minimum temperature, and wind speed were found to be critical indicators of ET0 in our study. The study’s findings supported the assertion made by the authors of [28] that increased air moisture content causes relative humidity to have greater impact in wet locations; as a result, when the aridity index increases, air moisture content is constrained, and its effects are less. Temperature and relative humidity were found to be the most important predictors of ET0 in one study [53]. Another study [54] investigated the effect of weather parameters on ET0 estimation in Esfahan province, Iran, and concluded that minimum air temperature, sunshine hours, and relative humidity were effective parameters for ET0 estimation in that region. Similarly, the authors of [49] observed that climatic variables related to relative humidity had a significant influence on ML modelling of ET0. Including relative humidity in machine learning-based models increased performance by up to 24%. These earlier observations support our finding on the selection of the best input combination.
It is recommended to employ ML over empirical and locally calibrated models where climatic data are unavailable, inconsistent or of poor quality. Calibration of ML models in the training phase is critical to avoid over- or underestimation of ET0 values: ET0 tends to be underestimated with more training data and overestimated with less training data. Models based on machine learning techniques with minimal input information therefore require sufficient training. To test the efficacy of the ML-ET0 models generated, this study evaluated the ML models in diverse climates. The data required for ET0 estimation using the FAO PM56 and ML models are shown in Table 8.
As Table 8 shows, FAO PM56 depends on numerous parameters that are difficult to obtain, especially in developing countries. As an alternative to the FAO PM56 approach, ML models can estimate ET0 from fewer parameters.
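The agreement between ML estimates and FAO PM56 values is summarized in this study by RMSE, NSE and R2. A minimal sketch of these three metrics follows. The function names and the four sample values are hypothetical, and R2 is computed here as the squared Pearson correlation, which is an assumption about the definition used.

```python
import math


def rmse(obs, sim):
    """Root mean square error between reference and estimated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_squared(obs, sim):
    """Squared Pearson correlation (assumed definition of R2)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical reference (FAO PM56) and ML-estimated ET0 values.
reference = [2.0, 4.0, 6.0, 8.0]
estimated = [2.5, 3.5, 6.5, 7.5]

print(f"RMSE = {rmse(reference, estimated):.2f}")
print(f"NSE  = {nse(reference, estimated):.2f}")
print(f"R2   = {r_squared(reference, estimated):.3f}")
```

NSE penalizes systematic over- or underestimation, whereas a correlation-based R2 does not, which is one reason to report both alongside RMSE.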

4. Conclusions

This paper describes the development of an ensemble-based machine learning model to predict ET0 from scant climatic data. The study was motivated by the lengthy procedure and significant data requirements (not readily accessible in some scenarios) of the recommended FAO-PM56 approach for determining ET0. The findings demonstrate that ET0 can be predicted from available climatic data using a tree-based model. Mean relative humidity, minimum temperature, and wind speed were shown to be the three most important inputs for precise determination of ET0 with a tree-based model. This input combination was supported by a sensitivity analysis of the input parameters carried out using tree-based models, in which it gave the lowest RMSE and highest R2 values. Tree-based models can therefore still predict ET0 precisely when data for only these three variables are available. Furthermore, an ensemble approach was applied to improve ET0 estimation using only the three effective inputs (Tmin, RHmean, u(x)), and the results showed considerable improvement. The performance of this ensemble model was further investigated at seven adjacent and four faraway climatic stations of the study area to capture the effects of diverse climatic regions. The results indicate that the ensemble model is useful and reliable, as the estimated ET0 was well correlated with the standard FAO-PM56 method. Lastly, the study proposes developing different ensemble ML techniques for ET0 estimation in other parts of the world.
Because ML techniques can handle system uncertainty, the ensemble remains advantageous: when a stand-alone ML technique performs poorly, an ensemble has more potential for improvement, and when a stand-alone technique performs well, ensemble modelling still produces high-quality results. The approach suggested here for estimating ET0 needs to be applied in a number of places with diverse climatic conditions. Applying the ensemble approach in many parts of the world, specifically in areas with limited climatic data, will help to establish its efficiency and reliability. More recent machine learning approaches also offer scope for further study, and this study proposes that an ensemble could combine other ML techniques such as ANFIS, SVM, GMDH and CCNN for ET0 estimation. In addition, the current study used monthly climatic data. We therefore recommend that future research develop an ensemble model based on daily data to increase the accuracy and generalizability of the ensemble model for ET0 estimation.

Author Contributions

Conceptualization, M.S.; methodology, M.S., R.A. and A.R.; software, M.S., M.H. and H.S.; validation, A.R., M.S. and M.A.I.B.; formal analysis, A.R., M.U.A., H.S. and M.H.; investigation, M.S. and A.R.; resources, M.S., A.R., A.A., R.A. and H.S.; data curation, A.R. and H.S.; writing—original draft preparation, A.R., H.S. and M.H. and R.A.; writing—A.R., A.A., M.H. and R.A.; visualization, R.A., A.A. and M.U.A.; supervision, M.S. and M.A.I.B.; project administration, M.S.; funding acquisition, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Higher Education Commission (HEC) of Pakistan, grant number 7368.

Data Availability Statement

The data for this study were provided by the Pakistan Meteorological Department (PMD), which the authors gratefully acknowledge. The data can be obtained directly from PMD.

Acknowledgments

We are thankful to the academic editor and reviewers for their insightful reviews and suggestions to improve the quality of our work. The authors also gratefully acknowledge the Pakistan Meteorological Department (PMD) for providing the data for the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lieth, H. Modeling the Primary Productivity of the World. In Primary Productivity of the Biosphere; Lieth, H., Whittaker, R.H., Eds.; Springer: Berlin/Heidelberg, Germany, 1975; pp. 237–263.
  2. Zhang, Y.; Sun, A.; Sun, H.; Gui, D.; Xue, J.; Liao, W.; Yan, D.; Zhao, N.; Zeng, X. Error Adjustment of TMPA Satellite Precipitation Estimates and Assessment of Their Hydrological Utility in the Middle and Upper Yangtze River Basin, China. Atmos. Res. 2019, 216, 52–64.
  3. Jung, M.; Reichstein, M.; Ciais, P.; Seneviratne, S.I.; Sheffield, J.; Goulden, M.L.; Bonan, G.; Cescatti, A.; Chen, J.; de Jeu, R.; et al. Recent decline in the global land evapotranspiration trend due to limited moisture supply. Nature 2010, 467, 951–954.
  4. Goyal, M.R.; Harmsen, E.W. Evapotranspiration: Principles and Applications for Water Management; CRC Press: Boca Raton, FL, USA, 2013.
  5. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration—Guidelines for Computing Crop Water Requirements; Food and Agriculture Organization: Rome, Italy, 1998.
  6. Wang, L.; Iddio, E.; Ewers, B. Introductory overview: Evapotranspiration (ET) models for controlled environment agriculture (CEA). Comput. Electron. Agric. 2021, 190, 106447.
  7. Van, B.C.H. Lysimetric measurements of evapotranspiration rates in the eastern United States. Soil Sci. Soc. Am. J. 1961, 25, 138–141.
  8. Ding, R.; Kang, S.; Li, F.; Zhang, Y.; Tong, L.; Sun, Q. Evaluating eddy covariance method by large-scale weighing lysimeter in a maize field of northwest China. Agric. Water Manag. 2010, 98, 87–95.
  9. Garcia, M.; Dirk, R.; Rick, A.; Carlos, H. Dynamics of Reference Evapotranspiration in the Bolivian Highlands (Altiplano). Agric. For. Meteorol. 2004, 125, 67–82.
  10. Gavilán, P.; Lorite, I.J.; Tornero, S.; Berengena, J. Regional Calibration of Hargreaves Equation for Estimating Reference ET in a Semiarid Environment. Agric. Water Manag. 2006, 81, 257–281.
  11. McMahon, F.H.S.; Chiew, N.N.; Kamaladasa, H.M.; Malano, T.A. Penman-Monteith, FAO-24 Reference Crop Evapotranspiration and Class—A Pan Data in Australia. Agric. Water Manag. 1995, 28, 9–21.
  12. Gocic, M.; Trajkovic, S. Software for estimating reference evapotranspiration using limited weather data. Comput. Electron. Agric. 2010, 71, 158–162.
  13. Tabari, H.; Talaee, P. Local calibration of the Hargreaves and Priestley–Taylor equations for estimating reference evapotranspiration in arid and cold climates of Iran based on the Penman–Monteith model. J. Hydrol. Eng. 2011, 16, 837–845.
  14. Başağaoğlu, H.; Chakraborty, D.; Winterle, J. Reliable Evapotranspiration Predictions with a Probabilistic Machine Learning Framework. Water 2021, 13, 557.
  15. Chakraborty, D.; Başağaoğlu, H.; Winterle, J. Interpretable vs. noninterpretable machine learning models for data-driven hydro-climatological process modeling. Expert Syst. Appl. 2021, 170, 114498.
  16. Ravindran, S.M.; Bhaskaran, S.K.M.; Ambat, S.K.N. A Deep Neural Network Architecture to Model Reference Evapotranspiration Using a Single Input Meteorological Parameter. Environ. Process. 2021, 8, 1567–1599.
  17. Zhou, Z.; Zhao, L.; Lin, A.; Qin, W.; Lu, Y.; Li, J.; Zhong, Y.; He, L. Exploring the potential of deep factorization machine and various gradient boosting models in modeling daily reference evapotranspiration in China. Arab. J. Geosci. 2020, 13, 1287.
  18. Deo, R.C.; Wen, X.; Qi, F. A wavelet-coupled support vector machine model for forecasting global incident solar radiation using limited meteorological dataset. Appl. Energy 2016, 168, 568–593.
  19. Wang, L.; Kisi, O.; Zounemat-Kermani, M.; Salazar, G.; Zhu, Z.; Gong, W. Solar radiation prediction using different techniques: Model evaluation and comparison. Renew. Sustain. Energy Rev. 2016, 61, 384–397.
  20. Wang, L.; Kisi, O.; Zounemat-Kermani, M.; Hu, B.; Gong, W. Modeling and comparison of hourly photosynthetically active radiation in different ecosystems. Renew. Sustain. Energy Rev. 2015, 56, 436–453.
  21. Rahimikhoob, A. Estimation of Evapotranspiration Based on Only Air Temperature Data Using Artificial Neural Networks for a Subtropical Climate in Iran. Theor. Appl. Climatol. 2010, 101, 83–91.
  22. Slavisa, T.; Kolakovic, S. Estimating Reference Evapotranspiration Using Limited Weather Data. J. Irrig. Drain. Eng. 2009, 135, 443–449.
  23. Lessmann, S.; Bart, B.; Hsin-vonn, S.; Lyn, C.T. Benchmarking State-of-the-Art Classification Algorithms for Credit Scoring: An Update of Research. Eur. J. Oper. Res. 2015, 247, 124–136.
  24. Kim, M.; Sung-hwan, M.; Ingoo, H. An Evolutionary Approach to the Combination of Multiple Classifiers to Predict a Stock Price Index. Earth Syst. Appl. 2006, 31, 241–247.
  25. Tsai, C.; Yu-chieh, H. Combining Multiple Feature Selection Methods for Stock Prediction: Union, Intersection, and Multi-Intersection Approaches. Decis. Support Syst. 2010, 50, 258–269.
  26. Baker, K. Operational Research Society Is Collaborating with JSTOR to Digitize, Preserve, and Extend Access to Operational Research Quarterly (1970–1977). Oper. Res. Q. 1977, 27, 155–167.
  27. Makridakis, S.; Andersen, A.; Carbone, R.; Fildes, R.; Hibon, M.; Lewandowski, R.; Newton, J.; Parzen, E.; Winkler, R. The accuracy of extrapolation (time series) methods: Results of a forecasting competition. J. Forecast. 1982, 1, 111–153.
  28. Nourani, V.; Elkiran, G.; Abdullahi, J. Multi-station artificial intelligence-based ensemble modeling of reference evapotranspiration using pan evaporation measurements. J. Hydrol. 2019, 577, 123958.
  29. Parasuraman, K.; Amin, E. Toward Improving the Reliability of Hydrologic Prediction: Model Structure Uncertainty and Its Quantification Using Ensemble-Based Genetic Programming Framework. Water Resour. Res. 2008, 44, 1–12.
  30. Kiran, N.R.; Ravi, V. Software reliability prediction by soft computing techniques. J. Syst. Softw. 2008, 81, 576–583.
  31. Sharghi, E.; Nourani, V.; Nazanin, B. Earthfill Dam Seepage Analysis Using Ensemble Artificial Intelligence Based Modeling. J. Hydroinform. 2018, 20, 1071–1084.
  32. Finlay, S. Multiple Classifier Architectures and Their Application to Credit Risk Assessment. Eur. J. Oper. Res. 2011, 210, 368–378.
  33. Paleologo, G.; André, E.; Gianluca, A. Subagging for Credit Scoring Models. Eur. J. Oper. Res. 2010, 201, 490–499.
  34. Sharma, G.; Singh, A.; Jain, S. A hybrid deep neural network approach to estimate reference evapotranspiration using limited climate data. Neural Comput. Appl. 2021, 34, 4013–4032.
  35. Raza, A.; Shoaib, M.; Faiz, M.A.; Baig, F.; Khan, M.M.; Ullah, M.K.; Zubair, M. Comparative Assessment of Reference Evapotranspiration Estimation Using Conventional Method and Machine Learning Algorithms in Four Climatic Regions. Pure Appl. Geophys. 2020, 177, 4479–4508.
  36. Hammad, M.; Shoaib, M.; Salahudin, H.; Baig, M.A.I.; Khan, M.M.; Ullah, M.K. Rainfall forecasting in upper Indus basin using various artificial intelligence techniques. Stoch. Environ. Res. Risk Assess. 2021, 35, 2213–2235.
  37. Quilty, J.; Adamowski, J. Addressing the incorrect usage of wavelet-based hydrological and water resources forecasting models for real-world applications with best practices and a new forecasting framework. J. Hydrol. 2018, 563, 336–353.
  38. Stone, M. Cross-Validatory Choice and Assessment of Statistical Predictions. J. R. Stat. Soc. 1974, 36, 111–147.
  39. Peng, W.; Juhua, C.; Haiping, Z. An Implementation of IDE3 Decision Tree Learning Algorithm. Mach. Learn. 2009, 9417, 1–20.
  40. Sherrod, P.; DTREG Predictive Modeling Software. DevDigital: Nashvilla Software Development. 2009. Available online: https://www.dtreg.com (accessed on 11 April 2023).
  41. Raza, A.; Shoaib, M.; Khan, A.; Baig, F.; Faiz, M.A.; Khan, M.M. Application of Non-Conventional Soft Computing Approaches for Estimation of Reference Evapotranspiration in Various Climatic Regions. Theor. Appl. Climatol. 2020, 139, 1459–1477.
  42. Vannieuwenhuyse, G. Arbitration and new technologies: Mutual benefits. J. Int. Arbitr. 2018, 35, 119–129.
  43. Rokach, L. Ensemble Methods in Supervised Learning. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2010; pp. 959–979.
  44. Richman, R.; Wüthrich, M.V. Nagging Predictors. Risks 2020, 8, 83.
  45. Perrone, M.P.; Copper, L.N. When Networks Disagree: Ensemble Methods for Technical Report Hybrid Neural Networks Unclassified; Brown University Providence Ri Institute for Brain and Neural Systems: Providence, RI, USA, 1992.
  46. Benediktsson, J.A.; Sveinsson, J.R.; Ersoy, O.K.; Swain, P.H. Parallel consensual neural networks. IEEE Trans. Neural Netw. 1997, 8, 54–64.
  47. Raza, A.; Shoaib, M.; Faiz, M.A.; Shakil, A.; Khan, M.M.; Ullah, M.K.; Sarfraz, H. Comparative Study of Powerful Predictive Modeling Techniques for Modeling Monthly Reference Evapotranspiration in Various Climatic Regions. Fresenius Environ. Bull. 2021, 30, 7490–7513.
  48. Tikhamarine, Y.; Malik, A.; Kumar, A.; Souag-Gamane, D.; Kisi, O. Estimation of monthly reference evapotranspiration using novel hybrid machine learning approaches. Hydrol. Sci. J. 2019, 64, 1824–1842.
  49. Ferreira, L.B.; da Cunha, F.F.; de Oliveira, R.A.; Filho, E.I.F. Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM—A new approach. J. Hydrol. 2019, 572, 556–570.
  50. Kisi, O.; Sanikhani, H.; Zounemat-Kermani, M.; Niazi, F. Long-term monthly evapotranspiration modeling by several data-driven methods without climatic data. Comput. Electron. Agric. 2015, 115, 66–77.
  51. Khan, M.T.; Shoaib, M.; Hammad, M.; Salahudin, H.; Ahmad, F.; Ahmad, S. Application of Machine Learning Techniques in Rainfall—Runoff Modelling of the Soan River Basin, Pakistan. Water 2021, 13, 3528.
  52. Khan, M.T.; Shoaib, M.; Albano, R.; Inam, M.A.; Salahudin, H.; Hammad, M.; Ahmad, S.; Ali, M.U.; Hashim, S.; Ullah, M.K. Intercomparison and Assessment of Stand-Alone and Wavelet-Coupled Machine Learning Models for Simulating Rainfall-Runoff Process in Four Basins of Pothohar. Atmosphere 2023, 14, 452.
  53. Estévez, J.; Pedro, G.; Joaquín, B. Sensitivity Analysis of a Penman–Monteith Type Equation to Estimate Reference Evapotranspiration in Southern Spain. Hydrol. Process. 2009, 23, 3342–3353.
  54. Eslamian, S.; Saeid, S.; Alireza, G.; Zareian, M.J.; Alireza, F. Estimating Penman-Monteith Reference Evapotranspiration Using Artificial Neural Networks and Genetic Algorithm: A Case Study. Arab. J. Sci. Eng. 2012, 37, 935–944.
Figure 1. Study Area location.
Figure 2. Mechanism of Ensemble Modeling applied in the Study.
Figure 3. Flow chart of best input combination selection and non-linear ensemble of tree-based techniques for ET0 estimation.
Figure 4. Training and testing results of RMSE for (a) SDT, (b) TB, and (c) DTF based on input to various models.
Figure 5. Training and testing results of NSE for (a) SDT, (b) TB, and (c) DTF based on input to various models.
Figure 6. Regression comparison of SDT, TB and DTF with FAO-PM56 method.
Figure 7. ET0 Comparison of LE and NLE approaches with FAO-PM56 for Model 15 at Multan station.
Figure 8. NLE Performance against FAO-PM56 in adjacent climatic stations.
Figure 9. Performance of NLE against FAO-PM56 in faraway climatic stations.
Table 1. Dataset duration and climatic characteristics of selected stations.
Sr. No. | Station Name | Latitude | Longitude | Duration | Years | Climatic Region
1 | Multan | 30.2705 | 71.5024 | 1996–2015 | 20 | Hyper Arid
2 | Jhang | 31.2781 | 72.3317 | 2004–2017 | 14 | Arid
3 | T.T. Singh | 30.9709 | 72.4826 | 2009–2017 | 9 | Arid
4 | Sahiwal | 30.6682 | 73.1114 | 2005–2017 | 13 | Arid
5 | Bahawalpur | 29.3544 | 71.6911 | 1987–2016 | 30 | Arid
6 | R.Y. Khan | 28.4212 | 70.2989 | 2002–2017 | 16 | Arid
7 | D.G. Khan | 30.0489 | 70.6455 | 2003–2017 | 15 | Arid
8 | Bhakkar | 31.6082 | 71.0854 | 2010–2017 | 8 | Arid
9 | Jacobabad | 28.2823 | 68.4472 | 2004–2016 | 12 | Arid
10 | Islamabad | 33.6844 | 73.0479 | 2004–2016 | 12 | Semi-Arid
11 | Skardu | 35.3247 | 75.5510 | 2004–2016 | 12 | Humid
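Table 2. Climatic data of other Stations.
Station | Statistical Parameter | Tmax (°C) | Tmin (°C) | RHmean (%) | U(x) (knots) | n (hour/day) | ET0, mean (mm/day)
Multan | Mean | 32.43 | 18.85 | 56.56 | 6.07 | 7.48 | 4.78
Multan | Median | 34.70 | 20.20 | 59.00 | 5.50 | 7.69 | 4.75
Multan | Maximum | 43.80 | 30.60 | 80.00 | 18.78 | 11.25 | 10.30
Multan | Minimum | 18.00 | 3.80 | 28.00 | 0.00 | 3.13 | 1.10
Multan | Std. Dev. | 7.39 | 8.62 | 11.83 | 3.97 | 1.47 | 2.61
Toba Tek Singh (T.T. Singh) | Mean | 31.7 | 17.5 | 65.2 | 0.83 | 6.1 | 3.36
Toba Tek Singh (T.T. Singh) | Median | 34.3 | 18.8 | 67.5 | 0.7 | 6.8 | 3.5
Toba Tek Singh (T.T. Singh) | Maximum | 41.7 | 28.4 | 82.5 | 2.45 | 9.7 | 6.7
Toba Tek Singh (T.T. Singh) | Minimum | 16.9 | 2.7 | 39.5 | 0.00 | 0.00 | 1.00
Toba Tek Singh (T.T. Singh) | Std. Dev. | 7.13 | 8.17 | 10.82 | 0.61 | 2.44 | 1.64
Sahiwal | Mean | 31.47 | 17.55 | 61.34 | 1.75 | 7.33 | 4.20
Sahiwal | Median | 34.00 | 18.60 | 64.00 | 1.65 | 8.00 | 4.20
Sahiwal | Maximum | 42.00 | 28.00 | 82.00 | 4.25 | 10.50 | 7.50
Sahiwal | Minimum | 16.40 | 3.20 | 33.00 | 0.10 | 0.00 | 1.40
Sahiwal | Std. Dev. | 7.38 | 7.90 | 11.14 | 0.89 | 2.56 | 1.74
Raheem Yar Khan (R.Y. Khan) | Mean | 34.29 | 18.65 | 57.54 | 2.20 | 0.00 | 4.62
Raheem Yar Khan (R.Y. Khan) | Median | 36.60 | 20.05 | 59.75 | 2.15 | 0.00 | 4.40
Raheem Yar Khan (R.Y. Khan) | Maximum | 44.90 | 29.60 | 83.00 | 7.10 | 0.00 | 10.40
Raheem Yar Khan (R.Y. Khan) | Minimum | 19.90 | 4.40 | 31.00 | 0.15 | 0.00 | 1.40
Raheem Yar Khan (R.Y. Khan) | Std. Dev. | 7.39 | 8.09 | 10.32 | 1.25 | 0.00 | 2.22
Jhang | Mean | 31.71 | 17.50 | 62.16 | 1.01 | 8.08 | 4.04
Jhang | Median | 34.15 | 18.55 | 65.00 | 0.90 | 8.37 | 4.00
Jhang | Maximum | 42.10 | 29.00 | 82.50 | 3.10 | 11.34 | 8.50
Jhang | Minimum | 16.90 | 3.40 | 35.00 | 0.00 | 3.44 | 0.90
Jhang | Std. Dev. | 7.17 | 8.27 | 11.39 | 0.73 | 1.67 | 2.16
Dera Ghazi Khan (D.G Khan) | Mean | 32.49 | 19.05 | 57.08 | 3.25 | 7.36 | 4.79
Dera Ghazi Khan (D.G Khan) | Median | 35.00 | 20.45 | 60.00 | 3.25 | 8.41 | 4.95
Dera Ghazi Khan (D.G Khan) | Maximum | 43.70 | 30.20 | 76.00 | 6.10 | 10.62 | 9.30
Dera Ghazi Khan (D.G Khan) | Minimum | 17.60 | 5.00 | 24.50 | 0.80 | 1.37 | 1.50
Dera Ghazi Khan (D.G Khan) | Std. Dev. | 7.45 | 7.98 | 10.66 | 1.08 | 2.97 | 2.13
Bhakkar | Mean | 32.59 | 17.61 | 60.30 | 1.02 | 3.46 | 3.62
Bhakkar | Median | 34.60 | 19.10 | 62.50 | 0.95 | 0.00 | 3.50
Bhakkar | Maximum | 44.70 | 29.50 | 87.00 | 3.20 | 10.10 | 7.60
Bhakkar | Minimum | 17.50 | 3.20 | 34.00 | 0.10 | 0.00 | 0.90
Bhakkar | Std. Dev. | 8.01 | 8.49 | 11.08 | 0.67 | 3.92 | 1.90
Bahawalpur | Mean | 32.49 | 24.46 | 20.52 | 4.98 | 5.08 | 4.95
Bahawalpur | Median | 33.00 | 24.70 | 5.05 | 4.10 | 5.80 | 5.10
Bahawalpur | Maximum | 44.90 | 29.60 | 63.00 | 11.00 | 11.40 | 10.50
Bahawalpur | Minimum | 19.90 | 4.40 | 34.00 | 0.10 | 0.00 | 1.50
Bahawalpur | Std. Dev. | 12.43 | 13.88 | 20.59 | 3.36 | 3.46 | 2.06
Jacobabad | Mean | 33.82 | 20.29 | 41.80 | 2.91 | 7.64 | 4.45
Jacobabad | Maximum | 45.45 | 30.75 | 72.85 | 7.10 | 8.45 | 8.98
Jacobabad | Minimum | 19.95 | 6.35 | 12.85 | 0.15 | 6.85 | 1.22
Jacobabad | Std. Dev. | 7.28 | 7.83 | 13.55 | 1.50 | 0.44 | 1.93
Islamabad | Mean | 28.62 | 14.16 | 49.68 | 1.61 | 7.30 | 3.40
Islamabad | Maximum | 40.15 | 25.35 | 73.85 | 7.44 | 11.15 | 8.19
Islamabad | Minimum | 15.05 | −2.90 | 22.85 | 0.05 | 5.35 | 1.73
Islamabad | Std. Dev. | 6.41 | 7.70 | 11.07 | 1.31 | 1.40 | 0.76
Skardu | Mean | 19.14 | 4.13 | 39.21 | 2.46 | 5.98 | 3.22
Skardu | Maximum | 9.24 | 19.40 | 81.00 | 2.04 | 1.81 | 2.04
Skardu | Minimum | −2.70 | −17.90 | 14.00 | 0.15 | 2.55 | 0.37
Skardu | Std. Dev. | 9.60 | 8.14 | 14.56 | 8.54 | 1.95 | 2.12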
Table 3. Summary of Applied Machine Learning Techniques.
ML Technique | Learning Algorithm | Optimal Values of Prime Parameters: Rows in Node | Tree Level | Node Size
SDT | Iterative Dichotomiser 3 (ID3) | 5 | 10 | 10
TB | Gradient Boosting Algorithm (GBA) | 400 | 5 | 5
DTF | Random Forest Algorithm (RFA) | 200 | 50 | 2
Table 4. Parametric values for selected MLP ensemble model.
Parameter | Value | Parameter | Value
Number of layers | 3 | Number of iterations | 10,000
Min to max neurons | 2–20 | Convergence tolerance | −1.00 × 10^−5
Neurons in hidden layer | 6 | Minimum improvement delta | −1.00 × 10^−6
Hidden layer function | Sigmoid | Minimum gradient | −1.00 × 10^−7
Output layer function | Linear | Maximum execution time | 0
Table 5. Results of RMSE (mm/month) for all the meteorological input combinations.
Model | Meteorological Input Dataset | SDT Training | SDT Testing | TB Training | TB Testing | DTF Training | DTF Testing
Model 1 | Tmin, Tmax, RHmean, u(x), n | 0.55 | 0.66 | 0.39 | 0.46 | 0.38 | 0.54
Model 2 | RHmean, n | 1.05 | 1.74 | 1.12 | 1.41 | 1.14 | 1.9
Model 3 | RHmean, n, u(x) | 0.79 | 1.76 | 0.61 | 1.64 | 0.68 | 1.39
Model 4 | RHmean, u(x) | 0.79 | 1.76 | 0.6 | 1.63 | 0.36 | 1.7
Model 5 | Tmax, Tmin, n, u(x) | 0.42 | 0.62 | 0.32 | 0.4 | 0.22 | 0.51
Model 6 | Tmax, RHmean, n, u(x) | 0.38 | 0.97 | 0.3 | 0.87 | 0.18 | 1.06
Model 7 | Tmax, RHmean, u(x) | 0.38 | 0.97 | 0.29 | 0.86 | 0.2 | 1.23
Model 8 | Tmax, Tmin, RHmean, n | 0.4 | 0.83 | 0.39 | 1.21 | 0.25 | 1.04
Model 9 | Tmax, Tmin, RHmean, n, u(x) | 0.32 | 0.83 | 0.27 | 0.82 | 0.18 | 0.81
Model 10 | Tmean, RHmean, n, u(x) | 0.45 | 0.64 | 0.29 | 0.95 | 0.18 | 0.99
Model 11 | Tmean, RHmean, u(x) | 0.45 | 0.64 | 0.28 | 0.95 | 0.18 | 1.18
Model 12 | Tmean, RHmean | 0.52 | 0.7 | 0.41 | 1.21 | 0.27 | 1.5
Model 13 | Tmean, n | 0.55 | 0.64 | 0.6 | 0.64 | 0.62 | 0.74
Model 14 | Tmean, RHmean, n | 0.52 | 0.7 | 0.42 | 1.21 | 0.51 | 1.31
Model 15 | Tmin, RHmean, u(x) | 0.48 | 0.58 | 0.38 | 0.42 | 0.24 | 0.48
Model 16 | Tmin, RHmean, n, u(x) | 0.45 | 1.17 | 0.29 | 1.11 | 0.19 | 1
Model 17 | Tmean, u(x) | 0.45 | 1.17 | 0.29 | 1.12 | 0.2 | 1.17
Table 6. Results of NSE for all the meteorological input combinations.
Model | Meteorological Input Dataset | SDT Training | SDT Testing | TB Training | TB Testing | DTF Training | DTF Testing
Model 1 | Tmin, Tmax, RHmean, u(x), n | 0.97 | 0.93 | 0.95 | 0.94 | 0.99 | 0.98
Model 2 | RHmean, n | 0.71 | 0.68 | 0.66 | 0.58 | 0.65 | 0.54
Model 3 | RHmean, n, u(x) | 0.83 | 0.4 | 0.9 | 0.42 | 0.88 | 0.44
Model 4 | RHmean, u(x) | 0.83 | 0.5 | 0.9 | 0.43 | 0.94 | 0.46
Model 5 | Tmax, Tmin, n, u(x) | 0.95 | 0.89 | 0.93 | 0.91 | 0.95 | 0.82
Model 6 | Tmax, RHmean, n, u(x) | 0.96 | 0.73 | 0.98 | 0.78 | 0.95 | 0.68
Model 7 | Tmax, RHmean, u(x) | 0.96 | 0.73 | 0.98 | 0.78 | 0.94 | 0.56
Model 8 | Tmax, Tmin, RHmean, n | 0.96 | 0.8 | 0.96 | 0.58 | 0.92 | 0.68
Model 9 | Tmax, Tmin, RHmean, n, u(x) | 0.97 | 0.8 | 0.98 | 0.81 | 0.93 | 0.81
Model 10 | Tmean, RHmean, n, u(x) | 0.94 | 0.88 | 0.98 | 0.72 | 0.94 | 0.71
Model 11 | Tmean, RHmean, u(x) | 0.94 | 0.88 | 0.98 | 0.74 | 0.94 | 0.6
Model 12 | Tmean, RHmean | 0.93 | 0.86 | 0.96 | 0.58 | 0.92 | 0.35
Model 13 | Tmean, n | 0.92 | 0.88 | 0.9 | 0.88 | 0.9 | 0.84
Model 14 | Tmean, RHmean, n | 0.93 | 0.86 | 0.95 | 0.58 | 0.93 | 0.5
Model 15 | Tmin, RHmean, u(x) | 0.94 | 0.90 | 0.96 | 0.91 | 0.98 | 0.93
Model 16 | Tmin, RHmean, n, u(x) | 0.94 | 0.61 | 0.98 | 0.64 | 0.95 | 0.71
Model 17 | Tmean, u(x) | 0.94 | 0.61 | 0.98 | 0.64 | 0.93 | 0.6
Table 7. Results of R2 for all the meteorological input combinations.
Model | Meteorological Input Dataset | SDT Training | SDT Testing | TB Training | TB Testing | DTF Training | DTF Testing
Model 1 | Tmin, Tmax, RHmean, u(x), n | 0.96 | 0.95 | 0.96 | 0.95 | 0.97 | 0.96
Model 2 | RHmean, n | 0.69 | 0.66 | 0.64 | 0.57 | 0.64 | 0.53
Model 3 | RHmean, n, u(x) | 0.81 | 0.39 | 0.88 | 0.41 | 0.86 | 0.43
Model 4 | RHmean, u(x) | 0.81 | 0.49 | 0.88 | 0.42 | 0.92 | 0.45
Model 5 | Tmax, Tmin, n, u(x) | 0.93 | 0.87 | 0.91 | 0.89 | 0.93 | 0.80
Model 6 | Tmax, RHmean, n, u(x) | 0.94 | 0.71 | 0.96 | 0.76 | 0.93 | 0.66
Model 7 | Tmax, RHmean, u(x) | 0.94 | 0.71 | 0.96 | 0.76 | 0.92 | 0.55
Model 8 | Tmax, Tmin, RHmean, n | 0.94 | 0.78 | 0.94 | 0.57 | 0.90 | 0.66
Model 9 | Tmax, Tmin, RHmean, n, u(x) | 0.95 | 0.78 | 0.96 | 0.79 | 0.91 | 0.79
Model 10 | Tmean, RHmean, n, u(x) | 0.92 | 0.86 | 0.96 | 0.70 | 0.92 | 0.69
Model 11 | Tmean, RHmean, u(x) | 0.92 | 0.86 | 0.96 | 0.72 | 0.92 | 0.59
Model 12 | Tmean, RHmean | 0.91 | 0.84 | 0.94 | 0.57 | 0.90 | 0.34
Model 13 | Tmean, n | 0.90 | 0.86 | 0.88 | 0.86 | 0.88 | 0.82
Model 14 | Tmean, RHmean, n | 0.91 | 0.84 | 0.93 | 0.57 | 0.91 | 0.49
Model 15 | Tmin, RHmean, u(x) | 0.96 | 0.93 | 0.95 | 0.94 | 0.97 | 0.96
Model 16 | Tmin, RHmean, n, u(x) | 0.92 | 0.60 | 0.96 | 0.63 | 0.93 | 0.69
Model 17 | Tmean, u(x) | 0.92 | 0.60 | 0.96 | 0.63 | 0.91 | 0.59
Table 8. Data required for the ET0 estimation using the FAO PM56 and ML models.
Input Data | Tmin | Tmax | RHmin | RHmax | RHmean | U(x) | N | Rn | Aerodynamic Factors (Rn, es, ea, emin, emax, Δ, Z, and γ) | Adopted Methodology
Climatic and aerodynamic | ** | ** | ** | ** | ** | ** | ** | ** | ** | FAO PM56
Effective variables | ** | xx | xx | xx | ** | ** | xx | xx | xx | ML models
**: parameters required for ET0 estimation. xx: parameters not required by the best input combination.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Salahudin, H.; Shoaib, M.; Albano, R.; Inam Baig, M.A.; Hammad, M.; Raza, A.; Akhtar, A.; Ali, M.U. Using Ensembles of Machine Learning Techniques to Predict Reference Evapotranspiration (ET0) Using Limited Meteorological Data. Hydrology 2023, 10, 169. https://doi.org/10.3390/hydrology10080169
