Reference Evapotranspiration Estimation Using Genetic Algorithm-Optimized Machine Learning Models and Standardized Penman–Monteith Equation in a Highly Advective Environment

Kiraga, Shafik; Peters, R. Troy; Molaei, Behnaz; Evett, Steven R.; Marek, Gary

doi:10.3390/w16010012

¹

Center for Precision and Automated Agricultural Systems, Irrigated Agriculture Research and Extension Center, Washington State University, Prosser, WA 99350, USA

²

Department of Agricultural and Environmental Science, College of Agriculture, Tennessee State University, Nashville, TN 37207, USA

³

United States Department of Agriculture-Agricultural Research Service Conservation and Production Research Laboratory, Bushland, TX 79012, USA

^*

Author to whom correspondence should be addressed.

Water 2024, 16(1), 12; https://doi.org/10.3390/w16010012

Submission received: 29 October 2023 / Revised: 8 December 2023 / Accepted: 11 December 2023 / Published: 20 December 2023

(This article belongs to the Topic Hydrology and Water Resources in Agriculture and Ecology)

Abstract

Accurate estimation of reference evapotranspiration (ET_r) is important for irrigation planning, water resource management, and preserving agricultural and forest habitats. The widely used Penman–Monteith equation (ASCE-PM) estimates ET_r across various timescales using ground weather station data. However, discrepancies persist between estimated ET_r and measured ET_r obtained from weighing lysimeters (ET_r-lys), particularly in advective environments. This study assessed different machine learning (ML) models in comparison to ASCE-PM for ET_r estimation in highly advective conditions. Various variable combinations, representing both radiation and aerodynamic components, were organized for evaluation. Eleven datasets (DT) were created for the daily timescale, while seven were established for hourly and quarter-hourly timescales. ML models were optimized by a genetic algorithm (GA) and included support vector regression (GA-SVR), random forest (GA-RF), artificial neural networks (GA-ANN), and extreme learning machines (GA-ELM). Meteorological data and direct measurements of well-watered alfalfa grown under reference ET conditions obtained from weighing lysimeters and a nearby weather station in Bushland, Texas (1996–1998), were used for training and testing. Model performance was assessed using metrics such as root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), and coefficient of determination (R²). ASCE-PM consistently underestimated alfalfa ET across all timescales (above 7.5 mm/day, 0.6 mm/h, and 0.2 mm/h daily, hourly, and quarter-hourly, respectively). On hourly and quarter-hourly timescales, datasets predominantly composed of radiation components or a blend of radiation and aerodynamic components demonstrated superior performance. Conversely, datasets primarily composed of aerodynamic components exhibited enhanced performance on a daily timescale. 
Overall, GA-ELM outperformed the other models and was thus recommended for ET_r estimation at all timescales. The findings emphasize the significance of ML models in accurately estimating ET_r across varying temporal resolutions, crucial for effective water management, water resources, and agricultural planning.

Keywords:

machine learning; genetic algorithm; advective environments; radiation components; aerodynamic components; reference evapotranspiration

1. Introduction

Evapotranspiration (ET) is a significant factor in the hydrological cycle and is frequently used to calculate hydrological losses through several important processes that take place between the ground and the atmosphere. It is essential for the optimal design of irrigation schedules [1], management of regional water resources [2], and estimation of different hydrological processes [3]. ET has a significant impact on several terrestrial ecosystem processes as well as pertinent characteristics, such as soil water content and energy balances [4]. Considerable progress in estimating ET and understanding the mechanisms of its ongoing variations in daily, annual, and inter-annual timescales has been made through numerous studies, motivated by the early awareness of the importance of water as an essential resource for life sustainability on earth. Despite the findings in these studies, the complex and nonlinear processes that dominate evapotranspiration have made its estimation a great challenge, partly due to several influencing factors, such as landform, geomorphological, soil moisture, and vegetation traits [5,6,7]. In this context, the precise estimation of ET is still of particular importance for careful water resource management.

An overview of the evolution of ET-estimating methods during the previous century was discussed in Ref. [8], and these methods are often classified as direct and indirect. Direct methods, such as lysimeters and micrometeorological techniques, demand special construction and high maintenance, which is expensive. Indirect methods, on the other hand, are less expensive and time-saving and are in some contexts regarded as suitable alternatives to the direct ones. The indirect methods are commonly classified into water-balance-based, radiation-based, mass-transfer-based, and temperature-based models [9]. Because it considers both aerodynamic and thermodynamic factors, the FAO Penman–Monteith (FAO-PM) model for calculating the ET of a reference short (grass) crop continues to be the most extensively used indirect method for ET_r estimation in a variety of regions and climates [10]. The Food and Agriculture Organization (FAO) approved the equation after determining that it accounts for all the variables influencing evapotranspiration and fixes most of the flaws in the other empirical techniques.

The Penman–Monteith Equation was slightly modified and standardized by the American Society of Civil Engineers (ASCE) for both tall crop (alfalfa) (ET_rs) and short crop (clipped grass) surfaces with similar parameterizations as the FAO-PM for computation of the equation components after national and international discussions on the adoption of a taller reference crop [11]. The result was the ASCE-PM equation, and it was formulated to allow calculations for both daily and hourly or shorter time steps [12]. The complexity involved in the calculations of all the required inputs in the ASCE-PM is a disadvantage for its application, which could lead to significant errors [13]. Another drawback in using the ASCE-PM is the difficulty in obtaining the extensive weather data needed and the absence of adequate historical records for each study location, which are crucial aspects for calculating reliable ET_r estimates [4]. This can be particularly challenging in developing countries, where there are limited meteorological stations and a scarcity of weather data records.

Reference evapotranspiration depends on latitude, altitude, and several climatic variables, such as relative humidity, air temperature, soil temperature, wind speed, net radiation, and dew point temperature [14]; several researchers have utilized a combination of these different parameters to model ET_o using machine learning models at daily and monthly timescales to overcome the identified limitations and difficulties of traditional methods [15]. CatBoost, generalized regression neural network (GRNN), and the random forest (RF) models were evaluated for estimating daily ET_o in the arid and semi-arid regions of northern China, using limited meteorological data with eight different combinations of inputs [16]. It was found that CatBoost demonstrated superior performance and was identified as the most effective method for estimating ET_o. The efficiency of extreme learning machines (ELM) was compared to the empirical Penman–Monteith equation and the feedforward backpropagation (FFBP) in predicting ET_o for three meteorological stations in Iraq, utilizing meteorological data from thirteen years (2000–2013) as inputs [5]. They concluded that the ELM model demonstrated efficiency, simplicity, high speed, and good generalization performance for ET_o estimation. Four different variants of an extreme learning machine (ELM) model optimized using bio-inspired search algorithms were evaluated to estimate daily reference evapotranspiration (ET_r) across various regions in China using data from eight meteorological stations [1]. The results highlighted the effectiveness of bio-inspired optimization algorithms, particularly the FPA and CSA algorithms, in enhancing the performance of the conventional ELM model for daily ET_o prediction. 
Kernel-based (Gaussian process regression (GPR), support vector regression (SVR)) and deep learning methods (Broyden–Fletcher–Goldfarb–Shanno artificial neural network (BFGS-ANN)) were compared to long short-term memory (LSTM) for estimation of monthly reference evapotranspiration using minimal meteorological parameters in ten different combinations [17]. The results showed that all four methods predicted ET_o amounts with acceptable accuracy and error levels.

Because of ongoing climatic changes and the complexity of the evapotranspiration process, which leads to high variability in time and space, modeling of ET_r in this study was extended to quarter-hourly and hourly timescales. Meteorological data at high temporal resolution have become easier to collect due to recent advancements in automatic weather stations [18]. This, in turn, supports the estimation of ET_r at such resolutions, which is commonly used for calibrating surface energy balance models in the determination of geospatial evapotranspiration from drone imagery or satellite images [19,20]. Therefore, assessing the performance of the ASCE-PM at timescales shorter than daily is becoming increasingly necessary.

To the best of our knowledge, there are no studies that have evaluated the use of ML models for the estimation of ET_r at lower timescales, such as hourly or quarter-hourly with varied meteorological input data. In addition, studies that evaluated ET_r at a daily timescale mostly considered parameters directly measured from weather stations. This is desirable for the estimation of reference evapotranspiration in areas with incomplete meteorological data [9]. However, the sensitivity of ET_r to different meteorological variables (directly measured or those calculated from directly measured) has been studied [21,22], and they were all found to influence the energy budget of the surface [23]. Changes in wind speed produced the largest decrease in ET_r, followed by vapor pressure, net radiation, and mean temperature [21]. Also, computed ET_r was found to be most sensitive to net radiation, followed by vapor pressure deficit and wind travel transfers; the contribution of the aerodynamic and net radiation components to the ET_r value varied throughout the year [24]. The difference in the contribution of these components could be attributed to the difference in climates [8]. Therefore, it is evident that for proper evapotranspiration estimation, proper assessment of all meteorological parameters (directly measured or not) is necessary. In this study, we combined directly measured weather parameters with subsequently calculated parameters for ET_r estimation using machine learning models.

Advantages of ML include the ability to use reduced data inputs, capturing non-linearity in the data inputs, and utilizing the computing power of modern-day computer systems to analyze big data. Collectively, these factors have made ML algorithms attractive options for estimating ET_r. The choice of the best possible algorithm and the choice of adequately representative variables are among the challenging aspects of any ML task [25]. Moreover, the performance of ML algorithms strongly depends on the size and structure of available data [25]. To improve performance, several bio-inspired algorithms, such as GA, are coupled with ML models to find the optimal set of parameters during model training [1,5]. This is because default optimization algorithms, such as backpropagation, are often trapped in the local minima, several parameters influence its speed and robustness, and its best parameters appear to vary from problem to problem [26]. Unlike the backpropagation technique that always adjusts weights towards the descending direction of the error function, GA is a parallel stochastic optimization algorithm good at global searching. The drawback of GA is its slowness during model training due to its exploration mechanism through reproduction, crossover, and mutation, as well as searching for optimal solutions from random genes [27,28]. Nonetheless, GA provides multiple optimal solutions from the search space, and has thus gained prominence in recent years. For evapotranspiration studies, ML models coupled with GA were found to perform better than the corresponding single models [29,30,31,32]. Therefore, in this paper, GA was utilized to optimize ML models and compared to standardized ASCE-PM for ET_r estimation on daily, hourly, and quarter-hourly timescales considering different input meteorological variables.

2. Materials and Methods

2.1. Lysimetric and Weather Data Collection

A research study was set up in 1996 to measure alfalfa reference evapotranspiration at the Bushland, Texas, facility of the USDA Agricultural Research Service Conservation and Production Research Laboratory. Bushland has a semi-arid climate that is impacted by local and regional advection events [33]. The site was subdivided into two square fields, designated northeast (NE) and southeast (SE), each measuring ~5.0 ha. Monolithic weighing lysimeters (NE and SE) with 3 m × 3 m surface dimensions and a depth of 2.4 m were located at the center of each of the two fields. These were used for the direct measurement of ET_r-lys. Details about the study area and the lysimeter site in Bushland (35°11′ N, 102°6′ W, and 1170 m above MSL) can be found in Refs. [34,35]. Alfalfa was seeded in the two fields, which were irrigated simultaneously by a Lindsay lateral move sprinkler system to maintain a well-watered condition. Experiments were conducted for 4 years from 1996 through 1999 [36], but the data for 1999 were omitted in this study because reference conditions for a tall reference crop (alfalfa) were not always met [10]. The leaf area index (LAI), growth stage, and plant height were measured periodically between and at each harvest in each field (Figure 1). The planting and agronomic management of the alfalfa crops during the four-year growing period were described in Ref. [36], and the data and metadata are available in Ref. [37].

The methodology for meteorological data acquisition was extensively described in Refs. [35,36], and the data are available in Ref. [38]; therefore, only the measured variables and other noteworthy points are presented in this work. Meteorological measurements were made at 5 s intervals and reported as quarter-hourly averages. These meteorological data, including the mean air temperature at 2.0 m height, mean relative humidity at 1.8 m height, mean wind speed at 2.0 m height, and mean solar irradiance, were necessary to obtain the ET_r calculated in this study.

2.2. Data Processing

Lysimeter and meteorology data were processed and analyzed under three different timescales: quarter-hourly, hourly, and daily. Only the days when alfalfa height was at least 0.5 m were used in this study in accordance with reference conditions for a tall reference crop (alfalfa) [10]. Also, days were not considered if irrigation or rainfall affected the accuracy of the water balance calculations for measured ET_r or if the crop was lodged or badly watered. Figure 2 shows the K_c values for the days selected in this study. Note that the K_c values were often >1, which indicates that alfalfa ET under reference conditions exceeded the ASCE 2005 PM reference ET in the advective environment of Bushland.

ET_r-lys values from the NE and SE lysimeters were averaged for each data point to obtain the full ET_r-lys dataset at the quarter-hourly timescale. The hourly ET_r-lys was computed as the sum of four (4) consecutive quarter-hourly readings, while the same four readings were averaged to obtain the mean hourly values of the meteorological variables (air temperature, wind speed, solar radiation, relative humidity). On the daily timescale, the daily ET_r-lys was computed as the sum of all ninety-six (96) consecutive quarter-hourly readings, while the meteorological variables were extracted as follows: the maximum and minimum values of temperature and relative humidity were taken as the maximum and minimum of the 15 min average values for a given day, respectively. The ninety-six (96) consecutive quarter-hourly readings were averaged to obtain the mean daily values of the meteorological variables (wind speed, solar radiation, temperature). This resulted in daily meteorological data, including maximum temperature (T_max), minimum temperature (T_min), mean wind speed (u₂), maximum relative humidity (RH_max), minimum relative humidity (RH_min), and mean solar radiation (R_s).
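The aggregation rules described above can be illustrated with a minimal sketch. It assumes that each variable is held as a plain Python list of quarter-hourly values with no gaps; the function names and the dictionary keys are hypothetical and are not taken from the study's processing code.

```python
from statistics import mean


def average_lysimeters(ne, se):
    """Average paired NE and SE lysimeter readings point by point."""
    return [(a + b) / 2.0 for a, b in zip(ne, se)]


def to_hourly(et_q, met_q):
    """Quarter-hourly to hourly: ET is summed, a met variable is averaged."""
    et_h = [sum(et_q[i:i + 4]) for i in range(0, len(et_q) - 3, 4)]
    met_h = [mean(met_q[i:i + 4]) for i in range(0, len(met_q) - 3, 4)]
    return et_h, met_h


def to_daily(et_q, temp_q, rh_q, wind_q, rs_q):
    """Aggregate one day (96 quarter-hourly records) to daily values."""
    series = (et_q, temp_q, rh_q, wind_q, rs_q)
    if not all(len(s) == 96 for s in series):
        raise ValueError("expected 96 quarter-hourly records per variable")
    return {
        "ET": sum(et_q),              # daily total
        "T_max": max(temp_q),         # extremes of the 15 min averages
        "T_min": min(temp_q),
        "RH_max": max(rh_q),
        "RH_min": min(rh_q),
        "u2": mean(wind_q),           # daily means
        "R_s": mean(rs_q),
    }
```

Incomplete trailing hours are dropped by `to_hourly`, which is one possible convention; the source does not state how partial periods were handled.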

2.3. Calculation of Parameters for Reference Evapotranspiration Estimation

Equation (1) presents the form of the ASCE-PM by Ref. [12] for different time steps. The constants C_d and C_n for each of the timescales are shown in Table 1.

ET_{rs} = \frac{0.408 ∆ (R_{n} - G) + γ \frac{C_{n}}{T + 273} u_{2} (e_{s} - e_{a})}{∆ + γ (1 + C_{d} u_{2})}

(1)

where ET_rs is the standardized reference crop evapotranspiration for a tall surface (mm d⁻¹ for daily time steps or mm h⁻¹ for hourly time steps); R_n is the net radiation at the crop surface (MJ m⁻² d⁻¹ for daily time steps or MJ m⁻² h⁻¹ for hourly time steps); G is the soil heat flux density at the soil surface on a daily (MJ m⁻² d⁻¹) or hourly (MJ m⁻² h⁻¹) basis; T is the mean daily or hourly air temperature at 1.5 to 2.5 m height (°C); u₂ is the mean daily or hourly wind speed at 2 m height (m s⁻¹); e_s is the saturation vapor pressure at 1.5 to 2.5 m height (kPa); e_a is the mean actual vapor pressure at 1.5 to 2.5 m height (kPa); ∆ is the slope of the saturation vapor pressure–temperature curve (kPa °C⁻¹); γ is the psychrometric constant (kPa °C⁻¹); C_n is the numerator constant that changes with reference type and calculation time step; and C_d is the denominator constant that changes with reference type and calculation time step (s m⁻¹).
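As an illustration of Equation (1), the following sketch evaluates the equation for given inputs. It is not the REF-ET implementation. The constants `CN_TALL_DAILY` and `CD_TALL_DAILY` are the values we understand the ASCE standardized equation to prescribe for a tall reference at a daily time step; they are stated here as an assumption because Table 1 is not reproduced in this text, and the function name is hypothetical.

```python
# Assumed daily constants for the tall (alfalfa) reference surface;
# Table 1 of the article remains the authority for each timescale.
CN_TALL_DAILY = 1600.0
CD_TALL_DAILY = 0.38


def asce_pm(rn, g, t_air, u2, es, ea, delta, gamma, cn, cd):
    """Evaluate Equation (1).

    rn, g  : net radiation and soil heat flux (MJ m-2 per time step)
    t_air  : mean air temperature (deg C)
    u2     : mean wind speed at 2 m (m s-1)
    es, ea : saturation and actual vapor pressure (kPa)
    delta  : slope of the saturation vapor pressure curve (kPa degC-1)
    gamma  : psychrometric constant (kPa degC-1)
    cn, cd : numerator and denominator constants for the time step
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * cn / (t_air + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + cd * u2))


# Illustrative (not measured) daily inputs for a warm, windy day.
example_et = asce_pm(rn=15.0, g=0.0, t_air=25.0, u2=4.0, es=3.0, ea=1.5,
                     delta=0.19, gamma=0.06,
                     cn=CN_TALL_DAILY, cd=CD_TALL_DAILY)
```

Passing C_n and C_d as arguments keeps the same function usable for hourly and quarter-hourly time steps once the corresponding constants are supplied.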

The ASCE-PM reference evapotranspiration was calculated using REF-ET software (Version 4.1.4.22) [39], which calculates ET_o and ET_r for grass and alfalfa as short and tall reference crops, respectively, on different timescales. It can be used to calculate reference evapotranspiration on monthly, daily, and hourly or less timescales, and it has been adopted in several studies [40,41]. The calculated ET_rs values from the software were compared to the outputs of ML models at daily, hourly, and quarter-hourly timescales.

2.4. Machine Learning Algorithms and Optimization

Four ML models with different operating principles were used in this study to estimate ET_r: random forest (RF), extreme learning machine (ELM), support vector regression (SVR), and artificial neural network (ANN). These models have gained prominence in evapotranspiration studies in recent years [42,43]. ELM differs from ANN and SVR in that it does not require iterative training and its hidden layer parameters are randomly selected [1]. It was first proposed in Ref. [44] and has been widely applied because of its fast convergence, strong generalization ability, and freedom from local extrema [45]. RF is a tree-based approach that handles high-dimensional regression problems, where the final prediction is obtained through a bagging procedure. The structure of each of these models is shown in Figure 3; a more detailed description of their definitions and principles of operation is beyond the scope of this paper and has been extensively discussed elsewhere [46,47,48,49].

2.4.1. Genetic Algorithm

The GA is a widely used optimization technique that has shown promise in agricultural studies for fine-tuning the parameters of ML models [50]. It is an evolutionary algorithm that searches for optimized solutions by simulating the natural evolutionary process [1]. Several evapotranspiration studies indicated that coupling GA with machine learning models, such as ANN, SVR, and ELM, yields better results than the single models [29,30,31,32]. An extensive overview of the implementation of a genetic algorithm, including potential integration with ML models, was provided in Ref. [51]. In this study, its integration with ML models is demonstrated in Figure 4. Across all models, the population size, mutation probability, crossover probability, and number of generations of the GA were set to 50, 0.2, 0.3, and 5, respectively. Also, input features were normalized to the range [−1, 1] using the min-max method [4].
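The following sketch shows one way a GA with the settings stated above (population 50, mutation probability 0.2, crossover probability 0.3, five generations) could search a bounded hyperparameter space, together with min-max scaling to [−1, 1]. The operators chosen here (tournament selection of size 3, uniform crossover, random-reset mutation) are assumptions for illustration; the source does not specify which operators were used. The function names are hypothetical, and in practice `fitness` would be a cross-validated error rather than a toy function.

```python
import random


def minmax_scale(values, lo=-1.0, hi=1.0):
    """Linearly rescale values to [lo, hi]; constant input maps to the midpoint."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [0.5 * (lo + hi)] * len(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


def genetic_search(fitness, bounds, pop_size=50, p_mut=0.2, p_cx=0.3,
                   generations=5, seed=0):
    """Minimize fitness over box bounds with a simple real-coded GA.

    pop_size is assumed to be even so that parents can be paired.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        # Selection: each parent is the winner of a 3-way tournament.
        parents = [min(rng.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_cx:
                # Uniform crossover: swap each gene with probability 0.5.
                for i in range(len(bounds)):
                    if rng.random() < 0.5:
                        a[i], b[i] = b[i], a[i]
            children += [a, b]
        for child in children:
            if rng.random() < p_mut:
                # Random-reset mutation of one gene within its bounds.
                i = rng.randrange(len(bounds))
                child[i] = rng.uniform(*bounds[i])
        pop = children
        best = min(pop + [best], key=fitness)  # keep the best seen so far
    return best
```

For example, `genetic_search(cv_error, [(5, 1000), (0.05, 0.99)])` would search a two-parameter space, where `cv_error` is a user-supplied function returning the validation error for a candidate parameter pair.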

For the SVR model, the radial basis function was used as the kernel function. The regularization parameter and the kernel parameter (gamma) were optimized over the ranges 5 to 1000 and 0.05 to 0.99, respectively.

For the RF models, the maximum depth of the tree (max_depth), the number of features (max_features), the minimum number of samples required to be at a leaf node (min_samples_leaf), the minimum number of samples required to split an internal node (min_samples_split), and the number of trees in the forest (n_estimators) were optimized ranging from 5 to 500, from 2 to 6, from 5 to 20, from 5 to 20, and from 1 to 500, respectively.

The weight matrix and bias vector of the ELM were randomly generated, and the activation function was set to sigmoid. The number of hidden units and the regularization parameter were optimized from 5 to 1000 and from 0.02 to 0.9, respectively. The ELM parameters used in this study are described in detail in Ref. [44].

The ANN model consisted of an input layer, a hidden layer, and an output layer. The rectified linear unit (ReLU) was used as the activation function. The neg_mean_squared_error and neg_mean_absolute_error scorers of the scikit-learn package in Python v3.8 were used as the loss functions. The number of neurons and the number of hidden layers were both optimized over the range 3 to 100, while the learning rate was optimized from 0.01 to 0.3. Additional parameters were taken as the default values in the scikit-learn package.

2.4.2. k-Folds Cross Validation

The k-fold cross-validation approach was used during this phase to train and test the models. The meteorological dataset was randomly divided into training and testing datasets, comprising 80% and 20% of the total data, respectively. The training dataset was then divided equally into five folds, of which four were used to train the models and one to test them. The procedure was carried out five times so that each fold was used once for model testing. The performance of each set of model hyperparameters was assessed by the resulting error. The errors of the five trials were averaged as the expected generalization error. The parameters that provided the minimum average error were returned as the tuned hyperparameters.
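The splitting scheme described above can be sketched as follows. The helper names are hypothetical, and `fit` and `error` stand for any user-supplied training and scoring functions; this is an illustration of the procedure, not the code used in the study.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Randomly split range(n) into training and testing index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists; each fold validates once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, folds[i]


def cross_val_error(fit, error, X, y, indices, k=5):
    """Average validation error over k folds for one hyperparameter setting."""
    errors = []
    for tr, va in k_fold_indices(indices, k):
        model = fit([X[i] for i in tr], [y[i] for i in tr])
        errors.append(error(model, [X[i] for i in va], [y[i] for i in va]))
    return sum(errors) / len(errors)
```

The value returned by `cross_val_error` is the quantity a hyperparameter search would minimize, while the held-out 20% is reserved for the final test.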

2.4.3. Arrangement of Datasets for Machine Learning Models

The datasets for ML models were arranged to reflect the impacts of aerodynamic and radiation components on ET_r (Figure 5). Aerodynamic components are represented by wind speed, relative humidity, and vapor pressure deficit (VPD), while radiation components are represented by net radiation (R_n), solar irradiance (R_s), air temperature, the slope of the saturation vapor pressure–temperature curve (∆), and the relative cloudiness (R_s/R_so).
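Two of the derived inputs named above, VPD and ∆, can be computed from air temperature and relative humidity alone. The sketch below uses the saturation vapor pressure and slope expressions commonly given with the standardized Penman–Monteith method; they are stated here as assumptions because this text does not reproduce them, and the function names are hypothetical.

```python
import math


def e_sat(t_c):
    """Saturation vapor pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def slope_delta(t_c):
    """Slope of the saturation vapor pressure curve (kPa degC-1)."""
    return 2503.0 * math.exp(17.27 * t_c / (t_c + 237.3)) / (t_c + 237.3) ** 2


def vpd_daily(t_max, t_min, rh_max, rh_min):
    """Daily VPD (kPa) from temperature and relative humidity extremes."""
    es = 0.5 * (e_sat(t_max) + e_sat(t_min))
    ea = 0.5 * (e_sat(t_min) * rh_max / 100.0 + e_sat(t_max) * rh_min / 100.0)
    return es - ea


def vpd_subdaily(t_mean, rh_mean):
    """Hourly or quarter-hourly VPD (kPa) from mean temperature and humidity."""
    return e_sat(t_mean) * (1.0 - rh_mean / 100.0)
```

The daily version uses maximum and minimum values, whereas the sub-daily version uses means, matching the way the daily and sub-daily datasets differ in the text.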

Further considerations included the simplicity of measuring the weather parameters, the ease of calculation of the intermediate parameters, and the completeness of the climatic variables. Parameters u₂, R_s, and the means, minimums, and maximums of temperature and relative humidity can be directly measured by weather stations. Parameters that are calculated from the directly measurable variables include the VPD, the R_s/R_so, the ∆, and the R_n. The requirement for the calculation of all these parameters for ET_r estimation has been cited as one of the challenges of the ASCE-PM that could lead to significant errors [13]. Such errors could be reduced by determining the most crucial parameters that influence evapotranspiration for a given area through machine learning algorithms. For this purpose, eleven input datasets (DTs) were considered for the daily timescale, while seven DTs were considered for the hourly and quarter-hourly timescales, as shown in Table 2. On a daily timescale, DT₃, DT₅, and DT₁₀ contain all or the majority of the aerodynamic components, while DT₁, DT₄, DT₇, and DT₈ provide a good mix of both aerodynamic and radiation components. The presence of aerodynamic components is reduced in DT₆, DT₉, and DT₁₁. Among these datasets, all the parameters in DT₈ can be directly measured from a weather station, followed by DT₃ and DT₄, which have additional parameters that require computation. The need for computation is increased in DT₁ and DT₅. A similar arrangement was followed at hourly and quarter-hourly timescales; however, in this case, the minimums and maximums of relative humidity and temperature were replaced by their mean values. The calculation of all the parameters included in the datasets is defined and extensively explained in Ref. [12]. R_s/R_so represents relative cloudiness and can be derived from pyranometer data and calculated R_so values [52]. The calculation of VPD requires air temperature and relative humidity, while ∆ requires air temperature. 
The net radiation indicates the amount of solar irradiance absorbed by vegetation, and it is commonly calculated from the short and long-wave radiation components (Equation (2)) at either a daily, hourly, or lower timescale.

R_{n} = R_{n s} - R_{n l}

(2)

where R_ns and R_nl are net short-wave and net outgoing long wave radiation in MJ m⁻² d⁻¹/h⁻¹, respectively.

The equation for R_ns (Equation (3)) does not differ based on timescale, while that of R_nl (Equations (4) and (5)) differs based on the timescale considered [12].

R_{ns} = (1 - α) R_{s} \quad \text{(daily and hourly)}

(3)

R_{nl} = σ f_{cd} (0.34 - 0.14 \sqrt{e_{a}}) \left[\frac{T_{K max}^{4} + T_{K min}^{4}}{2}\right] \quad \text{(daily)}

(4)

R_{nl} = σ f_{cd} (0.34 - 0.14 \sqrt{e_{a}}) T_{K hr}^{4} \quad \text{(hourly)}

(5)

where α (0.23) is the albedo [dimensionless], R_s is the incoming solar radiation [MJ m⁻² h⁻¹ or MJ m⁻² d⁻¹], σ is the Stefan–Boltzmann constant (2.042 × 10⁻¹⁰ MJ K⁻⁴ m⁻² h⁻¹ for hourly time steps, equivalent to 4.901 × 10⁻⁹ MJ K⁻⁴ m⁻² d⁻¹ for daily time steps), f_cd is a cloudiness function [dimensionless] calculated from the relative solar radiation (R_s/R_so), e_a is the actual vapor pressure [kPa], T_{K hr} is the mean absolute temperature during the hourly period [K], and T_{K max} and T_{K min} are the maximum and minimum absolute temperatures during the 24 h period [K], respectively.
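Equations (2)–(5) can be illustrated with the sketch below. The cloudiness function shown (a linear function of R_s/R_so with limits) and the daily value of σ are taken from our understanding of the ASCE standardized procedure and are labeled as assumptions, since this text only references them; the conversion from °C to K by adding 273.16 is likewise an assumption, and all function names are hypothetical.

```python
import math

SIGMA_HOURLY = 2.042e-10  # MJ K-4 m-2 h-1, as given in the text
SIGMA_DAILY = 4.901e-9    # MJ K-4 m-2 d-1 (assumed daily equivalent)


def cloudiness_function(rs, rso):
    """Assumed ASCE form: f_cd = 1.35 (Rs/Rso) - 0.35, with limits applied."""
    ratio = min(max(rs / rso, 0.3), 1.0)
    return min(max(1.35 * ratio - 0.35, 0.05), 1.0)


def net_shortwave(rs, albedo=0.23):
    """Equation (3): net short-wave radiation."""
    return (1.0 - albedo) * rs


def net_longwave_daily(t_max_c, t_min_c, ea, fcd):
    """Equation (4): daily net outgoing long-wave radiation."""
    tk_max, tk_min = t_max_c + 273.16, t_min_c + 273.16
    return (SIGMA_DAILY * fcd * (0.34 - 0.14 * math.sqrt(ea))
            * (tk_max ** 4 + tk_min ** 4) / 2.0)


def net_longwave_hourly(t_c, ea, fcd):
    """Equation (5): hourly net outgoing long-wave radiation."""
    return SIGMA_HOURLY * fcd * (0.34 - 0.14 * math.sqrt(ea)) * (t_c + 273.16) ** 4


def net_radiation(rns, rnl):
    """Equation (2): R_n = R_ns - R_nl."""
    return rns - rnl
```

For a daily calculation, `net_radiation(net_shortwave(rs), net_longwave_daily(t_max, t_min, ea, cloudiness_function(rs, rso)))` chains the four equations; the same pattern applies at the hourly step with `net_longwave_hourly`.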

2.5. Evaluation Metrics

The competency of the ASCE-PM, GA-SVR, GA-ANN, GA-ELM, and GA-RF models for estimating ET_r-lys at daily, hourly, and quarter-hourly timescales was assessed by comparing the estimated ET from each model with the ET measured using the lysimeters and quantified using four commonly used statistical indices, i.e., the coefficient of determination (R²) [1], the mean bias error (MBE) [53], the mean absolute error (MAE), and the root mean squared error (RMSE) [54], which can be expressed as follows:

R^{2} = \frac{\left[\sum_{i = 1}^{n} (y_{i} - \bar{y}) (x_{i} - \bar{x})\right]^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}}

(6)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (y_{i} - x_{i})^{2}}{n}}

(7)

MAE = \frac{\sum_{i = 1}^{n} | y_{i} - x_{i} |}{n}

(8)

MBE = \frac{\sum_{i = 1}^{n} (y_{i} - x_{i})}{n}

(9)

Note: y_i denotes the estimated ET_r, x_i denotes the observed ET_r, \bar{y} and \bar{x} denote their respective means, and n is the number of observations.
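A minimal sketch of the four indices follows. It takes R² to be the squared Pearson correlation between estimated and observed values, which is our reading of the intended form of Equation (6); the function names are hypothetical.

```python
import math


def r_squared(est, obs):
    """Squared Pearson correlation between estimates and observations."""
    n = len(obs)
    y_bar, x_bar = sum(est) / n, sum(obs) / n
    cov = sum((y - y_bar) * (x - x_bar) for y, x in zip(est, obs))
    var_y = sum((y - y_bar) ** 2 for y in est)
    var_x = sum((x - x_bar) ** 2 for x in obs)
    return cov ** 2 / (var_y * var_x)


def rmse(est, obs):
    """Root mean square error, Equation (7)."""
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(est, obs)) / len(obs))


def mae(est, obs):
    """Mean absolute error, Equation (8)."""
    return sum(abs(y - x) for y, x in zip(est, obs)) / len(obs)


def mbe(est, obs):
    """Mean bias error, Equation (9); negative values indicate underestimation."""
    return sum(y - x for y, x in zip(est, obs)) / len(obs)
```

With this sign convention, a model that is consistently 0.5 mm below the lysimeter values has an MBE of −0.5 mm and an MAE and RMSE of 0.5 mm, while its R² can still equal 1, which is why the indices are reported together.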

3. Results and Discussion

3.1. Comparison of the Estimation Accuracy of the ASCE-PM and Machine Learning Models at a Daily Timescale

The ET_r estimated using ASCE-PM (ET_rs) and the ML models was compared to that measured with weighing lysimeters (ET_r-lys) on a daily timescale. Table 3 indicates that DT₃, DT₄, and DT₆ provided the best-performing estimators of ET_r-lys across the different ML models. However, Figure 6 illustrates that the ASCE-PM equation performed better than these models (considering R²) but tended to underestimate ET_r-lys (slope of 0.91, MBE = −0.43 mm/day) for values of 7.5 mm day⁻¹ and greater. Also, the residual plot in Figure 7b shows that the largest errors occurred above approximately 7.5 mm day⁻¹, although the equation generally underestimated ET_r-lys across the full range of estimates. Underestimation or overestimation of ET_r-lys by the different variants of the Penman–Monteith equation under low and high evaporative demands has been reported in earlier studies, where it was attributed to differences in local climatic conditions and to lysimetric measurement errors. Given the semi-arid climate and relatively large wind speeds at Bushland [33], the slight deviations between ET_rs and ET_r-lys could be attributed to advective transport that adds energy, thus increasing the ET_r-lys in the reference alfalfa fields, in agreement with Ref. [55]. Such underestimations have been reported for other highly advective environments [56].

Comparison of the ET_r from ML models with the ET_r-lys shows that the underestimation was reduced, most significantly for GA-ELM, when applied to DT₃ (slope = 0.97, MBE = −0.25) and DT₆ (slope = 0.95, MBE = −0.19) (Figure 6a,d). It also demonstrates that GA-ELM can reduce the positive offset and slope, consistent with Refs. [22,57], which found ELM to be suitable for reference evapotranspiration estimation. The best-performing model (GA-ANN-DT₄, considering R²) had a smaller slope (0.90), although it reduced underestimation (MBE = −0.15). According to the RMSE values shown in Table 3, ASCE-PM performed better than both the SVR and RF models for the majority of the datasets. These models performed close to the ASCE-PM with RMSE values of 0.98 and 0.93 for the GA-SVR and GA-RF models, respectively, for DT₈. However, the high positive offsets, 0.81 and 0.46 for the GA-SVR and GA-RF, respectively, indicate the poor performance of these models compared to that of ASCE-PM (0.30). The statistical parameters shown in Table 3 give a clear idea about the interaction of the meteorological variables used in estimating ET_r-lys across the different models. Based on these results, u₂ was involved in all the best-performing datasets (DT₃, DT₄, and DT₆); therefore, it can be concluded that it is one of the most relevant estimators for daily ET_r-lys in this region and that accurate estimates of ET_r-lys might not be achieved without the inclusion of this parameter. Also, the inclusion of VPD and R_s/R_so in DT₃ led to improved slope and offset values compared to those of DT₄ for GA-ELM, GA-ANN, and GA-SVR. In DT₆, excluding R_n from the inputs did not affect the performance of the ML models. 
This result presents an opportunity for estimating ET_r-lys without the long R_n calculations and could suggest that in dry and advective environments, radiation components might primarily contribute to pressure deficit through temperature changes, but aerodynamic components play a major role in driving evapotranspiration. This might affirm the relevancy of the inclusion of aerodynamic components in ML models in advective environments for daily ET_r-lys calculations.

Values of RMSE and MAE for GA-ELM-DT₄ and GA-ELM-DT₅ confirm that the inclusion of ∆, VPD, and R_s/R_so in DT₅ decreased the model accuracy from an MAE of 0.56 mm/day and RMSE of 0.85 mm/day to an MAE of 0.90 mm/day and RMSE of 1.20 mm/day (Table 3). Removing ∆ in GA-ELM-DT₃ and replacing it with T_mean and R_n reduced the MAE from 0.90 mm/day to 0.61 mm/day. A similar trend is observed in all the datasets where ∆ was included (DT₁, DT₂). The effect of the inclusion of R_s/R_so (an indication of cloudiness) can also be assessed by looking at the values of MAE and RMSE for DT₇ and DT₈ in Table 3. Across all models, the MAE and RMSE for DT₇ ranged from 0.79 mm/day to 0.86 mm/day and from 0.95 mm/day to 1.21 mm/day, respectively, which are larger than the ranges for DT₈ (0.68 mm/day–0.89 mm/day and 0.92 mm/day–0.98 mm/day, respectively). Therefore, this suggests that ∆ and R_s/R_so might not be suitable for daily ET_r-lys estimation in the advective environment tested.

It is observed from Table 3 that replacing the maximums and minimums of relative humidity and temperature with their mean values for DT₆ and DT₇ slightly improved the RMSE values. However, when the R_s/R_so was excluded as shown for DT₈, the RMSE and MAE were improved. This means R_s/R_so might not be suitable as a direct input into ML models for daily ET_r estimation for the conditions of this study. MAE and RMSE values for DT₄ and DT₈ indicate that R_n and R_s might have a similar effect on the daily ET_r-lys estimation and can be used interchangeably. Hence, precautions should be taken into consideration for deploying ML models in locations such as the Bushland, TX station when records of both R_n and R_s are missing. Among all the ML models tested, GA-ANN subjected to DT₄ produced the largest R² value (0.91), while the GA-ELM applied to DT₈ produced the slope closest to unity and the smallest offset. Figure 7a,c,d also reveals that residuals for the GA-ELM and GA-ANN were persistently close to the zero line, which indicates that these models can reduce underestimation or overestimation of ET_r-lys.

Overall, the results indicate that, on a daily timescale, ML models can give better estimates of ET_r-lys than the ASCE-PM under the tested conditions, and that comparing the datasets shows how strongly each input parameter affects estimation accuracy. This matters because weather records commonly contain gaps, which are sometimes estimated or imputed [58]. The relevance of a parameter therefore indicates how much effort should be spent on replacing or estimating its missing values under the prevailing local conditions. For instance, if accurate estimates of ET_r-lys can be obtained without a particular parameter, no effort would be required to impute its missing values.

3.2. Comparison of the Estimation Accuracy of the ASCE-PM and Machine Learning Models at Hourly and Quarter-Hourly Timescale

Table 4 gives the performance of the ML models across the different datasets for hourly and quarter-hourly timescales. Residual plots for the best-performing ML models and ASCE-PM are shown in Figure 8. Residuals for all ML models in Figure 8a,c,d show more points close to the zero line than the ASCE-PM plot (Figure 8b). For ASCE-PM, the errors appear to increase at about 0.6 mm/h. A similar trend is observed in Figure 9b, where the fitted line closely follows the 1:1 line up to about 0.6 mm/h, indicating the method's success for that range of values. However, the method slightly underestimates ET_r-lys above this value, although the overall MBE was calculated as −0.008 mm/h. Our results agree with the findings of a previous study [56]. In contrast, a study in which grass reference (ET_r) was evaluated and compared with measurements from a lysimeter reported slightly different results [59]. In that paper, the ASCE-PM overestimated lysimeter measurements by 4% for values above 0.45 mm/h. The behavior of the ASCE-PM might vary depending on the reference crop considered and the study site [36].
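
An overall MBE close to zero can hide a bias that appears only at high ET rates, as described above for values beyond about 0.6 mm/h. A simple way to check this is to compute the bias separately below and above a threshold. The sketch below is illustrative only: the function name `mbe_by_range` and the sample values are hypothetical, and the error is assumed to be (estimate − measurement).

```python
def mbe_by_range(measured, estimated, threshold):
    """Mean bias error (estimate - measurement) below and at/above a threshold.

    The threshold is applied to the measured (lysimeter) values.
    Returns None for a range that contains no points.
    """
    low = [e - m for m, e in zip(measured, estimated) if m < threshold]
    high = [e - m for m, e in zip(measured, estimated) if m >= threshold]

    def mean(values):
        return sum(values) / len(values) if values else None

    return {"below": mean(low), "at_or_above": mean(high),
            "n_below": len(low), "n_at_or_above": len(high)}


# Hypothetical hourly values (mm/h), split at 0.6 mm/h.
measured = [0.1, 0.3, 0.5, 0.7, 0.9]
estimated = [0.1, 0.31, 0.49, 0.65, 0.82]
print(mbe_by_range(measured, estimated, threshold=0.6))
```

With these made-up numbers the bias below the threshold is essentially zero, while the bias above it is clearly negative, which is the kind of pattern a residual plot would show as points drifting away from the zero line at high ET.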

For the ML models GA-ELM-DT₂, GA-ELM-DT₃, and GA-ELM-DT₇, the residuals were close to zero and the slopes (Figure 9a,c,d) close to unity, indicating that the models performed quite well. Table 4 shows that these models performed better than the ASCE-PM, considering all statistical parameters. The MAE and RMSE for ASCE-PM were 0.36 and 0.24, respectively, higher than the values for DT₇, where all models performed better than the ASCE-PM, with MAE and RMSE ranging from 0.04 mm/h to 0.05 mm/h and from 0.06 mm/h to 0.07 mm/h, respectively. Considering the effect of the different variables on model performance, results for DT₄ show that excluding ∆, VPD, and R_s/R_so led to a reduction in the slope. However, when the data for the three variables were present in DT₅ and data for R_n and T_mean were absent, the slopes were still reduced across all models. This indicates that T_mean and R_n could be major contributing factors as direct inputs for hourly ET_r estimation using ML models. The contribution of R_n to hourly ET_r-lys estimation can be further assessed by looking at the results for DT₄ and DT₇. The consistent MAE, RMSE, and slope values for all the evaluated models indicate that R_s and R_n can be used interchangeably as direct inputs to the ML models. In fact, when data for both variables were absent in DT₅, the worst estimates were observed from the ML models. The performance of ML models on DT₄ is encouraging and indicates a potential for estimating hourly ET_r with directly sensed meteorological data.

Both Figure 10a and Figure 11a show that the ASCE-PM produced biased estimates at the quarter-hourly timescale (slope of 0.93, MBE = −0.0012 mm/h), tending to underestimate ET_r-lys. On the other hand, the residuals for the best-performing models (Figure 10b,c) appear to be close to the zero line, and the R² values in both cases (Figure 11b,c) are larger than that for the ASCE-PM (Figure 11a).

An improved performance can be observed from the best-performing ML models (GA-ELM-DT₂, GA-ELM-DT₇) with residuals (Figure 10b,c) close to the zero line and slopes (Figure 11b,c) close to unity. Additionally, all the other models tested (GA-SVR, GA-RF) performed better than ASCE-PM, as shown by the statistical parameters in Table 4. The MAE and RMSE for ASCE were 0.36 and 0.24, respectively, higher than values for all models and across all datasets that ranged from 0.01 to 0.02 mm/h and from 0.02 to 0.03 mm/h, respectively. The superiority of ML models for quarter-hourly ET_r-lys estimation provides an opportunity to use more high temporal resolution data for ET_r-lys estimation.

Some relevant findings have emerged from the results shown above. The adopted machine learning algorithms are a powerful tool for the prediction of reference evapotranspiration at shorter timescales. Starting from parameters that are directly sensed, as in DT₈, and moving to combinations with parameters calculated from them, as in DT₂ and other datasets, it is possible to obtain prediction models characterized by very high accuracy. Mean temperature and net solar radiation play a significant role in influencing the various processes of the hydrological cycle. These parameters appear to be relevant for modeling at hourly or quarter-hourly timescales; therefore, a data-driven model that considers these two factors most likely leads to satisfactory results. However, in the absence of net radiation measurements, or when the resources for its long calculations are limited, it is still possible to build a reliable prediction model of ET_r-lys with machine learning algorithms based only on solar radiation, mean temperature, wind speed, and relative humidity data. This would be especially desirable for remote sensing applications in data-scarce environments. The use of climatic data that are not very recent may appear to be a limitation of this study. However, the ability to interpret patterns in past climatic data will be crucial to climatic forecasting, and ML models have been deemed necessary for this purpose.
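
To make the idea of a model built only on solar radiation, mean temperature, wind speed, and relative humidity more concrete, the following is a minimal sketch of an extreme learning machine in plain Python. It is not the GA-ELM of this study: no genetic algorithm is used to tune it, the output weights are obtained from ridge-regularized normal equations rather than a pseudoinverse, and the data are synthetic. The class name `TinyELM`, the hidden-layer size, the ridge value, and the target function are all assumptions made for illustration, and the inputs are assumed to be scaled to the range 0 to 1.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


class TinyELM:
    """Single hidden layer with fixed random weights; only output weights are fitted."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        # tanh activations plus a constant 1.0 that acts as an output bias.
        acts = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
                for row, bi in zip(self.w, self.b)]
        return acts + [1.0]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        size = len(h[0])
        # Normal equations (H^T H + ridge * I) beta = H^T y.
        a = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
              for j in range(size)] for i in range(size)]
        rhs = [sum(row[i] * t for row, t in zip(h, ys)) for i in range(size)]
        self.beta = solve_linear(a, rhs)
        return self

    def predict(self, xs):
        return [sum(bk * hk for bk, hk in zip(self.beta, self._hidden(x)))
                for x in xs]


# Synthetic, scaled inputs standing in for R_s, T_mean, u2 and RH_mean.
rng = random.Random(42)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
# Arbitrary smooth target used only to exercise the code, not a physical model.
y = [0.6 * rs + 0.3 * t + 0.4 * u2 * (1.0 - rh) for rs, t, u2, rh in X]

model = TinyELM(n_inputs=4, n_hidden=12).fit(X, y)
pred = model.predict(X)
rmse_model = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))
mean_y = sum(y) / len(y)
rmse_baseline = math.sqrt(sum((mean_y - t) ** 2 for t in y) / len(y))
print(rmse_model, rmse_baseline)
```

The comparison against a mean-only baseline is on the training data and therefore says nothing about generalization; a held-out test set, as used in the study, would be needed for that.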

3.3. Transferability of the Developed Machine Learning Models

Developing models for reference evapotranspiration estimation requires an understanding of how the different features contribute to the model estimations. This is especially necessary in data-scarce environments, where the most dominant variables can be used to provide real-time ET_r-lys estimates through ML algorithms. Our investigation of the different parameter combinations and their contributions to ET_r-lys estimation at different timescales indicated that ML models can perform better than ASCE-PM while using fewer data inputs. If the estimated ET_r-lys values are acceptable representations of ET_rs, then the improved accuracy on shorter timescales is especially relevant for remote sensing of ET. In such methods, daily or weekly ET is extrapolated from ET_rs estimates for the short time window in which the images were taken, so improved estimates on these shorter timescales could improve the overall accuracy.
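
One common way in which a short-window estimate is extrapolated to a daily value is the reference ET fraction used in METRIC-type energy balance models: the ratio of image-time ET to image-time reference ET is assumed to stay constant over the day and is multiplied by the daily reference ET. The sketch below illustrates that scaling and how an error in the short-timescale reference ET carries over to the daily result. Treating this as the scaling relevant here is an assumption, and the function name `daily_et_from_snapshot` and all numbers are hypothetical.

```python
def daily_et_from_snapshot(et_inst, etr_inst, etr_daily):
    """Scale an image-time ET (mm/h) to a daily total (mm/day).

    Assumes the reference ET fraction et_inst / etr_inst is constant
    over the day, as in METRIC-type approaches.
    """
    if etr_inst <= 0:
        raise ValueError("image-time reference ET must be positive")
    etr_fraction = et_inst / etr_inst
    return etr_fraction * etr_daily


# Hypothetical values: crop ET of 0.60 mm/h at image time, daily ET_r of 10 mm/day.
reference = daily_et_from_snapshot(0.60, 0.80, 10.0)
# Same scene, but with the image-time reference ET underestimated by 7%.
biased = daily_et_from_snapshot(0.60, 0.80 * 0.93, 10.0)
print(reference, biased)
```

In this example a 7% underestimate of the image-time reference ET inflates the daily ET by roughly the same proportion, which is why accuracy at hourly and quarter-hourly timescales matters for such methods.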

However, the use of the models from this study could be limited to the study region and to places that have similar climatic conditions. Areas with similar climatic conditions might experience similar patterns of the meteorological parameters, producing similar variability as our results. Still, it is necessary to create similar or novel ML models using different parameter combinations to investigate their performance in other regions. Moreover, based on the target application, the timescale of the required data might be different from region to region. The models developed here showed the potential of capturing variability even at small timescales (quarter-hourly), which is a common temporal resolution for weather data collection in many regions. The necessity to test different models at different timescales is paramount for a given region. In fact, different regions have been found to suit different empirical models for ET_r computation, primarily based on local climatic conditions.

In the case that the models of the current study are transferred to other regions, proper training of the models is required as well as paying extra attention to the underfitting and overfitting phenomena commonly experienced during ML model training. Future developments of this study will concern coupling the ML models, surface energy balance models, and satellite imagery to improve the estimation of ET_r. Increasing the spatial scale of ET_r estimation would be necessary to enhance water management and planning.
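
Underfitting and overfitting can be screened for by comparing training and validation errors under k-fold cross-validation. The sketch below shows the mechanics on synthetic data. A one-dimensional k-nearest-neighbour regressor is used purely as a stand-in because its complexity is controlled by a single number; it is not one of the models in this study, and the function names, fold count, and data are hypothetical.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def rmse(truth, pred):
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(truth, pred)) / len(truth))


def kfold_rmse(xs, ys, k_neighbors, n_folds=5):
    """Return (mean training RMSE, mean validation RMSE) over n_folds folds."""
    train_scores, val_scores = [], []
    for fold in range(n_folds):
        val_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        train_pred = [knn_predict(tx, ty, x, k_neighbors) for x in tx]
        val_pred = [knn_predict(tx, ty, xs[i], k_neighbors) for i in val_idx]
        train_scores.append(rmse(ty, train_pred))
        val_scores.append(rmse([ys[i] for i in val_idx], val_pred))
    return sum(train_scores) / n_folds, sum(val_scores) / n_folds


# Synthetic noisy signal; 100 points give 80 training points per fold.
rng = random.Random(0)
xs = [rng.random() for _ in range(100)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.2) for x in xs]

# k = 1 memorizes the training data, k = 80 predicts the training mean.
results = {k: kfold_rmse(xs, ys, k) for k in (1, 10, 80)}
for k, (train_err, val_err) in results.items():
    print(k, round(train_err, 3), round(val_err, 3))
```

A training error near zero with a much larger validation error points to overfitting, while training and validation errors that are both large point to underfitting; the same comparison applies to any of the ML models when they are retrained for a new region.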

4. Conclusions

Machine learning models (GA-ELM, GA-SVR, GA-RF, and GA-ANN) were investigated in modeling lysimeter-measured reference evapotranspiration (ET_r-lys) at daily, hourly, and quarter-hourly timescales using various input combinations of radiation and aerodynamic components for a highly advective environment. The results were compared with those of the standardized Penman–Monteith Equation (ASCE-PM). Based on the comparison results, it was observed that machine learning models yielded more accurate ET_r-lys estimates compared to ASCE-PM across all timescales. ASCE-PM consistently underestimated ET_r-lys at all timescales. Radiation components, as well as a combination of radiation and aerodynamic model inputs, demonstrated superior performance at the hourly and quarter-hourly timescales. Conversely, datasets primarily characterized by aerodynamic components performed better at the daily timescale. The results indicate that machine learning models can effectively replace ASCE-PM for ET_r-lys estimation in highly advective environments. Moreover, different climatic variables exert varying influence on model performance at a given timescale based on local weather conditions. This approach can be integrated to enhance the accuracy of ET_r-lys estimation at hourly or quarter-hourly timescales, which is crucial for precise geospatial ET estimation using land surface energy balance models. Additionally, the same approach could be applied in similar advective environments to evaluate the impact of meteorological parameters on ET_r estimation. This is imperative for ensuring precise ET_r estimation for agricultural water management.

Author Contributions

Conceptualization, S.K., R.T.P. and B.M.; methodology, S.K., R.T.P. and S.R.E.; software, S.K.; formal analysis, S.K., R.T.P., B.M. and G.M.; validation, R.T.P., S.R.E. and G.M.; writing: original draft preparation, S.K.; writing: review and editing, S.K., R.T.P., S.R.E. and B.M.; data curation, S.K., R.T.P., S.R.E. and B.M.; visualization, S.K., R.T.P. and S.R.E.; investigation, R.T.P.; supervision, R.T.P.; funding acquisition, R.T.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by USDA-ARS Project 3090-13000-15-00-D and by the Ogallala Aquifer Program, a Consortium between USDA-Agricultural Research Service, Kansas State University; Texas AgriLife Extension Service & Research, Texas Tech University; and West Texas A&M University, Department of Agriculture. This material is also based upon work supported by the AI Research Institutes program supported by NSF and USDA-NIFA under the AI Institute: Agricultural AI for Transforming Workforce and Decision Support (AgAID) award No. 2021-67021-35344.

Data Availability Statement

Data are available on the USDA ARS NAL Ag Data Commons at URLs given in Refs. [37,38].

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wu, L.; Zhou, H.; Ma, X.; Fan, J.; Zhang, F. Daily reference evapotranspiration prediction based on hybridized extreme learning machine model with bio-inspired optimization algorithms: Application in contrasting climates of China. J. Hydrol. 2019, 577, 123960.
2. Jia, Y.; Su, Y.; Zhang, R.; Zhang, Z.; Lu, Y.; Shi, D.; Xu, C.; Huang, D. Optimization of an extreme learning machine model with the sparrow search algorithm to estimate spring maize evapotranspiration with film mulching in the semiarid regions of China. Comput. Electron. Agric. 2022, 201, 107298.
3. Shiri, J.; Sadraddini, A.A.; Nazemi, A.H.; Kisi, O.; Marti, P.; Fard, A.F.; Landeras, G. Evaluation of different data management scenarios for estimating daily reference evapotranspiration. Hydrol. Res. 2013, 44, 1058–1070.
4. Abdullah, S.S.; Malek, M.A.; Abdullah, N.S.; Kisi, O.; Yap, K.S. Extreme Learning Machines: A new approach for prediction of reference evapotranspiration. J. Hydrol. 2015, 527, 184–195.
5. Wu, L.; Huang, G.; Fan, J.; Ma, X.; Zhou, H.; Zeng, W. Hybrid extreme learning machine with meta-heuristic algorithms for monthly pan evaporation prediction. Comput. Electron. Agric. 2020, 168, 105115.
6. Moazenzadeh, R.; Mohammadi, B.; Shamshirband, S.; Chau, K. Coupling a firefly algorithm with support vector regression to predict evaporation in northern Iran. Eng. Appl. Comput. Fluid Mech. 2018, 12, 584–597.
7. Li, X.R.; Jia, R.L.; Zhang, Z.S.; Zhang, P.; Hui, R. Hydrological response of biological soil crusts to global warming: A ten-year simulative study. Glob. Change Biol. 2018, 24, 4960–4971.
8. Allen, R.G.; Pereira, L.S.; Howell, T.A.; Jensen, M.E. Evapotranspiration information reporting: I. Factors governing measurement accuracy. Agric. Water Manag. 2011, 98, 899–920.
9. Fan, J.; Yue, W.; Wu, L.; Zhang, F.; Cai, H.; Wang, X.; Lu, X.; Xiang, Y. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric. For. Meteorol. 2018, 263, 225–241.
10. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop evapotranspiration-Guidelines for computing crop water requirements-FAO Irrigation and drainage paper 56. Fao Rome 1998, 300, D05109.
11. Perera, K.C.; Western, A.W.; Nawarathna, B.; George, B. Comparison of hourly and daily reference crop evapotranspiration equations across seasons and climate zones in Australia. Agric. Water Manag. 2015, 148, 84–96.
12. Allen, R.G.; Walter, I.A.; Elliott, R.L.; Howell, T.A.; Itenfisu, D.; Jensen, M.E.; Snyder, R.L. The ASCE Standardized Reference Evapotranspiration Equation; Water Resources Institute: Reston, VA, USA, 2005.
13. Valiantzas, J.D. Simplified forms for the standardized FAO-56 Penman–Monteith reference evapotranspiration using limited weather data. J. Hydrol. 2013, 505, 13–23.
14. Izadifar, Z.; Elshorbagy, A. Prediction of hourly actual evapotranspiration using neural networks, genetic programming, and statistical models. Hydrol. Process. 2010, 24, 3413–3425.
15. Kumar, D.; Adamowski, J.; Suresh, R.; Ozga-Zielinski, B. Estimating Evapotranspiration Using an Extreme Learning Machine Model: Case Study in North Bihar, India. J. Irrig. Drain. Eng. 2016, 142, 04016032.
16. Zhang, Y.; Zhao, Z.; Zheng, J. CatBoost: A new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China. J. Hydrol. 2020, 588, 125087.
17. Sattari, M.T.; Apaydin, H.; Band, S.S.; Mosavi, A.; Prasad, R. Comparative analysis of kernel-based versus ANN and deep learning methods in monthly reference evapotranspiration estimation. Hydrol. Earth Syst. Sci. 2021, 25, 603–618.
18. Estévez, J.; Gavilán, P.; Giráldez, J.V. Guidelines on validation procedures for meteorological data from automatic weather stations. J. Hydrol. 2011, 402, 144–154.
19. Chandel, A.K.; Molaei, B.; Khot, L.R.; Peters, R.T.; Stöckle, C.O. High Resolution Geospatial Evapotranspiration Mapping of Irrigated Field Crops Using Multispectral and Thermal Infrared Imagery with METRIC Energy Balance Model. Drones 2020, 4, 52.
20. Molaei, B.; Peters, R.T.; Khot, L.R.; Stöckle, C.O. Assessing Suitability of Auto-Selection of Hot and Cold Anchor Pixels of the UAS-METRIC Model for Developing Crop Water Use Maps. Remote Sens. 2022, 14, 4454.
21. Xie, H.; Zhu, X. Reference evapotranspiration trends and their sensitivity to climatic change on the Tibetan Plateau (1970–2009): Reference evapotranspiration on the tibetan plateau. Hydrol. Process. 2013, 27, 3685–3693.
22. Irmak, S.; Kabenge, I.; Skaggs, K.E.; Mutiibwa, D. Trend and magnitude of changes in climate variables and reference evapotranspiration over 116-yr period in the Platte River Basin, central Nebraska–USA. J. Hydrol. 2012, 420–421, 228–244.
23. Alexandris, S.; Proutsos, N. How significant is the effect of the surface characteristics on the Reference Evapotranspiration estimates? Agric. Water Manag. 2020, 237, 106181.
24. Saxton, K.E. Sensitivity analyses of the combination evapotranspiration equation. Agric. Meteorol. 1975, 15, 343–353.
25. Granata, F. Evapotranspiration evaluation models based on machine learning algorithms—A comparative study. Agric. Water Manag. 2019, 217, 303–315.
26. Siddique, M.N.H.; Tokhi, M.O. Training neural networks: Backpropagation vs. genetic algorithms. In Proceedings of the IJCNN’01. International Joint Conference on Neural Networks. Proceedings (Cat. No. 01CH37222), Washington, DC, USA, 15–19 July 2001; Volume 4, pp. 2673–2678.
27. Gill, E.J.; Singh, E.B.; Singh, E.S. Training back propagation neural networks with genetic algorithm for weather forecasting. In Proceedings of the IEEE 8th International Symposium on Intelligent Systems and Informatics, Subotica, Serbia, 10–11 September 2010; pp. 465–469.
28. Rozos, E.; Dimitriadis, P.; Mazi, K.; Koussis, A.D. A multilayer perceptron model for stochastic synthesis. Hydrology 2021, 8, 67.
29. Huang, G.B.; Ding, X.; Zhou, H. Optimization method based extreme learning machine for classification. Neurocomputing 2010, 74, 155–163.
30. Chen, Y.; Chang, F.J. Evolutionary artificial neural networks for hydrological systems forecasting. J. Hydrol. 2009, 367, 125–137.
31. Oyebode, O.; Stretch, D. Neural network modeling of hydrological systems: A review of implementation techniques. Nat. Resour. Model. 2019, 32, 12189.
32. Liu, Q.; Wu, Z.; Cui, N.; Zhang, W.; Wang, Y.; Hu, X.; Gong, D.; Zheng, S. Genetic Algorithm-Optimized Extreme Learning Machine Model for Estimating Daily Reference Evapotranspiration in Southwest China. Atmosphere 2022, 13, 971.
33. Evett, S.R.; Schwartz, R.C.; Howell, T.A.; Louis Baumhardt, R.; Copeland, K.S. Can weighing lysimeter ET represent surrounding field ET well enough to test flux station measurements of daily and sub-daily ET? Adv. Water Resour. 2012, 50, 79–90.
34. Evett, S.R.; Howell, T.A.; Schneider, A.D.; Copeland, K.S.; Dusek, D.A.; Brauer, D.K.; Tolk, J.A.; Marek, G.W.; Marek, T.M.; Gowda, P.H. The bushland weighing lysimeters: A quarter century of crop et investigations to advance sustainable irrigation. Trans. ASABE 2016, 59, 163–179.
35. Evett, S.R.; Marek, G.W.; Copeland, K.S.; Colaizzi, P.D. Quality Management for Research Weather Data: USDA-ARS, Bushland, TX. Agrosystems Geosci. Environ. 2018, 1, 1–18.
36. Evett, S.R.; Howell, T.A.; Todd, R.W.; Schneider, A.D.; Tolk, J.A. Alfalfa reference ET measurement and prediction. In Proceedings of the National Irrigation Symposium: Proceedings of the 4th Decennial Symposium, Phoenix, Arizona, 14–16 November 2000; pp. 266–272.
37. Evett, S.R.; Copeland, K.S.; Ruthardt, B.B.; Marek, G.W.; Colaizzi, P.D.; Howell, T.A.; Brauer, D.K. The Bushland, Texas, Alfalfa Datasets. USDA ARS NAL Ag Data Commons. Available online: https://data.nal.usda.gov/dataset/bushland-texas-alfalfa-datasets (accessed on 6 December 2023).
38. Evett, S.R.; Copeland, K.S.; Ruthardt, B.B.; Marek, G.W.; Colaizzi, P.D.; Howell, T.A.; Brauer, D.K. Standard Quality Controlled Research Weather Data-USDA-ARS, Bushland, Texas. USDA ARS NAL Ag Data Commons. Available online: https://data.nal.usda.gov/dataset/standard-quality-controlled-research-weather-data-%E2%80%93-usda-ars-bushland-texas (accessed on 6 December 2023).
39. Allen, R.G. REF-ET: Reference Evapotranspiration Calculation Software for FAO and ASCE Standardized Equations; University of Idaho: Moscow, ID, USA, 2009.
40. DeJonge, K.C.; Thorp, K.R. Implementing Standardized Reference Evapotranspiration and Dual Crop Coefficient Approach in the DSSAT Cropping System Model. Trans. ASABE 2017, 60, 1965–1981.
41. Park, J.; Choi, M. Estimation of evapotranspiration from ground-based meteorological data and global land data assimilation system (GLDAS). Stoch. Environ. Res. Risk Assess. 2015, 29, 1963–1992.
42. Feng, Y.; Cui, N.; Gong, D.; Zhang, Q.; Zhao, L. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric. Water Manag. 2017, 193, 163–173.
43. Yao, Y.; Liang, S.; Li, X.; Chen, J.; Liu, S.; Jia, K.; Zhang, X.; Xiao, Z.; Fisher, J.B.; Mu, Q.; et al. Improving global terrestrial evapotranspiration estimation using support vector machine by integrating three process-based algorithms. Agric. For. Meteorol. 2017, 242, 55–74.
44. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
45. Feng, Y.; Cui, N.; Hao, W.; Gao, L.; Gong, D. Estimation of soil temperature from meteorological data using different machine learning models. Geoderma 2019, 338, 67–77.
46. Raghavendra, N.S.; Deka, P.C. Support vector machine applications in the field of hydrology: A review. Appl. Soft Comput. 2014, 19, 372–386.
47. Schonlau, M.; Zou, R.Y. The random forest algorithm for statistical learning. Stata J. Promot. Commun. Stat. Stata 2020, 20, 3–29.
48. Eslamian, S.S.; Gohari, S.A.; Zareian, M.J.; Firoozfar, A. Estimating Penman–Monteith reference evapotranspiration using artificial neural networks and genetic algorithm: A case study. Arab. J. Sci. Eng. 2012, 37, 935–944.
49. Abdullah, S.S.; Malek, M.A.; Mustapha, A.; Aryanfar, A. Hybrid of artificial neural network-genetic algorithm for prediction of reference evapotranspiration in arid and semiarid regions. J. Agric. Sci. 2014, 6, 191.
50. Ikram, R.M.A.; Mostafa, R.R.; Chen, Z.; Islam, A.R.M.T.; Kisi, O.; Kuriqi, A.; Zounemat-Kermani, M. Advanced hybrid metaheuristic machine learning models application for reference crop evapotranspiration prediction. Agronomy 2022, 13, 98.
51. Katoch, S.; Chauhan, S.S.; Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 2021, 80, 8091–8126.
52. Lobit, P.; López Pérez, L.; Lhomme, J.P. Retrieving air humidity, global solar radiation, and reference evapotranspiration from daily temperatures: Development and validation of new methods for Mexico. Part II: Radiation. Theor. Appl. Climatol. 2018, 133, 799–810.
53. Dombrowski, O.; Hendricks Franssen, H.J.; Brogi, C.; Bogena, H.R. Performance of the ATMOS41 All-in-One Weather Station for Weather Monitoring. Sensors 2021, 21, 741.
54. Fu, T.; Li, X.; Jia, R.; Feng, L. A novel integrated method based on a machine learning model for estimating evapotranspiration in dryland. J. Hydrol. 2021, 603, 126881.
55. Tolk, J.A.; Evett, S.R.; Howell, T.A. Advection influences on evapotranspiration of alfalfa in a semiarid climate. Agron. J. 2006, 98, 1646–1654.
56. Berengena, J.; Gavilán, P. Reference Evapotranspiration Estimation in a Highly Advective Semiarid Environment. J. Irrig. Drain. Eng. 2005, 131, 147–163.
57. Feng, Y.; Peng, Y.; Cui, N.; Gong, D.; Zhang, K. Modeling reference evapotranspiration using extreme learning machine and generalized regression neural network only with temperature data. Comput. Electron. Agric. 2017, 136, 71–78.
58. Doreswamy, I.G.; Manjunatha, B.R. Performance evaluation of predictive models for missing data imputation in weather data. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Kochi, India, 22–24 August 2017; pp. 1327–1334.
59. López-Urrea, R.; de Olalla, F.M.S.; Fabeiro, C.; Moratalla, A. An evaluation of two hourly reference evapotranspiration equations for semiarid conditions. Agric. Water Manag. 2006, 86, 277–282.

Figure 1. Alfalfa harvesting patterns and days when alfalfa height and LAI measurements were taken for growing seasons from 1996 to 1998.

Figure 2. Kc values for the days when reference conditions were met for alfalfa growing period (1996–1998).

Figure 3. Structure of the machine learning models: ANN (a), SVR (b), ELM (c), and RF (d).

Figure 4. Flow diagram for tuning the hyperparameters of the machine learning models.

Figure 5. Representation of radiation and aerodynamic components for ET_r estimation.

Figure 6. Relationship between ASCE, machine learning algorithms, and ET_r-lys at daily timescale for days when reference conditions were met. (a) GA-ELM-DT₃, (b) ASCE-PM, (c) GA-ANN-DT₄, (d) GA-ELM-DT₆.

Figure 7. Residual plots for ASCE and machine learning algorithms at a daily timescale. (a) GA-ELM-DT₃, (b) ASCE-PM, (c) GA-ANN-DT₄, (d) GA-ELM-DT₆.

Figure 8. Residual plots for ASCE and machine learning algorithms at hourly timescale. (a) GA-ELM-DT₂, (b) ASCE-PM, (c) GA-ANN-DT₃, (d) GA-ELM-DT₇.

Figure 9. Relationship between ASCE, machine learning algorithms, and ET_r-lys at hourly timescale. (a) GA-ELM-DT₂, (b) ASCE-PM, (c) GA-ELM-DT₃, (d) GA-ELM-DT₇.

Figure 10. Residual plots for ASCE and machine learning algorithms at quarter-hourly timescale. (a) ASCE-PM, (b) GA-ANN-DT₂, (c) GA-ELM-DT₇.

Figure 11. Relationship between ASCE, machine learning algorithms and ET_r-lys at quarter-hourly timescale. (a) ASCE-PM, (b) GA-ELM-DT₂, (c) GA-ELM-DT₇.

Table 1. Parameters of the ASCE-PM at different timescales.

Version	Time Step	C_n	C_d	r_s (s m⁻¹)
ASCE-PM			Tall reference (0.5 m high)
	Daily	1600	0.38	45
	Hourly (daytime)	66	0.25	30
	Hourly (nighttime)	66	1.7	200

Table 2. Parameter combinations at different timescales used for training machine learning models.

Daily
Factor	DT₁	DT₂	DT₃	DT₄	DT₅	DT₆	DT₇	DT₈	DT₉	DT₁₀	DT₁₁
VPD	×	×	×		×					×
∆	×	×			×
R_n	×	×	×	×
u₂	×	×	×	×	×	×	×	×		×
R_s/R_so		×	×		×	×	×		×		×
T_mean			×	×			×	×
RH_mean			×	×	×		×	×
RH_max						×
R_s						×	×	×		×	×
RH_min						×
T_max						×			×	×	×
T_min						×			×	×	×
Quarter-hourly and Hourly
Factor	DT₁	DT₂	DT₃	DT₄	DT₅	DT₆	DT₇
VPD	×	×	×		×
∆	×	×			×
R_n	×	×	×	×
u₂	×	×	×	×	×	×	×
R_s/R_so		×	×		×	×
R_s						×	×
T_mean			×	×		×	×
RH_mean			×	×	×	×	×

Table 3. Statistical indicators for machine learning ET_r estimations at a daily timescale (fit linear equations representing the slope and intercept for each respective model are presented in italics).

Daily
	DT₁			DT₂			DT₃			DT₄			DT₅			DT₆
	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE
ANN	0.83	1.12	0.79	0.76	1.41	0.84	0.83	1.17	0.86	0.91	0.84	0.58	0.82	1.19	0.90	0.87	1.01	0.74
ANN	0.93x + 0.53			0.94x + 0.52			0.97x + 0.31			0.90x + 0.70			0.83x + 1.10			0.95x + 0.26
ELM	0.85	1.02	1.03	0.68	0.99	0.68	0.89	0.91	0.61	0.90	0.85	0.56	0.80	1.20	0.90	0.88	0.93	0.61
ELM	0.87x + 1.07			0.90x + 0.79			0.97x − 0.03			0.93x + 0.41			0.93x + 0.31			0.95x + 0.26
SVR	0.79	1.24	0.81	0.77	1.28	0.79	0.87	0.95	0.69	0.88	1.92	0.63	0.74	1.39	1.11	0.84	1.07	0.80
SVR	0.92x + 0.50			0.89x + 0.84			0.93x + 0.35			0.89x + 0.82			0.78x + 1.57			0.88x + 0.76
RF	0.83	1.06	0.92	0.86	0.99	0.98	0.85	1.12	1.04	0.89	0.83	0.73	0.81	1.21	1.14	0.85	0.90	0.83
RF	0.82x + 1.22			0.97x + 0.24			1.01x + 0.34			0.98x + 0.37			0.74x + 2.15			0.97x + 0.45
	DT₇			DT₈			DT₉			DT₁₀			DT₁₁			ASCE-PM
ANN	0.83	1.21	0.86	0.89	0.93	0.70	0.62	1.67	1.30	0.77	1.31	0.85	0.83	1.27	0.99	0.94	0.75	0.57
ANN	0.99x + 0.17			0.94x + 0.34			0.63x + 2.76			0.86x + 0.98			0.83x + 1.43			0.91x + 0.30
ELM	0.83	1.09	0.79	0.88	0.92	0.69	0.63	1.63	1.31	0.81	1.17	0.73	0.81	1.17	0.91
ELM	0.96x + 0.31			0.85x + 1.09			0.68x + 2.46			0.92x + 0.56			0.83x + 1.50
SVR	0.81	1.18	0.82	0.87	0.98	0.68	0.43	2.03	1.51	0.73	1.39	0.90	0.66	1.57	1.08
SVR	0.92x + 0.71			0.88x + 0.81			0.435x + 4.42			0.85x + 1.15			0.65x + 3.02
RF	0.91	0.95	0.86	0.89	0.93	0.89	0.73	1.37	1.40	0.84	1.07	1.12	0.83	1.17	1.06
RF	0.94x + 0.38			0.97x + 0.46			0.92x + 0.79			0.83x + 0.84			0.82x + 1.22

Table 4. Statistical indicators for machine learning ET_r-lys estimations at the hourly and quarter-hourly timescales (fit linear equations representing the slope and intercept for each respective model are presented in italics).

Hourly
	DT₁			DT₂			DT₃			DT₄			DT₅			DT₆			DT₇
	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE
ANN	0.94	0.09	0.06	0.94	0.09	0.06	0.96	0.08	0.05	0.96	0.08	0.05	0.89	0.12	0.08	0.94	0.09	0.06	0.96	0.07	0.04
ANN	0.90x − 0.010			1.03x + 0.020			0.91x + 0.030			0.99x − 0.010			0.82x + 0.030			0.99x + 0.050			1.02x + 0.010
ELM	0.97	0.06	0.04	0.97	0.06	0.04	0.97	0.06	0.04	0.97	0.06	0.04	0.93	0.10	0.06	0.97	0.06	0.04	0.97	0.06	0.04
ELM	0.98x + 0.011			0.98x + 0.010			0.98x + 0.009			0.98x + 0.011			0.98x + 0.010			0.95x + 0.255			0.98x + 0.011
SVR	0.97	0.07	0.04	0.97	0.06	0.04	0.97	0.06	0.04	0.97	0.06	0.04	0.93	0.10	0.06	0.97	0.06	0.04	0.97	0.07	0.04
SVR	0.98x + 0.012			0.98x + 0.009			0.98x + 0.009			0.984x + 0.012			0.94x + 0.023			0.98x + 0.011			0.98x + 0.011
RF	0.97	0.07	0.04	0.97	0.06	0.04	0.97	0.06	0.04	0.97	0.07	0.04	0.93	0.10	0.06	0.97	0.07	0.04	0.97	0.07	0.04
RF	0.97x + 0.012			0.97x + 0.011			0.97x + 0.012			0.97x + 0.012			0.91x + 0.030			0.97x + 0.013			0.97x + 0.013
Quarter-Hourly
ANN	0.91	0.03	0.02	0.94	0.02	0.02	0.94	0.02	0.01	0.94	0.02	0.02	0.87	0.03	0.02	0.94	0.02	0.02	0.92	0.03	0.02
ANN	1.030x + 0.0066			1.004x + 0.0004			1.015x − 0.0004			0.944x + 0.0084			0.892x + 0.0115			0.881x + 0.0114			0.806x + 0.0089
ELM	0.95	0.02	0.01	0.96	0.02	0.01	0.96	0.02	0.01	0.95	0.02	0.01	0.91	0.03	0.02	0.96	0.02	0.01	0.95	0.02	0.01
ELM	0.956x + 0.0038			0.959x + 0.0034			0.959x + 0.0035			0.956x + 0.0038			0.906x + 0.008			0.960x + 0.0035			0.955x + 0.004
SVR	0.95	0.02	0.01	0.96	0.02	0.01	0.96	0.02	0.01	0.95	0.02	0.01	0.91	0.03	0.02	0.96	0.02	0.01	0.95	0.02	0.01
SVR	0.960x + 0.0034			0.969x + 0.0031			0.969x + 0.0032			0.959x + 0.0035			0.913x + 0.0067			0.969x + 0.0032			0.957x + 0.004
RF	0.96	0.02	0.01	0.96	0.02	0.01	0.96	0.02	0.01	0.96	0.02	0.01	0.92	0.03	0.02	0.96	0.02	0.01	0.95	0.02	0.01
RF	0.957x + 0.0036			0.956x + 0.0037			0.953x + 0.0039			0.958x + 0.0036			0.915x + 0.0072			0.957x + 0.0037			0.955x + 0.004
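
As a minimal sketch of how indicators of the kind tabulated above (R², RMSE, MAE, and the slope and intercept of a fitted line) could be computed, the Python snippet below works on paired measured and estimated values. It is illustrative only and is not the authors' code. The function name `fit_indicators` and the sample values are hypothetical. It assumes that the fitted line regresses the model estimate (y) on the lysimeter measurement (x) and that R² is the squared Pearson correlation between the two series; neither choice is stated in the tables themselves.

```python
import math


def fit_indicators(measured, estimated):
    """Return R², RMSE, MAE, slope, and intercept for paired ET values.

    Assumptions (not taken from the article): x = measured, y = estimated,
    and R² is the squared Pearson correlation between x and y.
    """
    x = [float(v) for v in measured]
    y = [float(v) for v in estimated]
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally sized series with at least 2 points")

    # Error-based indicators: estimated minus measured
    errors = [yi - xi for xi, yi in zip(x, y)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Ordinary least-squares line y = slope * x + intercept
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Squared Pearson correlation (undefined if either series is constant)
    r2 = (sxy * sxy) / (sxx * syy)

    return {"r2": r2, "rmse": rmse, "mae": mae,
            "slope": slope, "intercept": intercept}


# Made-up hourly values in mm/h, for illustration only
measured = [0.10, 0.25, 0.40, 0.55, 0.70]
estimated = [0.12, 0.24, 0.41, 0.52, 0.71]
stats = fit_indicators(measured, estimated)
print({name: round(value, 3) for name, value in stats.items()})
```

A slope near 1 with an intercept near 0 indicates estimates that track the measurements closely, whereas a slope well below 1 paired with a positive intercept suggests overestimation at low values and underestimation at high values.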

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kiraga, S.; Peters, R.T.; Molaei, B.; Evett, S.R.; Marek, G. Reference Evapotranspiration Estimation Using Genetic Algorithm-Optimized Machine Learning Models and Standardized Penman–Monteith Equation in a Highly Advective Environment. Water 2024, 16, 12. https://doi.org/10.3390/w16010012

