**1. Introduction**

According to Brutsaert [1], the interception process is determined by the rainfall fraction that moistens vegetation and that is temporarily stored on it, before evaporating. When the vegetation cover is fully saturated, the interception storage capacity is achieved. In practice, the interception storage capacity is denoted as rainfall left on the canopy at the end of the rainfall after all drip has ceased [2]. The water stored on the canopy may evaporate soon after, thus short-circuiting the hydrologic cycle.

Although most surfaces can store only a few millimeters of rainfall, which is often not much in comparison to other stocks in the water balance, interception is generally a significant process and its impact becomes evident at longer time scales [3]. Thus, interception storage is generally small, but the number of times that the storage is filled and depleted can be so large that the evaporation losses by wet canopy may become of the same order of magnitude as the transpiration flux [4].

Evaporation flux by canopy exerts a negative effect on plant water consumption by preventing water from reaching the soil surface, thus the plant roots [2,3]. In contrast, the remaining rainfall (i.e., the net rainfall) reaches the soil surface either as throughfall or by flowing down branches and stems as stemflow. Throughfall is the fraction of water that reaches the soil surface directly through the canopy gaps without hitting the canopy surfaces, or indirectly through dripping from the leaves and branches [5].

The interception may also exert important effects on surface runoff [6], providing a certain delay compared with the time of the beginning of the rain. For the *Dunnian* mechanism of runoff generation, Baiamonte [7] showed the effect induced by the interception process on the delay time, and emphasized that the effect is more frequent for well-drained soils [8,9] in humid regions, for low rainfall intensities and high groundwater table, when infiltration capacity exceeds the rainfall intensity. Indeed, the latter conditions also occurs

**Citation:** Baiamonte, G. Simplified Interception/Evaporation Model. *Hydrology* **2021**, *8*, 99. https:// doi.org/10.3390/hydrology8030099

Academic Editors: Aristoteles Tegos and Nikolaos Malamos

Received: 6 June 2021 Accepted: 30 June 2021 Published: 2 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

for high density of cover crops or forest soils that are reach in organic carbon and are very structured [10].

The proportion of the precipitation that does not reach the ground, i.e., the interception loss, depends on the type of vegetation (forest, tree, or grassland), its age, density of planting and the season of the year. The interception loss also depends on rainfall regime, thus on climate. For example, in tall dense forest vegetation at temperate latitudes interception loss as large as 30–40% of the gross precipitation has been observed [11], whereas in tropical forest with high rainfall intensity was of the order 10–15% [12] and in heather and shrub also 10–15% [13]. In arid and semi-arid areas, where there is little vegetation, the interception loss is negligible.

Since interception is an important component of the water balance, a comprehensive evaluation of interception loss by prediction tools has been considered of great value in the study of hydrological processes, and different formulations, at hourly and event scale, have been introduced in the hydrologic literature. Muzylo et al. [14] wrote an interesting review paper, where the principal models proposed in the literature are described, and their characteristics reported (input temporal scale, output variable, number of parameters, layers, spatial scale).

Linsley et al. [15] modified the very simple interception model first introduced by Horton [16], which did not account for the amount of gross rainfall, since it assumed that the rainfall in each storm completely filled the interception storage. Linsley et al. [15] assumed that the interception loss approached exponentially to the interception capacity as the amount of rainfall increased [17]. Then, this simple sketch was applied and tested by Merrian [18], who studied the effect of fog intensity and leaf shape on water storage on leaves, by using a simple fog wind tunnel and leaves of aluminum and plastic. Merrian [18] found that drip measurements were reasonably close to values predicted, by using an exponential equation based on fog flow and leaf storage capacity.

Rutter et al. [19,20] were the first to model forest rainfall interception recognizing that the process was primarily driven by evaporation from the wetted canopy. The conceptual model developed by Rutter et al. [19,20] describes the interception loss in terms of both the structure of the forest and the climate in which it is growing. The model is physically based, thus it has potential for application in all areas, where there are suitable data. In Rutter et al. [20], the model's definitive version was developed by adding a stemflow module, in which a fraction of the rainfall input is diverted to a compartment comprising the trunks. Early applications of Rutter-type models were made by Calder [21] and Gash and Morton [22].

The rate of evaporation increases with solar radiation and temperature. The process also depends on the air humidity and the wind speed. The greater the humidity, the less the evaporation. Wind carries moist air away from the ground surface, so wind decreases the local humidity and allows more water to evaporate. Therefore, in the Rutter model, the evaporation flux was calculated from the form of the Monteith–Penman equation [23]. Later, Gash [24] provided a simplification of the data-demanding Rutter model. Although some of the assumptions of the Gash model may not be suitable, the model has been shown to work well under a variety of forest types, including different species and sites [11,25–27].

For agricultural crops and for grassland, where interception loss is of the order 13–19% [28], Von Hoyningen-Hüne [29] and Braden [30], proposed a general formula, which was used in the SWAP model [31] that however can be applied only at daily scale.

By the experimental point of view, the interception evaporation process requires monitoring intercepted mass and interception loss with high accuracy and time resolution, to provide accurate estimates. Net precipitation techniques, in which interception evaporation is determined from the difference between gross precipitation and throughfall, fulfill many of the requirements but usually have too-low accuracy and time resolution for process studies. Furthermore, for grassland, these techniques are unsuitable.

In this paper, we explored the rainfall partitioning in net rainfall and evaporation losses by canopy, by using a very simplified sketch of the interception process, which

combines a modified exponential equation applied and tested by Merrian [18], accounting for the antecedent volume stored on the canopy, and a simple power-law equation to compute evaporation by wet canopy. We are aware that the considered approach is far from the sophisticated physically based developments that were performed to quantify interception and evaporations losses. However, the latter may require many parameters that are not easy to determine. It is shown that the simplified parsimonious approach may lead to a reasonable quantification of this important component of the hydrologic cycle, which can be useful when a rough estimate is required, in absence of a detailed characterization of the canopy and of the climate conditions. It is also shown that the Merrian model can be derived by considering a simple linear storage model. For faba bean cover crop, an application of the suggested procedure is performed and discussed. In this paper, we explored the rainfall partitioning in net rainfall and evaporation losses by canopy, by using a very simplified sketch of the interception process, which combines a modified exponential equation applied and tested by Merrian [18], accounting for the antecedent volume stored on the canopy, and a simple power-law equation to compute evaporation by wet canopy. We are aware that the considered approach is far from the sophisticated physically based developments that were performed to quantify interception and evaporations losses. However, the latter may require many parameters that are not easy to determine. It is shown that the simplified parsimonious approach may lead to a reasonable quantification of this important component of the hydrologic cycle, which can be useful when a rough estimate is required, in absence of a detailed characterization of the canopy and of the climate conditions. It is also shown that the Merrian model can be derived by considering a simple linear storage model. For faba bean cover crop, an

#### **2. Rainfall Data Set, Wet and Dry Spells** application of the suggested procedure is performed and discussed.

Rainfall time series were analyzed according to Agnese et al. [32], who applied the discrete three-parameter Lerch distribution to model the frequency distribution of interarrival times, *IT*, derived from daily precipitation time-series, for the Sicily region, in Italy. Agnese et al. [32] showed a good fitting of the Lerch distribution, thus evidencing the wide applicability of this kind of distribution [33], also allowing us to jointly model dry spells, *DS*, and wet spells, WS. **2. Rainfall Data Set, Wet and Dry Spells**  Rainfall time series were analyzed according to Agnese et al. [32], who applied the discrete three-parameter Lerch distribution to model the frequency distribution of interarrival times, *IT*, derived from daily precipitation time-series, for the Sicily region, in Italy. Agnese et al. [32] showed a good fitting of the Lerch distribution, thus evidencing the wide applicability of this kind of distribution [33], also allowing us to jointly model dry

Since this work aimed at modelling by continuous rainfall data series the interception losses, during the *WS* and the evaporation losses during *DS*, only the frequency distribution of the two latter were considered, which are defined in the following, according to Agnese et al. [32]. spells, *DS*, and wet spells, WS. Since this work aimed at modelling by continuous rainfall data series the interception losses, during the *WS* and the evaporation losses during *DS*, only the frequency distribution of the two latter were considered, which are defined in the following, according to

For any rainfall data series, the ten minutes temporal scale, τ, which is appropriate to model both interception and evaporation processes, could be considered in order to describe clustering and intermittency characters of continuous rainfall data series. Agnese et al. [32]. For any rainfall data series, the ten minutes temporal scale, τ, which is appropriate to model both interception and evaporation processes, could be considered in order to de-

Let *H* = {*h*1, *h*2, . . . , *hn*}, a time-series of rainfall data of size *n*, spaced at uniform time-scale τ. The sub-series of *H* can be defined as the event series, *E* {*t*1, *t*2, . . . , *tnE*}, where *n<sup>E</sup>* (0 < *n<sup>E</sup>* < *n*) is the size of *E*, which is an integer multiple of time-scale τ. The succession constituted by the times elapsed between each element of the *E* series, with exception of the first one and the immediately preceding one, is defined as the inter-arrival time-series, *IT* {*T*1, *T*2, . . . , *TnE*}, with size *nE*−<sup>1</sup> (Figure 1). It can be observed the sequence of dry spells, *DS*, can be derived from *IT* dataset by using the relationship {*DS<sup>k</sup>* } = {*T<sup>k</sup>* } − 1 for any *T<sup>k</sup>* > 1. scribe clustering and intermittency characters of continuous rainfall data series. Let *H* = {*h*1, *h*2, …, *hn*}, a time-series of rainfall data of size *n*, spaced at uniform timescale τ. The sub-series of *H* can be defined as the event series, *E* {*t*1, *t*2, …, *tnE*}, where *nE* (0 < *nE* < *n*) is the size of *E*, which is an integer multiple of time-scale τ. The succession constituted by the times elapsed between each element of the *E* series, with exception of the first one and the immediately preceding one, is defined as the inter-arrival time-series, *IT* {*T*1, *T*2, …, *TnE*}, with size *nE−*<sup>1</sup> (Figure 1). It can be observed the sequence of dry spells, *DS*, can be derived from *IT* dataset by using the relationship {*DSk*} = {*Tk*} − 1 for any *Tk* > 1.

**Figure 1.** Sketch of inter-arrival times (*IT*), dry spells (*DS*) and wet spells (*WS*). Reprinted with permission from ref. [32], Copyright 2014, Elsevier (Amsterdam, The Netherlands). **Figure 1.** Sketch of inter-arrival times (*IT*), dry spells (*DS*) and wet spells (*WS*). Reprinted with permission from ref. [32], Copyright 2014, Elsevier (Amsterdam, The Netherlands).

Figure 1 shows an example of a sequence of wet and dry spells. In the context of this work, which aims at modelling the interception process during *WS* and the evaporation process from the canopy, as previously observed, only the frequency distribution of wet spells (*WS*) and dry spells (*DS*) were derived. Figure 1 shows an example of a sequence of wet and dry spells. In the context of this work, which aims at modelling the interception process during *WS* and the evaporation process from the canopy, as previously observed, only the frequency distribution of wet spells (*WS*) and dry spells (*DS*) were derived.

For the 2009 rainfall data series of Fontanasalsa station (Trapani, 37°56′37″ N, 12°33′12″ E, western Sicily, Italy), which will be considered for an example application, Figure 2 shows the complete characteristics of the rainfall regime. In particular, Figure 2a For the 2009 rainfall data series of Fontanasalsa station (Trapani, 37◦5603700 N, 12◦3301200 E, western Sicily, Italy), which will be considered for an example application, Figure 2 shows the complete characteristics of the rainfall regime. In particular, Figure 2a describes the *DS* distribution of frequency, *F*, whereas Figure 2b plots the *WS* distribution associated with the cumulated rainfall depth collected with 10 min time resolution. Both figures also illustrate the frequency of non-exceedance, corresponding the selected time resolution (τ = 1 = 10 min) that equals 0.189 for *DS* and much higher for *WS* (0.596), indicating that a high fraction of rainfall with *WS* = 1 occurs, which could play an important role in the interception process. Figure 2b also plots the rainfall depth distribution corresponding to *WS*, where the frequency is calculated with respect to the yearly rainfall depth, *hyear*, which equals 885.2 mm. Therefore, in Figure 2b, to *WS* = 1 (*F*<sup>1</sup> = 0.115) corresponds a rainfall depth equals to 101.6 mm that, if associated with subsequent large enough *DS*, may potentially evaporate from the canopy. resolution (τ = 1 = 10 min) that equals 0.189 for *DS* and much higher for *WS* (0.596), indicating that a high fraction of rainfall with *WS* = 1 occurs, which could play an important role in the interception process. Figure 2b also plots the rainfall depth distribution corresponding to *WS*, where the frequency is calculated with respect to the yearly rainfall depth, *hyear*, which equals 885.2 mm. Therefore, in Figure 2b, to *WS* = 1 (*F*1 = 0.115) corresponds a rainfall depth equals to 101.6 mm that, if associated with subsequent large enough *DS*, may potentially evaporate from the canopy.

describes the *DS* distribution of frequency, *F*, whereas Figure 2b plots the *WS* distribution associated with the cumulated rainfall depth collected with 10 min time resolution. Both figures also illustrate the frequency of non-exceedance, corresponding the selected time

*Hydrology* **2021**, *8*, x FOR PEER REVIEW 4 of 16

**Figure 2.** For the investigated year (2009), distributions of frequency, *F*, (**a**) of dry spells (*DS*), and (**b**) of wet spells (*WS*) and of the rainfall depth (*h*τ) corresponding to *WS*. **Figure 2.** For the investigated year (2009), distributions of frequency, *F*, (**a**) of dry spells (*DS*), and (**b**) of wet spells (*WS*) and of the rainfall depth (*hτ*) corresponding to *WS*.

#### **3. Interception Rate and Stored Volumes during Wet Spells 3. Interception Rate and Stored Volumes during Wet Spells**

The interception Merrian [18]'s model states that: The interception Merrian [18]'s model states that:

$$I\text{CS} = S\left[1 - \exp\left(-\frac{R}{S}\right)\right] \tag{1}$$

storage volume corresponding to the cumulated rainfall volume *R* (mm). Equation (1) can be applied for dry initial condition, i.e., when the no water volume is stored on the canopy at the beginning of the rainfall. However, it is demonstrated here that the Merrian model can be derived by considering a simple linear stored model, also used in other hydrological contexts [34], which where *S* (mm) is the interception capacity of the canopy, and *ICS* (mm) is the interception storage volume corresponding to the cumulated rainfall volume *R* (mm). Equation (1) can be applied for dry initial condition, i.e., when the no water volume is stored on the canopy at the beginning of the rainfall.

made it possible to also account for the antecedent interception volume. The latter is useful for the applications, when in between two consecutive *WSs*, *DS* is not long enough to result in full evaporation of the rainfall volume intercepted by the canopy starting from the end of a *WS*. Thus, a residual water volume, *ICS*0, is still stored on the canopy at the beginning of the subsequent *WS*, and it needs to be considered as initial condition. Therefore, for the purpose of this study, which aims at estimating interception losses during *WS* and *DS* sequences of different amount and duration, Equation (1) needs to be extended to different antecedent water interception storage volume, *ICS*0 (mm), before a new *WS* takes place, after a *DS* when evaporation process ceases. Consider a simple linear reservoir, miming the interception storage volume (Figure 3). A stationary rainfall of intensity *r* (mm h−1) uniformly distributed over the canopy, is However, it is demonstrated here that the Merrian model can be derived by considering a simple linear stored model, also used in other hydrological contexts [34], which made it possible to also account for the antecedent interception volume. The latter is useful for the applications, when in between two consecutive *WSs*, *DS* is not long enough to result in full evaporation of the rainfall volume intercepted by the canopy starting from the end of a *WS*. Thus, a residual water volume, *ICS*0, is still stored on the canopy at the beginning of the subsequent *WS*, and it needs to be considered as initial condition. Therefore, for the purpose of this study, which aims at estimating interception losses during *WS* and *DS* sequences of different amount and duration, Equation (1) needs to be extended to different antecedent water interception storage volume, *ICS*<sup>0</sup> (mm), before a new *WS* takes place, after a *DS* when evaporation process ceases.

applied to the reservoir. In the time *dt*, the interception storage, *ICS* (mm), stored in the canopy equals *dICS*. The interception capacity, i.e., the maximum rainfall volume that can be stored on the canopy, is denoted as *S* (mm). The duration of rainfall that needs to Consider a simple linear reservoir, miming the interception storage volume (Figure 3). A stationary rainfall of intensity *r* (mm h−<sup>1</sup> ) uniformly distributed over the canopy, is applied to the reservoir. In the time *dt*, the interception storage, *ICS* (mm), stored in the canopy equals *dICS*. The interception capacity, i.e., the maximum rainfall volume that can be stored on the canopy, is denoted as *S* (mm). The duration of rainfall that needs to achieve the saturation of the canopy, without dripping out from the reservoir, *t<sup>s</sup>* (h), can be expressed as:

$$t\_s = \frac{\mathcal{S}}{r} \tag{2}$$

be expressed as:

*Hydrology* **2021**, *8*, x FOR PEER REVIEW 5 of 16

**Figure 3.** Sketch of the linear water tank describing the interception process, for a maximum water volume stored on the canopy, *S* (mm), corresponding to the interception capacity. **Figure 3.** Sketch of the linear water tank describing the interception process, for a maximum water volume stored on the canopy, *S* (mm), corresponding to the interception capacity.

achieve the saturation of the canopy, without dripping out from the reservoir, *ts* (h), can

 

From a physical point of view, *ts* is not very meaningful since it describes the duration of rainfall that is required for the canopy saturation, for dry antecedent conditions (*ICS*0 = 0), and with no losses from the reservoir, e.g., no dripping and no net rainfall intensity *p<sup>n</sup>* (mm h−1), which actually occurs (Figure 3). However, *ts* represents a characteristic time of the interception process that it is useful to introduce for the following derivations.

By assuming that the water volume stored in the reservoir linearly varies with the output net rainfall intensity, *pn*, according to the time *ts* (Equation (2)), the balance of the water volumes can be expressed as: - (3) From a physical point of view, *t<sup>s</sup>* is not very meaningful since it describes the duration of rainfall that is required for the canopy saturation, for dry antecedent conditions (*ICS*<sup>0</sup> = 0), and with no losses from the reservoir, e.g., no dripping and no net rainfall intensity *p<sup>n</sup>* (mm h−<sup>1</sup> ), which actually occurs (Figure 3). However, *t<sup>s</sup>* represents a characteristic time of the interception process that it is useful to introduce for the following derivations.

Separation of variables may be used to solve the ordinary differential equation: (4) By assuming that the water volume stored in the reservoir linearly varies with the output net rainfall intensity, *pn*, according to the time *t<sup>s</sup>* (Equation (2)), the balance of the water volumes can be expressed as:

$$r\,dt - p\_n\,dt = d\text{ICS} = t\_s\,dp\_n\tag{3}$$

 (5) Separation of variables may be used to solve the ordinary differential equation:

$$dt = \frac{t\_s}{r - p\_n} dp\_n \tag{4}$$

(2)

Solving Equation (6) yields: By assuming as initial conditions:

Integration of Equation (4) provides:

$$t = t\_0 \qquad p\_n = p\_{n0} \tag{5}$$

Equation (7) can be made explicit to derive the net rainfall intensity, *pn*: Integration of Equation (4) provides:

$$\int\_{t\_0}^{t} dt = \int\_{p\_{n0}}^{p\_n} \frac{t\_s}{r - p\_n} dp\_n \tag{6}$$

Of course, knowledge of Equation (8) makes it possible to determine the interception intensity, *ics* (mm/h): Solving Equation (6) yields:

$$t = t\_0 + t\_s \log \frac{r - p\_{n0}}{r - p\_n} \tag{7}$$

Equation (7) can be made explicit to derive the net rainfall intensity, *pn*:

$$p\_n = r - (r - p\_{n0}) \exp\left(-\frac{t - t\_0}{t\_s}\right) \tag{8}$$

Of course, knowledge of Equation (8) makes it possible to determine the interception intensity, *ics* (mm/h):

$$\text{cics} = r - p\_n = (r - p\_{n0}) \exp\left(-\frac{t - t\_0}{t\_s}\right) \tag{9}$$

Due to the assumption of a linear storage model, the antecedent net rainfall intensity, *pn*0, in Equations (8) and (9), is linked to the antecedent interception volume, *ICS*0, by:

*Hydrology* **2021**, *8*, x FOR PEER REVIEW 6 of 16

$$p\_{n0} = \frac{I\text{CS}\_0}{t\_s} = r\frac{I\text{CS}\_0}{\text{S}}\tag{10}$$

For a compacted graphical illustration, a constant rainfall intensity, *r*, and a zero time antecedent condition *t*<sup>0</sup> = 0 can be assumed. Thus, Equations (8) and (9) normalized with respect to *r* can be expressed as: (10) For a compacted graphical illustration, a constant rainfall intensity, *r*, and a zero time antecedent condition *t*0 = 0 can be assumed. Thus, Equations (8) and (9) normalized with

$$\frac{p\_n}{r} = 1 - \left(1 - \frac{I\text{CS}\_0}{S}\right) \exp\left(-\frac{R}{S}\right) \tag{11}$$

$$\frac{\text{cisc}}{r} = \left(1 - \frac{\text{ICS}\_0}{\text{S}}\right) \exp\left(-\frac{R}{\text{S}}\right) \tag{12}$$

where in Equations (8) and (9) *t<sup>s</sup>* was substituted by Equation (2), and *r* by *R t*, with *R* (mm) the cumulated rainfall depth. Equations (11) and (12) are graphed in Figure 4a,b, respectively. For any fixed *S*, and for a fixed antecedent interception storage volume, *ICS*0, Figure 4a shows that, at increasing *R*, the net rainfall intensity, *pn*, attains the gross rainfall intensity *r* (*pn*/*r* ~ 1), and that, at increasing *ICS*0, the *r* achievement is slower and slower, as it could be expected. Of course, the normalized interception intensity *ics*/*r* plotted in Figure 4b provides complementary curves. where in Equations (8) and (9) *ts* was substituted by Equation (2), and *r* by *R t*, with *R* (mm) the cumulated rainfall depth. Equations (11) and (12) are graphed in Figure 4a and 4b, respectively. For any fixed *S*, and for a fixed antecedent interception storage volume, *ICS*0, Figure 4a shows that, at increasing *R*, the net rainfall intensity, *pn*, attains the gross rainfall intensity *r* (*pn*/*r* ~ 1), and that, at increasing *ICS*0, the *r* achievement is slower and slower, as it could be expected. Of course, the normalized interception intensity *ics*/*r* plotted in Figure 4b provides complementary curves.

**Figure 4.** For different antecedent normalized water volume stored on the canopy, *ICS*0/*S*, (**a**) normalized interception volume, *ICS*/*S*, corresponding the normalized net rainfall intensity, *pn*/*r*, and (**b**) normalized interception intensity, *ics*/*r*, versus the normalized rainfall volume, *R*/*S*. **Figure 4.** For different antecedent normalized water volume stored on the canopy, *ICS*0/*S*, (**a**) normalized interception volume, *ICS*/*S*, corresponding the normalized net rainfall intensity, *pn*/*r*, and (**b**) normalized interception intensity, *ics*/*r*, versus the normalized rainfall volume, *R*/*S*.


In order to determine the interception volume, *ICS* (mm), as a function of its antecedent amount, *ICS*0 (mm), because of the assumption of a linear storage model (Equation (10)), it can be easily observed that Equation (11) also matches the interception storage volume, *ICS*, normalized with respect to the infiltration capacity, *S*: In order to determine the interception volume, *ICS* (mm), as a function of its antecedent amount, *ICS*<sup>0</sup> (mm), because of the assumption of a linear storage model (Equation (10)), it can be easily observed that Equation (11) also matches the interception storage volume, *ICS*, normalized with respect to the infiltration capacity, *S*:

> -

$$\frac{ICS}{S} = 1 - \left(1 - \frac{ICS\_0}{S}\right) \exp\left(-\frac{R}{S}\right) \tag{13}$$

equivalent to Equation (11), but with a different meaning of the output parameters. It is noteworthy to observe that, for dry antecedent condition (*ICS*0 = 0), Equation (13) matches the interception storage model (Equation (1)) proposed by Merrian [18], which as displayed in the vertical axis label of Figure 4a. Therefore, Equation (13) is formally equivalent to Equation (11), but with a different meaning of the output parameters.

was developed only for dry initial conditions, as previously observed. Equation (13) could also be derived analytically, with a lesser extend from a physical point of view, than that provided by the simple linear stored model previously described. It is noteworthy to observe that, for dry antecedent condition (*ICS*<sup>0</sup> = 0), Equation (13) matches the interception storage model (Equation (1)) proposed by Merrian [18], which was developed only for dry initial conditions, as previously observed.

Indeed, denoting *R*0 the antecedent rainfall volume corresponding to *ICS*0, the Merrian's model (Equation (1)) yields: Equation (13) could also be derived analytically, with a lesser extend from a physical point of view, than that provided by the simple linear stored model previously described.

Indeed, denoting *R*<sup>0</sup> the antecedent rainfall volume corresponding to *ICS*0, the Merrian's model (Equation (1)) yields: *Hydrology* **2021**, *8*, x FOR PEER REVIEW 7 of 16

$$ICS = S \left[ 1 - \exp\left(-\frac{R}{S} - \frac{R\_0}{S}\right) \right] \tag{14}$$

The antecedent rainfall volume, *R*0, can be expressed as a function of *ICS*<sup>0</sup> by using Equation (1): The antecedent rainfall volume, *R*0, can be expressed as a function of *ICS*0 by using Equation (1):

$$R\_0 = -S\log\left(1 - \frac{I\text{CS}\_0}{S}\right) \tag{15}$$

Substituting Equation (15) into Equation (14), and normalizing *ICS* with respect to *S*, provides Equation (13), which was to be demonstrated. Substituting Equation (15) into Equation (14), and normalizing *ICS* with respect to *S*, provides Equation (13), which was to be demonstrated.

As an example, for *S* = 0.5 mm and *ICS*<sup>0</sup> = 0, and for a constant rainfall intensity *r* = 1.75 mm/h, Figure 5 graphs *ICS* vs. the cumulated rainfall *R* (solid black line). For *R*<sup>0</sup> = 0.7 mm, corresponding to *ICS*<sup>0</sup> = 0.377 mm, Equation (14) is plotted in Figure 5 (red line), which of course is not physical meaningful for *R* < 0 (dashed line). As an example, for *S* = 0.5 mm and *ICS*0 = 0, and for a constant rainfall intensity *r* = 1.75 mm/h, Figure 5 graphs *ICS* vs. the cumulated rainfall *R* (solid black line). For *R*0 = 0.7 mm, corresponding to *ICS*0 = 0.377 mm, Equation (14) is plotted in Figure 5 (red line), which of course is not physical meaningful for *R* < 0 (dashed line).

**Figure 5.** Relationship between the interception storage, *ICS* (mm), and the rainfall depth, *R* (mm), for dry antecedent condition, *ICS*0 = 0, according to Merrian [18], and for a wet antecedent condition (*ICS*0 = 0.377 mm). **Figure 5.** Relationship between the interception storage, *ICS* (mm), and the rainfall depth, *R* (mm), for dry antecedent condition, *ICS*<sup>0</sup> = 0, according to Merrian [18], and for a wet antecedent condition (*ICS*<sup>0</sup> = 0.377 mm).

For *ICS*0 = 0.377 mm, Equation (13) is also plotted in Figure 5 (round circles) showing that Equation (13) matches Equation (14). Equation (13) will be used in the following to calculate the interception storage volume starting from any *ICS*0 value. For *ICS*<sup>0</sup> = 0.377 mm, Equation (13) is also plotted in Figure 5 (round circles) showing that Equation (13) matches Equation (14). Equation (13) will be used in the following to calculate the interception storage volume starting from any *ICS*<sup>0</sup> value.

#### **4. Evaporation by Wet Canopy during Dry Spells 4. Evaporation by Wet Canopy during Dry Spells**

During a dry spell, *DS*, starting from an antecedent storage volume, *ICS*0 > 0, the evaporation process from the wet canopy takes place. When the canopy is wetted by rain, evaporation of intercepted rainfall is largely a physical process that does not depend on the functioning of stomata. According to a realistic description of evaporation from canopies the Penman–Monteith equation [23,35] could be used, by imposing zero the surface (or canopy) resistance. However, a different approach is used here, which is based on the physical circum-During a dry spell, *DS*, starting from an antecedent storage volume, *ICS*<sup>0</sup> > 0, the evaporation process from the wet canopy takes place. When the canopy is wetted by rain, evaporation of intercepted rainfall is largely a physical process that does not depend on the functioning of stomata. According to a realistic description of evaporation from canopies the Penman–Monteith equation [23,35] could be used, by imposing zero the surface (or canopy) resistance.

stance highlighted well by Babu et al. [36] that evaporation by wet canopy comprises two stages. First, the drying process involves removal of unbound (free) moisture from the surface, and second it involves removal bound moisture from the interior of the leaf till a defined limit, corresponding to a critical moisture content. Apart from the second stage, which refers to water consumption by the canopy, and thus it is beyond the purpose of this study, the first stage comprises (i) a "preheat period", where the drying speed quickly However, a different approach is used here, which is based on the physical circumstance highlighted well by Babu et al. [36] that evaporation by wet canopy comprises two stages. First, the drying process involves removal of unbound (free) moisture from the surface, and second it involves removal bound moisture from the interior of the leaf till a defined limit, corresponding to a critical moisture content. Apart from the second stage, which refers to water consumption by the canopy, and thus it is beyond the purpose of this study, the first

stage comprises (i) a "preheat period", where the drying speed quickly increases, and then (ii) a "constant rate period", where evaporation takes place at the outside surface for the removal of unbound moisture (free water) from the surface of the leaf [36].

The evaporation mechanism by wet canopy of the first stage could be also described by the physical equations requiring the knowledge of climatic parameters and structure parameters of the canopy. However, in agreement with the simple sketch also considered for the interception process, in this simplified study the first stage evaporation mechanism is described by a simple power-law, according to two parameters. A similar power-law equation was also considered by Black et al. [37] to model the cumulative evaporation of an initially wet, deep soil. For the same power-law equation, Ritchie [38] reports the experimental parameters obtained by other researchers for different soils.

One limited experimental campaign, described in the next section for the faba bean, supported this choice and revealed that for fixed outdoor air temperature, *Tex* ( ◦C) the cumulated evaporation volume, per unit leaf surface area, *E* (mm), could be actually described by the power-law equation:

$$E = m \text{ } \text{\textquotedblleft for a fixed } T\_{\text{ex}} \text{ } (^{\circ}\text{C}) \tag{16}$$

where *t* (h) is the time spent after the canopy interception capacity, *S*, is achieved, *m* is a scale parameter and *n* a shape parameter to be determined by experimental measurements.

In order to upscale Equation (16) to any values of air temperature, a regional equation developed for the Sicily region was considered [39]:

$$E\_m = 0.38 \, T\_m^{1.93} \tag{17}$$

where *E<sup>m</sup>* (mm) is the monthly evaporation depth and *T<sup>m</sup>* ( ◦C) is the monthly temperature. By scaling Equation (16) with Equation (17), provides:

$$E = LAI \, m \, t^{\text{\tiny\tiny\tiny\tiny\tiny\tiny\tiny\tiny\tiny\tiny\tag}1.93} \,\tag{18}$$

where the leaf area index, *LAI*, was introduced, accounting for the actual leaf surface from which the water evaporates. Of course, Equation (18) gives the same experimental evaporation amount derived by Equation (17), when the experimental air temperature, *Tex*, is equal to *Tm*.

It should be noted that Equation (18) does not account for wind speed, thus evaporation losses are linked to the wind experimental conditions for which *m* and *n* parameters are determined, otherwise losses are underestimated or overestimated for wind speeds lower and higher than the experimental ones, respectively. Moreover, application of this procedure in regions different from Sicily would require modifying Equations (17) and (18).

For *LAI* = 2 and *Tex* = 18 ◦C, and for fixed experimental values of *m* and *n* parameters, Figure 6 shows evaporation losses, *E*, during the time, with the air temperature, *Tm*, as a parameter.

In order to apply the suggested procedure, according to the discrete nature of rainfall, the cumulated evaporation volume (Equation (18)) needs to be expressed in discrete terms by accounting, as per the interception model, for the antecedent conditions, at the aim to determine the *E* fraction, ∆*E*, which occurs during the dry spells, *DS*, starting from *t*0:

$$
\Delta E = LAI \, m \left( \frac{T\_m}{T\_{\text{ex}}} \right)^{1.93} \left( \left( DS + t\_0 \right)^n - t\_0^n \right) \tag{19}
$$

where the antecedent initial condition, *t*0, refers to the end of the wet spell, *WS*, when the evaporation process starts. Thus, *t*<sup>0</sup> needs to be calculated by Equation (19), by assuming that the evaporation initial condition, *E*0, equals the interception capacity minus the water stored in the canopy as interception, *S* − *ICS*:

$$t\_0 = \left( \left( \frac{T\_{cx}}{T\_{\text{in}}} \right)^{1.93} \frac{S - ICS}{LAI \text{ } m} \right)^{1/n} \tag{20}$$

**Figure 6.** For *LAI* = 2 and an experimental temperature, *Tex* = 18 °C, evaporation losses by canopy per unit surface leaf, *E* (mm), during the time *t* (h), starting from the interception capacity (*E* = 0, *t* = 0), with air temperature, *Tm* (°C), as a parameter. For *Tm* = 22 °C and *DS* = 3 h, the figure also illustrates the evaporation loss, ∆*E* (mm), corresponding to the segment B–C, starting from an initial condition A = (*t*0, *E*0) drier than saturation. **Figure 6.** For *LAI* = 2 and an experimental temperature, *Tex* = 18 ◦C, evaporation losses by canopy per unit surface leaf, *E* (mm), during the time *t* (h), starting from the interception capacity (*E* = 0, *t* = 0), with air temperature, *T<sup>m</sup>* ( ◦C), as a parameter. For *T<sup>m</sup>* = 22 ◦C and *DS* = 3 h, the figure also illustrates the evaporation loss, ∆*E* (mm), corresponding to the segment B–C, starting from an initial condition A = (*t*<sup>0</sup> , *E*<sup>0</sup> ) drier than saturation.

In order to apply the suggested procedure, according to the discrete nature of rainfall, the cumulated evaporation volume (Equation (18)) needs to be expressed in discrete terms by accounting, as per the interception model, for the antecedent conditions, at the aim to determine the *E* fraction, ∆E, which occurs during the dry spells, *DS*, starting from *t*0: Δ\$ /0 %  +& +<sup>12</sup> ,.-. 4 + (19) where the antecedent initial condition, *t*0, refers to the end of the wet spell, *WS*, when the evaporation process starts. Thus, *t*0 needs to be calculated by Equation (19), by assuming By using Equations (18) and (20), an example of ∆*E* calculation (Equation (19)), which is useful for the applications that will be shown, is performed here. Let assume *m* = 0.047, *n* = 0.657, *Tex* = 22 ◦C (Figure 6), and an evaporation loss ∆*E* for a mean temperature of *T<sup>m</sup>* = 22 ◦C needs to be determined. The interception capacity *S* equals 1.5 mm, and the interception volume stored in the canopy *ICS* when the evaporation process takes place equals 0.8 mm, thus the difference *S* − *ICS* = 0.7 mm mimics the amount water loss due to a virtual evaporation till *E*<sup>0</sup> (Figure 6). The corresponding initial condition *t*0, calculated by Equation (20) provides 12 h. The pair (*t*0, *E*0) is illustrated in Figure 6 (point A).

that the evaporation initial condition, *E*0, equals the interception capacity minus the water stored in the canopy as interception, *S* − *ICS*: ,.-. -,/ Assuming that ∆*E* needs to be calculated after a *DS* = 3 h, Equation (19) yields ∆*E* = 0.111 mm, which corresponds to the segment B–C in Figure 6, also indicating an evaporation loss *E* = *E*<sup>0</sup> + ∆*E* = 0.816 mm (point C).

 5 +<sup>12</sup> +& /0 % 6 (20) By using Equations (18) and (20), an example of ∆*E* calculation (Equation (19)), which In order to consider that ∆*E* computation is limited by the antecedent volume stored water on the canopy, *ICS*, actually available to evaporate, the following condition was imposed, and a corrected ∆*E*, denoted as ∆*E*<sup>∗</sup> evaluated:

is useful for the applications that will be shown, is performed here. Let assume *m* = 0.047,

ception volume stored in the canopy *ICS* when the evaporation process takes place equals

0.111 mm, which corresponds to the segment B–C in Figure 6, also indicating an evapora-

$$
\begin{array}{c}
\Delta E\_\* = I\text{CS} & \text{if } \Delta E > I\text{CS} \\
\Delta E\_\* = \Delta E & \text{otherwise}
\end{array}
\tag{21}
$$

0.8 mm, thus the difference *S* − *ICS* = 0.7 mm mimics the amount water loss due to a virtual evaporation till *E*0 (Figure 6). The corresponding initial condition *t*0, calculated by Equation (20) provides 12 h. The pair (*t*0, *E*0) is illustrated in Figure 6 (point A). Assuming that ∆*E* needs to be calculated after a *DS* = 3 h, Equation (19) yields ∆*E* = In both interception and evaporation models previously introduced, neither the canopy actually covering the field (cover fraction), nor the proportion of rain which falls through the canopy without striking a surface (throughfall), were taken into account. In the next section, the latter issues are considered and then the water mass balance is analyzed.

In order to consider that ∆*E* computation is limited by the antecedent volume stored water on the canopy, *ICS*, actually available to evaporate, the following condition was

evaluated:

tion loss *E* = *E*0 + ∆*E* = 0.816 mm (point C).

lyzed.

(Δ\$<sup>∗</sup>

#### **5. Water Mass Balance 5. Water Mass Balance**

*Hydrology* **2021**, *8*, x FOR PEER REVIEW 10 of 16

Δ\$<sup>∗</sup> -

!9 Δ\$ > -

In both interception and evaporation models previously introduced, neither the canopy actually covering the field (cover fraction), nor the proportion of rain which falls through the canopy without striking a surface (throughfall), were taken into account. In the next section, the latter issues are considered and then the water mass balance is ana-

The fraction of ground covered by the canopy [40] plays a fundamental role in estimating interception losses, as well as the proportion of rain that falls through the canopy without striking any surface. In the following, in order to analyze the water mass balance, *C<sup>F</sup>* denotes the fraction of ground covered by the canopy and *p* is the free throughfall coefficient [19,20]. Assuming *p* = 0, meaning that the canopy fully covers the ground at the plant scale, Figure 7 shows two very different conditions in terms of *LAI* and *CF*, which could provide similar evaporation losses, since high *C<sup>F</sup>* (Figure 7a) could be counterbalanced by low *LAI* (Figure 7b), and vice versa. The fraction of ground covered by the canopy [40] plays a fundamental role in estimating interception losses, as well as the proportion of rain that falls through the canopy without striking any surface. In the following, in order to analyze the water mass balance, *CF* denotes the fraction of ground covered by the canopy and *p* is the free throughfall coefficient [19,20]. Assuming *p* = 0, meaning that the canopy fully covers the ground at the plant scale, Figure 7 shows two very different conditions in terms of *LAI* and *CF*, which could provide similar evaporation losses, since high *CF* (Figure 7a) could be counterbalanced by low *LAI* (Figure 7b), and vice versa.

 Δ\$<sup>∗</sup> Δ\$ ℎ <!# (21)

**Figure 7.** For faba bean, two different attributions of the parameters *LAI*, throughfall index, *p*, and fraction of ground covered by the canopy, *CF*. **Figure 7.** For faba bean, two different attributions of the parameters *LAI*, throughfall index, *p*, and fraction of ground covered by the canopy, *CF*.

By considering both *CF* and *p*, as weighting factors, for a gross rainfall depth *R* (mm), the water mass balance is described by the following relationship: By considering both *C<sup>F</sup>* and *p*, as weighting factors, for a gross rainfall depth *R* (mm), the water mass balance is described by the following relationship:

$$R = R\_f + R\_\mathbb{C} + R\_l = R\left(1 - \mathbb{C}\_F\right) + R\left\mathbb{C}\_F\left(1 - p\right) + R\left\mathbb{C}\_F\left.p\right.\right.\tag{22}$$

where *R<sup>f</sup>* (mm) is the portion of *R* that freely achieves the ground in between the plants, *R<sup>c</sup>* (mm) is the fraction of *R* that achieves the canopy, and *R<sup>t</sup>* (mm) is the throughfall. Thus, depending on the plant species, both *p* and *CF* affect *Rc* and could be simply where *R<sup>f</sup>* (mm) is the portion of *R* that freely achieves the ground in between the plants, *R<sup>c</sup>* (mm) is the fraction of *R* that achieves the canopy, and *R<sup>t</sup>* (mm) is the throughfall.

determined by using RGB images [41,42]. According to Equation (22), to evaluate interception losses, in Equation (13) *R* has to be replaced by *Rc*. By assuming *CF* = 1 and *p* = 0 (*R* = *Rc*), for dry initial condition, *ICS*0 = 0, for fixed Thus, depending on the plant species, both *p* and *C<sup>F</sup>* affect *R<sup>c</sup>* and could be simply determined by using RGB images [41,42]. According to Equation (22), to evaluate interception losses, in Equation (13) *R* has to be replaced by *Rc*.

parameters *S* = 0.8 mm, *LAI* = 4, *m* = 0.047, *n* = 0.657, *Tex* = 12 °C, and for three sequences of *WS* and *DS*, a simple application of the procedure is illustrated in Figure 8, where the denoted variables are also indicated. For simplicity, in the figure linear *ICS* and ∆*E* variations were assumed. Figure 8 shows that for the third *DS* a high air temperature (*Tm* = 20 °C) provides the By assuming *C<sup>F</sup>* = 1 and *p* = 0 (*R* = *Rc*), for dry initial condition, *ICS*<sup>0</sup> = 0, for fixed parameters *S* = 0.8 mm, *LAI* = 4, *m* = 0.047, *n* = 0.657, *Tex* = 12 ◦C, and for three sequences of *WS* and *DS*, a simple application of the procedure is illustrated in Figure 8, where the denoted variables are also indicated. For simplicity, in the figure linear *ICS* and ∆*E* variations were assumed.

condition ∆*E* > *ICS*, and only the available water stored on the canopy may evaporate ). Moreover, for this step, according to the initial condition (*t*0, *E*0), the evaporation flux (i.e., the slope) is higher than those corresponding to the lower air temperatures. Figure 8 shows that for the third *DS* a high air temperature (*T<sup>m</sup>* = 20 ◦C) provides the condition ∆*E* > *ICS*, and only the available water stored on the canopy may evaporate (∆*E*∗). Moreover, for this step, according to the initial condition (*t*0, *E*0), the evaporation flux (i.e., the slope) is higher than those corresponding to the lower air temperatures.

> For fixed *T<sup>m</sup>* = 12 ◦C, the evaporation flux is greater for the second *DS* step that starts from a higher *ICS*0, which is near to the saturation compared to that of the first *DS* step. Similarly, for the second *WS* step, although the higher rainfall depth (*R* = 2 mm) with respect to the first *WS* step, *ICS* increases as in the first *WS* step, because of a higher initial condition *ICS*0, agreeing with the dynamic flux of the both considered interception/evaporation models.

*Hydrology* **2021**, *8*, x FOR PEER REVIEW 11 of 16

**Figure 8.** For a simple sequence of three *WSs* and *DSs*, interception loss during *WS*, and evaporation loss, ∆*E*, during *DS*, with *E*0 = *S* − *ICS* as initial condition. For the third *DS*, the condition described by Equation (21) occurs (∆*E* > *ICS*) and Δ\$<sup>∗</sup> = *ICS*. **Figure 8.** For a simple sequence of three *WSs* and *DSs*, interception loss during *WS*, and evaporation loss, ∆*E*, during *DS*, with *E*<sup>0</sup> = *S* − *ICS* as initial condition. For the third *DS*, the condition described by Equation (21) occurs (∆*E* > *ICS*) and ∆*E*<sup>∗</sup> = *ICS*. from a higher *ICS*0, which is near to the saturation compared to that of the first *DS* step. Similarly, for the second *WS* step, although the higher rainfall depth (*R* = 2 mm) with respect to the first *WS* step, *ICS* increases as in the first *WS* step, because of a higher initial condition *ICS*0, agreeing with the dynamic flux of the both considered interception/evap-

For fixed *Tm* = 12 °C, the evaporation flux is greater for the second *DS* step that starts from a higher *ICS*0, which is near to the saturation compared to that of the first *DS* step. Similarly, for the second *WS* step, although the higher rainfall depth (*R* = 2 mm) with respect to the first *WS* step, *ICS* increases as in the first *WS* step, because of a higher initial The flow chart displayed in Figure 9 describes the suggested procedure to calculate the interception/evaporation water losses, which will be used in the following application, where the condition to be imposed (Equation (21)) is also indicated. oration models. The flow chart displayed in Figure 9 describes the suggested procedure to calculate the interception/evaporation water losses, which will be used in the following application, where the condition to be imposed (Equation (21)) is also indicated.

**Figure 9.** Schematic flowchart of the proposed methodology to calculate *ICS* during *WS*, and ∆*E* (or Δ\$<sup>∗</sup> ) during *DS*. **Figure 9.** Schematic flowchart of the proposed methodology to calculate *ICS* during *WS*, and ∆*E* (or ∆*E*∗) during *DS*.

**Figure 9.** Schematic flowchart of the proposed methodology to calculate *ICS* during *WS*, and ∆*E* (or Δ\$<sup>∗</sup> ) during *DS*. The water mass balance applied to the fraction of rainfall intercepted by the canopy, can be tested by checking the balance of the inflowing volumes as interception during the *i WSs* and the outflowing volumes, as evaporation from the canopy, during the *j DSs*, where of course the antecedent stored volumes *ICS*0 are involved: The water mass balance applied to the fraction of rainfall intercepted by the canopy, can be tested by checking the balance of the inflowing volumes as interception during the *i WSs* and the outflowing volumes, as evaporation from the canopy, during the *j DSs*, where of course the antecedent stored volumes *ICS*<sup>0</sup> are involved:

$$\sum\_{\text{JWS}=1}^{i} (I\text{CS} - I\text{CS}\_0) = \sum\_{\text{DS}=1}^{j} \Delta E\_\* \tag{23}$$

*i WSs* and the outflowing volumes, as evaporation from the canopy, during the *j DSs*, ∑ Δ\$<sup>∗</sup> E (24) Once evaporation losses are calculated, the net rainfall *R<sup>n</sup>* (mm) can be evaluated:

FCD,

$$R\_{\rm II} = R - \sum\_{\rm DS=1}^{j} \Delta E\_{\rm \*} \tag{24}$$

#### Once evaporation losses are calculated, the net rainfall *Rn* (mm) can be evaluated: **6. Interception Capacity LAI and Evaporation Measurements for Faba Bean**

 ∑ Δ\$<sup>∗</sup> E FCD, (24) In order to show the results that can be obtained by the application of the simplified procedure, the 2019 rainfall (Figure 2) and temperature data series of Fontanasalsa station were considered, and the required empirical parameters of Equation (19) were obtained by experimental measurements carried out for faba bean. Faba bean is a species of flowering plant in the pea and bean family Fabaceae, originated in the continent of Africa, which is widely cultivated as a crop for human consumption and also used as a cover crop.

A

crop.

For a single plant, planted in mid-November, and sampled after 95 days, the temporal variation of water loss by evaporation starting from the interception capacity, the number of leaves, # Leaves, and the corresponding surface area were measured. Saturation of the canopy was achieved by using sprinkler irrigation, for a fixed outdoor temperature, *Tex* = 12 ◦C. number of leaves, # Leaves, and the corresponding surface area were measured. Saturation of the canopy was achieved by using sprinkler irrigation, for a fixed outdoor temperature, *Tex* = 12 °C. Starting from the interception capacity condition, after dripping has ceased, water loss by evaporation was measured by weighing during the time. The # Leaves (45), and the corresponding surface area (0.043 m2), made it possible to calculate the cumulative

For a single plant, planted in mid-November, and sampled after 95 days, the temporal variation of water loss by evaporation starting from the interception capacity, the

In order to show the results that can be obtained by the application of the simplified procedure, the 2019 rainfall (Figure 2) and temperature data series of Fontanasalsa station were considered, and the required empirical parameters of Equation (19) were obtained by experimental measurements carried out for faba bean. Faba bean is a species of flowering plant in the pea and bean family Fabaceae, originated in the continent of Africa, which is widely cultivated as a crop for human consumption and also used as a cover

Starting from the interception capacity condition, after dripping has ceased, water loss by evaporation was measured by weighing during the time. The # Leaves (45), and the corresponding surface area (0.043 m<sup>2</sup> ), made it possible to calculate the cumulative evaporation volume, per unit leaf surface area, *E* (mm). evaporation volume, per unit leaf surface area, *E* (mm). Figure 10a plots the cumulate experimental *E* values during the time. This made it possible to calculate by a simple linear regression of the corresponding logarithmic values, the *m* and *n* parameters that are required to apply Equation (16). According to Babu et al.

Figure 10a plots the cumulate experimental *E* values during the time. This made it possible to calculate by a simple linear regression of the corresponding logarithmic values, the *m* and *n* parameters that are required to apply Equation (16). According to Babu et al. [36], Figure 10a shows that in a first stage evaporation rate is high, and then the evaporation rate decreases remaining almost constant in the time, for approximately *t* = 1 h (at least for this case), the behavior of which can be described by the power-law well. [36], Figure 10a shows that in a first stage evaporation rate is high, and then the evaporation rate decreases remaining almost constant in the time, for approximately *t* = 1 h (at least for this case), the behavior of which can be described by the power-law well. Of course, it should be noted that there are other secondary factors not considered in the present approach, such as the plant structure, the leaf angle and the light interception, although not easy to consider, which may affect the considered parameters [43], especially when upscaling is needed.

*Hydrology* **2021**, *8*, x FOR PEER REVIEW 12 of 16

**6. Interception Capacity LAI and Evaporation Measurements for Faba Bean** 

**Figure 10.** For Field bean, (**a**) relationship between experimental evaporation losses per unit *LAI*, *E*/*LAI*, during the time *t* (h), starting from the interception capacity, *S* (mm), and (**b**) assumed temporal variation of *LAI* and *S* (mm), from the day of planting (mid-November), considered for the application (Figure 11). **Figure 10.** For Field bean, (**a**) relationship between experimental evaporation losses per unit *LAI*, *E*/*LAI*, during the time *t* (h), starting from the interception capacity, *S* (mm), and (**b**) assumed temporal variation of *LAI* and *S* (mm), from the day of planting (mid-November), considered for the application (Figure 11). *Hydrology* **2021**, *8*, x FOR PEER REVIEW 13 of 16

**Figure 11.** For the year 2009, evaporation loss Δ\$<sup>∗</sup> , interception loss, *ICS*, antecedent volume stored on the canopy, *ICS*0. The figure also illustrate the water mass balance (red line) that was checked by using Equation (23), and the gross and net rainfall depth, *R* and *Rn* (Equation (24)), respectively (secondary axis). **Figure 11.** For the year 2009, evaporation loss ∆*E*∗, interception loss, *ICS*, antecedent volume stored on the canopy, *ICS*<sup>0</sup> . The figure also illustrate the water mass balance (red line) that was checked by using Equation (23), and the gross and net rainfall depth, *R* and *Rn* (Equation (24)), respectively (secondary axis).

For faba bean planted in mid-November, as in this study, in El-Bosaily farm in the Northern coast of Egypt, Hegab et al. [44] measured a maximum *LAI* value equal to 5.5. Of course, it should be noted that there are other secondary factors not considered in the present approach, such as the plant structure, the leaf angle and the light interception,

For faba bean, FAO56 [35] reports ini. = 90, crop dev. = 45, mid.s. = 40, late s. = 0 days, thus no late season was considered since it was assumed that after the mid-season the faba bean was fully removed from the field. For the four growth stages, Figure 10b shows the *LAI* variations versus the days from planting that made it possible the applications described in the next section to be performed. In Figure 10b can be observed that it was assumed that the crop development stage, after 90 days from planting, starts with *LAI* =

According to Brisson et al. [45], the interception capacity was assumed *S* = 0.2 *LAI*. Thus, with increasing *LAI* during crop growth and decreasing *LAI* during senescence, *S* will increase and decrease, respectively. More recently, this relationship between *S* and *LAI* was also considered by Kozak et al. [46]. Figure 10b also graphs the *S* variation during the days from planting. The interception capacity in the day of sampling (cross symbol), i.e., after 95 days from the day of planting, was equal to 0.202, not too far from that exper-

It should be noted that other *S* vs. *LAI* relationships could be used in the presented methodology, and that specific methods, as that proposed by Aston [47] who also found a linear *S* vs. *LAI* relation, could be applied in order to experimentally determine such

For the considered crop described in the previous section (Figure 10a,b), for *CF* = 1 and *p* = 0, and for the 2019 rainfall data series, interception and evaporation losses were calculated according to the described procedure. Figure 11 plots the detailed results of the involved cumulative water balance components, i.e., the rainfall depth, *R* (mm), the inter-

. The latter

ception depth at the end of each wet spell, *ICS*, and the evaporation losses, Δ\$<sup>∗</sup>

locations.

0.5.

imentally measured (0.156).

**7. Example of Application** 

relationships that depend on the considered crops.

This value was assumed for the mid-season stage according to FAO56 [35], which pro-

although not easy to consider, which may affect the considered parameters [43], especially when upscaling is needed.

For the purpose of the present investigation, the relationship between the interception capacity and *LAI* was considered in order to extend to 2009, the corresponding relationships displayed in Figure 10a, which are related to the limited experimental *LAI* value. To consider the *LAI* temporal variations, some data for faba bean available in the literature were considered.

For faba bean planted in mid-November, as in this study, in El-Bosaily farm in the Northern coast of Egypt, Hegab et al. [44] measured a maximum *LAI* value equal to 5.5. This value was assumed for the mid-season stage according to FAO56 [35], which provides general lengths for four distinct growth stages, initial (ini.), crop development (crop dev.), mid-season (mid. s.) and late season (late), for various, crops, types of climates and locations.

For faba bean, FAO56 [35] reports ini. = 90, crop dev. = 45, mid.s. = 40, late s. = 0 days, thus no late season was considered since it was assumed that after the mid-season the faba bean was fully removed from the field. For the four growth stages, Figure 10b shows the *LAI* variations versus the days from planting that made it possible the applications described in the next section to be performed. In Figure 10b can be observed that it was assumed that the crop development stage, after 90 days from planting, starts with *LAI* = 0.5.

According to Brisson et al. [45], the interception capacity was assumed *S* = 0.2 *LAI*. Thus, with increasing *LAI* during crop growth and decreasing *LAI* during senescence, *S* will increase and decrease, respectively. More recently, this relationship between *S* and *LAI* was also considered by Kozak et al. [46]. Figure 10b also graphs the *S* variation during the days from planting. The interception capacity in the day of sampling (cross symbol), i.e., after 95 days from the day of planting, was equal to 0.202, not too far from that experimentally measured (0.156).

It should be noted that other *S* vs. *LAI* relationships could be used in the presented methodology, and that specific methods, as that proposed by Aston [47] who also found a linear *S* vs. *LAI* relation, could be applied in order to experimentally determine such relationships that depend on the considered crops.

#### **7. Example of Application**

For the considered crop described in the previous section (Figure 10a,b), for *C<sup>F</sup>* = 1 and *p* = 0, and for the 2019 rainfall data series, interception and evaporation losses were calculated according to the described procedure. Figure 11 plots the detailed results of the involved cumulative water balance components, i.e., the rainfall depth, *R* (mm), the interception depth at the end of each wet spell, *ICS*, and the evaporation losses, ∆*E*∗. The latter made it possible to evaluate the antecedent stored volume before each *WS*, *ICS*<sup>0</sup> = *ICS* − ∆*E*∗.

The figure also illustrate the volume water mass balance (red line) that was checked by using Equation (23). Of course, the interception process mostly occurs during the growing season when it rains. For the considered year, 2009, the evaporation loss by canopy achieved 37.6 mm. Figure 11 also plots the net rainfall (dashed line), calculated according to Equation (24).

It is interesting to observe that the rainiest periods (1–110 and 250–365 Julian day) are not associated with the periods with the highest evaporation losses (1–110 Julian day). Of course, this is explained by the rainfall distribution, which has to be analyzed with respect to the growing period of the considered crop, playing a fundamental role in detecting evaporation losses by the rainfall intercepted by the canopy. Evaporation losses for the considered cover crop are 4.5% with respect to the yearly rainfall, and are thus lower than the minimum found in other studies [28], despite different cover crops being considered, whereas interception losses are of course higher and equal to 10.58%.

However, this author is of the opinion that comparison of these values with those obtained in other studies is not very meaningful, due to the high interactions between climate variability, rainfall distribution, and cover crop management (growing season, date of planting, etc.) that should be analyzed to explain the impact of inter-annual variability of interception/evaporation losses [48]. In this sense, the procedure described in this work could be useful to study in deep these interactions, also accounting for the other parameters as the cover fraction, the date of planting [42], the throughfall index and different rainfall regimes and their changes [33,49] that were not considered in this work.

## **8. Conclusions**

In order to derive the evaporation losses by wet canopy, the suggested procedure combines a modified interception model proposed by Merrian, which is applied during rainfall wet spells, and a simple power-law equation to model evaporation during the dry spells. This simple approach makes it possible to estimate the evaporation losses during continuous simulations, and requires few parameters that consolidate climate and crop conditions. Moreover, it is shown that the Merrian model can be derived by considering a simple linear storage model that makes it possible to account of the antecedent intercepted volume, which is useful for applications. The crop considered in the application is the faba bean that was described according to the general lengths of four distinct growth stages considered in FAO56, whereas *LAI* and interception capacity were obtained from literature. Since the required parameters are few, this simple approach could be applied when a rough estimate of evaporation loss by wet canopy is necessary, in the absence of a detailed characterization of canopy and climate. Due to simplified schemes, the procedure can be easily applied at large scale, in order to study the important role of rainfall regime and crop growing stages, on this important component of the hydrologic cycle, and it could be suitable to be implemented according to a probabilistic line of approach.

**Funding:** This research received no external funding.

**Data Availability Statement:** All data, models, and code, generated or used during the study appear in the published article.

**Acknowledgments:** The author wish to thank Agata Novara for providing evaporation measurements, and the three anonymous reviewers for the helpful comments and suggestions during the revision stage.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


**Ali Rashid Niaghi 1,\* , Oveis Hassanijalilian <sup>2</sup> and Jalal Shiri <sup>3</sup>**


**Abstract:** Evapotranspiration (ET) is widely employed to measure amounts of total water loss between land and atmosphere due to its major contribution to water balance on both regional and global scales. Considering challenges to quantifying nonlinear ET processes, machine learning (ML) techniques have been increasingly utilized to estimate ET due to their powerful advantage of capturing complex nonlinear structures and characteristics. However, limited studies have been conducted in subhumid climates to simulate local and spatial ET<sup>o</sup> using common ML methods. The current study aims to present a methodology that exempts local data in ET<sup>o</sup> simulation. The present study, therefore, seeks to estimate and compare reference ET (ETo) using four common ML methods with local and spatial approaches based on continuous 17-year daily climate data from six weather stations across the Red River Valley with subhumid climate. The four ML models have included Gene Expression Programming (GEP), Support Vector Machine (SVM), Multiple Linear Regression (LR), and Random Forest (RF) with three input combinations of maximum and minimum air temperature-based (Tmax, Tmin), mass transfer-based (Tmax, Tmin, U: wind speed), and radiation-based (Rs: solar radiation, Tmax, Tmin) measurements. The estimates yielded by the four ML models were compared against each other by considering spatial and local approaches and four statistical indicators; namely, the root means square error (RMSE), the mean absolute error (MAE), correlation coefficient (r<sup>2</sup> ), and scatter index (SI), which were used to assess the ML model's performance. The comparison between combinations showed the lowest SI and RMSE values for the RF model with the radiation-based combination. Furthermore, the RF model showed the best performance for all combinations among the four defined models either spatially or locally. In general, the LR, GEP, and SVM models were improved when a local approach was used. The results showed the best performance for the radiation-based combination and the RF model with higher accuracy for all stations either locally or spatially, and the spatial SVM and GEP illustrated the lowest performance among the models and approaches.

**Keywords:** evapotranspiration; machine learning; local; spatial; subhumid climate

#### **1. Introduction**

Evapotranspiration (ET) is a combination of two separate processes that transfer huge volumes of water and energy from the soil (evaporation) and vegetation (transpiration) to the atmosphere [1,2]. ET is the second greatest component of hydrological cycle and a major component of the Earth's surface energy balance. ET closely relates to plant growth, droughts, gas efflux, and production. Since ET plays a crucial role in watershed and agricultural water management, accurate spatial and local estimation of crop water requirements (ETa) at the scale of human influence is a critical need for a wide range of applications [3]. Quantifying ET<sup>a</sup> from agricultural lands is vital to the management of water resources. Measurement methods of ET<sup>a</sup> are available through water vapor

**Citation:** Rashid Niaghi, A.; Hassanijalilian, O.; Shiri, J. Estimation of Reference Evapotranspiration Using Spatial and Temporal Machine Learning Approaches. *Hydrology* **2021**, *8*, 25. https://doi.org/10.3390/ hydrology8010025

Received: 24 December 2020 Accepted: 26 January 2021 Published: 2 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

transfer methods (e.g., Bowen ratio, eddy covariance) [4–6], water budget measurements (e.g., soil water balance, weighting lysimeters) [7–9], or remote sensing techniques [10–12]. However, the application of field scale methods is limited due to cost, complexity, and maintenance. Estimating ET<sup>a</sup> using remote sensing models has developed in recent years, but cloud existence in many areas during the satellite passing dates limits the imagery's usefulness. Therefore, due to difficulties in acquiring direct ET<sup>a</sup> measurements, ET<sup>a</sup> can be estimated using by multiplying the calculated reference ET (ETo), using different calculation methods and meteorological data, with the crop coefficient (Kc). However, the required meteorological data for ET<sup>o</sup> calculation methods are not readily available for many locations, and methods with fewer variables must be considered when basic meteorological data are available [13]. However, the simplified and basic models are suited to estimating ET<sup>o</sup> on a weekly or monthly basis instead of a daily basis [14].

The ET<sup>o</sup> calculation is a complex process due to a large number of associated meteorological variables, and it is hard to develop an accurate empirical model to overcome all the complexities of the process [15]. Over the last few decades, machine learning (ML) techniques have attracted the interest of streams of researchers around the globe to overcome the ET<sup>a</sup> estimation complexity. These methods are evolutionary computation techniques that can achieve the best relationship in a system with a data driving tool. Due to the capability of the ML methods in tackling the nonlinear relationship between dependent and independent variables [16], numerous ML techniques have been applied and proposed to predict ET<sup>o</sup> for agricultural purposes including genetic programming (GP) [17,18]; kernel-based algorithms, e.g., support vector machine (SVM) [19,20]; artificial neural network [21–23]; wavelet neural network [14,24]; random forest (RF) [24,25]; and multiple linear regression (LR) [26,27].

Several authors have applied ML methods to detect ET<sup>o</sup> values with minimum variation from the observed values [16,21,28]. Among these models, Gene Expression Programming (GEP) and SVM illustrated better estimation than other models [29,30]. Gene Expression Programming (GEP) is comparable to GP and both involve different sizes and shapes of computer programs encoded in linear chromosomes of fixed length [16]. The SVM method is a regression procedure that has been used successfully in the hydrology context [30–32] and agro-hydrology for ET<sup>o</sup> modeling [19,33]. GEP has several advantages compared to GP such as generating valid structures, its multigenic nature, and the ability to surpasses the old GP system. Shiri et al. [17] evaluated GEP to estimate daily evaporation through spatial and local data scanning Ki¸si and Çimen [34] studied the potential of the SVM model in ET<sup>o</sup> prediction and observed that the SVM model could be useful for ET<sup>o</sup> estimation and hydrological modeling studies. Shiri and Ki¸si [35] compared the GEP and Adaptive neuro-fuzzy inference system (ANFIS) techniques for predicting short and long-term river flows. They also used a similar comparison to predict groundwater table fluctuations.

The LR model is one of the commonly used ML methods in hydrology [28,36]. It has been used to cover the study of the relationship between two or more hydrological variables and investigate the dependence and relationship between the successive values of hydrologic data. Some researchers have used the LR method to estimate the ET<sup>o</sup> rate for different regions due to the simplicity of the method compared to other numerical methods [37,38]. Tabari et al. [27] compared the LR and SVM models' performance for the semi-arid region and found that the SVM model was superior to empirical and LR models.

Due to the high computational costs and complexity of the ML models, the treebased ensemble models attract people by their simplicity and estimation power. The RF model as an ML tree-based model can produce a great result compared to the other ML models [25,39]. This model is known for its simplicity and the ability to perform both classification and regression tasks. The RF method is also widely used to predict the ET rate of different climate regions [40]. Feng et al. [26] applied the RF model for ET<sup>o</sup> estimation daily for southwest China and indicated that the RF model performed slightly better than the general regression neural network model. Shiri [41] evaluated the coupled RF with

wavelet algorithm to estimate ET<sup>o</sup> for the southern part of Iran and obtained results stating that the coupled RF model showed a great improvement compared to the conventional RF and empirical models. To our knowledge, this model has not been applied in the Northern US for ET<sup>o</sup> studies. the coupled RF model showed a great improvement compared to the conventional RF and empirical models. To our knowledge, this model has not been applied in the Northern US for ETo studies. According to the literature, GEP and SVM models have been frequently applied

general regression neural network model. Shiri [41] evaluated the coupled RF with wavelet algorithm to estimate ETo for the southern part of Iran and obtained results stating that

According to the literature, GEP and SVM models have been frequently applied across the world in various climate conditions for ET<sup>o</sup> estimation, while the LR and RF models' applications were minimal. Besides, these models have not been compared with commonly used SVM and GEP models for subhumid climate conditions. Since the limited studies have been conducted to evaluate ML models for the Northern part of the US (which experiences a high variability of weather conditions and a huge amount of agricultural production), the objectives of this study were to (1) investigate the effect of different input combinations of meteorological data on the accuracy of daily ET<sup>o</sup> estimation in subhumid climate using the GEP, SVM, LR, and RF methods, (2) compare the spatial and local prediction capabilities of the different ML models in ET<sup>o</sup> estimation, and (3) evaluate the performance of the models based on the various study years and meteorological stations. across the world in various climate conditions for ETo estimation, while the LR and RF models' applications were minimal. Besides, these models have not been compared with commonly used SVM and GEP models for subhumid climate conditions. Since the limited studies have been conducted to evaluate ML models for the Northern part of the US (which experiences a high variability of weather conditions and a huge amount of agricultural production), the objectives of this study were to (1) investigate the effect of different input combinations of meteorological data on the accuracy of daily ETo estimation in subhumid climate using the GEP, SVM, LR, and RF methods, (2) compare the spatial and local prediction capabilities of the different ML models in ETo estimation, and (3) evaluate the performance of the models based on the various study years and meteorological stations.

#### **2. Materials and Methods 2. Materials and Methods**

#### *2.1. Study Area Climate and Reference Evapotranspiration (ETo) 2.1. Study Area Climate and Reference Evapotranspiration (ETo)*

*Hydrology* **2021**, *8*, x FOR PEER REVIEW 3 of 16

The weather data for the current study were obtained from the North Dakota (ND) Agricultural Weather Stations [42] located at Prosper (ND), Galesburg (ND), Leonard (ND), Sabin (MN: Minnesota), Perley (MN), and Fargo (ND) for 17 study years (January 2003–December 2016). Geographical location of the study stations is shown in Figure 1. The weather data for the current study were obtained from the North Dakota (ND) Agricultural Weather Stations [42] located at Prosper (ND), Galesburg (ND), Leonard (ND), Sabin (MN: Minnesota), Perley (MN), and Fargo (ND) for 17 study years (January 2003–December 2016). Geographical location of the study stations is shown in Figure 1.

**Figure 1.** Location of weather stations used for the current study. **Figure 1.** Location of weather stations used for the current study.

To calculate the daily reference evapotranspiration (ETo) for each study station, the ASCE standardized reference evapotranspiration equation (ASCE-EWRI) was used for the alfalfa reference crop [43]. This equation provides a standardized calculation of ETo demand that can be used in developing Kc and comparing it with other methods. Equation (1) presents the form of the standardized ETo equation for all hourly and daily calculation time steps. To calculate the daily reference evapotranspiration (ETo) for each study station, the ASCE standardized reference evapotranspiration equation (ASCE-EWRI) was used for the alfalfa reference crop [43]. This equation provides a standardized calculation of ET<sup>o</sup> demand that can be used in developing K<sup>c</sup> and comparing it with other methods. Equation (1) presents the form of the standardized ET<sup>o</sup> equation for all hourly and daily calculation time steps.

$$\rm ET\_o = \frac{0.408 \,\Delta \, (R\_n - G) + \gamma \frac{\rm C\_n}{T + 273} \mu\_2 (e\_s - e\_d)}{\Delta + \gamma (1 + \mathcal{C}\_d \mu\_2)} \tag{1}$$

∆ (1 ௗଶ) where ETo is reference evapotranspiration rate (mm d−1), Rn is net solar radiation (MJ m−<sup>2</sup> d−1), G is soil heat flux (MJ m−2 d−1), is psychrometric constant (KPa °C−1), T is the mean daily air temperature, U2 is the average wind speed at 2 m height (m s−1), es is saturation vapor pressure (KPa), ea is actual vapor pressure (KPa), ∆ is the slope of the saturation vapor pressure–temperature relationship (KPa °C−1), Cn and Cd are coefficients which are related to the crop and time step. The value for the constants Cn and Cd are 1600 and 0.38 where ET<sup>o</sup> is reference evapotranspiration rate (mm d−<sup>1</sup> ), R<sup>n</sup> is net solar radiation (MJ m−<sup>2</sup> d −1 ), G is soil heat flux (MJ m−<sup>2</sup> d −1 ), *γ* is psychrometric constant (KPa ◦C −1 ), T is the mean daily air temperature, U<sup>2</sup> is the average wind speed at 2 m height (m s−<sup>1</sup> ), es is saturation vapor pressure (KPa), e<sup>a</sup> is actual vapor pressure (KPa), ∆ is the slope of the saturation vapor pressure–temperature relationship (KPa ◦C −1 ), C<sup>n</sup> and C<sup>d</sup> are coefficients which are related to the crop and time step. The value for the constants C<sup>n</sup> and C<sup>d</sup> are 1600 and 0.38 for the alfalfa reference crop. Table 1 summarized weather parameters of the study locations for the study period.


**Table 1.** Statistical summary of the weather data for the study locations for the study period.

#### *2.2. Models Structure and Application*

To process the GEP and SVM algorithm, we used the GeneXpro program in Matlab, and to process the LR and RF models, a scikit-learn module embedded in the Python 3.2 programming language was used. GEP is an extension of GP [44] developed by Ferreira [45] that creates a computer program to investigate a relationship between input and output variables. GEP was developed to find a better solution for solving a particular problem relating to the understudied phenomena [46].

The application of GEP requires several steps. GEP is, like GAs and GP, a genetic algorithm, as it uses populations of individuals, selects them according to fitness, and introduces genetic variation using one or more genetic operators. The procedure to model daily evapotranspiration (as a dependent variable) by using weather variables (as independent variables) is as follows: 1. Selection of fitness function; 2. Choosing the set of terminals T and the set of functions F to create the chromosomes; 3. Choosing the chromosomal architecture; 4., Choosing the linking function; 5. Choosing the genetic operators.

The fitness function must be determined in the first step with a random generation of chromosomes of a certain program (initial population) and evaluated against a set of fitness cases [47]. Using weather station data as input variables (terminals) to model daily ET<sup>o</sup> involves the next general step. The selection of fitness functions (i.e., absolute error, relative error, and correlation coefficient) depends on the experience and intuition of the user. The GEP model in the current study was developed based on the recommended functions by Shiri et al. [17]. In the third step, the chromosomal architecture can be defined by having the weather variables as terminal and function set as chromosomes. The fourth step was to select a linking function that relates genes to each other in addition to linking the parse trees [17]. Finally, genetic operators' corresponding rates were chosen. Table 2 summarizes the commonly used parameters for each run.


**Table 2.** Parameters used per run of gene expression programming (GEP).

The SVM was developed by Cortes and Vapnik [48] and is known as the classification and regression method [34] to solve problems by applying a flexible representation of the class boundaries and implementing an automatic complexity control to reduce overfitting. In SVM, the dependency of the dependent variable to a set of independent variables is evaluated. In regression estimation with Support Vector Regression (SVR), which is used to define SVMs in the literature, a functional dependency *f*(*x*) between a set of sampled input points *X* = {*x*1, *x*2, *x*3, . . . , *xl*} (here, input sampled refer to meteorological variables) taken from *R<sup>n</sup>* (input vector of n dimension) and target values *Y* = {*y*1, *y*2, *y*3, . . . , *yl*} (ET<sup>o</sup> as target values) with *y<sup>i</sup>* ∈ *Rn*. More detail on SVM can be found in Vapnik [49].

The LR is a statistical method used to describe a quantitative relationship between a dependent variable and one or more independent variables [27,50]. In LR, the function is a linear equation and is expressed as:

$$\mathbf{Y\_i = b\_0 + b\_1 \times 1 + \dots \times + b\_k x\_k} \tag{2}$$

where bo–b<sup>k</sup> are the fitting constants, y<sup>i</sup> is the dependent variable, and x1–x<sup>k</sup> are the independent variables for this system.

The RF method combines a group of decision trees for either classification or regression purposes. Although each decision tree may not be capable of learning well, the combination of the decision trees results in a strong learner. Each decision tree predicts the outcome individually, and RF votes among the outcomes for classification or averages the outcomes for regression. Each decision tree is trained on a different subset of samples by a bagging extension of the RF model to reduce the risk of overfitting. Moreover, a different subset of input variables can be used in each tree to make it more useful in prediction for datasets with higher dimensions [51]. For this study, a small subset of data was used to find a good combination of parameters for the RF model. As a result, there were several trees in the forest and the minimum number of samples required for the leaf nodes were 50 and 35, respectively. The mean square error criteria are used as a procedure for estimation.

The calculated daily ET<sup>o</sup> was used to feed the GEP, SVM, LR, and RF models. Three treatments including temperature, radiation, and mass transfer-based combinations were used as input to feed the models, and each model of the combinations was assessed for spatial and local approaches. Different statistical analysis was performed to evaluate the accuracy and performance of the different combinations and approaches for each studied station. The combinations were as follows:


#### *2.3. K-Fold Cross-Validation*

Splitting the data into the sets of data for testing and training is a usual procedure for assessing the ML techniques. Using 10–30% of the complete dataset as a single test set is a common method for GEP evaluation. Therefore, the K-fold cross-validation technique was used to increase the evaluation performance and set of data for either training or testing purposes. Using K-fold cross-validation, the dataset was divided into K subsets, and the training process was repeated K times leaving each time a distinct set of patterns for testing until a complete testing scan for the dataset was fulfilled. Computation cost defines the minimum assembly size of the test set. Here, the minimum test size was fixed as one year for local modeling and one station for spatial modeling. Consequently, at a local scale, one year was held out each time for testing while the models were trained using the remaining 16 years; hence, a total of 612 models (17 years × 6 stations × 3 input combinations × 2 models) were established for the local k-fold testing. At the spatial scale, one station was considered as a test block each time and the models were trained using the patterns from the remaining stations; hence, a total number of 36 models (6 stations × 3 input combinations × 2 models) were constructed. The local and spatial approaches were noted with T and S in the figures.

#### *2.4. Evaluation Criteria*

To investigate the performance of the models for each combination and approaches, four statistical indicators were used, namely, the root mean square error (RMSE), the mean absolute error (MAE), correlation coefficient (r<sup>2</sup> ), and scatter index (SI), defined as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (\text{ET}\_{\text{e}} - \text{ET}\_{\text{o}})^2} \tag{3}$$

$$\text{MAE} = \frac{\sum\_{i=1}^{n} |\text{ET}\_{\text{e}} - \text{ET}\_{\text{o}}|}{n} \tag{4}$$

$$r^2 = \left(\frac{\sum\_{i=1}^{n} \left(\text{ET}\_{\text{o}} - \overline{\text{ET}\_{\text{o}}}\right) \left(\text{ET}\_{\text{e}} - \overline{\text{ET}\_{\text{e}}}\right)}{\sqrt{\sum\_{i=1}^{n} \left(\text{ET}\_{\text{o}} - \overline{\text{ET}\_{\text{o}}}\right)^2 \sum\_{i=1}^{n} \left(\text{ET}\_{\text{e}} - \overline{\text{ET}\_{\text{e}}}\right)^2}}\right)^2\tag{5}$$

$$\text{SI} = \frac{\text{RMSE}}{\overline{\text{E}\_0}} \tag{6}$$

where ET<sup>e</sup> and ET<sup>o</sup> are simulated and calculated reference evapotranspiration at the *i*-th time step, respectively, n is a number of time steps, ET<sup>e</sup> and ET<sup>o</sup> are mean values of simulated and calculated ETo, respectively.

The RMSE describes the average magnitude of errors and can take on values from 0 to ∞ indicate perfect and worst fit, respectively, and the MAE scores the error magnitudes without any specific weight to larger/smaller errors. Therefore, the lower value of the RMSE and MAE is desirable. The r<sup>2</sup> values around 1 indicate a perfect linear relationship between estimated and calculated values, where the closer a value is to zero, the more it demonstrates the poor relationship between simulation and calculation. Finally, SI is a dimensionless index of RMSE that gives a good insight to compare the performance of different models.

#### **3. Results and Discussions**

The local and spatial analysis of four models for six studied stations is shown in Table 3. According to the three combinations' performance, the radiation-based method illustrated the highest accuracy for either local or spatial approaches compared to the other combinations. The mass-transfer-based combination was the next most accurate combination. The results showed that the local trained models surpassed the spatially trained models because of using the same patterns for both training and testing models. However, the spatial models gave comparable results compared to the local model in some cases, especially for radiation-based combinations. Differences in temperature among the stations have dramatically affected the performance of both the temperature-based and the mass transfer-based models. In all cases, the minimum differences between the performance accuracy of the local and spatial models belonged to the LR model. This can be inferred to the mathematical structure of this technique, where only linear relationships can be supposed between the input and target parameters with a lower degree of flexibility compared to heuristic data driven models.


**Table 3.** Global average performance indicators of the gene expression programming (GEP), support vector machine (SVM), multiple linear regression (LR), and random forest (RF) methods for three input combinations of local (T) and spatial (S) approaches.

> Among four models with three input combinations, the models relying on radiation, mass-transfer, and temperature-based combinations showed the lowest RMSE and MAE, respectively (Table 3). Comparing the GEP, SVM, LR, and RF models, the RF model illustrated the lowest rate of RMSE and MAE with the best performance for radiationbased approaches. However, the RF model improved 4.37, 5.76, and 1.49 percent from local to spatial approaches for temperature, radiation, and mass-transfer-based combinations, respectively, which was in contrast with the improvement's direction in the other models. Considering the models based on radiation combination, the spatial RF model exhibited the highest linear relationship (r<sup>2</sup> = 0.927) between calculated and estimated ET<sup>o</sup> in comparison with the other models. The local RF method was the next accurate approach to estimate ET<sup>o</sup> based on radiation-based data. This observation illustrated the ability of the RF algorithms to estimate ET<sup>o</sup> using data from local stations for training. Furthermore, the LR model had significant improvement for RMSE and MAE from spatial to local approaches for all three types of input combinations. For the LR model, the r<sup>2</sup> value was not changed for radiation-based and temperature-based combination from spatial to local approaches and the change was 0.13 percent for the mass-transfer-based model. Therefore, the LR model illustrated almost similar results for both spatial and local approaches among all models.

> The GEP and SVM models illustrated the great improvement rate for all three input combinations from spatial to local condition with the highest improvement of 21 percent for mass-transfer-based combination. Specifically, the GEP model showed an improvement from spatial to local approach, however, the percentage of improvement was 2.3, 3.9, and 0.13 for radiation, temperature, and mass-transfer-based combinations, respectively. In

term of obtained improvement for RMSE and MAE from spatial to local approaches, both of the GEP and SVM models gained similar results. The correlation coefficient of the SVM model decreased from spatial to local approaches for about 6.3, 6.5, and 10.9 percent for radiation, temperature, and mass-transfer-based combinations, respectively. By using local radiation data for training the models, the SI indicator for the GEP and SVM models showed an improvement of 8.2 and 10 percent from spatial to local approach, respectively. This improvement was about 6.6 and 8.7 percent for mass-transfer-based and 15 and 10.9 percent for temperature-based combinations, respectively.

Statistical analysis revealed the similar performance of the local GEP and SVM models. For RMSE and MAE statistical variables, GEP and SVM models showed a greater improvement in performance for mass-transfer and temperature-based combinations, respectively. By considering correlation coefficient values, it can be concluded that the improvement in accuracy of either GEP or SVM approaches was not significant and all illustrated the ability to estimate with acceptable accuracy. Therefore, if temperature data are not available at a particular station, but they are for other stations, the GEP and SVM approaches can be useful to estimate ETo. However, due to the higher mapping ability of the GEP models, using either local or spatial GEP are preferable.

The models relying on the mass-transfer combination had slightly higher accuracy than the temperature-based approach, but lower accuracy compared to the radiationbased combination. All of the local and spatial GEP and SVM methods illustrated lower improvement compared to that for the temperature-based approach. This showed that wind speed can have a significant effect on the accurate estimation of spatial and local ETo. Due to the flat topography of the study area and being faced with lots of high-speed winds during the growing season and almost all other seasons, including the wind as a parameter to build the model architecture and estimating the ET<sup>o</sup> can increase the accuracy of the approach.

Overall, the RF and the LR models illustrated the best performance among the four models, and comparing the GEP and SVM models, the GEP model showed better performance than the SVM model for all three input combinations.

A breakdown of the models' performance accuracy at each station is shown in Figures 2–4 for all of the three input combinations, respectively. In the case of the temperature-based combination (Figure 2), the local GEP and SVM models (shown as TGEP and TSVM) gave more accurate results than the spatial (shown as SGEP and SSVM) models. For the LR and RF models, the difference in accuracy between local (TLR and TRF) and spatial approaches (SLR and SRF) was not significant and both showed better performance than GEP and SVM models since they relied on the patterns of the same location used for training and testing the models. According to Table 2, station 6 (Fargo) had the highest, and station 2 (Galesburg) had the lowest range of recorded temperature among the study stations. This range may be caused to have the lowest performance for station 2; however, it was difficult to evaluate the model's performance in the climate context of each station due to the few number of study stations. The RF model showed the best performance with higher accuracy for all stations either locally or spatially, and the spatial SVM and GEP illustrated the lowest performance among other models and approaches.

**Figure 2.** Statistical indices of the temperature-based combination for the four applied models (GEP, VM, LR, and RF) with local (T) and spatial (S) approaches. **Figure 2.** Statistical indices of the temperature-based combination for the four applied models (GEP, VM, LR, and RF) with local (T) and spatial (S) approaches.

**Figure 3.** Statistical indices of radiation-based combination for the four applied models (GEP, SVM, LR, and RF) with local (T) and spatial (S) approaches. **Figure 3.** Statistical indices of radiation-based combination for the four applied models (GEP, SVM, LR, and RF) with local (T) and spatial (S) approaches.

Among the study stations comparison, the SI range of the spatial RF was 0.018, which showed the best performance compared to the other applied methods. As obtained from the temperature-based combination, the LR method performed well in the radiationbased combination too, with an SI indicator range of 0.021 and 0.024 for local and spatial LR approaches, respectively. The worst performance was observed for spatial GEP and

and SVM models, the local GEP performed well compared to other approaches of the SVM and GEP models. The statistical indicators were in agreement with the spatial RF performance in which they showed the lowest rate of RMSE and MAE and the highest value of r2. However, comparing the MAE might not be a valid indicator due to taking into account the local order of magnitude of the target variable. The ranking of the SI indicators showed that spatial RF and LR could overcome the lack of meteorological data for the station. On the other hand, the averages of the SI values for all six study stations showed that the spatial RF and local RF had the lowest and the spatial GEP and spatial SVM had the highest rate of SI indicators. Therefore, either spatial or local RF methods could be useful to

Figure 4 shows the statistical indices of the mass-transfer-based combination. Similar to the previous combinations, the spatial and local RF gave a more accurate estimation than other methods. On the other hand, the local SVM approach showed better estimation than the spatial SVM and GEP methods for all stations except station 2, which had the lowest range of temperature variation. The fluctuations of the indices among the stations were higher than the radiation-based combination and lower than that for temperaturebased combination, which showed mediocre accuracy compared to the other combina-

estimate the missing values for any of the stations.

tions.

**Figure 4.** Statistical indices of mass transfer-based combination for the four applied models (GEP, SVM, LR, and RF) with local (T) and spatial (S) approaches. **Figure 4.** Statistical indices of mass transfer-based combination for the four applied models (GEP, SVM, LR, and RF) with local (T) and spatial (S) approaches.

By having wind speed and temperature data as the input variable for the mass transfer-based combination, the spatial RF approach gained the lowest SI and highest r2 values for ETo estimation compared to all other methods. The minimum and maximum SI values for mass transfer-based combination were obtained for the spatial RF and spatial SVM approaches, which were 0.011 and 0.120, respectively. According to the performance ranking of the models based on the SI indicator, spatial LR, local LR, and local RF showed better performance after spatial RF with SI values of 0.015, 0.018, and 0.018, respectively. The RF and LR methods showed the lowest range of SI compared to the spatial and local GEP and SVM methods. For temperature-based combinations, the spatial and local LR approaches had minimum SI ranges of 0.018 and 0.020, respectively, and the spatial SVM and GEP methods illustrated the highest SI ranges of 0.113 and 0.119, respectively. The spatial RF approaches with an average of 0.333, and spatial SVM, with an average of 0.457, showed the lowest and highest SI rate, respectively. Therefore, spatial RF approaches may be the most practical way to estimate the missing meteorological data of the study stations.

The local SVM, local GEP, and spatial GEP had the SI values of 0.087, 0.10, and 0.119, respectively. The average of SI for all study stations showed that the spatial and local LR had the highest and spatial and local RF had the lowest SI values, respectively. Therefore, by having the lowest range of SI and lowest value of SI for the spatial RF approach, it might be more practical to apply the spatial RF for other stations without training a specific model for each station. Accordingly, no local dataset would be needed to train the local models. This could be helpful to estimate the ETo for stations with partial or missing meteorological data. To understand the yearly performance of the applied models based at each of the study stations, the models were assessed per test year. Figure 5 illustrates the SI values obtained from the three input combinations for each study year of the study stations. The Figure 3 shows the evaluation result of the radiation-based combination for the four models with spatial and local approaches. The amount of received radiation for all study stations was similar. According to the global performance of the defined models, the radiation-based combination gained the best performance among the three input combinations. Besides, the radiation-based combination had the lowest rate of RMSE, MAE, and SI, and the highest rate of r<sup>2</sup> for each of the study stations. Among the spatial and local scenarios, the local approach had a better performance than the spatial approach. For the radiation-based combination, the spatial RF and local RF models had an accurate estimation of ETo, respectively. For stations 3 and 4 (Leonard and Sabin) either spatial or local approaches of GEP and SVM models gained lower performance than the other stations. This could be due to the slightly higher magnitude and variations of solar radiation (Table 2) among the other stations during the study period.

SI values of the models fluctuated considerably for almost all stations during the test years. As shown in Figure 5, the SI values fluctuated considerably within test years for all input combinations and approaches. Among the study stations, Prosper and Sabin stations showed the average maximum range for the SI values. The minimum average of the SI value 0.223 was observed for the RF radiation-based combination for the Fargo station, and the maximum average of the SI was obtained for the SVM temperature-based combination (SVM1) for the Galesburg station. The Galesburg station had the lowest temperature range among the study stations. According to Figure 4, the RF2 (radiation-based combination) model showed the lowest fluctuation compared to the other approaches with a similar trend among the study stations. For all of the study stations, test years, and input Among the study stations comparison, the SI range of the spatial RF was 0.018, which showed the best performance compared to the other applied methods. As obtained from the temperature-based combination, the LR method performed well in the radiation-based combination too, with an SI indicator range of 0.021 and 0.024 for local and spatial LR approaches, respectively. The worst performance was observed for spatial GEP and SVM approaches, with SI indicator of 0.128 and 0.140, respectively. According to the GEP and SVM models, the local GEP performed well compared to other approaches of the SVM and GEP models. The statistical indicators were in agreement with the spatial RF performance in which they showed the lowest rate of RMSE and MAE and the highest value of r<sup>2</sup> . However, comparing the MAE might not be a valid indicator due to taking into account the local order of magnitude of the target variable. The ranking of the SI indicators showed that

spatial RF and LR could overcome the lack of meteorological data for the station. On the other hand, the averages of the SI values for all six study stations showed that the spatial RF and local RF had the lowest and the spatial GEP and spatial SVM had the highest rate of SI indicators. Therefore, either spatial or local RF methods could be useful to estimate the missing values for any of the stations.

Figure 4 shows the statistical indices of the mass-transfer-based combination. Similar to the previous combinations, the spatial and local RF gave a more accurate estimation than other methods. On the other hand, the local SVM approach showed better estimation than the spatial SVM and GEP methods for all stations except station 2, which had the lowest range of temperature variation. The fluctuations of the indices among the stations were higher than the radiation-based combination and lower than that for temperature-based combination, which showed mediocre accuracy compared to the other combinations.

By having wind speed and temperature data as the input variable for the mass transferbased combination, the spatial RF approach gained the lowest SI and highest r<sup>2</sup> values for ET<sup>o</sup> estimation compared to all other methods. The minimum and maximum SI values for mass transfer-based combination were obtained for the spatial RF and spatial SVM approaches, which were 0.011 and 0.120, respectively. According to the performance ranking of the models based on the SI indicator, spatial LR, local LR, and local RF showed better performance after spatial RF with SI values of 0.015, 0.018, and 0.018, respectively. The local SVM, local GEP, and spatial GEP had the SI values of 0.087, 0.10, and 0.119, respectively. The average of SI for all study stations showed that the spatial and local LR had the highest and spatial and local RF had the lowest SI values, respectively. Therefore, by having the lowest range of SI and lowest value of SI for the spatial RF approach, it might be more practical to apply the spatial RF for other stations without training a specific model for each station. Accordingly, no local dataset would be needed to train the local models. This could be helpful to estimate the ET<sup>o</sup> for stations with partial or missing meteorological data.

To understand the yearly performance of the applied models based at each of the study stations, the models were assessed per test year. Figure 5 illustrates the SI values obtained from the three input combinations for each study year of the study stations. The SI values of the models fluctuated considerably for almost all stations during the test years.

As shown in Figure 5, the SI values fluctuated considerably within test years for all input combinations and approaches. Among the study stations, Prosper and Sabin stations showed the average maximum range for the SI values. The minimum average of the SI value 0.223 was observed for the RF radiation-based combination for the Fargo station, and the maximum average of the SI was obtained for the SVM temperature-based combination (SVM1) for the Galesburg station. The Galesburg station had the lowest temperature range among the study stations. According to Figure 4, the RF2 (radiation-based combination) model showed the lowest fluctuation compared to the other approaches with a similar trend among the study stations. For all of the study stations, test years, and input combinations, the RF models gave the best performance with the lowest SI values compared to the other models and combinations. The SVM and GEP models showed the worst SI averages for the temperature-based combinations. In this case, the order of the accuracy of the models was similar to that obtained from the station-based analysis. The radiation-based combination gave the most accurate results and estimation in comparison with the temperature or mass-transfer-based combination models.

Comparing the performance of the models relying on each of the combination methods, it can be observed that the performance of the approaches is similar. However, the range of the SI indicator for different approaches was different depending on the test years. For example, 2011, 2012, and 2013 were the dry, normal, and wet years among the study years, respectively. The result of SI per test years showed that 2012 had lower SI than 2011 and 2013 for all of the three input combinations, and the SI values of the various methods and approaches were close together. On the other hand, for the 2013 test year, some of

*Hydrology* **2021**, *8*, x FOR PEER REVIEW 12 of 16

the approaches illustrated a huge jump for the obtained SI from 2012 to 2013 due to the weakness of the model performance. based combination gave the most accurate results and estimation in comparison with the temperature or mass-transfer-based combination models.

combinations, the RF models gave the best performance with the lowest SI values compared to the other models and combinations. The SVM and GEP models showed the worst SI averages for the temperature-based combinations. In this case, the order of the accuracy of the models was similar to that obtained from the station-based analysis. The radiation-

**Figure 5.** Scatter index (SI) values of GEP, SVM, LR, and RF models per test year and study station **Figure 5.** Scatter index (SI) values of GEP, SVM, LR, and RF models per test year and study station for three input combinations.

for three input combinations. Due to the variability in the meteorological data and the climate pattern, the variability in performance accuracy at each station was expected. Similar variability in performance of the ML approaches was observed by Shiri et al. [17]. Selection of the training set and testing set plays an important role in model performance. The existence of any abnormal year in the test years in comparison with training datasets causes it to have an inaccurate estimation [19]. By lowering the number of the input values, the validity of the training set for estimation of test years decreases. Because of the lower input values, the variable rate would be low, and this type of input would only be valid for periods with very specific trends. This explanation may clarify the performance of spatial approaches, where the relationship encountered might be site-specific. Other researchers illustrated the sitespecific performance of spatial approaches for the different study regions and climate conditions [34,35].

The comparison of the three input combinations showed that the performance of some approaches was similar for some years while the performance of methods for some years were far from each other. For example, the SVM model showed the most improvement from temperature to mass-transfer-based combination, which became like the RF method. However, depending on the station and test year, this similarity becomes even closer. All the test years and stations showed an improvement from temperature to radiation or mass-transfer-based combination except for the Prosper station, which is in agreement with the findings of Adnan et al. [15]. On the other hand, the Prosper station showed the best improvement for the SVM model for radiation-based approaches. Considering that all of the input combinations rely on the temperature and another variable (solar radiation or wind speed), it might be thought that the performance differences could be due to the effect of the second variable in the estimation of the output. Besides, when the performance of the models is similar, the impact of the secondary variable might be less than the primary variable (temperature). However, when the gap between the performance of the SI indicator increases, it proves the crucial impact of the second variable on the model performance for the specific test year or station. A similar conclusion could be drawn for the comparison between the input combinations. If each of the combinations shows a better performance than the other combination method for a specific year and station, the second variable effect should be important and might have a significant influence in the explanation of the output for that test year.

#### **4. Conclusions**

This paper aimed to evaluate the different ML methods including GEP, SVM, LR, and RF to estimate ET<sup>o</sup> in the Red River Valley. The external approach exempts using local data like spatial approaches in the current paper for simulating ET<sup>o</sup> values in a decisive way when local data is not available, reliable, or sufficient. Global comparison of the performance accuracy of the applied models revealed that the RF model was the best for all combinations among the four defined models. Furthermore, the RF model illustrated the best performance for spatial and local conditions for all input combinations. In general, the LR, GEP, and SVM models were improved when a local approach was used, except for the RF model, which was less accurate with a local approach. The radiation-based combination was the most accurate predictor among all models tested. As a result, this combination showed the lowest rate of improvement due to better performance in the first step.

The results showed that due to the flat topography of the study area with high wind speeds during the growing season, including the wind as a parameter to build the model architecture and estimate the ET<sup>o</sup> can increase the accuracy of the prediction. Besides, it might be more practical to apply the spatial RF model for stations with missing meteorological data without the need for local training. The recommended application of spatial RF using radiation combination allows for a more reliable estimate of ET<sup>o</sup> to fill the missing values for more precision water management purposes. Further research should confirm the current results in other geographical locations and for the various input combination methods.

**Author Contributions:** Conceptualization, A.R.N. & J.S.; methodology, A.R.N. & J.S.; software, A.R.N., J.S., and O.H.; validation, A.R.N. & J.S.; formal analysis, A.R.N., J.S., and O.H.; investigation A.R.N.; resources, A.R.N.; data curation, A.R.N.; writing—original draft preparation, A.R.N.; writing review and editing, A.R.N.; visualization, A.R.N. & J.S.; supervision, J.S.; All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**

