**4. Discussion**

#### *4.1. ET Model Intercomparison*

Overall results listed in Table 3 show that TSEB models produced more robust estimates of both sensible and latent heat fluxes, with lower errors around 80 to 90 W m−<sup>2</sup> and larger correlation coefficient, while at the same time returning more valid cases than the other two models, METRIC and ESVEP. Those errors are within the expected and reported errors in literature, e.g., Kalma et al. [2] showed errors in *λE* ranging between 24 and 105 W m−<sup>2</sup> for a wide range of models, Chirouze et al. [58] reported errors for TSEB > 100 W m−<sup>2</sup> in a semi-arid area of Mexico, and 50 W m−<sup>2</sup> errors are reported in Tang et al. [35]. Choi et al. [91] found TSEB-PT and METRIC produced similar errors of 54 W m−<sup>2</sup> in a watershed in Iowa, US. However it is worth noting that most of the reported errors in these studies [35,58,91,92] used actual surface temperature at high spatial resolution (e.g., Landsat or ASTER), whereas in this study we used low resolution temperature sharpened to high spatial resolution, which provides an additional input uncertainty to the models. For that reason, Section 4.2 is dedicated to this issue in depth.

TSEB-PT was developed trying to solve some of the issues in sparse vegetation and semi-arid conditions previously raised by less complex models [36], and therefore it adapts better to a wider range of climatic and vegetation conditions [1] as it was shown in Tables 5 and 7. METRIC, on the other hand, was primarily designed for standard crops and requires concomitant presence of stressed and well watered-full vegetation conditions within the scene itself. This more often happens in semi-arid climate where irrigated crops and rainfed crops and natural vegetation are present. Those cases in which, either due to the increased presence of clouds (i.e., fewer available pixels in the scene) or in regions where these hot and cold pixels cannot be simultaneously found, METRIC would produce more uncertain retrieval, as already pointed by Choi et al. [91] and Tang et al. [35] (in humid or sub-humid areas), or even would not produce any valid data. Similarly, ESVEP was designed and tested in an agricultural area located in a subhumid and monsoon climate [44] and therefore certain assumptions and parameterizations taken in that model might not transfer well to other vegetation or climatic conditions.

Despite of TSEB being the model with the largest required amount of input data, this study proposed several new approaches to retrieve some of those inputs operationally, with special focus on exploiting the spectral capabilities of Sentinel-2, in particular the bands in the red-edge region that is sensitive to leaf pigments. A simple empirical approach relating leaf bihemispherical reflectance and transmittance with the leaf biochemical properties resulted in accurate estimates of net radiation. More importantly, due to the larger uncertainty of TSEB models over senescent vegetation, we derived a method to obtain both total LAI and its green fraction based on Fisher et al. [75] FAPAR/FIPAR relationship. Nevertheless, more research is needed to systematically derive other vegetation properties such as canopy height/aerodynamic roughness or vegetation clumping.

Finally, it is worth pointing out that even *in situ* EC measurements are prone to uncertainty as is confirmed for instance by the usual energy imbalance in those systems. Particularly we found a larger disagreement between observed and predicted net radiation in Dahra (see Figures in the Supplement). We hypothesise that this could be due to two possible reasons. Firstly, our modelled irradiance, with depends on TCWV and aerosol optical thickness, could be more noisy at Dahra than the other sites, due to unaccounted dust aerosols in that site placed over the Sahel. The second issue might be the actual *Rn* measurements, as in this site only a NR-lite (Kipp & Zonen, Netherlands) is available to measure global *Rn* that might be less accurate than the radiometers at the other sites, which are measuring the four components of radiation. In addition, Harvard Forest site lacks *in situ* G measurements, which effects the energy balance closure correction. This issue together with the fact that very few cases are available in forests (Table 5), leads us to avoid strong conclusions regarding the performance of the models in forested areas.

#### *4.2. Sharpening and Disaggregation*

As was previously mentioned, thermal sharpening relates empirically or semi-empirically coarse resolution surface temperature with fine resolution multispectral and other ancillary data. This technique could be a sound alternative to the lack high resolution thermal imagery for operational activities. However, previous studies in thermal sharpening have reported some uncertainties when compared to actual *Trad* temperatures, with errors ranging up to 3.5 K [7–9,11,12,14]. Therefore, for some applications requiring ET estimates at higher accuracy (i.e., precision agriculture), sharpening might not be considered as a suitable substitute of *Trad* but complementary to it, such as in the fusion approach by Knipper et al. [93].

In order to reduce flux retrieval errors with sharpened *Trad* inputs, we also tested a flux disaggregation method [10,23]. Our results listed in Tables 3–7 show that disTSEB model, i.e., coarse S3 TSEB-PT fluxes disaggregated with fluxes derived with TSEB-PT and fine resolution sharpened *Trad*, yielded only modest improvement (5 W m−<sup>2</sup> reduction in RMSE in case of H and only 1 W m−<sup>2</sup> in case of *λE*) to the TSEB-PT model, i.e., running TSEB directly on the sharpened *Trad* imagery. The one exception was at the savanna sites (see Table 5) where using disaggregation reduced the errors in H by

around 12% and errors in *λE* by around 10%. However, previous studies have shown the robustness of this approach to overcome limitations of the likely less reliable fine resolution *Trad* images [4,93,94]. Furthermore, coarse input data must be produced beforehand for thermal sharpening and hence it is readily available for running the models at coarse resolution, which indeed is computationally inexpensive given the much lower number of pixels within a scene. Therefore, flux disaggregation would still be recommended when running TSEB-PT with sharpened temperatures.

In addition, the sharpening of a coarse resolution *Trad* image using fine resolution images acquired on different days, with a maximum of 10 days offset, might lead to additional uncertainties. This is caused by the fact that some changes in either land cover properties, (e.g., vegetation growth, harvests, fires) or moisture conditions (e.g., rainfall or irrigation) might happen between the Sentinel 2 and 3 acquisitions. Figure 2 shows that at a general level (all validation sites taken together) this does not appear to be a significant issue as the error does not increase as the day offset between thermal and shortwave acquisitions gets larger. Particularly relevant in this analysis is H since it is the energy component that is directly related to *Trad*, and hence more prone to errors in sharpening. However, more studies should be conducted to look at the effect of the day offset in particular situations, e.g., in crops during senescence or with localized irrigation patterns. It might be also worth to investigate using high-resolution radar data (e.g., from Sentinel-1), which is sensitive to soil moisture, in the thermal data sharpening approach [95]. Furthermore, the Landsat family of satellites could also be utilised during the sharpening since they acquire thermal data at around 100 m spatial resolution although at 8 days (two satellites) to 16 days (single satellite) temporal resolution. Using observations from those satellites would both increase the temporal density of the high-resolution data but also capture physical processes and properties which are not reflected in the shortwave data, such as near soil surface soil moisture and soil evaporative efficiency.

Finally, some studies have reported larger errors than in this study, but they were using coarser resolution imagery [43]. This is probably due to the scale mismatch between the coarse pixel estimate and the footprint of the EC towers' measurements. We evaluated this by comparing fluxes modelled at Sentinel-2 (i.e., with sharpened *Trad*) and Sentinel-3 spatial resolutions against measurements form towers. This was done for all the sites put together and also for the validation sites split into two categories: those in which the tower is located in a landscape feature too small to have significant effect on the original resolution Sentinel-3 *Trad* (category "small" containing CH, KL, GR and SE sites), and those where the opposite is true (category "large" containing SL, DA, HF, HTM, MT TA and KG sites). The results (Table 8) indicate that using sharpened *Trad* is most important when modelling H in the "small" category. However, the correlation of high-resolution fluxes against tower measurements is in almost all the cases higher than that of low-resolution fluxes and rRMSE is lower or the same in case of turbulent fluxes. Therefore, even though sharpened *Trad* might be more prone to errors than actual high-resolution *Trad*, it is still a good option for downscaling fluxes for model validation [4], addressing therefore the vegetation cover variability within coarse resolution pixels. Nevertheless, there is still an open question on how feasible thermal sharpening is for early detection of water stress at small scales, compared to using high resolution thermal imagery. This issue is especially relevant for precision irrigation tasks and therefore future studies should address this topic.

**Figure 2.** Error (modelled–measured) distribution of fluxes modelled with TSEB-PT and disTSEB models using sharpened *Trad* depending on offset days between a Sentinel-3 *Trad* image and the fine-scale Sentinel-2 multispectral image. Error computed for all sites together.

**Table 8.** Landscape feature size dependence of errors for low and high resolution TSEB-PT modelled fluxes using Decision Trees sharpened temperatures. N, number of valid cases; Obs.; mean of observed values (W m−<sup>2</sup> ); bias, mean difference between predicted and observed (W m−<sup>2</sup> ); MAE, Mean Absolute Error (W m−<sup>2</sup> ), RMSE, Root Mean Square Error (W m−<sup>2</sup> ); rRMSE, Relative RMSE (–); r, Pearson correlation coefficient (–).



**Table 8.** *Cont*.

### *4.3. Effects of Ancillary Inputs*

Ancillary data is required to characterise the canopy structure, since it affects both the radiation transmission through the canopy [53], and hence albedo and radiation partitioning, as well as the surface aerodynamic properties [79]. In this study we have used a static land cover map at global scale to assign some standard values to each land cover type (Table 2). However, the large difference in spatial resolution between the S2 data and CCI map can lead to visible artefacts in the output fluxes when modelled at 20 m resolution, especially on the edges of two classes with different vegetation properties (e.g., croplands and forests). However, those spatial artefacts seem not to have any influence on the validation results. Nevertheless, some discrepancies were found between the land cover type flagged by the map and the actual type at the validation sites. In Majadas de Tiétar, CCI-LC flagged the site as cropland (CCI-LC = 11), thus *hc*,*MAX* = 0.5 m, *f<sup>c</sup>* = 1 and *l<sup>w</sup>* = 0.02 m, but actually this site is a savanna with 8 m tress at 20% coverage (CCI-LC = 30). In addition, the prescribed values that were assigned in Table 2 are very general, as they are trying to fit a global-based land cover legend. Therefore they can significantly deviate from the site's actual values. Indeed, all croplands were assumed to be not clumped (*f<sup>c</sup>* = 1) although row crops, like the vineyard in Sierra Loma, or orchards like the olive grove in Taous have very different canopy structure compared to a standard crop. Therefore, a significant improvement could be expected if a more area-specific surface characteristics parametrization was used, either using some ancillary remote sensing like SAR imagery or LiDAR or a regional/local oriented land cover classification.

To conclude, atmospheric forcing from numerical weather prediction models might add some uncertainty to the ET model compared to using local meteorological data, specially for precision agriculture where access to local agrometeorological stations is possible. In this study we relied on ERA5 reanalysis data and despite large discrepancy between spatial resolution of ERA5 (tens of kilometres) and point scale measurements from the towers there is a strong agreement for the most important meteorological parameters (Figure 3). Instantaneous shortwave irradiance, which was computed at the Sentinel 3 overpass time using ECMWF AOT and TCWV dataset, showed no systematic bias but a RMSE of 29 W m−<sup>2</sup> . This in turns directly effects on the accuracy of *Rn* (Tables 3–7), and indirectly that of *λE* since it is estimated as a residual of the energy balance. Therefore, errors in *λE* could be significantly reduced if more accurate inputs of irradiance were used, especially over temperate areas (i.e., radiation limited) which are more sensitive to uncertainty in available energy. On the other hand, the errors in both air temperature (RMSE = 1.8 K) and windspeed (RMSE = 1.3 m s−<sup>1</sup> ) affect mainly on the retrievals of sensible heat flux. This issue become more relevant in the estimation of *λE* over semi-arid (i.e., water limited) areas. For near-real-time applications it is necessary to use forecast or analysis data, instead of the ensemble mean reanalysis data, and those issues could become more evident.

**Figure 3.** Scatterplot between the input ERA5 ensemble meant reanalysis data and in situ measurements for the main atmospheric forcings. Data from all validation sites is shown on the same plot.

#### **5. Conclusions**

The multispectral shortwave images acquired by the Sentinel-2 satellites at 10–20 m spatial resolution are highly suitable for characterizing vegetation in order to derive inputs required for evapotranspiration models. At the same time, thermal data acquired daily by Sentinel-3 satellites is also suitable as input to the ET models. However, its low spatial resolution (1 km) needs to be increased if the models are to be run at the scale of predominant landscape features, which is usually on the order of tens of meters. This study evaluated three thermal-based remote sensing ET models (METRIC, ESVEP and TSEB-PT) and a thermal sharpening method (ensembles of modified Decision Trees) in order to derive methodology for operational estimates of water and energy fluxes using Sentinel data and applicable for the whole globe. Further evolution of the thermal sharpening methodology by using other data sources with high spatial resolution and variable temporal resolutions, e.g., Sentinel-1 radar [95] or Landsat thermal observations [96], is planned.

TSEB-PT produced overall the most accurate estimates in terms of sensible heat and latent heat (i.e., evapotranspiration) fluxes, being robust in different land covers and climates. Additional disaggregation step further improved TSEB-PT output accuracy in savanna ecosystems. Without any site-specific tuning and relying only on global datasets the methodology achieved RMSE of 80–90 W m−<sup>2</sup> for modelled instantaneous *H* and *λE* across eleven validation sites located in different land cover classes and climatic conditions. In an agricultural setting the modelled fluxes were more accurate with rRMSE of *λE* of around 0.3 which is of the same magnitude as uncertainty of the measured turbulent fluxes from the validation dataset. Until a new generation of thermal satellites are launched [97], the proposed methodology will be useful solution for overcoming the lack of thermal data with high spatio-temporal resolution required for operational ET modelling at field scale.

**Supplementary Materials:** The following are available at http://www.mdpi.com/2072-4292/12/9/1433/s1, Figure S1: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Sierra Loma, Figure S2: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Choptank, Figure S3: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Dahra, Figure S4: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Grillenburg, Figure S5: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Harvard, Figure S6: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Hyltemossa, Figure S7: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Klingenberg, Figure S8: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Majadas, Figure S9: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Selhausen, Figure S10: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Taous, Figure S11: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Kendall Grassland.

**Author Contributions:** Conceptualization, R.G. and H.N.; Data curation, R.G., H.N., I.S. and G.K.; Formal analysis, R.G., H.N., I.S. and G.K.; Investigation, R.G. and H.N.; Methodology, R.G. and H.N.; Software, R.G. and H.N.; Writing—original draft, R.G. and H.N.; Writing—review & editing, I.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the European Space Agency contract number 4000121772/17/I-NB.

**Acknowledgments:** We would like to thank all the PIs (listed in Table 1) of the flux towers used in this study for making their data available for model validation. We gratefully thank the TERestrial ENvironmental Observations (TERENO) for providing data at Selhausen site. Technical support from Pascal Fanise (CESBIO) and Zoubeir Majoub (Institut de l'Olivier) for the Taous tower dataset is gratefully acknowledged. Data collection at the Kendall Grassland site (AmeriFlux site: US-Wkg) was supported by USDA-ARS and by the U.S. Department of Energy. This study was conducted as part of the Sen-ET project (http://esa-sen4et.org/).

**Conflicts of Interest:** The authors declare no conflict of interest.
