1. Introduction
The fluxes of water (e.g., evapotranspiration, ET) and energy (e.g., latent and sensible heat) at the surface of the Earth are critical to quantify for many applications in the fields of climatology, meteorology, hydrology and agronomy. Easy access to reliable estimates of ET is considered a key requirement within natural resource management, and if ET can be estimated accurately enough it holds vast potential to assist current efforts to meet the UN Sustainable Development Goals (SDG), e.g., SDG2 (zero hunger) or SDG6 (clean water and sanitation) (https://sustainabledevelopment.un.org, last accessed 10 December 2018).
Water and energy fluxes show large spatio-temporal variability since they depend strongly not only on meteorological conditions, but also on characteristics and properties of the land surface, such as soil moisture/water availability, land cover type, and the amount and health of vegetation biomass. Remote sensing data can provide spatially distributed information about the land surface states and properties used to model these fluxes, and hence this technology addresses a key limitation of conventional point-scale observations when estimating fluxes at watershed and regional scales. In particular, thermal remote sensing has been widely used for assessing land surface turbulent fluxes [1]. While a variety of remote sensing ET methods and data options are available [2,3], none fully satisfies user needs for reliable, operational and easily accessible estimates and tools able to derive ET at agricultural-parcel scale. The limitations have so far primarily centred on the lack of suitable satellite-based input data sources.
With the recent launch of the Sentinel-2 and Sentinel-3 satellites, the data foundation for producing operational ET maps has been set, since as a constellation they provide most of the required spatial, temporal and spectral characteristics [4]. The Sentinel-3 Sea and Land Surface Temperature Radiometer (SLSTR) instrument acquires daily thermal infrared (TIR) information of the surface at ca. 1 km scale [5]. However, the reliable estimation of ET in agricultural and heterogeneous landscapes requires that the model’s spatial resolution matches the dominant landscape feature scale, usually tens or hundreds of meters. Sentinel-2, with a spatial resolution ranging from 10 to 60 m and a 5-day revisit time with Sentinel-2A and 2B combined [6], can resolve part of these scaling issues, although it lacks a high-resolution TIR instrument such as that on the Landsat missions. Therefore, sharpening [7,8,9] and/or disaggregation methods [10] are required to bridge the spatial gap between the currently available Sentinel constellation’s thermal-infrared (referred to as “thermal” in the remainder of the paper) and optical-shortwave (referred to as “shortwave” in the remainder of the paper) observational capabilities in order to optimally exploit the synergies of both types of sensors for field-scale ET estimation. The aim of this study is to develop an optimal combination of thermal sharpening and ET modelling methods for the derivation of field-scale ET with combined Sentinel-2 and Sentinel-3 observations.
Several data fusion methods have been proposed to merge low-resolution thermal infrared imagery with high-resolution shortwave imagery in order to obtain estimates of surface temperature and/or ET at high spatial resolution. In this study we focus on two different, but possibly complementary, approaches: empirical and semi-empirical methods that exploit relationships between shortwave bands and thermal or ET data (hereinafter called image sharpening methods); and physically-based ET downscaling methods (hereinafter called ET disaggregation).
Thermal image sharpening uses information from the thermal and shortwave images themselves to calibrate empirical or semi-empirical models. Those models relate coarse-resolution surface temperature (or ET) to coarse-resolution shortwave bands (or fine-resolution bands aggregated to coarse resolution). The calibrated model is then applied to the fine-scale shortwave image, producing either a sharpened surface temperature or directly an ET product.
One of the first attempts to sharpen surface temperature was TsHARP [11], in which different regression models between surface temperature and NDVI were tested. Since then, TsHARP has been used as the reference method for developing and testing other sharpening methods [8,12,13]. The Data Mining Sharpening (DMS) approach [8] used local and global regression trees between reflective bands and the surface temperature of homogeneous samples at coarse scale (selected with a coefficient of variation threshold). Residual analysis was performed to ensure energy conservation (based on emitted radiances) between the original-resolution and sharpened images. To avoid the overfitting of regression trees, as in DMS, the use of random forests was proposed instead [14]. Continuing with machine learning algorithms, Yang et al. [15] used an Artificial Neural Network with Genetic Algorithm and Self-Organizing Feature Mapping trained with different land surface parameters for each land cover class (vegetation, bare soil, urban and water). A different approach used an unmixing method to derive brightness temperature and emissivity at fine scale [16]. The unmixed brightness temperature and emissivity were then the inputs to a generalized split-window algorithm to retrieve fine-resolution surface temperature.
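The general sharpening workflow described above (calibrate at coarse scale, apply at fine scale, then correct residuals) can be illustrated with a minimal sketch. It is not the TsHARP or DMS implementation: it assumes a single linear regression against NDVI, a simple arithmetic mean as the aggregation operator, and a residual correction applied directly to temperature rather than to emitted radiance. The names `fit_line` and `sharpen` and the nested-list data layout are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def sharpen(coarse_temp, fine_ndvi_blocks):
    """Sharpen coarse temperatures with fine-scale NDVI (illustrative only).

    coarse_temp[i]      : temperature of coarse pixel i (K)
    fine_ndvi_blocks[i] : NDVI of the fine pixels inside coarse pixel i
    """
    # 1. Aggregate the fine-scale predictor to the coarse grid (assumed: mean).
    coarse_ndvi = [mean(block) for block in fine_ndvi_blocks]
    # 2. Calibrate the empirical model at coarse scale.
    a, b = fit_line(coarse_ndvi, coarse_temp)
    sharpened = []
    for t_coarse, block in zip(coarse_temp, fine_ndvi_blocks):
        # 3. Apply the calibrated model to every fine pixel.
        fine = [a + b * ndvi for ndvi in block]
        # 4. Residual correction: shift the block so that its mean
        #    reproduces the original coarse observation.
        residual = t_coarse - mean(fine)
        sharpened.append([t + residual for t in fine])
    return sharpened


coarse = [310.0, 300.0, 305.0]
blocks = [[0.2, 0.3], [0.7, 0.8], [0.4, 0.6]]
fine_temps = sharpen(coarse, blocks)
```

After the residual step, the mean of each block of fine pixels equals the coarse observation it was derived from, which is the consistency property the residual analysis is meant to enforce.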
Contextual algorithms can also be applied in sharpening, as is the case of DISPATCH-LST (DISaggregation based on Physical And Theoretical scale CHange) by Merlin et al. [7], who used shortwave information on fractional vegetation cover and fractional photosynthetically active vegetation cover in contextual scatterplots of fractional green vegetation cover versus surface temperature and albedo versus surface temperature to define minimum and maximum soil and canopy endmember temperatures. Finally, two or more different methods can be combined through weighted averaging, such as in Chen et al. [17], who combined TsHARP and a Thin Plate Spline interpolation by weighting their corresponding residuals. Besides the fact that all methods described above can also be used to sharpen ET, other studies have already suggested methods to directly downscale coarse-scale ET using shortwave data [18,19,20,21]. In any case, shortwave images provide limited information related to some surface energy balance processes, such as turbulent transport, soil moisture, and meteorological forcing. Therefore, ancillary variables could be included in surface temperature or ET sharpening, such as land cover maps (to account for different aerodynamic roughness), local meteorology, or surface geometry [22].
A previous study [4] found that using a “disaggregation” approach [10,23] significantly enhanced the accuracy of turbulent fluxes derived with sharpened surface temperature. That approach ensures spatial consistency between fluxes derived at fine and coarse spatial scales by first estimating them at the coarse scale at which the thermal observations were acquired. In the following step, the low-resolution air temperature is varied to adjust the flux estimates for all high-resolution pixels falling within one low-resolution pixel. This is repeated until consistency between the two scales is obtained. The approach assumes that the coarse-scale estimates are of higher accuracy because they are derived with surface temperature at its original spatial resolution. The disaggregation was shown to improve ET model skill when compared with outputs produced at either coarse or fine resolution alone [4,23,24].
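The iterative adjustment described above can be sketched as a one-dimensional search on air temperature. The example below is a toy illustration under strong assumptions: the flux model is a bare bulk-resistance expression rather than a full energy balance model, aggregation is a simple mean, and bisection is used as the search strategy. The function names, the constant `RHO_CP` and the default search bounds are all hypothetical.

```python
from statistics import mean

RHO_CP = 1200.0  # assumed volumetric heat capacity of air, J m-3 K-1


def sensible_heat(t_surface, t_air, resistance):
    """Toy flux model: bulk resistance equation for sensible heat (W m-2)."""
    return RHO_CP * (t_surface - t_air) / resistance


def disaggregate(h_coarse, fine_temps, fine_resistances,
                 t_air_lo=250.0, t_air_hi=350.0, tol=0.01, max_iter=100):
    """Vary air temperature until the mean fine-scale flux matches h_coarse."""
    lo, hi = t_air_lo, t_air_hi
    for _ in range(max_iter):
        t_air = 0.5 * (lo + hi)
        fluxes = [sensible_heat(t, t_air, r)
                  for t, r in zip(fine_temps, fine_resistances)]
        diff = mean(fluxes) - h_coarse
        if abs(diff) < tol:
            break
        if diff > 0:
            lo = t_air  # fine-scale fluxes too high: warmer air lowers them
        else:
            hi = t_air  # fine-scale fluxes too low: cooler air raises them
    return t_air, fluxes


temps = [300.0, 305.0, 310.0]        # sharpened surface temperatures (K)
resistances = [50.0, 40.0, 60.0]     # aerodynamic resistances (s m-1)
h_target = mean([sensible_heat(t, 295.0, r)
                 for t, r in zip(temps, resistances)])
t_air, fine_h = disaggregate(h_target, temps, resistances)
```

Bisection works here because the toy flux decreases monotonically with air temperature; a real energy balance model with stability corrections may need a more careful search.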
The sharpened surface temperature can be used as input to land-surface energy flux models. The latent heat flux (or energy used for ET) can be estimated as the residual of the surface energy budget, using estimates of the net radiation, soil heat flux (G) and sensible heat flux (H). The thermal-based ET models were originally formulated for computing H, which is governed by the bulk resistance equation for heat transfer [25] and is driven by the gradient between an ensemble surface temperature, called the “aerodynamic surface temperature”, and the surface layer air temperature. Besides the estimation of that surface-to-air temperature gradient, the estimation of H requires the modelling of an aerodynamic resistance term, which can be viewed as a simplification of the complex turbulent transport of heat, momentum and water vapour by analogy with Ohm’s law for electric transport. These resistances therefore represent how efficiently a scalar (heat, momentum or water vapour) is transported from one point to another following a gradient (i.e., vertical differences of temperature and/or vapour pressure). Several formulations and/or parametrizations have been proposed to describe these turbulent transport processes, but generally they include variables related to surface aerodynamic roughness, wind speed, wind attenuation through the canopy, and atmospheric stability [26].
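As a worked illustration of the two relations described above, the sketch below computes H from the bulk resistance equation and then the latent heat flux as the residual of the energy budget. The air density and specific heat defaults are assumed typical values, and the input numbers are invented for the example only.

```python
def sensible_heat_flux(t_aero, t_air, r_a, rho=1.2, c_p=1004.0):
    """Bulk resistance equation: H = rho * c_p * (T_aero - T_air) / r_a.

    t_aero, t_air : aerodynamic surface and air temperature (K)
    r_a           : aerodynamic resistance (s m-1)
    rho, c_p      : air density (kg m-3) and specific heat (J kg-1 K-1),
                    assumed typical near-surface values
    """
    return rho * c_p * (t_aero - t_air) / r_a


def latent_heat_flux(net_radiation, g, h):
    """Latent heat flux as the residual of the surface energy budget (W m-2)."""
    return net_radiation - g - h


# Invented example values: 5 K surface-to-air gradient, 50 s m-1 resistance.
h = sensible_heat_flux(t_aero=303.0, t_air=298.0, r_a=50.0)
le = latent_heat_flux(net_radiation=500.0, g=80.0, h=h)
```

Because the latent heat flux is obtained as a residual, any error in the temperature gradient or in the resistance propagates directly into the ET estimate.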
The challenge in resistance energy balance models is that the aerodynamic surface temperature cannot be directly estimated by remote sensing [27,28]. Hence, remote sensing ET models differ from each other in how they account for the difference between the radiometric temperature observed by satellite sensors and the aerodynamic surface temperature. Single-source or bulk transfer schemes for modelling H treat soil and canopy as a single flux source and often employ an additional resistance term (usually dependent on the Stanton number) because heat transport from the land surface is less efficient than momentum transport (see e.g., Garratt and Hicks [29] or Verhoef et al. [30]). Appropriately calibrated, one-source energy balance (OSEB) models have shown satisfactory estimates of surface energy fluxes in heterogeneous landscapes [31,32,33,34]. However, due to the difficulty in robustly and parsimoniously parametrizing this additional resistance for OSEB schemes across different landscapes, climates, and observational configurations [35], the two-source energy balance (TSEB) modelling approach was developed [36]. TSEB models partition the surface energy fluxes and the radiometric temperature between nominal soil and canopy sources, and include a more physical representation of processes related to the radiometric and aerodynamic temperatures, without requiring any additional input information beyond that needed by single-source models that use more sophisticated parametrizations of the additional resistance. However, because direct measurements of canopy and soil temperatures are rarely available, in most applications these component temperatures are derived from a measurement of the bulk surface radiometric temperature. Partitioning the radiometric temperature between canopy and soil temperatures requires some assumptions related to the evaporative efficiency of soil or canopy [36,37,38].
Finally, like all remote sensing retrievals, satellite radiometric temperature is prone to uncertainty due to sensor noise, surface emissivity and atmospheric effects. To overcome this issue in ET estimation, several methods have been proposed that rely on contextual models [39,40,41], on constraining the ET range between hot (no ET) and cold (potential ET) pixels [31,32], or on the time-differenced morning temperature rise [42,43]. All contextual methods require homogeneous forcing and homogeneous coupling between land surface and atmosphere, which is a disadvantage when they are applied at large scales. In addition, those models assume that the coldest pixel in the image represents potential transpiration and the hottest pixel represents zero transpiration, which is not always the case (e.g., in humid and sub-humid areas).
In this study we evaluate three different ET models driven by Sentinel-2 and Sentinel-3 imagery: METRIC [32], a one-source energy balance model that is less sensitive to the parametrization of the heat transfer coefficient than other OSEB models such as SEBS [33]; TSEB-PT [36], a widely used two-source energy balance model; and ESVEP [44], a hybrid contextual and two-source energy balance model.
3. Results
The overall performance of the tested models using sharpened temperatures from the Decision Trees regressor is shown in Table 3. Scatter plots of modelled versus measured fluxes for all the validation sites are in the
Supplement. We removed all the cases in which the S3 image was contaminated by clouds in the vicinity of the flux towers or in which the SLSTR view zenith angle was larger than 45 degrees. In addition, we filtered all cases with low estimated available energy (below a fixed threshold in W m⁻²), assuming that noisy outputs will be produced under such conditions, as well as those yielding unrealistic fluxes during daytime (≤−500 W m⁻² and ≥1000 W m⁻²). After filtering the data, more than 400 cases were available overall for the following analyses. However, it is worth noting that ESVEP yielded significantly fewer valid retrievals. This might be because ESVEP’s end-member estimation equations were designed and parametrised for herbaceous crops [44], while in this study they were applied to varied land covers. All models returned a similar performance regarding the estimation of net radiation, with mean bias between −10 and −24 W m⁻², RMSE ranging between 49 and 59 W m⁻² and r above 0.91. This similar behaviour is explained by the fact that all models share the same approach and the same inputs in modelling net shortwave radiation, which is the component with larger magnitude of
net radiation. Likewise, G showed similar behaviour, although in METRIC it is computed differently, as a function of surface net radiation [31,32], as opposed to TSEB and ESVEP where, as two-source models, G is computed from the net radiation of the soil [36,44].
The main differences in model performance are therefore in the estimation of turbulent fluxes (i.e., sensible and latent heat fluxes). The TSEB models (TSEB-PT and disTSEB) usually produced the most accurate estimates in terms of RMSE (≈80 W m⁻², 45% relative error, for H; and ≈90 W m⁻², 45% relative error, for latent heat flux) and the highest correlation between observed and predicted values (≈0.67 for H and ≈0.76 for latent heat flux). disTSEB performs slightly better than TSEB-PT but the difference is not significant. For METRIC and ESVEP, the RMSE values are in all cases higher than 120 W m⁻² (going as high as 220 W m⁻² in the case of H modelled with ESVEP), with lower correlation (≤0.47).
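For readers who wish to reproduce comparable accuracy statistics from paired tower and model fluxes, a minimal sketch is given below. It assumes that bias is the mean of modelled minus observed values and that the relative RMSE is the RMSE divided by the mean observed flux; these definitions are assumptions of this example, and the sample numbers are invented.

```python
from math import sqrt
from statistics import mean


def validation_stats(observed, predicted):
    """Return bias, RMSE, Pearson r and relative RMSE for paired fluxes."""
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = mean(errors)
    rmse = sqrt(mean([e * e for e in errors]))
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r = cov / sqrt(var_o * var_p)
    # Assumed definition: RMSE normalised by the mean observed flux.
    return {"bias": bias, "rmse": rmse, "r": r, "rrmse": rmse / mo}


# Invented sample of observed and modelled fluxes (W m-2).
stats = validation_stats([100.0, 200.0, 300.0], [110.0, 190.0, 320.0])
```

With this normalisation, a relative RMSE above 1 means that the typical error exceeds the mean observed flux.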
The choice of closing the energy balance gap in field measurements by assigning it to latent heat flux influences the above results. Therefore, in Table 4 we also present the accuracy statistics of the turbulent fluxes when the Bowen ratio is preserved during the energy gap closure procedure. The overall ranking of the models is preserved, with the TSEB models still obtaining the lowest RMSE and highest correlation coefficients. However, the differences between the models (particularly in RMSE) are not as large as in Table 3. In particular, the RMSE of the TSEB models increases significantly and r decreases, while the influence of the closure method on the other two models is much weaker, with the RMSE of ESVEP even decreasing slightly. In subsequent analysis we always assign the residual energy to latent heat flux.
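The two closure options compared above can be written compactly. The sketch below assumes the standard formulations: in the residual method the whole gap is added to the measured latent heat flux, and in the Bowen ratio method both turbulent fluxes are scaled by the same factor so that their ratio is unchanged. Function names and input values are hypothetical.

```python
def close_residual_to_le(net_radiation, g, h, le):
    """Assign the whole closure gap to latent heat flux; H is unchanged.

    The measured le is not needed for the result and is kept only so that
    both closure functions share the same signature.
    """
    return h, net_radiation - g - h


def close_bowen_ratio(net_radiation, g, h, le):
    """Scale H and LE by a common factor so that H / LE is preserved."""
    turbulent = h + le
    if turbulent == 0:
        raise ValueError("H + LE is zero; the Bowen ratio closure is undefined")
    scale = (net_radiation - g) / turbulent
    return h * scale, le * scale


# Invented tower measurement with a 100 W m-2 closure gap.
h_res, le_res = close_residual_to_le(500.0, 50.0, 150.0, 200.0)
h_br, le_br = close_bowen_ratio(500.0, 50.0, 150.0, 200.0)
```

In this invented case the residual method raises the latent heat flux from 200 to 300 W m⁻² and leaves H at 150 W m⁻², whereas the Bowen ratio method raises both fluxes. The reference values against which the models are scored therefore differ between the two methods.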
In order to evaluate the sensitivity and uncertainty of the models with respect to vegetation type, we have split the results of Table 3 into four main vegetation types that differ in aerodynamic roughness, horizontal homogeneity and/or seasonal dynamics/senescence (i.e., croplands, grasslands, savannas and forests, Table 5). Similar to the overall results, the TSEB models output the most accurate turbulent fluxes across all four vegetation types. They obtain their best results for H in grassland and for latent heat flux in cropland (RMSE and r values in Table 5). In grassland and cropland TSEB-PT and disTSEB produce very similar fluxes, while in savanna disTSEB improves the accuracy of modelled H and latent heat flux by up to 10 W m⁻². METRIC has its best overall performance in savanna (RMSE of 132 W m⁻² and r of 0.43 for H; RMSE of 99 W m⁻² and r of 0.61 for latent heat flux), followed by cropland, while ESVEP produces inaccurate H in all vegetation types (rRMSE > 1) and its best overall latent heat flux in grassland. It should also be noted that, for all models, the RMSE of net radiation in savanna (≈65 W m⁻²) is double that in the other land cover types. This is because vegetation is sparsest at those sites, meaning that uncertainties in the estimation of soil albedo and emissivity have the biggest influence on shortwave and longwave net radiation, respectively. Finally, very few valid cases are available to evaluate the forest sites and hence the results are not very conclusive, with the TSEB models again outperforming METRIC and ESVEP.
The agriculture class was further split into herbaceous and woody types, with results shown in Table 6. The former sub-class represents crops such as corn, soybean or wheat, while the latter represents olive groves and vineyards. TSEB models produce the most consistent results for both types of crops, although somewhat surprisingly the RMSE of latent heat flux in woody crops (76–79 W m⁻²) is significantly lower than in herbaceous crops (91–93 W m⁻²), while the opposite is the case for the RMSE of H (69–71 W m⁻² in herbaceous crops and 91–94 W m⁻² in woody crops). The rRMSE of latent heat flux in both agricultural sub-classes was 0.32, which is of the same magnitude as the energy closure gap at the validation sites (e.g., the mean value at CH was 0.34 at the times at which fluxes were modelled). METRIC clearly performs better in woody crops, while ESVEP obtains better results for H in herbaceous crops and better results for latent heat flux in woody crops. It is also worth noting that net radiation and G showed larger relative errors in woody crops than in herbaceous crops, since woody canopies are more complex and therefore more difficult to capture with the models and/or parametrizations used [89,90].
Finally, Table 7 lists the model performance depending on whether sites are under Mediterranean and semi-arid climate (i.e., water-limited sites) or under temperate climate (i.e., energy-limited sites). First of all, it is worth noting that, due to cloud cover, more valid cases are obtained in semi-arid conditions than in temperate areas. TSEB models showed a similar range of errors in both climatic conditions, with RMSE of latent heat flux at around 85 and 99 W m⁻² for semi-arid and temperate conditions, respectively, and correspondingly around 80 and 70 W m⁻² for H. ESVEP and METRIC yielded results that varied more between climates, with METRIC producing more accurate estimates of both H and latent heat flux in semi-arid conditions, and ESVEP showing better performance for H in temperate climates and better performance for latent heat flux in semi-arid climates.