*2.3. Downscaling Approach*

Bisquert et al. [1] tested different downscaling methods with pairs of Landsat/MODIS images in this Barrax area. A modification of the sharpening method presented by Agam et al. [19] showed the best results. This method is based on the linear relationship established between NDVI and LST at the MODIS 1000 m resolution (NDVIMOD and LSTMOD, respectively). The approach outlined by Bisquert et al. [1] has been revised and adapted to the combination MODIS-S2 and used in this work as a basis to derive 10-m LST maps.

The flowchart in Figure 4 shows the main steps and calculations of this downscaling algorithm that can be summarized as follows:


**Figure 4.** Flowchart of the downscaling methodology, including the different processing steps, inputs and outputs. Variable descriptions are included in the text.

This downscaling procedure was applied to pairs of MODIS-S2 images. When no Sentinel-2 image was available concurrent with the MODIS overpass, close in time images (±1 day) were used, under the assumption of minimum changes in NDVI. Note the normalization procedure applied in step 3 reduced possible differences at this point.

The assessment of the MODIS-S2 downscaling method was carried at both local and distributed scales, by comparison with ground measurements and Landsat-7/ETM+ LST products, respectively. In this last case, the comparison was established at the 30-m spatial resolution provided by U.S. Geological Survey (USGS). Following Gao et al. [20], the aggregation of 10-m S2 LST was done through the Stefan-Boltzmann law, with the assumption of similar emissivity values for adjacent pixels. Some differences may arise due to the 20–30 min delay in acquisition time between MODIS and ETM+ sensors. A normalization procedure was applied to minimize these discrepancies in LST values [1]. A linear regression between the aggregated Landsat and MODIS images at 1000 m was obtained, and then applied at 30-m spatial resolution, for each pair of MODIS-ETM+ images.

The model performance was quantified in terms of classical statistical metrics, such as the determination coefficient (r<sup>2</sup> ), the root mean square difference (RMSD), the systematic difference parameter (Bias), the mean absolute deviation (MAD), or the mean absolute deviation in percentage (MADP) [40]. Following Schneider et al. [41], other statistics considered more robust and less influenced by outliers were also calculated, such as the median bias (Me), robust standard deviation (RSD) and robust RMSD (R-RMSD). The skewness and kurtosis were also included, which quantitatively describe the distribution of the differences between the estimated and observed values.

#### **3. Results**

#### *3.1. Ground Validation*

The field scale assessment was performed using the ground data as a reference. IRT radiometric temperatures were corrected from atmospheric and emissivity effects, and average values for each 10-min transect/field were calculated. A total of 51 LST<sup>g</sup> data were available for this study (Table 1). Measured LSTg values were in the range 297–327 K. The lowest values were observed for fully vegetated crops (grass, potato, or maize), whereas the largest values corresponded to bare soil, soil dominated

crops (vineyard and almond orchard) and senescence cereals. Standard deviation of LST<sup>g</sup> data per crop field were <±1.5 K in 90% of the dataset, with a maximum value of ±1.9 K, showing the thermal homogeneity of the fields and the thermal stability during the 10-min interval.

The methodology described above was applied to the six pairs of MODIS-S2 images listed in Table 1. Mean values of 5 × 5 high-resolution pixel arrays, centered in the location of the ground transects, were calculated and plotted against LSTg (see Figure 5). Disaggregated LST values ranged from 302 to 322 K, pointing to a certain limitation of the disaggregation technique to reproduce extreme low and high temperatures. Values of the standard deviation for the 5 x 5 pixel averages were <± 2.0 K.

All parcels in this study were provided with sprinkler irrigation system, except the vineyard and almond orchard, where drip irrigation was supplied. Irrigation was scheduled and frequently applied during the study period. For a few hours after an irrigation event, a cooling effect occurs consequence of the wetted surface. This effect is stressed when sprinklers are used. This was the case of 20% of our dataset, with 12 ground transects collected just a few hours/minutes after irrigation events. These points are plotted with non-filled circles in Figure 5. Note the evident overestimation of the disaggregated LST compared to LST<sup>g</sup> values, with differences >10 K in some cases. These results reinforce Bisquert et al. [26] findings pointing a shortcoming of the method over wet soil areas.

Certain limitations were also observed for the highest LST values. By stablishing a threshold of 325 K, only five data were excluded corresponding to fallow and tilled barley or poppy.

**Figure 5.** Linear regression between disaggregated LST10m and ground-measured values (LSTg). Crop fields under recently irrigated conditions are labeled with a non-filled circle. Outlier high LST values are labeled with triangles. Dashed line represents the 1:1 agreement.

Focusing on LST data lower than 325 K, and excluding points corresponding to recently irrigated conditions, a good agreement (r<sup>2</sup> = 0.90) is observed between disaggregated LST10m and LST<sup>g</sup> values (Figure 5). Differences range between ±4.0 K, with a systematic deviation of 0.2 K and a RMSD value of ±2.2 K (see Table 2). The kurtosis values (~−1) indicate a behavior close to the normal distribution, while the negligible skewness observed indicates a LST-difference distribution closely centered at 0.

The plot in Figure 6 superposes results obtained running the Bisquert et al. [1] algorithm. Good agreement is also observed by this original formulation, although some scatter is added, with an increase in the RMSD value up to ±2.7 K in this case.

MODIS LST values are superposed to plot in Figure 6 too, showing a large scatter (RMSD=±8.0 K) and discrepancies >10 K. This deviation is stressed for low temperatures registered in small size vegetated parcels that are surrounded by bare soil or other croplands with higher LST. This effect was already observed by Bisquert et al. [26].

−

**Figure 6.** Disaggregated LST10m (dots) and Moderate Resolution Imaging Spectroradiometer (MODIS) LST (crosses) versus ground LST measurements (LSTg). Non-filled dots represent results using the original Bisquert et al. [1] formulation of the downscaling approach. Error bars are not included in this figure for cleanliness. Dashed line represents the 1:1 agreement.

**Table 2.** Quantitative analysis of the differences between disaggregated LST or MODIS LST, and ground-measured LST data. The statistics include: mean bias (Bias); standard deviation (SD); mean absolute deviation (MAD); Mean Absolute Deviation in Percentage (MADP), obtained as the MAD divided by the mean observed value; root mean square difference (RMSD); coefficient of determination (r2 ); median bias (Me); robust standard deviation (RSD); robust RMSD (R−RMSD); skewness (S); and kurtosis (K).


#### **in ax Bi as D A D**  *3.2. Distributed Assessment*

**S**

**D P**  **M S**

**r 2** **M e** 

**N = 34** 

> **L S**

**M**

**M**

**(K ) (K ) (K ) (K ) (K ) ( % ) D (K ) (K )**  −3 .6 3. 9 0. 2 2. 2 1. 9 0. 6 2. 2 0. 90 0. <sup>2</sup>2.8 2.8 -0.04 -1.2 Beyond the ground validation at a field scale, the model performance was assessed at a larger distributed scale by using the three concurrent Landsat-7/ETM+ images as a reference. The Single-Band Atmospheric Correction (SBAC) tool, recently introduced by Galve et al. [39], was used in this work for the correction of the TIR data.

**RSD (K) R-RMSD (K) S K** 

Prior to the *downscaling* assessment, the feasibility of the Landsat-derived LST data needs to be tested. Ground data were also used to evaluate the Landsat-7/ETM+ LST estimates. The plot in Figure 7 shows the comparison between estimated and ground-measured LST values for a total of 21 data available for the three Landsat dates/images. Differences ranged within ± 3.5 K, except four cases corresponding to 1.1 and 1.2 sites. Note that these are small size parcels (<2 ha), for which the spatial resolution of Landsat 7/ETM+ is not fine enough, resulting in an overestimation of the LST values in these vegetated targets (potato and grass). Excluding these data from the analysis, a good matching with the 1:1 line was observed, with a coefficient of determination of r<sup>2</sup> = 0.96. An average error of RMSD = ±1.8 K was obtained. These results are in agreement with those reported by Galve et al. [39], using data from 2015–2016 in this same agricultural area. A RMSD value of ±1.6 K was obtained by these authors using ground LST data measured in six of the crop fields within "Las Tiesas" experimental farm also used in the present work.

Note significant differences in terms of LST between MODIS and high-resolution sensors (Landsat or ASTER) up to 2–3 ◦C have been reported, induced by difference in the retrieval algorithm, atmospheric correction, sensor performance, acquisition time, view geometry, or spectral response function [10,42–45]. Weng et al. [44] pointed out that comparison of thermal data from different sensors requires some pre-processing procedure. In this work, the differences in the spectral characteristics and overpass time (20–30 min difference) between Landsat and MODIS were minimized by applying a normalization process to the Landsat bands [1]. The disaggregated LST10m were aggregated to the equivalent 30 m Landsat pixels (LST30m) by a 3 × 3 pixel averaging, based on the Stefan-Boltzmann law, following Gao et al. [20].

**Figure 7.** Linear adjustment between L7-ETM+ estimates and ground-measured LST values. Data collected in Sites 1.1 and 1.2 (field extension <2 ha) are labeled with non-filled dots. Dashed line represents the 1:1 agreement.

Figure 8 shows the comparison between the original 30-m LST derived from L7-ETM+, and LST disaggregation products at 10-m spatial resolution, for a subset of 10 × 10 km<sup>2</sup> centered in the "Las Tiesas" experimental farm. Visual inspection points the significant improvement in the capacity to discriminate the different field borders. Although the real potential of the downscaling approach is revealed when focusing on parcels <5 ha, where 3 × 3 thermal pixels of L7-ETM+ can be hardly fit in, being these areas the main responsible of the scatter (r<sup>2</sup> = 0.82) observed in the regression plot in Figure 9.

To quantify the performance of the downscaling approach at a full scene perspective, pixel-to-pixel differences were calculated at the 30 m spatial resolution for the selected subset of 10 × 10 km<sup>2</sup> (Figure 9). Statistical metrics of the differences are listed in Table 3. Considering more than 270,000 pixel/data, an average RMSD of ±2.0 K was obtained, with a minor overestimation of 0.3 K.

**Table 3.** Quantitative analysis of the differences plotted in Figure 9. Statistical metrics as defined in Table 2.


**Average** 274643 0.3 1.9 1.4 0.5 2.0 0.82 0.010 2.1 2.1

**Figure 8.** Disaggregated LST10m (left column), LST derived from original 30 m L7-ETM+ Thermal InfraRed (TIR) band (center), and MOD11A1\_LST product (right column). Examples corresponding to dates 9 July 2018 (up), 16 July 2018 (middle), and 25 July 2018 (bottom).

**Figure 9.** Disaggregated LST30m versus L7-ETM+ LST for the full dataset (left). Dashed line represents the 1:1 agreement. Histograms of pixel-to-pixel differences between disaggregated LST30m products and LST estimates from L7-ETM+ images for three concurrent MODIS and Landsat-7 overpasses (right).

LST diff. (K)

#### **4. Discussions**

Agromomy management decisions based on TIR data need confidence in LST estimates. An absolute uncertainty <1.5 K is traditionally reported as a requirement [7,46]. The translation of this uncertainty to ET accuracy depends on the model but ranges between 10% and 20% [47,48]. Based on this threshold, the results obtained in this work (average RMSD = ±2.2 K) are encouraging. Additionally, the 10-m pixel size and the revisit frequency of the MODIS data, much better than the 3–5 days revisit frequency of S2A and S2B satellites, can fulfill the LST input requirements of the surface energy balance methods for a variety of hydrological, climatological or agricultural applications. At this point, no significant differences in the model performance were observed in connection with the collocation delay between the MODIS overpass and the Sentinel-2 image used, i.e., ±1-day mismatch seems not to have had an effect on the disaggregation.

With the new treatment of the residuals (RLST) introduced in this work, as part of the downscaling scheme, an improvement around 20% was obtained in the performance of the original formulation of the model [1] that simply included a Gaussian filter, as suggested by Anderson et al. [2].

Ground validation results are in agreement with those obtained by Bisquert et al. [26] using MODIS-Spot 5 pairs in this case. These authors reported a bias of 0.2 K and a RMSD of ±2.4 K based on the comparison between disaggregated 10-m LST and ground-measured LST values for 10 different dates and five different fields. Regarding the distributed assessment, our results are similar to the RMSD value of ±2.6 K reported by Bisquert et al. [26] at a scene scale. In a first work using MODIS-Landsat combination [1], these authors reported an average RMSD = ±2.0 K for disaggregated 60-m LST in this case.

In this context, Agam et al. [19] tested a sharpening model (TsHARP) over extensive corn/soybean fields in central Iowa, USA. RMSD values between ±0.7 and ±1.4 ◦C were obtained by sharpening down simulated MODIS thermal maps at 1000 m to 250 m and between ±1.8 and ±2.4 ◦C by sharpening simulated thermal Landsat maps from 60 and 120 m to a VNIR 30 m resolution. Also using TsHARP, Duan and Li [49] disaggregated MODIS LST from 1000 m to 90 m, with an uncertainty of ±2.7 ◦C. Jeganathan et al. [14] tested TsHARP from MODIS over a heterogeneous agricultural landscape in India, and found uncertainties ranging ±2–3 K, using ASTER thermal data as a reference. Eswar et al. [23] used a thermal sharpening technique with five different indices to downscale MODIS LST from 960 m to 120 m, and compared this with the Landsat 7 LST data at different sites in India. These authors found that NDVI/FVC showed better result for wet areas, whereas the Normalized Difference Water Index (NDWI) was found better for dry areas. Yang et al. [17] used the multiple linear regression models to downscale the aggregated Landsat TIRS (360 m) image to 90 m, using a relation of LST with multiple scale factors in an area of mixed land covers (water, vegetation, bare soil, impervious surface), and then compared with the pure Landsat LST. The result found was satisfactory with coefficient of determination of 0.87 and RMSD of ±1.13 K. Merlin et al. [10] used a time series of higher resolution Formosat-2 images to test a new disaggregation procedure of kilometric thermal data over an irrigated cropping area in northwestern Mexico during an agricultural season. RMSD values about ±3 ◦C were obtained by these authors.

Many of these previous studies already pointed larger uncertainties in disaggregated LST over irrigated lands [1,10,26]. This is a weekness that remains in the present work since, although affected, VNIR reflectivity data does not fully capture the cooling effect produced in a wetted surface. Therefore, the downscaling technique still fails at reproducing LST values for spots with an undergoing irrigation or recently irrigated targets. In an attempt to face these limitations, some works incorporate additional reflectance information in the regression algorithms. Gosh and Joshi [22] tested several regression algorithms using EO1-Hyperion hyperspectral data over different land use land cover scenes. These authors used three pairs of coincident Hyperion and Landsat 7-ETM+ images as a reference for the assessment. Liu et al. [24] compared the performances of a thermal disaggregation technique, based on three different indices: temperature vegetation dryness index (TVDI), NDVI and fractional vegetation cover (FVC), over a humid agriculture region. These authors found the smallest

RMSD using TVDI, with an improvement of 0.2 K in comparison to the results obtained using NDVI or FVC. A similar reduction of the uncertainty in 0.2 K was obtained by Amazirh et al. [25], thanks to the inclusion of Sentinel-1 radar data, linked to surface soil moisture, in a new formulation to improve the LST disaggregation methodology. These authors used Sentinel-1 imagery to derive 100-m resolution LST, and the results were compared with Landsat LST, used as a reference over two heterogeneous sites (irrigated and rainfed). However, average RMSD values for the six dates of study resulted over ±3.0 K, with even worse accuracy during summer. So, further efforts are still required to improve this soil moisture integration. Further works should also explore the inclusion of additional Sentinel-2 bands in the shortwave infrared (SWIR) in the sharpening scheme, since they might account for vegetation and soil water content [50].

Another finding is this work is the difficulty of the downscaling approaches to reproduce excepcionally high LST when these conditions are constrained to small parcels in the image, and there is a lack of coarse original MODIS pixels showing this homogeneous thermal conditions. The modeled relationship LST-VNIR reflectivities may not fully cover these conditions, leading to an underestimation of LST.

Focusing on the combination of S3-S2 images, very few quantitative studies have been conducted. In a first attempt, Huryna et al. [28] applied TsHARP sharpening. These authors reported LST differences of ±1.3–1.5 K, when compared with sharpened to 60-m S3 temperature with reference Landsat 8 temperature at 60 m, with a positive bias of 0.3–0.6 K, depending on the study site. However, the lack of local measurements prevented these authors from conducting a ground assessment. Further research should merge Sentinel-2 and Sentinel-3 imagery and conduct robust assessment of the downscaling results. Collections of ground LST measurements under a variety of surface and environmental conditions are then required, and the dataset gathered in the framework of this work is potentially attractive for this aim.

#### **5. Conclusions**

This work adds to the previous literature dealing with thermal infrared downscaling. The 10-m LST maps generated from the combination MODIS-S2 can contribute to fill the gap until high spatial-temporal resolution TIR images are available. The linear relation NDVI-LST was adopted as a basis for the downscaling approach. Results obtained encourage the parametrization of the residual as a function of the NDVI as a key step in the algorithm (an improvement of 0.5 K was achieved). The variety of surface conditions and the wide range of NDVI and LST values in the semi-arid area of Barrax allowed a robust assessment of the downscaling approach. An average estimation error of ±2.2 K in LST10m resulted from the ground validation. This evaluation was reinforced by the pixel-to-pixel comparison of rescaled LST30m with Landsat-7/ETM+ LST estimates, showing a similar RMSD of ±2.0 K for the distributed assessment.

Findings in this study highlight the limitations of the methodology to capture the variability of extreme LST, and the problems in recently sprinkler irrigated fields. Results indicate the need for caution, since disaggregated LST under these conditions may result artificially higher than expected.

Despite the weaknesses, this work gives promising insights for the adaptation of this methodology to the tandem S3-S2 in coming works. Further research could also benefit from the ground LST dataset introduced in this paper for a comprehensive performance assessment.

Finally, note that the benefits of this research may extend to other applications, such as monitoring volcanic activity and wildfire, estimating evapotranspiration or assessing drought severity.

**Author Contributions:** Conceptualization, J.M.S.; Methodology, J.M.S. and J.M.G.; Resources, J.G.-P., R.L.-U. and A.C.; Software, J.M.G.; Validation, J.M.S., J.M.G. and R.N.; Writing—original draft, J.M.S.; Writing—review & editing, J.G.-P., R.L.-U., R.N. and A.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Spanish Economy and Competitiveness Ministry, together with FEDER funds (project CGL2016-80609-R), the Education and Science Council (JCCM, Spain) (project SBPLY/17/180501/000357), the European Commission (SUPROMED project, N◦ 1813), and Juan de la Cierva Research Grant of Galve (FJCI-2015-24876).

**Acknowledgments:** We thank the ITAP-FUNDESCAM and the Thermal Remote Sensing Group of University of Valencia for the logistic support in the experimental campaigns (MINECO-FEDER, EU projects AGL2017-83738-C3-3-R and CGL2015-64268-R).

**Conflicts of Interest:** The authors declare no conflict of interest.
