*3.2. Sites and Experimental Samples*

Field measurements for the study were made in three locations: a disused airfield at Alconbury (UK), a farm in Grosseto (Italy), and Duxford Aerodrome (UK; Figure 6). In total, fourteen different surface samples were considered, detailed in Table 4 and shown in Figure 7. These included manmade samples such as large tarpaulins used as calibration targets in an accompanying airborne campaign, and natural samples such as homogeneous areas of grass, sand, and water.

**Figure 6.** Sites in the UK and Italy where the study measurements took place, showing (**a**) detail of Duxford where measurements of grass, gravel, tarmac, and the tarpaulins were collected; (**b**) detail of Grosseto indicating locations of the same tarpaulin measurements there; (**c**) detail of the runway area at Alconbury, showing the grass and the tarmac where all non-grass samples were placed for field measurement, and (**d**) the relative locations of the three field sites.

Field measurements were collected in the late afternoon at all sites to try to maximise thermal contrast between target radiance and downwelling radiance, as recommended in Salvaggio and Miller [41]. All data were collected under stable, clear-sky conditions with generally low wind, which were considered suitable for application of the two-lid box method. Air temperatures at the UK sites were between 19 and 25 °C, with wind speeds recorded in Alconbury and Duxford of 3.6 ± 0.9 m s<sup>−1</sup> and 4.3 ± 0.3 m s<sup>−1</sup> respectively. Air temperatures in Italy were higher (between 30 and 32 °C) with a recorded wind speed of 2.3 ± 0.5 m s<sup>−1</sup>. Relative humidity throughout the measurement period was 44% ± 4%, 56% ± 6%, and 60% ± 7% in Alconbury, Duxford and Grosseto respectively.

**Figure 7.** All samples considered in this study other than distilled water. Samples (**a**) to (**d**) were measured for the Alconbury comparison and show (**a**) black hardboard card on the tarmac, (**b**) construction sand, (**c**) green grass in Alconbury, and (**d**) polystyrene. Samples (**e**) to (**l**) were measured in Italy and at Duxford, with (**e**) the beach near Grosseto from where sand was collected, (**f**) the short green grass in Grosseto, (**g**) the gravel drive in Grosseto with emissivity box pictured, (**h**) a close-up of the grey tarpaulin in Grosseto, (**i**) the white, grey and black tarpaulins photographed from the plane in Grosseto, Italy while in use as calibration targets, (**j**) gravel in Duxford during measurement, (**k**) green grass in Duxford, and (**l**) tarmac in Duxford.

**Table 4.** Samples considered for the field and laboratory emissivity inter-comparison, with the number of measurements made using each indicated in brackets in the final column. Note that the three tarpaulins were measured by the EM27 in both Duxford (UK) and Grosseto (Italy).




#### 3.2.1. Sample Preparation

For the EM27-based field emissivity measurements conducted in Alconbury in May 2018, the black card, construction sand, and polystyrene (Figure 7a,b,d) were placed onto the runway tarmac shown in Figure 6c for measurement. For measurement of the sand, an area greater than the instrument FOV and with a depth of at least 3 cm was prepared on the tarmac. Distilled water was poured into a plastic tray to a depth of 15 mm for measurement, while the grass measurement (Figure 7c) was conducted on the vegetated area neighbouring the runway as indicated in Figure 6c.

For the other field-based emissivity measurements, all samples aside from the beach sand (Sand\_Gro) were measured as found and as shown in Figures 6 and 7. A sample of beach sand (Sand\_Gro) was collected from the beach shown in Figure 7e for measurement the following day at the same time as the other targets. The emissivity measurements of the tarpaulins, which were being used as calibration targets for airborne remote sensing measurements, were collected when the tarpaulins were laid out for the overhead flights as shown in Figure 6b.

For the laboratory emissivity measurements, flat samples such as the tarps, card, and polystyrene were placed under the sample port of the integrating sphere shown in Figure 1a, with no gap between port and sample. To preserve the structure and moisture content of the Alconbury grass sample, a section of turf was extracted (Figure 8a–b) and measured the following morning in a foil container of high reflectivity, with the grass pressed underneath the sample port while ensuring no blades went inside the integrating sphere. This method was chosen over the method detailed in Salisbury and D'Aria [48] as it better mimicked field conditions. Distilled water, construction sand, and gravel were all placed into petri dishes such as that in Figure 8c, and measured as close to the sphere port as possible without risk of contaminating the inside of the sphere. Due to the uneven shapes and surfaces of the gravel, some gaps were observed between the sample and sphere (distances < 10 mm) as shown in Figure 8d. However, as with the grass, it was determined as preferable to measure the sample unaltered rather than change the structural composition so as to best mimic field conditions.

**Figure 8.** Laboratory preparation of (**a**) the grass sample from Alconbury from the side and (**b**) from above, (**c**) the gravel sample from Duxford from above, and (**d**) underneath the sample port of the external integrating sphere.

#### *3.3. Emissivity Measurement Comparison*

The surface spectral emissivities derived from measurements made by the two portable FTIR spectrometers (EM27 and D&P) along with those from the laboratory setup (Vertex 70) were compared to determine the absolute emissivity differences and the degree of agreement of the identified spectral features. The comparison was limited to the spectral range 8–13 µm, since this covers the wavelength range commonly employed in LST retrieval algorithms [62]. Measurements of the tarpaulins made using the EM27 in both Italy and the UK were compared to enable assessment of the performance of the same method in different environments.
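The comparison of absolute emissivity differences can be illustrated with a minimal sketch. It assumes each spectrum is available as a pair of arrays (wavelength in µm, emissivity) and that linear interpolation onto a shared grid is acceptable; the function name, grid step, and synthetic spectra are hypothetical and are not taken from the study's processing chain.

```python
import numpy as np

def max_abs_difference(wl_a, eps_a, wl_b, eps_b, lo=8.0, hi=13.0, step=0.05):
    """Largest absolute emissivity difference between two spectra.

    wl_a, wl_b: ascending wavelengths in micrometres.
    eps_a, eps_b: emissivities sampled at those wavelengths.
    Returns (difference, wavelength at which it occurs).
    """
    grid = np.arange(lo, hi + step / 2, step)  # shared wavelength grid
    a = np.interp(grid, wl_a, eps_a)           # resample spectrum A
    b = np.interp(grid, wl_b, eps_b)           # resample spectrum B
    diff = np.abs(a - b)
    i = int(np.argmax(diff))                   # first location of the maximum
    return float(diff[i]), float(grid[i])

# Synthetic example: two flat spectra sampled on different grids
wl_field = np.linspace(7.5, 13.5, 300)
wl_lab = np.linspace(7.0, 14.0, 1000)
d, where = max_abs_difference(
    wl_field, np.full(300, 0.97), wl_lab, np.full(1000, 0.95)
)
```

Restricting `lo` and `hi` to 8.5 and 12.0 would reproduce the narrower range over which agreement is later reported.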

Each FTIR-derived emissivity spectrum was then convolved with the Heitronics KT15.85 radiometer spectral response function (Figure 4) to obtain broadband emissivity values comparable with those derived from the emissivity box.
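A minimal sketch of such a band convolution is given below, assuming a plain response-weighted mean of the spectral emissivity. The exact weighting used in the study may differ (for example, it could also include Planck weighting), and the trapezoid-shaped response function here is a hypothetical placeholder, not the measured response shown in Figure 4.

```python
import numpy as np

def broadband_emissivity(wl, eps, srf_wl, srf):
    """Response-weighted mean emissivity over a radiometer band.

    wl, eps: spectral emissivity on an ascending wavelength grid (micrometres).
    srf_wl, srf: spectral response function samples (relative units).
    """
    # Resample the response onto the emissivity grid; zero outside its support
    w = np.interp(wl, srf_wl, srf, left=0.0, right=0.0)

    def integrate(y):
        # Trapezoidal rule written out explicitly
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(wl)))

    return integrate(eps * w) / integrate(w)

# Hypothetical response function (placeholder shape and band edges)
srf_wl = np.array([9.5, 9.6, 11.5, 11.6])
srf = np.array([0.0, 1.0, 1.0, 0.0])

wl = np.linspace(8.0, 13.0, 501)
flat = broadband_emissivity(wl, np.full_like(wl, 0.95), srf_wl, srf)
stepped = broadband_emissivity(wl, np.where(wl < 10.5, 0.90, 1.00), srf_wl, srf)
```

A spectrally flat sample returns its own emissivity, while a sample with structure inside the band returns a value between its spectral extremes.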

#### *3.4. Evaluation of Impact on LST*

To understand the impact of any noted emissivity differences on LST estimation, a scenario was simulated for a near-surface Heitronics KT15.85 radiometer observing six samples measured in Grosseto and Duxford. Atmospheric transmissivity and path radiance effects were negligible due to the near-surface nature of the simulated observations, and sample-specific input values for land surface BT and sky BT were taken from in-situ LWIR radiometer measurements collected during the campaign (Table 5). Input emissivities (ε) used were the broadband emissivities derived for each of the measurement methods used to assess the emissivity of that sample. LSTs corresponding to the radiometer were calculated for each sample input emissivity as in Guillevic et al. [3]:

$$LST = B^{-1} \left[ \frac{1}{\varepsilon} \left( L_{\text{surf}} - (1 - \varepsilon) L_{\text{sky}}^{\downarrow} \right) \right] \tag{10}$$

where *B*<sup>−1</sup>(*L*) is the inverse Planck function describing the blackbody equivalent temperature (*T*, kelvin) of spectral radiance (*L*, W m<sup>−2</sup> sr<sup>−1</sup> µm<sup>−1</sup>), *L*<sub>surf</sub> is the spectral radiance (W m<sup>−2</sup> sr<sup>−1</sup> µm<sup>−1</sup>) corresponding to the input surface viewing BT, and *L*<sup>↓</sup><sub>sky</sub> is the downwelling atmospheric LWIR spectral radiance (W m<sup>−2</sup> sr<sup>−1</sup> µm<sup>−1</sup>) corresponding to the sky viewed BT. Uncertainties were calculated and propagated as in Ghent et al. [63] and detailed in Appendix A.
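Equation (10) can be sketched numerically as follows. This is an illustration under a simplifying assumption: the Planck function is evaluated at a single effective wavelength (a hypothetical 10.5 µm) rather than integrated over the radiometer band, so it does not reproduce the study's exact calculation. Function names and example temperatures are likewise hypothetical.

```python
import math

C1 = 1.191042e8  # 2hc^2, in W m-2 sr-1 um^4 (wavelength in micrometres)
C2 = 1.438777e4  # hc/k, in um K

def planck(wl_um, t_k):
    """Blackbody spectral radiance (W m-2 sr-1 um-1) at temperature t_k."""
    return C1 / (wl_um**5 * (math.exp(C2 / (wl_um * t_k)) - 1.0))

def inverse_planck(wl_um, radiance):
    """Blackbody equivalent temperature (K) of a spectral radiance."""
    return C2 / (wl_um * math.log(C1 / (wl_um**5 * radiance) + 1.0))

def lst_from_bt(bt_surf_k, bt_sky_k, emissivity, wl_um=10.5):
    """Equation (10): remove reflected sky radiance, then invert Planck."""
    l_surf = planck(wl_um, bt_surf_k)  # radiance of the surface-viewing BT
    l_sky = planck(wl_um, bt_sky_k)    # radiance of the sky-viewing BT
    l_emitted = (l_surf - (1.0 - emissivity) * l_sky) / emissivity
    return inverse_planck(wl_um, l_emitted)

# Hypothetical inputs: 300 K surface BT, 250 K sky BT
lst_black = lst_from_bt(300.0, 250.0, 1.00)
lst_high = lst_from_bt(300.0, 250.0, 0.95)
lst_low = lst_from_bt(300.0, 250.0, 0.90)
```

Because the sky is colder than the surface, lowering the input emissivity raises the derived LST, which is the mechanism by which a negatively biased emissivity produces a positively biased LST.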


**Table 5.** Input surface viewing and sky viewing brightness temperatures used to simulate land surface temperature with the measured emissivities.

| Sample | Site | Surface viewing BT | Sky viewing BT |
|---|---|---|---|
| Tarmac | Duxford, UK | 320 | 240 |

#### **4. Results**

#### *4.1. Emissivity Measurement Inter-comparison*

#### 4.1.1. Spectral Emissivities

Emissivity spectra (8–13 µm) for the Alconbury surface samples are shown in Figure 9, as measured in the field using the EM27 and in the laboratory using the Vertex 70. Figure 10 shows the spectral emissivities of the white, grey, and black tarpaulin as measured in the laboratory using the Vertex 70 and in Grosseto and Duxford using the EM27 and D&P spectrometers. Field- and laboratory-derived emissivity spectra of the other samples from Grosseto and Duxford are shown in Figure 11.

#### Alconbury

Results from Alconbury (Figure 9) enable a direct comparison of the laboratory (Vertex 70) and field-measured (EM27) spectral emissivities. The closest absolute agreement is found for the graybody samples (grass and water), with high and spectrally flat emissivities within 1% of each other between 8 and 12 µm. The laboratory and field measurements of polystyrene and card are generally within 2% across 8–12 µm, but differences of up to 0.04 are observed at points for the polystyrene (for example around 9.56 µm). For the sand sample, the laboratory and field measurements are within 2% between 9.8 µm and 12 µm but there are differences of 10–15% in the reststrahlen bands over 8.0–9.5 µm, with the laboratory measured emissivity higher than the EM27 field measured emissivity. For the other samples, between 8 and 12 µm the field-derived emissivities tend generally to be slightly higher than the laboratory-derived values. Beyond 12 µm, there is increased noise in the field-measured (EM27) emissivity spectra, which could be due to increased atmospheric effects at these longer wavelengths (Figure 4).

Some non-physical spectral emissivities (>1) are observed in the laboratory measurements of the graybody samples and in the field measurements of the grass shown in Figure 9. Increased uncertainties are also observed for both field and laboratory measurements of the graybody samples compared to the other samples. Given that the surface temperatures of water and grass are sensitive to even low winds [17], increased field uncertainties and noise for these sample measurements are probably due to varying sample temperatures during the measurement. The emissivities greater than unity found in the laboratory measured data appear also to be largely due to noise. Increased noise for these samples is expected due to the limitation of measuring samples of high spectral emissivity (low spectral reflectance) using a laboratory setup operating in directional hemispherical reflectance mode. An alternative explanation of the non-physical emissivities of the grass sample for both field and laboratory measurements could be canopy scattering, with increased emissivities due to the cavity effect [16].
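The way noise alone can push a reflectance-based emissivity above unity can be illustrated with a short simulation. It assumes an opaque sample, for which Kirchhoff's law gives emissivity as one minus the directional hemispherical reflectance, and additive Gaussian noise on the measured reflectance. The reflectance and noise levels are hypothetical values chosen for illustration, not instrument specifications.

```python
import numpy as np

rng = np.random.default_rng(0)

true_reflectance = 0.01  # hypothetical near-blackbody sample (emissivity 0.99)
noise_sd = 0.01          # hypothetical noise on each reflectance value

# Simulated noisy reflectance measurements
measured_reflectance = true_reflectance + rng.normal(0.0, noise_sd, size=2000)

# Kirchhoff's law for an opaque sample: emissivity = 1 - reflectance
emissivity = 1.0 - measured_reflectance

# Share of values that are non-physical purely because of noise
fraction_nonphysical = float(np.mean(emissivity > 1.0))
mean_emissivity = float(np.mean(emissivity))
```

When the true reflectance is comparable to the noise level, a noticeable share of individual spectral values exceeds unity even though the mean stays close to the true emissivity, which is consistent with the interpretation given above for the graybody samples.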

**Figure 9.** Laboratory-measured (Vertex) and field-measured (EM27) surface spectral emissivities of five different samples, as measured in Alconbury (UK) in May 2018. Values are the mean of all measurements, with the surrounding shaded area indicating the corresponding uncertainty as detailed in Section 3.1. The numbers of measurements made of each sample were listed in Table 4. Grey shaded area indicates the spectral range of the Heitronics KT15.85 IIP radiometer used for the emissivity box measurements.

**Figure 10.** Spectral emissivities of (top to bottom) the black tarpaulin, white tarpaulin, and grey tarpaulin based on data collected in the laboratory (Vertex) and field using (i) the Bruker EM27 FTIR spectrometer (EM27) in both Grosseto, Italy and Duxford, UK and (ii) the Designs and Prototypes µFTIR spectrometer (D&P, grey tarpaulin only). Values are the mean of all measurements, with the surrounding shaded area indicating the corresponding uncertainty as detailed in Section 3.1. The numbers of measurements made of each sample were listed in Table 4. Grey shaded area indicates the spectral range of the Heitronics KT15.85 IIP radiometer used for the emissivity box measurements.

Considerable spectral variability is observed in the three non-graybody surfaces measured at Alconbury, with emissivities falling to about 0.6 (polystyrene, ~8.8 µm). The reststrahlen bands (8–9.5 µm) and the Christiansen peak near 12.3 µm are clearly evident in both field and laboratory spectra of sand, although the minima in the reststrahlen bands are weaker in the laboratory measurement. Despite absolute differences, the wavelengths at which specific spectral features are observed correspond very well between the field (EM27) and laboratory (Vertex 70) measurements of the non-graybody samples, particularly for the polystyrene.

**Figure 11.** From top left clockwise, spectral emissivity measurements of (i) gravel from Duxford, (ii) beach sand in Grosseto, (iii) the sandy gravel drive in Grosseto, and (iv) short grass in Grosseto. Measurements were made using a Bruker Vertex 70 laboratory setup, a Designs and Prototypes µFTIR spectrometer (D&P) operated in the field, and a Bruker EM27 also operated in the field. Values are the mean of all measurements, with the surrounding shaded area indicating the corresponding uncertainty as detailed in Section 3.1. The numbers of measurements made of each sample were listed in Table 4. Grey shaded area indicates the spectral range of the Heitronics KT15.85 IIP radiometer used for the emissivity box measurements.

#### Grosseto and Duxford

Considering the spectral measurements of the samples from Grosseto and Duxford, the measurements of the tarpaulin made in the laboratory and those collected using the D&P µFTIR in Grosseto and the EM27 in the field at Duxford are all within 2% of each other between 8.0 and 12.0 µm (Figure 10). These differences are in line with the Alconbury measurements, and comparable with other studies that have compared emissivity measurement approaches [11,15]. However, agreement between the laboratory and field measurements of the gravel sample from Duxford is poor by comparison, with a difference of up to 8% observed between the EM27 (field) and Vertex (lab) measurements in the reststrahlen bands between 8.0 and 9.5 µm (Figure 11). Furthermore, as with the sand measurement from Alconbury shown in Figure 9, while the reststrahlen bands are clearly evident in the EM27 measurements of the gravel in Duxford (Gravel\_Dux), these minima are weaker in the laboratory measurements.

The increase in noise in the derived spectral emissivity data beyond 12 µm for field measurements made using the EM27 in Alconbury (Figure 9) is again observed for all EM27 field measurements in Grosseto and Duxford (Figures 10 and 11 respectively). The EM27 measurements of the white tarpaulin and grass in Grosseto additionally show non-physical emissivities (>1) above 12 µm, which appeared systematic and not attributable solely to noise. Conversely, a decrease in spectral emissivity above 12 µm is observed in the EM27 measurements of the gravel in Grosseto, beach sand in Grosseto, and gravel in Duxford (Figure 11).

Spectral emissivity data of the same tarpaulins measured in the field in Duxford and in Grosseto using the same EM27 setup show larger differences than anticipated, in both spectral shape and magnitude (Figure 10). The Duxford EM27 measurements were in better agreement with the laboratory- and D&P-measured spectra than the Grosseto EM27 measurements, particularly so for the grey tarpaulin, where the EM27 measurements collected in Grosseto failed to identify certain spectral features. The EM27 measurements in Duxford by contrast were in close agreement (<1%) with those of the D&P (from Grosseto) and the laboratory measurements. Despite the Duxford measurements appearing to perform relatively better than those from Grosseto, high uncertainties are also observed in the Duxford spectral emissivity measurements of the white tarpaulin. The PVC coating of this particular target had slightly specular characteristics, which may have made the emissivity more variable between measurements, as the EM27's retrieval method is intended for samples with Lambertian surfaces.

The sand, grass, and gravel samples from Grosseto shown in Figure 11 enable direct comparison of the data from the two portable FTIR spectrometers, with measurements collected almost simultaneously and under identical field conditions. The spectral emissivities of sand derived with the EM27 and the D&P instruments were within 1% of each other between 8.5 and 12.0 µm, and the gravel emissivities were within 2% of one another over the same range, with the increased differences for the gravel likely attributable to the increased variability within this material. There was also strong agreement seen between the spectral features for the beach sand and gravel road. These results promote confidence in the emissivity data derived from both FTIR instruments over the 8.5–12.0 µm range. While the measured emissivities of the grass from the two FTIR spectrometers were also within 2% of each other, non-physical (>1) noisy emissivities are observed in the EM27 data of the grass sample in Grosseto, as was also observed in this instrument's measurement of the Alconbury grass sample (Figure 9).

Below 8.5 µm, spectral emissivities retrieved using the D&P µFTIR spectrometer seem to be consistently lower than those derived using the EM27, particularly for the grass sample (Grass\_Gro) in Figure 11. This could indicate insufficient correction of atmospheric features in the post-processing of the D&P data, since this region has increased absorbance from atmospheric water vapour [17]. Outside this spectral region, EM27-derived emissivity spectra appeared consistently noisier than those from the D&P, as can be observed again in the grass measurements from Grosseto (Figure 11).

#### 4.1.2. Broadband Emissivities

The derived broadband emissivities for all samples measured in Grosseto and Duxford with the EM27 and D&P systems are shown in Figure 12, alongside those derived using the two-lid emissivity box. Agreement between the FTIR-derived values and those of the emissivity box is excellent for some samples, such as the gravel road in Grosseto (Gravel\_Gro) where the EM27 and D&P measurements were within 0.1% (0.001) of those of the emissivity box (Figure 12). However, for other samples the emissivity box provides broadband emissivities consistently lower than those of the FTIR systems, with differences of over 5% (0.05) for the grey tarpaulin for example (whereas for this sample the EM27 in Duxford, D&P in Grosseto and the laboratory Vertex 70 deliver broadband emissivities within 0.5% of each other).

Considering the different sample types, we observe that there were consistent discrepancies between the measurements of the two grass samples collected in Grosseto and Duxford using the EM27 and the emissivity box. For both, the emissivities of the vegetation as measured by the box method had a negative bias compared to the EM27 (Figure 12), with the EM27-derived values more in line with vegetation measurements reported elsewhere [39,51]. However, the measurement of the grass sample in Grosseto collected using the D&P was similar to the box-derived emissivity of that sample. Given this unclear performance and the non-physical emissivities observed in the EM27-derived emissivities of the grass samples from Alconbury (Grass\_Alc, Figure 9) and Grosseto (Grass\_Gro, Figure 11), further work is recommended to understand the performance of the EM27-based system over such targets.

**Figure 12.** Broadband LWIR emissivities of target samples measured in Grosseto and Duxford by the EM27 and D&P FTIR systems, derived via the convolution of these surface spectral emissivity data with the spectral response function of the Heitronics KT15.85 radiometer shown in Figure 4. Matching broadband emissivity values derived using a two-lid emissivity box and that same Heitronics radiometer are shown alongside. Error bars show the uncertainties of each estimate.

Uncertainties were consistently around 0.010 for the box emissivity measurements, which is comparable with other studies making use of the emissivity box [15,37,52]. However, these were consistently higher than uncertainties from the other methods (with the exception of the white tarpaulin measurement collected using the EM27 in Duxford, which had a band-averaged uncertainty of 0.028). The increased uncertainty, and the lack of consistency found here in the relative emissivity values in comparison to the range of other sensors deployed, lead us to question the ability to reliably use the box method—and thus its suitability for calibration and validation studies.

#### *4.2. Impact of Measurement Differences on LST Estimation*

Table 6 shows the LST values and uncertainties calculated as detailed in Section 3.4 for each sample and input emissivity, with maximum and minimum derived LSTs highlighted in red and blue respectively. LSTs derived using emissivities from the two-lid emissivity box were the highest for all samples other than for the white tarpaulin, reflecting the consistent negatively biased emissivities delivered by this method relative to the others. In the case of the grey tarpaulin, the derived LSTs using the box-derived emissivities were 3.92 °C higher than those based on emissivities from the D&P µFTIR, EM27, or Vertex FTIR spectrometer as inputs. The magnitude of this bias again questions the reliability of the emissivity box approach.


**Table 6.** Calculated LSTs and LST uncertainties (°C) for the six samples, each calculated using the emissivities derived with the various field and laboratory emissivity measurement methods considered herein, with the median and interquartile range (IQR) for each sample.

Differences between LSTs calculated using the broadband emissivities derived from the FTIR-based methods deployed herein are smaller than those resulting from use of the box-derived emissivities. However, we still observe LSTs differing by up to 1 °C (white tarpaulin), which, given that the GCOS target accuracy and currently achievable requirements for LST as an ECV are 1 °C and 2–3 °C respectively [28], highlights the continuing importance of reducing uncertainties on emissivity retrieval.

#### **5. Discussion**

Comparison of the spectral emissivities found the majority of emissivities from field and laboratory spectrometers to be within 1–2% of each other for most of the spectral range 8.5–12.0 µm. These levels are broadly in line with other studies [11,12] and give confidence in the measurements and methods for selected samples. Outside of 8.5–12.0 µm, differences between emissivity measurements increased and increased noise levels were observed in the EM27 spectra. A potential cause for the reduced performance of the EM27 beyond 12 µm could be extrapolation error from calibration to the cold sky temperatures, which Korb et al. [31] observed to create spectral artefacts between 11 and 13 µm due to nonlinearity of the MCT detector responses over wide signal ranges. However, if this were the case, this decrease in emissivities beyond 12 µm would likely be observed in the EM27 measured spectra of all samples, which was not the case. Furthermore, no systematic distortion in the EM27 spectra was apparent at wavelengths below 12 µm as would likely be if this were the cause. While emissivities from 8.5 to 12.0 µm were satisfactory for most applications from field to satellite scale, with the majority of satellite thermal bands used for LST calculation located in this region [2], further investigation is required if field emissivities accurate outside of this range are required.

The differences between the laboratory (Vertex 70) and field (EM27) measurements in the reststrahlen bands of the sand sample from Alconbury (Figure 9) and the gravel sample from Duxford (Figure 11) raise questions about the relative performances of these laboratory and field setups in this region for samples with strong reststrahlen features. Further investigation in particular is recommended for the laboratory setup, since the EM27 measurements of the sand and gravel drive samples in Grosseto compared well with the D&P measurements of the same samples (Figure 11). In the case of the gravel sample from Duxford, a likely cause of this lack of agreement is the inhomogeneity of the sample (Figure 8c) together with the different fields of regard of the two measurement techniques, with the diameter of the sample port in the laboratory setup half that of the field of regard for the EM27 in the setup deployed in Duxford (diameters of 25 mm and 50 mm respectively). A contributing factor to the higher emissivity in the laboratory measurement could also be that the gap between the gravel sample and sample port in the laboratory (discussed in Section 3.2.1) decreased the measured sample reflectance and increased the derived emissivity. Although neither of these interpretations is applicable to the sand measurements, these issues highlight the impact that the different scales and designs of laboratory and field instrumentation can have on the retrieved emissivity. This is particularly important when using emissivity from in situ measurements for calibration/validation activities over targets such as gravel that are apparently homogeneous in satellite/airborne sensor pixels but heterogeneous at the sub-pixel level [64]. In this study, field measurements were found to be vital for such samples as they observe a larger area than laboratory measurements and can more easily cope with the target sample structure.

A key limitation of this study was that only one sample was measured by all four methods (GreyTarp, Figure 10). For this sample, however, strong agreement was seen between the laboratory, D&P and EM27 (Duxford) measurements, while the broadband emissivity from the emissivity box measurement was considerably lower. The negative bias in emissivity box measurements of this and other samples relative to the other methods, as shown in Figure 12, calls into question the reliability of the emissivity box approach for calibration and validation activities, particularly since use of the box-derived emissivities in this study was found to result in positive biases of up to 4 °C when simulating in situ LSTs from radiometer observations.

The differences between the EM27 measurements of the tarpaulins made in Grosseto and Duxford indicate field conditions have a strong impact on output emissivities. Interestingly, the EM27 measurements of the tarpaulin from Grosseto were found to be in poorer agreement with the laboratory and D&P measurements from Grosseto than the EM27 measurements of the tarpaulin from Duxford, despite being made at the same time and location as the D&P measurements. However, as shown in Figure 11, other measurements made with the EM27 and D&P in Grosseto were in closer agreement, with the D&P and EM27 measurements of gravel, grass, and sand made in Grosseto within 2% across 8.5–12 µm. More measurements with the D&P (e.g., in Duxford) would have assisted here in understanding the relative performance of these two field spectrometers. Increased noise in the EM27 measurements of the grass sample compared to the D&P measurements was likely an artefact of the different instrument spectral resolutions (EM27 at 0.5 cm<sup>−1</sup> and D&P at 4 cm<sup>−1</sup>). However, it could also indicate reduced sensitivity in the EM27 instrument compared to the D&P. The latter interpretation was suggested by the calculated sample temperature of the grass, which was five degrees lower than the measured ambient air temperature (30.5 °C), thus reducing the signal-to-noise ratio [41]. This should be investigated further since, if this is the case, the performance of the EM27 will be limited for measurement of cold samples in hot environments due to the reduced signal-to-noise ratio for these samples, unless modifications are made to improve the sensitivity of the instrument.
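The effect of spectral resolution on apparent noise can be illustrated with a minimal sketch. It assumes uncorrelated (white) noise and approximates the change from 0.5 cm<sup>−1</sup> to 4 cm<sup>−1</sup> by averaging blocks of eight adjacent samples, which is a simplification of a real instrument line shape. The synthetic spectrum and noise level are hypothetical.

```python
import numpy as np

def bin_average(values, factor):
    """Average consecutive blocks of `factor` samples (boxcar degradation)."""
    n = (len(values) // factor) * factor  # drop any incomplete final block
    return values[:n].reshape(-1, factor).mean(axis=1)

rng = np.random.default_rng(1)

# Synthetic flat emissivity spectrum with white noise, fine sampling
fine = 0.98 + rng.normal(0.0, 0.01, size=8000)

# Eight fine samples per coarse sample (0.5 cm-1 -> roughly 4 cm-1)
coarse = bin_average(fine, 8)

# Ratio of noise levels; for white noise this is close to sqrt(8)
noise_ratio = float(fine.std() / coarse.std())
```

Under these assumptions the finer-resolution spectrum appears nearly three times noisier for the same underlying signal, so a noise comparison between the two instruments is only meaningful once both are brought to a common resolution.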

Two potential causes were identified for the poor agreement in the EM27 tarpaulin measurements from Grosseto compared to those from Duxford. Firstly, the calculated temperatures of the white and grey tarpaulin were within just 2 °C of the cold blackbody temperature used in the EM27's two-temperature calibration, with Hook and Kahle [40] finding that absolute errors in field emissivity measurements increased where the sample temperature was close to the temperature of the calibration blackbody. Care should therefore be taken to ensure the blackbody temperatures are at least 5 °C either side of the estimated sample temperature, but the high ambient temperatures in Grosseto meant that the power available was insufficient to cool the blackbody to the necessary temperature, an issue now resolved by the use of a more powerful inverter in the EM27 setup. This could also have been the cause of the high uncertainties in the EM27 measurement of the white tarpaulin from Duxford (Figure 10), since the calculated sample temperature was also only 0.5 °C above the cold blackbody temperature. A second potential cause of the differences in the tarpaulin measurements made in Grosseto and Duxford using the EM27 was identified through comparison of the measured downwelling radiances and humidity in Grosseto and Duxford (not shown). Increased downwelling radiances were observed throughout the Grosseto campaign compared to Duxford, which corresponded with increased humidity (Section 3.2). Furthermore, greater variability between consecutive measurements of the gold panel in Grosseto than in Duxford indicates downwelling radiances were changing more rapidly with time in Grosseto, providing increased scope for changes between the sequential measurements of the panel and the sample being collected. Environmental conditions in Grosseto were therefore less stable and more humid, both factors known to impact the accuracy of retrieved emissivities [17,31].
This interpretation was supported by the increased atmospheric emission lines between 8 and 9 µm apparent in the EM27 spectra collected in Grosseto compared to Duxford (Figure 10). This may have impacted the EM27 measurements more than it did the D&P measurements due to (i) the higher spectral resolution and (ii) the spectral smoothness retrieval method used to derive emissivity from the EM27, which relies on minimizing atmospheric features across 8.12–8.60 µm and therefore is optimal for stable atmospheres. Consideration of the tarpaulin emissivity measurements derived from the two different locations (Grosseto and Duxford) therefore highlights that measurement accuracies and uncertainties were highly sensitive to environmental conditions, and that care should be taken to ensure the blackbody calibration temperatures properly bracket the sample temperature.
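The bracketing recommendation can be expressed as a simple pre-measurement check. The sketch below assumes the 5 °C margin stated above and a two-temperature (cold and hot blackbody) calibration. The function name and example temperatures are hypothetical and do not describe the EM27 control software.

```python
def blackbodies_bracket_sample(t_cold, t_hot, t_sample, margin=5.0):
    """Return True if both blackbodies are at least `margin` degrees away
    from the estimated sample temperature, one below it and one above it.

    All temperatures must share the same unit (degrees Celsius or kelvin).
    """
    cold_ok = (t_sample - t_cold) >= margin  # cold blackbody sufficiently below
    hot_ok = (t_hot - t_sample) >= margin    # hot blackbody sufficiently above
    return cold_ok and hot_ok

# Hypothetical cases: a well-bracketed sample, and one only 2 degrees above
# the cold blackbody
well_bracketed = blackbodies_bracket_sample(20.0, 40.0, 30.0)
too_close = blackbodies_bracket_sample(28.0, 45.0, 30.0)
```

Running such a check against an estimated sample temperature before each measurement would flag cases like those described above, where the sample sat within 2 °C of the cold blackbody.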

The measurement of vegetation was shown to prove challenging for all methods, with non-physical emissivities and high noise levels observed and differences between the broadband emissivities. In the case of the EM27, non-physical emissivities and high noise were attributed to increased variability in sample temperatures during measurement [57]. However, we also observed that the calculated sample temperature of the Alconbury grass (Grass\_Alc, Figure 9) was just 1 K hotter than the cold blackbody temperature. As discussed above with the tarpaulin measurements, this could have led to increased errors. With respect to the laboratory measurements, while non-physical emissivities and high noise levels were observed in the measurements of the Alconbury grass (Figure 9), it is difficult to determine if there is a systematic problem with measurement of vegetation or if this was an exception as only one sample was measured in the laboratory. Measurements of additional vegetation samples in the laboratory would have enabled further analysis but no samples were collected in Grosseto and Duxford, as they would have deteriorated before measurement in the laboratory due to the gap between the collection date and measurement date. An increased number of scans would improve the signal-to-noise ratio and is therefore recommended for future measurement of low reflectance samples. However it should also be considered whether a setup operating in the DHR mode could be used to make measurements of vegetation, since vegetation tends to have non-isothermal properties (with different temperatures in different parts of the sample) but Kirchhoff's law theoretically requires samples to be isothermal [47]. Salisbury and D'Aria [48] avoided this by cutting vegetation samples and arranging them in a continuous monolayer on an adhesive tape substrate. 
However, this is also known to impact the emissivity by changing the structural composition and does not take into account any exposed soil components [65]. The non-isothermal properties of vegetation samples could also be the cause of the non-physical emissivities of the grass sample for the EM27-derived field measurements (which assume a uniform sample temperature when calculating emissivity with the spectral smoothness method). This supports Ribeiro de Luz and Crowley's [47] argument for development of radiative transfer models that account for non-isothermal structures. Given that one of the major applications of LST from satellite and airborne sensors is monitoring evapotranspiration and crop health [66], further work on measurement of vegetation samples in both the laboratory and the field is recommended. This is particularly important since the vegetation samples considered in both Duxford and Grosseto were limited to homogeneous short-cropped grass, while in reality samples containing exposed soil and more complex canopy structures are also likely to need assessing.

This study considered the impact these emissivity differences would have on LST algorithm validation activities through simulating in situ LSTs from field radiometers. However, in situ emissivity values from the laboratory or field instrumentation are also important for the development of LST and land surface emissivity (LSE) retrieval algorithms from satellite or airborne sensors, despite the development of new hyperspectral and multispectral thermal sensors and new physical retrieval algorithms (e.g., [26,67]) capable of simultaneous LST/LSE retrieval without the need for input emissivity estimates from land cover maps or other sources [62]. An example of such an application is in derivation of the coefficients for the Maximum–Minimum Difference (MMD) module in the TES algorithm [26] used to produce the operational Moderate Resolution Imaging Spectroradiometer (MODIS), ASTER, and ECOSTRESS LST/LSE products [53,68]. In this case, a negative bias in emissivity inputs would cause reduced maximum emissivities for the same min–max difference, thus shifting the regression curve, changing the coefficients in the MMD module and impacting the retrieved LSTs and LSEs. It is crucial therefore for LST and LSE retrieval algorithm development and validation activities that work continues on improving and understanding uncertainties surrounding in situ emissivity measurement methods in the field and laboratory.

#### **6. Summary and Conclusions**

We conducted an inter-comparison of four different methods of LWIR surface emissivity retrieval, encompassing methods that derive full spectral emissivity data and methods that derive broadband emissivities, and that operate in the field and in the laboratory. The methods considered are based on field measurements made with two portable FTIR spectrometers (a Bruker EM27 and a D&P µFTIR) operating in the emission mode, a laboratory FTIR spectrometer (Vertex 70) operating in directional hemispherical reflectance mode, and a two-lid emissivity box based on the design of Rubio et al. [33], also deployed in the field. Fourteen target samples were considered across three field sites covering both the UK and Italy, and these included man-made materials such as tarpaulins and natural materials such as sand, grass, and water.

The majority of the derived spectral emissivities were within 1–2% of each other across the major part of the LWIR atmospheric window (8.5–12.0 µm), with identification of spectral features also in agreement between the different field and laboratory approaches. This degree of agreement is consistent with that found by other studies comparing field and laboratory methods of spectral emissivity determination. Differences of up to 15% were observed between the laboratory and field measurements for samples with strong reststrahlen features, suggesting a need for further investigation into the laboratory setup's performance when measuring samples with these features. Consideration of the gravel sample from Duxford suggests that field instrumentation can be more suitable than laboratory directional hemispherical reflectance setups for non-homogeneous samples and samples with complex structures. Beyond 12 µm, significant noise and an unexplained drop-off in spectral emissivity were observed in some of the EM27-retrieved emissivities. As a result, we recommend that use of EM27 emissivity spectra be limited to the 8.0–12.0 µm region. Similarly, although fewer measurements were made using the D&P, increased noise and a decrease in emissivity below 8.5 µm indicate that emissivities from the D&P system may not be fully trustworthy below this wavelength, at least in the configuration used herein.

Differences between field measurements made of the same samples using the EM27 but in different locations under different environmental conditions identified some issues. In particular, the power supply was inadequate to cool the internal blackbody to the ideal temperature when ambient conditions were particularly warm, so the cold blackbody temperature was probably too close to the target sample temperature to give well-calibrated data. This has now been resolved through installation of a higher power inverter. Some increased noise was also evident in certain EM27 measurements, and we recommend that for comparatively cool samples such as vegetation, data be collected at times that maximise thermal contrast with the surroundings. The time taken to collect each spectral measurement should also be minimised under conditions of potentially changing atmospheric humidity, for example by reducing the number of scans or lowering the measurement spectral resolution (Salisbury [17] advises that 8 cm<sup>−1</sup> is generally adequate for spectral emissivity determination).

Measurement of vegetation samples was found to be challenging for all methods due to reduced signal-to-noise, canopy scattering, varying sample temperature during the measurement and non-isothermal properties. Simulating near-surface LST observations of grass with the measured emissivities gave differences of 1.5 °C depending on which method of emissivity determination was used. Given that a major application of LSTs is for agriculture and use in evapotranspiration models [6], accurate measurement of the emissivity of vegetation is crucial, so further work towards understanding the uncertainties at both the field and laboratory scale is recommended.

We derived broadband emissivities from the spectral emissivity measurements and compared these with those calculated using the two-lid emissivity box method. We found a lack of consistency in the emissivity values measured with the box and increased uncertainties compared to the other methods. This indicates that its performance was inferior to that of the FTIR-based approaches, although it is based on far cheaper and more widely available technology.

**Author Contributions:** Conceptualization, M.F.L., T.P.F.D. and M.W.; Data curation, M.F.L., T.P.F.D., M.W., M.J.G., M.C.d.J. and W.R.J.; Formal analysis, M.F.L.; Funding acquisition, T.P.F.D. and M.W.; Investigation, M.F.L., T.P.F.D., M.W., M.J.G., M.C.d.J. and W.R.J.; Methodology, M.F.L., T.P.F.D. and M.W.; Project administration, M.F.L., T.P.F.D. and M.W.; Resources, M.F.L., T.P.F.D., M.W., J.J., W.R.J., S.J.H. and G.R.; Software, M.F.L.; Supervision, M.F.L., T.P.F.D. and M.W.; Validation, M.F.L.; Visualization, M.F.L. and T.P.F.D.; Writing—original draft, M.F.L., T.P.F.D. and M.W.; Writing—review and editing, M.F.L., T.P.F.D., M.W., M.J.G., S.J.H. and G.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** Aspects of this work were part of the joint NASA ESA Temperature Sensing Experiment (NET-Sense) conducted under a programme of, and funded by, the European Space Agency (Contract Number 4000131017/20/NL/FF/ab) and the National Aeronautics and Space Administration (NASA), with part of the research described in this paper carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contracts with NASA. In addition, aspects were funded through PRISE (Pest Risk Information Service), a project funded by the UK Space Agency as part of the Global Challenge Research Fund. Support for this research also came partly from NERC National Capability funding to the National Centre for Earth Observation (NE/R016518/1).

**Acknowledgments:** We thank Hannah Nyugen, Bruce Main and Francis O'Shea from King's College London for their assistance in development and testing of the emissivity box on multiple field campaigns. The views in this publication can in no way be taken to reflect the official opinion of the European Space Agency or any other funding body.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

This appendix presents the calculation of uncertainties associated with the evaluation of the impact on LST presented in Section 3.4. The error sources on LST were identified as that of the surface radiation (*L*↑), downwelling radiation (*L*↓), and emissivity (ε). All terms are wavelength (λ) dependent but the wavelength terms were omitted for clarity.

To calculate the uncertainty on the derived LST observations, the equivalent uncertainties in radiance units (*U*<sub>L↑/↓</sub>) for both surface and sky viewing radiometer observations were first determined from the manufacturer stated uncertainty of the radiometer in temperature units (*U*<sub>T↑/↓</sub>) through the differential of the Planck function (*B*) with respect to temperature (*T*) such that:

$$U_{L} = \left| \frac{\partial B}{\partial T} \right| U_{T} = \frac{c_1 c_2 e^{\frac{c_2}{\lambda T}}}{\lambda^6 T^2 \left(e^{\frac{c_2}{\lambda T}} - 1\right)^2} U_{T} \tag{A1}$$

where *c*<sub>1</sub> and *c*<sub>2</sub> are constants such that *c*<sub>1</sub> = 2*hc*<sup>2</sup> and *c*<sub>2</sub> = *hc*/*k* (with *h*, *c* and *k* as defined in Section 3.1.2).
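As an illustration of Equation (A1), the following minimal Python sketch converts a temperature uncertainty into a radiance uncertainty and cross-checks the analytical derivative against a finite difference of the Planck function. The function names, the 10 µm wavelength, the 300 K temperature and the 0.5 K uncertainty are illustrative assumptions and are not values from this study.

```python
import math

# Physical constants (SI units)
H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m s^-1
K = 1.380649e-23     # Boltzmann constant, J K^-1
C1 = 2.0 * H * C**2  # c1 = 2hc^2
C2 = H * C / K       # c2 = hc/k


def planck_radiance(wavelength_m, temp_k):
    """Planck spectral radiance B(lambda, T) in W m^-2 sr^-1 m^-1."""
    return C1 / (wavelength_m**5 * math.expm1(C2 / (wavelength_m * temp_k)))


def radiance_uncertainty(wavelength_m, temp_k, u_temp_k):
    """Equation (A1): U_L = |dB/dT| * U_T."""
    x = C2 / (wavelength_m * temp_k)
    db_dt = C1 * C2 * math.exp(x) / (
        wavelength_m**6 * temp_k**2 * math.expm1(x) ** 2
    )
    return abs(db_dt) * u_temp_k


# Hypothetical inputs, chosen only to exercise the formula
wl, temp, u_temp = 10e-6, 300.0, 0.5
u_rad = radiance_uncertainty(wl, temp, u_temp)

# Cross-check: central finite difference of B with respect to T
step = 1e-3
fd_slope = (planck_radiance(wl, temp + step)
            - planck_radiance(wl, temp - step)) / (2.0 * step)
print(u_rad, abs(fd_slope) * u_temp)
```

The two printed values should agree closely, which is a simple way to confirm that the analytical expression has been transcribed correctly before it is used in the later propagation steps.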

The uncertainty of the land surface radiance (*U*<sub>Lsurf</sub>) was then calculated using Equation (8) in Ghent et al. [63] such that:

$$U_{L_{\text{surf}}} = L_{\text{surf}} \sqrt{\frac{U_{L_{\uparrow}}^{2} + \left( (1 - \varepsilon) L_{\downarrow} \sqrt{\frac{U_{\varepsilon}^{2}}{\left(1 - \varepsilon\right)^{2}} + \frac{U_{L_{\downarrow}}^{2}}{L_{\downarrow}^{2}}}\right)^{2}}{\left(L_{\uparrow} - L_{\downarrow}(1 - \varepsilon)\right)^{2}} + \frac{U_{\varepsilon}^{2}}{\varepsilon^{2}}} \tag{A2}$$

where *U*<sub>ε</sub> is the uncertainty on the emissivity observation. Using the uncertainty of the surface radiance, we then calculated the absolute uncertainty of a given LST observation (*U*<sub>LST</sub>) using Equation (9) in Ghent et al. [63]:

$$U_{\text{LST}} = c_2 \left( \frac{c_1 \left( \frac{U_{L_{\text{surf}}}}{\lambda^5 L_{\text{surf}}^2} \right)}{\left( \frac{c_1}{L_{\text{surf}} \lambda^5} + 1 \right) \lambda \left[ \ln \left( \frac{c_1}{L_{\text{surf}} \lambda^5} + 1 \right) \right]^2} \right) \tag{A3}$$
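The propagation in Equations (A2) and (A3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the processing code used in the study. It assumes the surface-leaving radiance is obtained as L<sub>surf</sub> = (L<sub>↑</sub> − (1 − ε)L<sub>↓</sub>)/ε, and it treats the emissivity term U<sub>ε</sub><sup>2</sup>/ε<sup>2</sup> as a separate relative variance added to that of the numerator, as standard propagation for a quotient (and dimensional consistency) requires. All function names and input values are hypothetical.

```python
import math

H, C, K = 6.62607015e-34, 2.99792458e8, 1.380649e-23  # SI units
C1 = 2.0 * H * C**2  # c1 = 2hc^2
C2 = H * C / K       # c2 = hc/k


def planck(wl, temp):
    """Planck spectral radiance in W m^-2 sr^-1 m^-1."""
    return C1 / (wl**5 * math.expm1(C2 / (wl * temp)))


def inverse_planck(wl, radiance):
    """Temperature (K) of a blackbody emitting the given radiance."""
    return C2 / (wl * math.log(C1 / (radiance * wl**5) + 1.0))


def surface_radiance(l_up, l_down, emis):
    """Assumed relation: L_surf = (L_up - (1 - emis) * L_down) / emis."""
    return (l_up - (1.0 - emis) * l_down) / emis


def surface_radiance_uncertainty(l_up, l_down, emis, u_l_up, u_l_down, u_emis):
    """Equation (A2), with the emissivity term as a separate relative variance."""
    reflected = (1.0 - emis) * l_down
    u_reflected = reflected * math.sqrt(
        u_emis**2 / (1.0 - emis) ** 2 + u_l_down**2 / l_down**2
    )
    rel_var = ((u_l_up**2 + u_reflected**2) / (l_up - reflected) ** 2
               + u_emis**2 / emis**2)
    return surface_radiance(l_up, l_down, emis) * math.sqrt(rel_var)


def lst_uncertainty(wl, l_surf, u_l_surf):
    """Equation (A3): |dT/dL| evaluated at L_surf, times U_Lsurf."""
    a = C1 / (l_surf * wl**5)
    numerator = C1 * u_l_surf / (wl**5 * l_surf**2)
    return C2 * numerator / ((a + 1.0) * wl * math.log(a + 1.0) ** 2)


# Hypothetical inputs: 10 um channel, 300 K surface view, 250 K sky view
wl = 10e-6
emis, u_emis = 0.97, 0.01
l_up, l_down = planck(wl, 300.0), planck(wl, 250.0)
u_l_up, u_l_down = 0.01 * l_up, 0.01 * l_down  # assumed 1% radiance uncertainty

l_surf = surface_radiance(l_up, l_down, emis)
u_l_surf = surface_radiance_uncertainty(l_up, l_down, emis,
                                        u_l_up, u_l_down, u_emis)
lst = inverse_planck(wl, l_surf)
u_lst = lst_uncertainty(wl, l_surf, u_l_surf)
print(f"LST = {lst:.2f} K, U_LST = {u_lst:.2f} K")
```

In practice the radiance uncertainties passed to `surface_radiance_uncertainty` would come from Equation (A1) rather than the fixed 1% assumed here. Because all terms are wavelength dependent, the same calculation would be repeated for each wavelength or channel of interest.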
