*2.2. Single-Channel Algorithm*

This study used the single-channel algorithm developed by Jiménez-Muñoz et al. [27] to retrieve LST from Landsat 8 TIRS Band 10 or 11 image. This algorithm is expressed as:

$$T\_s = \chi \left[ \frac{1}{\varepsilon} (\psi\_1 L\_{taa} + \psi\_2) + \psi\_3 \right] + \delta \tag{7}$$

where *Ltoa* is the TOA radiance and calculated from the radiometric calibration on the observation; ε is the emissivity of Band 10 or 11; and γ and δ are two parameters given by

$$\gamma \approx \frac{T\_b^2}{b\_\gamma L\_{toa}}, \text{ and } \delta \approx T\_b - \frac{T\_b^2}{b\_\gamma} \tag{8}$$

In Equation (8), *T<sup>b</sup>* is the brightness temperature of TIRS Band 10 or 11, that is, *T*<sup>10</sup> or *T*11. *b*<sup>γ</sup> = *c*2/λ (*c*<sup>2</sup> = 1.43877 × 10<sup>4</sup> µm·K; λ is the effective wavelength of TIRS, and *b*<sup>γ</sup> is 1324 K for Band 10 and 1199 K for Band 11, respectively). ψ1, ψ2, and ψ<sup>3</sup> are atmospheric terms related to the atmospheric transmittance and downward and upward thermal radiance, and can be nonlinearly related to CWV. Equation (9) shows the coefficients for Band 10 that are estimated by Jimenez-Munoz et al. [12] from the Global Atmospheric Profiles from Reanalysis Information (GAPRI) database [32].

$$
\begin{bmatrix} \psi\_1\\ \psi\_2\\ \psi\_3 \end{bmatrix} = \begin{bmatrix} c\_{11} & c\_{12} & c\_{13} \\ c\_{21} & c\_{22} & c\_{23} \\ c\_{31} & c\_{32} & c\_{33} \end{bmatrix} \begin{bmatrix} w^2\\ w\\ 1 \end{bmatrix} = \begin{bmatrix} 0.04019 & 0.02916 & 1.01523 \\ -0.38333 & -1.50294 & 0.20324 \\ 0.00918 & 1.36072 & -0.27514 \end{bmatrix} \begin{bmatrix} w^2\\ w\\ 1 \end{bmatrix} \tag{9}
$$

Jiménez-Muñoz et al. [12] only provided coefficients for Band 10. However, after the stray light correction, Band 11 can also be used in the single-channel algorithm. Therefore, coefficients for this band are required, as well. This study obtained the coefficient matrix *C* for Band 11 by using the above TIGR atmospheric profiles. The matrix is

$$\mathbf{C}\_{b11} = \begin{bmatrix} 0.09874 & -0.03212 & 1.06497 \\ -0.81391 & -0.94691 & -0.17172 \\ -0.00676 & 1.40205 & -0.14864 \end{bmatrix} \tag{10}$$

#### *2.3. Determination of Pixel Emissivity and Atmospheric CWV*

The above split-window and single-channel algorithms require pixel emissivity and atmospheric CWV as input. The two parameters were both acquired using the same way to maintain constancy among algorithms.

For pixel emissivity, this study adopted the widely used normalized difference vegetation index (NDVI)-threshold method to estimate the pixel emissivity. In this method, land pixels were classified into three types on the basis of their NDVI value, namely, barren soil, fully vegetated, and partly vegetated pixels [24,33]. NDVI was calculated from atmospherically corrected ground red and near-infrared band reflectance. The emissivity of partly vegetated pixel was mainly calculated from

the combination of soil and vegetation component emissivities, which are weighted by the fraction of vegetation cover (FVC). For convenience, the emissivities of barren soil and fully vegetated pixels were directly given by soil and vegetation component emissivities, respectively, on the assumption that the component emissivities slightly change in time. Finally, this method is expressed as:

$$\varepsilon\_{p} = \begin{pmatrix} \varepsilon\_{\rm s} & \text{NDVI} < \text{NDVI}\_{\rm s} \text{(barren soil)}\\ \varepsilon\_{\rm v}f + \varepsilon\_{\rm s}(1-f) + 4 < d\varepsilon > f(1-f), & \text{NDVI}\_{\rm s} \le \text{NDVI} \le \text{NDVI}\_{\rm v} \text{(partly weighted)}\\ \varepsilon\_{\rm v} & \text{NDVI} > \text{NDVI}\_{\rm s} \text{(fully weighted)} \end{pmatrix} \tag{11}$$

where ε*<sup>p</sup>* is the pixel emissivity, ε*<sup>s</sup>* is the soil component emissivity, ε*<sup>v</sup>* is the vegetated component emissivity, <*d*ε> is the maximum cavity term and is set as 0.01 [34], and *f* is the FVC. *NDVI<sup>s</sup>* is the NDVI for barren soil pixel, and its value is 0.20; *NDVI<sup>v</sup>* is the NDVI for fully vegetated pixel and valued with 0.86 [35]. Ren et al. [33] improved this method by using flexible component emissivity instead of the fixed component emissivity in the original method on the basis of the different land cover types of the fine-resolution observation and monitoring of global land cover product [36].

For atmospheric CWV, the MODIS/Terra total precipitable water vapor 5-Min L2 Swath 1 km and 5 km (MOD05\_L2) product was used in accordance with the location and observation time of Landsat 8. As one of MODIS standard products, the MOD05\_L2 product consists of atmospheric CWV amounts estimated from two different algorithms, namely, the near-infrared and infrared algorithms. The spatial resolution of CWV data generated by the near-infrared algorithm is 1 km, whereas that of the infrared algorithm is 5 km [37]. The water vapor at 1 km was used in this study and the temporal variation between MODIS and Landsat 8 overpass time was ignored.

#### **3. Landsat 8 Images and Ground-Measured LST**

Landsat 8 images and the corresponding ground-measured data of the surface radiation budget network (SURFRAD) [38] that were usually used for in situ LST validation [15,20,39–47] were obtained to evaluate the different LST retrieval algorithms as mentioned above. The Landsat 8 image pairs before and after the stray light correction were obtained from the USGS website to illustrate the LST retrieval accuracy change before and after the correction. The brightness temperatures *T*<sup>10</sup> and *T*<sup>11</sup> were consequently calculated from the observation using the radiometric coefficients in the metadata file.

The SURFRAD was established in 1993 through the support of NOAA's Office of Global Programs. The primary objective was to support climate research with accurate, continuous, long-term measurements of the surface radiation budget over the United States. SURFRAD has seven sites, but only four of them (Bondville\_IL (BND), Goodwin\_Creek\_MS (GCM), Penn\_State\_PA (PSU), and Sioux\_Falls\_SD (SXF)) are suitable for the validation of moderate-resolution remote sensing images, such as Landsats 5 and 7, owing to the heterogeneity issues [41]. However, given that the error caused by stray light is related to the ground temperature of the surrounding area and the error is greater when the surrounding area is warm [17], the Desert\_Rock\_NV (DRA) site was also selected for validation to explore the effect of the stray light correction in broader temperature ranges. Since its land cover type is open shrublands, resulting in some high LST of this site. Meanwhile, DRA site has also been used in the LST validation of Landsat 8 [20] and other TIR sensors [46,47] in recent years, which can prove the applicability of the DRA site in some degree. As a result, we finally used the BND, GCM, PSU, SXF, and DRA sites to validate the Landsat 8 LST retrieval results considering heterogeneity issues and sufficient temperature range. Table 3 lists the information of the five sites of the SURFRAD program and the number of clear-sky Landsat 8 images from its launch to August 2017 over each site. We removed the observations that had a standard deviation of temperature exceeding 1 K for 3 pixels × 3 pixels around the site center, in order to reduce the error caused by heterogeneity [41]. Finally, a total of 207 images were obtained for analysis, as shown in Table 3.


**Table 3.** The information of the surface radiation budget network (SURFRAD) sites and the image numbers under clear-sky conditions.

Each site provides a measurement of the upward surface TIR radiance *L* <sup>↑</sup> and the downward atmospheric TIR radiance *R* ↓ in the wavelength range of 3–50 µm every 1 min [39]. On the basis of the thermal radiative transfer equation of the near surface and the Stefan–Boltzmann law, the ground LST can be calculated as:

$$T\_{s,ground} = \left[\frac{L^\uparrow - (1 - \overline{\varepsilon})R^\downarrow}{\overline{\varepsilon}\sigma}\right]^{\frac{1}{4}}\tag{12}$$

In Equation (12), σ is the Stefan–Boltzmann constant with the value of 5.67 × 10−<sup>8</sup> W/m<sup>2</sup> ·K<sup>4</sup> . ε is the ground broadband emissivity (BBE). BBE in 8–13.5 µm was used, because it is considered as the best wavelength range for estimating the net longwave radiation under clear sky [48,49]. To obtain this BBE, we also used the NDVI-threshold method as stated in Equation (11). However, the broadband component emissivity rather than the band component emissivity was used in Equation (11) for BBE calculation. Some details of the technique can be found in Ren et al. [33].

#### **4. Band Radiance and LST Evaluation Results**

The LST in the ground sites was retrieved from Landsat 8 images by using the above five algorithms. For simplification, the generalized split-window algorithm by Du et al. [14], the linear split-window algorithm by Rozenstein et al. [13], and the split-window algorithm by Jiménez-Muñoz et al. [12] were denoted as SW\_Du, SW\_Rozenstein, and SW\_JM, respectively. The single-channel algorithms were denoted as SC\_10 for using Band 10 image and as SC\_11 for using Band 11 image. This section focuses on the LST evaluation results in two aspects, namely, the band radiance comparison before and after the stray light correction and the LST retrieval result comparison.

### *4.1. TOA Radiance Comparison Before and After the Stray Light Correction*

After the stray light correction, some changes in band TOA radiance should be observed. On the basis of the above clear-sky Landsat 8 images over the ground sites, we investigated the band TOA radiance changes caused by the stray light correction. From the 207 images over the five sites, Figure 1a shows the TOA radiance difference (∆*L*) of Band 10 before and after the stray light correction, and Figure 1b is the case of Band 11. The bias of both bands turned out to be slightly positive, which meant that the TIRS data became minimally "hotter" after the stray light correction in general. For Band 10, the histogram of ∆*L* was mostly concentrated larger than zero, with a bias of 0.08 W·m−<sup>2</sup> ·sr−<sup>1</sup> ·µm−<sup>1</sup> . A general positive TOA radiance change was observed for this band, although such a change was unremarkable. For Band 11, ∆*L* showed a broader distribution but had a similar bias with Band 10, indicating that the stray light on Band 11 was corrected in a larger scale and the variance of correction was also greater than that of Band 10.

after the stray light correction.

variance of correction was also greater than that of Band 10.

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 8 of 21

bias with Band 10, indicating that the stray light on Band 11 was corrected in a larger scale and the

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 8 of 21

**Figure 1.** Histograms of the band top of the atmosphere (TOA) radiance change after the stray light correction for Landsat 8 thermal infrared sensor (TIRS) of (**a**) Band 10 and (**b**) Band 11. Δ*L* is the difference between TOA radiance after and before the stray light correction (after minus before). **Figure 1.** Histograms of the band top of the atmosphere (TOA) radiance change after the stray light correction for Landsat 8 thermal infrared sensor (TIRS) of (**a**) Band 10 and (**b**) Band 11. ∆*L* is the difference between TOA radiance after and before the stray light correction (after minus before). correction for Landsat 8 thermal infrared sensor (TIRS) of (**a**) Band 10 and (**b**) Band 11. Δ*L* is the difference between TOA radiance after and before the stray light correction (after minus before).

The effect of stray light is related to LST [17], hence, the relationship between correction results

The effect of stray light is related to LST [17], hence, the relationship between correction results and LST is necessary to explore. As stated in Section 3, the LST calculated from the SURFRAD data was regarded as the reference LST for the evaluation. Figure 2 shows the scatter plot between Δ*L* caused by the stray light correction and SURFRAD ground LST (SURFRAD LST). A negative correlation was firstly observed, and the radiance of Band 11 was found to have a more significant negative relationship between Δ*L* and SURFRAD LST than that of Band 10. In the low LST range of 260–280 K, the stray light correction of Band 11 was 0.2–0.4 W·m−2·sr−1·μm−1, which was larger than that of Band 10. In the LST range of 290–300 K, the stray light correction of the two bands became nearly the same, with a value of approximately 0.1 W·m−2·sr−1·μm−1, corresponding to a temperature difference of approximately 0.7–0.8 K. For LST > 310 K, the absolute value of the stray light correction of Band 10 was smaller than that of Band 11. Such modification can result in a remarkable effect on the final LST retrieval, especially for the single-channel method. This study analyzed this effect, as stated in the following Section 4.2, to clarify the change on LST retrieval result before and The effect of stray light is related to LST [17], hence, the relationship between correction results and LST is necessary to explore. As stated in Section 3, the LST calculated from the SURFRAD data was regarded as the reference LST for the evaluation. Figure <sup>2</sup> shows the scatter plot between <sup>∆</sup>*<sup>L</sup>* caused bythe stray light correction and SURFRAD ground LST (SURFRAD LST). A negative correlation was firstly observed, and the radiance of Band 11 was found to have a more significant negative relationship between ∆*L* and SURFRAD LST than that of Band 10. In the low LST range of 260–280 K, the stray light correction of Band 11 was 0.2–0.4 W·m−<sup>2</sup> ·sr−<sup>1</sup> ·µm−<sup>1</sup> , which was larger than that of Band 10. In the LST range of 290–300 K, the stray light correction of the two bands became nearly the same, with a value of approximately 0.1 W·m−<sup>2</sup> ·sr−<sup>1</sup> ·µm−<sup>1</sup> , corresponding to a temperature difference of approximately 0.7–0.8 K. For LST > 310 K, the absolute value of the stray light correction of Band 10 was smaller than that of Band 11. Such modification can result in a remarkable effect on the final LST retrieval, especially for the single-channel method. This study analyzed this effect, as stated in the following Section 4.2, to clarify the change on LST retrieval result before and after the stray light correction. and LST is necessary to explore. As stated in Section 3, the LST calculated from the SURFRAD data was regarded as the reference LST for the evaluation. Figure 2 shows the scatter plot between Δ*L* caused by the stray light correction and SURFRAD ground LST (SURFRAD LST). A negative correlation was firstly observed, and the radiance of Band 11 was found to have a more significant negative relationship between Δ*L* and SURFRAD LST than that of Band 10. In the low LST range of 260–280 K, the stray light correction of Band 11 was 0.2–0.4 W·m−2·sr−1·μm−1, which was larger than that of Band 10. In the LST range of 290–300 K, the stray light correction of the two bands became nearly the same, with a value of approximately 0.1 W·m−2·sr−1·μm−1, corresponding to a temperature difference of approximately 0.7–0.8 K. For LST > 310 K, the absolute value of the stray light correction of Band 10 was smaller than that of Band 11. Such modification can result in a remarkable effect on the final LST retrieval, especially for the single-channel method. This study analyzed this effect, as stated in the following Section 4.2, to clarify the change on LST retrieval result before and after the stray light correction.

**Figure 2.** Relationship between ∆*L* and SURFRAD LST over ground sites.

After considering ∆*L* of the single band, the relationship between the two bands' brightness temperature difference (*T*<sup>10</sup> − *T*11) and the SURFRAD LST before and after the stray light correction was investigated. As illustrated in Figure 3, before the correction, the brightness temperature difference (*T*<sup>10</sup> − *T*11) had no evident relationship with the ground LST (see Figure 3a). However, after the stray light correction, the brightness temperature difference (*T*<sup>10</sup> − *T*11) had a significant positive correlation with the ground LST (Figure 3b) and the same trend as the simulation data (Figure 3d). This change showed that the stray light correction process not only improved TOA radiance of two TIRS bands respectively as stated in the Introduction section, but also produced a more reasonable correlation on brightness temperature of two Landsat 8 TIRS bands data. Moreover, the change ∆(*T*<sup>10</sup> − *T*11) in the brightness temperature difference (*T*<sup>10</sup> − *T*11) before and after the stray light correction was also found to be linearly correlated with SURFRAD LST, as shown in Figure 3c. In high temperature (LST > 320 K), the (*T*<sup>10</sup> − *T*11) increased, while in low temperature (LST < 280 K), the (*T*<sup>10</sup> − *T*11) decreased. Figure 3d shows that (*T*<sup>10</sup> − *T*11) had an obvious relationship with LST in theory, which makes it possible to develop a split-window algorithm with the term (*T*<sup>10</sup> − *T*11) to retrieve LST. Therefore, the evident and closer-to-theory relationship between (*T*<sup>10</sup> − *T*11) and LST appeared after correction was expected to make the refined TIRS image more suitable for the split-window algorithm than the original TIRS image. After considering Δ*L* of the single band, the relationship between the two bands' brightness temperature difference (*T*10 − *T*11) and the SURFRAD LST before and after the stray light correction was investigated. As illustrated in Figure 3, before the correction, the brightness temperature difference (*T*10 − *T*11) had no evident relationship with the ground LST (see Figure 3a). However, after the stray light correction, the brightness temperature difference (*T*10 − *T*11) had a significant positive correlation with the ground LST (Figure 3b) and the same trend as the simulation data (Figure 3d). This change showed that the stray light correction process not only improved TOA radiance of two TIRS bands respectively as stated in the Introduction section, but also produced a more reasonable correlation on brightness temperature of two Landsat 8 TIRS bands data. Moreover, the change Δ(*T*<sup>10</sup> − *T*11) in the brightness temperature difference (*T*10 − *T*11) before and after the stray light correction was also found to be linearly correlated with SURFRAD LST, as shown in Figure 3c. In high temperature (LST > 320 K), the (*T*10 − *T*11) increased, while in low temperature (LST < 280 K), the (*T*10 − *T*11) decreased. Figure 3d shows that (*T*10 − *T*11) had an obvious relationship with LST in theory, which makes it possible to develop a split-window algorithm with the term (*T*10 − *T*11) to retrieve LST. Therefore, the evident and closer-to-theory relationship between (*T*10 − *T*11) and LST appeared after correction was expected to make the refined TIRS image more suitable for the split-window algorithm than the original TIRS image.

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 9 of 21

**Figure 2.** Relationship between Δ*L* and SURFRAD LST over ground sites.

**Figure 3.** Relationship of brightness temperature difference (*T*10 − *T*11) and ground-measured LST (denoted as SURFRAD LST). (**a**) Before the stray light correction and (**b**) after the stray light correction. (**c**) Change (Δ(*T*10 − *T*11)) in the brightness temperature difference before and after the stray light correction. (**d**) Density scatter plot of (*T*10 − *T*11) and LST in the simulation dataset. **Figure 3.** Relationship of brightness temperature difference (*T*<sup>10</sup> − *T*11) and ground-measured LST (denoted as SURFRAD LST). (**a**) Before the stray light correction and (**b**) after the stray light correction. (**c**) Change (∆(*T*<sup>10</sup> − *T*11)) in the brightness temperature difference before and after the stray light correction. (**d**) Density scatter plot of (*T*<sup>10</sup> − *T*11) and LST in the simulation dataset.

#### *4.2. LST Retrieval Comparison Before and After the Stray Light Correction 4.2. LST Retrieval Comparison Before and After the Stray Light Correction*

This section focused on the LST retrieval evaluation before and after the stray light correction in two ways. First, the retrieved LST was compared with the ground-measured LST to check whether This section focused on the LST retrieval evaluation before and after the stray light correction in two ways. First, the retrieved LST was compared with the ground-measured LST to check whether the

LST retrieval accuracy improved after the stray light correction. Second, the LST retrieval accuracy among different split-window and single-channel algorithms was compared to find the best algorithm for retrieving LST from Landsat 8 TIRS images.

#### 4.2.1. Overall Comparison of LST Retrieval Results

On the basis of the Landsat 8 images before and after the stray light correction, the above split-window and single-channel algorithms were used to estimate the LST over the ground sites. In accordance with the ground reference LST in different sites, Table 4 lists the LST RMSE and bias for the five algorithms before and after the stray light correction over all five sites.

**Table 4.** The RMSE and bias of Landsat 8 LST retrieval from five algorithms before and after the stray light correction over all five sites.


<sup>1</sup> The value in the bracket is the RMSE or bias change after the stray light correction compared to those of no correction.

Before the stray light correction, three split-window algorithms and SC\_10 nearly had RMSEs smaller than 3.0 K, which was obviously smaller than the RMSE (4.33 K) of SC\_11. However, the absolute values of bias for SW\_Du and SW\_Rozenstein were greater than 1.0 K, indicating that the retrieved LST was slightly biased from the ground-measured LST on average. As for the single-channel algorithm, SC\_10 performed better than SC\_11; the RMSE of SC\_10 was 2.74 K with a bias of −0.17 K, whereas SC\_11 had a larger RMSE of 4.33 K. From the RMSE and bias, the performance of SW\_JM exhibited the best results among the five methods before the stray light correction, with an RMSE of 2.33 K and a bias of 0.58 K. After the stray light correction, the RMSEs of all algorithms were less than 4.0 K. The least RMSE was nearly 2.5 K, which was obtained by SW\_Rozenstein, SW\_JM, and SC\_10. Among the three algorithms, the biases of SW\_Rozenstein and SC\_10 were within 0.6 K; therefore, these two algorithms may be the best algorithms to retrieve LST after the stray light correction. SC\_11 still had the worst performance to retrieve LST from the Landsat TIRS images, with an RMSE of 3.55 K and a bias of 1.59 K.

The comparison of the retrieved results of all sites before and after the stray light correction indicated that the absolute changes in RMSE and bias for the five algorithms were all within 1.0 K, which showed only slight changes in retrieved LSTs after the stray light correction on average. For the split-window algorithms, the RMSE of SW\_Rozenstein decreased by 0.3 K, whereas that of SW\_Du and SW\_JM increased minimally after the stray light correction. For the single-channel algorithm, the RMSE of two bands both decreased, indicating an improved performance of the single-channel algorithm for Landsat 8 TIRS after the correction. The RMSE of SC\_11 decreased most among the five algorithms but was still larger than 3.5 K. However, the bias of all algorithms increased greater than 0.6 K, showing that retrieved LSTs were overestimated after the stray light correction on average, and the correction might be biased.

Figure 4 demonstrates the comparison of LST retrieval from the five algorithms before and after the stray light correction. Moreover, because of the particularity of the DRA site mentioned in Section 3, its result is shown separately with different symbols. From Figure 4, it was found that the retrieved LST highly linearly correlated to the ground-measured LST, with coefficients *R* 2 larger than 0.95 for all cases. For the three split-window algorithms (Figure 4a–c), the retrieved LSTs from the corrected images were generally higher than those from uncorrected images, especially at high ground surface

temperature. In the case that the retrieved LST before the correction was higher than the ground LST, the accuracy of LST retrieval naturally decreased if the retrieval LST was higher after the correction. For the SW\_Du algorithm (Figure 4a), most retrieved LSTs before correction were higher than ground LST. After correction, the retrieval error was greater in most cases. For the SW\_Rozenstein algorithm (Figure 4b), when ground LST was higher than 315 K, the retrieved LST before correction was lower than the ground LST for most cases. Although the retrieved LST after correction was higher than that before the correction (similar to the SW\_Du algorithm), the accuracy of LST retrieval increased after the correction, which was different from the SW\_Du algorithm. For the SW\_JM algorithm (Figure 4c), under low-temperature conditions (LST < 290 K), the retrieved LST became lower after the correction. Consequently, the over-high retrieved LST before the correction was getting closer to the ground LST, leading to the improvement of retrieval accuracy. However, under high-temperature conditions (LST > 300 K), the retrieved LST after the correction became higher in general, resulting in a larger error. With the opposite accuracy change of low and high temperatures, the RMSE change of the SW\_JM algorithm finally became very small and was only half of that of the SW\_Du algorithm, as presented in Table 4 (+0.55 K for the SW\_Du algorithm and +0.21 K for the SW\_JM algorithm). For the single-channel algorithm (Figure 4d,e), the retrieved LST after the correction increased at low temperatures but decreased at high temperatures, compared to that of before the correction. The LST change of the SC\_11 algorithm was more evident than that of the SC\_10 algorithm (−0.27 K for SC\_10 and −0.78 K for SC\_11), and the scatter plot result of both algorithms were getting closer to the 1:1 line. This finding explained the details of accuracy increase in single-channel algorithms for the two TIRS bands, as shown in Table 4.

The single-channel algorithm is a theoretically derived algorithm; accordingly, the accuracy changes of SC\_10 and SC\_11 can be used to check whether the TIRS data are better after the stray light correction. Table 4 indicates that the errors of the SC\_10 and SC\_11 algorithms were reduced, meaning that LST retrieval results got better after the stray light correction and the SC\_11 improved more. Combined with the change in the band radiance before and after the correction, the difference of retrieved LST can be analyzed clearly. Figure 2 in Section 4.1 illustrates that the TOA radiance of the two bands after the correction increased at low temperatures and decreased at high temperatures, resulting that the retrieved LST was closer to the ground LST of the single-channel algorithm on two bands. This finding from SURFRAD sites provided the proof supporting that the stray light correction improved not only the TIRS data quality as stated in previous study [17] but also the LST retrieval accuracy in practice. However, the validations over more robust and homogeneous ground-measured datasets, such as desert and water sites, are still needed to clarify the sole effect of the stray light correction on LST retrieval.

Since the DRA site had heterogeneity issues and high ground temperature cases (all from DRA when SURFRAD LST > 320 K) that had distinguishing features in the result of validation in Figure 4, the analysis excluding the DRA site is also necessary to help understand the effect of the stray light correction. Table 5 lists the temperature RMSE and bias of the five algorithms calculated from data of all sites excluding DRA.

**Table 5.** The RMSE and bias of Landsat 8 LST retrieval from five algorithms before and after the stray light correction over all sites excluding DRA.


<sup>1</sup> The value in the bracket is the RMSE or bias change after the stray light correction compared to those of no correction.

of all sites excluding DRA.

those of no correction.

light correction over all sites excluding DRA.

the sole effect of the stray light correction on LST retrieval.

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 11 of 21

the ground LST, the accuracy of LST retrieval naturally decreased if the retrieval LST was higher after the correction. For the SW\_Du algorithm (Figure 4a), most retrieved LSTs before correction were higher than ground LST. After correction, the retrieval error was greater in most cases. For the SW\_Rozenstein algorithm (Figure 4b), when ground LST was higher than 315 K, the retrieved LST before correction was lower than the ground LST for most cases. Although the retrieved LST after correction was higher than that before the correction (similar to the SW\_Du algorithm), the accuracy of LST retrieval increased after the correction, which was different from the SW\_Du algorithm. For the SW\_JM algorithm (Figure 4c), under low-temperature conditions (LST < 290 K), the retrieved LST became lower after the correction. Consequently, the over-high retrieved LST before the correction was getting closer to the ground LST, leading to the improvement of retrieval accuracy. However, under high-temperature conditions (LST > 300 K), the retrieved LST after the correction became higher in general, resulting in a larger error. With the opposite accuracy change of low and high temperatures, the RMSE change of the SW\_JM algorithm finally became very small and was only half of that of the SW\_Du algorithm, as presented in Table 4 (+0.55 K for the SW\_Du algorithm and +0.21 K for the SW\_JM algorithm). For the single-channel algorithm (Figure 4d,e), the retrieved LST after the correction increased at low temperatures but decreased at high temperatures, compared to that of before the correction. The LST change of the SC\_11 algorithm was more evident than that of the SC\_10 algorithm (−0.27 K for SC\_10 and −0.78 K for SC\_11), and the scatter plot result of both algorithms were getting closer to the 1:1 line. This finding explained the details of accuracy

increase in single-channel algorithms for the two TIRS bands, as shown in Table 4.

The single-channel algorithm is a theoretically derived algorithm; accordingly, the accuracy changes of SC\_10 and SC\_11 can be used to check whether the TIRS data are better after the stray light correction. Table 4 indicates that the errors of the SC\_10 and SC\_11 algorithms were reduced, meaning that LST retrieval results got better after the stray light correction and the SC\_11 improved more. Combined with the change in the band radiance before and after the correction, the difference of retrieved LST can be analyzed clearly. Figure 2 in Section 4.1 illustrates that the TOA radiance of the two bands after the correction increased at low temperatures and decreased at high temperatures, resulting that the retrieved LST was closer to the ground LST of the single-channel algorithm on two bands. This finding from SURFRAD sites provided the proof supporting that the stray light correction improved not only the TIRS data quality as stated in previous study [17] but also the LST retrieval accuracy in practice. However, the validations over more robust and

**Figure 4.** Comparison of retrieved LST before and after the stray light correction for different algorithms. (**a**) for SW\_Du, (**b**) for SW\_Rozenstein, (**c**) for SW\_JM, (**d**) for SC\_10, and (**e**) for SC\_11. The linear regression coefficients were obtained from data of all five sites. "Before/After (without DRA)" in the figure means the data from the other four SURFRAD sites after excluding DRA before or after the correction; "Before/After (DRA)" in the figure means the data from the DRA site before or after the correction. **Figure 4.** Comparison of retrieved LST before and after the stray light correction for different algorithms. (**a**) for SW\_Du, (**b**) for SW\_Rozenstein, (**c**) for SW\_JM, (**d**) for SC\_10, and (**e**) for SC\_11. The linear regression coefficients were obtained from data of all five sites. "Before/After (without DRA)" in the figure means the data from the other four SURFRAD sites after excluding DRA before or after the correction; "Before/After (DRA)" in the figure means the data from the DRA site before or after the correction.

Since the DRA site had heterogeneity issues and high ground temperature cases (all from DRA when SURFRAD LST > 320 K) that had distinguishing features in the result of validation in Figure 4, the analysis excluding the DRA site is also necessary to help understand the effect of the stray light correction. Table 5 lists the temperature RMSE and bias of the five algorithms calculated from data From Table 5, when excluding high ground temperature cases of DRA, there were some differences from the results including DRA in Table 4. Except the SW\_Du, the RMSE of the other four algorithms all decreased little and RMSEs of five algorithms all were within 3.0 K after the correction. The decreases on RMSE of SC\_10 and SC\_11 were both around 0.3 K, narrowing the gap of the RMSE change on SC\_10

**RMSE (K) Bias (K) RMSE (K) Bias (K)** 

1.97 (+0.68)

**Algorithm Before Correction After Correction** 

SW\_Rozenstein 2.43 −0.42 2.30 (−0.13) 0.32 (+0.74) SW\_JM 2.47 0.82 2.27 (−0.20) 1.22 (+0.40) SC\_10 2.41 −0.83 2.11 (−0.31) 0.10 (+0.93) SC\_11 3.20 −0.17 2.84 (−0.36) 1.02 (+1.19) <sup>1</sup>The value in the bracket is the RMSE or bias change after the stray light correction compared to

SW\_Du 2.40 1.30 2.71 (+0.30) <sup>1</sup>

and SC\_11 in Table 4. This indicated that the correction and improvement of single-channel algorithm on Band 11 were greater than Band 10 in a high ground temperature range, which was consistent with the result in Figures 2 and 4d,e. However, the RMSE change of split-window algorithms was different from Table 4, especially for SW\_JM, which will be analyzed in detail in the Discussion section. Considering the high temperature cases in the practical use, this paper finally kept the DRA site in the statistical analysis to make the conclusion more universal.

#### 4.2.2. Comparison Results over Each SURFRAD Site

On the basis of the comparison results over all ground sites, we can determine the overall change on LST retrieval from different algorithms before and after the stray light correction. However, these sites are different from one another in the land cover type and homogeneity degree, and those differences may cause confusion in understanding the validation results. Therefore, further analysis of the results over each site is necessary. Table 6 lists the temperature RMSE and bias of retrieved LSTs from the five algorithms using the Landsat 8 images before the stray light correction over each site. Table 7 provides the results after the stray light correction. Figure 5 presents the scatter plots between the retrieved LST and ground LST.

Table 6 indicates that before the correction, the four other algorithms performed better than SC\_11 over BND, GCM, and PSU sites, and their temperature RMSEs were within the range of 1.5–2.7 K. From Table 7, after the correction, the temperature RMSEs of the single-channel algorithm of two bands decreased over the three sites. However, the temperature RMSE change of the split-window algorithms was not the same. The RMSE of SW\_Du for the three sites evidently increased, that is, 0.40 K for BND, 0.66 K for GCM, and 0.75 K for PSU. By contrast, the RMSE of the two split-window algorithms (SW\_Rozenstein and SW\_JM) over the three sites changed minimally. The temperature bias of all algorithms of the three sites increased, indicating that the retrieved LST was higher after correction on average. Figure 5 depicts that the ground LST of these sites was lower than 310 K; in this temperature range, most of the radiance of Bands 10 and 11 was increased (Figure 2). Therefore, the retrieved LST results were consistent with the change in radiance.

Table 6 presents that the GCM and PSU sites were suitable to retrieve LST using split-window algorithms before the stray light correction. The RMSEs of the three split-window algorithms on the two sites were within the range of 1.5–2.3 K, which were generally smaller than those of the single-channel algorithms of both bands. After the correction, with an increase on RMSEs of the split-window algorithms and a decrease on RMSEs of the single-channel algorithms, the superiority of the split-window algorithms in retrieving LST for the two sites disappeared. SC\_10 had the smallest RMSEs with a small bias over the PSU site, similar to the result obtained over the BND site. At the GCM site, SC\_10 also showed no weakness compared with some split-window algorithms.

For SXF, before the correction, the performance of the split-window algorithms was worse than that of SC\_10 (Table 6), and the SW\_Du and SW\_JM algorithms even had larger RMSEs than that of SC\_11. The combined analysis of Figure 5a–c implied that for the split-window algorithms on SXF, when the ground LST was in the range of 300–310 K, the retrieved LST was obviously higher than the ground LST, corresponding to the poor performance of the split-window algorithms on SXF. After the correction, the obvious error had disappeared, indicating an improvement in the split-window algorithms in retrieving LSTs (the RMSE of three split-window algorithms decreased and the biases of split-window algorithms were closer to 0.0 K). Nevertheless, they still exhibited worse results than SC\_10.

Although DRA was unsuitable for accurate validation because of heterogeneity, as mentioned in Section 3, DRA still could reveal important information of the effect of the stray light correction because of its high ground surface temperature, as illustrated in Figure 5. Its RMSE of SC\_11 was very large before correction and reduced by 1.4 K after correction. The large RMSE in Figure 5e was caused by the poor performance of SC\_11 at high temperature, and the improvement was due to the lower

radiance correction at high temperature on Band 11 (Figure 2). This comparison result confirmed the good stray light correction of Band 11 at high-temperature conditions.

In conclusion, the biases of five algorithms increased on most cases after the correction, except for three split-window algorithms on SXF and SC\_11 on DRA. However, the RMSE changes were complicated of different split-window algorithms on different sites. SW\_Du and SW\_JM mostly kept a similar change trend on GCM, PSU, SXF, and DRA, but their RMSEs increased after the stray light correction except on SXF. The RMSEs of SW\_Rozenstein decreased on these five sites, except the PSU site, showing the better performance of SW\_Rozenstein after the correction on most cases. The RMSEs of the single-channel algorithms decreased over all sites after correction. SC\_10 had the smallest RMSEs and good biases on BND, PSU, and SXF after the stray light correction, and its RMSEs were approximately 2.1 K on these sites. The performance of SW\_Rozenstein was close to that of SC\_10 after the correction in RMSE and bias, but the accuracy of SW\_Rozenstein was better than that of SC\_10 for the DRA site. Therefore, similar to SC\_10, SW\_Rozenstein was also a better algorithm than other two split-window algorithms to retrieve LST for Landsat 8 after the stray light correction.

**Table 6.** The RMSE and bias of the LST retrieved from Landsat 8 TIRS images on five sites before the stray light correction.


**Table 7.** The RMSE and Bias of the LST retrieved from Landsat 8 TIRS images on five sites after the stray light correction.


<sup>1</sup> The value in the bracket means the RMSE and bias change after the stray light correction compared to those of no correction.

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 15 of 21

(**c**) SW\_JM algorithm result.

**Figure 5.** *Cont.*

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 16 of 21

(**e**) SC\_11 algorithm result.

**Figure 5.** Comparison of Landsat 8 retrieved LST (Landsat LST) with ground LST (SURFRAD LST) over different sites before and after the stray light correction. (**a**) is for the SW\_Du algorithm, (**b**) is for the SW\_Rozenstein algorithm, (**c**) is for the SW\_JM algorithm, (**d**) is for the SC\_10 algorithm, and (**e**) is for the SC\_11 algorithm. **Figure 5.** Comparison of Landsat 8 retrieved LST (Landsat LST) with ground LST (SURFRAD LST) over different sites before and after the stray light correction. (**a**) is for the SW\_Du algorithm, (**b**) is for the SW\_Rozenstein algorithm, (**c**) is for the SW\_JM algorithm, (**d**) is for the SC\_10 algorithm, and (**e**) is for the SC\_11 algorithm.

#### **5. Discussion 5. Discussion**

After the stray light correction, the single-channel and split-window algorithms showed different accuracy changes in different directions and ranges. The RMSE of single-channel algorithms of two bands all decreased on each site, but none of the three split-window algorithms showed the same change (all increased or all decreased) on the five sites. Moreover, different split-window algorithms had different accuracy changes on overall sites, and even the same split-window algorithm (SW\_JM) had increasing RMSE on all sites when including the DRA site but decreasing RMSE excluding the DRA site. The first confusing point is why split-window algorithms had such complicated accuracy changes that seemed to have no regularity. Secondly, when focusing on some specific temperature ranges, for instance, the high temperature (>320 K), the change of retrieved LSTs by split-window algorithms before and after the stray light correction was unreasonable. Figure 2 in Section 4.1 shows that the TOA radiance of the two bands after the correction was reduced under high-temperature condition (> 320 K). In the case if other factors are regarded as the same, the retrieved LSTs should have reduced in theory. However, as seen in Figure After the stray light correction, the single-channel and split-window algorithms showed different accuracy changes in different directions and ranges. The RMSE of single-channel algorithms of two bands all decreased on each site, but none of the three split-window algorithms showed the same change (all increased or all decreased) on the five sites. Moreover, different split-window algorithms had different accuracy changes on overall sites, and even the same split-window algorithm (SW\_JM) had increasing RMSE on all sites when including the DRA site but decreasing RMSE excluding the DRA site. The first confusing point is why split-window algorithms had such complicated accuracy changes that seemed to have no regularity. Secondly, when focusing on some specific temperature ranges, for instance, the high temperature (>320 K), the change of retrieved LSTs by split-window algorithms before and after the stray light correction was unreasonable. Figure 2 in Section 4.1 shows that the TOA radiance of the two bands after the correction was reduced under high-temperature condition (> 320 K). In the case if other factors are regarded as the same, the retrieved LSTs should have reduced in theory. However, as seen in Figure 4, under the high-temperature condition, the retrieved

LSTs of the three split-window algorithms increased after the correction, which was inconsistent with the theoretical expectation. The same unreasonable retrieved LST changes occurred also in some low temperature cases of SW\_Du and SW\_JM.

To explain the two confusing points, we must first explain the RMSE change of these algorithms was made of two parts: the one was the performance of the algorithm before the correction; the other was the change of retrieved LST before and after the correction. From Figures 4 and 5, it was easy to find that before the correction, the SW\_Du overestimated LST for most cases (bias = 1.14 K (with DRA), bias = 1.30 K (without DRA)), the SW\_Rozenstein underestimated LST for most cases (bias = −1.06 K (with DRA), bias = −0.42 K (without DRA)), and the SW\_JM overestimated LST a little (bias = 0.58 K (with DRA), bias = 0.82 K (without DRA)). Different performance before the correction will certainly result in a different accuracy change if these split-window algorithms have the same retrieved LST change before and after the correction, just as the high temperature cases (>320 K).

Meanwhile, the brightness temperature relationship between Bands 10 and 11, such as the brightness temperature difference of two bands, has an important effect on the performance of split-window algorithms. Taking the SW\_Du algorithm as an example, from its structure (see Equation (1)), the coefficients of (*T*10+*T*11) 2 and (*T*10−*T*11) 2 are positive based on the simulation data, and those coefficients of (*T*<sup>10</sup> − *T*11) <sup>2</sup> are very small. Therefore, the influence of (*T*<sup>10</sup> − *T*11) 2 can be ignored when analyzing the influence of the brightness temperature change on *Ts* . Under a high-temperature condition (LST > 320 K), as illustrated in Figure 3, the decreases in *T*<sup>10</sup> and *T*<sup>11</sup> would tend to cause a low retrieved LST, but the increase in (*T*<sup>10</sup> − *T*11) would tend to increase the LST. The result from Figure 4a shows that the retrieved LST of the SW\_Du algorithm after the correction was larger than that before the correction for the case LST > 320 K, indicating that the increasing effect on retrieved LST caused by an increase in TOA band radiance difference was greater than the decreasing effect on retrieved LST caused by a decrease in TOA radiance on Bands 10 and 11. The influence of brightness temperature difference finally caused the second confusing point above. Moreover, the different split-window algorithms have different structures and therefore, the brightness temperature relationship (for example the brightness temperature difference) will influence the performance of the algorithm in different degree. This can also cause the first confusing point, or more accurately, it was the different structures that resulted in a different performance of different split-window algorithms before the correction.

In the case that the RMSE change of split-window algorithms was much affected by the brightness temperature difference of two bands, the accuracy of brightness temperature (radiance) difference correction could directly determine the performance of split-window algorithms after the correction. However, the correction did not take into account the relationship between the radiance of two bands. Even though the radiance of Bands 10 and 11 got closer to the real radiance respectively and the relationship between the two bands' brightness temperature difference and LST became closer to the theoretical result mentioned in Section 4.1, the value of the radiance difference between the two bands may still be more biased away from the real value, making the performance of split-window algorithms worse.

Therefore, the structure of split-window algorithms, the performance of split-window algorithms before the correction and the change of brightness temperature difference between Bands 10 and 11 combined together to cause the confusing results of split-window algorithms before and after the correction.

#### **6. Conclusions**

This study focused on the evaluation of LST retrieval from Landsat 8/TIRS data before and after the stray light correction on the original observations using ground-measured LST from five SURFARD sites. Three split-window algorithms (SW\_Du, SW\_JM, and SW\_Rozenstein) and two single-channel algorithms (SC\_10 and SC\_11) were investigated.

The stray light correction increased band TOA radiance for low brightness temperature range (< 305 K for Band 10 and 310 K for Band 11) but decreased such radiance for a high brightness temperature range. The relationship between the two bands' brightness temperature difference and LST became closer to the theoretical result.

The LST retrieval error of the single-channel algorithm was consequently reduced. The RMSE of the single-channel algorithm for Bands 10 and 11 decreased by 0.27 K and 0.78 K, respectively. The improvement in retrieval accuracy for Band 11 at high temperature was obvious. In the high-temperature site (DRA), the decreased RMSE of the single-channel algorithm for Band 11 was 1.40 K. By contrast, the accuracy of split-window algorithm (such as SW\_Du) was unexpectedly reduced due to the variation in the brightness temperature difference of the two bands, and was unreasonably inconsistent with the change in radiance. For better use of split-window algorithms, the development of the split-window algorithms specified for Landsat 8, as well as the stray light correction on radiance relationship of Bands 10 and 11 may need more concern.

Among the five algorithms, the best two were SW\_Rozenstein and SC\_10. Before the stray light correction, the accuracy of the split-window algorithms was generally better than that of the single-channel algorithm. For the corrected images of overall sites, the accuracy of the SC\_10 algorithm increased, whereas the accuracy of the SW\_Du and SW\_JM algorithms decreased, even making the accuracy of the SC\_10 algorithm better than that of the SW\_Du and SW\_JM algorithms. After the correction, the RMSEs of SW\_Rozenstein and SC\_10 were approximately 2.5 K over all ground sites. In BND, GCM, PSU, and DRA sites, the RMSE of the single-channel algorithm for Band 10 was even within 2.2 K.

The results obtained in this study implied that the accuracy and applicability of the single-channel algorithm on Landsat 8/TIRS LST retrieval improved after the stray light correction. However, it still needs to be careful when using split-window algorithms to retrieve LST from Landsat 8/TIRS images. Meanwhile, it must be noted that the current results were only based on the measurements of five SURFRAD sites. Future evaluation using more ground-measured datasets over more homogeneous sites like desert and water sites remains strongly expected to clarify the quality of Landsat 8/TIRS data and the performance of different LST retrieval algorithms.

**Author Contributions:** Conceptualization, H.R.; Data curation, J.G., S.L. and J.D.; Formal analysis, J.G.; Funding acquisition, H.R.; Investigation, J.G.; Methodology, J.G., H.R. and Y.Z.; Resources, S.L. and J.D.; Supervision, H.R.; Validation, J.G.; Visualization, J.G. and Y.Z.; Writing—original draft, J.G.; Writing—review and editing, H.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National High-Resolution Earth Observation Project (04-Y30B01-9001-18/20-1-4), National Natural Science Foundation of China (No. 41771369), National key research and development program (2017YFB0503905-05).

**Acknowledgments:** The Landsat 8 images were downloaded from the USGS datacenter at https://earthexplorer. usgs.gov/; The SURFRAD data was obtained from Earth System Research Laboratory Global Monitoring Division (https://www.esrl.noaa.gov/gmd/grad/surfrad/).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
