**3. Methods**

#### *3.1. LST Pre-Processing*

The 1 km MODIS LSTs with quality control (QC) flags of 'cloud' were considered to be pixels under cloudy sky conditions, and all others were clear sky LSTs. To train the model and validate the estimated LST under cloudy areas, we used in situ LSTs measured at ASOSs. However, there are spatial scale differences between the MODIS LST (1 km) and the point-based ASOS LST; thus, a systematic bias could occur. To solve this problem, we fitted in situ LST and MODIS LST under clear sky conditions using polynomial regression by station [35], to bias-correct in situ LSTs for both daytime and nighttime (Equation (1)).

$$\mathbf{Y} = \mathbf{a}\mathbf{\hat{X}}^2 + \mathbf{b}\mathbf{\hat{X}} + \mathbf{C} \tag{1}$$

where X is the in situ LST and Y is the MODIS LST under clear sky conditions. a, b, and c represent the coefficients. We used the quadratic function because many ASOSs have observed that in situ LSTs rise more rapidly than MODIS LSTs at high temperatures in the summertime.

The MODIS and in situ LSTs from March to October, starting from 2013 through to 2018 at each station were used for the bias-correction. Only MODIS LSTs with good quality were used based on the QC flags, and the hourly in situ LSTs were linearly interpolated to the MODIS view time. We excluded the winter season (December through February) because the range of winter LSTs is not compatible with that of summer LSTs, which are the target variable of this study. Finally, we selected a total of 10 ASOSs for validation, with a calibration error (RMSE) < 2.5 ◦C for both daytime and nighttime in July–August (summer season) from the bias-correction analysis. Figure 1 shows the geographic distribution of the ten ASOSs used in this study. Appendix A describes the statistical results of before and after bias correction of the 1 km in situ LSTs at the 10 stations.

In addition, clear sky 1 km MODIS LSTs were aggregated to 10 km based on the 10 km AMSR2 grid area. At this stage, only 1 km MODIS LSTs where 95% or more clear sky LSTs exist in a 10 × 10 km<sup>2</sup> area were used for aggregation. We also fitted the in situ LSTs and the 10 km MODIS LSTs under clear sky conditions using polynomial regression in the same way as the bias correction of the 1 km LSTs at the selected 10 ASOS stations to produce the bias-corrected in situ LSTs at 10 km scale.

#### *3.2. Pre-Processing of Input Variables*

Table 1 describes the input variables used for the estimation of LST under cloudy conditions. A total of 10 AMSR2 BTs were projected onto MODIS LSTs, whilst AMSR2 pixels from near the coastline were masked to eliminate the effects of ocean water. Due to the shift of the flight overpass path, the areas of missing AMSR2 BT values were also excluded.

The 1 km MODIS annual cycle parameters (ACPs), including mean annual surface temperature (MAST), yearly amplitude surface temperature (YAST), and LST phase shift relative to the spring equinox (theta) were calculated for each of daytime and nighttime based on the following equation [36].

$$f(\mathbf{d}) = MAST + YAST \ast \sin\left(\frac{d2\pi}{365} + theta\right) \tag{2}$$

where d represents the day of the cycle relative to the spring equinox. The ACPs were created for each year of the study period and averaged to construct one MAST, YAST, and theta each for daytime and nighttime.

We aggregated the 30 m SRTM elevation and 30 m GMIS impervious surface fraction to the 1 km resolution of MODIS. Latitude and longitude values were extracted from the information contained in the MODIS tiles. This study converted DOY to values ranging from −1 to 1 within a one-year period, using a sine function that considers seasonality (i.e., setting the middle of summer as 1 and the middle of winter as −1) [37].


**Table 1.** Description of input variables used in this study.

### *3.3. Random Forest (RF)*

In this study, we applied machine-learning random forest (RF) to estimate the LST for all-weather conditions. RF has been widely used to conduct a variety of classification and regression tasks [38–43]. RF comprises classification and regression trees, producing a variety of independent trees to make a final decision through two randomizations: (1) Random selection of training samples and (2) random selection of predictor variables at each node of a tree [44]. This randomization is aimed at solving the overfitting problem and reducing the sensitivity of a model that comes from training sample configurations [45]. When a variable is randomly permuted, RF calculates the relative variable importance from out-of-bag (OOB) samples, measuring the mean squared error (MSE) of the OOB portion of samples in each tree. The same process is implemented whenever each input variable is perturbed, and the difference between the two MSEs of all trees is averaged. The larger the increase in the percentage of MSE (%incMSE) of a variable, the greater the contribution of the variable. It should be noted that the relative variable importance is considered as the sum of local contributions, not global importance [46–48]. The RF was implemented using the R statistical software through the "ranger" add-on package, with default model parameter settings (e.g., the number of trees set as 500). The "ranger" is a fast implementation of RF, or recursive partitioning, and is known to be suited to high-dimensional data with large datasets [49].

#### *3.4. Schemes for Estimating All-Weather LSTs*

In this study, we proposed two schemes (S1 and S2) to estimate all-weather MODIS LSTs. The S1 is based on a two-step approach (see Figure 2). The first step is estimating 10 km LSTs for daytime and nighttime. We used the following independent variables: Four different frequencies of the AMSR2 of two polarization BTs (10H/V, 18H/V, 23H/V, and 36H/V), along with five 10 km resampled auxiliary variables (Elev, Lat, Lon, DOY, and year). The aggregated 10 km MODIS LSTs under the clear sky condition were used as a target variable. Moreover, we used the 10 km bias-corrected in situ LSTs under the cloudy sky condition from 10 stations as a dependent variable together to train the model with the real LST characteristics under the cloudy skies. RF was applied to predict LSTs by training the

samples from March to October between 2013 and 2018. Finally, the 10 km LSTs were produced based on the developed model using the input variables under all-weather conditions.

**Figure 2.** Flowchart of scheme 1(S1) suggested in this study. Procedures are divided into two main steps: Estimating 10 km land surface temperatures (LSTs) (**left**) and the spatial downscaling of LSTs to 1 km (**right**).

The second step involved the spatial downscaling of LSTs from 10 km to 1 km. We used RF to develop the downscaling model for each date. The Elev, Imp, Lat, Lon, and ACPs (i.e., MAST, YAST, and theta) were used as input variables, considering the surface properties and spatial distribution characteristics of 1 km LSTs. At first, the 1 km MAST, YAST, and theta with 1 km auxiliary factors—Elev, Imp, Lat, and Lon—were aggregated to 10 km for independent variables for the RF model, and the 10 km LSTs, constructed in the first step, were used as the dependent variable. After the model was trained for each date, the original high-resolution 1 km input variables were put into the RF model to finally produce the 1 km all-weather LSTs, after the residual correction proposed by Hutengs and Vohland [50].

The second scheme S2 is a one-step algorithm that directly estimates 1 km all-weather LSTs (Figure 3). To create this scheme, first we downscaled all 10 km AMSR2 BTs to 1 km using bilinear resampling. The resampled BTs together with 1 km auxiliary variables (i.e., MAST, YAST, theta, Elev, Imp, Lat, Lon, DOY, and Year) were used as input variables in RF. We used the 1 km clear sky MODIS LSTs and the 1 km bias-corrected in situ LSTs under cloudy skies from ten stations as target variables for daytime and nighttime separately. RF models were built for each year with the samples of the entire study area from March through October being used for training. We put the input variables of all-weather conditions into the developed RF model to produce 1 km LSTs for both clear and cloudy sky conditions.

**Figure 3.** Flowchart of scheme 2(S2) for directly estimating 1 km all-weather LSTs in one-step.

#### *3.5. A Series of Validations*

For the first step of the first scheme, S1, two types of validations were implemented using the clear sky 10 km aggregated MODIS LSTs focusing on the summertime (i.e., July and August). In the first validation, we randomly divided July and August samples into training and validations sets (80%/20% split). In the humid summer, however, most of the area in South Korea is often covered by clouds for the whole day. So, for the second validation, we randomly divided the samples of July and August by date: 80% for training and the remaining 20% for validation. In the second step of S1, LSTs downscaled into 1 km by RF were validated using all clear sky MODIS LSTs in the summer. At this stage, we compared the performance of RF with other downscaling techniques in two ways: (1) A bilinear resampling of 10 km LSTs to 1 km; (2) a lapse-rate technique using Equation (3), which assumes that the temperature difference comes from elevation, suggested by Dual et al. [12]:

$$LST\_{1\ k m} = TLR \times (Elv\_{1\ k m} - Elv\_{10\ k m}) + LST\_{10\ k m} \tag{3}$$

where LST1km is the downscaled LST at 1 km scale; LST10km represents the predicted 10 km LSTs in step 1; Elv1km is the surface elevation at 1 km scale; Elv10km is the surface elevation at 10 km scale; and the TLR is the temperature lapse rate, which is defined as the rate of temperature decrease with elevation in the troposphere (average TLR value is 6.5 K/km; [51]). For the second scheme, S2, first and second validations similar as in S1 were performed.

Moreover, the estimated all-weather 1 km LSTs, including those under cloudy skies, were validated using bias-corrected in situ LSTs of 10 ASOSs in summer. Among the 10 ASOS stations, data at one station were used as validation and the others were used to train the model with in situ cloudy LSTs as targets. This leave one-station out cross-validation (LOOCV) was repeated for all 10 stations. The coefficient of determination (R<sup>2</sup> ; Equation (4)), RMSE (Equation (5)), normalized RMSE (nRMSE; Equation (6)), and bias (Equation (7)) were used to evaluate the performance of the models for each validation step.

$$\mathbf{R}^2 = 1 - \frac{\sum\_{i=1}^n (y\_i - \hat{y}\_i)^2}{\sum\_{i=1}^n (y\_i - \overline{y})^2}, \overline{y} = \frac{1}{n} \sum\_{i=1}^n y\_i. \tag{4}$$

$$\text{RMSE} \left( ^{\circ}\text{C} \right) = \sqrt{\sum\_{i=1}^{n} \frac{\left( \hat{y}\_{i} - y\_{i} \right)^{2}}{n}} \,\text{}\,\text{}\,\text{}\,\text{}\,\text{}\,\text{(5)}$$

$$\text{RRMSE} \left( \% \right) = \frac{\text{RMSE}}{\left( y\_{\text{max}} - y\_{\text{min}} \right)} \times 100 \text{ \AA} \tag{6}$$

$$\text{Bias } (\ulcorner \mathbf{C}) \; \; = \sum\_{i=1}^{n} \frac{(\mathbf{\hat{y}}\_{i} - y\_{i})}{n} \; \; \; \tag{7}$$

where *y<sup>i</sup>* is the measured value, *y*ˆ*<sup>i</sup>* is the predicted value, *ymax* and *ymin* represent the maximum and minimum values of observations, and n is the number of samples.

#### **4. Results and Discussion**

#### *4.1. Analysis of Clear and Cloudy Sky* in Situ *LSTs in Summer*

Figure 4 depicts the boxplots of bias-corrected LSTs for the 10 in situ stations under clear sky and cloudy sky conditions. It should be noted that the average LSTs in clear skies were higher than those of cloudy skies for all stations in the daytime. Although high LSTs also appear under cloudy sky conditions, LSTs are considerably lower when compared to clear sky LSTs. The boxplot (Figure 4) shows that the upper quartile of cloudy sky LSTs is close to the lower-quartile of clear sky LSTs for most in situ stations. One possible reason for this is an increasing cloud-cover on rainy days in humid areas in the summer, where precipitation reduces the surface sensible heat [12,52]. When the sky is heavily cloudy in the daytime, most of the Sun's energy does not reach the surface, preventing the heating up of the Earth. Otherwise, at nighttime, clear sky and cloudy sky LST distributions do not differ significantly for all the stations. There is no incoming radiation from the Sun at night, therefore LSTs do not rise significantly. The average nighttime LSTs—early overnight—under cloudy skies have a nearly identical range as, or only slightly lower than, the clear sky LSTs.

**Figure 4.** Boxplots of bias-corrected in situ LSTs under clear and cloudy skies for daytime and nighttime for each station. Refer to Figure 1 for station numbers.

Among the 372 days, the total number of summer days (July–August) for the six years (2013–2018), the size of good-quality clear sky samples was very small, especially for the daytime (see the sample size in Tables A1 and A2). This implies that the temporal or spatial interpolation approach to reconstructing LSTs in cloudy conditions could not be effectively applied to the summer season because of a large amount of cloud contamination.
