*2.3. Methods*

The workflow of the SM retrieval algorithm development is illustrated in Figure 2. The main steps are summarized as (1) S1 backscatter preprocessing; (2) Reducing the effect of surface roughness; (3) Reducing the effect of vegetation; (4) SM retrieval algorithm construction; (5) SM result post-processing; and (6) Retrieval result validation.

**Figure 2.** Workflow of SM retrieval and validation.

2.3.1. S1 Backscatter Preprocessing

Data preprocessing was performed on the GEE platform [72]. GEE has performed some preprocessing of the S1 data using the ESA S1 Toolbox (S1TBX), including applying orbit files, removing thermal noise, removing GRD border noise, radiometric calibration, and Range-Doppler terrain correction. Furthermore, the S1 incident angle normalization and spatial filtering are needed to make the data as correct as possible.

•S1 incident angle normalization

The *σ*◦ is affected by the incidence angle (*θ*0) of S1 and has a slight deviation from the actual situation. There is a certain correlation between *θ*0 and *<sup>σ</sup>*◦, which can be expressed as a slope *β* [73,74]. This study chose the central incidence angle of the study area (38◦) as the reference angle to reduce the overall error caused by extrapolation [42]. Therefore, as shown in Equation (1), we uniformly correct the *σ* to the value corresponding to the incident angle of 38◦ (*σ*◦ (38◦)).

$$
\sigma^{\circ}(38^{\circ}) = \sigma^{\circ}\ (\theta^{0}) - \beta\ (\theta^{0} - 38^{\circ})\ \text{[dB]}.\tag{1}
$$

The calculation of Equation (1) can be performed on the GEE.

• Refined Lee Filtering

In order to reduce the speckle noise in the image while preserving the image edge information, the Refined Lee filter with a window size of 7 × 7 is used in this study [75,76].

### 2.3.2. Sensitivity of Backscattering Coefficient to Soil Liquid Water

The correlation between SM and *σ*◦ in the permafrost region was analyzed at sites QT08 and QT09. As shown in Figure 3, the *σ*◦ and SM at both stations showed obvious seasonal variations, i.e., high in summer and low in winter. When the soil freezes, the soil's liquid water content decreases sharply, and therefore, the dielectric constant decreases, and consequently, the *σ*◦ drops significantly.

**Figure 3.** The time-series variation of SM and *σ*◦ at sites. (**a**) QT08 and (**b**) QT09.

Figure 3 also revealed that the same *σ*◦ corresponded to different SM values at the two sites. At site QT08, the *σ*◦ was in the range of −17 to −10 dB, corresponding to SM of about 0.003 to 0.13 m3/m3, while at site QT09, the *σ*◦ range was −17 to −11 dB, corresponding to the SM range of about 0.05 to 0.4 m3/m3. This shows that the range of *σ*◦ does not vary much between the two sites, but the corresponding SM ranges are dramatically different. Many studies have demonstrated that surface roughness and vegetation are the major factors affecting the correlation between surface backscatter intensity and SM [77]. Therefore, it is critical to reduce the effects of surface roughness and vegetation in the retrieval process to improve accuracy.

### 2.3.3. Reducing the Effect of Surface Roughness

The CD algorithm originally proposed by Wagner et al. determines the SM by linearly scaling the observed backscatter between that at the driest and wettest conditions [78]. The algorithm has been validated and used in many regions in SM retrieval studies in recent years, including semi-arid areas and mountainous areas [42,79]. Rignot EJM et al. conducted experiments in a mountainous region of the Katmai National Park and Preserve in Alaska and found that the CD algorithm is effective in removing the topographic influences [80]. Therefore, this study refers to the principle of the CD algorithm to eliminate the effect of topography in permafrost areas.

According to the characteristics of soil liquid water in the permafrost regions of QTP, the soil surface is mostly frozen in winter and holds little liquid water. Meanwhile, due to low precipitation during the period of January–February, the snowmelt or wet snowfall effect is limited, and the change in surface roughness in winter is not significant compared to the melting season. We assumed that the smallest *σ*◦ in winter of its temporal curves could represent the lowest liquid water content of the soil during this period. Therefore, we took the smallest value of *σ*◦ in the freezing season (January, February) as the reference value and subtracted it from the backscatter signal during the thawing season. The backscatter difference (Δ*σ*) between the freezing and thawing seasons (months July, and August) represents the changes in SM and vegetation. It can be expressed as:

$$
\Delta \sigma = \sigma\_{\ $} - \sigma\_{\$ } \tag{2}
$$

where *σ*s is the *σ*◦ of the thawing season, and *σ*w is the *σ*◦ of the freezing season. Figure 4 shows the distribution and details of *σ*s, *σ*w, and Δ*σ* in the study area. In the case of Figure 4a,b, the topographic factor has a significant effect on the backscattered signal. Figure 4c shows that the method of calculating the *σ*◦ difference between summer and winter is effective in weakening this effect.

**Figure 4.** The distribution of *σs*, *σw,* and Δ*σ* and details in the study area. (**a**): The *σ*◦ on 2 July 2018. (**b**): The *σ*w in 2018. (**c**): The Δ*σ.* (**d**–**f**): The details after scaling up of *<sup>σ</sup>*◦, *σw*, and Δ*σ* within the red box.

### 2.3.4. Reduce the Effect of Vegetation

We applied NDVI and NDMI to reflect vegetation characteristics and the water content in vegetation and proposed to use the combination of NDVI and NDMI to characterize the effect of vegetation on the *σ*. These vegetation indices were calculated as follows:

$$\text{NDVI} = (\rho\_{\text{nirr}} - \rho\_{\text{red}}) / (\rho\_{\text{nirr}} + \rho\_{\text{red}}) \tag{3}$$

$$\text{NIDMI} = (\rho\_{\text{nir}} - \rho\_{\text{svir}}) / (\rho\_{\text{nir}} + \rho\_{\text{svir}}) \tag{4}$$

where *ρred*, *ρnir*, and *ρswir* are the reflection value in the red spectrum, the near-infrared spectrum, and the shortwave infrared spectrum, respectively. It is noteworthy that the contribution of soil to NDMI is primarily negative for some areas, while the contribution of green vegetation is mostly positive [81].

### 2.3.5. SM Retrieval Algorithm Construction

We collected 129 datasets (in situ data, and corresponding Δ*σ*, NDVI, NDMI) during the thawing season (months July and August) from 2016 to 2019. A multiple linear regression model was constructed based on the linear relationship between SM and the Δ*σ*, NDVI, and NDMI, and the SM retrieval algorithm can be expressed as follows:

$$\text{\\$SM} = \text{a} \times \triangle \sigma + \text{b} \times \text{NIDVI} + \text{c} \times \text{NDMI} + \text{d} \tag{5}$$

where a, b, and c are the coefficients of the three variables (*<sup>σ</sup>*, NDVI, NDMI), respectively, and d is a constant. In order to ensure the universality of the retrieval algorithm, we arrange the in situ data of different years together and then performed 10,000 random divisions with a ratio of nearly 8 to 2 for determining the optimal coefficient. One part is used to obtain model coefficients (a, b, c, d). The other is used to verify the accuracy of retrieved SM. Thus, we can obtain 10,000 sets of coefficients, training, and validation of R2. Finally, we calculate the sum of R<sup>2</sup> for training and validation processes using their sample size as the weights. The coefficients corresponding to the maximum sum of R<sup>2</sup> are determined as the optimal coefficient.

### 2.3.6. SM Result Post-Processing

There are some outlier regions in the retrieval results, which are removed in the post-processing steps.

• Waterbody masking

The sensitivity of the *σ*◦ to soil liquid water is its advantage of SM retrieval, but *σ*◦ will present an anomaly and deviate from the normal range when the sensor scans water bodies such as rivers and lakes. The normalized difference water index (NDWI) is calculated using the green band, and the near-infrared band can effectively identify the water body information [82]. The NDWI is calculated as follows:

$$\text{NDWI} = (\rho\_{\text{green}} - \rho\_{\text{nir}}) / (\rho\_{\text{green}} + \rho\_{\text{nir}}) \tag{6}$$

In Equation (6), *ρgreen* is the reflection in the green spectrum, corresponding to the B3 band of the S2. Then, a mask is created with 0 as the threshold to remove the water body part of the retrieved results.

• Shadow masking

It is found that the *σ*◦ in the hillside or foothills usually shows outliers in our study area due to the radar signal being obscured and distorted in these areas. The local incidence angle (*θ*) can be calculated by using the zenith angle and azimuth angle of S1 to represent the illumination condition of the radar signal, expressed in Equation (7),

$$
\cos \theta = \cos \theta\_{zu} \times \cos S + \sin \theta\_{zu} \times \sin S \times \cos (\theta\_{u1} - A) \tag{7}
$$

where *θza* is the zenith angle, which is the same angle as *θ*0, *θaa* is the azimuth angle, *S* is the slope, and *A* is the aspect. The pixels with *θ* smaller than a threshold to be defined are masked out. According to a visual comparison of shadow outliers and local incidence masks, the threshold of 15◦ is adopted in our study.

• Negative Δ*σ* masking

Theoretically, there is a positive correlation between the *σ*◦ and the SM, and the *σ*◦ in the thawing season should be higher than that in winter. Hence, the area where the Δ*σ* is less than zero is considered abnormal and masked out during the post-processing.

### 2.3.7. SM Retrieval Algorithm Validation

In all, 29 samples were used to evaluate the accuracy of the retrieved SM using the proposed algorithm. The root-mean-square error (RMSE) and coefficient of determination (R2) are applied to indicate the accuracy of the SM retrieval result.
