*2.1. Bistatic Forward Scattering*

The DDM is the function of signal time-delay and Doppler frequency shift from the surface specular point, which implies the mapping relationship of scattering power between space and delay-Doppler domain. When the L-band signal impinges on the rough sea surface, incoherent scattering occurs in most cases and scattering power can be well signified by the Z-V model [6]:

$$P(\hat{\mathbf{r}}, \hat{f}\_D) = \frac{T\_I^2 P\_T \lambda^2}{(4\pi)^3} \iint\limits\_A \frac{G\_T \sigma^0 G\_R}{R\_R^2 R\_T^2} \Lambda^2(\hat{\mathbf{r}} - \mathbf{r}) \text{sinc}^2(\hat{f}\_D - f\_D) dA \tag{1}$$

$$\chi(\mathfrak{t}, \hat{f}\_D) = \Lambda(\mathfrak{t} - \mathfrak{r}) \text{sinc}(\hat{f}\_D - f\_D) \tag{2}$$

$$\Lambda(\mathfrak{k}-\mathfrak{r}) = \left\{ \begin{array}{c} (1-\frac{|\mathfrak{k}-\mathfrak{r}|}{\mathfrak{r}\_{\mathfrak{c}}})^2 & |\mathfrak{k}-\mathfrak{r}| \le \mathfrak{r}\_{\mathfrak{c}} \\ 0 & |\mathfrak{k}-\mathfrak{r}| > \mathfrak{r}\_{\mathfrak{c}} \end{array} \right. \tag{3}$$

$$\text{sinc}(f\_D - f\_D) = \left| \frac{\sin(\pi (\hat{f}\_D - f\_D) T\_I)}{\pi (f\_D - f\_D) T\_I} \right|^2 \tag{4}$$

where *P*(*τ*ˆ, ˆ *fD*) is the complex, diffuse scattering power, *PT* is the right-hand circular polarized (RHCP) transmitted power of the GNSS satellite, *GT* is the transmitter antenna gain, *GR* is the GNSS-R receiver antenna gain, *RR* is the distance from the receiver to the scatter point over the ocean surface, *RT* is the distance from the transmitter to the scatter point, *λ* is the wavelength of the GNSS carrier, *TI* is the coherent integration time, *σ*<sup>0</sup> is normalized bistatic radar cross-section, Λ(*τ*ˆ − *τ*) is the correlation function of the GNSS navigation code, *τ*ˆ and *τ* are the local replica code in the receiver and received signal time delays, respectively, sinc( ˆ *fD* <sup>−</sup> *fD*) is the attenuation due to Doppler misalignment, <sup>ˆ</sup> *fD* and *fD* are the local replica code in the receiver and received signal Doppler frequency shift, respectively. The product of the correlation function and the sinc function is called the Woodward ambiguity function (WAF) of the GNSS PRN navigation code. *A* indicate the

glistening zone, *dA* is the differential area within the glistening zone. The *σ*<sup>0</sup> can be further expressed as:

$$\sigma^0 = \frac{\pi |\mathfrak{R}\_{LR}(\theta)|^2 \stackrel{\rightarrow}{q}^4}{q\_z^4} P\left(-\frac{\stackrel{\rightarrow}{q}\_\perp}{q\_z}\right) \tag{5}$$

where *LR* is the Fresnel reflection coefficient of scattered left-hand circular polarized (LHCP) signal over the sea surface, <sup>→</sup> *q* is the scattering unit vector, <sup>→</sup> *<sup>q</sup>* <sup>⊥</sup> and *qz* are horizontal and vertical components, respectively, and *P* is the probability density function of the sea surface slope. The coherent integration time commonly sets 1ms on the delay Doppler mapping instrument to generate a single DDM look. Due to the diffuse scattering signals over sea surface being relatively weak, to improve the signal-to-noise ratio (SNR) of DDM and reduce the effect of speckle and thermal noise within a coherent integration of a DDM look, there is an extra noncoherent integrated step that takes 1 s, during which the received signal will lose the phase information. Normalized bistatic radar cross-section (NBRCS) has been used as the land surface remote sensing fundamental observable in the backward scatterometer for a very long time, which is also reasonable to be employed in GNSS-R with specific calibration.

Theoretically, real land scattering power consists of coherent and incoherent components. After noncoherent integration, the DDM can be expressed as:

$$
\left\langle \left| P(\hat{\mathbf{r}}, \hat{f}\_D) \right|^2 \right\rangle = \left\langle \left| P\_{\text{coh}}(\hat{\mathbf{r}}, \hat{f}\_D) \right|^2 \right\rangle + \left\langle \left| P\_{\text{incol}}(\hat{\mathbf{r}}, \hat{f}\_D) \right|^2 \right\rangle \tag{6}
$$

where *Pcoh*(*τ*ˆ, ˆ *fD*) 2 and *Pincoh*(*τ*ˆ, ˆ *fD*) 2 are coherent and incoherent contributions, respectively. Previous studies focus on land SM retrieval directly assumed that the incoherent item is negligible on Earth's land surface; the received scattering power mainly concentrates from the adjacent region around specular point. According to the image theory and Friis transmission equation, the coherent scattering power coming from the first Fresnel zone can be expressed:

$$P\_{\rm coh} = \frac{P\_T G\_T \lambda^2 G\_R}{\left(4\pi\right)^2 \left(R\_R + R\_T\right)^2} \Gamma(\theta) \gamma^2 \exp\left(-\left(2kr\cos(\theta)\right)^2\right) \tag{7}$$

where Γ is reflectivity, it is the function of the Fresnel reflection coefficient  (Γ(*θ*) = |*LR*(*θ*)| 2 ). *γ* is the transmissivity which indicates the vegetation layer attenuation, it is the function of vegetation opacity depth (VOD) *τ*(*γ*= exp(−*τ* sec *θ*)). The exponential term represents signal attenuation caused by surface roughness. *k* is the wavenumber, and *σ* represents the standard deviation of surface height. The size of the first Fresnel zone is related to the height of the receiver platform and signal incidence angle. For the CYGNSS mission, the diameter of the first Fresnel zone is about 0.5 km. Considering this small area compared to the spatial resolution of CYGNSS DDM pixels, it is reasonable to directly pick the peak power value in the DDM to calculate reflectivity. According to Equation (7), the surface reflectivity can be derived as follows:

$$\Gamma(\theta) = \frac{(4\pi)^2 P\_{peak} (R\_R + R\_T)^2}{\lambda^2 P\_T G\_T G\_R} \tag{8}$$

where *Ppeak* is the coherent power with surface roughness and vegetation attenuation correction. Since the velocity of CYGNSS satellites along the track is 7 km/s, the spatial resolution corresponding to the peak power measurement is about 0.5 km × 7 km. After June 2019, the sampling frequency of the CYGNSS mission has increased to 0.5 Hz, which allows the spatial resolution along the direction of satellite movement to reach 3.5 km.

When the roughness of the land surface is comparable to the scale of GNSS carrier wavelength, incoherent scattering will occur over the land surface. Since there is no reliable high spatial-temporal surface roughness information, it is difficult to determine the coherence of a DDM. The aforementioned assumption indeed ignores two issues for GNSS-R SM inversion. On one hand, if we directly consider that the main contribution of scattering power comes from the coherent component, which implies the influence of the incoherent components in DDM is ignored. On the other hand, when the GNSS-R observation footprint passes through the land surface with large roughness, the purely incoherent signal will be received, the influence of incoherent observations on the SM inversion is ignored as well. There is a big difference between the sensitivity of coherent and incoherent DDM observables to the SM level [9], so it is important to evaluate the influence of the previous assumption. Due to the different scattering mechanisms that happen behind the coherent and incoherent observations, the shape and magnitude of the measured scattering power in the DDM are different. The coherent DDM resembles the WAF itself without delay-doppler spreading [33]. Figure 1a shows a typical land reflected DDM over the winter wheat field. As a comparison, Figure 1b presents the DDM observed over the ocean surface with the 6.6 m/s wind speed near the specular point. The received diffuse scattering signals come from the entire glistening zone with the WAF spreading in the direction of delay and Doppler axis; DDM exhibits the typical "horseshoe" shape.

**Figure 1.** Typical scattering-power delay-Doppler map (DDM) and delay waveforms over the land surface (**a**,**c**) and ocean surface (**b**,**d**).

The time-delay waveform (DW) is the 1D representation of DDM; it is also a fundamental observable in the GNSS-R study, which usually includes two types: central Doppler

time-delay waveform (CDW) and integrated time-delay waveform (IDW). The CDW is the zero Doppler delay scattering power in the DDM. The IDW is obtained by summing the columns along the Doppler axis of DDM. In our classification method, we also define the deviation of time-delay waveforms (DDW) calculated by IDW subtracting CDW at each time delay bin. It can be noticed from the land reflected DW in Figure 1c that the trailing edge scattering power quickly decreases to the noise floor level after the peak point, and the deviation between CDW and DDW is small. Whereas the scattering power of the trailing edge of DW derived from the sea surface scattered DDM decreases slowly, especially for IDW and DDW, which is shown in Figure 1d, and the peak and trailing edge power of DDW are much larger than CDW. As the land topography changes and the surface roughness increases, which can lead to the intensity of the incoherent field strengthened rapidly, the coherent field weakens according to the conservation of energy. Then, the magnitude and distribution characteristics of DDM gradually approach the sea surface observations. Based on these features, we proposed a method to classify CYGNSS coherent and incoherent observations.

#### *2.2. Definition of Classification Estimator*

Since the DDM observed from the ocean surface dominated by incoherent scattering and from the relatively flat land surface dominated by coherent scattering are significantly different [17,25], the coherence classification method proposed here is based on the shape and distribution characteristics of power-spreading in the DDM. Here and after, we directly call coherent component dominated DDM as coherent DDM, and incoherent component dominated as incoherent DDM. The whole classification idea is inspired by GNSS-R sea ice detection [17,18], and both are essentially determining the similarity to the coherent model. For the coherent DDM, it resembles the Woodward ambiguity function (WAF) without delay-doppler spreading [33], while incoherent DDM exhibits the typical "horseshoe" shape. The calculation of the defined classification estimators is introduced in the following part in detail. To characterizes the difference in the coherence of DDM, combining the known typical range of estimator values calculated from ocean scattered incoherent DDM, the threshold of coherence can be determined in the CYGNSS land observation. It should be noted that the reference position of defined DDM estimators from the land surface refers to the delay and doppler bin of the DDM peak power. If the DDM observable can be calculated from the DW, the selected window is set to spanning 5 time-delay bins from the peak. For the DDMA calculation, the selected delay/Doppler window is a 5 × 3 matrix with the center located on the peak location. The given window size mainly depends on two reasons. On one hand, the position of peak power is not fixed in each CYGNSS DDM, so when the entire DDM is directly used to calculate the estimator, the range of statistical delay and Doppler is different. On the other hand, it is based on the shape of WAF, which is shown in Figure 2. The main coherent reflection power concentrate on this zone. The final objective is to compute the probability density function (PDF) of DDM observables in the land and ocean observations to determine the separation threshold of coherent and incoherent dominated observations.

**Figure 2.** The delay spreading function (**a**), Doppler-spreading function (**b**), and Woodward ambiguity function (**c**).

1. TES: It is the trailing edge slope of the normalized DW and determined using the least-squares fitting within the time-delay window to a linear expression:

$$a\_{\text{TES}}^{N} = \frac{n\sum\_{i=1}^{n} \pi\_i P\_i^N - \sum\_{i=1}^{n} \pi\_i \sum\_{i=1}^{n} P\_i^N}{n\sum\_{i=1}^{n} \pi\_i^2 - \left(\sum\_{i=1}^{n} \pi\_i\right)^2} \tag{9}$$

where *n* = 5 is the number of time-delay bins for the linear fitting. *τ<sup>i</sup>* is the timedelay of each bin. *P<sup>N</sup> <sup>i</sup>* is the normalized scattering power in raw count within the corresponding time-delay bin.

2. TEV: It is the average volume of the normalized DW trailing edge:

$$\mathcal{P}\_{\text{TEV}}^{\text{N}} = \frac{1}{n} \sum\_{i=1}^{n} P\_i^{\text{N}} \tag{10}$$

3. TEV\_POW: It is the average absolute scattering power of the DW trailing edge:

$$\overline{P}\_{\text{TEV\\_POW}} = \frac{1}{n} \sum\_{i=1}^{n} P\_i \tag{11}$$

where *Pi* is the scattering power of the corresponding time-delay bin.

4. DDMA: It is the average of the normalized scattering power DDM near its peak:

$$
\sigma\_{\text{DDMA}}^{N} = \frac{1}{nm} \sum\_{i=1}^{n} \sum\_{j=1}^{m} P\_{i,j}^{N} \tag{12}
$$

where *n* and *m* indicate the selected size of delay and Doppler window.

5. DDMA\_POW: It is the average of the absolute scattering power DDM near the peak:

$$\sigma\_{\text{DDMA\\_POW}} = \frac{1}{mn} \sum\_{i=1}^{n} \sum\_{j=1}^{m} P\_{i,j} \tag{13}$$

6. MF: It is known as the WAF-matched filter (MF) approach, which directly calculates the correlation coefficient of normalized DDM and unitary energy WAF:

$$R\_{\rm MF} = \frac{\left| \left\langle \left\langle \left| P(\mathfrak{t}, f\_{\rm D}) \right|^2 \right\rangle, \chi(\mathfrak{t}, f\_{\rm D}) \right\rangle \right|^2}{\left\langle \left\langle \left| P(\mathfrak{t}, f\_{\rm D}) \right|^2 \right\rangle, \left\langle \left| P(\mathfrak{t}, f\_{\rm D}) \right|^2 \right\rangle \right\rangle \left\langle \chi(\mathfrak{t}, f\_{\rm D}), \chi(\mathfrak{t}, f\_{\rm D}) \right\rangle}} \tag{14}$$

#### *2.3. Dataset for Soil Moisture Retrieval*

CYGNSS is part of the NASA Earth system science pathfinder program; it is deployed as the first dedicated spaceborne GNSS-R constellation launched in December 2016. The space segment consists of eight microsatellites orbiting on a non-synchronous near-circular orbit with an inclination of approximately 35◦ (all spacecraft distributed on the same orbital plane). Each spacecraft is capable of tracking 4 reflections simultaneously, resulting in 32 DDMs per second over the Earth's surface. The standard CYGNSS DDM consists of 17 delay bins with the resolution of a quarter of the GPS C/A chip by 11 Doppler bins with an interval of 500 Hz. Each DDM was processed using 1 ms coherent integration followed by 1000 looks of noncoherent averaging. After July 2019, the noncoherent time was reduced to 0.5 s. The primary objective of CYGNSS is to monitor the wind speed during the evolution of tropical cyclones. The footprint of its measurement covers the critical latitude band between ±38◦ [34]. At the same time, it also provides substantial land observations within this scope. The work conducting in this paper uses the CYGNSS level-1 DDMs in power analog from published version 2.1 data product ranging from January 2018 to August 2019. General data quality control (QC) in the land application is also utilized, the DDM SNR lower than 2 dB, receiver antenna gain at the specular point direction lower than 0 dB, and specular incidence angle over 65◦ are screened. To get the purely incoherent DDMs from the ocean surface [9], the corresponding DDMs with ERA5 wind speed greater than 5 m/s are employed. Land and sea surface observations are directly distinguished by the quality flag provided in the CYGNSS level-1 data.

The SMAP dataset used as the land surface reference SM is version 6 level-3 radiometer global daily 36 km equal-area scalable earth grid version 2.0 (EASE-Grid 2.0) soil moisture product within the same period time of the CYGNSS dataset [35], except the data product missing from 20 June to 22 July 2019. The SMAP SM data provides daily descending (a.m.) and ascending passes (p.m.) measurement, including the auxiliary parameters VOD (in the SMAP product indicates vegetation opacity parameter) and surface roughness coefficient (in the SMAP product indicates vegetation\_roughness\_coefficient), which are used to correct the attenuation of CYGNSS scattering power from the impact of surface roughness and vegetation canopy. In the soil moisture retrieval process, only the recommended data are used without the open water, urban area.
