*2.2. Cyclone Global Navigation Satellite System*

As part of NASA's Earth System Science Pathfinder program, the Cyclone Global Navigation Satellite System (CYGNSS) was launched on 15 December 2016. The constellation consists of eight microsatellites in orbits inclined at approximately 35° to the equator. This configuration provides almost uninterrupted coverage between approximately 38°N and 38°S, with a mean revisit time of 7 h and a median revisit time of 3 h. Because CYGNSS observables cannot cover the entire United States, we selected the southern part of the US as the study area.
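To illustrate the coverage band and the difference between mean and median revisit time, the following Python sketch masks points to the approximate ±38° latitude band and computes the mean and median gap between consecutive overpasses of one grid cell. The helper names (`within_cygnss_band`, `revisit_stats_hours`) and the overpass times are hypothetical, and the sketch does not reproduce how the quoted 7 h and 3 h figures were derived.

```python
import numpy as np

# Approximate latitude limit of CYGNSS specular points (38°N to 38°S, as stated above).
MAX_ABS_LAT_DEG = 38.0


def within_cygnss_band(lat_deg):
    """Return a boolean mask that is True where |latitude| <= 38 degrees."""
    lat = np.asarray(lat_deg, dtype=float)
    return np.abs(lat) <= MAX_ABS_LAT_DEG


def revisit_stats_hours(obs_times_h):
    """Mean and median gap (hours) between consecutive observations of one grid cell."""
    t = np.sort(np.asarray(obs_times_h, dtype=float))
    gaps = np.diff(t)
    if gaps.size == 0:
        raise ValueError("need at least two observation times")
    return float(gaps.mean()), float(np.median(gaps))


# Hypothetical overpass times (hours) for one cell: several short gaps and one long gap.
print(within_cygnss_band([30.0, 40.0, -38.0, -39.0]).tolist())  # [True, False, True, False]
print(revisit_stats_hours([0.0, 2.0, 4.0, 6.0, 27.0]))          # (6.75, 2.0)
```

With gaps of 2, 2, 2, and 21 h, the single long gap pulls the mean up to 6.75 h while the median stays at 2 h, which is why a mean revisit time can exceed the median.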

The objective of this study is to retrieve SM over a specific region. To this end, we used the CYGNSS Level-1 (L1) version 2.1 product, obtained from the Physical Oceanography Distributed Active Archive Center (PO.DAAC, https://podaac.jpl.nasa.gov/, accessed on 1 April 2023). The primary goal of CYGNSS is to improve the understanding and prediction of tropical cyclone intensity by exploiting signals from the Global Navigation Satellite System (GNSS). The core instrument of each observatory is the Delay Doppler Mapping Instrument (DDMI), whose main task is to generate Delay Doppler Maps (DDMs) [31]. A DDM represents the power scattered from the surface around each observed specular reflection point as a function of time delay and Doppler frequency, sampled bin by bin on a delay-Doppler grid. In other words, it provides a two-dimensional representation of the reflection characteristics of the GNSS signal. Because these characteristics are influenced by factors such as SM and vegetation cover, DDMs can be used to infer SM. Note that the DDMI initially records uncalibrated "counts", which are linearly related to the total signal power it processes. This total power includes thermal emission from the Earth and from the DDMI itself, as well as GPS signals scattered from the land surface. During Level-1A calibration, the value of each DDM bin is converted from raw counts to watts, which makes the data physically interpretable. The CYGNSS observables used in this paper cover the period from 1 January to 31 December 2019.
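The linear relationship between counts and power can be sketched in Python as a per-bin conversion. The sketch assumes the simple model counts = gain × power + offset, with a placeholder gain and offset; the actual Level-1A calibration determines these quantities itself and is not reproduced here. The 17 × 11 (delay × Doppler) DDM shape matches the CYGNSS L1 product, while the function names and example values are hypothetical.

```python
import numpy as np

# CYGNSS L1 DDMs have 17 delay bins x 11 Doppler bins.
N_DELAY, N_DOPPLER = 17, 11


def counts_to_watts(ddm_counts, gain_counts_per_watt, offset_counts):
    """Convert a DDM from raw counts to watts, bin by bin.

    Assumes counts = gain * power + offset. Gain and offset are placeholders;
    in the real product they come from Level-1A calibration.
    """
    ddm = np.asarray(ddm_counts, dtype=float)
    if ddm.shape != (N_DELAY, N_DOPPLER):
        raise ValueError(f"expected a {N_DELAY}x{N_DOPPLER} DDM, got {ddm.shape}")
    return (ddm - offset_counts) / gain_counts_per_watt


def peak_bin(ddm):
    """Return (delay_index, doppler_index) of the brightest DDM bin."""
    ddm = np.asarray(ddm, dtype=float)
    i, j = np.unravel_index(np.argmax(ddm), ddm.shape)
    return int(i), int(j)


# Hypothetical example: a flat 1100-count DDM with one bright bin at (8, 5).
counts = np.full((N_DELAY, N_DOPPLER), 1100.0)
counts[8, 5] = 5100.0
watts = counts_to_watts(counts, gain_counts_per_watt=1e20, offset_counts=100.0)
print(peak_bin(watts))  # (8, 5)
```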

Using the observables in the L1 data, surface reflectivity can be estimated through a variety of methods that rest on different coherent and incoherent scattering assumptions [15,32,33]. In areas where water accumulates, such as lakes, rivers, and wetlands, low surface roughness makes coherent scattering dominant in the forward direction. Where SM is high, the water content likewise keeps coherent forward scattering strong. However, GPS signals interacting with vegetation introduce some incoherent components. Regions with higher SM show stronger signal intensity, and hence a higher SNR, than regions with lower SM. Thus, in this paper, we adopted the approach proposed by Rodriguez-Alvarez et al. [32] to calculate reflectivity, under the assumption that the observed GNSS-R signal consists predominantly of coherent reflections. This approach uses the bistatic radar cross section (BRCS, denoted as '*brcs*' in CYGNSS L1) and the range terms to calculate the reflectivity $\Gamma\_{RL}(\theta\_i)$ as:

$$\Gamma\_{RL}(\theta\_i) = \left(\frac{4\pi}{\lambda}\right)^2 \frac{P\_{RL}^{\rm coh}\,(r\_{st} + r\_{sr})^2}{P\_t G\_t G\_r} \tag{1}$$

where $P\_{RL}^{\rm coh}$ is the coherent power received in the bistatic radar configuration. The subscripts $R$ and $L$ denote the right-hand circularly polarized GNSS transmit antenna and the left-hand circularly polarized GNSS-R receive antenna, respectively. The GNSS signal wavelength is denoted by $\lambda$. $r\_{st}$ and $r\_{sr}$ are the distances from the specular reflection point to the GNSS transmitter and to the GNSS-R receiver, respectively. $P\_t$ is the peak power of the transmitted GNSS signal, and $G\_t$ and $G\_r$ are the gains of the transmitting and receiving antennas, respectively. Finally, $\Gamma\_{RL}(\theta\_i)$ is the surface reflectivity at the incidence angle $\theta\_i$.
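Eq. (1) can be evaluated directly for a single specular point. The Python sketch below assumes all inputs are already in linear SI units (powers in W, antenna gains as ratios rather than dB) and derives $\lambda$ from the GPS L1 carrier frequency of 1575.42 MHz. How $P\_{RL}^{\rm coh}$, the transmit power, and the gains are extracted from the L1 fields is not shown, and the function names are illustrative only.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
GPS_L1_FREQ_HZ = 1575.42e6                                 # GPS L1 carrier frequency
GPS_L1_WAVELENGTH_M = SPEED_OF_LIGHT_M_S / GPS_L1_FREQ_HZ  # about 0.19 m


def coherent_reflectivity(p_coh_w, r_st_m, r_sr_m, p_t_w, g_t, g_r,
                          wavelength_m=GPS_L1_WAVELENGTH_M):
    """Eq. (1): coherent surface reflectivity Gamma_RL (linear, unitless).

    p_coh_w: coherent received power (W); r_st_m, r_sr_m: ranges from the
    specular point to transmitter and receiver (m); p_t_w: transmit power (W);
    g_t, g_r: transmit and receive antenna gains as linear ratios.
    """
    return ((4.0 * math.pi / wavelength_m) ** 2
            * p_coh_w * (r_st_m + r_sr_m) ** 2
            / (p_t_w * g_t * g_r))


def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)
```

Reflectivity is often reported in dB, hence the small `to_db` helper; whether linear or dB values are used downstream is a separate design choice.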

The Leading Edge Slope (LES) and Trailing Edge Slope (TES) of the reflectivity delay waveform are indicators of whether the scattering is predominantly coherent or incoherent. An increase in the incoherent component of the reflected signal typically results in a corresponding increase in the absolute values of both LES and TES. Following Carreno-Luengo et al. [34] and Rodriguez-Alvarez et al. [32], LES and TES are calculated as follows:

$$\text{LES} = \frac{\Gamma\_m - \Gamma\_{m-3}}{3\Delta} \tag{2}$$

$$\text{TES} = \frac{\Gamma\_{m+3} - \Gamma\_m}{3\Delta} \tag{3}$$

where $\Gamma\_m$ is the peak reflectivity, located at the specular reflection point; $\Gamma\_{m-3}$ and $\Gamma\_{m+3}$ are the reflectivities three delay bins before and after the peak, respectively; and $\Delta$ is the delay resolution of the delay-Doppler map, which is 0.2552 chips.
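Eqs. (2) and (3) can be sketched in Python as follows. The sketch assumes that the delay waveform is the delay cut through the Doppler column containing the DDM peak, and that index $m$ is the argmax of that waveform; the text does not specify either choice, and the function names and the triangular test waveform are hypothetical.

```python
import numpy as np

DELAY_RES_CHIPS = 0.2552  # Delta: delay-bin spacing of the DDM, in chips


def delay_waveform(ddm):
    """Delay cut through the Doppler column containing the DDM peak (assumed choice)."""
    ddm = np.asarray(ddm, dtype=float)
    _, j = np.unravel_index(np.argmax(ddm), ddm.shape)
    return ddm[:, j]


def edge_slopes(waveform, delta=DELAY_RES_CHIPS, k=3):
    """LES and TES per Eqs. (2)-(3), with m taken as the index of the waveform peak."""
    w = np.asarray(waveform, dtype=float)
    m = int(np.argmax(w))
    if m - k < 0 or m + k >= w.size:
        raise ValueError("peak is fewer than k bins from the waveform edge")
    les = (w[m] - w[m - k]) / (k * delta)
    tes = (w[m + k] - w[m]) / (k * delta)
    return les, tes


# Hypothetical triangular reflectivity waveform (17 delay bins) peaking at bin 8.
w = np.zeros(17)
w[5:12] = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
les, tes = edge_slopes(w)
print(round(les, 3), round(tes, 3))  # 3.918 -3.918
```

With the sign convention of Eq. (3), TES is negative for a waveform that decays after the peak, which is why the discussion above refers to absolute values.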

DDM\_SNR is one of the most basic variables among the CYGNSS observables. For the same area, higher SM corresponds to higher DDM\_SNR. Therefore, DDM\_SNR is included in the model as a factor affecting SM retrieval. For SM retrieval within the machine learning framework, the derived reflectivity, LES, TES, and DDM\_SNR are used as the CYGNSS-derived input-layer features.
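Assembling these four observables into a model input can be sketched in Python as below. The column order, the choice to drop samples with missing or non-finite values, and all example values are assumptions for illustration; any ancillary inputs the full model may use are not included, and this is not the preprocessing actually applied in this study.

```python
import numpy as np

FEATURE_NAMES = ("reflectivity", "LES", "TES", "DDM_SNR")  # assumed column order


def build_feature_matrix(reflectivity, les, tes, ddm_snr):
    """Stack the four CYGNSS observables into an (N, 4) input matrix.

    Rows with any NaN or infinite value are dropped; the returned boolean
    mask lets the caller apply the same selection to the SM targets.
    """
    X = np.column_stack([reflectivity, les, tes, ddm_snr]).astype(float)
    keep = np.all(np.isfinite(X), axis=1)
    return X[keep], keep


# Hypothetical values for three specular points; the second lacks a reflectivity value.
X, keep = build_feature_matrix([-12.0, np.nan, -8.5], [3.1, 2.7, 4.0],
                               [-2.9, -2.5, -3.6], [4.2, 3.8, 6.1])
sm_targets = np.array([0.15, 0.22, 0.31])[keep]  # align targets with the kept rows
print(X.shape, sm_targets.tolist())  # (2, 4) [0.15, 0.31]
```

Returning the mask rather than filtering targets inside the function keeps the feature builder independent of whichever SM reference data is paired with it.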
