**5. Discussion**

A key advantage of this study is that CYGNSS SM is downscaled with an XGBoost model that uses L-band passive microwave SM (i.e., SMAP SM) and auxiliary variables. By contrast, most previous studies have downscaled satellite SM products (e.g., AMSR-E, SMOS, and SMAP) using optical data [39,53,54,57]. Another advantage is the improvement over previous CYGNSS SM retrievals, whose finest spatial resolution was 9 km: through downscaling, this study increases the spatial resolution of CYGNSS-based SM retrieval to 3 km. The downscaled SM therefore represents spatial variations in SM in finer detail, which offers substantial potential for applications such as agricultural irrigation planning.

This study nonetheless has several noteworthy limitations, which also point to opportunities for improving the spatial downscaling of coarse-resolution satellite SM products. First, CYGNSS observables are collected at pseudo-random locations, with irregular spatial and temporal sampling. This differs from conventional remote sensing, which has repeatable swaths and consistent local overpass times. As a result, mapping CYGNSS observables onto a regular spatial grid raises the problem of choosing an appropriate grid size. The spatial resolution of CYGNSS observations varies greatly, from the first Fresnel zone for coherent reflections (about 0.5 km) to the glistening zone for incoherent reflections (tens of kilometers). Traditional mapping with regular spatial grids and fixed time steps cannot fully account for this spatiotemporal complexity of CYGNSS signals [58]. Nevertheless, such a transformation is required to match CYGNSS observables with other remote sensing and model data. The conversion can introduce errors into the gridded reflectivity, with implications not only for this study but for all similar research [12,17,59,60].
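To make the gridding step concrete, the sketch below bins irregularly located observations (e.g., CYGNSS specular-point reflectivity) into square latitude/longitude cells and averages them. It is a simplified illustration, not the procedure used in this study: the function name `grid_mean` and the toy coordinates and values are hypothetical, each observation is treated as a point even though its real footprint ranges from about 0.5 km to tens of kilometers, and plain lat/lon cells stand in for the equal-area EASE-Grid 2.0 used by SMAP products.

```python
import math
from collections import defaultdict


def grid_mean(observations, cell_deg):
    """Average irregular (lat, lon, value) observations into square lat/lon cells.

    Hypothetical helper: each observation is treated as a point, so the real
    footprint size (coherent vs. incoherent reflection) is ignored.
    Returns {(row, col): (mean_value, count)} keyed by integer cell indices.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in observations:
        # Integer cell index along each axis; floor keeps negative longitudes consistent.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[key] += value
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


# Toy specular points: (lat, lon, reflectivity in dB); values are illustrative only.
obs = [(30.01, -97.02, -10.0), (30.04, -97.01, -12.0), (30.25, -97.35, -20.0)]
print(grid_mean(obs, cell_deg=0.1))
# {(300, -971): (-11.0, 2), (302, -974): (-20.0, 1)}
```

Shrinking `cell_deg` sharpens the map but leaves more cells with few or no observations, which is the grid-size trade-off described above.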

Second, during model building, the input CYGNSS observables and auxiliary variables were aggregated from their native high resolution to the 36 km coarse resolution by simple arithmetic averaging. Moreover, SMAP SM itself represents the average SM over a 36 km footprint. Such an average stands for the whole area, and much of the within-footprint variability is lost at this coarse resolution. Hence, the training samples are smooth, with few extreme values, and any model built from them inevitably carries this smoothing into the downscaled SM. The scale discrepancy between the training inputs and the SMAP product also constrains the selection of suitable data for building the regression model: if a large amount of training data is needed, the study area must be large enough to supply enough coarse-resolution training samples. When the downscaling model is applied, the greater heterogeneity and richer detail at 3 km resolution may produce extreme values that were never encountered during training, consistent with the results of Wakigari et al. [54]. Therefore, in practical applications, the downscaled SM contains some unavoidable errors. These errors are not random but are closely related to the variance of SM in the training samples: the greater the variation or dispersion of SM in the training samples, the greater the retrieval error may be, because a larger variance means that SM values change more across the dataset, which can lead to larger prediction errors. The downscaling results are also strongly affected by the number and representativeness of the training samples.
A sufficient number of training samples provides more comprehensive information and helps the model learn the characteristics of the data. The representativeness of the samples directly affects the model's generalization ability: if the samples fully represent the characteristics of the entire dataset, the model's retrievals on unseen data will be more accurate.
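The smoothing effect of aggregation can be illustrated with a toy grid. The sketch below, with hypothetical helper names and invented SM values, averages non-overlapping blocks of a fine grid (a 2 × 2 block stands in for the 12 × 12 block of 3 km pixels inside one 36 km cell) and then reports the share of fine-scale values that fall outside the range spanned by the coarse training targets. It assumes plain arithmetic block averaging, as described above.

```python
from statistics import mean, pvariance


def block_mean(grid, factor):
    """Aggregate a 2-D list by averaging non-overlapping factor x factor blocks."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            block = [grid[r][c] for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            row.append(mean(block))
        coarse.append(row)
    return coarse


def fraction_outside_range(fine, coarse):
    """Share of fine-scale values lying outside the [min, max] of the coarse values."""
    lo = min(min(row) for row in coarse)
    hi = max(max(row) for row in coarse)
    values = [v for row in fine for v in row]
    return sum(1 for v in values if v < lo or v > hi) / len(values)


# Invented fine-scale SM (m3/m3) on a 4 x 4 grid.
fine = [
    [0.125, 0.375, 0.25, 0.25],
    [0.375, 0.125, 0.25, 0.25],
    [0.0625, 0.1875, 0.5, 0.25],
    [0.1875, 0.0625, 0.25, 0.5],
]
coarse = block_mean(fine, factor=2)           # [[0.25, 0.25], [0.125, 0.375]]
print(fraction_outside_range(fine, coarse))   # 0.25
fine_var = pvariance([v for row in fine for v in row])
coarse_var = pvariance([v for row in coarse for v in row])
print(fine_var > coarse_var)                  # True: aggregation removes variance
```

A quarter of the fine pixels lie outside the range the model would see in training. Tree ensembles such as XGBoost are piecewise constant in their inputs, so inputs beyond the training range fall into the same leaves as the most extreme training samples; this is one reason fine-scale extremes tend to be damped in downscaled SM.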

Third, we validated the downscaled SM against in situ SM observations, which are direct point measurements at specific locations. These data may nevertheless introduce uncertainty into the validation, mainly because of scale discrepancies. The downscaled SM represents the average SM over a 3 km × 3 km area, a much larger spatial support than that of an in situ sensor, and SM can vary considerably within such an area because of geographical conditions and human activities. For example, if a site lies in an irrigated field, its measured SM may be much higher than the average SM of the surrounding pixel, producing a large deviation between the measurement and the model retrieval at that site during validation. Such a deviation arises from the difference in spatial scale rather than from the model itself.
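A small worked example shows why a point site can disagree with an otherwise perfect 3 km estimate. Assume, hypothetically, that a 3 km pixel consists of four equal sub-areas, one of which is irrigated; the SM values and the helper name `point_vs_pixel_error` are invented for illustration, and the error is taken as model minus observation.

```python
from statistics import mean


def point_vs_pixel_error(subarea_sm, site_index):
    """Error seen at a site when the model reproduces the pixel mean exactly.

    subarea_sm: SM (m3/m3) of equal-area parts of one pixel (hypothetical).
    site_index: which part contains the in situ sensor.
    Negative values mean the model appears to underestimate at that site.
    """
    return mean(subarea_sm) - subarea_sm[site_index]


pixel = [0.25, 0.25, 0.25, 0.5]  # last sub-area is irrigated; pixel mean is 0.3125
print(point_vs_pixel_error(pixel, site_index=3))  # -0.1875
print(point_vs_pixel_error(pixel, site_index=0))  # 0.0625
```

Even with zero error at the pixel scale, a site in the irrigated part shows an apparent bias of -0.1875 m3/m3, while a site elsewhere in the same pixel shows +0.0625 m3/m3.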

Fourth, the optical input NDVI is inevitably affected by clouds during both model construction and model application, so the downscaled SM inherits the data gaps typical of optical products. Cloud cover can produce null values in the downscaled SM [39] and thus limit its availability at the affected locations. In addition, at 3 km resolution the downscaled SM may not cover every processed pixel, which leaves further missing values. To address this issue, we adopted an approach similar to that described by Wei Shangguan et al. [53] and filled these gaps by Kriging interpolation of the 3 km downscaled CYGNSS SM. However, interpolation unavoidably introduces errors, which lead to some inconsistencies at the validation stage.
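For readers unfamiliar with the gap-filling step, the sketch below implements ordinary Kriging for a single target location in plain Python. It is illustrative only: the exponential semivariogram, its nugget, sill, and range values, the toy pixel layout, and all function names are assumptions, since the variogram model used in this study is not specified here; in practice the variogram would be fitted to the downscaled SM.

```python
import math


def exp_variogram(h, nugget, sill, range_):
    """Exponential semivariogram with practical range `range_` (assumed model)."""
    if h == 0.0:
        return 0.0  # gamma(0) = 0 by definition; the nugget applies only for h > 0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / range_))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target, nugget=0.0, sill=0.01, range_=30.0):
    """Ordinary Kriging estimate at `target` from known (x, y) points in km."""
    n = len(points)

    def gamma(p, q):
        return exp_variogram(math.hypot(p[0] - q[0], p[1] - q[1]), nugget, sill, range_)

    # Bordered system: semivariances between samples plus the constraint sum(weights) = 1.
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    solution = solve(a, b)
    weights = solution[:n]  # solution[n] is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical 3 km pixel centres (km) with downscaled SM; estimate a cloud-masked gap.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (3.0, 3.0)]
sm = [0.20, 0.24, 0.22, 0.26]
print(round(ordinary_kriging(pts, sm, (1.5, 1.5)), 4))  # 0.23 (equal weights by symmetry)
```

Because the weights are constrained to sum to one and the system reproduces known samples exactly (with gamma(0) = 0), the filled values stay consistent with the surrounding downscaled SM, but they carry the smoothing of the assumed variogram, which is the source of the validation inconsistencies noted above.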

Fifth, we used only four auxiliary variables: rainfall, land cover type, DEM, and NDVI. Accurate SM retrieval requires adequate consideration of the spatial scales of the CYGNSS observations, the SM reference data, and the auxiliary variables. Other factors also influence SM, such as soil type (sand, loam, clay, etc.), which governs how readily the soil absorbs water and how well it limits water loss, and wind speed, which affects surface temperature variations and evaporation. As discussed by Volkan Senyurek [16], soil texture is considered to have the greatest impact on SM retrieval among the auxiliary inputs. In summary, although the proposed method achieves good SM retrieval accuracy, considering a wider range of auxiliary factors could further improve the accuracy of GNSS-R SM retrieval.

Sixth, in assessing the spatial distribution of the downscaled SM, this paper did not consider the many factors that influence plant growth and ET, including light conditions, temperature, soil texture, and carbon dioxide concentration. Changes in these factors alter vegetation activity and ET, and therefore the MODIS EVI and MODIS ET products, independently of SM. Using these products to assess the spatial distribution of the downscaled SM is therefore limited. Although accounting for these factors would add complexity to the assessment, future research could integrate them to obtain a more accurate evaluation of the downscaled SM.

Seventh, in evaluating the spatial distribution of the downscaled SM, we employed Kriging interpolation. However, Kriging may not be optimal: it is a purely statistical interpolator whose SM estimates carry limited physical meaning, and it may not faithfully represent the spatial heterogeneity of SM. Future research should therefore compare different interpolation methods and investigate their effect on the downscaled SM; selecting the most suitable method would improve the assessment of its spatial distribution.
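One common way to compare interpolators, sketched below under stated assumptions, is leave-one-out cross-validation: each known pixel is removed in turn, predicted from the rest, and the errors are summarised as an RMSE. Inverse distance weighting and nearest-neighbour assignment are used only as simple, self-contained candidates; any function with the same hypothetical `interp(points, values, target)` signature, such as a Kriging predictor, could be scored the same way. The transect and its SM values are invented.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`."""
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a sample
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def nearest(points, values, target):
    """Value of the closest sample (ties go to the first one listed)."""
    k = min(range(len(points)),
            key=lambda i: math.hypot(points[i][0] - target[0], points[i][1] - target[1]))
    return values[k]


def loocv_rmse(interp, points, values):
    """Leave-one-out RMSE of an interpolator with signature interp(points, values, target)."""
    errors = []
    for i in range(len(points)):
        rest_pts = points[:i] + points[i + 1:]
        rest_vals = values[:i] + values[i + 1:]
        errors.append(interp(rest_pts, rest_vals, points[i]) - values[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical 3 km pixel centres (km) along a transect with a linear SM gradient.
pts = [(3.0 * k, 0.0) for k in range(5)]
sm = [0.10 + 0.02 * k for k in range(5)]
for name, fn in [("IDW", idw), ("nearest", nearest)]:
    print(name, round(loocv_rmse(fn, pts, sm), 4))
# IDW 0.019
# nearest 0.02
```

On this smooth toy gradient IDW scores slightly better than nearest-neighbour assignment; with real, noisier SM fields the ranking can change, which is why such a comparison would need to be run on the study data itself.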

Finally, the study is limited in both geographical scope and data period. When validating the downscaled SM against in situ sites, we found that, apart from grasslands and croplands, few in situ sites were available for other land cover types, which hinders a comprehensive validation. Expanding the study area would increase the number of sites for these land cover types, enlarging the validation dataset and improving the accuracy and reliability of the performance assessment. In addition, the SM downscaling model was built with data from January to August 2019, while data from September to December were used for SM retrieval. This split may introduce seasonal biases into the model and thus some inaccuracies; extending the data period to a full year or longer could mitigate such seasonal effects and improve the reliability of the downscaling approach.
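The temporal design can be made explicit in code. The sketch below, with hypothetical helpers and placeholder records, reproduces a January to August / September to December split and contrasts it with leave-one-month-out folds over a full year, one simple way to expose the model to every season. It illustrates the design choice only and is not the code used in this study.

```python
from datetime import date, timedelta


def month_split(records, train_months):
    """Split (day, sample) records into train/test sets by calendar month."""
    train = [r for r in records if r[0].month in train_months]
    test = [r for r in records if r[0].month not in train_months]
    return train, test


def leave_one_month_out(records):
    """Yield (month, train, test): each month is held out once, all others train."""
    for m in sorted({day.month for day, _ in records}):
        train = [r for r in records if r[0].month != m]
        test = [r for r in records if r[0].month == m]
        yield m, train, test


# Placeholder daily records for 2019; the sample payload is omitted (None).
records = [(date(2019, 1, 1) + timedelta(days=k), None) for k in range(365)]

train, test = month_split(records, train_months=set(range(1, 9)))
print(len(train), len(test))  # 243 122: no September to December day is seen in training

folds = list(leave_one_month_out(records))
print(len(folds))  # 12 folds, each training on the other 11 months
```

Holding out whole months rather than random days keeps day-to-day autocorrelation in SM from leaking between the training and test sets.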
