**4. Results**

## *4.1. Model Evaluation*

Following the methodologies outlined in Section 3, this paper constructs four CYGNSS SM downscaling models and tunes the hyperparameters of the RF, XGBoost, LGBM, and GA-BP models. Hyperparameters are parameters set in advance in neural networks or machine learning to control the model's learning process. Appropriate hyperparameter selection is crucial for the predictive performance of a model and also helps prevent overfitting or underfitting. Common hyperparameter tuning methods include grid search, random search, and Bayesian optimization. This paper uses grid search for hyperparameter tuning. Although grid search requires a longer runtime than the other two methods, it is an exhaustive search that guarantees the best hyperparameter combination within the given parameter range is found. The final hyperparameter tuning results are shown in Table 2.
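The exhaustive search described above can be sketched as follows. This is a minimal illustration of grid search, not the paper's actual tuning code; the parameter names and grid values are hypothetical stand-ins, and `score_fn` stands in for the cross-validated accuracy of a trained model.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every hyperparameter combination exhaustively and
    return the best-scoring one (higher score is better)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy score function standing in for cross-validated model accuracy;
# the grid values are illustrative, not those of Table 2.
grid = {"n_estimators": [100, 300, 500], "max_depth": [4, 6, 8]}
toy_score = lambda p: -abs(p["n_estimators"] - 300) - abs(p["max_depth"] - 6)
best, _ = grid_search(grid, toy_score)
# best -> {"n_estimators": 300, "max_depth": 6}
```

Because every combination is scored, runtime grows with the product of the grid sizes, which is why grid search is slower than random or Bayesian search over the same ranges.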


**Table 2.** Results of hyperparameter adjustment for the four models.

Through this process, we identified the best hyperparameter combination for each model to ensure that the CYGNSS SM downscaling models achieve high predictive performance. In the subsequent analysis, we use these optimal hyperparameter combinations to train the models and evaluate their performance. To preliminarily assess the four models, this paper performs a comparative analysis using ten-fold cross-validation. The dataset takes SMAP SM (36 km) as the reference value and uses four CYGNSS parameters (SR, SNR, LES, and TES), together with the auxiliary variables described in Section 2, as inputs. This paper selects the coarse-resolution data (36 km) from January to August 2019 to construct the downscaling model, yielding a total of 303,354 samples. For the prediction dataset, we use high-resolution data (3 km) from September to December, which provides a total of 4,123,129 samples. The ten-fold cross-validation accuracy and running time of the four models are shown in Table 3 and Figure 5.
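The fold construction behind ten-fold cross-validation can be sketched as below. This is a generic illustration of how sample indices are partitioned into ten validation folds, with the remaining nine folds used for training each time; it is not the paper's implementation, and in practice samples would typically be shuffled first.

```python
def kfold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k near-equal folds.
    Each fold serves once as the validation set while the remaining
    k-1 folds form the training set."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits

splits = kfold_indices(100, k=10)  # 10 (train, val) pairs of 90/10 indices
```

Averaging the accuracy metrics over the ten validation folds gives the cross-validation scores reported in Table 3 and Figure 5.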

**Table 3.** Summary of the overall accuracy of the ten-fold cross-validation of the four models.


**Figure 5.** Summary of the accuracy of the ten-fold cross-validation of the four models. (**a**) RMSE; (**b**) MAE; (**c**) *R*.

Table 3 and Figure 5 present the ten-fold cross-validation accuracy of the four models and their execution times. In terms of execution time, the RF model took the longest, 5712.91 s, likely because it must generate a large number of decision trees during training and then carry out voting and averaging operations across them. The GA-BP model ran in 46.16 s, significantly shorter than the RF model; however, its predictive performance was not satisfactory, with an RMSE of 0.072, an MAE of 0.055, and an *R* of 0.831, indicating relatively low accuracy and stability in predicting SM. The XGBoost model had a shorter execution time of 25.9 s and relatively good predictive performance, with an RMSE of 0.038, an MAE of 0.028, and an *R* of 0.955, indicating high accuracy and stability in predicting SM. The LGBM model had the shortest execution time, only 6.85 s, although its predictive performance fell short of the RF and XGBoost models, with an RMSE of 0.045, an MAE of 0.034, and an *R* of 0.935; these metrics still indicate acceptable performance. In summary, the LGBM model is the most time-efficient but slightly less accurate than the RF and XGBoost models; the RF model takes the longest but has the best predictive performance; the XGBoost model performs well on both counts; and the GA-BP model, despite its short execution time, has the worst predictive performance.
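The three accuracy metrics used throughout this section can be computed as below. This is a standard-definition sketch (not the paper's code): RMSE and MAE measure the error magnitude between predicted and reference SM, and *R* is the Pearson correlation coefficient.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)
```

Lower RMSE and MAE and higher *R* indicate better agreement between the downscaled SM and the reference SMAP SM.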

Figure 6 presents the performance of the XGBoost, RF, LGBM, and GA-BP downscaling models constructed from CYGNSS observations and auxiliary variables. The CYGNSS SM predictions at the coarser resolution (36 km) were compared with SMAP SM at the same resolution using scatter plots for each model. The XGBoost and RF models perform well, exhibiting strong consistency between CYGNSS SM and SMAP SM in both the training and testing sets: their training-set *R* values are 0.95 and 0.99, respectively, while both testing-set values are 0.95. The GA-BP model, however, shows less satisfactory retrieval results, with an *R* of 0.84 for both the training and testing sets. Comparing RMSE, the XGBoost and RF models clearly outperform, with values of 0.038 and 0.012 for the training set and 0.039 and 0.033 for the testing set. In contrast, the GA-BP and LGBM models show higher RMSE values of 0.069 and 0.045 for the training set and 0.070 for both in the testing set. The RF model also exhibits the lowest MAE, with 0.008 for the training set and 0.022 for the testing set, whereas the GA-BP and LGBM models show higher MAE values of 0.054 and 0.033 for the training set and 0.054 and 0.055 for the testing set, respectively.

The results indicate that the downscaling models built on RF and XGBoost outperform those constructed using LGBM and GA-BP. Overall, the RF and XGBoost downscaling models demonstrate stronger correlation and smaller errors than the other models. This may be due to the robustness of the RF and XGBoost ensemble algorithms, which are designed to avoid overfitting when dealing with numerous variables at once. Compared to the RF model, however, the XGBoost model achieves high accuracy in much less time. Therefore, we mainly focus on the downscaled SM from the XGBoost model in the following sections.

Figure 7 presents the importance scores of the input variables in the XGBoost model's retrieval results. Among all input variables, land cover and DEM have the greatest impact at both the 36 km and 3 km resolutions. NDVI has a more substantial influence at the 36 km resolution, while its effect diminishes at 3 km. Conversely, the influence of DDM\_SNR is relatively low at the 36 km scale but increases at the 3 km resolution.
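Comparing importances across the two resolutions requires putting them on a common scale. The sketch below shows one common convention, normalizing raw per-feature importance (e.g., total split gain) to sum to 1 and ranking features; the gain values used here are illustrative only, not the scores reported in Figure 7.

```python
def rank_importances(raw_gain):
    """Normalize raw per-feature importance so scores sum to 1 and
    return (feature, score) pairs sorted from most to least important."""
    total = sum(raw_gain.values())
    norm = {f: g / total for f, g in raw_gain.items()}
    return sorted(norm.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative gain values only -- not the paper's actual scores.
ranked = rank_importances(
    {"land_cover": 41.0, "DEM": 35.0, "NDVI": 12.0, "DDM_SNR": 5.0})
```

With both resolutions normalized this way, a feature such as DDM\_SNR can be seen to move up the ranking from 36 km to 3 km even if its absolute gain changes scale.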

**Figure 6.** The retrieval accuracy of the four models in the training and testing datasets.

**Figure 7.** Variable importance scores of the XGBoost model at (**a**) 36 km and (**b**) 3 km.

## *4.2. Assessing the Accuracy of Downscaled Soil Moisture Using In Situ Observations*

Within the spatial coverage of CYGNSS in the study area, we selected 78 ISMN sites with ground-based observation data for the period from 1 September to 31 December 2019. These sites mainly come from the SCAN, USCRN, and SNOTEL networks. Given that SMAP SM serves as the reference for the downscaling model in this study, it is essential to ensure the credibility of the comparison between downscaled SM and in situ SM observations. To this end, we first computed accuracy statistics between SMAP SM and the in situ SM observations. Furthermore, to conduct a time series analysis, we randomly selected four in situ sites for comparative evaluation against SMAP SM.

Table 4 presents the comparison between in situ SM observations and the corresponding SMAP SM. Of all the in situ sites, 48 exhibit an MAE below 0.06, 58 exhibit an RMSE below 0.07, and 50 demonstrate an *R* exceeding 0.7. The average MAE, RMSE, and *R* are 0.051, 0.062, and 0.813, respectively. Overall, the majority of in situ sites exhibit good accuracy, validating the reliability of the downscaling model constructed with SMAP SM as a reference. Figure 8 further illustrates the time series comparison of in situ SM observations and SMAP SM at four randomly selected in situ sites, with the time frame matching the dates of the downscaling model's prediction set. Note that the 2–3-day revisit period of the SMAP satellite means that simultaneous coverage of every in situ site in the study area cannot be guaranteed. Despite this limitation, the temporal variation of the in situ SM observations (the blue line) closely follows that of SMAP SM (the red line). This alignment underscores SMAP SM's capability to capture the temporal dynamics of in situ SM, validating the use of in situ SM observations in the downscaled SM assessment. To provide a quantitative assessment of the downscaled SM from the XGBoost model, Table 5 lists the accuracy statistics for the downscaled SM against the in situ SM observations.

According to the data analysis results in Table 5, of the 78 sites studied, 62% (about 49 sites) have an *R* greater than 0.600, a fairly high value indicating that the downscaling model has good predictive performance at these sites. Similarly, 54% of sites (about 43) have an RMSE less than 0.070, indicating small retrieval errors, and 53% (about 42 sites) have an MAE less than 0.060, indicating high retrieval accuracy. Overall, the average *R*, RMSE, and MAE across all sites are 0.712, 0.065, and 0.058, respectively, demonstrating the good overall performance of our model. We therefore conclude that the downscaled SM from the XGBoost model is reliable at most sites when compared with the in situ SM observations, although validation accuracy at a few sites is relatively poor. Since the land cover type at a site's location may affect validation accuracy, we examined this factor further, as shown in Figure 9. Through further research and model adjustment, we hope to better predict and understand this impact in order to optimize model accuracy.


**Table 4.** Accuracy statistics for SMAP SM and in situ SM observations.

**Figure 8.** Time series of the SMAP SM and the in situ SM observations at the four sites.



**Figure 9.** Accuracy statistics of in situ observations for different land cover types.

As seen in Figure 9, of the 78 in situ sites, 22 are located in grassland areas. Among these grassland sites, 14 have an *R* greater than 0.600, 13 have an RMSE less than 0.060, and 14 have an MAE less than 0.060. Of the 21 sites situated in farmland areas, 18 have an *R* greater than 0.600, 10 have an RMSE less than 0.060, and 11 have an MAE less than 0.060. Of the 12 savanna sites, 7 have an *R* greater than 0.600, 3 have an RMSE less than 0.060, and 4 have an MAE less than 0.060. In the woody savannas there are 13 sites, of which 8 have an *R* greater than 0.600, 6 have an RMSE less than 0.060, and 4 have an MAE less than 0.060. Lastly, in the open shrublands there are 4 sites: 2 have an *R* greater than 0.600, all 4 have an RMSE less than 0.060, and 3 have an MAE less than 0.060. For the land cover types of deciduous broadleaf forests, mixed forests, closed shrublands, and cropland/natural vegetation mosaics, the numbers of sites are 3, 1, 1, and 1, respectively. Correspondingly, the sites with an *R* greater than 0.600 number 2, 0, 1, and 0; those with an RMSE less than 0.060 number 0, 1, 1, and 1; and those with an MAE less than 0.060 likewise number 0, 1, 1, and 1.
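The per-land-cover tallies above can be produced with a simple grouping pass over the site-level statistics. The sketch below is a generic illustration with hypothetical site records, not the paper's data or code; the thresholds match those used in the text.

```python
def tally_by_cover(records, r_min=0.600, rmse_max=0.060, mae_max=0.060):
    """Count, per land cover type, how many sites clear each accuracy
    threshold (R above r_min; RMSE and MAE below the given maxima)."""
    counts = {}
    for cover, r, rmse, mae in records:
        c = counts.setdefault(cover,
                              {"n": 0, "r_ok": 0, "rmse_ok": 0, "mae_ok": 0})
        c["n"] += 1
        c["r_ok"] += r > r_min
        c["rmse_ok"] += rmse < rmse_max
        c["mae_ok"] += mae < mae_max
    return counts

# Hypothetical site records (cover, R, RMSE, MAE) for illustration only.
sites = [("grassland", 0.72, 0.05, 0.04),
         ("grassland", 0.55, 0.07, 0.06),
         ("farmland", 0.81, 0.04, 0.03)]
stats = tally_by_cover(sites)
```

Grouping the 78 sites this way yields, for each land cover class, the counts plotted in Figure 9.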

This study primarily investigates the accuracy of downscaled SM models obtained through the application of the XGBoost model across nine different land cover types. The results indicate that sites located in grasslands and farmlands exhibit higher accuracy. This may be attributed to the fact that SM retrieval based on GNSS-R technology tends to be more accurate in flat areas than in areas with significant surface undulations or tree cover. Additionally, grasslands and farmlands are common land use types; hence, we have more sites for observation and validation. Conversely, the other seven land cover types have fewer sites, leading to a lack of sufficient validation data, which could be a significant factor affecting accuracy. Furthermore, we believe that other potential factors might influence the accuracy of SM retrieval. For instance, the varying soil properties and complexities across different regions could impact model performance. Highly heterogeneous soils or areas with significant rock content could lead to inaccurate predictions. Changes in precipitation and meteorological conditions might also affect the accuracy of the model. Prolonged droughts or consistent rainfall could potentially lead to decreased model performance during specific periods. The quality of GNSS-R technology data, the calibration process, and observational errors could impact model accuracy to some extent. Additionally, if there are changes in land use or land cover types in the study area during the observation period, this could affect the training and validation data, consequently influencing the model's accuracy. However, despite lower accuracy in some areas, it is evident from our results (Figure 9 and Table 5) that the downscaled SM model constructed in this study generally achieves satisfactory results. This suggests that our method exhibits adaptability and robustness, providing high accuracy in most scenarios.
