Next Article in Journal
Toward a Vision-Based Intelligent System: A Stacked Encoded Deep Learning Framework for Sign Language Recognition
Previous Article in Journal
Interval Type-2 Fuzzy PID Controller Using Disassembled Gradational Optimization
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Daily Soil Moisture Retrieval by Fusing CYGNSS and Multi-Source Auxiliary Data Using Machine Learning Methods

1
CAS Engineering Laboratory for Yellow River Delta Modern Agriculture, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
2
Shandong Dongying Institute of Geographic Sciences, Dongying 257000, China
3
Key Laboratory of Ecosystem Network Observation and Modeling, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
4
College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
5
National Meteorological Center, China Meteorological Administration, Beijing 100081, China
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(22), 9066; https://doi.org/10.3390/s23229066
Submission received: 25 September 2023 / Revised: 4 November 2023 / Accepted: 6 November 2023 / Published: 9 November 2023
(This article belongs to the Section Intelligent Sensors)

Abstract

:
The Cyclone Global Navigation Satellite System (CYGNSS), a publicly accessible spaceborne Global Navigation Satellite System Reflectometry (GNSS-R) data, provides a new alternative opportunity for large-scale soil moisture (SM) retrieval, but with interference from complex environmental conditions (i.e., vegetation cover and ground roughness). This study aims to develop a high-accuracy model for CYGNSS SM retrieval. The normalized surface reflectivity calculated by CYGNSS is fused with variables that are highly related to the SM obtained from optical/microwave remote sensing to solve the problem of the influence of complicated environmental conditions. The Gradient Boost Regression Tree (GBRT) model aided by land-type data is then used to construct a multi-variables SM retrieval model with six different land types of multiple models. The methodology is tested in southeastern China, and the results correlate very well with the existing satellite remote sensing products and in situ SM data (R = 0.765, ubRMSE = 0.054 m3m−3 vs. SMAP; R = 0.653, ubRMSE = 0.057 m3 m−3 vs. ERA5 SM; R = 0.691, ubRMSE = 0.057 m3m−3 vs. in situ SM). This study makes contributions from two aspects: (1) improves the accuracy of the CYGNSS retrieval of SM based on fusion with other auxiliary data; (2) constructs the SM retrieval model with multi-layer multiple models, which is suitable for different land properties.

1. Introduction

Studying soil moisture (SM) with high accuracy is crucial for land–air interactions, affecting the climate system by influencing processes such as evapotranspiration and water transport at the land surface [1,2]. Accordingly, this highlights the necessity of retrieving SM data with high accuracy. SM can be retrieved by optical remote sensing missions, such as Moderate Resolution Imaging Spectroradiometer (MODIS), microwave remote sensing missions, such as Soil Moisture Active Passive (SMAP), Sentinel-1 (Sentinel-1) [3,4,5]. It is noted that optical remote sensing is seriously affected by cloud cover; passive microwave remote sensing has a low spatial resolution (9~40 km), and active microwave remote sensing has a low temporal resolution (≥6 days).
With the rapid development of the spaceborne Global Navigation Satellite System Reflectometry (GNSS-R) remote sensing technology, this technique provides a new, useful, and supplemental tool for SM retrieval at a regional scale. Spaceborne GNSS-R is an emerging remote sensing technique that exploits the capability of GNSS satellites in a bistatic radar configuration, and collects the Earth’s reflected signals by specially designed receivers. This technology has the unique advantages of a short revisit period, the ability to provide a cost-effective alternative, and a high spatial–temporal resolution. The CYGNSS mission consists of a constellation of eight satellites orbiting in the same plane at an altitude of approximately 510 km and an orbit inclination of 35°. Each micro-satellite is equipped with a specially designed GNSS receiver named SGR-ReSI. The GNSS-R payload on the CYGNSS mission can receive signals of the GPS L1C/A band, while the BeiDou, Galileo, and GLONASS data are not yet available for use. CYGNSS can obtain the surface retrievals at the level of a few hours, i.e., 2.8~7 h per day; in terms of spatial resolution, the minimum spatial footprint is ~3.5 km × 0.5 km for coherent scattering from smooth surfaces [6,7]. Moreover, the CYGNSS uses the L-band (1–2 GHz), which is relatively sensitive to surface SM retrieval.
The spaceborne GNSS-R SM retrieval methods used at present can mainly be categorized into two types, i.e., empirical and data-driven methods. The empirical method takes SMAP SM or in situ SM as the reference data, establishing the conversion relationship between spaceborne GNSS-R surface reflectivity (SR) and the reference SM [7,8,9]. At present, SMAP is the most widely used regional-scale SM data product for comparison with CYGNSS data, and it operates in the same band (i.e., L-band) as CYGNSS, making a direct comparison of the SM data meaningful [7,8]. Due to the high heterogeneity of SM, SM can reflect the surface conditions under the effect of climatic, meteorological, and hydrological factors, e.g., vegetation water content (VWC), land surface temperature (LST), slope, surface roughness, and precipitation. Subsequently, the data-driven method is derived based on the empirical method, which mainly combines spaceborne GNSS-R with multi-source remote sensing auxiliary data using machine learning (ML) models to improve the accuracy of SM retrieval [10,11,12,13,14,15,16,17]. Yang et al. (2020) fused the CYGNSS, TDS-1 data, and multi-source remote sensing data based on the BP neural network to retrieve SM in southeastern China at 36 km resolution [10]. Senyurek et al. (2020) comparatively analyzed the validation of the advantages in CYGNSS SM retrieval based on a variety of common ML algorithms (ANN, RF, and SVR) [11]. To make the CYGNSS SM applicable to different land types with high accuracy Jia et al. (2021) estimated the global SM with a spatial of 36 km using ML regression aided by a pre-classification strategy [12].
This study aims to develop a high-accuracy model for CYGNSS SM retrieval, based on the approach described by [13]. First, the normalized CYGNSS SR is retrieved using the coherent model. Then, normalized CYGNSS SR is fused with the variables highly related to the SM obtained from optical/microwave remote sensing data, aiming to solve the problem of the influence of complicated environmental conditions. Finally, these parameters are combined through the Gradient Boosted Regression Trees (GBRT) model; here, multiple models with six different land cover types (i.e., cropland, forest, grassland, shrubland, wetland, and bare land) are built, and the results derived from each model are linked together. The result is validated with SMAP SM, ERA5 SM, and in situ SM.

2. Data and Methods

2.1. Data

2.1.1. CYGNSS

CYGNSS consists of eight microsatellites with a global coverage of approximately from 38° N to 38° S. Each satellite carries a four-channel GNSS-R payload, for GNSS-R low-orbit measurements. The average revisit period is about 7.2 h [8]. Many researchers utilize the observations of Delayed Doppler Map (DDM) data from CYGNSS for SM retrieval. The CYGNSS receivers firstly calculate the satellite position and receiver position, based on ephemeris for the collected direct GNSS satellite signals and the reflected GNSS satellite signals, respectively; the specular reflection point position and Doppler delay of the reflected signals are then calculated to generate the DDM [6]. Level 1a data of CYGNSS V 3.1 version were used in this study. During the data preprocessing, the water bodies were masked, and only the land data were retained. The CYGNSS data in 2020 and 2021 were chosen as the training data and testing data, respectively. In addition, to synchronize with the in situ SM, we simultaneously tested the CYGNSS in July 2018. Since the CYGNSS V 3.1 version was unavailable until August 2018, the V 2.1 version of CYGNSS data was compared with in situ SM. The CYGNSS data were finally resampled to a 9 km Equal-Area Scalable Earth (EASE) Grid 2.0 cell.

2.1.2. SMAP

The SMAP operates at L-band, with an incidence angle of about 40°. Due to the satellite orbit design, there are gaps in the daily SMAP SM product, and the revisit time ranges from 2 to 3 days [18]. As mentioned before, SMAP SM is commonly used for comparison with CYGNSS SR data in empirical or data-driven methods to convert the CYGNSS SR to the final SM.
The SMAP SM data were obtained from the National Snow and Ice Data Center (NSIDC) at the L3 level with a spatial resolution of 9 km. In this study, the “retrieval_qual_flag” variable in the data product was utilized for SM quality control. At the same time, the daily ascending and descending data were averaged as the final daily SM. SMAP values larger than 0.6 m3m−3, and less than 0.04 m3m−3 are excluded. To synchronize with the CYGNSS data, the SMAP data of 2020 were selected as the model target data, and the SMAP data of July 2018 and 2021 were used as the testing data, respectively.

2.1.3. ERA5-Land

The CYGNSS and SMAP satellites both work at the L-band. To further verify the accuracy of the retrieved SM in this study, ERA5-Land SM data were selected as an independent data source for a cross-comparison of the results. ERA5-Land data were obtained from the European Center for Medium-Range Weather Forecasts (ECMWF). This is a reanalyzed dataset consisting of a combination of the modeled data and in situ observations. The spatial resolution of ERA5-Land is 0.1°, and the temporal resolution is 1 h [19]. This study averaged the hourly ERA5-Land SM to obtain its daily average SM. Second, to match with the CYGNSS SM, this study resampled the ERA5-Land SM to 9 km based on the nearest neighbor interpolation method.

2.1.4. In-Situ SM Data

The in situ SM data from 596 stations collected on 1 July, 11 July, and 21 July 2018, were provided by the China Meteorological Administration (Figure 1). These data are currently not publicly available. Data were collected to a depth of 10 cm and then averaged at timepoints ranging from hourly to daily. It is worth noting that the L-band signals are relatively sensitive to the surface SM (~5 cm depth); thus, there is a different sensitivity depth between in situ SM and CYGNSS SM. Considering the complex geographical conditions in China, the selected stations were distributed in six provinces of China (i.e., Henan, Hunan, Shandong, Jiangxi, Sichuan, and Yunnan), with different land types and topographic distributions.

2.1.5. Auxiliary Data

Significant spatial differences in topography and vegetation changes were also considered in this study to select auxiliary variables affecting SM. Variables such as climate, elevation, slope, precipitation, soil, and vegetation characteristics can also influence SM. The vegetation index can provide information on the growth status of vegetation; LST is a good indicator to characterize the energy balance of the land surface, and can indirectly reflect the SM status [20,21]. In addition, precipitation strongly influences soil infiltration, runoff, and evapotranspiration, which are the main control factors affecting SM [22]. Topographic factors, such as slope, aspect, and elevation, influence SM changes and distribution patterns [23]. The surface albedo was estimated using MCD43A3 reflectance by the method proposed by [24]. Table 1 summarizes the auxiliary data information used in this study. Here, slope and aspect are calculated based on the NASA Shuttle Radar Topographic Mission (SRTM) 90 m Digital Elevation Models (DEM) data [23]. It is worth mentioning that the VWC derived from SMAP is retrieved based on the empirical method using the Normalized Difference Vegetation Index (NDVI) of the optical data, and thus there is no strong correlation between the data and SMAP SM. It has been proved in [13] that the surface roughness variable has less influence on the SMAP SM retrieval than the vegetation, so the impact of VWC and surface roughness on the SMAP data can be ignored.

2.2. Methodology

The overall flowchart for this study is shown in Figure 2. The main steps are summarized as follows: (1) the normalized CYGNSS SR is obtained using the coherent model; (2) quality control of auxiliary data: variables highly related to the SM obtained from optical/microwave remote sensing data are matched up, aiming to solve the problem of the influence of complicated environmental conditions; (3) SM retrieval: this step is executed to build the data-driven model for retrieving SM from input parameters through the GBRT model, where the GBRT model contains six models aided by different land types (i.e., cropland, forest, grassland, shrubland, wetland, and bare land); (4) validation: the result is evaluated by comparison with SMAP SM, ERA5-Land SM, and in situ data.
Three main differences exist between the algorithm presented by Yang et al. (2020) [13] and the one presented here. The first is that the proposed method uses normalized CYGNSS SR as the proxy of CYGNSS SM to eliminate the effect of incidence angle. Second, this retrieval algorithm uses six sub-models with different land cover types. Third, the SM resolution obtained by the two algorithms is different, i.e., 9 km vs. 36 km.

2.2.1. Normalized CYGNSS SR Retrieval

The CYGNSS approaches for retrieving SM should account for the dominance of coherent or incoherent effects. The coherent method assumes that the coherent component dominates the land reflections, but dense vegetation and a rough terrain can cause incoherent scattering [4,6,7,10,14]. It should be pointed out that the dense-vegetation- and rough-terrain-induced incoherence scattering mentioned here do not conflict with our study correcting the vegetation effects. The discussion of vegetation-induced incoherence aims to determine how SR is derived, since the methods of deriving the SR from coherent and incoherent scattering signals are totally different. Through the GlobeLand30-2020 land classification data statistics [25], the proportion of CYGNSS located in densely vegetated areas accounts for only forested areas accounted for ~7.53%; therefore, this study assumes that the reflected signals are dominated by the coherent scatterings.
For the CYGNSS data preprocessing step, firstly, the data with a signal-to-noise ratio of less than 0 are filtered; secondly, data with incidence angles greater than 65° are removed; thirdly, the variable of “quality control” (e.g., spacecraft attitude error, blackbody DDM direct signal, and GPS equivalent isotope radiated power) is used to select the good data acquisition. The SR (dB) can be calculated as [7,14]:
Γ r , e f f ( d B ) = 10 log P r + 20 log ( R t s + R s r ) 10 log G t 20 log λ 10 log P t 10 log G r 20 log c o s θ
where Γr,eff is the SR; Pr represents the peak DDM power minus the noise; Rts represents the distance between the transmitter and the specular reflection point; Rsr represents the distance between the specular reflection point and the receiver; λ is the GPS wavelength; Gt is the gain of the GPS transmitting antenna; Gr is the gain of the receiving antenna; Pt is the transmitted power. The above observables are available from the CYGNSS L1a V 3.1 and V 2.1 data.
In addition to the effect of the soil dielectric constant, SR is also affected by the incidence angle, as in Equation (1). To eliminate the effect of the incidence angle, and retain only the effect of the soil dielectric constant, the incidence angle of SR is normalized to 0° here. Previous studies have investigated the reflected signals of CYGNSS dependence on the incidence angle [9,26]. Thus, to eliminate the effect of incidence angle in CYGNSS SR, the CYGNSS SR is calibrated as follows:
Γ r , e f f 0 , d B = Γ r , e f f θ , d B / f θ
where Γ r , e f f 0 , d B is the normalized SR, hereafter named SRn; f ( θ ) is the modified function. Here, f ( θ ) is not a simple expression; it is related to the incidence angle. The f(θ) value shows a nonlinear decreasing trend as the incidence angle increases. When the incidence angle is 0°, the value of f(θ) is 1; when the incidence angle is 65°, the f(θ) has a value of about 0.8. We previously simulated the value of f ( θ ) varying with the incidence angle, and the value of f θ can be referred to in [26]. Subsequently, the CYGNSS SR is gridded to 9 km spatial resolution.

2.2.2. SM Retrieval Model Construction

This study comprehensively considers the effects of eight environmental and land auxiliary variables (i.e., VWC, precipitation, LST, surface albedo, elevation, surface roughness, aspect, and slope) on SM. CYGNSS SRn (spatial resolution of 9 km) is used as the reference data, and other data are all resampled to 9 km using the nearest-neighbor interpolation method. Jia et al. (2021) proposed a pre-categorization strategy to minimize the influence of different land types [12]. This study thus constructs multiple models considering the complex geographical environment. The GBRT model is used to construct the model aided by the land cover data separately for collaborative computation (Figure 2). The GBRT is a typical ML method; it combines basic regression trees and boosting techniques to perform nonlinear regression and classification [26]. Theoretically, the GBRT uses the classification and regression trees as weak classifiers, and uses the boosting techniques for iterative training [27]. In this study, the “sklearn” library in Python is used.
Based on the resampled 9 km GlobeLand30 land-type data [27], the input data are divided into six classes according to land type (cultivated, forest, grass, shrub, wetland, and barren). The whole model is then divided into six models for training to reduce the effect of land properties and improve the model’s adaptability to complex surfaces.
The steps of the model are as follows:
(1)
The area is divided into six land types according to the GlobeLand30 data. All auxiliary data are resampled to 9 km. The feature data of the same land types are then extracted to construct multiple models.
(2)
Multiple models of six different land types are built to extract SM features. Then, the feature fusion layer is constructed to connect all nodes. The optimal structure of the model is determined through repeated testing.
(3)
The dataset is divided into a training set and a testing set to verify the accuracy of the model. The training set is used to adjust the variables of the GBRT model, and the testing set is used to test the performance. The fusion model process is shown in Figure 3.

2.2.3. Error Metrics

Two statistical error metrics, i.e., the correlation coefficient (R), and the unbiased root mean square error (ubRMSE), are used to provide quantitative analysis between the CYGNSS SM and SMAP SM. The formulas for these metrics are shown:
R = E i E ¯ O i O ¯ E i E ¯ 2 O i O ¯ 2
u b R M S E = 1 n i = 1 n E i E ¯ O i O ¯ 2
where n is the number of observations; O i   is the reference SM; Ei is the CYGNSS SM.

3. Results

3.1. Quality Control of Auxiliary Variables

Determination of the input features is an important step in the proposed model. The importance of the input variables to the SM is calculated using the function of “permutation_importance” from the sklearn module of Python. Permutation importance is a method used to assess the importance of features based on model prediction results. It obtains the value of feature importance by randomly disrupting individual features to obtain the extent of each feature variation’s effect on model accuracy [27]. The value is then measured to determine the variable of importance to the target data, i.e., SMAP SM. The importance values of the above eight variables (i.e., CYGNSS SR, VWC, slope, surface roughness, precipitation, surface albedo, land surface temperature, aspect, and elevation) are listed in Table 2. In addition, to assess the contribution of CYGNSS in SM retrieval, the CYGNSS SR is also included in calculating importance values, along with auxiliary variables. The testing data were collected from 1 July 2021 to 31 July 2021 in southeastern China, and the target data were SMAP SM. As illustrated in Table 2, the CYGNSS SR shows the highest value of feature importance with SMAP SM. The feature importance of VWC, precipitation, and LST is greater than 0.1. Conversely, the values of elevation and aspect are less than 0.02. Thus, it can be concluded that the elevation and aspect are less correlated with the SM. In this study, except for the CYGNSS SR, six auxiliary variables with feature importance values greater than 0.02 were selected to construct the SM retrieval model, i.e., VWC, slope, surface roughness, precipitation, surface albedo, and LST. It should be noted that the test data presented here are relatively short compared to the long-term data input in the GBRT model, which may lead to fluctuations in the relative weights of the feature importance values.

3.2. Comparison with SMAP SM

To validate the retrieval accuracy of CYGNSS SM, the spatial distributions of the CYGNSS SM, and SMAP SM on 18 January 2021, are shown in Figure 4a,b. The SMAP is three days averaged to a final value, aiming to fill the gap in the data. The result of CYGNSS SM minus SMAP SM is shown in Figure 4c. The CYGNSS SM shows a good consistency of fluctuations with SMAP SM, and the CYGNSS SM is close to the SMAP SM result. The result shows that the SM values are larger in the central region, with a variation range between 0.3 and 0.5 m3 m−3, and smaller values are seen in the north, with a variation range between 0 and 0.2 m3 m−3. The delta value varies in the range −0.19~0.15 m3 m−3, with a mean value of −0.003 m3 m−3, and the delta value varies in the central region.
As shown in Figure 5, the R and ubRMSE are introduced to compare the CYGNSS SM with SMAP SM in 2021. Generally, the spatial patterns of both indices are good, with a mean R = 0.765 m3m−3, and ubRMSE = 0.054 m3m−3. The performance of R is relatively good, with more than 60% of the R-values being larger than 0.6. In terms of ubRMSE, most of the region has ubRMSE values less than 0.05 m3m−3, while the northeast and west regions are greater than 0.03 m3 m−3. The incoherent scattering caused by dense vegetation and high surface roughness may lead to this phenomenon.

3.3. Comparison of with ERA5-Land SM

Figure 6 shows the spatial distribution of R and ubRMSE of CYGNSS SM and ERA5-Land SM in 2021 based on a daily scale. In most study areas, CYGNSS SM correlates well with ERA5-Land SM, with an average value of R = 0.653 and ubRMSE = 0.057 m3 m−3 due to the different spatial mismatches in the measurement depth between ERA5-Land SM detection (an effective depth of 0–7 cm) and CYGNSS L-band frequency signal penetration (an effective depth of 0–5 cm). In addition, ERA5 SM is reanalysis model-derived data and may not accurately capture SM dynamics. Figure 6 shows a lower correlation in the overlap area than SMAP SM.

3.4. Comparison with In Situ SM

This study utilizes the in situ SM data from July 2018 in southeastern China to further validate CYGNSS SM’s retrieval accuracy. The CYGNSS SM, located less than 0.45 km from this site, was selected during the observation period. Here, 5766 in situ SM were used and 786 in situ SM were used. The results of the density plots are shown in Figure 7. Specifically, the data agree well with an R greater than 0.65, and RMSE = 0.057 m3m−3. This indicates that the CYGNSS SM provides fine-scale information on SM and can capture variations in the observed SM.

3.5. Comparisons with the GBRT Model without Land Cover Classification

To show the benefits of utilizing multiple models to include land cover information, Figure 8 shows the average SM validation results for six land cover types from January to June, 2021. The R values are calculated from CYGNSS SM and SMAP SM, with an average value of 0.675. As shown in Figure 8a, the R values are lower in forested areas, and this phenomenon may be due to the larger proportion of incoherent reflective signals in these areas. The R values of grassland and barren areas are higher, with values greater than 0.65. Figure 8b shows the results of the ubRMSE. The ubRMSE of all land types is less than 0.07 m3 m−3, and the average value is 0.053 m3 m−3. The results indicate that the retrieval model proposed in this study is relatively reliable.
To further prove the effectiveness of the model proposed by this study, this study compares the accuracy of the proposed model (GBRTGL30_SM) with the GBRT model without land cover classification (GBRTSM). As shown in Figure 8, the R-value of the GBRT_GL30_SM model is higher than that of the GBRTSM model, and the mean value of R increases from 0.617 to 0.675 (~9.40% gain in R); in terms of the ubRMSE, the mean value of the ubRMSE decreases from 0.058 m3 m−3 to 0.053 m3 m−3 (~8.62% decrease in ubRMSE).

4. Discussion

The research community regarding the use of spaceborne GNSS-R to sense SM has been facing difficulties in removing the effects of complex terrain. This methodology differs from other studies in that it retrieves SM from using multiple different land cover models constructed by the GBRT model, and shows an improvement in the accuracy of SM. In addition, the proposed methodology provides a complementary method for estimating SM.
According to the presented results, the proposed framework has several limitations. First, the reliable SM datasets rely on the use of SMAP as a reference rather than retrieving SM from the mechanistic perspective; errors in SMAP SM products will lead to uncertainty in the final results. Moreover, since most signals are probably a combination of incoherent and coherent scattering, we assume the reflected signals are coherent, which may lead to errors in final SM retrievals. Therefore, the possible directions for progressively enhancing this methodology should focus on comprehensively considering the incoherent and coherent signal scattering.

5. Conclusions

This study combines CYGNSS data with other remote sensing data to address the effect of complicated environmental conditions. Using the GBRT model, aided by land-type data, a multi-variables SM retrieval model with six different land cover submodels was built. The results of this study are expected to support the SM retrieval method for in-orbit GNSS satellites.
The main conclusions of the study are summarized as follows:
(1)
The method proposed in this study can retrieve SM with high accuracy, with R= 0.765 m3m−3, and ubRMSE = 0.054 m3m−3 compared to SMAP SM, and R = 0.653 and ubRMSE = 0.057 m3 m−3 compared to ERA5-Land SM, and with R = 0.691, and RMSE = 0.057 m3m−3 compared to the in situ SM.
(2)
The accuracy of the proposed model is improved compared to the model without land cover classification, with the R-value improved by ~9.40% and the ubRMSE value decreased by ~8.62%.

Author Contributions

Conceptualization, T.Y., J.W. and Z.S.; methodology, T.Y. and J.W.; validation, T.Y.; formal analysis T.Y.; resources, T.Y. and S.L.; writing—original draft preparation, T.Y.; writing—review and editing, T.Y., J.W. and Z.S.; supervision, Z.S.; funding acquisition, T.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 42101376), and the National Key Research and Development Program of China (2021YFD1900902).

Data Availability Statement

Data used in this study are available on request from the first author.

Acknowledgments

The authors would also like to thank the CYGNSS scientific team for their hard work in providing the data used in this study. The authors would also like to thank the Soil Moisture Active Passive (SMAP) and the European Centre for Medium-Range Weather Forecasts (ECMWF) teams for archiving and providing the data used in this study. Also, we thank the China National Meteorological Center for providing in situ SM data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Glenn, E.P.; Huete, A.R.; Nagler, P.L.; Nelson, S.G. Relationship Between Remotely-sensed Vegetation Indices, Canopy Attributes and Plant Physiological Processes: What Vegetation Indices Can and Cannot Tell Us About the Landscape. Sensors 2008, 8, 2136–2160. [Google Scholar]
  2. Verstraeten, W.W.; Veroustraete, F.; Feyen, J. Assessment of Evapotranspiration and Soil Moisture Content across Different Scales of Observation. Sensors 2008, 8, 70–117. [Google Scholar]
  3. Chan, S.K.; Bindlish, R.; O’Neill, P.; Jackson, T.; Kerr, Y. Development and assessment of the smap enhanced passive soil moisture product. Remote Sens. Environ. 2018, 204, 2539–2542. [Google Scholar]
  4. Portal, G.; Vall-Llossera, M.; Piles, M.; Camps, A.; Chaparro, D.; Pablos, M.; Rossato, L. A spatially consistent downscaling approach for SMOS using an adaptive moving window. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1883–1894. [Google Scholar]
  5. Kornelsen, K.C.; Coulibaly, P. Advances in soil moisture retrieval from synthetic aperture radar and hydrological applications. J. Hydrol. 2013, 476, 460–489. [Google Scholar]
  6. Clarizia, M.P.; Pierdicca, N.; Costantini, F.; Floury, N. Analysis of CYGNSS data for soil moisture retrieval. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2227–2235. [Google Scholar]
  7. Chew, C.; Shah, R.; Zuffada, C.; Hajj, G.; Masters, D.; Mannucci, A.J. Demonstrating soil moisture remote sensing with observations from the UK TechDemoSat-1 satellite mission. Geophys. Res. Lett. 2016, 43, 3317–3324. [Google Scholar]
  8. Chew, C.; Small, E. Description of the UCAR/CU soil moisture product. Remote Sens. 2020, 12, 1558. [Google Scholar]
  9. Al-Khaldi, M.M.; Johnson, J.T.; O’Brien, A.J.; Balenzano, A.; Mattia, F. Time-series retrieval of soil moisture using CYGNSS. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4322–4331. [Google Scholar]
  10. Yang, T.; Wan, W.; Sun, Z.; Liu, B.; Li, S.; Chen, X. Comprehensive evaluation of using TechDemoSat-1 and CYGNSS data to estimate soil moisture over mainland China. Remote Sens. 2020, 12, 1699. [Google Scholar]
  11. Senyurek, V.; Lei, F.; Boyd, D.; Gurbuz, A.C.; Kurum, M.; Moorhead, R. Evaluations of ML-based CYGNSS soil moisture estimates against SMAP observations. Remote Sens. 2020, 12, 3503. [Google Scholar]
  12. Jia, Y.; Jin, S.; Chen, H.; Yan, Q.; Savi, P.; Jin, Y.; Yuan, Y. Temporal-spatial soil moisture estimation from CYGNSS using machine learning regression with a preclassification approach. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4879–4893. [Google Scholar]
  13. Yang, T.; Wan, W.; Wang, J.; Liu, B.; Sun, Z. A Physics-Based Algorithm to Couple CYGNSS Surface Reflectivity and SMAP Brightness Temperature Estimates for Accurate Soil Moisture Retrieval. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4409715. [Google Scholar]
  14. Yan, Q.; Huang, W.; Jin, S.; Jia, Y. Pan-tropical soil moisture mapping based on a three-layer model from CYGNSS GNSS-R data. Remote Sens. Environ. 2020, 247, 111944. [Google Scholar]
  15. Lei, F.; Senyurek, V.; Kurum, M.; Gurbuz, A.C.; Boyd, D.; Moorhead, R.; Crow, W.T.; Eroglu, O. Quasi-global machine learning-based soil moisture estimates at high spatio-temporal scales using CYGNSS and SMAP observations. Remote Sens. Environ. 2022, 276, 113041. [Google Scholar]
  16. Eroglu, O.; Kurum, M.; Boyd, D.; Gurbuz, A.C. High spatio-temporal resolution CYGNSS soil moisture estimates using artificial neural networks. Remote Sens. 2019, 11, 2272. [Google Scholar]
  17. Xu, H.; Yuan, Q.; Li, T.; Shen, H.; Zhang, L.; Jiang, H. Quality improvement of satellite soil moisture products by fusing with in-situ measurements and GNSS-R estimates in the western continental US. Remote Sens. 2018, 10, 1351. [Google Scholar]
  18. Chan, S.K.; Bindlish, R.; O’Neill, P.E.; Njoku, E.; Jackson, T.; Colliander, A.; Chen, F.; Burgin, M.; Dunbar, S.; Piepmeier, J.; et al. Assessment of the SMAP passive soil moisture product. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4994–5007. [Google Scholar]
  19. Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth Syst. Sci. Data 2021, 13, 4349–4383. [Google Scholar]
  20. Holzman, M.E.; Rivas, R.; Piccolo, M.C. Estimating soil moisture and the relationship with crop yield using surface temperature and vegetation index. Int. J. Appl. Earth Obs. 2014, 28, 181–192. [Google Scholar]
  21. Zhang, D.; Zhou, G. Estimation of Soil Moisture from Optical and Thermal Remote Sensing: A Review. Sensors 2016, 16, 1308. [Google Scholar] [PubMed]
  22. Koster, R.D.; Dirmeyer, P.A.; Guo, Z.; Bonan, G.; Chan, E.; Cox, P.; Gordon, C.T.; Kanae, S.; Kowalczyk, E.; Lawrence, D.; et al. Regions of strong coupling between soil moisture and precipitation. Science 2004, 305, 1138–1140. [Google Scholar]
  23. Chen, S.; She, D.; Zhang, L.; Guo, M.; Liu, X. Spatial downscaling methods of soil moisture based on multisource remote sensing data and its application. Water 2019, 11, 1401. [Google Scholar]
  24. Liang, S. Narrowband to broadband conversions of land surface albedo I algorithms. Remote Sens. Environ. 2001, 76, 213–238. [Google Scholar]
  25. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M.; et al. Global land cover mapping at 30 m resolution: A POK-based operational approach. Isprs J. Photogramm. 2015, 103, 7–27. [Google Scholar]
  26. Wang, J.; Yang, T.; Zhu, K.; Shao, C.; Zhu, W.; Hou, G.; Sun, Z. A novel retrieval model for soil salinity from CYGNSS: Algorithm and test in the Yellow River Delta. Geoderma 2023, 432, 116417. [Google Scholar]
  27. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar]
Figure 1. Distribution of in situ SM (1 July 2018).
Figure 1. Distribution of in situ SM (1 July 2018).
Sensors 23 09066 g001
Figure 2. Flowchart of this study.
Figure 2. Flowchart of this study.
Sensors 23 09066 g002
Figure 3. Flow chart of the fusion model.
Figure 3. Flow chart of the fusion model.
Sensors 23 09066 g003
Figure 4. SM in southeastern China on 18 January 2021 (9 km). (a) CYGNSS SM, (b) 3-day averaged SMAP SM, (c) delta between the two data.
Figure 4. SM in southeastern China on 18 January 2021 (9 km). (a) CYGNSS SM, (b) 3-day averaged SMAP SM, (c) delta between the two data.
Sensors 23 09066 g004
Figure 5. Comparison with SMAP SM for 2021 in southeastern China (9 km). (a) R, (b) ubRMSE.
Figure 5. Comparison with SMAP SM for 2021 in southeastern China (9 km). (a) R, (b) ubRMSE.
Sensors 23 09066 g005
Figure 6. Comparison with the ERA5-Land SM in 2021 in southeastern China (9 km). (a) R, (b) ubRMSE.
Figure 6. Comparison with the ERA5-Land SM in 2021 in southeastern China (9 km). (a) R, (b) ubRMSE.
Sensors 23 09066 g006
Figure 7. Comparison of CYGNSS SM with in situ SM in July 2018.
Figure 7. Comparison of CYGNSS SM with in situ SM in July 2018.
Sensors 23 09066 g007
Figure 8. Accuracy of SM under different land types. (a) R, (b) ubRMSE.
Figure 8. Accuracy of SM under different land types. (a) R, (b) ubRMSE.
Sensors 23 09066 g008
Table 1. Auxiliary remote sensing data information.
Table 1. Auxiliary remote sensing data information.
DataTemporal ResolutionSpatial Resolution
CYGNSS L1aDaily~0.5 km × 3.5 km
(coherent)
CLDAS LST Daily0.625°
MCD43A3 reflectance16 days500 m
GPM IMERG Late Precipitation L3 precipitationDaily0.1°
ASTER surface elevation/90 m
SMAP surface roughnessDaily9 km
SMAP VWCDaily9 km
Table 2. Feature importance values of different auxiliary variables.
Table 2. Feature importance values of different auxiliary variables.
VariablesFeature Importance
CYGNSS SRn0.385
VWC0.162
Slope0.063
Surface roughness0.026
Precipitation 0.130
Surface albedo 0.071
LST0.143
Elevation 0.011
Aspect0.009
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, T.; Wang, J.; Sun, Z.; Li, S. Daily Soil Moisture Retrieval by Fusing CYGNSS and Multi-Source Auxiliary Data Using Machine Learning Methods. Sensors 2023, 23, 9066. https://doi.org/10.3390/s23229066

AMA Style

Yang T, Wang J, Sun Z, Li S. Daily Soil Moisture Retrieval by Fusing CYGNSS and Multi-Source Auxiliary Data Using Machine Learning Methods. Sensors. 2023; 23(22):9066. https://doi.org/10.3390/s23229066

Chicago/Turabian Style

Yang, Ting, Jundong Wang, Zhigang Sun, and Sen Li. 2023. "Daily Soil Moisture Retrieval by Fusing CYGNSS and Multi-Source Auxiliary Data Using Machine Learning Methods" Sensors 23, no. 22: 9066. https://doi.org/10.3390/s23229066

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop