1. Introduction
Surface soil moisture (SSM) exerts a fundamental control on land surface hydrological and ecological processes [1,2] and serves as a key environmental input for a variety of scientific studies and applications such as flood and drought monitoring [3,4], wildfire risk assessment [5], and crop yield forecasts [6].
SSM strongly influences soil thermal and dielectric properties, surface reflectance, and vegetation physiology [7]. Both optical-infrared (IR) and microwave remote sensing techniques provide practical approaches for quantifying the spatial distribution and temporal changes of regional SSM by measuring the electromagnetic signatures of the land surface. Optical-IR sensors are well suited for indirectly inferring surface and root zone soil moisture by monitoring changes in surface thermal properties (e.g., soil temperature, thermal inertia in the case of bare soil) and surface reflectance properties sensitive to vegetation cover and growth, although the observations may be degraded by cloud cover, atmospheric aerosols, and sub-optimal illumination conditions [7]. Microwave remote sensing provides direct measurements of soil dielectric properties, which are highly sensitive to soil moisture changes [8]. Satellite active and passive microwave sensors are also capable of day-and-night and nearly all-weather observations of Earth’s surface. Microwaves also penetrate the surface more deeply than optical-IR wavelengths, with lower frequencies suitable for sensing deeper soil layers.
Operational SSM mapping over the globe mainly relies on space-borne microwave radiometers and scatterometers. For example, SSM at ~1-cm depth has been measured routinely using the Advanced Microwave Scanning Radiometer for EOS (AMSR-E) and AMSR-2 sensors since 2002 [9,10]. The AMSR-E/2 X-band SSM products have reasonable accuracy (RMSE < 0.06 cm³/cm³) for sparsely to moderately vegetated conditions with vegetation water content (VWC) less than 1.5 kg/m² [9,11]. With the launch of L-band microwave sensors, including the ESA Soil Moisture and Ocean Salinity (SMOS) mission [12] and the NASA Soil Moisture Active-Passive (SMAP) mission [2], significant improvements in SSM retrievals have been made in terms of both accuracy (RMSE < 0.04 cm³/cm³ for VWC < 6 kg/m²) and soil sensing depth (~5 cm) [2,13,14]. In addition, the microwave radiometers onboard polar-orbiting satellites provide 1- to 3-day global revisits suitable for monitoring surface wetness dynamics at a level of performance required for hydrological and ecological studies [10,15]. A major constraint of satellite microwave radiometers and scatterometers is their coarse product spatial resolution (e.g., AMSR-E/2 25 km; SMAP 9 km and 36 km), which cannot characterize local-scale (e.g., meter-level) heterogeneity in SSM dynamics.
To overcome the limitations in satellite passive microwave observations, a variety of downscaling techniques have been developed to disaggregate radiometer brightness temperature (Tb) or soil moisture estimates into values for finer-scale pixels [16]. The downscaling approaches generally rely on (a) higher-resolution information inferred from Synthetic Aperture Radar (SAR) observations [17,18], optical-IR remote sensing [19,20,21], or land surface model simulations [22]; and (b) cross-scale relationships among the observations, retrievals, or simulations [16,23]. For example, SMOS 36-km SSM was downscaled to 1-km resolution using a triangular feature space defined by MODIS land surface temperature and vegetation index data without significant degradation in accuracy [24]. SMAP Tb data were disaggregated into 1-km pixels by exploiting complementary information provided by Sentinel-1 SAR backscatter, and the disaggregated Tb was further used for deriving 1-km resolution SSM [25]. Besides the physically based approaches, machine learning (ML) methods have been successfully used to define the non-linear and cross-scale relationships among soil moisture, satellite observations, and geospatial variables [16]. In particular, tree-based regression models have proven effective in combining a variety of predictor variables from satellites, process models, and in situ measurements for enhancing SSM resolution [20,26,27].
Continuous and local-scale (e.g., from 1-m to 100-m resolutions) surface wetness information is essential for characterizing environmental heterogeneity and dynamics [22,28] and improving applications such as irrigation management [20], household-level flood risk assessment [29], and landslide monitoring [30]. Despite previous downscaling efforts, the potential of meter and sub-meter satellite remote sensing has not been fully explored, considering the growing capability of small or micro-satellites for rapid and high-resolution earth observations. Commercial satellite constellations consisting of optical or SAR sensors such as Planet and Capella are now able to provide global daily/sub-daily coverage at meter-level resolutions. Planet observations have been used for delineating fine-scale surface features such as water and ice over small water bodies [31,32]. The high-resolution images are rich in spatial, spectral, and textural information but also involve very large data loads. Relative to traditional computation performed locally, cloud-based platforms such as Google Earth Engine (GEE) enable more efficient access, processing, and analysis of big data archives [33].
Benefiting from recent advances in remote sensing and cloud computation, this study focused on the synergistic use of high-resolution optical and coarse-resolution microwave sensors and exploited ML and Cumulative Distribution Function (CDF) matching approaches to derive daily and local-scale (3-m) SSM. The work is potentially useful for studies and applications needing improved quantification of land surface heterogeneity.
5. Discussion
This study proposed a new approach for deriving SSM data with both high temporal and high spatial resolution using a combination of ML, statistical modeling, and multi-sensor fusion. Of the ML approaches tested, the LightGBRegressor method showed the best performance in estimating SSM using independent and reflectance-based observations over 9-km grid cells (R² = 0.857; RMSE = 0.029 cm³/cm³). Other regression-tree-based methods are also suitable for this application, given their comparable performance to the LightGBRegressor results (Table 3). These alternative approaches include traditional RF methods widely used for SSM downscaling [20,26] and additional RF refinements using gradient boosting methods. One advantage of this new approach is that the model training does not rely on any in situ observations or measurements from airborne campaigns, which enables the approach to be generally applicable to other regions where high-quality SMAP retrievals are available.
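As a minimal sketch of how training pairs could be assembled without in situ data, the Python below averages a fine-resolution predictor grid into coarse blocks and pairs each block mean with a coarse SSM value. The function names, the toy 4 x 4 grid, the block size, and the SSM values are all hypothetical and purely illustrative; simple block averaging is an assumption here and is not necessarily the aggregation used in this study.

```python
def block_mean(grid, block):
    """Average a 2-D list over non-overlapping block x block windows."""
    rows, cols = len(grid), len(grid[0])
    if rows % block or cols % block:
        raise ValueError("grid dimensions must be multiples of the block size")
    coarse = []
    for r0 in range(0, rows, block):
        coarse_row = []
        for c0 in range(0, cols, block):
            window = [grid[r][c]
                      for r in range(r0, r0 + block)
                      for c in range(c0, c0 + block)]
            coarse_row.append(sum(window) / len(window))
        coarse.append(coarse_row)
    return coarse


def training_pairs(coarse_feature, coarse_ssm):
    """Flatten two matching coarse grids into (feature, target) pairs."""
    return [(f, s)
            for f_row, s_row in zip(coarse_feature, coarse_ssm)
            for f, s in zip(f_row, s_row)]


# Hypothetical fine-scale predictor (e.g., a vegetation index) on a 4 x 4 grid.
fine_index = [
    [0.2, 0.4, 0.6, 0.8],
    [0.2, 0.4, 0.6, 0.8],
    [0.1, 0.1, 0.5, 0.5],
    [0.1, 0.1, 0.5, 0.5],
]
# Hypothetical coarse SSM targets (cm3/cm3), one value per 2 x 2 block.
coarse_ssm = [[0.15, 0.30], [0.10, 0.25]]

coarse_index = block_mean(fine_index, 2)
pairs = training_pairs(coarse_index, coarse_ssm)
print(pairs)
```

A regressor fitted to such coarse-scale pairs would then be applied directly to the fine-resolution predictors, which is why the range of conditions covered by the training cells matters.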
In the analysis of SSM predictor importance, NDRE was weighted more heavily than NDVI in the ML prediction. Considering the inherent relationships between vegetation growth and soil wetness, NDVI has been among the main inputs for downscaling passive microwave SSM [24,26], whereas this study suggests that vegetation health conditions represented by NDRE are likely related more closely to SSM than the greenness quantified by NDVI. Soil moisture in the Yanco region has clear seasonality with generally drier conditions in the austral summer (DJF) and wetter soil conditions in winter (JJA) [52], which likely led to the relatively high importance of N10DOY in the SSM prediction. In addition, elevation and slope factors are among the most important predictors, which reflects the impacts of topographic control on SSM spatial distributions [26,54]. Despite the high performance of the ML models over the 9-km grid cells, this new approach may still be constrained by the limited spectral information provided by PSD 8-band observations. Richer spectral information, in particular additional thermal band observations used in previous studies [24], may enable further improvements in model performance.
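For reference, the two vegetation indices compared above share the same normalized-difference form and differ only in the band paired with the near-infrared (NIR) reflectance: red for NDVI and red edge for NDRE. The short Python sketch below computes both for a single pixel; the reflectance values and the dictionary keys are hypothetical and chosen only for illustration.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = band_a + band_b
    if total == 0:
        return None
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index: (NIR - RedEdge) / (NIR + RedEdge)."""
    return normalized_difference(nir, red_edge)


# Hypothetical surface reflectances for one pixel (illustrative values only).
pixel = {"red": 0.08, "red_edge": 0.20, "nir": 0.40}

print("NDVI:", round(ndvi(pixel["nir"], pixel["red"]), 3))
print("NDRE:", round(ndre(pixel["nir"], pixel["red_edge"]), 3))
```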
There are two challenges in deriving meter-level SSM using an ML model trained on coarser 9-km grid cell data. One is that the training data sets representing or aggregated for 9-km grid cells may not be comprehensive enough to cover the range of local variability represented by 3-m pixels. The training data were collected from 49 SMAP 9-km grid cells spanning a 12-month period and accounting for a variety of surface soil and vegetation conditions. However, larger study regions and a longer training period would likely enable further algorithm enhancement. Another issue is the relatively sparse temporal coverage of clear-sky PSD observations: frequent cloud cover caused approximately 80% of all possible PSD observations to be missing during the study period. Although clouds are a well-known constraint in optical remote sensing, this limits the capability of downscaling approaches for generating continuous SSM products needed in many applications, such as irrigation management [20]. In addition, despite the overall lower RMSE and bias of the LightGBRegressor versus 9-km SMAP SSM results relative to the in situ measurements (Table 5), the 3-m ML predictions had relatively lower correlations and more data point outliers (e.g., Figure 4a). To increase the temporal fidelity, the LightGBRegressor results paired with the corresponding SMAP SSM values were fed into the additional CDF matching process. The CDF matching removes outliers in the ML predictions likely caused by noise in PSD reflectance observations under suboptimal atmospheric conditions and, more importantly, builds a continuous SSM time series by accounting for the cross-scale SSM relationships for each 3-m pixel. After applying CDF matching, it was possible to generate SSM time series at 3-m resolution with similarly low RMSE and bias as the LightGBRegressor results while maintaining a similarly high correlation with in situ observations as the SMAP product.
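The idea behind CDF matching can be illustrated with a simple rank-based quantile mapping. The Python sketch below is one possible reading of the step described above, not the implementation used in this study: for a single 3-m pixel, the sorted SMAP values from paired clear-sky days are matched rank by rank to the sorted ML predictions, and the resulting piecewise-linear mapping is applied to the full daily SMAP series. All function names and sample values are hypothetical.

```python
from bisect import bisect_right


def build_cdf_mapping(source, target):
    """Pair the sorted source and target samples rank by rank."""
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need paired samples of equal length (>= 2)")
    return sorted(source), sorted(target)


def apply_cdf_mapping(mapping, x):
    """Map x from the source distribution to the target distribution.

    Values between sample quantiles are linearly interpolated; values
    outside the sampled source range are clamped to the target extremes.
    """
    src, tgt = mapping
    if x <= src[0]:
        return tgt[0]
    if x >= src[-1]:
        return tgt[-1]
    hi = bisect_right(src, x)  # first index whose source value exceeds x
    lo = hi - 1
    frac = (x - src[lo]) / (src[hi] - src[lo])
    return tgt[lo] + frac * (tgt[hi] - tgt[lo])


# Hypothetical paired samples for one 3-m pixel on clear-sky days (cm3/cm3).
smap_paired = [0.10, 0.20, 0.30, 0.40]
ml_paired = [0.05, 0.12, 0.35, 0.22]  # includes one noisy ML prediction

mapping = build_cdf_mapping(smap_paired, ml_paired)

# Hypothetical continuous daily SMAP series, including cloudy days.
smap_daily = [0.10, 0.15, 0.25, 0.40, 0.50]
ssm_3m_daily = [apply_cdf_mapping(mapping, v) for v in smap_daily]
print([round(v, 3) for v in ssm_3m_daily])
```

Because the mapping is monotonic, the output follows the temporal ordering of the SMAP series while taking its value range from the ML predictions, which is consistent with the behavior described above. Clamping outside the sampled range is a design choice of this sketch; extrapolation would be an alternative.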
It is worth noting that the underlying assumptions of our approach are that (a) the SMAP product has high accuracy over the 9-km grid cells, as has been shown for the Yanco region [14], (b) station-based SSM measurements are representative of the overall surface wetness of 3-m pixels, and (c) the ML results can capture the unique soil wetness conditions for a given 3-m pixel. These assumptions held for most of the 14 sites examined (10 sites in Table 5; 11 sites in Table 6), which showed better SSM performance of the downscaled results than the original SMAP product. If the above assumptions are incorrect, no improvement in SSM at fine scales would be expected since the LightGBRegressor results would fail to represent the surface wetness level observed on site (e.g., Figure 4b). The sites without performance enhancement in the downscaled results relative to the SMAP product (e.g., Yb5e, Yb5d, and Yb3; Table 5 and Table 6) are concentrated in a small area with relatively homogeneous soil properties and land use (mainly pasture) [55]. The lack of fine-scale variations in surface properties in this area likely leads to little added contribution to the SSM estimations from high-resolution SuperDove observations. There are no additional measurements within the 3-m pixels for evaluating the spatial representativeness of the in situ SSM measurements; however, differences between the 3-m pixel results and in situ SSM measurements, which may be biased due to site installation and management activities [55], likely contribute to uncertainties in the model validation (Section 4.2; Table 5 and Table 6).
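The site-level comparisons discussed here rest on three summary statistics: bias, RMSE, and correlation against in situ measurements. As a minimal sketch, assuming the standard definitions of mean error, root-mean-square error, and the Pearson correlation coefficient, they could be computed as follows. The function name and sample values are hypothetical, and the exact formulations used for the tables in this study may differ.

```python
import math


def validation_metrics(estimates, observations):
    """Return bias, RMSE, and Pearson r of estimates against observations."""
    n = len(estimates)
    if n != len(observations) or n < 2:
        raise ValueError("need two series of equal length (>= 2)")
    errors = [e - o for e, o in zip(estimates, observations)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)

    mean_e = sum(estimates) / n
    mean_o = sum(observations) / n
    cov = sum((e - mean_e) * (o - mean_o)
              for e, o in zip(estimates, observations))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_o = sum((o - mean_o) ** 2 for o in observations)
    # Correlation is undefined when either series is constant.
    r = cov / math.sqrt(var_e * var_o) if var_e > 0 and var_o > 0 else None
    return {"bias": bias, "rmse": rmse, "r": r}


# Hypothetical downscaled SSM and in situ SSM at one site (cm3/cm3).
downscaled = [0.12, 0.22, 0.32]
in_situ = [0.10, 0.20, 0.30]
metrics = validation_metrics(downscaled, in_situ)
print(metrics)
```

In this toy case the estimates track the observations perfectly in time but carry a constant offset, so the correlation is high while bias and RMSE are nonzero, which illustrates why the three statistics are reported together.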
The 3-m SSM distributions over the focused study area were examined under three contrasting wetness conditions. In general, irrigated farms and denser vegetation cover corresponded with higher 3-m SSM levels, while non-irrigated land, bare ground, and roads showed lower wetness. The fine-scale land features with significant SSM spatial variations caused by different irrigation regimes and vegetation cover were generally captured by the downscaled results. A major issue identified is the SSM overestimation relative to the HDAS measurements (e.g., Plots 3 and 4), which was likely caused by high-level retrieval biases as found in other locations (Table 5 and Table 6). In addition, the 3-m SSM estimates for irrigated fields were higher than those for the surrounding fields but lower than the HDAS measurements. The sub-daily irrigation signals are likely partially missed in the slower changes of vegetation conditions captured by the PSD observations, which led to the underestimation of downscaled SSM.