#### *2.6. Signal Decomposition*

The STL procedure (seasonal-trend decomposition based on Loess) is used in this work to decompose the SMOS signal into its temporal components and to build an observation-based SM climatology. The STL technique was originally introduced by Cleveland et al. [25] and adapted by Humphrey et al. [37] to evaluate the seasonal cycle of unevenly spaced time series using locally weighted regression (Loess). It has already been used in several studies to extract the seasonal and interannual components of GRACE time series [37–39]. In this work, STL is used to decompose the filtered and temporally averaged SMOS SM signal ($SM_{tot}$) into the sum of a seasonal component ($SM_{seas}$), a low-frequency component ($SM_{long\text{-}term}$) and the remaining high-frequency residuals ($SM_{res}$):

$$SM_{tot} = SM_{long\text{-}term} + SM_{seas} + SM_{res} \tag{1}$$

The low-frequency component $SM_{long\text{-}term}$ contains only periodicities longer than a season and is further decomposed into a linear trend ($SM_{trend}$) and the anomalies with respect to this linear trend, i.e., the interannual variability ($SM_{interannual}$). The high-frequency residual is expected to contain both a real signal, representing subseasonal variability, and noise present in the SMOS data. For a detailed description of the method, refer to [25,37]. In short, STL is a doubly recursive approach: an inner iteration cycle uses Loess to separate the seasonal cycle from the low-frequency component, and an outer iteration cycle recalculates the Loess weights and, again using Loess, separates the signal into low- and high-frequency components. At the end of the process, the low-frequency signal remaining after removal of the seasonal cycle is decomposed into a trend and an interannual signal, and the residual component is equal to the resulting high-frequency component.
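The final step described above, splitting the low-frequency component into a linear trend and interannual anomalies, can be sketched in Python as follows. This is a minimal illustration assuming an ordinary least-squares line fitted to a gap-free long-term series; the function name `split_long_term` is hypothetical and is not taken from the STL implementations of [25,37].

```python
def split_long_term(times, long_term):
    """Split a low-frequency series into a least-squares linear trend and the
    anomalies about that trend (the interannual component).

    Hypothetical helper: assumes gap-free input and at least two distinct times.
    """
    n = len(times)
    if n != len(long_term) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    t_mean = sum(times) / n
    y_mean = sum(long_term) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("times must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, long_term))
    slope = sxy / sxx                                          # SM change per day
    intercept = y_mean - slope * t_mean
    trend = [intercept + slope * t for t in times]             # SM_trend
    interannual = [y - tr for y, tr in zip(long_term, trend)]  # SM_interannual
    return slope, trend, interannual


# Usage with a synthetic, slowly drying long-term component on a 5-day time axis.
days = [5.0 * k for k in range(146)]  # about two years of 5-day maps
slope, trend, interannual = split_long_term(days, [0.25 - 2e-5 * d for d in days])
print(f"trend: {slope * 365:.4f} m3/m3 per year")
```

By construction of a least-squares fit with an intercept, the interannual anomalies returned here have zero mean and no remaining linear dependence on time.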

The application of the STL algorithm requires the specification of six smoothing parameters, which need to be optimized to minimize spectral leakage between high- and low-frequency components and to control the possible influence of outliers in the time series. These are: (1) the length of the seasonal cycle; (2) the degree of the weighted polynomial regression; (3) the number of iterations of the inner loop (used to estimate the trend, seasonal and interannual components); (4) the number of iterations of the outer loop (used to estimate the residual signal, i.e., the subseasonal component); (5) the maximum time lag for the seasonal component; and (6) the maximum time lag for the long-term component. Here, the seasonal cycle is taken as exactly 365 days, a multiple of the 5-day map interval (73 maps per cycle; the maps were constructed disregarding the leap day of 29 February 2012). Following Humphrey et al. [37], the numbers of inner and outer loops are set to 2 and 3, respectively, and a quadratic fit is used for the seasonal cycle and a linear fit for the long-term component. The maximum time lags of the seasonal and long-term components were selected after a comprehensive analysis at the 8 target sites, reported hereafter.

The role of the maximum time lag for the seasonal component, $\lambda_{per}$, is illustrated in Figure 5 for site A. The smoothness of the seasonal cycle increases with the maximum seasonal time lag. Although large values are recommended for noisy data, they may hide key details of the seasonal cycle: a maximum seasonal time lag of 90 days fails to reconstruct the double SM maximum in June and September and instead produces a spurious maximum in July. On the other hand, maximum seasonal time lags that are too short tend to overfit the noisy data. A similar effect was observed for sites C, E and G, whereas the impact was weak at sites B, D and F (not shown). These results indicate that a maximum seasonal time lag of 45 days is a reasonable compromise for SMOS SM data, and this value is used for the rest of this study.
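To make the role of a maximum time lag concrete, the Loess building block can be sketched as a local weighted polynomial fit. This is a minimal sketch under stated assumptions: the maximum time lag is treated as the half-width of a tricube kernel in time (points at or beyond the lag get zero weight), and the outer-loop robustness weights follow the bisquare rule used in STL by Cleveland et al. [25]. All function names are hypothetical, and this is not the implementation used in [37].

```python
def tricube(u):
    """Tricube kernel (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare_robustness(residuals):
    """Outer-loop robustness weights: B(|r| / h), with h = 6 * median(|r|)
    and B(u) = (1 - u^2)^2 for u < 1, else 0. Large residuals get weight 0."""
    abs_r = sorted(abs(r) for r in residuals)
    n = len(abs_r)
    mid = n // 2
    median = abs_r[mid] if n % 2 else 0.5 * (abs_r[mid - 1] + abs_r[mid])
    h = 6.0 * median
    if h == 0.0:  # simplification: no spread in the residuals, keep all points
        return [1.0] * n
    return [(1.0 - (abs(r) / h) ** 2) ** 2 if abs(r) < h else 0.0 for r in residuals]


def _solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("too few points in the window for this degree")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def loess_at(t0, times, values, lag, degree=1, robustness=None):
    """Loess estimate at time t0: weighted polynomial fit of the given degree to
    the points with |t - t0| < lag (tricube weights times robustness weights)."""
    if robustness is None:
        robustness = [1.0] * len(times)
    p = degree + 1
    ata = [[0.0] * p for _ in range(p)]  # weighted normal matrix A^T W A
    atb = [0.0] * p                      # weighted right-hand side A^T W y
    for t, y, rw in zip(times, values, robustness):
        u = (t - t0) / lag               # time offset scaled by the maximum lag
        w = tricube(u) * rw
        if w == 0.0:
            continue
        powers = [u ** k for k in range(p)]
        for i in range(p):
            atb[i] += w * powers[i] * y
            for j in range(p):
                ata[i][j] += w * powers[i] * powers[j]
    # With u centred on t0, the fitted value at t0 is the constant coefficient.
    return _solve(ata, atb)[0]
```

In this sketch, a larger `lag` spreads the tricube weights over more 5-day maps and yields a smoother curve, which is qualitatively the trade-off illustrated in Figure 5. In STL itself the smoothers are applied inside the inner and outer loops rather than once to the raw series.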

**Figure 5.** Seasonal soil moisture component extracted at location A for two values of the maximum seasonal time lag $\lambda_{per}$: 45 days (black line) and 90 days (gray line).

The maximum time lag for the long-term (low-frequency) component controls how the seasonal anomalies are split between a low-frequency component (further decomposed into a trend and an interannual component) and a residual high-frequency component. Figure 6 shows the spectra of the interannual and subseasonal SM components for long-term maximum time lags of 0.20 × 365 days (top) and 0.10 × 365 days (bottom). The larger lag leads to a subseasonal component with spectral maxima at periods longer than 90 days, whereas the shorter lag leads to a subseasonal variability closer to white noise. The lag of 0.10 × 365 days (36.5 days) was therefore selected, since it yields a more appropriate separation between the long-term variability and the residual component (which integrates both subseasonal variability and instrumental noise). Figure 6 shows results for site A; the results obtained for the other sites are consistent (not shown).
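The six parameter choices described in this section can be gathered in one place, and expressing them in units of 5-day maps makes the constraint on the cycle length explicit (365 days = 73 maps). The sketch below is illustrative only: `StlSettings`, its field names and `MAP_INTERVAL_DAYS` are hypothetical and do not correspond to any particular STL library.

```python
from dataclasses import dataclass

MAP_INTERVAL_DAYS = 5.0  # 5-day SMOS maps; leap day (29 Feb 2012) disregarded


@dataclass(frozen=True)
class StlSettings:
    """Hypothetical container for the six STL parameters chosen in this section."""
    period_days: float = 365.0              # (1) length of the seasonal cycle
    seasonal_degree: int = 2                # (2) quadratic fit for the seasonal cycle
    long_term_degree: int = 1               # (2) linear fit for the long-term component
    inner_loops: int = 2                    # (3) inner-loop iterations
    outer_loops: int = 3                    # (4) outer-loop iterations
    seasonal_lag_days: float = 45.0         # (5) maximum seasonal time lag
    long_term_lag_days: float = 0.10 * 365  # (6) maximum long-term time lag (36.5 days)

    def maps_per_cycle(self) -> int:
        """Number of 5-day maps per seasonal cycle; must be a whole number."""
        n = self.period_days / MAP_INTERVAL_DAYS
        if n != int(n):
            raise ValueError("cycle length must be a multiple of the map interval")
        return int(n)

    def lag_in_maps(self, lag_days: float) -> float:
        """Convert a time lag in days into a number of 5-day map intervals."""
        return lag_days / MAP_INTERVAL_DAYS


settings = StlSettings()
print(settings.maps_per_cycle(), settings.lag_in_maps(settings.seasonal_lag_days))  # prints: 73 9.0
```

A 366-day cycle, for example, would not divide evenly into 5-day maps, which is one reason the leap day is dropped when building the maps.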

**Figure 6.** Spectra, in $(\mathrm{m^3\,m^{-3}})^2\,\mathrm{cpd^{-1}}$ (cpd = cycles per day), versus frequency (cycles per day) of the soil moisture interannual (black line) and residual (blue line) temporal components at location A. The maximum seasonal time lag is 45 days. The top and bottom plots are obtained using a long-term maximum time lag of 0.20 × 365 and 0.10 × 365 days, respectively. The time lag of 0.10 × 365 days is selected to ensure that the residual component contains only subseasonal variability and instrumental noise, i.e., it has no maxima at periods longer than 90 days.
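The spectral check behind Figure 6 can be approximated numerically. The sketch below computes a plain, unnormalized periodogram of an evenly spaced, gap-free 5-day series with a direct discrete Fourier transform and reports the fraction of power at periods longer than 90 days. This fraction is a hypothetical diagnostic introduced here for illustration, not a criterion stated in this study. For white noise sampled every 5 days (Nyquist frequency 0.1 cpd), the expected fraction is roughly the share of frequencies below 1/90 cpd, about 11%.

```python
import cmath
import math


def periodogram(x, dt_days=5.0):
    """Unnormalized one-sided periodogram |X_k|^2 of an evenly spaced, gap-free series.

    Returns frequencies in cycles per day (cpd) for k = 1 .. n // 2 and the power.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]  # remove the mean so the k = 0 term carries no power
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        xk = sum(xc[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        freqs.append(k / (n * dt_days))
        power.append(abs(xk) ** 2)
    return freqs, power


def fraction_beyond_period(freqs, power, period_days=90.0):
    """Share of periodogram power at periods longer than `period_days`."""
    total = sum(power)
    low = sum(p for f, p in zip(freqs, power) if f < 1.0 / period_days)
    return low / total if total > 0 else 0.0


# Usage: a 365-day sinusoid sampled every 5 days for 6 years (438 maps).
t = [5.0 * j for j in range(438)]
annual = [math.sin(2 * math.pi * ti / 365.0) for ti in t]
print(fraction_beyond_period(*periodogram(annual)))
```

For this purely annual signal the printed fraction is close to 1, whereas a residual resembling white noise should stay near the 11% baseline; a residual that retains long-period variability, as with the 0.20 × 365-day lag, would push the fraction above that baseline.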

#### *2.7. Analyses at Target Sites*

The STL procedure was applied to the SMOS time series at the target sites, and the resulting distribution of SMOS SM variance among the temporal components was analyzed in detail. Subsequently, the SMOS, GLDAS-Noah and ERA5 SM time series at these locations were inter-compared to identify consistencies and potential shortcomings of building a climatology from satellite data alone. The three data sets were also compared with ground-based SM at one of the target sites (REMEDHUS, in Europe), and statistical scores of these comparisons are provided.
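The statistical scores are not specified in this section; as a hedged illustration, the sketch below computes metrics commonly used to compare SM products with ground-based measurements (bias, RMSE, unbiased RMSE and Pearson correlation) over the dates on which both series have data. The choice of metrics and the function name `comparison_scores` are assumptions made here.

```python
import math


def comparison_scores(estimate, reference):
    """Bias, RMSE, unbiased RMSE and Pearson R between two SM series (m3/m3).

    Hypothetical helper: dates where either value is None are skipped.
    """
    pairs = [(e, r) for e, r in zip(estimate, reference)
             if e is not None and r is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched samples")
    diffs = [e - r for e, r in pairs]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # ubRMSE: RMSE after removing the mean difference (bias)
    ubrmse = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n)
    mean_e = sum(e for e, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_r = sum((r - mean_r) ** 2 for _, r in pairs)
    if var_e == 0.0 or var_r == 0.0:
        raise ValueError("correlation undefined for a constant series")
    r = cov / math.sqrt(var_e * var_r)
    return {"n": n, "bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}


# Usage with hypothetical values; the third date is missing in situ.
in_situ = [0.10, 0.15, None, 0.25, 0.30]
satellite = [0.13, 0.16, 0.21, 0.27, 0.33]
print(comparison_scores(satellite, in_situ))
```

Computing ubRMSE from the de-biased differences is algebraically equivalent to $\sqrt{RMSE^2 - bias^2}$ but avoids cancellation when the two terms are nearly equal.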

#### *2.8. Analyses at the Global Scale*

Global maps of the long-term mean and standard deviation of SM were computed pixel by pixel from the filtered and temporally averaged SMOS signal. The STL decomposition was subsequently applied to each individual pixel with a temporal coverage greater than 80% over the study period (see Figure 4). The variance of the extracted long-term, seasonal and residual components relative to the total variance was computed to assess the dominant modes of temporal variability in global SM during the study period. The magnitude of the linear trends within the long-term component was also evaluated per pixel at the global scale.
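The per-pixel screening and variance partition described above can be sketched as follows. This is a minimal illustration assuming each pixel's series is a list with `None` for missing 5-day maps and that the STL components are aligned with it; the helper names are hypothetical. Because the components are not guaranteed to be uncorrelated, their variance fractions need not sum exactly to 1 (the difference reflects covariance terms).

```python
def coverage(series):
    """Fraction of time steps with valid (non-None) data."""
    if not series:
        return 0.0
    return sum(1 for v in series if v is not None) / len(series)


def keep_pixel(series, min_coverage=0.80):
    """Apply the >80% temporal-coverage screening before running STL."""
    return coverage(series) > min_coverage


def _variance(x):
    """Population variance of a list of numbers."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def variance_fractions(total, components):
    """Variance of each component relative to the variance of the total signal,
    using only the time steps where the total and every component are valid."""
    idx = [i for i, v in enumerate(total)
           if v is not None and all(c[i] is not None for c in components.values())]
    if len(idx) < 2:
        raise ValueError("not enough valid samples")
    var_total = _variance([total[i] for i in idx])
    if var_total == 0.0:
        raise ValueError("total signal has zero variance")
    return {name: _variance([c[i] for i in idx]) / var_total
            for name, c in components.items()}


# Usage with two uncorrelated toy components.
seasonal = [1.0, -1.0, 1.0, -1.0]
residual = [0.5, 0.5, -0.5, -0.5]
total = [s + r for s, r in zip(seasonal, residual)]
print(variance_fractions(total, {"seasonal": seasonal, "residual": residual}))
# prints: {'seasonal': 0.8, 'residual': 0.2}
```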

#### **3. Results**

#### *3.1. Soil Moisture Temporal Decomposition at Target Sites*

Figure 7 illustrates the STL decomposition of the SMOS signal into its subcomponents at the 8 target regions. In general, the time series show an almost negligible linear trend, with Australia (H) and South America Temperate (D) presenting a slight trend towards drier conditions and North America Temperate (B) and Southern Africa (G) showing a slight trend towards wetter conditions; this is examined in detail later in this section. A clear seasonal signal is extracted at most sites, except for South America Temperate (D), which exhibits limited temporal variability in the original series (see also Figure 3). The residual component behaves similarly to white noise at all sites. Its amplitude is especially high in Australia (H) and, to a lesser extent, in Southern Africa (G). The analysis for Europe (E) reflects a regular seasonal cycle, and its interannual component suggests exceptionally dry conditions in the winters of 2011–2012 and 2014–2015 and exceptionally wet conditions during short periods in spring 2012, winter 2013–2014 and at the end of 2015. The dry anomaly in winter 2011–2012 has also been reported in previous studies on the detection of agricultural drought [40,41]. The interannual component also presents prominent fluctuations in North America Temperate (B), Southern Africa (G) and Australia (H). In particular, the dry anomaly observed in North America Temperate (B) in 2012 reflects the severe summer drought that affected the contiguous US, caused by unusually high temperatures in spring and summer 2012 combined with record-low rainfall [42]. The analysis of North America Boreal (A) exemplifies the non-negligible impact of temporal gaps on the decomposition. The wet peak measured by SMOS in October 2010, which is followed by a short data gap, is assigned mainly to the subseasonal component; the same applies to the wet peak measured in October 2013, which is not affected by missing data. In contrast, the signal around the data gaps that occur in the winter months of each of the six annual cycles is assigned to the interannual component. This reveals a limitation of the method in correctly differentiating short-term from long-term variability at the edges of temporal gaps.
The same effect was observed in initial tests at target sites within Eurasia Boreal, Eurasia Temperate and Asia Tropical (see Figure 1), which were discarded from the analysis. Although a minimum temporal coverage of 80% was imposed, the impact of data gaps in the time series should be kept in mind when interpreting the overall results.
