**1. Introduction**

The hydrologic component of the climate system, as a vital driver of vegetation activity and productivity across many terrestrial ecosystems, is a constraint to over half of the world's primary productivity [1]. Consequently, shifts and anomalies in the dynamics of the hydrologic cycle have far-reaching impacts not only on vegetation but also on human livelihood and wildlife. There is, therefore, the need for consistent monitoring of the hydrologic cycle in tandem with vegetation activity, which is critical to the understanding of the influence of climate variability and change on natural and agricultural systems. This is also important for early warning systems and attaining sustainable use of water resources.

Recent advances in remote sensing, especially satellite tracking systems, have enabled the consistent monitoring of vegetation dynamics. Thus, satellite-retrieved data on vegetation and components of the hydrologic cycle is often used to probe the degree to which water availability is coupled to vegetation dynamics. While the majority of these studies are focused on unraveling the relationship between vegetation and precipitation (e.g., [2,3]), a few others compare the strength of the relationship between vegetation-precipitation and vegetation-soil moisture, e.g., [4]. This

comparison has recently been extended to vegetation-precipitation versus vegetation-terrestrial water storage (TWS) from the Gravity Recovery and Climate Experiment (GRACE) satellites, e.g., [5,6]. However, the challenges with these studies are (i) the relative importance of the variables is drawn from multi-temporal bivariate correlation analyses wherein the time series of vegetation and one of the hydrological variables at various lags are assessed one at a time, rather than from a multivariate analysis; (ii) prior to the analysis, seasonality in the time series is often removed by subtracting the long-term monthly climatology from the corresponding monthly observations across the years. The resulting time series is termed anomalies. The motivation for this is, in part, to allow the use of parametric statistics in which the assumptions of stationarity and independence might be violated by the presence of seasonality and autocorrelation in the time series. However, an analysis based on anomalies alone will mask the impact of the hydrological measures on vegetation phenology in ecosystems with markedly wet and dry season oscillations. All in all, these studies often assume a linear relationship between vegetation greenness and the hydrological variables, thereby overlooking the potential for interactions between the hydrological measures and their lag e ffects on vegetation dynamics. To overcome this challenge would require the use of a more robust multivariate approach in which simultaneous analysis and untangling of the relative importance of the hydrological variables and their lags are carried out, regardless of whether the original observation or anomalies are used.

Given these gaps, the aim of this study is to perform a multivariate analysis using time series of vegetation greenness as response variable and precipitation, soil moisture, and TWS at eight concurrent or lead lags as predictor variables. The specific research questions associated with the objective of the study are

(1) What was the spatiotemporal trend in vegetation greenness across Africa from 2003 to 2015?

(2) How well is the dynamics in vegetation greenness associated with the combined influence of precipitation, soil moisture, and TWS, and what are their relative contributions?

The analysis was performed using either the original or anomalies of the monthly values of the variables, and the model was estimated with the random forest algorithm. Random forest is a non-parametric, distribution-free machine learning algorithm that is capable of modeling linear and non-linear relationships between variables [7]. The study is focused on Africa, which is a continent with considerable water-limited ecosystems [8,9] that severely a ffect livelihoods. In addition, many studies that included soil moisture in assessing the eco-hydrological relationship between water availability and vegetation greenness dynamics in Africa largely used modeled soil information [9–11], essentially because of the paucity of in situ data. Thus, the use of independent satellite-observed datasets in this study should o ffer new insights into the eco-hydrological relationships across Africa and ameliorate the data paucity situation.

#### **2. Materials and Methods**

#### *2.1. Study Area and Data*

The study area encompasses the vegetative region of Africa, as shown by the land cover map in Figure 1. Monthly time series of the Enhanced Vegetation Index (EVI), precipitation, soil moisture, and GRACE TWS were used in this study (Table 1). Because GRACE TWS was first available in 2002 and we are interested in vegetation delayed response of up to 12 months to water availability (see Section 2.3), the data span of available hydrological variables that was considered was from 2002 to 2015, whereas EVI was from 2003 to 2015. Thus, the latter period determined the temporal extent of the analysis.

**Figure 1.** The major land cover types across Africa aggregated from the University of Maryland (UMD) MODIS land cover layer for 2013 (adapted from Ugbaje et al. [12]).


**Table 1.** General description of the research data.

EVI, as a surrogate for vegetation greenness, has the advantages of being robust against background and atmospheric noises, and, unlike the Normalized Difference Vegetation Index, it does not saturate over high biomass regions [18]. For this study, EVI time series from the MOD13C2 (version 6) product, which is a derivative of image acquisitions from the moderate resolution imaging spectroradiometer (MODIS) sensor onboard the Terra satellite, was retrieved. MOD13C2 was derived from the MOD13A2 product, which is a 16-day composite at a 1-km spatial resolution.

Monthly precipitation data were obtained from the Climate Hazards Group InfraRed Precipitation with Station data portal. CHIRPS are global, gridded precipitation datasets, derived from a blend of satellite-observed infrared cold cloud duration and in situ gauge records [14]. The blending is performed using a series of algorithms and interpolation techniques, which are described in Funk et al. [14]. Validation of CHIRPS estimates against independent ground observations and some global gridded precipitation products show CHIRPS to have a relatively low bias [14]. Further, with a spatial resolution of 0.05◦, CHIRPS has one of the finest spatial resolution of all the currently available long-term, global gridded precipitation products. The products, as an integral component of the Famine Early Warning Systems Network, have been used for other applications as well, e.g., [12].

Daily satellite observed soil moisture data were downloaded from the European Space Agency Climate Change Initiative data portal. The merged product comprising retrievals from the active and passive microwave soil moisture was used. This product represents the surface soil moisture not deeper than 10 cm [19]. While the merging approach is described in Liu et al. [19,20] and Wagner et al. [21] and the global validation with ground measurements is reported in Dorigo et al. [22], the merging approach used in version 03.2 (this was used for this study) includes a weighted averaging technique where the weights are proportional to the signal-to-noise ratio [15]. As this study is at a monthly time step, we averaged the original daily observations into monthly values. However, the tropical forest areas are masked out in this dataset [19] because of the di fficulty of measuring soil moisture over dense vegetation with microwave remote sensing [23]. We noted a number of gridded soil moisture products with complete coverage of Africa, e.g., [24], but the soil moisture estimates are modeled from other variables, including precipitation. Thus, we opted for the microwave satellite soil moisture product, which is an independent set of measurements.

The GRACE system is made up of two satellites orbiting at identical orbital paths separated at a distance of ~220 km [25]. GRACE provides monthly measurements of the Earth's gravity field variations, as a function of local landmasses. After accounting for atmosphere and ocean e ffects [26], and taking into cognizance the relatively low contribution of vegetation biomass variations [27], the net mass variation can be attributed to the redistribution of TWS. Thus, TWS is an integral of surface water, soil moisture, and groundwater. In this study, we used an ensemble of average TWS from April 2002 to December 2015 computed from the GRACE data (release-5, level-2) independently pre-processed by three research centers (NASA Jet Propulsion Laboratory (JPL), University of Texas Center for Space Research (CSR), and the GeoForschungsZentrum (GFZ) Potsdam). However, GRACE's original signal is lost during pre-processing (e.g., filtering, de-stripping, and truncation). Hence, it is important that an appropriate scaling factor is applied to restore the signal loss prior to using water-related GRACE-TWS applications. For this reason, the scaling factor from the NCAR's Community Land Model 4.0 (CLM4.0, [28]) was applied to correct and restore the GRACE original signal. Besides, CLM4.0 takes into account the interaction between surface and groundwater in addition to accounting for human activities such as irrigation and river diversion [17,28,29]. Consequently, the TWS used in this study is measured in equivalent water height (EWH, in cm) at a spatial resolution of 1◦ and at a monthly time step. In order to match the January 2002 starting point of the soil moisture and precipitation datasets, the GRACE data was extended backward by replacing the three missing months (January, February, and March) with their long-term means. It is not expected that this data interpolation will have a significant impact on the result of this study. It is worth mentioning that the GRACE-TWS was derived from subtracting the monthly observations from a historical mean (2004–2009), the output of which is commonly referred to as terrestrial water storage anomalies (TWSA) in the GRACE user community.

The soil moisture, precipitation, and the GRACE datasets were resampled to co-register with the MODIS EVI data. Each pixel of the soil moisture (0.25◦) and GRACE (1◦) datasets were first disaggregated to 0.05◦ and then resampled by the nearest neighbor technique and then aligned to the EVI dataset. These two approaches ensure minimum loss of information in the downscaling process. However, the CHIRPS pixels were only resampled to match the MODIS EVI pixels using the nearest neighbor approach. The two-step (disaggregation and nearest neighbor interpolation) approach used here replicates the pixels without necessarily altering the original cell values.

We masked some pixels based on the following considerations. Any pixel missing more than 20% of data or where there are more than three consecutive months of missing data was masked out. Otherwise, we filled the missing values using a spline interpolation algorithm [30]. Furthermore, to focus the analysis on vegetated areas, pixels with time series having a mean EVI not exceeding 0.1 were masked out. EVI values below this threshold are indicative of bare soil or open water bodies [31].

A key focus of this study is to understand the relative importance of the hydrological measures as drivers of vegetation greenness, based on original and anomalies values. For this reason, monthly anomalies were computed by subtracting the climatology (long-term mean) of each calendar month from the corresponding monthly time series over the years covered in this study. The equation for this is given by

$$X\_{\text{anomaly}}(i,j) = X(i,j) - \frac{1}{n} \sum\_{j=1}^{n} X(i,j),\tag{1}$$

where *X* is the monthly observed variable, *i* is the month, *j* is the year, and *n* is the number of the years of the data. It should be noted that for the GRACE data, the computed monthly anomaly should not be confused with the original GRACE data in which the long-term mean has been subtracted and is referred to as TWSA. To avoid confusion, the original GRACE data will still be referred to as TWSA in this paper.

#### *2.2. Trend Analysis*

Trend analysis was performed on the time series of the monthly anomalies of each of the variables: EVI, precipitation, soil moisture, and TWSA over the period 2003 to 2015. The trend was estimated using the Mann–Kendell (MK) trend test [32,33]. The MK test evaluates a series for the presence and persistence of monotonic trend (increase/decrease) through pairwise comparison of observations. The test outputs Kendall's tau rank correlation coe fficients (τ), which takes on values between −1 and 1 [33]. Positive, zero, or negative τ values, respectively, indicate an increasing, no trend, or a decreasing trend. The test is non-parametric and is widely used in remote sensing time series analysis, e.g., [12]. The MK test was used in this study because of its robustness to outliers and its capacity to handle short or noisy series compared to parametric tests such as ordinary least squares regression. Finally, only Kendall's τ values where the estimated trend was significantly di fferent from zero (*p* < 0.05) were retained.

#### *2.3. Modeling the Relationship between Vegetation Greenness Dynamics and Water Availability*

In cognizance of the possible vegetation response to water availability exhibiting time lags, the relationship between vegetation greenness and the water availability was estimated with EVI as the response variable and 0–6 and 12 months lagged measures of the hydrological variables as predictors. Consequently, the time series of EVI for the period 2003 to 2015 was used, whereas the predictor variables were drawn from the time series of the hydrological variables covering the period from 2002 to 2015. This resulted in 156 observations for each pixel for the response and the 24 predictor variables (3 variables × 8 lags).

The relationship between EVI and the hydrological measures at the associated lags was estimated with random forest (RF). We chose to use RF because it is a distribution-free, non-parametric algorithm that can be used to model simple and complex relationships between response and predictor variables [7]. It has been widely used in many regression and classification problems [34,35]. In addition to being robust in the presence of irrelevant features, RF estimates a metric for the relative importance of the predictor variables to the prediction problem [7]. The variable importance (VIP) metric indicates the decrease in prediction accuracy on an out-of-bag sample (test sample) in which the values of a variable are randomly reshu ffled as against the prediction accuracy on the same out-of-bag sample with no reshu ffle. The higher the decrease in prediction accuracy, the more is the predictive power of the variable in the model. In this study, the RF model was estimated for each pixel with the number of trees set to 500, above which there is no significant improvement of model performance. The default values for the other parameters (e.g., mtry and node size) in the R statistical software implementation of RF were used.

**Model performance assessment:** The strength of the model was assessed by comparing the out-of-bag prediction of EVI against the observed values. The comparison was based on Lin's concordance correlation coe fficient (LCCC) metric [36]. LCCC expresses the degree of agreemen<sup>t</sup> between the predicted values with the observed values, with respect to the 1:1 line [36].
