1. Introduction
A good understanding of regional hydrological conditions is vital to scientists, water resource managers, and policy makers, among others, for understanding the state and flux of available freshwater. Such conditions are mostly evaluated with hydrological models, which enable the estimation and prediction of discharge and the various components of the terrestrial water storage. In West Africa (WA), the use of such models is nonetheless challenged by data scarcity [
1]. Alternatively, global land surface models, such as the Global Land Data Assimilation System (GLDAS), could be used to characterize hydrological conditions in the region [
2]. However, these models perform poorly over the region due to the persistent issue of data deficiency [
3]. The terrestrial water storage (TWS) inverted from the observations of the Gravity Recovery and Climate Experiment (GRACE) mission is therefore a very useful dataset, which can be used to assess changes in water availability and so support sustainable use [
4]. However, the previous GRACE mission covered a relatively short period (April 2002 to July 2017), with a latency of 2 to 6 months, which makes it difficult to use for long-term studies. Additionally, a data gap is envisaged between the past and the current (follow-on) GRACE missions. This necessitates an approach to forecast/backcast GRACE-derived TWS over the region.
As the variability of the water resources in WA is highly coupled to land-atmosphere-ocean interactions, both at the regional and global scales [
5], it is reasonable to use hydro-climatic variables, such as rainfall, temperature, evaporation, and the El Niño Southern Oscillation (ENSO) index, as retrieved from space-borne sensors, reanalysis datasets, etc., to forecast/backcast TWS in the region. Using GRACE-derived TWS and in-situ river level records, Becker et al. [
6] employed principal component analysis, which assumes that the time series are linear and stationary, to reconstruct GRACE-TWS from 1980 to 2008 over the Amazon basin. Similarly, de Linage et al. [
7] developed a simple linear model between TWS over the Amazon and sea surface temperature (SST) indices derived from Niño 4 and the Tropical North Atlantic Index (TNAI). Meanwhile, Forootan et al. [
5] developed an autoregressive model with exogenous variables (ARX), which utilized independent component analysis (ICA) for dimension reduction, to predict TWS over WA using rainfall data from the Tropical Rainfall Measuring Mission (TRMM) and SSTs of the three major oceanic basins (Pacific, Indian and Atlantic). However, this ICA/ARX method assumes a stationary state of TWS over the region, and the prediction accuracy declines after a forecast period of two years. Furthermore, Yin et al. [
8] used the water balance method to extend the TWS series back to 1980 using multi-source datasets as inputs in the terrestrial water balance equation.
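For reference, the terrestrial water balance referred to above is commonly written in the generic form below. The symbols are assumptions chosen here for illustration; the exact formulation and input datasets used by Yin et al. are not reproduced in this text.

```latex
\frac{\mathrm{d}S}{\mathrm{d}t} = P - ET - R
```

Here $S$ denotes TWS, $P$ precipitation, $ET$ evapotranspiration, and $R$ runoff. Integrating the right-hand side in time from a known storage state yields a TWS series for periods without GRACE observations.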
All the examples given here require the use of mathematical models. Since the formulation of these models is based on empirical data, this is essentially a machine learning/pattern recognition problem [
9]. Mukhopadhyay [
10] indicated that no mathematical model could effectively characterize hydrological phenomena. Thus, adaptive algorithms that make no prior assumption(s) about the time series offer a good alternative for predicting TWS over a region. Artificial Neural Networks (ANNs), which are effective machine learning tools, offer such algorithms: they are self-learning, self-adapting, and self-organizing, and capable of predicting hydrological variables with high efficiency [
11]. Long et al. [
12] demonstrated this by training an ANN, which was used to extend GRACE-derived TWS series back to 1982 over the Yun–Gui Plateau and its sub-regions in China.
The ANN machine learning algorithm is exceptionally well suited for modeling input–output relationships, especially in the absence of optimally calibrated physically-based models [
12]. ANNs have been extensively used in forecasting hydro-climatic parameters, such as stream flows, groundwater, rainfall, and droughts, with reasonably good performance (see References [
13,
14,
15]). The major strength of ANNs lies in their ability to efficiently learn the causal relationships within a nonlinear dynamic system, without a priori assumption(s) about the physical processes in the system [
16]. In the context of GRACE-derived TWS, there is a growing interest in the use of ANNs, although the applications are relatively few. Yirdaw-Zeleke [
17] applied a neural network to downscale GRACE-derived TWS into local groundwater storage over the Assiniboine Delta Aquifer, Canada. Likewise, Miro and Famiglietti [
18] downscaled GRACE-TWS to high-resolution groundwater storage in California’s Central Valley. Sun [
19] employed an ANN to predict groundwater levels using GRACE data over the mid-west regions of the USA. Moreover, ANNs have been used to extend GRACE data to the early 1980s over mainland Australia [
20] and karst plateau regions in southwest China [
12]. Also notable are the contributions of Zhang et al. [
21], who extended the TWS series over the Yangtze basin, and Mukherjee and Ramachandran [
22], who used an ANN (among other algorithms) to predict groundwater variations in India. Chen et al. [
23] monitored the hydrological extremes (droughts and floods) in the Liao River Basin, Northeast China, using extended GRACE data based on an ANN.
To the best of our knowledge, this study represents the first attempt to apply ANNs in the context of TWS changes (TWSC) derived from GRACE over any region in WA. Additionally, to train the network we make use of a broad spectrum of data that represents the dynamics of the water and energy cycles that impact TWSC. In this study, we present a Nonlinear AutoRegressive neural network with eXogenous inputs (NARX), which was used to backcast GRACE-derived TWS over WA. The exogenous data inputs included rainfall, evapotranspiration, land surface air temperature, precipitation minus evapotranspiration (from both the atmospheric and land perspective), soil moisture, as well as global and regional circulation indices. The NARX was used to learn the physical relationships between GRACE estimates and the aforementioned fields over the period 2003 to 2013. The trained network was then used to retrodict TWSC estimates over WA from 2013 back to 1979 (a 34-year period). To this end, for the first time, we assess the accuracy of backcasted TWSC using synthetic GRACE-TWSC, to address the question of whether extended TWSC series can be used to study past seasonal variations in TWSC over WA.
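As a rough illustration of how a NARX-type predictor sees its data, the sketch below builds the lagged input rows that such a network would be trained on. It is a minimal sketch under stated assumptions, not the configuration used in this study: the function name `narx_design`, the single shared lag count `n_lags`, and the choice to feed only past values (not the current month) of the exogenous variables are all hypothetical.

```python
def narx_design(y, exog, n_lags):
    """Build (features, targets) for a NARX-type one-step predictor.

    y      : list of target values (e.g. monthly TWSC), length T
    exog   : list of T rows, each a list of exogenous values for that month
    n_lags : number of past steps of y and exog fed to the network
             (assumed equal for both; a real setup may tune them separately)
    """
    if len(y) != len(exog):
        raise ValueError("y and exog must have the same length")
    features, targets = [], []
    for t in range(n_lags, len(y)):
        row = list(y[t - n_lags:t])        # autoregressive part: past targets
        for past in exog[t - n_lags:t]:    # exogenous part: past drivers
            row.extend(past)
        features.append(row)
        targets.append(y[t])
    return features, targets


# Toy data: five months of a target and one exogenous variable.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
exog = [[10.0], [20.0], [30.0], [40.0], [50.0]]
features, targets = narx_design(y, exog, n_lags=2)
```

One simple way to backcast with such a design, again as an assumption rather than the authors' stated procedure, is to reverse the time axis of both `y` and `exog` before building the rows, so that predicting the "next" step of the reversed series corresponds to stepping backwards in time.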
The rest of the study is outlined as follows: the datasets and the method used are presented in
Section 2, while
Section 3 describes the results, followed by their discussion in
Section 4. A summary of the study is presented in
Section 5.
4. Discussion
One of the main goals of this experiment was to assess the quality of backcasted GRACE-derived TWS over WA. We considered a “closed-loop” simulation using synthetic TWSC (i.e., TWCC) predicted by a land surface model, GLDAS-Noah V2, since it covers the entire period of study. A period equivalent to that of the GRACE observations was used to train an artificial neural network-based learning machine (ANN NARX), which was then used to retrodict the TWCC back to 1979. The network was trained to simulate Noah TWCC signals from 2003 to 2010 over WA, based on their physical nonlinear relationships with seven hydro-climatic variables (rainfall; evapotranspiration; land surface air temperature; net-precipitation; soil moisture; ENSO index; and global temperature anomaly). This was necessary in order to validate the ANN NARX. Overall, the results of this network presented spatial accuracies similar to those of GRACE, whereas the spatially-averaged TWCC presented ME, RMSE, NSE and R²
of 0.13 mm/month, 5.01 mm/month, 0.91, and 0.93, respectively (
Table 3). Furthermore, considering the main river basins over WA and the climate zones (cf.,
Figure 1), the errors are within those of GRACE-derived TWS [
38,
39]. There is some overestimation over moist sub-humid areas (
Figure 6a), although the NSE values show reasonable performance. Overall, the ANN method appears robust enough to adequately reconstruct GRACE-derived TWSC estimates over WA, as shown by the simulation.
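The skill scores quoted in this paragraph can be computed along the following lines. This is a minimal sketch, assuming that ME is the mean of simulated minus observed values, that NSE is the Nash-Sutcliffe efficiency in its usual form, and that the fourth score is the coefficient of determination taken as the squared Pearson correlation; the function name `skill_scores` is hypothetical.

```python
import math


def skill_scores(obs, sim):
    """Return ME, RMSE, NSE and R2 for paired observed/simulated series."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("obs and sim must have equal length >= 2")
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    errors = [s - o for o, s in zip(obs, sim)]

    me = sum(errors) / n                          # mean error (bias), sim - obs
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)                  # root-mean-square error

    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    if ss_obs == 0 or ss_sim == 0:
        raise ValueError("series must not be constant")
    nse = 1.0 - ss_res / ss_obs                   # Nash-Sutcliffe efficiency

    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    r2 = cov * cov / (ss_obs * ss_sim)            # squared Pearson correlation
    return {"ME": me, "RMSE": rmse, "NSE": nse, "R2": r2}


# A simulation with a constant bias of +1 relative to the observations.
scores = skill_scores([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this toy case the correlation is perfect while NSE is penalized by the bias, which illustrates why the two scores are reported side by side.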
Following this, the second goal was to retrodict the actual GRACE-derived TWSC to 1979 using a similar network. Firstly, an ensemble model of GRACE-derived TWS was created using the data from the three primary processing centers (CSR, JPL, and GFZ). In this regard, we applied the TCH method to independently estimate uncertainties in GRACE-derived TWS. The overall uncertainties estimated for the individual processing centers show that CSR delivered the lowest values. This is somewhat in agreement with previous studies [
38,
39], which found that CSR provides the lowest noise among the three processing centers. The computed uncertainties were subsequently used as weights in creating an ensemble TWS model from the original series; another combined model was obtained through simple averaging. The weighted ensemble product, when compared to the original and the averaged datasets, presented lower noise estimates and higher SNRs (
Table 1 and
Table 2), indicating an improvement in the TWS solution. This improvement in the ensemble series using weights contrasts with the results provided in Ref. [
38].
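The uncertainty estimation and weighting described in this paragraph can be sketched as follows. This is an illustrative sketch only: it uses the classical three-series form of the TCH estimate, which assumes mutually uncorrelated errors, and it assumes inverse-variance weights; the text does not state which TCH variant or weighting formula was actually applied. The function names are hypothetical.

```python
from statistics import pvariance


def three_cornered_hat(a, b, c):
    """Classical TCH error variances for three series of the same signal.

    Assumes the errors of a, b and c are mutually uncorrelated, so the
    variance of each pairwise difference is the sum of two error variances.
    """
    v_ab = pvariance([x - y for x, y in zip(a, b)])
    v_ac = pvariance([x - y for x, y in zip(a, c)])
    v_bc = pvariance([x - y for x, y in zip(b, c)])
    var_a = 0.5 * (v_ab + v_ac - v_bc)
    var_b = 0.5 * (v_ab + v_bc - v_ac)
    var_c = 0.5 * (v_ac + v_bc - v_ab)
    return var_a, var_b, var_c


def weighted_ensemble(series, variances):
    """Combine series with weights proportional to 1/variance (assumed)."""
    if any(v <= 0 for v in variances):
        raise ValueError("TCH produced a non-positive variance estimate")
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    length = len(series[0])
    ensemble = [sum(w * s[t] for w, s in zip(weights, series))
                for t in range(length)]
    return ensemble, weights


# Toy example: one true signal plus three mutually orthogonal error patterns.
truth = [10.0, 20.0, 30.0, 40.0]
csr = [t + e for t, e in zip(truth, [1.0, -1.0, 1.0, -1.0])]
jpl = [t + e for t, e in zip(truth, [2.0, 2.0, -2.0, -2.0])]
gfz = [t + e for t, e in zip(truth, [3.0, -3.0, -3.0, 3.0])]
variances = three_cornered_hat(csr, jpl, gfz)
ensemble, weights = weighted_ensemble([csr, jpl, gfz], variances)
```

With real data the uncorrelated-error assumption rarely holds exactly, and the estimate can even turn negative, which is why the sketch raises an error rather than silently producing weights.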
Following this, the improved product was used in the reconstruction of GRACE TWSC (TWS was converted to TWSC by means of Equation (3)), retrodicted from 2013 back to 1979 (a 34-year period) using an ANN NARX.
Figure 8a depicts the averaged series for WA, which shows an overall agreement with observed TWSC with ME, RMSE, NSE, and R²
of 0.05 mm/month, 6.98 mm/month, 0.91, and 0.91, respectively. Despite the differences in the length of the comparisons, the results of a “closed-loop” simulation (Noah-TWCC) and those for actual data (GRACE-TWSC) show consistency for the entire region and its sub-domains (compare
Table 3 and
Table 5). Furthermore, the use of TWSC yields relatively low NSE values in comparison with previous studies that adopted TWS [
12,
21,
23], although there are differences in the datasets, time span, study region, etc.
To support the above argument that TWSC is a better candidate for prediction than TWS, an ANN NARX was trained to predict GRACE-TWS with the same hydro-climatic variables. The ANN NARX-predicted GRACE-derived TWS series (TWS
ANN) showed strong agreement with the original GRACE data, presenting similar trends over their overlapping periods (
Figure 8b). The results, upon validation, showed high skill scores, mostly in areas south of the Sahara Desert, but low performance in the desert due to the very low signal amplitudes. The coefficient of determination and NSE values obtained for areas south of the Sahara were mostly greater than or equal to 0.6 and 0.5, respectively, while RMSEs ranging from 20 mm to 50 mm were predominant. Spatially-averaged series for major basins and sub-climatic zones, as well as the whole of WA itself, presented median RMSE, NSE and R²
of 11.83 mm, 0.76 and 0.89, respectively. This is somewhat in agreement with the findings presented in [
12], though under different scenarios. Interestingly,
Figure 8a shows a better agreement between GRACE-TWSC and Noah-TWCC in comparison with the results depicted in
Figure 8b, which shows GRACE-TWS and Noah-TWC. This suggests that TWSC over WA mainly reflects fluctuations in soil moisture, while TWS still has some memory effect, mainly in the groundwater compartments.
As mentioned in
Section 1 (Introduction), so far no one appears to have assessed ANNs using a “closed-loop” simulation. Thus, the importance of results obtained with such an algorithm lies both in validating the algorithm, with the aim that the errors lie within an acceptable range (e.g., GRACE errors), and in providing long-term series to enable long-term studies (e.g., of droughts). However, we have not investigated the contribution of the desiccation of Lake Chad [
46] to the hindcasted TWSC. Furthermore, the regularization of Lake Volta due to water impoundment must be considered. The sensitivity of the ANN output to the set of parameters has not been considered. We used only seven hydro-meteorological variables that could modulate TWSC over WA; however, it is recommended to select inputs one-by-one or several at a time to assess how the prediction performance responds to a given set of inputs. Nonetheless, our findings support the use of ANNs to extend TWSC series for long-term studies over West Africa, a data-poor region.
5. Summary
Although GRACE currently offers the most viable option for obtaining reliable estimates of TWS at regional and global scales, its time series is relatively short. For example, it cannot be used to infer the long-term changes in water availability over West Africa since the 1970s. Thus, a data-driven machine learning approach, which involved the use of a NARX network, was adopted to retrodict the GRACE series to 1979. The network was trained to learn the complex nonlinear relationships between GRACE-derived TWSC and the following hydrological variables: rainfall; temperature; evapotranspiration; net-precipitation; soil moisture; ENSO and Atlantic Niño (Niña) indices; and global temperature anomalies, over the period 2003 to 2013. The trained network was subsequently used to backcast TWSC estimates from 2013 back to 1979, covering a period of 34 years. Due to the lack of long-term records for validation, a similar system was designed to synthesize TWSC (i.e., TWCC, since there are no groundwater and surface water storages) from the Noah-driven GLDAS land surface model, which provides long-term series in its Version 2. This network was trained to predict TWCC from 2000 to 2010, after which it was used to backcast TWCC from 2010 to 1979. Records from 1979 to 1999 from the reconstructed and the original datasets were then used to validate the results from the ANN NARX.
Overall, the network employed to reconstruct the Noah series yielded good spatial accuracies, with the spatially-averaged TWCC estimates presenting median RMSE, NSE and R² of 8.06 mm/month, 0.76 and 0.88, respectively. Thus, the artificial neural network method proved robust enough to adequately reconstruct GRACE-derived TWS estimates over West Africa. For the real GRACE data, the reconstructed TWSC series showed strong agreement with the original GRACE data, presenting similar trends over their overlapping periods. The results, upon validation, showed high skill scores, mostly in areas south of the Sahara Desert, but low performance in the desert due to the very low signal amplitudes. The spatially-averaged series for major basins and sub-climatic zones, as well as the whole of West Africa itself, presented median RMSE, NSE, and R² of 11.83 mm/month, 0.76 and 0.89, respectively. These results agree with those of the “closed-loop” simulation, and we can therefore conclude that the NARX network method used here is robust enough to adequately reconstruct GRACE-derived TWSC estimates over West Africa.