1. Introduction
Food and economic security in West Africa depend heavily on rainfed agriculture and, therefore, on rainfall. In this context, accurate rainfall information is essential to ensure food security. Uncertainty in West African rainfall and the associated vulnerability of small-holder farmers have been documented since the last century. In the 1970s, the Sahelian Drought was socially and agriculturally devastating. It was reported to have caused 100,000 deaths by 1973 and was followed by further droughts over the next two decades [
1,
2]. Currently, climate change and global population growth [
3], the two great threats of this century, exacerbate these problems. Sub-Saharan Africa will account for most of this century’s population growth and will become the world’s most populous area by the late 2060s [
4]. Climate change is changing the onset of the rainy season over the Sahel [
5] and causing more frequent droughts in most of Africa, which is severely increasing food insecurity [
6,
7]. Rainfall detection is essential to monitor these changes, characterize rainfall patterns, and supply the information needed for efficient agricultural planning. However, a sparse, unevenly distributed, and inconsistently reported rain gauge network poses a major challenge to studying rainfall variability in this region—and has been a persistent problem since the last century [
8].
Satellite rainfall products are of special relevance for areas with sparse rain gauge networks, such as sub-Saharan Africa, because of their global coverage. In fact, satellite rainfall retrieval and its application over Africa have been in constant development since the late 1960s [
8,
9,
10,
11]. However, existing satellite products show a poor correlation with ground measurements in the region. For example, the Africa Climate Hazards Infrared Precipitation with Stations (CHIRPS) [
12] and the Tropical Applications of Meteorology Using Satellite Data and Ground-Based Observations (TAMSAT) [
13], particularly developed for Africa based on the Cold Cloud Duration method, show daily Kling–Gupta Efficiency values below 0.4 [
14,
15]. The most widely used machine learning-based product, Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System (PERSIANN-CCS) [
16], tends to have a high false alarm ratio (FAR) and to overestimate rainfall both globally and in Africa [
17,
18]. Lastly, the Global Precipitation Measurement (GPM) Integrated Multi-satellitE Retrievals for GPM (IMERG) [
19], which combines both physical and ML-based methods and has been developed to become the longest and most detailed rainfall data set, shows a weaker correlation with ground measurements in West Africa than in other regions of the world [
20,
21].
The literature suggests that an important reason for the poor performance of satellite rainfall estimates over West Africa is the sparse rain gauge distribution, leading to underrepresentation in the training or calibration data for the modeling algorithms. Additionally, atmospheric conditions differ from other regions in the world, as there are higher aerosol concentrations, higher land surface temperatures, and a generally drier atmosphere [
22]. Furthermore, the generalization performance of existing ML rainfall retrieval models trained on dense gridded rainfall data [
16,
23,
24,
25] may decrease for areas with less training data and different atmospheric conditions.
Deep learning (DL) is becoming increasingly popular in the field of environmental remote sensing because of its ability to learn complex patterns and features from data [
26]. DL methods exploit spatial and sequential inductive biases to improve performance by incorporating the assumption that nearby pixels in an image and nearby elements in a sequence have more relevance to the output, which allows the network to learn more effectively and generalize to new examples.
In this work, we investigate whether locally training a deep learning model can overcome the limitations of global products in capturing the complex rainfall dynamics of this region. We develop two models based on CNN and ConvLSTM for rain/no-rain detection in the data-scarce region of northern Ghana, West Africa. Both models have been trained on a small regional dataset, representative of data availability in the region. The focus of this paper is on rain/no-rain detection, i.e., binary classification, as a first step towards rainfall intensity estimation. In
Section 2, we present the data and study area and introduce our methodology; in
Section 3, we report our results; in
Section 4, we compare our findings with those of other studies; and in
Section 5 we draw the main conclusions of this study and propose future work beyond this paper.
2. Materials and Methods
2.1. Model Development Datasets
The input to the model is level 1.5 data from the Spinning Enhanced Visible and InfraRed Imager (SEVIRI) instrument aboard the Meteosat Second Generation (MSG) satellite. Concretely, we use data from the 10.8 µm channel (channel 9 of SEVIRI), a window channel in the thermal infrared (TIR) region that is widely employed for rainfall estimation from cloud top temperature [
27]. The spatial resolution over our study area is 3.1 km × 3.1 km [
28]. The temporal resolution is 15 min.
2.2. Target Data: TAHMO Rain Gauge Data
To develop the models, we used hourly rain gauge data from the Trans-African Hydro-Meteorological Observatory (TAHMO) [
29] as target data. TAHMO provides quality-controlled rainfall data, available in near real time. There were eight TAHMO stations in our study area during the research period (July 2018 to December 2020). Their locations and characteristics are displayed in
Figure 1 and
Table 1, respectively.
Table 1 also includes the number of rain events per station per year. Here, a rain event is defined as an uninterrupted period of non-zero rainfall measurements with a cumulative rainfall of at least 1 mm; consecutive rain events are separated by at least 1 h without rainfall.
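The rain event definition above can be illustrated with a short sketch. It assumes a gap-free hourly series in mm, so that any hour with zero rainfall acts as the 1 h separation between events; the function name `find_rain_events` and the sample values are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def find_rain_events(
    hourly_mm: List[float], min_total_mm: float = 1.0
) -> List[Tuple[int, int, float]]:
    """Return (start_index, end_index_inclusive, total_mm) for each rain event.

    An event is a maximal run of consecutive hours with rainfall > 0 whose
    cumulative rainfall is at least `min_total_mm`. Assumes no missing values.
    """
    events = []
    start = None
    total = 0.0
    for i, value in enumerate(hourly_mm):
        if value > 0:
            if start is None:  # a new wet run begins
                start = i
                total = 0.0
            total += value
        elif start is not None:  # a dry hour closes the current run
            if total >= min_total_mm:
                events.append((start, i - 1, total))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and total >= min_total_mm:
        events.append((start, len(hourly_mm) - 1, total))
    return events


# Illustrative hourly rainfall (mm); values are made up.
series = [0.0, 0.4, 0.8, 0.0, 0.2, 0.0, 3.0, 1.5, 0.5, 0.0]
events = find_rain_events(series)
```

In this toy series, the isolated 0.2 mm hour is not counted as an event because its cumulative rainfall stays below 1 mm.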
2.3. Benchmark Products
We used two benchmark products for performance evaluation (
Table 2): PERSIANN-CCS [
16], as a reference operational ML-based satellite rainfall product, and IMERG [
19], as a very high-quality global satellite rainfall product.
PERSIANN–CCS builds on its predecessor, PERSIANN [
30], and estimates rainfall from geostationary (GEO) infrared (IR) images. First, the model segments and classifies clouds into cloud patches based on manually selected features such as cloud texture or geometry. Second, it learns the relationship between brightness temperature and rainfall rates for each cloud patch [
16]. PERSIANN-CCS has a latency of approximately 3 h. A possible limitation of this method lies in the human-assigned features and group definitions of cloud patches, which may be reductionist or faulty in representing physical (rainfall) processes that are not yet fully understood.
IMERG has been developed by NASA and is available as three different products with varying latency times and more data being incorporated in successive runs of the algorithm: Early run, with a 4 h latency time; Late run, with a 12 h latency time; and Final run, with a latency time of 3.5 months. NASA advises using the Final Run as research-ready data. Here, we evaluated all three products. The latest algorithm upgrade of IMERG at the time of writing this paper was version 06.
IMERG relies on multiple data sources and algorithms: It employs GEO IR satellites, “as many as possible” opportunistic LEO satellites, and monthly gauge analyses [
19]. The LEO satellites provide PMW rainfall estimates that are propagated forwards and backwards in time using estimated rainfall motion vectors. GEO IR estimates are added using the PERSIANN-CCS algorithm to fill in the gaps between LEO PMW estimates. The Early Run of the algorithm only has forward propagation, whereas the Late Run has both forward and backward propagation, allowing for interpolation. Furthermore, the longer latency time allows for lagging data transmissions that might have been missed in the Early Run to be incorporated in the Late Run. The gauge analyses from the Global Precipitation Climatology Centre (GPCC) are used to regionalize and correct biases in the final stage of the algorithm. Other input data are the GPM Combined Radar-Radiometer (CORRA) rainfall estimates, Modern-Era Retrospective Analysis for Research and Applications Version 2 (MERRA-2), and Goddard Earth Observing System model (GEOS) Forward Processing (FP) precipitable water vapor data [
19].
2.4. Study Area: North of Ghana
Northern Ghana, defined here as the northern part of Ghana comprising the five northern regions and not the northern region alone, lies between latitudes 8°N and 11°N and longitudes 3°W and 0°30′E and is situated in the Savanna climatic zone. It is heavily affected by high variability in climate and hydrological fluxes, with frequent floods and droughts accompanied by high temperatures. This produces frequent crop failures or losses, outbreaks of diseases, and dislocation of human populations, with major economic repercussions [
31]. Over 70% of employment in Ghana is in near-subsistence agriculture in rural areas [
32].
Ghana’s climate is characterized by markedly seasonal rainfall with high interannual variability. Rainfall seasons are determined by the movement of the intertropical convergence zone (ITCZ), which oscillates between the north and south tropics throughout the year [
32]. The ITCZ separates a cold, moist air mass moving northward from the Atlantic and a dry, hot, and dusty air mass from the Sahara Desert. As opposed to the south of Ghana, which has two annual rainy seasons, northern Ghana has a unimodal rainfall regime, with a rainy season from March to October, when the ITCZ is in its northernmost position [
32,
33].
Figure 2 shows the average monthly temperature and rainfall in Bawku, in the upper-east region of Ghana, which is representative of the climatology of the region. Over 75% of rainfall in this area is due to deep convection, most of it organized as large mesoscale convective systems [
34]. Intense and short-lived events as a result of deep convection characterize the diurnal rainfall variation in this region [
35]. For example, over 80% of rain events present in our development dataset last less than 3 h.
2.5. Data Preprocessing
Figure 3 shows the flow diagram of the overall methodology presented in this research, with special detail given to the data preprocessing stage.
Hourly TAHMO and half-hourly IMERG data were accumulated over 3 h intervals, while PERSIANN-CCS data were obtained directly at a 3 h resolution. All three products were classified as rain/no-rain using a 1 mm/3 h threshold.
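As a minimal sketch of this step, the accumulation and thresholding could look as follows. Several points are assumptions rather than details from the text: the input values are taken to be rainfall depths in mm per time step (not rates), a window containing a missing value is marked as missing, and a 3 h total exactly equal to 1 mm is counted as rain. The function names are hypothetical.

```python
from typing import List, Optional

RAIN_THRESHOLD_MM = 1.0  # 1 mm per 3 h, as stated in the text


def accumulate(
    values_mm: List[Optional[float]], steps_per_window: int
) -> List[Optional[float]]:
    """Sum consecutive, non-overlapping windows of `steps_per_window` values.

    A window with any missing value (None) yields None (assumption).
    """
    totals = []
    last_start = len(values_mm) - steps_per_window
    for start in range(0, last_start + 1, steps_per_window):
        window = values_mm[start:start + steps_per_window]
        if any(v is None for v in window):
            totals.append(None)
        else:
            totals.append(sum(window))
    return totals


def classify(totals: List[Optional[float]]) -> List[Optional[bool]]:
    """True = rain, False = no rain, None = missing window."""
    return [None if t is None else t >= RAIN_THRESHOLD_MM for t in totals]


# Hourly gauge data: 3 steps per 3 h window (illustrative values).
tahmo_hourly = [0.0, 0.5, 0.75, 0.0, 0.0, 0.25]
# Half-hourly satellite data: 6 steps per 3 h window (illustrative values).
imerg_half_hourly = [0.0] * 6 + [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]

tahmo_labels = classify(accumulate(tahmo_hourly, 3))
imerg_labels = classify(accumulate(imerg_half_hourly, 6))
```

Products delivered as rain rates (mm/h) would first need to be converted to depths per time step before being summed.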
Data scarcity poses an obstacle to DL-based rainfall estimation or prediction, in that the existence of densely gridded data to use as training data during model development is a pre-requisite for most existing approaches [
16,
23,
24,
25]. Our study area has a sparse rain gauge distribution, with distances between stations too large to allow reliable interpolation, especially considering the highly localized rainfall patterns in West Africa. We employ a methodology to overcome this obstacle by using point-based instead of gridded data as the output of the model. RainRunner utilizes an image-to-point approach: the model is trained only with point-based rainfall data, corresponding to the center of the input image. Some studies [
37,
38] have used a similar methodology, cropping satellite data around rain gauge measurements used as target data before being input to a CNN in a DL model to estimate rainfall. However, both approaches use other rainfall measurements present in the cropped scene—and other data sources—as input to the models. [
37] uses all rain gauges present in the scene, and [
38] uses TRMM 3B42 precipitation data. In our case, MSG TIR images are the only model input, and they were cropped to create images of 32 pixels × 32 pixels (i.e., an area of approx. 96 km × 96 km) centered on each TAHMO station, as shown in
Figure 4. Images were cropped in a way to ensure that the corresponding station fell in a “center square”, defined as a square with sides of length equal to the pixel size and with center on the geometrical center of the image. In this way, the model’s spatial resolution is the pixel size, i.e., approx. 3 km.
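The cropping step can be sketched in a simplified form. The sketch assumes the station has already been mapped to a (row, column) pixel index in the full image, and it places that pixel at index 16 of the 32-pixel window; the sub-pixel "center square" criterion described above and the geolocation step are not reproduced. The function name `crop_around_station` is hypothetical.

```python
from typing import List


def crop_around_station(
    image: List[List[float]], station_row: int, station_col: int, size: int = 32
) -> List[List[float]]:
    """Crop a size x size window so the station pixel sits at (size//2, size//2).

    Raises ValueError if the window would extend beyond the image.
    """
    half = size // 2
    top = station_row - half
    left = station_col - half
    n_rows, n_cols = len(image), len(image[0])
    if top < 0 or left < 0 or top + size > n_rows or left + size > n_cols:
        raise ValueError("crop window falls outside the image")
    return [row[left:left + size] for row in image[top:top + size]]


# Synthetic 100 x 100 "image" whose values encode their own position.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
patch = crop_around_station(image, station_row=50, station_col=60)
```

With an even window size, no single pixel is exactly central, so the choice of index 16 is one of several reasonable conventions.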
Cropped MSG TIR images were grouped in 3 h sequences (i.e., groups of 12 images). Sequences were then classified as rain/no-rain according to the corresponding TAHMO data. Incomplete sequences due to gaps in TAHMO or MSG data were discarded. We chose a 3 h temporal resolution according to the short-lived rainfall events characteristic of this area. We expect this resolution to be able to capture the daily rainfall dynamics and deem a finer temporal resolution not needed for the end goal of our research, which is to improve the quality of rainfall information for agricultural applications.
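The grouping and labeling described above might be sketched as follows. It is assumed here that images are indexed by their acquisition time, that gauge totals are indexed by the start time of each 3 h window, and that windows do not overlap; the exact temporal alignment used in the study is not specified in this section. All names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

STEP = timedelta(minutes=15)   # SEVIRI temporal resolution
WINDOW = timedelta(hours=3)    # sequence length
FRAMES_PER_SEQUENCE = 12       # 3 h / 15 min


def build_sequences(
    images_by_time: Dict[datetime, object],
    rain_3h_by_start: Dict[datetime, float],
    start: datetime,
    end: datetime,
    threshold_mm: float = 1.0,
) -> List[Tuple[datetime, List[object], int]]:
    """Return (window_start, frames, label) for every complete 3 h window.

    Windows with a missing image or a missing gauge total are discarded.
    """
    samples = []
    t = start
    while t + WINDOW <= end:
        frames = [images_by_time.get(t + k * STEP) for k in range(FRAMES_PER_SEQUENCE)]
        total = rain_3h_by_start.get(t)
        if total is not None and all(f is not None for f in frames):
            samples.append((t, frames, int(total >= threshold_mm)))
        t += WINDOW
    return samples


# Six hours of placeholder images, with one image missing in the second window.
t0 = datetime(2020, 7, 1, 0, 0)
images = {t0 + k * STEP: f"img{k}" for k in range(24)}
del images[t0 + 13 * STEP]
rain = {t0: 2.5, t0 + WINDOW: 0.0}
samples = build_sequences(images, rain, t0, t0 + 2 * WINDOW)
```

Only the first window survives in this example, because the second one lacks an image.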
To prepare the model development datasets, first we resampled the training dataset to deal with the data imbalance characteristic of rainfall binary classification [
39]. We employed a 4:1 dry/rain ratio. Validation and test datasets were created with the same dry/rain ratio as the full 2020 data, i.e., 28.2:1, in order to be representative of reality. The data distribution is presented in
Table 3.
We assigned all 2018 and 2019 data to the training dataset. Out of the 2020 data, we randomly selected two sets of 250 rain sequences for the validation and test datasets; the rest were assigned to the training dataset.
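One way to obtain the 4:1 dry/rain ratio in the training dataset is random undersampling of the dry class, sketched below. The text does not state which resampling technique was used, so undersampling is an assumption made for illustration; the function name and the fixed seed are likewise hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, int]  # (sequence identifier, label), label 1 = rain, 0 = dry


def undersample_dry(
    samples: List[Sample], dry_per_rain: int = 4, seed: int = 0
) -> List[Sample]:
    """Keep all rain samples and a random subset of dry samples (dry_per_rain : 1)."""
    rain = [s for s in samples if s[1] == 1]
    dry = [s for s in samples if s[1] == 0]
    n_dry = min(len(dry), dry_per_rain * len(rain))
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = rain + rng.sample(dry, n_dry)
    rng.shuffle(balanced)
    return balanced


# Toy dataset: 10 rain and 290 dry sequences (a 29:1 imbalance).
samples = [(f"seq{i}", 1 if i < 10 else 0) for i in range(300)]
balanced = undersample_dry(samples)
```

The validation and test datasets would be left at their natural ratio, as described above, so that the reported metrics reflect realistic conditions.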
After model development, we evaluated model performance on the test dataset by comparison with IMERG and PERSIANN-CCS. However, IMERG Final Run and PERSIANN-CCS had data gaps in the validation and test datasets. IMERG Final Run had 241 gaps in the validation dataset and 229 in the test dataset, while PERSIANN-CCS only had 110 gaps in the validation dataset. All of these gaps corresponded to sequences recorded as dry by the TAHMO stations, so removing them did not further reduce the minority (rain) class. For a fair comparison, these sequences were removed during result evaluation.
2.6. Deep Learning Model
We framed rain/no-rain detection as a supervised binary classification problem. We developed two model architectures: RainRunner, based only on convolutional neural networks (CNN), and RainRunner-R, which incorporates a convolutional long short-term memory (ConvLSTM) architecture. Both models have the same input (sequences of 12 TIR images taken every 15 min) and output (rain/no-rain classification).
CNNs are deep neural networks with convolutional layers that exploit symmetries in gridded data by recognizing similar patterns and features, achieving efficient processing and generalization by reducing the number of learned parameters [
40]. CNNs treat pixels as connected to their neighborhood instead of independent from each other through convolution and pooling operations [
41]. This enables them to account for spatial correlations in rainfall. They are more computationally efficient than multi-layer perceptrons (MLPs) [
18]. LSTM architectures are improved recurrent neural networks that incorporate a sequential inductive bias by means of memory cells and gates that selectively maintain and propagate important information across timesteps. This allows them to effectively process sequential data such as time series and natural language texts [
42]. ConvLSTM [
43] is an extension of LSTM to 2D sequences, i.e., images changing in time, instead of point-based time series. As such, they are suitable techniques to capture the spatio-temporal evolution of satellite gridded data.
We selected these relatively simple methods because of the nature of our problem. State-of-the-art (SOTA) methods for image classification and sequential processing based on transformers require tens of millions to a billion parameters to achieve top performances on benchmark datasets [
44]. Training is performed using millions to billions (e.g., pre-training) of images and exceptional computing power [
45]. These settings are very different from the case study we are considering, where we are dealing with fewer than ten thousand images. As such, it is beyond the scope of our paper to test very large SOTA models. Instead, we focus on more basic DL models capable of dealing with limited data to explore the overall suitability of the DL approach for this context. We employed ConvLSTM to test whether using a more suitable inductive bias to process our sequences would yield better results.
We differentiate two building blocks: CNN and ConvLSTM blocks. A CNN block comprises multiple convolution and pooling layers. The output of a CNN block is a feature map with dimensions greater than or equal to 8 × 8. The ConvLSTM block consists of ConvLSTM and batch normalization layers, with the output of the block being a 2D tensor. Besides these building blocks, we also used MLPs with a dropout layer between the hidden and output layers.
2.6.1. RainRunner Architecture
Upon receiving an image sequence, RainRunner processes each image in parallel through a CNN block and an MLP to produce one bounded real value (0,1) from each one of them. Then, these outputs are concatenated into a fully connected layer and passed through a second MLP to classify the 3 h input sequence as rain/no-rain.
Figure 5 shows a schematic block diagram of this architecture.
2.6.2. RainRunner-R Architecture
RainRunner-R processes all the images as a sequence through a ConvLSTM block. The output of this block is a 2D tensor that is then passed through a CNN block and an MLP to produce a rain/no-rain prediction. This architecture is shown in
Figure 6. We investigated the effect of bidirectionality on the ConvLSTM architecture. Bidirectional recurrent neural networks allow training a model using both time directions (i.e., past to future, future to past) of the input when a whole sequence is available. While they cannot be used for forecasting purposes, they are particularly suitable for sequence recognition tasks such as ours [
46].
2.7. Training and Hyperparameter Search
To account for data imbalance, we trained the models to minimize a weighted binary cross-entropy loss, where a weight of 0.8 was given to the rain class and 0.2 to the dry class (Equation (1)).
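\[
L = -\frac{1}{N}\sum_{i=1}^{N}\Big[0.8\,y_i \log p(y_i) + 0.2\,(1-y_i)\log\big(1-p(y_i)\big)\Big] \qquad (1)
\]
where N is the size of the dataset, y is the label/true value (i.e., 0 for no-rain and 1 for rain), and p(y) is the prediction probability (i.e., the estimated probability of each sequence i containing rain).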
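The weighted binary cross-entropy described above can be written out in a few lines. This is a sketch under stated assumptions: the loss is averaged over the N samples, and probabilities are clipped by a small `eps` to avoid taking the logarithm of zero; neither detail is confirmed by the text. The function name `weighted_bce` is hypothetical.

```python
import math
from typing import Sequence


def weighted_bce(
    labels: Sequence[int],
    probs: Sequence[float],
    w_rain: float = 0.8,
    w_dry: float = 0.2,
    eps: float = 1e-7,
) -> float:
    """Weighted binary cross-entropy averaged over all samples.

    labels: 1 for rain, 0 for no-rain.
    probs:  predicted probability of rain for each sequence.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip for numerical safety (assumption)
        total += w_rain * y * math.log(p) + w_dry * (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# A missed rain sequence is penalized four times as much as an equally
# confident false alarm.
missed_rain = weighted_bce([1], [0.1])
false_alarm = weighted_bce([0], [0.9])
```

The 0.8/0.2 weighting makes missed rain sequences costlier than false alarms, which counteracts the dominance of dry sequences in the data.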
We trained multiple hyperparameter combinations and chose the best models based on a trade-off between the validation F1-score and the number of trainable parameters. We ran these models ten times and selected the overall best model for both RainRunner and RainRunner-R based on the validation F1-score. Using F1-score as a performance metric helps deal with the rain/dry data imbalance.
2.8. Performance Metrics and Misclassification Analysis
We used performance metrics commonly used in the meteorology field as well as the F1-score, a metric commonly used for imbalanced problems in DL, all extracted from the contingency table (
Figure 7). Accuracy (Equation (2)) represents the number of correctly classified data samples out of all data samples; probability of detection (POD, Equation (3)) measures the ability of the model to correctly detect rain sequences; success rate and false alarm ratio (SR and FAR, Equation (4)) are complementary and represent the certainty with which rain sequences are detected; frequency bias (FBias, Equation (5)) represents the degree of correspondence between rain predictions and observation; finally, F1-score and critical success index (F1-score and CSI, Equations (6) and (7)) evaluate at the same time SR and POD.
POD, SR, F1-score, and CSI can vary from 0 to 1, with 1 being the optimal value. FBias can range from 0 to ∞, with the optimal value being 1. If FBias is below 1, the events are under-forecasted; if it is greater than 1, they are over-forecasted.
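For reference, the metrics above can be computed from the four contingency table counts as sketched below. The formulas are the standard forecast verification definitions and are assumed to correspond to Equations (2) to (7); the function name, the dictionary keys, and the example counts are illustrative only.

```python
from typing import Dict


def verification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard forecast verification metrics from contingency table counts.

    tp: rain predicted and observed     fp: rain predicted, dry observed
    fn: dry predicted, rain observed    tn: dry predicted and observed
    Assumes tp + fp > 0 and tp + fn > 0 (no zero denominators).
    """
    pod = tp / (tp + fn)              # probability of detection
    sr = tp / (tp + fp)               # success rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "POD": pod,
        "SR": sr,
        "FAR": fp / (tp + fp),        # false alarm ratio, equals 1 - SR
        "FBias": (tp + fp) / (tp + fn),
        "F1": 2 * sr * pod / (sr + pod),
        "CSI": tp / (tp + fp + fn),   # critical success index
    }


# Made-up counts for an over-predicting classifier on imbalanced data.
metrics = verification_metrics(tp=40, fp=60, fn=10, tn=890)
```

In this made-up example the accuracy is 0.93 although the F1-score is only about 0.53, which illustrates why accuracy alone is a weak indicator for imbalanced rain/no-rain data; the FBias of 2 indicates over-forecasting.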
We present results in three ways: as contingency tables, numerically as the forecast verification metrics, and visually in a Roebber diagram [
47] or performance diagram.
To assess the generalization ability of the models in the context of the highly localized and seasonal rainfall in northern Ghana, we analyzed their performance depending on factors such as location and time of the year (
Table 4) in terms of the distribution of misclassified sequences.
We compared the performance of RainRunner to that of the benchmark products by computing the forecast verification metrics and performing a misclassification analysis of all products. To assess the difference in performance of the three IMERG products (Early, Late, and Final Run), we included all of them in the forecast verification metrics computation. For the misclassification analysis, we used IMERG Final Run, as the highest-performing satellite rainfall product. We conducted the performance evaluation based on the results of the test dataset. For reference, we also include the forecast verification metrics of all products on the validation dataset.
4. Discussion
Our findings show that DL models for rainfall binary classification trained with a small local dataset of strictly TIR data compare well to state-of-the-art global products. These results suggest three insights: (1) TIR data are strongly related to rainfall in this region; (2) DL can extract relevant features linking TIR images with rainfall; and (3) locally developing a DL model enables it to capture the characteristics of local processes, in this case, rainfall occurrence, better than some globally trained models.
The strong relationship between brightness temperature (Tb) and rainfall has been extensively studied and used for satellite rainfall retrieval. This relationship is particularly relevant in the Sahel, where around 75% of surface rainfall is due to deep convection that involves cold cloud tops, observable in TIR data [
34]. RainRunner surpasses PERSIANN-CCS, which uses machine learning to link TIR data to rainfall through manually extracted features related to cloud properties. This shows that DL methods are able to extract relevant features from data and model natural processes better than expert-based models that rely on manual feature extraction. In particular, training the model locally allows it to reproduce regional rainfall patterns more efficiently.
As seen in the Roebber performance diagram (
Figure 10), all models over-predict rainfall with an FBias greater than 2. It is known that TIR-based methods over-predict rainfall because the size of large convection systems is much larger than the surface rainfall area underneath [
34]. A further explanation of this over-prediction lies in the characteristic West African rainfall processes. Particularly, the presence of rain-bearing clouds does not necessarily mean rainfall on the ground. Sometimes rainfall does not reach the ground due to the higher concentration of aerosols and associated smaller drops, higher land surface temperature, and drier atmosphere compared to other regions [
22]. Therefore, adding other relevant sources of information such as aerosols, land surface temperature, or water vapor data might improve the performance of the models. Furthermore, virga—precipitation evaporating before it reaches the ground—accounts for 15% of all precipitation profiles in the northern African Savanna (8°–12°N) [
35]. Virga has been found to account for 50% of false PMW precipitation results in arid regions [
50] and could be a cause for IMERG’s rainfall over-prediction. Furthermore, the presence of other MW radiation scatterers, such as dry sand, also results in satellite PMW retrievals over-estimating rainfall [
35].
Despite the proven efficiency of DL methods in reproducing physical processes, data scarcity poses a challenge to their use. To overcome this, we have used an image-to-point methodology that only needs point-based rainfall measurements. Although other studies have applied similar methodologies [
37,
38], they required additional rainfall information—additional rain gauges in the study region or a gridded satellite product—as model inputs. Compared to these, our approach has the advantage that it does not require any further rainfall information.
Of the two DL architectures we evaluated, results suggest that the temporal inductive bias introduced by the ConvLSTM architecture—processing each image in the 12-image sequence one after the other—does not improve model performance, although it results in a model with fewer trainable parameters (21,033 against 120,125 for RainRunner). The hyperparameter search in model design produced a wide range of performances for both models, which is probably explained by the relatively small training dataset. To investigate the robustness of the models, further research on a range of small to larger datasets would be needed. It is striking that our DL models based on TIR data only, developed with a small dataset and simple model architectures, achieve a performance close to that of IMERG. The high learning efficiency of the DL model, when trained with local data, is promising for the independent application of such models in data-scarce areas such as Sub-Saharan Africa. Additionally, it might be interesting to investigate combining the DL model with existing products such as IMERG, where the DL approach can offer complementary insights that help improve performance. For example, substituting the PERSIANN-CCS rainfall estimation scheme from TIR data within IMERG with our better-performing approach might improve IMERG’s estimations.
With most agriculture in West Africa being rainfed, access to accurate rainfall information is necessary for agricultural productivity. Satellite rainfall products such as the one developed in this study, which can be applied after training to areas with no ground observations, can play an essential role in overcoming the data scarcity challenge and contributing towards food and economic security.
5. Conclusions
In this paper, we have developed two DL models based on the CNN and ConvLSTM architectures. The output of our models is a rain/no-rain binary classification of 3 h sequences. We show that our models compare well against existing products despite being considerably simpler, developed with a small training dataset—observations from 8 stations over 2.5 years, with 20.4% data gaps—and using TIR data alone. Specifically, our models consistently outperform PERSIANN-CCS for rain/no-rain detection at a sub-daily timescale. While IMERG is the overall best performer, the DL models perform better than IMERG in the second half of the rainy season despite their simplicity (i.e., up to 120 k parameters). Compared to our models that follow a black-box approach from raw MSG TIR data, IMERG uses data from multiple LEO and GEO satellites, both TIR and PMW, combined with reanalysis and rain gauge data. The high performance that the models are able to reach despite the important challenge of data scarcity shows their high efficiency and, ultimately, the potential of DL to model rainfall in regions with low data availability. We overcome the challenge of data scarcity to develop DL models with an image-to-point methodology that only needs point data instead of densely gridded rainfall information from the ground.
The DL model based on CNN alone achieved somewhat higher performance than the one combining a CNN and a ConvLSTM. The temporal structure information brought by the ConvLSTM architecture enables the model to achieve performance similar to that of the CNN-based model with only 17.5% of the trainable parameters, but at the expense of a slower training process.
We suggest that regionally training a DL rainfall model can result in better performances than global models, especially in areas with complex, highly region-specific meteorological characteristics, such as the Savanna region of West Africa.
Further work includes the addition of other EO data as inputs to the model. In particular, because of the drier atmosphere characteristic of our study region, the SEVIRI water vapor channel is expected to improve the performance of satellite rainfall estimation. Aerosol data from the Sentinel-5P satellite will also be added. We expect that the incorporation of these two data products will capture the atmospheric conditions that are the potential causes of rainfall over-detection in West Africa. Furthermore, because the aim of our study was to demonstrate the potential of deep learning methods for providing rainfall information in data-scarce areas, finding the optimal model through a thorough hyperparameter search was beyond our scope. However, we believe such a search would improve model performance, and we strongly encourage it. At the same time, we recommend expanding the development dataset to cover a longer period and/or a wider region in West Africa, which would allow for the use of more advanced architectures such as ConvNeXt [
51] and eventually enable direct rainfall estimation. We expect that the fully data-driven approach can give useful insights into rain processes in the West African savanna.