Case Report

Infilling Monthly Rain Gauge Data Gaps with Satellite Estimates for ASAL of Kenya

Department of Disaster Management & Sustainable Development, Masinde Muliro University of Science and Technology, P.O. Box 190, Kakamega 50100, Kenya
*
Author to whom correspondence should be addressed.
Hydrology 2016, 3(4), 40; https://doi.org/10.3390/hydrology3040040
Submission received: 24 April 2016 / Revised: 1 August 2016 / Accepted: 24 October 2016 / Published: 22 November 2016

Abstract

Design and operation of water resources management systems in sub-Saharan Africa suffer from inadequate observation data. Long, uninterrupted time series of data are often not available for water resource planning. Incomplete datasets with gaps are a challenge for users of the data. Inadequate data compromise the results of analyses, leading to wrong inferences and conclusions in scientific assessments and research. Infilling of missing sections of data is necessary prior to the practical use of hydrometeorological time series. This paper proposes the use of Tropical Rainfall Measuring Mission (TRMM) satellite data as a viable alternative source of infill for missing rain gauge records. The least squares regression method, using satellite-based estimates of rainfall, was tested to fill in the missing data for 153 data points at nine rain gauge stations in the Machakos, Makueni and Kitui region of Kenya. Results suggest that the satellite rainfall estimates can be used as an alternative data source for rainfall series where the data gaps are large. The infilled data series were used in the development of monitoring, forecasting and drought early warning for Arid and Semi-Arid Lands (ASAL) in Kenya.

1. Introduction

As in other Arid and Semi-Arid Lands (ASAL) of Kenya, climatic variations have been experienced over the years in the south-eastern lowlands of Kitui, Machakos and Makueni counties. The typical approach to gaining an understanding of climate variability starts with the acquisition of historical data. For rainfall, historical data provide necessary information about accumulated amounts in both time and space and form the basis for fitting and testing stochastic data-based distribution models. When historical data are unavailable in a region, or the available data are inaccurate or incomplete in a spatial or temporal sense, geophysical models can be used to ‘fill in’ the missing values [1]. According to Collischonn et al. [1], areal rainfall estimated from rain gauges exhibits a great deal of uncertainty where the rain gauge network is sparse. This problem is related to differences in the distribution of rain gauges across the region, which also affect the quality of the data. This paper suggests a method of improving rain gauge-based rainfall datasets by infilling gaps using remotely sensed rainfall estimates.
Generally, in operation and model validation of meteorological data, surface observations are considered to be “the truth” [2]. Analysis of climatic systems requires data that form a complete and homogeneous series to enable generalised deduction and inference from results [3]. This is especially important for approaches that use statistical techniques based on the estimation of covariance matrices, e.g., principal component, cluster or discriminant analysis, the canonical correlation method, and the method of multiple linear regression [4]. In Africa in general, and Kenya in particular, incomplete datasets of climatic variables are frequent, with gaps appearing in the measurement series [5]. The existence of missing values in the data series affects the estimation of variables from the series [6] and the output of multivariate analysis techniques [7].
Hydrometeorological data analyses such as drought assessment and forecasting benefit from a complete dataset [8]. A possible way of minimizing the influence of missing data is to rebuild the series, filling in the gaps with estimated values. Various methods for the estimation of missing values in climatological series exist. Bareither et al. [2] evaluated the influence of replacing missing meteorological data with estimates on hydrologic predictions for a water balance model in a semiarid climate. According to Bareither et al. [2], the surrogate data technique yields modest predictions of annual water percolation that are statistically similar to percolation predicted using actual data. Aly et al. [9] evaluated deterministic and stochastic interpolation methods to fill gaps in daily precipitation records.
The simplest and most direct methods of data extension use only the data of the series that is being filled. The arithmetic mean method substitutes missing values with the mean value of the series. Thus, although the average value of the series is not altered, its variance is reduced, which renders the method inefficient for highly variable climatic quantities such as precipitation [10]. Other methods include the linear interpolation method and the first differences method, both of which are particularly appropriate for small temporal scales and variables with high autocorrelation [10].
Methodologies which use information from sites other than the station with missing data (the target station) have also been developed. These methods take into account the spatial variability of the measured variable, ignoring the temporal information in long time series [11]. Such methods include the closest station method [12], the simple arithmetic averaging method, the inverse distance method, the single best estimator method, and the normal ratio method. These methods generally underestimate the high extremes and overestimate the low extremes [13].
Another important set of approaches for gap filling in climatological series is regression methods. These methods are based on relationship techniques of the temporal series of the variable under consideration [14]. They take into account the station’s ‘history’ and its climatic characteristics without consideration of the spatial dependence of the variables. Uncertainty in climate parameters, however, originates from their stochastic nature [15], and its magnitude depends on other environmental factors intrinsic to the recorded value [16]. Spatial characteristics of the uncertainty enter the records through the procedure for station selection [17] when stations other than the target station are considered. The procedures followed for the selection of neighbour stations in the regressive methods utilize relative weighting, enabling differentiation of the analysis from one station to another. The regression methods have the advantage of robustness when dealing with extreme events or local effects [18]. This paper utilizes the least squares regression method for the estimation of missing data in a monthly precipitation dataset, taking into account the measurement uncertainty. The paper addresses the question of whether remote sensing rainfall estimates over a region can be used for infilling missing data in the time series of rain gauge-based data. The Tropical Rainfall Measuring Mission (TRMM) satellite dataset was selected on the basis of its good prior performance in estimating rainfall in East Africa in particular [19,20] and in many parts of the tropics in general [21].
Errors occurring in rain gauge measurements are fairly well understood [22], and so, except for their limited coverage, rain gauges are ideal for checking satellite estimates [22]. The use of satellite estimates to fill rain gauge records, on the other hand, introduces errors due to the space-time differences between the two measurement methods. While rain gauge measurements are point estimates (tens of centimetres in diameter), satellite measurements attempt to measure rain amounts over areas many kilometres in diameter around a point (the rain gauge position). Bell and Kundu [22] investigated the “noisiness” in comparisons of satellite and rain gauge estimates given the very different observational characteristics of the two. Bell and Kundu [22] observed that satellite measurements catch glimpses of large areas at infrequent intervals, whereas rain gauges record what happens in small areas continuously. Panet et al. [23] noted that the presence of non-negligible errors in satellite rainfall estimation presents a hurdle to fully implementing the product for a wide range of hydrologic applications. Gebregiorgis and Hossain [24], however, indicated that the quantitative picture of satellite precipitation error over ungauged regions can be effectively discerned. This paper takes into consideration the space–time scale difference between rainfall estimates based on point rain gauge measurements and satellite-based estimates.
Rain gauge data series in Machakos, Makueni and Kitui counties of Kenya for the period 2001–2011 have long-running data gaps of over two years. These data gaps, however, form less than 5% of the total length of the existing data for most of the rain gauge stations in the region. The data series are therefore worth considering for infilling in view of their importance in connecting the historical rainfall analysis with the current rainfall situation [25]. The purpose of this paper is to propose the use of TRMM data as a viable alternative source of infill for missing rain gauge records. The method of infill utilizes linear regression relationships and makes use of the records of a reference station which cover the period of interest. The paper demonstrates the use of satellite rainfall estimate data for extending rain gauge records by infilling gaps in a rainfall data series. The method adapts the MOVE.2 approach [26] in a variation of linear regression equations [27], which ensures preservation of the statistical parameters (mean, variance and extreme value statistics) of the infilled data series. The gamma distribution with shape parameter α and scale parameter β is often assumed to be suitable for distributions of precipitation events [28]. This distribution has been shown to be effective for the analysis of precipitation data in previous studies [29]. The gamma distribution was used in this study to confirm that the infilled data did not alter the parameters of the original series.
In this study, an attempt was made to infill missing monthly rainfall data for 153 missing data points at 9 rain gauge stations in Machakos, Makueni and Kitui counties of Kenya. This paper is organized as follows. This introduction gives the background, the problem, the objectives and the rationale of the study. The materials and methods used to address the research question, the formulation of the proposed solution, and technical details such as the approaches for estimating the infilling model are detailed in Section 2. The results of the infilling process, the evaluation of the model’s performance in infilling datasets and the related statistical tests are discussed in Section 3, followed by a summary and concluding remarks in Section 4.
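The variance-reducing effect of mean substitution can be seen in a small numerical sketch. The rainfall values below are invented for illustration only and the function name `mean_fill` is hypothetical; it is not part of the study.

```python
from statistics import mean, variance


def mean_fill(series):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in series]


# Invented monthly totals in mm; None marks a missing month.
rain = [12.0, 85.0, None, 140.0, 3.0, None, 60.0]
observed = [v for v in rain if v is not None]
filled = mean_fill(rain)

print(mean(observed), mean(filled))          # the mean is unchanged
print(variance(observed), variance(filled))  # the sample variance shrinks
```

Each substituted value adds nothing to the sum of squared deviations while increasing the sample size, so the sample variance can only decrease.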

2. Materials and Methods

This study was carried out in Machakos, Makueni and Kitui counties of Kenya. The study area is located in the arid and semi-arid regions of the country. The area lies between latitudes 00°03′ and 3°00′ and longitudes 36°45′ and 39°12′ (Figure 1). The area receives rain twice a year, with the main rainy season occurring from October to December and the lesser rainy season from March to May. The annual rainfall ranges from 500 mm in the low moorland areas to 1500 mm on the sub-humid hilltops. The seasonal rainfall is highly variable, erratic and unreliable.

2.1. Data

The data used in this study are secondary data, comprising rainfall measured by rain gauge instruments in the study area and satellite-based rainfall estimates. The rain gauge data series comprised monthly records for the period 1961–2011 for the different stations. Only rain gauges with data gaps were considered in this study. Table 1 below shows the length of the rain gauge data series used in the study. TRMM is a joint mission of the U.S. National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA), designed to monitor and study tropical rainfall [31]. TRMM data cover latitudes 50° S to 50° N at a spatial resolution of 0.25° × 0.25°. The TRMM rainfall estimates are more reliable than those obtained from other satellites [32]. The TRMM data series used in this study comprised records for the period 1998–2011.

2.2. Data Analysis Approach

Two data analysis techniques were used to achieve the objectives of this study: correlation analysis and the least squares regression method. Data from the closest TRMM grid point were compared against each respective rain gauge.
TRMM data from the chosen grid point were compared with the corresponding observed rain gauge data. Correlation analysis was used to compare the rain gauge and TRMM data fields and to confirm the relationship between the two data series.
Table 1 shows the locations of the rain gauges matched with the corresponding grid point at which monthly TRMM data were extracted, the estimated distance between the rain gauge location and the grid point, the number of data points of missing record and the period of missing rain gauge data. The rain gauge datasets had large continuous gaps of missing data for the period 2008–2011.
The least squares regression method was used to translate the rainfall estimates provided by the TRMM data series into rainfall values for infilling into the rain gauge series. The viability of the TRMM rainfall data for infilling rain gauge data gaps was first evaluated through comparison with rain gauge data for the periods in which the rain gauge datasets were complete. Descriptive statistics of rainfall were calculated for all the TRMM cells and corresponding rain gauge stations and compared at monthly intervals. Scatter plots of the rain gauge and TRMM data were produced to confirm the suitability of the TRMM data for infilling the rain gauge data.
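The matching and comparison steps described above can be sketched as follows. This is an illustrative sketch only: it assumes that the 0.25° cell edges fall on multiples of 0.25° (so that cell centres lie at offsets of 0.125°), which should be checked against the documentation of the TRMM product actually used, and the function names and coordinates are hypothetical.

```python
import math


def nearest_cell_centre(lat, lon, res=0.25):
    """Centre of the res x res cell containing (lat, lon).

    Assumption: cell edges fall on integer multiples of `res`.
    """
    def snap(value):
        return (math.floor(value / res) + 0.5) * res
    return snap(lat), snap(lon)


def pearson_r(x, y):
    """Product moment correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gauge position (degrees); negative latitude is south.
print(nearest_cell_centre(-1.37, 38.01))
```

The correlation would be computed only over months in which both the gauge and the TRMM series have values, in line with the comparison over complete periods described in the text.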

2.3. Methods of Infilling Rain-Gauge Data

A variation of the linear regression method as employed by Krug et al. [33] was used in this study. The method considers that if records for a normal climatological period of 30 years are incomplete for a desired station, then the records are extended by correlation with a nearby station using Equation (1).
$$y_1 = y_s + b\,(x_1 - x_s) \qquad (1)$$
where
  • y1 = the estimated value for the missing gap rain gauge data for the respective month.
  • ys = the mean value for the respective month for TRMM dataset for the period of record 14 years (1998–2011).
  • b = the slope of the regression line between the concurrent (1998–2011), mean value at the rain gauge station and TRMM.
  • x1 = the 30-year (climatological standard 1971–2000) mean value for the monthly rain gauge data.
  • xs = the mean value for the rain gauge station for the concurrent period with the TRMM.
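Equation (1) is a one-line calculation once the slope b is known. The sketch below assumes that b is the ordinary least squares slope of one series on the other over the concurrent period; which series plays the role of the predictor is an assumption here, and the function names and numbers are illustrative only.

```python
def least_squares_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def equation_1(ys, b, x1, xs):
    """y1 = ys + b * (x1 - xs), with the symbols as defined in the list above."""
    return ys + b * (x1 - xs)


# Invented concurrent monthly means (mm), for illustration only.
b = least_squares_slope([40.0, 55.0, 70.0], [35.0, 50.0, 65.0])
print(equation_1(ys=50.0, b=b, x1=60.0, xs=55.0))
```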
TRMM datasets were used to estimate monthly values of rain gauge records to fill the gaps. The TRMM data at the respective grid point were used as the reference for the corresponding rain gauge as indicated in Table 1. In this approach, least squares regression was used in an extension method following a linear form (y = bx + c), except that the coefficient “b” and constant “c” were set not to minimize squared errors but to maintain the sample mean and variance, according to Hirsch [26]. Two such linear equations that preserve the sample mean and variance are given in Hirsch [26]. These equations are labelled “Maintenance of Variance Extension, Type 1” (MOVE.1) and “Type 2” (MOVE.2).
Hirsch [34] evaluated MOVE.1 for streamflow data extension, and Parrett and Johnson [35] utilized MOVE.1 for extending streamflow gauging data in eastern Montana over a fifty-year period of record. Alley and Burns [36] and Hirsch [26] evaluated MOVE.1 and MOVE.2 for streamflow data extension against least squares linear regression and linear regression plus white noise, based on the criteria of sample mean and variance maintenance. These works suggest that MOVE.2 is the most effective infilling method for preserving the mean, variance and extreme order statistics of the baseline data. As such, MOVE.2 was used in this study to extend the rain gauge data using TRMM monthly rainfall data.

2.3.1. MOVE.2 Method

Missing rain gauge data were infilled using the related TRMM data following the MOVE.2 model. The TRMM values are denoted x(i), where i is an index of time (month), and the rain gauge values are denoted y(i). The two sequences are represented as:
$$x(1), \ldots, x(N_1), x(N_1+1), \ldots, x(N_1+N_2)$$
$$y(1), \ldots, y(N_1)$$
where N1 is the number of concurrent rain gauge/TRMM values used to build the regression equation. When N1 is less than 10, there are not enough TRMM data to build a regression relationship (since TRMM records commence in 1998). N2 is the number of missing data gaps in the series; when N2 equals 1, there is only one gap. N1 + N2 is the total number of values in the dataset. The following notation identifies the different parts of the two series.
$$X = x(1), \ldots, x(N_1), x(N_1+1), \ldots, x(N_1+N_2) \qquad (2)$$
$$X_1 = x(1), \ldots, x(N_1) \qquad (3)$$
$$X_2 = x(N_1+1), \ldots, x(N_1+N_2) \qquad (4)$$
$$Y_1 = y(1), \ldots, y(N_1) \qquad (5)$$
It is not necessary for the two sequences to begin or end simultaneously, nor for the observations to be consecutive [26]. The MOVE.2 infilling equation, yielding an estimate ŷ(i) for the missing rain gauge data, is given by Equations (6)–(9) as derived by Hirsch [26].
$$\hat{y}(i) = \hat{m}(y) + \frac{\hat{S}(y)}{S(x)}\,[x(i) - m(x)] \qquad (6)$$
$$\hat{m}(y) = m(y_1) + \frac{N_2}{N_1+N_2}\, r\, \frac{S(y_1)}{S(x_1)}\,[m(x_2) - m(x_1)] \qquad (7)$$
$$\hat{S}^2(y) = \frac{1}{N_1+N_2-1}\left[(N_1-1)\,S^2(y_1) + (N_2-1)\,r^2\,\frac{S^2(y_1)}{S^2(x_1)}\,S^2(x_2) + (N_2-1)\,\alpha^2\,(1-r^2)\,S^2(y_1) + \frac{N_1 N_2}{N_1+N_2}\,r^2\,\frac{S^2(y_1)}{S^2(x_1)}\,\big(m(x_2)-m(x_1)\big)^2\right] \qquad (8)$$
$$\alpha^2 = \frac{N_2\,(N_1-4)\,(N_1-1)}{(N_2-1)\,(N_1-3)\,(N_1-2)} \qquad (9)$$
where m() and S²() represent the mean and variance, respectively, of the series in parentheses (x1, x2 and y1 denote the sub-series X1, X2 and Y1), and r is the product moment correlation coefficient of x1 and y1.
Thus, in the MOVE.2 method, the mean and variance estimates for x are based on all N1 + N2 observations, and the mean and variance estimates for y (i.e., m̂(y) and Ŝ²(y), respectively) are based on the historical values of y and on information transferred from the x sequence of data. α² is a coefficient defined by Equation (9).
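The MOVE.2 relationships can be sketched in a few lines of code. The following is a minimal illustration of the estimator as formulated by Hirsch [26], not the computation used in the study; the function names are hypothetical. The product (N2 − 1)α² is evaluated directly so that a single gap (N2 = 1) does not cause a division by zero, and (N2 − 1)S²(x2) is evaluated as a sum of squared deviations for the same reason.

```python
import math
from statistics import mean, variance


def move2_fit(x1, y1, x2):
    """Return (m_hat_y, s_hat_y, m_x, s_x) for the MOVE.2 infilling line.

    x1, y1: concurrent reference (e.g. TRMM) and target (gauge) values, N1 each.
    x2:     reference values for the N2 periods in which the target is missing.
    """
    n1, n2 = len(x1), len(x2)
    if len(y1) != n1 or n1 < 4 or n2 < 1:
        raise ValueError("need len(x1) == len(y1) >= 4 and len(x2) >= 1")
    m_x1, m_y1, m_x2 = mean(x1), mean(y1), mean(x2)
    v_x1, v_y1 = variance(x1), variance(y1)        # S^2(x1), S^2(y1)
    ss_x2 = sum((v - m_x2) ** 2 for v in x2)       # (N2 - 1) * S^2(x2)
    cov = sum((a - m_x1) * (b - m_y1) for a, b in zip(x1, y1)) / (n1 - 1)
    r = cov / math.sqrt(v_x1 * v_y1)
    slope_sq = r * r * v_y1 / v_x1                 # r^2 S^2(y1) / S^2(x1)
    # Estimated mean of y
    m_y = m_y1 + n2 / (n1 + n2) * r * math.sqrt(v_y1 / v_x1) * (m_x2 - m_x1)
    # (N2 - 1) * alpha^2
    a2_term = n2 * (n1 - 4) * (n1 - 1) / ((n1 - 3) * (n1 - 2))
    # Estimated variance of y
    v_y = ((n1 - 1) * v_y1
           + slope_sq * ss_x2
           + a2_term * (1 - r * r) * v_y1
           + n1 * n2 / (n1 + n2) * slope_sq * (m_x2 - m_x1) ** 2) / (n1 + n2 - 1)
    x_all = list(x1) + list(x2)
    return m_y, math.sqrt(v_y), mean(x_all), math.sqrt(variance(x_all))


def move2_estimate(x_i, params):
    """Infilled value for a period whose reference value is x_i."""
    m_y, s_y, m_x, s_x = params
    return m_y + s_y / s_x * (x_i - m_x)
```

A convenient sanity check: when the two concurrent series are identical (r = 1), the estimated mean and variance of y reduce to those of the full x record, so the infilled value equals the reference value. The minimum of four concurrent values enforced above is only the arithmetic requirement of the α² term; the text applies a stricter threshold of ten.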

2.3.2. Evaluation of Infilled Data Series

An evaluation was done to examine the extent to which the MOVE.2 method would yield correct estimates of rain gauge data to infill gaps on repeated trials. Given that the MOVE.2 approach was used in the study to infill long-running data gaps, it was necessary to create long-running gaps for the purpose of evaluating the method. The procedure involved removing 12 successive points from the rain gauge data series to create running gaps in the series. The gaps were created for each station for the respective years in the period 2007–2011. Data were removed only for months that did not have a gap in the original unfilled rain gauge series. If a month had a gap in the original rain gauge data series, that month was not included in this part of the analysis.
For each of the months with removed data, a value was computed following the MOVE.2 approach and used to fill the created gap. The removed months were thus replaced with MOVE.2 values, and the resulting series was used in the test of reliability. The removal of data was done in steps so that no more than one year (12 months) was removed at a time; after each year was evaluated, its original data were restored so that they could be used in computing the MOVE.2 values for the next removed year.
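The remove-estimate-restore procedure can be sketched as a loop over years. The sketch below is illustrative only: the data layout (dictionaries keyed by year and month) and all function names are assumptions, and the `same_month_ratio` estimator is a deliberately simple stand-in, not MOVE.2. Any estimator with the same call signature could be passed in its place.

```python
def withhold_one_year_at_a_time(gauge, reference, years, infill):
    """Collect (observed, estimated) pairs, hiding one year of gauge data at a time.

    gauge, reference: dicts {(year, month): rainfall in mm}; months absent from
    `gauge` are genuine gaps and are never evaluated.
    infill: callable (training_gauge, reference, key) -> estimate.
    """
    pairs = {}
    for year in years:
        # The original dict is never modified, so each year's data are
        # automatically available again when the next year is withheld.
        training = {k: v for k, v in gauge.items() if k[0] != year}
        for key, observed in gauge.items():
            if key[0] == year and key in reference:
                pairs[key] = (observed, infill(training, reference, key))
    return pairs


def same_month_ratio(training, reference, key):
    """Stand-in estimator (NOT MOVE.2): scale the reference value by the ratio of
    gauge to reference means for the same calendar month in the training data."""
    month = key[1]
    keys = [k for k in training if k[1] == month and k in reference]
    gauge_mean = sum(training[k] for k in keys) / len(keys)
    ref_mean = sum(reference[k] for k in keys) / len(keys)
    return reference[key] * gauge_mean / ref_mean
```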

2.3.3. Jackknife Sampling Approach

A jackknifing sampling approach was used to evaluate the effectiveness of the MOVE.2 approach to infill rain gauge data gaps. In this simulation, the actual rain gauge data was compared with corresponding TRMM-MOVE.2 values for the respective periods. Thus, in this approach, it was easy to compare performance of imputation methods. The following notation was used in a sums of squares equation:
Yijk is the rainfall value measured on a rain gauge for the ith month, jth year and kth station, and is an element in the rain gauge dataset of the station within the period 1998–2011. For the purpose of evaluation, Yijk was removed from the dataset and replaced with an estimated value Zijk.
Therefore,
$$\bar{Y} = \frac{1}{N_1}\sum_{1}^{n} Y_{ijk} \qquad (10)$$
is the average of the rainfall values measured on a rain gauge for the ith month, jth year and kth station, and
$$\bar{Y}_{.j.} = \frac{1}{N_1}\sum_{k=1}^{n} \bar{Y}_{.jk} \qquad (11)$$
tracks the average change across all the data years as the MOVE.2 model is simulated to estimate the removed rainfall values of the respective rain gauge with subsequent replacement.
Zijk is the MOVE.2 estimated value of rainfall for the ith month, jth year and kth station which was used to infill the data gap created by removing Yijk. Zijk is thus an element in the TRMM-MOVE.2 imputed rainfall series.
Likewise,
$$\bar{Z} = \frac{1}{N_1}\sum_{1}^{n} Z_{ijk} \qquad (12)$$
is the average value of the MOVE.2 estimated rainfall for the ith month, jth year and kth station which was used to infill the data gap created by removing Yijk, and
$$\bar{Z}_{.j.} = \frac{1}{N_1}\sum_{k=1}^{n} \bar{Z}_{.jk} \qquad (13)$$
tracks the average change across all the data years as the MOVE.2 values are simulated and added to the series in place of the removed rainfall values of the respective rain gauge with subsequent replacement.

2.3.4. Evaluation of Errors in the MOVE.2 Estimates

An evaluation of the suitability of the MOVE.2 values for infilling the rain gauge data gaps was done. The evaluation compared samples of the original rain gauge values with MOVE.2 values for the respective sample areas. The evaluation followed three steps. First, a visual inspection and comparison of non-parametric characteristics of the infilled series against the original series was done. The non-parametric comparison considered descriptive statistics such as the median, skewness, kurtosis, and minimum and maximum values, with due consideration of the influence of these statistics on the distribution of the datasets.
The second step in the evaluation considered the effect of random errors in the computation of the MOVE.2 values. Systematic errors inherent in the measurement of rainfall, whether in the rain gauge or in the TRMM data series, were not considered. An analysis of errors was used to indicate the difference between the computed MOVE.2 values and the original data. The error analysis considered the Mean Absolute Percent Error (MAPE), the standard error of the mean (SEM) and regression residuals. The errors were computed for the samples generated following the jackknife sampling approach in Section 2.3.3.
MAPE is the average of the absolute differences between the estimated values of MOVE.2 and actual rain gauge values, expressed as a percent of actual values.
The standard error of the mean (SEM) was estimated from the sample estimate of the population standard deviation. The SEM assumes statistical independence of the values in the sample and was computed by Equation (14).
$$SEM = \frac{s}{\sqrt{n}} \qquad (14)$$
where s is the sample standard deviation and n is the sample size.
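Both error measures are straightforward to compute. The sketch below is illustrative: the function names are hypothetical, and the handling of months with zero observed rainfall (for which a percentage error is undefined) is an assumption, since the text does not state how such months were treated.

```python
import math
from statistics import stdev


def mape(actual, estimated):
    """Mean absolute percent error of `estimated` relative to `actual`.

    Assumption: months with zero observed rainfall are skipped, because the
    percentage error is undefined for them.
    """
    terms = [abs(a - e) / abs(a) for a, e in zip(actual, estimated) if a != 0]
    return 100.0 * sum(terms) / len(terms)


def sem(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / math.sqrt(len(sample))


# Invented values (mm) for illustration only.
print(mape([100.0, 50.0], [90.0, 60.0]))
print(sem([1.0, 2.0, 3.0, 4.0]))
```

In arid and semi-arid records, dry months with zero rainfall are common, so the choice made for those months can noticeably change the reported MAPE.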
For each of the data series with data replaced by MOVE.2 values, regression analysis and tests of equality of the means and variances of the series were done. The regression analysis was done to examine how close the replaced data were to the original rain gauge data. Using regression analysis, the capability of the MOVE.2 approach to infill the rain gauge data gaps was tested further. The regression residual was used to estimate the difference between the rain gauge value of the samples (the dependent variable, y) and the predicted MOVE.2 values (ŷ). For each of the samples used in the jackknife resampling, each data point was estimated by the computed MOVE.2 value, and the resulting regression residuals were taken as the difference from the rain gauge values. In the regression analysis, the residuals were computed following Equation (15):
Residual = Observed − Predicted
Analysis of the residuals was done to determine the difference between the MOVE.2 values and the rain gauge values.

2.3.5. Test of Preservation of Mean and Variance

The method of moments was used to estimate the mean and variance. Parameter estimation was done for the 2-parameter gamma distribution as:
$$E(x) = \alpha\beta \qquad (16)$$
and
$$\mathrm{Var}(x) = \alpha\beta^2 \qquad (17)$$
where E(x) denotes the expected value of the variable and Var(x) denotes the variance.
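Solving the two moment relationships for the parameters gives α = E(x)²/Var(x) and β = Var(x)/E(x). A minimal sketch of this method-of-moments estimate follows; the function name and sample values are illustrative only.

```python
from statistics import mean, variance


def gamma_moments(sample):
    """Method-of-moments estimates (alpha, beta) of a 2-parameter gamma distribution.

    Uses E(x) = alpha * beta and Var(x) = alpha * beta**2, so that
    alpha = mean**2 / variance (shape) and beta = variance / mean (scale).
    """
    m, v = mean(sample), variance(sample)
    return m * m / v, v / m


alpha, beta = gamma_moments([2.0, 4.0, 6.0])
print(alpha, beta)
```

Comparing the (α, β) pair of an original series with that of its infilled counterpart is one way to check whether infilling has shifted the fitted distribution.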
This approach was used since extension data generated from a probability distribution function are known to have the same value distribution as the measurements but, on average, no autocorrelation [37]. The Student t-test and the F-test were used to compare the means and variances of the original datasets and the extended datasets. Statistical significance of the hypothesis tests was determined by the p-value at the 5% level.

2.3.6. Test of Goodness of Fit

A “goodness-of-fit” test is a procedure for determining whether a sample of n observations, x1,…, xn, can be considered as a sample from a given specified distribution. The Pearson correlation coefficient and the coefficient of determination were used to test the closeness of the estimated TRMM-MOVE.2 series and the original rain gauge data series.

2.3.7. Stationarity of Extended Time Series

Given that most of the gaps in the data series to be infilled occur consecutively in time, it was necessary to confirm that the time series generated upon infilling of the datasets remain stationary. A key assumption in regression is that the error terms are independent of each other. It is therefore necessary to confirm that there is no autocorrelation in the series. The Durbin-Watson test was used to test for autocorrelation. The Durbin-Watson statistic was computed following Equation (18).
$$d = \frac{\sum_{i=2}^{n}\,(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \qquad (18)$$
where e_i = y_i − ŷ_i is the residual for individual i, y_i and ŷ_i are the observed and predicted values of the response variable, respectively, and n is the number of elements in the sample.
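The Durbin-Watson statistic is the ratio of the sum of squared successive differences of the residuals to the sum of squared residuals. A minimal sketch, with a hypothetical function name and invented residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for a sequence of regression residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating residuals give a large d
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # persistent residuals give d near 0
```

The statistic lies between 0 and 4. Values near 2 indicate no first-order autocorrelation, values towards 0 indicate positive autocorrelation, and values towards 4 indicate negative autocorrelation.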

3. Results and Discussion

3.1. Comparison of Rainfall Records TRMM vs. Rain Gauge

A comparison of rainfall data from the rain gauges and TRMM was done using data for those parts of the TRMM period (1998–2011) that had no gaps in the respective rain gauge datasets. Figure 2 compares time series plots of the monthly values of the TRMM and rain gauge datasets for Kampi ya Mawe station.
From Figure 2 it is observed that the TRMM data fit well with the rain gauge data for Kampi ya Mawe station. The close association of the rain gauge and TRMM datasets was further confirmed with scatter plots for the respective stations. Figure 3, Figure 4, Figure 5 and Figure 6 show the scatter plots for Mutonguini, Kambi ya Mawe, Kitui and Mutomo. The scatter plots were produced from monthly rain gauge and TRMM data for selected years in the period 1998–2000.
The scatter plots indicate that a strong positive association exists between the TRMM rainfall estimates and the rain gauge-based rainfall observations. From the foregoing comparison, it is observed that the TRMM rainfall datasets fit closely with the rain gauge data series for Machakos, Makueni and Kitui counties. As such, it is inferred that TRMM rainfall estimates are a viable dataset for infilling rain gauge data gaps.

3.2. Infilling Missing Values of Rain Gauge Data

Data gaps in the rain gauge datasets were infilled following the MOVE.2 approach. The MOVE.2 infilling model required a stepwise approach to be able to account for special discontinuities. The rain gauge and TRMM datasets were arranged into annual sequences of monthly data such that each month of the year had its own time series of rainfall data. Therefore, for each rain gauge and corresponding TRMM grid point there were 12 annual-sequence series, one per calendar month (for example, the sequence of all January data points for the period of interest for the respective station). In this arrangement, for a station with long data gaps (for example Mutonguini, with 24 consecutive missing months), the number of missing values to be infilled in any one series was reduced to at most two at the furthest point of estimation.
Following the MOVE.2 approach, stations whose gaps occurred earlier than January 2007 had fewer than 10 TRMM data points available for the regression. This was so because TRMM rainfall estimates commence in January 1998. As such, the infilling method discussed here was applied only to gaps occurring from January 2008 onwards. Separate MOVE.2 regression relationships were developed for the stations Kambi ya Mawe, Mutonguni, Kitui, Mutomo, Kisasi, Lukenya, Matungulu, Matiliku and Mutito Forest.
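The rearrangement into one series per calendar month can be sketched as follows. The record layout (tuples of year, month and value, with `None` marking a gap) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def split_by_calendar_month(records):
    """Group (year, month, value) records into one yearly sequence per calendar month.

    Returns {month: [(year, value), ...]} with each list in year order;
    value is None where the record is missing.
    """
    series = defaultdict(list)
    for year, month, value in sorted(records, key=lambda r: (r[0], r[1])):
        series[month].append((year, value))
    return dict(series)


# Invented record: 2006-2007 observed, 2008-2009 missing (24 consecutive gaps).
records = [(y, m, None if y >= 2008 else 1.0)
           for y in range(2006, 2010) for m in range(1, 13)]
monthly = split_by_calendar_month(records)
print(sum(v is None for _, v in monthly[1]))  # gaps left in the January series
```

A run of 24 consecutive missing months thus becomes at most two missing values in each of the 12 calendar-month series, which is the reduction described above.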
For example, the estimated infilled value for Kitui station for the month of January 2009 followed Equation (19).
$$Y(i) = \hat{m}(y)_{\mathrm{Jan}} + \frac{\hat{S}(y)_{\mathrm{Jan}}}{S(\mathrm{TRMM})_{\mathrm{Jan}}}\,\big[\mathrm{TRMM}_{\mathrm{Jan}} - m(\mathrm{TRMM})_{\mathrm{Jan}}\big] \qquad (19)$$
This equation was used to estimate the infilled value for the rain gauge gap in January 2009 at Kitui station. The subsequent gaps occurring in February 2009–December 2009 were filled by using the appropriate TRMM value for the respective months as in Equation (19). Table 2 and Table 3 show the MOVE.2 parameters used to compute the infilled values following Equation (19) for Kitui and Mutonguini stations. Similar parameters were computed to infill gaps in the other stations (Kambi ya Mawe, Mutomo, Kisasi, Lukenya, Matungulu, Matiliku and Mutito Forest). One hundred and forty-five data gaps for nine stations were infilled using this method.
For the first month of a long-running data gap, the value of the coefficient α² changes with the values of N1 and N2. This change affects the computed value of Ŝ²(y), which is the estimated variance of the infilled series. The value of N1 changes due to the overlap of the months for the preceding year. However, the infill equation remains the same for the subsequent year because the change in N1 to N1 + 1 and in N2 (the first gap for the year) forms an intrinsic part of Ŝ²(y).
The equation was developed with the MOVE.2 approach with the intention of preserving the mean and variance. Subsequent infilling of other data gaps followed the MOVE.2 process. The MOVE.2 equation was only applied at N1 and N1 + 1, depending on the number of data gaps for each month at each station. For each station, the regression equation was applied to estimate the respective value of the data gap. In this method, each month for which data were estimated was considered independent of the previous estimate. Table 4 shows the number of data gaps infilled for the respective months for the stations.

3.3. Evaluation of Infilled Data Series

MOVE.2 values were evaluated for precision and accuracy in estimating the rain gauge values. The evaluation involved comparison of sampled series of rain gauge data which were removed from the series and replaced with estimated values following a jackknife approach of replacement. Following the approach described in Section 2.3.3, MOVE.2 values were computed for gaps which were created by removing some rain gauge data. Figure 7 and Figure 8 show the infilled data plotted against the rain gauge data in the removed areas for Kambi ya Mawe and Kisasi stations, respectively. From the figures it is observed that the MOVE.2 infilled values follow the rain gauge data closely, but they are not a one-to-one match.

3.3.1. Comparison of Descriptive Statistics

The statistical parameters mean, median, standard deviation (Std. Dev), standard error of the mean (Std. Err. Mean), minimum, maximum, skewness and kurtosis were used to compare the MOVE.2 values infilled in the gaps with the rain gauge data that had been removed. In this analysis, the differences between the respective summary statistics of the rain gauge values and the MOVE.2 estimates were evaluated. Altman and Bland [38] recommended the use of the difference approach for comparison of summary statistics. The evaluation followed a non-parametric approach considering only the arithmetic difference of the statistics. For each station, the arithmetic differences in the summary statistics (median, standard deviation, standard error of the mean, maximum, minimum, skewness and kurtosis) between the samples of monthly rain gauge values and the corresponding MOVE.2 values were evaluated. Table 5 shows the computed differences of the summary statistics.

Difference in the Standard Error of the Mean

The standard error of the mean (SE of the mean) estimates the variability between sample means that would be obtained if multiple samples were drawn from the same population. In this study, the difference between the standard error of the mean of the samples of rain gauge values and that of the samples of MOVE.2 estimates was used to compare the variability of the mean of the MOVE.2 estimates, placed at gaps previously created by removing rain gauge values, against that of the true rain gauge values at those positions. Reading from Table 5, lower values (less than 2 standard deviations) of the difference of the standard error of the mean indicate that the MOVE.2 estimates are close in precision to the rain gauge values.
In this way, the difference in the standard error of the mean as indicated in Table 5 is an indication of the deviation of the MOVE.2 estimates from the actual values of variability of the mean. The units of the standard error of the mean are rainfall units (millimetres, mm). The standard error of the mean is a good indicator of the precision of the MOVE.2 values estimated to infill the respective rain gauge values. This analysis is in line with the inference made by Altman and Bland [38] that 95% of observations fall within 2 standard deviations. The difference in the standard error of the mean, viewed in this manner, therefore indicates close proximity for all the samples of MOVE.2 values and rain gauge values analysed.
Thus, in this analysis the arithmetic difference between the standard error of the mean of the MOVE.2 infilled values and the standard error of the mean of the rain gauge values indicates closeness of the estimated data (MOVE.2 values) to the actual data (rain gauge values) for each of the removed data gaps.
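The comparison described above can be sketched as follows, assuming the usual definition of the standard error of the mean as the sample standard deviation divided by the square root of the sample size. The function names and the sample values are hypothetical and are not taken from Table 5.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: s / sqrt(n), with s the sample std. dev."""
    return statistics.stdev(values) / math.sqrt(len(values))


def se_difference(gauge, move2):
    """Arithmetic difference of the two standard errors (mm)."""
    return standard_error(move2) - standard_error(gauge)


# Hypothetical monthly totals (mm) at positions where gauge data were removed.
gauge = [12.0, 48.0, 95.0, 60.0]
move2 = [15.0, 44.0, 90.0, 66.0]
print(round(se_difference(gauge, move2), 3))
```

A difference near zero suggests that the estimated values vary about their mean to a similar degree as the removed gauge values.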

Difference in the Median

The median is a measure of location which is useful particularly when a distribution is skewed and the end-values are not known, or when reduced importance should be attached to outliers. This consideration is necessary for the purpose of measurement of errors. Given that the median is the 2nd quartile, 5th decile and 50th percentile, the median values in this study were used alongside the minimum and maximum values of rain gauge data to determine the central location of the data series and to compare it with that of the MOVE.2 infilled series.
From Table 5, it is observed that the differences in the skewness and kurtosis of the two sample data sets are small (less than 1). These small differences imply that the skewness and kurtosis of the samples of the rain gauge data series and the MOVE.2 values are in close proximity. This indicates that the infilled datasets do not significantly affect the skewness or the kurtosis of the data series. This is inferred because the distribution of the differences in skewness and kurtosis was always symmetrical about zero, and of magnitude less than one, in the respective periods for all the stations.
It is also worth noting that the differences in the minimum values were always low (less than 10 mm of rainfall). The minimum rainfall occurs during the non-seasonal months of January, February, June, July, August and September. On the other hand, systematic errors for TRMM estimates have been observed to be larger during the non-rain months, since aggregation of hourly TRMM always gives values greater than zero [39]. The difference in the maximum value is affected by large outliers associated with the influence of topography on rainfall. TRMM measurements have also been associated with low skill in regions of highly variable topography [39]. Reading from Table 5, the infilled data series was observed to maintain the location of the median value exhibited by the rain gauge series without affecting the skewness or the kurtosis of the distribution of the infilled series for all the stations.
The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e., it is a paired difference test). Given that the median is a measure of the central location in a data series, the Wilcoxon test was used to evaluate the closeness of the median to the position of the mean.
The requirements for the Wilcoxon Signed-Rank Tests for Paired Samples where zi = yi – xi for all i = 1, …, n, are as follows:
The zi are independent;
The xi and yi are interval data, so that differences can be taken and ranked;
The distribution of the zi is symmetric (or at least not very skewed).
The null hypothesis was thus stated as follows:
H0: the distribution of the differences between paired values of the median of the samples of rain gauge data and the corresponding MOVE.2 values is symmetric about zero (that is, any differences are due to chance). The test was done at α = 0.05 and n = 14 (i.e., the number of values of the TRMM period of rainfall data). From the statistical table, Tcrit = 21 (two-tailed test). For example, since Tcrit = 21 < 35.5 = T, the null hypothesis is not rejected, and it is concluded that there is no significant difference between the two data series. The decision to reject or accept the null hypothesis was made at α = 0.05 (i.e., the null hypothesis was accepted when p ≥ 0.05). Table 6 below shows the decisions of the Wilcoxon test: (1) to accept and (0) to reject the null hypothesis.
From Table 6, it is observed that the null hypothesis is accepted for some samples and rejected for others at the different stations. Notable in this analysis are the months of April, October, November and December, for which all the stations accepted the null hypothesis, indicating that the median was close to the mean in these months. The months of February, June, July, August and September exhibited rejection of the null hypothesis, indicating that the median location was not close to the mean. It is worth noting that April, October, November and December are the months with the highest seasonal rainfall in the study area. Mahmud et al. [40] analysed similar characteristics between TRMM estimates and rainfall in Peninsular Malaysia and noted that the correlation between TRMM and monthly rainfall was good during the wettest months in all local climate regions. Thus, borrowing from Mahmud et al. [40], it may be inferred that the difference indicated in the median probably originates from the TRMM data rather than from the MOVE.2 analysis approach.
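The test statistic used in the decision above can be sketched in a few lines. The sketch assumes the textbook form of the Wilcoxon signed-rank statistic: zero differences are dropped, tied absolute differences share the average rank, and T is the smaller of the positive and negative rank sums. The function names are hypothetical; the values Tcrit = 21 and T = 35.5 in the usage lines are the ones quoted in the text for n = 14.

```python
def wilcoxon_t(x, y):
    """Return (T, n): the smaller signed-rank sum and the count of
    non-zero paired differences z_i = y_i - x_i."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return min(t_plus, t_minus), len(ordered)


def reject_h0(t_stat, t_crit):
    """Two-tailed decision from a critical-value table: reject if T <= Tcrit."""
    return t_stat <= t_crit


# Values quoted in the text: T = 35.5 against Tcrit = 21 for n = 14.
print(reject_h0(35.5, 21))
```

With the quoted values the function returns False, which corresponds to not rejecting the null hypothesis.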

3.3.2. Parametric Evaluation of Infilled Data Series

An error of measurement is the difference between an obtained value and its theoretical true score counterpart. Two error measures were used to evaluate the accuracy and precision of the estimated MOVE.2 data series: the Mean Absolute Percent Error (MAPE) and the analysis of regression residuals.

Mean Absolute Percent Error (MAPE)

MAPE was used as a measure of the accuracy with which the sampled MOVE.2 data estimate the respective rain gauge rainfall values. Hyndman and Koehler [41] and Wilson [42] recommended that MAPE be used for evaluation of cross-sectional estimates such as the MOVE.2 estimates of rain gauge rainfall. MAPE expresses the error of the MOVE.2 infilled data series as a percentage of the rain gauge data series. Figure 9 shows the distribution of MAPE for the nine stations in the study.
From Figure 9, it is observed that MAPE values are low (less than 100%) for the months of January, March, April, May, October, November and December. These months are the period of seasonal rainfall for the study area. However, MAPE increases drastically to extremely high values for the months of February, June, July, August and September, which are also months of low rainfall amounts.
Given that the MAPE is a relative measure which expresses errors as a percentage of the actual data, it provides in this analysis an easy and intuitive indication of the distribution of errors in the infilled series of estimated rain gauge values. It also gives a way of judging the extent, or importance, of errors: an error of 10 when the actual value is 100 (a 10% error) is more worrying than an error of 10 when the actual value is 500 (a 2% error). Because the actual value is the denominator, the same absolute error produces a much larger percentage when the rainfall total is small. This is clearly reflected in the low values of error for the months of high seasonal rainfall and the high values of error during the months of low seasonal rainfall. Thus, the distribution of the MAPE shown in Figure 9 indicates a relatively acceptable distribution of errors for the infilled MOVE.2 derived estimates.
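The arithmetic behind these percentages can be checked with a minimal sketch of MAPE. The function name and the handling of zero gauge values (they are skipped, since the percentage is undefined) are assumptions for illustration; the paper does not state how zero rainfall months were treated.

```python
def mape(actual, estimate):
    """Mean absolute percent error, in percent.

    Pairs whose actual value is zero are skipped (an assumption made here,
    because the percentage error is undefined for a zero denominator).
    """
    pairs = [(a, e) for a, e in zip(actual, estimate) if a != 0]
    return 100.0 * sum(abs(a - e) / abs(a) for a, e in pairs) / len(pairs)


# The same absolute error of 10 gives very different percentages.
print(mape([100.0], [90.0]))   # wet month, actual value 100
print(mape([500.0], [490.0]))  # very wet month, actual value 500
print(mape([2.0], [12.0]))     # dry month, actual value 2
```

The third case illustrates why months with very small rainfall totals can show percentage errors of several hundred percent even when the absolute error is only a few millimetres.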

Error in the Regression Analysis

Figure 10 shows regression results for the samples of Kisasi station for the year 2011. The plot shows the MOVE.2 values for 2011 against the rain gauge values for the same year. In this plot, the MOVE.2 values are on the x-axis and the observed rain gauge values are on the y-axis. The distance from the solid line (perfect agreement) indicates the magnitude of the error (residual) of the prediction. Values above the solid line mean the prediction was too low, and values below the solid line mean the prediction was too high. In this regression analysis, it is observed that the computed MOVE.2 values are close to the rain gauge values, with relatively small margins of error. This analysis was repeated for all the stations and similar results were observed.
Figure 11, Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18 and Figure 19 show plots of the mean of regression residuals for each station. The mean of regression residuals was computed over the number of years sampled for each of the stations. From the plots, it is observed that the mean residuals are not evenly distributed vertically, and that there are both positive and negative residuals. It is also observed that the residuals exhibit high variability, but certain patterns are easily discerned from the plots. For example, during the months of October, November and December, which are the main rainfall season of the study area, the residuals exhibit low values (less than 30 mm) for all the stations. The months of March and April exhibit high variability of the regression residuals across the stations, despite being months of seasonal rainfall in the study area. The high variability of regression residuals during the March–April–May season may be related to the high unreliability of rainfall during the period [43]. Glover et al. [44] estimated the unreliability of rainfall in the April to May season at 40% in the south-eastern parts of Kenya, within which this study was conducted. The 40% unreliability of seasonal rainfall depicts erratic rainfall with high variability.
Figure 20 shows the normal probability plot of the residuals. In Figure 20, the pattern of the residuals curve is approximately linear, indicating that the assumption of normally distributed residuals holds.

3.4. Test of Equality of the Mean and Variance

The t-test and F-test were based on two approaches. In the first, the series generated with the jackknife sampling was arranged in the order of running calendar months (a series for each station for the sample period) containing the generated MOVE.2 values, and the dataset of respective rain gauge values was arranged in a similar manner. The means and variances of the two series were then compared for equality in a t-test and an F-test, respectively.
In the second, the series generated with the jackknife sampling was arranged along the annular month (a series of data in the order of annular modes representing year-to-year variability, with month-by-month values beginning in 1998 up to and including the year in which there was replacement with the MOVE.2 value). The sequence of the annular series was, for example: Jan 1998, Jan 1999, Jan 2000, ..., Jan 2011. A similar sequence was developed for each of the 12 calendar months. Two annular month series were developed, one with the surrogate data and the other with rain gauge data within the sections without gaps. The means and variances of the two series were compared for equality in a t-test and an F-test, respectively. This approach was applied only to the years preceding the gaps at the respective stations. The years within the gaps, as indicated in Table 1, and the years after the appearance of gaps were not included in the analysis.

3.4.1. Two-Sample t-Test for Equal Means

The two-sample t-test [45] was used to determine whether the means of the rain gauge data series and the MOVE.2 estimated series are equal, that is, whether or not a significant difference exists between the two data sets. The t-test was also used to determine whether the means of two independent samples come from the same population. The formula for calculating “t” is given in [46].
The null and alternative hypotheses were stated as follows:
H0: µ1 = µ2; the means are equal
H1: µ1 ≠ µ2; the means are different
This is a two tailed test because the Null Hypothesis does not specify a direction, only the condition of equality.
The t-test indicates that there is not enough evidence to reject the null hypothesis that the two means are equal at the 0.05 significance level. It was therefore concluded that the rain gauge datasets and the MOVE.2 infilled datasets have the same means at the 0.05 significance level, and that the two datasets may be considered to come from the same population.
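As an illustration of the test statistic, the sketch below computes the pooled-variance form of the two-sample t statistic. Whether the paper used the pooled or the unequal-variance form is not stated here, so the pooled form is an assumption, as are the function name and the sample values. The critical value must still be read from a t table for the returned degrees of freedom.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Return (t, df) for the two-sample t-test with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / std_err
    return t, n1 + n2 - 2


# Hypothetical monthly totals (mm): gauge series vs. infilled series.
gauge = [35.0, 80.0, 120.0, 60.0, 15.0]
infilled = [40.0, 75.0, 110.0, 66.0, 20.0]
t_value, dof = pooled_t(gauge, infilled)
print(round(t_value, 3), dof)
```

The null hypothesis of equal means is not rejected when the absolute value of t is below the two-tailed critical value at the chosen significance level.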

3.4.2. F-Test for Equality of Two Variances

An F-test is a statistical test in which the test statistic has an F-distribution under the null hypothesis. An F-test [47] was used to test whether the variances of two populations are equal. The F-test used is a two-tailed test. The hypotheses were stated as:
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
The F statistic was computed as:
F = s1²/s2²
where s1² and s2² are the sample variances. The more this ratio deviates from 1, the stronger the evidence for unequal population variances. The variances are significantly different if F is greater than the appropriate value in the F table. The degrees of freedom for the numerator are (n1 − 1), where n1 is the sample size for the group with the higher variance. The degrees of freedom for the denominator are (n2 − 1), where n2 is the sample size for the denominator group.
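The computation of the variance ratio and its degrees of freedom can be sketched as follows, with the larger sample variance placed in the numerator as described above. The function name and the sample values are hypothetical; the critical value is assumed to come from an F table.

```python
import statistics


def f_statistic(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator).

    The sample with the larger variance goes in the numerator, so F >= 1.
    """
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical monthly totals (mm).
gauge = [35.0, 80.0, 120.0, 60.0, 15.0]
infilled = [40.0, 75.0, 110.0, 66.0, 20.0]
print(f_statistic(gauge, infilled))
```

Placing the larger variance in the numerator means only the upper tail of the F table needs to be consulted.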
The F-test gave mixed results, with most samples favouring acceptance of the null hypothesis and two stations favouring rejection. For the stations of Kisasi, Kitui, Mutonguini, Mutitu and Lukenya, the null hypothesis was accepted for all the samples. The station of Kisasi indicated rejection of the null hypothesis for two samples, 2007–2008 and 2011, while Matiliku indicated rejection of the null hypothesis for one sample, 2010–2011. For these samples, the F-test indicates that there is enough evidence to reject the null hypothesis that the two variances are equal at the 0.05 significance level.
Notable in this analysis is that the months favouring acceptance of the null hypothesis were mainly the months of high seasonal rainfall, including March, April, May, October, November and December, indicating that the variances of the MOVE.2 generated surrogates and the rain gauge dataset are equal. The months of low rainfall, including January, February, June, July, August and September, indicated rejection of the null hypothesis, indicating that the variances of the MOVE.2 generated surrogates and the rain gauge dataset are not equal. Details of the computation of the t-test and the F-test may be found in the Appendix of this paper.

3.5. Confirmation of Preservation of Mean and Variance

A Gamma probability density function (PDF) was used to confirm the preservation of the mean and variance of the infilled rain gauge data series, following Theiler et al. [48]. The data sets of the extended MOVE.2 series were fitted to a gamma distribution function and statistical tests of equality of mean and variance were carried out. Figure 21, Figure 22, Figure 23 and Figure 24 show the Gamma cumulative distribution function for Mutomo station, comparing the plots of the original data and the infilled data for the month of January. It is observed that the cumulative function of the extended dataset fits well with the distribution of the original data. Given that the gamma function fits well, this indicates that the two plots have similar parameters α and β, further confirming the assumption of preservation of mean and variance of the MOVE.2 approach.
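The link between the gamma parameters and the first two moments can be made explicit with a method-of-moments sketch. For a gamma distribution with shape α and scale β, the mean is αβ and the variance is αβ², so two series with similar α and β necessarily have similar mean and variance. The fitting method used in the paper is not stated in this section; the method of moments, the function name and the sample values below are assumptions for illustration.

```python
import statistics


def gamma_moments_fit(values):
    """Method-of-moments gamma fit: returns (alpha, beta) with
    alpha = mean^2 / variance (shape) and beta = variance / mean (scale)."""
    mean = statistics.mean(values)
    var = statistics.variance(values)
    return mean * mean / var, var / mean


# Hypothetical January totals (mm) before and after infilling.
original = [12.0, 30.0, 55.0, 8.0, 41.0, 22.0]
extended = original + [27.0, 35.0]
for label, series in (("original", original), ("extended", extended)):
    alpha, beta = gamma_moments_fit(series)
    print(label, round(alpha, 2), round(beta, 2))
```

Comparing the two printed parameter pairs gives a quick numerical counterpart to the visual comparison of the cumulative distribution plots.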

3.6. Autocorrelation Test

Given that the infilled values were translated from different datasets, it is prudent to test for autocorrelation among adjacent values. If they are correlated, the least-squares regression underestimates the standard error of the coefficients, and predictors can seem to be significant when they are not [49]. The Durbin-Watson statistic was used to test for autocorrelation within adjacent values in the new series after infilling of the data. Table 7 shows the values of the Durbin-Watson statistic computed for each data series.
For the test of negative autocorrelation, the critical limits are 4 − DL and 4 − DU. The hypotheses were stated as follows:
H0: ρ = 0 (no serial correlation);
H1: ρ < 0 (negative serial correlation)
If d < 4 − DU, do not reject H0; if d > 4 − DL, reject H0.
Since all the values of the Durbin-Watson statistic are greater than 2, H0 is not rejected and the conclusion is that there is no serial correlation in the infilled data series. The absence of serial correlation in the infilled data series serves to confirm the stationarity assumption of the extended series.
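For reference, the Durbin-Watson statistic itself is simple to compute from a sequence of regression residuals: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. The statistic lies between 0 and 4, and values near 2 suggest no first-order serial correlation. The function name and the residual values below are hypothetical, and the tabulated bounds DL and DU are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals (mm) from a monthly regression.
residuals = [4.0, -3.0, 5.0, -6.0, 2.0, -1.0, 3.0, -4.0]
print(round(durbin_watson(residuals), 3))
```

Residuals that alternate in sign, as in this example, push d above 2, while long runs of same-signed residuals push it toward 0.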

3.7. Goodness of Fit

Table 8 shows the computed values of the correlation coefficient and the coefficient of determination for the years 2007–2011 for the respective stations. Medium and high values of the correlation coefficient and the coefficient of determination were obtained between the original rain gauge data series and the MOVE.2 surrogate series for the different years at all the stations.
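The two goodness-of-fit measures can be sketched as below, assuming the Pearson product-moment correlation coefficient r and, for a simple linear relationship, the coefficient of determination taken as r squared. The function name and the sample values are hypothetical and are not taken from Table 8.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly totals (mm) for one year at one station.
gauge = [20.0, 5.0, 90.0, 150.0, 60.0, 3.0]
move2 = [25.0, 9.0, 80.0, 140.0, 70.0, 6.0]
r = pearson_r(gauge, move2)
print(round(r, 3), round(r * r, 3))
```

The squared value can be read as the fraction of the variance in the gauge series that is explained by a straight-line relationship with the surrogate series.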

3.8. Discussion

The intent of infilling missing data is to produce a relatively long time series that possesses statistical characteristics believed to be like those of the actual record for the station [50]. The reason for producing such a record is for use in simulation and optimization related to potential water management decisions. This study demonstrated the extension of rain gauge records using TRMM rainfall estimates as the donor series, following a least squares regression MOVE.2 approach. In this methodology, the study transferred the characteristics of distribution shape, serial correlation and seasonality from the TRMM dataset to the rain gauge station record [51]. Analytical derivations based on linear regression alone cannot be expected to provide records with the appropriate variability [52], and the TRMM data series cannot be expected to provide records with the same distribution shape or serial correlation as the rain gauge data series. This is because the TRMM data series and the rain gauge data series differ substantially in terms of distribution shape, serial correlation and seasonality.

3.8.1. Viability of Infilled Data Series

The jackknife sampling approach used in this study for evaluating the infilled series involved removing one value from the annual month series of rain gauge data to enable estimation of the same value following the MOVE.2 approach. The remove-1 jackknife approach, however, is known to give inconsistent variance estimators for non-smooth estimators such as the sample quantiles, including the median [53]. This deficiency was overcome in this study by increasing the number of values removed, following a smoothness measure of the point estimator as recommended by Shao and Wu [53]. In the analysis, 12 values were removed from the running series of monthly rain gauge data in one jackknife sample of one annual month. Thus, for each annual month removed, a total of 12 running-month values were removed, thereby achieving the required number of values for smoothness of the estimator. The sampling methodology used in this study also follows closely the suggestions of Guo Hua et al. [54].
In estimating the median using a jackknife sampling approach, Guo Hua et al. [54] observed a lack of smoothness which was seemingly caused by the inconsistent jackknife estimate of the standard error. To address this, Guo Hua et al. [54] suggested that instead of removing one value at a time in the jackknife, a number of values d be removed, where n = r·d for some integer r. Guo Hua et al. [54] suggested removing more than d = √n values when estimating the median, but fewer than n values, to achieve consistency of the jackknife estimate of the standard error. These suggestions are similar to the recommendation of Shao and Wu [53]. Therefore, since the jackknife approach used (as explained in Section 2.3.3) considered the requirement for consistency suggested by Guo Hua et al. [54] and Shao and Wu [53], it is expected that the MOVE.2 values as evaluated in this study give a true picture of the capability of estimating the rain gauge values.
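The delete-d idea discussed above can be illustrated with a grouped jackknife, in which the n values are split into r contiguous blocks of d values (n = r·d), each block is removed in turn, and the spread of the recomputed estimator gives a standard error. This sketch uses the grouped form with the factor (r − 1)/r; it is an illustration of the general technique under that assumption and not a reproduction of the sampling scheme in Section 2.3.3. The function name and the data are hypothetical.

```python
import math
import statistics


def grouped_jackknife_se(values, d, estimator=statistics.median):
    """Standard error of `estimator` from a delete-a-block jackknife.

    values: list of observations; d: block size, with len(values) = r * d.
    """
    n = len(values)
    if d <= 0 or n % d != 0:
        raise ValueError("len(values) must be a positive multiple of d")
    r = n // d
    replicates = []
    for g in range(r):
        # Recompute the estimator with block g (d contiguous values) removed.
        kept = values[: g * d] + values[(g + 1) * d:]
        replicates.append(estimator(kept))
    centre = sum(replicates) / r
    return math.sqrt((r - 1) / r * sum((t - centre) ** 2 for t in replicates))


# Hypothetical series: 3 years of monthly totals (mm), one year per block.
series = [5.0, 2.0, 60.0, 140.0, 50.0, 1.0, 0.5, 1.0, 3.0, 70.0, 160.0, 90.0] * 3
series[7] = 12.0  # perturb one value so the blocks differ
print(round(grouped_jackknife_se(series, 12), 3))
```

With d = 1 and the mean as estimator, this reduces to the familiar s/√n, which provides a simple check on the formula.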
The use of the MOVE.2 method produced infilled series with the statistical characteristics (mean, variance and extreme values) of the rain gauge series. The MOVE.2 methodology has desirable properties that enable appropriate preservation of the parameters. The MOVE.2 method also treated the two distributions as separate and distinct distributions with different parameters, which were then combined into one distribution with the same parameters. A probability density function (PDF) approach was used to confirm the preservation of the mean and variance of the infilled rain gauge data series, following Theiler et al. [48]. Sen and Eljadid [55] indicated that the gamma distribution is an appropriate probability distribution for describing monthly rainfall in arid and semi-arid regions. The monthly data series of the infilled data sets were fitted to a gamma distribution function and statistical tests of preservation of mean and variance were carried out. It was observed that the cumulative function of the extended dataset fits well with the distribution of the original data. Given that the gamma function fits well, this indicates that the plots have similar parameters α and β, confirming the assumption of preservation of mean and variance of the MOVE.2 approach.
No physical quantity can be measured with perfect certainty; there are always errors in any measurement. This means that the MOVE.2 estimates of rain gauge rainfall values, computed repeatedly as more gaps are infilled, will certainly contain errors [56]. The error analysis is an attempt to quantify the uncertainty resulting from the infilled values. Understanding the errors also emphasizes the need for care in measurement and for refinement of the method to reduce the errors. This gives greater confidence that the computed MOVE.2 values closely approximate the true value [57]. Error analysis in this study therefore expresses the uncertainties inherent in the rainfall values computed by the MOVE.2 approach for infilling the rain gauge data gaps. As such, it is inferred that the results of the error analysis are an indicator of the high quality of the extended data series, and that the MOVE.2 approach enables a high-quality rainfall data series to be maintained even after infilling.
A mean-preserving spread is a change from one probability distribution (donor series) to another probability distribution (recipient series), which is formed by spreading out one or more portions of the donor probability density function while leaving the mean of the recipient series unchanged [58]. As such, in this study, TRMM data series have proven to be good at preserving the mean and variance contraction of rain gauge data series following Gentzkow and Kamenica [58].
A statistical test confirmed the significance of the similarity of the mean and variance of the infilled dataset for all the stations. In this study, therefore, it is inferred that the use of the TRMM data series to infill rain gauge data following the MOVE.2 approach is a mean- and variance-preserving method. The approach agrees with Khalema [59], who showed that one can mix a baseline distribution with a Gamma distribution and obtain a mixture distribution which has mean and variance preservation capability.

3.8.2. Reliability and Validity of Infilled Data Series

Harvey et al. [50] identified three factors likely to influence the reliability of data infilling: the nature of the donor station (TRMM in this case), the location of the station and duration of the gap, and the infilling procedure. In this study, TRMM rainfall estimates were confirmed as a good fit to the rain gauge data. The MOVE.2 regression relationships were developed between the rain gauge series and the TRMM series for each of the 12 calendar months. In this approach, the number of missing data points was reduced to a maximum of two for each month for the longest running series of missing data (24 months). Other stations had at most one missing data point for the respective month. Giustarini [60] observed that the best performance for infilling missing data was obtained when the gaps were comparatively short. In this study, applying the MOVE.2 approach along the sequential annual month series reduced the long-running missing gaps to short gaps in the respective monthly series. As such, this study recommends the use of MOVE.2 in a sequential annual months approach for effective infilling of rainfall data from TRMM estimates. It is also observed that the reduced number of missing gaps reduces the regression errors, thereby enhancing the reliability of the results. This also agrees with Henn et al. [61], who found that shorter missing gaps are easier to fill for all methods. Generally, in ordinary regression methods of data infilling, the RMSE increases with an increase in the proportion of missing values (gap size). The MOVE.2 approach demonstrated in this analysis reduces the effective gap size, thereby reducing the RMSE. Thus, the MOVE.2 approach, utilizing sequential annual months, enables the infilling to attain high accuracy even with long gaps of missing data.
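The effect of the sequential annual months arrangement on gap length can be checked with a short sketch. It builds a hypothetical monthly record for 1998–2011 with a continuous 24-month gap (the years 2009 and 2010 are chosen arbitrarily for illustration), then compares the longest run of missing values in the running series with the longest run in each calendar-month series. All names and values are assumptions for illustration.

```python
def longest_gap(values):
    """Length of the longest run of consecutive missing (None) values."""
    longest = current = 0
    for v in values:
        current = current + 1 if v is None else 0
        longest = max(longest, current)
    return longest


def by_calendar_month(records):
    """Regroup (year, month, value) records into 12 year-ordered series."""
    series = {m: [] for m in range(1, 13)}
    for year, month, value in sorted(records):
        series[month].append(value)
    return series


# Hypothetical record: 1998-2011, with 2009 and 2010 entirely missing.
records = [
    (year, month, None if year in (2009, 2010) else 1.0)
    for year in range(1998, 2012)
    for month in range(1, 13)
]

running = [value for _, _, value in sorted(records)]
print(longest_gap(running))  # gap length in the running monthly series
monthly = by_calendar_month(records)
print(max(longest_gap(s) for s in monthly.values()))  # per calendar month
```

The running series contains one gap of 24 values, whereas each calendar-month series contains a gap of only two values, which is the reduction described in the text.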

4. Summary

This study tested a methodology for infilling missing gaps in rain gauge observed data series following least squares regression. The study presented a methodology for infilling the rain gauge data series from satellite-based rainfall estimates. The satellite estimates were extracted from the grid points nearest to the respective stations and were used as donor stations.
The study tested the use of the MOVE.2 approach with TRMM satellite data as a donor station. The study therefore addressed an important challenge for hydro-meteorological science, namely long consecutive gaps of missing data in rain gauge observed data series. This is particularly relevant for the ASAL of Kenya and Africa, where data gaps are rampant in the hydro-meteorological data series, and also for other parts of the tropics where TRMM observations are available.
In the MOVE.2 approach, the coefficient of linear regression was interpreted as a marginal effect. This marginal effect corresponds to how the dependent variable (rain gauge data) changes when the independent variable (TRMM data) changes by one additional unit, holding all other variables in the equation constant. Based on the data used in this regression, adding one additional month of rain gauge record corresponded to an increase in monthly rainfall. The sequential annual month arrangement of rain gauge rainfall records helped to operationalise the MOVE.2 approach. With this approach, the methodology enabled the preservation of the mean, variance and extreme value statistics of the infilled data series. As such, the infilled rain gauge series maintained the same distribution as the observed series. It is, however, worth mentioning that the preservation of the variance was not always upheld, particularly for months of low seasonal rainfall. The same was noted for the median.

5. Conclusions

The results reported in this study provide researchers with a methodological framework that can be readily applied for infilling missing rainfall values in rain gauge data series using TRMM satellite estimates as the donor station. The approach has demonstrated the capability of extending monthly rainfall series with values that remain similar to those observed, by preserving statistical parameters such as the mean, variance and extreme statistics. The infilled rainfall values have characteristics like those of the actual records they are intended to represent.
The methodology therefore serves a need expressed by researchers for the development of generic data infilling methodologies which ensure consistency, auditability and effectiveness in the infilled series.
Infilling of missing rainfall data using least squares regression in the MOVE.2 approach, as used in this study, promises robustness even in situations of large and extensive data gaps with a high proportion of missing values. The approach proposes a way of breaking long, running gaps of missing data into very short and manageable gaps. The infilling of short missing gaps as proposed here promises quality of infilled data and hence quality of predictions for models which utilise the infilled data series. The method offers a viable alternative to traditional infilling approaches.
The results suggest that MOVE.2 utilizing TRMM data is effective for infilling rainfall data series in Machakos, Makueni and Kitui counties of Kenya. The TRMM rainfall products coupled with MOVE.2 approaches could therefore be considered a viable alternative data source for large-scale distributed rainfall analysis for the development of hydro-meteorological models such as drought early warning, monitoring and forecasting. The approach is consistent and auditable, and could find application in the ASAL of Kenya and in tropical regions in general.

Acknowledgments

There were no funds sourced for the publication of this paper.

Author Contributions

William Githungo is the corresponding author and the main contributor of the research data, analysis, results and discussion. Silvery Otengi, Jacob Wakhungu and Edward Masibayi, as university supervisors, provided guidance on the methods for data analysis.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The t-test and the F-test were conducted to test for similarity of the mean and variance of the generated data and the rain gauge datasets, following two approaches. First, the series generated with the jackknife sampling was arranged in the order of running calendar months (one series for each station for the sample period) containing the generated MOVE.2 values, and the dataset of the respective rain gauge values was arranged in the same manner. The means and variances of the two series were then compared for equality in a t-test and an F-test, respectively.
Second, the series generated with the jackknife sampling was arranged along the annular month (a series ordered by annular modes representing year-to-year variability, as described by Thompson and Li [62]: month-by-month values beginning in 1998 up to and including the year in which values were replaced with MOVE.2 values). An example sequence is: Jan 1998, Jan 1999, Jan 2000, ..., Jan 2011. A similar sequence was developed for each of the 12 calendar months. Two annular month series were developed, one with the surrogate data and the other with rain gauge data from the sections without gaps. The means and variances of the two series were compared for equality in a t-test and an F-test, respectively. This approach was applied only to the years preceding the gaps at the respective stations. The years within the gaps, as indicated in Table 1, and the years after the appearance of gaps were not included in the analysis.
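The two arrangements described above can be illustrated with a short Python sketch. The data layout (a dictionary keyed by `(year, month)`), the function names and the sample values are assumptions for illustration and are not taken from the study.

```python
def running_series(records):
    """Values ordered by running calendar month: Jan 1998, Feb 1998, ..."""
    return [records[key] for key in sorted(records)]


def annual_month_series(records):
    """One year-to-year series per calendar month: Jan 1998, Jan 1999, ..."""
    series = {month: [] for month in range(1, 13)}
    for (year, month) in sorted(records):
        series[month].append(records[(year, month)])
    return series


# Hypothetical monthly totals for two years; value = year offset + month number.
records = {(year, month): float((year - 1998) * 100 + month)
           for year in (1998, 1999) for month in range(1, 13)}

running = running_series(records)
by_month = annual_month_series(records)
print(running[:3])   # first three running months of 1998
print(by_month[1])   # all January values, in year order
```

In the second arrangement each of the twelve series contains one value per year, so the sample size for a given test grows by one with every additional year included.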

Appendix A.1. t-Test and F-Test following the Arrangement of Sample Datasets in Annular Months

The two-sample t-test [45] was used to determine whether the means of the rain gauge data series and the MOVE.2 estimated series are equal, that is, whether a significant difference exists between the two data sets and whether the two independent samples can be considered to come from the same population. The formula for calculating t is given in [46].
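For reference, the pooled-variance form of the two-sample t statistic can be written as below. That the authors used exactly this form is an assumption; it is, however, consistent with the degrees of freedom reported for these tests (for example, 18 for two samples of 10 values each). Here x̄1 and x̄2 are the sample means, s1² and s2² the sample variances, n1 and n2 the sample sizes, and ν the degrees of freedom.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
\nu = n_1 + n_2 - 2
```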
The null and alternative hypotheses were stated as follows:
H0: µ1 = µ2; the means are equal
H1: µ1 ≠ µ2; the means are different
This is a two-tailed test because the null hypothesis does not specify a direction, only the condition of equality.
For a two-sided t-test, the null hypothesis was rejected if the absolute value of the test statistic was greater than the critical value t(1−α/2, ν) from the t-table. The mean of the series was computed along the annular month, which meant that the degrees of freedom changed with each removal of rain gauge values and subsequent replacement under the jackknife approach: 18 degrees of freedom were used for 2007, 20 for 2008, and up to 22 for 2009. The computed values of t for each month, alongside the critical values for the respective degrees of freedom, are shown in Table A1. If the t value calculated from the data was equal to or larger than the critical value, the null hypothesis H0: µ1 = µ2 was rejected; otherwise it was accepted. The test was carried out on the means of all twelve calendar months for each successive data removal and replacement described in the jackknife approach. All the means computed favoured acceptance of the null hypothesis, thereby upholding the hypothesis that µ1 = µ2.
Therefore, the t-test indicates that there is not enough evidence, at the 0.05 significance level, to reject the null hypothesis that the means of the annular month series of the surrogate datasets and the rain gauge datasets are equal. The rain gauge datasets and the MOVE.2 infilled datasets can therefore be taken to have the same means at the 0.05 significance level and may be considered to come from the same population.
An F-test is a statistical test in which the test statistic has an F-distribution under the null hypothesis. An F-test [47] was used to test whether the variances of the two populations are equal. The F-test used is a two-tailed test. The null and alternative hypotheses were stated as:
H0: σ1² = σ2²; the variances are equal
H1: σ1² ≠ σ2²; the variances are different
The F statistic was computed as:
F = s1²/s2²
where s1² and s2² are the sample variances. The more this ratio deviates from 1, the stronger the evidence for unequal population variances. The degrees of freedom for the numerator are (n1 − 1), where n1 is the sample size of the group with the higher variance. The degrees of freedom for the denominator are (n2 − 1), where n2 is the sample size of the denominator group. The two variances were considered significantly different if the ratio F was greater than the appropriate value in the F-table.
In this approach, the F-test gave mixed results, with most samples favouring equal variances and a few favouring unequal variances. For the stations of Kisasi, Kitui, Mutonguini, Mutito, Matiliku and Lukenya, equality of variances was not rejected for any of the samples. Matungulu showed unequal variances on 6 occasions (4 in June and 2 in July); Mutomo on 12 occasions (4 in January, 2 in February, 2 in June and 4 in September); and Kambi ya Mawe on 4 occasions (2 in February and 2 in September). Overall, the F-test indicates that, for most samples, there is not enough evidence at the 0.05 significance level to conclude that the two variances differ.
Notable in this analysis is that the samples favouring equal variances fell mainly in the months of high seasonal rainfall: March, April, May, October, November and December showed equal variances for all the stations in all the resampled data series. The samples favouring unequal variances fell in the months of low rainfall, namely January, February, June, July, August and September.
Table A1 and Table A2 show the computed t-values and F-values of the samples, respectively.
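Table A1. Computed t-values of samples.
| Station | Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Degrees of Freedom | Critical Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KYM | 2007 | −0.014 | −0.002 | 0.008 | −0.027 | −0.038 | −0.353 | −0.032 | −0.537 | −0.004 | 0.016 | 0.004 | −0.017 | 18 | 1.812 |
| KYM | 2008 | 0.010 | 0.068 | 0.014 | −0.016 | −0.014 | 0.103 | 0.071 | −0.154 | 0.020 | 0.014 | 0.007 | −0.010 | 20 | 1.796 |
| KYM | 2009 | 0.014 | 0.052 | 0.010 | 0.000 | −0.008 | −0.034 | −0.241 | −0.036 | 0.007 | 0.014 | 0.012 | 0.004 | 22 | 1.782 |
| Mutomo | 2007 | −0.001 | −0.011 | −0.001 | −0.008 | −0.071 | −0.156 | −1.252 | −0.060 | −0.014 | 0.003 | 0.000 | −0.007 | 18 | 1.812 |
| Mutomo | 2008 | −0.001 | −0.001 | 0.049 | −0.004 | 0.052 | −0.098 | −0.314 | −0.009 | 0.022 | 0.021 | 0.006 | −0.003 | 20 | 1.796 |
| Mutomo | 2009 | 0.000 | 0.001 | 0.014 | 0.005 | −0.480 | −0.091 | −0.262 | −0.012 | 0.020 | 0.031 | 0.000 | 0.005 | 22 | 1.782 |
| Lukenya | 2007 | 0.000 | −0.004 | 0.000 | 0.000 | 0.018 | −0.054 | 0.119 | −0.042 | 0.007 | 0.000 | 0.000 | 0.000 | 18 | 1.812 |
| Lukenya | 2008 | −0.001 | −0.019 | 0.000 | 0.000 | 0.001 | 0.076 | 0.018 | −0.047 | 0.017 | 0.000 | 0.000 | 0.000 | 20 | 1.796 |
| Lukenya | 2009 | 0.013 | 0.040 | 0.024 | −0.014 | −0.003 | 0.065 | 0.030 | −0.017 | −0.055 | −0.008 | 0.001 | −0.005 | 22 | 1.782 |
| Matiliku | 2007 | −0.005 | −0.009 | 0.098 | −0.023 | −0.157 | −0.204 | 0.057 | −0.036 | 0.009 | 0.029 | 0.006 | −0.010 | 18 | 1.812 |
| Matiliku | 2008 | 0.016 | 0.022 | 0.095 | −0.003 | −0.129 | 0.227 | 0.051 | 0.277 | 0.008 | 0.014 | 0.003 | −0.009 | 20 | 1.796 |
| Mutito | 2007 | 0.008 | −0.012 | 0.016 | 0.030 | 0.034 | −0.389 | 0.285 | −0.034 | −0.013 | 0.006 | 0.009 | −0.009 | 18 | 1.812 |
| Mutito | 2008 | −0.003 | −0.009 | 0.019 | 0.028 | 0.041 | −0.026 | 0.214 | −0.015 | −0.017 | 0.012 | 0.006 | −0.005 | 20 | 1.796 |
| Mutito | 2009 | −0.015 | −0.007 | 0.014 | 0.006 | 0.024 | −0.106 | 0.228 | −0.007 | −0.015 | 0.014 | 0.006 | −0.003 | 22 | 1.782 |
| Matungulu | 2007 | −0.007 | −0.011 | −0.002 | −0.017 | −0.014 | 0.057 | 0.041 | 0.023 | 0.023 | 0.017 | 0.002 | −0.004 | 18 | 1.812 |
| Matungulu | 2008 | 0.005 | 0.022 | −0.003 | −0.002 | −0.001 | 0.056 | 0.011 | 0.013 | 0.007 | 0.001 | 0.004 | −0.003 | 20 | 1.796 |
| Matungulu | 2009 | 0.004 | 0.015 | −0.007 | 0.001 | −0.004 | 0.045 | 0.000 | 0.007 | −0.001 | −0.006 | 0.008 | 0.010 | 22 | 1.782 |
| Matungulu | 2010 | 0.004 | 0.015 | −0.005 | −0.002 | −0.007 | −0.016 | −0.021 | −0.030 | −0.012 | −0.010 | 0.009 | 0.011 | 24 | 1.771 |
| Kisasi | 2007 | −0.020 | 0.017 | 0.017 | 0.023 | 0.065 | 0.073 | 5.097 | −0.278 | 0.061 | 0.010 | 0.016 | 0.001 | 18 | 1.812 |
| Kisasi | 2008 | −0.020 | 0.027 | 0.019 | 0.034 | 0.117 | 0.227 | −2.863 | 0.198 | 0.162 | 0.013 | 0.016 | 0.002 | 20 | 1.796 |
| Kitui | 2007 | 0.019 | 0.025 | 0.025 | 0.032 | 0.079 | 0.048 | 0.244 | 0.073 | 0.047 | 0.004 | 0.009 | 0.011 | 18 | 1.812 |
| Kitui | 2008 | 0.000 | −0.002 | 0.000 | 0.000 | 0.000 | −0.068 | −0.032 | −0.003 | 0.007 | 0.000 | 0.000 | −0.001 | 20 | 1.796 |
| Mutonguini | 2007 | 0.005 | 0.031 | 0.016 | 0.024 | 0.140 | −0.001 | −0.139 | 0.006 | 0.082 | 0.023 | 0.019 | 0.009 | 18 | 1.812 |
| Mutonguini | 2008 | 0.015 | 0.025 | 0.018 | 0.029 | 0.125 | 0.097 | 0.195 | 0.283 | 0.057 | 0.014 | 0.015 | 0.019 | 20 | 1.796 |
| Mutonguini | 2009 | 0.015 | 0.052 | 0.029 | 0.026 | 0.132 | 0.095 | 0.517 | 0.189 | 0.035 | 0.002 | 0.012 | 0.022 | 22 | 1.782 |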
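Table A2. Computed F-values of samples.
| Station | Year | N | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Degrees of Freedom | F-Critical |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KYM | 2007 | 10 | 0.810 | 1.219 | 2.336 | 1.354 | 0.946 | 0.777 | 2.206 | 0.231 | 2.251 | 13.576 | 1.887 | 0.949 | 9 | 3.1789 |
| KYM | 2008 | 11 | 1.251 | 4.012 | 2.331 | 1.031 | 0.993 | 2.627 | 2.030 | 0.885 | 4.318 | 16.457 | 1.855 | 0.880 | 10 | 2.913 |
| KYM | 2009 | 12 | 1.299 | 2.786 | 0.986 | 1.182 | 1.115 | 1.635 | 1.078 | 0.997 | 1.243 | 15.841 | 2.147 | 0.685 | 11 | 2.8536 |
| Mutomo | 2007 | 10 | 16.055 | 14.295 | 0.681 | 4.459 | 0.942 | 13.589 | 0.456 | 1.796 | 29.731 | 5.039 | 2.491 | 2.141 | 9 | 3.1789 |
| Mutomo | 2008 | 11 | 14.902 | 11.810 | 0.737 | 11.731 | 0.776 | 11.695 | 0.386 | 1.339 | 23.692 | 3.318 | 1.976 | 1.754 | 10 | 2.913 |
| Mutomo | 2009 | 12 | 2.956 | 2.488 | 0.633 | 16.561 | 0.085 | 0.720 | 0.431 | 1.214 | 32.105 | 3.382 | 2.507 | 2.895 | 12 | 2.6866 |
| Lukenya | 2007 | 10 | 0.007 | 0.032 | 0.002 | 0.003 | 0.025 | 1.044 | 2.948 | 1.006 | 0.046 | 0.002 | 0.000 | 0.002 | 9 | 3.1789 |
| Lukenya | 2008 | 11 | 0.007 | 0.031 | 0.002 | 0.003 | 0.025 | 0.972 | 2.872 | 0.992 | 0.045 | 0.002 | 0.000 | 0.002 | 10 | 2.913 |
| Lukenya | 2009 | 12 | 0.234 | 0.407 | 0.972 | 0.431 | 13.402 | 2.115 | 15.701 | 1.816 | 0.180 | 8.068 | 0.412 | 0.314 | 11 | 2.8536 |
| Matiliku | 2007 | 10 | 0.792 | 0.816 | 2.278 | 1.024 | 0.770 | 0.746 | 2.111 | 0.835 | 1.318 | 4.365 | 2.198 | 1.052 | 9 | 3.1789 |
| Matiliku | 2008 | 11 | 0.792 | 0.816 | 0.975 | 0.959 | 0.813 | 0.746 | 2.111 | 0.835 | 1.318 | 4.365 | 2.199 | 0.958 | 10 | 2.913 |
| Mutito | 2007 | 10 | 1.264 | 0.926 | 1.671 | 8.853 | 6.253 | 0.890 | 1.889 | 0.938 | 1.129 | 1.571 | 2.004 | 0.834 | 9 | 3.1789 |
| Mutito | 2008 | 11 | 1.225 | 0.928 | 1.669 | 9.479 | 68.962 | 0.896 | 1.669 | 0.860 | 1.137 | 1.503 | 2.195 | 0.818 | 10 | 2.913 |
| Matungulu | 2007 | 10 | 0.826 | 0.812 | 1.248 | 0.823 | 0.879 | 5.151 | 1.724 | 1.571 | 1.812 | 1.085 | 3.013 | 0.884 | 9 | 3.1789 |
| Matungulu | 2008 | 11 | 0.846 | 0.834 | 1.298 | 0.984 | 0.832 | 7.015 | 5.755 | 1.545 | 2.509 | 1.871 | 4.277 | 0.938 | 10 | 2.913 |
| Matungulu | 2009 | 12 | 0.833 | 0.809 | 1.415 | 0.775 | 0.967 | 6.808 | 5.252 | 1.539 | 2.231 | 1.842 | 4.238 | 0.933 | 11 | 2.8536 |
| Matungulu | 2010 | 13 | 0.824 | 0.820 | 1.169 | 0.823 | 0.879 | 5.238 | 1.961 | 1.513 | 2.855 | 1.123 | 2.888 | 0.936 | 12 | 2.6866 |
| Kisasi | 2007 | 10 | 1.449 | 1.125 | 1.254 | 1.317 | 0.770 | 1.409 | 0.046 | 0.345 | 1.820 | 1.428 | 1.323 | 0.852 | 9 | 3.1789 |
| Kisasi | 2008 | 11 | 1.253 | 1.169 | 1.073 | 1.350 | 0.907 | 1.270 | 0.140 | 0.915 | 1.077 | 1.323 | 1.281 | 0.893 | 10 | 2.913 |
| Kitui | 2007 | 10 | 1.465 | 1.241 | 1.163 | 1.315 | 0.889 | 1.316 | 1.657 | 1.154 | 1.492 | 0.945 | 0.851 | 0.892 | 9 | 3.1789 |
| Kitui | 2008 | 11 | 0.000 | 0.008 | 0.002 | 0.001 | 0.000 | 0.048 | 0.004 | 0.001 | 0.009 | 0.001 | 0.002 | 0.002 | 10 | 2.913 |
| Mutonguini | 2007 | 10 | 1.252 | 1.150 | 1.228 | 1.335 | 1.114 | 1.111 | 0.757 | 0.763 | 1.007 | 1.324 | 1.283 | 0.803 | 9 | 3.1789 |
| Mutonguini | 2008 | 11 | 1.048 | 1.240 | 1.004 | 1.028 | 0.930 | 0.894 | 0.881 | 0.885 | 1.134 | 1.406 | 1.239 | 0.907 | 10 | 2.913 |
| Mutonguini | 2009 | 12 | 1.184 | 1.201 | 1.301 | 1.399 | 1.118 | 1.090 | 0.806 | 0.867 | 1.072 | 1.331 | 1.272 | 0.836 | 11 | 2.8536 |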

Appendix A.2. t-Test and F-Test Following the Arrangement of Sample Datasets in Calendar Months

Student’s t-test and Fisher’s F-test were conducted for the rain gauge datasets for the respective periods against datasets generated using the MOVE.2 approach. The MOVE.2 datasets were developed by replacing rain gauge values with MOVE.2 computed values following a jackknife approach. Statistics for the samples are given in Table A3.
Table A3. Summary Statistics of Samples arranged in Calendar form.
| Station Name | Sample Name | Observations | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|---|
| Mutonguini | MOVE2 2007-09 | 36 | 1.300 | 320.000 | 72.117 | 92.535 |
| Mutonguini | RG-2007-09 | 36 | 0.000 | 457.000 | 91.489 | 116.371 |
| Kisasi | MOVE2 2007-08 | 24 | 3.800 | 258.000 | 75.306 | 77.716 |
| Kisasi | RG 2007-08 | 24 | 0.000 | 457.000 | 101.512 | 131.580 |
| Kisasi | MOVE2-2011 | 12 | 2.900 | 289.000 | 72.158 | 79.550 |
| Kisasi | Kisasi RG-2011 | 12 | 0.800 | 562.300 | 114.883 | 164.200 |
| KYM | MOVE2 2007-2010 | 48 | 3.000 | 162.000 | 48.156 | 38.335 |
| KYM | RG 2007-2010 | 48 | 0.000 | 296.300 | 42.673 | 57.917 |
| Mutomo | MOVE2-2007-09 | 36 | 2.100 | 250.000 | 45.992 | 63.348 |
| Mutomo | Mutomo-RG-2007-09 | 36 | 0.000 | 290.500 | 37.789 | 71.321 |
| Lukenya | MOVE2-2007-09 | 36 | 3.000 | 115.000 | 36.708 | 36.632 |
| Lukenya | RG-2007-09 | 36 | 0.000 | 110.100 | 28.178 | 34.141 |
| Matiliku | MOVE2-2007-08 | 24 | 5.400 | 287.000 | 57.333 | 74.121 |
| Matiliku | RG-2007-08 | 24 | 0.000 | 303.700 | 59.063 | 81.379 |
| Matiliku | MOVE2-2010-11 | 24 | 3.000 | 157.000 | 40.917 | 40.496 |
| Matiliku | RG-2010-11 | 24 | 0.400 | 531.800 | 66.787 | 113.160 |
| Mutito | MOVE2-2007-09 | 36 | 3.000 | 280.000 | 63.333 | 62.405 |
| Mutito | RG-2007-09 | 36 | 0.000 | 262.400 | 63.183 | 74.225 |
| Matungulu | MOVE2-2007-09 | 48 | 4.100 | 187.500 | 72.198 | 52.813 |
| Matungulu | RG-2007-09 | 48 | 0.000 | 271.000 | 68.898 | 68.010 |
| Kitui | MOVE2-2007-08 | 24 | 8.900 | 258.000 | 69.442 | 71.043 |
| Kitui | RG-2007-08 | 24 | 0.000 | 266.700 | 80.796 | 90.989 |
| Kitui | MOVE2-2011 | 12 | 7.100 | 212.000 | 60.475 | 69.233 |
| Kitui | RG-2011 | 12 | 1.800 | 137.400 | 39.733 | 40.296 |
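The replacement procedure behind the MOVE.2 samples can be sketched in Python as a leave-one-year-out loop. This is an illustrative reading of the jackknife approach, not the authors' code: the function names, the data layout, the choice to fit on all other years, and the use of a simple MOVE.1-type line (concurrent mean and standard deviation) in place of the MOVE.2 estimator are all assumptions.

```python
from statistics import mean, stdev


def fit_line(gauge, satellite):
    """Return (intercept, slope) of a MOVE.1-type line from concurrent pairs."""
    slope = stdev(gauge) / stdev(satellite)
    return mean(gauge) - slope * mean(satellite), slope


def jackknife_by_year(gauge_by_year, satellite_by_year):
    """For each year, withhold its gauge values and re-estimate them.

    Both arguments map a year to a list of monthly values. The line is
    fitted on all other years and applied to the withheld year's satellite
    values.
    """
    estimates = {}
    for held_out in gauge_by_year:
        train_g = [v for yr, vals in gauge_by_year.items()
                   if yr != held_out for v in vals]
        train_s = [v for yr, vals in satellite_by_year.items()
                   if yr != held_out for v in vals]
        intercept, slope = fit_line(train_g, train_s)
        estimates[held_out] = [intercept + slope * x
                               for x in satellite_by_year[held_out]]
    return estimates


# Hypothetical data in which the gauge reads exactly twice the satellite value.
satellite = {2007: [5.0, 40.0, 10.0],
             2008: [8.0, 35.0, 12.0],
             2009: [2.0, 50.0, 20.0]}
gauge = {yr: [2.0 * v for v in vals] for yr, vals in satellite.items()}
estimates = jackknife_by_year(gauge, satellite)
print({yr: [round(v, 3) for v in vals] for yr, vals in estimates.items()})
```

The withheld gauge values and their estimates then form the paired samples whose means and variances are compared in the tests that follow.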

Appendix A.3. Student’s t-Test on Two Independent Samples

Student’s t-test on two independent samples was used to compare the means of the rain gauge datasets for the respective periods and the datasets generated using the MOVE.2 jackknife approach. The goal was to test whether there is a clear difference between the means of the two samples. The results of the test were assessed as follows:
Accept the null hypothesis H0 if the computed p-value is greater than the significance level alpha = 0.05.
Reject the null hypothesis H0 if the computed p-value is less than the significance level alpha = 0.05.
Table A4 shows the results of the t-test for the jackknife samples.

Appendix A.4. Two-Sample Comparison of Variances Tests

Fisher’s F-test on two independent samples was used to compare the variances of the rain gauge datasets for the respective periods and the samples generated using the MOVE.2 jackknife approach. The goal was to test whether there is a clear difference between the variances of the two samples. The results of the test were assessed as follows:
Accept the null hypothesis H0 if the computed p-value is greater than the significance level alpha = 0.05.
Reject the null hypothesis H0 if the computed p-value is less than the significance level alpha = 0.05.
Table A5 shows the results of the F-test for the jackknife samples.
Table A4. Results of t-test for the jackknife samples.
| Statistic | Mutonguini 2007–2009 | Kisasi 2007–2008 | Kisasi 2011 | KYM 2007–2009 | Mutomo 2007–2009 | Lukenya 2007–2009 | Matiliku 2007–2008 | Matiliku 2010–2011 | Mutito 2007–2009 | Matungulu 2007–2009 | Kitui 2007–2008 | Kitui 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Difference | −19.372 | −26.206 | −42.725 | −19.372 | 8.203 | 8.531 | −1.729 | −25.871 | 0.150 | 3.300 | −11.354 | 20.742 |
| t (Observed value) | −0.782 | −0.840 | −0.811 | −0.782 | 0.516 | 1.022 | −0.077 | −1.055 | 0.009 | 0.266 | −0.482 | 0.897 |
| t (Critical value) | 1.994 | 2.013 | 2.074 | 1.994 | 1.994 | 1.994 | 2.013 | 2.013 | 1.994 | 1.986 | 2.013 | 2.074 |
| DF | 70 | 46 | 22 | 70 | 70 | 70 | 46 | 46 | 70 | 94 | 46 | 22 |
| p-value (Two-tailed) | 0.437 | 0.405 | 0.426 | 0.437 | 0.608 | 0.310 | 0.939 | 0.297 | 0.993 | 0.791 | 0.632 | 0.379 |
| alpha | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| Decision on H0 | Accept | Accept | Accept | Accept | Accept | Accept | Accept | Accept | Accept | Accept | Accept | Accept |
| Risk to reject null hypothesis H0 while it is true (%) | 43.70 | 40.52 | 42.60 | 43.7 | 60.75 | 31.02 | 93.90 | 29.72 | 99.26 | 79.12 | 63.22 | 37.99 |
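As a cross-check, the first column of Table A4 can be reproduced from the summary statistics in Table A3. The Python sketch below assumes the pooled-variance form of Student's t-test, which matches the reported degrees of freedom (n1 + n2 − 2 = 70); the function name is illustrative, and the critical value is taken from Table A4 rather than computed.

```python
from math import sqrt


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic and degrees of freedom (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# Mutonguini 2007-2009, from Table A3: MOVE.2 sample vs rain gauge sample.
t_obs, df = pooled_t_from_summary(72.117, 92.535, 36, 91.489, 116.371, 36)
t_crit = 1.994  # two-tailed critical value at alpha = 0.05, as listed in Table A4

reject_h0 = abs(t_obs) > t_crit
print(round(t_obs, 3), df, reject_h0)
```

Under these assumptions the statistic rounds to −0.782 with 70 degrees of freedom, in agreement with the tabulated value, and the null hypothesis of equal means is not rejected.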
Table A5. Results of F-test for the jackknife samples.
| Statistic | Mutonguini 2007–2009 | Kisasi 2007–2008 | Kisasi 2011 | KYM 2007–2009 | Mutomo 2007–2009 | Lukenya 2007–2009 | Matiliku 2007–2008 | Matiliku 2010–2011 | Mutito 2007–2009 | Matungulu 2007–2009 | Kitui 2007–2008 | Kitui 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ratio | 0.632 | 0.349 | 0.235 | 0.438 | 0.789 | 1.151 | 0.830 | 0.128 | 0.707 | 0.603 | 2.952 | 0.610 |
| F (Observed value) | 0.632 | 0.349 | 0.235 | 0.438 | 0.789 | 1.151 | 0.830 | 0.128 | 0.707 | 0.603 | 2.952 | 0.610 |
| F (Critical value) | 1.961 | 2.312 | 3.474 | 1.784 | 1.961 | 1.961 | 2.312 | 2.312 | 1.961 | 1.784 | 3.474 | 2.312 |
| DF1 | 35 | 23 | 11 | 47 | 35 | 35 | 23 | 23 | 35 | 47 | 11 | 23 |
| DF2 | 35 | 23 | 11 | 47 | 35 | 35 | 23 | 23 | 35 | 47 | 11 | 23 |
| p-value (Two-tailed) | 0.180 | 0.015 | 0.024 | 0.006 | 0.487 | 0.679 | 0.658 | < 0.0001 | 0.309 | 0.086 | 0.086 | 0.243 |
| alpha | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| Decision on H0 | Accept | Reject | Reject | Reject | Accept | Accept | Accept | Reject | Accept | Accept | Accept | Accept |
| Risk to reject null hypothesis H0 while it is true (%) | 18.01 | 1.45 | 2.38 | 0.55 | 48.67 | 67.92 | 65.78 | 0.01 | 30.95 | 8.63 | 24.28 | 8.63 |
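Similarly, the variance ratios in Table A5 follow from the standard deviations in Table A3. The Python sketch below assumes that the ratio is formed as the MOVE.2 variance over the rain gauge variance and that the null hypothesis is equality of variances. It takes the upper critical value from Table A5 rather than computing it; using 1/F_crit as the lower bound is valid here only because both samples have the same degrees of freedom. The function names are illustrative.

```python
def variance_ratio(sd1, sd2):
    """F statistic as the ratio of two sample variances."""
    return sd1 ** 2 / sd2 ** 2


def two_tailed_f_decision(f_obs, f_crit_upper):
    """Reject H0 (equal variances) if F lies outside [1/F_crit, F_crit].

    Using 1/F_crit as the lower bound assumes equal degrees of freedom
    in the numerator and the denominator.
    """
    return f_obs > f_crit_upper or f_obs < 1.0 / f_crit_upper


# Standard deviations from Table A3; critical values from Table A5.
f_mutonguini = variance_ratio(92.535, 116.371)   # 36 observations each
f_kisasi = variance_ratio(77.716, 131.580)       # 24 observations each

print(round(f_mutonguini, 3), two_tailed_f_decision(f_mutonguini, 1.961))
print(round(f_kisasi, 3), two_tailed_f_decision(f_kisasi, 2.312))
```

Under these assumptions the two ratios round to 0.632 and 0.349, matching the first two columns of Table A5, and the decisions (no rejection for Mutonguini 2007–2009, rejection for Kisasi 2007–2008) agree with the tabulated ones.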

References

  1. Collischonn, B.; Collischonn, W.; Tucci, C.E.M. Daily hydrological modeling in the Amazon basin using TRMM rainfall estimates. J. Hydrol. 2008, 360, 207–216.
  2. Bareither, C.; Foley, J.; Benson, C. Using Surrogate Meteorological Data to Predict the Hydrology of a Water Balance Cover. J. Geotech. Geoenviron. Eng. 2015, 142.
  3. Franz, K.J. Evaluation of National Weather Service Ensemble Streamflow Prediction (ESP) Water Supply Forecasts. Master’s Thesis, Department of Hydrology and Water Resources, University of Arizona, Tucson, Arizona, 2001.
  4. Wilks, D.S. Multisite generalization of a daily stochastic precipitation generation model. J. Hydrol. 1998, 210, 178–191.
  5. Auer, I.; Bohm, R.; Jurkovic, A.; Orlik, A.; Potzmann, R.; Ungersbock, M.; Brunetti, M.; Nanni, T.; Maugeri, M.; Schoner, W.; et al. A new instrumental precipitation dataset for the greater alpine region for the period 1800–2002. Int. J. Climatol. 2005, 25, 139–166.
  6. Eaton, C.; Plaisant, C.; Drizd, T. The challenge of missing and uncertain data. In Proceedings of the IEEE InfoVis Poster Compendium 2003; IEEE Computer Society Press: Los Alamitos, CA, USA, 2003; pp. 40–41.
  7. Schneider, T. Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. J. Clim. 2001, 14, 853–871.
  8. Livneh, B.; Theodore, J.B.; Pierce, W.D.; Munoz-Arriola, F.; Nijssen, B.; Vose, R.; Cayan, R.D.; Brekke, L. A Spatially Comprehensive, Hydrometeorological Data Set for Mexico, the U.S., and Southern Canada 1950–2013. Scientific Data; 2015. Available online: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4540002/ (accessed on 16 February 2015).
  9. Aly, A.; Pathak, C.; Teegavarapu, R.; Ahlquist, J.; Fuelberg, H. Evaluation of Improvised Spatial Interpolation Methods for Infilling Missing Precipitation Records. World Environ. Water Resour. Congr. 2009.
  10. El Sharif, H.A.; Teegavarapu, R.S.V. Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics. In Proceedings of the World Environmental and Water Resources Congress, Albuquerque, NM, USA, 20–24 May 2012; pp. 3822–3832.
  11. Regonda, S.K.; Seo, D.-J.; Lawrence, B.; Brown, J.D.; Demargne, J. Short-term ensemble stream forecasting using operationally produced single-valued streamflow forecasts—A Hydrologic Model Output Statistics (HMOS) approach. J. Hydrol. 2013, 497, 80–96.
  12. Xia, J.; Chen, Y.D. Water problems and opportunities in hydrological Sciences in China. Hydrol. Sci. J. 2001, 46, 907–921.
  13. Teegavarapu, R. Statistical corrections of spatially interpolated missing precipitation data estimates. Hydrol. Process. 2014, 28, 3789–3808.
  14. Villazón, M.F.; Willems, P. Filling gaps and daily disaccumulation of precipitation data for rainfall-runoff model. In Proceedings of the 4th International Scientific Conference on Water Observation and Information Systems for Decision Support, Ohrid, Republic of Macedonia, 25–29 May 2010; Morell, M., Popovska, C., Morell, O., Stojov, V., Eds.; BALWOIS: Ohrid, Republic of Macedonia, 2010; pp. 1–9.
  15. Gillingham, K.; Nordhaus, W.; Anthoff, D.; Blanford, G.; Bosetti, V.; Christensen, P.; McJeon, H.; Reilly, J.; Sztorc, P. Modeling Uncertainty in Climate Change: A Multi-Model Comparison. Cowles Foundation for Research in Economics Yale University. Discussion Paper No. 2022. Available online: http://cowles.yale.edu/sites/default/files/files/pub/d20/d2022.pdf (accessed on 16 February 2015).
  16. Michelson, D.B. Systematic correction of precipitation gauge observations using analysed meteorological variables. J. Hydrol. 2004, 290, 161–177.
  17. Becker, A.; Finger, P.; Meyer-Christoffer, A.; Rudolf, B.; Schamm, K.; Schneider, U.; Ziese, M. A description of the global land-surface precipitation data products of the Global Precipitation Climatology Centre with sample applications including centennial (trend) analysis from 1901-present. Earth Syst. Sci. Data 2013, 5, 71–99.
  18. Pepler, P.T.; Uys, D.W.; Nel, D.G. A comparison of some methods for the selection of a common eigenvector model for the covariance matrices of two groups. Commun. Stat. Simul. Comput. 2013.
  19. Dinku, T.; Ceccato, P.; Grover-Kopec, E.; Lemma, M.; Connor, S.J.; Ropelewski, C.F. Validation of satellite rainfall products over East Africa’s complex topography. Int. J. Remote Sens. 2007, 28, 1503–1526.
  20. Asadullah, A.; Mcintyre, N.; Kigobe, M. Evaluation of five satellite products for estimation of rainfall over Uganda. Hydrol. Sci. J. 2008, 53, 1137–1150.
  21. Bowman, K.P.; Phillips, A.B.; North, G.R. Comparison of TRMM rainfall retrievals with rain gauge data from the TAO/TRITON buoy array. Geophys. Res. Lett. 2003, 30, 1757.
  22. Bell, T.L.; Kundu, P.K. Comparing satellite rainfall estimates with rain gauge data: Optimal strategies suggested by a spectral model. J. Geophys. Res. 2003, 108, 4121.
  23. Panet, I.; Pollitz, F.; Mikhailov, V.; Diament, M.; Banerjee, P.; Grijalva, K. Upper mantle rheology from GRACE and GPS postseismic deformation after the 2004 Sumatra-Andaman earthquake. Geochem. Geophys. Geosyst. 2010, 11.
  24. Gebregiorgis, A.S.; Hossain, F. How well can we estimate error variance of satellite precipitation data around the world? Atmos. Res. 2015, 154, 39–59.
  25. Johnston, C.A. Development and Evaluation of Infilling Methods for Missing Hydrologic and Chemical Watershed Data. Master’s Thesis, Virginia Tech, Blacksburg, VA, USA, 1999.
  26. Hirsch, R.M. A comparison of four streamflow record extension techniques. Water Resour. Res. 1982, 18, 1081–1088.
  27. Gottaut, H.; Kruger, K. Results of experiments at the AVR reactor. Nucl. Eng. Des. 1990, 121, 143–153.
  28. Piantadosi, J.; Boland, J.; Howlett, P. Generating Synthetic Rainfall on Various Timescales—Daily, Monthly and Yearly. In Proceedings of the 17th Biennial Congress on Modelling and Simulation, Christchurch, New Zealand, 10–13 December 2007.
  29. Piani, C.; Haerter, J.O.; Coppola, E. Statistical bias correction for daily precipitation in regional climate models over Europe. Theor. Appl. Climatol. 2010, 99, 187–195.
  30. Githungo, W.N. Index Based Drought Early Warning in Eastern Arid and Semi Arid Kenya. PhD Thesis, Masinde Muliro University of Science and Technology, Kakamega, Kenya, 2016.
  31. Kummerow, C.; Barnes, W.; Kozo, T.; Shiue, J.; Simpson, J. The Tropical Rainfall Measuring Mission (TRMM) sensor package. J. Atmos. Ocean. Technol. 1998, 15, 809–817.
  32. Nicholson, S. On the question of the “recovery” of the rains in the West African Sahel. J. Arid Environ. 2005, 63, 615–641.
  33. Krug, W.R.; Gebert, W.A.; Graczyk, D.J.; Stevens, D.L.; Rochelle, B.P.; Church, M.R. Map of Mean Annual Runoff for the Northeastern, Southeastern, and Mid-Atlantic United States, Water Years 1951–1980; U.S. Geological Survey Water Resources Investigations Report 88-4094; U.S. Geological Survey: Madison, WI, USA, 1990.
  34. Hirsch, R.M. An evaluation of some record reconstruction techniques. Water Resour. Res. 1979, 15, 1781–1790.
  35. Parrett, C.; Johnson, D.R. Estimates of Monthly Streamflow Characteristics and Dominant-Discharge Hydrographs for Selected Sites in the Lower Missouri and Little Missouri River Basins in Montana; U.S. Geological Survey Water Resources Investigations Report 94-4098; U.S. Geological Survey: Madison, WI, USA, 1994; p. 29.
  36. Alley, W.M.; Burns, A.W. Mixed-station extension of monthly streamflow records. J. Hydraul. Eng. 1983, 109, 1272–1284.
  37. Prichard, D.; Theiler, J. Generating surrogate data for time series with several simultaneously measured variables. Phys. Rev. Lett. 1994, 73, 951–954.
  38. Altman, D.G.; Bland, J.M. The normal distribution. BMJ 1995, 310, 298.
  39. Njoroge, E.M. Validation of Satellite Derived Rainfall Estimates over Kenya. Master’s Thesis, University Of Nairobi, Nairobi, Kenya, 2010.
  40. Mahmud, M.R.; Numata, S.; Matsuyama, H.; Hosaka, T.; Hashim, M. Assessment of Effective Seasonal Downscaling of TRMM Precipitation Data in Peninsular Malaysia. Remote Sens. 2015, 7, 4092–4111.
  41. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. 2006, 22, 679–688.
  42. Wilson, T. The forecast accuracy of Australian Bureau of Statistics national population projections. Int. J. Popul. Res. 2007, 24, 91–117.
  43. Nyandiko, N.; Wakhungu, K.; Oteng’i, S. Effects of climate variability on maize yield in the arid and semi arid lands of lower eastern Kenya. Agric. Food Secur. 2015.
  44. Glover, J.; Robinson, P.; Henderson, J.P. Provisional maps of the reliability of annual rainfall in East Africa. Q. J. R. Meteorol. Soc. 1954, 80, 602–609.
  45. Snedecor, G.W.; Cochran, W.G. Statistical Methods, 8th ed.; Iowa State University Press: Ames, IA, USA, 1989.
  46. Sawilowsky, S. Fermat, Schubert, Einstein, and Behrens–Fisher: The Probable Difference between Two Means When σ1² ≠ σ2². J. Mod. Appl. Stat. Methods 2002, 1, 461–472.
  47. Snedecor, G.W.; Cochran, W.G. Statistical Methods, 7th ed.; Iowa State University Press: Ames, IA, USA, 1980.
  48. Theiler, J.; Eubank, S.; Longtin, A.; Galdrikian, B.; Farmer, D. Testing for nonlinearity in time series: The method of surrogate data. Physica D 1992, 58, 77–94.
  49. Minitab 17. Getting Started with Minitab 17; Minitab Inc.: State College, PA, USA, 2016.
  50. Harvey, C.L.; Dixon, H.; Hannaford, J. An appraisal of the performance of data-infilling methods for application to daily mean river flow records in the UK. Hydrol. Res. 2012, 43, 618–636.
  51. Kyriakidis, P.C.; Miller, N.L.; Kim, J. Uncertainty Propagation of Regional Climate Model Precipitation Forecasts to Hydrologic Impact Assessment. J. Hydrometeorol. 2001.
  52. Brandsma, T.; Können, G.P. Application of nearest-neighbour resampling techniques for homogenizing temperature records on a daily to sub-daily level. Int. J. Climatol. 2006, 26, 75–89.
  53. Shao, J.; Wu, C.F.J. A general theory for jackknife variance estimation. Ann. Stat. 1989, 17, 1176–1197.
  54. Zou, G.H.; Li, Y.L.; Zhu, R.; Guan, Z. Imputation of mean of ratios for missing data and its application to PPSWR sampling. Acta Math. Sin. 2010, 26, 863.
  55. Sen, Z.; Eljadid, A.G. Rainfall Distribution Function for Libya and Rainfall Prediction. Hydrol. Sci. J. 1999, 44, 665–680.
  56. Taylor, J.R. An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, 2nd ed.; University Science Books: Mill Valley, CA, USA, 1997.
  57. Bevington, P.R.; Robinson, D.K. Data Reduction and Error Analysis for the Physical Sciences, 2nd ed.; WCB/McGraw-Hill: Boston, MA, USA, 1992; p. 328.
  58. Gentzkow, M.; Kamenica, E. A Rothschild-Stiglitz Approach to Bayesian Persuasion; Working Paper; Stanford University: Stanford, CA, USA; University of Chicago: Chicago, IL, USA, 2015.
  59. Khalema, T. Stochastic Ordering with Applications to Reliability Theory. Master’s Thesis, University of the Free State, Bloemfontein, South Africa, 2015.
  60. Giustarini, L.; Parisot, O.; Ghoniem, M.; Trebs, I.; Médoc, N.; Faber, O.; Hostache, R.; Matgen, P.; Otjacques, B. Data-infilling in daily mean river flow records: First results using a visual analytics tool (gapIT). Geophys. Res. Abstr. 2015, 17, 10462.
  61. Henn, B.; Raleigh, M.S.; Fisher, A.; Lundquist, J.D. A Comparison of Methods for Filling Gaps in Hourly Near-Surface Air Temperature Data. Am. Meteorol. Soc. 2013.
  62. Thompson, D.W.J.; Li, Y. Baroclinic and barotropic annular variability in the Northern Hemisphere. J. Atmos. Sci. 2015, 72, 1117–1136.
Figure 1. Map of Machakos, Makueni and Kitui Counties inset in a map of Kenya and Africa. Source: [30], republished with permission of the Masinde Muliro University of Science & Technology.
Figure 2. Linear Plot Comparison of Rain gauge and TRMM datasets for Kampi ya Mawe Station.
Figure 3. Mutonguini rain gauge data plotted against respective TRMM data set for the period November 2001–December 2002.
Figure 4. Mutomo rain gauge data plotted against respective TRMM data set for the period October 2004–October 2005.
Figure 5. Kambi ya Mawe rain gauge data plotted against respective TRMM data set for the period January 1998–May 1999.
Figure 6. Kitui rain gauge data plotted against respective TRMM data set for the period November 2001–March 2003.
Figure 7. Plot of Kambi ya Mawe rain gauge (Actual) vs. MOVE.2 (Predicted).
Figure 8. Plot of Kisasi rain gauge (Actual) vs. MOVE.2 (Predicted).
Figure 9. Distribution of Mean Absolute Percentage Error of Samples of the Rain gauge Values and the MOVE.2 Estimates.
Figure 10. Results of Regression Analysis of the samples of Kisasi station for the year 2011.
Figure 11. Mean of Regression Residuals for Kambi ya Mawe Station.
Figure 12. Mean of Regression Residuals for Mutomo Station.
Figure 13. Mean of Regression Residuals for Lukenya Station.
Figure 14. Mean of Regression Residuals for Matiliku Station.
Figure 15. Mean of Regression Residuals for Mutito Station.
Figure 16. Mean of Regression Residuals for Matungulu Station.
Figure 17. Mean of Regression Residuals for Kisasi Station.
Figure 18. Mean of Regression Residuals for Kitui Station.
Figure 19. Mean of Regression Residuals for Mutonguini Station.
Figure 20. Normal Quantile Plot for Kisasi_Samples for 2011.
Figure 21. Gamma Cumulative Distribution Plot for Mutomo Station During the month of February.
Figure 22. Gamma Cumulative Distribution Plot for Mutomo Station During the month of April.
Figure 23. Gamma Cumulative Distribution Plot for Mutomo Station During the month of July.
Figure 24. Gamma Cumulative Distribution Plot for Mutomo Station During the month of November.
Table 1. Locations of the rain gauges, corresponding grid point for Tropical Rainfall Measuring Mission (TRMM) data extraction, distance between rain gauge location and grid point data, the number and Proportion of data points of missing record and the period of missing data.
| Station Name | Station Identifier Number | Station Latitude (Degrees) | Station Longitude (Degrees) | TRMM Pixel Centre Latitude (Degrees) | TRMM Pixel Centre Longitude (Degrees) | Distance (km) between Rain Gauge and TRMM Grid Point | Number of Missing Data Points (1961–2011) | Percentage (%) of Missing Data of Total Data-Set (1961–2011) | Period of Missing Data Points |
|---|---|---|---|---|---|---|---|---|---|
| Kambi ya Mawe | 9137075 | −1.85 | 37.6667 | −1.75 | 37.75 | 0.520 | 12 | 1.96 | Jan 2011–Dec 2011 |
| Mutonguni | 9137094 | −1.28 | 37.9833 | −1.0 | 38 | 0.75 | 24 | 3.92 | Jan 2010–Dec 2011 |
| Kitui | 9137095 | −1.22 | 37.59 | −1.25 | 38 | 0.23 | 24 | 3.92 | Jan 2009–Dec 2010 |
| Mutomo | 9138001 | −1.85 | 38.2 | −1.75 | 38.25 | 0.45 | 12 | 1.96 | Jul–Nov 2010 |
| Kisasi | 9138037 | −1.5333 | 38.0167 | −1.5 | 38 | 0.15 | 30 | 4.9 | Jul 2008–Dec 2009 and Jan–Dec 2011 |
| Lukenya | 9137046 | −1.5333 | 37.6167 | −1.5 | 37.5 | 0.49 | 29 | 4.73 | Jan 2010–May 2011 |
| Matungulu | 9137040 | −1.2667 | 37.35 | −1.25 | 37.25 | 0.41 | 29 | 4.73 | Jan–Dec 2011 |
| Matiliku | 9137028 | −1.95 | 37.533 | −2.0 | 37.5 | 0.20 | 19 | 3.10 | Jan–Jul 2009 |
| Mutito Forest | 9138040 | −1.13 | 38.11 | −1.0 | 38.25 | 0.65 | 26 | 4.24 | Nov 2010–Dec 2011 |
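The percentage column in Table 1 appears to be the number of missing monthly values divided by the total number of months in the 1961–2011 record. The Python sketch below checks that reading, assuming 51 years × 12 months = 612 monthly data points. The function name `missing_percentage` is illustrative and not from the paper.

```python
# Assumption: the 1961-2011 record holds 51 years x 12 months = 612 monthly values.
TOTAL_MONTHS = (2011 - 1961 + 1) * 12

def missing_percentage(n_missing, total=TOTAL_MONTHS):
    """Return the share of missing monthly values as a percentage."""
    return 100.0 * n_missing / total

# Missing-point counts taken from Table 1.
for station, n_missing in [("Kambi ya Mawe", 12), ("Kitui", 24), ("Matiliku", 19)]:
    print(f"{station}: {missing_percentage(n_missing):.2f}%")
```

Under this assumption, 12 missing points give 1.96% and 24 missing points give 3.92%, which agrees with the tabulated values for those stations.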
Table 2. MOVE.2 parameters used in computation of infilled values for Kitui Station.
JanFebMarAprMayJunJulAugSepOctNovDec
m’(y)56.5162.86137.91158.5967.929.983.4356.8411.53158.28274.44108.90
S’(y)1727.0215776.681959.56795.491174.0612.377.16623.74206.216013.482924.2718701.36
M(x)75.5326.35107.83152.7169.068.937.1112.6012.0332.302187.3688.02
M(y)54.459.2127.9162.269.810.14.38.015.162.5246.394.1
à4.4−8.8−13.2−8.8−7.33333−6.6−6.16−5.86667−5.65714−5.5−5.37778−5.28
y’(2007)57.897.77144.13157.7166.209.972.965.675.4735.24275.17137.03
y’(2008)55.227.96131.70159.4869.6510.013.918.0217.6181.33273.7280.77
X291.974.3219.4386.1159.117.115.625.943.7193.3437.766.4
X12.89.252.8215.5113.310.815.320.638.7143.5239.429.5
M(X1)51.478.5589.36165.2766.478.323.8510.9710.5732.67209.65111.17
M(X2)56.8220.88183.22116.0646.704.919.7813.3012.0230.69201.0272.30
Table 3. MOVE.2 parameters used in computation of infilled values for Mutonguini Station.
JanFebMarAprMayJunJulAugSepOctNovDec
m’(y)71.6127.15115.70155.6972.2059.346.619.2511.7431.88200.293.56
S’(y)6265.06852.111579.48574.963281.3714.267.39157.1912.6055.516776.3817067.16
M(x)68.0520.1889.82169.9963.737.285.5110.9811.3152.86235.4691.14
M(y)54.168.8137.9161.453.99.31.57.722.184.2291.6117.1
à4.4−8.8−13.2−8.8−7.3−6.6−6.16−5.8−5.6−5.5−5.3−5.2
y’(2007)67.527.75119.17156.2975.949.456.414.1811.7031.8217.5112.4
y’(2008)75.6926.55112.22155.0968.469.236.8122.6911.8031.91183.0674.71
X2150.2859.89140.34299.01117.689.1413.2346.7341.8696.18411.8281.66
X1131.8214.6730.55124.00544.366.5710.6532.3915.5044.5171.4138.31
M(X1)37.216.4082.83187.8158.856.453.729.3810.0140.45227.02133.44
M(X2)50.117.60140.87157.1845.795.106.6311.0511.4465.8263.0555.06
Table 4. Distribution of number of data points infilled with MOVE.2 estimated values for respective stations.
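| Station | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mutomo | | | | | | | 1 | 1 | 1 | 1 | 1 | |
| KYM | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Mutonguni | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Kitui | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Kisasi | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 |
| Lukenya | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Matungulu | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Matiliku | 1 | 1 | 1 | 1 | 1 | 1 | 1 | | | | | |
| Mutito Forest | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |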
Table 5. Differences in Descriptive Statistics of rain gauge values and MOVE.2 estimates.
Mean (mm)Median (mm)Std. Dev (mm)Std. Err. Mean (mm)Minimum (mm)Maximum (mm)SkewnessKurtosis
Kambi ya Mawe20078.9659.191−1.925−0.5562.76310.115−0.0690.318
2008−19.54119.85−59.335−17.129−0.1−173.3−0.598−0.924
200916.08529.75−5.392−1.5579.3630.7−0.6550.59
201016.42519.61.8810.5429.823.90.2130.824
Lukenya200712.85810.110.9533.1625.143.50.2620.94
20080.09112.15−6.604−1.90634.90.3980.626
201012.64117.657.7172.228443.50.2860.398
Mutomo20079.31730.55−36.1−10.4228.2−121.5−0.185−0.02
20086.9921.950.0840.0252.15.7−0.039−0.061
20098.315.39.3662.7032.144.20.3340.995
20119.6918.559.5962.772.128.8−0.295−0.881
Matiliku2007−2.5336.15−13.159−3.7985.4−21.80.0410.278
2008−0.925−2.2−3.101−0.8951.9−16.7−0.135−0.834
2010−0.759−0.62.6810.774−5−10.3−0.471−0.801
201152.5−0.45117.60933.951−1.4408.80.150.756
Mutitu2007−3.8514.95−25.843−7.463−41.20.4230.983
2008−1.94210.5−11.219−3.2383.5−31.1−0.092−0.234
20096.2412.6−6.774−1.9563.917.60.3710.213
Matungulu20077.45834.15−24.599−7.101−2.3−125−0.152−0.372
200810.42533.25−7.593−2.192−3.4−15−0.496−0.386
2009−17.267−36.85−15.965−4.6095.8−510.275−0.414
201012.58434.67−7.166−2.0698.613.30.3350.288
Kisasi2007−11.5796.2−42.752−12.3427.1−138.3−0.473−0.107
2010−42.725−16.35−84.65−24.4372.1−273.3−0.163−0.026
Kitui2007−17.6674.5−32.676−9.4330.9−900.020.507
2008−3.192−1.85−3.975−1.1474.3−3.5−0.078−0.278
201158.19214.3566.05819.07−1.8129.3−0.553−3.166
Mutonguini2007−13.05815.5−31.919−9.2141.9−310.8054.868
2008−15.0410.55−18.554−5.356−1.7−48.30.3852.043
200950.7515.0573.71221.279−0.6189.7−0.394−2.096
Table 6. Results of Wilcoxon Test Comparing the Difference between the mean of the Median of the samples of MOVE.2 Estimates and the Rain gauge Values.
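| Month of the Year | KYM | Mutomo | Lukenya | Matiliku | Mutito | Matungulu | Kisasi | Kitui | Mutonguini |
|---|---|---|---|---|---|---|---|---|---|
| Jan | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Feb | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| March | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| April | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| May | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
| June | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| July | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| August | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| September | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| October | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| November | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| December | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |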
Table 7. Durbin-Watson Statistic Matching the size of infilled datasets and Highest Number of Points infilled per month (n).
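| Station | Size of Infilled Dataset | Value of n | Upper Bound Value | Lower Bound Value | Durbin-Watson Statistic |
|---|---|---|---|---|---|
| Kambi ya Mawe | 12 | 1 | 2 | 1 | 2.1 |
| Mutonguni | 24 | 2 | 2 | 1 | 2.32 |
| Kitui | 24 | 2 | 2 | 1 | 2.49 |
| Mutomo | 12 | 1 | 2 | 1 | 2.89 |
| Kisasi | 30 | 4 | 2 | 1 | 2.32 |
| Lukenya | 29 | 4 | 2 | 1 | 2.89 |
| Matungulu | 29 | 4 | 2 | 1 | 2.22 |
| Matiliku | 19 | 2 | 2 | 1 | 2.38 |
| Mutito Forest | 26 | 3 | 2 | 1 | 2.15 |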
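For readers unfamiliar with the statistic reported in Table 7, the following is a minimal Python sketch of the standard Durbin-Watson formula: the sum of squared successive differences of the regression residuals divided by the sum of squared residuals. The function name and the sample residuals are hypothetical, and the sketch is not the authors' implementation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between consecutive residuals.
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Denominator: sum of squared residuals.
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("all residuals are zero")
    return num / den

# Hypothetical residuals (illustrative only, not from the paper).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs: negative autocorrelation
print(durbin_watson([2.0, 2.0, 2.0]))         # constant residuals: positive autocorrelation
```

The statistic ranges from 0 to 4. Values near 2 indicate little first-order autocorrelation, values towards 0 indicate positive autocorrelation, and values towards 4 indicate negative autocorrelation.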
Table 8. Correlation coefficient and the coefficient of determination for the years 2007–2011 for the respective stations.
20072008200920102011
Corr. Coef.R2Corr. Coef.R2Corr. Coef.R2Corr. Coef.R2Corr. Coef.R2
KYM0.920850.570.60.520.650.890.79--
Matiliku0.960920.990.99--0.980.960.550.52
Lukenya0.420180.8070.650.690.48----
Mutomo0.880.7707280.89-0.86--0.920.85
Kisasi0.94088------0.830.69
Mutitu0.940.870.950.910.840.71----
Matungulu0.600.360.900.810.690.650.940.82--
Kitui0.900.800.940.89----0.630.59
Mutonguini0.980.960.970.950.840.71----

Share and Cite

MDPI and ACS Style

Githungo, W.; Otengi, S.; Wakhungu, J.; Masibayi, E. Infilling Monthly Rain Gauge Data Gaps with Satellite Estimates for ASAL of Kenya. Hydrology 2016, 3, 40. https://doi.org/10.3390/hydrology3040040

AMA Style

Githungo W, Otengi S, Wakhungu J, Masibayi E. Infilling Monthly Rain Gauge Data Gaps with Satellite Estimates for ASAL of Kenya. Hydrology. 2016; 3(4):40. https://doi.org/10.3390/hydrology3040040

Chicago/Turabian Style

Githungo, William, Silvery Otengi, Jacob Wakhungu, and Edward Masibayi. 2016. "Infilling Monthly Rain Gauge Data Gaps with Satellite Estimates for ASAL of Kenya" Hydrology 3, no. 4: 40. https://doi.org/10.3390/hydrology3040040

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
