1. Introduction
Precipitation is a major component of the water cycle and is also a key input to hydrological and ecohydrological models. Meanwhile, the water cycle is largely influenced by changes in regional temperature [
1]. Therefore, long-term precipitation and temperature records are vital for studying climate change, forecasting local precipitation variability and analyzing trends in extreme events. Despite this, the acquisition of reliable precipitation and temperature data is still a challenging task, especially in developing countries. Ground-based gauge collection is generally regarded as the most accurate approach for acquiring precipitation and temperature data. However, climate station networks are sparse in many regions due to high installation, operation and maintenance costs, and low awareness of the importance of such information [
2], resulting in the inability to capture precipitation and temperature information at sufficient spatial and temporal resolutions.
Gridded climate products (GCPs), which are developed from modeled and satellite remotely sensed data sources, are potential alternative sources of climate data for streamflow modeling and other applications, offering uninterrupted regional coverage and high spatial and temporal resolutions [
3,
4,
5]. For instance, the National Centers for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR) [
6] and the Asian Precipitation—Highly-Resolved Observational Data Integration towards Evaluation of Water Resources (APHRODITE) [
7] are available globally at a daily time-scale for periods of more than 35 years. Recently, Ashouri et al. [
8] developed a new daily time-scale high resolution satellite precipitation product, called the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network-Climate Data Record (PERSIANN-CDR), for long-term hydro-climatic studies. However, the reliability of these products in many regions is still not well known.
Many studies have validated the performance of GCPs at either global, regional or catchment scale [
9,
10,
11]. Many of the studies reveal regional differences in GCP performance. For example, Tan et al. [
12] reported underestimation of precipitation values by APHRODITE over Peninsular Malaysia, whereas Jamandre and Narisma [
13] showed overestimation of the same product in the Philippines. Based on Fekete et al. [
14], such differences are expected to be larger in tropical regions compared to temperate regions due to the high precipitation variability. In addition, GCPs are associated with various uncertainties and differences in terms of algorithms, sources, spatial and temporal resolutions [
15]. These errors can propagate into streamflow modeling via water cycle processes [
16,
17].
Reliable climate data are essential for hydrological modeling because errors in climate inputs could lead to false model outputs. For example, an inappropriate model setup with inaccurate GCPs could result in a seemingly “good” model [
18] that leads to wrong simulations and subsequent decisions. Therefore, a capability assessment of GCPs prior to applying them in a hydrological model is critical to understanding and reducing these errors. In tropical regions, the capability of GCPs for hydrological assessments has been evaluated in the upper Mara Catchment, Kenya [
19]; Negro River Basin, Amazon [
20]; Blue Nile River Basin [
21]; and Andean watersheds [
22]. Vu et al. [
23] compared five GCPs in streamflow simulations of the Dak Bla River in Vietnam and concluded that APHRODITE performed the best in replicating daily streamflows. APHRODITE was also used successfully by Le and Sharif [
24] to evaluate climate change impacts on streamflow in the Huang River Basin in Central Vietnam. Several studies found that NCEP-CFSR performed poorly in streamflow simulation studies conducted in tropical or sub-tropical regions [
25,
26,
27]. However, Auerbach et al. [
28] reported satisfactory streamflow simulations using NCEP-CFSR for two catchments in Puerto Rico. Zhu et al. [
29] and Ashouri et al. [
30] reported that PERSIANN-CDR performed well when used in streamflow simulations of sub-tropical catchments in China and the United States, respectively. Most studies have focused only on GCP precipitation data assessments; comparatively few have also assessed the accuracy of GCP temperature data [
31]. To date, the assessment of suitability and accuracy of these newly developed GCPs in streamflow simulations is still limited in Malaysia.
The overall goal of this study is to investigate the performance of long-term GCPs relative to climate data inputs via streamflow simulations for two major basins in Malaysia. This is an extension of the previous study by Tan et al. [
12] which evaluated the performance of different GCPs across the entire country of Malaysia, but did not incorporate streamflow analysis. The specific objectives here for the two study basins are: (1) to assess the accuracy of the APHRODITE, PERSIANN-CDR and NCEP-CFSR data for precipitation and temperature data retrieval from 1983 to 2007; (2) to evaluate the capability of these products for streamflow simulations using the Soil and Water Assessment Tool (SWAT) ecohydrological model [
32,
33,
34,
35,
36]; and (3) to analyze the suitability of the three GCPs for capturing extreme hydro-climatic events.
4. Results
4.1. Precipitation Validation
The results of the statistical assessment of the 25-year (1983 to 2007) comparisons between the APHRODITE, PERSIANN-CDR and NCEP-CFSR annual, seasonal, monthly and daily precipitation data versus the rain gauge observations for the KRB and JRB are listed in
Table 2. The PERSIANN-CDR monthly-scale precipitation was the only GCP data that did not show significant differences relative to the KRB rain gauge observations, at a significance level of 0.05 (
Table 2). The PERSIANN-CDR data showed no significant differences versus observations for the JJA season in both basins.
In the KRB, the APHRODITE precipitation data produced the best linear correlation for all time-scales, with CC values varying from 0.38 to 0.74, followed by the PERSIANN-CDR and NCEP-CFSR data. It is also clear that the APHRODITE and PERSIANN-CDR precipitation data underestimated the annual, DJF, SON, monthly and daily precipitation amounts, based on the respective positive and negative signs for the ME and RB indicators, while the NCEP-CFSR data resulted in highly overestimated precipitation across the basin. In addition, the NCEP-CFSR data showed the largest average errors as evidenced by the highest RMSE values that ranged from 19.49 mm to 1695.34 mm for most of the time-scales, except for the DJF.
All other GCP data showed significant differences for annual, daily and monthly time steps as compared to the rain gauge precipitation estimates for the JRB (
Table 2). The APHRODITE data produced the best results at the DJF, JJA, SON, monthly and daily time-scales, with CC values that ranged from 0.44 to 0.73. In contrast, the NCEP-CFSR data resulted in the worst performance at all time-scales, with CC values between 0.13 and 0.46. The APHRODITE data slightly underestimated the MAM, SON, monthly and daily precipitation levels, whereas the PERSIANN-CDR and NCEP-CFSR data produced large overestimations.
Generally, the GCPs showed better linear correlation performance for the DJF and monthly time-scale estimations compared to the other time-scales in both basins. The results found here showed that the APHRODITE data produced the best precipitation estimation performance over both basins, which is in agreement with Tan et al. [
12], who conducted a national assessment over Malaysia. The main reason is that the developers of APHRODITE incorporated MMD rain gauge data in the development of the product [
7]. In contrast, NCEP-CFSR displayed more serious errors and dramatically overestimated the total precipitation compared to the other GCPs. Similarly, Roth and Lemann [
64] found that the total annual NCEP-CFSR precipitation was three times greater than the observed precipitation in Ethiopia. The distinct weaknesses quantified for the NCEP-CFSR data may be attributed to scale differences: a grid cell is large (up to 0.3125°) compared to station data, which are point-based measurements. The errors are expected to be higher in grid cells with high spatial and temporal precipitation variability, as well as in regions characterized by complex topography [
65].
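The continuous statistics discussed above can be illustrated with a minimal Python sketch. It assumes the standard definitions of the correlation coefficient (CC), mean error (ME), relative bias (RB) and root mean square error (RMSE), with errors taken as GCP estimate minus gauge observation. The sign convention, the function name `continuous_metrics` and the sample values are assumptions made for illustration and are not taken from this study.

```python
import math


def continuous_metrics(obs, est):
    """Return CC, ME, RB (%) and RMSE for paired gauge (obs) and GCP (est) values."""
    if len(obs) != len(est) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least two values")
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_est = sum(est) / n

    # Pearson correlation coefficient (CC)
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_est = sum((e - mean_est) ** 2 for e in est)
    cc = cov / math.sqrt(var_obs * var_est)

    # Assumed sign convention: positive values mean the GCP overestimates
    errors = [e - o for o, e in zip(obs, est)]
    me = sum(errors) / n
    rb = 100.0 * sum(errors) / sum(obs)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    return {"CC": cc, "ME": me, "RB": rb, "RMSE": rmse}


# Illustrative values only (mm); not data from the study basins
gauge = [10.0, 20.0, 30.0, 40.0]
product = [12.0, 22.0, 32.0, 42.0]
metrics = continuous_metrics(gauge, product)
print(metrics)
```

In this toy case the product tracks the gauge perfectly in a linear sense (CC of 1) while still carrying a constant positive bias, which is why correlation and bias indicators are best read together.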
4.2. Precipitation Spatial Variability
The monthly CC and RB values for the GCPs over both basins are presented in
Figure 2 and
Figure 3, respectively, to provide insights regarding spatial variability. Generally, high CC values for all GCPs were found for the northern and eastern KRB sub-regions, which are near coastal and low elevation areas (
Figure 2a–c). All of the GCPs reflected strong performance of the CC values computed for the northwest JRB sub-region, while lower CC values dominated in the middle of the basin (
Figure 2d–f). The APHRODITE data underestimated monthly ground-based precipitation at most of the stations (
Figure 3). In contrast, the NCEP-CFSR data dramatically overestimated monthly precipitation at all of the stations, resulting in especially high RB values (more than 100%) for the stations mainly distributed in the southwestern KRB sub-region, which is characterized by high mountains (
Figure 3c). The NCEP-CFSR was the only GCP which resulted in significant overestimates for all stations distributed across the JRB.
These findings agree with other studies, which state that GCPs generally are more reliable in low land regions compared to higher elevations [
66,
67]. This might be due to infrared (IR) sensors misrepresenting the effects of warm clouds, which commonly appear over mountaintops [
68]. The overall less accurate performance of GCPs in mountainous regions may also be due to fewer rain gauges being available for product development. The installation and maintenance of climate stations in high mountainous regions is often problematic because of difficult physical access and the fact that each climate station is representative of a relatively small area due to high topographic variability. In general, the APHRODITE dataset performed better for mountainous regions compared to the other two GCPs, because the product is better at resolving orographic precipitation variability [
69].
4.3. Precipitation: Rain Detection and Intensity Assessment
The NCEP-CFSR data showed the strongest rain detection ability, with POD values of 0.94 and 0.96 for the KRB and JRB, respectively. However, the APHRODITE data exhibited better ACC skill for the JRB, indicating a stronger capability to correctly identify precipitation and non-precipitation events in southern Peninsular Malaysia. In contrast, the PERSIANN-CDR and NCEP-CFSR GCPs performed better for the KRB. The analysis further revealed that the NCEP-CFSR data were the most prone to predicting false rain events, which were not recorded by the rain gauges, resulting in the highest FAR values of 0.52 (KRB) and 0.57 (JRB). Moderate CSI values were obtained for all three GCPs, ranging from 0.45 to 0.48 (KRB) and 0.42 to 0.51 (JRB), indicating that roughly 50% of the precipitation events were correctly detected.
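The detection scores above can be sketched from a simple contingency table. The snippet assumes the commonly used definitions of POD, FAR, CSI and ACC based on hits, misses, false alarms and correct negatives. The function name `detection_scores`, the rain/no-rain threshold and the sample values are hypothetical and are not taken from this study.

```python
def detection_scores(obs, est, threshold):
    """Return POD, FAR, CSI and ACC for paired daily gauge (obs) and GCP (est) values."""
    hits = misses = false_alarms = correct_negatives = 0
    for o, e in zip(obs, est):
        obs_rain = o > threshold
        est_rain = e > threshold
        if obs_rain and est_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1
        else:
            correct_negatives += 1

    def ratio(num, den):
        # Undefined scores (empty denominator) are returned as NaN
        return num / den if den else float("nan")

    total = hits + misses + false_alarms + correct_negatives
    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
        "ACC": ratio(hits + correct_negatives, total),
    }


# Illustrative daily values only (mm); the 0.5 mm threshold is an assumption
gauge_daily = [0.0, 5.0, 0.0, 12.0, 3.0, 0.0, 0.0, 8.0]
product_daily = [1.0, 4.0, 0.0, 10.0, 0.0, 2.0, 0.0, 6.0]
scores = detection_scores(gauge_daily, product_daily, threshold=0.5)
print(scores)
```

A product that reports rain on almost every day would score a high POD and a high FAR at the same time, which is the pattern described for the NCEP-CFSR data.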
Figure 4 presents the probability distribution functions (PDFs) of precipitation intensity for the KRB and JRB. The non-precipitation values ≤0.254 mm·day⁻¹ (a common rain gauge detection threshold) were removed from the analysis. The three GCPs showed moderate underestimation for the ≥50 mm·day⁻¹ precipitation class over both basins. The NCEP-CFSR data resulted in significant overestimation for the 5–10 and 10–20 mm·day⁻¹ precipitation classes in both basins. This is similar to the results reported by Blacutt et al. [
70], who also found that the NCEP-CFSR overestimated precipitation in the 3–20 mm·day⁻¹ class in Bolivia. They further reported that the NCEP-CFSR tended to overestimate precipitation during the annual precipitation season. This problem could potentially be amplified in both the KRB and JRB, which are typical tropical basins that receive precipitation throughout the year, especially during the northeast and southwest monsoon periods. The NCEP-CFSR overestimation rate was higher for the JRB (up to 270% for the 5–10 mm·day⁻¹ class) compared to the KRB, because the Sumatra and Titiwangsa mountain ranges help to reduce the number of precipitation days in the KRB during the southwest monsoon season.
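The intensity-class comparison can be sketched as a relative-frequency histogram of wet days. The 0.254 mm·day⁻¹ cut-off and the 5–10, 10–20 and ≥50 mm·day⁻¹ classes come from the text; the remaining class edges, the function name `intensity_distribution` and the sample values are assumptions made for illustration.

```python
import bisect

# Lower edges of the intensity classes (mm/day). Only 0.254, 5, 10, 20 and 50
# appear in the text; the other edges are assumed for this sketch.
CLASS_EDGES = [0.254, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]


def intensity_distribution(daily_mm, edges=CLASS_EDGES):
    """Return the fraction of wet days falling in each intensity class."""
    # Drop non-precipitation values at or below the detection threshold
    wet = [p for p in daily_mm if p > edges[0]]
    counts = [0] * len(edges)
    for p in wet:
        # Index of the last edge that is <= p; the final class is open-ended
        counts[bisect.bisect_right(edges, p) - 1] += 1
    total = len(wet)
    return [c / total if total else 0.0 for c in counts]


# Illustrative daily values only (mm)
sample = [0.0, 0.2, 0.254, 0.5, 3.0, 7.0, 7.5, 15.0, 60.0]
distribution = intensity_distribution(sample)
for lower, fraction in zip(CLASS_EDGES, distribution):
    print(f">= {lower} mm/day: {fraction:.3f}")
```

Computing the same distribution for the gauge series and for each GCP, and then taking the ratio class by class, gives an over- or underestimation rate per intensity class.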
4.4. Temperature Validation
The statistical comparison of the NCEP-CFSR maximum and minimum temperature data against the climate station temperature gauges (
Figure 1) of the KRB and JRB is listed for various time scales in
Table 3. The temperature values from each temperature gauge were compared to the nearest NCEP-CFSR grid point. Generally, the NCEP-CFSR temperature data have better correlation with observations at the DJF and monthly time-scale, with CC values ranging from 0.6 to 0.91 and 0.57 to 0.93, respectively. In addition, the daily maximum temperature data were better correlated with the observed data as compared to the minimum temperature data. However, the average error of the daily maximum temperature data (RMSE = 2.58 to 3.32 °C) is larger than the minimum temperature (RMSE = 0.98 to 2.68 °C) at all stations.
Box plots comparing the NCEP-CFSR and climate station maximum and minimum temperature data, for the four climate stations distributed across the KRB and JRB, are shown in
Figure 5. The inter-quartile range shows that the minimum temperature at the 48679 station provides the best performance, as the range of the NCEP-CFSR data matched that of the gauge data quite well. The range of the NCEP-CFSR temperature data is larger than that of the observations at all stations. As can be seen from
Table 3 and
Figure 5, the NCEP-CFSR temperature data tend to underestimate the actual maximum and minimum temperature values. The main reason for the underestimation could be the land use types [
65]. For example, the 48679 station is located in an industrial area where the surface temperature is expected to be higher. However, the NCEP-CFSR relies on National Aeronautics and Space Administration (NASA) land use information data [
71], so reliable local land use information might be missing for the 48679 station location. Another possible reason for the underestimation by the NCEP-CFSR data is a mismatch in the temperature measurement times. For instance, the climate stations’ daily maximum and minimum temperature data were taken at 0800 and 1400 local time, respectively, while the NCEP-CFSR daily maximum and minimum temperatures were obtained from hourly values [
72].
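The station-to-grid pairing described in this section (each temperature gauge compared to the nearest NCEP-CFSR grid point) can be sketched with a great-circle distance search. The haversine formula is used here as one reasonable choice; the source does not state how the nearest point was determined. The function names and all coordinates below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_grid_point(station, grid_points):
    """Return the (lat, lon) grid point closest to the (lat, lon) station."""
    return min(grid_points, key=lambda g: haversine_km(station[0], station[1], g[0], g[1]))


# Hypothetical station and grid coordinates, for illustration only
station = (1.63, 103.67)
grid = [(1.40, 103.44), (1.40, 103.75), (1.72, 103.44), (1.72, 103.75)]
nearest = nearest_grid_point(station, grid)
print(nearest, round(haversine_km(*station, *nearest), 1), "km")
```

Even the nearest grid point can lie several kilometres from the gauge, so differences in elevation and land use between the two locations remain a possible source of bias.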
4.5. Streamflow: GCPs Precipitation Data
Table 4 lists the best fitted calibration parameters for KRB and JRB. The calibration and validation of the SWAT model were conducted based on local knowledge and a literature review of the SWAT model in tropical regions (e.g., [
54,
55,
73,
74]). As can be seen in
Table 4, the CN2 values were increased by 1% and 13% for the KRB and JRB, respectively. This increment of CN2 values was also observed in calibration of other tropical SWAT models [
75,
76,
77]. The CN2 value was higher in the JRB as it is dominated by oil palm plantations, where the surface runoff is generally higher than in a forest basin (KRB). Generally, the SWAT simulations that were based on rain gauge data agreed well with the observed streamflow during the calibration and validation periods for both the KRB and JRB (
Figure 6). The NSE values that were computed for the KRB (JRB) were 0.75 (0.78) and 0.65 (0.6) for the calibration and validation periods, respectively (
Table 5), and the corresponding KRB (JRB) R² statistics were 0.87 (0.78) and 0.84 (0.61), indicating that the SWAT model performed well for both basins based on the previously suggested criteria [
58,
59].
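The goodness-of-fit statistics used for the streamflow comparison can be sketched as follows. The snippet assumes the standard Nash–Sutcliffe efficiency (NSE), R² taken as the squared Pearson correlation, and RB defined as the percentage difference in total flow (simulated minus observed). The function name `streamflow_metrics` and the flow values are hypothetical.

```python
def streamflow_metrics(obs, sim):
    """Return NSE, R2 and RB (%) for paired observed and simulated streamflow."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least two values")
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n

    # NSE: 1 is a perfect fit; values below 0 mean the simulation is a worse
    # predictor than the mean of the observations
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    nse = 1.0 - sq_err / var_obs

    # R2 as the squared Pearson correlation coefficient
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    r2 = cov ** 2 / (var_obs * var_sim)

    # RB: positive values indicate overestimated total flow (assumed convention)
    rb = 100.0 * (sum(sim) - sum(obs)) / sum(obs)
    return {"NSE": nse, "R2": r2, "RB": rb}


# Illustrative flows only (m3/s)
observed = [10.0, 20.0, 30.0, 40.0]
close_fit = streamflow_metrics(observed, [12.0, 18.0, 33.0, 37.0])
biased_fit = streamflow_metrics(observed, [2.5 * q for q in observed])
print(close_fit)
print(biased_fit)
```

The second case shows how a simulation that is perfectly correlated with the observations (R² of 1) can still yield a strongly negative NSE when flows are heavily overestimated, which is why NSE and RB are reported alongside R².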
Among the three GCPs, the most accurate KRB SWAT simulations occurred in response to the APHRODITE precipitation input, followed by the simulations driven by the PERSIANN-CDR and NCEP-CFSR precipitation data. The SWAT simulation streamflow trends, based on the APHRODITE and PERSIANN-CDR data, revealed overestimation of low streamflows and underestimation of high streamflows. The predicted streamflow results obtained with the NCEP-CFSR data were unacceptable as reflected by the negative NSE values (
Table 6). In addition, the NCEP-CFSR precipitation data resulted in relatively high overestimation of observed streamflows throughout the simulation period, as indicated by the high RB values of 167.77% and 143.72% during the calibration and validation periods, respectively.
Similar results were obtained in the JRB, where the SWAT simulations that were driven by the APHRODITE precipitation data yielded the best calibration and validation (
Figure 6 and
Table 6), followed again by the PERSIANN-CDR and NCEP-CFSR precipitation data. However, both the PERSIANN-CDR and NCEP-CFSR data resulted in unacceptable performance, as shown by the mostly negative NSE values (
Table 6). Overestimation of the observed streamflows is also clearly shown in the PERSIANN-CDR- and NCEP-CFSR-based JRB SWAT streamflow predictions (
Figure 6b) by 57.63% and 142.45%, respectively, during the validation period (
Table 6). However, the APHRODITE-based data tracked the observed streamflow well (
Figure 6b), which was also confirmed by the majority of the NSE, R² and RB statistics (
Table 6), which indicated satisfactory results based on previously suggested criteria [
58,
59].
4.6. Streamflow: GCPs Precipitation + NCEP-CFSR Temperature Data
The statistical indices (R², NSE and RB) are summarized in
Table 6 for the SWAT simulations driven by precipitation inputs from each of the three GCPs in combination with the NCEP-CFSR temperature data. The combinations of GCP precipitation inputs and NCEP-CFSR temperature data resulted in overestimations of the observed streamflow for the majority of the simulation period for both basins. As in the precipitation-only simulations, the most severe streamflow overpredictions occurred with the combination of NCEP-CFSR precipitation and NCEP-CFSR temperature data.
Generally, the integration of the NCEP-CFSR temperature data with the GCP precipitation data did not result in significant impacts on the SWAT simulations for either basin, compared to the simulations that were performed with just the GCP precipitation inputs. For example, the differences in the validation NSE values between the APHRODITE precipitation input and the APHRODITE precipitation with NCEP-CFSR temperature input for the KRB and JRB are 0.02 and 0.01, respectively. These findings show that precipitation data dominate the simulated local hydrological cycle relative to temperature data in this tropical region. This could be due to the small temperature range and variation that occurs in Malaysia compared to temperate or arid regions elsewhere.
Some success was obtained by forcing the SWAT model with the combination of the APHRODITE precipitation and NCEP-CFSR temperature data. However, we cannot ignore the tendency of the APHRODITE data to underestimate the actual precipitation, which in turn offset some of the streamflow overestimation that occurred within the SWAT models of the two study basins. According to Faramarzi et al. [
18], inaccurate input data, wrong model structure and inappropriate model parameters could generate misleading SWAT model outputs. Input data errors can readily be identified using more reliable observations, while the other two require local expert knowledge and modeling skill. Hence, candidate GCPs should be evaluated in an initial assessment before they are applied in any hydrological model.
4.7. Extreme Event Assessment
The final aspect of the overall analysis was to evaluate the capability of the GCPs to predict extreme precipitation events (
Table 7). All of the GCPs showed significant differences at the 0.05 significance level when compared with the observed precipitation, except for the NCEP-CFSR data when assessed for the Rx1d index for the KRB. The APHRODITE data exhibited better correlation with observed precipitation for three of the indices (Rx1d, Rx5d and R10mm) in both basins versus the other GCPs, while the PERSIANN-CDR data resulted in the best performance for the R50mm index. In addition, the majority of the RB values calculated for the Rx1d, Rx5d and R50mm indices estimated by the three GCPs were negative. This is similar to the findings reported by Miao et al. [
78], who found that the PERSIANN-CDR data tend to underestimate the Rx1d and Rx5d indices in the eastern China region. This can be explained by the fact that most of the GCPs underestimated the precipitation range which is greater than 50 mm in the two basins (
Figure 4).
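The four extreme precipitation indices can be sketched from a daily series. The snippet assumes the widely used definitions: Rx1d as the maximum 1-day precipitation, Rx5d as the maximum total over five consecutive days, and R10mm and R50mm as the number of days with at least 10 mm and 50 mm, respectively. The function name `extreme_indices` and the sample values are hypothetical, and in practice the indices would be computed separately for each year.

```python
def extreme_indices(daily_mm):
    """Return Rx1d, Rx5d, R10mm and R50mm for one period of daily precipitation (mm)."""
    if len(daily_mm) < 5:
        raise ValueError("need at least five daily values to compute Rx5d")
    # Rolling 5-day totals over every window of consecutive days
    five_day_totals = [sum(daily_mm[i:i + 5]) for i in range(len(daily_mm) - 4)]
    return {
        "Rx1d": max(daily_mm),
        "Rx5d": max(five_day_totals),
        "R10mm": sum(1 for p in daily_mm if p >= 10.0),
        "R50mm": sum(1 for p in daily_mm if p >= 50.0),
    }


# Illustrative daily values only (mm)
daily = [0.0, 12.0, 55.0, 3.0, 0.0, 20.0, 8.0, 0.0, 60.0, 10.0]
indices = extreme_indices(daily)
print(indices)
```

Because Rx1d, Rx5d and R50mm all depend on the heaviest rainfall days, a product that underestimates the ≥50 mm class will tend to show negative RB values for these three indices.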
The RB statistic was used to quantify the difference in accuracy in simulating extreme streamflow events, based on the Rx1d and Rx5d indices, between the rain gauge-based simulations and the three GCP-based simulations for the KRB (
Figure 7) and JRB (
Figure 8) because it provided a reliable basis for comparison of different case studies [
79]. The majority of the RB values calculated for the APHRODITE and PERSIANN-CDR Rx1d and Rx5d indices are negative, indicating that most of the high streamflows were underestimated. However, the reverse pattern can be observed for the RB values determined for the respective NCEP-CFSR indices, indicating that streamflow was significantly overestimated in both basins for the NCEP-CFSR-based SWAT simulations.
5. Discussion
In this study, six different sets of GCP precipitation and temperature inputs were used to drive the SWAT model. The overall results of the analyses of the GCP data clearly revealed that the APHRODITE precipitation data resulted in the best performance of the three GCP data sources, based on the SWAT simulation graphical and statistical results. These results agree with the findings reported in several other studies, which showed that SWAT simulations executed with APHRODITE precipitation data performed very well in central Vietnam [
23,
24,
80]; glacier influenced basins in mountainous regions in northwest China [
81,
82] and central Asia [
83,
84]; and a major tributary of the Yangtze River in central China [
85]. Lauri et al. [
31] also found that executing the VMod hydrological model [
86] with combined APHRODITE precipitation and NCEP-CFSR temperature inputs accurately replicated hydrological simulations based on surface climate inputs for the 795,000 km² Mekong River Basin in southeast Asia. These composite results underscore the strength of the APHRODITE precipitation data for a variety of Asian conditions and show that it can reliably be used for hydrological applications in un-gauged, data-limited or restricted basins in Southeast Asia.
The results found here clearly show that the original NCEP-CFSR precipitation is not suitable for streamflow simulations in Malaysia, which is in agreement with the findings of Monteiro et al. [
27], Roth and Lemann [
64] and Bressiani et al. [
87] for other tropical or sub-tropical conditions. However, the results found here conflict with the findings of Jajarmizadeh et al. [
88], who report successful SWAT streamflow simulation results using the NCEP-CFSR data for the Roodan watershed that is located in southern Iran. Differences in climate and geographical conditions are the most likely explanation for such differences between the Jajarmizadeh et al. [
88] study and the results reported in this research and other previously cited studies. In addition, the streamflow overestimation that resulted from the use of the NCEP-CFSR data in this study could be related to possible problems that occur over tropical regions [
70], including the effects of the satellite algorithms on precipitation estimation and the CFSR model parameterizations.
In general, the performance of the APHRODITE data was better for the KRB compared to the JRB. This is due in part to a more complete distribution of rain gauges for the KRB versus the JRB (
Figure 1); the JRB lacks long-term climate data representation in the northern part of the basin. In addition, the PERSIANN-CDR precipitation-based SWAT simulation also performed better for the KRB, which is consistent with Zhu et al. [
29] who found that the PERSIANN-CDR data resulted in a smaller relative error in a data-rich region. These results are consistent with previously reported findings that improved SWAT hydrologic simulations usually occur in response to precipitation inputs characterized by higher resolution, versus lower resolution precipitation inputs [
89,
90,
91].
As shown in
Table 6, we also found that the effect of the basin size proved to be of minor importance compared to the performance of the three GCPs. For instance, the NCEP-CFSR data performed poorly in both basins, regardless of size and flow characteristics, while the APHRODITE precipitation resulted in the best performance for both basins. We also note that differences in sub-basin and/or HRU delineations, while not investigated in this study, typically do not impact SWAT streamflow and other hydrologic outputs as discussed in a previous review of SWAT literature [
48] and reported in several subsequent SWAT applications [
92,
93,
94,
95].
Finally, it is important to emphasize that there were distinct periods within the overall simulation timeframe in which the prevailing bias was actually reversed for a specific GCP; e.g., streamflow extremes were overestimated during periods where precipitation extremes were underestimated. For example, the PERSIANN-CDR underestimated the Rx1d precipitation index by about 45% during 1989, but the corresponding Rx1d streamflow index was overestimated by 31.3%. This is consistent with the findings of a similar study conducted by Zhu et al. [
29] for the Xiang River and Qu River watersheds in China. This finding indicates that there are certain periods where the precipitation generated by GCPs is unlikely to accurately capture the amount and duration of extreme events. This is further exacerbated by the difference between the temporal scales of precipitation and streamflow extremes. For example, peak streamflow usually occurs a few hours or days after the corresponding peak precipitation, and it normally represents an accumulation of precipitation events that occurred over several hours or days.