3.2.2. Hydrological Indices

The widely used NSE metric is used to evaluate the performance of each precipitation dataset in hydrological simulations. NSE is calculated as one minus the ratio of the residual variance to the variance of the measured discharge [94]. Simulated discharges based on these datasets were also compared against their gauged counterparts using three hydrological statistics: daily mean discharge, winter low flow (5th percentile of the winter flow), and summer high flow (95th percentile of the summer flow).
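For illustration, the NSE and the three hydrological statistics above can be computed from daily series as in the following sketch (the function names and NumPy's default percentile interpolation are our own choices, not taken from this study):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: one minus the ratio of the residual
    variance to the variance of the observed discharge. 1 is a perfect
    fit; values <= 0 mean the simulation is no better than the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def flow_stats(months, q):
    """Daily mean discharge, winter low flow (5th percentile of DJF flow),
    and summer high flow (95th percentile of JJA flow)."""
    months, q = np.asarray(months), np.asarray(q, float)
    winter = q[np.isin(months, (12, 1, 2))]
    summer = q[np.isin(months, (6, 7, 8))]
    return q.mean(), np.percentile(winter, 5), np.percentile(summer, 95)
```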

## 3.2.3. Error Propagation Indices

Two absolute ratios (γ) between error metrics (RB and ubRMSE) for the runoff and precipitation series are used to quantify the error propagation through the precipitation–runoff process. γ*RB* and γ*ubRMSE* respectively reflect the systematic and random error propagation effects. They are always greater than 0 due to their absolute values, and values larger (smaller) than 1 indicate the amplification (dampening) of the error from precipitation to runoff.
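Under these definitions, the two propagation ratios can be sketched as below. This is an illustrative implementation only: the RB and ubRMSE formulas follow their usual definitions, and the function names are ours.

```python
import numpy as np

def relative_bias(obs, sim):
    # RB (%): systematic over-/underestimation relative to the observed total
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * (sim - obs).sum() / obs.sum()

def ubrmse(obs, sim):
    # unbiased RMSE: RMSE after removing the mean error (random error only)
    e = np.asarray(sim, float) - np.asarray(obs, float)
    return float(np.sqrt(np.mean((e - e.mean()) ** 2)))

def gamma_ratio(metric, p_obs, p_sim, q_obs, q_sim):
    """Absolute ratio of the runoff error metric to the precipitation error
    metric; >1 means the error is amplified through the precipitation-runoff
    process, <1 means it is dampened."""
    return abs(metric(q_obs, q_sim)) / abs(metric(p_obs, p_sim))
```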



*Remote Sens.* **2020**, *12*, 3550

#### **4. Results and Discussion**

#### *4.1. Precipitation Evaluation*

#### 4.1.1. Seasonal Patterns of Precipitation Datasets

Figure 2 presents the seasonality (spring: March–May, summer: June–August, autumn: September–November, winter: December–February; wet season: April–September and dry season: October–March) of the mean precipitation for all ten precipitation datasets (eight satellite-based precipitation datasets, one gauged precipitation dataset (i.e., the dense-gauge dataset), and one gauge-interpolated precipitation dataset (i.e., CN05)). All stations or grids within the watershed are averaged to a single time series to calculate the seasonal mean values. The figure shows that CN05 agrees well with the dense-gauge observation for all four seasons. Specifically, CN05 presents a small RB within ±7.0% for seasonal precipitation (−2.4% for spring, −6.1% for summer, −1.7% for autumn, and 0.3% for winter). With the exception of the satellite-only datasets, which considerably underestimate the precipitation in all seasons, the satellite-based datasets also reasonably represent the observed seasonality, although all of them perform worse than CN05 in every season. The better performance of PERSIANN CDR among the satellite-based datasets for seasonal precipitation, especially in spring, summer, and autumn, could reflect the effects of its blending strategy: PERSIANN CDR maintains monthly precipitation that is consistent with the monthly GPCP, and GPCP is mainly composed of gauged precipitation datasets (e.g., GPCC) [70]. In addition, all the satellite-gauge datasets overestimate the dense-gauge precipitation in summer and the wet season while underestimating it in winter, and both blended datasets (MSWEP and CHIRPS) overestimate the precipitation all year round. TRMM, CMORPH BLD, and MSWEP fit the dense-gauge precipitation better in the dry season than in the wet season, while PERSIANN CDR, CMORPH CRT, CHIRPS, and the satellite-only datasets perform better in the wet season than in the dry season.


**Figure 2.** Mean seasonal precipitation (spring: March–May, summer: June–August, autumn: September–November, winter: December–February, dry season: October–March and wet season: April–September) of 2003–2013 from the dense-gauge precipitation and eight gridded datasets (National Meteorological Information Center dataset from the China Meteorological Administration (CN05), Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN), Climate Prediction Center MORPHing-raw satellite dataset (CMORPH RAW), Tropical Rainfall Measuring Mission (TRMM), PERSIANN Climate Data Record (PERSIANN CDR), CMORPH bias-corrected (CMORPH CRT), CMORPH gauge blended (CMORPH BLD), Multi-Source Weighted-Ensemble Precipitation V2.0 (MSWEP) and Climate Hazards Group InfraRed Precipitation with Stations V2.0 (CHIRPS)).

The spatial distributions of summer precipitation are also presented for all datasets in Figure 3. The dense-gauge data are presented as colored dots, while all the other datasets are presented as grids. Generally, summer precipitation is heavier in high-elevation areas (southeastern, southwestern, and southern parts) than in other regions. CN05 clearly misses considerable regional intense precipitation (such as the heavy precipitation in the southeastern part of the region), which can even be captured by MSWEP and CHIRPS. The poorer performance of CN05 may be caused by two factors: (1) its lower spatial resolution (0.5° × 0.5°) and (2) its fewer gauged source data compared with the dense-gauge dataset. Satellite-only datasets underestimate precipitation in all grids, although PERSIANN can capture the heavy precipitation signal in mountain areas. Although all satellite-gauge datasets capture this spatial distribution pattern, they still underestimate the heavy precipitation in mountain areas (southern and southeastern parts) while overestimating the light precipitation in the central plain regions. The spatial distributions of winter precipitation, shown in Figure A1 in Appendix B, display similar patterns, as CN05 still performs relatively worse than the two blended datasets, MSWEP and CHIRPS. The better performance of the blended datasets for the spatial distribution of seasonal precipitation may be due to their reanalysis components.



**Figure 3.** Summer precipitation (mm) during June–August of 2003–2013 from CN05 and eight satellite-based datasets compared with the dense-gauge dataset, which is shown as colored dots.

#### 4.1.2. Error Structures of Precipitation Datasets

Figure 4A presents two types of daily gridded precipitation information for the dense-gauge, CN05, and eight satellite-based datasets: (1) bar charts show the frequency distribution of precipitation under seven rain rate classes (0, 0–1, 1–5, 5–10, 10–25, 25–50, and >50 mm/day), and (2) line charts show the contribution of the precipitation amount in each rain rate class to the total precipitation. As shown in the bar charts, PERSIANN CDR, CMORPH CRT, and CMORPH BLD are close to their gauged counterparts, for which the precipitation frequencies decrease from the 0 to the 0–1 mm class, increase slightly in the 1–5 mm class, and then decrease through the >50 mm class. These tendencies of precipitation frequency under the 0, 0–1, and 1–5 mm classes are inaccurately represented by MSWEP. Two other satellite-based datasets (TRMM and CHIRPS) overestimate the frequencies of no rain (0 mm) and heavy rain (>50 mm) and underestimate light rain (0–1 mm). The line charts show that the largest precipitation contribution for all datasets except TRMM and CMORPH CRT occurs at the 10–25 mm class. Large differences in the precipitation contribution among datasets occur at the 25–50 mm and >50 mm classes.

The detection errors of each satellite-based dataset are quantified using the FBI, FAR, POD, and ETS for the 11-year (2003–2013) annual, wet-season, and dry-season precipitation processes. Figure 4B presents the distribution of the FBI for nine precipitation datasets (CN05 and eight satellite-based datasets). FBI values of CMORPH RAW at the 25–50 mm (13.66) and >50 mm (89.35) classes, which are larger than 6, are not shown; the same applies to Figure 4C,D. Although both the satellite-gauge and blended datasets poorly simulate the annual FBI values at the rain rates of 0 mm (e.g., the FBI of MSWEP is 2.55) and 0–1 mm (e.g., the FBI of TRMM is 3.95), they overall outperform the satellite-only category, which has worse annual FBI results for more than half of the rain rate classes (Figure 4B). Figure 4C,D further demonstrate that the satellite-only datasets overestimate the FBI under most rain rate classes (5–10, 10–25, 25–50, and >50 mm) more in the dry season than in the wet season. The larger underestimation of precipitation events in the dry season is in good agreement with the seasonal precipitation amounts in Section 4.1.1 and could further explain the sources of the poor performance of the satellite-only datasets: the underestimation of precipitation events with rain rates larger than 10 mm during the wet season and the underestimation of all precipitation events during the dry season. As the rain rate class increases, the FBI of the satellite-gauge datasets improves until the precipitation class exceeds 50 mm in both seasons. The annual FBI values of CMORPH CRT (0.78), TRMM (0.62), and CHIRPS (0.49) at this class are less than 1, indicating that these datasets overestimate the number of heavy rain events. This may also explain the overestimated percentage of heavy rain (Figure 4A).

The FAR, POD, and ETS of the satellite-based datasets also show obvious seasonal patterns. The two satellite-only datasets deteriorate significantly with increasing rain rate class in terms of the annual FBI, FAR, and POD, indicating their inability to capture heavy precipitation. These two datasets clearly perform better in the wet season than in the dry season, especially in terms of the POD (Figure 4I,J) and ETS (Figure 4L,M). However, CMORPH BLD and MSWEP show the opposite seasonal pattern, performing better in the dry season than in the wet season in terms of the three statistics; in addition, both maintain their superiority among all the satellite-gauge datasets. Although the gauge-interpolated CN05 performs relatively worse than CMORPH BLD and MSWEP, it shares similar seasonal patterns with them and also outperforms the other six satellite-based datasets with regard to POD and ETS under most rain rate classes (0–1, 1–5, 5–10, and 10–25 mm).
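For reference, all four detection scores derive from a 2×2 contingency table of events "daily precipitation ≥ threshold". The sketch below uses the standard textbook formulas (conventions, notably for the FBI, vary between studies, so this is illustrative rather than the exact implementation used here):

```python
def detection_scores(obs, sim, thr):
    """FBI, FAR, POD, ETS from a 2x2 contingency table for the event
    'daily precipitation >= thr'. h = hits, f = false alarms,
    m = misses, c = correct negatives (standard definitions assumed)."""
    h = f = m = c = 0
    for o, s in zip(obs, sim):
        oe, se = o >= thr, s >= thr
        if oe and se:
            h += 1
        elif se:
            f += 1
        elif oe:
            m += 1
        else:
            c += 1
    n = h + f + m + c
    fbi = (h + f) / (h + m)                 # event-count bias
    far = f / (h + f) if h + f else 0.0     # fraction of detections that are wrong
    pod = h / (h + m)                       # fraction of observed events detected
    h_rand = (h + f) * (h + m) / n          # hits expected by chance
    ets = (h - h_rand) / (h + f + m - h_rand)
    return fbi, far, pod, ets
```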

Figure 5 shows the RB, ubRMSE, and R² of nine precipitation datasets (CN05 and eight satellite-based datasets) at both the grid (shown as boxplots) and watershed-average (shown as radar plots) scales. Generally, the performance of each precipitation dataset at the two scales is basically consistent in terms of all three quantitative statistics. Figure 5A,B show that CN05 presents a better RB than the eight satellite-based datasets. Among all satellite-gauge datasets, TRMM, PERSIANN CDR, and CMORPH CRT show the smallest RBs, indicating smaller systematic errors, at both the grid and watershed-average scales. CMORPH BLD and the two blended datasets (MSWEP and CHIRPS) generally show positive RB at both scales, especially CHIRPS, which overestimates the daily precipitation for more than 86.1% of the grids and has an RB of 16.5% at the watershed-average scale. In contrast, the satellite-only datasets considerably underestimate the mean precipitation at both scales.

Random errors of CN05 and the satellite-based datasets are quantified using the ubRMSE (Figure 5C). CN05 shows relatively larger random errors than the satellite-based datasets except CMORPH RAW and CHIRPS. In addition, large differences are observed among the satellite-based datasets. Specifically, CMORPH BLD presents the smallest ubRMSE among the satellite-gauge datasets, with a median value of 6.31 mm at the grid scale (Figure 5C) and 2.09 mm at the watershed-average scale (Figure 5D), while CHIRPS presents the largest ubRMSE, with a median value of 10.94 mm at the grid scale and 6.28 mm at the watershed-average scale. MSWEP performs the best among all nine precipitation datasets at the grid scale, with a median value of 5.83 mm (2.14 mm at the watershed-average scale).
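The much smaller watershed-average ubRMSE values compared with the grid-scale medians are expected: when grids are averaged into one areal series, independent random errors partially cancel, shrinking roughly by the square root of the number of grids. A synthetic sketch of this effect (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_grids = 1000, 25
obs = rng.gamma(2.0, 4.0, size=(n_days, n_grids))   # synthetic "true" precipitation
sim = obs + rng.normal(0.0, 3.0, size=obs.shape)    # add purely random error

def ubrmse(o, s):
    e = s - o
    return float(np.sqrt(np.mean((e - e.mean()) ** 2)))

grid_median = float(np.median([ubrmse(obs[:, g], sim[:, g]) for g in range(n_grids)]))
basin = ubrmse(obs.mean(axis=1), sim.mean(axis=1))
# Averaging 25 grids shrinks independent random errors by about sqrt(25) = 5,
# so the watershed-average ubRMSE falls far below the grid-scale median.
assert basin < grid_median / 3
```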

**Figure 4.** Frequency distribution of daily rainfall (**A**, shown as bar graph), contribution of rain rate classes to the annual accumulation (**A**, shown as line chart), as well as frequency bias index (FBI), false alarm ratio (FAR), probability of detection (POD), and equitable threat score (ETS) values of the 11-year annual (**B**,**E**,**H**,**K**), wet season (**C**,**F**,**I**,**L**), and dry season (**D**,**G**,**J**,**M**) precipitation processes of 2003–2013 on a grid scale for seven daily precipitation thresholds over the Xiangjiang River Basin for the nine gridded datasets (CN05, TRMM, CHIRPS, PERSIANN CDR, CMORPH CRT, CMORPH BLD, MSWEP, PERSIANN and CMORPH RAW) and the dense-gauge dataset.

Figure 5E,F presents the R² values for all nine precipitation datasets, and both clearly reflect the influence of the blending methods and the incorporated gauged datasets on R². Datasets designed to provide the best instantaneous accuracy of precipitation (TRMM, CMORPH CRT, CMORPH BLD, and MSWEP) perform relatively better than those aimed at achieving the most temporally homogeneous record (PERSIANN CDR and CHIRPS). Within the four better-behaved datasets, those that directly incorporate daily gauge data (CMORPH BLD and MSWEP) clearly perform better than those that directly incorporate monthly gauge data (TRMM and CMORPH CRT). The two satellite-only datasets show the worst performance among all the satellite-based datasets. Similarly, CN05 is also less correlated with the dense-gauge dataset than half of the satellite-based datasets (TRMM, CMORPH CRT, CMORPH BLD, and MSWEP).

**Figure 5.** Relative bias (RB), unbiased root mean squared error (ubRMSE), and R² of the daily precipitation for the nine gridded datasets on both grid (shown as boxplots in (**A**,**C**,**E**)) and watershed (shown as radar plots in (**B**,**D**,**F**)) scales. In the radar plots on the right, red and blue lines represent the optimal values (RB (0), ubRMSE (0), and R² (1)) and the results of the gridded datasets for each statistic, respectively.

#### 4.1.3. Simulation of Extreme Precipitation

The results of four extreme precipitation statistics are presented in Figure 6 for nine datasets (CN05 and eight satellite-based datasets) at both the grid (shown as relative bias compared with the dense-gauge precipitation dataset in boxplots) and watershed (shown as absolute values in radar plots) scales. In the radar plots, red and blue lines represent the results of the dense-gauge dataset and each gridded dataset, respectively.

R99pTOT (Figure 6A,B) reflects the total precipitation of heavy rain. The R99pTOT values of the satellite-gauge and blended datasets, except PERSIANN CDR and CHIRPS, are similar to the dense-gauge observation, with more than 50% of the grids having biases within ±20.0% at the grid scale. Specifically, PERSIANN CDR underestimates R99pTOT, with more than 56.3% of the grids having a negative bias smaller than −20% and a relative bias of 9.3% at the watershed scale. However, CHIRPS overestimates R99pTOT at both scales (with more than 63.6% of the grids having a positive bias larger than 20%, and a relative bias of 45.3% at the watershed scale). Additionally, the two satellite-only datasets underestimate R99pTOT.

The SDII values are shown in Figure 6C,D; the similar results for SDII and R99pTOT can be explained by two factors: (1) heavy precipitation accounts for a large proportion of the annual precipitation amount, and (2) the number of wet days is similar for all nine datasets. The CWD is presented in Figure 6E,F at the grid and watershed-average scales, respectively, and the CDD is presented in Figure 6G,H. The results show that CMORPH BLD maintains its superiority among all the datasets in simulating these two extreme statistics. However, the other seven satellite-based datasets could not accurately capture the CDD and the CWD at the same time, especially the CDD, which is used as a criterion for representing droughts. For example, CMORPH CRT shows a small bias in CWD at both the grid (more than half of the grids have a bias within ±10.0%) and watershed-average scales (CMORPH CRT: 18 days; dense-gauge: 19 days). On the contrary, the CDD of CMORPH CRT is not accurately estimated, with more than 50.0% of the grids having a bias larger than 10.0% and a bias of 44.7% at the watershed-average scale (CMORPH CRT: 55 days; dense-gauge: 38 days).

**Figure 6.** Annual total precipitation on wet days when the daily amount exceeds the 99th percentile (R99pTOT), the mean daily precipitation amount on wet days (SDII), and the maximum lengths of wet and dry spells (CWD and CDD) of the daily precipitation for the nine gridded datasets on both grid (shown as boxplots in (**A**,**C**,**E**,**G**)) and areal mean (shown as radar plots in (**B**,**D**,**F**,**H**)) scales. In the radar plots on the right, the red line and blue lines respectively represent the results of the dense-gauge dataset and the nine other gridded datasets.

CN05 represents these four extreme precipitation statistics better than all satellite-based datasets, especially the CDD, as shown in Figure 6G,H. It also shows very small detection and systematic errors in the comparisons in Section 4.1.2. However, CN05 misses considerable regional intense seasonal precipitation and has larger random errors and worse R² than more than half of the satellite-based datasets. In other words, this weakness of CN05 indicates that some satellite-based datasets are effective in representing the spatial distribution of precipitation, an effect that can be missed by gauge-interpolated datasets built from sparse gauges at a relatively coarse spatial resolution. Therefore, there is a risk in using CN05 as the reference when investigating the statistical properties of satellite-based precipitation, especially for high-precision datasets.
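The four extreme statistics in this subsection can be computed from a daily precipitation series as in the sketch below, which assumes a 1 mm/day wet-day threshold and NumPy's default sample percentile; the exact thresholds and percentile method used in this study may differ.

```python
import numpy as np

WET = 1.0  # assumed wet-day threshold (mm/day), following common ETCCDI usage

def extreme_indices(p):
    """R99pTOT: total precipitation on days exceeding the 99th percentile of
    wet-day precipitation; SDII: mean precipitation on wet days; CWD/CDD:
    longest runs of consecutive wet/dry days."""
    p = np.asarray(p, dtype=float)
    wet = p >= WET
    p99 = np.percentile(p[wet], 99)
    r99ptot = p[p > p99].sum()
    sdii = p[wet].mean()

    def longest_run(mask):
        best = cur = 0
        for flag in mask:
            cur = cur + 1 if flag else 0
            best = max(best, cur)
        return best

    return r99ptot, sdii, longest_run(wet), longest_run(~wet)
```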

#### *4.2. Hydrological Simulations*

Eight satellite-based datasets and the gauge-interpolated CN05 are further compared against the dense-gauge dataset in hydrological modeling with both the XAJ and SWAT models, each calibrated against observed streamflow. Both models are adequately calibrated, with NSE values of 0.89 (XAJ) and 0.86 (SWAT) in calibration and 0.89 (XAJ) and 0.84 (SWAT) in validation (Table 3).

**Table 3.** Comparison of Nash–Sutcliffe efficiency (NSE) of both Xinanjiang (XAJ) and Soil and Water Assessment Tool (SWAT) models in daily step simulation based on the dense-gauge and the nine precipitation datasets.


To illustrate the intra-annual variability of the hydrological process, Figure 7 shows the mean monthly hydrographs of the observed streamflow and of the streamflow simulated from the dense-gauge and the other nine precipitation datasets with the two models. A monthly rather than daily hydrograph is used to avoid noise when calculating the climatology over the relatively short period (i.e., 10 years) [95,96]. It can be observed that (1) the most precise simulation of discharge among all nine precipitation datasets is achieved by the gauge-interpolated CN05. CMORPH BLD, MSWEP, TRMM, and CMORPH CRT offer better performance than the other satellite-based datasets. CHIRPS and PERSIANN CDR, respectively, overestimate and underestimate the observed discharge for almost the whole year. (2) During the flood periods (April to August), the discharges simulated from both the dense-gauge and the other nine datasets with the XAJ model are obviously larger than those from the SWAT model. Similar results were also reported by Xu et al. [3], who used the XAJ and SWAT models to test the ability of two reanalysis datasets to simulate flood events in the Xiangjiang River Basin.
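The mean monthly hydrograph used here is simply the climatological average of all daily flows falling in each calendar month; a minimal sketch (the function name is ours):

```python
import numpy as np

def mean_monthly_hydrograph(months, q):
    """Climatological mean discharge for each calendar month, averaging all
    daily values that fall in that month across the whole period. Smoother
    than a daily climatology when the record is short."""
    months = np.asarray(months)
    q = np.asarray(q, dtype=float)
    return np.array([q[months == m].mean() for m in range(1, 13)])
```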



**Figure 7.** Simulation results of mean monthly hydrographs (2004–2013) using nine gridded datasets and the dense-gauge dataset in the SWAT and XAJ models.


To further quantify the performance of the satellite-based datasets in representing streamflow time series, the NSE values of the two hydrological models based on daily streamflow are calculated and presented in Table 3. Results based on the XAJ model show that three satellite-gauge precipitation datasets (TRMM, CMORPH CRT, and CMORPH BLD) and the blended dataset (MSWEP) are satisfactory

in simulating streamflow time series, with NSE values larger than 0.72. CMORPH BLD (NSE = 0.84) outperforms all other satellite-based datasets. CHIRPS (NSE = 0.44) and PERSIANN CDR (NSE = 0.56) show moderate performances. Satellite-only datasets cannot represent the observed streamflow time series, with NSE = −0.97 for CMORPH RAW and NSE = −0.40 for PERSIANN. The semi-distributed SWAT shows similar daily simulation performance for each dataset to the lumped XAJ, but the performance of the satellite-gauge datasets in SWAT is slightly worse than in XAJ, except for PERSIANN CDR. Like PERSIANN CDR, both blended datasets, CHIRPS (NSE = 0.44/0.48 for XAJ/SWAT) and MSWEP (NSE = 0.78/0.79 for XAJ/SWAT), perform better in SWAT than in XAJ. Despite some differences in the simulation performances of the two models, the relative rankings of the datasets based on NSE are almost consistent between the models. The best simulation was achieved by the satellite-gauge CMORPH BLD, followed by the blended MSWEP, TRMM, and CMORPH CRT; CHIRPS and PERSIANN CDR performed moderately, while the satellite-only datasets showed the worst performance. This consistency indicates that using different models does not significantly alter the relative performances of streamflow simulation among the satellite-based precipitation datasets.

Three hydrological statistics (daily mean discharge, winter low flow, and summer high flow) are further used to compare the daily simulated discharge of both the dense-gauge and the nine alternative datasets against their observed counterparts. Figure 8 presents the annualized results (shown as the relative bias between the simulated discharge of each precipitation dataset and the observed discharge) of the three statistics from 2004 to 2013. The XAJ and SWAT models show similar results for daily mean discharge (Figure 8A); however, SWAT obviously underestimates the other two hydrological statistics (Figure 8B,C), especially the winter low flow. Based on the three statistics, CMORPH BLD consistently performs better than the other satellite-gauge datasets. PERSIANN CDR and CHIRPS respectively underestimate and overestimate the observed discharge for both models. Blended MSWEP performs well, although its daily maximum discharge shows an obvious overestimation in the XAJ model (values greater than 0 in 8 years) and an underestimation in the SWAT model (values less than 0 in 7 years). Similar to the previously used indices, satellite-only datasets still show the worst performance.
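The three statistics behind Figure 8 can be sketched as follows; the helper names and the linear-interpolation percentile are my own illustrative choices, not the study's implementation:

```python
# Sketch of the three hydrological statistics: daily mean discharge,
# winter low flow (5th percentile of winter flow), and summer high flow
# (95th percentile of summer flow), plus the relative bias between a
# simulated and an observed statistic.
def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def flow_statistics(winter_flow, summer_flow):
    daily = winter_flow + summer_flow
    return {
        "daily_mean": sum(daily) / len(daily),
        "winter_low": percentile(winter_flow, 5),    # 5th percentile
        "summer_high": percentile(summer_flow, 95),  # 95th percentile
    }

def relative_bias(simulated_stat, observed_stat):
    # RB in percent: positive = overestimation, negative = underestimation.
    return 100.0 * (simulated_stat - observed_stat) / observed_stat
```

In the study itself these statistics are computed per year (2004–2013) for each dataset and model, giving one boxplot value per year in Figure 8.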


**Figure 8.** Boxplot of the relative bias of the daily mean discharge (**A**), summer high flow (**B**), and winter low flow (**C**) simulated using the nine gridded datasets and the dense-gauge dataset based on SWAT (red) together with XAJ (blue) models. Each boxplot is constructed with one value from each year of 2004–2013.

#### *4.3. Error Propagation*

RB and ubRMSE of streamflow respectively reflect the systematic and random errors of each dataset in simulating streamflow. Figures 9 and 10 respectively show the RB and ubRMSE of annual, wet-season, and dry-season streamflow and their corresponding propagation factors (γ*RB* and γ*ubRMSE*) simulated using CN05 and the eight satellite-based precipitation datasets from 2004 to 2013.

#### 4.3.1. Systematic Error Propagation

Generally, TRMM, with the minimal RB (systematic error) of annual streamflow, performs the best among all datasets, followed by CMORPH CRT and CMORPH BLD, which display performance comparable with CN05 (Figure 9A). The two satellite-only datasets considerably underestimate the annual streamflow. Furthermore, their systematic error propagation factors (γ*RB*, shown in Figure 9B) are larger than 1, indicating amplification of the systematic error when translating precipitation into runoff. TRMM, PERSIANN CDR, and CHIRPS show the same amplifying effect on the systematic error of precipitation, while the γ*RB* values of the other five datasets are around 1.
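The propagation factor read from Figure 9B can be sketched as a one-line ratio; the numbers below are hypothetical values for illustration only, not results from the study:

```python
# Error propagation factor gamma: the absolute ratio of an error metric
# (RB or ubRMSE) computed on runoff to the same metric computed on
# precipitation. gamma > 1 means the error is amplified through the
# precipitation–runoff process; gamma < 1 means it is dampened.
def propagation_factor(runoff_error, precip_error):
    return abs(runoff_error) / abs(precip_error)

# Hypothetical values: a dataset whose -10% RB in precipitation
# translates into a -16% RB in simulated runoff.
gamma_rb = propagation_factor(runoff_error=-16.0, precip_error=-10.0)
print(gamma_rb)  # 1.6 > 1: the systematic error is amplified
```

Because both errors enter as absolute values, γ is always positive regardless of whether the underlying biases are over- or underestimations.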

There is a seasonal trend in the RB of streamflow for all datasets, in which the range of RB values for the wet-season streamflow (Figure 9C) is much smaller than that for the dry season (Figure 9E). This narrow RB range indicates a smaller inter-annual difference in the wet season. Six of the nine datasets (all except CMORPH BLD, MSWEP, and CHIRPS) show smaller RBs (closer to 0) in the wet season than in the dry season. Accordingly, a more apparent amplification of the systematic error from precipitation to runoff (larger γ*RB* values) occurs in the dry season than in the wet season for all datasets except the satellite-only ones (Figure 9D,F).

Moreover, the hydrological models also influence the RB of streamflow. During the wet season, SWAT generally performs much better than XAJ for more than half of the datasets (all except TRMM, PERSIANN, and CHIRPS; Figure 9C), while during the dry season, XAJ outperforms SWAT for more than half of the datasets (all except CMORPH CRT and MSWEP; Figure 9E).


**Figure 9.** Relative bias of streamflow (**A**,**C**,**E**) and relative bias propagation factor (**B**,**D**,**F**) of the whole year, the wet season, and the dry season for the nine gridded datasets based on SWAT (red) together with XAJ (blue) models. Each boxplot is constructed with one value of each year from 2004 to 2013.

#### 4.3.2. Random Error Propagation

Figure 10A demonstrates that the ubRMSE values of streamflow are not distinctly different among the nine datasets, except for the two satellite-only datasets, which show significantly larger values. Among the remaining datasets, PERSIANN CDR and CHIRPS show the largest random errors of streamflow. CN05 shows the minimum ubRMSE of streamflow, which contrasts with its relatively large ubRMSE of precipitation (as demonstrated in Section 4.1.2). This discrepancy between the ubRMSE of precipitation and that of streamflow for CN05 is due to its strongest dampening effect on random error. All the other eight datasets have similar dampening effects, with γ*ubRMSE* being smaller than 1 (Figure 10B), and CMORPH BLD along with MSWEP has the largest γ*ubRMSE* among these datasets.
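For reference, assuming the standard definition of unbiased RMSE (the RMSE computed after the mean bias between the two series is removed), the random error metric can be sketched as:

```python
import math

# Unbiased RMSE (ubRMSE): RMSE computed after removing the mean bias
# between the simulated and observed series, so it isolates the random
# error component while RB captures the systematic part.
def ubrmse(observed, simulated):
    n = len(observed)
    bias = sum(s - o for o, s in zip(observed, simulated)) / n
    return math.sqrt(
        sum((s - o - bias) ** 2 for o, s in zip(observed, simulated)) / n
    )

# A constant offset is pure systematic error: ubRMSE is 0 once the
# mean bias is removed.
print(ubrmse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 0.0
```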

ubRMSE of streamflow also shows a seasonal trend, with its values and ranges in the wet season (Figure 10C) being larger than those in the dry season (Figure 10E) for all datasets. This seasonal difference also applies to the random error propagation factor γ*ubRMSE* (Figure 10D,F). The ubRMSE of the same precipitation dataset also differs between the two hydrological models (Figure 10C,E). Specifically, SWAT generates a larger ubRMSE than the XAJ model for nearly all datasets (all except PERSIANN CDR and CHIRPS), especially during the wet season (Figure 10D,F).

**Figure 10.** Unbiased RMSE of streamflow (**A**,**C**,**E**) and unbiased RMSE propagation factor (**B**,**D**,**F**) of the whole year, the wet season, and the dry season for the nine gridded datasets based on SWAT (red) together with XAJ (blue) models. Each boxplot is constructed with one value of each year from 2004 to 2013.

Satellite-only datasets directly estimate precipitation through PMW or IR sensors [22–24]. Their worst performance among all eight satellite-based datasets reflects the deficiencies of existing remote sensing retrieval algorithms and the necessity of blending with gauged measurements to compensate for their limited abilities, such as distinguishing rain particles from the electromagnetic interference that rough terrain and trees introduce to the sensors [26]. Theoretically, PMW is more accurate than VIS–IR, because the former physically links the sensors' signal to the size and phase of the hydrometeors present within the observed atmospheric column [1,24,91]. However, CMORPH RAW (based primarily on PMW) performs worse than PERSIANN (based mainly on IR), which is the opposite of findings from some other regional studies in the Asian monsoon regions, such as Japan [97] and the Tibetan Plateau [98]. This inconsistency may reflect the influence of the integrating methods (PMW and IR) on the performance of satellite-only datasets and the region-dependent nature of these methods.

Compared with the satellite-only datasets (CMORPH RAW and PERSIANN), clearly better performance in both precipitation and hydrological simulations is achieved by their improved satellite-gauge versions (CMORPH CRT, CMORPH BLD, and PERSIANN CDR). This improvement demonstrates the validity of blending algorithms that use gauge precipitation to enhance the precipitation estimates of the satellite-only datasets. Satellite-gauge CMORPH BLD outperforms all other satellite-gauge datasets in both precipitation and hydrological simulations, mainly owing to the effectiveness of the bias correction and blending algorithms that incorporate the daily precipitation gauge dataset to improve CMORPH RAW [36]. Among the blended datasets, MSWEP shows performance comparable to CMORPH BLD. The superiority of MSWEP could be mainly explained by two factors: (1) the gauged component utilized by MSWEP takes up a higher proportion (30.0% to 50.0%) of the final precipitation dataset than in the other satellite-based datasets; and (2) the reanalysis data used in MSWEP may bring additional information [71,99].

CN05 outperforms all satellite-based datasets in the hydrological simulation. This satisfactory performance suggests that CN05 can fully act as a proxy for the dense-gauge precipitation dataset in hydrological simulation over the Xiangjiang River Basin, even though it cannot act as the reference data for directly evaluating the statistical properties of satellite-based precipitation.

Additionally, the dataset used for model calibration would influence the hydrological performance of the satellite-based datasets during the validation period, and many studies have suggested recalibrating hydrological models directly with satellite-based datasets [60,100,101]. However, only the dense-gauge precipitation was used in this study to calibrate the hydrological models, and all satellite-based datasets then used the same set of optimal parameters for hydrological modeling. This choice is based on the assumption that the dense-gauge dataset is more accurate than the satellite-based datasets, and it excludes the effects of parameter uncertainty on the hydrological simulations, even though the performance of models forced by satellite products may be degraded when the dense-gauge precipitation is used for calibration. In addition, a test based on the XAJ model was conducted to demonstrate that the calibration dataset would not change the relative hydrological performance of these satellite-based datasets; the results are shown in Appendix C as Table A1 (NSE values for both the calibration and validation periods) and Figure A1 (mean monthly hydrograph during 2004–2013). Therefore, it is rational to compare the performance of each satellite-based dataset. For watersheds where a dense-gauge precipitation dataset is not available, the hydrological model may be calibrated using satellite-based datasets or other gridded datasets.

#### **5. Conclusions**

This study evaluates eight high-resolution satellite-based precipitation datasets (satellite-only: PERSIANN and CMORPH RAW; satellite-gauge: TRMM, PERSIANN CDR, CMORPH CRT, and CMORPH BLD; and blended: MSWEP and CHIRPS) and the gauge-interpolated CN05 against a dense-gauge dataset for hydrological modeling over a monsoon-prone watershed in China. We can draw the following conclusions:


homogeneous record (PERSIANN CDR and CHIRPS). Among the four better-behaved datasets, the two directly incorporating daily gauge data (CMORPH BLD and MSWEP) outperform the two directly incorporating monthly gauge data (TRMM and CMORPH CRT). In contrast, the satellite-only datasets (CMORPH RAW and PERSIANN) are the least capable of simulating streamflow and are not recommended for hydrological applications. CN05 outperforms all satellite-based datasets in the hydrological simulation, indicating its capability to act as reference data during hydrological evaluation.


There are still some limitations in this study. For example, the eight satellite-based datasets were compared over only one monsoon-prone watershed, and the conclusions may not hold for other regions. In addition, the differences between using a dense-gauge dataset and satellite-based datasets to calibrate the hydrological models were not fully investigated. For data-scarce regions, the satellite-based datasets may be directly used to calibrate the hydrological models when a dense-gauge dataset is not available. Therefore, in future studies, more watersheds from various climate regimes should be used to generalize the conclusions drawn from this study. The impacts of using different satellite-based datasets to calibrate the hydrological models on hydrological performance also need to be investigated.

**Author Contributions:** Conceptualization, J.C. and Z.L.; Data curation, Z.L., J.W. and W.Q.; Formal analysis, Z.L., L.L., J.W. and W.Q.; Funding acquisition, J.C.; Investigation, J.C., Z.L., L.L., J.W. and W.Q.; Methodology, J.C., Z.L., L.L., C.-Y.X. and J.-S.K.; Project administration, J.C.; Resources, C.-Y.X.; Supervision, J.C., C.-Y.X. and J.-S.K.; Validation, Z.L.; Visualization, J.C. and Z.L.; Writing—original draft, J.C. and Z.L.; Writing—review and editing, J.C. and Z.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by the National Natural Science Foundation of China (Grant No. 52079093, 51779176), the Hubei Provincial Natural Science Foundation of China (Grant No. 2020CFA100), and the Overseas Expertise Introduction Project for Discipline Innovation (111 Project) (Grant No. B18037).

**Acknowledgments:** The authors would like to acknowledge the Water Resources Bureau of Hunan Province for providing gauged precipitation data and the China Meteorological Data Sharing System for providing temperature data and gridded datasets. The authors also thank all of the organizations for providing the satellite-based datasets, namely, the Goddard Earth Sciences Data and Information Services Center (TRMM), NOAA Climate Prediction Center (CMORPH RAW, CMORPH CRT, and CMORPH BLD), NOAA National Climatic Data Center (PERSIANN and PERSIANN CDR), Climate Hazards Group (CHIRPS) and Hylke Beck from Princeton University, the developer of MSWEP.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

A detailed description of the satellite-based precipitation datasets used in this study.
