**Evaluation of Multi-Satellite Precipitation Datasets and Their Error Propagation in Hydrological Modeling in a Monsoon-Prone Region**

#### **Jie Chen <sup>1,2,</sup>\*, Ziyi Li <sup>1</sup>, Lu Li <sup>3</sup>, Jialing Wang <sup>1</sup>, Wenyan Qi <sup>1</sup>, Chong-Yu Xu <sup>4</sup> and Jong-Suk Kim <sup>1</sup>**


Received: 13 September 2020; Accepted: 26 October 2020; Published: 30 October 2020

**Abstract:** This study comprehensively evaluates eight satellite-based precipitation datasets in streamflow simulations on a monsoon-climate watershed in China. Two mutually independent datasets—one dense-gauge and one gauge-interpolated dataset—are used as references because commonly used gauge-interpolated datasets may be biased and unable to reflect the real performance of satellite-based precipitation due to sparse networks. The dense-gauge dataset includes a substantial number of gauges, which can better represent the spatial variability of precipitation. Eight satellite-based precipitation datasets include two raw satellite datasets, Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) and Climate Prediction Center MORPHing raw satellite dataset (CMORPH RAW); four satellite-gauge datasets, Tropical Rainfall Measuring Mission 3B42 (TRMM), PERSIANN Climate Data Record (PERSIANN CDR), CMORPH bias-corrected (CMORPH CRT), and gauge blended datasets (CMORPH BLD); and two satellite-reanalysis-gauge datasets, Multi-Source Weighted-Ensemble Precipitation (MSWEP) and Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS). The uncertainty related to hydrologic model physics is investigated using two different hydrological models. A set of statistical indices is utilized to comprehensively evaluate the precipitation datasets from different perspectives, including detection, systematic, random errors, and precision for simulating extreme precipitation. Results show that CMORPH BLD and MSWEP generally perform better than other datasets. In terms of hydrological simulations, all satellite-based datasets show significant dampening effects for the random error during the transformation process from precipitation to runoff; however, these effects cannot hold for the systematic error. 
Even though different hydrological models indeed introduce uncertainties to the simulated hydrological processes, the relative hydrological performance of the satellite-based datasets is consistent in both models. Namely, CMORPH BLD performs the best, followed by MSWEP, CMORPH CRT, and TRMM. PERSIANN CDR and CHIRPS perform moderately well, and the two raw satellite datasets are not recommended as proxies of gauged observations because of their poorer performance.

**Keywords:** satellite-based precipitation; hydrological modeling; error propagation; monsoon-climate watershed

#### **1. Introduction**

Precipitation is one of the most important meteorological variables in the hydrologic cycle and is often used as the fundamental input to environmental models for agricultural, meteorological, and hydrological studies [1]. However, precipitation measured by pluviometers usually suffers from many problems, such as sparse station distribution at high altitudes or in rural areas, missing data, and short record periods [2]. Meanwhile, human-induced errors in measurements are inevitable [3]. In addition, surface observational networks have shown declining coverage and spatial density, which may limit the future capacity to measure precipitation in many parts of the world [4,5].

As a proxy for gauged precipitation, gridded precipitation with high spatial and temporal coverage has been developed, which can be generally classified into three categories based on different data sources: (1) gauge-interpolated, (2) reanalysis-based, and (3) satellite-based precipitation [6–10].

Gauge-interpolated precipitation, such as the Global Precipitation Climatology Centre (GPCC) [11] and Climate Research Unit (CRU) [12], is generated by interpolating gauged data to grids with different spatial resolutions [13,14]. Thiessen polygons, Kriging, and inverse distance weighting (IDW) are the most widely used interpolation algorithms [2,15]. More sophisticated interpolation methods take extra geographical or physical information into consideration, such as topography and atmospheric lapse rate [13,14].

Reanalysis-based precipitation is produced by assimilating various observations (e.g., weather stations, satellites, ships, and buoys) into a climate model to generate various meteorological variables with a consistent spatial and temporal resolution [4,16–18]. The reliability of reanalysis-based precipitation relies on assimilated observations, climate model parameters, and the interactions between models and observations. Several reanalysis datasets have been made freely available, such as the National Centers for Environmental Prediction/National Center for Atmospheric Research Reanalysis (NCEP/NCAR) [19], the European Centre for Medium-Range Weather Forecasts Reanalysis (ERA) [20], and the NCEP Climate Forecast System Reanalysis (CFSR) [21].

Satellite-based precipitation, with global coverage and continuous temporal records, is estimated using passive microwave (PMW) sensors on low-Earth-orbiting satellites and infrared (IR) sensors on geostationary satellites [22–24]. PMW sensors can observe the emissions and lower-atmosphere scattering signals of rainfall, snow, and ice contents, while IR sensors indirectly measure the lower-level rainfall rate by collecting cloud-top temperature and cloud height [25]. Usually, in regions with gauges, satellite-based datasets are adjusted with gauged measurements to offset their limited abilities [26]. Over the past 30 years, a number of precipitation datasets that combine gauge, PMW, and IR data have become available at a spatial resolution of 0.25° latitude/longitude or finer. These include the monthly Global Precipitation Climatology Project (GPCP) [27], daily Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN) [28], Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) [29], and Global Precipitation Measurement (GPM) [30].

Although gauge-interpolated and reanalysis-based precipitation may be more appropriate for climate change studies because of their long-term data records, it is often difficult to verify their reliability in regions with sparse weather station networks [31,32]. Satellite-based datasets can estimate precipitation with global and homogeneous spatial coverage; this spatial continuity can provide valuable information for hydrological modeling, especially for ungauged watersheds. In recent years, a few global-scale studies have revealed that the performance of satellite-based datasets differs regionally and temporally and is correlated with topography, seasonality, and climatology [6,23,33,34]. However, the lack of global and sufficiently dense precipitation references makes these results still unreliable and inadequate for operational purposes, such as flood forecasting [35]. Therefore, regional ground validation of satellite-based precipitation datasets based on dense-gauge references, especially of their hydrological performance, is still needed [36–44]. Although most studies revealed the potential of satellite-based precipitation datasets for hydrological simulations, they also reported error sources in hydrological modeling driven by satellite-based datasets. Generally, the two main sources are (1) the error of the satellite-based datasets themselves and (2) the propagation of that error through the hydrological model [45].

The monsoon regions, which have an obvious seasonal variation of precipitation, have always been a research focus for satellite-based precipitation datasets [46–52]. For example, Prakash et al. [51] compared four satellite-based precipitation datasets (Climate Prediction Center MORPHing-raw satellite dataset (CMORPH RAW), Naval Research Laboratory (NRL)-blended, PERSIANN, and TRMM 3B42) with a gauge-interpolated dataset in one Indian monsoon region with respect to their abilities to simulate the seasonal rainfall and to detect rainfall over regions with diverse topography. The results show that although all four datasets underestimate the summer seasonal mean rainfall (June to September), TRMM 3B42 generally performs better than the other three datasets, mainly due to its incorporation of rain gauge observations. Mou et al. [49] compared five satellite-based precipitation datasets (TRMM 3B42, its real-time dataset TRMM 3B42RT, GPCP-1DD, PERSIANN Climate Data Record (PERSIANN CDR), and CMORPH RAW) and a gauge-interpolated dataset (Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE)) at daily, monthly, seasonal, and annual scales with rain gauges over Malaysia. It was found that TRMM 3B42 and APHRODITE performed the best, while PERSIANN CDR slightly overestimated observed precipitation, and the other three satellite-based datasets showed the worst performance. In addition, all six precipitation datasets performed better in southern Peninsular Malaysia, which receives higher precipitation, and worse in the drier western part of Peninsular Malaysia.

Some studies have also been conducted in the monsoon regions to evaluate the applicability of satellite-based datasets in hydrologic simulations [53–60]. For example, Tong et al. [58] evaluated four satellite-based datasets (TRMM 3B42, TRMM 3B42RT, CMORPH RAW, and PERSIANN) by comparing them with the gauged China Meteorological Administration dataset (CMA) in streamflow simulations over the Tibetan Plateau based on the distributed Variable Infiltration Capacity (VIC) hydrological model. It was found that the error sources of these datasets differ systematically between seasons. Furthermore, TRMM 3B42 shows comparable performance to CMA for both monthly and daily streamflow simulations due to its monthly gauge adjustment. However, the other three satellite-based datasets show only limited potential or little capability for streamflow simulations over the Tibetan Plateau. In addition, five satellite-based precipitation datasets (TRMM 3B42, TRMM 3B42RT, CMORPH RAW, CMORPH CRT, and CMORPH BLD) were used by Wang et al. [59] to simulate the daily streamflow by driving the distributed Vegetation Interface Processes (VIP) model over two river basins in the southeastern Tibetan Plateau. The results show that these satellite-based datasets perform better in summer than in other seasons, and CMORPH BLD performs the best for runoff simulations. TRMM 3B42 and CMORPH CRT show much better performance than their uncorrected counterparts, TRMM 3B42RT and CMORPH RAW.

From the previous studies, we found three gaps. First, there are relatively few evaluations focusing on satellite-based precipitation datasets in the monsoon regions of southern China, which is a flood-prone area where both flood prediction and water resource management are mainly based on hydrological simulations. Moreover, most existing studies in monsoon-characterized regions only compare several commonly used satellite-based datasets (such as the TRMM and CMORPH series), whereas some promising, recently released precipitation datasets, such as PERSIANN CDR, Climate Hazards Group InfraRed Precipitation with Stations V2.0 (CHIRPS), and Multi-Source Weighted-Ensemble Precipitation V2.0 (MSWEP), have not been thoroughly evaluated. Second, it is crucial to ensure that the gauged benchmark reference is sufficient to reflect the real performance of satellite-based precipitation when testing satellite-based datasets. However, many studies evaluated the satellite-based datasets against sparse-gauge datasets or gridded datasets generated from sparse gauges, which may not accurately reflect the spatial characteristics of precipitation [47,49,54,59,60]. Furthermore, when evaluating the accuracy of satellite-based datasets, the gauged references in some studies are not independent of the satellite-based datasets, which use the gauged precipitation as part of their source data [49,52,58]. Third, although some studies show that the performance of hydrological simulation in the monsoon regions is highly dependent on the satellite-based datasets themselves, the uncertainties of hydrological models caused by different model complexities could also influence the hydrological simulation. The impact of these two uncertainties has not been carefully examined.

The latest review article of Maggioni et al. [35] pointed out that one of the future research areas for satellite-based precipitation datasets is to study the conditions (climate type, basin area, acceptable error in the output, and model structure) under which satellite-based precipitation could be successfully used in hydrological models. In order to provide a comprehensive understanding of the error of satellite-based precipitation and its propagation through hydrological models for monsoon-characterized watersheds, this study tests the reliability of eight satellite-based precipitation datasets in hydrological modeling for a large (>80,000 km<sup>2</sup>) monsoon-characterized watershed (Xiangjiang River Basin) in southern China. Even though one of the main uses of satellite-based datasets is for ungauged watersheds or watersheds with sparse weather stations, testing their reliability requires a watershed with dense gauges. The Xiangjiang River Basin, which has 267 precipitation gauges (referred to as the dense-gauge precipitation dataset in this study), can meet this requirement for an 80,000 km<sup>2</sup> surface area. The eight satellite-based precipitation datasets are TRMM 3B42 (TRMM), PERSIANN, PERSIANN CDR, CMORPH RAW, CMORPH bias-corrected (CMORPH CRT), CMORPH gauge blended (CMORPH BLD), MSWEP, and CHIRPS. In addition to using the dense-gauge precipitation dataset as a reference, an independent gridded gauge-interpolated precipitation dataset is also used, which incorporates far fewer stations from the National Meteorological Information Center dataset of the China Meteorological Administration (CN05) [61]. As high-density gauged precipitation is usually not available in China, CN05 is commonly used for meteorological and hydrological studies over most watersheds [62–64]. 
This study could be extended to test whether CN05 is capable of being used as a reliable reference for using satellite-based datasets over other watersheds where gauges are much less dense. To investigate the uncertainty related to hydrological models, the lumped Xinanjiang (XAJ) model and the semi-distributed Soil and Water Assessment Tool (SWAT) model, with different complexities, are used.

#### **2. Study Area and Datasets**

#### *2.1. Study Area*

The Xiangjiang River Basin has a complex topography with elevation ranging from 0 to 2100 m above sea level and is located between 24.5°–28.1°N and 110.5°–114.0°E in the southern part of China (Figure 1). The Xiangjiang River originates from Haiyang Mountain in Guangxi province, with a drainage area of 80,669 km<sup>2</sup> and a total length of 801 km, making it one of the largest tributaries of the Yangtze River [3,65]. The basin lies in the subtropical and warm temperate zone and is dominated by the East-Asian monsoon climate, with heavy summer rainfall in the south; it is an ideal experimental basin with a good relationship between precipitation and runoff [65]. The average temperature is around 17 °C, and the annual precipitation is close to 1500 mm, with occasional light snowfall in winter. More than 70% of the annual precipitation occurs between March and August. In addition, the Xiangjiang River Basin has abundant water resources, so the study of satellite-based precipitation could provide valuable information for flood forecasting and water resources management for the administrative department.

**Figure 1.** The location of Xiangjiang River Basin and its river channel, precipitation gauge stations (dense-gauge precipitation gauge stations, original stations of CN05, and international exchange stations) and discharge stations.

#### *2.2. Data*

In this study, eight satellite-based datasets are selected and can be further classified into three categories: (1) satellite-only (PERSIANN and CMORPH RAW), whose quality fully depends on the raw satellite data; (2) satellite-gauge (TRMM, PERSIANN CDR, CMORPH CRT, and CMORPH BLD), whose quality partly depends on gauge data; and (3) satellite-reanalysis-gauge/blended (MSWEP and CHIRPS), in which reanalysis data are also blended. These datasets share the same spatial resolution of 0.25° × 0.25° in latitude and longitude and the common period from 2003 to 2013.

Although PERSIANN and CMORPH RAW both incorporate PMW and IR to estimate rainfall, the proportion of PMW and IR is totally different between these two datasets. Specifically, CMORPH RAW is primarily based on PMW remote sensing of rainfall, while PERSIANN is mainly based on IR imagery [66,67]. Each satellite-gauge and blended (gauges, satellites, and reanalysis data) dataset blends different source data by using different data fusion methods. In general, CMORPH BLD and MSWEP directly incorporate daily gauge data, while TRMM and CMORPH CRT directly incorporate monthly gauge data. Unlike these four datasets, which are specially designed to provide the best instantaneous accuracy, PERSIANN CDR (monthly precipitation) and CHIRPS (5-day precipitation) have been designed to achieve the most temporally homogeneous record.

Specifically, TRMM blended GPCC with its satellite-only counterpart TMPA 3B42RT (which, similar to CMORPH RAW, is also estimated primarily by PMW remote sensing of rainfall) by the inverse error variance weighting method [68]. CMORPH CRT was produced by blending the CMORPH RAW dataset with Climate Prediction Center (CPC) and GPCC data via the probability density function matching bias correction method [69]. The optimal interpolation method was used to combine CMORPH CRT with daily gauge analysis to produce CMORPH BLD [69]. Instead of using gauged observations directly, PERSIANN CDR was adjusted to match the monthly satellite-gauge GPCP, which uses gauge-interpolated GPCC, to remove its monthly biases [6,70].

Although both MSWEP and CHIRPS are categorized as blended datasets, their data sources and fusion methods are totally different. MSWEP is mainly produced by giving weights to each dataset on each grid from different data sources (daily and monthly gauges such as CPC and GPCC, reanalysis from ERA-Interim and the Japanese 55-year Reanalysis (JRA 55), and satellite data from CMORPH RAW, Global Satellite Mapping of Precipitation (GSMap MVK), and TRMM 3B42RT) based on their comparative performances at the surrounding gauges [71]. In contrast, CHIRPS mainly uses the NOAA Climate Forecast System (CFS) reanalysis datasets to fill the missing values in satellite estimates (such as TRMM 3B42) and five-day gauged precipitation from datasets such as the World Meteorological Organization's Global Telecommunication System [72]. More details of the above datasets are shown in Appendix A.

The reliability of the eight satellite-based precipitation datasets is evaluated by comparing them with two gauged precipitation datasets: the dense-gauge dataset and the gridded gauge-interpolated dataset (CN05). As an important experimental basin, the Xiangjiang River Basin has a dense-gauge precipitation dataset derived from a dense ground network of 267 precipitation stations with complete temporal coverage from 1963 to 2013, which is provided by the local hydrological department, the Water Conservation Bureau of Hunan Province. CN05, as a national gauge-interpolated dataset, is composed of daily precipitation estimates at a spatial resolution of 0.5° for the quasi-China coverage of 18°N to 54°N latitude from 1961 to 2016. CN05, which is independent of the dense-gauge precipitation dataset, is generated by blending daily precipitation data (2472 Chinese national weather gauges and 44 gauges located in this study region) with Chinese mainland Digital Elevation Model (DEM) data (resampled from the Global 30 Arc Second Elevation Dataset, with a spatial resolution of 0.5° × 0.5°) using the Thin Plate Spline (TPS) algorithm [73]. It is worth noting that CN05 is not independent of the eight satellite-based datasets. This is because two of the 44 gauges of CN05 in the study region are international exchange gauges that contribute to the gauge products (such as GPCC and CPC) used by the four satellite-gauge and two blended datasets. This means that the gauged components of the satellite-gauge and blended datasets come from the same source. In other words, the factors that influence the performance of the satellite-based datasets come from other data sources (satellite or reanalysis) or from the blending strategies between and within various source data. 
Whereas the eight above-mentioned daily satellite-based precipitation datasets define a day as 0–23:59 UTC, both the dense-gauge and CN05 precipitation datasets use the same daily precipitation time interval, from 8 UTC of one day to that of the next day. This ensures that the daily precipitation measurement in China, in the UTC+8 time zone, is executed simultaneously with daily precipitation measurements under the 0–23:59 UTC standard. A brief summary of the eight satellite-based datasets and two gauged datasets is presented in Table 1. The locations of the 267 dense-gauge precipitation stations, the 44 source precipitation gauges of CN05, and the two international exchange gauges are shown in Figure 1.

For hydrological modeling, temperature data from 13 stations and streamflow time series at the watershed outlet are also used. In addition, a Digital Elevation Model (DEM) dataset with a spatial resolution of 30 m, a land-use dataset with a spatial resolution of 1 km, and a soil dataset from the Harmonized World Soil Database (HWSD) are used to establish the semi-distributed SWAT model.


**Table 1.** Background information for selected precipitation datasets used in this study.

#### *Remote Sens.* **2020**, *12*, 3550

#### **3. Methodology**

The comparison of datasets is carried out in both precipitation evaluations and hydrological simulations. When evaluating the precipitation, we compared the differences among all satellite-based precipitation datasets on both areal mean and grid scales to better understand the hydrological impacts of the errors from the satellite-based datasets. This is because the areal mean precipitation and the spatial distribution of precipitation are the decisive factors in the lumped XAJ and semi-distributed SWAT models, respectively. When the evaluation is carried out at the grid scale, the dense-gauge observations are interpolated by the IDW method to 151 grids with a spatial resolution of 0.25° × 0.25°, which is the same as that of the eight satellite-based precipitation datasets [74]. For CN05, with a spatial resolution of 0.5° × 0.5°, the four 0.25° grids within one 0.5° grid share the same precipitation value.
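The two gridding steps described above can be illustrated with a minimal sketch. It assumes a distance exponent of 2 and distances measured directly in degrees, neither of which is stated in the text; the function names and the gauge values are hypothetical and serve only as an illustration, not as the procedure of [74].

```python
import math

def idw_interpolate(gauges, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` (x, y).

    `gauges` is a list of (x, y, value) tuples. The exponent `power`
    is an assumption; the study does not report the value it used.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in gauges:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # target coincides with a gauge
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def split_half_degree_cell(value):
    """Give the four 0.25-degree cells inside one 0.5-degree cell the same value."""
    return [[value, value], [value, value]]

# Hypothetical gauges: (longitude, latitude, daily precipitation in mm)
gauges = [(112.0, 27.0, 10.0), (112.5, 27.0, 20.0), (112.0, 27.5, 30.0)]
estimate = idw_interpolate(gauges, (112.25, 27.0))
cn05_cells = split_half_degree_cell(12.5)
```

Because the weights decay with distance, the two nearest gauges dominate the estimate, while the farther gauge still pulls it slightly upward.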

#### *3.1. Hydrological Models*

In this study, two hydrological models with different complexities, namely a conceptual lumped model and a physically-based semi-distributed model, are used for hydrological modeling. Both models have been successfully established in the Xiangjiang River Basin in many studies [3,44,75,76]. Unlike the lumped XAJ, which uses the areal mean precipitation as the model input, the semi-distributed SWAT uses precipitation from the single rain gauge closest to each sub-basin's centroid as the model input. Details of these two models are described below.

#### 3.1.1. Xinanjiang Model (XAJ)

The XAJ model is a lumped conceptual rainfall–runoff model with a set of 15 parameters, developed in the 1970s [77,78]. It has been successfully used in humid regions of China [79–81]. The simulation of outflow at the basin outlet mainly consists of three phases: evapotranspiration, runoff generation, and runoff routing. Four parameters account for evapotranspiration, two account for runoff generation, and nine account for runoff routing. Its hydrological cycle is based on the water balance equation:

$$S_t + W_t = S_0 + W_0 + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - E_a - Q_{lat} - Q_{gw} \right) \tag{1}$$

where *S<sub>t</sub>* and *S<sub>0</sub>* are the mean and initial free water storage capacity, *W<sub>t</sub>* and *W<sub>0</sub>* are the mean and initial tension water storage, *R<sub>day</sub>* is the amount of precipitation on day *i*, *Q<sub>surf</sub>* is the amount of surface runoff on day *i*, *E<sub>a</sub>* is the amount of evapotranspiration on day *i*, *Q<sub>lat</sub>* is the amount of lateral flow on day *i*, and *Q<sub>gw</sub>* is the amount of groundwater flow on day *i*.

The evapotranspiration is calculated by dividing the soil into three layers: an upper layer, a lower layer, and a deep layer. The storage curve calculates the total runoff according to the hypothesis that when the soil moisture content reaches the field capacity, all rainfall turns into runoff. The rainfall exceeding infiltration is transformed into the surface runoff *Q<sub>surf</sub>*, and the rainfall that has infiltrated belongs to the lateral flow *Q<sub>lat</sub>* and groundwater flow *Q<sub>gw</sub>*.

#### 3.1.2. Soil and Water Assessment Tool Model (SWAT)

SWAT, a physically-based semi-distributed model, is designed to predict the effects of land management practices on hydrology, sediment, and contaminant transport [82]. SWAT can be operated under different soil compositions, land uses, and management conditions in an agricultural watershed [3,83]. Unlike the XAJ model, which uses the whole basin as the operation unit, SWAT divides the entire basin into several sub-basins, and each sub-basin is further divided into several Hydrologic Response Units (HRUs). Each HRU is calculated individually based on relatively homogeneous land use, land cover, and soil types. The water balance of SWAT is described as:

$$SW_t = SW_0 + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - E_a - W_{seep} - Q_{gw} \right) \tag{2}$$

where *SW<sub>t</sub>* is the final soil water content, *SW<sub>0</sub>* is the initial soil water content on day *i*, *t* is the time, *R<sub>day</sub>* is the precipitation amount on day *i*, *Q<sub>surf</sub>* is the surface runoff amount on day *i*, and *W<sub>seep</sub>* is the water amount entering the vadose zone from the soil profile on day *i*.
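As a minimal sketch of how Equation (2) accumulates daily fluxes, the bookkeeping can be written as below. The function name and the flux values (in mm) are hypothetical; the sketch only illustrates the summation and does not reproduce how SWAT computes each flux.

```python
def soil_water_balance(sw0, daily_fluxes):
    """Accumulate Eq. (2): SW_t = SW_0 + sum(R_day - Q_surf - E_a - W_seep - Q_gw).

    `daily_fluxes` holds one (r_day, q_surf, e_a, w_seep, q_gw) tuple per day,
    all in mm. The values used below are illustrative only.
    """
    sw = sw0
    for r_day, q_surf, e_a, w_seep, q_gw in daily_fluxes:
        sw += r_day - q_surf - e_a - w_seep - q_gw
    return sw

# Two hypothetical days: a 20 mm rain day followed by a dry day
fluxes = [(20.0, 5.0, 3.0, 2.0, 1.0), (0.0, 0.0, 4.0, 1.0, 1.0)]
sw_t = soil_water_balance(100.0, fluxes)
```

In this example the rain day adds 9 mm to storage and the dry day removes 6 mm, so the storage rises from 100 mm to 103 mm.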

The Penman–Monteith method is used to estimate evapotranspiration *E<sub>a</sub>* [84]. The surface runoff volume *Q<sub>surf</sub>* is calculated by the Soil Conservation Service Curve Number method, and groundwater flow *Q<sub>gw</sub>* is simulated by creating a shallow aquifer. The streamflow at the basin outlet is obtained by routing each sub-basin's simulation results with the Muskingum method [85].
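For orientation, the textbook form of the Soil Conservation Service Curve Number relation can be sketched as follows. This is an assumption-laden illustration rather than SWAT's implementation: it uses the conventional initial abstraction ratio of 0.2 and a fixed curve number, whereas SWAT additionally adjusts the retention parameter with soil moisture. The function name and example inputs are hypothetical.

```python
def scs_curve_number_runoff(rain_mm, curve_number, ia_ratio=0.2):
    """Daily surface runoff (mm) from the textbook SCS Curve Number relation.

    S = 25.4 * (1000 / CN - 10) is the retention parameter in mm, and
    Q = (P - Ia)^2 / (P - Ia + S) when P exceeds Ia = ia_ratio * S.
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve_number must be in (0, 100]")
    retention = 25.4 * (1000.0 / curve_number - 10.0)
    initial_abstraction = ia_ratio * retention
    if rain_mm <= initial_abstraction:
        return 0.0  # all rainfall is abstracted before runoff starts
    excess = rain_mm - initial_abstraction
    return excess ** 2 / (excess + retention)

# Hypothetical inputs: 50 mm and 10 mm of daily rainfall on a CN = 80 surface
q_wet = scs_curve_number_runoff(50.0, 80.0)
q_dry = scs_curve_number_runoff(10.0, 80.0)
```

With CN = 80 the retention is 63.5 mm and the initial abstraction is 12.7 mm, so the 10 mm event produces no runoff while the 50 mm event produces roughly 13.8 mm.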

#### 3.1.3. Model Calibration and Validation

The XAJ and SWAT models are calibrated using the Shuffled Complex Evolution (SCE-UA) algorithm [86] and the Sequential Uncertainty Fitting version 2 (SUFI2) algorithm [87], respectively, with the Nash–Sutcliffe efficiency (NSE; see Table 2) as the objective function. Both models are calibrated from 2004 to 2010 and validated from 2011 to 2013, with 2003 used as the spin-up year.
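The objective function can be sketched as below, assuming the standard definition of the NSE (one minus the ratio of the squared simulation error to the variance of the observations about their mean). The function name and the streamflow values are hypothetical, and the calibration algorithms themselves are not reproduced.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Standard NSE: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    error_sum = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    if variance_sum == 0.0:
        raise ValueError("observed series has no variance")
    return 1.0 - error_sum / variance_sum

# Hypothetical daily streamflow (m3/s)
obs = [1.0, 2.0, 3.0, 4.0]
nse_perfect = nash_sutcliffe_efficiency(obs, obs)
nse_mean = nash_sutcliffe_efficiency(obs, [2.5, 2.5, 2.5, 2.5])
```

A perfect simulation yields an NSE of 1, and a simulation that is no better than the observed mean yields 0, which is why the NSE is maximized during calibration.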

#### *3.2. Statistical Analysis Methods*

A set of statistical indices is used to evaluate the performance of the eight satellite-based datasets in preserving precipitation and simulating watershed runoff. For precipitation evaluation, the indices include (1) four categorical statistics for detection error, (2) three quantitative metrics, two of which reflect the systematic and random errors, and (3) four extreme precipitation statistics. For hydrological evaluation, one metric determines the overall hydrological performance and three hydrological statistics reflect the characteristic values of streamflow. Additionally, the error propagation from precipitation to streamflow is quantified by two absolute ratios. A list of the indices can be found in Table 2, and more details are explained in the following section.

#### 3.2.1. Precipitation Indices

Detection, systematic, and random errors are the three main error sources of satellite-based datasets [35,88]. False alarms (when gauges do not observe the satellite-detected precipitation) and missed rain (when the gauge-observed precipitation is not detected by satellites) constitute the detection errors [89]. When the satellite correctly detects precipitation, errors in the estimated precipitation amount comprise the systematic and random errors [90–93].

In this study, four categorical statistics, namely the frequency bias index (FBI), the probability of detection (POD), the false alarm ratio (FAR), and the equitable threat score (ETS), are used to quantify the detection errors of each satellite-based dataset [1]. The FBI reflects the tendency to underestimate or overestimate the number of rainfall events. The FAR measures the fraction of satellite-detected rain events that were false alarms, while the POD measures the fraction of observed rain occurrences that were correctly detected. The ETS provides an overall skill measurement of the correctly detected rain events relative to all events that were observed and/or detected.
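The four categorical statistics can be sketched from a contingency table of hits, misses, and false alarms. The sketch assumes the standard forecast-verification definitions and a hypothetical wet-day threshold of 1 mm/day; the exact formulas and threshold used in this study are those listed in Table 2 and are not reproduced here.

```python
def detection_scores(reference, satellite, wet_threshold=1.0):
    """Return FBI, POD, FAR and ETS from paired daily series (mm/day).

    `wet_threshold` is an assumed rain/no-rain threshold, not a value
    taken from the study.
    """
    hits = misses = false_alarms = total = 0
    for ref, sat in zip(reference, satellite):
        total += 1
        ref_wet = ref >= wet_threshold
        sat_wet = sat >= wet_threshold
        if ref_wet and sat_wet:
            hits += 1
        elif ref_wet:
            misses += 1
        elif sat_wet:
            false_alarms += 1
    observed = hits + misses
    detected = hits + false_alarms
    if observed == 0 or detected == 0:
        raise ValueError("need at least one observed and one detected event")
    # Hits expected by chance, removed from the threat score to give the ETS
    random_hits = observed * detected / total
    return {
        "FBI": detected / observed,
        "POD": hits / observed,
        "FAR": false_alarms / detected,
        "ETS": (hits - random_hits)
        / (hits + misses + false_alarms - random_hits),
    }

# Hypothetical eight-day series
gauge = [0.0, 5.0, 0.0, 3.0, 0.0, 2.0, 0.0, 0.0]
sat = [0.0, 4.0, 2.0, 0.0, 0.0, 6.0, 0.0, 0.0]
scores = detection_scores(gauge, sat)
```

In this example the satellite scores two hits, one miss, and one false alarm, so the FBI equals 1 even though one third of the detections are false alarms; this is why the FBI is read together with the POD and FAR.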

The three quantitative statistics of precipitation are the relative bias (RB), the unbiased root mean squared error (ubRMSE), and the coefficient of determination (R<sup>2</sup>). RB reflects the systematic error, which is the relative difference in the long-term mean values of the two series. Although the RMSE shows the amplitude of differences between the two series, it does not directly reflect the random error unless the systematic error is removed by subtracting the mean difference, which yields the ubRMSE. R<sup>2</sup> indicates the correlation between the two series.
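A minimal sketch of the three quantitative statistics is given below. It assumes common definitions (RB as the relative difference of totals, ubRMSE as the RMSE of the mean-removed series, and R<sup>2</sup> as the squared Pearson correlation); the formulas actually used are those in Table 2, and the function name and sample values are hypothetical.

```python
import math

def quantitative_metrics(reference, satellite):
    """Return (RB, ubRMSE, R2) for two equally long series."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_sat = sum(satellite) / n
    ref_anom = [r - mean_ref for r in reference]
    sat_anom = [s - mean_sat for s in satellite]
    # Systematic error: relative difference of the totals (hence of the means)
    rb = (sum(satellite) - sum(reference)) / sum(reference)
    # Random error: RMSE after removing the mean difference
    ubrmse = math.sqrt(
        sum((s - r) ** 2 for s, r in zip(sat_anom, ref_anom)) / n
    )
    covariance = sum(s * r for s, r in zip(sat_anom, ref_anom))
    r2 = covariance ** 2 / (
        sum(r * r for r in ref_anom) * sum(s * s for s in sat_anom)
    )
    return rb, ubrmse, r2

# Hypothetical series: the satellite overestimates every day by exactly 1 mm
rb, ubrmse, r2 = quantitative_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

A constant offset appears entirely in RB (40% here) while the ubRMSE is zero and R<sup>2</sup> is one, which illustrates why the two error components are reported separately.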

Four extreme statistics are selected from the list recommended by the joint World Meteorological Organization Commission for Climatology/World Climate Research Programme project on Climate Change Detection and Indices (https://www.climdex.org/indices.html). These are the annual total precipitation from wet days on which the daily amount exceeds the 99th percentile (R99pTOT), the annual mean daily precipitation amount on wet days (SDII), and the maximum lengths of wet and dry spells (CWD and CDD). R99pTOT is a threshold index, and SDII reflects the intensity of precipitation. CWD (CDD) shows the duration of extreme precipitation (non-precipitation) events.
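The four extreme indices can be sketched for a single year of daily data as follows. The sketch assumes the usual wet-day threshold of 1 mm and takes the 99th-percentile threshold as an input, since it is computed from wet days over a reference period that is not specified here; the function names and the ten-day sample are hypothetical.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def extreme_indices(daily_mm, p99_threshold, wet_day_mm=1.0):
    """Return R99pTOT, SDII, CWD and CDD for one year of daily precipitation.

    `p99_threshold` is the wet-day 99th percentile from a reference period,
    supplied by the caller (an assumption of this sketch).
    """
    wet = [p for p in daily_mm if p >= wet_day_mm]
    return {
        "R99pTOT": sum(p for p in wet if p > p99_threshold),
        "SDII": sum(wet) / len(wet) if wet else 0.0,
        "CWD": longest_run(p >= wet_day_mm for p in daily_mm),
        "CDD": longest_run(p < wet_day_mm for p in daily_mm),
    }

# Hypothetical ten-day sample with an assumed threshold of 50 mm
sample = [0.0, 0.0, 5.0, 12.0, 0.5, 0.0, 0.0, 0.0, 60.0, 3.0]
indices = extreme_indices(sample, p99_threshold=50.0)
```

In the sample, the 0.5 mm day counts as dry, so the longest dry spell is four days and the longest wet spell is two days.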
