#### *3.3. Evaluation Metrics*

A total of four continuous metrics are used to evaluate the quality of the satellite precipitation products in the SRB. The correlation coefficient (CC) quantifies the linear correlation between satellite precipitation estimates and ground measurements; it ranges from −1 to 1, with values close to 0 indicating little correlation. The root-mean-square error (RMSE) quantifies the degree of dispersion between satellite and measured precipitation, reflecting the overall error level and accuracy of the SPPs [36]. The mean absolute difference (MAD) evaluates the magnitude of the average difference between satellite and measured precipitation; smaller values of RMSE and MAD indicate better performance of the SPPs. The relative bias (RB) measures the systematic bias of satellite precipitation relative to gauge observations: a positive RB indicates overestimation and a negative RB underestimation. As a rule of thumb, SPPs can be considered reliable when RB falls between −10% and 10% and CC exceeds 0.7 [37].

The four continuous metrics are calculated as [38–40]

$$\text{CC} = \frac{\sum\_{i=1}^{n} \left( X\_i^o - \overline{X}^o \right) \left( X\_i^s - \overline{X}^s \right)}{\sqrt{\sum\_{i=1}^{n} \left( X\_i^o - \overline{X}^o \right)^2} \sqrt{\sum\_{i=1}^{n} \left( X\_i^s - \overline{X}^s \right)^2}} \tag{1}$$

$$RB = \frac{\sum\_{i=1}^{n} \left(X\_i^s - X\_i^o\right)}{\sum\_{i=1}^{n} X\_i^o} \times 100\% \tag{2}$$

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{n} \left(X\_i^s - X\_i^o\right)^2}{n}} \tag{3}$$

$$MAD = \frac{\sum\_{i=1}^{n} \left| X\_i^o - X\_i^s \right|}{n} \tag{4}$$

where *n* is the number of simulated and observed data pairs; $X\_i^s$ and $X\_i^o$ denote the *i*th simulated and observed precipitation amounts, respectively; and $\overline{X}^s$ and $\overline{X}^o$ are the means of the simulated and observed data, respectively.
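For illustration, Equations (1)–(4) can be computed in a few lines of NumPy. The function name and the sample values below are hypothetical and not taken from the study:

```python
import numpy as np

def continuous_metrics(obs, sim):
    """Compute CC, RB (%), RMSE, and MAD between paired observed (gauge)
    and simulated (satellite) precipitation series, per Eqs. (1)-(4).
    Illustrative helper; the name and sample data are not from the paper."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    cc = np.corrcoef(obs, sim)[0, 1]               # Eq. (1): Pearson correlation
    rb = (sim - obs).sum() / obs.sum() * 100.0     # Eq. (2): relative bias in %
    rmse = np.sqrt(np.mean((sim - obs) ** 2))      # Eq. (3): root-mean-square error
    mad = np.mean(np.abs(obs - sim))               # Eq. (4): mean absolute difference
    return cc, rb, rmse, mad

# Made-up example: a satellite product that slightly overestimates the gauges
gauge = np.array([0.0, 2.0, 5.0, 10.0, 3.0])
sat = np.array([0.5, 2.5, 5.5, 11.0, 3.5])
cc, rb, rmse, mad = continuous_metrics(gauge, sat)
```

Here the consistent overestimation shows up as a positive RB, while the near-linear relationship between the two series yields a CC close to 1.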

Besides the continuous metrics, three categorical evaluation metrics are used to evaluate the precipitation detection capability of the SPPs, which include probability of detection (POD), false alarm rate (FAR), and critical success index (CSI). POD represents the ratio of correctly detected precipitation occurrences by the SPPs to the total number of actual precipitation occurrences. With an optimal value of 1, a higher POD indicates that the SPP is more capable of detecting the actual precipitation occurrences. FAR calculates the ratio of falsely detected precipitation occurrences to the total number of detected precipitation occurrences. With an optimal value of 0, a lower FAR indicates that the SPP is less likely to yield false precipitation occurrences. CSI incorporates both missed events and false detections in its calculation [41]. With an optimal value of 1, a higher CSI indicates a better performance of the SPP with more correct detections as well as fewer false alarms of precipitation occurrences. Based on the number of hits (H), false alarms (F), and misses (M) (Table 1), the three categorical metrics are calculated as

$$POD = \frac{H}{H+M} \tag{5}$$

$$FAR = \frac{F}{H + F} \tag{6}$$

$$\text{CSI} = \frac{H}{H + M + F} \tag{7}$$

where *S* represents the rain gauge observation and *P* the satellite rainfall estimate (Table 1); H (hits) is the number of cases in which both the rain gauge and the satellite report rainfall equal to or exceeding the threshold; F (false alarms) is the number of cases in which the satellite, but not the rain gauge, reports rainfall equal to or exceeding the threshold; M (misses) is the number of cases in which the rain gauge, but not the satellite, reports rainfall equal to or exceeding the threshold; and Z (correct negatives) is the number of cases in which both report rainfall below the threshold.


**Table 1.** Contingency table between rain gauge observations and satellite precipitation estimates.

| | *P* ≥ threshold | *P* < threshold |
| --- | --- | --- |
| *S* ≥ threshold | H (hits) | M (misses) |
| *S* < threshold | F (false alarms) | Z (correct negatives) |
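The contingency counts and Equations (5)–(7) can be sketched as follows. The rain/no-rain threshold is a study-specific choice; the 0.1 mm value and the sample series below are assumptions for illustration only:

```python
import numpy as np

def categorical_metrics(obs, sim, threshold=0.1):
    """POD, FAR, and CSI (Eqs. 5-7) from paired gauge/satellite series.
    The 0.1 mm default threshold is an assumed, illustrative value."""
    rain_obs = np.asarray(obs) >= threshold   # S >= threshold
    rain_sim = np.asarray(sim) >= threshold   # P >= threshold
    H = np.sum(rain_obs & rain_sim)           # hits
    F = np.sum(~rain_obs & rain_sim)          # false alarms
    M = np.sum(rain_obs & ~rain_sim)          # misses
    # Note: denominators can be zero if no events are observed/detected
    pod = H / (H + M)                         # Eq. (5)
    far = F / (H + F)                         # Eq. (6)
    csi = H / (H + M + F)                     # Eq. (7)
    return pod, far, csi

# Made-up six-step series (mm): one false alarm, one miss, two hits
gauge = [0.0, 1.2, 0.0, 3.4, 0.5, 0.0]
sat = [0.3, 1.0, 0.0, 0.0, 0.8, 0.0]
pod, far, csi = categorical_metrics(gauge, sat)
```

With H = 2, M = 1, and F = 1 in this toy series, POD = 2/3, FAR = 1/3, and CSI = 1/2, illustrating how CSI penalizes both the miss and the false alarm.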

#### *3.4. Analysis of Variance (ANOVA)*

To evaluate the rainfall estimation performance of the five satellite precipitation products, four continuous metrics (*CC, RB, RMSE,* and *MAD*) are respectively calculated at each of the 13 rainfall stations across the monthly, daily, and hourly scales. Previous studies have mostly used the mean values of the metrics over all rainfall stations to compare the rainfall estimation performance among the SPPs. This simple averaging approach, however, does not account for the variability in the metrics among the rainfall stations. Furthermore, it is incapable of determining the significance of the difference between the SPPs.

To address this deficiency, we adopt one-way analysis of variance (ANOVA) to statistically evaluate the differences in the metrics among the SPPs. In the ANOVA, the satellite precipitation product type designates the five groups of metrics for comparison. If the ANOVA indicates a significant difference in the mean metrics among the SPPs, several commonly used post hoc comparison tests (the Bonferroni, Šidák, Tukey, and Scheffé tests built into the Origin 2018 Statistical Package) are further applied to identify the pairs of SPPs whose mean metrics are indeed statistically different.
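This ANOVA-plus-post-hoc workflow can be sketched in Python with SciPy; the study itself uses Origin 2018, so this is only an equivalent sketch, and the per-station CC values below are invented (the paper uses 13 stations per product). Bonferroni correction of pairwise t-tests stands in here for the full suite of post hoc tests:

```python
from itertools import combinations
from scipy import stats

# Hypothetical per-station CC values for three SPPs (made-up numbers;
# the paper compares five products at 13 stations)
cc_by_product = {
    "SPP-A": [0.82, 0.79, 0.85, 0.81, 0.80],
    "SPP-B": [0.71, 0.68, 0.73, 0.70, 0.69],
    "SPP-C": [0.80, 0.83, 0.78, 0.82, 0.81],
}

# One-way ANOVA: does at least one product's mean CC differ significantly?
f_stat, p_anova = stats.f_oneway(*cc_by_product.values())

# Bonferroni post hoc: pairwise t-tests at a corrected significance level
pairs = list(combinations(cc_by_product, 2))
alpha_corr = 0.05 / len(pairs)
different = [
    (a, b) for a, b in pairs
    if stats.ttest_ind(cc_by_product[a], cc_by_product[b]).pvalue < alpha_corr
]
```

In this toy setup the ANOVA flags a significant difference, and the post hoc step attributes it to SPP-B, whose mean CC is clearly lower than those of SPP-A and SPP-C.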
