## (2) Statistical Performance Comparison among the SPPs

Figure 8 compares, at the hourly scale, the boxplots of the four continuous evaluation metrics (April to October) of the SPPs. For example, Figure 8b contains five boxplots, each characterizing the distribution of the RMSEs (April to October) of one SPP across the 13 rainfall stations. The RMSEs of IMERG\_E range from 1.42 to 1.77 among the 13 rainfall stations, compared with 1.34–1.71 for IMERG\_L, 1.45–1.74 for IMERG\_F, 1.49–1.77 for 3B42, and 1.55–1.83 for 3B42RT.
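Each boxplot in Figure 8 is built from one value per station, so each metric is first computed from the paired hourly series (satellite vs. gauge) at every station. The sketch below assumes the common definitions CC = Pearson correlation, RMSE = √mean((S − G)²), RB = Σ(S − G)/ΣG and MAD = mean|S − G|; these formulas are not restated in this section, and the function names are hypothetical.

```python
import math

def continuous_metrics(sat, gauge):
    """CC, RMSE, RB and MAD for paired hourly rainfall series.

    Assumed definitions (not taken from this section):
      CC   = Pearson correlation coefficient
      RMSE = sqrt(mean((S - G)^2))
      RB   = sum(S - G) / sum(G)   (a fraction; x100 for percent)
      MAD  = mean(|S - G|)
    """
    if len(sat) != len(gauge) or len(sat) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    n = len(sat)
    diffs = [s - g for s, g in zip(sat, gauge)]
    mean_s, mean_g = sum(sat) / n, sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(sat, gauge))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    return {
        "CC": cov / math.sqrt(var_s * var_g),
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "RB": sum(diffs) / sum(gauge),
        "MAD": sum(abs(d) for d in diffs) / n,
    }

def metric_range(per_station, name):
    """Min and max of one metric across stations (the ranges quoted in the text)."""
    values = [m[name] for m in per_station]
    return min(values), max(values)

# Toy series (not study data): m["RB"] == -0.125 and m["MAD"] == 0.75
m = continuous_metrics([0.0, 2.0, 4.0, 1.0], [0.0, 1.0, 5.0, 2.0])
```

Applying `continuous_metrics` to each of the 13 stations and then `metric_range` to the resulting list gives per-SPP ranges of the kind quoted above.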

Levene's test confirmed that all four metrics satisfy the precondition of homogeneity of variance required for one-way ANOVA. The subsequent one-way ANOVA showed that, for each of the four metrics, the means differ significantly among the SPPs at a significance level (α) of 0.05; that is, the mean of at least one SPP differs from the others. Post hoc comparison tests further showed that the CCs differ significantly between most pairs of SPPs, except for four pairs (IMERG\_L/IMERG\_F, IMERG\_E/3B42, IMERG\_E/3B42RT, and 3B42/3B42RT), while the RBs differ significantly for all pairs except two (IMERG\_E/IMERG\_L and IMERG\_F/3B42). Unlike CC and RB, the RMSEs differ significantly for only one pair (IMERG\_L/3B42RT). Finally, the MADs differ significantly only between 3B42RT and each of the IMERG products, and between 3B42 and IMERG\_L (Figure 8). Notably, the post hoc tests show that IMERG\_F does not differ significantly from 3B42 in any of the four metrics except CC.
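The testing sequence above (Levene's test, then one-way ANOVA, then pairwise post hoc comparisons) can be sketched in dependency-free Python. This is a hedged illustration, not the authors' procedure: the section does not name its software or its post hoc test, so Bonferroni-adjusted pairwise t-tests with the pooled ANOVA variance stand in for it here, and the classical mean-centred Levene test is assumed (`scipy.stats.levene` defaults to the median-centred Brown-Forsythe variant). In practice `scipy.stats.levene` and `scipy.stats.f_oneway` would normally be used; the F-distribution tail is computed here with the regularized incomplete beta function so the sketch runs on the standard library alone.

```python
import math
from itertools import combinations

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    if f <= 0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def one_way_anova(groups):
    """Return (F, p, df_within, ms_within) for a list of samples."""
    k, n = len(groups), sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ss_w = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n - k
    ms_w = ss_w / df_w
    f = (ss_b / df_b) / ms_w
    return f, f_sf(f, df_b, df_w), df_w, ms_w

def levene(groups):
    """Classical Levene test: one-way ANOVA on |x - group mean|."""
    dev = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    f, p, _, _ = one_way_anova(dev)
    return f, p

def bonferroni_pairs(groups, names):
    """Stand-in post hoc test: pooled-variance t-tests, Bonferroni-adjusted p."""
    _, _, df_w, ms_w = one_way_anova(groups)
    pairs = list(combinations(range(len(groups)), 2))
    out = {}
    for i, j in pairs:
        gi, gj = groups[i], groups[j]
        diff = sum(gi) / len(gi) - sum(gj) / len(gj)
        se = math.sqrt(ms_w * (1 / len(gi) + 1 / len(gj)))
        p = f_sf((diff / se) ** 2, 1, df_w)  # t^2 ~ F(1, df_w): two-sided p
        out[(names[i], names[j])] = min(1.0, p * len(pairs))
    return out
```

With the 13 station values of one metric per SPP as `groups` and the five product names as `names`, a pair would be judged significantly different when its adjusted p falls below α = 0.05.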

**Figure 8.** Boxplots of the four continuous evaluation metrics (April to October) of the SPPs at the hourly scale: (**a**) correlation coefficient (*CC*) (Levene's test, *p* = 0.77; one-way ANOVA, *p* = 3.0 × 10<sup>−13</sup>); (**b**) root-mean-square error (*RMSE*) (Levene's test, *p* = 0.63; one-way ANOVA, *p* = 0.03); (**c**) relative bias (*RB*) (Levene's test, *p* = 0.61; one-way ANOVA, *p* = 5.6 × 10<sup>−16</sup>); and (**d**) mean absolute difference (*MAD*) (Levene's test, *p* = 0.87; one-way ANOVA, *p* = 9.8 × 10<sup>−10</sup>). For CC and RB, two SPPs are connected by a black dotted line if post hoc comparison tests indicate a non-significant difference between their means at α = 0.05. For RMSE and MAD, two SPPs are connected by a red dotted line if post hoc comparison tests indicate a significant difference between their means at α = 0.05. Each boxplot depicts the distribution, and hence the variation, of a metric across the 13 rainfall stations. In each boxplot, the bottom and top of the box represent the first and third quartiles, respectively. The whiskers extend to the most extreme values lying within 1.5 times the interquartile range (IQR) of the box edges. The horizontal line inside the box represents the median, and the '×' inside the box represents the mean.
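The boxplot elements described in the caption (quartile box, median line, '×' for the mean, whiskers at 1.5 × IQR) can be computed as sketched below. Quartile conventions differ between software packages, so the `method="inclusive"` choice for `statistics.quantiles` is an assumption and may not match the tool used to draw Figure 8; `box_stats` is a hypothetical helper name.

```python
import statistics

def box_stats(values):
    """Summary needed to draw one Tukey-style boxplot (quartile method assumed)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "mean": statistics.fmean(values),        # the '×' marker
        "whisker_low": min(inside),              # most extreme point within the fence
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lo_fence or v > hi_fence],
    }
```

For 13 station values per SPP, any station falling beyond the fences would be plotted as an individual point rather than being covered by a whisker.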

## (3) Spatial Variation

Figure 9 compares the spatial distribution of the four continuous evaluation metrics (April to October) among the SPPs. At the hourly scale, topography also does not appear to be a major factor influencing the CCs, as relatively low CC values are observed at stations at both low and high altitudes. However, the spatial distribution of the other three metrics does indicate a marked effect of topography on the performance of the SPPs in estimating hourly rainfall. RMSEs and MADs exhibit similar spatial patterns across the five SPPs, with the three high-altitude stations (stations 10, 12, and 13) consistently showing the highest values. As discussed above, the IMERG products tend to underestimate hourly rainfall, and Figure 9 shows that this underestimation is especially severe at higher altitudes. The 3B42 product likewise tends to underestimate hourly rainfall more at high altitudes. Unlike the other SPPs, the 3B42RT product tends to overestimate hourly rainfall, particularly at lower altitudes.

**Figure 9.** Spatial distribution of the continuous evaluation metrics (April to October) at hourly scale: (**a**) *CC*; (**b**) *RMSE*; (**c**) *RB*; and (**d**) *MAD*.

## 4.3.2. Categorical Evaluation Metrics
