**2. Data and Methods**

Daily accumulated in situ precipitation data from 100 synoptic meteorological stations and 221 Ministry of Energy rain gauges were collected for the period from 15 March to 2 April 2019. These data had been quality controlled by the respective organizations. Ensemble precipitation forecasts from three major global meteorological centers, namely ECMWF, NCEP, and UKMO, were then extracted from the TIGGE database at a 24-h lead time (https://confluence.ecmwf.int/display/TIGGE). TIGGE is part of the THORPEX project and includes ensemble forecasts from 11 world NWP centers. An ensemble forecast consists of multiple individual forecasts generated with different physical parameterizations or different initial conditions. Several NWP evaluation studies have found that these three models outperform the others in different regions of Iran and the world [9,27]; accordingly, the NCEP, UKMO, and ECMWF models were selected as the numerical forecasts evaluated in this study. The NWP data are provided in GRIB2 format at a resolution of 50 km. In addition, daily IMERG-V06B-RT satellite estimates with a spatial resolution of approximately 10 km covering the whole of Iran were downloaded in NetCDF format for the study period. Figure 1 shows the spatial distribution of the in situ observations overlaid on the elevation map, and Table 1 summarizes the characteristics of the three NWP models and the IMERG product.

**Figure 1.** The spatial distribution of in situ observations on the elevation map of Iran.


**Table 1.** Characteristics of the studied NWP models and IMERG satellite.

In this study, the evaluation was conducted in three steps. In the first step, the ability of the three NWP models and the SPEs to capture the spatial distribution of precipitation was compared for the three flood events (17–22 March, 24–26 March, and 31 March to 2 April 2019). The numerical forecasts and the SPE data have spatial resolutions of 50 × 50 km and ~10 × 10 km, respectively. The SPEs were therefore aggregated from 10 km to 50 km using the cubic convolution resampling method, which computes a weighted average of the 16 nearest neighboring pixels [28]. A map of the in situ observations was also constructed at 50 × 50 km resolution using inverse distance weighting (IDW) interpolation. Moreover, for a more robust comparison of the spatial distribution of precipitation, isohyets over the three basins were derived for all three flood events.
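The cubic convolution aggregation can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the GIS tool used in the study: it assumes the widely used Keys kernel with parameter `a = -0.5`, pixel centres at integer indices, clamping at grid edges, and coarse values sampled at coarse-cell centres. The function names `keys_kernel`, `cubic_sample`, and `resample_cubic` are hypothetical.

```python
import numpy as np


def keys_kernel(x, a=-0.5):
    """Cubic convolution weight for distance x (in pixels); a = -0.5 assumed."""
    x = np.abs(np.asarray(x, dtype=float))
    w = np.zeros_like(x)
    near = x <= 1
    far = (x > 1) & (x < 2)
    w[near] = (a + 2) * x[near] ** 3 - (a + 3) * x[near] ** 2 + 1
    w[far] = a * x[far] ** 3 - 5 * a * x[far] ** 2 + 8 * a * x[far] - 4 * a
    return w


def cubic_sample(grid, row, col, a=-0.5):
    """Weighted average of the 4 x 4 = 16 pixels nearest to (row, col).

    Indices falling outside the grid are clamped to the nearest edge pixel.
    """
    r0, c0 = int(np.floor(row)), int(np.floor(col))
    rows = np.arange(r0 - 1, r0 + 3)
    cols = np.arange(c0 - 1, c0 + 3)
    wr = keys_kernel(row - rows, a)  # separable weights along rows
    wc = keys_kernel(col - cols, a)  # separable weights along columns
    rr = np.clip(rows, 0, grid.shape[0] - 1)
    cc = np.clip(cols, 0, grid.shape[1] - 1)
    patch = grid[np.ix_(rr, cc)]
    return float(wr @ patch @ wc)


def resample_cubic(fine, factor, a=-0.5):
    """Aggregate a fine grid to one `factor` times coarser (e.g. 10 km -> 50 km: factor 5)."""
    n_r, n_c = fine.shape[0] // factor, fine.shape[1] // factor
    offset = (factor - 1) / 2.0  # coarse-cell centre in fine-pixel coordinates
    out = np.empty((n_r, n_c))
    for i in range(n_r):
        for j in range(n_c):
            out[i, j] = cubic_sample(fine, i * factor + offset, j * factor + offset, a)
    return out
```

In this sketch, when a coarse-cell centre falls exactly on a fine-pixel centre (as with an odd factor such as 5 on aligned grids), the kernel gives weight 1 to that pixel and 0 to the other 15; the 16 weights become non-trivial only when the target point lies between pixel centres.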

In the second step, the mean, maximum, and minimum precipitation values were determined for each event to provide more detailed insight.
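A minimal sketch of the second-step summary follows, assuming the statistics are taken over the 50-km grid cells inside each basin and that missing cells are stored as NaN. The function `basin_stats` and the boolean basin mask are hypothetical names, not part of the original workflow.

```python
import numpy as np


def basin_stats(event_total, basin_mask):
    """Mean, maximum, and minimum of an event-total field within one basin.

    event_total : 2-D array of accumulated precipitation (mm)
    basin_mask  : boolean array of the same shape, True inside the basin
    NaN cells (missing data) are ignored.
    """
    values = event_total[basin_mask]
    return {
        "mean": float(np.nanmean(values)),
        "max": float(np.nanmax(values)),
        "min": float(np.nanmin(values)),
    }
```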

In the third step, the NWP forecasts and satellite data were interpolated to the station locations so that precipitation could be compared directly with the in situ measurements. The IDW interpolation used the four grid points surrounding each station, and the interpolated precipitation at each station was then evaluated against the observation. The dichotomous (yes/no) performance for daily precipitation was also examined. For this purpose, thresholds of 25, 50, 75, and 100 mm/day were set, and the numbers of events correctly detected by the satellite and NWP models at each threshold were compared. A precipitation event was counted if at least one of the operating stations in a basin recorded precipitation; if none of the stations recorded precipitation, a "no-precipitation" event was assigned to the whole basin. The probability of detection (POD), the false alarm ratio (FAR), and the equitable threat score (ETS) were then used to examine the capability of the products to detect precipitation events. POD ranges from 0 to 1 and ETS from −1/3 to 1, with 1 a perfect score for both, whereas the perfect FAR score is 0. In addition, the dichotomous scores of the stations in each basin were averaged for each of the four thresholds. Interested readers are referred to Wilks (2011) for further details on dichotomous (yes/no) evaluation [29]. Table 2 lists the metrics used to evaluate the precipitation estimates.
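The grid-to-station interpolation can be sketched as follows. The sketch assumes a regular latitude/longitude grid with ascending 1-D coordinate arrays, great-circle (haversine) distances, and an inverse-distance power of 2; the source does not state the power or the distance measure, and `haversine_km` and `idw_from_grid` are hypothetical names.

```python
import numpy as np


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) on a sphere of radius 6371 km."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2 - lon1)
    h = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(h))


def idw_from_grid(st_lat, st_lon, lats, lons, field, power=2.0):
    """IDW estimate at a station from the four grid nodes surrounding it."""
    # Index of the lower-left node of the enclosing cell, kept inside the grid.
    i = int(np.clip(np.searchsorted(lats, st_lat) - 1, 0, len(lats) - 2))
    j = int(np.clip(np.searchsorted(lons, st_lon) - 1, 0, len(lons) - 2))
    rows = [i, i, i + 1, i + 1]
    cols = [j, j + 1, j, j + 1]
    d = haversine_km(st_lat, st_lon, lats[rows], lons[cols])
    vals = field[rows, cols]
    if np.any(d < 1e-9):  # station coincides with a grid node
        return float(vals[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * vals) / np.sum(w))
```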



Notes: *F* and *O* denote the forecast and the corresponding observation, respectively, and *F̄* and *Ō* denote the forecast and observation averages, respectively. A, B, C, and D are the entries of the contingency table.
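Since the dichotomous scores depend only on the contingency-table entries, the sketch below shows one way to compute POD, FAR, and ETS from them. It assumes the common convention (e.g. Wilks) of A = hits, B = false alarms, C = misses, and D = correct negatives, and treats a value at or above the threshold as a "yes"; both choices, and the function names, are assumptions rather than details taken from the study.

```python
import numpy as np


def contingency(fcst, obs, threshold):
    """2 x 2 counts for the event 'precipitation >= threshold' (mm/day).

    Assumed labels: A = hits, B = false alarms, C = misses, D = correct negatives.
    """
    f = np.asarray(fcst) >= threshold
    o = np.asarray(obs) >= threshold
    a = int(np.sum(f & o))
    b = int(np.sum(f & ~o))
    c = int(np.sum(~f & o))
    d = int(np.sum(~f & ~o))
    return a, b, c, d


def scores(a, b, c, d):
    """POD, FAR, and ETS from contingency counts (NaN where undefined)."""
    nan = float("nan")
    n = a + b + c + d
    pod = a / (a + c) if a + c else nan
    far = b / (a + b) if a + b else nan
    if n == 0:
        return {"POD": pod, "FAR": far, "ETS": nan}
    a_random = (a + b) * (a + c) / n  # hits expected by chance
    denom = a + b + c - a_random
    ets = (a - a_random) / denom if denom else nan
    return {"POD": pod, "FAR": far, "ETS": ets}
```

For the basin-level counts described above, per-station daily values would first be collapsed to a single yes/no per basin and day (for example with `np.any` along the station axis) before counting.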

The evaluations in the first and second steps were based on the total precipitation accumulated over each flood event: the first event comprised the six-day accumulation from 17 to 22 March 2019, the second the three-day accumulation from 24 to 26 March 2019, and the third the three-day accumulation from 31 March to 2 April 2019. The third-step evaluation, in contrast, used daily accumulated precipitation from 15 March to 2 April 2019.
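The event totals used in the first two steps can be sketched as sums over inclusive date windows of a daily stack. The sketch assumes the daily fields are stacked along axis 0 beginning on 15 March 2019 (19 days through 2 April); `EVENTS` and `event_totals` are hypothetical names.

```python
from datetime import date

import numpy as np

# Inclusive event windows as stated in the text.
EVENTS = {
    "event 1": (date(2019, 3, 17), date(2019, 3, 22)),
    "event 2": (date(2019, 3, 24), date(2019, 3, 26)),
    "event 3": (date(2019, 3, 31), date(2019, 4, 2)),
}


def event_totals(daily, start=date(2019, 3, 15)):
    """Sum daily fields (axis 0 = day, first day = `start`) over each event window."""
    daily = np.asarray(daily, dtype=float)
    totals = {}
    for name, (first, last) in EVENTS.items():
        i0 = (first - start).days
        i1 = (last - start).days + 1  # +1 so the last day is included
        if i1 > daily.shape[0]:
            raise ValueError(f"daily data end before {last}")
        totals[name] = daily[i0:i1].sum(axis=0)
    return totals
```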
