**5. Conclusions**

In this study, the performance of ensemble precipitation forecasts from three NWP models in the TIGGE database (ECMWF, NCEP, and UKMO) and of a satellite-based precipitation product (IMERG) was evaluated for three severe flood events in Iran during March–April 2019. In the first step, the performance of the precipitation products in capturing the spatial distribution of precipitation was evaluated. The results showed that all of the products could generally capture the main features of the precipitation system, including the spatial distribution, total accumulation, and extreme values (Figure 2). In general, UKMO, followed by IMERG and ECMWF, performed better than the other products in capturing the spatial distribution of the accumulated precipitation during the 19 days of extreme precipitation events over Iran. However, the in situ observations identified four precipitation hotspots along the Zagros Mountains in western Iran with the largest precipitation amounts, and IMERG outperformed the other products in capturing these hotspots.

In the second step, all of the products were compared with the in situ observations in the three major basins that were most affected by the floods. ECMWF and UKMO, followed by IMERG, compared well with the corresponding in situ measurements in terms of mean precipitation during the first event in the Gorganrud Basin (Figure 7a). For the second flood event, the box-plots indicated that IMERG, followed by ECMWF, outperformed the other products in both the Karkheh and Karun Basins, while the UKMO whiskers extended to the most extreme data points (Figure 7b). In the third flood event, the mean areal precipitation values of all products were rather close to the in situ observations over the Karkheh Basin, while the box-plots showed that the IMERG pixels spanned a wider range than the observations. However, almost all products overestimated the precipitation over the Karkheh and Karun Basins (Figure 7c).

In the third evaluation step, four daily precipitation thresholds of 25 mm, 50 mm, 75 mm, and 100 mm were selected to evaluate the skill of the products in capturing precipitation at the specified thresholds via dichotomous evaluation methods. The results showed that as the threshold increased, the performance of the NCEP model was greatly reduced, while the IMERG estimates improved at higher thresholds. At the 50 mm threshold, the number of events predicted by UKMO was closer to the observed number than that of the other products. At the 75 mm threshold, UKMO gave better results than the other products, whereas NCEP had difficulty in forecasting the precipitation amount at this threshold. At the 100 mm threshold, the in situ observations recorded 22 events, while NCEP detected none. At the higher thresholds, the maximum ensemble forecasts of UKMO captured a larger number of precipitation events than the other models and the satellite product; UKMO detected 11 out of 22 events (Figure 8). However, in terms of the contingency table, ECMWF outperformed the other products with a higher POD and lower FAR (Figure 9).
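As an illustration of the dichotomous evaluation summarized above, the following minimal Python sketch computes the probability of detection (POD) and the false alarm ratio (FAR) at a given daily threshold, using the standard definitions POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms). It is not the code used in this study: the function name `contingency_scores`, the sample values, and the convention that an event means precipitation at or above the threshold are assumptions made for the example.

```python
import numpy as np

def contingency_scores(obs, fcst, threshold):
    """Hypothetical helper: POD and FAR for one daily threshold (mm).

    Assumption: an "event" is precipitation >= threshold; the exact
    convention (>= versus >) is not stated in the text.
    """
    obs_event = np.asarray(obs, dtype=float) >= threshold
    fcst_event = np.asarray(fcst, dtype=float) >= threshold

    hits = int(np.sum(obs_event & fcst_event))          # observed and predicted
    misses = int(np.sum(obs_event & ~fcst_event))       # observed, not predicted
    false_alarms = int(np.sum(~obs_event & fcst_event)) # predicted, not observed

    # Return NaN when a score is undefined (zero denominator).
    pod = hits / (hits + misses) if (hits + misses) else float("nan")
    far = (false_alarms / (hits + false_alarms)
           if (hits + false_alarms) else float("nan"))
    return {"hits": hits, "misses": misses,
            "false_alarms": false_alarms, "pod": pod, "far": far}

# Made-up daily totals (mm) at six stations, for illustration only.
observed = [10, 30, 60, 80, 120, 5]
forecast = [20, 28, 40, 90, 70, 30]

for thr in (25, 50, 75, 100):
    print(thr, contingency_scores(observed, forecast, thr))
```

In this toy example the forecast never reaches 100 mm, so the POD at that threshold is zero and the FAR is undefined, which mirrors the situation in which a product detects none of the observed events at the highest threshold.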

Overall, the results of this study show that the IMERG precipitation estimates and NWP ensemble forecasts performed well in the three major flood events in spring 2019 in Iran. Given the widespread damage caused by the floods, establishing an efficient flood warning system based on the best-performing precipitation products is advised.

The overestimation/underestimation of precipitation by forecast models and satellite-based precipitation products remains a challenge, particularly for extreme precipitation events. Short-duration and extreme precipitation events are much more variable than moderate precipitation events. Therefore, studies on the impact of the uncertainty of precipitation products are needed to obtain a better understanding of how and why precipitation products succeed or fail in the detection of heavy precipitation. Moreover, it is important to note that this study was based on a short period of data (i.e., 19 precipitation days containing three severe flood events) limited to Iran. Thus, further studies using longer datasets at the global scale, covering different climate regimes and geophysical features, are essential to assess the impacts of the aforementioned limitations.

**Author Contributions:** Conceptualization: S.A., B.S. and E.S.; Methodology: B.S., S.A. and E.S.; Analysis: S.A. and E.S.; Writing—original draft: S.A. and E.S.; Writing—review and editing: B.S., S.A. and E.S.; Supervision: B.S.

**Funding:** This research received no external funding.

**Acknowledgments:** The open access publishing was supported by the BOKU Vienna Open Access Publishing Fund.

**Conflicts of Interest:** The authors declare no conflicts of interest.
