Reliable estimation of extreme rainfall is essential for hydraulic design and flood risk mitigation, particularly in arid regions where rainfall exhibits strong temporal and spatial variability. This study presents a statistical framework for developing rainfall intensity-duration-frequency (IDF) curves, complemented by a machine learning-based assessment of model bias and performance. The analysis was conducted using data from ten rainfall stations located within or near the Wadi Al-Rummah Basin. Annual maximum series (AMS) from 1969 to 2024 were first reconstructed to address missing years using a modified normal ratio method (NRM) combined with nearest-station selection, ensuring spatial consistency while preserving station-specific rainfall characteristics. Six probability distributions (Weibull, Gumbel, gamma, lognormal, generalized extreme value (GEV), and generalized Pareto) were fitted to each station, and the best-fit distribution was identified using multiple goodness-of-fit (GOF) criteria, including the Kolmogorov–Smirnov (K-S) test, Anderson–Darling (A-D) test, root mean square error (RMSE), chi-square (χ²) statistic, Akaike information criterion (AIC), Bayesian information criterion (BIC), and the coefficient of determination (R²). Statistical IDF curves were then developed for durations ranging from 5 to 1440 min and return periods from 2 to 1000 years. To evaluate the robustness of the statistically derived IDF curves, three machine learning (ML) models, namely multiple linear regression (MLR), regression random forest (RRF), and multilayer feed-forward neural network (MFFNN), were trained as surrogate models using duration, return period, and station geographic attributes as predictor variables. Model performance was evaluated using RMSE, mean absolute error (MAE), and mean bias metrics across stations and return periods. The lognormal distribution emerged as the best-fit model for four stations, while the Gumbel and gamma distributions were selected for two stations each. Overall, no single probability distribution consistently outperformed the others, indicating station-dependent behavior. Among the machine learning models, the MFFNN achieved the closest agreement with statistical IDF estimates, followed by RRF and MLR, based on global average performance across all stations and return periods. The proposed framework offers a reliable approach for rainfall IDF development and evaluation in arid-region watersheds.
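The distribution-fitting and return-level step described above can be illustrated with a minimal sketch. It is not the study's implementation: it covers only two of the six candidate distributions (Gumbel and lognormal), uses a method-of-moments Gumbel fit as a stand-in because the abstract does not state the estimation method, ranks the fits by AIC alone rather than the full set of GOF criteria, and runs on made-up annual maxima. All function names and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(ams):
    """Method-of-moments Gumbel fit; returns (location, scale)."""
    scale = stdev(ams) * math.sqrt(6) / math.pi
    loc = mean(ams) - EULER_GAMMA * scale
    return loc, scale


def gumbel_loglik(ams, loc, scale):
    """Log-likelihood of the Gumbel (maxima) distribution."""
    z = [(x - loc) / scale for x in ams]
    return sum(-math.log(scale) - zi - math.exp(-zi) for zi in z)


def gumbel_quantile(p, loc, scale):
    """Value with non-exceedance probability p."""
    return loc - scale * math.log(-math.log(p))


def fit_lognormal(ams):
    """Maximum-likelihood lognormal fit; returns (mu, sigma) of log-values."""
    logs = [math.log(x) for x in ams]
    mu = mean(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def lognormal_loglik(ams, mu, sigma):
    """Log-likelihood of the two-parameter lognormal distribution."""
    return sum(
        -math.log(x * sigma * math.sqrt(2 * math.pi))
        - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
        for x in ams
    )


def lognormal_quantile(p, mu, sigma):
    """Value with non-exceedance probability p."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik


# Hypothetical annual maximum daily rainfall (mm) for one station.
ams = [12.4, 30.1, 8.7, 22.5, 45.0, 15.3, 19.8, 27.6, 10.2, 35.4, 17.1, 24.9]

g_loc, g_scale = fit_gumbel_moments(ams)
ln_mu, ln_sigma = fit_lognormal(ams)

scores = {
    "gumbel": aic(gumbel_loglik(ams, g_loc, g_scale), 2),
    "lognormal": aic(lognormal_loglik(ams, ln_mu, ln_sigma), 2),
}
best = min(scores, key=scores.get)

# Return level for return period T uses non-exceedance probability 1 - 1/T.
return_periods = (2, 10, 100, 1000)
levels = {
    T: {
        "gumbel": gumbel_quantile(1 - 1 / T, g_loc, g_scale),
        "lognormal": lognormal_quantile(1 - 1 / T, ln_mu, ln_sigma),
    }
    for T in return_periods
}
```

Here `best` holds the name of the lower-AIC candidate and `levels[T]` the estimated depth for each return period under both fits. Converting such daily depths into intensities for the 5 to 1440 min durations requires a temporal disaggregation step that the abstract does not specify, so it is left out of the sketch.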