Creation and Comparison of High-Resolution Daily Precipitation Gridded Datasets for Greece Using a Variety of Interpolation Techniques

Ntagkounakis, Giorgos; Nastos, Panagiotis; Kapsomenakis, John; Douvis, Kostas

doi:10.3390/hydrology12020031

¹ Laboratory of Climatology and Atmospheric Environment, Department of Geology and Geoenvironment, National and Kapodistrian University of Athens, 15784 Athens, Greece

² Research Center for Atmospheric Physics and Climatology, Academy of Athens, 11527 Athens, Greece

* Author to whom correspondence should be addressed.

Hydrology 2025, 12(2), 31; https://doi.org/10.3390/hydrology12020031

Submission received: 11 January 2025 / Revised: 4 February 2025 / Accepted: 7 February 2025 / Published: 10 February 2025

(This article belongs to the Special Issue Advances in the Measurement, Utility and Evaluation of Precipitation Observations)

Abstract

This study investigates a range of precipitation interpolation techniques with the objective of generating high-resolution gridded daily precipitation datasets for the Greek region. The study utilizes a comprehensive station dataset, incorporating geographical variables derived from satellite-based elevation data and integrating precipitation data from the ERA5 reanalysis. Three different modeling approaches are developed. Firstly, we utilize a Generalized Additive Model in conjunction with an Indicator Kriging model using only station data and limited geographical variables. In the second iteration of the model, we blend ERA5 reanalysis data into the interpolation methodology and incorporate more geographical variables. Finally, we develop a novel modeling framework that integrates ERA5 data, a variety of geographical data, and a multi-model interpolation process which utilizes different models to predict precipitation at distinct thresholds. Our results show that using the ERA5 data can increase the accuracy of the interpolated precipitation when the station dataset used is sparse. Additionally, the implementation of multi-model interpolation techniques which use distinct models for different precipitation thresholds can improve the accuracy of precipitation and extreme precipitation modeling, addressing important limitations of previous modeling approaches.

Keywords:

interpolation; daily gridded precipitation totals; extreme precipitation; wet days; Greece; ERA5; general additive models; Indicator Kriging; regression kriging

1. Introduction

Climate datasets at a high spatial resolution are essential for evaluating climate risks and for designing and implementing adaptation strategies and mitigation measures [1]. High-resolution gridded datasets of meteorological variables can be produced by Global Climate Models (GCMs), Regional Climate Models (RCMs), and statistical interpolation of existing climate data. GCMs and RCMs can also simulate future scenarios for the atmosphere of the Earth and/or a given region. The different methodological approaches and the resulting datasets have different applications. GCMs and RCMs are utilized to forecast future climate scenarios. In contrast, gridded datasets generated using statistical interpolation are essential for understanding current climate conditions and for evaluating and refining the output from RCMs and GCMs.

1.1. Global and Regional Climate Models

GCMs use quantitative analysis to simulate the Earth’s climate and to make predictions about the future. These models are usually calculated on a coarse resolution and are not suited for regional or national climate analysis. To analyze climatic variables at a regional level, downscaling methodologies are employed. These methods are broadly classified into two categories: dynamical downscaling and statistical downscaling.

In the dynamical downscaling approach, the output from GCMs is used to develop RCMs with higher resolution, providing a more accurate representation of the region’s climatic patterns. The most widely used regional climate models are the EURO-CORDEX RCMs, which have a horizontal resolution of about 12 km [2]. These models also simulate climate parameters for a variety of different scenarios (RCP 2.6, RCP 4.5, RCP 8.5) and are widely used for assessing climate change and its impacts in different regions [3,4,5]. If the resolution of the EURO-CORDEX RCMs proves insufficient, further dynamical downscaling can be applied, using the outputs of the EURO-CORDEX RCMs to develop more specialized models for specific areas [6,7]. The accuracy of dynamical downscaling and RCM outputs is evaluated by comparison with reference datasets, which commonly include observational data or high-resolution reanalysis products.

Statistical downscaling aims to leverage observational data to enhance the outputs of global or regional climate models. According to the methodology, a statistical model is used to establish some statistical relationship between observations of a climatic parameter and independent or predictor values that are usually available on a higher resolution. Once the statistical relationship has been established, predictions can be made for the climatic parameters that are more accurate and usually have a higher resolution. Additionally, to validate the results, the observation dataset is usually split into a dataset that is used for training and a dataset used for validation. Once the statistical model has established the mathematical relationships using the training dataset, predictions are made for the validation dataset and the predicted values are compared to the observed values to assess the model’s performance. The statistical models and methodologies employed for this type of downscaling vary greatly depending on its intended use and the data available. Initial statistical models frequently employ regression analysis and regression kriging, whereas contemporary approaches encompass more intricate techniques such as Generalized Additive Models, Bias Adjustment techniques, Random Forests, and combinations of the different models using different approaches [8,9,10,11,12].

Both approaches are widely used to create high-resolution climate data for different uses, utilizing a variety of modeling approaches and data. Dynamical downscaling is based on GCM outputs and can be used to make predictions about the future without the need for robust observational data, but is usually computationally intense. Statistical downscaling approaches are less computationally intense and can be more easily applied in a variety of resolutions; however, they require long-term observational records in order to establish mathematical relationships between the climate variables and independent values, which may not be available in every region. Although such observational databases may not be needed for the dynamical downscaling approach, observational records are used for validation in both techniques. In the last three decades, RCMs have been used extensively for providing climate data due to the coarse resolution of the GCMs; however, in recent research, the need for RCMs has been questioned because the new generation of high-resolution GCMs seems to have managed to match the performance of RCMs [13,14].

1.2. Gridded Datasets Based on Observational Values

Gridded datasets based on observational values use complex statistical interpolation techniques with existing climate data from gauges or satellites in order to create high-resolution gridded datasets. These datasets can then be used to validate or downscale climate models providing data that are inherently closer to the observed climate parameters of a country or region. Additionally, databases covering reference periods between 1970 and 2010 are essential for analyzing how climate change has affected a region and for understanding how the spatial distribution of climate variables has changed.

With a spatial resolution of 0.1 degrees, the E-OBS dataset is widely recognized as one of the highest-quality gridded observational datasets for Europe, providing valuable insights into climate patterns [15]. The methodology employed by the E-OBS for creating the temperature dataset involves first interpolating monthly values to a high-resolution grid using station data. Then, a Generalized Additive Model (GAM) is fitted to the daily station values in order to interpolate to the rest of the grid. The formula employed by the E-OBS for the GAM uses longitude, latitude, altitude, and mean monthly background interpolated temperatures. For precipitation, a very similar methodology is used in the E-OBS dataset, with the GAM formula utilizing longitude, latitude, and the square root of the monthly background interpolated totals to predict the square root of daily precipitation. The E-OBS documentation explains that altitude was not used because it did not improve the performance of the model. Additionally, the square root of both the daily precipitation and the monthly background totals was used in order to remove some of the skewness of the precipitation data.

In the past decade, there has been a proliferation of region-specific or country-specific climate databases. This development is primarily driven by the need to increase the accuracy of regional climate data and to provide higher temporal or spatial resolution, typically around 1–2 km [16,17,18,19,20]. In Greece, the only gridded dataset has been developed by Varotsos et al. [21,22] with a 1 km resolution. In their research for the temperature grid dataset, they applied a hybrid approach blending outputs from the non-hydrostatic Weather Research and Forecasting (WRF) meso-scale meteorological model with the available gauge data, while for precipitation they chose to use only station data.

Among all climatic variables studied, precipitation has consistently presented substantial challenges in accurately reproducing geographical variability and the occurrence of extreme events [13,21,22]. The main reason that precipitation has been so difficult to simulate is that the variable does not follow a normal distribution and is not a continuous variable like temperature. Additionally, precipitation extremes are even more difficult to simulate [23,24,25], and current research shows that the frequency, intensity, and spatial distribution of precipitation extremes are changing [26,27]. In Greece, precipitation exhibits intense seasonal variability [28,29,30]. In addition, the high degree of spatial variability of the Greek landscape influences precipitation [31] and creates different microclimates in the region: the mountainous high-elevation regions of Greece exhibit a higher number of wet days and higher annual precipitation totals, while the islands in the Aegean Sea experience lower precipitation totals and droughts in the summer [32,33,34].

The goal of this study is to create a high-resolution daily gridded precipitation database at a 0.01° × 0.01° resolution using novel statistical techniques, which can be used for analyzing precipitation changes as well as for validation or bias-adjustment of RCMs or GCMs. In this research, we focus on precipitation, as there are additional challenges in creating a high-resolution precipitation dataset in Greece: the available station network is quite sparse, and the spatial and seasonal variability of the variable in the region is very high. These challenges hinder the accurate interpolation of precipitation in the region by traditional approaches, necessitating the use of additional independent variables and data sources, including reanalysis data. The methodologies employed in our research will be particularly useful for Greece, since the country has experienced a significant number of extreme precipitation events in recent years [35]. Higher-resolution precipitation data will be useful in analyzing precipitation climatology and improving the performance of climate models over the region, informing policy-makers about mitigation and adaptation measures that may be needed currently or in the future. Finally, the novel multi-model approach developed here improves upon existing statistical interpolation techniques in two ways: it blends in ERA5 reanalysis data, which are produced with dynamical modeling techniques, and it incorporates extreme precipitation models that improve the distribution of precipitation in the studied region.

2. Materials and Methods

2.1. Data

In this research, we decided to use a hybrid method where gauge precipitation data are used as dependent variables, while ERA5 reanalysis data in combination with elevation, longitude, latitude, distance from sea coastal line, and the AURELHY principal components are used as independent variables. Reanalysis data have been used for Greece in previous research [36] to supplement the sparse gauge dataset that is available in the Greek region for the reference period of 1980–2010. The time frame of 1980–2010 was selected for this study due to its widespread use as a reference period in both regional and global climate modeling, serving as a benchmark for comparison against future scenario simulations. Consequently, precise gridded climate datasets spanning this period are of critical importance for conducting a thorough analysis of regional climate change and variability.

2.1.1. Gauge Precipitation Dataset

In order to construct the high-resolution grid, daily quality-controlled precipitation gauge data for the Greek region were provided by the Hellenic National Meteorological Service (HNMS) for the period 1980–2010. A total of 97 stations were used, covering most of the region studied, while out of the total available data points, 23.86% were missing across the period studied. The dataset used has a density of one station per 1360 km² which is very sparse for interpolating precipitation, highlighting the challenges in creating an accurate and representative dataset for the region. The dataset represents the best available data for the Greek region for the period studied. To better examine the performance of the interpolation methods, the study area was divided into 13 different subregions that represent the different climates and precipitation regimes that exist in Greece. The stations were divided into Crete (C), Dodecanese (D), eastern Peloponnese (EP), western Peloponnese (WP), Cyclades (CY), eastern Aegean (EA), Ionian Islands (I), western Greece (WG), central-eastern Greece (CEG), northern Aegean (NAe), west-central Macedonia (WCM), east Macedonia and Thrace (EMT), and Cities (CT). The station density per area studied is also presented in Table S1 in the Supplementary Materials section. The regions were chosen based on published literature for the region [33,37], while for the purposes of this research, we decided to add Cities as a separate category. This category includes large cities that experience the urban heat island effect, a phenomenon that can influence precipitation regimes [38,39]. Additionally, the high population density within these urban areas underscores the critical need for accurate climate information to support evidence-based policy decisions that address the unique challenges and vulnerabilities of urban environments.

2.1.2. Geospatial Data

The elevation data employed in this study were generated by upscaling the 12 m resolution TanDEM-X Elevation Model, which is a product generated from the TerraSAR-X satellite mission [40]. The final Digital Elevation Model (DEM) along with the stations used and the different subregions studied are presented in Figure S1 in the Supplementary Materials. To calculate the distance from the sea coastal line that is used as an independent variable in the models developed, we used coordinates provided by the https://geodata.gov.gr/ (accessed on 1 June 2024) [41] government platform.

2.1.3. Reanalysis Data

For independent variables, we also utilize precipitation data from the ERA5 reanalysis [42,43], which were provided at a 0.1° resolution in the WGS84 geographic coordinate system. The ERA5 precipitation dataset has been shown to overestimate the frequency and duration of precipitation while underestimating its intensity [44,45], and in Greece specifically, its performance also differs between the island and continental regions [36]. Despite these known shortcomings, previous research has shown that the ERA5 reanalysis dataset can be used for downscaling climate variables in a variety of regions [36,45,46,47]. The ERA5 dataset has also demonstrated a capacity to effectively simulate precipitation within regions characterized by sparse gauge networks, in contrast to the E-OBS dataset, which may exhibit inaccurate precipitation estimates due to limited data availability in regions such as Southern Europe [48]. Moreover, the reanalysis dataset employs dynamical modeling techniques, which will supplement the sparse station network of the region. Finally, while the E-OBS provides climate data for Europe, the global availability of ERA5 data enhances the transferability of research findings: techniques and methodologies developed using ERA5 can be more readily applied to other regions worldwide.

2.2. Data Transformations and Preprocessing

To facilitate the analysis process, all source geospatial data were reprojected to the World Geodetic System 1984 geographic coordinate system using ESRI ArcGIS’ Project function. Moreover, the grid presented in this study was generated by upscaling the TanDEM-X Elevation Model to a 0.01° resolution using ESRI ArcGIS’ Resample function with bilinear interpolation. To map the coarser 0.1° resolution ERA5 grid to the 0.01° raster, a nearest-neighbor interpolation approach was used. Regarding the resolution chosen, prior research conducted in the Greek region [21,22,36,49] and other European countries [16,17,18,19,20] has established that a 0.01° spatial resolution is sufficient for the analysis of precipitation totals and extreme events.
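The nearest-neighbor mapping of the 0.1° ERA5 grid onto the 0.01° raster can be illustrated with a minimal Python sketch. This is not the ArcGIS workflow used in the study; the function names, the regular grid layout, and the assumption that `coarse[0][0]` sits at the coordinates `(lat0, lon0)` with coordinates increasing along each axis are hypothetical choices made for illustration.

```python
def nearest_index(value, origin, step, count):
    """Index of the coarse cell centre closest to `value` along one axis."""
    idx = round((value - origin) / step)
    return min(max(idx, 0), count - 1)  # clamp to the grid edges


def nearest_neighbour_regrid(coarse, lat0, lon0, coarse_step, fine_lats, fine_lons):
    """Copy each fine-grid point's value from its nearest coarse cell.

    coarse      -- 2-D list [row][col] of values at coarse cell centres
    lat0, lon0  -- coordinates of coarse[0][0] (hypothetical layout)
    coarse_step -- coarse spacing in degrees (0.1 for ERA5 in the text)
    """
    n_rows, n_cols = len(coarse), len(coarse[0])
    fine = []
    for lat in fine_lats:
        row = nearest_index(lat, lat0, coarse_step, n_rows)
        fine.append(
            [coarse[row][nearest_index(lon, lon0, coarse_step, n_cols)]
             for lon in fine_lons]
        )
    return fine
```

Because every fine cell simply inherits the value of one coarse cell, the regridded ERA5 field keeps its blocky 0.1° structure; any finer detail has to come from the other covariates.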

Additionally, the principal components (PCs) generated by the Analyse Utilisant le RELief pour les besoins de l’HYdrométéorologie (AURELHY) method [50] were also used as independent variables in the models developed. The AURELHY PCs are variables that are obtained through the application of principal component analysis on the elevation differences between a number of different grid points. This approach provides valuable insights into the underlying topographical structure of the region studied. The methodology employed for the calculation of principal components is elaborated upon in prior research conducted within the Greek region [36,51,52], where AURELHY PCs have been effectively used to generate high-resolution gridded datasets of temperature and precipitation. The AURELHY PCs were calculated for the grid dataset used, as well as the ERA5 grid and the meteorological stations. On all occasions, the 0.01° elevation grid was used to calculate the elevation differences between the neighboring grid points, the ERA5 data grid, and the stations’ positions.

Finally, the distance from the sea coastal line, which has been used to interpolate precipitation in Greece in previous research [36,51], was calculated using ESRI ArcGIS’ Near function and the geospatial coordinates of the shoreline, the ERA5 grid and the final grid that was used for the analysis.

2.3. Methodology

2.3.1. Interpolation Models

In this research, we employed a number of different methodologies and interpolation techniques to construct daily gridded precipitation datasets in order to find the most suitable one for the region and our station dataset. Initially, a monthly interpolation was performed to establish monthly background totals, which are subsequently used as independent variables in the daily interpolation process. We used a regression kriging model to interpolate the daily rain gauge data accumulated by month, using longitude, latitude, and elevation as covariates in one instance (MoInt) and longitude, latitude, elevation, distance from sea coastal line, the ERA5 monthly sums, and a number of AURELHY PCs in another instance (MoIntERA5). The AURELHY PCs were chosen based on their usage in previous research in Greece [36,51,52].

For the daily interpolation grid, Generalized Additive Models (GAMs) are used in combination with an Indicator Kriging (IK) model. GAMs are an extension of Generalized Linear Models in which the response is modeled as an additive sum of smooth functions of the predictors. Interpolation approaches that use GAMs have been used to generate the E-OBS dataset [15] and daily precipitation datasets in Greece [21,22]. In our research, we used a variety of different GAMs; the formulas of the different models are described in Figure 1 and in Figures S2 and S3 of the Supplementary Materials. The GAMs and the indicator and regression kriging models were developed using the R programming language [53] and a variety of different libraries [54,55,56,57].

The inherent sparsity of the gauge network in Greece presents a significant challenge in accurately capturing the intricate spatial patterns of precipitation across the diverse topographic features of the country. Recognizing this limitation, the decision to employ more complex models with a greater number of parameters and integrate ERA5 reanalysis data was driven by the need to improve the accuracy of precipitation estimates, particularly in regions with limited gauge coverage.

The IK approach is used to interpolate the daily occurrence of rainfall. The station data are first categorized by setting days where precipitation is less than 0.1 mm/day to zero and days where precipitation is equal to or greater than 0.1 mm/day to one. These data are interpolated onto the grid using regression kriging, and a threshold is applied to the interpolated values to assign a wet day to each grid point. This approach has been used before to interpolate daily precipitation and develop high-quality gridded datasets [58,59], usually with an IK value of 0.5 as the threshold for assigning precipitation. In our research, we found that using different IK thresholds can be useful for the Greek area, especially in the drier months, where we achieved slightly better results with a lower threshold. The thresholds used across the different models are shown in the flowcharts in Figure 1 and Figures S2 and S3.
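The two steps of the wet-day classification, the indicator transform of the station data and the thresholding of the interpolated indicator, can be sketched in Python as follows. The kriging step itself is omitted, the function names are hypothetical, and treating a grid value equal to the IK threshold as wet is an assumption, since the text does not specify how ties are handled.

```python
WET_DAY_MM = 0.1  # wet-day cut-off stated in the text (mm/day)


def to_indicator(precip_mm, cutoff=WET_DAY_MM):
    """Binary indicator per observation: 1 if precipitation >= cutoff, else 0."""
    return [1 if p >= cutoff else 0 for p in precip_mm]


def assign_wet(interpolated_indicator, ik_threshold=0.5):
    """Flag grid points as wet when the interpolated indicator reaches the
    IK threshold (ties counted as wet: an assumption of this sketch)."""
    return [value >= ik_threshold for value in interpolated_indicator]
```

For example, an interpolated indicator of 0.45 is classified as dry under the usual 0.5 threshold but as wet under a threshold of 0.4, which is how a lower threshold increases the number of wet days assigned in drier months.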

During the training stages of the models, we found that none of the interpolation models could accurately simulate extreme precipitation, in particular precipitation over 20 mm/day. To address this problem, we added three more IK models, one for a 10 mm/day threshold, one for 20 mm/day, and one for 40 mm/day (0 or 1 depending on whether daily precipitation equals or exceeds 10, 20 or 40 mm/day). As with the assignment of wet days, these indicator data are interpolated using regression kriging, and a threshold is used to assign an extreme precipitation day. To predict precipitation at the points where the IK has assigned an extreme precipitation day, a regression kriging model is used that is trained only on station data that exceed the 10, 20, or 40 mm/day threshold, respectively. The model uses only longitude and latitude as independent variables for all thresholds, since during the training stages we found that adding more variables does not result in significant improvements in performance.

Given the rarity of extreme precipitation events exceeding 10, 20, and 40 mm/day in the Greek region and the limited number of rain gauges available during the model training period, there may not be enough data to effectively train the model for days identified by IK as having extreme precipitation. To circumvent this constraint, a progressive data augmentation technique was utilized. When the available data for a particular day proved inadequate for regression kriging model training, data from the preceding and subsequent days were appended to the dataset, with the condition that no two data points originated from the same gauge station. The process was repeated until there were enough data to train the model.

In order to blend the different models, an iterative approach is used. For each cell of the grid, the threshold for the most extreme precipitation is checked first and, if the grid point exceeds the threshold, the appropriate model is used. To set the thresholds for each period, an iterative approach was used in which all IK thresholds started from 0.5 and were progressively increased or decreased in steps of 0.1 to find the optimal value. Results from this process are displayed in Annex S2 of the Supplementary Materials section.
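The progressive data augmentation described above can be sketched in Python as follows. The text does not state how many observations count as enough or how far the search extends, so `min_points` and `max_offset` are hypothetical parameters; visiting the preceding day before the subsequent day is likewise an arbitrary choice of this sketch.

```python
def augment_training_data(obs_by_day, day, cutoff_mm, min_points, max_offset=5):
    """Collect station observations >= cutoff_mm for `day`, widening the
    window to neighbouring days until `min_points` stations are available.

    obs_by_day -- dict: day index -> dict of station id -> precipitation (mm)
    Returns a dict: station id -> (source day, precipitation).
    """
    selected = {}

    def take(source_day):
        for station, value in obs_by_day.get(source_day, {}).items():
            # Keep at most one observation per gauge station.
            if value >= cutoff_mm and station not in selected:
                selected[station] = (source_day, value)

    take(day)
    offset = 1
    while len(selected) < min_points and offset <= max_offset:
        take(day - offset)  # preceding day first (arbitrary order)
        take(day + offset)
        offset += 1
    return selected
```

Observations from the target day always take precedence, so a station that exceeded the cut-off on that day is never replaced by a value borrowed from a neighbouring day.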

Building upon the previously outlined data and modeling considerations, three distinct interpolation models were formulated (Figure 1 and Figures S2 and S3) to facilitate a comparative analysis of their performance. Firstly, for all the generated models, a monthly background precipitation total dataset is generated that is used as an independent variable in the different models employed. For the IKGAM, the monthly background totals are generated utilizing only station data, while for the IKGAMV2 and the IKGAMRK, the background totals also utilize ERA5 data. The parameters utilized for the background monthly totals are also presented in the model flowcharts (Figure 1 and Figures S2 and S3). Regarding the parameters used for the GAM and the IK models, IKGAM, presented in Figure S2, does not incorporate any AURELHY parameters or ERA5 data and sets a uniform 0.5 threshold for assigning a wet day. Additionally, the final value for each cell is calculated by multiplying the IK value and the GAM value. In Figure S3, the IKGAMV2 model incorporates the AURELHY and ERA5 data when constructing the GAM daily models. Furthermore, the months are split into three groups, and each group has a separate IK threshold and a separate GAM. Finally, in Figure 1, the IKGAMRK uses three different IK models that use different thresholds for three different groups of months. Each IK is interpolated using a different precipitation threshold, and for each precipitation threshold, a different model is used for interpolation. IKGAMRK’s modeling framework aims at addressing the challenges in accurately interpolating the frequency of days where precipitation exceeds 10 and 20 mm/day by incorporating specific models and modular IK thresholds. The variables used for each model are given inside the frame of the flowchart.
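The blending logic of the multi-model framework, in which the most extreme precipitation class is checked first for each grid cell, can be sketched in Python as follows. The exact decision rules are given in the flowchart of Figure 1; in this sketch the function name, the dictionary layout, the fallback to the GAM prediction on ordinary wet days, and the value of zero on dry days are assumptions made for illustration.

```python
def blend_cell(ik_values, ik_thresholds, extreme_predictions, gam_prediction):
    """Pick the prediction for one grid cell, most extreme class first.

    ik_values           -- dict: precipitation cut-off (mm/day) -> interpolated indicator
    ik_thresholds       -- dict: same keys -> IK threshold for the current month group
    extreme_predictions -- dict: cut-off (e.g. 10, 20, 40) -> regression kriging prediction
    gam_prediction      -- GAM prediction used for ordinary wet days (assumption)
    """
    for cutoff in sorted(extreme_predictions, reverse=True):  # 40, then 20, then 10
        if ik_values[cutoff] >= ik_thresholds[cutoff]:
            return extreme_predictions[cutoff]
    if ik_values[0.1] >= ik_thresholds[0.1]:  # ordinary wet day
        return gam_prediction
    return 0.0  # dry day
```

Checking the classes from the highest cut-off downwards ensures that a cell flagged at both the 10 and 20 mm/day levels receives the prediction of the model trained on the more extreme observations.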

2.3.2. Validation Process

To ensure the robustness and comparability of the model results of this study, a Leave-One-Out Cross-Validation (LOOCV) technique was selected for model evaluation. According to the methodology, each station is in turn left out of the training process of the model, and after training, the model is used to predict the values at that station. Since a monthly interpolation is performed first in all of the interpolation models used in this study, the monthly interpolation also excludes the station being validated during the LOOCV process, in order to obtain the most accurate results. This essentially means that a monthly and a daily model must be trained for each station, which is a very time-consuming process. However, this process gives the most accurate and comparable results between the models, since all 836,194 available gauge data points are validated through the same process. The main shortcoming of the LOOCV process is that the results depend largely on the density of the station network [60]. In our research, the same station dataset is used to construct all the models; therefore, the results are comparable. However, caution should be taken when comparing results obtained from different station datasets.
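The structure of the station-wise LOOCV loop can be sketched in Python as follows. The `fit` and `predict` callables are hypothetical stand-ins for the full monthly-plus-daily interpolation chain, which is not reproduced here; the sketch only shows that the held-out station is excluded from every training step.

```python
def leave_one_out(stations, fit, predict):
    """Leave-one-station-out cross-validation.

    stations -- dict: station id -> (features, observed values)
    fit      -- callable(training dict) -> model; stands in for both the
                monthly and the daily interpolation steps
    predict  -- callable(model, features) -> predicted values
    Returns a dict: station id -> (observed, predicted).
    """
    results = {}
    for held_out, (features, observed) in stations.items():
        training = {k: v for k, v in stations.items() if k != held_out}
        model = fit(training)  # the held-out station never enters training
        results[held_out] = (observed, predict(model, features))
    return results
```

With 97 stations, this loop trains 97 separate model chains, which explains the computational cost noted above.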

2.3.3. Statistical Metrics

The metrics that will be used to compare the different interpolation models and the ERA5 datasets are the R² coefficient and the Root Mean Squared Error (RMSE). These statistics will be calculated for monthly and annual totals of precipitation, wet days (where precipitation exceeds 1 mm/day), and the number of times precipitation exceeds 20 mm/day (P20). It is worth noting that some stations have missing data; therefore, all statistics will be weighted and averaged when presented for each month or annually. To enhance readability, the presentation of statistical results for precipitation events exceeding 10 mm/day and 40 mm/day has been omitted, as these results exhibit strong similarities to those observed for the 20 mm/day threshold.
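The two error metrics and the weighted averaging can be sketched in Python as follows. The exact formulas used in the study are given in the Supplementary Materials; here R² is taken to be the squared Pearson correlation, and the weights are taken to be the number of valid records per station, both of which are assumptions of this sketch.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error between paired values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (one common reading of R2; an assumption)."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def weighted_mean(values, weights):
    """Average per-station scores, e.g. weighted by valid record counts."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Under this weighting, a station with many missing days contributes proportionally less to the regional or annual score than a station with a complete record.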

Additionally, the contingency statistics probability of detection (POD), false alarm ratio (FAR), and Critical Success Index (CSI) will be calculated for wet days and the number of times precipitation exceeded 20 mm/day. The contingency scores are calculated using the number of hits (A), false alarms (B), and misses (C) for each time precipitation hits a threshold in the predicted data compared to the station data. Specifically, POD is the ratio of observed events that are estimated correctly and it ranges from 0 (poor) to 1 (good). FAR is the ratio of false alarms, ranging from 0 (good) to 1 (poor). CSI is a combination of the hit ratio and the false alarm ratio in one score which gives a holistic metric of performance and it ranges from 0 (poor) to 1 (good). These metrics have been used by Nastos et al. [37] to evaluate precipitation and are described in detail in their research. The formulas for all metrics and models used in this study are also presented in the Annex SI—Formulas of the Supplementary Materials sections.
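The three contingency scores can be computed from the counts of hits (A), false alarms (B), and misses (C) using their standard definitions, POD = A/(A+C), FAR = B/(A+B), and CSI = A/(A+B+C). The Python sketch below is illustrative only: the function name is hypothetical, and counting a value equal to the cut-off as an event is an assumption.

```python
def contingency_scores(observed_mm, predicted_mm, cutoff_mm):
    """POD, FAR and CSI for events where precipitation reaches `cutoff_mm`."""
    hits = false_alarms = misses = 0
    for obs, pred in zip(observed_mm, predicted_mm):
        obs_event, pred_event = obs >= cutoff_mm, pred >= cutoff_mm
        if obs_event and pred_event:
            hits += 1          # A: event observed and predicted
        elif pred_event:
            false_alarms += 1  # B: event predicted but not observed
        elif obs_event:
            misses += 1        # C: event observed but not predicted

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined without events

    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + false_alarms + misses),
    }
```

Days on which the event is neither observed nor predicted do not enter any of the three scores, which makes them suitable for rare events such as P20.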

To compare the distributions of the resulting datasets, a two-sided non-parametric Mann–Whitney–Wilcoxon test was performed, and the effect size metrics were compared. The null hypothesis of the test is that the two samples come from the same population; in the two-sided version of the test, the alternative hypothesis is that the distribution of the first group differs from that of the second group. To quantify the magnitude of the observed effect, the Wilcoxon effect size (r) was calculated by dividing the Z statistic by the square root of the sample size. A smaller absolute value of r suggests a greater similarity between the groups, whereas a larger absolute value indicates a more substantial difference. Finally, all presented metrics underwent statistical significance testing using appropriate R functions or their respective specialized metric libraries. All results were found to be significant at the 1% level.
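The effect size calculation, r = Z / √N, can be sketched in Python as follows. The study used R functions for this step; the version below uses the normal approximation of the Mann–Whitney U statistic without tie or continuity corrections and takes N as the combined size of both samples, all of which are assumptions of this sketch.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_effect_size(x, y):
    """Return (U, Z, r) for two independent samples, with r = Z / sqrt(N)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    return u, z, z / math.sqrt(n1 + n2)
```

Two samples with identical values give Z = 0 and therefore r = 0, while completely separated samples give the largest absolute r attainable for their sizes.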

The combination of the aforementioned validation metrics and techniques will collectively enable an in-depth assessment of the models’ performance. Additionally, figures comparing station data and predicted data from the models will be shown in Section 3 in order to visualize the regional differences in performance. More specifically, figures of the annual distribution for the different Greek regions will be presented, along with maps of the stations’ RMSE and the mean annual totals of precipitation, wet days, and P20.

3. Results

3.1. Precipitation Totals

In Table 1, the improvements in performance in the monthly and annual totals are presented. We observe that the standalone ERA5 dataset performs better than the IKGAM dataset, but when ERA5 is blended with our interpolation technique in IKGAMV2, the results are better than those of ERA5 on an annual basis. IKGAMV2’s performance relative to ERA5 varied throughout the individual months, exhibiting superior performance from January through April, September, October, and December while underperforming in the remaining months. In the summer months, all interpolation methods record their worst performance, and they are the only months where ERA5 is able to outperform the interpolation models. Finally, the IKGAMRK performs better than the rest of the interpolation methods, especially on an annual basis and in the winter and autumn months. In summer, the model has a very similar performance to the rest of the interpolation methods, especially when comparing the RMSE metrics of the models. Similar results are also observed for the daily metrics presented in Table S2.

In Figure S4, histograms of the distribution of annual precipitation are presented. From the observed distributions of the models studied and the metrics presented in Table 1 and Table S2, we can conclude that the ERA5 model seems to overestimate precipitation at the stations in our dataset on an annual basis. In Figure S4, the differences are especially pronounced in the west-central Macedonia region and western Greece. It is worth noting that most of the stations in our dataset are located in lower-elevation areas, which influences the results. In Crete, where the dataset contains many stations at higher elevations, the ERA5 mean values are lower than the observed precipitation depicted in Figure S4. Previous evaluations of the ERA5 dataset agree with our results [61]: ERA5 overestimates precipitation overall but underestimates precipitation in very high-elevation areas. The IKGAM dataset underestimates precipitation in most areas, while IKGAMV2, which has ERA5 data blended into the interpolation method, interpolates precipitation more accurately in most areas. Finally, IKGAMRK reproduces the station data most closely out of all the models, as observed through the metrics presented in Table 1 and Table S2.

The maps of mean annual precipitation totals in Figure 2 further confirm the results described above. The ERA5 dataset simulates a very strong longitudinal shift in precipitation, which also exists in the IKGAMV2 and IKGAMRK datasets but to a lesser extent. The IKGAM dataset records notably lower precipitation across the whole Greek region. Finally, IKGAMV2 and IKGAMRK record much higher precipitation at the mountain peaks compared to the ERA5 dataset. This difference is especially pronounced in Crete, where the ERA5 dataset simulates notably less precipitation at the mountain peaks than the IKGAMV2 and IKGAMRK datasets. Additionally, IKGAM interpolates a much smoother precipitation field than IKGAMV2 and IKGAMRK, which interpolate much more refined peaks of precipitation across the Pindos mountain range. This can be attributed to the fact that IKGAM utilizes very few parameters and that the station dataset we use does not have many stations in the Pindos mountain range or in very high-elevation areas in general.

When comparing the spatial distribution of the station RMSE in Figure 3, we can see that IKGAMRK and IKGAMV2 perform notably better than the rest of the models. The differences are more pronounced in Crete, northeastern Greece, the Ionian Islands, and the Evia region. IKGAM shows notable differences in the Ionian Islands and the Evia region compared to the rest of the models. In the Aegean islands, IKGAM also performs worst among the models. The ERA5 model also has a notably higher RMSE than IKGAMV2 and IKGAMRK in Crete, the eastern Aegean, and the Ionian Islands. Finally, it is worth noting that the RMSE maps show the influence that station density has on performance. In Athens, where the station network is denser, all the models record some of their best performance, while in the rest of Greece the RMSE is higher in all the models.
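A per-station RMSE of the kind mapped in Figure 3 could be computed along the following lines. This is only a sketch: the station names and annual totals are hypothetical values chosen for illustration, and the study's own calculation may aggregate the data differently.

```python
import math

def station_rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical annual totals (mm): (observed, predicted) per station
stations = {
    "station_A": ([400.0, 520.0, 610.0], [430.0, 500.0, 580.0]),
    "station_B": ([900.0, 1100.0, 950.0], [700.0, 980.0, 870.0]),
}

# One RMSE value per station, ready to be plotted at the station location
rmse_by_station = {
    name: station_rmse(obs, pred) for name, (obs, pred) in stations.items()
}
```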

3.2. Wet Days

Moving forward, Table 2 presents the performance in the monthly and annual number of wet days. When comparing the number of wet days and the number of days on which precipitation exceeded 20 mm, the RMSE and the R² should be analyzed in conjunction with the contingency metrics (Table 4), since these also give important information about the performance of the models. For wet days, ERA5 may present a high R², but its RMSE is notably higher than that of all the interpolation methods. IKGAM has the lowest RMSE of all the models and, in Table 4, we can see that FAR is highest in ERA5 and lowest in IKGAM and IKGAMRK, while CSI is highest in IKGAM and IKGAMRK and lowest in ERA5. When comparing the metrics of IKGAM and ERA5, the R² values of the two models are much closer than their RMSE values. This difference indicates that while the ERA5 data capture the overall trend of monthly and annual wet days correctly, their prediction error is higher than that of the rest of the models. In other words, the ERA5 model simulates many more wet days, and the interpolation methods are able to correct this overestimation. Moreover, the performance of the models across the different months is much more uniform than their performance in precipitation totals. It is worth noting that while IKGAM performed much worse than IKGAMV2 in precipitation totals, it outperformed IKGAMV2 on wet days. This difference can be attributed to the different thresholds used in the IK model: in IKGAM the threshold is 0.5, while in IKGAMV2 the thresholds are 0.4 and 0.3 (Figures S2 and S3). This means that while a higher threshold can be used to accurately predict the number of wet days, it impairs the performance in total precipitation. This does not affect the IKGAMRK performance because, while it does use a higher threshold for wet days, it also utilizes extreme precipitation models which add more precipitation to the totals.
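The trade-off between wet-day counts and precipitation totals can be illustrated with a small sketch. It assumes that the IK model yields a probability of precipitation for each day and that days below the threshold are set to dry, which is a common way of applying indicator kriging but is not spelled out in this section. The probabilities and amounts are hypothetical.

```python
def apply_ik_threshold(wet_probabilities, amounts, threshold):
    """Keep the interpolated amount only where the IK probability reaches
    the threshold; otherwise treat the day as dry (0 mm).

    Returns (number of wet days, total precipitation).
    """
    masked = [
        amount if probability >= threshold else 0.0
        for probability, amount in zip(wet_probabilities, amounts)
    ]
    wet_days = sum(1 for value in masked if value > 0.0)
    return wet_days, sum(masked)

# Hypothetical IK probabilities and interpolated amounts (mm) for five days
probabilities = [0.2, 0.35, 0.45, 0.6, 0.9]
amounts = [1.0, 2.0, 3.0, 5.0, 12.0]

# Thresholds quoted in the text: 0.5 (IKGAM), 0.4 and 0.3 (IKGAMV2)
results = {t: apply_ik_threshold(probabilities, amounts, t) for t in (0.5, 0.4, 0.3)}
```

In this toy case, lowering the threshold from 0.5 to 0.3 raises the total from 17 mm to 22 mm but doubles the number of wet days from two to four, mirroring the behaviour described above.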

In Figure S5, the distribution of the annual wet day totals is presented for all the regions in Greece. The ERA5 model displays a higher average number of wet days across all regions of Greece. This provides a more nuanced interpretation of the contingency metrics with respect to the ERA5 data, which exhibit a high probability of detection (POD) but also the highest false alarm rate (FAR) among all evaluated models. It is worth noting that IKGAMV2 also records a higher number of wet days on an annual basis in Figure S5. The region with the largest deviations is the northern Aegean, as was also the case for precipitation totals.

Moving on to the maps of the different models in Figure 4, the ERA5 model simulates a large number of wet days compared to the rest of the models. This difference is especially pronounced across the Greek mainland. The IKGAMV2 and IKGAMRK models interpolate many more wet days at the peaks of the Pindos mountain range and the mountains of Crete compared to IKGAM. In Figure 5, the bias of each station is presented for all the models. When cross-examining Figure 4 and Figure 5, we can see that the large number of wet days simulated by ERA5 in the Greek mainland increases the RMSE in the region. At the same time, the performance of the interpolation models is very similar. The best model overall seems to be IKGAM, which records the lowest RMSE, with IKGAMRK having a slightly higher RMSE in the Greek mainland.

3.3. Number of Days Where Precipitation Exceeds 20 mm

Prior research [36,62,63] has established that daily precipitation exceeding 20 mm is a relatively rare occurrence in the Greek region, particularly during the summer and spring months. Consequently, the scale of the metrics presented for these extreme events exhibits a marked contrast with that of the totals for overall precipitation and the number of wet days. In Table 3, metrics of performance for the different models are presented. Firstly, the differences in performance are much smaller compared to the rest of the parameters presented above. This occurs due to the rarity of P20 events in the Greek region. Moreover, there is an obvious drop in performance in the summer and spring months across all models. The model that has the best performance overall is IKGAMRK, which records the best R² and the lowest RMSE out of all the models. Moreover, the POD and CSI metrics (Table 4) are notably higher for the IKGAMRK model compared to the rest of the models. At the same time, the FAR metric is slightly higher for ERA5 and IKGAMRK, but the differences are not as large as those in the POD and CSI metrics. The worst performance is recorded by IKGAM, which has the lowest CSI and POD metrics.

Figure S6 illustrates the distribution of annual P20 totals, revealing that the ERA5, IKGAM, and IKGAMV2 models have lower mean totals of P20 events across the entire region. This underestimation appears to be more pronounced for the IKGAM and ERA5 models, while the IKGAMRK model’s P20 distribution and mean values correspond more closely to the observed station data. Similar to the rest of the parameters studied, the largest deviations are recorded in western Greece and the northern Aegean. These results, along with the metrics calculated, indicate that all the models besides IKGAMRK underestimate the number of P20 events across the Greek region. Furthermore, we can also conclude that the extreme precipitation models used in IKGAMRK help increase the predictive accuracy of the interpolation approaches in the region.

In Figure 6, maps of the mean annual P20 totals are presented. IKGAM interpolates notably fewer P20 events across the whole region. ERA5 simulates most P20 events across northwestern Greece and fewer in northern Greece. Similar patterns are also recorded in the IKGAMV2 model. As stated before, the station dataset used is quite sparse and does not have much data in the higher-elevation areas where most of the P20 events occur; it is therefore very difficult for IKGAM to interpolate the P20 events accurately. The differences between IKGAM and IKGAMV2 show the influence that the ERA5 data have when they are blended into the interpolation process. Finally, IKGAMRK is able to interpolate the P20 events more accurately across the whole region. In the Pindos mountain range, the model interpolates more P20 events at the mountain peaks. In northeastern Greece, IKGAMRK also records more P20 events than the rest of the models. In Crete, IKGAMV2 and IKGAMRK record more events at the mountain peaks compared to IKGAM and ERA5.

In Figure 7, we observe that the IKGAM has higher RMSE across northeastern Greece and the northwestern mountainous regions. In those regions, the IKGAMRK outperforms the rest of the models by a wide margin. These mountainous regions record some of the highest numbers of P20 in Greece; therefore, the models’ performance is crucial in those regions. In Crete, the ERA5 data have a higher bias than the rest of the models, most notably in the eastern mountainous regions. Finally, while the IKGAMV2 has a lower bias than IKGAM and ERA5, the best model overall is IKGAMRK because it has the lowest bias in the whole Greek region and records the best metrics by a wide margin.

4. Discussion

According to the metrics presented in Section 3 and the distributions shown in Figure 8, the ERA5 model demonstrates biases in its precipitation estimates across the Greek region, characterized by an overestimation of total precipitation and the number of wet days, coupled with an underestimation of P20 events. This pattern is further corroborated by the higher RMSE values observed for wet days and P20 (Table 2 and Table 3) in the ERA5 model relative to the interpolation methods. Furthermore, the ERA5 model exhibits a notably lower CSI score (Table 4), indicating inaccuracies in simulating wet days and P20 events.

The interpolation method IKGAM displays increased accuracy in simulating the number of wet days while underestimating P20 events, as seen in the POD, FAR, and CSI metrics in Table 4. This may be because IKGAM utilizes a higher IK threshold for wet days, which may be accurate for the number of wet days but hinders the performance in precipitation totals and the number of P20 events in the region, as seen in Figure 2 and Figure 6, where there are differences between IKGAM and the rest of the datasets. In Table 1, Table 2 and Table 3, we can also observe that IKGAM has a very high RMSE in the precipitation totals and in the number of P20 events but has the lowest bias in the number of wet days. When analyzing the maps of the interpolated variables (Figure 2, Figure 4, and Figure 6) in conjunction with the mapped RMSE values (Figure 3, Figure 5, and Figure 7), we can conclude that an important limitation of all the interpolation methods is the data quality, spatial distribution and density of the station dataset that is used. In areas with denser precipitation networks, such as the Cities category in our station dataset, the interpolation methods perform at their best.

Table 4. Contingency metrics of the ERA5 reanalysis and the models studied.

Models	Wet Days POD	Wet Days FAR	Wet Days CSI	P20 POD	P20 FAR	P20 CSI
ERA5	0.82	0.45	0.49	0.35	0.54	0.25
IKGAM	0.72	0.28	0.56	0.25	0.48	0.20
IKGAMV2	0.74	0.34	0.53	0.36	0.49	0.27
IKGAMRK	0.70	0.29	0.55	0.45	0.55	0.29
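
The contingency metrics in Table 4 can be sketched in Python as follows. The definitions used here are the standard forecast-verification ones (POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms), CSI = hits / (hits + misses + false alarms)); that the study uses exactly these definitions is an assumption, and the daily values and function names are hypothetical. The 20 mm threshold corresponds to P20.

```python
def contingency_metrics(observed, predicted, threshold):
    """Return (POD, FAR, CSI) for events defined as value > threshold."""
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs > threshold
        pred_event = pred > threshold
        if obs_event and pred_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif pred_event:
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi

def csi_from_pod_far(pod, far):
    """CSI implied by POD and FAR under the definitions above."""
    return 1.0 / (1.0 / pod + 1.0 / (1.0 - far) - 1.0)

# Hypothetical daily precipitation (mm) at one station
observed = [25.0, 0.0, 30.0, 5.0, 22.0, 0.0]
predicted = [21.0, 0.0, 10.0, 24.0, 28.0, 1.0]
pod, far, csi = contingency_metrics(observed, predicted, threshold=20.0)

# Consistency check against the ERA5 wet-day row of Table 4
era5_wet_csi = csi_from_pod_far(0.82, 0.45)
```

Under these definitions, the ERA5 wet-day POD of 0.82 and FAR of 0.45 imply a CSI of about 0.49, which agrees with the tabulated value.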

The IKGAMV2 model had ERA5 blended into the interpolation process in order to add information to the sparse station dataset and utilized more parameters in all the models used. Due to the sparse density of the gauge network that exists in Greece, the ERA5 data that were used for the IKGAMV2 methodology improved the results of the model (Table 1, Table 3, Table 4, and Table S2) compared to IKGAM. Moreover, the spatial variability improved, as we can see in the maps of the parameters studied. However, it is worth noting that when reanalysis data are added to the process, the resulting dataset will inevitably be influenced by the reanalysis dataset used. Additionally, in order to increase the precipitation totals, the threshold of the IK was decreased and, as a result, the performance on wet days was hindered. Moreover, in Figure 6 we can see that IKGAMV2 still underestimates the number of P20 events. The only model that was able to interpolate extreme precipitation accurately was IKGAMRK. This is because the model utilizes specialized models for extreme precipitation. At the same time, the model can interpolate wet days very accurately because it also has a high IK threshold for wet days.

In Figure 8, the cumulative distribution of the aggregate station data is compared to the distribution of the interpolated predicted data using the LOOCV method. This graph was derived by aggregating all station and predicted data, then for each threshold value displayed on the X-axis, the Y-axis depicts the proportion of data with rainfall values falling below that threshold. In the Supplementary Section, a different version of the cumulative graph is provided for context, where instead of the percent of data points, the percent of total precipitation is calculated.
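The two cumulative curves described above can be sketched as follows. This is an illustrative Python example with hypothetical daily values and thresholds; it treats "below" as strictly less than the threshold, which is an assumption about how the boundary values were handled.

```python
def fraction_of_days_below(values, thresholds):
    """For each threshold, the proportion of data points below it."""
    n = len(values)
    return [sum(1 for v in values if v < t) / n for t in thresholds]

def fraction_of_total_below(values, thresholds):
    """For each threshold, the share of total precipitation contributed
    by data points below it (the variant shown in the supplement)."""
    total = sum(values)
    return [sum(v for v in values if v < t) / total for t in thresholds]

# Hypothetical pooled daily precipitation (mm) and thresholds (mm)
daily_precipitation = [0.0, 0.5, 2.0, 8.0, 30.0]
thresholds = [1.0, 10.0, 50.0]

day_share = fraction_of_days_below(daily_precipitation, thresholds)
amount_share = fraction_of_total_below(daily_precipitation, thresholds)
```

In this toy sample, 80% of the days fall below 10 mm but contribute only about 26% of the total precipitation, which is why the two versions of the graph can look quite different.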

The cumulative distribution graph (Figure 8) underscores the critical role of wet days in shaping the overall precipitation distribution, a finding that is consistent with the results presented in Table 1, Table 2, Table 3 and Table 4. The ERA5 model, in particular, demonstrates a notable discrepancy from the observed data, characterized by an underestimation of days with sub-1 mm precipitation and an overestimation of days with 1–10 mm rainfall. This observation is further substantiated by Figure S7, which reveals a disproportionately high contribution of the 1–10 mm precipitation range to the total precipitation in the ERA5 and IKGAM datasets. To further support the differences observed in the cumulative plots, the results of the Wilcoxon–Mann–Whitney test are presented in Table S3. The results show that the ERA5 distribution has the largest effect size when compared to the interpolation methods, with the differences being more pronounced on a daily basis. All results of the test are statistically significant at the 1% level, with the IKGAMRK model recording a slightly higher p-value than the rest of the models on the distribution of monthly precipitation totals. The differences between the daily and monthly results can be attributed to the overestimation of wet days in the region. While the IKGAMV2 model represents an improvement over both IKGAM and ERA5, the IKGAMRK model exhibits the most accurate representation of the observed precipitation distribution across all precipitation thresholds, with the exception of extreme events exceeding 50 mm. We did not incorporate even more extreme precipitation models coupled with more IK models in the IKGAMRK approach because, due to the rarity of such events, there were not enough data to train the models. To further optimize the methodology’s performance, future research should explore the utilization of different combinations of data, models, and thresholds for extreme precipitation.

In conclusion, the best-performing model overall was IKGAMRK because it can interpolate the distribution of precipitation more accurately across the whole region. The rest of the models seem to excel at predicting certain aspects of precipitation but lack the ability to accurately predict the distribution of precipitation. The differences between the approaches are also observed when comparing their results on a daily basis (Table S2) and their distributions (Figure S7, Figure 8, and Table S2). This is especially pronounced in the number of P20 events (Table 4) where none of the other models can accurately predict the parameter, highlighting the outperformance of the novel modeling framework in accurately interpolating extreme precipitation events. Moreover, the main strength of IKGAMRK lies in its utilization of multiple IK models for different precipitation thresholds. The adjustable thresholds within the multi-model approach enhance the accuracy of precipitation prediction across the variable distribution.

5. Conclusions

The goal of this study is to create and compare daily gridded precipitation databases for the Greek region, in order to facilitate a comparative analysis of the different methodologies and their respective strengths and limitations. Three different interpolation models were created. IKGAM utilizes only the station network data and limited geographical variables (Figure S2) to create the precipitation dataset using a combination of IK and GAM. The IKGAMV2 model has ERA5 precipitation data blended into the interpolation process and uses a variety of different geographical variables (Figure S3). Finally, IKGAMRK incorporates ERA5 precipitation data and different IK models combined with GAM and specific models for extreme precipitation (Figure 1). From the metrics and figures that were presented, the IKGAMV2 and IKGAMRK interpolation approaches outperform ERA5 and IKGAM when estimating annual precipitation totals (Table 1). This improvement is also evident in their respective daily metrics on an annual basis (Table S2). Regarding wet days, the ERA5 dataset simulates a larger annual total of wet days (Figure 4), which seems to hinder its performance compared to the rest of the models (Table 2). Out of the three modeling approaches, IKGAM is the most accurate in predicting the number of wet days (Table 2 and Table 4), while its performance lags behind the rest of the methods in the precipitation totals (Table 1) and P20 totals (Table 3 and Table 4). The IKGAMV2 approach is more accurate in precipitation totals; however, it is still less accurate than IKGAMRK in predicting P20 and the number of wet days. The best-performing model is IKGAMRK, since it outperforms the rest of the models when comparing their annual precipitation totals (Table 1), the daily values on an annual basis (Table S2), and the contingency metrics regarding wet days and P20 (Table 4). The outperformance of IKGAMRK is most apparent in precipitation events over 20 mm, where there is a notable difference from the rest of the models in the metrics calculated (Table 3 and Table 4). This outperformance is also evident in the precipitation distribution graphs (Figure 8 and Figure S7) and through the metrics of the Wilcoxon–Mann–Whitney test (Table S3).

Finally, although the database and the modeling process are centered around Greece, important conclusions can be drawn for interpolating precipitation in general. More specifically, we can conclude that when using a sparse precipitation dataset, blending reanalysis data into the interpolation process can increase the accuracy of the results. However, caution should be taken, because the reanalysis data will influence the results, and due to the sparsity of the gauge networks, the influence of the reanalysis can be hard to validate. Additionally, the final dataset is also dependent on the availability of the reanalysis dataset. Furthermore, utilizing multiple IK models across different precipitation thresholds can aid in interpolating extreme precipitation more accurately. This approach seems to be more appropriate when the station data are quite sparse, as the models had more similar performances in the Cities category, where the station network was denser. An important limitation of all interpolation approaches is the data quality, geographical distribution, and density of the station network, since these influence the results achieved as well as the ability to validate them. Our findings demonstrate the efficacy of multi-model approaches compared to traditional interpolation techniques, suggesting that such multi-model strategies could improve regional precipitation database accuracy in other regions exhibiting similar data limitations and spatial variability to those observed in Greece. Improved precipitation predictions can lead to better preparedness for extreme weather events, informed water resource management, and a deeper understanding of climate dynamics, ultimately aiding in effective disaster response strategies. In future research, it would be beneficial to investigate the incorporation of alternative modeling approaches to improve the prediction of extreme precipitation events. Moreover, additional data sources, such as satellite data, could be considered in order to increase the accuracy of prediction, especially for extreme precipitation events. Finally, for regions characterized by significant heterogeneity in precipitation patterns, the application of this methodology on a sub-regional basis by incorporating different model threshold values into the IK models could potentially enhance the overall accuracy of the predictions.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/hydrology12020031/s1, Figure S1: Elevation of the region studied along with the location of the stations used and the subregions used in the study; Figure S2: Flowchart of the interpolation method IKGAM. In the chart, Lon stands for longitude, Lat for latitude, Alt for altitude, MoInt for Monthly Interpolated Totals; Figure S3: Flowchart of the interpolation method IKGAMV2. In the chart, Lon stands for longitude, Lat for latitude, DistShore for distance from sea coastal line, Alt for Altitude, M.ERA5 for monthly ERA5, D.ERA5 for daily ERA5, MoIntERA5 for Monthly Interpolated Totals with ERA5 data and PC1-PC15 are the AURELHY principal components; Figure S4: Comparison of distributions between predicted and station annual precipitation totals in the Greek region; Figure S5: Comparison of distributions between predicted and station annual wet day totals in the Greek region; Figure S6: Comparison of distributions between predicted and station annual P20 totals in the Greek region; Figure S7: Comparison of cumulative precipitation distribution between the gauge dataset, the ERA5 reanalysis, and the interpolation methods for the total precipitation below certain thresholds; Figure S8: Scatterplots of observed and modeled precipitation values for the ERA5 reanalysis and the interpolation methods; Table S1: Station density in the areas studied; Table S2: Daily Metrics of the ERA5 reanalysis and the interpolation methods; Table S3: Results of Wilcoxon–Mann–Whitney test for daily and monthly distributions of the ERA5 reanalysis and the interpolation methods; Table S4: Results from the wet days threshold selection process of IKGAMRK.

Author Contributions

Conceptualization, G.N., P.N. and J.K.; methodology, G.N., P.N. and J.K.; data analysis, G.N. and J.K.; validation, G.N., J.K. and K.D.; investigation, G.N., J.K. and K.D.; data curation, G.N., J.K. and K.D.; writing—original draft preparation, G.N. and J.K.; writing—review and editing, K.D. and P.N.; visualization, G.N. and K.D.; supervision, P.N. and J.K.; project administration, P.N. and J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The gauge data used in this study were provided by the Hellenic National Meteorological Service and can only be acquired by contacting HNMS, the reanalysis data by the ECMWF, the Digital Elevation Model by TerraSAR-X satellite mission and the sea coastal line, lake and river datasets by the https://geodata.gov.gr/ (accessed on 1 June 2024) platform. The datasets generated during the current study are available from the corresponding author upon reasonable request.

Acknowledgments

We would like to thank the Hellenic National Meteorological Service for providing us with precipitation gauge data to use in our study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Demuzere, M.; Kittner, J.; Martilli, A.; Mills, G.; Moede, C.; Stewart, I.D.; Van Vliet, J.; Bechtel, B. A global map of Local Climate Zones to support earth system modelling and urban scale environmental science. Earth Syst. Sci. Data 2022, 14, 3835–3873.
Jacob, D.; Teichmann, C.; Sobolowski, S.; Katragkou, E.; Anders, I.; Belda, M.; Benestad, R.; Boberg, F.; Buonomo, E.; Cardoso, R.M.; et al. Regional climate downscaling over Europe: Perspectives from the EURO-CORDEX community. Reg. Environ. Change 2020, 20, 51.
Georgoulias, A.K.; Akritidis, D.; Kalisoras, A.; Kapsomenakis, J.; Melas, D.; Zerefos, C.S.; Zanis, P. Climate change projections for Greece in the 21st century from high-resolution EURO-CORDEX RCM simulations. Atmos. Res. 2022, 271, 106049.
Carvalho, D.; Pereira, S.C.; Silva, R.; Rocha, A. Aridity and desertification in the Mediterranean under EURO-CORDEX future climate change scenarios. Clim. Change 2022, 174, 28.
Hosseinzadehtalaei, P.; Tabari, H.; Willems, P. Precipitation intensity–duration–frequency curves for central Belgium with an ensemble of EURO-CORDEX simulations, and associated uncertainties. Atmos. Res. 2018, 200, 1–12.
Lauwaet, D.; Hooyberghs, H.; Maiheu, B.; Lefebvre, W.; Driesen, G.; Van Looy, S.; De Ridder, K. Detailed urban heat island projections for cities worldwide: Dynamical downscaling CMIP5 global climate models. Climate 2015, 3, 391–415.
Lauwaet, D.; De Ridder, K.; Saeed, S.; Brisson, E.; Chatterjee, F.; van Lipzig, N.P.; Maiheu, B.; Hooyberghs, H. Assessing the current and future urban heat island of Brussels. Urban Clim. 2016, 15, 1–15.
Smid, M.; Costa, A.C. Climate projections and downscaling techniques: A discussion for impact studies in urban systems. Int. J. Urban Sci. 2018, 22, 277–307.
Stoner, A.M.; Hayhoe, K.; Yang, X.; Wuebbles, D.J. An asynchronous regional regression model for statistical downscaling of daily climate variables. Int. J. Climatol. 2013, 33, 2473–2494.
Hamadalnel, M.; Zhu, Z.; Gaber, A.; Iyakaremye, V.; Ayugi, B. Possible changes in Sudan’s future precipitation under the high and medium emission scenarios based on bias adjusted GCMs. Atmos. Res. 2022, 269, 106036.
Ngai, S.T.; Juneng, L.; Tangang, F.; Chung, J.X.; Salimun, E.; Tan, M.L.; Amalia, S. Future projections of Malaysia daily precipitation characteristics using bias correction technique. Atmos. Res. 2020, 240, 104926.
Manzanas, R.; Fiwa, L.; Vanya, C.; Kanamaru, H.; Gutiérrez, J.M. Statistical downscaling or bias adjustment? A case study involving implausible climate change projections of precipitation in Malawi. Clim. Change 2020, 162, 1437–1453.
Tapiador, F.J.; Moreno, R.; Navarro, A.; Sánchez, J.L.; García-Ortega, E. Climate classifications from regional and global climate models: Performances for present climate estimates and expected changes in the future at high spatial resolution. Atmos. Res. 2019, 228, 107–121.
Tapiador, F.J.; Navarro, A.; Moreno, R.; Sánchez, J.L.; García-Ortega, E. Regional climate models: 30 years of dynamical downscaling. Atmos. Res. 2020, 235, 104785.
Cornes, R.C.; van der Schrier, G.; van den Besselaar, E.J.; Jones, P.D. An ensemble version of the E-OBS temperature and precipitation data sets. J. Geophys. Res.-Atmos. 2018, 123, 9391–9409.
Lussana, C.; Saloranta, T.; Skaugen, T.; Magnusson, J.; Tveito, O.E.; Andersen, J. seNorge2 daily precipitation, an observational gridded dataset over Norway from 1957 to the present day. Earth Syst. Sci. Data 2018, 10, 235–249.
MeteoSwiss. Documentation of MeteoSwiss Grid-Data Products: Daily Mean, Minimum and Maximum Temperature: TabsD, TminD, TmaxD. Available online: https://www.meteoswiss.admin.ch/dam/jcr:818a4d17-cb0c-4e8b-92c6-1a1bdf5348b7/ProdDoc_TabsD.pdf (accessed on 3 February 2025).
MeteoSwiss. Documentation of MeteoSwiss Grid-Data Products: Daily Precipitation (Final Analysis): RhiresD. Available online: https://www.meteoswiss.admin.ch/dam/jcr:4f51f0f1-0fe3-48b5-9de0-15666327e63c/ProdDoc_RhiresD.pdf (accessed on 3 February 2025).
Sekulić, A.; Kilibarda, M.; Protić, D.; Bajat, B. A high-resolution daily gridded meteorological dataset for Serbia made by Random Forest Spatial Interpolation. Sci. Data 2021, 8, 123.
Camera, C.; Bruggeman, A.; Hadjinicolaou, P.; Pashiardis, S.; Lange, M.A. Evaluation of interpolation techniques for the creation of gridded daily precipitation (1 × 1 km²); Cyprus, 1980–2010. J. Geophys. Res.-Atmos. 2014, 119, 693–712.
Varotsos, K.V.; Dandou, A.; Papangelis, G.; Roukounakis, N.; Kitsara, G.; Tombrou, M.; Giannakopoulos, C. Using a new local high resolution daily gridded dataset for Attica to statistically downscale climate projections. Clim. Dyn. 2023, 60, 2931–2956.
Varotsos, K.V.; Karali, A.; Kitsara, G.; Lemesios, G.; Patlakas, P.; Hatzaki, M.; Tenentes, V.; Katavoutas, G.; Sarantopoulos, A.; Koutroulis, A.G.; et al. High resolution observational daily gridded dataset for Greece: The CLIMADAT-hub project. In Proceedings of the EMS2024, Barcelona, Spain, 2–6 September 2024. Abstract Number EMS2024-643.
Akinsanola, A.A.; Ongoma, V.; Kooperman, G.J. Evaluation of CMIP6 models in simulating the statistics of extreme precipitation over Eastern Africa. Atmos. Res. 2021, 254, 105509.
Merino, A.; García-Ortega, E.; Navarro, A.; Sánchez, J.L.; Tapiador, F.J. WRF hourly evaluation for extreme precipitation events. Atmos. Res. 2022, 274, 106215.
Kazamias, A.P.; Sapountzis, M.; Lagouvardos, K. Evaluation of GPM-IMERG rainfall estimates at multiple temporal and spatial scales over Greece. Atmos. Res. 2022, 269, 106014.
Li, Z.; Shi, Y.; Argiriou, A.A.; Ioannidis, P.; Mamara, A.; Yan, Z. A comparative analysis of changes in temperature and precipitation extremes since 1960 between China and Greece. Atmosphere 2022, 13, 1824.
Nastos, P.T.; Kapsomenakis, J.; Douvis, K.C. Analysis of precipitation extremes based on satellite and high-resolution gridded data set over Mediterranean basin. Atmos. Res. 2013, 131, 46–59.
Philandras, C.M.; Nastos, P.T.; Kapsomenakis, J.; Douvis, K.C.; Tselioudis, G.; Zerefos, C.S. Long term precipitation trends and variability within the Mediterranean region. Nat. Hazards Earth Syst. Sci. 2011, 11, 3235–3250.
Seager, R.; Liu, H.; Kushnir, Y.; Osborn, T.J.; Simpson, I.R.; Kelley, C.R.; Nakamura, J. Mechanisms of winter precipitation variability in the European–Mediterranean region associated with the North Atlantic Oscillation. J. Clim. 2020, 33, 7179–7196.
Deitch, M.J.; Sapundjieff, M.J.; Feirer, S.T. Characterizing precipitation variability and trends in the world’s Mediterranean-climate areas. Water 2017, 9, 259.
Lee, M.H.; Im, E.S.; Bae, D.H. Impact of the spatial variability of daily precipitation on hydrological projections: A comparison of GCM-and RCM-driven cases in the Han River basin, Korea. Hydrol. Process. 2019, 33, 2240–2257.
Nastos, P.T.; Politi, N.; Kapsomenakis, J. Spatial and temporal variability of the Aridity Index in Greece. Atmos. Res. 2013, 119, 140–152.
Markonis, Y.; Batelis, S.C.; Dimakos, Y.; Moschou, E.; Koutsoyiannis, D. Temporal and spatial variability of rainfall over Greece. Theor. Appl. Climatol. 2017, 130, 217–232.
Politi, N.; Vlachogiannis, D.; Sfetsos, A.; Nastos, P.T.; Dalezios, N.R. High resolution future projections of drought characteristics in Greece based on SPI and SPEI indices. Atmosphere 2022, 13, 1468.
Tzanis, C.G.; Pak, A.N.; Koutsogiannis, I.; Philippopoulos, K. Climatology of Extreme Precipitation from Observational Records in Greece. Environ. Sci. Proc. 2022, 19, 51.
Ntagkounakis, G.; Nastos, P.T.; Kapsomenakis, Y. Creating High-Resolution Precipitation and Extreme Precipitation Indices Datasets by Downscaling and Improving on the ERA5 Reanalysis Data over Greece. Eng 2024, 5, 1885–1904.
Nastos, P.T.; Kapsomenakis, J.; Philandras, K.M. Evaluation of the TRMM 3B43 gridded precipitation estimates over Greece. Atmos. Res. 2016, 169, 497–514.
Li, L.; Zha, Y.; Wang, R. Relationship of surface urban heat island with air temperature and precipitation in global large cities. Ecol. Indic. 2020, 117, 106683.
Steensen, B.M.; Marelle, L.; Hodnebrog, Ø.; Myhre, G. Future urban heat island influence on precipitation. Clim. Dyn. 2022, 58, 3393–3403. [Google Scholar] [CrossRef]
European Data. Available online: https://data.europa.eu/data/datasets/5eecdf4c-de57-4624-99e9-60086b032aea?locale=en (accessed on 1 June 2024).
GEODATA.gov.gr. Available online: https://geodata.gov.gr/ (accessed on 1 June 2024).
Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth Syst. Sci. Data 2021, 13, 4349–4383.
ECMWF. Available online: https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5 (accessed on 1 June 2024).
Wu, G.; Qin, S.; Mao, Y.; Ma, Z.; Shi, C. Validation of precipitation events in ERA5 to gauge observations during warm seasons over eastern China. J. Hydrometeorol. 2022, 23, 807–822.
Jiang, Y.; Yang, K.; Shao, C.; Zhou, X.; Zhao, L.; Chen, Y.; Wu, H. A downscaling approach for constructing high-resolution precipitation dataset over the Tibetan Plateau from ERA5 reanalysis. Atmos. Res. 2021, 256, 105574.
Nacar, S.; Kankal, M.; Okkan, U. Evaluation of the suitability of NCEP/NCAR, ERA-Interim and, ERA5 reanalysis data sets for statistical downscaling in the Eastern Black Sea Basin, Turkey. Meteorol. Atmos. Phys. 2022, 134, 39.
Zhang, Y.; Li, J.; Liu, D. Spatial Downscaling of ERA5 Reanalysis Air Temperature Data Based on Stacking Ensemble Learning. Sustainability 2024, 16, 1934.
Rivoire, P.; Martius, O.; Naveau, P. A comparison of moderate and extreme ERA-5 daily precipitation with two observational data sets. Earth Space Sci. 2021, 8, e2020EA001633.
Ntagkounakis, G.E.; Nastos, P.T.; Kapsomenakis, Y. Statistical Downscaling of ERA5 Reanalysis Precipitation over the Complex Terrain of Greece. Environ. Sci. Proc. 2023, 26, 81.
Bénichou, P.; Le Breton, O. AURELHY: Une méthode d’analyse utilisant le relief pour les besoins de l’hydrométéorologie. Journ. Hydrol. l’ORSTOM Montp. 1987, 2, 299–304.
Gofa, F.; Mamara, A.; Anadranistakis, M.; Flocas, H. Developing gridded climate data sets of precipitation for Greece based on homogenized time series. Climate 2019, 7, 68.
Mamara, A.; Anadranistakis, M.; Argiriou, A.A.; Szentimrey, T.; Kovacs, T.; Bezes, A.; Bihari, Z. High resolution air temperature climatology for Greece for the period 1971–2000. Meteorol. Appl. 2017, 24, 191–205.
The R Project for Statistical Computing. Available online: https://www.r-project.org/ (accessed on 1 June 2024).
mgcv: Mixed GAM Computation Vehicle with Automatic Smoothness Estimation. Available online: https://cran.r-project.org/web/packages/mgcv/index.html (accessed on 1 June 2024).
Hiemstra, P.; Hiemstra, M.P. Package ‘automap’. Compare 2013, 105, 10.
Pebesma, E.; Bivand, R.S. Classes and methods for spatial data: The sp package. R News 2005, 5, 9–13.
Pebesma, E.J. Multivariable geostatistics in S: The gstat package. Comput. Geosci. 2004, 30, 683–691.
Herrera, S.; Gutiérrez, J.M.; Ancell, R.; Pons, M.R.; Frías, M.D.; Fernández, J. Development and Analysis of a 50-Year High-Resolution Daily Gridded Precipitation Dataset Over SPAIN (Spain02). Int. J. Climatol. 2012, 32, 74–85.
Herrera, S.; Cardoso, R.M.; Soares, P.M.; Espírito-Santo, F.; Viterbo, P.; Gutiérrez, J.M. Iberia01: A new gridded dataset of daily precipitation and temperatures over Iberia. Earth Syst. Sci. Data 2019, 11, 1947–1956.
Parajka, J.; Merz, R.; Skøien, J.O.; Viglione, A. The role of station density for predicting daily runoff by top-kriging interpolation in Austria. J. Hydrol. Hydromech. 2015, 63, 228.
Crossett, C.C.; Betts, A.K.; Dupigny-Giroux, L.A.L.; Bomblies, A. Evaluation of daily precipitation from the ERA5 global reanalysis against GHCN observations in the northeastern United States. Climate 2020, 8, 148.
Nastos, P.T.; Zerefos, C.S. On extreme daily precipitation totals at Athens, Greece. Adv. Geosci. 2007, 10, 59–66.
Nastos, P.T.; Zerefos, C.S. Decadal changes in extreme daily precipitation in Greece. Adv. Geosci. 2008, 16, 55–62.

Figure 1. Flowchart of the IKGAMRK interpolation method. In the chart, Lon stands for longitude, Lat for latitude, DistShore for distance from the coastline, Alt for altitude, M.ERA5 for monthly ERA5, D.ERA5 for daily ERA5, and MoIntERA5 for monthly interpolated totals with ERA5 data; PC1-PC15 are the AURELHY principal components.

Figure 2. Maps of mean annual precipitation totals of the different interpolation methods and the ERA5 data.

Figure 3. Annual precipitation RMSE maps of each station for the interpolation methods and the ERA5 data.

Figure 4. Maps of mean annual wet days totals of the different interpolation methods and the ERA5 data.

Figure 5. Annual wet days RMSE maps of each station for the interpolation methods and the ERA5 data.

Figure 6. Maps of mean annual P20 totals of the different interpolation methods and the ERA5 data.

Figure 7. Annual P20 RMSE maps of each station for the interpolation methods and the ERA5 data.

Figure 8. Comparison of the cumulative precipitation distributions of the gauge dataset, the ERA5 reanalysis, and the interpolation methods, expressed as the number of values below each precipitation threshold.

Table 1. Metrics of annual and monthly precipitation totals.

Month	ERA5 R²	ERA5 RMSE	IKGAM R²	IKGAM RMSE	IKGAMV2 R²	IKGAMV2 RMSE	IKGAMRK R²	IKGAMRK RMSE
January	0.51	55.81	0.54	56.87	0.58	52.09	0.58	52.12
February	0.48	47.03	0.43	51.28	0.50	45.36	0.49	46.42
March	0.47	40.01	0.44	42.09	0.50	37.79	0.51	37.95
April	0.51	28.00	0.48	28.88	0.51	26.40	0.53	26.40
May	0.52	21.58	0.45	24.06	0.51	20.69	0.49	22.34
June	0.42	19.06	0.36	20.60	0.25	23.06	0.28	23.99
July	0.45	14.45	0.27	17.31	0.27	18.10	0.35	17.30
August	0.40	16.13	0.33	17.78	0.25	19.06	0.28	19.51
September	0.46	26.32	0.49	27.38	0.51	25.34	0.50	26.02
October	0.52	43.24	0.49	45.41	0.53	41.95	0.52	43.85
November	0.44	59.99	0.39	64.02	0.43	60.58	0.43	60.97
December	0.41	65.96	0.38	69.31	0.45	62.29	0.46	62.13
Annual	0.45	229.71	0.42	267.81	0.48	212.87	0.49	212.49

Table 2. Metrics of annual and monthly wet days totals.

Month	ERA5 R²	ERA5 RMSE	IKGAM R²	IKGAM RMSE	IKGAMV2 R²	IKGAMV2 RMSE	IKGAMRK R²	IKGAMRK RMSE
January	0.65	4.54	0.69	2.71	0.58	3.49	0.61	3.12
February	0.58	4.67	0.58	2.80	0.51	3.30	0.54	3.00
March	0.57	4.57	0.57	2.44	0.51	3.28	0.54	2.63
April	0.58	3.94	0.57	2.14	0.52	2.82	0.55	2.33
May	0.60	3.40	0.57	1.92	0.54	2.25	0.57	2.04
June	0.59	2.34	0.55	1.44	0.52	1.63	0.55	1.48
July	0.54	2.11	0.48	1.34	0.56	1.23	0.60	1.15
August	0.57	2.07	0.41	1.36	0.50	1.30	0.52	1.21
September	0.63	2.50	0.69	1.49	0.61	1.97	0.64	1.64
October	0.55	3.38	0.62	1.98	0.41	2.99	0.47	2.49
November	0.50	3.84	0.56	2.43	0.47	3.00	0.51	2.65
December	0.54	4.40	0.55	3.04	0.42	3.96	0.49	3.44
Annual	0.58	30.59	0.63	11.67	0.47	16.58	0.53	13.36

Table 3. Metrics of annual and monthly totals of the number of days on which precipitation exceeds 20 mm (P20).

Month	ERA5 R²	ERA5 RMSE	IKGAM R²	IKGAM RMSE	IKGAMV2 R²	IKGAMV2 RMSE	IKGAMRK R²	IKGAMRK RMSE
January	0.23	1.47	0.27	1.46	0.33	1.36	0.36	1.33
February	0.21	1.27	0.12	1.37	0.21	1.26	0.24	1.24
March	0.21	1.08	0.14	1.16	0.22	1.09	0.27	1.08
April	0.17	0.78	0.10	0.80	0.19	0.74	0.22	0.74
May	0.12	0.58	0.09	0.59	0.12	0.58	0.15	0.64
June	0.08	0.45	0.02	0.45	0.06	0.45	0.10	0.52
July	0.07	0.39	0.06	0.39	0.09	0.39	0.10	0.44
August	0.09	0.41	0.12	0.40	0.10	0.42	0.15	0.45
September	0.19	0.64	0.10	0.66	0.16	0.67	0.20	0.69
October	0.32	1.08	0.24	1.08	0.26	1.12	0.30	1.15
November	0.32	1.43	0.23	1.50	0.30	1.43	0.29	1.46
December	0.23	1.68	0.19	1.69	0.27	1.57	0.30	1.53
Annual	0.23	5.65	0.23	6.17	0.27	5.45	0.28	5.08


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ntagkounakis, G.; Nastos, P.; Kapsomenakis, J.; Douvis, K. Creation and Comparison of High-Resolution Daily Precipitation Gridded Datasets for Greece Using a Variety of Interpolation Techniques. Hydrology 2025, 12, 31. https://doi.org/10.3390/hydrology12020031

