**2. Materials and Methods**

The data used in the present study are daily (00 UTC and 12 UTC) 0.25° × 0.25° grid point values of air temperature (AT), dew point temperature (DP), zonal (ZW) and meridional (MW) wind components, Convective Available Potential Energy (CAPE), Convective Inhibition (CIN), and total cloud cover (TCC) for the southern Balkans area (19°–29° E, 34°–42° N) (Figure 1) for the 10-year period from 2008 to 2017, obtained from the ERA5 reanalysis data set [22]. These parameters were selected because their values over the examined area are directly connected to the climate of the region: they either determine its main characteristics (AT, DP, ZW, MW, and TCC) or are responsible for in situ extreme precipitation events related to thunderstorms (CAPE and CIN). This is not the case for other parameters, such as sea level pressure or geopotential height, which affect the climate of the region indirectly and remotely and therefore have to be examined over a broader area. The data correspond to 00 UTC and 12 UTC in order to capture both nighttime and midday atmospheric conditions, which generally differ, especially during the warm period of the year, mainly because of intense daytime land heating and the development of small-scale circulations (e.g., sea breezes). ERA5 is a recently introduced ECMWF data set that provides hourly values of many atmospheric, land, and oceanic parameters at a horizontal resolution of 31 km on 137 levels from the surface up to 0.01 hPa (~80 km above the Earth's surface). It combines large quantities of historical observations into global estimates using advanced modeling and data assimilation procedures [22].

For each of the above parameters (AT, DP, ZW, MW, CAPE, CIN, and TCC) and times (00 UTC and 12 UTC), the 2008–2017 long-term mean spatial anomaly patterns are calculated for each of the 365 calendar days of the year. The spatial anomaly pattern of a given parameter on a given calendar day is obtained by subtracting the spatial (domain) average from the value at each grid point. In this way, a matrix containing all the long-term mean spatial anomaly patterns of the above parameters at 00 UTC and 12 UTC for the 365 calendar days of the year is constructed (7 parameters × 2 hours × 365 days = 5110 columns). Each column of the matrix corresponds to a specific parameter, hour (00 UTC or 12 UTC), and calendar day, while each row corresponds to a specific grid point of the study area.
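A minimal NumPy sketch of the anomaly-matrix construction is given below. It assumes that, for each parameter and hour, the ERA5 values have already been loaded into an array of shape (years, 365, lat, lon) with 29 February removed (the text does not state how leap days are handled); the function names and the synthetic demo data are hypothetical. Grid points are flattened in row-major (latitude-major) order. The row count for the real domain assumes that both edges of 19°–29° E and 34°–42° N are grid points at 0.25° spacing.

```python
import numpy as np

PARAMS = ["AT", "DP", "ZW", "MW", "CAPE", "CIN", "TCC"]
HOURS = ["00UTC", "12UTC"]


def long_term_spatial_anomalies(daily):
    """daily: (n_years, 365, n_lat, n_lon) values of one parameter at one hour
    (29 February assumed already removed). Returns (365, n_lat, n_lon)."""
    clim = daily.mean(axis=0)                            # long-term mean per calendar day
    domain_avg = clim.mean(axis=(1, 2), keepdims=True)   # spatial average per calendar day
    return clim - domain_avg                             # spatial anomaly pattern


def build_anomaly_matrix(data):
    """data: {(param, hour): (n_years, 365, n_lat, n_lon) array}.
    Returns the matrix (one row per grid point, one column per
    parameter/hour/calendar day) and the matching column labels."""
    blocks, labels = [], []
    for p in PARAMS:
        for h in HOURS:
            anom = long_term_spatial_anomalies(data[(p, h)])
            n_days = anom.shape[0]
            blocks.append(anom.reshape(n_days, -1).T)    # (n_points, 365)
            labels.extend((p, h, day + 1) for day in range(n_days))
    return np.hstack(blocks), labels


# Dimensions of the real matrix, assuming both domain edges are grid points.
n_lat = int(round((42 - 34) / 0.25)) + 1   # 33 latitudes
n_lon = int(round((29 - 19) / 0.25)) + 1   # 41 longitudes
n_rows = n_lat * n_lon                     # 1353 grid points
n_cols = len(PARAMS) * len(HOURS) * 365    # 5110 columns

# Small synthetic demo (3 "years" on a 5 x 6 grid) instead of real ERA5 data.
rng = np.random.default_rng(0)
demo = {(p, h): rng.normal(size=(3, 365, 5, 6)) for p in PARAMS for h in HOURS}
M, labels = build_anomaly_matrix(demo)
print(M.shape, n_rows, n_cols)   # → (30, 5110) 1353 5110
```

Every column of the result averages to zero over the grid points, which is exactly what subtracting the spatial average guarantees.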

**Figure 1.** The geographical domain used.

Principal Component Analysis (PCA) with varimax rotation is applied to the above matrix as a dimensionality reduction tool. PCA is a multivariate statistical method that projects a set of possibly correlated variables onto a set of uncorrelated variables called principal components. Only the significant components, whose number is determined from the scree plot and the physical interpretability of the results [23,24], are retained for the next step. Next, k-means Cluster Analysis (CA) is applied to the standardized scores of the retained principal components in order to group the grid points and thus define areas with homogeneous climate characteristics regarding the spatial anomalies of specific climatic parameters during specific sub-periods of the year. CA is a statistical method that classifies the cases of a set of variables into objectively defined, distinct, and homogeneous clusters. The squared Euclidean distance is used as the measure of similarity, while the k-means technique iteratively reassigns cases to clusters and updates the cluster centroids until the within-cluster sum of squared distances is minimized [25–27]. The optimum number of clusters is determined by the distortion test [28]. For the grid points classified into each cluster, the mean intra-annual variation of every climatic parameter is constructed. These intra-annual variations are smoothed by averaging the daily values over each of the 73 (365/5) 5-day periods of the year. In this way, the main climate characteristics of each objectively defined area are revealed, in terms of the magnitude of each climatic parameter relative to the spatial average throughout the year. The methodology followed in the present study, as described in the above paragraphs, is summarized in Figure 2. Finally, the ERA5 and ERA-Interim data sets are compared over the common period 2008–2017.
This comparison involves air temperature and total cloud cover, the parameters connected to the most significant climate characteristics of the region, and it is performed separately for land and sea areas. For this purpose, daily ERA-Interim 1° × 1° grid point values of air temperature and total cloud cover are also used [29].
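The PCA and varimax steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: PCA is taken on the correlation matrix (each column standardized) and computed via SVD; the varimax rotation is the standard iterative SVD algorithm without Kaiser normalization (statistical packages often apply that normalization by default); and a synthetic matrix stands in for the real anomaly matrix. The number of retained components still has to be chosen from the scree plot, represented here by the `explained` vector; `pca_standardized` is a hypothetical helper name.

```python
import numpy as np


def pca_standardized(X, n_components):
    """Correlation-based PCA of X (rows = grid points, columns = variables).
    Returns loadings (n_vars, k), unit-variance scores (n_points, k) and
    the fraction of variance explained by every component (for a scree plot)."""
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # standardize each column
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)                        # eigenvalue fractions
    loadings = Vt[:n_components].T * s[:n_components] / np.sqrt(n - 1)
    scores = U[:, :n_components] * np.sqrt(n - 1)          # standardized PC scores
    return loadings, scores, explained


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (no Kaiser normalization).
    Returns the rotated loadings and the rotation matrix R."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = loadings @ R
        # gradient of the varimax criterion with respect to the rotation
        target = L**3 - (gamma / p) * L * np.sum(L**2, axis=0)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        R = u @ vt                                         # nearest orthogonal matrix
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return loadings @ R, R


# Synthetic stand-in for the anomaly matrix: 200 grid points x 50 variables
# driven by 3 hidden patterns plus a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50)) + 0.1 * rng.normal(size=(200, 50))

loadings, scores, explained = pca_standardized(X, n_components=3)
rot_loadings, R = varimax(loadings)
rot_scores = scores @ R          # rotated scores, one row per grid point
print(np.round(explained[:5], 3))
```

Because the rotation matrix is orthogonal, the rotated scores remain standardized and mutually uncorrelated, and the product of scores and loadings (the reconstructed standardized data) is unchanged, so the rotated scores can be passed directly to the clustering step.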

**Figure 2.** The methodology scheme used in the present study.
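The clustering and smoothing steps of the methodology can be sketched as below. The k-means routine is plain Lloyd iteration with squared Euclidean distance, k-means++ seeding, and several restarts that keep the lowest distortion; these choices, the function names, and the synthetic data are assumptions. The code only computes the distortion (within-cluster sum of squared distances) for a range of cluster numbers; the exact selection rule of the distortion test [28] is not reproduced. The pentad step reshapes each 365-day series into 73 blocks of 5 days and averages each block.

```python
import numpy as np


def _kmeanspp_init(X, k, rng):
    """k-means++ seeding: spread the initial centroids apart."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_init=10, max_iter=300, seed=0):
    """Lloyd's k-means with squared Euclidean distance.
    Returns (labels, centroids, distortion) of the best restart."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _kmeanspp_init(X, k, rng)
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)                     # assign to nearest centroid
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new                                # move centroids to cluster means
        distortion = ((X - centroids[labels]) ** 2).sum()  # within-cluster sum of squares
        if best is None or distortion < best[2]:
            best = (labels, centroids, distortion)
    return best


def pentad_means(daily):
    """Average the last axis (365 days) over 73 consecutive 5-day periods."""
    return daily.reshape(*daily.shape[:-1], 73, 5).mean(axis=-1)


def cluster_pentad_curves(series, labels, k):
    """series: (n_points, 365) daily values of one parameter per grid point.
    Returns (k, 73): pentad-smoothed cluster-mean intra-annual variations."""
    return np.stack([pentad_means(series[labels == j].mean(axis=0)) for j in range(k)])


# Synthetic stand-in for the rotated PC scores: three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
scores = np.vstack([c + 0.5 * rng.normal(size=(60, 2)) for c in centers])

distortions = {k: kmeans(scores, k)[2] for k in range(1, 7)}   # distortion curve
labels, _, _ = kmeans(scores, 3)

series = rng.normal(size=(len(scores), 365))   # hypothetical daily anomalies
curves = cluster_pentad_curves(series, labels, 3)
print({k: round(float(d), 1) for k, d in distortions.items()}, curves.shape)
```

In this toy case the distortion drops sharply up to three clusters and only slowly afterwards; with real data the decision would follow whichever criterion the distortion test [28] prescribes.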
