1. Introduction
Catchment similarity is an intensively researched topic, covering both how similarity can be determined and how the similarity of runoff formation can be inferred from the physical similarity of catchments. A good synthesis of knowledge and an introduction to the topic can be found in [1], in which more than 120 expert international authors participated under the guidance of the leading editors. The formation of runoff consists of the interaction of basic processes: distribution (e.g., interception and infiltration), storage (e.g., in plants, lakes, and soil), and release of water from catchments (e.g., evapotranspiration or river flow) [2]. However, these processes are often not measured or cannot be observed directly with enough precision. Geographical and geophysical features are essential indicators of the hydrological response of catchments [3] and can be used for catchment classification [4,5,6] and the evaluation of their similarity. This is a complicated issue; the dominant factors in runoff generation differ in various parts of the world [7,8], and the identification of the weights or relative importance of various elements of runoff generation is often limited by the convoluted effects of multiple variables [9].
Similarity analysis can be used, for example, in determining flows in unmeasured catchments, which in turn is of great importance for solving many engineering tasks of water management. The determination of unknown flows is not addressed in this article; it only serves to introduce a possible context for the utilization of similarity of the catchments.
In most catchments, the flows are not measured, particularly in smaller streams. However, the flows of such streams need to be analyzed to check flood protection, the design of supply irrigation reservoirs, etc. Catchments that lack gauging stations are called unmeasured or ungauged catchments [1].
Rainfall-runoff models can be used for the calculation of these unmeasured flows. However, one needs to calibrate such a model, which is highly problematic in unmeasured catchments, as some period of the measurement of flows is necessary for the calibration. One possible way to create and calibrate a model of ungauged catchment flows involves utilizing a catchment located in a nearby and similar region with measured flows. The chosen region should have comparable climate characteristics, geological conditions, topography, vegetation, land use, and soil types, as such a situation should generally lead to similar runoff. The model is calibrated in this catchment and can be used with limited accuracy in an unmeasured catchment. With this example, we can illustrate that determining the similarity of catchments has very useful applications.
The transfer of information from similar catchments is known as hydrological regionalization [10]. The issue of hydrological regionalization has been the subject of intensive research. A helpful introduction to this topic is given by Hrachowitz et al. [11] and Guo et al. [12]. This article addresses only a portion of this subject, namely the evaluation of catchment similarity.
The term hydrological analogy is used in this work because its subject of interest is not larger regions, but rather single unmeasured catchments or small groups of catchments. Many different characteristics fundamentally influence the hydrological response of a catchment (geometric characteristics, topography, soil, land cover, geology, climate, etc.). Similarities between catchments must therefore be evaluated using several such elements. Previous studies have demonstrated the usefulness of clustering in identifying groups of similar catchments. For catchments in the same cluster, it has been shown to be possible to transpose the parameters of the most similar donor catchment directly to the target catchment [13,14]. The k-means clustering algorithm was often used in these works. Limitations of this algorithm include its sensitivity to the random initial location of the centroids and the need to identify the optimal number of clusters. To address these problems, the authors of [15] proposed a similarity analysis method based on an iterative clustering ensemble algorithm with random sampling, which was able to find more potentially similar watersheds than the standard application of k-means. Rao and Srinivas [16] used a two-step clustering procedure to identify groups of similar catchments by refining the clusters derived from agglomerative hierarchical clustering using the k-means algorithm. The physical similarity approach using clustering and principal component analysis, proposed in [17], differs from the classical regionalization method, which simply transfers a full set of model parameters from donor catchments to the unmeasured target catchments, by finding relationships between physical catchment characteristics and model parameters. In [18], the similarity of catchments was investigated with the k-means clustering method using hydrological response units, which are the smallest units of the SWAT model and can be thought of as its cells [19]. Similarity measures based on Kohonen self-organizing maps (SOMs) [20] and copulas have also been proposed [21]. SOMs have some advantages over the k-means algorithm, such as better visualization in two-dimensional space, into which multidimensional data (for example, a set of different river basin characteristics) are projected. They therefore allow a better decision on the number of clusters, which does not have to be specified in advance. However, in various works where this method was evaluated, e.g., in [22], comparisons with other methods show that it is only slightly better. Nevertheless, based on the mentioned advantages, the SOM is the more objective strategy in comparison with, e.g., k-means.
Studies on this topic, however, have usually involved testing a larger number of catchments, and so cluster analysis, principal component analysis, Kohonen self-organizing maps, and the like were appropriate. However, in this study, we do not intend to propose tools for analyzing large regions, processing large volumes of data, or studying many catchments. Instead, we aim to offer a penalization methodology for searching for the level of similarity in a smaller group of catchments to complete a practical engineering task (for example, the mentioned detection of unknown flows). From the point of view of solving practical tasks, the use of the advanced algorithms mentioned above is further complicated by several unresolved problems associated with their use, some of which we tried to indicate above.
When only a small number of catchments is analyzed, more emphasis can be placed on engineering know-how, for example visual inspection based on maps and graphs, direct field evaluation, and the consideration of various irregular details. Errors in datasets that represent a larger number of catchments may not be as influential in terms of the overall evaluation but can frequently lead to incorrect judgment when only a few catchments are evaluated. Roughly speaking, a mistake affecting one object in a group of five affects 20% of the sample, whereas the same mistake in a group of 50 objects affects only 2% and may not be significant. A task involving a smaller number of analyzed catchments therefore differs from a task dealing with larger regions, and addressing it is the novel contribution of this work.
This paper offers a similarity analysis method for small catchments (under 100 km²) in practical engineering tasks, and its original contributions are as follows:
- (1) An evaluation model combining several aspects of catchments into a conjunct characteristic of their similarity utilizing a penalty approach;
- (2) A newly proposed catchment characteristic designed to evaluate the overall suitability of a given catchment for tasks related to hydrological similarity, which we call “calibrability”.
The structure of this paper is as follows: in Section 2, the case study is described; in Section 3, the methods applied in this study are briefly explained; in Section 4, GIS and statistical analyses performed with selected catchments are evaluated and discussed; and Section 5 summarizes the results and offers conclusions.
2. Study Area
In this case study, a comparison of catchments was tested on four catchments in the Small Carpathians in Western Slovakia (Figure 1): the catchment of the small mountain stream Parná (catchment size: 37.33 km²), measured at the Horné Orešany water gauging station; the catchment of the Trnávka stream at the Buková gauging station (42.96 km²); the catchment of the Vištucký stream at the Modra-Piesok station (9.38 km²); and the Gidra stream catchment at the Píla gauging station (32.9 km²). We will henceforth refer to these catchments by the names of their measuring stations.
In a real investigation of catchment similarity aiming to determine the unmeasured flows, the flows in the so-called target catchments are unknown. However, all selected catchments had measured flows in the presented study, meaning that it was possible to evaluate the relationship between their physical and hydrological similarity. To obtain more results, each catchment was alternately considered as unmeasured, and the remaining three catchments were understood as those from which we seek the most similar to this “unmeasured catchment”; so, we de facto examined the mutual similarity of all catchments.
The daily flow data for all the river catchments were obtained from the Slovak Hydro-meteorological Institute in Bratislava, Slovakia, for the years 1980 to 2017. These data were used to create rainfall-runoff models and, at the end of the work, to verify the similarity of the catchments determined from their other characteristics. They were not used to determine similarity, as one catchment is always considered unmeasured. Precipitation and temperature data were obtained from the same institution and used in the rainfall-runoff modelling but not in determining the similarity of the catchments, as the catchments are close to one another. The distribution of flow data is shown in Figure 2. The average monthly precipitation totals and temperatures in the area of interest are provided in Table 1.
As neighboring catchments were investigated, it can be assumed that their properties, i.e., climate, topography, geology, soil cover, land use, the genesis of runoff, etc., were similar but not identical. Differences are analyzed in the Results section in such a way that in terms of each catchment descriptor, the least similar catchment to the others is always identified.
3. Methodology
In this work, digital elevation models (DEMs), land use maps, soil maps, and various other information sources were used to analyze the properties of the study area that influence runoff. The characteristics of the catchments investigated are reviewed below.
Various geometric properties influencing a catchment’s hydrological response, including the catchment’s area, perimeter, stream length, catchment length, and various catchment shape factors such as the form factor, circulation ratio, and elongation ratio, were determined using GIS software. The methods for determining these characteristics are well known and can easily be found in the established hydrological literature [23,24,25]. For this reason, we present the methodology for their determination only in Appendix A.
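As an illustration of the shape factors named above, the following minimal sketch uses the textbook definitions of the form factor, circularity ratio (called the circulation ratio in this text), and elongation ratio. It is an assumption that Appendix A uses the same definitions; the function name `shape_factors` and the perimeter and length values are hypothetical, and only the area is taken from the study area description.

```python
import math


def shape_factors(area_km2, perimeter_km, length_km):
    """Return dimensionless shape factors from area, perimeter and catchment length."""
    return {
        # Form factor: area divided by the squared catchment length.
        "form_factor": area_km2 / length_km ** 2,
        # Circularity ratio: catchment area relative to a circle with the same perimeter.
        "circularity_ratio": 4 * math.pi * area_km2 / perimeter_km ** 2,
        # Elongation ratio: diameter of a circle of equal area divided by catchment length.
        "elongation_ratio": 2 * math.sqrt(area_km2 / math.pi) / length_km,
    }


# Area from the text (Horné Orešany); perimeter and length are hypothetical values.
factors = shape_factors(area_km2=37.33, perimeter_km=35.0, length_km=12.0)
for name, value in factors.items():
    print(f"{name}: {value:.3f}")
```

With these definitions, a perfectly circular catchment has a circularity ratio and an elongation ratio of 1, so lower values indicate a more elongated or more irregular outline.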
The topographic characteristics were derived from a DEM using GIS software. QGIS [26] and R software [27] were the primary tools used. The topographic similarity was examined using a digital elevation model with a grid size of 20 × 20 m. The characteristics evaluated included altitude, slope, and aspect, among others. Higher altitudes were accompanied by a higher total rainfall, depth, snow cover duration, and reduced temperature. Statistical methods and a hypsometric curve were used to compare the altitudes of the catchments; the hypsometric curve represents the relative area below (or above) a given altitude [28] and describes the distributions of elevations across a catchment. A catchment’s slope determines the direction and speed of the runoff [29].
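The hypsometric curve described above can be sketched from a list of DEM cell elevations as follows. This is a minimal illustration, not the procedure used in the study (which relied on QGIS and R); the function name `hypsometric_curve` and the sample elevations are hypothetical, and equal cell sizes are assumed so that the share of cells equals the share of area.

```python
def hypsometric_curve(elevations):
    """Return (relative_area, relative_height) pairs, ordered from highest to lowest cell.

    relative_area   -- share of cells at or above the cell's elevation
                       (tied elevations are ranked in arbitrary order)
    relative_height -- elevation rescaled to the range 0..1
    """
    ordered = sorted(elevations, reverse=True)
    z_max, z_min = ordered[0], ordered[-1]
    if z_max == z_min:
        raise ValueError("a flat surface has no hypsometric curve")
    n = len(ordered)
    return [
        (rank / n, (z - z_min) / (z_max - z_min))
        for rank, z in enumerate(ordered, start=1)
    ]


# Hypothetical elevations (m above sea level) of a handful of DEM cells.
sample = [210.0, 260.0, 340.0, 410.0, 520.0, 700.0]
for area, height in hypsometric_curve(sample):
    print(f"{area:.2f} {height:.2f}")
```

Because both axes are dimensionless, curves of catchments with different areas and elevation ranges can be drawn on the same plot and compared by shape.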
Both the composition of the bedrock and the soil properties significantly influence the hydrological response of a catchment, as these characteristics result in different rates of rainfall infiltration, percolation, retention of soil moisture, etc. [29].
The parent rock and terrain also influence the formation of the drainage networks of the catchments, which were evaluated based on their density. A digital elevation model and QGIS software were used for mapping the drainage networks and evaluating their density.
Land cover is another crucial factor; it encompasses the percentages of forest, agricultural land, and built-up area in the catchment. Runoff is usually slowed by forest cover because it influences hydrological processes such as interception, rainfall infiltration, evaporation, and transpiration. Conversely, a built-up area or improperly cultivated arable land accelerates outflow [30]. The areas of the individual land-use types in the examined localities were compared using GIS tools.
The climatic similarity between catchments can be evaluated by comparing factors such as precipitation, temperature, differences in rainfall, and potential evapotranspiration. Quite often, climate has been identified as the most important driving factor for different hydrological behaviours [5,6]. However, when comparing only nearby catchments, as is the case in this study, the climatic differences are usually negligible. As such, this work devotes minimal space to this factor.
This work also introduces the catchment characteristic “calibrability”, which is the level of precision achieved via the hydrological modelling of flows. Suppose we are investigating the similarity of catchments with the goal of creating a hydrological model for an unmeasured catchment. If the flows cannot be modelled with sufficient precision in a measured catchment, transferring the model parameters to the model of another, unmeasured catchment would probably be even less successful. The engineers solving the task in which the determined flows will be used must assess what level of accuracy is still acceptable in the particular context; for example, the requirements differ between an introductory study of an irrigation system design and a detailed design of flood protection. In general, however, flows determined with a Nash–Sutcliffe efficiency below 0.5 can be considered insufficient.
In this work, we therefore use the umbrella term “calibrability” to emphasise the context in which it is used, to stress the importance of evaluating this aspect when searching for a suitable source (analogous) catchment, and because one must decide how to determine this level of accuracy in a particular task. It is necessary to take into account the purpose for which the derived flows will be used. In a real-life task, the engineer may choose different indicators for tasks that address low flows than for tasks that address floods. This characteristic can also be identified using graphical methods (for example, the comparison of measured and calculated hydrographs) or a combination of several indicators. In this work, the characteristic is expressed by the Nash–Sutcliffe coefficient [31], since the purpose for which the similarity of the catchments is evaluated is not addressed here, and it is a frequently used indicator in the hydrological community. As will be shown in the Results section, it served quite well for the given purpose.
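For readers less familiar with the indicator, the Nash–Sutcliffe efficiency compares the squared model errors with the variance of the observations. The sketch below uses the standard definition; the function names and the flow values are hypothetical, and the 0.5 threshold is the general guideline stated above rather than a fixed rule.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_errors = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - squared_errors / variance_term


def is_calibrable(nse, threshold=0.5):
    """Apply the guideline from the text: an NSE below 0.5 is treated as insufficient."""
    return nse >= threshold


# Hypothetical daily flows (mm/day), for illustration only.
observed = [0.8, 1.2, 3.5, 2.1, 1.0]
simulated = [0.9, 1.0, 3.0, 2.4, 1.1]
nse = nash_sutcliffe(observed, simulated)
print(f"NSE = {nse:.3f}, calibrable: {is_calibrable(nse)}")
```

In a real task, the threshold and even the indicator itself would be chosen according to the purpose of the derived flows, as discussed above.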
Based on the catchment properties (catchment descriptors), this paper offers a practical penalization methodology for identifying the most similar catchments from a few catchments surrounding an ungauged catchment of interest. In this penalization methodology, some space is left to engineering know-how for considering various irregular details. This is both possible and more appropriate due to the smaller number of catchments than in regional studies. Therefore, we proposed testing the process of determining the overall similarity between catchments using a penalization methodology in which a catchment most dissimilar from the other catchments is penalised by one point. The authors think that the determination of the penalty with different values of the penalty coefficient for each characteristic of the catchment exceeds the current know-how in this issue and exceeds the possibilities of a practical study that does not analyse the whole region.
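The one-point penalty can be sketched in code as follows. Note the assumption: in this study the most dissimilar catchment is identified partly by engineering judgment, whereas the sketch replaces that judgment with a simple numerical rule (the catchment whose value lies farthest from the mean of the others). The function names are hypothetical; the drainage densities are those reported in the Results section.

```python
def most_dissimilar(values):
    """Return the catchment whose descriptor value is farthest from the mean of the others."""
    def distance(name):
        others = [v for other, v in values.items() if other != name]
        return abs(values[name] - sum(others) / len(others))
    return max(values, key=distance)


def penalize(descriptors):
    """Give one penalty point per descriptor to the most dissimilar catchment."""
    names = next(iter(descriptors.values())).keys()
    points = {name: 0 for name in names}
    for values in descriptors.values():
        points[most_dissimilar(values)] += 1
    return points


# Drainage densities from the Results section; further descriptors would be added alike.
descriptors = {
    "drainage density": {
        "Buková": 1.15,
        "Modra-Piesok": 1.45,
        "Horné Orešany": 1.53,
        "Píla": 1.71,
    },
}
points = penalize(descriptors)
print(points)
```

A uniform penalty of one point keeps the procedure transparent; weighting individual descriptors differently would require knowledge that, as argued above, is not available in a small practical study.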
The “hydrological response” view of catchment similarity was verified using specific flows. These are flows in millimetres per time step, computed from the usual units according to the following formula:
$$Q_{mm/day} = \frac{Q_{m^3/s} \times 86400 \times 1000}{a \times 10^{6}} = \frac{86.4 \, Q_{m^3/s}}{a}$$
where $Q_{m^3/s}$ is the flow in m³·s⁻¹, $Q_{mm/day}$ is the flow in mm·day⁻¹, and $a$ is the catchment area in km². We chose these units because the investigated catchments have different areas, and specific flows allow them to be compared. These units are also more advantageous in evaluating the hydrological balance, because evapotranspiration and precipitation are also expressed in mm. For the same reason, the correlation was computed on specific flows. In this verification process (Figure 2), the proposed manner of penalization and the mentioned unified size of the penalty coefficient were verified. The similarity between catchments was therefore evaluated from two points of view: one based on the catchment descriptors (an evaluation of physical similarity) and the other based on the similarity of the runoff response. The runoff responses of two or more catchments are considered similar when the time series of their flows are well correlated (e.g., with a correlation coefficient above 0.65). A comparison of these two methods of assessing the similarity of catchments served as verification: it was checked whether the penalty procedure and the assessment of the hydrological response both led to the identification of the same catchment as most similar to the ungauged catchment.
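The conversion of a discharge in m³·s⁻¹ to a specific flow in mm·day⁻¹ described in this section can be sketched as follows. The function name and the example discharge are hypothetical; the catchment area is the one given for Horné Orešany in Section 2.

```python
SECONDS_PER_DAY = 86_400


def specific_flow_mm_per_day(q_m3_s, area_km2):
    """Convert a discharge in m3/s to a runoff depth in mm/day over the catchment area."""
    volume_m3_per_day = q_m3_s * SECONDS_PER_DAY
    area_m2 = area_km2 * 1e6
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1000  # metres to millimetres


# Hypothetical discharge of 0.5 m3/s at Horné Orešany (37.33 km2).
print(round(specific_flow_mm_per_day(0.5, 37.33), 2))  # about 1.16 mm/day
```

Collecting the constants gives the shortcut 86.4 × Q / a, which is convenient for checking values by hand.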
4. Results and Discussion
This chapter contains catchment similarity assessments from several viewpoints, i.e., according to various characteristics. For each characteristic, the catchment that differs most from the others is identified and penalized by one point. The penalization is summarized, and the overall similarity/dissimilarity of the catchments in question is determined, at the end of this chapter in Table 6 (four auxiliary tables come first).
The basic geometric properties of the catchments, including area, perimeter, catchment length, segmentation of the catchment boundary, form factor, circulation ratio, elongation ratio and shape factor, are evaluated and summarized in Table 2. The value of the most different catchment’s characteristic is marked in bold and underlined. The most different catchment is then penalized with one point; these points are summed at the bottom. The perimeter and length of the catchments are not evaluated as they are used in the calculations of other (evaluated) geometric characteristics. Based on Table 2, it can be concluded that the Buková catchment is the most dissimilar in terms of geometric properties, and it is recorded in Table 6, which summarizes all the properties.
An analysis of the altitude, slope, and slope orientation of the catchments was performed using a digital elevation model (Figure 3), as these are the primary topographic attributes that influence hydrological processes.
The lowest and highest altitudes were found in the Buková catchment at 195.3 m above sea level and 738.4 m above sea level, respectively (Table 3). Compared to the other catchments, Buková’s median elevation was relatively low, and its elevations also included the highest number of outliers; see Figure 4. Additionally, the Modra-Piesok catchment had the highest median altitude and deviated from the other catchments’ altitude conditions. In terms of this characteristic, Buková and Modra-Piesok are the most different from the other analyzed catchments (and are penalized in Table 6).
The hypsometric curves of the evaluated catchments are shown in Figure 5; a hypsometric curve is a non-dimensional measure of the proportion of the catchment above a given elevation. Hypsometry is used as an indicator of the geomorphic form of catchments. Many researchers have postulated that it is an important characteristic of a catchment’s form and is useful for explaining various hydrological or erosion processes [24,32]. It is clear from Figure 5 that the hypsometric curve of the Buková catchment differs the most from those of the other evaluated catchments (penalized in Table 6).
The boxplots of the slopes (Figure 6) show a similar distribution of values in the Buková, Horné Orešany and Píla catchments; the most frequent slope has a value of around 9°. The Modra-Piesok catchment’s distribution of slope values shows the smallest standard deviation in the set, i.e., 4.46, and the highest skewness (Table 3), so it was evaluated as the most dissimilar to the rest of the evaluated catchments from this point of view (and it is penalized in Table 6).
The catchments’ slope aspects, which were obtained through an analysis of their DEMs, are shown in Figure 7 and Figure 8. The working procedures and methodology of these analyses can be found in the geoinformatics literature, such as the work of Lovelace et al. [33].
Figure 8 shows a so-called radar graph, which summarizes the data on the slope aspect of each catchment. The relative number of cells with a given slope aspect is plotted for each cardinal and intercardinal direction, which are labelled on the edge of the radar graph. North corresponds to 0° (or 360°), east to 90°, south to 180°, and west to 270°. Using relative values enables the comparison of aspects of catchments with different areas. Both the radar plots and the summary in Table 3 led to the overall conclusion that the Modra-Piesok catchment is the most dissimilar in terms of slope aspect. Table 3 includes a summary of the statistical characteristics of the preceding analyses of the catchments’ altitudes, gradients, and aspects.
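The relative aspect frequencies plotted in a radar graph of this kind can be sketched as follows. This is an illustration only, not the GIS workflow used in the study; the function name, the 45° sector width centred on each direction, and the sample aspect values are assumptions. Flat cells, which some GIS tools encode with a special value such as −1, are assumed to have been removed beforehand.

```python
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def aspect_shares(aspects_deg):
    """Return the share of cells facing each of the eight compass directions.

    Aspects are in degrees clockwise from north (0 or 360 = N, 90 = E, 180 = S, 270 = W).
    Each direction covers a 45-degree sector centred on it.
    """
    counts = {direction: 0 for direction in DIRECTIONS}
    for aspect in aspects_deg:
        sector = int(((aspect % 360) + 22.5) // 45) % 8
        counts[DIRECTIONS[sector]] += 1
    total = len(aspects_deg)
    return {direction: count / total for direction, count in counts.items()}


# Hypothetical aspects of a few DEM cells.
shares = aspect_shares([0, 350, 90, 180, 200, 271, 45, 130])
print(shares)
```

Dividing by the total number of cells is what makes catchments of different sizes comparable on the same graph.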
Figure 9 shows the catchments’ drainage networks, which were obtained using GIS tools (details are in Appendix A). An important parameter in terms of the outflow regime is the density of the drainage network, which was 1.15 for the Buková catchment and higher in the others, i.e., 1.45, 1.53, and 1.71 for Modra-Piesok, Horné Orešany, and Píla, respectively. The Buková catchment can therefore be considered the most different in terms of drainage network.
The soils of the catchments investigated consisted exclusively of the loam, sandy loam, and loamy sand textural classes. According to Figure 10, Buková is almost entirely covered by loamy soil, which is less permeable than sandy soils, making it the most dissimilar catchment compared to the other catchments analyzed.
The land use was evaluated in GIS software using a vectorized map. Deciduous forest was the most common type of land cover. Figure 11 shows the high similarity of Horné Orešany, Modra-Piesok, and Píla, which are predominantly covered with deciduous forest and some transition shrubs, i.e., with a land cover that slows down runoff. The Buková catchment is the only one with larger areas of other kinds of land use, mainly arable land and urbanized areas. This catchment also has the least amount of deciduous forest. Arable land and urbanized areas are more susceptible to surface runoff, which significantly influences the hydrological response of the catchment. In sum, Buková is considered the most different catchment; the percentages of the different types of land are shown in Table 4. As with the other catchment characteristics, this assessment is marked by a penalty in Table 6.
4.1. Calibrability
The similarity of catchments was analyzed in this work as a preliminary step toward utilizing this factor in various analyses, such as analogous calculations of river flows in unmeasured catchments. A similar measured (donor) catchment to an unmeasured (target) catchment is sought in such calculations. Then, using the donor catchment, one can calibrate a hydrological model and subsequently use it for a target river catchment.
One important indicator of the suitability of a donor catchment is the possibility to satisfactorily accomplish its rainfall-runoff model’s calibration. We refer to this donor catchment property as “calibrability”. If calibrating the catchment’s model satisfactorily is impossible, then the catchment is not a suitable donor, as its model cannot act as a realistic predictor of flows for another catchment.
Calibrating a hydrological rainfall-runoff model means finding the parameter values that give the closest possible agreement between the calculated and measured flows on a given stream. Measured flows on the donor catchment are therefore required for calibration. “Calibrability” was determined in this work using the TUW hydrological model [34]. This model runs with a daily time step and consists of a snow routine, a soil moisture routine, and a flow routing routine; a genetic algorithm was used to calibrate its 15 parameters. The level of precision of the final model (agreement between measured and computed flows) for all catchments, as expressed by the value of the Nash–Sutcliffe (NSE) coefficient, is shown in Table 5.
The highest “calibrability” (NSE values of 0.681 and 0.683) was found for the Píla and Modra-Piesok catchments. They are, therefore, suitable candidates for use as donor catchments. The lowest “calibrability” value was seen in the Buková catchment (0.426); an NSE value lower than 0.5 cannot be accepted as satisfactory. Therefore, it can be concluded that the Buková catchment is not appropriate for the hydrological analogy, and it is also penalized in Table 6 from this point of view.
4.2. Overall Assessment of the Similarity of the Catchments
The overall evaluation of the similarity of the catchments was undertaken using penalization. As previously mentioned, this method is more appropriate for a task in which (for engineering purposes) a smaller number of catchments are analyzed, as opposed to a task in which the whole region is analyzed. It is clear from the literature review that a smaller number of works in the hydrological literature have been dedicated to this practical task. For each previously discussed characteristic, the most dissimilar catchment from the others scored one point, as shown in Table 6. In some ambiguous cases, the second least similar catchment was also given a point if it was also significantly different from the others. The penalization results of the individual catchments, together with their overall total, are shown in Table 6.
Horné Orešany and Píla were not penalized in any category, and the most often penalized (and thus most different) catchment was Buková, with eight points. The climate indicators are not listed in Table 6, as a high level of similarity in climate conditions was found across all examined catchments.
In a real-life setting, determining the flows of a target catchment would involve one catchment with unknown flows (target) and the evaluation of the best possible choice (the most similar catchment) from potential donor catchments. In Table 6, the best choice of donor catchment when each catchment is considered the target is evaluated.
For the Horné Orešany catchment, the best donor catchment is Píla; the second-best option is Modra-Piesok (Table 7). The situation is similar when Píla is the target catchment, and Modra-Piesok could also use Píla and Horné Orešany as donor catchments. It would be most appropriate to look for other donor catchments for Buková, the least similar to all the other evaluated catchments.
4.3. Verification of Similarity Using Specific Flows
The purpose of this subchapter is to verify the proposed assessment of the similarity of catchments based on their hydrological response.
Catchments with known flows were selected to test the above-described evaluation of the similarity. We did not directly compare flows in the riverbed, but specific flows, as the catchments vary in size. The specific flows are flows per unit area of the catchment (in mm) and are compared in Figure 12a using box plots. Figure 12b shows the same flows, but their logarithms are displayed for clarity, making the similarity/dissimilarity between the catchments more evident. The smallest specific flows are in the Buková catchment; this finding confirms our methodology of examining the physical similarity of the catchments. From that viewpoint, the Buková catchment was also the least similar to other catchments. The flows of the other catchments are quite similar (as can be seen in Figure 12b; they have close values of their medians, quartiles and outliers).
The proposed method of catchment similarity assessment was also verified using flow correlations. The mutual relation between specific flows is expressed in Figure 13 through a plot of the correlation matrix. This matrix makes it possible to see which catchments have hydrological regimes most similar to the others and which are different. The gauging station names on the diagonal of the matrix identify the variables (time series of flows), and the cells above the diagonal show the correlation coefficients between them: the coefficient in a particular cell relates the two variables found in the vertical and horizontal directions from that cell on the diagonal. As the figure serves as a quick overview of the flow data, mini histograms and mini scatter plots of all pairs of variables (flows) are also presented; the variables are assigned to these mini plots in the same way as for the correlation coefficients. The average correlation coefficient can easily be evaluated for each catchment and used to characterize its overall similarity or dissimilarity to the others. The average correlation coefficients with the other catchments are as follows: Horné Orešany, 0.66; Modra-Piesok, 0.60; Buková, 0.54; and Píla, 0.67. The average is smallest for the Buková catchment (0.54), again confirming its dissimilarity. Such catchments should not be used for calculating flows for unmeasured catchments. A low correlation can mean that there are factors influencing runoff in the catchment that are not present in the other catchments or in the given area. The scatterplots below the diagonal in Figure 13 graphically illustrate the similarity or dissimilarity of the catchments; the smaller the dispersion of points around the regression line, the more hydrologically similar a given pair of catchments is.
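The average correlation coefficient of each catchment with the others can be computed along the following lines. This is a minimal sketch assuming the Pearson coefficient and flow series aligned on the same dates; the function names and the three short series (labelled A, B and C) are hypothetical and do not reproduce the measured data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def mean_correlation(flows):
    """For each catchment, average its correlation with every other catchment."""
    averages = {}
    for name, series in flows.items():
        coefficients = [
            pearson(series, other_series)
            for other, other_series in flows.items()
            if other != name
        ]
        averages[name] = sum(coefficients) / len(coefficients)
    return averages


# Hypothetical specific flows (mm/day) on the same five days.
flows = {
    "A": [0.8, 1.2, 3.5, 2.1, 1.0],
    "B": [0.7, 1.4, 3.1, 2.4, 0.9],
    "C": [1.1, 0.9, 1.5, 2.8, 1.6],
}
for name, r in mean_correlation(flows).items():
    print(f"{name}: {r:.2f}")
```

The catchment with the lowest average is the one whose runoff regime differs most from the group, which in this study was Buková.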