1. Introduction
The frequency analysis of extreme events in hydrology is of particular importance for determining the probability of occurrence of extreme events of a given magnitude.
Flood frequency analysis is important because it determines the maximum flows with given exceedance probabilities; these flows have a defining role in the design of dams [1] and in water management [2], and can have significant impacts on human lives, infrastructure, and the environment.
Together with the distributions from the Gamma family and generalized extreme values, the Generalized Pareto Type 2 distribution (PGII) represents one of the most used distributions in flood frequency analysis [3,4,5], especially in the analysis of partial series using the Annual Exceedance Series (AES) or Peak Over Thresholds (POT).
Among the Generalized Pareto distributions analyzed in this article, the ones that have received considerable attention in flood frequency analysis are the Generalized Pareto distribution Type II (PGII), using AES or POT [3,4,5], and the Generalized Pareto distribution Type III (PGIII) and the Wakeby (WK5) distribution [4,6,7,8], in the analysis with the Annual Maximum Series (AMS). The PGIII is also known in the literature as the log-logistic distribution or generalized logistic distribution [4,6,7,9].
The PGII distribution has a broad use in the analysis of extreme events such as precipitation frequency analysis [10,11,12,13,14,15], low flow frequency analysis [16] and flood frequency analysis [4,5,17,18,19,20].
According to Rao and Hamed [4] and Hosking and Wallis [21], the Generalized Pareto distribution “is the logical choice for modeling flood magnitudes that exceed a fixed threshold when it is reasonable to assume that successive floods follow a Poisson process and have independent magnitudes”.
The Wakeby distribution is a quantile distribution (a distribution expressible in inverse form) that in certain situations [4,8,22] reduces to the Generalized Pareto Type II distribution. Its application, using closed-form equations for the first four central moments, was carried out for the first time by Anghel and Ilinca [22,23], for both the four- and five-parameter Wakeby distributions.
Other distributions from the Generalized Pareto family have received little to no attention, such as the four-parameter Generalized Pareto Type IV (PGIV4), the three-parameter Generalized Pareto Type IV (PGIV3) and the Generalized Pareto Type I (PGI), also known as the Pearson XI distribution.
Taking this into account, one of the objectives of the article consists of the analysis of the applicability of other distributions belonging to the same family of Generalized Pareto distribution in flood frequency analysis. For a comprehensive analysis, in this article, all the distributions belonging to this family are comparatively analyzed. Although some of these distributions have been used in the frequency analysis of extreme events in hydrology, this article brings new elements for these distributions that help to better understand and apply them in hydrology and beyond.
The research in this article is part of a larger research effort carried out in the Faculty of Hydrotechnics to identify the distributions, from different families, that are applicable in the frequency analysis of extreme values; partial results are presented in other materials [22,23,24].
In this article, the estimation methods of the parameters of these distributions are the method of ordinary moments (MOM) and the method of linear moments (L-moments); for some of them, it is necessary to solve nonlinear systems of equations, which leads to some difficulties in using these distributions. Thus, for the ease of applications of these distributions, parameter approximation relations are presented, using polynomial, exponential or rational functions.
Only these two parameter estimation methods are analyzed in this article, because the MOM is the “parent” method in Romania, while the L-moments method is the one intended for use in the new regulations regarding the analysis of extreme phenomena in hydrology, being much more stable and less biased for short data series [3,4,8], which is the general case of hydrometry in Romania.
However, in recent years it has been demonstrated that the L-moments method also requires certain corrections of both the statistical indicators and the parameters of the theoretical distributions obtained with it [25,26].
All the mathematical elements necessary to use these distributions in the flood frequency analysis are presented.
New elements are presented, such as the first six raw and central moments for the PGII and PGI distributions; the relations for estimating the parameters with the L-moments for the PGIV4, PGIV3 and PGI distributions; new approximate parameter estimation relations for the PGII, PGI and PGIII distributions; the frequency factors for all analyzed distributions, both for the MOM and the L-moments and approximation relations of the frequency factors for the PGI, PGII and PGIII distributions.
Thus, all these novelty elements for these distributions, presented in Table 1, will help hydrology researchers to better understand and easily apply these distributions.
The raw and central moments of the analyzed distributions were determined from the probability density functions, using the methodology presented in the Supplementary File. The raw and central moments up to order six are presented for the first time for the Generalized Pareto distributions Type II and Type III; together with the frequency factor, they are important in establishing the confidence interval (for the MOM) using the Kite approximation [4]. In addition, the frequency factor of these distributions is presented for the first time based on the L-moments method, which is important in determining the confidence interval using the Chow approximation [4]; until now, the latter has been used in hydrology only with parameters estimated by the MOM.
In order to verify the performance of the proposed distributions, a flood frequency analysis is carried out, using the Annual Maximum Series (AMS) and the Annual Exceedance Series (AES) for the Prigor River as a case study. All results are presented in comparison with the Pearson III distribution, which is the “parent” distribution in flood frequency analysis in Romania [22]. The purpose of this article was to identify other distributions from this family with applicability in flood frequency analysis compared to the distributions already used in the literature from these families such as the PGIII and PGII. This article does not exclude the applicability of other distributions from other families (Gamma, GEV, Beta) in flood frequency analysis, especially since these families were also analyzed within the research carried out in the Faculty of Hydrotechnics and are presented in other materials [22,23,24].
Comparing the results and choosing the best-fitted distribution is based on the L-skewness ($\tau_3$) and L-kurtosis ($\tau_4$) values and diagram. The values of the RME and RAE indicators are presented for information only, since they properly evaluate only the probability range of the observed values. Outside this range (the left-hand, upper part of the graph), they lose their relevance because, in general, the data sets in Romania do not reach a sufficient length (n > 80).
This article is organized as follows: Section 2.1 describes the statistical distributions by presenting the density function, the complementary cumulative function and the quantile function. Section 2.2 presents the exact and approximate relations for determining the parameters of the distributions. Section 3 presents the case study, in which these distributions are applied in flood frequency analysis for the Prigor River. Results, discussions and conclusions are in Section 4, Section 5 and Section 6.
2. Methods
The frequency analysis consists of determining the flows with certain exceedance probabilities using the AMS and the AES, respectively, with the Prigor River as a case study.
The annual maximum series is formed by choosing the maximum flow of each year. In many cases, the lower values of the annual maximum series do not represent a “flood”. Frequency analysis using the AES is therefore required, because it admits secondary events of a given year that exceed the annual maxima of other years and can thus be considered “floods”.
The AES was established by sorting all independent maximum values in descending order and choosing the first “n” values, where “n” is the number of years of analysis [14,15].
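The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `annual_exceedance_series` and the peak values are hypothetical, and the peaks are assumed to have already been screened for independence.

```python
def annual_exceedance_series(independent_peaks, n_years):
    """Return the n_years largest independent peak flows, sorted descending.

    Assumes the peaks were already checked for independence
    (no two values originate from the same flood).
    """
    if n_years > len(independent_peaks):
        raise ValueError("need at least n_years independent peaks")
    return sorted(independent_peaks, reverse=True)[:n_years]


# Hypothetical independent peaks (m3/s) taken from a 3-year record.
peaks = [12.0, 45.5, 30.2, 8.7, 51.0, 27.3]
aes = annual_exceedance_series(peaks, n_years=3)
print(aes)
```

Unlike the AMS, the resulting series may contain several peaks from one year and none from another.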
Using this criterion, it is important to verify that two or more maximum flow values do not come from the same flood. The independence of flows was verified and established based on the Cunnane criterion and the USWRC 1976 criterion, respectively [27].
The determination of the maximum flows was carried out in stages according to Figure 1. The verification of the character of outliers (Grubbs, Pilon, Quartile method), homogeneity and independence of flows was carried out in the data curation phase. No outliers were detected.
The parameters of the analyzed distributions were estimated with MOM and L-moments. MOM estimation has the disadvantage that, for small data series, it often generates unrealistic values, because the high-order moments require correction [4,5,28,29]. For skewness, the correction can be made using the Bobee relation [26] or, as is the practice in Romania [22,30,31,32,33], by multiplying the coefficient of variation ($C_v$) by a coefficient that reflects the origin of the maximum flows. In many cases, this choice is subjective and lacks rigor [22,34].
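As an illustration of the multiplication-coefficient practice described above, the following sketch computes the coefficient of variation of a series and derives the skewness from it. The flows are hypothetical, the function name `skewness_from_cv` is introduced only for this example, and the use of the sample standard deviation (divisor n − 1) is an assumption rather than something stated in the text.

```python
import statistics


def skewness_from_cv(flows, multiplier):
    """Return (Cv, Cs) where Cs = multiplier * Cv.

    Cv is the coefficient of variation; the sample standard deviation
    (n - 1 divisor) is an assumption made for this sketch.
    """
    mean = statistics.fmean(flows)
    std = statistics.stdev(flows)
    cv = std / mean
    return cv, multiplier * cv


# Hypothetical annual maximum flows (m3/s), for illustration only.
flows = [10.0, 20.0, 30.0, 40.0, 100.0]
cv, cs = skewness_from_cv(flows, multiplier=3.0)
print(f"Cv = {cv:.4f}, Cs = {cs:.4f}")
```

The skewness obtained this way depends only on the first two moments of the sample and on the chosen multiplier, which is why the choice of multiplier carries so much weight.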
Because estimating the parameters of the analyzed distributions often requires solving systems of nonlinear equations, approximate parameter estimation relations were determined for the distributions in which skewness and L-skewness depend on a single parameter. In addition, since the inverse function can be expressed with the frequency factor for both MOM [4,23,34] and L-moments [23], approximation relations for the frequency factor (and the coefficients of these relations) are provided for the most frequently used exceedance probabilities in the analysis of maximum flows, allowing a simplified and fast calculation. The estimation errors of both the parameters and the frequency factors are between $10^{-3}$ and $10^{-4}$.
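To make the role of the frequency factor concrete, the sketch below evaluates a quantile from a given frequency factor. The MOM form (mean plus frequency factor times standard deviation) is the classical Chow relation; the L-moments form, which replaces the mean and standard deviation by the first two L-moments, is an assumption about how the factor is defined in this context. The function names and numerical inputs are hypothetical.

```python
def quantile_mom(mean, std, k_p):
    """Chow's general relation: x_p = mean + K_p * std."""
    return mean + k_p * std


def quantile_lmom(l1, l2, k_p):
    """Assumed L-moments analogue: x_p = L1 + K_p * L2."""
    return l1 + k_p * l2


# Hypothetical inputs: the frequency factor K_p would come from the
# distribution's exact expression or from an approximation relation.
x_mom = quantile_mom(mean=10.0, std=2.0, k_p=3.0)
x_lmom = quantile_lmom(l1=10.0, l2=1.0, k_p=5.5)
print(x_mom, x_lmom)
```

Because the frequency factor depends only on the shape of the distribution and on the exceedance probability, it can be tabulated or approximated once and reused for any site.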
The quantile results are compared with those of the Pearson III distribution, which is the “parent” distribution in Romania in the analysis of extreme events in hydrology, especially in flood frequency analysis [22,30,31].
2.1. Probability Distributions
Table 2 presents the probability density function, the complementary cumulative distribution function and the quantile function of the analyzed distributions [4,5,6,7,8,9,14,15,16,17]. All complementary cumulative distribution functions and quantile functions of the analyzed distributions were determined using the methodology presented in the Supplementary File, starting only from the probability density function.
2.2. Parameter Estimation
The parameter estimation of the analyzed statistical distributions is presented for the MOM and the L-moments method, two of the most used parameter estimation methods in hydrology [3,4,5,7,22].
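Since every L-moments estimation in the following subsections starts from the sample L-moments, a minimal sketch of their computation may be useful. It uses the standard unbiased probability-weighted-moment estimators of Hosking; the function name `sample_l_moments` and the test data are hypothetical and are not taken from the article.

```python
def sample_l_moments(data):
    """Return (l1, l2, t3, t4): the first two sample L-moments,
    L-skewness and L-kurtosis, via unbiased probability weighted moments."""
    x = sorted(data)  # ascending order
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are required")
    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x, start=1):  # i is the ascending rank
        w = 1.0
        for r in range(4):
            b[r] += w * xi
            if r < 3:
                # weight for order r + 1: product of (i - j) / (n - j), j = 1..r+1
                w *= (i - 1 - r) / (n - 1 - r)
    b0, b1, b2, b3 = (v / n for v in b)
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    l4 = 20.0 * b3 - 30.0 * b2 + 12.0 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


# Hypothetical symmetric sample: L-skewness should be zero.
l1, l2, t3, t4 = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(l1, l2, t3, t4)
```

The ratios `t3` and `t4` are the L-skewness and L-kurtosis used later for selecting the best-fitted distribution.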
2.2.1. Generalized Pareto Type IV (PGIV4)
The equations needed to estimate the parameters with MOM have the following expressions:
Because they are too long, the relations for estimating the skewness and the kurtosis are presented in Appendix F.
The equations needed to estimate the parameters with L-moments have the following expressions:
where
represent the first four linear moments.
2.2.2. Generalized Pareto Type IV (PGIV3)
The distribution represents a particular case of the Pareto IV distribution in which the position parameter is zero. It is also known as the beta_p or Singh–Maddala distribution [9,35].
The equations needed to estimate the parameters with MOM have the following expressions:
The equations needed to estimate the parameters with L-moments have the following expressions [35]:
2.2.3. Generalized Pareto Type III (PGIII)
The equations needed to estimate the parameters with MOM have the following expressions [3,4]:
The shape parameter can be obtained approximately depending on the skewness coefficient using the following function:
The equations needed to estimate the parameters with L-moments have the following expressions:
2.2.4. Generalized Pareto Type II (PGII)
The equations needed to estimate the parameters with MOM have the following expressions [3,4,5,36]:
The shape parameter can be obtained approximately depending on the skewness coefficient using the following functions:
The frequency factor (for MOM), presented in Appendix B, can be obtained approximately using a polynomial function of skewness and probability, whose coefficients are presented in Table A4 from Appendix D.
The equations needed to estimate the parameters with L-moments have the following expressions:
Based on these equations, the parameters have the following expressions:
The frequency factor (for L-moments) can be obtained approximately using a polynomial function depending on skewness and probability, whose coefficients are presented in Table A5 from Appendix D.
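For orientation, the L-moments estimation of a three-parameter Generalized Pareto distribution can be sketched with the widely used estimators of Hosking and Wallis. This is a sketch under a stated assumption: the parameterization below (location `xi`, scale `alpha`, shape `k`, quantile $x(F) = \xi + \frac{\alpha}{k}\left[1-(1-F)^k\right]$) is the Hosking and Wallis convention and may differ in sign or notation from the PGII form used in this article. Function names and inputs are hypothetical.

```python
import math


def gpa_from_l_moments(l1, l2, t3):
    """Hosking-Wallis L-moment estimators for the Generalized Pareto
    distribution (location xi, scale alpha, shape k)."""
    k = (1.0 - 3.0 * t3) / (1.0 + t3)
    alpha = (1.0 + k) * (2.0 + k) * l2
    xi = l1 - (2.0 + k) * l2
    return xi, alpha, k


def gpa_quantile(p_exceed, xi, alpha, k):
    """Quantile for exceedance probability p = 1 - F."""
    if abs(k) < 1e-12:  # exponential limit as k -> 0
        return xi - alpha * math.log(p_exceed)
    return xi + alpha / k * (1.0 - p_exceed ** k)


# Hypothetical L-moments that correspond to xi = 0, alpha = 1, k = 0.5.
xi, alpha, k = gpa_from_l_moments(l1=2.0 / 3.0, l2=4.0 / 15.0, t3=1.0 / 7.0)
q_1pct = gpa_quantile(0.01, xi, alpha, k)
print(xi, alpha, k, q_1pct)
```

In this convention the parameters follow explicitly from the first two L-moments and the L-skewness, so no iterative solution is needed.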
2.2.5. Generalized Pareto Type I (PGI)
The equations needed to estimate the parameters with MOM have the following expressions:
The shape parameter can be obtained approximately depending on the skewness coefficient using the following functions:
The equations needed to estimate the parameters with L-moments have the following expressions:
Based on these equations, the parameters have the following expressions:
2.2.6. The Five-Parameter Wakeby Distribution (WK5)
The equations needed to estimate the parameters with MOM have the following expressions:
The equations for skewness and kurtosis are presented in Appendix E.
The equations needed to estimate the parameters with L-moments have the following expressions [7,8]:
3. Case Study
The presented case study consists of determining the maximum annual flows on the Prigor River, Romania, using the proposed probability distributions.
The Prigor River is a left tributary of the Nera River, located in the south-western part of Romania, as shown in Figure 2. The geographical coordinates of the location are 44°55′25.5″ N 22°07′21.7″ E.
The main morphometric characteristics of the river are presented in Table 3 [37].
In the section of the hydrometric station, the watershed area is 141 km² and the average altitude is 729 m. The river has a length of 33 km, with an average slope of 22‰ and a sinuosity coefficient of 1.83.
There are 31 annual maximum flows; the values are presented in Table 4.
For the analysis with the AES, the maximum flows resulting from the selection are presented in Table 5.
The main statistical indicators of the data series are presented in Table 6.
4. Results
The proposed distributions from the Generalized Pareto family were applied to perform a flood frequency analysis on the Prigor River, using the annual maximum series (AMS) and the annual exceedance series (AES).
The MOM and the L-moments method were used to estimate the parameters of the distributions. For the MOM, the skewness coefficient was chosen depending on the origin of the flows, according to Romanian regulations [22,30], by applying a multiplication coefficient to the coefficient of variation ($C_v$). For the Prigor River, the multiplication coefficient of 3 resulted in a skewness of 2.29 for AMS and 1.786 for AES. Table 7 and Table 8 present the resulting quantile values of the distributions for some of the most common exceedance probabilities in flood frequency analysis.
Figure 3 and Figure 4 show the fitted distributions for AMS and AES for the Prigor River. For plotting positions, the Alexeev formula was used [2].
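The empirical exceedance probabilities used for plotting can be sketched as follows. The form used here, $p_i = (i - 0.25)/(n + 0.5)$ for the i-th value in descending order, is the expression commonly attributed to Alexeev; it is stated as an assumption, since the article itself does not reproduce the formula. The function name is hypothetical.

```python
def alexeev_plotting_positions(n):
    """Empirical exceedance probabilities for ranks 1..n (descending order),
    assuming the Alexeev form p_i = (i - 0.25) / (n + 0.5)."""
    return [(i - 0.25) / (n + 0.5) for i in range(1, n + 1)]


# For a series of 31 values, as in the case study.
positions = alexeev_plotting_positions(31)
print(positions[0], positions[-1])
```

With 31 values, the largest observed flow is assigned an exceedance probability of about 2.4%, so quantiles for rarer events necessarily lie beyond the range of the observations.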
Figure S1 in the Supplementary Material presents the confidence interval for each analyzed distribution, for both the MOM and L-moments, using Chow’s relation [4,36] for a 95% confidence level.
Table 9 shows the values of the distribution parameters for the two estimation methods and for both the AMS and AES.
Considering that it is desired to apply the distributions using the L-moments method, choosing the best distribution is based on the L-skewness ($\tau_3$) and L-kurtosis ($\tau_4$) values and diagram. The values of the RME and RAE indicators [38,39,40] are presented for information only, knowing that they are relevant only in the range of the observed data.
The distribution performance values are presented in Table 10 and Table 11. The values for the best-fitted model are highlighted in bold.
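For readers who want to reproduce the two indicators, the sketch below uses one common definition from the hydrological literature: the relative mean error as the root of the summed squared relative errors divided by n, and the relative absolute error as the mean absolute relative error. These definitions are an assumption, since the article does not restate them, and the observed and estimated values are hypothetical.

```python
import math


def rme(observed, estimated):
    """Relative mean error: sqrt(sum(((x - q) / x)^2)) / n (assumed definition)."""
    n = len(observed)
    total = sum(((x - q) / x) ** 2 for x, q in zip(observed, estimated))
    return math.sqrt(total) / n


def rae(observed, estimated):
    """Relative absolute error: sum(|(x - q) / x|) / n (assumed definition)."""
    n = len(observed)
    return sum(abs((x - q) / x) for x, q in zip(observed, estimated)) / n


# Hypothetical observed flows and fitted quantiles at the same plotting positions.
observed = [10.0, 20.0]
estimated = [11.0, 18.0]
rme_value = rme(observed, estimated)
rae_value = rae(observed, estimated)
print(rme_value, rae_value)
```

Both indicators compare the fitted quantiles only at the plotting positions of the observations, which is why they say nothing about the extrapolated tail.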
5. Discussions
In this article, the applicability of the distributions from the Generalized Pareto family in flood frequency analysis was analyzed using the Prigor River as a case study.
The analysis was performed using the AMS and AES. As can be seen both graphically (Figure 4) and in tabular form (Table 7), the analysis with the AMS is more conservative than the analysis with the AES.
The main advantage of the AMS analysis is the ease of data selection, the values being chosen as the maximum flow corresponding to each year. Its disadvantage is the use of the maximum flows characteristic of each year, which do not always represent floods. These values, located on the right-hand side of the graph (high probabilities), lead to a steeper curve with higher quantile values in the range of low probabilities (left-hand side).
The advantage of the analysis with AES is that the flows of the data series always represent floods. The disadvantage is the greater effort in data selection, which requires additional analyses of data independence to ensure that the maximum flows do not come from the same flood.
The parameters of the analyzed distributions were estimated with the MOM (the standard method in Romania) and L-moments, two of the most used estimation methods in hydrology.
For the MOM analysis, the skewness was chosen depending on the origin of the flows, as is the hydrological practice in Romania. The use of multiplication coefficients for the calculation of the corrected skewness is an outdated method based on principles from the abrogated norms and a legacy from the USSR normative standards.
All analyzed distributions represent particular cases of the Generalized Pareto distribution and have three or four parameters. The Wakeby distribution, a five-parameter distribution, was analyzed because it has the PGII distribution as a particular case: it is a quantile function whose structure is made up of two quantile functions of the PGII distribution.
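The structure of the Wakeby quantile function can be illustrated with a short sketch. It assumes the Hosking parameterization, $x(F) = \xi + \frac{\alpha}{\beta}\left[1-(1-F)^{\beta}\right] - \frac{\gamma}{\delta}\left[1-(1-F)^{-\delta}\right]$, which may differ in notation from the form used in this article; the function names and parameter values are hypothetical.

```python
import math


def gpa_term(u, scale, shape):
    """Generalized Pareto-type quantile term scale / shape * (1 - u**shape),
    where u = 1 - F; returns the logarithmic limit when shape is zero."""
    if scale == 0.0:
        return 0.0
    if shape == 0.0:
        return -scale * math.log(u)
    return scale / shape * (1.0 - u ** shape)


def wakeby_quantile(F, xi, alpha, beta, gamma, delta):
    """Wakeby quantile (Hosking form) as a location plus two Pareto-type terms."""
    u = 1.0 - F
    return xi + gpa_term(u, alpha, beta) + gpa_term(u, gamma, -delta)


# With gamma = 0 the second term vanishes and only one Pareto-type term remains.
x_reduced = wakeby_quantile(0.99, xi=0.0, alpha=1.0, beta=0.5, gamma=0.0, delta=0.0)
x_full = wakeby_quantile(0.99, xi=0.0, alpha=1.0, beta=0.5, gamma=2.0, delta=0.1)
print(x_reduced, x_full)
```

When the scale of the second term is zero, the five-parameter form collapses to a three-parameter Generalized Pareto quantile, which is the reduction referred to in the text.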
All the results obtained in the case study are presented in comparison to the Pearson III distribution, which is considered the “parent” distribution in Romania, for the most used exceedance probabilities in hydrology.
Considering that it is desired to implement the L-moments method in Romania, according to Table 10 and Table 11, the best-fitted distributions are the PGIV4 and WK5 distributions, which best approximate the statistical indicators of the data set, $\tau_3$ and $\tau_4$. The PGII and PGI distributions give satisfactory results because the natural values $\tau_3$ and $\tau_4$ of the distributions are close to those of the data sets.
For the PE3, PGIV3, PGIII, PGII and PGI, which are three-parameter distributions, the resulting values are characterized by a high degree of uncertainty, especially in the range of small exceedance probabilities (left-hand side), because a proper calibration of the higher moments cannot be achieved. The confidence intervals for the analyzed distributions are presented in the Supplementary Material.
As observed in other materials [
24], the apparent stability of the Pearson III distribution is due to the fact that the variation of the shape parameter for the two estimation methods does not differ much, except in the upper area of
and
. The same cannot be said, for example, about the PGI, PGII and PGIII distributions, in which the variation is extremely large.
Figure 5 shows the variation graph of the shape parameter for the PE3, PGIII, PGII and PGI distributions. As can be seen in Section 2.1, both skewness and L-skewness depend only on the shape parameter.
The results of the quantiles obtained with the L-moments method, for the PGII and PGI distributions, both for the AES and AMS, presented in
Table 4 and
Table 5, are the same, the two distributions being mutual special cases. The WK5 and PGIV4 distributions best approximate, as expected, the values of the indicators obtained with L-moments, which are distributions of five and four parameters, respectively.
Figure 6 shows the L-skewness and L-kurtosis variation diagram of the distributions, as well as their relation to the values of the two indicators of the data sets.
Regarding the results obtained with MOM, for AES it can also be observed that the PGII and WK5 distributions have extremely close quantile values, which is due to the fact that for the same
they correspond to the same value of
, the WK5 distribution becoming the PGII distribution, as it was highlighted in other materials [
4,
8,
9,
22]. This is due to the choice of skewness based on the origin of the maximum flows. This is another disadvantage of using MOM in Romania.
Concerning the WK5 distribution, although it was introduced in flood frequency analysis to reproduce the so-called “separation effect” described by Matalas [9,22], it can be seen that it is extremely sensitive to the type of analysis used (AMS or AES), owing to the particular cases that it can take.
Figure 7 shows the skewness-kurtosis variation diagram of the distributions.
6. Conclusions
The Generalized Pareto distribution represents a usual distribution used in the analysis of extreme events in hydrology. In flood frequency analysis this is especially applied using the partial series of maximum flows.
This article presents five distributions of three, four and five parameters that represent different forms of the Generalized Pareto distribution, some of which have received limited attention in flood frequency analysis.
These distributions were analyzed (besides other families of distributions) in the research carried out in the Faculty of Hydrotechnics regarding the elaboration of a norm in Romania for the frequency analysis using the L-moments method.
The main purpose of the article was to identify other distributions from the same family that have applicability in flood frequency analysis using both AMS and AES.
For the transparency and ease of use of these distributions, all the necessary elements for their use are presented, such as the exact and approximate relations of the parameters’ estimation, and of the frequency factors, which eliminates the need for iterative numerical calculation, thus facilitating their applicability.
The performances of these distributions were verified in flood frequency analysis of the Prigor River, using the Annual Maximum Series and the Annual Exceedance Series.
The results were evaluated by comparing the $\tau_3$ and $\tau_4$ values of the distributions, based on the $\tau_3$-$\tau_4$ diagram, with those of the data sets, which is the easiest and most accessible selection criterion.
Based on this study’s results, and also from the research carried out in the Faculty of Hydrotechnics for other sites, for flood frequency analysis and the L-moments estimation method, good candidates, from the Generalized Pareto family, are the PGIV4 and WK5 distributions, which are distributions of four and five parameters, which have the advantage that they can calibrate all linear moments.
Regarding the Wakeby distribution, this requires an additional analysis because in some cases it turns into the PGII distribution, which is a three-parameter distribution that does not achieve a satisfactory calibration of the linear moments.
In general, the three-parameter distributions can be used in the analysis with L-moments, but they must be selected based on the $\tau_3$-$\tau_4$ diagram, so that the natural values $\tau_3$ and $\tau_4$ of the distribution are very close to those of the observed data. Based on the work of Anghel and Ilinca [23], Appendix A presents the $\tau_3$-$\tau_4$ diagram for a wide range of distributions used in hydrology.
Mathematical support in statistical analysis is useful because the use of software (EasyFit, HEC-SSP, etc.) without knowledge of the mathematical foundations often leads to superficial analyses with negative consequences. Another reason for presenting all the mathematical elements necessary for the application of these distributions is that the software dedicated to statistical analysis is limited and does not offer the possibility of choosing the skewness coefficient depending on the origin of the maximum flows, as is the practice in Romania.
The research in this article is part of a more complex research carried out within the Faculty of Hydrotechnics, with the main aim of establishing the necessary guidelines for a robust, clear and concise norm regarding the determination of the maximum flow using the L-moment estimation method.