Probability Models and Statistical Tests for Extreme Precipitation Based on Generalized Negative Binomial Distributions

Korolev, Victor; Gorshenin, Andrey

doi:10.3390/math8040604

by Victor Korolev ^1,2,3,4 and Andrey Gorshenin ^1,2,3,*

¹ Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University, 119991 Moscow, Russia

² Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, 119991 Moscow, Russia

³ Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 119333 Moscow, Russia

⁴ Department of Mathematics, School of Science, Hangzhou Dianzi University, Hangzhou 310018, China

* Author to whom correspondence should be addressed.

Mathematics 2020, 8(4), 604; https://doi.org/10.3390/math8040604

Submission received: 4 April 2020 / Revised: 12 April 2020 / Accepted: 14 April 2020 / Published: 16 April 2020

(This article belongs to the Special Issue Stability Problems for Stochastic Models: Theory and Applications)

Abstract

Mathematical models are proposed for statistical regularities of maximum daily precipitation within a wet period and total precipitation volume per wet period. The proposed models are based on the generalized negative binomial (GNB) distribution of the duration of a wet period. The GNB distribution is a mixed Poisson distribution, the mixing distribution being generalized gamma (GG). The GNB distribution demonstrates excellent fit with real data of durations of wet periods measured in days. By means of limit theorems for statistics constructed from samples with random sizes having the GNB distribution, asymptotic approximations are proposed for the distributions of maximum daily precipitation volume within a wet period and total precipitation volume for a wet period. It is shown that the exponent power parameter in the mixing GG distribution matches slow global climate trends. The bounds for the accuracy of the proposed approximations are presented. Several tests for daily precipitation, total precipitation volume and precipitation intensities to be abnormally extremal are proposed and compared to the traditional peaks-over-threshold (PoT) method. The results of the application of these tests to real data are presented.

Keywords:

precipitation; limit theorems; statistical test; generalized negative binomial distribution; generalized gamma distribution; asymptotic approximations; extreme order statistics; random sample size

MSC:

60F05; 62G30; 62E20; 62P12; 65C20

1. Introduction

In this paper, we continue the research we started in [1,2]. We develop the mathematical models for statistical regularities in precipitation proposed in the papers mentioned above. We consider the models for the statistical regularities in the duration of a wet period, maximum daily precipitation within a wet period and total precipitation volume per wet period. The base for the models is the generalized negative binomial (GNB) distribution introduced in the recent paper [3]. The GNB distribution is a mixed Poisson distribution, the mixing distribution being generalized gamma (GG). The results of fitting the GNB distribution to real data are presented and demonstrate excellent concordance of the GNB model with the empirical distribution of the duration of wet periods measured in days. Based on this GNB model, asymptotic approximations are proposed for the distributions of the maximum daily precipitation volume within a wet period and of the total precipitation volume for a wet period. The asymptotic distribution of the maximum daily precipitation volume within a wet period turns out to be a tempered scale mixture of the gamma distribution in which the scale factor has the Weibull distribution, whereas the asymptotic approximation for the total precipitation volume for a wet period turns out to be the GG distribution. These asymptotic approximations are deduced using limit theorems for statistics constructed from samples with random sizes having the GNB distribution. The bounds for the accuracy of the proposed approximations are discussed theoretically and illustrated statistically. The proposed approximations appear to be very accurate. Based on these models, two approaches are proposed to the definition of abnormally extremal precipitation. These approaches can be regarded as a further development of those proposed in [2].

The importance of the problem of modeling statistical regularities in extreme precipitation is indisputable. Understanding climate variability and trends at relatively large time horizons is of crucial importance for long-range business, say, agricultural projects and forecasting of risks of water floods, dry spells and other natural disasters. Modeling regularities and trends in heavy and extreme daily precipitation is important for understanding climate variability and change at relatively small or medium time horizons. However, these models are much more uncertain as compared to those derived for mean precipitation or total precipitation during a wet period. In [4], a detailed review of this phenomenon is presented and it is noted that, at least for the European continent, most results hint at a growing intensity of heavy precipitation over the last decades.

In [2], we proposed a reasonable approach to the unambiguous (algorithmic) determination of extreme or abnormally heavy total precipitation for a wet period. This approach was based on the negative binomial (NB) model for the duration of wet periods measured in days, and, as a consequence, on the distribution of the total precipitation volume during a wet period. This approach has some advantages. First, estimates of the parameters of the total precipitation are weakly affected by the accuracy of the daily records and are less sensitive to missing values. Second, the corresponding mathematical models are theoretically based on limit theorems of probability theory that yield unambiguous asymptotic approximations, which are used as adequate mathematical models. Third, this approach gives an unambiguous algorithm for the determination of extreme or abnormally heavy total precipitation that does not involve statistical significance problems owing to the low occurrence of such (relatively rare) events.

The problem of the construction of a statistical test for the precipitation volume to be abnormally large can be mathematically formalized as follows. Let $m ⩾ 2$ be a natural number and consider a sample of m positive observations $X_{1}, X_{2}, \dots, X_{m}$. With finite m, among the $X_{i}$’s there is always an extreme observation, say, $X_{1}$, such that $X_{1} ⩾ X_{i}$, $i = 1, 2, \dots, m$. Two cases are possible: (i) $X_{1}$ is a ‘typical’ observation and its extreme character is conditioned by purely stochastic circumstances (there must be an extreme observation within a finite homogeneous sample) and (ii) $X_{1}$ is abnormally large so that it is an ‘outlier’ and its extreme character is due to some exogenous factors.

To construct a test for distinguishing between these two cases for abnormally extreme daily precipitation, we use the fact that the distribution of the maximum daily precipitation per wet period is a tempered scale mixture of the gamma distribution in which the scale factor has the Weibull distribution. According to this model, a daily precipitation volume is considered to be abnormally extremal if it exceeds a certain (pre-defined) quantile of this distribution.

As regards testing for anomalous extremeness of total precipitation volume during a wet period, we use the GG distribution as the model of statistical regularities of its behavior. The theoretical grounds for this model are provided by the law of large numbers for random sums in which the number of summands has the GNB distribution. It turns out that, as compared to the ordinary negative binomial (NB) model (see [2]), the additional exponent power parameter in the corresponding GG distribution matches slow global climate trends. Hence, the hypothesis that the total precipitation volume during a certain wet period is abnormally large can be re-formulated as the homogeneity hypothesis of a sample from the GG distribution. Two equivalent tests are proposed for testing this hypothesis. One of them is based on the beta distribution whereas the second is based on the Snedecor–Fisher distribution. Both of these tests deal with the relative contribution of the total precipitation volume for a wet period to the considered set (sample) of successive wet periods. Within the second approach, it is possible to introduce the notions of relatively abnormal and absolutely abnormal precipitation volumes. These tests are scale-free and depend only on the easily estimated shape parameter of the GNB distribution and the time-scale parameter determining the denominator in the fractional contribution of a wet period under consideration. The tests appeared to be applicable not only to total precipitation volumes over wet periods but also to the precipitation intensities (the ratios of total precipitation volumes per wet periods to the durations of the corresponding wet periods measured in days).

2. Generalized Negative Binomial Model for the Duration of Wet Periods

The main results of this paper strongly rely on the GNB model, a wide and flexible family of discrete distributions that are mixed Poisson laws with the mixing GG distribution. Namely, we say that a random variable $N_{r, γ, μ}$ ($r > 0$, $γ \in R$ and $μ > 0$) has the generalized negative binomial distribution if

P (N_{r, γ, μ} = k) = \frac{1}{k!} \int_{0}^{\infty} e^{- z} z^{k} g^{*} (z; r, γ, μ) d z, k = 0, 1, 2, \dots,

(1)

where $g^{*} (z; r, γ, μ)$ is the density of the GG distribution:

g^{*} (x; r, γ, μ) = \frac{| γ | μ^{r}}{Γ (r)} x^{γ r - 1} e^{- μ x^{γ}}, x ⩾ 0,

(2)

with $γ \in R$, $μ > 0$, $r > 0$. The GNB distributions seem to be very promising in the statistical description of many real phenomena, being very convenient and almost universal models.
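Definitions (1) and (2) can be explored numerically. The following is a minimal sketch, not the authors' algorithm: it estimates the GNB probabilities by Monte Carlo, sampling the mixing variable as a power of a gamma variable and averaging the Poisson probabilities. The function names, the sample size and the restriction to $γ > 0$ are assumptions made for the illustration. For $γ = 1$ the estimate can be compared with the NB probabilities with $p = μ / (1 + μ)$.

```python
import math
import random


def gnb_pmf_mc(k_max, r, gamma, mu, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(N = k), k = 0..k_max, for the GNB law (1).

    Uses the mixed Poisson structure: N | Lambda ~ Poisson(Lambda), where
    Lambda has the GG density (2). Lambda is sampled as G ** (1 / gamma)
    with G gamma-distributed (shape r, rate mu). Assumes gamma > 0.
    """
    rng = random.Random(seed)
    probs = [0.0] * (k_max + 1)
    for _ in range(n_samples):
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        lam = rng.gammavariate(r, 1.0 / mu) ** (1.0 / gamma)
        term = math.exp(-lam)  # Poisson probability of k = 0
        for k in range(k_max + 1):
            probs[k] += term
            term *= lam / (k + 1)  # p_{k+1} = p_k * lam / (k + 1)
    return [p / n_samples for p in probs]


def nb_pmf(k, r, p):
    """NB probability Gamma(r + k) / (k! Gamma(r)) * p^r * (1 - p)^k."""
    log_coef = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1.0 - p))


if __name__ == "__main__":
    r, p = 0.85, 0.3
    estimate = gnb_pmf_mc(5, r, 1.0, p / (1.0 - p))
    for k, value in enumerate(estimate):
        print(k, round(value, 4), round(nb_pmf(k, r, p), 4))
```

Averaging the conditional Poisson probabilities, rather than counting simulated Poisson draws, keeps the variance of the estimate low and avoids the need for a Poisson sampler.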

It is necessary to explain why this combination of the mixed and mixing distributions is considered. First of all, the Poisson kernel is used as the mixed distribution for the following reasons. Pure Poisson processes can be regarded as the best models of stationary (time-homogeneous) chaotic flows of events [5]. Recall that the attractiveness of a Poisson process as a model of homogeneous discrete stochastic chaos is due to at least two circumstances. First, Poisson processes are point processes characterized by the time intervals between successive points that are independent random variables (r.v.’s) with one and the same exponential distribution, and, as is well known, the exponential distribution possesses the maximum differential entropy among all absolutely continuous distributions concentrated on the nonnegative half-line with a given finite expectation, whereas the entropy is a natural and convenient measure of uncertainty. Second, the points forming the Poisson process are uniformly distributed along the time axis in the sense that for any finite time interval $[t_{1}, t_{2}]$, $t_{1} < t_{2}$, the conditional joint distribution of the points of the Poisson process that fall into the interval $[t_{1}, t_{2}]$ under the condition that the number of such points is fixed and equals, say, n, coincides with the joint distribution of the order statistics constructed from an independent sample of size n from the uniform distribution on $[t_{1}, t_{2}]$, whereas the uniform distribution possesses the maximum differential entropy among all absolutely continuous distributions concentrated on finite intervals and very well corresponds to the conventional impression of an absolutely unpredictable random variable (see, e.g., [5,6]). However, in actual practice, as a rule, the parameters of the chaotic stochastic processes are influenced by poorly predictable “extrinsic” factors, which can be regarded as stochastic so that most reasonable probabilistic models of non-stationary (time-non-homogeneous) chaotic point processes are doubly stochastic Poisson processes, also called Cox processes (see, e.g., [5,7,8]). These processes are defined as Poisson processes with stochastic intensities. Such processes proved to be adequate models in insurance [5,7,8], financial mathematics [9], physics [10] and many other fields. Their one-dimensional distributions are mixed Poisson.

In order to have a flexible model of a mixing distribution that is “responsible” for the description of statistical regularities of the manifestation of external stochastic factors, we suggest using the GG distributions defined by the density (2). The class of GG distributions was first described as a unitary family in 1962 by E. Stacy [11] as the class of probability distributions simultaneously containing both Weibull and gamma distributions. The family of GG distributions contains practically all the most popular absolutely continuous distributions concentrated on the non-negative half-line. In particular, the family of GG distributions contains:

• The gamma distribution ($γ = 1$) and its special cases

∘: The exponential distribution ($γ = 1$, $r = 1$),
∘: The Erlang distribution ($γ = 1$, $r \in N$),
∘: The chi-square distribution ($γ = 1$, $μ = \frac{1}{2}$);

• The Nakagami distribution ($γ = 2$);

• The half-normal (folded normal) distribution (the distribution of the maximum of a standard Wiener process on the interval $[0, 1]$) ($γ = 2$, $r = \frac{1}{2}$);

• The Rayleigh distribution ($γ = 2$, $r = 1$);

• The chi-distribution ($γ = 2$, $μ = 1 / \sqrt{2}$);

• The Maxwell distribution (the distribution of the absolute values of the velocities of molecules in a dilute gas) ($γ = 2$, $r = \frac{3}{2}$);

• The Weibull–Gnedenko distribution (the extreme value distribution of type III) ($r = 1$, $γ > 0$);

• The (folded) exponential power distribution ($γ > 0$, $r = \frac{1}{γ}$);

• The inverse gamma distribution ($γ = - 1$) and its special case

∘: The Lévy distribution (the one-sided stable distribution with the characteristic exponent $\frac{1}{2}$ – the distribution of the first hit time of the unit level by the Brownian motion) ($γ = - 1$, $r = \frac{1}{2}$);

• The Fréchet distribution (the extreme value distribution of type II) ($r = 1$, $γ < 0$)

and other laws. The limit point of the class of GG-distributions is

• The log-normal distribution ($r \to \infty$).
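Several of the special cases listed above can be checked directly against the density (2). The sketch below is an illustration under stated assumptions: the comparison densities use unit scale, and the value $μ = \frac{1}{2}$ used for the half-normal and Rayleigh cases is the choice that reproduces the standard (unit-variance) versions of these laws in the parametrization of (2). All function names are hypothetical.

```python
import math


def gg_density(x, r, gamma, mu):
    """GG density (2) for x > 0 (the value at x <= 0 is returned as 0)."""
    if x <= 0.0:
        return 0.0
    return (abs(gamma) * mu ** r / math.gamma(r)
            * x ** (gamma * r - 1.0) * math.exp(-mu * x ** gamma))


def exponential_density(x, rate):
    return rate * math.exp(-rate * x)


def weibull_density(x, shape):
    # Standard Weibull law with d.f. 1 - exp(-x^shape)
    return shape * x ** (shape - 1.0) * math.exp(-x ** shape)


def half_normal_density(x):
    # Density of |X| for a standard normal X
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)


def rayleigh_density(x):
    # Rayleigh law with unit scale
    return x * math.exp(-0.5 * x * x)


def frechet_density(x, shape):
    # Standard Frechet law with d.f. exp(-x^(-shape)), shape > 0
    return shape * x ** (-shape - 1.0) * math.exp(-x ** (-shape))


if __name__ == "__main__":
    x = 1.3
    print(gg_density(x, 1.0, 1.0, 2.0), exponential_density(x, 2.0))
    print(gg_density(x, 1.0, 0.7, 1.0), weibull_density(x, 0.7))
    print(gg_density(x, 0.5, 2.0, 0.5), half_normal_density(x))
    print(gg_density(x, 1.0, 2.0, 0.5), rayleigh_density(x))
    print(gg_density(x, 1.0, -1.5, 1.0), frechet_density(x, 1.5))
```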

GG distributions are widely applied in many practical problems. There are dozens of papers dealing with the application of GG-distributions as models of regularities observed in practice. Apparently, the popularity of GG-distributions is due to the fact that most of them can serve as adequate asymptotic approximations, since all the representatives of the class of GG-distributions listed above appear as limit laws in various limit theorems of probability theory in rather simple limit schemes. Below we will formulate a general limit theorem (an analog of the law of large numbers) for random sums of independent r.v.’s in which the GG-distributions are limit laws. It is worth noting that the GG distribution and its limit cases give a general form of the exponential distribution of rank 1 for the scale parameter.

In [1], the data registered at two climatically very different locations, Potsdam (Brandenburg, Germany) and Elista (Kalmykia, Russia), were analyzed, and it was demonstrated that the fluctuations of the numbers of successive wet days fit, with very high confidence, the NB distribution with shape parameters $r = 0.847$ and $r = 0.876$, respectively. In the same paper, a schematic attempt was undertaken to explain this phenomenon by the fact that NB distributions can be represented as mixed Poisson laws with mixing gamma distributions whereas, as has already been mentioned, the Poisson distribution is the best model for the discrete stochastic chaos and the mixing distribution accumulates the stochastic influence of factors that can be assumed exogenous with respect to the local system under consideration.

The NB distributions are special cases of the GNB distributions. This family of discrete distributions is very wide and embraces Poisson distributions (as limit points corresponding to a degenerate mixing distribution), NB (Polya) distributions including geometric distributions (corresponding to the gamma mixing distribution, see [12]), Sichel distributions (corresponding to the inverse gamma mixing distributions, see [13,14]), Weibull–Poisson distributions (corresponding to the Weibull mixing distributions, see [15]) and many other types supplying descriptive statistics with many flexible models. More examples of mixed Poisson laws can be found in [8,16].

It is quite natural to expect that, having introduced one more free parameter into the pure negative binomial model, namely, the power parameter in the exponent of the original gamma mixing distribution, instead of the negative binomial model one might obtain a more flexible GNB model that provides an even better fit with the statistical data of the durations of wet periods. The analysis of the real data shows that this is indeed so.

Figure 1 and Figure 2 show the histograms constructed from real data of 3323 wet periods in Potsdam and 2937 wet periods in Elista. The same figures show the graphs of the fitted NB distribution (that is, the GNB distribution with $γ = 1$) and the fitted GNB distribution with additionally adjusted scale and power parameters. For clarity, in the GNB model, the value of the shape parameter r was taken to be the same as that obtained for the NB model: 0.876 for Elista and 0.847 for Potsdam. For the “fine tuning” of the GNB models with these fixed values of r, the minimization of the $ℓ_{1}$-norm of the difference between the histogram and the fitted GNB model was used. Appendix A presents Algorithm A1 for the computation of GNB probabilities by the minimization of the $ℓ_{1}$-, $ℓ_{2}$- and $ℓ_{\infty}$-norms of the difference between the histogram and the fitted GNB model.

The analytic and asymptotic properties of the GNB distributions were studied in [3]. In particular, it was shown in that paper that the GNB distribution with shape parameter and exponent power parameter less than one is actually mixed geometric. The mixed geometric distributions were introduced and studied in [17] (also see [15,18]). A mixed geometric distribution can be interpreted in terms of the Bernoulli trials as follows. First, as a result of some “preliminary” experiment the value of some r.v. taking values in $[0, 1]$ is determined, which is then used as the probability of success in the sequence of Bernoulli trials in which the original “unconditional” mixed Poisson r.v. is nothing else than the “conditionally” geometrically distributed r.v. having the sense of the number of trials up to the first failure. This makes it possible to assume that the sequence of wet/dry days is not independent but is conditionally independent and the random probability of success is determined by some outer stochastic factors. As such, we can consider the seasonality or the type of the cause of a wet period. So, since the GG-distribution is a more general and, hence, a more flexible model than the “pure” gamma distribution, there arises a hope that the GNB distribution could provide an even better goodness of fit to the statistical regularities in the duration of wet periods than the “pure” NB distribution.

3. Notation, Definitions and Mathematical Preliminaries

In the paper, conventional notation is used. The symbols $\overset{d}{=}$ and ⟹ denote the coincidence of distributions and convergence in distribution, respectively.

In what follows, for brevity and convenience, the results will be presented in terms of r.v.’s with the corresponding distributions. It will be assumed that all the r.v.’s are defined on the same probability space $(Ω, F, P)$.

An r.v. having the gamma distribution with shape parameter $r > 0$ and scale parameter $μ > 0$ will be denoted $G_{r, μ}$,

P (G_{r, μ} < x) = \int_{0}^{x} g (z; r, μ) d z, with g (x; r, μ) = \frac{μ^{r}}{Γ (r)} x^{r - 1} e^{- μ x}, x ⩾ 0,

where $Γ (r)$ is Euler’s gamma-function, $Γ (r) = \int_{0}^{\infty} x^{r - 1} e^{- x} d x$, $r > 0$.

In this notation, obviously, $G_{1, 1}$ is an r.v. with the standard exponential distribution: $P (G_{1, 1} < x) = [1 - e^{- x}] 1 (x ⩾ 0)$ (here and in what follows $1 (A)$ is the indicator function of a set A).

A GG-distribution is the absolutely continuous distribution defined by the density (Equation (2)). The distribution function (d.f.) corresponding to the density $g^{*} (x; r, γ, μ)$ will be denoted $F^{*} (x; r, γ, μ)$.

The properties of GG-distributions are described in [11,19]. An r.v. with the density $g^{*} (x; r, γ, μ)$ will be denoted ${\bar{G}}_{r, γ, μ}$. It can be easily verified that

{\bar{G}}_{r, γ, μ} \overset{d}{=} G_{r, μ}^{1 / γ},

(3)

and hence,

{({\bar{G}}_{r, γ, μ})}^{γ} \overset{d}{=} G_{r, μ} .

(4)
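The verification of Equation (3) is a one-line change of variables. The following worked derivation is a sketch for the case $γ > 0$; for $γ < 0$ the inequality is reversed and the factor $| γ |$ appears, which gives the same density.

```latex
% Sketch for \gamma > 0, using only the gamma density g(x; r, \mu) defined above.
\begin{aligned}
P\bigl(G_{r,\mu}^{1/\gamma} < x\bigr)
  &= P\bigl(G_{r,\mu} < x^{\gamma}\bigr)
   = \int_{0}^{x^{\gamma}} g(z; r, \mu)\,dz, \\
\frac{d}{dx}\,P\bigl(G_{r,\mu}^{1/\gamma} < x\bigr)
  &= \gamma x^{\gamma-1}\, g(x^{\gamma}; r, \mu)
   = \gamma x^{\gamma-1}\cdot\frac{\mu^{r}}{\Gamma(r)}\,x^{\gamma(r-1)} e^{-\mu x^{\gamma}} \\
  &= \frac{\gamma\,\mu^{r}}{\Gamma(r)}\,x^{\gamma r-1} e^{-\mu x^{\gamma}}
   = g^{*}(x; r, \gamma, \mu).
\end{aligned}
```

In particular, a GG-distributed value can be simulated by raising a gamma-distributed value to the power $1 / γ$.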

For convenience, for an r.v. with the Weibull distribution, a particular case of GG-distributions corresponding to the density $g^{*} (x; 1, γ, 1)$ and the d.f. $[1 - e^{- x^{γ}}] 1 (x ⩾ 0)$ with $γ > 0$, we will use a special notation $W_{γ}$, that is, $W_{γ} \overset{d}{=} {\bar{G}}_{1, γ, 1}$. Thus, $G_{1, 1} \overset{d}{=} W_{1}$. The density $g^{*} (x; 1, α, 1)$ with $α < 0$ defines the Fréchet or inverse Weibull distribution. It is easy to see that

W_{1}^{1 / γ} \overset{d}{=} W_{γ} .

(5)

An r.v. $N_{r, p}$ is said to have the negative binomial (NB) distribution with parameters $r > 0$ (shape) and $p \in (0, 1)$ (success probability), if

P (N_{r, p} = k) = \frac{Γ (r + k)}{k! Γ (r)} \cdot p^{r} {(1 - p)}^{k}, k = 0, 1, 2, \dots

A particular case of the NB distribution corresponding to the value $r = 1$ is the geometric distribution. Let $p \in (0, 1)$ and let $N_{1, p}$ be the r.v. having the geometric distribution with parameter $p$:

P (N_{1, p} = k) = p {(1 - p)}^{k}, k = 0, 1, 2, \dots

This means that for any $m \in N$

P (N_{1, p} ⩾ m) = \sum_{k = m}^{\infty} p {(1 - p)}^{k} = {(1 - p)}^{m} .

Let Y be an r.v. taking values in the interval $(0, 1)$. Moreover, let for all $p \in (0, 1)$ the r.v. Y and the geometrically distributed r.v. $N_{1, p}$ be independent. Let $M = N_{1, Y}$, that is, $M (ω) = N_{1, Y (ω)} (ω)$ for any $ω \in Ω$. The distribution

P (M ⩾ m) = \int_{0}^{1} {(1 - y)}^{m} d P (Y < y), m \in N,

of the r.v. M will be called Y-mixed geometric [17].

It is well known that the negative binomial distribution is a mixed Poisson distribution with the gamma mixing distribution [12] (also see [20]): for any $r > 0$, $p \in (0, 1)$ and $k \in \{0\} \cup N$ we have

\frac{Γ (r + k)}{k! Γ (r)} \cdot p^{r} {(1 - p)}^{k} = \frac{1}{k!} \int_{0}^{\infty} e^{- z} z^{k} g (z; r, μ) d z,

(6)

where $μ = p / (1 - p)$.
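Identity (6) follows from a direct evaluation of the gamma integral. The worked computation below is a short sketch using only the density $g (x; r, μ)$ defined above.

```latex
\begin{aligned}
\frac{1}{k!}\int_{0}^{\infty} e^{-z} z^{k} g(z; r, \mu)\,dz
  &= \frac{\mu^{r}}{k!\,\Gamma(r)} \int_{0}^{\infty} z^{r+k-1} e^{-(1+\mu) z}\,dz
   = \frac{\mu^{r}}{k!\,\Gamma(r)} \cdot \frac{\Gamma(r+k)}{(1+\mu)^{r+k}} \\
  &= \frac{\Gamma(r+k)}{k!\,\Gamma(r)}
     \Bigl(\frac{\mu}{1+\mu}\Bigr)^{r} \Bigl(\frac{1}{1+\mu}\Bigr)^{k}.
\end{aligned}
```

Setting $p = μ / (1 + μ)$, which is equivalent to $μ = p / (1 - p)$, turns the right-hand side into $\frac{Γ (r + k)}{k! Γ (r)} p^{r} {(1 - p)}^{k}$.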

The d.f. and the density of a strictly stable distribution with the characteristic exponent $α$ and shape parameter $θ$ defined by the characteristic function (ch.f.)

f (t; α, θ) = \exp \{- {| t |}^{α} \exp \{- \frac{1}{2} i π θ α \, sign t\}\}, t \in R,

where $0 < α ⩽ 2$, $| θ | ⩽ \min \{1, \frac{2}{α} - 1\}$, will be respectively denoted $F (x; α, θ)$ and $f (x; α, θ)$ (see, e.g., [21]). An r.v. with the d.f. $F (x; α, θ)$ will be denoted $S_{α, θ}$.

Symmetric strictly stable distributions correspond to the value $θ = 0$. One-sided strictly stable distributions concentrated on the nonnegative half-line correspond to the values $θ = 1$ and $0 < α ⩽ 1$. The pairs $α = 1$, $θ = \pm 1$ correspond to the distributions degenerate in $\pm 1$, respectively. All the other strictly stable distributions are absolutely continuous. Stable densities cannot be represented explicitly via elementary functions, with four exceptions: the normal distribution ($α = 2$, $θ = 0$), the Cauchy distribution ($α = 1$, $θ = 0$), the Lévy distribution ($α = \frac{1}{2}$, $θ = 1$) and the distribution symmetric to the Lévy law ($α = \frac{1}{2}$, $θ = - 1$). Expressions of stable densities in terms of the Fox functions (generalized Meijer G-functions) can be found in [22,23].

The standard normal d.f. will be denoted $Φ (x)$,

Φ (x) = \frac{1}{\sqrt{2 π}} \int_{- \infty}^{x} e^{- y^{2} / 2} d y, x \in R .

An r.v. with the d.f. $Φ (x)$ will be denoted X. The folded or half-normal distribution is the distribution of the r.v. $| X |$. It can be easily verified that $S_{2, 0} \overset{d}{=} \sqrt{2} X$.

In [24,25], it was demonstrated that if $γ \in (0, 1]$, then

W_{γ} \overset{d}{=} W_{1} \cdot S_{γ, 1}^{- 1}

(7)

with the r.v.’s on the right-hand side being independent.
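Representation (7) can be illustrated by simulation. The sketch below assumes that $S_{γ, 1}$ is normalized so that its Laplace transform is $\exp \{- s^{γ}\}$, which is what (7) requires, and it samples $S_{γ, 1}$ by Kanter's representation (a technique brought in for this illustration, not taken from the paper). The function names and the sample size are hypothetical, and floating-point edge cases of negligible probability are not guarded against.

```python
import math
import random


def one_sided_stable(alpha, rng):
    """Sample S >= 0 with E exp(-s S) = exp(-s^alpha), 0 < alpha <= 1.

    Kanter's representation; for alpha = 1 the law is degenerate at 1.
    """
    if alpha == 1.0:
        return 1.0
    u = math.pi * (1.0 - rng.random())  # uniform on (0, pi]
    e = rng.expovariate(1.0)            # standard exponential
    a = math.sin(alpha * u) / math.sin(u) ** (1.0 / alpha)
    b = (math.sin((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha)
    return a * b


def weibull_tail_via_mixture(x, gamma, n_samples=50000, seed=0):
    """Empirical P(W_1 / S_{gamma,1} > x) for independent W_1 and S_{gamma,1}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        w1 = rng.expovariate(1.0)
        s = one_sided_stable(gamma, rng)
        if w1 > x * s:  # same event as w1 / s > x, without a division
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    gamma = 0.6
    for x in (0.5, 1.0, 2.0):
        # The second number is the Weibull tail exp(-x^gamma) predicted by (7).
        print(x, weibull_tail_via_mixture(x, gamma), math.exp(-x ** gamma))
```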

For $r \in (0, 1)$ let $G_{r, 1}$ and $G_{1 - r, 1}$ be independent gamma-distributed r.v.’s. Let $μ > 0$. Introduce the r.v.

Z_{r, μ} = \frac{μ (G_{r, 1} + G_{1 - r, 1})}{G_{r, 1}} \overset{d}{=} μ Z_{r, 1} \overset{d}{=} μ (1 + \frac{1 - r}{r} Q_{1 - r, r}),

(8)

where $Q_{1 - r, r}$ is the r.v. with the Snedecor–Fisher distribution defined by the probability density

q (x; 1 - r, r) = \frac{{(1 - r)}^{1 - r} r^{r}}{Γ (1 - r) Γ (r)} \cdot \frac{1}{x^{r} [r + (1 - r) x]}, x ⩾ 0 .

(9)

In the paper [26], it was shown that any gamma distribution with shape parameter no greater than one is mixed exponential. For convenience, we formulate this result as the following lemma.

Lemma 1 ([26]). The density of a gamma distribution $g (x; r, μ)$ with $0 < r < 1$ can be represented as

g (x; r, μ) = \int_{0}^{\infty} z e^{- z x} p (z; r, μ) d z,

where

p (z; r, μ) = \frac{μ^{r}}{Γ (1 - r) Γ (r)} \cdot \frac{1 (z ⩾ μ)}{{(z - μ)}^{r} z}

is the density of the r.v. $Z_{r, μ}$ introduced above. In other words, if $0 < r < 1$, then

G_{r, μ} \overset{d}{=} \frac{W_{1}}{Z_{r, μ}},

(10)

where the random variables $W_{1}$ and $Z_{r, μ}$ are independent. Moreover, a gamma distribution with shape parameter $r > 1$ cannot be represented as a mixed exponential distribution.
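Representation (10) lends itself to a quick simulation check. The sketch below, with hypothetical function names and sample sizes, draws $W_{1} / Z_{r, μ}$ from the definition (8) and compares its sample mean and empirical d.f. with those of directly simulated gamma variables; both have expectation $r / μ$.

```python
import random


def gamma_via_mixture(r, mu, rng):
    """Sample W_1 / Z_{r,mu}, Z_{r,mu} = mu (G_r + G_{1-r}) / G_r, 0 < r < 1."""
    g_r = rng.gammavariate(r, 1.0)
    g_1r = rng.gammavariate(1.0 - r, 1.0)
    w1 = rng.expovariate(1.0)
    # W_1 / Z rewritten as W_1 * G_r / (mu * (G_r + G_{1-r})).
    return w1 * g_r / (mu * (g_r + g_1r))


def compare_with_gamma(r, mu, points, n_samples=40000, seed=0):
    """Return sample means and empirical d.f. values for both constructions."""
    rng = random.Random(seed)
    mixed = [gamma_via_mixture(r, mu, rng) for _ in range(n_samples)]
    # random.gammavariate takes (shape, scale); scale = 1 / mu.
    direct = [rng.gammavariate(r, 1.0 / mu) for _ in range(n_samples)]

    def ecdf(sample, x):
        return sum(1 for v in sample if v < x) / len(sample)

    means = (sum(mixed) / n_samples, sum(direct) / n_samples)
    cdfs = [(ecdf(mixed, x), ecdf(direct, x)) for x in points]
    return means, cdfs


if __name__ == "__main__":
    print(compare_with_gamma(0.6, 2.0, points=(0.05, 0.3, 1.0)))
```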

Let $r > 0$, $γ \in R$ and $μ > 0$. Let the r.v. $N_{r, γ, μ}$ have the GNB distribution. Its d.f. will be denoted $F_{G N B} (x; r, γ, μ)$.

Along with the arguments given above in favor of the adequacy of the GNB models for the duration of wet periods based on their definition as mixed Poisson distributions, this effect can also be explained (at least in part) by one more important property of these distributions, namely, that they are mixed geometric. This property is formulated as the following theorem.

Theorem 1 ([3]). If $r \in (0, 1]$, $γ \in (0, 1]$ and $μ > 0$, then a GNB distribution is a $Y_{r, γ, μ}$-mixed geometric distribution:

P (N_{r, γ, μ} = k) = \int_{0}^{1} y {(1 - y)}^{k} d P (Y_{r, γ, μ} < y), k = 0, 1, 2, \dots,

(11)

where

Y_{r, γ, μ} \overset{d}{=} \frac{S_{γ, 1} Z_{r, μ}^{1 / γ}}{1 + S_{γ, 1} Z_{r, μ}^{1 / γ}} \overset{d}{=} \frac{μ^{1 / γ} S_{γ, 1} {(G_{r, 1} + G_{1 - r, 1})}^{1 / γ}}{G_{r, 1}^{1 / γ} + μ^{1 / γ} S_{γ, 1} {(G_{r, 1} + G_{1 - r, 1})}^{1 / γ}},

(12)

where the r.v.’s $S_{γ, 1}$ and $Z_{r, μ}$ or $S_{γ, 1}$, $G_{r, 1}$ and $G_{1 - r, 1}$ are independent.
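Theorem 1 can be illustrated numerically by estimating the same probability in two ways: as a mixed geometric probability with the mixing variable from (12), and as a mixed Poisson probability from the definition of the GNB law. The sketch below is an illustration under assumptions: $0 < r < 1$ and $0 < γ ⩽ 1$, $S_{γ, 1}$ normalized so that its Laplace transform is $\exp \{- s^{γ}\}$ and sampled by Kanter's representation (not a method from the paper), hypothetical function names, and no guards against overflow in extreme draws.

```python
import math
import random


def one_sided_stable(alpha, rng):
    """Sample S >= 0 with E exp(-s S) = exp(-s^alpha), 0 < alpha <= 1 (Kanter)."""
    if alpha == 1.0:
        return 1.0
    u = math.pi * (1.0 - rng.random())  # uniform on (0, pi]
    e = rng.expovariate(1.0)
    a = math.sin(alpha * u) / math.sin(u) ** (1.0 / alpha)
    b = (math.sin((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha)
    return a * b


def mixing_probability(r, gamma, mu, rng):
    """Sample Y_{r,gamma,mu} via the second expression in (12); needs 0 < r < 1."""
    g_r = rng.gammavariate(r, 1.0)
    g_1r = rng.gammavariate(1.0 - r, 1.0)
    s = one_sided_stable(gamma, rng)
    num = mu ** (1.0 / gamma) * s * (g_r + g_1r) ** (1.0 / gamma)
    return num / (g_r ** (1.0 / gamma) + num)


def pmf_two_ways(k, r, gamma, mu, n_samples=40000, seed=0):
    """Estimate P(N = k) as E[Y (1 - Y)^k] and as E[exp(-L) L^k / k!]."""
    rng = random.Random(seed)
    geometric_sum = poisson_sum = 0.0
    for _ in range(n_samples):
        y = mixing_probability(r, gamma, mu, rng)
        geometric_sum += y * (1.0 - y) ** k
        # Mixing GG variable of the GNB law: gamma variable (rate mu) to the power 1/gamma.
        lam = rng.gammavariate(r, 1.0 / mu) ** (1.0 / gamma)
        poisson_sum += math.exp(-lam) * lam ** k / math.factorial(k)
    return geometric_sum / n_samples, poisson_sum / n_samples


if __name__ == "__main__":
    for k in (0, 1, 3):
        print(k, pmf_two_ways(k, 0.8, 0.7, 1.0))
```

The two estimates are built from independent random draws, so their agreement up to Monte Carlo error is a genuine consistency check of (11) and (12) rather than an algebraic identity.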

4. The Asymptotic Approximation to the Probability Distribution of Extremal Daily Precipitation within a Wet Period

In this section, the probability distribution of extremal daily precipitation within a wet period will be deduced as an asymptotic approximation. We will require some auxiliary statements formulated as lemmas.

The following asymptotic property of the GNB distribution will play the fundamental role in the construction of asymptotic approximations to the distributions of extreme daily precipitation within a wet period and the total precipitation volume per wet period and the corresponding statistical tests for precipitation to be abnormally heavy.

Lemma 2 ([3]). For $r > 0$, $γ \in R$, $μ > 0$ let $N_{r, γ, μ}$ be an r.v. with the GNB distribution. We have

μ^{1 / γ} N_{r, γ, μ} ⟹ {\bar{G}}_{r, γ, 1} \overset{d}{=} G_{r, 1}^{1 / γ}

(13)

as $μ \to 0$. If, moreover, $r \in (0, 1]$ and $γ \in (0, 1]$, then the limit law can be represented as

{\bar{G}}_{r, γ, 1} \overset{d}{=} \frac{W_{1}}{S_{γ, 1} Z_{r, 1}^{1 / γ}} \overset{d}{=} \frac{W_{1}^{1 / γ}}{Z_{r, 1}^{1 / γ}} \overset{d}{=} {(\frac{W_{1} G_{r, 1}}{G_{r, 1} + G_{1 - r, 1}})}^{1 / γ} \overset{d}{=} W_{1}^{1 / γ} \cdot {(1 + \frac{1 - r}{r} Q_{1 - r, r})}^{- 1 / γ},

(14)

where the r.v.’s $W_{1}$, $S_{γ, 1}$ and $Z_{r, 1}$ are independent as well as the r.v.’s $W_{1}$ and $Z_{r, 1}$, or the r.v.’s $W_{1}$, $G_{r, 1}$ and $G_{1 - r, 1}$, and the r.v. $Q_{1 - r, r}$ has the Snedecor–Fisher distribution with parameters $1 - r$ and r, see Equation (9).

Let $μ > 0$, $γ > 0$. Instead of an infinitesimal parameter $μ$, in order to construct asymptotic approximations with “large” sample size, introduce an auxiliary “infinitely large” parameter $n \in N$ and replace $μ$ by $μ_{n} = μ n^{- γ}$. It can be easily verified that in this case

{\bar{G}}_{r, γ, μ / n^{γ}} \overset{d}{=} n {\bar{G}}_{r, γ, μ} .

(15)

Then for $r > 0$, $μ > 0$ and any $n \in N$, we have

n^{- 1} {\bar{G}}_{r, γ, μ / n^{γ}} \overset{d}{=} {\bar{G}}_{r, γ, μ} \overset{d}{=} μ^{- 1 / γ} {\bar{G}}_{r, γ, 1} \overset{d}{=} μ^{- 1 / γ} G_{r, 1}^{1 / γ} .

(16)

The standard Poisson process (the Poisson process with unit intensity) will be denoted $P (t)$, $t ⩾ 0$.

Lemma 3 ([27]). Let $Λ_{1}, Λ_{2}, \dots$ be a sequence of positive r.v.’s such that for any $n \in N$ the r.v. $Λ_{n}$ is independent of the standard Poisson process $P (t)$, $t ⩾ 0$. The convergence $n^{- 1} P (Λ_{n}) ⟹ Λ$ as $n \to \infty$ to some nonnegative r.v. Λ takes place if and only if

n^{- 1} Λ_{n} ⟹ Λ, n \to \infty .

(17)

Lemma 3 can be regarded as a special case of the following result. Consider a sequence of r.v.’s $W_{1}, W_{2}, \dots$ Let $N_{1}, N_{2}, \dots$ be natural-valued r.v.’s such that for every $n \in N$ the r.v. $N_{n}$ is independent of the sequence $W_{1}, W_{2}, \dots$ In the following statement, the convergence is meant as $n \to \infty$.

Lemma 4 ([28,29]). Assume that there exists an infinitely increasing (convergent to zero) sequence of positive numbers $\{b_{n}\}_{n ⩾ 1}$ and an r.v. W such that

b_{n}^{- 1} W_{n} ⟹ W .

If there exist an infinitely increasing (convergent to zero) sequence of positive numbers $\{d_{n}\}_{n ⩾ 1}$ and an r.v. N such that

d_{n}^{- 1} b_{N_{n}} ⟹ N,

(18)

then

d_{n}^{- 1} W_{N_{n}} ⟹ W \cdot N,

(19)

where the r.v.’s on the right-hand side of Equation (19) are independent. If, in addition, $N_{n} ⟶ \infty$ in probability and the family of scale mixtures of the d.f. of the r.v. W is identifiable, then Condition (18) is not only sufficient for Equation (19), but is necessary as well.

Consider a sequence of independent identically distributed (i.i.d.) r.v.’s $X_{1}, X_{2}, \dots$. Let $N_{1}, N_{2}, \dots$ be a sequence of natural-valued r.v.’s such that for each $n \in N$ the r.v. $N_{n}$ is independent of the sequence $X_{1}, X_{2}, \dots$. Denote $M_{n} = \max \{X_{1}, \dots, X_{N_{n}}\}$.

Lemma 5 ([30]). Let $Λ_{1}, Λ_{2}, \dots$ be a sequence of positive r.v.’s such that for each $n \in N$ the r.v. $Λ_{n}$ is independent of the Poisson process $P (t)$, $t ⩾ 0$. Let $N_{n} = P (Λ_{n})$. Assume that there exists a nonnegative r.v. Λ such that Convergence (17) takes place. Let $X_{1}, X_{2}, \dots$ be i.i.d. r.v.’s with a common d.f. $F (x)$. Assume also that $\sup \{x : F (x) < 1\} = \infty$ and there exists a number $α > 0$ such that for each $x > 0$

lim_{y \to \infty} \frac{1 - F (x y)}{1 - F (y)} = x^{- α} .

(20)

Then

lim_{n \to \infty} sup_{x ⩾ 0} | P (\frac{M_{n}}{F^{- 1} (1 - \frac{1}{n})} < x) - \int_{0}^{\infty} e^{- z x^{- α}} d P (Λ < z) | = 0 .
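Condition (20) states that the tail of F is regularly varying with index $- α$. The sketch below illustrates it on a Pareto type II (Lomax) law with unit scale, which is chosen here only as a convenient example of a Pareto-type distribution, and contrasts it with an exponential tail, for which the ratio in (20) tends to zero for $x > 1$ instead of to a power of x. The function names are hypothetical.

```python
import math


def lomax_tail(x, alpha):
    """1 - F(x) for a Pareto type II (Lomax) law with unit scale."""
    return (1.0 + x) ** (-alpha)


def exponential_tail(x):
    """1 - F(x) for the standard exponential law."""
    return math.exp(-x)


def tail_ratio(tail, x, y):
    """(1 - F(x y)) / (1 - F(y)), the quantity whose limit appears in (20)."""
    return tail(x * y) / tail(y)


def lomax_quantile(q, alpha):
    """F^{-1}(q) for the Lomax law; F^{-1}(1 - 1/n) is the normalizing constant."""
    return (1.0 - q) ** (-1.0 / alpha) - 1.0


if __name__ == "__main__":
    alpha = 2.0
    for y in (1e1, 1e3, 1e6):
        # For the Lomax tail the ratio approaches 2 ** (-alpha) as y grows.
        print(y, tail_ratio(lambda t: lomax_tail(t, alpha), 2.0, y), 2.0 ** (-alpha))
    # For the exponential tail the same ratio collapses to zero.
    print(tail_ratio(exponential_tail, 2.0, 50.0))
    print(lomax_quantile(1.0 - 1.0 / 1000, alpha))
```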

Now we turn to the main results of this section. The principal role in our reasoning will be played by Lemma 5. In order to justify its applicability, we need to make sure that the daily precipitation volumes satisfy Condition (20). A thorough statistical analysis shows that the traditional gamma distribution (used, e.g., in [4]), although a rather adequate and, in general, acceptable model, is not the best model for statistical regularities in daily precipitation. Consider the meteorological data (daily precipitation volumes) registered over 60 years at two geographic points with very different climates: Potsdam (Brandenburg, Germany), with a mild climate influenced by the closeness to the ocean with the warm Gulf Stream flow, and Elista (Kalmykia, Russia), with a radically continental climate. The analysis of these data convincingly suggests the Pareto-type model for the distribution of daily precipitation volumes, see Figure 3 and Figure 4. For comparison, the graphs of the best-fitting gamma densities are also presented in these figures. It can be seen that the gamma model fits the histograms noticeably worse than the Pareto distribution.

Theorem 2.

Let

n \in N

,

γ > 0

,

μ > 0

and let

N_{r, γ, μ_{n}}

be an r.v. with the GNB distribution with parameters

r > 0

,

γ > 0

and

μ_{n} = μ / n^{γ}

. Let

X_{1}, X_{2}, \dots

be i.i.d. r.v.’s with a common d.f.

F (x)

. Assume that

rext (F) = \infty

and there exists a number

α > 0

such that Relation (20) holds for any

x > 0

. Then

lim_{n \to \infty} sup_{x ⩾ 0} | P (\frac{max {X_{1}, \dots, X_{N_{r, γ, μ_{n}}}}}{F^{- 1} (1 - \frac{1}{n})} < x) - F (x; r, α, γ, μ) | = 0,

where

F (x; r, α, γ, μ) = \int_{0}^{\infty} e^{- λ x^{- α}} g^{*} (λ; r, γ, μ) d λ \equiv P (M_{r, α, γ, μ} < x), x \in R .

The limit r.v.

M_{r, α, γ, μ}

admits the following product representations:

M_{r, α, γ, μ} \overset{d}{=} \frac{{\bar{G}}_{r, α γ, μ}}{W_{α}} \overset{d}{=} {(\frac{{\bar{G}}_{r, γ, μ}}{W_{1}})}^{1 / α} \overset{d}{=} μ^{- 1 / α γ} {(\frac{G_{r, 1}}{W_{γ}})}^{1 / α γ}

(21)

and in each term, the involved random variables are independent.

Proof.

By definition, the GNB distribution is a mixed Poisson distribution with the GG mixing distribution. So,

N_{r, γ, μ_{n}} \overset{d}{=} P ({\bar{G}}_{r, γ, μ_{n}})

. Therefore, from Equation (16), Lemma 3 with

Λ_{n} = {\bar{G}}_{r, γ, μ_{n}}

and Lemma 5 with the account of the absolute continuity of the limit distribution it immediately follows that

lim_{n \to \infty} sup_{x ⩾ 0} | P (\frac{max {X_{1}, \dots, X_{N_{r, γ, μ_{n}}}}}{F^{- 1} (1 - \frac{1}{n})} < x) - \int_{0}^{\infty} e^{- z x^{- α}} g^{*} (z; r, γ, μ) d z | = 0 .

Since the Fréchet (inverse Weibull) d.f.

e^{- x^{- α}}

with

α > 0

corresponds to the r.v.

W_{α}^{- 1}

, it is easy to verify that

F (x; r, α, γ, μ) \equiv \int_{0}^{\infty} e^{- z x^{- α}} g^{*} (z; r, γ, μ) d z = P (\frac{{\bar{G}}_{r, γ, μ}^{1 / α}}{W_{α}} < x) .

Moreover, using relation

{\bar{G}}_{r, γ, μ} \overset{d}{=} G_{r, μ}^{1 / γ}

, it is easy to see that

\frac{{\bar{G}}_{r, γ, μ}^{1 / α}}{W_{α}} \overset{d}{=} \frac{{\bar{G}}_{r, α γ, μ}}{W_{α}} \overset{d}{=} {(\frac{{\bar{G}}_{r, γ, μ}}{W_{1}})}^{1 / α} \overset{d}{=} μ^{- 1 / α γ} {(\frac{G_{r, 1}}{W_{γ}})}^{1 / α γ},

where in each term the involved random variables are independent. The theorem is proved. □

If

γ = 1

, then the limit distribution

F (x; r, α, 1, μ)

corresponds to the results of [31,32].

Theorem 3.

The distribution of the r.v.

M_{r, α, γ, μ}

admits the following representations.

(i): If $r \in (0, 1]$ , it is the scale mixture of the distribution of the ratio of two independent Weibull-distributed r.v.’s:

$M_{r, α, γ, μ} \overset{d}{=} {(μ Z_{r, 1})}^{- 1 / α γ} \cdot \frac{W_{α γ}}{W_{γ}},$

where all the involved random variables are independent and the r.v. $Z_{r, 1}$ is defined in Equation (8).
(ii): If $γ \in (0, 1]$ , it is the scale mixture of the tempered Snedecor–Fisher distribution with parameters r and 1:

$M_{r, α, γ, μ} \overset{d}{=} {(\frac{S_{γ, 1}}{μ r} \cdot Q_{r, 1})}^{1 / α γ},$

where $S_{γ, 1}$ is a positive strictly stable r.v. with characteristic exponent γ independent of the r.v. $Q_{r, 1}$ with the Snedecor–Fisher distribution in Equation (9) with parameters r and 1.
(iii): If $γ \in (0, 1]$ and $r \in (0, 1]$ , it is the scale mixture of the Pareto laws:

$M_{r, α, γ, μ} \overset{d}{=} Π_{α} {(S_{γ, 1} Z_{r, 1}^{1 / γ})}^{- 1 / α},$

where $P (Π_{α} > x) = {(x^{α} + 1)}^{- 1}$ , $x ⩾ 0$ .
(iv): If $r \in (0, 1]$ and $α γ \in (0, 1]$ , it is the scale mixture of the folded normal laws:

$M_{r, α, γ, μ} \overset{d}{=} | X | \cdot \frac{\sqrt{2 W_{1}}}{μ^{1 / α γ} W_{α} S_{α γ, 1} Z_{r, 1}^{1 / α γ}},$

where all the involved r.v.’s are independent.

Proof.

To prove (i) it suffices to consider the rightmost term in Equation (21), apply relations

W_{1}^{1 / γ} \overset{d}{=} W_{γ}

and

G_{r, μ} \overset{d}{=} \frac{W_{1}}{Z_{r, μ}}

(here

0 < r < 1

and the r.v.’s

W_{1}

and

Z_{r, μ}

are independent; for details, see Lemma 1).

To prove (ii) it suffices to transform the rightmost term in (21) with the account of representation in Equation (7) and use the definition of the Snedecor–Fisher distribution as the distribution of the ratio of two independent gamma-distributed r.v.’s (see, e.g., Section 27 in [33]).

To prove (iii) it suffices to transform the second term in Equation (21) with the account of Equation (14) and notice that the distribution of the ratio of two independent exponentially distributed r.v.’s coincides with that of the random variable

Π_{1}

.

To prove (iv) it suffices to transform the second term in (21) with the account of (14) and notice that

W_{1} \overset{d}{=} | X | \sqrt{2 W_{1}}

with the r.v.’s on the right-hand side being independent (see, e.g., [25]). The theorem is proved. □

The product representations for the random variable

M_{r, α, γ, μ}

established in Theorem 3 can be used for computer simulation.
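As a hedged illustration of this remark, the second representation in Equation (21), M_{r, α, γ, μ} \overset{d}{=} {({\bar{G}}_{r, γ, μ} / W_{1})}^{1 / α}, can be sampled with the Python standard library alone. The sketch below assumes that G_{r, μ} is a gamma r.v. with shape r and rate μ, that {\bar{G}}_{r, γ, μ} \overset{d}{=} G_{r, μ}^{1 / γ}, and that W_{1} is standard exponential. The function names `sample_m` and `mixture_cdf_mc` are illustrative and do not come from the paper.

```python
import math
import random


def sample_m(r, alpha, gamma, mu, size, seed=0):
    """Draw `size` values of M_{r,alpha,gamma,mu} via (G-bar / W_1)^(1/alpha)."""
    rng = random.Random(seed)
    out = []
    for _ in range(size):
        # Assumption: G_{r,mu} has shape r and rate mu, i.e. scale 1/mu.
        g = rng.gammavariate(r, 1.0 / mu)
        g_bar = g ** (1.0 / gamma)  # generalized gamma variate
        # Standard exponential W_1; the guard only excludes an exact zero.
        w1 = max(rng.expovariate(1.0), 1e-300)
        out.append((g_bar / w1) ** (1.0 / alpha))
    return out


def mixture_cdf_mc(x, r, alpha, gamma, mu, size, seed=1):
    """Monte Carlo estimate of F(x; r, alpha, gamma, mu) = E exp(-Lambda x^(-alpha))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(size):
        lam = rng.gammavariate(r, 1.0 / mu) ** (1.0 / gamma)
        total += math.exp(-lam * x ** (-alpha))
    return total / size
```

The empirical d.f. of the sample returned by `sample_m` can be compared with `mixture_cdf_mc` at a few points x > 0; the two estimates should agree up to Monte Carlo error, since P (M_{r, α, γ, μ} < x) = E exp {- {\bar{G}}_{r, γ, μ} x^{- α}}.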

Theorem 4.

If

r \in (0, 1]

,

μ > 0

and

α γ \in (0, 1]

, then the d.f.

F (x; r, α, γ, μ)

is mixed exponential:

1 - F (x; r, α, γ, μ) = \int_{0}^{\infty} e^{- u x} d A (u), x ⩾ 0,

where

A (u) = P (μ^{1 / α γ} W_{α} S_{α γ, 1} Z_{r, 1}^{1 / α γ} < u)

,

u ⩾ 0

, and all the involved r.v.’s are independent.

Proof.

To prove this statement, it suffices to transform the second term in Equation (21) with the account of Equation (14) and obtain

M_{r, α, γ, μ} \overset{d}{=} \frac{W_{1}}{μ^{1 / α γ} W_{α} S_{α γ, 1} Z_{r, 1}^{1 / α γ}} .

□

Corollary 1.

Let

r \in (0, 1]

,

α γ \in (0, 1]

,

μ > 0

. Then the distribution function

F (x; r, α, γ, μ)

is infinitely divisible.

Proof.

This statement immediately follows from Theorem 3 and the result of [34] stating that the product of two independent non-negative r.v.’s is infinitely divisible, if one of the two is exponentially distributed. □

It is possible to deduce explicit expressions for the moments of the r.v.

M_{r, α, γ, μ}

.

Theorem 5.

Let

0 < δ < α

. Then

E M_{r, α, γ, μ}^{δ} = \frac{Γ (r + \frac{δ}{α γ}) Γ (1 - \frac{δ}{α})}{μ^{δ / α γ} Γ (r)} .

Proof.

From Equation (14) it follows that

E M_{r, α, γ, μ}^{δ} = μ^{- δ / α γ} E G_{r, 1}^{δ / α γ} \cdot E W_{1}^{- δ / α}

. It is easy to verify that

E G_{r, 1}^{δ / α γ} = Γ (r + \frac{δ}{α γ}) / Γ (r)

,

E W_{1}^{- δ / α} = Γ (1 - \frac{δ}{α})

. Hence follows the desired result. □
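The formula of Theorem 5 is straightforward to evaluate numerically. The following minimal sketch works on the logarithmic scale to avoid overflow of the gamma function; the name `moment_m` is hypothetical and the code is an illustration rather than part of the paper's algorithms.

```python
import math


def moment_m(delta, r, alpha, gamma, mu):
    """E M^delta from Theorem 5; the moment exists only for 0 < delta < alpha."""
    if not 0.0 < delta < alpha:
        raise ValueError("the moment of order delta requires 0 < delta < alpha")
    # log of Gamma(r + delta/(alpha*gamma)) * Gamma(1 - delta/alpha)
    #        / (mu^(delta/(alpha*gamma)) * Gamma(r))
    log_value = (
        math.lgamma(r + delta / (alpha * gamma))
        + math.lgamma(1.0 - delta / alpha)
        - math.lgamma(r)
        - (delta / (alpha * gamma)) * math.log(mu)
    )
    return math.exp(log_value)
```

For instance, with r = γ = μ = 1, α = 2 and δ = 1 the formula reduces to Γ (3 / 2) Γ (1 / 2) = π / 2.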

Consider the bounds for the rate of convergence in Theorem 2. For this purpose, we will use one more auxiliary statement.

Lemma 6.

Let

λ > 0

,

X_{1}, X_{2}, \dots

be i.i.d. r.v.’s with a common d.f.

F (x)

,

P (t)

be the standard Poisson process independent of

X_{1}, X_{2}, \dots

. Assume that there exists a d.f.

H (x)

such that for any

x \in R

lim_{n \to \infty} P (\frac{1}{F^{- 1} (1 - \frac{1}{n})} max_{1 ⩽ k ⩽ n} X_{k} < x) = H (x) .

(22)

Then for any

n \in N

| P (\frac{1}{F^{- 1} (1 - \frac{1}{n})} max_{1 ⩽ k ⩽ P (n λ)} X_{k} < x) - H^{λ} (x) | ⩽ | n [1 - F (x F^{- 1} (1 - \frac{1}{n}))] - log H (x) | λ H^{λ} (x) .

Proof.

This statement is a special case of Corollary 2 in [35]. □

Theorem 6.

Let

n \in N

,

γ > 0

,

μ > 0

and let

N_{r, γ, μ_{n}}

be an r.v. with the GNB distribution with parameters

r > 0

,

γ > 0

and

μ_{n} = μ / n^{γ}

. Let

X_{1}, X_{2}, \dots

be i.i.d. r.v.’s with the common Pareto d.f.

F (x) = 1 - \frac{c}{a x^{α} + c}, x ⩾ 0,

with

a, c, α > 0

,

F (x; r, α, γ, μ) = \int_{0}^{\infty} e^{- λ x^{- α}} g^{*} (λ; r, γ, μ) d λ, x \in R .

Then for any

x \in R

\begin{matrix} | P ({[\frac{a}{c (n - 1)}]}^{1 / α} max_{1 ⩽ k ⩽ N_{r, γ, μ_{n}}} X_{k} < x) - F (x; r, α, γ, μ) | ⩽ \\ ⩽ | \frac{x^{α} - 1}{x^{α} (n - 1) + 1} | \cdot \int_{0}^{\infty} λ e^{- λ x^{- α}} g^{*} (λ; r, γ, μ) d λ ⩽ | \frac{x^{α} - 1}{x^{α} (n - 1) + 1} | \cdot \frac{Γ (r + \frac{1}{γ})}{μ^{1 / γ} Γ (r)} . \end{matrix}

Proof.

First of all, check Condition (20). We have

\frac{1 - F (x y)}{1 - F (y)} = \frac{a y^{α} + c}{a x^{α} y^{α} + c} ⟶ x^{- α}

as

y \to \infty

, that is, Condition (20) holds implying Equation (22) with

H (x) = e^{- x^{- α}}

in accordance with the classical theory of extremes (see, e.g., [36]). Second, note that in the case under consideration

F^{- 1} (1 - \frac{1}{n}) = {[\frac{c (n - 1)}{a}]}^{1 / α}

so that

F (x F^{- 1} (1 - \frac{1}{n})) = 1 - {[x^{α} (n - 1) + 1]}^{- 1}

and

n [1 - F (x F^{- 1} (1 - \frac{1}{n}))] - log H (x) = \frac{n}{x^{α} (n - 1) + 1} - \frac{1}{x^{α}} = \frac{x^{α} - 1}{x^{α} (n - 1) + 1} .

Third, from Equation (15) it follows that

N_{r, γ, μ / n^{γ}} \overset{d}{=} P (n {\bar{G}}_{r, γ, μ})

with independent

P (t)

and

{\bar{G}}_{r, γ, μ}

. Therefore, by Lemma 6 we have

\begin{matrix} | P ({[\frac{a}{c (n - 1)}]}^{1 / α} max_{1 ⩽ k ⩽ N_{r, γ, μ_{n}}} X_{k} < x) - F (x; r, α, γ, μ) | ⩽ \\ ⩽ \int_{0}^{\infty} | P ({[\frac{a}{c (n - 1)}]}^{1 / α} max_{1 ⩽ k ⩽ P (n λ)} X_{k} < x) - e^{- λ x^{- α}} | g^{*} (λ; r, γ, μ) d λ ⩽ \\ ⩽ | \frac{x^{α} - 1}{x^{α} (n - 1) + 1} | \cdot \int_{0}^{\infty} λ e^{- λ x^{- α}} g^{*} (λ; r, γ, μ) d λ ⩽ | \frac{x^{α} - 1}{x^{α} (n - 1) + 1} | \cdot E {\bar{G}}_{r, γ, μ} = \\ = | \frac{x^{α} - 1}{x^{α} (n - 1) + 1} | \cdot \frac{Γ (r + \frac{1}{γ})}{μ^{1 / γ} Γ (r)} . \end{matrix}

The theorem is proved. □

Actually, Theorem 6 states that the rate of convergence in Theorem 2 is

O (μ_{n}^{1 / γ})

as

μ_{n} \to 0

.
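To make the order of this estimate tangible, the rightmost bound of Theorem 6 can be tabulated for growing n. The sketch below is only an illustration under the assumptions of Theorem 6 (Pareto d.f., x > 0, n ⩾ 2); the function name `theorem6_bound` is hypothetical.

```python
import math


def theorem6_bound(x, n, r, alpha, gamma, mu):
    """Rightmost bound in Theorem 6 for x > 0 and an integer n >= 2."""
    xa = x ** alpha
    # |x^alpha - 1| / (x^alpha (n - 1) + 1): the only factor depending on n.
    distance_factor = abs(xa - 1.0) / (xa * (n - 1) + 1.0)
    # E G-bar_{r,gamma,mu} = Gamma(r + 1/gamma) / (mu^(1/gamma) Gamma(r)).
    mean_gg = math.exp(math.lgamma(r + 1.0 / gamma) - math.lgamma(r)) / mu ** (1.0 / gamma)
    return distance_factor * mean_gg
```

For a fixed x the bound decays like 1 / n, so doubling n roughly halves it, and it vanishes at x = 1.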

The results of this section serve as a theoretical base for the construction of a test for abnormally extreme daily precipitation. The distribution of the maximum daily precipitation per wet period can be assumed to be a tempered scale mixture of the gamma distribution in which the scale factor has the Weibull distribution. According to the typical construction of a test, a daily precipitation volume is considered to be abnormally extremal, if it exceeds a certain (pre-defined) quantile of this distribution. A detailed description of this test and algorithm of estimation of the parameters of the distribution mentioned above deserve a separate study as well as its application to real data.

5. The Asymptotic Approximation to the Probability Distribution of Total Precipitation over a Wet Period. Generalized Rényi Theorem for GNB Random Sums

As far back as the 1950s, being interested in modeling rare events, A. Rényi studied rarefaction of renewal point processes and proved his famous theorem on convergence of rarefied renewal processes to the Poisson process [37,38]. The Rényi theorem states that the distribution of a geometric sum (i.e., a sum of a random number of i.i.d. r.v.’s in which the number of summands is an r.v. with the geometric distribution independent of the summands) normalized by its expectation converges to the exponential law as the expectation of the sum increases infinitely. The normalization of a sum by its expectation is typical for laws of large numbers. Therefore, the Rényi theorem can be regarded as the law of large numbers for geometric sums. A general law of large numbers for random sums of i.i.d. r.v.’s was proved in [28]. It was demonstrated there that the distribution of a random sum normalized by its expectation converges to some distribution if and only if the distribution of the random index (the number of summands) converges to the same distribution (up to a scale parameter) under the same normalization. In [3] the law of large numbers for GNB random sums was proved. However, a direct application of this result to modeling the probability distribution of total precipitation over a wet period is hampered by the following very interesting practical observation.

One might have expected that successive daily precipitation volumes

X_{1}, X_{2}, \dots

satisfy the classical law of large numbers, that is, the arithmetic mean

\frac{1}{n} (X_{1} + \dots + X_{n})

converges to some number a almost surely as n grows infinitely, as was assumed in [2]. However, a thorough analysis of real data shows that this is not quite so. Figure 5 shows the graphs of the averaged daily precipitation volumes in Potsdam and Elista, demonstrating a slowly decreasing trend for Potsdam and a slowly increasing trend for Elista.

This means that, in order to match the stabilization of the averages at some level a, it is required to normalize the sum

X_{1} + \dots + X_{n}

not by n, but by a somewhat more complicated function of n that can match the influence of slow global trends. As such a function of n, consider the power function

n^{β}

with

β > 0

and assume that not necessarily i.i.d. r.v.’s

X_{1}, X_{2}, \dots

satisfy the condition

\frac{1}{n^{β}} \sum_{j = 1}^{n} X_{j} ⟹ a \in (0, \infty)

(23)

as

n \to \infty

. The parameters a and

β

can be rather reliably estimated by the least squares technique.

Let

X_{1}, X_{2}, \dots, X_{n}

be the observed values of successive nonzero daily precipitation volumes,

n \in N

be the total number of available observations. For a natural

k = 1, \dots, n

denote

s_{k} = X_{1} + \dots + X_{k}

. If Condition (23) holds, then for k large enough (

1 ⩽ m ⩽ k ⩽ n

), the following estimates of the parameters a and

β

in Relation (23) can be used:

\tilde{a} = exp \{\frac{\sum_{k = m}^{n} log s_{k} \cdot \sum_{k = m}^{n} {(log k)}^{2} - \sum_{k = m}^{n} log k \cdot \sum_{k = m}^{n} (log k \cdot log s_{k})}{(n - m + 1) \sum_{k = m}^{n} {(log k)}^{2} - {(\sum_{k = m}^{n} log k)}^{2}}\},

(24)

\tilde{β} = \frac{\sum_{k = m}^{n} log s_{k} - (n - m + 1) log \tilde{a}}{\sum_{k = m}^{n} log k} .

(25)

Indeed, if Condition (23) holds, the following approximate equality can be written:

\frac{s_{k}}{k^{β}} \approx a \Leftrightarrow - β log k + log s_{k} \approx log a .

Therefore, the estimates of the parameters a and

β

can be found as the solution of the least squares problem

\sum_{k = m}^{n} {(log s_{k} - β log k - log a)}^{2} ⟶ min_{β, log a} .

This solution can be found explicitly and has the form

\begin{matrix} \tilde{log a} = \frac{\sum_{k = m}^{n} log s_{k} \cdot \sum_{k = m}^{n} {(log k)}^{2} - \sum_{k = m}^{n} log k \cdot \sum_{k = m}^{n} (log k \cdot log s_{k})}{(n - m + 1) \sum_{k = m}^{n} {(log k)}^{2} - {(\sum_{k = m}^{n} log k)}^{2}}, \\ \tilde{β} = \frac{\sum_{k = m}^{n} log s_{k} - (n - m + 1) \tilde{log a}}{\sum_{k = m}^{n} log k}, \end{matrix}

that leads to Formulas (24) and (25). This least squares method for estimation of a and

β

is realized by Algorithm A2 (see Appendix A).
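Formulas (24) and (25) translate directly into a few lines of code. The following sketch is not Algorithm A2 itself (which is given in Appendix A and was implemented by the authors in MATLAB) but a minimal Python illustration of the same closed-form least squares solution; the function name `estimate_a_beta` and its argument conventions are assumptions made here.

```python
import math


def estimate_a_beta(x, m=1):
    """Least squares estimates of a and beta by Formulas (24) and (25).

    x: positive daily precipitation volumes X_1, ..., X_n
    m: first (1-based) index k used in the fit, 1 <= m < n
    """
    n = len(x)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    # Partial sums s_k = X_1 + ... + X_k for k = 1, ..., n.
    s, running = [], 0.0
    for value in x:
        running += value
        s.append(running)
    log_k = [math.log(k) for k in range(m, n + 1)]
    log_s = [math.log(s[k - 1]) for k in range(m, n + 1)]
    count = n - m + 1
    sum_k = sum(log_k)
    sum_s = sum(log_s)
    sum_kk = sum(u * u for u in log_k)
    sum_ks = sum(u * v for u, v in zip(log_k, log_s))
    # Formula (24) on the log scale, then Formula (25).
    log_a = (sum_s * sum_kk - sum_k * sum_ks) / (count * sum_kk - sum_k ** 2)
    beta = (sum_s - count * log_a) / sum_k
    return math.exp(log_a), beta
```

On synthetic data whose partial sums satisfy s_{k} = a k^{β} exactly, the function recovers a and β up to rounding error, which is a convenient sanity check before applying it to real observations.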

The application of Equation (23) to real data from Potsdam and Elista with a and

β

estimated by Equations (24) and (25) is illustrated in Figure 6. It can be seen that the cumulative averages stabilize at the level

a = 4.087

with

β = 0.981

for Potsdam and at the level

a = 0.96

with

β = 1.146

for Elista.

So, to construct the asymptotic approximation to the probability distribution of total precipitation over a wet period, we should prove a generalized Rényi theorem for GNB random sums improving an analogous statement proved in [3]. It must be especially noted that in the following theorem, the r.v.’s

X_{1}, X_{2}, . . .

are not assumed to be i.i.d.

Theorem 7.

Assume that the nonzero daily precipitation volumes

X_{1}, X_{2}, . . .

satisfy Condition (23) with some

β > 0

and

a > 0

. Let the numbers

r > 0

, γ and

μ > 0

be arbitrary. For each

n \in N

, let the r.v.

N_{r, γ, μ_{n}}

have the GNB distribution with parameters r, γ and

μ_{n} = μ / n^{γ}

. Assume that for each

n \in N

the r.v.

N_{r, γ, μ_{n}}

is independent of the sequence

X_{1}, X_{2}, . . .

Then

\frac{a μ^{β / γ}}{n^{β}} \sum_{j = 1}^{N_{r, γ, μ / n^{γ}}} X_{j} ⟹ {\bar{G}}_{r, γ / β, 1} \overset{d}{=} G_{r, 1}^{β / γ}

as

n \to \infty

.

Proof.

The proof is based on Lemma 4 and Equation (16). From Equation (16) it follows that

\frac{μ^{1 / γ}}{n} \cdot N_{r, γ, μ / n^{γ}} ⟹ {\bar{G}}_{r, γ, 1}

(26)

as

n \to \infty

. By virtue of Condition (23), in Lemma 4 let

b_{n} = n^{β} / a

. As

N_{n}

in Lemma 4 take

N_{r, γ, μ / n^{γ}}

. Then

b_{N_{n}} = \frac{1}{a} N_{r, γ, μ / n^{γ}}^{β}

. From Equation (26) it follows that, as

n \to \infty

,

\frac{1}{a} N_{r, γ, μ / n^{γ}}^{β} \cdot \frac{μ^{β / γ}}{n^{β}} ⟹ \frac{1}{a} {\bar{G}}_{r, γ, 1}^{β} \overset{d}{=} \frac{1}{a} {\bar{G}}_{r, γ / β, 1} \overset{d}{=} \frac{1}{a} G_{r, 1}^{β / γ} .

(27)

Therefore, as

d_{n}

we can take

d_{n} = n^{β} / μ^{β / γ}

. So, using Equation (27) in the role of Equation (18) in Lemma 4, we obtain Equation (19) in the form

\frac{μ^{β / γ}}{n^{β}} \sum_{j = 1}^{N_{r, γ, μ / n^{γ}}} X_{j} ⟹ \frac{1}{a} {\bar{G}}_{r, γ / β, 1} \overset{d}{=} \frac{1}{a} G_{r, 1}^{β / γ},

(28)

whence follows the desired result. The theorem is proved. □

Theorem 7 presents a good tool for the account of the parameters

β

and

γ

characterizing the deviation from traditional NB and arithmetic mean models due to the influence of possible (slow) global trends. If in Theorem 7

r = γ = β = 1

, then we obtain a version of the Rényi theorem [39] generalized to non-identically distributed and not necessarily independent summands. If in Theorem 7

β = 1

, then we obtain the law of large numbers for GNB random sums (see [3]). If in Theorem 7

γ = 1

, then we obtain the law of large numbers for NB random sums modified for the case

β \neq 1

.

Therefore, if daily precipitation volumes

X_{1}, X_{2}, \dots

(which are, of course, non-identically distributed and not independent) satisfy Condition (23), then, taking into account the excellent fit of the GNB model for the duration of a wet period (see Figure 1) with rather small

μ

, the GG distribution can be regarded as an adequate and theoretically well-based model for the total precipitation volume over a (long enough) wet period.

As regards the bounds for the rate of convergence in Theorem 7, consider a special case of

β = 1

and i.i.d.

X_{1}, X_{2}, \dots

As a measure of the distance between probability distributions, consider the

ζ

-metric proposed by V. M. Zolotarev in [40,41] (also see [42], p. 44). Let

s > 0

. There exists a unique representation of the number s as

s = m + α

where m is an integer and

0 < α ⩽ 1

. By

F_{s}

we denote the set of all real-valued bounded functions f on

R

that are m times differentiable and

| f^{(m)} (x) - f^{(m)} (y) | ⩽ {| x - y |}^{α}

. Let X and Y be two r.v.’s whose distribution functions will be denoted

F_{X} (x)

and

F_{Y} (x)

, respectively. The

ζ

-metric

ζ_{s} (X, Y) \equiv ζ_{s} (F_{X}, F_{Y})

in the space of probability distributions is defined by the equality

ζ_{s} (X, Y) = sup \{| E (f (X) - f (Y)) | : f \in F_{s}\}

. In particular,

ζ_{1} (X, Y) = \int_{R} | F_{X} (x) - F_{Y} (x) | d x .
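For two samples of equal size, the integral in the last formula, applied to the empirical d.f.’s, reduces to the mean absolute difference of the order statistics. The short sketch below uses this well-known identity; it is an illustration only, and the name `zeta1_empirical` is hypothetical.

```python
def zeta1_empirical(xs, ys):
    """zeta_1 distance between the empirical d.f.'s of two equal-size samples."""
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    # The integral of |F_X - F_Y| equals the mean gap between order statistics.
    return sum(abs(u - v) for u, v in zip(sorted(xs), sorted(ys))) / len(xs)
```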

In [43], it was shown that in the case

β = 1

and i.i.d.

X_{1}, X_{2}, \dots

for

1 ⩽ s ⩽ 2

we have

ζ_{s} (\frac{μ^{1 / γ}}{n a} \sum_{j = 1}^{N_{r, γ, μ / n^{γ}}} X_{j}, {\bar{G}}_{r, γ, 1}) ⩽ \frac{{(E X_{1}^{2})}^{s / 2}}{n^{s / 2} {| E X_{1} |}^{s}} \cdot \frac{Γ (1 + α) Γ (r + \frac{s}{2 γ})}{Γ (1 + s) Γ (r)} .

In particular,

ζ_{2} (\frac{μ^{1 / γ}}{n a} \sum_{j = 1}^{N_{r, γ, μ / n^{γ}}} X_{j}, {\bar{G}}_{r, γ, 1}) ⩽ \frac{E X_{1}^{2}}{2 n {(E X_{1})}^{2}} \cdot \frac{Γ (r + \frac{1}{γ})}{Γ (r)} .

The results presented above justify the GG models for the probability distribution of total precipitation volume over a wet period improving the models considered in [2]. Statistical tests for the detection of anomalously extreme total volumes will be considered below.

6. Statistical Tests for Anomalously Extreme Total Precipitation Volumes

Now we turn to the construction of the tests for the total precipitation volume during a wet period to be abnormally large.

In what follows, based on the results of the preceding section, we will assume that the total precipitation volume during a wet period has the GG distribution with some parameters

r > 0

,

γ > 0

and

μ > 0

.

Let

m \in N

and

{\bar{G}}_{r, γ, μ}^{(1)}, {\bar{G}}_{r, γ, μ}^{(2)}, \dots, {\bar{G}}_{r, γ, μ}^{(m)}

be independent r.v.’s having the same GG distribution with parameters

r > 0

,

γ

and

μ > 0

. Also, let

G_{r, μ}^{(1)}, G_{r, μ}^{(2)}, \dots, G_{r, μ}^{(m)}

be i.i.d. r.v.’s having the same gamma distribution with parameters

r > 0

and

μ > 0

.

The base for the first step in the construction of the desired test is the following obvious conclusion: if the r.v.’s

{\bar{G}}_{r, γ, μ}^{(1)}, {\bar{G}}_{r, γ, μ}^{(2)}, \dots, {\bar{G}}_{r, γ, μ}^{(m)}

are identically distributed (that is, the sample

{\bar{G}}_{r, γ, μ}^{(1)}, {\bar{G}}_{r, γ, μ}^{(2)}, \dots, {\bar{G}}_{r, γ, μ}^{(m)}

is homogeneous), then the r.v.’s

{({\bar{G}}_{r, γ, μ}^{(1)})}^{γ}, {({\bar{G}}_{r, γ, μ}^{(2)})}^{γ}, \dots, {({\bar{G}}_{r, γ, μ}^{(m)})}^{γ}

are also identically distributed (that is, the sample

{({\bar{G}}_{r, γ, μ}^{(1)})}^{γ}, {({\bar{G}}_{r, γ, μ}^{(2)})}^{γ}, \dots, {({\bar{G}}_{r, γ, μ}^{(m)})}^{γ}

is homogeneous).

Consider the relative contribution of the r.v.

{({\bar{G}}_{r, γ, μ}^{(1)})}^{γ}

to the sum

{({\bar{G}}_{r, γ, μ}^{(1)})}^{γ} + {({\bar{G}}_{r, γ, μ}^{(2)})}^{γ} + \dots + {({\bar{G}}_{r, γ, μ}^{(m)})}^{γ}

:

R = \frac{{({\bar{G}}_{r, γ, μ}^{(1)})}^{γ}}{{({\bar{G}}_{r, γ, μ}^{(1)})}^{γ} + {({\bar{G}}_{r, γ, μ}^{(2)})}^{γ} + \dots + {({\bar{G}}_{r, γ, μ}^{(m)})}^{γ}} .

(29)

From Equation (4), it obviously follows that

R \overset{d}{=} \frac{G_{r, μ}^{(1)}}{G_{r, μ}^{(1)} + G_{r, μ}^{(2)} + \dots + G_{r, μ}^{(m)}} \overset{d}{=} \frac{G_{r, 1}^{(1)}}{G_{r, 1}^{(1)} + G_{r, 1}^{(2)} + \dots + G_{r, 1}^{(m)}} \overset{d}{=} R^{*}

(see Equation (29)).

So, the r.v. R characterizes the relative precipitation volume for one (long enough) wet period with respect to the total precipitation volume registered for m wet periods.

Note that

R = {(1 + \frac{1}{G_{r, μ}^{(1)}} (G_{r, μ}^{(2)} + \dots + G_{r, μ}^{(m)}))}^{- 1} \overset{d}{=} {(1 + \frac{G_{(m - 1) r, μ}}{G_{r, μ}})}^{- 1},

where the gamma-distributed r.v.’s on the right hand side are independent. The distribution of the r.v. R was described in [2] where it was demonstrated that

R \overset{d}{=} {(1 + \frac{k}{r} Q_{k, r})}^{- 1}

where

Q_{k, r}

is the r.v. having the Snedecor–Fisher distribution determined for

k > 0

,

r > 0

by the Lebesgue density

f_{k, r} (x) = \frac{Γ (k + r)}{Γ (k) Γ (r)} {(\frac{k}{r})}^{k} \frac{x^{k - 1}}{{(1 + \frac{k}{r} x)}^{k + r}}, x ⩾ 0 .

(30)

It should be noted that the particular value of the scale parameter is insignificant. For convenience, it is assumed to be equal to one. A standard calculation using Equation (30) shows that the distribution of the r.v. R is determined by the density

p (x; k, r) = \frac{Γ (k + r)}{Γ (r) Γ (k)} {(1 - x)}^{k - 1} x^{r - 1}, 0 ⩽ x ⩽ 1,

that is, it is the beta distribution with parameters

k = (m - 1) r

and r.

Then the test for the homogeneity of an independent sample of size m consisting of the GG distributed observations of total precipitation volumes during m wet periods with known

γ

based on the r.v. R looks as follows. Let

V_{1}, \dots, V_{m}

be the total precipitation volumes during m wet periods and, moreover,

V_{1} ⩾ V_{j}

for all

j ⩾ 2

. Calculate the quantity

S R = \frac{V_{1}^{γ}}{V_{1}^{γ} + \dots + V_{m}^{γ}}

(

S R

means “Sample R”). From what was said above, it follows that under the hypothesis

H_{0}

: “the precipitation volume

V_{1}

under consideration is not abnormally large” the r.v.

S R

has the beta distribution with parameters

k = (m - 1) r

and r. Let

ε \in (0, 1)

be a small number,

β_{k, r} (1 - ε)

be the

(1 - ε)

-quantile of the beta distribution with parameters

k = (m - 1) r

and r. If

S R > β_{k, r} (1 - ε)

, then the hypothesis

H_{0}

must be rejected, that is, the volume

V_{1}

of precipitation during one wet period must be regarded as abnormally large. Moreover, the probability of erroneous rejection of

H_{0}

is equal to

ε

.
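The decision rule just described can be sketched in code. Since the Python standard library has no beta quantile function, the sketch below uses the equivalent formulation "reject H_{0} if P (R > S R) < ε" and evaluates the upper tail of the beta density p (x; k, r) numerically. It assumes that r and γ are known and that m ⩾ 2; all function names are hypothetical, and the quadrature is a simple midpoint rule rather than the authors' implementation.

```python
import math


def sr_statistic(volumes, gamma):
    """SR = V_1^gamma / (V_1^gamma + ... + V_m^gamma), V_1 being the largest volume."""
    powered = [v ** gamma for v in volumes]
    return max(powered) / sum(powered)


def beta_upper_tail(s, r, k, steps=20000):
    """P(R > s) for the density c (1 - x)^(k - 1) x^(r - 1) on [0, 1], 0 < s <= 1.

    The substitution u = (1 - x)^k turns the tail integral into
    (c / k) * integral over [0, (1 - s)^k] of (1 - u^(1/k))^(r - 1) du,
    which is evaluated by the midpoint rule.
    """
    log_c = math.lgamma(k + r) - math.lgamma(k) - math.lgamma(r)
    upper = (1.0 - s) ** k
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += (1.0 - u ** (1.0 / k)) ** (r - 1.0)
    return math.exp(log_c) / k * total * h


def is_abnormally_large(volumes, r, gamma, eps=0.01):
    """Reject H_0 when SR exceeds the (1 - eps)-quantile, i.e. when P(R > SR) < eps."""
    m = len(volumes)
    if m < 2:
        raise ValueError("at least two wet periods are required")
    k = (m - 1) * r
    return beta_upper_tail(sr_statistic(volumes, gamma), r, k) < eps
```

For r = 1 the tail probability has the closed form {(1 - s)}^{k}, which gives a simple way to check the quadrature.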

Instead of R, the quantity

R_{0} = \frac{(m - 1) {({\bar{G}}_{r, γ, μ}^{(1)})}^{γ}}{{({\bar{G}}_{r, γ, μ}^{(2)})}^{γ} + \dots + {({\bar{G}}_{r, γ, μ}^{(m)})}^{γ}} \overset{d}{=} \frac{(m - 1) G_{r, μ}^{(1)}}{G_{r, μ}^{(2)} + \dots + G_{r, μ}^{(m)}} \overset{d}{=} \frac{k}{r} \frac{G_{r, μ}}{G_{k, μ}} \overset{d}{=} \frac{k}{r} \frac{G_{r, 1}}{G_{k, 1}} \overset{d}{=} Q_{r, k}

can be considered. Then, as is easily seen, the r.v.’s R and

R_{0}

are related by the one-to-one correspondence

R = \frac{R_{0}}{m - 1 + R_{0}} or R_{0} = \frac{(m - 1) R}{1 - R},

so that the homogeneity test for a sample from the GG distribution equivalent to the one described above and, correspondingly, the test for a precipitation volume during a wet period to be abnormally large, can be based on the r.v.

R_{0}

, which has the Snedecor–Fisher distribution with parameters r and

k = (m - 1) r

.

Namely, again let

V_{1}, \dots, V_{m}

be the total precipitation volumes during m wet periods and, moreover,

V_{1} ⩾ V_{j}

for all

j ⩾ 2

. Calculate the quantity

S R_{0} = \frac{(m - 1) V_{1}^{γ}}{V_{2}^{γ} + \dots + V_{m}^{γ}} .

(31)

(

S R_{0}

means “Sample

R_{0}

”). From what was said above, it follows that under the hypothesis

H_{0}

: “the precipitation volume

V_{1}

under consideration is not abnormally large” the r.v.

S R_{0}

has the Snedecor–Fisher distribution with parameters r and

k = (m - 1) r

. Let

ε \in (0, 1)

be a small number,

q_{r, k} (1 - ε)

be the

(1 - ε)

-quantile of the Snedecor–Fisher distribution with parameters r and

k = (m - 1) r

. If

S R_{0} > q_{r, k} (1 - ε)

, then the hypothesis

H_{0}

must be rejected, that is, the volume

V_{1}

of precipitation during one wet period must be regarded as abnormally large. Moreover, the probability of erroneous rejection of

H_{0}

is equal to

ε

.
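Because R and R_{0} are related by a monotone one-to-one map, the sample statistic of Equation (31) can be computed directly or recovered from S R. The following small sketch illustrates this correspondence; the function names are hypothetical.

```python
def sr0_statistic(volumes, gamma):
    """SR_0 = (m - 1) V_1^gamma / (V_2^gamma + ... + V_m^gamma), V_1 being the largest."""
    powered = sorted((v ** gamma for v in volumes), reverse=True)
    m = len(powered)
    return (m - 1) * powered[0] / sum(powered[1:])


def sr_from_sr0(sr0, m):
    """Map SR_0 back to SR through R = R_0 / (m - 1 + R_0)."""
    return sr0 / (m - 1 + sr0)
```

For the volumes 5, 1, 2, 2 and γ = 1, for example, S R_{0} = 3 \cdot 5 / 5 = 3 and the map returns S R = 3 / 6 = 0.5, which coincides with 5 / 10 computed directly.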

Let l be a natural number,

1 ⩽ l < m

. It is worth noting that, unlike the test based on the statistic R, the test based on

R_{0}

can be modified for testing the hypothesis

H_{0}^{'}

: “the precipitation volumes

V_{i_{1}}, V_{i_{2}}, \dots, V_{i_{l}}

do not make an abnormally large cumulative contribution to the total precipitation volume

V_{1} + \dots + V_{m}

”. For this purpose denote

T_{l}^{γ} = V_{i_{1}}^{γ} + V_{i_{2}}^{γ} + \dots + V_{i_{l}}^{γ}, T^{γ} = V_{1}^{γ} + V_{2}^{γ} + \dots + V_{m}^{γ}

and consider the quantity

S R_{0}^{'} = \frac{(m - l) T_{l}^{γ}}{l (T^{γ} - T_{l}^{γ})} .

In the same way as it was done above, it is easy to verify that

S R_{0}^{'} \overset{d}{=} \frac{(m - l) G_{l r, 1}}{l G_{(m - l) r, 1}} \overset{d}{=} Q_{l r, (m - l) r} .

Let

ε \in (0, 1)

be a small number,

q_{l r, (m - l) r} (1 - ε)

be the

(1 - ε)

-quantile of the Snedecor–Fisher distribution with parameters

l r

and

k = (m - l) r

. If

S R_{0}^{'} > q_{l r, (m - l) r} (1 - ε)

, then the hypothesis

H_{0}^{'}

must be rejected, that is, the cumulative contribution of the precipitation volumes

V_{i_{1}}, V_{i_{2}}, \dots, V_{i_{l}}

into the total precipitation volume

V_{1} + \dots + V_{m}

must be regarded as abnormally large. Moreover, the probability of erroneous rejection of

H_{0}^{'}

is equal to

ε

.
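A hedged sketch of the group test follows. The statistic is computed as defined above, while the quantile of the Snedecor–Fisher distribution is approximated by Monte Carlo simulation of the ratio of two independent gamma r.v.’s, using the representation Q_{a, b} \overset{d}{=} (b / a) G_{a, 1} / G_{b, 1}. This replaces an exact quantile function by simulation purely for the sake of a self-contained example; the function names, the zero-based index convention and the default sample size are assumptions, and the shape parameters are assumed not to be so small that a simulated gamma value underflows to zero.

```python
import random


def group_statistic(volumes, indices, gamma):
    """SR_0' = (m - l) T_l^gamma / (l (T^gamma - T_l^gamma)) for the chosen indices."""
    powered = [v ** gamma for v in volumes]
    m, l = len(powered), len(indices)
    t_l = sum(powered[i] for i in indices)  # zero-based positions of the l volumes
    return (m - l) * t_l / (l * (sum(powered) - t_l))


def snedecor_fisher_quantile_mc(a, b, level, size=50000, seed=0):
    """Monte Carlo quantile of order `level` of Q_{a,b} =d (b / a) G_{a,1} / G_{b,1}."""
    rng = random.Random(seed)
    draws = sorted(
        (b / a) * rng.gammavariate(a, 1.0) / rng.gammavariate(b, 1.0)
        for _ in range(size)
    )
    return draws[min(size - 1, int(level * size))]
```

The hypothesis H_{0}^{'} would then be rejected at level ε if `group_statistic(volumes, indices, gamma)` exceeds `snedecor_fisher_quantile_mc(l * r, (m - l) * r, 1 - eps)`.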

7. Comparison of Tests for Anomalously Extreme Precipitation Volumes Based on Gamma and GG Distributions

In this section, the results of the application of the test based on the statistic R in Equation (29) to the analysis of the time series of daily precipitation observed in Potsdam and Elista from 1950 to 2007 are considered and compared with similar results for the case of gamma distributed total precipitation volumes during wet periods [2].

The results of the application of the tests for a total precipitation volume during one wet period to be abnormally large based on GG and gamma models in the moving mode are shown in Figure 7 and Figure 8 (Potsdam) and Figure 9 and Figure 10 (Elista).

Let m be the window width (the number of observations in a moving window). A fixed sample point falls into exactly m windows. One of the following cases can occur for a fixed observation:

Absolute (abs) extreme, if it is recognized as abnormally extreme in all m windows;
Intermediate (int) extreme, if it is recognized as abnormally extreme in at least half of the windows containing it;
Relative (rel) extreme, if it is recognized as abnormally extreme in at least one window;
Not extreme, if it is not recognized as abnormally extreme in any window.
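The four cases listed above can be expressed as a small classification routine. The sketch assumes that the per-window test decisions for a fixed observation are already available as a list of booleans; the function name and the returned labels are illustrative and are not taken from Algorithm A3.

```python
def classify_observation(flags):
    """Classify an observation from its per-window decisions.

    flags: one boolean per moving window containing the observation,
           True if the observation was recognized as abnormally extreme there.
    """
    if not flags:
        raise ValueError("at least one window is required")
    hits = sum(1 for f in flags if f)
    if hits == len(flags):
        return "abs"   # extreme in all windows
    if 2 * hits >= len(flags):
        return "int"   # extreme in at least half of the windows
    if hits >= 1:
        return "rel"   # extreme in at least one window
    return "none"      # not extreme in any window
```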

Algorithm A3 (see Appendix A) realizes the method based on the statistic R described above.

For clarity, in these figures the time horizon equals 90 and 360 days, and the significance level

α

of the tests is

0.01

. The absolutely, intermediate and relatively abnormal precipitation volumes are marked by downward-pointing triangles, circles and squares, respectively, for the test based on the gamma model, whereas the corresponding results of the test based on the statistic R for the GG distribution are marked by upward-pointing triangles, diamonds and right-pointing triangles, respectively. It is worth noting that MATLAB’s notation is used here for these markers.

Figure 7, Figure 8, Figure 9 and Figure 10 demonstrate non-trivial values of the parameter

γ

, that is,

γ \neq 1

. For Potsdam

γ = 1.286

, whereas for Elista

γ

equals

1.279

. At the same time, the results of the two methods are quite close, although the approach based on the GG distribution demonstrates a higher quality of determining potentially extreme observations. The same conclusions are valid for smaller window sizes.

8. Comparison of GG-Based Statistical Test and Peaks over Threshold Methodology for Extreme Precipitation Intensities

One important precipitation indicator is the precipitation intensity that is defined as the ratio of the total precipitation volume over a wet period to the duration of this wet period measured in days. The extreme precipitation volumes and intensities are relevant to various problems of climatology and hydrometeorology (see, for example, [44,45,46,47]). Traditionally, these phenomena are investigated for different geographical regions or countries [48,49]. In particular, the issue of determining threshold values, the excess of which leads to the extreme events, for example, in daily rainfalls or their intensities, is the key point of the study. Precipitation intensities are important not only for forecasting floods but also for solving problems such as runoff and soil erosion [50,51]. It can be explained by the contemporary climate change scenarios that predict a significant increase in the frequency of high intensity rainfall events, primarily in the dry areas. Moreover, precipitation can induce shallow landslides [52,53] and debris flows [54].

Statistical analysis of real data shows that the probability distribution of the precipitation intensity can be approximated by the gamma distribution with very high accuracy. In [55], some theoretical arguments were presented to justify the gamma model for the distribution of precipitation intensities. So, the statistical approach described in Section 6 and in [2] can also be used for identification of abnormally large intensities. For the analysis, the precipitation intensities in Potsdam and Elista (verified samples without missing values) are used as the initial data. This section presents a comparison of a non-parametric approach based on the extreme value theory as well as modified Peaks over Threshold (PoT) methodology [56] with the parametric approach that significantly involves testing parametric statistical hypotheses to determine extreme intensities of wet periods (see Section 6). The classical version of PoT [57] is quite popular for solving a wide range of climatic problems. In particular, the following results can be mentioned: the inverse Weibull distribution as an extreme wind speed model [58], time-dependent versions of the PoT model for severe storm waves [59] and daily temperatures [60], a probability model for rainfalls of high magnitude [61], and the analysis of precipitation extremes in a changing climate [62]. Most applications of the extreme value theory assume stationarity, but it is well known that real events are not stationary. So, the generalized results analogous to Theorem 7 are required. All the numerical methods are implemented as a MATLAB program. Algorithm A4 demonstrates the method based on the PoT and GG-test (see Appendix A).

Figure 11 and Figure 12 present the results obtained by the modified PoT algorithm in which the Weibull distribution is considered as the distribution of the time between extreme events. Starting from the maximum threshold value, which coincides with the maximum of the analyzed data, the hypothesis that the time intervals between successive exceedances of a given threshold have the Weibull distribution is tested. The corresponding P-value is saved, and the threshold is shifted down by a small step (in this case, 0.01). It is worth noting that a similar procedure was suggested in [56] for precipitation volumes under the assumption that the time intervals between excesses have the exponential distribution. For a given significance level (in both cases, α is chosen as 0.01), the corresponding hypothesis is not rejected for all thresholds for which the P-values are located to the right of the red vertical line in the upper graphs in Figure 11 and Figure 12. The lower graphs show the parameters of the fitted Weibull distribution.
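
The threshold sweep can be sketched in Python as follows. For brevity, the sketch uses the exponential variant of the procedure (as in [56]) rather than the Weibull one, and it assumes a Kolmogorov-Smirnov test with the asymptotic Kolmogorov distribution for the P-value. The goodness-of-fit test is not specified in the text, so this choice, the helper names and the simulated data are assumptions made only for illustration.

```python
import math
import random
from typing import List, Tuple


def kolmogorov_pvalue(d: float, n: int) -> float:
    """Asymptotic P-value of the Kolmogorov-Smirnov distance d for sample size n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # the series converges slowly here; the P-value is essentially 1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))


def exp_gap_pvalue(gaps: List[float]) -> float:
    """KS test of the gaps against an exponential law with the fitted mean."""
    n = len(gaps)
    mean = sum(gaps) / n
    d = 0.0
    for i, x in enumerate(sorted(gaps)):
        cdf = 1.0 - math.exp(-x / mean)
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return kolmogorov_pvalue(d, n)


def threshold_sweep(values: List[float], step: float = 0.01,
                    min_exceedances: int = 10) -> List[Tuple[float, float]]:
    """Lower the threshold from max(values) to 0 and record (threshold, P-value)."""
    results = []
    top = max(values)
    k = 0
    while top - k * step >= 0.0:
        level = top - k * step
        idx = [i for i, v in enumerate(values) if v > level]
        if len(idx) >= min_exceedances:
            # Time between successive exceedances, in units of sample index.
            gaps = [float(b - a) for a, b in zip(idx, idx[1:])]
            results.append((level, exp_gap_pvalue(gaps)))
        k += 1
    return results


random.seed(1)
demo = [random.gammavariate(2.0, 2.0) for _ in range(400)]  # simulated intensities
sweep = threshold_sweep(demo, step=0.5, min_exceedances=10)
for level, p in sweep[:3]:
    print(f"threshold={level:.2f}  P-value={p:.3f}")
```

Because the exponential mean is estimated from the same gaps and the gaps are integer-valued, the resulting P-values are only approximate.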

Figure 13 and Figure 14 compare the results of the test (see Section 6) for both the GG and the usual gamma distribution with those of the PoT method based on the exponential and Weibull distributions for the intensities in Potsdam and Elista. The following notation is used:

The thresholds with the indices low correspond to the minimum levels at which the hypothesis of exponential or Weibull distribution is not rejected (the lowest point to the right of the red line on the upper graphs in Figure 11 and Figure 12);
The thresholds with the indices maxval correspond to the maximum P-value (the rightmost point in the upper graphs);
The thresholds with the indices high correspond to the upper level, when the corresponding hypothesis is not rejected (the highest point to the right of the red line on the graphs).
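
Given pairs (threshold, P-value) collected during a threshold sweep, the three thresholds listed above can be picked out as in the following Python sketch. The function name `select_thresholds` and the sample values are hypothetical; non-rejection is taken to mean a P-value strictly greater than α.

```python
from typing import Dict, List, Optional, Tuple


def select_thresholds(sweep: List[Tuple[float, float]],
                      alpha: float = 0.01) -> Optional[Dict[str, float]]:
    """Return the 'low', 'maxval' and 'high' thresholds, or None if all are rejected."""
    accepted = [(t, p) for t, p in sweep if p > alpha]
    if not accepted:
        return None
    return {
        "low": min(t for t, _ in accepted),                 # lowest non-rejected level
        "maxval": max(accepted, key=lambda tp: tp[1])[0],   # level with the largest P-value
        "high": max(t for t, _ in accepted),                # highest non-rejected level
    }


# Illustrative (threshold, P-value) pairs.
pairs = [(5.0, 0.40), (4.0, 0.75), (3.0, 0.20), (2.0, 0.005), (1.0, 0.03)]
print(select_thresholds(pairs))
# {'low': 1.0, 'maxval': 4.0, 'high': 5.0}
```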

The green-filled downward-pointing triangles mark the intensities that are classified as absolutely abnormal based on the GG test (see Section 6). The black upward-pointing triangles correspond to the decision based on the classical gamma distribution test (that is, γ = 1, see (31)). The circles denote intermediate extreme observations, and the squares mark relatively extreme ones. This classification is described in Section 7. It is worth noting that for the GG test the value of γ is 1.0775 for Potsdam and 1.1257 for Elista.

For Potsdam, the results of the gamma and GG tests are good and close to each other. In addition, the PoT method is also effective when the threshold is chosen with the maximum P-value. However, for Elista, with fewer rainfalls of lower intensity, the results are quite different. Indeed, the decisions of the PoT method are close for the exponential and Weibull cases (the thresholds differ by only 0.29). However, a statistical test based on the gamma distribution identifies only four intensities as absolutely extreme, while the GG test identifies more absolute extremes, including those below the thresholds mentioned above.

9. Conclusions and Discussion

In this paper, asymptotic models for some precipitation characteristics based on GNB distributions were considered. In addition, a statistical test based on the GG distribution was proposed for determining the type of precipitation extremes. The GG and GNB distributions are not widely used, so methods for estimating their parameters are, as a rule, not implemented in standard statistical packages. Therefore, the implementation of appropriate procedures requires specialized software, for example, based on the functional approach, as was done in this study using the MATLAB programming language. However, as was demonstrated in the paper, such distributions fit real data better than conventional models. Therefore, for processing spatial meteorological data from a large number of stations, the proposed methods and models can be effectively implemented as high-performance computing services.

Author Contributions

Conceptualization, V.K., A.G.; formal analysis, V.K., A.G.; funding acquisition, A.G.; investigation, A.G., V.K.; methodology, V.K., A.G.; project administration, V.K., A.G.; resources, A.G.; software, A.G.; supervision, V.K.; validation, A.G.; visualization, A.G.; writing—original draft, V.K., A.G.; writing—review and editing, A.G., V.K. All authors have read and agreed to the published version of the manuscript.

Funding

The results presented in Section 8 were obtained by Andrey Gorshenin to meet the research goals of grant No 18-71-00156 of the Russian Science Foundation.

Acknowledgments

The authors thank the reviewers for their valuable comments, which helped to improve the presentation of the material.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Algorithms

This section presents the algorithms described above in the form of pseudo-code. All of these algorithms have been implemented in the MATLAB programming language without any tools specific to this development environment. They can also be implemented, for example, in the Python programming language with some minor changes.

Algorithm A1: GNB approximations

Load(Data); // Loading initial data (Potsdam, Elista)
α = 0.05; // Significance level
WP = WetPeriods(Data); // Finding wet periods in Data
r = NBfit(WP-1, α); // Finding parameter r
[$γ_{ℓ_1}$, $μ_{ℓ_1}$, $err_{ℓ_1}$] = GNBapprox($ℓ_{1}$, r); // GNB approximation based on $ℓ_{1}$-distance minimization
[$γ_{ℓ_2}$, $μ_{ℓ_2}$, $err_{ℓ_2}$] = GNBapprox($ℓ_{2}$, r); // GNB approximation based on $ℓ_{2}$-distance minimization
[$γ_{ℓ_\infty}$, $μ_{ℓ_\infty}$, $err_{ℓ_\infty}$] = GNBapprox($ℓ_{\infty}$, r); // GNB approximation based on $ℓ_{\infty}$-distance minimization
Histograms($γ_{ℓ_1}$, $μ_{ℓ_1}$, $err_{ℓ_1}$, $γ_{ℓ_2}$, $μ_{ℓ_2}$, $err_{ℓ_2}$, $γ_{ℓ_\infty}$, $μ_{ℓ_\infty}$, $err_{ℓ_\infty}$); // Plotting initial histograms and GNB approximations for all cases
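
The distance-based fitting step can be illustrated with a short Python sketch. It is not the GNB fit itself: it uses the negative binomial law, which is the special case of the GNB distribution with γ = 1, since the general GNB probabilities require numerical integration over the GG mixing density. The function names, the coarse parameter grid and the sample durations are assumptions made for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List


def nb_pmf(k: int, r: float, p: float) -> float:
    """P(N = k) for the negative binomial law with shape r > 0 and 0 < p < 1."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def empirical_freqs(durations: List[int]) -> List[float]:
    """Relative frequencies of (duration - 1), mirroring the shift WP-1 in Algorithm A1."""
    counts = Counter(d - 1 for d in durations)
    n = len(durations)
    return [counts.get(k, 0) / n for k in range(max(counts) + 1)]


def distances(freqs: List[float], model: List[float]) -> Dict[str, float]:
    """l1, l2 and l-infinity distances on the observed support (model tail ignored)."""
    diffs = [abs(f - m) for f, m in zip(freqs, model)]
    return {
        "l1": sum(diffs),
        "l2": math.sqrt(sum(d * d for d in diffs)),
        "linf": max(diffs),
    }


# Illustrative wet-period durations in days.
durations = [1, 1, 1, 1, 2, 2, 3, 4]
freqs = empirical_freqs(durations)


def l1_for(r: float, p: float) -> float:
    model = [nb_pmf(k, r, p) for k in range(len(freqs))]
    return distances(freqs, model)["l1"]


# Coarse grid search standing in for a proper numerical minimization.
grid = [(r, p) for r in (0.5, 1.0, 2.0) for p in (0.3, 0.5, 0.7)]
best = min(grid, key=lambda rp: l1_for(*rp))
print(best)  # (1.0, 0.5)
```

Replacing the key `"l1"` with `"l2"` or `"linf"` gives the other two variants of the minimization.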

Algorithm A2: Stabilization of averages
Load(Potsdam, Elista); // Loading initial data
m = 3000; // The value can be chosen empirically
DataPreprocessing(Potsdam, Elista);
[$β_{P}$, $a_{P}$] = StabParams(Potsdam); // Search for stabilization parameters using Formulas (24) and (25)
[$β_{E}$, $a_{E}$] = StabParams(Elista);
PlotAverages(Potsdam, Elista, $β_{P}$, $a_{P}$, $β_{E}$, $a_{E}$); // Drawing results

Algorithm A3: Statistical test for extreme volumes

α = 0.01; // Significance level
Load(Data); // Loading initial data (Potsdam, Elista)
window = DaysToObservations(Days); // Correspondence between astronomical time and sample elements
Vols = Volumes(Data); // Volumes obtained from raw data
γ = GammaFit(Vols, α); // Finding parameter γ
[$SR$, $SR_{GG}$] = GGstatistics(Vols, γ); // Finding values of statistics based on Formula (31)
[$Extr_{abs}$, $GGExtr_{abs}$, $Extr_{int}$, $GGExtr_{int}$, $Extr_{rel}$, $GGExtr_{rel}$] = GGtest($SR$, $SR_{GG}$);
PlotExtremes($Extr_{abs}$, $GGExtr_{abs}$, $Extr_{int}$, $GGExtr_{int}$, $Extr_{rel}$, $GGExtr_{rel}$); // Plotting initial data and decisions based on statistics $SR$ and $SR_{GG}$

Algorithm A4: PoT and GG-based test for intensities

α = 0.01; // Significance level
Load(Data); // Loading initial data (Potsdam, Elista)
Ints = Intensities(Data); // Intensities obtained from raw data
[$GGExtr_{abs}$, $GGExtr_{int}$, $GGExtr_{rel}$] = GGtest(Ints, $SR_{GG}$);
// Modified PoT
level = max(Ints); // Initial PoT level (threshold) equal to maximum data value
L = length(Ints);
k = 0;
while level ≥ 0 do // Determining the dependence of the level on the p-value
    I = find(Ints > level);
    if length(I) < MinNum then // Minimum sufficient number of elements in the sample exceeding the threshold
        level = level - step;
        continue;
    end if
    [ExpParam(k), ExpPval(k)] = fitExp(I, α);
    [WeibullParams(k), WeibullPval(k)] = fitWeibull(I, α);
    k++;
    level = level - step;
end while
PlotExtremes($GGExtr_{abs}$, $GGExtr_{int}$, $GGExtr_{rel}$); // Plotting intensity extremes
PlotThresholds( ); // Plotting thresholds

References

Korolev, V.; Gorshenin, A.; Gulev, S.; Belyaev, K.; Grusho, A. Statistical Analysis of Precipitation Events. AIP Conf. Proc. 2017, 1863, 090011.
Korolev, V.; Gorshenin, A.; Belyaev, K. Statistical tests for extreme precipitation volumes. Mathematics 2019, 7, 648.
Korolev, V.; Zeifman, A. Generalized negative binomial distributions as mixed geometric laws and related limit theorems. Lith. Math. J. 2019, 59, 366–388.
Zolina, O.; Simmer, C.; Belyaev, K.; Kapala, A.; Gulev, S.; Koltermann, P. Changes in the duration of European wet and dry spells during the last 60 years. J. Clim. 2013, 26, 2022–2047.
Bening, V.; Korolev, V. Generalized Poisson Models and Their Applications in Insurance and Finance; VSP: Utrecht, The Netherlands, 2002.
Gnedenko, B.; Korolev, V. Random Summation: Limit Theorems and Applications; CRC Press: Boca Raton, FL, USA, 1996.
Grandell, J. Doubly Stochastic Poisson Processes; Lecture Notes Mathematics; Springer: Berlin/Heidelberg, Germany; New York, NY, USA, 1976; Volume 529.
Grandell, J. Mixed Poisson Processes; Chapman and Hall: London, UK, 1997.
Korolev, V.; Chertok, A.; Korchagin, A.; Zeifman, A. Modeling high-frequency order flow imbalance by functional limit theorems for two-sided risk processes. Appl. Math. Comput. 2015, 253, 224–241.
Korolev, V.; Skvortsova, N. Stochastic Models of Structural Plasma Turbulence; VSP: Utrecht, The Netherlands, 2006.
Stacy, E. A generalization of the gamma distribution. Ann. Math. Stat. 1962, 33, 1187–1192.
Greenwood, M.; Yule, G. An inquiry into the nature of frequency-distributions of multiple happenings, etc. J. R. Stat. Soc. 1920, 83, 255–279.
Holla, M. On a Poisson-inverse Gaussian distribution. Metrika 1967, 11, 115–121.
Sichel, H. On a family of discrete distributions particularly suited to represent long tailed frequency data. In Proceedings of the 3rd Symposium on Mathematical Statistics, Pretoria, South Africa, 19–22 July 1971; pp. 51–97.
Korolev, V.; Korchagin, A.; Zeifman, A. Poisson theorem for the scheme of Bernoulli trials with random probability of success and a discrete analog of the Weibull distribution. Informat. Appl. 2016, 10, 11–20.
Steutel, F.; van Harn, K. Infinite Divisibility of Probability Distributions on the Real Line; Marcel Dekker: New York, NY, USA, 2004.
Korolev, V. Limit distributions for doubly stochastically rarefied renewal processes and their properties. Theory Probab. Appl. 2016, 61, 753–773.
Korolev, V.; Korchagin, A.; Zeifman, A. On doubly stochastic rarefaction of renewal processes. AIP Conf. Proc. 2017, 1863, 090010.
Zaks, L.; Korolev, V. Generalized variance gamma distributions as limit laws for random sums. Informat. Appl. 2013, 7, 105–115.
Korolev, V.; Bening, V.; Shorgin, S. Mathematical Foundations of Risk Theory, 2nd ed.; FIZMATLIT: Moscow, Russia, 2011.
Zolotarev, V. One-Dimensional Stable Distributions; American Mathematical Society: Providence, RI, USA, 1986.
Schneider, W. Stable Distributions: Fox Function Representation and Generalization; Springer: Berlin, Germany, 1986; pp. 497–511.
Uchaikin, V.; Zolotarev, V. Chance and Stability: Stable Distributions and Their Applications; VSP: Utrecht, The Netherlands, 1999.
Shanbhag, D.; Sreehari, M. On certain self-decomposable distributions. Z. Für Wahrscheinlichkeitstheorie Und Verwandte Geb. 1977, 38, 217–222.
Korolev, V. Product representations for random variables with the Weibull distributions and their applications. J. Math. Sci. 2016, 218, 298–313.
Gleser, L. The gamma distribution as a mixture of exponential distributions. Am. Stat. 1989, 43, 115–117.
Korolev, V. On convergence of distributions of compound Cox processes to stable laws. Theory Probab. Appl. 1999, 43, 644–650.
Korolev, V. Convergence of random sequences with independent random indexes. I. Theory Probab. Appl. 1994, 39, 313–333.
Korolev, V. Convergence of random sequences with independent random indexes. II. Theory Probab. Appl. 1995, 40, 770–772.
Korolev, V.; Gorshenin, A.; Sokolov, I. Max-compound Cox processes. I. J. Math. Sci. 2019, 237, 789–803.
Korolev, V.; Gorshenin, A. The probability distribution of extreme precipitation. Dokl. Earth Sci. 2017, 477, 1461–1466.
Gorshenin, A.; Korolev, V. Scale mixtures of Frechet distributions as asymptotic approximations of extreme precipitation. J. Math. Sci. 2018, 234, 886–903.
Johnson, N.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley: New York, NY, USA, 1995; Volume 2.
Goldie, C. A class of infinitely divisible distributions. Math. Proc. Camb. Philos. Soc. 1967, 63, 1141–1143.
Korolev, V.; Gorshenin, A.; Sokolov, I. Max-compound Cox processes. III. arXiv 2020, arXiv:1912.02237v2.
Galambos, J. The Asymptotic Theory of Extreme Order Statistics; Wiley: New York, NY, USA, 1978.
Rényi, A. A Poisson-folyamat egy jellemzese. Magy. Tud. Acad. Mat. Kut. Int. Közl. 1956, 1, 519–527.
Rényi, A. On an extremal property of the Poisson process. Ann. Inst. Stat. Math. 1964, 16, 129–133.
Kalashnikov, V. Geometric Sums: Bounds for Rare Events with Applications; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1997.
Zolotarev, V. Approximation of Distributions of Sums of Independent Random Variables with Values in Infinite-Dimensional Spaces. Theory Probab. Appl. 1976, 21, 721–737.
Zolotarev, V. Ideal Metrics in the Problem of Approximating Distributions of Sums of Independent Random Variables. Theory Probab. Appl. 1977, 22, 433–449.
Zolotarev, V. Modern Theory of Summation of Random Variables; VSP: Utrecht, The Netherlands, 1997.
Korolev, V.; Zeifman, A. Bounds for convergence rate in laws of large numbers for mixed Poisson random sums. arXiv 2020, arXiv:2003.12495.
Cheng, L.; AghaKouchak, A. Nonstationary Precipitation Intensity-Duration-Frequency Curves for Infrastructure Design in a Changing Climate. Sci. Rep. 2014, 4, 7093.
Wasko, C.; Sharma, A. Steeper temporal distribution of rain intensity at higher temperatures within Australian storms. Nat. Geosci. 2015, 8, 527–U166.
Mo, C.; Ruan, Y.; He, J.; Jin, J.; Liu, P.; Sun, G. Frequency analysis of precipitation extremes under climate change. Int. J. Climatol. 2019, 39, 1373–1387.
Donat, M.; Angelil, O.; Ukkola, A. Intensification of precipitation extremes in the world’s humid and water-limited regions. Environ. Res. Lett. 2019, 14.
Groisman, P.; Knight, R.; Karl, T. Changes in Intense Precipitation over the Central United States. J. Hydrometeorol. 2012, 13, 47–66.
Xu, C.; Qiao, Y.; Jian, M. Interdecadal Change in the Intensity of Interannual Variation of Spring Precipitation over Southern China and Possible Reasons. J. Clim. 2019, 32, 5865–5881.
Ziadat, F.; Taimeh, A. Effect of Rainfall Intensity, Slope, Land Use and Antecedent Soil Moisture on Soil Erosion in an Arid Environment. Land Degrad. Dev. 2013, 24, 582–590.
Jomaa, S.; Barry, D.; Rode, M.; Sander, G.; Parlange, J. Linear scaling of precipitation-driven soil erosion in laboratory flumes. Catena 2017, 152, 285–291.
Bezak, N.; Auflic, M.J.; Mikos, M. Application of hydrological modelling for temporal prediction of rainfall-induced shallow landslides. Landslides 2019, 16, 1273–1283.
Bliznak, V.; Kaspar, M.; Muller, M.; Zacharov, P. Sub-daily temporal reconstruction of extreme precipitation events using NWP model simulations. Atmos. Res. 2019, 224, 65–80.
Huang, W.; Nychka, D.; Zhang, H. Estimating precipitation extremes using the log-histospline. Environmetrics 2019, 30.
Martinez-Villalobos, C.; Neelin, J. Why Do Precipitation Intensities Tend to Follow Gamma Distributions? J. Atmos. Sci. 2019, 76, 3611–3631.
Gorshenin, A.; Korolev, V. Determining the extremes of precipitation volumes based on a modified “Peaks over Threshold”. Inform. I Ee Primen. 2018, 12, 16–24.
Leadbetter, M. On a basis for “Peaks over Threshold” modeling. Stat. Probab. Lett. 1991, 12, 357–362.
Simiu, E.; Heckert, N. Extreme wind distribution tails: A “peaks over threshold” approach. J. Struct. Eng.-ASCE 1996, 122, 539–547.
Mendez, F.; Menendez, M.; Luceno, A.; Losada, I. Estimation of the long-term variability of extreme significant wave height using a time-dependent Peak Over Threshold (POT) model. J. Geophys. Res.-Ocean. 2006, 111, C07024.
Kyselỳ, J.; Picek, J.; Beranova, R. Estimating extremes in climate change simulations using the peaks-over-threshold method with a non-stationary threshold. Glob. Planet. Chang. 2010, 72, 55–68.
Begueria, S.; Angulo-Martinez, M.; Vicente-Serrano, S.; Lopez-Moreno, I.; El-Kenawy, A. Assessing trends in extreme precipitation events intensity and magnitude using non-stationary peaks-over-threshold analysis: A case study in northeast Spain from 1930 to 2006. Int. J. Climatol. 2011, 31, 2102–2114.
Roth, M.; Buishand, T.; Jongbloed, G.; Tank, A.; van Zanten, J. A regional peaks-over-threshold model in a nonstationary climate. Water Resour. Res. 2012, 48, W11533.

Figure 1. The histograms constructed from real data of 3320 wet periods in Potsdam and the fitted negative binomial (NB) and generalized negative binomial (GNB) models, $ℓ_{1}$-distance minimization.

Figure 2. The histograms constructed from real data of 2937 wet periods in Elista and the fitted NB and GNB models, $ℓ_{1}$-distance minimization.

Figure 3. The histogram of daily precipitation volumes in Potsdam and the fitted Pareto and gamma distributions.

Figure 4. The histogram of daily precipitation volumes in Elista and the fitted Pareto and gamma distributions.

Figure 5. Stabilization of the cumulative averages of daily precipitation volumes as n grows in Potsdam (solid line) and Elista (dashed line).

Figure 6. Stabilization of the cumulative averages of daily precipitation volumes as n grows with β = 1.139 for Potsdam (solid line) and with β = 0.981 for Elista (dashed line).

Figure 7. Abnormal precipitation volumes (Potsdam, 90 days).

Figure 8. Abnormal precipitation volumes (Potsdam, 360 days).

Figure 9. Abnormal precipitation volumes (Elista, 90 days).

Figure 10. Abnormal precipitation volumes (Elista, 360 days).

Figure 11. P-values that correspond to the various thresholds (Potsdam).

Figure 12. P-values that correspond to the various thresholds (Elista).

Figure 13. Comparison of statistical tests and peaks over threshold methodology for extreme precipitation intensities (Potsdam).

Figure 14. Comparison of statistical tests and peaks over threshold methodology for extreme precipitation intensities (Elista).

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

