Distinguishing Log-Concavity from Heavy Tails

Søren Asmussen; Jaakko Lehtomaa

doi:10.3390/risks5010010

Abstract

Well-behaved densities are typically log-convex with heavy tails and log-concave with light ones. We discuss a benchmark for distinguishing between the two cases, based on the observation that large values of a sum

X_{1} + X_{2}

occur as a result of a single big jump with heavy tails, whereas

X_{1}, X_{2}

are of equal order of magnitude in the light-tailed case. The method is based on the ratio

| X_{1} - X_{2} | / (X_{1} + X_{2})

, for which sharp asymptotic results are presented as well as a visual tool for distinguishing between the two cases. The study supplements modern non-parametric density estimation methods where log-concavity plays a main role, as well as heavy-tailed diagnostics such as the mean excess plot.

Keywords:

heavy-tailed; log-concave; mean excess function; principle of a single big jump

MSC:

60E05; 60G50; 62G20

1. Introduction

General interest towards non-parametric thinking has increased over the last few years. One example is density estimation under shape constraints instead of requiring the membership of a parametric family. Here, a particularly robust alternative to parametric tests is provided by searching for the best fitting log-concave density. Another example is the mean excess plot that aims at distinguishing light and heavy tails.

Throughout the paper, we consider i.i.d. random variables

X, X_{1}, X_{2}, \dots > 0

with common distribution F having density f and tail

\bar{F} (x) = P (X > x)

. Then, X is (right) heavy-tailed if

E e^{s X} = \infty

for all

s > 0

and light-tailed otherwise. The density f is log-concave, if

f (x) = e^{ϕ (x)}

, where

ϕ

is a concave function. If

ϕ

is convex, then f is log-convex. This paper aims to illustrate that light-tailed asymptotic behaviour is associated with log-concave densities. Likewise, log-convexity seems to be connected to heavy-tailed behaviour. One can use the connection to assess potential heavy-tailedness by searching for patterns that are typically present among distributions with log-concave or log-convex densities.

Log-concavity is a widely studied topic in its own right [1,2]. There also exists substantial literature regarding its connections to probability theory and statistics [3,4]. Several papers concentrate on the statistical estimation of density functions assuming log-concavity [5,6]. This is because log-concavity provides desirable statistical properties for estimators. For instance, maximum likelihood estimation becomes applicable and the estimate is unique. The topic is discussed in detail at the beginning of [7]. Unfortunately, much less emphasis seems to have been put on verifying the log-concavity property itself. Specifically, the question of whether a given sample could plausibly have been generated by a log-concave distribution seems to have been studied relatively little; see, for example, [8,9].

A distribution with a log-concave density f is necessarily light-tailed. In contrast, f is log-convex in the tail in the standard examples of heavy tails such as regular variation, the lognormal distribution and Weibull case

\bar{F} (x) = e^{- x^{α}}

with

α < 1

. An important class of heavy-tailed distributions are the subexponential ones defined by

P (X_{1} + X_{2} > d) \sim 2 \bar{F} (d)

. The intuition underlying this definition is the principle of a single big jump:

X_{1} + X_{2}

is large if one of

X_{1}, X_{2}

is large, whereas the other remains typical. This then motivates

R = \frac{| X_{1} - X_{2} |}{X_{1} + X_{2}}

(1)

being close to 1. In contrast, the folklore is that

X_{1}, X_{2}

contribute equally to

X_{1} + X_{2}

with light tails. We are not aware of general rigorous formulations of this principle, but it is easily verified in explicit examples like a gamma or normal F (see further below) and, for a large number of summands rather than just 2, it is supported by conditioned limit theorems (see, e.g., [10] (VI.5)). However, it was recently shown in [4] that these properties of R hold in greater generality and that asymptotic properties of the corresponding conditioned random variable

Y_{d} = R | X_{1} + X_{2} > d

(2)

provide a sharp borderline between log-convexity and log-concavity.

In this paper, we provide a wider perspective in terms of both sharper and more general limit results and of the usefulness for visual statistical data exploration. To this end, we propose a feature based nonparametric test. It can be used as a visual aid in identification of log-concavity or heavy-tailed behaviour. It complements earlier ways to detect signs of heavy-tailedness such as the mean excess plot [11]. Further tests based on probabilistic features have been previously utilised in e.g., [12,13,14].

2. Background

A property holds eventually, if there exists a number

y_{0}

so that the property holds in the set

[y_{0}, \infty)

. Standard asymptotic notation is used for limiting statements. These and basic properties of regularly varying functions with parameter α, denoted RV(α), can be recalled from e.g., [15].

We note that the principle of a single big jump relates to the fact that joint distributions of independent random variables concentrate probability mass to different regions. For example, a distribution with tail function

\bar{F} (x) = e^{- x^{α}}

satisfies

\bar{F} (x) = o (\bar{F} {(x / 2)}^{2})

for

α > 1

and

\bar{F} {(x / 2)}^{2} = o (\bar{F} (x))

for

α \in (0, 1)

, as

x \to \infty

. We refer to [16,17,18,19] for related work in this direction. It is shown in Lemma 1.2 of [4] that log-concavity or log-convexity of the density is closely related to the occurrence of the principle of a single big jump. A further observation in this direction is the following lemma. It states that contour lines of joint densities of independent variables behave differently for log-concave and log-convex densities, and thereby leads naturally to different concentrations of probability mass of joint densities (recall that a contour line corresponding to a value

p \in R

of joint density

f : R^{2} \to R

is the set of points in the plane defined as

{(x, y) \in R^{2} : f (x, y) = p}

).
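As a brief check of the two tail relations stated above for the Weibull-type tail (an illustration added here, using only the tail function already given), note that

\bar{F} {(x / 2)}^{2} = e^{- 2^{1 - α} x^{α}}, \qquad \frac{\bar{F} (x)}{\bar{F} {(x / 2)}^{2}} = e^{- (1 - 2^{1 - α}) x^{α}} .

For α > 1, one has 2^{1 - α} < 1, so the ratio tends to 0 as x grows. For α in (0, 1), one has 2^{1 - α} > 1, so the ratio tends to infinity, which is the second relation.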

Lemma 1.

Suppose

X_{1}

and

X_{2}

are i.i.d. unbounded non-negative random variables. Assume further that they have a common twice differentiable density function f of the form

f (x) = e^{- h (x)},

where h is a strictly increasing function.

If f is log-concave (log-convex), then, for any fixed

p \in (0, e^{- h (0)})

, there exists a concave (convex) function

ψ_{p}

defining a contour line of

f_{X_{1}, X_{2}}

corresponding to p such that

f_{X_{1}, X_{2}} (x, ψ_{p} (x)) = p

for all

x \in [0, h^{- 1} (- log p - h (0))]

.

Lemma 1 implies that log-convex and log-concave densities cause maximal points of joint densities to accumulate into different regions in the plane. Log-convex densities tend to put probability mass near the axes, while log-concave densities have a tendency to concentrate mass near the graph of the identity function. The exponential density is the limiting case where all contour lines are straight lines. More generally, for

f_{α} (x) = C_{α} e^{- x^{α}}

, where

C_{α} > 0

is a normalising constant, the contour lines are circles for

α = 2

, straight lines for

α = 1

, and parabolas for

α = 1 / 2

.
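The curvature of these contour lines can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper names `contour` and `second_differences` are hypothetical, and the constant c stands for -log(p / C_α^2), so that the contour of the joint density is the set where x^α + y^α = c.

```python
def contour(x, alpha, c):
    """Solve x**alpha + y**alpha = c for y (hypothetical helper)."""
    return (c - x ** alpha) ** (1.0 / alpha)


def second_differences(alpha, c=1.0, steps=50):
    """Discrete second differences of the contour on an interior grid.

    Positive values indicate a convex contour, negative values a concave one.
    """
    x_max = c ** (1.0 / alpha)      # point where the contour meets the x-axis
    h = x_max / (steps + 2)         # grid step; keeps x + h strictly below x_max
    xs = [h * (i + 1) for i in range(steps)]
    return [
        contour(x - h, alpha, c) - 2.0 * contour(x, alpha, c) + contour(x + h, alpha, c)
        for x in xs
    ]
```

Under these assumptions, the second differences are positive for α = 1/2 (a convex contour), negative for α = 2 (a concave contour) and zero up to rounding for α = 1.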

3. Theoretical Results

The emphasis of the paper is on the mathematical formulation of the connection between log-convexity and the principle of a single big jump. However, some additional theoretical results are provided concerning convergence rates of the conditional ratio defined in Equation (3). These rates, or estimates for the rates, are obtained in some standard distribution classes. Their proofs are mainly based on sharp asymptotics of subexponential distributions obtained in [20,21,22]. Recall that some main classes of such distributions are RV

(α)

, meaning regularly varying ones, where

\bar{F} (x) = L (x) / x^{α}

with

α > 0

and

L (\cdot)

is slowly varying; Weibull tails with

\bar{F} (x) = e^{- x^{α}}

for some

α \in (0, 1)

, and lognormal tails which are close to the case

γ = 2

of

\bar{F} (x) = e^{- {log}^{γ} x}

for

x \geq 1

and some

γ > 1

; we refer in the following to this class as lognormal type tails.

3.1. Convergence Properties

Define the function

g : (0, \infty) \to [0, 1]

by

g_{X} (d) = g (d) = E [\frac{| X_{1} - X_{2} |}{X_{1} + X_{2}} | X_{1} + X_{2} > d] .

(3)

It can be viewed as a generalisation of the function

f_{Z_{d}}

considered in [4] and has the same interpretation as in the case with densities: if both

X_{1}

and

X_{2}

contribute equally to the sum

X_{1} + X_{2}

, then g should eventually take values close to 0; similarly, if only one of the variables tends to be of the same magnitude as the whole sum, then g is close to 1 for large d. Note also that g is scale independent in the sense that

g_{a X} (d) = g_{X} (d / a)

for all

a > 0

. Due to this property, two or more samples can be standardised to have, say, equal means in order to obtain graphs on the same scale.

In Proposition 1, sharp asymptotic forms of g are exhibited in some classes of distributions.

Proposition 1.

The following convergence rates hold for g defined in Equation (3).

Let X be RV(α) with α > 1 and eventually decreasing density f. Then,

g (d) = 1 - \frac{c}{d} + o (1 / d), where c = \frac{2 α E [X]}{α + 1} .

Let X be Weibull distributed with α < 1. Then,

g (d) = 1 - o (d^{α - 1}) .

(4)

Let X be of lognormal type. Then,

g (d) = 1 - o ({log}^{γ - 1} d / d) .

(5)

Remark 1.

In the case of Weibull and lognormal distributions, the implication is that

g (d)

converges to 1 faster than the associated hazard rates tend to zero. In addition, inspection of the proof shows

\underset{d \to \infty}{lim inf} d |g (d) - 1| > 0 .

This implies that the actual convergence cannot be substantially faster than in the regularly varying case, where the leading term is explicitly identified.

The light-tailed case appears to be more difficult to study than the heavy-tailed case. Difficulty arises mainly from the lack of good asymptotic approximations for probabilities of the form

P (X_{1} + X_{2} > d)

when

P (X_{1} > d)

decays much faster than

e^{- d}

. Interestingly, the full asymptotic form of g can be recovered in the special case of the normal distribution if we allow X to obtain negative values.

Proposition 2.

Suppose that X is normally distributed with

E [X] = 0

and

\operatorname{Var} [X] = 1 / \sqrt{2}

. Then,

g (d) = \frac{c}{d} + o (1 / d), where c = E [| X_{1} - X_{2} |] .

(6)

The following theorem can be used to assess whether a sample comes from a source with a log-concave density. It can be seen as a natural continuation as well as a generalisation of [4].

Theorem 1.

Assume the density f is twice differentiable and eventually log-concave. Then,

\underset{d \to \infty}{lim sup} g (d) \leq \frac{1}{2} .

(7)

Similarly, if f is eventually log-convex, then

\underset{d \to \infty}{lim inf} g (d) \geq \frac{1}{2} .

(8)
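The constant 1/2 in Theorem 1 is attained in the exponential case, whose density is both log-concave and log-convex. The following worked computation is an illustration added under that assumption. For f (x) = λ e^{- λ x}, the product f (y) f (z - y) = λ^{2} e^{- λ z} does not depend on y in [0, z], so that, conditionally on X_{1} + X_{2} = z, the ratio X_{1} / z is uniformly distributed on [0, 1]. Hence,

E [R | X_{1} + X_{2} = z] = \int_{0}^{1} | 1 - 2 s | d s = \frac{1}{2}

for every z > 0, and therefore g (d) = 1 / 2 for all d > 0.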

4. Statistical Application: Visual Test

Suppose

(X_{1}, Y_{1}), (X_{2}, Y_{2}), \dots

is a sequence of i.i.d. vectors whose components are also i.i.d. One can formulate the empirical counterpart of Quantity (3) by setting

\hat{g} (d, n) = \frac{\sum_{k = 1}^{n} R_{k} 1 (X_{k} + Y_{k} > d)}{\sum_{k = 1}^{n} 1 (X_{k} + Y_{k} > d)},

(9)

where

R_{k} = \frac{| X_{k} - Y_{k} |}{X_{k} + Y_{k}},

and

1 (A)

is the indicator function of the event A.
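Quantity (9) translates directly into code. The following is a minimal sketch in Python, assuming positive data and a threshold d ≥ 0; the function name `g_hat` and the convention of returning `None` when no pair exceeds the threshold are choices made here for illustration.

```python
def g_hat(pairs, d):
    """Empirical version of g: mean of |x - y| / (x + y) over pairs with x + y > d.

    Returns None when no pair exceeds d, since the ratio in (9) is then 0/0.
    Assumes d >= 0, so that x + y > 0 for every pair that is used.
    """
    ratios = [abs(x - y) / (x + y) for x, y in pairs if x + y > d]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)
```

For example, `g_hat([(1.0, 3.0), (2.0, 2.0), (0.5, 0.5)], 2.5)` uses the first two pairs only, with ratios 0.5 and 0, and returns 0.25.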

Remark 2.

Equation (9) requires as input a two-dimensional sequence of random variables. One can form such a sequence from a real valued i.i.d. source

Z_{1}, Z_{2}, \dots, Z_{N}

using any pairing of the

Z_{i}

. Obvious examples are to take

X_{k} = Z_{2 k - 1}, Y_{k} = Z_{2 k}

, or to take the set

{(X_{k}, Y_{k})}

as all pairings of the

Z_{k}

or as a randomly sampled subset of these

N (N - 1) / 2

pairings. If the data is truly i.i.d., this should not have any effect on the outcome.
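The three pairing schemes mentioned in Remark 2 can be sketched as follows. This is an illustrative sketch using the Python standard library; the function names are hypothetical. Note that pairs which reuse the same observation are not independent of each other.

```python
import itertools
import random


def consecutive_pairs(z):
    """(Z_1, Z_2), (Z_3, Z_4), ...; a trailing unpaired observation is dropped."""
    return [(z[2 * k], z[2 * k + 1]) for k in range(len(z) // 2)]


def all_pairs(z):
    """All N (N - 1) / 2 unordered pairs of distinct observations."""
    return list(itertools.combinations(z, 2))


def random_pairs(z, m, seed=None):
    """A random subset of m of the N (N - 1) / 2 pairs, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(all_pairs(z), m)
```

For a sample of size N = 5, `consecutive_pairs` gives 2 pairs and `all_pairs` gives 10.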

4.1. Examples and Applications

A graph of

\hat{g} (d, n)

as function of d can be used to determine if the data support the density being log-concave or light-tailed behaviour. According to Theorem 1, the graph should then stay below

1 / 2

. Figure 1, Figure 2, Figure 3 and Figure 4 illustrate such graphs using experimental data.

Figure 1. Graphs of

\hat{g} (d, n)

for

n = 10,000

for Gamma distributed random variables with shapes

0.2, 1

and 5 in figures (a), (b) and (c), respectively. All variables are standardised to have mean 3.

Figure 2. Graphs of

\hat{g} (d, n)

for

n = 10,000

for Weibull distributed random variables with shapes

0.2, 1

and 5 in figures (a), (b) and (c), respectively. All variables are standardised to have mean 3.

Figure 3. Graph of

\hat{g} (d, n)

from a classical set of Danish fire insurance data that can be obtained for instance from data set ‘danish’ in the R package [23]. The data is scaled to have mean 1. The sample is traditionally used to illustrate how heavy-tailed data behaves. A similar set of data was previously used in [24]. The graph supports the usual finding that the data set is heavy-tailed.

Figure 4. The graphs of multiple versions of

\hat{g} (d, n)

based on a dataset obtained from Hansjörg Albrecher (private communication) and related to occurrences of floods in a particular area. The data is scaled to have mean 1. The sample size is

n = 39

. Bivariate vectors

(X_{1}, Y_{1}), \dots, (X_{19}, Y_{19})

were sampled several times randomly without replacement from the original data. The overall appearance of the paths points to the data being heavy- rather than light-tailed.

The test method is visual. A similar idea underlies the classical mean excess plot, where one visually assesses whether the mean excess over a level is increasing in the level, as is the case for heavy tails.
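For comparison, the quantity plotted in a mean excess plot is the empirical version of E [X - u | X > u]. A minimal sketch, with the hypothetical function name `mean_excess`, is:

```python
def mean_excess(sample, u):
    """Empirical mean excess over the level u: average of (x - u) over x > u.

    Returns None when no observation exceeds u.
    """
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)
```

Evaluating this function over a grid of levels u and plotting the result against u gives the mean excess plot.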

4.2. Finer Diagnostics

The idea of plotting

\hat{g} (d, n)

as a function of d was introduced as a graphical test for distinguishing between heavy-tailed and log-concave light tailed distributions. It seems reasonable to ask if the plot can be used for finer diagnostics, in particular to further specify the tail behaviour of F when F was found to be heavy-tailed. Such an idea would be based on the rate of convergence of

\hat{g} (d, n)

to 1.

To gain some preliminary insight, we simulated

R = 5000 = 5 \cdot 10^{3}

and

R = 5 \cdot 10^{6}

i.i.d. pairs of random variables

X_{1}, X_{2}

from an F which was either RV(1.5), lognormal(0,1) or Weibull(0.4). The results are in Figure 5 as plots of

d (1 - \hat{g} (d, n))

, with three runs in each subfigure.
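A simulation of this kind can be sketched as follows. The code is an illustration only and not the code used for the figures: it assumes the Pareto parametrisation P (X > x) = (1 + x)^{-α} for the RV case, and the names `sample_pareto` and `scaled_gap` are hypothetical.

```python
import random


def sample_pareto(alpha, rng):
    """Inverse-transform draw with tail P(X > x) = (1 + x) ** (-alpha), x >= 0."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
    return u ** (-1.0 / alpha) - 1.0


def scaled_gap(alpha, d, n_pairs, seed=0):
    """Monte Carlo estimate of d * (1 - g_hat(d, n)) from n_pairs simulated pairs.

    Returns None when no simulated pair has a sum exceeding d.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_pairs):
        x, y = sample_pareto(alpha, rng), sample_pareto(alpha, rng)
        if x + y > d:
            total += abs(x - y) / (x + y)
            count += 1
    if count == 0:
        return None
    return d * (1.0 - total / count)
```

Calling `scaled_gap(1.5, d, n_pairs)` over a grid of thresholds d gives one path of the kind plotted; the number of pairs exceeding d shrinks quickly as d grows, which is the source of the fluctuations.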

Figure 5.

d (1 - \hat{g} (d, n))

. Pareto in the first row, lognormal in the second, and Weibull in the last.

R = 5000

(left),

R = 5 \times 10^{6}

(right).

A first conclusion is that a sample size of

R = 5000

is grossly insufficient for drawing conclusions about the way in which

\hat{g} (d, n)

approaches 1—random fluctuations take over long before a systematic trend is apparent. The sample size

R = 5 \cdot 10^{6}

is presumably unrealistic in most cases, but even for this, the picture is only clear in the RV case. Here,

d (1 - \hat{g} (d, n))

seems to have a limit c, as it should, and the plot is in good agreement with the value 2.4 =

2 α E [X] / (α + 1)

of c predicted by Proposition 1.
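The value 2.4 can be reproduced with a short computation. The sketch below assumes the Pareto parametrisation P (X > x) = (1 + x)^{-α}, for which E [X] = 1 / (α - 1); this parametrisation is an assumption made here, chosen because it is consistent with the quoted value.

```python
def rv_limit_constant(alpha, mean):
    """c = 2 * alpha * E[X] / (alpha + 1), the constant in Proposition 1 (RV case)."""
    return 2.0 * alpha * mean / (alpha + 1.0)


alpha = 1.5
mean = 1.0 / (alpha - 1.0)  # E[X] under the assumed tail (1 + x) ** (-alpha)
c = rv_limit_constant(alpha, mean)
```

With α = 1.5, the mean is 2 and the constant is 2 · 1.5 · 2 / 2.5 = 2.4.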

Whether a limit exists in the lognormal or Weibull case is less clear. The results of Proposition 1 are less definite here, but, actually, a heuristic argument suggests that the limit c should exist and be

2 E [X]

. To this end, let

X_{(1)} < X_{(2)}

be the order statistics. According to subexponential theory (see, in particular, [25]),

X_{(1)}, X_{(2)}

are asymptotically independent given

X_{1} + X_{2} > d

, with

X_{(1)}

having asymptotic distribution F and

X_{(2)}

being of the form

d + e (d) E

with

e (d)

, and E as in the proof of Proposition 1. For large d, this gives

\frac{| X_{1} - X_{2} |}{X_{1} + X_{2}} = \frac{X_{(2)} - X_{(1)}}{X_{(2)} + X_{(1)}} = \frac{1 - X_{(1)} / X_{(2)}}{1 + X_{(1)} / X_{(2)}} \sim 1 - 2 X_{(1)} / X_{(2)} .

(10)

In the lognormal or Weibull case, one has

e (d) = o (d)

and so

d + e (d) E \sim d

. Taking expectations gives the conjecture.

For the Weibull,

2 E [X] = 6.65

, and the

R = 5 \cdot 10^{6}

part of Figure 5 is rather inconclusive concerning the conjecture. We did one more run with

R = 5 \cdot 10^{7}

over a range of parameters (the variance

σ^{2}

of

log X

for the lognormal and β for the Weibull). All

X_{1}, X_{2}

were normalized to have mean 1 so that the conjecture would assert convergence to

c = 2

. This is not seen in the results in Figure 6. Large values of

σ^{2}

and small values of β could appear to give convergence, but not to 2.
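The Weibull constant quoted above, and the normalisation to mean 1, follow from the standard formula E [X] = Γ (1 + 1 / β) for the tail e^{- x^{β}}. A minimal sketch, with the hypothetical helper name `weibull_mean`:

```python
import math


def weibull_mean(shape):
    """Mean of the Weibull distribution with tail exp(-x ** shape)."""
    return math.gamma(1.0 + 1.0 / shape)


two_mean = 2.0 * weibull_mean(0.4)             # conjectured limit for Weibull(0.4)
scale_to_unit_mean = 1.0 / weibull_mean(0.4)   # factor that rescales X to mean 1
```

For shape 0.4, `two_mean` is approximately 6.65, in line with the value in the text.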

Figure 6.

R = 5 \times 10^{7}

. Lognormal (left), Weibull (right).

It should be noted that the heuristics give the correct result in the RV case. Namely, here we can take

e (d) = d

and

P (E > x) = 1 / {(1 + x)}^{α}

. This easily gives

E 1 / (1 + E) = α / (α + 1)

so that Quantity (10) is approximately

1 - 2 α E [X] / (d (α + 1))

, as rigorously verified in Proposition 1.
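The identity for E 1 / (1 + E) used above can be verified directly. As a short worked step, added for illustration, the tail P (E > x) = 1 / {(1 + x)}^{α} corresponds to the density α {(1 + x)}^{- α - 1} on (0, \infty), so that

E [\frac{1}{1 + E}] = \int_{0}^{\infty} \frac{α}{{(1 + x)}^{α + 2}} d x = \frac{α}{α + 1} .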

The overall conclusion is that the finer diagnostic value of the method is quite limited, and restricted to RV and sample sizes which may be unrealistically large in many contexts.

5. Proofs

Proof of Lemma 1.

Suppose h is concave and

p \in (0, 1)

. The contour line corresponding to value p is formed as the set of points

(x, y)

that satisfy

f_{X_{1}, X_{2}} (x, y) = p

, or equivalently

h (x) + h (y) = - log p .

(11)

For any such pair

(x, y)

one can solve Equation (11) for y to obtain

y = h^{- 1} (- log p - h (x)) .

(12)

Firstly,

h^{- 1}

is convex as the inverse of an increasing concave function. Secondly, the composition of an increasing convex function and a convex function remains convex. Thus, as a function of x, Expression (12) defines a convex function when

x \in [0, h^{- 1} (- log p - h (0))]

. Thus, one can define

ψ_{p} (x) = h^{- 1} (- log p - h (x))

.

If h is convex, the proof is analogous. ☐

The following technical lemma is needed in the proof of Proposition 1. It applies to Pareto, Weibull and lognormal type distributions. Indeed, condition (13) follows from Proposition 1.2 (ii) of [21], and further needed assumptions are easily verified apart from strong subexponentiality, which is known to hold in the mentioned examples.

Lemma 2.

Suppose

X_{1}

and

X_{2}

are non-negative i.i.d. variables with a common density f, where the hazard rate

r (d) = f (d) / \bar{F} (d)

is eventually decreasing with

r (d) = o (1)

. Assume further that

P (X_{1} + X_{2} > d) - 2 P (X_{1} > d) \sim 2 E [X] f (d) .

(13)

Then,

\frac{2 P (X_{1} > d) + 2 f (d) E [X]}{P (X_{1} + X_{2} > d)} = 1 + o (r (d)) .

(14)

If in addition

\bar{F} {(d / 2)}^{2} = o (\bar{F} (d))

, then

\frac{P (X_{1} \leq d / 2, X_{2} \leq d, X_{1} + X_{2} > d)}{P (X_{1} > d)} = E [X] r (d) + o (r (d)) .

(15)

Proof.

Equality (13) implies subexponentiality of

X_{1}

. Writing

\frac{2 P (X_{1} > d) + 2 E [X] f (d)}{P (X_{1} + X_{2} > d)} = 1 + \frac{- (P (X_{1} + X_{2} > d) - 2 P (X_{1} > d)) + 2 E [X] f (d)}{P (X_{1} + X_{2} > d)}

and observing that the numerator on the right-hand side is of order

E [X] r (d) o (1) 2 \bar{F} (d)

proves Equation (14) since

2 P (X_{1} > d) / P (X_{1} + X_{2} > d) \to 1

by subexponentiality.

Equality (13) implies

\begin{matrix} \frac{P (X_{1} + X_{2} > d)}{2 P (X_{1} > d)} & = & 1 + \frac{2 E [X] f (d) (1 + o (1))}{2 P (X_{1} > d)} \\ & = & 1 + E [X] r (d) + o (r (d)) . \end{matrix}

(16)

On the other hand, writing

\begin{matrix} \{X_{1} + X_{2} > d, X_{2} > X_{1}\} & = & \{X_{1} \leq d / 2, X_{2} > d\} \\ & \cup & \{X_{1} \leq d / 2, X_{2} \leq d, X_{1} + X_{2} > d\} \\ & \cup & \{X_{1} > d / 2, X_{2} > d / 2\} \\ & = & A_{1} \cup A_{2} \cup A_{3} \end{matrix}

gives

\frac{P (X_{1} + X_{2} > d)}{2 P (X_{1} > d)} = \frac{2 P (A_{1}) + 2 P (A_{2}) + P (A_{3})}{2 P (X_{1} > d)} = 1 + \frac{P (A_{2})}{P (X_{1} > d)} + o (r (d)) .

Since we know from Equation (16) that the leading term tending to zero must be

E [X] r (d)

, Equation (15) holds. ☐

Proof of Proposition 1.

Suppose X is regularly varying with index α. In light of Lemma 2, we only need to establish

E [\frac{X_{1} - X_{2}}{X_{1} + X_{2}}; X_{1} > X_{2}, X_{1} + X_{2} > d] = \bar{F} (d) (1 - \frac{c}{d} + \frac{α}{d} E X + o (1 / d)) .

(17)

The contribution to the l.h.s of Equation (17) from

E [\frac{X_{1} - X_{2}}{X_{1} + X_{2}}; X_{1} > X_{2}, X_{2} > A d, X_{1} + X_{2} > d]

is of order

O (\bar{F} (d / 2) \bar{F} (A d)) = O (d^{- 2 α + ϵ})

for any

A > 0

and

ϵ > 0

. Thus, it can be neglected. We are left with estimating

\begin{matrix} E [\frac{X_{1} - X_{2}}{X_{1} + X_{2}}; X_{1} > X_{2}, X_{2} \leq A d, X_{1} + X_{2} > d] \\ = E [\frac{X_{1} - X_{2}}{X_{1} + X_{2}}; X_{2} \leq A d, X_{1} + X_{2} > d] \\ = \int_{0}^{A d} E [\frac{X_{1} - y}{X_{1} + y}; X_{1} + y > d] f (y) d y . \end{matrix}

We will bound this quantity from above and below, assuming

A < 1 / 2

.

Firstly,

\int_{0}^{A d} E [\frac{X_{1} - y}{X_{1} + y}; X_{1} + y > d] f (y) d y \leq \int_{0}^{A d} E [(1 - \frac{2 (1 - A) y}{X_{1}}); X_{1} + y > d] f (y) d y .

Now, given

X > x

,

X - x

is approximately distributed as

x E

for large x where

P (E > z) = 1 / {(1 + z)}^{α}

. Hence, dominated convergence gives

E [\frac{1}{X} | X > z] \sim E [\frac{1}{z (1 + E)}], z \to \infty .

We get

\begin{matrix} \int_{0}^{A d} E [(1 - \frac{2 (1 - A) y}{X_{1}}); X_{1} + y > d] f (y) d y \\ \leq \int_{0}^{A d} (1 - \frac{2 (1 - A) y}{(d - y)} (1 + o (1)) E \frac{1}{1 + E}) \bar{F} (d - y) f (y) d y \\ = \int_{0}^{A d} (1 - \frac{2 (1 - A) y}{(d - y)} \frac{α}{α + 1}) \bar{F} (d - y) f (y) d y + η_{1} (d) \\ \leq \int_{0}^{A d} (1 - \frac{2 (1 - A) y}{d} \frac{α}{α + 1}) (\bar{F} (d) + y f (d)) f (y) d y + η_{1} (d) + η_{2} (d) \\ \leq \int_{0}^{A d} (1 - \frac{2 (1 - A) y}{d} \frac{α}{α + 1}) (\bar{F} (d) + \frac{y α}{d} \bar{F} (d) (1 + A)) f (y) d y + η_{1} (d) + η_{2} (d) \\ \leq \bar{F} (d) (1 - \frac{2 (1 - A) E X}{d} \frac{α}{α + 1} + \frac{α E X (1 + A)}{d}) + η_{1} (d) + η_{2} (d) . \end{matrix}

Here, the error terms

η_{1} (d)

and

η_{2} (d)

are of order

o (\bar{F} (d) / d)

. The latter error comes from Taylor expansion of function

\bar{F} (d - y)

around point

y = 0

. The fact that f is assumed eventually decreasing guarantees that

f (x) \sim α x^{- α - 1} L (x)

, when

\bar{F} (x) = x^{- α} L (x)

.

Secondly, for the lower bound, we have that

\int_{0}^{A d} E [\frac{X_{1} - y}{X_{1} + y}; X_{1} + y > d] f (y) d y \geq \int_{0}^{A d} E [(1 - \frac{2 y}{X_{1}}); X_{1} + y > d] f (y) d y .

As before, we get

\begin{matrix} \int_{0}^{A d} E [(1 - \frac{2 y}{X_{1}}); X_{1} + y > d] f (y) d y \\ \geq \int_{0}^{A d} (1 - \frac{2 y}{(d - y)} (1 + o (1)) E \frac{1}{1 + E}) \bar{F} (d - y) f (y) d y \\ = \int_{0}^{A d} (1 - \frac{2 y}{(d - y)} \frac{α}{α + 1}) \bar{F} (d - y) f (y) d y + η_{1} (d) \\ \geq \int_{0}^{A d} (1 - \frac{2 y}{d} \frac{α}{α + 1}) (\bar{F} (d) + y f (d)) f (y) d y + η_{1} (d) + η_{2} (d) \\ \geq \int_{0}^{A d} (1 - \frac{2 y}{d} \frac{α}{α + 1}) (\bar{F} (d) + \frac{y α}{d} \bar{F} (d) (1 - A)) f (y) d y + η_{1} (d) + η_{2} (d) \\ \geq \bar{F} (d) (1 - \frac{2 E X}{d} \frac{α}{α + 1} + \frac{α E X (1 - A)}{d}) + η_{1} (d) + η_{2} (d) \end{matrix}

for error terms

η_{1}

and

η_{2}

of order

o (\bar{F} (d) / d)

.

Repeating the argument with arbitrarily small

A > 0

and combining the upper and lower estimates allows one to deduce

d |g (d) - (1 - \frac{c}{d})| \to 0,

as

d \to \infty

, which proves the claim.

Suppose then that X is Weibull distributed. Now assumptions of Lemma 2 are satisfied with

r (d) = α d^{α - 1}

. Since

\bar{F} {(d / 2)}^{2} = O (e^{- c d^{α}})

for some

c > 1

depending on α, we only need to find the order of

E [\frac{X_{1} - X_{2}}{X_{1} + X_{2}}; X_{1} > X_{2}, X_{2} \leq d / 2, X_{1} + X_{2} > d] .

(18)

In fact, proceeding similarly as in the regularly varying case, it can be seen that Quantity (18) equals

\int_{0}^{d / 2} E [1 - \frac{2 y}{X_{1} + y} | X_{1} + y > d] \bar{F} (d - y) f (y) d y .

(19)

It is known that

(X_{1} - z) / e (z) | X_{1} > z

, where

e (z) = 1 / (α z^{α - 1})

, converges in distribution to a standard exponential variable, as

z \to \infty

. Because

e (z / 2) / z = o (1)

, it holds for

y \in [0, d / 2]

that

\begin{matrix} E [1 - \frac{2 y}{X_{1} + y} | X_{1} + y > d] & = & 1 - \frac{2 y}{d} E [\frac{1}{\frac{e (d - y)}{d} \frac{X_{1} - (d - y)}{e (d - y)} + 1} | X_{1} + y > d] \\ & = & 1 - \frac{2 y}{d} (1 + o (1)) \end{matrix}

(the interchange of expectation and convergence is justified by dominated convergence). In addition, the same error term can be used for any y.

Thus, Quantity (19) can be written as

\begin{matrix} & \int_{0}^{d / 2} (1 - \frac{2 y}{d} (1 + o (1))) [\bar{F} (d) + \int_{d - y}^{d} f (s) d s] f (y) d y \\ = & \bar{F} (d) \int_{0}^{d / 2} (1 - \frac{2 y}{d}) f (y) d y \\ + & \bar{F} (d) \frac{\int_{0}^{d / 2} [\int_{d - y}^{d} (1 - \frac{2 y}{d} (1 + o (1))) f (s) d s] f (y) d y}{\bar{F} (d)} + o (\bar{F} (d) / d) . \end{matrix}

Now, using the definition of

A_{2}

from Lemma 2 with Equality (15), we get

\frac{\int_{0}^{d / 2} [\int_{d - y}^{d} f (s) d s] f (y) d y}{\bar{F} (d)} = \frac{P (A_{2})}{P (X_{1} > d)} = E [X] r (d) + o (r (d))

and

\frac{\int_{0}^{d / 2} (2 y / d) [\int_{d - y}^{d} f (s) d s] f (y) d y}{\bar{F} (d)} = 2 E [X_{1} / d | A_{2}] \frac{P (A_{2})}{P (X_{1} > d)} = o (r (d)),

since

E [X_{1} / d | A_{2}] = o (1) .

(20)

Equation (20) follows from the fact that conditionally to

A_{2}

, all probability mass concentrates near small values of

X_{1} / d

.

Gathering estimates and using Equation (14) of Lemma 2 yields

\begin{matrix} g (d) & = & (1 + o (r (d))) \frac{2 E [\frac{X_{1} - X_{2}}{X_{1} + X_{2}}; X_{1} > X_{2}, X_{2} \leq d / 2, X_{1} + X_{2} > d]}{2 P (X_{1} > d) + 2 f (d) E [X]} \\ & = & (1 + o (r (d))) \frac{2 \bar{F} (d) [1 - 2 E [X] / d + o (1 / d) + E [X] r (d) - o (r (d)) + o (e^{- (c - 1) d^{α}})]}{2 \bar{F} (d) (1 + E [X] r (d))} \\ & = & 1 + o (r (d)) = 1 + o (d^{α - 1}) . \end{matrix}

This shows Equation (4), and Equation (5) can be obtained using similar calculations with

e (z) = z / {log}^{γ - 1} z

. ☐

Proof of Proposition 2.

Note first that

X_{1} + X_{2}

and

X_{1} - X_{2}

are independent in the normal case. Denote

Z = X_{1} + X_{2}

so that

Z \sim N (0, 2)

. Let

e (d) = \bar{F} (d) / f (d)

be the mean excess function of Z (inverse hazard rate). It is then standard that

e (d)

is of order

1 / d

and that

(Z - d) / e (d) | Z > d

converges in distribution to a standard exponential. Writing

g (d) = \frac{E [| X_{1} - X_{2} |]}{d} E [\frac{1}{\frac{e (d)}{d} \frac{Z - d}{e (d)} + 1} | Z > d],

(21)

it follows in the same way as in the proof of Proposition 1 that the r.h.s. of Equation (21) is

(c / d) (1 + o (1))

. This proves the claim. ☐

Proof of Theorem 1.

Suppose f is log-concave and twice differentiable. Since

g (d) = \frac{\int_{d}^{\infty} \int_{0}^{z} |1 - 2 y / z| f (z - y) f (y) d y d z}{\int_{d}^{\infty} \int_{0}^{z} f (z - y) f (y) d y d z},

it suffices to show that, for a fixed z, it holds that

\int_{0}^{1} | 1 - 2 s | f_{Z_{z}} (s) d s \leq \frac{1}{2},

where

f_{Z_{z}} (s) = \frac{f (z s) f (z (1 - s))}{\int_{0}^{1} f (z x) f (z (1 - x)) d x}, s \in [0, 1] .

(22)

In fact, by symmetry, one only needs to show

\int_{0}^{1 / 2} (1 - 2 s) (f_{Z_{z}} (s) - 1) d s \leq 0 .

(23)

It is known from the proof of Proposition 2.1 of [4] that

f_{Z_{z}}

is increasing in

[0, 1 / 2]

. Since

f_{Z_{z}}

is non-negative and integrates to one over interval

[0, 1]

, there exists a number

a \in (0, 1 / 2)

such that

f_{Z_{z}} (s) \leq 1

when

s \leq a

and

f_{Z_{z}} (s) > 1

when

s > a

. Therefore,

\begin{matrix} & \int_{0}^{1 / 2} (1 - 2 s) (f_{Z_{z}} (s) - 1) d s \\ = & \int_{0}^{a} (1 - 2 s) (f_{Z_{z}} (s) - 1) d s + \int_{a}^{1 / 2} (1 - 2 s) (f_{Z_{z}} (s) - 1) d s \\ \leq & (1 - 2 a) [\int_{0}^{a} (f_{Z_{z}} (s) - 1) d s + \int_{a}^{1 / 2} (f_{Z_{z}} (s) - 1) d s] = 0, \end{matrix}

which proves Inequality (23). Generally, if f is log-concave and twice differentiable in the set

[x_{0}, \infty)

, then

f_{Z_{z}}

is increasing in the set

[x_{0} / z, 1 / 2]

. The difference from the presented calculation vanishes in the limit

d \to \infty

, and thus Inequality (7) holds.

If f is eventually log-convex, the proof is analogous and Inequality (8) holds. ☐

Author Contributions

The authors contributed equally to this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. M.Y. An. “Logconcavity versus logconvexity: A complete characterization.” J. Econom. Theory 80 (1998): 350–369.
2. A. Saumard, and J.A. Wellner. “Log-concavity and strong log-concavity: A review.” Stat. Surv. 8 (2014): 45–114.
3. R.C. Gupta, and N. Balakrishnan. “Log-concavity and monotonicity of hazard and reversed hazard functions of univariate and multivariate skew-normal distributions.” Metrika 75 (2012): 181–191.
4. J. Lehtomaa. “Limiting behaviour of constrained sums of two variables and the principle of a single big jump.” Stat. Probab. Lett. 107 (2015): 157–163.
5. M. Cule, R. Samworth, and M. Stewart. “Maximum likelihood estimation of a multi-dimensional log-concave density.” J. R. Stat. Soc. Ser. B Stat. Methodol. 72 (2010): 545–607.
6. G. Walther. “Inference and modeling with log-concave distributions.” Stat. Sci. 24 (2009): 319–327.
7. F. Balabdaoui, K. Rufibach, and J.A. Wellner. “Limit distribution theory for maximum likelihood estimation of a log-concave density.” Ann. Stat. 37 (2009): 1299–1331.
8. M.L. Hazelton. “Assessing log-concavity of multivariate densities.” Stat. Probab. Lett. 81 (2011): 121–125.
9. L. Dümbgen, and K. Rufibach. “Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency.” Bernoulli 15 (2009): 40–68.
10. S. Asmussen, and P.W. Glynn. “Stochastic simulation: Algorithms and analysis.” In Stochastic Modelling and Applied Probability. New York, NY, USA: Springer, 2007, Volume 57.
11. S. Ghosh, and S. Resnick. “A discussion on mean excess plots.” Stoch. Process. Appl. 120 (2010): 1492–1517.
12. M.E. Crovella, and M.S. Taqqu. “Estimating the heavy tail index from scaling properties.” Methodol. Comput. Appl. Probab. 1 (1999): 55–79.
13. J. Del Castillo, J. Daoudi, and R. Lockhart. “Methods to distinguish between polynomial and exponential tails.” Scand. J. Stat. 41 (2014): 382–393.
14. Y.R. Gel, W. Miao, and J.L. Gastwirth. “Robust directed tests of normality against heavy-tailed alternatives.” Comput. Stat. Data Anal. 51 (2007): 2734–2746.
15. N.H. Bingham, C.M. Goldie, and J.L. Teugels. “Regular variation.” In Encyclopedia of Mathematics and Its Applications. Cambridge, UK: Cambridge University Press, 1989, Volume 27.
16. H. Albrecher, C.Y. Robert, and J.L. Teugels. “Joint asymptotic distributions of smallest and largest insurance claims.” Risks 2 (2014): 289–314.
17. I. Armendáriz, and M. Loulakis. “Conditional distribution of heavy tailed random variables on large deviations of their sum.” Stoch. Process. Appl. 121 (2011): 1138–1147.
18. S.G. Bobkov, and G.P. Chistyakov. “On concentration functions of random variables.” J. Theor. Probab. 28 (2015): 976–988.
19. A.R. Pruss. “Comparisons between tail probabilities of sums of independent symmetric random variables.” Ann. Inst. Henri Poincaré Probab. Stat. 33 (1997): 651–671.
20. S. Asmussen, and D. Kortschak. “Error rates and improved algorithms for rare event simulation with heavy Weibull tails.” Methodol. Comput. Appl. Probab. 17 (2015): 441–461.
21. A. Baltrūnas, and E. Omey. “The rate of convergence for subexponential distributions and densities.” Liet. Math. J. 42 (2002): 1–14.
22. E. Omey, and E. Willekens. “Second order behaviour of the tail of a subordinated probability distribution.” Stoch. Process. Appl. 21 (1986): 339–353.
23. B. Pfaff, and A. McNeil. “evir: Extreme Values in R. 2012. R Package Version 1.7-3.” Available online: https://CRAN.R-project.org/package=evir (accessed on 14 November 2016).
24. A. McNeil. “Estimating the tails of loss severity distributions using extreme value theory.” Astin Bull. 27 (1997): 117–137.
25. H. Albrecher, and S. Asmussen. “Ruin probabilities.” In Advanced Series on Statistical Science & Applied Probability. Hackensack, NJ, USA: World Scientific Publishing, 2010, Volume 14.

Figure 1. Graphs of

\hat{g} (d, n)

for

n = 10,000

for Gamma distributed random variables with shapes

0.2, 1

and 5 in panels (a), (b) and (c), respectively. All variables are standardised to have mean 3.

Figure 2. Graphs of

\hat{g} (d, n)

for

n = 10,000

for Weibull distributed random variables with shapes

0.2, 1

and 5 in panels (a), (b) and (c), respectively. All variables are standardised to have mean 3.

Figure 3. Graph of

\hat{g} (d, n)

from a classical set of Danish fire insurance data, available for instance as the data set ‘danish’ in the R package evir [23]. The data is scaled to have mean 1. The sample is traditionally used to illustrate how heavy-tailed data behaves. A similar set of data was previously used in [24]. The graph supports the usual finding that the data set is heavy-tailed.

Figure 4. The graphs of multiple versions of

\hat{g} (d, n)

based on a dataset obtained from Hansjörg Albrecher (private communication) and related to occurrences of floods in a particular area. The data is scaled to have mean 1. The sample size is

n = 39

. Bivariate vectors

(X_{1}, Y_{1}), \dots, (X_{19}, Y_{19})

were sampled several times randomly without replacement from the original data. The overall appearance of the paths points to the data being heavy- rather than light-tailed.
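
The resampling described in the caption of Figure 4 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that \hat{g} (d, n) is the sample mean of the ratio | X - Y | / (X + Y) over those pairs whose sum exceeds d, and the function names g_hat, random_pairs and g_hat_paths are hypothetical. The demo input is synthetic, since the flood data set is not public.

```python
import math
import random


def g_hat(pairs, d):
    """Mean of |x - y| / (x + y) over pairs with x + y > d (assumed estimator form)."""
    ratios = [abs(x - y) / (x + y) for x, y in pairs if x + y > d]
    return sum(ratios) / len(ratios) if ratios else math.nan


def random_pairs(data, rng):
    """Shuffle the sample and split it into disjoint pairs.

    Sampling is without replacement; with an odd sample size one
    observation is left out (39 observations give 19 pairs).
    """
    shuffled = list(data)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return list(zip(shuffled[:half], shuffled[half:2 * half]))


def g_hat_paths(data, thresholds, n_paths, seed=0):
    """Evaluate g_hat on a grid of thresholds for several random pairings."""
    mean = sum(data) / len(data)
    scaled = [x / mean for x in data]  # scale the sample to have mean 1
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        pairs = random_pairs(scaled, rng)
        paths.append([g_hat(pairs, d) for d in thresholds])
    return paths


if __name__ == "__main__":
    # Synthetic positive sample standing in for the n = 39 observations.
    rng = random.Random(1)
    sample = [rng.paretovariate(1.5) for _ in range(39)]
    grid = [0.5 * k for k in range(9)]
    for path in g_hat_paths(sample, grid, n_paths=3):
        print(["nan" if math.isnan(v) else round(v, 3) for v in path])
```

Each path corresponds to one random pairing. Entries are nan at thresholds that no pair sum exceeds, which happens quickly with only 19 pairs, so the paths are best read for their overall appearance rather than for individual values.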

Figure 5.

d (1 - \hat{g} (d, n))

. Pareto in the first row, lognormal in the second, and Weibull in the last.

R = 5000

(left),

R = 5 \times 10^{6}

(right).

Figure 6.

R = 5 \times 10^{7}

. Lognormal (left), Weibull (right).

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
