A Method for Confidence Intervals of High Quantiles

Huang, Mei Ling; Raney-Yan, Xiang

doi:10.3390/e23010070

Open AccessArticle

A Method for Confidence Intervals of High Quantiles

by

Mei Ling Huang

^1,*

and

Xiang Raney-Yan

²

¹

Department of Mathematics, Brock University, St. Catharines, ON L2S 3A1, Canada

²

Department of Mathematcs, Niagara College, Welland, ON L3C 7L3, Canada

^*

Author to whom correspondence should be addressed.

Entropy 2021, 23(1), 70; https://doi.org/10.3390/e23010070

Submission received: 29 November 2020 / Revised: 22 December 2020 / Accepted: 28 December 2020 / Published: 4 January 2021

(This article belongs to the Special Issue Extreme Value Theory)

Download

Browse Figures

Versions Notes

Abstract

:

The high quantile estimation of heavy tailed distributions has many important applications. There are theoretical difficulties in studying heavy tailed distributions since they often have infinite moments. There are also bias issues with the existing methods of confidence intervals (CIs) of high quantiles. This paper proposes a new estimator for high quantiles based on the geometric mean. The new estimator has good asymptotic properties as well as it provides a computational algorithm for estimating confidence intervals of high quantiles. The new estimator avoids difficulties, improves efficiency and reduces bias. Comparisons of efficiencies and biases of the new estimator relative to existing estimators are studied. The theoretical are confirmed through Monte Carlo simulations. Finally, the applications on two real-world examples are provided.

Keywords:

efficiency; extreme value distributions; generalized Pareto distribution; Hill estimator; mean square errors; order statistics; tail index; Weissman estimator

1. Introduction

Extreme value analysis (EVA) was first introduced by Leonard Tippett (Fisher and Tippett, 1928 [1]). Tippett was working on how to make cotton thread stronger, he realized that the strength of the weakest threads were the only factor that matters when it comes to deciding the strength of the cotton thread. Nowadays, extreme value analysis is widely used in almost all fields, from engineering, social science, economics, traffic predictions to insurance and so on. People are interested in extreme events in these fields such as, the shortest life span of a new engine, the maximum appreciation of the stock market, the longest driving time on a highway at rush hour, or the biggest medical claim to an insurance company. The distributions of these extreme events are usually unknown. In general, EVA involves the extrapolation of an unknown distribution and its high quantiles. Estimating high quantile based on observation is very important in EVA, since it gives the corresponding value x for a very small exceeding possibility p.

There are certain risks, ones that are not decided by us or can barely be predicted until right before they are about to happen. This can include things such as an earthquake, terrorist attacks, a virus breakout, and so forth. For these events, we will need risk management which is in place to minimize, monitor, and control the impact of unfortunate events, or to maximize the realization of opportunities. Estimating the confidence interval of high quantiles plays an important role in risk management. Since a high quantile is located at the tail area, it heavily depends on the behaviour of the tail distribution, or from the statistical point of view, it depends on the k largest order statistics. This leads to the challenges of the instability in the choice of

k,

and the bias issues. There are many research on the mathematical models and theoretical studies in the literature for estimating confidence intervals of high quantiles, we review them in Section 2.

This paper proposes a new method to estimate high quantile of a heavy-tailed distribution. The new method has interesting improvements compared with other existing methods. This paper makes three main contributions to methodology.

(1) This paper proposes a new estimation method based on a geometric mean with good asymptotic properties. It is consistent and stable relative to the existing methods. The paper provides a computational algorithm which overcomes the mathematical difficulties and bias problems of the estimation of confidence intervals of high quantiles of a heavy tailed distribution.

(2) The Monte Carlo simulation studies on three heavy tailed distribution models: Fréchet (0.25), GPD (0.5) and GPD(2) (GPD: generalized Pareto distribution). The simulation results confirm that the proposed method is more efficient relative to the existing quantile estimators.

(3) This paper uses the proposed estimation method to predict extreme values in the flu in Canada, and gamma ray from solar flare examples. It is interesting to see that these data sets fit the GPD model very well. We apply the proposed method to estimate the confidence intervals of high quantiles. The numerical results show that the proposed method gives more efficient results compared with other existing methods.

In this paper, we review several existing high quantile estimators with their behavior in Section 2. We propose a new estimator for the confidence interval of high quantiles based on the geometric mean and explore its asymptotic properties in Section 3. To compare the new estimator with the existing estimators, Section 4 presents Monte Carlo simulation results and the improvement of the proposed quantile estimator relative to existing methods. In Section 5 we apply the proposed new method to construct confidence intervals of high quantiles on flu in Canada and gamma ray examples. Finally, conclusions and discussions are given in Section 6.

2. Existing Estimator for High Quantiles

Heavy-tailed distributions (de Haan and Ferreira, 2006 [2]) is important to extreme value events.

Definition 1.

A random variable X is said to have a heavy tail distribution if its distribution function F(x) satisfies

1 - F (x) = L (x) x^{- 1 / γ}, x \in (- \infty, \infty), as x \to \infty, γ > 0,

where

L (t)

is a slowly varying function

w i t h \lim_{t \to \infty} \frac{L (t x)}{L (t)} = 1,

for all

x > 0 .

γ

is the tail index.

Notice that we can have

L (x) = {(\ln (x))}^{b},

b \in ℝ

(de Hann and Ferreira, 2006, p. 362 [2]). Since

L (t)

behaves approximately as a constant c, for simplicity, we assume that a heavy tailed distribution satisfies

1 - F (x, γ) \to c x^{- 1 / γ}, x \in (- \infty, \infty), as x \to \infty, c > 0, γ > 0 .

(1)

Since the heavy tailed distributions decay slower than the exponential distributions and have longer tails. A tail function is defined as

Definition 2.

A tail function

U (t)

of any distribution function F(x) is defined as

U (t) = {(\frac{1}{1 - F})}^{⟵}, w h e r e “^{⟵} ” d e n o t e s t h e i n v e r s e f u n c t i o n .

For the heavy tailed distribution in (1), we can rewrite the tail function as

U (t) = {(\frac{1}{c t^{- 1 / γ}})}^{⟵} = c^{γ} t^{γ} = C t^{γ}, as t \to \infty, where c^{γ} = {(L (t))}^{γ}, let C = c^{γ} .

(2)

Definition 3.

The quantile function

Q (1 - p, γ)

of a heavy tailed distribution

F (x, γ)

in (1) for a given probability

1 - p

is defined by

x_{1 - p} = Q (1 - p, γ) = \inf \{x : F (x, γ) \geq 1 - p\}, x \in (- \infty, \infty), 0 < p < 1

where

Q (1 - p, γ)

is the generalized inverse function of F, we call

Q (1 - p, γ)

the

(1 - p) t h

quantile function of

F (x)

.

Value at Risk (

V a R

) is widely used in risk management. When p is very small,

x_{1 - p}

becomes a high quantile as the pth value at risk, we define

V a R_{p, γ} = x_{1 - p} = Q (1 - p, γ), 0 < p < 1, p is very small .

(3)

Also we can use the tail function in (2) to write

V a R_{p, γ}

as

V a R_{p, γ} = U (\frac{1}{p}, γ) = C t^{γ}, t = \frac{1}{p}, p = p_{n} \to 0, n p_{n} \to 0, a s n \to \infty .

The heavy-tailed models have a compulsory infinite right endpoint. In the case of negative observations in the model, the sample size should be exclusively the number of positive observations,

n^{+},

although a deterministic shift in the data is preferred by some authors, to work only with positive values. In this paper, we use the real line

(0, \infty)

.

To estimate

V a R_{p, γ},

let

X_{1 : n} \leq X_{2 : n} \leq . . . \leq X_{n : n}

be the order statistics from a random sample

X_{1}, X_{2}, . . . X_{n} .

We review the four high quantile estimation methods in the literature.

2.1. Quantile Function-Tail Index Method

For estimating high quantiles, we use the ln function, and estimate the tail index first

\ln Q_{\hat{γ}}^{(p)} = \ln V a R_{p, \hat{γ}} = \ln x_{1 - p, \hat{γ}}, 0 < p < 1, p is very small .

(4)

To estimate high quantile function, we estimate the tail index first (Dekkers and de Haan 1989 [3]). Hill (1975) [4] estimator is a well known consistent estimator for tail index

γ

.

Definition 4.

Consider the order statistics

X_{n - k : n}

, and k as an intermediate sequence of integers, Hill estimator is defined as

{\hat{γ}}_{H} = H (k) = \frac{1}{k} \sum_{i = 1}^{k} U_{i}, U_{i} = i (\ln \frac{X_{n - i + 1 : n}}{X_{n - i : n}}), 1 \leq i \leq k .

(5)

where

k = k_{n} \to \infty

,

k \in [1, n)

,

k = o (n)

a s

n \to \infty

.

The Hill estimator

{\hat{γ}}_{H} = H (k)

in (5) used largest k order statistics of a random sample. Substitute

{\hat{γ}}_{H} = H (k)

defined in (5) into (4), then we obtain ln

(1 - p)

th high quantile as

\ln q_{H, p} (k) = \ln Q_{H}^{(p)} (k) = \ln x_{1 - p, H} (k), 0 < p < 1, p is very small .

(6)

This estimator depends on

k,

small values of k provide high volatility whereas large values of k induce considerable bias. Hence, semi-parametric extensions may be considered for increasing the degree of freedom in the trade-off between variance and bias. Note that the tail index

γ

is a parameter of a given distribution, and a quantile of a distribution is a function of

γ .

2.2. Weissman Method

Weissman (1978) [5] proposed the following semiparametric estimator of a high quantile

Q_{\hat{γ}}^{(p)} (k) = V a R_{p, \hat{γ}} = x_{1 - p, \hat{γ}} = X_{n - k : n} {(\frac{k}{n p})}^{\hat{γ}}, 0 < p < 1, 1 \leq k \leq n - 1, and

\ln Q_{\hat{γ}}^{(p)} (k) = \ln X_{n - k : n} + \hat{γ} \ln (\frac{k}{n p}) .

We substitute

{\hat{γ}}_{H} = H (k)

in (5) into the function above, then we have,

\ln {\hat{Q}}_{H}^{(p)} (k) = \ln X_{n - k : n} + H (k) \ln (\frac{k}{n p}), 1 \leq k \leq n - 1 .

(7)

Without any prior indication on k, the Weissman estimator shows a large volatility as it depends on the fraction sample k. Although the minimization of the bias and MSE can be considered as a criterion to select k, it is impractical as they are unknown. Other methods for the selection of sample fraction k can be found in Beirlant et al. (1996) [6]; Dreea and Kaufmann (1998) [7]; Guillou and Hall (2001) [8]; Gomes and Oliveira (2001) [9].

The optimal k value through the tail index Hill estimator

H,

k_{0},

is given by formula (15) in Section 2.4 Optimal k Values.

2.3. Reduced-Bias Method

Hall and Welsh (1985) [10] proposed a second-order expansion on the tail function U in (2)

U (t) = C t^{γ} (1 + \frac{A (t)}{ρ} + o (t^{ρ})), A (t) = γ β t^{ρ}, a s t \to \infty,

(8)

with

C,

γ > 0,

ρ < 0,

and

β

\neq 0

. Where

β

is the scale second-order parameter and

ρ

is the shape second-order parameter.

To further reduce the bias of quantile estimators which requires us to observe the behavior of the estimation of the second-order parameters

β

and

ρ

. Second-order reduced-bias was discussed by Peng (1998) [11], Beirlant, Dierckx, Goegebeur and Mattys (1999) [12], Freueverger and Hall (1999) [13], Gomes, Martins and Neves (2000) [14], Caeiro and Gomes (2002) [15], Gomes, Figueiredo and Mendonea (2004) [16], among others. Comes and Pestana (2007) [17] considered the estimators

(\hat{ρ_{τ}} (k), \hat{β_{\hat{ρ}}} (k))

for the second-order parameters (

ρ, β)

.

Careiro et al. (2005, p. 122) [18] advises the the use of turning parameter

τ

in the estimation of

ρ .

It provides higher stability as functions of

k,

the number of the top order statistics used, for a wide range of large k value, by means of any stability criterion.

Definition 5.

Caeiro et al. (2005) [18] defined the bias-corrected Hill estimator

\bar{H} (k) \equiv {\bar{H}}_{\hat{β}, \hat{ρ}} (k) = H (k) (1 - \frac{\hat{β}}{1 - \hat{ρ}} {(\frac{n}{k})}^{\hat{ρ}}),

(9)

where

H (k)

is defined in (5). For a tuning real parameter

τ \in ℝ

,

\hat{ρ_{τ}} (k) \equiv {\hat{ρ_{n}}}^{(τ)} (k) = \min (0, \frac{3 (T_{n}^{(τ)} (k) - 1)}{T_{n}^{(τ)} (k) - 3})

(10)

T_{n}^{(τ)} (k) = \{\begin{matrix} \frac{{(M_{n}^{(1)} (k))}^{τ} - {(\frac{M_{n}^{(2)} (k)}{2})}^{\frac{τ}{2}}}{{(\frac{M_{n}^{(2)} (k)}{2})}^{\frac{τ}{2}} - {(\frac{M_{n}^{(3)} (k)}{6})}^{\frac{τ}{3}}} if τ \neq 0; \\ \frac{\ln (M_{n}^{(1)} (k)) - \frac{1}{2} \ln (\frac{M_{n}^{(2)} (k)}{2})}{\frac{1}{2} \ln (\frac{M_{n}^{(2)} (k)}{2}) - \frac{1}{3} \ln (\frac{M_{n}^{(3)} (k)}{6})} if τ = 0, \end{matrix}

M_{n}^{(j)} (k) = \frac{1}{k} \sum_{i = 1}^{k} {(\log X_{n - i + 1 : n} - \log X_{n - k : n})}^{j}, j = 1, 2, 3 .

\hat{β_{\hat{ρ}}} (k) = {(\frac{k}{n})}^{\hat{ρ}} \frac{d_{\hat{ρ}} (k) D_{0} (k) - D_{\hat{ρ}} (k)}{d_{\hat{ρ}} (k) D_{\hat{ρ}} (k) - D_{\hat{2 ρ}} (k)},

(11)

w h e r e f o r a n y θ \leq 0, d_{θ} (k) = \frac{1}{k} \sum_{i = 1}^{k} {(\frac{i}{k})}^{- θ} and D_{θ} (k) = \frac{1}{k} \sum_{i = 1}^{k} {(\frac{i}{k})}^{- θ} U_{i},

with

U_{i}

as defined in (5) that

1 \leq i \leq k

, and (10) achieves consistency if

\sqrt{k} A (n / k) \to \infty

as

n \to \infty

and

\hat{ρ} - ρ = o_{ρ} (1 / \ln n)

.

The corresponding ln-quantile estimator with the tail index estimator

\bar{H}

in (9) is

\ln {\hat{Q}}_{\bar{H}}^{(p)} (k) = \ln X_{n - k + 1 : n} + \bar{H} (k) \ln (\frac{k}{n p}), 1 \leq k \leq n - 1 .

(12)

A similar estimator to the estimator in (12) is considered in Lekina et al. (2014) [19] and Lekina (2010) [20].

Gomes and Pestana (2007) [17] considered the ln-Var estimator

\ln {\bar{Q}}_{\hat{γ}}^{(p)} (k) = \ln X_{n - k + 1 : n} + \hat{γ} (\ln (\frac{k}{n p}) + C_{p} (k; \hat{β}, \hat{ρ})), C_{p} (k; \hat{β}, \hat{ρ}) = \hat{β} {(\frac{n}{k})}^{\hat{ρ}} \frac{{(\frac{k}{n p})}^{\hat{ρ}} - 1}{\hat{ρ}} .

(13)

Substitute the estimator

\bar{H}

in (9) into (13), we have another estimator for high quantile as

\ln {\bar{Q}}_{\bar{H}}^{(p)} (k) = \ln X_{n - k + 1 : n} + \bar{H} (k) (\ln (\frac{k}{n p}) + C_{p} (k; \hat{β}, \hat{ρ})) .

(14)

2.4. Optimal k Values

As discussed previously, we have problem that the estimation varies as the k varies, and it become very unreliable when k is large. Gomes and Pestana (2007) [17] suggested to use the numerically estimated optimal k values.

The optimal k for the tail index estimator through Hill estimator

H (k)

in (5) is

k_{0},

k_{0} = {(\frac{(1 - ρ) n^{- ρ}}{β \sqrt{- 2 ρ}})}^{\frac{2}{1 - 2 ρ}} .

(15)

The optimal k for the semiparametric quantile estimator

\ln Q_{H} (k)

in (7), is

k_{0}^{Q_{H}},

k_{0}^{Q_{H}} = \underset{k}{arg min} \{\ln^{2} (\frac{k}{n p}) (\frac{1}{k} + \frac{β^{2} {(n / k)}^{2 ρ}}{{(1 - ρ)}^{2}})\} .

(16)

The optimal k for the second-order reduced-bias quantile estimator

\ln Q_{\bar{H}}^{(p)} (k)

in (12) and

\ln {\bar{Q}}_{\bar{H}}^{(p)} (k)

in (14) should be larger than

k_{0}

, is

k_{01}

,

k_{01} = {(\frac{1.96 (1 - ρ) n^{- ρ}}{|β|})}^{\frac{2}{1 - 2 ρ}} .

(17)

By using these optimal k values, all the quantile estimators provide better results. However, with an unknown distribution, and estimated second-order parameters, these numerically estimated k values are not always accurate. Since all the quantile estimators are so sensitive to the k value, in this paper, we propose a new quantile estimator which does not depends on

k .

3. New Estimator for High Quantile

3.1. New Estimator

Our goal is to improve the quantile estimators in Section 2. There are bias issues and difficult in determining k with the existing estimating methods. In order to overcome these problems, Huang (2011) [21] proposed a new quantile estimator which is the geometric mean of the reduced-bias quantile estimator in (14).

Definition 6.

{\hat{Q}}_{N e w, \hat{γ}}^{(p)} = {[\prod_{k = 1}^{n - 1} (X_{n - k + 1 : n} {(\frac{k}{n p})}^{\hat{γ}})]}^{\frac{1}{n - 1}}, 0 < p < 1, 1 \leq k \leq n - 1 .

(18)

where

X_{n - k : n}

is the

(k + 1) t h

top order statistic,

\hat{γ}

is any consistent estimator for γ, and Q stands for quantile function.

Based on (16), (20) can be written as

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)} = \frac{1}{n - 1} \sum_{k = 1}^{n - 1} [\ln X_{n - k + 1 : n} + \hat{γ} (\ln (\frac{k}{n p}) + α C_{p} (k; \hat{β}, \hat{ρ}))],

(19)

where

0 < p < 1

and

α

is a constant that

α \in ℝ .

α C_{p} (k; \hat{β}, \hat{ρ})

is the adjustment term, where

C_{p} (k; \hat{β}, \hat{ρ})

is defined in (13) that reduces bias using the second-order parameters.

α

is a key value depends only on n to furthermore reduce the bias by observing the behavior of the second-order parameters. We will discuss the choice of

α

in Section 4.

Section 3, Section 4 and Section 5 will show that the new estimator

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

has good properties, and

1. The new quantile estimator

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

has the least bias, the smallest

M S E

and the highest efficiency.

2. The new quantile estimator

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

is consistent and does not depend on k as the existing quantile estimators does.

3. The confidence interval based on the new quantile estimator

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

is the most efficient compared to the existing methods, where it not only has the shortest length of the interval, but also has the highest probability coverage of the true value in most cases.

3.2. Asymptotic Properties of the New Estimator ln ${\hat{Q}}_{N e w, \bar{H}}^{(p)}$

Using the Hall-Welsh class of model in (8), we derive that the new estimator

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

in (19) has the asymptotic properties under following conditions, when

\hat{γ} = \bar{H}

in (5).

Condition 1 (C1).

For intermediate

k,

k = k_{n} \to \infty .

k \in [1, n),

k = o (n),

as

n \to \infty .

Condition 2 (C2).

\ln (n p_{n}) = o (\sqrt{k}), {lim}_{n \to \infty} \sqrt{k} A (\frac{n}{k}) = λ \in ℝ,

where A is in (8).

Theorem 1.

Under (C1) and (C2), if we use

\hat{γ} = \bar{H}

in (5), then

l n {\hat{Q}}_{N e w, \bar{H}}^{(p)}

has a asymptotic normal distribution

(\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - \ln V a R_{p}) \overset{d}{\underset{n \to \infty}{⟶}} N o r m a l (0, \frac{γ^{2}}{{(n - 1)}^{2}} [\sum_{k = 1}^{n - 1} \frac{{[\ln \frac{k}{n p}]}^{2}}{k} + w \underset{i < j}{\sum^{n - 1} \sum^{n - 1}} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i \cdot j}}]]) .

(20)

The asymptotic mean, variance and efficiency of

\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} (k)

in (19) relative to

\ln {\bar{Q}}_{\bar{H}}^{(p)} (k)

in (14) are given by

E (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)}) \underset{n \to \infty}{\approx} \ln V a R_{p};

\begin{matrix} V a r (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)}) \underset{n \to \infty}{\leq} \frac{γ^{2}}{{(n - 1)}^{2}} [\sum_{k = 1}^{n - 1} \frac{{[\ln \frac{k}{n p}]}^{2}}{k} + w \underset{i < j}{\sum^{n - 1} \sum^{n - 1}} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i \cdot j}}]]; \end{matrix}

(21)

\begin{matrix} E F F {(\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)})}_{_{\ln {\hat{Q}}_{\bar{H}}^{(p)} (k)}} \underset{n \to \infty}{\geq} \frac{{(n - 1)}^{2} \frac{{(\ln (\frac{k}{n p}))}^{2}}{k}}{\sum_{k = 1}^{n - 1} \frac{{[\ln \frac{k}{n p}]}^{2}}{k} + \underset{i < j}{w \sum^{n - 1} \sum^{n - 1}} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i \cdot j}}]} > 1, \\ f o r k = 1, . . ., n - 1, \end{matrix}

(22)

where w is the weight,

w = \max_{i \neq j} ρ_{i j}^{+},

0 \leq w \leq 1,

ρ_{i j}^{+} = |ρ_{i j}^{}|,

0 \leq ρ_{i j}^{+} \leq 1;

ρ_{i j}^{}

is correlation coefficient of

\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)

and

\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)

ρ_{i j}^{} = \frac{C o v [\ln {\bar{Q}}_{\bar{H}}^{(p)} (i), \ln {\bar{Q}}_{\bar{H}}^{(p)} (j)]}{\sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))}}, i \neq j, i, j = 1, . . ., n - 1, - 1 \leq ρ_{i j}^{} \leq 1, a n d

E F F {(\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)})}_{\ln {\hat{Q}}_{\bar{H}}^{(p)} (k)} = \frac{V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (k))}{V a r (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)})}, w h e r e V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (k)) \underset{n \to \infty}{\approx} γ^{2} \frac{{(\ln (\frac{k}{n p}))}^{2}}{k} .

See Appendix A for the proof of Theorem 1.

3.3. The C.I. for The New Estimator ln ${\hat{Q}}_{N e w, \bar{H}}^{(p)}$

Theorem 2.

Under conditions (C1) and (C2), a

(1 - α) 100 %

confidence interval for

\ln V a R_{p}

by using

\ln {\bar{Q}}_{N e w, \bar{H}}^{(p)}

in (19) is given by

(L C L_{\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)}} (k), U C L_{\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)}} (k)) = (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - U C L_{\bar{H}} (k) b 3, \ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} + U C L_{\bar{H}} (k) b 3)

(23)

where

z_{1 - α / 2}

is the

(1 - α / 2) t h

quantile of standard normal distribution, and

b 3 = \frac{z_{1 - α / 2}}{n - 1} \sqrt{\sum_{k = 1}^{n - 1} \frac{{[\ln (\frac{k}{n p})]}^{2}}{k} + w \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i \cdot j}}]}, U C L_{\bar{H}} (k) = \frac{\bar{H}}{1 - \frac{z_{1 - α / 2}}{\sqrt{k}}} .

See Appendix A for the proof of Theorem 2.

Remark 1.

Note that in the CI in (23), the main term

\ln {\hat{Q}}_{n e w, \bar{H}}^{(p)}

does not depend on

k,

only the error terms

U C L_{\bar{H}} (k) b_{3}

depends on

k .

Remark 2.

In Section 4 Simulations and Section 5 Applications, we use the maximum weight

w = 1

in Formula (23), thus, we use maximum CI length for new proposed estimator ln

{\hat{Q}}_{N e w, \bar{H}}^{(p)}

comparing with existing methods. Even with maximum CI length. Section 4 and Section 5 show that the new estimator obtained confidence interval in (23) is still shorter than existing estimators obtained confidence intervals for most of k values.

4. Simulations

4.1. Computer Simulations of Quantile Estimators

To verify that the new estimator

\ln {\hat{Q}}_{n e w, \bar{H}}^{(p)}

has good properties, we use simulations and compare the new estimator to the existing estimators using the following statistics

The expected value $E [\cdot]$ .
The root of mean squared errors $R M S E [\cdot]$ .
The relative efficiencies $R E F F [\cdot]$

R E F F_{{\tilde{Q}}_{H or \bar{H}}} = \sqrt{\frac{M S E [\ln q_{H}^{(p)} (k_{0})]}{M S E [\ln {\tilde{Q}}_{_{H or \bar{H}}}^{(p)} (k_{0})]}} for \tilde{Q} = Q or \bar{Q}, p = 1 / (2 n), k_{0} is defined in (15) .

(24)

In this Section, we choose models of Fréchet (0.25), GPD (0.5), GPD (2) to compare with the simulation results of Gomes and Pestana (2007) [17]. We use four quantile estimators in Table 1 to run simulations. When

|ρ| \leq 1,

estimators

\hat{β}

and

\hat{ρ}

in

\bar{H}

use the tuning parameter

τ = 0

, otherwise, use

τ = 1

(1): The Fréchet distribution (Fréchet, 1927) [22] has the c.d.f.

$F (x) = \exp (- x^{- \frac{1}{γ}}), x > 0, γ > 0 .$

(25)

An estimator of the pth ln-high quantile function is

$\ln Q_{\hat{γ}}^{(p)} = \ln x_{1 - p, \hat{γ}} = - \hat{γ} \ln (\ln (\frac{1}{1 - p})), 0 < p < 1, p is very small .$
(2): The generalized Pareto distribution (GPD) (de Zea Bermudeza and Kotz, 2010) [23] has the c.d.f.

$F (x; γ) = 1 - {(1 + γ x)}^{- \frac{1}{γ}}, x \geq 0, γ \neq 0,$

(26)

for $γ > 0$ , an estimator of the pth ln-high quantile function is

$\ln Q_{\hat{γ}}^{(p)} = \ln V a R_{\hat{γ}} = \ln x_{1 - p, \hat{γ}} = \ln (\frac{p^{- \hat{γ}} - 1}{\hat{γ}}), 0 < p < 1, p is very small .$

4.2. The Choice of $α$

As mentioned in Section 3,

α

is a key value to reduce the bias of the

\ln {\hat{Q}}_{N e w, \hat{;}}^{(p)}

defined in (19). We developed an algorithm to estimate

α

based on the results of

m -

simulation runs:

Step 1: For a fixed sample size n, the

α_{i} (n)

in ith iteration,

i = 1, . . ., m,

m = 500,

is the true solution of equation

\ln {\hat{Q}}_{N e w, {\bar{H}}_{i}}^{(p)} = \frac{1}{n - 1} \sum_{k = 1}^{n - 1} [\ln X_{i, n - k + 1 : n} + {\bar{H}}_{i} (\ln (\frac{k}{n p}) + α C_{p} (k; \hat{β}, \hat{ρ}))] = \ln V a R_{p}, i = 1 . . ., m,

then

α (n) = \frac{1}{m} \sum_{i = 1}^{m} α_{i} (n) .

Note that

α (n)

depends on

n .

\ln V a R_{p}

is the true lnVaR value.

Step 2: Obtain estimator

\hat{α} (n)

based on the linear regression (LR) models where

α

is related to

n .

We collect data set

(α_{j}, n_{j}),

j = 1, . . ., l,

with the sample size

l .

\hat{α} (n) = \{\begin{cases} \hat{ρ} (n), & |ρ (n)| < 1, for G P D (0.5); \\ {1.7488 - 0.0002 n + 2.9693 X}_{1} {+ 2.6604 X}_{2}, & |ρ (n)| \geq 1, \\ for Fréchet (0.25), X_{1} = \hat{β} {(n); X}_{2} = \hat{ρ} (n); \\ for GPD (2), X_{1} = \hat{ρ} {(n); X}_{2} = \hat{β} (n) . \end{cases}

(27)

Note that the estimate

\hat{α} (n)

in (27) depends the parameters of the models and LR relationship with sample size n.

Remark 3.

If we assume

α_{j}

in

(α_{j}, n_{j}),

j = 1, . . ., l,

is normally distributed, based on (Bickel and Doksum, 2015, pp. 286–388) [24], then

\hat{α} (n)

is a maximum likelihood estimator (MLE) and has an asymptotic normal distribution. Since the estimator

\hat{α} (n)

only depands to n not related to the order statistics, it will not affect the asymptotic proprties of the proposed estimator

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

in (19).

4.3. Simulation of $F r \overset{´}{e} c h e t$ (0.25). GPD (0.5) and GPD (2)

Table 2, Table 3 and Table 4 list the results of simulations under the Fréchet (0.25), GPD (0.5) and GPD (2), where

N = 500

iterations for sample size

n = 500, 1000, 2000, 5000

and

p = 1 / (2 n) .

With

\hat{α} (n)

in (27), we compare mean values, mean squared errors (MSE) and REFF of the four

\ln V a R

estimators in Table 1, at optimal level

k = k_{0}

based on (15) Note that the new estimator

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

has the highest REFF values among the four estimators which are in bold in all three models. The simulation MSE of

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

is defined as

M S E (\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}) = \frac{1}{N} \sum_{i = 1}^{N} {(\ln {\hat{Q}}_{N e w, \hat{γ}, i}^{(p)} - \ln V a R_{p})}^{2},

where

\ln {\hat{Q}}_{N e w, \hat{γ}, i}^{(p)}

is the

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

in the ith iteration,

i = 1, . . ., N .

So do for other ln-quatile estimators.

Figure 1, Figure 2 and Figure 3 are based on Table 2, Table 3 and Table 4 results, Figure 1 is for Fréchet (0.25), we use

N = 500

iterations, sample size

n = 1000

,

γ = 0.25

,

ρ = - 1

,

β = 0.5

,

p = 1 / 2 n

. The new estimator

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

has the best performance with the least bias and RMSE. It does not change as k varies. Figure 2 and Figure 3 are for GPD(0.5) and GPD(2),

N = 500

iterations, sample size

n = 1000

,

γ = 0.5

and 2,

ρ = - γ

,

β = 1

,

p = 1 / 2 n

. We note that the new estimator

\ln {\hat{Q}}_{n e w, \bar{H}}

is the best estimator as well, with the least bias, consistency as k varies, and the smallest

R M S E

. Note that

\ln {\hat{Q}}_{N e w, \hat{γ}}^{(p)}

values are very close to the true lnVaR

_{p}

values.

4.4. Simulations of Confidence Intervals

By Gomes and Pestana (2007) [17], the 95% confidence interval of the true tail index using H is

(L C L_{H} (k), U C L_{H} (k)) = (\frac{H (k)}{1 + \frac{β {(n / k)}^{ρ}}{1 - ρ} + \frac{1.96}{\sqrt{k}}}, \frac{H (k)}{1 + \frac{β {(n / k)}^{ρ}}{1 - ρ} - \frac{1.96}{\sqrt{k}}})

(28)

and the 95% confidence interval of the true tail index using

\bar{H}

is

(L C L \bar{_{H}} (k), U C L \bar{_{H}} (k)) = (\frac{\bar{H} (k)}{1 + \frac{1.96}{\sqrt{k}}}, \frac{\bar{H} (k)}{1 - \frac{1.96}{\sqrt{k}}})

(29)

Next, we compute the confidence intervals for the true ln-quantile by using the quantile estimators. We only use three out of four quantile estimators in Table 1, except

\ln q_{H}

which has the worst result. Therefore, we compare CIs only using

\ln Q_{H}

,

\ln {\tilde{Q}}_{\bar{H}}

and

\ln {\hat{Q}}_{n e w, \bar{H}}

in (30), (31) and (23). Thus

(1): The 95% confidence interval for the true $\ln V a R_{p}$ using $\ln Q_{H}$ is

$\begin{matrix} L C L_{\ln Q_{H}} (k) & = & min \ln Q_{H} (k) - L C L_{H} (k) \ln (\frac{k}{n p}) b_{2}, \ln Q_{H} (k) - U C L_{H} (k) \ln (\frac{k}{n p}) b_{2}; \\ U C L_{\ln Q_{H}} (k) & = & max \ln Q_{H} (k) + L C L_{H} (k) \ln (\frac{k}{n p}) b_{1}, \ln Q_{H} (k) + U C L_{H} (k) \ln (\frac{k}{n p}) b_{1} . \end{matrix}$

(30)

where $L C L_{H} (k)$ , $U C L_{H} (k)$ is given in (28), and

$b_{1} = \frac{1.96}{\sqrt{k}} - \frac{β {(n / k)}^{ρ}}{1 - ρ}, b_{2} = \frac{1.96}{\sqrt{k}} + \frac{β {(n / k)}^{ρ}}{1 - ρ} .$
(2): The 95% confidence interval for the true $\ln V a R_{p}$ using $\ln {\tilde{Q}}_{\bar{H}}$ is

$\begin{matrix} L C L_{\ln \tilde{Q} \bar{_{H}}} (k) & = & \ln \tilde{Q} \bar{_{H}} - U C L_{\bar{H}} (k) \ln (\frac{k}{n p}) \frac{1.96}{\sqrt{k}}, \\ U C L_{\ln \tilde{Q} \bar{_{H}}} (k) & = & \ln \tilde{Q} \bar{_{H}} + U C L_{\bar{H}} (k) \ln (\frac{k}{n p}) \frac{1.96}{\sqrt{k}}; \end{matrix}$

(31)

where $L C L_{\bar{H}} (k)$ , $U C L_{\bar{H}} (k)$ is given in (29), and
(3): The 95% confidence interval for the true $\ln V a R_{p}$ using $\ln {\hat{Q}}_{n e w, \bar{H}}$ is given in (24).

To compare new proposed CI in (23) to CIs in (30) and (31), we use evaluate the length and probability coverage of the CIs.

The length of CI is given as

l e n g t h of CI = U C L_{quantile estimator} - L C L_{quantile estimator .}

and the efficiency of the length of 95%

C I

is given as

E F F_{l e n g t h} = \frac{C . I . l e n g t h o f \ln Q_{H} at k_{0}^{Q_{H}}}{C . I . l e n g t h o f \ln \bar{Q} \bar{_{H}} or \ln {\hat{Q}}_{n e w, \bar{H}} at k_{01}} .

(32)

Also, the confidence interval is more efficient when it has a higher coverage of the true value under the simulations, where the probability coverage of 95%

C I

is defined as

P . C . = \frac{number of 95 % C I ’ s contains the true value}{number of 95 % C I ’ s simulated in total} * 100 % .

and the efficiency of the probability coverage of 95%

C I

is given as

E F F_{P . C .} = \frac{| P . C ._{\ln Q_{H}} - 95 % |}{|(P . C ._{\ln \bar{Q} \bar{_{H}}} or P . C ._{\ln {\hat{Q}}_{n e w, \bar{H}}}) - 95 %|} .

(33)

when

E F F_{P . C .}

is bigger means it is more efficient.

Figure 4, Figure 5 and Figure 6 show the 95% confidence interval of the three ln-quantile estimators under Fréchet (0.25), GPD (0.5 and 2) with

p = 0.0005

. We compare the size of each confidence interval at their optimal k level, and the probability coverage of each confidence interval at their optimal k level. Recall, the optimal k level for

\ln Q_{H}

is at

k_{0}^{Q_{H}}

based in (16), the optimal k level for

\ln \tilde{Q} \bar{_{H}}

and

\ln {\hat{Q}}_{n e w, \bar{H}}

is at

k_{01}

based in (15).

Table 5 compare the efficiencies of 95%

C I

of the three quantile estimators under

F r \overset{´}{e} c h e t

(0.25),

G P D

(0.5 and 2). The efficiency of 95%

C I

can be compared by the length of

C I

and the probability coverage of

C I

, denoted by

E F F_{l e n g t h}

and

E F F_{P . C .}

.

In this section, we compared the new quantile estimator

\ln {\hat{Q}}_{n e w, \bar{H}}^{(p)}

in (19) with the existing methods.

\ln {\hat{Q}}_{n e w, \bar{H}}^{(p)}

has the least bias, the smallest

R M S E

, and not depends on k too much. It also has the smallest length and the highest probability coverage in 95% confidence interval in most cases. The simulation results verify that

\ln {\hat{Q}}_{n e w, \bar{H}}

is the best quantile estimator among all three methods. Next section, we apply the new estimator

\ln {\hat{Q}}_{n e w, \bar{H}}^{(p)}

to real world examples.

5. Applications

We will study two real-world examples in this Section. We are interested in the population that is above the threshold for each example. The goal is to estimate the

(1 - p) t h

high quantiles of the example, where

0 < p < 1

is a very small. We use the four quantile estimators in table 1

\ln q_{H},

\ln Q_{H},

\ln Q_{\bar{H}}

and

\ln {\hat{Q}}_{n e w, \hat{γ}}

in (21), and compare their performances.

Procedure:
Step 1:
Choose and collect data of examples of real life extreme events.
Step 2:
Run Goodness-of-Fit tests to check if data is heavy distributed.
Step 3:
Estimate the high quantiles and construct the confidence intervals by using the new method and the existing methods.
Step 4:
Estimate the high quantiles and construct the confidence intervals by using the new method and the existing methods.
Estimators
- Two tail index estimators $H (k)$ in (5) and $\bar{H} (k)$ in (9).
- Four quantile estimators (6), (7), (12) and (19) are in Table 1.
- We use $\hat{α} (n)$ in (27) for the new estimator $\ln {\hat{Q}}_{n e w, \bar{H}}^{(p)}$ in (19) for the GPD model.

Remark 4.

In applications, the GPD is used as a tail approximation to the population distribution from which a sample of excesses

x - μ

above some suitably high threshold μ are observed. The GPD is parameterized by location, scale and shape parameters

μ,

λ > 0

and γ, and can equivalently be specified in terms of threshold excesses

x - μ

or, as here, exceedances

x > μ

, as three parameters (

γ, μ, λ)

GPD in (34) (de Zea Bermudeza and Kotz, 2010) [23],

H_{γ} (x) = 1 - {(1 + γ \frac{x - μ}{λ})}^{- \frac{1}{γ}}, 0 < μ < x < {(0 \lor (- γ))}^{- 1}, λ > 0,

(34)

Traditionally, the threshold was chosen before fitting, giving the so-called fixed threshold approach (Pickands, 1975 [25], Balkema and de Haan, 1974 [26]). It is common for practitioners to assume a constant quantile level, determined by some assessment of fit across all or a subset of the datasets (Scarrott and McDonald, 2012, p.36 [27]). In our application, the threshold is pre-determined by physical considerations, that is, number of type A flu viruses detected weekly in Canada above the average in flu season, and the counts of gamma ray released from significant solar flares (M and X rated) during the Sun’s active years. Although it is possible to make some arbitrary definition of the choice of the threshold, it is preferable not to become involved with such delicate question. The application of the proposed method is presented in both examples for illustrative purpose.

5.1. Flu in Canada Example

According to the WHO (World Health Organization, 2020 [28]), seasonal influenza is a common infection of the airways and lungs that can spread easily among humans. There are 37 million people in Canada, and flu season usually runs from November to April. Most people recover from the flu in about a week. However, influenza may be associated with serious complications such as pneumonia, especially in infants, the elderly and those with underlying medical conditions like diabetes, anemia, cancer, and immune suppression. On average, the flu and its complications send about 12,200 Canadians to the hospital every year, and around 3500 Canadians die. There are 3 types of flu viruses, A, B and C. Type A flu virus is the most harmful, and it is constantly changing and is generally responsible for the large flu epidemics. The 1918 Spanish Flu, 1957 Asian Flu, 1968 Hong Kong Flu, 2009 Swine flu, and the most recent 2014 H5N1 Bird Flu are all type A flu. In this paper, we study type A viruses in Canada.

We collected the number of the type A flu viruses detected weekly in Canada, from 1 January 1997 to 31 December 2019, resulting in a sample size of

n^{*} =

994 weeks. According to the WHO, the average number of type A flu viruses detested per week in the flu season, November to April, is 953, for the past 10 years. We set 953 viruses/week as the threshold, which reduced our sample size to

n =

111 weeks. Full data-set is available at http://apps.who.int/influenza/gisrs_laboratory/flunet/en.

Figure 7a shows a Flu chart in

n^{*} =

994 weeks of type A flu viruses detected in Canada, and

n =

111 weeks remaining after the threshold, of average 953 flu viruses. For each flu incubation period, a flu virus can last from one up to few weeks, that is why some arches are narrow and some arches are more bell shaped in this figure. The top three weeks are circled in the plot. Figure 7b shows a histogram of

n^{*} =

994 weeks data. We are interested in the 99% quantile,

x_{0.99}

, such that 99% chance that the viruses detected in a given week would be less than this value, or equivalently, with a 1% possibility, the number of flu viruses detested in a given week would be in excess of this value. This information is useful for monitoring and studying the virus, also is helpful for medical organizations that deal with disease control and prevention, pharmaceutical availability, and hospital resource readiness, especially during a serious flu outbreak.

{\hat{x}}_{0.99}

is approximately located in the plot. In this paper, we propose a new estimate high quantiles method, and compare it with existing methods.

Our interest is to find the 5%

V a R

and 1%

V a R

of the number of type A flu viruses detested in a week, and their 95% confidence intervals.

5.1.1. Goodness-of-Fit Test

Through data transformation

Y_{i} = \frac{X_{i} - μ}{λ}

,

i = 1, . . ., n

,

n = 111 .

Take

μ = 953

as the threshold, the maximum likelihood estimators (MLE) are

{\hat{λ}}_{M L E} = 1275.97287

and

{\hat{γ}}_{M L E} = 0.01345 .

Figure 8a is the log-log plot of

G P D

curve with the horizontal axis

\ln (x)

against the vertical axis

\ln (P {x < X})

. Visually the transformed data fit the one parameter

G P D

in (26) the bestusing

{\hat{γ}}_{M L E}

(red curve). Figure 8b shows the GPD density curve (red curve) fits the histogram very well.

Beside visual view of Figure 8, we also carry on the three goodness-of-fit tests: the Kolmogorov-Smirnov (K-S) test (Kolmogorov, 1933 [29]), Anderson-Darling (A-D) test, and Cramér von Mises (C-v-M) test (Anderson-Darling, 1952 [30]). All three tests are based on the maximum vertical distance between the empirical distribution function and the observations, and the parent distribution function is the

G P D .

The Hypothesis for all three tests is

\begin{matrix} H_{0} & : & F (x) = F^{*} (x), for all values of x \\ H_{1} & : & F (x) \neq F^{*} (x), for at least one value of x \end{matrix}

F (x)

is the true but unknown distribution of the sample.

F^{*} (x)

is the theoretical distribution, in our project, the parent distribution,

G P D

.

S_{n} (x)

is the empirical distribution and step function of the sample. It is defined as

S_{n} (x) = \frac{1}{n} \overset{n}{\sum_{i = 1}} I_{(- \infty, x]} (X_{i}), where I_{A} = \{\begin{matrix} 1, & if x \in A; \\ 0 & if x \notin A . \end{matrix}

where

- \infty < x < \infty

,

0 \leq S_{n} (x) \leq 1 .

The test statistics under

H_{0}

of

K - S

test is

T = sup_{x} |F^{*} (x) - S_{n} (x)| .

(35)

Based on Table 6 goodness of fit tests’ results, we set the GPD model for the flu in Canada data. We define the absolute errors (

A E

) in (34) and integrated errors (

I E

) in (35) as

I E = \frac{1}{X_{n : n} - X_{n - r + 1 : n}} {(\int_{X_{n - r + 1 : n}}^{X_{n : n}} {[S_{n} (x) - F^{*} (x)]}^{2} d x)}^{1 / 2} .

(36)

For both

A E

and

I E

, we use 3 different r values by letting

r = \frac{n}{10} t h,

r = \frac{n}{2} t h

, and

r = n t h

top statistics. Table 7 lists the AE and IE errors which are very small.

Next, we estimate the high quantiles and their confidence interval for this example.

5.1.2. Compare Four Estimation Methods

We use the four estimators in Table 1:

\ln q_{H},

\ln Q_{H},

\ln Q_{\bar{H}}

, and the new estimator

\ln {\hat{Q}}_{n e w, \bar{H}}

.

We use

{\hat{ρ}}_{τ} (k)

in (10), and

{\hat{β}}_{{\hat{ρ}}_{0}} (k)

in (11). To decide if the tuning parameter

τ = 0

or 1, consider

{{\hat{ρ}}_{τ} (k)}_{k \in k}

, for

k \in k = ([n^{0.995}], [n^{0.999}])

, and compute their median

x_{τ}

, then

τ = \underset{k}{arg min} \sum_{k \in k} {({\hat{ρ}}_{τ} (k) - x_{τ})}^{2} .

With

n = 111

, we get

k \in k = (108, 110)

and

x_{τ} = 109

, then

\sum_{k \in k} {({\hat{ρ}}_{0} (k) - x_{τ})}^{2} \approx 36116 < \sum_{k \in k} {({\hat{ρ}}_{1} (k) - x_{τ})}^{2} \approx 37033

, conclude that

τ = 0

, thus we have

{\hat{ρ}}_{0} (k_{1}) = - 0.7101

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k_{1}) = 1.026571

, where

k_{1}

is the optimal k value. Figure 9 shows the results.

Figure 9a shows estimates of the second-order parameters

ρ

through

\hat{ρ}

and

{\hat{ρ}}_{τ} (k)

,

τ = 0;

Figure 9b shows Estimates

\hat{β}

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k) .

Figure 9c shows the two estimated tail index, H,

\bar{H},

H = 0.4379

at its optimal level using

{\hat{k}}_{0} = 21

based on (15) and

\bar{H} = 0.3736

at its optimal level using

{\hat{k}}_{01} = 42

based on (17). Figure 9d shows four quantile estimators of flu in Canada example, with

p = 0.01

. The full circles “•” in the plot are the values of the quantile estimators at their optimal k level. We note that

\ln {\hat{Q}}_{n e w, \bar{H}}^{(p)}

has a constant value, which does not depend on

k .

Figure 10 compares the confidence intervals of three quantile estimators in (7), (12) and (19). This figure shows that the new quantile estimator

\ln {\hat{Q}}_{n e w, \bar{H}}^{(p)}

has the smallest confidence interval with length 0.7966, where we use

\hat{α} = \hat{ρ} = - 0.7101 .

(The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

In Table 8, we compare the four ln-quantile estimators and their mean, median,

V a R_{0.05}

and

V a R_{0.01} .

Table 9 compares the size of confidence intervals at

\ln V a R_{0.01}

and

V a R_{0.01}

of the three quantile estimators.

In Table 9, we compared

Q_{H}

,

Q_{\bar{H}}

and

Q_{n e w, \bar{H}},

the

Q_{n e w, \bar{H}}

has the shortest confidence interval with the highest efficiency of 2.2462.

5.1.3. Summary

Based on Figure 10 and Table 9, we conclude that the new estimator

\ln {\hat{Q}}_{n e w, \bar{H}}

in (19) is the best estimator for Flu in Canada example. We can predict that at

V a R_{0.01}

, we expect 5500 type A flu viruses during a flu outbreak after threshold 953/week. This is shown in Figure 8b.

5.2. Gamma Ray of Solar Flare Example

Gamma ray has the most penetrating power among all the radiations. The burst of gamma rays are thought to be, due to the collapse of stars called hypernovas, the most powerful events so far discovered in the cosmos. The measurement of gamma rays are in counts, and it is the number of atoms in a given quantity of radioactive material that are detected by an instrument to have decayed. We have collected gamma ray data from solar flares, from November 2008 to September 2020, from NASA (National Aeronautics and Space Administration, 2020 [31]). Full data-set is available at http://hesperia.gsfc.nasa.gov/fermi/gbm/qlook/fermi_gbm_flare_list.txt.

The solar flare travels hundreds of miles per second, and can reach the Earth within hours. It can disrupt communication navigational equipment, damage satellites, and even cause blackouts by damaging power plants. In 1989, a strong solar storm knocked out the power grid in Québec, Canada, causing 6 million people to lose power for more than 9 hours, and it cost millions of dollars to repair. It can bring additional radiation around the north and south poles, a risk that forces airlines to reroute flights. The Fermi Gamma-ray Space Telescope was launched in late 2008 to explore high-energy phenomena in the Universe. It is worth noting that more than one trigger may have occurred during the flare, the one nearest the peak of the flare is listed, resulting in a sample size of 5128. Solar flares are classified as A, B, C, M or X according to the peak flux (in watts per square meter, W/m²) of 1 to 8 angstrom (The angstrom is a unit of length equal to 1/10,000,000,000 (one ten-billionth) of a meter.) X-rays near the Earth, as measured on the GOES spacecraft. Gamma ray activity is correlated with the X ray activity, as shown in Figure 11 (NOAA, 2020 [32]. When the amount of gamma ray released is over 5 million counts, it usually corresponds to an X rated flare or significant M rated flares.

Figure 12a shows a Gamma ray chart of

n^{*} =

5128 flares, and

n =

104 flares remaining after the threshold of 86 million counts. The most powerful gamma ray was released in March 7, 2012 with nearly 1.5 billion counts, the sun was brightened by 1000 times, and became the brightest object in the gamma ray sky. The top three events are circled in the chart. Figure 12b shows a histogram of

n^{*} =

5128 flares. We are interested in the 99% quantile,

x_{0.99}

, such that 99% gamma ray released from solar flares are under this value, or equivalently, with a 1% possibility, the amount of gamma ray a solar flare releases would be in excess of this value. During the spring and fall, the satellites that are used to detect solar flares experience eclipses, in which the Earth or the Moon blocks between the satellites and the Sun for a short period every day. Eclipse season lasts for about 45 to 60 days and ranges from minutes to just over an hour. The quantile estimation would provide useful predictions for these times.

{\hat{x}}_{0.99}

is approximately located in the plot since we do not know this value yet.

We chose the threshold as the mean of the data from the peak period. The solar cycle is every 11.6 years, and the sun’s activity peaked from 2011 to 2014. In Figure 12a we can see that the top 3 flares, in fact, almost 90% of the top 100 flares, are from the 2011 to 2014 time period. Taking the average of all the X rated and significant M rated flares from this peak period, we obtained a mean of 86 million counts, resulting in a remaining sample size of

n = 104

.

For the Gamma ray of solar flare example, our goal is to find out the high quantiles, specifically, the 5%

V a R

and 1%

V a R

of the amount of gamma ray a solar flare would release, and their 95% confidence intervals.

5.2.1. Gooness-of-Fit Tests

Similar as Flu in Canada Example, we set

μ = 86 m i l l i o n

, and obtain

{\hat{λ}}_{M L E} = 171.0708592

, and

{\hat{γ}}_{M L E} = 0.2580384847 .

Figure 13a is a log-log plot of gamma ray data under

G P D

model, with the horizontal axis

\ln (x)

against the vertical axis

\ln (P {x < X})

. Figure 13b shows the histogram fits the GPD model.

Next, we will perform three goodness-of-fit tests: Kolmogorov-Smirnov test, the Anderson-Darling test and the Cramér-von-Mises test. The results listed in Table 10, the data fits the

G P D

with

{\hat{γ}}_{M L E}

the best, nearly 59%.

In Table 11, all the errors are less than 0.07 for AE, and less than 0.01 for

I E

.

Next, we can compare the four high quantile estimators and their confidence intervals of this example.

5.2.2. Compare Four Estimation Methods

Similar as Example 1, we use the four quantile estimators in Table 1:

\ln q_{H}

,

\ln Q_{H}

,

\ln Q_{\bar{H}}

, and the

\ln {\hat{Q}}_{n e w, \bar{H}}

.

We use

{\hat{ρ}}_{τ} (k)

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k),

and

τ = 0

, thus we have

{\hat{ρ}}_{0} (k 1) = - 0.7269

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k_{1}) = 1.0257

, where

k_{1}

is the optimal k value for the second-order parameters. The results are in Figure 14.

Figure 14a shows the estimates of the second-order parameters

\hat{ρ}

and

{\hat{ρ}}_{τ} (k)

,

τ = 0 .

Figure 14b shows

\hat{β}

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k) .

Figure 14c shows the two different tail index estimators, H,

\bar{H}

. We have

H = 0.5324

at its optimal level with

{\hat{k}}_{0} = 21

,

\bar{H} = 0.6517

at its optimal level with

{\hat{k}}_{01} = 41 .

Figure 14d shows all four quantile estimators of gamma ray example, with

p = 0.01

. We note that

\ln {\hat{Q}}_{n e w, \bar{H}}

has a constant value which does not depend on k.

Figure 15 compares the confidence intervals of our ln-quantile estimators in (7), (12) and (19). This figure shows that the new quantile estimator

\ln {\hat{Q}}_{n e w,, \bar{H}}

has the smallest confidence interval with length 1.4451, where we use

\hat{α} = \hat{ρ} = - 0.7269 .

The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level.

In Table 12, we compare all four quantile estimators under

V a R_{0.05}

and

V a R_{0.01} .

Table 13 compares the size of confidence intervals of

\ln V a R_{0.01}

and

V a R_{0.01}

by three quantile estimators.

Table 13 shows that the new estimator has the shortest confidence interval, compared to ln

Q_{H}

, and ln

Q_{\bar{H}},

with the highest efficiency of 1.6016.

5.2.3. Summary

Based on Figure 15 and Table 13, we conclude that the new estimator

\ln {\hat{Q}}_{n e w, \bar{H}}

in (19) is the best estimator for Gamma Ray example. We predict that

V a R_{0.01}

is a gamma ray release of

1102.57

million counts, this is most likely an X rated solar flare. This is shown in Figure 13b.

6. Conclusions

Based on the studies in this paper, we conclude that:

1. High quantile and its CI estimation provides important information for risk management and for extreme event predictions.

2. Based on the theoretical and simulation results, the proposed new method for estimating confidence interval of high quantiles has advantages properties comparing with other existing methods. The estimation is consistent and stable with less error. The proposed method provides a useful computational algorithm to the readers.

3. The confidence interval of high quantile obtained by the new proposed method also has the highest efficiency compared to the existing methods, in terms of having the smallest size of confidence interval, and the highest probability coverage of the true quantile values in most cases.

4. Based on the analysis of the two real-world examples, flu in Canada and gamma ray from the solar flare, we can see that the new proposed method can be applied to many more fields, including other extreme events such as insurance claims, natural disasters, stock market predictions and pandemic disease monitoring.

Author Contributions

The authors M.L.H. and X.R.-Y. carried this work and drafted the manuscript together. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Nature Sciences and Engineering Research Council of Canada (NSERC) grant: MLH DDG-2019-04206.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The datasets can be found here: http://hesperia.gsfc.nasa.gov/fermi/gbm/qlook/fermi_gbm_flare_list.txt [31] and https://www.who.int/influenza/gisrs_laboratory/flunet/en [28].

Acknowledgments

We are grateful for the comments of the reviewers and editor. They have helped us to improve the paper. We deeply appreciate the Brock Library Open Access Publishing Fund support.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Theorems 1 and 2

Lemma 1.

The sum of

C o v [\ln {\bar{Q}}_{\bar{H}}^{(p)} (i), \ln {\bar{Q}}_{\bar{H}}^{(p)} (j)],

i \neq j, i, j = 1, . . ., n - 1,

satisfy the inequality

\underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} C o v [\ln {\bar{Q}}_{\bar{H}}^{(p)} (i), \ln {\bar{Q}}_{\bar{H}}^{(p)} (j)] \leq w \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} \sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))},

(A1)

where w is a weight

w = \max_{i \neq j} ρ_{i j}^{+}, 0 \leq w \leq 1, where ρ_{i j}^{+} = |ρ_{i j}^{}|, 0 \leq ρ_{i j}^{+} \leq 1;

ρ_{i j}^{}

is correlation coefficient of

\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)

and

\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)

ρ_{i j}^{} = \frac{C o v [\ln {\bar{Q}}_{\bar{H}}^{(p)} (i), \ln {\bar{Q}}_{\bar{H}}^{(p)} (j)]}{\sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))}}, i \neq j, i, j = 1, . . ., n - 1, - 1 \leq ρ_{i j}^{} \leq 1 .

Proof of Lemma 1.

For each pair

(i, j),

i \neq j, i, j = 1, . . ., n - 1

C o v [\ln {\bar{Q}}_{\bar{H}}^{(p)} (i), \ln {\bar{Q}}_{\bar{H}}^{(p)} (j)] = ρ_{i j}^{} \sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))}

then we have a boundary if we only add the positive

ρ_{i j}^{+}

terms in the following summation

\begin{matrix} \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} C o v [\ln {\bar{Q}}_{\bar{H}}^{(p)} (i), \ln {\bar{Q}}_{\bar{H}}^{(p)} (j)] & = & \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} ρ_{i j} \sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))} \\ \leq & \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} ρ_{i j}^{+} \sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))} \\ \leq & w \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} \sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))} . □ \end{matrix}

Lemma 2.

Under conditions (C1) and (C2), for

\ln {\bar{Q}}_{\bar{H}}^{(p)} (k)

in (14) by use Theorem 5.1, formula (5.2) in Gomes and Pestana (2007, p.285 [17]), as

n \to \infty,

\frac{\sqrt{k}}{\ln (\frac{k}{n p})} (\ln {\bar{Q}}_{\bar{H}}^{(p)} (k) - \ln V a R_{p}) \overset{d}{\vec{n \to \infty}} N o r m a l (0, γ^{2}),

then the asymptotic expected value and variance are

E (\ln {\bar{Q}}_{\bar{H}}^{(p)} (k) - \ln V a R_{p}) \approx 0, a s n \to \infty;

V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (k)) \approx γ^{2} \frac{{(\ln (\frac{k}{n p}))}^{2}}{k}, a s n \to \infty .

(A2)

Proof of Theorem 1.

Under conditions (C1) and (C2), in the Hall-Welsh class of models in (6), where

\bar{H}

is in (8) with conditions

\sqrt{k} ({\bar{H}}_{β, ρ} (k) - γ) \overset{d}{\underset{n \to \infty}{⟶}} N o r m a l (0, γ),

and

{\bar{H}}_{\hat{β}, \hat{ρ}} (k) \overset{d}{\underset{n \to \infty}{⟶}} = γ + \frac{γ}{\sqrt{k}} V_{k} + o_{p} (A (\frac{n}{k})),

where

V_{k}

is an asymptotic standard normal random variable.

By Schwartz inequality and Lemma 1 formula (A1), sinece

α

is a contant in (19), based on asympototic properties of

C_{p} (k; \hat{β}, \hat{ρ})

in (13) (Gomes and Pestana, 2007, p.286 [17]), we have that

\begin{array}{l} V a r (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)}) \\ = & V a r [(\frac{1}{n - 1} \sum_{k = 1}^{n - 1} \ln {\bar{Q}}_{\bar{H}}^{(p)} (k))] \\ = & \frac{1}{{(n - 1)}^{2}} \{\sum_{k = 1}^{n - 1} V a r [(\ln {\bar{Q}}_{\bar{H}}^{(p)} (k))] + \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} C o v [\ln {\bar{Q}}_{\bar{H}}^{(p)} (i), \ln {\bar{Q}}_{\bar{H}}^{(p)} (j)]\} \\ = & \frac{1}{{(n - 1)}^{2}} \{\sum_{k = 1}^{n - 1} V a r [(\ln {\bar{Q}}_{\bar{H}}^{(p)} (k))] + \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} ρ_{i j} \sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))}\} \\ \leq & \frac{1}{{(n - 1)}^{2}} \{\sum_{k = 1}^{n - 1} V a r [(\ln {\bar{Q}}_{\bar{H}}^{(p)} (k))] + w \sum_{i \neq j}^{n - 1} \sum^{n - 1} \sqrt{(V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (i)) V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (j)))}\} . \end{array}

Therefore when n is large enough, use Lemma 2, formula (A2),

V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (k)) \approx γ^{2} \frac{{(\ln (\frac{k}{n p}))}^{2}}{k}, a s n \to \infty;

and

E (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - \ln V a R_{p}) \approx 0, a s n \to \infty .

we can have the following approximate relation

\begin{matrix} V a r (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)}) & \leq & \frac{1}{{(n - 1)}^{2}} [\sum_{k = 1}^{n - 1} γ^{2} \frac{{[\ln (\frac{k}{n p})]}^{2}}{k} + w \sum_{i \neq j}^{n - 1} \sum^{n - 1} γ^{2} \sqrt{\frac{{[\ln (\frac{i}{n p})]}^{2}}{i}} \sqrt{\frac{{(\ln (\frac{j}{n p}))}^{2}}{j}}] \\ = & \frac{γ^{2}}{{(n - 1)}^{2}} [\sum_{k = 1}^{n - 1} \frac{{[\ln (\frac{k}{n p})]}^{2}}{k} + w \sum_{i \neq j}^{n - 1} \sum^{n - 1} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i \cdot j}}]], n \to \infty . \end{matrix}

this proved (21). Therefore, we have the asymptotic normal distribution in (20)

(\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - \ln V a R_{p}) \overset{d}{\underset{n \to \infty}{⟶}} N o r m a l (0, \frac{γ^{2}}{{(n - 1)}^{2}} [\sum_{k = 1}^{n - 1} \frac{{[\ln \frac{k}{n p}]}^{2}}{k} + w \sum_{i < j}^{n - 1} \sum^{n - 1} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i \cdot j}}]]) .

(A3)

Furthermore, we use (21) and (A2), we obtain (22) as

\begin{matrix} E F F {(\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)})}_{\ln {\hat{Q}}_{\bar{H}}^{(p)} (k)} & = & \frac{V a r (\ln {\bar{Q}}_{\bar{H}}^{(p)} (k))}{V a r (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)})} \\ \geq & \frac{{(n - 1)}^{2} \frac{{(\ln (\frac{k}{n p}))}^{2}}{k}}{\sum_{k = 1}^{n - 1} \frac{{[\ln (\frac{k}{n p})]}^{2}}{k} + w \sum^{n - 1} \sum^{n - 1} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i • j}}]}, n \to \infty . □ \end{matrix}

Proof of Theorem 2.

Under conditions (C1) and (C2), Use

V a r (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)})

in Theorem 1, Formula (22), and

z_{α / 2} = - z_{1 - α / 2},

with

b 3 = \frac{z_{1 - α / 2}}{n - 1} \sqrt{\sum_{k = 1}^{n - 1} \frac{{[\ln (\frac{k}{n p})]}^{2}}{k} + w \sum_{i \neq j}^{n - 1} \sum^{n - 1} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i \cdot j}}]}, U C L_{\bar{H}} (k) = \frac{\bar{H}}{1 - \frac{z_{1 - α / 2}}{\sqrt{k}}} .

we have

Z = \frac{\sqrt{n - 1}}{γ \sqrt{\sum_{k = 1}^{n - 1} \frac{{[\ln (\frac{k}{n p})]}^{2}}{k} + w \underset{i \neq j}{\sum^{n - 1} \sum^{n - 1}} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i • j}}]}} (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - \ln V a R_{p}) \overset{d}{\vec{n \to \infty}} N o r m a l (0, 1),

therefore approximately

P \{- z_{1 - α / 2} \leq \frac{n - 1}{γ \sqrt{\sum_{k = 1}^{n - 1} \frac{{[\ln (\frac{k}{n p})]}^{2}}{k} + \underset{i \neq j}{w \sum^{n - 1} \sum^{n - 1}} [\frac{\ln (\frac{i}{n p}) \ln (\frac{j}{n p})}{\sqrt{i • j}}]}} (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - \ln V a R_{p}) \leq z_{1 - α / 2}\}

= 1 - α,

then

P \{- γ b 3 \leq (\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - \ln V a R_{p}) \leq γ b 3\} = 1 - α

P \{- \ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - γ b 3 \leq - \ln V a R_{p} \leq - \ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} + γ b 3\} = 1 - α,

and using

\hat{γ} = \bar{H} (k) \leq U C L_{\bar{H}} (k),

to guarantee a coverage probability of at least

(1 - α) 100 %,

we have

P \{\ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} - U C L_{\bar{H}} (k) b 3 \leq \ln V a R_{p} \leq \ln {\hat{Q}}_{N e w, \bar{H}}^{(p)} + U C L_{\bar{H}} (k) b 3\} \geq 1 - α . □

References

Fisher, R.A.; Tippett, L.H.C. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Procs. Camb. Philos. Soc. 1928, 24, 180–190. [Google Scholar] [CrossRef]
de Haan, L.D.; Ferreira, A. Extreme Value Theory; Springer: New York, NY, USA, 2006. [Google Scholar]
Dekkers, A.L.M.; de Haan, L. On the estimation of the extreme value index and large quantile estimation. Ann. Stat. 1989, 17, 1795–1832. [Google Scholar] [CrossRef]
Hill, B.M. A simple general approach to inference about the tail of a distribution. Ann. Statist. 1975, 3, 1163–1174. [Google Scholar] [CrossRef]
Weissman, I. Estimation of parameters and large quantiles based on the k largest observations. J. Am. Stat. Assoc. 1978, 73, 812–815. [Google Scholar]
Beirlant, J.; Vynckier, P.; Teugels, J.L. Tail index estimation, Pareto quantile plots and regression diagnostics. J. Am. Statist. Assoc. 1996, 91, 1659–1667. [Google Scholar]
Dreea, H.; Kaufmann, E. Selecting the optimal sample fraction in univaraiate extreme values estimation. Stoch. Proc. Appl. 1998, 75, 149–172. [Google Scholar]
Guillou, A.; Hall, P.G. A diagnostic for selecting the threshold in extreme value analysis. J. R. Stat. Soc. B 2001, 63, 293–305. [Google Scholar] [CrossRef]
Gomes, M.I.; Oliveira, O. The bootstrap methodology in statistics of extremes: Choice odf the optimal sample fraction. Extremes 2001, 4, 331–358. [Google Scholar] [CrossRef]
Hall, P.; Welsh, A.H. Adaptive estimates of parameters of regular variation. Ann. Stat. 1985, 13, 331–341. [Google Scholar] [CrossRef]
Peng, L. Asymptotic unbiased estimator for extreme-value index. Stat. Probab. Lett. 1998, 38, 107–115. [Google Scholar] [CrossRef]
Beirlant, J.; Dierckx, G.; Goegebeur, Y.; Matthys, G. Tail index estimation and an exponential regression model. Extremes 1999, 2, 177–200. [Google Scholar] [CrossRef]
Feuerverger, A.; Hall, P.G. Estimating a tail exponent by modelling departure from a Pareto. Ann. Stat. 1999, 27, 760–781. [Google Scholar]
Gomes, M.I.; Martins, M.J.; Neves, M. Alternatives to a semi-parametric estimator of parametric of rare events-the jackknife methodology. Extremes 2000, 3, 207–229. [Google Scholar] [CrossRef]
Caeiro, F.; Gomes, M.I. A class of asymptotically unbiased semi-parametric estimations of the tail index. Test 2002, 11, 345–364. [Google Scholar] [CrossRef]
Gomes, M.I.; Figueiredo, F.; Mendonca, S. Asymptotically best linear unbiased tail estimators under second order regular variation. J. Stat. Plan. Inference 2004, 134, 409–433. [Google Scholar] [CrossRef]
Gomes, M.I.; Pestana, D. A study reduced-bias extreme quantile (VaR) estimator. J. Am. Assoc. 2007, 102, 280–292. [Google Scholar] [CrossRef]
Caeiro, F.; Gomes, M.I.; Pestana, D. Direct reduction of bias of the classical Hill estimator. Rev. Stat. 2005, 3, 113–136. [Google Scholar]
Lekina, A.; Chebana, F.; Ouarda, T.B.M.J. Weighted estimate of extreme quantile: An application to the estimation of high flood return periods. Stoch. Environ. Res. Risk Assess. 2014, 28, 147–165. [Google Scholar] [CrossRef] [Green Version]
Lekina, A. Estimation Non-Paramétrique des Quantiles Extrêmes Conditionnels. Ph.D. Thesis, Université de Grenoble, Saint-Martin-d’Hères, France, 2010. [Google Scholar]
Huang, M.L. A New High Quantile Estimator for Heavy Tailed Distributions; (Working Paper); Department of Mathematics, Brock University: St. Catharines, ON, Canada, 2011. [Google Scholar]
Fréchet, M. Sur la loi de probabilite de l’écart maximum. Ann. Soc. Pol. Math. 1927, 6, 93–116. [Google Scholar]
de Zea Bermudeza, P.; Kotz, S. Parameter estimation of the generalized Pareto distribution. J. Stat. Inference 2010, 140, 1374–1388. [Google Scholar] [CrossRef]
Bickel, P.J.; Docksum, K.A. Mathematical Statistics, Basic Ideas and Selected Topic, 2nd ed.; CRC Press, Taylor & Frances Group: Boca Raton, FL, USA, 2015; Volume 1. [Google Scholar]
Pickands, J. Statistical inference using extreme order statistics. Ann. Stat. 1975, 3, 119–131. [Google Scholar]
Balkema, A.A.; de Haan, L. Residual life time at great age. Ann. Prob. 1974, 2, 792–804. [Google Scholar] [CrossRef]
Scarrott, G.; McDonald, A. A Review of extreme value threshold estimation and uncertainty quantification. Revstat 2012, 10, 33–60. [Google Scholar]
World Health Organization (WHO). 2020. Available online: https://www.who.int/influenza/gisrs_laboratory/flunet/en (accessed on 31 December 2020).
Kolmogorov, A.N. Sulla determinazione empirica di una legge di distribuzione. G. Dell. Istituto Ital. Degli Attuari 1933, 4, 83–91. [Google Scholar]
Anderson, T.W.; Darling, D.A. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann. Math. Stat. 1952, 23, 193–212. [Google Scholar] [CrossRef]
National Aeronautics and Space Administration (NASA). Gamma Ray. 2020. Available online: http://hesperia.gsfc.nasa.gov/fermi/gbm/qlook/fermi_gbm_flare_list.txt (accessed on 31 December 2020).
National Weather Service (NOAA). Space Weather Prediction Center. 2020. Available online: https://satdat.ngdc.noaa.gov/sem/goes/data/plots (accessed on 31 December 2020).

Figure 1. Underlying Fréchet (

0.25

),

ρ = - 1,

β = 0.5

.

N = 500,

n = 1000 .

(a) The means of ln-quantile estimators with the true

\ln V a R_{0.0005} \approx 1.9

(

\ln {\hat{Q}}_{n e w, \bar{H}}^{0.0005} \approx 1.88

). (b) The RMSE of Ln-quantile estimation,

p = 0.0005

,

\hat{α} = 1.14 .

Figure 1. Underlying Fréchet (

0.25

),

ρ = - 1,

β = 0.5

.

N = 500,

n = 1000 .

(a) The means of ln-quantile estimators with the true

\ln V a R_{0.0005} \approx 1.9

(

\ln {\hat{Q}}_{n e w, \bar{H}}^{0.0005} \approx 1.88

). (b) The RMSE of Ln-quantile estimation,

p = 0.0005

,

\hat{α} = 1.14 .

Figure 2. Underlying

G P D

(

0.5

),

ρ = - 0.5,

β = 1

,

N = 500,

n = 1000 .

(a) The means of ln-quantile estimators with the true

\ln V a R_{0.0005} \approx 4.47

(\ln {\hat{Q}}_{n e w, \bar{H}}^{0.0005} \approx 4.42) .

(b) The RMSE of Ln-quantile estimation,

p = 0.0005

,

\hat{α} = - 0.7482

.

Figure 2. Underlying

G P D

(

0.5

),

ρ = - 0.5,

β = 1

,

N = 500,

n = 1000 .

(a) The means of ln-quantile estimators with the true

\ln V a R_{0.0005} \approx 4.47

(\ln {\hat{Q}}_{n e w, \bar{H}}^{0.0005} \approx 4.42) .

(b) The RMSE of Ln-quantile estimation,

p = 0.0005

,

\hat{α} = - 0.7482

.

Figure 3. Underlying

G P D

(2),

ρ = - 2,

β = 1

,

N = 500,

n = 1000 .

(a) The means of ln-quantile estimators with the true

\ln V a R_{0.0005} \approx 14.51

(

\ln {\hat{Q}}_{n e w, \bar{H}}^{0.0005} \approx 14.49

). (b) The RMSE of ln-quantile estimators,

p = 0.0005

,

\hat{α} = - 2.8417 .

Figure 3. Underlying

G P D

(2),

ρ = - 2,

β = 1

,

N = 500,

n = 1000 .

(a) The means of ln-quantile estimators with the true

\ln V a R_{0.0005} \approx 14.51

(

\ln {\hat{Q}}_{n e w, \bar{H}}^{0.0005} \approx 14.49

). (b) The RMSE of ln-quantile estimators,

p = 0.0005

,

\hat{α} = - 2.8417 .

Figure 4.

F r \overset{´}{e} c h e t

(0.25) model, 95% confidence interval of quantile estimators,

N = 500,

n = 1000,

p = 0.0005

,

β = 0.5, ρ = - 1,

α (1000) = L R = 1.14,

k_{0}^{Q_{H}} = 165,

k_{01} = 395 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 0.2668. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 4.

F r \overset{´}{e} c h e t

(0.25) model, 95% confidence interval of quantile estimators,

N = 500,

n = 1000,

p = 0.0005

,

β = 0.5, ρ = - 1,

α (1000) = L R = 1.14,

k_{0}^{Q_{H}} = 165,

k_{01} = 395 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 0.2668. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 5. The

G P D

(0.5) model, 95% confidence interval of quantile estimators,

N = 500,

n = 1000,

p = 0.0005,

β = 1, ρ = - 0.5,

\hat{α} (1000) = \hat{ρ} = - 0.7482,

k_{0}^{Q_{H}} = 28,

k_{01} = 93 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 0.7094. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 5. The

G P D

(0.5) model, 95% confidence interval of quantile estimators,

N = 500,

n = 1000,

p = 0.0005,

β = 1, ρ = - 0.5,

\hat{α} (1000) = \hat{ρ} = - 0.7482,

k_{0}^{Q_{H}} = 28,

k_{01} = 93 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 0.7094. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 6. The

G P D

(2) model, 95% confidence interval of quantile estimators

N = 500, n = 1000,

p = 0.0005,

β = 1, ρ = - 2,

\hat{α} (1000) = L R = - 2.8417,

{\hat{k}}_{0} = 75,

{\hat{k}}_{0}^{Q_{H}} = 80,

{\hat{k}}_{01} = 70 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 2.2511. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 6. The

G P D

(2) model, 95% confidence interval of quantile estimators

N = 500, n = 1000,

p = 0.0005,

β = 1, ρ = - 2,

\hat{α} (1000) = L R = - 2.8417,

{\hat{k}}_{0} = 75,

{\hat{k}}_{0}^{Q_{H}} = 80,

{\hat{k}}_{01} = 70 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 2.2511. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 7. Flu original data from 1 January 1997 to December 31 2019,

n^{*} =

994 weeks, (a) Flu chart of type A flu viruses detected in Canada, and

n =

111 weeks remaining after the threshold, of average 953 flu viruses. (b) Histogram of the number of type A flu viruses detected in Canada.

Figure 7. Flu original data from 1 January 1997 to December 31 2019,

n^{*} =

994 weeks, (a) Flu chart of type A flu viruses detected in Canada, and

n =

111 weeks remaining after the threshold, of average 953 flu viruses. (b) Histogram of the number of type A flu viruses detected in Canada.

Figure 8. After threshold 953 flu viruses, Flu transformation data,

n = 111,

(a) Log-log plot of flu in Canada example. (b) Estimate GPD curve and the 99% high quantile and histogram of the distribution of type A flu viruses detested weekly.

Figure 8. After threshold 953 flu viruses, Flu transformation data,

n = 111,

(a) Log-log plot of flu in Canada example. (b) Estimate GPD curve and the 99% high quantile and histogram of the distribution of type A flu viruses detested weekly.

Figure 9. For flu in the Canadian data,

n = 111,

(a) Estimates of the second-order parameter

\hat{ρ}

and

{\hat{ρ}}_{τ} (k)

,

τ = 0;

(b) Estimates

\hat{β}

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k) .

(c) Tail index estimators, H,

\bar{H};

(d) ln-quantile estimators,

p = 0.01 .

The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level.

Figure 9. For flu in the Canadian data,

n = 111,

(a) Estimates of the second-order parameter

\hat{ρ}

and

{\hat{ρ}}_{τ} (k)

,

τ = 0;

(b) Estimates

\hat{β}

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k) .

(c) Tail index estimators, H,

\bar{H};

(d) ln-quantile estimators,

p = 0.01 .

The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level.

Figure 10. 95% confidence interval of three ln-quantile estimators after the threshold 953 for the flu in Canada example.

n = 111

,

p = 0.01 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 0.7966. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 10. 95% confidence interval of three ln-quantile estimators after the threshold 953 for the flu in Canada example.

n = 111

,

p = 0.01 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 0.7966. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 11. Two weeks plot of gamma ray & X ray from July 2 to 16, 2012.

Figure 12. Gamma ray original data from November 2008 to April 2017,

n^{*} = 5182,

(a) Gamma ray released V.S solar flare occurred. After the threshold of 86 million counts,

n = 104

flares remaining. (b) Histogram of gamma ray released from solar flares.

Figure 12. Gamma ray original data from November 2008 to April 2017,

n^{*} = 5182,

(a) Gamma ray released V.S solar flare occurred. After the threshold of 86 million counts,

n = 104

flares remaining. (b) Histogram of gamma ray released from solar flares.

Figure 13. After threshold 86 millions count, transformation data,

n = 104,

(a) Log-log plot of gamma ray from solar flare example. (b) The Estimate GPD and the 99% high quantile of the distribution of gamma ray released by solar flare.

Figure 13. After threshold 86 millions count, transformation data,

n = 104,

(a) Log-log plot of gamma ray from solar flare example. (b) The Estimate GPD and the 99% high quantile of the distribution of gamma ray released by solar flare.

Figure 14. For gamma ray of solar flare example,

n = 104,

(a) Estimates of the second-order parameters

\hat{ρ}

and

{\hat{ρ}}_{τ} (k)

,

τ = 0

, (b) Estimates

\hat{β}

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k) .

(c) Tail index estimators, H,

\bar{H}

. (d) ln-quantile estimators,

p = 0.01 .

The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level.

Figure 14. For gamma ray of solar flare example,

n = 104,

(a) Estimates of the second-order parameters

\hat{ρ}

and

{\hat{ρ}}_{τ} (k)

,

τ = 0

, (b) Estimates

\hat{β}

and

{\hat{β}}_{{\hat{ρ}}_{0}} (k) .

(c) Tail index estimators, H,

\bar{H}

. (d) ln-quantile estimators,

p = 0.01 .

The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level.

Figure 15. 95% confidence interval of three ln-quantile estimators after threshold of 86 million counts for the gamma ray example.

n = 104

,

p = 0.01 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 1.4451. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Figure 15. 95% confidence interval of three ln-quantile estimators after threshold of 86 million counts for the gamma ray example.

n = 104

,

p = 0.01 .

Note that

\ln {\hat{Q}}_{n e w, \bar{H}}

(purple) has shortest CI with length 1.4451. (The solid circles “•” in the plot are the values of the quantile estimators at their optimal k level).

Table 1. The four ln-quantile estimators we use in simulations.

Quantile Estimators	Defined in	Tail Index Estimator
$\ln Q_{\hat{γ} = H} = \ln q_{H}$	(6)	H in (5)
$\ln Q_{H}$	(7)	H in (5)
$\ln {\tilde{Q}}_{\bar{H}}$	$\begin{matrix} \ln Q_{\bar{H}} when ρ \neq - 1 in (12) \\ \ln {\bar{Q}}_{\bar{H}} when ρ = - 1 in (14) \end{matrix}$	$\bar{H}$ in (9)
$l n {\hat{Q}}_{n e w, \bar{H}}$	(19)	$\bar{H}$ in (9)

Table 2. Fréchet (

0.25), N = 500

,

β = 0.5, ρ = - 1

. Mean, MSE, REFF of the

\ln V a R

Estimators. The highest REFF values are in bold.

Table 2. Fréchet (

0.25), N = 500

,

β = 0.5, ρ = - 1

. Mean, MSE, REFF of the

\ln V a R

Estimators. The highest REFF values are in bold.

n		500	1000	2000	5000
$\ln {V a R}_{p},$ $p = 1 / (2 n)$		1.7268	1.9002	2.0735	2.3026
$k_{0}$		126	200	318	585
$\hat{α} (n) = L R$		1.1218	1.1400	0.4991	−0.0357
$\ln q_{H}$	Mean (MSE)	1.8526 (0.0429)	2.0038 (0.0300)	2.1657 (0.0228)	2.3755 (0.0147)
	REFF	1	1	1	1
$\ln Q_{H}$	Mean (MSE)	1.7906 (0.0219)	1.9540 (0.0154)	2.1239 (0.0115)	2.3431 (0.0074)
	REFF	1.4004	1.3933	1.4104	1.4125
$\ln {\bar{Q}}_{\bar{H}}$	Mean (MSE)	1.7092 (0.0206)	1.8849 (0.0141)	2.0764 (0.0111)	2.3073 (0.0072)
	REFF	1.4419	1.4576	1.4347	1.4257
$\ln {\hat{Q}}_{n e w, \bar{H}}$	Mean (MSE)	1.7185 (0.0095)	1.8791 (0.0065)	2.0716 (0.0051)	2.2798 (0.0044)
	REFF	2.1252	2.1399	2.1139	1.8231

Table 3. GPD (

0.5),

N = 500,

β = 1, ρ = - 0.5

. Mean, MSE, REFF of the

\ln V a R

estimators. The highest REFF values are in bold.

Table 3. GPD (

0.5),

N = 500,

β = 1, ρ = - 0.5

. Mean, MSE, REFF of the

\ln V a R

estimators. The highest REFF values are in bold.

n		500	1000	2000	5000
$\ln {V a R}_{p},$ $p = 1 / (2 n)$		4.1149	4.4710	4.8242	5.2883
$k_{0}$		34	48	68	107
$\hat{α} (n) = \hat{ρ}$		−0.7512	−0.7482	−0.7427	−0.7244
$\ln q_{H}$	Mean (MSE)	4.7019 (0.6349)	4.9773 (0.4863)	5.3065 (0.4554)	5.7209 (0.3427)
	REFF	1	1	1	1
$\ln Q_{H}$	Mean (MSE)	4.2913 (0.2172)	4.6258 (0.1628)	4.9904 (0.1491)	5.4485 (0.1074)
	REFF	1.7159	1.7282	1.7478	1.7865
$\ln Q_{\bar{H}}$	Mean (MSE)	4.1140 (0.1654)	4.4801 (0.1267)	4.8656 (0.1166)	5.3434 (0.0825)
	REFF	1.9663	1.9591	1.9763	2.0379
$\ln {\hat{Q}}_{n e w, \bar{H}}$	Mean (MSE)	3.9076 (0.0779)	4.4239 (0.0233)	4.8674 (0.0241)	5.4359 (0.0382)
	REFF	2.8657	4.5666	4.3428	2.9954

Table 4. GPD(

2),

N = 500,

β = 1, ρ = - 2 .

Mean, MSE, REFF of the

\ln V a R

estimators. The highest REFF values are in bold.

Table 4. GPD(

2),

N = 500,

β = 1, ρ = - 2 .

Mean, MSE, REFF of the

\ln V a R

estimators. The highest REFF values are in bold.

n		500	1000	2000	5000
$\ln {V a R}_{p},$ $p = 1 / (2 n)$		13.1224	14.5087	15.8949	17.7275
$k_{0}$		170	269	515	1071
$\hat{α} (n) = L R$		−2.7893	−2.8417	−2.8687	−3.1684
$\ln q_{H}$	Mean (MSE)	13.6276 (1.3232)	14.9415 (0.9745)	16.2733 (0.6548)	18.0099 (0.3833)
	REFF	1	1	1	1
$\ln Q_{H}$	Mean (MSE)	13.4502 (0.9965)	14.8004 (0.7283)	16.1618 (0.4779)	17.9283 (0.2804)
	REFF	1.1523	1.1567	1.2412	1.1693
$\ln Q_{\bar{H}}$	Mean (MSE)	13.1933 (0.8926)	14.5960 (0.6477)	15.9719 (0.4491)	17.7717 (0.2751)
	REFF	1.2175	1.2267	1.2075	1.1804
$\ln {\hat{Q}}_{n e w, \bar{H}}$	Mean (MSE)	13.0009 (0.6007)	14.4907 (0.3680)	15.8926 (0.3070)	17.6429 (0.2127)
	REFF	1.4841	1.6274	1.4606	1.3426

Table 5.

N = 500, n = 1000,

efficiencies of 95% CI for

\ln V a R_{0.01}

.

Table 5.

N = 500, n = 1000,

efficiencies of 95% CI for

\ln V a R_{0.01}

.

	$C I$ of	at Optimal k	Length	${EFF}_{length}$	$\begin{matrix} Probability \\ Coverage \end{matrix}$	${EFF}_{P . C .}$
	$\ln Q_{H}$	$k_{0}^{Q_{H}} =$ 165	0.5142	1	94.2%	1
Fréchet (0.25)	$\ln {\bar{Q}}_{\bar{H}}$	$k_{01} =$ 395	0.3564	1.4517	96.7%	0.4706
	$\ln {\hat{Q}}_{n e w, \bar{H}}$	$k_{01} =$ 395	0.2668	1.9275	99.6%	0.1739
	$\ln Q_{H}$	$k_{0}^{Q_{H}} =$ 28	2.4922	1	47.4%	1
GPD(0.5)	$\ln Q_{\bar{H}}$	$k_{01} =$ 93	1.5204	1.6392	79.0%	2.9750
	$\ln {\hat{Q}}_{n e w, \bar{H}}$	$k_{01} =$ 93	0.7094	3.5130	99.6%	10.3478
	$\ln Q_{H}$	$k_{0}^{Q_{H}} =$ 270	3.4410	1	79.7%	1
GPD(2)	$\ln Q_{\bar{H}}$	$k_{01} =$ 511	2.7291	1.2609	83.2%	1.2966
	$\ln {\hat{Q}}_{n e w, \bar{H}}$	$k_{01} =$ 511	2.2511	1.5286	99.6%	3.3261

Table 6. The goodness-of-fit tests under the

G P D

model for the flu in Canada data.

Table 6. The goodness-of-fit tests under the

G P D

model for the flu in Canada data.

	Goodness-of-Fit Tests
	K-S Test		A-D Test		C-v-M Test
	$\begin{matrix} Test \\ Statistics \end{matrix}$	p-Value	$\begin{matrix} Test \\ Statistics \end{matrix}$	$p$ -Value	$\begin{matrix} Test \\ Statistics \end{matrix}$	$p$ -Value
${\hat{γ}}_{M L E}$	0.0628	0.6406	0.4475	0.8007	0.0621	0.8006

Table 7.

A E

and

I E

under the

G P D

model for the flu in Canada data by using

{\hat{γ}}_{M L E}

.

Table 7.

A E

and

I E

under the

G P D

model for the flu in Canada data by using

{\hat{γ}}_{M L E}

.

	Absolute Errors ( $AE$ )			Integrated Errors ( $IE$ )
	$\begin{matrix} r th Highest Amount \\ of Type A Viruses \end{matrix}$			$\begin{matrix} r th Highest Amount \\ of Type A Viruses \end{matrix}$
	$r = 12$	$r = 56$	$r = 111$	$r = 12$	$r = 56$	$r = 111$
${\hat{γ}}_{M L E}$	0.0450	0.0450	0.0628	0.0085	0.0071	0.0074

Table 8. Estimated

V a R_{0.05}

and

V a R_{0.01}

for the flu in Canada data. (Unit: Type A flu viruses).

Table 8. Estimated

V a R_{0.05}

and

V a R_{0.01}

for the flu in Canada data. (Unit: Type A flu viruses).

Estimation	$\hat{α}$	$\hat{γ}$	Mean	Median	${VaR}_{0.05}$	${VaR}_{0.01}$
$\ln Q_{H}$	$N / A$	$H =$ 0.4370	3219.29	2257.03	4519.70	8159.10
$\ln Q_{\bar{H}}$	$N / A$	$\bar{H} =$ 0.3736	2989.93	2130.78	3736.80	6031.79
$\ln {\hat{Q}}_{n e w, \bar{H}}$	$\hat{ρ} = - 0.7101$	$\bar{H} =$ 0.3736	2989.93	1690.07	2924.80	5499.85

Table 9. The 95% confidence interval for

\ln V a R_{0.01}

and

V a R_{0.01}

.

Table 9. The 95% confidence interval for

\ln V a R_{0.01}

and

V a R_{0.01}

.

$\begin{matrix} Estimation \\ Method \end{matrix}$	k	$LCL$	$\ln {VaR}_{0.01}$ ( ${VaR}_{0.01})$	$UCL$	Length	$EFF$
$\ln Q_{H}$	${\hat{k}}_{0} = 21$	0.6920	1.7312	2.1452	1.4531	1
( $Q_{H})$		(3502.14)	(8159.10)	(11854.31)	(8352.17)	(1)
$\ln Q_{\bar{H}}$	${\hat{k}}_{01} = 42$	0.7929	1.3814	1.9698	1.1770	1.2346
( $Q_{\bar{H}})$		(3772.58)	(6031.78)	(10101.19)	(6328.21)	(1.3197)
$\ln Q_{n e w, \bar{H}}$	${\hat{k}}_{01} = 42$	0.8724	1.2707	1.6690	0.7966	1.8242
( $Q_{n e w, \bar{H}})$		(4006.07)	(5499.85)	(7724.49)	(3718.42)	(2.2462)

Table 10. Compare the goodness-of-fit tests under the

G P D

model for the gamma ray data.

Table 10. Compare the goodness-of-fit tests under the

G P D

model for the gamma ray data.

	Goodness-of-Fit Tests
	$K - S$ Test		$A - D$ Test		$C - v - M$ Test
	$\begin{matrix} Test \\ Statistics \end{matrix}$	p-Value	$\begin{matrix} Test \\ Statistics \end{matrix}$	$p$ -Value	$\begin{matrix} Test \\ Statistics \end{matrix}$	$p$ -Value
${\hat{γ}}_{M L E}$	0.0697	0.5750	0.7276	0.5362	0.0991	0.5893

Table 11.

A E

and

I E

under the

G P D

model for the gamma ray data using

{\hat{γ}}_{M L E}

.

Table 11.

A E

and

I E

under the

G P D

model for the gamma ray data using

{\hat{γ}}_{M L E}

.

	Absolute Errors ( $AE$ )			Integrated Errors ( $IE$ )
	rth Highest Gamma Ray Released			rth Highest Gamma Ray Released
	$r = 11$	$r = 53$	$r = 104$	$r = 11$	$r = 53$	$r = 104$
${\hat{γ}}_{M L E}$	0.0359	0.0697	0.0697	0.0062	0.0092	0.0089

Table 12. Estimated

V a R_{0.05}

and

V a R_{0.01}

in the gamma ray example. (Unit: million counts).

Table 12. Estimated

V a R_{0.05}

and

V a R_{0.01}

in the gamma ray example. (Unit: million counts).

$\begin{matrix} Estimation \\ Method \end{matrix}$	$\hat{α}$	$\hat{γ}$	Mean	Median	${VaR}_{0.05}$	${VaR}_{0.01}$
$\ln Q_{H}$	N/A	$H =$ 0.5324	451.82	315.27	867.12	1926.04
$\ln Q_{\bar{H}}$	N/A	$\bar{H} =$ 0.6517	577.22	232.28	742.01	1958.67
$\ln {\hat{Q}}_{n e w, \bar{H}}$	$\hat{ρ} = -$ 0.7269	$\bar{H} =$ 0.6517	577.22	189.35	441.60	1102.57

Table 13. The 95% confidence interval of

\ln V a R_{0.01} t .

and

V a R_{0.01} .

Table 13. The 95% confidence interval of

\ln V a R_{0.01} t .

and

V a R_{0.01} .

$\begin{matrix} Estimation \\ Method \end{matrix}$	k	$LCL$	$\ln {VaR}_{0.01}$ ( ${VaR}_{0.01})$	$UCL$	Length	$EFF$
$\ln Q_{H}$	${\hat{k}}_{0} = 21$	1.0807	2.3755	2.8864	1.8057	1
( $Q_{H})$		(590.12)	(1926.04)	(3153.18)	(2563.06)	(1)
$\ln Q_{\bar{H}}$	${\hat{k}}_{01} = 41$	1.3367	2.3930	3.4494	2.1128	0.8547
( $Q_{\bar{H}})$		(737.14)	(1958.67)	(5471.71)	(4734.56)	(0.5414)
$\ln Q_{n e w, \bar{H}}$	${\hat{k}}_{01} = 41$	1.0595	1.7821	2.5047	1.4451	1.2495
( $Q_{n e w, \bar{H}})$		(579.55)	(1102.57)	(2179.85)	(1600.30)	(1.6016)

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Huang, M.L.; Raney-Yan, X. A Method for Confidence Intervals of High Quantiles. Entropy 2021, 23, 70. https://doi.org/10.3390/e23010070

AMA Style

Huang ML, Raney-Yan X. A Method for Confidence Intervals of High Quantiles. Entropy. 2021; 23(1):70. https://doi.org/10.3390/e23010070

Chicago/Turabian Style

Huang, Mei Ling, and Xiang Raney-Yan. 2021. "A Method for Confidence Intervals of High Quantiles" Entropy 23, no. 1: 70. https://doi.org/10.3390/e23010070

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Method for Confidence Intervals of High Quantiles

Abstract

1. Introduction

2. Existing Estimator for High Quantiles

2.1. Quantile Function-Tail Index Method

2.2. Weissman Method

2.3. Reduced-Bias Method

2.4. Optimal k Values

3. New Estimator for High Quantile

3.1. New Estimator

3.2. Asymptotic Properties of the New Estimator ln Q ^ N e w , H ¯ ( p )

3.3. The C.I. for The New Estimator ln Q ^ N e w , H ¯ ( p )

4. Simulations

4.1. Computer Simulations of Quantile Estimators

4.2. The Choice of α

4.3. Simulation of F r e ´ c h e t (0.25). GPD (0.5) and GPD (2)

4.4. Simulations of Confidence Intervals

5. Applications

5.1. Flu in Canada Example

5.1.1. Goodness-of-Fit Test

5.1.2. Compare Four Estimation Methods

5.1.3. Summary

5.2. Gamma Ray of Solar Flare Example

5.2.1. Gooness-of-Fit Tests

5.2.2. Compare Four Estimation Methods

5.2.3. Summary

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Proofs of Theorems 1 and 2

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.2. Asymptotic Properties of the New Estimator ln ${\hat{Q}}_{N e w, \bar{H}}^{(p)}$

3.3. The C.I. for The New Estimator ln ${\hat{Q}}_{N e w, \bar{H}}^{(p)}$

4.2. The Choice of $α$

4.3. Simulation of $F r \overset{´}{e} c h e t$ (0.25). GPD (0.5) and GPD (2)