Return Level Prediction with a New Mixture Extreme Value Model

Altun, Emrah; Alqifari, Hana N.; Söyler, Kadir

doi:10.3390/math13172705


by Emrah Altun ¹,*, Hana N. Alqifari ² and Kadir Söyler ³

¹ Department of Statistics, Gazi University, Ankara 06560, Turkey
² Department of Statistics and Operations Research, College of Science, Qassim University, Buraydah 51482, Saudi Arabia
³ Department of Mathematics, Bartin University, Bartin 74100, Turkey
* Author to whom correspondence should be addressed.

Mathematics 2025, 13(17), 2705; https://doi.org/10.3390/math13172705

Submission received: 3 July 2025 / Revised: 19 August 2025 / Accepted: 19 August 2025 / Published: 22 August 2025

(This article belongs to the Special Issue Mathematical Modelling and Applied Statistics)


Abstract

The generalized Pareto distribution is frequently used for modeling extreme values above an appropriate threshold level. Because determining an appropriate threshold value is difficult, mixture extreme value models have risen to prominence. In this study, mixture extreme value models based on the exponentiated Pareto distribution are proposed. The Weibull, gamma, and log-normal models are used as bulk densities. The parameter estimates of the proposed models are obtained using the maximum likelihood approach. Two different approaches, based on maximization of the log-likelihood and of the Kolmogorov–Smirnov p-value, are used to determine the appropriate threshold value. The effectiveness of these methods is compared using simulation studies. The proposed models are compared with other mixture models through an application study on earthquake data. The GammaEP web application is developed to ensure the reproducibility of the results and the usability of the proposed model.

Keywords:

return level; extreme value distribution; exponentiated Pareto; simulation

MSC:

62E15

1. Introduction

Earthquakes are one of the most important natural disasters that people are exposed to throughout their lives. Although the occurrence times of earthquakes cannot be predicted, the recurrence periods of earthquakes can be calculated by examining the characteristic structures and magnitudes of earthquakes. Peaks-over-threshold (POT) and block-maxima (BM) methods are statistical methods used to calculate the return levels of earthquakes. The POT method is based on the fact that observations above a certain threshold level can be expressed by a generalized Pareto distribution (GPD). The BM method is based on the modeling of the maximum (minimum) series obtained depending on the determined block length (monthly or annually) with the generalized extreme value (GEV) distribution.

The BM and POT methods have been widely utilized in the literature. Ref. [1] applied the GPD to model the earthquake energy distribution using the Harvard catalog that contains the seismic events between 1 January 1997 and 31 May 2000. Ref. [2] compared the lognormal and GPD models and found that the GPD model represents the tail behavior of the underlying data better than the lognormal distribution. Ref. [3] used the POT-GPD method to estimate the recurrence intervals for the Yunnan region of China. Ref. [4] demonstrated that the right-truncated-POT method produces more accurate results than the classic POT method. Ref. [5] estimated the statistical distribution of the maximum magnitudes of the earthquakes using the EVT approach. Ref. [6] examined the tail index estimators of the GPD model in detail. Ref. [7] proposed the sub-sampling block maxima method for modeling environmental extreme risk and offered a new approach for the estimation of the tail index under temporal dependence.

Applications of extreme value theory in areas other than earthquakes have become widespread. Ref. [8] modeled strong winds with extreme value techniques. In financial and insurance applications, refs. [9,10] investigated the effects of extreme values in portfolio management and risk measurement, while [11] laid the first theoretical foundations for extreme order statistics. Ref. [12] proposed a new approach called the POT-KumGP to model the extreme rainfall. Ref. [13] used GEV distribution to estimate the extreme rainfall events.

In response to the uncertainty and incompatibility problems of threshold selection, mixture models have come to the fore. Ref. [14] modeled extreme events in a Bayesian framework by defining the threshold value as a model parameter. Ref. [15] proposed a weighted mixture model instead of a fixed threshold. Ref. [16] developed a flexible mixture extreme value model with parametric flexibility to accurately represent both bulk and tail regions. Also, ref. [17] developed a semi-parametric Bayesian approach to extreme events, providing flexibility in both threshold estimation and tail structure. Ref. [18] proposed a model based on the Dirichlet distribution for the multivariate extremes. Ref. [19] introduced an R package called evmix to implement the mixture EVT models. Ref. [20] developed an approach based on the Mahalanobis distance for the determination of the appropriate threshold value. Ref. [21] used the mixture EVT models to predict the household income distribution. The dependence between the irradiance, temperature, and humidity was modeled by the help of the mixture EVT models [22].

In this study, we propose new mixture EVT models using the exponentiated Pareto (EP) distribution as the tail distribution. Although the GP distribution is frequently used in modeling extreme values and tail behavior, it does not provide the necessary flexibility for data with different structures because it has only one shape parameter. Since the EP distribution has two shape parameters, it provides an efficient modeling process in situations where the GP distribution fails. The main motivation of the presented study is to introduce a flexible model that is suitable for the extreme observations, such as earthquake, rainfall, and wind speed data modeling. The EP distribution is a generalized version of the GP distribution (see Section 3). So, it provides more flexibility in terms of skewness, kurtosis, and thicker tails than those of the GP distribution. The contributions of the presented study can be highlighted as follows:

✓: EP-based mixture EVT models that generalize the GP-based models are introduced.
✓: The threshold selection process in the mixture EVT model is performed with two approaches: maximization of the log-likelihood and maximization of the p-value of a goodness-of-fit test.
✓: The proposed approach is compared with GP-based models on the earthquake data.
✓: The GammaEP web tool is developed to make the implementation of the proposed model easy for users.

The model parameters are estimated with the maximum likelihood estimation (MLE) approach. A simulation study is performed to evaluate the performance of the MLE approach in estimating the EP-based EVT models. As mentioned above, the appropriate threshold value is determined using two different methods, based on the maximization of the likelihood function and the maximization of the Kolmogorov–Smirnov (KS) p-value. In the existing mixture EVT models, the threshold is treated as a model parameter, which enlarges the parameter vector and makes the estimation problem more complex.

The rest of the study is organized as follows: In Section 2, the general structure of mixture EVT models is given. In Section 3, EP-based mixture EVT models are introduced. In Section 4, the parameter-estimation process of EP-based mixture EVT models is discussed, and the validity of the estimation method is analyzed by simulation. In Section 5, the models are compared using Istanbul earthquake data. In Section 6, the cloud-based software developed for the proposed model is presented. The results obtained in the study are summarized in Section 7.

2. Mixture EVT Models

One of the most powerful ways of modeling extreme observations is the POT method, which builds on the results given in [11]. Let X be a random variable and let Y = X - u represent the exceedances over a predefined threshold u. Refs. [11,23] showed that the values above a sufficiently high threshold u are well approximated by the GPD. The cumulative distribution function (cdf) of the GPD is

G (x) = P (X \leq x| X > u) = 1 - {(1 + δ \frac{x - u}{σ_{u}})}^{- \frac{1}{δ}}, δ \neq 0,

(1)

where u > 0 is the threshold value, and σ_{u} > 0 and δ \in ℜ are the scale and shape parameters, respectively. If δ \geq 0, then x \geq u; otherwise, u \leq x \leq u - σ_{u} / δ. The probability density function (pdf) of the GPD is

g (x; σ_{u}, δ, u) = \frac{1}{σ_{u}} {(1 + \frac{(x - u) δ}{σ_{u}})}^{- \frac{(1 + δ)}{δ}} .

(2)

The threshold-exceedance probability, also known as the tail fraction, is ϕ_{u} = P (X > u). The tail fraction is used to calculate the survival probability

P (X > x) = ϕ_{u} [1 - P (X \leq x| X > u)],

(3)

where P (X \leq x| X > u) is the cdf of the GPD. The most important issue in the POT method is determining the appropriate threshold value. Graphical methods such as the mean excess plot, the Hill plot, and the threshold stability plot are used for that purpose. Detailed information on threshold selection can be found in [24]. As mentioned in [25], more than one suitable threshold value may be possible.

To overcome the threshold-selection problem, mixture extreme value models have been proposed by [14]. In the mixture extreme value models, modeling is performed using all the data instead of values that are above a certain threshold. The general form of the pdf in a mixture extreme value model is

\begin{matrix} f (x) = \{\begin{matrix} h (x; Φ_{1}), & x \leq u, \\ (1 - H (u; Φ_{1})) g (x; Φ_{2}), & x > u, \end{matrix} \end{matrix}

(4)

where h (x; Φ_{1}) is the density for the observations at or below the threshold u, g (x; Φ_{2}) is the heavy-tailed density that represents the extreme observations located in the tail of the distribution, and Φ_{1} and Φ_{2} are the parameter vectors of the densities h (\cdot) and g (\cdot), respectively. The cdf of (4) is

\begin{matrix} F (x) = \{\begin{matrix} H (x; Φ_{1}), & x \leq u, \\ H (u; Φ_{1}) + [1 - H (u; Φ_{1})] G (x; Φ_{2}), & x > u, \end{matrix} \end{matrix}

(5)

where G (x; Φ_{2}) is the cdf of the heavy-tailed distribution. When the tail distribution is modeled by the GPD, the return level for a return period t is calculated as

r_{t} = u + \frac{σ_{u}}{δ} ({(1 - p^{'})}^{- δ} - 1),

(6)

where

p^{'} = \frac{1 - \frac{1}{t} - H (u; Φ_{1})}{1 - H (u; Φ_{1})} .

(7)
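As an illustration of Equations (6) and (7), the following minimal Python sketch computes the return level for a GPD tail. The function name and argument names are hypothetical and are not part of the evmix package or the GammaEP tool; the value H(u; Φ_{1}) is passed in directly, so any bulk distribution can be used. The branch for δ close to zero uses the exponential limit of the GPD, which lies outside the δ \neq 0 case stated in (1).

```python
import math

def gpd_return_level(t, u, sigma_u, delta, bulk_cdf_at_u):
    """Return level r_t of Eq. (6) for a mixture model with a GPD tail.

    t              -- return period (in units of the observation interval)
    bulk_cdf_at_u  -- H(u; Phi_1), the bulk cdf evaluated at the threshold
    """
    # Eq. (7): rescale the target probability 1 - 1/t to the tail component.
    p_prime = (1.0 - 1.0 / t - bulk_cdf_at_u) / (1.0 - bulk_cdf_at_u)
    if not 0.0 <= p_prime < 1.0:
        raise ValueError("1 - 1/t must fall in the tail, i.e. be at least H(u)")
    if abs(delta) < 1e-12:
        # Assumption: exponential limit of the GPD as delta -> 0.
        return u - sigma_u * math.log(1.0 - p_prime)
    # Eq. (6): GPD quantile at p_prime, shifted by the threshold.
    return u + sigma_u / delta * ((1.0 - p_prime) ** (-delta) - 1.0)
```

Plugging the returned value back into the mixture cdf (5) should recover the probability 1 - 1/t, which is a quick consistency check for any parameter choice.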

Many mixture EVT models have been defined using different G and H functions. Examples include the Weibull-GPD model by [15], the gamma-GPD model by [14], the kernel density estimator with GPD by [16], the normal-GPD model by [26,27], and the lognormal-GPD model by [28]. In this study, we develop a more flexible approach than the existing models by considering the EP distribution for the tail. We use three distributions for the bulk part of the model: gamma, Weibull, and log-normal.

3. Mixture Models Based on the Exponentiated Pareto

The EP distribution was proposed by [29] and its parameter estimation was discussed by [30]. The cdf of the EP is

G (y; α, θ) = {[1 - {(1 + y)}^{- α}]}^{θ},

(8)

where y > 0 and α, θ > 0 are both shape parameters. The corresponding pdf is

g (y; α, θ) = α θ {[1 - {(1 + y)}^{- α}]}^{θ - 1} {(1 + y)}^{- α - 1} .

(9)

The EP distribution reduces to the Pareto distribution when θ = 1. We also investigate the relation between the EP and GP distributions. Let X = α Y, where Y \sim EP (α, θ). The pdf of X is

g (x; α, θ) = θ {[1 - {(1 + \frac{x}{α})}^{- α}]}^{θ - 1} {(1 + \frac{x}{α})}^{- α - 1} .

(10)

The pdf in (10) is that of the exponentiated GP distribution with fixed scale parameter σ = 1. When θ = 1, we obtain the GP distribution with δ = 1 / α and σ = 1. Now, we introduce the location-EP distribution. Let X = Y + u; then, the resulting distribution is

g (x; α, θ, u) = α θ {[1 - {(1 + (x - u))}^{- α}]}^{θ - 1} {(1 + (x - u))}^{- α - 1},

(11)

where u is the location parameter and x > u. The pdf in (11) is called the location-EP distribution. The cdf of the location-EP is

G (x; α, θ, u) = {[1 - {(1 + (x - u))}^{- α}]}^{θ} .

(12)
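The location-EP functions in (11) and (12) translate directly into code. The sketch below is a minimal Python illustration with hypothetical function names; the quantile function is obtained by inverting (12) and is not stated explicitly at this point of the text.

```python
import math

def ep_cdf(x, alpha, theta, u=0.0):
    """Location-EP cdf, Eq. (12); u = 0 gives the plain EP cdf of Eq. (8)."""
    if x <= u:
        return 0.0
    return (1.0 - (1.0 + (x - u)) ** (-alpha)) ** theta

def ep_pdf(x, alpha, theta, u=0.0):
    """Location-EP pdf, Eq. (11); u = 0 gives the plain EP pdf of Eq. (9)."""
    if x <= u:
        return 0.0
    z = 1.0 + (x - u)
    return alpha * theta * (1.0 - z ** (-alpha)) ** (theta - 1.0) * z ** (-alpha - 1.0)

def ep_quantile(p, alpha, theta, u=0.0):
    """Inverse of Eq. (12) for 0 <= p < 1."""
    return u + (1.0 - p ** (1.0 / theta)) ** (-1.0 / alpha) - 1.0
```

With θ = 1 the cdf becomes 1 - (1 + y)^{-α}, which matches the reduction to the Pareto distribution noted above.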

The gamma, Weibull, and lognormal distributions are used as bulk densities. In the rest of the section, the gamma-EP, Weibull-EP, and log-normal-EP models are described, respectively.

3.1. Gamma-EP

First, we describe the gamma-EP model that uses the gamma distribution for bulk density and EP distribution for the tail density. The pdf and cdf of the gamma distribution are

h_{gamma} (x) = \frac{1}{Γ (a) b^{a}} x^{a - 1} exp (- \frac{x}{b}),

(13)

H_{gamma} (x) = \frac{1}{Γ (a)} γ (a, \frac{x}{b}),

(14)

where a, b > 0 and x > 0. Using (4), the pdf of the gamma-EP model is

f (x; a, b, α, θ) = \{\begin{matrix} \frac{1}{Γ (a) b^{a}} x^{a - 1} exp (- \frac{x}{b}), & x \leq u, \\ (1 - \frac{1}{Γ (a)} γ (a, \frac{u}{b})) α θ {[1 - {(1 + (x - u))}^{- α}]}^{θ - 1} {(1 + (x - u))}^{- α - 1}, & x > u, \end{matrix}

(15)

where α, θ > 0. The pdf shapes of the gamma-EP distribution for u = 3 and u = 5 are displayed in Figure 1.

3.2. Weibull-EP

In the Weibull-EP model, the bulk of the data is modeled with the Weibull distribution, and the tail that contains the extreme observations is modeled by the EP distribution. The pdf and cdf of the Weibull distribution are, respectively,

h_{Weibull} (x) = \frac{k}{λ} {(\frac{x}{λ})}^{k - 1} exp (- {(\frac{x}{λ})}^{k}),

(16)

H_{Weibull} (x) = 1 - exp (- {(\frac{x}{λ})}^{k}),

(17)

where k, λ > 0 and x > 0. The pdf of the Weibull-EP model is

\begin{matrix} f (x; k, λ, α, θ) = \{\begin{matrix} \frac{k}{λ} {(\frac{x}{λ})}^{k - 1} exp (- {(\frac{x}{λ})}^{k}), & x \leq u, \\ exp (- {(\frac{u}{λ})}^{k}) α θ {[1 - {(1 + (x - u))}^{- α}]}^{θ - 1} {(1 + (x - u))}^{- α - 1}, & x > u, \end{matrix} \end{matrix}

(18)

where α, θ > 0. The pdf shapes of the Weibull-EP model for u = 2 and u = 7 are displayed in Figure 2.
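The piecewise structure of the Weibull-EP model can be sketched in a few lines of Python. The function names below are hypothetical; the pdf follows (18), and the cdf follows the general form (5) with the Weibull cdf (17) as H and the location-EP cdf (12) as G.

```python
import math

def weibull_ep_pdf(x, k, lam, alpha, theta, u):
    """Weibull-EP pdf: Weibull density up to u, rescaled EP density above u."""
    if x <= 0.0:
        return 0.0
    if x <= u:
        return (k / lam) * (x / lam) ** (k - 1.0) * math.exp(-((x / lam) ** k))
    tail_weight = math.exp(-((u / lam) ** k))  # 1 - H(u) for the Weibull bulk
    z = 1.0 + (x - u)
    return (tail_weight * alpha * theta
            * (1.0 - z ** (-alpha)) ** (theta - 1.0) * z ** (-alpha - 1.0))

def weibull_ep_cdf(x, k, lam, alpha, theta, u):
    """Weibull-EP cdf built from the general mixture form in Eq. (5)."""
    if x <= 0.0:
        return 0.0
    if x <= u:
        return 1.0 - math.exp(-((x / lam) ** k))
    bulk_at_u = 1.0 - math.exp(-((u / lam) ** k))
    z = 1.0 + (x - u)
    return bulk_at_u + (1.0 - bulk_at_u) * (1.0 - z ** (-alpha)) ** theta
```

The cdf is continuous at x = u by construction, whereas the pdf generally is not unless the parameters satisfy an additional constraint.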

3.3. Log-Normal-EP

In the third model, the bulk density is modeled by the log-normal and the extreme observations are modeled by the EP distribution. The pdf and cdf of the log-normal distribution are

h_{log - normal} (x) = \frac{1}{x σ \sqrt{2 π}} exp (- \frac{{(ln x - μ)}^{2}}{2 σ^{2}}),

(19)

H_{log - normal} (x) = Φ (\frac{ln x - μ}{σ}),

(20)

where μ \in ℜ and σ > 0. The pdf of the log-normal-EP model is

\begin{matrix} f (x; μ, σ, α, θ) = \{\begin{matrix} \frac{1}{x σ \sqrt{2 π}} exp (- \frac{{(ln x - μ)}^{2}}{2 σ^{2}}), & x \leq u \\ (1 - Φ (\frac{ln u - μ}{σ})) α θ {[1 - {(1 + (x - u))}^{- α}]}^{θ - 1} {(1 + (x - u))}^{- α - 1}, & x > u \end{matrix} \end{matrix}

(21)

where α, θ > 0. The pdf shapes of the log-normal-EP distribution for u = 3 are displayed in Figure 3.

The EP-based EVT models can handle right-skewed, bi-modal, and fat-tailed datasets. The model to be preferred for bulk density varies depending on the data structure.

3.4. Return Level Estimation

As in the GPD model, the return level estimation in EP-based mixture EVT models is performed by the quantile function of the EP distribution. The return level is calculated by

r_{t} = u + {(1 - {(p^{'})}^{\frac{1}{θ}})}^{- \frac{1}{α}} - 1,

(22)

where p^{'} is defined in (7).
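Equation (22) can be evaluated as in the following minimal Python sketch. The function and argument names are hypothetical; the bulk cdf at the threshold, H(u; Φ_{1}), is supplied by the caller so that the same function serves the gamma, Weibull, and log-normal bulk models.

```python
def ep_return_level(t, u, alpha, theta, bulk_cdf_at_u):
    """Return level r_t of Eq. (22) for a mixture model with an EP tail."""
    # Eq. (7): probability level within the tail component.
    p_prime = (1.0 - 1.0 / t - bulk_cdf_at_u) / (1.0 - bulk_cdf_at_u)
    if not 0.0 <= p_prime < 1.0:
        raise ValueError("1 - 1/t must fall in the tail, i.e. be at least H(u)")
    # Eq. (22): location-EP quantile at p_prime.
    return u + (1.0 - p_prime ** (1.0 / theta)) ** (-1.0 / alpha) - 1.0
```

For example, with monthly maxima as observations, t = 100 corresponds to the level exceeded on average once every 100 months.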

4. Estimation

The MLE method is used to estimate the unknown parameters of the gamma-EP, Weibull-EP, and log-normal-EP models. The general form of the likelihood function for mixture EVT models is

L (Φ_{1}, Φ_{2}) = \prod_{x_{i} \leq u} h (x_{i}; Φ_{1}) \times \prod_{x_{i} > u} (1 - H (u; Φ_{1})) g (x_{i}; Φ_{2}) .

(23)

Using (23), the log-likelihood functions of the gamma-EP, Weibull-EP, and log-normal-EP models are, respectively

\begin{matrix} ℓ (a, b, α, θ, u) & = & - n_{u^{-}} (log Γ (a) + a log b) + (a - 1) \sum_{x_{i} \leq u} log x_{i} - \sum_{x_{i} \leq u} \frac{x_{i}}{b} \\ + & n_{u^{+}} (log (1 - \frac{1}{Γ (a)} γ (a, \frac{u}{b})) + log (α θ)) \\ + & \sum_{x_{i} > u} log ({[1 - {(1 + (x_{i} - u))}^{- α}]}^{θ - 1} {(1 + (x_{i} - u))}^{- α - 1}) \end{matrix}

(24)

\begin{matrix} ℓ (k, λ, α, θ, u) & = & n_{u^{-}} log (\frac{k}{λ}) + (k - 1) \sum_{x_{i} \leq u} log (\frac{x_{i}}{λ}) - \sum_{x_{i} \leq u} {(\frac{x_{i}}{λ})}^{k} \\ + & n_{u^{+}} (- {(\frac{u}{λ})}^{k} + log (α θ)) + (θ - 1) \sum_{x_{i} > u} log ([1 - {(1 + (x_{i} - u))}^{- α}]) \\ - & (α + 1) \sum_{x_{i} > u} log (1 + (x_{i} - u)) \end{matrix}

(25)

\begin{matrix} ℓ (μ, σ, α, θ, u) & = & - n_{u^{-}} log (σ \sqrt{2 π}) - \sum_{x_{i} \leq u} log (x_{i}) - \sum_{x_{i} \leq u} \frac{{(ln x_{i} - μ)}^{2}}{2 σ^{2}} \\ + & n_{u^{+}} [log (1 - Φ (\frac{ln u - μ}{σ})) + log (α θ)] \\ + & (θ - 1) \sum_{x_{i} > u} log (1 - {(1 + (x_{i} - u))}^{- α}) \\ - & (α + 1) \sum_{x_{i} > u} log (1 + (x_{i} - u)) \end{matrix}

(26)

where n_{u^{-}} is the number of observations at or below the threshold and n_{u^{+}} is the number of observations above the threshold. There is no explicit solution for the maximum likelihood estimators, so Equations (24)–(26) are maximized numerically using the Nelder–Mead algorithm that is available in the optim function of the R software, version 4.3.1. The asymptotic standard errors of the estimated parameters are obtained using the observed information matrix.
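To make the structure of these log-likelihoods concrete, the sketch below evaluates the Weibull-EP log-likelihood term by term for a fixed threshold. It is a Python illustration with a hypothetical function name, not the R code used in the study; the Weibull bulk is chosen only because its cdf has a closed form. A numerical optimizer would maximize this function over (k, λ, α, θ).

```python
import math

def weibull_ep_loglik(data, k, lam, alpha, theta, u):
    """Weibull-EP log-likelihood for a fixed threshold u (data must be > 0)."""
    if min(k, lam, alpha, theta) <= 0.0:
        return -math.inf  # outside the parameter domain
    bulk = [x for x in data if x <= u]
    tail = [x for x in data if x > u]
    # Bulk part: sum of log Weibull densities over observations x_i <= u.
    ll = len(bulk) * math.log(k / lam)
    ll += (k - 1.0) * sum(math.log(x / lam) for x in bulk)
    ll -= sum((x / lam) ** k for x in bulk)
    # Tail part: log(1 - H(u)) = -(u / lam)^k plus the log EP density.
    ll += len(tail) * (-((u / lam) ** k) + math.log(alpha * theta))
    ll += (theta - 1.0) * sum(
        math.log(1.0 - (1.0 + (x - u)) ** (-alpha)) for x in tail)
    ll -= (alpha + 1.0) * sum(math.log(1.0 + (x - u)) for x in tail)
    return ll
```

Returning negative infinity for invalid parameters is one simple way to keep an unconstrained optimizer inside the parameter domain; it is a design choice of this sketch, not a description of how constrOptim handles constraints.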

To prove the existence of the MLEs, we need to define the continuity constraint for the model given in (4). The continuity constraint, for x = u, is defined as

g (u; Φ_{2}) = \frac{h (u, Φ_{1})}{1 - H (u, Φ_{1})} .

(27)

Let Φ = (Φ_{1}, Φ_{2}) be the parameter vector. If the log-likelihood function ℓ is continuous on Φ, the MLEs exist. It is clear that under the continuity constraint, the likelihood is continuous.

As mentioned earlier, the parameter estimates of the EP-based models are obtained by maximizing the log-likelihood functions using an optim/constrOptim function. The most important issue here is the determination of the initial parameter vector that is essential for the used optimization algorithm. The evmix package is used to determine the initial parameter vectors. The evmix package offers comprehensive content for the GPD-based mixture EVT models. The functions fgammagpd, fweibullgpd, and flognormgpd within the evmix package are used to obtain the initial parameter vectors for the gamma-EP, Weibull-EP, and log-normal-EP models, respectively.

4.1. Threshold Selection

In existing mixture EVT models, the threshold value u is treated as a model parameter and is estimated together with the other parameters of the model. The optimization algorithm aims to find the parameter vector that maximizes the likelihood function. However, in some cases the algorithm gets stuck at a local maximum, and the estimated model then explains the data poorly. Moreover, treating the threshold as a parameter enlarges the parameter space and makes the optimization process more difficult. Therefore, in this study the threshold is estimated separately from the other model parameters. During the application, the following problems occurred:

Problem 1: When the sample size is insufficient, the optim function obtains parameter estimates outside of the parameter domains.
Solution: The constrOptim function is used, and the parameter domains are defined as constraints in the optimization step.
Problem 2: In small samples, the constrOptim function, like the optim function, can get stuck at local optima.
Solution: To avoid this situation, two different threshold-selection methods are used together.

The first approach selects the threshold by maximizing the log-likelihood over a set of predefined candidate thresholds: the value of u that gives the highest log-likelihood is chosen as the optimal threshold. However, as mentioned previously, the optimization algorithm may get stuck at a local optimum, so the resulting model may not be statistically valid, especially for small samples. The second approach therefore uses the p-value obtained from the KS test. In the log-likelihood approach, the aim is to maximize the log-likelihood value, while in the KS approach, the aim is to minimize the KS test statistic or, equivalently, to maximize the p-value. A threshold selected by the KS approach gives the model that is most consistent with the data under the KS test, whereas the log-likelihood approach provides no such check on statistical validity. It is therefore important to use both approaches together. For each candidate value of u, the model parameters, the log-likelihood value, and the KS p-value are obtained. The efficiencies of both methods are analyzed and compared via a simulation study.
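The two selection rules can be sketched as a simple grid search. The Python code below is an illustration under stated assumptions: fit_model is a hypothetical user-supplied callback that fits the model for a fixed threshold and returns the maximized log-likelihood together with the fitted cdf, and the KS statistic is minimized directly, which for a fixed sample size is equivalent to maximizing the KS p-value.

```python
def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic of data against a model cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Largest gap between the model cdf and the empirical cdf steps.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def select_threshold(data, candidates, fit_model):
    """Scan candidate thresholds and report the choice under both criteria.

    fit_model(data, u) -> (loglik, cdf) is a hypothetical callback.
    """
    results = []
    for u in candidates:
        loglik, cdf = fit_model(data, u)
        results.append((u, loglik, ks_statistic(data, cdf)))
    best_by_loglik = max(results, key=lambda r: r[1])[0]
    best_by_ks = min(results, key=lambda r: r[2])[0]
    return best_by_loglik, best_by_ks, results
```

When the two criteria return clearly different thresholds, the full results list can be inspected to see how the log-likelihood and the KS statistic vary across the candidates.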

4.2. Simulation

In this section, the parameters of the gamma-EP distribution are estimated via the MLE approach. The estimated biases and mean squared errors (MSEs) are calculated to evaluate the performance of the MLE approach in estimating the unknown parameters of the gamma-EP model. The inverse transform method is used to generate random variables from the gamma-EP model. The simulation is repeated 1000 times. The sample sizes are selected as n = 50, 100, 200, and 500. The true parameter values are a = 2, b = 2, α = 0.5, and θ = 0.9. The threshold value u is determined using two approaches: maximization of the log-likelihood function and maximization of the KS p-value.
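The inverse transform method for a mixture EVT model can be sketched as follows. This Python illustration uses a Weibull bulk instead of the gamma bulk of the simulation study, purely because the Weibull quantile function has a closed form and keeps the example free of external libraries; the function names and the seed argument are hypothetical.

```python
import math
import random

def weibull_ep_quantile(p, k, lam, alpha, theta, u):
    """Quantile function of the Weibull-EP mixture for 0 <= p < 1."""
    bulk_at_u = 1.0 - math.exp(-((u / lam) ** k))  # H(u)
    if p <= bulk_at_u:
        # Bulk region: invert the Weibull cdf.
        return lam * (-math.log(1.0 - p)) ** (1.0 / k)
    # Tail region: rescale p to the tail component, then invert the EP cdf.
    p_tail = (p - bulk_at_u) / (1.0 - bulk_at_u)
    return u + (1.0 - p_tail ** (1.0 / theta)) ** (-1.0 / alpha) - 1.0

def sample_weibull_ep(n, k, lam, alpha, theta, u, seed=None):
    """Draw n values by applying the quantile function to uniform variates."""
    rng = random.Random(seed)
    return [weibull_ep_quantile(rng.random(), k, lam, alpha, theta, u)
            for _ in range(n)]
```

In a simulation study of this kind, each generated sample would be refitted and the estimates compared with the true parameter values to obtain the bias and MSE.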

The results are given in Table 1. Under the likelihood approach, the MSE values obtained for the second shape parameter θ of the EP distribution are very high, especially for small samples. Under the KS p-value approach, the bias and MSE values are close to zero for both small and large samples and rapidly approach zero as the sample size increases. Therefore, the KS p-value approach is more appropriate for parameter estimation of the gamma-EP model, especially for small samples. The analyst should interpret the results according to both approaches and prefer the appropriate method. Similar results can be obtained for the Weibull-EP and log-normal-EP models.

5. Application

5.1. Data

In the application section, earthquake magnitudes in Istanbul, Turkey, are discussed. The latitude range for Istanbul is 34.604–44.896 and the longitude range is 22.0894–42.524. The data are obtained from https://deprem.afad.gov.tr/event-catalog (accessed on 18 June 2025). The earthquakes that occurred between the dates of 1 January 1900 and 17 June 2025 are taken into consideration. Monthly maximum earthquake magnitudes are extracted from the underlying data. The monthly maximum earthquakes are given in Figure 4. The number of earthquakes that are analyzed is 253. Monthly maximum earthquake magnitudes are used to eliminate the autocorrelation problem in the earthquake data and to obtain an independent and identically distributed earthquake data set.

When examining the QQ and PP plots given in Figure 4, it is seen that the right tail probabilities cannot be expressed with a normal distribution and that a distribution with a fat tail is needed. Note that the given latitude and longitude values do not indicate the exact geographical boundaries of Istanbul province. The used values are defined in the AFAD system for the Istanbul province. Therefore, some earthquakes occur outside of the geographical boundaries of Istanbul, particularly in the Black Sea region.

5.2. Comparison of the Models

The dataset is analyzed with six mixture EVT models: the gamma-GPD, Weibull-GPD, log-normal-GPD, gamma-EP, Weibull-EP, and log-normal-EP models. The estimated parameters and corresponding standard errors are listed in Table 2. The threshold value is estimated as 3.383 for the gamma-GPD, Weibull-GPD, and log-normal-GPD models, as 2.928 for the gamma-EP and log-normal-EP models, and as 3.528 for the Weibull-EP model.

Two model-selection criteria, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are used to select the best model for the data. The KS test is also applied using the estimated parameters of the models. The model with the lowest AIC and BIC values and the highest KS p-value is regarded as the best model for the data. Table 3 shows the AIC, BIC, and KS p-values of the fitted mixture EVT models. The gamma-EP model has the lowest values of the AIC and BIC and also has the highest p-value for the KS test. Therefore, the gamma-EP model is selected as the best model for the data. According to the KS results, all models are statistically valid in explaining the data. However, the fundamental difference between the models lies in their ability to represent the empirical probabilities located in the right tail of the distribution.
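For reference, both criteria are simple functions of the maximized log-likelihood. The sketch below uses the standard definitions; the function names are hypothetical, and whether the separately selected threshold is counted among the parameters is left to the caller, since the text does not state the convention used for Table 3.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k * log(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2.0 * loglik
```

For the application, n_obs would be 253, the number of monthly maxima; the log-likelihood itself must come from the fitted model.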

The estimated pdfs of the mixture EVT models with the histogram of the data are displayed in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10. Also, the PP plots are given for each model. These figures show that all models are statistically valid for representing the data. When the PP plot is examined in detail, it can be seen that GPD-based EVT models are partially unsuccessful in both tail modeling and bulk density modeling. The EP-based EVT models provide quite successful results in both bulk density and tail modeling. The reason EP-based models achieve greater success compared to GPD-based models lies in the fact that the EP distribution has a more flexible structure than GPD. EVT models formulated under the EP distribution demonstrate superior performance in modeling data characterized by high skewness/kurtosis and fat tails.

Return level plots with simulated confidence intervals for each fitted model are displayed in Figure 11, Figure 12, Figure 13, Figure 14, Figure 15 and Figure 16. From these figures, it is clear that the estimated return levels of the gamma-EP model are closer to the empirical return levels than those of the other fitted models. It can be seen that the theoretical return levels obtained with GPD-based models are quite far away from the empirical values.

The threshold-selection process of the gamma-EP, Weibull-EP, and log-normal-EP models is performed with maximization of the log-likelihood and KS p-value. Figure 17, Figure 18 and Figure 19 show the threshold-selection process of these models. The threshold values for gamma-EP and log-normal-EP models are very close to each other in both likelihood and KS p-value approaches. However, the p-value approach suggests a lower threshold value than the likelihood approach in the Weibull-EP model.

Table 4 shows selected return levels estimated by the gamma-EP model. According to Table 4, the return levels for return periods of 50, 100, and 1000 months are interpreted as follows:

1. The monthly maximum earthquake magnitude is expected to exceed 4.375 once every 50 months, on average.
2. The monthly maximum earthquake magnitude is expected to exceed 4.897 once every 100 months, on average.
3. The monthly maximum earthquake magnitude is expected to exceed 7.574 once every 1000 months, on average.
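A return period of t months means that the corresponding level is exceeded in any given month with probability 1/t. Under the assumption that the monthly maxima are independent and identically distributed, the chance of at least one exceedance over a planning horizon of m months follows directly, as in this minimal Python sketch (the function name is hypothetical):

```python
def prob_at_least_one_exceedance(t, m):
    """P(at least one exceedance of the t-month return level in m months),
    assuming independent, identically distributed monthly maxima."""
    return 1.0 - (1.0 - 1.0 / t) ** m
```

For instance, calling the function with t = 100 and m = 120 gives the probability that the 100-month return level is exceeded at least once within ten years.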

6. GammaEP Software

The GammaEP tool, available at https://smartstat.shinyapps.io/GammaEP/, is designed to model and analyze earthquake magnitudes using a gamma-EP mixture model. It allows users to perform threshold selection and parameter estimation and to generate various diagnostic plots, all through an interactive interface. The GammaEP tool contains five panels, which are summarized below.

6.1. Upload Data

The users upload a CSV file that contains the earthquake dataset. The uploaded file should contain information for the date, magnitude, latitude, and longitude. The dates are parsed into monthly intervals, and the maximum magnitude per month is retained (see Figure 20).
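The monthly-maximum step can be illustrated with a short Python sketch. This is not the code of the GammaEP tool: the column names date and magnitude, the date format, and the sample rows are assumptions made for the example.

```python
import csv
import io
from datetime import datetime

def monthly_maxima(rows, date_key="date", mag_key="magnitude",
                   date_format="%Y-%m-%d"):
    """Map (year, month) to the largest magnitude observed in that month."""
    maxima = {}
    for row in rows:
        when = datetime.strptime(row[date_key], date_format)
        month = (when.year, when.month)
        magnitude = float(row[mag_key])
        if month not in maxima or magnitude > maxima[month]:
            maxima[month] = magnitude
    return dict(sorted(maxima.items()))

# Hypothetical sample data with the assumed column names.
sample_csv = """date,magnitude,latitude,longitude
2024-01-03,3.2,40.9,28.8
2024-01-19,4.1,40.7,29.1
2024-02-07,3.0,41.0,28.6
"""
maxima_by_month = monthly_maxima(csv.DictReader(io.StringIO(sample_csv)))
```

Months without any recorded event simply do not appear in the result, so the series of maxima can be shorter than the number of calendar months in the study period.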

6.2. Map View

The leaflet maps of monthly maximum earthquakes are displayed and earthquake markers are sized based on the magnitude (see Figure 21).

6.3. Threshold Selection

The users choose a threshold-selection method, either the log-likelihood or the KS p-value approach, and find the threshold that best fits the empirical data. The results are visualized with a log-likelihood (LL) plot and a KS p-value plot, in which the optimal threshold is marked with a blue dashed line (see Figure 22).

6.4. Parameter Estimates

Once the best threshold is selected, the tool estimates and reports the model parameters, standard errors, threshold, log-likelihood, and KS p-value (see Figure 23).

6.5. Plots

The return level, PP, and histogram plots of the gamma-EP model using the estimated parameters that are obtained with the selected threshold-estimation method are displayed (see Figure 24).

7. Conclusions and Future Work

In this study, a new approach for the prediction of earthquake return levels is developed. The existing approaches and the proposed models are compared with an application study on earthquake data. The EP-based mixture EVT models are shown to be more effective than the GPD-based models for the analyzed data. In addition, software support for the proposed model is provided, so that users can calculate earthquake return levels under the gamma-EP model with their own data. The main limitation of the presented study is related to the selection of appropriate initial values for the model parameters used in the optimization procedure. There is no direct way to determine the initial values. However, the estimated parameter values of the GPD-based model can be used as the initial values for EP-based models.

The emergency-management planners can use the GammaEP tool to perform the return-level analyses to predict the probability of extreme earthquake events over a given time horizon, and support the risk assessments and resource allocation. Policy makers can implement seismic safety regulations and infrastructure investments. Scientists and other researchers can perform extreme value analysis by easily adapting to different geophysical and environmental datasets.

As future work, the GammaEP tool will be updated to include Weibull-EP and log-normal-EP models. Also, a change-point approach will be adopted for EP-based EVT models to capture extreme regime shifts that are common in financial return series. EP-based EVT models can be used not only for modeling earthquake data, but also for extreme rainfall, wind speed, and financial risk predictions. A more comprehensive application will be developed to enable the analysis of different datasets in order to facilitate and promote the use of the model in different areas.

Author Contributions

Conceptualization, E.A., H.N.A. and K.S.; Methodology, E.A., H.N.A. and K.S.; Software, E.A.; Validation, E.A.; Writing—original draft, E.A., H.N.A. and K.S.; Writing—review & editing, E.A., H.N.A. and K.S.; Visualization, E.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available in https://deprem.afad.gov.tr/event-catalog (accessed on 18 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Pisarenko, V.F.; Sornette, D. Characterization of the frequency of extreme earthquake events by the generalized Pareto distribution. Pure Appl. Geophys. 2003, 160, 2343–2364.
Zhang, M.; Pan, H. Application of generalized Pareto distribution for modeling aleatory variability of ground motion. Nat. Hazards 2021, 108, 2971–2989.
Qian, X.; Wang, F.; Sheng, S. Characterization of tail distribution of earthquake magnitudes via generalized Pareto distribution. Acta Seismol. Sin. 2013, 35, 341–350.
Ma, N.; Bai, Y.; Meng, S. Return period evaluation of the largest possible earthquake magnitudes in mainland China based on extreme value theory. Sensors 2021, 21, 3519.
Yegulalp, T.M.; Kuo, J.T. Statistical prediction of the occurrence of maximum magnitude earthquakes. Bull. Seismol. Soc. Am. 1974, 64, 393–414.
Fedotenkov, I. A review of more than one hundred Pareto-tail index estimators. Statistica 2020, 80, 245–299.
Cheng, T.; Peng, X.; Choiruddin, A.; He, X.; Chen, K. Environmental extreme risk modeling via sub-sampling block maxima. arXiv 2025, arXiv:2506.14556.
Walshaw, D.; Anderson, C.W. A model for extreme wind gusts. J. R. Stat. Soc. Ser. C Appl. Stat. 2000, 49, 499–508.
Embrechts, P.; Klüppelberg, C.; Mikosch, T. Modelling Extremal Events: For Insurance and Finance; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 33.
McNeil, A.J.; Frey, R. Estimation of tail-related risk measures for heteroscedastic financial time series: An extreme value approach. J. Empir. Financ. 2000, 7, 271–300.
Pickands, J. Statistical inference using extreme order statistics. Ann. Stat. 1975, 3, 119–131.
Tekin, S.; Altun, E.; Can, T. A new statistical model for extreme rainfall: POT-KumGP. Earth Sci. Inform. 2021, 14, 765–775.
Collier, A.J. Extreme Value Analysis of Non-Stationary Processes: A Study of Extreme Rainfall Under Changing Climate. Doctoral Dissertation, Newcastle University, Tyne, UK, 2011.
Behrens, C.; Lopes, H.F.; Gamerman, D. Bayesian analysis of extreme events with threshold estimation. J. R. Stat. Soc. Ser. C Appl. Stat. 2004, 53, 61–76.
Frigessi, A.; Haug, O.; Rue, H. A dynamic mixture model for unsupervised tail estimation without threshold selection. Extremes 2002, 5, 219–235.
MacDonald, A.; Scarrott, C.J.; Lee, D.; Darlow, B.; Reale, M.; Russell, G. A flexible extreme value mixture model. Comput. Stat. Data Anal. 2011, 55, 2137–2157.
Nascimento, F.F.; Gamerman, D.; Lopes, H.F. A semiparametric Bayesian approach to extreme value estimation. Stat. Comput. 2012, 22, 661–675.
Boldi, M.O.; Davison, A.C. A mixture model for multivariate extremes. J. R. Stat. Soc. Ser. B Stat. Methodol. 2007, 69, 217–229.
Hu, Y.; Scarrott, C. evmix: An R package for extreme value mixture modeling, threshold estimation and boundary corrected kernel density estimation. J. Stat. Softw. 2018, 84, 1–27.
Kiran, K.G.; Srinivas, V.V. A Mahalanobis distance-based automatic threshold selection method for peaks over threshold model. Water Resour. Res. 2021, 57, e2020WR027534.
Majid, M.H.A.; Ibrahim, K. Composite Pareto distributions for modelling household income distribution in Malaysia. Sains Malays. 2021, 50, 2047–2058.
Sigauke, C.; Ravele, T.; Jhamba, L. Extremal Dependence Modelling of Global Horizontal Irradiance with Temperature and Humidity: An Application Using South African Data. Energies 2022, 15, 5965.
Balkema, A.A.; De Haan, L. Residual life time at great age. Ann. Probab. 1974, 2, 792–804.
Scarrott, C.; MacDonald, A. A review of extreme value threshold estimation and uncertainty quantification. REVSTAT-Stat. J. 2012, 10, 33–60.
Li, Y.; Tang, N.; Jiang, X. Bayesian approaches for analyzing earthquake catastrophic risk. Insur. Math. Econ. 2016, 68, 110–119.
Cabras, S.; Castellanos, M.E. A Bayesian approach for estimating extreme quantiles under a semiparametric mixture model. ASTIN Bull. J. IAA 2011, 41, 87–106.
Carreau, J.; Bengio, Y. A hybrid Pareto model for asymmetric fat-tailed data: The univariate case. Extremes 2009, 12, 53–76.
Solari, S.; Losada, M.A. A unified statistical model for hydrological variables including the selection of threshold for the peak over threshold method. Water Resour. Res. 2012, 48.
Gupta, R.C.; Gupta, P.L.; Gupta, R.D. Modeling failure time data by Lehman alternatives. Commun. Stat.-Theory Methods 1998, 27, 887–904.
Afify, W.M. On estimation of the exponentiated Pareto distribution under different sample schemes. Stat. Methodol. 2010, 7, 77–83.

Figure 1. The pdfs of the gamma-EP model.

Figure 2. The pdfs of the Weibull-EP model.

Figure 3. The pdfs of the log-normal-EP model.

Figure 4. Overview plots for the data.

Figure 5. Fitted density of the gamma-GPD.

Figure 6. Fitted density of the Weibull-GPD.

Figure 7. Fitted density of the log-normal-GPD.

Figure 8. Fitted density of the gamma-EP.

Figure 9. Fitted density of the Weibull-EP.

Figure 10. Fitted density of the log-normal-EP.

Figure 11. Return level plot for gamma-GPD.

Figure 12. Return level plot for Weibull-GPD.

Figure 13. Return level plot for log-normal-GPD.

Figure 14. Return level plot for gamma-EP.

Figure 15. Return level plot for Weibull-EP.

Figure 16. Return level plot for log-normal-EP.

Figure 17. Threshold selection for gamma-EP model.

Figure 18. Threshold selection for Weibull-EP model.

Figure 19. Threshold selection for log-normal-EP model.

Figure 20. Upload data.

Figure 21. Map.

Figure 22. Threshold-selection methods.

Figure 23. Parameter estimates.

Figure 24. Diagnostic plots.

Table 1. Simulation results of the gamma-EP model.

Sample Size	Metric	Likelihood $a$	Likelihood $b$	Likelihood $α$	Likelihood $θ$	KS $a$	KS $b$	KS $α$	KS $θ$
50	Bias	0.318	−0.056	0.411	5.961	0.282	0.183	0.355	1.120
50	MSE	0.565	0.644	0.418	722.488	0.669	1.526	0.288	4.632
100	Bias	0.082	0.016	0.258	4.742	0.063	0.085	0.224	0.906
100	MSE	0.168	0.279	0.254	516.347	3.384	0.168	0.437	3.558
200	Bias	0.073	−0.018	0.089	0.899	0.051	0.021	0.105	0.395
200	MSE	0.069	0.087	0.039	2.201	0.063	0.093	0.048	0.323
500	Bias	−0.008	−0.002	0.025	0.421	−0.006	−0.003	0.072	0.274
500	MSE	0.015	0.028	0.007	1.736	0.016	0.030	0.019	0.168
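For readers who want to reproduce summaries of the kind reported in Table 1, the sketch below shows the usual definitions of bias and mean squared error (MSE) over simulation replicates: the bias is the average of the estimation errors, and the MSE is the average of the squared errors. It is not the authors' simulation code; the function name `bias_and_mse` and the toy estimates are hypothetical.

```python
from statistics import fmean


def bias_and_mse(estimates, true_value):
    """Return (bias, MSE) of a list of replicate estimates of one parameter."""
    errors = [est - true_value for est in estimates]
    bias = fmean(errors)                         # average estimation error
    mse = fmean([err ** 2 for err in errors])    # average squared error
    return bias, mse


# Toy replicate estimates of a parameter whose true value is 1.0 (made-up numbers).
replicates = [1.2, 0.9, 1.1, 1.0]
bias, mse = bias_and_mse(replicates, true_value=1.0)
print(f"bias = {bias:.3f}, MSE = {mse:.3f}")
```

Because the MSE equals the squared bias plus the variance of the estimates, a large MSE paired with a moderate bias points to high variability across replicates.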

Table 2. Parameter estimates for Istanbul earthquake data.

Models	Parameters
Gamma-GPD	17.260	0.156	0.949	−0.197	3.383
( $a, b, σ_{u}, δ, u$ )	1.698	0.016	0.266	0.188	-
Weibull-GPD	5.203	2.898	0.957	−0.205	3.383
( $k, λ, σ_{u}, δ, u$ )	0.289	0.038	0.262	0.182	-
Log-normal-GPD	0.960	0.251	0.938	−0.186	3.383
( $μ, σ, σ_{u}, δ, u$ )	0.016	0.012	0.265	0.193	-
Gamma-EP	13.263	0.210	3.587	1.271	2.928
( $a, b, α, θ, u$ )	1.574	0.027	0.448	0.176	-
Weibull-EP	5.022	2.914	4.060	4.261	3.528
( $k, λ, α, θ, u$ )	0.271	0.039	0.874	1.656	-
Log-normal-EP	0.997	0.296	3.587	1.271	2.928
( $μ, σ, α, θ, u$ )	0.021	0.018	0.448	0.176	-

Table 3. Model-selection criteria of the GPD- and EP-based mixture EVT models.

Models	$-ℓ$	AIC	BIC	KS statistic	KS p-value
Gamma-GPD	259.184	526.368	527.980	0.068	0.190
Weibull-GPD	253.507	515.014	516.626	0.049	0.575
Log-normal-GPD	263.508	535.017	536.629	0.081	0.073
Gamma-EP	253.034	514.067	515.680	0.034	0.935
Weibull-EP	254.252	516.504	518.117	0.039	0.834
Log-normal-EP	253.177	514.354	515.967	0.036	0.889
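The AIC column of Table 3 can be checked against the negative log-likelihoods with the standard definition $\mathrm{AIC} = 2k + 2(-ℓ)$. The sketch below does so under the assumption that $k = 4$ parameters are counted for every model; this value is inferred from the tabulated numbers and is not stated in this section. The BIC is not recomputed because the sample size is not given here. The function name `aic` is hypothetical.

```python
def aic(neg_loglik: float, n_params: int) -> float:
    """Akaike information criterion from the negative log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_loglik


# (negative log-likelihood, reported AIC) copied from Table 3.
table3 = {
    "Gamma-GPD": (259.184, 526.368),
    "Weibull-GPD": (253.507, 515.014),
    "Log-normal-GPD": (263.508, 535.017),
    "Gamma-EP": (253.034, 514.067),
    "Weibull-EP": (254.252, 516.504),
    "Log-normal-EP": (253.177, 514.354),
}

K = 4  # assumption: number of parameters counted per model, inferred from the table

for model, (nll, reported) in table3.items():
    computed = aic(nll, K)
    print(f"{model:15s} computed AIC = {computed:.3f}, reported AIC = {reported:.3f}")

# Model with the smallest reported AIC.
best = min(table3, key=lambda name: table3[name][1])
print("Lowest reported AIC:", best)
```

With this choice of $k$, the computed values agree with the reported ones up to rounding in the third decimal place.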

Table 4. Return levels for specified return periods.

Return Periods	2	5	10	20	30	40	50	100	1000
Return levels	2.717	3.194	3.479	3.816	4.051	4.237	4.375	4.897	7.574

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
