1. Introduction
Heavy-tailed distributions have been used to model data in several applied areas such as risk management, economics, and actuarial science. Insurance data sets are usually positive (Klugman et al. [1]), unimodal (Cooray and Ananda [2]), right-skewed (Lane [3]), and heavy-tailed (Ibragimov and Prokhorov [4]). Right-skewed data may be adequately modeled by skewed distributions (Bernardi et al. [5]). Modeling insurance data using heavy-tailed distributions is therefore of great interest to actuaries. Furthermore, actuaries and risk managers are often interested in “the chance of an adverse outcome”, which can be expressed through the value at risk (VaR) at a particular probability level. The VaR can also be used to determine the amount of capital required to withstand such adverse outcomes. Investors and rating agencies are particularly interested in the company’s ability to withstand such events.
Hence, several unimodal positively skewed distributions are utilized to model such data sets (Adcock et al. [6] and Bhati and Ravi [7]).
Heavy-tailed distributions have right-tail probabilities that are heavier than those of the exponential distribution; that is, for any baseline with cumulative distribution function (CDF) $F(x)$, we have
$$\lim_{x \to \infty} e^{t x}\,[1 - F(x)] = \infty \quad \text{for all } t > 0.$$
More information about heavy-tailed distributions can be found in Resnick [8] and Beirlant et al. [9].
Dutta and Perry [10] conducted an empirical analysis of loss distributions to estimate risk under several approaches. They pointed out that the exponential, gamma, and Weibull models cannot be used because of their poor results and stated that a more flexible model is needed. Several methods have been introduced to construct new distributions with heavier tails than the exponential distribution, namely the transformation method, compounding of distributions, composition of two or more models, and finite mixture distributions. The interested reader can refer to Eling [11], Kazemi and Noorizadeh [12], Bakar et al. [13], Punzo [14], Mazza and Punzo [15], Miljkovic and Grun [16], and Punzo et al. [17].
The aforementioned approaches may be very useful in constructing more flexible distributions; however, they still have some deficiencies. Inference under the transformation approach becomes difficult and requires much computational work to derive the distributional characteristics (see Bagnato and Punzo [18]), and the composition of two or more models based on fixed or pre-defined mixing weights may be very restrictive (see Calderin-Ojeda and Kwok [19]).
Hence, it is important to develop more flexible models, either from the existing classical distributions or from a new class of distributions, to model different insurance data including insurance loss data, unemployment insurance data, and financial returns, among others. In the present paper, we are motivated to propose a more flexible distribution, called the alpha power exponentiated exponential (APExE) distribution, which provides greater accuracy and flexibility in fitting actuarial data.
The aim of the paper is three-fold. First, we study a new extension of the exponential (E) and exponentiated exponential (ExE) distributions based on the alpha power-G (AP-G) family proposed by Mahdavi and Kundu [20], called the APExE distribution. Some distributional properties of the APExE distribution are derived. The proposed model has the following desirable properties:
The APExE model contains, as special cases, some lifetime distributions, namely the E, ExE [21], and alpha power exponential (APE) [20] distributions.
The APExE distribution accommodates upside-down bathtub, bathtub, decreasing, decreasing-constant, and increasing hazard rates, and right-skewed, symmetrical, left-skewed, J-shaped, reversed-J shaped, and unimodal densities (see Figure 1 and Figure 2).
It provides a heavier-tailed distribution than the APE, ExE, and E distributions based on computational results of risk measures (see also Section 4).
The APExE has closed forms for its CDF and hazard rate function (HRF). Hence, it can be used conveniently in analyzing censored data. Furthermore, it can be used to model various data in actuarial applications.
It can model heavy-tailed insurance data from actuarial science better than other competing models. The proposed APExE distribution provides better fits than fifteen other competing distributions in modeling unemployment insurance data (see Section 7).
One of the most important subjects in actuarial science is evaluating the exposure to market risk in a portfolio of instruments, which arises from changes in underlying variables such as equity prices, interest rates, or exchange rates. Hence, our second objective is to derive some important risk or actuarial measures, including the VaR, tail value at risk (TVaR), tail variance (TV), and tail variance premium (TVP), for the APExE distribution; these measures play a crucial role in portfolio optimization under uncertainty. Third, we explore the estimation of the APExE parameters by six methods: maximum likelihood estimators (MLE), ordinary least-squares estimators (OLSE), weighted least-squares estimators (WLSE), Anderson–Darling estimators (ADE), Cramér–von Mises estimators (CVME), and percentile estimators (PE). We compare these estimators in an extensive simulation study in order to develop a guideline for choosing the estimation method that provides the best estimates of the APExE parameters, which we think would be of great interest to applied actuaries, statisticians, and engineers.
The paper is organized as follows. In Section 2, we define the APExE distribution. Its mathematical properties are derived in Section 3. In Section 4, we discuss four important actuarial measures based on the APExE distribution and present some numerical results for them. Six methods of parameter estimation are explored in Section 5. The performance of the estimation methods is assessed through simulation in Section 6. In Section 7, we consider a heavy-tailed real data set from the insurance field to illustrate the usefulness of the APExE distribution. Final remarks are presented in Section 8.
2. The APExE Distribution
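In this section, we present the APExE model, which is specified by the following CDF and probability density function (PDF):
$$F(x) = \frac{\alpha^{(1 - e^{-a x})^{c}} - 1}{\alpha - 1}, \quad x > 0, \tag{1}$$
$$f(x) = \frac{\log \alpha}{\alpha - 1}\, a c\, e^{-a x} (1 - e^{-a x})^{c - 1}\, \alpha^{(1 - e^{-a x})^{c}}, \tag{2}$$
where $a > 0$ is a scale parameter and $\alpha > 0$ ($\alpha \neq 1$) and $c > 0$ are shape parameters. By setting $c = 1$, the APExE reduces to the APE distribution (Mahdavi and Kundu [20]). The ExE distribution (Gupta and Kundu [21]) follows from the APExE distribution as $\alpha \to 1$, whereas the E distribution follows as $\alpha \to 1$ with $c = 1$.
The survival function (SF) and HRF of the APExE distribution have the forms
$$S(x) = \frac{\alpha - \alpha^{(1 - e^{-a x})^{c}}}{\alpha - 1} \tag{3}$$
and
$$h(x) = \frac{\log \alpha \; a c\, e^{-a x} (1 - e^{-a x})^{c - 1}\, \alpha^{(1 - e^{-a x})^{c}}}{\alpha - \alpha^{(1 - e^{-a x})^{c}}}. \tag{4}$$
Plots of the PDF and HRF of the APExE distribution are shown in Figure 1 and Figure 2, respectively. These plots reveal that the APExE distribution accommodates bathtub, upside-down bathtub, decreasing, decreasing-constant, and increasing hazard rate functions as well as symmetrical, left-skewed, right-skewed, J-shaped, and reversed-J shaped densities.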
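As a quick numerical companion to this section, the following minimal sketch evaluates the CDF, PDF, SF, and HRF. It assumes the AP-G construction of Mahdavi and Kundu [20], $F(x) = (\alpha^{G(x)} - 1)/(\alpha - 1)$, with the ExE baseline $G(x) = (1 - e^{-ax})^c$; the function names and the argument order `(alpha, a, c)` are illustrative choices of this sketch, not notation taken from the paper.

```python
import math

def exe_cdf(x, a, c):
    """Assumed ExE baseline G(x) = (1 - exp(-a*x))**c for x > 0."""
    return (-math.expm1(-a * x)) ** c

def apexe_cdf(x, alpha, a, c):
    """Assumed AP-G form F(x) = (alpha**G(x) - 1) / (alpha - 1), alpha != 1."""
    return math.expm1(exe_cdf(x, a, c) * math.log(alpha)) / (alpha - 1.0)

def apexe_pdf(x, alpha, a, c):
    """Derivative of apexe_cdf with respect to x."""
    u = -math.expm1(-a * x)  # 1 - exp(-a*x)
    baseline_pdf = a * c * math.exp(-a * x) * u ** (c - 1.0)
    return math.log(alpha) / (alpha - 1.0) * baseline_pdf * alpha ** (u ** c)

def apexe_sf(x, alpha, a, c):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - apexe_cdf(x, alpha, a, c)

def apexe_hrf(x, alpha, a, c):
    """Hazard rate h(x) = f(x) / S(x)."""
    return apexe_pdf(x, alpha, a, c) / apexe_sf(x, alpha, a, c)

if __name__ == "__main__":
    # Illustrative parameter values only.
    for x in (0.5, 1.0, 2.0):
        print(x, apexe_cdf(x, 2.5, 0.8, 1.7), apexe_hrf(x, 2.5, 0.8, 1.7))
```

Taking `c = 1` gives the APE special case, and letting `alpha` approach 1 numerically approaches the ExE (or, with `c = 1`, the E) distribution.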
3. Mathematical Properties
3.1. Quantile Function
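The quantile function (QF) of the APExE distribution is derived by inverting the CDF (1) as
$$Q(p) = -\frac{1}{a} \log\left\{1 - \left[\frac{\log\left(1 + p(\alpha - 1)\right)}{\log \alpha}\right]^{1/c}\right\}, \quad 0 < p < 1, \tag{5}$$
where $a$ is the scale parameter and $\alpha$ and $c$ are the shape parameters.
The first, second, and third quartiles of the APExE distribution are obtained by setting $p = 0.25$, $0.5$, and $0.75$, respectively, in (5).
Let $p$ follow the uniform distribution $U(0, 1)$; then the QF can be used to generate random data sets of size $n$ from the APExE distribution as follows:
$$x_i = Q(p_i), \quad i = 1, \ldots, n. \tag{6}$$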
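The inverse-transform recipe described above can be sketched as follows. The sketch assumes the AP-G form of the CDF with an ExE baseline, $F(x) = (\alpha^{(1 - e^{-ax})^c} - 1)/(\alpha - 1)$, and inverts it in closed form; `apexe_quantile` and `apexe_sample` are hypothetical helper names.

```python
import math
import random

def apexe_cdf(x, alpha, a, c):
    """Assumed CDF: (alpha**((1 - exp(-a*x))**c) - 1) / (alpha - 1)."""
    g = (-math.expm1(-a * x)) ** c
    return math.expm1(g * math.log(alpha)) / (alpha - 1.0)

def apexe_quantile(p, alpha, a, c):
    """Closed-form inverse of apexe_cdf for 0 <= p < 1."""
    # Solve alpha**G = 1 + p*(alpha - 1) for the baseline value G.
    g = math.log1p(p * (alpha - 1.0)) / math.log(alpha)
    # Solve (1 - exp(-a*x))**c = G for x.
    return -math.log1p(-g ** (1.0 / c)) / a

def apexe_sample(n, alpha, a, c, seed=None):
    """Draw n variates by applying the quantile function to U(0, 1) draws."""
    rng = random.Random(seed)
    return [apexe_quantile(rng.random(), alpha, a, c) for _ in range(n)]

if __name__ == "__main__":
    # Quartiles for illustrative parameter values.
    print([apexe_quantile(p, 2.5, 0.8, 1.7) for p in (0.25, 0.5, 0.75)])
    print(apexe_sample(5, 2.5, 0.8, 1.7, seed=1))
```

A fixed `seed` makes the generated sample reproducible, which is convenient when comparing estimators on identical data.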
3.2. Moments
The $r$th moment of the APExE distribution has the form
Setting $r = 1, 2, 3$, and 4, respectively, we obtain the first four moments about the origin of the APExE distribution.
The plots of the mean, variance, skewness, and kurtosis of the APExE model for various parametric values of $\alpha$ and $c$ are displayed in Figure 3.
The moment generating function of the APExE distribution takes the form
The characteristic function of the APExE distribution follows from the above equation by replacing $t$ with $it$, where $i = \sqrt{-1}$.
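The moments can also be checked numerically. The sketch below does not use the series expressions of this subsection; it swaps in the identity $E(X^r) = \int_0^1 Q(u)^r\,du$ with a midpoint rule, assuming the closed-form quantile function of the AP-G construction with an ExE baseline. All function names and the grid size are choices of this sketch.

```python
import math

def apexe_quantile(p, alpha, a, c):
    """Assumed closed-form quantile function (AP-G family, ExE baseline)."""
    g = math.log1p(p * (alpha - 1.0)) / math.log(alpha)
    return -math.log1p(-g ** (1.0 / c)) / a

def apexe_raw_moment(r, alpha, a, c, n_grid=100_000):
    """Approximate E[X**r] as the integral of Q(u)**r over (0, 1), midpoint rule."""
    h = 1.0 / n_grid
    return h * sum(apexe_quantile((k + 0.5) * h, alpha, a, c) ** r
                   for k in range(n_grid))

def apexe_moment_summary(alpha, a, c, n_grid=100_000):
    """Mean, variance, skewness, and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (apexe_raw_moment(r, alpha, a, c, n_grid) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return {"mean": m1, "variance": var,
            "skewness": mu3 / var ** 1.5, "kurtosis": mu4 / var ** 2}

if __name__ == "__main__":
    print(apexe_moment_summary(2.5, 0.8, 1.7))
```

The midpoint rule avoids evaluating the quantile function at $u = 1$, where it diverges; higher-order moments converge more slowly, so a finer grid may be needed for the kurtosis.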
3.3. Moments of Residual Life
Let $S(t)$ be the SF of the APExE distribution; then its $n$th moment of residual life is given by
The mean residual life function of the APExE distribution is obtained by setting $n = 1$ in the previous equation, and then setting $t = 0$ gives the mean of the APExE distribution.
3.4. Entropies
The entropy of a random variable $X$ is a measure of the variation of uncertainty. It has applications in several areas, such as statistics, where it is used to test hypotheses in parametric models (see Morales et al. [22]), and information theory, engineering, and physics, where it is used to describe nonlinear chaotic or dynamical systems (see Kurths et al. [23]). Furthermore, Song [24] developed a log-likelihood-based distribution measure using the Rényi information, which exists for all distributions and allows for more meaningful comparisons between distributions than the traditional kurtosis measure. Song’s measure can be used in exploring density shapes, especially for heavy-tailed distributions, for many of which the kurtosis measure does not exist.
In this section, we derive the continuous Rényi, Tsallis, and Shannon entropies of the APExE distribution. The Rényi and Tsallis entropies of order $r$, where $r > 0$ and $r \neq 1$, of the APExE distribution are given, respectively, by
and
The Rényi entropy reduces to the Shannon entropy as $r$ approaches 1. The Shannon entropy of the APExE distribution takes the form
where
and
$\gamma$ is the Euler–Mascheroni constant.
Table 1 reports some numerical values for the Rényi, Tsallis, and Shannon entropies of the APExE distribution.
3.5. Inequality Curves
The Lorenz and Bonferroni curves are the most widely used inequality curves, with useful applications in economics (to study income and poverty), demography, reliability, medicine, and insurance.
The Lorenz and Bonferroni curves are defined for the APExE distribution as follows:
respectively, where
is the QF of the APExE distribution.
3.6. Order Statistics
The PDF and CDF of the
order statistic for the APExE distribution are
where
is a hypergeometric function.
By setting
, we have the PDF and CDF of minimum order statistics
. The limit distribution for
reduces to (see Theorem
in Galambos [
25])
By setting
, we have the PDF and CDF of maximum order statistics
. The limit distribution for
takes the form (see Theorem
in Galambos [
25])
where
.
4. Actuarial Measures
Probability distributions present a description of risk exposure. The level of risk exposure can be described by “key risk indicators” (numbers) that usually are functions of the model. Actuaries and risk managers often use such key risk indicators to determine the degree to which their companies are subject to particular aspects of risk, which arise from changes in underlying variables such as prices of equity, interest rates, or exchange rates.
In this section, we discuss the theoretical and computational aspects of some important risk measures including VaR, TVaR, TV, and TVP for the APExE distribution, which play a crucial role in portfolio optimization under uncertainty.
4.1. VaR Measure
The VaR is also known as the quantile risk measure or quantile premium principle, and it is specified with a given degree of confidence, say $q$ (typically $90\%$, $95\%$, or $99\%$). Furthermore, VaR is a quantile of the distribution of aggregate losses. Risk managers are often interested in “the chance of an adverse outcome” that can be expressed through the VaR at a particular probability level. The VaR can be used to evaluate exposure to risk, and hence it is used to determine the amount of capital required to withstand such adverse outcomes. Investors, regulators, and rating agencies are particularly interested in the company’s ability to withstand such events. The VaR of a random variable $X$ is the $q$th quantile of its CDF, denoted by $\mathrm{VaR}_q$, and it is defined by $\mathrm{VaR}_q = Q(q)$, where $Q$ is the QF (see Artzner [26]).
If $X$ has the PDF (2), then its VaR is obtained by evaluating the QF (5) at $p = q$.
4.2. TVaR Measure
Another important measure is the TVaR, which has been given several names including conditional tail expectation, conditional value-at-risk, and expected shortfall. The TVaR quantifies the expected value of the loss given that an event outside a given probability level has occurred. The TVaR is defined by
$$\mathrm{TVaR}_q = E(X \mid X > \mathrm{VaR}_q) = \frac{1}{1 - q} \int_{\mathrm{VaR}_q}^{\infty} x f(x)\,dx.$$
The TVaR of the APExE distribution is defined by
where
.
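The two measures above can be computed numerically as follows. This is a sketch under stated assumptions: the quantile function is the closed-form inverse of the AP-G CDF with an ExE baseline, and the TVaR is obtained from the identity $\mathrm{TVaR}_q = \frac{1}{1-q}\int_q^1 Q(u)\,du$ by a midpoint rule rather than from the closed-form expression of this subsection. The names `apexe_var` and `apexe_tvar` are hypothetical.

```python
import math

def apexe_quantile(p, alpha, a, c):
    """Assumed closed-form quantile function (AP-G family, ExE baseline)."""
    g = math.log1p(p * (alpha - 1.0)) / math.log(alpha)
    return -math.log1p(-g ** (1.0 / c)) / a

def apexe_var(q, alpha, a, c):
    """Value at risk at level q, i.e. the q-th quantile."""
    return apexe_quantile(q, alpha, a, c)

def apexe_tvar(q, alpha, a, c, n_grid=100_000):
    """Tail value at risk: average of Q(u) over u in (q, 1), midpoint rule."""
    h = (1.0 - q) / n_grid
    total = sum(apexe_quantile(q + (k + 0.5) * h, alpha, a, c)
                for k in range(n_grid))
    return total / n_grid

if __name__ == "__main__":
    # Illustrative parameter values and confidence levels.
    for q in (0.90, 0.95, 0.99):
        print(q, apexe_var(q, 2.5, 0.8, 1.7), apexe_tvar(q, 2.5, 0.8, 1.7))
```

For a continuous loss distribution the TVaR always exceeds the VaR at the same level, which gives a simple sanity check on the output.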
4.3. TV Measure
Landsman [27] introduced the TV risk measure, which is defined as the variance of the loss distribution beyond some critical value. The TV is one of the most important risk measures, as it pays attention to the tail variance beyond the VaR. The TV of the APExE distribution can be defined as
$$\mathrm{TV}_q = E(X^2 \mid X > \mathrm{VaR}_q) - (\mathrm{TVaR}_q)^2,$$
where
where
,
B and
C are generalized hypergeometric functions given by
and
. Using Equations (
6)–(
8), we get the TV of APExE distribution.
4.4. TVP Measure
The TVP is another important measure that plays an essential role in insurance sciences. The TVP of the APExE distribution takes the form
$$\mathrm{TVP}_q = \mathrm{TVaR}_q + \delta\,\mathrm{TV}_q,$$
where $0 < \delta < 1$. Substituting the expressions (6) and (7) in Equation (9), one can obtain the TVP of the proposed distribution.
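The TV and TVP can be approximated in the same numerical way. The sketch assumes the standard definitions $\mathrm{TV}_q = E(X^2 \mid X > \mathrm{VaR}_q) - \mathrm{TVaR}_q^2$ and $\mathrm{TVP}_q = \mathrm{TVaR}_q + \delta\,\mathrm{TV}_q$ with $0 < \delta < 1$, together with the closed-form quantile function of the AP-G construction with an ExE baseline; it replaces the hypergeometric expressions of the text by a midpoint rule, and its function names are illustrative.

```python
import math

def apexe_quantile(p, alpha, a, c):
    """Assumed closed-form quantile function (AP-G family, ExE baseline)."""
    g = math.log1p(p * (alpha - 1.0)) / math.log(alpha)
    return -math.log1p(-g ** (1.0 / c)) / a

def tail_moment(k, q, alpha, a, c, n_grid=100_000):
    """E[X**k | X > VaR_q]: average of Q(u)**k over u in (q, 1), midpoint rule."""
    h = (1.0 - q) / n_grid
    total = sum(apexe_quantile(q + (j + 0.5) * h, alpha, a, c) ** k
                for j in range(n_grid))
    return total / n_grid

def apexe_tv(q, alpha, a, c, n_grid=100_000):
    """Tail variance: conditional second moment minus squared TVaR."""
    tvar = tail_moment(1, q, alpha, a, c, n_grid)
    return tail_moment(2, q, alpha, a, c, n_grid) - tvar ** 2

def apexe_tvp(q, alpha, a, c, delta, n_grid=100_000):
    """Tail variance premium TVaR + delta * TV, assuming 0 < delta < 1."""
    tvar = tail_moment(1, q, alpha, a, c, n_grid)
    tv = tail_moment(2, q, alpha, a, c, n_grid) - tvar ** 2
    return tvar + delta * tv

if __name__ == "__main__":
    # Illustrative parameter values only.
    print(apexe_tv(0.95, 2.5, 0.8, 1.7), apexe_tvp(0.95, 2.5, 0.8, 1.7, delta=0.5))
```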
4.5. Numerical Simulations for Risk Measures
In this sub-section, we present some numerical results for the VaR, TVaR, TV, and TVP measures of the APExE, APE, ExE, and E distributions for different parametric values. The results are obtained through the following two steps:
A random sample of size $n$ is generated from each of the APExE, APE, ExE, and E distributions, and the parameters are estimated via the maximum likelihood method.
The procedure is repeated 1000 times to calculate the VaR, TVaR, TV, and TVP of the four distributions.
Simulation results of the VaR, TVaR, TV, and TVP for the APExE, APE, ExE, and E distributions are provided in Table 2 and Table 3. Furthermore, the results in these tables are depicted graphically in Figure 4 and Figure 5.
The model with higher values of the VaR, TVaR, TV, and TVP measures is said to have a heavier tail than other competing models. The results in Table 2 and Table 3, and the plots in Figure 4 and Figure 5, show that the proposed APExE model has higher values of the four risk measures than the APE, ExE, and E distributions. Hence, the APExE model has a heavier tail than the other distributions and can be utilized to model heavy-tailed insurance data.
5. Methods of Estimation
In this section, we discuss the estimation of the APExE parameters by different methods of estimation including the MLE, OLSE, WLSE, ADE, CVME, and PE. Parameter estimation using different classical estimators has been discussed by many researchers, for example, for the alpha logarithmic transformed Weibull distribution [28], quasi xgamma-geometric distribution [29], Weibull Marshall–Olkin Lindley distribution [30], and APE distribution [31], among others.
5.1. Maximum Likelihood Estimation
Let $x_1, \ldots, x_n$ be a random sample of size $n$ from the PDF (2); then the log-likelihood function reduces to
By differentiating Equation (10) with respect to $\alpha$, $a$, and $c$, respectively, and equating to zero, we have
Solving the previous equations numerically, we obtain the MLEs of the APExE parameters.
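For readers who prefer to maximize the log-likelihood directly rather than solve the score equations, the following sketch evaluates it. It assumes the PDF implied by the AP-G construction with an ExE baseline, so that $\ell = n\log\frac{\log\alpha}{\alpha - 1} + n\log(ac) - a\sum x_i + (c-1)\sum\log(1 - e^{-ax_i}) + \log\alpha\sum(1 - e^{-ax_i})^c$; the function name and the sample values are hypothetical.

```python
import math

def apexe_loglik(x, alpha, a, c):
    """Log-likelihood of a sample x under the assumed APExE density."""
    n = len(x)
    log_alpha = math.log(alpha)
    u = [-math.expm1(-a * xi) for xi in x]  # 1 - exp(-a * x_i)
    return (n * math.log(log_alpha / (alpha - 1.0))
            + n * math.log(a * c)
            - a * sum(x)
            + (c - 1.0) * sum(math.log(ui) for ui in u)
            + log_alpha * sum(ui ** c for ui in u))

if __name__ == "__main__":
    # Made-up positive observations, for illustration only.
    sample = [0.3, 1.2, 0.7, 2.5, 0.05]
    print(apexe_loglik(sample, 2.5, 0.8, 1.7))
```

In practice the negative of this function would be passed to a general-purpose numerical optimizer, with the constraints $\alpha > 0$, $\alpha \neq 1$, $a > 0$, and $c > 0$ enforced, for example, through a log-parameterization.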
5.2. Ordinary Least-Squares and Weighted Least-Squares Estimators
Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ be the order statistics of a random sample of size $n$ from the APExE distribution. The OLSE of the APExE parameters are obtained by minimizing the following equation:
The OLSE of the APExE parameters can also be obtained by solving the following nonlinear equations:
where
The WLSE of the APExE parameters can be calculated by minimizing the following equation:
Furthermore, the WLSE of the APExE parameters follow by solving the following nonlinear equations:
where
were defined in (
11), (12), and (13), respectively.
5.3. Anderson–Darling Estimation
The ADE of the APExE parameters are obtained by minimizing the following equation:
The ADE can also be calculated by solving the following nonlinear equations:
where
were defined in (
11), (12), and (13), respectively.
5.4. Cramér–von Mises Estimation
The CVME of APExE parameters are obtained by minimizing the following equation:
or by solving the following nonlinear equations
where
were defined in (
11), (12), and (13), respectively.
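The distance-based objective functions of Sections 5.2 to 5.4 can be sketched in code as follows. The sketch assumes their usual textbook forms (plotting positions $i/(n+1)$ for the OLSE and WLSE, weights $(n+1)^2(n+2)/[i(n-i+1)]$ for the WLSE, and the standard Cramér–von Mises and Anderson–Darling statistics) together with the AP-G form of the CDF with an ExE baseline. Each estimator is the parameter vector minimizing the corresponding function; the function names are hypothetical.

```python
import math

def apexe_cdf(x, alpha, a, c):
    """Assumed CDF: (alpha**((1 - exp(-a*x))**c) - 1) / (alpha - 1)."""
    g = (-math.expm1(-a * x)) ** c
    return math.expm1(g * math.log(alpha)) / (alpha - 1.0)

def _sorted_cdf(x, alpha, a, c):
    """CDF values at the order statistics x_(1) <= ... <= x_(n)."""
    return [apexe_cdf(xi, alpha, a, c) for xi in sorted(x)]

def ols_objective(x, alpha, a, c):
    F, n = _sorted_cdf(x, alpha, a, c), len(x)
    return sum((F[i - 1] - i / (n + 1)) ** 2 for i in range(1, n + 1))

def wls_objective(x, alpha, a, c):
    F, n = _sorted_cdf(x, alpha, a, c), len(x)
    return sum((n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
               * (F[i - 1] - i / (n + 1)) ** 2 for i in range(1, n + 1))

def cvm_objective(x, alpha, a, c):
    F, n = _sorted_cdf(x, alpha, a, c), len(x)
    return 1.0 / (12 * n) + sum((F[i - 1] - (2 * i - 1) / (2 * n)) ** 2
                                for i in range(1, n + 1))

def ad_objective(x, alpha, a, c):
    F, n = _sorted_cdf(x, alpha, a, c), len(x)
    # Pairs log F(x_(i)) with log S(x_(n+1-i)), where S = 1 - F.
    return -n - sum((2 * i - 1) * (math.log(F[i - 1]) + math.log(1.0 - F[n - i]))
                    for i in range(1, n + 1)) / n

if __name__ == "__main__":
    sample = [0.3, 1.2, 0.7, 2.5, 0.05]  # made-up data
    print(ols_objective(sample, 2.5, 0.8, 1.7), ad_objective(sample, 2.5, 0.8, 1.7))
```

Minimizing these functions numerically avoids differentiating them by hand, at the cost of relying on a derivative-free or finite-difference optimizer.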
5.5. Percentile Estimation
Let $u_i = i/(n+1)$ be an estimate of $F(x_{(i)})$; then the PE of the APExE parameters are obtained by minimizing the following equation:
or by solving the following nonlinear equations:
where
6. Simulation Results
In this section, we explore the performance of the aforementioned estimation methods in estimating the APExE parameters using simulation. We consider various sample sizes and various parameter combinations. We generate random samples from the APExE distribution via Equation (6). We determine the average values of the estimates (AEs) along with their corresponding average absolute biases (ABs), mean square errors (MSEs), and mean relative estimates (MREs) for all sample sizes and parameter combinations using the R software.
The ABs, MSEs, and MREs can be calculated by the following respective equations:
where
.
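As an illustration of how these summaries are computed from the replicated estimates of a single parameter, the sketch below assumes the usual definitions: $\mathrm{AB} = \frac{1}{N}\sum|\hat\theta_i - \theta|$, $\mathrm{MSE} = \frac{1}{N}\sum(\hat\theta_i - \theta)^2$, and $\mathrm{MRE} = \frac{1}{N}\sum|\hat\theta_i - \theta|/\theta$, where $N$ is the number of replications. The function name and the toy numbers are hypothetical.

```python
def simulation_summaries(estimates, true_value):
    """Return AE, AB, MSE, and MRE for one parameter over N replications."""
    n_rep = len(estimates)
    abs_errors = [abs(e - true_value) for e in estimates]
    return {
        "AE": sum(estimates) / n_rep,                 # average estimate
        "AB": sum(abs_errors) / n_rep,                # average absolute bias
        "MSE": sum(err ** 2 for err in abs_errors) / n_rep,
        "MRE": sum(abs_errors) / (n_rep * abs(true_value)),
    }

if __name__ == "__main__":
    # Three made-up estimates of a parameter whose true value is 2.0.
    print(simulation_summaries([1.0, 2.0, 3.0], 2.0))
```

A consistent estimator should show all three error summaries shrinking as the sample size behind each replicated estimate grows.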
Tables A1–A8 report the simulation results, including the AEs, ABs, MSEs, and MREs of the APExE parameters using the six estimation approaches. These tables are given in Appendix A. The estimates of the APExE parameters obtained from all six estimation methods are reliable and very close to the true values, showing small biases, MSEs, and MREs in all parameter combinations. All six estimators show the consistency property: the MSEs, ABs, and MREs decrease as the sample size increases, for all parameter combinations. We conclude that the MLE, OLSE, WLSE, ADE, CVME, and PE methods perform very well in estimating the APExE parameters.
7. Modeling Insurance Data
In this section, we consider a heavy-tailed real data set from the insurance field to illustrate the usefulness of the APExE distribution. This data set represents monthly metrics on unemployment insurance from July 2008 to April 2013, comprising 58 observations, and it is reported by the Department of Labor, Licensing and Regulation, State of Maryland, USA. The data consist of 21 variables, and we analyze variable number 12 in particular. The data are available at: https://catalog.data.gov/dataset/unemployment-insurance-data-july-2008-to-april-2013. We compare the goodness-of-fit results and some discrimination measures of the proposed distribution with those of other well-known competing distributions, including the APE [20], ExE [21], beta exponential (BE) [32], E, extended odd Weibull exponential (EOWE) [33], exponentiated Weibull (ExW) [34], Weibull (W), transmuted generalized exponential (TGE) [35], Marshall–Olkin exponential (MOE) [36], generalized transmuted exponential (GTE) [37], transmuted exponentiated generalized exponential (TExGE) [38], complementary geometric transmuted exponential (CGTE) [39], gamma (G), Harris extended exponential (HEE) [40], and transmuted exponential (TE) [41] distributions.
The competing models are compared using some discrimination measures, namely the Akaike information (AKI), consistent Akaike information (CAKI), Bayesian information (BAI), and Hannan–Quinn information (HAQUI) criteria. Further discrimination measures include the Anderson–Darling (ANDA), Cramér–von Mises (CRVMI), and Kolmogorov–Smirnov (KOSM) statistics, the last with its p-value.
The MLEs and the analytical measures are computed using the R software. Table 4 gives the MLEs and their standard errors. The analytical measures are provided in Table 5. The results in Table 5 indicate that the APExE provides better fits than the other competing models and could be chosen as an adequate model to analyze the studied heavy-tailed insurance data. The fitted PDF, CDF, SF, and probability-probability (P–P) plots of the APExE model are depicted in Figure 6. Furthermore, we use the six estimation approaches discussed in Section 5 to estimate the APExE parameters. Table 6 reports the estimates of the APExE parameters using these approaches and the numerical values of the goodness-of-fit measures for the insurance data. Based on the values of KOSM and the p-values listed in Table 6, we conclude that the performance ordering of the six estimators, from best to worst, for the insurance data is OLSE, CVME, WLSE, ADE, MLE, and PE. The P–P plots and the histogram of the insurance data with the fitted APExE density for the various estimation methods are shown in Figure 7, which supports the results in Table 6.
8. Conclusions
In this paper, we propose a new heavy-tailed distribution for modeling heavy-tailed insurance data, called the alpha power exponentiated exponential (APExE) distribution, which extends the exponential (E), exponentiated exponential (ExE), and alpha power exponential (APE) distributions. Its hazard rate function can be bathtub, unimodal, decreasing, decreasing-constant, increasing, or reversed-J shaped. Some of its statistical properties are derived. Risk measures such as the value at risk, tail value at risk, tail variance, and tail variance premium are derived for the APExE distribution, and a simulation study of these actuarial measures indicates that the APExE distribution has a heavier tail than the APE, ExE, and E distributions. Its unknown parameters are estimated by six frequentist estimation approaches. The practical applicability of the APExE distribution is illustrated using a real-life insurance data set, on which it provides better fits than fifteen competing models.
The research in this article can be extended in some ways. For example, the APExE distribution can be utilized for analyzing and modeling data in other fields such as reliability engineering, medicine, economics, survival analyses, and life testing.
Bayesian estimation of the APExE parameters based on complete and several types of censored samples under different types of loss functions could be discussed. Furthermore, a bivariate extension of the APExE distribution may also be studied.