1. Introduction
It is well-known that many empirical datasets traditionally used in different scenarios, such as financial econometrics, actuarial science, income modelling, and industrial engineering, exhibit positive support and bimodality. For example, in the stochastic frontier model, it can be assumed that some firms are fully efficient while others are inefficient, giving rise to an error term that can be bimodal (see, for example, the recent work of [1]). In this case, the problem is to assess which regime (efficient or inefficient) each firm belongs to, and the resulting distribution of the disturbance term can be bimodal. Furthermore, in some cases, the classical continuous distributions can neither account for zero values in their support nor reach two maxima. In this regard, many empirical continuous datasets with positive support begin at zero, with a high frequency of this initial value. This portion of observations is not negligible, nor should it be ignored. Because most of the aforementioned classical distributions cannot accommodate zeroes, many researchers truncate the data by omitting those values or model them with mixed random variables that combine a classical distribution and a point mass at 0 (see [2]). Although numerous models have been developed in the discrete setting, e.g., zero-inflated and hurdle models, which can perfectly describe the excess of zeros, this is not the case in the continuous setting.
Motivated by this idea, this work introduces a family of distributions with positive support that satisfactorily adapts to the unimodal or bimodal nature of the empirical data. Our objective, following the methodology shown in [3], consists of incorporating an additional parameter into a parent distribution, e.g., exponential or gamma, to build a more flexible probability model. This parameter controls the unimodality or bimodality of the proposed family, and a particular value of it reduces the model to the starting distribution. In that work, the author introduces a methodology that is conducive to generating asymmetry, and sometimes bimodality, starting from the normal distribution. The idea here is to use this methodology in the general case of starting from any distribution. In this paper, special attention is paid to the case in which the parent distribution is the exponential one. After providing the expression of the probability density function (pdf) of the proposed distribution, we study some of its more relevant properties, such as moments, kurtosis, Fisher's asymmetry coefficient, and some estimation methods. A regression model can also be derived by reparameterizing the mean of this new distribution. Several examples based on actuarial data are discussed, and the performance of this model is compared with the exponential distribution. Note that other generalizations of the exponential distribution have been considered by [4,5] and also by [6]. Nevertheless, these generalizations of the exponential model are not able to incorporate covariates. Although the methodology developed in this work is fully parametric, an increasing number of publications discuss similar problems within the machine learning and statistical frameworks; see, for example, [7,8,9], among others.
On the other hand, a weighted distribution is a powerful tool to enhance a parent discrete or continuous distribution. Recall that, for a random variable X with positive support and a probability density function that depends on a parameter (or vector of parameters), a new distribution can be constructed via a weighting function, with probability density function (see, for instance, [10,11,12])
where the weighting function introduces a new parameter and where it is assumed that the normalizing expectation is finite. Now, by combining this methodology with the idea given in [3], a result that provides a generalization of a classical continuous distribution is proposed. The resulting model can be either unimodal or bimodal, as shown in the following Proposition.
Proposition 1. Let Y be a continuous random variable with finite mean μ and variance σ². Subsequently, it is verified that the resulting weighted function is a genuine pdf, where the additional parameter controls the unimodality or the bimodality of the distribution. Proof. The result is obtained by considering the definition of the weighted density and integrating over the support of the random variable Y. □
The parameter controls the unimodality or bimodality of the distribution. Additionally, for a particular value of this parameter, the parent pdf is obtained as a special case. Thus, Proposition 1 provides a method to generalize a parent pdf.
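To make the construction in Proposition 1 concrete, it can be checked numerically. The sketch below assumes the quadratic weight w(z) = 1 + (1 − γz)² applied to the standardized variable z = (y − μ)/σ, in the spirit of [3]; under this assumption the normalizing constant is 2 + γ² for any parent with finite variance. The symbol γ and all helper names are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import integrate, stats

def weighted_pdf(y, parent, gamma):
    """Weighted density built from a parent distribution as in Proposition 1.

    Assumes the quadratic weight w(z) = 1 + (1 - gamma*z)^2 on the
    standardized variable z = (y - mu)/sigma; the normalizing constant
    is then 2 + gamma^2, regardless of the parent distribution.
    """
    mu, sigma = parent.mean(), parent.std()
    z = (y - mu) / sigma
    return (1.0 + (1.0 - gamma * z) ** 2) / (2.0 + gamma ** 2) * parent.pdf(y)

# Sanity check: the construction yields a genuine pdf for a gamma parent.
parent = stats.gamma(a=2.0, scale=1.5)
total, _ = integrate.quad(lambda y: weighted_pdf(y, parent, gamma=1.3), 0, np.inf)
```

Setting γ = 0 makes the weight constant, so the parent pdf is recovered exactly, which mirrors the special case noted above.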
The probability density function given in (2) can be viewed as a weighted distribution. There exists a vast literature dealing with the construction of such distributions in the discrete case since the pioneering work of [10]. However, the literature regarding the continuous scenario is scarce. The idea behind this construction is simple, and it aims to obtain more flexible distributions that adapt to empirical data. If the weight is the mean of the initial distribution, then the weighting function can be interpreted in terms of length-biased (size-biased) sampling. However, much more effort is required to obtain an interpretation of the weighting function beyond incorporating a parameter that controls the unimodality or bimodality of the resulting distribution.
The cumulative distribution function (cdf) of this family is obtained by integrating (2) by parts.
It is noted that the integral in the second term of the right-hand side of (3) is obtained in closed form for the classical distributions, such as the exponential, gamma, and Weibull. In this paper, we discuss the particular case in which the parent distribution is the exponential distribution. A comprehensive examination of its mathematical properties is carried out, with relevant emphasis on results related to insurance. Additionally, parameter estimation is carried out by the methods of moments and maximum likelihood. Moreover, we analyze the efficiency of the estimates via a simulation study. Finally, the model's practical performance is examined by using two real claim size datasets. The distribution proposed in this paper can be used as a basis for excess-of-loss quotations. Furthermore, it provides a good description of the random behaviour of significant losses, similarly to the Pareto distribution. Unlike other generalizations of the exponential distribution, the one introduced in this work allows us to derive a regression model due to the simple expression of its mean.
The rest of the paper is organised as follows. Section 2 provides some statistical properties of this model. Section 3 shows a catalogue of actuarial results. Next, Section 4 describes parameter estimation and a simulation study. The regression model is derived in Section 5. Numerical applications are given in Section 6, and the last section concludes the paper.
2. Bimodal Extension of the Exponential Distribution
Let us first consider the classical exponential distribution with the pdf given by
with a rate parameter and a unique modal value located at zero; its survival function is the usual exponential one. Henceforward, a random variable X that follows (4) will be denoted accordingly.
Now, by using (2) and taking into account the mean and standard deviation of the exponential parent, we have that the resulting expression is a genuine pdf for admissible values of the parameters. The survival function is obtained from (3) and it is given by
with
The exponential distribution is obtained when the additional parameter vanishes. From now on, a random variable Y that follows the pdf (5) will be denoted accordingly.
Figure 1, below, shows the graphs of the pdf of this distribution for several values of the parameters.
It can be easily verified that, for some parameter values, the shape of the distribution resembles the exponential one with a mode at zero. On the other hand, for other values of the parameters, the pdf reaches a local maximum (local mode) and a minimum (antimode). Furthermore, the value of the pdf at zero can be larger than that of the classical exponential distribution and lower in the rest of its domain.
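The unimodal-versus-bimodal behaviour described above can be checked numerically. The sketch below assumes the exponential parent with rate θ, standardized as z = θy − 1, the quadratic weight 1 + (1 − γz)², and normalizing constant 2 + γ², in the spirit of [3]; the symbols θ and γ are illustrative, not taken from the paper's formulas.

```python
import numpy as np
from scipy import integrate

def be_pdf(y, theta, gamma):
    """Assumed bimodal-exponential pdf: exponential parent with rate theta,
    z = theta*y - 1, weight 1 + (1 - gamma*z)^2, constant 2 + gamma^2.
    gamma = 0 recovers the plain exponential density."""
    z = theta * y - 1.0
    return theta * np.exp(-theta * y) * (1.0 + (1.0 - gamma * z) ** 2) / (2.0 + gamma ** 2)

def count_interior_modes(theta, gamma, grid=np.linspace(1e-6, 12.0, 20001)):
    """Count strict local maxima of the pdf on a fine interior grid."""
    f = be_pdf(grid, theta, gamma)
    return int(np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])))
```

With γ = 0 the density is monotone decreasing (mode at zero, no interior mode), while for a sufficiently large γ an antimode followed by an interior local mode appears, matching the description above.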
Reliability, Hazard Rate Function and Moments
The reliability function and the hazard (failure) rate function are two important reliability measures. The reliability function of a random variable Y is given by (6), while the hazard rate function, defined as the ratio of the pdf to the survival function, is provided by
Figure 2 displays the hazard rate function of the proposed law for different values of the parameters. Compared to the exponential distribution, which has a constant hazard rate, it is discernible that the hazard function of the distribution proposed here exhibits a wide variety of shapes. Therefore, the new family of distributions is flexible enough to describe a diversity of real datasets.
It is simple to see that the hazard rate function reaches a minimum, whose location depends on the parameter values. Obviously, when the additional parameter vanishes, the hazard rate function is constant, corresponding to the exponential case. The moment generating function is given by
from which we can derive the moments of the distribution; these are given by
where the quantities involved are, respectively, the mean and standard deviation of the exponential distribution. Therefore, (8) can be rewritten as
In particular, the mean, second-order moment, and variance are given by
It is straightforward to see that the mean decreases with the additional parameter over part of its support and increases in the rest of it.
The asymmetry coefficient (not given here) can be obtained in closed form by using the well-known formula for the standardized third central moment.
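The moment expressions above can be cross-checked by direct numerical integration. This is a sketch under the same assumed pdf (exponential parent, quadratic weight, constant 2 + γ²; notation illustrative): at γ = 0 the values must collapse to the exponential ones, i.e., mean 1/θ, second moment 2/θ², and Fisher skewness 2.

```python
import numpy as np
from scipy import integrate

def be_pdf(y, theta, gamma):
    # Assumed form: exponential parent, weight 1 + (1 - gamma*z)^2, z = theta*y - 1.
    z = theta * y - 1.0
    return theta * np.exp(-theta * y) * (1.0 + (1.0 - gamma * z) ** 2) / (2.0 + gamma ** 2)

def be_moment(k, theta, gamma):
    """k-th raw moment by quadrature (stand-in for the paper's closed forms)."""
    val, _ = integrate.quad(lambda y: y ** k * be_pdf(y, theta, gamma), 0, np.inf)
    return val

def be_skewness(theta, gamma):
    """Fisher asymmetry coefficient from the first three raw moments."""
    m1 = be_moment(1, theta, gamma)
    m2 = be_moment(2, theta, gamma)
    m3 = be_moment(3, theta, gamma)
    var = m2 - m1 ** 2
    return (m3 - 3.0 * m1 * var - m1 ** 3) / var ** 1.5
```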
3. Results in Risk Theory
In this section, some interesting actuarial results of this family of distributions are provided. Let the random variable Y represent either a policy limit or a reinsurance deductible (from an insurer's perspective); then, the limited expected value function L of Y with cdf F is defined by
which is the expected value of the claim amount censored at y. In other words, it represents the expected amount per claim retained by the insured on a policy with a fixed amount deductible of y. This is an appropriate tool for analyzing an excess-of-loss reinsurance ([13], Chapter 2, p. 59 and [14], Chapter 3, p. 113), among others. For the proposed distribution, this amount is obtained in closed form. The pdf (5) can also be applied to rating excess-of-loss reinsurance, as can be seen in the following result.
Proposition 2. Let Y be a random variable denoting the individual claim size, taking values only for individual claims greater than d. Let us also assume that Y follows the pdf (5); then, the expected cost per claim to the reinsurance layer, when the loss is in excess of m and subject to a maximum of l, is obtained in closed form. Proof. The result follows by taking into account the definition of the expected cost per claim to the layer and integrating, from which we obtain the result after some algebra. □
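The limited expected value and the layer cost in Proposition 2 can be approximated numerically, using the identity E[min(Y, y)] = ∫₀ʸ S(t) dt for a non-negative variable. This is a sketch under the same assumed pdf (exponential parent, quadratic weight; names and notation illustrative), not the paper's closed forms.

```python
import numpy as np
from scipy import integrate

def be_pdf(y, theta, gamma):
    # Assumed form: exponential parent, weight 1 + (1 - gamma*z)^2, z = theta*y - 1.
    z = theta * y - 1.0
    return theta * np.exp(-theta * y) * (1.0 + (1.0 - gamma * z) ** 2) / (2.0 + gamma ** 2)

def be_sf(y, theta, gamma):
    """Survival function S(y) by numerical integration of the assumed pdf."""
    val, _ = integrate.quad(lambda t: be_pdf(t, theta, gamma), y, np.inf)
    return val

def limited_ev(y, theta, gamma):
    """Limited expected value E[min(Y, y)] = integral of S over (0, y)."""
    val, _ = integrate.quad(lambda t: be_sf(t, theta, gamma), 0.0, y)
    return val

def layer_cost(m, l, theta, gamma):
    """Expected payment per ground-up claim for the layer l in excess of m."""
    return limited_ev(m + l, theta, gamma) - limited_ev(m, theta, gamma)
```

At γ = 0 the exponential case is recovered, where E[min(Y, y)] = (1 − e^{−θy})/θ, which provides a convenient check of the quadrature.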
The failure rate of the integrated tail distribution is also available in closed form. Additionally, its reciprocal is the mean residual life, which can be easily derived. In the insurance context (see, for example, [13,15]), for a claim amount random variable Y, the mean excess function or mean residual life function plays an essential role in the reinsurance framework. It is interpreted as the expected payment per claim on a policy with a fixed deductible of y, where claims with an amount less than or equal to y are completely ignored. Because the mean residual life is related to the limited expected value function through the expression (see [13], p. 59)
a closed-form expression can be obtained for the mean residual life.
Finally, the TVaR function is also provided in closed form.
4. Methods of Estimation and Simulation
Given a random sample taken from the proposed distribution, simple moment estimates can be calculated by equating the sample and theoretical moments. Because there are two parameters, we need, for example, the sample mean and the sample second-order moment about the origin. Now, by equating (9) and (10) to their sample counterparts, we get
By plugging (13) into (10), we obtain an equation which depends solely on one of the parameters and can be solved numerically. The impossibility of proving that both the moment and maximum likelihood estimators exist and are unique is one of the most substantial limitations of the proposed probabilistic model. However, in practice, the model's estimates, as shown in the simulation analysis and numerical applications, are easily obtained by numerical methods without difficulty. This leads us to think that the solutions correspond to global maxima, although this cannot be guaranteed.
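The two moment equations can be solved numerically even without closed forms, by integrating the assumed pdf. The following SciPy-based sketch (notation illustrative; the assumed pdf is the exponential parent with quadratic weight) log-parametrizes the rate so it stays positive during the root search:

```python
import numpy as np
from scipy import integrate, optimize

def be_pdf(y, theta, gamma):
    # Assumed form: exponential parent, weight 1 + (1 - gamma*z)^2, z = theta*y - 1.
    z = theta * y - 1.0
    return theta * np.exp(-theta * y) * (1.0 + (1.0 - gamma * z) ** 2) / (2.0 + gamma ** 2)

def be_moment(k, theta, gamma):
    val, _ = integrate.quad(lambda y: y ** k * be_pdf(y, theta, gamma), 0, np.inf)
    return val

def moment_estimates(m1_bar, m2_bar, start=(0.0, 0.5)):
    """Equate the first two theoretical moments to their sample counterparts
    and solve numerically; start = (log theta, gamma)."""
    def eqs(p):
        th, ga = np.exp(p[0]), p[1]
        return [be_moment(1, th, ga) - m1_bar, be_moment(2, th, ga) - m2_bar]
    sol = optimize.fsolve(eqs, start)
    return np.exp(sol[0]), sol[1]
```

As the text notes, existence and uniqueness of a solution are not guaranteed, so in practice the residuals at the returned point should always be inspected.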
We now proceed with the maximum likelihood method of estimation. The log-likelihood function can be written as
Then, the normal equations are given by
Numerical procedures, such as the Newton–Raphson algorithm, can be used to derive the solutions of the system of equations given by (14) and (15). Unlike the exponential distribution, the maximum likelihood estimates cannot be expressed in closed form. In practice, since we are unable to prove that the log-likelihood function is concave, the likelihood is maximized directly from several seed points, as convergence to the global maximum is not guaranteed. We have used different maximum search methods available in the FindMaximum built-in function of the Mathematica software package. These methods include the Newton–Raphson and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithms. The same results were achieved under these two optimization methods. Although a more general structure, such as kernel regression or a neural network, could provide accurate estimates, the approach used in this paper does not require training data, and it can work well even if the fit to the data is not perfect. Additionally, this method makes the results easier to understand and interpret; e.g., a parametric test for the significance of the parameter estimates can lead to a rejection of the null hypothesis more readily than its non-parametric counterpart. Finally, from the actuarial perspective, the practitioner may be interested in the parametric approach since it provides appealing closed-form expressions, as is the case for this BE representation.
Simulation Experiment
Here, an acceptance-rejection algorithm to generate random variates from the proposed distribution (see [16]) is used. The simulation analysis results are illustrated in Table 1, where the behaviour of the maximum likelihood estimates of 1000 simulated samples of sizes 50, 100, 150, and 200 from the proposed distribution is examined. For each simulated sample, the estimates were numerically computed via a Newton–Raphson algorithm. In this table, the means, standard deviations (SD), and percentage of coverage probability (C) are reported for different values of the parameters. As expected, it is observable that the bias becomes smaller as the sample size n increases.
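An acceptance-rejection scheme of the kind referenced from [16] can be sketched as follows, again under the assumed pdf. An Exp(θ/2) proposal is used here (a choice of this sketch, not necessarily the paper's) so that the acceptance ratio, a polynomial times a decaying exponential, stays bounded; the envelope constant M is obtained numerically on a grid.

```python
import numpy as np

def be_pdf(y, theta, gamma):
    # Assumed form: exponential parent, weight 1 + (1 - gamma*z)^2, z = theta*y - 1.
    z = theta * y - 1.0
    return theta * np.exp(-theta * y) * (1.0 + (1.0 - gamma * z) ** 2) / (2.0 + gamma ** 2)

def be_sample(n, theta, gamma, seed=None):
    """Acceptance-rejection sampler with an Exp(theta/2) proposal."""
    rng = np.random.default_rng(seed)
    def ratio(y):  # target pdf divided by proposal pdf; bounded in y
        return be_pdf(y, theta, gamma) / ((theta / 2.0) * np.exp(-theta * y / 2.0))
    grid = np.linspace(0.0, 50.0 / theta, 100_001)
    M = 1.01 * ratio(grid).max()          # numeric envelope with a safety margin
    out = np.empty(0)
    while out.size < n:
        y = rng.exponential(2.0 / theta, size=2 * (n - out.size))
        u = rng.uniform(size=y.size)
        out = np.concatenate([out, y[u * M <= ratio(y)]])
    return out[:n]
```

The expected acceptance probability is 1/M, so a tighter proposal (for instance, a mixture matching the two modes) would be more efficient; this plain version is kept for clarity.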
5. A Suitable Regression Model
In practice, to better explain the response variable, it is important that the statistical model be able to incorporate covariates. By rewriting (9) in terms of the mean, a reparameterization of the distribution in (5) is obtained. The variance of the reparameterized distribution is given by
In this case, the second parameter can be interpreted as a precision parameter, since the variance function increases over part of its domain (see Figure 3) and decreases in the rest of it. Thus, one parameter is the mean of the response variable, and the other can be regarded as a precision parameter in the sense that, for a fixed value of the mean, the variance of Y varies according to the value of the precision parameter.
Because the mean of the response is non-negative, the most common function that relates the mean and the linear predictor is the log link,
where the covariates form a q-vector of explanatory variables and the coefficients form a q-vector of unknown regression parameters that may include an intercept. Subsequently, we have the conventional log-linear model for the conditional mean.
The maximum likelihood estimates of the regression coefficients can be computed via the Newton–Raphson algorithm. In our applications, parameters will be estimated by the maximum likelihood method using this algorithm, which is available in the software packages Mathematica [17] and RATS [18]. The code for the latter package is available upon request.
As is well-known, the marginal effect reflects the variation of the conditional mean of Y due to a one-unit change in the sth covariate; under the log link it equals the sth regression coefficient multiplied by the conditional mean itself. Thus, the marginal effect indicates that a one-unit change in the sth regressor increases or decreases the expectation of the dependent variable, depending on the sign, positive or negative, of the corresponding coefficient. For indicator or dummy variables that take only the value 0 or 1, the marginal effect in terms of the odds ratio is approximately the exponential of the coefficient minus one. Therefore, the conditional mean is exp of the corresponding coefficient times larger if the indicator variable is one rather than zero.
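These marginal-effect calculations can be illustrated directly. Under the log link μ(x) = exp(x′β), the marginal effect of a continuous covariate s is β_s μ(x), and switching a dummy from 0 to 1 multiplies the mean by exp(β_s). The β and x values below are hypothetical, not estimates from the paper's applications.

```python
import numpy as np

def cond_mean(x, beta):
    """Conditional mean mu(x) = exp(x'beta) under the log link."""
    return np.exp(x @ beta)

def marginal_effect(x, beta, s):
    """d mu / d x_s = beta_s * mu(x) for a continuous covariate s."""
    return beta[s] * cond_mean(x, beta)

beta = np.array([0.2, -0.5, 0.8])   # intercept + two slopes (hypothetical values)
x = np.array([1.0, 1.3, 0.0])       # third covariate is a 0/1 dummy

# Finite-difference check of the analytic marginal effect for covariate 1:
h = 1e-6
fd = (cond_mean(x + h * np.array([0.0, 1.0, 0.0]), beta) - cond_mean(x, beta)) / h

# Flipping the dummy (s = 2) multiplies the conditional mean by exp(beta_2):
ratio = cond_mean(x + np.array([0.0, 0.0, 1.0]), beta) / cond_mean(x, beta)
```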