Article

An EM Algorithm for Double-Pareto-Lognormal Generalized Linear Model Applied to Heavy-Tailed Insurance Claims

Centre for Actuarial Studies, Department of Economics, The University of Melbourne, Melbourne, VIC 3010, Australia
*
Author to whom correspondence should be addressed.
Risks 2017, 5(4), 60; https://doi.org/10.3390/risks5040060
Submission received: 27 September 2017 / Revised: 2 November 2017 / Accepted: 3 November 2017 / Published: 7 November 2017

Abstract

Generalized linear models might not be appropriate when the probability of extreme events is higher than that implied by the normal distribution. Extending the method for estimating the parameters of a double Pareto lognormal distribution (DPLN) in Reed and Jorgensen (2004), we develop an EM algorithm for the heavy-tailed Double-Pareto-lognormal generalized linear model. The DPLN distribution is obtained as a mixture of a lognormal distribution with a double Pareto distribution. In this paper the associated generalized linear model has the location parameter equal to a linear predictor, which is used to model insurance claim amounts for various data sets. The performance is compared with those of the generalized beta (of the second kind) and lognormal distributions.

1. Introduction

Heavy-tailed distributions are an important tool for actuaries working in insurance, where many insurable events have low likelihoods and high severities and the associated insurance policies require adequate pricing and reserving. In such cases the four-parameter generalized beta distribution of the second kind (GB2) and the three-parameter generalized gamma distribution fulfil this purpose, as demonstrated in McDonald (1990), Wills et al. (2006), Frees and Valdez (2008), Frees et al. (2014a) and Chapter 9 of Frees et al. (2014b). In fact, the set of possible distributions that could be used for long-tail analyses is much broader than suggested here and good references for these are Chapter 10 of Frees et al. (2014a), Chapter 9 of Frees et al. (2014b) and Section 4.11 of Kleiber and Kotz (2003). We propose in this article the use of the double Pareto lognormal (DPLN) distribution as an alternative model for heavy-tailed events.
The DPLN distribution was introduced by Reed (2003) to model the distribution of incomes. It occurs as the distribution of the stopped wealth where the wealth process is geometric Brownian motion, the initial wealth is lognormally distributed and the random stopping time is exponentially distributed. This parametric model exhibits Paretian behaviour in both tails, among other suitable theoretical properties, and there is favourable evidence of its fit to data in various applications, as demonstrated in Colombi (1990), Reed (2003), Reed and Jorgensen (2004) and Hajargasht and Griffiths (2013) for income data and Giesen et al. (2010) for settlement size data. Particular applications of the DPLN distribution to insurance and actuarial science have previously been given in Ramírez-Cobo et al. (2010) and Hürlimann (2014).
In this paper, the DPLN generalized linear model (GLM) is introduced by setting the location parameter equal to a linear predictor, i.e., $\nu = \boldsymbol{\beta}^{\top}\mathbf{x}$, where $\boldsymbol{\beta} \in \mathbb{R}^d$ is a vector of regression coefficients and $\mathbf{x}$ is a vector of explanatory variables. The mean of the DPLN GLM is then proportional to an exponential transformation of the linear predictor, and an EM algorithm is developed which solves for the regression parameters.
Another practical application of the DPLN GLM, beyond what is demonstrated in this article, is assessing the variability of survival rates of employment, as done in Yamaguchi (1992).
Applying this generalized linear model, we model the claim severity for private passenger automobile insurance and claim amounts due to bodily injuries sustained in car accidents, the data sets of which are supplied in Frees et al. (2014a). We compare this predictive model with the generalized linear model derived from the generalized beta distribution of the second kind (GB2), which has been employed in modelling incomes, for example by McDonald (1990), and in modelling insurance claims, for example by Wills et al. (2006) and Frees and Valdez (2008). The EM algorithm has previously been applied to insurance data, for example in Kočović et al. (2015), where it is used to explain losses of a fire insurance portfolio in Serbia.
The rest of the paper is organized as follows. Section 2 explains how the DPLN model is applied to regression by setting the location parameter equal to a linear predictor. In Section 3, details of parameter estimation by the method of maximum likelihood using the related normal skew-Laplace (NSL) distribution are provided, and we develop an EM algorithm for this purpose. Section 4 gives numerical applications of the models to two insurance-related data sets and compares the fits with those of the LN and GB2 distributions using the likelihood ratio test and a test for non-nested models due to Vuong (1989). Out-of-sample performances of the models with regard to capital requirements of an insurer are also provided. Section 5 concludes.

2. DPLN Generalized Linear Model

As mentioned previously, the DPLN distribution can be obtained as a randomly stopped geometric Brownian motion whose initial value is lognormally distributed. Therefore, without any mathematical analysis, stopping the geometric Brownian motion at the initial time with probability one will give the lognormal distribution. If the diffusion coefficient of the geometric Brownian motion is set to zero, we have a deterministic geometric motion which is stopped at an exponentially distributed time, giving the PLN distribution. If also the drift coefficient of the geometric Brownian motion is set to zero then the lognormal distribution results. Another degenerate case emerges when the initial value is constant, that is, when its variance is zero, giving the Pareto distribution. This gives us a sensible intuition on the mathematical derivations in this section.
Formally, given a filtered probability space $(\Omega, \underline{\mathcal{F}}, (\mathcal{F}_t)_{t \geq 0}, P)$, where $\Omega$ is the sample space, $\underline{\mathcal{F}}$ is the $\sigma$-algebra of events, $(\mathcal{F}_t)_{t \geq 0}$ is the filtration of sub-$\sigma$-algebras of $\underline{\mathcal{F}}$ and $P$ is a probability measure, we consider the adapted stochastic process $Y$ specified by the following stochastic differential equation (SDE), for $t \geq 0$,
$$ dY_t = \left(\mu - \tfrac{1}{2}\sigma^2\right)dt + \sigma\, dW_t, \qquad (1) $$
where $\mu$ and $\sigma \geq 0$ are constants and $W$ is a Wiener process adapted to the filtration $(\mathcal{F}_t)_{t \geq 0}$.
Then for a fixed time $t \geq 0$, the random variable $Y_t$ can be written as
$$ Y_t = Y_0 + \left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t. $$
Now if $Y_0$ is a random variable, dependent upon the vector of predictor variables $\mathbf{x} = (x_1, x_2, \ldots, x_d)^{\top} \in \mathbb{R}^d$, such that $Y_0 \sim N(\nu, \tau^2)$, where $\nu = \boldsymbol{\beta}^{\top}\mathbf{x}$, and if we stop the process $Y$ randomly at time $t = T$, where $T \sim \mathrm{Exp}(\lambda)$, then $Y_T$ is a normal skew-Laplace (NSL) distributed random variable regressed on the vector of predictors $\mathbf{x}$, that is
$$ Y_T \sim NSL(\nu, \tau^2, \lambda_1, \lambda_2), $$
the exponential of which is a DPLN distributed random variable $V_T$ (see Reed and Jorgensen (2004) for more details) dependent on the same predictors, namely
$$ V_T = \exp(Y_T) \sim DPLN(\nu, \tau^2, \lambda_1, \lambda_2), \qquad (4) $$
where $\nu = \boldsymbol{\beta}^{\top}\mathbf{x}$. As indicated previously, the particular case of the PLN distribution arises when $\sigma = 0$ in (1) and the case of the lognormal distribution (LN) arises when $\mu = 0$ and $\sigma = 0$ in (1).
The moment generating function (MGF) of $Y_T$ is
$$ MGF_{Y_T}(s) = MGF_{\log X_0}(s)\, MGF_{T_1 - T_2}(s), $$
where $T_1 \sim \mathrm{Exp}(\lambda_1)$ and $T_2 \sim \mathrm{Exp}(\lambda_2)$ are exponentially distributed random variables with
$$ \lambda_1 = \frac{1}{\sigma^2}\left[\sqrt{\left(\mu - \tfrac{1}{2}\sigma^2\right)^2 + 2\lambda\sigma^2} - \left(\mu - \tfrac{1}{2}\sigma^2\right)\right], \qquad \lambda_2 = \frac{1}{\sigma^2}\left[\sqrt{\left(\mu - \tfrac{1}{2}\sigma^2\right)^2 + 2\lambda\sigma^2} + \left(\mu - \tfrac{1}{2}\sigma^2\right)\right], $$
as given in Reed and Jorgensen (2004). This product of MGFs demonstrates that the NSL distributed random variable $Y_T$ can be expressed as the sum
$$ Y_T = \log X_0 + T_1 - T_2, \qquad (7) $$
where $\log X_0 \sim N(\nu, \tau^2)$ and $T_1$ and $T_2$ are as above.
The probability density function (PDF) of $V_T = \exp(Y_T)$, as in (4), is given by
$$ f_{V_T}(x) = \frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}\,\frac{1}{x}\left[\exp\left\{\tfrac{1}{2}\tau^2\lambda_1^2 - \lambda_1(\log x - \nu)\right\}\Phi\!\left(\frac{\log x - \tau^2\lambda_1 - \nu}{\tau}\right) + \exp\left\{\tfrac{1}{2}\tau^2\lambda_2^2 - \lambda_2(\nu - \log x)\right\}\Phi^c\!\left(\frac{\log x + \tau^2\lambda_2 - \nu}{\tau}\right)\right], \qquad (8) $$
also given in Reed (2003). Because we will work with logarithms of DPLN variates, we will make use of the PDF of $Y_T$ given by
$$ f_{Y_T}(y) = \frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}\left[\exp\left\{\tfrac{1}{2}\tau^2\lambda_1^2 - \lambda_1(y - \nu)\right\}\Phi\!\left(\frac{y - \tau^2\lambda_1 - \nu}{\tau}\right) + \exp\left\{\tfrac{1}{2}\tau^2\lambda_2^2 - \lambda_2(\nu - y)\right\}\Phi^c\!\left(\frac{y + \tau^2\lambda_2 - \nu}{\tau}\right)\right], \qquad (9) $$
where $\Phi(\cdot)$ and $\Phi^c(\cdot)$ represent the cumulative distribution function and survival function of the standard normal distribution respectively.
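To make these expressions concrete, the following minimal Python sketch evaluates the NSL log-density in (9) and the corresponding DPLN log-density in (8). The function names and the use of SciPy are our own illustrative choices (the computations reported later in the paper were carried out in Mathematica), and no numerical safeguards against overflow of the exponential terms are included.

```python
import numpy as np
from scipy.stats import norm

def nsl_logpdf(y, nu, tau, lam1, lam2):
    """Log-density of the normal skew-Laplace distribution, equation (9).

    y may be a scalar or array of log-claims; nu is the location (e.g. the
    linear predictor), tau > 0 the scale, lam1, lam2 > 0 the shape parameters.
    """
    y = np.asarray(y, dtype=float)
    t1 = np.exp(0.5 * tau**2 * lam1**2 - lam1 * (y - nu)) * norm.cdf((y - tau**2 * lam1 - nu) / tau)
    t2 = np.exp(0.5 * tau**2 * lam2**2 - lam2 * (nu - y)) * norm.sf((y + tau**2 * lam2 - nu) / tau)
    return np.log(lam1 * lam2 / (lam1 + lam2)) + np.log(t1 + t2)

def dpln_logpdf(v, nu, tau, lam1, lam2):
    """Log-density of the DPLN distribution, equation (8), via V = exp(Y)."""
    v = np.asarray(v, dtype=float)
    return nsl_logpdf(np.log(v), nu, tau, lam1, lam2) - np.log(v)
```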
Additional properties concerning the moments of the DPLN and asymptotic tail behaviour can be found in Reed (2003). When we incorporate a vector of explanatory variables (or covariates) $\mathbf{x} \in \mathbb{R}^d$ in our model, the location parameter $\nu$ is set equal to $\boldsymbol{\beta}^{\top}\mathbf{x}$, where $\boldsymbol{\beta} \in \mathbb{R}^d$ is a vector of regression coefficients, and the mean of the response variable is given by
$$ E[V_T \mid \mathbf{x}] = \frac{\lambda_1\lambda_2}{(\lambda_1 - 1)(\lambda_2 + 1)}\exp\left\{\boldsymbol{\beta}^{\top}\mathbf{x} + \tfrac{1}{2}\tau^2\right\}, \qquad (10) $$
where $\tau, \lambda_2 > 0$ and $\lambda_1 > 1$. Each regression coefficient can be interpreted as a proportional change in the mean of the response variable per unit change in the corresponding covariate.
For a random sample from $V_T$,
$$ v_1, v_2, \ldots, v_n, \qquad (11) $$
and corresponding vectors of covariates $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}$, we will use the maximum likelihood estimation described in the following section to compute the parameters of the DPLN distribution.

3. Maximum Likelihood Estimation of Parameters

3.1. Methods of Estimation

Given the random sample in (11) and corresponding vectors of covariates, there are several ways of estimating parameters, such as moment matching, where such moments exist, and maximum likelihood estimation (MLE). As we are dealing with heavy-tailed distributions, moment matching may not be possible and we therefore resort to maximum likelihood estimation. Maximum likelihood estimators are also preferable since, for large samples, they are approximately unbiased, efficient and normally distributed. The EM algorithm of Dempster et al. (1977) is one approach to performing MLE of parameters, which we describe in the next section. Another approach is based on the gradient ascent method, which we discuss in a subsequent section.

3.2. Application of the EM Algorithm to DPLN Generalized Linear Model

3.2.1. The EM Algorithm for the DPLN GLM

Our task is to obtain maximum likelihood estimates of the parameters of the model $DPLN(\nu, \tau^2, \lambda_1, \lambda_2)$ using the EM algorithm, which was developed in Dempster et al. (1977). Because an NSL random variable is the logarithm of a DPLN random variable, fitting a DPLN distribution to the observations in (11) is the same as fitting the NSL distribution to the logarithms $y_1, y_2, \ldots, y_n$ of these observations. The EM algorithm starts from an initial estimate of parameter values and sequentially computes refined estimates which increase the value of the log-likelihood function. In the following paragraphs we explain how it is applied to the DPLN distribution.
Suppose that $\theta = (\boldsymbol{\beta}, \tau^2, \lambda_1, \lambda_2)$ is an initial estimate of the parameters of the distribution of the random variable $Y$ whose density function is $f_Y$, given in (9). Let $\theta'$ denote a refined estimate of the parameters of the distribution, that is, an estimate for which the log-likelihood function exceeds that of the initial estimate $\theta$. In what follows, we demonstrate how to generate a refined estimate for the DPLN GLM. For the refined estimate $\theta'$, we can write the log-likelihood function as
$$ \ell(\theta') = \sum_{i=1}^{n}\log f_Y(y_i; \theta') = \sum_{i=1}^{n}\log \int f_{Y,Z}(y_i, z; \theta')\,dz, \qquad (12) $$
where $f_Y(y)$ is the PDF of the random variable $Y$, $f_{Y,Z}(y, z)$ is the joint density function of the random variables $(Y, Z)$ and where the random variable $Z$ is latent and therefore unobserved. In our case, $Z$ is a normally distributed random variable having parameters $\nu$ and $\tau^2$, corresponding to the random variable $\log X_0$ in (7).
We now give the probability density function $g_i$, for each $i = 1, 2, \ldots, n$, of the conditional random variable $Z \mid Y = y_i$, which depends only on the initial estimate $\theta$ of the parameters, and not on $\theta'$, namely
$$ g_i(z) = f_{Y,Z}(y_i, z; \theta)/f_Y(y_i; \theta) = f_{Z \mid Y = y_i}(z). $$
We then rewrite (12) as
$$ \ell(\theta') = \sum_{i=1}^{n}\log \int f_{Y,Z}(y_i, z; \theta')\,dz = \sum_{i=1}^{n}\log \int \frac{f_{Y,Z}(y_i, z; \theta')}{g_i(z)}\, g_i(z)\,dz $$
and applying Jensen's inequality $\log E[X] \geq E[\log X]$ gives
$$ \ell(\theta') = \sum_{i=1}^{n}\log \int \frac{f_{Y,Z}(y_i, z; \theta')}{g_i(z)}\, g_i(z)\,dz \geq \sum_{i=1}^{n}\int \log\!\left(\frac{f_{Y,Z}(y_i, z; \theta')}{g_i(z)}\right) g_i(z)\,dz = \sum_{i=1}^{n}\int \log f_{Y,Z}(y_i, z; \theta')\, g_i(z)\,dz - \sum_{i=1}^{n}\int \log g_i(z)\, g_i(z)\,dz. $$
So our maximization of the likelihood amounts to maximizing
$$ \sum_{i=1}^{n}\int \log f_{Y,Z}(y_i, z; \theta')\, g_i(z)\,dz \qquad (16) $$
with respect to $\theta'$, which is the M-step or maximization step of the EM algorithm.
In practice, there is an E-step or expectation step of the algorithm which is performed prior to the M-step; however, we present the M-step first in the next section because it identifies the variables whose expectations are to be computed in the E-step.

3.2.2. M-Step

We need to maximize (16) with respect to the parameter $\theta'$. We show how this is done for the double Pareto lognormal distribution by expanding out the terms in (16), giving
$$ \sum_{i=1}^{n}\int \log f_{Y,Z}(y_i, z; \theta')\, g_i(z)\,dz = \sum_{i=1}^{n}\int \log\!\left[\frac{1}{\sqrt{2\pi(\tau')^2}}\exp\!\left(-\frac{(z - \nu_i')^2}{2(\tau')^2}\right) \times \frac{\lambda_1'\lambda_2'}{\lambda_1' + \lambda_2'} \times \begin{cases}\exp(\lambda_2'(y_i - z)), & \text{if } z > y_i\\ \exp(-\lambda_1'(y_i - z)), & \text{if } z \leq y_i\end{cases}\right] g_i(z)\,dz, $$
which becomes
$$ n\log\frac{1}{\sqrt{2\pi(\tau')^2}} - \frac{1}{2(\tau')^2}\sum_{i=1}^{n} z_i^{(2)} + \frac{1}{(\tau')^2}\sum_{i=1}^{n} z_i\nu_i' - \frac{1}{2(\tau')^2}\sum_{i=1}^{n}(\nu_i')^2 + n\log\frac{\lambda_1'\lambda_2'}{\lambda_1' + \lambda_2'} + \lambda_2'\sum_{i=1}^{n} w_i^- - \lambda_1'\sum_{i=1}^{n} w_i^+, \qquad (18) $$
where $\nu_i' = (\boldsymbol{\beta}')^{\top}\mathbf{x}^{(i)}$ and
$$ z_i = \int z\, g_i(z)\,dz, \qquad z_i^{(2)} = \int z^2 g_i(z)\,dz, \qquad w_i^+ = \int_{-\infty}^{y_i}(y_i - z)\, g_i(z)\,dz, \qquad w_i^- = \int_{y_i}^{\infty}(y_i - z)\, g_i(z)\,dz. $$
We arrive at the following theorem, which gives the optimum parameter vector $\theta'$ and whose proof follows the reasoning for the simpler case without explanatory variables given in Reed and Jorgensen (2004).
Theorem 1.
The components of the parameter vector $\theta'$ which maximise (16) are
$$ \boldsymbol{\beta}' = (X^{\top}X)^{-1}X^{\top}Z, \qquad (\tau')^2 = \frac{1}{n}\left(\mathbf{1}^{\top}Z^{(2)} - Z^{\top}X(X^{\top}X)^{-1}X^{\top}Z\right), \qquad \lambda_1' = \frac{1}{P + \sqrt{PQ}}, \qquad \lambda_2' = \frac{1}{Q + \sqrt{PQ}}, $$
where
$$ P = \frac{1}{n}\sum_{i=1}^{n} w_i^+, \qquad Q = -\frac{1}{n}\sum_{i=1}^{n} w_i^-, $$
$$ Z = (z_1, z_2, \ldots, z_n)^{\top}, \qquad Z^{(2)} = (z_1^{(2)}, z_2^{(2)}, \ldots, z_n^{(2)})^{\top} $$
and $X$ is the matrix of predictor variables
$$ X = \begin{pmatrix} \mathbf{x}^{(1)\top} \\ \mathbf{x}^{(2)\top} \\ \vdots \\ \mathbf{x}^{(n)\top} \end{pmatrix}. $$
Proof. 
See Appendix A.1. ☐
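As a rough illustration of Theorem 1, the sketch below computes the M-step updates from the E-step quantities. The function name m_step and its interface are hypothetical and not part of the original implementation; X is the $n \times d$ design matrix and z, z2, w_plus, w_minus hold the conditional expectations $z_i$, $z_i^{(2)}$, $w_i^+$ and $w_i^-$ defined above.

```python
import numpy as np

def m_step(X, z, z2, w_plus, w_minus):
    """Closed-form M-step updates of Theorem 1 (illustrative sketch).

    X       : (n, d) design matrix,
    z, z2   : conditional expectations of Z and Z^2 given each y_i,
    w_plus  : conditional expectations of (y_i - Z) on {Z <= y_i} (nonnegative),
    w_minus : conditional expectations of (y_i - Z) on {Z >  y_i} (nonpositive).
    """
    n = X.shape[0]
    # beta' = (X'X)^{-1} X'Z, computed via least squares for numerical stability
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    fitted = X @ beta
    # (tau')^2 = (1/n) (1'Z^(2) - Z'X (X'X)^{-1} X'Z)
    tau2 = (np.sum(z2) - z @ fitted) / n
    P = np.mean(w_plus)
    Q = -np.mean(w_minus)
    root = np.sqrt(P * Q)
    lam1 = 1.0 / (P + root)
    lam2 = 1.0 / (Q + root)
    return beta, tau2, lam1, lam2
```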

3.2.3. E-Step

Here we compute the conditional distributions which are used in the E-step. For the set of $n$ logarithms $y_1, \ldots, y_n$ of the observations, the maximum likelihood estimates of the parameters can be obtained using the EM algorithm with
$$ g_i(z) = \frac{f_Z(z; \theta)\, f_W(y_i - z; \theta)}{f_Y(y_i; \theta)}, $$
where the density functions $f_Z$, $f_W$ and $f_Y$ are defined as
$$ f_Z(z; \theta) = \frac{1}{\sqrt{2\pi\tau^2}}\exp\!\left(-\frac{1}{2\tau^2}(z - \nu)^2\right), \qquad f_W(w; \theta) = \frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}\begin{cases}\exp(\lambda_2 w), & w < 0\\ \exp(-\lambda_1 w), & w \geq 0\end{cases} $$
$$ f_Y(y; \theta) = \frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}\,\phi\!\left(\frac{y - \nu}{\tau}\right)\left[R\!\left(\lambda_1\tau - \frac{y - \nu}{\tau}\right) + R\!\left(\lambda_2\tau + \frac{y - \nu}{\tau}\right)\right], $$
where $\phi(\cdot)$ is the standard normal density and $R(z) = \Phi^c(z)/\phi(z)$ is Mills' ratio. For this choice of $g_i$, $\log f_{Y,Z}(y_i, z; \theta')$ in (16) becomes
$$ \log f_{Y,Z}(y_i, z; \theta') = \log f_Z(z; \theta') + \log f_W(y_i - z; \theta'). $$
Our E-step of the EM algorithm is given in the following theorem, most of the equations of which are mentioned in Reed and Jorgensen (2004) but for which we supply explicit proofs.
Theorem 2.
The expectations in our E-step are as follows:
$$ z_i = \int z\, g_i(z)\,dz = \nu + \tau^2\,\frac{\lambda_1 R(p_i) - \lambda_2 R(q_i)}{R(p_i) + R(q_i)}, $$
$$ z_i^{(2)} = \int z^2 g_i(z)\,dz = \nu^2 + \tau^2 - \tau^2\,\frac{p_i + q_i}{R(p_i) + R(q_i)} + \tau^2\,\frac{(2\nu\lambda_1 + \lambda_1^2\tau^2)R(p_i) + (\lambda_2^2\tau^2 - 2\nu\lambda_2)R(q_i)}{R(p_i) + R(q_i)}, $$
$$ w_i^+ = \int_{-\infty}^{y_i}(y_i - z)\, g_i(z)\,dz = \tau\,\frac{1 - p_i R(p_i)}{R(p_i) + R(q_i)}, \qquad w_i^- = \int_{y_i}^{\infty}(y_i - z)\, g_i(z)\,dz = \tau\,\frac{q_i R(q_i) - 1}{R(p_i) + R(q_i)}, $$
where $p_i = \lambda_1\tau - (y_i - \nu)/\tau$ and $q_i = \lambda_2\tau + (y_i - \nu)/\tau$.
Proof. 
See Appendix A.2. ☐
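The E-step expressions of Theorem 2 can be evaluated stably through Mills' ratio $R(z) = \Phi^c(z)/\phi(z)$. The following sketch, again with illustrative names and computing $R$ via the scaled complementary error function, is one possible implementation under the parameterization used above; it is not the authors' code.

```python
import numpy as np
from scipy.special import erfcx

def mills_ratio(x):
    # R(x) = Phi^c(x)/phi(x) = sqrt(pi/2) * erfcx(x/sqrt(2)), stable for large |x|
    return np.sqrt(np.pi / 2.0) * erfcx(x / np.sqrt(2.0))

def e_step(y, nu, tau, lam1, lam2):
    """Conditional expectations of Theorem 2 for each log-observation y_i."""
    p = lam1 * tau - (y - nu) / tau
    q = lam2 * tau + (y - nu) / tau
    Rp, Rq = mills_ratio(p), mills_ratio(q)
    denom = Rp + Rq
    z = nu + tau**2 * (lam1 * Rp - lam2 * Rq) / denom
    z2 = (nu**2 + tau**2
          - tau**2 * (p + q) / denom
          + tau**2 * ((2 * nu * lam1 + lam1**2 * tau**2) * Rp
                      + (lam2**2 * tau**2 - 2 * nu * lam2) * Rq) / denom)
    w_plus = tau * (1.0 - p * Rp) / denom
    w_minus = tau * (q * Rq - 1.0) / denom
    return z, z2, w_plus, w_minus
```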

3.2.4. Standard Errors

The standard errors of the estimates $\hat\theta = (\hat{\boldsymbol{\beta}}, \hat\tau^2, \hat\lambda_1, \hat\lambda_2)$ can be estimated in the last iteration of the EM algorithm, as shown in Louis (1982). The observed Fisher information matrix evaluated at $\hat\theta$ based on the observations $\{v_i\}_{i=1}^{n}$ can be approximated by
$$ I(\hat\theta; v) \approx \sum_{i=1}^{n} \nabla_\theta \log f_Y(\log v_i; \hat\theta)\, \nabla_\theta \log f_Y(\log v_i; \hat\theta)^{\top}, \qquad (28) $$
where $f_Y$ is as in (9). Since $\nabla_\theta E\big[\log f_{Y,Z}(y_i, Z; \theta) \mid y_i; \hat\theta\big]\big|_{\theta = \hat\theta} = \nabla_\theta \log f_Y(y_i; \hat\theta)$, (28) is equivalent to
$$ \sum_{i=1}^{n} \nabla_\theta E\big[\log f_{Y,Z}(y_i, Z; \theta) \mid y_i; \hat\theta\big]\, \nabla_\theta E\big[\log f_{Y,Z}(y_i, Z; \theta) \mid y_i; \hat\theta\big]^{\top}. $$
In particular,
$$ \nabla_\theta E\big[\log f_{Y,Z}(y_i, Z; \theta) \mid y_i; \hat\theta\big] = \left(\frac{\partial\, E\big[\log f_Z(Z; \nu_i, \tau) \mid y_i; \hat\theta\big]}{\partial(\boldsymbol{\beta}, \tau)},\; \frac{\partial\, E\big[\log f_W(y_i - Z; \lambda_1, \lambda_2) \mid y_i; \hat\theta\big]}{\partial(\lambda_1, \lambda_2)}\right), $$
and therefore these expressions are available in the last iteration of the EM algorithm.
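A simple way to approximate (28) numerically, without the decomposition above, is to form the outer product of per-observation scores obtained by finite differences. The helper below is a hedged sketch of that idea with an illustrative interface, not the authors' implementation.

```python
import numpy as np

def score_outer_product_se(logpdf_i, theta_hat, eps=1e-6):
    """Standard errors from the outer product of per-observation scores, cf. (28).

    logpdf_i(theta) must return the n-vector of contributions log f_Y(y_i; theta);
    the scores are approximated here by central finite differences.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    k = theta_hat.size
    n = logpdf_i(theta_hat).size
    scores = np.zeros((n, k))
    for j in range(k):
        step = np.zeros(k)
        step[j] = eps
        scores[:, j] = (logpdf_i(theta_hat + step) - logpdf_i(theta_hat - step)) / (2.0 * eps)
    info = scores.T @ scores                      # approximate observed information
    return np.sqrt(np.diag(np.linalg.inv(info)))  # approximate standard errors
```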

3.3. Gradient Ascent Method

The gradient ascent method is applied to the likelihood function of the normal skew-Laplace distribution in this subsection. Let $\mathbf{y} = (y_1, \ldots, y_n)$ be a random sample of size $n$ from an NSL distribution. Its log-likelihood function is
$$ \ell(\mathbf{y}; \boldsymbol{\beta}, \tau^2, \lambda_1, \lambda_2) = n\log\lambda_1 + n\log\lambda_2 - n\log(\lambda_1 + \lambda_2) + \sum_{i=1}^{n}\log\!\left[\Phi\!\left(\frac{y_i - \tau^2\lambda_1 - \nu_i}{\tau}\right)\exp\left\{\tfrac{1}{2}\lambda_1^2\tau^2 - \lambda_1(y_i - \nu_i)\right\} + \left(1 - \Phi\!\left(\frac{y_i + \tau^2\lambda_2 - \nu_i}{\tau}\right)\right)\exp\left\{\tfrac{1}{2}\lambda_2^2\tau^2 - \lambda_2(\nu_i - y_i)\right\}\right]. \qquad (29) $$
The solutions of the $d + 3$ score equations shown in the Appendix provide the maximum likelihood estimates of $\lambda_1$, $\lambda_2$, $\tau$ and $\{\beta_j\}_{j = 1, \ldots, d}$, which can be obtained by numerical methods such as the Newton–Raphson algorithm. Alternatively, parameter estimates can be obtained directly via a grid search for the global maximum of the log-likelihood surface given by (29), or equivalently by maximizing the log-likelihood function derived from the expression (8). We have used the FindMaximum function of the Mathematica software package v11.0. Since attainment of the global maximum of the log-likelihood surface is not guaranteed, different initial values in the parameter space can be used as seed points, together with different maximization methods, such as the Newton–Raphson method, the Principal Axis method and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm, among others. The standard errors of the estimates have been approximated by inverting the Hessian matrix, and the relevant partial derivatives can be approximated well by finite differences.
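For readers not using Mathematica, an analogous direct maximization of (29) can be sketched with a quasi-Newton (BFGS) search. The routine below assumes the nsl_logpdf function from the earlier sketch and reparameterizes $\tau$, $\lambda_1$ and $\lambda_2$ on the log scale to enforce positivity; it is a sketch under those assumptions, not the procedure used by the authors.

```python
import numpy as np
from scipy.optimize import minimize

def fit_nsl_glm_direct(y, X, beta0, tau0, lam10, lam20):
    """Direct maximum likelihood for the NSL GLM by a BFGS search (sketch)."""
    d = X.shape[1]
    start = np.concatenate([np.asarray(beta0, dtype=float), np.log([tau0, lam10, lam20])])

    def negloglik(params):
        beta = params[:d]
        tau, lam1, lam2 = np.exp(params[d:])   # log-scale parameters back-transformed
        return -np.sum(nsl_logpdf(y, X @ beta, tau, lam1, lam2))

    res = minimize(negloglik, start, method="BFGS")
    beta_hat = res.x[:d]
    tau_hat, lam1_hat, lam2_hat = np.exp(res.x[d:])
    return beta_hat, tau_hat, lam1_hat, lam2_hat, -res.fun
```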

4. Numerical Applications

In this section, two well-known data sets in the actuarial literature, which can be downloaded from Professor E. Frees' personal website, will be considered to test the practical performance of the DPLN generalized linear model. For the two data sets considered, the EM algorithm for the DPLN GLM was stopped when the relative change of the log-likelihood function was smaller than $1 \times 10^{-10}$. The initial values were calculated by using the estimates of the lognormal GLM and the estimates of the parameters $\lambda_1$ and $\lambda_2$ for the model without covariates.
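Putting the earlier sketches together, an EM driver with the stopping criterion just described might look as follows; e_step, m_step and nsl_logpdf refer to the hypothetical helpers sketched in Section 3 and are not part of the authors' code.

```python
import numpy as np

def fit_dpln_glm_em(y, X, beta, tau2, lam1, lam2, tol=1e-10, max_iter=10000):
    """EM iterations for the DPLN GLM (sketch built on the hypothetical helpers above).

    y is the vector of log-claims and X the design matrix; the loop stops when the
    relative change of the log-likelihood falls below tol.
    """
    loglik = -np.inf
    for _ in range(max_iter):
        tau = np.sqrt(tau2)
        z, z2, w_plus, w_minus = e_step(y, X @ beta, tau, lam1, lam2)   # E-step
        beta, tau2, lam1, lam2 = m_step(X, z, z2, w_plus, w_minus)      # M-step
        new_loglik = np.sum(nsl_logpdf(y, X @ beta, np.sqrt(tau2), lam1, lam2))
        if np.abs(new_loglik - loglik) < tol * np.abs(loglik):
            break
        loglik = new_loglik
    return beta, tau2, lam1, lam2, loglik
```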
Because we are comparing the DPLN generalized linear model with the GB2 GLM, we give here some rudimentary facts concerning the GB2 distribution. Let $Z$ be a random variable having the $Beta(p, q)$ distribution, for $p, q \in (0, \infty)$, as defined in Chapter 6 of Kleiber and Kotz (2003). Then, for $\nu \in (-\infty, \infty)$ and $\tau \in (0, \infty)$, the random variable
$$ V = \exp(\nu)\left(Z/(1 - Z)\right)^{\tau} \qquad (30) $$
has the GB2 distribution and its probability density function can be written as
$$ f_V(v) = \frac{\exp\left\{p\,(\log v - \nu)/\tau\right\}}{v\,\tau\, B(p, q)\left[1 + \exp\left\{(\log v - \nu)/\tau\right\}\right]^{p + q}}, $$
where $v \in (0, \infty)$, $\nu$ is a location parameter, $\tau > 0$ is a scale parameter, $p > 0$ and $q > 0$ are shape parameters and
$$ B(p, q) = \int_0^1 z^{p-1}(1 - z)^{q-1}\,dz = \Gamma(p)\Gamma(q)/\Gamma(p + q) $$
is the Beta function. As for the aforementioned distributions, to include explanatory variables in the model we let the location parameter be a linear function of the covariates, i.e., $\nu = \boldsymbol{\beta}^{\top}\mathbf{x}$. The $k$-th moment is easily seen to be
$$ E[V^k] = \exp(k\nu)\,\frac{B(p + k\tau,\, q - k\tau)}{B(p, q)}, $$
where $k \in (-p/\tau, q/\tau)$, and looking at the case $k = 1$ we can interpret each of the regression coefficients $\beta_i$, $i = 1, \ldots, d$, as the proportional sensitivity of the mean to the corresponding covariate. Further details of this model can be found in Frees et al. (2014a). Parameter estimation for the GB2 GLM has been performed via a grid search for the global maximum of the log-likelihood surface associated with this model. We have used the FindMaximum function of the Mathematica software package v11.0.
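For completeness, a minimal sketch of the GB2 log-density and mean in this parameterization is given below; as before, the function names are illustrative and scipy.special.betaln is used to evaluate the Beta function on the log scale.

```python
import numpy as np
from scipy.special import betaln

def gb2_logpdf(v, nu, tau, p, q):
    """Log-density of the GB2 distribution in the (nu, tau, p, q) parameterization above."""
    v = np.asarray(v, dtype=float)
    s = (np.log(v) - nu) / tau
    # log f = p*s - log(v*tau) - log B(p, q) - (p + q) log(1 + exp(s))
    return p * s - np.log(v * tau) - betaln(p, q) - (p + q) * np.log1p(np.exp(s))

def gb2_mean(nu, tau, p, q):
    """Mean of the GB2 (the k-th moment formula above with k = 1); requires q > tau."""
    return np.exp(nu + betaln(p + tau, q - tau) - betaln(p, q))
```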

4.1. Example 1: Automobile Insurance

The first data set pertains to claims experience from a large midwestern (US) property and casualty insurer for private passenger automobile insurance.
The dependent variable is the amount paid on a closed claim, in US$. The sample includes 6773 claims. The following explanatory variables have been considered to explain the claims amount:
  • GENDER, gender of operator, takes the value 1 if female and 0 otherwise;
  • AGE, age of operator;
  • CLASS, rating class of operator, as coded in Table 1.
In the top part of Figure 1 the histogram of the automobile insurance claims is exhibited on a logarithmic scale. This dataset is quite symmetrical but it presents a slightly longer lower tail. For that reason the DPLN distribution seems suitable to explain this dataset.

4.1.1. Model Without Covariates

Here, for comparison purposes only, the lognormal, DPLN and GB2 distributions will be used to describe the total losses (i.e., when explanatory variables are not considered). Firstly, the automobile insurance claims dataset is examined. Table 2 summarizes the parameter estimates obtained by maximum likelihood, with corresponding standard errors (in brackets), for the aforementioned distributions.
In respect of model selection, we also provide the negative of the maximum of the log-likelihood (NLL), Akaike's information criterion (AIC) and Bayesian information criterion (BIC) results in the table. Note that for all three measures of model validation, smaller values indicate a better fit of the model to the empirical data. As expected, the lognormal distribution exhibits the worst performance in terms of all three measures of model validation. In the top part of Figure 1, we have superimposed the log transformation of these three distributions on the empirical distribution of the log of the claim sizes to test the fit in both tails. It is evident that the log transformation of the lognormal distribution (black curve), i.e., the normal distribution (N), provides the worst fit due to the asymmetry of the data. The logGB2 (blue curve) and NSL (red curve) distributions give a better fit to the data as measured by the NLL, AIC and BIC, with the latter model adhering more closely to the data. Although it is not shown in Table 2, the PLN distribution replicates the fit of the LN distribution, and the value of the shape parameter that controls the right tail tends to infinity. The computing times (CT) in seconds to obtain the maximum likelihood estimates by directly maximizing the log-likelihood surface for these distributions are shown in the last row of the table. The computing time of the EM algorithm for the DPLN GLM was 1145.86 s using the stopping criterion that the relative change of the log-likelihood function be less than $1 \times 10^{-4}$.

4.1.2. Comparison of Estimation from Simulations

In this section we compare the methods of estimating parameters by conducting the following simulation experiment. For the lognormal, DPLN and GB2 distributions, we simulate values based on the corresponding parameter estimates given in Table 2, and then, using as appropriate either standard formulae or the EM algorithm, compute parameter estimates from 1000 simulated data sets of each size N = 100, 200, 300, 400, 500, 1000. The results are shown in Table 3, where it is evident that increasing the sample size increases the accuracy of the parameter estimates. Of course, the true parameter values are those given in Table 2, and these are the limits of the estimates as the sample size N increases. Importantly, the standard errors of the parameter estimates in Table 3 are noticeably smaller for the DPLN distribution, highlighting the consistency of the parameter estimation for the DPLN model. It is noteworthy that the parameter estimates of the GB2 distribution are unstable for small sample sizes, which is not the case for the DPLN model. This highlights an advantage of the DPLN model over the GB2 in this case.
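The simulation experiment relies on the representation (7); a minimal sketch of a DPLN random number generator based on that representation, with an illustrative function name, is the following.

```python
import numpy as np

def simulate_dpln(n, nu, tau, lam1, lam2, seed=None):
    """Draw n DPLN(nu, tau^2, lam1, lam2) variates via (7):
    V = exp(Z + T1 - T2), Z ~ N(nu, tau^2), T1 ~ Exp(lam1), T2 ~ Exp(lam2)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(nu, tau, size=n)
    t1 = rng.exponential(1.0 / lam1, size=n)
    t2 = rng.exponential(1.0 / lam2, size=n)
    return np.exp(z + t1 - t2)
```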

4.1.3. Including Explanatory Variables

Making use of the above additional information, we aim to better explain the total losses in terms of the set of covariates by using the DPLN generalized linear model. For the purpose of comparison, we have also fitted the lognormal and GB2 generalized linear models. Here, we choose the identity link function for the location parameter.
From left to right in Table 1, the parameter estimates, standard errors (S.E.) and the corresponding p-values calculated from the t-Wald statistics for the LN, GB2 and DPLN generalized linear models are displayed for the automobile insurance claims dataset. The AIC and BIC values for each model are provided in the last rows of the table. For the i-th claimant, the total claim amount follows the DPLN GLM with mean given by (10), which depends on the above set of covariates through the identity link function. The exponential of the INTERCEPT coefficient, 7.260, is proportional to the predicted loss amount when the values of the other explanatory variables are equal to 0. This estimate is statistically significant at the usual significance levels, i.e., 5% and 1%. In total, the estimates of 10 out of 23 parameters for the DPLN generalized linear model are statistically significant at the usual levels (i.e., 5% and 1%), including the scale and shape parameters. The results for the LN and GB2 generalized linear models are also exhibited in Table 1 to compare their behaviour with the DPLN generalized linear model. As can be seen, the fit provided by the DPLN generalized linear model improves on that provided by the GB2. For the DPLN generalized linear model, parameters were estimated by the method of maximum likelihood by maximizing the log-likelihood surface. The same estimates were achieved by the EM algorithm described in Section 3.2. The standard errors of the parameter estimates for the DPLN GLM were computed from the last iteration of the EM algorithm and also approximated by inverting the Hessian matrix; similar values were obtained. The computing times in seconds to obtain the maximum likelihood estimates by directly maximizing the log-likelihood surface for these generalized linear models are shown in the last row of the table. The DPLN GLM shows a better performance than the GB2 counterpart. The computing time of the EM algorithm for the DPLN GLM was 2239.24 s using the stopping criterion that the relative change of the log-likelihood function be less than $1 \times 10^{-4}$.

4.1.4. Model Validation

Now, we analyze model validation from a practical perspective. In this regard, the LN generalized linear model can be seen as a limiting case of the DPLN generalized linear model when both $\lambda_1$ and $\lambda_2$ tend to infinity. We are interested, by means of the likelihood ratio test, in determining whether the LN generalized linear model (null hypothesis) is preferable to the DPLN generalized linear model (alternative hypothesis) in describing these datasets. The test statistic is $T = -2(\ell_{LN} - \ell_{DPLN})$, where $\ell_{LN}$ and $\ell_{DPLN}$ represent the maximum of the log-likelihood function for the LN and DPLN generalized linear models respectively. Asymptotically, under certain regularity conditions (see for example Lehmann and Casella (1998)), $T$ follows a chi-square distribution with two degrees of freedom. We have that $T = -2(-57{,}179.69 + 57{,}155.87) = 50.02$; therefore the larger model (DPLN) is preferable to the smaller (LN) generalized linear model at the usual significance levels, i.e., 5% and 1% (p-value less than 0.0001).
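As a small illustration, the likelihood ratio test just described can be coded directly from the maximized log-likelihoods; the helper below is a sketch with placeholder arguments rather than the authors' implementation.

```python
from scipy.stats import chi2

def lrt_ln_vs_dpln(loglik_ln, loglik_dpln):
    """Likelihood ratio test of the LN GLM (null) against the DPLN GLM (alternative).

    The DPLN GLM has two additional shape parameters, hence two degrees of freedom."""
    T = 2.0 * (loglik_dpln - loglik_ln)
    return T, chi2.sf(T, df=2)
```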
Next, the likelihood ratio test proposed by Vuong (1989) for non-nested models will be considered as a tool for model diagnostics. The test statistic is
$$ T = \frac{1}{\omega\sqrt{n}}\left[\sum_{i=1}^{n}\log\frac{f(\hat{\theta}_1)}{g(\hat{\theta}_2)} - \left(\frac{n_f}{2} - \frac{n_g}{2}\right)\log n\right], $$
where
$$ \omega^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\log\frac{f(\hat{\theta}_1)}{g(\hat{\theta}_2)}\right)^2 - \left(\frac{1}{n}\sum_{i=1}^{n}\log\frac{f(\hat{\theta}_1)}{g(\hat{\theta}_2)}\right)^2 $$
is the sample variance of the pointwise log-likelihood ratios, $f$ and $g$ represent the probability density functions (pdf) of the two non-nested models, $\hat{\theta}_1$ and $\hat{\theta}_2$ are the maximum likelihood estimates of $\theta_1$ and $\theta_2$, and $n_f$ and $n_g$ are the numbers of estimated coefficients in the models with pdf $f$ and $g$ respectively. Note that Vuong's statistic is sensitive to the number of estimated parameters in each model and the test must therefore be corrected for dimensionality. Under the null hypothesis, $H_0: E\left[\log\left(f(\hat{\theta}_1)/g(\hat{\theta}_2)\right)\right] = 0$, $T$ is asymptotically normally distributed. At the 5% significance level, the rejection region of this test in favour of the alternative hypothesis is $T > 1.96$.
Now we compare the GB2 and DPLN generalized linear models in terms of Vuong's test. Under the null hypothesis the two models are equally close to the true but unknown specification. For our data set, the value of the test statistic is $T = 1.00$, so we fail to reject $H_0$; there is therefore no significant difference between these two models.
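For reference, Vuong's dimensionality-corrected statistic and its two-sided p-value can be sketched as follows, taking as inputs the pointwise log-densities of the two fitted models; the function name and interface are ours, not the authors'.

```python
import numpy as np
from scipy.stats import norm

def vuong_statistic(logf_i, logg_i, n_f, n_g):
    """Vuong (1989) statistic with the dimensionality correction described above.

    logf_i, logg_i are the pointwise log-densities of the two fitted non-nested
    models at their MLEs; n_f, n_g are the numbers of estimated parameters.
    """
    n = logf_i.size
    lr = logf_i - logg_i
    omega = np.sqrt(np.mean(lr**2) - np.mean(lr)**2)
    T = (np.sum(lr) - 0.5 * (n_f - n_g) * np.log(n)) / (omega * np.sqrt(n))
    return T, 2.0 * norm.sf(np.abs(T))   # statistic and two-sided p-value under H0
```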

4.2. Example 2: Automobile Bodily Injury Claims

The second data set deals with automobile bodily injury claims sourced from the Insurance Research Council (IRC), a division of the American Institute for Chartered Property Casualty Underwriters and the Insurance Institute of America. The data, collected in 2002, contains demographic information about the claimants, attorney involvement and the economic losses (in thousands of US$), among other variables. As some of these explanatory variables contain missing observations, we only consider those data items having no missing values, resulting in a sample of 1,091 losses from a single state. We use as the response variable the claimant’s total economic loss. Also, additional information is available to explain the claimants’ total economic losses. We employ the following factors as covariates in our model fitting:
  • ATTORNEY, takes the value 1 if the claimant is represented by an attorney and 0 otherwise;
  • CLMSEX, takes the value 1 if the claimant is male and 0 otherwise;
  • MARRIED, takes the value 1 if the claimant is married and 0 otherwise;
  • SINGLE, takes the value 1 if the claimant is single and 0 otherwise;
  • WIDOWED, takes the value 1 if the claimant is widowed and 0 otherwise;
  • CLMINSUR, whether or not the claimant’s vehicle was uninsured ( = 1 if yes and 0 otherwise);
  • SEATBELT, whether or not the claimant was wearing a seatbelt/child restraint ( = 1 if yes and 0 otherwise);
  • CLMAGE, claimant’s age.
The empirical distribution of this variable combines losses of small, moderate and large sizes which makes it suitable for fitting heavy-tailed distributions. It has other features such as unimodality, skewness and a long upper tail, indicating a high likelihood of extremely expensive events. In the bottom part of Figure 1 the histogram of the response variable of this data set is shown again in logarithmic scale. A heavy lower tail is evident when this scale is used.

4.2.1. Model Without Covariates

The results for the bodily injury claims data are shown in Table 4. The GB2 and DPLN distributions give the best fit to the data as measured by these three measures of model selection. As expected, the LN distribution has the worst performance due to the asymmetry of the data. Again, although it is not shown in Table 4, the three-parameter PLN model replicates the LN distribution; this is due to the fact that the latter model is a limiting case of the former when the shape parameter $\lambda_1$ tends to infinity. These results are also supported by the bottom part of Figure 1, where it can be seen that the log transformation of the GB2 distribution, LogGB2 (blue curve), and the NSL distribution (red curve) provide an almost identical fit to the data. The MLEs for the DPLN distribution were obtained by using the EM algorithm, whose starting parameter values $\lambda_1$, $\lambda_2$, $\nu$ and $\tau$ are those obtained by matching the first four cumulants. These MLEs were confirmed by those obtained directly from maximizing the log-likelihood surface. The computing times in seconds to obtain the maximum likelihood estimates by directly maximizing the log-likelihood surface for these distributions are shown in the last row of the table. The computing time of the EM algorithm for the DPLN GLM was 1322.81 s using the stopping criterion that the relative change of the log-likelihood function be less than $1 \times 10^{-4}$.

4.2.2. Comparison of Estimation from Simulations

In this section we compare the methods of estimating parameters by conducting the following simulation experiment. For the lognormal and DPLN distributions, we simulate values based on the corresponding parameter estimates given in Table 4, and then, using as appropriate either standard formulae or the EM algorithm, compute parameter estimates from 1000 simulated data sets of each size N = 100, 200, 300, 400, 500, 1000. The results are shown in Table 5, where it is evident that increasing the sample size increases the accuracy of the parameter estimates. Of course, the true parameter values are those given in Table 4, and these are the limits of the estimates as the sample size N increases. Importantly, the standard errors of the parameter estimates in Table 5 are noticeably smaller for the DPLN distribution, highlighting the consistency of the parameter estimation for the DPLN model. However, in attempting to simulate values from the GB2 distribution, calculation of the inverse CDF via the expression in (30) is highly unstable for simulated values of the $Beta(p, q)$ random variable $Z$ which are close to unity.

4.2.3. Including Explanatory Variables

Table 6 displays the same results for the automobile bodily injury claims dataset. For the i-th policyholder, the total claim amount follows the DPLN GLM with mean given by (10), which depends on the above set of covariates through the identity link function. The exponential of the INTERCEPT coefficient, 1.023, is proportional to the predicted loss amount when the values of the other explanatory variables are equal to 0. In view of its low p-value, this estimate is statistically significant at the usual significance levels, 5% and 1%. On the other hand, the indicator ATTORNEY is statistically significant at the usual nominal levels, whereas the gender and marital status of the claimant, except for the explanatory variable SINGLE, are not significant at the 5% significance level. Similarly, the fact that the vehicle was uninsured is not relevant in the investigation. Both the claimant's age and the usage of a seatbelt/child restraint are highly significant. Three more parameters affect the calculation of the predicted mean: the parameter $\tau$, which is also highly significant, and the shape parameters $\lambda_1$ and $\lambda_2$. All three of these parameters are highly statistically significant at the usual nominal levels, 5% and 1%. For the sake of comparison, the results for the LN and GB2 generalized linear models are displayed in Table 6. As can be observed, the fit provided by the GB2 generalized linear model is only marginally better than that of the DPLN generalized linear model. For the DPLN generalized linear model, parameters were estimated by the method of maximum likelihood using log-transformed data and the NSL distribution. The maximum of the log-likelihood function was $-1753.07$ and it was achieved after considering different initial values on the likelihood surface using the FindMaximum function of the Mathematica software package v11.0. Similar estimates were obtained by means of the EM algorithm described in Section 3.2; in this case the same value was obtained for the maximum of the log-likelihood function of the NSL GLM. The standard errors of the parameter estimates for the DPLN GLM have been approximated by inverting the Hessian matrix and also from the last iteration of the EM algorithm; similar values were obtained. The computing times in seconds to obtain the maximum likelihood estimates by directly maximizing the log-likelihood surface for these generalized linear models are shown in the last row of the table. The DPLN GLM shows a better performance than the GB2 counterpart. The computing time of the EM algorithm for the DPLN GLM was 142.57 s using the stopping criterion that the relative change of the log-likelihood function be less than $1 \times 10^{-4}$.

4.2.4. Model Validation

As done for our first example, we analyze model validation from a practical perspective. The test statistic is $T = -2(\ell_{LN} - \ell_{DPLN})$, where $\ell_{LN}$ and $\ell_{DPLN}$ represent the maximum of the log-likelihood function for the LN and DPLN generalized linear models respectively, and $T$ asymptotically follows a chi-square distribution with two degrees of freedom. For the automobile bodily injury claims data set, it is verified that $T = -2(-2450.54 + 2430.02) = 20.52$. Then, at the usual significance levels (the p-value is less than 0.0001), the null hypothesis is clearly rejected and consequently the smaller regression (LN) is rejected in favour of the model based on the DPLN distribution.
Also, as done for our first example, Vuong's test statistic is $T = 1.00$ for our second data set and we fail to reject $H_0$; there is therefore no significant difference between these two models.

4.3. Log-Residuals for Assessing Goodness-of-Fit

In the following we consider the log-residuals for assessing the goodness-of-fit of the proposed models for the two datasets considered. As the population moments of order higher than two can be derived neither for the DPLN (since $\lambda_1 < 2$) nor for the GB2 (since the condition $-p < 2\tau < q$ is not satisfied) in the case of the automobile bodily injury claims dataset, we have not examined Pearson-type residuals. In Figure 2, one can see the QQ-plots of the log-residuals for the LN, GB2 and DPLN generalized linear models for the automobile insurance claims data set (left hand side) and the automobile bodily injury claims data set (right hand side). The alignment along the 45-degree line is better for both the DPLN and GB2 generalized linear models, in the central part and in both tails of the distribution of the residuals, as compared to the LN generalized linear model for the two datasets analyzed.

4.4. Out-of-Sample Validation of Models

We demonstrate the abilities of the models to predict portfolio losses out-of-sample with the probability-probability plots shown in Figure 3. The data set $\{v_1, v_2, \ldots, v_n\}$ is partitioned into two halves by sorting the claim sizes in ascending order, that is, we write the data set as
$$ v_{i_1} < v_{i_2} < v_{i_3} < \cdots < v_{i_n}, $$
where $i_1, \ldots, i_n \in \{1, 2, \ldots, n\}$, and then two data sets are formed, $A = \{v_{i_1}, v_{i_3}, v_{i_5}, \ldots\}$ and $B = \{v_{i_2}, v_{i_4}, v_{i_6}, \ldots\}$, alternating the data set to which each claim in the ordered data set is allocated. In this way the second data set is a good representation of the first data set in respect of the claim distribution, but not necessarily in respect of the corresponding covariates, i.e., $\{\mathbf{x}^{(i_2)}, \mathbf{x}^{(i_4)}, \ldots\}$ may not be representative of $\{\mathbf{x}^{(i_1)}, \mathbf{x}^{(i_3)}, \ldots\}$. The data set $A$ is used for fitting the models, whereas the data set $B$ is used for graphing the probability-probability plots. In Figure 4 and Figure 5 we focus on the lower and upper tails of the distributions respectively, where it is evident that the DPLN and GB2 models provide the best fit and are almost indistinguishable.
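The alternating split just described can be sketched in a few lines; the function name is illustrative.

```python
import numpy as np

def alternating_split(v, X):
    """Split claims into sets A and B by alternating over the size-ordered claims;
    A is used for fitting and B for out-of-sample validation, as described above."""
    order = np.argsort(v)
    idx_a, idx_b = order[0::2], order[1::2]
    return (v[idx_a], X[idx_a]), (v[idx_b], X[idx_b])
```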
In Figure 6 the net losses under the various models are shown, where, as before, we have used half of the data for fitting the models and the other half for computing the net losses and maximum probable losses based on the 99.5-th percentile. It is evident that the DPLN and GB2 models give a higher computed maximum probable loss than the LN distribution, thus illustrating the ability of these models to provide adequate solvency levels when extreme claims are experienced by the insurer's portfolio.

5. Conclusions

In this paper, the DPLN generalized linear model was developed and fitted to two data sets, these being private passenger automobile insurance claims data and automobile bodily injury claims data. Several covariates pertaining to various attributes of insurance claimants were combined in the linear predictor of the location parameter ν , and were chosen because of their anticipated effect on claim size. This model exhibits Paretian behaviour in both tails and it is shown to provide fits to the two data sets which are comparable to those of the GB2 distribution.
The parameters of the DPLN generalized linear model were estimated via the EM algorithm and independently confirmed by maximizing the log-likelihood surface of the closely related Normal Laplace generalized linear model. The performance of the DPLN model has been compared with the lognormal distribution, a limiting case of the DPLN distribution, and the GB2 generalized linear model according to different model selection criteria. In view of the results obtained, we have found that the proposed DPLN generalized linear model is a valid alternative to other parametric heavy-tailed generalized linear models such as the GB2 GLM.
Potential practical applications of the DPLN GLM, beyond what is demonstrated in this article, include predicting mortality rates for lives where the covariates of the GLM are age, sex, occupation, etc. and predicting hazard rates in reduced-form credit risk models. These will be considered in further work.

Acknowledgments

The authors acknowledge the financial support from the Faculty of Business and Economics, University of Melbourne via a grant awarded for 2017. Also, the authors are grateful for the helpful suggestions of the reviewers.

Author Contributions

These authors contributed equally to this work.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Theorem 1

Looking at (18) we see that the expression separates into two parts, as mentioned in Reed and Jorgensen (2004), the first being dependent upon $\boldsymbol{\beta}'$ and $\tau'$ and the second being dependent upon $\lambda_1'$ and $\lambda_2'$. The first part,
$$ n\log\frac{1}{\sqrt{2\pi(\tau')^2}} - \frac{1}{2(\tau')^2}\sum_{i=1}^{n} z_i^{(2)} + \frac{1}{(\tau')^2}\sum_{i=1}^{n} z_i\nu_i' - \frac{1}{2(\tau')^2}\sum_{i=1}^{n}(\nu_i')^2, $$
can be rewritten using matrix notation as
$$ n\log\frac{1}{\sqrt{2\pi(\tau')^2}} - \frac{1}{2(\tau')^2}\left(\mathbf{1}^{\top}Z^{(2)} - 2Z^{\top}X\boldsymbol{\beta}' + \boldsymbol{\beta}'^{\top}X^{\top}X\boldsymbol{\beta}'\right). $$
Viewed as a quadratic form in $\boldsymbol{\beta}'$, the optimum value of $\boldsymbol{\beta}'$ is
$$ \boldsymbol{\beta}' = (X^{\top}X)^{-1}X^{\top}Z $$
and the first part becomes
$$ n\log\frac{1}{\sqrt{2\pi(\tau')^2}} - \frac{1}{2(\tau')^2}\left(\mathbf{1}^{\top}Z^{(2)} - Z^{\top}X(X^{\top}X)^{-1}X^{\top}Z\right), $$
which is to be maximized with respect to $\tau'$. Differentiating this with respect to $\tau'$ and equating to zero gives the update for $\tau'$ stated in the theorem.
The second part,
$$ n\log\frac{\lambda_1'\lambda_2'}{\lambda_1' + \lambda_2'} + \lambda_2'\sum_{i=1}^{n} w_i^- - \lambda_1'\sum_{i=1}^{n} w_i^+, $$
can be rewritten as
$$ n\left(\log\frac{\lambda_1'\lambda_2'}{\lambda_1' + \lambda_2'} - \lambda_2' Q - \lambda_1' P\right), $$
where
$$ P = \frac{1}{n}\sum_{i=1}^{n} w_i^+, \qquad Q = -\frac{1}{n}\sum_{i=1}^{n} w_i^-. $$
The optimum values of $\lambda_1'$ and $\lambda_2'$ are found by equating the first order partial derivatives to zero, with closed-form solutions
$$ \lambda_1' = \frac{1}{P + \sqrt{PQ}}, \qquad \lambda_2' = \frac{1}{Q + \sqrt{PQ}}. $$

Appendix A.2. Proof of Theorem 2

The first expectation is computed as follows:
$$ z_i = \int z\, g_i(z)\,dz = \frac{1}{f_Y(y_i; \theta)}\int z\,\frac{1}{\sqrt{2\pi\tau^2}}\exp\!\left(-\frac{(z - \nu)^2}{2\tau^2}\right)\times\frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}\begin{cases}\exp(\lambda_2(y_i - z)), & y_i - z < 0\\ \exp(-\lambda_1(y_i - z)), & y_i - z \geq 0\end{cases}\,dz $$
$$ = \frac{1}{f_Y(y_i; \theta)}\,\frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}\left\{\int_{y_i}^{\infty} z\,\frac{1}{\sqrt{2\pi\tau^2}}\exp\!\left(-\frac{1}{2\tau^2}\left(z^2 + \nu^2 - 2\nu z + 2\tau^2\lambda_2 z - 2\tau^2\lambda_2 y_i\right)\right)dz + \int_{-\infty}^{y_i} z\,\frac{1}{\sqrt{2\pi\tau^2}}\exp\!\left(-\frac{1}{2\tau^2}\left(z^2 + \nu^2 - 2\nu z - 2\tau^2\lambda_1 z + 2\tau^2\lambda_1 y_i\right)\right)dz\right\}. $$
The first of these integrals simplifies as
$$ \exp\!\left(-\frac{1}{2\tau^2}\left(\nu^2 - (\nu - \tau^2\lambda_2)^2 - 2\tau^2\lambda_2 y_i\right)\right)\int_{y_i}^{\infty} z\,\frac{1}{\tau}\,\phi\!\left(\frac{z - (\nu - \tau^2\lambda_2)}{\tau}\right)dz = \frac{\phi((y_i - \nu)/\tau)}{\phi(q_i)}\left\{(\nu - \tau^2\lambda_2)\,\Phi^c\!\left(\frac{y_i - (\nu - \tau^2\lambda_2)}{\tau}\right) + \tau\,\phi\!\left(\frac{y_i - (\nu - \tau^2\lambda_2)}{\tau}\right)\right\} = \phi\!\left(\frac{y_i - \nu}{\tau}\right)\left[(\nu - \tau^2\lambda_2)\,R(q_i) + \tau\right]. $$
The second of these integrals simplifies as
$$ \exp\!\left(-\frac{1}{2\tau^2}\left(\nu^2 - (\nu + \tau^2\lambda_1)^2 + 2\tau^2\lambda_1 y_i\right)\right)\int_{-\infty}^{y_i} z\,\frac{1}{\tau}\,\phi\!\left(\frac{z - (\nu + \tau^2\lambda_1)}{\tau}\right)dz = \frac{\phi((y_i - \nu)/\tau)}{\phi(p_i)}\left\{(\nu + \tau^2\lambda_1)\,\Phi\!\left(\frac{y_i - (\nu + \tau^2\lambda_1)}{\tau}\right) - \tau\,\phi\!\left(\frac{y_i - (\nu + \tau^2\lambda_1)}{\tau}\right)\right\} = \phi\!\left(\frac{y_i - \nu}{\tau}\right)\left[(\nu + \tau^2\lambda_1)\,R(p_i) - \tau\right]. $$
Combining both integrals in the simplified formula for $z_i$, and using $f_Y(y_i; \theta) = \frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}\,\phi((y_i - \nu)/\tau)\left[R(p_i) + R(q_i)\right]$, gives
$$ z_i = \nu + \tau^2\,\frac{\lambda_1 R(p_i) - \lambda_2 R(q_i)}{R(p_i) + R(q_i)}. $$
The second expectation is computed similarly:
$$ z_i^{(2)} = \int z^2 g_i(z)\,dz = \frac{1}{f_Y(y_i; \theta)}\int z^2\,\frac{1}{\sqrt{2\pi\tau^2}}\exp\!\left(-\frac{(z - \nu)^2}{2\tau^2}\right)\times\frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}\begin{cases}\exp(\lambda_2(y_i - z)), & y_i - z < 0\\ \exp(-\lambda_1(y_i - z)), & y_i - z \geq 0\end{cases}\,dz = \nu^2 + \tau^2 - \tau^2\,\frac{p_i + q_i}{R(p_i) + R(q_i)} + \tau^2\,\frac{(2\nu\lambda_1 + \lambda_1^2\tau^2)R(p_i) + (\lambda_2^2\tau^2 - 2\nu\lambda_2)R(q_i)}{R(p_i) + R(q_i)}. $$
The third expectation is computed as follows:
$$ w_i^+ = \int_{-\infty}^{y_i}(y_i - z)\, g_i(z)\,dz = \frac{1}{f_Y(y_i; \theta)}\int_{-\infty}^{y_i}(y_i - z)\,\frac{1}{\sqrt{2\pi\tau^2}}\exp\!\left(-\frac{(z - \nu)^2}{2\tau^2}\right)\frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}\exp(-\lambda_1(y_i - z))\,dz = \tau\,\frac{1 - p_i R(p_i)}{R(p_i) + R(q_i)}. $$
The fourth expectation is computed as follows:
$$ w_i^- = \int_{y_i}^{\infty}(y_i - z)\, g_i(z)\,dz = \frac{1}{f_Y(y_i; \theta)}\int_{y_i}^{\infty}(y_i - z)\,\frac{1}{\sqrt{2\pi\tau^2}}\exp\!\left(-\frac{(z - \nu)^2}{2\tau^2}\right)\frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}\exp(\lambda_2(y_i - z))\,dz = \tau\,\frac{q_i R(q_i) - 1}{R(p_i) + R(q_i)}. $$

Appendix A.3. Score Equations

The score equations to be solved for the calculation of the maximum likelihood estimates are obtained by differentiating (29). Writing $\nu_i = \boldsymbol{\beta}^{\top}\mathbf{x}^{(i)}$,
$$ A_i = \exp\left\{\tfrac{1}{2}\lambda_1^2\tau^2 - \lambda_1(y_i - \nu_i)\right\}, \quad C_i = \exp\left\{\tfrac{1}{2}\lambda_2^2\tau^2 + \lambda_2(y_i - \nu_i)\right\}, \quad a_i = \frac{y_i - \lambda_1\tau^2 - \nu_i}{\tau}, \quad c_i = \frac{y_i + \lambda_2\tau^2 - \nu_i}{\tau} $$
and $D_i = A_i\,\Phi(a_i) + C_i\,\Phi^c(c_i)$, the score equations are
$$ \frac{\partial\ell}{\partial\lambda_1} = \frac{n}{\lambda_1} - \frac{n}{\lambda_1 + \lambda_2} + \sum_{i=1}^{n}\frac{A_i\left[(\lambda_1\tau^2 - y_i + \nu_i)\,\Phi(a_i) - \tau\,\phi(a_i)\right]}{D_i} = 0, $$
$$ \frac{\partial\ell}{\partial\lambda_2} = \frac{n}{\lambda_2} - \frac{n}{\lambda_1 + \lambda_2} + \sum_{i=1}^{n}\frac{C_i\left[(\lambda_2\tau^2 + y_i - \nu_i)\,\Phi^c(c_i) - \tau\,\phi(c_i)\right]}{D_i} = 0, $$
$$ \frac{\partial\ell}{\partial\tau} = \sum_{i=1}^{n}\frac{\lambda_1^2\tau\,A_i\,\Phi(a_i) + \lambda_2^2\tau\,C_i\,\Phi^c(c_i) - (\lambda_1 + \lambda_2)\,\phi\!\left(\frac{y_i - \nu_i}{\tau}\right)}{D_i} = 0, $$
$$ \frac{\partial\ell}{\partial\beta_j} = \sum_{i=1}^{n} x_{ij}\,\frac{\lambda_1 A_i\,\Phi(a_i) - \lambda_2 C_i\,\Phi^c(c_i)}{D_i} = 0, \qquad j = 1, \ldots, d, $$
where $x_{ij}$ denotes the $j$-th covariate of the $i$-th observation and $\phi(\cdot)$ is the standard normal density; in simplifying the $\tau$ and $\beta_j$ equations we have used the identity $A_i\,\phi(a_i) = C_i\,\phi(c_i) = \phi((y_i - \nu_i)/\tau)$.

References

  1. Colombi, Roberto. 1990. A new model of income distribution: The Pareto lognormal distribution. In Income and Wealth Distribution, Inequality and Poverty. Edited by C. Dagum and M. Zenga. Berlin: Springer, pp. 18–32. [Google Scholar]
  2. Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39: 1–38. [Google Scholar]
  3. Frees, Edward W., Richard A. Derrig, and Glenn Meyers. 2014a. Predictive Modeling Applications in Actuarial Science, Volume 1. New York: Cambridge University Press. [Google Scholar]
  4. Frees, Edward W., Richard A. Derrig, and Glenn Meyers. 2014b. Predictive Modeling Applications in Actuarial Science, Volume 2. New York: Cambridge University Press. [Google Scholar]
  5. Frees, Edward W., and Emiliano A. Valdez. 2008. Hierarchical Insurance Claims Modeling. Journal of the American Statistical Association 103: 1457–69. [Google Scholar] [CrossRef]
  6. Giesen, Kristian, Arndt Zimmermann, and Jens Suedekum. 2010. The size distribution across all cities—Double Pareto lognormal strikes. Journal of Urban Economics 68: 129–37. [Google Scholar] [CrossRef]
  7. Hajargasht, Gholamreza, and William E. Griffiths. 2013. Pareto-lognormal distributions: Inequality, poverty, and estimation from grouped income data. Economic Modelling 33: 593–604. [Google Scholar] [CrossRef]
  8. Hürlimann, Werner. 2014. Pareto type distributions and excess-of-loss reinsurance. International Journal of Recent Research and Applied Studies 18: 1. [Google Scholar]
  9. Kleiber, Christian, and Samuel Kotz. 2003. Statistical Size Distributions in Economics and Actuarial Sciences. Hoboken: Wiley. [Google Scholar]
  10. Kočović, Jelena, Vesna Ćojbašić Rajić, and Milan Jovanović. 2015. Estimating a tail of the mixture of log-normal and inverse gaussian distribution. Scandinavian Actuarial Journal 2015: 49–58. [Google Scholar] [CrossRef]
  11. Lehmann, Erich Leo, and George Casella. 1998. Theory of Point Estimation, 2nd ed. New York: Springer. [Google Scholar]
  12. Louis, Thomas A. 1982. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society Series B 44: 226–33. [Google Scholar]
  13. McDonald, James B. 1990. Regression model for positive random variables. Journal of Econometrics 43: 227–51. [Google Scholar] [CrossRef]
  14. Ramirez-Cobo, Pepa, R. E. Lillo, S. Wilson, and M. P. Wiper. 2010. Bayesian inference for double pareto lognormal queues. The Annals of Applied Statistics 4: 1533–57. [Google Scholar] [CrossRef]
  15. Reed, William J. 2003. The Pareto law of incomes - an explanation and an extension. Physica A 319: 469–86. [Google Scholar] [CrossRef]
  16. Reed, William J., and Murray Jorgensen. 2004. The Double Pareto-Lognormal Distribution—A new parametric model for size distributions. Communications in Statistics - Theory and Methods 33: 1733–53. [Google Scholar] [CrossRef]
  17. Vuong, Quang H. 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society 57: 307–33. [Google Scholar] [CrossRef]
  18. Wills, M., E. Valdez, and E. Frees. 2006. GB2 regression with insurance claim severities. Paper presented at the UNSW Actuarial Research Symposium, Sydney, New South Wales, Australia, November 9. [Google Scholar]
  19. Yamaguchi, Kazuo. 1992. Accelerated failure-time regression models with a regression model of surviving fraction: An application to the analysis of ’permanent employment’ in Japan. Journal of the American Statistical Association 87: 284–92. [Google Scholar]
Figure 1. Empirical distribution of the logarithm of automobile insurance claims (above) and logarithm of automobile bodily injury claims (below). The log transformations of the LN (N, black), DPLN (normal skew-Laplace (NSL), red) and GB2 (LogGB2, blue) distributions have been superimposed.
Figure 2. QQ-plots of the log-residuals for LN (above), GB2 (middle) and DPLN (below) generalized linear models for automobile insurance claims data (left panel) and automobile bodily injury claims (right panel).
Figure 3. Probability-probability plot (out-of-sample).
Figure 4. Probability-probability plot of lower tail (out-of-sample).
Figure 5. Probability-probability plots of upper tail (out-of-sample).
Figure 6. Net losses on a portfolio of automobile bodily injury claims for various models (out-of-sample).
Table 1. Parameter estimates, standard errors (S.E.) and p-values of the t-test for automobile insurance claims dataset under lognormal distribution (LN), generalized beta distribution of the second kind (GB2) and double Pareto lognormal distribution (DPLN) generalized linear models.
Generalized Linear Model
Estimate (S.E.) | LN | GB2 | DPLN
INTERCEPT | 7.184 (0.150) | 7.234 (0.163) | 7.260 (0.080)
p-value | <0.0001 | <0.0001 | <0.0001
GENDER | −0.035 (0.027) | −0.012 (0.027) | −0.039 (0.014)
p-value | 0.1918 | 0.6604 | 0.0073
AGE | −0.004 (0.002) | −0.004 (0.002) | −0.005 (0.001)
p-value | 0.0167 | 0.0110 | <0.0001
C1 | 0.018 (0.118) | −0.002 (0.115) | 0.017 (0.063)
p-value | 0.8760 | 0.9877 | 0.7889
C11 | 0.063 (0.116) | 0.021 (0.114) | 0.063 (0.062)
p-value | 0.5853 | 0.8567 | 0.3146
C1A | −0.076 (0.165) | −0.047 (0.161) | −0.085 (0.088)
p-value | 0.6453 | 0.7687 | 0.3389
C1B | 0.057 (0.122) | 0.008 (0.120) | 0.055 (0.066)
p-value | 0.6411 | 0.9471 | 0.4045
C1C | −0.164 (0.206) | −0.154 (0.203) | −0.1392 (0.110)
p-value | 0.4267 | 0.4498 | 0.2075
C2 | −0.134 (0.176) | 0.034 (0.170) | −0.132 (0.094)
p-value | 0.4450 | 0.8407 | 0.1626
C6 | 0.070 (0.120) | 0.033 (0.118) | 0.086 (0.065)
p-value | 0.5594 | 0.7767 | 0.1815
C7 | −0.030 (0.116) | −0.028 (0.114) | −0.033 (0.062)
p-value | 0.7983 | 0.8071 | 0.5960
C71 | 0.018 (0.115) | −0.029 (0.113) | 0.013 (0.062)
p-value | 0.8725 | 0.7941 | 0.8380
C72 | 0.239 (0.160) | 0.036 (0.157) | 0.226 (0.086)
p-value | 0.1367 | 0.8203 | 0.0087
C7A | 0.127 (0.150) | 0.225 (0.147) | 0.123 (0.080)
p-value | 0.3965 | 0.1249 | 0.1248
C7B | 0.128 (0.118) | 0.091 (0.116) | 0.129 (0.063)
p-value | 0.2806 | 0.4313 | 0.042
C7C | 0.282 (0.162) | 0.173 (0.158) | 0.270 (0.087)
p-value | 0.0824 | 0.2735 | 0.0020
F1 | 0.103 (0.228) | −0.134 (0.222) | 0.132 (0.122)
p-value | 0.6499 | 0.5462 | 0.2785
F11 | −0.087 (0.203) | −0.177 (0.202) | −0.099 (0.109)
p-value | 0.6675 | 0.3798 | 0.3623
F6 | 0.058 (0.144) | 0.069 (0.142) | 0.090 (0.077)
p-value | 0.6880 | 0.6300 | 0.2434
F7 | −0.347 (0.178) | −0.382 (0.172) | −0.351 (0.095)
p-value | 0.0508 | 0.0266 | 0.0002
τ | 1.068 (0.009) | 0.968 (0.111) | 0.810 (0.006)
p-value | <0.0001 | <0.0001 | <0.0001
p or λ1 | | 2.083 (0.371) | 2.127 (0.032)
p-value | | <0.0001 | <0.0001
q or λ2 | | 2.109 (0.427) | 1.952 (0.029)
p-value | | 0.0001 | <0.0001
NLL | 57,164.4 | 57,145.2 | 57,139.3
AIC | 114,370.7 | 114,336.6 | 114,324.6
BIC | 114,513.9 | 114,493.2 | 114,481.6
CT | 3.0108 | 95.4570 | 91.1358
Table 2. Model fitting results of the LN, GB2 and DPLN distributions regarding automobile insurance claims.
Distribution
Estimate (S.E.) | LN | GB2 | DPLN
ν | 6.956 (0.013) | 6.945 (0.074) | 7.009 (0.007)
τ | 1.071 (0.009) | 0.916 (0.089) | 0.824 (0.006)
p or λ1 | | 1.914 (0.289) | 2.191 (0.033)
q or λ2 | | 1.897 (0.316) | 1.961 (0.029)
NLL | 57,185.1 | 57,162.5 | 57,161.5
AIC | 114,374 | 114,333 | 114,331
BIC | 114,390 | 114,360 | 114,358
CT | 0.2340 | 3.8376 | 12.5113
Table 3. Results of the simulation experiment involving 1000 simulations of data sets of size N, with standard errors shown in brackets.
Distribution
Sample Size N | LN | DPLN | GB2
100 | $\hat{\nu}$ = 6.9551 (0.1017) | $\hat{\nu}$ = 7.0095 (0.1401) | $\hat{\nu}$ = 7.0670 (2.1561)
 | $\hat{\tau}$ = 1.0599 (0.0779) | $\hat{\tau}$ = 0.8091 (0.1063) | $\hat{\tau}$ = 1.3188 (6.6295)
 | | $\hat{\lambda}_1$ = 2.3715 (0.5056) | $\hat{p}$ = 146.9750 (2403.6500)
 | | $\hat{\lambda}_2$ = 2.1121 (0.4517) | $\hat{q}$ = 180.7270 (3075.6600)
200 | $\hat{\nu}$ = 6.9565 (0.0763) | $\hat{\nu}$ = 7.0098 (0.1015) | $\hat{\nu}$ = 6.9789 (0.5097)
 | $\hat{\tau}$ = 1.0684 (0.0540) | $\hat{\tau}$ = 0.8176 (0.0750) | $\hat{\tau}$ = 0.5591 (1.7972)
 | | $\hat{\lambda}_1$ = 2.3024 (0.3785) | $\hat{p}$ = 11.3055 (204.7950)
 | | $\hat{\lambda}_2$ = 2.0471 (0.3450) | $\hat{q}$ = 12.6484 (231.6380)
300 | $\hat{\nu}$ = 6.9569 (0.0610) | $\hat{\nu}$ = 7.0038 (0.0859) | $\hat{\nu}$ = 6.9602 (0.5468)
 | $\hat{\tau}$ = 1.0668 (0.0445) | $\hat{\tau}$ = 0.8235 (0.0636) | $\hat{\tau}$ = 0.3635 (0.3605)
 | | $\hat{\lambda}_1$ = 2.2636 (0.3131) | $\hat{p}$ = 1.0887 (5.1770)
 | | $\hat{\lambda}_2$ = 2.0389 (0.2892) | $\hat{q}$ = 1.7380 (26.1729)
400 | $\hat{\nu}$ = 6.9578 (0.0531) | $\hat{\nu}$ = 7.0031 (0.0772) | $\hat{\nu}$ = 6.9411 (0.2291)
 | $\hat{\tau}$ = 1.0695 (0.0368) | $\hat{\tau}$ = 0.8211 (0.0560) | $\hat{\tau}$ = 0.3052 (0.2085)
 | | $\hat{\lambda}_1$ = 2.2368 (0.2761) | $\hat{p}$ = 0.7963 (4.7610)
 | | $\hat{\lambda}_2$ = 2.0189 (0.2553) | $\hat{q}$ = 0.6528 (0.7841)
500 | $\hat{\nu}$ = 6.9555 (0.0465) | $\hat{\nu}$ = 7.0104 (0.0665) | $\hat{\nu}$ = 6.9409 (0.0843)
 | $\hat{\tau}$ = 1.0709 (0.0333) | $\hat{\tau}$ = 0.8208 (0.0489) | $\hat{\tau}$ = 0.2941 (0.1768)
 | | $\hat{\lambda}_1$ = 2.2289 (0.2450) | $\hat{p}$ = 0.6268 (0.4547)
 | | $\hat{\lambda}_2$ = 1.9930 (0.2162) | $\hat{q}$ = 0.6036 (0.4101)
1000 | $\hat{\nu}$ = 6.9553 (0.0335) | $\hat{\nu}$ = 7.0077 (0.0488) | $\hat{\nu}$ = 6.9493 (0.0526)
 | $\hat{\tau}$ = 1.0699 (0.0242) | $\hat{\tau}$ = 0.8208 (0.0358) | $\hat{\tau}$ = 0.2680 (0.1085)
 | | $\hat{\lambda}_1$ = 2.2055 (0.1776) | $\hat{p}$ = 0.5409 (0.2539)
 | | $\hat{\lambda}_2$ = 1.9722 (0.1558) | $\hat{q}$ = 0.5344 (0.2460)
Table 4. Results of fitting the LN, GB2 and DPLN distributions to automobile bodily injury claims data.
Distribution
Estimate (S.E.) | LN | GB2 | DPLN
ν | 0.620 (0.044) | 1.204 (0.052) | 1.200 (0.040)
τ | 1.445 (0.031) | 0.022 (0.186) | 0.047 (0.150)
p or λ1 | | 0.017 (0.140) | 1.324 (0.068)
q or λ2 | | 0.030 (0.247) | 0.749 (0.025)
NLL | 2626.74 | 2573.47 | 2573.47
AIC | 5257.48 | 5154.94 | 5154.94
BIC | 5267.47 | 5174.92 | 5174.92
CT | 0.1716 | 3.4476 | 3.0888
Table 5. Results of the simulation experiment involving 1000 simulations of data sets of size N, with standard errors shown in brackets.
Distribution
Sample Size N | LN | DPLN
100 | $\hat{\nu}$ = 0.6220 (0.1442) | $\hat{\nu}$ = 1.2001 (0.0028)
 | $\hat{\tau}$ = 1.4336 (0.1054) | $\hat{\tau}$ = 0.0470 (0.0004)
 | | $\hat{\lambda}_1$ = 1.3611 (0.1934)
 | | $\hat{\lambda}_2$ = 0.7663 (0.0861)
200 | $\hat{\nu}$ = 0.6214 (0.0995) | $\hat{\nu}$ = 1.2001 (0.0020)
 | $\hat{\tau}$ = 1.4397 (0.0714) | $\hat{\tau}$ = 0.0470 (0.0002)
 | | $\hat{\lambda}_1$ = 1.3472 (0.1379)
 | | $\hat{\lambda}_2$ = 0.7581 (0.0606)
300 | $\hat{\nu}$ = 0.6189 (0.0820) | $\hat{\nu}$ = 1.2000 (0.0016)
 | $\hat{\tau}$ = 1.4430 (0.0580) | $\hat{\tau}$ = 0.0470 (0.0002)
 | | $\hat{\lambda}_1$ = 1.3348 (0.1076)
 | | $\hat{\lambda}_2$ = 0.7537 (0.0483)
400 | $\hat{\nu}$ = 0.6201 (0.0737) | $\hat{\nu}$ = 1.2000 (0.0015)
 | $\hat{\tau}$ = 1.4377 (0.0507) | $\hat{\tau}$ = 0.0470 (0.0002)
 | | $\hat{\lambda}_1$ = 1.3332 (0.0921)
 | | $\hat{\lambda}_2$ = 0.7509 (0.0419)
500 | $\hat{\nu}$ = 0.6223 (0.0627) | $\hat{\nu}$ = 1.2000 (0.0013)
 | $\hat{\tau}$ = 1.4430 (0.0440) | $\hat{\tau}$ = 0.0470 (0.0002)
 | | $\hat{\lambda}_1$ = 1.3335 (0.0848)
 | | $\hat{\lambda}_2$ = 0.7520 (0.0383)
1000 | $\hat{\nu}$ = 0.6210 (0.0449) | $\hat{\nu}$ = 1.2000 (0.0009)
 | $\hat{\tau}$ = 1.4434 (0.0329) | $\hat{\tau}$ = 0.0470 (0.0001)
 | | $\hat{\lambda}_1$ = 1.3273 (0.0581)
 | | $\hat{\lambda}_2$ = 0.7486 (0.0273)
Table 6. Parameter estimates, standard errors (S.E.) and p-values of the t-test for automobile bodily injury claims dataset under LN, GB2 and DPLN generalized linear models.
Generalized Linear Model
Estimate (S.E.) | LN | GB2 | DPLN
INTERCEPT | 0.764 (0.382) | 1.083 (0.383) | 1.023 (0.376)
p-value | 0.0458 | 0.0048 | 0.0067
ATTORNEY | 1.368 (0.075) | 1.215 (0.079) | 1.213 (0.075)
p-value | <0.0001 | <0.0001 | <0.0001
CLMSEX | −0.103 (0.076) | −0.135 (0.070) | −0.135 (0.069)
p-value | 0.1757 | 0.0524 | 0.0516
MARRIED | −0.221 (0.235) | −0.350 (0.233) | −0.352 (0.234)
p-value | 0.3464 | 0.1340 | 0.1320
SINGLE | −0.378 (0.241) | −0.494 (0.237) | −0.498 (0.237)
p-value | 0.1171 | 0.0374 | 0.0360
WIDOWED | −0.887 (0.430) | −0.748 (0.417) | −0.744 (0.419)
p-value | 0.0393 | 0.0730 | 0.0763
CLMINSUR | −0.009 (0.127) | −0.043 (0.116) | −0.041 (0.115)
p-value | 0.9448 | 0.7091 | 0.7218
SEATBELT | −0.996 (0.278) | −0.785 (0.272) | −0.768 (0.272)
p-value | 0.0015 | 0.0040 | 0.0048
CLMAGE | 0.014 (0.003) | 0.013 (0.003) | 0.013 (0.003)
p-value | 0.0010 | <0.0001 | <0.0001
τ | 1.230 (0.026) | 0.448 (0.129) | 0.538 (0.110)
p-value | <0.0001 | 0.0006 | <0.0001
p or λ1 | | 0.513 (0.185) | 1.458 (0.139)
p-value | | 0.0055 | <0.0001
q or λ2 | | 0.670 (0.252) | 1.112 (0.085)
p-value | | 0.0079 | <0.0001
NLL | 2450.54 | 2429.59 | 2430.02
AIC | 4921.09 | 4883.18 | 4884.05
BIC | 4971.04 | 4943.12 | 4943.98
CT | 0.4524 | 6.2556 | 3.2488
