Article

The Large Arcsine Exponential Dispersion Model—Properties and Applications to Count Data and Insurance Risk

1
Faculty of Industrial Engineering and Technology Management, HIT—Holon Institute of Technology, Holon 5810201, Israel
2
School of Business and Economics, Vrije University, 1081 HV Amsterdam, The Netherlands
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(19), 3715; https://doi.org/10.3390/math10193715
Submission received: 26 August 2022 / Revised: 4 October 2022 / Accepted: 6 October 2022 / Published: 10 October 2022
(This article belongs to the Section Probability and Statistics)

Abstract

The large arcsine exponential dispersion model (LAEDM) is a class of three-parameter distributions on the non-negative integers. These distributions are leptokurtic, zero-inflated, overdispersed, and skewed to the right, and are therefore well suited to fit count data with these properties. Furthermore, recent studies in actuarial sciences argue for the consideration of such distributions in the computation of risk factors. In this paper, we provide a thorough analysis of the LAEDM by deriving (a) the mean value parameterization of the LAEDM; (b) exact expressions for its probability mass function at $n = 0, 1, \ldots$; (c) a simple bound for these probabilities that is sharp for large $n$; (d) a simulation algorithm for sampling from the LAEDM. We have implemented the LAEDM for statistical modeling of various real count data sets. We assess its fitting performance by comparing it with the performances of traditional counting models. We use the simulation algorithm for computing tail probabilities of the aggregated claim size in an insurance risk model.

1. Introduction

In this paper, we consider a class of parameterized probability distributions on the non-negative integers, which are given as members of an exponential dispersion model with specific variance functions. To clarify these concepts, we begin with the classic technique in statistics of representing a family of probability measures (or distributions) on the real line as a natural exponential family (NEF) [1,2] and its subsequent generalization to an exponential dispersion model (EDM) [3]. A key feature of an NEF is that it expresses the variances of the distributions as functions of their means. This leads not only to the mean value parameterization of the NEF [4], but also to its unique pair of mean and variance, which is called its variance function and denoted by $V(m)$ [5].
Next, an interesting question arises: which functions $V(m)$ can occur as the variance function of a natural exponential family of probability distributions? In his seminal work, Morris [5] characterized all six NEFs with quadratic variance functions. Letac and Mora [2] characterized all six NEFs with a cubic variance structure. Out of the latter six families, two are absolutely continuous (relative to the Lebesgue measure) and four are discrete, supported by the set of non-negative integers. These four are the Abel, Takács, strict arcsine, and large arcsine families. The probabilities of the first three families can be expressed explicitly, both in terms of their Laplace transforms and in terms of their mean value representations. This was utilized by Bar-Lev and Ridder [6], who implemented the Abel, Takács, and strict arcsine families for fitting Swedish car insurance claim data. These families were used to model the counting variable $N$ in the random sum $S_N = \sum_{j=1}^{N} Y_j$ representing the aggregated claim size in an insurance risk model, in which $N$ is the number of claims over a time period and the $Y_j$ are the claim sizes. Bar-Lev and Ridder [6] computed the risk measure $P(S_N > x)$ for large $x$ and demonstrated superiority over the commonly used counting distributions (e.g., Poisson and negative binomial).
The fourth discrete natural exponential family of probability distributions with a cubic variance function, namely the large arcsine family, was presented by Letac and Mora [2] in their characterization of cubic variance functions of NEFs on $\mathbb{R}$. It was also obtained by Fosam and Shanbhag [7] in their characterization of the constant regression of cubic polynomial statistics on the sample mean. However, beyond these two characterization results, the large arcsine family has not been studied in the literature with respect to its probabilistic and statistical features and thus has not been analyzed as a candidate for modeling any real count data sets. The main reason for this seems to be that its Laplace transform does not have an explicit form.
Our contribution in this paper aims to fill this gap. We present probabilistic and statistical aspects of the large arcsine NEF and its associated exponential dispersion model, which we denote by LAEDM. The kernel (or generating measure) of the probability distributions of the LAEDM has a rather cumbersome analytic expression with different forms for even and odd values. Moreover, the respective Laplace transform of its generating measure cannot be expressed explicitly and thus cannot serve as a normalizing constant for computing the relevant LAEDM probabilities. Fortunately, an expressible alternative to this normalizing constant is available in terms of the mean and is obtained by utilizing the mean value parameterization of NEFs.
Our method for determining the LAEDM distributions is based on the mean value parameterization of NEFs and uses the Lagrange formula of Letac and Mora [2]. This leads to exact expressions for the LAEDM probabilities, which make it possible to run the simulation experiments that are needed for applications such as risk model computations. By contrast, in many other cases, such as the ABM and LM distributions of Bar-Lev and Ridder [8,9], the probabilities can only be computed numerically, without being able to use Monte Carlo simulations. Recently, Jørgensen and Kokonendji [10] introduced two- and three-parameter discrete dispersion models by combining convolution with a factorial tilting operation. The Poisson–Tweedie mixture model arises from this approach and is implemented in our data fitting experiments in Section 6.
As opposed to the Abel, Takács, and strict arcsine families, which depend on two parameters, the LAEDM depends on three parameters, a fact that allows more flexibility in statistical modeling. In this paper, we thoroughly study various features of the LAEDM. We rewrite its kernel and derive its probability mass function in terms of its mean value representation. As the latter is rather intricate (particularly for large values of $n$), we study its asymptotic behavior as well as tail probabilities. We provide expressions for the respective moments and show that all members of the LAEDM are skewed to the right and leptokurtic. In addition, we present a Monte Carlo simulation scheme for sampling from the LAEDM distributions. Such simulations are needed for many statistical, insurance risk-related, and operational problems. Most of these problems cannot be solved analytically, a fact which requires Monte Carlo computations.
We also consider the effectiveness of LAEDM as a candidate for modeling real count data sets, while comparing it to other frequently used probability models. For this, we utilize various metrics of goodness-of-fit tests such as chi-squared tests, Akaike information criterion, root-mean-square error, and Kullback–Leibler divergence to demonstrate its superior fit.
The paper is organized as follows. For the sake of readability, we summarize in Section 2 some preliminaries on the concepts of natural exponential family, exponential dispersion model, mean value parameterization, and variance function. Section 3 presents the main features and properties of the LAEDM, notably, (a) its mean value parameterization; (b) exact expressions of the probability mass function $f(n)$; (c) a simple bound for these probabilities that is sharp for large $n$; (d) moments, central moments, and coefficients of skewness and kurtosis. In particular, we show that all LAEDM distributions are skewed to the right and are leptokurtic. Section 4 develops a Monte Carlo algorithm for sampling from the LAEDM distributions. Applications of the LAEDM are given in Section 5 and Section 6. In Section 5, we implement the LAEDM in a collective risk model and analyze its statistical performance for a data set of car insurance claims, and in Section 6, the LAEDM is implemented in the statistical modeling of various real count data sets. Section 7 is devoted to some concluding remarks.

2. Preliminaries

We summarize the concepts of the natural exponential family (NEF), the exponential dispersion model (EDM), mean value parameterization, and variance function.

2.1. Natural Exponential Family

Let $\mu$ be a positive non-Dirac Radon measure on $\mathbb{R}$, $S$ the support of $\mu$, and $C$ the convex hull of $S$. The Laplace transform of $\mu$ is defined by
$$L(\theta) = \int_{\mathbb{R}} e^{x\theta}\,\mu(dx),$$
where the effective domain $D$ of $\mu$ is $D = \{\theta \in \mathbb{R} : L(\theta) < \infty\}$. We assume that $\Theta = \operatorname{int} D \neq \emptyset$, so that $\Theta$ is a nonempty open interval. Thus the cumulant transform $k(\theta) = \log L(\theta)$ of $\mu$ is well defined on $\Theta$. The family $\mathcal{F}$ of probability measures (or distributions) defined by
$$\mathcal{F} = \left\{ P_\theta(dx) = e^{x\theta - k(\theta)}\,\mu(dx),\ \theta \in \Theta \right\}$$
is the NEF generated by $\mu$. Therefore, $\mu$ is called a generating kernel for $\mathcal{F}$. It is easy to see that $\mu$ is not unique in its generation of $\mathcal{F}$; any exponential shift $\mu^*(dx) = e^{a + bx}\,\mu(dx)$ generates the same family of distributions.

2.2. Variance Function

The cumulant transform $\lambda_\theta(s)$ of a distribution $P_\theta(dx) \in \mathcal{F}$ is defined by $\log \int_{\mathbb{R}} e^{sx} P_\theta(dx)$, which equals
$$\lambda_\theta(s) = \log \int_{\mathbb{R}} e^{sx} e^{\theta x - k(\theta)}\,\mu(dx) = k(s + \theta) - k(\theta). \tag{1}$$
The cumulants of a distribution $P_\theta(dx) \in \mathcal{F}$ (the first is the mean, and the second and third coincide with the central moments of the same order) can be computed by the derivatives $(d/ds)^j \lambda_\theta(s)\,|_{s=0}$. From (1) we see that they are equal to $k^{(j)}(\theta) = (d/d\theta)^j k(\theta)$, $j = 1, 2, \ldots$. In particular, $k'(\theta)$ and $k''(\theta)$, $\theta \in \Theta$, are the respective mean and variance.
The open interval $M = k'(\Theta)$ is called the mean domain of $\mathcal{F}$. Note that it depends on the generating kernel $\mu$ only through its exponential shifts. The map $\theta \mapsto k'(\theta)$ is one to one; thus, its inverse function $(k')^{-1} : M \to \Theta$ is well defined. Then, the variance of a distribution $P_\theta(dx) \in \mathcal{F}$ can be expressed as a function of the mean $m$,
$$V(m) = k''\big((k')^{-1}(m)\big). \tag{2}$$
The pair $(V, M)$ is called the variance function of the NEF $\mathcal{F}$. It uniquely determines $\mathcal{F}$ within the class of NEFs (see Letac and Mora [2], Morris [5]).
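As a brief illustration of these definitions (a standard textbook case, not taken from the derivations of this paper), the steps above can be traced for the Poisson family, whose variance function $V(m) = m$ is also quoted in Section 2.5. The kernel $\mu(n) = 1/n!$ on the non-negative integers is the usual choice and is the only assumption made here.

```latex
\begin{align*}
L(\theta) &= \sum_{n=0}^{\infty} \frac{e^{n\theta}}{n!} = \exp\!\big(e^{\theta}\big),
  & k(\theta) &= \log L(\theta) = e^{\theta}, \quad \Theta = \mathbb{R}, \\
m &= k'(\theta) = e^{\theta},
  & M &= k'(\Theta) = (0, \infty), \\
(k')^{-1}(m) &= \log m,
  & V(m) &= k''\big((k')^{-1}(m)\big) = e^{\log m} = m .
\end{align*}
```

Here the variance equals the mean on the whole mean domain, which is why overdispersion in Section 3.3 is measured relative to this family.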

2.3. Mean Parameterization

The inverse function $(k')^{-1}$, as a function on the mean domain $M$, is denoted by $\psi$. Differentiating the $\psi$ function, we obtain the following from (2):
$$\psi'(m) = \frac{1}{k''\big((k')^{-1}(m)\big)} = \frac{1}{V(m)}.$$
Next, we define the function $\phi$ on the mean domain by $\phi(m) = k\big((k')^{-1}(m)\big) = k\big(\psi(m)\big)$. Differentiating, we obtain
$$\phi'(m) = k'\big(\psi(m)\big)\,\psi'(m) = \frac{m}{V(m)}.$$
Note that in this way, the $\psi$ and $\phi$ functions can be interpreted as being primitives of the functions $1/V(m)$ and $m/V(m)$, respectively, on the mean domain $M$.
Now, let there be given a variance function $(V, M)$ of an NEF $\mathcal{F}$, without having specified the generating kernel $\mu$. We choose two primitives $\psi$ and $\phi$ of $1/V(m)$ and $m/V(m)$, respectively, and then there is a positive Radon measure $\mu$ on the real line such that [2],
$$\phi(m) = \log \int_{\mathbb{R}} e^{x\psi(m)}\,\mu(dx), \quad m \in M.$$
This leads to expressing the NEF $\mathcal{F}$ in terms of the mean as
$$\mathcal{F} = \left\{ P_m(dx) = e^{x\psi(m) - \phi(m)}\,\mu(dx),\ m \in M \right\}. \tag{3}$$
The representation (3) is called the mean value parametrization of $\mathcal{F}$ [2,4].
The generating kernel is not unique, as any exponential shift generates the same family. This corresponds to the fact that the functions $\psi(m)$ and $\phi(m)$ are not unique as primitives. Indeed, the set of such primitives is uncountably infinite. We detail how to choose appropriate primitives in Section 3.1. Parameterization (3) is important for two reasons. The first is that the parameter $m$, being the mean, is directly interpretable and therefore much more meaningful than the canonical parameter $\theta$, which is just the argument of the corresponding Laplace transform. The second reason concerns situations in which the corresponding Laplace transforms cannot be expressed explicitly, while the primitives $\psi$ and $\phi$ are easy to compute. Numerous examples of the latter situation are presented in Bar-Lev and Kokonendji [4], Awad et al. [11].
Our study also falls under the latter situation. We consider an NEF given by the variance function $(V, M)$ with mean domain $M = \mathbb{R}_+$ and variance
$$V(m) = m\left(1 + 2m + \frac{1 + a^2}{a^2}\,m^2\right), \quad m \in M, \tag{4}$$
where $a > 0$ is a parameter. In Section 3, we will derive primitives $\phi$ and $\psi$ and an associated kernel $\mu$ for the NEF represented by the mean value parametrization (3). This NEF is called the large arcsine family in Letac and Mora [2].

2.4. Exponential Dispersion Model

Let $\mathcal{F}$ be an NEF generated by a kernel $\mu$ with a variance function $(V, M)$, a Laplace transform $L(\theta)$, and a cumulant transform $k(\theta)$. Consider the set
$$\Lambda = \left\{ p \in \mathbb{R}_+ : L^p \text{ is a Laplace transform of a kernel } \mu_p \right\}.$$
The set $\Lambda$ is nonempty due to convolution, since $L^n$ is the Laplace transform of the $n$-fold convolution of $\mu$. It is called the Jørgensen set (or the dispersion parameter space). It has been shown that $\Lambda = \mathbb{R}_+$ if $\mu$ is infinitely divisible (then also all distributions in the family $\mathcal{F}$ are infinitely divisible). For any $p \in \Lambda$, the NEF generated by $\mu_p$ is the set of probability measures of the form
$$\mathcal{F}_p = \left\{ P_{\theta,p}(dx) = e^{x\theta - p\,k(\theta)}\,\mu_p(dx),\ \theta \in \Theta \right\}. \tag{5}$$
The set of NEFs
$$\bigcup_{p \in \Lambda} \mathcal{F}_p$$
is the EDM associated with $\mu$ [12]. The parameter $p$ is called the dispersion parameter. In particular, if $\Lambda = \mathbb{R}_+$ (i.e., $\mu$ is infinitely divisible), then EDMs are used to describe the distribution of the error component in generalized linear models (see Nelder and Wedderburn [13], and Jørgensen [3,12] for numerous applications). Note that an EDM is an uncountable set and that the NEF with which we began to construct the EDM is just a special case with a unit dispersion parameter.
Next, let us develop the mean value parameterization of an EDM. The NEF $\mathcal{F}_p$ of (5) has a variance function $(V_p, M_p)$ with $M_p = pM$. Clearly,
$$\text{if } M = \mathbb{R}_+ \text{ then } M_p = pM = \mathbb{R}_+. \tag{6}$$
The variance satisfies (where $k_p(\theta) = p\,k(\theta)$),
$$V_p(m) = k_p''\big((k_p')^{-1}(m)\big) = p\,k''\big((k')^{-1}(m/p)\big) = p\,V(m/p). \tag{7}$$
Then, it is easy to see that we can choose primitives $\psi_p(m)$ of $1/V_p(m)$, and $\phi_p(m)$ of $m/V_p(m)$ that satisfy
$$\psi_p(m) = \psi(m/p), \qquad \phi_p(m) = p\,\phi(m/p). \tag{8}$$
Then, we obtain the mean value parameterization corresponding to (5),
$$\mathcal{F}_p = \left\{ P_{m,p}(dx) = e^{x\psi(m/p) - p\,\phi(m/p)}\,\mu_p(dx),\ m \in M_p \right\}. \tag{9}$$
Specifically, in our study of the large arcsine family with variance given in (4) for the NEF ($p = 1$), the variance for the EDM (any $p > 0$) becomes
$$V_p(m) = m\left(1 + \frac{2m}{p} + \frac{1 + a^2}{a^2}\,\frac{m^2}{p^2}\right). \tag{10}$$
This will be the variance function of our study. We denote the associated exponential dispersion model by LAEDM (large arcsine exponential dispersion model).

2.5. Literature Review on Discrete EDMs and Related Distributions

Exponential dispersion models (EDMs) are considered to be powerful tools for statistical analysis because of their modeling flexibility, convolution properties, their usage for generalized linear models [3,12], and their mean value parameterization feature [2,4,5]. The latter enables the modeling of the variances of the distributions as functions of the mean. Jørgensen [12] showed that all polynomial variance functions with non-negative coefficients and a zero constant coefficient correspond to infinitely divisible NEFs. A discrete EDM is an EDM whose distributions have a discrete domain (not necessarily the integers). The LAEDM is a special case of a discrete exponential dispersion model whose domain is the non-negative integers (denoted $\mathbb{N}_0$); it has a polynomial variance function of degree 3 and satisfies the infinite divisibility property. As a consequence, its variance function (10) fulfills (6).
In the introductory section, we already reflected on the three other discrete EDMs on $\mathbb{N}_0$ with cubic variance functions identified by Letac and Mora [2] and implemented in Bar-Lev and Ridder [6]. Other simple discrete EDMs on $\mathbb{N}_0$ are the classic Poisson with variance function $V(m) = m$ (degree 1 polynomial), the binomial with $V(m) = m(1 - m/N)$ (degree 2), and the negative binomial with $V(m) = m(1 + m/r)$ (degree 2) identified by Morris [5]. For ease, we mention here the variance function of the default NEF. The general variance function for dispersion $p$ has the form $V_p(m) = p\,V(m/p)$ (see (7)).
In Bar-Lev and Ridder [8,9], we analyzed discrete EDMs on $\mathbb{N}_0$ with polynomial variance functions of the form $V(m) = m(1 + m)^r$ for any $r = 0, 1, \ldots$ and discrete EDMs on $\mathbb{N}_0$ with rational variance functions of the form $V(m) = m/(1 - m)^r$, $r = 1, 2, \ldots$. We showed that these models give excellent fitting performances for data showing zero-inflation, overdispersion, and a large amount of skewness and kurtosis.
Kokonendji et al. [14] investigated two discrete EDMs. The first is a class of Poisson mixtures with positive Tweedie mixing distributions, hence called a Poisson–Tweedie EDM, which is concentrated on $\mathbb{N}_0$. Its variance function has the form $V(m) = m + m^{\gamma} \exp\big((2 - \gamma)\,\psi(m)\big)$, where $\gamma \geq 1$ and $\psi(m)$ is the inverse of the derivative of the cumulant function (see Section 2.3). The second discrete EDM in [14] has the variance function $V(m) = m + m^{\gamma}$, $\gamma > 1$, and is called the Hinde–Demétrio class. If $\gamma - 1 = 1, 2, \ldots$, it is concentrated on $\mathbb{N}_0$. The probability mass functions of the Poisson–Tweedie EDM and the Hinde–Demétrio EDM are generally not easy to compute, except in cases with specific parameter values. Kokonendji et al. [14] considered two data sets of car insurance claims [15] and fit the Poisson–Tweedie EDM with $\gamma = 2$ (which is the negative binomial model) and the Hinde–Demétrio EDM with $\gamma = 3$, which is the strict arcsine model (see also Kokonendji and Khoudar [16]). The generating measure of the Poisson–Tweedie EDM is the Poisson–Tweedie mixture distribution, whose probability mass function is more easily computable and has three parameters that allow for maximum likelihood estimation. Therefore, it has been applied to a wide range of data, such as crash and traffic accident data [17,18], species abundance data [19], and longitudinal RNA-sequencing data [20]. In Section 6, we implement the Poisson–Tweedie mixture distribution as one of the models for fitting count data.
Two of the three parameters of the Poisson–Tweedie mixture model are the dispersion $p$ and the power $\gamma$ of the Tweedie EDM [12]. The variance of the Poisson–Tweedie mixture distribution is then expressed in terms of its mean as $\mathrm{Var} = m + p\,m^{\gamma}$. Recently, Abid et al. [21] extended this model to the Poisson–exponential Tweedie model through the relationship $\mathrm{Var} = m + m^2 + p\,m^{\gamma}$, and applied this model for fitting overdispersed count data sets. Note, however, that although they are closely related, these distributions are not members of an EDM.

3. The Large Arcsine Exponential Dispersion Model

This section is dedicated to our main study. We first establish the mean value parametrization of the LAEDM and the choice of appropriate primitives $\psi_p(m)$ and $\phi_p(m)$ for any dispersion parameter $p$. From these, we argue that the probability distributions $P_{m,p}(dx)$ (see (9)) of the LAEDM are concentrated on the non-negative integers. Therefore, we denote these as the probability mass functions $f_{m,p}(n)$, $n = 0, 1, \ldots$. We then present a generating kernel $\mu_p = \{\mu_p(n),\ n = 0, 1, \ldots\}$ from which we obtain exact expressions for the probability mass functions $f_{m,p}(n)$. As these expressions are rather cumbersome, we derive simple bounds of $f_{m,p}(n)$ for any $n$. These bounds are sharp as $n \to \infty$. Furthermore, in the next section, the bounds serve for developing a sampling algorithm based on the accept–reject method.
We end this section by presenting general expressions for the central moments of LAEDM and show that all LAEDM members are skewed to the right and are leptokurtic, i.e., in terms of shape, a leptokurtic distribution has fatter tails.

3.1. The Mean Value Parameterization of the LAEDM

From now on, for convenience, we use the abbreviation LAEDM to refer both to the variance function (10) and to its associated EDM. In this section, we derive the mean value parameterization (9) of the LAEDM. To this end, we start by computing the primitives $\psi(m)$ of $1/V(m)$ and $\phi(m)$ of $m/V(m)$ for the NEF whose variance is given for unit dispersion in (4).
Lemma 1.
Consider the large arcsine NEF with variance function given in (4). Then,
$$\int \frac{1}{V(m)}\,dm = \log m - \frac{1}{2}\log\big((1 + a^2)m^2 + 2a^2 m + a^2\big) - a \arctan\frac{(1 + a^2)m + a^2}{a} + c, \qquad \int \frac{m}{V(m)}\,dm = a \arctan\frac{(1 + a^2)m + a^2}{a} + d,$$
where $c, d \in \mathbb{R}$ are integration constants.
Proof. 
Use partial fractions for
$$\frac{1}{V(m)} = \frac{1}{m\left(1 + 2m + \frac{1 + a^2}{a^2}m^2\right)} = \frac{a^2}{m\big(a^2 + 2a^2 m + (1 + a^2)m^2\big)} = \frac{1}{m} - \frac{1}{2}\,\frac{2a^2 + 2(1 + a^2)m}{a^2 + 2a^2 m + (1 + a^2)m^2} - \frac{a^2}{a^2 + 2a^2 m + (1 + a^2)m^2}.$$
The first two terms have primitives $\log m$ and $-\frac{1}{2}\log\big(a^2 + 2a^2 m + (1 + a^2)m^2\big)$, respectively. For the third term, completing the square in the denominator gives, after rewriting,
$$-\frac{1 + a^2}{1 + \left(\frac{(1 + a^2)m + a^2}{a}\right)^2},$$
with primitive $-a \arctan\frac{(1 + a^2)m + a^2}{a}$.
Similarly,
$$\frac{m}{V(m)} = \frac{1}{1 + 2m + \frac{1 + a^2}{a^2}m^2} = \frac{a^2}{a^2 + 2a^2 m + (1 + a^2)m^2} = \frac{1 + a^2}{1 + \left(\frac{(1 + a^2)m + a^2}{a}\right)^2},$$
with primitive $a \arctan\frac{(1 + a^2)m + a^2}{a}$.    □
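The primitives in Lemma 1 can be sanity-checked numerically. The following is a minimal sketch, assuming integration constants $c = d = 0$ and an arbitrary test value $a = 2.3$; the function names `V`, `psi`, `phi`, and `central_diff` are illustrative choices, not notation from the paper. It compares central finite differences of the primitives with $1/V(m)$ and $m/V(m)$.

```python
import math

def V(m, a):
    """Unit-dispersion large arcsine variance function, Eq. (4)."""
    return m * (1.0 + 2.0 * m + (1.0 + a * a) / (a * a) * m * m)

def psi(m, a):
    """Primitive of 1/V from Lemma 1, with integration constant c = 0."""
    quad = (1.0 + a * a) * m * m + 2.0 * a * a * m + a * a
    u = ((1.0 + a * a) * m + a * a) / a
    return math.log(m) - 0.5 * math.log(quad) - a * math.atan(u)

def phi(m, a):
    """Primitive of m/V from Lemma 1, with integration constant d = 0."""
    u = ((1.0 + a * a) * m + a * a) / a
    return a * math.atan(u)

def central_diff(f, m, h=1e-5):
    """Symmetric finite-difference approximation of f'(m)."""
    return (f(m + h) - f(m - h)) / (2.0 * h)

a = 2.3  # arbitrary test value, an assumption for illustration
for m in (0.5, 1.0, 3.7):
    err_psi = central_diff(lambda t: psi(t, a), m) - 1.0 / V(m, a)
    err_phi = central_diff(lambda t: phi(t, a), m) - m / V(m, a)
    print(f"m={m}: psi' - 1/V = {err_psi:.2e}, phi' - m/V = {err_phi:.2e}")
```

Both differences should be at the level of the finite-difference error, which supports the signs of the logarithmic and arctangent terms.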
Primitives of the variance function (10) for any dispersion parameter $p > 0$ follow from (8),
$$\psi_p^{(c)}(m) = \psi(m/p) = \log m - \frac{1}{2}\log\big((1 + a^2)m^2 + 2a^2 m p + a^2 p^2\big) - a \arctan\frac{(1 + a^2)m + a^2 p}{a p} + c, \qquad \phi_p^{(d)}(m) = p\,\phi(m/p) = a p \arctan\frac{(1 + a^2)m + a^2 p}{a p} + d, \tag{11}$$
where we have added the superscripts $c$ and $d$ to indicate the integration constants that are free parameters at this moment.
Corollary 1.
The LAEDM is a class of probability distributions on the non-negative integers.
Proof. 
Proposition 4.4 of Letac and Mora [2] provides the necessary and sufficient conditions under which a given variance function is associated with an NEF concentrated on the non-negative integers. We check these conditions for any NEF of an LAEDM given by the variance function (10). The first two conditions ($M = (0, b)$ for some $0 < b \leq \infty$, and $(\phi_p^{(d)})'(m) = m/V_p(m)$ is real analytic on $M$) are clearly satisfied. The third condition is
$$\lim_{m \downarrow 0}\,(\phi_p^{(d)})'(m) = 1. \tag{12}$$
Because $(\phi_p^{(d)})'(m) = m/V_p(m) = \big(1 + 2m/p + (1 + a^2)m^2/(ap)^2\big)^{-1}$, the condition (12) is immediate.    □
Now, the question becomes how to choose the integration constants $c$ and $d$ appropriately. The most convenient method is to impose [2,8]
$$\lim_{m \downarrow 0} \phi_p^{(d)}(m) = 0, \quad \text{and} \quad \lim_{m \downarrow 0} m\,e^{-\psi_p^{(c)}(m)} = 1. \tag{13}$$
We denote the resulting primitives just by $\psi_p(m)$ and $\phi_p(m)$. Then, under these conditions, a generating kernel $\mu_p$ can be represented (and computed) by [2,8],
$$\mu_p(n) = \frac{1}{n!}\left(\frac{d}{dm}\right)^{n-1}\left[ e^{\phi_p(m)}\,\phi_p'(m)\,\big(m\,e^{-\psi_p(m)}\big)^n \right]\Bigg|_{m=0}. \tag{14}$$
Proposition 1.
The mean value parameterization of the LAEDM that satisfies the conditions (13) is given by the primitives
$$\psi_p(m) = \log(a m p) - \frac{1}{2}\log\big((1 + a^2)m^2 + 2a^2 m p + a^2 p^2\big) - a\left(\arctan\frac{(1 + a^2)m + a^2 p}{a p} - \arctan a\right), \qquad \phi_p(m) = a p\left(\arctan\frac{(1 + a^2)m + a^2 p}{a p} - \arctan a\right). \tag{15}$$
Proof. 
It suffices to determine the integration constants $c$ and $d$ in (11) such that (13) holds. Concerning the $\phi_p^{(d)}$ primitive in (11), we immediately obtain $d = -a p \arctan a$. To compute $c$, let $\xi_p^{(c)}(m) = \psi_p^{(c)}(m) - \log m$, then
$$m\,e^{-\psi_p^{(c)}(m)} = e^{\log m - \psi_p^{(c)}(m)} = e^{-\xi_p^{(c)}(m)},$$
where (see (11))
$$\xi_p^{(c)}(m) = -\frac{1}{2}\log\big((1 + a^2)m^2 + 2a^2 m p + a^2 p^2\big) - a \arctan\frac{(1 + a^2)m + a^2 p}{a p} + c.$$
Thus,
$$\lim_{m \downarrow 0} m\,e^{-\psi_p^{(c)}(m)} = 1 \;\Longleftrightarrow\; \lim_{m \downarrow 0} \xi_p^{(c)}(m) = 0 \;\Longleftrightarrow\; c = \log(a p) + a \arctan a.$$
Now, we substitute the constants $c$ and $d$ found above in (11) to obtain (15).    □
The final component of the mean value parameterization (9) is the generating kernel $\{\mu_p(n),\ n = 0, 1, \ldots\}$.
Proposition 2.
Consider the discrete measure on the non-negative integers given by
$$\mu_p(2n) = \frac{p}{p + 2n}\,\frac{1}{(a p)^{2n}}\,\frac{1}{(2n)!}\prod_{k=0}^{n-1}\Big(a^2(2n + p)^2 + 4k^2\Big), \quad n = 0, 1, \ldots, \qquad \mu_p(2n + 1) = \frac{a p}{(a p)^{2n+1}}\,\frac{1}{(2n + 1)!}\prod_{k=0}^{n-1}\Big(a^2(2n + 1 + p)^2 + (2k + 1)^2\Big), \quad n = 0, 1, \ldots, \tag{16}$$
(with empty products equal to one). This measure serves as a kernel that generates the mean value parameterization of the LAEDM.
Proof. 
When the primitives $\psi_p$ and $\phi_p$ satisfy the conditions (13), a kernel can be computed by (14) for $n = 1, 2, \ldots$, with $\mu_p(0) = 1$. This has been elaborated in Letac and Mora [2] by applying the Lagrange formula, to become
$$\mu_p(n) = \frac{p}{p + n}\,\frac{1}{(a p)^n}\,\frac{1}{n!}\,\pi_n\big(a(n + p)\big), \quad n = 0, 1, \ldots$$
where the polynomials $(\pi_n)_{n=0}^{\infty}$ are defined by
$$\pi_{2n}(x) = \prod_{k=0}^{n-1}\big(x^2 + 4k^2\big), \qquad \pi_{2n+1}(x) = x \prod_{k=0}^{n-1}\big(x^2 + (2k + 1)^2\big).$$
Substituting $x = a(n + p)$ yields the expressions in (16).    □
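The kernel of Proposition 2 can be evaluated directly from the polynomials $\pi_n$. The sketch below works in log space to avoid overflow for large $n$; the helper names `log_pi` and `kernel` and the test values $p = 1.5$, $a = 2.3$ are assumptions made for illustration. The small cases $\mu_p(0) = \mu_p(1) = 1$ and $\mu_p(2) = (p + 2)/(2p)$ follow from the formula by hand and serve as a check.

```python
import math

def log_pi(n, x):
    """log of pi_n(x): prod (x^2 + 4k^2) for even n, x * prod (x^2 + (2k+1)^2) for odd n."""
    half, odd = divmod(n, 2)
    total = math.log(x) if odd else 0.0
    for k in range(half):
        total += math.log(x * x + (2 * k + odd) ** 2)
    return total

def kernel(n, p, a):
    """mu_p(n) = p/(p+n) * (a p)^(-n) / n! * pi_n(a (n + p)), evaluated in log space."""
    log_mu = (math.log(p) - math.log(p + n) - n * math.log(a * p)
              - math.lgamma(n + 1) + log_pi(n, a * (n + p)))
    return math.exp(log_mu)

p, a = 1.5, 2.3  # illustrative values only
print([round(kernel(n, p, a), 6) for n in range(6)])
print(kernel(2, p, a), (p + 2) / (2 * p))  # the two numbers should agree
```

The same `log_pi` helper can be reused for the probability mass function, since only the exponential tilting factor changes.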

3.2. Computation of the Probability Mass Functions

Let $\{f_{m,p}(n),\ n = 0, 1, \ldots\}$ be the probability mass function of a member of the LAEDM with a specific mean $m > 0$ and dispersion $p > 0$. It is represented by the mean value parameterization
$$f_{m,p}(n) = \mu_p(n)\,e^{n\psi_p(m) - \phi_p(m)}, \quad n = 0, 1, \ldots, \tag{17}$$
where the kernel $\mu_p$ and the primitives $\psi_p$ and $\phi_p$ are computed in Section 3.1. We present here the expressions that result after straightforward computations. For completeness, we give the calculations in Appendix A.
Lemma 2.
The LAEDM probability mass functions are given by
$$f_{m,p}(2n) = e^{-a p B}\,\frac{p}{p + 2n}\,\frac{1}{C^n}\,\frac{1}{(2n)!}\prod_{k=0}^{n-1}\Big(a^2(2n + p)^2 + 4k^2\Big), \qquad f_{m,p}(2n + 1) = a p\,e^{-a p B}\,\frac{1}{\sqrt{C}}\,\frac{1}{C^n}\,\frac{1}{(2n + 1)!}\prod_{k=0}^{n-1}\Big(a^2(2n + 1 + p)^2 + (2k + 1)^2\Big),$$
where
$$B = \arctan\frac{(1 + a^2)m + a^2 p}{a p} - \arctan a, \qquad C = (1 + a^2)(1 + D)\,e^{2aB}, \qquad D = \frac{a^2}{1 + a^2}\,\frac{p}{m}\left(2 + \frac{p}{m}\right).$$
These exact expressions are rather cumbersome, and not very helpful for recognizing structural properties. However, in Appendix B we shall prove the following proposition expressing simple bounds and asymptotics. For these, define
$$E = \frac{1}{2}\log\left(1 + \frac{1}{a^2}\right) - 1 + a \arctan\frac{1}{a}, \qquad \gamma = e^{-a p B}\,\frac{p\,e^{p}}{\sqrt{2\pi}}, \qquad \rho = \frac{a\,e^{1 + E}}{\sqrt{C}}.$$
All these quantities depend only on the variance function parameters $m, p, a$.
Proposition 3.
(i) For all $n = 1, 2, \ldots$,
$$f_{m,p}(n) \leq \gamma\,n^{-3/2}\,\rho^{n}.$$
(ii) The bound is asymptotically sharp:
$$f_{m,p}(n) = \gamma\,n^{-3/2}\,\rho^{\,n + o(1)}, \quad \text{as } n \to \infty.$$
(iii) For all $m, p, a > 0$, $0 < \rho < 1$.
Hence, the tail of an LAEDM distribution decays to zero faster than that of a geometric distribution with parameter $\rho$. Nevertheless, we call $\rho$ the geometric parameter of the LAEDM.
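To make Lemma 2 and Proposition 3 concrete, the sketch below evaluates the probability mass function in log space and compares it with the bound $\gamma n^{-3/2}\rho^n$. The parameter values $m = 3.7$, $p = 1.5$, $a = 2.3$ are those of the performance test in Section 4; the function names and the truncation at 600 terms are assumptions made for illustration, not part of the paper.

```python
import math

def log_pi(n, x):
    """log of pi_n(x) from Proposition 2 (empty product equals one)."""
    half, odd = divmod(n, 2)
    total = math.log(x) if odd else 0.0
    for k in range(half):
        total += math.log(x * x + (2 * k + odd) ** 2)
    return total

def laedm_constants(m, p, a):
    """Return B, C, gamma, rho as defined in Lemma 2 and before Proposition 3."""
    B = math.atan(((1 + a * a) * m + a * a * p) / (a * p)) - math.atan(a)
    D = a * a / (1 + a * a) * (p / m) * (2 + p / m)
    C = (1 + a * a) * (1 + D) * math.exp(2 * a * B)
    E = 0.5 * math.log(1 + 1 / (a * a)) - 1 + a * math.atan(1 / a)
    gamma = math.exp(-a * p * B) * p * math.exp(p) / math.sqrt(2 * math.pi)
    rho = a * math.exp(1 + E) / math.sqrt(C)
    return B, C, gamma, rho

def laedm_pmf(n, m, p, a):
    """f_{m,p}(n); even and odd n are covered by the single expression below."""
    B, C, _, _ = laedm_constants(m, p, a)
    log_f = (-a * p * B + math.log(p) - math.log(p + n) - 0.5 * n * math.log(C)
             - math.lgamma(n + 1) + log_pi(n, a * (n + p)))
    return math.exp(log_f)

m, p, a = 3.7, 1.5, 2.3
probs = [laedm_pmf(n, m, p, a) for n in range(600)]  # truncated support
total = sum(probs)
mean = sum(n * f for n, f in enumerate(probs))
_, _, gamma, rho = laedm_constants(m, p, a)
bound_ok = all(probs[n] <= gamma * n ** -1.5 * rho ** n for n in range(1, 600))
print(f"sum={total:.8f} mean={mean:.6f} rho={rho:.4f} bound holds: {bound_ok}")
```

The truncated probabilities should sum to approximately one and reproduce the mean $m$, which is a useful check that the normalizing factor $e^{-apB}$ has been implemented with the right sign.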

3.3. Moments, Central Moments, Skewness and Kurtosis

Consider the NEF $\mathcal{F}_p$ of an LAEDM with an arbitrary dispersion parameter $p > 0$ (see (5)). In Section 2.4, we saw that $\mathcal{F}_p$ has the cumulant $k_p(\theta)$ as function of the natural parameter, variance function $V_p(m)$ as function of the mean parameter, and $\psi_p(m)$ as a primitive of $1/V_p(m)$. Then, we can define the derivatives of the cumulant in the mean parameterization by
$$k_p^{(j)}(m) = \left(\frac{d}{d\theta}\right)^{j} k_p(\theta)\,\Big|_{\theta = \psi_p(m)}.$$
Recursively, it can easily be shown that
$$k_p^{(j+1)}(m) = V_p(m)\,\frac{d}{dm}k_p^{(j)}(m), \quad j = 1, 2, \ldots;\ m \in M_p. \tag{21}$$
Now, consider the random variable $X_{m,p}$ associated with the probability mass function (17). By definition, its mean is given by $k_p^{(1)}(m) = m$. Its higher central moments are
$$C_p^{(j)}(m) = E\big[(X_{m,p} - m)^j\big], \quad j = 2, 3, \ldots.$$
Using the recursion (21) for the cumulant derivatives, we can easily obtain that
$$C_p^{(2)}(m) = k_p^{(2)}(m) = V_p(m), \qquad C_p^{(3)}(m) = k_p^{(3)}(m) = V_p(m)\,V_p'(m), \qquad C_p^{(r+2)}(m) = k_p^{(r+2)}(m) + \sum_{j=2}^{r}\binom{r + 1}{j}\,C_p^{(j)}(m)\,k_p^{(r - j + 2)}(m), \quad r \geq 2.$$
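As a worked illustration of the recursion (each cumulant derivative is $V_p$ times the $m$-derivative of the previous one), the first few terms for the LAEDM variance function can be written out explicitly. The abbreviation $\beta = (1 + a^2)/a^2$ is introduced here only for brevity and is not notation from the paper.

```latex
\begin{align*}
V_p(m)   &= m + \frac{2m^2}{p} + \beta\,\frac{m^3}{p^2}, \qquad
V_p'(m)  = 1 + \frac{4m}{p} + 3\beta\,\frac{m^2}{p^2}, \qquad
V_p''(m) = \frac{4}{p} + 6\beta\,\frac{m}{p^2}, \\
k_p^{(2)}(m) &= V_p(m), \\
k_p^{(3)}(m) &= V_p(m)\,V_p'(m), \\
k_p^{(4)}(m) &= V_p(m)\,\frac{d}{dm}\big(V_p(m)\,V_p'(m)\big)
             = V_p(m)\Big(V_p'(m)^2 + V_p(m)\,V_p''(m)\Big), \\
C_p^{(4)}(m) &= k_p^{(4)}(m) + 3\,V_p(m)^2 .
\end{align*}
```

Since $V_p$, $V_p'$, and $V_p''$ have only positive coefficients, each of these expressions is positive for $m > 0$, which is the property used for skewness and kurtosis.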
The LAEDM shares the following properties.
Proposition 4.
The LAEDM distributions are
(a) Overdispersed (relative to the Poisson NEF).
(b) Zero inflated (relative to the Poisson NEF).
(c) Skewed to the right and leptokurtic.
Proof. 
(a) This is immediate, as $V_p(m) > m$ for all $m > 0$.
(b) Let $X_{m,p}$ be an LAEDM random variable and $Y_m$ a Poisson random variable, both with mean $m$. We need to show that
$$P(X_{m,p} = 0) > P(Y_m = 0), \quad \text{for all } m, p. \tag{22}$$
Calculate
$$P(X_{m,p} = 0) = f_{m,p}(0) = \mu_p(0)\,e^{-\phi_p(m)} = e^{-\phi_p(m)},$$
where by (15),
$$\phi_p(m) = a p\left(\arctan\frac{(1 + a^2)m + a^2 p}{a p} - \arctan a\right).$$
Because $P(Y_m = 0) = e^{-m}$, it suffices to show that
$$m > a p\left(\arctan\frac{(1 + a^2)m + a^2 p}{a p} - \arctan a\right)$$
for all $m$, $a$ and $p$. Define
$$h(m) = m - a p\left(\arctan\frac{(1 + a^2)m + a^2 p}{a p} - \arctan a\right),$$
then
$$h'(m) = 1 - \frac{m}{V_p(m)} = \frac{m\,(m + a^2 m + 2a^2 p)}{a^2 m^2 + 2a^2 m p + a^2 p^2 + m^2} > 0,$$
for all $m > 0$, $a > 0$, $p > 0$. As $h(0) = 0$, (22) follows.
(c) Recall that a distribution is skewed to the right if its skewness coefficient $\gamma_1 > 0$. It is called leptokurtic if it has a positive excess kurtosis $\gamma_2 > 0$ (i.e., in terms of shape it has fatter tails). For the LAEDM variance function, writing $k_j \equiv k_p^{(j)}(m)$ and $C_j \equiv C_p^{(j)}(m)$, we have by (21) that $k_j > 0$ for all $j \geq 1$, implying
$$\gamma_1 = \frac{C_3}{C_2^{3/2}} = \frac{k_3}{k_2^{3/2}} > 0, \qquad \gamma_2 = \frac{C_4}{C_2^{2}} - 3 = \frac{k_4 + 3\,C_2 k_2}{k_2^{2}} - 3 = \frac{k_4 + 3k_2^{2}}{k_2^{2}} - 3 = \frac{k_4}{k_2^{2}} > 0.$$
   □

4. Monte Carlo Simulation Algorithm for Sampling from LAEDM Distributions

From the upper bound in Proposition 3, we are able to construct a Monte Carlo simulation algorithm for sampling from any LAEDM distribution. The method is based on the accept–reject method [22].
Let $X$ be the random variable on the non-negative integers with probability mass function $f_{m,p}(n)$, $n = 0, 1, \ldots$, as given in Lemma 2, and with the upper bound displayed in Proposition 3. We shall consider two random variables, $Y$ and $Z$, both on the positive integers, as candidates for majorizing $X \mid X \geq 1$ in the accept–reject sampling method. We define the probability in zero,
$$q = f_{m,p}(0) = e^{-a p B}.$$
  • Consider the random variable $Y$ on $\{1, 2, \ldots\}$ with probability mass function (or density)
    $$g(n) = \frac{\rho^{n-1}}{n^{3/2}} - \frac{\rho^{n}}{(n + 1)^{3/2}}, \quad n = 1, 2, \ldots. \tag{23}$$
    Now, we majorize the density of $X$ given $X \geq 1$:
    $$f_{m,p}(n \mid n \geq 1) = \frac{f_{m,p}(n)}{1 - q} \leq \frac{q}{1 - q}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{\rho^{n}}{n^{3/2}} = \frac{q}{1 - q}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{\rho}{1 - \rho}\,(1 - \rho)\,\frac{\rho^{n-1}}{n^{3/2}} = \frac{q}{1 - q}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{\rho}{1 - \rho}\left(\frac{\rho^{n-1}}{n^{3/2}} - \frac{\rho^{n}}{n^{3/2}}\right) \leq \frac{q}{1 - q}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{\rho}{1 - \rho}\left(\frac{\rho^{n-1}}{n^{3/2}} - \frac{\rho^{n}}{(n + 1)^{3/2}}\right).$$
    In this way, we obtain the inequality $f_{m,p}(n \mid n \geq 1) \leq C_y\,g(n)$ for all $n = 1, 2, \ldots$, with the majorizing constant
    $$C_y = \frac{q}{1 - q}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{\rho}{1 - \rho}. \tag{24}$$
  • Consider the random variable $Z$ on $\{1, 2, \ldots\}$ with density
    $$h(n) = \frac{1}{\sqrt{n + 1}}\left(\sqrt{1 + \frac{1}{n}} - 1\right), \quad n = 1, 2, \ldots. \tag{25}$$
    It is easy to show that
    $$\frac{1}{n^{3/2}} \leq \frac{\sqrt{2}}{\sqrt{2} - 1}\,h(n).$$
    Because $\rho < 1$, we find
    $$f_{m,p}(n \mid n \geq 1) = \frac{f_{m,p}(n)}{1 - q} \leq \frac{q}{1 - q}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{\rho^{n}}{n^{3/2}} \leq \frac{q}{1 - q}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{\sqrt{2}}{\sqrt{2} - 1}\,h(n).$$
    So now, we obtain the inequality $f_{m,p}(n \mid n \geq 1) \leq C_z\,h(n)$ for all $n = 1, 2, \ldots$, with the majorizing constant
    $$C_z = \frac{q}{1 - q}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{\sqrt{2}}{\sqrt{2} - 1}. \tag{26}$$
Consequently, sampling from an LAEDM distribution can be executed by an accept–reject algorithm using the dominating $Y$ or $Z$ defined above. In fact, we choose the one with the smallest dominating factor $C$ (highest chance of acceptance), thus
$$\frac{\rho}{1 - \rho} < \frac{\sqrt{2}}{\sqrt{2} - 1} \;\Rightarrow\; \text{use } Y; \qquad \frac{\rho}{1 - \rho} > \frac{\sqrt{2}}{\sqrt{2} - 1} \;\Rightarrow\; \text{use } Z.$$
Hence, given the parameters $p, m, a$ of the LAEDM, we compute the probability in zero $q$ and the majorizing constants $C_y, C_z$ as in (24) and (26). Then, Algorithm 1 summarizes the accept–reject method in the case of sampling using the $Z$ majorant. The case of using the $Y$ majorant is similar.
Algorithm 1 Sampling $X$ from LAEDM Using $Z$
    Generate $U \sim \text{uniform}(0,1)$
    if $U < q$ then
        return 0
    else
        repeat
            Generate $X \sim h(\cdot)$
            Compute $P = f(X \mid X \ge 1) / (C_z\, h(X))$
            Generate $U \sim \text{uniform}(0,1)$
        until $U < P$
        return $X$
    end if
In the following sections, we sketch the process of generating samples of the majorant distributions of $Y$ and $Z$. As a performance test, we generated $N = 10{,}000$ samples of the LAEDM distribution with parameters $a = 2.3$, $p = 1.5$, and $m = 3.7$. The geometric parameter $\rho$ satisfies $\rho/(1-\rho) > \sqrt{2}/(\sqrt{2}-1)$, and thus the $Z$ majorant was used. The acceptance ratio in Algorithm 1 for these LAEDM parameters was $1/C_z = 0.31$. The simulated samples were tested (chi-square) against the expected counts. The p-value was 0.46, so the test gives no evidence against a correct implementation of the algorithm. Figure 1 shows the simulated and expected counts.
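Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the expressions for $B$, $C$, $D$, $E$, and $\rho$ are taken from the derivations in Appendices A and B rather than from Equations (19) and (20) directly. The cheap pre-test with the Proposition 3 bound is an addition that is not part of Algorithm 1; it only avoids evaluating the probability mass function at the very large proposals that the heavy-tailed $Z$ occasionally produces.

```python
import math
import random

KAPPA = math.sqrt(2) / (math.sqrt(2) - 1)  # constant in the Z majorant


def laedm_constants(a, p, m):
    """Return (q, C); B, D, C as read from Appendix A (assumption)."""
    B = math.atan(((1 + a * a) * m + a * a * p) / (a * p)) - math.atan(a)
    D = a * a / (1 + a * a) * (p / m) * (2 + p / m)
    C = (1 + a * a) * (1 + D) * math.exp(2 * a * B)
    return math.exp(-a * p * B), C


def laedm_pmf(n, a, p, m):
    """f_{m,p}(n) for even and odd n, evaluated on the log scale."""
    q, C = laedm_constants(a, p, m)
    half, odd = divmod(n, 2)
    log_prod = sum(math.log(a * a * (n + p) ** 2 + (2 * k + odd) ** 2)
                   for k in range(half))
    if odd:
        lead = math.log(a * p) - 0.5 * math.log(C)
    else:
        lead = math.log(p / (p + n))
    return math.exp(math.log(q) + lead - half * math.log(C)
                    - math.lgamma(n + 1) + log_prod)


def sample_laedm(a, p, m, rng=random):
    """One draw of X by accept-reject with the Z majorant (Algorithm 1)."""
    q, C = laedm_constants(a, p, m)
    E = 0.5 * math.log(1 + 1 / (a * a)) - 1 + a * math.atan(1 / a)
    rho = a * math.exp(1 + E) / math.sqrt(C)
    gamma = q * p * math.exp(p) / math.sqrt(2 * math.pi)
    c_z = gamma / (1 - q) * KAPPA
    if rng.random() < q:
        return 0
    while True:
        x = math.floor((1.0 - rng.random()) ** -2.0)  # Z = floor(U^-2)
        sx, sx1 = math.sqrt(x), math.sqrt(x + 1)
        h = 1.0 / (sx * sx1 * (sx + sx1))  # numerically stable form of h(x)
        threshold = rng.random() * c_z * h * (1 - q)
        # Pre-test (not in Algorithm 1): f(x) <= gamma * x^(-3/2) * rho^x,
        # so the proposal is rejected if the threshold exceeds this bound.
        bound = gamma * x ** -1.5 * math.exp(x * math.log(rho))
        if threshold >= bound:
            continue
        if threshold < laedm_pmf(x, a, p, m):
            return x
```

For example, `sample_laedm(2.3, 1.5, 3.7, random.Random(1))` returns a single non-negative integer; repeated calls with the same generator give an i.i.d. sample.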

4.1. Sampling Y

The cumulative distribution function associated with density (23) is
$$G(n) = 1 - P(Y \ge n+1) = 1 - \frac{\rho^{n}}{(n+1)^{3/2}}, \quad n = 1, 2, \ldots.$$
Hence, the inverse transform method applies,
$$Y = \inf\{n = 1, 2, \ldots : G(n) \ge U\},$$
where $U \sim \text{uniform}(0,1)$. Solving for $u \in (0,1)$:
$$G(n) \ge u \;\Leftrightarrow\; \frac{\rho^{n}}{(n+1)^{3/2}} \le 1 - u.$$
Thus, let $x > 0$ be the solution to
$$\frac{\rho^{x}}{(x+1)^{3/2}} = 1 - u,$$
then $Y = \lceil x \rceil$. Now,
$$\begin{aligned}
\frac{\rho^{x}}{(x+1)^{3/2}} = 1 - u
&\;\Leftrightarrow\; \left(\frac{\rho^{x+1}}{\rho\,(x+1)^{3/2}}\right)^{2/3} = (1-u)^{2/3} \\
&\;\Leftrightarrow\; (x+1)\,\rho^{-(2/3)(x+1)} = \big(\rho(1-u)\big)^{-2/3} \\
&\;\Leftrightarrow\; (x+1)\,e^{-(2/3)\log\rho\,(x+1)} = \big(\rho(1-u)\big)^{-2/3} \\
&\;\Leftrightarrow\; -(2/3)\log\rho\,(x+1)\,e^{-(2/3)\log\rho\,(x+1)} = -(2/3)\log\rho\,\big(\rho(1-u)\big)^{-2/3}.
\end{aligned}$$
This reads as $w e^{w} = z$ (for positive $z$) whose solution is the Lambert $W$ function. Hence,
$$-\frac{2}{3}\log\rho\,(x+1) = W\!\left(-\frac{2}{3}\log\rho\,\big(\rho(1-u)\big)^{-2/3}\right).$$
Finally, solving for $x$, we obtain the random sample
$$Y = \left\lceil -\frac{3}{2\log\rho}\, W\!\left(-\frac{2}{3}\log\rho\,\big(\rho(1-U)\big)^{-2/3}\right) - 1 \right\rceil.$$
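The closed form for $Y$ can be sketched in Python as below. The helper names are hypothetical, and the Lambert $W$ function is evaluated here with a plain Newton iteration on $w e^{w} = z$ (an assumption made to keep the sketch dependency-free; a library routine would serve equally well). The two short loops after the closed form only guard against floating-point error near the boundaries of the inverse transform.

```python
import math
import random


def lambert_w0(z, tol=1e-12):
    """Principal branch W(z) for z > 0 by Newton's method on w*exp(w) = z."""
    w = math.log1p(z)  # starting point with w*exp(w) >= z
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def cdf_g(n, rho):
    """G(n) = 1 - rho^n / (n+1)^(3/2)."""
    return 1.0 - rho ** n / (n + 1) ** 1.5


def sample_y(rho, u=None, rng=random):
    """Inverse-transform draw of Y; pass u to make the draw deterministic."""
    if u is None:
        u = rng.random()
    c = -(2.0 / 3.0) * math.log(rho)  # positive because 0 < rho < 1
    z = c * (rho * (1.0 - u)) ** (-2.0 / 3.0)
    x = lambert_w0(z) / c - 1.0
    n = max(1, math.ceil(x))
    # Correct possible off-by-one effects of rounding error.
    while n > 1 and cdf_g(n - 1, rho) >= u:
        n -= 1
    while cdf_g(n, rho) < u:
        n += 1
    return n
```

For instance, with $\rho = 0.5$ and $u = 0.9$ the smallest $n$ with $G(n) \ge u$ is $n = 2$, which is what `sample_y(0.5, u=0.9)` returns.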

4.2. Sampling Z

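Recall the density (25) of $Z$,
$$h(n) = \frac{1}{\sqrt{n+1}}\left(\sqrt{1+\frac{1}{n}} - 1\right) = \frac{1}{\sqrt{n}} - \frac{1}{\sqrt{n+1}}, \quad n = 1, 2, \ldots.$$
It is well known (for instance, see page 550 in Devroye [22]) that this is the density of the random variable $\lfloor U^{-2} \rfloor$, where $U$ is a uniform$(0,1)$ variate. Indeed, $P(\lfloor U^{-2} \rfloor \ge n) = P(U \le n^{-1/2}) = n^{-1/2}$, so sampling $Z$ requires a single uniform draw.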
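A minimal Python sketch of this sampler follows; the function names are hypothetical. It draws the uniform variate on $(0, 1]$ so that $U^{-2}$ is always finite.

```python
import math
import random


def density_h(n):
    """h(n) = (sqrt(1 + 1/n) - 1) / sqrt(n + 1) for n = 1, 2, ..."""
    return (math.sqrt(1.0 + 1.0 / n) - 1.0) / math.sqrt(n + 1.0)


def sample_z(u=None, rng=random):
    """Draw Z = floor(U^-2); pass u to make the draw deterministic."""
    if u is None:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return math.floor(u ** -2.0)
```

Because $Z$ has an infinite mean, occasional very large values are expected and are not a sign of an error.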

5. Applications of the LAEDM to Risk Measures

Risk measures are statistical indicators that are used by investors, financial institutions, and financial regulators for assessing investment risk. Their main purpose is to determine an amount of capital to keep in reserve in order to cope with risk. Here we consider risk measures for a collective risk model $S_N = \sum_{j=1}^{N} Y_j$, where
  • $Y_1, Y_2, \ldots$ are i.i.d. positive random variables representing the individual claims (or losses) at, for instance, an insurance (or financial) company.
  • $N \in \mathbb{N}_0 = \{0, 1, \ldots\}$ is a random variable designating the total number of claims (or losses) occurring during a certain time period. Commonly it is called the frequency of the claims.
  • $N$ and the $Y_j$s are independent.
Risk measures could be
  • Catastrophic risk, or the loss probability [23]
    $$P(S_N > x) \tag{27}$$
    for large levels of $x$. Such a case is very familiar to actuaries since many insurance policies include a deductible and reinsurance contracts which involve some level of retention from the insurer [24].
  • The value at risk (VaR) at confidence level $q \in (0,1)$:
    $$\mathrm{VaR}_q = \inf\{x : P(S_N \le x) \ge q\}.$$
  • The tail conditional expectation at level $q \in (0,1)$ [25,26]:
    $$\mathrm{TCE}_q = E[S_N \mid S_N > \mathrm{VaR}_q].$$
    This is also known as the expected shortfall, or the conditional value-at-risk. It is interpreted as the expected worst possible loss, given that this loss exceeds the value at risk. Typically in practice, $q$ is taken to be larger than $0.9$.
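The three risk measures above can be estimated from simulated values of $S_N$. The sketch below is one straightforward empirical version, assuming that samplers for $N$ and for $Y$ are available as zero-argument callables; the function names and this interface are hypothetical, not taken from the paper.

```python
import math


def simulate_aggregate(sample_n, sample_y, size):
    """Draw `size` values of S_N = Y_1 + ... + Y_N (S_N = 0 when N = 0)."""
    return [sum(sample_y() for _ in range(sample_n())) for _ in range(size)]


def empirical_risk_measures(samples, x, q):
    """Return empirical (P(S > x), VaR_q, TCE_q) from a list of losses."""
    n = len(samples)
    ordered = sorted(samples)
    loss_prob = sum(1 for s in ordered if s > x) / n
    # VaR_q: smallest order statistic whose empirical CDF is at least q.
    k = max(math.ceil(q * n), 1)
    var_q = ordered[k - 1]
    # TCE_q: average of the losses strictly above VaR_q.
    tail = [s for s in ordered if s > var_q]
    tce_q = sum(tail) / len(tail) if tail else float("nan")
    return loss_prob, var_q, tce_q
```

For tail probabilities at large levels $x$, a crude estimator of this kind needs many samples; the sketch is meant only to make the definitions concrete.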
For a case study, we chose the data from Bar-Lev and Ridder [6] to compute the loss probability (27) in a car insurance company. Typically for automobile insurance, only a small percentage of the policyholders will file claims in any given year. Hence, data sets for insurance risk modeling are highly zero-inflated, and moreover, they involve overdispersed distributions (see Lee [27] and references therein). The data that we considered were taken from a Swedish car insurance company and are publicly available [28,29]. These data satisfy the aforementioned properties (see Table 1 below). From Bar-Lev and Ridder [6], we obtained the frequency distributions of Abel, strict arcsine, and Takács EDMs. These are EDMs on the non-negative integers with cubic variance functions, and their distributions are zero-inflated and overdispersed. We took the claim distributions from gamma and inverse Gaussian distributions and distributions from the natural exponential family (NEF) generated by positive $\alpha$-stable random variables. The reason for taking these claim distributions is that each forms a natural exponential family also containing their convolutions. Our findings showed that implementing Abel, strict arcsine, or Takács for the frequency with any of the mentioned claim size distributions is significantly superior to the use of the classical Poisson or negative binomial frequency distributions, according to various goodness-of-fit metrics.
As a follow-up to our previous study, we now investigate whether the fourth EDM with cubic variance function, the large arcsine EDM, might be an alternative for modeling the frequency distribution. The data set is the same as the one in Bar-Lev and Ridder [6]: 630 observations $(n_i, s_i)$ where $n_i$ is the $i$-th frequency (sample of $N$) and $s_i$ the $i$-th observation of the aggregated claim (sample of $S_N$). The statistics of the data are shown in Table 1. Note that the individual claim data $(y_{ij})$ are not observed, but that their sample average can be computed:
$$\hat{m}_Y = \frac{\sum_i s_i}{\sum_i n_i}.$$
For the sample variance of the individual claims, we use the well-known identity for the variance of the aggregated sum $S_N$:
$$\mathrm{Var}(S_N) = E[N]\,\mathrm{Var}(Y) + \mathrm{Var}(N)\,(E[Y])^2.$$
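Solving the identity for $\mathrm{Var}(Y)$ gives the moment estimate of the claim variance. The following sketch applies it to the rounded frequency and aggregate statistics of Table 1; the function name is hypothetical.

```python
def claim_moments(mean_n, var_n, mean_s, var_s):
    """Moment estimates (E[Y], Var(Y)) from the moments of N and S_N."""
    mean_y = mean_s / mean_n  # equals sum(s_i) / sum(n_i)
    var_y = (var_s - var_n * mean_y ** 2) / mean_n
    return mean_y, var_y


# Rounded values from Table 1 (frequency N and aggregate S_N).
mean_y, var_y = claim_moments(70.60, 52181.5, 329.2, 1153532.3)
```

With these rounded inputs the estimate of $E[Y]$ agrees with the tabulated 4.663, while the variance estimate comes out a few units above the tabulated 265.3. The formula subtracts two numbers of order $10^6$, so rounding of the table entries may account for such a gap.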
From Table 1 we see that the claim frequency is overdispersed ($V/m = 52181.5/70.60 \gg 1$) and zero-inflated ($p_0 = 0.06349 \gg 2.189 \times 10^{-31} = e^{-m}$). The LAEDM parameters $(a, p, m)$ are estimated by matching the fraction of zeros $p_0$, the mean $m$, and the variance $V(m)$. The distribution of the claim size $Y$ is chosen from gamma, inverse Gaussian, and NEF positive $\alpha$-stable distributions. These are two-parameter distributions, estimated by matching two moments.
With these parameters, we have fit the frequency distribution and the claim distribution. Then, we ran simulations of aggregated claim sizes in these models and executed the chi-square test for goodness of fit (hypothesizing that the samples came from the same distribution). Furthermore, we show the histograms in Figure 2 and Q–Q plots of the data $S_N$ versus simulated $S_N$ in these three cases in Figure 3.
Table 2 summarizes the test results in terms of p-values.
As a comparison with classic modeling, we have fitted the negative binomial (NB) frequency distribution. When combining this with gamma, inverse Gaussian, and NEF-stable claim distributions, all three combinations gave p-values of the order $10^{-5}$ for the goodness of fit of the aggregated sum samples. Figure 4 shows the three Q–Q plots. Clearly, the NB distribution does not capture the skewness, and is outperformed by the LAEDM.

6. Statistical Modeling

In this section, we will investigate the usage of LAEDM distributions for fitting real-world count data. We consider only data that show the four properties of Proposition 4, i.e., those that are zero-inflated, overdispersed, skewed to the right, and leptokurtic. Many data sets fulfilling these conditions are available in areas such as actuarial sciences, labor economics, health economics, and behavioral science. We choose five data sets with values given in Table 3. Their origins and the model fitting analyses are given below in Section 6.1.
Their statistics are listed in Table 4, showing the required properties for the zero-inflation ($p_0 > e^{-m}$), the dispersion $\delta = V(m)/m > 1$, the skewness $\gamma_1 > 0$, and the excess kurtosis $\gamma_2 > 0$ (see Section 3.3).
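The four statistics can be computed directly from a frequency table. The sketch below uses population (divide-by-$n$) moments, which is an assumption about the convention, and a hypothetical function name. The example counts are those of data set 1 as read from Table 3.

```python
import math


def count_statistics(freq):
    """Statistics of count data, where freq[k] is the number of k's observed."""
    n = sum(freq)
    mean = sum(k * c for k, c in enumerate(freq)) / n

    def central(r):
        # r-th central moment with divisor n
        return sum(c * (k - mean) ** r for k, c in enumerate(freq)) / n

    var = central(2)
    return {
        "p0": freq[0] / n,                     # observed fraction of zeros
        "mean": mean,
        "poisson_p0": math.exp(-mean),         # zero probability under Poisson
        "delta": var / mean,                   # dispersion index
        "gamma1": central(3) / var ** 1.5,     # skewness
        "gamma2": central(4) / var ** 2 - 3,   # excess kurtosis
    }


set1 = [6984, 2452, 433, 100, 26, 5]
stats = count_statistics(set1)
```

For data set 1 this reproduces the first row of Table 4 up to rounding in the last digit.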

6.1. Model Fit Analysis

Given a data set, we estimate the mean parameter $m$ of the LAEDM distribution by the data average, and the other two parameters ($a$ and dispersion $p$) by the maximum likelihood method. Then, we compute the performance of using the LAEDM-$(a, p, m)$ distribution as a model for the data. In addition to the usual measures, Akaike information criterion (AIC) and the $\chi^2$ value (chi sq) with degrees of freedom (df) [30], we also computed the difference between the observed data frequencies and the expected frequencies under the fitted LAEDM distribution by their root mean squared error (RMSE). Finally, we also computed the difference of the empirical (data) distribution and the fitted LAEDM distribution by their Kullback–Leibler divergence (KL).
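As an illustration of these four measures, the sketch below computes them from observed cell counts and fitted cell probabilities. It uses the textbook definitions; the paper's exact conventions (pooling of sparse cells, treatment of the tail cell, direction of the KL divergence) are not spelled out here, so those choices, like the function name, are assumptions.

```python
import math


def fit_measures(observed, probs, n_params):
    """Return (AIC, chi-square, RMSE, KL) for counts versus fitted probabilities."""
    n = sum(observed)
    expected = [n * pr for pr in probs]
    # Log-likelihood of the grouped counts under the fitted model.
    loglik = sum(o * math.log(pr) for o, pr in zip(observed, probs) if o > 0)
    aic = 2 * n_params - 2 * loglik
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    rmse = math.sqrt(
        sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed)
    )
    # KL divergence from the empirical distribution to the fitted one.
    kl = sum(
        (o / n) * math.log((o / n) / pr)
        for o, pr in zip(observed, probs)
        if o > 0
    )
    return aic, chi_sq, rmse, kl
```

The degrees of freedom reported next to the chi-square value would then be the number of cells minus one minus the number of estimated parameters.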
For assessing the quality of these measures, we also computed these values when using other models for fitting. Because the LAEDM distributions have three parameters, we consider other three-parameter distributions: (a) distributions from the ABM and LM exponential dispersion models. We introduced these models in Bar-Lev and Ridder [8], and analyzed their fits for insurance and crash data in Bar-Lev and Ridder [9]. (b) Poisson–Tweedie distributions (PTD), introduced in Kokonendji et al. [14] and analyzed for fitting purposes in Saha et al. [18]. These distributions are based on Poisson mixing of distributions from the Tweedie exponential dispersion model, and in this way, they are flexible in the sense that they include classical two-parameter distributions such as negative binomial, Poisson-inverse Gaussian, and the geometric-Poisson distributions. (c) Zero-inflated negative binomial distribution (ZINB), which is a traditional model. To illustrate that two-parameter distributions typically perform worse, we included the Poisson-inverse Gaussian (PIG) distribution which has been used for modeling insurance data in Willmot [15].
data set 1
Automobile claim data from the Central African Republic, 1984 [14,16], presented in Table 5. Surprisingly, all models have difficulty fitting these data except the LAEDM, which, in fact, gave an excellent fit. The statistics of these data show a minor zero-inflation ($p_0 = 0.6984 > 0.6875 = e^{-m}$) and a minor overdispersion ($\delta = 1.127$). We will consider data sets with larger zero-inflation and overdispersion for which other models perform just as well as LAEDM.
data set 2
Single-vehicle roadway departure fatal crashes on rural two-lane horizontal curves in Texas 2003–2008 [31], presented in Table 6. The best fit to these data was the Poisson–Tweedie distribution, while our LAEDM performs well and much better than the other models. Again, zero-inflation and overdispersion are minor, though larger than in data set 1.
data set 3
The counts of cysts of kidneys using steroids in 2010 [32], presented in Table 7. These data show a large zero-inflation ($p_0 = 0.5909 > 0.2488 = e^{-m}$) and large overdispersion ($\delta = 4.394$). Here, we see that the traditional zero-inflated negative binomial distribution performs just as well as the Poisson–Tweedie model (and, actually, better than the distribution proposed in El-Morshedy et al. [32]). These two are clearly the best. The optimal $a$ parameter of the LAEDM is rather large, making it the same distribution as ABM with a power parameter of two.
data set 4
The length of stays after admission in a USA hospital among the elderly population, aged 65 years or more in 1997 [33], presented in Table 8. The data show moderate zero-inflation and overdispersion, but a large excess kurtosis ($\gamma_2 = 22.80$). All optimal distributions from the exponential dispersion models (LAEDM, PTD, ABM, LM) perform about the same (and about the same as proposed in Bhati and Bakouch [33]). The traditional ZINB and PIG perform worse.
data set 5
Falls of older people in a randomized controlled study in Sydney 2007 [34], presented in Table 9. Again, these data have moderate zero-inflation and overdispersion, now also with moderate excess kurtosis. All six models of our comparison analysis perform about the same, including the two-parameter PIG.

7. Conclusions

In this study, we gave a comprehensive analysis of a three-parameter distribution (LAEDM) on the non-negative integers as an alternative to the classical Poisson and negative binomial distributions and their zero-inflated (or hurdle) variations. The main feature of LAEDM is that it is introduced by the variance function of an exponential dispersion model. This makes it easy to derive statistical properties such as zero-inflation, overdispersion, right skewness, and leptokurtosis. With these properties, the LAEDM distributions are well suited to fit insurance, accident, or crash data, as these typically show the mentioned statistics. Indeed, we showed that for small sets of count data from insurance and accidents, the fitted LAEDM performs very well and much better than traditional distributional models when zero-inflation and overdispersion are present (but not too much). We also compared the model fitting of the LAEDM distributions with other distributions from exponential dispersion models. The results of this comparison showed that LAEDM performs as well as the other distributions in most cases. An advantage of LAEDM distributions is that they are easy to sample from. This is useful for larger stochastic problems such as insurance risk measures in an insurance risk model, as we showed. In conclusion, we believe that the LAEDM distributions form a useful tool for modeling and analyzing data that show zero-inflation, overdispersion, right skewness, and leptokurtosis. Future work concerns the use of the LAEDM distributions for analyzing data in the context of a generalized linear model, in which, for instance, the mean is linked to covariates.

Author Contributions

Conceptualization, S.K.B.-L.; methodology, S.K.B.-L. and A.R.; software, A.R.; validation, S.K.B.-L. and A.R.; formal analysis, S.K.B.-L. and A.R.; investigation, S.K.B.-L. and A.R.; resources, S.K.B.-L. and A.R.; data curation, S.K.B.-L. and A.R.; writing—original draft preparation, S.K.B.-L.; writing—review and editing, A.R.; funding acquisition, A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by STAR (Stochastics—Theoretical and Applied Research) and NWO (Netherlands Organization for Scientific Research) grant number 040.11.711.

Data Availability Statement

The automobile insurance data that we used in Section 5 are publicly available at http://www.statsci.org/data/.

Acknowledgments

The authors thank the reviewers for their comments and suggestions to improve our paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 2

We give here the computations for obtaining the LAEDM probability mass function. Basically, it can be summarized as computing the mean value parameterization for the even and odd terms.
The even terms
$$f_{m,p}(2n) = \mu_p(2n)\, e^{2n\psi_p(m)}\, e^{\phi_p(m)},$$
where $\mu_p(2n)$ is given in (16), the primitives $\psi_p(m)$ and $\phi_p(m)$ are given in (15), and the parameters $B, C, D$ are given in (19).
Start with
$$e^{2n\psi_p(m)} = \exp\left\{2n\left[\log(a\,m\,p) - \tfrac{1}{2}\log\big((1+a^2)m^2 + 2a^2 m p + a^2 p^2\big) - aB\right]\right\} = \frac{(ap)^{2n}\, m^{2n}}{\big((1+a^2)m^2 + 2a^2 m p + a^2 p^2\big)^{n}}\, e^{-2naB}.$$
Go on with
$$\begin{aligned}
\frac{m^{2n}}{\big((1+a^2)m^2 + 2a^2 m p + a^2 p^2\big)^{n}}
&= \left(\frac{m^2}{(1+a^2)m^2 + 2a^2 m p + a^2 p^2}\right)^{n}
= \left(\frac{1}{(1+a^2) + 2a^2\frac{p}{m} + a^2\frac{p^2}{m^2}}\right)^{n} \\
&= (1+a^2)^{-n}\left(1 + 2\,\frac{p}{m}\,\frac{a^2}{1+a^2} + \frac{p^2}{m^2}\,\frac{a^2}{1+a^2}\right)^{-n}
= (1+a^2)^{-n}\left(1 + \frac{a^2}{1+a^2}\,\frac{p}{m}\left(2 + \frac{p}{m}\right)\right)^{-n} \\
&= (1+a^2)^{-n}(1+D)^{-n}.
\end{aligned}$$
Next, note that
$$e^{\phi_p(m)} = e^{-apB}.$$
Hence, collecting all ingredients,
$$\begin{aligned}
f_{m,p}(2n) &= \mu_p(2n) \times e^{2n\psi_p(m)} \times e^{\phi_p(m)} \\
&= \frac{p}{p+2n}\,\frac{1}{(ap)^{2n}}\,\frac{1}{(2n)!}\prod_{k=0}^{n-1}\big(a^2(2n+p)^2 + 4k^2\big) \times \frac{(ap)^{2n}}{(1+a^2)^{n}(1+D)^{n}}\, e^{-2naB} \times e^{-apB} \\
&= e^{-apB}\,\frac{p}{p+2n}\,\frac{1}{C^{n}}\,\frac{1}{(2n)!}\prod_{k=0}^{n-1}\big(a^2(2n+p)^2 + 4k^2\big).
\end{aligned}$$
The odd terms
$$e^{(2n+1)\psi_p(m)} = \exp\left\{(2n+1)\left[\log(a\,m\,p) - \tfrac{1}{2}\log\big((1+a^2)m^2 + 2a^2 m p + a^2 p^2\big) - aB\right]\right\} = \frac{(ap)^{2n+1}\, m^{2n+1}}{\big((1+a^2)m^2 + 2a^2 m p + a^2 p^2\big)^{(2n+1)/2}}\, e^{-(2n+1)aB},$$
with
$$\frac{m^{2n+1}}{\big((1+a^2)m^2 + 2a^2 m p + a^2 p^2\big)^{(2n+1)/2}} = \big((1+a^2)(1+D)\big)^{-n}\,\big((1+a^2)(1+D)\big)^{-1/2}.$$
Thus,
$$\begin{aligned}
f_{m,p}(2n+1) &= \mu_p(2n+1) \times e^{(2n+1)\psi_p(m)} \times e^{\phi_p(m)} \\
&= \frac{ap}{(ap)^{2n+1}}\,\frac{1}{(2n+1)!}\prod_{k=0}^{n-1}\big(a^2(2n+1+p)^2 + (2k+1)^2\big) \times \frac{(ap)^{2n+1}}{\big((1+a^2)(1+D)\big)^{n}\big((1+a^2)(1+D)\big)^{1/2}}\, e^{-(2n+1)aB} \times e^{-apB} \\
&= a\,p\,e^{-apB}\,\frac{1}{\sqrt{C}}\,\frac{1}{C^{n}}\,\frac{1}{(2n+1)!}\prod_{k=0}^{n-1}\big(a^2(2n+1+p)^2 + (2k+1)^2\big).
\end{aligned}$$

Appendix B. Proof of Proposition 3

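(i)
The even terms
The following inequalities are elementary analysis.
  • By Stirling approximation, for any $n \ge 1$,
    $$n! \ge \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}.$$
  • Differential calculus gives for any $n \ge 1$,
    $$\left(\frac{2n+p}{2n}\right)^{2n} = \left(1 + \frac{p}{2n}\right)^{2n} \le e^{p}.$$
  • Because all parameters are positive,
    $$\frac{2}{a(2n+p)} \le \frac{1}{na}.$$
  • By Riemann integration,
    $$\int_0^{1/a}\log(1+x^2)\,dx \ge \sum_{k=0}^{n-1}\frac{1}{na}\log\left(1 + \left(\frac{1}{na}\,k\right)^2\right) \ge \sum_{k=0}^{n-1}\frac{2}{2na}\log\left(1 + \left(\frac{2}{a(2n+p)}\,k\right)^2\right),$$
    where
    $$\int_0^{1/a}\log(1+x^2)\,dx = \Big[x\log(1+x^2) - 2(x - \arctan x)\Big]_0^{1/a} = \frac{1}{a}\log\left(1 + \frac{1}{a^2}\right) - \frac{2}{a} + 2\arctan\frac{1}{a} = \frac{2}{a}\,E.$$
  • Using the previous inequalities,
    $$\sum_{k=0}^{n-1}\log\left(1 + \left(\frac{2}{a(2n+p)}\,k\right)^2\right) \le \frac{2na}{2}\int_0^{1/a}\log(1+x^2)\,dx = 2nE.$$
Now, collect all bounds in the expression of the probability $f_{m,p}(2n)$ given in (18), as follows.
$$\begin{aligned}
f_{m,p}(2n) &= e^{-apB}\,\frac{p}{p+2n}\,\frac{1}{C^{n}}\,\frac{1}{(2n)!}\prod_{k=0}^{n-1}\big(a^2(2n+p)^2 + (2k)^2\big) \\
&= e^{-apB}\,\frac{p}{p+2n}\,\frac{1}{C^{n}}\,\frac{1}{(2n)!}\,a^{2n}(2n+p)^{2n}\prod_{k=0}^{n-1}\left(1 + \left(\frac{2k}{a(2n+p)}\right)^2\right) \\
&\le e^{-apB}\,\frac{p}{p+2n}\,\frac{1}{C^{n}}\,\frac{1}{\sqrt{2\pi(2n)}}\left(\frac{e}{2n}\right)^{2n} a^{2n}(2n+p)^{2n}\,e^{2nE} \\
&= e^{-apB}\,\frac{p}{\sqrt{2\pi}}\,\frac{2n}{p+2n}\,\frac{1}{2n\sqrt{2n}}\left(\frac{2n+p}{2n}\right)^{2n}\left(\frac{a e}{\sqrt{C}}\,e^{E}\right)^{2n} \\
&\le e^{-apB}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{1}{2n\sqrt{2n}}\,\rho^{2n} = \gamma\,(2n)^{-3/2}\rho^{2n}.
\end{aligned}$$
The odd terms
The analysis is similar.
$$\begin{aligned}
f_{m,p}(2n+1) &= a\,p\,e^{-apB}\,\frac{1}{\sqrt{C}}\,\frac{1}{C^{n}}\,\frac{1}{(2n+1)!}\prod_{k=0}^{n-1}\big(a^2(2n+1+p)^2 + (2k+1)^2\big) \\
&= a\,p\,e^{-apB}\,\frac{1}{\sqrt{C}}\,\frac{1}{C^{n}}\,\frac{1}{(2n+1)!}\,a^{2n}(2n+1+p)^{2n}\prod_{k=0}^{n-1}\left(1 + \left(\frac{2k+1}{a(2n+1+p)}\right)^2\right) \\
&\le p\,e^{-apB}\,\frac{1}{\sqrt{C}}\,\frac{1}{C^{n}}\,\frac{1}{\sqrt{2\pi(2n+1)}}\left(\frac{e}{2n+1}\right)^{2n+1} a^{2n+1}(2n+1+p)^{2n}\,e^{(2n+1)E} \\
&= e^{-apB}\,\frac{p}{\sqrt{2\pi}}\,\frac{1}{(2n+1)\sqrt{2n+1}}\left(\frac{2n+1+p}{2n+1}\right)^{2n}\frac{(a e)^{2n+1}\,e^{(2n+1)E}}{C^{n}\sqrt{C}} \\
&\le e^{-apB}\,\frac{p\,e^{p}}{\sqrt{2\pi}}\,\frac{1}{(2n+1)\sqrt{2n+1}}\,\rho^{2n+1} = \gamma\,(2n+1)^{-3/2}\rho^{2n+1}.
\end{aligned}$$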
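(ii)
All inequalities above become asymptotics as $n \to \infty$. For instance,
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}, \quad n \to \infty,$$
and
$$\left(\frac{2n+p}{2n}\right)^{2n} = \left(1 + \frac{p}{2n}\right)^{2n} \sim e^{p}, \quad n \to \infty,$$
where $\sim$ indicates that the ratio of the left and right part tends to 1 as $n \to \infty$.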
(iii)
The geometric parameter $\rho$
Recall
$$\rho = \frac{a\,e^{1+E}}{\sqrt{C}},$$
where $B, C, D$, and $E$ are given in (19)–(20).
Clearly, $\rho$ is positive. We prove that $\rho^2 < 1$. Solve
$$\begin{aligned}
\rho^2 &= \frac{(a e)^2}{C}\,e^{2E} = \frac{a^2}{1+a^2} \times \frac{e^2}{1+D} \times e^{2E - 2aB} \\
&= \frac{a^2}{1+a^2} \times \frac{e^2}{1 + \frac{a^2}{1+a^2}\,\frac{p}{m}\left(2 + \frac{p}{m}\right)} \times \exp\left[\log\left(1 + \frac{1}{a^2}\right) - 2 + 2a\arctan\frac{1}{a} - 2a\left(\arctan\frac{(1+a^2)m + a^2 p}{ap} - \arctan a\right)\right] \\
&= \frac{a^2}{1+a^2} \times \frac{e^2}{1 + \frac{a^2}{1+a^2}\,\frac{p(2m+p)}{m^2}} \times \frac{a^2+1}{a^2}\,e^{-2}\exp\left[2a\left(\arctan\frac{1}{a} - \arctan\frac{(1+a^2)m + a^2 p}{ap} + \arctan a\right)\right] \\
&= \frac{1}{1 + \frac{a^2}{1+a^2}\,\frac{p(2m+p)}{m^2}}\,\exp\left[2a\left(\frac{\pi}{2} - \arctan\frac{(1+a^2)m + a^2 p}{ap}\right)\right].
\end{aligned}$$
Let
$$x = \frac{a^2}{1+a^2}\,\frac{p}{m}. \tag{A1}$$
We obtain
$$\rho^2 = \frac{1}{1 + x(2 + p/m)}\,\exp\left[2a\left(\frac{\pi}{2} - \arctan\big(a(1 + 1/x)\big)\right)\right] = \frac{1}{1 + x\big(2 + x(1+a^2)/a^2\big)}\,\exp\left[2a\left(\frac{\pi}{2} - \arctan\big(a(1 + 1/x)\big)\right)\right]. \tag{A2}$$
  • Fix $a$ and $p$, then $\rho^2$ as function of $m$ is increasing. This follows by differentiating:
    $$\frac{d}{dm}\rho^2 = \frac{2a^2(1+a^2)\,x^2}{m\,\big(x^2 + a^2(1+x)^2\big)^2}\,\exp\left[2a\left(\frac{\pi}{2} - \arctan\big(a(1 + 1/x)\big)\right)\right] > 0.$$
    From (A1), we obtain $\lim_{m \to \infty} x = 0$, hence,
    $$\lim_{m \to \infty}\rho^2 = \lim_{x \downarrow 0}\frac{1}{1 + x\big(2 + x(1+a^2)/a^2\big)} \times \lim_{x \downarrow 0}\exp\left[2a\left(\frac{\pi}{2} - \arctan\big(a(1 + 1/x)\big)\right)\right] = 1 \times \exp(0) = 1.$$
    Conclude, for any pair $(a, p)$, the geometric parameter satisfies $0 < \rho < 1$ for all $m > 0$.
  • In the same way, fix $a$ and $m$, and consider $\rho^2$ as function of $p$.
    $$\frac{d}{dp}\rho^2 = -\frac{2a^2(1+a^2)\,x^2}{p\,\big(x^2 + a^2(1+x)^2\big)^2}\,\exp\left[2a\left(\frac{\pi}{2} - \arctan\big(a(1 + 1/x)\big)\right)\right] < 0.$$
    Thus, $\rho^2$ is decreasing (as function of $p$) with (see (A1)–(A2)),
    $$\lim_{p \downarrow 0} x = 0 \;\Rightarrow\; \lim_{p \downarrow 0}\rho^2 = \lim_{x \downarrow 0}\rho^2 = 1,$$
    which gives $0 < \rho < 1$ for all $p > 0$ (and any pair $(a, m)$).
  • Finally, fix $m$ and $p$ and consider $\rho^2$ as function of $a$. The derivative with respect to $a$ is a large expression which is negative for all positive $a$. Thus, $\rho^2$ is decreasing (as function of $a$), with (see (A1)–(A2)),
    $$\lim_{a \downarrow 0} x = 0 \;\Rightarrow\; \lim_{a \downarrow 0}\rho^2 = \lim_{x \downarrow 0}\rho^2 = 1,$$
    which gives $0 < \rho < 1$ for all $a > 0$ (and any pair $(p, m)$).
Conclude that $0 < \rho < 1$ for any triple $(a, p, m)$ of the LAEDM parameters.
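The conclusion can be checked numerically. The sketch below evaluates $\rho^2$ from the closed form in $x$ derived above and scans a small parameter grid; the function name and the grid are illustrative choices, and the check is a sanity test, not a substitute for the proof.

```python
import math


def rho_squared(a, p, m):
    """rho^2 as a function of the LAEDM parameters, via x = a^2 p / ((1+a^2) m)."""
    x = a * a / (1 + a * a) * p / m
    base = 1 + x * (2 + p / m)
    angle = math.pi / 2 - math.atan(a * (1 + 1 / x))
    return math.exp(2 * a * angle) / base


# Scan a grid of parameter triples and record the range of rho^2.
grid = [(a, p, m)
        for a in (0.2, 1.0, 2.3, 5.0)
        for p in (0.5, 1.5, 4.0)
        for m in (0.1, 1.0, 3.7, 50.0)]
values = [rho_squared(a, p, m) for a, p, m in grid]
print(min(values), max(values))
```

On this grid all values lie strictly between 0 and 1, and for fixed $(a, p)$ they increase with $m$, in line with the first bullet above.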

References

  1. Barndorff-Nielsen, O. Information and Exponential Families in Statistical Theory; Wiley: Chichester, UK, 1978.
  2. Letac, G.; Mora, M. Natural real exponential families with cubic variance functions. Ann. Stat. 1990, 18, 1–37.
  3. Jørgensen, B. The Theory of Exponential Dispersion Models. In Monographs on Statistics and Probability; Chapman and Hall: London, UK, 1997; Volume 76.
  4. Bar-Lev, S.K.; Kokonendji, C.C. On the mean value parameterization of natural exponential families—A Revisited Review. Math. Methods Stat. 2017, 26, 159–175.
  5. Morris, C.N. Natural exponential families with quadratic variance functions. Ann. Stat. 1982, 10, 65–80.
  6. Bar-Lev, S.K.; Ridder, A. Monte Carlo methods for insurance risk computation. Int. J. Stat. Probab. 2019, 8, 54–74.
  7. Fosam, E.B.; Shanbhag, D.N. An extended Laha-Lukacs characterization results based on a regression property. J. Stat. Plan. Inference 1997, 63, 173–186.
  8. Bar-Lev, S.K.; Ridder, A. New exponential dispersion models for count data—The ABM and LM classes. ESAIM Probab. Stat. 2021, 25, 31–52.
  9. Bar-Lev, S.K.; Ridder, A. Exponential dispersion models for overdispersed zero-inflated count data. Commun. Stat.-Simul. Comput. 2021, 1–19.
  10. Jørgensen, B.; Kokonendji, C.C. Discrete dispersion models and their Tweedie asymptotics. AStA Adv. Statistical Anal. 2016, 100, 43–78.
  11. Awad, Y.; Bar-Lev, S.K.; Makov, U. A new class of counting distributions embedded in the Lee-Carter model for mortality projections: A Bayesian Approach. Risks 2022, 10, 111.
  12. Jørgensen, B. Exponential dispersion models (with discussion). J. R. Stat. Soc. Ser. B 1987, 49, 127–162.
  13. Nelder, J.A.; Wedderburn, R.W.M. Generalized linear models. J. R. Stat. Soc. Ser. A 1972, 135, 370–384.
  14. Kokonendji, C.C.; Dossou-Gbété, S.; Demétrio, C.G.B. Some discrete exponential dispersion models: Poisson-Tweedie and Hinde-Demétrio classes. Stat. Oper. Res. Trans. 2004, 28, 201–214.
  15. Willmot, G. The Poisson-Inverse Gaussian distribution as an alternative to the negative binomial. Scand. Actuar. J. 1987, 3–4, 113–127.
  16. Kokonendji, C.C.; Khoudar, M. On strict arcsine distribution. Commun. Stat.-Theory Methods 2004, 33, 993–1006.
  17. Debrabant, B.; Halekoh, U.; Bonat, W.H.; Hansen, D.L.; Hjelmborg, J.; Lauritsen, J. Identifying traffic accident black spots with Poisson-Tweedie models. Accid. Anal. Prev. 2018, 111, 147–154.
  18. Saha, D.; Alluri, P.; Dumbaugh, E.; Gan, A. Application of the Poisson-Tweedie distribution in analyzing crash frequency data. Accid. Anal. Prev. 2020, 137, 105456.
  19. El-Shaarawi, A.H.; Zhu, R.; Joe, H. Modelling species abundance using the Poisson-Tweedie family. Environmetrics 2011, 22, 152–164.
  20. Signorelli, M.; Spitali, P.; Tsonaka, R. Poisson-Tweedie mixed-effects model: A flexible approach for the analysis of longitudinal RNA-seq data. Stat. Model. 2021, 21, 520–545.
  21. Abid, R.; Kokonendji, C.C.; Masmoudi, A. On Poisson-exponential-Tweedie models for ultra-overdispersed count data. AStA Adv. Statistical Anal. 2021, 105, 1–23.
  22. Devroye, L. Non-Uniform Random Variate Generation; Springer: New York, NY, USA, 1986.
  23. Kunreuther, H.; Novemsky, N.; Kahneman, D. Making low probabilities useful. J. Risk Uncertain. 2001, 23, 103–120.
  24. Kaas, R.; Goovaerts, M.; Dhaene, J.; Denuit, M. Modern Actuarial Risk Theory, 2nd ed.; Springer: Heidelberg, Germany, 2008.
  25. Furman, E.; Landsman, Z. On some risk-adjusted tail-based premium calculation principles. J. Actuar. Pract. 2006, 13, 175–190.
  26. McNeil, A.J.; Frey, R.; Embrechts, P. Quantitative Risk Management: Concepts, Techniques and Tools, Revised ed.; Princeton University Press: Princeton, NJ, USA, 2015.
  27. Lee, S. Addressing imbalanced insurance data through zero-inflated Poisson regression boosting. ASTIN Bull. 2021, 51, 27–55.
  28. Hallin, M.; Ingenbleek, J.-F. The Swedish automobile portfolio in 1977. Scand. Actuar. J. 1983, 1, 49–64.
  29. Smyth, G.K. Third Party Motor Insurance in Sweden. Australasian Data and Story Library (OzDASL). 2011. Available online: http://www.statsci.org/data/ (accessed on 17 August 2022).
  30. Hilbe, J.M. Modeling Count Data; Cambridge University Press: Cambridge, UK, 2014.
  31. Lord, D.; Geedipally, S.R. The negative binomial-Lindley distribution as a tool for analyzing crash data characterized by a large amount of zeros. Accid. Anal. Prev. 2011, 43, 1738–1742.
  32. El-Morshedy, M.; Eliwa, M.S.; Nagy, H. A new two-parameter exponentiated discrete Lindley distribution: Properties, estimation and applications. J. Appl. Stat. 2020, 47, 354–375.
  33. Bhati, D.; Bakouch, H.S. A new infinitely divisible discrete distribution with applications to count data modeling. Commun. Stat.-Theory Methods 2019, 48, 1401–1416.
  34. Ullah, S.; Finch, C.F.; Day, L. Statistical modelling for fall count data. Accid. Anal. Prev. 2010, 42, 384–392.
Figure 1. Histogram of the simulated data and their expected numbers.
Figure 2. Histograms of the aggregated claim observed data and 2000 simulated samples (normed to form pdf’s), using LAEDM frequency.
Figure 3. Q–Q plots of the aggregated claim observed data and 2000 simulated samples, using LAEDM frequency.
Figure 4. Q–Q plots of the aggregated claim observed data and 2000 simulated samples, using NB frequency.
Table 1. Statistics of insurance data.
| Variable | Zeros $p_0$ | Average $m$ | Variance $V$ |
|---|---|---|---|
| Frequency $N$ | 0.06349 | 70.60 | 52,181.5 |
| Aggregate $S_N$ | | 329.2 | 1,153,532.3 |
| Claim $Y$ | | 4.663 | 265.3 |
Table 2. p-values of the fitted models.
| Frequency Model | Claim Model | p-Value |
|---|---|---|
| LAEDM | Gamma | 0.3709 |
| LAEDM | Inverse Gaussian | 0.3646 |
| LAEDM | NEF Stable | 0.6042 |
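Table 3. Five data sets.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Set 1 | 6984 | 2452 | 433 | 100 | 26 | 5 | | | | | | |
| Set 2 | 29,087 | 2952 | 464 | 108 | 40 | 9 | 5 | 2 | 3 | 1 | 1 | |
| Set 3 | 65 | 14 | 10 | 6 | 4 | 2 | 2 | 2 | 1 | 1 | 1 | 2 |
| Set 4 | 3541 | 599 | 176 | 48 | 20 | 12 | 5 | 1 | 4 | | | |
| Set 5 | 256 | 54 | 14 | 10 | 1 | 2 | | | | | | |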
Table 4. The statistics of the data sets.
| | $p_0$ | $m$ | $e^{-m}$ | $\delta$ | $\gamma_1$ | $\gamma_2$ |
|---|---|---|---|---|---|---|
| Set 1 | 0.6984 | 0.3747 | 0.6875 | 1.127 | 2.054 | 5.455 |
| Set 2 | 0.8903 | 0.1376 | 0.8714 | 1.485 | 5.119 | 45.25 |
| Set 3 | 0.5909 | 1.391 | 0.2488 | 4.394 | 2.230 | 4.807 |
| Set 4 | 0.8037 | 0.2960 | 0.7438 | 1.882 | 3.963 | 22.80 |
| Set 5 | 0.7596 | 0.3739 | 0.6881 | 1.742 | 2.748 | 8.786 |
Table 5. Performance for fitting models of data set 1.
| Model | AIC | chi sq | df | RMSE | KL |
|---|---|---|---|---|---|
| LAEDM | 15,943.9 | 0.2596 | 1 | 3.004 | 0.0002498 |
| PTD | 16,028.1 | 140.6 | 1 | 72.75 | 0.004459 |
| ABM | 15,965.5 | 25.65 | 1 | 53.15 | 0.001332 |
| LM | 15,963.1 | 22.45 | 1 | 52.50 | 0.001208 |
| ZINB | 15,967.6 | 28.22 | 1 | 53.00 | 0.001432 |
| PIG | 15,961.2 | 22.70 | 2 | 52.64 | 0.001214 |
Table 6. Performance for fitting models of data set 2.
| Model | AIC | chi sq | df | RMSE | KL |
|---|---|---|---|---|---|
| LAEDM | 27,062.8 | 6.183 | 3 | 6.177 | 0.0001844 |
| PTD | 27,058.0 | 2.868 | 3 | 4.764 | 0.0001123 |
| ABM | 27,075.7 | 20.10 | 3 | 25.99 | 0.0003818 |
| LM | 27,059.6 | 4.509 | 3 | 10.65 | 0.0001361 |
| ZINB | 27,105.2 | 58.17 | 3 | 38.70 | 0.0008345 |
| PIG | 27,062.6 | 8.294 | 4 | 14.56 | 0.0002120 |
Table 7. Performance for fitting models of data set 3.
| Model | AIC | chi sq | df | RMSE | KL |
|---|---|---|---|---|---|
| LAEDM | 343.5 | 1.900 | 2 | 1.713 | 0.04392 |
| PTD | 340.0 | 0.1371 | 2 | 0.6587 | 0.02827 |
| ABM | 343.5 | 1.900 | 2 | 1.713 | 0.04392 |
| LM | 347.6 | 4.454 | 2 | 2.628 | 0.06257 |
| ZINB | 340.1 | 0.1217 | 2 | 0.6482 | 0.02857 |
| PIG | 345.7 | 4.755 | 3 | 2.763 | 0.06333 |
Table 8. Performance for fitting models of data set 4.
| Model | AIC | chi sq | df | RMSE | KL |
|---|---|---|---|---|---|
| LAEDM | 6021.1 | 3.533 | 4 | 5.428 | 0.001268 |
| PTD | 6021.1 | 3.504 | 4 | 5.885 | 0.001264 |
| ABM | 6020.8 | 3.105 | 4 | 5.434 | 0.001235 |
| LM | 6020.9 | 3.091 | 4 | 5.932 | 0.001240 |
| ZINB | 6025.2 | 8.303 | 4 | 7.386 | 0.001735 |
| PIG | 6020.5 | 4.715 | 5 | 9.994 | 0.001421 |
Table 9. Performance for fitting models of data set 5.
| Model | AIC | chi sq | df | RMSE | KL |
|---|---|---|---|---|---|
| LAEDM | 542.1 | 3.383 | 1 | 2.094 | 0.008977 |
| PTD | 541.8 | 2.979 | 1 | 2.303 | 0.008497 |
| ABM | 542.1 | 3.382 | 1 | 2.093 | 0.008977 |
| LM | 542.7 | 3.941 | 1 | 2.190 | 0.009763 |
| ZINB | 541.8 | 3.023 | 1 | 2.256 | 0.008541 |
| PIG | 541.1 | 4.472 | 2 | 2.366 | 0.01043 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Bar-Lev, S.K.; Ridder, A. The Large Arcsine Exponential Dispersion Model—Properties and Applications to Count Data and Insurance Risk. Mathematics 2022, 10, 3715. https://doi.org/10.3390/math10193715
