Article

Frequency and Severity Dependence in the Collective Risk Model: An Approach Based on Sarmanov Distribution

by Catalina Bolancé 1,† and Raluca Vernic 2,*,†
1 Department of Econometrics, Riskcenter-IREA University of Barcelona, Av. Diagonal, 690, 08034 Barcelona, Spain
2 Faculty of Mathematics and Computer Science, Ovidius University of Constanta, 124 Mamaia Blvd., Constanta, and Gheorghe Mihoc-Caius Iacob Institute of Mathematical Statistics and Applied Mathematics of the Romanian Academy, Calea 13 Septembrie 13, 050711 Bucharest, Romania
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2020, 8(9), 1400; https://doi.org/10.3390/math8091400
Submission received: 22 July 2020 / Revised: 17 August 2020 / Accepted: 18 August 2020 / Published: 21 August 2020
(This article belongs to the Special Issue Multivariate Sarmanov Distributions and Applications)

Abstract

In actuarial mathematics, the claims of an insurance portfolio are often modeled using the collective risk model, which consists of a random number of claims of independent, identically distributed (i.i.d.) random variables (r.v.s) that represent cost per claim. To facilitate computations, there is a classical assumption of independence between the random number of such random variables (i.e., the claims frequency) and the random variables themselves (i.e., the claim severities). However, recent studies showed that, in practice, this assumption does not always hold, hence, introducing dependence in the collective model becomes a necessity. In this sense, one trend consists of assuming dependence between the number of claims and their average severity. Alternatively, we can consider heterogeneity between the individual cost of claims associated with a given number of claims. Using the Sarmanov distribution, in this paper we aim at introducing dependence between the number of claims and the individual claim severities. As marginal models, we use the Poisson and Negative Binomial (NB) distributions for the number of claims, and the Gamma and Lognormal distributions for the cost of claims. The maximum likelihood estimation of the proposed Sarmanov distribution is discussed. We present a numerical study using a real data set from a Spanish insurance portfolio.

1. Introduction

The collective risk model is a classical actuarial risk model consisting of the sum of a random number of independent, identically distributed (i.i.d.) random variables (r.v.s) that represent costs. To facilitate computations related to this model, it is classically assumed that the random number of such random variables (i.e., the claim frequency) is independent of the random variables themselves (i.e., the claim severities). However, studies on real data have revealed, in several cases, a dependence that should be taken into account because it can affect important actuarial quantities such as premiums and ruin probabilities. Therefore, alternative approaches incorporate dependence between the number of claims and their average severity; see, for example, Erhardt and Czado [1], Czado et al. [2], Krämer et al. [3], Lee and Shi [4], or Oh et al. [5]. In this paper, we propose the bivariate Sarmanov distribution to analyze the joint behavior of the number of claims and of each individual claim amount, instead of their average; i.e., we consider heterogeneity between the claim amounts associated with each number of claims. Consequently, we need to work with all the individual claim amounts separately, and some analytical results become complex, such as the distribution of the aggregate claims and its moments, which are fundamental in insurance for calculating the premium.
Driven by an increasing need for flexible multivariate distributions, the Sarmanov distribution has recently gained interest in the actuarial literature and, fitted to real insurance data in its bivariate and trivariate forms, has provided better fits than other distributions, including copula-based ones. In this sense, we mention its applications in modeling continuous claim sizes (see [6]), in modeling discrete claim frequencies (see [7,8]), in the evaluation of ruin probabilities (see [9,10]), and in capital allocation (see [11,12,13]).
Therefore, in this work, we use the Sarmanov distribution to model the joint distribution of the frequency and of the individual severity of claims, and we deduce the moments of the distribution of the resulting aggregate claims in the collective model. In Section 2, we present general theoretical results. In Section 3, we study some particular distributions that are commonly used in insurance, i.e., the Poisson and Negative Binomial (NB) distributions for frequency and the Gamma and Lognormal distributions for severity. A numerical example is presented in Section 4: on a real data set, we compare the bivariate Sarmanov distribution relating the frequency and the individual severity with the simpler case in which the claim amounts are all assumed equal to the mean cost per policyholder; this last assumption eliminates the heterogeneity within each insured, and has been used in the alternative works cited in the first paragraph of this introduction. Finally, we conclude in Section 5. All the proofs are given in Appendix A.

2. Collective Model with Frequency Dependent on the Individual Claims

2.1. Introducing Sarmanov Dependence

If $N$ denotes the r.v. number of claims from a certain portfolio and $X_1, X_2, \ldots, X_N$ the corresponding claim amounts, then the resulting aggregate claims can be represented by the collective model as
$$S = \sum_{j=1}^{N} X_j, \tag{1}$$
where $S = 0$ when $N = 0$. The usual assumptions under which this model is considered are $X_1, X_2, \ldots, X_N$ i.i.d. positive r.v.s, independent of the r.v. $N$. If, in particular, we assume that $X_1 = X_2 = \cdots = X_N = \bar{X}$, where $\bar{X}$ is the mean cost per policyholder, we obtain a simpler representation of the collective risk model, for which $S = N\bar{X}$.
Let $X$ denote the generic r.v. claim amount whose distribution is assumed to be absolutely continuous with probability density function (pdf) denoted by $f_X$, let $p$ denote the probability mass function (pmf) of $N$ and let $f_S$ be the pdf of $S$. The cumulative distribution function (cdf) of an r.v. will be denoted by $F$ indexed with the r.v.'s name. It is well-known that for model (1),
$$f_S(s) = \sum_{n=0}^{\infty} p(n)\, f_X^{*n}(s), \qquad \mathbb{E}[S] = \mathbb{E}[N]\,\mathbb{E}[X], \qquad \operatorname{Var}[S] = \mathbb{E}^2[X]\operatorname{Var}[N] + \mathbb{E}[N]\operatorname{Var}[X],$$
where $f^{*n}$ is the $n$-fold convolution of the function $f$, iteratively defined by
$$f^{*0}(x) = \begin{cases} 1, & x = 0, \\ 0, & \text{otherwise}, \end{cases} \qquad f^{*1} = f, \qquad f^{*(n+1)} = f * f^{*n}, \quad \text{with } (f * f)(x) = \int_{\mathbb{R}} f(y)\, f(x - y)\, dy.$$
In order to relax the independence condition between the number of claims and the claim amounts, we replace the above assumptions with the following ones:
Hypothesis 1.
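Given $N = n$, the r.v.s $X_1, X_2, \ldots$ are assumed to be i.i.d.
Hypothesis 2.
Assume a Sarmanov type dependence between each $X_i$ and $N$, i.e., $(X_i, N)_{i \ge 1}$ are identically distributed with the mixed Sarmanov pdf
$$f_{X,N}(x, n) = \begin{cases} p(0), & n = x = 0, \\ p(n)\, f(x)\, [1 + \omega\, \psi(n)\, \phi(x)], & n \ge 1,\ x > 0, \end{cases} \tag{2}$$
where $\psi$ and $\phi$ are bounded non-constant kernel functions, $\omega \in \mathbb{R}$ and $f$ is a pdf; for simplicity, we denote by $Y$ an r.v. having pdf $f$ and representing $X \mid X > 0$. Note that the pdf (2) is mixed because it joins the continuous pdf $f$ and the discrete pmf $p$. In order for (2) to define a proper pdf, we impose the conditions
$$\sum_{n \ge 1} \psi(n)\, p(n) = \int_{\mathbb{R}} \phi(x)\, f(x)\, dx = 0, \quad \text{and} \tag{3}$$
$$1 + \omega\, \psi(n)\, \phi(x) \ge 0, \quad \text{for all } n \ge 1,\ x > 0. \tag{4}$$
With $\mathcal{L}_Y$ denoting the Laplace transform of the r.v. $Y$, we shall use the exponential kernels: $\phi(y) = e^{-\gamma y} - \mathcal{L}_Y(\gamma)$ and $\psi(n) = e^{-\delta n} - \frac{\mathcal{L}_N(\delta) - p(0)}{1 - p(0)}$. Then, letting
$$m_1 = \inf_{n \ge 1} \psi(n) = -\frac{\mathcal{L}_N(\delta) - p(0)}{1 - p(0)}, \qquad M_1 = \sup_{n \ge 1} \psi(n) = 1 - \frac{\mathcal{L}_N(\delta) - p(0)}{1 - p(0)},$$
$$m_2 = \inf_{x > 0} \phi(x) = -\mathcal{L}_Y(\gamma), \qquad M_2 = \sup_{x > 0} \phi(x) = 1 - \mathcal{L}_Y(\gamma),$$
from condition (4), $\omega$ is restricted to the following interval
$$\max\left\{ -\frac{1}{m_1 m_2},\ -\frac{1}{M_1 M_2} \right\} \le \omega \le \min\left\{ -\frac{1}{m_1 M_2},\ -\frac{1}{M_1 m_2} \right\}. \tag{5}$$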
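As an illustration of interval (5), the following minimal sketch evaluates the bounds for one particular pair of marginals, Poisson frequency and Gamma severity, using the kernel bounds exactly as stated above. The function name and the parameter values in the usage note are hypothetical and are not taken from the paper.

```python
import math

def omega_bounds_poisson_gamma(lam, alpha, beta, delta, gamma):
    """Interval (5) for omega with exponential kernels,
    assuming N ~ Po(lam) and Y ~ Ga(alpha, rate=beta)."""
    p0 = math.exp(-lam)
    # Laplace transforms of the two marginals
    L_N = math.exp(lam * (math.exp(-delta) - 1.0))
    L_Y = (beta / (beta + gamma)) ** alpha
    L_hat = (L_N - p0) / (1.0 - p0)
    # Kernel bounds m1, M1 (for psi) and m2, M2 (for phi), as in the text
    m1, M1 = -L_hat, 1.0 - L_hat
    m2, M2 = -L_Y, 1.0 - L_Y
    lower = max(-1.0 / (m1 * m2), -1.0 / (M1 * M2))
    upper = min(-1.0 / (m1 * M2), -1.0 / (M1 * m2))
    return lower, upper
```

For example, `omega_bounds_poisson_gamma(0.1, 2.0, 0.5, 1.0, 0.3)` returns a pair with a negative lower bound and a positive upper bound, so that independence (omega = 0) always lies inside the admissible interval.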
Note that under the assumption H2, the distribution of the r.v. $X$ will have both an absolutely continuous component (with pdf $f_X$) and a probability mass at 0; hence, the distribution of $S$ also has a probability mass at 0 and the pdf $f_S$.
For the mixed Sarmanov pdf (2), it can be easily deduced that:
$$\Pr(X = 0) = p(0), \qquad f_X(x) = (1 - p(0))\, f(x), \quad x > 0; \tag{6}$$
$$\Pr(X = 0 \mid N = n) = \begin{cases} 1, & n = 0, \\ 0, & n \ge 1; \end{cases}$$
$$f_{X \mid N = n}(x) = f(x)\, [1 + \omega\, \psi(n)\, \phi(x)], \quad x > 0,\ n \ge 1; \tag{7}$$
$$\Pr(N = n \mid X = x) = \begin{cases} 1, & n = x = 0, \\ \dfrac{p(n)}{1 - p(0)}\, [1 + \omega\, \psi(n)\, \phi(x)], & n \ge 1,\ x > 0. \end{cases}$$
In the following proposition, we present the distribution of $S$. Its proof is given in Appendix A.
Proposition 1.
Under the assumptions H1-H2, it holds that
$$F_S(x) = p(0) + \sum_{n=1}^{\infty} p(n)\, F^{*n}_{X \mid N = n}(x), \qquad \Pr(S = 0) = p(0), \qquad f_S(x) = \sum_{n=1}^{\infty} p(n)\, f^{*n}_{X \mid N = n}(x), \quad x > 0. \tag{8}$$
To evaluate the pdf of S based on formula (8), we shall need the following result, which gives a formula for the conditional convolutions.
Proposition 2.
Under the assumptions H1-H2, for $x > 0$ it holds that
$$f^{*m}_{X \mid N = n}(x) = f^{*m}(x) + \sum_{k=1}^{m} \binom{m}{k}\, \omega^k\, \psi^k(n)\, \big( (f\phi)^{*k} * f^{*(m-k)} \big)(x).$$
Next, we present in terms of Y the first two moments of the aggregate claims S and the correlation coefficient between X and N.
Proposition 3.
Under the assumptions H1-H2, the expected value and variance of S are given by
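$$\mathbb{E}[S] = \mathbb{E}[N]\,\mathbb{E}[Y] + \omega\, \mathbb{E}[N\psi(N)]\, \mathbb{E}[Y\phi(Y)],$$
$$\operatorname{Var}[S] = \mathbb{E}^2[Y]\operatorname{Var}[N] + \mathbb{E}[N]\operatorname{Var}[Y] + \omega^2\, \mathbb{E}^2[Y\phi(Y)]\, \big( \operatorname{Var}[N\psi(N)] - \mathbb{E}[N\psi^2(N)] \big)$$
$$\qquad + 2\omega\, \mathbb{E}[Y]\, \mathbb{E}[Y\phi(Y)]\, \big( \mathbb{E}[N^2\psi(N)] - \mathbb{E}[N]\,\mathbb{E}[N\psi(N)] - \mathbb{E}[N\psi(N)] \big) + \omega\, \mathbb{E}[Y^2\phi(Y)]\, \mathbb{E}[N\psi(N)],$$
while the correlation coefficient between $X$ and $N$ is
$$\operatorname{corr}(X, N) = \frac{\omega\, \mathbb{E}[N\psi(N)]\, \mathbb{E}[Y\phi(Y)] + p(0)\, \mathbb{E}[N]\, \mathbb{E}[Y]}{\sqrt{(1 - p(0))\, \big( \operatorname{Var}[Y] + p(0)\, \mathbb{E}^2[Y] \big)\, \operatorname{Var}[N]}}.$$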
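The moment formulas of Proposition 3 translate directly into code once the kernel expectations are available. The sketch below is only an illustration: the function name and argument names are hypothetical, and the caller is assumed to supply all expectations (for instance from the closed forms of Section 3).

```python
def aggregate_moments(EN, VarN, EY, VarY, omega,
                      E_Npsi, E_N2psi, E_Npsi2, E_N2psi2,
                      E_Yphi, E_Y2phi):
    """E[S] and Var[S] as in Proposition 3.

    E_Npsi   = E[N psi(N)],    E_N2psi  = E[N^2 psi(N)],
    E_Npsi2  = E[N psi^2(N)],  E_N2psi2 = E[N^2 psi^2(N)],
    E_Yphi   = E[Y phi(Y)],    E_Y2phi  = E[Y^2 phi(Y)].
    """
    mean = EN * EY + omega * E_Npsi * E_Yphi
    var_Npsi = E_N2psi2 - E_Npsi ** 2          # Var[N psi(N)]
    var = (EY ** 2 * VarN + EN * VarY
           + omega ** 2 * E_Yphi ** 2 * (var_Npsi - E_Npsi2)
           + 2.0 * omega * EY * E_Yphi * (E_N2psi - EN * E_Npsi - E_Npsi)
           + omega * E_Y2phi * E_Npsi)
    return mean, var
```

With `omega = 0` every dependence term vanishes and the function returns the classical moments of the independent collective model, which gives a quick sanity check.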
Remark 1.
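Let $m \ge 1$ and $\mathbf{x} = (x_1, \ldots, x_m)$. Then, based on H1 and (7), the conditional distribution of $(X_1, X_2, \ldots, X_m)$ given $N = n$ is
$$f_{X_1, X_2, \ldots, X_m \mid N = n}(\mathbf{x}, n) = \prod_{j=1}^{m} f_{X_j \mid N = n}(x_j) = \prod_{j=1}^{m} f(x_j)\, [1 + \omega\, \psi(n)\, \phi(x_j)],$$
therefore, the joint distribution of $(X_1, X_2, \ldots, X_m, N)$ results as
$$f_{X_1, X_2, \ldots, X_m, N}(\mathbf{x}, n) = p(n) \prod_{j=1}^{m} f(x_j)\, [1 + \omega\, \psi(n)\, \phi(x_j)] = p(n) \prod_{j=1}^{m} f(x_j) \left[ 1 + \sum_{k=1}^{m} \omega^k\, \psi^k(n) \sum_{j_1 < \cdots < j_k} \prod_{l=1}^{k} \phi(x_{j_l}) \right].$$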

2.2. Simulation from the Collective Model

To simulate values from the distribution of $S$ under the assumptions H1-H2, we use the inversion method for the conditional cdf of $X$ given $N = n$, i.e., we use
$$F_{X \mid N = 0}(0) = 1, \qquad F_{X \mid N = n}(x) = F_Y(x) + \omega\, \psi(n) \int_0^x f(y)\, \phi(y)\, dy, \quad n \ge 1,\ x > 0.$$
Therefore, we first simulate the value $n$ from the distribution of $N$. If $n = 0$ then $s = 0$; otherwise, we generate $n$ uniform $U(0,1)$ values $(u_i)_{i=1}^{n}$ and, by solving the equation $F_{X \mid N = n}(x_i) = u_i$ for $x_i$, we obtain the vector $(x_i)_{i=1}^{n}$. Then, $s = \sum_{i=1}^{n} x_i$ is a value generated from $S$ according to model (1).
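The simulation steps above can be sketched as follows for one simple, assumed special case: Poisson frequency and exponential severity (a Gamma with shape 1), with the exponential kernels of Section 2.1. For this choice the integral in the conditional cdf has a closed form, and the equation for each claim is solved by bisection. All function names and parameter values are hypothetical; `omega` is assumed to lie inside interval (5).

```python
import math
import random

def make_model(lam, beta, delta, gamma, omega):
    """Sketch for N ~ Po(lam), Y ~ Exp(rate=beta), exponential kernels."""
    p0 = math.exp(-lam)
    L_hat = (math.exp(lam * (math.exp(-delta) - 1.0)) - p0) / (1.0 - p0)
    L_Y = beta / (beta + gamma)  # Laplace transform of Y at gamma

    def cond_cdf(x, n):
        # F_{X|N=n}(x) = F_Y(x) + omega*psi(n)*int_0^x f(y)*phi(y) dy,
        # where the integral equals L_Y * (F_tilt(x) - F_Y(x)) here.
        psi = math.exp(-delta * n) - L_hat
        F_Y = 1.0 - math.exp(-beta * x)
        F_tilt = 1.0 - math.exp(-(beta + gamma) * x)
        return F_Y + omega * psi * L_Y * (F_tilt - F_Y)

    def cond_quantile(u, n):
        # Solve cond_cdf(x, n) = u by bracketing and bisection.
        lo, hi = 0.0, 1.0
        while cond_cdf(hi, n) < u:
            hi *= 2.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if cond_cdf(mid, n) < u:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def draw_S(rng):
        # Draw n from the Poisson pmf by inversion, then n claim amounts.
        u, n, p = rng.random(), 0, p0
        c = p
        while u > c and p > 0.0:
            n += 1
            p *= lam / n
            c += p
        return sum(cond_quantile(rng.random(), n) for _ in range(n))

    return cond_cdf, cond_quantile, draw_S
```

A call such as `cond_cdf, cond_quantile, draw_S = make_model(0.1, 1.0, 1.0, 0.5, 1.0)` followed by `draw_S(random.Random(0))` produces one simulated value of S; repeating the draw gives a sample from which the moments of Proposition 3 could be checked empirically.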

2.3. Parameters Estimation

Let $\{(n_i; \mathbf{x}_i)\}_{i=1}^{K} = \{(n_i; x_{i1}, \ldots, x_{i n_i})\}_{i=1}^{K}$ be a random sample of the number of claims and individual claim amounts. Let $\mathbf{0}$ be the zero vector, let $\theta$ and $\nu$ be, respectively, the parameter vectors of the marginal distributions of $N$ and $Y$, while $\omega$ is the dependence parameter of the Sarmanov distribution. From (2), the corresponding log-likelihood function is
$$\ln L\big( \{(n_i; \mathbf{x}_i)\}_{i=1}^{K} \big) = \sum_{i:\, n_i = 0,\, \mathbf{x}_i = \mathbf{0}} \ln p(0; \theta) + \sum_{i:\, n_i \ge 1,\, x_{ij} > 0} \left[ \ln p(n_i; \theta) + \sum_{j=1}^{n_i} \ln f(x_{ij}; \nu) + \sum_{j=1}^{n_i} \ln\big( 1 + \omega\, \psi(n_i)\, \phi(x_{ij}) \big) \right]$$
$$= \ln L\big( \{n_i\}_{i=1}^{K}; \theta \big) + \ln L\big( \{\mathbf{x}_i \mid x_{ij} > 0,\ i = 1, \ldots, K,\ j = 1, \ldots, n_i\}; \nu \big) + \sum_{i=1}^{K} \sum_{j=1}^{n_i} \ln\big( 1 + \omega\, \psi(n_i)\, \phi(x_{ij}) \big), \tag{10}$$
where $L(\{n_i\}_{i=1}^{K}; \theta)$ is the likelihood function corresponding to the marginal r.v. $N$ and $L(\{\mathbf{x}_i \mid x_{ij} > 0,\ i = 1, \ldots, K,\ j = 1, \ldots, n_i\}; \nu)$ the one corresponding to $Y$.
Since maximizing the log-likelihood expressed in (10) is very difficult, specifically due to the close relationship that exists between the dependence parameter and the parameters associated with the marginal distributions, we define $l_{\{(n_i, \mathbf{x}_i)\}_{i=1}^{K}}(\theta; \nu \mid \omega)$ to be the log-likelihood function corresponding to the marginal parameters given $\omega$, and $l_{\{(n_i, \mathbf{x}_i)\}_{i=1}^{K}}(\omega \mid \theta; \nu)$ the log-likelihood function of the dependence parameter given the marginal parameters $\theta$, $\nu$, and we determine the Maximum Likelihood Estimation (MLE) of the parameters in two phases:
Phase 1
By MLE, find initial values for the parameters of the marginal distributions. Then, iterate the following two steps:
Step 1
(iteration $j$) Given the parameters of the marginal distributions, find $\hat{\omega}_j$ within the interval defined in (5) for this dependence parameter by maximizing the log-likelihood $l_{\{(n_i, \mathbf{x}_i)\}_{i=1}^{K}}(\omega \mid \theta; \nu)$;
Step 2
Given $\hat{\omega}_j$, obtain new values for the parameters of the marginals by maximizing the log-likelihood function $l_{\{(n_i, \mathbf{x}_i)\}_{i=1}^{K}}(\theta; \nu \mid \omega)$.
Repeat Steps 1 and 2 until convergence. If the dependence parameter lies at an endpoint of the interval, recalculate the interval using the marginal parameters obtained in Step 2.
Phase 2
Starting from the parameters estimated in Phase 1, perform full MLE.
Because our parameter space is bounded, we used the optim() function of R with the L-BFGS-B method to perform the optimizations.
The method proposed in Phase 1 is known as the Inference Functions for Margins (IFM) method, and has been widely used in the estimation of copulas; see [14] for a review. Regarding the estimation of the Sarmanov distribution, the two-phase method, IFM followed by full MLE, has already been used with excellent results by Bolancé and Vernic [8] for the case of NB marginals.
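To make Step 1 of Phase 1 concrete, the sketch below maximizes only the dependence term of the log-likelihood (10) over a given interval for the dependence parameter, with the marginal parameters held fixed. It is not the authors' implementation: instead of R's optim() with L-BFGS-B it uses a plain ternary search, which is adequate here because the dependence term is a sum of logarithms of affine functions of omega and hence concave. The function name, the data layout (a list of pairs of a claim count and its claim amounts), and the kernels passed in are all assumptions.

```python
import math

def fit_omega(data, psi, phi, lower, upper, iters=200):
    """Maximise sum_i sum_j log(1 + omega*psi(n_i)*phi(x_ij)) on [lower, upper].

    data  : list of (n_i, [x_i1, ..., x_in_i]) for policies with n_i >= 1
    psi   : frequency kernel, evaluated with the current marginal parameters
    phi   : severity kernel, evaluated with the current marginal parameters
    """
    prods = [psi(n) * phi(x) for n, xs in data for x in xs]

    def loglik(omega):
        return sum(math.log1p(omega * t) for t in prods)

    a, b = lower, upper
    for _ in range(iters):
        # Ternary search: only interior points are evaluated.
        left = a + (b - a) / 3.0
        right = b - (b - a) / 3.0
        if loglik(left) < loglik(right):
            a = left
        else:
            b = right
    return 0.5 * (a + b), loglik
```

If the returned estimate sits at an end of the supplied interval, that corresponds to the situation described in the text, where the interval has to be recomputed from the updated marginal parameters before iterating again.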

3. Particular Cases

3.1. Particular Severity Distributions

For our particular dataset, we considered two severity distributions: Lognormal and Gamma.
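Under the Gamma severity distribution assumption, $Y \sim Ga(\alpha, \beta)$, $\alpha, \beta > 0$, with $\beta$ the rate parameter, we recall that
$$\mathbb{E}[Y] = \frac{\alpha}{\beta}, \qquad \mathbb{E}[Y^2] = \frac{\alpha(\alpha + 1)}{\beta^2}, \qquad \operatorname{Var}[Y] = \frac{\alpha}{\beta^2}, \qquad \mathcal{L}_Y(\gamma) = \left( \frac{\beta}{\beta + \gamma} \right)^{\alpha}.$$
Furthermore, given that
$$\mathbb{E}[Y e^{-\gamma Y}] = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} y^{\alpha + 1 - 1} e^{-(\beta + \gamma) y}\, dy = \frac{\alpha \beta^{\alpha}}{(\beta + \gamma)^{\alpha + 1}},$$
and
$$\mathbb{E}[Y^2 e^{-\gamma Y}] = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} y^{\alpha + 2 - 1} e^{-(\beta + \gamma) y}\, dy = \frac{\alpha(\alpha + 1) \beta^{\alpha}}{(\beta + \gamma)^{\alpha + 2}},$$
for the exponential kernel $\phi(y) = e^{-\gamma y} - \mathcal{L}_Y(\gamma)$, we easily obtain
$$\mathbb{E}[Y\phi(Y)] = -\frac{\alpha \gamma \beta^{\alpha - 1}}{(\beta + \gamma)^{\alpha + 1}}, \qquad \mathbb{E}[Y^2\phi(Y)] = -\frac{\alpha(\alpha + 1)\, \gamma\, \beta^{\alpha - 2}\, (2\beta + \gamma)}{(\beta + \gamma)^{\alpha + 2}}.$$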
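The two closed forms for the Gamma case can be checked numerically. The sketch below compares them with a composite Simpson approximation of the defining integrals; the function names, the truncation of the integration range at 80, and the parameter values in the usage note are illustrative assumptions.

```python
import math

def gamma_kernel_moments(alpha, beta, gam):
    """Closed forms for E[Y phi(Y)] and E[Y^2 phi(Y)], Y ~ Ga(alpha, rate=beta)."""
    e1 = -alpha * gam * beta ** (alpha - 1) / (beta + gam) ** (alpha + 1)
    e2 = (-alpha * (alpha + 1) * gam * beta ** (alpha - 2) * (2 * beta + gam)
          / (beta + gam) ** (alpha + 2))
    return e1, e2

def simpson(g, a, b, n=20000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def numeric_kernel_moment(alpha, beta, gam, power, upper=80.0):
    """Numerical E[Y^power * phi(Y)] with phi(y) = exp(-gam*y) - L_Y(gam)."""
    L_Y = (beta / (beta + gam)) ** alpha
    const = beta ** alpha / math.gamma(alpha)

    def integrand(y):
        if y == 0.0:
            return 0.0
        pdf = const * y ** (alpha - 1) * math.exp(-beta * y)
        return y ** power * pdf * (math.exp(-gam * y) - L_Y)

    return simpson(integrand, 0.0, upper)
```

For instance, with `alpha = 2.0`, `beta = 1.5`, `gam = 0.7`, the values from `gamma_kernel_moments` and `numeric_kernel_moment` agree to several decimal places, and both expectations are negative, as the closed forms indicate.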
We approach the Lognormal severity distribution in a different way. We recall that if $Y$ follows a Lognormal distribution $LN(\mu, \sigma^2)$, $\sigma > 0$, then $Z = \ln Y$ follows a normal distribution $N(\mu, \sigma^2)$. Therefore, it is easier to estimate the model having the same counting distribution, but normal severity distribution, using logarithmized claim amounts. So, we first consider the bivariate Sarmanov distribution $f_{Z,N}$ with $Z \sim N(\mu, \sigma^2)$ and exponential kernel $\phi$. However, since the domain of the normal distribution is the entire real line and the kernel function $\phi$ must be bounded, we shall work with a left truncated normal distribution with left truncation point $a$, i.e., $Z \sim LTN(\mu, \sigma^2, a)$, and we select this truncation point such that $\Pr(Z \le a)$ is almost zero; note that a good choice is $a = -3\sigma + \mu$ (another simple choice for $a$ would be the minimum of the log-data). Hence,
$$f_{Z,N}(z, n) = p(n)\, f(z; \mu, \sigma^2, a)\, [1 + \omega\, \psi(n)\, \phi(z)], \quad n \ge 1,\ z > a, \tag{11}$$
where
$$f(z; \mu, \sigma^2, a) = \frac{1}{K \sigma \sqrt{2\pi}}\, e^{-\frac{(z - \mu)^2}{2\sigma^2}} = \frac{1}{K\sigma}\, \varphi\!\left( \frac{z - \mu}{\sigma} \right), \qquad \phi(z) = e^{-\gamma z} - \mathcal{L}_Z(\gamma) = e^{-\gamma z} - e^{-\gamma\mu + \frac{\gamma^2 \sigma^2}{2}}\, \frac{1 - \Phi\!\left( \frac{a - \mu + \gamma\sigma^2}{\sigma} \right)}{K}.$$
Here $K = 1 - \Phi\!\left( \frac{a - \mu}{\sigma} \right)$, while $\varphi$ and $\Phi$ denote the pdf and, respectively, cdf of the standard normal distribution. Then, the limits for the interval (5) of $\omega$ are
$$m_2 = \inf_{z > a} \phi(z) = -\mathcal{L}_Z(\gamma), \qquad M_2 = \sup_{z > a} \phi(z) = e^{-\gamma a} - \mathcal{L}_Z(\gamma).$$
Now, to obtain the bivariate Sarmanov distribution with a truncated Lognormal marginal, we change variable $Y = e^Z$ in (11) and have
$$f_{Y,N}(y, n) = \frac{1}{y}\, p(n)\, f(\ln y; \mu, \sigma^2, a)\, [1 + \omega\, \psi(n)\, \phi(\ln y)] = p(n)\, f_{LTLN(\mu, \sigma^2, a)}(y) \left[ 1 + \omega\, \psi(n) \left( \frac{1}{y^{\gamma}} - \mathcal{L}_{\ln Y}(\gamma) \right) \right], \quad n \ge 1,\ y > e^a,$$
where $f_{LTLN(\mu, \sigma^2, a)}(y) = \frac{1}{K \sigma y}\, \varphi\!\left( \frac{\ln y - \mu}{\sigma} \right)$ is the pdf of the left truncated Lognormal distribution $LTLN(\mu, \sigma^2, a)$. It follows that the bivariate Sarmanov distribution with a Lognormal marginal is given by
$$f_{X,N}(x, n) = \begin{cases} p(0), & n = x = 0, \\ p(n)\, f_{LTLN(\mu, \sigma^2, a)}(x)\, [1 + \omega\, \psi(n)\, \tilde{\phi}(x)], & n \ge 1,\ x > e^a, \end{cases}$$
where the kernel function $\tilde{\phi}(x) = x^{-\gamma} - \mathcal{L}_{LTN(\mu, \sigma^2, a)}(\gamma)$ is not of exponential type.
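The following proposition gives the needed ingredients to calculate the expected value and variance of $S$ under the simplifying assumption $\gamma = 1$.
Proposition 4.
For $Y \sim LTLN(\mu, \sigma^2; a)$ and $\gamma = 1$, it holds that
$$\mathbb{E}[Y] = e^{\mu + \frac{\sigma^2}{2}}\, \frac{K_2}{K}, \qquad \mathbb{E}[Y^2] = e^{2\mu + 2\sigma^2}\, \frac{K_3}{K}, \qquad \mathbb{E}[Y\tilde{\phi}(Y)] = 1 - e^{\sigma^2}\, \frac{K_1 K_2}{K^2},$$
$$\mathbb{E}[Y^2\tilde{\phi}(Y)] = \mathbb{E}[Y] - e^{\mu + \frac{5\sigma^2}{2}}\, \frac{K_1 K_3}{K^2} = \frac{1}{K}\, e^{\mu + \frac{\sigma^2}{2}} \left( K_2 - e^{2\sigma^2}\, \frac{K_1 K_3}{K} \right),$$
where $K_1 = 1 - \Phi\!\left( \frac{a - \mu + \sigma^2}{\sigma} \right)$, $K_2 = 1 - \Phi\!\left( \frac{a - \mu - \sigma^2}{\sigma} \right)$, $K_3 = 1 - \Phi\!\left( \frac{a - \mu - 2\sigma^2}{\sigma} \right)$.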
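The quantities in Proposition 4 only require the standard normal cdf, so they can be evaluated with the error function. The following sketch is an illustration under the stated assumption that the kernel parameter equals 1; the function names and the parameter values in the usage note are hypothetical.

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ltln_moments(mu, sigma, a):
    """E[Y], E[Y^2], E[Y phi~(Y)], E[Y^2 phi~(Y)] for Y ~ LTLN(mu, sigma^2; a),
    with kernel parameter gamma = 1, as in Proposition 4."""
    s2 = sigma ** 2
    K = 1.0 - Phi((a - mu) / sigma)
    K1 = 1.0 - Phi((a - mu + s2) / sigma)
    K2 = 1.0 - Phi((a - mu - s2) / sigma)
    K3 = 1.0 - Phi((a - mu - 2.0 * s2) / sigma)
    EY = math.exp(mu + s2 / 2.0) * K2 / K
    EY2 = math.exp(2.0 * mu + 2.0 * s2) * K3 / K
    EYphi = 1.0 - math.exp(s2) * K1 * K2 / K ** 2
    EY2phi = EY - math.exp(mu + 2.5 * s2) * K1 * K3 / K ** 2
    return EY, EY2, EYphi, EY2phi
```

As a sanity check, pushing the truncation point far to the left (for example `a = mu - 10 * sigma`) makes all the K terms numerically equal to 1, and the first two outputs reduce to the usual untruncated Lognormal moments.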

3.2. Particular Counting Distributions

As counting distributions, we consider the Poisson and Negative Binomial distributions. The following result holds.
Lemma 1.
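Let $N$ follow a certain discrete distribution with support $\mathbb{N}$ and let $\psi(n) = e^{-\delta n} - \hat{\mathcal{L}}_N(\delta)$ be the corresponding exponential kernel with $\hat{\mathcal{L}}_N(\delta) = \frac{\mathcal{L}_N(\delta) - p(0)}{1 - p(0)}$. Then,
$$\mathbb{E}[N\psi(N)] = \mathbb{E}[N e^{-\delta N}] - \hat{\mathcal{L}}_N(\delta)\, \mathbb{E}[N], \tag{12}$$
$$\mathbb{E}[N^2\psi(N)] = \mathbb{E}[N^2 e^{-\delta N}] - \hat{\mathcal{L}}_N(\delta)\, \mathbb{E}[N^2], \tag{13}$$
$$\mathbb{E}[N\psi^2(N)] = \mathbb{E}[N e^{-2\delta N}] - 2 \hat{\mathcal{L}}_N(\delta)\, \mathbb{E}[N e^{-\delta N}] + \hat{\mathcal{L}}_N^2(\delta)\, \mathbb{E}[N], \tag{14}$$
$$\mathbb{E}[N^2\psi^2(N)] = \mathbb{E}[N^2 e^{-2\delta N}] - 2 \hat{\mathcal{L}}_N(\delta)\, \mathbb{E}[N^2 e^{-\delta N}] + \hat{\mathcal{L}}_N^2(\delta)\, \mathbb{E}[N^2]. \tag{15}$$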
In the following result, we present formulas needed to evaluate the expected value and variance of S given in Proposition 3, for our particular distributions.
Proposition 5.
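Let $\psi(n) = e^{-\delta n} - \frac{\mathcal{L}_N(\delta) - p(0)}{1 - p(0)}$ be the exponential kernel.
(i) 
If $N \sim Po(\lambda)$, then
$$\mathbb{E}[N\psi(N)] = \lambda e^{-\lambda} \left( e^{\lambda e^{-\delta} - \delta} - \frac{e^{\lambda e^{-\delta}} - 1}{1 - e^{-\lambda}} \right),$$
$$\mathbb{E}[N^2\psi(N)] = \lambda e^{-\lambda} \left( e^{\lambda e^{-\delta} - \delta}\, (\lambda e^{-\delta} + 1) - (\lambda + 1)\, \frac{e^{\lambda e^{-\delta}} - 1}{1 - e^{-\lambda}} \right),$$
$$\mathbb{E}[N\psi^2(N)] = \lambda \left\{ e^{\lambda (e^{-2\delta} - 1) - 2\delta} + \frac{e^{\lambda e^{-\delta}} - 1}{(e^{\lambda} - 1)^2} \left[ e^{\lambda e^{-\delta}} \left( 1 - 2 e^{-\delta} + 2 e^{-\lambda - \delta} \right) - 1 \right] \right\},$$
$$\mathbb{E}[N^2\psi^2(N)] = \lambda \left\{ e^{\lambda (e^{-2\delta} - 1) - 2\delta}\, (\lambda e^{-2\delta} + 1) + \frac{e^{\lambda e^{-\delta}} - 1}{(e^{\lambda} - 1)^2} \times \left[ e^{\lambda e^{-\delta}} \left( \lambda + 1 - 2 e^{-\delta} (\lambda e^{-\delta} + 1)(1 - e^{-\lambda}) \right) - \lambda - 1 \right] \right\}.$$
(ii) 
If $N \sim NB(r, p)$, then, with $q = 1 - p$,
$$\mathbb{E}[N\psi(N)] = \frac{r q p^r}{(1 - q e^{-\delta})^r} \left[ \frac{1}{e^{\delta} - q} - \frac{1 - (1 - q e^{-\delta})^r}{p\, (1 - p^r)} \right],$$
$$\mathbb{E}[N^2\psi(N)] = \frac{r q p^r}{(1 - q e^{-\delta})^r} \left[ \frac{r q + e^{\delta}}{(e^{\delta} - q)^2} - \frac{(1 + q r) \left( 1 - (1 - q e^{-\delta})^r \right)}{p^2\, (1 - p^r)} \right],$$
$$\mathbb{E}[N\psi^2(N)] = r q p^r \left\{ \frac{e^{-2\delta}}{(1 - q e^{-2\delta})^{r+1}} + \frac{1 - (1 - q e^{-\delta})^r}{(p^{-r} - 1)(1 - q e^{-\delta})^{2r}} \left[ \frac{1 - (1 - q e^{-\delta})^r}{p\, (1 - p^r)} - \frac{2 e^{-\delta}}{1 - q e^{-\delta}} \right] \right\},$$
$$\mathbb{E}[N^2\psi^2(N)] = r q p^r \left\{ \frac{e^{-2\delta} (r q e^{-2\delta} + 1)}{(1 - q e^{-2\delta})^{r+2}} + \frac{1 - (1 - q e^{-\delta})^r}{(p^{-r} - 1)(1 - q e^{-\delta})^{2r}} \times \left[ \frac{1 - (1 - q e^{-\delta})^r}{1 - p^r} \cdot \frac{1 + q r}{p^2} - \frac{2 e^{-\delta} (r q e^{-\delta} + 1)}{(1 - q e^{-\delta})^2} \right] \right\}.$$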
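Closed forms of this kind are easy to mistype, so a numerical cross-check can be useful. The sketch below compares the first two Poisson formulas of part (i) with direct summation over the Poisson pmf; the function names, the truncation of the sum at 200 terms, and the parameter values in the usage note are assumptions made for illustration.

```python
import math

def poisson_closed_forms(lam, delta):
    """E[N psi(N)] and E[N^2 psi(N)] for N ~ Po(lam), exponential kernel."""
    A = math.exp(lam * math.exp(-delta))            # e^{lam * e^{-delta}}
    ratio = (A - 1.0) / (1.0 - math.exp(-lam))
    e_npsi = lam * math.exp(-lam) * (A * math.exp(-delta) - ratio)
    e_n2psi = lam * math.exp(-lam) * (
        A * math.exp(-delta) * (lam * math.exp(-delta) + 1.0)
        - (lam + 1.0) * ratio)
    return e_npsi, e_n2psi

def poisson_by_summation(lam, delta, n_max=200):
    """Same expectations computed by summing n^k * psi(n) * p(n) directly."""
    p0 = math.exp(-lam)
    L_N = math.exp(lam * (math.exp(-delta) - 1.0))
    L_hat = (L_N - p0) / (1.0 - p0)
    e_npsi = e_n2psi = 0.0
    p = p0
    for n in range(1, n_max + 1):
        p *= lam / n                                # Poisson pmf recursion
        psi = math.exp(-delta * n) - L_hat
        e_npsi += n * psi * p
        e_n2psi += n * n * psi * p
    return e_npsi, e_n2psi
```

Calling both functions with the same arguments, say `lam = 0.8` and `delta = 0.5`, should give matching pairs up to floating-point error; the same pattern could be extended to the remaining formulas and to the Negative Binomial case.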
In the numerical study that we present below, we combine the two discussed counting distributions with the two severity distributions and obtain the following particular compound distributions: compound Poisson–Gamma, compound Negative Binomial–Gamma, compound Poisson–Lognormal, and compound Negative Binomial–Lognormal.

4. Numerical Study

We analyzed a dataset containing a sample of $K$ = 99,972 Spanish insureds with a total of 8872 claims. We assumed that they have a homogeneous risk profile. For each individual, we obtained information on the number of claims and on the individual cost of each claim notified by each insured. Our aim was to fit the bivariate Sarmanov distribution and to check the effect of dependence between frequency and severity on the risk premium. We compared the results obtained when considering that $S = N\bar{X}$, where $\bar{X}$ represents the cost per policyholder (calculated as the mean of individual claim amounts, see comment after model (1)), and when considering that $S = \sum_{j=1}^{N} X_j$, where $X_j$ represents the $j$-th claim amount of the policyholder or cost per claim; i.e., we compared the results obtained by taking into account the heterogeneity of the claim costs for each insured and by considering that this heterogeneity does not exist.
In Table 1, we display the results of the initial analysis of the number of claims, consisting in the basic descriptives and initially estimated parameters (by MLE) for the marginal distribution associated with this variable. From the values of the Chi-square statistic, we can see that the best fit is obtained with the NB distribution.
In Table 2, we show the basic descriptive statistics for the cost per policyholder and for the cost per claim, together with the MLE parameters of the Gamma and Lognormal distributions for these variables. The main difference between the two variables is in the scale and shape parameters. As expected, the cost per claim r.v. has larger variance and more pronounced right skewness than the cost per policyholder. We compared the goodness of fit Akaike Information Criterion (AIC) for the Gamma and Lognormal distributions, and obtained that the best fit is provided by the Lognormal distribution.
The results of the estimated bivariate Sarmanov distributions using the procedure described in Section 2.3 are shown in Table 3. In both cases, i.e., cost per claim and cost per policyholder, the best fit was obtained with the NB–Lognormal model. The correlations ρ calculated for this type of model are consistent with the empirical correlations given in Table 2.
In Table 4, we present the values of the estimated mean and variance of S obtained when using each estimated model presented in Table 3. These values are the ones used to calculate the risk premium. If we compare the results obtained using the estimated dependence parameter ω > 0 with the results obtained by assuming ω = 0 (i.e., independence), we observe that, as we expected, the positive values estimated for the dependence case increase the mean and the variance. Furthermore, the largest values are obtained with the NB–Lognormal Sarmanov distribution.
To see the effect on risk premiums, in Table 5 we calculated the risk premium according to the standard deviation principle, i.e., $\pi_R = \mathbb{E}[S] + \delta \sqrt{\operatorname{Var}[S]}$, where $\delta$ is the loading constant. We display the pure ($\delta = 0$) and risk ($\delta > 0$) premiums, assuming that $\delta = 1$; we also considered the case when $N$ and $X$ are independent (i.e., $\omega = 0$), and when $N$ and $X$ are Sarmanov distributed for both estimated NB–Lognormal models, with cost per claim and cost per policyholder. We can see that, by using the NB–Lognormal distribution, the risk premiums are larger for the cost per policyholder model. Specifically, in the dependent case, using the individual claim cost information allows the company to reduce its risk premium by approximately 55 Euros.
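For completeness, the standard deviation principle used for Table 5 amounts to a one-line computation. The sketch below uses a hypothetical function name, and the numbers in the usage note are made up for illustration only; they are not the values reported in the paper.

```python
import math

def std_dev_premium(mean_S, var_S, loading):
    """Risk premium E[S] + loading * sqrt(Var[S]) (standard deviation principle)."""
    return mean_S + loading * math.sqrt(var_S)
```

With an illustrative mean of 100 and variance of 400, `std_dev_premium(100.0, 400.0, 0.0)` gives the pure premium and `std_dev_premium(100.0, 400.0, 1.0)` adds one standard deviation as loading.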

5. Conclusions

In this paper, we have shown how the flexibility of the Sarmanov model allows us to introduce dependence between different types of variables, discrete and continuous. We have used the Sarmanov distribution to model the bivariate distribution joining the number of claims and the individual claim amounts, which is needed to estimate, e.g., the moments of the aggregate claims in the collective risk model considering dependence between its variables. We numerically compared the results obtained using the cost per claim and also using the cost per policyholder; in both cases, the NB–Lognormal Sarmanov model proved to provide the best fit to our real dataset.
We have also analyzed the differences between the expectations and the variances of the aggregate claims in the collective models obtained using alternative estimated distributions. We note that, as expected, these values are larger for the models in which a Lognormal marginal distribution is considered for the cost.

Author Contributions

Both authors contributed equally to this work in terms of conceptualization, methodology, validation, writing; software, C.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We thank Fundación BBVA, Equipos de Investigación Científica en Big Data 2018 for support. Also, we are very grateful to the three referees for their nice reviews and valuable comments that helped to improve the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MDPI: Multidisciplinary Digital Publishing Institute
DOAJ: Directory of open access journals
TLA: Three letter acronym
LD: linear dichroism

Appendix A

Proof of Proposition 1.
Since $S = 0$ when $N = 0$, we clearly obtain that $\Pr(S = 0) = p(0)$. Then, for $x > 0$, we have
$$F_S(x) = \Pr(S \le x) = \Pr(S = 0) + \sum_{n=1}^{\infty} \Pr\!\left( \sum_{j=1}^{n} X_j \le x \,\Big|\, N = n \right) p(n) = p(0) + \sum_{n=1}^{\infty} p(n)\, F^{*n}_{X \mid N = n}(x).$$
Differentiating with respect to $x$ easily yields the formula of $f_S$. □
Proof of Proposition 2.
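Let $x > 0$. We prove the result by induction: when $m = 1$, based on the definition of $f^{*0}$, we have
$$f^{*1}_{X \mid N = n}(x) = f_{X \mid N = n}(x) = f(x) + \omega\, \psi(n)\, (f\phi)(x) = f(x) + \omega\, \psi(n)\, \big( (f\phi) * f^{*0} \big)(x).$$
Assuming that the result holds for $m \ge 1$ and taking $m + 1$, we obtain
$$f^{*(m+1)}_{X \mid N = n}(x) = \int_D f^{*m}_{X \mid N = n}(y)\, f_{X \mid N = n}(x - y)\, dy$$
$$= \int_D \left[ f^{*m}(y) + \sum_{k=1}^{m} \binom{m}{k} \omega^k \psi^k(n) \big( (f\phi)^{*k} * f^{*(m-k)} \big)(y) \right] f(x - y)\, [1 + \omega\, \psi(n)\, \phi(x - y)]\, dy$$
$$= \int_D f^{*m}(y)\, f(x - y)\, [1 + \omega\, \psi(n)\, \phi(x - y)]\, dy + \sum_{k=1}^{m} \binom{m}{k} \omega^k \psi^k(n) \int_D \big( (f\phi)^{*k} * f^{*(m-k)} \big)(y)\, f(x - y)\, [1 + \omega\, \psi(n)\, \phi(x - y)]\, dy$$
$$= f^{*(m+1)}(x) + \omega\, \psi(n)\, \big( (f\phi) * f^{*m} \big)(x) + \sum_{k=1}^{m} \binom{m}{k} \omega^k \psi^k(n) \big( (f\phi)^{*k} * f^{*(m+1-k)} \big)(x) + \sum_{k=1}^{m} \binom{m}{k} \omega^{k+1} \psi^{k+1}(n) \big( (f\phi)^{*(k+1)} * f^{*(m-k)} \big)(x),$$
where $D$ is the integration domain. Changing the index in the last sum, we have
$$\sum_{k=1}^{m} \binom{m}{k} \omega^{k+1} \psi^{k+1}(n) \big( (f\phi)^{*(k+1)} * f^{*(m-k)} \big)(x) = \sum_{k=2}^{m+1} \binom{m}{k-1} \omega^k \psi^k(n) \big( (f\phi)^{*k} * f^{*(m+1-k)} \big)(x).$$
Inserting this into the above formula of $f^{*(m+1)}_{X \mid N = n}$ and rearranging gives
$$f^{*(m+1)}_{X \mid N = n}(x) = f^{*(m+1)}(x) + \sum_{k=1}^{m} \left[ \binom{m}{k} + \binom{m}{k-1} \right] \omega^k \psi^k(n) \big( (f\phi)^{*k} * f^{*(m+1-k)} \big)(x) + \omega^{m+1} \psi^{m+1}(n) \big( (f\phi)^{*(m+1)} * f^{*0} \big)(x)$$
$$= f^{*(m+1)}(x) + \sum_{k=1}^{m} \binom{m+1}{k} \omega^k \psi^k(n) \big( (f\phi)^{*k} * f^{*(m+1-k)} \big)(x) + \omega^{m+1} \psi^{m+1}(n) \big( (f\phi)^{*(m+1)} * f^{*0} \big)(x),$$
which immediately yields the result. This completes the proof. □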
Proof of Proposition 3.
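We prove the expected value and variance formulas in the usual way. Using (7), we have
$$\mathbb{E}[S] = \mathbb{E}\!\left[ \mathbb{E}\!\left[ \sum_{j=1}^{N} X_j \,\Big|\, N \right] \right] = \mathbb{E}\big[ N\, \mathbb{E}[X \mid N] \big] = \mathbb{E}\big[ N \big( \mathbb{E}[Y] + \omega\, \psi(N)\, \mathbb{E}[Y\phi(Y)] \big) \big],$$
which easily yields the formula of $\mathbb{E}[S]$. In what concerns the variance, we use
$$\operatorname{Var}[S] = \operatorname{Var}\big[ \mathbb{E}[S \mid N] \big] + \mathbb{E}\big[ \operatorname{Var}[S \mid N] \big]. \tag{A1}$$
We shall need the following formulas. From (7), we have
$$\operatorname{Var}[X \mid N] = \mathbb{E}[X^2 \mid N] - \mathbb{E}^2[X \mid N] = \mathbb{E}[Y^2] + \omega\, \psi(N)\, \mathbb{E}[Y^2\phi(Y)] - \big( \mathbb{E}[Y] + \omega\, \psi(N)\, \mathbb{E}[Y\phi(Y)] \big)^2,$$
a formula that we insert into
$$\mathbb{E}\big[ \operatorname{Var}[S \mid N] \big] = \mathbb{E}\!\left[ \operatorname{Var}\!\left[ \sum_{j=1}^{N} X_j \,\Big|\, N \right] \right] = \mathbb{E}\big[ N \operatorname{Var}[X \mid N] \big]$$
$$= \mathbb{E}[N] \operatorname{Var}[Y] + \omega\, \mathbb{E}[N\psi(N)]\, \mathbb{E}[Y^2\phi(Y)] - 2\omega\, \mathbb{E}[N\psi(N)]\, \mathbb{E}[Y\phi(Y)]\, \mathbb{E}[Y] - \omega^2\, \mathbb{E}[N\psi^2(N)]\, \mathbb{E}^2[Y\phi(Y)].$$
On the other hand,
$$\operatorname{Var}\big[ \mathbb{E}[S \mid N] \big] = \operatorname{Var}\big[ N \big( \mathbb{E}[Y] + \omega\, \psi(N)\, \mathbb{E}[Y\phi(Y)] \big) \big] = \mathbb{E}\big[ N^2 \big( \mathbb{E}[Y] + \omega\, \psi(N)\, \mathbb{E}[Y\phi(Y)] \big)^2 \big] - \mathbb{E}^2\big[ N \big( \mathbb{E}[Y] + \omega\, \psi(N)\, \mathbb{E}[Y\phi(Y)] \big) \big]$$
$$= \mathbb{E}[N^2]\, \mathbb{E}^2[Y] + 2\omega\, \mathbb{E}[N^2\psi(N)]\, \mathbb{E}[Y\phi(Y)]\, \mathbb{E}[Y] + \omega^2\, \mathbb{E}[N^2\psi^2(N)]\, \mathbb{E}^2[Y\phi(Y)] - \mathbb{E}^2[N]\, \mathbb{E}^2[Y] - 2\omega\, \mathbb{E}[N\psi(N)]\, \mathbb{E}[Y\phi(Y)]\, \mathbb{E}[N]\, \mathbb{E}[Y] - \omega^2\, \mathbb{E}^2[N\psi(N)]\, \mathbb{E}^2[Y\phi(Y)]$$
$$= \mathbb{E}^2[Y] \operatorname{Var}[N] + 2\omega\, \mathbb{E}[Y]\, \mathbb{E}[Y\phi(Y)] \big( \mathbb{E}[N^2\psi(N)] - \mathbb{E}[N]\, \mathbb{E}[N\psi(N)] \big) + \omega^2\, \mathbb{E}^2[Y\phi(Y)] \big( \mathbb{E}[N^2\psi^2(N)] - \mathbb{E}^2[N\psi(N)] \big),$$
and after inserting all these into (A1), we obtain the variance formula.
From formula (6) it is easy to check that
$$\mathbb{E}[X] = (1 - p(0))\, \mathbb{E}[Y], \qquad \mathbb{E}[X^2] = (1 - p(0))\, \mathbb{E}[Y^2], \qquad \operatorname{Var}[X] = (1 - p(0)) \big( \operatorname{Var}[Y] + p(0)\, \mathbb{E}^2[Y] \big).$$
On the other hand, we have that
$$\mathbb{E}[XN] = \sum_{n=1}^{\infty} n\, p(n) \int_0^{\infty} x f(x)\, [1 + \omega\, \psi(n)\, \phi(x)]\, dx = \sum_{n=1}^{\infty} n\, p(n) \int_0^{\infty} x f(x)\, dx + \omega \sum_{n=1}^{\infty} n\, p(n)\, \psi(n) \int_0^{\infty} x f(x)\, \phi(x)\, dx = \mathbb{E}[N]\, \mathbb{E}[Y] + \omega\, \mathbb{E}[N\psi(N)]\, \mathbb{E}[Y\phi(Y)],$$
hence
$$\operatorname{cov}(X, N) = \mathbb{E}[XN] - \mathbb{E}[X]\, \mathbb{E}[N] = \mathbb{E}[N]\, \mathbb{E}[Y] + \omega\, \mathbb{E}[N\psi(N)]\, \mathbb{E}[Y\phi(Y)] - (1 - p(0))\, \mathbb{E}[Y]\, \mathbb{E}[N] = \omega\, \mathbb{E}[N\psi(N)]\, \mathbb{E}[Y\phi(Y)] + p(0)\, \mathbb{E}[Y]\, \mathbb{E}[N],$$
which, together with the above formula of $\operatorname{Var}[X]$, immediately yields the stated formula of $\operatorname{corr}(X, N)$. This completes the proof. □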
Proof of Proposition 4.
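We note that
$$\mathbb{E}[Y] = \mathbb{E}[e^{Z}] = \mathcal{L}_Z(-1) = e^{\mu + \frac{\sigma^2}{2}}\, \frac{1 - \Phi\!\left( \frac{a - \mu - \sigma^2}{\sigma} \right)}{K}, \qquad \mathbb{E}[Y^2] = \mathbb{E}[e^{2Z}] = \mathcal{L}_Z(-2) = e^{2\mu + \frac{4\sigma^2}{2}}\, \frac{1 - \Phi\!\left( \frac{a - \mu - 2\sigma^2}{\sigma} \right)}{K},$$
yielding the first two formulas. The other two formulas result from
$$\mathbb{E}[Y\tilde{\phi}(Y)] = \mathbb{E}\big[ Y \big( Y^{-1} - \mathcal{L}_Z(1) \big) \big] = 1 - e^{-\mu + \frac{\sigma^2}{2}}\, \frac{1 - \Phi\!\left( \frac{a - \mu + \sigma^2}{\sigma} \right)}{K}\, \mathbb{E}[Y] = 1 - e^{-\mu + \frac{\sigma^2}{2}}\, e^{\mu + \frac{\sigma^2}{2}}\, \frac{K_1}{K} \frac{K_2}{K},$$
$$\mathbb{E}[Y^2\tilde{\phi}(Y)] = \mathbb{E}\big[ Y^2 \big( Y^{-1} - \mathcal{L}_Z(1) \big) \big] = \mathbb{E}[Y] - e^{-\mu + \frac{\sigma^2}{2}}\, \frac{K_1}{K}\, \mathbb{E}[Y^2] = e^{\mu + \frac{\sigma^2}{2}}\, \frac{K_2}{K} - e^{-\mu + \frac{\sigma^2}{2}}\, \frac{K_1}{K}\, e^{2\mu + 2\sigma^2}\, \frac{K_3}{K},$$
and the result is immediate. □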
Proof of Lemma 1.
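The formulas (12) and (13) can be obtained directly. Formula (14) easily results from
$$\mathbb{E}[N\psi^2(N)] = \mathbb{E}\big[ N \big( e^{-2\delta N} - 2 \hat{\mathcal{L}}_N(\delta)\, e^{-\delta N} + \hat{\mathcal{L}}_N^2(\delta) \big) \big],$$
which is also the case with formula (15). □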
Proof of Proposition 5.
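(i) Let $N \sim Po(\lambda)$. Tamraz and Vernic [15] proved in Lemma 4.1 that $\mathbb{E}[N e^{-\delta N}] = \lambda e^{\lambda (e^{-\delta} - 1) - \delta}$, hence, applying also formula (12),
$$\mathbb{E}[N\psi(N)] = \lambda e^{\lambda (e^{-\delta} - 1) - \delta} - \lambda\, \frac{e^{\lambda (e^{-\delta} - 1)} - e^{-\lambda}}{1 - e^{-\lambda}} = \lambda e^{-\lambda} \left( e^{\lambda e^{-\delta} - \delta} - \frac{e^{\lambda e^{-\delta}} - 1}{1 - e^{-\lambda}} \right).$$
For the second formula, we use
$$\mathbb{E}[N^2 e^{-\delta N}] = e^{-\lambda} \sum_{n=0}^{\infty} n^2\, \frac{\lambda^n}{n!}\, e^{-\delta n} = e^{-\lambda} \sum_{n=1}^{\infty} (n - 1 + 1)\, \frac{(\lambda e^{-\delta})^n}{(n - 1)!}$$
$$= e^{-\lambda} \left[ (\lambda e^{-\delta})^2 \sum_{n=2}^{\infty} \frac{(\lambda e^{-\delta})^{n-2}}{(n - 2)!} + \lambda e^{-\delta} \sum_{n=1}^{\infty} \frac{(\lambda e^{-\delta})^{n-1}}{(n - 1)!} \right] = e^{-\lambda} \left[ (\lambda e^{-\delta})^2\, e^{\lambda e^{-\delta}} + \lambda e^{-\delta}\, e^{\lambda e^{-\delta}} \right] = \lambda e^{\lambda e^{-\delta} - \lambda - \delta}\, (\lambda e^{-\delta} + 1),$$
that we insert into (13) and obtain
$$\mathbb{E}[N^2\psi(N)] = \lambda e^{\lambda e^{-\delta} - \lambda - \delta}\, (\lambda e^{-\delta} + 1) - \lambda (\lambda + 1)\, \frac{e^{\lambda (e^{-\delta} - 1)} - e^{-\lambda}}{1 - e^{-\lambda}} = \lambda e^{-\lambda} \left( e^{\lambda e^{-\delta} - \delta}\, (\lambda e^{-\delta} + 1) - (\lambda + 1)\, \frac{e^{\lambda e^{-\delta}} - 1}{1 - e^{-\lambda}} \right).$$
Note that $\mathbb{E}[N e^{-2\delta N}] = \lambda e^{\lambda (e^{-2\delta} - 1) - 2\delta}$. Inserting into (14), we have
$$\mathbb{E}[N\psi^2(N)] = \lambda e^{\lambda (e^{-2\delta} - 1) - 2\delta} + \frac{e^{\lambda (e^{-\delta} - 1)} - e^{-\lambda}}{1 - e^{-\lambda}} \left[ \frac{e^{\lambda (e^{-\delta} - 1)} - e^{-\lambda}}{1 - e^{-\lambda}}\, \lambda - 2\lambda e^{\lambda (e^{-\delta} - 1) - \delta} \right]$$
$$= \lambda \left\{ e^{\lambda (e^{-2\delta} - 1) - 2\delta} + \frac{e^{\lambda e^{-\delta}} - 1}{(e^{\lambda} - 1)^2} \left[ e^{\lambda e^{-\delta}} - 1 - 2 e^{\lambda (e^{-\delta} - 1) - \delta}\, (e^{\lambda} - 1) \right] \right\}$$
$$= \lambda \left\{ e^{\lambda (e^{-2\delta} - 1) - 2\delta} + \frac{e^{\lambda e^{-\delta}} - 1}{(e^{\lambda} - 1)^2} \left[ e^{\lambda e^{-\delta}} - 1 - 2 e^{\lambda e^{-\delta} - \delta} + 2 e^{\lambda (e^{-\delta} - 1) - \delta} \right] \right\},$$
which easily gives the third formula of (i). For the fourth formula of this case, using $\mathbb{E}[N^2 e^{-\delta N}] = \lambda e^{\lambda e^{-\delta} - \lambda - \delta}\, (\lambda e^{-\delta} + 1)$ and (15), we write
$$\mathbb{E}[N^2\psi^2(N)] = \lambda e^{\lambda (e^{-2\delta} - 1) - 2\delta}\, (\lambda e^{-2\delta} + 1) + \frac{e^{\lambda (e^{-\delta} - 1)} - e^{-\lambda}}{1 - e^{-\lambda}} \times \left[ \frac{e^{\lambda (e^{-\delta} - 1)} - e^{-\lambda}}{1 - e^{-\lambda}}\, \lambda (\lambda + 1) - 2\lambda e^{\lambda (e^{-\delta} - 1) - \delta}\, (\lambda e^{-\delta} + 1) \right]$$
$$= \lambda \left\{ e^{\lambda (e^{-2\delta} - 1) - 2\delta}\, (\lambda e^{-2\delta} + 1) + \frac{e^{\lambda e^{-\delta}} - 1}{(e^{\lambda} - 1)^2} \times \left[ (e^{\lambda e^{-\delta}} - 1)(\lambda + 1) - 2 e^{\lambda (e^{-\delta} - 1) - \delta}\, (\lambda e^{-\delta} + 1)(e^{\lambda} - 1) \right] \right\},$$
from which the stated result follows.
(ii) Let $N \sim NB(r, p)$ and $q = 1 - p$. From Tamraz and Vernic [15] we have that $\mathbb{E}[N e^{-\delta N}] = \frac{r q p^r e^{-\delta}}{(1 - q e^{-\delta})^{r+1}}$, so that using (12), we obtain
$$\mathbb{E}[N\psi(N)] = \frac{r q p^r e^{-\delta}}{(1 - q e^{-\delta})^{r+1}} - \frac{r q}{p} \cdot \frac{\left( \frac{p}{1 - q e^{-\delta}} \right)^r - p^r}{1 - p^r} = \frac{r q p^r}{(1 - q e^{-\delta})^r} \left[ \frac{e^{-\delta}}{1 - q e^{-\delta}} - \frac{1 - (1 - q e^{-\delta})^r}{p\, (1 - p^r)} \right],$$
yielding the first formula. To obtain the second stated formula, we first evaluate
$$\mathbb{E}[N^2 e^{-\delta N}] = \sum_{n=0}^{\infty} \frac{\Gamma(r + n)}{n!\, \Gamma(r)}\, n^2\, p^r (q e^{-\delta})^n = \sum_{n=1}^{\infty} \frac{\Gamma(r + n)\, (n - 1 + 1)}{(n - 1)!\, \Gamma(r)}\, p^r (q e^{-\delta})^n$$
$$= \frac{p^r}{(1 - q e^{-\delta})^r} \left[ \sum_{n=2}^{\infty} \frac{\Gamma(r + n)}{(n - 2)!\, \Gamma(r)}\, (1 - q e^{-\delta})^r (q e^{-\delta})^n + \sum_{n=1}^{\infty} \frac{\Gamma(r + n)}{(n - 1)!\, \Gamma(r)}\, (1 - q e^{-\delta})^r (q e^{-\delta})^n \right]$$
$$= \frac{p^r}{(1 - q e^{-\delta})^r} \left[ \frac{r (r + 1) (q e^{-\delta})^2}{(1 - q e^{-\delta})^2} + \frac{r q e^{-\delta}}{1 - q e^{-\delta}} \right] = \frac{r q p^r e^{-\delta}\, (r q e^{-\delta} + 1)}{(1 - q e^{-\delta})^{r+2}}.$$
Therefore, based on (13), we have
$$\mathbb{E}[N^2\psi(N)] = \frac{r q p^r e^{-\delta}\, (r q e^{-\delta} + 1)}{(1 - q e^{-\delta})^{r+2}} - \frac{r q (1 + q r)}{p^2} \cdot \frac{\left( \frac{p}{1 - q e^{-\delta}} \right)^r - p^r}{1 - p^r} = \frac{r q p^r}{(1 - q e^{-\delta})^r} \left[ \frac{e^{-\delta}\, (r q e^{-\delta} + 1)}{(1 - q e^{-\delta})^2} - \frac{1 + q r}{p^2} \cdot \frac{1 - (1 - q e^{-\delta})^r}{1 - p^r} \right],$$
which easily yields the stated formula.
Now, using (14), we obtain
$$\mathbb{E}[N\psi^2(N)] = \frac{r q p^r e^{-2\delta}}{(1 - q e^{-2\delta})^{r+1}} + \frac{\left( \frac{p}{1 - q e^{-\delta}} \right)^r - p^r}{1 - p^r} \left[ \frac{r q}{p} \cdot \frac{\left( \frac{p}{1 - q e^{-\delta}} \right)^r - p^r}{1 - p^r} - \frac{2 r q p^r e^{-\delta}}{(1 - q e^{-\delta})^{r+1}} \right]$$
$$= r q p^r \left\{ \frac{e^{-2\delta}}{(1 - q e^{-2\delta})^{r+1}} + \frac{p^r \left( 1 - (1 - q e^{-\delta})^r \right)}{(1 - p^r)(1 - q e^{-\delta})^r} \times \left[ \frac{1 - (1 - q e^{-\delta})^r}{p\, (1 - p^r)(1 - q e^{-\delta})^r} - \frac{2 e^{-\delta}}{(1 - q e^{-\delta})^{r+1}} \right] \right\},$$
yielding the third formula of (ii). Inserting now $\mathbb{E}[N^2 e^{-\delta N}] = \frac{r q p^r e^{-\delta}\, (r q e^{-\delta} + 1)}{(1 - q e^{-\delta})^{r+2}}$ into (15) gives
$$\mathbb{E}[N^2\psi^2(N)] = \frac{r q p^r e^{-2\delta}\, (r q e^{-2\delta} + 1)}{(1 - q e^{-2\delta})^{r+2}} + \frac{p^r \left( 1 - (1 - q e^{-\delta})^r \right)}{(1 - p^r)(1 - q e^{-\delta})^r} \times \left[ \frac{p^r \left( 1 - (1 - q e^{-\delta})^r \right)}{(1 - p^r)(1 - q e^{-\delta})^r} \cdot \frac{r q (1 + q r)}{p^2} - \frac{2 r q p^r e^{-\delta}\, (r q e^{-\delta} + 1)}{(1 - q e^{-\delta})^{r+2}} \right],$$
and the result is immediate. This completes the proof. □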

References

  1. Erhardt, V.; Czado, C. Modeling dependent yearly claim totals including zero claims in private health insurance. Scand. Actuar. J. 2012, 2, 106–129.
  2. Czado, C.; Kastenmeier, R.; Brechmann, E.C.; Min, A. A mixed copula model for insurance claims and claim sizes. Scand. Actuar. J. 2012, 4, 278–305.
  3. Krämer, N.; Brechmann, E.; Silvestrini, D.; Czado, C. Total loss estimation using copula-based regression models. Insur. Math. Econ. 2013, 53, 829–839.
  4. Lee, G.Y.; Shi, P. A dependent frequency–severity approach to modeling longitudinal insurance claims. Insur. Math. Econ. 2019, 87, 115–129.
  5. Oh, R.; Shi, P.; Ahn, J.Y. Bonus-Malus premiums under the dependent frequency-severity modeling. Scand. Actuar. J. 2020, 3, 172–195.
  6. Bahraoui, Z.; Bolancé, C.; Pelican, E.; Vernic, R. On the bivariate distribution and copula. An application on insurance data using truncated marginal distributions. Stat. Oper. Res. Trans. 2015, 39, 209–230.
  7. Abdallah, A.; Boucher, J.; Cossette, H. Sarmanov family of multivariate distributions for bivariate dynamic claim counts model. Insur. Math. Econ. 2016, 68, 120–133.
  8. Bolancé, C.; Vernic, R. Multivariate count data generalized linear models: Three approaches based on the Sarmanov distribution. Insur. Math. Econ. 2019, 85, 89–103.
  9. Yang, Y.; Yuen, K.C. Finite-time and infinite-time ruin probabilities in a two-dimensional delayed renewal risk model with Sarmanov dependent claims. J. Math. Anal. Appl. 2016, 442, 600–625.
  10. Guo, F.; Wang, D.; Yang, H. Asymptotic results for ruin probability in a two-dimensional risk model with stochastic investment returns. J. Comput. Appl. Math. 2017, 325, 198–221.
  11. Ratovomirija, G. On mixed Erlang reinsurance risk: Aggregation, capital allocation and default risk. Eur. Actuar. J. 2016, 6, 149–175.
  12. Ratovomirija, G.; Tamraz, M.; Vernic, R. On some multivariate Sarmanov mixed Erlang reinsurance risks: Aggregation and capital allocation. Insur. Math. Econ. 2017, 74, 197–209.
  13. Vernic, R. Capital allocation for Sarmanov’s class of distributions. Methodol. Comput. Appl. Probab. 2017, 19, 311–330.
  14. Joe, H.; Xu, J.J. The Estimation Method of Inference Functions for Margins for Multivariate Models. 1996. Available online: https://open.library.ubc.ca/cIRcle/collections/facultyresearchandpublications/52383/items/1.0225985 (accessed on 1 August 2020).
  15. Tamraz, M.; Vernic, R. On the evaluation of multivariate compound distributions with continuous severity distributions and Sarmanov’s counting distribution. ASTIN Bull. 2018, 48, 841–870.
Table 1. Number of claims: true, Poisson and Negative Binomial (NB) fitted frequencies, and Chi-square statistic. At the bottom: Maximum Likelihood Estimation (MLE) parameters included.

Number of cases: 99,972 policyholders

| Frequency | True | Poisson | NB |
|---|---|---|---|
| 0 | 92,538.00 | 91,482.28 | 92,524.63 |
| 1 | 6166.00 | 8118.58 | 6285.65 |
| 2 | 1122.00 | 360.24 | 950.48 |
| 3 | 125.00 | 10.66 | 170.11 |
| 4 | 18.00 | 0.24 | 32.81 |
| 5 | 3.00 | 0.00 | 1.73 |
| Chi-Square | | 6761.20 | 52.81 |
| Initial Parameters | | λ = 0.0887 | r = 0.2897, p = 0.7655 |
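
As an illustration of how the Chi-square entry of Table 1 relates to the fitted frequencies, the sketch below recomputes the statistic for the NB column. It assumes the usual Pearson form, the sum over cells of (observed − fitted)² / fitted; the function name is hypothetical. The Poisson column is left out because its last fitted cell is printed as 0.00, so the rounded table values cannot be divided by.

```python
# Observed and NB fitted frequencies for 0, 1, ..., 5 claims, copied from Table 1.
observed = [92538.00, 6166.00, 1122.00, 125.00, 18.00, 3.00]
nb_fitted = [92524.63, 6285.65, 950.48, 170.11, 32.81, 1.73]

def pearson_chi_square(observed_counts, fitted_counts):
    """Pearson statistic: sum over cells of (observed - fitted)^2 / fitted."""
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, fitted_counts))

chi2_nb = pearson_chi_square(observed, nb_fitted)
print(round(chi2_nb, 2))  # 52.81, the NB entry of Table 1
```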
Table 2. Descriptive statistics of cost variables. At the bottom: MLE parameters of Gamma and Lognormal distributions included.

| | Number of Cases | Mean | Median | STD | Skewness | Pearson's Correlation |
|---|---|---|---|---|---|---|
| Cost per claim | 8872 Claims | 859.92 | 513.50 | 2448.27 | 24.01 | 0.31 |
| Cost per policyholder | 7434 Policyholders | 758.13 | 513.50 | 1580.81 | 15.72 | 0.38 |

| | | Gamma | Lognormal |
|---|---|---|---|
| Cost per claim | Initial Parameter | α = 0.6631, β = 0.0008 | μ = 5.8384, σ = 1.3554 |
| | AIC | 136,470 | 134,172 |
| Cost per policyholder | Initial Parameter | α = 0.7148, β = 0.0009 | μ = 5.7882, σ = 1.3441 |
| | AIC | 112,829 | 111,557 |
Table 3. Estimation results of bivariate Sarmanov distributions using Poisson–Gamma, Poisson–Lognormal, NB–Gamma, and NB–Lognormal marginals.

Cost Per Claim

| Gamma, Poisson | Gamma, NB | Lognormal, Poisson | Lognormal, NB |
|---|---|---|---|
| λ = 0.0877 | r = 0.5998 | λ = 0.0875 | r = 1.2693 |
| | p = 0.8727 | | p = 0.9355 |
| α = 0.6650 | α = 0.5892 | μ = 5.8384 | μ = 5.8384 |
| β = 0.0008 | β = 0.0006 | σ = 1.3553 | σ = 1.3552 |
| ω = 2.8197 | ω = 2.9530 | ω = 17.1549 | ω = 17.5107 |
| ρ = 0.6020 | ρ = 0.5625 | ρ = 0.3769 | ρ = 0.3717 |
| AIC = 199,566 | AIC = 199,109 | AIC = 197,249 | AIC = 195,744 |
| BIC = 199,583 | BIC = 199,126 | BIC = 197,264 | BIC = 195,759 |

Cost Per Policyholder

| Gamma, Poisson | Gamma, NB | Lognormal, Poisson | Lognormal, NB |
|---|---|---|---|
| λ = 0.0887 | r = 0.2897 | λ = 0.0887 | r = 0.2897 |
| | p = 0.7655 | | p = 0.7655 |
| α = 0.7152 | α = 0.6951 | μ = 5.7882 | μ = 5.7882 |
| β = 0.0009 | β = 0.0009 | σ = 1.3441 | σ = 1.3441 |
| ω = 2.8157 | ω = 3.0631 | ω = 16.8899 | ω = 18.3588 |
| ρ = 0.6152 | ρ = 0.5753 | ρ = 0.3824 | ρ = 0.3621 |
| AIC = 175,690 | AIC = 173,754 | AIC = 174,402 | AIC = 172,374 |
| BIC = 175,707 | BIC = 173,769 | BIC = 174,417 | BIC = 172,389 |
Table 4. Expectation and variance of S obtained from the estimated Sarmanov models, and by assuming independence (ω = 0).

Cost Per Claim

| | | Gamma, Poisson | Gamma, NB | Lognormal, Poisson | Lognormal, NB |
|---|---|---|---|---|---|
| ω > 0 | E(S) | 74.64 | 85.96 | 75.40 | 75.46 |
| | V(S) | 158,977.20 | 240,279.00 | 407,210.20 | 412,289.50 |
| ω = 0 | E(S) | 74.62 | 85.89 | 75.33 | 75.33 |
| | V(S) | 158,880.50 | 239,745.70 | 406,504.30 | 411,029.60 |

Cost Per Policyholder

| | | Gamma, Poisson | Gamma, NB | Lognormal, Poisson | Lognormal, NB |
|---|---|---|---|---|---|
| ω > 0 | E(S) | 67.27 | 68.26 | 71.66 | 71.87 |
| | V(S) | 128,654.30 | 174,034.60 | 378,528.60 | 490,725.00 |
| ω = 0 | E(S) | 67.26 | 68.20 | 71.59 | 71.59 |
| | V(S) | 128,584.80 | 173,651.10 | 377,286.30 | 484,877.30 |
Table 5. Premiums obtained with NB–Lognormal models (δ = 1 for risk premiums).

| | Pure Premium (δ = 0), Indep. ω = 0 | Pure Premium (δ = 0), Depend. ω > 0 | Risk Premium (δ = 1), Indep. ω = 0 | Risk Premium (δ = 1), Depend. ω > 0 |
|---|---|---|---|---|
| Cost per policyholder | 71.59 | 71.87 | 767.92 | 772.39 |
| Cost per claim | 75.33 | 75.46 | 716.45 | 717.56 |
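
The risk premiums in Table 5 appear consistent with a standard deviation premium principle, E(S) + δ·√V(S), applied to the NB–Lognormal columns of Table 4. The sketch below makes that assumption explicit and recomputes the four risk premiums; the function name and the dictionary layout are illustrative only.

```python
import math

def std_dev_premium(mean, variance, delta):
    """Assumed premium principle: E(S) + delta * sqrt(V(S))."""
    return mean + delta * math.sqrt(variance)

# (E(S), V(S)) for the NB-Lognormal models, copied from Table 4.
nb_lognormal_moments = {
    ("Cost per policyholder", "independent"): (71.59, 484877.30),
    ("Cost per policyholder", "dependent"): (71.87, 490725.00),
    ("Cost per claim", "independent"): (75.33, 411029.60),
    ("Cost per claim", "dependent"): (75.46, 412289.50),
}

# delta = 1 corresponds to the risk premium columns; delta = 0 returns E(S).
risk_premiums = {
    key: std_dev_premium(mean, variance, delta=1.0)
    for key, (mean, variance) in nb_lognormal_moments.items()
}

for key, premium in risk_premiums.items():
    print(key, round(premium, 2))
```

Under this assumption the dependence mainly enters through the variance: for cost per policyholder, V(S) rises from 484,877.30 to 490,725.00, which accounts for most of the gap between the two risk premiums.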

Share and Cite

MDPI and ACS Style

Bolancé, C.; Vernic, R. Frequency and Severity Dependence in the Collective Risk Model: An Approach Based on Sarmanov Distribution. Mathematics 2020, 8, 1400. https://doi.org/10.3390/math8091400

