The study is based on a subset of heavy-tailed and overdispersed claim frequency data from a pool of MTPL insurance policies observed for 3.5 years from a major Greek insurance company. The sample comprised insureds with complete records, i.e., with all the a priori rating variables under consideration available. There were 14,143 observations that met our criteria. The response variable is the number of claims at fault registered for each insured vehicle in the data set, and the explanatory variables we employ are the age of the driver (AD), the horsepower (HP) of their car, and the age of their car (AC). An exploratory analysis was carried out in order to select the subset of explanatory variables with the highest predictive power for the number of claims. Additionally, in light of the heterogeneity that exists within the portfolio, we grouped the levels of each a priori rating variable into risk profiles with similar claim frequency. This enables us to achieve ratemaking accuracy while balancing homogeneity against a sufficient volume of data in each cell, so that the resulting patterns are credible. This is necessary since, under the proposed modelling framework, both the mean and dispersion parameters of the Poisson-Inverse Gamma (PIGA) distribution will be modelled in terms of covariate information.
In what follows, we will compare the fit of the PIGA model with the classic Negative Binomial Type I (NBI) and Poisson-Inverse Gaussian (PIG) models for the case without covariate information and for the case when the mean and dispersion parameters of these mixed Poisson models are allowed to be modelled as functions of covariates.
4.1. Modelling Results
The ML estimates of the parameters, with the corresponding standard errors in parentheses, for the NBI, PIG, and PIGA distributions and regression models with varying dispersion are presented in Table 2 and Table 3, respectively. Note that, when the mean and dispersion parameters of the NBI, PIG, and PIGA distributions are modelled in terms of covariates, variable selection should start by selecting the best predictor for the mean parameter of each claim frequency model. This can be done by adding all available explanatory variables and testing whether the exclusion of each one results in lower Global Deviance (DEV), Akaike information criterion (AIC), and Schwarz Bayesian criterion (SBC) values. Subsequently, we can continue by testing which of the explanatory variables used for the mean parameter would lead to a further decrease of the DEV, AIC, and SBC values when inserted in the dispersion parameter of each claim frequency model. Furthermore, if different parameter specifications for the same claim frequency model result in very close DEV, AIC, and SBC values, we should opt for the simpler model, with fewer predictors for the dispersion parameter, in order to avoid overfitting. Regarding our data set, as we can observe from Table 3, the variables AD, HP, and AC are in the model equation for the mean parameter and the variable AD is in the model equation for the dispersion parameter. Additionally, we see that the estimated regression coefficients of the variables AD, HP, and AC for the mean parameter are almost identical across all three claim frequency models. It can also be seen that the estimated regression coefficients of the variable AD have a similar effect (positive and/or negative) on the dispersion parameter in the case of the NBI and PIG models, but a different effect in the case of the PIGA model. In what follows, we will see that, due to this discrepancy, the a posteriori, or Bonus-Malus, premium rates that result from the traditional NBI and PIG models will differ from those derived by the more heavy-tailed PIGA model.
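The comparison of candidate specifications through the DEV, AIC, and SBC can be sketched as follows. This is a minimal illustration, assuming the usual definitions DEV = -2 × (maximized log-likelihood), AIC = DEV + 2p and SBC = DEV + p log(n), where p is the total number of estimated coefficients in the mean and dispersion equations. The function names and the log-likelihood values are hypothetical and are not results from the study; only the sample size n = 14,143 is taken from the text.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return (DEV, AIC, SBC) for a fitted model.

    Assumes DEV = -2 * loglik, AIC = DEV + 2 * p, SBC = DEV + log(n) * p.
    """
    dev = -2.0 * loglik
    aic = dev + 2.0 * n_params
    sbc = dev + math.log(n_obs) * n_params
    return dev, aic, sbc

def larger_model_is_preferred(small, large, n_obs):
    """Keep the extra predictor only if it lowers both the AIC and the SBC.

    `small` and `large` are (loglik, n_params) pairs for nested candidates.
    """
    _, aic_small, sbc_small = information_criteria(small[0], small[1], n_obs)
    _, aic_large, sbc_large = information_criteria(large[0], large[1], n_obs)
    return aic_large < aic_small and sbc_large < sbc_small

# Hypothetical log-likelihoods, for illustration only.
N_OBS = 14143
baseline = (-5000.0, 5)        # e.g. no covariate in the dispersion equation
with_extra = (-4990.0, 6)      # one additional dispersion coefficient
print(information_criteria(*baseline, N_OBS))
print(larger_model_is_preferred(baseline, with_extra, N_OBS))
```

Requiring an improvement in both criteria is one conservative way to encode the preference for the simpler dispersion specification when the values are very close; other tie-breaking rules are equally possible.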
Finally, normalized randomized quantile residuals, see Dunn and Smyth (1996), are used as a graphical tool to help us assess the adequacy of the fit of the competing NBI, PIG, and PIGA regression models with varying dispersion. Additionally, the simple Poisson regression model was fitted for comparison purposes. The normalized randomized quantile residuals for these claim count regression models are defined as

$$\hat{r}_i = \Phi^{-1}(u_i),$$

where $\Phi^{-1}$ is the inverse cumulative distribution function of a standard Normal distribution and where $u_i$ is defined as a random value from the uniform distribution on the interval $[F_i(k_i - 1 \mid \hat{\theta}_i), F_i(k_i \mid \hat{\theta}_i)]$, where $F_i$ is the cumulative distribution function estimated for the ith policyholder, $\hat{\theta}_i$ is the vector of the estimated model parameters after the EM algorithm has reached the global maximum, and $k_i$ is the corresponding observation. The claim frequency model fit can be investigated via the usual quantile–quantile plots. In particular, if the data indeed follow the assumed claim frequency distribution, then the residuals on the quantile–quantile plot will fall approximately on a straight line.
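The construction of these residuals can be sketched for the simplest competitor, the Poisson model. The code below is a minimal illustration using only the Python standard library; the function names, the simulated counts and the mean of 0.2 are assumptions made for the example and do not come from the study. For the NBI, PIG, or PIGA models, only the cumulative distribution function would change.

```python
import math
import random
from statistics import NormalDist

def poisson_cdf(k, mu):
    """P(K <= k) for a Poisson(mu) count; returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)          # P(K = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j            # P(K = j) from P(K = j - 1)
        total += term
    return min(total, 1.0)

def randomized_quantile_residuals(counts, means, rng):
    """Normalized randomized quantile residuals under a fitted Poisson model."""
    std_normal = NormalDist()
    eps = 1e-12
    residuals = []
    for k, mu in zip(counts, means):
        lower = poisson_cdf(k - 1, mu)
        upper = poisson_cdf(k, mu)
        u = rng.uniform(lower, upper)        # uniform draw on [F(k-1), F(k)]
        u = min(max(u, eps), 1.0 - eps)      # keep u strictly inside (0, 1)
        residuals.append(std_normal.inv_cdf(u))
    return residuals

def draw_poisson(mu, rng):
    """Draw one Poisson(mu) count by inversion (hypothetical helper)."""
    u, k = rng.random(), 0
    p = math.exp(-mu)
    cdf = p
    while u > cdf and k < 1000:
        k += 1
        p *= mu / k
        cdf += p
    return k

# Simulated example: when the model is correct, the residuals behave
# like a standard Normal sample and the Q-Q plot is close to a line.
rng = random.Random(2024)
means = [0.2] * 2000
counts = [draw_poisson(mu, rng) for mu in means]
residuals = randomized_quantile_residuals(counts, means, rng)
print(sum(residuals) / len(residuals))
```

Sorting the residuals and plotting them against standard Normal quantiles gives the quantile–quantile plot described above.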
Figure 1 depicts the normalized (randomized) quantile residuals for the Poisson, NBI, PIG, and PIGA models. From Figure 1, we see that, unlike those of the Poisson model, which has a light tail and hence is not a good assumption, the residuals of the NBI, PIG, and PIGA models are close to the diagonal and indicate a good fit to the distribution of the claim frequency. Furthermore, we observe that the more heavy-tailed PIGA model performs better than the NBI and PIG models close to the right tail of the claim frequency distribution. On the other hand, the PIGA model shows a worse fit than the NBI and PIG models in the lower tail. These results were anticipated since, as is well known and as was previously mentioned, the tails of mixed Poisson distributions are equivalent to the tails of their mixing distributions; see, for example, Willmot (1990) and Perline (1998). Thus, as we move from the less heavy-tailed Gamma and Inverse Gaussian mixing distributions to the Inverse Gamma mixing distribution, zero and near-zero values in the left tail area become less likely and high values in the right tail area become more likely. Therefore, regarding our data set, it is reasonable to suggest employing the PIGA model, which, as we are going to see in what follows, performs better than the NBI and PIG models in terms of the DEV, AIC, and SBC values, for deriving a posteriori, or Bonus-Malus, ratemaking mechanisms for younger drivers, who are more likely to have car accidents than older drivers and, hence, are more likely to make insurance claims. However, it should also be noted that, because the body and the left and right tail areas of the actual claim frequency distribution have unique features, for other data sets the NBI, PIG and/or a different mixed Poisson model may perform better than the PIGA model. Thus, because a particular model cannot represent all aspects of real insurance data, from a practical business standpoint it may be appropriate to use a combination of models that could provide alternative options to the insurer for carrying out different tasks, such as deciding on pricing strategies, setting the appropriate level of reserves, and arranging reinsurance.
4.3. Application to Ratemaking
In this subsection, the net premium principle is used to compute the a priori and a posteriori, or Bonus-Malus, premium rates resulting from the NBI, PIG, and PIGA distribution/regression models with varying dispersion.
Firstly, the differences between the claim frequency regression models with varying dispersion will be analyzed via the expected claim frequency of the insureds who belong to the eight different risk classes, which are determined by the relevant a priori characteristics. In particular, the expected claim frequency of risk class $j$, for $j = 1, \ldots, 8$, serves as a basis of the premium for each risk class.
Table 5 presents the a priori premium rates resulting from the NBI, PIG, and PIGA models. From Table 5, we see that the group of policyholders with the lowest mean claim frequency comprises those who are older than 25 years and have a car with HP between 0 and 5000 cc and age greater than five years, i.e., risk class 6. Additionally, the group of insureds with the highest mean claim frequency comprises those who are aged between 18 and 25 years and have a car with HP greater than 5000 cc and age between zero and five years, i.e., risk class 3. Overall, as expected, we observe only small discrepancies in the mean claim frequency values of the NBI, PIG, and PIGA models. However, when the a posteriori corrections are computed in the following examples, we will see that allowing both the mean and dispersion parameters of the NBI, PIG, and PIGA models to be modelled as functions of covariate information affects the estimation of the Bonus-Malus premium rates. In particular, as was previously mentioned, the effect of the estimated regression coefficients of the explanatory variable AD on the dispersion parameter is similar in the case of the NBI and PIG models but differs in the case of the PIGA model. As a result, the Bonus-Malus premiums determined by the NBI and PIG models will differ from the premium rates that result from the PIGA model.
Secondly, we investigate how the PIGA distribution/regression model with varying dispersion responds to claim experience. Consider an insured i with a given claim frequency history and a priori characteristics, and consider the total number of claims that they had. In what follows, we determine, at the renewal of the policy, the expected claim frequency of the insured i for the next period, given the observation of the reported claims in the preceding t periods and the observable characteristics in the preceding periods and the current period. As was mentioned in Section 3, employing Bayes' theorem, we can find that the posterior distribution of the risk parameter is a GIG. Thus, using the quadratic loss function and the net premium principle, we can easily see that, in this case, the expected claim frequency is given by the mean of the posterior structure function, where the two quantities on which it depends are given by Equations (5) and (6), respectively.
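As a rough numerical illustration of how such a posterior mean responds to claim experience, the following sketch does not use Equations (5) and (6). Instead it assumes, purely for illustration, a risk parameter z with an Inverse Gamma prior of shape alpha and scale alpha - 1 (so that the prior mean equals one), and Poisson claim counts with mean mu × z in each period, with mu and alpha constant over time. Under these assumptions the posterior density of z is proportional to z^(K - alpha - 1) exp(-S z - (alpha - 1)/z), where K is the total number of claims and S is the sum of the a priori means, which is of GIG form. Its mean is evaluated here by numerical integration rather than through Bessel functions. The function name and all parameter values are hypothetical.

```python
import math

def posterior_mean(total_claims, total_exposure, alpha,
                   n_grid=6000, lo=-30.0, hi=30.0):
    """Posterior mean of z, integrating on an even grid in s = log(z).

    total_exposure is the sum of the a priori Poisson means over the
    observed periods (assumed setup, not the paper's parameterization).
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for the prior mean to exist")
    beta = alpha - 1.0  # scale chosen so that the prior mean equals one
    step = (hi - lo) / n_grid
    grid = [lo + j * step for j in range(n_grid + 1)]
    # Log of the unnormalized posterior density of s = log(z).
    log_terms = [(total_claims - alpha) * s
                 - total_exposure * math.exp(s)
                 - beta * math.exp(-s) for s in grid]
    shift = max(log_terms)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_terms]
    # The grid step cancels in the ratio of the two sums.
    numerator = sum(w * math.exp(s) for w, s in zip(weights, grid))
    return numerator / sum(weights)

# Relative premium after one period, for hypothetical mu = 0.1, alpha = 3.
mu, alpha = 0.1, 3.0
for k in range(5):
    print(k, round(posterior_mean(k, 1 * mu, alpha), 4))
```

Because the prior mean is one, the posterior mean can be read directly as the premium relative to the a priori premium: values below one correspond to a bonus and values above one to a malus.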
Following this methodology, we calculate the Bonus-Malus premium rates determined by the PIGA model based only on the a posteriori criteria, i.e., the number of individual claims, and based on both the a posteriori and a priori criteria, i.e., the characteristics of the policyholders and their cars. When we consider both criteria, to illustrate the efficiency of the PIGA regression model with varying dispersion for deriving Bonus-Malus ratemaking mechanisms for heavy-tailed and overdispersed claim counts, we restrict our attention to young drivers aged between 18 and 25 years, because they reported significantly more claims than older drivers. In what follows, we examine all four risk classes 1, 2, 3, and 4 of young policyholders, i.e., those formed by all the possible combinations of category C1 of the variable AD with categories C1 and C2 of the variables HP and AC, see Table 5. Assuming that the number of claims ranges from 0 to 4 and the age of the policy is up to five years, we calculated comparable relative premiums for the NBI, PIG, and PIGA distributions/regression models with varying dispersion. The results are presented in Table 6, Table 7, Table 8, Table 9 and Table 10.
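The layout of such a table of relative premiums can be sketched for the NBI benchmark, for which a closed form is available. The sketch assumes a Gamma mixing distribution with shape and rate both equal to alpha (prior mean one) and a constant a priori mean mu, so that after t years with K claims the relative premium is (alpha + K) / (alpha + t × mu). The values of mu and alpha below are hypothetical, and the output does not reproduce the tables of the study.

```python
import math

def nbi_relative_premium(n_claims, n_years, mu, alpha):
    """Posterior mean of the Gamma(alpha, alpha) risk parameter.

    Assumes Poisson(mu * z) counts in each year with constant mu, so the
    posterior is Gamma(alpha + n_claims, alpha + n_years * mu).
    """
    if alpha <= 0 or mu <= 0:
        raise ValueError("alpha and mu must be positive")
    return (alpha + n_claims) / (alpha + n_years * mu)

def bonus_malus_table(mu, alpha, max_claims=4, max_years=5):
    """Relative premiums indexed by (policy year, total claims so far)."""
    return {(t, k): nbi_relative_premium(k, t, mu, alpha)
            for t in range(1, max_years + 1)
            for k in range(max_claims + 1)}

# Hypothetical parameters, for illustration only.
table = bonus_malus_table(mu=0.5, alpha=2.0)
for t in range(1, 6):
    row = [f"{100 * table[(t, k)]:7.2f}" for k in range(5)]
    print(t, " ".join(row))
```

Entries below 100 correspond to a bonus and entries above 100 to a malus; the PIG and PIGA tables follow the same layout but require the posterior means of their own mixing distributions.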
From Table 6, Table 7, Table 8, Table 9 and Table 10, we see that, if the policyholder i has a claim-free year, the premium rates decrease, whereas, if they have one or more claims, the premium rates increase, resulting in a bonus or a malus, respectively. Furthermore, we observe that the bonuses awarded to claim-free policyholders are quite similar and moderate discrepancies lie in the premiums required to be paid by those insureds who have reported up to
claims or who have made more than
claims, but have some claim experience in the case of the NBI, PIG and PIGA distributions/regression models with varying dispersion. For example, for the case without covariates, as we can see from Table 6, the policyholders who are claim-free will receive a bonus of 22.72%, 22.02% and 19.23% in year
in the case of the NBI, PIG and PIGA distributions, respectively. Additionally, the insureds who had
claims in year
will have to pay a malus of 125.00%, 140.63%, and 131.43% in the case of the NBI, PIG, and PIGA distributions, respectively. Similarly, for the case with covariates, we observe that claim-free policyholders will receive bonuses of 4.23%, 5.45%, and 6.36% (see Table 7), 11.08%, 14.29%, and 14.00% (see Table 8), 2.26%, 2.77%, and 3.55% (see Table 9), and 6.12%, 7.78%, and 8.39% (see Table 10) in year
in the case of the NBI, PIG, and PIGA regression models with varying dispersion, respectively. Additionally, the individuals who had
claims in year
will have to pay maluses of 17.85%, 24.50%, and 27.86% (see Table 7), 7.01%, 6.62%, and 7.65% (see Table 8), 21.06%, 30.41%, and 37.37% (see Table 9), and 14.81%, 19.55%, and 21.82% (see Table 10) in the case of the NBI, PIG, and PIGA regression models with varying dispersion, respectively.
Furthermore, the more heavy-tailed PIGA distribution/regression model with varying dispersion penalizes high-risk policyholders who reported more than
claims in years
and
more severely than the NBI and PIG distribution/regression models with varying dispersion. Thus, the proposed model encourages good driving behavior more than the NBI and PIG models during the first two years of the policy. At this point, we would also like to call attention to the fact that, in our data, many of the young insureds who belong to risk classes 1, 2, 3, and 4 had just started to drive, so this is in line with market practice, which considers these years of the driving history as the most dangerous period. For instance, for the case without covariates, as we can see from Table 6, policyholders who had
claims will have to pay a malus of 161.87%, 205.03%, and 248.87% of the basic premium in year
in the case of the NBI, PIG, and PIGA distributions, respectively. Analogously, regarding the case with covariates, we observe, for example, that the insureds who had
claims and belong to risk class 1, see Table 7, will have to pay a malus of 29.25%, 43.48%, and 60.45% of the basic premium in year
in the case of the NBI, PIG, and PIGA regression models with varying dispersion, respectively. Additionally, we see that the individuals who belong to risk class 3, see Table 9, will have to pay a malus of 31.03%, 47.29%, and 70.23% of the basic premium in year
in the case of the NBI, PIG, and PIGA regression models with varying dispersion, respectively.
Additionally, the premiums required to be paid by a high risk policyholder who has reported more than
claims in different years are better distinguished under the PIGA distribution/regression model with varying dispersion than under the NBI and PIG distributions/regression models with varying dispersion. Regarding the case without covariates, as we can see from Table 6, for example, an insured who had
claims in years
and
will have to pay maluses of 142.04% and 110.20%, 168.69% and 118.31%, and 173.91% and 103.42% in the case of the NBI, PIG, and PIGA distributions, respectively. Similarly, for the case with covariates, we observe, for instance, that a policyholder who had
claims in years
and
and belongs to risk class 2, see Table 8, will have to pay maluses of 18.31% and 10.17%, 22.69% and 10.18%, and 27.11% and 11.82% in the case of the NBI, PIG, and PIGA regression models with varying dispersion, respectively. An insured who belongs to risk class 4, see Table 10, will have to pay maluses of 24.91% and 20.01%, 35.35% and 26.97%, and 44.71% and 31.07% in the case of the NBI, PIG, and PIGA regression models with varying dispersion, respectively.
Finally, it is worth noting that the Bonus-Malus premiums reported in Table 7, Table 8, Table 9 and Table 10 are significantly lower than the Bonus-Malus premiums presented in Table 6. Therefore, allowing both the mean and the dispersion parameters of the three mixed Poisson models to vary through covariates is justified from a practical business standpoint, since the MTPL market is very competitive and, hence, insurance companies can better attract clients by offering them lower penalties. Overall, for all the reasons listed above, it is reasonable to conclude that, for the heavy-tailed and overdispersed MTPL data set used in this study, the employment of the PIGA model, which provided the best fitting performance, leads to a better tariffication than the classic NBI and PIG models, since, while it rewards claim-free policyholders in a similar way to the latter, it also results in a more reasonable growth in the premium payments.