Article

Claim Prediction and Premium Pricing for Telematics Auto Insurance Data Using Poisson Regression with Lasso Regularisation

1 School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW 2050, Australia
2 Department of Statistics, University of Haifa, Haifa 3103301, Israel
3 Transdisciplinary School, University of Technology Sydney, Ultimo, NSW 2007, Australia
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Risks 2024, 12(9), 137; https://doi.org/10.3390/risks12090137
Submission received: 23 July 2024 / Revised: 15 August 2024 / Accepted: 22 August 2024 / Published: 28 August 2024

Abstract

We leverage telematics data on driving behavior variables to assess driver risk and predict future insurance claims in a case study utilising a representative telematics sample. In the study, we aim to categorise drivers according to their driving habits and establish premiums that accurately reflect their driving risk. To accomplish our goal, we employ the two-stage Poisson model, the Poisson mixture model, and the Zero-Inflated Poisson model to analyse the telematics data. These models are further enhanced by incorporating regularisation techniques such as lasso, adaptive lasso, elastic net, and adaptive elastic net. Our empirical findings demonstrate that the Poisson mixture model with the adaptive lasso regularisation outperforms other models. Based on predicted claim frequencies and drivers’ risk groups, we introduce a novel usage-based experience rating premium pricing method. This method enables more frequent premium updates based on recent driving behaviour, providing instant rewards and incentivising responsible driving practices. Consequently, it helps to alleviate cross-subsidization among risky drivers and improves the accuracy of loss reserving for auto insurance companies.

1. Introduction

Traditional auto insurance premiums have been based on driver-related risk (demographic) factors, such as age, gender, marital status, claim history, credit risk and living district, and vehicle-related risk factors, such as vehicle year/make/model, which represent the residual value of an insured vehicle. Although these traditional variables or factors indicate claim frequency and size, they do not reflect true driving risk and often lead to lower-risk drivers cross-subsidising higher-risk drivers to balance the claim cost. These premiums have been criticised as inefficient and socially unfair because they neither penalise aggressive driving nor encourage prudent driving. Chassagnon and Chiappori (1997) reported that accident risk depends not only on demographic variables but also on driver behaviour, which reflects how cautiously drivers drive to reduce accident risk.
Usage-based insurance (UBI) relies on telematic data, often augmented by global positioning systems (GPSs), to gather vehicle information. UBI encompasses two primary models: Pay As You Drive (PAYD) and Pay How You Drive (PHYD). PAYD operates on a drive-less-pay-less principle, taking into account driving habits and travel details such as route choices, travel time, and mileage. This model represents a significant advancement over traditional auto insurance approaches. For instance, Ayuso et al. (2019) utilised a Poisson regression model to analyse a combination of seven traditional and six travel-related variables. However, Kantor and Stárek (2014) highlighted limitations in PAYD policies, notably their sole focus on kilometres driven, neglecting crucial driver behaviour aspects.
By integrating a telematics device into the vehicle, PHYD extends the principles of PAYD to encompass the monitoring of driving behaviour profiles over a specified policy period. Driving behaviour, encompassing operational choices such as speeding, harsh braking, hard acceleration, or sharp cornering in varying road types, traffic conditions, and weather, serves as a defining aspect of drivers’ styles (Tselentis et al. 2016; Winlaw et al. 2019). These collected driving data offer valuable insights into assessing true driving risks, enabling the calculation of the subsequent UBI experience rating premium. This advancement over traditional premiums incorporates both historical claim experiences and current driving risks. The UBI premium can undergo regular updates to provide feedback to drivers, incentivising improvements in driving skills through premium reductions. Research by Soleymanian et al. (2019) indicated that individuals drive less and safer when incentivised by UBI premiums. Moreover, Bolderdijk et al. (2011) demonstrated that monitoring driving behaviours can effectively reduce speeding and accidents by promoting drivers’ awareness and behavioural changes. Wouters and Bos (2000) showed that monitoring of driving resulted in a 20% reduction in accidents. The monitoring system enables early intervention for risky drivers, potentially saving lives (Hurley et al. 2015). Finally, Ellison et al. (2015) concluded that personalised feedback coupled with financial incentives yields the most significant changes in driving behaviour, emphasising the importance of a multifaceted approach to risk reduction.
The popularity of PHYD policies has surged in recent years, driven by the promise of lower premiums for safe driving behaviour. QBE Australia (Q for Queensland Insurance, B for Bankers’ and Traders’ and E for The Equitable Probate and General Insurance Company), renowned for its innovative approaches, introduced a product called the Insurance Box, a PHYD policy featuring in-vehicle telematics. This product not only offers lower premiums to good drivers but also delivers risk scores as actionable feedback on driving performance. These risk scores directly influence the calculation of insurance premiums. In essence, PHYD policies epitomise personalised insurance (Barry and Charpentier 2020), nurturing a culture of traffic safety while concurrently reducing congestion and environmental impact by curbing oil demand and pollutant emissions.
To assess UBI premiums, extensive driving data are initially gathered via telematics technology. Subsequently, a comprehensive set of driving behaviour variables, termed driving variables (DVs), is generated. These variables encompass four main categories: driver-related, vehicle-related, driving habits, and driving behaviours. These DVs are then analysed through regression against insurance claims data to unveil correlations between driving habits and associated risks, which is a process commonly referred to as knowledge discovery (Murphy 2012). Stipancic et al. (2018) determined drivers’ risk by analysing the correlations of accident frequency and accident severity with specific driving behaviours such as hard braking and acceleration.
To forecast and model accident frequencies, Guillen et al. (2021) proposed utilising Poisson regression models applied to both traditional variables (related to drivers, vehicles, and driving habits) and critical incidents, which encompass risky driving behaviours. Through this approach, the study delineates insurance premiums into a baseline component and supplemental charges tailored to near-miss events—defined as critical incidents like abrupt braking, acceleration, and smartphone usage while driving—which have the potential to precipitate accidents. Building on this, Guillen et al. (2020) employed negative binomial (NB) regression, regressing seven traditional variables, five travel-related factors, and three DVs to the frequency of near-miss events attributable to acceleration, braking, and cornering. Notably, the study suggests that these supplementary charges stemming from near-miss events could be dynamically updated on a weekly basis.
To comprehensively assess the potential nonlinear impacts of DVs on claim frequencies, Verbelen et al. (2018) utilised Poisson and negative binomial regression models within generalised additive models (GAMs). They focused on traditional variables, as well as telematic risk exposure DVs, such as total distance driven, yearly distance, number of trips, distance per trip, distance segmented by road type (urban, other, motorways, and abroad), time slot, and weekday/weekend. While these exposure-centric DVs can serve as offsets in regression models, they fail to capture the subtle details of actual driving behaviour. To attain a deeper understanding of driving behaviour, it becomes essential to extract a broader array of DVs that can discern between safe and risky driving practices while also delineating claim risk. For instance, rather than merely registering a braking event, a more comprehensive approach involves constructing a detailed 'braking story' that accounts for various factors such as road characteristics (location, lanes, angles, etc.), braking style (abrupt, continuous, repeated, intensity, etc.), braking time (time of day, day of the week, etc.), road type (speed limit, normative speed, etc.), weather conditions, preceding actions (turning, lane changing, etc.), and more. Furthermore, the inclusion of additional environmental and traffic variables obtained through GPS enhances the richness of available information, facilitating a more thorough analysis of driving behaviour and associated risk factors.
As the number of variables describing driving behaviour increases, the data can become voluminous, volatile, and noisy. Managing this influx of variables is crucial to mitigate computational costs and address the issue of multicollinearity among them. Multicollinearity arises due to significant overlap in the predictive power of certain variables. For instance, a driver residing in an area with numerous traffic lights might engage in more forceful braking, or an elderly driver might tend to drive more frequently during midday rather than during typical rush hours or late nights. Consequently, it is possible for confusion to arise between factors such as location and aggressive braking or age and preferred driving times. Multicollinearity can lead to overfitting and instability in predictive models, diminishing their effectiveness. Thus, streamlining the variables to a manageable number is not only essential for computational efficiency but also critical for addressing multicollinearity and enhancing the reliability of predictive models.
Machine learning, employing statistical algorithms, holds remarkable potential in mitigating overfitting and bolstering the stability and predictability of various predictive models. These algorithms are typically categorised as supervised or unsupervised. In the realm of unsupervised learning, Osafune et al. (2017) conducted an analysis wherein they developed classifiers to distinguish between safe and risky drivers based on acceleration, deceleration, and left-acceleration frequencies gleaned from smartphone-equipped sensor data from over 800 drivers. By labelling drivers with at least 20 years of driving experience and no accident records as safe and those with more than two accident records as risky, they achieved a validation accuracy of 70%. Wüthrich (2017) introduced pattern recognition techniques utilising two-dimensional velocity and acceleration (VA) heat maps via K-means clustering. However, it is worth noting that neither study offers predictions related to insurance claims.
With claim risk information such as claim making, claim frequency, and claim size, supervised machine learning models embedded within generalised linear models (GLMs) can be constructed to unfold the hidden patterns in big data and predict future claims for premium pricing. Various machine learning techniques are widely utilised in predictive modelling, including clustering, decision trees, random forests, gradient boosting, and neural networks. Gao et al. (2019) investigated the effectiveness of Poisson GAMs, integrating two-dimensional speed–acceleration heat maps alongside traditional risk factors for predicting claim frequencies. They employed feature extraction methods outlined in their previous work (Gao and Wüthrich 2018), such as K-medoids clustering to group drivers with similar heatmaps and principal component analysis (PCA) to reduce the dimensionality of the design matrix, thereby enhancing computational efficiency. Furthermore, Gao et al. (2019) conducted an extensive analysis focusing on the predictive power of additional driving style and habit covariates using Poisson GAMs. In a similar vein, Makov and Weiss (2016) integrated decision trees into Poisson predictive models, expanding the repertoire of predictive algorithms in insurance claim forecasting.
In assessing various machine learning techniques, Paefgen et al. (2013) discovered that neural networks outperformed logistic regression and decision tree classifiers when analysing claim events using 15 travel-related variables. Ma et al. (2018) employed logistic regression to explore accident probabilities based on four traditional variables and 13 DVs, linking these probabilities to insurance premium ratings. Weerasinghe and Wijegunasekara (2016) categorised claim frequencies as low, fair, and high and compared neural networks, decision trees, and multinomial logistic regression models. Their findings indicated that neural networks achieved the best predictive performance, although logistic regression was recommended for its interpretability. Additionally, Huang and Meng (2019) utilised logistic regression for claim probability and Poisson regression for claim frequency, incorporating support vector machine, random forest, advanced gradient boosting, and neural networks. They examined seven traditional variables and 30 DVs grouped by travel habits, driving behaviour, and critical incidents, employing stepwise feature selection and providing an overview of UBI pricing models integrated with machine learning techniques.
However, machine learning techniques often encounter challenges with overfitting. One strategy to address both multicollinearity and overfitting is to regularise the loss function by penalising the likelihood based on the number of predictors. While ridge regression primarily offers shrinkage properties, it does not inherently select an optimal set of predictors to capture the best driving behaviours. Tibshirani (1996) introduced the Least Absolute Shrinkage and Selection Operator (lasso) regression, incorporating an L1 penalty for the predictors. Subsequently, the lasso regression framework underwent enhancements to improve model fitting and variable selection processes. For instance, Zou and Hastie (2005) proposed the elastic net, which combines the L1 and L2 penalties of lasso and ridge methods linearly. Zou (2006) introduced the adaptive lasso, employing adaptive weights to penalise different predictor coefficients in the L1 penalty. Moreover, Park and Casella (2008) presented the Bayesian implementation of lasso regression, wherein lasso estimates can be interpreted as Bayesian posterior mode estimates under the assumption of independent double-exponential (Laplace) distributions as priors on the regression parameters. This approach allows for the derivation of Bayesian credible intervals of parameters to guide variable selection. Jeong and Valdez (2018) expanded upon the Bayesian lasso framework proposed by Park and Casella (2008) by introducing conjugate hyperprior distributional assumptions. This extension led to the development of a new penalty function known as log-adjusted absolute deviation, enabling variable selection while ensuring the consistency of the estimator. While the Bayesian approach is applicable, running MCMC is often time-consuming.
When modelling claim frequencies, a common issue arises from an abundance of zero claims, which Poisson or negative binomial regression models may not effectively capture. These zero claims do not necessarily signify an absence of accidents during policy terms but rather indicate that some policyholders, particularly those pursuing no-claim discounts, may refrain from reporting accidents. To identify factors influencing zero and nonzero claims, Winlaw et al. (2019) employed logistic regression with lasso regularisation on a case–control study, assessing the impact of 24 DVs on acceleration, braking, speeding, and cornering. Their findings highlighted speeding as the most significant driver behaviour linked to accident risk. In a different approach, Guillen et al. (2019) and Deng et al. (2024) utilised zero-inflated Poisson (ZIP) regression models to model claim frequencies directly and Tang et al. (2014) further integrated the model with the EM algorithm and adaptive lasso penalty. However, Tang et al. (2014) observed suboptimal variable selection results for the zero-inflation component, suggesting a lower signal-to-noise ratio compared to the Poisson component. Banerjee et al. (2018) proposed a multicollinearity-adjusted adaptive lasso approach employing ZIP regression. They explored two data-adaptive weighting schemes: inverse of maximum likelihood estimates and inverse estimates divided by their standard errors. For a comprehensive overview of various modelling approaches in UBI, refer to Table A1 in Eling and Kraft (2020).
Numerous studies in UBI have employed a limited number of DVs to characterise a broad spectrum of driver behaviours. For instance, Jeong (2022) analysed synthetic telematic data sourced from So et al. (2021), encompassing 10 traditional variables and 39 DVs, including metrics like sudden acceleration and abrupt braking. While Jeong (2022) utilised PCA to reduce dimensionality and enhance predictive model stability, the interpretability of the principal components derived from PCA remains constrained. Regularisation provides a promising alternative for dimension reduction while addressing the challenge of overfitting. The literature on UBI predictive models employing GLMs with machine learning techniques, such as lasso regularisation to mitigate overfitting, is still relatively sparse, particularly concerning forecasting claim frequencies and addressing challenges like excessive zero claims and overdispersion arising from heterogeneous driving behaviours.
Our main objective is to propose predictive models to capture the impact of driving behaviours (safe or risky) on claim frequencies, aiming to enhance prediction accuracy, identify relevant DVs, and classify drivers based on their driving behaviours. This segmentation will enable the application of differential UBI premiums for safe and risky drivers. More importantly, we advocate for the regular updating of these UBI premiums to provide ongoing feedback to drivers through the relevant DVs and encourage safer driving habits.
We demonstrate the applicability of the proposed predictive models through a case study using a representative telematics dataset comprising 65 DVs. The proposed predictive models include two-stage threshold Poisson (TP), Poisson mixture (PM), and ZIP regression models with lasso regularisation. We extend the regularisation technique to include adaptive lasso and elastic net, facilitating the identification of distinct sets of DVs that differentiate safe and risky behaviours. In the initial stage of the regularised TP models, drivers are classified into the risky (safe) group if their annual predicted claim frequencies, estimated by a single-component Poisson model, exceed (do not exceed) predefined thresholds. Subsequently, in stage two, regularised Poisson regression models are refitted to each driver subgroup (exceeding thresholds or not) using different sets of selected DVs in each group. Alternatively, PM models simultaneously estimate parameters and classify drivers. Our findings reveal that PM models offer greater efficiency compared to TP models, providing added flexibility and capturing overdispersion akin to NB distributions.
In ZIP models, we observe that the structural zero component may not necessarily indicate safe drivers, as safe drivers may claim less frequently but not necessarily abstain from claims altogether, while risky drivers may avoid claims due to luck or incentives like bonus rewards. So et al. (2021) investigated the cost-sensitive multiclass adaptive boosting method, defining classes based on zero claims, one claim, and two or more claims, differing from our proposed safe and risky driver classes. We argue that the level of accident risk may not solely correlate with the number of claims but rather with driving behaviours. Hence, the regularised PM model proves more efficient in tracking the impact of DVs on claim frequencies, allowing for divergent effects between safe and risky drivers. This proposed PM model constitutes the primary contribution of this paper, addressing a critical research gap in telematics data analysis.
Our second contribution is to bolster the robustness of our approach and mitigate overfitting by incorporating resampling and cross-validation (CV) apart from lasso regularisation. These techniques help us attain more stable and reliable results. Additionally, we utilise the area under the curve (AUC) of the receiver operating characteristic (ROC) curve as one of the performance metrics; it evaluates classification accuracy, highlighting the contribution of the predictive models in classifying drivers.
Our third contribution involves introducing an innovative UBI experience rating premium method. This method extends the traditional experience rating premium method by integrating classified claim groups and predicted claim frequencies derived from the best-trained model. This dynamic pricing approach also enables more frequent updates of premiums to incentivise safer and reduced driving. Moreover, averaged and individual driving scores from the identified relevant DVs can inform each driver about their driving behaviour, possibly with warnings, and encourage skill improvement. By leveraging these advanced premium pricing models, we can improve loss reserving practices and even evaluate the legitimacy of reported accidents based on driving behaviours.
Lastly, we highlight a recent paper (Duval et al. 2023) with aims similar to this paper's. They applied logistic regression with elastic net regularisation to predict the probability of claims, whereas this paper considers two-group PM regression instead of logistic regression to predict claim frequency and allows different DVs to capture the distinct safe and risky driving behaviours. For predictive variables, they used driving habits information (when, where, and how much the insured drives) from telematics, as well as traditional risk factors such as gender and vehicle age, whereas this paper focuses on driving behaviour/style (how the insured drives). To avoid handcrafting of telematics information, they proposed measures using the Mahalanobis method, Local Outlier Factor, and Isolation Forest to summarise trip information into local/routine anomaly scores by trips and global/peculiar anomaly scores by drivers, which were used as features. This is an innovative idea in the literature. On the other hand, this paper uses common handcrafting practices to summarise driving behaviour by drivers, using both driving habits (where and when) and driving styles (how) information by defining driving events such as braking and turning while considering the time, location, and severity of events. Duval et al. (2023) demonstrated that the improvement in classification using lower global/peculiar Mahalanobis anomaly scores enables a more precise pure premium (product of the claim probability from logistic regression and the insured amount) calculation. As stated above, this paper provides differential contributions by classifying drivers into safe and risky groups, predicting claims for drivers in their groups using regularised PM models (among regularised TP and ZIP models), which is pioneering in the UBI literature, and calculating premiums using the proposed innovative UBI experience rating premium based on drivers' classifications (safe/risky) and predicted annual claims.
The paper is structured as follows: Section 2 outlines the GLMs, including Poisson, PM, and ZIP regression models, alongside lasso regularisation and its extensions. Section 3 presents the telematics data and conducts an extensive empirical analysis of the two-stage TP, PM, and ZIP models. Section 4 introduces the proposed UBI experience rating premium method. Lastly, Section 5 offers concluding remarks and implementation guidelines and explores potential avenues for future research.

2. Methodologies

We derive predictive models for the claim rate of any driver using GLMs with lasso regularisation, where the true model is assumed to have a sparse representation in terms of the 65 DVs. This section also considers some model performance measures, including the AUC.

2.1. Regression Models

2.1.1. Poisson Regression Model

The Poisson regression model is commonly applied to count data, like claims. It is defined as
$$Y_i \sim \mathrm{Poisson}\big(\mu_i(\boldsymbol{\beta})\big), \qquad \mu_i(\boldsymbol{\beta}) = n_i a_i = n_i \exp(\mathbf{x}_i^\top \boldsymbol{\beta}) = \exp\big(\mathbf{x}_i^\top \boldsymbol{\beta} + \log n_i\big), \tag{1}$$
where $\mathbf{Y} = Y_{1:N}$, $\boldsymbol{\beta} = \beta_{0:J}$, and $J$ is the number of selected DVs in the model; $a_i = \exp(\mathbf{x}_i^\top \boldsymbol{\beta})$ estimates the number of claims per year for driver $i$; and $\log(n_i)$ is the offset in the regression model. Poisson regression assumes equidispersion and is applied in the two-stage TP model of Section 3.4. For overdispersed data, the negative binomial (NB) distribution, arising as a Poisson–Gamma mixture, provides extra dispersion. With NB regression, the term "$\mathrm{Poisson}(\mu_i)$" in (1) is replaced with the NB distribution $\mathrm{NB}(\nu, q_i) = \mathrm{NB}\big(\nu, \nu/(\nu + \mu_i)\big)$, where $\nu$ is the shape parameter and $q_i$ is the success probability of each trial. The NB distribution converges to the Poisson distribution as $\nu$ tends to infinity.
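To make the model concrete, the following minimal R sketch fits a Poisson regression with a log-exposure offset and an NB alternative. The data frame dat and the DV names dv1 and dv2 are hypothetical stand-ins for the telematics DVs, simulated here purely for illustration.

# Simulated stand-in data: claims y_i, exposure n_i, and two DVs
set.seed(1)
dat <- data.frame(claims   = rpois(500, 0.08),
                  exposure = runif(500, 0.5, 1.5),
                  dv1 = rnorm(500), dv2 = rnorm(500))

# Poisson regression with offset log(n_i), as in (1)
fit_pois <- glm(claims ~ dv1 + dv2 + offset(log(exposure)),
                family = poisson, data = dat)
summary(fit_pois)

# NB alternative for overdispersed data (Poisson-Gamma mixture)
library(MASS)
fit_nb <- glm.nb(claims ~ dv1 + dv2 + offset(log(exposure)), data = dat)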

2.1.2. Poisson Mixture Model

The finite mixture Poisson model is another popular model for modelling unobserved heterogeneity. The model assumes $G$ unobserved groups, each with probability $\pi_g$, $0 < \pi_g < 1$, $g = 1, \dots, G$, and $\sum_{g=1}^{G} \pi_g = 1$. Focusing on classifying safe and risky drivers, we model $G = 2$ groups. Assume that the claim frequency $Y_i$ for driver $i$ in group $g$ follows a Poisson distribution with mean $\mu_{ig}$, that is, $Y_i \sim \mathrm{Poisson}(\mu_{ig})$ with probability $\pi_g$, $g = 1, \dots, G$. The probability mass function (pmf) of the Poisson mixture model is
$$f\big(y_i \mid \pi, \mu_{i1}(\boldsymbol{\beta}_1), \mu_{i2}(\boldsymbol{\beta}_2)\big) = \pi f_1\big(y_i \mid \mu_{i1}(\boldsymbol{\beta}_1)\big) + (1 - \pi) f_2\big(y_i \mid \mu_{i2}(\boldsymbol{\beta}_2)\big), \tag{2}$$
where $\pi_1 = \pi$ and $\pi_2 = 1 - \pi$, $\boldsymbol{\beta}_g = (\beta_{0:J,g})$, $\boldsymbol{\theta} = (\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_G, \pi_1, \dots, \pi_{G-1})$ is the model parameter vector, and $f_g\big(y_i \mid \mu_{ig}(\boldsymbol{\beta}_g)\big)$ is the Poisson pmf with mean function $\mu_{ig}(\boldsymbol{\beta}_g) = \exp\big(\mathbf{x}_i^\top \boldsymbol{\beta}_g + \log n_i\big)$.
The expectation–maximisation (EM) algorithm is often used to estimate the parameters $\boldsymbol{\theta}$. In the E step, the posterior group membership for driver $i$ is estimated by
$$z_{ig} = \frac{\pi_g f_g\big(y_i \mid \mu_{ig}(\boldsymbol{\beta}_g)\big)}{\sum_{g'=1}^{G} \pi_{g'} f_{g'}\big(y_i \mid \mu_{ig'}(\boldsymbol{\beta}_{g'})\big)}. \tag{3}$$
The marginal predicted claim is
$$\hat{y}_i = \hat{z}_{i1}\, \mu_{i1}(\boldsymbol{\beta}_1) + (1 - \hat{z}_{i1})\, \mu_{i2}(\boldsymbol{\beta}_2). \tag{4}$$
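As an illustration of (2)–(4), the following R sketch implements the EM algorithm for a two-group Poisson mixture with a log-exposure offset. Here X, y, and n are an assumed DV matrix, claim counts, and exposures, and the weighted glm calls in the M step are one simple (unpenalised) implementation choice, not the regularised estimator used later in the paper.

# EM for a two-group Poisson mixture (unpenalised sketch)
em_pm <- function(X, y, n, max_iter = 200, tol = 1e-8) {
  pi1 <- 0.5
  b1 <- rep(0, ncol(X) + 1); b2 <- rep(0.1, ncol(X) + 1)  # break symmetry
  ll_old <- -Inf
  for (it in seq_len(max_iter)) {
    mu1 <- as.vector(exp(cbind(1, X) %*% b1)) * n
    mu2 <- as.vector(exp(cbind(1, X) %*% b2)) * n
    d1 <- pi1 * dpois(y, mu1); d2 <- (1 - pi1) * dpois(y, mu2)
    ll <- sum(log(d1 + d2))
    if (abs(ll - ll_old) < tol) break
    ll_old <- ll
    z1 <- d1 / (d1 + d2)                         # E step, Equation (3)
    # M step: weighted Poisson regressions and mixing proportion
    b1 <- coef(glm(y ~ X, family = poisson, weights = z1, offset = log(n)))
    b2 <- coef(glm(y ~ X, family = poisson, weights = 1 - z1, offset = log(n)))
    pi1 <- mean(z1)
  }
  yhat <- z1 * mu1 + (1 - z1) * mu2              # marginal prediction, Equation (4)
  list(pi1 = pi1, beta1 = b1, beta2 = b2, z1 = z1, yhat = yhat, loglik = ll)
}
# Usage with the hypothetical data from the earlier sketch:
# fit <- em_pm(as.matrix(dat[, c("dv1", "dv2")]), dat$claims, dat$exposure)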
If there is a high proportion of zero claims, the ZIP model (Lambert 1992) may be suitable to capture the excessive zeros. The model is a special case of a two-group mixture model that combines a zero point mass in group 1 with a Poisson distribution in group 2. The zeros may come from the point mass (structural zero) or the zero count (natural zero) in a Poisson distribution. The model is given by
$$\Pr(Y_i = 0) = \pi_i + (1 - \pi_i)\exp(-\mu_i), \quad \text{and} \quad \Pr(Y_i = y_i) = (1 - \pi_i)\,\frac{\mu_i^{y_i}\exp(-\mu_i)}{y_i!}, \;\; y_i \geq 1, \tag{5}$$
where the two regression models (called the zero and count models) for the probability $\pi_i$ of a structural zero and the expected counts (including nonstructural zeros) $\mu_i$, respectively, are given by
$$\pi_i(\boldsymbol{\theta}) = \frac{\exp\big(\mathbf{x}_i^\top \boldsymbol{\psi} + \log n_i\big)}{1 + \exp\big(\mathbf{x}_i^\top \boldsymbol{\psi} + \log n_i\big)}, \quad \text{and} \quad \mu_i(\boldsymbol{\beta}) = \exp\big(\mathbf{x}_i^\top \boldsymbol{\beta} + \log n_i\big), \tag{6}$$
and the logistic regression parameters $\boldsymbol{\psi} = (\psi_0, \dots, \psi_{J_\psi})$ define a vector of $J_\psi$ selected DVs to estimate the probability of extra zeros; the vector of model parameters is $\boldsymbol{\theta} = (\boldsymbol{\psi}, \boldsymbol{\beta}, \boldsymbol{\pi})$.
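A hedged R sketch of the ZIP model in (5) and (6) using the pscl package, with the count model to the left of the bar and the zero (logistic) model to its right; dat reuses the hypothetical data frame from the earlier sketch.

library(pscl)
# ZIP with log-exposure offsets in both the count and zero components
fit_zip <- zeroinfl(claims ~ dv1 + dv2 + offset(log(exposure)) |
                      dv1 + dv2 + offset(log(exposure)),
                    data = dat, dist = "poisson")
summary(fit_zip)
p0 <- predict(fit_zip, type = "zero")    # probability of a structural zero, pi_i
mu <- predict(fit_zip, type = "count")   # expected Poisson counts, mu_i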

2.2. Regularisation Techniques

The stepwise procedure to search for a good subset of DVs often suffers from high variability, a local optimum, and ignorance of uncertainty in the searching procedures (Fan and Li 2001). Lasso (L) regularisation offers an alternative approach to select variables for parsimonious models. It is further extended to adaptive lasso (A), elastic net (E), and adaptive elastic net (N). These regularisations with an L1 penalty provide a simple way to enforce sparsity in variable selection by shrinking some coefficients $\beta_j$ to zero. This aligns with our aim to select important DVs, that is, those with coefficients not shrunk to zero.
To implement these regularisation techniques, we consider the penalised log likelihood (PLL) (Banerjee et al. 2018; Bhattacharya and McNicholas 2014). For the most general case of adaptive elastic net regularisation, the coefficients $\boldsymbol{\beta}_{\lambda,w,\alpha,N}$ of the Poisson regression in (1), estimated by minimising the penalised log likelihood, are given by
$$\mathrm{LOSS}_{\lambda,\alpha,w}(\boldsymbol{\beta}) = -\sum_{i=1}^{N} \log f\big(y_i; \mu_i(\boldsymbol{\beta})\big) + \lambda \left[ \frac{1-\alpha}{2} \sum_{j=1}^{J} w_j \beta_j^2 + \alpha \sum_{j=1}^{J} w_j |\beta_j| \right], \tag{7}$$
where the first term is the negative log likelihood (NLL), the second term is the penalty, $f\big(y_i; \mu_i(\boldsymbol{\beta})\big)$ is the pmf of the Poisson model, and $w_j$ are the data-driven adaptive weights. Equation (7) includes the special cases $\alpha = 1$, $w_j = 1$ for lasso; $\alpha = 1$ for adaptive lasso; and $w_j = 1$ for elastic net.
The development of (7) starts with the basic lasso regularisation with $\alpha = 1$ and $w_j = 1$. The parameter estimates conditional on $\lambda$ are given by
$$\hat{\beta}_{j,\lambda} = \beta_{j,\mathrm{NLL}} \, \max\!\left(0,\; 1 - \frac{N\lambda}{|\beta_{j,\mathrm{NLL}}|}\right), \tag{8}$$
where $\boldsymbol{\beta}_{\mathrm{NLL}} = (\beta_{1,\mathrm{NLL}}, \dots, \beta_{J,\mathrm{NLL}})$ minimises the NLL when $\lambda = 0$. As $\lambda$ increases, the term $1 - N\lambda/|\beta_{j,\mathrm{NLL}}|$ becomes negative, and so $\hat{\beta}_{j,\lambda}$ shrinks to zero. The penalty term $\lambda \sum_{j=1}^{J} |\hat{\beta}_{j,\lambda}|$ then drops as more $\hat{\beta}_{j,\lambda} = 0$, but the NLL increases as $\boldsymbol{\beta}_\lambda = (\hat{\beta}_{1,\lambda}, \dots, \hat{\beta}_{J,\lambda})$ moves further away from $\boldsymbol{\beta}_{\mathrm{NLL}}$, so one can choose a $\lambda_{\min}$ that minimises the PLL to obtain $\boldsymbol{\beta}_{\lambda,L}$. Alternatively, one can perform a $K$-fold CV and choose the $\lambda_{\min}$ that provides the best overall model fit across all $K$ validation samples. Different criteria may suggest different optimal $\lambda_{\min}$ and hence different estimates $\boldsymbol{\beta}_{\lambda_{\min}}$. Details are provided in points 1–2 of Appendix A.
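As one possible implementation, the K-fold CV choice of $\lambda_{\min}$ for lasso-regularised Poisson regression can be sketched in R with the glmnet package; X is an assumed (normalised) DV matrix, built here from the two hypothetical DVs of the earlier sketch.

library(glmnet)
X <- as.matrix(dat[, c("dv1", "dv2")])   # stand-in DV matrix
cv_lasso <- cv.glmnet(X, dat$claims, family = "poisson",
                      offset = log(dat$exposure),
                      alpha = 1, nfolds = 10)      # lasso: alpha = 1, w_j = 1
coef(cv_lasso, s = "lambda.min")   # nonzero coefficients = selected DVs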
However, Meinshausen and Bühlmann (2006) showed the conflict between optimal prediction and consistent variable selection in lasso regression. Moreover, whether lasso regression has an oracle procedure is debatable. An estimating procedure is an oracle if it can identify the right subset of variables and has an optimal estimation rate, so that estimates are unbiased and asymptotically normal. Städler et al. (2010) also raised these issues and addressed some bias problems of the (one-stage) lasso, which may shrink important variables too strongly. Zou (2006) introduced the two-stage adaptive lasso as a modification of lasso in which each coefficient $\beta_j$ is given its own weight $w_j$ to control the rate at which it is shrunk towards 0.
Adaptive lasso deals with three issues, namely, inconsistent selection of coefficients, lack of the oracle property, and unstable parameter estimation when working with high-dimensional data. As smaller coefficients $\beta_{j,\mathrm{NLL}}$ in (8) leave the model faster than larger coefficients, Zou (2006) suggested the weights $w_j = |\hat{\beta}_{j,R}|^{-\gamma}$ in (7), where the tuning parameter $\gamma > 0$ ensures that the adaptive lasso has oracle properties and $\hat{\beta}_{j,R}$ is an initial estimate from ridge regression. The weights are rescaled so that their sum equals the number of DVs. Städler et al. (2010) suggested the tuning parameter $\gamma = 1$ for a low-claim threshold and $\gamma = 2$ for a high-claim threshold. We adopted $\gamma = 2$ as the best tuning parameter to estimate the weights $w_j$ in the subsequent adaptive lasso models.
Zou and Zhang (2009) argued that the L1 penalty can perform poorly under multicollinearity, which is common in high-dimensional data and severely degrades the performance of lasso. They proposed the elastic net, which takes a weighted average of two penalties: ridge ($\alpha = 0$) and lasso ($\alpha = 1$). The mixing parameter $\alpha \in (0, 1)$ in (7) balances the two penalties, with $\alpha > 1/3$ indicating a heavier lasso penalty.
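A sketch of the adaptive elastic net in (7) under the same glmnet setup: ridge estimates supply the adaptive weights $w_j = |\hat{\beta}_{j,R}|^{-\gamma}$ with $\gamma = 2$, passed via penalty.factor, and the mixing value $\alpha = 0.775$ is chosen purely for illustration.

# Initial ridge estimates for the adaptive weights
ridge <- cv.glmnet(X, dat$claims, family = "poisson",
                   offset = log(dat$exposure), alpha = 0)
b_ridge <- as.vector(coef(ridge, s = "lambda.min"))[-1]   # drop intercept
w <- abs(b_ridge)^(-2)              # gamma = 2; zero estimates would get
w <- w * length(w) / sum(w)         # infinite penalty; rescale to sum to J
fit_aen <- cv.glmnet(X, dat$claims, family = "poisson",
                     offset = log(dat$exposure),
                     alpha = 0.775, penalty.factor = w)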
When regularisation is applied to the ZIP and PM models, the penalised log likelihood in (7) is extended to
$$\mathrm{LOSS}_{\lambda,\alpha,w}(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2) = -\sum_{i=1}^{N} \log f\big(y_i; \pi, \mu_{i1}(\boldsymbol{\beta}_1), \mu_{i2}(\boldsymbol{\beta}_2)\big) + \lambda_1 \left[ \frac{1-\alpha}{2} \sum_{j=1}^{J} \beta_{j1}^2 + \alpha \sum_{j=1}^{J} w_{j1} |\beta_{j1}| \right] + \lambda_2 \left[ \frac{1-\alpha}{2} \sum_{j=1}^{J} \beta_{j2}^2 + \alpha \sum_{j=1}^{J} w_{j2} |\beta_{j2}| \right], \tag{9}$$
where $f\big(y_i; \pi, \mu_{i1}(\boldsymbol{\beta}_1), \mu_{i2}(\boldsymbol{\beta}_2)\big)$ is given by (2).
The optimal $\alpha$ is identified by a grid search, choosing the value with the lowest mean square error (MSE) or root MSE (RMSE) or the best R-squared. We searched for $\alpha$ in (7) and (9) for the different models summarised in Table 1. For example, model TPAL-2 refers to a stage 2 TP model where adaptive lasso regularisation is applied in stage 1 and lasso in stage 2. We considered different stage 1 TP models; stage 2 TP models (under TPL-1 and TPA-1) with thresholds $\tau$ to split the predicted annual claim frequencies $a_i = y_i/n_i$ into low and high groups; PM models; and ZIP models. We ran each model over five $\alpha$ values (0.100, 0.325, 0.550, 0.775, 1.000) and identified the best $\alpha$, which gives the lowest RMSE under $K = 10$ fold CV. To ensure the search is robust, results were repeated $R = 100$ times for each model based on $R = 100$ 70% subsamples $S_{1:R}$. Results show that a low $\alpha = 0.1$ should be adopted for stage 1 TP models, the low group of most stage 2 TP models, the PME model, and the PMN model, whereas a higher $\alpha = 0.775$ or $1$ should be adopted for the high group of stage 2 TP models. See point 1 in Appendix B for the implementation of all lasso regularisation procedures under Poisson regression and point 2 in Appendix B for the implementation details using the caret package in R.
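The $\alpha$ grid search with repeated 70% subsampling can be sketched as a plain R loop (the number of repeats is reduced to 10 here for brevity; the paper uses 100). The RMSE from each CV fit is averaged over repeats, and the lowest-RMSE $\alpha$ is retained.

set.seed(2)
alphas <- c(0.100, 0.325, 0.550, 0.775, 1.000)
R <- 10
rmse <- matrix(NA, R, length(alphas))
for (r in seq_len(R)) {
  idx <- sample(nrow(X), size = floor(0.7 * nrow(X)))   # 70% subsample S_r
  for (a in seq_along(alphas)) {
    cv <- cv.glmnet(X[idx, ], dat$claims[idx], family = "poisson",
                    offset = log(dat$exposure[idx]),
                    alpha = alphas[a], nfolds = 10, type.measure = "mse")
    rmse[r, a] <- sqrt(min(cv$cvm))    # best CV RMSE at lambda_min
  }
}
alphas[which.min(colMeans(rmse))]      # alpha with the lowest average RMSE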

2.3. Model Performance Measures

Model performance can be evaluated from different aspects depending on the aims and model assumptions. The goodness of model fit, prediction accuracy, and classification of drivers are the main types of criteria that are linked to different metrics.
Firstly, the Bayesian information criterion (BIC) is a popular model fit measure that contains a deviance and a parameter penalty term using the log of sample size as the model complexity penalty weight. The Akaike information criterion (AIC) can also be used when the parameter penalty term uses 2 as the weight. Deviance (without parameter penalty) is also used by some packages to select models.
Secondly, for prediction accuracy, we adopted the popular mean square error $\mathrm{MSE} = \sum_{i=1}^{N} (y_i - \hat{\mu}_i)^2 / N$ and mean absolute error $\mathrm{MAE} = \sum_{i=1}^{N} |y_i - \hat{\mu}_i| / N$. The third measure we considered is the correlation $\rho$ between observed and predicted annual claim frequencies (instead of the claim frequencies used in the MSE). A higher correlation shows better performance.
Lastly, to quantify classification performance, the difference between observed and predicted group memberships should be quantified. In machine learning, the AUC of the ROC curve (Fawcett 2006) is a measure of model classification power. It constructs confusion matrices conditional on the cutoff of a classifier (e.g., $a_i$ in (1) for TP-2 and $z_{i1}$ in (3) for PM), calculates the true positive rate (TPR, sensitivity) and the false positive rate (FPR, 1 − specificity), and plots the TPR against the FPR as the discrimination cutoff varies to obtain the ROC curve. The AUC is the probability that a randomly chosen member of the positive class has a lower estimated probability of belonging to the negative class than a randomly chosen member of the negative class. See point 3 in Appendix B for the implementation. For the claim data, we let the binary classes be the low-claim (safe driver) and high-claim (risky driver) groups. However, the group membership of each driver is not observed, so it is approximated using K-means clustering, which minimises the total within-cluster variation using the selected DVs for each model. These four types of measures, namely BIC, MSE, $\rho$, and AUC, assessing different performance perspectives, were applied to assess the performance of a set of models $\mathcal{M}$.
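A short R sketch of the AUC computation with the pROC package; proxy_group is a hypothetical 0/1 vector standing in for the K-means cluster labels, and the classifier score is the predicted annual claim frequency from the earlier Poisson fit.

library(pROC)
proxy_group <- rbinom(nrow(dat), 1, 0.3)    # stand-in cluster labels
score <- predict(fit_pois, type = "response") / dat$exposure   # a_hat_i
roc_obj <- roc(proxy_group, as.vector(score))
auc(roc_obj)     # area under the ROC curve
plot(roc_obj)    # TPR against FPR as the cutoff varies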
Although these four measures are popular in statistical and machine learning models, they are not specifically built for count models. Czado et al. (2009) and Verbelen et al. (2018) proposed six scores for claim count models based on the idea of the probability integral transform (PIT) or, equivalently, the predictive CDF. The six scores are defined as
$$\begin{aligned}
\text{Logarithmic:} \quad & \mathrm{Log}(F, y) = -\log(f_y) \\
\text{Quadratic (Quad):} \quad & \mathrm{Quad}(F, y) = -2 f_y + \|f\|^2 \\
\text{Spherical:} \quad & \mathrm{Spher}(F, y) = -f_y / \|f\| \\
\text{Ranked Probability:} \quad & \mathrm{RankProb}(F, y) = \sum_{k=0}^{\infty} \big[F_k - \mathbf{1}(y \leq k)\big]^2 \\
\text{Dawid--Sebastiani:} \quad & \mathrm{Dawid}(F, y) = \left(\frac{y - \mu_F}{\sigma_F}\right)^2 + 2 \log(\sigma_F) \\
\text{Squared error:} \quad & \mathrm{SqErr}(F, y) = (y - \mu_F)^2
\end{aligned}$$
where $Y \sim$ Poisson, $f_y = \Pr(Y = y)$, $F_y = \Pr(Y \leq y)$, $\mu_F = \mathrm{E}(Y)$, $\sigma_F^2 = \mathrm{Var}(Y)$, and $\|f\| = \sqrt{\sum_{k=0}^{\infty} f_k^2}$. For PM models, $f_y = \pi_1 f_{y1} + (1 - \pi_1) f_{y2}$, $F_y$ and $\mu_F$ are similarly defined, and $\sigma_F^2 = \sum_{k=0}^{\infty} (k - \mu_F)^2 f_k$. To accommodate the effect of driver classification, the prior probability $\pi_g$ is replaced by the posterior probabilities $z_{ig}$ in (3). These scores are averaged over drivers, and lower scores indicate better predictions. The Logarithmic score is the common NLL, which is a model-fit measure. The Quadratic and Spherical scores are similar to the Logarithmic score in assessing model fit using different functional forms. The Dawid–Sebastiani and Squared error (MSE) scores measure prediction accuracy. For the Dawid–Sebastiani score, the term $2\log(\sigma_F)$ adjusts for the fact that the first term decays to zero as $\sigma_F$ tends to infinity. The Ranked Probability score calculates a sum of squares to summarise the PP plot, which plots the fitted cumulative probability $F_k$ against the observed proportion $\sum_{i=1}^{N} \mathbf{1}(y_i \leq k)/N$. These six measures are used to select the final model, enriching the versatility of our model selection criteria.
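The six scores can be computed directly for a Poisson predictive distribution, as in this hedged R sketch that truncates the infinite sums at K terms; lower average scores indicate better predictions.

count_scores <- function(y, mu, K = 50) {
  k <- 0:K
  t(vapply(seq_along(y), function(i) {
    f  <- dpois(k, mu[i])               # predictive pmf f_k
    Fk <- ppois(k, mu[i])               # predictive cdf F_k
    fy <- dpois(y[i], mu[i])
    s  <- sqrt(mu[i])                   # sigma_F for the Poisson case
    c(Log      = -log(fy),
      Quad     = -2 * fy + sum(f^2),
      Spher    = -fy / sqrt(sum(f^2)),
      RankProb = sum((Fk - (y[i] <= k))^2),
      Dawid    = ((y[i] - mu[i]) / s)^2 + 2 * log(s),
      SqErr    = (y[i] - mu[i])^2), numeric(6))
  }))
}
colMeans(count_scores(dat$claims, fitted(fit_pois)))   # averaged over drivers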
To facilitate model selection, we ranked each model $M$ in the model class $\mathcal{M}$ in descending order of preference for each performance measure $m_{M,1:4}$ for (BIC, MSE, $\rho$, AUC) and $m_{M,1:6}$ for (Log, Quad, Spher, RankProb, Dawid, SqErr) to obtain the ranks $R_{M,l} = \mathrm{rank}_{M \in \mathcal{M}}(m_{M,l})$ and the sum of ranks
$$R_M = \sum_{l=1}^{l^*} R_{M,l}, \qquad l^* = 4, 6, \tag{10}$$
to reflect the performance of each model M .

3. Empirical Studies

3.1. Data Description

The dataset originates from cars driven in the US in which special UBI sensors were installed. The University of Haifa Actuarial Research Center provided the data, on which UBI modelling was analysed (Chan et al. 2022). It contains two column vectors: claim frequencies $\mathbf{y} = y_{1:N}$ and policy duration or exposure $\mathbf{n} = n_{1:N}$ in years. Ninety-two percent of $\mathbf{y}$ are zero. Figure 1 displays three histograms for $\mathbf{y}$, $\mathbf{n}$, and the annual claim frequencies $\mathbf{a}$ ($a_i = y_i / n_i$), respectively. The dataset also contains $J_0 = 65$ numerical DVs constructed from the information collected via telematics and GPS for $N = 14{,}157$ drivers. Figure A2 in Appendix D.1 visualises the DVs by plotting $x_{ij}$ across drivers $i$, with colours indicating the number of claims $y_i = 0, 1, 2+$. We remark that the DVs are labelled up to 77 with some numbers skipped; for example, DVs 6, 11, 12, etc. do not appear in Figure A2. Each DV, describing a specific event (details in the next section), has been aggregated over time to obtain an incidence rate (per km or hour of driving) and scaled to normalise its range for better interpretability of its coefficient in the predictive models. These procedures transformed the multidimensional longitudinal DVs into a single row for each driver, which is the unit of analysis. All DVs are presented as column vectors $\mathbf{x}_j$, $j = 1, \dots, J_0$.

3.2. Data Cleaning and DVs Setting

Telematics sensors installed by car manufacturers provide much cleaner signals. Standard data cleaning techniques, including the removal of outliers, were then applied. External environmental information from GPS was utilised to minimise false signals, recognising that driving behaviours are often responsive to varying conditions. The DVs can then be defined to indicate specific driving events, which can be associated with certain driving risks. Context matters, however: while rapid acceleration is typically undesirable, it may be necessary when merging onto a busy highway. To accurately process and analyse telematics and GIS data, roads were categorised into specific types such as highways, junctions, roundabouts, and others. This segmentation enables a more precise assessment of driving behaviours across different contexts, improving safety measures and performance evaluations. Given the complexity of telematics data, including metrics like acceleration and braking, the definitions of events like rapid acceleration or hard braking were adapted to account for varying road conditions and times. Hence, the DVs were defined for a range of driving events as combinations of event types (e.g., accelerating, braking, left/right turning), environmental conditions (e.g., interchange, junction), and time (e.g., the morning rush from 6 am to 9 am). Then, rates of the events (over a standardised period or mileage) were evaluated and normalised. Appendix C provides the labels and interpretation of these DVs.

3.3. Exploratory Data Analyses

To summarise the variables, their averages are $\bar{y} = 0.083$, $\bar{n} = 1.146$, and $\bar{a} = 0.075$. We split the drivers into three classes $C_b$, $b = 0, 1, 2+$, with 0, 1, and at least 2 claims, respectively, and class sizes $N_b$. Their proportions $p_b = N_b / N$ came out to (0.92, 0.07, 0.005), averaged exposures $\bar{n}_b = \sum_{i \in C_b} n_i / N_b$ to (1.13, 1.38, 1.64), and averaged annual claim frequencies $\bar{a}_b = \sum_{i \in C_b} a_i / N_b$ to (0, 0.92, 1.71). The average claim frequency for $C_{2+}$ was 2.11. Regressing the claim frequencies $y_i$ on the exposures $n_i$ gave an $R^2$ of only 0.014, showing that the linear effect of exposure on claim frequency is weak and insignificant. Hence, it is possible that other effects, such as driving behaviour as measured by the DVs $\mathbf{x}_j$, may impact the claim frequency $y_i$. Section 3.4, Section 3.5 and Section 3.6 analyse such effects of the DVs on claim frequencies.
Section 2.1.1 introduced Poisson and NB regression for equidispersed and overdispersed data, respectively. To assess the level of dispersion, we used the sample variance $\widehat{\mathrm{Var}}(\mathbf{y}) = 0.089$, which, being close to the mean $\bar{y} = 0.083$, suggests equidispersion, possibly due to the large proportion of zeros. We also tested the equidispersion assumption with the null hypothesis $H_0: \mathrm{Var}(Y_i) = \mu_i$ against the alternative $H_1: \mathrm{Var}(Y_i) = \mu_i + \psi g(\mu_i)$, where $g(\cdot) > 0$ is a transformation function (Cameron and Trivedi 1990) and $\psi > 0$ ($\psi < 0$) indicates overdispersion (underdispersion). See point 4 in Appendix B for the implementation. For model TPL-1, $\psi = 0.0369$ ($p = 0.0482$), and for model TPA-1, $\psi = 0.0369$ ($p = 0.0477$), which are marginally significant outcomes. Moreover, the TP, PM, and ZIP models can capture some overdispersion through their threshold splits and mixture components. Hence, we focused on Poisson regression for all subsequent analyses.
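The Cameron–Trivedi test is available in the AER package. A minimal sketch applied to the earlier hypothetical Poisson fit, where trafo = 1 corresponds to the linear specification $\mathrm{Var}(Y_i) = \mu_i + \psi \mu_i$:

library(AER)
dispersiontest(fit_pois, trafo = 1)   # psi > 0 would indicate overdispersion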
Moreover, noninformative DVs can lead to unstable models. In Figure A2, seven DVs (1, 2, 7, 8, 10, 14, and 28) are sparse, with at most 13 nonzero values each. Hence, we explored the information content of each DV. Firstly, the nonsparsity $S_j$, defined as the proportion of nonzero data for each DV, is reported. A refined measure is Shannon's entropy (Shannon 2001) $H_j$, which measures the degree of disorder/information of each DV. While $H_j$ provides no information about the relationship with $\mathbf{y}$, the information gain $IG_j$ evaluates the additional information that the $j$th DV provides about the claims $\mathbf{y}$ with respect to the three classes $C_b$, $b = 0, 1, 2+$.
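The entropy and information gain measures can be sketched in R as follows, assuming each DV is first discretised into bins; cls holds the claim classes $C_b$, and the data again reuse the hypothetical dat.

entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }

info_gain <- function(x, cls, bins = 10) {
  b <- cut(x, breaks = bins)                     # discretise the DV
  grp <- split(cls, b, drop = TRUE)
  w <- sapply(grp, length) / length(cls)         # bin proportions
  H_cls  <- entropy(prop.table(table(cls)))      # class entropy
  H_cond <- sum(w * sapply(grp, function(s) entropy(prop.table(table(s)))))
  H_cls - H_cond                                 # IG_j > 0: DV informative
}

cls <- cut(dat$claims, c(-Inf, 0, 1, Inf), labels = c("0", "1", "2+"))
info_gain(dat$dv1, cls)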
Apart from the information content of the DVs, the multicollinearity between DVs also affects the stability of a regression model. Figure A3a in Appendix D.2 plots the correlation matrix $\mathrm{Corr}(\mathbf{x}_{1:J}, \mathbf{y}, \mathbf{a})$, where the correlations of the DVs with $\mathbf{y}$ are denoted by $\rho_j = \mathrm{Corr}(\mathbf{x}_j, \mathbf{y})$. The correlation matrix shows that the DVs up to 16 (except 9) are nearly uncorrelated with each other, those up to DV 39 are mildly correlated, and the rest are moderately correlated, reflecting some pattern in these DVs. However, they are only weakly correlated with $\mathbf{y}$ and $\mathbf{a}$, showing the low signal content of each DV in predicting $\mathbf{y}$ and $\mathbf{a}$.
Table 2 reports $\rho_j$, $H_j$, $S_j$, and $IG_j$ to quantify the information content of the 65 DVs, flags a DV as low-information when $H_j < 1$ and $S_j < 1\%$, and highlights $IG_j$ in boldface when $IG_j > 0$ indicates an information gain. Asterisks are added to the DVs' IDs to indicate these two levels of information content (the flag and the boldface, respectively). Twenty DVs flagged as low-information were classified as having low information content. Including them in the more complicated PM and ZIP models led to unstable results. Thus, we dropped them and considered $J_1 = 65 - 20 = 45$ DVs in the PM and ZIP models, but we considered all $J_0 = 65$ DVs in the TP models. All the DVs were normalised before the analyses to ensure efficient modelling.
Figure A3 in Appendix D.2 plots the Euclidean distance $d(j, j') = \sqrt{\sum_{i=1}^{N} (x_{ij} - x_{ij'})^2}$ between each pair $(j, j')$ of DVs and demonstrates the hierarchical clustering based on $d(j, j')$. The results show one major cluster of size 54 and two smaller clusters ($49^{**}$, $36^*$, $43^{**}$, $72^*$, $56^*$, $63^*$) and ($55^*$, $59^*$, $71^*$, $27^{**}$, $58^*$), with increasing pairwise distance from the major cluster. All sparse DVs labelled as noninformative are in the major cluster. These DV features guided our interpretation of the selected DVs in the subsequent analyses. Refer to Table A1 and Appendix C for the interpretation of these DVs.
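For reference, the pairwise DV distances and the hierarchical clustering can be reproduced in R in a few lines, assuming X is the N × J matrix of normalised DVs:

d_dv <- dist(t(X))      # Euclidean d(j, j') between DV columns
hc <- hclust(d_dv)      # hierarchical clustering of the DVs
plot(hc)                # dendrogram of DV clusters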

3.4. Two-Stage Threshold Poisson Model

The TP model fits the Poisson regression of Section 2.1.1 twice, at stages 1 and 2, with the aim of classifying drivers into safe and risky groups at stage 1 and determining predictive DVs for each group at stage 2 using a single-component Poisson model. The DVs for the TP models were selected from $J = J_0 = 65$ DVs.
At stage 1, lasso-regularised Poisson regression models were trained through resampling and applied to predict claim frequencies for all drivers. To ensure model robustness and reduce overfitting, we repeated the regularised Poisson regression $R = 100$ times with 70% subsamples of size $N_r = 9910$ each and selected DVs for each repeat. For each repeat $r$, the optimal $\lambda_{\min,r}$ was selected with $K = 10$ (default setting) fold CV. Then, the DVs most frequently selected were identified using a weighted count $I_j$ in (A3) based on the RMSE. There were $J^{T1} = 52$ selected DVs for the TPL-1 model and $J^{T1} = 39$ DVs for the TPA-1 model. The details are provided in Appendix A. Then, Poisson regression models with the selected DVs were refitted to all drivers. Parameter estimates $\boldsymbol{\beta}^{T1}$ are reported in Table A2 in Appendix E under $\beta_j^{T1}$. To visualise these coefficients, Figure 2a plots the heat map of $\boldsymbol{\beta}^{T1}$ for all TP-1 models. Let $J_S^{T1}$ be the number of significant $\beta_j^{T1}$ with a $p$ value < 0.05. For the TPA-1 model, Table A2 shows that $J_S^{T1} = 13$ out of the $J^{T1} = 44$ selected DVs are significant. Table 3a shows that TPL-1 and TPE-1 selected a similar number $J^{T1}$ of DVs. The same applies to models TPA-1 and TPN-1 with adaptive weights. As feature selection is important, we selected the best model in each group, and they are highlighted in Table 3a.
Table 3. Model performance measures for stage 1 TP and ZIP models, stage 2 TP and PM models, and final selection.
At stage 2, predicted claims $\hat{y}_i = \hat{\mu}_i$ were calculated using the fitted means in (1) and $\boldsymbol{\beta}^{T1}$. Then, the predicted annual claim frequencies $\hat{a}_i = \hat{y}_i / n_i$ were calculated, and drivers were classified into low- and high-claim groups according to $\hat{a}_i < \tau_h$ and $\hat{a}_i \geq \tau_h$, respectively. We considered four thresholds $\boldsymbol{\tau} = \{\tau_h\} = (0.08, 0.09, 0.10, 0.11)$, and the proportion $P_h$ of drivers classified into the low-claim group out of all drivers was (0.70, 0.79, 0.85, 0.90), respectively. Figure 3 shows how the drivers were classified into low- and high-claim groups according to the four thresholds $\boldsymbol{\tau}$ using $\hat{a}_i$ from TPA-1 and visualises the relationship between the observed $a_i$ and predicted $\hat{a}_i$. We attribute the nonlinear pattern in these scatter plots partially to the impact of driving behaviour revealed by the DVs.
To improve model robustness and reduce overfitting, subsamples $S_{1:R}$ of size $N_r = 0.7N$ were drawn again, and each $S_r$ was further split into two groups
$$G_{L,rh}^{T2} = \{y_i \in S_r : \hat{a}_i < \tau_h\} \quad \text{and} \quad G_{H,rh}^{T2} = \{y_i \in S_r : \hat{a}_i \geq \tau_h\}, \qquad h = 1, \dots, 4,$$
where T2 indicates stage 2 of the TP model. Then, regularised Poisson regression was applied to each $G_{L,rh}^{T2}$ and $G_{H,rh}^{T2}$. Let the index sets $I_{Lh}^{\beta}$ and $I_{Hh}^{\beta}$ for nonzero coefficients (that is, those selected at least once from $S_r$) be defined similarly to $I^{\beta}$ in (A2) for $h = 1, \dots, 4$. Then, $\boldsymbol{\beta}_{Lh} = (\beta_{Lh,j}, j \in I_{Lh}^{\beta})$ and $\boldsymbol{\beta}_{Hh} = (\beta_{Hh,j}, j \in I_{Hh}^{\beta})$ are the averaged parameter estimates for the low- and high-claim groups, respectively, obtained in a similar manner to $\boldsymbol{\beta}$ defined in (A1), and $I_{Lh}^{T2}$ and $I_{Hh}^{T2}$ are importance measures based on $\mathrm{RMSE}_{r,Lh}$ and $\mathrm{RMSE}_{r,Hh}$, defined similarly to $I$ in (A3). $J_{Lh}^{T2}$ and $J_{Hh}^{T2}$ are the numbers of frequently selected DVs out of $\boldsymbol{\beta}_{Lh}$ and $\boldsymbol{\beta}_{Hh}$, with $I_{Lh,j} > 43$ ($\approx 62 \times 0.70$, where $R_1 = 0.70$ for $\tau_1 = 0.08$ and $\max(I_j, j \in I^{\beta}) = 62$ is the lower threshold of $I_j$), $49$ ($\tau_2 = 0.09$), $53$ ($\tau_3 = 0.10$), and $56$ ($\tau_4 = 0.11$) for the low-claim group, and $I_{Hh,j} > 19$ ($\approx 62 \times 0.30$), $13$, $9$, and $6$, respectively, for the high-claim group (DVs below these counts were dropped). Poisson regression models with the various selected DVs were refitted to the low- and high-claim groups for each $h$. Table A3 in Appendix E reports the parameter estimates $\boldsymbol{\beta}_{Lh}^{T2}$, $\boldsymbol{\beta}_{Hh}^{T2}$ of the best model (TPLA-2 for $\tau = 0.08, 0.09$ and TPLN-2 for $\tau = 0.10, 0.11$) when the stage 1 model was TPL-1 (from $J^{T1} = 52$ selected DVs) or TPA-1 ($J^{T1} = 39$). Table 3b reports the numbers $J_{Lh}^{T2}$, $J_{Hh}^{T2}$ of selected $\beta_{Lh,j}^{T2}$ and $\beta_{Hh,j}^{T2}$ and the numbers $J_{LSh}^{T2}$, $J_{HSh}^{T2}$ of significant $\beta_{Lh,j}^{T2}$ and $\beta_{Hh,j}^{T2}$ with $p$ values < 0.05.
To visualise these coefficients, Figure 2b presents the heat map for the models in the low- and high-claim groups. It shows that the DVs mostly selected and significant in the low-claim groups are 4, $9^*$, $18^{**}$, 24, 32, $37^*$, $47^*$, 57, $67^{**}$, $73^*$, and 74, while the least selected are 2, 7, 15, 16, $23^*$, $46^*$, and 53. For the high-claim group, DVs 4, $29^*$, $52^*$, and $67^{**}$ were mostly selected. The information content of each DV is indicated by the asterisks. See Table 2 for details and Table A1 for the interpretation of these DVs. We observed that more DVs were selected for the low-claim group with thresholds $\tau_{2:4} = 0.09, 0.10, 0.11$, and the selected DVs are relatively more informative. Two DVs, 4 and 67, were selected in both the low- and high-claim groups but with differential effects: DV 4 was negative for the low-claim group and positive for the high-claim group, whereas DV 67 had a consistent negative effect in both groups.
For model selection, Table 3b summarises the model performance measures BIC, MSE, $\rho$, and AUC (see Section 2.3) using all data for 32 models, with 8 models under each threshold. The two criteria BIC and AUC in Table 3b were averaged over the two groups using the ratio $R_h$ in Table A3. For each threshold, the top-ranked measures and the sum of ranks $R_M$ in (10) are boldfaced and highlighted in yellow. We first dropped those models with $J_h^{T2}, J_{Sh}^{T2} = 0$ for either group and chose the best model $M$ with the top $R_M$. The best stage 2 model is TPLA-2 for $\tau = 0.08, 0.09$ and TPLN-2 for $\tau = 0.10, 0.11$. These results will be compared with the PM and ZIP models in Section 3.7.

3.5. Poisson Mixture Model

To facilitate driver classification, we considered lasso-regularised PM models. To robustify our results, we again performed 70% resampling to obtain $R = 100$ subsamples of size $N_r = 9910$. The parameters were selected from the $J = J_1 = 45$ more informative DVs to provide stable results (see Section 3.3). In each subsample $S_r$, the regularised PM model was estimated using $K = 10$ fold CV. Then, the drivers were classified into $G_{L,r}^{M}$ and $G_{H,r}^{M}$ according to $\hat{z}_{ig} \geq 0.5$ or $< 0.5$, respectively, where $\hat{z}_{ig}$ was defined in (3). Let the index sets $I_L^{\beta}$ and $I_H^{\beta}$ for nonzero coefficients (that is, those selected at least once from $S_r$) be defined similarly to $I^{\beta}$ in (A2). Then, $\boldsymbol{\beta}_L = (\beta_{L,j}, j \in I_L^{\beta})$ and $\boldsymbol{\beta}_H = (\beta_{H,j}, j \in I_H^{\beta})$ are the averaged parameter estimates for the low- and high-claim groups, respectively, obtained in a similar manner to $\boldsymbol{\beta}$ defined in (A1); $I_L^{M}$ and $I_H^{M}$ are importance measures based on $\mathrm{RMSE}_{r,L}^{M}$ and $\mathrm{RMSE}_{r,H}^{M}$, defined similarly to $I$ in (A3); $R^{M}$ is the average ratio of the low-group size over the $R = 100$ subsamples; and $J_L^{M}$ and $J_H^{M}$ are the numbers of selected DVs out of $\boldsymbol{\beta}_L$ and $\boldsymbol{\beta}_H$, with $I_{L,j}^{M} > 43$ ($\approx 62 \times 0.69$, with $R^{M} = 0.69$ for PML) or $45$ ($\approx 62 \times 0.73$ for PMA) and $I_{H,j}^{M} > 19$ ($\approx 62 \times 0.31$ for PML) or $17$ ($\approx 62 \times 0.27$ for PMA), similarly to $J^{T1}$. We note that some subsamples had too small a sample size (<200) for the high-claim group, or too small a difference (<0.005) in observed annual claim frequencies between the two groups, or both. Both criteria indicate ineffective grouping, so such subsamples were eliminated. Consequently, 172 and 113 subsamples were drawn for the PML and PMA models, respectively, in order to collect 100 effective subsamples.
To obtain the overall parameter estimates $\boldsymbol{\beta}_L^{M}$ and $\boldsymbol{\beta}_H^{M}$, the selected DVs were refitted to the PM model again. Table A4 in Appendix E reports $\boldsymbol{\beta}_L^{M}$ and $\boldsymbol{\beta}_H^{M}$ of the PML and PMA models, together with $I_L^{M}$ and $I_H^{M}$. Table 3b reports the numbers of DVs selected, $J_L^{M}$ and $J_H^{M}$ ($\beta_{L,j}^{M}, \beta_{H,j}^{M} \neq 0$ and $I_{L,j}^{M}, I_{H,j}^{M} > 62$), and the numbers of significant selected DVs, $J_{LS}^{M}$ and $J_{HS}^{M}$, for each PM model. We note that the PM models generally had more selected and significant DVs for the high-claim group than the TP models. Figure 2c plots the heat map of the parameter estimates of the two groups for the four PM models. Across all four models, the two sets of mostly selected and significant variables are ($18^{**}$, $20^*$, $26^*$, $29^*$, $37^*$, $45^*$, $58^*$, $61^*$, $67^{**}$, 75) and ($18^{**}$, $29^*$, $31^*$, $36^*$, $59^*$, $67^{**}$, $73^*$, $75^*$, $77^*$) for the low- and high-claim groups, respectively. These two sets of significant DVs are quite different from those of the TP models, as only two DVs from each group (in boldface) were also selected by the TP models. Again, DV 67 was selected in both groups, as in the case of the TP models. To select the best PM model, Table 3b reports the performance measures BIC, MSE, $\rho$, and AUC. According to the sum of ranks $R_M$ in (10), the PMA model was selected. For the selected PMA model, Table A4 shows that $J_L^{M} = 39$ DVs were selected for the low-claim group and $J_H^{M} = 40$ DVs for the high-claim group, of which six (18, 20, 45, 58, 61, 75) and five (29, 36, 59, 67, 73) DVs, respectively, are significant. See point 5 in Appendix B for the implementation of the PM models, Appendix C for the interpretation of the significant selected DVs, and Section 3.7 for the implications of these DVs for risky driving.

3.6. Zero-Inflated Poisson Model

Since 92% of the claims are zero, we applied the ZIP model in (5) and (6) (Lambert 1992; Zeileis et al. 2008) to capture the structural zero portion of the claims and to test whether a structural zero claim group should be included in modelling claims. As with the TP and PM models, we drew subsamples $S_{1:100}$, each with $N_r = 9910$ drivers, and lasso-regularised ZIP regression was applied to each $S_r$ to robustify the selection of DVs. The procedures were similar to those for the TP and PM models. As with the PM model, the DVs were selected from the $J = J_1 = 45$ more informative DVs to provide stable results.
Let $I_0^{\beta}$ and $I_c^{\beta}$ be the index sets of nonzero parameter estimates (that is, those selected at least once from $S_r$) in the zero and count models, respectively, defined in a similar manner to $I^{\beta}$ in (A2). Then, the averaged parameter estimates $\boldsymbol{\beta}_0 = (\beta_{0,j}, j \in I_0^{\beta})$ for the zero model and $\boldsymbol{\beta}_c = (\beta_{c,j}, j \in I_c^{\beta})$ for the count model are obtained in a similar manner to $\boldsymbol{\beta}$ in (A1); $I_0^{Z}$ and $I_c^{Z}$ are importance measures based on $\mathrm{RMSE}_{0,r}^{Z}$ and $\mathrm{RMSE}_{c,r}^{Z}$, respectively, defined similarly to $I$ in (A3); and $J_0^{Z}$ and $J_c^{Z}$ are the numbers of selected DVs out of $\boldsymbol{\beta}_0$ and $\boldsymbol{\beta}_c$, with $I_{0,j}^{Z} > 62$ and $I_{c,j}^{Z} > 62$, similarly to $J^{T1}$. Next, the $J_0^{Z}$ and $J_c^{Z}$ selected DVs were refitted to the ZIP model on all data to obtain the overall parameter estimates $\boldsymbol{\beta}_0^{Z}$ and $\boldsymbol{\beta}_c^{Z}$. The parameters $\boldsymbol{\beta}_0$, $\boldsymbol{\beta}_c$ were averaged before the refit, the parameters $\boldsymbol{\beta}_0^{Z}$, $\boldsymbol{\beta}_c^{Z}$ were averaged after the refit, and the importance measures $I_0^{Z}$, $I_c^{Z}$ are reported in Table A4. Table 3a reports the numbers of selected DVs $J_0^{Z}$ and $J_c^{Z}$ and the performance measures AIC, BIC, and MSE for the ZIPL and ZIPA models, following the regularisation choices of the two chosen TPL-1 and TPA-1 models. Between the two ZIP models, the ZIPA model was chosen because it selected nonzero DVs for the zero component and had a lower BIC. However, its MSE was the highest among all the models shown in Table 3b, indicating low predictive power. More importantly, the zero model estimates the probability of a structural zero among all zeros, but it does not guide the classification of safe and risky drivers, because safe drivers may claim less but not necessarily nothing, and risky drivers may claim nothing by luck or for a no-claim bonus. Hence, drivers classified into the structural zero claim group are not necessarily safe drivers. As a result, the ZIPA model was excluded from the model comparison and selection. See point 6 in Appendix B for the implementation of the ZIP models.

3.7. Model Comparison and Selection

We compared the performance of the TP and PM models in terms of claim prediction, predictive DV selection, and driver classification. Table 3b displays the performance of all 32 TP models and four PM models. The best TP model for each threshold and the best PM model were selected according to R^M in (10) using four measures. Among the selected TPLA-2 (τ = 0.08, 0.09), TPLN-2 (τ = 0.10, 0.11), and PMA models, Table 3c shows that the PMA model emerged from the final selection using six count model scores, confirming the superiority of the PM model in many respects and its preferability over the Poisson model. Interpreting the significant selected DVs (29, 36, 59, 67, 73; all with positive coefficients except 59) of the PMA model, risky driving is associated with more frequent severe braking to slow down on weekday and weekend nights, as well as more frequent severe right turns at junctions on weekday nights and during the Friday rush.
Apart from the numerical measures, we also visualised performance using ROC curves. Section 2.3 introduces the AUC according to three classes of drivers with y_i = 0, 1, ≥ 2 or an overall class of all drivers. For each of these classes, the ROC curve was drawn for the binary classifier of low- and high-claim groups, comparing the predicted group with the proxy observed group of each driver estimated by K-means clustering, using the J^{T1} = 52 selected DVs (from the TPL-1 model) for all stage 2 TP models and the J_1 = 45 informative DVs for the PM model. Figure 4 plots the two clusters using the first two principal components (PCs), which explain 22.11% and 17.44% of the variance, respectively, using the selected DVs for the two cases. The results show that the first PC separates the two clusters well in both cases. For the stage 2 TP (PM) model, N_1 = 9450 (9372) drivers were assigned to the low-claim cluster, accounting for 67% (66%) of drivers using K-means clustering. These two proportions of the low-claim group are close to the estimated proportions P_h = 0.7 to 0.89 in Table A3 using the TPL-1 model, as well as P^M = 0.73 using the PMA model.
Figure 5a–e plots the ROC curves with AUC values using the classifier ŷ_i/n_i for the best stage 2 TP models under each threshold and the classifier ẑ_{i1} in (3) for the best PMA model. The AUCs are calculated for the subgroups y_i = 0 (red), 1 (blue), ≥ 2 (green) in Figure 5a–e and for all the data in Figure 5f. The zigzag patterns of the green lines indicate small sample sizes for drivers with y_i ≥ 2. Table 3b reports the overall AUC values, which are the weighted averages of the three AUC values in each of Figure 5a–e; they show that model TPLN-2 with τ = 0.10 is the best classifier of low- and high-claim groups, while the PMA model ranks third in classifying power. However, the accuracy of these results depends on how well K-means clustering estimates the true latent groups.
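To illustrate how such an overall AUC can be assembled, the sketch below computes the subgroup AUCs with pROC and weights them by subgroup size; the vectors y (observed claims), grp (proxy observed group from K-means), and score (classifier value, e.g., ŷ_i/n_i or ẑ_{i1}) are assumed inputs, and each subgroup is assumed to contain drivers from both groups.

    library(pROC)

    ## Weighted-average AUC over the claim subgroups y = 0, 1, >= 2;
    ## y, grp (0/1), and score are hypothetical vectors, one entry per driver.
    overall_auc <- function(y, grp, score) {
      subgroups <- list(y == 0, y == 1, y >= 2)
      aucs <- sapply(subgroups, function(idx)
        as.numeric(auc(roc(grp[idx], score[idx], quiet = TRUE))))
      wts <- sapply(subgroups, sum) / length(y)   # subgroup proportions
      sum(wts * aucs)                             # weighted-average AUC
    }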
Since zero-claim drivers are not necessarily safe drivers once the DVs are taken into account, we expect some zero-claim drivers to be risky and some non-zero-claim drivers to be safe. This disagreement reflects the impact of the DVs on assessing driving risk beyond the claim information. Zero disagreement (ZD) is defined as the proportion, out of all drivers, of zero-claim drivers classified as risky plus non-zero-claim drivers classified as safe. The ZD is 3.2% for the best PMA model. This small ZD is consistent with the low MSE, showing agreement between the predicted and observed claims.
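Computationally, the ZD is a one-line summary; a minimal sketch, assuming y holds the observed claim counts and risky the model's 0/1 risk classification:

    ## Zero disagreement: zero-claim drivers classified risky plus
    ## non-zero-claim drivers classified safe, as a share of all drivers.
    zd <- mean((y == 0 & risky == 1) | (y > 0 & risky == 0))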

4. UBI Experience Rating Premium

In the context of general insurance, a common approach for assessing risk in a typical short-tail portfolio is to multiply the predicted claim frequency by the claim severity to determine the risk premium. This risk premium is then combined with the profit margin and operating expenses to determine the final premium charged to customers. This paper centres on claim frequency; in the premium calculation discussed herein, we assume that claim severity remains constant. Consequently, the premium calculation relies on predicting claim frequency.
The traditional experience rating method prices premiums using historical claims and offers the same rate to drivers within the same risk group (low/high or safe/risky). If individual claim histories are available, premiums can be calculated using individual claims relative to overall claims, both from historical records. However, although this extended historical experience rating method can capture individual differences of risk within a group, it still fails to reflect drivers' recent driving risk. The integration of telematic data enables us to tailor pricing to current individual risks; we call this enhanced method the UBI experience rating method and leverage it as a strategic refinement of our premium pricing methodology.
Suppose that a new driver i is classified to claim group g with index set C_g of all drivers in this group, i ∈ C_g. Let P_{it} be his premium for year t, L_{i,t−1} his historical claim/loss in year t−1, L_{t−1}^g = Σ_{i∈C_g} L_{i,t−1} the total claim/loss of the claim group g to which driver i is classified, and P_{t−1}^g = Σ_{i∈C_g} P_{i,t−1} the total premium of claim group g. Moreover, suppose that the best PMA model was trained using the sample of drivers. Let x_{i,t} be the observed DVs for driver i at time t; let g = 1 (safe group) be the classified group if the group indicator ẑ_{i1} > 0.5 and g = 2 (risky group) otherwise; let ŷ_{i,t} in (4) be the predicted claim frequency given x_{i,t}; let ŷ_t^g = (Σ_{i∈C_g} ŷ_{i,t})/N_g be the average predicted claim frequency of the claim group g to which driver i is classified; and let N_g be the size of group g.
Using the proposed UBI experience rating method, the premium P_{it} for driver i in year t is given by
$$P_{it,\varkappa} = \left(1 + R_{i,t}^{\Delta}\right) \bar{P}_t^g \, E_{it} \, F + \bar{P}_t^* \, E_{it} \, (1 - F),$$
where P̄_t^g is the group average annual premium in period t from the group data, P̄_t^* is the average annual premium from all data or some other data source, F is the credibility factor (Dean 1997), E_{it} is the exposure of driver i, and R_{i,t}^Δ is the individual adjustment factor to the overall group loss ratio given by
$$R_{i,t}^{\Delta} = R_{i,t-1}^{\Delta,H} + \varkappa \, R_{i,t}^{\Delta,\mathrm{UB}},$$
which is the sum of the historical loss rate change adjustment R_{i,t−1}^{Δ,H} and the weighted UBI predicted loss rate change adjustment R_{i,t}^{Δ,UB}; ϰ ∈ [0, 1] is the UBI policy parameter determining how much UBI adjustment R_{i,t}^{Δ,UB} enters R_{i,t}^Δ when updating the premium to account for current driving behaviour. The historical loss rate change R_{i,t−1}^{Δ,H}, historical individual loss ratio R_{i,t−1}, and historical group loss ratio R_{t−1}^g are, respectively,
$$R_{i,t-1}^{\Delta,H} = \frac{R_{i,t-1} - R_{t-1}^g}{R_{t-1}^g}, \qquad R_{i,t-1} = \frac{L_{i,t-1}}{P_{i,t-1}}, \qquad R_{t-1}^g = \frac{L_{t-1}^g}{P_{t-1}^g}.$$
The UBI predicted loss rate change R_{i,t}^{Δ,UB}, UBI predicted individual loss ratio R_{i,t}^y, and UBI predicted group loss ratio R_t^{y,g} are, respectively,
$$R_{i,t}^{\Delta,\mathrm{UB}} = \frac{R_{i,t}^y - R_t^{y,g}}{R_t^{y,g}}, \qquad R_{i,t}^y = \frac{\hat{y}_{i,t}}{P_{i,t}}, \qquad R_t^{y,g} = \frac{\hat{y}_t^g}{P_t^g}.$$
The credibility factor F is the weight of the best linear combination of the premium estimate (1 + R_{i,t}^Δ) P̄_t^g from the sample data and the premium estimate P̄_t^* from all data or from another source, improving the reliability of the premium estimate P_{it}. The credibility factor increases with the business size and, hence, the number of drivers in the sample. Dean (1997) provided some methods to estimate F and suggested full credibility, F = 1, when the sample size N is large enough, such as above 10,000 in an example. As this requirement is fulfilled for the telematic data with size N = 14,157, and all data are used to estimate the chosen PMA model, full credibility F = 1 was applied. Where fewer insured vehicles are in the sample, the credibility factor F may vary, and external data sources may be used to improve the reliability of the premium estimate. Moreover, as the selected PMA model can classify drivers, the premium calculation can focus on the classified driver group to provide a more precise premium.
We give an example to demonstrate the experience rating method and its extension to UBI. Suppose that driver i is classified as a safe driver (g = 1) in a driving test and wants to buy auto insurance for the next period (E_{it} = 1). As summarised in Table 4, the annual premium is P̄_t^1 = $300 for the safe group and P̄_t^2 = $500 for the risky group. Driver i previously recorded an annual claim frequency of L_{i,t−1} = 0.2 and paid an annual premium of P_{i,t−1} = $500. The safe group recorded an average annual claim frequency of L_{t−1}^1 = 0.1 and paid an average annual premium of P_{t−1}^1 = $310 per driver, while the risky group recorded an average of L_{t−1}^2 = 0.3 claims/loss and paid P_{t−1}^2 = $510 in annual premium per driver. Driver i thus has more claims than the safe-group average. According to these historical claim frequencies, driver i is expected to be relatively riskier than the average safe-group driver, so he should pay more.
To illustrate the UBI experience rating method, additional assumptions about the predicted annual claim frequencies are added in the last row of Table 4. Assume that driver i has a predicted annual claim frequency of ŷ_{i,t−1} = 0.15; the corresponding figure is ŷ_{t−1}^1 = 0.105 for the safe group and ŷ_{t−1}^2 = 0.305 for the risky group. This suggests that driver i now operates his vehicle more safely than his historical claims indicate.
Taking the policy parameter ϰ = 1, the UBI experience rating premium is
$$P_{it,1} = \left(1 + R_{i,t}^{\Delta}\right) \bar{P}_t^1 \, E_{it} \, F = (1 + 0.1260) \times 300 \times 1 \times 1 = \$337.80,$$
where
$$R_{i,t}^{\Delta} = R_{i,t-1}^{\Delta,H} + 1 \times R_{i,t}^{\Delta,\mathrm{UB}} = 0.2403 - 0.1143 = 0.1260,$$
and the historical loss rate change R_{i,t−1}^{Δ,H}, the historical loss ratio R_{i,t−1} for driver i, the historical loss ratio R_{t−1}^1 for the safe group, the UBI predicted loss rate change R_{i,t}^{Δ,UB}, the UBI predicted loss ratio R_{i,t}^y for driver i, and the UBI predicted loss ratio R_t^{y,1} for the safe group are, respectively,
$$R_{i,t-1}^{\Delta,H} = \frac{R_{i,t-1} - R_{t-1}^1}{R_{t-1}^1} = \frac{0.4 - 0.3225}{0.3225} = 0.2403, \qquad R_{i,t-1} = \frac{L_{i,t-1}}{P_{i,t-1}} = \frac{0.2}{0.5} = 0.4, \qquad R_{t-1}^1 = \frac{L_{t-1}^1}{P_{t-1}^1} = \frac{0.1}{0.31} = 0.3225,$$
$$R_{i,t}^{\Delta,\mathrm{UB}} = \frac{R_{i,t}^y - R_t^{y,1}}{R_t^{y,1}} = \frac{0.3 - 0.3387}{0.3387} = -0.1143, \qquad R_{i,t}^y = \frac{\hat{y}_{i,t}}{P_{i,t-1}} = \frac{0.15}{0.5} = 0.3, \qquad R_t^{y,1} = \frac{\hat{y}_t^1}{P_t^1} = \frac{0.105}{0.31} = 0.3387,$$
using (12). Thus, the premium for driver i under the UBI experience rating method is $337.80. This premium is higher than the safe-group premium P̄_t^1 = $300 because driver i's loss ratio is high relative to the overall safe-group ratio based on historical claims. However, his current loss ratio, reflecting current safe driving, reduces the adverse effect of the higher historical claims.
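To make the calculation reproducible, the sketch below implements the premium formula (12) with the adjustment (13); the function name and argument names are ours, not the paper's. Carrying the exact ratios gives $337.71; the $337.80 above reflects rounding the intermediate loss ratios to four decimal places.

    ## UBI experience rating premium: a sketch of (12) and (13).
    ubi_premium <- function(L_prev, P_prev,     # driver's claims and premium, year t-1
                            Lg_prev, Pg_prev,   # group averages, year t-1
                            y_hat, yg_hat, Pg,  # predicted frequencies, group premium
                            P_bar_g,            # group average annual premium
                            P_bar_all = P_bar_g,
                            kappa = 1, E = 1, F = 1) {
      R_H  <- (L_prev / P_prev) / (Lg_prev / Pg_prev) - 1   # historical adjustment
      R_UB <- (y_hat / P_prev) / (yg_hat / Pg) - 1          # UBI adjustment
      R    <- R_H + kappa * R_UB
      (1 + R) * P_bar_g * E * F + P_bar_all * E * (1 - F)
    }

    ## Table 4 assumptions: premiums in dollars, claims as frequencies.
    ubi_premium(L_prev = 0.2,  P_prev  = 0.5,
                Lg_prev = 0.1, Pg_prev = 0.31,
                y_hat = 0.15,  yg_hat  = 0.105, Pg = 0.31,
                P_bar_g = 300)   # ~ 337.71

Setting kappa = 0 in the same call recovers the historical experience rating premium discussed below ($372 with exact ratios, $372.09 with the rounded ones).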
Nevertheless, we recognise that not all insured vehicles are equipped with telematic devices, introducing potential gaps in the telematics data. In response, the UBI policy parameter ϰ in (13) can be set to 0. This adaptation of the UBI pricing model in (12) also allows application to newly insured drivers with only historical records (traditional demographic variables). This premium, called the historical experience rating premium, for driver i during period t is
$$P_{it,0} = \left(1 + R_{i,t-1}^{\Delta,H}\right) \bar{P}_t^1 \, E_{it} \, F = (1 + 0.2403) \times 300 \times 1 \times 1 = \$372.09,$$
where the historical loss rate change R_{i,t−1}^{Δ,H} is given by (16). This loss rate change can capture individual differences within a claim group using historical claims but fails to reflect recent driving risk. Hence, this premium is higher than the UBI experience rating premium calculated using both historical and current driving experience. Thus, the historical experience rating method is unable to provide immediate compensation/reward for safe driving.
Moreover, the UBI premium can track driving behaviour more frequently and closely using a regularly updated claim class and annual claim frequency prediction ŷ_{i,t}. The updating period can be reduced to monthly or even weekly to provide more instant feedback using live telematic data. In summary, the proposed UBI experience rating premium corrects the loss rate change R_{i,t}^Δ of the experience-rating-only premium using the sum of the historical loss rate change R_{i,t−1}^{Δ,H} and the UBI predicted loss rate change R_{i,t}^{Δ,UB}. The proposed PMA model can predict the annual claim frequencies ŷ_{i,t} more instantly using live telematic data, so the UBI premium can be updated more frequently to provide incentives for safe driving. The proposed UBI experience rating premium is an incremental innovation to business processes, allowing a company to transition gradually to the new UBI regime by adjusting the UBI policy factor ϰ in (13): ϰ can increase gradually from 0 to 1 if driver i wants his premium to progressively account for his current driving.
We remark that our analyses make a few assumptions. Firstly, we assume that the annual premium P̄_t^g covers the total cost with possibly some profit, and that the expectations of the loss rate changes R_{i,t−1}^{Δ,H} and R_{i,t}^{Δ,UB} across drivers i in group g are around zero. To assess the validity of these assumptions, one can obtain the distributions of R_{i,t−1}^{Δ,H} and R_{i,t}^{Δ,UB} based on the most recent data. If their means m_g^{Δ,H}, m_g^{Δ,UB} are not zero, the overall loss rate change R_{i,t}^Δ in (13) can be adjusted as
$$R_{i,t}^{\Delta} = \left(R_{i,t-1}^{\Delta,H} - m_g^{\Delta,H}\right) + \varkappa \left(R_{i,t}^{\Delta,\mathrm{UB}} - m_g^{\Delta,\mathrm{UB}}\right)$$
for group g. For conservative purposes, the means m_g^{Δ,H}, m_g^{Δ,UB} can be replaced by, say, the 75% quantiles q_{g,0.75}^{Δ,H}, q_{g,0.75}^{Δ,UB} of the distributions. Secondly, the method implicitly assumes perfect or near-perfect monitoring. However, the advent of monitoring technologies reduces the extent of asymmetric information between insureds and insurers and reduces moral hazard costs.
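A short sketch of this adjustment, following the mean-centred reading of the formula above; R_H and R_UB are assumed vectors of the two adjustment factors over the drivers in group g:

    ## Centre the adjustment factors by their group means or, for a more
    ## conservative premium, by the 75% quantiles of their distributions.
    m_H  <- mean(R_H);            m_UB <- mean(R_UB)
    q_H  <- quantile(R_H, 0.75);  q_UB <- quantile(R_UB, 0.75)

    kappa <- 1
    R_adj <- (R_H - m_H) + kappa * (R_UB - m_UB)   # mean-adjusted
    R_con <- (R_H - q_H) + kappa * (R_UB - q_UB)   # quantile-adjusted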

5. Conclusions

In summary, our study, based on claim data from 14,157 drivers exhibiting equidispersion and a substantial 92% of zero claims, introduces a novel approach using two-stage TP, PM, and ZIP regressions. Employing regularisation techniques such as lasso, elastic net, adaptive lasso, and adaptive elastic net, we aimed to predict annual claim frequencies, identify significant DVs, and categorise drivers into low-claim (safe driver) and high-claim (risky driver) groups. To ensure the robustness of our findings, we performed 100 resampling iterations, each comprising 70% of the drivers, for all TP, PM, and ZIP models. Our empirical results show that the PMA model, with adaptive lasso regularisation, displayed the best performance in this study. This finding provides relevant guidance for practitioners and researchers, as the analysis is based on a sound representative telematics sample. Moreover, the PMA model is highly favoured in Table 3c, and its implementation is more straightforward than that of the TP models.
Furthermore, we proposed utilising the best-performing PMA model to implement a UBI experience rating method, aiming to enhance the efficiency of premium pricing strategies. This approach shifts the focus from traditional claim history to recent driving behaviour, offering a nuanced assessment of drivers' risk profiles. Notably, our proposed UBI premium pricing method departs from the annual premium revision characteristic of traditional methods and instead allows more frequent updates based on recent driving performance, providing instant rewards for safe driving practices and feedback against risky driving using scores of the selected significant DVs for the high-claim group. This dynamic pricing approach not only incentivises responsible and less frequent driving but also minimises the cross-subsidisation of risky drivers. By enabling a more accurate and timely reflection of driver risk, UBI contributes to improved loss reserving practices for the auto insurance industry. In essence, our findings support the adoption of UBI experience rating methods as a progressive and effective means of enhancing both driver behaviour and the overall operational efficiency of auto insurance companies.
To implement the PMA models for premium pricing, Section 3.5 provides the modelling details and Appendix B.5 the technical application. If this proves challenging, some data analytics companies are experienced in handling telematics data, running PMA models, predicting drivers' claims, and revising the UBI experience rating premiums. The updating frequency for models, claim predictions, and premiums depends on the available resources and the type of policies. As a suggestion, the PMA models can be updated annually to reflect changes in road conditions, transport policies, etc., and the drivers' predicted annual claims can be updated fortnightly or monthly depending on drivers' mileage. When predicted annual claims are updated, the premium can also be updated to provide an incentive for good driving. Averaged and individual driving scores for the selected significant DVs (e.g., 29, 36, 59, 67, 73 for the high-claim group of the PMA model) can be sent, possibly with warnings, to inform drivers of their driving behaviour and encourage skill improvement. These selected significant DVs are associated with more frequent severe braking to slow down on weekday and weekend nights, as well as more frequent severe right turns at junctions on weekday nights and during the Friday rush.
In the context of future research within this domain, expanding the classification of driver groups to three or more holds the potential to encompass a wider range of driving styles, ultimately leading to more accurate predictions of claim liability. Introducing an intermediary driver group, distinct from the existing safe and risky classifications, offers an avenue to capture unique driving behaviours and potentially enhances the predictive power of our models. This extension not only enables a closer examination of different driving behaviours but also poses challenges in terms of identifying and interpreting these additional groups. While the application of similar mixture models and regularisation techniques for modelling multiple components remains viable, unravelling the intricacies of distinct groups within the expanded framework introduces interpretative complexities. Determining whether the third group is a composite of the existing two or represents a genuinely distinct category presents additional challenges. Moreover, handling the label switching problem becomes more intricate when dealing with mixture models featuring multiple groups.
A parallel trajectory for future exploration centers around the integration of neural networks as an alternative modelling approach. In contrast to the selection of key driving variables, neural networks employ hidden layers to capture intricate dynamics, incorporating diverse weights and interaction terms. This modelling paradigm allows for the application of network models to trip data without temporal aggregation, as exemplified by Ma et al. (2018), facilitating a more detailed analysis of driving behaviours in conjunction with real-time information on surrounding traffic conditions.

Author Contributions

Conceptualisation, J.S.K.C. and U.E.M.; methodology, J.S.K.C.; software, F.U. and A.X.D.D.; validation, F.U., J.S.K.C. and A.X.D.D.; formal analysis, F.U. and Y.W.; investigation, F.U. and Y.W.; resources, U.E.M.; data curation, F.U.; writing—original draft preparation, F.U., J.S.K.C. and Y.W.; writing—review and editing, J.S.K.C.; visualisation, F.U.; supervision, J.S.K.C.; project administration, J.S.K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
A      Adaptive lasso
AIC    Akaike information criterion
BIC    Bayesian information criterion
DVs    Driver behaviour variables
E      Elastic net
GLM    Generalized linear model
GPS    Global positioning system
IG     Information gain
L      Lasso
MSE    Mean squared error
N      Adaptive elastic net
NB     Negative binomial
PAYD   Pay As You Drive
PHYD   Pay How You Drive
PM     Poisson mixture
RMSE   Root mean squared error
ROC    Receiver operating characteristic curve
TP     Two-stage threshold Poisson
UBI    Usage-based auto insurance
ZIP    Zero-inflated Poisson

Appendix A. Details of Stage 1 TP Model Procedures

  • Draw subsamples S_r = {(x_i, n_i, y_i), i ∈ I_r}, r = 1, …, R, each containing N_r = 9910 drivers, where the index set I_r contains all sampled i. K-fold CV (K = 10) further splits S_r into 10 nonoverlapping, equal-sized (N_k = 991) CV sets
$$S_{rk} = \{(\mathbf{x}_i, n_i, y_i),\ i \in I_{rk}\}, \quad k = 1, \ldots, K, \text{ with index set } I_{rk},$$
and the training sets are S_{rk}^T = S_r ∖ S_{rk}, with index set I_{rk}^T = I_r ∖ I_{rk}. Set λ = (λ_1, …, λ_M), for some M, to be the list of potential λ values.
  • Estimate β_{λ_m,rk} = argmin_β LOSS_{λ,α,w}(β) in (7) for each λ_m ∈ λ and training set S_{rk}^T at repeat r and CV fold k. Find the optimal λ_m that minimises some regularised CV test statistic such as MSE, MAE, or deviance (Dev). Taking Dev as an example,
$$\lambda_{r,\min} = \operatorname*{argmin}_{\lambda_m \in \boldsymbol{\lambda}} \mathrm{Dev}_r(\lambda_m) = \operatorname*{argmin}_{\lambda_m \in \boldsymbol{\lambda}} \frac{1}{N_r} \sum_{k=1}^{K} \sum_{i \in I_{rk}} -2 \log f\!\left(y_{rki};\, \mu_{rki}(\boldsymbol{\beta}_{\lambda_m,rk})\right),$$
where the mean μ_{rki,λ_m} = exp(x_i′ β_{λ_m,rk} + log n_i). Among the MSE, MAE, and Dev statistics, the optimal λ_{r,min} using Dev is selected according to the RMSE of predicted claims over all subsamples. Using λ_{r,min}, β_r = (β_{r1}, …, β_{rJ}) is re-estimated on the subsample S_r. Figure A1a plots the Poisson deviance with SE against log(λ_m), showing how it drops to λ_{r,min} for the first subsample (r = 1). Figure A1b shows how β_{rj} shrinks to zero as λ increases.
  • Average the nonzero coefficients (selected at least once) over repeats as
$$\beta_j = \frac{\sum_{r=1}^{R} \beta_{rj}\, I(\beta_{rj} \neq 0)}{\sum_{r=1}^{R} I(\beta_{rj} \neq 0)}, \quad j \in I_{\beta}, \tag{A1}$$
where I(A) is the indicator function of event A and the index set
$$I_{\beta} = \{\, j : \beta_{rj} \neq 0 \text{ for some } r = 1, \ldots, R \,\} \tag{A2}$$
contains those DVs selected at least once over the R subsamples in stage 1. The averaged coefficients β_j, j ∈ I_β (based on Dev), are reported in Table A2 for the TPL-1 and TPA-1 models using the optimal λ_min. For example, DV 10 is not selected even once for the TPL-1 model.
  • Further select DVs that are frequently (not rarely) selected according to the weighted selection frequency measure
$$I_j = \sum_{r=1}^{R} \frac{1}{\mathrm{RMSE}_r}\, I(\beta_{rj} \neq 0), \quad j \in I_{\beta}, \tag{A3}$$
which weights inversely by RMSE_r. Superscripts T1, T2, M, and Z are added to I_j when applied to the stage 1 TP, stage 2 TP, PM, and ZIP models, respectively. These weighted selection counts I = (I_j, j ∈ I_β) using MSE, MAE, and deviance are also reported in Table A2 for the TPL-1 and TPA-1 models. Table 3a shows that the TPL-1 and TPA-1 models were selected according to the model performance measures AIC, BIC, and MSE. The results for the TPL-1 model in Table A2 show that the 12 DVs in deep grey highlight, with I_j < 0.2 max_{j∈I_β}(I_j) = 62, were dropped as rarely selected, resulting in J^{T1} = 65 − 1 − 12 = 52 DVs. These J^{T1} DVs can be interpreted as frequently selected DVs or simply selected DVs. A condensed sketch of this stage 1 procedure in R appears after this list.
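Assuming X is the driver-by-DV design matrix, y the vector of claim counts, and n the exposures (hypothetical names; the implementation details used in this study are in Appendix B), a condensed glmnet sketch of the steps above is:

    library(glmnet)

    set.seed(1)
    R  <- 100                                  # subsamples
    N  <- nrow(X); Nr <- round(0.7 * N)        # each with 70% of drivers
    B  <- matrix(0, R, ncol(X))                # stage 1 coefficients
    rmse <- numeric(R)

    for (r in seq_len(R)) {
      idx   <- sample(N, Nr)
      cvfit <- cv.glmnet(X[idx, ], y[idx], family = "poisson",
                         offset = log(n[idx]), nfolds = 10,
                         type.measure = "deviance")             # lambda_min by Dev
      B[r, ]  <- as.numeric(coef(cvfit, s = "lambda.min"))[-1]  # drop intercept
      y_hat   <- predict(cvfit, newx = X[idx, ], newoffset = log(n[idx]),
                         s = "lambda.min", type = "response")
      rmse[r] <- sqrt(mean((y[idx] - y_hat)^2))
    }

    ## (A1): average the nonzero coefficients over repeats.
    beta_bar <- colSums(B) / pmax(colSums(B != 0), 1)
    ## (A3): RMSE-weighted selection counts; drop rarely selected DVs.
    I_j      <- colSums((B != 0) / rmse)
    selected <- which(I_j > 0.2 * max(I_j))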
Figure A1. (a) The Poisson deviance CV criterion across log λ_m to find λ_min. (b) Coefficients β_j across log λ for the stage 1 TP model using lasso regularisation based on the first subsample (r = 1).

Appendix B. Some Technical Details of Model Implementation

  • This study utilises the R command glm to fit Poisson regression and glmnet to fit Poisson regression with lasso regularisation (Zeileis et al. 2008). The latter begins by building a sparse design matrix with the R function sparse.model.matrix, as in
    data_feature <- sparse.model.matrix(~ ., dt_feature)
We use the argument penalty.factor in cv.glmnet for the adaptive lasso; a short sketch of this weighting appears after this list. We remark that the glmnet package does not provide p values, so we extract the p values for the selected DVs by refitting the model using the glm procedure.
  • We use the 100 simulated datasets in stages 1 and 2 of the TP and PM models to explore optimal α values in the elastic net. We first set up our 10-fold CV strategy. Using the caret package in R, we use train() with method = "glmnet" to fit the elastic net.
    XX  <- model.matrix(Claims ~ . - EXP - 1, data = stage1)   # design matrix
    YY  <- stage1$Claims                                       # claim counts
    OFF <- log(stage1$EXP)                                     # exposure offset

    Fit_stage1 <- caret::train(
      x          = cbind(XX, OFF),   # offset entered as a column, since
      y          = YY,               # caret::train() has no offset argument
      method     = "glmnet",
      family     = "poisson",
      tuneLength = 10,
      trControl  = caret::trainControl(method = "repeatedcv",
                                       number = 10, repeats = 100)
    )
  • We use roc() in the pROC package to calculate the AUC; the latex2exp package is used for the mathematical annotation in the ROC plots.
  • We implement the AER package in R using the built-in command dispersiontest(), which assesses the alternative hypothesis H_1: Var(Y_i) = μ_i + Ψ · trafo(μ_i), where the transformation function trafo(μ_i) = μ_i (the default, trafo = NULL) corresponds to the Poisson model with Var(Y_i) = (1 + Ψ) μ_i. A dispersion 1 + Ψ greater than 1 indicates overdispersion.
  • The PM regression model is estimated using
    FLXMRglmnet(formula = . ~ ., family = c("gaussian", "binomial", "poisson"),
                adaptive = TRUE, select = TRUE, offset = NULL, ...)
in the R package flexmix (Leisch 2004) to fit mixtures of GLMs with lasso regularisation. Setting adaptive = TRUE for the adaptive lasso triggers a two-step process: initially, an unpenalised model is fitted to obtain the preliminary coefficient estimates β̂_j for the penalty weights w_j = 1/|β̂_j|; the w_j values are then applied to each coefficient in the subsequent model fitting. With the selected DVs for the low- and high-claim groups, FLXMRglmfix() refits the model, provides the significance of the coefficients, predicts claims, supports CV values, and evaluates various goodness-of-fit measures.
  • The ZIP regression model is estimated using the zipath() function for lasso and elastic net regularisation and the ALasso() function for adaptive lasso regularisation from the mpath and AMAZonn packages. The optimal minimum lambda is searched via 10-fold cross-validation with cv.zipath() and applied to both fitted models, ZIPL and ZIPA, for R = 100 subsamples, each with 70% of the data. The full data are then refitted to the ZIP model based on the selected DVs using the zeroinfl() function.
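As referenced in the first point above, a minimal sketch of the adaptive lasso weighting via penalty.factor (our illustration, with hypothetical names X, y, and n for the design matrix, claim counts, and exposures):

    library(glmnet)

    ## Step 1: unpenalised Poisson fit gives preliminary estimates.
    pre <- glm(y ~ X, family = poisson, offset = log(n))
    w   <- 1 / abs(coef(pre)[-1])             # weights w_j = 1/|beta_j|

    ## Step 2: weighted (adaptive) lasso via penalty.factor.
    afit <- cv.glmnet(X, y, family = "poisson", offset = log(n),
                      penalty.factor = w, nfolds = 10)
    coef(afit, s = "lambda.min")              # adaptive lasso estimates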

Appendix C. Driving Variable Description

Event type
ACC  Acceleration Event: Accelerating / From full stop
  C1  Smooth acceleration (acceleration to 30 MPH in more than 12 s)
  C2  Moderate acceleration (acceleration to 30 MPH in 5–11 s)
BRK  Braking Event: Full stop / Slow down
  C1  Smooth, even slowing down (up to about 7 mph/s)
  C2  Mild to sharp brakes with adequate visibility and road grip (7–10 mph/s)
LFT  Left turning Event: None (interchange, curved road, overtaking) / At junction
  C1  Smooth, even cornering within the posted speed and according to the road and visibility conditions
  C2  Moderate cornering slightly above the posted speed (cornering with light disturbance to passengers)
RHT  Right turning Event: None (interchange, curved road, overtaking) / At junction
  C1 and C2 are the same as for LFT
Time type
T1  Weekday late evening, night, midnight, early morning
T2  Weekday morning rush, noon, afternoon rush
T3  Weekday morning, afternoon, no rush
T4  Friday rush
T5  Weekend night
T6  Weekend day
Table A1. Driving variable labels.
DV1   ACC_ACCELERATING_T3_C1    DV19  BRK_FULLSTOP_T1_C1    DV39  LFT_NONE_T1_C1          DV57  RHT_NONE_T1_C1
DV2   ACC_ACCELERATING_T3_C2    DV20  BRK_FULLSTOP_T1_C2    DV43  LFT_NONE_T6_C1          DV58  RHT_NONE_T1_C2
DV3   ACC_ACCELERATING_T4_C1    DV22  BRK_FULLSTOP_T2_C2    DV44  LFT_NONE_T6_C2          DV59  RHT_NONE_T4_C1
DV4   ACC_ACCELERATING_T4_C2    DV23  BRK_FULLSTOP_T3_C1    DV45  LFT_ATJUNCTION_T1_C1    DV60  RHT_NONE_T4_C2
DV5   ACC_ACCELERATING_T5_C1    DV24  BRK_FULLSTOP_T3_C2    DV46  LFT_ATJUNCTION_T1_C2    DV61  RHT_NONE_T5_C1
DV7   ACC_ACCELERATING_T5_C2    DV25  BRK_FULLSTOP_T4_C1    DV47  LFT_ATJUNCTION_T2_C1    DV63  RHT_NONE_T5_C2
DV8   ACC_FROMFULLSTOP_T1_C1    DV26  BRK_FULLSTOP_T4_C2    DV49  LFT_ATJUNCTION_T3_C1    DV64  RHT_NONE_T6_C1
DV9   ACC_FROMFULLSTOP_T1_C2    DV27  BRK_FULLSTOP_T6_C1    DV50  LFT_ATJUNCTION_T3_C2    DV65  RHT_NONE_T6_C2
DV10  ACC_FROMFULLSTOP_T2_C1    DV28  BRK_FULLSTOP_T6_C2    DV51  LFT_ATJUNCTION_T4_C1    DV66  RHT_ATJUNCTION_T1_C1
DV13  ACC_FROMFULLSTOP_T3_C2    DV29  BRK_SLOWDOWN_T1_C1    DV52  LFT_ATJUNCTION_T4_C2    DV67  RHT_ATJUNCTION_T1_C2
DV14  ACC_FROMFULLSTOP_T4_C1    DV31  BRK_SLOWDOWN_T2_C1    DV53  LFT_ATJUNCTION_T5_C1    DV68  RHT_ATJUNCTION_T2_C1
DV15  ACC_FROMFULLSTOP_T4_C2    DV32  BRK_SLOWDOWN_T2_C2    DV54  LFT_ATJUNCTION_T5_C2    DV69  RHT_ATJUNCTION_T2_C2
DV16  ACC_FROMFULLSTOP_T5_C1    DV33  BRK_SLOWDOWN_T4_C1    DV55  LFT_ATJUNCTION_T6_C1    DV71  RHT_ATJUNCTION_T3_C2
DV18  ACC_FROMFULLSTOP_T5_C2    DV34  BRK_SLOWDOWN_T4_C2    DV56  LFT_ATJUNCTION_T6_C2    DV72  RHT_ATJUNCTION_T4_C1
DV35  BRK_SLOWDOWN_T5_C1        DV73  RHT_ATJUNCTION_T4_C2
DV36  BRK_SLOWDOWN_T5_C2        DV74  RHT_ATJUNCTION_T5_C1
DV37  BRK_SLOWDOWN_T6_C1        DV75  RHT_ATJUNCTION_T5_C2
DV38  BRK_SLOWDOWN_T6_C2        DV76  RHT_ATJUNCTION_T6_C1
DV77  RHT_ATJUNCTION_T6_C2

Appendix D. Visualisation of Driver Variables

Appendix D.1. Driving Variables by Claim Frequency

Figure A2. Value against driver ID with colours showing claim frequency y_i = 0, 1, 2 for 65 DVs.

Appendix D.2. Correlation Matrix and Hierarchical Clustering of Driving Variables

Figure A3. Relationship between variables using correlation matrix and hierarchical clustering.

Appendix E. Parameter Estimates of All Models

Table A2. Parameter estimates β in (A1) for the stage 1 TP models before refit, with R = 100 subsamples of 70% of the data using Poisson glmnet; β^{T1} after refitting to the full data using Poisson glm on the selected DVs with I_j^{T1} > 62 (otherwise dropped, as indicated in grey highlight); and selection criteria I^{T1} in (A3). There are J^{T1} = 52 DVs selected for TPL-1 and J^{T1} = 39 DVs selected for TPA-1 under the columns β_j^{T1}. Values in bold with yellow highlight under β_j^{T1} are significant.
TPL-1
glmnet with 100 Repeatsglm glmnet with 100 Repeatsglm glmnet with 100 Repeatsglm
MeasuresMSEMAEDeviancePoissonMeasuresMSEMAEDeviancePoissonMeasuresMSEMAEDeviancePoisson
DVs I j T 1 β j β j T 1 DVs I j T 1 β j β j T 1 DVs I j T 1 β j β j T 1
1-347−0.0029-28312661−0.0031-55-119680.00820.0134
2892322280.01760.0159292272762790.02790.036056371361230.01490.0061
31803173370.04020.0619312513243370.04090.053557139310334−0.0446−0.1696
4140307334−0.0409−0.0987323023273410.04740.051358241471090.01150.0095
53109610.0085-331332732660.02560.039359682522290.02820.0448
7-136116−0.0021−0.083034-8520−0.0073-6095198191−0.0243−0.0091
879537−0.0011-351462452420.02220.0168612553173200.04260.0626
92723103200.04170.0546362623203340.05760.079763982422350.02190.0346
10-17---372923273340.04240.05186430194160−0.0264−0.0348
131014072−0.0154−0.0253381842902890.02380.026365-9941−0.0084-
14-10514−0.0031-39712322280.01220.016066411641330.01640.0173
151411968−0.0190−0.00654378204204−0.0381−0.050567329330341−0.1400−0.1706
163113650.0001−0.000444311241−0.0113-681710261−0.0135-
183163273410.09690.125445378480.0172-6917188140−0.0166−0.0320
19-85200.0021-46311941640.02100.036171782312240.01720.0185
201733103230.03630.056347177314330−0.0611−0.0918721873032970.03190.0418
22412051770.01330.03094995245252−0.0341−0.0448733023273410.05870.0743
23-13375−0.0051−0.023550722282180.01570.0205741022422420.02160.0350
241362722620.02130.023651116289306−0.0517−0.0944753363303410.06210.0659
25-11958−0.0087-52150307324−0.0397−0.06237648239235−0.0236−0.0565
26581601390.01290.002453651741570.01560.0107771573073240.03240.0549
2717160129−0.0205−0.0402541632622720.02090.0208
TPA-1
glmnet with 100 Repeatsglm glmnet with 100 Repeatsglm glmnet with 100 Repeatsglm
MeasuresMSEMAEDeviancePoissonMeasuresMSEMAEDeviancePoissonMeasuresMSEMAEDeviancePoisson
DVs I j T 1 β j β j T 1 DVs I j T 1 β j β j T 1 DVs I j T 1 β j β j T 1
134124−0.0712-28-79---55-41200.0030-
26195680.03600.0149292282792760.03110.0357561499610.0311-
31603173100.05120.0608312173163130.05140.05365778306327−0.0797−0.1726
489296310−0.0630−0.1035323193373400.05520.050358-3830.0297-
5-61100.0482-33782482140.03520.036359312041390.03740.0478
7-24---34-3817−0.0201-6058143126−0.0288−0.0093
8-3413−0.0067-351021901770.02580.0199611632832820.05950.0702
91873002890.05170.0579362483343300.07030.077563411941500.03580.0422
10-----372853343300.05130.0530642714389−0.0551−0.0317
13-4820−0.0144-381602312180.02980.027165-1710−0.0100-
14-62---397116710.01320.01636617109550.0221-
1538631−0.0200-4348235204−0.0577−0.051067336340340−0.1752−0.1686
16-143−0.0088-44-447−0.0294-68-8555−0.0201-
183333403400.12120.120545372440.0293-6979934−0.0334-
191065170.0146-4610130510.0410-71371291020.02910.0204
201122862720.04260.056747170327327−0.0773−0.0913721562692480.04790.0443
2220143850.03000.02824958194187−0.0470−0.0367732893403370.07100.0733
23-5514−0.0171-5027129920.02590.020674512281670.03010.0341
24581881470.02370.02305151282262−0.0718−0.0918753163403330.07480.0709
25-343−0.0843-52136306303−0.0493−0.06187641225184−0.0446−0.0565
262168380.0201-531071540.0182-771162902730.04570.0554
2710153105−0.0349−0.0359541091761830.02590.0256
Table A3. Parameter estimates β_{Lh}^{T2}, β_{Hh}^{T2} for the stage 2 TP models with R = 100 subsamples of 70% of the data. Parameters are based on the J^{T1} = 52 DVs from stage 1, and J^{T2} refers to the number of frequently selected DVs with I_{Lhj} > 43 (τ = 0.08), 49 (τ = 0.09), 53 (τ = 0.10), 56 (τ = 0.11) and I_{Hhj} > 19, 13, 9, and 6, respectively, which differ across thresholds. Significant β_{Lhj}^{T2}, β_{Hhj}^{T2} are in boldface with yellow highlight.
τ 0.08 : TPLA-2 τ 0.09 : TPLA-2 τ 0.10 : TPLN-2 τ 0.11 : TPLN-2
GroupsLow High Low High Low High Low High
R h 0.700.300.790.210.850.150.900.10
I h j 43194913539566
J 2 T 1722381442234328
DVs I L h j β L h j T 2 DVs I H h j β H h j T 2 DVs I L h j β L h j T 2 DVs I H h j β H h j T 2 DVs I L h j β L h j T 2 DVs I H h j β H h j T 2 DVs I L h j β L h j T 2 DVs I H h j β H h j T 2
2--2420.02772--2190.01692--2180.01082--250.0250
380.00933310.03563520.03313--31160.0515350.023531180.04033--
449−0.05674420.03664254−0.07144430.04664356−0.07484760.08644357−0.083341290.1443
7--7--7--7--7--7--7--7--
9950.063693−0.008193500.097693−0.041693480.0953910−0.042193570.0859915−0.0528
134−0.088213480.04101366−0.064813870.082413243−0.051313910.081413278−0.054513830.0667
1511−0.0010156−0.02051526−0.019315--1544−0.064315--1522−0.02861580.0355
1611−0.058416170.02521633−0.035316680.04241622−0.007716550.04261611−0.009716150.0219
18460.082018360.0608182100.08271830.0867183230.076418--183430.103118--
2015−0.005620450.035120400.042820--202070.043720--201710.03632030.0768
2272−0.063322890.04072215−0.04692280.02722236−0.01382250.01742247−0.021822180.0432
23110.00602360.019623330.0084233−0.0091231010.0292238−0.0442231390.02802340−0.0761
24350.064524110.0316241950.07272430.0456243480.090424--243390.084424--
261130.0577266−0.0608261990.05162611−0.0282261850.0377263−0.0094261540.0270268−0.0363
2761−0.050227340.04762795−0.0422275−0.026427214−0.04782710−0.03202775−0.02872733−0.0863
2912−0.074929950.02112948−0.049729320.0179291630.04762911−0.0494291570.035829130.0148
31120.056331170.021931740.04953130.0463312870.05553150.0323312860.054931--
32460.061432610.0320323320.111532--323450.0893338−0.0312323320.080132--
33230.049233--331470.052333--333090.056933--332640.0492335−0.0615
35--35250.034335850.027735--351270.026535--35610.02273550.0954
36690.05753630.0776361370.041136--362640.053636--363180.0591363−0.0304
372430.104737140.0276373540.133737--373550.123937--373570.115337--
3838−0.050238220.03043878−0.059638160.040138127−0.043738390.03863825−0.023738180.0239
3987−0.065239--39251−0.081939--39327−0.10153930.014839132−0.148239--
4327−0.03804360.01504359−0.05014313−0.03754365−0.02654326−0.08684382−0.02894383−0.0808
464−0.031046670.031046150.00884650.037646260.018846--46360.034546100.0868
4760−0.0564476−0.010347225−0.0711473−0.074547341−0.080447--47321−0.071347--
4915−0.037549--4959−0.03264911−0.05644954−0.02194975−0.075949193−0.03814926−0.0744
5015−0.039450110.03205029−0.054350--5087−0.030850130.02915064−0.024450220.0360
51152−0.0853511500.062551206−0.07695180.062351214−0.055551--51314−0.07765170.0969
524−0.00135245−0.082652151−0.0387528−0.076352268−0.04165216−0.020252293−0.0462527−0.0014
531520.073353--53950.04075330.01675351−0.029753100.0507531100.03805350.0094
54340.04245460.021054560.02025430.004754470.02565430.0332542320.039254--
55110.034055--551580.04835516−0.0460551520.03545513−0.0540552280.04915533−0.1115
56490.0443563−0.0516561220.0412563−0.0275561960.03275613−0.0398561320.04015620−0.0636
5715−0.062657--57214−0.067257920.074057337−0.06735749−0.197457346−0.07395745−0.2011
58490.04325878−0.054558740.02905857−0.054858910.03495896−0.0789581000.02805898−0.0941
59650.057159--592070.05665933−0.0480592940.05575991−0.0799592850.05525950−0.0947
6050−0.04136080.03456055−0.044260--60239−0.051460100.061560221−0.051460100.0957
6180.046561310.025861260.013561320.0445611490.048561--611460.05046180.0182
638−0.037863670.0422634−0.005463--6315−0.006863160.067563500.02196330.0811
6426−0.062964--6488−0.059364--64142−0.048964--64211−0.045864350.0833
66380.049966--662250.049666--662540.041666--662110.0400663−0.0610
6731−0.042067176−0.140167310−0.091267187−0.146167363−0.127567112−0.097767357−0.14186797−0.1485
6919−0.040769760.050869121−0.052369--6998−0.035369--6946−0.0232695−0.0503
71300.0396716−0.0682711920.0464718−0.0405711750.0328713−0.0374711000.0228712−0.0005
7280.032672110.026772410.026572--72760.023672--721040.021972100.0867
73420.059473500.0344731850.07477350.0291732680.07097350.0354733210.07097320.0015
74--74170.0248741220.04587430.0242742760.061174--741880.044674--
75150.0488751070.0456752540.070775160.0436753010.07307530.0451753320.084475--
7627−0.0662766−0.034376151−0.0588765−0.003576276−0.054976100.077976271−0.049176270.0867
7770.040077140.020977260.03737750.033077360.007577240.0574771000.03917750.0448
Table A4. Parameter estimates β_0, β_c for the ZIP model before refit; β_L^M, β_H^M for the PM models and β_0^Z, β_c^Z for the ZIP models after refitting to all data based on the selected DVs; and selection criteria I_L^M, I_H^M, I_0^Z, I_c^Z, with R = 100 subsamples of 70% of the data. For the PM models, J_L^M, J_H^M refer to the numbers of frequently selected DVs with I_{Lj}^M > 43 (PML), 45 (PMA) and I_{Hj}^M > 19 (PML), 17 (PMA); otherwise, they are dropped, as in grey highlight. For the ZIP model, J_0^Z, J_c^Z refer to the numbers of frequently selected DVs with I_{0j}^Z > 62 and I_{cj}^Z > 62; otherwise, β_{0j} and β_{cj} are excluded, as in grey highlight. Significant parameters β_{Lj}^M, β_{Hj}^M, β_{0j}^Z, β_{cj}^Z are boldfaced and yellow highlighted.
PMLPMAZIPA
I j M 4319 I j M 4517 I j Z 6262
J M 4518 J M 3940 J Z 445
DVsLowHighDVsLowHighDVsZeroCount
I L j M β L j M I H j M β H j M I L j M β L j M I H j M β H j M I 0 j Z β 0 j β 0 j Z I c j Z β c j β c j Z
31180.0182470.09833710.03801260.1182344−0.0022-2910.04040.0514
988−0.0003200.0275927−0.0226160.0106920−0.0047-1720.01700.0450
183240.0636270.0477182060.08771490.107818---880.00440.1352
193110.049130.1058192000.0877820.08211910−0.0003-1360.0050−0.0263
2078−0.0197370.09192071−0.044327−0.05322034−0.0053-2910.05520.0717
221620.0098540.06332291−0.04841010.08392247−0.0104-2170.03330.0441
23335−0.207624−0.263423219−0.2801159−0.273223---84−0.0025−0.0247
262500.0259980.0370262120.0375810.01192631.91 × 10−5-1250.00690.0014
273380.045030.0068272510.0755600.057627100.0004-339−0.1900−0.0495
293240.0355170.0633291720.0663790.07222924−0.0005-2670.01820.0363
312940.0352140.0390311650.0637870.04963120−0.0024-2340.02650.0515
33138−0.0150--3350−0.051613−0.02653355−0.0123-2130.02430.0426
342870.036970.1254341270.0586570.0533343−0.0001-88−0.0022−0.0234
353310.057870.0130352990.1089430.07353520−0.0029-1320.00990.0232
363350.0501310.0450362140.06901200.06423630−0.0103-2810.04640.0752
373040.0284130.0522371830.0416510.05673710−0.0009-2980.04290.0524
38230−0.051670.000238121−0.1553400.00173898−0.0265-39.06181210.0076−0.0065
4388−0.02673−0.02384336−0.029644−0.070343170.0008-173−0.0397−0.0471
44570.0078--44540.0575370.093244---155−0.0099−0.0287
452740.0541240.0395452490.1177790.09824510−0.0005-1530.00900.0188
46338−0.095714−0.044246259−0.166578−0.10814630−0.0042-2640.05690.0362
47338−0.164420−0.046947292−0.291477−0.14934730.0000-322−0.0843−0.0940
492490.0241100.023549910.0430230.0378491650.0366−0.0193251−0.1070−0.0597
50331−0.089824−0.044750197−0.1691126−0.141950---1220.00770.0145
51335−0.098114−0.038151225−0.1602119−0.137851170.0004-301−0.0865−0.0868
523140.0265370.026552930.0447810.053852100.0023-311−0.0738−0.0660
54840.0134--54400.011170.04965417−0.0013-2130.01970.0182
551120.018370.019655330.002714−0.00535570.0001-880.00160.0109
561420.0193440.063656330.04781140.0780563−0.0003-750.00120.0002
582400.0298170.0632581540.0718620.0692583−0.0001-1280.00870.0157
59294−0.043317−0.081759115−0.143173−0.15435924−0.0035-2000.01890.0409
603180.0574240.0934601830.11421620.094560130.0024-231−0.0259−0.0053
612230.029730.0193611670.0773370.05146120−0.0032-2440.04030.0560
63189−0.065220−0.003163162−0.1524125−0.05836337−0.0057-980.00880.0362
641620.013270.034664710.023170.064164---139−0.0220−0.0409
66338−0.16897−0.053866297−0.305464−0.1943667−0.0001-810.00500.0175
6788−0.0263--6737−0.0730400.041667200.0044-339−0.1312−0.1510
68210−0.02657−0.01026890−0.094052−0.13976870.0001-67−0.0013−0.0041
691990.0210170.061169950.0537260.04206970.0005-180−0.0190−0.0286
712700.0388140.0725711410.0869760.06867134−0.0040-1450.00980.0114
723180.04731830.1172722170.07741940.09507230−0.0040-2680.03380.0391
733350.0631880.1193732420.10721090.10647375−0.0129−0.13232880.03910.0617
75321−0.054527−0.062275142−0.1177136−0.16457520−0.0022-3010.05110.0634
762570.0361100.0327761610.0722830.08277670.0009-244−0.0380−0.0656
772840.0324170.0370771700.0740740.06157798−0.0189−34.97261490.0132−0.0313

References

  1. Ayuso, Mercedes, Montserrat Guillen, and Jens Perch Nielsen. 2019. Improving automobile insurance ratemaking using telematics: Incorporating mileage and driver behaviour data. Transportation 46: 735–52. [Google Scholar] [CrossRef]
  2. Banerjee, Prithish, Broti Garai, Himel Mallick, Shrabanti Chowdhury, and Saptarshi Chatterjee. 2018. A note on the adaptive lasso for zero-inflated Poisson regression. Journal of Probability and Statistics 2018: 2834183. [Google Scholar] [CrossRef]
  3. Barry, Laurence, and Arthur Charpentier. 2020. Personalization as a promise: Can big data change the practice of insurance? Big Data & Society 7: 2053951720935143. [Google Scholar]
  4. Bhattacharya, Sakyajit, and Paul D. McNicholas. 2014. An adaptive lasso-penalized BIC. arXiv arXiv:1406.1332. [Google Scholar]
  5. Bolderdijk, Jan Willem, Jasper Knockaert, E. M. Steg, and Erik T. Verhoef. 2011. Effects of Pay-As-You-Drive vehicle insurance on young drivers’ speed choice: Results of a Dutch field experiment. Accident Analysis & Prevention 43: 1181–86. [Google Scholar]
  6. Cameron, A. Colin, and Pravin K. Trivedi. 1990. Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics 46: 347–64. [Google Scholar] [CrossRef]
  7. Chan, Jennifer S. K., S. T. Boris Choy, Udi Makov, Ariel Shamir, and Vered Shapovalov. 2022. Variable selection algorithm for a mixture of Poisson regression for handling overdispersion in claims frequency modeling using telematics car driving data. Risks 10: 83. [Google Scholar] [CrossRef]
  8. Chassagnon, Arnold, and Pierre-André Chiappori. 1997. Insurance under Moral Hazard and Adverse Selection: The Case of Pure Competition. Delta-CREST Document. Available online: https://econpapers.repec.org/paper/fthlavale/28.htm (accessed on 1 August 2024).
  9. Czado, Claudia, Tilmann Gneiting, and Leonhard Held. 2009. Predictive model assessment for count data. Biometrics 65: 1254–61. [Google Scholar] [CrossRef]
  10. Dean, Curtis Gary. 1997. An introduction to credibility. In Casualty Actuary Forum. Arlington: Casualty Actuarial Society, pp. 55–66. Available online: https://www.casact.org/sites/default/files/database/forum_97wforum_97wf055.pdf (accessed on 1 August 2024).
  11. Deng, Min, Mostafa S. Aminzadeh, and Banghee So. 2024. Inference for the parameters of a zero-inflated Poisson predictive model. Risks 12: 104. [Google Scholar] [CrossRef]
  12. Duval, Francis, Jean-Philippe Boucher, and Mathieu Pigeon. 2023. Enhancing claim classification with feature extraction from anomaly-detection-derived routine and peculiarity profiles. Journal of Risk and Insurance 90: 421–58. [Google Scholar] [CrossRef]
  13. Eling, Martin, and Mirko Kraft. 2020. The impact of telematics on the insurability of risks. The Journal of Risk Finance 21: 77–109. [Google Scholar] [CrossRef]
  14. Ellison, Adrian B., Michiel C. J. Bliemer, and Stephen P. Greaves. 2015. Evaluating changes in driver behaviour: A risk profiling approach. Accident Analysis & Prevention 75: 298–309. [Google Scholar]
  15. Fan, Jianqing, and Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96: 1348–60. [Google Scholar] [CrossRef]
  16. Fawcett, Tom. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27: 861–74. [Google Scholar]
  17. Gao, Guangyuan, and Mario V. Wüthrich. 2018. Feature extraction from telematics car driving heatmaps. European Actuarial Journal 8: 383–406. [Google Scholar] [CrossRef]
  18. Gao, Guangyuan, Mario V. Wüthrich, and Hanfang Yang. 2019. Evaluation of driving risk at different speeds. Insurance: Mathematics and Economics 88: 108–19. [Google Scholar] [CrossRef]
  19. Gao, Guangyuan, Shengwang Meng, and Mario V. Wüthrich. 2019. Claims frequency modeling using telematics car driving data. Scandinavian Actuarial Journal 2019: 143–62. [Google Scholar] [CrossRef]
  20. Guillen, Montserrat, Jens Perch Nielsen, Ana M. Pérez-Marín, and Valandis Elpidorou. 2020. Can automobile insurance telematics predict the risk of near-miss events? North American Actuarial Journal 24: 141–52. [Google Scholar] [CrossRef]
  21. Guillen, Montserrat, Jens Perch Nielsen, and Ana M. Pérez-Marín. 2021. Near-miss telematics in motor insurance. Journal of Risk and Insurance 88: 569–89. [Google Scholar] [CrossRef]
  22. Guillen, Montserrat, Jens Perch Nielsen, Mercedes Ayuso, and Ana M. Pérez-Marín. 2019. The use of telematics devices to improve automobile insurance rates. Risk Analysis 39: 662–72. [Google Scholar] [CrossRef]
  23. Huang, Yifan, and Shengwang Meng. 2019. Automobile insurance classification ratemaking based on telematics driving data. Decision Support Systems 127: 113156. [Google Scholar] [CrossRef]
  24. Hurley, Rich, Peter Evans, and Arun Menon. 2015. Insurance Disrupted: General Insurance in a Connected World. London: The Creative Studio, Deloitte. [Google Scholar]
  25. Jeong, Himchan. 2022. Dimension reduction techniques for summarized telematics data. The Journal of Risk Management 33: 1–24. [Google Scholar] [CrossRef]
  26. Jeong, Himchan, and Emiliano A. Valdez. 2018. Ratemaking Application of Bayesian LASSO with Conjugate Hyperprior. Available online: https://ssrn.com/abstract=3251623 (accessed on 1 December 2018).
  27. Kantor, S., and Tomas Stárek. 2014. Design of algorithms for payment telematics systems evaluating driver’s driving style. Transactions on Transport Sciences 7: 9. [Google Scholar] [CrossRef]
  28. Lambert, Diane. 1992. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34: 1–14. [Google Scholar] [CrossRef]
  29. Leisch, Friedrich. 2004. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software 11: 1–18. [Google Scholar] [CrossRef]
  30. Ma, Yu-Luen, Xiaoyu Zhu, Xianbiao Hu, and Yi-Chang Chiu. 2018. The use of context-sensitive insurance telematics data in auto insurance rate making. Transportation Research Part A: Policy and Practice 113: 243–58. [Google Scholar] [CrossRef]
  31. Makov, Udi, and Jim Weiss. 2016. Predictive modeling for usage-based auto insurance. Predictive Modeling Applications in Actuarial Science 2: 290. [Google Scholar]
  32. Meinshausen, Nicolai, and Peter Bühlmann. 2006. Variable selection and high-dimensional graphs with the lasso. Annals of Statistics 34: 1436–62. [Google Scholar] [CrossRef]
  33. Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press. [Google Scholar]
  34. Osafune, Tatsuaki, Toshimitsu Takahashi, Noboru Kiyama, Tsuneo Sobue, Hirozumi Yamaguchi, and Teruo Higashino. 2017. Analysis of accident risks from driving behaviors. International Journal of Intelligent Transportation Systems Research 5: 192–202. [Google Scholar] [CrossRef]
  35. Paefgen, Johannes, Thorsten Staake, and Frédéric Thiesse. 2013. Evaluation and aggregation of Pay-As-You-Drive insurance rate factors: A classification analysis approach. Decision Support Systems 56: 192–201. [Google Scholar] [CrossRef]
  36. Park, Trevor, and George Casella. 2008. The Bayesian lasso. Journal of the American Statistical Association 103: 681–86. [Google Scholar] [CrossRef]
  37. Shannon, Claude Elwood. 2001. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review 5: 3–55. [Google Scholar] [CrossRef]
  38. So, Banghee, Jean-Philippe Boucher, and Emiliano A. Valdez. 2021. Cost-sensitive multi-class adaboost for understanding driving behavior based on telematics. ASTIN Bulletin: The Journal of the IAA 51: 719–51. [Google Scholar] [CrossRef]
  39. Soleymanian, Miremad, Charles B. Weinberg, and Ting Zhu. 2019. Sensor data and behavioral tracking: Does usage-based auto insurance benefit drivers? Marketing Science 38: 21–43. [Google Scholar] [CrossRef]
  40. Städler, Nicolas, Peter Bühlmann, and Sara Van De Geer. 2010. L1-penalization for mixture regression models. TEST: An Official Journal of the Spanish Society of Statistics and Operations Research 19: 209–56. [Google Scholar] [CrossRef]
  41. Stipancic, Joshua, Luis Miranda-Moreno, and Nicolas Saunier. 2018. Vehicle manoeuvers as surrogate safety measures: Extracting data from the GPS-enabled smartphones of regular drivers. Accident Analysis & Prevention 115: 160–69. [Google Scholar]
  42. Tang, Yanlin, Liya Xiang, and Zhongyi Zhu. 2014. Risk factor selection in rate making: EM adaptive lasso for zero-inflated Poisson regression models. Risk Analysis 34: 1112–27. [Google Scholar] [CrossRef]
  43. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58: 267–88. [Google Scholar] [CrossRef]
  44. Tselentis, Dimitrios I., George Yannis, and Eleni I. Vlahogianni. 2016. Innovative insurance schemes: Pay As/How You Drive. Transportation Research Procedia 14: 362–71. [Google Scholar] [CrossRef]
  45. Verbelen, Roel, Katrien Antonio, and Gerda Claeskens. 2018. Unravelling the predictive power of telematics data in car insurance pricing. Journal of the Royal Statistical Society: Series C (Applied Statistics) 67: 1275–304. [Google Scholar] [CrossRef]
  46. Weerasinghe, K. P. M. L., and M. C. Wijegunasekara. 2016. A comparative study of data mining algorithms in the prediction of auto insurance claims. European International Journal of Science and Technology 5: 47–54. [Google Scholar]
  47. Winlaw, Manda, Stefan H. Steiner, R. Jock MacKay, and Allaa R. Hilal. 2019. Using telematics data to find risky driver behaviour. Accident Analysis & Prevention 131: 131–36. [Google Scholar]
  48. Wouters, Peter I. J., and John M. J. Bos. 2000. Traffic accident reduction by monitoring driver behaviour with in-car data recorders. Accident Analysis & Prevention 32: 643–50. [Google Scholar]
  49. Wüthrich, Mario V. 2017. Covariate selection from telematics car driving data. European Actuarial Journal 7: 89–108. [Google Scholar] [CrossRef]
  50. Zeileis, Achim, Christian Kleiber, and Simon Jackman. 2008. Regression models for count data in R. Journal of Statistical Software 27: 1–25. [Google Scholar] [CrossRef]
  51. Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101: 1418–29. [Google Scholar] [CrossRef]
  52. Zou, Hui, and Hao Helen Zhang. 2009. On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics 37: 1733. [Google Scholar] [CrossRef]
  53. Zou, Hui, and Trevor Hastie. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67: 301–20. [Google Scholar] [CrossRef]
Figure 1. Histogram of claims (left), exposure (mid), and claims per exposure (right).
Figure 2. Heat map of coefficients, with significant values denoted by “S”.
Figure 3. Scatter plots of observed annual claim frequencies a_i against predicted annual claim frequencies â_i using the TPA-1 model, cross-classified into low- and high-claim groups by the four thresholds, with colour indicating claims y_i = 0, 1, 2.
Figure 4. K-means clustering analysis segmenting drivers into the low-claim cluster (red, with blue shaded ellipse) and the high-claim cluster (blue, with red shaded ellipse) for the TP and PM models.
Figure 5. ROC curves and AUC values for (a–d) the four best stage 2 TP models; (e) the PMA model; and (f) the four best stage 2 TP models and one PM model.
Table 1. Model names for TP, PM, and ZIP models with different lasso regularisations.

| Regularisation | Stage 1 Threshold Poisson | Stage 2 Threshold Poisson | Poisson Mixture | Zero-Inflated Poisson |
|---|---|---|---|---|
| Lasso | TPL-1 | TPLL-2, TPAL-2 | PML | ZIPL |
| Elastic net | TPE-1 | TPLE-2, TPAE-2 | PME | — |
| Adaptive lasso | TPA-1 | TPLA-2, TPAA-2 | PMA | ZIPA |
| Adaptive elastic net | TPN-1 | TPLN-2, TPAN-2 | PMN | — |
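To make the model families in Table 1 concrete, the sketch below fits a lasso-penalised Poisson regression with exposure offsets, then an adaptive lasso via the standard column-rescaling trick. It is a minimal illustration on simulated data; statsmodels is a convenient stand-in here, not the tooling or tuning used in the study.

```python
import numpy as np
import statsmodels.api as sm

# Stand-in design matrix, exposures, and claim counts (not the study's data).
rng = np.random.default_rng(3)
n, p = 1000, 20
X = rng.normal(size=(n, p))
exposure = rng.uniform(0.1, 1.0, size=n)
beta = np.zeros(p); beta[:3] = [0.4, -0.3, 0.2]   # sparse true coefficients
y = rng.poisson(exposure * np.exp(-2.5 + X @ beta))

# Lasso-penalised Poisson regression: L1_wt=1 gives the pure lasso,
# 0 < L1_wt < 1 gives an elastic net. (This sketch also penalises the
# intercept, which a careful implementation would avoid.)
model = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson(),
               exposure=exposure)
fit_lasso = model.fit_regularized(method="elastic_net", alpha=0.01, L1_wt=1.0)

# Adaptive lasso: weight each coefficient's penalty by an initial
# unpenalised estimate, implemented by rescaling columns before refitting.
b0 = model.fit().params[1:]
w = 1.0 / (np.abs(b0) + 1e-8)
fit_ada = sm.GLM(y, sm.add_constant(X / w), family=sm.families.Poisson(),
                 exposure=exposure).fit_regularized(
                     method="elastic_net", alpha=0.01, L1_wt=1.0)
beta_ada = fit_ada.params[1:] / w                 # undo the rescaling
print("selected coefficients:", np.flatnonzero(np.abs(beta_ada) > 1e-8))
```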
Table 2. Identification of informative DVs. DVs with one asterisk have $H_j \geq 1$ and $S_j \geq 1\%$. DVs with two asterisks have $IG_j > 0$, indicating information gain.

| DV | $\rho$ | $H_j$ | $S_j$ (%) | $IG_j$ | DV | $\rho$ | $H_j$ | $S_j$ (%) | $IG_j$ | DV | $\rho$ | $H_j$ | $S_j$ (%) | $IG_j$ | DV | $\rho$ | $H_j$ | $S_j$ (%) | $IG_j$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −0.002 | 0.08 | 0.012 | 0 | 22 * | 0.003 | 89.45 | 12.991 | 0 | 39 | 0.008 | 0.42 | 0.076 | 0 | 59 * | 0.002 | 45.89 | 8.312 | 0 |
| 2 | 0.012 | 0.02 | 0.002 | 0 | 23 * | −0.001 | 87.75 | 12.871 | 0 | 43 ** | −0.053 | 99.99 | 13.789 | 0.002 | 60 ** | −0.041 | 99.93 | 13.787 | 0.001 |
| 3 * | 0.018 | 7.04 | 1.231 | 0 | 24 | 0.017 | 1.02 | 0.192 | 0 | 44 * | −0.004 | 19.69 | 3.795 | 0 | 61 * | 0.018 | 35.71 | 6.472 | 0 |
| 4 | −0.011 | 1.91 | 0.310 | 0 | 25 | −0.002 | 0.30 | 0.046 | 0 | 45 * | 0.002 | 18.61 | 3.678 | 0 | 63 * | 0.006 | 32.91 | 6.462 | 0 |
| 5 | 0.003 | 0.79 | 0.119 | 0 | 26 * | 0.005 | 17.50 | 3.990 | 0 | 46 * | −0.003 | 90.51 | 13.008 | 0 | 64 * | −0.021 | 61.78 | 10.149 | 0 |
| 7 | −0.002 | 0.01 | 0.001 | 0 | 27 ** | −0.060 | 99.69 | 13.773 | 0.002 | 47 * | −0.035 | 92.68 | 13.246 | 0 | 65 | 0.0005 | 1.22 | 0.295 | 0 |
| 8 | 0.004 | 0.10 | 0.014 | 0 | 28 | −0.004 | 0.03 | 0.003 | 0 | 49 ** | −0.061 | 99.98 | 13.789 | 0.002 | 66 * | 0.008 | 4.41 | 1.339 | 0 |
| 9 * | 0.010 | 28.69 | 5.666 | 0 | 29 * | 0.023 | 4.41 | 1.288 | 0 | 50 * | 0.012 | 6.65 | 1.247 | 0 | 67 ** | −0.060 | 99.54 | 13.766 | 0.002 |
| 10 | −0.002 | 0.01 | 0.001 | 0 | 31 * | 0.014 | 15.93 | 3.698 | 0 | 51 * | −0.025 | 67.41 | 10.718 | 0 | 68 * | −0.019 | 76.17 | 11.953 | 0 |
| 13 | −0.003 | 0.45 | 0.069 | 0 | 32 | 0.024 | 74.41 | 1.229 | 0 | 52 * | −0.039 | 94.18 | 13.357 | 0 | 69 * | −0.007 | 7.83 | 1.895 | 0 |
| 14 | −0.006 | 0.06 | 0.009 | 0 | 33 * | 0.011 | 39.01 | 7.957 | 0 | 53 | 0.015 | 3.00 | 0.645 | 0 | 71 * | 0.006 | 32.11 | 6.585 | 0 |
| 15 | −0.0001 | 0.50 | 0.076 | 0 | 34 * | −0.001 | 21.44 | 5.114 | 0 | 54 * | 0.023 | 5.03 | 1.161 | 0 | 72 * | 0.007 | 41.24 | 7.861 | 0 |
| 16 | 0.006 | 0.24 | 0.036 | 0 | 35 * | 0.010 | 35.54 | 7.257 | 0 | 55 * | −0.002 | 21.21 | 4.424 | 0 | 73 * | 0.023 | 11.09 | 2.775 | 0 |
| 18 ** | −0.010 | 99.90 | 13.785 | 0.001 | 36 * | 0.009 | 54.80 | 9.701 | 0 | 56 * | 0.001 | 34.25 | 6.654 | 0 | 74 | 0.013 | 1.03 | 0.222 | 0 |
| 19 * | 0.003 | 77.88 | 12.129 | 0 | 37 * | 0.024 | 2.40 | 0.856 | 0 | 57 | −0.012 | 1.23 | 0.229 | 0 | 75 * | 0.029 | 10.85 | 2.669 | 0 |
| 20 * | 0.006 | 67.10 | 10.980 | 0 | 38 * | 0.022 | 3.84 | 1.355 | 0 | 58 * | −0.008 | 61.74 | 10.378 | 0 | 76 * | −0.021 | 35.50 | 7.043 | 0 |
| 77 * | 0.011 | 13.61 | 3.354 | 0 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
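The flagging rule in the caption of Table 2 is mechanical once the summary statistics are available. The sketch below applies it to a few stand-in values of $H_j$, $S_j$, and $IG_j$; computing those statistics themselves depends on the screening procedure described in the paper and is not shown here.

```python
import numpy as np

# Stand-in per-DV summary statistics (illustrative values only).
H = np.array([0.08, 89.45, 7.04, 99.99])     # H_j
S = np.array([0.012, 12.991, 1.231, 13.789]) # S_j in percent
IG = np.array([0.0, 0.0, 0.0, 0.002])        # information gain IG_j

one_star = (H >= 1) & (S >= 1)   # informative DVs per the caption's rule
two_star = IG > 0                # DVs with positive information gain
flags = np.where(two_star, "**", np.where(one_star, "*", ""))
print(flags)  # ['' '*' '*' '**']
```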
Table 4. Summary of assumptions in a case study, in thousand dollars.

| | Driver $i$ (Safe) | Safe Group | Risky Group |
|---|---|---|---|
| Average annual premium $\bar{P}_t^g$ | – | 0.3 | 0.5 |
| Historical annual premium $P_{i,t-1}$, $P_{t-1}^g$ | 0.5 | 0.31 | 0.51 |
| Historical annual claims $L_{i,t-1}$, $L_{t-1}^g$ | 0.2 | 0.1 | 0.3 |
| Predicted annual claim frequencies $\hat{y}_{i,t-1}$, $\hat{y}_{t-1}^g$ | 0.15 | 0.105 | 0.305 |
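To make Table 4 concrete, the sketch below works through one plausible experience-rating update using the tabulated assumptions. The relativity formula shown is a deliberately simplified illustration, not the pricing method proposed in the paper.

```python
# Quantities taken from Table 4 (in thousand dollars / per-year frequencies).
P_bar_safe = 0.3       # average annual premium, safe group
y_hat_driver = 0.15    # predicted annual claim frequency, driver i
y_hat_safe = 0.105     # predicted annual claim frequency, safe group

# Illustrative relativity-based update (an assumption, not the paper's
# formula): scale the group premium by the driver's predicted claim
# frequency relative to the group average.
relativity = y_hat_driver / y_hat_safe          # ≈ 1.429
premium = P_bar_safe * relativity
print(f"updated premium: {premium:.3f} thousand dollars")  # ≈ 0.429
```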