1. Introduction
Insurance is a fundamental risk-transfer mechanism of modern society. The risks of the insured are financially transferred to the insurer and at the same time transformed: damage that would be financially ruinous for an individual is distributed by the insurer among all the members of the same pool and thus becomes sustainable [1,2,3]. Despite the effectiveness of this risk pooling and spreading mechanism, insurers have an interest in preemptively reducing the risks transferred by the insured. Since many risks depend on individual behavior, as in the illustrative case of motor insurance, the insurer’s preventive activity should move from, and act upon, the behavior of the insured. But policyholders’ behavior is in principle unobservable.
Since the early 2000s, insurance companies selling third-party liability motor insurance policies have invested heavily in the use of telematics data to track drivers’ behavior. The data collected should make it possible to assess policyholders’ risk profile and adjust their policy premium accordingly. The insurance industry terms this opportunity usage-based insurance (UBI).
Business experience over the last two decades shows a significant evolution in the use of behavioral data. Insurance companies not only use it to refine the risk profile of policyholders but also feed the aggregated information they obtain from behavioral data back to the drivers. The aim is to promote greater awareness among policyholders of their driving style and to encourage a change in driving behavior if their driving habits show criticalities that increase risk exposure. In this respect, insurance companies speak of coaching [4,5,6]. The usual abstract assumption is that coaching works because policyholders who receive feedback improve their driving style and become better drivers [7] (p. 22).
The effectiveness of coaching, however, is yet to be proven. Our hypothesis is that it depends on the willingness of users to take in and use the information fed back by the insurance company, usually conveyed by a digital app, to motivate a change in their behavior. A rapidly developing strand of research [8,9,10] calls this kind of activity engagement. By engagement, we mean the time and effort that individuals put into improving their risk profile. We propose to distinguish a broad and a narrow sense of engagement. A recent paper [11] exemplifies the use of the term in a broad sense: policyholders who took out health and life insurance based on behavioral data are considered ‘engaged’ when they participate in pre-established programs, including diagnostic screening, gym membership, and daily exercise, to promote a healthy lifestyle.
For our research on telematics motor insurance, however, it is important to focus on a narrow sense of the term ‘engagement’ to refer exclusively to the users’ interaction with the app. Our research assumes that this engagement is also a behavior, which like driving behavior is (or can be) tracked by digital devices. If by ‘behavior’ we do not only mean ‘the use of the car’ but also ‘the use of the app’, then insurance companies have to deal with two different types of behavioral data—behavioral data on driving style, and behavioral data on users’ interaction with the app.
In this article, we want to explore the interplay of information feedback and behavioral change. According to this approach, the usual UBI formulation should be clarified. The success of proactive strategies implemented by insurance companies is, in fact, based on two different types of behavior that can both be tracked by the telematics app: not only the driving behavior of policyholders but also their interaction with the app, i.e., their being engaged. Insurance companies selling telematics insurance policies collect a lot of data about both behaviors: use of the car and use of the app. This is, in our opinion, a crucial novelty.
Behavioral data are considered a “remarkable advance” in automobile insurance [12] (p. 662). Previously, the insurance industry could only use variables related to fixed characteristics of the policyholder and the vehicle, many of which, such as age and gender, are not causally related to the risk of getting into a crash. They are proxy variables. Behavioral variables, instead, are causally related to the risk of road accidents and promise to enable personalized tarification, which may be considered a fairer way of setting policy premiums [13,14]. Moreover, as we have seen, behavioral data processing can be carried out to implement coaching strategies and possibly improve policyholders’ driving behavior. Behavioral data processing, thus, is expected to impact policyholders (who know that their behavior is monitored), insurance companies (which can improve their predictive capacity by combining behavioral and non-behavioral variables), and the relationship between policyholders and insurance companies (triggering feedback loops).
The objective of our research is to test whether the effectiveness of current experiments depends on the integration of these two distinct types of behavioral data. This integration raises a number of new questions: How should engagement be properly defined? How should it be measured? Is there any empirical evidence of a connection between engagement and driving behavior improvement? And how does this connection change over time? To answer these questions, we investigated the dataset of an insurance company selling telematics motor insurance policies. In Section 2, we describe the emergence of the idea of insurance as a loss prevention institution and the evolution of usage-based auto insurance policies over the past two decades. In Section 3, we provide a brief overview of relevant research. In Section 4, we describe the dataset we worked on, the methodology we followed, and some limitations of our study. Section 5 presents our main findings. Section 6 presents our conclusions and suggests possible directions for future developments of behavioral insurance.
2. The Evolution of Usage-Based Auto Insurance Policies
In the mid-1990s, the motor insurance industry began to question the insurance model that merely compensates policyholders’ claims. Starting from the assumption that the majority of road accidents are caused by human miscalculations (of driving capability, road conditions, or driving control under certain road and weather conditions), the possibility of insurance companies acting as loss prevention companies began to be discussed. The aim was “stopping claims before they happen” [15] (p. 271). Underpinning this project was the conviction that insurance could not simply be a risk-spreading mechanism. Spreading risks basically means that policyholders transfer their risks to the insurance company, which distributes them over the pool of insured customers. The result is risk mitigation for customers, who feel relieved of the financial consequences of possible future damages.
A consequence, however, can also be that policyholders are less incentivized to take precautionary measures, producing the thorny problem of moral hazard [16,17]. To counter this attitude, it has been suggested to try “to make people more individually accountable for risks” [18] (p. 1). The basic idea was to move from “spreading risks” to “embracing risks”: even if policyholders pay for coverage, they should be aware that they retain, at least in part, both moral and financial responsibility for the consequences of their behavior [18] (p. 3). In the case of auto insurance, this meant that drivers should engage in preventive actions. But prevention first requires an awareness of the risks to be avoided in order for bad driving habits to be removed [15] (p. 278). What remained unclear, however, was how the insurance industry could tackle the problem of bad driving. This is where digital devices used as monitoring tools to produce behavioral data come into play.
The first form of usage-based insurance (UBI) tested in the early 2000s was the so-called pay-as-you-drive (PAYD) insurance policy [19]. The novelty of this policy was that its pricing system was based on the mileage driven by the policyholders during the policy term. The underlying idea was that mileage is a crucial risk factor statistically related to claim probability. The assumption was that people with low mileage are low-risk motorists and should pay less, whereas people with high mileage are high-risk motorists and should pay more. The PAYD pricing system was later questioned, as it does not take into account that higher mileage can also mean greater driving experience, producing better driving skills. Increasing mileage can be connected with a ‘learning effect’ [12] that, in turn, might decrease the risk of road accidents. On the other hand, a similar statistically significant relationship holds between being a newly licensed young driver and claim probability.
UBI later evolved into pay-how-you-drive (PHYD) insurance policies, based on the idea that driving style is causally related to the risk of road accidents and should also be taken into consideration when setting the policy premium. Between statistical variables like gender and age and claim probability, there is actually a strong statistical correlation but no evident causal relationship. Between phone distraction and the likelihood of getting into a crash, instead, there is a causal relationship. PHYD insurance policies, therefore, keep measuring mileage as PAYD policies do, but they also track drivers’ behavioral characteristics to assess their actual driving style. Drivers’ behavior is tracked by means of telemetry packages. PHYD insurance policies usually require the installation of a black box in the car with the policyholder’s consent. This black box generates a huge amount of behavioral data that allows the company to monitor the policyholders’ driving style: how they steer, how and how often they brake, whether they exceed the speed limit, whether they drive predominantly during the day or at night, and so on. The aggregation of these features makes it possible to assess the individual risk profile and can be used to adjust the policy premium accordingly. This information can also be the basis for coaching services that aim to prevent claims before they occur [20].
The crucial condition for implementing coaching strategies is feedback. In the most advanced telematics insurance solutions, drivers who take out a PHYD insurance policy are supposed to download an app on their smartphone. This app notifies policyholders of the overall score they achieved depending on how well or badly they drove. The same app also communicates the scores achieved in the main features (maneuvers) that the company uses to define the individual driving profile [11,21]. Finally, the app shows every single trip traveled by the insured and indicates exactly whether any criticalities were found and what they are (e.g., where the insured exceeded the speed limit or made a U-turn). This information is made available after driving, not in real time.
By means of feedback, information literally circulates, that is, it runs circularly. Drivers disclose information about their driving behavior to the insurance company. The insurance company, in turn, discloses information concerning risk assessment and risk profile to the drivers. Telematics insurance policies, thus, do not simply turn information asymmetry upside down, as many scholars argue [22,23]. They rather trigger a circular relationship where behavior produces information, and information is fed back to change behavior. What is really going on in PHYD insurance policies is a kind of ‘feedback loop’.
3. Previous Research
As shown by a recent bibliometric review of telematics-based auto insurance [24], the literature on telematics motor insurance is very large and ever-expanding. Here, we focus only on the contributions that explore the relationship between information feedback and driving behavior. A recent overview of studies investigating the impact of telematics on road safety points out that there is still scarce research on before/after comparisons of feedback provision to drivers [25].
More than twenty years ago, Wouters and Bos [26] (p. 644ff) put forward the hypothesis that drivers who know that they are being monitored might be encouraged to change their behavior, especially if they receive feedback as a result of this monitoring. In their empirical research on a business fleet, Wouters and Bos assessed the effect of this ‘behavioral feedback’ based on a JDR (journey data recorder) by comparing an experimental group with a control group of vehicles over a period of 12 months. A statistically significant accident reduction could be detected only for some of the fleet sets, but the overall accident rate in the experimental group was reduced by 20% after the intervention. However, neither monitoring nor behavioral feedback was linked to an insurance policy, and both lacked the reinforcement that insurance policies usually provide in addition to feedback, namely, financial incentives.
A decade later, Farmer, Kirley, and McCartt [27] tested the effects of in-vehicle monitoring on the driving behavior of teenagers, whose crash rates, as is well known, are consistently higher than those of any other age group. Feedback, in this case, was notified to their parents on a dedicated website. After 24 weeks of monitoring 85 recently licensed drivers in a suburban Washington, DC area, it turned out that there were no statistically relevant changes in driving behavior and that parents themselves made few visits to the website to check the driving behavior of their children. In this case too, feedback was not associated with an insurance policy.
In 2011, Bolderdijk, Knockaert, Steg, and Verhoef [28] carried out a field experiment on the effects of a PAYD insurance policy on young drivers’ speeding behavior. The basic reasoning was that young drivers are overrepresented in road accident statistics because they tend to drive at higher speed, and speed is one of the most important behavioral determinants of crash risk. The goal of their research was to test whether the provision of financial rewards for keeping to the speed limit could encourage young drivers to modify their driving behavior. Participants could check their performance by logging in to a website which provided detailed feedback on speed violations, mileage, and night-time driving, and showed by default the prospective overall discount they could earn. The incentive group (ca. 150 participants) showed a modest but significant reduction in speeding, strongly associated with financial incentives (when financial incentives were removed, speeding increased again).
The research that comes closest to the problems we investigate in this article is that of Soleymanian, Weinberg, and Zhu [7]. Based on the dataset provided by a major US insurance company offering a PHYD policy, Soleymanian and colleagues were able to observe more than 100,000 customers over a 32-month period. Their main research question was whether there is a statistically significant improvement in the driving behavior of UBI customers compared to usual customers. Their research showed that UBI customers improved their driving score by ca. 9% (from 62.05 in week 1 to 67.87 in week 26), that this improvement was higher in the early weeks and for young drivers, and that it did not depend solely on feedback but also on financial incentives. In weeks 11 and 12, 15% of UBI customers dropped out. The consequence was a significant decline in harsh braking, which can be interpreted as the outcome of self-selection: PHYD policies retain the best customers and let bad customers leave.
Soleymanian, Weinberg, and Zhu’s research [7] is very important, mainly because it is based on an insurance company dataset and observes UBI customers over time. However, it does not investigate many issues that are crucial for us. For example, if the purpose of coaching strategies is the improvement of policyholders’ driving behavior, how should an improvement be properly defined? And how should it be measured by insurance companies that have access to behavioral data? Since improvement should be the result of coaching, how should a coaching process be defined? Are there short-term and long-term coaching effects? Crucial questions for us are also how engagement can be defined and measured, whether there is a connection between engagement and driving behavior improvement, and how this connection changes over time. In our empirical research, we deal with these questions.
4. Dataset and Methodology
The data we worked on were taken from PHYD insurance policies based on mobile telematics. In this case, an app on the smartphone replaces the usual black box. Such a replacement has advantages and disadvantages. The smartphone is usually regarded as an excellent platform for providing users with prompt feedback [29,30]. Moreover, the smartphone detects phone distraction, which is known to be one of the main causes of road accidents and a crucial feature to be integrated into the score evaluation process. The main disadvantage is that telematics data produced by a smartphone are less accurate than those produced by a black box and require more preprocessing.
During each trip, the app downloads geolocation information from its map provider and records raw data from the GPS, the accelerometer, and the gyroscope, as well as from the smartphone system to check for phone usage. The data are processed in real time to assess each driving session based on four key aspects: the attention paid by the driver, who should not use the phone while driving; compliance with speed limits; cautiousness on the road; and a set of circumstances depending on external factors (such as driving during rush hours). Accordingly, four sub-scores are generated to analyze each trip feature separately, namely the ‘attentive driving’, ‘conscious driving’, ‘smooth driving’, and ‘contextual’ scores. They all range from 0 (poor) to 100 (excellent) and are described in Table 1.
A cumulative trip score is computed as a weighted average of the four sub-scores. It still ranges between 0 and 100, but the weights are set by the insurance company, which can balance the importance of the features as it considers most appropriate. Immediately after each driving session, the trip scores are displayed, and all significant critical events are localized on the trip map (as shown by the first screenshot in Figure 1) to ease score interpretation. Past trips remain visible and searchable in the app, while a weekly score for the current week is updated on the app homepage, as visible in the right image of Figure 1.
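To make the aggregation concrete, the following minimal Python sketch computes such a cumulative trip score. The weight values are purely hypothetical: the actual weights are proprietary, set by the insurance company, and not disclosed here.

```python
# Cumulative trip score as a weighted average of the four sub-scores.
# The weight values below are hypothetical placeholders.
SUB_SCORE_WEIGHTS = {
    "attentive": 0.35,   # phone usage while driving
    "conscious": 0.30,   # compliance with speed limits
    "smooth": 0.25,      # cautiousness on the road
    "contextual": 0.10,  # external factors (e.g., rush hours)
}

def cumulative_trip_score(sub_scores: dict) -> float:
    """Weighted average of the four sub-scores, each in [0, 100]."""
    total = sum(SUB_SCORE_WEIGHTS.values())
    weighted = sum(SUB_SCORE_WEIGHTS[k] * sub_scores[k] for k in SUB_SCORE_WEIGHTS)
    return weighted / total

# Example: perfect attention, frequent speeding, otherwise smooth driving.
print(cumulative_trip_score(
    {"attentive": 100, "conscious": 55, "smooth": 80, "contextual": 90}
))  # -> 80.5
```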
In many mobile telematics-based insurance policies on the market, the score is used to reward policyholders with financial incentives (e.g., fuel cashback, vouchers, and discounts upon renewal of the policy). This is not the case for the UBI motor insurance policy we worked on. On the one hand, the absence of financial rewards gave us the opportunity to investigate the pure interplay of information feedback and behavioral change. On the other hand, it deprived us of the possibility of exploring the relationship between digital engagement and a bonus system based on behavioral improvement. As we explain in Section 4.3, this is an important limitation of our research.
4.1. Data Preprocessing
We accessed the trip scores and trip metadata of 498 new customers, onboarded over a 9-month period starting in March 2022 in a Western European country. The observation period was 35 weeks. In the app weekly summary, the automatically defined week always starts on Monday, disregarding the exact onboarding day of each individual. For this reason, we considered Monday-starting weeks and associated the week indexes $i = 0, 1, 2, \ldots$ to each trip, where $i = 0$ corresponds to the onboarding week for each user, ‘aligning’ the users according to their own timing. Coherently, for each user $k$, we computed the weekly score $s_i^k$ at each $i$-th week as the mean of the trip scores of that week. All the values were linearly scaled from the $[0, 100]$ range into $[0, 1]$, and hence each $s_i^k$ takes values from 0 (bad) to 1 (excellent). The total number of weekly scores is 4419. This feature is described in the histogram in Figure 2 and in the first row of Table 2. The median score is 0.6065, and half of the values are between 0.4343 and 0.7690.
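The following minimal Python/pandas sketch illustrates this preprocessing step. The input table and its column names (user_id, trip_start, trip_score) are hypothetical stand-ins for the actual data layout, which is not reproduced here.

```python
import pandas as pd

# Weekly-score preprocessing, assuming a trips table with (hypothetical)
# columns: user_id, trip_start (timestamp), trip_score in [0, 100].
trips = pd.read_csv("trips.csv", parse_dates=["trip_start"])

# Monday of the week in which each trip took place ("W-SUN" weeks end on
# Sunday, so their start_time is the preceding Monday).
trips["week_monday"] = trips["trip_start"].dt.to_period("W-SUN").dt.start_time

# Align users on their own timing: i = 0 is each user's onboarding week.
onboarding = trips.groupby("user_id")["week_monday"].transform("min")
trips["week_index"] = (trips["week_monday"] - onboarding).dt.days // 7

# Weekly score s_i^k: mean of the week's trip scores, rescaled to [0, 1].
weekly_scores = (
    trips.groupby(["user_id", "week_index"])["trip_score"]
    .mean()
    .div(100.0)
    .rename("s")
    .reset_index()
)
```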
Telematics devices, however, can also collect further behavioral features that do not (yet) go into the score. Our app also records users’ app usage by collecting data about browsing sessions. Each session is defined as a continuous time frame during which the app is in the foreground on the smartphone. We used app session data to measure users’ engagement. For this purpose, conceptual decisions are required. One can use the number of sessions or the time spent on the app. The number of sessions can be counted per day or per week. The time spent on the app, in turn, can be computed per session or aggregated. We opted for time spent on the app aggregated over the week.
The original dataset provided 21,283 app sessions of varying durations, up to 16 min. The app session data are presented in the second and third rows of Table 2. For each user, we first aggregated the session durations on a daily scale, then on a weekly scale, to match the processing workflow previously described for the scores. We denoted such aggregated values as $d_i^k$ for all week indexes $i$ and each user $k$.
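Continuing the sketch above, the weekly usage aggregation could look as follows; again, the sessions table and its column names are assumptions made for illustration.

```python
# Weekly app usage d_i^k, assuming a sessions table with (hypothetical)
# columns: user_id, session_start (timestamp), duration_s (seconds with
# the app in the foreground). Week indexing reuses the trips' onboarding
# weeks so that d_i^k matches the weekly scores s_i^k.
sessions = pd.read_csv("sessions.csv", parse_dates=["session_start"])
sessions["week_monday"] = (
    sessions["session_start"].dt.to_period("W-SUN").dt.start_time
)
sessions = sessions.merge(
    trips.groupby("user_id")["week_monday"].min().rename("onboarding").reset_index(),
    on="user_id",
)
sessions["week_index"] = (
    (sessions["week_monday"] - sessions["onboarding"]).dt.days // 7
)

# Aggregate session durations per user and week (the intermediate daily
# aggregation described in the text is skipped here for brevity).
weekly_usage = (
    sessions.groupby(["user_id", "week_index"])["duration_s"]
    .sum()
    .rename("d")
    .reset_index()
)
```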
Figure 3, summarizing the behavior of our users, shows that almost all policyholders look at the app mainly in the very first weeks of the program and spend only a few minutes per week interacting with it.
4.2. Approaches for Data Analysis
The first step of our analysis focused on improvement in driving behavior. In the available literature, the notion of improvement has not yet been clearly defined. Weidner, Transchel, and Weidner [31] (p. 214) claimed that neither in science nor in practice is there a “standardised method to achieve a clear ‘score’ of driving behavior”. Soleymanian, Weinberg, and Zhu [7] aggregated scores as mean values over weeks for the entire pool of policyholders and compared them among weeks, without distinguishing among users and disregarding drop-off consequences. Yet, should an increase in the average score be considered an improvement, or are there more refined ways of defining it? Should we analyze the pool with average scores, or work at the individual level?
Unlike [7], we decided to analyze improvement in driving behavior for each individual policyholder, and we introduced two different workflows. The first workflow, described in Section 4.2.1, explores for each user whether there is an improvement over the initial driving style in any week of the period under consideration. The second one, discussed in Section 4.2.2, observes individual trends over the entire period.
In both cases, we split the users into four classes according to the quartile values of the 498 initial scores $s_0^k$. These merit-based classes, labeled ‘very-low’, ‘medium-low’, ‘medium-high’, and ‘very-high’, represent different initial scenarios for our analysis of coaching. To reliably measure improvement, we reasoned, it is implausible to include all drivers in an undifferentiated group: the margins for improvement are very different for bad drivers, whose various critical issues can be addressed, than for drivers who already drive excellently and have little or no room for improvement. However accurate and effective it may be, coaching will have little effect on good drivers (those we include in the medium-high and very-high groups). At the same time, we can expect it to make a difference in the driving style of the very-low and medium-low groups. Our analysis of improvement, therefore, differentially explores the improvement effects in the four groups of drivers with different skill levels. A precise description of the merit-based classes can be found in Table 3.
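A minimal sketch of this merit-based split, reusing the weekly_scores table from the preprocessing sketch in Section 4.1; the class boundaries are simply the empirical quartiles of the initial scores.

```python
# Merit-based classes: quartile split of the 498 initial scores s_0^k.
initial_scores = (
    weekly_scores[weekly_scores["week_index"] == 0]
    .set_index("user_id")["s"]
)
merit_class = pd.qcut(
    initial_scores,
    q=4,
    labels=["very-low", "medium-low", "medium-high", "very-high"],
)
print(merit_class.value_counts())  # roughly 124-125 users per class
```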
In the following sections, we present the two approaches we used to investigate coaching effects based on two very different metrics to quantify the improvement of driving scores.
4.2.1. Coaching Effects over Single Weeks
In our first set-up, we considered the data points corresponding to the weekly scores $s_i^k$ for $i > 0$. To study users’ behavior after their initial week, we simply considered the difference

$\Delta_i^k = s_i^k - s_0^k$

for each user $k$. In this case, we could independently study 3921 data points. A positive value of $\Delta_i^k$ denotes an enhancement in the driving style of the $k$-th user in the $i$-th week with respect to his/her initial score. We consider only sufficiently high score increases, given by $\Delta_i^k > \varepsilon$, to be an improvement, and we associate the corresponding data points with the deviation-based class ‘Positive’, relative to users with a positive coaching effect over single weeks. On the contrary, we cast the data points with $\Delta_i^k < -\varepsilon$ into the ‘Negative’ class, representing weekly driving sessions with behaviors worse than the initial one. Difference values with $|\Delta_i^k| \le \varepsilon$ correspond to a null or very moderate variation of the driving score, and the corresponding data points are therefore associated with the ‘Null’ class of driving sessions, with no relevant changes in the score. We note that the choice of the amplitude of the ‘Null’-related range is arbitrary, and we have reasonably set it as the 5%-wide interval $[-\varepsilon, \varepsilon]$ with $\varepsilon = 0.025$ (as the scores are between 0 and 1, $\Delta_i^k$ can take values from −1 to +1).
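The deviation-based classification can be sketched as follows, reusing the tables from the previous sketches; handling of values falling exactly on the thresholds via pandas’ right-closed bins is a minor implementation assumption.

```python
# Deviation-based classes: compare each weekly score with the user's
# initial score; the 'Null' band is the 5%-wide interval [-eps, +eps].
eps = 0.025

deltas = weekly_scores.merge(
    initial_scores.rename("s0").reset_index(), on="user_id"
)
deltas = deltas[deltas["week_index"] > 0].copy()
deltas["delta"] = deltas["s"] - deltas["s0"]   # Delta_i^k = s_i^k - s_0^k

deltas["deviation_class"] = pd.cut(
    deltas["delta"],
    bins=[-1.0, -eps, eps, 1.0],
    labels=["Negative", "Null", "Positive"],
    include_lowest=True,
)
```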
From a methodological perspective, we remark that the definition of $\Delta_i^k$ could also be based on the ratio $s_i^k / s_0^k$ instead of on the difference $s_i^k - s_0^k$. In that case, the three classes ‘Positive’, ‘Negative’, and ‘Null’ would be defined around 1 by setting as stability range the thresholds based on the 5% central interval. In our analysis, we initially took both measures into account and then opted for the difference because, based on the dataset at our disposal, this approach allowed us to better discriminate the three deviation-based classes. We associated each $\Delta_i^k$ with the cumulative duration $D_i^k$ of app usage over a three-week period, which includes the current week $i$ and the two preceding ones, by summing the weekly durations (in seconds) as

$D_i^k = d_{i-2}^k + d_{i-1}^k + d_i^k.$

The reason for including the current week of analysis and the previous two weeks is that users’ behavioral patterns are not isolated within single weeks. Rather, they may be influenced by their interaction with the app in the previous weeks.
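This three-week aggregation can be sketched as a per-user rolling sum; treating weeks without recorded sessions as zero usage is an assumption made for illustration.

```python
# Cumulative three-week app usage D_i^k: per user, the rolling sum of the
# weekly durations d_i^k over weeks i-2, i-1, and i. Weeks with no recorded
# sessions are treated as zero usage.
usage_matrix = (
    weekly_usage.set_index(["user_id", "week_index"])["d"]
    .unstack(fill_value=0)      # rows: users; columns: week indexes
    .sort_index(axis=1)
)
D = usage_matrix.T.rolling(window=3, min_periods=1).sum().T
# D.loc[k, i] now holds D_i^k for user k and week i.
```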
4.2.2. Coaching Effects over the Entire Period
The previous approach enabled us to take into account only temporary coaching effects, as the changes in each driver’s score are measured against the initial week, independently of the passing of time. Since we are interested in long-lasting improvements as well, we also analyzed the evolution of the score with respect to the week indexes $i$ for each user independently. We implemented linear regression models

$s^k(i) = \beta_0^k + \beta_1^k \, i,$

explaining the score as a function $s^k$ of $i$, for each policyholder $k$ separately. An example of linear regression is reported in Figure 4 for one (anonymous) user: on the horizontal axis, we read the index $i$ of the week since the user’s enrollment in the PHYD program, whereas the vertical axis shows the driving score function $s^k(i)$. His/her positive slope coefficient $\beta_1^k$ denotes (on average) continuously improving driving performance over all tracked weeks.
Unfortunately, many users have few weekly scores, making the regression analysis unrepresentative for them. We thus discarded users with fewer than 8 scored weeks within the program and computed the regression coefficients for the remaining 212 users. We performed statistical analyses of the $\beta_1^k$ coefficients, relying on the popular $p$-values, for each driver. A low $p$-value suggests that the relationship between the independent and dependent variables is statistically significant, i.e., that the passing of time influences the driving score within the telematics program. Conversely, a high $p$-value indicates that this relationship could plausibly be due to random fluctuations in the data rather than an actual relationship between the variables. In this regard, it is well known that the conventional significance threshold of 0.05 may be unsuitable for studies with very small sample sizes, as in our case, and there is growing acceptance in various research fields of tolerating higher thresholds, such as 0.15 or 0.20 [32].
Since our analysis focuses on capturing temporal trends rather than establishing a precise predictive model, we adopted 0.20. Thus, we interpret linear regressions with $p$-value $> 0.20$ as statistically not significant, and the corresponding users are categorized as ‘Not Significant’. On the contrary, for the users with $p$-value $\le 0.20$, we can capture patterns over time by looking at the value of their slope coefficient $\beta_1^k$. These drivers are divided into three classes called ‘Negative’, ‘Null’, and ‘Positive’, as in the previous approach. Specifically, drivers whose slope exceeds a small positive stability threshold are classified as ‘Positive’, because their (sufficiently) positive slope denotes a long-term improvement in driving behavior. Drivers whose slope falls below the corresponding negative threshold are classified as ‘Negative’, as their scores decrease over the observed weeks, while users whose slope lies within the stability range around zero are classified as ‘Null’ to denote that no practically relevant changes have been observed in their scores over time. In the following, we refer to these clusters of users as slope-based classes.
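A minimal sketch of the per-user trend analysis with scipy; the slope stability band SLOPE_EPS is a hypothetical placeholder, as the exact threshold adopted in the study is not reproduced here.

```python
from scipy.stats import linregress

P_THRESHOLD = 0.20   # significance level adopted in the study
SLOPE_EPS = 0.005    # hypothetical stability band around a zero slope

def slope_class(weeks, scores):
    """Classify one user's long-term trend from weekly (i, s_i^k) pairs."""
    if len(weeks) < 8:                 # fewer than 8 scored weeks: discard
        return "Discarded"
    fit = linregress(weeks, scores)    # s(i) = beta0 + beta1 * i
    if fit.pvalue > P_THRESHOLD:
        return "Not Significant"
    if fit.slope > SLOPE_EPS:
        return "Positive"
    if fit.slope < -SLOPE_EPS:
        return "Negative"
    return "Null"

slope_classes = weekly_scores.groupby("user_id").apply(
    lambda g: slope_class(g["week_index"].to_numpy(), g["s"].to_numpy())
)
```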
The variable quantifying the engagement of each $k$-th user with the app is computed as the mean of all the weekly durations $d_i^k$, i.e., as

$\bar{d}^k = \frac{1}{W^k} \sum_{i=0}^{W^k - 1} d_i^k,$

where $W^k$ is the user’s specific number of weeks within the telematics program. The choice to sum over all the weeks and divide by $W^k$ does not penalize policyholders who enrolled later.
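Continuing the sketch, this engagement variable is a simple normalized sum; defining $W^k$ as the user’s last scored week index plus one is an assumption made for illustration.

```python
# Engagement of each user: mean weekly app usage (in seconds) over the
# user's W^k weeks in the program, counting weeks without sessions as zero.
weeks_in_program = weekly_scores.groupby("user_id")["week_index"].max() + 1
total_usage = weekly_usage.groupby("user_id")["d"].sum()
engagement = total_usage.reindex(weeks_in_program.index, fill_value=0) / weeks_in_program
```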
4.3. Limitations and Remarks
Our research has some important limitations that must be taken into consideration before moving on. One limitation is that we could not assess policyholders’ engagement in coaching programs that provide financial incentives. In empirical research on PHYD insurance programs, there is strong evidence that financial rewards are crucial for nudging policyholders into behavioral change [33,34,35]. If policyholders have no economic advantage, they apparently have little or no incentive to change their behavior: the motivation to save money is a stronger driver of behavioral change than safety concerns. There is also empirical evidence that financial incentives alone are not sufficient to motivate policyholders to engage [36]. Rewards need to be personalized, based on the risk profile of policyholders, and divided into short-term and long-term financial rewards. Unfortunately, the database we explored refers to a policy without a reward system. This lack of financial incentives may well explain the low digital engagement we observed in our database, and it drastically reduces the possibility of generalizing our results to the PHYD insurance market, where financial incentives are the norm.
An additional limitation of our research lies in the impossibility of linking the data on engagement and improvement of driving style with the demographic characteristics of the users, which we could not access for privacy reasons. At the beginning of our research, we wondered, for example, whether there were differences in engagement between younger and older users (who are presumably less confident with digital technology), whether individual claim history was correlated with behavioral improvements, and other issues. Access to these data, of course, could provide both insurance scholars and companies with important insights.
There are two further remarks, both related to time. As learned from [7], a few months may be sufficient to assess coaching effects on the insurance pool. However, with a 9-month observation period, we could not ascertain whether and to what extent a ‘habituation’ effect to the insurance program might occur, i.e., whether and to what extent time could affect engagement and, consequently, the effectiveness of the coaching program. Additionally, we could not examine the behavior of policyholders over a period of time beyond the policy renewal threshold (one year and more), which could have been extremely informative but exceeds the scope of this paper.
From a methodological perspective, while we opted for a simple linear regression model to estimate individual driving score trends, we acknowledge that this choice may oversimplify behavioral dynamics. Driving style does not necessarily change linearly over time; for instance, drivers may exhibit rapid improvements during the initial weeks followed by stabilization, or they may experience cyclical fluctuations in behavior. Alternative approaches could address these aspects. Nonlinear models (e.g., polynomial regression or spline-based methods) could capture curvilinear or plateauing trends, providing a more nuanced representation of behavioral change. Similarly, time-series methods (e.g., autoregressive models) could account for temporal dependencies and periodic patterns in driving scores. However, these methods require more observations per user and introduce challenges of model standardization and interpretability across a heterogeneous user population. Given the exploratory nature of this study and the limited number of weekly observations available for each user, we adopted the linear model as a pragmatic and interpretable solution for large-scale analysis. We also note that adopting a more flexible regression model would not fundamentally alter the methodology proposed here; rather, it would extend its applicability to richer datasets. Future research could therefore incorporate nonlinear or time-series models to enhance robustness and capture more complex dynamics in driver behavior. Finally, we fully acknowledge that statistical inference based on a relatively high significance threshold ($p < 0.20$) is unconventional and less stringent than the standard 0.05 level typically adopted in behavioral and social sciences. This decision was not made lightly: it stemmed from the exploratory nature of the study and the constraints of the dataset, where many users have only a small number of weekly observations.
We are aware of the limitations of our investigation, which depend on the type of data we had available and the constraints related to the confidentiality of our sources. Access to high-quality and high-quantity data is notoriously problematic in the field in which our investigation was conducted. However, we believe that despite these limitations, our work offers a useful contribution to the analysis of the use of behavioral data in the insurance sector. Our research highlights the importance of engagement for the analysis and implementation of coaching programs and proposes a research methodology based on clear definitions of the concepts of engagement and coaching, which have so far been absent from the literature. The use of the available dataset enabled us to show how the proposed methodologies can be applied and how the results can be read and interpreted. The results we illustrate in Section 5, in fact, are not meant to be representative of telematics motor insurance worldwide but to offer clues for analyzing and understanding some real dynamics of that market.
6. Conclusions
The investigation presented in this work contributes to clarifying the definition of engagement and the related methodological challenges that must be addressed when exploring engagement in proactive insurance policies. Behavioral telematics data promise to change both the traditional insurance business model and the interaction between insurers and policyholders, providing a solution to the classical problem of moral hazard [37] (p. 1231). The paradox underlying moral hazard is that policyholders are less incentivized to take precautionary measures because they are insured [38,39]. In proactive insurance, the argument goes, if individual behavior could be observed and either rewarded or penalized depending on exposure to dangers, this could affect the propensity of policyholders to control their behavior, and the problem of moral hazard could be, if not removed, at least mitigated. In PHYD insurance policies, of course, telematics does not control moral hazard by directly steering individual behavior. The only control that might take place is a kind of self-control, which can be based on the motivation to improve behavior, but also on the awareness of being tracked, or on the mere possibility of earning financial incentives [4]. For this purpose, engagement plays a crucial role: our research shows that improvement effects are higher in policyholders who actively interact with the app.
Our findings, however, also show that engagement cannot be taken for granted. Many policyholders do not look at the app at all, and those who do tend to have short and superficial sessions. The usage of any app is time-consuming and can become annoying over time. This finding is confirmed by research in behavioral data-based health insurance, that is, in the so-called pay-as-you-live (PAYL) insurance policies [40,41,42,43]. If the goal of PHYD insurance policies is to improve driving style, policyholders must first be motivated to engage with the app in order to be motivated to change their behavior. Is there anything insurance companies can do to increase the level of engagement?
Insurance companies could expand the functionality of the app and make the interaction with it more appealing. For example, effective engagement can depend on the app’s usability, but also on short messages and notifications proposing challenges to be met in order to improve driving behavior, earn points, and be rewarded. If these messages and notifications were personalized, coaching could be custom-made, and the app-based interaction with the insurance company could be more exciting. Such interaction is itself a kind of behavior producing second-order behavioral data that can be recorded and used strategically. Insurance companies, therefore, could also implement second-order coaching strategies aimed at improving engagement, besides the first-order strategies aimed at improving driving behavior.
The open question is how insurance companies can make use of these second-order scores. In principle, the level of engagement could be taken into account in the scoring process as well. Including engagement in the score, however, is not without risks. One could reward the most engaged users with points earned when they interact with the telematics app. But users who know that their score also depends on this interaction could be inclined to game the system, learning to improve their score rather than learning to improve their driving behavior. Indeed, information fed back to drivers can affect not only how people behave but also how people deal with their behavior.
For example, a father who is late in driving his daughter to school might turn off his mobile phone and turn it back on when he drives calmly from school to work. The interaction with the app becomes strategic, if not outright opportunistic. Admittedly, a single event does not affect a long exposure, and by cross-referencing data, the missing trip can be detected. On the other hand, systematic misuse can be associated with fraud. Regardless of these two extreme cases, it would make sense to reward users who stay engaged, but a reward based on the level of engagement with the app would work only if it were not disclosed. However, can insurance companies avoid disclosing how they calculate the score?
These difficulties can be interpreted as flaws of behavioral insurance policies but also as confirmation of the crucial role of engagement, which could give rise to innovative approaches. In mundane everyday life, instead of optimizing their driving style, users are rather prone to optimizing their interaction with the tracking device [40]. Instead of keeping their driving behavior under control, users keep their tracking behavior under control. This can be discouraging for software providers and insurance companies that use self-tracking technologies to implement personalized proactive prevention programs, but at the same time it is emblematic of the increasingly pervasive interaction individuals have with apps on their mobile phones. It also suggests the possibility, at least in principle, of harnessing this kind of ‘addiction’ in a positive manner, namely, to prevent accidents. Long-term engagement is a challenge but could also be the starting point for a more complex implementation of UBI business models by insurance companies.