Machine Learning in Insurance

A special issue of Risks (ISSN 2227-9091).

Deadline for manuscript submissions: closed (31 December 2019) | Viewed by 56102

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Guest Editor
Cass Business School, City, University of London, 106 Bunhill Row, London EC1Y 8TZ, UK
Interests: machine learning in insurance; structured nonparametric statistics; pension research

Guest Editor
Cass Business School, University of London, London, UK

Guest Editor
Faculty of Actuarial Science & Insurance, Bayes Business School, University of London, 106 Bunhill Row, London EC1Y 8TZ, UK
Interests: numerical methods: transform techniques and Monte Carlo simulation; stochastic asset modelling; exotic derivatives; commodity markets; actuarial science

Special Issue Information

Dear Colleagues, 

Machine learning is a relatively new field without a universally agreed definition. It is widely accepted that machine learning combines computational methods with the validation of prediction methods. The latter are often statistical prediction methods, although other methods exist. Machine learning methods may also be ad hoc and closely tied to the application at hand. To create ad hoc prediction methods, the machine learner needs to draw on experience and knowledge of everything surrounding the case at hand. It is not enough to define a simple performance measure without further explanation; such a measure needs to be aligned with the needs of the client, the sponsor, or whoever ordered the study in the first place.

In many ways, actuaries have long been machine learners. In pricing and reserving, and more recently in capital modeling, actuaries have combined statistical methodology with a deep understanding of the problem at hand and of how any solution may affect the company and its customers. One aspect that has perhaps not been so well developed among actuaries is validation. Discussions among actuaries about “preferred methods” often lacked solid scientific arguments, including validation for the case at hand.

Our criteria for this Special Issue are intended to promote good practice of machine learning in insurance by considering the following three key issues:

  a) Who is the client, sponsor, or otherwise interested real-life target of the study?
  b) What is the reason for working with this particular data set, and what extra knowledge (which we also call prior knowledge) is available besides the data set alone?
  c) What is the mathematical statistical argument for the validation procedure?

In other words, a critical question to be answered is how prior knowledge fits with the data set in a correct mathematical statistical model.

Notice that we do not consider any statistical method to be more “machine learning” than any other. Therefore, either a logistic regression or a generalized linear model may well be the final choice of machine learning method.

Building on these considerations, this Special Issue aims to compile high-quality papers that discuss state-of-the-art developments or introduce new theoretical or practical advances in this field. We welcome papers related to, but not limited to, the following topics:

  • Modeling of capital requirement for Life or Non-Life Underwriting Risk
  • Modeling extensions of Solvency II Standard formula
  • Pricing methodology in non-life insurance
  • Reserving methodology in non-life insurance
  • Any other data-driven modeling procedure relevant in non-life or life insurance, including marketing applications or price elasticity investigations.

Prof. Dr. Jens Perch Nielsen
Dr. Vali Asimit
Dr. Ioannis Kyriakou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Risks is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • risk analysis
  • volatility
  • insurance
  • client
  • audience
  • validation
  • domain knowledge
  • prior knowledge
  • statistical model

Published Papers (11 papers)

Editorial

2 pages, 258 KiB  
Editorial
Special Issue “Machine Learning in Insurance”
by Vali Asimit, Ioannis Kyriakou and Jens Perch Nielsen
Risks 2020, 8(2), 54; https://doi.org/10.3390/risks8020054 - 25 May 2020
Cited by 4 | Viewed by 3103
Abstract
It is our pleasure to prologue the special issue on “Machine Learning in Insurance”, which represents a compilation of ten high-quality articles discussing avant-garde developments or introducing new theoretical or practical advances in this field [...]
(This article belongs to the Special Issue Machine Learning in Insurance)

Research

14 pages, 1114 KiB  
Article
A Note on Combining Machine Learning with Statistical Modeling for Financial Data Analysis
by José María Sarabia, Faustino Prieto, Vanesa Jordá and Stefan Sperlich
Risks 2020, 8(2), 32; https://doi.org/10.3390/risks8020032 - 03 Apr 2020
Cited by 2 | Viewed by 2793
Abstract
This note revisits the ideas of the so-called semiparametric methods that we consider to be very useful when applying machine learning in insurance. To this aim, we first recall the main essence of semiparametrics like the mixing of global and local estimation and the combining of explicit modeling with purely data adaptive inference. Then, we discuss stepwise approaches with different ways of integrating machine learning. Furthermore, for the modeling of prior knowledge, we introduce classes of distribution families for financial data. The proposed procedures are illustrated with data on stock returns for five companies of the Spanish value-weighted index IBEX35.
(This article belongs to the Special Issue Machine Learning in Insurance)

27 pages, 2013 KiB  
Article
Prediction of Claims in Export Credit Finance: A Comparison of Four Machine Learning Techniques
by Mathias Bärtl and Simone Krummaker
Risks 2020, 8(1), 22; https://doi.org/10.3390/risks8010022 - 01 Mar 2020
Cited by 16 | Viewed by 8416
Abstract
This study evaluates four machine learning (ML) techniques (Decision Trees (DT), Random Forests (RF), Neural Networks (NN) and Probabilistic Neural Networks (PNN)) on their ability to accurately predict export credit insurance claims. Additionally, we compare the performance of the ML techniques against a simple benchmark (BM) heuristic. The analysis is based on the utilisation of a dataset provided by the Berne Union, which is the most comprehensive collection of export credit insurance data and has been used in only two scientific studies so far. All ML techniques performed relatively well in predicting whether or not claims would be incurred, and, with limitations, in predicting the order of magnitude of the claims. No satisfactory results were achieved predicting actual claim ratios. RF performed significantly better than DT, NN and PNN against all prediction tasks, and most reliably carried their validation performance forward to test performance.
(This article belongs to the Special Issue Machine Learning in Insurance)

79 pages, 1797 KiB  
Article
Machine Learning in Least-Squares Monte Carlo Proxy Modeling of Life Insurance Companies
by Anne-Sophie Krah, Zoran Nikolić and Ralf Korn
Risks 2020, 8(1), 21; https://doi.org/10.3390/risks8010021 - 21 Feb 2020
Cited by 10 | Viewed by 5582
Abstract
Under the Solvency II regime, life insurance companies are asked to derive their solvency capital requirements from the full loss distributions over the coming year. Since the industry is currently far from being endowed with sufficient computational capacities to fully simulate these distributions, the insurers have to rely on suitable approximation techniques such as the least-squares Monte Carlo (LSMC) method. The key idea of LSMC is to run only a few wisely selected simulations and to process their output further to obtain a risk-dependent proxy function of the loss. In this paper, we present and analyze various adaptive machine learning approaches that can take over the proxy modeling task. The studied approaches range from ordinary and generalized least-squares regression variants over generalized linear model (GLM) and generalized additive model (GAM) methods to multivariate adaptive regression splines (MARS) and kernel regression routines. We justify the combinability of their regression ingredients in a theoretical discourse. Further, we illustrate the approaches in slightly disguised real-world experiments and perform comprehensive out-of-sample tests.
(This article belongs to the Special Issue Machine Learning in Insurance)
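
As a rough illustration of the proxy idea described in this abstract (and not the authors' implementation), the sketch below fits an ordinary least-squares polynomial proxy to noisy simulated losses and then checks it out of sample. The single risk factor, the function `true_loss`, the noise level and the sample sizes are hypothetical assumptions chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_loss(x):
    """Hypothetical 'exact' loss as a function of one risk factor (unknown in practice)."""
    return 1.0 + 0.5 * x + 0.25 * x**2

# Fitting scenarios: each scenario is valued only roughly, so its loss is observed with noise.
x_fit = rng.uniform(-2.0, 2.0, size=5000)
y_fit = true_loss(x_fit) + rng.normal(0.0, 1.0, size=x_fit.size)

# Ordinary least squares on the polynomial basis 1, x, x^2.
design = np.vander(x_fit, N=3, increasing=True)
coef, *_ = np.linalg.lstsq(design, y_fit, rcond=None)

def proxy(x):
    """Evaluate the fitted proxy function at one or more risk-factor values."""
    basis = np.vander(np.atleast_1d(np.asarray(x, dtype=float)), N=3, increasing=True)
    return basis @ coef

# Out-of-sample check on a grid of scenarios that were not used for fitting.
x_val = np.linspace(-2.0, 2.0, 21)
max_abs_error = float(np.max(np.abs(proxy(x_val) - true_loss(x_val))))
```

In practice the true loss is not available in closed form, so a validation of this kind would rely on separately simulated out-of-sample scenarios.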

13 pages, 406 KiB  
Article
Modelling Unobserved Heterogeneity in Claim Counts Using Finite Mixture Models
by Lluís Bermúdez, Dimitris Karlis and Isabel Morillo
Risks 2020, 8(1), 10; https://doi.org/10.3390/risks8010010 - 29 Jan 2020
Cited by 8 | Viewed by 2870
Abstract
When modelling insurance claim count data, the actuary often observes overdispersion and an excess of zeros that may be caused by unobserved heterogeneity. A common approach to accounting for overdispersion is to consider models with some overdispersed distribution as opposed to Poisson models. Zero-inflated, hurdle and compound frequency models are typically applied to insurance data to account for such a feature of the data. However, a natural way to deal with unobserved heterogeneity is to consider mixtures of simpler models. In this paper, we consider k-finite mixtures of some typical regression models. This approach has interesting features: first, it allows for overdispersion and the zero-inflated model represents a special case, and second, it allows for an elegant interpretation based on the typical clustering application of finite mixture models. k-finite mixture models are applied to a car insurance claim dataset in order to analyse whether the problem of unobserved heterogeneity requires a richer structure for risk classification. Our results show that the data consist of two subpopulations for which the regression structure is different.
(This article belongs to the Special Issue Machine Learning in Insurance)
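
The two features mentioned in this abstract (overdispersion, and the zero-inflated model as a special case) can be seen in a minimal sketch of a two-component Poisson mixture without covariates. This is not the authors' regression model; the weights and claim rates are hypothetical values picked for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k claims with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def mixture_pmf(k, weights, rates):
    """Finite Poisson mixture: weighted sum of component probabilities."""
    return sum(w * poisson_pmf(k, lam) for w, lam in zip(weights, rates))

def mixture_mean_var(weights, rates):
    """Mean and variance of the mixture via the law of total variance."""
    mean = sum(w * lam for w, lam in zip(weights, rates))
    # E[Var(N | component)] equals the mean; Var(E[N | component]) is the spread of the rates.
    var = mean + sum(w * (lam - mean) ** 2 for w, lam in zip(weights, rates))
    return mean, var

# Hypothetical portfolio: 80% low-risk and 20% high-risk policyholders.
weights = [0.8, 0.2]
rates = [0.05, 0.60]
mean, var = mixture_mean_var(weights, rates)  # variance exceeds the mean (overdispersion)

# A component with rate 0 produces only zeros, which gives a zero-inflated Poisson.
zero_inflated_p0 = mixture_pmf(0, [0.3, 0.7], [0.0, 0.10])
```

A plain Poisson model forces the variance to equal the mean, so any spread in the component rates shows up as overdispersion.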

17 pages, 714 KiB  
Article
In-Sample Hazard Forecasting Based on Survival Models with Operational Time
by Stephan M. Bischofberger
Risks 2020, 8(1), 3; https://doi.org/10.3390/risks8010003 - 03 Jan 2020
Cited by 4 | Viewed by 2774
Abstract
We introduce a generalization of the one-dimensional accelerated failure time model allowing the covariate effect to be any positive function of the covariate. This function and the baseline hazard rate are estimated nonparametrically via an iterative algorithm. In an application in non-life reserving, the survival time models the settlement delay of a claim and the covariate effect is often called operational time. The accident date of a claim serves as covariate. The estimated hazard rate is a nonparametric continuous-time alternative to chain-ladder development factors in reserving and is used to forecast outstanding liabilities. Hence, we provide an extension of the chain-ladder framework for claim numbers without the assumption of independence between settlement delay and accident date. Our proposed algorithm is an unsupervised learning approach to reserving that detects operational time in the data and adjusts for it in the estimation process. Advantages of the new estimation method are illustrated in a data set consisting of paid claims from a motor insurance business line on which we forecast the number of outstanding claims.
(This article belongs to the Special Issue Machine Learning in Insurance)

20 pages, 429 KiB  
Article
A Likelihood Approach to Bornhuetter–Ferguson Analysis
by Valandis Elpidorou, Carolin Margraf, María Dolores Martínez-Miranda and Bent Nielsen
Risks 2019, 7(4), 119; https://doi.org/10.3390/risks7040119 - 10 Dec 2019
Cited by 3 | Viewed by 3547
Abstract
A new Bornhuetter–Ferguson method is suggested herein. This is a variant of the traditional chain ladder method. The actuary can adjust the relative ultimates using externally estimated relative ultimates. These correspond to linear constraints on the Poisson likelihood underpinning the chain ladder method. Adjusted cash flow estimates were obtained as constrained maximum likelihood estimates. The statistical derivation of the new method is provided in the generalised linear model framework. A related approach in the literature, combining unconstrained and constrained maximum likelihood estimates, is presented in the same framework and compared theoretically. A data illustration is described using a motor portfolio from a Greek insurer.
(This article belongs to the Special Issue Machine Learning in Insurance)
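
For readers unfamiliar with the baseline that this abstract builds on, the following is a minimal sketch of the traditional chain ladder calculation on a cumulative run-off triangle. It does not implement the constrained likelihood method proposed in the paper, and the small triangle is hypothetical data invented for the example.

```python
def chain_ladder(triangle):
    """Classical chain ladder on a cumulative triangle.

    triangle[i] holds the observed cumulative claims of accident year i;
    later accident years have fewer observed development periods.
    """
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # Use only accident years observed at both development periods j and j + 1.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(row[j + 1] for row in rows) / sum(row[j] for row in rows))

    ultimates = []
    for row in triangle:
        value = row[-1]  # latest observed cumulative amount
        for j in range(len(row) - 1, n - 1):
            value *= factors[j]  # roll forward to the final development period
        ultimates.append(value)
    return factors, ultimates

# Hypothetical cumulative triangle with three accident years.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 160.0],
    [120.0],
]
factors, ultimates = chain_ladder(triangle)
reserves = [u - row[-1] for u, row in zip(ultimates, triangle)]
```

The reserve for each accident year is the projected ultimate minus the latest observed cumulative amount, so the fully developed first year has a reserve of zero.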

22 pages, 944 KiB  
Article
Conditional Variance Forecasts for Long-Term Stock Returns
by Enno Mammen, Jens Perch Nielsen, Michael Scholz and Stefan Sperlich
Risks 2019, 7(4), 113; https://doi.org/10.3390/risks7040113 - 05 Nov 2019
Cited by 9 | Viewed by 3276
Abstract
In this paper, we apply machine learning to forecast the conditional variance of long-term stock returns measured in excess of different benchmarks, considering the short- and long-term interest rate, the earnings-by-price ratio, and the inflation rate. In particular, we apply in a two-step procedure a fully nonparametric local-linear smoother and choose the set of covariates as well as the smoothing parameters via cross-validation. We find that volatility forecastability is much less important at longer horizons regardless of the chosen model and that the homoscedastic historical average of the squared return prediction errors gives an adequate approximation of the unobserved realised conditional variance for both the one-year and five-year horizon.
(This article belongs to the Special Issue Machine Learning in Insurance)
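
The two building blocks named in this abstract, a local-linear smoother and bandwidth choice by cross-validation, can be sketched in one dimension as follows. This is only an illustration under simplifying assumptions (a single covariate, a Gaussian kernel, leave-one-out cross-validation, synthetic data); it does not reproduce the paper's two-step variance procedure or its covariate selection.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local-linear estimate of E[y | x = x0] with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)             # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])     # local intercept and slope
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)         # weighted least squares
    return beta[0]                                     # intercept = fitted value at x0

def loo_cv_score(x, y, h):
    """Leave-one-out mean squared prediction error for bandwidth h."""
    idx = np.arange(x.size)
    errors = [y[i] - local_linear(x[i], x[idx != i], y[idx != i], h) for i in idx]
    return float(np.mean(np.square(errors)))

# Synthetic data (hypothetical): a smooth signal observed with noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.2, size=x.size)

# Pick the bandwidth with the smallest cross-validated error from a small grid.
bandwidths = [0.05, 0.1, 0.2, 0.4]
scores = {h: loo_cv_score(x, y, h) for h in bandwidths}
best_h = min(scores, key=scores.get)
```

The same cross-validation score can in principle be compared across different covariate sets, which is how a validation criterion can drive model selection as well as smoothing.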

17 pages, 441 KiB  
Article
On the Validation of Claims with Excess Zeros in Liability Insurance: A Comparative Study
by Marjan Qazvini
Risks 2019, 7(3), 71; https://doi.org/10.3390/risks7030071 - 30 Jun 2019
Cited by 5 | Viewed by 3429
Abstract
In this study, we consider the problem of zero claims in a liability insurance portfolio and compare the predictability of three models. We use French motor third party liability (MTPL) insurance data, which has been used for a pricing game, and show how the type of coverage and policyholders’ willingness to subscribe to insurance pricing based on telematics data affect their driving behaviour and hence their claims. Using our validation set, we then predict the number of zero claims. Our results show that although a zero-inflated Poisson (ZIP) model performs better than a Poisson regression, it can even be outperformed by logistic regression.
(This article belongs to the Special Issue Machine Learning in Insurance)
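
To make the zero-inflated Poisson (ZIP) model in this abstract concrete, here is a minimal sketch of its probability function and of the expected number of zero-claim policies. It uses no covariates and is not the study's fitted model; the claim rate, the zero-inflation probability and the portfolio size are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k claims with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: a structural zero with probability pi, else Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def expected_zero_claims(n_policies, pi, lam):
    """Expected number of policies with no claim in a homogeneous portfolio."""
    return n_policies * zip_pmf(0, pi, lam)

# Hypothetical inputs for illustration only.
lam, pi, n_policies = 0.10, 0.30, 1000
zeros_poisson = expected_zero_claims(n_policies, 0.0, lam)  # pi = 0 is the plain Poisson case
zeros_zip = expected_zero_claims(n_policies, pi, lam)
```

For the same rate, the ZIP model places extra probability mass at zero (about 933 expected zero-claim policies here against about 905 under the plain Poisson), which is why the predicted number of zero claims is a natural validation target when comparing such models.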

16 pages, 2386 KiB  
Article
Predicting Motor Insurance Claims Using Telematics Data—XGBoost versus Logistic Regression
by Jessica Pesantez-Narvaez, Montserrat Guillen and Manuela Alcañiz
Risks 2019, 7(2), 70; https://doi.org/10.3390/risks7020070 - 20 Jun 2019
Cited by 87 | Viewed by 14260
Abstract
XGBoost is recognized as an algorithm with exceptional predictive capacity. Models for a binary response indicating the existence of accident claims versus no claims can be used to identify the determinants of traffic accidents. This study compared the relative performances of logistic regression and XGBoost approaches for predicting the existence of accident claims using telematics data. The dataset contained information from an insurance company about the individuals’ driving patterns, including total annual distance driven and percentage of total distance driven in urban areas. Our findings showed that logistic regression is a suitable model given its interpretability and good predictive capacity. XGBoost requires numerous model-tuning procedures to match the predictive performance of the logistic regression model and greater effort as regards interpretation.
(This article belongs to the Special Issue Machine Learning in Insurance)

18 pages, 418 KiB  
Article
Sound Deposit Insurance Pricing Using a Machine Learning Approach
by Hirbod Assa, Mostafa Pouralizadeh and Abdolrahim Badamchizadeh
Risks 2019, 7(2), 45; https://doi.org/10.3390/risks7020045 - 19 Apr 2019
Cited by 4 | Viewed by 3620
Abstract
While the main conceptual issue related to deposit insurances is the moral hazard risk, the main technical issue is inaccurate calibration of the implied volatility. This issue can raise the risk of generating an arbitrage. In this paper, we first argue that, by imposing the no-moral-hazard risk, the removal of arbitrage is equivalent to removing the static arbitrage. Then, we propose a simple quadratic model to parameterize implied volatility and remove the static arbitrage. The process of removing the static risk is as follows: using a machine learning approach with a regularized cost function, we update the parameters in such a way that butterfly arbitrage is ruled out and, by implementing a calibration method, we impose conditions on the parameters of each time slice to rule out calendar spread arbitrage. Therefore, eliminating the effects of both butterfly and calendar spread arbitrage makes the implied volatility surface free of static arbitrage.
(This article belongs to the Special Issue Machine Learning in Insurance)